ULTRA-EFFICIENT DEEP LEARNING IN SCALE-OUT SERVERS
In the new era of AI and intelligent machines, deep learning is shaping our world like no other computing model in history. Interactive speech, visual search, and video recommendations are a few of the many AI-based services that we use every day.
Accuracy and responsiveness are key to user adoption for these services. As deep learning models increase in accuracy and complexity, CPUs are no longer capable of delivering a responsive user experience.
The NVIDIA Tesla P4 is powered by the revolutionary NVIDIA Pascal™ architecture and purpose-built to boost efficiency for scale-out servers running deep learning workloads, enabling smart, responsive AI-based services. It slashes inference latency by 15X in any hyperscale infrastructure and provides an incredible 60X better energy efficiency than CPUs. This unlocks a new wave of AI services previously impossible due to latency limitations.
|Product Series||Tesla P4|
|Core Type||NVIDIA CUDA|
|Core Clock Speed||810 MHz (1063 MHz Boost Clock)|
|Host Interface||PCI Express 3.0 x16|
|Stream Cores||2560 CUDA Cores|
|Max Memory Size||8 GB GDDR5|
|Memory Clock Speed||6 Gbps GDDR5|
|Max Memory Bandwidth||192 GB/s|
|Peak Single-Precision Floating-Point Performance||5.5 TeraFLOPS|
|Integer Operations (INT8)||22 TOPS (Tera-Operations per Second)|
|NVIDIA CUDA™ Technology||Yes|
|Enhanced Programmability with Page Migration Engine||Yes|
|Server-Optimized for Data Center Deployment||Yes|
|Hardware-Accelerated Video Engine||1x Decode Engine, 2x Encode Engine|
|Dimensions||Low-Profile PCI Express Form Factor|
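The peak throughput and bandwidth rows in the table above can be roughly cross-checked from the core count and clock rates. The sketch below is a back-of-the-envelope estimate, not NVIDIA's published method. It assumes that each CUDA core retires one fused multiply-add (2 FLOPs) per cycle at the boost clock, that INT8 runs at 4x the single-precision rate, and that the memory bus is 256 bits wide. None of these three values appears in the table; all function and constant names are illustrative.

```python
# Sanity check of the headline throughput figures from the spec table.
# Values marked "assumption" do not appear in the table.

CUDA_CORES = 2560            # "Stream Cores" row
BOOST_CLOCK_HZ = 1063e6      # "Core Clock Speed" row, boost clock
MEMORY_DATA_RATE_BPS = 6e9   # "Memory Clock Speed" row, bits per second per pin

FLOPS_PER_CORE_PER_CYCLE = 2   # assumption: one fused multiply-add counts as 2 FLOPs
INT8_OPS_PER_FP32_FLOP = 4     # assumption: INT8 runs at 4x the FP32 rate
MEMORY_BUS_WIDTH_BITS = 256    # assumption: bus width is not listed in the table


def peak_fp32_tflops(cores, clock_hz, flops_per_cycle=FLOPS_PER_CORE_PER_CYCLE):
    """Peak single-precision throughput in TFLOPS."""
    return cores * clock_hz * flops_per_cycle / 1e12


def peak_int8_tops(fp32_tflops, ratio=INT8_OPS_PER_FP32_FLOP):
    """Peak INT8 throughput in TOPS, scaled from the FP32 figure."""
    return fp32_tflops * ratio


def memory_bandwidth_gb_per_s(data_rate_bps, bus_width_bits=MEMORY_BUS_WIDTH_BITS):
    """Peak memory bandwidth in GB/s: per-pin rate times bus width, bits to bytes."""
    return data_rate_bps * bus_width_bits / 8 / 1e9


fp32 = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_HZ)
int8 = peak_int8_tops(fp32)
bandwidth = memory_bandwidth_gb_per_s(MEMORY_DATA_RATE_BPS)

print(f"FP32: {fp32:.2f} TFLOPS")
print(f"INT8: {int8:.2f} TOPS")
print(f"Memory bandwidth: {bandwidth:.0f} GB/s")
```

Under these assumptions the estimate comes out slightly below the listed 5.5 TeraFLOPS, close to the listed 22 TOPS, and equal to the listed 192 GB/s. The small single-precision gap suggests the datasheet figure is rounded or based on a marginally different clock.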