EXPERIENCE MAXIMUM INFERENCE THROUGHPUT
In the new era of AI and intelligent machines, deep learning is shaping our world like no
other computing model in history. GPUs powered by the revolutionary NVIDIA Pascal™
architecture provide the computational engine for the new era of artificial intelligence,
enabling amazing user experiences by accelerating deep learning applications at scale.
The NVIDIA Tesla P40 is purpose-built to deliver maximum throughput for deep learning
deployment. With 47 TOPS (Tera-Operations Per Second) of INT8 inference performance
per GPU, a single server with eight Tesla P40s delivers the performance of over
140 CPU servers.
As models increase in accuracy and complexity, CPUs are no longer capable of delivering
an interactive user experience. The Tesla P40 delivers over 30X lower latency than a CPU
for real-time responsiveness in even the most complex models.
Main Specifications | |
Product Series | Tesla P40 |
Core Type | NVIDIA CUDA |
Core Clock Speed | 1303 MHz (1531 MHz Boost Clock) |
Host Interface | PCI Express 3.0 x16 |
GPU Architecture | Pascal |
Detailed Specifications | |
Streaming Processor Cores | 3840 CUDA Cores |
Memory Clock Speed | 7.2 Gbps GDDR5 |
Memory Interface | 384-bit |
Max Memory Size | 24 GB GDDR5 |
Max Memory Bandwidth | 346 GB/s |
Peak Single-Precision Floating Point Performance | 12 TFLOPS |
Peak INT8 Performance | 47 TOPS (Tera-Operations per Second) |
NVIDIA CUDA™ Technology | Yes |
Enhanced Programmability with Page Migration Engine | Yes |
ECC Protection | Yes |
Server-Optimized for Data Center Deployment | Yes |
Hardware-Accelerated Video Engine | 1x Decode Engine, 2x Encode Engine |
Cooling | Passive |
Dual Slot | Yes |
Dimensions | 4.4” H x 10.5” L, Dual Slot, Full Height |
Max Graphics Card Power (W) | 250W |
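The headline figures in the table above can be sanity-checked from the core count and boost clock. A minimal sketch follows, assuming the usual conventions: FP32 peak counts 2 FLOPs per CUDA core per clock (one fused multiply-add), and INT8 peak counts 8 integer operations per core per clock via Pascal's 4-element dot-product-with-accumulate (DP4A) path; memory bandwidth is the per-pin data rate times the bus width.

```python
# Back-of-the-envelope check of the Tesla P40 datasheet peaks.
# Assumed op-counting conventions (not stated in the table itself):
#   FP32: 2 FLOPs per core per clock (fused multiply-add)
#   INT8: 8 ops per core per clock (DP4A: 4 multiplies + 4 adds)

CUDA_CORES = 3840
BOOST_CLOCK_HZ = 1.531e9      # 1531 MHz boost clock
MEM_RATE_GBPS = 7.2           # effective GDDR5 data rate per pin
MEM_BUS_BITS = 384            # memory interface width

fp32_tflops = CUDA_CORES * 2 * BOOST_CLOCK_HZ / 1e12
int8_tops = CUDA_CORES * 8 * BOOST_CLOCK_HZ / 1e12
mem_bw_gbs = MEM_RATE_GBPS * MEM_BUS_BITS / 8

print(f"Peak FP32: {fp32_tflops:.2f} TFLOPS")  # ~11.76, quoted as 12 TFLOPS
print(f"Peak INT8: {int8_tops:.2f} TOPS")      # ~47.03, quoted as 47 TOPS
print(f"Memory BW: {mem_bw_gbs:.1f} GB/s")     # 345.6, quoted as 346 GB/s
```

Each computed value rounds to the figure quoted in the table, which confirms the peaks are derived at the boost clock rather than the base clock.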