In-Depth Comparison of NVIDIA Tesla “Pascal” GPU Accelerators

This article provides in-depth details of the NVIDIA Tesla P-series GPU accelerators (codenamed “Pascal”). “Pascal” GPUs improve upon the previous-generation “Kepler” and “Maxwell” architectures. Pascal GPUs were announced at GTC 2016 and began shipping in September 2016. Note: these have since been superseded by the NVIDIA Volta GPU architecture.

Important changes available in the “Pascal” GPU architecture include:

  • Exceptional performance: up to 5.3 TFLOPS double-precision and 10.6 TFLOPS single-precision peak floating-point throughput.
  • NVLink provides 5X the bandwidth of PCI-Express, both between Tesla Pascal GPUs and between GPUs and supported system CPUs.
  • High-bandwidth HBM2 memory provides a 3X improvement in memory bandwidth compared to Kepler and Maxwell GPUs.
  • Pascal Unified Memory allows GPU applications to directly access the memory of all GPUs as well as all of system memory (up to 512TB).
  • Up to 4MB of L2 cache is available on Pascal GPUs (compared with 1.5MB on Kepler and 3MB on Maxwell).
  • Native ECC Memory detects and corrects memory errors without any capacity or performance overhead.
  • Energy-efficiency – Pascal GPUs deliver nearly twice the FLOPS per Watt as Kepler GPUs.
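  • Efficient SM units – each GP100 SM keeps the same 64K 32-bit register file as earlier SMs but has half as many CUDA cores as a Maxwell SM (64 rather than 128), doubling the registers available per CUDA core. The maximum number of registers per thread remains 255.
  • Improved atomics in Pascal add a native double-precision (FP64) atomic add instruction for values in global memory (on earlier GPUs, FP64 atomic add had to be emulated in software, typically with a compare-and-swap loop). Atomics can also be performed on the memory of other GPUs in the system.
  • Half-precision FP support on GP100 doubles throughput for low-precision floating-point operations (frequently used in neural network training).
  • INT8 support on GP102 improves performance for low-precision integer operations (frequently used in neural network inference).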
  • Compute Preemption allows higher-priority tasks to interrupt currently-running tasks.
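
Several headline figures in the list above follow from simple arithmetic. The sketch below reproduces them, assuming the Tesla P100 SXM2 configuration (56 SMs with 64 FP32 and 32 FP64 CUDA cores each, 1480 MHz boost clock), four NVLink links at 40 GB/s bidirectional each, roughly 32 GB/s bidirectional for PCI-Express 3.0 x16, and 49-bit GPU virtual addressing. These inputs are not stated in this article; treat them as assumptions for a back-of-the-envelope check, not an official derivation.

```python
# Back-of-the-envelope check of headline Tesla P100 (SXM2) figures.
# ASSUMED inputs: 56 SMs, 64 FP32 / 32 FP64 cores per SM, 1480 MHz boost,
# 4 NVLink links x 40 GB/s bidirectional, PCIe 3.0 x16 ~32 GB/s bidirectional.

SMS = 56
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
BOOST_CLOCK_GHZ = 1.480
FLOPS_PER_FMA = 2  # one fused multiply-add counts as two floating-point operations


def peak_tflops(cores_per_sm: int) -> float:
    """Peak throughput in TFLOPS, assuming every core retires one FMA per clock."""
    gflops = SMS * cores_per_sm * FLOPS_PER_FMA * BOOST_CLOCK_GHZ
    return gflops / 1000.0


fp32 = peak_tflops(FP32_CORES_PER_SM)
fp64 = peak_tflops(FP64_CORES_PER_SM)
fp16 = 2 * fp32  # GP100 processes paired FP16 values at twice the FP32 rate

nvlink_gbs = 4 * 40  # total bidirectional NVLink bandwidth per GPU (assumed)
pcie_gbs = 32        # PCIe 3.0 x16, bidirectional (assumed)
nvlink_speedup = nvlink_gbs / pcie_gbs

virtual_address_bits = 49
addressable_tib = 2 ** virtual_address_bits / 2 ** 40  # bytes -> TiB

print(f"FP64 {fp64:.1f} TFLOPS, FP32 {fp32:.1f} TFLOPS, FP16 {fp16:.1f} TFLOPS")
print(f"NVLink vs PCIe: {nvlink_speedup:.0f}x")
print(f"49-bit virtual addressing: {addressable_tib:.0f} TiB")
# prints:
# FP64 5.3 TFLOPS, FP32 10.6 TFLOPS, FP16 21.2 TFLOPS
# NVLink vs PCIe: 5x
# 49-bit virtual addressing: 512 TiB
```

The same formula with the lower clock of the PCI-Express version of the card gives correspondingly lower peaks, which is why the list above says "up to".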

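On compute capability 6.1 parts such as GP102, the INT8 support mentioned above is exposed in CUDA through the `__dp4a` intrinsic, which multiplies four pairs of packed signed 8-bit integers and adds the sum to a 32-bit accumulator (GP100, compute capability 6.0, does not provide it). The Python model below illustrates only that arithmetic; the helper names `pack_int8x4`, `unpack_int8x4`, and `dp4a` are hypothetical, and the model says nothing about GPU performance.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one unsigned 32-bit word (little-endian)."""
    if len(values) != 4 or any(not -128 <= v <= 127 for v in values):
        raise ValueError("expected four values in [-128, 127]")
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def unpack_int8x4(word):
    """Split a 32-bit word back into its four signed 8-bit lanes."""
    return list(struct.unpack("<4b", struct.pack("<I", word)))


def dp4a(a_word, b_word, acc):
    """Model of a 4-way INT8 dot product with 32-bit accumulation."""
    a = unpack_int8x4(a_word)
    b = unpack_int8x4(b_word)
    total = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to a signed 32-bit result, as a 32-bit integer register would.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


a = pack_int8x4([1, -2, 3, 4])
b = pack_int8x4([5, 6, -7, 8])
print(dp4a(a, b, 10))  # 1*5 - 2*6 - 3*7 + 4*8 + 10 = 14
```

Packing four 8-bit values into each 32-bit operand is what lets one instruction perform four multiply-adds, which is the source of the INT8 speedup for inference workloads.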
Tesla “Pascal” GPU Specifications

The table below summarizes the features of the available Tesla Pascal GPU Accelerators. To learn more about any of these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.

Comparison between “Kepler”, “Maxwell”, and “Pascal” GPU Architectures

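Feature | Kepler GK210 | Maxwell GM200 | Maxwell GM204 | Pascal GP100 | Pascal GP102
Compute Capability | 3.7 | 5.2 | 5.2 | 6.0 | 6.1
Threads per Warp | 32 | 32 | 32 | 32 | 32
Max Warps per SM | 64 | 64 | 64 | 64 | 64
Max Threads per SM | 2048 | 2048 | 2048 | 2048 | 2048
Max Thread Blocks per SM | 16 | 32 | 32 | 32 | 32
Max Concurrent Kernels | 32 | 32 | 32 | 128 | 32
32-bit Registers per SM | 128 K | 64 K | 64 K | 64 K | 64 K
Max Registers per Thread Block | 64 K | 64 K | 64 K | 64 K | 64 K
Max Registers per Thread | 255 | 255 | 255 | 255 | 255
Max Threads per Thread Block | 1024 | 1024 | 1024 | 1024 | 1024
L1 Cache Configuration | split with shared memory | 24KB dedicated L1 cache | 24KB dedicated L1 cache | 24KB dedicated L1 cache | 24KB dedicated L1 cache
Shared Memory Configuration | 16KB L1 + 112KB shared, 32KB L1 + 96KB shared, or 48KB L1 + 80KB shared (128KB total) | 96KB dedicated | 96KB dedicated | 64KB dedicated | 96KB dedicated
Max Shared Memory per Thread Block | 48KB | 48KB | 48KB | 48KB | 48KB
Max X Grid Dimension | 2^31-1 | 2^31-1 | 2^31-1 | 2^31-1 | 2^31-1
Dynamic Parallelism | Yes | Yes | Yes | Yes | Yes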

For a complete listing of compute capabilities, refer to the NVIDIA CUDA documentation.
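
The per-SM limits in the table determine how many warps a kernel can keep resident on each SM (its theoretical occupancy). The sketch below applies the Pascal columns of the table, assuming a kernel's threads per block, registers per thread, and shared memory per block are known. It deliberately ignores NVIDIA's register and shared-memory allocation granularity, so real occupancy can be lower; `SMLimits` and `occupancy` are hypothetical names. NVIDIA's occupancy calculator and the CUDA runtime call `cudaOccupancyMaxActiveBlocksPerMultiprocessor` account for those details.

```python
from dataclasses import dataclass

WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024
MAX_REGS_PER_THREAD = 255
MAX_SMEM_PER_BLOCK = 48 * 1024


@dataclass(frozen=True)
class SMLimits:
    max_warps: int         # resident warps per SM
    max_blocks: int        # resident thread blocks per SM
    registers: int         # 32-bit registers per SM
    shared_mem_bytes: int  # dedicated shared memory per SM


# Per-SM limits for the two Pascal columns of the table
GP100 = SMLimits(max_warps=64, max_blocks=32, registers=64 * 1024, shared_mem_bytes=64 * 1024)
GP102 = SMLimits(max_warps=64, max_blocks=32, registers=64 * 1024, shared_mem_bytes=96 * 1024)


def occupancy(sm: SMLimits, threads_per_block: int, regs_per_thread: int, smem_per_block: int) -> float:
    """Fraction of the SM's warp slots that can be filled (0.0 to 1.0).

    Simplified model: allocation granularity is ignored (an assumption).
    A result of 0.0 means the block does not fit on one SM at all.
    """
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 1024")
    if not 1 <= regs_per_thread <= MAX_REGS_PER_THREAD:
        raise ValueError("regs_per_thread must be between 1 and 255")
    if not 0 <= smem_per_block <= MAX_SMEM_PER_BLOCK:
        raise ValueError("smem_per_block exceeds the 48KB per-block limit")

    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * WARP_SIZE
    limits = [
        sm.max_blocks,                    # thread-block slot limit
        sm.max_warps // warps_per_block,  # warp slot limit
        sm.registers // regs_per_block,   # register file limit
    ]
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)  # shared memory limit
    return min(limits) * warps_per_block / sm.max_warps
```

For example, 256-thread blocks using 64 registers per thread are register-limited to 4 resident blocks per GP100 SM, so `occupancy(GP100, 256, 64, 0)` returns 0.5; at 32 registers per thread the same blocks reach full occupancy. With 128-thread blocks that each use 16KB of shared memory, GP102's larger 96KB shared memory fits 6 blocks per SM versus 4 on GP100 (0.375 versus 0.25).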

Additional Tesla “Pascal” GPU products

NVIDIA has also released Tesla P4 GPUs. These GPUs are primarily for embedded and hyperscale deployments, and are not expected to be used in the HPC space.

Hardware-accelerated video encoding and decoding

All NVIDIA “Pascal” GPUs include one or more hardware units for video encoding and decoding (NVENC / NVDEC). For complete hardware details, reference NVIDIA’s encoder/decoder support matrix. To learn more about GPU-accelerated video encode/decode, see NVIDIA’s Video Codec SDK.
