
In-Depth Comparison of NVIDIA Tesla “Volta” GPU Accelerators

This article provides in-depth details of the NVIDIA Tesla V-series GPU accelerators (codenamed “Volta”). “Volta” GPUs improve upon the previous-generation “Pascal” architecture. Volta GPUs began shipping in September 2017 and were updated to 32GB of memory in March 2018; Tesla V100S was released in late 2019. Note: these have since been superseded by the NVIDIA Ampere GPU architecture.

This page is intended to be a fast and easy reference of key specs for these GPUs. You may wish to browse our Tesla V100 Price Analysis and Tesla V100 GPU Review for more extended discussion.

Important features available in the “Volta” GPU architecture include:

  • Exceptional HPC performance with up to 8.2 TFLOPS double- and 16.4 TFLOPS single-precision floating-point performance.
  • Deep Learning training performance with up to 130 TFLOPS FP16 half-precision floating-point performance.
  • Deep Learning inference performance with up to 62.8 TeraOPS INT8 8-bit integer performance.
  • Simultaneous execution of FP32 and INT32 operations improves the overall computational throughput of the GPU.
  • NVLink enables an 8X to 10X increase in bandwidth between the Tesla GPUs and from GPUs to supported system CPUs (compared with PCI-E).
  • High-bandwidth HBM2 memory provides a 3X improvement in memory performance compared to previous-generation GPUs.
  • Enhanced Unified Memory allows GPU applications to directly access the memory of all GPUs as well as all of system memory (up to 512TB).
  • Native ECC Memory detects and corrects memory errors without any capacity or performance overhead.
  • Combined L1 Cache and Shared Memory provides additional flexibility and higher performance than Pascal.
  • Cooperative Groups, a programming model introduced in CUDA 9, organizes groups of communicating threads.
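The peak throughput figures in the list above can be sanity-checked with simple arithmetic: units × operations per clock × clock rate. The sketch below is illustrative only. The unit counts and boost clock are assumptions taken from commonly published Tesla V100S figures (5120 FP32 cores, 2560 FP64 units, 640 Tensor Cores, 1601 MHz boost) and are not stated in this article; `peak_tflops` is a hypothetical helper name.

```python
# Rough peak-throughput arithmetic for a Volta-class GPU.
# ASSUMPTION: the constants below are commonly published Tesla V100S
# figures; they do not come from this article.
CUDA_CORES_FP32 = 5120      # FP32 lanes across the whole GPU
FP64_UNITS = 2560           # FP64 lanes (half the FP32 count)
TENSOR_CORES = 640          # Tensor Cores across the whole GPU
BOOST_CLOCK_HZ = 1.601e9    # boost clock in Hz


def peak_tflops(units: int, flops_per_unit_per_clock: int, clock_hz: float) -> float:
    """Return theoretical peak throughput in TFLOPS."""
    return units * flops_per_unit_per_clock * clock_hz / 1e12


# A fused multiply-add counts as 2 floating-point operations per clock.
fp32 = peak_tflops(CUDA_CORES_FP32, 2, BOOST_CLOCK_HZ)
fp64 = peak_tflops(FP64_UNITS, 2, BOOST_CLOCK_HZ)

# Each Tensor Core is assumed to perform a 4x4x4 matrix FMA per clock:
# 64 multiply-adds, i.e. 128 floating-point operations.
tensor_fp16 = peak_tflops(TENSOR_CORES, 64 * 2, BOOST_CLOCK_HZ)

if __name__ == "__main__":
    print(f"FP32 peak:        {fp32:.1f} TFLOPS")
    print(f"FP64 peak:        {fp64:.1f} TFLOPS")
    print(f"FP16 Tensor peak: {tensor_fp16:.1f} TFLOPS")
```

Under these assumptions the results land close to the 16.4, 8.2, and 130 TFLOPS quoted above. These are theoretical ceilings; sustained application performance is normally lower.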

Tesla “Volta” GPU Specifications

The table below summarizes the features of the available Tesla Volta GPU Accelerators. To learn more about these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.

Comparison between “Kepler”, “Pascal”, and “Volta” GPU Architectures

| Feature | Kepler GK210 | Pascal GP100 | Volta GV100 |
|---|---|---|---|
| Compute Capability ^ | 3.7 | 6.0 | 7.0 |
| Threads per Warp | 32 | 32 | 32 |
| Max Warps per SM | 64 | 64 | 64 |
| Max Threads per SM | 2048 | 2048 | 2048 |
| Max Thread Blocks per SM | 16 | 32 | 32 |
| Max Concurrent Kernels | 32 | 128 | 128 |
| 32-bit Registers per SM | 128 K | 64 K | 64 K |
| Max Registers per Thread Block | 64 K | 64 K | 64 K |
| Max Registers per Thread | 255 | 255 | 255 |
| Max Threads per Thread Block | 1024 | 1024 | 1024 |
| L1 Cache Configuration | split with shared memory | 24KB dedicated L1 cache | 32KB to 128KB (dynamic with shared memory) |
| Shared Memory Configurations | 16KB + 112KB L1 Cache, 32KB + 96KB L1 Cache, or 48KB + 80KB L1 Cache (128KB total) | 64KB | configurable up to 96KB; remainder for L1 Cache (128KB total) |
| Max Shared Memory per Thread Block | 48KB | 48KB | 96KB* |
| Max X Grid Dimension | 2^31-1 | 2^31-1 | 2^31-1 |
| Hyper-Q | Yes | Yes | Yes |
| Dynamic Parallelism | Yes | Yes | Yes |
| Unified Memory | No | Yes | Yes |
| Pre-Emption | No | Yes | Yes |

^ For a complete listing of Compute Capabilities, reference the NVIDIA CUDA Documentation
* above 48 KB requires dynamic shared memory
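
The per-SM limits in the comparison table above determine how many thread blocks can be resident on one SM at a time. The following is a minimal sketch of that calculation using the Volta GV100 column (64 warps, 32 thread blocks, 64 K registers, and up to 96KB of shared memory per SM). It is a simplification: real hardware allocates registers and shared memory in fixed-size chunks, which this sketch ignores, so treat the result as an upper bound. The names `SMLimits`, `resident_blocks`, and `occupancy` are hypothetical and are not part of any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM limits, defaulting to the Volta GV100 column of the table."""
    threads_per_warp: int = 32
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 64 * 1024          # 32-bit registers per SM
    shared_mem_bytes: int = 96 * 1024   # maximum shared memory carve-out
    max_threads_per_block: int = 1024
    max_regs_per_thread: int = 255


def resident_blocks(threads_per_block: int, regs_per_thread: int,
                    smem_per_block: int, sm: SMLimits = SMLimits()) -> int:
    """Upper bound on thread blocks resident on one SM (no allocation granularity)."""
    if not 0 < threads_per_block <= sm.max_threads_per_block:
        raise ValueError("threads_per_block out of range")
    if not 0 < regs_per_thread <= sm.max_regs_per_thread:
        raise ValueError("regs_per_thread out of range")
    if not 0 <= smem_per_block <= sm.shared_mem_bytes:
        raise ValueError("smem_per_block out of range")

    # Threads are scheduled in whole warps, so round up.
    warps_per_block = -(-threads_per_block // sm.threads_per_warp)
    by_warps = sm.max_warps // warps_per_block
    by_regs = sm.registers // (regs_per_thread * warps_per_block * sm.threads_per_warp)
    # A block that uses no shared memory is not limited by it.
    by_smem = sm.shared_mem_bytes // smem_per_block if smem_per_block else sm.max_blocks
    return min(sm.max_blocks, by_warps, by_regs, by_smem)


def occupancy(threads_per_block: int, regs_per_thread: int,
              smem_per_block: int, sm: SMLimits = SMLimits()) -> float:
    """Fraction of the SM's warp slots filled by resident blocks."""
    blocks = resident_blocks(threads_per_block, regs_per_thread, smem_per_block, sm)
    warps_per_block = -(-threads_per_block // sm.threads_per_warp)
    return blocks * warps_per_block / sm.max_warps


if __name__ == "__main__":
    for regs, smem in [(32, 0), (64, 0), (32, 48 * 1024)]:
        print(regs, smem, occupancy(256, regs, smem))
```

With 256-thread blocks, doubling register use from 32 to 64 per thread halves the number of resident blocks in this model, and requesting 48KB of shared memory per block limits the SM to two blocks. For production tuning, the occupancy tools shipped with the CUDA toolkit account for the allocation details omitted here.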

Hardware-accelerated video encoding and decoding

All NVIDIA “Volta” GPUs include one or more hardware units for video encoding and decoding (NVENC / NVDEC). For complete hardware details, reference NVIDIA’s encoder/decoder support matrix. To learn more about GPU-accelerated video encode/decode, see NVIDIA’s Video Codec SDK.
