Revision for “In-Depth Comparison of NVIDIA Tesla “Volta” GPU Accelerators” created on March 9, 2022 @ 12:08:33
In-Depth Comparison of NVIDIA Tesla "Volta" GPU Accelerators
<em>This article provides in-depth details of the NVIDIA Tesla V-series GPU accelerators (codenamed "Volta"). "Volta" GPUs improve upon the previous-generation <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-pascal-gpu-accelerators/" rel="noopener noreferrer" target="_blank">"Pascal"</a> architecture. Volta GPUs began shipping in September 2017, a 32GB memory option was added in March 2018, and the Tesla V100S was released in late 2019. <strong>Note: these have since been superseded by the <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-ampere-gpu-accelerators/" rel="noopener noreferrer" target="_blank">NVIDIA Ampere GPU architecture</a>.</strong>
This page is intended as a quick reference for the key specifications of these GPUs. You may wish to browse our <a href="https://www.microway.com/hpc-tech-tips/nvidia-tesla-v100-price-analysis/" target="_blank" rel="noopener noreferrer">Tesla V100 Price Analysis</a> and <a href="https://www.microway.com/hpc-tech-tips/tesla-v100-volta-gpu-review/" target="_blank" rel="noopener noreferrer">Tesla V100 GPU Review</a> for a more extended discussion.</em> <h2>Important features available in the "Volta" GPU architecture include:</h2> <h2>Tesla "Volta" GPU Specifications</h2>
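The throughput rows marked with an asterisk in the table below are theoretical peaks, and they can be reproduced from the unit counts and the boost clock. The Python sketch that follows assumes one fused multiply-add (2 FLOPs) per FP32 or FP64 core per clock, 64 FMAs per Tensor Core per clock, and an FP16 rate of twice the FP32 rate; the function and constant names are illustrative only and are not part of any NVIDIA tool.

```python
# Sketch: reproduce the theoretical peak figures in the Tesla V100 SXM2 column
# from unit counts and the published 1530 MHz boost clock.
# Assumptions: each FP32/FP64 core retires one fused multiply-add (2 FLOPs)
# per clock; each Tensor Core retires 64 FMAs (a 4x4x4 matrix FMA) per clock.

FP32_CORES = 5120
FP64_CORES = 2560
TENSOR_CORES = 640
FLOPS_PER_FMA = 2
TENSOR_FMAS_PER_CLOCK = 64


def peak_tflops(units: int, flops_per_unit_per_clock: int, clock_mhz: float) -> float:
    """Return theoretical peak throughput in TFLOPS."""
    return units * flops_per_unit_per_clock * clock_mhz * 1e6 / 1e12


def v100_peaks(boost_mhz: float) -> dict:
    """Peak TFLOPS per math type for a full GV100 at the given clock (hypothetical helper)."""
    fp32 = peak_tflops(FP32_CORES, FLOPS_PER_FMA, boost_mhz)
    return {
        "fp64": peak_tflops(FP64_CORES, FLOPS_PER_FMA, boost_mhz),
        "fp32": fp32,
        # Assumed: non-Tensor FP16 math runs at twice the FP32 rate
        "fp16": 2 * fp32,
        "tensor": peak_tflops(TENSOR_CORES, TENSOR_FMAS_PER_CLOCK * FLOPS_PER_FMA, boost_mhz),
    }


if __name__ == "__main__":
    for name, value in v100_peaks(1530).items():
        print(f"{name:>6}: {value:6.1f} TFLOPS")
```

With the SXM2 boost clock of 1530 MHz it prints about 7.8, 15.7, 31.3 and 125.3 TFLOPS for FP64, FP32, FP16 and Tensor Core math, which match the listed values to within rounding. Passing the PCI-E card's ~1380 MHz clock instead gives about 14.1 TFLOPS FP32, close to the listed 14.0 TFLOPS.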
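The theoretical transfer bandwidths in the interconnect rows of the table below are products of link count and per-link rate. This sketch assumes 25 GB/s per direction for each NVLink 2.0 brick, and 8 GT/s per lane with 128b/130b encoding for PCI-Express 3.0; the helper names are hypothetical.

```python
# Sketch: theoretical link bandwidths behind the Interconnect rows.
# Assumptions: NVLink 2.0 carries 25 GB/s per direction per brick;
# PCI-Express 3.0 runs 8 GT/s per lane with 128b/130b line encoding.

NVLINK2_GBPS_PER_BRICK_PER_DIRECTION = 25.0
PCIE3_GT_PER_LANE = 8.0      # gigatransfers/s per lane, one bit per transfer
PCIE3_ENCODING = 128 / 130   # fraction of raw bits that carry payload


def nvlink2_bidirectional(bricks: int) -> float:
    """GB/s summed over both directions for the given number of bricks."""
    return bricks * NVLINK2_GBPS_PER_BRICK_PER_DIRECTION * 2


def pcie3_per_direction(lanes: int = 16) -> float:
    """GB/s in one direction after encoding overhead."""
    return lanes * PCIE3_GT_PER_LANE * PCIE3_ENCODING / 8  # bits -> bytes


if __name__ == "__main__":
    print(f"V100 SXM2 NVLink (6 bricks): {nvlink2_bidirectional(6):.0f} GB/s")
    print(f"Quadro GV100 NVLink (4 bricks): {nvlink2_bidirectional(4):.0f} GB/s")
    x16 = pcie3_per_direction()
    print(f"PCI-E 3.0 x16: {x16:.2f} GB/s per direction, {2 * x16:.1f} GB/s bidirectional")
```

It yields 300 GB/s for six bricks, 200 GB/s for four, and about 15.75 GB/s per direction (31.5 GB/s bidirectional) for a PCI-E 3.0 x16 slot; the table's 32 GB/s appears to be the raw rate before encoding overhead. The achievable figures in the table are measurements and cannot be derived this way.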
<table>
<thead>
<tr> <th>Feature</th> <th>Tesla V100 SXM2 16GB/32GB</th> <th>Tesla V100 PCI-E 16GB/32GB</th> <th>Tesla V100S PCI-E 32GB</th> <th>Quadro GV100 32GB</th> </tr>
</thead>
<tbody>
<tr><td class="rowhead">GPU Chip(s)</td><td colspan=4>Volta GV100</td></tr>
<tr><td class="rowhead">Tensor Core Performance (mixed precision)*</td><td>125 TFLOPS</td><td>112 TFLOPS</td><td>130 TFLOPS</td><td>118.5 TFLOPS</td></tr>
<tr><td class="rowhead">Integer Operations (INT8)*</td><td>62.8 TOPS</td><td>56.0 TOPS</td><td>65 TOPS</td><td>59.3 TOPS</td></tr>
<tr><td class="rowhead">Half Precision (FP16)*</td><td>31.4 TFLOPS</td><td>28 TFLOPS</td><td>32.8 TFLOPS</td><td>29.6 TFLOPS</td></tr>
<tr><td class="rowhead">Single Precision (FP32)*</td><td>15.7 TFLOPS</td><td>14.0 TFLOPS</td><td>16.4 TFLOPS</td><td>14.8 TFLOPS</td></tr>
<tr><td class="rowhead">Double Precision (FP64)*</td><td>7.8 TFLOPS</td><td>7.0 TFLOPS</td><td>8.2 TFLOPS</td><td>7.4 TFLOPS</td></tr>
<tr><td class="rowhead">On-die HBM2 Memory</td><td colspan=2>16GB or 32GB</td><td colspan=2>32GB</td></tr>
<tr><td class="rowhead">Memory Bandwidth</td><td colspan=2>900 GB/s</td><td>1,134 GB/s</td><td>870 GB/s</td></tr>
<tr><td class="rowhead">L2 Cache</td><td colspan=4>6 MB</td></tr>
<tr><td class="rowhead">Interconnect</td><td>NVLink 2.0 (6 bricks) + PCI-E 3.0</td><td colspan=2>PCI-Express 3.0</td><td>NVLink 2.0 (4 bricks) + PCI-E 3.0</td></tr>
<tr><td class="rowhead">Theoretical transfer bandwidth (bidirectional)</td><td>300 GB/s</td><td colspan=2>32 GB/s</td><td>200 GB/s</td></tr>
<tr><td class="rowhead">Achievable transfer bandwidth</td><td>143.5 GB/s</td><td colspan=3>~12 GB/s</td></tr>
<tr><td class="rowhead"># of SM Units</td><td colspan=4>80</td></tr>
<tr><td class="rowhead"># of Tensor Cores</td><td colspan=4>640</td></tr>
<tr><td class="rowhead"># of integer INT32 CUDA Cores</td><td colspan=4>5120</td></tr>
<tr><td class="rowhead"># of single-precision FP32 CUDA Cores</td><td colspan=4>5120</td></tr>
<tr><td class="rowhead"># of double-precision FP64 CUDA Cores</td><td colspan=4>2560</td></tr>
<tr><td class="rowhead">GPU Base Clock</td><td>not published</td><td>1245 MHz</td><td colspan=2>not published</td></tr>
<tr><td class="rowhead">GPU Boost Support</td><td colspan=4>Yes – Dynamic</td></tr>
<tr><td class="rowhead">GPU Boost Clock</td><td>1530 MHz</td><td>~1380 MHz</td><td colspan=2>TBM</td></tr>
<tr><td class="rowhead">Compute Capability</td><td colspan=4>7.0</td></tr>
<tr><td class="rowhead">Workstation Support</td><td colspan=3>-</td><td>yes</td></tr>
<tr><td class="rowhead">Server Support</td><td colspan=3>yes</td><td>specific server models only</td></tr>
<tr><td class="rowhead">Cooling Type</td><td colspan=3>Passive</td><td>Active</td></tr>
<tr><td class="rowhead">Wattage (TDP)</td><td>300W</td><td colspan=3>250W</td></tr>
</tbody>
</table>
<em>* theoretical peak performance with GPU Boost enabled</em>
<h2>Comparison between "Kepler", "Pascal", and "Volta" GPU Architectures</h2>
<h2>Hardware-accelerated video encoding and decoding</h2>