In-Depth Comparison of NVIDIA Tesla "Pascal" GPU Accelerators
<em>This article provides in-depth details of the NVIDIA Tesla P-series GPU accelerators (codenamed "Pascal"). "Pascal" GPUs improve upon the previous-generation <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-kepler-gpu-accelerators/" target="_blank" rel="noopener noreferrer">"Kepler"</a> and <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-maxwell-gpu-accelerators/" target="_blank" rel="noopener noreferrer">"Maxwell"</a> architectures. Pascal GPUs were announced at GTC 2016 and began shipping in September 2016. <strong>Note: these have since been superseded by the <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-volta-gpu-accelerators/" rel="noopener noreferrer" target="_blank">NVIDIA Volta GPU architecture</a>.</strong></em>
<h2>Important changes available in the "Pascal" GPU architecture include:</h2>
<ul>
<li>NVLink GPU interconnect on the SXM2 module, with several times the transfer bandwidth of PCI-Express 3.0</li>
<li>On-die HBM2 memory providing up to 732 GB/s of memory bandwidth on GP100</li>
<li>Native half-precision (FP16) arithmetic on GP100, running at twice the single-precision rate (see the sketch below)</li>
<li>INT8 integer operations on GP102 (Tesla P40)</li>
<li>Compute capability 6.0 (GP100) and 6.1 (GP102)</li>
</ul>
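GP100's packed FP16 throughput is exposed through the half2 intrinsics in CUDA's cuda_fp16.h header. The kernel below is a minimal sketch of a half-precision AXPY (the <code>haxpy</code> name and the buffer sizes are placeholder choices, not from NVIDIA documentation); it assumes CUDA 8.0 or later and must be compiled for the sm_60 architecture (<code>nvcc -arch=sm_60</code>). Each __hfma2 instruction performs two fused multiply-adds, which is why GP100's FP16 rate is double its FP32 rate.
<pre>
// Minimal sketch: half-precision AXPY using packed half2 arithmetic (sm_60+)
#include <cstdio>
#include <cuda_fp16.h>

// y[i] = a * x[i] + y[i], with two FP16 values packed into each half2 element
__global__ void haxpy(int n2, float a, const __half2 *x, __half2 *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __half2 a2 = __float2half2_rn(a);     // broadcast a into both FP16 lanes
    if (i < n2)
        y[i] = __hfma2(a2, x[i], y[i]);   // fused multiply-add on a half2 pair
}

int main() {
    const int n2 = 1 << 20;               // number of half2 elements (2M FP16 values)
    __half2 *x, *y;
    cudaMalloc(&x, n2 * sizeof(__half2));
    cudaMalloc(&y, n2 * sizeof(__half2));
    cudaMemset(x, 0, n2 * sizeof(__half2));
    cudaMemset(y, 0, n2 * sizeof(__half2));

    haxpy<<<(n2 + 255) / 256, 256>>>(n2, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("half2 AXPY launched on %d FP16 pairs\n", n2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
</pre>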
<h2>Tesla "Pascal" GPU Specifications</h2>
<h3>HPC-optimized "Pascal" GPUs</h3>
<table>
<thead>
<tr> <th>Feature</th> <th>Tesla P100 SXM2 16GB</th> <th>Tesla P100 PCI-E 16GB</th> <th>Tesla P100 PCI-E 12GB</th> </tr>
</thead>
<tbody>
<tr><td class="rowhead">GPU Chip(s)</td><td colspan="3">Pascal GP100</td></tr>
<tr><td class="rowhead">Integer Operations (INT8)*</td><td colspan="3">-</td></tr>
<tr><td class="rowhead">Half Precision (FP16)*</td><td>21.2 TFLOPS</td><td colspan="2">18.7 TFLOPS</td></tr>
<tr><td class="rowhead">Single Precision (FP32)*</td><td>10.6 TFLOPS</td><td colspan="2">9.3 TFLOPS</td></tr>
<tr><td class="rowhead">Double Precision (FP64)*</td><td>5.3 TFLOPS</td><td colspan="2">4.7 TFLOPS</td></tr>
<tr><td class="rowhead">On-die HBM2 Memory</td><td colspan="2">16GB</td><td>12GB</td></tr>
<tr><td class="rowhead">Memory Bandwidth</td><td colspan="2">732 GB/s</td><td>549 GB/s</td></tr>
<tr><td class="rowhead">L2 Cache</td><td colspan="3">4 MB</td></tr>
<tr><td class="rowhead">Interconnect</td><td>NVLink + PCI-E 3.0</td><td colspan="2">PCI-Express 3.0</td></tr>
<tr><td class="rowhead">Theoretical transfer bandwidth</td><td>80 GB/s</td><td colspan="2">16 GB/s</td></tr>
<tr><td class="rowhead">Achievable transfer bandwidth</td><td>~66 GB/s</td><td colspan="2">~12 GB/s</td></tr>
<tr><td class="rowhead"># of SM Units</td><td colspan="3">56</td></tr>
<tr><td class="rowhead"># of single-precision CUDA Cores</td><td colspan="3">3584</td></tr>
<tr><td class="rowhead"># of double-precision CUDA Cores</td><td colspan="3">1792</td></tr>
<tr><td class="rowhead">GPU Base Clock</td><td>1328 MHz</td><td colspan="2">1126 MHz</td></tr>
<tr><td class="rowhead">GPU Boost Support</td><td colspan="3">Yes – Dynamic</td></tr>
<tr><td class="rowhead">GPU Boost Clock</td><td>1480 MHz</td><td colspan="2">1303 MHz</td></tr>
<tr><td class="rowhead">Compute Capability</td><td colspan="3">6.0</td></tr>
<tr><td class="rowhead">Workstation Support</td><td colspan="3">-</td></tr>
<tr><td class="rowhead">Server Support</td><td colspan="3">Yes</td></tr>
<tr><td class="rowhead">Wattage (TDP)</td><td>300W</td><td colspan="2">250W</td></tr>
</tbody>
</table>
<em>* Measured with GPU Boost enabled</em>
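The gap between the theoretical and achievable transfer bandwidth rows above can be measured directly. The following is a minimal sketch that times repeated pinned host-to-device copies with CUDA events; the buffer size and iteration count are arbitrary choices for illustration. On a PCI-Express 3.0 x16 connection the result typically lands near the ~12 GB/s figure listed above.
<pre>
// Minimal sketch: measure host-to-device copy bandwidth with CUDA events
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256ull << 20;   // 256 MiB test buffer
    const int iterations = 20;

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);       // pinned host memory (needed for peak rates)
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (double)bytes * iterations / (ms / 1000.0) / 1e9;
    printf("Host-to-device bandwidth: %.1f GB/s\n", gbps);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
</pre>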
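Specifications such as compute capability, SM count, memory size, and clock rates can also be read back at runtime with cudaGetDeviceProperties(). A short sketch, useful for confirming which Tesla models are installed in a system:
<pre>
// Minimal sketch: print the device properties that correspond to the spec tables
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        // clockRate is reported in kHz; totalGlobalMem in bytes
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("  Global memory: %.1f GiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  L2 cache: %d KB\n", prop.l2CacheSize / 1024);
        printf("  GPU clock: %d MHz\n", prop.clockRate / 1000);
        printf("  Memory bus width: %d-bit\n", prop.memoryBusWidth);
    }
    return 0;
}
</pre>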
<h3>Deep Learning-optimized "Pascal" GPUs</h3>
<table>
<thead>
<tr> <th>Feature</th> <th>Tesla P40 PCI-E 24GB</th> </tr>
</thead>
<tbody>
<tr><td class="rowhead">GPU Chip(s)</td><td>Pascal GP102</td></tr>
<tr><td class="rowhead">Integer Operations (INT8)*</td><td>47 TOPS</td></tr>
<tr><td class="rowhead">Half Precision (FP16)*</td><td>-</td></tr>
<tr><td class="rowhead">Single Precision (FP32)*</td><td>12 TFLOPS</td></tr>
<tr><td class="rowhead">Double Precision (FP64)*</td><td>-</td></tr>
<tr><td class="rowhead">Onboard GDDR5 Memory</td><td>24GB</td></tr>
<tr><td class="rowhead">Memory Bandwidth</td><td>346 GB/s</td></tr>
<tr><td class="rowhead">L2 Cache</td><td>3 MB</td></tr>
<tr><td class="rowhead">Interconnect</td><td>PCI-Express 3.0</td></tr>
<tr><td class="rowhead">Theoretical transfer bandwidth</td><td>16 GB/s</td></tr>
<tr><td class="rowhead">Achievable transfer bandwidth</td><td>~12 GB/s</td></tr>
<tr><td class="rowhead"># of SM Units</td><td>30</td></tr>
<tr><td class="rowhead"># of single-precision CUDA Cores</td><td>3840</td></tr>
<tr><td class="rowhead">GPU Base Clock</td><td>1303 MHz</td></tr>
<tr><td class="rowhead">GPU Boost Support</td><td>Yes – Dynamic</td></tr>
<tr><td class="rowhead">GPU Boost Clock</td><td>1531 MHz</td></tr>
<tr><td class="rowhead">Compute Capability</td><td>6.1</td></tr>
<tr><td class="rowhead">Workstation Support</td><td>-</td></tr>
<tr><td class="rowhead">Server Support</td><td>Yes</td></tr>
<tr><td class="rowhead">Wattage (TDP)</td><td>250W</td></tr>
</tbody>
</table>
<em>* Measured with GPU Boost enabled</em>
<h2>Comparison between "Kepler", "Maxwell", and "Pascal" GPU Architectures</h2>
<h2>Additional Tesla "Pascal" GPU products</h2>
<h2>Hardware-accelerated video encoding and decoding</h2>