In-Depth Comparison of NVIDIA Quadro “Turing” GPU Accelerators

<em>This article provides in-depth details of the NVIDIA Quadro RTX "Turing" GPUs. The details come from <a href="https://www.youtube.com/watch?v=gTbWVt2_OWc" target="_blank" rel="noopener noreferrer">NVIDIA’s launch presentation</a> and <a href="https://nvidianews.nvidia.com/news/nvidia-unveils-quadro-rtx-worlds-first-ray-tracing-gpu" target="_blank" rel="noopener noreferrer">press materials</a> from SIGGRAPH 2018, and they are evolving as NVIDIA releases more information on the GPUs. NVIDIA "Turing" GPUs bring an evolved core architecture and add dedicated ray tracing units to the previous-generation <a href="https://www.microway.com/knowledge-center-articles/in-depth-comparison-of-nvidia-tesla-volta-gpu-accelerators/" target="_blank" rel="noopener noreferrer">"Volta"</a> architecture. Turing GPUs will begin shipping in 4Q 2018. <a href="https://www.microway.com/contact/">Contact us</a> about system availability with these GPUs.</em>
<h2>Important features available in the "Turing" GPU architecture include:</h2>
<ul>
<li><strong>New RT (Ray Tracing) Cores</strong> delivering real-time ray tracing performance for the first time</li>
<li><strong>Evolved Deep Learning performance</strong> with over 130 Tensor TFLOPS (training) and 500 TOPS INT4 (inference) throughput; a brief Tensor Core sketch follows this list</li>
<li><strong>NVLink 2.0</strong> between GPUs—when optional NVLink bridges are added—supporting up to 2 bricks and up to 100GB/sec bidirectional bandwidth</li>
<li><strong>New GDDR6 Memory</strong> with a substantial improvement in memory performance compared to previous-generation GPUs.</li>
</ul>
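The Tensor TFLOPS figures quoted above come from FP16 matrix multiply-accumulate running on the Tensor Cores. As a minimal sketch of how code reaches that hardware (not a tuned kernel; the 16x16 tile size and launch shape are chosen purely for illustration), the CUDA fragment below multiplies one FP16 tile with FP32 accumulation through the WMMA API:
<pre><code>// Minimal sketch: one warp multiplies a single 16x16 FP16 tile and
// accumulates in FP32 on the Tensor Cores (build with: nvcc -arch=sm_75)
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void tile_mma(const half *A, const half *B, float *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);     // start from a zero accumulator
    wmma::load_matrix_sync(a, A, 16);   // leading dimension = 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);     // acc = A*B + acc on the Tensor Cores
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
</code></pre>
Launched with a single warp (one block of 32 threads) over device buffers of 256 half values for A and B and 256 floats for C, this issues its math through the Tensor Cores; real workloads tile far larger matrices or simply hand FP16 data to cuBLAS/cuDNN, which use the same units.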
<h2>Quadro "Turing" GPU Specifications</h2>
The table below summarizes the features of the available Quadro Turing GPU Accelerators. To learn more about these products, or to find out how best to leverage their capabilities, please speak with an <a title="Talk to an Expert – Contact Microway" href="http://www.microway.com/contact/">HPC expert</a>.
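Several of the fields compared below (SM count, memory size, peak clock, compute capability) can also be read back at runtime once a card is installed. The short CUDA program that follows is a generic cudaGetDeviceProperties query, shown only as a convenience for verifying a GPU, not anything specific to these models:
<pre><code>// Minimal sketch: report a few of the fields compared in the table below
// for whichever GPU is enumerated as device 0.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No CUDA device found\n");
        return 1;
    }
    std::printf("GPU:                 %s\n",    prop.name);
    std::printf("SM units:            %d\n",    prop.multiProcessorCount);
    std::printf("Memory:              %.1f GB\n", prop.totalGlobalMem / 1e9);
    std::printf("Peak clock:          %.0f MHz\n", prop.clockRate / 1000.0);
    std::printf("Compute capability:  %d.%d\n", prop.major, prop.minor);
    return 0;
}
</code></pre>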

<table>
<thead>
<tr>
<th>Feature</th>
<th>Quadro RTX 8000</th>
<th>Quadro RTX 6000</th>
<th>Quadro RTX 5000</th>
<th>Quadro RTX 4000</th>
</tr>
</thead>
<tbody>
<tr>
<td class="rowhead">GPU Chip(s)</td>
<td colspan="2">Turing, TU102</td>
<td>Turing, TU104</td>
<td>Turing, TU106</td>
</tr>
<tr>
<td class="rowhead">TensorFLOPS</td>
<td colspan="2">130.5 Tensor TFLOPS*</td>
<td>89.2 Tensor TFLOPS*</td>
<td>57.0 Tensor TFLOPS*</td>
</tr>
<tr>
<td class="rowhead">Integer Operations (INT4)</td>
<td colspan="2">522 TOPS*</td>
<td>356.8 TOPS*</td>
<td>Unknown</td>
</tr>
<tr>
<td class="rowhead">Integer Operations (INT8)</td>
<td colspan="2">261 TOPS*</td>
<td>178.4 TOPS*</td>
<td>Unknown</td>
</tr>
<tr>
<td class="rowhead">Half Precision (FP16)</td>
<td colspan="2">32.6 TFLOPS</td>
<td>22.3 TFLOPS</td>
<td>14.2 TFLOPS</td>
</tr>
<tr>
<td class="rowhead">Single Precision (FP32)</td>
<td colspan="2">16.3 TFLOPS*</td>
<td>11.2 TFLOPS*</td>
<td>7.1 TFLOPS*</td>
</tr>
<tr>
<td class="rowhead">Double Precision (FP64)</td>
<td colspan="2">.509 TFLOPS*</td>
<td>.350 TFLOPS*</td>
<td>.222 TFLOPS*</td>
</tr>
<tr>
<td class="rowhead">Ray Tracing</td>
<td colspan="2">10 GigaRays/s</td>
<td>8 GigaRays/sec</td>
<td>6 GigaRays/sec</td>
</tr>
<tr>
<td class="rowhead"># of CUDA Cores</td>
<td colspan="2">4608</td>
<td>3072</td>
<td>2304</td>
</tr>
<tr>
<td class="rowhead"># of Turing Tensor Cores</td>
<td colspan="2">576</td>
<td>384</td>
<td>288</td>
</tr>
<tr>
<td class="rowhead"># of SM Units</td>
<td colspan="2">72</td>
<td>48</td>
<td>36</td>
</tr>
<tr>
<td class="rowhead"># of RT Cores</td>
<td colspan="2">72</td>
<td>48</td>
<td>36</td>
</tr>
<tr>
<td class="rowhead">GPU Base Clock</td>
<td colspan="2">1455 MHz</td>
<td>1620 MHz</td>
<td>Unknown</td>
</tr>
<tr>
<td class="rowhead">GPU Boost Clock</td>
<td colspan="2">1770 MHz</td>
<td>1815 MHz</td>
<td>Unknown</td>
</tr>
<tr>
<td class="rowhead">GDDR6 Memory</td>
<td>48GB</td>
<td>24GB</td>
<td>16GB</td>
<td>8GB</td>
</tr>
<tr>
<td class="rowhead">Memory Bandwidth</td>
<td colspan="2">672 GB/sec</td>
<td>448 GB/sec</td>
<td>416 GB/sec</td>
</tr>
<tr>
<td class="rowhead">Interconnect</td>
<td colspan="2">PCI-E 3.0 + optional NVLink 2.0 (2 bricks)</td>
<td>PCI-E 3.0 + optional NVLink 2.0 (1 brick)</td>
<td>PCI-E 3.0</td>
</tr>
<tr>
<td class="rowhead">Theoretical transfer bandwidth (bidirectional)</td>
<td colspan="2">100 GB/s NVLink<br> 32GB/s PCI-E x16 3.0</td>
<td>50 GB/s NVLink<br>32GB/s PCI-E x16 3.0</td>
<td>32GB/s PCI-E x16 3.0</td>
</tr>
<tr>
<td class="rowhead">Achievable transfer bandwidth</td>
<td colspan="3">TBC NVLink, ~12 GB/s PCI-E x16 3.0</td>
<td>~12 GB/s PCI-E x16 3.0</td>
</tr>
<tr>
<td class="rowhead">GPU Boost Support</td>
<td colspan="4">Yes – Dynamic</td>
</tr>

<!--

<tr>
<td class="rowhead">Compute Capability</td>
<td colspan=4>8.0</td>
</tr>
<tr>

-->
<tr>
<td class="rowhead">Workstation Support</td>
<td colspan="4">yes</td>
</tr>
<tr>
<td class="rowhead">Server Support</td>
<td colspan="4">specific server models only</td>
</tr>
<tr>
<td class="rowhead">Wattage (TDP)</td>
<td colspan="2">295W</td>
<td>265W</td>
<td>160W</td>
</tr>
<tr>
<td class="rowhead">Cooling Type</td>
<td colspan="4">Active</td>
</tr>
</tbody>
</table>

<em>* FLOPS and TOPS calculations are presented at Max Boost</em>
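The Interconnect and transfer-bandwidth rows above only matter once an application actually moves data GPU-to-GPU. As a rough sketch of that path (assuming two NVLink-bridged cards enumerated as devices 0 and 1, with error handling trimmed for brevity), the CUDA snippet below enables peer access in both directions and performs a direct peer-to-peer copy; with a bridge installed, the copy travels over NVLink rather than PCI-E:
<pre><code>// Minimal sketch: enable bidirectional peer access between devices 0 and 1
// and copy a buffer directly from one GPU to the other.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    if (!canAccess01 || !canAccess10) {
        std::printf("Peer access not available between devices 0 and 1\n");
        return 1;
    }

    // Peer access is enabled from the currently active device's context.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    const size_t bytes = 256 << 20;   // 256 MB test buffer
    void *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Direct GPU-to-GPU copy; uses NVLink when a bridge links the two cards.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    std::printf("Peer copy complete\n");
    return 0;
}
</code></pre>
Timing that copy with CUDA events, or running the p2pBandwidthLatencyTest sample that ships with the CUDA Toolkit, is the usual way to fill in the achievable NVLink figure listed as TBC above.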

<!--
<h2>Comparison between "Kepler", "Pascal", and "Volta" GPU Architectures</h2>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Kepler GK210</th>
<th>Pascal GP100</th>
<th>Volta GV100</th>
</tr>
</thead>
<tbody>
<tr>
<td class="rowhead">Compute Capability &Hat;</td>
<td>3.7</td>
<td>6.0</td>
<td>7.0</td>
</tr>
<tr>
<td class="rowhead">Threads per Warp</td>
<td colspan=3>32</td>
</tr>
<tr>
<td class="rowhead">Max Warps per SM</td>
<td colspan=3>64</td>
</tr>
<tr>
<td class="rowhead">Max Threads per SM</td>
<td colspan=3>2048</td>
</tr>
<tr>
<td class="rowhead">Max Thread Blocks per SM</td>
<td>16</td>
<td colspan=2>32</td>
</tr>
<tr>
<td class="rowhead">Max Concurrent Kernels</td>
<td>32</td>
<td colspan=2>128</td>
</tr>
<tr>
<td class="rowhead">32-bit Registers per SM</td>
<td>128 K</td>
<td colspan=2>64 K</td>
</tr>
<tr>
<td class="rowhead">Max Registers per Thread Block</td>
<td colspan=3>64 K</td>
</tr>
<tr>
<td class="rowhead">Max Registers per Thread</td>
<td colspan=3>255</td>
</tr>
<tr>
<td class="rowhead">Max Threads per Thread Block</td>
<td colspan=3>1024</td>
</tr>
<tr>
<td class="rowhead">L1 Cache Configuration</td>
<td>split with shared memory</td>
<td>24KB dedicated L1 cache</td>
<td>32KB ~ 128KB
(dynamic with shared memory)</td>
</tr>
<tr>
<td class="rowhead">Shared Memory Configurations</td>
<td>16KB + 112KB L1 Cache

32KB + 96KB L1 Cache

48KB + 80KB L1 Cache

<em>(128KB total)</em></td>
<td>64KB</td>
<td>configurable up to 96KB; remainder for L1 Cache

<em>(128KB total)</em></td>
</tr>
<tr>
<td class="rowhead">Max Shared Memory per Thread Block</td>
<td colspan=2>48KB</td>
<td>96KB*</td>
</tr>
<tr>
<td class="rowhead">Max X Grid Dimension</td>
<td colspan=3>2<sup>32-1</sup></td>
</tr>
<tr>
<td class="rowhead">Hyper-Q</td>
<td colspan=3>Yes</td>
</tr>
<tr>
<td class="rowhead">Dynamic Parallelism</td>
<td colspan=3>Yes</td>
</tr>
<tr>
<td class="rowhead">Unified Memory</td>
<td>No</td>
<td colspan=2>Yes</td>
</tr>
<tr>
<td class="rowhead">Pre-Emption</td>
<td>No</td>
<td colspan=2>Yes</td>
</tr>
</tbody>
</table>
<em>&Hat; For a complete listing of Compute Capabilities, reference the <a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications__technical-specifications-per-compute-capability" target="_blank" rel="noopener noreferrer">NVIDIA CUDA Documentation</a></em>
<em>* above 48 KB requires dynamic shared memory</em>
<h2>Hardware-accelerated video encoding and decoding</h2>
All NVIDIA "Volta" GPUs include one or more hardware units for video encoding and decoding (NVENC / NVDEC). For complete hardware details, reference NVIDIA’s <a href="https://developer.nvidia.com/video-encode-decode-gpu-support-matrix" target="_blank" rel="noopener noreferrer">encoder/decoder support matrix</a>. To learn more about GPU-accelerated video encode/decode, see NVIDIA’s <a href="https://developer.nvidia.com/nvidia-video-codec-sdk" target="_blank" rel="noopener noreferrer">Video Codec SDK</a>.

-->


