In-Depth Comparison of NVIDIA Quadro “Turing” GPU Accelerators

This article provides in-depth details of the NVIDIA Quadro RTX “Turing” GPUs. The details come from NVIDIA’s launch presentation and press materials from SIGGRAPH 2018, and they are evolving as NVIDIA releases more information on the GPUs. NVIDIA “Turing” GPUs bring an evolved core architecture and add dedicated ray tracing units to the previous-generation “Volta” architecture. Turing GPUs will begin shipping in 4Q 2018. Contact us about system availability with these GPUs.

Important features available in the “Turing” GPU architecture include:

  • New RT (ray tracing) Cores, the first dedicated hardware for real-time ray tracing
  • Evolved Deep Learning performance with over 130 Tensor TFLOPS (training) and 500 TOPS INT4 (inference) throughput
  • NVLink 2.0 between GPUs—when optional NVLink bridges are added—supporting up to 2 bricks and up to 100GB/sec bidirectional bandwidth
  • New GDDR6 Memory with a substantial improvement in memory performance compared to previous-generation GPUs
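
The NVLink figures quoted above follow from a per-brick rate: the specifications in this article list 50 GB/s bidirectional for one brick and 100 GB/s for two. The short Python sketch below reproduces that arithmetic. The constant and function names are illustrative only (they are not part of any NVIDIA API), and the linear scaling per brick is an assumption inferred from the article's own numbers.

```python
# Per-brick NVLink 2.0 rate inferred from this article's figures
# (1 brick -> 50 GB/s, 2 bricks -> 100 GB/s, both bidirectional).
# Assumption: aggregate bandwidth scales linearly with the number of bricks.
NVLINK2_GBPS_PER_BRICK = 50.0


def nvlink_bidirectional_gbps(bricks: int) -> float:
    """Theoretical bidirectional NVLink bandwidth in GB/s for `bricks` links."""
    if bricks < 0:
        raise ValueError("bricks must be non-negative")
    return bricks * NVLINK2_GBPS_PER_BRICK


for bricks in (1, 2):
    gbps = nvlink_bidirectional_gbps(bricks)
    print(f"{bricks} brick(s): {gbps:.0f} GB/s bidirectional")
```

These are theoretical peaks; achievable transfer rates depend on the system and workload.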

Quadro “Turing” GPU Specifications

The table below summarizes the features of the available Quadro Turing GPU Accelerators. To learn more about these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.

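| Feature | Quadro RTX 8000 | Quadro RTX 6000 | Quadro RTX 5000 | Quadro RTX 4000 |
|---|---|---|---|---|
| GPU Chip(s) | Turing, TU102 | Turing, TU102 | Turing, TU104 | Turing, TU106 |
| Tensor FLOPS | 130.5 Tensor TFLOPS* | 130.5 Tensor TFLOPS* | 89.2 Tensor TFLOPS* | 57.0 Tensor TFLOPS* |
| Integer Operations (INT4) | 522 TOPS* | 522 TOPS* | 356.8 TOPS* | Unknown |
| Integer Operations (INT8) | 261 TOPS* | 261 TOPS* | 178.4 TOPS* | Unknown |
| Half Precision (FP16) | 32.6 TFLOPS | 32.6 TFLOPS | 22.3 TFLOPS | 14.2 TFLOPS |
| Single Precision (FP32) | 16.3 TFLOPS* | 16.3 TFLOPS* | 11.2 TFLOPS* | 7.1 TFLOPS* |
| Double Precision (FP64) | 0.509 TFLOPS* | 0.509 TFLOPS* | 0.350 TFLOPS* | 0.222 TFLOPS* |
| Ray Tracing | 10 GigaRays/sec | 10 GigaRays/sec | 8 GigaRays/sec | 6 GigaRays/sec |
| # of CUDA Cores | 4608 | 4608 | 3072 | 2304 |
| # of Turing Tensor Cores | 576 | 576 | 384 | 288 |
| # of SM Units | 72 | 72 | 48 | 36 |
| # of RT Cores | 72 | 72 | 48 | 36 |
| GPU Base Clock | 1455 MHz | 1455 MHz | 1620 MHz | Unknown |
| GPU Boost Clock | 1770 MHz | 1770 MHz | 1815 MHz | Unknown |
| GDDR6 Memory | 48GB | 24GB | 16GB | 8GB |
| Memory Bandwidth | 672 GB/sec | 672 GB/sec | 448 GB/sec | 416 GB/sec |
| Interconnect | PCI-E 3.0 + optional NVLink 2.0 (2 bricks) | PCI-E 3.0 + optional NVLink 2.0 (2 bricks) | PCI-E 3.0 + optional NVLink 2.0 (1 brick) | PCI-E 3.0 |
| Theoretical transfer bandwidth (bidirectional) | 100 GB/s NVLink; 32 GB/s PCI-E x16 3.0 | 100 GB/s NVLink; 32 GB/s PCI-E x16 3.0 | 50 GB/s NVLink; 32 GB/s PCI-E x16 3.0 | 32 GB/s PCI-E x16 3.0 |
| Achievable transfer bandwidth | TBC NVLink; ~12 GB/s PCI-E x16 3.0 | TBC NVLink; ~12 GB/s PCI-E x16 3.0 | TBC NVLink; ~12 GB/s PCI-E x16 3.0 | ~12 GB/s PCI-E x16 3.0 |
| GPU Boost Support | Yes, dynamic | Yes, dynamic | Yes, dynamic | Yes, dynamic |
| Workstation Support | Yes | Yes | Yes | Yes |
| Server Support | Specific server models only | Specific server models only | Specific server models only | Specific server models only |
| Wattage (TDP) | 295W | 295W | 265W | 160W |
| Cooling Type | Active | Active | Active | Active |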

* FLOPS and TOPS figures are calculated at the maximum GPU Boost clock
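
As a rough check on how the starred figures relate to core counts and boost clocks, the Python sketch below recomputes two rows of the table. It assumes each CUDA core completes one fused multiply-add (2 FLOPs) per clock and each Turing Tensor Core completes 64 FP16 fused multiply-adds (128 FLOPs) per clock; these per-clock rates are assumptions chosen because they reproduce the published numbers, and the function names are illustrative.

```python
def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per CUDA core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


def peak_tensor_tflops(tensor_cores: int, boost_mhz: float) -> float:
    """Peak FP16 Tensor TFLOPS, assuming 64 FMAs (128 FLOPs) per Tensor Core per clock."""
    return tensor_cores * 64 * 2 * boost_mhz * 1e6 / 1e12


# (name, CUDA cores, Tensor Cores, boost clock in MHz), taken from the table above.
# The Quadro RTX 4000 is omitted because its clocks are listed as unknown.
gpus = [
    ("Quadro RTX 8000/6000", 4608, 576, 1770.0),
    ("Quadro RTX 5000", 3072, 384, 1815.0),
]

for name, cuda_cores, tensor_cores, boost_mhz in gpus:
    fp32 = peak_fp32_tflops(cuda_cores, boost_mhz)
    tensor = peak_tensor_tflops(tensor_cores, boost_mhz)
    print(f"{name}: {fp32:.1f} FP32 TFLOPS, {tensor:.1f} Tensor TFLOPS")
```

The other starred rows are consistent with fixed ratios of these two values in the table: INT8 and INT4 TOPS are 2x and 4x the Tensor TFLOPS figure, and FP64 is 1/32 of the FP32 figure.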
