In-Depth Comparison of NVIDIA Quadro “Turing” GPU Accelerators
This article provides in-depth details of the NVIDIA Quadro RTX “Turing” GPUs. NVIDIA “Turing” GPUs bring an evolved core architecture and add dedicated ray tracing units to the previous-generation “Volta” architecture. Turing GPUs began shipping in late 2018.
* FLOPS and TOPS calculations are presented at Max Boost
† Passively-cooled models are available with slightly reduced clock speeds
Important features available in the “Turing” GPU architecture include:
- New RT (ray tracing) Cores, which provide hardware acceleration for real-time ray tracing
- Evolved Deep Learning performance with over 130 Tensor TFLOPS (training) and over 500 TOPS INT4 (inference) throughput
- NVLink 2.0 between GPUs (when optional NVLink bridges are added), supporting up to 2 bricks and up to 100 GB/sec bidirectional bandwidth
- New GDDR6 memory with a substantial improvement in memory performance compared to previous-generation GPUs (up to 672 GB/sec of memory bandwidth)
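The NVLink figure in the list above follows from per-brick arithmetic. The short sketch below is illustrative only: it assumes each NVLink 2.0 brick carries 25 GB/sec in each direction (which is consistent with the 100 GB/sec bidirectional figure quoted for 2 bricks), and the function name is hypothetical.

```python
# Assumption: each NVLink 2.0 brick carries 25 GB/sec per direction.
NVLINK2_GB_PER_SEC_PER_BRICK_PER_DIRECTION = 25.0


def nvlink_bidirectional_gb_per_sec(bricks: int) -> float:
    """Theoretical bidirectional NVLink bandwidth (GB/sec) for a number of bricks."""
    if bricks < 0:
        raise ValueError("bricks must be non-negative")
    # Multiply by 2 because each brick transfers in both directions at once.
    return bricks * NVLINK2_GB_PER_SEC_PER_BRICK_PER_DIRECTION * 2


if __name__ == "__main__":
    for bricks in (1, 2):
        bandwidth = nvlink_bidirectional_gb_per_sec(bricks)
        print(f"{bricks} brick(s): {bandwidth:.0f} GB/sec bidirectional")
```

These are theoretical peaks; measured transfer rates are typically somewhat lower.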
Quadro “Turing” GPU Specifications
The table below summarizes the features of the available Quadro Turing GPU Accelerators. To learn more about these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.
Professional Visualization, Ray Tracing, & Deep Learning Applications
Feature | Quadro RTX 8000 | Quadro RTX 6000 | Quadro RTX 5000 | Quadro RTX 4000 |
---|---|---|---|---|
GPU Chip(s) | Turing, TU102 | Turing, TU102 | Turing, TU104 | Turing, TU106 |
Tensor Performance | 130.5 Tensor TFLOPS* | 130.5 Tensor TFLOPS* | 89.2 Tensor TFLOPS* | 57.0 Tensor TFLOPS* |
Integer Operations (INT4) | 522 TOPS* | 522 TOPS* | 356.8 TOPS* | Unknown |
Integer Operations (INT8) | 261 TOPS* | 261 TOPS* | 178.4 TOPS* | Unknown |
Half Precision (FP16) | 32.6 TFLOPS | 32.6 TFLOPS | 22.3 TFLOPS | 14.2 TFLOPS |
Single Precision (FP32) | 16.3 TFLOPS* | 16.3 TFLOPS* | 11.2 TFLOPS* | 7.1 TFLOPS* |
Double Precision (FP64) | 0.509 TFLOPS* | 0.509 TFLOPS* | 0.350 TFLOPS* | 0.222 TFLOPS* |
Ray Tracing | 10 GigaRays/sec | 10 GigaRays/sec | 8 GigaRays/sec | 6 GigaRays/sec |
# of CUDA Cores | 4608 | 4608 | 3072 | 2304 |
# of Turing Tensor Cores | 576 | 576 | 384 | 288 |
# of SM Units | 72 | 72 | 48 | 36 |
# of RT Cores | 72 | 72 | 48 | 36 |
GPU Base Clock | 1455 MHz | 1455 MHz | 1620 MHz | Unknown |
GPU Boost Clock | 1770 MHz | 1770 MHz | 1815 MHz | Unknown |
GDDR6 Memory | 48 GB | 24 GB | 16 GB | 8 GB |
Memory Bandwidth | 672 GB/sec | 672 GB/sec | 448 GB/sec | 416 GB/sec |
Interconnect | PCI-E 3.0 + optional NVLink 2.0 (2 bricks) | PCI-E 3.0 + optional NVLink 2.0 (2 bricks) | PCI-E 3.0 + optional NVLink 2.0 (1 brick) | PCI-E 3.0 |
Theoretical transfer bandwidth (bidirectional) | 100 GB/sec NVLink; 32 GB/sec PCI-E x16 3.0 | 100 GB/sec NVLink; 32 GB/sec PCI-E x16 3.0 | 50 GB/sec NVLink; 32 GB/sec PCI-E x16 3.0 | 32 GB/sec PCI-E x16 3.0 |
Achievable transfer bandwidth | ~94 GB/sec NVLink; ~12 GB/sec PCI-E x16 3.0 | ~94 GB/sec NVLink; ~12 GB/sec PCI-E x16 3.0 | ~12 GB/sec PCI-E x16 3.0 | ~12 GB/sec PCI-E x16 3.0 |
GPU Boost Support | Yes – Dynamic | Yes – Dynamic | Yes – Dynamic | Yes – Dynamic |
Workstation Support | Yes | Yes | Yes | Yes |
Server Support | Yes, with passive GPU version | Yes, with passive GPU version | Specific server models only | Specific server models only |
Wattage (TDP) | 295W | 295W | 265W | 160W |
Cooling Type | Active† | Active† | Active | Active |
† Passively-cooled models are available with slightly reduced clock speeds
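The starred (*) peak figures in the table can be reproduced, to within rounding, from the core counts and the boost clock. The sketch below is a rough illustration rather than a vendor formula. It assumes one fused multiply-add (2 FLOPs) per CUDA core per clock, FP16 at twice the FP32 rate, FP64 at 1/32 of the FP32 rate, 64 FP16 fused multiply-adds per Tensor Core per clock, and INT8 and INT4 at 2x and 4x the Tensor FP16 rate. The `GpuSpec` class and `peak_throughput` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    name: str
    cuda_cores: int
    tensor_cores: int
    boost_mhz: float


def peak_throughput(gpu: GpuSpec) -> dict:
    """Estimate theoretical peak throughput at boost clock (TFLOPS / TOPS)."""
    boost_ghz = gpu.boost_mhz / 1000.0
    # Assumption: 1 FMA = 2 FLOPs per CUDA core per clock (GFLOPS -> TFLOPS).
    fp32 = gpu.cuda_cores * 2 * boost_ghz / 1000.0
    # Assumption: 64 FP16 FMAs = 128 FLOPs per Tensor Core per clock.
    tensor = gpu.tensor_cores * 64 * 2 * boost_ghz / 1000.0
    return {
        "fp32_tflops": fp32,
        "fp16_tflops": fp32 * 2,    # assumed 2x the FP32 rate
        "fp64_tflops": fp32 / 32,   # assumed 1/32 of the FP32 rate
        "tensor_tflops": tensor,
        "int8_tops": tensor * 2,    # assumed 2x the Tensor FP16 rate
        "int4_tops": tensor * 4,    # assumed 4x the Tensor FP16 rate
    }


if __name__ == "__main__":
    # Core counts and boost clocks taken from the table above.
    gpus = [
        GpuSpec("Quadro RTX 8000", cuda_cores=4608, tensor_cores=576, boost_mhz=1770),
        GpuSpec("Quadro RTX 5000", cuda_cores=3072, tensor_cores=384, boost_mhz=1815),
    ]
    for gpu in gpus:
        print(gpu.name)
        for metric, value in peak_throughput(gpu).items():
            print(f"  {metric}: {value:.3f}")
```

The Quadro RTX 4000 is left out because its boost clock is listed as unknown. Real workloads rarely sustain these peaks, since actual clocks vary with power and thermal limits.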