In-Depth Comparison of NVIDIA Quadro “Turing” GPU Accelerators

Brett Newman

August 21, 2018

This article provides in-depth details of the NVIDIA Quadro RTX “Turing” GPUs. NVIDIA “Turing” GPUs bring an evolved core architecture and add dedicated ray tracing units to the previous-generation “Volta” architecture. Turing GPUs began shipping in late 2018.

Important features available in the “Turing” GPU architecture include:

New RT Cores (dedicated ray tracing units) that deliver real-time ray tracing performance for the first time
Evolved Deep Learning performance with over 130 Tensor TFLOPS (training) and over 500 TOPS INT4 (inference) throughput
NVLink 2.0 between GPUs, when optional NVLink bridges are added, supporting up to 2 bricks and up to 100GB/sec bidirectional bandwidth
New GDDR6 Memory with a substantial improvement in memory performance compared to previous-generation GPUs

Quadro “Turing” GPU Specifications

The table below summarizes the features of the available Quadro Turing GPU Accelerators. To learn more about these products, or to find out how best to leverage their capabilities, please speak with an HPC expert.

Professional Visualization, Ray Tracing, & Deep Learning Applications

Feature	Quadro RTX 8000	Quadro RTX 6000	Quadro RTX 5000	Quadro RTX 4000
GPU Chip(s)	Turing, TU102		Turing, TU104	Turing, TU106
TensorFLOPS	130.5 Tensor TFLOPS*		89.2 Tensor TFLOPS*	57.0 Tensor TFLOPS*
Integer Operations (INT4)	522 TOPS*		356.8 TOPS*	Unknown
Integer Operations (INT8)	261 TOPS*		178.4 TOPS*	Unknown
Half Precision (FP16)	32.6 TFLOPS		22.3 TFLOPS	14.2 TFLOPS
Single Precision (FP32)	16.3 TFLOPS*		11.2 TFLOPS*	7.1 TFLOPS*
Double Precision (FP64)	0.509 TFLOPS*		0.350 TFLOPS*	0.222 TFLOPS*
Ray Tracing	10 GigaRays/sec		8 GigaRays/sec	6 GigaRays/sec
# of CUDA Cores	4608		3072	2304
# of Turing Tensor Cores	576		384	288
# of SM Units	72		48	36
# of RT Cores	72		48	36
GPU Base Clock	1455 MHz		1620 MHz	Unknown
GPU Boost Clock	1770 MHz		1815 MHz	Unknown
GDDR6 Memory	48GB	24GB	16GB	8GB
Memory Bandwidth	672 GB/sec		448 GB/sec	416 GB/sec
Interconnect	PCI-E 3.0 + optional NVLink 2.0 (2 bricks)		PCI-E 3.0 + optional NVLink 2.0 (1 brick)	PCI-E 3.0
Theoretical transfer bandwidth (bidirectional)	100 GB/s NVLink; 32 GB/s PCI-E 3.0 x16		50 GB/s NVLink; 32 GB/s PCI-E 3.0 x16	32 GB/s PCI-E 3.0 x16
Achievable transfer bandwidth	~94 GB/s NVLink; ~12 GB/s PCI-E 3.0 x16			~12 GB/s PCI-E 3.0 x16
GPU Boost Support	Yes – Dynamic
Workstation Support	Yes
Server Support	Yes, with passive GPU version		specific server models only
Wattage (TDP)	295W		265W	160W
Cooling Type	Active†		Active

* FLOPS and TOPS calculations are presented at Max Boost
† Passively-cooled models are available with slightly reduced clock speeds
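
To make the "Max Boost" footnote concrete, the short Python sketch below rebuilds the starred figures from the CUDA core counts and boost clocks listed in the table. It is only an illustration, not NVIDIA's published method. It assumes 2 floating-point operations per CUDA core per clock (one fused multiply-add), and the ratios between precisions (FP16 at 2x FP32, FP64 at 1/32 of FP32, Tensor at 8x FP32, INT8 at 16x, INT4 at 32x) are inferred from the numbers in the table rather than stated in this article. The function name `peak_rates` is hypothetical.

```python
# Sketch: estimate peak throughput at max boost from table values.
# ASSUMPTIONS (not stated in the article):
#   - each CUDA core retires one FMA (2 FLOPs) per clock
#   - precision ratios below are inferred from the table's own figures

def peak_rates(cuda_cores, boost_mhz):
    """Return estimated peak rates in TFLOPS / TOPS at the given boost clock."""
    fp32 = cuda_cores * 2 * boost_mhz * 1e6 / 1e12  # TFLOPS
    return {
        "fp32_tflops": fp32,
        "fp16_tflops": fp32 * 2,     # assumed 2x FP32
        "fp64_tflops": fp32 / 32,    # assumed 1/32 of FP32
        "tensor_tflops": fp32 * 8,   # assumed 8x FP32
        "int8_tops": fp32 * 16,      # assumed 2x Tensor
        "int4_tops": fp32 * 32,      # assumed 4x Tensor
    }

# (CUDA cores, boost clock in MHz) taken from the table above
GPUS = {
    "Quadro RTX 8000 / 6000": (4608, 1770),
    "Quadro RTX 5000": (3072, 1815),
}

if __name__ == "__main__":
    for name, (cores, mhz) in GPUS.items():
        rates = peak_rates(cores, mhz)
        summary = ", ".join(f"{k}={v:.2f}" for k, v in rates.items())
        print(f"{name}: {summary}")
```

Under these assumptions the results land within rounding of the table entries for the RTX 8000/6000 and RTX 5000. The Quadro RTX 4000 is left out because its boost clock is listed as unknown.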
