NVIDIA’s Tesla K20 GPUs are currently the de facto standard for high-performance heterogeneous computing. Based on the Kepler GK110 architecture, these are the GPUs you want if you’ll be taking advantage of the latest advancements in CUDA 5.0 and CUDA 5.5. This generation was designed specifically for new CUDA features such as dynamic parallelism.
With 5GB or 6GB of GDDR5 memory, they provide up to 3.95 TFLOPS single-precision and 1.32 TFLOPS double-precision floating point performance. Two variants of the GPU are available: K20 (for workstations and servers) and K20X (for servers only). Here are the full specifications:
NVIDIA Tesla K20 GPU Accelerator
Integrated in Microway NumberSmasher and Navion GPU Workstations, Servers and GPU Clusters
Specifications
- 2496 CUDA cores
- 3.52 TFLOPS single, 1.17 TFLOPS double
- 5GB GDDR5 memory
- Memory bandwidth up to 208GB/sec
- PCI-E x16 Gen2 interface to system
- Supports Dynamic Parallelism and HyperQ features
- Passive (K20m) or active (K20c) heatsink options for servers and workstations
NVIDIA Tesla K20X GPU Accelerator
Integrated in Microway NumberSmasher and Navion GPU Servers and GPU Clusters
Specifications
- 2688 CUDA cores
- 3.95 TFLOPS single, 1.32 TFLOPS double
- 6GB GDDR5 memory
- Memory bandwidth up to 250GB/sec
- PCI-E x16 Gen2 interface to system
- Supports Dynamic Parallelism and HyperQ features
- Passive heatsink relies on chassis cooling of specially-designed GPU servers
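The peak throughput figures in both lists follow from core count and clock speed. Here is a minimal sketch of that arithmetic. It assumes each CUDA core retires one fused multiply-add (two floating-point operations) per cycle, that GK110 runs double precision at one third of the single-precision rate, and that the relevant clocks are the application clocks (705 MHz and 732 MHz) reported in the nvidia-smi output further down. The function name peak_tflops is illustrative.

```python
# Rough peak-throughput estimate; the assumptions are noted inline.
FLOPS_PER_CORE_PER_CYCLE = 2   # assumption: one fused multiply-add per cycle
DP_TO_SP_RATIO = 1 / 3         # assumption: GK110 double-precision rate


def peak_tflops(cuda_cores, clock_mhz):
    """Return (single, double) precision peak throughput in TFLOPS."""
    single = cuda_cores * clock_mhz * 1e6 * FLOPS_PER_CORE_PER_CYCLE / 1e12
    return single, single * DP_TO_SP_RATIO


# Core counts from the specification lists; clocks from the nvidia-smi logs.
for name, cores, mhz in [("K20", 2496, 705), ("K20X", 2688, 732)]:
    sp, dp = peak_tflops(cores, mhz)
    print(f"Tesla {name}: {sp:.2f} TFLOPS single, {dp:.2f} TFLOPS double")
```

For the K20 this reproduces the listed 3.52 and 1.17 TFLOPS. For the K20X the estimate lands within about 1% of the listed figures, so the published peaks presumably assume a marginally different clock.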
Technical Details
For the technically-minded audience, here is the full information dump from nvidia-smi on Tesla K20 and K20X GPUs:
==============NVSMI LOG==============
Timestamp : Thu Jul 18 12:16:49 2013
Driver Version : 310.32
Attached GPUs : 1
GPU 0000:02:00.0
Product Name : Tesla K20m
Display Mode : Disabled
Persistence Mode : Enabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : 033421200xxxx
GPU UUID : GPU-dcf6d5d9-6a9e-xxxx-xxxx-e561b7xxxxxx
VBIOS Version : 80.10.11.00.06
Inforom Version
Image Version : 2081.0208.01.07
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : Compute
Pending : Compute
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x102810DE
Bus Id : 0000:02:00.0
Sub System Id : 0x101510DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
User Defined Clocks : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
Memory Usage
Total : 5119 MB
Used : 13 MB
Free : 5106 MB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Ecc Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : N/A
Temperature
Gpu : 23 C
Power Readings
Power Management : Supported
Power Draw : 11.93 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 705 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 758 MHz
SM : 758 MHz
Memory : 2600 MHz
Compute Processes : None
==============NVSMI LOG==============
Timestamp : Tue Dec 3 12:54:39 2013
Driver Version : 325.15
Attached GPUs : 3
GPU 0000:02:00.0
Supported Clocks
Memory : 2600 MHz
Graphics : 758 MHz
Graphics : 705 MHz
Graphics : 666 MHz
Graphics : 640 MHz
Graphics : 614 MHz
Memory : 324 MHz
Graphics : 324 MHz
==============NVSMI LOG==============
Timestamp : Wed Nov 27 15:47:57 2013
Driver Version : 319.37
Attached GPUs : 1
GPU 0000:02:00.0
Product Name : Tesla K20Xm
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 128
Driver Model
Current : N/A
Pending : N/A
Serial Number : 032351309xxxx
GPU UUID : GPU-23d6aecc-4996-d45a-a68c-15a69e0fxxxx
VBIOS Version : 80.10.39.00.02
Inforom Version
Image Version : 2081.0200.01.09
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : Compute
Pending : Compute
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x102110DE
Bus Id : 0000:02:00.0
Sub System Id : 0x097D10DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
Unknown : Not Active
Memory Usage
Total : 5759 MB
Used : 12 MB
Free : 5747 MB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending : No
Temperature
Gpu : 27 C
Power Readings
Power Management : Supported
Power Draw : 30.88 W
Power Limit : 235.00 W
Default Power Limit : 235.00 W
Enforced Power Limit : 235.00 W
Min Power Limit : 150.00 W
Max Power Limit : 235.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Applications Clocks
Graphics : 732 MHz
Memory : 2600 MHz
Default Applications Clocks
Graphics : 732 MHz
Memory : 2600 MHz
Max Clocks
Graphics : 784 MHz
SM : 784 MHz
Memory : 2600 MHz
Compute Processes : None
==============NVSMI LOG==============
Timestamp : Wed Nov 27 15:49:32 2013
Driver Version : 319.37
Attached GPUs : 1
GPU 0000:02:00.0
Supported Clocks
Memory : 2600 MHz
Graphics : 784 MHz
Graphics : 758 MHz
Graphics : 732 MHz
Graphics : 705 MHz
Graphics : 666 MHz
Graphics : 640 MHz
Graphics : 614 MHz
Memory : 324 MHz
Graphics : 324 MHz
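The Supported Clocks listings above are the values to choose from when setting application clocks with nvidia-smi -ac, which takes a memory,graphics pair. Here is a small sketch of pulling those pairs out of the text. It assumes the flattened "Key : Value" layout shown in this article; real nvidia-smi output is indented, which the parser tolerates by stripping whitespace. The helper name parse_supported_clocks is hypothetical.

```python
def parse_supported_clocks(text):
    """Map each memory clock (MHz) to the graphics clocks (MHz) listed under it."""
    clocks = {}
    current_memory = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings such as "Supported Clocks" carry no value
        key, value = key.strip(), value.strip()
        if not value.endswith("MHz"):
            continue  # skip timestamps, driver versions and similar fields
        mhz = int(value.split()[0])
        if key == "Memory":
            current_memory = mhz
            clocks[current_memory] = []
        elif key == "Graphics" and current_memory is not None:
            clocks[current_memory].append(mhz)
    return clocks


# Sample copied from the Tesla K20 log above.
sample = """\
Supported Clocks
Memory : 2600 MHz
Graphics : 758 MHz
Graphics : 705 MHz
Graphics : 666 MHz
Graphics : 640 MHz
Graphics : 614 MHz
Memory : 324 MHz
Graphics : 324 MHz
"""

clocks = parse_supported_clocks(sample)
memory = max(clocks)               # highest memory clock
graphics = max(clocks[memory])     # highest graphics clock paired with it
print(f"nvidia-smi -ac {memory},{graphics}")
```

The sketch only prints the command and does not run it. Changing application clocks normally requires administrative privileges, and whether the highest pair is sustainable depends on the power and cooling of the system.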
CUDA Device Query for Tesla K20
NVIDIA’s deviceQuery utility (from the CUDA SDK examples) demonstrates how applications can query the capabilities of a CUDA-capable GPU. This utility also gives valuable details about the Tesla GPU products.
deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla K20m"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 5120 MBytes (5368512512 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
GPU Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 2600 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 1310720 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Tesla K20m
Result = PASS
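Several of the headline specifications can be cross-checked against the deviceQuery output above. Here is a minimal sketch, assuming GDDR5 transfers data on both clock edges, so that the effective data rate is twice the reported 2600 MHz memory clock. The variable and function names are illustrative.

```python
# Values copied from the deviceQuery output for the Tesla K20m.
multiprocessors = 13
cores_per_mp = 192
memory_clock_mhz = 2600
bus_width_bits = 320


def memory_bandwidth_gb_s(clock_mhz, bus_bits, transfers_per_clock=2):
    """Peak bandwidth in GB/s (assumptions: double data rate, 1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


print(multiprocessors * cores_per_mp, "CUDA cores")
print(memory_bandwidth_gb_s(memory_clock_mhz, bus_width_bits), "GB/sec")
```

The results, 2496 CUDA cores and 208 GB/sec, match the Tesla K20 specifications listed earlier.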
Usage Differences for Tesla K20 vs K20X
It’s worth noting that the Tesla GPU products come in different versions depending on the type of installation. For Tesla K20 we can provide anything from quiet workstations to full compute clusters. For the higher-performing Tesla K20X the options are more limited. In particular, it’s not possible to provide a quiet workstation.
If a tower/workstation form-factor is required for Tesla K20X, we do have one available. Unfortunately, it’s rather noisy, and there is no quiet configuration on the market. Please contact one of Microway’s HPC experts if you would like to discuss the alternatives.
For those curious as to why there are two separate product versions, it’s simply a question of optimized cooling. Take a look at our GPU servers and you’ll see that airflow is carefully channeled through the GPU slots. This provides the best possible cooling for dense installations, but passively-cooled GPUs that rely on that airflow simply don’t cool properly in workstation configurations. For workstations, actively-cooled versions are available for many GPUs.