NVIDIA Tesla K20 GPU Accelerator (Kepler GK110) Up Close

NVIDIA’s Tesla K20 GPU is currently the de facto standard for high-performance heterogeneous computing. Based on the Kepler GK110 architecture, these are the GPUs you want if you’ll be taking advantage of the latest advancements in CUDA 5.0 and CUDA 5.5. This generation was designed specifically for exciting new CUDA features such as Dynamic Parallelism, which requires the compute capability 3.5 that GK110 introduced.

With 5GB or 6GB of GDDR5 memory, they provide up to 3.95 TFLOPS single-precision and 1.32 TFLOPS double-precision floating-point performance. Two variants of the GPU are available: K20 (for workstations and servers) and K20X (for servers only). Here are the full specifications:

Tesla K20 GPU Specifications

NVIDIA Tesla K20 GPU Accelerator

Integrated in Microway NumberSmasher and Navion GPU Workstations, Servers and GPU Clusters

Specifications

  • 2496 CUDA cores
  • 3.52 TFLOPS single, 1.17 TFLOPS double
  • 5GB GDDR5 memory
  • Memory bandwidth up to 208GB/sec
  • PCI-E x16 Gen2 interface to system
  • Supports Dynamic Parallelism and Hyper-Q features
  • Passive (K20m) or active (K20c) heatsink options for servers and workstations

Tesla K20X GPU (peak performance) Specifications

NVIDIA Tesla K20X GPU Accelerator

Integrated in Microway NumberSmasher and Navion GPU Servers and GPU Clusters

Specifications

  • 2688 CUDA cores
  • 3.95 TFLOPS single, 1.32 TFLOPS double
  • 6GB GDDR5 memory
  • Memory bandwidth up to 250GB/sec
  • PCI-E x16 Gen2 interface to system
  • Supports Dynamic Parallelism and Hyper-Q features
  • Passive heatsink relies on chassis cooling of specially-designed GPU servers
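
The headline numbers in these lists follow from simple products of core count, clock and bus width. The Python sketch below recomputes them under stated assumptions that are not part of the spec sheets: each CUDA core retires one fused multiply-add (2 FLOPs) per clock, each GK110 SMX has 192 FP32 cores and 64 FP64 units, GDDR5 transfers data on both edges of the 2600 MHz memory clock, and the K20X bus is 384 bits wide. Clocks are taken from the deviceQuery and nvidia-smi logs further down.

```python
# Back-of-the-envelope peak figures for Tesla K20 and K20X.
# Assumptions (not stated in the specs above): every CUDA core retires one
# fused multiply-add (2 FLOPs) per clock, each GK110 SMX has 192 FP32 cores
# and 64 FP64 units, GDDR5 transfers data on both edges of the 2600 MHz
# memory clock, and the K20X memory bus is 384 bits wide.

FP32_CORES_PER_SMX = 192
FP64_UNITS_PER_SMX = 64
FLOPS_PER_FMA = 2


def peak_tflops(smx_count, clock_mhz, units_per_smx):
    """Peak TFLOPS = execution units x clock x 2 FLOPs per FMA."""
    units = smx_count * units_per_smx
    return units * clock_mhz * 1e6 * FLOPS_PER_FMA / 1e12


def peak_bandwidth_gbs(bus_width_bits, mem_clock_mhz):
    """Peak GB/s = bytes per transfer x memory clock x 2 (double data rate)."""
    return (bus_width_bits / 8) * mem_clock_mhz * 1e6 * 2 / 1e9


# name: (SMX count, graphics clock MHz, memory clock MHz, bus width in bits)
GPUS = {
    "Tesla K20": (13, 706, 2600, 320),   # clock and bus width from deviceQuery
    "Tesla K20X": (14, 732, 2600, 384),  # clock from nvidia-smi; bus width assumed
}

if __name__ == "__main__":
    for name, (smx, clock, mem_clock, bus) in GPUS.items():
        sp = peak_tflops(smx, clock, FP32_CORES_PER_SMX)
        dp = peak_tflops(smx, clock, FP64_UNITS_PER_SMX)
        bw = peak_bandwidth_gbs(bus, mem_clock)
        print(f"{name}: {smx * FP32_CORES_PER_SMX} cores, {sp:.2f} TFLOPS SP, "
              f"{dp:.2f} TFLOPS DP, {bw:.1f} GB/s")
```

For the K20 this reproduces the published 3.52/1.17 TFLOPS and 208 GB/s. For the K20X the same arithmetic gives roughly 3.94/1.31 TFLOPS and 249.6 GB/s, a hair below the quoted 3.95/1.32 TFLOPS, so treat it as an approximation rather than the exact method behind the official figures.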

Technical Details

For the technically minded audience, here is the full information dump from nvidia-smi on Tesla K20 and K20X GPUs. Each dump is followed by a second log listing the clock combinations that GPU supports:

nvidia-smi: Tesla K20

==============NVSMI LOG==============

Timestamp                       : Thu Jul 18 12:16:49 2013
Driver Version                  : 310.32

Attached GPUs                   : 1
GPU 0000:02:00.0
    Product Name                : Tesla K20m
    Display Mode                : Disabled
    Persistence Mode            : Enabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 033421200xxxx
    GPU UUID                    : GPU-dcf6d5d9-6a9e-xxxx-xxxx-e561b7xxxxxx
    VBIOS Version               : 80.10.11.00.06
    Inforom Version
        Image Version           : 2081.0208.01.07
        OEM Object              : 1.1
        ECC Object              : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current                 : Compute
        Pending                 : Compute
    PCI
        Bus                     : 0x02
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x102810DE
        Bus Id                  : 0000:02:00.0
        Sub System Id           : 0x101510DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : N/A
    Performance State           : P8
    Clocks Throttle Reasons
        Idle                    : Active
        User Defined Clocks     : Not Active
        SW Power Cap            : Not Active
        HW Slowdown             : Not Active
        Unknown                 : Not Active
    Memory Usage
        Total                   : 5119 MB
        Used                    : 13 MB
        Free                    : 5106 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Texture Memory  : N/A
                Total           : N/A
            Double Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Texture Memory  : N/A
                Total           : N/A
        Aggregate
            Single Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Texture Memory  : N/A
                Total           : N/A
            Double Bit            
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Texture Memory  : N/A
                Total           : N/A
    Temperature
        Gpu                     : 23 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 11.93 W
        Power Limit             : 225.00 W
        Default Power Limit     : 225.00 W
        Min Power Limit         : 150.00 W
        Max Power Limit         : 225.00 W
    Clocks
        Graphics                : 324 MHz
        SM                      : 324 MHz
        Memory                  : 324 MHz
    Applications Clocks
        Graphics                : 705 MHz
        Memory                  : 2600 MHz
    Max Clocks
        Graphics                : 758 MHz
        SM                      : 758 MHz
        Memory                  : 2600 MHz
    Compute Processes           : None

==============NVSMI LOG==============

Timestamp                           : Tue Dec  3 12:54:39 2013
Driver Version                      : 325.15

Attached GPUs                       : 3
GPU 0000:02:00.0
    Supported Clocks
        Memory                      : 2600 MHz
            Graphics                : 758 MHz
            Graphics                : 705 MHz
            Graphics                : 666 MHz
            Graphics                : 640 MHz
            Graphics                : 614 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz
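
The nvidia-smi report above is an indented tree: a line without a " : " separator opens a section, and the "Key : Value" lines indented beneath it belong to that section. A minimal Python sketch, assuming that layout and using the hypothetical helper name parse_nvsmi, turns such a dump into nested dictionaries so fields like the power limit can be read from a script. It also assumes keys are unique within a section, which holds for the main report but not for the Supported Clocks listing, where Graphics repeats. On live systems, nvidia-smi's XML output (nvidia-smi -q -x) is a sturdier input than this text.

```python
import re

# A "Key : Value" separator is whitespace, a colon, then whitespace or end of
# line. Colons inside values ("12:16:49") or PCI bus IDs ("0000:02:00.0") are
# not preceded by whitespace, so they are not mistaken for separators.
SEPARATOR = re.compile(r"\s+:(?:\s|$)")


def parse_nvsmi(text):
    """Parse nvidia-smi -q style text into nested dicts (hypothetical helper).

    Lines without a separator open a new section; deeper-indented lines
    below them become that section's entries.
    """
    root = {}
    stack = [(-1, root)]  # (indent of the section header, section dict)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("="):
            continue  # skip blank lines and "====NVSMI LOG====" banners
        indent = len(line) - len(line.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()  # leave sections this line is not nested in
        section = stack[-1][1]
        match = SEPARATOR.search(line)
        if match:
            key = line[:match.start()].strip()
            section[key] = line[match.end():].strip()
        else:
            child = {}
            section[stripped] = child
            stack.append((indent, child))
    return root


# An excerpt of the Tesla K20 log above.
SAMPLE = """\
==============NVSMI LOG==============

Timestamp                       : Thu Jul 18 12:16:49 2013
Driver Version                  : 310.32

Attached GPUs                   : 1
GPU 0000:02:00.0
    Product Name                : Tesla K20m
    Power Readings
        Power Management        : Supported
        Power Limit             : 225.00 W
    Max Clocks
        Graphics                : 758 MHz
        Memory                  : 2600 MHz
"""

info = parse_nvsmi(SAMPLE)
gpu = info["GPU 0000:02:00.0"]
print(gpu["Product Name"], gpu["Power Readings"]["Power Limit"])
```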

nvidia-smi: Tesla K20X

==============NVSMI LOG==============

Timestamp                           : Wed Nov 27 15:47:57 2013
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:02:00.0
    Product Name                    : Tesla K20Xm
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 032351309xxxx
    GPU UUID                        : GPU-23d6aecc-4996-d45a-a68c-15a69e0fxxxx
    VBIOS Version                   : 80.10.39.00.02
    Inforom Version
        Image Version               : 2081.0200.01.09
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : Compute
        Pending                     : Compute
    PCI
        Bus                         : 0x02
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102110DE
        Bus Id                      : 0000:02:00.0
        Sub System Id               : 0x097D10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
    Fan Speed                       : N/A
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    Memory Usage
        Total                       : 5759 MB
        Used                        : 12 MB
        Free                        : 5747 MB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
        Aggregate
            Single Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
            Double Bit            
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Total               : 0
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        Gpu                         : 27 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 30.88 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 235.00 W
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
    Applications Clocks
        Graphics                    : 732 MHz
        Memory                      : 2600 MHz
    Default Applications Clocks
        Graphics                    : 732 MHz
        Memory                      : 2600 MHz
    Max Clocks
        Graphics                    : 784 MHz
        SM                          : 784 MHz
        Memory                      : 2600 MHz
    Compute Processes               : None

==============NVSMI LOG==============

Timestamp                           : Wed Nov 27 15:49:32 2013
Driver Version                      : 319.37

Attached GPUs                       : 1
GPU 0000:02:00.0
    Supported Clocks
        Memory                      : 2600 MHz
            Graphics                : 784 MHz
            Graphics                : 758 MHz
            Graphics                : 732 MHz
            Graphics                : 705 MHz
            Graphics                : 666 MHz
            Graphics                : 640 MHz
            Graphics                : 614 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz
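
Each GPU's second log lists its supported clock pairs: every Memory entry is followed by the Graphics clocks allowed with it. This is the layout printed by nvidia-smi -q -d SUPPORTED_CLOCKS, and a pair from it can be selected as the application clocks with nvidia-smi -ac <memory>,<graphics>, which typically requires root privileges. The sketch below uses hypothetical helper names to parse the listing and pick the fastest pair; for the K20X that would raise the graphics clock from the 732 MHz default to 784 MHz. Whether that helps depends on the workload and its power headroom, since the power limits shown in the logs still apply.

```python
import re

# Matches the "Memory : 2600 MHz" / "Graphics : 758 MHz" lines of a
# SUPPORTED_CLOCKS report; every other line is ignored.
CLOCK_LINE = re.compile(r"^\s*(Memory|Graphics)\s*:\s*(\d+)\s*MHz\s*$")


def parse_supported_clocks(text):
    """Map each memory clock (MHz) to the graphics clocks (MHz) allowed with it."""
    table = {}
    memory_mhz = None
    for line in text.splitlines():
        match = CLOCK_LINE.match(line)
        if not match:
            continue
        kind, mhz = match.group(1), int(match.group(2))
        if kind == "Memory":
            memory_mhz = mhz
            table.setdefault(memory_mhz, [])
        elif memory_mhz is not None:
            table[memory_mhz].append(mhz)
    return table


def fastest_ac_pair(table):
    """Return "memory,graphics" for the highest memory clock and the highest
    graphics clock supported with it, the pair format nvidia-smi -ac takes."""
    memory_mhz = max(table)
    return f"{memory_mhz},{max(table[memory_mhz])}"


# The Tesla K20X supported-clocks listing from the log above.
K20X_SUPPORTED = """\
GPU 0000:02:00.0
    Supported Clocks
        Memory                      : 2600 MHz
            Graphics                : 784 MHz
            Graphics                : 758 MHz
            Graphics                : 732 MHz
            Graphics                : 705 MHz
            Graphics                : 666 MHz
            Graphics                : 640 MHz
            Graphics                : 614 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz
"""

clocks = parse_supported_clocks(K20X_SUPPORTED)
print(clocks)
# Only prints the command; running it is left to the administrator.
print("sudo nvidia-smi -ac " + fastest_ac_pair(clocks))
```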

CUDA Device Query for Tesla K20

NVIDIA’s deviceQuery utility (from the CUDA SDK examples) demonstrates how applications can query the capabilities of a CUDA-capable GPU. Its output also reveals hardware details that the nvidia-smi logs above do not show, such as the number of multiprocessors, the memory bus width and the L2 cache size.

deviceQuery: Tesla K20

deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K20m"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    3.5
  Total amount of global memory:                 5120 MBytes (5368512512 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Clock rate:                                706 MHz (0.71 GHz)
  Memory Clock rate:                             2600 Mhz
  Memory Bus Width:                              320-bit
  L2 Cache Size:                                 1310720 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Tesla K20m
Result = PASS
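
The per-multiprocessor limits in this output determine how many threads a kernel can keep resident at once. The Python sketch below is a simplified occupancy estimate: the warp size (32) and threads per multiprocessor (2048) come from deviceQuery, while the limits of 16 resident blocks, 64K registers and 48 KB of shared memory per multiprocessor are assumptions taken from NVIDIA's published compute capability 3.5 limits. It ignores allocation granularity, so NVIDIA's CUDA Occupancy Calculator will give more precise (and sometimes lower) figures.

```python
# Simplified theoretical occupancy for a compute capability 3.5 GPU (GK110).
# WARP_SIZE and MAX_THREADS_PER_SM come from the deviceQuery output above; the
# other per-SM limits are assumed from NVIDIA's published CC 3.5 limits.
# Register and shared-memory allocation granularity is ignored, so real
# occupancy can be lower than this estimate.
WARP_SIZE = 32
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 16      # assumed
REGISTERS_PER_SM = 65536    # assumed: 64K 32-bit registers per SMX
SHARED_MEM_PER_SM = 49152   # assumed: 48 KB shared / 16 KB L1 configuration


def occupancy(threads_per_block, regs_per_thread, smem_per_block=0):
    """Return (resident blocks per SM, fraction of warp slots occupied)."""
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    threads_allocated = warps_per_block * WARP_SIZE
    limits = [MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_allocated]
    if regs_per_thread:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_allocated))
    if smem_per_block:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    blocks = min(limits)
    max_warps = MAX_THREADS_PER_SM // WARP_SIZE
    return blocks, blocks * warps_per_block / max_warps


if __name__ == "__main__":
    # (threads per block, registers per thread, shared memory bytes per block)
    for tpb, regs, smem in [(256, 32, 0), (256, 64, 0), (128, 32, 16384)]:
        blocks, occ = occupancy(tpb, regs, smem)
        print(f"{tpb} threads, {regs} regs, {smem} B shared -> "
              f"{blocks} blocks/SM, {occ:.0%} occupancy")
```

With 256-thread blocks, raising register use from 32 to 64 per thread halves the resident blocks (8 to 4) and drops occupancy from 100% to 50%, while a 16 KB shared-memory footprint caps 128-thread blocks at 3 per multiprocessor.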

Usage Differences for Tesla K20 vs K20X

It’s worth noting that Tesla GPU products come in different versions depending on the type of installation. For Tesla K20 we can provide anything from quiet workstations to full compute clusters. For the higher-performing Tesla K20X the options are limited: in particular, it’s not possible to provide a quiet workstation.

If a tower/workstation form factor is required, we do have one available, but it is rather noisy; no quiet Tesla K20X configuration exists on the market. Please contact one of Microway’s HPC experts if you would like to discuss the alternatives.

For those curious as to why there are two separate product versions, it’s simply a question of optimized cooling. Take a look at our GPU servers and you’ll see that airflow is carefully channeled through the GPU slots. Passively cooled GPUs such as the K20m and K20Xm rely on this airflow: it provides the best possible cooling for dense installations, but a workstation chassis cannot supply it, so these cards don’t cool properly there. For workstations, actively cooled versions (such as the K20c) are available for many GPUs.

Eliot Eshelman

About Eliot Eshelman

My interests span from astrophysics to bacteriophages; high-performance computers to small spherical magnets. I've been an avid Linux geek (with a focus on HPC) for more than a decade. I work as Microway's Vice President of Strategic Accounts and HPC Initiatives.
This entry was posted in Hardware.
