NVIDIA has announced a new version of their popular Tesla M40 GPU – one with 24GB of high-speed GDDR5 memory. The name hasn’t really changed – the new GPU is named NVIDIA Tesla M40 24GB. If you are curious about the original version with less memory, we have a detailed examination of the original M40 GPU.
As support for GPUs grows – particularly in the exploding fields of Machine Learning and Deep Learning – there is an increasing need for large quantities of GPU memory. The Tesla M40 24GB provides the most memory available to date in a single-GPU Tesla card. The remaining specifications of the new M40 match those of the original, including 7 TFLOPS of single-precision floating-point performance.
The Tesla M40 continues to be the only high-performance Tesla compute GPU based upon the “Maxwell” architecture. “Maxwell” provides excellent performance per watt, as evidenced by the fact that this GPU provides 7 TFLOPS within a 250W power envelope.
Maximum single-GPU memory and performance: Tesla M40 24GB GPU
Available in Microway NumberSmasher GPU Servers and GPU Clusters
Specifications
- 3072 CUDA GPU cores (GM200)
- 7.0 TFLOPS single-precision; 0.21 TFLOPS double-precision
- 24GB GDDR5 memory
- Memory bandwidth up to 288 GB/s
- PCI-E x16 Gen3 interface to system
- Dynamic GPU Boost for optimal clock speeds
- Passive heatsink design for installation in qualified GPU servers
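To put the figures in the list above in proportion, the following minimal Python sketch derives a few ratios from them. It uses only the numbers quoted in the list plus the 250W power envelope mentioned earlier; the variable names are illustrative, and the ratios are back-of-the-envelope estimates from peak figures, not measurements.

```python
# Figures quoted in the specification list and the text above.
sp_tflops = 7.0        # single-precision peak
dp_tflops = 0.21       # double-precision peak
memory_gb = 24         # GDDR5 capacity
bandwidth_gbs = 288    # peak memory bandwidth, GB/s
power_w = 250          # power envelope, watts

# Single-precision throughput per watt, in GFLOPS/W.
gflops_per_watt = sp_tflops * 1000 / power_w

# How much faster single precision is than double precision.
sp_to_dp_ratio = sp_tflops / dp_tflops

# Bytes of memory traffic available per single-precision FLOP at peak.
bytes_per_flop = bandwidth_gbs / (sp_tflops * 1000)

# Time to stream the entire memory once at peak bandwidth, in milliseconds.
full_sweep_ms = memory_gb / bandwidth_gbs * 1000

print(f"{gflops_per_watt:.1f} GFLOPS/W single-precision")
print(f"single:double ratio about {sp_to_dp_ratio:.0f}:1")
print(f"{bytes_per_flop:.3f} bytes per FLOP")
print(f"{full_sweep_ms:.1f} ms to read all of memory once")
```

The large single-to-double ratio is the practical takeaway: this GPU suits single-precision workloads such as deep learning training far better than double-precision numerical codes.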
Technical Details
The nvidia-smi status report shown below reflects the capabilities of the new M40 24GB GPU:
    [root@node4 ~]# nvidia-smi -a -i 0

    ==============NVSMI LOG==============

    Timestamp                           : Fri May 20 15:35:26 2016
    Driver Version                      : 361.28

    Attached GPUs                       : 4
    GPU 0000:84:00.0
        Product Name                    : Tesla M40
        Product Brand                   : Tesla
        Display Mode                    : Disabled
        Display Active                  : Disabled
        Persistence Mode                : Enabled
        Accounting Mode                 : Enabled
        Accounting Mode Buffer Size     : 1920
        Driver Model
            Current                     : N/A
            Pending                     : N/A
        Serial Number                   : xxxxxxxxxxxxx
        GPU UUID                        : GPU-dbacebc6-3878-d72d-ebe9-87fb50xxxxxx
        Minor Number                    : 3
        VBIOS Version                   : 84.00.56.00.03
        MultiGPU Board                  : No
        Board ID                        : 0xXXXX
        Inforom Version
            Image Version               : G600.xxxx.xx.xx
            OEM Object                  : 1.1
            ECC Object                  : 3.0
            Power Management Object     : N/A
        GPU Operation Mode
            Current                     : N/A
            Pending                     : N/A
        PCI
            Bus                         : 0x84
            Device                      : 0x00
            Domain                      : 0x0000
            Device Id                   : 0xXXXXXXXX
            Bus Id                      : 0000:84:00.0
            Sub System Id               : 0x117110DE
            GPU Link Info
                PCIe Generation
                    Max                 : 3
                    Current             : 3
                Link Width
                    Max                 : 16x
                    Current             : 16x
            Bridge Chip
                Type                    : N/A
                Firmware                : N/A
            Replays since reset         : 0
            Tx Throughput               : 0 KB/s
            Rx Throughput               : 0 KB/s
        Fan Speed                       : N/A
        Performance State               : P0
        Clocks Throttle Reasons
            Idle                        : Not Active
            Applications Clocks Setting : Not Active
            SW Power Cap                : Not Active
            HW Slowdown                 : Not Active
            Sync Boost                  : Not Active
            Unknown                     : Not Active
        FB Memory Usage
            Total                       : 23039 MiB
            Used                        : 23009 MiB
            Free                        : 30 MiB
        BAR1 Memory Usage
            Total                       : 32768 MiB
            Used                        : 4 MiB
            Free                        : 32764 MiB
        Compute Mode                    : Default
        Utilization
            Gpu                         : 99 %
            Memory                      : 100 %
            Encoder                     : 0 %
            Decoder                     : 0 %
        Ecc Mode
            Current                     : Enabled
            Pending                     : Enabled
        ECC Errors
            Volatile
                Single Bit
                    Device Memory       : 0
                    Register File       : N/A
                    L1 Cache            : N/A
                    L2 Cache            : N/A
                    Texture Memory      : N/A
                    Total               : 0
                Double Bit
                    Device Memory       : 0
                    Register File       : N/A
                    L1 Cache            : N/A
                    L2 Cache            : N/A
                    Texture Memory      : N/A
                    Total               : 0
            Aggregate
                Single Bit
                    Device Memory       : 0
                    Register File       : N/A
                    L1 Cache            : N/A
                    L2 Cache            : N/A
                    Texture Memory      : N/A
                    Total               : 0
                Double Bit
                    Device Memory       : 0
                    Register File       : N/A
                    L1 Cache            : N/A
                    L2 Cache            : N/A
                    Texture Memory      : N/A
                    Total               : 0
        Retired Pages
            Single Bit ECC              : 0
            Double Bit ECC              : 0
            Pending                     : No
        Temperature
            GPU Current Temp            : 51 C
            GPU Shutdown Temp           : 92 C
            GPU Slowdown Temp           : 89 C
        Power Readings
            Power Management            : Supported
            Power Draw                  : 124.63 W
            Power Limit                 : 250.00 W
            Default Power Limit         : 250.00 W
            Enforced Power Limit        : 250.00 W
            Min Power Limit             : 180.00 W
            Max Power Limit             : 250.00 W
        Clocks
            Graphics                    : 1113 MHz
            SM                          : 1113 MHz
            Memory                      : 3004 MHz
            Video                       : 1025 MHz
        Applications Clocks
            Graphics                    : 1114 MHz
            Memory                      : 3004 MHz
        Default Applications Clocks
            Graphics                    : 947 MHz
            Memory                      : 3004 MHz
        Max Clocks
            Graphics                    : 1114 MHz
            SM                          : 1114 MHz
            Memory                      : 3004 MHz
            Video                       : 1024 MHz
        Clock Policy
            Auto Boost                  : On
            Auto Boost Default          : On
        Processes                       : None
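The full report above is convenient to read but awkward to script against. As a rough sketch, the same kind of data can be pulled in CSV form with nvidia-smi's `--query-gpu` option and parsed in Python. The field selection and the helper names `parse_gpu_csv` and `query_gpus` are illustrative choices rather than part of any NVIDIA tooling, and the parser assumes every requested field comes back as a number; a field reported as N/A would need extra handling.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi; this particular selection is only an example.
FIELDS = ["name", "memory.total", "memory.used",
          "power.draw", "power.limit", "temperature.gpu"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        gpu = dict(zip(FIELDS, (value.strip() for value in record)))
        for key in FIELDS[1:]:
            gpu[key] = float(gpu[key])  # assumes numeric values, not "[N/A]"
        gpus.append(gpu)
    return gpus


def query_gpus():
    """Run nvidia-smi on the local machine (requires an NVIDIA driver)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    return parse_gpu_csv(subprocess.check_output(cmd, text=True))


# Sample line assembled by hand from the values in the report above,
# so the parsing logic can be exercised without a GPU.
sample = "Tesla M40, 23039, 23009, 124.63, 250.00, 51\n"
gpu = parse_gpu_csv(sample)[0]

mem_pct = 100.0 * gpu["memory.used"] / gpu["memory.total"]
power_pct = 100.0 * gpu["power.draw"] / gpu["power.limit"]
print(f"{gpu['name']}: {mem_pct:.1f}% of memory in use, "
      f"{power_pct:.1f}% of the power limit")
```

On a machine with the driver installed, calling `query_gpus()` instead of parsing the sample string would return one dictionary per attached GPU.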
NVIDIA deviceQuery on Tesla M40 24GB
The deviceQuery output below, from the CUDA 7.5 SDK samples, shows the architecture and capabilities of the Tesla M40 24GB GPU accelerator.
    deviceQuery Starting...

     CUDA Device Query (Runtime API) version (CUDART static linking)

    Detected 1 CUDA Capable device(s)

    Device 0: "Tesla M40"
      CUDA Driver Version / Runtime Version          8.0 / 7.5
      CUDA Capability Major/Minor version number:    5.2
      Total amount of global memory:                 23040 MBytes (24159059968 bytes)
      (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
      GPU Max Clock rate:                            1112 MHz (1.11 GHz)
      Memory Clock rate:                             3004 Mhz
      Memory Bus Width:                              384-bit
      L2 Cache Size:                                 3145728 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
      Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
      Compute Mode:
         < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Tesla M40
    Result = PASS
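The headline specifications can be roughly reproduced from the raw values in the deviceQuery output above. The Python sketch below does that arithmetic under two stated assumptions that the output itself does not spell out: each CUDA core completes one fused multiply-add (counted as 2 FLOPs) per clock, and the effective memory data rate is twice the reported memory clock.

```python
# Values copied from the deviceQuery output above.
multiprocessors = 24
cores_per_mp = 128
max_clock_mhz = 1112
memory_clock_mhz = 3004
bus_width_bits = 384
global_mem_bytes = 24159059968

# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2
# Assumption: effective data rate is twice the reported memory clock.
TRANSFERS_PER_CLOCK = 2

cuda_cores = multiprocessors * cores_per_mp

# Peak single-precision throughput in TFLOPS.
peak_sp_tflops = (cuda_cores * FLOPS_PER_CORE_PER_CLOCK
                  * max_clock_mhz * 1e6 / 1e12)

# Peak memory bandwidth in GB/s: transfers per second times bytes per transfer.
peak_bandwidth_gbs = (memory_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
                      * (bus_width_bits / 8) / 1e9)

# Global memory expressed in binary units.
global_mem_mib = global_mem_bytes / 2**20
global_mem_gib = global_mem_bytes / 2**30

print(f"{cuda_cores} CUDA cores")
print(f"{peak_sp_tflops:.2f} TFLOPS single-precision at {max_clock_mhz} MHz")
print(f"{peak_bandwidth_gbs:.1f} GB/s peak memory bandwidth")
print(f"{global_mem_mib:.3f} MiB ({global_mem_gib:.2f} GiB) global memory")
```

Under these assumptions the result is a little over 6.8 TFLOPS, which the specification list rounds to 7.0, and about 288 GB/s, matching the quoted bandwidth. The reported global memory is close to 15/16 of 24 GiB; that would be consistent with ECC, which is enabled on this card, reserving part of the GDDR5 capacity, but this explanation is an inference and not something the output states.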
Additional Tesla M40 24GB Information
To learn more about the differences between the Tesla M40 24GB and other versions of the Tesla product line, please review our “Kepler” and “Maxwell” Tesla GPU knowledge center articles:
- In-Depth Comparison of NVIDIA Tesla “Kepler” GPU Accelerators
- In-Depth Comparison of NVIDIA Tesla “Maxwell” GPU Accelerators
To learn more about GPU-accelerated servers and clusters which provide the Tesla M40 24GB, please see our NVIDIA GPU technology page. Although we are able to provide the M40 in tower workstation systems, the design of the heatsink does not allow for quiet workstations.
This post was last updated on 2016-06-23