Category Archives: Benchmarking

NVIDIA Tesla P40 GPU Accelerator (Pascal GP102) Up Close

As NVIDIA’s GPUs become increasingly vital to the fields of AI and intelligent machines, NVIDIA has produced GPU models specifically targeted to these applications. The new Tesla P40 GPU is NVIDIA’s premiere product for deep learning deployments. It is specifically … Continue reading

Deep Learning Benchmarks of NVIDIA Tesla P100 PCIe, Tesla K80, and Tesla M40 GPUs

Sources of CPU benchmarks, used for estimating performance on similar workloads, have been available throughout the course of CPU development. For example, the Standard Performance Evaluation Corporation has compiled a large set of applications benchmarks, running on a variety of … Continue reading

Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers

The new NVIDIA Tesla P100 GPUs are available with both PCI-Express and NVLink connectivity. How do these two types of connectivity compare? This post provides a rundown of NVLink vs PCI-E and explores the benefits of NVIDIA’s new NVLink technology.

NVIDIA Tesla P100 NVLink 16GB GPU Accelerator (Pascal GP100 SXM2) Up Close

The NVIDIA Tesla P100 NVLink GPUs are a big advancement. For the first time, the GPU is stepping outside the traditional “add in card” design. No longer tied to the fixed specifications of PCI-Express cards, NVIDIA’s engineers have designed a … Continue reading

NVIDIA Tesla P100 PCI-E 16GB GPU Accelerator (Pascal GP100) Up Close

NVIDIA’s new Tesla P100 PCI-E GPU is a big step up for HPC users, and for GPU users in general. Although other workloads have been leveraging the newer “Maxwell” architecture, HPC applications have been using “Kepler” GPUs for a couple … Continue reading

More Tips on OpenACC Acceleration

One blog post may not be enough to present all tips for performance acceleration using OpenACC. So here, more tips on OpenACC acceleration are provided, complementing our previous blog post on accelerating code with OpenACC. Further tips discussed here are: … Continue reading

NVIDIA Tesla M40 24GB GPU Accelerator (Maxwell GM200) Up Close

NVIDIA has announced a new version of their popular Tesla M40 GPU – one with 24GB of high-speed GDDR5 memory. The name hasn’t really changed – the new GPU is named NVIDIA Tesla M40 24GB. If you are curious about … Continue reading

DDR4 Memory on Xeon E5-2600v3 with 3 DIMMs per channel

This week I had the opportunity to run the STREAM memory benchmark on a Microway 2U NumberSmasher server which supports up to 3 DIMMs per channel.  In practice, this system is typically configured with 768GB or 1.5TB of DDR4 memory. … Continue reading

NVIDIA Tesla M40 12GB GPU Accelerator (Maxwell GM200) Up Close

With the release of Tesla M40, NVIDIA continues to diversify its professional compute GPU lineup. Designed specifically for Deep Learning applications, the M40 provides 7 TFLOPS of single-precision floating point performance and 12GB of high-speed GDDR5 memory. It works extremely … Continue reading

Caffe Deep Learning Tutorial using NVIDIA DIGITS on Tesla K80 & K40 GPUs

In this Caffe deep learning tutorial, we will show how to use DIGITS in order to train a classifier on a small image set.  Along the way, we’ll see how to adjust certain run-time parameters, such as the learning rate, … Continue reading