
NVIDIA Datacenter GPU Manager (DCGM) for More Effective GPU Management

Posted on April 1, 2018 by Brett Newman

Managing an HPC server can be a tricky job, and managing multiple servers is even more complex. Adding GPUs brings more computational power, but also new levels of granularity to manage. Luckily, there’s a powerful and effective tool for managing multiple servers or a cluster of GPUs: NVIDIA Datacenter GPU Manager (DCGM).

Executing hardware or health checks

DCGM’s power comes from its ability to access all kinds of low-level data from the GPUs in your system. Much of this data is reported by NVML (NVIDIA Management Library), and some of it may be accessible via IPMI on your system. But DCGM makes it far easier to access and use the following:

Report which GPUs are installed, in which slots and PCI-E trees, and make a group

Once you know which slots your GPUs are installed in, and which PCI-E trees and NUMA nodes they sit on, you can build a group of GPUs. This is useful for binding jobs to specific GPUs and for matching jobs to the capabilities available.
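As a minimal sketch of the grouping idea, the Python below buckets GPUs by NUMA node and PCI-E root, assuming the topology has already been discovered (for example with `nvidia-smi topo -m`, or DCGM's `dcgmi discovery` and `dcgmi group` subcommands). The `GpuInfo` record, bus IDs, and switch labels are hypothetical illustrations; DCGM itself keeps its groups in its own host engine.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuInfo:
    index: int       # GPU index as enumerated by the driver
    bus_id: str      # PCI bus ID (hypothetical values below)
    numa_node: int   # NUMA node (CPU socket) the GPU's PCI-E tree hangs off
    pcie_root: str   # hypothetical label for the PCI-E switch/root the GPU sits under

def group_gpus(gpus):
    """Group GPU indices by (NUMA node, PCI-E root) so a job can be pinned
    to GPUs that share a CPU socket and PCI-E tree."""
    groups = defaultdict(list)
    for gpu in gpus:
        groups[(gpu.numa_node, gpu.pcie_root)].append(gpu.index)
    return {key: sorted(members) for key, members in sorted(groups.items())}

# Hypothetical 4-GPU, 2-socket node.
inventory = [
    GpuInfo(0, "0000:18:00.0", 0, "switch-A"),
    GpuInfo(1, "0000:3B:00.0", 0, "switch-A"),
    GpuInfo(2, "0000:86:00.0", 1, "switch-B"),
    GpuInfo(3, "0000:AF:00.0", 1, "switch-B"),
]
for (node, root), members in group_gpus(inventory).items():
    print(f"NUMA {node}, {root}: GPUs {members}")
```

Each resulting group is a natural candidate for a DCGM GPU group or for a scheduler's CPU/GPU affinity binding.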

Determine GPU link states and bandwidths

Report the PCI-Express link speed at which each GPU is running. You can also run device-to-device (D2D) and host-to-device (H2D) bandwidth tests inside your system and act on the results.
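To interpret a reported link state, it helps to convert generation and width into theoretical bandwidth. The sketch below assumes the current and maximum link generation/width have already been read (for instance from `nvidia-smi -q` or DCGM); it uses the standard PCI-E signaling rates and line encodings and ignores protocol overhead. One caveat: GPUs often drop to a lower link generation while idle to save power, so a "degraded" result should be re-checked under load.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency by PCI-E generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}

def pcie_bandwidth_gbs(gen, width):
    """Theoretical one-direction bandwidth in GB/s (protocol overhead ignored)."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency / 8 * width   # 8 bits per byte

def check_link(gpu, cur_gen, cur_width, max_gen=3, max_width=16):
    """Flag a GPU whose current link is slower than what the slot supports."""
    cur = pcie_bandwidth_gbs(cur_gen, cur_width)
    best = pcie_bandwidth_gbs(max_gen, max_width)
    status = "OK" if cur >= best else "DEGRADED (re-check under load)"
    return f"GPU {gpu}: Gen{cur_gen} x{cur_width} = {cur:.2f} GB/s of {best:.2f} GB/s -> {status}"

print(check_link(0, 3, 16))
print(check_link(1, 3, 8))   # hypothetical GPU trained at half width
```

A Gen3 x16 link works out to about 15.75 GB/s per direction, or roughly 32 GB/s bidirectional, which is the PCI-E figure usually quoted for these GPUs.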

Read temps, boost states, power consumption, or utilization

Deliver data on the temperature, energy usage, and utilization of your GPUs. This data can be used to manage and control the cluster.
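Power readings become energy figures by integrating them over time. The sketch below applies the trapezoidal rule to (timestamp, watts) samples; it assumes the samples were already collected and converted to watts (NVML, for example, reports instantaneous power draw in milliwatts), and the readings shown are hypothetical.

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)   # average power x interval length
    return total

# Hypothetical 1 Hz power readings (watts) for one GPU during a short job.
readings = [(0, 60.0), (1, 250.0), (2, 250.0), (3, 240.0), (4, 70.0)]
joules = energy_joules(readings)
print(f"{joules:.0f} J = {joules / 3.6e6:.6f} kWh")
```

Summing this per job or per user is one simple way to turn monitoring data into accounting data.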

Driver versions and CUDA versions

Report on the versions of CUDA, NVML, and the NVIDIA GPU driver installed on your system.
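Version checks are easy to automate once the raw values are in hand. The CUDA driver API's `cuDriverGetVersion()` (and NVML's `nvmlSystemGetCudaDriverVersion()` on recent drivers) returns the CUDA version as one integer encoded as 1000 × major + 10 × minor. The sketch below decodes that integer and compares it with a hypothetical minimum requirement; it does not call either library itself.

```python
def decode_cuda_version(version):
    """Split an encoded CUDA version (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(version, 1000)
    return major, rest // 10

def driver_supports(driver_version, required):
    """True if the driver's CUDA version is at least `required` = (major, minor)."""
    return decode_cuda_version(driver_version) >= required

for raw in (9010, 9020, 10000):
    major, minor = decode_cuda_version(raw)
    print(f"{raw} -> CUDA {major}.{minor}")

# Hypothetical requirement: an application built against CUDA 9.0.
print(driver_supports(9010, (9, 0)))
```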

Run sample jobs and integrated validation

Run basic diagnostics and sample jobs that are built into the DCGM package.
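DCGM's built-in diagnostics are exposed through `dcgmi diag`, where the `-r` option selects a run level from 1 (quick) to 3 (longest and most thorough). The wrapper below is a sketch that assumes DCGM is installed and its host engine (`nv-hostengine`) is running; it builds the command, runs it only if `dcgmi` is on the PATH, and returns the exit code and output for a script or scheduler prologue to inspect.

```python
import shutil
import subprocess

def dcgm_diag_command(level=1):
    """Build the dcgmi command for a diagnostic run level (1-3)."""
    if level not in (1, 2, 3):
        raise ValueError("run level must be 1, 2 or 3")
    return ["dcgmi", "diag", "-r", str(level)]

def run_dcgm_diag(level=1):
    """Run the diagnostic if dcgmi is installed; return (exit code, output)."""
    cmd = dcgm_diag_command(level)
    if shutil.which(cmd[0]) is None:
        return None, "dcgmi not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout

code, output = run_dcgm_diag(level=1)
print(code, output)
```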

Set policies

DCGM provides a mechanism to set policies on a group of GPUs.
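DCGM's own policies (managed with `dcgmi policy`) watch a GPU group for conditions such as thermal or power limits and ECC errors. The plain-Python sketch below only mimics that idea so the logic is visible: the `Policy` fields, thresholds, and sample readings are hypothetical and are not DCGM's data structures.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """Hypothetical group policy: limits that should trigger an action."""
    max_temp_c: float
    max_power_w: float
    allow_ecc_dbe: bool = False   # tolerate double-bit ECC errors?

def check_policy(policy, readings):
    """Return (gpu, reason) pairs for every limit exceeded in one sampling pass."""
    violations = []
    for gpu, r in sorted(readings.items()):
        if r["temp_c"] > policy.max_temp_c:
            violations.append((gpu, f"temperature {r['temp_c']} C > {policy.max_temp_c} C"))
        if r["power_w"] > policy.max_power_w:
            violations.append((gpu, f"power {r['power_w']} W > {policy.max_power_w} W"))
        if r["ecc_dbe"] > 0 and not policy.allow_ecc_dbe:
            violations.append((gpu, f"{r['ecc_dbe']} double-bit ECC error(s)"))
    return violations

group_policy = Policy(max_temp_c=85, max_power_w=250)
sample = {  # hypothetical readings for a 2-GPU group
    0: {"temp_c": 71, "power_w": 240, "ecc_dbe": 0},
    1: {"temp_c": 88, "power_w": 255, "ecc_dbe": 1},
}
for gpu, reason in check_policy(group_policy, sample):
    print(f"GPU {gpu}: {reason}")
```

In DCGM the equivalent violations can trigger actions such as a GPU reset or a validation run; here they are simply reported.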

Continue reading →

Designing A Production-Class AI Cluster

Posted on October 27, 2017 by Sam Wheeler

Artificial Intelligence (AI) and, more specifically, Deep Learning (DL) are revolutionizing the way businesses use the vast amounts of data they collect and how researchers accelerate their time to discovery. Some of the most significant examples come from the ways AI has already changed everyday life, such as smartphone speech recognition, search engine image classification, and cancer detection in biomedical imaging. In recent years, most businesses have collected troves of data or opened new avenues for collecting it. Through the innovations of deep learning, that same data can be used to gain insight, make accurate predictions, and pave the path to discovery.

Developing a plan to integrate AI workloads into an existing business infrastructure or research group presents many challenges. However, two key elements drive the decisions when customizing an AI cluster. First, understanding the types and volumes of data is the essential starting point for understanding the computational requirements of training the neural network. Second, understanding the business expectation for time to results is equally important. These two factors influence the first and second stages of the AI workload, respectively. Underestimating the data characteristics will result in insufficient computational and infrastructure resources to train the networks in a reasonable timeframe. Moreover, underestimating the value and requirement of time to results can fail to deliver ROI to the business or hamper research results.
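The link between data volume, compute, and time to results can be made concrete with a back-of-the-envelope estimate. The sketch below assumes training work scales as samples × epochs × FLOPs per sample (forward plus backward pass) and that the cluster sustains a fixed fraction of peak throughput; every number in it is hypothetical, and real multi-GPU scaling is rarely perfectly linear.

```python
import math

def total_training_flops(num_samples, epochs, flops_per_sample):
    """Work to train: each sample is processed once per epoch (forward + backward)."""
    return num_samples * epochs * flops_per_sample

def training_hours(total_flops, num_gpus, peak_tflops_per_gpu, efficiency):
    """Wall-clock hours when sustaining `efficiency` (0-1) of peak throughput."""
    sustained = num_gpus * peak_tflops_per_gpu * 1e12 * efficiency
    return total_flops / sustained / 3600

def gpus_for_deadline(total_flops, deadline_hours, peak_tflops_per_gpu, efficiency):
    """Smallest GPU count that meets the deadline, assuming linear scaling."""
    per_gpu = peak_tflops_per_gpu * 1e12 * efficiency
    return math.ceil(total_flops / (per_gpu * deadline_hours * 3600))

# Hypothetical workload and hardware numbers, for illustration only.
work = total_training_flops(num_samples=1_200_000, epochs=90, flops_per_sample=2e9)
print(f"{training_hours(work, num_gpus=8, peak_tflops_per_gpu=15, efficiency=0.5):.1f} h on 8 GPUs")
print(gpus_for_deadline(work, deadline_hours=0.5, peak_tflops_per_gpu=15, efficiency=0.5),
      "GPUs for a 30-minute target")
```

Doubling the data set or halving the acceptable time to results shows up directly as a doubling of the GPU count, which is why both factors need to be pinned down before sizing a cluster.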

Below are summaries of the different features of system design that must be evaluated when configuring an AI cluster.

Continue reading →

Tesla V100 “Volta” GPU Review

Posted on September 28, 2017 by Brett Newman

Index of our review:
Speeds and Feeds
Which GPU is for me?
Enhanced NVLink
Programming Improvements
What Does Volta Mean for Me?
Tesla V100 SXM 2.0 GPU

The next-generation NVIDIA Volta architecture is here. With it comes the new Tesla V100 “Volta” GPU, the most advanced datacenter GPU ever built.

Volta is NVIDIA’s second GPU architecture in roughly 12 months, and it builds upon the massive advancements of the Pascal architecture. Whether your workload is in HPC, AI, or even remote visualization and graphics acceleration, Tesla V100 has something for you.

Two Flavors, one giant leap: Tesla V100 PCI-E & Tesla V100 with NVLink

For those who love speeds and feeds, here’s a summary of the key enhancements of Tesla V100 over Tesla P100 GPUs:

Performance of Tesla GPUs, Generation to Generation

Metric                        V100 NVLink   V100 PCI-E    P100 NVLink   P100 PCI-E    Ratio V100:P100
DP TFLOPS                     7.8           7.0           5.3           4.7           ~1.4-1.5X
SP TFLOPS                     15.7          14.0          10.6          9.3           ~1.5X
Tensor TFLOPS (P100: FP16)    125           112           21.2          18.7          ~6X
Interface BW (bidirectional)  300GB/sec     32GB/sec      160GB/sec     32GB/sec      1.88X (NVLink)*
Memory Bandwidth              900GB/sec     900GB/sec     720GB/sec     720GB/sec     1.25X
CUDA Cores (Tensor Cores)     5120 (640)    5120 (640)    3584 (none)   3584 (none)   ~1.4X

* V100 NVLink also provides 9.38X the bidirectional bandwidth of PCI-E (300 vs 32GB/sec); PCI-E bandwidth is the same for both generations. P100 has no Tensor Cores, so its Tensor row shows half-precision (FP16) TFLOPS.
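The ratio column can be re-derived from the individual spec values. The short check below uses the DP, tensor/FP16, memory bandwidth, and CUDA core figures from the table, computing the V100:P100 ratio separately for the NVLink and PCI-E form factors, plus the two interface comparisons. Note that the tensor row compares V100 Tensor Core throughput with P100 FP16 throughput, so it is not a like-for-like comparison.

```python
# Values from the table: metric, V100 NVLink, V100 PCI-E, P100 NVLink, P100 PCI-E.
ROWS = [
    ("DP TFLOPS", 7.8, 7.0, 5.3, 4.7),
    ("Tensor (V100) vs FP16 (P100) TFLOPS", 125.0, 112.0, 21.2, 18.7),
    ("Memory bandwidth, GB/sec", 900, 900, 720, 720),
    ("CUDA cores", 5120, 5120, 3584, 3584),
]

def ratios(row):
    """Return (NVLink ratio, PCI-E ratio) of V100 over P100 for one table row."""
    _, v_nvl, v_pcie, p_nvl, p_pcie = row
    return v_nvl / p_nvl, v_pcie / p_pcie

for row in ROWS:
    nvl, pcie = ratios(row)
    print(f"{row[0]}: NVLink {nvl:.2f}X, PCI-E {pcie:.2f}X")

# Interface bandwidth (bidirectional, GB/sec).
print(f"V100 vs P100 NVLink: {300 / 160:.2f}X; V100 NVLink vs PCI-E: {300 / 32:.2f}X")
```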

Selecting the right Tesla V100 for you:

With Tesla P100 “Pascal” GPUs, there was a substantial price premium for the NVLink-enabled SXM2 form factor GPUs. We’re excited to see pricing even out for Tesla V100.

However, that doesn’t mean selecting a GPU is as simple as picking one that matches a system design. Here’s some guidance to help you evaluate your options:
Continue reading →

One-shot Learning Methods Applied to Drug Discovery with DeepChem

Posted on July 26, 2017 by John Murphy

Experimental data sets for drug discovery are sometimes limited in size because this type of data is difficult to gather. Drug discovery data sets are expensive to obtain, and some are the result of clinical trials, which might not be repeatable for ethical reasons. The ClinTox data set, for example, consists of data from FDA clinical trials of drug candidates, some of which failed due to toxic side effects [2]. When training data is scarce, one-shot learning methods have demonstrated significantly better performance than methods that use graph convolutional networks alone. The performance of one-shot network architectures is discussed here for several drug discovery data sets, which are described in Table 1. These data sets, along with one-shot learning methods, have been integrated into the DeepChem deep learning framework as a result of research published by Altae-Tran et al. [1]. While data remains scarce in some problem domains, such as drug discovery, one-shot learning methods could offer an important alternative network architecture, one that may far outperform methods using graph convolution alone.
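The core idea behind one-shot classification can be shown without any framework: embed a query molecule, compare it with a handful of labeled "support" examples, and predict a label as an attention-weighted vote. The sketch below is a simplified, matching-network-style illustration with hypothetical embedding vectors; the architectures in [1] add learned graph-convolutional embeddings and an iterative refinement step, and this is not DeepChem's implementation or API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def one_shot_predict(query, support):
    """Attention-weighted label from a small labeled support set.

    support: list of (embedding, label) pairs, label 1 (e.g. toxic) or 0.
    Returns the predicted probability that the query has label 1.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    m = max(sims)
    weights = [math.exp(s - m) for s in sims]   # softmax over similarities
    total = sum(weights)
    return sum(w * label for w, (_, label) in zip(weights, support)) / total

# Hypothetical embeddings; in a real model these would come from a
# graph-convolutional encoder trained on related tasks.
support_set = [([1.0, 0.0, 0.2], 1), ([0.0, 1.0, 0.1], 0)]
print(round(one_shot_predict([0.9, 0.1, 0.2], support_set), 3))
```

Because prediction only requires comparing against the support set, a new task with one or a few labeled examples per class can be handled without retraining the whole network.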

Continue reading →

DeepChem – a Deep Learning Framework for Drug Discovery

Posted on April 28, 2017 by John Murphy

A powerful new open source deep learning framework for drug discovery is now available for public download on GitHub. This new framework, called DeepChem, is Python-based and offers a feature-rich set of functionality for applying deep learning to problems in drug discovery and cheminformatics. Other machine learning frameworks, such as scikit-learn, have previously been applied to cheminformatics, but DeepChem is the first to accelerate computation with NVIDIA GPUs.

The framework uses Google TensorFlow, along with scikit-learn, to express its models. It also makes use of RDKit, a cheminformatics toolkit with Python bindings, for performing more basic operations on molecular data, such as converting SMILES strings into molecular graphs. The framework is now in the alpha stage, at version 0.1. As the framework develops, it will move toward implementing more models in TensorFlow, which use GPUs for training and inference. This new open source framework is poised to become an accelerating factor for innovation in drug discovery across industry and academia.
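To make "converting SMILES strings into molecular graphs" concrete, here is a toy parser for a deliberately tiny subset of SMILES: uppercase organic atoms, explicit single/double/triple bonds, and parenthesized branches. It is a hypothetical illustration of what the output graph looks like, not how RDKit works; it does not handle rings, aromatic atoms, charges, or bracket atoms, and real code should use RDKit (for example `Chem.MolFromSmiles`).

```python
# Toy illustration only: a tiny SMILES subset (no rings, aromaticity, or brackets).
ATOMS = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")  # two-letter symbols first
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

def smiles_to_graph(smiles):
    """Return (atoms, bonds): atom symbols and (i, j, bond_order) edges."""
    atoms, bonds = [], []
    stack = []      # atom to return to when a branch closes
    prev = None     # index of the atom the next atom bonds to
    order = 1       # pending bond order for the next bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)
            i += 1
        elif ch == ")":
            prev = stack.pop()
            i += 1
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
            i += 1
        else:
            for sym in ATOMS:
                if smiles.startswith(sym, i):
                    break
            else:
                raise ValueError(f"unsupported token at {i}: {smiles[i:]!r}")
            atoms.append(sym)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
            i += len(sym)
    return atoms, bonds

atoms, bonds = smiles_to_graph("CC(=O)O")  # acetic acid
print(atoms)
print(bonds)
```

The atom list becomes the graph's nodes and the bond list its edges, which is the kind of structure a graph-convolutional model consumes.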

Continue reading →

GPU-accelerated HPC Containers with Singularity

Posted on April 11, 2017 by Eliot Eshelman

Fighting with application installations is frustrating and time consuming. It’s not what domain experts should be spending their time on. And yet, every time users move their project to a new system, they have to begin again with a re-assembly of their complex workflow.

This is a problem that containers can help to solve. HPC groups have had some success with more traditional containers (e.g., Docker), but there are security concerns that have made them difficult to use on HPC systems. Singularity, the new tool from the creator of CentOS and Warewulf, aims to resolve these issues.

Continue reading →

NVIDIA Tesla P40 GPU Accelerator (Pascal GP102) Up Close

Posted on February 7, 2017 by Eliot Eshelman

As NVIDIA’s GPUs become increasingly vital to the fields of AI and intelligent machines, NVIDIA has produced GPU models specifically targeted to these applications. The new Tesla P40 GPU is NVIDIA’s premier product for deep learning deployments. It is specifically designed for high-speed inference workloads, which means running data through pre-trained neural networks. However, it also offers significant processing performance for projects that do not require 64-bit double-precision floating point capability (many neural networks can be trained using the 32-bit single-precision floating point on the Tesla P40). For those cases, these GPUs can be used to accelerate both the neural network training and the inference.

Continue reading →

Deep Learning Benchmarks of NVIDIA Tesla P100 PCIe, Tesla K80, and Tesla M40 GPUs

Posted on January 27, 2017 by John Murphy

Sources of CPU benchmarks, used for estimating performance on similar workloads, have been available throughout the course of CPU development. For example, the Standard Performance Evaluation Corporation (SPEC) has compiled a large set of application benchmarks, run on a variety of CPUs across a multitude of systems. There are certainly benchmarks for GPUs, but only during the past year has an organized set of deep learning benchmarks been published. Called DeepMarks, these deep learning benchmarks are available to all developers who want to get a sense of how their application might perform across various deep learning frameworks.

The benchmarking scripts used for the DeepMarks study are published on GitHub. The original DeepMarks study was run on a Titan X GPU (Maxwell microarchitecture) with 12GB of onboard video memory. Here we examine the performance of several deep learning frameworks on a variety of Tesla GPUs, including the Tesla P100 16GB PCIe, Tesla K80, and Tesla M40 12GB GPUs.
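Benchmark numbers are only comparable across GPUs if they are measured the same way. The sketch below shows a generic timing pattern (discard warm-up iterations, then report the median time per step); it is not taken from the DeepMarks scripts, and `dummy_step` and the batch size are hypothetical stand-ins for one training step of a real network.

```python
import statistics
import time

def benchmark(step, warmup=3, iters=10):
    """Time a callable: discard warm-up runs, return median seconds per call."""
    for _ in range(warmup):
        step()   # warm-up lets caches, memory allocation and autotuning settle
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Hypothetical stand-in workload. A real benchmark would time one
# forward/backward pass on a fixed-size batch; because GPU work is
# asynchronous, the step must synchronize with the device before returning.
def dummy_step():
    sum(i * i for i in range(100_000))

batch_size = 64   # hypothetical
sec = benchmark(dummy_step)
print(f"{sec * 1e3:.2f} ms/batch, {batch_size / sec:.0f} samples/sec")
```

Using the median rather than the mean keeps one-off stalls (for example a checkpoint write or a page fault) from skewing the comparison.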

Continue reading →

Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers

Posted on January 26, 2017 by Eliot Eshelman

The new NVIDIA Tesla P100 GPUs are available with both PCI-Express and NVLink connectivity. How do these two types of connectivity compare? This post provides a rundown of NVLink vs PCI-E and explores the benefits of NVIDIA’s new NVLink technology.

Photo of NVIDIA Tesla P100 NVLink GPUs in an OpenPOWER server

Continue reading →

NVIDIA Tesla P100 NVLink 16GB GPU Accelerator (Pascal GP100 SXM2) Up Close

Posted on January 18, 2017 by Eliot Eshelman

The NVIDIA Tesla P100 NVLink GPUs are a big advancement. For the first time, the GPU is stepping outside the traditional “add in card” design. No longer tied to the fixed specifications of PCI-Express cards, NVIDIA’s engineers have designed a new form factor that best suits the needs of the GPU. With their SXM2 design, NVIDIA can run GPUs to their full potential.

One of the biggest changes this design allows is the NVLink interconnect, which lets GPUs operate beyond the restrictions of the PCI-Express bus. Instead of relying on PCI-E, the GPUs communicate with one another over this high-speed link. Additionally, these new “Pascal” architecture GPUs bring improvements including higher performance, faster connectivity, and more flexibility for users and programmers.

Continue reading →

