Tesla V100 “Volta” GPU Review

The next-generation NVIDIA Volta architecture is here. With it comes the new Tesla V100 “Volta” GPU, the most advanced datacenter GPU ever built.

Volta is NVIDIA’s 2nd GPU architecture in ~12 months, and it builds upon the massive advancements of the Pascal architecture. Whether your workload is in HPC, AI, or even remote visualization & graphics acceleration, Tesla V100 has something for you.

Two Flavors, One Giant Leap: Tesla V100 PCI-E & Tesla V100 with NVLink

For those who love speeds and feeds, here’s a summary of the key enhancements versus the Tesla P100 GPUs:

Performance of Tesla GPUs, Generation to Generation
| | Tesla V100 with NVLink | Tesla V100 PCI-E | Tesla P100 with NVLink | Tesla P100 PCI-E | Ratio, Tesla V100 : P100 |
|---|---|---|---|---|---|
| DP TFLOPS | 7.5 TFLOPS | 7.0 TFLOPS | 5.3 TFLOPS | 4.7 TFLOPS | ~1.4-1.5X |
| SP TFLOPS | 15 TFLOPS | 14 TFLOPS | 10.6 TFLOPS | 9.3 TFLOPS | ~1.4-1.5X |
| TensorFLOPS | 120 TFLOPS | 112 TFLOPS | 21.2 TFLOPS (half precision) | 18.7 TFLOPS (half precision) | ~6X |
| Interface (bidirectional bandwidth) | 300 GB/sec | 32 GB/sec | 160 GB/sec | 32 GB/sec | 1.88X NVLink; 9.38X NVLink vs PCI-E |
| Memory Bandwidth | 900 GB/sec | 900 GB/sec | 720 GB/sec | 720 GB/sec | 1.25X |
| CUDA Cores (Tensor Cores) | 5120 (640) | 5120 (640) | 3584 | 3584 | |
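As a quick sanity check on the ratio column, the generation-to-generation speedups can be reproduced from the peak figures above with a few lines of Python (values taken from the NVLink columns; the Tensor ratio compares V100 TensorFLOPS against P100 half-precision TFLOPS):

```python
# Peak figures for the NVLink models, taken from the table above (TFLOPS, GB/sec).
v100 = {"dp": 7.5, "sp": 15.0, "tensor": 120.0, "mem_bw": 900.0, "link_bw": 300.0}
p100 = {"dp": 5.3, "sp": 10.6, "half": 21.2, "mem_bw": 720.0, "link_bw": 160.0}

print(f"DP FLOPS    V100:P100 = {v100['dp'] / p100['dp']:.2f}X")            # ~1.42X
print(f"SP FLOPS    V100:P100 = {v100['sp'] / p100['sp']:.2f}X")            # ~1.42X
print(f"Tensor vs. half precision = {v100['tensor'] / p100['half']:.2f}X")  # ~5.66X, i.e. ~6X
print(f"Memory bandwidth ratio    = {v100['mem_bw'] / p100['mem_bw']:.2f}X")   # 1.25X
print(f"NVLink bandwidth ratio    = {v100['link_bw'] / p100['link_bw']:.2f}X") # ~1.88X
```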

Selecting the right Tesla V100 for you:

With Tesla P100 “Pascal” GPUs, there was a substantial price premium for the NVLink-enabled SXM2 form factor GPUs. We’re excited to see things even out for Tesla V100.

However, that doesn’t mean selecting a GPU is as simple as picking one that matches a system design. Here’s some guidance to help you evaluate your options:
Continue reading

One-shot Learning Methods Applied to Drug Discovery with DeepChem

Experimental data sets for drug discovery are sometimes limited in size, due to the difficulty of gathering this type of data. Drug discovery data sets are expensive to obtain, and some are the result of clinical trials, which might not be repeatable for ethical reasons. The ClinTox data set, for example, comprises data from FDA clinical trials of drug candidates, where some of the data derive from failures due to toxic side effects [2]. For cases where training data is scarce, the application of one-shot learning methods has demonstrated significantly improved performance over methods based only on graph convolutional networks. The performance of one-shot network architectures will be discussed here for several drug discovery data sets, which are described in Table 1. These data sets, along with one-shot learning methods, have been integrated into the DeepChem deep learning framework as a result of research published by Altae-Tran et al. [1]. While data remains scarce for some problem domains, such as drug discovery, one-shot learning methods could provide an important alternative network architecture, one that can potentially far outperform methods using only graph convolutions.
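For readers who want to try these data sets themselves, DeepChem exposes them through its MoleculeNet loaders. The sketch below is illustrative only; the loader name, keyword arguments, and return values are assumptions that vary between DeepChem releases, so check the documentation for your version.

```python
# Minimal sketch: pull down one of the drug discovery data sets discussed above
# and featurize it as molecular graphs for a graph-convolution or one-shot model.
# Assumes a DeepChem release that provides dc.molnet.load_tox21().
import deepchem as dc

tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="GraphConv")
train, valid, test = datasets

print(f"{len(tasks)} prediction tasks")
print(f"{len(train)} training molecules, {len(valid)} validation, {len(test)} test")
```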

Continue reading

DeepChem – a Deep Learning Framework for Drug Discovery

A powerful new open source deep learning framework for drug discovery is now available for public download on GitHub. This new framework, called DeepChem, is Python-based and offers a rich set of functionality for applying deep learning to problems in drug discovery and cheminformatics. Machine learning frameworks such as scikit-learn have previously been applied to cheminformatics, but DeepChem is the first to accelerate computation with NVIDIA GPUs.

The framework uses Google TensorFlow, along with scikit-learn, for expressing neural networks for deep learning. It also makes use of the RDKit Python framework for performing more basic operations on molecular data, such as converting SMILES strings into molecular graphs. The framework is currently in the alpha stage, at version 0.1. As the framework develops, it will move toward implementing more models in TensorFlow that use GPUs for training and inference. This new open source framework is poised to become an accelerating factor for innovation in drug discovery across industry and academia.
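As a concrete example of the kind of basic operation mentioned above, here is a short RDKit snippet that parses a SMILES string and walks the resulting molecular graph (this is plain RDKit usage, not DeepChem-specific code):

```python
# Parse a SMILES string with RDKit and inspect the molecular graph:
# atoms become nodes, bonds become edges.
from rdkit import Chem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), "degree:", atom.GetDegree())

for bond in mol.GetBonds():
    print(bond.GetBeginAtomIdx(), "-", bond.GetEndAtomIdx(), bond.GetBondType())
```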

Continue reading

GPU-accelerated HPC Containers with Singularity

Fighting with application installations is frustrating and time-consuming. It’s not what domain experts should be spending their time on. And yet, every time users move their project to a new system, they have to start over, reassembling their complex workflows from scratch.

This is a problem that containers can help to solve. HPC groups have had some success with more traditional containers (e.g., Docker), but there are security concerns that have made them difficult to use on HPC systems. Singularity, the new tool from the creator of CentOS and Warewulf, aims to resolve these issues.

Continue reading

NVIDIA Tesla P40 GPU Accelerator (Pascal GP102) Up Close

As NVIDIA’s GPUs become increasingly vital to the fields of AI and intelligent machines, NVIDIA has produced GPU models specifically targeted to these applications. The new Tesla P40 GPU is NVIDIA’s premier product for deep learning deployments. It is specifically designed for high-speed inference workloads, which means running data through pre-trained neural networks. However, it also offers significant processing performance for projects that do not require 64-bit double-precision floating point capability (many neural networks can be trained using the 32-bit single-precision floating point performance of the Tesla P40). For those cases, these GPUs can be used to accelerate both neural network training and inference.

Continue reading

Deep Learning Benchmarks of NVIDIA Tesla P100 PCIe, Tesla K80, and Tesla M40 GPUs

Sources of CPU benchmarks, used for estimating performance on similar workloads, have been available throughout the course of CPU development. For example, the Standard Performance Evaluation Corporation has compiled a large set of applications benchmarks, running on a variety of CPUs, across a multitude of systems. There are certainly benchmarks for GPUs, but only during the past year has an organized set of deep learning benchmarks been published. Called DeepMarks, these deep learning benchmarks are available to all developers who want to get a sense of how their application might perform across various deep learning frameworks.

The benchmarking scripts used for the DeepMarks study are published on GitHub. The original DeepMarks study was run on a Titan X GPU (Maxwell microarchitecture) with 12GB of onboard video memory. Here we will examine the performance of several deep learning frameworks on a variety of Tesla GPUs, including the Tesla P100 16GB PCIe, Tesla K80, and Tesla M40 12GB GPUs.
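As a rough illustration of how throughput figures of this kind are usually produced, the sketch below times a fixed number of training iterations and converts the elapsed time into images per second. The run_training_iteration callable is a hypothetical stand-in for one forward and backward pass in whichever framework is being measured; it is not taken from the DeepMarks scripts.

```python
import time

def benchmark(run_training_iteration, batch_size=128, warmup=10, iters=100):
    """Time `iters` training iterations and return throughput in images/sec."""
    for _ in range(warmup):            # warm-up passes are excluded from the timing
        run_training_iteration()
    start = time.perf_counter()
    for _ in range(iters):
        run_training_iteration()
    elapsed = time.perf_counter() - start
    return (batch_size * iters) / elapsed

# Usage (hypothetical): images_per_sec = benchmark(my_framework_step, batch_size=128)
```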

Continue reading

Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers

The new NVIDIA Tesla P100 GPUs are available with both PCI-Express and NVLink connectivity. How do these two types of connectivity compare? This post provides a rundown of NVLink vs PCI-E and explores the benefits of NVIDIA’s new NVLink technology.

Photo of NVIDIA Tesla P100 NVLink GPUs in an OpenPOWER server

Continue reading

NVIDIA Tesla P100 NVLink 16GB GPU Accelerator (Pascal GP100 SXM2) Up Close

The NVIDIA Tesla P100 NVLink GPUs are a big advancement. For the first time, the GPU is stepping outside the traditional “add-in card” design. No longer tied to the fixed specifications of PCI-Express cards, NVIDIA’s engineers have designed a new form factor that best suits the needs of the GPU. With the SXM2 design, NVIDIA can run GPUs to their full potential.

One of the biggest changes this enables is the NVLink interconnect, which frees GPUs from the restrictions of the PCI-Express bus: instead, the GPUs communicate with one another over this dedicated high-speed link. Additionally, these new “Pascal” architecture GPUs bring improvements including higher performance, faster connectivity, and more flexibility for users & programmers.

Close-Up Photo of the NVIDIA Tesla P100 NVLink GPU

Continue reading

NVIDIA Tesla P100 PCI-E 16GB GPU Accelerator (Pascal GP100) Up Close

NVIDIA’s new Tesla P100 PCI-E GPU is a big step up for HPC users, and for GPU users in general. Although other workloads have been leveraging the newer “Maxwell” architecture, HPC applications have been relying on “Kepler” GPUs for a couple of years. The new GPUs bring many improvements, including higher performance, faster connectivity, and more flexibility for users & programmers.

Close-up photo of the NVIDIA Tesla P100 PCI-E GPU

Continue reading

NVIDIA Tesla P100 Price Analysis

Now that NVIDIA has launched their new Pascal GPUs, the next question is “What is the Tesla P100 Price?”

Although it’s still a month or two before shipments of P100 start, the specifications and pricing of Microway’s Tesla P100 GPU-accelerated systems are available. If you’re planning a new project for delivery later this year, we’d be happy to help you get on board. These new GPUs are exceptionally powerful.

Continue reading