Tesla V100 “Volta” GPU Review

The next generation NVIDIA Volta architecture is here. With it comes the new Tesla V100 “Volta” GPU, the most advanced datacenter GPU ever built.

Volta is NVIDIA’s 2nd GPU architecture in ~12 months, and it builds upon the massive advancements of the Pascal architecture. Whether your workload is in HPC, AI, or even remote visualization & graphics acceleration, Tesla V100 has something for you.

Two Flavors, one giant leap: Tesla V100 PCI-E & Tesla V100 with NVLink

For those who love speeds and feeds, here’s a summary of the key enhancements vs Tesla P100 GPUs

Performance of Tesla GPUs, Generation to Generation
Tesla V100 with NVLink Tesla V100 PCI-E Tesla P100 with NVLink Tesla P100 PCI-E Ratio Tesla V100:P100
DP TFLOPS 7.8 TFLOPS 7.0 TFLOPS 5.3 TFLOPS 4.7 TFLOPS ~1.4-1.5X
SP TFLOPS 15.7 TFLOPS 14 TFLOPS 9.3 TFLOPS 8.74 TFLOPS ~1.4-1.5X
TensorFLOPS 125 TFLOPS 112 TFLOPS 21.2 TFLOPS 1/2 Precision 18.7 TFLOPS 1/2 Precision ~6X
Interface (bidirec. BW)
300GB/sec 32GB/sec 160GB/sec 32GB/sec 1.88X NVLink
9.38X PCI-E
Memory Bandwidth 900GB/sec 900GB/sec 720GB/sec 720GB/sec 1.25X
CUDA Cores
(Tensor Cores)
5120
(640)
5120
(640)
3584 3584

Selecting the right Tesla V100 for you:

With Tesla P100 “Pascal” GPUs, there was a substantial price premium to the NVLink-enabled SXM2.0 form factor GPUs. We’re excited to see things even out for Tesla V100.

However, that doesn’t mean selecting a GPU is as simple as picking one that matches a system design. Here’s some guidance to help you evaluate your options:

Tesla V100 PCI-E

Tesla V100 PCI-E

Tesla V100 PCIe GPU

  • Great for the workload is a blend of HPC & Deep Learning
  • Fits in some existing PCI-E GPU servers
  • Best where workload is not bandwidth/data transfer constrained

Tesla V100 PCI-E is available in our NumberSmasher 1U GPU Server, NumberSmasher 4U GPU Server, and Octoputer platforms today.

Tesla V100 with NVLink

Tesla V100 with NVLink

Tesla V100 SXM 2.0 GPU

  • Where performance matters
  • Where GPU:GPU or CPU:GPU bandwidth are paramount
  • Where advanced system typologies are required
  • Where your priority is Coherence and CORAL system emulation (building mini-versions of Summit & Sierra)

Tesla V100 with NVLink is available today in DGX-1V, NumberSmasher 1U Tesla GPU Server with NVLink, and Power Systems AC922

Performance Summary

Increases in relative performance are widely workload dependent. But early testing demonstates HPC performance advancing approximately 50%, in just a 12 month period.
Tesla V100 HPC PerformanceTesla V100 HPC Performance
If you haven’t made the jump to Tesla P100 yet, Tesla V100 is an even more compelling proposition.

For Deep Learning, Tesla V100 delivers a massive leap in performance. This extends to training time, and it also brings unique advancements for inference as well.
Deep Learning Performance Summary -Tesla V100

Enhancements for Every Workload

Now that we’ve learned a bit about each GPU, let’s review the breakthrough advancements for Tesla V100.

Next Generation NVIDIA NVLink Technology (NVLink 2.0)

NVLink helps you to transcend the bottlenecks in PCI-Express based systems and dramatically improve the data flow to GPU accelerators.

The total data pipe for each Tesla V100 with NVLink GPUs is now 300GB/sec of bidirectional bandwidth—that’s nearly 10X the data flow of PCI-E x16 3.0 interfaces!

Bandwidth

NVIDIA has enhanced the bandwidth of each individual NVLink brick by 25%: from 40GB/sec (20+20GB/sec in each direction) to 50GB/sec (25+25GB/sec) of bidirectional bandwidth.

This improved signaling helps carry the load of data intensive applications.

Number of Bricks

But enhanced NVLink isn’t just about simple signaling improvements. Point to point NVLink connections are divided into “bricks” or links. Each brick delivers

From 4 bricks to 6

Over and above the signaling improvements, Tesla V100 with NVLink increases the number of NVLink “bricks” that are embedded into each GPU: from 4 bricks to 6.

This 50% increase in brick quantity delivers a large increase in bandwidth for the world’s most data intensive workloads. It also allows for more diverse set of system designs and configurations.

The NVLink Bank Account

NVIDIA NVLink technology continues to be a breakthrough for both HPC and Deep Learning workloads. But as with previous generations of NVLink, there’s a design choice related to a simple question:

Where do you want to spend your NVLink bandwidth?

Think about NVLink bricks as a “spending” or “bank account.” Each NVLink system-design strikes a different balance between where they “spend the funds.” You may wish to:

  • Spend links on GPU:GPU communication
  • Focus on increasing the number of GPUs
  • Broaden CPU:GPU bandwidth to overcome the PCI-E bottleneck
  • Create a balanced system, or prioritize design choices solely for a single workload

There are good reasons for each, or combinations, of these choices. DGX-1V, NumberSmasher 1U GPU Server with NVLink, and future IBM Power Systems products all set different priorities.

We’ll dive more deeply into these choices in a future post.

Net-net: the NVLink bandwidth increases from 160GB/sec to 300GB/sec (bidirectional), and a number of new, diverse HPC and AI hardware designs are now possible.

Programming Improvements

Tesla V100 and CUDA 9 bring a host of improvements for GPU programmers. These include:

  • Cooperative Groups
  • A new L1 cache + shared memory, that simplifies programming
  • A new SIMT model, that relieves the need to program to fit 32 thread warps

We won’t explore these in detail in this post, but we encourage you to read the following resources for more:

What does Tesla V100 mean for me?

Tesla V100 GPUs mean a massive change for many workloads. But each workload will see different impacts. Here’s a short summary of what we see:

  1. An increase in performance on HPC applications (on paper FLOPS increase of 50%, diverse real-world impacts)
  2. A massive leap for Deep Learning Training
  3. 1 GPU, many Deep Learning workloads
  4. New system designs, better tuned to your applications
  5. Radical new, and radically simpler, programming paradigms (coherence)

Need help navigating Volta? Here’s a few ways to move forward:

Configure a Solution Now, if you know what design you are after

Contact a Microway HPC Expert about Volta and Tesla V100

Learn More about Tesla V100 GPU enabled Microway offerings

This entry was posted in Hardware and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *