Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers

The new NVIDIA Tesla P100 GPUs are available with both PCI-Express and NVLink connectivity. How do these two types of connectivity compare? This post provides a rundown of NVLink vs PCI-E and explores the benefits of NVIDIA’s new NVLink technology.

Photo of NVIDIA Tesla P100 NVLink GPUs in an OpenPOWER server


NVIDIA Tesla P100 NVLink 16GB GPU Accelerator (Pascal GP100 SXM2) Up Close

The NVIDIA Tesla P100 NVLink GPUs are a big advancement. For the first time, the GPU is stepping outside the traditional “add-in card” design. No longer tied to the fixed specifications of PCI-Express cards, NVIDIA’s engineers have designed a new form factor that best suits the needs of the GPU. With their SXM2 design, NVIDIA can run GPUs to their full potential.

One of the biggest changes this enables is the NVLink interconnect, which lets GPUs operate beyond the restrictions of the PCI-Express bus. Instead, the GPUs communicate with one another over this high-speed link. Additionally, these new “Pascal” architecture GPUs bring improvements including higher performance, faster connectivity, and more flexibility for users & programmers.

Close-Up Photo of the NVIDIA Tesla P100 NVLink GPU


NVIDIA Tesla P100 PCI-E 16GB GPU Accelerator (Pascal GP100) Up Close

NVIDIA’s new Tesla P100 PCI-E GPU is a big step up for HPC users, and for GPU users in general. Although other workloads have been leveraging the newer “Maxwell” architecture, HPC applications have been using “Kepler” GPUs for a couple of years. The new GPUs bring many improvements, including higher performance, faster connectivity, and more flexibility for users & programmers.

Close-up photo of the NVIDIA Tesla P100 PCI-E GPU


NVIDIA Tesla P100 Price Analysis

Now that NVIDIA has launched their new Pascal GPUs, the next question is “What is the Tesla P100 price?”

Although it’s still a month or two before shipments of P100 start, the specifications and pricing of Microway’s Tesla P100 GPU-accelerated systems are available. If you’re planning a new project for delivery later this year, we’d be happy to help you get on board. These new GPUs are exceptionally powerful.


More Tips on OpenACC Acceleration

A single blog post cannot cover every tip for accelerating code with OpenACC. This post presents further tips, complementing our previous blog post on accelerating code with OpenACC.

Further tips discussed here are:

  • linearizing a 2D array
  • usage of contiguous memory
  • parallelizing loops
  • PGI compiler information reports
  • OpenACC general guidelines
  • the OpenACC runtime library
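As a rough illustration of the first two tips, linearizing a 2D array means storing all rows back to back in one contiguous block and computing each element's offset by hand. The sketch below shows only the row-major index arithmetic; it is written in Python for brevity (real OpenACC code would be C, C++, or Fortran), and the names `linear_index` and `linearize` are hypothetical helpers, not taken from the post.

```python
def linear_index(i, j, ncols):
    """Row-major offset of element (i, j) in a flattened 2D array."""
    return i * ncols + j


def linearize(grid):
    """Copy a list-of-rows 2D array into one contiguous flat list."""
    ncols = len(grid[0])
    flat = [0] * (len(grid) * ncols)
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            flat[linear_index(i, j, ncols)] = value
    return flat


# A 2 x 3 array stored as separate rows, then as a single flat buffer.
grid = [[1, 2, 3], [4, 5, 6]]
flat = linearize(grid)
```

Element (1, 2) of the original array is then read as `flat[linear_index(1, 2, 3)]`. A single flat buffer of this kind can be described to the accelerator as one contiguous range, which is the motivation behind the contiguous-memory tip.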


Can I use Deep Learning?

If you’ve been reading the press this year, you’ve probably seen mention of deep learning or machine learning. You’ve probably gotten the impression they can do anything and solve every problem. It’s true that computers can be better than humans at recognizing people’s faces or playing the game Go. However, deep learning is not the solution to every problem. We want to help you understand whether you can use deep learning and, if so, how it will help you.


Deep Learning Applications in Science and Engineering

Over the past decade, and particularly over the past several years, deep learning applications have been developed for a wide range of scientific and engineering problems. For example, deep learning methods have recently increased the level of significance of the Higgs boson detection at the LHC. Similar analysis is being used to explore possible decay modes of the Higgs. Deep learning methods fall under the larger category of machine learning, which also includes Support Vector Machines (SVMs), Kernel Methods, Hidden Markov Models (HMMs), Bayesian Methods, and regression techniques, among others.

Deep learning is a methodology which involves computation using an artificial neural network (ANN). Training deep networks was not always within practical reach, however. The main difficulty arose from the vanishing/exploding gradient problem. See a previous blog post on Theano and Keras for a discussion of this. Training deep networks has become feasible with the development of GPU parallel computation, better error minimization algorithms, careful network weight initialization, the application of regularization or dropout methods, and the use of Rectified Linear Units (ReLUs) as artificial neuron activation functions. Because the derivative of a ReLU is 1 for any positive input, the backpropagation signal does not become too attenuated as it passes through many layers.
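The attenuation effect can be sketched numerically. The toy calculation below ignores weights entirely and multiplies only the activation derivatives along a chain of layers, which is a simplifying assumption rather than a full backpropagation implementation. The function names and the chosen depth of 10 are illustrative.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def backprop_scale(grad_fn, pre_activation, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    return grad_fn(pre_activation) ** depth


depth = 10
# Best case for the sigmoid: every unit sits at x = 0, where the slope peaks.
sigmoid_scale = backprop_scale(sigmoid_grad, 0.0, depth)
# Any active ReLU unit passes the gradient through unchanged.
relu_scale = backprop_scale(relu_grad, 1.0, depth)

print(f"sigmoid: {sigmoid_scale:.3e}  relu: {relu_scale:.1f}")
```

Even in the sigmoid's most favorable case the signal shrinks by a factor of 0.25 per layer, so after ten layers it is below one millionth of its original size, while a chain of active ReLU units leaves it at full strength.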


Microway joins the OpenPOWER Foundation

We’re excited to announce that Microway has joined the OpenPOWER Foundation as a Silver member. We are integrating the OpenPOWER technologies into our server systems and HPC clusters. We’re also offering our HPC software tools on OpenPOWER.

The collaboration between OpenPOWER members is going to bring exciting new possibilities to High Performance Computing, and to the IT industry in general. The OpenPOWER Foundation’s list of members is quite impressive, but also represents a very broad range of interests and industries. Our efforts will focus on molding these technologies into performant and easy-to-use HPC systems. Our experts ensure that Microway systems “just work”, so expect nothing less from our OpenPOWER offerings.


NVIDIA Tesla M40 24GB GPU Accelerator (Maxwell GM200) Up Close

NVIDIA has announced a new version of their popular Tesla M40 GPU – one with 24GB of high-speed GDDR5 memory. The name hasn’t really changed – the new GPU is named NVIDIA Tesla M40 24GB. If you are curious about the original version with less memory, we have a detailed examination of the original M40 GPU.

As support for GPUs grows – particularly in the exploding fields of Machine Learning and Deep Learning – there has been increasing need for large quantities of GPU memory. The Tesla M40 24GB provides the most memory available to date in a single-GPU Tesla card. The remaining specifications of the new M40 match those of the original, including 7 TFLOPS of single-precision floating point performance.

The Tesla M40 continues to be the only high-performance Tesla compute GPU based upon the “Maxwell” architecture. “Maxwell” provides excellent performance per watt, as evidenced by the fact that this GPU provides 7 TFLOPS within a 250W power envelope.
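The performance-per-watt claim can be checked with simple arithmetic. The sketch below uses only the figures quoted in this post (7.0 TFLOPS single-precision, 0.21 TFLOPS double-precision, 250W); `gflops_per_watt` is a hypothetical helper, and the results are peak theoretical values rather than measured efficiency.

```python
def gflops_per_watt(tflops, watts):
    """Convert a peak TFLOPS rating and a power envelope into GFLOPS per watt."""
    return tflops * 1000.0 / watts


M40_SINGLE_TFLOPS = 7.0
M40_DOUBLE_TFLOPS = 0.21
M40_WATTS = 250.0

single_efficiency = gflops_per_watt(M40_SINGLE_TFLOPS, M40_WATTS)
double_efficiency = gflops_per_watt(M40_DOUBLE_TFLOPS, M40_WATTS)
# How many times faster single-precision is than double-precision.
precision_ratio = M40_SINGLE_TFLOPS / M40_DOUBLE_TFLOPS

print(f"single: {single_efficiency:.1f} GFLOPS/W")
print(f"double: {double_efficiency:.2f} GFLOPS/W")
print(f"single/double ratio: {precision_ratio:.1f}")
```

This works out to roughly 28 GFLOPS per watt in single precision but under 1 GFLOPS per watt in double precision, which is why the M40 suits single-precision workloads such as deep learning far better than double-precision HPC codes.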

Maximum single-GPU memory and performance: Tesla M40 24GB GPU

Available in Microway NumberSmasher GPU Servers and GPU Clusters

Photo of the NVIDIA Tesla M40 24GB GPU Accelerator bottom edge


  • 3072 CUDA GPU cores (GM200)
  • 7.0 TFLOPS single-precision; 0.21 TFLOPS double-precision
  • 24GB GDDR5 memory
  • Memory bandwidth up to 288 GB/s
  • PCI-E x16 Gen3 interface to system
  • Dynamic GPU Boost for optimal clock speeds
  • Passive heatsink design for installation in qualified GPU servers


Intel Xeon E5-2600 v4 “Broadwell” Processor Review

Today we begin shipping Intel’s new Xeon E5-2600 v4 processors. They provide more CPU cores, more cache, faster memory access and more efficient operation. These are based upon the Intel microarchitecture code-named “Broadwell” – we expect them to be the HPC processors of choice.

Important changes in Xeon E5-2600 v4 include:

  • Up to 22 processor cores per CPU
  • Support for DDR4 memory speeds up to 2400MHz
  • Faster Floating Point Instruction performance
  • Improved parallelism in scheduling micro-operations
  • Improved performance for large data sets
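To put the memory speed increase in perspective, peak theoretical memory bandwidth can be estimated from the transfer rate. The sketch below rests on assumptions not stated in this post: that the quoted "2400MHz" corresponds to 2400 mega-transfers per second, that each DDR4 channel is 64 bits (8 bytes) wide, that each CPU has four memory channels, and that the previous generation topped out at DDR4-2133. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_bytes=8, channels=4):
    """Peak theoretical memory bandwidth per CPU socket in GB/s.

    Assumes 8-byte-wide channels and four channels per socket.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


ddr4_2400 = peak_bandwidth_gb_s(2400)  # speed quoted for Xeon E5-2600 v4
ddr4_2133 = peak_bandwidth_gb_s(2133)  # assumed previous-generation speed
improvement = ddr4_2400 / ddr4_2133 - 1.0

print(f"DDR4-2400: {ddr4_2400:.1f} GB/s per socket")
print(f"DDR4-2133: {ddr4_2133:.1f} GB/s per socket")
print(f"improvement: {improvement:.1%}")
```

Under these assumptions the peak rises from about 68.3 GB/s to 76.8 GB/s per socket, an increase of roughly 12.5%. Sustained bandwidth in real applications will be lower than these theoretical peaks.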
