Tag Archives: gpu

Benchmarking NAMD on a GPU-Accelerated HPC Cluster with NVIDIA Tesla K40

This is a tutorial on using GPU-accelerated NAMD for molecular dynamics simulations. We make it simple to test your codes on the latest high-performance systems – you are free to use your own applications on our cluster and … Continue reading

Running AMBER on a GPU Cluster

Welcome to our tutorial on GPU-accelerated AMBER! We make it easy to benchmark your applications and problem sets on the latest hardware. Our GPU Test Drive Cluster provides developers, scientists, academics, and anyone else interested in GPU computing with the … Continue reading

CUB in Action – some simple examples using the CUB template library

In my previous post, I presented a brief introduction to the CUB library of CUDA primitives written by Duane Merrill of NVIDIA. CUB provides a set of highly-configurable software components, which include warp- and block-level kernel components as well as … Continue reading

PCI-Express Root Complex Confusion?

I’ve had several customers comment to me that it’s difficult to find someone who can speak with them intelligently about PCI-E root complex questions. And yet, it’s of vital importance when considering multi-CPU systems that have various PCI-Express devices (most … Continue reading

Introducing CUDA UnBound (CUB)

CUB – a configurable C++ template library of high-performance CUDA primitives. Each new generation of NVIDIA GPUs brings with it a dramatic increase in compute power, and the pace of development over the past several years has been rapid. The … Continue reading

NVIDIA Tesla K40 GPUs, the High Performance Choice for Many Applications

NVIDIA Tesla K40 is now the leading Tesla GPU for performance. Here are some important use cases where Tesla K40 might greatly speed up your GPU-accelerated applications. Pick Tesla K40 for Large Data Sets: GPU memory has always been at a greater … Continue reading

NVIDIA Tesla K40 “Atlas” GPU Accelerator (Kepler GK110b) Up Close

NVIDIA’s latest Tesla accelerator is without a doubt the most powerful GPU available. With almost 3,000 CUDA cores and 12GB GDDR5 memory, it wins in practically every* performance test you’ll see. As with the “Kepler” K20 GPUs, the Tesla K40 … Continue reading

CUDA Code Migration (Fermi to Kepler Architecture) on Tesla GPUs

The debut of NVIDIA’s Kepler architecture in 2012 marked a significant milestone in the evolution of general-purpose GPU computing. In particular, Kepler GK110 (compute capability 3.5) brought unrivaled compute power and introduced a number of new features to enhance GPU … Continue reading

Avoiding GPU Memory Performance Bottlenecks

This post is Topic #3 (post 3) in our series Parallel Code: Maximizing your Performance Potential. Many applications contain algorithms which make use of multi-dimensional arrays (or matrices). For cases where threads need to index the higher dimensions of the … Continue reading

GPU Shared Memory Performance Optimization

This post is Topic #3 (post 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, I provided an introduction to the various types of memory available for use in a CUDA application. Now that you’re … Continue reading