Tag Archives: CUDA

NVIDIA Tesla M40 24GB GPU Accelerator (Maxwell GM200) Up Close

Posted on April 1, 2016 by Eliot Eshelman

NVIDIA has announced a new version of their popular Tesla M40 GPU – one with 24GB of high-speed GDDR5 memory. The name hasn’t really changed – the new GPU is named NVIDIA Tesla M40 24GB. If you are curious about … Continue reading →

Accelerating Code with OpenACC and the NVIDIA Visual Profiler

Posted on March 14, 2016 by John Murphy

Consisting of a set of compiler directives, OpenACC was created to accelerate code using the many streaming multiprocessors (SMs) present on a GPU. Similar to how OpenMP is used for accelerating code on multicore CPUs, OpenACC can accelerate code on … Continue reading →

NVIDIA Tesla M40 12GB GPU Accelerator (Maxwell GM200) Up Close

Posted on February 10, 2016 by Eliot Eshelman

With the release of Tesla M40, NVIDIA continues to diversify its professional compute GPU lineup. Designed specifically for Deep Learning applications, the M40 provides 7 TFLOPS of single-precision floating point performance and 12GB of high-speed GDDR5 memory. It works extremely … Continue reading →

CUB in Action – some simple examples using the CUB template library

Posted on June 18, 2014 by Justin Foley (for Microway)

In my previous post, I presented a brief introduction to the CUB library of CUDA primitives written by Duane Merrill of NVIDIA. CUB provides a set of highly-configurable software components, which include warp- and block-level kernel components as well as … Continue reading →

Introducing CUDA UnBound (CUB)

Posted on April 14, 2014 by Justin Foley (for Microway)

CUB – a configurable C++ template library of high-performance CUDA primitives. Each new generation of NVIDIA GPUs brings with it a dramatic increase in compute power, and the pace of development over the past several years has been rapid. The … Continue reading →

NVIDIA Tesla K40 “Atlas” GPU Accelerator (Kepler GK110b) Up Close

Posted on November 18, 2013 by Eliot Eshelman

NVIDIA’s latest Tesla accelerator is without a doubt the most powerful GPU available. With almost 3,000 CUDA cores and 12GB GDDR5 memory, it wins in practically every* performance test you’ll see. As with the “Kepler” K20 GPUs, the Tesla K40 … Continue reading →

CUDA Code Migration (Fermi to Kepler Architecture) on Tesla GPUs

Posted on October 27, 2013 by Justin Foley (for Microway)

The debut of NVIDIA’s Kepler architecture in 2012 marked a significant milestone in the evolution of general-purpose GPU computing. In particular, Kepler GK110 (compute capability 3.5) brought unrivaled compute power and introduced a number of new features to enhance GPU … Continue reading →

Avoiding GPU Memory Performance Bottlenecks

Posted on September 30, 2013 by Justin McKennon (for Microway)

This post is Topic #3 (post 3) in our series Parallel Code: Maximizing your Performance Potential. Many applications contain algorithms which make use of multi-dimensional arrays (or matrices). For cases where threads need to index the higher dimensions of the … Continue reading →

GPU Shared Memory Performance Optimization

Posted on September 26, 2013 by Justin McKennon (for Microway)

This post is Topic #3 (post 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, I provided an introduction to the various types of memory available for use in a CUDA application. Now that you’re … Continue reading →

GPU Memory Types – Performance Comparison

Posted on August 6, 2013 by Justin McKennon (for Microway)

This post is Topic #3 (part 1) in our series Parallel Code: Maximizing your Performance Potential. CUDA devices have several different memory spaces: global, local, texture, constant, shared and register memory. Each type of memory on the device has its … Continue reading →
