Tag Archives: CUDA
NVIDIA Tesla M40 24GB GPU Accelerator (Maxwell GM200) Up Close
NVIDIA has announced a new version of their popular Tesla M40 GPU – one with 24GB of high-speed GDDR5 memory. The name hasn’t really changed – the new GPU is named NVIDIA Tesla M40 24GB. If you are curious about …
Accelerating Code with OpenACC and the NVIDIA Visual Profiler
Composed of a set of compiler directives, OpenACC was created to accelerate code using the many streaming multiprocessors (SMs) present on a GPU. Similar to how OpenMP is used for accelerating code on multicore CPUs, OpenACC can accelerate code on …
NVIDIA Tesla M40 12GB GPU Accelerator (Maxwell GM200) Up Close
With the release of Tesla M40, NVIDIA continues to diversify its professional compute GPU lineup. Designed specifically for Deep Learning applications, the M40 provides 7 TFLOPS of single-precision floating point performance and 12GB of high-speed GDDR5 memory. It works extremely …
CUB in Action – some simple examples using the CUB template library
In my previous post, I presented a brief introduction to the CUB library of CUDA primitives written by Duane Merrill of NVIDIA. CUB provides a set of highly-configurable software components, which include warp- and block-level kernel components as well as …
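As one hedged illustration of the block-level components mentioned above (the full post has its own examples; the 128-thread block size and sum-of-ones input here are my assumptions), a block-wide reduction with `cub::BlockReduce` looks like this:

```cuda
#include <cub/cub.cuh>
#include <cstdio>

// Block-wide sum of 128 ints using cub::BlockReduce (one block, 128 threads)
__global__ void block_sum(const int *in, int *out) {
    // Specialize BlockReduce for int and a 128-thread block
    typedef cub::BlockReduce<int, 128> BlockReduce;
    __shared__ typename BlockReduce::TempStorage temp;   // collective scratch space
    int sum = BlockReduce(temp).Sum(in[threadIdx.x]);    // result valid in thread 0 only
    if (threadIdx.x == 0) *out = sum;
}

int main() {
    int h_in[128], h_out, *d_in, *d_out;
    for (int i = 0; i < 128; ++i) h_in[i] = 1;           // expected sum: 128
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    block_sum<<<1, 128>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_out);
    return 0;
}
```

The template parameters let CUB pick an efficient reduction algorithm for the chosen block size, which is the "highly-configurable" property the excerpt refers to.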
Introducing CUDA UnBound (CUB)
CUB is a configurable C++ template library of high-performance CUDA primitives. Each new generation of NVIDIA GPUs brings with it a dramatic increase in compute power, and the pace of development over the past several years has been rapid. The …
NVIDIA Tesla K40 “Atlas” GPU Accelerator (Kepler GK110b) Up Close
NVIDIA’s latest Tesla accelerator is without a doubt the most powerful GPU available. With almost 3,000 CUDA cores and 12GB GDDR5 memory, it wins in practically every* performance test you’ll see. As with the “Kepler” K20 GPUs, the Tesla K40 …
CUDA Code Migration (Fermi to Kepler Architecture) on Tesla GPUs
The debut of NVIDIA’s Kepler architecture in 2012 marked a significant milestone in the evolution of general-purpose GPU computing. In particular, Kepler GK110 (compute capability 3.5) brought unrivaled compute power and introduced a number of new features to enhance GPU …
Avoiding GPU Memory Performance Bottlenecks
This post is Topic #3 (post 3) in our series Parallel Code: Maximizing your Performance Potential. Many applications contain algorithms which make use of multi-dimensional arrays (or matrices). For cases where threads need to index the higher dimensions of the …
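The indexing pitfall the excerpt alludes to can be sketched as follows (kernel names and the doubling operation are illustrative, not from the post). For a row-major 2-D array, memory accesses coalesce when `threadIdx.x`, the fastest-varying thread index, walks the contiguous dimension:

```cuda
// Coalesced: adjacent threads in a warp touch adjacent addresses
__global__ void scale_coalesced(float *m, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // contiguous dimension
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width)
        m[row * width + col] *= 2.0f;
}

// Anti-pattern: using threadIdx.x as the row index strides each warp's
// accesses by `width` floats, touching many memory segments instead of one
__global__ void scale_strided(float *m, int width, int height) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < height && col < width)
        m[row * width + col] *= 2.0f;
}
```

Both kernels compute the same result; only the access pattern, and therefore the achieved memory bandwidth, differs.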
GPU Shared Memory Performance Optimization
This post is Topic #3 (post 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, I provided an introduction to the various types of memory available for use in a CUDA application. Now that you’re …
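A minimal sketch of the shared-memory pattern this post covers (the array-reversal task and the 256-element block size are my illustrative assumptions): threads stage data into on-chip `__shared__` storage, synchronize, then read it back in a different order without returning to global memory:

```cuda
#define N 256

// Reverse a block-sized array in place using shared memory
__global__ void reverse_shared(int *d) {
    __shared__ int s[N];        // per-block, low-latency on-chip storage
    int t = threadIdx.x;
    s[t] = d[t];                // stage from global into shared memory
    __syncthreads();            // all writes must finish before any reads
    d[t] = s[N - 1 - t];        // read back in reversed order
}
```

The `__syncthreads()` barrier is essential: without it, a thread could read an element of `s` that another thread has not yet written.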
GPU Memory Types – Performance Comparison
This post is Topic #3 (part 1) in our series Parallel Code: Maximizing your Performance Potential. CUDA devices have several different memory spaces: global, local, texture, constant, shared, and register memory. Each type of memory on the device has its …
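Most of the memory spaces the excerpt lists map directly onto CUDA declaration qualifiers. A brief sketch (variable names and sizes are illustrative assumptions, not from the post):

```cuda
__constant__ float coeff[16];      // constant memory: small, cached, read-only in kernels
__device__   float table[1024];    // global memory on the device: large, high latency

__global__ void apply(float *in, float *out) {  // in/out point into global memory
    __shared__ float tile[256];    // shared memory: per-block, low latency
    float r = in[threadIdx.x];     // r lives in a register: fastest, per-thread
    tile[threadIdx.x] = r * coeff[0];
    __syncthreads();
    out[threadIdx.x] = tile[threadIdx.x] + table[threadIdx.x];
}
```

(Local memory has no qualifier of its own: the compiler spills per-thread data there when registers run out; texture memory is accessed through the texture/surface APIs rather than a declaration qualifier.)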