Author Archives: Justin McKennon (for Microway)


About Justin McKennon (for Microway)

My name is Justin McKennon. I'm a 24-year-old nerd from Springfield, MA. I'm an electrical engineer by degree, but I like to pretend I'm a computer scientist. My hobbies span from studying the mechanics and physics of the golf swing to exploiting GPUs and hardware accelerators for scientific applications. I've been studying high performance computing since 2008, and I specialize in CUDA and the optimization of parallel algorithms.

Avoiding GPU Memory Performance Bottlenecks

This post is Topic #3 (part 3) in our series Parallel Code: Maximizing your Performance Potential. Many applications contain algorithms that make use of multi-dimensional arrays (or matrices). For cases where threads need to index the higher dimensions of the … Continue reading
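As a minimal sketch of the indexing pattern the excerpt alludes to (not taken from the post itself; the kernel and variable names here are illustrative), the usual approach is to map the fastest-varying thread index onto the fastest-varying array dimension so that a warp touches consecutive addresses:

```cuda
// Sketch: mapping a 2D thread grid onto a row-major matrix so that consecutive
// threads in a warp (threadIdx.x) access consecutive addresses, keeping global
// memory reads and writes coalesced.
__global__ void scale2D(float *matrix, int width, int height, float factor)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // fastest-varying index
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (row < height && col < width) {
        matrix[row * width + col] *= factor;          // row-major: row stride = width
    }
}
```

A launch such as `dim3 block(32, 8); dim3 grid((width + 31) / 32, (height + 7) / 8);` would cover the whole matrix with this layout.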

GPU Shared Memory Performance Optimization

This post is Topic #3 (part 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, I provided an introduction to the various types of memory available for use in a CUDA application. Now that you’re … Continue reading
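For context, a minimal sketch of the kind of shared memory usage this post is about (again illustrative, not code from the post): data is staged once from global memory into shared memory and then reused by the whole thread block, here to compute a per-block partial sum.

```cuda
#define BLOCK_SIZE 256

// Sketch: each value is read from global memory once, then reused out of
// shared memory by the whole block. Launch with blockDim.x == BLOCK_SIZE.
__global__ void blockSum(const float *in, float *blockResults, int n)
{
    __shared__ float cache[BLOCK_SIZE];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();                      // all loads finish before anyone reads cache

    // Tree reduction within the block; the active stride halves each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockResults[blockIdx.x] = cache[0];
}
```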

GPU Memory Types – Performance Comparison

This post is Topic #3 (part 1) in our series Parallel Code: Maximizing your Performance Potential. CUDA devices have several different memory spaces: global, local, texture, constant, shared, and register memory. Each type of memory on the device has its … Continue reading
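As a quick orientation (a sketch of my own, not an excerpt from the post), most of these memory spaces show up directly in how variables are declared; texture memory is omitted here since it also requires host-side setup:

```cuda
// Sketch: where the CUDA memory spaces appear in device code.
__constant__ float coefficients[16];       // constant memory: cached, read-only in kernels

__global__ void memorySpaces(float *globalData)   // globalData points to global memory
{
    __shared__ float tile[128];            // shared memory: visible to the whole block
                                           // (launch with blockDim.x <= 128)
    float accumulator;                     // register (may spill to local memory)

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = globalData[idx];
    __syncthreads();

    accumulator = tile[threadIdx.x] * coefficients[threadIdx.x % 16];
    globalData[idx] = accumulator;
}
```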

Optimize CUDA Host/Device Transfers

This post is Topic #2 (part 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, CUDA Host/Device Transfers and Data Movement, I provided an introduction to the bottlenecks associated with host/device transfers and data movement. … Continue reading
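Two of the standard transfer optimizations in this area are pinned (page-locked) host memory and asynchronous copies on a stream. The program below is a minimal, self-contained sketch of both (assumed names and sizes, not code from the post):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: pinned host memory plus an asynchronous host-to-device copy.
int main(void)
{
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = NULL;
    float *d_data = NULL;
    cudaStream_t stream;

    cudaMallocHost((void**)&h_data, bytes);   // pinned allocation: faster DMA than pageable memory
    cudaMalloc((void**)&d_data, bytes);
    cudaStreamCreate(&stream);

    for (size_t i = 0; i < n; ++i)
        h_data[i] = (float)i;

    // Asynchronous copy: returns immediately, can overlap with other host work.
    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);            // wait here before using d_data

    printf("Transferred %zu bytes via pinned memory\n", bytes);

    cudaStreamDestroy(stream);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```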

CUDA Host/Device Transfers and Data Movement

This post is Topic #2 (part 1) in our series Parallel Code: Maximizing your Performance Potential. In post #1, I discussed a few ways to optimize the performance of your application by controlling your threads and provided some insight as … Continue reading
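For readers new to the topic, the basic data path being discussed looks like the following minimal sketch (illustrative only): every byte crosses the PCI-Express bus once on the way to the device and once on the way back.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch of the plain host-to-device / device-to-host round trip.
int main(void)
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float h_in[1024], h_out[1024];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_buf = NULL;
    cudaMalloc((void**)&d_buf, bytes);

    cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);   // host -> device
    // ... kernels would normally operate on d_buf here ...
    cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);  // device -> host

    printf("Round trip complete: h_out[10] = %f\n", h_out[10]);
    cudaFree(d_buf);
    return 0;
}
```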

CUDA Parallel Thread Management

This post is Topic #1 in our series Parallel Code: Maximizing your Performance Potential. Regardless of the environment or architecture you are using, one thing is certain: you must properly manage the threads running in your application to optimize performance. This … Continue reading
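The launch-configuration side of thread management boils down to choosing a block size, deriving the grid size from the problem size, and guarding the stragglers in the last block. A minimal sketch (my own example values, not taken from the post):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: configure enough threads to cover n elements and guard the excess.
__global__ void addOne(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)            // threads past the end of the array do nothing
        data[idx] += 1.0f;
}

int main(void)
{
    const int n = 100000;
    const int blockSize = 256;                              // a multiple of the warp size (32)
    const int gridSize = (n + blockSize - 1) / blockSize;   // round up to cover all n elements

    float *d_data = NULL;
    cudaMalloc((void**)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    addOne<<<gridSize, blockSize>>>(d_data, n);
    cudaDeviceSynchronize();

    printf("Launched %d blocks of %d threads for %d elements\n", gridSize, blockSize, n);
    cudaFree(d_data);
    return 0;
}
```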

Parallel Code: Maximizing your Performance Potential

No matter what the purpose of your application is, one thing is certain: you want to get the most bang for your buck. You see research papers being published and presented making claims of tremendous speed increases by running algorithms … Continue reading