Tag Archives: CUDA

Optimize CUDA Host/Device Transfers

This post is Topic #2 (part 2) in our series Parallel Code: Maximizing your Performance Potential. In my previous post, CUDA Host/Device Transfers and Data Movement, I provided an introduction to the bottlenecks associated with host/device transfers and data movement. …

CUDA Host/Device Transfers and Data Movement

This post is Topic #2 (part 1) in our series Parallel Code: Maximizing your Performance Potential. In post #1, I discussed a few ways to optimize the performance of your application via controlling your threads and provided some insight as …

CUDA Parallel Thread Management

This post is Topic #1 in our series Parallel Code: Maximizing your Performance Potential. Regardless of the environment or architecture you are using, one thing is certain: you must properly manage the threads running in your application to optimize performance. This …

Parallel Code: Maximizing your Performance Potential

No matter what the purpose of your application is, one thing is certain. You want to get the most bang for your buck. You see research papers being published and presented making claims of tremendous speed increases by running algorithms …