MATLAB is a well-known and widely used application, and for good reason. It functions as a powerful, yet easy-to-use, platform for technical computing. With support for a variety of parallel execution methods, MATLAB also performs well. Support for running MATLAB on GPUs has been built in for a couple of years, and it improves with each release. If you haven’t tried it yet, take this opportunity to test MATLAB performance on GPUs. Microway’s GPU Test Drive makes the process quick and easy. As we’ll show in this post, you can expect 3X to 6X performance increases on many tasks, and 30X to 60X speedups on select workloads with hand-optimized GPU code.
Access a Compute Node with GPU-accelerated MATLAB
Getting started with MATLAB on our GPU cluster is easy: complete this form to sign up for MATLAB GPU benchmarking. We will send you an e-mail with detailed instructions for logging in and starting MATLAB. Once you’re in, just click the MATLAB icon and the latest version of GPU-accelerated MATLAB will start.
We use NoMachine to export graphical sessions from our cluster to your local PC or laptop. This makes login user-friendly, keeps your interactive session responsive, and provides a built-in method for transferring files into and out of the GPU cluster. MATLAB is known to feel sluggish over standard Unix/Linux graphical sessions (e.g., X11 forwarding, VNC), but you’ll have no such issues here.
You’ll be dropped into a standard MATLAB workspace. A variety of parallelized demonstrations of GPU usage are included with MATLAB. Pick one and give it a try! You can type paralleldemo_gpu and then press <TAB> to see the full list of options.
Measure MATLAB GPU Speedups
Below we show the output from several of the built-in MATLAB parallel GPU demos. A few are text-only, but several include a graphical component or performance plot. The first example runs a quick test on memory transfer speeds and computational throughput. Results from both the GPU and the host (CPUs) are shown:
>> paralleldemo_gpu_benchmark
Using a Tesla K40m GPU.
Achieved peak send speed of 3.44069 GB/s
Achieved peak gather speed of 2.20036 GB/s
Achieved peak read+write speed on the GPU: 233.613 GB/s
Achieved peak read+write speed on the host: 12.9773 GB/s
Achieved peak calculation rates of 398.9 GFLOPS (host), 1345.8 GFLOPS (GPU)
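As a quick sanity check, the ratios implied by these peak figures can be computed directly. This is a minimal sketch that only re-uses the numbers printed above (nothing is measured); the variable names are our own.

```matlab
% Peak figures copied from the paralleldemo_gpu_benchmark output above
sendGBs    = 3.44069;    % host-to-GPU transfer rate, GB/s
hostRWGBs  = 12.9773;    % host memory read+write, GB/s
gpuRWGBs   = 233.613;    % GPU memory read+write, GB/s
hostGflops = 398.9;      % host peak calculation rate
gpuGflops  = 1345.8;     % GPU peak calculation rate

computeSpeedup   = gpuGflops / hostGflops;  % GPU vs. host arithmetic throughput
bandwidthSpeedup = gpuRWGBs / hostRWGBs;    % GPU vs. host memory bandwidth
transferPenalty  = gpuRWGBs / sendGBs;      % on-GPU memory vs. host-to-GPU link

fprintf('Compute speedup:   %.1fx\n', computeSpeedup);
fprintf('Bandwidth speedup: %.1fx\n', bandwidthSpeedup);
fprintf('Transfer penalty:  %.1fx\n', transferPenalty);
```

This prints roughly 3.4x, 18.0x and 67.9x. The last ratio is the practical takeaway: moving data between host and GPU is far slower than working on data already in GPU memory, so it usually pays to keep arrays on the GPU across many operations.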
Note that the host results depend on the number of local workers available in the Parallel Computing Toolbox. From R2011b through R2013b, the local cluster was limited to 12 workers; with the release of R2014a, MathWorks removed that limit. For these tests we changed the number of workers to 20 in the Parallel Preferences dialog box.
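The same change can also be made from the command line. The sketch below assumes the Parallel Computing Toolbox, the default 'local' cluster profile (newer releases may name the default profile differently), and a host with at least 20 cores. It uses parcluster, saveProfile and parpool rather than the Preferences dialog, so treat it as an alternative, not the exact steps we followed.

```matlab
% Raise the worker limit of the default local cluster profile, then start a pool.
% Assumes Parallel Computing Toolbox and a machine with at least 20 cores.
c = parcluster('local');      % default local cluster profile
c.NumWorkers = 20;            % upper bound on workers for this profile
saveProfile(c);               % persist the change across MATLAB sessions

delete(gcp('nocreate'));      % close any pool that is already running
pool = parpool(c, 20);        % start a pool with 20 workers
fprintf('Pool running with %d workers\n', pool.NumWorkers);
```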
The next demo benchmarks solving a system of linear equations with MATLAB’s backslash operator (A\b) on dual 10-core Xeon CPUs versus a single NVIDIA Tesla K40 GPU, and generates plots of the resulting speedup. Both single-precision and double-precision floating-point calculations were run.
>> paralleldemo_gpu_backslash
Starting benchmarks with 8 different single-precision matrices of sizes ranging from 1024-by-1024 to 29696-by-29696.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 66.278709
Gigaflops on GPU: 107.556334
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 235.782899
Gigaflops on GPU: 988.360718
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 345.775846
Gigaflops on GPU: 1411.722193
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 430.923486
Gigaflops on GPU: 1631.047366
Creating a matrix of size 17408-by-17408.
Gigaflops on CPU: 493.923539
Gigaflops on GPU: 1708.917025
Creating a matrix of size 21504-by-21504.
Gigaflops on CPU: 529.809413
Gigaflops on GPU: 1754.558735
Creating a matrix of size 25600-by-25600.
Gigaflops on CPU: 567.786871
Gigaflops on GPU: 1804.538355
Creating a matrix of size 29696-by-29696.
Gigaflops on CPU: 597.913569
Gigaflops on GPU: 1842.050491

Starting benchmarks with 6 different double-precision matrices of sizes ranging from 1024-by-1024 to 21504-by-21504.
Creating a matrix of size 1024-by-1024.
Gigaflops on CPU: 45.881347
Gigaflops on GPU: 84.044136
Creating a matrix of size 5120-by-5120.
Gigaflops on CPU: 112.758309
Gigaflops on GPU: 653.228694
Creating a matrix of size 9216-by-9216.
Gigaflops on CPU: 135.980895
Gigaflops on GPU: 883.155216
Creating a matrix of size 13312-by-13312.
Gigaflops on CPU: 223.848074
Gigaflops on GPU: 975.277154
Creating a matrix of size 17408-by-17408.
Gigaflops on CPU: 254.737638
Gigaflops on GPU: 1004.284010
Creating a matrix of size 21504-by-21504.
Gigaflops on CPU: 277.688546
Gigaflops on GPU: 1028.731291
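To see how these figures translate into the 3X to 6X range, divide the GPU rate by the CPU rate at each matrix size. This minimal sketch only re-uses the printed numbers; it does not rerun the benchmark.

```matlab
% Matrix sizes and GFLOPS copied from the paralleldemo_gpu_backslash output above
nSingle   = [1024 5120 9216 13312 17408 21504 25600 29696];
cpuSingle = [66.278709 235.782899 345.775846 430.923486 ...
             493.923539 529.809413 567.786871 597.913569];
gpuSingle = [107.556334 988.360718 1411.722193 1631.047366 ...
             1708.917025 1754.558735 1804.538355 1842.050491];

nDouble   = [1024 5120 9216 13312 17408 21504];
cpuDouble = [45.881347 112.758309 135.980895 223.848074 254.737638 277.688546];
gpuDouble = [84.044136 653.228694 883.155216 975.277154 1004.284010 1028.731291];

% Element-wise GPU/CPU ratio at each matrix size
speedupSingle = gpuSingle ./ cpuSingle;
speedupDouble = gpuDouble ./ cpuDouble;

disp(table(nSingle', speedupSingle', 'VariableNames', {'N', 'SingleSpeedup'}))
disp(table(nDouble', speedupDouble', 'VariableNames', {'N', 'DoubleSpeedup'}))
```

Apart from the 1024-by-1024 case (about 1.6x single and 1.8x double), which is likely too small to keep the GPU fully busy, the GPU runs about 3.1x to 4.2x faster in single precision and 3.7x to 6.5x faster in double precision. The double-precision advantage peaks at 9216-by-9216 and then shrinks as the CPU rate keeps climbing with matrix size while the GPU rate levels off.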
GPU-Accelerated Stencil Operations
MATLAB also includes a couple of stencil operation demos that run on a GPU. These include both a “generic” implementation using gpuArray and optimized MEX implementations that use GPU shared and texture memory. As shown below, the optimized MEX versions run roughly 30 to 60 times faster than the generic gpuArray version, so properly optimized algorithms can deliver large additional gains on the GPU.
>> paralleldemo_gpu_mexstencil
Average time on the GPU: 1.119ms per generation
Average time of 0.038ms per generation (29.4x faster).
Average time of 0.019ms per generation (58.9x faster).
First version using gpuArray: 1.119ms per generation.
MEX with shared memory: 0.038ms per generation (29.4x faster).
MEX with texture memory: 0.019ms per generation (58.9x faster).
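The “per generation” timings point to a cellular-automaton stencil. Assuming a Game of Life-style update rule, a straightforward gpuArray version (the kind of generic starting point the demo compares against, not the demo’s own code) can be sketched with conv2, which accepts gpuArray inputs. The zero-padded boundary, the 64-by-64 board and the blinker test pattern are our own assumptions; canUseGPU requires MATLAB R2019b or later.

```matlab
% Game of Life written as a stencil: count the 8 neighbors with conv2,
% then apply the birth/survival rule element-wise.
% Works on an ordinary array or a gpuArray; cells outside the board count as dead.
neighborKernel = [1 1 1; 1 0 1; 1 1 1];   % weights for the 8 surrounding cells

board = false(64);                        % 64-by-64 board, all cells dead
board(32, 31:33) = true;                  % a horizontal "blinker" (period 2)
if canUseGPU()
    board = gpuArray(board);              % move the board into GPU memory
end

nGenerations = 3;
for generation = 1:nGenerations
    neighbors = conv2(double(board), neighborKernel, 'same');
    % Birth with exactly 3 neighbors; survival with 2 (or 3) neighbors
    board = (neighbors == 3) | (board & neighbors == 2);
end
result = gather(board);                   % copy back to host memory
```

Each generation here issues several separate array operations and rereads the board from GPU global memory each time; avoiding that traffic with shared or texture memory is one plausible reason the MEX versions are so much faster.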
Running your own test of MATLAB GPU speedups
For more examples, take a look at the GPU-accelerated submissions on MathWorks File Exchange. You’ll find a large number of useful demonstrations, including:
- GPU acceleration for FFTs
- Heat transfer equations
- Navier-Stokes equations for incompressible fluids
- Anisotropic Diffusion
- Gradient Vector Flow (GVF) force field calculation
- 3D linear and trilinear interpolation
- more than 60 others
Also consider that nearly 300 of MATLAB’s standard functions support GPU acceleration (as of release R2014b). Using these capabilities is straightforward: first load your data into a gpuArray. Then pass the gpuArray to any of these GPU-enabled functions and the operations will be carried out on the GPU! Use gather to copy results back to host memory when you need them.
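A minimal sketch of that workflow, using an FFT round trip as the example workload. It assumes the Parallel Computing Toolbox for the GPU path and uses canUseGPU (MATLAB R2019b or later) to fall back to the CPU when no supported GPU is present, so the same code runs either way.

```matlab
% Move data to the GPU, call standard functions on it, and copy the result back.
x = rand(4096, 1, 'single');        % host data
if canUseGPU()
    x = gpuArray(x);                % transfer to GPU memory
end

X = fft(x);                         % runs on the GPU when x is a gpuArray
powerSpectrum = abs(X).^2;          % element-wise operations also stay on the GPU
y = real(ifft(X));                  % round trip back to the time domain

result = gather(y);                 % copy to host memory for plotting or saving
maxError = max(abs(result - gather(x)));
fprintf('Max round-trip error: %g\n', maxError);
```

Only the gpuArray and gather calls mention the GPU; everything in between is ordinary MATLAB, which is what makes porting existing scripts comparatively easy.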
Will GPU acceleration speed up your research?
With our pre-configured GPU cluster, running MATLAB on high-performance GPUs is as easy as running it on your own workstation. Find out for yourself how much faster you’ll be able to work if you add GPUs to your toolbelt. Sign up for a GPU Test Drive today!
“Solving 2nd Order Wave Equation on the GPU Using Spectral Methods” by Jiro Doke, MathWorks MATLAB Central