Summer has arrived, bringing new technologies that improve your productivity. We're introducing new solutions built upon the Intel Xeon Phi coprocessor (based on Intel's MIC architecture). Our blog features a series of posts on getting the best performance from your coprocessor or GPU. NVIDIA is preparing CUDA 5.5, an improved update to CUDA 5. Look out for a new Microway.com in July.
Intel Fills out the Xeon Phi Product Line
Along with the announcement that Xeon Phi powers the most powerful supercomputer in the world, Tianhe-2, Intel has released the next models in the Phi product line. These provide a richer set of options for those looking to get started with coprocessors (Xeon Phi 3120A, actively cooled) and those looking for the highest performance (Xeon Phi 7120P).
Model          | DP GFLOPS | Memory Capacity | Memory Bandwidth
Xeon Phi 3120P | 1003      | 6GB             | 240 GB/s
Xeon Phi 3120A | 1003      | 6GB             | 240 GB/s
Xeon Phi 5110P | 1011      | 8GB             | 320 GB/s
Xeon Phi 7120P | 1208      | 16GB            | 352 GB/s
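The DP GFLOPS figures above follow directly from each card's core count and clock: peak double-precision GFLOPS = cores × GHz × 16 DP operations per cycle (fused multiply-add across the 512-bit vector unit). A quick sketch of the arithmetic, assuming the commonly published core counts and clocks (57 cores at 1.100 GHz for the 3120 series, 60 at 1.053 GHz for the 5110P, 61 at 1.238 GHz for the 7120P):

```python
# Peak double-precision GFLOPS for Xeon Phi: cores * clock (GHz) * 16 DP flops/cycle.
# Core counts and clock speeds below are the commonly published specs (assumptions).
def peak_dp_gflops(cores, ghz, flops_per_cycle=16):
    return round(cores * ghz * flops_per_cycle)

specs = {
    "Xeon Phi 3120A/P": (57, 1.100),
    "Xeon Phi 5110P":   (60, 1.053),
    "Xeon Phi 7120P":   (61, 1.238),
}

for model, (cores, ghz) in specs.items():
    print(model, peak_dp_gflops(cores, ghz))  # 1003, 1011, 1208 GFLOPS
```

The results line up with the table above, which is a handy sanity check when comparing accelerator spec sheets.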
Xeon Phi coprocessors are now available in Microway's full line of products, including quiet WhisperStations and rackmount NumberSmasher servers. All are available with parallel compilers/analyzers and accelerated math libraries preconfigured.
New or Updated Xeon Phi Programming and Educational Resources
Microway HPC Tech Tip
Parallel Code: Maximizing Your Performance Potential
Abstract: No matter the purpose of your application, one thing is certain: you want to get the most bang for your buck. Research papers are regularly published and presented claiming tremendous speed increases from running algorithms on GPUs (e.g., NVIDIA Tesla), in a cluster, or on a hardware accelerator (such as the Xeon Phi or Cell BE). These architectures allow for massively parallel execution of code that, done properly, can yield lofty performance gains.
Unlike most aspects of parallel development, actually writing the programs is relatively simple. Most hardware accelerators support (or closely resemble) C-based programming languages, which makes hitting the ground running with parallel coding an achievable task. Mastering the development of massively parallel code is an entirely different matter, but with a basic understanding of the principles behind efficient parallel code, one can obtain substantial performance increases over traditional serial execution of the same algorithms.
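The core principle behind most of those gains is simple data decomposition: split the work into independent chunks and run them concurrently. A minimal Python sketch of the pattern (the function names are ours, and the thread pool only illustrates the decomposition; on a GPU or Xeon Phi the same structure maps onto thousands of hardware threads):

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_chunk(args):
    # Each worker computes a*x + y over its own slice -- no shared state.
    a, x, y = args
    return [a * xi + yi for xi, yi in zip(x, y)]

def parallel_saxpy(a, x, y, workers=4):
    # Decompose the input arrays into independent chunks, one per worker.
    n = len(x)
    chunk = (n + workers - 1) // workers
    pieces = [(a, x[i:i + chunk], y[i:i + chunk]) for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(saxpy_chunk, pieces)  # map preserves chunk order
    # Reassemble the partial results into the final array.
    return [v for part in results for v in part]

print(parallel_saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [6.0, 9.0, 12.0]
```

Because every chunk is independent, the loop body never synchronizes with its neighbors; that independence is exactly what lets accelerators execute the same code across thousands of threads at once.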
New Partnerships with Panasas and Revolution Analytics
We are excited to announce Microway's partnership with Panasas for high-performance parallel storage.
Panasas ActiveStor is the world's fastest parallel storage system, bringing plug-and-play simplicity to large scale storage deployments. ActiveStor offers the performance of parallel storage without the headaches commonly associated with such systems.
Microway has also entered into a partnership with Revolution Analytics. Researchers throughout the world use R, but many require more performance than is available from the open-source package.
Revolution R provides significant performance and scalability features, without sacrificing compatibility with the community-developed packages researchers rely upon.
Further details will be announced soon.
NVIDIA Announces CUDA 5.5 Release Candidate
NVIDIA has announced that CUDA 5.5 is ready for testing, with a production release to follow. Features of note include:
Optimized For MPI Applications
- Enhanced Hyper-Q support for multiple MPI processes via the new Multi-Process Service (MPS) on Linux systems
- MPI Workload Prioritization enabled by CUDA stream prioritization
- Multi-process MPI debugging and profiling
Guided Performance Analysis
- Step-by-step guidance helps you identify performance bottlenecks and apply optimizations in the NVIDIA Visual Profiler and Nsight Eclipse Edition
Support For ARM Platforms
- Native compilation, for easy application porting
- Fast cross-compile on x86 for large applications
Fast CUDA-Python is here with NumbaPro
Python has a massive user base and robust community. If you've been waiting for fast CUDA-Python support, Continuum Analytics is delivering NumbaPro in partnership with NVIDIA. NumbaPro is part of the Anaconda Accelerate library and features:
- GPU targeting with single line vectorization commands
- Robust just-in-time compiler for CUDA GPUs, targeted at more complicated codes
- Support for the new high-level Compute Unit (CU) abstraction (Experimental)
- Optional CUDA-based API for custom management of threads and blocks
New CUDA Handbook Available
This is a huge 500+ page volume for CUDA GPU programmers. Written by one of the original architects of CUDA, it includes critical information for improving your CUDA code's performance and covers advanced techniques:
- Detailed discussion of programming for the Kepler GPU architecture and CUDA 5 features
- Discussion of host hardware architecture (CPU), PCI-E structure, NUMA/SMP and their effects on performance
- Detailed tips for programming multiple GPUs
- Microbenchmarks for memory bandwidth and performance
- New demo code optimized for Kepler GPU architecture
- Information on pre-ported libraries