AVX2 Optimization and Haswell-EP (Xeon E5-2600v3) CPU Features

We’re very excited to be delivering systems with the new Xeon E5-2600v3 and E5-1600v3 CPUs. If you are the type who loves microarchitecture details and compiler optimization, there’s a lot to gain. If you haven’t explored the latest techniques and instructions for optimization, it’s never a bad time to start.

End users often overlook instruction set changes, but these changes can be critical to achieving optimal application performance. Here’s a comparison of the theoretical peak performance of the latest CPUs with and without FMA3:
Plot of Xeon E5-2600v3 Theoretical Peak Performance (GFLOPS)

Only a small set of codes (e.g., LINPACK) can issue FMA instructions almost exclusively. Achieved performance for well-parallelized and optimized applications is likely to fall between the grey and colored bars. Still, unless you use a compiler that can generate FMA3 instructions, you are leaving a significant share of the potential performance of your Xeon E5-2600v3-based hardware on the table.
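Theoretical peak figures of this kind are usually computed as sockets × cores × clock × floating-point operations per cycle. Here is a minimal sketch of that arithmetic in C. The function name and the system in the usage note (2 sockets, 12 cores each, 2.5 GHz) are illustrative assumptions, not the configurations behind the plot. The per-cycle constants reflect Haswell’s two 256-bit FMA units for double precision.

```c
#include <stdio.h>

/* Double-precision FLOPs per cycle per core on Haswell:
 *   with FMA3:    2 FMA units x 4 doubles per 256-bit register x 2 ops = 16
 *   without FMA3: one 4-wide multiply + one 4-wide add per cycle       =  8 */
enum {
    DP_FLOPS_PER_CYCLE_FMA    = 16,
    DP_FLOPS_PER_CYCLE_NO_FMA = 8
};

/* Hypothetical helper: theoretical peak in GFLOPS.
 * clock_ghz is whatever frequency you assume the cores sustain;
 * under heavy AVX load it may be lower than the nominal clock. */
double peak_gflops(int sockets, int cores_per_socket,
                   double clock_ghz, int flops_per_cycle)
{
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle;
}

/* Print both peaks for one (assumed) system configuration. */
void print_peaks(int sockets, int cores_per_socket, double clock_ghz)
{
    printf("with FMA3:    %.1f GFLOPS\n",
           peak_gflops(sockets, cores_per_socket, clock_ghz,
                       DP_FLOPS_PER_CYCLE_FMA));
    printf("without FMA3: %.1f GFLOPS\n",
           peak_gflops(sockets, cores_per_socket, clock_ghz,
                       DP_FLOPS_PER_CYCLE_NO_FMA));
}
```

Calling print_peaks(2, 12, 2.5) for the assumed system gives 960.0 GFLOPS with FMA3 and 480.0 GFLOPS without it. That factor of two is the gap a non-FMA build cannot close.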

Know your CPUs, know your instructions

With that in mind, we would like to summarize and link to these new resources from Intel:

Intel: Xeon E5-2600v3 Technical Overview

  • A brief summary of Haswell-NI (Haswell New Instructions), which adds dedicated instructions for signal processing, encryption, and math functions
  • Summary of power improvements in the Haswell architecture
  • Detailed comparison of C600 and C610 series chipsets
  • Virtualization improvements and new security features

Intel: How AVX2 Improves Performance on Server Applications

  • Instructions on how to recompile your code for AVX2 instructions and supported compilers
  • Other methods of employing AVX2: Intel MKL, coding with intrinsics, and assembly
  • Summary of LINPACK performance gains delivered simply by using AVX2
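To give a feel for the intrinsics route mentioned above, here is a small sketch of a DAXPY-style loop (y = a*x + y) written with AVX2/FMA3 intrinsics. It is not taken from Intel’s article. The function names are our own, and the code assumes GCC or Clang, using their target attribute and __builtin_cpu_supports so that one binary still runs on CPUs without FMA3.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2/FMA3 path: processes 4 doubles per iteration with one fused
 * multiply-add. The target attribute (GCC/Clang) enables the
 * instructions for this function only. */
__attribute__((target("avx2,fma")))
static void daxpy_fma(double a, const double *x, double *y, size_t n)
{
    const __m256d va = _mm256_set1_pd(a);      /* broadcast a to 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);   /* unaligned loads */
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_fmadd_pd(va, vx, vy);      /* vy = va * vx + vy */
        _mm256_storeu_pd(y + i, vy);
    }
    for (; i < n; ++i)                         /* remainder elements */
        y[i] = a * x[i] + y[i];
}
#endif

/* Portable fallback for CPUs (or architectures) without FMA3. */
static void daxpy_scalar(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Dispatch at run time based on what the CPU reports. */
void daxpy(double a, const double *x, double *y, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        daxpy_fma(a, x, y, n);
        return;
    }
#endif
    daxpy_scalar(a, x, y, n);
}
```

In many cases you will not need intrinsics at all. Recompiling the plain scalar loop with optimization and an AVX2 target (for example -O3 -march=haswell or -mavx2 -mfma with GCC, or -xCORE-AVX2 with the Intel compiler) lets the compiler vectorize it and emit FMA instructions for you. Note that a fused multiply-add rounds once rather than twice, so results can differ in the last bit from a non-FMA build.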

