Intel Xeon E5-2600 v3 “Haswell” Processor Review

Update:

As of March 31, 2016, we recommend version four of these Intel Xeon CPUs. Please see our new post: Intel Xeon E5-2600 v4 “Broadwell” Processor Review.

Intel has launched brand new Xeon E5-2600 v3 CPUs with groundbreaking new features. These CPUs build upon the leading performance of their predecessors with a more robust microarchitecture, faster memory, wider buses, and increased core counts and clock speeds. The result is dramatically improved performance for HPC.

Important changes available in E5-2600 v3 “Haswell” include:

  • Support for brand new DDR4-2133 memory
  • Up to 18 processor cores per socket (with options from 4 to 16 cores)
  • Improved AVX 2.0 Instructions with:
    • New floating point FMA, with up to 2X the FLOPS per core (16 FLOPS/clock)
    • 256-bit wide integer vector instructions
  • A revised C610 Series Chipset delivering substantially improved I/O for every server (SATA, USB 3.0)
  • Increased L1, L2 cache bandwidth and faster QPI links
  • Slightly tweaked “Grantley” socket (Socket R3) and platforms

DDR4: Memory Architecture for the Present and Future

Xeon E5-2600 v3 is one of the first server CPUs to support DDR4 memory. DDR4 is big news: it takes advantage of a new design with fewer chips on each module, lower voltages, and superior power efficiency (20% less power per module). Apart from the benefits today, these changes ensure DDR4 DIMMs are primed to accept ever higher chip densities and clocks that exceed those of today’s DDR4-2133 modules. Physical characteristics of the DIMMs themselves have changed too: a slight curvature for easier seating and more pins on each module.

Memory Performance

On top of the new JEDEC standard for the DIMMs themselves, Intel has raised the supported memory speed by one step for all Xeon E5-2600 v3 CPU SKUs. The result is a 14-20% increase in memory performance:

  • Entry-level “Basic” CPUs now support 1600MHz memory (a 20% increase)
  • Mid-level “Standard” CPUs now support 1866MHz memory (a 16% increase)
  • Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 2133MHz (a 14% increase)
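The percentages above follow from the ratio of new to old memory clock. The following is a minimal sketch of that arithmetic, not vendor data: the previous-generation speeds (1333, 1600, and 1866 MHz) are inferred from the stated percentages, and the peak-bandwidth estimate assumes four 64-bit memory channels per socket.

```python
# Assumed previous-generation (v2) memory speeds per tier, in MHz (MT/s).
# These are inferred from the percentages quoted above, not from a spec sheet.
V2_SPEED = {"Basic": 1333, "Standard": 1600, "Advanced": 1866}
V3_SPEED = {"Basic": 1600, "Standard": 1866, "Advanced": 2133}

CHANNELS_PER_SOCKET = 4   # assumption: quad-channel memory controller
BYTES_PER_TRANSFER = 8    # one 64-bit DDR channel moves 8 bytes per transfer


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def peak_bandwidth_gbs(mts: float) -> float:
    """Theoretical peak bandwidth per socket in GB/s (10^9 bytes/s)."""
    return mts * 1e6 * BYTES_PER_TRANSFER * CHANNELS_PER_SOCKET / 1e9


for tier in V3_SPEED:
    gain = percent_increase(V2_SPEED[tier], V3_SPEED[tier])
    bw = peak_bandwidth_gbs(V3_SPEED[tier])
    print(f"{tier:9s} {V2_SPEED[tier]} -> {V3_SPEED[tier]} MHz: "
          f"+{gain:.1f}%, peak {bw:.1f} GB/s per socket")
```

These are theoretical ceilings with one DIMM per channel; sustained bandwidth in real applications will be lower.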

Finally, it’s worth noting that configurations that populate 3 DIMMs per channel (up to a 40% performance penalty with older Xeons) or use LR-DIMMs (a 14-40% penalty on the previous generation, depending on population) run at far higher frequencies than they did on previous-generation CPUs.

In short, DDR4 means even higher memory bandwidth today – a critical driver of HPC performance. It pairs nicely with the increased core counts of the new CPUs.

New Instructions – AVX 2.0

One of the primary drivers of the Xeon E5-2600 CPUs’ robust performance has been wider instructions, termed AVX (Advanced Vector Extensions). Intel has made its largest improvement to AVX in 3 years with Haswell’s addition of AVX 2.0:

256-bit integer instructions

Sandy Bridge and Ivy Bridge CPUs delivered class-leading floating point performance due to a 256-bit floating point unit in each core. This unit was twice as wide as that in previous Xeon CPUs and enabled twice the FLOPS of competing CPUs.

The integer unit remained at 128-bit (identical in Sandy Bridge and Ivy Bridge), but integer performance was buttressed with comparatively high clock speeds and Turbo Boost features.

With Xeon E5-2600 v3, Intel has widened the integer unit to the same 256 bits. The result is faster performance on many integer codes, even on CPUs with lower clock speeds. For example, the integer performance of the 12-core 2.7GHz Ivy Bridge E5-2697 v2 lies roughly between that of two Haswell processors: the E5-2660 v3 (10-core, 2.6GHz) and the E5-2670 v3 (12-core, 2.3GHz).
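To make the effect of the wider integer unit concrete, the short sketch below counts how many integer elements fit in one vector instruction at each width. It is simple arithmetic on the 128-bit and 256-bit figures above; the helper name `lanes` is ours, and real speedups depend on how well a given code vectorizes.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of integer elements processed by one vector instruction."""
    return vector_bits // element_bits


for element_bits in (8, 16, 32, 64):
    old = lanes(128, element_bits)   # 128-bit integer unit (Sandy Bridge / Ivy Bridge)
    new = lanes(256, element_bits)   # 256-bit integer unit (Haswell, AVX 2.0)
    print(f"int{element_bits}: {old} -> {new} elements per instruction")
```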

FMA

AVX 2.0 also features a new fused multiply-add (FMA) instruction. For codes that perform a multiply followed closely by an add, FMA cuts the number of cycles in half. Doubling the FLOPS in the areas of code that use these instructions proves extremely consequential for math and science algorithms. Since floating point performance is most important to our customers, we discuss these improvements in more detail below.
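The 16 FLOPS/clock figure quoted earlier can be reproduced with a small calculation. This is a sketch under stated assumptions rather than an official derivation: it assumes double precision (64-bit elements), two 256-bit FMA units per Haswell core, and one 256-bit add unit plus one 256-bit multiply unit per Ivy Bridge core.

```python
def flops_per_clock(vector_bits: int, element_bits: int,
                    units: int, flops_per_element: int) -> int:
    """Peak floating point operations per core per clock cycle."""
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_element


# Ivy Bridge (AVX): assumed one add unit plus one multiply unit,
# each producing 1 FLOP per element per cycle.
ivy_bridge_dp = flops_per_clock(256, 64, units=2, flops_per_element=1)

# Haswell (AVX 2.0 with FMA): assumed two FMA units; a fused multiply-add
# counts as 2 FLOPs (one multiply, one add) per element.
haswell_dp = flops_per_clock(256, 64, units=2, flops_per_element=2)

print(f"Ivy Bridge: {ivy_bridge_dp} DP FLOPS/clock per core")
print(f"Haswell:    {haswell_dp} DP FLOPS/clock per core")
```

The doubling only materializes when multiplies and adds can actually be paired into FMA operations.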

Performance – Faster in Nearly Every Metric

As with the Sandy Bridge generation of Xeons, Intel has introduced a new architecture, improved memory performance, and increased core counts and clock speeds all at once.

Users generally should expect at least a 10% increase in performance per core, excluding the new instructions. Coupled with the memory change and new instructions, this means dramatic changes (SPEC CPU2006 benchmarks):

  • Xeon E5-2620 v2 to v3: 18% performance improvement
  • Xeon E5-2630 – E5-2697 v2 to E5-2630 – E5-2697 v3: between 22% and 29% performance improvement ¹
  • Xeon E5-2697 v2 to Xeon E5-2698 v3/E5-2699 v3: between 27% and 32% performance improvement ²

¹ Transitioning from a v2 SKU to the v3 SKU with the same model number (e.g., Xeon E5-2640 v2 to Xeon E5-2640 v3, 2.0 vs. 2.6GHz) often bundles an increase in core count, clock speed, and memory performance with the architecture improvements. The stated performance increase represents the net gain of these factors. DDR4 memory might result in a higher system cost.

² These two new high-end Haswell processors have no equivalent Ivy Bridge SKU and thus enjoy the largest performance deltas.

Theoretical Performance and LINPACK

Below is a chart with the theoretical peak performance (FLOPS) of the new Haswell-EP (Xeon E5-2600 v3) CPUs with the new instructions. It shows that the Haswell E5-2630 v3 is roughly equivalent to the flagship Ivy Bridge E5-2697 v2 (whose performance suffers without the new instruction support).

[Chart: Xeon E5-2600 v3 vs. Xeon E5-2600 v2 theoretical peak performance when using FMA3 and AVX instructions]

Keep in mind, however, that these are peak theoretical numbers; depending upon how much your applications can take advantage of FMA, the performance gains could be far lower (see our Detailed Specifications). The 20% – 30% increases mentioned earlier come from the SPEC CPU2006 benchmarks, which execute a suite of real-world applications.
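Theoretical peak is conventionally computed as cores × clock × FLOPS per clock. The sketch below applies that formula using the stock frequencies and core counts quoted in this article, 16 FLOPS/clock for Haswell, and half that for Ivy Bridge. Using the stock frequency is an assumption; the clock actually sustained under heavy AVX load may differ, so these figures will not necessarily match the chart.

```python
def peak_gflops(cores: int, ghz: float, flops_per_clock: int) -> float:
    """Theoretical double-precision peak per socket, in GFLOPS."""
    return cores * ghz * flops_per_clock


# (cores, stock GHz, FLOPS per clock) taken from figures in this article.
cpus = {
    "E5-2697 v2 (Ivy Bridge)": (12, 2.7, 8),
    "E5-2630 v3 (Haswell)": (8, 2.4, 16),
    "E5-2699 v3 (Haswell)": (18, 2.3, 16),
}

peaks = {name: peak_gflops(*spec) for name, spec in cpus.items()}
for name, gflops in peaks.items():
    print(f"{name}: {gflops:.1f} GFLOPS theoretical peak")
```

Under these assumptions the 8-core E5-2630 v3 lands in the same range as the 12-core E5-2697 v2, which is the point the chart makes.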

Another dramatic comparison is the Xeon E5-2697 v2 (2.7GHz, 12-core) to the new Xeon E5-2699 v3 (2.3GHz, 18-core) on LINPACK. The new model delivers a 91% increase in performance. The main reason for this substantial improvement is the new AVX 2.0 instruction set, specifically FMA. The increase in core count also contributes.

Should you prefer the most apples-to-apples architecture comparison, the Xeon E5-2697 v2 (2.7GHz, 12-core) against the Xeon E5-2690 v3 (2.6GHz, 12-core), there is a 54% increase in LINPACK performance.
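One rough way to separate the core-count contribution from the architectural contribution is to divide the overall gain by the ratio of core counts. The sketch below does this for the two LINPACK comparisons above. It assumes performance scales linearly with core count and ignores the clock speed differences between the models, so treat the result as an illustration only.

```python
def per_core_gain_pct(total_gain_pct: float, old_cores: int, new_cores: int) -> float:
    """Gain left over after dividing out the change in core count."""
    total_ratio = 1.0 + total_gain_pct / 100.0
    core_ratio = new_cores / old_cores
    return (total_ratio / core_ratio - 1.0) * 100.0


# LINPACK gains quoted above, both relative to the 12-core E5-2697 v2.
comparisons = {
    "E5-2699 v3 (18 cores)": per_core_gain_pct(91, 12, 18),
    "E5-2690 v3 (12 cores)": per_core_gain_pct(54, 12, 12),
}
for name, gain in comparisons.items():
    print(f"{name}: about {gain:.0f}% more LINPACK throughput per core")
```

The lower per-core figure for the 18-core part is consistent with its lower 2.3GHz clock.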

 

Transitioning from “Ivy Bridge” E5-2600 v2 Series Xeons

Xeon E5-2600 v3 and Xeon E5-2600 v2 CPUs do not use the same CPU socket, and DDR4 does come with a cost premium. Some large installations may still find a price/performance argument for the Ivy Bridge CPUs, and a few platforms (e.g., complex Phi- & GPU-accelerated servers) will take time to transition to the new CPU socket.

However, end users who are willing to invest slightly more will find attractive new SKUs to leverage in their clusters, servers, and workstations. All new CPUs offer faster memory speeds and QPI transfers. Applications which effectively leverage the new FMA instructions should be able to achieve higher performance than flagship v2 CPUs using almost any of the v3 CPUs.

Comparisons of note (providing increased value for your dollar):

  • Xeon E5-2640 v2 transition to Xeon E5-2630 v3: same core count, faster clock speeds, faster memory; lower price
  • Xeon E5-2650 v2 transition to Xeon E5-2640 v3: identical core count, clock speed, and Turbo Boost speed; lower price
  • Xeon E5-2695 v2 and E5-2697 v2 transition to Xeon E5-2690 v3: similar base and turbo speeds; lower price
  • Xeon E5-2695 v2 and E5-2697 v2 transition to Xeon E5-2683 v3: for well-threaded applications able to accept a lower clock speed, the two extra cores in Xeon E5-2683 v3 will outperform at a much lower price

Nearly all processor transitions come at similar or lower costs on the CPU-side. Customers may choose to apply the savings towards their DDR4 memory capacity.

Further Grantley Platform Improvements

C610 Series Chipset

Some end-users found the earlier C600 chipset needed to be supplemented to meet their needs. Intel has added features that address many of these situations:

  1. SATA: Increase from 2 SATA3 + 4 SATA2 to at least 6 SATA3 ports
  2. USB: USB 3.0 support now native to the chipset, rather than board manufacturers adding a supplemental chip
  3. Ethernet: More common deployment of RJ45-based 10GigE; a new 40GigE controller (Fortville)

QPI Links

Intel’s QuickPath Interconnect (QPI) link between the two CPU sockets now features faster speeds for every SKU:

  • Entry-level “Basic” CPUs at 7.2 GT/sec
  • Mid-level “Standard” CPUs at 8.0 GT/sec
  • Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs at 9.6 GT/sec

QPI allows for rapid access to memory on the non-local CPU socket.
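The transfer rates above can be converted to approximate bandwidth figures. The sketch below assumes each QPI link carries 16 data bits (2 bytes) per transfer in each direction. That payload width is not stated in this article, so the results are estimates of theoretical per-link bandwidth, not measured throughput.

```python
BYTES_PER_TRANSFER = 2  # assumption: 16 data bits per QPI transfer, per direction


def qpi_gbs_per_direction(gt_per_sec: float) -> float:
    """Theoretical QPI bandwidth per link, per direction, in GB/s."""
    return gt_per_sec * BYTES_PER_TRANSFER


for tier, rate in (("Basic", 7.2), ("Standard", 8.0), ("Advanced", 9.6)):
    one_way = qpi_gbs_per_direction(rate)
    print(f"{tier:9s} {rate} GT/s -> {one_way:.1f} GB/s per direction, "
          f"{2 * one_way:.1f} GB/s bidirectional per link")
```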

Next Steps – Putting Xeon E5-2600 v3 into Production

As always, please contact an HPC expert if you would like to discuss in further detail. You may also wish to review our products which leverage these new Xeon processors:

For more analysis of the Xeon E5-2600 v3 processor series, please read:

Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors

Intel’s Xeon E5 Resource Page

Summary of Intel Xeon E5-2600 v3 Series Specifications

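| Model | Stock Frequency | Max Turbo Boost | Core Count | Memory Speed | L3 Cache | QPI Speed | TDP (Watts) |
|---|---|---|---|---|---|---|---|
| E5-2699 v3 | 2.30 GHz | 3.60 GHz | 18 | 2133 MHz | 45MB | 9.6 GT/s | 145 |
| E5-2698 v3 | 2.30 GHz | 3.60 GHz | 16 | 2133 MHz | 40MB | 9.6 GT/s | 135 |
| E5-2697 v3 | 2.60 GHz | 3.60 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 145 |
| E5-2695 v3 | 2.30 GHz | 3.30 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 120 |
| E5-2683 v3 | 2.00 GHz | 3.00 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 120 |
| E5-2690 v3 | 2.60 GHz | 3.50 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 135 |
| E5-2680 v3 | 2.50 GHz | 3.30 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 120 |
| E5-2670 v3 | 2.30 GHz | 3.10 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 120 |
| E5-2687W v3 | 3.10 GHz | 3.50 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 160 |
| E5-2660 v3 | 2.60 GHz | 3.30 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 105 |
| E5-2650 v3 | 2.30 GHz | 3.00 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 105 |
| E5-2667 v3 | 3.20 GHz | 3.60 GHz | 8 | 2133 MHz | 20MB | 9.6 GT/s | 135 |
| E5-2640 v3 | 2.60 GHz | 3.40 GHz | 8 | 1866 MHz | 20MB | 8 GT/s | 90 |
| E5-2630 v3 | 2.40 GHz | 3.20 GHz | 8 | 1866 MHz | 20MB | 8 GT/s | 85 |
| E5-2643 v3 | 3.40 GHz | 3.70 GHz | 6 | 2133 MHz | 20MB | 9.6 GT/s | 135 |
| E5-2620 v3 | 2.40 GHz | 3.20 GHz | 6 | 1866 MHz | 15MB | 8 GT/s | 85 |
| E5-2637 v3 | 3.50 GHz | 3.70 GHz | 4 | 2133 MHz | 15MB | 9.6 GT/s | 135 |
| E5-2623 v3 | 3.00 GHz | 3.50 GHz | 4 | 1866 MHz | 10MB | 8 GT/s | 105 |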

HPC groups do not typically choose Intel’s “Basic” and “Low Power” models, so those SKUs are not shown.
