Update:
As of March 31, 2016 we recommend version four of these Intel Xeon CPUs. Please see our new post: Intel Xeon E5-2600 v4 “Broadwell” Processor Review.
Intel has launched brand new Xeon E5-2600 v3 CPUs with groundbreaking new features. These CPUs build upon the leading performance of their predecessors with a more robust microarchitecture, faster memory, wider buses, and increased core counts and clock speeds. The result is dramatically improved performance for HPC.
Important changes available in E5-2600 v3 “Haswell” include:
- Support for brand new DDR4-2133 memory
- Up to 18 processor cores per socket (with options for 6 to 16 cores)
- Improved AVX 2.0 Instructions with:
  - New floating point FMA, with up to 2X the FLOPS per core (16 FLOPS/clock)
  - 256-bit wide integer vector instructions
- A revised C610 Series Chipset delivering substantially improved I/O for every server (SATA, USB 3.0)
- Increased L1, L2 cache bandwidth and faster QPI links
- Slightly tweaked “Grantley” socket (Socket R3) and platforms
DDR4: Memory Architecture for the Present and Future
Xeon E5-2600 v3 is one of the first server CPUs to support DDR4 memory. DDR4 is big news: it takes advantage of a new design with fewer chips on each module, lower voltages, and superior power efficiency (20% less power per module). Apart from the benefits today, these changes ensure DDR4 DIMMs are primed to accept ever higher chip densities and clocks that exceed those of today’s DDR4-2133 modules. Physical characteristics of the DIMMs themselves have changed too: a slight curvature for easier seating and more pins on each module.
Memory Performance
On top of the new JEDEC standard for the DIMMs themselves, Intel has increased the memory speed stepping for all Xeon E5-2600 v3 CPU SKUs. The result is a 14-20% increase in memory performance:
- Entry-level “Basic” CPUs now support 1600MHz memory (a 20% increase)
- Mid-level “Standard” CPUs now support 1866MHz memory (a 16% increase)
- Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 2133MHz (a 14% increase)
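The percentages in the list above can be reproduced with a little arithmetic. The sketch below is illustrative only: the v3 speeds come from the list, while the previous-generation speeds (1333, 1600, and 1866) are assumptions inferred from the quoted gains, and the four memory channels per socket and 64-bit channel width are likewise assumptions used to estimate a theoretical peak bandwidth.

```python
# Memory speeds in MT/s (commonly quoted as "MHz").
# The v3 values come from the list above; the v2 values are ASSUMPTIONS
# inferred from the quoted percentage gains.
MEMORY_SPEEDS = {
    "Basic": (1333, 1600),
    "Standard": (1600, 1866),
    "Advanced": (1866, 2133),
}

CHANNELS_PER_SOCKET = 4  # assumption: four memory channels per socket
BYTES_PER_TRANSFER = 8   # assumption: 64-bit data path per channel


def percent_increase(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def peak_bandwidth_gbs(mt_per_s, channels=CHANNELS_PER_SOCKET):
    """Theoretical peak bandwidth per socket in GB/s (10^9 bytes/s)."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


for tier, (v2_speed, v3_speed) in MEMORY_SPEEDS.items():
    print(
        f"{tier:9s} {v2_speed} -> {v3_speed} MT/s: "
        f"+{percent_increase(v2_speed, v3_speed):.1f}%, "
        f"peak {peak_bandwidth_gbs(v3_speed):.1f} GB/s per socket"
    )
```

These are theoretical ceilings; sustained bandwidth measured by real applications will be lower.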
Finally, it’s worth noting that configurations that populate 3 DIMMs per channel (up to a 40% performance penalty with older Xeons) or use LR-DIMMs (a 14-40% penalty on the previous generation, depending on population) run at far higher memory frequencies than they did on the previous-generation Xeon E5-2600 v2 CPUs.
In short, DDR4 means even higher memory bandwidth today – a critical driver of HPC performance. It pairs nicely with the increased core counts of the new CPUs.
New Instructions – AVX 2.0
One of the primary drivers of the Xeon E5-2600 CPUs’ robust performance has been wider instructions, termed AVX (Advanced Vector Extensions). Intel has made its largest improvement to AVX in 3 years with Haswell’s addition of AVX 2.0:
256-bit integer instructions
Sandy Bridge and Ivy Bridge CPUs delivered class-leading floating point performance due to a 256-bit floating point unit in each core. This unit was twice as wide as that in previous Xeon CPUs and enabled twice the FLOPS of competing CPUs.
The integer unit remained at 128-bit (identical in Sandy Bridge and Ivy Bridge), but integer performance was buttressed with comparatively high clock speeds and Turbo Boost features.
With Xeon E5-2600 v3, Intel has widened the integer unit to the same 256 bits. The result is faster performance on many integer codes, even on CPUs with lower clock speeds. For example, the integer performance of the 12-core 2.7GHz Ivy Bridge E5-2697 v2 lies roughly between that of two Haswell processors: the E5-2660 v3 (10-core, 2.6GHz) and the E5-2670 v3 (12-core, 2.3GHz).
FMA
AVX 2.0 also features a new fused multiply-add (FMA) instruction. For codes that perform multiply and add instructions in short succession, FMA cuts the number of cycles in half. Doubling the FLOPS for areas of code that leverage these instructions proves extremely consequential for math and science algorithms. Since floating point performance is most important to our customers, we discuss these improvements in more detail below.
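To see where the 16 FLOPS/clock figure can come from, here is a rough counting sketch. It assumes double-precision (64-bit) operands, two vector execution units per core in both generations, and that one FMA counts as two floating point operations per lane. Those assumptions are ours for illustration and are not stated by Intel in this post.

```python
def flops_per_clock(vector_bits, element_bits, units, flops_per_op):
    """Peak floating point operations per core per clock cycle.

    vector_bits  -- width of the vector registers (256 for AVX)
    element_bits -- width of one operand (64 for double precision)
    units        -- vector units able to issue in the same cycle (assumed)
    flops_per_op -- operations each lane performs per instruction
    """
    lanes = vector_bits // element_bits
    return lanes * units * flops_per_op


# Assumed pre-Haswell model: one multiply unit plus one add unit,
# each producing 1 FLOP per lane per cycle.
ivy_bridge = flops_per_clock(256, 64, units=2, flops_per_op=1)

# Assumed Haswell model: two FMA units, each FMA = multiply + add per lane.
haswell = flops_per_clock(256, 64, units=2, flops_per_op=2)

print(f"Ivy Bridge: {ivy_bridge} FLOPS/clock, Haswell: {haswell} FLOPS/clock")
```

The doubling only materializes when a code's multiplies and adds can actually be paired into FMA instructions.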
Performance – Faster in Nearly Every Metric
Much like with the Sandy Bridge generation of Xeons, Intel has plugged in a new architecture, improved memory performance, and increased core counts and clock speeds all at once.
Users generally should expect at least a 10% increase in performance per core, excluding the new instructions. Coupled with the memory change and new instructions, this means dramatic changes (SPEC CPU2006 benchmarks):
- Xeon E5-2620 v2 to v3: 18% performance improvement
- Xeon E5-2630 – E5-2697 v2 to E5-2630 – E5-2697 v3: between 22% and 29% performance improvement ¹
- Xeon E5-2697 v2 to Xeon E5-2698 v3/E5-2699 v3: between 27% and 32% performance improvement ²
¹ Transitioning from a v2 SKU to the v3 SKU with the same model number (e.g., Xeon E5-2640 v2 to Xeon E5-2640 v3, 2.0 vs. 2.6GHz) often bundles an increase in core count, clock speed, memory performance, and the architecture improvements. The stated performance increase represents the net gain of these factors. DDR4 memory might result in a higher system cost.
² These two new high-end Haswell processors have no equivalent Ivy Bridge SKU and thus enjoy the largest performance deltas.
Theoretical Performance and LINPACK
Below is a chart with the theoretical peak performance (FLOPS) of the new Haswell-EP (Xeon E5-2600 v3) CPUs with the new instructions. It shows that the Haswell E5-2630 v3 is roughly equivalent to the flagship Ivy Bridge E5-2697 v2 (whose performance suffers without the new instruction support).
Keep in mind, however, that these are peak theoretical numbers; depending upon how much your applications can take advantage of FMA, the performance gains could be far lower (see our Detailed Specifications). The 20% – 30% increases mentioned earlier come from the SPEC CPU2006 benchmarks, which execute a suite of real world applications.
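For readers who want to reproduce a theoretical peak figure, a minimal sketch follows. It multiplies core count, stock clock, and FLOPS per clock, using 16 FLOPS/clock for Haswell and, as an assumption derived from the "up to 2X" claim, 8 FLOPS/clock for Ivy Bridge. Using the stock frequency is also an assumption; clocks under sustained AVX load may differ, so treat the results as upper bounds.

```python
# (cores, stock clock in GHz, FLOPS per clock per core)
# Core counts and clocks are taken from this post; the FLOPS/clock values
# are ASSUMPTIONS (16 for Haswell, half of that for Ivy Bridge).
CPUS = {
    "E5-2697 v2": (12, 2.7, 8),
    "E5-2630 v3": (8, 2.4, 16),
    "E5-2699 v3": (18, 2.3, 16),
}


def peak_gflops(cores, ghz, flops_per_clock):
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPS per clock."""
    return cores * ghz * flops_per_clock


for model, spec in CPUS.items():
    print(f"{model}: {peak_gflops(*spec):.1f} GFLOPS (theoretical peak)")
```

Under these assumptions the 8-core E5-2630 v3 comes out at roughly 307 GFLOPS against roughly 259 GFLOPS for the 12-core E5-2697 v2, which is in line with the comparison above.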
Another dramatic comparison is the Xeon E5-2697 v2 (2.7GHz, 12-core) to the new Xeon E5-2699 v3 (2.3GHz, 18-core) on LINPACK. The new model represents a 91% increase in performance. The main reason for this substantial improvement is the new AVX 2.0 instruction set, specifically FMA. The increase in core count also contributes.
Should you prefer the most apples-to-apples architecture comparison of Xeon E5-2697 v2 (2.7GHz, 12-core) to the Xeon E5-2690 v3 (2.6GHz, 12-core), there is a 54% increase in LINPACK performance.
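One way to separate the architectural contribution from the extra cores and the clock difference is to divide the reported LINPACK speedup by the change in aggregate core-GHz. The sketch below does only that arithmetic on the figures quoted above; it is a rough normalization that assumes stock clocks and ignores memory and Turbo Boost effects.

```python
def per_core_clock_gain(reported_gain_pct, old_cpu, new_cpu):
    """Speedup left over after normalizing for cores x clock.

    old_cpu / new_cpu are (cores, ghz) tuples. A result of 1.5 means each
    core delivers about 1.5x the LINPACK throughput per clock cycle.
    """
    speedup = 1.0 + reported_gain_pct / 100.0
    old_cores, old_ghz = old_cpu
    new_cores, new_ghz = new_cpu
    resource_ratio = (new_cores * new_ghz) / (old_cores * old_ghz)
    return speedup / resource_ratio


E5_2697_V2 = (12, 2.7)
E5_2699_V3 = (18, 2.3)
E5_2690_V3 = (12, 2.6)

flagship = per_core_clock_gain(91, E5_2697_V2, E5_2699_V3)
like_for_like = per_core_clock_gain(54, E5_2697_V2, E5_2690_V3)

print(f"E5-2699 v3: {flagship:.2f}x per core per clock")
print(f"E5-2690 v3: {like_for_like:.2f}x per core per clock")
```

Both comparisons land at roughly 1.5x to 1.6x per core per clock, short of the theoretical 2X from FMA, which fits the caution that peak numbers are rarely reached in practice.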
Transitioning from “Ivy Bridge” E5-2600 v2 Series Xeons
Xeon E5-2600 v3 and Xeon E5-2600 v2 CPUs do not use the same CPU socket, and DDR4 does come with a cost premium. Some large installations may still find a price/performance argument for the Ivy Bridge CPUs, and a few platforms (e.g., complex Phi- & GPU-accelerated servers) will take time to transition to the new CPU socket.
However, end users who are willing to invest slightly more will find attractive new SKUs to leverage in their clusters, servers, and workstations. All new CPUs offer faster memory speeds and QPI transfers. Applications which effectively leverage the new FMA instructions should be able to achieve higher performance than flagship v2 CPUs using almost any of the v3 CPUs.
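Before counting on FMA gains, it can help to confirm that a given node actually exposes the instructions. The sketch below is one possible check on Linux: it parses the `flags` line of `/proc/cpuinfo` and looks for `avx2` and `fma`. The helper names are hypothetical, and the check only shows what the hardware advertises, not whether your application was compiled to use it.

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux systems.
REQUIRED_FLAGS = {"avx2", "fma"}


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. a non-x86 system)


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the CPU does not advertise."""
    return set(required) - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        if missing:
            print("Missing:", ", ".join(sorted(missing)))
        else:
            print("AVX2 and FMA are both available")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```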
Comparisons of note (providing increased value for your dollar):
- Xeon E5-2640 v2 transition to Xeon E5-2630 v3: same core count, faster clock speeds, faster memory; lower price
- Xeon E5-2650 v2 transition to Xeon E5-2640 v3: identical core count, clock speed, and turbo boost speed at a lower price
- Xeon E5-2695 v2 and E5-2697 v2 transition to Xeon E5-2690 v3: similar base and turbo speeds at a lower price
- Xeon E5-2695 v2 and E5-2697 v2 transition to Xeon E5-2683 v3: for well-threaded applications able to accept a lower clock speed, the two extra cores in Xeon E5-2683 v3 will outperform at a much lower price
Nearly all processor transitions come at similar or lower costs on the CPU-side. Customers may choose to apply the savings towards their DDR4 memory capacity.
Further Grantley Platform Improvements
C610 Series Chipset
Some end-users found the earlier C600 chipset needed to be supplemented to meet their needs. Intel has added features that address many of these situations:
- SATA: Increase from 2 SATA3 + 4 SATA2 to at least 6 SATA3 ports
- USB: USB 3.0 support now native to the chipset, rather than board manufacturers adding a supplemental chip
- Ethernet: More common deployment of RJ45-based 10GigE; a new 40GigE controller (Fortville)
QPI Links
Intel’s Quick Path Interconnect link between the two CPU sockets now features faster speeds for every SKU:
- Entry-level “Basic” CPUs at 7.2 GT/sec
- Mid-level “Standard” CPUs at 8.0 GT/sec
- Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs at 9.6 GT/sec
QPI allows for rapid access to memory on the non-local CPU socket.
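To put the transfer rates above into more familiar units, the sketch below converts GT/s into GB/s. It assumes each QPI transfer carries 2 bytes (16 bits) of data in each direction on a link; that payload width is our assumption and is not stated in this post.

```python
BYTES_PER_TRANSFER = 2  # assumption: 16 data bits per transfer, per direction

# Transfer rates in GT/s, taken from the list above.
QPI_SPEEDS = {
    "Basic": 7.2,
    "Standard": 8.0,
    "Advanced": 9.6,
}


def qpi_gbs_per_direction(gt_per_s):
    """Theoretical data bandwidth of one QPI link, one direction, in GB/s."""
    return gt_per_s * BYTES_PER_TRANSFER


for tier, rate in QPI_SPEEDS.items():
    one_way = qpi_gbs_per_direction(rate)
    print(f"{tier:9s} {rate} GT/s -> {one_way:.1f} GB/s each way, "
          f"{2 * one_way:.1f} GB/s bidirectional per link")
```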
Next Steps – Putting Xeon E5-2600 v3 into Production
As always, please contact an HPC expert if you would like to discuss in further detail. You may also wish to review our products which leverage these new Xeon processors:
For more analysis of the Xeon E5-2600 v3 processor series, please read:
Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors
Intel’s Xeon E5 Resource Page
Summary of Intel Xeon E5-2600 v3 Series Specifications
Model | Stock Frequency | Max Turbo Boost | Core Count | Memory Speed | L3 Cache | QPI Speed | TDP (Watts) |
---|---|---|---|---|---|---|---|
E5-2699 v3 | 2.30 GHz | 3.60 GHz | 18 | 2133 MHz | 45MB | 9.6 GT/s | 145W |
E5-2698 v3 | 2.30 GHz | 3.60 GHz | 16 | 2133 MHz | 40MB | 9.6 GT/s | 135W |
E5-2697 v3 | 2.60 GHz | 3.60 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 145W |
E5-2695 v3 | 2.30 GHz | 3.30 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 120W |
E5-2683 v3 | 2.00 GHz | 3.00 GHz | 14 | 2133 MHz | 35MB | 9.6 GT/s | 120W |
E5-2690 v3 | 2.60 GHz | 3.50 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 135W |
E5-2680 v3 | 2.50 GHz | 3.30 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 120W |
E5-2670 v3 | 2.30 GHz | 3.10 GHz | 12 | 2133 MHz | 30MB | 9.6 GT/s | 120W |
E5-2687W v3 | 3.10 GHz | 3.50 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 160W |
E5-2660 v3 | 2.60 GHz | 3.30 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 105W |
E5-2650 v3 | 2.30 GHz | 3.00 GHz | 10 | 2133 MHz | 25MB | 9.6 GT/s | 105W |
E5-2667 v3 | 3.20 GHz | 3.60 GHz | 8 | 2133 MHz | 20MB | 9.6 GT/s | 135W |
E5-2640 v3 | 2.60 GHz | 3.40 GHz | 8 | 1866 MHz | 20MB | 8 GT/s | 90W |
E5-2630 v3 | 2.40 GHz | 3.20 GHz | 8 | 1866 MHz | 20MB | 8 GT/s | 85W |
E5-2643 v3 | 3.40 GHz | 3.70 GHz | 6 | 2133 MHz | 20MB | 9.6 GT/s | 135W |
E5-2620 v3 | 2.40 GHz | 3.20 GHz | 6 | 1866 MHz | 15MB | 8 GT/s | 85W |
E5-2637 v3 | 3.50 GHz | 3.70 GHz | 4 | 2133 MHz | 15MB | 9.6 GT/s | 135W |
E5-2623 v3 | 3.00 GHz | 3.50 GHz | 4 | 1866 MHz | 10MB | 8 GT/s | 105W |
HPC groups do not typically choose Intel’s “Basic” and “Low Power” models – those SKUs are not shown.