Intel has launched its new 4-socket Xeon E5-4600v3 CPUs. They are the perfect choice for “just beyond dual socket” system scaling: leverage them for larger memory capacity, faster memory bandwidth, and higher core counts when you aren’t ready for a multi-system purchase.
Here are a few of the main technical improvements:
- DDR4-2133 memory support, for increased memory bandwidth
- Up to 18 cores per socket
- Faster QPI links between sockets, up to 9.6GT/sec
- Up to 48 DIMMs per server, for a maximum of 3TB memory
- Haswell core microarchitecture with new instructions
Why pick a 4-socket Xeon E5-4600v3 CPU over a 2-socket solution?
Increased memory space vs 2 socket
Dual-socket systems max out at 512GB affordably (1TB at significant cost); however, many HPC users have models that outgrow that memory space. Xeon E5-4600v3 systems double the DIMM count, supporting up to 1.5TB affordably (3TB at higher cost).
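The capacity figures are straightforward to sanity-check. A quick sketch, using the 48-DIMM slot count and the 32GB RDIMM / 64GB LRDIMM sizes discussed in this post (exact slot counts vary by motherboard):

```python
# Sanity check of the 4-socket capacity figures: 48 DIMM slots per system,
# populated uniformly with 32GB RDIMMs (affordable) or 64GB LRDIMMs (premium).
DIMM_SLOTS = 48

def total_memory_tb(dimm_size_gb, slots=DIMM_SLOTS):
    """Total memory in TB when every slot holds the same DIMM."""
    return slots * dimm_size_gb / 1024

print(total_memory_tb(32))  # 1.5 -> ~1.5TB with 32GB RDIMMs
print(total_memory_tb(64))  # 3.0 -> 3TB with 64GB LRDIMMs
```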
For applications like ANSYS, COMSOL, and other CAE, multiphysics, and CFD suites, this can be a game changer. Traditionally, achieving these memory capacities required large multi-node cluster installations, and running simulations across such a cluster almost always takes more effort. The Xeon E5-4600v3 lets larger models run on a single system with a familiar single OS instance. Don’t underestimate the power of ease of use.
Increased core count vs 2 socket
Hand-in-hand with the memory space comes core count. What good is loading up big models if you can’t scale compute throughput to run the simulations? Systems built on Xeon E5-4600v3 CPUs deliver up to 72 cores (4 sockets × 18 cores). Executing at that scale means a faster time to solution for you and more work accomplished.
Increased aggregate memory bandwidth
One overlooked aspect of 4P systems is superior memory bandwidth. Intel integrates the same memory controller found in the Xeon E5-2600v3 CPUs into each Xeon E5-4600v3 socket. However, there are twice as many CPUs in each system: the net result is 2X the aggregate memory bandwidth per system.
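The arithmetic behind that 2X claim is simple. A short sketch, using the ~68GB/sec peak per-socket figure discussed later in this post:

```python
# Aggregate-bandwidth arithmetic: each E5-4600v3 socket carries the same
# 4-channel memory controller as the E5-2600v3 (~68GB/sec peak per socket),
# so doubling the socket count doubles system-level memory bandwidth.
PER_SOCKET_GBS = 68  # peak per-socket bandwidth at DDR4-2133

def aggregate_bandwidth(sockets, per_socket=PER_SOCKET_GBS):
    """Peak system memory bandwidth in GB/sec."""
    return sockets * per_socket

print(aggregate_bandwidth(2))  # 136 GB/sec for a 2-socket system
print(aggregate_bandwidth(4))  # 272 GB/sec for a 4-socket system
```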
Increased memory bandwidth per core (by selecting 4 sockets but fewer cores per socket)
Users may be concerned about memory bandwidth per CPU core; we find that CFD and multiphysics applications are especially sensitive to it. A 4-socket system presents a unique opportunity: you can select fewer cores per socket while achieving the same total core count.
Select wisely and you will have up to 2X the memory bandwidth per core available in your system vs. a 2-socket solution. This strategy can also be used to maximize throughput for a software license with a hard core-count ceiling.
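To make the trade-off concrete, here is a sketch for a fixed 32-core budget (the per-socket core counts are chosen for illustration; the ~68GB/sec per-socket peak comes from the DDR4-2133 figures in this post):

```python
# Bandwidth-per-core comparison at a fixed total core count.
# Illustrative core counts; ~68GB/sec peak per socket at DDR4-2133.
def bw_per_core(sockets, cores_per_socket, per_socket_gbs=68):
    """Peak memory bandwidth available per core, in GB/sec."""
    return sockets * per_socket_gbs / (sockets * cores_per_socket)

print(bw_per_core(2, 16))  # 4.25 GB/sec per core: 2 sockets x 16 cores
print(bw_per_core(4, 8))   # 8.5  GB/sec per core: 4 sockets x 8 cores
```

Same 32 cores either way, but the 4-socket layout spreads them across twice as many memory controllers, doubling the bandwidth each core can draw on.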
Detailed Technical Improvements
You’ve heard the why, but the nuts-and-bolts generation-to-generation improvements matter too. Let’s review them in detail:
DDR4-2133 memory support – bandwidth and efficiency
Memory bandwidth is critical for HPC users. CFD, CAE/simulation, life-sciences and custom coded applications benefit most. With the new CPUs, you’ll see the following improvements over Xeon E5-4600v2:
- Entry-level “Basic” CPU operates memory at 1600MHz (increase of 20%)
- Mid-level “Standard” CPUs now operate memory at 1866MHz (increase of 16%)
- Higher-end “Advanced”, “High Core Count” & “Frequency Optimized” CPUs now support up to 4 DIMMs per socket at 2133MHz (increase of 14%), or 8 DIMMs per socket with LRDIMMs
The increase in memory clocks means Xeon E5-4600v3 delivers more memory bandwidth per socket, up to 68GB/sec. Moreover, DDR4 DIMMs operate at 1.2V (down from 1.5V for DDR3), resulting in a substantial power-efficiency gain.
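If you are wondering where the 68GB/sec figure comes from, it falls out of the memory configuration: DDR4 moves 8 bytes per channel per transfer, and each socket has four memory channels. A quick derivation:

```python
# Peak per-socket bandwidth = transfer rate (MT/sec) x 8 bytes per transfer
# x 4 memory channels, converted to GB/sec.
def peak_bandwidth_gbs(mt_per_sec, channels=4, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/sec."""
    return mt_per_sec * bytes_per_transfer * channels / 1000

print(peak_bandwidth_gbs(2133))  # ~68.3 GB/sec at DDR4-2133
print(peak_bandwidth_gbs(1866))  # ~59.7 GB/sec for the 1866MHz tiers
```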
Increased core counts – more for your money
Throughout the stack, core counts are increasing:
- Xeon E5-4610v3 and E5-4620v3: 10 cores per socket, a 25% core count increase over the previous generation
- Xeon E5-4640v3, E5-4650v3: 12 cores per socket, a 50% core count increase over the previous generation
- E5-4669v3: 18 cores per socket, a 33% core count increase over the previous generation
- New E5-4660v3 SKU delivers 14 cores per socket with a reasonable 120W TDP
Increased core counts mean you can deploy larger jobs, schedule more HPC users on the same system, and deploy more virtual machines. They also increase the aggregate throughput of your systems. You can do far more work with Xeon E5-4600v3.
Memory latency and DIMM size
DDR4 doesn’t just mean faster clocks – it also brings fewer compromises and larger DIMM sizes. 32GB DDR4-2133 DIMMs are now available as registered (RDIMM) as well as load-reduced (LRDIMM) modules. The shift from a specialty buffer in an LRDIMM to a traditional register in an RDIMM means a substantial latency decrease.
Advances in manufacturing for DDR4 also mean larger DIMM sizes. 64GB LRDIMMs are now being manufactured to help support that outstanding 3TB memory capacity.
Haswell microarchitecture and AVX2
AVX2 is an advanced CPU instruction set that debuted in the Haswell architecture and has shown strong benefits:
- New floating-point FMA (fused multiply-add) instructions, delivering up to 2X the FLOPS per core (16 FLOPS/clock)
- 256-bit wide integer vector instructions
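The 16 FLOPS/clock figure for double precision follows directly from the Haswell execution resources: two FMA execution ports, 256-bit vectors holding four doubles each, and each FMA counting as two floating-point operations (a multiply and an add). A quick derivation:

```python
# Double-precision FLOPS per clock on a Haswell core with AVX2 FMA:
# 2 FMA ports x (256-bit vector / 64-bit double) x 2 ops per FMA.
FMA_PORTS = 2
DOUBLES_PER_VECTOR = 256 // 64  # 4 doubles in a 256-bit ymm register
OPS_PER_FMA = 2                 # fused multiply + add

flops_per_clock = FMA_PORTS * DOUBLES_PER_VECTOR * OPS_PER_FMA
print(flops_per_clock)  # 16 double-precision FLOPS per clock
```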
These new instructions are extremely consequential. We encourage you to learn more about these improvements, and how to compile for the new instructions, with our post on AVX2 Optimization.
Intel Xeon E5-4600v3 Series Specifications
HPC groups do not typically choose Intel’s “Basic” models (e.g., E5-4610v3)
Intel Xeon E5-4600v3 Frequency Optimized SKUs
The above SKUs offer better memory bandwidth per core
We think the improvements in the Xeon E5-4600v3 CPUs make them a unique alternative to far more complicated HPC installations and a worthwhile upgrade from their predecessors. Want to learn more about the Xeon E5-4600v3 CPUs? Talk with an expert and assess how they might fit your HPC needs.