Detailed Specifications of the Intel Xeon E5-4600 v3 “Haswell-EP” Processors

This article provides in-depth discussion and analysis of the 22nm Xeon E5-4600 v3 series processors (formerly codenamed “Haswell-EP”). The “Haswell” microarchitecture replaces the previous 22nm “Ivy Bridge” microarchitecture, and these processors are available for sale as of June 1, 2015. For an introduction, read our blog post “Xeon E5-4600v3 4-socket CPU Review”.

Important changes available in E5-4600 v3 “Haswell-EP” include:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for DDR4 memory speeds up to 2133MHz
  • Advanced Vector Extensions version 2.0 (AVX2 instructions):
    • allow 256-bit wide operations for both integer and floating-point numbers (the older AVX instructions supported only floating-point operations)
    • introduce Fused Multiply Add (FMA3) instructions, which perform a multiply and an add in a single instruction (potentially doubling throughput for floating-point applications, up to 16 double-precision FLOPS per cycle per core)
    • add support for additional instructions, including Gather and vector shift
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Exceptional Computational Performance

The Xeon E5-4600 v3 processors provide some of the highest performance available to date in a socketed CPU (similar to their dual-socket “Haswell-EP” counterparts). For the first time, this architecture offers a single CPU capable of more than half a TeraFLOPS (500 GFLOPS) and total four-socket system performance over 2 TFLOPS. This is made possible through the use of AVX2 with FMA3 instructions. The plot below compares the peak performance of a single CPU with and without FMA instructions:

Chart of Xeon E5-4600 v3 Theoretical Peak Performance in GigaFLOPS

The colored bars indicate performance using only AVX instructions; the grey bars indicate theoretical peak performance when using AVX with FMA. Note that only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Most applications will issue a variety of instructions, which will result in lower than peak FLOPS. Expect the achieved performance for well-parallelized & optimized applications to fall between the grey and colored bars.
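
The gap between the two sets of bars follows from simple arithmetic. As a rough sketch, assuming the chart uses each model's AVX base frequency from the specifications table and counts double-precision operations (8 per cycle per core with AVX alone, 16 with FMA), the theoretical peak can be estimated as shown here. The function name `peak_gflops` is illustrative only.

```python
# Theoretical peak = cores x clock (GHz) x FLOPS per cycle per core.
# Assumptions: the AVX base frequency is used, and FLOPS are double precision.
AVX_FLOPS_PER_CYCLE = 8    # 256-bit AVX without FMA
FMA_FLOPS_PER_CYCLE = 16   # AVX2 with FMA3


def peak_gflops(cores, avx_ghz, flops_per_cycle):
    """Return the theoretical peak in GFLOPS for one CPU socket."""
    return cores * avx_ghz * flops_per_cycle


if __name__ == "__main__":
    # E5-4669 v3: 18 cores, 1.80 GHz AVX base frequency
    avx_only = peak_gflops(18, 1.80, AVX_FLOPS_PER_CYCLE)
    with_fma = peak_gflops(18, 1.80, FMA_FLOPS_PER_CYCLE)
    print(f"AVX only:     {avx_only:.1f} GFLOPS per socket")
    print(f"AVX with FMA: {with_fma:.1f} GFLOPS per socket")
    print(f"Four sockets: {4 * with_fma / 1000:.2f} TFLOPS")
```

For the 18-core E5-4669 v3 this works out to roughly 518 GFLOPS per socket with FMA, or just over 2 TFLOPS across four sockets, which is consistent with the figures quoted above.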

Intel Xeon E5-4600 v3 Series Specifications

The charts and tables below compare the features and specifications of the new model line. Intel has divided the CPUs into several groups:

  • Standard: cost-effective CPUs with moderate performance
  • Advanced: CPUs offering the highest performance for most applications
  • High Core Count: ideal for well-parallelized applications; CPUs providing the highest number of processor cores (sometimes sacrificing clock frequency in favor of core count)
  • Frequency Optimized: ideal for non-parallel/single-threaded applications; CPUs with the highest clock speeds (sacrificing number of cores in order to provide the highest frequencies)

Although these processors introduce significant performance increases, technical readers will see that many of the changes are incremental: increased core counts, improved DDR memory speed, etc. However, processor clock speeds/frequencies have not seen significant improvements.

In fact, in some cases the CPU frequency has been lowered from the previous models. Processor frequency and Turbo Boost behavior have changed significantly with this release. Those metrics are discussed in further detail in the next section.

CPU Cores

Chart of the number of CPU Cores in the Xeon E5-4600 v3 CPUs

Memory Speed

Chart of the Xeon E5-4600 v3 Supported Memory Speeds

L3 Cache

Chart of the Xeon E5-4600 v3 CPU L3 Cache Sizes

QPI

Chart of the Xeon E5-4600 v3 CPU QPI Performance

TDP

Chart of the Xeon E5-4600 v3 CPU TDP Wattages

Specifications Table

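Model | Frequency | Frequency (AVX) | Turbo Boost | Core Count | L3 Cache | QPI Speed | Memory Speed | TDP (Watts)
E5-4669 v3 | 2.10 GHz | 1.80 GHz | 2.90 GHz | 18 | 45MB | 9.6 GT/s | 2133 MHz | 135W
E5-4667 v3 | 2.00 GHz | 1.70 GHz | 2.90 GHz | 16 | 40MB | 9.6 GT/s | 2133 MHz | 135W
E5-4660 v3 | 2.10 GHz | 1.80 GHz | 2.90 GHz | 14 | 35MB | 9.6 GT/s | 2133 MHz | 120W
E5-4650 v3 | 2.10 GHz | 1.80 GHz | 2.80 GHz | 12 | 30MB | 9.6 GT/s | 2133 MHz | 105W
E5-4640 v3 | 1.90 GHz | 1.60 GHz | 2.60 GHz | 12 | 30MB | 8.0 GT/s | 1866 MHz | 105W
E5-4620 v3 | 2.00 GHz | 1.70 GHz | 2.60 GHz | 10 | 25MB | 8.0 GT/s | 1866 MHz | 105W
E5-4610 v3 | 1.70 GHz | 1.70 GHz | None | 10 | 25MB | 6.4 GT/s | 1600 MHz | 105W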

HPC groups do not typically choose Intel’s “Basic” models (e.g., E5-4610 v3).

Intel Xeon E5-4600v3 Frequency Optimized SKUs

Model | Frequency | Frequency (AVX) | Turbo Boost | Core Count | L3 Cache | QPI Speed | Memory Speed | TDP (Watts)
E5-4655 v3 | 2.90 GHz | 2.60 GHz | 3.20 GHz | 6 | 30MB | 9.6 GT/s | 2133 MHz | 135W
E5-4627 v3 | 2.60 GHz | 2.30 GHz | 3.20 GHz | 10 | 25MB | 9.6 GT/s | 2133 MHz | 135W

The above SKUs offer better memory bandwidth per core than the higher core count models, because the same memory channels are shared among fewer cores.
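
To illustrate the bandwidth-per-core point, the sketch below estimates theoretical peak memory bandwidth per socket and divides it by the core count. It assumes four DDR4 channels per socket (the quad-channel configuration listed in the feature summary), 8 bytes per transfer on each 64-bit channel, and that the table's “2133 MHz” corresponds to 2133 million transfers per second. The function names are hypothetical, and sustained bandwidth in practice will be lower than these theoretical values.

```python
def peak_bandwidth_gbs(mega_transfers, channels=4, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/s.

    Assumes 4 memory channels and a 64-bit (8-byte) data path per channel.
    """
    return mega_transfers * channels * bytes_per_transfer / 1000.0


def bandwidth_per_core_gbs(mega_transfers, cores):
    """Theoretical peak bandwidth available to each core in GB/s."""
    return peak_bandwidth_gbs(mega_transfers) / cores


if __name__ == "__main__":
    # (model, core count) pairs taken from the specification tables
    models = [("E5-4655 v3", 6), ("E5-4627 v3", 10), ("E5-4669 v3", 18)]
    for name, cores in models:
        per_core = bandwidth_per_core_gbs(2133, cores)
        print(f"{name}: {per_core:.2f} GB/s per core")
```

Under these assumptions, each socket peaks at roughly 68 GB/s, so the 6-core model has about three times the per-core bandwidth of the 18-core model.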

Clock Speeds & Turbo Boost in Xeon E5-4600 v3 series “Haswell” processors

With each new processor line, Intel introduces new architecture optimizations. The design of the “Haswell” architecture acknowledges that highly-parallel/vectorized applications place the highest load on the processor cores (requiring more power and thus generating more heat). While a CPU core is executing intensive vector tasks (AVX instructions), the clock speed may be reduced to keep the processor within its power limits (TDP).

In effect, this may result in the processor running at a lower frequency than the “base” clock speed advertised for each model. For that reason, each “Haswell” processor model is assigned two “base” frequencies:

  1. AVX mode: due to the higher power requirements of AVX instructions, clock speeds may be somewhat lower while executing AVX instructions *
  2. Non-AVX mode: while not executing AVX instructions, the processor will operate at what would traditionally be considered the “stock” frequency

* A CPU core will return to Non-AVX mode 1 millisecond after AVX instructions complete.

AVX and Non-AVX Turbo Boost

Just as in previous architectures, “Haswell” CPUs include the Turbo Boost feature which causes each processor core to operate well above the “base” clock speed during most operations. The precise clock speed increase depends upon the number & intensity of tasks running on each CPU. With the “Haswell” architecture, Turbo Boost speed increases also depend upon the types of instructions (AVX vs. Non-AVX).

The two plots below show that processor clock speeds can be categorized as:

  1. All cores on the CPU actively running Non-AVX instructions
  2. All cores on the CPU actively running AVX instructions
  3. A single active core running Non-AVX instructions (all other cores on the CPU must be idle)
  4. A single active core running AVX instructions (all other cores on the CPU must be idle)

Clock Speeds for All-Core Operation

Chart of Xeon E5-4600 v3 CPU Frequency and Turbo Boost Speeds with AVX and Non-AVX Instructions (when all cores are active)

Clock Speeds for Single-Core Operation

Chart of Xeon E5-4600 v3 CPU Frequency and Turbo Boost Speeds when running AVX and Non-AVX Instructions (when only a single core is active)

Note that despite the clear rules stated above, each value is still a range of clock speeds. Because workloads are so diverse, Intel is unable to guarantee one specific clock speed for AVX or Non-AVX instructions. Users are guaranteed that cores will run within a specific range, but each application will have to be benchmarked to determine which frequencies a CPU will operate at.
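
One simple way to observe the clock speeds actually reached during a benchmark is to sample them while the application runs. The sketch below assumes a Linux system whose `/proc/cpuinfo` reports a `cpu MHz` line for each logical CPU (common on x86 systems, but not guaranteed on every platform or kernel); the helper names are illustrative and this is not a substitute for proper profiling tools.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Extract the 'cpu MHz' value reported for each logical CPU."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, re.MULTILINE)
    return [float(value) for value in matches]


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read current per-CPU clock speeds (assumes a Linux-style cpuinfo file)."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())


if __name__ == "__main__":
    try:
        speeds = read_cpu_mhz()
    except OSError:
        speeds = []  # file not available on this platform
    if speeds:
        print(f"logical CPUs: {len(speeds)}")
        print(f"min / max clock: {min(speeds):.0f} / {max(speeds):.0f} MHz")
    else:
        print("No 'cpu MHz' entries found")
```

Sampling this periodically while a job is running gives a rough picture of where within the guaranteed range a given workload lands.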

When examining the differences between AVX and Non-AVX instructions, notice that Non-AVX instructions do not result in dramatically higher Turbo Boost speeds. With the exception of the E5-4620 v3, none of the grey bars rises any higher than the colored bars. Thus, for most CPUs the maximum possible Turbo Boost speed is the same when using AVX and Non-AVX instructions. However, heavy usage of AVX instructions may reduce the clock speed by as much as 300MHz.

Recall that AVX2 introduces support for both integer and floating-point instructions, which means any compute-intensive application will be using such instructions (if it has been properly designed and compiled). HPC users should expect their processors to be running in AVX mode most of the time.

Of course, it is worth remembering that the usage of AVX instructions can result in as much as a 100% increase in performance. It is much better to leverage AVX instructions – gaining the 100% increase in instruction throughput and suffering the small 5% to 15% CPU clock speed penalty. It would be unwise to turn off AVX with the expectation that overall performance would increase.
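
The trade-off can be made concrete with a short calculation. This is a minimal sketch that assumes the best case described above (AVX doubles instruction throughput) and uses the base and AVX base frequencies from the specification tables; the function names are illustrative.

```python
def clock_penalty(base_ghz, avx_ghz):
    """Fractional clock speed reduction when running in AVX mode."""
    return 1.0 - avx_ghz / base_ghz


def net_avx_speedup(base_ghz, avx_ghz, throughput_gain=2.0):
    """Net speedup from AVX after accounting for the lower AVX clock.

    throughput_gain=2.0 is the best-case doubling; real codes may see less.
    """
    return throughput_gain * avx_ghz / base_ghz


if __name__ == "__main__":
    # (model, base GHz, AVX base GHz) from the specification tables
    models = [("E5-4669 v3", 2.10, 1.80), ("E5-4655 v3", 2.90, 2.60)]
    for name, base, avx in models:
        penalty = clock_penalty(base, avx) * 100
        speedup = net_avx_speedup(base, avx)
        print(f"{name}: {penalty:.1f}% clock penalty, {speedup:.2f}x net speedup")
```

Even with the clock reduction, the best-case net gain stays well above 1.5x for these models, which is why disabling AVX is unlikely to help.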

Top Clock Speeds for Specific Core Counts

When workloads leave some CPU cores idle, the Xeon E5-4600 v3 processors are able to use that headroom to increase the clock speed of the cores which are performing work. Just as with other Turbo Boost scenarios, the precise speed increase will depend upon the CPU model. It will also depend upon how many CPU cores are active.

We advise users to consider how many CPU cores their application is able to saturate. The charts below detail the peak Turbo Boost frequencies for each CPU model, grouped by the number of active cores:

1/2

Chart of Xeon E5-4600 v3 CPU Frequency when 1 to 2 cores are active

3

Chart of Xeon E5-4600 v3 CPU Frequency when 3 cores are active

4

Chart of Xeon E5-4600 v3 CPU Frequency when 4 cores are active

5

Chart of Xeon E5-4600 v3 CPU Frequency when 5 cores are active

6

Chart of Xeon E5-4600 v3 CPU Frequency when 6 cores are active

7

Chart of Xeon E5-4600 v3 CPU Frequency when 7 cores are active

8

Chart of Xeon E5-4600 v3 CPU Frequency when 8 cores are active

9/10

Chart of Xeon E5-4600 v3 CPU Frequency when 9 or 10 cores are active

11/12

Chart of Xeon E5-4600 v3 CPU Frequency when 11 or 12 cores are active

13/14

Chart of Xeon E5-4600 v3 CPU Frequency when 13 or 14 cores are active

15/16

Chart of Xeon E5-4600 v3 CPU Frequency when 15 or 16 cores are active

17+

Chart of Xeon E5-4600 v3 CPU Frequency when 17 or 18 cores are active

All of the above plots show CPU frequencies for applications utilizing AVX instructions. The colored bars indicate the worst-case scenario – CPUs will run at least this fast. The grey bars indicate the expected clock speeds for most workloads.

Cost-Effectiveness and Power Efficiency of Xeon E5-4600 v3 CPUs

The “Haswell-EP” processors have nearly the same price structure and power requirements as earlier Xeon E5-4600 products, so their cost-effectiveness and power-efficiency should be quite attractive to HPC users. Savvy readers may find the following facts useful:

  • The Xeon E5-4627 v3 CPUs are typically optimized for HPC workloads. Additionally, they feature pricing attractive to HPC groups.
  • The power requirement (TDP) for each model has increased by 5 Watts over the previous generation. This is due to integration of the Voltage Regulator Modules (VRMs) which were previously placed on the motherboard. Thus, CPU TDP increases 5W and motherboard TDP decreases 5W.
  • The following graphs depict the cost-effectiveness and power-efficiency of only the CPU itself. In many cases, HPC users will find that once they’ve taken the full platform and cluster design into account, the cost-effectiveness of a higher core count CPU may be more beneficial than these plots demonstrate.

Performance vs. Price

Chart of Xeon E5-4600 v3 Processor Cost-Effectiveness

Performance vs. Power

Chart of Xeon E5-4600 v3 CPU Power-Efficiency (measured in Watts per GigaFLOPS)
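
As a rough illustration of the Watts per GigaFLOPS metric, the sketch below divides each model's TDP by its theoretical peak. It assumes the peak is computed from the AVX base frequency at 16 double-precision FLOPS per cycle per core; the chart may have been produced with different inputs, and the function name is hypothetical.

```python
def watts_per_gflops(tdp_watts, cores, avx_ghz, flops_per_cycle=16):
    """TDP divided by theoretical peak GFLOPS (lower is more efficient)."""
    peak = cores * avx_ghz * flops_per_cycle
    return tdp_watts / peak


if __name__ == "__main__":
    # (model, TDP in Watts, cores, AVX base GHz) from the specification tables
    models = [
        ("E5-4669 v3", 135, 18, 1.80),
        ("E5-4650 v3", 105, 12, 1.80),
    ]
    for name, tdp, cores, ghz in models:
        print(f"{name}: {watts_per_gflops(tdp, cores, ghz):.3f} W/GFLOPS")
```

Lower values indicate better efficiency; under these assumptions the 18-core model delivers more theoretical FLOPS per Watt than the 12-core model despite its higher TDP.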

Processor Prices

Chart of Xeon E5-4600 v3 CPU Prices

Summary of features in Xeon E5-4600 v3 “Haswell-EP” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for Quad-channel ECC DDR4 memory speeds up to 2133MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket)
  • Advanced Vector Extensions (AVX 2.0):
    • effectively double the throughput of integer and floating-point operations with math units expanded from 128-bits to 256-bits
    • introduce Fused Multiply Add (FMA3) instructions which perform a multiply and an add in a single instruction (effectively doubling the FLOPS/clock from 8 to 16 for each core of a CPU)
    • add support for additional instructions, including Gather and vector shift
    • F16C 16-bit Floating-Point conversion instructions accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time. With “Haswell”, top clock speeds depend upon the type of instructions (AVX vs. Non-AVX).
  • Faster Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel Ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • Intel Virtualization Technology for 32-bit & 64-bit (VT/VT-x), for Directed I/O (VT-d) and for Connectivity (VT-c) delivers faster performance for core virtualization processes and provides built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.