Detailed Specifications of the Intel Xeon E5-2600v3 “Haswell-EP” Processors

This article provides in-depth discussion and analysis of the 22nm Xeon E5-2600v3 series processors (formerly codenamed “Haswell-EP”). “Haswell” processors replace the previous 22nm “Ivy Bridge” microarchitecture and are available for sale as of September 8, 2014. Note: these have since been superseded by Xeon E5-2600v4 “Broadwell-EP” processors.

Important changes available in E5-2600v3 “Haswell-EP” include:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for DDR4 memory speeds up to 2133MHz
  • Advanced Vector Extensions version 2.0 (AVX2 instructions):
    • allow 256-bit wide operations for both integer and floating-point numbers (the older AVX instructions supported only floating-point operations)
    • introduce Fused Multiply Add (FMA3) instructions, which perform a multiply and an add as a single instruction (potentially doubling throughput for floating-point applications, up to 16 double-precision FLOPs per cycle per core)
    • add support for additional instructions, including Gather and vector shift
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
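To make the “fused” part of FMA concrete: an FMA computes a*b + c and rounds the result once, whereas a separate multiply and add round twice. The Python sketch below is only an emulation under stated assumptions: exact rational arithmetic (`fractions.Fraction`) stands in for the hardware's unrounded intermediate product, and the helper names `mul_then_add` and `fused_multiply_add` are hypothetical. It does not execute FMA3 instructions.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded to double before the add."""
    product = a * b      # first rounding
    return product + c   # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding here
    return float(exact)                              # single, correctly rounded step


# a*b = 1 + 2**-26 + 2**-54 exactly; the 2**-54 term is lost if the product is rounded first.
a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(mul_then_add(a, b, c))        # 0.0
print(fused_multiply_add(a, b, c))  # 5.551115123125783e-17 (exactly 2**-54)
```

Beyond speed, this single rounding step is why FMA results can differ slightly (and are usually more accurate) than code compiled without FMA.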

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC applications.

Exceptional Computational Performance

The Xeon E5-2600v3 processors introduce the highest performance available to date in a socketed CPU. For the first time, a single CPU is capable of more than half a TeraFLOPS (500 GFLOPS). This is made possible through the use of AVX2 with FMA3 instructions. The plot below compares the peak performance of these CPUs with and without FMA instructions:

Plot of Xeon E5-2600v3 Theoretical Peak Performance (GFLOPS)

The colored bars indicate performance using only AVX instructions; the grey bars indicate theoretical peak performance when using AVX with FMA. Note that only a small set of codes will be capable of issuing almost exclusively FMA instructions (e.g., LINPACK). Most applications will issue a variety of instructions, which will result in lower than peak FLOPS. Expect the achieved performance for well-parallelized & optimized applications to fall between the grey and colored bars.
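The peak figures follow from simple arithmetic: cores × clock × FLOPs per cycle per core. The sketch below assumes the AVX base frequencies and core counts from the specification table in this article, plus the 8 (AVX without FMA) and 16 (AVX with FMA) double-precision FLOPs per cycle quoted in this article; the exact methodology behind the plot may differ, and the helper `peak_gflops` is a hypothetical name.

```python
# (cores, AVX base frequency in GHz), copied from the specification table in this article
MODELS = {
    "E5-2699v3": (18, 1.90),
    "E5-2697v3": (14, 2.20),
    "E5-2690v3": (12, 2.30),
    "E5-2667v3": (8, 2.70),
    "E5-2637v3": (4, 3.20),
}

# Double-precision FLOPs per cycle per core, as stated in this article
FLOPS_PER_CYCLE_AVX = 8   # 256-bit AVX without FMA
FLOPS_PER_CYCLE_FMA = 16  # AVX2 with FMA3


def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle


for name, (cores, ghz) in MODELS.items():
    avx = peak_gflops(cores, ghz, FLOPS_PER_CYCLE_AVX)
    fma = peak_gflops(cores, ghz, FLOPS_PER_CYCLE_FMA)
    print(f"{name}: {avx:.1f} GFLOPS (AVX), {fma:.1f} GFLOPS (AVX + FMA)")
```

For the E5-2699v3 this gives 273.6 GFLOPS without FMA and 547.2 GFLOPS with FMA, which is where the “more than half a TeraFLOPS” figure comes from.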

Intel Xeon E5-2600v3 Series Specifications

The charts and table below compare the features and specifications of the new model line. Intel has divided the CPUs into several groups:

  • Standard: cost-effective CPUs with moderate performance
  • Advanced: CPUs offering the highest performance for most applications
  • High Core Count: ideal for well-parallelized applications; CPUs providing the highest number of processor cores (sometimes sacrificing clock frequency in favor of core count)
  • Frequency Optimized: ideal for non-parallel/single-threaded applications; CPUs with the highest clock speeds (sacrificing number of cores in order to provide the highest frequencies)

Although these processors introduce significant performance increases, technical readers will see that many of the changes are incremental: increased core counts, improved DDR memory speed, etc. However, processor clock speeds/frequencies have not seen significant improvements.

In fact, in some cases the CPU frequency has been lowered from the previous models. Processor frequency and Turbo Boost behavior have changed significantly with this release. Those metrics are discussed in further detail in the next section.

Charts of Xeon E5-2600v3 number of CPU cores, memory performance, L3 cache size, QPI performance and CPU wattage (TDP)

Model       AVX Frequency  AVX Turbo Boost  Core Count  Memory Speed  L3 Cache  QPI Speed  TDP (Watts)
E5-2699v3   1.90 GHz       3.30 GHz         18          2133 MHz      45 MB     9.6 GT/s   145
E5-2698v3   1.90 GHz       3.30 GHz         16          2133 MHz      40 MB     9.6 GT/s   135
E5-2697v3   2.20 GHz       3.30 GHz         14          2133 MHz      35 MB     9.6 GT/s   145
E5-2695v3   1.90 GHz       3.00 GHz         14          2133 MHz      35 MB     9.6 GT/s   120
E5-2683v3   1.70 GHz       2.70 GHz         14          2133 MHz      35 MB     9.6 GT/s   120
E5-2690v3   2.30 GHz       3.20 GHz         12          2133 MHz      30 MB     9.6 GT/s   135
E5-2680v3   2.10 GHz       3.10 GHz         12          2133 MHz      30 MB     9.6 GT/s   120
E5-2670v3   2.00 GHz       2.90 GHz         12          2133 MHz      30 MB     9.6 GT/s   120
E5-2687Wv3  2.70 GHz       3.50 GHz         10          2133 MHz      25 MB     9.6 GT/s   160
E5-2660v3   2.20 GHz       3.10 GHz         10          2133 MHz      25 MB     9.6 GT/s   105
E5-2650v3   2.00 GHz       2.80 GHz         10          2133 MHz      25 MB     9.6 GT/s   105
E5-2667v3   2.70 GHz       3.50 GHz         8           2133 MHz      20 MB     9.6 GT/s   135
E5-2640v3   2.20 GHz       3.40 GHz         8           1866 MHz      20 MB     8.0 GT/s   90
E5-2630v3   2.10 GHz       3.20 GHz         8           1866 MHz      20 MB     8.0 GT/s   85
E5-2643v3   2.80 GHz       3.50 GHz         6           2133 MHz      20 MB     9.6 GT/s   135
E5-2620v3   2.10 GHz       3.20 GHz         6           1866 MHz      15 MB     8.0 GT/s   85
E5-2637v3   3.20 GHz       3.60 GHz         4           2133 MHz      15 MB     9.6 GT/s   135
E5-2623v3   2.70 GHz       3.50 GHz         4           1866 MHz      10 MB     8.0 GT/s   105

HPC groups do not typically choose Intel’s “Basic” and “Low Power” models, so those SKUs are not shown.

Clock Speeds & Turbo Boost in Xeon E5-2600v3 series “Haswell” processors

With each new processor line, Intel introduces new architecture optimizations. The design of the “Haswell” architecture acknowledges that highly-parallel/vectorized applications place the highest load on the processor cores (requiring more power and thus generating more heat). While a CPU core is executing intensive vector tasks (AVX instructions), the clock speed may be reduced to keep the processor within its power limits (TDP).

In effect, this may result in the processor running at a lower frequency than the “base” clock speed advertised for each model. For that reason, each “Haswell” processor model is assigned two “base” frequencies:

  1. AVX mode: due to the higher power requirements of AVX instructions, clock speeds may be somewhat lower while executing AVX instructions *
  2. Non-AVX mode: while not executing AVX instructions, the processor will operate at what would traditionally be considered the “stock” frequency

* a CPU core will return to Non-AVX mode 1 millisecond after AVX instructions complete
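The footnote's rule can be modeled as a simple per-core timer. This is a deliberately simplified, hypothetical model: the function name `frequency_mode` and the millisecond timestamps are assumptions for illustration, real transitions also involve voltage changes, and this is not how software observes or controls the switch.

```python
from typing import Optional

AVX_HOLD_MS = 1.0  # per the footnote: a core returns to Non-AVX mode 1 ms after AVX work completes


def frequency_mode(now_ms: float, last_avx_ms: Optional[float]) -> str:
    """Hypothetical model of which "base" frequency applies to one core.

    last_avx_ms is the time (in ms) at which the core last executed an AVX
    instruction, or None if it has not executed any.
    """
    if last_avx_ms is not None and now_ms - last_avx_ms < AVX_HOLD_MS:
        return "AVX"      # still inside the 1 ms window: AVX frequency limits apply
    return "Non-AVX"      # no recent AVX work: the traditional "stock" frequency applies


# 0.5 ms after its last AVX instruction a core is still in AVX mode;
# 1.5 ms after, it has returned to Non-AVX mode.
print(frequency_mode(10.0, 9.5))   # AVX
print(frequency_mode(10.0, 8.5))   # Non-AVX
print(frequency_mode(10.0, None))  # Non-AVX
```

One practical consequence of the 1 ms hold: code that sprinkles occasional AVX instructions through otherwise scalar work can keep a core at the lower AVX frequency for much of its runtime.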

AVX and Non-AVX Turbo Boost

Just as in previous architectures, “Haswell” CPUs include the Turbo Boost feature which causes each processor core to operate well above the “base” clock speed during most operations. The precise clock speed increase depends upon the number & intensity of tasks running on each CPU. With the “Haswell” architecture, Turbo Boost speed increases also depend upon the types of instructions (AVX vs. Non-AVX).

The two plots below show that processor clock speeds can be categorized as:

  1. All cores on the CPU actively running Non-AVX instructions
  2. All cores on the CPU actively running AVX instructions
  3. A single active core running Non-AVX instructions (all other cores on the CPU must be idle)
  4. A single active core running AVX instructions (all other cores on the CPU must be idle)

Note that despite the clear rules stated above, each value is still a range of clock speeds. Because workloads are so diverse, Intel is unable to guarantee one specific clock speed for AVX or Non-AVX instructions. Users are guaranteed that cores will run within a specific range, but each application will have to be benchmarked to determine which frequencies a CPU will operate at.
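One way to see which frequencies a particular application actually gets is to sample clock speeds while it runs. The sketch below assumes a Linux system that exposes the standard cpufreq file `scaling_cur_freq` (reported in kHz) under `/sys/devices/system/cpu`; the helper names are hypothetical. With the `intel_pstate` driver that value is only the kernel's estimate, so a counter-based tool such as `turbostat` is generally the more reliable measurement.

```python
import os
import statistics
import time
from typing import Dict, List

SYSFS_CPU = "/sys/devices/system/cpu"  # standard Linux cpufreq location (assumed present)


def parse_khz(text: str) -> float:
    """Convert a cpufreq reading (kHz, as text) to MHz."""
    return int(text.strip()) / 1000.0


def read_core_mhz(cpu: int, root: str = SYSFS_CPU) -> float:
    """Read the kernel's current frequency estimate for one logical CPU."""
    path = os.path.join(root, f"cpu{cpu}", "cpufreq", "scaling_cur_freq")
    with open(path) as f:
        return parse_khz(f.read())


def summarize(readings: List[float]) -> Dict[str, float]:
    """Reduce a series of MHz samples to the observed range and mean."""
    return {
        "min": min(readings),
        "max": max(readings),
        "mean": statistics.mean(readings),
    }


def sample_core_mhz(cpu: int, samples: int = 20, interval_s: float = 0.05) -> Dict[str, float]:
    """Sample one core repeatedly; call this while the benchmark is running."""
    readings = []
    for _ in range(samples):
        readings.append(read_core_mhz(cpu))
        time.sleep(interval_s)
    return summarize(readings)
```

For example, calling `sample_core_mhz(cpu=0)` during an AVX-heavy run and again during a scalar run shows the observed frequency range for each case on that specific machine.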

When examining the differences between AVX and Non-AVX instructions, notice that Non-AVX instructions typically result in no more than a 100MHz to 200MHz increase in the highest clock speed. However, AVX instructions may cause clock speeds to drop by 300MHz to 400MHz if they are particularly intensive.

Recall that AVX2 extends 256-bit vector support to integer as well as floating-point instructions, which means most compute-intensive applications will use such instructions (if they have been properly vectorized and compiled). HPC users should expect their processors to be running in AVX mode most of the time.

Top Clock Speeds for Specific Core Counts

When workloads leave some CPU cores idle, the Xeon E5-2600v3 processors are able to use that headroom to increase the clock speed of the cores which are performing work. Just as with other Turbo Boost scenarios, the precise speed increase will depend upon the CPU model. It will also depend upon how many CPU cores are active.

We advise users to consider how many CPU cores their application is able to saturate. The plots below detail the peak Turbo Boost frequencies for each CPU model, sorted by the number of active cores:

All of the above plots show CPU frequencies for applications utilizing AVX instructions. The colored bars indicate the worst-case scenario – CPUs will run at least this fast. The grey bars indicate the expected clock speeds for most workloads.

Cost-Effectiveness and Power Efficiency of Xeon E5-2600v3 CPUs

The “Haswell-EP” processors have nearly the same price structure and power requirements as earlier Xeon E5-2600 products, so their cost-effectiveness and power-efficiency should be quite attractive to HPC users. Savvy readers may find the following facts useful:

  • Although v3 Xeons follow the same price steps as their v2 counterparts, three High-Core-Count models were late additions. These models are higher performing and carry higher prices than previous E5-2600 models.
  • The power requirement (TDP) for each model has increased by 5 Watts over the previous generation. This is due to the integration of the voltage regulators (VRMs), which were previously placed on the motherboard, into the processor. Thus, CPU TDP increases by 5W while motherboard power consumption decreases by 5W.
  • The following graphs depict the cost-effectiveness and power-efficiency of only the CPU itself. In many cases, HPC users will find that once they’ve taken the full platform and cluster design into account, the cost-effectiveness of a higher core count CPU may be more beneficial than these plots demonstrate.
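As one simple CPU-only power-efficiency metric, peak double-precision GFLOPS can be divided by TDP. The sketch below assumes core counts, AVX base frequencies and TDPs copied from the specification table above and the 16 FLOPs per cycle per core quoted for AVX2 with FMA; the graphs in this article may use a different metric, prices are not included, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# (cores, AVX base frequency in GHz, TDP in W), copied from the specification table above
MODELS: Dict[str, Tuple[int, float, int]] = {
    "E5-2699v3": (18, 1.90, 145),
    "E5-2683v3": (14, 1.70, 120),
    "E5-2660v3": (10, 2.20, 105),
    "E5-2630v3": (8, 2.10, 85),
    "E5-2637v3": (4, 3.20, 135),
}
FLOPS_PER_CYCLE_FMA = 16  # double-precision FLOPs per cycle per core with AVX2 + FMA3


def peak_gflops_per_watt(cores: int, ghz: float, tdp_w: int) -> float:
    """Theoretical peak GFLOPS divided by TDP (CPU only, no platform power)."""
    return cores * ghz * FLOPS_PER_CYCLE_FMA / tdp_w


def rank_by_efficiency(models: Dict[str, Tuple[int, float, int]]) -> List[str]:
    """Model names sorted from most to least peak GFLOPS per watt."""
    return sorted(models, key=lambda name: peak_gflops_per_watt(*models[name]), reverse=True)


for name in rank_by_efficiency(MODELS):
    print(f"{name}: {peak_gflops_per_watt(*MODELS[name]):.2f} GFLOPS/W")
```

With these inputs the 18-core E5-2699v3 ranks first (about 3.77 GFLOPS/W) and the 4-core E5-2637v3 last (about 1.52 GFLOPS/W). Because each v3 TDP now includes the 5W previously drawn by motherboard VRMs, a CPU-only comparison against v2 parts slightly understates the v3 models' efficiency.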

Summary of features in Xeon E5-2600v3 “Haswell-EP” processors

In addition to the capabilities mentioned at the top of this article, these processors include many of the successful features from earlier Xeon designs. The list below provides a summary of relevant technology features:

  • Up to 18 processor cores per socket (with options for 4-, 6-, 8-, 10-, 12-, 14- and 16-cores)
  • Support for Quad-channel ECC DDR4 memory speeds up to 2133MHz
  • Direct PCI-Express (generation 3.0) connections between each CPU and peripheral devices such as network adapters, GPUs and coprocessors (40 PCI-E lanes per socket)
  • Advanced Vector Extensions (AVX 2.0):
    • effectively double the throughput of integer vector operations by widening integer SIMD from 128 bits to 256 bits (floating-point operations were already 256 bits wide with the original AVX)
    • introduce Fused Multiply Add (FMA3) instructions, which perform a multiply and an add as a single instruction (effectively doubling the double-precision FLOPs per clock from 8 to 16 for each CPU core)
    • add support for additional instructions, including Gather and vector shift
    • F16C 16-bit Floating-Point conversion instructions accelerate data conversion between 16-bit and 32-bit floating point formats
  • Turbo Boost technology improves performance under peak loads by increasing processor clock speeds. With version 2.0 (introduced in “Sandy Bridge”), clock speeds are boosted more frequently, to higher speeds and for longer periods of time. With “Haswell”, top clock speeds depend upon the type of instructions (AVX vs. Non-AVX).
  • Dual Quick Path Interconnect (QPI) links between processor sockets improve communication speeds for multi-threaded applications
  • Improved energy efficiency with Per Core P-States and independent uncore frequency control
  • Intel Data Direct I/O Technology increases performance and reduces latency by allowing Intel Ethernet controllers and adapters to talk directly with the processor cache
  • Advanced Encryption Standard New Instructions (AES-NI) accelerate encryption and decryption for fast, affordable data protection and security
  • Intel Virtualization Technology for 32-bit and 64-bit systems (VT-x), for Directed I/O (VT-d) and for Connectivity (VT-c) delivers faster performance for core virtualization processes and provides built-in hardware support for I/O virtualization.
  • Intel APIC Virtualization (APICv) provides increased virtualization performance
  • Hyper-Threading technology allows two threads to “share” a processor core for improved resource usage. Although useful for some workloads, it is not recommended for HPC applications.
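The F16C conversion instructions listed above (VCVTPS2PH and VCVTPH2PS) trade precision for half the storage. The sketch below only illustrates that precision loss using Python's `struct` half-precision format code `"e"`; it is an assumption-laden stand-in, not the hardware path: it starts from a 64-bit Python float rather than a 32-bit single-precision value, so results can differ from hardware in rare rounding-tie cases, and `to_half_and_back` is a hypothetical name.

```python
import struct


def to_half_and_back(x: float) -> float:
    """Round x to IEEE 754 binary16, then widen it back to a Python float."""
    packed = struct.pack("<e", x)          # narrowing: round to the nearest half-precision value
    return struct.unpack("<e", packed)[0]  # widening: exact, no further rounding


for value in (1.0, 0.1, 3.14159, 65504.0):
    print(value, "->", to_half_and_back(value))
```

Here 0.1 comes back as 0.0999755859375 and 3.14159 as 3.140625, while 1.0 and 65504.0 (the largest finite half-precision value) survive intact; half precision keeps only about three significant decimal digits.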