These new CPUs are the second iteration of AMD’s EPYC server processor family. They remain compatible with the existing workstation and server platforms, but bring significant feature and performance improvements. Some of the new features (e.g., PCI-E 4.0) will require updated/revised platforms. If you’re looking to upgrade to or deploy these new CPUs, please speak with one of our experts to learn more.
Important features/changes in EPYC “Rome” CPUs include:
- Up to 64 processor cores per socket (with options for 8, 12, 16, 24, 32, and 48 cores)
- Improved CPU clock speeds up to 3.1GHz (with Boost speeds up to 3.4GHz)
- Increased computational performance:
- Full support for 256-bit AVX2 instructions with two 256-bit FMA units per CPU core
The previous “Naples” architecture split 256-bit instructions into two separate 128-bit operations
- Up to 16 double-precision FLOPS per cycle per core
- Double-precision floating point multiplies complete in 3 cycles (down from 4)
- 15% increase in instructions completed per clock cycle (IPC) for integer operations
- Memory capacity & performance features:
- Eight-channel memory controller on each CPU
- Support for DDR4 memory speeds up to 3200MHz (up from 2666MHz)
- Up to 4TB memory per CPU socket
- Up to 256MB L3 cache per CPU (up from 64MB)
- Support for PCI-Express generation 4.0 (which doubles the throughput of gen 3.0)
- Up to 128 lanes of PCI-Express per CPU socket
- Improvements to NUMA architecture:
- Simplified design with one NUMA domain per CPU Socket
- Uniform latencies between CPU dies (plus fewer hops between cores)
- Improved Infinity Fabric performance (read speed per clock is doubled to 32 bytes)
- Integrated in-silicon security mitigations for Spectre
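To put the PCI-Express figures from the list above into numbers, the following minimal sketch estimates theoretical one-direction bandwidth. The per-lane signalling rates (8 GT/s for gen 3.0, 16 GT/s for gen 4.0) and the 128b/130b encoding come from the PCI-Express specifications rather than from this article; the function name is illustrative, and real-world throughput will be lower once protocol overhead is included.

```python
# Per-lane signalling rate in gigatransfers per second, by PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# Gen 3.0 and gen 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    bits_per_byte = 8
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / bits_per_byte * lanes


if __name__ == "__main__":
    for gen in (3, 4):
        x16 = pcie_gb_per_s(gen, 16)
        x128 = pcie_gb_per_s(gen, 128)
        print(f"gen {gen}.0: x16 = {x16:.1f} GB/s, 128 lanes = {x128:.1f} GB/s")
```

Because only the signalling rate changes between the two generations, the gen 4.0 result is exactly twice the gen 3.0 result for any lane count.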
With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC & AI applications.
Before diving into the details, it helps to keep in mind the following recommendations. Based on our experience with HPC and Deep Learning deployments, our guidance for selecting among the EPYC options is as follows:
- 8-core EPYC CPUs – not recommended for HPC
While available for a low price, these models are not as cost-effective as many of the higher core count models.
- 12-, 16-, and 24-core EPYC CPUs – suitable for most HPC workloads
While not typically offering the best cost-effectiveness, they provide excellent performance at lower price points.
- 32-core EPYC CPUs – excellent for HPC workloads
These models offer excellent price/performance along with relatively high clock speeds and core counts
- 48-core and 64-core EPYC CPUs – suitable for certain HPC workloads
Although the highest core count models appear to provide the best cost-effectiveness and power efficiency, many applications exhibit diminishing returns at the highest core counts. For scalable applications that are not memory bandwidth bound, these EPYC CPUs will be excellent choices.
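One common way to reason about the diminishing returns mentioned above is Amdahl's law. This is a standard textbook model, not a measurement from this article, and the 95% parallel fraction used below is purely a hypothetical value. The sketch ignores memory bandwidth limits, which can reduce scaling further.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    parallel_fraction = 0.95  # hypothetical; measure your own application
    for cores in (8, 16, 32, 48, 64):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"{cores:2d} cores: {speedup:5.2f}x speedup")
```

Under this assumption, doubling from 32 to 64 cores yields far less than double the speedup, which is why benchmarking the actual application is the safer guide.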
Microway operates a Test Drive cluster to assist in evaluating and comparing these options as users develop the specifications for their new HPC & AI deployments. We would be happy to help you evaluate AMD EPYC processors as you plan your purchase.
Unprecedented Computational Performance
The EPYC “Rome” processors deliver new capabilities and exceptional performance. Many models provide over 1 TFLOPS (one trillion double-precision 64-bit floating-point operations per second) and several models provide 2 TFLOPS. This performance is achieved by doubling both the computational power of each core and the number of cores compared with the previous generation. The plot below shows the performance range across this new CPU line-up:
As shown above, the shaded/colored bars indicate the expected performance ranges for each CPU model. These peak performance numbers are achieved when executing 256-bit AVX2 instructions with FMA. Note that only a small set of codes issue almost exclusively AVX2 FMA instructions (e.g., LINPACK). Most applications issue a variety of instructions and will achieve lower than the peak FLOPS values shown above. Applications which have not been re-compiled with an appropriate compiler would not include AVX2 instructions and would thus achieve lower performance.
The dotted lines indicate the possible peak performance if all cores are operating at boosted clock speeds. While theoretically possible for short bursts, sustained performance at these levels is not expected. Sections of code with dense, vectorized instructions are very demanding, and typically result in the processor core slightly lowering its clock speed (this behavior is not unique to AMD CPUs). While AMD has not published specific clock speed expectations for such codes, Microway expects the EPYC “Rome” CPUs to operate near their “base” clock speed values even when executing code with intensive instructions.
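The peak figures discussed above follow from simple arithmetic: cores multiplied by clock speed multiplied by FLOPS per cycle. The sketch below derives the 16 FLOPS per cycle figure from the two 256-bit FMA units listed earlier, then computes a base-clock and an all-core-boost peak. The core counts and clock speeds in the example are hypothetical values chosen for illustration, not the specifications of any particular model.

```python
# 2 FMA units x 4 doubles per 256-bit vector x 2 operations per FMA
# (one multiply and one add) gives 16 double-precision FLOPS per cycle.
FMA_UNITS = 2
DOUBLES_PER_VECTOR = 256 // 64
OPS_PER_FMA = 2
FLOPS_PER_CYCLE = FMA_UNITS * DOUBLES_PER_VECTOR * OPS_PER_FMA


def peak_gflops(cores: int, clock_ghz: float) -> float:
    """Theoretical double-precision peak for one socket, in GFLOPS."""
    return cores * clock_ghz * FLOPS_PER_CYCLE


if __name__ == "__main__":
    # Hypothetical (cores, base GHz, boost GHz) combinations.
    for cores, base_ghz, boost_ghz in [(32, 2.5, 3.3), (64, 2.0, 3.3)]:
        base = peak_gflops(cores, base_ghz)
        boost = peak_gflops(cores, boost_ghz)
        print(f"{cores} cores: {base:.0f} GFLOPS at base clock, "
              f"{boost:.0f} GFLOPS if every core held boost")
```

The base-clock result is the more realistic planning number, since sustained all-core boost under dense vector code is not expected.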
The CPU models above are sorted by price (as discussed in the next section). The lowest-performance models provide fewer CPU cores, less cache, and slower memory speeds. Higher-end models offer high core counts for the best performance. HPC and AI groups are generally expected to favor the mid-range processor models, as the highest core count CPUs are priced at a premium.
Note that those models which only support single-CPU installations are separated on the left side of each plot.
AMD EPYC “Rome” Price Ranges
The new EPYC “Rome” processors span a fairly wide range of prices, so budget must be considered when selecting a CPU. While the entry-level models are under $1,000, the highest-end EPYC processors cost nearly $10,000 each. It would be frustrating to plan for 64-core processors when the budget cannot support the price. The plot below compares the prices of the EPYC “Rome” processors:
All the CPUs in this article are sorted by price (as shown in the plot above). To ease comparisons, all of the plots in this article are ordered to match the above plot. Keep this pricing in mind as you review this article and plan your system architecture. The color of each bar indicates the expected customer price per CPU:
- Low price tier: prices below $1,000 per CPU
- Mid price tier: prices between $1,000 and $2,000
- High price tier: prices between $2,000 and $4,000
- Premium price tier: prices above $4,000 per EPYC CPU
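For readers building their own comparison spreadsheets or scripts, the tiers above can be expressed as a small lookup. This is only a sketch: the function name is illustrative, and the handling of prices that fall exactly on a boundary (for example, exactly $2,000) is an assumption, since the list above does not specify it.

```python
def price_tier(price_usd: float) -> str:
    """Map a per-CPU price to the tier names used in the list above.

    Assumption: each lower bound is inclusive, and exactly $4,000 still
    counts as "high" because the premium tier is described as above $4,000.
    """
    if price_usd < 1000:
        return "low"
    if price_usd < 2000:
        return "mid"
    if price_usd <= 4000:
        return "high"
    return "premium"


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    for price in (650, 1500, 3400, 6950):
        print(f"${price:,}: {price_tier(price)} price tier")
```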
Most HPC users are expected to select CPU models around the high price tier. These models provide industry-leading performance (and excellent performance per dollar) for a price under $4,000 per processor. Applications can certainly leverage the premium EPYC processor models, but they will come at a higher price.
AMD “Rome” EPYC Processor Specifications
The set of tabs below compares the features and specifications of this new EPYC processor family. Take note that certain CPU SKUs are designed for single-socket systems (indicated with a P suffix on the part number). All other models may be used in either a single- or dual-socket system. The P-series AMD EPYC CPUs have a lower price and are thus the most cost-effective models, but remember that they are not available in dual-CPU systems.
Most HPC groups will find that processors with 16 to 32 CPU cores are attractively priced. Systems with up to 32 cores per CPU will be in the price ranges commonly selected for HPC & AI workloads. However, the 48-core and 64-core models will be at higher prices than many groups consider cost-effective.
The CPU frequencies of the new AMD EPYC CPUs are relatively high – a welcome improvement over the previous generation. Each core supports “Boost” speeds enabling temporary boosts of speed over the base clock speed. The maximum Boost speed for each CPU model is shown as a dotted line.
Supported memory speeds of the AMD EPYC “Rome” processors are straightforward. The amount of memory bandwidth available per CPU core can be an important consideration, but is simply a function of the number of cores. Teams planning to select CPUs with higher core counts need to ensure that each core won’t be starved of data.
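The bandwidth-per-core relationship described above can be estimated with a short calculation. The sketch below assumes the eight memory channels and DDR4-3200 speed listed earlier are fully populated, and that each DDR4 channel transfers 8 bytes (64 bits) per transfer, which is a property of DDR4 rather than something stated in this article. These are theoretical peaks; measured bandwidth will be lower.

```python
def socket_bandwidth_gb_s(channels: int = 8,
                          mega_transfers_per_s: int = 3200,
                          bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth for one CPU socket, in GB/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


def bandwidth_per_core_gb_s(cores: int, **memory_config) -> float:
    """Share of the socket's theoretical bandwidth available to each core."""
    return socket_bandwidth_gb_s(**memory_config) / cores


if __name__ == "__main__":
    print(f"Per socket: {socket_bandwidth_gb_s():.1f} GB/s")
    for cores in (8, 16, 24, 32, 48, 64):
        per_core = bandwidth_per_core_gb_s(cores)
        print(f"{cores:2d} cores: {per_core:5.2f} GB/s per core")
```

The per-core share at 64 cores is one eighth of the share at 8 cores, which is the reason bandwidth-bound applications deserve extra scrutiny at the highest core counts.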
With EPYC Rome, AMD has brought a tremendous increase in the size of L3 cache. This is expected to offer significant benefits for demanding applications.
Although dual-socket systems continue to be the most common for HPC & AI workloads, the capabilities of a single EPYC CPU can be compelling. A variety of single-socket EPYC systems are available, and AMD has built several CPU SKUs which support only single-socket operation. The plot below compares the various CPU socket counts supported by this processor line-up.
As the industry pushes for increased performance, CPU power requirements have increased across the board. The lowest-wattage EPYC “Rome” CPU is rated at 120 Watts, with the majority of models in the 155W to 180W range. The highest core count models are over 200 Watts. HPC users must be certain that the systems they select have received thorough thermal validation. Systems which run hot will throttle CPU speeds and suffer lower performance.
Cost-Effectiveness and Power Efficiency of EPYC “Rome” CPUs
Overall, the AMD EPYC processors provide great value in price spent versus performance achieved. However, there is a spectrum of efficiency, with certain CPU models offering particularly compelling value. Also remember that the prices and power requirements for some of the top models are fairly high. Savvy readers may find the following facts useful:
- The most cost-effective CPUs likely to be selected are EPYC 7452 and EPYC 7552
- If a balance of cost-effectiveness and higher clock speed is needed, look to EPYC 7502
- While the EPYC 7702 looks to be the most cost-effective on paper, it is important to consider that many applications may not be able to scale efficiently to 64 cores. Benchmark before making the selection.
- Applications which can be satisfied by a single CPU will benefit greatly from the single-socket EPYC 7xx2P models
The plots below compare the cost-effectiveness and power efficiency of these CPU models. The intent is to go beyond the raw “speeds and feeds” of the processors to determine which models will be most attractive for HPC and Deep Learning/AI deployments.
Historically, we have looked at the performance of each CPU and compared it with its cost. However, this presents a distorted view as it does not include all the other necessary components in an HPC/AI system (the server, system memory, high-speed fabric, etc.). That simplistic comparison is shown further below, but first examine this plot, which demonstrates the cost per FLOPS for a set of complete compute nodes with AMD EPYC Rome CPUs, 4GB of system memory per core, and 100Gbps EDR InfiniBand. As shown, the models with higher core counts tend to be the most cost-effective overall.
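The whole-node comparison described above can be sketched as follows. Every price in this example is a hypothetical placeholder (the platform cost stands in for the server, InfiniBand adapter, and other fixed components), and the function names are illustrative. Only the 4GB per core sizing and the 16 FLOPS per cycle per core figure come from this article.

```python
FLOPS_PER_CYCLE = 16  # double-precision FLOPS per cycle per core
GB_PER_CORE = 4       # memory sizing used for the node comparison


def node_peak_gflops(cores: int, clock_ghz: float, sockets: int = 2) -> float:
    """Theoretical double-precision peak of the whole node, in GFLOPS."""
    return sockets * cores * clock_ghz * FLOPS_PER_CYCLE


def node_cost(cpu_price: float, cores: int, sockets: int = 2,
              platform_cost: float = 3000.0, price_per_gb: float = 5.0) -> float:
    """Total node price; platform_cost and price_per_gb are placeholders."""
    memory_cost = GB_PER_CORE * cores * sockets * price_per_gb
    return sockets * cpu_price + platform_cost + memory_cost


def cost_per_gflops(cpu_price: float, cores: int, clock_ghz: float) -> float:
    """Dollars per theoretical GFLOPS for a complete dual-socket node."""
    return node_cost(cpu_price, cores) / node_peak_gflops(cores, clock_ghz)


if __name__ == "__main__":
    # Hypothetical (CPU price, cores, clock GHz) combinations.
    for cpu_price, cores, ghz in [(800, 16, 3.0), (2000, 32, 2.5), (6500, 64, 2.0)]:
        print(f"{cores} cores: ${cost_per_gflops(cpu_price, cores, ghz):.2f} per GFLOPS")
```

Because the platform cost is fixed regardless of the CPU chosen, it is spread over more FLOPS as core count rises, which is one reason a whole-node view tends to favor higher core counts more than a CPU-only view does.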
The plot below shows a simple comparison of CPU price versus the number of FLOPS provided for each model. This may be useful when comparing to older CPU models, but for new projects we recommend the plot above.
This plot compares the power requirements (TDP) versus performance throughput of each CPU. Although this generation includes some of the highest-wattage CPUs to date, each is actually quite power efficient. In fact, even the 225-Watt CPU models are among the most efficient models in this product line.
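Power efficiency in this sense is simply theoretical peak performance divided by TDP. The sketch below shows the calculation using the 16 double-precision FLOPS per cycle per core figure from earlier in this article. The core counts, clock speeds, and wattages in the example are hypothetical and do not describe specific models.

```python
FLOPS_PER_CYCLE = 16  # double-precision FLOPS per cycle per core


def gflops_per_watt(cores: int, clock_ghz: float, tdp_watts: float) -> float:
    """Theoretical peak GFLOPS delivered per Watt of TDP."""
    return cores * clock_ghz * FLOPS_PER_CYCLE / tdp_watts


if __name__ == "__main__":
    # Hypothetical (cores, clock GHz, TDP Watts) combinations.
    for cores, ghz, tdp in [(8, 3.0, 120), (32, 2.5, 180), (64, 2.0, 200)]:
        efficiency = gflops_per_watt(cores, ghz, tdp)
        print(f"{cores:2d} cores at {tdp} W: {efficiency:.2f} GFLOPS per Watt")
```

In examples like these, core count grows much faster than wattage, so a high-TDP, high-core-count part can still come out ahead on efficiency.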
Recommended CPU Models for HPC & AI/Deep Learning
Although most of the EPYC CPUs will offer excellent performance, it is common for computationally-demanding sites to set a floor on CPU clock speeds. The intent is to ensure that no workload suffers unacceptably low performance, as not all are well parallelized. While there are users who would prefer higher clock speeds, experience shows that most groups settle on a minimum clock speed of around 2.5GHz. With that in mind, the comparisons below highlight only those CPU models which offer clock speeds of 2.5GHz or higher.
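Applying such a floor to a candidate list is a simple filter, sketched below. The model names and specifications are hypothetical placeholders, and treating the floor as a limit on base clock speed (rather than boost) is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuModel:
    name: str
    cores: int
    base_ghz: float


def meets_clock_floor(models, floor_ghz: float = 2.5):
    """Keep only the models whose base clock is at or above the floor."""
    return [m for m in models if m.base_ghz >= floor_ghz]


if __name__ == "__main__":
    # Hypothetical candidates, for illustration only.
    candidates = [
        CpuModel("model-A", 16, 3.0),
        CpuModel("model-B", 32, 2.5),
        CpuModel("model-C", 48, 2.2),
        CpuModel("model-D", 64, 2.0),
    ]
    for model in meets_clock_floor(candidates):
        print(f"{model.name}: {model.cores} cores at {model.base_ghz} GHz")
```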
As discussed above, this plot shows the price per FLOPS for complete compute nodes with 4GB of system memory per CPU core (which is towards the lower end of what many HPC sites currently deploy).