These new CPUs are the second iteration of AMD’s EPYC server processor family. They remain compatible with the existing workstation and server platforms, but offer significant feature and performance improvements. Some of the new features (e.g., PCI-E 4.0) will require updated/revised platforms. If you’re looking to upgrade to or deploy these new CPUs, please speak with one of our experts to learn more.
Important features/changes in EPYC “Rome” CPUs include:
- Up to 64 processor cores per socket (with options for 8, 12, 16, 24, 32, and 48 cores)
- Improved CPU clock speeds up to 3.1GHz (with Boost speeds up to 3.4GHz)
- Increased computational performance:
- Full support for 256-bit AVX2 instructions with two 256-bit FMA units per CPU core
The previous “Naples” architecture split 256-bit instructions into two separate 128-bit operations
- Up to 16 double-precision FLOPS per cycle per core
- Memory capacity & performance features:
- Eight-channel memory controller on each CPU
- Support for DDR4 memory speeds up to 3200MHz (up from 2666MHz)
- Up to 4TB memory per CPU socket
- Up to 256MB L3 cache per CPU (up from 64MB)
- Support for PCI-Express generation 4.0 (which doubles the throughput of gen 3.0)
- Up to 128 lanes of PCI-Express per CPU socket
- Integrated in-silicon security mitigations for Spectre
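To put the PCI-Express doubling from the list above into numbers, here is a minimal sketch of the per-lane arithmetic. It assumes the standard signalling rates (8 GT/s for gen 3.0, 16 GT/s for gen 4.0) and the 128b/130b line encoding used by both generations. Protocol overhead beyond the encoding is ignored, so real-world throughput will be somewhat lower. The function names are illustrative only.

```python
# Raw signalling rate per lane, in gigatransfers per second (one bit per transfer).
PCIE_GT_PER_SEC = {"3.0": 8.0, "4.0": 16.0}

# Both generations use 128b/130b encoding: 128 payload bits per 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gbs(generation: str) -> float:
    """Approximate usable bandwidth of one lane, in GB/s per direction."""
    return PCIE_GT_PER_SEC[generation] * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_bandwidth_gbs(generation: str, lanes: int) -> float:
    """Approximate usable bandwidth of a multi-lane link, in GB/s per direction."""
    return lane_bandwidth_gbs(generation) * lanes


for gen in ("3.0", "4.0"):
    print(f"PCIe {gen} x16: {link_bandwidth_gbs(gen, 16):.1f} GB/s per direction")
```

An x16 device such as a GPU or network adapter only benefits from the higher rate if both the device and the platform support gen 4.0.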
With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC & AI applications.
Before diving into the details, it helps to keep in mind the following recommendations. Based on our experience with HPC and Deep Learning deployments, our guidance for selecting among the EPYC options is as follows:
- 8-core EPYC CPUs – not recommended for HPC
While available for a low price, these models are not as cost-effective as many of the higher core count models.
- 12-, 16-, and 24-core EPYC CPUs – recommended for most HPC workloads
The best balance of performance and price.
- 32-core EPYC CPUs – recommended for certain HPC workloads
Many applications exhibit diminishing returns as core counts increase. For scalable applications that are not memory bandwidth bound, the 32-core EPYC CPUs will be cost-effective.
- 48-core and 64-core EPYC CPUs – recommended only for specific HPC workloads
Although the highest core count models provide the highest performance, their higher price makes them suitable only for particular workloads (e.g., high core count, SMP, and large-memory Compute Nodes).
Microway operates a Test Drive cluster to assist in evaluating and comparing these options as users develop the specifications for their new HPC & AI deployments. We would be happy to help you evaluate AMD EPYC processors as you plan your purchase.
Unprecedented Computational Performance
The EPYC “Rome” processors deliver new capabilities and exceptional performance. Many models provide over 1 TFLOPS (one trillion double-precision 64-bit floating-point operations per second) and several models provide 2 TFLOPS. This performance is achieved by doubling both the computational power of each core and the number of cores relative to the previous “Naples” generation. The plot below shows the performance range across this new CPU line-up:
As shown above, the shaded/colored bars indicate the expected performance ranges for each CPU model. These peak performance numbers are achieved when executing 256-bit AVX2 instructions with FMA. Note that only a small set of codes issue almost exclusively AVX2 FMA instructions (e.g., LINPACK). Most applications issue a variety of instructions and will achieve lower than the peak FLOPS values shown above. Applications which have not been re-compiled with an appropriate compiler would not include AVX2 instructions and would thus achieve lower performance.
The dotted lines indicate the possible peak performance if all cores are operating at boosted clock speeds. While theoretically possible for short bursts, sustained performance at these levels is not expected. Sections of code with dense, vectorized instructions are very demanding, and typically result in the processor core slightly lowering its clock speed (this behavior is not unique to AMD CPUs). While AMD has not published specific clock speed expectations for such codes, Microway expects the EPYC “Rome” CPUs to operate near their “base” clock speed values even when executing code with intensive instructions.
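The peak figures discussed above follow from simple arithmetic: cores × clock speed × FLOPS per cycle. The sketch below derives the 16 FLOPS per cycle per core from the two 256-bit FMA units and then computes a theoretical peak at base and boost clocks. The 64-core, 2.0GHz base, 3.4GHz boost configuration is an illustrative assumption, not a quoted SKU specification, and `peak_tflops` is a hypothetical helper name.

```python
# Each "Rome" core has two 256-bit FMA units. A 256-bit register holds four
# 64-bit doubles, and a fused multiply-add counts as two floating-point operations.
FMA_UNITS_PER_CORE = 2
DOUBLES_PER_VECTOR = 256 // 64
FLOPS_PER_FMA = 2
FLOPS_PER_CYCLE = FMA_UNITS_PER_CORE * DOUBLES_PER_VECTOR * FLOPS_PER_FMA  # 16


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical peak double-precision TFLOPS for a single CPU socket."""
    return cores * clock_ghz * FLOPS_PER_CYCLE / 1000.0


# Illustrative configuration (assumed values, not a vendor specification).
cores, base_ghz, boost_ghz = 64, 2.0, 3.4

print(f"Peak at base clock:  {peak_tflops(cores, base_ghz):.2f} TFLOPS")
print(f"Peak at boost clock: {peak_tflops(cores, boost_ghz):.2f} TFLOPS")
```

The base-clock figure is the more realistic planning number, since dense vectorized code is expected to run near base clock speeds rather than at sustained boost.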
The CPU models above are sorted by price (as discussed in the next section). The lowest-performance models provide fewer CPU cores, less cache, and slower memory speeds. Higher-end models offer high core counts for the best performance. HPC and AI groups are generally expected to favor the mid-range processor models, as the highest performance CPUs are priced at a premium.
AMD EPYC “Rome” Price Ranges
The new EPYC “Rome” processors span a fairly wide range of prices, so budget must be considered when selecting a CPU. While the entry-level models are under $1,000, the highest-end EPYC processors cost nearly $10,000 each. It would be frustrating to plan for 64-core processors when the budget cannot support the price. The plot below compares the prices of the EPYC “Rome” processors:
All the CPUs in this article are sorted by price (as shown in the plot above). To ease comparisons, all of the plots in this article are ordered to match the above plot. Keep this pricing in mind as you review this article and plan your system architecture. The color of each bar indicates the expected customer price per CPU:
- Low price tier: prices below $1,000 per CPU
- Mid price tier: prices between $1,000 and $2,000
- High price tier: prices between $2,000 and $4,000
- Premium price tier: prices above $4,000 per EPYC CPU
Most HPC users are expected to select CPU models in the low, mid, or high price range. These models provide industry-leading performance for a price under $4,000 per processor. Applications can certainly leverage the premium EPYC processor models, but they will come at a higher price.
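For readers building their own comparison spreadsheets or scripts, the four price tiers listed above can be captured in a small helper. This is only a sketch: the tier list does not say which tier a price of exactly $1,000 or $2,000 falls into, so the boundary handling here (boundary prices go to the higher tier, except $4,000, which stays in "high" because premium is defined as above $4,000) is an assumption.

```python
def price_tier(price_usd: float) -> str:
    """Map a per-CPU price in US dollars to the article's four price tiers.

    Boundary handling is an assumption; the tier list leaves it unspecified.
    """
    if price_usd < 1000:
        return "low"
    if price_usd < 2000:
        return "mid"
    if price_usd <= 4000:
        return "high"
    return "premium"


# Example prices are arbitrary values chosen to exercise each tier.
for price in (650, 1500, 3200, 7000):
    print(f"${price:,}: {price_tier(price)}")
```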
AMD “Rome” EPYC Processor Specifications
The set of tabs below compares the features and specifications of this new EPYC processor family. Take note that certain CPU SKUs are designed for single-socket systems (indicated with a P suffix on the part number). All other models may be used in either a single- or dual-socket system. The P-series AMD EPYC CPUs have a lower price and are thus the most cost-effective models, but remember that they are not available in dual-CPU systems.
Most HPC groups will find that processors with 8 to 24 CPU cores are attractively priced. Systems with up to 32 cores per CPU will be in the price ranges commonly selected for HPC & AI workloads. However, the 48-core and 64-core models will be at a higher price than most groups would consider cost-effective.
The CPU frequencies of the new AMD EPYC CPUs are relatively high, a welcome improvement over the previous generation. Each core supports “Boost” speeds, which temporarily raise the clock above the base clock speed. The maximum Boost speed for each CPU model is shown as a dotted line.
Supported memory speeds of the AMD EPYC “Rome” processors are straightforward. The amount of memory bandwidth available per CPU core can be an important consideration, but is simply a function of the number of cores. Teams planning to select CPUs with higher core counts need to ensure that each core won’t be starved of data.
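As a rough illustration of the per-core bandwidth point above, the sketch below computes theoretical memory bandwidth per socket and per core. It assumes all eight channels are populated and running at DDR4-3200, and that each DDR4 channel is 64 bits (8 bytes) wide. Models that support only slower memory will come in lower, and measured bandwidth (e.g., from STREAM) is always below the theoretical figure.

```python
MEMORY_CHANNELS = 8           # memory channels per CPU socket
TRANSFERS_PER_SEC = 3200e6    # DDR4-3200: 3200 million transfers per second
BYTES_PER_TRANSFER = 8        # each DDR4 channel is 64 bits wide


def socket_bandwidth_gbs() -> float:
    """Theoretical peak memory bandwidth of one socket, in GB/s."""
    return MEMORY_CHANNELS * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9


def bandwidth_per_core_gbs(cores: int) -> float:
    """Theoretical memory bandwidth available to each core, in GB/s."""
    return socket_bandwidth_gbs() / cores


print(f"Per socket: {socket_bandwidth_gbs():.1f} GB/s")
for cores in (8, 16, 32, 64):
    print(f"{cores:2d} cores: {bandwidth_per_core_gbs(cores):.1f} GB/s per core")
```

Because the socket bandwidth is fixed, each doubling of the core count halves the bandwidth available per core, which is why memory-bound codes often see little benefit from the highest core counts.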
Although dual-socket systems continue to be the most common for HPC & AI workloads, the capabilities of a single EPYC CPU can be compelling. A variety of single-socket EPYC systems are available, and AMD has built several CPU SKUs which support only single-socket operation. The plot below compares the various CPU socket counts supported by this processor line-up.
As the industry pushes for increased performance, CPU power requirements have increased across the board. The lowest-wattage EPYC “Rome” CPU is 120 Watts, with the majority of models in the 155W to 180W range. The highest core count models are over 200 Watts. HPC users must be certain that the systems they select have received thorough thermal validation. Systems which run hot will throttle CPU speeds and suffer lower performance.
Cost-Effectiveness and Power Efficiency of EPYC “Rome” CPUs
Overall, the AMD EPYC processors provide great value in price spent versus performance achieved. However, there is a spectrum of efficiency, with certain CPU models offering particularly compelling value. Also remember that the prices and power requirements for some of the top models are fairly high. Savvy readers may find the following facts useful:
- While the EPYC 7282 looks to be the most cost-effective on paper, it is important to consider that it has only half the L3 cache of its 16-core siblings. Benchmark before making the selection.
- Applications which can be satisfied by a single CPU will benefit greatly from the single-socket EPYC 7xx2P models.
The plots below compare the cost-effectiveness and power efficiency of these CPU models. The intent is to go beyond the raw “speeds and feeds” of the processors to determine which models will be most attractive for HPC and Deep Learning/AI deployments.
This plot compares the power requirements (TDP) versus performance throughput of each CPU. Although this generation includes some of the highest-wattage CPUs to date, each is actually quite power efficient. In fact, even the 225-Watt CPU models are among the most efficient models in this product line.
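The two metrics compared in this section, performance per dollar and performance per Watt, can be reproduced with a few lines of code. The sketch below uses theoretical peak GFLOPS at base clock (16 FLOPS per cycle per core) as the performance measure. The three entries are placeholder values invented purely for illustration; they are not real SKUs, prices, or TDPs, and should be replaced with actual specifications and quotes before drawing conclusions.

```python
from dataclasses import dataclass

FLOPS_PER_CYCLE = 16  # double-precision FLOPS per cycle per core with AVX2 FMA


@dataclass
class CpuModel:
    name: str
    cores: int
    base_ghz: float
    price_usd: float
    tdp_watts: float

    @property
    def peak_gflops(self) -> float:
        """Theoretical peak double-precision GFLOPS at base clock."""
        return self.cores * self.base_ghz * FLOPS_PER_CYCLE

    @property
    def gflops_per_dollar(self) -> float:
        return self.peak_gflops / self.price_usd

    @property
    def gflops_per_watt(self) -> float:
        return self.peak_gflops / self.tdp_watts


# Placeholder data (hypothetical values, not real specifications or prices).
models = [
    CpuModel("example-16c", 16, 3.0, 900.0, 155.0),
    CpuModel("example-32c", 32, 2.5, 2000.0, 180.0),
    CpuModel("example-64c", 64, 2.0, 7000.0, 225.0),
]

for m in sorted(models, key=lambda m: m.gflops_per_dollar, reverse=True):
    print(f"{m.name}: {m.gflops_per_dollar:.2f} GFLOPS/$, "
          f"{m.gflops_per_watt:.2f} GFLOPS/W")
```

With these placeholder numbers, the highest core count entry ranks last per dollar but first per Watt, which mirrors the trade-off described above. Peak FLOPS is only a proxy, so application benchmarks remain the better basis for a purchase decision.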
Recommended CPU Models for HPC & AI/Deep Learning
Although most of the EPYC CPUs will offer excellent performance, it is common for computationally-demanding sites to set a floor on CPU clock speeds (usually around 2.5GHz). The intent is to ensure that no workload suffers unacceptably low performance, as not all are well parallelized. While there are users who would prefer higher clock speeds, experience shows that most groups settle on a minimum clock speed of 2.5GHz to 2.6GHz. With that in mind, the comparisons below highlight only those CPU models which offer clock speeds of 2.5GHz or higher.
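Applying such a clock speed floor to a candidate list is straightforward. The following is a minimal sketch in which the model names and clock speeds are hypothetical placeholders, and the floor is applied to the base clock (an assumption; a site could equally define its floor against a different clock figure).

```python
MIN_CLOCK_GHZ = 2.5  # the commonly chosen floor described above


def meets_clock_floor(base_ghz: float, floor_ghz: float = MIN_CLOCK_GHZ) -> bool:
    """Return True if a CPU's base clock meets or exceeds the floor."""
    return base_ghz >= floor_ghz


# Hypothetical candidates: name -> base clock in GHz (placeholder values).
candidates = {"example-a": 2.0, "example-b": 2.5, "example-c": 3.0}

shortlist = [name for name, ghz in candidates.items() if meets_clock_floor(ghz)]
print(shortlist)
```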