Revision for “Detailed Specifications of the AMD EPYC “Rome” CPUs” created on March 15, 2021 @ 11:01:30
Detailed Specifications of the AMD EPYC "Rome" CPUs
<em>This article provides in-depth discussion and analysis of the 7nm AMD EPYC processors (codenamed "Rome" and based on AMD’s Zen2 architecture). EPYC "Rome" processors replace the previous "Naples" processors and are available for sale as of August 7th, 2019. We have also published an <a href="https://www.microway.com/hpc-tech-tips/amd-epyc-rome-cpu-review/">AMD EPYC "Rome" CPU Review</a>.</em> <strong>Note: these have since been superseded by the <a href="https://www.microway.com/knowledge-center-articles/detailed-specifications-of-the-amd-epyc-milan-cpus/">"Milan" AMD EPYC CPUs</a>.</strong>
These new CPUs are the second iteration of AMD’s EPYC server processor family. They remain compatible with existing workstation and server platforms, but offer significant feature and performance improvements. Some of the new features (e.g., PCI-E 4.0) require updated or revised platforms. If you’re looking to upgrade to or deploy these new CPUs, please <a href="https://www.microway.com/contact/" rel="noopener noreferrer" target="_blank">speak with one of our experts</a> to learn more.
<h2>Important features/changes in EPYC "Rome" CPUs include:</h2>
Before diving into the details, it helps to keep in mind the following recommendations. Based on our experience with HPC and Deep Learning deployments, our guidance for selecting among the EPYC options is as follows:
As shown above, the <em>shaded/colored bars</em> indicate the expected performance ranges for each CPU model. These peak performance numbers are reached only when executing 256-bit AVX2 instructions with FMA. Note that only a small set of codes issue almost exclusively AVX2 FMA instructions (e.g., LINPACK). Most applications issue a variety of instructions and will fall short of the peak FLOPS values shown above. Applications that have not been recompiled with an appropriate compiler will not include AVX2 instructions and will therefore achieve lower performance.
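The peak values in these plots follow from a simple product of core count, clock speed, and floating-point operations per cycle. The sketch below assumes each Zen 2 core has two 256-bit FMA pipelines (2 pipes × 4 FP64 lanes × 2 operations = 16 double-precision FLOPs per cycle). The function name `peak_gflops` and the 32-core, 2.5 GHz inputs are illustrative only and do not describe a specific model.

```python
# Minimal sketch: theoretical peak FP64 throughput of an EPYC "Rome" CPU.
# Assumption: each Zen 2 core has two 256-bit FMA pipelines, i.e.
# 2 pipes x 4 FP64 lanes x 2 ops (multiply + add) = 16 FLOPs per cycle.
FP64_FLOPS_PER_CYCLE = 2 * 4 * 2


def peak_gflops(cores: int, clock_ghz: float, sockets: int = 1) -> float:
    """Peak FP64 GFLOPS if every core retires AVX2 FMA instructions every cycle."""
    return sockets * cores * clock_ghz * FP64_FLOPS_PER_CYCLE


if __name__ == "__main__":
    # Hypothetical inputs: a 32-core CPU running at a 2.5 GHz base clock.
    print(f"{peak_gflops(32, 2.5):.0f} GFLOPS per CPU")  # 1280 GFLOPS per CPU
    print(f"{peak_gflops(32, 2.5, sockets=2):.0f} GFLOPS per dual-socket node")  # 2560
```

Real applications reach only a fraction of this figure, for the reasons given above.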
The dotted lines indicate the possible peak performance if all cores are operating at boosted clock speeds. While theoretically possible for short bursts, sustained performance at these levels is not expected. Sections of code with dense, vectorized instructions are very demanding, and typically result in the processor core slightly lowering its clock speed (this behavior is not unique to AMD CPUs). While AMD has not published specific clock speed expectations for such codes, Microway expects the EPYC "Rome" CPUs to operate near their "base" clock speed values even when executing code with intensive instructions.
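To see how far the dotted (boost-clock) line sits above what vector-heavy code is likely to sustain, the sketch below computes both peaks for one CPU. It assumes 16 FP64 FLOPs per cycle per Zen 2 core; the helper `peak_range_gflops` is hypothetical, and the 64-core, 2.25 GHz base, 3.4 GHz boost inputs are illustrative rather than taken from the plots.

```python
# Minimal sketch: how much of the boost-clock ("dotted line") peak remains
# when vector-heavy code runs near the base clock instead.
# Assumption: 16 FP64 FLOPs per cycle per Zen 2 core (two 256-bit FMA pipes).
FP64_FLOPS_PER_CYCLE = 16


def peak_range_gflops(cores: int, base_ghz: float, boost_ghz: float) -> tuple[float, float]:
    """Return (base-clock peak, all-core-boost peak) in FP64 GFLOPS."""
    if not 0 < base_ghz <= boost_ghz:
        raise ValueError("expected 0 < base_ghz <= boost_ghz")
    return (cores * base_ghz * FP64_FLOPS_PER_CYCLE,
            cores * boost_ghz * FP64_FLOPS_PER_CYCLE)


if __name__ == "__main__":
    # Hypothetical inputs: 64 cores, 2.25 GHz base clock, 3.4 GHz maximum boost.
    base, boost = peak_range_gflops(64, 2.25, 3.4)
    # base: 2304 GFLOPS, boost: 3482 GFLOPS, ratio: 66%
    print(f"base: {base:.0f} GFLOPS, boost: {boost:.0f} GFLOPS, ratio: {base / boost:.0%}")
```

Core count and FLOPs per cycle cancel out of the ratio, so a CPU held near its base clock delivers roughly base/boost of the dotted-line peak.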
The CPU models above are sorted by price (as discussed in the next section). The lowest-performance models provide fewer CPU cores, less cache, and slower memory speeds. Higher-end models offer high core counts for the best performance. HPC and AI groups are generally expected to favor the mid-range processor models, as the highest core count CPUs are priced at a premium.
<strong><em>Note that those models which only support single-CPU installations are separated on the left side of each plot.</em></strong>
All the CPUs in this article are sorted by price (as shown in the plot above). To ease comparisons, all of the plots in this article are ordered to match the above plot. Keep this pricing in mind as you review this article and plan your system architecture. The color of each bar indicates the expected customer price per CPU:
Most HPC users are expected to select CPU models around the high price tier. These models provide industry-leading performance (and excellent performance per dollar) for a price under $4,000 per processor. Applications can certainly leverage the premium EPYC processor models, but those models come at a higher price.
Most HPC groups will find that processors with 16 to 32 CPU cores are attractively priced. Systems with up to 32 cores per CPU fall within the price ranges commonly selected for HPC & AI workloads. However, the 48-core and 64-core models are priced higher than many groups consider cost-effective.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__CPU_Number_of_Cores.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU processor core counts" width="750" height="660" class="aligncenter size-full wp-image-11771" />
The clock frequencies of the new AMD EPYC CPUs are relatively high, a welcome improvement over the previous generation. Each core supports "Boost" speeds, which temporarily raise its clock above the base frequency when power and thermal headroom allow. The maximum Boost speed for each CPU model is shown as a dotted line.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__CPU_Clock_Speeds.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU clock frequencies" width="750" height="660" class="aligncenter size-full wp-image-11772" />
The plot below compares the memory speeds supported by each AMD EPYC "Rome" processor. Memory bandwidth per CPU core can be an important consideration; for a given memory configuration, it is simply the CPU's total memory bandwidth divided by its number of cores. Teams planning to select CPUs with higher core counts need to ensure that each core won’t be starved of data.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__Memory_Performance.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU supported memory speeds" width="750" height="660" class="aligncenter size-full wp-image-11773" />
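The per-core bandwidth arithmetic can be made concrete with a short sketch. It assumes a model that runs all eight DDR4 channels per socket at 3200 MT/s with a 64-bit (8-byte) data path per channel; models with lower supported memory speeds should pass their own `channels` and `mt_s` values. The function names are hypothetical, and the result is a theoretical peak: sustained bandwidth (for example, as measured by the STREAM benchmark) is lower.

```python
# Minimal sketch: theoretical peak memory bandwidth per CPU core.
# Assumptions: eight DDR4 channels per socket at 3200 MT/s (DDR4-3200) and
# an 8-byte data path per channel. Sustained bandwidth is lower than this.
BYTES_PER_TRANSFER = 8


def socket_bandwidth_gb_s(channels: int = 8, mt_s: int = 3200) -> float:
    """Peak bandwidth of one socket in GB/s (MT/s x bytes = MB/s; divide by 1000)."""
    return channels * mt_s * BYTES_PER_TRANSFER / 1000


def bandwidth_per_core_gb_s(cores: int, channels: int = 8, mt_s: int = 3200) -> float:
    """Share of the socket's peak bandwidth available to each core."""
    return socket_bandwidth_gb_s(channels, mt_s) / cores


if __name__ == "__main__":
    print(f"per socket: {socket_bandwidth_gb_s():.1f} GB/s")  # 204.8 GB/s
    for cores in (16, 32, 64):
        # 16 -> 12.8, 32 -> 6.4, 64 -> 3.2 GB/s per core
        print(f"{cores:2d} cores: {bandwidth_per_core_gb_s(cores):.1f} GB/s per core")
```

Quadrupling the core count at the same memory configuration cuts each core's share of bandwidth to a quarter, which is why high-core-count selections deserve a closer look at memory-bound workloads.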
With EPYC Rome, AMD has brought a tremendous increase in L3 cache size (up to 256MB per CPU). This is expected to offer significant benefits for demanding applications.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__CPU_L3_Cache_Size.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU L3 cache sizes" width="750" height="660" class="aligncenter size-full wp-image-11774" />
Although dual-socket systems continue to be the most common for HPC & AI workloads, the capabilities of a single EPYC CPU can be compelling. A variety of single-socket EPYC systems are available, and AMD has built several CPU SKUs which support only single-socket operation. The plot below compares the various CPU socket counts supported by this processor line-up.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__CPU_Supported_Socket_Count.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU supported socket counts" width="750" height="660" class="aligncenter size-full wp-image-11775" />
As the industry pushes for increased performance, CPU power requirements have increased across the board. The lowest-wattage EPYC "Rome" CPU is 120 Watts, with the majority of models in the 155W to 180W range. The highest core count models exceed 200 Watts. HPC users must be certain that the systems they select have received thorough thermal validation: systems that run hot will throttle CPU speeds and suffer lower performance.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__CPU_TDP_Wattage.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU TDP wattage requirements" width="750" height="660" class="aligncenter size-full wp-image-11776" />
The plots below compare the cost-effectiveness and power efficiency of these CPU models. The intent is to go beyond the raw "speeds and feeds" of the processors to determine which models will be most attractive for HPC and Deep Learning/AI deployments.
Historically, we have looked at the performance of each CPU and compared that with its cost. However, this presents a distorted view, as it does not include all the other necessary components in an HPC/AI system (the server, system memory, high-speed fabric, etc.). That simpler comparison is shown further below, but first examine this plot, which shows the cost per FLOPS for a set of complete compute nodes with AMD EPYC Rome CPUs, 4GB of system memory per core, and 100Gbps EDR InfiniBand. As shown, the models with higher core counts tend to be the most cost-effective overall.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__Cost-Effectiveness__Including-Server.png" alt="Chart comparing the cost-effectiveness of AMD EPYC &quot;Rome&quot; CPUs (including the price of server and memory)" width="750" height="660" class="aligncenter size-full wp-image-12059" />
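A node-level cost-per-FLOPS metric can be sketched as total node price divided by node peak performance. This is an illustration of the idea, not the method used to build the plot above: every price in the example (chassis, memory per GB, fabric adapter, CPUs) is a hypothetical placeholder rather than vendor pricing, the `NodeConfig` type and `node_cost_per_gflops` function are hypothetical names, and peak performance assumes 16 FP64 FLOPs per cycle per core at the base clock.

```python
# Minimal sketch of a node-level cost-per-GFLOPS comparison.
# All prices below are hypothetical placeholders, not vendor pricing.
from dataclasses import dataclass

FP64_FLOPS_PER_CYCLE = 16  # assumption: two 256-bit FMA pipes per Zen 2 core
GB_PER_CORE = 4            # memory sizing used for the comparison in this article


@dataclass
class NodeConfig:
    cores_per_cpu: int
    base_ghz: float
    cpu_price: float  # hypothetical USD per CPU
    sockets: int = 2


def node_cost_per_gflops(cfg: NodeConfig, chassis_price: float,
                         memory_price_per_gb: float, fabric_price: float) -> float:
    """Total node price divided by its base-clock peak FP64 GFLOPS."""
    cores = cfg.sockets * cfg.cores_per_cpu
    memory_cost = cores * GB_PER_CORE * memory_price_per_gb
    total_price = chassis_price + cfg.sockets * cfg.cpu_price + memory_cost + fabric_price
    peak_gflops = cores * cfg.base_ghz * FP64_FLOPS_PER_CYCLE
    return total_price / peak_gflops


if __name__ == "__main__":
    # Hypothetical fixed costs shared by every configuration.
    fixed = dict(chassis_price=6000.0, memory_price_per_gb=5.0, fabric_price=1000.0)
    for cfg in (NodeConfig(16, 3.0, 1500.0), NodeConfig(32, 2.5, 3000.0)):
        cost = node_cost_per_gflops(cfg, **fixed)
        print(f"{cfg.cores_per_cpu}-core CPUs: ${cost:.2f} per GFLOPS")
```

With these placeholder numbers, the fixed chassis and fabric costs are spread across twice as many cores in the 32-core configuration, so it comes out cheaper per GFLOPS even at a lower clock. Substitute real quotes before drawing conclusions.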
The plot below shows a simple comparison of each CPU's price versus the number of FLOPS it provides. This may be useful when comparing to older CPU models, but for new projects we recommend the plot above.
This plot compares the power requirements (TDP) of each CPU against its performance throughput. Although this generation includes some of the highest-wattage CPUs to date, each is quite power-efficient. In fact, even the 225-Watt models are among the most efficient in this product line.
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__Power-Efficiency.png" alt="Chart comparing the AMD EPYC &quot;Rome&quot; CPU power efficiency" width="750" height="660" class="aligncenter size-full wp-image-11778" />
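Power efficiency here is peak throughput divided by TDP. The sketch below assumes 16 FP64 FLOPs per cycle per Zen 2 core at the base clock and treats TDP as a stand-in for power draw (TDP is a rating, not a measurement). The function name and the two example configurations are hypothetical.

```python
# Minimal sketch: FP64 power efficiency as base-clock peak GFLOPS per TDP watt.
# Assumptions: 16 FP64 FLOPs per cycle per Zen 2 core; TDP is used as a proxy
# for power draw, although it is a rating rather than a measured value.
FP64_FLOPS_PER_CYCLE = 16


def gflops_per_watt(cores: int, base_ghz: float, tdp_watts: float) -> float:
    """Base-clock peak FP64 GFLOPS divided by the CPU's TDP in watts."""
    return cores * base_ghz * FP64_FLOPS_PER_CYCLE / tdp_watts


if __name__ == "__main__":
    # Hypothetical inputs: a 225 W 64-core CPU and a 180 W 32-core CPU.
    print(f"{gflops_per_watt(64, 2.25, 225):.2f} GFLOPS/W")  # 10.24 GFLOPS/W
    print(f"{gflops_per_watt(32, 2.5, 180):.2f} GFLOPS/W")   # 7.11 GFLOPS/W
```

The higher-wattage CPU comes out ahead in this example because its core count grows faster than its TDP.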
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__2.5Ghz_CPU_Number_of_Cores.png" alt="Comparison chart of AMD EPYC &quot;Rome&quot; 2.5+GHz CPU core counts" width="750" height="660" class="aligncenter size-full wp-image-11779" />
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__2.5GHz_Theoretical_Peak_Performance_FLOPS.png" alt="Chart comparing the theoretical peak performance (FLOPS) of AMD EPYC &quot;Rome&quot; 2.5+GHz CPUs" width="750" height="660" class="aligncenter size-full wp-image-11780" />
As discussed above, this plot shows the price per FLOPS for complete compute nodes with 4GB of system memory per CPU core (which is towards the lower end of what many HPC sites currently deploy).
<img src="https://www.microway.com/wp-content/uploads/AMD-EPYC_Rome__2.5GHz_Cost-Effectiveness__Including-Server.png" alt="Chart comparing the cost-effectiveness of 2.5+GHz AMD EPYC &quot;Rome&quot; CPUs (including the price of server and memory)" width="750" height="660" class="aligncenter size-full wp-image-12060" />
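Sizing memory at 4GB per core interacts with how the memory channels are populated. The sketch below assumes eight memory channels per socket, one identical DIMM per channel, and a hypothetical list of available DIMM capacities. It picks the smallest DIMM size that meets the per-core target while keeping every channel populated. The function `balanced_memory_config` is a hypothetical helper, not a vendor configurator.

```python
# Minimal sketch: sizing memory at 4 GB per core while populating every channel.
# Assumptions: eight memory channels per socket, one DIMM per channel, and the
# DIMM capacities listed below (a hypothetical choice of available sizes).
CHANNELS_PER_SOCKET = 8
DIMM_SIZES_GB = (8, 16, 32, 64, 128)


def balanced_memory_config(cores_per_cpu: int, sockets: int = 2,
                           gb_per_core: int = 4) -> tuple[int, int, int]:
    """Return (DIMM count, GB per DIMM, total GB) for the smallest balanced config."""
    required_gb = cores_per_cpu * sockets * gb_per_core
    dimm_count = CHANNELS_PER_SOCKET * sockets
    for size_gb in DIMM_SIZES_GB:
        if size_gb * dimm_count >= required_gb:
            return dimm_count, size_gb, size_gb * dimm_count
    raise ValueError("requirement exceeds one-DIMM-per-channel capacity")


if __name__ == "__main__":
    for cores in (24, 32, 64):
        dimms, size, total = balanced_memory_config(cores)
        # 24 -> 16 x 16 GB = 256 GB; 32 -> 16 x 16 GB = 256 GB; 64 -> 16 x 32 GB = 512 GB
        print(f"{cores} cores/CPU: {dimms} x {size} GB = {total} GB")
```

Rounding up to a full set of identical DIMMs keeps every channel in use, which matters for memory bandwidth; for some core counts (24 cores per CPU in this example) it lands above the 4GB-per-core target.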