Detailed Specifications of the AMD EPYC “Milan” CPUs

This article provides an in-depth discussion and analysis of the 7nm AMD EPYC processor (codenamed “Milan” and based on AMD’s Zen3 architecture). EPYC “Milan” processors replace the previous “Rome” processors and are available for sale as of March 15th, 2021.

These new CPUs are the third iteration of AMD’s EPYC server processor family. They are compatible with existing workstation and server platforms that supported “Rome”, but include new performance and security improvements. If you’re looking to upgrade to or deploy these new CPUs, please speak with one of our experts to learn more.

Important features/changes in EPYC “Milan” CPUs include:

  • Up to 64 processor cores per socket (with options for 8, 16, 24, 28, 32, 48, and 56 cores)
  • Improved CPU clock speeds up to 3.7GHz (with Max Boost speeds up to 4.1GHz)
  • Unified 32MB L3 cache shared between each set of 8 cores (instead of two separate 16MB caches)
  • Increase in instructions completed per clock cycle (IPC)
  • IOMMU for improved IO performance in virtualized environments
  • The security/memory encryption features present in “Rome”, along with SEV-SNP support (protecting against malicious hypervisors)
  • Plus all the advantages of the previous “Rome” generation:
    • Full support for 256-bit AVX2 instructions with two 256-bit FMA units per CPU core
    • Up to 16 double-precision FLOPS per cycle per core
    • Eight-channel memory controller on each CPU
    • Support for DDR4 memory speeds up to 3200MHz
    • Up to 4TB memory per CPU socket
    • Up to 256MB L3 cache per CPU
    • Support for PCI-Express generation 4.0 (which doubles the throughput of gen 3.0)
    • 128 lanes of PCI-Express 4.0 per CPU socket

With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC & AI applications.

Before diving into the details, it helps to keep in mind the following recommendations. Based on our experience with HPC and Deep Learning deployments, our general guidance for selecting among the EPYC options is shown below. Note that certain applications may deviate from this general advice (e.g., software which benefits from particularly high clock speeds or larger L3 cache per core).

  • 8-core EPYC CPUs – not recommended for HPC
    While perfect for particular applications, these models are not as cost-effective as many of the higher core count options.
  • 16-core to 28-core EPYC CPUs – suitable for most HPC workloads
    While not typically offering the best cost-effectiveness, they provide excellent performance at lower price points.
  • 32-core EPYC CPUs – excellent for HPC workloads
    These models offer excellent price/performance along with higher clock speeds and core counts.
  • 48-core to 64-core EPYC CPUs – suitable for certain HPC workloads
    Although these models with high core counts may provide the highest cost-effectiveness and power efficiency, some applications exhibit diminishing returns at the highest core counts. Scalable applications that are not memory bandwidth bound will benefit the most from these EPYC CPUs.

Microway provides a Test Drive cluster to assist in evaluating and comparing products as users determine the ideal specifications for their new HPC & AI deployments. We would be happy to help you evaluate AMD EPYC processors as you plan your next deployment.

AMD EPYC “Milan” Computational Performance

This latest iteration of EPYC CPUs offers excellent performance. However, many of the on-paper comparisons between this generation and the previous generation do not demonstrate large gains. Application benchmarking will be needed to demonstrate many of the gains (such as those provided by the larger/unified L3 cache and the IPC improvements). That being said, most models in this generation provide at least 1 TFLOPS (one trillion double-precision 64-bit floating-point operations per second) and the 64-core CPUs provide over 2 TFLOPS. The plot below shows the expected performance across this new CPU line-up:
Chart comparing the AMD EPYC 'Milan' CPU theoretical GFLOPS performance with AVX2 instructions

In the chart above, shaded/colored bars indicate the expected performance ranges for each CPU model on traditional HPC applications that use double-precision 64-bit math operations. Peak performance numbers are achieved when executing 256-bit AVX2 instructions with FMA. Note that only a small set of applications are able to use exclusively AVX2 FMA instructions (e.g., LINPACK). Most applications issue a variety of instructions and will achieve lower than the peak FLOPS values shown above. Applications which have not been re-compiled in recent years (with a compiler supporting AVX2 instructions) would achieve lower performance.

The dotted lines above each bar indicate the possible peak performance were all CPU cores operating at boosted clock speeds. While theoretically possible for short amounts of time, sustained performance at these increased CPU frequencies is not expected. Sections of code with dense, vectorized instructions are very demanding, and typically result in each core slightly lowering clock speeds (a behavior not unique to AMD CPUs). While AMD has not published specific clock speed expectations for such codes, Microway expects the EPYC “Milan” CPUs to operate near their standard/published clock speed values when all cores are in use.
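The theoretical peak figures discussed above follow from simple arithmetic: cores × clock speed × FLOPS per cycle. The sketch below is a minimal illustration of that calculation, not Microway's or AMD's own tooling. The function names are hypothetical, and the clock speeds in the example (2.0 GHz base, 3.5 GHz boost) are illustrative inputs rather than the specifications of any particular SKU.

```python
def flops_per_cycle(fma_units=2, vector_bits=256, element_bits=64):
    """Double-precision FLOPS per cycle per core.

    Two 256-bit FMA units each process 4 doubles (256 / 64),
    and a fused multiply-add counts as 2 floating-point operations.
    """
    lanes = vector_bits // element_bits
    return fma_units * lanes * 2


def peak_gflops(cores, clock_ghz):
    """Theoretical peak GFLOPS for one socket at the given clock speed."""
    return cores * clock_ghz * flops_per_cycle()


if __name__ == "__main__":
    # Illustrative (hypothetical) clock speeds, not a specific CPU model.
    cores, base_ghz, boost_ghz = 64, 2.0, 3.5
    print(f"FLOPS per cycle per core: {flops_per_cycle()}")
    print(f"Peak at base clock:  {peak_gflops(cores, base_ghz):.0f} GFLOPS")
    print(f"Peak at boost clock: {peak_gflops(cores, boost_ghz):.0f} GFLOPS")
```

The base-clock result corresponds to the shaded bars in the chart, and the boost-clock result to the dotted lines. As noted above, sustained all-core performance is expected to sit near the base-clock figure.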

Throughout this article, the CPU models are sorted largely by price. The lowest-performance models provide fewer CPU cores and less L3 cache memory. Higher-end models offer high core counts for increased performance. HPC and AI groups are generally expected to favor the processor models in the middle of the pack, as the highest core count CPUs are priced at a premium.

Note that those models which only support single-CPU installations are separated on the left side of each plot.

AMD “Milan” EPYC Processor Specifications

The tabs below compare the features and specifications of this third iteration of the EPYC processors. Note that CPU models ending with a P suffix are designed for single-socket systems (and do not operate in dual-socket systems). All other CPU models are compatible with both single- and dual-socket systems. The P-series EPYC processors tend to be priced lower and can thus be quite cost-effective for single-socket deployments.

Core Count

This EPYC generation offers a broad selection of core counts. Most teams running computational apps will find that processors with 24 to 32 CPU cores are quite cost-effective, though the 64-core models have also been quite popular.
Chart comparing the CPU core counts of the AMD EPYC "Milan" processors

CPU Clock Speed

AMD continues to increase the CPU frequencies of the EPYC CPUs. This “Milan” generation offers multiple SKUs operating above 3GHz. Additionally, the boost clock frequencies are considerably increased, with many nearing or exceeding 4GHz. Each CPU core supports “boost” speeds, which temporarily raise its frequency above the base clock speed. The maximum Boost speed for each CPU model is shown as a dotted line.
Chart comparing the clock speeds of the AMD EPYC "Milan" processors

Memory Speed

All EPYC “Milan” processors offer exceptional memory bandwidth, with all models supporting 3200MHz DDR4 memory. The amount of memory throughput available to each CPU core is an important consideration for some applications. Because every model supports the same memory speed, per-core throughput is simply a function of the number of cores: the more cores, the smaller each core’s share. Teams deploying CPUs with higher core counts need to ensure each core won’t be starved of data.
Chart comparing the supported memory speeds of each AMD EPYC "Milan" CPU
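To make the per-core bandwidth trade-off concrete, the following sketch estimates theoretical peak memory bandwidth per socket and divides it across the cores. It assumes all eight memory channels are populated at DDR4-3200 (3200 mega-transfers per second) and that each channel transfers 8 bytes (64 bits) at a time, which is the standard DDR4 channel width. The function names are hypothetical, and real applications will measure less than these theoretical values.

```python
def peak_memory_bandwidth_gbs(channels=8, transfers_mts=3200, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket, in GB/s.

    Assumes every channel is populated and running at full speed.
    """
    return channels * transfers_mts * bytes_per_transfer / 1000


def bandwidth_per_core_gbs(cores):
    """Each core's share of the socket's theoretical peak bandwidth, in GB/s."""
    return peak_memory_bandwidth_gbs() / cores


if __name__ == "__main__":
    print(f"Per socket: {peak_memory_bandwidth_gbs():.1f} GB/s")
    # Core counts taken from the options listed in this article.
    for cores in (8, 16, 24, 28, 32, 48, 56, 64):
        print(f"{cores:>2} cores: {bandwidth_per_core_gbs(cores):.2f} GB/s per core")
```

Moving from 32 to 64 cores halves each core's share of the memory bandwidth, which is why memory-bound applications may see diminishing returns on the highest core count models.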

L3 Cache Size

AMD continues to offer large L3 cache sizes, and the EPYC “Milan” CPUs maintain that lead. More than half of the models provide 256MB total L3 cache, with most of the rest providing 128MB L3 cache. This generation allows each CPU core access to up to 32MB of L3 (up from 16MB in the previous generation).
Chart comparing the L3 cache size of AMD EPYC "Milan" CPUs
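For software that benefits from a larger L3 cache per core, a quick way to compare options is to divide the total L3 by the core count. The sketch below does only that. The function name is hypothetical, and the core/cache combinations in the example are illustrative configurations rather than a list of actual SKUs.

```python
def l3_per_core_mb(total_l3_mb, cores):
    """Average L3 cache per core, in MB."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return total_l3_mb / cores


if __name__ == "__main__":
    # Illustrative (cores, total L3 in MB) pairs; check the actual
    # specifications of each CPU model before drawing conclusions.
    for cores, l3_mb in ((64, 256), (32, 256), (32, 128), (16, 128)):
        print(f"{cores:>2} cores, {l3_mb} MB L3: "
              f"{l3_per_core_mb(l3_mb, cores):.1f} MB per core")
```

This average understates what a single thread can use: as described above, a core can access up to the full 32MB of L3 shared within its group of cores.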

Power Usage (TDP)

The industry continues to push for increased performance, and power control within CPUs continues to advance. With “Milan” EPYC CPUs, each model ships with a default power consumption setting. However, each can be adjusted in the BIOS to set a new configurable TDP (cTDP), which might be higher or lower than the default. The default TDP of each model is shown in the plot below. Note the lowest default wattage for EPYC “Milan” is 155 Watts, with the majority of models in the 200W to 240W range, and two models at 280 Watts. Demanding computational users must be certain that the systems they select have received thorough thermal validation. Systems not designed for these wattages will run hot, throttle CPU speeds, and provide lower performance.
Chart comparing the TDP wattage of AMD EPYC "Milan" processors

Editor’s note: complete pricing was not available at time of publication, so additional analysis of price and cost-effectiveness of each CPU SKU will be added to this article when available.
