These new CPUs are the third iteration of AMD’s EPYC server processor family. They are compatible with existing workstation and server platforms that supported “Rome”, but include new performance and security improvements. If you’re looking to upgrade to or deploy these new CPUs, please speak with one of our experts to learn more.
Important features/changes in EPYC “Milan” CPUs include:
- Up to 64 processor cores per socket (with options for 8-, 16-, 24-, 28-, 32-, 48-, and 56-cores)
- Improved CPU clock speeds up to 3.7GHz (with Max Boost speeds up to 4.1GHz)
- Unified 32MB L3 cache shared between each set of 8 cores (instead of two separate 16MB caches)
- Increase in instructions completed per clock cycle (IPC)
- IOMMU for improved IO performance in virtualized environments
- The security/memory encryption features present in “Rome”, along with SEV-SNP support (protecting against malicious hypervisors)
- Plus all the advantages of the previous “Rome” generation:
  - Full support for 256-bit AVX2 instructions with two 256-bit FMA units per CPU core
  - Up to 16 double-precision FLOPS per cycle per core
  - Eight-channel memory controller on each CPU
  - Support for DDR4 memory speeds up to 3200MHz
  - Up to 4TB memory per CPU socket
  - Up to 256MB L3 cache per CPU
  - Support for PCI-Express generation 4.0 (which doubles the throughput of gen 3.0)
  - 128 lanes of PCI-Express 4.0 per CPU socket
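As a rough illustration of what the memory and PCI-Express figures above imply, the sketch below computes theoretical per-socket bandwidth. The 8-byte (64-bit) DDR4 channel width and the PCI-Express signaling rates (8 GT/s per lane for gen 3.0, 16 GT/s for gen 4.0, both with 128b/130b encoding) are standard values assumed here rather than figures from this article, and the function names are purely illustrative. Real-world throughput will be lower than these theoretical peaks.

```python
# Theoretical per-socket bandwidth derived from the feature list above.
# Assumptions (standard values, not taken from this article):
#   - each DDR4 channel transfers 8 bytes (64 bits) per transfer
#   - PCIe 3.0 signals at 8 GT/s per lane, PCIe 4.0 at 16 GT/s per lane
#   - both PCIe generations use 128b/130b encoding

MEMORY_CHANNELS = 8            # eight-channel memory controller
DDR4_MT_PER_SEC = 3200         # DDR4-3200 (mega-transfers per second)
BYTES_PER_TRANSFER = 8         # assumed 64-bit channel width

PCIE_LANES = 128               # lanes per CPU socket
PCIE_ENCODING = 128 / 130      # assumed 128b/130b line encoding


def memory_bandwidth_gb_s(channels=MEMORY_CHANNELS,
                          mt_per_sec=DDR4_MT_PER_SEC,
                          bytes_per_transfer=BYTES_PER_TRANSFER):
    """Theoretical peak memory bandwidth in GB/s for one socket."""
    return channels * mt_per_sec * bytes_per_transfer / 1000.0


def pcie_bandwidth_gb_s(gt_per_sec, lanes=PCIE_LANES, encoding=PCIE_ENCODING):
    """Theoretical peak PCIe bandwidth in GB/s, per direction, for one socket."""
    return lanes * gt_per_sec * encoding / 8.0  # 8 bits per byte


print(f"Memory:   {memory_bandwidth_gb_s():.1f} GB/s")
print(f"PCIe 3.0: {pcie_bandwidth_gb_s(8.0):.1f} GB/s per direction")
print(f"PCIe 4.0: {pcie_bandwidth_gb_s(16.0):.1f} GB/s per direction")
```

Under these assumptions, a single socket has roughly 205 GB/s of theoretical memory bandwidth and roughly 252 GB/s of PCI-Express 4.0 bandwidth in each direction across all 128 lanes.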
With a product this complex, it’s very difficult to cover every aspect of the design. Here, we concentrate primarily on the performance of the processors for HPC & AI applications.
Before diving into the details, it helps to keep in mind the following recommendations. Based on our experience with HPC and Deep Learning deployments, our general guidance for selecting among the EPYC options is shown below. Note that certain applications may deviate from this general advice (e.g., software which benefits from particularly high clock speeds or larger L3 cache per core).
- 8-core EPYC CPUs – not recommended for HPC
While perfect for particular applications, these models are not as cost-effective as many of the higher core count options.
- 16-core to 28-core EPYC CPUs – suitable for most HPC workloads
While not typically offering the best cost-effectiveness, they provide excellent performance at lower price points.
- 32-core EPYC CPUs – excellent for HPC workloads
These models offer excellent price/performance along with higher clock speeds and core counts.
- 48-core to 64-core EPYC CPUs – suitable for certain HPC workloads
Although these models with high core counts may provide the highest cost-effectiveness and power efficiency, some applications exhibit diminishing returns at the highest core counts. Scalable applications that are not memory bandwidth bound will benefit the most from these EPYC CPUs.
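One way to see why memory-bandwidth-bound applications show diminishing returns at the highest core counts is to divide a socket's theoretical memory bandwidth among its cores. The sketch below assumes the eight-channel DDR4-3200 configuration listed earlier, the standard 8-byte DDR4 channel width, and that all cores share bandwidth equally (a simplification). The helper name is illustrative.

```python
# Theoretical memory bandwidth available to each core when all cores are busy.
# Assumptions: 8 channels of DDR4-3200, 8 bytes per transfer (standard DDR4
# channel width), and an equal share of bandwidth for every core.

MEMORY_CHANNELS = 8
DDR4_MT_PER_SEC = 3200
BYTES_PER_TRANSFER = 8  # assumed 64-bit channel width

SOCKET_BANDWIDTH_GB_S = (
    MEMORY_CHANNELS * DDR4_MT_PER_SEC * BYTES_PER_TRANSFER / 1000.0
)


def bandwidth_per_core_gb_s(cores, socket_bandwidth_gb_s=SOCKET_BANDWIDTH_GB_S):
    """Equal-share memory bandwidth per core, in GB/s."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return socket_bandwidth_gb_s / cores


# Core counts offered in this generation, as listed in the feature summary.
for cores in (8, 16, 24, 28, 32, 48, 56, 64):
    print(f"{cores:2d} cores: {bandwidth_per_core_gb_s(cores):5.1f} GB/s per core")
```

With all 64 cores active, each core's share is one eighth of what a core in an 8-core model could draw, which is why applications limited by memory bandwidth may gain little from the highest core counts.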
Microway provides a Test Drive cluster to assist in evaluating and comparing products as users determine the ideal specifications for their new HPC & AI deployments. We would be happy to help you evaluate AMD EPYC processors as you plan your next deployment.
AMD EPYC “Milan” Computational Performance
This latest iteration of EPYC CPUs offers excellent performance. However, many of the on-paper comparisons between this generation and the previous one do not show large gains. Application benchmarking is needed to demonstrate many of the improvements (such as those provided by the larger, unified L3 cache and the higher IPC). Even so, most models in this generation provide at least 1 TFLOPS (one trillion double-precision 64-bit floating-point operations per second), and the 64-core CPUs provide over 2 TFLOPS. The plot below shows the expected performance across this new CPU line-up:
In the chart above, shaded/colored bars indicate the expected performance ranges for each CPU model on traditional HPC applications that use double-precision 64-bit math operations. Peak performance numbers are achieved when executing 256-bit AVX2 instructions with FMA. Note that only a small set of applications are able to use exclusively AVX2 FMA instructions (e.g., LINPACK). Most applications issue a variety of instructions and will achieve lower than the peak FLOPS values shown above. Applications which have not been re-compiled in recent years (with a compiler supporting AVX2 instructions) would achieve lower performance.
The dotted lines above each bar indicate the possible peak performance were all CPU cores operating at boosted clock speeds. While theoretically possible for short amounts of time, sustained performance at these increased CPU frequencies is not expected. Sections of code with dense, vectorized instructions are very demanding, and typically result in each core slightly lowering clock speeds (a behavior not unique to AMD CPUs). While AMD has not published specific clock speed expectations for such codes, Microway expects the EPYC “Milan” CPUs to operate near their standard/published clock speed values when all cores are in use.
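The peak values discussed here follow from simple arithmetic: core count multiplied by clock speed multiplied by FLOPS per cycle per core (16 for double precision with AVX2 FMA, as listed earlier). The sketch below shows the calculation. The core count and clock speeds in the example are illustrative placeholders, not the specifications of any particular model.

```python
# Theoretical peak double-precision throughput for one CPU socket.
# 16 FLOPS/cycle/core = 2 FMA units x 4 doubles per 256-bit vector
#                       x 2 operations (multiply + add) per FMA.

FLOPS_PER_CYCLE_PER_CORE = 16


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE_PER_CORE):
    """Theoretical peak GFLOPS at a given all-core clock speed (in GHz)."""
    return cores * clock_ghz * flops_per_cycle


# Hypothetical example values (NOT a real SKU's specifications).
example_cores = 64
example_base_ghz = 2.0    # placeholder for the published base clock
example_boost_ghz = 3.5   # placeholder for the maximum boost clock

base_peak = peak_gflops(example_cores, example_base_ghz)
boost_peak = peak_gflops(example_cores, example_boost_ghz)

print(f"Peak at base clock:  {base_peak / 1000:.2f} TFLOPS")
print(f"Peak at boost clock: {boost_peak / 1000:.2f} TFLOPS (not sustained)")
```

By the same formula, any 64-core model running all cores above roughly 1.95 GHz exceeds 2 TFLOPS of theoretical peak. The boost-clock figure corresponds to the dotted lines and should be read as an upper bound rather than an expected sustained value.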
Throughout this article, the CPU models are sorted largely by price. The lowest-performance models provide fewer CPU cores and less L3 cache. Higher-end models offer higher core counts for increased performance. HPC and AI groups are generally expected to favor the processor models in the middle of the pack, as the highest core count CPUs are priced at a premium.
Note that those models which only support single-CPU installations are separated on the left side of each plot.
AMD “Milan” EPYC Processor Specifications
The tabs below compare the features and specifications of this third iteration of the EPYC processors. Note that CPU models ending with a P suffix are designed for single-socket systems and do not operate in dual-socket systems. All other CPU models are compatible with both single- and dual-socket systems. The P-series EPYC processors tend to be priced lower and can thus be quite cost-effective where a single socket is sufficient.
[Tabbed charts: CPU Clock Speed, L3 Cache Size, Power Usage (TDP)]
Editor’s note: complete pricing was not available at time of publication, so additional analysis of price and cost-effectiveness of each CPU SKU will be added to this article when available.