News: 0001504513


NVIDIA GH200 Grace CPU vs. AMD EPYC 9005 Turin CPU Performance



Over the past month since the AMD [1]EPYC 9005 "Turin" launch, I have looked at how well the new [2]EPYC Turin CPUs compete against Intel Xeon, how [3]Turin Dense dominates AmpereOne in performance and power efficiency at 192 cores, and [4]the generational uplift from EPYC Genoa to Turin at the same core counts, among other [5]Turin performance benchmark tests. Up for comparison today is how the NVIDIA Grace CPU within the GH200 Superchip performs against the AMD EPYC Turin processors.

The NVIDIA Grace CPU wasn't previously compared to the AMD EPYC 9005 "Turin" processors because I didn't have the hardware on hand, and the last time I did [6]GH200 benchmarking was in early 2024, before Ubuntu 24.04 LTS was even released, so that data is out of date and not comparable. But [7]GPTshop.ai recently provided me remote access again to one of their NVIDIA GH200 systems, so I've been able to carry out fresh NVIDIA GH200 Grace CPU benchmarks, among other GH200 testing for future articles. Today's article looks at how the GH200's Arm-based Grace CPU performs against the Zen 5 Turin cores within the EPYC 9005 series.

With the EPYC Turin tests being CPU-focused, the GH200 testing was as well, looking at the performance of the Grace CPU with its 72 Arm Neoverse-V2 cores and 480GB of memory. All testing was done on Ubuntu 24.04 LTS, albeit with a patched kernel on the EPYC side to enable Zen 5 CPU power consumption monitoring. The single NVMe storage drive differed slightly between systems since I had only remote access to this GPTshop.ai GH200 system. The default GCC 13.2 compiler was used on both platforms, and Ubuntu 24.04 LTS was running with the "performance" CPU frequency scaling governor. The GH200 system was also running the [8]64K page size kernel for best AArch64 performance potential.
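
For anyone wanting to confirm a similar setup on their own system, here is a minimal sketch that reports the active CPU frequency scaling governor(s) and the kernel page size. It is not the tooling used for this article: the helper names read_governors and page_size_kib are made up for illustration, and it assumes the standard Linux cpufreq sysfs layout is exposed.

```python
import os
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return the set of cpufreq governors in use across all CPUs.

    Assumes the standard Linux cpufreq sysfs layout; returns an empty
    set if cpufreq is not exposed (for example inside some VMs).
    """
    governors = set()
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        governors.add(path.read_text().strip())
    return governors


def page_size_kib():
    """Return the kernel page size in KiB (64 on a 64K page size kernel)."""
    return os.sysconf("SC_PAGE_SIZE") // 1024


if __name__ == "__main__":
    governors = sorted(read_governors())
    print("governors:", governors if governors else "cpufreq not exposed")
    print("page size: %d KiB" % page_size_kib())
```

On a system configured as described above, one would expect to see only "performance" listed as the governor, and 64 KiB as the page size on the GH200 side.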

Because the GH200 is focused on performance rather than dense cloud computing, EPYC Turin Dense parts like the 192-core EPYC 9965 weren't included in this comparison; the focus is instead on the full Turin (classic) cores. The EPYC 9575F 64-core, EPYC 9655 96-core, and EPYC 9755 128-core processors were compared against the 72-core NVIDIA Grace CPU. Since this is a targeted comparison against a single Grace CPU, the EPYC 9575F / 9655 / 9755 CPUs were tested only in their single-socket (1P) configuration.

This article looks at raw CPU performance and also monitors the CPU power consumption of each processor via the power-monitoring interfaces exposed under Linux, so that CPU performance-per-Watt can be shown as well.
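
As a rough illustration of the performance-per-Watt idea, here is a minimal sketch that divides a higher-is-better benchmark score by the mean of the CPU power samples collected during the run. The function name perf_per_watt and the numbers are placeholders for illustration only, not measurements or the exact method used in this testing.

```python
from statistics import mean


def perf_per_watt(score, power_samples_w):
    """Divide a higher-is-better score by the mean CPU power in Watts."""
    if not power_samples_w:
        raise ValueError("need at least one power sample")
    average_power = mean(power_samples_w)
    if average_power <= 0:
        raise ValueError("average power must be positive")
    return score / average_power


if __name__ == "__main__":
    # Placeholder values for illustration only; not results from this article.
    samples_w = [200.0, 210.0, 190.0]
    print(perf_per_watt(1000.0, samples_w))
```

For lower-is-better results such as run time, the score would first need to be converted into a rate (for example runs per hour) before dividing by power.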

Enjoy these results as you wish, keeping in mind the slight hardware/software differences due to the remote GH200 system access, and again that this article focuses just on CPU performance across these servers. The set of benchmarks is also reduced compared to my prior AMD/Intel x86_64 testing because some of the software packages don't work well on AArch64 or at least aren't optimized for it.

Thanks to [9]GPTshop.ai for providing remote access to the GH200. The German company produces high-end desktop systems for AI and HPC using the GH200, and soon GB200 Blackwell too.



[1] https://www.phoronix.com/search/EPYC+9005

[2] https://www.phoronix.com/review/amd-epyc-9965-9755-benchmarks

[3] https://www.phoronix.com/review/amd-epyc-9965-ampereone

[4] https://www.phoronix.com/review/amd-epyc-9655

[5] https://www.phoronix.com/search/Turin

[6] https://www.phoronix.com/search/GH200

[7] https://gptshop.ai/

[8] https://www.phoronix.com/review/aarch64-64k-kernel-perf

[9] https://gptshop.ai/


