News: 0001524341


Bisecting The Linux 6.14 Performance Regression With System76 Thelio + AMD Threadripper

([Linux Kernel] 91 Minutes Ago Power Management Woes)


Yesterday I showcased [1]Linux 6.14 Git performing worse than Linux 6.13 and 6.12 in a number of multi-threaded workloads. That initial discovery was made on the lone AMD EPYC Turin 2P server, which is always busy running new benchmarks for future content, and I am persistently short on time and under constant pressure due to the state of the web/ad industry, so I didn't expect to dig deeper into the problem in the near term. But since I was able to reproduce some of the regressions on a System76 Thelio Major workstation at my desk with the still mighty Ryzen Threadripper 7980X, I was able to turn around a quick bisect.

In the Linux 6.14 Git kernel performance regression noted yesterday on the 256-core / 512-thread Zen 5 server, a wide array of multi-threaded workloads regressed compared to the 6.13 and 6.12 stable kernels. Thanks to System76, I have the [2]Thelio Major Ryzen Threadripper workstation on hand for testing with the [3]Threadripper 7980X 64-core / 128-thread processor and quad-channel DDR5 memory, so I decided to poke at Linux 6.14 there.

Sure enough, I was able to reproduce the performance regressions in some of the same workloads on this Zen 4 Threadripper workstation. And there it's a quick and easy bisect, since that system isn't as in demand for other articles/benchmarking as the Zen 5 hardware.

I used the srsRAN 5G software as a quicker-running test case, as it also showed a significant performance drop on the EPYC 9005 server when running in multi-threaded mode.

The other srsRAN benchmark also reproduced a slowdown on 6.14, though to a lesser extent.
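For anyone wanting to repeat this kind of bisect with their own quick-running benchmark, the per-step decision can be sketched in Python as below. This is only a minimal illustration, not the procedure used for this article: the `classify` helper, the 5% tolerance, and the assumption of a higher-is-better score are all hypothetical choices. The exit codes follow the `git bisect run` convention (0 for good, 1 for bad, 125 for skip), though with kernel bisects each step normally needs a build and reboot, so the verdict would more likely be passed by hand to `git bisect good` or `git bisect bad`.

```python
from typing import Optional

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(baseline: float, measured: Optional[float],
             tolerance: float = 0.05) -> int:
    """Turn a higher-is-better benchmark score into a bisect verdict.

    baseline  -- score from the known-good kernel (hypothetical value)
    measured  -- score from the kernel under test, or None if the
                 kernel failed to build/boot or the benchmark failed
    tolerance -- assumed run-to-run noise; drops within it count as good
    """
    if measured is None or baseline <= 0:
        return SKIP  # cannot judge this commit, let bisect skip it
    drop = (baseline - measured) / baseline
    return BAD if drop > tolerance else GOOD
```

With a baseline of 100 from the good kernel, a result of 99 would be treated as noise and marked good, while a result of 80 would be marked bad. The tolerance should be set from the observed run-to-run variance of the chosen benchmark.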

The bisect pointed to the Linux 6.14 power management updates.

From there it's presumably one of the AMD P-State driver changes that is introducing this performance regression on Linux 6.14... After all, the Linux 6.14 power management pull was dominated by AMD P-State changes and both the Zen 5 server and Zen 4 workstation are using the amd_pstate driver.

As mentioned in yesterday's article, I didn't see this regression last week on a different EPYC Turin 1P server running Linux 6.14 Git. That Supermicro Zen 5 server is still using the ACPI CPUFreq driver, since ACPI CPPC isn't properly supported there for the AMD P-State driver to be used. So it jibes with this bisect if the multi-threaded performance regression is due to an amd_pstate issue.
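To see which CPU frequency scaling driver a given system is using, the standard cpufreq sysfs node can be read. The Python sketch below is an illustration under a few assumptions: the helper names are hypothetical, only `cpu0` is checked, and the match on an `amd-pstate` prefix assumes the driver reports itself with names such as `amd-pstate` or `amd-pstate-epp`, while `acpi-cpufreq` indicates the ACPI CPUFreq driver.

```python
from pathlib import Path
from typing import Optional

# Standard cpufreq sysfs node for the first CPU.
SYSFS_DRIVER = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver")


def scaling_driver(path: Path = SYSFS_DRIVER) -> Optional[str]:
    """Return the active cpufreq driver name, or None if unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        # No cpufreq support exposed, or not running on Linux.
        return None


def uses_amd_pstate(driver: Optional[str]) -> bool:
    """True if the driver name looks like an AMD P-State variant."""
    if driver is None:
        return False
    # Normalize underscores so "amd_pstate" spellings also match.
    return driver.replace("_", "-").startswith("amd-pstate")
```

Calling `uses_amd_pstate(scaling_driver())` on a test system gives a quick indication of whether it falls into the group of systems that may be affected by this regression.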

Anyhow, that's where I am at for the moment. For those beginning to test Linux 6.14 Git with the amd_pstate driver, you may want to pay special attention to multi-threaded workloads for any possible performance regressions... See [4]yesterday's article for more of the benchmarks I found to be regressed. Thanks to [5]System76 and the Thelio Major powered by the AMD Ryzen Threadripper for making this quick round of kernel bisecting possible.



[1] https://www.phoronix.com/review/linux-614-early-regression

[2] https://www.phoronix.com/review/system76-thelio-threadripper-2024

[3] https://www.phoronix.com/review/threadripper-7970x-7980x-linux

[4] https://www.phoronix.com/review/linux-614-early-regression

[5] https://system76.com/


