Bisecting The Linux 6.14 Performance Regression With System76 Thelio + AMD Threadripper

([Linux Kernel] Power Management Woes)


Yesterday I showcased [1]Linux 6.14 Git performing worse than Linux 6.13 and 6.12 in a number of multi-threaded workloads. That initial discovery was on the lone AMD EPYC Turin 2P server, which is always busy running new benchmarks for future content, and I am persistently short on time and under constant pressure due to the state of the web/ad industry, so I didn't expect to dig deeper into the problem in the near term. But since I was able to reproduce some of the regressions on a System76 Thelio Major workstation at my desk with the still mighty powerful Ryzen Threadripper 7980X, I was able to turn around a quick bisect.

With the Linux 6.14 Git kernel performance regression noted yesterday on the 256-core / 512-thread Zen 5 server, a wide array of multi-threaded workloads regressed compared to the 6.13 and 6.12 stable kernels. Thanks to System76 providing the [2]Thelio Major Ryzen Threadripper workstation for testing, with its [3]Threadripper 7980X 64-core / 128-thread processor and quad-channel DDR5 memory, I decided to poke at Linux 6.14 there.

Sure enough, I was able to reproduce performance regressions in some of the same workloads on this Zen 4 Threadripper workstation. And there a bisect is quick and easy, since that system isn't in as much demand for other articles/benchmarking as the Zen 5 hardware.

I used the srsRAN 5G software as a quicker-running test case, since it also showed a significant performance drop on the EPYC 9005 server when running in multi-threaded mode.

The other srsRAN benchmark also reproduced a slowdown on 6.14, to a lesser extent.

The bisect pointed to the Linux 6.14 power management updates.

From there it's presumably one of the AMD P-State driver changes that is introducing this performance regression on Linux 6.14... After all, the Linux 6.14 power management pull was dominated by AMD P-State changes, and both the Zen 5 server and the Zen 4 workstation are using the amd_pstate driver.

As mentioned in yesterday's article, I didn't see this regression last week when running Linux 6.14 Git on a different EPYC 1P Turin server. That Supermicro Zen 5 server is still using the ACPI CPUFreq driver, because ACPI CPPC, which the AMD P-State driver depends on, isn't properly supported there. So it jibes with this bisect if the multi-threaded performance regression stems from an amd_pstate issue.
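For readers wanting to check which of these two drivers a given system is actually running, the cpufreq sysfs interface should tell. Below is a minimal Python sketch, not something used for this bisect: it assumes the standard /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver attribute, treats cpu0 as representative of the whole system, and the helper names are purely hypothetical. As far as I know, amd_pstate reports itself there as amd-pstate or amd-pstate-epp, while the ACPI CPUFreq driver shows up as acpi-cpufreq.

```python
from pathlib import Path
from typing import Optional

# Assumption: the standard cpufreq sysfs attribute; cpu0 taken as representative.
SCALING_DRIVER = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver")


def read_scaling_driver(path: Path = SCALING_DRIVER) -> Optional[str]:
    """Hypothetical helper: return the active cpufreq driver name, or None."""
    try:
        return path.read_text().strip()
    except OSError:
        # Missing file (e.g. a VM without cpufreq) or a permission problem.
        return None


def uses_amd_pstate(driver: Optional[str]) -> bool:
    """Hypothetical helper: True if the name belongs to the amd_pstate family."""
    # Assumption: amd_pstate registers as "amd-pstate" or "amd-pstate-epp".
    return driver is not None and driver.startswith("amd-pstate")


if __name__ == "__main__":
    driver = read_scaling_driver()
    print(f"scaling_driver: {driver}")
    if uses_amd_pstate(driver):
        print("amd_pstate in use: keep an eye on multi-threaded workloads")
    elif driver == "acpi-cpufreq":
        print("ACPI CPUFreq in use")
```

Reading a single sysfs attribute keeps the check dependency-free; running `cat` on the same file from a shell would give the same answer.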

Anyhow, that's where I'm at for the moment. For those beginning to test Linux 6.14 Git with the amd_pstate driver, you may want to pay special attention to multi-threaded workloads for any possible performance regressions... See [4]yesterday's article for more of the benchmarks I found to be regressed. Thanks to [5]System76 and the Thelio Major powered by the AMD Ryzen Threadripper for making a quick round of kernel bisecting possible.



[1] https://www.phoronix.com/review/linux-614-early-regression

[2] https://www.phoronix.com/review/system76-thelio-threadripper-2024

[3] https://www.phoronix.com/review/threadripper-7970x-7980x-linux

[4] https://www.phoronix.com/review/linux-614-early-regression

[5] https://system76.com/


