Benchmarking The AMD INVLPGB Linux Kernel Patches For Better Performance
([Software] 105 Minutes Ago
5 Comments)
- Reference: 0001515546
- News link: https://www.phoronix.com/review/amd-invlpgb-linux
Last weekend a Meta engineer posted [1]Linux kernel patches to make use of the AMD INVLPGB instruction for broadcast TLB invalidation. With INVLPGB, the Linux kernel can invalidate TLB entries on remote CPUs without needing to send IPIs and without having to wait for remote CPUs to handle those interrupts. Synthetic benchmarks shown in that patch series were very promising, so over the holidays I carried out some benchmarking of this AMD INVLPGB support for the Linux kernel.
[2]
In a TLB flushing synthetic test case, the Linux kernel on an AMD EPYC Milan server went from 527k loops/second to 1157k loops/second using these INVLPGB patches plus the "lru_add_drain" patch series. Quite a big win! Thus I was eager to test out these patches myself to see how they would impact real-world workloads.
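As a quick sanity check on those synthetic numbers, the relative gain can be worked out directly from the two figures quoted above. The helper name `speedup` is only illustrative and is not part of the patch series or any benchmark tool.

```python
def speedup(before: float, after: float) -> tuple[float, float]:
    """Return (ratio, percent_gain) for a higher-is-better metric."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    ratio = after / before
    return ratio, (ratio - 1.0) * 100.0


# Figures quoted from the patch series: 527k -> 1157k loops/second.
ratio, gain = speedup(527_000, 1_157_000)
print(f"{ratio:.2f}x the baseline, a {gain:.1f}% gain")
```

That works out to roughly 2.2 times the baseline throughput in that synthetic case, which is far larger than anything to expect from general-purpose workloads that flush the TLB less aggressively.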
As noted in that article last weekend, AMD INVLPGB is present going back to Zen 3 processors, but only now is it being wired up for use by the Linux kernel, thanks to Meta engineer Rik van Riel. It's a bit surprising and unfortunate that AMD had not tackled this functionality previously to benefit recent generations of Ryzen and EPYC processors on Linux.
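For readers wanting to check whether a given CPU advertises the instruction: AMD documents INVLPGB/TLBSYNC support as bit 3 of EBX in CPUID leaf 0x80000008. The sketch below only decodes an EBX value you have already obtained some other way (for example from a CPUID dump utility). The function name and the sample register values are hypothetical, and this is not how the kernel patches themselves perform the check.

```python
INVLPGB_LEAF = 0x8000_0008   # extended CPUID leaf that carries the feature bit
INVLPGB_EBX_BIT = 3          # EBX[3]: INVLPGB and TLBSYNC supported


def has_invlpgb(ebx: int) -> bool:
    """Decode the INVLPGB/TLBSYNC bit from an EBX value read from the leaf above."""
    return bool((ebx >> INVLPGB_EBX_BIT) & 1)


# Hypothetical register values, for illustration only (not read from real hardware).
print(has_invlpgb(0b1000))  # bit 3 set
print(has_invlpgb(0b0111))  # bit 3 clear
```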
On the AMD EPYC 9655 + Supermicro H13SSL-N server build I ran benchmarks using the upstream Linux 6.13 Git kernel as of 24 December and then rebuilt the Linux kernel with the AMD INVLPGB v2 patch series and lru_add_drain patches applied. Some minor changes were needed for the INVLPGB v2 patch series to apply against the Linux 6.13 upstream Git state, but they were all trivial adjustments to the code. From there I carried out a range of multi-threaded benchmarks to look for real-world workloads where the AMD INVLPGB support was helping out...
The workloads showing a measurable benefit from these AMD INVLPGB patches included the Xmrig miner, various OpenJDK Java workloads from DaCapo Benchmark to Apache Cassandra, MariaDB / MySQL, Apache IoTDB, the Nginx HTTPS web server, Whisper.cpp, and various other multi-threaded workloads. The gains were up to a few percent from this patch series in the testing done over the holidays on this AMD EPYC Zen 5 server. Of course, these patches can benefit AMD Linux systems from Zen 3 and Zen 4 generations too. There was possibly a minor performance regression in the ClickHouse database, but it was a ~2% difference and may just come down to noise. Various other workloads were tested as well but without any statistically significant difference. Additional testing on more systems and more benchmarks will hopefully reveal any other limitations, but overall these AMD INVLPGB Linux patches are looking promising.
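Whether a ~2% delta is noise depends on the run-to-run variation. One simple way to judge that, sketched here under assumptions, is to compare the difference in means against the standard error of that difference (a Welch-style t statistic). The sample values below are made up purely for illustration and are not results from this testing; the function name and the cutoff of 2.0 are likewise arbitrary choices for the sketch.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two independent sets of benchmark runs.

    Needs at least two runs per set and some variation between runs.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical runs (NOT measured data): baseline vs. patched kernel.
baseline = [100.0, 103.0, 97.0, 101.0, 99.0]
patched = [98.0, 101.0, 96.0, 100.0, 95.0]

t = welch_t(baseline, patched)
pct = (mean(patched) / mean(baseline) - 1.0) * 100.0
verdict = "likely noise" if abs(t) < 2.0 else "worth a closer look"
print(f"{pct:+.1f}% change, t = {t:.2f}: {verdict}")
```

With only a handful of runs per kernel, a 2% shift in the mean can easily sit inside the spread of the individual runs, which is why a small apparent regression like this is hard to call either way.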
I will have more AMD INVLPGB Linux benchmarks in the new year when I have more time for testing and for trying out the patched kernel on a wider range of hardware. In any event it's nice seeing these gains, which benefit hardware going back to Zen 3, albeit a bit surprising that the Linux kernel wasn't pursuing AMD INVLPGB for broadcast TLB invalidation until now. Hopefully these patches from Meta will manage to be upstreamed to the Linux kernel in a coming kernel cycle. If you appreciate all of the relentless Linux benchmarking I do each and every day, please consider showing your end of year support by [3]joining Phoronix Premium to enjoy the site ad-free, multi-page articles on a single page, and additional benefits. [4]PayPal and [5]Stripe tips also remain supported and are much appreciated to help with the difficult state of the web ad industry and ad block usage, among other challenges. Here's to hopefully much more benchmarking in 2025.
[1] https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits
[2] https://www.phoronix.com/image-viewer.php?id=amd-invlpgb-linux&image=amd_invlpgb_1_lrg
[3] https://www.phoronix.com/phoronix-premium
[4] https://www.paypal.com/donate/?hosted_button_id=EA79CCDLNFJNW
[5] https://buy.stripe.com/28o02d1yG1Lp8H67ss