Intel Spots A 3888.9% Performance Improvement In The Linux Kernel From One Line Of Code
- Reference: 0001504597
- News link: https://www.phoronix.com/news/Intel-Linux-3888.9-Performance
Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week.
The Intel kernel test robot reported [1] the 3888.9% improvement in its "will-it-scale.per_process_ops" scalability test case, run on an Intel Xeon Platinum (Cooper Lake) test server. Intel thankfully has the resources to maintain this automated service, which tests individual kernel commits and patches, and has kept its public kernel test robot running for years to help catch both positive and negative performance changes in the Linux kernel code.
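To put that percentage in perspective, here is a minimal sketch of the arithmetic. It assumes the robot's figure is a relative change against the parent commit, i.e. (new - old) / old; the function name is purely illustrative and not part of any Intel tooling.

```python
def speedup_factor(percent_improvement: float) -> float:
    """Convert an 'X% improvement' figure into a new/old throughput ratio.

    Assumes the percentage is a relative change: (new - old) / old * 100.
    """
    return 1.0 + percent_improvement / 100.0


# A 3888.9% improvement works out to roughly 39.9 times the previous result.
factor = speedup_factor(3888.9)
print(f"{factor:.3f}x")
```

Under that assumption, the test case's per-process operations figure is close to 40 times what it was before the commit.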
The commit responsible for this massive uplift is "mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes" [2]. The patch message explains that it fixes some prior performance regressions and delivers a major uplift in specialized cases.
"Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page.
However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented to multiple areas each aligned to PMD boundary with gaps between. The regression then seems to be caused mainly due to the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas.
Another known regression bisected to commit efa7df3e3bb5 is darktable and early testing suggests this patch fixes the regression there as well.
To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mapping like the cactusBSSN creates, those will stop being aligned and with gaps between, and instead naturally merge again."
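The difference between the two conditions described in the patch message can be modeled in a few lines. This is only an illustrative sketch, not the kernel's C code: it assumes a 2 MiB PMD size (as on x86-64 with 4 KiB pages), reads the "4632kB" figure as 4632 KiB, and all function names are made up for this example.

```python
KIB = 1024
PMD_SIZE = 2 * 1024 * KIB  # assumption: 2 MiB PMD, as on x86-64 with 4 KiB pages


def thp_aligned_before(length: int) -> bool:
    """Rule since efa7df3e3bb5: align any mapping of at least PMD_SIZE."""
    return length >= PMD_SIZE


def thp_aligned_after(length: int) -> bool:
    """Rule after the fix: align only sizes that are a multiple of PMD_SIZE."""
    return length >= PMD_SIZE and length % PMD_SIZE == 0


def gap_when_aligned(length: int) -> int:
    """Bytes left between the end of a PMD-aligned mapping and the next PMD boundary."""
    return -length % PMD_SIZE


cactus = 4632 * KIB  # mapping size cited for cactusBSSN in the patch message
print(thp_aligned_before(cactus))         # True: 4632 KiB is at least 2 MiB
print(thp_aligned_after(cactus))          # False: 4632 KiB is not a multiple of 2 MiB
print(gap_when_aligned(cactus) // KIB)    # 1512 (KiB up to the next 2 MiB boundary)
print(thp_aligned_after(4 * PMD_SIZE))    # True: an 8 MiB mapping is still aligned
```

With these assumptions, every aligned 4632 KiB mapping would leave at least 1512 KiB before the next aligned mapping could start, which matches the "gaps between" the patch message describes. Once such odd-sized mappings are no longer aligned, they can sit back to back and merge again.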
That mmap patch, merged last week, changes just one line of code. The memory management patch it cites as introducing the regressions has been upstream in the mainline Linux kernel since December 2023.
I'll be firing up some benchmarks on my side to look for any other real-world workloads seeing any measurable shift in performance with this latest Linux kernel code beyond the smaller synthetic test cases.
[1] https://lore.kernel.org/lkml/202411072132.a8d2cf0f-oliver.sang@intel.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4148aeab412432bf928f311eca8a2ba52bb05df