Linux 7.0 Speeds Up Reclaiming File-Backed Large Folios By 50~75%
- Reference: 0001614551
- News link: https://www.phoronix.com/news/Linux-7.0-Faster-Large-Folios
The patches to support batched checking of references and unmapping for large folios are showing very nice performance numbers for reclaiming file-backed large folios. This work was carried out by Alibaba engineer Baolin Wang. He explained in [1]the patch series:
"Currently, folio_referenced_one() always checks the young flag for each PTE sequentially, which is inefficient for large folios. This inefficiency is especially noticeable when reclaiming clean file-backed large folios, where folio_referenced() is observed as a significant performance hotspot.
Moreover, on Arm architecture, which supports contiguous PTEs, there is already an optimization to clear the young flags for PTEs within a contiguous range. However, this is not sufficient. We can extend this to perform batched operations for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE)."
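To illustrate the idea behind the optimization, here is a minimal userspace sketch (not kernel code) that models each PTE's "young" flag as one bit in a bitmap and contrasts a per-PTE walk with a batched, word-at-a-time check over an entire large folio. All names here (pte_young_sequential, pte_young_batched, NR_PTES) are hypothetical and only stand in for the real kernel helpers the patches touch.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_PTES       512        /* e.g. a 2MB folio made of 4KB pages */
#define BITS_PER_WORD 64

static uint64_t young_bits[NR_PTES / BITS_PER_WORD];

/* Per-PTE walk: test and clear each young bit individually. */
static bool pte_young_sequential(void)
{
    bool referenced = false;

    for (int i = 0; i < NR_PTES; i++) {
        uint64_t mask = 1ULL << (i % BITS_PER_WORD);
        uint64_t *word = &young_bits[i / BITS_PER_WORD];

        if (*word & mask) {
            referenced = true;
            *word &= ~mask;      /* clear this PTE's young flag */
        }
    }
    return referenced;
}

/* Batched walk: test and clear 64 young flags per operation. */
static bool pte_young_batched(void)
{
    bool referenced = false;

    for (size_t w = 0; w < NR_PTES / BITS_PER_WORD; w++) {
        if (young_bits[w]) {
            referenced = true;
            young_bits[w] = 0;   /* clear a whole batch at once */
        }
    }
    return referenced;
}

int main(void)
{
    memset(young_bits, 0, sizeof(young_bits));

    young_bits[3] = 1ULL << 17;  /* pretend one PTE was accessed */
    printf("sequential referenced: %d\n", pte_young_sequential());

    young_bits[3] = 1ULL << 17;
    printf("batched referenced:    %d\n", pte_young_batched());
    return 0;
}

The batched version does far fewer memory operations for the same answer, which is roughly why folio_referenced() stops showing up as a hotspot once the whole large folio can be handled in one pass.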
The patch series concludes with batched unmapping for file-backed large folios, and that is where the numbers come out and are quite enticing:
"Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface. I can observe 75% performance improvement on my Arm64 32-core server (and 50%+ improvement on my X86 machine) with this patch."
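For those wanting a feel for the test scenario, below is a hedged C sketch of it: map a large file read-only, touch every page so clean file-backed folios populate the page cache, then ask the kernel to reclaim part of them through cgroup v2's memory.reclaim interface. The file path, cgroup path, and sizes are placeholders; the actual test program was not published.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE  (10UL << 30)   /* 10G of file-backed mappings */
#define PAGE_SIZE 4096UL

int main(void)
{
    /* Assumed paths: a preallocated 10G file and the test cgroup. */
    int fd = open("/mnt/testfile", O_RDONLY);
    if (fd < 0) { perror("open testfile"); return 1; }

    char *p = mmap(NULL, MAP_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Fault in every page so the page cache holds clean file-backed folios. */
    volatile char sink = 0;
    for (size_t off = 0; off < MAP_SIZE; off += PAGE_SIZE)
        sink += p[off];
    (void)sink;

    /* Request reclaim of 8G via the cgroup v2 memory.reclaim interface. */
    int rfd = open("/sys/fs/cgroup/test/memory.reclaim", O_WRONLY);
    if (rfd < 0) { perror("open memory.reclaim"); return 1; }
    if (write(rfd, "8G", 2) < 0)
        perror("write memory.reclaim");

    close(rfd);
    munmap(p, MAP_SIZE);
    close(fd);
    return 0;
}

The pages are clean because the mapping is only read, so reclaim can drop them without writeback, which is exactly the path the batched reference checks and unmapping speed up.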
Some nice gains, and all the more relevant with the increasing use of folios throughout the Linux kernel.
For those interested, see [2]this MM pull request with these latest patches now merged for Linux 7.0.
[1] https://lore.kernel.org/lkml/cover.1770645603.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/lkml/20260218200016.8906fb904af9439e7b496327@linux-foundation.org/