An Enticing Optimization For Linux Memory Reclaim On Today's Multi-Core Platforms
([Linux Kernel] 51 Minutes Ago
Memory Reclaim)
- Reference: 0001622698
- News link: https://www.phoronix.com/news/Linux-Better-Reclaim-Multi-Core
- Source link:
A new set of Linux kernel patches that batches TLB flushes for dirty folios in the kernel's vmscan path was recently posted to the Linux kernel mailing list. Batching these flushes during memory reclaim can be a significant performance win on today's multi-core hardware.
Tencent engineer Zhang Peng sent out the set of patches to improve the Linux kernel's behavior when performing page-out during memory reclaim. Currently the function that flushes Translation Lookaside Buffer (TLB) entries for dirty pages is called once for each individual dirty folio, which can lead to excessive Inter-Processor Interrupts (IPIs) that hurt system performance.
With the proposed code, dirty folios are queued into batches and a single TLB flush is performed for each batch rather than for each individual folio.
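To illustrate why batching helps, here is a minimal Python model of the idea, not the kernel code itself. It assumes that every TLB flush request interrupts every remote CPU, which is a simplification. The names `tlb_flush`, `reclaim_per_folio`, and `reclaim_batched`, as well as the batch size, are hypothetical and are not taken from the patches.

```python
from dataclasses import dataclass


@dataclass
class FlushStats:
    """Counts flush requests and the IPIs they cause in this toy model."""
    flushes: int = 0
    ipis: int = 0


def tlb_flush(stats: FlushStats, remote_cpus: int) -> None:
    # Simplifying assumption: one flush request sends one IPI
    # to every remote CPU.
    stats.flushes += 1
    stats.ipis += remote_cpus


def reclaim_per_folio(dirty_folios: int, remote_cpus: int) -> FlushStats:
    """Model of the current behavior: one flush per dirty folio."""
    stats = FlushStats()
    for _ in range(dirty_folios):
        tlb_flush(stats, remote_cpus)
        # ... the folio would be written out here ...
    return stats


def reclaim_batched(dirty_folios: int, remote_cpus: int,
                    batch_size: int) -> FlushStats:
    """Model of the proposed behavior: queue folios, flush once per batch."""
    stats = FlushStats()
    queued = 0
    for _ in range(dirty_folios):
        queued += 1
        if queued == batch_size:
            tlb_flush(stats, remote_cpus)
            # ... all queued folios would be written out here ...
            queued = 0
    if queued:
        # Flush the final, partially filled batch.
        tlb_flush(stats, remote_cpus)
    return stats


if __name__ == "__main__":
    # Hypothetical workload: 64 dirty folios, 15 remote CPUs, batches of 16.
    print(reclaim_per_folio(64, 15))    # 64 flushes, 960 IPIs
    print(reclaim_batched(64, 15, 16))  # 4 flushes, 60 IPIs
```

In this model the IPI count scales with the number of batches rather than the number of folios, so the savings grow with both the batch size and the core count.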
Benchmarking the kernel behavior with stress-ng showed a 26.9% throughput improvement with the five proposed patches.
The patch series was originally proposed earlier in March, and today brought the [1]v2 patch series, which aims to reduce IPI overhead on multi-core systems.
[1] https://lore.kernel.org/lkml/20260326-batch-tlb-flush-v2-0-403e523325c4@icloud.com/