Linux 6.11 To Offer More Fine-Tuned Control Over Swappiness
([Linux Kernel] 3 Hours Ago
Swappiness)
- Reference: 0001475973
- News link: https://www.phoronix.com/news/Linux-6.11-Fine-Swappiness
- Source link:
As part of the memory management changes expected to be merged for the upcoming Linux 6.11 cycle is allowing more fine-tuned control over the swappiness setting used to determine how aggressively pages are swapped out of physical system memory and into the on-disk swap space.
With the new code from Meta, a swappiness argument is supported for memory.reclaim. This effectively allows more finer-grained control over the swapiness behavior without overriding the global swappiness setting.
Dan Schatzberg of Meta explains in [1]the patch adding swappiness= support to memory.reclaim:
Allow proactive reclaimers to submit an additional swappiness=[val] argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt.
For example:
echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
will perform reclaim on the rootcg with a swappiness setting of 0 (no swap) regardless of the vm.swappiness sysctl setting.
Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not allow for any way to effect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim:
* Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desireable to control swap-out for proactive reclaim separately from reactive reclaim
* Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered due to reactive reclaim but less so if it is triggered due to proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desireable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs.
In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons:
* vm.swappiness is a global setting, adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness.
* vm.swappiness is stateful - so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it.
With this patch, we avoid these downsides of adjusting vm.swappiness globally.
Good news for systemd-oomd users and others wanting more control over the Linux swapiness behavior.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=49003d83c7dabf67036c4523d9907b16fde9cb24
With the new code from Meta, a swappiness argument is supported for memory.reclaim. This effectively allows more finer-grained control over the swapiness behavior without overriding the global swappiness setting.
Dan Schatzberg of Meta explains in [1]the patch adding swappiness= support to memory.reclaim:
Allow proactive reclaimers to submit an additional swappiness=[val] argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt.
For example:
echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
will perform reclaim on the rootcg with a swappiness setting of 0 (no swap) regardless of the vm.swappiness sysctl setting.
Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not allow for any way to effect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim:
* Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desireable to control swap-out for proactive reclaim separately from reactive reclaim
* Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered due to reactive reclaim but less so if it is triggered due to proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desireable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs.
In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons:
* vm.swappiness is a global setting, adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness.
* vm.swappiness is stateful - so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it.
With this patch, we avoid these downsides of adjusting vm.swappiness globally.
Good news for systemd-oomd users and others wanting more control over the Linux swapiness behavior.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=49003d83c7dabf67036c4523d9907b16fde9cb24
phoronix