Sched_ext Scheduler Idle Selection Being Extended For LLC & NUMA Awareness
([Linux Kernel] 3 Hours Ago
sched_ext NUMA Awareness)
- Reference: 0001502138
- News link: https://www.phoronix.com/news/sched_ext-NUMA-Awareness
- Source link:
While the [1]sched_ext extensible scheduler code was merged for Linux 6.12, work on sched_ext itself is not over. New patches this weekend continue working on NUMA awareness for its default idle selection policy, while similar work on CPU last level cache (LLC) awareness is slated for the upcoming Linux 6.13 cycle.
Queued last week within [2]sched_ext.git's "for-6.13" branch is [3]a patch to introduce LLC awareness to the default idle selection policy. It does so by leveraging the Linux kernel's scheduler topology information. From the patch description:
"This allows schedulers using the built-in policy to make more informed decisions when selecting an idle CPU in systems with multiple LLCs, such as NUMA systems or chiplet-based architectures, and it helps keep tasks within the same LLC domain, thereby improving cache locality.
For efficiency, LLC awareness is applied only to tasks that can run on all the CPUs in the system for now. If a task's affinity is modified from user space, it's the responsibility of user space to choose the appropriate optimized scheduling domain."
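To make the affinity restriction in that quote concrete, here is a minimal sketch in Python rather than kernel C. It is only an illustration of the described rule, not the patch's code: the function name `llc_candidates` and the use of plain sets in place of kernel cpumasks are assumptions made for this example.

```python
from typing import Optional, Set


def llc_candidates(allowed: Set[int], all_cpus: Set[int],
                   prev_llc: Set[int]) -> Optional[Set[int]]:
    """Return the CPUs to try first for LLC locality, or None to skip it.

    Hypothetical helper: `allowed` stands in for the task's CPU affinity,
    `all_cpus` for every CPU in the system, and `prev_llc` for the CPUs
    sharing a last level cache with the task's previous CPU.
    """
    # Per the patch description, LLC awareness is applied only to tasks
    # that can run on all CPUs; restricted tasks are left to user space.
    if not allowed >= all_cpus:
        return None
    return set(prev_llc)
```

A task pinned to a subset of CPUs gets `None` back, meaning the LLC preference is simply not applied to it.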
That LLC awareness for sched_ext will in turn arrive with Linux 6.13. The support comes from Andrea Righi of NVIDIA.
Andrea Righi has also been working on adding NUMA awareness to the default idle selection code. That code is still undergoing review, but the latest revision was [4]posted Sunday to the Linux kernel mailing list. It extends the built-in idle CPU selection policy to prioritize CPUs within the same NUMA node. Righi explains in that patch:
"With this change applied, the built-in CPU idle selection policy follows this logic:
- always prioritize CPUs from fully idle SMT cores,
- select the same CPU if possible,
- select a CPU within the same LLC domain,
- select a CPU within the same NUMA node.
Both NUMA and LLC awareness features are enabled only when the system has multiple NUMA nodes or multiple LLC domains.
In the future, we may want to improve the NUMA node selection to account the node distance from prev_cpu. Currently, the logic only tries to keep tasks running on the same NUMA node. If all CPUs within a node are busy, the next NUMA node is chosen randomly."
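The priority order quoted above can be sketched as a small Python model. This is one reading of the list, not the kernel implementation: the `Topology` class, the `pick_idle_cpu` name, the two-pass structure (fully idle SMT cores first, then any idle CPU) and the deterministic lowest-numbered tie-break are all assumptions made for illustration. In particular, the patch says the fallback NUMA node is chosen randomly, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Topology:
    """Hypothetical topology tables: each maps a CPU id to a domain id."""
    core_of: Dict[int, int]  # SMT core
    llc_of: Dict[int, int]   # last level cache domain
    node_of: Dict[int, int]  # NUMA node


def pick_idle_cpu(topo: Topology, idle: Set[int], prev_cpu: int) -> Optional[int]:
    cpus = set(topo.core_of)

    def same(domain: Dict[int, int]) -> Set[int]:
        # CPUs sharing the given domain with prev_cpu
        return {c for c in cpus if domain[c] == domain[prev_cpu]}

    # A CPU belongs to a fully idle core when all its SMT siblings are idle.
    fully_idle = {
        c for c in idle
        if all(s in idle for s in cpus if topo.core_of[s] == topo.core_of[c])
    }

    # Search scopes from closest to farthest. LLC and NUMA levels are only
    # added when the system has more than one such domain.
    scopes = [{prev_cpu}]
    if len(set(topo.llc_of.values())) > 1:
        scopes.append(same(topo.llc_of))
    if len(set(topo.node_of.values())) > 1:
        scopes.append(same(topo.node_of))
    scopes.append(cpus)

    # Pass 1 considers only fully idle cores, pass 2 any idle CPU.
    for pool in (fully_idle, idle):
        for scope in scopes:
            hits = pool & scope
            if hits:
                return min(hits)  # assumption: lowest id, for determinism
    return None
```

For example, on a hypothetical 16-CPU machine with two SMT threads per core, four CPUs per LLC and eight per node, a task that last ran on CPU 5 with CPUs 4, 6, 7, 12 and 13 idle would be placed on CPU 6 under this model: CPU 4 shares a core with the busy CPU 5, so the fully idle core in the same LLC wins.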
We'll see if that NUMA awareness is ready in time for the upcoming Linux 6.13 merge window to join the LLC awareness support. In any event, there continue to be a lot of interesting developments and adoption around sched_ext now that it's mainlined.
[1] https://www.phoronix.com/search/sched_ext
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/log/?h=for-6.13
[3] https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/commit/?h=for-6.13&id=dfa4ed29b18c5f26cd311b0da7f049dbb2a2b33b
[4] https://lore.kernel.org/lkml/20241027174953.49655-1-arighi@nvidia.com/