Intel's Cache Aware Scheduling Inches Closer To Being Merged For Linux

([Intel] 4 Hours Ago Cache Aware Scheduling)

Reference: 0001633638
News link: https://www.phoronix.com/news/Linux-Cache-Aware-Sched-Nears

I have been writing about the [1]Cache Aware Scheduling work led by Intel engineers on the Linux kernel for more than a year. I've also tested Cache Aware Scheduling on both Intel and AMD CPUs with a patched Linux kernel to great success. So I'm very happy to see the Cache Aware Scheduling patches inching closer to the mainline Linux kernel.

The Cache Aware Scheduling patches were queued into Peter Zijlstra's [2]sched/cache Git branch a few days ago. Hopefully Peter will soon push them into a tip/tip.git branch, which would put them on a direct path to being merged in the next Linux kernel merge window.

Cache Aware Scheduling can enhance Linux performance on modern CPUs with multiple cache domains. The scheduler tries to ensure that tasks sharing data are colocated in the same last level cache (LLC) domain, improving cache locality and reducing cache misses/bouncing. I've benchmarked [3]nice performance gains on AMD EPYC and [4]better Xeon 6 performance too. Once Cache Aware Scheduling is merged or about to hit the finish line, I'll be around with some fresh benchmarks.
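For those curious how many LLC domains their own system exposes, the following is a minimal Python sketch that groups CPUs by shared last level cache using the kernel's sysfs cache topology files. It is not part of the patch series. It assumes that cache index3 is the last level cache on the machine, which is typical on current x86 servers but not guaranteed; the function names are made up for this example.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_llc(shared_lists):
    """Collapse per-CPU shared_cpu_list strings into the distinct LLC domains."""
    domains = set()
    for text in shared_lists:
        cpus = tuple(parse_cpu_list(text))
        if cpus:
            domains.add(cpus)
    return [list(cpus) for cpus in sorted(domains)]


def read_llc_domains(sysfs_root="/sys/devices/system/cpu"):
    """Read the shared_cpu_list of cache index3 for every CPU and group the results.

    Assumption: index3 is the last level cache. Check the 'level' file next to
    shared_cpu_list if the machine has a different cache hierarchy.
    """
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index3", "shared_cpu_list")
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return group_by_llc(lists)


if __name__ == "__main__":
    for number, cpus in enumerate(read_llc_domains()):
        print(f"LLC domain {number}: {len(cpus)} CPUs -> {cpus}")
```

A system that reports more than one domain here is the kind of multi-LLC platform where keeping data-sharing tasks within one domain can matter.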

Beyond those patches now hitting Peter's development Git branch, [5]a new patch series from Intel engineer Tim Chen was sent out on Wednesday that now consists of just enhancements atop the patches staged in that Git branch.

The new enhancement patch series resolves an over-aggregation issue that could occur with the Cache Aware Scheduling behavior. There are also various bug fixes thanks to [6]the AI-based reporting by Sashiko.

Tim Chen noted on the patch cover letter:

"Compared with cache-aware v4, the major change in the first part is storing the LLC effective size in the per-CPU bottom sched_domain. This allows checking whether a task's memory footprint exceeds the threshold by fetching the value directly from the corresponding sched_domain, instead of recalculating it every time. Besides, the NUMA balance page-fault statistics is used instead of RSS to estimate the working set. We also picked up Jianyong's optimization patch to reduce CPU scan overhead. However, if NUMA balancing is not enabled we will not have this working set estimate. Perhaps using RSS will be appropriate for such scenario.

...

Test results show that the current version keeps the same performance as v4 for workloads and platforms we tested.

Future plans are to introduce fine-grained control of using cache aware scheduling on specific tasks after the load-balance-based cache-aware scheduling is merged:

- Look into task tagging (e.g. with schedqos framework, cgroup) for non process based tasks grouping to LLC.

- Evaluate fast cache-aware aggregation in the wakeup path."
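To illustrate the idea in the cover letter of caching the LLC effective size and comparing a task's estimated working set against it, here is a toy Python model. It is only a sketch under stated assumptions and is not the kernel code: the class, function names, the percentage threshold, and the conversion of page-fault counts to bytes are all hypothetical. The RSS fallback is the option the cover letter only floats as a possibility.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlcDomain:
    """Hypothetical stand-in for the per-CPU bottom sched_domain."""
    # Computed once when the domain is built, then only read afterwards.
    llc_effective_size: int  # bytes


def estimate_working_set(numa_fault_pages: int,
                         rss_pages: int,
                         numa_balancing_enabled: bool,
                         fallback_to_rss: bool = False,
                         page_size: int = 4096) -> Optional[int]:
    """Return an estimated working set in bytes, or None if no estimate exists.

    Assumption: the page-fault statistic is treated as a count of touched pages.
    """
    if numa_balancing_enabled:
        return numa_fault_pages * page_size
    if fallback_to_rss:
        # Speculative fallback mentioned in the cover letter, not confirmed behavior.
        return rss_pages * page_size
    return None


def footprint_fits(domain: LlcDomain,
                   footprint_bytes: int,
                   threshold_pct: int = 100) -> bool:
    """Check the footprint against the cached LLC size; no recalculation needed."""
    return footprint_bytes * 100 <= domain.llc_effective_size * threshold_pct


if __name__ == "__main__":
    domain = LlcDomain(llc_effective_size=32 * 1024 * 1024)
    estimate = estimate_working_set(numa_fault_pages=2048, rss_pages=100000,
                                    numa_balancing_enabled=True)
    if estimate is None:
        print("no working set estimate available")
    else:
        print("fits in LLC:", footprint_fits(domain, estimate))
```

The point of the model is only the shape of the check: the size is looked up from the domain rather than recomputed, and the estimate is unavailable when NUMA balancing is disabled unless some fallback is chosen.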

Great to see yet more improvements to Cache Aware Scheduling on the way and hopefully it will soon make its way to the mainline Linux kernel.

[1] https://www.phoronix.com/search/Cache+Aware+Scheduling

[2] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/cache

[3] https://www.phoronix.com/review/cache-aware-scheduling-amd-turin

[4] https://www.phoronix.com/review/intel-xeon-6-cache-sched

[5] https://lore.kernel.org/all/cover.1778703694.git.tim.c.chen@linux.intel.com/

[6] https://www.phoronix.com/news/Sashiko-Linux-AI-Code-Review
