News: 0001627226


WQ_AFFN_CACHE_SHARD Merged For Linux 7.1: Significant Win For CPUs With Many Cores Per LLC

([Linux Kernel] 2 Hours Ago WQ_AFFN_CACHE_SHARD Affinity Scope)


The workqueue changes merged today for the Linux 7.1 kernel matter for modern high-end processors that have many CPU cores per last-level cache (LLC / L3 cache). The new WQ_AFFN_CACHE_SHARD affinity scope can reduce contention on such systems and help achieve greater performance.

Linux engineer Breno Leitao of Meta wrote the set of patches introducing the WQ_AFFN_CACHE_SHARD affinity scope. It addresses bottlenecks observed when many CPU cores share the same L3 cache, which can lead to heavy spinlock contention. With the WQ_AFFN_CACHE scope, unbound workqueues get one worker pool per LLC, so a single-LLC system ends up with just one pool for the entire system, which can cause contention and hurt I/O performance.
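To make the pool-per-LLC behavior concrete, here is a minimal Python model, not kernel code. The function name cache_scope_pools and the CPU-to-LLC mapping are hypothetical and used only for illustration. It shows why a single-LLC machine ends up with every CPU behind one pool.

```python
from collections import defaultdict


def cache_scope_pools(cpu_to_llc):
    """Toy model of the cache scope: group CPU ids by their LLC id.

    cpu_to_llc is a hypothetical {cpu_id: llc_id} mapping. Each distinct
    LLC id yields one pool, mirroring "one worker pool per LLC".
    """
    pools = defaultdict(list)
    for cpu, llc in sorted(cpu_to_llc.items()):
        pools[llc].append(cpu)
    return dict(pools)


# Assumed topology: 72 CPUs that all share LLC 0.
single_llc = {cpu: 0 for cpu in range(72)}
pools = cache_scope_pools(single_llc)
print(len(pools), len(pools[0]))  # prints: 1 72
```

In this model all 72 CPUs queue work through the same pool, which is the situation where contention on the pool's lock becomes visible.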

While this issue is most visible on today's high-core-count Intel, AMD, and Arm processors, it also shows up on smaller systems. On a 12-core system with a single shared L3 cache, Oracle engineer Chuck Lever found when running NFS-over-RDMA with 12 FIO jobs that around 39% of the CPU cycles were spent in a spinlock slow path, largely because of the default workqueue behavior.

[1]

As a new intermediate affinity level, WQ_AFFN_CACHE_SHARD showed nice throughput gains on Intel Xeon and NVIDIA Grace CPUs. Even on a 16-core Xeon D server, FIO random reads from NVMe storage improved by up to 5.9%. Or as noted in [2]this merged patch:

"Benchmark on NVIDIA Grace (72 CPUs, single LLC, 50k items/thread), show cache_shard delivers ~5x the throughput and ~6.5x lower p50 latency compared to cache scope on this 72-core single-LLC system."

The WQ_AFFN_CACHE_SHARD affinity scope subdivides each LLC into groups of at most wq_cache_shard_size CPUs. wq_cache_shard_size defaults to eight and can be configured at boot time. This new affinity scope is the default as of its introduction in Linux 7.1.
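The subdivision can be sketched with a small Python model. This is an illustration under stated assumptions, not the kernel's implementation: the helper name shard_llc is hypothetical, and it simply cuts the CPU list into consecutive chunks. The actual patch may balance shard sizes or account for topology details such as SMT siblings differently.

```python
def shard_llc(cpus, shard_size=8):
    """Split the CPUs of one LLC into shards of at most shard_size CPUs.

    Assumption: plain consecutive chunking. The default of 8 matches the
    wq_cache_shard_size default described in the article.
    """
    if shard_size < 1:
        raise ValueError("shard_size must be at least 1")
    cpus = sorted(cpus)
    return [cpus[i:i + shard_size] for i in range(0, len(cpus), shard_size)]


# 72 CPUs behind a single LLC, as in the NVIDIA Grace benchmark.
grace = shard_llc(range(72))
print(len(grace), max(len(s) for s in grace))  # prints: 9 8

# 12 CPUs behind a single LLC.
twelve = shard_llc(range(12))
print([len(s) for s in twelve])  # prints: [8, 4]
```

Under this model the 72-CPU single-LLC system goes from one pool shared by all CPUs to nine pools of eight CPUs each, so far fewer CPUs compete for any one pool.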

WQ_AFFN_CACHE_SHARD is the main highlight of the [3]workqueue changes submitted for Linux 7.1 that were [4]merged today as the latest enticing optimization of this next kernel version.



[1] https://www.phoronix.com/image-viewer.php?id=2026&image=linux_71_workqueue_lrg

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5920d046f7ae3bf9cf51b9d915c1fff13d299d84

[3] https://lore.kernel.org/lkml/283d252c7356bdc7640f48ef716051cb@kernel.org/

[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7de6b4a246330fe29fa2fd144b4724ca35d60d6c

