Restartable Sequences "RSEQ" Seeing Up To 16.7x Speedup With Newest Linux Patch
([Linux Kernel] 6 Hours Ago
RSEQ Cache Local)
- Reference: 0001497381
- News link: https://www.phoronix.com/news/RSEQ-Cache-Local-Speedup
For those making use of [1]Restartable Sequences (RSEQ) on Linux systems, there is an enticing performance optimization on the way.
RSEQ, as a reminder, is a low-level synchronization primitive for operating on per-CPU data in user-space. Work by Mathieu Desnoyers improves cache locality of RSEQ concurrency IDs for intermittent workloads. Desnoyers explained in the patch:
"commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid") introduced a per-mm/cpu current concurrency id (mm_cid), which keeps a reference to the concurrency id allocated for each CPU. This reference expires shortly after a 100ms delay.
These per-CPU references keep the per-mm-cid data cache-local in situations where threads are running at least once on each CPU within each 100ms window, thus keeping the per-cpu reference alive.
However, intermittent workloads behaving in bursts spaced by more than 100ms on each CPU exhibit bad cache locality and degraded performance compared to purely per-cpu data indexing, because concurrency IDs are allocated over various CPUs and cores, therefore losing cache locality of the associated data."
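The effect described in the quote can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the kernel's allocator: the class `ToyCidAllocator`, the `simulate` helper, the "lowest free ID" policy, and both schedules are hypothetical names and choices made for illustration. Only the 100ms expiry window comes from the patch description. The model counts how often a concurrency ID is handed to a different CPU than the one that last used it, as a rough proxy for lost cache locality.

```python
EXPIRY_MS = 100  # expiry window quoted in the patch description


class ToyCidAllocator:
    """Toy model of per-mm concurrency IDs (hypothetical, not kernel code)."""

    def __init__(self):
        self.cpu_ref = {}   # cpu -> (cid, last_used_ms)
        self.last_cpu = {}  # cid -> CPU that last used this cid
        self.moves = 0      # times a cid was handed to a different CPU

    def run_on(self, cpu, now_ms):
        # Drop per-CPU references not used within the expiry window.
        self.cpu_ref = {
            c: (cid, t)
            for c, (cid, t) in self.cpu_ref.items()
            if now_ms - t <= EXPIRY_MS
        }
        if cpu in self.cpu_ref:
            # The reference is still alive: the CPU keeps its cid.
            cid = self.cpu_ref[cpu][0]
        else:
            # Assumption: hand out the lowest cid not currently referenced.
            in_use = {cid for cid, _ in self.cpu_ref.values()}
            cid = next(i for i in range(len(in_use) + 1) if i not in in_use)
        if self.last_cpu.get(cid, cpu) != cpu:
            self.moves += 1  # data indexed by this cid is now cold on this CPU
        self.last_cpu[cid] = cpu
        self.cpu_ref[cpu] = (cid, now_ms)
        return cid


def simulate(schedule):
    """Replay (time_ms, cpu) events and return the number of cid moves."""
    alloc = ToyCidAllocator()
    for now_ms, cpu in schedule:
        alloc.run_on(cpu, now_ms)
    return alloc.moves


NUM_CPUS = 4
# Steady: every CPU runs a thread every 50 ms, inside the expiry window.
steady = [(t, cpu) for t in range(0, 1000, 50) for cpu in range(NUM_CPUS)]
# Bursty: one CPU at a time, bursts spaced 150 ms apart, rotating over CPUs.
bursty = [(i * 150, i % NUM_CPUS) for i in range(8)]

steady_moves = simulate(steady)
bursty_moves = simulate(bursty)
print("steady moves:", steady_moves)
print("bursty moves:", bursty_moves)
```

In this toy run the steady schedule never moves a concurrency ID between CPUs, while the bursty schedule hands ID 0 to a different CPU on every burst, which is the pattern the patch description blames for the degraded cache locality.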
With the newest work to improve per-MM-CID cache locality, intermittent workloads can see substantial speedups, up to 16.7x as noted in the headline.
Those interested in this performance optimization work for Restartable Sequences can see [2]this patch for all the details.
[1] https://www.phoronix.com/search/Restartable+Sequences
[2] https://lore.kernel.org/lkml/20241009135007.2084357-1-mathieu.desnoyers@efficios.com/