Linux Scheduler Adapted For A Latency Win & Avoiding An RT Deadlock
([Linux Kernel] 5 Hours Ago, Defer Throttle)
- Reference: 0001573881
- News link: https://www.phoronix.com/news/Linux-CFS-Defer-Throttle
A patch series for the Linux kernel scheduler is queued for expected introduction in Linux 6.18 that defers throttling until tasks exit to user-space. By switching the scheduler to a task-based throttle model and task-based throttle time accounting, these changes can provide a latency win and also address a possible deadlock on real-time "RT" kernels.
Queued up today in tip/tip.git's "sched/core" Git branch are the patches reworking the scheduler's throttling code. The problem with the status quo is described in the patch cover letter of [1]defer throttle when task exits to user:
"CFS tasks can end up throttled while holding locks that other, non-throttled tasks are blocking on.
For !PREEMPT_RT, this can be a source of latency due to the throttling causing a resource acquisition denial.
For PREEMPT_RT, this is worse and can lead to a deadlock:
o A CFS task p0 gets throttled while holding read_lock(&lock)
o A task p1 blocks on write_lock(&lock), making further readers enter the slowpath
o A ktimers or ksoftirqd task blocks on read_lock(&lock)
...
To fix this issue for PREEMPT_RT and improve latency situation for !PREEMPT_RT, change the throttle model to task based, i.e. when a cfs_rq is throttled, mark its throttled status but do not remove it from cpu's rq. Instead, for tasks that belong to this cfs_rq, when they get picked, add a task work to them so that when they return to user, they can be dequeued. In this way, tasks throttled will not hold any kernel resources. When cfs_rq gets unthrottled, enqueue back those throttled tasks."
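As a rough illustration of the model described in the cover letter, here is a toy Python simulation. It is not kernel code: the class names (`Task`, `CfsRq`), the `limbo` list, and the `throttle_work_pending` flag are hypothetical stand-ins chosen for this sketch, and the real patches operate on kernel structures with far more state. The sketch only shows the ordering of events: throttling marks the queue, a picked task gets its "task work" flagged, the dequeue happens on return to user, and unthrottling puts the parked tasks back.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    in_kernel: bool = False              # True while running kernel code (may hold locks)
    throttle_work_pending: bool = False  # hypothetical flag standing in for queued task work


@dataclass
class CfsRq:
    throttled: bool = False
    runnable: list = field(default_factory=list)  # tasks still on the runqueue
    limbo: list = field(default_factory=list)     # tasks parked by the throttle work

    def throttle(self):
        # Only mark the throttled status; tasks are NOT removed here.
        self.throttled = True

    def pick(self):
        # Pick the next task; if the queue is throttled, attach the task work.
        if not self.runnable:
            return None
        task = self.runnable[0]
        if self.throttled:
            task.throttle_work_pending = True
        return task

    def return_to_user(self, task):
        # The task work runs on the exit-to-user path, where the task
        # no longer holds kernel locks, so dequeuing it is safe.
        task.in_kernel = False
        pending = task.throttle_work_pending
        task.throttle_work_pending = False
        if pending and self.throttled:
            self.runnable.remove(task)
            self.limbo.append(task)

    def unthrottle(self):
        # Enqueue back every task that was parked while throttled.
        self.throttled = False
        self.runnable.extend(self.limbo)
        self.limbo.clear()


def demo():
    rq = CfsRq()
    p0 = Task("p0", in_kernel=True)
    rq.runnable.append(p0)

    rq.throttle()
    picked = rq.pick()
    # p0 is still runnable while in the kernel, so it can finish and drop locks.
    still_runnable_in_kernel = picked is p0 and p0 in rq.runnable

    rq.return_to_user(p0)
    parked = p0 in rq.limbo and p0 not in rq.runnable

    rq.unthrottle()
    restored = p0 in rq.runnable and not rq.limbo
    return still_runnable_in_kernel, parked, restored


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the middle step: between `throttle()` and `return_to_user()`, p0 keeps running in kernel context, which is what lets it release a lock such as the `read_lock(&lock)` from the quoted scenario before it is actually taken off the runqueue.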
With these patches now [2]queued into the sched/core TIP branch, this task-based throttle model work should be merged during the upcoming Linux 6.18 merge window, barring any objections from Linus Torvalds or other code issues coming to light.
[1] https://lore.kernel.org/all/20250829081120.806-1-ziqianlu@bytedance.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=sched/core