Linux 7.1 Lands High Resolution Timer "HRTIMER" Overhaul
(Linux Kernel: HRTIMER Rework, 5 Hours Ago)
- Reference: 0001627866
- News link: https://www.phoronix.com/news/Linux-7.1-HRTIMER-Overhaul
Merged this week for Linux 7.1 is a rework of the high resolution timer "HRTIMER" subsystem that reduces the overhead of frequently armed timers, such as the HRTICK scheduler timer. HRTICK helps improve system responsiveness and scheduling fairness.
This is the same HRTIMER overhaul that Phoronix covered back in early March: [1] improvements that reduce HRTICK timer overhead and in turn deliver a nice efficiency win.
Thomas Gleixner explained in the now-merged pull request:
"A rework of the hrtimer subsystem to reduce the overhead for frequently armed timers, especially the hrtick scheduler timer.
- Better timer locality decision
- Simplification of the evaluation of the first expiry time by keeping track of the neighbor timers in the RB-tree by providing a RB-tree variant with neighbor links. That avoids walking the RB-tree on removal to find the next expiry time, but even more important allows to quickly evaluate whether a timer which is rearmed changes the position in the RB-tree with the modified expiry time or not. If not, the dequeue/enqueue sequence which both can end up in rebalancing can be completely avoided.
- Deferred reprogramming of the underlying clock event device. This optimizes for the situation where a hrtimer callback sets the need resched bit. In that case the code attempts to defer the re-programming of the clock event device up to the point where the scheduler has picked the next task and has the next hrtick timer armed. In case that there is no immediate reschedule or soft interrupts have to be handled before reaching the reschedule point in the interrupt entry code the clock event is reprogrammed in one of those code paths to prevent that the timer becomes stale.
- Support for clocksource coupled clockevents
The TSC deadline timer is coupled to the TSC. The next event is programmed in TSC time. Currently this is done by converting the CLOCK_MONOTONIC based expiry value into a relative timeout, converting it into TSC ticks, reading the TSC adding the delta ticks and writing the deadline MSR.
As the timekeeping core has the conversion factors for the TSC already, the whole back and forth conversion can be completely avoided. The timekeeping core calculates the reverse conversion factors from nanoseconds to TSC ticks and utilizes the base timestamps of TSC and CLOCK_MONOTONIC which are updated once per tick. This allows a direct conversion into the TSC deadline value without reading the time and as a bonus keeps the deadline conversion in sync with the TSC conversion factors, which are updated by adjtimex() on systems with NTP/PTP enabled.
- Allow inlining of the clocksource read and clockevent write functions when they are tiny enough, e.g. on x86 RDTSC and WRMSR.
With all those enhancements in place a hrtick enabled scheduler provides the same performance as without hrtick. But also other hrtimer users obviously benefit from these optimizations."
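To make the neighbor-link idea from the quoted pull request more concrete, here is a minimal Python sketch. It is not kernel code: a sorted, doubly linked chain stands in for the RB-tree variant with neighbor links, and the names `TimerNode`, `link_sorted` and `rearm_in_place` are hypothetical. How the real code treats timers with equal expiry times is also an assumption here. The point is only that a rearmed timer whose new expiry still fits between its two neighbors can be updated in place, without a dequeue/enqueue pair.

```python
class TimerNode:
    """Hypothetical stand-in for a queued timer that knows its neighbors."""

    def __init__(self, name, expires):
        self.name = name
        self.expires = expires  # expiry time in nanoseconds
        self.prev = None        # neighbor that expires earlier (or None)
        self.next = None        # neighbor that expires later (or None)


def link_sorted(nodes):
    """Link nodes in expiry order and return the first-expiring node.

    A plain sort models the ordering an RB-tree would maintain.
    """
    ordered = sorted(nodes, key=lambda n: n.expires)
    for left, right in zip(ordered, ordered[1:]):
        left.next = right
        right.prev = left
    return ordered[0] if ordered else None


def rearm_in_place(node, new_expires):
    """Update the expiry in place if the ordering is unaffected.

    Returns True when the new expiry still lies between both neighbors,
    so no dequeue/enqueue is needed. Returns False (leaving the node
    untouched) when the caller would have to requeue the timer.
    Assumption: ties with a neighbor count as "position unchanged".
    """
    fits_after_prev = node.prev is None or node.prev.expires <= new_expires
    fits_before_next = node.next is None or new_expires <= node.next.expires
    if fits_after_prev and fits_before_next:
        node.expires = new_expires
        return True
    return False
```

With three timers at 100, 200 and 300 ns, rearming the middle one to 250 ns succeeds in place, while rearming it to 350 ns returns False because it would have to move past its later neighbor.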
The timers pull [2] also brings robustness improvements, a rewrite of the clocksource watchdog, and other changes.
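Returning to the clocksource-coupled clockevent item in the quoted pull request, the arithmetic can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the kernel implementation: the function names, the fixed-point `mult`/`shift` representation and the 3 GHz TSC frequency are all made up for the example. It contrasts the older relative-timeout path, which needs a fresh time and TSC read, with a direct conversion from per-tick base timestamps.

```python
NSEC_PER_SEC = 1_000_000_000


def ns_to_ticks_factor(freq_hz, shift=32):
    """Fixed-point multiplier for ns -> ticks: ticks = (ns * mult) >> shift."""
    return (freq_hz << shift) // NSEC_PER_SEC


def deadline_relative(expiry_ns, now_ns, now_tsc, mult, shift=32):
    """Older path: turn the expiry into a relative timeout, convert it to
    ticks, and add it to a freshly read counter value (now_tsc)."""
    delta_ns = max(expiry_ns - now_ns, 0)
    return now_tsc + ((delta_ns * mult) >> shift)


def deadline_direct(expiry_ns, base_ns, base_tsc, mult, shift=32):
    """Direct path: convert against base timestamps that were captured
    together (assumed to be refreshed once per tick), with no counter read."""
    return base_tsc + (((expiry_ns - base_ns) * mult) >> shift)
```

For a hypothetical 3 GHz TSC with base timestamps of 1,000,000 ns and 5,000,000 ticks, an expiry at 1,002,000 ns maps to a deadline of 5,006,000 ticks on both paths; the direct path simply gets there without reading the current time or the counter.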
[1] https://www.phoronix.com/news/Linux-7.1-HRTICK-Timer
[2] https://lore.kernel.org/lkml/177601564263.7932.2238613098642404049.tglx@xen13/