Linux 6.16 Will Now Conveniently Report Hard/Soft Lockups & RCU Stall Counts
([Linux Kernel] 2 Hours Ago
sysfs Reports)
- Reference: 0001550720
- News link: https://www.phoronix.com/news/Linux-6.16-Hard-Soft-Lockups
A convenient addition in [1]Linux 6.16 for system administrators is a set of sysfs counters that report to user space the number of hard lockups, soft lockups, and RCU stalls.
Linux 6.16 introduces /sys/kernel/hardlockup_count, /sys/kernel/softlockup_count, and /sys/kernel/rcu_stall_count as an easy way to see the hard lockup, soft lockup, and RCU stall counts for the running kernel. Each count is the total since boot, which makes it possible to see whether lockups are happening often and to compare the counts over a period of time.
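As a rough illustration of how these files might be consumed, here is a minimal Python sketch that reads the three counters. The paths are the ones named above. The helper name read_counters is hypothetical, and the sketch assumes each file holds a single decimal integer. A missing file is reported as None, since a counter may be absent on older kernels or on builds without the relevant detector.

```python
from pathlib import Path

# File names taken from the article; expected under /sys/kernel on Linux 6.16+.
COUNTER_NAMES = ("hardlockup_count", "softlockup_count", "rcu_stall_count")


def read_counters(base="/sys/kernel"):
    """Return {name: int or None}. None means the file is missing or unreadable.

    Assumption: each file contains one decimal integer followed by a newline.
    """
    counts = {}
    for name in COUNTER_NAMES:
        try:
            counts[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            # Older kernel, detector not built in, or unexpected file content.
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, value in read_counters().items():
        print(f"{name}: {'unavailable' if value is None else value}")
```

The base parameter exists only so the function can be pointed at a scratch directory for testing. On a real system the default should suffice.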
Max Kellermann, who authored the patches for these counters, commented in the patch series:
"Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and 8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for oopses and warnings to sysfs, and these two patches do the same for hard/soft lockups and RCU stalls.
All of these counters are useful for monitoring tools to detect whether the machine is healthy. If the kernel has experienced a lockup or a stall, it's probably due to a kernel bug, and I'd like to detect that quickly and easily. There is currently no way to detect that, other than parsing dmesg. Or observing indirect effects: such as certain tasks not responding, but then I need to observe all tasks, and it may take a while until these effects become visible/measurable. I'd rather be able to detect the primary cause more quickly, possibly before everything falls apart."
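The monitoring use case described in the quote could look roughly like the following Python sketch, which polls the counters and reports any that have increased since the previous poll. This is only an assumption about how a monitoring tool might use the files, not something taken from the patch series. The function names (snapshot, new_events, watch) and the 60-second interval are hypothetical.

```python
import time
from pathlib import Path

# File names taken from the article; expected under /sys/kernel on Linux 6.16+.
COUNTERS = ("hardlockup_count", "softlockup_count", "rcu_stall_count")


def snapshot(base="/sys/kernel"):
    """Read whichever counters exist; counters that are absent are skipped."""
    snap = {}
    for name in COUNTERS:
        try:
            snap[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            pass  # not exposed by this kernel, or unexpected content
    return snap


def new_events(previous, current):
    """Return {name: increase} for counters that grew between two snapshots."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] > previous[name]
    }


def watch(interval_seconds=60, base="/sys/kernel"):
    """Poll forever, printing a line whenever a counter increases."""
    previous = snapshot(base)
    while True:
        time.sleep(interval_seconds)
        current = snapshot(base)
        for name, delta in new_events(previous, current).items():
            print(f"{name} increased by {delta} (now {current[name]})")
        previous = current
```

Because the counts are totals since boot, they start over after a reboot. This sketch simply ignores a counter that went down rather than treating it as an event.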
These handy /sys/kernel/hardlockup_count, /sys/kernel/softlockup_count, and /sys/kernel/rcu_stall_count counters were added via the [2]non-MM pull for Linux 6.16.
[1] https://www.phoronix.com/search/Linux+6.16
[2] https://lore.kernel.org/lkml/20250531153157.8fd9b708ae4009f5dbe81a9e@linux-foundation.org/