News: 0001603427

Linux Addressing Out-Of-Memory Killer Inaccuracy On Large Core Count Systems

(Linux Kernel, OOM Killer)


A patch on its way to the Linux kernel, possibly in time for the 6.20~7.0 cycle, addresses inaccurate out-of-memory (OOM) killer behavior on systems with large core counts.

The patch, by Linux developer Mathieu Desnoyers, was picked up this week in Andrew Morton's "mm-everything" queue.

In early 2025 it was [1]reported that the OOM killer could make inaccurate decisions on today's high core count systems, at least those with 250 or more cores/threads:

"Recently, several internal services had an RSS usage regression as part of a kernel upgrade. Previously, they were on a pre-6.2 kernel and were able to read RSS statistics in a backup watchdog process to monitor and decide if they'd overrun their memory budget. Now, however, a representative service with five threads, expected to use about a hundred MB of memory, on a 250-cpu machine had memory usage tens of megabytes different from the expected amount -- this constituted a significant percentage of inaccuracy, causing the watchdog to act.

...

This is a really tremendous inaccuracy for any few-threaded program on a large machine and impedes monitoring significantly. These stat counters are also used to make OOM killing decisions, so this additional inaccuracy could make a big difference in OOM situations -- either resulting in the wrong process being killed, or in less memory being returned from an OOM-kill than expected.

Finally, while the change to percpu_counter does significantly improve the accuracy over the previous per-thread error for many-threaded services, it does also have performance implications - up to 12% slower for short-lived processes and 9% increased system time in make test workloads."

[2]This patch, now working its way toward the mainline kernel and hopefully landing in the upcoming Linux 6.20~7.0 cycle, should address those inaccuracies.
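
To illustrate why a batched per-CPU counter, such as the percpu_counter mentioned in the quoted report, can misreport a small process's memory by tens of megabytes on a large machine, here is a toy Python model. It is only a sketch and not the kernel's code: the class and function names, the batch size of 64 pages, and the 4 KiB page size are all assumptions chosen for illustration.

```python
class BatchedCounter:
    """Toy model of a per-CPU batched counter (hypothetical, not kernel code)."""

    def __init__(self, ncpus, batch):
        self.batch = batch            # assumed flush threshold, in pages
        self.global_count = 0         # value seen by a cheap, approximate read
        self.local = [0] * ncpus      # per-CPU deltas not yet flushed

    def add(self, cpu, amount):
        """Account `amount` pages on `cpu`; flush once the local delta reaches the batch."""
        self.local[cpu] += amount
        if abs(self.local[cpu]) >= self.batch:
            self.global_count += self.local[cpu]
            self.local[cpu] = 0

    def read_fast(self):
        """Approximate read: ignores whatever is still parked in per-CPU deltas."""
        return self.global_count

    def read_exact(self):
        """Exact read: global value plus every per-CPU delta (cost grows with CPU count)."""
        return self.global_count + sum(self.local)


def worst_case_error(ncpus, batch):
    """Largest gap between read_exact() and read_fast() in this model, in pages."""
    return ncpus * (batch - 1)


if __name__ == "__main__":
    NCPUS = 250        # machine size taken from the quoted report
    BATCH = 64         # assumed batch size, for illustration only
    PAGE_SIZE = 4096   # assumed 4 KiB pages

    counter = BatchedCounter(NCPUS, BATCH)
    # A few threads migrating across CPUs can leave an almost-full delta on each one.
    for cpu in range(NCPUS):
        counter.add(cpu, BATCH - 1)

    hidden_pages = counter.read_exact() - counter.read_fast()
    print("fast read (pages): ", counter.read_fast())
    print("exact read (pages):", counter.read_exact())
    print("hidden from fast read (MiB): %.1f" % (hidden_pages * PAGE_SIZE / 2**20))
```

In this model the cheap read can lag the true value by up to ncpus * (batch - 1) pages, so the gap grows with the core count even when the process itself has only a handful of threads. That is the kind of error that would mislead both a monitoring watchdog and any OOM decision based on the same counters.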



[1] https://lore.kernel.org/lkml/20250331223516.7810-2-sweettea-kernel@dorminy.me/

[2] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=f85306789224f6862ba8bfc5e046a318a9fd58f7


