News: 0001506394


Linux Fixes Hosts Randomly Rebooting During Virtualization With Ryzen 7000/8000 CPUs

([AMD] 2 Hours Ago VMLOAD/VMSAVE On Zen4 Client)


Ahead of the [1]Linux 6.12 kernel release expected today, there is a last-minute "x86/urgent" pull request. Notable among these last-minute fixes for Linux 6.12, which are also to be back-ported to prior kernel versions, is a workaround for an issue with AMD Zen 4 client processors, such as the Ryzen 7000/8000 series, where making use of virtualization could lead to the host randomly rebooting.

The issue originates from [2]this bug report from July about random host reboots when running nested virtual machines on AMD Ryzen 7000/8000 series CPUs. The bug report noted:

"Running nested VMs on AMD Ryzen 7000/8000 (ZEN4) CPUs results in random host's reboots.

There is no kernel panic, no log entries, no relevant output to serial console. It is as if platform is simply hard reset. It seems time to reproduce it varies from system to system and can be dependent on workload and even specific CPU model."

Several months later, the cause has been identified: support for virtualized VMLOAD/VMSAVE is being wrongly advertised on the Zen 4 client processors. The change for Linux 6.12 and prior stable kernel versions therefore clears the VMLOAD/VMSAVE capability on the Zen 4 client SoCs. VMLOAD/VMSAVE remains supported and enabled on the AMD EPYC 4004/8004/9004 server processors; the issue affects only the Ryzen client processors.

[3]The patch by AMD Linux engineer Mario Limonciello explains:

"A number of Zen4 client SoCs advertise the ability to use virtualized VMLOAD/VMSAVE, but using these instructions is reported to be a cause of a random host reboot.

These instructions aren't intended to be advertised on Zen4 client so clear the capability."
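
To see whether a given host still advertises the capability, one option is to look at the CPU feature flags the kernel exports. The following is a minimal sketch rather than an official tool: it assumes the capability appears in /proc/cpuinfo under the flag name "v_vmsave_vmload", and the helper names are hypothetical. On a kernel carrying the fix, an affected Zen 4 client CPU would be expected to no longer list the flag.

```python
from pathlib import Path

# Assumption: the kernel exposes virtualized VMLOAD/VMSAVE under this
# flag name in /proc/cpuinfo.
FLAG = "v_vmsave_vmload"


def cpu_flags(cpuinfo_text):
    """Return the union of all entries found on 'flags' lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the plain 'flags' key, not e.g. 'vmx flags'.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def advertises_virtual_vmload_vmsave(cpuinfo_text):
    """True if any CPU in the given cpuinfo text lists the flag."""
    return FLAG in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        state = advertises_virtual_vmload_vmsave(path.read_text())
        print(f"{FLAG} advertised: {state}")
    else:
        print("/proc/cpuinfo not found (not a Linux host?)")
```

The parsing is kept separate from the file read so that it can be tried against saved cpuinfo output from another machine.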

This fix is in the [4]x86/urgent pull request. Also on the AMD side, there is a fix for a Kdump kernel failure on AMD Secure Memory Encryption (SME) systems when the kernel is built with CONFIG_IMA_KEXEC enabled.



[1] https://www.phoronix.com/news/Linux-6.12-Feature-Reminder

[2] https://bugzilla.kernel.org/show_bug.cgi?id=219009

[3] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0

[4] https://lore.kernel.org/lkml/20241117123710.GAZznjdtizJgrgwx1I@fat_crate.local/



sophisticles
