Linux Fixes Hosts Randomly Rebooting During Virtualization With Ryzen 7000/8000 CPUs
[AMD] VMLOAD/VMSAVE On Zen4 Client
- Reference: 0001506394
- News link: https://www.phoronix.com/news/Linux-Clear-VMLOAD-VMSAVE-Zen4
Ahead of the [1]Linux 6.12 kernel release expected today, there is a last-minute "x86/urgent" pull request. Notable among these x86 urgent fixes for Linux 6.12, which are also to be back-ported to prior kernel versions, is a workaround for an issue with AMD Ryzen Zen 4 client processors, such as the Ryzen 7000/8000 series, where using virtualization could lead to the host randomly rebooting.
The issue originates from [2]this bug report from back in July about random host reboots when running nested virtual machines on AMD Ryzen 7000/8000 series CPUs. The bug report noted:
"Running nested VMs on AMD Ryzen 7000/8000 (ZEN4) CPUs results in random host's reboots.
There is no kernel panic, no log entries, no relevant output to serial console. It is as if platform is simply hard reset. It seems time to reproduce it varies from system to system and can be dependent on workload and even specific CPU model."
Fast forward several months: the root cause is that VMLOAD/VMSAVE support is wrongly advertised on the Zen 4 client processors. So the change for Linux 6.12 and prior stable kernel versions is to clear the VMLOAD/VMSAVE capability on the Zen 4 client SoCs. VMLOAD/VMSAVE remains supported and enabled on the AMD EPYC 4004/8004/9004 server processors, as the issue only affects the Ryzen client processors.
[3]The patch by AMD Linux engineer Mario Limonciello explains:
"A number of Zen4 client SoCs advertise the ability to use virtualized VMLOAD/VMSAVE, but using these instructions is reported to be a cause of a random host reboot.
These instructions aren't intended to be advertised on Zen4 client so clear the capability."
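For readers who want to see whether their own host's kernel still advertises the capability, here is a minimal sketch. It assumes the kernel exposes virtualized VMLOAD/VMSAVE in /proc/cpuinfo under the flag name "v_vmsave_vmload" (the lowercase form of the kernel's X86_FEATURE_V_VMSAVE_VMLOAD); the helper names are hypothetical and the script only reports the kernel's view, not raw CPUID.

```python
"""Report whether the running kernel advertises virtualized VMLOAD/VMSAVE.

Assumption: the capability appears in /proc/cpuinfo as the flag
"v_vmsave_vmload". Helper names below are illustrative only.
"""

FLAG = "v_vmsave_vmload"


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect every token from all 'flags' lines in cpuinfo-formatted text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match only the exact "flags" key, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def advertises_v_vmsave_vmload(cpuinfo_text: str) -> bool:
    """True if any CPU entry lists the virtualized VMLOAD/VMSAVE flag."""
    return FLAG in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            text = f.read()
    except OSError:
        print("/proc/cpuinfo not available (not a Linux host?)")
    else:
        state = "advertised" if advertises_v_vmsave_vmload(text) else "not advertised"
        print(f"{FLAG}: {state}")
```

On a Zen 4 client host running a kernel with this fix, one would expect the flag to be absent, while EPYC server parts should still list it. Because the fix clears the kernel's capability bit rather than changing the hardware, a raw CPUID query from userspace may still report the feature.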
This fix is in the [4]x86/urgent pull request. Also on the AMD side, there is a fix for a Kdump kernel failure on AMD Secure Memory Encryption (SME) systems when the kernel is built with CONFIG_IMA_KEXEC enabled.
[1] https://www.phoronix.com/news/Linux-6.12-Feature-Reminder
[2] https://bugzilla.kernel.org/show_bug.cgi?id=219009
[3] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0
[4] https://lore.kernel.org/lkml/20241117123710.GAZznjdtizJgrgwx1I@fat_crate.local/