Linux 6.18 Lands Retpoline Optimization To Help With Intel E Cores
([Intel] 2 Hours Ago
Retpoline Optimization)
- Reference: 0001583944
- News link: https://www.phoronix.com/news/Linux-6.18-Retpoline-Opt
The Linux 6.18 merge window is winding down this weekend, with Linux 6.18-rc1 expected on Sunday. Merged today were some remaining x86 core updates, which include a Retpoline optimization patch intended to help Intel E-core CPUs.
Return trampolines, or "[1]Retpolines", are needed for Spectre Variant Two mitigations. Intel engineer Peter Zijlstra landed a patch optimizing the x86 patch_retpoline() code within the kernel. He explains in [2]the patch:
Currently the very common retpoline: "CS CALL __x86_indirect_thunk_r11" is transformed into "CALL *R11; NOP3" for eIBRS/BHI_NO parts.
Similarly, paranoid fineibt has: "CALL *R11; NOP".
Recognise that CS stuffing can avoid the extra NOP. However, due to prefix decode penalties, make sure to not emit too many CS prefixes. Notably: "CS CALL __x86_indirect_thunk_rax" must not become "CS CS CS CS CALL *RAX". Prefix decode penalties are typically many more cycles than decoding an extra NOP.
Additionally, if the retpoline is a tail-call, the "JMP *%\reg" should be followed by INT3 for straight-line-speculation mitigation, since emit_indirect() now has a length argument, move this into emit_indirect() such that other users (paranoid-fineibt) also do this.
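To make the byte-level idea concrete, here is a minimal user-space sketch of the encoding described in the commit message. It is not the kernel's emit_indirect(): the function name emit_indirect_sketch(), its signature, the single-byte NOP padding, and the choice to fill every byte after a tail-call JMP with INT3 are all assumptions made for illustration. Only the cap of three CS prefixes and the INT3 following a JMP come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CS_PREFIX        0x2e  /* CS segment override, used purely as padding */
#define REX_B            0x41  /* REX.B, needed to address r8..r15 */
#define OPC_GRP5         0xff  /* opcode shared by CALL *reg and JMP *reg */
#define INT3_OPCODE      0xcc
#define NOP1_OPCODE      0x90
#define MAX_CS_PREFIXES  3     /* prefix budget quoted in the mailing list post */

/*
 * Hypothetical helper: fill exactly `len` bytes at `buf` with an indirect
 * CALL or JMP through register `reg` (0 = rax ... 11 = r11 ... 15 = r15).
 * Returns the number of bytes written, or -1 if `len` is too small.
 */
int emit_indirect_sketch(bool is_jmp, int reg, uint8_t *buf, int len)
{
    assert(reg >= 0 && reg < 16);

    int insn_len = 2 + (reg >= 8);   /* FF /r, plus REX.B for r8..r15 */
    int spare = len - insn_len;      /* bytes left over at the call site */
    int i = 0;

    if (spare < 0)
        return -1;

    if (!is_jmp) {
        /* Absorb spare bytes as CS prefixes, but never more than the budget. */
        int cs = spare < MAX_CS_PREFIXES ? spare : MAX_CS_PREFIXES;
        for (int n = 0; n < cs; n++)
            buf[i++] = CS_PREFIX;
    }

    if (reg >= 8)
        buf[i++] = REX_B;
    buf[i++] = OPC_GRP5;
    /* ModRM: mod=11, /2 for CALL (0xd0 base) or /4 for JMP (0xe0 base). */
    buf[i++] = (uint8_t)((is_jmp ? 0xe0 : 0xd0) + (reg & 7));

    /*
     * Trailing bytes: INT3 after a tail-call JMP (straight-line-speculation
     * mitigation), otherwise a NOP for whatever the prefixes did not absorb.
     * Simplification: the sketch uses one-byte NOPs only.
     */
    while (i < len)
        buf[i++] = is_jmp ? INT3_OPCODE : NOP1_OPCODE;

    return i;
}
```

Under these assumptions, a 6-byte "CS CALL __x86_indirect_thunk_r11" site becomes 2E 2E 2E 41 FF D3 with no NOP at all, while the same site targeting RAX gets three CS prefixes, FF D0, and a single trailing NOP rather than a fourth prefix.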
The original [3]mailing list post for the patch adds more context:
"Finding the exact prefix decode penalties for uarchs that have eIBRS/BHI_NO is not a fun time. I've stuck to the general wisdom that 3 prefixes is mostly good (notably, the instruction at hand has no 0x0f escape which is sometimes counted towards the prefix budget -- it can have a REX prefix, but those are generally not counted towards the prefix budget).
In general Intel P-cores do not have prefix decode penalties, but the E-cores (or rather the Atom line) generally does. And since this all runs on hybrid cores, the code must accommodate them.
I hate all this."
That patch was merged to Linux Git today via the [4]x86/core pull ahead of Linux 6.18-rc1 tomorrow.
[1] https://www.phoronix.com/search/Retpolines
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=4a1e02b15ac174c3c6d5e358e67c4ba980e7b336
[3] https://lore.kernel.org/all/20250902104627.GM4068168@noisy.programming.kicks-ass.net/T/#u
[4] https://lore.kernel.org/lkml/20251011134629.GAaOpftWmLdD6L7bJn@fat_crate.local/