GCC Git Adjusts Unaligned Load/Store Costs For AMD Zen 4 & Zen 5
([GNU] 6 Hours Ago
znver4 + znver5 Tuning)
- Reference: 0001478784
- News link: https://www.phoronix.com/news/GCC-Zen-Unaligned-Load-Store
- Source link:
A recent investigation into a GCC [1] regression on Zen 4 revealed that the unaligned load/store costs for the Zen 4 and Zen 5 targets were inaccurate. They have now been corrected in GCC Git.
GCC compiler expert Richard Biener of SUSE reported the original Zen 4 regression and went on to analyze and fix the issue. In updating the unaligned load/store costs for the "znver4" compiler target, he [2] explained:
"Fixup unaligned load/store cost for znver4
Currently unaligned YMM and ZMM load and store costs are cheaper than aligned which causes the vectorizer to purposely mis-align accesses by adding an alignment prologue. It looks like the unaligned costs were simply left untouched from znver3 where they equate the aligned costs when tweaking aligned costs for znver4. The following makes the unaligned costs equal to the aligned costs.
This avoids the miscompile seen in PR115843 but it's of course not a real fix for the issue uncovered there. But it makes it qualify as a regression fix."
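For readers unfamiliar with the term, an "alignment prologue" is a handful of scalar iterations peeled off before the vector loop, which shifts the alignment of the accesses in the main loop. Below is a minimal hand-written sketch of that loop shape in C, peeling in the usual direction (until the pointer is aligned). The helper names `peel_count` and `scale` and the 32-byte (YMM-sized) width are assumptions for illustration, not GCC internals; per the commit message, the bogus costs led the vectorizer to pick a peel that left accesses misaligned instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 32 /* assumed vector width in bytes (YMM-sized) */

/* Hypothetical helper: number of leading float elements a scalar
   prologue must handle so the next element sits on a vec_bytes
   boundary, capped at n. Assumes p is at least float-aligned. */
static size_t peel_count(const float *p, size_t n, size_t vec_bytes)
{
    assert(vec_bytes != 0 && vec_bytes % sizeof(float) == 0);
    size_t mis = (size_t)((uintptr_t)p % vec_bytes);
    if (mis == 0)
        return 0;
    size_t peel = (vec_bytes - mis) / sizeof(float);
    return peel < n ? peel : n;
}

/* Multiply n floats by k using the prologue / main body / epilogue
   shape that a vectorizer typically produces when it peels. */
void scale(float *a, size_t n, float k)
{
    const size_t lanes = VEC_BYTES / sizeof(float);
    size_t i = 0;
    size_t peel = peel_count(a, n, VEC_BYTES);

    for (; i < peel; i++)               /* alignment prologue (scalar) */
        a[i] *= k;
    for (; i + lanes <= n; i += lanes)  /* main body: one vector per step */
        for (size_t j = 0; j < lanes; j++)
            a[i + j] *= k;
    for (; i < n; i++)                  /* epilogue: leftover elements */
        a[i] *= k;
}
```

The prologue only pays off when the accesses it enables are modeled as cheaper than the ones it replaces, which is why the relative aligned and unaligned costs matter so much.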
It's not the first time AMD Zen targets have turned out a bit hairy because a new target started out as a copy of the prior Zen revision and its cost tables were not fully updated afterward.
Similarly, the Zen 5 (znver5) target was originally copied over too and thus also needed [3] updating:
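To see why a stale table entry matters, here is a toy cost comparison in C. It is a deliberately simplified sketch, not GCC's actual vectorizer cost model: the `vec_mem_cost` struct, the `choose_peel` function, and every number passed to it are hypothetical and chosen only to show the direction of the effect.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one per-target cost entry.
   Any values used with it are illustrative, not GCC's real costs. */
struct vec_mem_cost {
    int aligned;   /* modeled cost of one aligned vector access */
    int unaligned; /* modeled cost of one unaligned vector access */
};

enum peel_choice { KEEP_AS_IS, PEEL_TO_ALIGN, PEEL_TO_MISALIGN };

/* Toy decision: compare keeping the current alignment against paying
   a one-time prologue cost to flip it, over vec_iters vector accesses. */
enum peel_choice choose_peel(struct vec_mem_cost c, bool currently_aligned,
                             long vec_iters, int prologue_cost)
{
    long keep = (currently_aligned ? c.aligned : c.unaligned) * vec_iters;
    long flip = (currently_aligned ? c.unaligned : c.aligned) * vec_iters
                + prologue_cost;

    if (flip < keep)
        return currently_aligned ? PEEL_TO_MISALIGN : PEEL_TO_ALIGN;
    return KEEP_AS_IS;
}
```

In this toy model, an entry where unaligned is cheaper than aligned (say 6 versus 8) makes `choose_peel` return `PEEL_TO_MISALIGN` for already-aligned data over enough iterations, while equal costs always yield `KEEP_AS_IS` because the prologue never pays for itself.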
"Currently unaligned YMM and ZMM load and store costs are cheaper than aligned which causes the vectorizer to purposely mis-align accesses by adding an alignment prologue. It looks like the unaligned costs were simply copied from the bogus znver4 costs. The following makes the unaligned costs equal to the aligned costs like in the fixed znver4 version."
Since they are being treated as a "regression fix", these AMD Zen tuning patches should at least be picked up in time for the upcoming GCC 14.2 point release.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115843
[2] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1e3aa9c9278db69d4bdb661a750a7268789188d6
[3] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=896393791ee34ffc176c87d232dfee735db3aaab