AMD Zen 5 Tuning "Part Two" Merged For GCC Compiler

Reference: 0001489456
News link: https://www.phoronix.com/news/AMD-Zen-5-Tuning-Part-2-GCC

A second round of AMD Zen 5 "[1]znver5" tuning was merged today for the in-development GCC 15 compiler and may be back-ported to the next GCC 14 point release.

GNU Compiler Collection (GCC) expert Jan Hubicka of SUSE's compiler team worked out this latest round of compiler tuning, which benefits the Ryzen AI 300 series, Ryzen 9000 series desktop processors, and upcoming EPYC Turin processors. As with past generations of Zen processors, AMD has largely relied on SUSE's compiler engineers for much of the enablement and tuning work in this leading open-source compiler.

This "round 2" tuning of the AMD Znver5 compiler support focuses on disabling the use of gather and scatter instructions by default. As with [2]past AMD Zen tuning for GCC, this is done in the name of overall performance.
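For readers unfamiliar with where gather comes into play, below is a minimal sketch in C of a sparse dot product, the kind of indexed-load loop a vectorizer can implement with gather instructions. The function name `sparse_dot` and the build flags in the comment are illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse dot product: sum of val[i] * x[idx[i]].
 * The indexed load x[idx[i]] is the access a vectorizer may turn into a
 * gather instruction when gather use is enabled for the target.
 *
 * Hypothetical build line for experimenting (assumptions: your GCC accepts
 * -march=znver5, and "use_gather" is a valid feature name for the developer
 * option -mtune-ctrl in that version; check its x86 tuning definitions):
 *   gcc -O3 -ffast-math -march=znver5 -mtune-ctrl=use_gather -c sparse.c
 */
double sparse_dot(const double *val, const int *idx, const double *x, size_t n)
{
    assert(val != NULL && idx != NULL && x != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += val[i] * x[idx[i]];   /* indexed (gather-style) load of x */
    return sum;
}
```

Comparing the generated assembly with and without the tuning flag is one way to see whether the vectorizer actually chose a gather for this loop; whether it does depends on the GCC version and options used.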

Jan Hubicka explained in today's [4]patch to the GCC compiler:

"We disable gathers for zen4. It seems that gather has improved a bit compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when the indices are known ahead of time. Vector loads followed by shuffles result in a higher load bandwidth." however the situation seems to be more complicated.

gather is 5-10% loss on parest benchmark as well as 30% loss on sparse dot products in TSVC. Curiously enough breaking these out into microbenchmark reversed the situation and it turns out that the performance depends on how indices are distributed. gather is loss if indices are sequential, neutral if they are random and win for some strides (4, 8).

This seems to be similar to earlier zens, so I think (especially for backporting znver5 support) that it makes sense to be consistent and disable gather unless we work out a good heuristics on when to use it. Since we typically do not know the indices in advance, I don't see how that can be done."
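The dependence on index distribution described in the quote can be explored with a small harness. The sketch below, in C, only sets up the three index patterns mentioned (sequential, strided, random) and an indexed sum over them; it is not Hubicka's microbenchmark, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential indices: 0, 1, 2, ... (the case reported as a loss for gather). */
void fill_sequential(int *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (int)i;
}

/* Strided indices: 0, stride, 2*stride, ... wrapped into the table. */
void fill_strided(int *idx, size_t n, size_t stride, size_t table_len)
{
    assert(table_len > 0);
    for (size_t i = 0; i < n; i++)
        idx[i] = (int)((i * stride) % table_len);
}

/* Pseudo-random indices from a xorshift32 generator (deterministic per seed). */
void fill_random(int *idx, size_t n, size_t table_len, uint32_t seed)
{
    assert(table_len > 0);
    uint32_t s = seed ? seed : 1u;   /* xorshift state must be non-zero */
    for (size_t i = 0; i < n; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        idx[i] = (int)(s % table_len);
    }
}

/* Kernel under test: the table[idx[i]] load is the gather candidate. */
float indexed_sum(const float *table, const int *idx, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += table[idx[i]];
    return sum;
}
```

To turn this into a measurement, one would time `indexed_sum` over large arrays for each pattern and build the file twice, once with gather use enabled and once without. The results will depend on the CPU, the compiler version, and the array sizes chosen.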

[5]This GCC bug report was opened today to track gather instruction performance on Zen CPUs. In response to another developer's question about when gather instructions are a win, the answer was: "it is mysterious.."

Part 1 of the Znver5 optimizations was to [6]avoid FMA chains, since they don't perform as well on Zen 5 processors as on Znver4.

Hopefully we'll see more AMD Zen 5 compiler tuning soon for GCC. We are also still waiting on Znver5 enablement for the LLVM/Clang compiler. As of writing, Znver5 support hasn't landed in LLVM Git, nor are there any open pull requests from AMD or their partners providing that support. That's particularly unfortunate with AMD Ryzen AI 300 and Ryzen 9000 series processors already shipping and [7]LLVM 19 being released in the coming days.

[1] https://www.phoronix.com/search/Znver5

[2] https://www.phoronix.com/news/GCC-12-Znver3-Gather-Tweak

[4] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d82edbe92eed53a479736fcbbe6d54d0fb42daa4

[5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116582

[6] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d6360b4083695970789fd65b9c515c11a5ce25b4

[7] https://www.phoronix.com/news/LLVM-Clang-19-Feature-Freeze

phoronix