

GNU C Library Sees Up To 12.9x Improvement With New Generic FMA Implementation

([GNU] 6 Hours Ago Faster Glibc)


Just a few days ago I wrote about [1]the Glibc math code seeing a 4x improvement on AMD Zen from changing which FMA implementation is used. Merged overnight was a new generic FMA implementation for the GNU C Library that yields up to a 12.9x throughput improvement on AMD Zen 3.

Adhemerval Zanella contributed this new generic FMA implementation to the GNU C Library. Zanella explained in [2]the patch landing this new generic Fused Multiply Add (FMA) implementation:

"The current implementation relies on setting the rounding mode for different calculations (first to FE_TONEAREST and then to FE_TOWARDZERO) to obtain correctly rounded results. For most CPUs, this adds a significant performance overhead since it requires executing a typically slow instruction (to get/set the floating-point status), it necessitates flushing the pipeline, and breaks some compiler assumptions/optimizations.

This patch introduces a new implementation originally written by Szabolcs for musl, which utilizes mostly integer arithmetic. Floating-point arithmetic is used to raise the expected exceptions, without the need for fenv.h operations.

I added some changes compared to the original code:

* Fixed some signaling NaN issues when the 3-argument is NaN.

* Use math_uint128.h for the 64-bit multiplication operation. It allows the compiler to use 128-bit types where available, which enables some optimizations on certain targets (for instance, MIPS64).

* Fixed an arm32 issue where the libgcc routine might not respect the rounding mode. This can also be used on other targets to optimize the conversion from int64_t to double.

* Use -fexcess-precision=standard on i686."
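As a rough illustration of the "mostly integer arithmetic" idea and the 128-bit multiplication mentioned above, here is a small C sketch. It is not the Glibc or musl code and does not reproduce the math_uint128.h interface: the struct u128 type and the significand53 and mul64x64 helpers are hypothetical names made up for this example. It assumes IEEE 754 binary64 doubles and only handles normal, finite inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: a 128-bit value as two 64-bit halves. */
struct u128 { uint64_t hi, lo; };

/* Returns the 53-bit significand of a normal, finite double with the
   implicit leading 1 restored. Zero, subnormals, infinities and NaN
   are deliberately not handled in this sketch. */
uint64_t significand53(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */
    return (bits & ((UINT64_C(1) << 52) - 1)) | (UINT64_C(1) << 52);
}

/* Full 64x64 -> 128-bit unsigned product. */
struct u128 mul64x64(uint64_t a, uint64_t b)
{
    struct u128 r;
#ifdef __SIZEOF_INT128__
    /* Compilers with a native 128-bit type (GCC/Clang on 64-bit targets). */
    unsigned __int128 p = (unsigned __int128)a * b;
    r.hi = (uint64_t)(p >> 64);
    r.lo = (uint64_t)p;
#else
    /* Portable fallback: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;
    /* Sum of the three 32-bit pieces that land in bits 32..63, plus carries. */
    uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
    r.lo = (mid << 32) | (ll & 0xffffffffu);
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
#endif
    return r;
}
```

Since each significand is at most 53 bits, mul64x64(significand53(x), significand53(y)) holds the exact product in at most 106 bits with no rounding. A software FMA along these lines can then add the third operand and round once, instead of switching the floating-point rounding mode back and forth.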

This new musl libc-based implementation is showing some "large improvements" in tests carried out by Adhemerval Zanella.

In another commit, Adhemerval Zanella summed up the recent math improvements made for Glibc 2.43 as:

"* Additional optimized and correctly rounded mathematical functions have been imported from the CORE-MATH project, in particular acosh, asinh, atanh, erf, erfc, lgamma, and tgamma.

* Optimized implementations for remainder, remainderf, frexpf, frexp, frexpl (binary128), and frexpl (intel96) have been added.

* The SVID handling for acosf, acoshf, asinhf, atan2f, atanhf, coshf, lgammaf/lgammaf_r, log10f, sinhf, sqrtf, tgammaf, y0/j0, y1/j1, and yn/jn were moved to compat symbols, allowing improvements in performance."

Look for these improvements and more with Glibc 2.43 due for release in February.



[1] https://www.phoronix.com/news/Glibc-4x-FMA-Improvement-Zen

[2] https://sourceware.org/git/?p=glibc.git;a=commit;h=bf211c34993921eccbc074f82cfbb8e9a16d850c


