News: 0001518482


OpenBLAS 0.3.29 Brings Auto-Detection For Intel Granite Rapids, Apple M4 & AMD Zen 5

([Programming] 63 Minutes Ago OpenBLAS 0.3.29)


OpenBLAS 0.3.29 is out today as a big update to this widely used open-source implementation of the Basic Linear Algebra Subprograms (BLAS) and LAPACK APIs.

OpenBLAS 0.3.29 brings improved thread scaling for multi-threaded SBGEMV and TRTRI, various multi-threading fixes, improved documentation, and other general fixes.

When it comes to CPU/platform-specific work, there is initial support for detecting Apple M4 SoCs, various ARM64 performance optimizations, a number of x86_64 improvements, improved CGEMM and ZGEMM kernels for POWER10, many LoongArch 64-bit improvements, and some tuning/optimizations for RISC-V.


On the x86_64 side, OpenBLAS 0.3.29 adds CPU auto-detection for Intel Granite Rapids processors and for AMD Zen 5 series processors, an optimized SOMATCOPY_CT for AVX-capable targets, and a variety of other fixes/optimizations.
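For readers who want to see what the auto-detection resolves to on their own machine, here is a minimal sketch in Python. It assumes a shared OpenBLAS library is installed where `ctypes.util.find_library` can find it, and that the build exports the `openblas_get_corename` function without a symbol suffix. The helper name `detected_openblas_core` is made up for this illustration. The article does not say which core names the newly detected CPUs report, so the sketch only prints whatever the library returns.

```python
import ctypes
import ctypes.util
from typing import Optional


def detected_openblas_core() -> Optional[str]:
    """Return the core name reported by the installed OpenBLAS, or None.

    Hypothetical helper for illustration: it returns None when no OpenBLAS
    shared library is found, when it cannot be loaded, or when the build
    does not export an unsuffixed openblas_get_corename symbol.
    """
    path = ctypes.util.find_library("openblas")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_corename = lib.openblas_get_corename
    except (OSError, AttributeError):
        return None
    # The C function takes no arguments and returns a NUL-terminated string.
    get_corename.argtypes = []
    get_corename.restype = ctypes.c_char_p
    name = get_corename()
    return name.decode("ascii", "replace") if name else None


if __name__ == "__main__":
    print(detected_openblas_core())
```

On a build with runtime CPU dispatch, the printed name reflects the kernel set chosen for the running processor. On a build targeted at a single CPU, it reflects the compile-time target instead.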

Downloads and more details on OpenBLAS 0.3.29 are available via [2] GitHub.




[2] https://github.com/OpenMathLib/OpenBLAS/releases/tag/v0.3.29



phoronix
