News: 0001539011


PostgreSQL Turns To AVX-512 For CRC32C Computations: Up To 3x Faster



In addition to the recent optional IO_uring support [1] for the PostgreSQL database server on Linux and the async I/O batch mode [2], another exciting performance improvement was merged this week: support for using AVX-512 instructions for CRC32C computations.

Leveraging AVX-512 instructions on capable AMD and Intel processors can yield some wild performance improvements in the CRC32C (cyclic redundancy check) code path of this popular open-source database. The commit [3] adding this AVX-512 usage for CRC32C calculations explains:

"Compute CRC32C using AVX-512 instructions where available

The previous implementation of CRC32C on x86 relied on the native CRC32 instruction from the SSE 4.2 extension, which operates on up to 8 bytes at a time. We can get a substantial speedup by using carryless multiplication on SIMD registers, processing 64 bytes per loop iteration. Shorter inputs fall back to ordinary CRC instructions. On Intel Tiger Lake hardware (2020), CRC is now 50% faster for inputs between 64 and 112 bytes, and 3x faster for 256 bytes.

The VPCLMULQDQ instruction on 512-bit registers has been available on Intel hardware since 2019 and AMD since 2022. There is an older variant for 128-bit registers, but at least on Zen 2 it performs worse than normal CRC instructions for short inputs.

We must now do a runtime check, even for builds that target SSE 4.2. This doesn't matter in practice for WAL (arguably the most critical case), because since commit e2809e3 the final computation with the 20-byte WAL header is inlined and unrolled when targeting that extension. Compared with two direct function calls, testing showed equal or slightly faster performance in performing an indirect function call on several dozen bytes followed by inlined instructions on constant input of 20 bytes."

That is 50% to 3x faster on Intel Tiger Lake with this AVX-512 CRC32C code, depending on input size: 50% faster for inputs of 64 to 112 bytes and 3x faster at 256 bytes! It will be interesting to see whether the gains are even more pronounced on newer Intel Xeon server processors with better AVX-512 support, or on AMD Zen 4 and Zen 5 processors with their widespread AVX-512 support.

This is another great improvement for the open-source PostgreSQL 18 database server ahead of that next major feature release due out around September.



[1] https://www.phoronix.com/news/PostgreSQL-Lands-IO_uring

[2] https://www.phoronix.com/news/PostgreSQL-AIO-Batch-Mode

[3] https://github.com/postgres/postgres/commit/3c6e8c123896584f1be1fe69aaf68dcb5eb094d5


