News: 0001528395


Latest AVX-512 Optimization For FFmpeg Shows Wild Improvement On AMD Ryzen

([Multimedia] 4 Hours Ago FFmpeg AVX-512)


Merged today for the widely-used FFmpeg open-source multimedia library was yet another AVX-512 optimized code path... Compared to the pure C code, the AVX2 code path was 10.98x faster while this new AVX-512 code path clocks in at 18x the performance of the common C code.

The latest FFmpeg code to receive the AVX-512 treatment is the uyvytoyuv422 function for UYVY to YUV422 format conversion. The AVX-512 code path, written in hand-tuned assembly, is a great benefit here. AVX-512 is found mainly on Intel Xeon processors and on all AMD Ryzen and EPYC processors since Zen 4. The benchmarks posted for this patch were carried out on an AMD Ryzen 9 7950X.
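For readers unfamiliar with the conversion itself, here is a minimal scalar sketch of what a UYVY to planar YUV 4:2:2 routine does for one row of pixels. It is not FFmpeg's code: the function name uyvy_row_to_planar and its parameters are hypothetical, and it assumes the usual UYVY byte order of U0 Y0 V0 Y1 for each pair of pixels and an even width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg's uyvytoyuv422): splits one packed UYVY
 * row into separate Y, U and V rows.
 * Assumed layout: every 4 input bytes hold U0 Y0 V0 Y1 for two pixels. */
static void uyvy_row_to_planar(const uint8_t *src, uint8_t *y,
                               uint8_t *u, uint8_t *v, size_t width)
{
    /* 4:2:2 shares one U and one V sample per pixel pair, so this sketch
     * only accepts even widths. */
    assert(width % 2 == 0);

    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0];
        y[2 * i]     = src[4 * i + 1];
        v[i]         = src[4 * i + 2];
        y[2 * i + 1] = src[4 * i + 3];
    }
}
```

A loop like this handles one pixel pair per iteration, which is the kind of byte shuffling that wide SIMD code can do for many pixels at once.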

The gains are substantial: the AVX-512 code path hits 18.02x the performance of the common C path, while the AVX2-only path reaches 10.98x. That makes the new path roughly 1.6x faster than the AVX2 one.

Shreesh Adiga, who authored [1]the patch, explained:

"The scalar loop is replaced with masked AVX512 instructions. For extracting the Y from UYVY, vperm2b is used instead of various AND and packuswb.

Instead of loading the vectors with interleaved lanes as done in AVX2 version, normal load is used. At the end of packuswb, for U and V, an extra permute operation is done to get the required layout."
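To illustrate the idea of pulling Y out with a single byte permute, here is a scalar C model of a two-source byte permute over 64-byte vectors. It is a sketch under assumptions, not the patch's assembly: it assumes the "vperm2b" in the quote refers to the AVX-512 VBMI two-source byte permutes (VPERMI2B/VPERMT2B), where bit 6 of each index picks the source and bits 0-5 pick the byte. The names permute2_bytes and extract_y64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a two-source byte permute on 64-byte vectors.
 * Assumption: index bit 6 selects the source (0 = a, 1 = b), bits 0-5
 * select the byte within it, and bit 7 is ignored. */
static void permute2_bytes(uint8_t dst[64], const uint8_t a[64],
                           const uint8_t b[64], const uint8_t idx[64])
{
    /* This model writes dst byte by byte, so it must not alias a source. */
    assert(dst != a && dst != b);

    for (int i = 0; i < 64; i++) {
        uint8_t sel = idx[i] & 0x7F;
        dst[i] = (sel & 0x40) ? b[sel & 0x3F] : a[sel & 0x3F];
    }
}

/* Hypothetical use: gather the 64 Y bytes (the odd positions) from
 * 128 bytes of packed UYVY with one permute, no AND or pack step. */
static void extract_y64(uint8_t y[64], const uint8_t uyvy[128])
{
    uint8_t idx[64];

    for (int i = 0; i < 64; i++)
        idx[i] = (uint8_t)(2 * i + 1);   /* 1, 3, 5, ..., 127 */

    permute2_bytes(y, uyvy, uyvy + 64, idx);
}
```

In real SIMD code the index table would be a constant loaded once, so each pair of input vectors yields a full vector of Y samples with one shuffle instruction.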

A nice win for the next FFmpeg release, assuming your CPU supports AVX-512. That's especially true for AMD Zen 4, and even more so for Zen 5 given [2]the great AVX-512 showing across AMD's entire CPU product stack.



[1] https://github.com/FFmpeg/FFmpeg/commit/e18f87ed9f9f61c980420b315dc8ecb308831bc5

[2] https://www.phoronix.com/review/amd-epyc-turin-avx512


