New FFmpeg AVX-512 Optimizations Hit Up To 36x The Performance Of Plain C Code

([Multimedia] 5 Hours Ago scene_sad + AVX-512)


Commits merged today to FFmpeg Git add more hand-tuned assembly code for [1]AVX-512 on capable Intel and AMD processors.

Open-source multimedia developer Niklas Haas today upstreamed some additional AVX2 and AVX-512 tuning to FFmpeg, on top of the multimedia library's already vast array of hand-tuned code for leveraging Advanced Vector Extensions.

For FFmpeg's avfilter scene_sad code, there is now [2]an AVX-512 implementation that runs at 36.31x the speed of the plain C code, according to benchmarks run by Niklas Haas. The existing AVX2 path already achieved 25x the performance of the common C code; the new AVX-512 path exceeds 36x.
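For readers unfamiliar with the operation being tuned, SAD stands for sum of absolute differences: the per-pixel differences between two frames are added up, and a large total hints at a scene change. Below is a minimal scalar sketch of that idea in C. The function name sad_plane_u8 and its signature are hypothetical, chosen for illustration; this is not FFmpeg's actual code, only the kind of plain loop that the assembly versions are measured against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference routine (not FFmpeg's API): sums |a - b| over
 * every pixel of two 8-bit planes. Strides are given in bytes, which for
 * 8-bit samples is the same as a stride in pixels. */
static uint64_t sad_plane_u8(const uint8_t *a, ptrdiff_t a_stride,
                             const uint8_t *b, ptrdiff_t b_stride,
                             int width, int height)
{
    uint64_t sum = 0;

    assert(width >= 0 && height >= 0);

    for (int y = 0; y < height; y++) {
        /* uint8_t operands promote to int, so the subtraction cannot wrap. */
        for (int x = 0; x < width; x++)
            sum += (uint64_t)abs(a[x] - b[x]);
        a += a_stride;  /* advance both planes to the next row */
        b += b_stride;
    }
    return sum;
}
```

A SIMD version does the same arithmetic on many pixels per instruction (a 512-bit register holds 64 bytes), which is where speedups of this size come from.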

[3]Another commit added high bit depth AVX2 and AVX-512 versions of the scene_sad avfilter code. Here the AVX2 path is around 11x faster than the common C code, and the AVX-512 path around 22x.
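High bit depth video stores each sample in more than 8 bits, typically in a 16-bit word. As a rough sketch of what the scalar baseline looks like in that case, the same sum can be taken over 16-bit samples. The name sad_plane_u16 is hypothetical and, as an assumption for simplicity, the strides here are counted in samples rather than bytes; it is not taken from the FFmpeg commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not FFmpeg's API): sum of absolute
 * differences over two planes of 16-bit samples.
 * Assumption: strides are counted in samples, not bytes. */
static uint64_t sad_plane_u16(const uint16_t *a, ptrdiff_t a_stride,
                              const uint16_t *b, ptrdiff_t b_stride,
                              int width, int height)
{
    uint64_t sum = 0;

    assert(width >= 0 && height >= 0);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Subtract the smaller from the larger to stay unsigned. */
            uint16_t hi = a[x] > b[x] ? a[x] : b[x];
            uint16_t lo = a[x] > b[x] ? b[x] : a[x];
            sum += (uint64_t)(hi - lo);
        }
        a += a_stride;  /* advance both planes to the next row */
        b += b_stride;
    }
    return sum;
}
```

With 16-bit samples a vector register of a given width holds half as many values as it does with 8-bit samples.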

AVX-512 continues to pay off, particularly on the latest AMD Zen 4 / Zen 5 and recent Intel Xeon processors.



[1] https://www.phoronix.com/search/AVX-512

[2] https://github.com/FFmpeg/FFmpeg/commit/91f2d146d418d536e14b0d0c2d32f81cb95f6b7f

[3] https://github.com/FFmpeg/FFmpeg/commit/e44a1aaeecc14fc396e0c715969ddd3cc939933d



