
Intel oneDNN 3.8 Brings More CPU & GPU Performance Optimizations

([Intel] 6 Hours Ago oneDNN 3.8)


Intel software engineers released oneDNN 3.8 to close out the week with various new performance optimizations and more.

The Intel oneDNN library, now part of the UXL Foundation, provides the basic building blocks for AI / deep learning applications. It is aggressively optimized for Intel's hardware offerings but over time has also developed robust support for competitor hardware platforms.

With oneDNN 3.8 there are continued Intel AMX enhancements, better Panther Lake Xe3 integrated graphics performance, refinements for existing Xe2 graphics support, and other optimizations to benefit Intel's recent and upcoming CPU and GPU products.

"Intel Architecture Processors

- Improved matmul and inner product primitives performance on processors with Intel AMX instruction set support.

- Improved performance of convolution and inner product primitives on processors with Intel AVX2 instruction set support.

- Improved performance of int8 convolution support with zero points.

- Improved fp32 convolution performance with fp16 and bf16 compressed weights on processors with Intel AVX2 or Intel AVX-512 instruction set support.

- Improved fp16/bf16 depthwise convolution performance with fp32 bias or sum post-ops or dilation.

- Improved bf16 pooling backpropagation performance.

- Improved binary post-ops performance with per_w broadcast.

Intel Graphics Products

- Improved performance on Intel Arc graphics for future Intel Core Ultra processors (code name Panther Lake).

- Improved convolution performance on:

  - Intel Arc Graphics for Intel Core Ultra processor series 2 (formerly Lunar Lake).

  - Intel Arc B-series discrete graphics (formerly Battlemage).

- Improved int8 matmul performance with zero-points support for source and weight tensors.

- Improved f4_e2m1 and f4_e3m0 matmul and reorder performance.

- Improved performance of the following subgraphs with Graph API:

  - Scaled Dot Product Attention (SDPA) with int4 and int8 compressed key and value.

  - fp16/bf16 SDPA with fp32 intermediate data types. Using fp32 intermediate data types is recommended.

  - SDPA with head size 512 and 576.

  - Grouped Query Attention (GQA) with 5D input tensors."

The oneDNN 3.8 release also has FP16, INT8, and BF16 optimizations for AArch64 processors, Graph API support for NVIDIA GPUs, ROCm 6 support on AMD GPUs, and a variety of other smaller enhancements.

Downloads and more information on the oneDNN 3.8 library release for building out deep learning applications are available via [1]GitHub . Expect new [2]oneDNN benchmarks soon for upcoming hardware releases.



[1] https://github.com/uxlfoundation/oneDNN/releases/tag/v3.8

[2] https://openbenchmarking.org/test/pts/onednn#results



phoronix
