
Open-Source & Rust-Written Burn MATMUL Kernels Can Compete With NVIDIA's CUDA/cuBLAS

([Programming] 5 Hours Ago Burn)


The developers of Burn, the open-source, Rust-based deep learning framework from Tracel AI, shared that their matrix multiplication kernels can compete with and even outperform NVIDIA's CUDA cuBLAS. Plus, Burn isn't limited to NVIDIA GPUs: it can work on most hardware/drivers, including via a Vulkan back-end.

On Friday the Burn developers published a lengthy blog post going over their MATMUL kernel performance relative to NVIDIA CUDA cuBLAS/CUTLASS, showing some really splendid results for this cross-platform, Rust-written, open-source DL framework.

For those wanting to get straight to the exciting part:

"On CUDA, our Simple algorithm is remarkably fast and stable, nearly always outperforming the cuBLAS/CUTLASS reference. However, the MultiRow variant truly stands out in the end; it is also the top performer across the board on Vulkan."

Some really enticing data. Those wanting to learn more about the Burn MATMUL kernel performance can see [1]the Burn.dev blog post.

I hadn't looked at Burn until a Phoronix reader pointed it out, but I'll be checking out their open-source software for possible use in future benchmarks, namely [2]burn-bench.



[1] https://burn.dev/blog/sota-multiplatform-matmul/

[2] https://github.com/tracel-ai/burn-bench



oleid
