
Researchers Claim New Technique Slashes AI Energy Use By 95% (decrypt.co)

(Tuesday October 08, 2024 @11:30PM (BeauHD) from the would-you-look-at-that dept.)


Researchers at BitEnergy AI, Inc. have developed Linear-Complexity Multiplication ([1]L-Mul), a technique that [2]reduces AI model power consumption by up to 95% by replacing energy-intensive floating-point multiplications with simpler integer additions. This method promises significant energy savings without compromising accuracy, but it requires specialized hardware to fully realize its benefits. Decrypt reports:

> L-Mul tackles the AI energy problem head-on by reimagining how AI models handle calculations. Instead of complex [3]floating-point multiplications, L-Mul approximates these operations using integer additions. So, for example, instead of multiplying 123.45 by 67.89, L-Mul breaks it down into smaller, easier steps using addition. This makes the calculations faster and uses less energy, while still maintaining accuracy. The results seem promising. "Applying the L-Mul operation in tensor processing hardware can potentially reduce 95% energy cost by element wise floating point tensor multiplications and 80% energy cost of dot products," the researchers claim. Without getting overly complicated, what that means is simply this: If a model used this technique, it would require 95% less energy to think, and 80% less energy to come up with new ideas, according to this research.

>

> The algorithm's impact extends beyond energy savings. L-Mul outperforms current 8-bit standards in some cases, achieving higher precision while using significantly less bit-level computation. Tests across natural language processing, vision tasks, and symbolic reasoning showed an average performance drop of just 0.07% -- a negligible tradeoff for the potential energy savings. Transformer-based models, the backbone of large language models like GPT, could benefit greatly from L-Mul. The algorithm seamlessly integrates into the attention mechanism, a computationally intensive part of these models. Tests on popular models such as Llama, Mistral, and Gemma even revealed some accuracy gain on certain vision tasks.

>

> At an operational level, L-Mul's advantages become even clearer. The research shows that multiplying two float8 numbers (the way AI models would operate today) requires 325 operations, while L-Mul uses only 157 -- less than half. "To summarize the error and complexity analysis, L-Mul is both more efficient and more accurate than fp8 multiplication," the study concludes. But nothing is perfect, and this technique has a major Achilles' heel: it requires a special type of hardware, so current hardware isn't optimized to take full advantage of it. Plans for specialized hardware that natively supports L-Mul calculations may already be in motion. "To unlock the full potential of our proposed method, we will implement the L-Mul and L-Matmul kernel algorithms on hardware level and develop programming APIs for high-level model design," the researchers say.
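The paper's actual L-Mul kernel is defined at the bit level for low-precision tensor formats, but the general idea it builds on -- approximating a floating-point multiply with a single integer addition on the operands' bit patterns -- can be sketched in a few lines. This is a minimal illustration of that mantissa-addition trick using float32, not the authors' algorithm; the function name `approx_mul` and the choice of format are this sketch's own, and it is only valid for positive normal floats.

```python
import struct

def f2i(x: float) -> int:
    """Reinterpret a float32 bit pattern as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def i2f(n: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32."""
    return struct.unpack("<f", struct.pack("<I", n & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a * b for positive normal floats with one integer add.

    Adding the two bit patterns adds the exponent fields exactly and the
    mantissa fields approximately; subtracting the exponent bias constant
    (0x3F800000 for float32) corrects the doubled bias. The dropped
    mantissa cross-term bounds the relative error at roughly 11%.
    """
    return i2f(f2i(a) + f2i(b) - 0x3F800000)

print(approx_mul(2.0, 3.0))        # exact here: 6.0
print(123.45 * 67.89)              # exact product
print(approx_mul(123.45, 67.89))   # approximation, within a few percent
```

Whenever the two mantissas sum to less than 1 the only error is the dropped cross-term, which is why some inputs (like 2.0 x 3.0, or any power of two) come out exact; L-Mul's contribution is adding a cheap correction offset so the worst case stays small enough for fp8-class inference.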



[1] https://arxiv.org/pdf/2410.00907

[2] https://decrypt.co/285154/new-technique-slashes-ai-energy

[3] https://en.wikipedia.org/wiki/Floating-point_arithmetic



Re: This is not a new Technique (Score:2)

by klipclop ( 6724090 )

How else will they get seed investors and floods of customers to buy their hardware? :D

Re: (Score:1)

by mdvx ( 4169463 )

Seems like a couple of Gen-Zs have rediscovered the fundamentals of computing 50 years later, after a couple of years of college!
