News: 0179838972


Alibaba Cloud Says It Cut Nvidia AI GPU Use By 82% With New Pooling System (tomshardware.com)

(Tuesday October 21, 2025 @11:24AM (BeauHD) from the would-you-look-at-that dept.)


Alibaba Cloud claims its new Aegaeon GPU pooling system [1]cuts Nvidia GPU use by 82%, letting 213 H20 accelerators handle workloads that previously required 1,192. The advancements have been [2]detailed in a paper (PDF) presented at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul. Tom's Hardware reports:

> Unlike training-time breakthroughs that chase model quality or speed, Aegaeon is an inference-time scheduler designed to maximize GPU utilization across many models with bursty or unpredictable demand. Instead of pinning one accelerator to one model, Aegaeon virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool. This means one H20 could serve several different models simultaneously, with system-wide "goodput" -- a measure of effective output -- rising by as much as nine times compared to older serverless systems.


> The system was tested in production over several months, according to the paper, which lists authors from both Peking University and Alibaba's infrastructure division, including CTO Jingren Zhou. During that window, the number of GPUs needed to support dozens of different LLMs -- ranging in size up to 72 billion parameters -- fell from 1,192 to just 213. While the paper does not break down which models contributed most to the savings, reporting by the [3]South China Morning Post says the tests were conducted using Nvidia's H20, one of the few accelerators still legally available to Chinese buyers under current U.S. export controls.



[1] https://www.tomshardware.com/tech-industry/semiconductors/alibaba-says-new-pooling-system-cut-nvidia-gpu-use-by-82-percent

[2] https://ennanzhai.github.io/pub/sosp25-aegaeon.pdf

[3] https://www.scmp.com/business/article/3329450/alibaba-cloud-claims-slash-nvidia-gpu-use-82-new-pooling-system?module=top_story&pgtype=section



Jensen's not gonna like this (Score:3)

by Kokuyo ( 549451 )

But it's sure good to see for the rest of us.

Re:Jensen's not gonna like this (Score:5, Insightful)

by AleRunner ( 4556245 )

If you increase the efficiency of use of a resource you also increase the number of use cases that can be addressed. That can easily end up with more of the resource being used. An Nvidia GPU is now roughly 5.6 times as valuable as it would have been before. If Nvidia has any sense they will try to build an open source scheduler like this that anyone can drop into any cloud.
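For anyone checking the numbers: both the headline percentage and the per-GPU multiplier fall out of the two GPU counts in the summary (1,192 before, 213 after). A minimal sketch of that arithmetic; the function names are mine, not from the paper:

```python
# GPU counts taken from the story summary.
GPUS_BEFORE = 1192
GPUS_AFTER = 213


def reduction_fraction(before: int, after: int) -> float:
    """Fraction of GPUs no longer needed."""
    return 1 - after / before


def work_per_gpu_multiplier(before: int, after: int) -> float:
    """How many 'old' GPUs' worth of workload one pooled GPU now covers."""
    return before / after


if __name__ == "__main__":
    print(f"reduction: {reduction_fraction(GPUS_BEFORE, GPUS_AFTER):.1%}")
    print(f"multiplier: {work_per_gpu_multiplier(GPUS_BEFORE, GPUS_AFTER):.2f}x")
```

This only restates the claimed figures; it says nothing about whether demand then grows to absorb the savings.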

Re: (Score:1)

by Kokuyo ( 549451 )

I think that's a big non-sequitur you've produced there.

Re:Jensen's not gonna like this (Score:5, Interesting)

by Smidge204 ( 605297 )

It's called [1]Jevons Paradox [wikipedia.org]

In short: the more efficiently you can use a resource, the better the ROI you get for investing in the utilization of that resource, and the more people consume.

This applies to computing power. Maybe it didn't make sense in 1974 for a small business to invest in computer workstations for its staff. But by 1994 computers were so much more powerful, so much more capable, and actually cheaper relative to that capability (read: more efficient) that it made no sense NOT to invest in the technology for your business.

If this succeeds in lowering the barrier to entry for leasing AI data center resources, expect demand to go up as more people try to do more things.

=Smidge=

[1] https://en.wikipedia.org/wiki/Jevons_paradox

That's nothing. (Score:2, Funny)

by derplord ( 7203610 )

I cut GPU usage by 100% by not having anything to do with this useless bubble shit.

Re: That's nothing. (Score:5, Interesting)

by EldoranDark ( 10182303 )

Which is kinda relevant. Does this better utilisation of hardware mean we still use nearly as much energy? And now in a denser configuration?

Re: (Score:2)

by coofercat ( 719737 )

Fewer GPUs = lower power consumption, for sure, although of course, more of the transistors in those remaining GPUs are active now. Either way, the difference in quantity is so great, there *must* be a power saving here. The cost of keeping them switched on is appreciable, and I assume you can only put a certain number of GPUs on a given motherboard, so you presumably can have fewer servers running the GPUs as well.

Either way, this sort of thing can only be a good thing for the world because of reduced consu

Re: That's nothing. (Score:2)

by EldoranDark ( 10182303 )

It sounds like the problem this addresses is that when you have multiple models available for use, lots of cards sit idle. Judging by consumer GPUs, that could be the difference between 15 W and 500 W. The cynical take is that it's not a way to build fewer GPUs. It's a way to run more power through the existing ones. I don't think we're running out of ideas on where else to cram more AI output.
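To put rough numbers on the idle-versus-busy argument, here is a back-of-the-envelope sketch. Everything in it is an assumption: a linear two-state power model, the 15 W idle and 500 W busy figures guessed from consumer cards in the comment above (not H20 measurements), and a made-up count of 200 GPUs' worth of actual work. Only the fleet sizes (1,192 and 213) come from the summary.

```python
def fleet_power_watts(total_gpus: int, busy_gpus: int,
                      idle_w: float = 15.0, busy_w: float = 500.0) -> float:
    """Total draw if each GPU is either fully busy or fully idle (assumed model)."""
    if not 0 <= busy_gpus <= total_gpus:
        raise ValueError("busy_gpus must be between 0 and total_gpus")
    return busy_gpus * busy_w + (total_gpus - busy_gpus) * idle_w


# Hypothetical: the same workload keeps 200 GPUs busy in both setups.
BUSY = 200
before = fleet_power_watts(1192, BUSY)  # dedicated GPUs, most sitting idle
after = fleet_power_watts(213, BUSY)    # pooled GPUs, few sitting idle
saved = before - after

if __name__ == "__main__":
    print(f"before: {before:.0f} W, after: {after:.0f} W, saved: {saved:.0f} W")
```

Under this model the saving is just the idle draw of the 979 GPUs that went away (979 x 15 W = 14,685 W), whatever the busy count is. So for a fixed workload there is a saving, but a modest one; if the freed capacity gets filled with new work, the cynical take wins.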

Row Hammer? (Score:2)

by Tablizer ( 95088 )

> Instead of pinning one accelerator to one model, Aegaeon virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool.

Does that risk a Row-Hammer-like breach whereby one customer's query can snoop on another's?

Necessity is the mother of invention (Score:3)

by marcle ( 1575627 )

The US hasn't had to be competitive in that way, and China now has more fire in the belly. I for one welcome our new Chinese AI overlords.

Re:Necessity is the mother of invention (Score:4)

by cusco ( 717999 )

Behold the power of the all-purpose diplomatic tool, Sanctions! If you want to make your enemy more self-reliant and independent of you, just sanction the shit out of their country and before long they won't need you at all. If you want to promote innovation in your enemy, prevent them from buying the tools they need so that they develop their own that are superior to what you sell. The brain trust in Washington DC has managed to come up with a program to promote innovation and self-reliance and make the US no longer necessary to the world. It's ingenious!

Oh, what's that you say? That wasn't really their goal? Really? Seems like the entirely predictable result of implementing tens of thousands of sanctions all over the world. Are our "leaders" really that stupid? We need better leaders then.
