Wanted: A handy metric for gauging if GPUs are being used optimally
- Reference: 1747737429
- News link: https://www.theregister.co.uk/2025/05/20/gpu_metric/
- Source link:
According to some sources, an Nvidia H100 costs anywhere from $27,000 to $40,000 to buy, while renting via a cloud provider instead costs, for example, $6.98 per hour for an H100 instance on Microsoft's Azure platform. That's just for a single GPU, and AI training will naturally often require many more.
Users want to keep those units working as efficiently as possible. However, research literature, disclosures by AI cluster operators, and model benchmarks all suggest that GPU resources are often wasted, Uptime says in a new report, "[1]GPU utilization is a confusing metric."
Many AI development teams are also unaware of their actual GPU utilization, often assuming higher levels than those achieved in practice.
Uptime, which created the Tier classification levels for datacenters, says GPU servers engaged in training are only operational about 80 percent of the time, and even while running, well-optimized models are only likely to use 35 to 45 percent of the compute performance the silicon can deliver.
Having a simple usage metric for GPUs would be a boon for the industry, writes the report author, research analyst (and former Reg staffer) Max Smolaks. But, he says, GPUs are not comparable with other server components and require fresh ways of accounting for performance.
Current ways of tracking accelerator utilization include monitoring the average operational time for the entire server node, or tracking individual GPU load via tools supplied by the hardware provider itself, typically Nvidia or AMD.
The first method is of limited use to datacenter operators, although it can indicate a cluster's overall power consumption over time. The second is the most commonly used, the report says, but it is not always the best metric for understanding GPU efficiency: the tools typically measure what proportion of time the chip's processing elements are executing, and take no account of the actual work being done.
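For context, the vendor-tool approach usually boils down to something like the following minimal Python sketch, which uses the pynvml bindings to Nvidia's management library (the same interface behind nvidia-smi). The caveat the report raises applies here: the percentage reported is the share of time in which kernels were running, not how much of the chip's arithmetic capability they used.

    # Minimal sketch: per-GPU "utilization" as reported by Nvidia's management
    # library (pip install nvidia-ml-py). It reports the share of the sampling
    # window in which kernels were active, not how much work they did.
    import pynvml

    pynvml.nvmlInit()
    try:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            # util.gpu: % of time at least one kernel was executing
            # util.memory: % of time device memory was being read or written
            print(f"GPU {i}: compute {util.gpu}%, memory {util.memory}%")
    finally:
        pynvml.nvmlShutdown()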
A better method, according to Uptime, is model FLOPS (floating point operations per second) utilization, or [6]MFU. This tracks the ratio of the model's observed throughput (measured in tokens per second) to the theoretical maximum performance of the underlying hardware; a higher MFU equates to higher efficiency, which means shorter (and therefore less costly) training runs.
The downside is that this metric, introduced by Google Research, is difficult to calculate, and the resulting figures may appear puzzlingly low, with even well-optimized models only delivering between 35 and 45 percent MFU.
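To make the definition concrete, here is a minimal Python sketch of the calculation under the common transformer approximation of about six FLOPs per parameter per token for a training step. Every number in it is invented for illustration, not taken from the Uptime report.

    # Back-of-the-envelope MFU calculation. All figures are hypothetical;
    # the "6 FLOPs per parameter per token" rule is the usual approximation
    # for a transformer training step (forward plus backward pass).
    def mfu(tokens_per_second: float, n_params: float,
            n_gpus: int, peak_flops_per_gpu: float) -> float:
        achieved_flops = tokens_per_second * 6 * n_params  # observed useful work
        peak_flops = n_gpus * peak_flops_per_gpu            # theoretical ceiling
        return achieved_flops / peak_flops

    # Hypothetical run: a 70-billion-parameter model on 8 GPUs, each rated
    # at roughly 989 teraFLOPS of dense BF16 compute (an H100-class figure).
    example = mfu(tokens_per_second=7_500, n_params=70e9,
                  n_gpus=8, peak_flops_per_gpu=989e12)
    print(f"MFU: {example:.0%}")  # ~40 percent, squarely in the 35-45 percent band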
This is because performance is impacted by factors such as the network latency and storage throughput, which mean a 100 percent score is unachievable in practice; results above 50 percent represent the current pinnacle.
Uptime concludes there is currently no entirely satisfactory metric to gauge whether GPU resources are being used effectively, but that MFU shows promise, particularly as it has a more-or-less direct relationship with power consumption.
More data gathered from real-world deployments is needed to establish what "good" looks like for an efficient AI cluster, the report states, but many organizations treat this information as proprietary and therefore keep it to themselves. ®
[1] https://uptimeinstitute.com/gpu-utilization-is-a-confusing-metric
[6] https://medium.com/better-ml/using-model-flops-utilization-mfu-7b17de07faec
The New Old Thing
Utilization evaluation features could be built into CPUs and GPUs, and the results made available to software reading a port.
Chip manufacturers see a negative benefit to that.
IBM's OS/360 had an API for user-supplied utilization-accounting software, but that API changed frequently and capriciously.
I can't imagine why. /sarcasm
Re: The New Old Thing
One thing that irritates me hugely about AWS Batch is that you can't find out memory utilisation very easily so you over-provision to avoid jobs crashing and therefore spend more money...
How could we be sure that the GPU manufacturers can't game the system?
Monitor with a third party offering... of course it will cost you more.
Modern tech will always offer a solution to a problem you never knew you had.
Tencent
Perhaps that is what Tencent have done: made adjustments to fully utilise what they have and not be bothered by the latest and greatest - which they cannot get.
Instead of handy metrics, how about architecting software properly in the first place? I’ve seen use-cases for AI which don’t need AI, at least, not as a first resort. They could, for example, use an expert system to refine the request for the AI (and possibly even provide an answer without requiring the AI, and all the compute that that entails, at all) before passing it to the AI.
But efficiency isn’t really a watchword of computing these days. For all their other benefits, modern languages (pretty much anything (not quite everything though) including and since Java) aren’t efficient. They’re safe, yes, they’re easier to develop for. But they aren’t sympathetic to the available resources of the computer.
So yes. It’s an important consideration. But it’s not the only consideration.
So . . .
* It does not work.
* It is destroying the internet, artists, the environment & society.
* It is hugely expensive.
* You cannot even tell if you are getting what you pay for.
Shut up and take my money - where do I sign up? (I can't help reading that as "gouging")