

GPU goliaths are devouring supercomputing – and legacy storage can't feed the beast

(2025/11/14)


The supercomputing landscape is fracturing. What once was a relatively unified world of massive multi-processor x86 systems has splintered into competing architectures, each racing to serve radically different masters: traditional academic workloads, extreme-scale physics simulations, and the voracious appetite of AI training runs.

At the center of this upheaval stands Nvidia, whose GPU revolution has not just made inroads; it has detonated the old order entirely.

The consequences are stark. Legacy storage systems that powered decades of scientific breakthroughs now buckle under AI's relentless, random I/O storms. Facilities designed for sequential throughput face a new reality where metadata can consume 20 percent of all I/O operations. And as GPU clusters scale into the thousands, a brutal economic truth emerges: every second of GPU idle time bleeds money, transforming storage from a support function into a make-or-break competitive advantage.


We sat down with Ken Claffey, CEO of VDURA, to understand how this seismic shift is forcing a complete rethink of supercomputing infrastructure, from hardware to software, from architecture to economics.


Blocks & Files: How do you define a supercomputer and an HPC system? What are the differences between them?

Ken Claffey: The lines are definitely grey and increasingly blurred. Historically the delineation has really been about the size (number of nodes) of the system, as Linux clusters of commodity servers became the de facto building block (versus earlier custom supercomputers like the early Cray systems or NEC vector supercomputers). Today the traditional segmentation of Workgroup, Department, Divisional, and Supercomputer probably needs updating, as even a small GPU cluster's dollar value is now such that analysts would classify it as a supercomputer sale.


Blocks & Files: What different kinds of supercomputer are there, and do they differ by workload and processors?

Ken Claffey: Not all supercomputers are the same. There are Linux cluster supercomputers, which dominate today’s Top500 list. They are built from thousands of commodity servers connected via InfiniBand, Ethernet, or proprietary interconnects. Variants include:

Massively parallel clusters with distributed memory (e.g., the DOE's Frontier). Each node runs its own OS and communicates via message passing.

Commodity clusters built from off-the-shelf x86/GPU servers; hyperscale AI clusters fall here.

Different workloads favor different architectures: CPU-heavy, GPU-heavy, or memory-centric. Weather and physics simulations benefit from vector or massively parallel clusters with low-latency interconnects.

Modern AI training often uses GPU heavy commodity clusters.

Special-purpose systems serve narrow domains like cryptography or pattern matching, but are gaining traction again in AI-related use cases, specifically for inference (Groq, SambaNova, etc.).


Blocks & Files: Is an Nvidia NVL72 rack-scale GPU server a supercomputer?

Ken Claffey: Nvidia describes its GB200 NVL72 as an “exascale AI supercomputer in a rack.” Each NVL72 encloses 18 compute trays (72 Blackwell GPUs coupled with Grace CPUs) tied together by fifth-generation NVLink switches delivering 130 TBps of interconnect bandwidth. The NVLink fabric creates a single unified memory domain with over 1 petabyte per second of aggregate bandwidth, and one NVL72 rack can deliver 80 petaflops of AI performance with 1.7 TB of unified HBM memory.

From a purist HPC perspective, a single NVL72 is more accurately a rack-scale building block than a full supercomputer; it lacks the external storage and cluster-management layers needed for full-blown HPC. But when tens or hundreds of NVL72 racks are interconnected with high-performance storage (for example, VDURA V5000), the resulting system absolutely qualifies as a supercomputer. So NVL72 sits at the boundary: an extremely dense GPU cluster that can be part of a larger HPC system.

Blocks & Files: Do you think the Nvidia GPU's [6]HBM will or can transfer to other types of supercomputer? Why did Nvidia get HBM developed, rather than other supercomputer makers?

Ken Claffey: High-bandwidth memory (HBM) stacks DRAM dies through silicon vias to provide thousand-bit-wide interfaces; HBM3e can deliver up to 1.8 TB/s per GPU. HBM isn’t unique to Nvidia; AMD’s MI300A/MI300X, Intel’s Ponte Vecchio, and many AI accelerators use HBM because streaming data at terabyte-per-second speeds is essential for feeding hungry cores. HBM adoption depends on economics and package design: GPUs can justify the cost because they deliver very high flops per watt, while general-purpose CPUs often rely on DDR/LPDDR memory with lower bandwidth.
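As a rough illustration of where those bandwidth figures come from, the sketch below applies the basic relationship (per-stack bandwidth = interface width × per-pin data rate) using JEDEC-class pin rates; stack counts and speeds vary by product, so the numbers are indicative, not any specific GPU's spec.

```python
# Rough, illustrative HBM bandwidth arithmetic (not any vendor's exact spec).
# Per-stack bandwidth = interface width (bits) x per-pin data rate (Gb/s) / 8.

HBM_INTERFACE_BITS = 1024  # each HBM stack exposes a 1,024-bit interface

def stack_bandwidth_gbps(pin_rate_gbit_s: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return HBM_INTERFACE_BITS * pin_rate_gbit_s / 8

def gpu_bandwidth_tbps(stacks: int, pin_rate_gbit_s: float) -> float:
    """Aggregate HBM bandwidth of a GPU package in TB/s."""
    return stacks * stack_bandwidth_gbps(pin_rate_gbit_s) / 1000

if __name__ == "__main__":
    # Illustrative figures: HBM3 tops out around 6.4 Gb/s per pin,
    # HBM3e around 9.6 Gb/s per pin; 6-8 stacks per GPU is typical.
    print(f"HBM3 stack:  {stack_bandwidth_gbps(6.4):.0f} GB/s")   # ~819 GB/s
    print(f"HBM3e stack: {stack_bandwidth_gbps(9.6):.0f} GB/s")   # ~1229 GB/s
    print(f"6-stack HBM3e GPU: {gpu_bandwidth_tbps(6, 9.6):.1f} TB/s")
```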

Nvidia’s leadership in GPU HBM has been driven by AI’s insatiable demand for memory bandwidth. GPU vendors co-design the silicon with HBM suppliers (Samsung, Micron, SK Hynix) to maximize bandwidth. Traditional supercomputer vendors often focus on CPU-centric workloads where large DDR memory footprints matter more than raw bandwidth. We expect HBM to proliferate in GPU-based AI systems and some CPU architectures, but commodity servers will continue to balance cost and capacity with DDR memory. Ultimately, memory technology will spread where the economics make sense.

Blocks & Files: How is the world of supercomputing reacting to AI workloads such as training and inference?

Ken Claffey: The AI revolution has turned HPC facilities into AI factories. It's clear from customers that their application landscape is changing as their users deploy more and more AI-based applications, which creates new challenges for the HPC infrastructure as they grow the number of GPUs in their clusters. This in turn impacts storage, because AI applications are GPU-centric and create spiky, random I/O patterns, causing metadata to become 10–20 percent of I/O. Both training and inference require sustained throughput: Nvidia recommends 0.5 GBps reads and 0.25 GBps writes per GPU for DGX B200 servers, and up to 4 GBps per GPU for vision workloads. That means a 10,000 GPU cluster needs 5 TBps of read and 2.5 TBps of write bandwidth.
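The sizing arithmetic is straightforward; here is a minimal sketch that reproduces it using the per-GPU figures quoted above. The cluster size is just an example, and real deployments would add headroom for checkpointing bursts and failures.

```python
# Back-of-envelope storage bandwidth sizing for a GPU cluster,
# using the per-GPU figures quoted above (Nvidia guidance for DGX B200).

READ_PER_GPU_GBPS = 0.5    # sustained read bandwidth per GPU
WRITE_PER_GPU_GBPS = 0.25  # sustained write bandwidth per GPU

def cluster_storage_bandwidth(num_gpus: int) -> tuple[float, float]:
    """Return (read, write) aggregate bandwidth in TB/s for a cluster."""
    read_tbps = num_gpus * READ_PER_GPU_GBPS / 1000
    write_tbps = num_gpus * WRITE_PER_GPU_GBPS / 1000
    return read_tbps, write_tbps

if __name__ == "__main__":
    reads, writes = cluster_storage_bandwidth(10_000)
    print(f"10,000 GPUs -> {reads:.1f} TB/s read, {writes:.1f} TB/s write")
    # 10,000 GPUs -> 5.0 TB/s read, 2.5 TB/s write
```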

To meet this demand, HPC centers are embracing parallel file systems and NVMe-first architectures. AI training still relies on high-throughput parallel file systems to feed GPUs and handle massive checkpointing, while inference workloads shift toward object stores and key-value semantics, requiring strong metadata performance and multi-tenancy. The rise of GPU accelerators has shifted I/O patterns from large sequential writes to highly random, small-file operations. As a result:

HPC facilities are upgrading networks to InfiniBand NDR and 400 Gb/s Ethernet, and deploying NVMe-based storage servers to keep GPUs saturated.

Vendors are adding [7]GPU Direct and RDMA‑based I/O paths to bypass CPU bottlenecks and reduce latency.

AI and HPC teams increasingly treat data pipelines as production lines, emphasizing resilience and automation. VDURA’s white paper highlights how GPU idle time and slow checkpointing waste money, prompting new storage architectures that minimize stalls.

Blocks & Files: How have supercomputing and HPC storage evolved? What are the main threads?

Ken Claffey: HPC storage has evolved from proprietary, hardware-bound architectures to software-defined, scale-out systems designed for AI and GPU-driven workloads. Additionally, while HPC was very much designed around the concept of temporary, performant /scratch file systems, AI is more focused on sustained performance and a broader SLA that cares much more about operational reliability.

From proprietary to software-defined: Early HPC relied on closed systems with HA pairs and dedicated RAID controllers. Modern platforms have shifted to SDS models aligned with hyperscaler designs: shared-nothing architectures that scale horizontally across commodity NVMe hardware and open supply chains.

Flash & HDDs, not flash-only: The move from HDD to NVMe flash brought massive performance gains, but efficiency at scale now depends on using the full spectrum of media: SLC, TLC, and QLC flash plus CMR/[8]SMR HDDs, to balance throughput, IOPS, endurance, and cost.

Metadata and automation: AI's billions of small files make metadata an increasing performance bottleneck and a growing share of the data stored, say 10–20 percent. VDURA's VeLO distributed metadata engine eliminates this bottleneck, supporting billions of operations with ultra-low latency.

Operational reliability and resilience at scale: Legacy node-local RAID has been replaced by network-level erasure coding for higher resiliency to failures, increasing durability and availability. VDURA actually offers even more with multi-level erasure coding (MLEC), which achieves greater availability and up to 12 nines of durability, ensuring continuous operation (see the erasure-coding sketch below).

HPC storage has evolved into AI-ready, software-defined infrastructure; flash-first, media-aware, metadata-accelerated, and operationally resilient enough to keep pace with the fastest GPUs 24 by 7 by 365.
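To make the resiliency point concrete, here is a minimal sketch of the capacity-versus-fault-tolerance trade-off in a single-level erasure-coded stripe of k data plus m parity fragments. It is a generic illustration under that simple assumption, not VDURA's MLEC implementation, which layers additional levels of protection on top.

```python
from math import comb

# Generic erasure-coding arithmetic: a k+m stripe survives the loss of
# any m fragments and keeps k/(k+m) of raw capacity as usable data.
# Illustrative only; not VDURA's multi-level (MLEC) scheme.

def stripe_properties(k: int, m: int) -> dict:
    total = k + m
    return {
        "fragments": total,
        "tolerated_failures": m,              # any m fragments can be lost
        "storage_efficiency": k / total,      # usable fraction of raw capacity
        "failure_patterns": comb(total, m),   # distinct sets of m fragments that could fail
    }

if __name__ == "__main__":
    # Example: 8 data + 3 parity fragments spread across nodes.
    print(stripe_properties(8, 3))
    # ~73% usable capacity, survives any 3 simultaneous fragment losses.
```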

Blocks & Files: What are the main supercomputer storage systems and how do they differ?

Ken Claffey: Supercomputing storage has diverged along a clear line between legacy, hardware-bound systems and modern, software-defined architectures built for AI and data-intensive workloads.

[9] VDURA vs other file systems

The industry is moving on from hardware-defined "systems" (controller pairs, proprietary arrays) to software-defined storage (SDS) "platforms" that run on commodity NVMe and HDD media. SDS enables faster innovation, mixed-media tiering (SLC, TLC, QLC flash plus CMR/SMR HDD), metadata acceleration, and cloud-like scalability, the foundation of VDURA's architecture.

Blocks & Files: Why are there so many of them? Are they suited to different supercomputing workloads?

Ken Claffey: While the HPC ecosystem appears diverse, only a small group of file systems have been proven at production scale across thousands of environments. Many others remain research projects or niche deployments.

Legacy systems vs. software-defined platforms: Legacy HPC file systems like Lustre or GPFS are hardware-tied systems that are manually scaled. Modern parallel file systems such as VDURA's PanFS represent software-defined platforms that separate the control and data planes, align with hyperscaler-style shared-nothing architectures, and run on commodity NVMe and HDD supply chains.

Projects vs. products: Open-source efforts (e.g., [10]DAOS) push innovation but often remain project-grade, while commercial SDS platforms evolve, through long-term investment and continuous development, into hardened products that balance performance, manageability, and long-term support.

Workload alignment: AI and HPC workloads vary widely: some stream multi-terabyte sequential datasets, others read billions of tiny files randomly. No single file system can optimize for all cases, so purpose-built storage is replacing general-purpose designs like NAS- and SAN-based systems. Hybrid SDS platforms like VDURA integrate flash and HDD tiers, handle metadata acceleration, offer nearly unlimited linear performance scalability, and deliver the availability and durability today's AI factories demand.

There may be many names in HPC storage, but only a few truly operate at scale in production environments, and the clear direction is away from legacy hardware systems toward flexible, software-defined, purpose-built data platforms.

Blocks & Files: Why is it that DAOS has not become more popular?

Ken Claffey: DAOS is an open-source project. At this point, it’s viewed more as a collection of technologies than a finished product. It’s now housed at HPE, and I expect they’ll invest to make it a true product, much like I did with Lustre at ClusterStor. That will take many years of heavy investment, large-scale deployments, and operational maturity to take it from ‘project’ to ‘product’.

Blocks & Files: How might VDURA use DAOS? Could PanFS evolve to use DAOS concepts?

Ken Claffey: We see the key-value store (KVS) metadata approach as directionally correct, very similar to how PanFS has long operated with its own integrated KVS. This same concept is now reflected in the VDURA Data Platform, where we’ve further advanced and scaled our metadata engine to meet the demands of modern AI and HPC workloads.
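The KVS-metadata idea can be illustrated with a toy sketch: parent-directory/name pairs become keys and inode-like records become values, so lookups, creates, and directory scans reduce to ordinary key-value operations that can be sharded across many metadata servers. This is a conceptual illustration only, not how PanFS, DAOS, or the VDURA Data Platform is actually implemented.

```python
# Toy illustration of key-value-store-backed file metadata.
# Keys are (parent directory, name); values are inode-like records.
# Real systems shard and replicate this structure across many servers.

from dataclasses import dataclass, field

@dataclass
class Inode:
    size: int = 0
    mode: int = 0o644
    object_ids: list = field(default_factory=list)  # where the file's data lives

class KVMetadataStore:
    def __init__(self):
        self._kv: dict[tuple, Inode] = {}

    def create(self, parent: str, name: str) -> Inode:
        inode = Inode()
        self._kv[(parent, name)] = inode      # one KV put per file create
        return inode

    def lookup(self, parent: str, name: str):
        return self._kv.get((parent, name))   # one KV get per lookup

    def readdir(self, parent: str) -> list:
        # conceptually a prefix scan over keys sharing the same parent
        return [name for (p, name) in self._kv if p == parent]

if __name__ == "__main__":
    md = KVMetadataStore()
    md.create("/datasets/imagenet", "shard-0001.tar")
    print(md.readdir("/datasets/imagenet"))
```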

Blocks & Files: There are IOPS and throughput. Tell me why throughput matters for AI workloads.

Ken Claffey: IOPS (input/output operations per second) measures how many small (typically 4 KiB) operations a storage system can perform. It is a fine metric for transactional databases and VMs, but AI and HPC workloads stream large datasets and checkpoints. Focusing on IOPS can mislead: AI workloads are throughput-driven, measured in GBps or TBps, because they move large, sequential datasets. High bandwidth ensures that GPUs remain busy and that checkpointing does not stall training. Parallel file systems distribute data across many nodes to deliver this aggregate bandwidth. Without sufficient throughput, GPUs are starved and expensive compute cycles are wasted.
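A quick worked conversion shows why IOPS alone misleads for training: even an impressive small-block IOPS figure translates into modest bandwidth, while TB/s-class throughput expressed as 4 KiB operations would require implausibly high IOPS. The figures below are illustrative.

```python
# Why IOPS alone misleads for AI training: convert between 4 KiB IOPS
# and sustained bandwidth. Figures are illustrative.

BLOCK_BYTES = 4 * 1024  # 4 KiB, the block size IOPS is usually quoted at

def iops_to_gbps(iops: float) -> float:
    return iops * BLOCK_BYTES / 1e9

def gbps_to_iops(gbps: float) -> float:
    return gbps * 1e9 / BLOCK_BYTES

if __name__ == "__main__":
    # 1 million 4 KiB IOPS sounds huge, but is only ~4 GB/s of bandwidth.
    print(f"1M IOPS -> {iops_to_gbps(1_000_000):.1f} GB/s")
    # Feeding a 10,000-GPU cluster at 5 TB/s via 4 KiB ops would need
    # over a billion IOPS.
    print(f"5 TB/s  -> {gbps_to_iops(5000):,.0f} IOPS")
```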

VDURA’s V5000 system delivers >60 GBps per node and >2 TBps per rack. This ensures that AI pipelines are limited by model complexity, not storage. VDURA also provides up to 100 million IOPS per rack, so it handles metadata-heavy inference workloads as well. The lesson: throughput and IOPS both matter, but for AI training, throughput is king.

Blocks & Files: Do parallel storage systems bring specific advantages to supercomputers that non-parallel (serial?) storage systems cannot provide?

Ken Claffey: Absolutely. Non-parallel NAS systems like NetApp ONTAP rely on a small number of controllers handling I/O. As I previously pointed out, general-purpose NAS cannot deliver the throughput or resiliency required for AI. NetApp’s AFX is their attempt at a parallel file system. Mainstream storage systems were designed for general-purpose computing.

In a clear acknowledgement of what advanced computing for AI requires, NetApp has conceded that it needs a new type of product: a parallel file system. They were not prepared for the future and now they are trying to catch up.

[11]Nvidia, Oracle to build 7 supercomputers for Department of Energy, including its largest ever

[12]DGX Spark, Nvidia's tiniest supercomputer, tackles large models at solid speeds

[13]AI isn't throttling HPC. It is HPC

[14]NextSilicon Maverick-2 promises to blow away the HPC market Nvidia left behind

Blocks & Files: Is GPU Direct a way of making non-parallel storage systems, like NetApp, effectively parallel?

Ken Claffey: No. If you’re not parallel, you are limited by how fast the one path can go. Sure, GPU Direct can make that one path go faster, but that is not as scalable as a parallel file system that can go down many paths simultaneously, especially when those parallel paths are GPU Direct-enabled.

Blocks & Files: Now that VDURA’s PanFS supports GPU Direct, how else might VDURA adapt it to serve Nvidia GPU servers better? For example, KV Cache offload.

Ken Claffey: We are working on things in this area; stay tuned. ®





[6] https://blocksandfiles.com/2022/04/30/hbm-2/

[7] https://blocksandfiles.com/2022/04/30/gpudirect/

[8] https://blocksandfiles.com/2022/05/06/smr/

[9] https://regmedia.co.uk/2025/11/12/vdura.jpg

[10] https://blocksandfiles.com/2025/04/15/daos-post-optane-resurrection/

[11] https://www.theregister.com/2025/10/28/nvidia_oracle_supercomputers_doe/

[12] https://www.theregister.com/2025/10/14/dgx_spark_review/

[13] https://www.theregister.com/2025/11/11/ai_hpc_opinion_piece/

[14] https://www.theregister.com/2025/10/22/nextsilicon_maverick2_fill_nvidia_hpc_void/



