How high-end supercomputer filesystem DAOS can break out of its niche

(2025/11/24)


DAOS has been a great success in the traditional HPC/supercomputing world, but is nowhere in the new, AI-focused, GPU supercomputing arena. What will it take for DAOS to find customers outside its high-end, legacy supercomputing niche?

The [1]DAOS parallel filesystem has a strong IO500 presence, holding positions 1 (Argonne) and 2 (LRZ) in the current [2]Production SC25 list. According to HPE, the two combined have four times the storage benchmark score of the next 30 storage systems. DAOS also appears at number 13 (Zuse Institute, Berlin) and number 17 (China Telecom Research Institute). The software appears more often in the full IO500 list, with 16 of the top 30 submissions using DAOS, and 26 of the top 45 being DAOS devotees.

DAOS has even stronger representation in the IO500 10-Node production list, which covers systems with just 10 clients. It holds the top 3 positions plus number 6.

[3]

Top 10 IO500 10-Node production list entries

But DAOS is not widespread, with 15 to 20+ production systems in active use. For its use to spread, we understand, it has to demonstrate significantly better storage IO performance than competing software, meaning supporting more processing cores and delivering higher bandwidth. DAOS is open-source code and no single parallel processing storage system supplier is reliant on it. HPE has its ClusterStor as well as DAOS. DDN has its Lustre software, and VAST and WEKA each have their own software.

The IO500 measures storage IO performance while the [4]TOP500 rates pure supercomputer power. Nvidia GPU systems are appearing in it, with the number 17 position held by CHIE-4, a DGX B200 system. Nvidia-based systems also appear at positions 17, 22, 24, 29, 30, and 32. AMD GPU systems are also appearing.

Enakta Labs co-founder Denis Nuja reckons that in supercomputers, “Lustre is still pretty much number one. … I haven't seen a GPFS (Storage Scale) system in a long time.”

Modern supercomputers, in his view, are "usually a two storage system." Nuja cites the recommended requirements for storage from AMD, based on a 10,000 GPU specification: "They have two specs. One is a production spec and the other one is a high resolution spec. … For production you need like five terabytes per second of reads, and two and a half terabytes per second of writes, which is fine. And then you look at the high resolution spec, they mentioned 40 terabytes per second of read and 20 terabytes a second of write," which is much higher.
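
To put those aggregate figures in per-GPU terms, here is a minimal arithmetic sketch. It assumes decimal units (1 TB = 1,000 GB) and that bandwidth is shared evenly across all 10,000 GPUs; neither assumption comes from Nuja or AMD, and the function name is purely illustrative.

```python
# Per-GPU storage bandwidth implied by the aggregate figures Nuja quotes.
# Assumptions (not from the article): decimal units, even sharing across GPUs.

GPU_COUNT = 10_000

# Aggregate targets in TB/s, as quoted in the article.
SPECS_TBPS = {
    "production": {"read": 5.0, "write": 2.5},
    "high resolution": {"read": 40.0, "write": 20.0},
}


def per_gpu_gbps(aggregate_tbps: float, gpus: int = GPU_COUNT) -> float:
    """Convert an aggregate TB/s target into GB/s per GPU."""
    return aggregate_tbps * 1000 / gpus


for name, spec in SPECS_TBPS.items():
    read = per_gpu_gbps(spec["read"])
    write = per_gpu_gbps(spec["write"])
    print(f"{name}: {read:.2f} GB/s read, {write:.2f} GB/s write per GPU")

ratio = SPECS_TBPS["high resolution"]["read"] / SPECS_TBPS["production"]["read"]
print(f"high resolution is {ratio:.0f}x the production read target")
```

On those assumptions the production spec works out to 0.5 GB/s of reads per GPU and the high resolution spec to 4 GB/s, an eightfold jump.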

“If you look at 40 terabytes per second reads and 20 [in a] commercially viable way,” suppliers may need to over-provision the system enormously, to meet the write target, Nuja said.

A Lustre system would have to deploy a lot of extra capacity to hit these numbers, for example, whereas DAOS can match these numbers.
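
The over-provisioning argument can be sketched as a simple sizing calculation: a cluster has to be big enough for whichever target, read or write, needs more servers. The per-server bandwidths below are hypothetical numbers invented for illustration. They are not measurements of Lustre, DAOS, or any other product, and the function name is ours.

```python
import math


def size_for_targets(read_target, write_target, read_per_server, write_per_server):
    """Size a cluster (all figures in GB/s) so both targets are met.

    Returns (servers_for_reads, servers_for_writes, servers_required).
    """
    for_reads = math.ceil(read_target / read_per_server)
    for_writes = math.ceil(write_target / write_per_server)
    return for_reads, for_writes, max(for_reads, for_writes)


# Targets from the high resolution spec in the article: 40 TB/s read, 20 TB/s write.
READ_TARGET_GBPS = 40_000
WRITE_TARGET_GBPS = 20_000

# HYPOTHETICAL per-server bandwidths in GB/s, invented for illustration only.
PROFILES = {
    "balanced": {"read": 80, "write": 40},
    "write-weak": {"read": 80, "write": 10},
}

for name, p in PROFILES.items():
    for_reads, for_writes, required = size_for_targets(
        READ_TARGET_GBPS, WRITE_TARGET_GBPS, p["read"], p["write"]
    )
    overprovision = required / for_reads
    print(f"{name}: {required} servers ({overprovision:.1f}x what reads alone need)")
```

With these made-up figures, the balanced profile needs 500 servers for either target, while the write-weak profile needs 2,000 servers to hit the write target, four times what its read target alone would require.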

Nuja’s view is that, in the supercomputing world, Lustre is by far the most popular filesystem in use, but it’s weak at the top end, where DAOS reigns, and getting nibbled away at the lower end and in the GPU supercomputer area by VAST Data and WEKA. He says that DAOS is great with single huge systems. Lustre, VAST and WEKA, he thinks, are excellent with smaller partitions: “They are effectively building a lot of smaller systems. So you can actually segregate and have 15 VAST clusters or Lustre or WEKA or whatever, and it's a very different story.”

DAOS would have an easier path to growth if it supported GPUs better, Nvidia GPUs specifically, but there is no GPUDirect support, for example.

Nuja thinks Nvidia, by moving to object storage, is “effectively democratizing storage … anybody can pretty much plug in, whereas the GPUDirect story and all that stuff was very much under the full control of Nvidia.”

Our thinking is that Nvidia has primarily supported the parallel filesystem storage suppliers that can help it sell lots of GPUs and its software. Market presence wins. From that point of view, DAOS, with 25 or fewer systems in deployment, represents small potatoes.

But object world interface requirements are different. Nuja tells us: “We have developed an S3 interface. We can plug it into everything that consumes S3. Nvidia AIStore consumes S3. So that should work [but] we haven't tested.”

PyTorch is another GPU system data access option in his view: “if you are using PyTorch for example; a lot of people are using PyTorch for computing on their GPUs, we built the whole integration for PyTorch. We natively integrate into PyTorch. So there's nothing preventing you from doing that. It has nothing to do with the GPU itself. It has to do with the framework that's on top of it.”

[10]Nvidia, Oracle to build 7 supercomputers for Department of Energy, including its largest ever

[11]Scientific computing is about to get a massive injection of AI

[12]HPE details Vera Rubin blades for next-gen Cray supercomputers

[13]Nvidia's green500 dominance continues as France's Kairos super takes efficiency title

How could DAOS grow outside its top-end supercomputing storage niche?

Nuja said: “We need to build on more integrations and we need to get DAOS to a position where it's easily manageable, deployed, and everything, which is exactly what Enakta has been doing for two years.

“The third thing, which is the most problematic: it needs to become a choice in the minds of the users, end users. Because at the moment, all of them, when they hear about DAOS, say 'We heard about it, but it's something obscure.' … We need to educate the market, the end users, that DAOS is an option.”

He added, “It's not only some kind of a science experiment, that one or two supercomputers use, and that takes time. It takes time, it takes effort. Let's be honest, the whole [14]Optane debacle didn't really help. That's behind us, thankfully.”

Nuja continued: “Now there's a clear path forward. We're doing all we can to develop a lot of these interfaces that will help users be able to consume DAOS in a more sensible way. And we think that DAOS will continue having an edge versus other systems for the foreseeable future from a performance standpoint.” ®



[1] https://blocksandfiles.com/2025/10/27/latest-oak-ridge-national-labs-discovery-supercomputer-gets-daos-storage-option/

[2] https://io500.org/

[3] https://regmedia.co.uk/2025/11/24/io500.jpg

[4] https://top500.org/

[10] https://www.theregister.com/2025/10/28/nvidia_oracle_supercomputers_doe/

[11] https://www.theregister.com/2025/11/18/future_of_scientific_computing/

[12] https://www.theregister.com/2025/11/13/hpe_details_vera_rubin_blades/

[13] https://www.theregister.com/2025/11/21/nvidia_green500/

[14] https://blocksandfiles.com/2025/04/15/daos-post-optane-resurrection/


