

Microsoft is building datacenter superclusters that span continents

(2025/11/13)


Microsoft believes the next generation of AI models will use hundreds of trillions of parameters. To train them, it's not just building bigger, more efficient datacenters – it's started connecting distant facilities using high-speed networks spanning hundreds or thousands of miles.
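A back-of-the-envelope calculation (these are illustrative assumptions, not Microsoft's figures) shows why a model of that scale can't live in one rack, or even one room:

```python
# Hypothetical: memory needed just to hold the weights of a
# 100-trillion-parameter model stored in FP8 (1 byte per parameter).
params = 100e12          # 100 trillion parameters (assumed)
bytes_per_param = 1      # FP8 weight storage (assumed)
weight_tb = params * bytes_per_param / 1e12
print(f"Weights alone: {weight_tb:.0f} TB")

# A GB200 NVL72 rack carries roughly 13 TB of HBM3e (per the article),
# so the weights alone span several racks -- before activations,
# gradients, and optimizer state are even counted.
racks = weight_tb / 13
print(f"Racks just for weights: {racks:.1f}")
```

Multiply that by the gradients, optimizer state, and activation memory a training run actually needs, and the case for stitching whole facilities together becomes clearer.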

The first node of this multi-datacenter cluster came online in October, connecting Microsoft's datacenter campus in Mount Pleasant, Wisconsin, to a facility in Atlanta, Georgia.

The software giant’s goal is to eventually scale AI workloads across datacenters using methods similar to those employed today to distribute high-performance computing and AI workloads across multiple servers.


"To make improvements in the capabilities of the AI, you need to have larger and larger infrastructure to train it," said Microsoft Azure CTO Mark Russinovich in a canned [2]statement . "The amount of infrastructure required now to train these models is not just one datacenter, not two, but multiples of that."


These aren't ordinary datacenters, either. The facilities are the first in a family of bit barns Microsoft is calling its “Fairwater” clusters. They are two stories tall, use direct-to-chip liquid cooling, and consume "almost zero water," Microsoft boasts.

Eventually, Microsoft envisions this network of datacenters will scale to hundreds of thousands of diverse GPUs, chosen to match workloads and availability. At its Atlanta facility, Microsoft will deploy Nvidia's GB200 NVL72 [5]rack systems, each rated to host over 120 kilowatts of kit and to offer 720 petaFLOPS of sparse FP8 compute for training, helped by the presence of 13TB of HBM3e memory.
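Dividing those rack-level numbers by the 72 Blackwell GPUs in an NVL72 system (Nvidia's published configuration) gives the rough per-GPU share:

```python
# Per-GPU share of the rack-level figures quoted above.
gpus = 72                 # Blackwell GPUs per NVL72 rack
pflops_fp8_sparse = 720   # rack-level sparse FP8 compute, petaFLOPS
hbm_tb = 13               # rack-level HBM3e, TB

print(pflops_fp8_sparse / gpus)   # 10 petaFLOPS per GPU
print(hbm_tb * 1000 / gpus)       # roughly 180 GB of HBM3e per GPU
```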

Spreading the load

By connecting its datacenters, Microsoft will be able to train much larger models and give itself the chance to choose different locations for its facilities – meaning it can choose places with cheap land, cooler climates, and – perhaps most importantly – access to ample power.

Microsoft doesn't specify what technology it's using to bridge the roughly 1,000 kilometer (as the vulture flies) distance between the two datacenters, but it has plenty of options.
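Whatever the link technology, physics sets a latency floor. As a rough illustration (these figures are generic, not Microsoft's), light in optical fiber travels at about two-thirds of its vacuum speed:

```python
# Rough one-way propagation delay over ~1,000 km of optical fiber.
# Assumes signal speed of ~2/3 c; real paths are longer than the
# straight-line distance, so treat this as a lower bound.
c_kms = 299_792                    # speed of light in vacuum, km/s
fiber_speed_kms = c_kms * 2 / 3    # approx. speed in glass
distance_km = 1_000

one_way_ms = distance_km / fiber_speed_kms * 1e3
print(f"One-way: {one_way_ms:.1f} ms, round trip: {2 * one_way_ms:.1f} ms")
```

Roughly 5 ms each way, or 10 ms per round trip, is an eternity compared with the microseconds that separate GPUs within a rack, which is why cross-datacenter training needs careful communication scheduling.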


Last month, Cisco [7]revealed the Cisco 8223, a 51.2 Tbps router designed to connect AI datacenters up to 1,000 kilometers away. Broadcom intends its Jericho 4 hardware, [8]announced in August, to do the same job and provide similar bandwidth.

Meanwhile, Nvidia, which has quietly become one of the largest networking vendors in the world on the back of the AI boom, [9]has teased its Spectrum-XGS network switches, with crypto-miner-turned-rent-a-GPU outfit CoreWeave signed up as an early adopter.

[10]What happens when we can't just build bigger AI datacenters anymore?

[11]DeepMind working on distributed training of large AI models

[12]Cisco's new router unites disparate datacenters into AI training behemoths

[13]Broadcom's Jericho4 ASICs just opened the door to multi-datacenter AI training

We've asked Microsoft to comment on which of these technologies it's using at its Fairwater facilities, and will update this story if we hear back. But Redmond’s close ties to Nvidia certainly make Spectrum-XGS a likely contender.

Microsoft is famously one of the few hyperscalers to have standardized on Nvidia's InfiniBand networking, rather than Ethernet or a proprietary data fabric like Amazon Web Services' EFA, for its high-performance compute environments.

While Microsoft has no shortage of options for stitching datacenters together, distributing AI workloads without incurring bandwidth- or latency-related penalties remains a topic of interest to researchers.


They're making good progress: Readers may recall that earlier this year, Google's DeepMind team [15]published a report showing that many of the challenges can be overcome by compressing models during training and strategically scheduling communications between datacenters. ®
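One widely used compression technique in this vein is top-k gradient sparsification: only the largest-magnitude gradient entries cross the slow inter-datacenter link each step. This is a generic sketch of that idea, not necessarily the method DeepMind described:

```python
import heapq

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude gradient entries; zero the rest.
    A generic gradient-compression sketch: only the surviving entries
    would be shipped over the wide-area link between datacenters."""
    top = set(heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i])))
    return [g if i in top else 0.0 for i, g in enumerate(grad)]

grad = [0.01, -0.9, 0.05, 0.7, -0.002, 0.3]
print(topk_sparsify(grad, 2))  # only -0.9 and 0.7 survive
```

Production systems pair this with error feedback (accumulating the dropped residuals locally) so the compression doesn't hurt convergence, but the bandwidth arithmetic is the point: shipping 2 of 6 values cuts WAN traffic by two-thirds.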

Get our [16]Tech Resources




[2] https://news.microsoft.com/source/features/ai/from-wisconsin-to-atlanta-microsoft-connects-datacenters-to-build-its-first-ai-superfactory/



[5] https://www.theregister.com/2024/03/21/nvidia_dgx_gb200_nvk72/


[7] https://www.theregister.com/2025/10/08/cisco_multi_datacenter/

[8] https://www.theregister.com/2025/08/06/broadcom_jericho_4/

[9] https://investor.nvidia.com/news/press-release-details/2025/NVIDIA-Introduces-Spectrum-XGS-Ethernet-to-Connect-Distributed-Data-Centers-Into-Giga-Scale-AI-Super-Factories/default.aspx

[10] https://www.theregister.com/2025/01/24/build_bigger_ai_datacenters/

[11] https://www.theregister.com/2025/02/11/deepmind_distributed_model_training_research/

[12] https://www.theregister.com/2025/10/08/cisco_multi_datacenter/

[13] https://www.theregister.com/2025/08/06/broadcom_jericho_4/


[15] https://www.theregister.com/2025/02/11/deepmind_distributed_model_training_research/

[16] https://whitepapers.theregister.com/



And brings with it the spread of American spying

VoiceOfTruth

This should not be allowed by any country unless MS is firmly under local sovereignty. That means no access for the American regime.

Maybe not, but maybe

Anonymous Coward

Yeah, I'm [1]with LeCun on this one, that LLMs are yesterday's total deadbeat deadend for dimwit deadpans. Multi-continent-spanning datacenter superclusters for 100 trillion-parameter "models" won't help this language-limited tech ...

However, if these superclusters can run humongous FP64 HPC workloads efficiently (eg. as nutritious DOE-like [2]hub-ba bubba spoke-aroonis), then I'm all for 'em, big time, imho! ;)

[1] https://observer.com/2025/11/yann-lecun-leave-meta-launch-world-models-startup/

[2] https://www.nextplatform.com/2023/03/14/doe-wants-a-hub-and-spoke-system-of-hpc-systems/

NapTime ForTruth

All of this energy and destruction and investment for what? Slightly improved synthesized cat pictures, high-availability lies at hyper scale, and studies regarding how many artificially unintelligent angels can dance on the head of a pin - for some notional values of angels, dances, and pins.

Such a waste.

Thoreau was right, we have become the tools of our tools.

If a system is administered wisely,
its users will be content.
They enjoy hacking their code
and don't waste time implementing
labor-saving shell scripts.
Since they dearly love their accounts,
they aren't interested in other machines.
There may be telnet, rlogin, and ftp,
but these don't access any hosts.
There may be an arsenal of cracks and malware,
but nobody ever uses them.
People enjoy reading their mail,
take pleasure in being with their newsgroups,
spend weekends working at their terminals,
delight in the doings at the site.
And even though the next system is so close
that users can hear its key clicks and biff beeps,
they are content to die of old age
without ever having gone to see it.