Microsoft is building datacenter superclusters that span continents
- Reference: 1762995993
- News link: https://www.theregister.co.uk/2025/11/13/microsoft_fairwater_dataceter_superclusters/
- Source link:
The first node of this multi-datacenter cluster came online in October, connecting Microsoft's datacenter campus in Mount Pleasant, Wisconsin, to a facility in Atlanta, Georgia.
The software giant’s goal is to eventually scale AI workloads across datacenters using methods similar to those employed today to distribute high-performance computing and AI workloads across multiple servers.
"To make improvements in the capabilities of the AI, you need to have larger and larger infrastructure to train it," said Microsoft Azure CTO Mark Russinovich in a canned [2]statement . "The amount of infrastructure required now to train these models is not just one datacenter, not two, but multiples of that."
These aren't any ordinary datacenters, either. The facilities are the first in a family of bit barns Microsoft is calling its “Fairwater” clusters. These facilities are two stories tall, use direct-to-chip liquid cooling, and consume "almost zero water," Microsoft boasts.
Eventually, Microsoft envisions this network of datacenters will scale to hundreds of thousands of diverse GPUs chosen to match workloads and availability. At its Atlanta facility, Microsoft will deploy Nvidia's GB200 NVL72 [5]rack systems, each rated to host over 120 kilowatts of kit and offer 720 petaFLOPS of sparse FP8 compute for training, helped by 13 TB of HBM3e memory.
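As a rough back-of-the-envelope illustration, using only the per-rack figures quoted above plus the 72-GPU rack size (the fleet size below is hypothetical, standing in for "hundreds of thousands of GPUs"), the scale works out to something like this:

```python
# Rough scale check using the per-rack figures quoted above.
# Assumptions (not from Microsoft): 72 GPUs per GB200 NVL72 rack,
# and an illustrative fleet of 200,000 GPUs.
GPUS_PER_RACK = 72
RACK_POWER_KW = 120          # "over 120 kilowatts of kit" per rack
RACK_FP8_PFLOPS = 720        # sparse FP8 training compute per rack
FLEET_GPUS = 200_000         # hypothetical fleet size

racks = FLEET_GPUS / GPUS_PER_RACK
print(f"Racks needed:          {racks:,.0f}")
print(f"IT power (MW):         {racks * RACK_POWER_KW / 1000:,.0f}")
print(f"Sparse FP8 (exaFLOPS): {racks * RACK_FP8_PFLOPS / 1000:,.0f}")
```

On those assumptions you land in the region of 2,800 racks and well over 300 MW of IT load, which is why siting near ample power matters so much.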
Spreading the load
By connecting its datacenters, Microsoft will be able to train much larger models and give itself the chance to choose different locations for its facilities – meaning it can choose places with cheap land, cooler climates, and – perhaps most importantly – access to ample power.
Microsoft doesn't specify what technology it's using to bridge the roughly 1,000-kilometer (as the vulture flies) distance between the two datacenters, but it has plenty of options.
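For a sense of the physics involved, here's a rough, hypothetical estimate of the light-speed delay over that distance, assuming standard optical fiber and a detour factor because real routes are longer than the straight line:

```python
# Rough one-way light delay over the ~1,000 km Wisconsin-Atlanta stretch.
# Assumptions: light travels at roughly 200,000 km/s in fiber (~2/3 of c),
# and the fiber route is ~1.4x the straight-line distance (hypothetical).
DISTANCE_KM = 1_000
FIBER_KM_PER_S = 200_000
ROUTE_FACTOR = 1.4

one_way_ms = DISTANCE_KM * ROUTE_FACTOR / FIBER_KM_PER_S * 1000
print(f"One-way delay: ~{one_way_ms:.0f} ms, round trip: ~{2 * one_way_ms:.0f} ms")
```

Even in the best case, that round trip is orders of magnitude slower than a hop within a rack, which is why long-haul links need both fat pipes and training schemes that tolerate latency.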
Last month, Cisco [7]revealed the Cisco 8223, a 51.2 Tbps router designed to connect AI datacenters up to 1,000 kilometers away. Broadcom intends its Jericho 4 hardware, [8]announced in August, to do the same job and provide similar bandwidth.
Meanwhile, Nvidia, which has quietly become one of the largest networking vendors in the world on the back of the AI boom, [9]has teased its Spectrum-XGS network switches, with crypto-miner-turned-rent-a-GPU outfit CoreWeave signed up as an early adopter.
[10]What happens when we can't just build bigger AI datacenters anymore?
[11]DeepMind working on distributed training of large AI models
[12]Cisco's new router unites disparate datacenters into AI training behemoths
[13]Broadcom's Jericho4 ASICs just opened the door to multi-datacenter AI training
We've asked Microsoft to comment on which of these technologies it's using at its Fairwater facilities, and will update this story if we hear back. But Redmond’s close ties to Nvidia certainly make Spectrum-XGS a likely contender.
Microsoft is famously one of the few hyperscalers that has standardized on Nvidia's InfiniBand network protocol, rather than Ethernet or a proprietary data fabric like Amazon Web Services' EFA, for its high-performance compute environments.
While Microsoft has no shortage of options for stitching datacenters together, distributing AI workloads without incurring bandwidth- or latency-related penalties remains a topic of interest to researchers.
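To illustrate one generic technique from that research area (not Microsoft's or DeepMind's actual method), here's a minimal sketch of top-k gradient sparsification, which trades a little fidelity per step for a large cut in the data each site must ship over the long-haul link:

```python
import numpy as np

# Illustrative only: keep just the largest-magnitude gradient entries and
# send (indices, values) instead of the full dense tensor.
def compress_topk(grad: np.ndarray, keep_fraction: float = 0.01):
    """Return the indices and values of the top-k entries by magnitude."""
    flat = grad.ravel()
    k = max(1, int(flat.size * keep_fraction))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def decompress_topk(idx, values, shape):
    """Rebuild a dense tensor with zeros everywhere except the sent entries."""
    flat = np.zeros(np.prod(shape), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, vals = compress_topk(grad)
restored = decompress_topk(idx, vals, grad.shape)
# The payload shrinks from ~4 MB of dense FP32 gradients to ~1 percent of the
# values plus their indices before anything crosses the inter-site link.
print(f"dense bytes: {grad.nbytes:,}, sparse bytes: {idx.nbytes + vals.nbytes:,}")
```

Real systems layer tricks on top of this, such as error feedback and careful scheduling of when the sites synchronize, but the basic bandwidth arithmetic is the same.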
They're making good progress: Readers may recall that earlier this year, Google's DeepMind team [15]published a report showing that many of the challenges can be overcome by compressing models during training and strategically scheduling communications between datacenters. ®
[2] https://news.microsoft.com/source/features/ai/from-wisconsin-to-atlanta-microsoft-connects-datacenters-to-build-its-first-ai-superfactory/
[5] https://www.theregister.com/2024/03/21/nvidia_dgx_gb200_nvk72/
[7] https://www.theregister.com/2025/10/08/cisco_multi_datacenter/
[8] https://www.theregister.com/2025/08/06/broadcom_jericho_4/
[9] https://investor.nvidia.com/news/press-release-details/2025/NVIDIA-Introduces-Spectrum-XGS-Ethernet-to-Connect-Distributed-Data-Centers-Into-Giga-Scale-AI-Super-Factories/default.aspx
[10] https://www.theregister.com/2025/01/24/build_bigger_ai_datacenters/
[11] https://www.theregister.com/2025/02/11/deepmind_distributed_model_training_research/
[12] https://www.theregister.com/2025/10/08/cisco_multi_datacenter/
[13] https://www.theregister.com/2025/08/06/broadcom_jericho_4/
[15] https://www.theregister.com/2025/02/11/deepmind_distributed_model_training_research/
Maybe not, but maybe
Yeah, I'm [1]with LeCun on this one, that LLMs are yesterday's total deadbeat dead end for dimwit deadpans. Multi-continent-spanning datacenter superclusters for 100 trillion-parameter "models" won't help this language-limited tech ...
However, if these superclusters can run humongous FP64 HPC workloads efficiently (e.g. as nutritious DOE-like [2]hub-ba bubba spoke-aroonis), then I'm all for 'em, big time, imho! ;)
[1] https://observer.com/2025/11/yann-lecun-leave-meta-launch-world-models-startup/
[2] https://www.nextplatform.com/2023/03/14/doe-wants-a-hub-and-spoke-system-of-hpc-systems/
All of this energy and destruction and investment for what? Slightly improved synthesized cat pictures, high-availability lies at hyper scale, and studies regarding how many artificially unintelligent angels can dance on the head of a pin - for some notional values of angels, dances, and pins.
Such a waste.
Thoreau was right, we have become the tools of our tools.
And brings with it the spread of American spying
This should not be allowed by any country unless MS is firmly under local sovereignty. That means no access for the American regime.