Microsoft's Maia 200 promises Blackwell levels of performance for two-thirds the power

(2026/01/26)


Microsoft on Monday unveiled a new in-house AI accelerator to rival Nvidia's Blackwell GPUs.

Fabbed on TSMC's N3 process node, Redmond's second-gen [1]Maia 200 accelerator packs 144 billion transistors capable of churning out a collective 10 petaFLOPS of FP4 performance.

That puts the chip in direct contention with Nvidia's first-generation Blackwell GPUs, like the B200 — at least in terms of inference.

According to Scott Guthrie, EVP of cloud and AI at Microsoft, the chip has been "specifically optimized for inferencing very large models, including both reasoning and chain of thought."

Compared to training, inference is much more sensitive to memory bandwidth. For each token (think words or punctuation) generated, the entirety of the model's active weights needs to be streamed from memory. Because of this, memory bandwidth puts an upper bound on interactivity, that is, on how many tokens per second a system can generate for each user.

To address this, Maia 200 has been equipped with 216 GB of high-speed memory spread across what appears to be six HBM3e stacks, good for a claimed 7 TB/s of bandwidth.
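
To get a rough sense of what that bandwidth buys, here's a back-of-envelope sketch. The 7 TB/s figure comes from the article; the 200-billion-parameter model size and FP4 weights (half a byte per parameter) are purely illustrative assumptions.

    # Rough bandwidth ceiling on single-user decode speed.
    # 7 TB/s is the article's figure; the model size and FP4 weights are assumptions.
    HBM_BANDWIDTH = 7e12            # bytes per second
    ACTIVE_PARAMS = 200e9           # hypothetical 200B active parameters
    BYTES_PER_WEIGHT = 0.5          # FP4

    bytes_per_token = ACTIVE_PARAMS * BYTES_PER_WEIGHT       # ~100 GB streamed per token
    print(HBM_BANDWIDTH / bytes_per_token)                   # ~70 tokens/s upper bound at batch size 1

Real deployments batch many users together so the streamed weights are reused, but the per-user interactivity ceiling still scales with memory bandwidth.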

To put that in perspective, Nvidia's B200 GPUs offer between 180 GB and 192 GB of HBM3e with up to 8 TB/s of bandwidth each. More recent iterations of Blackwell increase this to 288 GB but bandwidth remains the same.

Optimizing for inference efficiency

Microsoft is also keen to point out just how much more cost- and power-efficient Maia 200 is than competing accelerators.

"Maia is 30 percent cheaper than any other AI silicon on the market today," Guthrie said in a [6]promotional video .

At 750 watts, the chip uses considerably less power than Nvidia's Blackwell parts, which can chew through more than 1,200 watts each. This is low enough that Microsoft says Maia can be deployed in either air- or liquid-cooled datacenters.
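
As a quick sanity check, those figures do roughly line up with the headline claim. The 10 petaFLOPS and 750 W numbers are from the article; treating a Blackwell-class part as delivering comparable FP4 throughput at around 1,200 W is an assumption lifted from the article's own framing rather than a measured result.

    # Hedged sanity check: Maia's FP4 efficiency and the power ratio vs a ~1,200 W Blackwell part.
    maia_pflops, maia_watts = 10.0, 750.0
    blackwell_watts = 1200.0                 # "more than 1,200 watts" per the article

    print(maia_pflops * 1000 / maia_watts)   # ~13.3 TFLOPS per watt of FP4
    print(maia_watts / blackwell_watts)      # ~0.63, i.e. roughly two-thirds the power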

However, it's important to remember that Maia is an inference chip. So while it may compare favorably to Nvidia's older Blackwell parts, it's not nearly as versatile.

Diving into the chip's speeds and feeds, we see Microsoft has made some significant concessions in order to maximize performance per watt.

The chip's tile tensor unit (TTU), which Microsoft calls a tensor core, supports only FP8, FP6, and FP4 datatypes in hardware. So while we still see a 2x jump in FLOPS from FP8 to FP4, workloads requiring 16- or 32-bit precision incur a stiff performance penalty, as they have to be computed on the chip's tile vector processors (TVPs).

[8]

Here's a quick rundown of the Maia 200's speeds and feeds

The good news is that most LLM inference is now done at lower precisions than BF16. In fact, it's not uncommon for model weights to be stored in a 4-bit block floating point format like NVFP4 or MXFP4, while the activations and KV caches (the model's short-term memory) are computed at a higher precision, such as MXFP8, in order to maintain accuracy.
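
For readers unfamiliar with block floating point, here's a minimal NumPy sketch of the idea behind formats like MXFP4: a block of 32 values shares one power-of-two scale, and each value is snapped to a small grid of 4-bit (E2M1) magnitudes. This is a simplified illustration, not Microsoft's or the OCP's reference implementation.

    import numpy as np

    # Toy MXFP4-style quantizer: one shared power-of-two scale per 32-value block,
    # elements rounded to the nearest FP4 (E2M1) magnitude. Illustrative only.
    FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def quantize_block(x):
        # Pick a power-of-two scale so the largest value fits within the FP4 grid
        scale = 2.0 ** np.ceil(np.log2(np.abs(x).max() / FP4_GRID[-1] + 1e-12))
        idx = np.abs(np.abs(x)[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
        return np.sign(x) * FP4_GRID[idx] * scale   # dequantized approximation

    block = np.random.randn(32).astype(np.float32)
    print(np.abs(block - quantize_block(block)).max())   # worst-case rounding error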

Nonetheless, Microsoft isn't lying when it calls this an inference accelerator. Despite some advances in ultra-low-precision training, most GenAI models are still trained at higher precisions, with BF16 the most common.

All of this is to say that, while Maia 200 may be Microsoft's most competitive AI chip to date, don't expect Redmond to cut back its Nvidia GPU orders any time soon, especially with Rubin, [9]promised to deliver a 5x uplift in inference performance over either Blackwell or Maia 200, launching later this year.

Designed to scale

Maia 200 doesn't just deliver more performance and memory than its predecessor; it's also designed to scale to support massive multi-trillion-parameter models.

Each Maia 200 is equipped with 2.8 TB/s of bidirectional bandwidth (1.4 TB/s in each direction), which enables it to pool its compute and memory resources across clusters of up to 6,144 chips. That works out to 61 exaFLOPS of AI compute and 1.3 petabytes of HBM3e.

This is achieved using an integrated Ethernet network on chip (NoC), which by our estimates has either 56 SerDes running at 200 Gbps or 112 running at 100 Gbps. Running atop this is Microsoft's own AI transport layer protocol.
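
The arithmetic behind those cluster totals and the SerDes estimate checks out. Here's the working, using only figures quoted above; the per-lane rates are our estimate, not a Microsoft spec.

    # Scale-up arithmetic from the figures above.
    chips = 6144
    print(chips * 10 / 1000)                 # ~61.4 exaFLOPS of FP4 compute
    print(chips * 216 / 1e6)                 # ~1.33 PB of HBM3e

    per_direction_tbps = 1.4 * 8             # 1.4 TB/s each way -> 11.2 Tbps
    print(per_direction_tbps * 1000 / 200)   # 56 SerDes at 200 Gbps
    print(per_direction_tbps * 1000 / 100)   # 112 SerDes at 100 Gbps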

As strange as this might sound at a time when Nvidia is pushing NVLink Fusion and AMD is backing UALink, it's not the first time we've seen Ethernet used this way. AMD is tunneling UALink over Ethernet on its MI455X series chips, and you may recall that Intel used Ethernet for chip-to-chip communications on its Gaudi family of AI accelerators.

[10]Bill Gates-backed startup aims to revive Moore's Law with optical transistors

[11]Intel puts consumer chip production on back burner as datacenters make a run on Xeons

[12]AI networking startup Upscale scores $200M to challenge Nvidia's NVSwitch

[13]House GOP wants final say on AI chip exports after Trump gives Nvidia a China hall pass

As for Microsoft's scale-up topology, the cloud giant says it's using a two-tier scale-up domain, which involves Ethernet packet switches. To us this sounds like a two-layer fat tree topology normally associated with scale-out networks.

To avoid performance bottlenecks in larger clusters, Microsoft can dynamically partition the Maia 200's 272 MB of SRAM into cluster-level (CSRAM) and tile-level (TSRAM) pools.

The CSRAM pool functions as a buffer for collective communications, avoiding unnecessary data transfers between the speedy on-chip memory and HBM. The TSRAM, meanwhile, serves as a cache for intermediate matrix multiplication results and attention kernels.
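
Microsoft hasn't published how that split is controlled, so the following is purely a conceptual sketch of the trade-off: SRAM handed to the collectives buffer is SRAM taken away from the tile-level cache. The function name and ratios are hypothetical.

    # Conceptual sketch only; the real partitioning mechanism and ratios are not public.
    TOTAL_SRAM_MB = 272

    def partition_sram(csram_fraction):
        """Split the on-die SRAM into a collectives buffer (CSRAM) and tile caches (TSRAM)."""
        csram = TOTAL_SRAM_MB * csram_fraction
        return csram, TOTAL_SRAM_MB - csram

    for frac in (0.125, 0.25, 0.5):          # bigger clusters -> more SRAM spent on collectives
        csram, tsram = partition_sram(frac)
        print(frac, round(csram, 1), round(tsram, 1))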

We've reached out to Microsoft for clarification on its scale-up topology; we'll let you know if we hear anything back.

In any case, Microsoft's networking is clearly designed to ensure it can run even the largest frontier models for its customers, and that includes OpenAI's GPT-5.2.

Maia 200 is already running in Microsoft's Central region in Des Moines, Iowa, with plans to bring it to its West 3 region in Phoenix and other locations in the near future.

Alongside the new chips, Microsoft has also launched an SDK in preview to provide prospective customers with the tools they need to start integrating the chip into their workflows – [14]sign up to request access here. The company says the chip will support both PyTorch and Triton kernels, which should lower the barrier to adoption. ®
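
For a sense of what "supports Triton kernels" means in practice, below is a generic Triton kernel of the sort developers would bring to the platform. It's written against the standard, publicly documented Triton API; the Maia backend and device naming aren't public, so nothing here is specific to Microsoft's SDK.

    import torch
    import triton
    import triton.language as tl

    # A plain Triton kernel (scaled vector add); any backend that compiles Triton,
    # Maia's included if the SDK delivers on the claim, should be able to run it.
    @triton.jit
    def scale_add_kernel(x_ptr, y_ptr, out_ptr, n, alpha, BLOCK: tl.constexpr):
        pid = tl.program_id(axis=0)
        offs = pid * BLOCK + tl.arange(0, BLOCK)
        mask = offs < n
        x = tl.load(x_ptr + offs, mask=mask)
        y = tl.load(y_ptr + offs, mask=mask)
        tl.store(out_ptr + offs, alpha * x + y, mask=mask)

    def scale_add(x, y, alpha=1.0, block=1024):
        out = torch.empty_like(x)
        grid = (triton.cdiv(x.numel(), block),)
        scale_add_kernel[grid](x, y, out, x.numel(), alpha, BLOCK=block)
        return out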



[1] https://techcommunity.microsoft.com/blog/azureinfrastructureblog/deep-dive-into-the-maia-200-architecture/4489312

[6] https://youtu.be/bGecvPR2QWo?si=R22GMgPXDeCN8cDe&t=57

[8] https://regmedia.co.uk/2026/01/26/maia_200_speeds.png

[9] https://www.theregister.com/2026/01/05/ces_rubin_nvidia/

[10] https://www.theregister.com/2026/01/24/neurophos_hopes_to_revive_moores_law/

[11] https://www.theregister.com/2026/01/23/intel_earnings_q4_2025/

[12] https://www.theregister.com/2026/01/22/upscale_skyhammer_nvidia/

[13] https://www.theregister.com/2026/01/21/house_gop_ai_chip_exports_trump_china_nvidia/

[14] https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR_70TiI5iu5HrYsHqj3v-nFUQkU4MThLQ1RFVVlWSVEyRklYRTBYMlYwUy4u


