
Positron: we don’t need no fancy HBM to compete with Nvidia’s Rubin

(2026/02/04)


On paper, Positron's next-gen Asimov accelerators, no doubt named for the beloved science fiction author, don't look like much of a match for Nvidia's Rubin GPUs.

Yet, the Arm-backed AI startup boasts its inference chip will churn out five times as many tokens per dollar while using one-fifth the power of Nvidia's latest accelerators to do it.

Those are certainly some bold claims, which the company contends are possible because the chip was designed to support large-scale inference workloads. Another $230 million of [1]fresh capital probably doesn't hurt either.


Positron's Asimov couldn't be more different from the GPUs popularized by Nvidia and AMD.


Unlike its prior-generation Atlas systems, which used high-bandwidth memory (HBM), the Asimov uses LPDDR5x memory, starting at 864GB per chip and expandable to 2.3TB using Compute Express Link (CXL). Higher memory capacity means more room for LLM parameters and the key-value caches used to keep track of the model state.
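
To put that capacity in perspective, here's some quick back-of-envelope math of our own (the trillion-parameter model and FP8 weights are purely illustrative assumptions, not anything Positron has said):

    # Illustrative capacity math (our assumptions, not Positron's figures)
    params = 1_000_000_000_000      # hypothetical 1-trillion-parameter model
    bytes_per_param = 1             # FP8 weights at one byte apiece
    weights_tb = params * bytes_per_param / 1e12
    capacity_tb = 2.3               # per-chip capacity with full CXL expansion
    kv_cache_tb = capacity_tb - weights_tb
    print(f"{weights_tb:.1f} TB of weights, {kv_cache_tb:.1f} TB left for KV cache")
    # -> 1.0 TB of weights, 1.3 TB left for KV cache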

But while LPDDR5x is both cheaper and higher capacity than HBM, it's also glacially slow by comparison.


Nvidia's newly [6]announced Rubin GPUs pack 288GB of HBM4 good for 22 TB/s of peak bandwidth. By comparison, Asimov appears to top out at around 3 TB/s. The difference, the company claims, is its chips can actually saturate 90 percent of that bandwidth, while GPUs are lucky to hit 30 percent in the real world.

However, that stat appears only to apply to the on-package LPDDR5x memory. Any CXL memory expansion is going to be limited by the chip's 32 PCIe 6.0 lanes, which are enough for about 256 GB/s of bandwidth in each direction. From what we gather, Positron aims to use this CXL memory pool to store key-value (KV) caches, something that in theory should mitigate much of the complexity and overhead of KV cache offloading.
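
That 256 GB/s figure is just the lane math (ours), assuming PCIe 6.0 signalling and ignoring protocol overhead:

    # Rough PCIe 6.0 lane math (ignores FLIT/FEC overheads)
    lanes = 32
    gbits_per_lane = 64                    # PCIe 6.0 runs at 64 GT/s per lane, per direction
    gbytes_per_sec = lanes * gbits_per_lane / 8
    print(gbytes_per_sec)                  # -> 256.0 GB/s each way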

We'll note that even if Positron's assertion that HBM-based GPUs only manage about 30 percent of peak bandwidth is true, Rubin's memory is still about 2.4x faster. And that's not even taking into consideration compute, something that Positron seems to have glossed over in its marketing materials.
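
Taking both utilization claims at face value, the arithmetic (ours) works out like so:

    # Effective bandwidth if you believe both vendors' utilization figures
    rubin_peak_tbs, rubin_util = 22.0, 0.30
    asimov_peak_tbs, asimov_util = 3.0, 0.90
    rubin_eff = rubin_peak_tbs * rubin_util      # 6.6 TB/s
    asimov_eff = asimov_peak_tbs * asimov_util   # 2.7 TB/s
    print(f"{rubin_eff / asimov_eff:.1f}x")      # -> 2.4x in Rubin's favor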


The company claims the 400-watt chip features a 512x512 systolic array running at 2 GHz that'll support the TF32, FP16/BF16, FP8, NVFP4, and Int4 datatypes. This array is fed by a series of Armv9 cores, and can be reconfigured to something like 128x512 or 512x128 depending on which is more advantageous for the task at hand. But if you were hoping for a teraFLOPS figure, we've yet to see one.
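
For what it's worth, the published figures do imply a ceiling if you assume the usual one multiply-accumulate per cell per clock; that's our assumption, not Positron's disclosure:

    # Implied dense peak, assuming one MAC (two FLOPs) per cell per clock
    cells = 512 * 512
    clock_hz = 2e9
    flops_per_mac = 2
    peak_pflops = cells * clock_hz * flops_per_mac / 1e15
    print(f"{peak_pflops:.2f} petaFLOPS")   # -> ~1.05 petaFLOPS, precision unspecified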

Having said that, raw compute is only one piece of the puzzle. Few generative AI models are designed to run efficiently on a single chip. As we've seen time and time again with chips like Google's TPU or Amazon's Trainium, per chip performance is often less important than how efficiently they can scale.

Each Asimov accelerator will be equipped with 16 Tbps of chip-to-chip bandwidth. That works out to 2 TB/s, which means the interconnect is nearly as fast as the memory.

Four Asimov chips will form Positron's Titan [8]compute platform. But rather than standalone systems, these machines are a lot more like the compute blades in Nvidia's NVL72 racks. The AI startup claims that up to 4,096 Titan systems can be combined into a single scale-up domain with more than 32 petabytes of memory on board.
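
The memory claim squares with the per-chip figure, at least on paper (our arithmetic):

    # Scale-up domain capacity from the per-chip figure
    systems, chips_per_system = 4096, 4
    tb_per_chip = 2.3                        # with full CXL expansion
    total_pb = systems * chips_per_system * tb_per_chip / 1000
    print(f"{total_pb:.1f} PB")              # -> 37.7 PB, comfortably over 32 petabytes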

This is achieved using a pure chip-to-chip mesh rather than the switched scale-up fabrics we see in Nvidia or AMD's rack-scale architectures. In this respect, Positron's scale-up fabric is really more akin to Amazon's Trainium 2 clusters or Google's TPUs, which use a variety of rings and 2D and 3D torus topologies.
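
Positron hasn't detailed the exact topology, but as an illustration of why a mesh does away with switches, here's how neighbors fall out in a hypothetical 2D torus, where every chip only ever talks to a fixed set of peers (purely a sketch on our part, not Positron's design):

    # Neighbors of chip (x, y) in a hypothetical width x height 2D torus
    def torus_neighbors(x, y, width, height):
        return [
            ((x - 1) % width, y), ((x + 1) % width, y),    # east/west, wrapping at the edges
            (x, (y - 1) % height), (x, (y + 1) % height),  # north/south, wrapping at the edges
        ]

    print(torus_neighbors(0, 0, 64, 64))
    # -> [(63, 0), (1, 0), (0, 63), (0, 1)]: fixed point-to-point links, no switch in the path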

[9]Intel welcomes memory apocalypse with Xeon workstation refresh

[10]Bill Gates-backed startup aims to revive Moore's Law with optical transistors

[11]AI networking startup Upscale AI scores $200M to challenge Nvidia's NVSwitch

[12]Microsoft's Maia 200 promises Blackwell levels of performance for two-thirds the power

While this approach eliminates the need for power-hungry packet switches, these meshes aren't easily reconfigured. Google has gotten around this using optical circuit switches, which function a bit like a telephone switchboard to physically change the way chips connect to one another or to swap in fresh accelerators in the event of a failure. Amazon, meanwhile, has [13]embraced switched fabrics with Trainium 3, arguing it offers better scalability for inference workloads.

Positron hasn't said how it plans to handle cluster provisioning just yet, but it doesn't look like we'll have to wait long to find out. Asimov is expected to begin shipping next year. ®




[1] https://www.businesswire.com/news/home/20260204250472/en/Positron-AI-Raises-%24230-Million-Series-B-at-Over-%241-Billion-Valuation-to-Scale-Energy-Efficient-AI-Inference

[6] https://www.theregister.com/2026/01/05/ces_rubin_nvidia/

[8] https://www.positron.ai/titan

[9] https://www.theregister.com/2026/02/02/intel_xeon_workstation/

[10] https://www.theregister.com/2026/01/24/neurophos_hopes_to_revive_moores_law/

[11] https://www.theregister.com/2026/01/22/upscale_skyhammer_nvidia/

[12] https://www.theregister.com/2026/01/26/microsoft_maia_200/

[13] https://www.theregister.com/2025/12/07/trainium3_all_nvidia_nvl72_mold/



