
Intel bets you'll stack cheap GPUs to avoid spending top dollar on Nvidia Pros

(2025/05/20)


Computex When it comes to AI accelerators, Intel isn't very competitive, and its newly announced Battlemage workstation cards don't do much to change that. But at least they're cheap. Really cheap.

For the purposes of AI, we can mostly ignore the $299 Intel Arc B50, which is positioned as a more traditional workstation GPU for graphics-intensive workloads.

But the x86 giant is pushing the more performant (and power-hungry) B60 for both graphics and AI inference applications. It hasn't set an official price yet, but Vivian Lien, vice president and general manager of client graphics at Intel, expects the cards will account for about $500 of the overall cost of a PC. On the open market, we suspect real-world pricing will come in a bit higher than that, though.

On paper, the B60 falls somewhere between Nvidia's RTX 4000 Ada and 4500 Ada Generation GPUs, which currently cost between $1,250 and $2,400.

But if AI inference is your primary focus, you're probably looking at something like Nvidia's RTX Pro 6000 workstation cards, [4] announced back in March at GTC, which boast roughly 4.5-5x higher INT8 performance and 4x the memory capacity and bandwidth of Intel's B60. Those cards are currently retailing in the neighborhood of $8,565 each. That means the B60 would come in at about 1/17th the price of a typical inference-focused Nvidia GPU.

Here's a brief rundown of how the two compare. We've included the B50 in there just for reference.

                   | Arc Pro B50 | Arc Pro B60 | RTX Pro 6000
Memory capacity    | 16 GB       | 24 GB       | 96 GB
Memory bandwidth   | 224 GB/s    | 456 GB/s    | 1,792 GB/s
INT8 perf          | 170 TOPS    | 197 TOPS    | 877-1,007 TOPS
FP4 perf           | NA          | NA          | 1,755-2,015 TFLOPS
TDP                | 70W         | 120W-200W   | 300W-600W
Price              | $299        | ~$500       | ~$8,500

Note: all performance figures given are for dense integer / floating point performance without sparsity enabled.

Competition through parallelism

As you can see, on its own the B60 can't hold a candle to Nvidia's latest workstation cards. But, if you cram four of them into a workstation chassis, you're at least in the same ballpark. And that's exactly what Intel expects customers to do. In fact, it envisions systems with as many as eight of these chips on board. Intel is calling this concept Project Battlematrix.

[5] Intel's Project Battlematrix promises to cram up to eight cheap and cheerful Arc Pro B60s into a box for 192GB of vRAM and 1.5 petaOPS of compute

For an eight-GPU system, you're looking at about 1.5 petaOPS of dense INT8 performance, 192GB of vRAM, and 3.6TB/s of aggregate memory bandwidth. More importantly, assuming Lien's $500-per-card estimate actually plays out, you're looking at about $4,000 worth of GPUs. Even if the cards end up being closer to $750 apiece, that's still far cheaper than buying a pair of RTX Pro 6000s.

In fact, you could conceivably rack up two eight-GPU Intel systems for less than the price of one dual-GPU Nvidia workstation.
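For those keeping score at home, here's a quick back-of-the-envelope sketch of how those aggregate figures and price comparisons shake out. The per-card prices are the estimates discussed above, not list prices.

```python
# Back-of-the-envelope aggregate figures for an eight-card Battlematrix box,
# using the per-card specs from the table above. Prices are estimates, not list.
B60 = {"int8_tops": 197, "vram_gb": 24, "bw_gbs": 456}
CARDS = 8

agg_tops = B60["int8_tops"] * CARDS       # ~1,576 TOPS, i.e. ~1.5 petaOPS dense INT8
agg_vram = B60["vram_gb"] * CARDS         # 192 GB of vRAM
agg_bw   = B60["bw_gbs"] * CARDS / 1000   # ~3.6 TB/s aggregate memory bandwidth

RTX_PRO_6000 = 8565                       # approximate retail price quoted above
for per_card_price in (500, 750):         # Intel's estimate vs a pricier street scenario
    gpu_bill = per_card_price * CARDS
    print(f"8x B60 @ ${per_card_price}: ${gpu_bill:,} "
          f"vs 2x RTX Pro 6000: ${2 * RTX_PRO_6000:,}")
```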

We say rack because if you plan to deploy more than one of these, at least in the US, a single system would already be pushing the limits of a 15 amp circuit (1,800W). In such a configuration, we expect you'll need to adjust the TDP of the B60s to something closer to 120W to avoid tripping a breaker under load.
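Here's a rough sketch of that power math. We're assuming a nominal 120V US circuit, and the host-system overhead figure is our own illustrative assumption.

```python
# Rough power-budget check for an eight-card system on a 15 amp, 120 V circuit.
# The host overhead (CPU, RAM, storage, fans, PSU losses) is an assumed figure.
CIRCUIT_W = 15 * 120   # 1,800 W available
HOST_W    = 350        # assumed non-GPU system draw
CARDS     = 8

for card_tdp in (200, 150, 120):
    total = HOST_W + CARDS * card_tdp
    headroom = CIRCUIT_W - total
    print(f"{card_tdp} W per card -> {total} W total, {headroom:+} W headroom")
```

At the B60's full 200W TDP the box blows past the 1,800W ceiling once you account for the rest of the system, which is why dialing the cards back toward 120W looks necessary.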

Each individual B60 consumes less power (120W-200W) than the RTX Pro 6000 workstation cards (300W-600W), but they're also less power efficient. This is especially true when you take into consideration that Nvidia's latest chips offer native support for 4-bit data types, which the B60 does not.

While an RTX Pro 6000 may be 4.5x-5x faster at INT8, it's closer to 9-10x faster at FP4, and 18x-20x faster if you can take advantage of sparsity.

But for an inference workstation, that might not be as big a deal as you think. That's because, as impressive as 4 petaFLOPS of sparse FP4 might sound, inference workloads tend to be memory-bandwidth-bound rather than compute-limited. Inference doesn't really benefit from sparsity, either.
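As a rough rule of thumb, single-stream token generation is bounded by how quickly a card can stream the model's weights out of vRAM, so you can ballpark an upper limit as bandwidth divided by model size. The sketch below does just that; the model sizes are illustrative assumptions, and real-world numbers will come in lower once KV-cache traffic and kernel overheads are factored in.

```python
# Rule-of-thumb decode estimate: single-stream token generation is roughly
# bounded by (memory bandwidth) / (bytes of weights read per token).
# Ignores KV-cache traffic, batching, and overheads - illustrative only.
def tokens_per_s(model_gb, bw_gbs):
    return bw_gbs / model_gb

models = {"8B @ 4-bit (~5 GB)": 5, "70B @ 4-bit (~40 GB)": 40}
cards  = {"Arc Pro B60 (456 GB/s)": 456, "RTX Pro 6000 (1,792 GB/s)": 1792}

for mname, gb in models.items():
    for cname, bw in cards.items():
        print(f"{mname} on {cname}: ~{tokens_per_s(gb, bw):.0f} tok/s upper bound")
```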

More compute can be helpful during the inference prefill stage, when the model is processing your prompt. However, this is most noticeable for workloads like summarizing a report, or systems that need to serve large quantities of concurrent requests.

What's more, the B60 might not natively support 4-bit datatypes, but it can still run many 4-bit quantized models. For example, Llama.cpp, which popularized GGUF quantization, has supported Intel GPUs via SYCL for at least a year now. Just because the weights are stored at lower precision doesn't mean the activations (the compute-heavy bit) have to be.
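The sketch below is a toy illustration of that idea: weights are stored as 4-bit integers with a per-row scale and dequantized to full precision before the matmul, so no native FP4/INT4 math is required. It's a simplified stand-in, not llama.cpp's actual GGUF kernels.

```python
import numpy as np

# Toy weight-only 4-bit quantization: weights are *stored* as 4-bit integers
# with a per-row scale, but *compute* happens after dequantizing to float32.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)   # original weights
x = rng.standard_normal(256).astype(np.float32)          # activations stay full precision

scale = np.abs(W).max(axis=1, keepdims=True) / 7.0            # symmetric 4-bit range [-7, 7]
W_q4  = np.clip(np.round(W / scale), -7, 7).astype(np.int8)   # would be packed two-per-byte on disk

W_deq = W_q4.astype(np.float32) * scale    # dequantize just before the matmul
y = W_deq @ x                              # the actual compute runs at full precision

print("max abs error vs fp32 matmul:", np.abs(y - W @ x).max())
```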

While Intel is mostly marketing the B60 as an inference card, it's still a GPU and could just as easily be used for model fine-tuning - a topic we've explored in detail before.

Multi-user environments

Having multiple GPUs in a box also presents a couple of unique opportunities for deployment, particularly in a lab environment, where you might have multiple users sharing resources.

While there are numerous ways of sharing GPU resources, like temporal slicing and resource partitioning, each comes with its own drawbacks. When you've got eight GPUs in a box, you could have eight different users each running their own workloads on a dedicated GPU.
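One low-tech way to do that - and we stress this is a sketch rather than a recipe - is to pin each user's process to a single card before the runtime enumerates devices, for example via oneAPI's device-selector environment variable. The per-user slot mapping below is hypothetical, and the exact selector syntax is worth checking against Intel's documentation for your driver and runtime version.

```python
import os

# Sketch: pin this user's process to one GPU before any SYCL/oneAPI-based
# framework enumerates devices. ONEAPI_DEVICE_SELECTOR is oneAPI's device
# filter; verify the exact syntax against Intel's docs for your runtime.
user_slot = int(os.environ.get("USER_GPU_SLOT", "0"))   # hypothetical per-user mapping
os.environ["ONEAPI_DEVICE_SELECTOR"] = f"level_zero:{user_slot}"

# Frameworks imported after this point should only see the one selected B60.
```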

We're told Intel is currently working on adding SR-IOV support. The tech enables a PCIe device to appear as multiple virtual devices and should vastly simplify the process of partitioning GPU resources or just passing them through to virtual machines.

Technically, this is also possible on Nvidia's RTX Pro graphics cards - at least on the 6000-series models. Each chip supports Nvidia's multi-instance GPU (MIG) partitioning tech, which allows you to split the chip into either two 48GB partitions or four 24GB partitions.

However, unlocking that tech may require an Nvidia vGPU license on top of the chips' already high price.

[8] Nvidia wants to put a GB300 Superchip on your desk with DGX Station, Spark PCs

[9] El Reg's essential guide to deploying LLMs in production

[10] Nvidia opens up speedy NVLink interconnect to custom CPUs, ASICs

[11] AMD is Ryzen to the SMB occasion with a bundle of baby Epycs

Stepping up its software game

Multi-GPU systems do involve some compromises. Speeds and feeds don't always translate into real-world performance if the software isn't up to snuff.

But the software situation around Intel's Xe graphics architecture has improved steadily over the past year. And while getting workloads to run across multiple GPUs can introduce complexity, it's a pretty well understood problem at this point.

With the launch of its B60 platform, Intel has committed to further improvements to the software ecosystem, including the introduction of pre-baked container images, which will ship with everything you need to get a particular framework up and running. As we understand it, vLLM will be among the first container environments offered, but we may also see containers for Ollama and Llama.cpp before long.
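To give a flavour of what serving across all eight cards could look like from the user's side, here's a sketch using vLLM's standard offline Python API. The model name is a placeholder, and whether Intel's containers wire the XPU backend up to eight-way tensor parallelism exactly like this is an assumption on our part.

```python
from vllm import LLM, SamplingParams

# Sketch of vLLM's offline API. The model name is a placeholder, and running
# tensor_parallel_size=8 on Intel's XPU backend as shown is an assumption.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=8,                     # shard the weights across the eight B60s
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize this quarter's sales report:"], params)
print(outputs[0].outputs[0].text)
```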

As we've [12] previously explored, these kinds of container environments can make deploying AI workloads and development environments considerably easier: rather than wrangling dependencies and package managers and then chasing down version-support bugs, that work is automated and runs in a sandboxed environment.

These container environments are expected to start rolling out alongside the cards in Q3, with SR-IOV, virtual desktop infrastructure, and other management functionality following in the fourth quarter. ®





[4] https://www.theregister.com/2025/03/18/gtc_frame_nvidias_budget_blackwell/

[5] https://regmedia.co.uk/2025/05/19/intel_project_battlematrix.jpg


[8] https://www.theregister.com/2025/03/18/gtc_frame_nvidias_budget_blackwell/

[9] https://www.theregister.com/2025/04/22/llm_production_guide/

[10] https://www.theregister.com/2025/05/19/nvidia_nvlink_fusion/

[11] https://www.theregister.com/2025/05/13/amd_epyc_4005/

[12] https://www.theregister.com/2024/07/07/containerize_ai_apps/




It's Intel

cyberdemon

So Linux drivers will be shit or nonexistent, and the product will be killed off entirely in a few years..

Other than my obvious cynicism re. Intel, the ability to shun cloud LLMs and instead run them locally would be quite nice. Not sure I'd spend the requisite $4000 to stack 8 of them for 192GB though

Re: It's Intel

NoneSuch

Ordinarily, yes. However, Intel is on the back foot and needs to re-establish itself. They might even *gasp* listen to their customer base for once. Sticking to a lower-cost GPU may pay off for them in the end if certain profit-motivated executives can be sidelined for a while.

Linux drivers will suck for 3-6 months after release, but that has typically been the case.

Re: It's Intel

Yet Another Anonymous coward

Yes, but if you ignore the extra cost and complexity of all the extra motherboards, power, cooling, rack-space, interconnects, management issues and latency from many more, weaker cards - I think you'll find they have a compelling business case. If only they had the software support that NVidia has.

Re: It's Intel

Rich 2

From what little I’ve seen and read, the new Intel cards work pretty well on Linux.

nVidia meanwhile are making a total balls-up of their latest offerings, regardless of OS.

I’m particularly aggrieved with nVidia because they have removed power control support from their Linux driver for (I think I have this right) 3000 series and older. Which means my laptop now cooks itself. Despite a forum chain of complaints as long as your arm, nVidia are completely ignoring the problem and offering no explanation.

My next PC will not include an nVidia graphics card.

williamyf

Intel already has an AI-only solution in the form of the Gaudi chips, mostly developed by Habana Labs, with some sprinkles of Nervana and Movidius technology.

Problem is, if you wanted to repurpose these for HPC (say weather modelling, nuclear modelling, geological seismic oil/gas surveys, movie/VFX rendering), the results were abysmal.

Meanwhile, Arc GPUs are decent to mediocre at AI, but decent to good for HPC. While these are marketed at AI, I'd not be surprised if the vast majority of these "Battlematrix" cards end up doing HPC.

PS: Intel was/is hard at work to produce a hybrid of Gaudi and Arc, but hit some snags. "El Reg" and sister site "The Next Platform" have covered this in varying depth.
