News: 0183584898


Google Launches 'Gemma 4 12B' AI Model That Can Run On Your Laptop

(Wednesday June 03, 2026 @05:00PM (BeauHD) from the locally-processed dept.)


Google has [1]launched Gemma 4 12B, a 12-billion-parameter open AI model [2]designed to run locally on your laptop without depending entirely on cloud infrastructure. WION reports:

> According to Google, the new model delivers performance close to much larger AI systems while requiring significantly less memory. The company says Gemma 4 12B can run locally on devices equipped with just 16GB of VRAM, making advanced AI more accessible to developers, researchers and businesses. The launch highlights a growing trend across the AI industry: bringing powerful AI models directly to personal computers instead of relying solely on remote data centers.

> Gemma is Google's family of open AI models built using technology and research from its Gemini program. The new Gemma 4 12B model contains 12 billion parameters and has been designed to handle multiple types of information, including text, images and audio. Unlike traditional AI systems that focus only on text, Gemma 4 12B can understand visual content, process audio inputs and perform advanced reasoning tasks. This makes it suitable for a wider range of applications, from software development and content creation to research and automation. Google says the model is available under the Apache 2.0 licence, allowing developers and organizations to use, modify and deploy it with relatively few restrictions.

> [...] One of the most significant technical changes in Gemma 4 12B is its new unified architecture. Traditionally, multimodal AI systems use separate components known as encoders to process images, audio and text before combining the information. Google says Gemma 4 12B removes the need for separate multimodal encoders. Instead, the model processes different types of information through a unified architecture. According to the company, this helps improve efficiency while reducing memory requirements and computational overhead. The result is a model that can deliver advanced multimodal capabilities while remaining small enough to run locally on modern hardware.



[1] https://developers.googleblog.com/bringing-gemma-4-12b-to-your-laptop-unlocking-local-agentic-workflows-with-google-ai-edge/

[2] https://www.wionews.com/technology/google-launches-gemma-4-12b-this-powerful-ai-model-from-google-can-run-on-your-laptop-1780505546251



Re: (Score:2)

by WolfgangVL ( 3494585 )

It probably reads your email to build a "state-of-the-art personalized targeted ad experience"

Re: (Score:3)

by allo ( 1728082 )

There is none. The Gemma 4 series are pretty solid general-purpose models, and the 12B is the latest, bridging the gap between the E4B and the 26B-A4B models.

Get a recent llama.cpp and get started: [1]https://llama.app/

For Gemma-4 12B you may need the latest git version or wait a day for a new build.

[1] https://llama.app/

Re:What's the catch? (Score:1)

by Tablizer ( 95088 )

Probably in here somewhere: "run locally on your laptop without depending entirely on cloud"

Re: (Score:2)

by gabebear ( 251933 )

Macs... Macintosh laptops are basically the only laptops with enough RAM, memory bandwidth, and processors.

Re: (Score:2)

by allo ( 1728082 )

You can offload layers to CPU or use more aggressive quantization. 12B models fit well in 8GB VRAM. You can go lower, but the speed will not be as good.
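
A rough sketch of the offloading trade-off described above, not a measurement. It assumes all layers are the same size and sets aside a fixed reserve for the KV cache and runtime buffers. The 48-layer count, the 7.5 GB model size (12B parameters at about 5 bits each) and the 1.5 GB reserve are illustrative assumptions, not published Gemma 4 12B figures.

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int,
                        vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many equally sized layers fit in VRAM.

    reserve_gb is a hypothetical allowance for KV cache and buffers;
    real usage depends on context length and the runtime.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Assumed values for illustration only.
MODEL_GB = 7.5   # ~12B parameters at ~5 bits per parameter
N_LAYERS = 48    # hypothetical layer count

for vram in (6, 8, 16):
    on_gpu = gpu_layers_that_fit(MODEL_GB, N_LAYERS, vram)
    print(f"{vram} GB VRAM: {on_gpu} layers on GPU, {N_LAYERS - on_gpu} on CPU")
```

In llama.cpp the resulting number would be passed as the GPU layer count (the `--n-gpu-layers` option); whatever does not fit stays on the CPU, which is where the speed penalty comes from.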

Re: (Score:2)

by geekmux ( 1040042 )

> "Just" 16GB of vram, so "your laptop" would have to be a very expensive high end model that most won't have.

16GB of VRAM is just for the AI memoryware loan.

The second line of credit is for the other 16GB of RAM needed for 21st Century BloatOS, which is ironically expensive and now comes secondary to whatever the hell you were thinking about using that computer for other than AI, AI, or AI. Also known as the entire point of futureputers.

(If you think this is bad, just wait until 2028 AI requires 128GB... which of course ought to be enough for anybody who can afford the month-to-month lease.)

Spin? (Score:2)

by Powercntrl ( 458442 )

Didn't Google just get caught including one of their models as shovelware on Chrome? Is this an attempt to make it seem like a model trimmed down to run on consumer hardware is some kind of positive innovation, rather than a waste of space that no one asked for?

At any rate, it was kind of humbling when I checked to see if Chrome had downloaded that AI crap on my laptop and then I realized it hadn't, because my machine doesn't meet the minimum hardware specs. Oops.

Re: (Score:1)

by Anonymous Coward

It is just a new release. No idea why Slashdot reports on this one in particular, because the other Gemmas (2 smaller, 2 larger than this) have been out for a few weeks. 12B is a nice medium size (medium for the GPU-poor at least), but nothing that revolutionary. The Chrome model is smaller than that. Android ships the E2B and E4B models.

which laptop has 16GB of VRAM? (Score:2)

by thesjaakspoiler ( 4782965 )

I'll bet that even Google employees don't have one.

Re: (Score:2)

by Un-Thesis ( 700342 )

My HP Omen AI laptop:

| NVIDIA-SMI 595.71.05 Driver Version: 595.71.05 CUDA Version: 13.2 |

| 0 NVIDIA GeForce RTX 5080 ... Off | 00000000:C3:00.0 Off | N/A |

| N/A 56C P8 9W / 80W | 116MiB / 16303MiB | 0% Default |

Re: (Score:1)

by svx ( 764251 )

Google didn't mention anything about 16GB of VRAM, only the reporter did. 16GB of *unified memory* will suffice.

Re: (Score:2)

by MtHuurne ( 602934 )

16GB of dedicated VRAM is rare, but for LLMs shared RAM is often good enough and that is more common in laptops. Although with current RAM prices getting 32GB+ of shared RAM isn't the auto-pick that it used to be.

Also I don't think you need 16GB for a 12B model; generally you can quantize to 4 or 5 bits per parameter without too much of a quality loss, which should make it fit within 8GB.
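
The arithmetic behind that estimate, as a minimal sketch: it counts the weights only (parameters times bits per parameter) and ignores the KV cache, activations and runtime overhead, so real usage will be somewhat higher. The function name is made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 12e9  # 12 billion parameters, per the article

for bits in (16, 8, 5, 4):
    print(f"{bits:>2} bits/param: {weight_footprint_gb(N_PARAMS, bits):.1f} GB")
```

At 4 to 5 bits per parameter the weights come to roughly 6 to 7.5 GB, which is why an 8GB budget is plausible, while unquantized 16-bit weights would need about 24 GB.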

"It can run on your laptop" is not the news (Score:3)

by allo ( 1728082 )

If you like 12B models, Mistral-Nemo has been around for 2 years.

If you are more VRAM constrained, there was Llama 3 8B. There is now Qwen3.5 9B. Gemma 4 already had E4B and E2B for devices like your smartphone. Ministral 3 also has 3B, 8B, 14B variants. There is more than enough choice for small models. And with offloading some layers to CPU RAM you can even run larger ones.

The interesting part of this model is that it is multi-modal (it supports image and audio input) without an encoder. That's new to the Gemma architecture.

Not "entirely" local? (Score:2)

by fahrbot-bot ( 874524 )

> AI model ... designed to run locally on your laptop without depending entirely on cloud infrastructure.

Guessing the non-local part is telemetry for training and ad revenue?
