
OpenAI makes good on its name, launches first open weights language models since GPT-2

(2025/08/05)


OpenAI released its first open weights language models since GPT-2 on Tuesday with the debut of GPT-OSS.

The models come in two sizes: the first is a 117-billion-parameter reasoning model that OpenAI says delivers performance roughly matching its proprietary o4-mini. The second is a smaller, 21-billion-parameter version that we're told achieves similar performance to o3-mini.

[1]

Here's how OpenAI says its new open weights models stack up against its proprietary models. - Click to enlarge

You can find more model benchmarks on OpenAI's blog [2]here.

As models go, these are about as open as they get. Rather than using some custom license that restricts how many users you can have or whether you can use them in commercial applications, OpenAI has opted to make its latest models available under the highly permissive Apache 2.0 license. That means you can do just about anything you want with them.

According to OpenAI, it trained its GPT-OSS models primarily on a diet of English text with an emphasis on STEM, coding, and general knowledge. The models also lack the vision capabilities of OpenAI's larger models like GPT-4o.


During post-training, OpenAI applied reinforcement learning in a process similar to what it used to imbue o4-mini with its chain-of-thought reasoning capabilities. And just like Altman and crew's proprietary models, you can adjust the models' reasoning effort to low, medium, or high by setting the desired level in the system prompt: i.e. "Reasoning: high."
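In practice, that directive just rides along in the chat's system message. The sketch below shows how a request payload for an OpenAI-compatible chat endpoint might carry it; the model name and helper function are illustrative, not part of any official SDK.

```python
# Illustrative sketch: setting GPT-OSS reasoning effort via the system
# prompt, as described above. The model name and function are assumptions;
# adapt them for whatever OpenAI-compatible server you run.

def build_request(user_prompt: str, effort: str = "medium") -> dict:
    """Build a chat-completion payload with a reasoning-effort directive."""
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "gpt-oss-20b",  # illustrative model identifier
        "messages": [
            # The effort level goes in the system prompt, e.g. "Reasoning: high".
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": user_prompt},
        ],
    }

payload = build_request("Why is the sky blue?", effort="high")
print(payload["messages"][0]["content"])  # Reasoning: high
```

The same payload shape works with any server that speaks the OpenAI chat-completions API, which most of the frameworks mentioned below do.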


The open weights models also utilize a mixture-of-experts (MoE) architecture.

GPT-OSS-120B features 128 experts, four of which (totaling 5.1 billion parameters) are activated to generate each output token. GPT-OSS-20B, meanwhile, is essentially a stripped-down version with 32 experts and 3.6 billion active parameters. If you're not familiar, these experts are essentially sub-models that an internal routing mechanism dynamically activates when generating a response.
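The routing described above can be sketched in a few lines: a small router scores every expert for each token, and only the top-scoring few actually run. This toy example uses made-up sizes, not GPT-OSS's real dimensions or router design.

```python
import numpy as np

# Minimal mixture-of-experts sketch (toy sizes, not GPT-OSS's real ones):
# a router scores all experts per token; only the top-k experts execute.
rng = np.random.default_rng(0)

n_experts, d_model, top_k = 8, 16, 2  # GPT-OSS-120B: 128 experts, 4 active
router_w = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts only."""
    logits = x @ router_w                    # one score per expert
    chosen = np.argsort(logits)[-top_k:]     # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Weighted sum of the selected experts' outputs; the rest stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
```

Because only `top_k` of the expert matrices are touched per token, compute per token scales with the active parameter count, not the total, which is where the speed claim below comes from.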

[6]

Here's a quick overview of the model's MoE architecture - Click to enlarge

What this means is that, so long as you can fit these models into your VRAM, they will generate tokens far faster than a dense model of equivalent size.

Speaking of hardware, running these models shouldn't be too much of a problem because OpenAI trained them at native MXFP4 precision in the MoE layer. According to OpenAI, the 120B model can run on a single 80GB H100 GPU, while the smaller 20B version can fit in just 16GB of VRAM.
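A quick back-of-envelope check makes those figures plausible. MXFP4 stores 4-bit values in blocks that share a scale factor, so roughly 4.25 bits per parameter is a reasonable approximation (an assumption here); real deployments also need headroom for the KV cache and activations, which this sketch ignores.

```python
# Back-of-envelope VRAM estimate for MXFP4-quantized weights.
# 4-bit values plus per-block scale overhead ~= 4.25 bits/param (assumption).
BITS_PER_PARAM = 4.25

def weight_gb(params_billions: float, bits: float = BITS_PER_PARAM) -> float:
    """Approximate weight footprint in gigabytes."""
    return params_billions * 1e9 * bits / 8 / 1e9

print(round(weight_gb(117), 1))  # ~62 GB: fits an 80GB H100 with room to spare
print(round(weight_gb(21), 1))   # ~11 GB: within a 16GB card
```

That ~62GB figure for the weights alone explains why a single 80GB H100 suffices, and why the 20B model squeezes into 16GB consumer cards.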


Testing GPT-OSS-20B in Ollama on an RTX 6000 Ada, we observed token generation rates in excess of 125 tokens/sec at a batch size of one.

Both models feature a native context window of 128K tokens. While that might have been competitive a year ago, Alibaba's Qwen3 family offers a 256K-token context window, while Meta's Llama 4 herd, for better or worse, supports up to 10 million tokens of context.

GPT-OSS's debut comes after repeated delays, the most recent of which OpenAI CEO Sam Altman attributed to extended safety evaluations.


In a [9]blog post on Tuesday, OpenAI expanded on these safety features, which included filtering out harmful data on topics such as chemical, biological, radiological, or nuclear research and development.

OpenAI also trained the models to refuse unsafe prompts and to resist prompt injection attempts.

"Once an open-weight model is released, adversaries may be able to fine-tune the model for malicious purposes," the company explained.

OpenAI says that, during development, these measures effectively prevented testers from co-opting the models for malicious use. The company is confident enough in its safety measures that it's challenged developers to red-team the models and offered a half-million-dollar prize for anyone who can identify novel safety issues.

At launch, GPT-OSS is available on a variety of model repositories, including Hugging Face, and boasts broad support for inference frameworks, including Hugging Face Transformers, PyTorch, Triton, vLLM, Ollama, and LM Studio. If you'd like to test the models out, check out our guide to deploying LLMs locally [10]here.

GPT-OSS doesn't appear to be the only thing OpenAI has cooking. In a [11]post on X, Altman said to expect a "big upgrade later this week." GPT-5 perhaps? ®




[1] https://regmedia.co.uk/2025/08/05/openai_gpt_oss_perf.jpg

[2] https://openai.com/index/introducing-gpt-oss/


[6] https://regmedia.co.uk/2025/08/05/openai_oss_gpt.jpg


[9] https://openai.com/index/introducing-gpt-oss/

[10] https://www.theregister.com/2024/03/17/ai_pc_local_llm/?td=rt-3a

[11] https://x.com/sama/status/1952759361417466016



