Anthropic's Claude Opus 4.6 spends $20K trying to write a C compiler

(2026/02/09)

Reference: 1770656715
News link: https://www.theregister.co.uk/2026/02/09/claude_opus_46_compiler/

An Anthropic researcher's efforts to get the company's newly released Opus 4.6 model to build a C compiler left him "excited," "concerned," and "uneasy."

It also left many observers on GitHub skeptical, to say the least.

Nicholas Carlini, a researcher on Anthropic's Safeguards team, detailed the experiment with what he called "agent teams" [1]in a blog post that coincided with the official release of Opus 4.6.

He said he "tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. After nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V."

With agent teams, he said, "multiple Claude instances work in parallel on a shared codebase without active human intervention."

One key task was getting round the need for "an operator to be online and available to work jointly," which we presume means removing the need for Claude Code to wait for a human to tell it what to do next.

"To elicit sustained, autonomous progress, I built a harness that sticks Claude in a simple loop... When it finishes one task, it immediately picks up the next." Imagine if humans took that sort of approach.

Carlini continued: "I leave it up to each Claude agent to decide how to act. In most cases, Claude picks up the 'next most obvious' problem." This threw up a number of lessons, including the need to "write extremely high quality tests."
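The loop Carlini describes can be pictured with a few lines of Python. This is a sketch only, not his harness: the task names, the `run_session` callable, and the function name `agent_loop` are hypothetical stand-ins, and a real setup would start a Claude Code session where this sketch calls a plain function.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def agent_loop(
    tasks: Iterable[str], run_session: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Work through tasks one after another with no human in between.

    `run_session` is a hypothetical stand-in for running one agent
    session on a task; it returns True if the task was completed.
    """
    queue = deque(tasks)
    outcome: Dict[str, List[str]] = {"done": [], "failed": []}
    while queue:
        task = queue.popleft()   # pick up the next task immediately
        ok = run_session(task)   # assumed to block until the session ends
        outcome["done" if ok else "failed"].append(task)
    return outcome


if __name__ == "__main__":
    # Toy session runner: pretends every task except "codegen" succeeds.
    def fake_session(task: str) -> bool:
        return task != "codegen"

    print(agent_loop(["lexer", "parser", "codegen"], fake_session))
```

In the experiment each agent chose its own next task, so the fixed queue here is a simplification made for the sake of a short example.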

Readers were also advised to "put yourself in Claude's shoes." That means the "test harness should not print thousands of useless bytes" to make it easier for Claude to find what it needs.
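One way to act on that advice is to have the test runner print only a short summary and keep the full detail in a log file. The Python sketch below assumes a made-up result format (test name mapped to an error message, empty on a pass); `report`, `log_path`, and `max_failures` are illustrative names, not anything taken from Carlini's setup.

```python
from pathlib import Path
from typing import Dict


def report(results: Dict[str, str], log_path: str, max_failures: int = 5) -> str:
    """Return a short summary and write the full details to `log_path`.

    `results` maps a test name to "" on a pass or to an error message
    on a failure (a hypothetical format chosen for this sketch).
    """
    failures = {name: msg for name, msg in results.items() if msg}
    # Full detail goes to a file the agent can open if it needs it.
    Path(log_path).write_text(
        "\n".join(f"{name}: {msg or 'ok'}" for name, msg in results.items())
    )
    lines = [f"{len(results) - len(failures)} passed, {len(failures)} failed"]
    for name, msg in sorted(failures.items())[:max_failures]:
        # Show only the first line of each error to keep the output short.
        lines.append(f"FAIL {name}: {msg.splitlines()[0]}")
    if len(failures) > max_failures:
        lines.append(f"... {len(failures) - max_failures} more in {log_path}")
    return "\n".join(lines)
```

The cap on printed failures is an arbitrary choice; the point is that the summary stays a handful of lines however many tests run.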

Also, "Claude can't tell time and, left alone, will happily spend hours running tests instead of making progress."

Which might make you feel working with Claude is closer to working with a regular human than you might have thought. But what was the upshot of all of this?

"Over nearly 2,000 Claude Code sessions across two weeks, Opus 4.6 consumed 2 billion input tokens and generated 140 million output tokens, a total cost just under $20,000."

This made it "an extremely expensive project" compared to the priciest Claude Max plans, Carlini said. "But that total is a fraction of what it would cost me to produce this myself – let alone an entire team."
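For a sense of scale, the quoted figures can be turned into rough per-session and per-line averages. The snippet below uses only numbers given in the article; since "nearly" and "just under" signal rounding, the results are ballpark values rather than exact ones.

```python
# Figures as quoted in the article, treated as round numbers.
sessions = 2_000
input_tokens = 2_000_000_000
output_tokens = 140_000_000
total_cost_usd = 20_000
compiler_lines = 100_000

cost_per_session = total_cost_usd / sessions     # dollars per session
input_per_session = input_tokens / sessions      # input tokens per session
output_per_session = output_tokens / sessions    # output tokens per session
cost_per_line = total_cost_usd / compiler_lines  # dollars per line of compiler
input_to_output = input_tokens / output_tokens   # input tokens per output token

print(f"~${cost_per_session:.0f} per session")
print(f"~{input_per_session:,.0f} input / {output_per_session:,.0f} output tokens per session")
print(f"~${cost_per_line:.2f} per line of compiler code")
print(f"~{input_to_output:.1f} input tokens for every output token")
```

That works out to roughly $10 and a million input tokens per session, and about 20 cents per line of the finished compiler.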

Other lessons? "The compiler successfully builds many projects, but not all. It's not yet a drop-in replacement for a real compiler." Moreover, "the generated code is not very efficient."

He added that the Rust code quality is "reasonable but... nowhere near the quality of what an expert Rust programmer might produce."

[7]OpenClaw reveals meaty personal information after simple cracks

[8]Anthropic apes OpenAI with cheeky chatbot commercials

[9]Anthropic cements its position as the not-OpenAI with no-ads pledge

[10]Rise of AI means companies could pass on SaaS

Carlini concluded: "Agent teams show the possibility of implementing entire, complex projects autonomously."

But as a former pen-tester, he said fully autonomous development posed real risks. "The thought of programmers deploying software they've never personally verified is a real concern." Ultimately, the experiment "excites me, [but] also leaves me feeling uneasy."

Commenters on GitHub were less equivocal, not least because they felt the $20K price tag ignored a few other factors, such as the vast amount of other programmers' code the model was trained on in the first place.

As [11]mohswell put it: "If I went to the supermarket, stole a bit of every bread they had, and shoved it together, no one would say I made bread from scratch. They'd say I'm a thief. If this is 'from scratch,' then my cooking is farm-to-table."

While [12]Sambit003 opined: "The comment section and the issue itself is 'absolute cinema' moment everyone living through😂... the longer the AI generated codes I see... the safer I feel. 😂 Still we have the jobs (for long enough years)... just enjoy the overhyping bruh."

[13]Serkosal added plaintively: "okay, nice, could @claude find gf for me? No? I'm not interested." ®

[1] https://www.anthropic.com/engineering/building-c-compiler

[7] https://www.theregister.com/2026/02/05/openclaw_skills_marketplace_leaky_security/

[8] https://www.theregister.com/2026/02/05/anthropic_superbowl_openai_chatgpt_ads/

[9] https://www.theregister.com/2026/02/04/anthropic_no_advertising_in_claude/

[10] https://www.theregister.com/2026/02/04/ai_replace_saas/

[11] https://github.com/anthropics/claudes-c-compiler/issues/1#issuecomment-3869799573

[12] https://github.com/anthropics/claudes-c-compiler/issues/1#issuecomment-3862135955

[13] https://github.com/anthropics/claudes-c-compiler/issues/1#issuecomment-3861663434

elsergiovolador

tasked 16 agents with writing a Rust-based C compiler

That's how you would bootstrap C if compilers for it didn't exist, so the job is incomplete as typically the compiler should be written in the same language it compiles for.

"Claude can't tell time"

El Duderino

Not very 'intelligent' then, is it?

MrRtd

LLM barely passes open book test.

The polluter pays principle.

nematoad

...programmers deploying software they've never personally verified

I know that it goes against all the EULAs that have been written in the last thirty-odd years, but how about making developers and the companies they work for legally liable for the dross that they turn out? If someone knew that they might be financially on the hook for mistakes made by these so-called AI agents then perhaps they might be a bit more circumspect in how they used said agents.

Crypto Monad

I think this is only possible because there are existing standards documenting the C language in reasonably formal detail - and many existing test suites which (I expect) would be re-used.

Using vibe coding for some vaguely defined task like "build a business automation system" is likely to be much harder. SAP need not worry just yet.
