Anthropic Releases Claude 4 Models That Can Autonomously Work For Nearly a Full Corporate Workday (anthropic.com)

Posted by msmash on Thursday May 22, 2025 @05:20PM from the moving-forward dept.


Anthropic [1]launched Claude Opus 4 and Claude Sonnet 4 today, positioning Opus 4 as the world's leading coding model with scores of 72.5% on SWE-bench and 43.2% on Terminal-bench. Both are hybrid models that offer near-instant responses as well as an extended thinking mode for complex reasoning tasks.

The models introduce parallel tool execution and memory capabilities that allow Claude to extract and save key facts when given local file access. Claude Code, previously in research preview, is now generally available with new VS Code and JetBrains integrations that display edits directly in developers' files. GitHub integration enables Claude to respond to pull request feedback and fix CI errors through a new beta SDK.

Pricing remains consistent with previous generations at $15/$75 per million input/output tokens for Opus 4 and $3/$15 for Sonnet 4. Both models are available through Claude's web interface, the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Extended thinking capabilities are included in Pro, Max, Team, and Enterprise plans, with Sonnet 4 also available to free users.

The startup, which counts Amazon and Google among its investors, said Claude Opus 4 could [2]autonomously work for nearly a full corporate workday -- seven hours. CNBC adds:

> "I do a lot of writing with Claude, and I think prior to Opus 4 and Sonnet 4, I was mostly using the models as a thinking partner, but still doing most of the writing myself," Mike Krieger, Anthropic's chief product officer, said in an interview. "And they've crossed this threshold where now most of my writing is actually ... Opus mostly, and it now is unrecognizable from my writing."

> Krieger added, "I love that we're kind of pushing the frontier on two sides. Like one is the coding piece and agentic behavior overall, and that's powering a lot of these coding startups. ... But then also, we're pushing the frontier on how these models can actually learn from and then be a really useful writing partner, too."



[1] https://www.anthropic.com/news/claude-4

[2] https://www.cnbc.com/2025/05/22/claude-4-opus-sonnet-anthropic.html



Re: huh? (Score:2)

by fluffernutter ( 1411889 )

It can work with other chatbots of course! They'll be able to multiply the amount of power they use doing their useful work!!

Re: (Score:2)

by Big Hairy Gorilla ( 9839972 )

and they can have meetings, and they can schedule them all electronically.

I wonder if they will have marathon meetings and then begin to fall asleep?

or maybe search the web while the other AI drones on and on...

This is all starting to sound pretty familiar :-)

Re: huh? (Score:2)

by fluffernutter ( 1411889 )

This is sounding like a Pixar film!

I want to know (Score:5, Funny)

by newslash.formatblows ( 2011678 )

what happens after 7 hours? Claude needs lunch? A hug? It goes batshit and deletes the whole repository?

Re: (Score:2)

by Dan East ( 318230 )

It means the other 17 hours it produces unusable nonsense that superficially looks correct, which a human then has to spend 40 hours sorting out and fixing.

Doing what? (Score:3)

by gweihir ( 88907 )

Because producing hallucinations for 7 hours/day is pretty easy...

Re: (Score:2)

by Moridineas ( 213502 )

Given how much of your time you seem to spend here posting about LLMs and the repetitive nature of your posts, I'm starting to think you're one of them!

Re: (Score:1)

by CallMeTim ( 6454842 )

He's still gonna be ranting about how useless they are after a developer using LLMs takes his job.

Re: (Score:2)

by gweihir ( 88907 )

I am not a developer. This is just something I can _also_ do.

Re: (Score:2)

by Plugh ( 27537 )

It is the new antivaxx. People who know a little more than the average non-techie person based ofc on mostly secondhand sources amp up the scare factor and get positive feedback in terms of clicks & attention and -- if they graduate to grifting -- ads. Yeah it is all a con stochastic parrots biggest bubble since tulips scorch the planet yadda. Meanwhile materials scientists using it to make better solar panels, plasma physicists using it to enable fusion, medical science unlocking the proteome for perso

The upcoming arms race is obvious. (Score:4, Insightful)

by Tschaine ( 10502969 )

When most employees are producing multiple times the written output that they could produce on their own, everyone will need AI agents to summarize all of the documents, email, and Slack/Teams messages that are coming at them.

I'm not at all convinced that this will be better than communicating without the AI-powered inflation and summarization in between the humans.

In fact, this seems much more likely to introduce errors (and lose nuances) than plain old person to person communication.

Re: (Score:1)

by blue trane ( 110704 )

What if one of the persons is weird, how can you communicate with that in the workplace? What if one of the persons is flirting?

I believe it (Score:2)

by backslashdot ( 95548 )

I've seen vibe coding tools take hours trying to fix a bug they generated themselves, until the credits were depleted entirely.

Re: (Score:2)

by dinfinity ( 2300094 )

I've seen humans taking days to do that. Those hours cost a lot more than the AI model credits too.

7 hours!?! (Score:1)

by DBCubix ( 1027232 )

I barely got 30 minutes out of it before the usage limit was reached. I still have another 15 minutes to wait until the limits reset.
