Anthropic Launches Claude Opus 4.6 as Its AI Tools Rattle Software Markets (anthropic.com)
- Reference: 0180735536
- News link: https://slashdot.org/story/26/02/05/1755209/anthropic-launches-claude-opus-46-as-its-ai-tools-rattle-software-markets
- Source link: https://www.anthropic.com/news/claude-opus-4-6
The new model improves on Opus 4.5's coding abilities, the company said -- it plans more carefully, sustains longer agentic tasks, handles larger codebases more reliably, and catches its own mistakes through better debugging. It is also the first Opus-class model to feature a 1M token context window, currently in beta.
On GDPval-AA, an independent benchmark measuring performance on knowledge-work tasks in finance, legal and other domains, Opus 4.6 outperformed OpenAI's GPT-5.2 by roughly 144 Elo points. Anthropic also introduced agent teams in Claude Code, allowing multiple agents to work in parallel on tasks like codebase reviews. Pricing remains at $5/$25 per million input/output tokens.
Why Is Anthropic Crashing The Market (Score:2)
Do they hate money?
Re: (Score:3)
They certainly do not hate money; they want it funneled in their direction. And since in the stock market you can only gain money that another participant loses, they are certainly fine with other companies' market capitalization falling.
And honestly, their value proposition is kind of enticing: "Here we offer you 'plug-ins' to replace your finance/legal/developer/etc. personnel with LLM based bots. Of course we have some cover-your-ass clause written into our offering, that you need to have al
Re: (Score:2)
The funny thing is that these "offers" have been made time and again before. They never worked. They do not really work now. The illusion has just gotten a bit better.
Re: (Score:1)
They're not crashing the market.
What's crashing the market is that what has been supporting it is the belief that AI would, any time now, replace a lot of workers, resulting in massive productivity gains and saving companies a whole lot of money, a share of which would become revenue for the AI suppliers, so they would make a lot of money.
"Any time now" is the important factor.
Wall Street has been waiting for 3 years now for a technology that was always sold as "good to go today", but all that's still to see is mo
Re: (Score:3)
I think the correct answer is "you don't use a screwdriver as a hammer". Used correctly, AI tools can be quite helpful. But reports seem to show that only around 20% of companies use them correctly. (According to at least one report they produce a 5% improvement for one particular task. Whether that includes the cost of use, the article didn't say.)
Re: (Score:2)
That sounds as expected and matches some research I have done.
> When will the world learn that nothing is going to effectively shortcut having to code in large projects. The bots simply fail in real-world projects. They make the code substantively worse and more confusing. There is no silver bullet. There never was.
Indeed. There is no silver bullet and there cannot be one. Things do not work like that. What established engineering disciplines have is a ton of premade components that work reliably as expected. But when established engineering goes into full-custom design, the only thing that works is competent, experienced and very smart engineers doing it carefully and slowly. And that will not change. Nobody ever found a "silver bullet" and people have rea
I've recently done some tasks with Claude... (Score:2)
...and they were amazing. Like I told someone yesterday, two years ago AI was crap. Today, it's not too bad but I wouldn't get on a plane that had its code written by AI. In two years, I'll probably jump on the plane and take a nap.
Re: (Score:2)
I have also been impressed with Gemini, though I am against many of the use cases and do in fact think AI will have a grossly negative effect on humanity. I can pretty much guarantee that every post here that claims it can't do this, or it is just that, and it is all an illusion, is made by people who haven't even actually used it. They seem to all have no idea that an LLM is a *neural network* and what makes it a Large Language model has to do with how it is trained, not something intrinsic to its interna
More like a beginning death-rattle (Score:2)
The claims are getting grander, the defects are somewhat better hidden, but the lies are also getting more and more obvious. It might still go on for a year or two, but this tech has no big future. It may have a small one, like all Hype-AI tech before it (with something like 10 hypes so far, all pretty much along the same lines as the current one, just smaller), but only if they can bring the computational effort down massively. That does not seem to be happening.
Re: (Score:2)
Name the various AI models and versions, as well as the use cases, and how it didn't work and why. Otherwise you are just a guy spouting the same anti-AI line you have been spouting for decades, having absolutely no idea what you are talking about.
Re: (Score:2)
Sooo, the "anti-AI line you have been spouting for decades"? What drugs are you on? The hallucinations are extreme.
Well, I get you have absolutely nothing and you are clearly not very smart. My condolences.
Re: (Score:2)
So you are saying you *haven't* been completely dismissive of AI for decades. I'm smart enough to have a very, very good memory. The internet also has such a memory. [1] ... here is what AI knows about you [google.com]. Anyone who wants to look deeper will see that you have been spewing the same lack of understanding for as long as I can recall, which again, is a very, very long time.
[1] https://www.google.com/search?q=gweihir+AI+slashdot
Re: (Score:1)
Only a boomer could be so repeatedly wrong and still convince themselves that they know better than everyone else.
Anecdotal, but don't see the value of Opus (Score:1)
DGAF about benchmarks; I recall Gemini beat GPT, and that Chinese bazinga also performed "very well" on paper. My personal hall of fame: Sonnet - a good workhorse, might act a bit silly at times, but works very well if you set the context boundaries well. GPT (generic) - more capable at cracking the harder tasks (e.g. "let's grab the Oracle driver in Go and patch it to get to streaming blobs"). I find Sonnet's code to be more readable. Opus - I don't get the hype. 3x the cost... for what? When Sonnet
I wish journalists still existed... (Score:3)
Seriously.
144 elo points better than ChatGPT? Okay. So how many does ChatGPT get? 112? 1345? 13634? 98123?
Giving this number would elevate the summary from useless to useful.
Re:I wish journalists still existed... (Score:5, Informative)
For anyone interested, GPT5.2 scored 1462, so we're talking a 10 percent increase in score.
Re: (Score:2)
> GPT5.2 scored 1462, so we're talking a 10 percent increase in score.
Assuming that the scale isn't logarithmic or some such.
Re: (Score:3)
I guess it's just an Elo score, though I'm not clear why they all caps it? [1]https://en.wikipedia.org/wiki/Elo_rating_system [wikipedia.org]
[2]https://artificialanalysis.ai/evaluations/gdpval-aa [artificialanalysis.ai]
[1] https://en.wikipedia.org/wiki/Elo_rating_system
[2] https://artificialanalysis.ai/evaluations/gdpval-aa
Re: (Score:2)
I think it's because most people incorrectly assume it's an acronym and not the name of the guy who invented it.
Re: (Score:3)
But to be clear to the GP, that doesn't mean "it's a 10% better model". For most queries one runs against any two models, most of the generations / fixes will be "good", so it's basically a coin flip as to which model to choose ("I like this one's documentation more", "This one's fix was more concise", "This model was more polite", etc.). 10% is actually a pretty big difference and reflects the cases where one model was unambiguously better than the other.
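As a rough back-of-the-envelope illustration (assuming the GDPval-AA Elo behaves like standard chess-style Elo, which nothing in the summary confirms), the numbers quoted in this thread can be turned into an expected head-to-head win rate with the usual Elo expected-score formula:

# Minimal sketch, assuming the standard Elo expected-score formula
# E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400)).
# The scores below are the ones quoted upthread, not independently verified.

def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B in a head-to-head comparison."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

gpt_score = 1462              # GPT-5.2 score quoted upthread
opus_score = gpt_score + 144  # the 144-point gap claimed in the summary

print(f"relative score increase: {(opus_score - gpt_score) / gpt_score:.1%}")        # ~9.8%, the "10 percent"
print(f"expected win rate for Opus: {expected_win_rate(opus_score, gpt_score):.1%}")  # ~70%

On that reading, a 144-point gap means Opus would be preferred in roughly 7 out of 10 direct comparisons, which fits the point above that the number reflects unambiguous wins rather than "the model is 10% better".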
Re: I wish journalists still existed... (Score:3)
At the same time, aren't these benchmarks useless when they're a target rather than a measure?
Re: (Score:2)
All benchmarks can be gamed. The LLM-scammers have been doing this really hard because they have nothing else. And yes, benchmarks become of negative worth (because they begin to state things that are not true) when systems design is aimed at optimizing them.
Re: (Score:2)
What journalist was involved with this post? It's a single link and it points to the Anthropic website.