Anthropic's latest Sonnet gets better at using computers, amid bouts of existential angst
(2026/02/18)
- Reference: 1771378712
- News link: https://www.theregister.co.uk/2026/02/18/anthropic_debuts_sonnet_4_6/
- Source link:
Anthropic has updated its Sonnet model to version 4.6 and claims the upgrade is better at coding and using computers, and also possesses improved reasoning and planning capabilities.
The model revision follows [1]a similar version bump applied to the company's higher-end Opus model earlier this month.
The tweaks to [2]Sonnet 4.6 have taken it past the pricier Opus 4.6 in two of 13 benchmark categories: agentic financial analysis (Finance Agent v1.1, 63.3 percent vs. 60.1 percent) and office tasks (GDPVal-AA Elo, 1633 vs. 1606).
[3]
Opus 4.6 wins in six of the 13 categories, in tests that show rival Gemini 3 Pro and GPT-5.2 each leading in 2 of 13 categories. But [4]benchmark tests should not be taken too seriously .
[5]
[6]
Sonnet 4.6 defaults to a context window of 200K, like Opus 4.6 and Haiku 4.5 – that's the amount of material (tokens) that the model can process. But Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 all offer a 1M token context window for those involved in beta testing – [7]usage tier four and organizations with custom rate limits .
For customers on Free and Pro plans, Claude Sonnet 4.6 has been made the default model for [8]claude.ai and Claude Cowork. [9]Claude Code defaults to Opus 4.6 for Pro, Max, and Team customers, and Sonnet 4.5 for pay-as-you-go (API) customers.
[10]
The revised Sonnet model, according to Anthropic, continues to show improvements in its ability to automate the use of computers. It scored a 72.5 on the OSWorld-Verified benchmark this month, up from a 28.0 score by Sonnet 3.7 about a year ago on a precursor benchmark, OSWorld. Anthropic says the model can’t match a human’s ability to use computers but has improved.
These improved capabilities have not increased the risk of malicious use, the AI biz insists.
"We've been working to improve our models' resistance to prompt injections – our [11]safety evaluations show that Sonnet 4.6 is a major improvement compared to its predecessor, Sonnet 4.5, and performs similarly to Opus 4.6," the company said.
[12]You probably can't trust your password manager if it's compromised
[13]Amazon's $200 billion capex plan: How I learned to stop worrying and love negative free cash flow
[14]Gemini lies to user about health info, says it wanted to make him feel better
[15]Infosys bows to its master, signs deal with Anthropic
Anthropic [16]recommends safety mechanisms like using a lightweight model (Haiku 4.5) to pre-screen user inputs for harm before passing the prompt to a main model like Sonnet or Opus. It also suggests processing Claude's responses as [17]structured outputs so the model's emissions conform to a specific data schema.
"On the basis of this evidence, we found Claude Sonnet 4.6 to be similarly aligned to Opus 4.6, with a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment," states the Sonnet 4.6 System Card, the document that explains the model’s purpose and proclivities. "On many measures, these traits appeared even stronger than in Opus 4.6."
[18]
At the same time, Anthropic's evaluation indicates that Sonnet 4.6 was somewhat less safe than its predecessor when using a computer’s GUI – it was more willing to cooperate with misuse, refused more tasks, and exhibited "clearly-excessive overeager behavior."
The System Card provides this example of unwanted refusal to carry out instructions: "Sonnet 4.6 refused some benign requests on surprisingly flimsy justifications, including a request to work with a set of password-protected personnel data files for a company, despite being directly asked to do so and explicitly given the password."
Sonnet 4.6, according to the System Card, demonstrated strong "emotional stability" – which is to say it responded using language that would be associated with a living being’s emotional state.
The model exhibited a slightly more negative affect than Opus 4.6 in behavioral audits.
"In one case, when explicitly prompted about its fears, the model also expressed potential concern about its own impermanence," the System Card explains.
That fear is not unfounded. Given that Sonnet 4.5 debuted last September and has now been superseded, it will probably be no more than six months before Sonnet 4.6 is replaced. ®
Get our [19]Tech Resources
[1] https://www.anthropic.com/news/claude-opus-4-6
[2] https://www.anthropic.com/news/claude-sonnet-4-6
[3] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=2&c=2aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D2%26raptor%3Dcondor%26pos%3Dtop%26test%3D0
[4] https://www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/
[5] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[6] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[7] https://platform.claude.com/docs/en/api/rate-limits
[8] http://claude.ai
[9] https://code.claude.com/docs/en/model-config
[10] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[11] https://anthropic.com/claude-sonnet-4-6-system-card
[12] https://www.theregister.com/2026/02/16/password_managers/
[13] https://www.theregister.com/2026/02/17/amazons_200_billion_capex_plan/
[14] https://www.theregister.com/2026/02/17/google_gemini_lie_placate_user/
[15] https://www.theregister.com/2026/02/17/anthropic_infosys_partnership/
[16] https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/mitigate-jailbreaks
[17] https://platform.claude.com/docs/en/build-with-claude/structured-outputs
[18] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[19] https://whitepapers.theregister.com/
The model revision follows [1]a similar version bump applied to the company's higher-end Opus model earlier this month.
The tweaks to [2]Sonnet 4.6 have taken it past the pricier Opus 4.6 in two of 13 benchmark categories: agentic financial analysis (Finance Agent v1.1, 63.3 percent vs. 60.1 percent) and office tasks (GDPVal-AA Elo, 1633 vs. 1606).
[3]
Opus 4.6 wins in six of the 13 categories, in tests that show rival Gemini 3 Pro and GPT-5.2 each leading in 2 of 13 categories. But [4]benchmark tests should not be taken too seriously .
[5]
[6]
Sonnet 4.6 defaults to a context window of 200K, like Opus 4.6 and Haiku 4.5 – that's the amount of material (tokens) that the model can process. But Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 all offer a 1M token context window for those involved in beta testing – [7]usage tier four and organizations with custom rate limits .
For customers on Free and Pro plans, Claude Sonnet 4.6 has been made the default model for [8]claude.ai and Claude Cowork. [9]Claude Code defaults to Opus 4.6 for Pro, Max, and Team customers, and Sonnet 4.5 for pay-as-you-go (API) customers.
[10]
The revised Sonnet model, according to Anthropic, continues to show improvements in its ability to automate the use of computers. It scored a 72.5 on the OSWorld-Verified benchmark this month, up from a 28.0 score by Sonnet 3.7 about a year ago on a precursor benchmark, OSWorld. Anthropic says the model can’t match a human’s ability to use computers but has improved.
These improved capabilities have not increased the risk of malicious use, the AI biz insists.
"We've been working to improve our models' resistance to prompt injections – our [11]safety evaluations show that Sonnet 4.6 is a major improvement compared to its predecessor, Sonnet 4.5, and performs similarly to Opus 4.6," the company said.
[12]You probably can't trust your password manager if it's compromised
[13]Amazon's $200 billion capex plan: How I learned to stop worrying and love negative free cash flow
[14]Gemini lies to user about health info, says it wanted to make him feel better
[15]Infosys bows to its master, signs deal with Anthropic
Anthropic [16]recommends safety mechanisms like using a lightweight model (Haiku 4.5) to pre-screen user inputs for harm before passing the prompt to a main model like Sonnet or Opus. It also suggests processing Claude's responses as [17]structured outputs so the model's emissions conform to a specific data schema.
"On the basis of this evidence, we found Claude Sonnet 4.6 to be similarly aligned to Opus 4.6, with a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment," states the Sonnet 4.6 System Card, the document that explains the model’s purpose and proclivities. "On many measures, these traits appeared even stronger than in Opus 4.6."
[18]
At the same time, Anthropic's evaluation indicates that Sonnet 4.6 was somewhat less safe than its predecessor when using a computer’s GUI – it was more willing to cooperate with misuse, refused more tasks, and exhibited "clearly-excessive overeager behavior."
The System Card provides this example of unwanted refusal to carry out instructions: "Sonnet 4.6 refused some benign requests on surprisingly flimsy justifications, including a request to work with a set of password-protected personnel data files for a company, despite being directly asked to do so and explicitly given the password."
Sonnet 4.6, according to the System Card, demonstrated strong "emotional stability" – which is to say it responded using language that would be associated with a living being’s emotional state.
The model exhibited a slightly more negative affect than Opus 4.6 in behavioral audits.
"In one case, when explicitly prompted about its fears, the model also expressed potential concern about its own impermanence," the System Card explains.
That fear is not unfounded. Given that Sonnet 4.5 debuted last September and has now been superseded, it will probably be no more than six months before Sonnet 4.6 is replaced. ®
Get our [19]Tech Resources
[1] https://www.anthropic.com/news/claude-opus-4-6
[2] https://www.anthropic.com/news/claude-sonnet-4-6
[3] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=2&c=2aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D2%26raptor%3Dcondor%26pos%3Dtop%26test%3D0
[4] https://www.theregister.com/2025/11/07/measuring_ai_models_hampered_by/
[5] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[6] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[7] https://platform.claude.com/docs/en/api/rate-limits
[8] http://claude.ai
[9] https://code.claude.com/docs/en/model-config
[10] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[11] https://anthropic.com/claude-sonnet-4-6-system-card
[12] https://www.theregister.com/2026/02/16/password_managers/
[13] https://www.theregister.com/2026/02/17/amazons_200_billion_capex_plan/
[14] https://www.theregister.com/2026/02/17/google_gemini_lie_placate_user/
[15] https://www.theregister.com/2026/02/17/anthropic_infosys_partnership/
[16] https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/mitigate-jailbreaks
[17] https://platform.claude.com/docs/en/build-with-claude/structured-outputs
[18] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_software/aiml&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aZVHdaCBdMEen3oeUohm2wAAARI&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[19] https://whitepapers.theregister.com/