AI agents can't teach themselves new tricks – only people can

(2026/02/19)


Teach an AI agent how to fish for information and it can feed itself with data. Tell an AI agent to figure things out on its own and it may make things worse.

AI agents are machine learning models (e.g. Claude Opus 4.6) that have access to other software through a CLI harness (e.g. Claude Code) and operate in an iterative loop. These agents can be instructed to handle various tasks, some of which may not be covered in their training data.
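
To make that iterative loop concrete, here is a minimal sketch of how such a harness might drive a model. The call_model function, the TOOLS registry, and run_agent are hypothetical placeholders for illustration, not any vendor's actual API.

    # Minimal agent-loop sketch. call_model is a hypothetical stand-in for a real
    # LLM API; the tool registry is a toy example, not any vendor's harness.
    def call_model(messages):
        # A real harness would send the conversation to a model and return its
        # reply; this placeholder just returns a canned final answer.
        return {"content": "done", "tool": None, "args": []}

    TOOLS = {"read_file": lambda path: open(path).read()}

    def run_agent(task, max_steps=10):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(messages)
            if reply.get("tool") in TOOLS:
                # The model asked for a tool; run it and feed the result back in.
                result = TOOLS[reply["tool"]](*reply["args"])
                messages.append({"role": "tool", "content": result})
            else:
                # No tool call: treat the reply as the final answer.
                return reply["content"]
        return "step budget exhausted"

Real harnesses layer shell access, file editing, and safety checks on top, but the shape of the loop is the same.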

When lacking the appropriate training, software agents can be given access to new "skills," which are essentially added reference material to impart domain-specific capabilities. "Skills" in this context refer to instructions, metadata, and other resources like scripts and templates that agents load to obtain procedural knowledge.

For example, an AI agent could be instructed how to process PDFs with a skill that consists of markdown text, code, libraries, and reference material about APIs. While the agent might have some idea how to do this from its training data, it should perform better with more specific guidance.
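
As a rough illustration of how that reference material reaches the model, the sketch below loads markdown skill files from a directory and prepends them to the task prompt. The skills/ directory layout and file naming are assumptions for illustration, not Anthropic's or anyone else's actual skill format.

    # Sketch of injecting curated skill files into an agent's context.
    # The "skills/" directory and its contents are hypothetical.
    from pathlib import Path

    def load_skills(skill_dir="skills"):
        """Concatenate every markdown skill file into one block of reference text."""
        docs = []
        for path in sorted(Path(skill_dir).glob("*.md")):
            docs.append(f"## Skill: {path.stem}\n{path.read_text()}")
        return "\n\n".join(docs)

    def build_prompt(task):
        # The model sees the procedural guidance before it sees the task itself.
        return f"{load_skills()}\n\nTask: {task}"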

Yet according to a recent [4]study, SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, asking an agent to develop that skill on its own will end in disappointment. The "intelligence" part of artificial intelligence is somewhat overstated.

At least that's the case with large language models (LLMs) at inference time – when the trained model is being used as opposed to during the training process.

A new benchmark

Certain forms of machine learning, like deep learning, can be applied in a way that allows neural network models to improve their performance in domain-specific tasks [5]like video games.

The explosion of AI agents – Claude Code from Anthropic, Gemini CLI from Google, and Codex CLI from OpenAI – has led to the rapid development of skills to augment what the agents can do. [6]Skill [7]directories [8]are [9]proliferating [10]like weeds. And given how OpenClaw agents have been [11]teaching each other in the Moltbook automated community network, it seems well past time to figure out how good a job they do at it.

To date, there's been no common way to see whether these skills deliver what they promise. So a team of 40 (!) computer scientists, affiliated with companies like Amazon, BenchFlow, ByteDance, Foxconn, and Zennity, and various universities, including Carnegie Mellon, Stanford, UC Berkeley, and Oxford, set out to develop a benchmark test to evaluate how agent skills augment performance during inference.

The authors, led by Xiangyi Li, founder of agent measurement startup BenchFlow, developed a test they dubbed SkillsBench, and described their findings in the above-mentioned preprint paper.

The researchers looked at seven agent-model setups across 84 tasks for 7,308 trajectories – each trajectory being one agent's attempt at solving a single task under a specific skills condition. Three conditions were tested: no skills, curated skills, and self-generated skills.

The agents using curated skills – designed by people – completed tasks 16.2 percent more frequently than no-skill agents on average, though with high variance.

One example cited in the study is a flood-risk analysis task. Agents without skills didn't apply the appropriate statistical math, so achieved a pass rate of only 2.9 percent. With a curated skill that told the agent to use the Pearson type III probability distribution and apply the appropriate standard USGS methodology, and that specified other details like [17]scipy function calls and parameter interpretation, the agent's task pass rate increased to 80 percent.
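
For a sense of what that curated guidance points at, the snippet below sketches the standard USGS log-Pearson Type III flood-frequency calculation with scipy. The peak-flow numbers are invented for illustration; the study's actual skill and task data are not reproduced here.

    # Sketch of a log-Pearson Type III flood-frequency estimate with scipy.
    # The annual peak-flow values (cubic feet per second) are made up.
    import numpy as np
    from scipy import stats

    peaks = np.array([1200., 950., 2100., 1800., 760., 1430., 3050., 990., 1670., 2240.])

    # USGS practice fits a Pearson Type III distribution to the base-10 logs
    # of the annual peak flows ("log-Pearson Type III").
    log_peaks = np.log10(peaks)
    skew, loc, scale = stats.pearson3.fit(log_peaks)

    # The 100-year flood is the flow with a 1 percent annual exceedance
    # probability, i.e. the 99th percentile of the fitted distribution.
    q100 = 10 ** stats.pearson3.ppf(0.99, skew, loc=loc, scale=scale)
    print(f"Estimated 100-year peak flow: {q100:.0f} cfs")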

When analyzed in terms of specific knowledge domains, curating healthcare (+51.9 percentage points) and manufacturing (+41.9 percentage points) skills helped AI agents the most, while curating skills related to mathematics (+6.0 percentage points) and software engineering (+4.5 percentage points) provided smaller gains. The authors explain this by observing that domains requiring specialized knowledge tend to be underrepresented in training data. So it makes sense for humans to augment agents working on tasks in those domains.

And when doing so, less is more – skills with only a few (2-3) modules performed better than massive data dumps.

That applies to model scale too – curated skills help smaller models punch above their weight class in terms of task completion. Anthropic's Claude Haiku 4.5 model with skills (27.7 percent) outperformed Haiku 4.5 without skills (11 percent) and also Claude Opus 4.5 without skills (22 percent).

When it came time to get agents to teach themselves skills, the study authors directed them to follow a four-step protocol (sketched in code after this list):

analyze the task requirements, domain knowledge, and APIs required;

write 1-5 modular skill documents to solve the task;

save each skill as a markdown file; and

then solve the task using the generated reference material.
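
A rough sketch of that two-phase protocol follows; the prompt wording, file names, and the call_model stand-in are assumptions, not the study's actual setup.

    # Sketch of the self-generated-skills condition: generate skill docs first,
    # then solve with them. call_model is a hypothetical stand-in for an LLM API.
    from pathlib import Path

    def call_model(messages):
        # Placeholder for a real model call; returns canned text here.
        return {"content": "placeholder skill text or answer"}

    def self_generate_and_solve(task, out_dir="generated_skills"):
        # Phase 1: ask the model to analyze the task and write modular skill docs.
        gen_prompt = ("Analyze the task requirements, domain knowledge, and APIs "
                      "required, then write 1-5 modular skill documents in markdown "
                      f"that would help solve it.\n\nTask: {task}")
        skills = call_model([{"role": "user", "content": gen_prompt}])["content"]

        # Save the generated reference material as markdown, per the protocol.
        Path(out_dir).mkdir(exist_ok=True)
        (Path(out_dir) / "skill_1.md").write_text(skills)

        # Phase 2: solve the task with the self-generated skills in context.
        solve_prompt = f"{skills}\n\nTask: {task}"
        return call_model([{"role": "user", "content": solve_prompt}])["content"]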

Agents that tried this did worse than if they hadn't tried at all.

"Self-generated skills provide negligible or negative benefit (–1.3 percentage points average), demonstrating that effective skills require human-curated domain expertise," the authors state.

For now at least, the AI revolution will not be fully automated – the machines still need human teachers to set them on the right path. ®



[4] https://arxiv.org/abs/2602.12670

[5] https://arxiv.org/pdf/1708.07902

[6] https://clelp.ai

[7] https://aiagentsdirectory.com/skills

[8] https://skills.sh

[9] https://awesomeclaude.ai

[10] https://www.skillsdirectory.com

[11] https://arxiv.org/html/2602.14477v1

[17] https://scipy.org



Researchers

elsergiovolador

Has anyone told the researchers that models don't have understanding of anything that they "work" on and they just regurgitate text they got "trained" with?

Re: Researchers

m4r35n357

Has anyone told The Register?

Re: Researchers

TheMaskedMan

"models don't have understanding of anything that they "work" on and they just regurgitate text they got "trained" with"

True enough. But it's also true enough of many, many humans. The world is full of people doing their jobs - and living their lives - according to half-remembered instructions, with no idea about why they do things this way; it's what they've been told to do, so that is what they do.

It may be that giving the model a better skill-making skill would help. But since part of that is going to involve researching the problem and solutions, and some of the models tested are not designed for deep research, that might not help too much either. Perhaps, as with humans, the lower end models are best left to the routine tasks.

Re: Researchers

Rich 2

Quite. Why do we keep seeing these studies that conclude with the bleedin’ obvious (at least obvious to anyone who knows anything about how LLMs work)?

And I’m getting so very tired of “AI” stories - they seem to make up 80% of the tech news these days

"OpenClaw agents have been teaching each other in the Moltbook automated community network"

Dan 55

Pivot to AI maintains [1]they're just cosplaying cryptobros trying to scam each other.

[1] https://pivot-to-ai.com/2026/02/16/the-obnoxious-github-openclaw-ai-bot-is-a-crypto-bro/

"The "intelligence" part of artificial intelligence is somewhat overstated"

Mike 137

It's marketing hype without which the billions of dollars wouldn't be poured into the coffers of the "AI" corporations. Real human (including technological) progress has always been driven by exceptional individuals having novel ideas, not by committees which output a consensus of the status quo. As the training of current "AI" (in reality, LLMs) is fed by data representing the status quo and from this the machine generates statistical consensuses, it should be self-evident that real novel output is to be expected solely by pure fluke.

Re: "The "intelligence" part of artificial intelligence is somewhat overstated"

vtcodger

Hype: The tech bros have sunk a lot of money into AI. And of course, the marketing scum really have nothing else on their cart other than AI to lie about. And AI isn't totally worthless. For example, it might have taken me two minutes to find a source for the definition of the "sunk cost fallacy". Google conjured one up in seconds.

The sunk cost fallacy is a cognitive bias causing individuals to continue an endeavor—such as a project, relationship, or investment—solely because of previously invested resources (time, money, or effort), even when current costs outweigh potential benefits. It is "throwing good money after bad" by ignoring that past investments are unrecoverable.

And the best part? ... Google is paying for the costs ... which will quite likely turn out in the long run to be unrecoverable.

One thing AI is very good at

JimmyPage

is writing stories about AI.

I wonder why?
