AI coding tools make developers slower but they think they're faster, study finds
- News link: https://www.theregister.co.uk/2025/07/11/ai_code_tools_slow_down/
Computer scientists with Model Evaluation & Threat Research (METR), a non-profit research group, have published [1]a study showing that AI coding tools made software developers slower, despite expectations to the contrary.
Not only did the use of AI tools hinder developers, but it led them to hallucinate, much like the AIs have a tendency to do themselves. The developers predicted a 24 percent speedup, but even after the study concluded, they believed AI had helped them complete tasks 20 percent faster when it had actually delayed their work by about that percentage.
"After completing the study, developers estimate that allowing AI reduced completion time by 20 percent," the study says. "Surprisingly, we find that allowing AI actually increases completion time by 19 percent — AI tooling slowed developers down."
The study involved 16 experienced developers who work on large, open source projects. The developers provided a list of real issues they needed to address (bug fixes, new features, and the like) – 246 in total – and then forecast how long they expected those tasks would take. The issues were randomly assigned to allow or disallow AI tool usage.
The developers then proceeded to work on their issues, using their AI tool of choice (mainly Cursor Pro with Claude 3.5/3.7 Sonnet) when allowed to do so. The work occurred between February and June 2025.
The study says the slowdown can likely be attributed to five factors:
"Over-optimism about AI usefulness" (developers had unrealistic expectations)
"High developer familiarity with repositories" (the devs were experienced enough that AI help had nothing to offer them)
"Large and complex repositories" (AI performs worse in large repos with 1M+ lines of code)
"Low AI reliability" (devs accepted less than 44 percent of generated suggestions and then spent time cleaning up and reviewing)
"Implicit repository context" (AI didn't understand the context in which it operated).
Other considerations like AI generation latency and failure to provide models with optimal context (input) may have played some role in the results, but the researchers say they're uncertain how such things affected the study.
Other researchers have also found that AI does not always live up to the hype. A [9]recent study from AI coding biz Qodo found some of the benefits of AI software assistance were undercut by the need to do additional work to check AI code suggestions. An economic survey found that generative AI has had [10]no impact on jobs or wages, based on data from Denmark. An [11]Intel study found that AI PCs make users less productive. And call center workers at a Chinese electrical utility [12]say that while AI assistance can accelerate some tasks, it also slows things down by creating more work.
That aspect of AI tool use – the added work – is evident in one of the graphics included in the study. "When AI is allowed, developers spend less time actively coding and searching for/reading information, and instead spend time prompting AI, waiting on and reviewing AI outputs, and idle," the study explains.
More anecdotally, a lot of coders find that AI tools [13]can help test new scenarios quickly in a low-stakes way and automate certain routine tasks, but don't save time overall because you still have to validate whether the code actually works – plus, [14]they don't learn like an intern. In other words, AI tools may make programming incrementally more fun, but they don't make it more efficient.
The authors – Joel Becker, Nate Rush, Beth Barnes, and David Rein – caution that their work should be reviewed in a narrow context, as a snapshot in time based on specific experimental tools and conditions.
"The slowdown we observe does not imply that current AI tools do not often improve developers' productivity – we find evidence that the high developer familiarity with repositories and the size and maturity of the repositories both contribute to the observed slowdown, and these factors do not apply in many software development settings," they say.
The authors go on to note that their findings don't imply current AI systems are not useful or that future AI models won't do better. ®
[1] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
[9] https://www.theregister.com/2025/06/12/devs_mostly_welcome_ai_coding/
[10] https://www.theregister.com/2025/04/29/generative_ai_no_effect_jobs_wages/
[11] https://www.theregister.com/2024/11/22/ai_pcs_productivity/
[12] https://www.theregister.com/2025/07/02/call_center_ai_assistants/
[13] https://blog.miguelgrinberg.com/post/why-generative-ai-coding-tools-and-agents-do-not-work-for-me
[14] https://blog.miguelgrinberg.com/post/why-generative-ai-coding-tools-and-agents-do-not-work-for-me
Automated Ignorance < Human Genius. Story at 11
Really, is it surprising that statistical bullshit machines are incapable of "tasks" requiring actual thought and analysis, when they are simply (polluted) statistical models of wot (supposed) humans have said in previous scenarios?
It is umbilicism on a civilisational scale to believe that a mere model of our past behaviour can now surpass ourselves, to augment or replace human intelligence based entirely on what has gone before, and without any individualistic spark, curiosity, or hope.
I worry not about superintelligent AI but apocalyptic mediocrity, i.e. we are replacing all of ourselves with a simulacrum which behaves (statistically, without any actual empathy, logic or indeed intelligence) roughly as a thoroughly mediocre 2010s human might, but at a scale and speed limited only by how much energy we can burn, yet without any capacity to understand, improve, or invent.
Constantly we feed our conscious 'souls' into this infernal contraption, in the vain hope that it might give us some temporary boost in mental output and drive profit/growth before we all get bored of it. Meanwhile the malign uses of such a simulacrum (dare I say homunculus) grow all the more powerful: misinformation, disinformation, scams, fraud, attacking the weak without empathy or trauma (a feature, not a bug), building automated tools of oppression (e.g. autonomous lethal drones, which are now terrifyingly real), and mass surveillance and control on a scale the Stasi could nary have dreamed of, thus facilitating the destruction of our current civilisation.
(Yes I realise I sound like AMFM1 with this one, but that is what a bottle of wine or your tipple of choice does to a human - makes them mediocre, yet doesn't make them an average of everyone else)
Long after the end of what we call civilisation, all that's left of us will be some LLM sent on a probe by some tech knob, as an offering to a nonexistent God.
I am not religious, but if anything is "the devil's work" it is so-called AI.
One word:
DUH!
"AI didn't understand the context"
Of course not. What is repeatedly called AI these days doesn't "understand". It does not have the capacity to do so.
Period.
Keep chipping away at the monolith though. When it is reduced to rubble, the landscape will be clearer.
Aim Low. Be less disappointed when you fail
That's a good maxim for most projects, but it also holds a hidden truth: the smaller the project, the less likely you are to fail.
With AI assistance I have very good success with short programs and limited functionality.
A recent example is an application to ingest eBay transactions into a database and print invoices and address labels. I used Claude Sonnet 4, and I assume I was vibe coding.
I gave it some sample eBay report downloads and asked it to generate a database schema for SQLite and an ingest application in Python. It got both correct first time.
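For anyone curious what that first step looks like, here is a minimal sketch of an SQLite schema plus CSV ingest in Python. The table layout, column names, and eBay report headers are my own guesses for illustration, not what Claude actually generated:

```python
import csv
import sqlite3

# Hypothetical schema; real eBay reports have many more fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transactions (
    order_id     TEXT PRIMARY KEY,
    buyer_name   TEXT NOT NULL,
    item_title   TEXT NOT NULL,
    amount_cents INTEGER NOT NULL,
    sale_date    TEXT NOT NULL
);
"""

def ingest(csv_path: str, db_path: str = "ebay.db") -> int:
    """Load an eBay transaction report (CSV) into SQLite.

    Returns the number of rows inserted. Duplicate order IDs are
    skipped via INSERT OR IGNORE, so re-running a report is safe.
    """
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    inserted = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cur = conn.execute(
                "INSERT OR IGNORE INTO transactions VALUES (?, ?, ?, ?, ?)",
                (
                    row["Order number"],
                    row["Buyer name"],
                    row["Item title"],
                    int(round(float(row["Total"]) * 100)),  # store cents
                    row["Sale date"],
                ),
            )
            inserted += cur.rowcount  # 0 when the insert was ignored
    conn.commit()
    conn.close()
    return inserted
```

The INSERT OR IGNORE makes the ingest idempotent, which matters when you download overlapping reports.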
Then I fed it the invoice printer's manual and also used CUPS for the labels, specifying the URL, label dimensions, etc. That's basically all I needed. It figured out what an invoice should look like and the correct address label format (with my hint to throw out all the eBay crap). Again, they just happened.
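Label printing through CUPS typically comes down to formatting plain text and piping it to `lp`. A sketch of that idea; the queue name and media size here are invented placeholders, not the commenter's actual setup:

```python
import subprocess

def format_label(name: str, street: str, city: str, postcode: str) -> str:
    """Render a plain-text address label, one field per line."""
    return "\n".join([name, street, f"{city} {postcode}"]) + "\n"

def print_label(text: str, printer: str = "labelwriter") -> None:
    """Send the label text to a CUPS queue via lp.

    "labelwriter" and the 89x36mm media size are assumptions for
    this sketch; substitute your own queue and label dimensions.
    """
    subprocess.run(
        ["lp", "-d", printer, "-o", "media=Custom.89x36mm"],
        input=text.encode(),
        check=True,
    )
```

Separating formatting from printing keeps the formatting testable without a printer attached.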
Then Claude Sonnet, all on its own, decided I needed a data-processing flow, so it generated a configuration system, three separate applications to ingest and print, and a coordinating process to ingest, archive, and print all recently ingested transactions. It also generated man entries for all the sub-modules.
In the entire process it made three mistakes, which I pointed out and it rectified immediately.
Once all was done, it packaged everything up into a git project.
Claude Sonnet ended up doing far more than I would have on my own. It did it professionally. And it did it in around two hours, allowing for a bit of adjustment of the invoice format and adding in some address-cleaning functionality.
Thank god it's not just me. This is so validating.
They're great for generating functions that embed SQL queries, and that alone makes it useful to me. But the rest of the suggestions are never, ever what I want. I was writing what was essentially the same code 4-5 times over across different files, and only after about the 4th file did Copilot start actually copying what I was doing. Any time I do have it generate something, I'm always cleaning up after it: renaming things, adding doc comments, making it adhere to my coding standards, and so on.
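The kind of SQL-embedding function this comment describes might look like the following; the `orders` table and its columns are hypothetical, and parameterised placeholders are used rather than string formatting:

```python
import sqlite3

def get_orders_for_buyer(conn: sqlite3.Connection, buyer: str) -> list:
    """Return (order_id, total) rows for one buyer.

    The ? placeholders let SQLite handle quoting, avoiding
    SQL injection from the buyer string.
    """
    return conn.execute(
        "SELECT order_id, total FROM orders WHERE buyer = ? ORDER BY order_id",
        (buyer,),
    ).fetchall()
```

Small, self-contained query wrappers like this are exactly the shape of code assistants tend to get right on the first try.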
From personal experience, the only place they really excel is writing wrappers. When the code they're supposed to work on is already there and they just need to wrap it in something like HTTP request handler callbacks, they can write your entire HTTP server for you, with pretty much no mistakes. Not exactly something you do every day, but basically they're good for boilerplate.
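A minimal illustration of that wrapper pattern, using Python's standard library: `lookup` stands in for pre-existing business logic, and the handler class is the boilerplate an assistant can reliably produce. All names and the route are invented for this sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def lookup(name: str) -> dict:
    """Stand-in for existing business logic the AI only has to wrap."""
    return {"name": name, "greeting": f"Hello, {name}"}

class Handler(BaseHTTPRequestHandler):
    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

    def do_GET(self):
        # Route GET /hello/<name> to the existing lookup() function.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "hello":
            body = json.dumps(lookup(parts[1])).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)
```

`HTTPServer(("", 8080), Handler).serve_forever()` would run it; the interesting code is all in `lookup`, which is the point: the wrapper is pure boilerplate.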