AI coding tools are like that helpful but untrustworthy friend, devs say
- Reference: 1749722411
- News link: https://www.theregister.co.uk/2025/06/12/devs_mostly_welcome_ai_coding/
Software developers largely appreciate the productivity improvements they get from AI coding tools, but they don't entirely trust the output. As a result, some potential productivity gains get lost to manual reviews deemed necessary to check the AI's work.
Qodo offers "an agentic code quality platform for reviewing, testing, and writing code," so it has an opinion on such matters.
For its [1]report titled "The State of AI Code Quality 2025" – provided in advance to The Register – Qodo earlier this year conducted a survey of 609 developers using unspecified AI coding tools at a variety of organizations in different industries, ranging from startups to enterprises. A whopping 82 percent of the respondents said they use the tools at least weekly, and 78 percent reported productivity gains from them.
But lack of confidence is undercutting some of those gains.
"Overall, we're seeing that AI coding is a big net positive, but the gains aren’t evenly distributed," said Itamar Friedman, CEO and co-founder of Qodo in an email to The Register .
"There's a small minority of power users, who tend to be very experienced developers, who are seeing massive gains – these are the 10Xers. The majority of developers are seeing moderate gains, and there’s a group that’s failing to effectively leverage the current AI tools and is at risk of being left behind."
According to the [1]survey, about 60 percent of developers said AI improved or somewhat improved overall code quality, while about 20 percent said AI had degraded or somewhat degraded their code.
Friedman emphasized that not all developers interact with AI in the same way.
"Individual contributors may feel 3x better because they’re shipping more code, but tech leads, reviewers, and those responsible for overall code quality tend to experience more pressure," he explained. "For them, the increase in code volume means more review work, more oversight, and sometimes, more stress."
The concern was widespread enough that 76 percent of respondents said they won't ship AI-suggested code without human review. They prefer to manually rewrite or review the AI's suggestions, and they delay merges even when AI-generated code looks correct. They also avoid deeper AI integration into their workflows.
That reticence comes at a cost, because code review is actually one of the things AI is good at, according to the survey. Among those devs reporting productivity gains from AI, 81 percent of those who use it for code reviews reported quality improvements, compared to just 55 percent of those who did code reviews manually.
"Models like Gemini 2.5 Pro are excellent judges of code quality and can provide a more accurate measure than traditional software engineering metrics," said Friedman.
"With the latest model releases, they are getting to the point where they are surpassing any large scale review that can be done by humans. To quantify this, we’ve built [8]a public benchmark to evaluate model-generated pull requests and code changes against quality and completeness criteria."
About the trust thing
Developers have good reason for their distrust: about three-quarters of respondents encountered fairly frequent hallucinations, that is, situations in which the AI made syntax errors or called packages that don't exist.
"In our survey, only about a quarter of developers reported that hallucinations were a rare occurrence," Friedman said.
But there are ways to rein those hallucinations in. "One good method for dealing with the inherent flaws is to start a session by prompting the agent to review the codebase structure, documentation, and key files, before then giving it the actual development task," he said.
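As a purely illustrative example (the wording below is ours, not Qodo's), such a priming prompt might read:

    Before writing any code, read the README, the docs directory, and the
    modules the task touches. Summarize the project structure, the coding
    conventions you observe, and any constraints I should know about.
    Wait for my confirmation before starting on the task itself.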
Another technique is to give the AI agent a clear specification and have it generate tests that comply with the spec, Friedman said. "Only after verifying that the tests match your intent, you have the agent implement it," he explained. He added that when a code suggestion goes awry, sometimes it's best to just start again rather than have the agent double back to make corrections.
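A toy sketch of that spec-then-tests-then-code loop, with an invented slugify task standing in for real work (nothing below comes from Qodo):

    # Step 1: the human writes the spec.
    # slugify(title) lowercases the title, replaces each run of
    # non-alphanumeric characters with a single hyphen, and strips
    # leading/trailing hyphens.

    # Step 2: the agent generates tests; the human checks them against intent.
    def test_slugify():
        assert slugify("Hello, World!") == "hello-world"
        assert slugify("  --Already--Slugged--  ") == "already-slugged"
        assert slugify("Rust & Go: a comparison") == "rust-go-a-comparison"

    # Step 3: only after the tests are approved does the agent implement.
    import re

    def slugify(title: str) -> str:
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

    test_slugify()  # all assertions pass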
But concern about hallucination wasn't the biggest worry. The most requested improvement by devs was "improved contextual understanding" (26 percent), followed by "reduced hallucinations/factual errors" (24 percent), and "better code quality" (15 percent).
"Context is key for effectively using AI tools," said Friedman. "This has become a bit cliche but it means something quite simple: the information that’s fed into the models, what’s in their 'context window,' has a direct and dramatic impact on the quality of the code they generate."
Friedman explained that power users of AI coding tools make sure to provide detailed information to the AI model, including supplementary data like product requirements and specifications, examples of similar tasks, and coding styles.
In other words, to avoid "garbage in, garbage out," be more deliberate about your AI helper's diet.
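Concretely, the sort of context bundle a power user might assemble could look like this (an invented illustration, not an excerpt from the report):

    Task: add retry logic to the payment client.
    Requirements: specs/payments-retry.md - max 3 attempts, exponential
    backoff, idempotency keys required.
    Similar prior change: the retry wrapper in the email client module.
    Style: follow the repo style guide; type hints mandatory; no new
    dependencies.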
Friedman argues the learning curve for AI coding tools can be flattened by automating that context augmentation, an endeavor that recalls how Google boosts search relevance by incorporating contextual signals and personal info.
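In very rough strokes, and purely as our own sketch (real tools use embeddings and dependency graphs rather than the naive token overlap below), automated context augmentation might rank repository files against the task description and prepend the best matches to the prompt:

    from pathlib import Path

    def top_context_files(task: str, repo_root: str, k: int = 3):
        """Rank repo files by naive token overlap with the task description."""
        task_tokens = set(task.lower().split())
        scored = []
        for path in Path(repo_root).rglob("*.py"):
            words = set(path.read_text(errors="ignore").lower().split())
            scored.append((len(task_tokens & words), path))
        return [p for score, p in sorted(scored, reverse=True)[:k] if score]

    # The selected files are then pasted into the model's context window
    # ahead of the actual development task.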
Organizations offering these tools to developers just need to ensure whatever gets vacuumed into the maw of the AI's context window complies with corporate policies. ®
[1] https://www.qodo.ai/reports/state-of-ai-code-quality/
[2] https://qodo-merge-docs.qodo.ai/pr_benchmark/
Re: 76% ... won't ship AI suggested code without human review
Don't worry, AI also wrote corresponding test suite, so the code is good! Ship it man!
Haaaah-hahhhh-ha-ha-ha!!
how Google boosts search relevance by incorporating contextual signals and personal info.
Despite Google's processing of massive amounts of personal information, Google has not boosted its search relevance. On the contrary, it clogs my search results with unwanted links and shoves what I want to the bottom of the results, or omits it altogether.
If "AI" code helpers perform as badly as Google, then they are not worth the trouble, regardless of what any surveys report or white papers say.
Re: Haaaah-hahhhh-ha-ha-ha!!
I used Google this month maybe once or twice and each time it was "F*ck" followed by CTRL + w.
Gemini is good
It's awful at most things, but the code it kicks out isn't too bad. However, it DEFINITELY needs reviewed
<......."Software developers largely appreciate the productivity improvements they get from AI coding tools"......>
Says a company trying to sell AI coding tools to developers.
IP Protection?
"One good method for dealing with the inherent flaws is to start a session by prompting the agent to review the codebase structure, documentation, and key files, before then giving it the actual development task"
Am I alone in reading that as "pour your IP into the AI that belongs to someone else"?
Re: IP Protection?
I had exactly the same thought.
Re: IP Protection?
It depends whether you trust whatever system you're using not to collect it. Most of them do claim not to store or train on the data you send them, but I don't trust that. Neither do my employers, who have decided that AI coding tools are useful* but that they need a contract ensuring their code is only processed by a system under a legal agreement not to retain the code fed into it. I have somewhat more trust that that agreement is being followed. You do have the option of local models if you're worried about that and still want to run one.
* I'm still not sold on them. They do not work well when making changes to anything large, and that's pretty much all of the things I do. The only situation where they're capable is writing the code for small functions you don't want to write, except that I've had to clean up from times where someone did that and it still went off the rails. For example, a colleague used some LLM to write a function to escape characters in a URL, and it happily mixed HTTP escaping and HTML escaping. This was Python, so the right thing to do would have been to just call the standard library function that someone already wrote to do HTTP escaping.
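For reference, the standard-library distinction the commenter is pointing at, in Python:

    from urllib.parse import quote
    from html import escape

    # Percent-escaping for URLs: reserved characters become %XX sequences.
    print(quote("a&b c"))   # a%26b%20c
    # HTML entity escaping: the wrong tool for building URLs.
    print(escape("a&b c"))  # a&amp;b c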
MUD
I used to program for a MUD back in the day with LPC. Thought it would be amusing to ask my locally running LLM to write a basic MUD in C that could interpret LPC. You'll have to use your imagination to envision the shitshow it spewed forth.
Quality
If the only metric for code quality measured in reviews is that it meets coding standards and passes static analysis... Sure.
I wonder if the successful reviewers get used to the mistakes that the AI+human pairs are making and find an economy there?
If you think that in terms of the coding world it is or ever will be
anything more than a useful tool for doing autocomplete or finding examples or refactoring, and you think you'll be able to use it for developing mission-critical, tested, maintainable line of business software then I have a Tower Bridge to sell you.
Bah Humbug
My cow-orker's use of a machine-learning coding tool, which is basically regurgitated copypasta from Stackoverflow or stolen open-source code, often results in dubious code without sufficient error checking.
He often also has no real understanding of what it does and so is reliant on the tool the next time for a similar but different task.
I'd like to see less machine learning and more human learning.
Re: Bah Humbug
How does your coworker even have a job as your coworker?
Seriously? No snark. I hear so many companies say they want competent help and yet I hear so many stories of people who really cannot do their job.
The problem is
Using an LLM to write code. You need a specific AI which is specialized in code writing. Instead they built generalist LLMs which everyone tries to use as expert systems. At least the medics tend to take an off-the-shelf one and then teach it what it needs to know. Though one I read about recently was not very good at spotting cancers; it wasn't subtle enough. Human eyes were much better. In medicine there's always a human in the loop on those.
76% ... won't ship AI suggested code without human review
It's a concern that 24% apparently will.
Or 16.g% according to their own calculations.