AI-authored code contains worse bugs than software crafted by humans

(2025/12/17)

Reference: 1765987212
News link: https://www.theregister.co.uk/2025/12/17/ai_code_bugs/
Generating code using AI increases the number of issues that need to be reviewed and the severity of those issues.

[1]CodeRabbit, an AI-based code review platform, made that determination by looking at 470 open source pull requests for its State of AI vs Human Code Generation report.

[2]The report finds that AI-generated code contains significantly more defects of logic, maintainability, security, and performance than code created by people.

On average, AI-generated pull requests (PRs) include about 10.83 issues each, compared with 6.45 issues in human-generated PRs. That's about 1.7x more when AI is involved, meaning longer code reviews and increased risk of defects.
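The 1.7x figure follows directly from the two averages quoted above. A minimal sketch of the arithmetic in Python; the function name `issue_ratio` and the constant names are ours for illustration and do not come from the report.

```python
def issue_ratio(ai_avg: float, human_avg: float) -> float:
    """Return how many times larger the AI average is than the human average."""
    if human_avg <= 0:
        raise ValueError("human_avg must be positive")
    return ai_avg / human_avg


# Average issues per pull request, as quoted from the CodeRabbit report
AI_ISSUES_PER_PR = 10.83
HUMAN_ISSUES_PER_PR = 6.45

ratio = issue_ratio(AI_ISSUES_PER_PR, HUMAN_ISSUES_PER_PR)
print(f"{ratio:.2f}x")  # 1.68x, which rounds to the reported 1.7x
```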

Problems caused by AI-generated PRs also tend to be more severe than human-made messes. AI-authored PRs contain 1.4x more critical issues and 1.7x more major issues on average than human-written PRs, the report says.

Machine-generated code therefore seems to leave reviewers with both a larger volume of issues and more severe ones than human-generated code does.

These findings echo a report issued last month by Cortex, maker of an AI developer portal. The company's [7]Engineering in the Age of AI: 2026 Benchmark Report [PDF] found that PRs per author increased 20 percent year-over-year even as incidents per pull request increased by 23.5 percent, and change failure rates rose around 30 percent.
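As a rough back-of-the-envelope reading, and assuming the two percentages can simply be multiplied together (our assumption, not a figure Cortex is quoted as stating here), 20 percent more PRs per author combined with 23.5 percent more incidents per PR would imply roughly 48 percent more incidents per author. The helper name `compound_growth` is hypothetical.

```python
def compound_growth(*pct_increases: float) -> float:
    """Combine percentage increases multiplicatively into one overall percentage."""
    factor = 1.0
    for pct in pct_increases:
        factor *= 1 + pct / 100
    return (factor - 1) * 100


# Year-over-year changes quoted from the Cortex report
PRS_PER_AUTHOR_GROWTH = 20.0
INCIDENTS_PER_PR_GROWTH = 23.5

# Assumption: the two effects are independent and compound
incidents_per_author_growth = compound_growth(
    PRS_PER_AUTHOR_GROWTH, INCIDENTS_PER_PR_GROWTH
)
print(f"{incidents_per_author_growth:.1f}%")  # 48.2%
```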

The CodeRabbit report found that AI-generated code falls short of meatbag-made code across the major issue categories. The bots created more logic and correctness errors (1.75x), more code quality and maintainability errors (1.64x), more security findings (1.57x), and more performance issues (1.42x).

In terms of specific security concerns, AI-generated code was 1.88x more likely to introduce improper password handling, 1.91x more likely to make insecure object references, 2.74x more likely to add XSS vulnerabilities, and 1.82x more likely to implement insecure deserialization than human devs.
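For readers unfamiliar with the XSS category, the pattern typically looks like user input dropped into HTML without escaping. The sketch below is our own illustration, not an example taken from the CodeRabbit report; the function names are hypothetical, and `html.escape` is from the Python standard library.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    # Vulnerable: user input is interpolated into HTML as-is
    return f"<p>{comment}</p>"


def render_comment_safe(comment: str) -> str:
    # html.escape turns &, <, > and quote characters into HTML entities
    return f"<p>{html.escape(comment)}</p>"


payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))  # the script tag survives intact
print(render_comment_safe(payload))    # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

In a real web application, a templating engine with auto-escaping enabled would usually be the safer choice than escaping by hand.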

One area where AI outshone people was spelling – spelling errors were 1.76x more common in human PRs than machine-generated ones. Also, human-authored code had 1.32x more testability issues than AI stuff.

"These findings reinforce what many engineering teams have sensed throughout 2025," said David Loker, director of AI at CodeRabbit, in a statement. "AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."

CodeRabbit cautions that its methodology has limitations, such as its inability to be certain that PRs labeled as human-authored actually were exclusively authored by humans.

[9]Browser 'privacy' extensions have eye on your AI, log all your chats

[10]Mozilla Corporation installs Firefox driver in CEO reboot

[11]MI6 chief: We'll be as fluent in Python as we are in Russian

[12]Salesforce willing to lose money on AI agent licenses when customers are locked in

Other studies based on different data have come to different conclusions.

For example, an August 2025 [13]paper by University of Naples researchers, "Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity," found that AI-generated Python and Java code "is generally simpler and more repetitive, yet more prone to unused constructs and hardcoded debugging, while human-written code exhibits greater structural complexity and a higher concentration of maintainability issues."

Back in January 2025, researchers from Monash University (Australia) and University of Otago (New Zealand) published [14]a paper titled "Comparing Human and LLM Generated Code: The Jury is Still Out!"

"Our results show that although GPT-4 is capable of producing coding solutions, it frequently produces more complex code that may need more reworking to ensure maintainability," the southern hemisphere boffins wrote. "On the contrary, however, our outcomes show that a higher number of test cases passed for code generated by GPT-4 across a range of tasks than code that was generated by humans."

As to the impact of AI tools on developer productivity, researchers from Model Evaluation & Threat Research (METR) [15]reported in July that "AI tooling slowed developers down."

Your mileage may vary.

We note that Microsoft patched 1,139 CVEs in 2025, according to Trend Micro researcher Dustin Childs, who claims that's the second-largest year for CVEs by volume after 2020.

Microsoft says [16]30 percent of code in certain repos was written by AI, and Copilot Actions comes with [17]a caution about "the security implications of enabling an agent on your computer."

"As Microsoft's portfolio continues to increase and as AI bugs become more prevalent, this number is likely to go higher in 2026," Childs [18]wrote in his post.

But at least we can expect fewer typos in code comments. ®

[1] https://www.coderabbit.ai/

[2] https://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report

[7] https://go.cortex.io/rs/563-WJM-722/images/2026-Benchmark-Report.pdf?version=0

[9] https://www.theregister.com/2025/12/16/chrome_edge_privacy_extensions_quietly/

[10] https://www.theregister.com/2025/12/16/mozilla_corporation_new_ceo/

[11] https://www.theregister.com/2025/12/16/mi6_chief_well_be_as/

[12] https://www.theregister.com/2025/12/15/salesforce_ai_monetization/

[13] https://arxiv.org/abs/2508.21634

[14] https://arxiv.org/html/2501.16857v1

[15] https://arxiv.org/abs/2507.09089

[16] https://www.theregister.com/2025/04/30/microsoft_meta_autocoding/

[17] https://support.microsoft.com/en-us/windows/experimental-agentic-features-a25ede8a-e4c2-4841-85a8-44839191dfb3

[18] https://www.zerodayinitiative.com/blog/2025/12/9/the-december-2025-security-update-review

Accountants

Primus Secundus Tertius

Only a management accountant could believe that AI software will make good profits.

Re: Accountants

Snowy

If you have to pay for patches, more bugs equals more profits.

Re: Accountants

KittenHuffer

If you're not part of the solution there is money to be made by prolonging the problem!

---------> Mine is the one I'll wear when I come back tomorrow to fix the problem!

Vibe study

Anonymous Coward

This study sort of confirms the way I've been feeling about these tools - it's not surprising that a tool with no understanding of any kind would create serious bugs, and I would expect in many cases it takes longer to solve those bugs than it would have to write well-architected code.

It doesn't matter. We'll still be told we have to use it.

It's funny how AI - which harms the planet and makes our jobs worse - is mandated by our business leaders, while remote working - which helps the planet and improves our quality of life - is forbidden, as far as they can get away with it. I don't think these things are unconnected.

I still choose to be a programmer rather than a sloperator, as far as possible, but ultimately I need to pay the rent one way or another. Here's hoping the bubble goes up sooner than later.

I've said it once

may_i

So I'll say it again: LLMs write terrible code.

Having just taken over a project which was written by a poor programmer using Copilot to hide his lack of skill, I can confirm that LLMs have no idea about rational error handling or program structure.

They're useful for one-off, throwaway programs. They can be a great reference for all the billions of libraries out there.

Are they any use for writing production quality code?

That's an emphatic NO from me.

Re: I've said it once

Pickle Rick

Agree with your points wholeheartedly.

I've just tried Claude - first dabble at this LLM stuff, after a mate (well, I thought he was a mate!) goaded me into it. He gave me some tips: build the system in baby steps; be specific etc etc

My conclusions/observations (in short, I'll get bored):

- there's no way an inexperienced programmer could produce anything useful

- the LLM makes insane assumptions for _anything_ that it's not told to not do

- creating the prompts to limit the LLM's assumptions takes as long as writing the code, requires solid programming knowledge, and the code still needs checking

- the code is bloated.

- build the system in baby steps is bollocks. The iterative process modifies the little bit that worked as it didn't accommodate the next bit (how could it?)

Reading others' code can be challenging. Historically, that often occurs on a known working system, so it can be assumed that "it's mostly right". Unraveling this shite brings that to a whole different level. A level below and not worthy of BOFH's basement.

I can see it being useful, perhaps, for creating a framework for a large and complicated application. But the working gubbins? No feckin way. And if one's thing is creating large and complicated apps, chances are you've already got that bit sorted. Tidying existing code? Maybe: change all variables to Hungarian notation; use PascalCase for functions (don't hate me!). Woo fucking hoo.

I haven't used MCP self defined code chunks yet, I expect that'll help a lot. But I've been writing software for over 40 years. Give that to a noob? Sheesh! *shudders*

What am I missing here?

dippy1

An AI tool is saying AI tools are useless?

Are they getting murderous or suicidal?

Closest to shocked face I could get

Guido Esperanto

Shocked I tell you

Shocked

Spelling errors were 1.76x more common in human PRs

Anonymous Coward

I eliminated that source of error years ago simply by not commenting my code.
