News: 0175151547

  Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life (Terry Pratchett, Jingo)

Are AI Coding Assistants Really Saving Developers Time? (cio.com)

(Sunday September 29, 2024 @11:34AM (EditorDavid) from the 100x-programmers dept.)


Uplevel, a company that provides insights from coding and collaboration data, recently measured "the time to merge code into a repository [and] the number of pull requests merged" for about 800 developers over a three-month period (comparing the statistics to the previous three months), according to [1]a recent report from CIO magazine.

[2]Their study "found no significant improvements for developers" using Microsoft's AI-powered coding assistant tool Copilot, according to the article (shared by Slashdot reader [3]snydeq):

> Use of GitHub Copilot also introduced 41% more bugs, according to the study...

>

> In addition to [4]measuring productivity , the Uplevel study looked at factors in developer burnout, and it found that GitHub Copilot hasn't helped there, either. The amount of working time spent outside of standard hours decreased for both the control group and the test group using the coding tool, but it decreased more when the developers weren't using Copilot.

An Uplevel product manager/data analyst acknowledged to the magazine that there may be other ways to measure developer productivity — but they still consider their metrics solid. "We heard that people are ending up being more reviewers for this code than in the past... You just have to keep a close eye on what is being generated; does it do the thing that you're expecting it to do?"

The article also quotes the CEO of software development firm Gehtsoft, who says they didn't see major productivity gains from LLM-based coding assistants — but did see them introducing errors into code. With different prompts generating different code sections, "It becomes increasingly more challenging to understand and debug the AI-generated code, and troubleshooting becomes so resource-intensive that it is easier to rewrite the code from scratch than fix it."

On the other hand, cloud services provider Innovative Solutions saw significant productivity gains from coding assistants like Claude Dev and GitHub Copilot. And Slashdot reader [5]destined2fail1990 says that while large/complex code bases may not see big gains, "I have seen a notable increase in productivity from using Cursor, the AI-powered IDE." Yes, you have to review all the code it generates, but why wouldn't you? Often it just works. It removes tedious tasks like querying databases, writing model code, and writing and processing forms. Some forms can have hundreds of fields, and processing those fields along with checking for valid input is time-consuming, but it can be automated effectively using AI.
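The per-field validation boilerplate that reader describes is exactly the kind of repetitive code assistants generate readily. As a rough illustration (the field names and rules below are invented, not from the study or the comment), each of those hundreds of fields needs a block like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of repetitive form-field validation: one small,
// near-identical block per field, repeated for every field on the form.
public class FormValidator {
    public static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();

        // Field 1: email must contain an '@'
        String email = form.getOrDefault("email", "");
        if (!email.contains("@")) {
            errors.add("email: invalid address");
        }

        // Field 2: age must be a number in a plausible range
        String age = form.getOrDefault("age", "");
        try {
            int n = Integer.parseInt(age);
            if (n < 0 || n > 150) {
                errors.add("age: out of range");
            }
        } catch (NumberFormatException e) {
            errors.add("age: not a number");
        }

        // ...and so on, one block per field, potentially hundreds of times.
        return errors;
    }
}
```

Writing this by hand is tedious precisely because each block is slightly different; that is why reviewers still have to read the generated version field by field.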

This prompted an interesting discussion on [6]the original story submission . Slashdot reader [7]bleedingobvious responded:

> Cursor/Claude are great BUT the code produced is almost never great quality. Even given these tools, the junior/intern teams still cannot outpace the senior devs. Great for learning, maybe, but the productivity angle not quite there.... yet.

>

> It's damned close, though. Give it 3-6 months.

And Slashdot reader [8]abEeyore posted:

> I suspect that the results are quite a bit more nuanced than that. I expect that it is, even outside of the mentioned code review, a shift in where and how the time is spent, and not necessarily in how much time is spent.

Agree? Disagree? Share your own experiences in the comments.

And are developers really saving time with AI coding assistants?



[1] https://www.cio.com/article/3540579/devs-gaining-little-if-anything-from-ai-coding-assistants.html

[2] https://resources.uplevelteam.com/gen-ai-for-coding

[3] https://www.slashdot.org/~snydeq

[4] https://uplevelteam.com/blog/why-dora-metrics-are-only-part-of-the-equation

[5] https://www.slashdot.org/~destined2fail1990

[6] https://slashdot.org/submission/17327791/devs-gaining-little-if-anything-from-ai-coding-assistants

[7] https://www.slashdot.org/~bleedingobvious

[8] https://www.slashdot.org/~abEeyore



It's a tool. (Score:5, Interesting)

by MrNaz ( 730548 )

If you try to build a ship with nothing but a welding torch, it won't go well.

Copilot is excellent. But if you try to make it write ALL your code for you, that code will suck.

Re: (Score:3)

by echo123 ( 1266692 )

> If you try to build a ship with nothing but a welding torch, it won't go well.

> Copilot is excellent. But if you try to make it write ALL your code for you, that code will suck.

I am an open-source CMS developer. There's a vast amount of relevant open-source code that the LLM Borg has trained itself on, and the feedback is pretty good. There's no irrelevant open-source code that's been published to the web, Github, or GitLab, it all is pretty much vetted and is valid. Coding by prompt is akin to critically reviewing another developer's code IMHO.

Technology changes and the market expects developers to keep up in order to compete.

Re: (Score:2)

by echo123 ( 1266692 )

> There's no irrelevant open-source code that's been published to the web, Github, or GitLab, it all is pretty much vetted and is valid.

...in the Framework I use -- I meant to write.

Re: (Score:1)

by arglebargle_xiv ( 2212710 )

> There's a vast amount of relevant open-source code that the LLM Borg has trained itself on, and the feedback is pretty good. There's no irrelevant open-source code that's been published to the web, Github, or GitLab, it all is pretty much vetted and is valid.

You forgot to include the wink emoji, people might think you're serious there.

Re:It's a tool. (Score:5, Insightful)

by AmiMoJo ( 196126 )

I tried them out a couple of times and was not all that impressed with the results. Both times the code did at least work, but wasn't particularly good. StackExchange quality stuff, functional but far from ideal.

In both cases I'd have preferred to re-write it from scratch myself. That would give me a chance to really think through the algorithm and the potential issues with every line, something I find easier when writing code than when reviewing it.

Re: (Score:2)

by q_e_t ( 5104099 )

> If you try to build a ship with nothing but a welding torch, it won't go well.

Seemed to work OK for Liberty Ships in WW2...

coding productivity (Score:3)

by Moblaster ( 521614 )

The AI coding assistants are very powerful but right now in 2024, because code itself is very complex typically with hundreds of files in an app, AI is not quite there with a holistic code base (application level) training and output just yet.

The most productive AI users now are senior developers who can use the AI to both 1. iterate code sections insanely fast and 2. actually read the code to guide the AI in the next iterations.

So TODAY you still have to know what you are doing to leverage AI tools for actually-better quality x speed output.

3-6 months you won't have to know as much.

Re:coding productivity (Score:4, Insightful)

by VeryFluffyBunny ( 5037285 )

I reckon there'll be worse problems in the longer term, e.g. AI tools may save time for developers if used appropriately, & for experienced developers, it's probably a good idea for routine work that they know inside-out. However, for inexperienced developers, who don't yet have the mastery, in-depth knowledge, & higher-level, more abstract understandings of coding, having a machine do the nitty-gritty for them may inhibit their development since they're not getting the hands-on experience & developing the working knowledge of coding features & strategies that are necessary. We may end up with a lot of coders who stay at a basic level & never progress into more competent coders. Then the older, more competent coders start retiring or moving on, & then... well, things might get a bit problematic since the remaining coders don't really understand the bigger picture.

Re: coding productivity (Score:4, Insightful)

by scrib ( 1277042 )

The question of "mastery" is one that requires perspective.

I'm over 50 and learned data structures and memory management but barely touched assembly in college. 10 years ago, I was told that understanding the difference between "pass by reference" and "pass by value" was rare.

The point is that "mastery" is having the skills to be productive with the best tools of the time, and that changes. Learning how to get the best results out of AI but not understanding how its output works is just a different layer of abstraction from using console.log but not having any idea how that makes different pixels appear on the screen.

I haven't worked with AI coders yet, but I have no doubt it is another technology I'll work with before my career is done; another thing I'll have to "master."

Re: coding productivity (Score:3)

by fluffernutter ( 1411889 )

> ...I was told that understanding the difference between "pass by reference" and "pass by value" was rare.

So that's the reason right there why people have so many problems with memory leaks that they are developing a language that forces you to do contortions to do anything. Seems it would be easier to just teach people the difference.

Re: coding productivity (Score:2)

by guruevi ( 827432 )

Which is the same problem we have today. There are plenty of people that have no interest in advancing their careers. They are happy where they are and with what they make and some literally do first level helpdesk stuff until they retire.

The thing I see AI doing right now if it has access to the codebase is the stuff junior frontend developers do. The so-called designers and artists are what is going to go, if you find out you needed a different or additional field somewhere in the middle of the project, y

Re:coding productivity (Score:4, Interesting)

by phantomfive ( 622387 )

> The AI coding assistants are very powerful but right now in 2024, because code itself is very complex typically with hundreds of files in an app, AI is not quite there with a holistic code base (application level) training and output just yet.

This is a problem because the context window of AIs is still very small. A large codebase will overwhelm it. To address that problem, we're going to need new algorithms.

Re: (Score:1)

by bleedingobvious ( 6265230 )

> code itself is very complex typically.

You are not a developer and your opinion is worth less than pig shiat.

Congrats on outing yourself.

Re: (Score:2)

by chmod a+x mojo ( 965286 )

> Yes, "AI" tools may provide benefits in some areas but it's not a silver bullet, a go-fast button, for the majority of tasks yet.

Hmm, it's almost like it is in the name - "assistant".

There's a reason it's not called a "coding slave" or "under / unpaid intern" code writer.

From what I've seen... (Score:2)

by Casandro ( 751346 )

... it kinda works for things people have done over and over again. Writing a CURD-Application in PHP probably works just fine... but then again, why on earth are we doing the same thing over and over again, shouldn't the software environment deal with such trivialities. The far bigger productivity gain would be in using environments that are tailored for the job you are trying to solve. If you have an application with 20 database tables... you shouldn't have to write your CURD-code for each one of them. Th

Re: (Score:3)

by KiloByte ( 825081 )

That you misspell that acronym as "CURD" is telling.

Re: (Score:2)

by know-nothing cunt ( 6546228 )

Whey!

"Great for learning"?! (Score:2)

by devslash0 ( 4203435 )

Teaching people how to write crap code is not the quality we should be striving for!

Re: (Score:2)

by Mr. Dollar Ton ( 5495648 )

Why not? They can then generate the code the next AI will learn from! Circular learning!

Re: (Score:2)

by postbigbang ( 761081 )

There is some truth to this. Asking for code in a comparative vacuum is going to render code without context, until the context evolves. This means organizational model training will eventually yield better code because of training and feedback loops into the blackbox which makes code.

This also permits developers generating their own code-making models to have a companion for their generating efforts over a period of time, then understanding how code relates to a larger model. Isolating lib models to functi

Re: (Score:2)

by dvice ( 6309704 )

That depends. Bad code can be better than no code, but it can also be worse than no code at all.

If you have a problem that requires 100 man years to do manually, and you manage to write bad code that does that in 1 day, crashing randomly all the time, but still managing to do it and you can verify the work to be correct, this is a good solution.

But if you add flashing lights to Linux kernel that only one person wants to use and this causes all Linux servers around the world to crash, it would have been bett

Re: (Score:2)

by bleedingobvious ( 6265230 )

Thank you!

Yes (Score:5, Insightful)

by cascadingstylesheet ( 140919 )

It's a tool. Used properly, it saves tons of time.

"Rewrite this whole section of code to use OpenStreetMaps instead of Google Maps"

Could I have done it myself? Sure. In 30 seconds? No ...

Re: (Score:3, Insightful)

by phantomfive ( 622387 )

> Could I have done it myself? Sure. In 30 seconds? No ...

If you think it only took 30 seconds, then you're one of those people introducing 41% more bugs. You need to make sure you understand any code generated by AI...

Re: (Score:2)

by cascadingstylesheet ( 140919 )

>> Could I have done it myself? Sure. In 30 seconds? No ...

> If you think it only took 30 seconds, then you're one of those people introducing 41% more bugs. You need to make sure you understand any code generated by AI...

Gee, we could just assume I'm stupid ... or we could assume that I was talking about the initial writing part, leaving the review and QA as a given.

In any case, the answer to the titular question is still "yes".

Re: (Score:3)

by phantomfive ( 622387 )

Yeah, but you need to add extra time for understanding what the LLM gave you, because it could (and often does) have subtle errors.

Re: (Score:2)

by echo123 ( 1266692 )

>> Could I have done it myself? Sure. In 30 seconds? No ...

> If you think it only took 30 seconds...

I think the point is the suggestion helped the skilled coder progress, rather efficiently.

Re: (Score:2)

by phantomfive ( 622387 )

If you are just copying and pasting the result (as opposed to using the output to understand the API), then you are not a skilled coder. You are a copy/paste coder. You MUST use AI to increase your skill at this point, since it's still not good enough to do it by itself.

Re: Yes (Score:1)

by guruevi ( 827432 )

If you introduce bugs swapping one third party API for the other, perhaps your code is poor to begin with.

Re: (Score:2)

by phantomfive ( 622387 )

I don't know, API swapping can be a difficult and subtle process, since differences between the two APIs are not always obvious. Unless both APIs are VERY well documented, you need to schedule time for a lot of testing.
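One common way to contain the risk being discussed here is to hide the third-party API behind an interface your own application owns, so a provider swap is confined to one class instead of being scattered across the codebase. A minimal sketch (all names here are hypothetical; no real maps SDK is called):

```java
// App-owned abstraction: callers depend on this, never on a vendor SDK.
interface GeocodingService {
    double[] geocode(String address); // returns {latitude, longitude}
}

// Stand-in for a provider-backed implementation. A real one would call
// the vendor's SDK here; swapping providers means swapping this class only.
class StubGeocoder implements GeocodingService {
    public double[] geocode(String address) {
        return new double[] {0.0, 0.0}; // placeholder coordinates
    }
}

// Application code is written against the interface, so it is untouched
// by a provider change.
class StoreLocator {
    private final GeocodingService geo;

    StoreLocator(GeocodingService geo) {
        this.geo = geo;
    }

    double[] locate(String address) {
        return geo.geocode(address);
    }
}
```

The subtle differences between two providers (units, coordinate order, error behavior, rate limits) still have to be handled and tested, but at least they are handled in one place.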

Asked for a CRUD website; it made me DOOM (Score:2)

by thesjaakspoiler ( 4782965 )

well, you can't complain about creativity. =/

They replace Stackoverflow (Score:3)

by allo ( 1728082 )

They do not replace the person typing the code; they replace Stack Overflow for looking up (possibly trivial) questions. You have a sidebar where you can just type ("What is the C++ idiom to do ...") and get the answer instantly. Yes, it might also be able to apply the answer to your code, but the answer itself is the important part.

That's why Stack Exchange is trying to lock down its data exports (to the displeasure of the community): it would rather keep its users' content (which is CC-licensed under its ToS) as capital than let anybody train free-to-use models on it.

Timeline (Score:2)

by phantomfive ( 622387 )

> It's damned close, though. Give it 3-6 months.

I really wonder how they are able to estimate timelines like this. What inside information do they have that we don't? [1]3 to 6 months? [xkcd.com]

[1] https://xkcd.com/678/

It can. But also maybe not. (Score:2)

by jrnvk ( 4197967 )

It really depends - sometimes there is something relatively simple (but complex to write) that it can spit out correctly. Other times it is just creating new rabbit holes. A good developer should be able to spot if it helps or hurts pretty quickly. An inexperienced one will probably struggle more. Kind of like life before these chatbots.

Re: (Score:2)

by Hodr ( 219920 )

But we don't learn by writing perfect code, we learn by fixing problems. If an inexperienced coder leans heavily on AI generated code, but then has to deal with fixing all of the issues generated by that AI, eventually they will no longer be an inexperienced coder.

I want code review (Score:2)

by davecb ( 6526 )

I seem to get that with rabbit... [1]https://leaflessca.wordpress.c... [wordpress.com]

[1] https://leaflessca.wordpress.com/2024/09/01/contrarily-rabbits/

Depends... (Score:3)

by bradley13 ( 1118935 )

As a teacher, and someone who supervises a wide variety of student projects: I do lots of random bits of coding in different languages and using different frameworks and APIs. I cannot possibly keep the details of all of them in my head. ChatGPT is great for reminding me how to do X in language Y or with framework Z. Basically, it is a single source for reference material.

AIs are not yet very useful at actually writing code, at least, not beyond a trivial level. Just as an example, I had a student last week who was writing a web service in Java. When he closed his program, the ServerSocket was not always being properly released. ChatGPT came up with all sorts of overly-complicated solutions, none of which helped. All he actually needed to do was declare the thread to be a daemon thread, but ChatGPT never suggested that. I gave him the hint, ChatGPT gave him the syntax, and the problem was solved.
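For reference, the fix described above is a one-line setting: a daemon thread does not keep the JVM alive, so when the main thread exits, the process (and with it the ServerSocket) shuts down cleanly. A minimal sketch of that idea (class and method names invented here, not the student's actual code):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the daemon-thread fix: the accept loop runs on a daemon
// thread, so it cannot pin the JVM open after the main thread exits.
public class TinyServer {
    public static Thread start(ServerSocket socket) {
        Thread t = new Thread(() -> {
            try {
                while (!socket.isClosed()) {
                    socket.accept().close(); // handle connections (stubbed)
                }
            } catch (IOException e) {
                // accept() throws when the socket is closed: normal shutdown
            }
        });
        t.setDaemon(true); // the one line the student needed
        t.start();
        return t;
    }
}
```

Note that setDaemon must be called before start(); calling it on a running thread throws IllegalThreadStateException.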

AI is tool fetishism at its silliest. (Score:2)

by Eunomion ( 8640039 )

Make tools for tasks, not the other way around. Can't stand how much IT has turned the basic concept of technology on its ass.

Headline doesn't match article contents (Score:3)

by Tony Isaac ( 1301187 )

Headline says AI didn't save developers time.

The story itself describes the experience of two companies: Uplevel and Innovative Solutions. Uplevel says they didn't see any gains (and worse bug counts), Innovative Solutions says AI helped them achieve a 2x-3x increase in productivity.

So the real headline should be "Mixed results" from AI coding assistants.

This makes me wonder how the study methodologies of the two companies differ, and how their practice--use of AI--differs.

41% more bugs (Score:3)

by Tony Isaac ( 1301187 )

What does that mean, exactly? Not all bugs are created equal. Some are serious and consequential, others are more a matter of opinion. Does this higher bug count stem from new (AI) scanning tools?

This reminds me of why I don't run Lint or ReSharper. Many of the bugs or flaws reported by these tools are accurate, but they are drowned out by a forest of inconsequential (though technically accurate) reported issues, that might or might not be necessary to fix. Many of these are more coding style preferences than actual code issues.

In this study, was the same process used to scan or count bugs before the AI tools were introduced? Or were the old, lower bug counts the result of a more manual process?

As a hobbyist programmer (Score:2)

by ClueHammer ( 6261830 )

I find Copilot quite useful and it definitely saves time: things like auto-completing comments, getting descriptions of a block of code you can't quite get a grip on, or getting advice on how to make a certain change. It's still a long way from "write me a Doom clone in Rust," but I hope it gets there one day.

no surprise (Score:2)

by Tom ( 822 )

> Use of GitHub Copilot also introduced 41% more bugs, according to the study...

Let me guess, at least half of those bugs are from bad example code posted on the Internet, or from QUESTIONS rather than answers, you know the "why does this not work?" questions.

LLMs are an excellent mirror of our world. They will reflect back what we communicated amongst ourselves. If flat earthers weren't a fringe group, the LLM would gladly tell you that the Earth is flat.

Buried the Lede (Score:2)

by Geoffrey.landis ( 926948 )

Yeah, the summary definitely buried the lede. If the use of GitHub Copilot "also introduced 41% more bugs, according to the study", I don't think the important part of the story is how fast it isn't.

Specific domain: embedded and LwIP (Score:2)

by AncalagonTotof ( 1025748 )

Yes, that's very specific. Has anybody had success with AI in these areas? I tried to get clues and solutions to problems with LwIP on STM32 from ChatGPT.

All I got was:

1) banalities and small talk.

2) things I already knew, but nothing helpful: no answers to the things I don't know.

3) errors and wrong answers.

I gave up trying.

Re: (Score:2)

by phantomfive ( 622387 )

Yeah, when you get to embedded, everything gets harder, because StackOverflow doesn't have the answers anymore. OpenSource is really helpful in these situations because at least you can look at the code to figure out what is happening.

AI is a research project (Score:2)

by MpVpRb ( 1423381 )

I'm hopeful that it will eventually be developed into something useful

Unfortunately, investors want profits NOW, so they demand that companies release half-baked, kinda useless crap

Meanwhile financial journalists write articles about how AI is failing to meet expectations

In software development, I don't see the value in using crappy AI to help mediocre programmers more quickly develop mediocre code

I'm hopeful that the systems of the future will allow expert programmers to manage the complexity of large syst

Same pattern (Score:2)

by Tablizer ( 95088 )

Almost every new software-related idea is initially overdone and misused. Over time people figure out where and how to use it effectively instead of mostly making messes as gestures to the Fad Gods. But there will be fucked-up systems left in their wake. Pity the poor maintainers.

OOP, microservices, crypto, 80's AI, distributed, Bootstrap, etc. etc. went thru a hype stage.

Thus, I expect the initial stages will be fucked up.

Yes. True story, happened on Friday (Score:2)

by mhocker ( 607466 )

I run a software company which uses MS SQL Server and .NET Core for our app development. My lead developer had a problem with a SQL script which was stubbornly slow - he was querying a huge table with an index that really wasn't helping the performance. He created a prompt to ChatGPT to ask how to optimize the query and it came back - instantly - with a subquery approach that moved the filtering logic from the index into memory. It ran significantly faster.

I have a feeling that GPT had somehow scraped some

In success there's a tendency to keep on doing what you were doing.
-- Alan Kay