Does Generative AI Threaten the Open Source Ecosystem? (zdnet.com)
- Reference: 0179878316
- News link: https://developers.slashdot.org/story/25/10/26/208204/does-generative-ai-threaten-the-open-source-ecosystem
- Source link: https://www.zdnet.com/article/why-open-source-may-not-survive-the-rise-of-generative-ai/
That's the warning from Sean O'Brien, who founded the Yale Privacy Lab at Yale Law School. [1] ZDNet reports:
> Open software has always counted on its code being regularly replenished. As part of the process of using it, users modify it to improve it. They add features and help to guarantee usability across generations of technology. At the same time, users improve security and patch holes that might put everyone at risk. But O'Brien says, "When generative AI systems ingest thousands of FOSS projects and regurgitate fragments without any provenance, the cycle of reciprocity collapses. The generated snippet appears originless, stripped of its license, author, and context." This means the developer downstream can't meaningfully comply with reciprocal licensing terms because the output cuts the human link between coder and code. Even if an engineer suspects that a block of AI-generated code originated under an open source license, there's no feasible way to identify the source project. The training data has been abstracted into billions of statistical weights, the legal equivalent of a black hole.
>
> The result is what O'Brien calls "license amnesia." He says, "Code floats free of its social contract and developers can't give back because they don't know where to send their contributions...."
>
> "Once AI training sets subsume the collective work of decades of open collaboration, the global commons idea, substantiated into repos and code all over the world, risks becoming a nonrenewable resource, mined and never replenished," says O'Brien. "The damage isn't limited to legal uncertainty. If FOSS projects can't rely upon the energy and labor of contributors to help them fix and improve their code, let alone patch security issues, fundamentally important components of the software the world relies upon are at risk."
>
> O'Brien says, "The commons was never just about free code. It was about freedom to build together." That freedom, and the critical infrastructure that underlies almost all of modern society, is at risk because attribution, ownership, and reciprocity are blurred when AIs siphon up everything on the Internet and launder it (the analogy of money laundering is apt), so that all that code's provenance is obscured.
[1] https://www.zdnet.com/article/why-open-source-may-not-survive-the-rise-of-generative-ai/
Folk Music (Score:3)
How is this any different from what has happened to folk music? It has been mined and used over and over again to create new music with no real acknowledgement of its provenance. There have been no significant additions or modifications to folk music in the modern era.
The owners of AI are going to try to claim ownership of its output even though that output relies on mining the commons. They will have effectively claimed a monopoly on humanity's common ownership. It's either the end of intellectual property or the end of new human creativity.
Re: (Score:1)
One big difference is that since this is just algorithmic modification of a database of code, you could in principle track every piece of generated code back to the code it's based on. A single line of generated code may be based on dozens of chunks of original human-written code, all with different licenses.
However, if the AI developers talked to a lawyer first, they probably got permission from all of those developers to use their code, and it doesn't matter. Of course, many of those developers may not care.
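Tracking generated code back through model weights is not feasible, but matching output against an indexed corpus is a known technique (tools like plagiarism detectors use variants of it). A minimal sketch in Python using line-shingle hashing, where `corpus_index` is a hypothetical prebuilt map from shingle hashes to (project, license) pairs:

```python
import hashlib

def shingles(code, k=3):
    """Normalize whitespace and hash every k-line window ("shingle")."""
    lines = [" ".join(l.split()) for l in code.splitlines()]
    lines = [l for l in lines if l]  # drop blank lines
    for i in range(len(lines) - k + 1):
        window = "\n".join(lines[i:i + k])
        yield hashlib.sha1(window.encode()).hexdigest()

def match_provenance(generated, corpus_index):
    """Count shingle hits per (project, license); corpus_index is assumed
    to map shingle hash -> (project, license) for the indexed FOSS corpus."""
    hits = {}
    for h in shingles(generated):
        if h in corpus_index:
            key = corpus_index[h]
            hits[key] = hits.get(key, 0) + 1
    return hits
```

This only catches near-verbatim reproduction; paraphrased or restructured output defeats exact shingle matching, which is part of why the "license amnesia" problem in the story is hard to solve after the fact.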
I don't think so (Score:3)
I don't think it does, at least not currently.
AI currently doesn't generate whole big projects, just smaller snippets of code. You can't just go "Make me a non-GPL VLC" in VSCode. You can have AI write smaller things, like "Create a skeleton for a Wayland program", but in such usages it's not all that different from copying stuff from Stack Overflow and random snippets from Google.
I'd say in general anything where one would worry about licensing is too large for AI yet.
If we do get to the point where we can just have a LLM spit out a full video decoding library that actually works, then it's fair to say that we're living in the future and any concern about licensing is probably obsolete. If AI gets to that point it's probably now able to do projects of almost unlimited size and the world is being turned upside down.
Re: (Score:2)
Did any of you *ever* see a program acknowledging the CC license of code from Stack Overflow? I never have. And I think it doesn't matter, because most snippets are too short to be protected. Yes, you didn't have the idea (otherwise you would not have needed SO), but the final snippet is usually either trivial or you have to heavily adapt it for your purpose.
And you can be sure you don't want to see how often commercial code uses MIT/BSDL code (in principle legally) but forgets to acknowledge the license.
Do people seek out OSS intentionally? (Score:2)
I know I do, but I mean more specifically: do enough people seek out OSS to keep it around? I go looking for OSS solutions both to save money, of course, but also to have the code so that I can update it if it breaks later. I have successfully done this several times despite not being much of a programmer, getting hints by googling compiler errors.
Re: (Score:2)
Same here. I had an older Ten-Tec HF receiver that was computer-controlled over a serial port. The software was made with older CRT monitors in mind, and on newer LED screens with higher resolutions you could not read the fonts because they were so small and blurry. So I got the source code, found the entries for the fonts, and changed them to TTF, and after a couple of adjustments the software was usable again.
This is idiotic. (Score:2)
AI doesn't "siphon up and launder" code any more than your brain does. Both *can* memorize (which is why sometimes comedians can accidentally steal a joke), but both are also learning from patterns.
AI isn't going to make people stop contributing to open source.
How many snippets occur ONLY in that open source? (Score:3)
I've implemented linked list traversal f*ck-all-knows how many times over the last 40 years, in a dozen languages. I'm sure similar or identical code exists in hundreds of open source repositories. And millions of CS homework assignments over the decades.
If you compare my code with enough projects, I'm sure you'll find matches. Not because I copied them, or Stack Overflow (I was coding long before that was a thing), but because there are really only a few sane ways to implement most algorithms. Which is also why most software patents are stupid, but that's a different can of worms...
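The "only a few sane ways" point is easy to illustrate: a minimal linked-list traversal (names here are arbitrary) almost has to take this shape in any language, which is why independent implementations collide without any copying:

```python
class Node:
    """A singly linked list node: a value and a pointer to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def traverse(head):
    """Walk the list from head to tail, collecting values in order.
    The loop structure is essentially forced by the data structure."""
    out = []
    node = head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out
```

Any two programmers asked to write this will produce something structurally identical, modulo variable names, which makes snippet-level "matching" against a corpus nearly meaningless as evidence of copying.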
It's the "trainer's" responsibility (Score:2)
Whoever trained the LLM is the one that knowingly stole the IP. If you end up in court over something then be sure to bill OpenAI for the experience.
Complete fallacy (Score:2)
First, the assumption that a snippet of code is actually copyrightable is generally untrue.
Secondly, open source exists to help people learn to code; wholesale copying is frowned upon, but copying snippets or concepts isn't.
Third, unless it negatively impacts the project itself, anything short of wholesale copying is likely to be ignored, just as it is when developers do that today.
AI doesn't change this equation very much.
Commercial code on the other hand ... (Score:1)
... by hiding the source code, hides both its dependency on open source and on other less reputable sources.
People don't understand how it works (Score:2)
This is basically why everyone is panicking about AI "stealing" everything.
Well, yes, it "steals" your words, but not word for word, just as it's not image for image and not code snippet for code snippet; it just doesn't work that way.
The way I understand it, it's more stochastic in nature: it picks a meaning out of an image (a curve, a circle, a ball), and it does the same with code, as with language translation and interpretation.
You could in fact compare it to an analytical translator that is capable of interpretation rather than copying.
Re: (Score:2)
> So no, AI don't "steal" in the traditional way of just cutting and pasting code, words or images.
Actually, it can do that too, just not reliably.
No (Score:2)
But idiots that think they can barge in with no skill, just because AI "helps" them may well do so.
So the real doomsday scenario (Score:2)
Is that a lot of those projects aren't from obvious sources: they are from people doing school projects, or building a portfolio for job applications.
Generative AI is going to rip through the programming market; it is already devouring junior programming jobs.
So all those people who wrote lots of useful code because they were in college for computer science or doing portfolio work for future jobs are going to go away.
That means that generative AI eventually won't have anything to train on.
Will it really matter? (Score:3)
I'm not sure it will really matter. With these kinds of tools the only closed source will be that on servers which are secured enough not to leak their code. It's already possible to reverse engineer a binary and LLMs will do a better job at converting that back into high-level programming language code snippets that will be used by other programmers. No one will be able to prove anything and courts will never be able to keep up with it all even if someone were inclined to threaten legal action.
The future of software is effectively open source whether anyone wants it to be or not.
Re: Will it really matter? (Score:2)
AI is going to replace coding. Maybe not in the next 5 years, but certainly within the next 20. Users will either write UML or a flowchart/decision tree, and the AI will generate the code, test it, refine it, and publish it in under an hour. This is no different from people learning to code from a book of sample routines or libraries.
Re: (Score:2)
That's the same promise every "no-code" framework made. What is the problem with it? If you write a specification detailed enough to explain what you really want, it gets very, very long. If you create a concise version of it with an efficient notation... you end up inventing a programming language syntax.
What AI can do:
- Create me a Tetris clone
- Create me a random game
What AI cannot do by itself:
- Create the game I have in mind
The random game approach has the difficulty that randomness and creativity are not the same thing.
Re: (Score:2)
Another word on creativity: I do not believe in divine inspiration. I think humans have the same options as AI models: a trained brain, some kind of randomness, and their input. But humans do have a lot more input. If you suddenly feel inspired, it is a combination of the inputs of the last few days being processed by your brain and triggered by a recent input. But we're talking about terabytes of input, while, for example, an LLM gets less than a megabyte of input in each evaluation (and usually does not retain it between evaluations).