EU-sponsored report says GenAI's 'fair use' defense does not compute
- Reference: 1752498933
- News link: https://www.theregister.co.uk/2025/07/14/eu_genai_fair_use/
- Source link:
Controversy surrounds large language models and other GenAI systems because they rely on vast swathes of copyrighted material as training data without adequate remuneration of the people and organizations creating the content.
The [1]study [PDF] finds that the current exception for text-and-data mining (TDM) in EU law "was not designed to accommodate the expressive and synthetic nature of generative AI training, and its application to such systems risks distorting the purpose and limits of EU copyright exceptions."
Companies backing GenAI have argued that the use of copyrighted training data is defensible under "fair use," similar to a student using a copyrighted book to acquire knowledge.
But the new research paper – commissioned by the European Parliament's Policy Department for Justice, Civil Liberties and Institutional Affairs at the request of the Committee on Legal Affairs – found that the argument doesn't stand up.
"While it is often suggested that AI systems 'learn' in ways similar to humans – such as reading a book or studying a painting – this analogy is misleading from a legal perspective," the paper said. "Under EU copyright law, this study finds that such a comparison does not hold. When generative AI models are trained on protected content, they typically make copies and process the actual expressions found in those works. This goes beyond what is permitted under current legal exceptions for activities like research or analysis."
It argued that AI systems do not "understand" what they process in the same way humans do. "As philosopher Luciano Floridi puts it, AI acts without understanding – it follows statistical patterns rather than engaging with meaning. This difference matters legally," it said.
The paper proposes a new EU-level statutory exception to copyright for the specific purpose of training generative AI systems, but also the introduction of an "unwaivable right to equitable remuneration for authors and rightsholders whose works are used in such training."
It also argues that while fully machine-generated outputs should remain unprotected, AI-assisted works require "harmonised protection criteria."
In May, the head of the US Copyright Office, Shira Perlmutter, [2]was removed from office after her agency concluded that AI developers' use of copyrighted material exceeded the bounds of existing fair use doctrine. Her report argued that fair use does not cover the commercial use of large volumes of copyrighted works to generate expressive content that competes in existing markets.
She is currently [3]appealing against her dismissal.
The status quo in the US is being challenged by a number of lawsuits. Mega studios Disney and Universal are suing GenAI provider Midjourney for what they claim is a "bottomless pit of plagiarism."
The claim [4]alleges that Midjourney copied some of the studios' best-known characters from their vast movie portfolios. ®
[1] https://www.europarl.europa.eu/RegData/etudes/STUD/2025/774095/IUST_STU(2025)774095_EN.pdf
[2] https://www.theregister.com/2025/05/12/us_copyright_office_ai_copyright/
[3] https://www.theregister.com/2025/07/04/copyright_office_trump_filing/
[4] https://www.theguardian.com/technology/2025/jun/11/disney-universal-ai-lawsuit
Good for some
I think the biggest problem is that any protections or laws here will only affect companies training AI for local public consumption.
It's not going to stop bad actors, the kind of people working on AI to scam your grandparents, nor is it going to stop companies in countries like China, which already ignore European copyright and patents.
I do wonder if we can win here. If there is no regulation, content creators will get screwed; but if there is, we're likely handing China the opportunity to lead the world in AI and making it impossible for European companies to compete.
With DeepSeek, they've already proven themselves able to beat what's coming out of the US.
If the work is going to be stolen regardless, do we want a world where there are alternative European AI solutions with some safeguards, or do we want children trusting the Chinese black box to be fair and unbiased?
Re: Good for some
There are many different types of so-called "AI", and the mass scraping of copyrighted content is only required to build LLMs, specifically. They are not the models you would use to guide autonomous drones, detect pathologies in scans, predict weather, model protein folding, and so on and so forth. You know, the ones that are objectively useful.
LLMs are the models that can make text that looks a lot like what a human would create, and that can make images that look a lot like what a human would draw. This feat, while impressive as hell and really fun to toy with, has so far been only somewhat useful in practice (as suggested by the fact that no LLM company has yet managed to turn a profit). Promising, for sure, but those promises have not been kept.
My impression is that the really useful models are the specialized ones, trained on a purpose-built training set. This also means that you really do not need or want mass scraping. I would be all in favor of some big EU initiative to build more of those.
But frankly, if China wants to participate in the great race towards the LLM money sink, I'm fine with them taking the lead. If they manage to make one so cheap that I can use it for the party tricks that ChatGPT is really good at and still let them make a profit, good for them; it's not exactly world domination stuff.
Re: Good for some
While I agree that LLMs are less objectively useful than the others you've mentioned, LLMs are being used as interfaces and accelerators.
Recruiters are using them to decide which candidates to shortlist; they are being used to write code, summarise meetings, and rewrite emails.
You shouldn't really have to think too hard about why having bias built into a tool that is sending a summary email is bad.
When you let an LLM distil an hour-long meeting into 5 bullet points that are sent out to the people too busy to attend, you're handing over massive soft power.
And I'm not talking about the future; this is in place now. C-levels are using it, project managers are using it, legal is using it: this is already affecting massive corporations today.
Check out Microsoft's Meeting Insights for an example.
Re: Good for some
The biggest problem/challenge is bypassing the White House and enforcing our laws on US corporations, at levels which make them sit up and pay attention.
AI summary
I typed "AI use of copyright works" into my 'usual' search engine, and it kindly provided me with this "AI summary".
A specific issue is:
"Key Issues and Legal Considerations:
Copyright and AI Training:
The datasets used to train AI models often contain copyrighted material. AI developers argue that this use may fall under exceptions like research or study, but copyright holders may argue that it infringes their rights. "
I do like the inconsistent treatment there: "AI developers argue ..." vs "copyright holders may argue ...".
The summary at the end fails to mention anything like the 'fait accompli' that AI has already used vast amounts of copyright material without notice or permission.
"The use of copyrighted works in AI training and the ownership of AI-generated content are complex areas of law. The UK government is actively considering reforms to address the challenges posed by AI and copyright, including transparency, enforcement, and the potential for opt-out systems. The key legal issues revolve around originality, authorship, and whether the use of copyrighted material for AI training falls under existing exceptions."
> "While it is often suggested that AI systems 'learn' in ways similar to humans – such as reading a book or studying a painting – this analogy is misleading from a legal perspective,"
Thank you! I've been saying that for a while. The LLM does something that looks like reading, but it's not the LLM getting sued, it's the LLM's developers, and they are not doing anything that looks like reading.
It's not even the LLM's developers, it's the billionaire asshats who own the whole thing, and they're not doing any reading, for sure.
They're counting (money, of course).
Rather obvious to anyone with functional brain cells, but in the US, corporations get to lobby to BUY the politicians' votes for the legislation they want.
Every other nation on the planet calls that kind of unlimited "lobbying" BRIBERY.
I see two main thrusts to this argument (and a broader discussion about artistic ownership that I freely admit is beyond my expertise).
First, the distinction between research and plagiarism. In academic and creative disciplines, drawing from multiple sources, synthesizing ideas, and citing sources while offering new insights is considered acceptable. Whether large language models (LLMs) genuinely generate new thought is a legitimate debate. But let’s grant, for the sake of argument, that an LLM can spot previously unnoticed patterns and insights that a human might not have identified, and let's further grant that it does so with proper attribution and transparent citation. That’s arguably a net benefit for knowledge advancement, especially in scientific and academic contexts.*
Contrast that with the domain of fiction or other commercial art. Even if an LLM cites its sources, the ability to produce "new" works that closely mimic the style or themes of human creators, but at a fraction of the time and cost, is not innovation; it is displacement. It undercuts artists by commoditizing the very labour of creativity, repackaging it into something cheaper and more efficient. From a business perspective, this is a return-on-investment dream. From the perspective of working artists, it's a direct threat to livelihood and culture.
Second, there is the matter of how knowledge is internalized and used. A human absorbs ideas, consumes with nuance, forms a worldview through experience, values, emotion and a learned historical context. A machine doesn’t. LLMs process patterns at scale with no sentience, no grounding in lived experience, no understanding in any philosophical sense, no internal consciousness of cause and effect. The model outputs text based on statistical inference (an elaborate form of autocomplete if you will) rather than intent, intuition, comprehension or passion. Its reasoning is not only arcane, but inaccessible: a stochastic process that mimics logic without ever grasping it.
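To make the "elaborate autocomplete" characterisation concrete, here is a toy sketch in Python, purely illustrative and nothing like how production LLMs are actually built: a bigram model that emits the statistically likely next word from whatever text it was fed, with no comprehension anywhere in the loop. Real LLMs replace the frequency table with a neural network over tokens, but the principle of sampling the next item from learned statistics is the same.

```python
# Toy bigram "autocomplete": pick each next word purely from the
# frequencies observed in the training text. Illustrative only --
# real LLMs use neural networks over tokens, but output is likewise
# sampled from learned statistics, not from understanding.
import random
from collections import Counter, defaultdict

training_text = (
    "the model outputs text based on statistical inference "
    "the model follows patterns rather than engaging with meaning"
)

# Count which word follows which in the training data.
follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def next_word(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    options = follows[word]
    return random.choices(list(options), weights=list(options.values()))[0]

# Generate a short continuation, one statistically plausible word at a time.
word = "the"
generated = [word]
for _ in range(8):
    if not follows[word]:
        break  # dead end: nothing ever followed this word in training
    word = next_word(word)
    generated.append(word)
print(" ".join(generated))
```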
This distinction matters. When humans reuse ideas, they do so within a cultural and ethical framework that distinguishes influence, homage, acknowledgement of precursor work, and theft. Machines lack that ethical compass. Their training consumes everything (licensed or not, paid or not, looking at you Meta) without distinction. That alone challenges long-standing norms about consent, ownership, and the value of intellectual labour, both monetary and artistic. We are, I suspect, at a crossroads similar to the Jacquard loom, with all the attendant change and disruption, but I fear ultimately the money will prevail, as it so often does.
* All that said, the current pollution of academic and research papers by AI-written slop is proving to be a huge headache and, as we stand today, is doing the world no favours at all.
The US innovates, China replicates, Europe regulates
So Europe loses out on an AI future
Like they lost out on the internet because of government telecom monopolies.
Re: So Europe loses out on an AI future
That doesn't sound like so much of a bad thing to me.
But how?
an "unwaivable right to equitable remuneration for authors and rightsholders whose works are used in such training."
A wonderful idea in principle, but how to make it work is quite another matter. For example, suppose someone hosts an online repository of their own original textual works. Suppose it's free for all to read: how would the author ever find out that it's been scraped by an "AI" bot for training? It's highly unlikely that the hugely wealthy owners/trainers of the bot will spontaneously volunteer to the "little man" that they've scraped it, and the cost of verifying would be prohibitive. As with many well-intentioned EU initiatives (e.g. the GDPR), this one rests on the erroneous assumption that big business plays fair. If it did, there would be no need for a law.
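For what it's worth, about the only detection the "little man" can do today is grep their own server logs for crawlers that identify themselves. Below is a minimal sketch, assuming a standard access log where the user-agent string appears in each line; "access.log" is a placeholder path, and while the bot names are real published user-agent tokens, the whole approach fails against any scraper that simply lies about its identity, which is rather the point.

```python
# Tally requests from self-identified AI crawlers in a web server
# access log. Assumes a common/combined log format where the
# user-agent appears in each line; "access.log" is a placeholder.
# Only catches bots that announce themselves -- a dishonest scraper
# is invisible to this, which is exactly the enforcement gap above.
import re
import sys
from collections import Counter

# Published user-agent tokens for some well-known AI crawlers.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider"]
pattern = re.compile("|".join(AI_CRAWLERS))

hits = Counter()
log_path = sys.argv[1] if len(sys.argv) > 1 else "access.log"
with open(log_path) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            hits[match.group()] += 1

for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
```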
"Controversy surrounds large language models and other GenAI systems"
There is no 'controversy'...
All of those companies are stealing other people's work and they're all trying to make money off of it without paying the original authors.