AI training license will allow LLM builders to pay for content they consume
- Reference: 1745514731
- News link: https://www.theregister.co.uk/2025/04/24/uk_publishing_body_launches_ai/
- Source link:
The Copyright Licensing Agency (CLA) intends to launch a Generative AI Training Licence, which is set to be available in the third quarter of 2025.
It said the licence would pay publishers and authors – especially those unable to negotiate direct licensing deals – and give AI developers of all sizes the “legal certainty” they need to access copyrighted training data.
OpenAI wants to bend copyright rules. Study suggests it isn't waiting for permission [1]
The [2]CLA is a not-for-profit organization representing licensing groups in the UK. It said the Publishers’ Licensing Services (PLS) and the Authors’ Licensing and Collecting Society (ALCS) would be part of the launch of the Generative AI Training Licence.
Mat Pfleger, CEO of CLA, said: “Training AI models on copyrighted content requires permission and compensation. CLA’s collective licence will further demonstrate that licensing is the answer and can provide a market-based solution that is efficient and effective. Our goal is to provide a clear, legal pathway for access to quality content. One that empowers innovators to develop transformative GAI technologies whilst respecting copyright and compensating rightsholders and creators where their works are used.”
The problem the CLA is likely to face is that the tech industry is not known to wait for legal certainty before it develops products, strikes M&A deals, builds social media platforms, or launches software audits. In fact, it could be argued that legal uncertainty has created the fertile ground from which it has grown to dominate government, commerce and culture so effectively. By the time legal certainty arrives, the horse has bolted, boarded a flight to Mauritius, and is sipping a gin and tonic by the pool.
For example, $300-billion-valued OpenAI has responded to a [6]US government consultation by saying [7]it should have access to any data it wants for training GenAI models, and that the US should stop foreign countries from trying to enforce copyright rules against it and other American AI firms.
Meanwhile, the UK government's consultation on AI and copyright recently closed. It proposed a copyright exception for text and data mining (TDM).
"Exploring a TDM exception with rights reservation mechanisms, underpinned by enhanced transparency measures, may be a viable route for facilitating the agreement of licences. This will meet the needs of both rights-holders and AI developers," [9]it said.
This is [10]the position favored by the Oracle-backed think-tank, the Tony Blair Institute for Global Change.
[11]It's fun making Studio Ghibli-style images with ChatGPT – but intellectual property is no laughing matter
[12]Copyright-ignoring AI scraper bots laugh at robots.txt so the IETF is trying to improve it
[13]Writing for humans? Perhaps in future we'll write specifically for AI – and be paid for it
[14]Do AI robo-authors qualify for copyright? It's still no, says appeals court
Then there is the question of copyrighted material already used for training data. Books3 is a commonly used dataset, with 196,640 books in plain text format, which the [15]UK's Publishers Association said has allowed copyright infringement on an "absolutely massive scale".
Meanwhile, [16]The Atlantic has alleged that Meta, along with other genAI developers, may have accessed millions of copyrighted books and research papers through the LibGen dataset. Researchers have [17]speculated that OpenAI may have done the same, and the allegations form part of lawsuits over the alleged use of copyrighted material. UK authors were [18]alarmed to find their copyrighted books in the database.
The Register has asked the Copyright Licensing Agency (CLA) to provide further details. ®
[1] https://www.theregister.com/2025/04/03/openai_copyright_bypass/
[2] https://cla.co.uk/about-us/#:~:text=Mat%20Pfleger%20is%20the%20Chief,the%20rights%20and%20content%20industries.
[6] https://www.whitehouse.gov/briefings-statements/2025/02/public-comment-invited-on-artificial-intelligence-action-plan/
[7] https://www.theregister.com/2025/03/13/openai_data_copyright/
[9] https://www.gov.uk/government/consultations/copyright-and-artificial-intelligence/copyright-and-artificial-intelligence
[10] https://www.theregister.com/2025/04/03/blair_institute_ai_copyright/
[11] https://www.theregister.com/2025/04/14/miyazaki_ai_and_intellectual_property/
[12] https://www.theregister.com/2025/04/09/ietf_ai_preferences_working_group/
[13] https://www.theregister.com/2025/04/01/interview_with_david_wong/
[14] https://www.theregister.com/2025/03/18/appeals_court_says_ai_authors/
[15] https://www.theregister.com/2024/04/11/mp_committee_ai_copyright/
[16] https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/
[17] https://www.theregister.com/2025/04/03/openai_copyright_bypass/
[18] https://news.sky.com/story/british-authors-absolutely-sick-to-discover-books-on-shadow-library-allegedly-used-by-meta-to-train-ai-13336716
And the analogy continues, because what comes out of horses is pretty much what comes out of AI.
How much will they pay?
1p, £1, ...? Is that per page, per book, or what?
Does the amount they will pay depend on the quality? If so, how is that measured?
I have a little scheme fermenting in my head - generate thousands of pages of nonsense and invite AI firms to pay me for it.
Re: How much will they pay?
"generate thousands of pages of nonsense and invite AI firms to pay me for it."
Rather than them generating pages of nonsense and other people paying for it?
Re: How much will they pay?
1) Ask ChatGPT to generate reams of nonsense (other nonsense generators are available)
2) Ask LLM trainers to pay to use said nonsense
3) $$$
What could possibly go wrong?
so 5p per statement and?
...a debit of 6p for each piece of garbage. I'm not sure who I want to win this battle. But I know the oil, natural gas, and coal industries will make a killing. Governments, don't forget to apply a tax on these transactions.
Times change, laggards are left by the wayside
"Training AI models on copyrighted content requires permission and compensation."
Permission, why? Compensation for what, and to whom?
Those are simple questions in a context of enormous assumed certainties.
Consider the UK "Public Lending Right", which leeches upon Britain's communal libraries. When introduced, it overturned long-established principles regarding access to and use of books. Had it been charged to borrowers at librarians' desks, there would have been an outcry. Instead, it passes itself off as a hidden component of 'rates', the tax paid by home occupiers and businesses. Doubtless, if it were considered remotely feasible, a 'rentier' tax would apply to private book/music/DVD lending among individuals; I make no mention of 'copying', which belongs in a different kettle of would-be monetisable-by-edict fish.
There is wailing in some quarters over people, deemed akin to those 'who steal cars', who use the 'Robin Hood' services of Anna's Archive and Sci-Hub. A notable difference between the complainants about these facilities is that those objecting to the former consist of publishers and authors, whilst the latter, which distributes academic literature, occasions moaning only from publishers.
During the early days of home copying of digitally encoded music onto CDs, and later of film onto DVDs, some nations introduced a levy on retail sales of blank media to 'compensate' publishers' creative accountants. Incidentally, that did not affect early attempts within the same nations to extort money from some users of the BitTorrent protocol.
Shall there be demands put before easily 'bought' legislators that all private users of the Internet are taxed to compensate allegedly creative individuals (publishers masquerading as such) for material 'filched' via Anna's Archive, Sci-Hub, and from non-rentier-compliant Internet-based AI services? Perhaps, renowned economists will be set to determine what proportion of every person's disposable income can be extracted, at source, to keep the massive cultural rentier monopolies free from worrying that their easy way of doing business is under threat.
Despite being unpalatable to some, it is undeniable that rentier economics applicable to digitally representable culture is taking its last gasp.
Look! A horse!
I think it's bolted...