Wikipedia Signs AI Licensing Deals On Its 25th Birthday (apnews.com)
- Reference: 0180585250
- News link: https://news.slashdot.org/story/26/01/15/1516207/wikipedia-signs-ai-licensing-deals-on-its-25th-birthday
- Source link: https://apnews.com/article/wikipedia-internet-jimmy-wales-50e796d70152d79a2e0708846f84f6d7
Google had already signed on as one of the first enterprise customers back in 2022. The agreements follow the Wikimedia Foundation's push last year for AI developers to pay for access through its enterprise platform. The foundation said human traffic had fallen 8% while bot visits -- sometimes disguised to evade detection -- were heavily taxing its servers.
Wikipedia founder Jimmy Wales said he welcomes AI training on the site's human-curated content but that companies "should probably chip in and pay for your fair share of the cost that you're putting on us." The site remains the ninth most visited on the internet, hosting more than 65 million articles in 300 languages maintained by some 250,000 volunteer editors.
Shame (Score:2)
Wonder how well that will go down with the editors.
Re: (Score:2)
The editors know the license they need to use when writing for Wikipedia.
This is also most likely not about articles. You can download a Wikipedia dump if you want to train on it. There are also datasets with these dumps prepared for training on the usual sites. This is about Wikimedia Commons, which is A LOT more data. I'd rather have a company pay for a direct download than have an inefficient bot crawling the same content. The license allows both, but the crawler causes more load on the server than allowi
Re: (Score:2)
> Well that's it for trusting Wikipedia for information. I already knew to be skeptical of Wikipedia edits, but going forward Wikipedia will be edited by AI bots with little to no human intervention. It will just be an entire shit show of hallucinations or biased opinions based on whatever training data the AI is fed.
> AI is going to send us back to the stone age. AI will create the next great war as the working population deals with massive job and income loss. In 20 years we will have a generation who are functionally illiterate and won't know a shred of history because nobody will trust any information as factual. Technology will be lost due to nobody being qualified to manage it. AI is going to make a society of idiots.
Someone in the elite classes probably sees that as yet another benefit. Much easier to control a society of idiots than a society of well-informed, well-educated folks. Granted, with the way things are going I think we'll be finding an excuse to eliminate a whole lot of people well before we have to worry about the AI takeover, but even if we don't proactively do that, it may be the end result. "For the greater good," will be translated into, "for the good of the few uber-wealthy," and the rest of us will b
Re:One more nail (Score:5, Informative)
> Well that's it for trusting Wikipedia for information. I already knew to be skeptical of Wikipedia edits, but going forward Wikipedia will be edited by AI bots with little to no human intervention.
You have got that backwards...
Re: One more nail (Score:4, Informative)
That's... not what's going on here. Wikipedia is licensing its content for these AI companies to crawl for training their models. Presumably to, at minimum, pay for the bandwidth they're using.
Re: (Score:1)
Yeah, I realized this after I posted it... jumped to conclusions before reading the article in typical /. fashion. :P
Re: One more nail (Score:4, Funny)
> Yeah, I realized this after I posted it... jumped to conclusions before reading the article in typical /. fashion. :P
Even better, you didn't even finish reading the first paragraph of the summary. Excellent work!
Re: (Score:2)
Did you try to read at least the summary? This is not about bots writing for Wikipedia.
Why don't they just cache it locally? (Score:2)
If the AI companies need to keep checking Wikipedia, why not just use some of their massive storage to cache the damn thing and stop hammering the servers? What's with this attitude of "we'd rather check the server a thousand times a second than remember what we just read"?
Re: (Score:2)
Because that would make their blatant theft of human knowledge be even more obvious copyright infringement, I guess... At least now they're paying for it.
Re: (Score:2)
Well, it's not theft if it's already free to everyone. But maybe copyrights are the reason they don't cache instead. Since the hosts are annoyed about being hammered by the AI companies, wouldn't it make more sense for the hosts to tell them, "you can cache, but you cannot constantly scrape"?
Re: (Score:2)
Bandwidth is cheaper than storage. Apparently.
Re: (Score:2)
Well, for the AI companies maybe. They don't bear the costs it places on the hosts. I would think the hosts would have the leverage to force the AI companies to stop if they allowed them to cache instead.
Re: (Score:2)
This is likely about Wikimedia Commons (images, videos, etc.), which is a lot more data than the text dumps.
Re: (Score:2)
Even better - those probably don't change as often. Very cacheable.
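To make the "cache instead of re-fetching" idea in this sub-thread concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `LocalCache`, the `fetch` callable, and the one-day `max_age` are hypothetical names and values, not anything Wikimedia or an AI company is known to use. It only shows that a client which remembers what it just read touches the origin once per expiry window, no matter how often it asks.

```python
import time
from typing import Callable, Dict, Tuple


class LocalCache:
    """Hypothetical client-side cache: re-fetch a URL only after max_age seconds."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],          # assumed downloader, injected by the caller
        max_age: float = 86400.0,               # assumed freshness window (one day)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.origin_hits = 0                    # how many times the origin was contacted

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        # Serve the stored copy while it is still considered fresh.
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        # Otherwise go to the origin once and remember the result.
        body = self._fetch(url)
        self.origin_hits += 1
        self._store[url] = (now, body)
        return body
```

Under these assumptions, a thousand `get()` calls for the same URL inside the freshness window cost the host a single request. For rarely changing media files, a longer `max_age` would presumably be reasonable.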
Reasonable (Score:5, Insightful)
Wikipedia has terms of service that mean you give up limited rights when you contribute to it. As such, they decide whether AI gets access to it.
Being a free service to the general public, it is totally reasonable to charge special users to use it.
This is a great way to fund the general public's use, especially considering how the AI community has in general disregarded authors' rights. Better to charge them up front.
Re: (Score:3)
Agreed. And without that deal, the slop-makers would just steal everything anyways.
Re: (Score:2)
Which the slop-makers already had done of course - With the resulting crawlers creating significant financial burden along the way.
My guess is the Wiki is now being piped direct as changes happen. Like a rolling live simulcast. Then those crawlers go away and the burden of bandwidth and server costs lift.
Re: (Score:3)
> Being a free service to the general public, it is totally reasonable to charge special users to use it.
More importantly, AI companies are going to scrape the site regardless of it being allowed or not. What this does is give them firm legal standing that companies doing so are causing them financial losses.
YES! (Score:2)
thank you.
I'm sure they scraped most of Wikipedia already; anybody can go get an offline copy of Wikipedia and has been able to do so for many years. It would be the 1st dataset I'd use to train something that large. I've already used it simply to get a list of the most commonly used English words. This is about new content, being hammered by bots, money, and legal issues.
more like Jimmy FAILS (Score:1)
It's finally happening, people! Surely they'll fork Wikipedia THIS time!
Probably the sane choice (Score:3)
Otherwise the AI pushers would just steal everything and cause damage to availability on top of that.
Just remember this kids! (Score:1)
Wikipedia is completely unbiased! It’s crowd-sourced! And the crowd only uses approved sources:
[1]https://en.wikipedia.org/wiki/... [wikipedia.org]
With Wikipedia, our AI’s are guaranteed to be benign overlords!
Sleep well.
[1] https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources
Re: (Score:3)
A fan of "doing your own research" I take it.
Re: (Score:1)
> A fan of "doing your own research" I take it.
Is that sarcasm? Not a fan of “doing your own research”. We can’t trust the plebes!
I spent thirty years in high tech - much with direct Bell Labs lineage - and the “do your own research” crowd? Morons.
Be better!
Re: (Score:2)
> ArchieBunker is in fact a moron and has a blind spot when it comes to bias. Thinks he's "smart" because he tinkers with electronics and holds politically correct opinions and sneers sarcastically at wrongthinkers.
No. Impossible. The mods love him so far!
License violation (Score:3)
This seems to be a violation of the CC-BY-SA license granted to Wikipedia by its editors, since the AI companies do not give proper attribution in the output of their models. (Given the vast volume of sources each model has ingested, it would be impractical to do so.)
Re: (Score:1)
Jimmy Wales Claims He Can Override Wikipedia License Terms By Executive Order Because of Little-Known "Nobody Will Stop Me" Loophole
So how long before Wikipedia is defaced by Redact? (Score:2)
Wikipedia is going to look like reddit soon if they aren't diligent. Wikipedia has a lot of power trip gatekeepers.
There goes the neighborhood (Score:2)
Aren't we all just sitting around waiting for a handful of people to own every fucking thing on this planet?
AI, an overhyped bullshit front end to Wikipedia (Score:2)
So, your brain has rotted to the point where you cannot use a browser? Then use AI, so it can rot further. : P
My complex relationship with Wikipedia (Score:3)
I came across Wikipedia very early on, when it had barely 40,000 articles. I was a regular contributor for about two years from 2004-2006 before I gave up because of disputes and became a vandal. Wikipedia is still growing, even though notability rules limit true growth and non-notable content is routinely monetized by Fandom and Knowyourmeme. Even though I am banned from Wikipedia and am not sorry for vandalising it, I'm still impressed by what they have built. Wikipedia has made freely available what academic journals and newspapers lock behind paywalls, so I have respect for that.
Wikipedia's content is traceable by its history, whereas AI just spits out whatever is computed in the LLM algorithm. Elon Musk's disaster of Grokipedia shows what can go wrong when AI tries to make an encyclopedia with a Nazi point of view; Wikipedia is still inherently more trustworthy.
Sellouts (Score:1)
The people who donated time and money to this half-assery are chumps. And that's before talking about Wikia.
Re:Sellouts (Score:5, Insightful)
Servers are expensive, bandwidth is expensive, and Wikipedia draws massive amounts of human and bot traffic. They are a non-profit organization and provide their services for free. AI companies and their bots are hammering their servers all the time, increasing costs. Why should they not charge?
I donated and keep donating here and there, and I don't mind their new policy. Unless they allow AI bots to pollute Wikipedia with slop, I am fine.
Re: (Score:3)
My understanding is that the entirety of Wikipedia is only about 60 GB and is conveniently downloadable. Anyone ought to be able to download a local mirror to use, instead of hammering Wikipedia's servers, and doing so might be faster for the consumer, anyway.
And in a world where hundreds of millions of mainstream users stream video, I'm not sure bandwidth really is expensive anymore. To us old-timers, the numbers today are just astonishing. I almost can't believe I used to worry so much about efficiency.
Re: (Score:1)
Is Wikipedia not of benefit for the public good? If you donated time (or money) to stroke your own ego that's fine too, but anyone that decides to take their ball and go home after this announcement obviously wasn't in it for the public interest to begin with - to those I say good riddance.
Re: (Score:3)
This seems predicated on the assertion that AI is a public good, and not another means of centralizing power and reducing the workforce. These companies do not have your best interest at heart.
Re: (Score:2)
I'm not sure why you think your comment has anything to do with AI using Wikipedia's content to help in its mission to essentially destroy the web as an information medium.
Re: (Score:3)
Wikipedia is charging AI companies a fee for access, as opposed to the slop companies abusing the servers for free. I really don't see the problem.