News: 0178644214

Reddit Will Block the Internet Archive (theverge.com)

(Monday August 11, 2025 @05:50PM (msmash) from the how-about-that dept.)


Reddit says that it has caught AI companies scraping its data from the Internet Archive's Wayback Machine, so it's going to start blocking the Internet Archive [1]from indexing the vast majority of Reddit. From a report:

> The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the Reddit.com homepage, which effectively means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day.

> "Internet Archive provides a service to the open web, but we've been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine," spokesperson Tim Rathschmidt tells The Verge.



[1] https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit



It's because (Score:5, Informative)

by OverlordQ ( 264228 )

They already [1]sold it to google [gizmodo.com]

[1] https://gizmodo.com/reddit-signs-deal-scrape-your-online-community-ai-parts-1851270475

Re:It's because (Score:5, Insightful)

by brunes69 ( 86786 )

Which is how it should work.

If AI companies want to train on data, they should have to pay for it.

Right now this entire industry is built on IP theft. It's sickening, frankly.

Re: (Score:1)

by OverlordQ ( 264228 )

Yes, but if the AI companies are scraping IA and not reddit, it has zero impact on Reddit besides Reddit being pissy.

Re: (Score:2)

by dskoll ( 99328 )

If AI companies are scraping IA, it has no traffic impact on Reddit, but does have a missed-revenue-opportunity impact.

Re:It's because (Score:5, Insightful)

by brunes69 ( 86786 )

It has a huge impact because it devalues these kinds of deals and supports the idea that these companies can run roughshod over IP rights, steal, and pillage to their hearts' content without consequence.

Re: (Score:3)

by OverlordQ ( 264228 )

So is it "Information Wants to Be Free" or "Information Wants to Be Free to Only Those I Agree With"?

Re: (Score:2)

by allo ( 1728082 )

It would be good news, if they can't sell user data anymore. If I post on Reddit, then I do it for people to read it, not for Reddit to sell it. And I decide myself if I am offended by AI reading my posts or not.

Re: It's because (Score:2)

by ArmoredDragon ( 3450605 )

These kinds of deals will only last until the AI hype dies down and the market matures.

Of course, spammers and Russians already know about said deals and throw their own AI slop on Reddit, so I question what value it has right now.

Either way, if this is how Reddit intends to become profitable long-term, they're in for a rude awakening.

Copying is not theft. IP is access delay. (Score:2)

by couchslug ( 175151 )

Demonstrated public demand for shorter access delays than current IP law allows suggests the public would be better off with different laws.

There are many ways to make money from free software etc. None are harmed by downloading it and many benefit.

Re: (Score:1)

by Iamthecheese ( 1264298 )

Screw IP rights. Information wants to be free.

Re: (Score:2)

by OverlordQ ( 264228 )

So am I devaluing by browsing the site?

Re: It's because (Score:2)

by brunes69 ( 86786 )

Why are you being a sycophant for these VC backed AI companies?

Re: (Score:2)

by DamnOregonian ( 963763 )

This fight impacts a lot more than VC backed AI companies.

Why are you being intellectually dishonest?

Recursive loop? (Score:3)

by Sebby ( 238625 )

> AI companies are scraping IA

Doesn't that cause some sort of infinite or recursive loop?

Re: (Score:2)

by allo ( 1728082 )

In many jurisdictions you cannot waive it (but you can often release your content under a license that waives all restrictions). On the other hand, a work needs to be complex enough to have copyright at all. If you write a longer post about copyright and its exceptions, it is probably protected. This post, on the other hand, is something everyone could have come up with. A few more thoughts and it might become protected.

Re: (Score:2, Troll)

by drinkypoo ( 153816 )

> Right now this entire industry is built on IP theft. It's sickening, frankly.

What's more sickening, an industry built on "IP theft" or the term of copyright after it's been extended due to lobbying from media megaconglomerates?

Re: (Score:3)

by Mr. Dollar Ton ( 5495648 )

Both are a manifestation of the same problem - the power of money to subvert the law. Sometimes big money may be in conflict, but like the tagline from that movie, whoever wins, we lose.

Re: It's because (Score:2)

by ArmoredDragon ( 3450605 )

You guys are always wanting us to be more like Europe, because you're a rebel. Disney wanted to import European copyright laws into the US, and that's exactly what happened. How else do you intend to rebel?

Re: (Score:2)

by machineghost ( 622031 )

This guy gets it: IP is an imaginary concept! It should exist only as long as it benefits society ... but our IP laws have been corrupted to only serve the needs of corporations.

Pretending that you should keep following made-up rules, that don't benefit anyone except the ultra-rich, as if it was some kind of moral concern, is completely idiotic.

Re: (Score:2)

by SNRatio ( 4430571 )

Move fast and break things ... laws ... people

Re: It's because (Score:2)

by ArmoredDragon ( 3450605 )

I remember only five years ago, Slashdot had a very counterculture/adversarial view towards intellectual property. Now it seems to be very Jack Valenti.

You wouldn't download a Reddit.

(But neither would I, the last thing I need is several terabytes worth of the internet's anus.)

Re: (Score:1)

by davidwr ( 791652 )

> the last thing I need is several terabytes worth of the internet's anus.

I think you are confusing Reddit with the many n-chans out there. It's an understandable mistake.

Re: It's because (Score:2)

by ArmoredDragon ( 3450605 )

Reddit is the anus, those are just the bigger chunks of splatter.

Re: (Score:2)

by DamnOregonian ( 963763 )

Fucking seriously.

Re: (Score:2)

by msauve ( 701917 )

> I remember only five years ago, Slashdot had a very counterculture/adversarial view towards intellectual property. Now it seems to be very Jack Valenti.

It's the difference between personal use, and corporate profit.

Re: (Score:2)

by djinn6 ( 1868030 )

Reminder that the "creativity" being defended by IP laws originates from Reddit users, not Reddit the company. Most Reddit users are not aware that their posts can be sold for money.

Re: (Score:2)

by rsilvergun ( 571051 )

I'm surprised they only got 60 million out of it.

The one thing I have learned about AI is that whoever controls scrapable data controls the AIs, because they are useless without massive training sets.

This means you can open-source the models all you want, but they are basically worthless without the training data sets, and those are going to be locked up behind paywalls owned and operated by major platform holders very soon.

This means that the capital that ai represents, and it is capital just

Re: (Score:2)

by Mr. Barky ( 152560 )

I really wonder how much added value there is in recent data. For a search engine, obviously it needs to be recent, but for any other use, old data is possibly as good as or better than recent data (especially as new data is going to be polluted with AI-generated content). Maybe search is where all the money lies (Google isn't exactly poor...) and hence the need to scrape endlessly.

Re: (Score:2)

by Sebby ( 238625 )

> That does not rely on "Scraping" and visually snapshots all the posts.

We already have it in Windows 11 - it's called "Recall"; it's a fucking privacy nightmare.

Re: We need an archive... (Score:2)

by paul_engr ( 6280294 )

This is an utterly stupid non-idea

That's a shame actually (Score:2)

by MobyDisk ( 75490 )

I sometimes use it to archive especially insightful conversation on reddit. ...yeah it is a rare event, but it does sometimes happen. *casts furtive glance around Slashdot*

Re: That's a shame actually (Score:2)

by ArmoredDragon ( 3450605 )

Every now and then, shit has kernels of corn.

Re: (Score:1)

by davidwr ( 791652 )

but you wouldn't want to eat them.

Re: That's a shame actually (Score:2)

by paul_engr ( 6280294 )

But corn slows down one's shit intake, even marginally. Unacceptable!

Re: That's a shame actually (Score:2)

by paul_engr ( 6280294 )

If the AI ate the corn laden turds, then another AI ate its double corn laden turds, could we get to a pure corn turd utopia?

the internet forgets after all (Score:2)

by awwshit ( 6214476 )

It took AI to get the Internet to forget things, interesting.

Price, value and rivalry (Score:2)

by abulafia ( 7826 )

There was enormous value to the old internet because the marginal price for access was zero.

LLMs provide a mechanism to access the same information in a radically more energy-intensive way, which was the missing mechanism to put a price on that value.

A price tag means the data has to be made into a rivalrous good or you can't sell it. Then the old data has to be made unavailable.

Reddit Sells Their Data To AI (Score:2)

by WankerWeasel ( 875277 )

Reddit literally sells their data to Google and others for scraping. What Reddit is saying here is that they're blocking The Internet Archive because they aren't paying to scrape that data. Google pays $60 million a year to scrape Reddit for AI data.

[1]https://www.thedailybeast.com/... [thedailybeast.com]

[1] https://www.thedailybeast.com/google-will-pay-reddit-dollar60m-a-year-to-use-its-content-for-ai-report/

Re: (Score:2)

by brunes69 ( 86786 )

As it should be.

AI house-of-card companies should not be allowed to engage in rampant IP theft.

Re: (Score:3)

by WankerWeasel ( 875277 )

Bwahaha, that's literally what Reddit does. It steals content from every other source on the internet and profits from it.

Re: (Score:2)

by allo ( 1728082 )

But but but ... they can't know what their users post, can they?!

I mean if they knew that most users do not own the content they post, they would surely delete it ...

Welcome to the Walled Garden (Score:3)

by Sebby ( 238625 )

AI is why we can't have nice things.

Re: (Score:1)

by JBeretta ( 7487512 )

> AI is why we can't have nice things.

Right. What nice things do you think you'd have without AI? (for the record, I'm not a fan of AI, but even less of a fan of moronic statements)

Re: (Score:2)

by Aristos Mazer ( 181252 )

The OP is objecting to the loss of the Internet Archive and the ability to review history because of the AI scanning.

Its quite ok (Score:1)

by Slashythenkilly ( 7027842 )

Reddit has become a cesspool of whining babies almost as bad as Slashdot, and that's not data we need to keep.

Re: (Score:2)

by slimscsi ( 4045129 )

> Reddit has become a cesspool of whining babies almost as bad as Slashdot, and that's not data we need to keep.

Slashdot commentary is nowhere on the sub-level of Reddit, c'mon, that's disingenuous.

Re: Its quite ok (Score:2)

by Slashythenkilly ( 7027842 )

It's the larger point I'm making about what needs to be archived. If you haven't seen some of the more petty and childish arguments on /. then you're not looking.

Browse at 6 Re: Its quite ok (Score:1)

by davidwr ( 791652 )

I usually browse at 6. It filters out the petty and childish arguments that show up when I browse at 5.

Re: Its quite ok (Score:2)

by Slashythenkilly ( 7027842 )

As if you know what I "follow", geez.

Who cares? (Score:1)

by systemd-anonymousd ( 6652324 )

Who cares? Reddit is 90% bots and marketing agencies. It's useless. It's AI slop that's been digested and shit back out multiple times...AI trained on the output of AI trained on the output of AI trained on the last vestiges of actual human communication from a forgotten era of Reddit, and all of it designed to push a certain narrative, get you to think a certain way, or make it hard for you to see content they don't want you to see.

Re: Who cares? (Score:2)

by paul_engr ( 6280294 )

I did this one trick and side hustle to quintuple my dong size and bank decimals. Every second post on reddit ever. Fuck you, reddit.

And why? (Score:2)

by allo ( 1728082 )

If a company complains that AI crawlers are causing too much traffic, one might believe it or not (it's not as if Reddit isn't using a CDN for example). But why are they complaining when a mirror they are not hosting themselves is crawled? It's not as if the Internet Archive would crawl them more often when AI bots access the archive.

Re: (Score:1)

by davidwr ( 791652 )

I think the ostensible reason has to do with things like deletions. If the Internet Archive gets it, and it's deleted later, it's still at archive.org.

I suspect the actual reason has to do with money: they want to be the only place that companies willing to pay for content can go to train their AIs, and they want to shut out companies unwilling to pay. Whether this is a good thing, a bad thing, both, or neither is probably a discussion for another time.

IP is a government gift (Score:2)

by couchslug ( 175151 )

IP is a government gift, not some natural right, though Disney will disagree. IP was intended to facilitate greater good, not rent seeking.

IP is not a requirement for successful capitalism, as nations which disregard it demonstrate. It is not necessary for competitiveness; it is a deliberately conferred, supposedly temporary market advantage intended to aid progress in the useful arts.

Slashdot luddites are silly (Score:2)

by Flentil ( 765056 )

I see AI hate regularly on other sites, but it's especially funny seeing the AI Haters come out on Slashdot. You hate technology now? Ridiculous! Don't you still want a household robot to take out the trash, or is that unfashionable now, because it's AI? What a joke AI haters are, especially here. They should pay to scrape data! Don't be ridiculous.

Re: (Score:2)

by hackertourist ( 2202674 )

Hating AI in its current form does not make one a Luddite. This is not about "hating technology", it's about recognizing that what is available now is deeply flawed.

Re: (Score:2)

by DamnOregonian ( 963763 )

> hating AI in its current form does not make one a Luddite.

In a hypothetical universe that doesn't exist, sure, you're not wrong.

But the AI hatred often overlaps with flat-out Luddite tendencies: sacrificing every drop of intellectual honesty one can find to change the narrative around a technology, with no regard for the facts on the ground.

When I start running into people here who don't like AI, but aren't also engaging in flat-out lying in order to prop up their reality bubble, I'll be more inclined to agree with you.

So stupid (Score:2)

by sinkskinkshrieks ( 6952954 )

This is collective punishment, virtue signaling, power-tripping.

I have an idea (Score:2)

by paul_engr ( 6280294 )

Maybe they should block the whole internet. They can take Pinterest with them. I get no value out of the fuckwits on Reddit.

Well (Score:1)

by Bahbus ( 1180627 )

Reddit is garbage and has been for a long time. The IA can probably just find a new way to archive the site and its posts.
