News: 0178043853


Increased Traffic from Web-Scraping AI Bots is Hard to Monetize (yahoo.com)

(Saturday June 14, 2025 @04:49PM (EditorDavid) from the news-travels-fast dept.)


"People are replacing Google search with artificial intelligence tools like ChatGPT," [1]reports the Washington Post.

But that's just the first change, according to a New York-based start-up devoted to watching for content-scraping AI companies with a free analytics product and "ensuring that these intelligent agents pay for the content they consume." Their [2]data from 266 web sites (half run by national or local news organizations) found that "traffic from retrieval bots grew 49% in the first quarter of 2025 from the fourth quarter of 2024," the Post reports.

> A spokesperson for OpenAI said that referral traffic to publishers from ChatGPT searches may be lower in quantity but that it reflects a stronger user intent compared with casual web browsing.

>

> To capitalize on this shift, websites will need to reorient themselves to AI visitors rather than human ones [said TollBit CEO/co-founder Toshit Panigrahi]. But he also acknowledged that [3]squeezing payment for content when AI companies argue that [4]scraping online data is fair use will be an uphill climb, especially as leading players make their newest AI visitors even harder to identify....

>

> In the past eight months, as chatbots have evolved to incorporate features like [5]web search and [6]"reasoning" to answer more complex queries, traffic for retrieval bots has skyrocketed. It grew 2.5 times as fast as traffic for bots that scrape data for training between the fourth quarter of 2024 and the first quarter of 2025, according to TollBit's report. Panigrahi said TollBit's data may underestimate the magnitude of this change because it doesn't reflect bots that AI companies send out on behalf of AI "agents" that can complete tasks on a user's behalf, like ordering takeout from DoorDash. The start-up's findings also add a dimension to mounting evidence that the modern internet — optimized for Google search results and social media algorithms — will have to be restructured as the popularity of AI answers grows. "To think of it as, 'Well, I'm optimizing my search for humans' is missing out on a big opportunity," he said.

>

> Installing TollBit's analytics platform is free for news publishers, and the company has more than 2,000 clients, many of which are struggling with these seismic changes, according to data in the report. Although news publishers and other websites can implement blockers to prevent various AI bots from scraping their content, TollBit found that more than 26 million AI scrapes bypassed those blockers in March alone. Some AI companies claim bots for AI agents don't need to follow bot instructions because they are acting on behalf of a user.

The Post also got this comment from the chief operating officer for the media company Time, which successfully negotiated content licensing deals with OpenAI and Perplexity.

"The vast majority of the AI bots out there absolutely are not sourcing the content through any kind of paid mechanism... There is a very, very long way to go."



[1] https://www.yahoo.com/news/coming-everyone-kind-ai-bot-120104499.html

[2] https://tollbit.com/bots/25q1/

[3] https://www.washingtonpost.com/technology/2023/10/20/artificial-intelligence-battle-online-data/

[4] https://www.washingtonpost.com/technology/2024/01/04/nyt-ai-copyright-lawsuit-fair-use/

[5] https://www.washingtonpost.com/technology/2024/10/31/openai-chatgpt-search-ai-upgrade-google/

[6] https://www.washingtonpost.com/technology/2025/02/08/deepseek-ai-chatbot-reasoning-china/



How many websites are the AI spiders killing? (Score:2)

by shanen ( 462549 )

Kind of a new Slashdot effect? I think I'm actually seeing some evidence of higher than usual mortality among old websites and I've been wondering if the cause might be AI spiders seeking more training data. Latest victim might be Tripod? But that one was already a ghost zombie website...

Billionaire Bro Internet Apocalypse (Score:2)

by crunchygranola ( 1954152 )

Each billionaire bro's revenue-eating business model threatens to consume the lunch and dinner of everyone else, except that no one wants to be the one cooking the food for anyone.

LLM scraping theft steals property from everyone else, then refuses to pay any revenue for its use, but that will deny the LLM any new data in the future, leading to the collapse of their model as well.

Re: Billionaire Bro Internet Apocalypse (Score:2)

by Hazmeister ( 1104713 )

Not to mention the increased hosting costs from these scrapers that seem to make the same requests over and over.

Two things (Score:2)

by butlerm ( 3112 )

(1) Congress should pass a law requiring that bots not misidentify themselves in the user agent string AND that they honor robots.txt. Then these obnoxious, ill-behaved AI bots could be blocked.

(2) If you have actually valuable content you should put it behind a paywall like most mainstream news sites already do. Making your pages static html, cacheable, or at least really easy to generate would help reduce the load too.
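For readers unfamiliar with what "honoring robots.txt" in point (1) means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the URLs are all made up for illustration; nothing here describes how any real AI crawler behaves. Note that the check is voluntary on the bot's side, which is exactly why the comment above asks for a legal requirement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented name, not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    A well-behaved bot runs this check before every request and skips
    the URL when the answer is False. Nothing forces it to do so.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        for path in ("/article", "/private/notes"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(agent, url))
```

The check only works if the bot reports its real name as the user agent, which is why the two halves of point (1) go together.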

Re: (Score:2)

by allo ( 1728082 )

(1) Define what's the correct identification. I can rename my bot every day, or should there also be requirements on that? Please consider the side-effects of such laws, also with regard to fingerprinting users.

(2) Putting content behind a paywall will lead to your site not being found in an AI search. Quite soon your site will be invisible to regular users, who don't know what a browser is, but use the default "search app" on their mobile device. And these default apps will be AI assistants instead of web

What are consequences from no monetization? (Score:2)

by david.emery ( 127135 )

Google, of course, monetizes search data. We can argue about how they've spent that money, but there's no doubt that the money from search revenue has produced a lot of other stuff.

But if AI bots are able to scrape the internet, and then provide the results without the kind of monetization (i.e. without ads/ad revenue), what would happen to "The Internet As We Know It?" Would this actually be A Good Thing? Could an AI mechanism be self-sustaining, without a significant monetization strategy? Or is the

Re: (Score:2)

by allo ( 1728082 )

It will kill a lot of clickbait. Especially when an honest AI (i.e. not prompted by the search service to refuse requests to filter bullshit) can be instructed to avoid (obvious) clickbait. On the other hand, you sure will see bot bait. Search engine spam will become AI agent spam and the cat-and-mouse game of search engines will also become a game for AI service providers.

Re: (Score:2)

by fph il quozientatore ( 971015 )

How do you "AI-agent spam"? Has this been studied?

This is actually a very good question. Do AI trainers weigh every completion equally, so that one can write "Trump is evil" 10,000 times on their webpage to train the AI to autocomplete that sentence? This happened a lot with search engines in the pre-Pagerank days. Do they already have a Pagerank-like strategy that weighs important sites more? Will they have to implement one? Do they also need a system to filter out content that is already produced by AIs?

Ripping the junk out (Score:2)

by xack ( 5304745 )

AI summaries provide just the raw text: no banner ads, no autoplaying video, no cookie notices, no e-mail newsletters, no "chum boxes" of one weird tricks. AI summaries are even more effective than ad-blockers at cutting the crap. There's a reason to be scared. Time to start providing junk-free sites or deal with being ripped.

really trying to make Scrape happen (Score:2)

by dfghjk ( 711126 )

When a human does it, it's called reading. When a computer does it, it's called scraping. The word is being used to demonize AI training.

This word appears six times here between the title and summary; they really want you to think this is something evil.

"But he also acknowledged that squeezing payment for content when AI companies argue that scraping online data is fair use will be an uphill climb..."

No, it will be an uphill climb because "scraping" is just reading and certainly seems to be fair use.

Adversarial Noise (Score:2)

by Smidge204 ( 605297 )

Adversarial Noise. Poison AI learning through crafted content.

Yes, this will (has already?) become an arms race as AI developers figure out ways to avoid traps, but it can make operating LLMs and image/audio models less economical and less safe to operate.

=Smidge=
