AI web crawlers are destroying websites in their never-ending hunger for any and all content
- Reference: 1756489507
- News link: https://www.theregister.co.uk/2025/08/29/ai_web_crawlers_are_destroying/
- Source link:
Cloud services company [3]Fastly agrees. It reports that [4]80% of all AI bot traffic comes from AI data fetcher bots. So, you ask, "What's the problem? Haven't web crawlers been around since the arrival of the [5]World Wide Web Wanderer in 1993?" Well, yes, they have. Anyone who runs a website, though, knows there's a huge, honking difference between the old-style crawlers and today's AI crawlers. The new ones are site killers.
Fastly warns that they're causing "performance degradation, service disruption, and increased operational costs." Why? Because they're hammering websites with traffic spikes that can reach up to ten or even twenty times normal levels within minutes.
Moreover, AI crawlers are much more aggressive than standard crawlers. As the [7]InMotion Hosting web hosting company notes, they tend to [8]disregard crawl delays and bandwidth-saving guidelines, extract full-page text, and sometimes attempt to follow dynamic links or scripts.
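For context, those crawl delays and bandwidth-saving hints live in a site's robots.txt file. Here's a minimal, illustrative sketch (the paths are made up, and Crawl-delay is a non-standard directive that some crawlers, Googlebot among them, ignore even at the best of times):

    User-agent: *
    Crawl-delay: 10        # please wait 10 seconds between requests
    Disallow: /search/     # keep bots off expensive, dynamically generated pages
    Disallow: /cgi-bin/

Aggressive AI fetchers are reported to skip exactly these hints, which is how a small site ends up absorbing burst traffic it never agreed to.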
The result? If you're using a shared server for your website, as many small businesses do, even if your site isn't being shaken down for content, other sites on the same hardware with the same Internet pipe may be getting hit. This means your site's performance drops through the floor even if an AI crawler isn't raiding your website.
Smaller sites, like my own [11]Practical Tech, get slammed to the point where they're simply knocked out of service. Thanks to [12]Cloudflare Distributed Denial of Service (DDoS) protection, my microsite can shrug off DDoS attacks. AI bot attacks – and let's face it, they are attacks – not so much.
Even large websites are feeling the crush. To handle the load, they must increase their processor, memory, and network resources. If they don't? Well, according to most web hosting companies, if a website takes longer than three seconds to load, more than half of visitors will abandon the site. Bounce rates jump up for every second beyond that threshold.
So when AI searchbots – with Meta (52% of AI searchbot traffic), Google (23%), and OpenAI (20%) leading the way – clobber websites with as much as 30 terabits in a single surge, they're damaging even the largest companies' site performance.
Now, if that were traffic I could monetize, it would be one thing. It's not. It used to be that when the search-indexing crawler Googlebot came calling, I could always hope that some story on my site would land on the magical first page of someone's search results. They'd visit me, read the story, and two or three times out of a hundred visits, they'd click on an ad, and I'd get a few pennies of income. Or, if I had a business site, I might sell a widget or get someone to do business with me.
AI searchbots? Not so much. AI crawlers don't direct users back to the original sources. They kick our sites around, return nothing, and we're left trying to decide how we're to make a living in the AI-driven web world.
Yes, of course, we can try to fend them off with logins, paywalls, CAPTCHA challenges, and sophisticated anti-bot technologies. You know one thing AI is good at? It's getting around those walls.
As for robots.txt files, the old-school way of blocking crawlers? Many – most? – AI crawlers simply ignore them.
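For what it's worth, the blocking itself is trivial to write: list each AI crawler's published user agent and disallow it outright. A sketch along those lines (GPTBot, CCBot, and PerplexityBot are user-agent names the respective vendors document, but check each company's current crawler documentation before relying on the exact strings):

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

The catch, as noted above, is that this only binds crawlers that choose to honor it.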
For example, [15]Perplexity has been accused by Cloudflare of ignoring robots.txt files. [16]Perplexity, in turn, hotly denies this accusation. Me? All I know is I see regular waves of multiple companies' AI bots raiding my site.
There are efforts afoot to supplement robots.txt with [17]llms.txt files. This is a proposed standard to provide LLM-friendly content that LLMs can access without compromising the site's performance. Not everyone is thrilled with this approach, though, and it may yet come to nothing.
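As I read the proposal at llmstxt.org, an llms.txt is simply a Markdown file at the site root that hands models a curated, lightweight map of your content instead of letting them trawl every page. A rough sketch, with a placeholder site and made-up links:

    # Practical Tech

    > One-line summary of what the site covers, written for machines in a hurry.

    ## Articles

    - [AI crawlers and small sites](https://example.com/ai-crawlers.md): what bot surges do to shared hosting
    - [About](https://example.com/about.md): who writes this site and why

    ## Optional

    - [Full archive](https://example.com/archive.md): everything else, if a model really needs it

Whether AI companies would fetch this instead of hammering the whole site is, of course, the open question.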
In the meantime, to combat excessive crawling, some infrastructure providers, such as [18]Cloudflare, now offer default bot-blocking services to block AI crawlers and provide mechanisms to deter AI companies from accessing their data. Other programs, such as the popular, free, open-source [19]Anubis AI crawler blocker, simply attempt to slow their visits down to, if you'll pardon the expression, a crawl.
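For admins who run their own web server, the do-it-yourself cousin of those services is a plain user-agent filter at the edge. A minimal nginx sketch, offered only as an illustration (this is not how Cloudflare's service or Anubis works internally, and the user-agent strings are the same easily spoofed ones mentioned above):

    # /etc/nginx/conf.d/ai-bots.conf  (illustrative; included from the http context)
    map $http_user_agent $is_ai_bot {
        default          0;
        ~*GPTBot         1;
        ~*CCBot          1;
        ~*PerplexityBot  1;
    }

    server {
        listen 80;
        server_name example.com;

        # Refuse flagged bots before any content is served
        if ($is_ai_bot) {
            return 403;
        }
    }

The obvious weakness is that a crawler that lies about its user agent sails straight through, which is why Cloudflare leans on behavioral detection and Anubis makes every client burn a little compute on a proof-of-work challenge instead.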
In the arms race between businesses running websites and AI companies, the two sides will eventually reach some kind of neutrality. Unfortunately, the web will be more fragmented than ever. Sites will further restrict or monetize access. Important, accurate information will end up siloed behind walls or removed altogether.
Remember the open web? I do. I can see our kids on an Internet where you must pay cash money to access almost anything. I don't think anyone wants a Balkanized Internet, but I fear that's exactly where we're going.
[1] https://www.cloudflare.com/
[2] https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/
[3] https://www.fastly.com/
[4] https://www.theregister.com/2025/08/21/ai_crawler_traffic/
[5] https://en.wikipedia.org/wiki/World_Wide_Web_Wanderer
[7] https://www.inmotionhosting.com/
[8] https://www.inmotionhosting.com/blog/ai-crawlers-slowing-down-your-website/
[11] https://practical-tech.com/
[12] https://www.cloudflare.com/ddos/
[15] https://www.theregister.com/2025/08/04/perplexity_ai_crawlers_accused_data_raids/
[16] https://www.zdnet.com/article/perplexity-says-cloudflares-accusations-of-stealth-ai-scraping-are-based-on-embarrassing-errors/
[17] https://llmstxt.org/
[18] https://www.theregister.com/2025/07/01/cloudflare_creates_ai_crawler_toll/
[19] https://anubis.techaro.lol/
AI, what is it good for? Absolutely nothin'!
Like any public space, anyone can come in. If they misbehave, they'll be asked to leave. If they persist, police and judges can get involved. Banning from all public spaces can be ordered by a court.
Visiting a public website should be treated in the same way.
Killing the golden goose
While eating the seed corn.
Tech douche bros gonna tech douche bro.
1. Remember when some sites only worked with Flash on the client? And every site worked differently? Bring that back; it will load down the leeches.
2. Be IPv6-only; Big (US) Tech hasn't discovered that yet. At least my matchbox server doesn't see any scrapers.
Capitalism at its finest
Just creating more business. Gotta keep the wheels of industry turning and all that.
A monetized internet might be the only way to solve this
I remember seeing suggestions many many years ago that we could eliminate spam if it cost 5 cents to send an email - and you were paid that same 5 cents for receiving an email! Since most of us receive more emails than we send, we wouldn't complain about such an arrangement. Companies that send out marketing stuff that isn't considered traditional spam would have to think about how often they send, just like they had to think about how often they sent stuff through snail mail. But spammers could never afford those charges, it would totally break their economic model. Thus, end of spam. Yes we'd need to figure out how to handle stuff like mailing lists who are sending us stuff we want - maybe create some whitelists for senders we don't charge. I always thought that would work, providing we had a payment infrastructure set up to handle it - sort of a TCP/IP protocol for payments. Too bad blockchain wasn't invented in 1995, that might work well for this.
I wonder if we could do the same for the web? I wouldn't care if I had to pay five cents per page I visited, if there was a way to recoup that money similarly to how I'd recoup the cost of sending emails. But it would have to be a way that you and I could take advantage of to recoup, but AI could not. I can't think of a way to do that off the top of my head. It would also provide the infrastructure for micropayments for people generating useful content to have a way to get paid that isn't filling the page with ads that most of us block, or charging a subscription which isn't really tenable for following one-off links.
Parasites...
Some might be useful and become symbiotic – others are just parasites and will harm and kill the host. It will be interesting to see which other hosts AIs can infect when the current one is killed.
Dumb Stormtroopers...
Replaced my front pages with anti-AI articles and quotes a while back.