While Meta Crawls the Web for AI Training Data, Bruce Ediger Pranks Them with Endless Bad Data (bruceediger.com)
(Sunday November 16, 2025 @11:34AM (EditorDavid)
from the unfriending dept.)
[1]From the personal blog of interface expert Bruce Ediger:
> Early in March 2025, I noticed that a web crawler with a user agent string of
>
> meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
>
> was hitting my blog's machine at an unreasonable rate.
>
> I followed the URL and discovered this is what Meta uses to gather premium, human-generated content to train its LLMs. I found the rate of requests to be annoying.
>
> I already have a PHP program that creates the illusion of an [2]infinite website. I decided to answer any HTTP request that had "meta-externalagent" in its user agent string with the contents of a bork.php generated file...
>
> This worked brilliantly. Meta ramped up to requesting 270,000 URLs on May 30 and 31, 2025...
>
> After about 3 months, I got scared that Meta's insatiable consumption of Super Great Pages about condiments, underwear and circa 2010 C-List celebs would start costing me money. So I switched to giving "meta-externalagent" a 404 status code. I decided to see how long it would take one of the highest valued companies in the world to decide to go away.
>
> The answer is 5 months.
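
The two phases described in the quote (generated filler first, then a plain 404) can be sketched roughly as follows. This is a minimal illustration in Python, not Ediger's PHP: the word list, the `fake_page` and `respond` helpers, the `mode` switch, and the case-insensitive match are all assumptions made for the example. Only the "meta-externalagent" substring and the 404 status come from the post.

```python
import hashlib
import random

# Substring quoted in the post; matching it case-insensitively is an assumption.
BLOCKED_TOKEN = "meta-externalagent"

# Hypothetical vocabulary; the actual output of bork.php is not shown in the post.
WORDS = ["condiments", "underwear", "celebs", "mustard", "relish", "socks"]


def is_target(user_agent: str) -> bool:
    """Return True when the user agent string contains the crawler token."""
    return BLOCKED_TOKEN in user_agent.lower()


def fake_page(path: str, links: int = 5) -> str:
    """Build a filler page whose links lead to more filler pages.

    The random generator is seeded from the path, so the same URL always
    yields the same page, which gives the illusion of a stable site.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    title = " ".join(rng.choice(WORDS) for _ in range(3))
    hrefs = "".join(
        f'<a href="/{rng.choice(WORDS)}-{rng.randrange(10**6)}">more</a>\n'
        for _ in range(links)
    )
    return f"<html><head><title>{title}</title></head><body>\n{hrefs}</body></html>"


def respond(user_agent: str, path: str, mode: str = "bork") -> tuple[int, str]:
    """Pick a (status, body) pair for one request.

    mode="bork" serves generated filler to the crawler (first phase);
    any other mode returns 404 to it (second phase).
    """
    if not is_target(user_agent):
        return 200, "real page"  # placeholder for the site's normal content
    if mode == "bork":
        return 200, fake_page(path)
    return 404, "Not Found"
```

Because every generated page links to further generated paths, a crawler that follows links never runs out of URLs, and switching `mode` is the only change needed to move from the first phase to the second.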
[1] https://bruceediger.com/posts/goofing-on-meta/
[2] https://bruceediger.com/posts/anti-seo-infinite-website/