Cloudflare turns websites into faster food for AI agents
- Reference: 1770941529
- News link: https://www.theregister.co.uk/2026/02/13/cloudflare_markdown_for_ai_crawlers/
- Source link:
Having previously devised a mechanism to make AI crawlers pay to consume website content, the content delivery network is now offering web publishers a way to make it cheaper for AI services to harvest site content by converting HTML to [1]Markdown, the minimalist markup language for representing text mixed with formatting characters in a way that retains legibility.
In a [2]blog post, Cloudflare engineering director Celso Martinho and VP Will Allen explain that AI crawlers and software agents, which constitute a growing portion of web traffic, find it easier to digest documents formatted in Markdown than traditional HTML web pages.
The reason is that HTML web pages often contain a lot of characters that describe formatting and identifiers unrelated to the semantic content, and chewing through all those tags and related markup has a computational cost.
"Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside," explain Martinho and Allen. "A simple ## About Us on a page in markdown costs roughly 3 tokens; its HTML equivalent – <h2 class="section-title" id="about">About Us</h2> – burns 12-15, and that's before you account for the <div> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value."
To make web content easier for AI crawlers to chew, Cloudflare's network can now [6]respond to crawler network requests in Markdown rather than HTML. To trigger this, an AI crawler lists text/markdown as one of the acceptable formats in the Accept header of its HTTP request, the standard mechanism for content negotiation.
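As a rough illustration of that negotiation, the Python sketch below builds, but does not send, a request that asks for Markdown first and falls back to HTML. The URL, the client name, and the q-value are placeholders chosen for the example, not values taken from Cloudflare's documentation.

```python
from urllib.request import Request


def build_markdown_request(url: str) -> Request:
    """Build (but do not send) a GET request that prefers Markdown over HTML."""
    return Request(
        url,
        headers={
            # text/markdown is listed first; HTML is a lower-priority fallback.
            "Accept": "text/markdown, text/html;q=0.8",
            # Hypothetical client name, used only for this example.
            "User-Agent": "example-agent/0.1",
        },
    )


# example.com is a placeholder; no network traffic happens here.
req = build_markdown_request("https://example.com/")
print(req.get_header("Accept"))
```

A server that does not support the feature would simply ignore the preference and return HTML, so a client would still need to check the Content-Type of whatever comes back.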
If a site publisher enables Markdown, Cloudflare's network will answer with a response body formatted in the language, plus an x-markdown-tokens header that includes the token count. That's potentially useful for calculating whether the incoming content will fit within the model's context window or whether it needs to be broken up into a series of smaller chunks.
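One way a client might act on that header is sketched below in Python. The function name, the reserved-token parameter, and the context window sizes are assumptions made for illustration; only the x-markdown-tokens header name comes from the article.

```python
import math
from typing import Mapping, Optional


def chunks_needed(
    headers: Mapping[str, str], context_window: int, reserved: int = 0
) -> Optional[int]:
    """Estimate how many chunks a Markdown response must be split into.

    Returns None when x-markdown-tokens is absent or not an integer.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        tokens = int(lowered.get("x-markdown-tokens"))
    except (TypeError, ValueError):
        return None
    # `reserved` is a hypothetical allowance for the prompt and the reply.
    budget = context_window - reserved
    if budget <= 0:
        raise ValueError("no room left in the context window")
    return max(1, math.ceil(tokens / budget))


# 3,150 is the Markdown token count the article quotes for the blog post.
print(chunks_needed({"X-Markdown-Tokens": "3150"}, context_window=2000))
```

With a hypothetical 2,000-token budget the page would need two chunks; with an 8,000-token budget it would fit in one.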
For a web page like the Cloudflare blog post, Markdown delivery reduces the number of tokens used from 16,180 in HTML to 3,150 in Markdown, a reduction of roughly 80 percent.
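The percentage follows directly from the two token counts quoted for the blog post. The short Python check below is only a worked calculation; the function name is made up for the example.

```python
def token_saving(html_tokens: int, markdown_tokens: int) -> float:
    """Return the fraction of tokens saved by serving Markdown instead of HTML."""
    return 1 - markdown_tokens / html_tokens


# Figures quoted for the Cloudflare blog post.
saving = token_saving(html_tokens=16_180, markdown_tokens=3_150)
print(f"{saving:.1%}")  # prints 80.5%
```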
The Markdown option, which is available for HTML but not other document formats like PDF, complements another recently deployed capability, the company's [12]Content Signals Policy.
Content Signals Policy is a framework for adding machine-readable instructions to a website's robots.txt file, an implementation of the [13]Robots Exclusion Protocol that allows publishers to communicate how they expect bots and crawlers to engage with their site. It exists to specify content usage preferences more precisely.
A site's Content Signals Policy is expressed in a robots.txt directive that declares three key-value pairs. For example:
User-Agent: *
Content-Signal: ai-train=no, search=yes, ai-input=no
Allow: /
The parameters specify whether content can be used for AI training, for search indexing, and for AI input (post-training uses like retrieval augmented generation or [14]model grounding).
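For a crawler that wants to honor those preferences, reading them out of robots.txt could look something like the Python sketch below. It is a simplified illustration rather than a full Robots Exclusion Protocol parser: the function name is invented, and it assumes each Content-Signal line is a comma-separated list of key=yes or key=no pairs, as in the example above.

```python
from typing import Dict, List


def parse_content_signals(robots_txt: str) -> Dict[str, Dict[str, bool]]:
    """Map each User-Agent in robots.txt to its Content-Signal preferences."""
    result: Dict[str, Dict[str, bool]] = {}
    agents: List[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-Agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            continue
        in_rules = True
        if field == "content-signal":
            for pair in value.split(","):
                key, sep, flag = pair.partition("=")
                if sep:  # ignore fragments without an "=" sign
                    for agent in agents:
                        signals = result.setdefault(agent, {})
                        signals[key.strip()] = flag.strip().lower() == "yes"
    return result


example = """User-Agent: *
Content-Signal: ai-train=no, search=yes, ai-input=no
Allow: /
"""
signals = parse_content_signals(example)
print(signals["*"])
```

Nothing in such a parser enforces anything: it only tells a well-behaved client what the publisher has asked for.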
As part of robots.txt, Content Signals Policy directives are voluntary; they do not represent technical protection measures.
According to Martinho and Allen, coding agents like Claude Code and OpenCode already ask for Markdown in their Accept headers. Web publishers can now cater to automated clients if they choose. ®
[1] https://en.wikipedia.org/w/index.php?title=Markdown&oldid=1335119712
[2] https://blog.cloudflare.com/markdown-for-agents/
[6] https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/
[12] https://contentsignals.org
[13] https://www.rfc-editor.org/rfc/rfc9309.html
[14] https://docs.cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview
Converting HTML to Markdown - I'm getting dizzy!
The purpose of [1]Markdown was to generate HTML from simple markup.
There are now scads of web pages that were written in Markdown, which was used to generate HTML for mass consumption. Of course, there are [2]browser plugins that will do the conversion of Markdown into HTML for you, and display the results, so that you can easily preview what you are writing (if you decide not to use any of the GUI programs that help you write using Markdown).
Now we have these - and many, many other - pages being converted (back) into Markdown (BUT so far I've not spotted which *flavour* of Markdown they are targeting; there are lots of variants out there - and attempts to standardise, but we all know [3]where that leads).
And what is the advantage of this conversion? Why, to reduce the amount of text sent back *and* to strip out all the noise inside the web page. Such as all the crap that forces you to read the page the way *they* want you to see it, not the way *you* want to (remember the Good Old Days, when HTML was all about content and presentation (fonts etc) was up to you, the reader?).
So, I'd like to encourage the use of this feature and take advantage of it as an unintended bonus[2] of all this LLM flummery:
If we can all use a browser that sends the request for Markdown[1] and then does the HTML generation and display for us, we can have a nicer, less noisy, more controllable web to browse! Yay!
Especially if sites that started with Markdown in the first place can be given first dibs at the request and just return the original, skipping two stages[3]. Getting the raw Mermaid (or similar plugins - chem and pic anyone?) that Markdown variants often support for diagrams/charts, instead of JPEGs[4], would be neat as well.
[1] there must be a plugin for that, I'll check when I'm up later today
[2] although the intended bonus is good as well - if you insist on using an LLM to read a web page, at least do something to cut down on the energy use by a bit of preprocessing and, given that is CloudFlare's job, hopefully caching to reduce repetition.
[3] yes, Markdown does include an escape to HTML as standard, so we'll still have to have CloudFlare check for that and do what it can to convert it
[4] why do websites still do that? JPEGs for photos, PNG - or even GIF! - for infographics, charts and diagrams.
[1] https://daringfireball.net/projects/markdown/
[2] https://addons.mozilla.org/en-US/firefox/addon/markdown-viewer-chrome/
[3] https://www.explainxkcd.com/wiki/index.php/927:_Standards
Thanks CloudFlare...
for making it even easier for AI bots to steal content from websites and leaving the publishers empty handed. Who thought of this?
Voluntary Code of Decency = No Decency
"As part of robots.txt, Content Signals Policy directives are voluntary."
If it's voluntary then it's not worth the paper it's not written on.