Content Signals
Content Signals is a policy standard that extends robots.txt so a website can declare, in machine-readable form, how it prefers its content to be used after crawling: for search, for real-time AI answers (ai-input), or for AI training (ai-train). Cloudflare announced it on September 24, 2025.
Why It Matters
Classic robots.txt only expresses who may access which paths; it says nothing about what happens to content after it is fetched. The problem is that a single crawler often serves multiple purposes. Google, for example, uses the same crawler for search indexing and AI features, so blocking that crawler outright leaves no way to stay visible in search while refusing AI training. Content Signals fills that gap by letting sites declare preferences per use case rather than per bot. It has become one pillar of the broader debate over how content owners regain control in the AI era.
The Three Signals and the Syntax
- search: Building a search index and serving search results. Does not include AI-generated summaries.
- ai-input: Feeding content into AI models to generate real-time answers (grounding, RAG, and similar uses).
- ai-train: Training or fine-tuning AI models.
Preferences are written inside robots.txt as comma-separated yes/no values:
User-Agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
Omitting a signal means no preference is expressed for that use. Cloudflare applied search=yes, ai-train=no as the default for the 3.8+ million domains using its managed robots.txt, deliberately leaving ai-input unset so each site owner decides.
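To illustrate how a consumer might read these preferences, here is a minimal Python sketch that parses one Content-Signal line into a dictionary. The function name parse_content_signal and the choice to silently skip unrecognized entries are assumptions made for this example, not part of the policy text; a production parser would also need to handle comments and User-Agent groups.

```python
# Hypothetical helper: the name and the lenient error handling are assumptions.
KNOWN_SIGNALS = ("search", "ai-input", "ai-train")


def parse_content_signal(line):
    """Parse one 'Content-Signal:' line into {signal: True/False}.

    Signals that are absent from the line are left out of the result,
    which mirrors "no preference expressed" for that use.
    """
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-signal":
        raise ValueError("not a Content-Signal line")

    prefs = {}
    for item in value.split(","):
        key, eq, val = item.partition("=")
        key, val = key.strip().lower(), val.strip().lower()
        if not eq or key not in KNOWN_SIGNALS or val not in ("yes", "no"):
            continue  # skip anything this sketch does not recognize
        prefs[key] = (val == "yes")
    return prefs


prefs = parse_content_signal("Content-Signal: search=yes, ai-train=no")
print(prefs)                  # search and ai-train are present
print(prefs.get("ai-input"))  # None: ai-input was omitted, so no preference
```

Returning None for an omitted signal, rather than defaulting it to yes or no, keeps the "unset" state distinct from an explicit answer.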
Limitations and the GEO Angle
Content Signals are a declaration of preference, not a technical countermeasure: crawlers that ignore them are not stopped. Still, an explicit machine-readable statement can carry weight in future disputes over content use, and Cloudflare designed it to be paired with enforcement tools like bot blocking and Pay Per Crawl. For GEO, the key tension is that ai-input cuts both ways: if your goal is to be cited in AI answers, setting ai-input to "no" asks AI systems not to use your pages when generating answers, which removes the chance of being cited by any system that honors the signal. That is why sites pursuing brand visibility typically keep search=yes, ai-input=yes and selectively refuse only ai-train.
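As a rough illustration of the configuration described above, the following Python sketch renders a set of preferences as a single Content-Signal line. The helper name format_content_signal and the use of None to mean "leave unset" are assumptions for this example; the resulting line still has to be placed in your robots.txt by hand or by your own tooling.

```python
# Hypothetical helper: name and None-means-unset convention are assumptions.
def format_content_signal(prefs):
    """Render a {signal: True/False/None} mapping as one Content-Signal line."""
    parts = []
    for name in ("search", "ai-input", "ai-train"):
        allowed = prefs.get(name)
        if allowed is None:
            continue  # unset: express no preference for this use
        parts.append(f"{name}={'yes' if allowed else 'no'}")
    if not parts:
        return ""  # nothing to declare
    return "Content-Signal: " + ", ".join(parts)


# Visibility-oriented setup: stay open to search and AI answers, refuse training.
geo_prefs = {"search": True, "ai-input": True, "ai-train": False}
print(format_content_signal(geo_prefs))
```

Passing {"search": True, "ai-train": False} instead leaves ai-input out of the line entirely, which is the same shape as the Cloudflare default described earlier.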
How inblog Helps
Most blogs published with inblog exist to be visible in both search and AI answers, so if you adopt Content Signals, keeping search and ai-input open is the natural configuration. inblog's built-in analytics show referral traffic from AI channels, letting you verify with data that allowing ai-input actually translates into visits, while Google Search Console integration keeps search visibility in the same view.