llms.txt
llms.txt is a proposed markdown file served at the root of a website — /llms.txt — that gives LLM-based tools a curated, condensed map of a site's most important content. Proposed by Jeremy Howard in 2024, it aims to do for AI what sitemap.xml did for search: make the best parts of your site discoverable and digestible at machine speed.
Why It Matters
LLMs reading the web face a context-window problem: a single marketing site can blow past 200K tokens of HTML, CSS, and navigation boilerplate before the model reaches the actual content. llms.txt addresses this by providing a short, curated list of the pages the site owner wants an LLM to read, written in clean markdown without cruft. Anthropic, Cloudflare, Mintlify, Zapier, and Stripe all published llms.txt files in 2024. For brands that want to be understood and cited correctly by AI, it's becoming the cheapest high-leverage move in generative engine optimization (GEO).
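To make the boilerplate-overhead argument concrete, here is a tiny sketch using the rough rule of thumb of ~4 characters per token (an approximation, not any tokenizer's real count). The HTML snippet is an invented miniature page, not taken from any real site:

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (a common rule of thumb)."""
    return len(text) // 4

# The same one-sentence fact wrapped in typical page markup,
# versus the clean markdown line an llms.txt would point to.
html_page = (
    '<html><head><script src="/bundle.js"></script>'
    '<link rel="stylesheet" href="/app.css"></head>'
    '<body><nav class="nav nav-top"><a href="/">Home</a></nav>'
    '<div class="hero container"><p>Built-in SEO optimization.</p></div>'
    "</body></html>"
)
markdown = "Built-in SEO optimization."

print(rough_tokens(html_page), rough_tokens(markdown))
```

Even in this toy case the markup costs several times more tokens than the content itself; on a real page with scripts, trackers, and navigation, the ratio is far worse.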
What It Looks Like
A basic file:
# inblog
> inblog is an AI-powered blogging platform for SEO-optimized content.
## Docs
- [Getting started](https://inblog.ai/docs/getting-started): Create your first blog
- [SEO features](https://inblog.ai/docs/seo): Built-in SEO optimization
- [AI drafting](https://inblog.ai/docs/ai-drafts): How AI drafts work
## Optional
- [Changelog](https://inblog.ai/changelog): Product updates
Two sections: a heading + summary, then curated links grouped by purpose. The Optional section lists content an LLM should read only if depth is needed.
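Because the format is just markdown with a fixed shape (H1 title, blockquote summary, H2 link sections), a consuming tool can parse it with a few lines of code. A minimal sketch; the helper name `parse_llms_txt` and the returned dict shape are illustrative choices, not part of the proposal:

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt string into a title, summary, and sections of (name, url, note) links."""
    title, summary = None, None
    sections: dict[str, list[tuple[str, str, str]]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:]
        elif line.startswith("> ") and summary is None:
            summary = line[2:]
        elif line.startswith("## "):
            current = line[3:]
            sections[current] = []
        elif line.startswith("- ") and current:
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line)
            if m:
                sections[current].append((m.group(1), m.group(2), m.group(3) or ""))
    return {"title": title, "summary": summary, "sections": sections}

sample = """# inblog
> inblog is an AI-powered blogging platform for SEO-optimized content.

## Docs
- [Getting started](https://inblog.ai/docs/getting-started): Create your first blog

## Optional
- [Changelog](https://inblog.ai/changelog): Product updates
"""

parsed = parse_llms_txt(sample)
```

An LLM tool can then decide to fetch everything under "Docs" eagerly and defer "Optional" unless the user asks for depth.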
llms.txt vs robots.txt vs sitemap.xml
| File | Audience | Purpose |
|---|---|---|
| robots.txt | Crawlers | What not to crawl |
| sitemap.xml | Search engines | Complete list of pages to index |
| llms.txt | LLM-based tools | Curated, prioritized content for ingestion |
robots.txt is a fence. sitemap.xml is a phone book. llms.txt is a curator's recommendation shelf. They're complementary, not replacements.
Two Variants
llms.txt: The short curated map — the table of contents.
llms-full.txt: An expanded version in which each linked page's markdown content is inlined, giving an LLM the entire ingestible corpus in one file. Used by documentation sites such as Anthropic's and those hosted by Mintlify.
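Generating llms-full.txt can be as simple as walking the link lines and splicing in each page's markdown. A sketch under stated assumptions: the `PAGE_MARKDOWN` dict stands in for pages you would actually fetch and convert, and its URLs and contents are invented for illustration:

```python
import re

# Hypothetical stand-in for fetched page content: URL -> that page's markdown.
PAGE_MARKDOWN = {
    "https://inblog.ai/docs/getting-started": "## Getting started\nSign up, then create a blog...",
    "https://inblog.ai/docs/seo": "## SEO features\nMeta tags, sitemaps, and schema...",
}

LINK_RE = re.compile(r"^- \[(.+?)\]\((.+?)\).*$", re.MULTILINE)

def build_llms_full(llms_txt: str, pages: dict[str, str]) -> str:
    """Expand an llms.txt string by inlining each linked page's markdown below its link."""
    def inline(match: re.Match) -> str:
        url = match.group(2)
        body = pages.get(url)
        # Keep the bare link line when we have no content for that URL.
        return f"{match.group(0)}\n\n{body}" if body else match.group(0)
    return LINK_RE.sub(inline, llms_txt)

llms_txt = (
    "# inblog\n"
    "> inblog is an AI-powered blogging platform.\n\n"
    "## Docs\n"
    "- [Getting started](https://inblog.ai/docs/getting-started): Create your first blog\n"
    "- [SEO features](https://inblog.ai/docs/seo): Built-in SEO optimization\n"
)

full = build_llms_full(llms_txt, PAGE_MARKDOWN)
```

In practice this step belongs in your site's build pipeline, so llms-full.txt regenerates whenever the underlying docs change.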
How to Author a Good llms.txt
1. Lead with a one-line positioning statement: The blockquote after the H1. This is what the LLM learns about your brand's identity.
2. Group by purpose, not structure: "Docs," "Guides," "API Reference," "Case Studies" — not "Category A," "Category B."
3. Write link descriptions as facts, not marketing: "Built-in SEO optimization" beats "Supercharge your content."
4. Put the most important pages first: LLMs under context pressure read top-to-bottom.
5. Use Optional for deep-cut content: Things the LLM should skip unless the user wants detail.
6. Update it when the site changes: An outdated llms.txt is worse than none.
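The structural rules above (H1 first, positioning blockquote second, H2 sections, Optional last) are easy to lint in CI so a stale or malformed file never ships. A minimal sketch; the function name and the exact problem messages are illustrative, not a published standard:

```python
def lint_llms_txt(text: str) -> list[str]:
    """Check an llms.txt string against basic authoring rules; return a list of problems."""
    problems = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title on the first line")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("missing one-line positioning blockquote after the H1")
    sections = [l[3:] for l in lines if l.startswith("## ")]
    if not sections:
        problems.append("no H2 link sections")
    if "Optional" in sections and sections[-1] != "Optional":
        problems.append("'Optional' should be the last section")
    return problems

good = (
    "# inblog\n"
    "> An AI-powered blogging platform.\n"
    "## Docs\n"
    "- [Getting started](https://inblog.ai/docs/getting-started)\n"
    "## Optional\n"
    "- [Changelog](https://inblog.ai/changelog)\n"
)
```

Running this on every deploy catches the most common failure mode: a file that drifted out of shape after a site restructure.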
Limitations
- Not yet a widely enforced standard: Google, OpenAI, and Anthropic have not committed to automatically reading it. Adoption is driven by LLM tools (Cursor, Perplexity, Claude's docs), not search engines.
- Not a ranking signal (yet): It affects LLM ingestion quality, not SERP position.
- Requires discipline: A stale llms.txt misleads the very models you're trying to reach.
- Can't fix bad content: If your docs are weak, llms.txt just surfaces them faster.
Why inblog Sites Should Consider It
Every blog on inblog is a content surface that AI tools might ingest. A small llms.txt at the blog root — pointing at pillar posts, glossary entries, and brand intro — tells LLMs exactly what to read when a user asks about your brand or topic. It's a direct lever on AI citation quality with minimal effort.