Indexability
Indexability is whether a page that search engines have crawled can be analyzed, stored in the index (the search engine's database), and shown in search results. While crawlability asks "can search engines reach this page," indexability asks "does the page they reached qualify to be indexed."
Why It Matters
Being crawled does not guarantee being indexed. Google's own documentation states that indexing isn't guaranteed and that not every page Google processes will be indexed. A page that never makes it into the index cannot rank for anything, no matter how good its content is — indexability is a precondition for all search visibility.
The larger a site grows, the easier it is for pages to sit in a "Crawled - currently not indexed" state. Crawlability problems are relatively easy to spot because bots are blocked outright, but indexability problems are sneakier: the page loads normally for visitors and is simply invisible in search, so they often go unnoticed for months.
Crawlability vs. Indexability
| Aspect | Crawlability | Indexability |
|---|---|---|
| Question | Can search engines access and read the page? | Can the page be stored in the index and shown in results? |
| Blockers | robots.txt rules, server errors, broken links | noindex, canonical tags, 4xx/5xx status codes, low-quality content |
| Relationship | Prerequisite | Evaluated after a successful crawl |
The two concepts are sequential. A page that was never crawled is not even evaluated for indexing; only pages that were crawled successfully move on to the indexability stage.
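The ordering can be sketched in a few lines of Python. This is a minimal illustration, not how any search engine is implemented: the robots.txt content and the `example.com` URLs are made up, the helper names `is_crawlable` and `pipeline_stage` are hypothetical, and the standard library's `urllib.robotparser` follows the basic robots.txt rules, so its matching may differ from Google's in edge cases such as wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def pipeline_stage(robots_txt: str, user_agent: str, url: str) -> str:
    """Show that indexability is only evaluated once crawling is possible."""
    if not is_crawlable(robots_txt, user_agent, url):
        return "not crawled: indexability is never evaluated"
    return "crawlable: indexability is evaluated after a successful crawl"


if __name__ == "__main__":
    for path in ("/blog/post", "/private/page"):
        url = "https://example.com" + path
        print(url, "->", pipeline_stage(ROBOTS_TXT, "Googlebot", url))
```

The second stage is left as a label on purpose: whether a crawled page is actually indexed depends on the factors covered in the next section.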
What Determines Indexability
- Noindex directives: If a page carries a noindex meta tag or X-Robots-Tag header, search engines exclude it from the index.
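- Canonical signals: If the canonical URL points to a different page, the page is usually classified as an "alternate page" and left out of the index in favor of the canonical. Google clusters similar pages and indexes only one representative page per cluster. The canonical tag is a strong hint rather than a directive, so the canonical Google selects can differ from the one you declared.
- HTTP status codes: As a rule, only pages returning a 200 response are eligible for indexing. URLs that return 404/410 or 5xx errors are excluded, as are soft 404s (pages that return 200 but whose content reads as an error or empty page). Redirecting URLs are not indexed themselves; the redirect target is evaluated instead.
- The robots.txt relationship: A robots.txt block stops crawling, not indexing. Worse, crawlers cannot read a noindex tag on a page they are blocked from fetching, so the URL can still end up indexed through external links alone. To keep a page out of the index, allow crawling and use noindex.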
- Content quality: Even a technically indexable page may be skipped if its content is thin or duplicative — a common cause behind "Crawled - currently not indexed."
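
The technical signals in this list can be approximated with a small checker. The sketch below is an assumption-laden illustration, not Google's logic: `indexability_verdict`, `HeadSignals`, and the verdict labels are hypothetical names. It inspects only the status code, the `X-Robots-Tag` header, the robots meta tag, and the canonical link of a page you have already fetched. It cannot detect soft 404s or judge content quality, and it ignores user-agent-prefixed header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def _tokens(value):
    """Split a comma-separated directive string into lowercase tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


class HeadSignals(HTMLParser):
    """Collects robots meta directives and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots_tokens = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.robots_tokens.update(_tokens(attr.get("content", "")))
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            if self.canonical is None and attr.get("href", "").strip():
                self.canonical = attr["href"].strip()


def indexability_verdict(url, status, headers, html):
    """Return a rough label for why a fetched page may or may not be indexable."""
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return "http_error"

    # Header names are case-insensitive, so look the header up manually.
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    signals = HeadSignals()
    signals.feed(html)

    # "none" is shorthand for "noindex, nofollow".
    if (_tokens(header_value) | signals.robots_tokens) & {"noindex", "none"}:
        return "noindex"

    if signals.canonical:
        target = urljoin(url, signals.canonical)  # resolve relative hrefs
        if target.rstrip("/") != url.rstrip("/"):
            return "canonical_points_elsewhere"

    # Technically eligible; content quality is still judged by the search engine.
    return "indexable_candidate"
```

A quick usage example with made-up input: `indexability_verdict("https://example.com/a", 200, {}, '<meta name="robots" content="noindex">')` returns `"noindex"`, while the same call with a 404 status returns `"http_error"`.
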
How to Check
The Page Indexing report in Google Search Console groups every non-indexed page by reason — "Excluded by 'noindex' tag," "Alternate page with proper canonical tag," "Crawled - currently not indexed," and so on — which tells you whether each exclusion is intentional or a problem to fix. For individual URLs, the URL Inspection tool shows the indexing status and the canonical Google actually selected.
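To separate intentional exclusions from problems, one option is to compare the report against the URLs you meant to exclude. The sketch below assumes you have already turned the report into `(url, reason)` pairs by some means of your own; that row format, the sample URLs, and the helper names `group_by_reason` and `unexpected_noindex` are hypothetical. Only the reason strings come from the report labels quoted above.

```python
from collections import defaultdict

# Reason label as shown in the Page Indexing report.
NOINDEX_REASON = "Excluded by 'noindex' tag"


def group_by_reason(rows):
    """Group non-indexed URLs by the reason the report gives for them."""
    grouped = defaultdict(list)
    for url, reason in rows:
        grouped[reason].append(url)
    return dict(grouped)


def unexpected_noindex(rows, intended_noindex):
    """Return noindexed URLs that were NOT meant to be excluded from search."""
    return sorted(
        url
        for url, reason in rows
        if reason == NOINDEX_REASON and url not in intended_noindex
    )


if __name__ == "__main__":
    # Made-up sample rows for illustration.
    rows = [
        ("https://example.com/thank-you", NOINDEX_REASON),
        ("https://example.com/blog/launch", NOINDEX_REASON),
        ("https://example.com/blog/old", "Crawled - currently not indexed"),
    ]
    intended = {"https://example.com/thank-you"}
    print(group_by_reason(rows))
    print(unexpected_noindex(rows, intended))
```

Any URL returned by `unexpected_noindex` is a candidate for a closer look in the URL Inspection tool.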
Sources:
- In-depth guide to how Google Search works - Google Search Central
- Page Indexing report - Search Console Help
- Indexability: Make sure search engines can actually find and rank you - Search Engine Land
How inblog Helps
inblog handles the indexability fundamentals automatically for every published post: clean 200 responses, a canonical tag per post, and an auto-generated sitemap. Posts you want kept out of search can be excluded with the per-post noindex setting. After publishing, make a habit of checking the Page Indexing report in Search Console to confirm your posts actually made it into the index.