Sitemap
A Sitemap is a structured file that provides search engines with a list of URLs for pages, images, videos, and other content on a website. It serves as a "map" that helps search engine crawlers explore and index a site more efficiently.
Why It Matters
Search engines discover web pages by following links. However, for newly created pages, deep pages with insufficient internal links, or large-scale sites with hundreds of thousands of pages, crawlers may struggle to discover every page naturally. A sitemap directly informs search engines about these pages, improving crawling efficiency and preventing indexing omissions.
Sitemaps are particularly essential in the following scenarios:
- Large-scale sites with 500 or more pages
- New sites with very few external backlinks
- Sites with abundant rich media content such as images and videos
- News sites where content is frequently updated
A sitemap is a discovery and recrawl hint, not an indexing guarantee. Google may crawl a URL from a sitemap and still choose not to index it because of quality, duplication, canonical selection, noindex directives, or rendering issues.
Types
Sitemaps come in several types depending on their purpose:
XML Sitemap: The most basic and widely used format. It structures each page's URL and metadata using tags such as <url>, <loc>, and <lastmod>.
Image Sitemap: A format that specifically informs search engines about image content. Useful when you want to maximize image search visibility.
Video Sitemap: Includes metadata such as title, description, and duration of video content to help Google better understand your videos.
News Sitemap: A specialized format for news publishers that should only include articles published within the last 2 days.
Sitemap Index: When a site needs more than 50,000 URLs or more than 50MB (uncompressed) of sitemap data, the URLs are split across multiple sitemap files, and a single index file lists those files.
HTML Sitemap: A sitemap designed for users rather than search engines. It is a page that aggregates links to key pages on the site, improving navigation convenience.
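As an illustration of the XML format described above, here is a minimal Python sketch that builds a sitemap with the standard library. The function name build_sitemap and the example URLs are assumptions made for this example; a CMS or plugin will normally produce this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Only emit <lastmod> when a real modification date is known.
            ET.SubElement(url, "lastmod").text = lastmod
    # Returns UTF-8 encoded bytes, including the XML declaration.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://example.com/", "2026-03-17"),
        ("https://example.com/about", None),
    ])
    print(xml_bytes.decode("utf-8"))
```

ElementTree escapes special characters such as & in URLs automatically, which is one reason to prefer it over string concatenation.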
Setup Guide
Step 1 — Generate the Sitemap
There are three methods for generating a sitemap. First, use built-in CMS or framework features or plugins (e.g., Yoast SEO for WordPress). Second, auto-generate using crawling tools like Screaming Frog. Third, manually write the XML file — suitable for small-scale sites.
Step 2 — Follow Required Rules
- Keep each file to 50,000 URLs or fewer and 50MB or less (uncompressed)
- Use UTF-8 encoding
- Write URLs as absolute URLs (e.g., https://example.com/page)
- Include only canonical URLs. Exclude URLs that redirect or duplicate other pages
- Use <lastmod> only when the timestamp reflects a meaningful content change. Inaccurate timestamps can make search engines ignore the signal.
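The mechanical rules above (URL count, file size, encoding, absolute URLs) can be checked before deployment. The following is a rough sketch, not an official validator; the function name check_sitemap_rules is hypothetical, and it does not attempt to detect canonical or redirect problems, which require fetching the pages.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file


def check_sitemap_rules(urls, xml_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(urls) > MAX_URLS:
        problems.append(f"too many URLs: {len(urls)} > {MAX_URLS}")
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file too large: {len(xml_bytes)} bytes")
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    for url in urls:
        parts = urlsplit(url)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
    return problems
```

Running a check like this in the publishing pipeline catches relative paths such as /page before a search engine ever sees them.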
Step 3 — Deploy and Submit
Place the sitemap file in the site's root directory (e.g., https://example.com/sitemap.xml). Add Sitemap: https://example.com/sitemap.xml to your robots.txt file, and submit the URL through Google Search Console's "Sitemaps" menu.
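The robots.txt step can be automated. Below is a small sketch that appends the Sitemap line only if it is not already present; the function name ensure_sitemap_directive is an assumption for illustration, and reading and writing the actual robots.txt file is left to the caller.

```python
def ensure_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt content that contains a Sitemap line for sitemap_url."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        # The directive name is matched case-insensitively.
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # Already declared; leave the file unchanged.
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + f"Sitemap: {sitemap_url}\n"


if __name__ == "__main__":
    current = "User-agent: *\nDisallow: /admin/\n"
    print(ensure_sitemap_directive(current, "https://example.com/sitemap.xml"))
```

Because the function is idempotent, it can run on every deploy without producing duplicate lines.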
Step 4 — Set Up Automatic Updates
Configure the sitemap to update automatically whenever content is added, modified, or deleted. Use accurate modification timestamps in the <lastmod> tag to prompt search engines to prioritize re-crawling of changed pages.
For very large sites, split URLs into logical child sitemaps such as posts, categories, images, or videos and reference them from a sitemap index. Bing and other engines may also support IndexNow for faster discovery, but it should complement, not replace, a clean XML sitemap and internal links.
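One way to sketch the split into child sitemaps and an index file is shown below. The file naming scheme (sitemap-posts-1.xml and so on) and both function names are assumptions chosen for this example, not a required convention.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def plan_child_sitemaps(urls_by_group, max_urls=50_000):
    """Map child sitemap file names to URL lists, max_urls per file."""
    plan = {}
    for group, urls in urls_by_group.items():
        starts = range(0, len(urls), max_urls)
        for number, start in enumerate(starts, start=1):
            plan[f"sitemap-{group}-{number}.xml"] = urls[start:start + max_urls]
    return plan


def build_sitemap_index(base_url, filenames):
    """Build the sitemap index XML that references each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in filenames:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = base_url.rstrip("/") + "/" + name
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    groups = {
        "posts": [f"https://example.com/posts/{i}" for i in range(5)],
        "categories": ["https://example.com/c/a", "https://example.com/c/b"],
    }
    # max_urls=2 is only used here to make the split visible.
    plan = plan_child_sitemaps(groups, max_urls=2)
    print(build_sitemap_index("https://example.com", plan).decode("utf-8"))
```

Grouping by content type keeps Search Console reports readable, since indexing problems can be traced to a specific child sitemap.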
Common Mistakes
Including noindex pages in the sitemap: Adding pages with a noindex tag or pages blocked by robots.txt to the sitemap sends conflicting signals to search engines. Only include pages you want indexed in your sitemap.
Including broken links (404s): If URLs for deleted pages remain in the sitemap, Google Search Console will report "Submitted URL not found (404)" errors. Regularly audit your sitemap and remove invalid URLs.
Including redirected or non-canonical URLs: Sitemap URLs should be final canonical destinations. Submitting old URLs, tracking-parameter URLs, or alternate canonical versions wastes crawl budget and makes diagnostics noisy.
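A periodic audit for the two mistakes above (dead URLs and redirected URLs) could look roughly like the sketch below. The names http_status and audit_sitemap_urls are hypothetical, and the status lookup is passed in as a parameter so the grouping logic can be tested without network access. Some servers reject HEAD requests, so a real audit may need to fall back to GET.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_status(url, timeout=10):
    """Return the status code of a HEAD request without following redirects.

    Network failures (urllib.error.URLError) are not handled in this sketch.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def audit_sitemap_urls(urls, fetch_status=http_status):
    """Group sitemap URLs that do not answer with a plain 200."""
    report = {"not_found": [], "redirected": [], "other": []}
    for url in urls:
        status = fetch_status(url)
        if status == 200:
            continue
        if status in (404, 410):
            report["not_found"].append(url)
        elif 300 <= status < 400:
            report["redirected"].append(url)
        else:
            report["other"].append(url)
    return report
```

URLs in not_found should be removed from the sitemap, and URLs in redirected should be replaced with their final canonical destination.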
Date format errors: According to Semrush research, approximately 62% of XML sitemap errors stem from date format issues. <lastmod> must follow the W3C Datetime format (e.g., 2026-03-17 or 2026-03-17T09:00:00+09:00). When a time is included, a time zone designator (Z or an offset such as +09:00) is required.
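A <lastmod> value can be checked against the W3C Datetime format before the sitemap is published. The sketch below is one possible validator; the function name is_valid_lastmod is an assumption, and the range of the time zone offset is not checked.

```python
import re
from datetime import date

# W3C Datetime: YYYY, YYYY-MM, YYYY-MM-DD, or a date plus time with a
# mandatory time zone designator (Z or +hh:mm / -hh:mm).
_W3C_DATETIME = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2}))?)?)?"
)


def is_valid_lastmod(value):
    """Return True if value is a well-formed W3C Datetime string."""
    match = _W3C_DATETIME.fullmatch(value)
    if not match:
        return False
    year, month, day, hour, minute, second = (
        None if part is None else int(part) for part in match.groups()
    )
    try:
        # Rejects impossible calendar dates such as 2026-02-30.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return False
    if hour is not None:
        if hour > 23 or minute > 59 or (second is not None and second > 59):
            return False
    return True
```

For example, is_valid_lastmod("2026-03-17") and is_valid_lastmod("2026-03-17T09:00:00+09:00") return True, while "03/17/2026" and "2026-03-17 09:00:00" are rejected.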
URL format inconsistency: Mixing https and http, or www and non-www, can cause search engines to treat the same page as separate entities. All URLs within the sitemap should use one consistent format.
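Mixed schemes or hosts can be flagged automatically. The sketch below treats the most common scheme and host combination in the sitemap as the intended one and reports everything else; that majority heuristic and the function name find_inconsistent_urls are assumptions for this example. A site that knows its canonical origin should compare against it directly instead.

```python
from collections import Counter
from urllib.parse import urlsplit


def _origin(url):
    """Return (scheme, host) for a URL, with the host lowercased."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc.lower()


def find_inconsistent_urls(urls):
    """Return URLs whose scheme or host differs from the most common one."""
    counts = Counter(_origin(url) for url in urls)
    if not counts:
        return []
    expected, _ = counts.most_common(1)[0]
    return [url for url in urls if _origin(url) != expected]


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.com/about",
        "http://example.com/contact",
        "https://www.example.com/blog",
    ]
    for url in find_inconsistent_urls(urls):
        print("inconsistent:", url)
```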
Generating the sitemap but not submitting it: Even if you create a sitemap file, if you do not submit it to Google Search Console or Bing Webmaster Tools, it may take a significant amount of time for search engines to discover it.
Sources:
- What Is a Sitemap | Google Search Central
- XML Sitemap: What It Is And How To Generate One - Semrush
- How to Create an XML Sitemap (and Submit It to Google) - Ahrefs
- IndexNow Documentation - Bing Webmaster Tools
How inblog Helps
inblog dynamically generates XML sitemaps that automatically reflect post publishing and deletion.