Sitemap
A Sitemap is a structured file that provides search engines with a list of URLs for pages, images, videos, and other content on a website. It serves as a "map" that helps search engine crawlers explore and index a site more efficiently.
Why It Matters
Search engines discover web pages by following links. However, for newly created pages, deep pages with insufficient internal links, or large-scale sites with hundreds of thousands of pages, crawlers may struggle to discover every page naturally. A sitemap directly informs search engines about these pages, improving crawling efficiency and preventing indexing omissions.
Sitemaps are particularly essential in the following scenarios:
- Large-scale sites with 500 or more pages
- New sites with very few external backlinks
- Sites with abundant rich media content such as images and videos
- News sites where content is frequently updated
Types
Sitemaps come in several types depending on their purpose:
XML Sitemap: The most basic and widely used format. It structures each page's URL and metadata using tags such as <url>, <loc>, and <lastmod>.
Image Sitemap: A format that specifically informs search engines about image content. Useful when you want to maximize image search visibility.
Video Sitemap: Includes metadata such as title, description, and duration of video content to help Google better understand your videos.
News Sitemap: A specialized format for news publishers that should only include articles published within the last 2 days.
Sitemap Index: Because a single sitemap file is limited to 50,000 URLs or 50MB (uncompressed), larger sites split their URLs across multiple sitemaps and list them all in one index file.
HTML Sitemap: A sitemap designed for users rather than search engines. It is a page that aggregates links to key pages on the site, improving navigation convenience.
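As a rough sketch of how a sitemap index relates to its child files, the Python below splits a URL list into chunks of at most 50,000 and emits a <sitemapindex> that points to one child sitemap per chunk. It uses only the standard library. The helper names and the child file naming scheme (sitemap-1.xml, sitemap-2.xml, and so on) are hypothetical, and the sketch does not check the 50MB size limit.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, chunk_count):
    """Return a <sitemapindex> document listing one child sitemap per chunk.

    The child file names (sitemap-1.xml, sitemap-2.xml, ...) are a
    hypothetical convention, not something the protocol requires.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap-{n}.xml"
    return ET.tostring(index, encoding="unicode")
```

For 120,000 URLs, `chunk_urls` yields three chunks (50,000, 50,000, and 20,000 URLs), and the index lists three child sitemaps; each chunk would then be written to its own file under the matching name.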
Setup Guide
Step 1 — Generate the Sitemap
There are three ways to generate a sitemap. First, use the built-in features or plugins of your CMS or framework (e.g., Yoast SEO for WordPress). Second, generate one automatically with a crawling tool such as Screaming Frog. Third, write the XML file by hand, which is practical only for small sites.
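For the third method, a small script is often less error-prone than typing XML by hand. The sketch below, assuming your page list is available as (URL, lastmod) pairs, builds a minimal <urlset> with the standard library; `build_urlset` is a hypothetical helper name. ElementTree escapes characters such as `&` to `&amp;`, which the sitemap format requires.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a minimal XML sitemap from (loc, lastmod) pairs.

    `lastmod` may be None, in which case the <lastmod> tag is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped
        if lastmod is not None:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration; the file itself must be saved as UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Writing the result with `open("sitemap.xml", "w", encoding="utf-8")` keeps the saved file in UTF-8, matching the encoding declared in the first line.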
Step 2 — Follow Required Rules
- Keep each file to 50,000 URLs or fewer and 50MB (uncompressed) or less
- Use UTF-8 encoding
- Write URLs as absolute URLs (e.g., https://example.com/page)
- Include only canonical URLs. Exclude URLs that redirect or duplicate other pages
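The count, size, and absolute-URL rules above can be checked mechanically before deploying. A minimal sketch, assuming the URL list and the serialized XML are already in memory; `check_sitemap_rules` is a hypothetical name. It measures size as UTF-8 bytes of the uncompressed text (50MB is 52,428,800 bytes) and does not check canonical or redirect status, which would require fetching each page.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB = 52,428,800 bytes, uncompressed


def check_sitemap_rules(urls, xml_text):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    if len(urls) > MAX_URLS:
        problems.append(f"too many URLs: {len(urls)} > {MAX_URLS}")
    size = len(xml_text.encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"file too large: {size} bytes > {MAX_BYTES}")
    for url in urls:
        parts = urlsplit(url)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
    return problems
```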
Step 3 — Deploy and Submit
Place the sitemap file in the site's root directory (e.g., https://example.com/sitemap.xml). Add Sitemap: https://example.com/sitemap.xml to your robots.txt file, and submit the URL through Google Search Console's "Sitemaps" menu.
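To confirm that robots.txt actually advertises the sitemap, Python's standard `urllib.robotparser.RobotFileParser` provides `site_maps()` (Python 3.8+), which returns the URLs from any `Sitemap:` lines, or `None` if there are none. The sketch below parses a robots.txt body you already have as a string; `declared_sitemaps` is a hypothetical helper. It only inspects the file and does not submit anything to Search Console.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []
```

Because the `Sitemap:` directive is independent of `User-agent` groups, it is picked up wherever it appears in the file.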
Step 4 — Set Up Automatic Updates
Configure the sitemap to update automatically whenever content is added, modified, or deleted. Keep the <lastmod> tag accurate, reflecting when each page's content last changed, so search engines can use it to prioritize re-crawling changed pages. Google, for example, relies on <lastmod> only when it is consistently accurate.
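One way to keep <lastmod> accurate is to derive it from the timestamp your CMS records when a post changes, and to rebuild the entries from the current set of posts on every publish, edit, or delete. In this sketch the post records and their `url`/`updated_at` keys are hypothetical; the datetimes must be timezone-aware so the W3C Datetime string carries an offset.

```python
from datetime import datetime, timezone


def to_w3c_datetime(dt):
    """Format a timezone-aware datetime as W3C Datetime with second precision."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: W3C Datetime needs a time zone offset")
    return dt.isoformat(timespec="seconds")


def sitemap_entries(posts):
    """Rebuild (loc, lastmod) pairs from the current posts.

    Hypothetical record shape: {"url": str, "updated_at": aware datetime}.
    Deleted posts are simply absent from `posts`, so they drop out of the
    sitemap the next time this runs (e.g. from a publish/edit/delete hook).
    """
    return [(p["url"], to_w3c_datetime(p["updated_at"])) for p in posts]
```

For a post updated at `datetime(2026, 3, 17, 0, 0, tzinfo=timezone.utc)`, the generated lastmod is `2026-03-17T00:00:00+00:00`.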
Common Mistakes
Including noindex pages in the sitemap: Adding pages with a noindex tag or pages blocked by robots.txt to the sitemap sends conflicting signals to search engines. Only include pages you want indexed in your sitemap.
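A sitemap generator can filter out both kinds of conflicting pages before writing the file. The sketch below assumes each page record carries a hypothetical `noindex` flag from your CMS and uses the standard `RobotFileParser.can_fetch` for the robots.txt check. Note that Python's parser matches rules by path prefix and does not implement Google's `*` and `$` wildcards, so treat its answer as an approximation.

```python
from urllib.robotparser import RobotFileParser


def indexable_urls(pages, robots_txt, user_agent="Googlebot"):
    """Keep only pages that are neither noindex nor blocked by robots.txt.

    `pages` is assumed to be a list of dicts with hypothetical
    'url' and 'noindex' keys.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        p["url"]
        for p in pages
        if not p["noindex"] and parser.can_fetch(user_agent, p["url"])
    ]
```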
Including broken links (404s): If URLs for deleted pages remain in the sitemap, Google Search Console will report "Submitted URL not found (404)" errors. Regularly audit your sitemap and remove invalid URLs.
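A periodic audit can request each sitemap URL and flag the ones that no longer exist. A minimal sketch using the standard library: `http_status` sends a HEAD request (urllib follows redirects by default), and `find_broken_urls` takes the status lookup as a parameter so it can be tested without a network. Both names are hypothetical. Treating 404 and 410 (Gone) as removable is an assumption; servers that reject HEAD with 405 would need a GET fallback, and network errors such as `URLError` are left to propagate.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Send a HEAD request and return the final HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def find_broken_urls(urls, get_status=http_status):
    """Return the URLs whose status is 404 or 410 (candidates for removal)."""
    return [url for url in urls if get_status(url) in (404, 410)]
```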
Date format errors: According to SEMrush research, approximately 62% of XML sitemap errors stem from date format issues. <lastmod> must follow the W3C Datetime format (e.g., 2026-03-17 or 2026-03-17T09:00:00+09:00).
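A validator for <lastmod> values can catch these errors before the sitemap is published. The regular expression below covers the W3C Datetime profiles (YYYY, YYYY-MM, YYYY-MM-DD, and a date followed by hh:mm, hh:mm:ss, or fractional seconds), where any time part must end in `Z` or a `+hh:mm`/`-hh:mm` offset. A second step rejects impossible calendar dates such as February 30. `is_valid_lastmod` is a hypothetical name, and the hour and minute values are not range-checked.

```python
import re
from datetime import date

W3C_DATETIME = re.compile(
    r"\d{4}"                              # YYYY
    r"(-\d{2}"                            # -MM
    r"(-\d{2}"                            # -DD
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"     # Thh:mm[:ss[.s]]
    r"(Z|[+-]\d{2}:\d{2}))?"              # time zone designator, required with a time
    r")?)?"
)


def is_valid_lastmod(value):
    """Return True if `value` is a well-formed W3C Datetime string."""
    if not W3C_DATETIME.fullmatch(value):
        return False
    if len(value) >= 10:
        try:
            date.fromisoformat(value[:10])  # reject e.g. 2026-02-30
        except ValueError:
            return False
    return True
```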
URL format inconsistency: Mixing https and http, or www and non-www, can cause search engines to treat the same page as separate entities. All URLs within the sitemap should use one consistent format.
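To spot mixed formats, compare each URL's scheme and host against the one origin you consider canonical. In this sketch `off_origin_urls` is a hypothetical helper and the canonical origin is passed in explicitly. Hosts are compared case-insensitively because host names are case-insensitive, while paths are left alone.

```python
from urllib.parse import urlsplit


def off_origin_urls(urls, canonical_origin):
    """Return URLs whose scheme or host differs from `canonical_origin`."""
    want = urlsplit(canonical_origin)
    target = (want.scheme, want.netloc.lower())
    return [
        url
        for url in urls
        if (urlsplit(url).scheme, urlsplit(url).netloc.lower()) != target
    ]
```

With `https://example.com` as the canonical origin, both `http://example.com/b` (wrong scheme) and `https://www.example.com/c` (www host) would be reported.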
Generating the sitemap but not submitting it: If you create a sitemap file but never submit it to Google Search Console or Bing Webmaster Tools, search engines may take much longer to discover it.
How inblog Helps
inblog dynamically generates XML sitemaps that automatically reflect post publishing and deletion.