Key takeaways
- “Sitemap could not be read” means Google failed to fetch or parse your sitemap as a sitemap
- This guide explains the failure modes (HTTP, redirects, content-type, format, size), how to diagnose fast, and what changes actually remove the error
“Sitemap could not be read” is not an indexing verdict. It’s a transport + parsing failure.
It means: Google tried to fetch the sitemap URL you submitted, and it could not reliably interpret the response as a valid sitemap.
That usually reduces discovery (fewer URLs enter the pipeline), which then shows up later as:
- “Discovered — currently not indexed”
- “Crawled — currently not indexed”
What it means (plain English)
Google expected a sitemap file. It got something else:
- an error response
- a login / blocked response
- HTML instead of XML
- a redirect chain that breaks, loops, or changes content
- a file that is too large or malformed
So it can’t trust the sitemap as a stable input.
The 80/20 causes
1) The sitemap URL returns a non‑200 or unstable response
Common culprits:
- 403/401 (blocked, WAF, auth)
- 404 (wrong URL)
- 5xx (server issues)
- timeouts
If you see access or server errors elsewhere in GSC, fix those first.
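A minimal first check with curl, assuming a placeholder sitemap URL of https://example.com/sitemap.xml:

```bash
# Status code and total fetch time; a healthy sitemap returns 200 consistently
curl -s -o /dev/null -w '%{http_code} %{time_total}s\n' https://example.com/sitemap.xml

# WAFs sometimes block by user-agent; compare the default against a crawler-like UA
curl -s -o /dev/null -A 'Googlebot' -w '%{http_code}\n' https://example.com/sitemap.xml
```

Repeat the first command a few times: an intermittent 5xx or timeout is just as damaging as a permanent one.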
2) Redirects on the sitemap URL
Redirects aren’t automatically wrong, but they are a frequent source of “could not be read” because they create instability:
- redirect loops
- redirect chains
- different content per user-agent
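A minimal sketch of the redirect check, again against a placeholder URL:

```bash
# Follow redirects and report the hop count, final destination, and final status
curl -s -o /dev/null -L \
  -w 'hops: %{num_redirects}\nfinal: %{url_effective}\nstatus: %{http_code}\n' \
  https://example.com/sitemap.xml
```

Ideally: zero hops. If you must redirect (e.g. http to https), keep it to a single 301 that lands directly on the sitemap file.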
3) Content-type / format mismatch (HTML, not XML)
The most common “silent” failure: the sitemap URL serves a normal webpage (HTML), not a sitemap.
Typical reasons:
- wrong route (points to /sitemappage, not /sitemap.xml)
- proxy/cache misconfig serving an HTML fallback
- framework route not matching in production
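To catch the HTML-instead-of-XML case, inspect both the header and the first bytes of the body (placeholder URL again):

```bash
# The Content-Type header is a hint; the body is what actually gets parsed
curl -sI https://example.com/sitemap.xml | grep -i '^content-type'

# The first bytes should be an XML declaration or <urlset>, not <!DOCTYPE html>
curl -s https://example.com/sitemap.xml | head -c 200
```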
4) The sitemap is malformed XML (or wrong sitemap syntax)
Google can be tolerant, but not infinitely. Common mistakes:
- invalid XML
- wrong encoding
- invalid <loc> URLs
- invalid date formats in <lastmod>
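xmllint (part of libxml2, preinstalled on most systems) gives a fast well-formedness check; this sketch assumes the same placeholder URL:

```bash
# Exit code 0 means well-formed XML; otherwise the offending line is printed
curl -s https://example.com/sitemap.xml | xmllint --noout -
```

Well-formed is necessary but not sufficient: also make sure <loc> values are absolute URLs and <lastmod> uses W3C datetime format (e.g. 2026-01-15).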
5) Size and compression issues
Hard limits matter because “could not be read” often happens when you exceed them (the sitemap protocol caps a single file at 50,000 URLs and 50 MB uncompressed) or produce a file that’s heavy to fetch:
- too many URLs in one sitemap
- too large uncompressed
- broken gzip
If you have many URLs, split them into chunked sitemaps referenced by a sitemap index, as sketched below.
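A minimal sitemap index, with placeholder child sitemap URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemaps/pages-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemaps/pages-2.xml</loc></sitemap>
</sitemapindex>
```

And two quick checks for the compression and size failure modes:

```bash
# Verify the gzip archive is intact and decompresses to valid XML
gzip -t sitemap.xml.gz && echo 'gzip OK'
curl -s https://example.com/sitemap.xml.gz | gunzip | xmllint --noout -

# Rough URL count against the 50,000-per-file limit
curl -s https://example.com/sitemap.xml | grep -o '<loc>' | wc -l
```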
How to diagnose fast (without guessing)
- Open the submitted sitemap URL in a browser:
- It should look like XML, not a webpage.
- The file should contain <urlset> or <sitemapindex>.
- Fetch it with a simple HTTP client (curl / PowerShell) and check:
- status code is 200
- redirects are minimal (ideally none)
- response is stable on repeat requests
- Validate the XML quickly:
- if the file isn’t valid XML, fix generation first
- Confirm the URLs inside are canonical, indexable representations:
- avoid stuffing parameter URLs, duplicate paths, or URLs that redirect
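If you want all of those checks in one pass, here is a minimal sketch (SITEMAP is a placeholder; point it at your submitted URL):

```bash
#!/usr/bin/env bash
# One-shot sitemap health check: status, redirects, content-type, XML, root element
SITEMAP="https://example.com/sitemap.xml"

curl -s -o /dev/null -L "$SITEMAP" \
  -w 'status: %{http_code}\nredirect hops: %{num_redirects}\ncontent-type: %{content_type}\nfinal url: %{url_effective}\n'

body=$(curl -sL "$SITEMAP")

# Well-formed XML?
echo "$body" | xmllint --noout - && echo 'xml: well-formed'

# Expected root element?
echo "$body" | grep -qE '<(urlset|sitemapindex)' && echo 'root element: ok'
```

Run it a few times; the output should be identical on every run. Instability is itself a failure mode here.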
If you’re treating sitemaps as a “crawl budget lever”, recalibrate first: a sitemap is a discovery input, not a crawl-priority control.
What to do next (the honest version)
Fix the sitemap so it becomes a stable, parseable input. Then measure what changes in the pipeline:
- discovery and crawl cadence
- the ratio of discovered vs indexed
- whether “discovered not indexed” drops over time
If your pages are indexed but still not visible, that’s a different problem (selection, not storage).