robots.txt unreachable: Why it happens and how to fix it

By Official

Key takeaways

  • "robots.txt unreachable": what Googlebot is seeing, common causes (timeouts, 403/5xx, WAF), and how to validate the fix in Search Console

What "robots.txt unreachable" means

Googlebot tried to fetch https://your-domain.com/robots.txt and could not reliably access it.

This matters because robots.txt is a gatekeeper file:

  • if Google cannot fetch it, crawling can become conservative
  • many systems treat the site as unstable until the file is reachable again

It is not a ranking factor by itself, but it can cause a cascade:

  • fewer pages crawled
  • slower re-crawls
  • more "crawl anomaly" style noise

The common root causes

1) robots.txt returns 403/401 to Googlebot

This is typically caused by WAF/CDN/security rules.

Symptoms:

  • you can load robots.txt in the browser
  • but Googlebot (or some IP ranges) gets blocked

2) robots.txt returns 5xx intermittently

Often:

  • origin is unstable
  • serverless cold starts
  • timeouts under load

3) Redirect chains on /robots.txt

/robots.txt should not bounce through multiple redirects.

Goal:

  • one stable URL
  • 200 OK

4) Rate limiting / bot protection

Some bot protection policies block everything that looks like a bot, including Googlebot.

The 10-minute checks

  1. Fetch robots.txt from a clean network (see the fetch sketch after this list).
  2. Check headers:
  • do you see caching headers?
  • is there a weird redirect?
  3. In GSC:
  • use robots testing tools (or URL Inspection on /robots.txt if available)
  4. Check logs (if you have them):
  • requests to /robots.txt
  • response codes over time
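
A quick way to run the first two checks together is to fetch the file twice, once with a normal browser user agent and once with a Googlebot user agent, and compare the responses. This is a minimal sketch using Python's requests library; the URL is a placeholder, and a spoofed user agent only approximates real Googlebot (IP-based blocks will not show up here).

```python
import requests

URL = "https://your-domain.com/robots.txt"  # placeholder: use your own domain

USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for label, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=10)
    print(f"--- {label} ---")
    print("status:       ", resp.status_code)
    print("final URL:    ", resp.url)
    print("redirects:    ", len(resp.history))
    print("content-type: ", resp.headers.get("Content-Type"))
    print("cache-control:", resp.headers.get("Cache-Control"))
```

If the browser UA gets a 200 and the Googlebot UA gets a 403, the problem is almost certainly a WAF/bot-protection rule; if both fail intermittently, look at origin stability instead.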

Fix checklist

Fix A: Make /robots.txt boring and static

The best robots.txt is:

  • served directly
  • consistent 200
  • cached safely

Avoid:

  • middleware that rewrites /robots.txt
  • geo redirects
  • auth
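
As an illustration of "boring and static" when your application has to own the route at all, here is a rough sketch assuming a Python/Flask app (which may not match your stack); serving a plain static file at the web server or CDN layer is even better.

```python
from flask import Flask, Response

app = Flask(__name__)

# Constant body: no templates, no geo logic, no auth, no redirects.
ROBOTS_TXT = """User-agent: *
Disallow:

Sitemap: https://your-domain.com/sitemap.xml
"""

@app.route("/robots.txt")
def robots():
    # Always 200, always text/plain, safe to cache at the edge.
    return Response(
        ROBOTS_TXT,
        mimetype="text/plain",
        headers={"Cache-Control": "public, max-age=3600"},
    )
```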

Fix B: Whitelist Googlebot (carefully)

If WAF rules block bots, you often need allow rules for:

  • verified Googlebot IPs (or your WAF's built-in Googlebot verification)

User-agent checks alone are not enough; the user-agent string is trivial to spoof (see the verification sketch below).
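
The standard way to verify a claimed Googlebot request is a reverse DNS lookup on the client IP followed by a forward lookup that must resolve back to the same IP. A minimal sketch of that check using only the Python standard library (in production you would cache results and prefer Google's published IP ranges or your WAF's built-in verification):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse DNS must end in googlebot.com or google.com,
    and the forward lookup must map back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
        return ip in forward_ips
    except OSError:  # unresolvable host (herror/gaierror)
        return False

# Example: an IP from a known Googlebot range should normally pass.
print(is_verified_googlebot("66.249.66.1"))
```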

Fix C: Remove redirect chains

If /robots.txt redirects:

  • collapse to one hop
  • ideally serve directly
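
To see how many hops /robots.txt takes before settling, inspect the redirect history of a single fetch. A small sketch (placeholder URL, requests library):

```python
import requests

resp = requests.get("https://your-domain.com/robots.txt", timeout=10)

for hop in resp.history:                      # empty when served directly
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final:", resp.status_code, resp.url)
print("hops:", len(resp.history))             # aim for 0, tolerate at most 1
```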

Fix D: Fix origin instability

If it is a 5xx/timeout issue, stabilize the origin first.

Validation

You want to see:

  • robots.txt fetch succeeds consistently (200)
  • GSC stops reporting it as unreachable (after a delay)

Expect lag:

  • GSC is not real-time; give it 3-14 days.
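
While you wait for GSC to catch up, you can confirm the fix on your side by polling the file and logging the results. A rough monitoring sketch (placeholder URL; run it from a network other than your office/VPN if you can):

```python
import datetime
import time

import requests

URL = "https://your-domain.com/robots.txt"  # placeholder

# A healthy fix shows an unbroken run of 200s with zero redirects.
while True:
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    try:
        resp = requests.get(URL, timeout=10)
        print(ts, resp.status_code, f"redirects={len(resp.history)}")
    except requests.RequestException as exc:
        print(ts, "FETCH FAILED:", exc)
    time.sleep(300)  # every 5 minutes
```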

FAQ

Will this prevent indexing entirely?

Not necessarily, but it often slows crawling and makes Google conservative until robots.txt becomes reachable again.

Should I block more stuff in robots.txt to "save crawl budget"?

Be careful. Blocking crawling can prevent Google from seeing noindex and canonicals. For content sites, cleaner URL hygiene and sitemaps usually do more than aggressive robots blocks.

How Google behaves when robots.txt is unreachable

Google does not want to crawl blindly if it cannot verify your robots policy.

In practice you often see one of these patterns:

  • crawling slows down until robots.txt is reachable again
  • Google retries robots.txt periodically (which can show up as recurring errors)
  • indexing of new URLs becomes more conservative because the site looks unstable

This is why "robots.txt unreachable" is less about the single file and more about site reliability.

Practical fixes by setup (common patterns)

If you use a CDN/WAF

Common mistake: bot protection blocks /robots.txt.

Fix:

  • ensure /robots.txt is always allowed
  • do not challenge it with JS/captcha
  • avoid country rules on this path

If you deploy frequently (serverless)

Sometimes the file is unreachable only during deploy windows.

Fix:

  • keep robots.txt static
  • cache it at the edge/CDN
  • avoid runtime logic on the route

A safe robots.txt baseline

If you are not sure, start with something boring like:

```
User-agent: *
Disallow:

Sitemap: https://your-domain.com/sitemap.xml
```

Then add disallows only when you are confident the path should never be crawled.

Common mistakes

  • redirecting /robots.txt multiple times
  • blocking /robots.txt in WAF rules
  • returning 200 with HTML instead of plain text
  • serving different robots rules based on cookies/geo
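
The "200 with HTML" mistake is easy to catch with a quick response check: the file should come back as plain text and the body should start with robots directives, not markup. A small sketch (placeholder URL):

```python
import requests

resp = requests.get("https://your-domain.com/robots.txt", timeout=10)
body = resp.text.lstrip()

content_type = resp.headers.get("Content-Type", "")
print("content-type ok:   ", content_type.startswith("text/plain"))
print("looks like HTML:   ", body[:1] == "<")   # an HTML error page or SPA shell
print("starts as expected:",
      body.lower().startswith(("user-agent:", "sitemap:", "#")))
```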
