Key takeaways
- If you changed your site topic (or rebuilt it twice), Google will keep crawling your past URLs and Search Console will keep reporting them
- Here is the practical playbook: when to 301, when to 410, what to ignore, and how to stop index bloat
If you changed your site direction (once or twice), Google will keep crawling your old URLs.
That is normal. Links and old sitemaps live longer than your strategy.
The goal is not to "fix every URL". The goal is to reduce crawl debt so Google spends less time on junk and more time indexing what matters now.
This post is the practical decision tree: 301 vs 410, plus the few mistakes that create months of GSC noise.
If you want the deeper model of "crawled, not indexed" as a priority decision, read this next:
If you are new here, the map is here:
Why pivots create a crawl problem (even if your new pages are good)
When a site changes direction, Google is stuck with a memory of the old web graph:
- old backlinks still point to old URLs
- scraped lists and mirrors keep the old URLs alive
- old RSS endpoints and archives get re-crawled
- bots keep probing WordPress-era paths forever
So your new posts are competing with your old footprint for attention.
This is why "crawled, not indexed" often becomes worse after a pivot: Google sees too many low-value URLs relative to the core you want indexed.
Common legacy URL patterns (real examples)
If your site used to be WordPress (or anything that looked like it), you usually see these in GSC:
- Archives: /category/..., /author/..., /articles/page/2/
- Feed variants: */feed/, /feed, /rss.xml, /atom.xml
- WP endpoints: /wp-admin, /wp-content, /wp-json, /xmlrpc.php
- Old sections that no longer exist: /projects, /processes, /terms
- Scanner junk: /admin/, random long slugs, weird query strings
- Duplicate variants: www vs apex, and legacy params like ?m=1
Your job is not to "save" these URLs. Your job is to make them cheap for Google to discard.
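If you want to triage a GSC export in bulk instead of eyeballing it, a minimal sketch (plain TypeScript; the patterns are examples from this post, so adjust them to your own site's history) looks like this:

```ts
// Sketch: match exported GSC paths against the legacy patterns listed above.
// These regexes are examples only; adapt them to your own old URL structure.
const LEGACY_PATTERNS: RegExp[] = [
  /^\/(category|author)\//,                       // archive pages
  /\/page\/\d+\/?$/,                              // paginated archives
  /\/feed\/?$/,                                   // feed variants
  /^\/(rss|atom)\.xml$/,
  /^\/(wp-admin|wp-content|wp-json|xmlrpc\.php)/, // WordPress endpoints
];

// True if a path matches a known legacy pattern and is safe to discard.
export const isLegacy = (path: string): boolean =>
  LEGACY_PATTERNS.some((re) => re.test(path));
```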
The one rule: do not lie with redirects
A redirect is a claim: "this URL moved here".
If the old page has no real successor, redirecting it to a generic page (homepage, /blog, /start) usually becomes a soft-404 pattern:
- the user clicks a specific old URL
- lands on a generic page
- bounces
- Google learns the redirect does not satisfy intent
That slows cleanup and can make indexing more conservative site-wide.
301 vs 410 vs 404 (how Google reads them)
301 (Moved Permanently)
Use 301 only when there is a real successor:
- old post -> the same post under a new slug
- old section -> the same section under a new path
- canonicalization: www -> apex, stripping low-value params like ?m=1
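As a concrete example, here is a minimal sketch of the canonicalization case, assuming Next.js-style middleware; the www host and the ?m=1 param are placeholders, so adapt the rules to your own stack:

```ts
// Sketch: 301 duplicate variants (www host, legacy ?m=1 param) to the canonical URL.
// Assumes Next.js middleware; adapt for your framework or CDN.
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const url = request.nextUrl.clone();
  let changed = false;

  if (url.hostname.startsWith("www.")) {
    url.hostname = url.hostname.slice(4); // www -> apex
    changed = true;
  }

  if (url.searchParams.has("m")) {
    url.searchParams.delete("m"); // strip legacy ?m=1
    changed = true;
  }

  // Only redirect when something actually changed, and make it permanent.
  return changed ? NextResponse.redirect(url, 301) : NextResponse.next();
}
```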
410 (Gone)
Use 410 when a URL is intentionally removed and has no replacement.
410 is not "better SEO". It is just a clearer signal than 404:
- 404 can be temporary or accidental
- 410 is explicit: permanently removed
For WordPress-era leftovers (old categories, author pages, feed endpoints), 410 is often the fastest way to stop index bloat.
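A minimal sketch of that, again assuming Next.js-style middleware; the patterns are placeholders for your own WordPress-era leftovers:

```ts
// Sketch: return an explicit 410 Gone for legacy WordPress-era paths
// that intentionally have no successor. Patterns are placeholders.
import { NextRequest, NextResponse } from "next/server";

const GONE_PATTERNS = [/^\/(category|author)\//, /\/feed\/?$/, /^\/wp-json/];

export function middleware(request: NextRequest) {
  const path = request.nextUrl.pathname;
  if (GONE_PATTERNS.some((re) => re.test(path))) {
    return new NextResponse(null, { status: 410 }); // Gone, no replacement
  }
  return NextResponse.next(); // everything else passes through
}
```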
404 (Not Found)
Use 404 for:
- random junk URLs
- scanner/bot paths
- truly unknown URLs you did not intentionally manage
Google drops persistent 404s too, just usually not as fast as an explicit 410.
A 15-minute cleanup routine (do this in order)
This is the fast version you can run every time GSC shows a new batch of junk.
- Export 20-50 sample URLs - take a small slice from the report first. You are hunting patterns, not fighting individual URLs.
- Label each URL: Keep / Move / Remove - be brutal:
- Keep -> should be 200 and indexable
- Move -> 301 to a real successor
- Remove -> 410 if intentionally gone, otherwise 404
- Fix the two silent polluters - these create endless duplicates:
- preferred host (apex vs www)
- low-value params like ?m=1
- Remove dead URLs from your sitemap - sitemap is "please index this", not "here is my history".
- Kill internal links to removed URLs - otherwise you keep telling Google they matter.
If you do only one thing: stop redirecting dead meaning into alive pages. That is how soft-404s are born.
The playbook (full version)
Step 1: classify URLs into 3 buckets
From GSC Pages / Not indexed / 4xx, bucket every URL:
- Keep: should return 200 and be indexable
- Move: should 301 to a specific successor
- Remove: should be 410 (or 404 if it is truly unknown)
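A sketch of that bucketing in code, assuming you have the exported URLs as plain paths; the redirect map and removed patterns are hypothetical examples:

```ts
// Sketch: classify exported GSC paths into Keep / Move / Remove.
// REDIRECT_MAP and REMOVED_PATTERNS below are hypothetical examples.
type Bucket = "keep" | "move" | "remove";

const REDIRECT_MAP: Record<string, string> = {
  "/old-post-slug": "/blog/new-post-slug", // 301 target: the real successor
};

const REMOVED_PATTERNS = [/^\/(category|author)\//, /\/feed\/?$/];

function classify(path: string): Bucket {
  if (REDIRECT_MAP[path]) return "move";                             // 301
  if (REMOVED_PATTERNS.some((re) => re.test(path))) return "remove"; // 410
  return "keep"; // review manually: should be 200, or fall through to 404
}
```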
Step 2: remove removed URLs from your sitemap
Your sitemap must list only URLs you want indexed.
If a URL is in the sitemap and returns 404/410, Google will keep retrying it.
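A quick way to catch this, sketched with Node 18+ global fetch (the sitemap URL is a placeholder):

```ts
// Sketch: flag sitemap entries that no longer return 200 so they can be
// removed from the sitemap. Assumes Node 18+ (global fetch).
async function auditSitemap(sitemapUrl: string): Promise<void> {
  const xml = await (await fetch(sitemapUrl)).text();
  const urls = [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);

  for (const url of urls) {
    const res = await fetch(url, { method: "HEAD", redirect: "manual" });
    if (res.status !== 200) {
      console.log(`${res.status} ${url}`); // candidate for removal
    }
  }
}

// auditSitemap("https://example.com/sitemap.xml");
```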
Step 3: remove internal links to dead URLs
If your own site still links to a dead URL, you send mixed signals:
- "this is important" (internal link)
- "this is gone" (404/410)
Fix the links, not just the status code.
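One way to find those links is to scan your rendered HTML for hrefs that match your removed patterns; a rough sketch, where the patterns should mirror whatever your 410 rules cover:

```ts
// Sketch: find internal links that point at removed paths so the links
// themselves can be fixed. REMOVED should mirror your 410 rules.
const REMOVED = [/^\/(category|author)\//, /\/feed\/?$/];

function findDeadLinks(html: string): string[] {
  const hrefs = [...html.matchAll(/href="(\/[^"]*)"/g)].map((m) => m[1]);
  return hrefs.filter((href) => REMOVED.some((re) => re.test(href)));
}
```

Run it over each page's HTML (or your content source) and fix whatever it reports.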
Step 4: canonicalize duplicates (the silent GSC polluters)
Common culprits:
- www vs apex
- ?m=1 and other legacy params
- trailing slash variants
301 duplicates to the canonical version.
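If you prefer declarative rules over middleware, a minimal next.config sketch (assuming Next.js; the hostnames are placeholders) covers the same culprits:

```ts
// Sketch: declarative canonicalization in next.config (assuming Next.js).
// trailingSlash: false collapses trailing-slash variants; the host rule
// sends www traffic to the apex domain. Hostnames are placeholders.
const nextConfig = {
  trailingSlash: false,
  async redirects() {
    return [
      {
        source: "/:path*",
        has: [{ type: "host", value: "www.example.com" }],
        destination: "https://example.com/:path*",
        permanent: true, // permanent redirect, treated like a 301 for cleanup
      },
    ];
  },
};

export default nextConfig;
```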
Step 5: protect real pages from blanket legacy rules
If you use middleware to 410 whole legacy sections, be careful with generic patterns (like "single segment path").
Accidentally returning 410 for real pages (like /privacy or /press) creates permanent 4xx and wastes crawl.
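A simple guard is an explicit allowlist checked before any blanket rule; a sketch, using this post's example paths as placeholders:

```ts
// Sketch: never let a blanket legacy rule 410 a real page.
// Paths and patterns are placeholders; keep the allowlist explicit.
const REAL_PAGES = new Set(["/privacy", "/press", "/start"]);
const LEGACY_SECTIONS = [/^\/projects(\/|$)/, /^\/processes(\/|$)/];

function legacyGone(path: string): boolean {
  if (REAL_PAGES.has(path)) return false;             // real page: never 410
  return LEGACY_SECTIONS.some((re) => re.test(path)); // legacy section: 410
}
```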
How to know it is working
- GSC is delayed. Expect 3-14 days.
- Progress looks like:
- fewer new 4xx discovered
- legacy URLs slowly disappearing
- important 200 pages getting crawled more regularly
If after ~2 weeks important 200 OK pages are still "crawled, not indexed", that is usually a priority signal issue (internal linking + coherence), not a status code issue.
Quick decision table
- old URL has a true successor -> 301 to that exact page
- old URL is intentionally removed -> 410
- random junk/scanning -> 404
- duplicate URL variants -> 301 to canonical
FAQ
Does 410 remove URLs faster than 404?
Often yes, but not instantly. 410 is a clearer signal ("permanently removed"), so Google tends to drop it faster than an ambiguous 404. The bigger win is that it reduces future re-processing of dead URLs.
Should I redirect old posts to the homepage or /blog?
Only if the old URL truly moved to a specific replacement page. Redirecting specific old intent to a generic page is a classic soft-404 pattern. It keeps GSC noisy and can weaken trust in your redirects.
Why does GSC keep showing removed URLs even after I fixed them?
Because GSC is delayed and because the wider web still links to those URLs. Google needs time to re-crawl, see the status, and update reports. Think in weeks, not days.
Do I need to put 404/410 URLs in my sitemap to "tell Google"?
No. A sitemap is for URLs you want indexed. If you include dead URLs, Google will keep retrying them and you will keep seeing them in reports.
What URLs should I request indexing for?
Only the pages that are your core: homepage, /start, pillar pages, topic hubs, and the few best supporting essays. If a page is thin, duplicate, or not part of your new direction, do not waste indexing requests on it.
I fixed redirects and 410s, but important 200 pages are still not indexed. Now what?
That is usually a priority issue: the page is not clearly important inside your own site. Add internal links from /start and the relevant topic hub, make the cluster coherent, and re-request indexing for the core pages only.
If you want, send me 20 sample URLs from your GSC list and I will classify them into 301 / 410 / ignore.