
Google's 15MB Page Size Limit: Bloat, Structured Data & Crawl Outcomes

4 min read

Google confirms a 15MB crawl limit per page. Rising page size and structured data bloat can impact crawl and indexing, with measurable thresholds.


Key takeaways

  • Google confirms a 15MB crawl limit per page
  • Rising page size and structured data bloat can impact crawl and indexing, with measurable thresholds


Direct answer (fast path)

Google enforces a 15MB crawl limit per page; any content beyond this is ignored for crawling and indexing. Increased page size, especially from excessive structured data or other bloat, risks exceeding this cutoff, which can directly impact what gets indexed. Pages approaching or surpassing this limit should be flagged and optimized, with structured data payloads scrutinized for unnecessary overhead.

What happened

Google's Gary Illyes and Martin Splitt reiterated that web pages are getting larger and that the 15MB crawl limit remains enforced. In practice, only the first 15MB of a page's HTML source is processed for crawling and indexing. Structured data, while valuable, contributes to overall page weight and can add significant bloat if left unmanaged. The limit is confirmed in Google's documentation, and its effects are observable in Search Console coverage reports or server logs when large pages show partial indexing or missing content.
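A quick way to see where a page stands relative to the limit is to compare its raw HTML payload against the 15MB threshold. A minimal sketch, using only the standard library (the URL and helper names here are illustrative, not part of any official tooling):

```python
# Sketch: measure a page's raw HTML payload against Google's
# documented 15 MB crawl limit.
import urllib.request

CRAWL_LIMIT = 15 * 1024 * 1024  # 15 MB, per Google's documentation


def html_size_bytes(url: str) -> int:
    """Fetch a URL and return the size of its raw HTML in bytes."""
    with urllib.request.urlopen(url) as resp:
        return len(resp.read())


def over_limit(size_bytes: int, limit: int = CRAWL_LIMIT) -> bool:
    """True if the payload exceeds the crawl cutoff."""
    return size_bytes > limit
```

Note this measures the HTML source only, matching what Googlebot truncates; images and external scripts are fetched separately and do not count toward the 15MB.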

Why it matters (mechanism)

Confirmed (from source)

  • Google only processes the first 15MB of a page's source for indexing.
  • Structured data can add substantial weight to page size.
  • The 15MB limit is still active and relevant for modern sites.

Hypotheses (mark as hypothesis)

  • (Hypothesis) Large blocks of structured data near the top of the page can displace critical content from being crawled if total size nears 15MB.
  • (Hypothesis) Sites with frequent partial indexing issues may be disproportionately affected by third-party scripts or excessive JSON-LD.

What could break (failure modes)

  • Essential content or metadata positioned after the 15MB cutoff will not be crawled or indexed.
  • Overly verbose structured data (e.g., product feeds, reviews) can push key content beyond the limit.
  • Monitoring tools may not flag this issue unless raw page size is tracked, leading to undetected indexing gaps.
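The first failure mode above can be simulated directly: truncate the HTML at 15MB and check whether a critical element survives. A sketch under the assumption that truncation is a simple byte cutoff (the marker and padding are illustrative):

```python
# Sketch: simulate the 15 MB cutoff on an HTML payload and check whether
# a critical element (here, a canonical link) survives truncation.
CRAWL_LIMIT = 15 * 1024 * 1024  # 15 MB


def survives_cutoff(html: bytes, marker: bytes, limit: int = CRAWL_LIMIT) -> bool:
    """Return True if `marker` appears within the first `limit` bytes."""
    return marker in html[:limit]


# A page where bloat pushes the canonical tag past the cutoff:
padding = b"x" * (CRAWL_LIMIT + 10)
html = (
    b"<html><head>"
    + padding
    + b'<link rel="canonical" href="/page">'
    + b"</head></html>"
)
survives_cutoff(html, b'rel="canonical"')  # False: the tag sits past 15 MB
```

Running the same check with the marker placed before the padding returns True, which is why moving critical content and metadata above large payloads matters.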

The Casinokrisa interpretation (research note)

  • (Hypothesis) The real-world impact threshold is lower than 15MB due to encoding, server-side includes, or edge-case payloads—test by sampling pages at 12–14MB and checking GSC for content drop-off or partial indexing.
  • (Hypothesis) Structured data bloat is a silent contributor to visibility loss, especially in casino/affiliates where pages often contain large review or bonus tables in JSON-LD. Run targeted crawls on such templates, then compare indexed snippets versus full content.
  • Expected signal: In GSC, affected URLs will show as crawled but partially indexed or missing expected snippets; log-level analysis will show truncated fetches.
  • This shifts the selection layer—the implicit filter where Google decides what to index—by introducing a hard technical boundary. The visibility threshold (minimum viable content surfaced) is now partly a function of page weight, not just relevance or authority.

Entity map (for retrieval)

  • Google
  • Gary Illyes
  • Martin Splitt
  • Search Console
  • 15MB crawl limit
  • Page weight / size
  • Structured data
  • JSON-LD
  • HTML source
  • Indexing
  • Crawling
  • Partial indexing
  • Casino/affiliate sites
  • Schema.org
  • Server logs
  • Coverage report

Quick expert definitions (≤160 chars)

  • 15MB crawl limit — Googlebot processes only the first 15 megabytes of a page's HTML for indexing.
  • Page weight — The total byte size of a web page's HTML source delivered to crawlers or browsers.
  • Structured data — Code (often JSON-LD) describing page entities for enhanced search features; adds to page size.
  • Partial indexing — When only part of a page is crawled or indexed, often due to technical limits.
  • Visibility threshold — The minimum criteria a page must meet to be considered for search results.

Action checklist (next 7 days)

  • Audit all high-traffic templates for total HTML payload; flag any >12MB.
  • Identify pages with heavy structured data; measure JSON-LD size per page.
  • Cross-reference large pages with GSC for partial indexing or missing snippets.
  • Move critical content and metadata above any large JSON-LD blocks.
  • Set up monitoring for page size in build/deploy pipelines.
  • Report and triage URLs at risk; prioritize by traffic and conversion value.
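The second checklist item, measuring JSON-LD size per page, can be done with the standard library alone. A minimal sketch (class and function names are my own, not a published tool):

```python
# Sketch: measure the JSON-LD payload inside an HTML document to estimate
# the structured-data share of total page weight.
from html.parser import HTMLParser


class JsonLdMeter(HTMLParser):
    """Accumulates the byte size of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.jsonld_bytes = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.jsonld_bytes += len(data.encode("utf-8"))


def jsonld_share(html: str) -> float:
    """Fraction of the HTML payload taken up by JSON-LD blocks."""
    meter = JsonLdMeter()
    meter.feed(html)
    total = len(html.encode("utf-8"))
    return meter.jsonld_bytes / total if total else 0.0
```

Run this across a template's rendered pages and the average JSON-LD bytes per page drops straight into the metrics table below.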

What to measure

  • Number and percentage of pages >12MB raw HTML.
  • Incidence of partial indexing or missing snippets in GSC for flagged pages.
  • Structured data size as a proportion of total HTML.
  • Crawl and fetch logs for evidence of truncation.
  • Changes in ranking or visibility after page size reduction.

Quick table (signal → check → metric)

  • Large HTML (>12MB) → Sitewide crawl, size audit → # of pages >12MB
  • Partial indexing in GSC → Coverage/status report → # flagged URLs
  • Structured data bloat → Parse JSON-LD length → Avg JSON-LD bytes/page
  • Content cutoff in logs → Server log analysis → % truncated fetches
  • Visibility drop post-bloat → SERP/traffic monitoring → Rank/CTR change
