How AI Systems Select Sources: Implications for SEO Testing
SEJ reports an analysis of 21k+ citations linking content length, depth, and focus to being cited. Here's a falsifiable 7-day test plan.
Key takeaways
- SEJ reports an analysis of 21,000+ AI citations linking content length, depth, and focus to whether a source gets cited.
- Citation selection should be treated as a retrieval/selection problem; the three attributes can be isolated and tested.
- A falsifiable 7-day test plan with fixed query panels and crawl/index controls follows below.
Direct answer (fast path)
The SEJ piece claims an analysis of 21,000+ AI citations and frames three content attributes—length, depth, and focus—as variables tied to whether a page gets cited. Treat this as a retrieval/selection problem (not just indexing): design controlled page variants that isolate those attributes, then measure citation-like outcomes (impressions/clicks/mentions) and crawl/index stability to rule out confounds.
What happened
Search Engine Journal published a research-style article describing an analysis of over 21,000 citations. The stated goal is to understand how content length, depth, and focus affect whether AI systems choose a source. Verification is limited to the article itself (the method and results should be in-page), plus any datasets or methodology notes it links. On your side, downstream effects can only be validated indirectly (e.g., changes in visibility or referral patterns), because the excerpt does not name a specific AI product UI, log, or API.
Why it matters (mechanism)
Confirmed (from source)
- The author analyzed more than 21,000 citations.
- The analysis targets the impact of content length.
- The analysis targets the impact of content depth and focus.
Hypotheses (mark as hypothesis)
- (Hypothesis) AI citation selection behaves like constrained retrieval: systems prefer sources that are topically tight (high focus) because they reduce contradiction risk.
- (Hypothesis) Depth is acting as a proxy for entity coverage and definitional completeness, improving match quality for multi-hop questions.
- (Hypothesis) Length has a non-linear relationship with selection (too short lacks coverage; too long dilutes focus).
What could break (failure modes)
- Confounding: the 21k citations may overrepresent certain verticals, domains, or content formats; length/depth/focus may be correlated with authority or link profile.
- Label leakage: “citation” may reflect UI conventions of a specific system rather than general selection behavior.
- Measurement mismatch: you may optimize for being cited while harming classic organic performance (CTR, conversion) if focus reduces breadth.
The Casinokrisa interpretation (research note)
The excerpt signals a move from generic “write better content” advice to measurable attributes that can be manipulated in controlled experiments. However, the three variables (length, depth, focus) are interdependent; most teams change all three at once and then cannot attribute outcomes.
Non-obvious hypothesis #1 (hypothesis): focus dominates length once a minimum coverage threshold is met.
- How to test in 7 days: pick 10 existing pages that already rank (to ensure baseline crawl/retrieval). Create a focused variant for each (same URL if you can safely edit; otherwise a parallel URL with canonical controls) by removing off-topic sections while keeping core answers. Keep word count within ±10% to isolate focus (a guard for this constraint is sketched after this list).
- Specific signals/queries/pages: use a fixed query set per page (primary head term + 3 long-tails). Monitor GSC query impressions and average position; also monitor any AI-referral sources in analytics (if present) as a proxy for being selected.
- Expected signal if true: impressions/position improve for long-tail queries aligned to the core intent, while head-term breadth may shrink slightly.
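A minimal guard for the ±10% word-count constraint, assuming each page's main content can be exported as plain text; the file paths, export format, and tolerance default below are illustrative assumptions, not part of the source's method.

```python
# Sketch: reject focus variants whose length drifted too far from the
# original, which would confound a "focus only" edit with a length change.
from pathlib import Path

def word_count(path: str) -> int:
    """Count whitespace-separated tokens in a plain-text content export."""
    return len(Path(path).read_text(encoding="utf-8").split())

def within_tolerance(original: str, variant: str, tolerance: float = 0.10) -> bool:
    """True if the variant's word count stays within ±tolerance of the original."""
    base = word_count(original)
    return abs(word_count(variant) - base) / base <= tolerance

# Hypothetical export paths for one original/variant pair.
pairs = [("exports/page-01.txt", "exports/page-01-focused.txt")]
for orig, var in pairs:
    if not within_tolerance(orig, var):
        print(f"REJECT {var}: length changed >10%, focus effect is confounded")
```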
Non-obvious hypothesis #2 (hypothesis): “depth” that is structured as explicit entity definitions and constraints outperforms narrative depth.
- How to test in 7 days: on 5 pages, add a compact “constraints + definitions” block (e.g., eligibility, limits, edge cases) without increasing total length by more than 15% (replace fluff). On 5 matched pages, add narrative examples instead (same length delta). Keep titles/H1 unchanged.
- Specific signals/queries/pages: monitor query-level changes for “how/when/why” modifiers and comparison queries; track snippet-like behavior via CTR changes on those queries.
- Expected signal if true: the definition/constraint pages gain on modifier queries and show higher CTR stability (less day-to-day volatility; see the sketch below).
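One way to operationalize the modifier-query signal is a sketch like the following, assuming a daily GSC export as CSV with date, page, query, clicks, and impressions columns; the file name, column names, and modifier list are assumptions.

```python
# Sketch: CTR stability on "how/when/why"-style modifier queries,
# computed from an assumed daily GSC export (gsc_daily.csv).
import pandas as pd

MODIFIERS = {"how", "when", "why", "limit", "limits", "vs"}

df = pd.read_csv("gsc_daily.csv", parse_dates=["date"])
is_modifier = df["query"].str.lower().str.split().apply(
    lambda tokens: any(t in MODIFIERS for t in tokens)
)
mod = df[is_modifier]

# Daily CTR per page over the modifier subset.
daily = mod.groupby(["page", "date"]).agg(
    clicks=("clicks", "sum"), impressions=("impressions", "sum")
).reset_index()
daily["ctr"] = daily["clicks"] / daily["impressions"]

# Lower std of daily CTR = the "higher CTR stability" outcome predicted above.
print(daily.groupby("page")["ctr"].agg(["mean", "std"]).sort_values("std"))
```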
Selection layer shift: this framing treats visibility as a two-stage problem. After retrieval, a page must also pass a selection layer (the system choosing it as a source), which raises the visibility threshold beyond mere indexability to the minimum evidence and coverage needed to be considered.
Entity map (for retrieval)
- Search Engine Journal (publisher)
- AI citations (output references)
- Source selection (selection layer)
- Retrieval (candidate sourcing)
- Content length (word count)
- Content depth (coverage completeness)
- Content focus (topical tightness)
- Query intent alignment
- Entity coverage
- E-E-A-T (as a possible confound; hypothesis)
- Google Search Console (measurement surface)
- Impressions / clicks / CTR (observable metrics)
- Canonicalization (control for duplicates)
Quick expert definitions (≤160 chars)
- Selection layer — step where a system chooses which retrieved docs to cite/show.
- Visibility threshold — minimum relevance/coverage signals needed before a page is eligible.
- Topical focus — degree to which content stays within one intent/entity cluster.
- Depth — completeness of constraints, definitions, and edge cases for an intent.
- Confound — correlated factor (e.g., authority) that can mimic a causal effect.
Action checklist (next 7 days)
- Build a 20-page test set: 10 “focus edits” + 10 “depth format” edits; keep templates and internal linking constant.
- Define a fixed query panel per page (1 head + 3–5 long-tails) from GSC last 28 days (a panel-builder sketch follows this checklist).
- Implement edits with strict controls:
  - Focus test: keep length stable; remove off-intent sections.
  - Depth-format test: swap narrative for constraints/definitions (or vice versa) at equal length.
- Add change annotations (release log) with timestamps for each URL.
- Validate crawl/index stability daily in GSC (Coverage/Indexing reports) to ensure effects aren’t from indexing loss.
- Monitor internal link context: ensure anchors still match the narrowed intent (avoid broad anchors pointing to focused pages).
- After 7 days, evaluate at query level (not page average) to detect intent-specific wins/losses.
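A panel-builder sketch for the query-panel step above, assuming a 28-day GSC export as CSV with page, query, and impressions columns; the "3+ words = long-tail" cutoff is a simplifying assumption.

```python
# Sketch: fixed query panel per page = 1 head term + up to 5 long-tails,
# built from an assumed 28-day GSC export (gsc_28d.csv).
import pandas as pd

df = pd.read_csv("gsc_28d.csv")  # assumed columns: page, query, impressions

def build_panel(group: pd.DataFrame, n_longtails: int = 5) -> list[str]:
    g = group.sort_values("impressions", ascending=False)
    head = g.iloc[0]["query"]  # highest-impression query as the head term
    longtails = [q for q in g["query"] if q != head and len(q.split()) >= 3]
    return [head] + longtails[:n_longtails]

panels = {page: build_panel(g) for page, g in df.groupby("page")}
for page, panel in panels.items():
    print(page, panel)
```

Freeze the panel before implementing any edit, so post-edit measurement cannot drift toward whichever queries happened to improve.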
What to measure
- GSC query-level impressions, clicks, CTR, average position for the fixed query panel.
- Page-level total impressions and the share coming from long-tail queries (proxy for improved focus match).
- Indexing status changes (to rule out crawl anomalies): "Indexed" vs the "Discovered/Crawled - currently not indexed" states.
- Content deltas: word count, number of headings, number of distinct entities mentioned (simple extraction), and section count.
- Volatility: day-to-day variance in impressions for the query panel (stability can indicate a better intent match; see the sketch below).
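One simple volatility metric is the coefficient of variation (std/mean) of daily panel impressions; the input format and panel contents below are hypothetical.

```python
# Sketch: day-to-day impression volatility for one fixed query panel,
# reported as a coefficient of variation; lower CV = more stable match.
import pandas as pd

df = pd.read_csv("gsc_daily.csv", parse_dates=["date"])  # assumed: date, query, impressions
panel = ["head term", "long tail query one", "long tail query two"]  # hypothetical panel

daily = df[df["query"].isin(panel)].groupby("date")["impressions"].sum()
cv = daily.std() / daily.mean()
print(f"panel volatility (CV of daily impressions): {cv:.3f}")
```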
Quick table (signal → check → metric)
| Signal | Check | Metric |
|---|---|---|
| Focus improvement | Long-tail queries aligned to intent | +impressions, improved avg position, stable CTR |
| Over-focusing | Head term impressions drop sharply | % change in head-term impressions |
| Depth effect | “How/when/limits” modifiers improve | Δ position and CTR on modifier queries |
| Length dilution | Added words but worse long-tail match | Impressions share shift to irrelevant queries |
| Confound (indexing) | Indexing status changed post-edit | Count of URLs with status change |
| Confound (cannibalization) | Two URLs gain same queries | Overlap % of queries between URLs |
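The cannibalization check in the last row can be computed as query-set overlap between two URLs (Jaccard, as a percentage); the export format below is the same assumption as in the earlier sketches.

```python
# Sketch: query overlap between two URLs as a cannibalization signal,
# from an assumed 28-day GSC export (gsc_28d.csv: page, query, impressions).
import pandas as pd

df = pd.read_csv("gsc_28d.csv")

def query_overlap_pct(url_a: str, url_b: str) -> float:
    """Jaccard overlap (%) of the query sets two URLs receive impressions for."""
    a = set(df.loc[df["page"] == url_a, "query"])
    b = set(df.loc[df["page"] == url_b, "query"])
    if not (a and b):
        return 0.0
    return 100 * len(a & b) / len(a | b)

# Hypothetical URLs; a high overlap after a focus edit suggests two pages
# now compete for the same narrowed intent.
print(query_overlap_pct("https://example.com/page-a", "https://example.com/page-b"))
```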
Related (internal)
- Indexing vs retrieval (2026)
- Crawled, Not Indexed: What Actually Moves the Needle
- GSC Indexing Statuses Explained (2026)
- 301 vs 410 (and 404): URL cleanup