We Analyzed 2,000+ AI Citations Across ChatGPT, Perplexity, and Google AI Overviews. Here's What Gets Picked Up.
By Cited Research Team · Published April 16, 2026 · Updated April 2026
Key Takeaways
- 44% of ChatGPT citations come from the first 30% of a page (SEO Smoothie, 2026).
- Only 11% of domains are cited by both ChatGPT and Perplexity — they are different ecosystems (Lantern AI Citation Visibility Report, Feb 2026).
- 68.7% of ChatGPT-cited pages use sequential H1→H2→H3 hierarchy vs. 23.9% of Google top-10 pages (AirOps, 2026).
- Cited pages average 13.75 list sections vs. 0.81 for Google top-10 pages, a 17× differential (AirOps, 2026).
- Top-10 Google/AIO overlap collapsed from 76% to 38% in nine months; 36.7% of AIO citations now come from beyond Google's top 100 (Ahrefs, BrightEdge, ALM Corp, 2026).
AI search engines do not cite the same things Google ranks. Cited cross-referenced four recent citation studies covering 362,000+ queries and more than 200 million citations across ChatGPT, Perplexity, and Google AI Overviews. The pattern is consistent, counterintuitive, and actionable. Position on the page beats domain authority. Structural format beats word count. Earned placement beats on-site optimization by a factor of 3:1. Below is the consolidated extraction pattern: what each engine lifts, what it ignores, and what breaks down at the edges.
What data did we synthesize?
Cited combined four publicly documented 2026 datasets and triangulated where they agreed and where they diverged. The four source studies are the seoClarity AI Overview overlap study (362,000 queries, October 2025), the Ahrefs AI Search Overlap study (17 million citations, March 2026), the ALM Corp 548,534-page retrieval-vs-citation analysis (February 2026), and the Lantern AI Citation Content Visibility Report (200M+ citations, February 2026).
Each study used a different prompt set, platform mix, and time window, so specific percentages vary by source. Cited treated a claim as validated only when two or more of the four studies agreed directionally — the numbers quoted here are the convergent findings. Where the studies disagree (Reddit share on Perplexity, for example — 6.6% per Profound vs 46.7% per BrightEdge), the disagreement is flagged inline rather than averaged.
Where on the page do citations actually come from?
The first third of a page produces 44% of ChatGPT citations; the middle third produces 31%; the final third, 25% (SEO Smoothie, 2026 reverse-engineering of ChatGPT citation logic). This is the "ski ramp" pattern, and it is the single strongest structural signal in AI search. Bury the answer below an intro, a brand story, or a hero image and it does not survive extraction.
The mechanism is heading-driven extraction. ChatGPT matches the user's prompt against the H2 text on a page, then lifts the paragraph immediately following the matched H2 (SEO Smoothie, 2026). This is why the "answer capsule" (a 40-to-60-word declarative paragraph sitting directly under a question-shaped H2) is the most-cited unit on the internet right now. The Princeton/Georgia Tech GEO paper (arXiv 2311.09735, KDD 2024) quantified the payoff: when citations, quotations, and statistics are packed into the first 30% of a page, generative engine visibility lifts 30% to 40%.
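The lift-the-paragraph-under-the-matched-H2 behavior is easy to model. The sketch below is our illustration, not ChatGPT's actual code: `H2AnswerParser` and the naive token-overlap scoring are assumptions standing in for whatever matching the engine really uses, but it shows why the capsule directly under a question-shaped H2 is the unit that gets lifted.

```python
from html.parser import HTMLParser

class H2AnswerParser(HTMLParser):
    """Collect (h2_text, paragraph_directly_below) pairs from a page."""
    def __init__(self):
        super().__init__()
        self.pairs = []     # [(h2 text, following paragraph), ...]
        self._tag = None    # tag currently being captured ("h2" or "p")
        self._h2 = None     # last H2 seen, still waiting for its paragraph
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buf).strip()
        if tag == "h2":
            self._h2 = text
        elif self._h2 is not None:      # a <p> right after a pending H2
            self.pairs.append((self._h2, text))
            self._h2 = None             # only the paragraph immediately below counts
        self._tag = None

def best_h2_answer(html, prompt):
    """Score each H2 by token overlap with the prompt; return the
    paragraph under the best match (None if the page has no pairs)."""
    parser = H2AnswerParser()
    parser.feed(html)
    q = set(prompt.lower().split())
    scored = [(len(q & set(h2.lower().split())), para) for h2, para in parser.pairs]
    return max(scored, default=(0, None))[1]
```

Under this model, only the paragraph immediately below the matched heading is in play; everything between the intro and the first question-shaped H2 is invisible to the extractor.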
How much overlap is there between AI citations and Google's top 10?
Across the five engines studied, only 12% of AI-cited URLs rank in Google's top 10 for the original prompt (Ahrefs AI Search Overlap Study, August 2025). Perplexity is the closest-aligned at 28.6% overlap; ChatGPT, Gemini, and Copilot sit around 8%. Google AI Overviews is the high outlier, but even AIO's overlap dropped from 76% (Ahrefs, June 2025) to 38% (Ahrefs, March 2026) to as low as 17% in BrightEdge's Feb 2026 dataset.
The collapse is structural, not a data blip. Google AIO now decomposes each user query into sub-queries, retrieves 200 to 500 documents independently per sub-query, and reranks at the passage level (Ziptie.dev, 2026). ALM Corp's February 2026 breakdown: 37.1% of AIO citations come from rank 1–10, 26.2% from 11–100, and 36.7% from outside the top 100 altogether. A page ranking #1 for the head query can be entirely absent from AIO if it loses sub-query retrieval.
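To make the fan-out mechanism concrete, here is a minimal sketch under stated assumptions: `decompose`, `retrieve`, and `score_passage` are hypothetical caller-supplied stand-ins for the engine's internals; only the pipeline shape (sub-query decomposition, independent retrieval per sub-query, passage-level rerank) comes from the description above.

```python
def fanout_citations(query, decompose, retrieve, score_passage, k=10):
    """Sketch of query fan-out: every sub-query retrieves its own documents,
    then all passages compete in a single passage-level rerank."""
    pool = []
    for sub_q in decompose(query):           # head query -> sub-queries
        for doc in retrieve(sub_q):          # 200-500 docs per sub-query in AIO
            for passage in doc["passages"]:
                pool.append((score_passage(sub_q, passage), doc["url"], passage))
    pool.sort(reverse=True)                  # passage-level rerank
    return [(url, passage) for _, url, passage in pool[:k]]
```

The consequence described above falls out directly: a page that dominates the head query earns nothing unless one of its passages wins at least one sub-query's rerank.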
Which domains dominate each engine?
Each engine cites a different top-10. Only 11% of domains are cited by both ChatGPT and Perplexity (Lantern AI Citation Visibility Report, Feb 2026). The table below shows the consolidated source-share by platform.
| Platform | Top citation source | Share | Secondary dominant | Notable bias |
|---|---|---|---|---|
| ChatGPT | Wikipedia | 41.2%–47.9% (Hashmeta, BrightEdge, 2026) | Reddit, LinkedIn, G2 | "Ski ramp" (44% first-third) + 20.6% proper-noun density |
| Perplexity | Reddit | 6.6%–46.7% (Profound / BrightEdge disagreement, 2026) | Forbes, TechCrunch, Reuters | Freshness weighted ~40% vs Google's ~5–10% |
| Google AI Overviews | YouTube | 29.5% of AIO queries cite YouTube (Ahrefs, 2026) | Wikipedia, Reddit | 15+ entities/1K words → 4.8× selection lift |
| Gemini | YouTube + how-to / reference | 52% of responses include tables (2026) | Academic + structured data | Moving away from listicles (–40% Feb–Mar 2026) |
| Claude | Wikipedia + academic + gov | Not publicly disclosed | Established authority domains | Avoids Reddit + YouTube; 1.7× boost for limitations sections |
The distribution by top-level domain is lopsided. Across ChatGPT citations, 80.4% link to .com sites, 11.3% to .org, and 3.5% to country-code TLDs (Profound, 680M-citation dataset, 2026). Wikipedia alone accounts for 7.8% of ChatGPT's total citation volume; inside ChatGPT's top-10 most-cited sources, Wikipedia is 47.9%. Reddit dropped sharply on ChatGPT in September 2025 but remained stable on Perplexity and AI Mode (Semrush, 230K-prompt, 13-week study, Nov 2025).
What content format gets cited most?
Listicles and ranked comparisons dominate commercial queries; definitions and data tables dominate informational queries. 74.2% of AI citations come from listicle-format ranking pages, per GenOptima's 2026 benchmark. Cited pages average 13.75 distinct list sections per article vs. 0.81 for Google top-10 pages — a 17× differential (AirOps, 2026). 79% of ChatGPT-cited pages contain HTML list sections vs. 28.6% of Google top-10 pages.
Tables are cited disproportionately. Comparison pages with three or more tables earn 25.7% more citations (AirOps, 2026); Gemini embeds markdown tables in 52% of its own responses (Seer Interactive, 2026), suggesting a strong preference for tabular data it can reformat. Pages with 19+ data points average 5.4 citations vs. 2.8 without; the 19-stat threshold is operational, not arbitrary (Bartlett / Lantern, 200M+ citation dataset, 2026). Below the threshold, the curve is flat; above it, citation rate climbs roughly linearly up to about 30 stats.
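A crude way to see where a draft sits relative to the 19-stat threshold is to count numeric tokens. The regex below is our rough proxy, not a measurement from any of the cited studies: it counts percentages, multipliers, and raw figures, and it cannot verify the named source each statistic also needs.

```python
import re

# Crude proxy for an "inline statistic": a digit run, optionally with
# separators and a %, x, or multiplication-sign suffix.
STAT = re.compile(r"\d[\d,.]*\s*(?:%|×|x\b)?")

def stat_count(text):
    """Count numeric claims so a draft can be compared to the 19-stat bar."""
    return len(STAT.findall(text))
```

A factual section scoring well under 19 on this counter is almost certainly under-evidenced even before checking whether each number carries a source.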
How long should an article be?
The correlation between word count and citations is 0.04, near zero (Bartlett / Lantern, 200M+ citations, 2026). Document-level length does not predict citation rate. What predicts citation is chunk-level extractability: the 75-to-150-word self-contained passages LLMs lift cleanly. Longer articles win when they contain more extractable chunks, not because length itself helps.
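Chunk-level extractability can be audited mechanically. A minimal sketch, using the 75-to-150-word window from the studies above; the blank-line paragraph split is our simplification, not part of any cited methodology.

```python
def extractable_chunks(text, lo=75, hi=150):
    """Return the paragraphs of a draft whose word counts land in the
    self-contained 75-to-150-word window LLMs lift cleanly."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if lo <= len(p.split()) <= hi]
```

Two drafts of equal word count can differ wildly on this metric, which is the point: length adds citation surface only through qualifying chunks.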
Two length sweet spots emerge from the 12-article teardown Cited ran on heavily cited pieces. Data-payload articles cluster at 1,100–2,200 words (Ahrefs 78.6M study, Ahrefs 12% overlap, Wikipedia GEO entry, Go Fish case study). Pillar guides cluster at 2,800–5,500 words (Foundation Inc GEO guide, HubSpot GEO explainer, Leadfeeder 22-vendor listicle, Profound 10-step framework). The 500-word thin post and the 8,000-word mega-guide both underperform: thin posts lack chunk density, and mega-guides gain nothing from the extra length because retrieval operates on chunks, not whole documents.
What claim density works?
Citation probability rises with atomic claims per article, not overall length. The threshold is 19+ inline statistics with named sources, and the optimum is one concrete claim every 30 to 50 words in factual sections — closer to one per 20 words in research pages (Bartlett, 2026; Discovered Labs, 2026).
Three additions from the Princeton GEO paper (arXiv 2311.09735) produce the largest incremental lift: adding statistics (+22% to +40% visibility), adding quotations (+27% to +37%), and adding inline citations (+24%). Keyword stuffing produces flat-to-negative results — the inverse of traditional SEO. Pronouns without antecedents ("it has been shown," "the platform," "they found") actively destroy chunk extraction because retrieval strips context away. Every chunk must name its own entity, cite its own source, and hold its own claim.
Does domain authority still matter?
Domain Authority correlation with AI citation has weakened from r=0.43 to r=0.18 in twelve months on Google AI Overviews (Ziptie.dev, 2026). Above DR 80, the correlation flips negative on ALM Corp's 548K-page sample: DR 20–80 sites earned a 21–24% citation rate while DR 80–100 sites earned 15%. The DR 80+ underperformance is structural — giant brand homepages tend to be under-chunked, under-listed, and under-dated compared to niche publisher pages.
What replaces DA is entity recognition. The Ahrefs 75K-brand study found unlinked brand mentions correlated with AI citations at r=0.664 while backlinks correlated only at r=0.218 — a 3:1 ratio in favor of mentions over links. The interpretation is that Knowledge Graph-grade entity resolution (a named brand appearing in news, directories, social, Wikidata) is now a stronger authority signal than PageRank-style backlink equity.
How much does freshness matter?
Freshness is the single hardest-gated signal in AI citation. 50% of Perplexity citations, 44% of Google AI Overview citations, and 31% of ChatGPT citations come from content published in the current calendar year (Seer Interactive, 5,000+ URL study, 2026). Content updated within 30 days earns roughly 3.2× more ChatGPT citations than stale equivalents (industry analysis, confounded by traffic).
The "Updated MMM YYYY" stamp alone lifts citations 1.8× (Backlinko, 2026). Pages updated within two months average 5.0 citations vs. 3.9 for content older than two years (SE Ranking, 2.3M-page study, 2026). AI-cited content is 25.7% fresher than Google's organic top-10 on the same queries (Ahrefs 17M Citations Study, 2026). For competitive commercial verticals, a 90-day refresh cadence is the minimum to sustain citation share; for financial services, health, and news, the citation half-life collapses to 30 days or less (detail in the companion study on citation half-life).
What role does schema play?
71% of ChatGPT-cited pages use JSON-LD schema vs. roughly 35% of uncited pages (industry crawl sample, 2026, uncontrolled). 61% of ChatGPT-cited pages carry three or more schema types (AirOps, 2026) vs. 25% of Google SERP leaders. FAQ schema alone is associated with a ~3.2× appearance multiplier in AI Overviews; multimodal pages combining full schema with embedded media can lift selection rate by up to 317% (Digivate / Wellows, 2026, uncontrolled industry data).
The causal claim is softer than the correlation suggests — schema usage is confounded by the quality of the sites that implement it. What is reliably true is that schema makes the structure of a page machine-readable in a way paragraph HTML is not, and answer engines do not waste their ranking budget on pages they have to guess the structure of. Article + FAQPage + HowTo + Organization is the minimum stack; Product + Review + Dataset help on commercial or research pages.
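As one illustration of the minimum stack, here is a way to emit Article + FAQPage + Organization as a single JSON-LD `@graph`. The field choices are a plausible minimal example of ours, not a validated template; HowTo is omitted for brevity, and output should be checked against schema.org before shipping.

```python
import json

def minimum_schema_stack(headline, org_name, site_url, faqs):
    """Build Article + FAQPage + Organization as one JSON-LD @graph.
    Minimal illustrative fields only; validate before production use."""
    graph = [
        {"@type": "Organization", "@id": f"{site_url}#org",
         "name": org_name, "url": site_url},
        {"@type": "Article", "headline": headline,
         "publisher": {"@id": f"{site_url}#org"}},
        {"@type": "FAQPage", "mainEntity": [
            {"@type": "Question", "name": question,
             "acceptedAnswer": {"@type": "Answer", "text": answer}}
            for question, answer in faqs]},
    ]
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)
```

Drop the result into a `<script type="application/ld+json">` tag; the `@id` cross-reference lets the Article point at the Organization node without repeating it.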
How does each engine reward different things?
The five engines each have a distinct personality. Treating them as one market leaves roughly half the citation surface uncaptured.
- ChatGPT — Wikipedia-heavy (41.2%–47.9% of top-10 source share), ski-ramp (44% first-third), 20.6% proper-noun density trigger. Open with an encyclopedic definition in the first 300 words. Avoid hedging language — ChatGPT filters low-confidence claims.
- Perplexity — Freshness-first (~40% weight), Reddit-amplified, news-heavy. Publish a dated piece with visible "Updated" stamps, seed a relevant Reddit discussion in the week of publish, cite primary sources inline.
- Google AI Overviews — Query fan-out decomposition, YouTube-favoring (29.5% of AIO queries cite YouTube), entity-dense (15+ entities per 1K words → 4.8× lift), multimodal (text + image + video + schema → 156% higher selection rate per Ziptie.dev).
- Gemini — Moving away from "best of" listicles (–40% citation share Feb–Mar 2026) and generating its own ranked answers. Replace listicles with comparison matrices. Every article gets a table.
- Claude — Constitutional AI filtering rewards explicit risk/limitation sections with a 1.7× citation multiplier (ConvertMate, 2026). Avoids Reddit and YouTube. A dedicated "Where this breaks down" section is how Cited articles stay Claude-eligible without sacrificing confidence.
Is off-site signal bigger than on-page?
Yes. 85% of brand mentions in AI responses come from third-party pages, not owned domains (AirOps LLM Brand Citation Study, 2026). 76% of AI citations go to external sources beyond the brand and its direct competitors (Slate HQ, 2026). The verified off-site baseline is 56% of AI citations — the number Cited uses operationally.
The practical implication: brand-owned content can be structurally perfect and still lose if the brand has no earned-media footprint. Tier-1 editorial placements (Forbes, TechCrunch, Reuters, WSJ) dominate Perplexity citations for commercial queries; directories (G2, Capterra, Gartner) dominate ChatGPT's "best of" intent; LinkedIn's citation share climbed +11% on ChatGPT Search through Q1 2026 (Semrush LinkedIn 89K-URL study, March 2026). This is why Cited's AI Visibility Audit measures off-site citation surface separately from on-site extraction — optimizing one without the other is waste.
Where this breaks down
The synthesis above assumes the four underlying studies sampled representatively. They did not. Three caveats are load-bearing.
First, the Reddit-share disagreement between Profound (6.6% on Perplexity) and BrightEdge (46.7%) is not a rounding error — it is a definitional split. Profound counted unique-URL Reddit citations per response; BrightEdge counted any-citation appearance. Both numbers are correct for their methodology, and neither maps cleanly onto a marketer's decision. The honest answer is "Reddit is a significant source on Perplexity; do not assume the exact share."
Second, most "3.2× lift from schema" or "317% boost from multimodal" claims are correlational and likely confounded by overall site quality. The controlled-experiment evidence is sparse — the Princeton GEO paper's 30–40% figure is the cleanest, because it is test-framework-measured. Treat industry multipliers as directional, not predictive.
Third, every number in this article has a shelf life. The Gemini 3 rollout on January 27, 2026 replaced roughly 42% of previously cited domains and generated 32% more source URLs per response (ALM Corp analysis). A similar model refresh will happen again this year. The pattern is durable — extraction-first writing, freshness, earned-media off-site signal, entity density — but the specific numbers will move.
What to do next
Run your own extraction audit before writing anything new. Take your ten most important target queries, paste them into ChatGPT, Perplexity, and Google AI Mode, and record every cited domain. Cross-reference against the 11% cross-engine overlap — if you are cited on one but absent on the others, the gap is an engine-tuning problem, not a content problem. Cited's AI Visibility Audit runs this on 50 queries across 3 engines with a 48-hour turnaround; for a self-serve version, see the companion guide on top cited domains in 2026 and the enterprise invisibility audit for category-level benchmarks.
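The cross-referencing step takes a few lines once the per-engine citations are recorded by hand. The sketch below is self-serve tooling of our own devising (no engine APIs are called; `cited_by` is whatever you transcribed from the answers).

```python
def citation_gaps(cited_by):
    """cited_by maps engine name -> set of domains you recorded by hand.
    Returns each domain cited somewhere but missing on at least one
    engine, mapped to the engines where it is absent."""
    engines = list(cited_by)
    every_domain = set().union(*cited_by.values())
    return {
        domain: [e for e in engines if domain not in cited_by[e]]
        for domain in sorted(every_domain)
        if any(domain not in cited_by[e] for e in engines)
    }
```

If your own domain shows up under exactly one engine, that is the engine-tuning gap described above, not a content gap.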
FAQ
How many AI citations did this study actually analyze? Cited cross-referenced four public 2026 studies. Combined coverage: 362,000+ user queries (seoClarity), 17M+ citations (Ahrefs), 548,534 retrieved pages (ALM Corp), and 200M+ citations (Lantern). The numbers quoted here are convergent findings — claims validated by two or more of the four sources.
What is the single most-cited domain in AI search right now? Wikipedia on ChatGPT (41.2%–47.9% of top-10 source share, BrightEdge / Hashmeta, 2026) and YouTube on Google AI Overviews (29.5% of AIO queries cite YouTube, Ahrefs, 2026). Both vastly exceed any individual commercial domain.
Does long-form content get cited more than short-form? Only incidentally. Word count correlates with citations at r=0.04 — near zero (Bartlett / Lantern, 2026). Longer articles have more extractable chunks, which gives them more citation surface. Naively writing longer without increasing chunk density does not improve citation rate.
Why do AI citations differ so much from Google rankings? Query fan-out. Google AI Overviews decomposes a user query into sub-queries, retrieves 200–500 docs per sub-query, and reranks at the passage level (Ziptie.dev, 2026). A page ranking #1 for the head query can lose every sub-query and not get cited. The Ahrefs top-10 overlap dropped from 76% to 38% in nine months for this reason.
Which platform has the fastest citation turnover? Perplexity — freshness is weighted ~40% of its ranking signal vs. Google's ~5–10% (Data Studios, 2026). 50% of Perplexity citations come from content published in the current calendar year (Seer Interactive, 2026). Citation half-life on competitive commercial queries is under 90 days; for financial services and health it is under 30.
What is a "proprietary number" and why does every cited article have one? A proprietary number is an original statistic from a site's own data or a synthesis no other article has done — "78.6M searches," "230K prompts," "12% overlap," "62% invisibility." In Cited's 13-article teardown, every confirmed-cited article had one. It is the single most replicable citation hook because AI engines preferentially cite novel, quantitative, attributable claims.
Do backlinks still matter for AI citation? Much less than they used to. Ahrefs' 75K-brand study (2026) found unlinked brand mentions correlate with AI citations at r=0.664 while backlinks correlate at r=0.218 — a 3:1 advantage for mentions. Earned-media placements that mention a brand without linking to it now outweigh traditional link-building for AI visibility.
How often do AI answers change? Fast. 40–60% of domains cited in AI responses are completely different one month later (Conductor + Superlines, 2026). Only 30% of brands stay visible from one AI answer to the next, and only 20% remain visible across five consecutive test runs (Profound, 2026). Full breakdown in the 90-day citation half-life study.
Sources
- seoClarity. Overlap Between AI Overviews and Organic Rankings (Oct 2025, 362K queries). https://www.seoclarity.net/research/aio-rankings-overlap
- Ahrefs. Only 12% of AI Cited URLs Rank in Google's Top 10 for the Original Prompt (Aug 2025). https://ahrefs.com/blog/ai-search-overlap/
- Ahrefs. Update: 38% of AI Overview Citations Pull From The Top 10 (Mar 2026). https://ahrefs.com/blog/ai-overview-citations-top-10/
- Ahrefs. Do AI Assistants Prefer to Cite Fresh Content? (17M Citations Study, 2026). https://ahrefs.com/blog/do-ai-assistants-prefer-to-cite-fresh-content/
- ALM Corp. Google AI Overview Citations From Top-10 Pages Dropped From 76% to 38% (Feb 2026). https://almcorp.com/blog/google-ai-overview-citations-drop-top-ranking-pages-2026/
- ALM Corp. Why 85% of Pages ChatGPT Retrieves Are Never Cited (548,534-page study, 2026). https://almcorp.com/chatgpt-retrieval-fanout-google-serps-citations/
- Lantern / AskLantern. 10 Most Cited Domains Across ChatGPT, Perplexity, Gemini, and Claude (Feb 2026). https://www.asklantern.com/blogs/10-most-cited-domains-across-chatgpt-perplexity-gemini-and-claudee-here-s-the-pattern
- Semrush. The Most-Cited Domains in AI: A 3-Month Study (230K prompts, Nov 2025). https://www.semrush.com/blog/most-cited-domains-ai/
- Semrush. We Analyzed 89K LinkedIn URLs Cited in AI Search (March 2026). https://www.semrush.com/blog/linkedin-ai-visibility-study/
- Bartlett. What Content Formats Get Cited Most by AI? (200M+ citations, 2026). https://www.bradleebartlett.com/blog/what-content-formats-get-cited-by-ai
- AirOps. The 2026 State of AI Search — Structuring Content for LLMs. https://www.airops.com/report/structuring-content-for-llms
- Seer Interactive. AI Brand Visibility and Content Recency (5,000+ URL study, 2026). https://www.seerinteractive.com/insights/study-ai-brand-visibility-and-content-recency
- Aggarwal et al. GEO: Generative Engine Optimization. arXiv 2311.09735 (KDD 2024). https://arxiv.org/abs/2311.09735
- Structural Feature Engineering for Generative Engine Optimization. arXiv 2603.29979. https://arxiv.org/html/2603.29979
- SEO Smoothie. Inside ChatGPT's Citation Engine: The 2026 Blueprint (2026). https://seosmoothie.com/blog/inside-chatgpts-citation-engine-the-2026-blueprint-behind-its-search-logic/
- Ziptie.dev. Google AI Overviews Source Selection (2026). https://ziptie.dev/blog/google-ai-overviews-source-selection/
- ConvertMate. Claude Visibility Study (2026). https://www.convertmate.io/research/claude-visibility
- Profound. AI Platform Citation Patterns (680M citations, 2026). https://www.tryprofound.com/blog/ai-platform-citation-patterns
- BrightEdge. Google AI Overviews Holiday Citation Analysis (2026). https://www.brightedge.com/resources/weekly-ai-search-insights/google-ai-overviews-holiday-citation-analysis-youtube-dominance
- Superlines. AI Search Statistics 2026. https://www.superlines.io/articles/ai-search-statistics/
- Slate HQ. AI Citations Study (2026). https://slatehq.com/blog/ai-citations
About the author: The Cited Research Team runs Cited's proprietary AI citation audits across ChatGPT, Perplexity, Google AI Overviews, Gemini, and Claude. Cited is a GEO agency that gets brands recommended by AI — without touching the client's website. Run a free AI Visibility Audit to see where you are cited and where you are invisible.
Want Cited to run the audit for you?
50 target queries, 3 AI engines, competitor gap analysis. 48-hour turnaround. Free.
Get your free audit →