Why manual tracking breaks at week two.
Any marketer can ask ChatGPT ten questions and note who gets cited. The problem shows up on the second probe. Between one run and the next, 40–60% of domains cited in AI responses are completely different (Superlines / Conductor volatility study, 2026). Over six months the drift hits 70–90% (Growth Memo, 2026). The AI citation graph is not a stable artifact you can audit once a quarter — it is a live system you have to sample continuously.
Google AI Mode is the worst offender: responses to the exact same query show only 9.2% overlap across three consecutive runs (Growth Memo, 2026). Only 30% of brands stay visible between two consecutive runs (Profound AI Search Volatility, 2026). Rolling averages are the only honest measurement.
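The arithmetic behind that advice can be sketched directly: a set-overlap check between two runs, plus a trailing mean over weekly shares. The run data below is illustrative, not from the studies cited.

```python
# Sketch: why single-run snapshots mislead, and how a rolling mean smooths them.
# Domains and weekly shares are made-up illustration data.

def overlap(a, b):
    """Jaccard overlap between the cited-domain sets of two runs."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def rolling_mean(series, window=4):
    """Trailing mean over the last `window` observations."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i + 1 - window): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

run1 = ["acme.com", "rival.io", "wiki.org"]
run2 = ["acme.com", "other.net", "blog.dev"]
print(overlap(run1, run2))  # 0.2 — only one of five distinct domains persists

weekly_share = [0.30, 0.10, 0.45, 0.20, 0.35]
print(rolling_mean(weekly_share))  # the smoothed series you actually report
```

Report the rolling mean, never the latest point: a 0.35 snapshot after a 0.20 week is noise, not a trend.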
The five engines and what they cost to probe.
| Engine | Probe method | Approx. cost / 1,000 probes |
|---|---|---|
| ChatGPT | OpenAI Responses API with web_search tool | $3–$12 |
| Perplexity | Perplexity API (sonar family) | $1–$5 |
| Google AI Overviews | SerpAPI AIO parser | $15–$50 |
| Gemini | Gemini API with grounding enabled | $1–$4 |
| Claude | Anthropic Messages API with web-search tool | $2–$8 |
Google AI Overview probing is the expensive one — SerpAPI prices it at the SERP tier because the parser has to actually render the result. The rest are LLM calls. For a mid-market tenant tracking 100 queries across all five engines daily, expect roughly $150–$400/month in raw probe cost.
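A back-of-envelope sketch of that budget, using the per-1,000 ranges from the table. A single daily pass lands below the quoted range; repeat sampling (which the volatility numbers above force on you), retries, and prompt variants are what push real spend into it. All figures are estimates, not vendor quotes.

```python
# Probe-budget estimator. Per-1,000 prices are the (low, high) ranges from
# the table above; runs_per_day > 1 reflects repeat sampling for volatility.

PRICE_PER_1K = {              # (low, high) USD per 1,000 probes
    "chatgpt":    (3, 12),
    "perplexity": (1, 5),
    "aio":        (15, 50),   # the expensive one: SERP-tier rendering
    "gemini":     (1, 4),
    "claude":     (2, 8),
}

def monthly_cost(queries=100, runs_per_day=1, days=30):
    probes = queries * runs_per_day * days   # per engine
    low = sum(lo * probes / 1000 for lo, _ in PRICE_PER_1K.values())
    high = sum(hi * probes / 1000 for _, hi in PRICE_PER_1K.values())
    return low, high

print(monthly_cost())                 # single daily run across five engines
print(monthly_cost(runs_per_day=2))  # doubled for volatility sampling
```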
The metrics that matter (and the ones that don’t).
Metric 1 — Citation share
The fraction of tracked queries where your brand appears as a cited source. Segment by engine (per-platform share) and by query category (commercial vs informational). This is your headline number. Track week-over-week; report monthly.
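A minimal sketch of the computation, assuming probe rows carry an engine and a hand-labeled query category. The row shape is illustrative, not a fixed schema.

```python
# Citation share from raw probe rows, segmented by engine and query category.
from collections import defaultdict

rows = [  # illustrative probe results for one brand
    {"query": "best crm", "engine": "chatgpt", "category": "commercial", "cited": True},
    {"query": "best crm", "engine": "perplexity", "category": "commercial", "cited": False},
    {"query": "what is crm", "engine": "chatgpt", "category": "informational", "cited": True},
    {"query": "what is crm", "engine": "perplexity", "category": "informational", "cited": True},
]

def citation_share(rows, key):
    """Fraction of probes where the brand was cited, grouped by `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        hits[r[key]] += r["cited"]
    return {k: hits[k] / totals[k] for k in totals}

print(citation_share(rows, "engine"))    # per-platform share
print(citation_share(rows, "category"))  # commercial vs informational
```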
Metric 2 — Position and snippet quality
Citations are not equal. A position-1 citation with a complete product name and a value-prop snippet drives clicks. A position-5 bare-domain mention does not. Instrument both.
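One way to fold both signals into a single instrumented score. The weights, decay curve, and field names below are illustrative assumptions, not a standard.

```python
# Hypothetical quality score: position decays linearly, snippet richness
# scales the result. Tune the constants to your own click data.

def citation_quality(position, has_product_name, has_value_prop):
    """Higher is better: position-1 rich citations outrank deep bare mentions."""
    position_score = max(0.0, 1.0 - 0.2 * (position - 1))  # hits 0 at position 6
    richness = 0.5 * has_product_name + 0.5 * has_value_prop
    return round(position_score * (0.5 + 0.5 * richness), 2)

print(citation_quality(1, True, True))    # position-1, full product snippet
print(citation_quality(5, False, False))  # position-5 bare-domain mention
```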
Metric 3 — Citation volatility
Report citation share alongside the weekly standard deviation. High volatility on a growing mean is a sign the off-site moat is still forming. Low volatility on a flat mean means you’ve plateaued — time to open a new channel.
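A sketch of reporting the share alongside its windowed standard deviation, with two illustrative series matching the patterns described:

```python
# Mean plus population std dev over a trailing window of weekly shares.
import statistics

def share_with_volatility(weekly_shares, window=4):
    recent = weekly_shares[-window:]
    return statistics.mean(recent), statistics.pstdev(recent)

growing_noisy = [0.10, 0.25, 0.15, 0.30]  # rising mean, high std: moat forming
flat_quiet    = [0.22, 0.21, 0.22, 0.21]  # flat mean, low std: plateau

print(share_with_volatility(growing_noisy))
print(share_with_volatility(flat_quiet))
```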
Metric 4 — AI referral traffic and conversion
GA4 filtered by chat.openai.com, perplexity.ai, gemini.google.com, claude.ai, and related bot referrers. AI-referred traffic converts at 14.2% vs 2.8% for organic (Semrush AI Search Study, 2025). Per-platform B2B conversion: ChatGPT 15.9%, Perplexity 10.5%, Claude 5%, Gemini 3%, Google Organic 1.76% (Seer Interactive / ALM Corp, 2026).
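Classifying referrers in an export reduces to a hostname lookup. The list below covers the hostnames named above plus obvious variants; extend it as engines move domains.

```python
# Map referrer URLs to AI engines; anything unrecognized falls through to "other".
from urllib.parse import urlparse

AI_REFERRERS = {
    "chat.openai.com": "chatgpt",
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
    "gemini.google.com": "gemini",
    "claude.ai": "claude",
}

def classify_referrer(url):
    host = urlparse(url).netloc.lower()
    return AI_REFERRERS.get(host, "other")

print(classify_referrer("https://chat.openai.com/c/abc123"))   # chatgpt
print(classify_referrer("https://www.google.com/search?q=x"))  # other
```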
Metric 5 — Time-to-first-citation
Days between publishing or distributing a new asset and the first citation appearing in an AI answer. This is your shortest feedback loop — often 7–14 days for well-placed off-site content on Reddit or LinkedIn. It also tells you which channels are working and which ones aren’t.
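A minimal sketch of the metric, applying the 14- and 30-day verdicts from the rule of thumb above:

```python
# Days from publish date to the first probe run that cites the asset.
from datetime import date

def time_to_first_citation(published, citation_dates):
    """Returns (days, verdict); (None, ...) if no citation has appeared yet."""
    cited = sorted(d for d in citation_dates if d >= published)
    if not cited:
        return None, "no citation yet"
    days = (cited[0] - published).days
    if days <= 14:
        return days, "healthy"
    if days <= 30:
        return days, "watch"
    return days, "channel problem"

print(time_to_first_citation(date(2026, 1, 5), [date(2026, 1, 16)]))
```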
Build-vs-buy — the honest math.
The DIY path is feasible. Run a nightly cron that POSTs each tracked query against OpenAI, Perplexity, Gemini, Claude, and SerpAPI, parses each response for your domain + brand entity, stores a row per (query, engine, run), and rolls up weekly share. The open-source stack (Postgres + pg-cron + Prisma/Drizzle + a Next.js dashboard) costs under $100/month in infra plus the $150–$400 in API usage from above. Fine for a small shop tracking one brand.
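The nightly loop can be sketched in a few lines. `probe_engine` below is a stub standing in for the real API calls, and the row shape mirrors the (query, engine, run) storage described; in the real system each row goes to Postgres instead of a list.

```python
# Skeleton of the nightly DIY probe loop: one row per (query, engine, run).
import datetime
import itertools

ENGINES = ["chatgpt", "perplexity", "aio", "gemini", "claude"]
QUERIES = ["best crm for startups", "crm pricing comparison"]

def probe_engine(engine, query):
    """Stub: the real version POSTs to the engine's API and returns cited domains."""
    return ["example-rival.com"]  # placeholder response

def brand_cited(cited_domains, brand_domain="acme.com"):
    return brand_domain in cited_domains

def nightly_run(run_date):
    rows = []
    for query, engine in itertools.product(QUERIES, ENGINES):
        cited = probe_engine(engine, query)
        rows.append({"run": run_date.isoformat(), "query": query,
                     "engine": engine, "cited": brand_cited(cited)})
    return rows

rows = nightly_run(datetime.date(2026, 3, 2))
print(len(rows))  # 2 queries x 5 engines = 10 rows per run
```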
Where the math breaks for most teams: (a) the probe scheduler has to deduplicate, retry 429s, and back off per engine; (b) the citation parser has to identify your brand under every reasonable spelling and strip hallucinated citations that don’t exist on the source page; (c) the volatility math needs a rolling window that doesn’t re-score your citation share every time ChatGPT flips a coin. You will rebuild the same system that Profound, Otterly, Peec, Evertune, and Cited already shipped, and the Time-to-Insight gap eats 60–90 days of program budget.
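Points (a) and (b) are small amounts of code individually; the trap is getting every edge right across five engines. A sketch of the retry-with-backoff and alias-matching pieces, with a fake rate-limited engine standing in for a real API:

```python
# Per-engine retry with exponential backoff, plus naive brand alias matching.
import time

class RateLimited(Exception):
    """Stand-in for an engine's 429 response."""

def probe_with_backoff(call, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate limits, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimited:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("engine still rate-limiting after retries")

BRAND_ALIASES = {"acme", "acme crm", "acmecrm"}  # every reasonable spelling

def mentions_brand(text):
    t = text.lower()
    return any(alias in t for alias in BRAND_ALIASES)

# Fake engine that 429s twice then succeeds; sleep is stubbed for the demo.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "Top CRMs: Acme CRM leads for startups."

answer = probe_with_backoff(flaky_call, sleep=lambda s: None)
print(attempts["n"], mentions_brand(answer))
```

The hallucinated-citation check (fetching the source page and confirming the claim exists there) is the part with no shortcut; it is a second network round-trip per citation.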
Buy when the budget for the program exceeds $3,000/month — the fixed cost of the tracker amortizes below 10% of spend. Build when you have one brand, two engineers, and a strong reason to keep probe data in-house (healthcare, finance, EU).
What Google Search Console won’t tell you.
Google Search Console does not segment AI Overview impressions, does not report clicks from Gemini grounding, and does not identify which queries triggered an AIO at all. You can partially infer AIO exposure by comparing each query's CTR against the position-weighted average: queries whose CTR sits well below that baseline on a volatility-adjusted basis almost certainly have an AI Overview absorbing the click. This is an estimate, not a signal. For anything load-bearing, probe the query directly.
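A rough sketch of that inference over a GSC export. The expected-CTR curve and the 50% threshold are illustrative assumptions, not published benchmarks; calibrate both against your own non-AIO queries.

```python
# Flag queries whose CTR sits well below a position-weighted expectation,
# suggesting an AI Overview is absorbing the click.

EXPECTED_CTR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}  # assumed curve

def likely_aio_absorbed(avg_position, impressions, clicks, threshold=0.5):
    expected = EXPECTED_CTR.get(round(avg_position), 0.03)
    actual = clicks / impressions if impressions else 0.0
    return actual < expected * threshold

print(likely_aio_absorbed(avg_position=2, impressions=1000, clicks=20))   # 2% vs ~15% expected
print(likely_aio_absorbed(avg_position=2, impressions=1000, clicks=140))
```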
Bing Webmaster Tools does surface some Copilot citation data, but the sample is thin and the API latency is long. Use it as a cross-check, not a source of truth.
Weekly review — the 30-minute ritual.
- Pull citation share by engine and by query category. Compare to rolling 4-week mean.
- Flag queries where you lost a citation in the last 7 days. Look for a common theme (channel, content-type, freshness).
- Flag queries where a new competitor entered. Open the cited source. Understand what they did; copy the structural move, not the words.
- Check time-to-first-citation on last week’s shipped content. Below 14 days is healthy; above 30 days is a channel problem, not a content problem.
- Update the gap list. Queue next week’s content briefs.
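The lost-citation flag in the checklist reduces to a set comparison between two weekly snapshots. The data below is illustrative.

```python
# Which queries were cited last week but not this week (and vice versa).
last_week = {"best crm": True, "crm pricing": True, "what is crm": True}
this_week = {"best crm": True, "crm pricing": False, "what is crm": True,
             "crm for startups": False}

lost   = [q for q in last_week if last_week[q] and not this_week.get(q, False)]
gained = [q for q in this_week if this_week[q] and not last_week.get(q, False)]

print(lost)    # queue these for the common-theme check
print(gained)  # confirm which channel earned them
```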
The shortest path to a complete answer.
Cited handles all of the above as the default configuration: five-engine probes, rolling citation share, volatility tracking, per-engine conversion reporting, gap analysis, content drafting, off-site distribution, and monthly PDF reports. If you want to see it run against your own brand, the 48-hour audit is free; pricing starts at $1,500/month for the Monitor tier.