How to Measure AI Search Visibility (Without Expensive Tools)

By Dhriti Goyal
The question I get most often from mid-market marketers in 2026 is not “should we measure AI search?” It is “how do we measure it without spending $40,000 on a platform?” The honest answer is that you can, and you should start this week.
AI search measurement has a public reputation for being inaccessible. Vendors price tools as if every brand has an enterprise budget. Most do not. But the underlying signals are not gated. Google Search Console reports AI-Mode performance for free. Bing Webmaster Tools surfaces Copilot citation signals for free. ChatGPT, Perplexity, Gemini, and AI Overviews can all be probed manually. A spreadsheet, two hours a week, and a disciplined prompt set will get a brand 70% of the directional answer a paid platform delivers at zero dollars in software.
This piece walks through that DIY program end to end: the four free layers, the setup for each, the limits of DIY, and the point at which the work returns more value automated inside Pepper Atlas.
“Search is undergoing the most profound transformation of our time. Generative AI is redefining how people discover, trust, and engage with information – moving us from keywords and rankings to intelligence and context at scale.” – Anirudh Singla, Co-founder & CEO, Pepper Content (Index’25 keynote)
That transformation is not paywalled. Neither is the measurement of it.
Why Mid-Market Teams Stall and Why They Should Not
The pattern repeats in every mid-market conversation. The CEO asks for an AI-search KPI. The team prices an enterprise tool. The price exceeds the quarterly content budget. The project gets pushed to “next year.” The competitor that does not postpone, the one that runs a manual program for three quarters before buying a tool, quietly pulls ahead on the metric that has started to move pipeline.
The cost of waiting is asymmetric. AI Overviews appear on 48% of Google queries today. By the time the budget approval lands, citation patterns have hardened, and reversing them takes two to three times the content investment that establishing them would have.
“Enterprise marketing is being re-architected around retrievability, not production volume. The teams that figured this out first did not wait for a tool. They started with a spreadsheet.” – Mandy Dhaliwal, CMO, Nutanix (Index’25)
This article is the spreadsheet.
The Four Free Measurement Layers
A working DIY measurement program runs four layers in parallel. None requires a paid tool. Together, they cover Visibility, Citation, and Referral well enough to action a content roadmap.
- Manual prompt testing – direct probing of ChatGPT, Perplexity, Gemini, AI Overviews, and AI Mode against a fixed prompt set.
- Google Search Console – AI-Mode and AI Overview impression and click data, free for any verified property.
- Bing Webmaster Tools – Copilot citation signals and crawl logs, free for any verified property.
- The DIY Spreadsheet Share of Answer Tracker – a single Google Sheet that aggregates the first three into a weekly dashboard.
Each layer is described in detail below, with the exact protocol the Pepper team uses with mid-market customers before they upgrade to Atlas.
Layer 1 – Manual Prompt Testing (The Ground Truth Layer)
Manual prompt testing is the most underrated measurement practice of 2026. It costs nothing, takes ninety minutes a fortnight, and gives a team direct line of sight into what AI engines actually say about its category. Every paid tool in the market is, fundamentally, an automated version of this practice.
The Setup (One-Time, ~2 hours)
Define a locked prompt set of 50 to 100 queries spanning three intent layers: definitional (“what is X”), comparative (“X vs Y”), and decision-stage (“best X for [use case]”). The denominator must stay fixed for trend lines to mean anything. Minimum viable platform mix: ChatGPT, Perplexity, and Google AI Overviews. Add Gemini and AI Mode once the rhythm is established.
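One way to keep the set honest is to commit it as plain data. A minimal Python sketch; the example queries, file name, and platform list below are illustrative, not prescriptive:

```python
# locked_prompts.py - the fixed denominator behind every trend line.
# Example queries are placeholders; substitute your own category terms,
# then freeze the file for the quarter.

PROMPT_SET = [
    # Definitional layer: "what is X"
    {"prompt": "what is content operations software", "intent": "definitional"},
    # Comparative layer: "X vs Y"
    {"prompt": "content operations platform vs headless CMS", "intent": "comparative"},
    # Decision-stage layer: "best X for [use case]"
    {"prompt": "best content operations platform for mid-market SaaS", "intent": "decision-stage"},
    # ...extend to 50-100 prompts across all three layers.
]

# Minimum viable platform mix; add Gemini and AI Mode once the rhythm holds.
PLATFORMS = ["ChatGPT", "Perplexity", "AI Overviews"]
```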
The Protocol (Bi-Weekly, ~90 minutes)
Open a private window – logged out, no personalisation. Run every prompt across every platform. For each result, record three things: did the AI cite the brand (Yes/No), which URL was cited, and which competitors appeared in the same answer. Paste the first 200 characters of the answer text for later qualitative review.
What This Layer Tells You
- Share of Answer: the percentage of prompts on which you are cited (computed in the sketch after this list).
- Competitor citation patterns: who else is in the answer set, and on which prompts.
- Cited URL distribution: which of your pages are doing the heavy lifting.
- Answer-text drift: how the AI’s description of your category is changing.
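For teams comfortable with a script, Share of Answer and the competitor map fall straight out of the probe log. A minimal pandas sketch, assuming the log uses the column names from the Layer 4 schema (probe_log.csv and the exact names are assumptions):

```python
import pandas as pd

# probe_log.csv: one row per (prompt, platform) from each bi-weekly run.
# Assumed columns: week, platform, prompt, cited (Y/N), cited_url, competitors
log = pd.read_csv("probe_log.csv")

# Share of Answer: cited rows / total rows, per platform per week.
log["cited_flag"] = (log["cited"] == "Y").astype(int)
soa = log.groupby(["platform", "week"])["cited_flag"].mean().mul(100).round(1)
print(soa)  # e.g. Perplexity / 2026-W06 -> 14.0 (% of prompts citing the brand)

# Competitor citation patterns: who appears most often in the answer set.
competitors = log["competitors"].dropna().str.split(",").explode().str.strip()
print(competitors.value_counts().head(10))
```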
→ Atlas: Atlas automates this protocol across 100–500 prompts daily, weights citations by platform, and surfaces answer-text drift week-over-week. The manual version produces the same directional read for free.
Layer 2 – Google Search Console AI Signals
Google Search Console quietly added AI-Mode and AI Overview reporting in late 2025. Most teams have never looked at it. It is the second-highest-leverage free signal available, and it sits inside an account every brand already owns.
What to Enable
Inside Search Console, open the Performance report. Filter Search Appearance by “AI Overviews” and, separately, by “AI Mode.” Both filters return impression and click data segmented by query, page, and country. The data populates back ninety days the moment the filter is applied.
The Three Reports That Matter
Top AI-Overview-triggering queries. Sort the AI Overviews filter by impressions descending. The top fifty queries are the AI surfaces your brand has earned a position on. Treat them as a heat map for content reinforcement.
Pages cited in AI Mode. The AI Mode filter shows which of your pages are appearing inside conversational answers. Healthy patterns concentrate citations on five to fifteen pages. A spread across more than thirty pages without a traffic uplift signals fragmentation – the AI is sampling broadly but trusting nothing.
AI-attributed CTR vs baseline. Compare CTR on AI-Overview-flagged queries against your organic baseline. Pages cited inside the AI Overview should show CTR above baseline. Pages on AI-Overview queries that are not themselves cited will show CTR sharply below baseline – the canonical signal that an AI Overview is consuming the click.
What to Do With the Data
Export weekly to the spreadsheet from Layer 4. Tag every query as “cited” or “not cited.” The not-cited list is the rebuild backlog – high-impression, zero-citation pages are the highest-leverage content investments a mid-market team will make in 2026.
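A sketch of that weekly tagging step, assuming a standard GSC Performance export (Query, Clicks, Impressions columns) and the Layer 1 probe log; the file names and join logic are illustrative:

```python
import pandas as pd

# Weekly GSC Performance export with the "AI Overviews" filter applied.
# Column names below match the standard GSC CSV export; adjust if yours differ.
gsc = pd.read_csv("gsc_ai_overviews.csv")  # Query, Clicks, Impressions, ...

# Queries where the bi-weekly manual probe (Layer 1) found a brand citation.
probes = pd.read_csv("probe_log.csv")
cited = set(probes.loc[probes["cited"] == "Y", "prompt"].str.lower())
gsc["tag"] = gsc["Query"].str.lower().isin(cited).map(
    {True: "cited", False: "not cited"}
)

# The rebuild backlog: high-impression queries with zero citations.
backlog = gsc[gsc["tag"] == "not cited"].sort_values("Impressions", ascending=False)
backlog.head(50).to_csv("rebuild_backlog.csv", index=False)

# Sanity check from the CTR report: cited queries should out-click the rest.
print(gsc.groupby("tag").apply(lambda d: d["Clicks"].sum() / d["Impressions"].sum()))
```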
→ Atlas: Atlas ingests the GSC AI-Overview and AI-Mode feeds via API and pivots them against citation data from ChatGPT, Perplexity, and Gemini in a single view. The DIY version asks you to do that pivot manually – useful for the first two quarters, painful by the third.
Layer 3 – Bing Webmaster Tools (The Copilot Window)
Bing Webmaster Tools is the most underused free measurement asset for AI search. Microsoft Copilot draws heavily on Bing’s index, and ChatGPT’s web-browsing layer relies on Bing under the hood. The Webmaster console exposes the underlying signals – free, for any verified property.
What to Enable
Verify the property (a five-minute DNS or meta-tag step). Inside the console, the “Search Performance” report includes a Copilot filter as of November 2025. The “URL Inspection” tool shows whether a URL is indexed and rendered the way Bing expects – a strong proxy for Copilot eligibility.
The Three Signals That Matter
Copilot impression share. The number of Copilot answers your domain has appeared inside, segmented by query. A directional proxy for ChatGPT visibility.
Crawl coverage. Pages flagged “Discovered, not indexed” are invisible to Copilot. Pages flagged “Indexed, not selected” are visible but not preferred. Both lists are direct content-fix backlogs.
Backlinks from authoritative sources. Bing exposes a richer backlink data set than Google. Editorial citations and tier-one publication mentions show up here first; they are leading indicators of AI citations across both Copilot and ChatGPT within four to six weeks.
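If you export the two crawl-coverage lists above, merging them into one prioritised fix backlog is a one-screen script. A sketch, assuming each export is a CSV with a URL column (file and column names are hypothetical):

```python
import pandas as pd

# Bing Webmaster crawl-coverage exports; the "URL" column name is assumed.
not_indexed = pd.read_csv("discovered_not_indexed.csv")   # invisible to Copilot
not_selected = pd.read_csv("indexed_not_selected.csv")    # visible, not preferred

not_indexed["issue"] = "Discovered, not indexed"
not_selected["issue"] = "Indexed, not selected"

# One combined backlog, worst problems first.
backlog = pd.concat([not_indexed, not_selected], ignore_index=True)
backlog.to_csv("copilot_fix_backlog.csv", index=False)
print(backlog["issue"].value_counts())
```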
→ Atlas: Atlas reconciles Bing Webmaster signals with Perplexity and ChatGPT citation data so a team can see whether a Copilot impression is converting to a ChatGPT citation. The DIY version surfaces the input but not the conversion.
Layer 4 – The DIY Spreadsheet Share of Answer Tracker
The fourth layer stitches the first three into a single, defensible measurement. One Google Sheet, one tab, nine columns, two pivot tables – and the highest-leverage marketing artefact a mid-market team will build in 2026.
The Schema
| Column | What it captures | Source | Cadence |
| --- | --- | --- | --- |
| Prompt | The exact text of the locked-set query. | Internal | Once / quarter |
| Intent layer | Definitional, comparative, or decision-stage. | Internal | Once / quarter |
| Platform | ChatGPT, Perplexity, Gemini, AI Overviews, AI Mode. | Manual | Bi-weekly |
| Cited (Y/N) | Binary – was the brand named, sourced, or hyperlinked? | Manual probe | Bi-weekly |
| Cited URL | Which page was cited, if any. | Manual probe | Bi-weekly |
| Competitors cited | Comma-separated list of brands in the same answer. | Manual probe | Bi-weekly |
| GSC AI impressions | Impressions on AI Overview / AI Mode queries. | Search Console | Weekly |
| Bing Copilot impressions | Copilot answer impressions for the property. | Bing WMT | Weekly |
| Notes / answer text | First 200 characters of the AI answer for drift review. | Manual paste | Bi-weekly |
The Two Pivot Tables
Pivot 1 – Share of Answer by platform. Rows: platform; columns: week; value: cited rows ÷ total rows. The single number to take into a CMO review.
Pivot 2 – Cited URL frequency. Rows: cited URL; columns: week; value: count. The page-level diagnostic for which assets are doing the citation work and which need rebuilding.
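Pivot 1 is the same Share of Answer calculation sketched in Layer 1. Pivot 2 is a one-liner with pandas for teams that script instead of pivot, again assuming the Layer 4 column names:

```python
import pandas as pd

log = pd.read_csv("probe_log.csv")  # the single-tab sheet, exported as CSV

# Pivot 2: cited URL frequency by week - which pages carry the citations.
url_freq = pd.pivot_table(
    log.dropna(subset=["cited_url"]),
    index="cited_url",
    columns="week",
    values="prompt",
    aggfunc="count",
    fill_value=0,
)
# Sort by the most recent week to see this fortnight's workhorse pages first.
print(url_freq.sort_values(url_freq.columns[-1], ascending=False).head(15))
```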
The Cadence That Makes It Stick
Bi-weekly manual probe (90 minutes). Weekly GSC and Bing exports (15 minutes). Monthly review meeting (45 minutes). Quarterly prompt-set audit (90 minutes). Total: roughly six hours of human labour a month, across two people.
→ Atlas: Atlas replaces the spreadsheet with a daily-refreshed dashboard, automates manual probes across 100–500 prompts and five platforms, and pipes GSC and Bing signals into one view. Most teams move to Atlas around month four – when the spreadsheet has produced enough trend data to justify the investment, and not a quarter sooner.
Where DIY Hits Its Ceiling
The free stack will get a brand to a credible Share of Answer trend, a competitor citation map, and a working content roadmap. It will not, on its own, do four things – and those four are the upgrade trigger.
Scale beyond a few hundred prompts. Manual probing of 500 prompts across five platforms exceeds the time budget of any in-house team. Atlas runs that volume daily.
Weight citations by platform behaviour. Not every citation moves the needle equally. Atlas weights Perplexity citations (high-intent traffic) differently from Gemini mentions (broader reach). DIY treats them all as Yes/No.
Pivot answer-text drift across competitors. Atlas tracks how the AI’s description of your category is changing – language that hardens into a definition you do not control unless you correct it within weeks. This is observable in DIY only if a human reads every answer.
Tie citation gains to pipeline impact. Atlas joins citation data with CRM-side conversion data so a CMO can attribute AI-search wins to revenue. This is the layer where the function-leader-grade ROI conversation happens.
The right sequencing is to start free, prove the metric matters, generate three months of trend data, and only then upgrade. Teams that buy first and measure second almost always under-use the platform.
Insights: What Marketing Leaders Are Saying About DIY Measurement
The marketing leaders at Index’25 were blunt about how their teams started – and what they wish they had done sooner.
“We measured by hand for six months before we bought anything. Those six months made us better operators than any tool ever did. The tool then made us scalable.” – Sydney Sloan, former CMO, G2 (Index’25)
“In a world where AI summarizes everything, the brands that get summarized favourably are the ones with the clearest positioning. You only see whether your positioning is winning by reading the AI answers – and reading them is free.” – Angelique Bellmer Krembs, former CMO, PepsiCo (Index’25)
“AI search collapses the distance between brand and demand. The first dashboard we built was a Google Sheet. The fact that it was free is what got it adopted.” – Joyce Hwang, Head of Marketing, Dropbox (Index’25)
“Be the source worth citing. Measuring that doesn’t require a license – it requires a habit.” – Neil Patel (Index’25 keynote)
Measurement is the keep-score function for the GEO rule book – and the score sheet starts on a free tier.
The Quiet Truth About Measuring AI Search
The CMOs winning the AI-search race in 2026 did not start with a $40,000 platform. They started with a private window, a locked prompt set, a Google Sheet, and the discipline to run the protocol every fortnight for two quarters. They built the muscle before they bought the gym.
Manual probing for ground truth. Search Console for impression data. Bing Webmaster Tools for the Copilot window. A spreadsheet to stitch them together. Six hours of work a month. Zero dollars in software. A measurement program defensible at C-level. When the trend lines justify scale, Atlas is the next step – not the first one.
→ Atlas: When you are ready, Pepper Atlas automates every layer above across 100–500 prompts and five AI surfaces, daily. Until then, the spreadsheet is enough.
Frequently Asked Questions
Is manual prompt testing credible at C-level? Yes. Every paid tool is an automated version of the same protocol. A locked prompt set, fixed cadence, and documented citation rule produce a defensible Share of Answer trend at zero cost.
How many prompts is enough? Fifty for a sanity check; 100 for a quarterly trend; 250 for a defensible benchmark; 500 for a complete competitive map.
Why use a private window? Personalisation and prior chat context contaminate the AI’s answers. A private window approximates a generic user – the audience the AI engines are tuned to serve.
When should we upgrade to Atlas? When the spreadsheet has produced two-to-three quarters of trend data, the prompt set has grown beyond manual capacity, and the CMO needs pipeline-level attribution.


