How to Run an AI Search Audit for a Brand or URL

| An AI search audit answers one question: when your buyers ask LLMs the questions that matter in your category, does your brand appear – and if not, who does, and why? Pepper’s audit methodology runs in six steps: (1) define the prompt universe, (2) run prompts across all major LLMs, (3) audit brand mentions vs. domain citations, (4) run theme-level competitor analysis, (5) generate page-level recommendations, and (6) implement and re-run monthly. This post walks through every step using a real case – the audit Pepper ran on itself, which found 0 mentions and a #612 rank, and became the foundation of an 8-month citation recovery program. |
The Audit Trail: Every Step, Mapped
- Why Every Brand Needs an AI Search Baseline – Now
- The Case Study: Pepper Audited Itself (And the Results Were Brutal)
- Step 1 – Define the Prompt Universe
- Step 2 – Run Prompts Across All Major LLMs
- Step 3 – Audit Brand Mentions vs. Domain Citations
- Step 4 – Theme-Level Competitor Analysis
- Step 5 – Page-Level Recommendations
- Step 6 – Implement and Iterate Monthly
- Industry Updates: What Marketing Leaders Are Saying
- YouTube Script
- FAQ
You can’t fix what you haven’t measured
Every GEO conversation eventually arrives at the same question: where do we actually stand? Before a brand invests in content restructuring, schema implementation, or community presence, it needs a baseline – a rigorous measurement of how it appears (or doesn’t) across the AI search surfaces its buyers use.
That measurement is the AI search audit. Done properly, it produces three things: a quantified baseline (mentions, citations, share of voice, rank), a competitor map (who owns the queries you’re missing, and what they did to own them), and a prioritized action plan (which pages to create, which to fix, in what order).
This post walks through Pepper’s complete audit methodology – the same six-step process used in Atlas audits for enterprise clients – illustrated with the most honest case study available: the audit Pepper ran on itself.
| “I’ll start by doing it tonight. Spend time looking through answer engines, asking the questions your customers are asking. See if it’s recommending you – and if it’s telling the story you want told about your brand. Start there.” – Panelist, Index ’26 ecosystem panel |
| DEFINITION: AI Search Audit (GEO Audit) |
| An AI search audit is a structured measurement of how a brand or URL appears across AI search platforms – ChatGPT, Gemini, Perplexity, Claude, and Google AI Overviews – for a defined set of buyer-relevant prompts. It quantifies brand mentions, domain citations, share of voice, and average brand position; identifies which competitors win the prompts the brand is missing; and produces page-level recommendations to close the gaps. Unlike a traditional SEO audit, which measures rankings on search results pages, an AI search audit measures presence inside generated answers. |
The Case Study: Pepper Audited Itself (And the Results Were Brutal)
In March 2026, Pepper ran its own brand through Atlas – the same audit process used for enterprise clients. We mapped 100 prompts across 10 themes covering the queries our buyers actually use: content marketing platforms, AI-era SEO and GEO optimization, content production scaling, marketing ROI, and more.
The results:
| Metric | Pepper | Top Competitor (Google) |
| Total LLM Mentions | 0 | 74 |
| Brand Rank | #612 | #1 |
| Share of Voice | 0% | 25.1% |
| Themes with Mentions | 0 of 10 | 10 of 10 |
| Domain Citations | 0 | Cited via YouTube, G2, press |
Semrush had 49 mentions. Contently had 46. HubSpot had 46. Pepper – the company that coined ‘Search Everywhere Optimization’ and runs the Index GEO Growth Summit – had zero.
This is the most important property of a well-run audit: it doesn’t care about your brand story. It measures what the machines actually say. And what they say is frequently a shock – which is exactly why the audit must come before the strategy.
| Why this case is instructive: Pepper’s product results were exceptional – Freshworks at a $330K renewal, Atlassian at 2.8x clicks per article, Mutual of Omaha at 189% MoM click growth. The zero-citation result wasn’t a product problem. It was a visibility and distribution problem. That distinction – which only an audit can establish – determines the entire strategy that follows. |
| STEP 1 | Define the Prompt Universe: The 100 questions your buyers actually ask LLMs |
The prompt universe is the foundation of the entire audit. Get it wrong, and every downstream measurement is measuring the wrong thing. The prompt universe is the set of questions your actual buyers ask AI systems during their discovery, evaluation, and decision process – not the keywords you wish they searched.
Pepper’s standard structure: 100 prompts organized into 10 themes, 10 prompts per theme. For the self-audit, themes included ‘AI-Era SEO & GEO Optimization,’ ‘Content Production Efficiency & Scalability,’ ‘Content Marketing ROI & Performance Measurement,’ and ‘Affordable Content & Marketing Solutions for Startups.’
Draw from four prompt sources:
- Sales call language – The exact phrases prospects use when describing their problem. ‘How do we scale content without hiring 20 writers’ is a prompt. ‘Content scaling solutions’ is a keyword. Use the prompt.
- Funnel-stage variants – Cover TOFU (definitional: ‘what is GEO’), MOFU (evaluative: ‘best GEO platforms for enterprise’), and BOFU (comparative: ‘[Brand A] vs [Brand B]’, ‘[Competitor] alternatives’) prompts in every theme.
- Community phrasing – Mine Reddit threads, Quora questions, and LinkedIn comments in your category. These are verbatim records of how your market phrases its questions.
- LLM-suggested expansions – Ask the LLMs themselves: ‘What questions do marketing leaders ask about [category]?’ The fan-out queries the models suggest are the queries they’re already answering for someone.
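The 10-by-10 structure and the funnel-stage rule above are easy to check mechanically before any prompt is run. The following is a minimal sketch, not Atlas's implementation: the `Prompt` fields, the `source` tags, and the `check_universe` helper are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Funnel stages named in the text.
STAGES = ("TOFU", "MOFU", "BOFU")


@dataclass(frozen=True)
class Prompt:
    theme: str   # one of the audit themes
    stage: str   # "TOFU", "MOFU", or "BOFU"
    source: str  # hypothetical tag: "sales", "funnel", "community", "llm"
    text: str    # the question as a buyer would phrase it


def check_universe(prompts, themes=10, per_theme=10):
    """Return a list of problems; an empty list means the set fits the structure."""
    problems = []
    by_theme = Counter(p.theme for p in prompts)
    if len(by_theme) != themes:
        problems.append(f"expected {themes} themes, found {len(by_theme)}")
    for theme, count in sorted(by_theme.items()):
        if count != per_theme:
            problems.append(f"{theme}: expected {per_theme} prompts, found {count}")
        stages = {p.stage for p in prompts if p.theme == theme}
        missing = [s for s in STAGES if s not in stages]
        if missing:
            problems.append(f"{theme}: missing stages {', '.join(missing)}")
    return problems


# Tiny demo with one theme and two prompts (sizes shrunk for readability).
demo = [
    Prompt("GEO", "TOFU", "funnel", "what is GEO"),
    Prompt("GEO", "MOFU", "funnel", "best GEO platforms for enterprise"),
]
print(check_universe(demo, themes=1, per_theme=2))
# ['GEO: missing stages BOFU']
```

With the defaults (`themes=10`, `per_theme=10`), the same check flags any theme that drifts from 10 prompts or skips a funnel stage.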
| STEP 2 | Run Prompts Across All Major LLMs: One model is not a baseline. Five models are. |
Every prompt must run across every major AI search surface. LLM answers diverge significantly – a brand can be well-cited in Perplexity and invisible in ChatGPT, because each platform has different retrieval sources, different training data emphases, and different citation behaviors.
The minimum platform set: ChatGPT, Google Gemini (including AI Overviews), Perplexity, and Claude. Add Bing Copilot for B2B categories where Microsoft ecosystem buyers matter.
Execution notes that matter for measurement validity:
- Run prompts in clean sessions – no conversation history, no custom instructions, no memory. You’re measuring the default answer a new buyer receives.
- Record the full response, not just whether your brand appeared – position in the answer, sentiment of the mention, and which URLs were cited as sources.
- Run the full set in a consistent window (1–2 days) – answers drift over time, and a scattered collection window corrupts comparability.
- Automate it – manually running 100 prompts across 5 platforms is 500 executions per cycle. This is what Atlas automates, with weekly re-scans and competitor alerts.
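To make the 500-execution arithmetic concrete, here is a rough sketch of how a run plan could be enumerated. It makes no API calls; the platform names are plain labels taken from the text, and `build_run_plan` and the `clean_session` flag are hypothetical names, not part of any real tool.

```python
from itertools import product

# Labels only; these strings are not identifiers for any real API.
PLATFORMS = ["ChatGPT", "Gemini", "Google AI Overviews", "Perplexity", "Claude"]


def build_run_plan(prompts, platforms=PLATFORMS):
    """Pair every prompt with every platform: one clean-session execution each."""
    return [
        {"prompt": prompt, "platform": platform, "clean_session": True}
        for prompt, platform in product(prompts, platforms)
    ]


# Placeholder strings standing in for the 100-prompt universe.
prompts = [f"prompt {i}" for i in range(1, 101)]
plan = build_run_plan(prompts)
print(len(plan))  # 500
```

Each entry in the plan is one unit of work whose full response, brand position, sentiment, and cited URLs would then be recorded.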
| STEP 3 | Audit Brand Mentions vs. Domain Citations: Two different metrics. Two different problems. Never conflate them. |
This is the most commonly misunderstood distinction in AI search measurement. A brand mention is when the LLM names your brand in its answer. A domain citation is when the LLM cites your website as a source. They are different signals, with different causes, requiring different fixes.
| Brand Mentions | Domain Citations |
| LLM names your brand in the answer text | LLM links your website as a source for the answer |
| Driven by: training data presence, third-party coverage, entity recognition, directory listings | Driven by: content structure, schema, retrievability, indexed answer-format pages |
| Fix with: G2/Capterra profiles, press coverage, Wikipedia/Wikidata, community presence | Fix with: structured pages targeting prompts, FAQ schema, llms.txt, chunkable content |
| Measured as: Brand Coverage (% of prompts mentioning you) and Share of Voice | Measured as: Domain Coverage (% of prompts citing your site) and Domain Citation count |
| Failure means: the model doesn’t know your brand exists | Failure means: the model knows you but won’t use your site as evidence |
The six core metrics every audit should report: Brand Mentions (count), Share of Voice (your mentions ÷ all brand mentions), Brand Position (average position in answers), Domain Citations (count), Brand Coverage (% of prompts mentioning you), and Domain Coverage (% of prompts citing your site).
The diagnostic power is in the combination. High mentions + low citations means the market knows you but your site isn’t structured for retrieval. Low mentions + low citations – Pepper’s own starting state – means an entity and distribution problem that content fixes alone won’t solve.
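The six metrics and the mention/citation combination can be sketched in a few lines. This is an illustration under stated assumptions, not Atlas's formulas: a mention is counted once per response, coverage is counted per distinct prompt (covered if any platform's answer qualifies), position is the brand's order of appearance among named brands, and the `threshold` separating "high" from "low" is an arbitrary placeholder. `Response`, `audit_metrics`, and `diagnose` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    prompt: str
    platform: str
    brands: list = field(default_factory=list)         # brands in order of appearance
    cited_domains: list = field(default_factory=list)  # one entry per cited URL


def audit_metrics(responses, brand, domain):
    prompts = {r.prompt for r in responses}
    mentioning = [r for r in responses if brand in r.brands]
    # Assumption: each brand counts once per response toward the denominator.
    all_mentions = sum(len(set(r.brands)) for r in responses)
    positions = [r.brands.index(brand) + 1 for r in mentioning]
    mentioned_prompts = {r.prompt for r in mentioning}
    cited_prompts = {r.prompt for r in responses if domain in r.cited_domains}
    return {
        "brand_mentions": len(mentioning),
        "share_of_voice": len(mentioning) / all_mentions if all_mentions else 0.0,
        "brand_position": sum(positions) / len(positions) if positions else None,
        "domain_citations": sum(r.cited_domains.count(domain) for r in responses),
        "brand_coverage": len(mentioned_prompts) / len(prompts) if prompts else 0.0,
        "domain_coverage": len(cited_prompts) / len(prompts) if prompts else 0.0,
    }


def diagnose(metrics, threshold=0.2):
    # threshold is an illustrative cut-off, not a value from the article.
    high_mentions = metrics["brand_coverage"] >= threshold
    high_citations = metrics["domain_coverage"] >= threshold
    if high_mentions and not high_citations:
        return "known brand, site not structured for retrieval"
    if not high_mentions and not high_citations:
        return "entity and distribution problem"
    return "combination not characterized in the text"


# Made-up responses for two prompts; brand and domain names are placeholders.
sample = [
    Response("best GEO platforms", "ChatGPT", ["Acme", "Globex"], ["globex.example"]),
    Response("best GEO platforms", "Perplexity", ["Globex"], ["globex.example", "acme.example"]),
    Response("what is GEO", "ChatGPT", [], []),
]
metrics = audit_metrics(sample, "Acme", "acme.example")
print(metrics["brand_mentions"], metrics["domain_citations"], metrics["brand_coverage"])
# 1 1 0.5
```

A brand with zero coverage on both axes, Pepper's starting state, falls into the "entity and distribution problem" branch.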
| STEP 4 | Theme-Level Competitor Analysis: Who wins each theme – and what they did to win it |
Aggregate numbers hide the strategy. Theme-level analysis reveals it. For each of the 10 themes, the audit builds a benchmark: what percentage of that theme’s prompts does each competitor appear in?
In Pepper’s self-audit, the theme benchmarks were revealing: Google owned 63% visibility on ‘SEO & Organic Growth on Limited Budget’ but only 10% on ‘Expert Content Talent & Production Scaling.’ Contently led ‘Content Production Scaling’ themes at 20–27%. HubSpot dominated ‘AI-Powered Marketing Transformation & ROI’ at 47%. No single competitor owned everything – each had carved out theme-level territories.
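A theme benchmark of this kind reduces to one question per competitor: in what share of a theme's prompts does the brand appear? A minimal sketch follows, assuming each recorded answer has been reduced to a `(theme, prompt, brands_named)` tuple; that row shape and the `theme_visibility` helper are hypothetical, and the demo data is invented.

```python
from collections import defaultdict


def theme_visibility(rows):
    """rows: iterable of (theme, prompt, brands_named). Returns {theme: {brand: share}}."""
    prompts_in_theme = defaultdict(set)
    prompts_with_brand = defaultdict(lambda: defaultdict(set))
    for theme, prompt, brands in rows:
        prompts_in_theme[theme].add(prompt)
        for brand in brands:
            prompts_with_brand[theme][brand].add(prompt)
    return {
        theme: {
            brand: len(hits) / len(prompts_in_theme[theme])
            for brand, hits in prompts_with_brand[theme].items()
        }
        for theme in prompts_in_theme
    }


# Invented rows: theme and prompt labels are placeholders.
rows = [
    ("Scaling", "p1", ["Contently"]),
    ("Scaling", "p2", []),
    ("Scaling", "p1", ["Contently", "HubSpot"]),  # same prompt on a second platform
    ("ROI", "p3", ["HubSpot"]),
]
benchmark = theme_visibility(rows)
print(benchmark["Scaling"])  # {'Contently': 0.5, 'HubSpot': 0.5}
```

Counting distinct prompts rather than raw responses keeps a brand from looking stronger simply because one prompt was run on more platforms.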
For each theme, the analysis answers three questions:
- Who wins this theme, and at what visibility percentage? – This sets the realistic target. Beating a 63% incumbent requires a different investment than closing a gap on a theme where the leader sits at 17%.
- What assets earn their visibility? – Trace the citations back. Is the competitor winning through comparison pages? YouTube videos? G2 category placement? Reddit threads? Press coverage? Each asset type implies a different replication path.
- Which themes are winnable in 90 days vs. 12 months? – Themes with fragmented leadership (no competitor above 20%) and asset types you can produce quickly (comparison pages, FAQ guides) are the 90-day targets. Themes with entrenched leaders and entity-driven visibility (Wikipedia, massive review bases) are long-cycle plays.
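The 90-day versus long-cycle split can be written as a simple rule. The sketch below is deliberately binary, which is a simplification of the judgment described above; the asset labels in `QUICK_ASSETS` and the `classify_theme` helper are hypothetical, while the 20% cut-off comes from the text.

```python
# Asset types the text describes as quick to produce; the labels are illustrative.
QUICK_ASSETS = {"comparison page", "faq guide"}


def classify_theme(visibility, winning_assets, fragmented_below=0.20):
    """visibility: {competitor: share of theme prompts}; winning_assets: asset labels."""
    leader_share = max(visibility.values(), default=0.0)
    fragmented = leader_share <= fragmented_below  # no competitor above 20%
    quick = bool(winning_assets) and set(winning_assets) <= QUICK_ASSETS
    return "90-day target" if fragmented and quick else "long-cycle play"


print(classify_theme({"A": 0.17, "B": 0.10}, ["comparison page"]))  # 90-day target
print(classify_theme({"Google": 0.63}, ["comparison page"]))        # long-cycle play
print(classify_theme({"A": 0.17}, ["wikipedia"]))                   # long-cycle play
```

In practice a theme can sit between the two classes, so the output is best read as a starting label for discussion rather than a verdict.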
| STEP 5 | Page-Level Recommendations: Translate every gap into a specific URL to create or fix |
An audit that ends with ‘improve your content’ has failed. The output of Step 5 is a prioritized list of specific pages – URLs to create, URLs to restructure – each mapped to the prompts it targets and the citation gap it closes.
From Pepper’s self-audit, the page-level recommendations were concrete:
- Create comparison/alternative pages – ‘Alternatives to Contently / Skyword / ClearVoice’ appeared across 5 of 10 themes with no Pepper page targeting any of them. Highest-ROI single opportunity in the entire audit.
- Create the definitional /geo page – Pepper coined ‘Search Everywhere Optimization’ yet had no authoritative GEO definition page. Competitors filled the definitional queries Pepper should own.
- Launch YouTube – the #1 cited domain across LLMs (95 pages in Atlas data), where Pepper had zero presence. Largest single channel gap.
- Fix technical signals – no llms.txt, no Organization schema, no Wikidata entity. AI crawlers could not identify what Pepper was.
- Claim G2 and review platforms – G2 alone drives 14 LLM citation pages; directory presence carries 20–70% citation weight. Pepper was unlisted.
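For the missing Organization schema, a minimal JSON-LD payload might be generated as below. The property names (`@context`, `@type`, `name`, `url`, `sameAs`) are standard schema.org vocabulary; every value shown is a placeholder, and the two helper functions are hypothetical. This sketch does not cover llms.txt or Wikidata.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a minimal schema.org Organization description as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),  # profile URLs that identify the same entity
    }


def to_script_tag(data):
    """Wrap the JSON-LD in the script tag used to embed it in a page."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Placeholder values only; swap in the brand's real name and profile URLs.
markup = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    ["https://www.example.org/profiles/example-co"],
)
print(to_script_tag(markup))
```

The `sameAs` list is where review-platform and knowledge-base profiles would go once they exist.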
Each recommendation carries three attributes: the prompts it targets (from the universe), the expected citation impact (based on asset-type weight), and the effort class (hours for technical fixes, days for pages, months for entity building). That triage produces the execution sequence.
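One way to turn those three attributes into an execution sequence is a plain sort. The ordering rule here (cheapest effort class first, then highest expected impact, then most prompts covered) is an assumption made for illustration; the article does not specify the exact rule, and the impact scores in the demo are invented.

```python
from dataclasses import dataclass

# Effort classes from the text, ordered from fastest to slowest.
EFFORT_ORDER = {"hours": 0, "days": 1, "months": 2}


@dataclass
class Recommendation:
    page: str      # URL or asset to create or fix
    prompts: list  # prompts from the universe that it targets
    impact: float  # expected citation impact (hypothetical 0-1 score)
    effort: str    # "hours", "days", or "months"


def execution_sequence(recs):
    """Assumed rule: quickest effort class first, then impact, then prompt count."""
    return sorted(recs, key=lambda r: (EFFORT_ORDER[r.effort], -r.impact, -len(r.prompts)))


# Invented scores purely to show the ordering.
recs = [
    Recommendation("entity building", ["p1"], 0.9, "months"),
    Recommendation("comparison page", ["p1", "p2"], 0.6, "days"),
    Recommendation("schema fix", ["p3"], 0.3, "hours"),
    Recommendation("definition page", ["p4"], 0.8, "days"),
]
print([r.page for r in execution_sequence(recs)])
# ['schema fix', 'definition page', 'comparison page', 'entity building']
```

A team that prefers impact over speed would simply reorder the sort key.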
| STEP 6 | Implement and Iterate Monthly: The audit is a loop, not a report |
The audit’s value compounds only if it re-runs. The final step of the methodology: implement the page-level recommendations, then re-run the full prompt set monthly to measure lift, catch competitor moves, and uncover new topical gaps.
The monthly re-run answers: which new pages earned citations? Which themes moved? Did any competitor spike (and from what asset)? What new prompts are buyers asking that the universe should absorb? Pepper’s standard practice adds 20 new prompts per month to the tracked set as new content publishes.
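The "did any competitor spike" question comes down to a diff between two runs. A minimal sketch, assuming each run has been summarized as a `{brand: mention_count}` dict; the `min_gain` threshold and both helper names are hypothetical, and the counts are invented.

```python
def mention_deltas(previous, current):
    """Change in mention count per brand between two monthly runs."""
    brands = set(previous) | set(current)
    return {b: current.get(b, 0) - previous.get(b, 0) for b in sorted(brands)}


def spikes(deltas, min_gain=10):
    """Brands whose mentions rose by at least min_gain (an illustrative cut-off)."""
    return [brand for brand, delta in deltas.items() if delta >= min_gain]


# Invented counts for two consecutive runs.
last_month = {"BrandA": 40, "BrandB": 12}
this_month = {"BrandA": 42, "BrandB": 30, "BrandC": 5}
deltas = mention_deltas(last_month, this_month)
print(deltas)          # {'BrandA': 2, 'BrandB': 18, 'BrandC': 5}
print(spikes(deltas))  # ['BrandB']
```

Tracing a flagged brand's new citations back to their source pages answers the "from what asset" half of the question.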
In Pepper’s own program, the audit baseline (0 citations, #612) became the measurement spine for an 8-month execution calendar – with monthly Atlas reports tracking progress against targets of 30+ citations by Month 3, 100+ by Month 6, and 200+ by Month 8.
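Tracking against those milestones is a small lookup. In the sketch below the targets are the ones stated above; the monthly citation counts in the demo are invented, and `milestone_status` is a hypothetical helper.

```python
# Month -> minimum citation count, as stated in the program targets.
TARGETS = {3: 30, 6: 100, 8: 200}


def milestone_status(citations_by_month, targets=TARGETS):
    """Label each milestone as met, missed, or pending (no report for that month yet)."""
    status = {}
    for month, target in sorted(targets.items()):
        actual = citations_by_month.get(month)
        if actual is None:
            status[month] = "pending"
        else:
            status[month] = "met" if actual >= target else "missed"
    return status


# Invented progress data: baseline plus two later monthly reports.
progress = {0: 0, 3: 34, 6: 90}
print(milestone_status(progress))
# {3: 'met', 6: 'missed', 8: 'pending'}
```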
Industry Updates: What Marketing Leaders Are Saying
‘Go Do a Search. We’re Not Showing Up.’
At Pepper’s Index ’26 summit, Allison from O’Reilly described how a simple manual audit became the internal catalyst for the entire GEO program: ‘I actually had to start at the top. I started with the president. And I was like, go do a search. Go search for top learning platforms for a tech team. We don’t even show up. But our buyers are there, and we’re not.’ The lesson for practitioners: before the formal audit, the informal one – a leadership team watching their brand fail to appear – is the fastest budget-unlocking demonstration in GEO.
The ‘Audit Tonight’ Directive From the Ecosystem Panel
The Index ’26 ecosystem panel closed with a homework assignment: spend the evening querying answer engines with your customers’ actual questions, checking whether the story being told about your brand is the story you want told. The framing matters – the panelists positioned the audit not as a vendor deliverable but as a recurring leadership discipline, with one adding that gaps found at night should become content briefs by the next morning.
LLM Traffic Converts 4–6x Higher – Which Raises Audit Stakes
A data point from the Index ’26 enterprise CMO panel reframes why audit gaps are expensive: community data shared on stage showed LLM-referred traffic converting 4 to 6 times higher than other channels. Every prompt where a competitor appears and you don’t isn’t just a visibility loss – it’s a loss of the highest-converting traffic source currently measurable. The audit quantifies exactly how much of that traffic is being conceded, theme by theme.
Boards Are Now Asking for the Baseline
Christine, a CMO on the Index ’26 enterprise panel, described educating her board on GEO metrics: ‘We’ve had to educate our board a little bit on GEO and what are the metrics that you measure to see success. We measure it down to the pipeline and the revenue.’ The audit baseline – share of voice, theme coverage, citation counts – is becoming standard board-reporting material. CMOs who can’t produce a measured baseline are increasingly the exception in enterprise reviews.
Closed-Loop Audit Systems Are the Next Wave
Dave, an investor on the Index ’26 ecosystem panel, flagged where audit methodology is heading: ‘Startups coming out nowadays work with companies like Pepper to optimize appearance in answer engines, but also then take that signal, go back automatically, produce new content for all the channels, and then measure the impact of that content. There’s a closed-loop nature to this.’ The monthly manual re-run is the current standard; the automated audit-to-content-to-measurement loop is the emerging one.
FAQ: Running an AI Search Audit
What is an AI search audit?
An AI search audit is a structured measurement of how a brand or URL appears across AI search platforms – ChatGPT, Gemini, Perplexity, Claude, and Google AI Overviews – for a defined set of buyer-relevant prompts. It quantifies brand mentions, domain citations, share of voice, and brand position; maps which competitors win the prompts the brand misses; and converts those gaps into page-level recommendations. It differs from an SEO audit by measuring presence inside generated answers rather than rankings on results pages.
How many prompts should an AI search audit include?
Pepper’s standard is 100 prompts organized into 10 themes of 10 prompts each, covering TOFU (definitional), MOFU (evaluative), and BOFU (comparative) buyer questions. Fewer than 50 prompts produce unreliable theme-level analysis; more than 200 add collection cost without proportional insight at baseline. The set should grow by roughly 20 prompts per month as new content publishes and new buyer questions surface in sales calls and communities.
What is the difference between a brand mention and a domain citation?
A brand mention is when an LLM names your brand in its answer text; a domain citation is when the LLM links your website as a source. Mentions are driven by training-data presence, third-party coverage, and entity recognition – fixed through directories, press, and Wikipedia/Wikidata. Citations are driven by content structure and retrievability – fixed through structured pages, FAQ schema, and llms.txt. High mentions with low citations means the market knows you but your site isn’t retrievable; low on both means an entity problem content alone won’t solve.
Which LLMs should an AI search audit cover?
The minimum platform set is ChatGPT, Google Gemini (including AI Overviews), Perplexity, and Claude – adding Bing Copilot for B2B categories with Microsoft-ecosystem buyers. Coverage across all platforms matters because answers diverge significantly: a brand can lead in Perplexity while remaining invisible in ChatGPT, since each platform uses different retrieval sources and citation behaviors. Single-platform audits routinely produce false confidence.
How often should an AI search audit be re-run?
Monthly for the full prompt set, with weekly scans on priority queries. LLM answers shift as new content gets indexed, competitors publish, and models update – a quarterly cadence misses competitive moves for up to 90 days. The monthly re-run measures lift from implemented recommendations, catches competitor citation spikes, and surfaces new prompts to absorb into the tracked universe. Platforms like Pepper’s Atlas automate the re-run with alerts for significant changes.
| Want the audit run for your brand? Pepper’s Atlas platform executes this exact methodology – 100-prompt universe, five-LLM coverage, mention and citation measurement, theme benchmarks, and page-level recommendations – with monthly re-runs built in. Start your AI search audit at atlas.pepper.inc |