What GEO actually is
Generative Engine Optimisation is the work of making a brand citable by generative AI systems. It is the discipline that replaces traditional SEO when the user reads the AI answer instead of clicking through to a website. It sits alongside paid acquisition, not apart from it: the same entity signals that earn an AI citation also sharpen the audience and creative models behind AI performance marketing, so a brand that is legible to the engines is also legible to the buying systems.
GEO has four pillars:
- Entity hygiene: Organization schema, sameAs across Wikidata, Crunchbase, LinkedIn Company, and named-publisher mentions
- Structured data: Article, Service, FAQPage, HowTo, and BreadcrumbList nodes in a single @graph, cross-referenced by @id
- Sources-first content: statistic density, expert quotation, primary citation, self-contained 50 to 150 word chunks
- Bot access: a robots.txt that allows inference bots (OAI-SearchBot, PerplexityBot, Claude-SearchBot) while blocking training crawlers if you choose
The six AI engines that matter in 2026
| Engine | Volume rank | Citation behaviour |
|---|---|---|
| Google AI Overviews + AI Mode | 1 | High citation density, prefers schema-rich pages, surfaces FAQPage answers verbatim |
| ChatGPT Search | 2 | Cites sources inline with hyperlinks, prefers primary-cited content with statistic density |
| Perplexity | 3 | Source-list at end of answer, prefers self-contained chunks of 100-150 words |
| Gemini (inside Google products) | 4 | Cites alongside Google AI Overviews, similar selection criteria |
| Bing Copilot | 5 | Cites Bing-indexed content, prefers Bing-friendly schema and structured tables |
| Claude | 6 | Web search citations inline, prefers expert-quoted content and named sources |
Entity hygiene: making the AI know who you are
Before an AI engine can cite you it needs to know you exist as a discrete entity. Entity hygiene is the work of making your brand a stable, well-described node in the entity graphs the AI engines query. It carries the most weight in considered-purchase categories where buyers research before they commit, which is why we lean on it hardest for sectors like fintech, where a citation in an AI answer is often the first time a prospect meets the brand.
- Organization schema with all five identifier fields: name, alternateName, legalName, identifier (UEN, EIN, or the local equivalent), and address as a PostalAddress block
- sameAs ladder: link to Wikidata, Crunchbase, LinkedIn Company, X, YouTube, GitHub if relevant. Wikidata in particular is the public entity graph the AI engines reconcile against
- Named-publisher mentions: brand mentions in publications the AI engines crawl as authoritative (industry trade press, regulator publications, academic citations)
- Founder Person schema with hasCredential entries, sameAs to LinkedIn, jobTitle, and worksFor pointing by @id to the Organization node
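The checklist above can be sketched as a small Python script that builds the Organization and founder Person nodes and serialises them as JSON-LD. Every name, URL, and identifier value here is a hypothetical placeholder, and the property selection is one reasonable reading of the list rather than a required set.

```python
import json

# All values below are hypothetical placeholders for illustration only.
BASE = "https://example.com"
ORG_ID = f"{BASE}/#organization"

organization = {
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Example Consulting",
    "alternateName": "ExampleCo",
    "legalName": "Example Consulting Pte. Ltd.",
    # Registry identifier (UEN, EIN, ...) expressed as a PropertyValue.
    "identifier": {
        "@type": "PropertyValue",
        "propertyID": "UEN",
        "value": "000000000X",
    },
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Singapore",
        "postalCode": "000000",
        "addressCountry": "SG",
    },
    # The sameAs ladder: one URL per external profile of the same entity.
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0",
        "https://www.crunchbase.com/organization/example-consulting",
        "https://www.linkedin.com/company/example-consulting",
    ],
}

founder = {
    "@type": "Person",
    "@id": f"{BASE}/#founder",
    "name": "Jane Doe",
    "jobTitle": "Founder",
    # Pointer to the Organization node by @id instead of repeating its fields.
    "worksFor": {"@id": ORG_ID},
    "sameAs": ["https://www.linkedin.com/in/example-founder"],
    "hasCredential": [
        {
            "@type": "EducationalOccupationalCredential",
            "name": "Example Certification",
        }
    ],
}

graph = {"@context": "https://schema.org", "@graph": [organization, founder]}

# This string is what would go inside a <script type="application/ld+json"> tag.
json_ld = json.dumps(graph, indent=2)
```

Keeping the Organization `@id` in one constant means the founder node, and any later Service or WebPage node, can point at the same entity without the values drifting apart.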
Structured data: the @graph pattern
Every page on the site ships a single JSON-LD @graph with cross-referenced @id pointers. JSON-LD is the structured-data format Google recommends, and a single @graph lets every node reference the others by @id in a form the AI engines can consume. The 13 nodes that make up a complete consultancy page graph:
- Organization + ProfessionalService dual type
- Parent Organization (separate @graph node)
- Person (founder) with hasCredential
- WebSite with SearchAction
- WebPage with mainEntity, lastReviewed, reviewedBy, hasPart, speakable
- Article or TechArticle wrapping the body
- BreadcrumbList
- Service with alternateName, hasOfferCatalog, areaServed
- FAQPage with author and dateModified per Answer
- HowTo with HowToStep array
- Dataset where benchmarks are published
- ImageObject for OG image
- ImageObject for hero accent
Validate every @graph block with json.loads before deploy. A single trailing comma or unclosed brace silently invalidates the entire script tag, and the AI engines see nothing.
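A minimal sketch of that pre-deploy check, assuming the rendered HTML is available as a string. The function names are illustrative, and the regular expression is a simplification that stands in for a real HTML parser. Beyond `json.loads`, it also reports @id pointers that reference a node the block never defines.

```python
import json
import re

# Simplified matcher for JSON-LD script tags; an HTML parser is more robust.
SCRIPT_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def collect_ids(node, defined, referenced):
    """Walk parsed JSON-LD, recording defined and referenced @id values."""
    if isinstance(node, dict):
        node_id = node.get("@id")
        if node_id is not None:
            # A dict holding only "@id" is a pointer; anything richer defines a node.
            if set(node) == {"@id"}:
                referenced.add(node_id)
            else:
                defined.add(node_id)
        for value in node.values():
            collect_ids(value, defined, referenced)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, defined, referenced)


def validate_json_ld(html):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    blocks = SCRIPT_RE.findall(html)
    if not blocks:
        problems.append("no JSON-LD script tag found")
    for index, raw in enumerate(blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(
                f"block {index}: invalid JSON ({exc.msg}, line {exc.lineno})"
            )
            continue
        defined, referenced = set(), set()
        collect_ids(data, defined, referenced)
        for missing in sorted(referenced - defined):
            problems.append(
                f"block {index}: @id {missing} is referenced but never defined"
            )
    return problems
```

Run as a build step, a non-empty return value would fail the deploy, so a trailing comma is caught before the page ships.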
Sources-first content: the Princeton GEO playbook
Princeton GEO research (arXiv:2311.09735) measured what content patterns lift AI citation impressions. The five highest-impact patterns:
| Pattern | Lift in AI citation impressions | How to apply |
|---|---|---|
| Statistic addition | about 37 percent | At least one specific number per 150-200 words. Source named inline. |
| Expert quotation | about 27 percent | Quote a named industry source or regulator with attribution in the same paragraph. |
| Outbound citation | about 22 percent | Hyperlink to the primary source publisher (.gov, vendor official, academic). |
| Authoritative phrasing | about 15 percent | Phrase claims as decisive operator judgment, not hedged general-purpose advice. |
| Easy-to-understand | about 10 percent | Self-contained 50-150 word chunks that read as a complete answer on their own. |
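Two rows of the table lend themselves to a mechanical check: chunk length and statistic density. The sketch below is a rough heuristic, not part of the Princeton study. It assumes chunks are separated by blank lines and treats any numeric token as a "statistic", which will overcount dates and list numbers. The function name and thresholds are illustrative defaults taken from the ranges in the table.

```python
import re

# Matches integers, decimals, thousands-separated figures, optional percent sign.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*%?")


def audit_chunks(text, min_words=50, max_words=150, words_per_stat=200):
    """Report word count and numeric density for each blank-line-separated chunk."""
    reports = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        words = chunk.split()
        numbers = NUMBER_RE.findall(chunk)
        # One number required per `words_per_stat` words, rounded up.
        required = -(-len(words) // words_per_stat)
        reports.append(
            {
                "words": len(words),
                "numbers": len(numbers),
                "length_ok": min_words <= len(words) <= max_words,
                "stats_ok": len(numbers) >= required,
            }
        )
    return reports
```

A chunk that fails `length_ok` is a candidate for splitting or merging. One that fails `stats_ok` needs a sourced figure, which still has to be checked by hand.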
Bot access: the robots.txt tier policy
Robots.txt in 2026 is a tier policy, not a binary allow/disallow. Three tiers:
- Training crawlers: GPTBot, Google-Extended, CCBot, ClaudeBot (anthropic-ai in older configurations). Block these if your content is your IP and you do not want it baked into the next model release.
- Inference crawlers: OAI-SearchBot, PerplexityBot, Claude-SearchBot, Claude-User. Allow these. They fetch pages to answer user queries, and blocking them means you do not get cited. Gemini and Bing Copilot answer from the Googlebot and Bingbot indexes, which fall under the default tier.
- Default crawlers: Googlebot, Bingbot, DuckDuckGo, etc. Allow with standard rules.
Most marketing sites get this wrong by either blocking everything or allowing everything. The middle path (block training, allow inference) is the 2026 default for consultancy and B2B brands. Get the access tier right and citation becomes a demand source in its own right, feeding the same funnel that performance marketing spends paid budget to fill.
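The middle path can be sketched as a generated robots.txt that is then checked with Python's standard `urllib.robotparser`. The agent lists are deliberately shortened and are assumptions: user-agent tokens change, so each should be confirmed against the vendor's current crawler documentation. The helper names and the example.com URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Shortened, assumed token lists; confirm against each vendor's documentation.
TRAINING = ["GPTBot", "Google-Extended", "CCBot"]
INFERENCE = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(block, allow):
    """Emit one group per agent: blocked tiers first, then allowed, then default."""
    lines = []
    for agent in block:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    for agent in allow:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    # Default tier: every crawler not named above gets standard access.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines)


def can_fetch(robots_txt, agent, url="https://example.com/"):
    """Check the policy the way a compliant crawler would read it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(TRAINING, INFERENCE)
```

Asserting `can_fetch(robots_txt, agent)` for every agent in each tier turns the policy into a regression test, so a later edit that blocks an inference bot fails before it ships.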