Traditional SEO is a fight to enter Google's top 10. GEO (Generative Engine Optimization) plays a different game: 83% of Google AI Overview citations come from pages outside the top 10. This guide shows you, with real data and copy-paste code, how to configure your site in one hour so ChatGPT, Claude, Perplexity, and Copilot find and cite it.
GEO (Generative Engine Optimization) is the set of practices that make your website visible and citable by generative search engines like ChatGPT, Claude, Perplexity, Google AI Overview, and Microsoft Copilot. Unlike classic SEO, GEO isn't about climbing positions: it's about helping AI understand what you already have.
The term was coined in the paper "GEO: Generative Engine Optimization," published by Princeton and IIT Delhi researchers at KDD 2024. The central finding: visibility in AI responses increases up to 115% by adding authoritative citations, 43% with direct quotations from credible sources, and 33% with relevant statistics.
The traffic data also explains why we're talking about GEO now and not earlier: AI search grew 527% year-over-year in the first half of 2025, ChatGPT reached 900 million weekly active users by February 2026, and AI-referred traffic converts at 5x the rate of traditional search. Even so, it still represents less than 1% of total traffic. That single fact reframes everything:
"GEO is a brand visibility strategy, not a traffic strategy. Worth an hour of setup, not a week." — @HiTw93
If you come from traditional SEO, the first thing to understand is that the rules are different. This table summarizes the differences that matter most when planning content:
| Aspect | Traditional SEO | GEO |
|---|---|---|
| Goal | Top 10 in Google | Get cited in AI answers |
| Key metric | Position + clicks | Citations + retrieval-to-citation rate |
| Signals that matter | PageRank, backlinks, CTR | Clear structure, reliable sources, specific data |
| Where citations come from | Top 10 results | 83% from outside the top 10 |
That last row is the most important news for small sites: the PageRank moat no longer protects the giants in the AI era. If your README or documentation is well-written, you can outrank a massive site with thin content.
Most people treat robots.txt as a binary switch: either block all AI crawlers or let them all through. That's a costly mistake. AI crawlers do very different things and should be handled separately.
| Type | Examples | What they do | Recommendation |
|---|---|---|---|
| Training | GPTBot, ClaudeBot, CCBot, Meta-ExternalAgent | Take your content to train future models | Block if you want to opt out of training |
| Search and retrieval | OAI-SearchBot, Claude-SearchBot, PerplexityBot | Fetch in real time to answer queries | Always allow |
| User-triggered | ChatGPT-User, Claude-User, Perplexity-User | Only fire when someone pastes your URL into a chat | Always allow |
| Undeclared | Bytespider, unidentified bots | Don't follow rules | Block |
The most expensive mistake: blocking OAI-SearchBot thinking you're protecting your content. All you actually achieve is disappearing from ChatGPT's search results while getting nothing in return.
Create a robots.txt file at the root of your site. The strategy recommended by @HiTw93 is: allow search and user-triggered crawlers, block training and undeclared ones.
```
# Search and retrieval: allow
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# User-triggered: allow
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

# Training: block (optional)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Opt-out tokens
User-agent: Google-Extended
Disallow: /

# Undeclared: block
User-agent: Bytespider
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
```
If your priority is long-term brand exposure (so the next generation of models knows about you), keep GPTBot and CCBot on Allow. If your priority is control, block them. Bytespider should always be blocked: it doesn't identify itself properly and doesn't respect rules.
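Once the file is deployed, you can verify it applies the intended policy per crawler with Python's standard-library robots.txt parser. A minimal sketch, assuming yoursite.com is replaced with your own domain:

```python
# Sanity-check the live robots.txt: every crawler in ALLOWED should
# be able to fetch, every crawler in BLOCKED should not.
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"  # placeholder: your own domain

ALLOWED = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
           "ChatGPT-User", "Claude-User"]
# Google-Extended is an opt-out token rather than a fetching crawler,
# but can_fetch still reflects the policy line for it.
BLOCKED = ["GPTBot", "CCBot", "Google-Extended", "Bytespider"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetch and parse the deployed file

for agent in ALLOWED + BLOCKED:
    allowed = parser.can_fetch(agent, f"{SITE}/")
    expected = agent in ALLOWED
    flag = "ok" if allowed == expected else "MISMATCH"
    print(f"{flag:10} {agent}: allowed={allowed}, expected={expected}")
```

Any MISMATCH line means the deployed file doesn't match the strategy you chose above.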
llms.txt is a new standard, similar to robots.txt but designed for AI consumption. You place a Markdown file at the root of your site describing what it does, its key pages, and who's behind it. AI systems prioritize this file when crawling your content.
According to BuiltWith, more than 840,000 sites have already deployed llms.txt, including Anthropic, Cloudflare, Stripe, and Vercel. But SE Ranking's survey of 300,000 domains shows real adoption at only 10%. In other words: you're early, and that's an advantage.
The format is simple. Create /llms.txt with this structure:
```markdown
# Your project name

> One-line description of what this is.

## Links

- [Documentation](https://yoursite.com/docs)
- [GitHub](https://github.com/you/project)
- [Blog](https://yoursite.com/blog)

## About

Short paragraph explaining the project, its purpose,
key features, and what makes it different.
```
After creating it, submit it to directory.llmstxt.cloud, llmstxt.site, and the llms-txt-hub repo on GitHub via pull request. If you have multiple sites, make each llms.txt link to the others: they form a discovery mesh where any crawler entering through one site can find the rest.
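There's no official validator yet, but a quick structural check against the layout above is easy to script. A minimal Python sketch; the checks mirror the example format shown here, not a formal spec:

```python
# Informal structural check for llms.txt: H1 title, "> " summary,
# "## " sections, and at least one markdown link.
import re
from pathlib import Path

def check_llms_txt(path: str = "llms.txt") -> list[str]:
    text = Path(path).read_text(encoding="utf-8")
    lines = [l for l in text.splitlines() if l.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title ('# Your project name')")
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing one-line '> ' summary")
    if not any(l.startswith("## ") for l in lines):
        problems.append("missing '## ' sections (Links, About, ...)")
    if not re.search(r"\[[^\]]+\]\(https?://[^)]+\)", text):
        problems.append("no markdown links found")
    return problems

issues = check_llms_txt()
print("structure looks good" if not issues else "\n".join(issues))
```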
While llms.txt is the summary, llms-full.txt is the complete version: typically between 30 and 60 KB, with project descriptions, use cases, competitor comparisons, and README excerpts. Mintlify's CDN analysis shows that llms-full.txt receives 3 to 4 times more traffic than llms.txt. When an AI system finds the summary, it almost always goes looking for the full version.
In practice, llms-full.txt is where you concentrate the three highest-impact ingredients from the Princeton paper: authoritative citations, direct quotations, and statistics. It's the file that most influences whether AI cites you.
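If your pages already exist as Markdown, one low-effort way to assemble llms-full.txt is to concatenate llms.txt with the sources of your key pages. A sketch, assuming a hypothetical content/ directory and page list; adapt both to your own layout:

```python
# Build llms-full.txt from llms.txt plus the Markdown sources of
# your key pages. PAGES and the content/ directory are assumptions.
from pathlib import Path

PAGES = ["docs/index.md", "docs/install.md", "blog/announcement.md"]

parts = [Path("llms.txt").read_text(encoding="utf-8")]
for page in PAGES:
    md = Path("content") / page
    parts.append(f"\n\n## {md.stem}\n\n" + md.read_text(encoding="utf-8"))

out = Path("llms-full.txt")
out.write_text("".join(parts), encoding="utf-8")
print(f"wrote {out} ({out.stat().st_size / 1024:.1f} KB)")  # aim for ~30-60 KB
```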
A typical 15,000-token HTML page becomes a 3,000-token Markdown document. That's 80% less noise for AI. Evil Martians recommends serving a .md version of every page on your site.
The simplest way to tell AI that a Markdown version exists is to add this line to your HTML <head>:
<link rel="alternate" type="text/markdown" href="/page.md" />
Claude Code and Cursor already send Accept: text/markdown headers by default when fetching documentation. This is standard HTTP/1.1 content negotiation, around since 1997: not magic, just protocol.
Important: never serve different content to bots and humans based on User-Agent. That's cloaking and Google will penalize you. Use the alternate mechanism instead, which is the clean path.
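For illustration, here's what the clean path can look like server-side. A minimal Flask sketch, where the route and content/ paths are assumptions: the same URL serves Markdown or HTML depending on the Accept header, never on the User-Agent.

```python
# Accept-based content negotiation: same content, two representations.
# This is standard HTTP, not cloaking, because the choice is driven
# by the Accept header, not by who the client claims to be.
from pathlib import Path
from flask import Flask, Response, request

app = Flask(__name__)

def respond(body: str, mimetype: str) -> Response:
    resp = Response(body, mimetype=mimetype)
    resp.headers["Vary"] = "Accept"  # tell caches the reply depends on Accept
    return resp

@app.route("/docs/<page>")
def docs(page: str):
    if "text/markdown" in request.headers.get("Accept", ""):
        md = Path(f"content/{page}.md").read_text(encoding="utf-8")
        return respond(md, "text/markdown")
    html = Path(f"content/{page}.html").read_text(encoding="utf-8")
    return respond(html, "text/html")
```

The Vary: Accept header matters: it tells caches that the same URL legitimately has two representations.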
The work on robots.txt and llms.txt makes your content readable to AI, but AI has to find you first. ChatGPT's search runs on Bing, Google AI Overview uses Google's own index, and Perplexity also relies on search APIs. If your pages aren't indexed, none of the above matters.
Register your site in Google Search Console and Bing Webmaster Tools and submit your sitemap.xml. Check the "Pages" report to see what's indexed and what isn't.
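Before submitting, you can sanity-check the sitemap itself. This sketch (yoursite.com is a placeholder) confirms every listed URL actually resolves; it says nothing about whether a page is indexed, only that crawlers can reach it:

```python
# Fetch sitemap.xml and HEAD-check every URL it lists.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://yoursite.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

for loc in tree.findall(".//sm:loc", NS):
    url = loc.text.strip()
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as r:
            print(r.status, url)
    except urllib.error.HTTPError as e:
        print(e.code, url, "<- fix or remove from sitemap")
```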
This is one of the most practical findings from Ahrefs' research on why ChatGPT cites some pages and not others: cited pages have titles with higher semantic similarity to user queries, and natural-language slugs (like /projects/pake) are cited more than opaque IDs (like /page?id=47).
URL structure matters because AI makes decisions before reading a single line of body content. /projects/pake tells it what the page is about; /page?id=47 tells it nothing. So if your site has multiple topics or products, give each one its own page with a descriptive slug.
Another practical consequence: don't concentrate everything on one giant page with anchors (#install, #commands). AI's citation granularity is the URL, not the anchor. A user asking "how to install Claude Code on Mac" deserves a dedicated /install/ page, not a fragment within the home page.
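If you generate pages programmatically, a descriptive slug costs almost nothing. A minimal sketch:

```python
# Turn a page title into a natural-language slug for the URL.
import re
import unicodedata

def slugify(title: str) -> str:
    # Strip accents, lowercase, keep alphanumeric runs, join with hyphens
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))

print(slugify("Pake: Turn any webpage into a desktop app"))
# -> pake-turn-any-webpage-into-a-desktop-app
```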
The paper "GEO: Generative Engine Optimization" from Princeton and IIT Delhi, presented at KDD 2024, measured which content changes most increase visibility in AI responses. These are the three highest-impact factors:
1. Authoritative citations (up to +115% visibility): add links to the original source. It works because AI prefers verifiable content over orphan content.
2. Direct quotations (+43%): include quotes from credible sources. AI can pass them straight to the user, increasing their utility as an answer.
3. Relevant statistics (+33%): use concrete numbers. Avoid generalities; specific data gets cited more.
The geo-citation-lab complements this paper by analyzing 602 prompts across three platforms and tens of thousands of pages. The practical findings for content creators:
Not all AIs cite the same way. Knowing this changes your content strategy:
| Platform | Citation style | Optimal strategy |
|---|---|---|
| ChatGPT | Cites few sources but uses each deeply. Per-citation impact: 5x Google's. | Depth. Few excellent, long pages. |
| Perplexity | Cites more than 2x as many sources as ChatGPT. Wider net. | Volume. Multiple medium, specific pages. |
| Claude (Anthropic) | Cites conservatively. Prioritizes verifiable sources. | Authority. External citations and concrete data. |
| Bing/Copilot | The only AI where JSON-LD directly helps. | Keep your schema markup clean. |
Another key data point: 83% of global citations go to English-language content. If your goal is an international audience, you need an English version. If your goal is only your local-language audience, optimize for niches where English doesn't yet dominate.
Before adding things to your site in search of "more GEO," cross these off your list. The following techniques circulate widely but aren't supported by any major AI system:
<meta name="ai-content-url"> and <meta name="llms">: no specification, no adoption./.well-known/ai.txt: competing proposals, no winner yet. Wait.
Special case: JSON-LD isn't as useful for GEO as you'd think. SearchVIU ran an experiment: they put data only in JSON-LD without showing it on the page. The five AI systems they tested didn't find the data. Mark Williams-Cook confirmed that LLMs treat <script type="application/ld+json"> as plain text, reading the words but not understanding the semantics. The one confirmed exception is Bing/Copilot. Conclusion: keep your existing JSON-LD because it helps Bing and Google rich results, but don't expect ChatGPT or Claude to cite you more for adding it.
The biggest challenge of GEO compared to SEO is measurement. There's no official Search Console for AI citations (except Bing, which has a partial panel). Here's what you can actually do:
- Check your server logs for OAI-SearchBot, Claude-SearchBot, and PerplexityBot. Seeing a crawler download your llms.txt is the strongest signal that your setup works.
- Watch your analytics for referrals from chat.openai.com, claude.ai, and perplexity.ai. That's the definitive proof that a user arrived through an AI citation.

An honest warning: the CJR Tow Center report analyzed 200 AI-generated citations and found that 153 contained partial or complete errors. Do the structural work because it makes your content accessible and easier to cite accurately, but don't take an AI citation as proof that the user saw your exact words.
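The log check above is easy to automate. A sketch assuming an nginx/Apache-style access.log whose lines include the user-agent string; the filename is a placeholder:

```python
# Count AI crawler hits in an access log and flag fetches of
# llms.txt / llms-full.txt, the strongest signal your setup works.
from collections import Counter

BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
        "Claude-User", "PerplexityBot", "GPTBot", "CCBot", "Bytespider"]

hits, llms_hits = Counter(), Counter()

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in BOTS:
            if bot in line:
                hits[bot] += 1
                if "llms.txt" in line or "llms-full.txt" in line:
                    llms_hits[bot] += 1

for bot, n in hits.most_common():
    print(f"{bot:18} {n:6} hits, {llms_hits[bot]} on llms*.txt")
```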
The good news: everything described can be automated with Claude Code. If you've already read our main guide on how to use Claude Code, this is a perfect practical application. Follow these five steps:
1. Open a terminal at the root of your site's repository and run claude. If you don't have it installed yet, follow the installation guide.
2. Ask it to review your robots.txt and apply the allow/block strategy from this guide.
3. Ask it to generate llms.txt and validate it against the standard format.
4. Repeat for llms-full.txt and the Markdown versions of your key pages, with their link rel="alternate" tags.
5. Then commit and deploy.

Claude Code is ideal for this because it has access to your files, understands multi-file context, and can make modifications without you copying and pasting between tools. What would take you an afternoon in a traditional editor takes minutes in Claude Code.
If this guide helped you, the natural next step is to learn Claude Code so you can implement everything on your own site in one hour. The complete step-by-step guide is waiting for you.
See the complete Claude Code guide