Traditional SEO is a fight to enter Google's top 10. GEO (Generative Engine Optimization) plays a different game: 83% of Google AI Overview citations come from pages outside the top 10. This guide shows you, with real data and copy-paste code, how to configure your site in one hour so ChatGPT, Claude, Perplexity, and Copilot find and cite it.
GEO (Generative Engine Optimization) is the set of practices that make your website visible and citable by generative search engines like ChatGPT, Claude, Perplexity, Google AI Overview, and Microsoft Copilot. Unlike classic SEO, GEO isn't about climbing positions: it's about helping AI understand what you already have.
The term was coined in the paper "GEO: Generative Engine Optimization," published by Princeton and IIT Delhi researchers at KDD 2024. The central finding: visibility in AI responses increases up to 115% by adding authoritative citations, 43% with direct quotations from credible sources, and 33% with relevant statistics.
The traffic data also explains why we're talking about GEO now and not earlier: AI search grew 527% year-over-year in the first half of 2025, ChatGPT reached 900 million weekly active users by February 2026, and AI-referred traffic converts at 5x the rate of traditional search. Even so, it still represents less than 1% of total traffic. That single fact reframes everything:
"GEO is a brand visibility strategy, not a traffic strategy. Worth an hour of setup, not a week." — @HiTw93
If you come from traditional SEO, the first thing to understand is that the rules are different. This table summarizes the differences that matter most when planning content:
| Aspect | Traditional SEO | GEO |
|---|---|---|
| Goal | Top 10 in Google | Get cited in AI answers |
| Key metric | Position + clicks | Citations + retrieval-to-citation rate |
| Signals that matter | PageRank, backlinks, CTR | Clear structure, reliable sources, specific data |
| Where citations come from | Top 10 results | 83% from outside the top 10 |
That last row is the most important news for small sites: the PageRank moat no longer protects the giants in the AI era. If your README or documentation is well-written, you can outrank a massive site with thin content.
Most people treat robots.txt as a binary switch: either block all AI crawlers or let them all through. That's a costly mistake. AI crawlers do very different things and should be handled separately.
| Type | Examples | What they do | Recommendation |
|---|---|---|---|
| Training | GPTBot, ClaudeBot, CCBot, Meta-ExternalAgent | Take your content to train future models | Block if you want to opt out of training |
| Search and retrieval | OAI-SearchBot, Claude-SearchBot, PerplexityBot | Fetch in real time to answer queries | Always allow |
| User-triggered | ChatGPT-User, Claude-User, Perplexity-User | Only fire when someone pastes your URL into a chat | Always allow |
| Undeclared | Bytespider, unidentified bots | Don't follow rules | Block |
The most expensive mistake: blocking OAI-SearchBot thinking you're protecting your content. All you actually achieve is disappearing from ChatGPT's search results while getting nothing in return.
Create a robots.txt file at the root of your site. The strategy recommended by @HiTw93 is: allow search and user-triggered crawlers, block training and undeclared ones.
```
# Search and retrieval: allow
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# User-triggered: allow
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

# Training: block (optional)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Opt-out tokens
User-agent: Google-Extended
Disallow: /

# Undeclared: block
User-agent: Bytespider
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
```
If your priority is long-term brand exposure (so the next generation of models knows about you), keep GPTBot and CCBot on Allow. If your priority is control, block them. Bytespider should always be blocked: it doesn't identify itself properly and doesn't respect rules.
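Once the file is deployed, you can verify it applies the intended policy per crawler with Python's standard-library robots.txt parser. A minimal sketch, assuming yoursite.com is replaced with your own domain:

```python
# Sanity-check the live robots.txt: every crawler in ALLOWED should
# be able to fetch, every crawler in BLOCKED should not.
from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"  # placeholder: your own domain

ALLOWED = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
           "ChatGPT-User", "Claude-User"]
# Google-Extended is an opt-out token rather than a fetching crawler,
# but can_fetch still reflects the policy line for it.
BLOCKED = ["GPTBot", "CCBot", "Google-Extended", "Bytespider"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # fetch and parse the deployed file

for agent in ALLOWED + BLOCKED:
    allowed = parser.can_fetch(agent, f"{SITE}/")
    expected = agent in ALLOWED
    flag = "ok" if allowed == expected else "MISMATCH"
    print(f"{flag:10} {agent}: allowed={allowed}, expected={expected}")
```

Any MISMATCH line means the deployed file doesn't match the strategy you chose above.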
llms.txt is a new standard, similar to robots.txt but designed for AI consumption. You place a Markdown file at the root of your site describing what it does, its key pages, and who's behind it. AI systems prioritize this file when crawling your content.
According to BuiltWith, more than 840,000 sites have already deployed llms.txt, including Anthropic, Cloudflare, Stripe, and Vercel. But SE Ranking's survey of 300,000 domains shows real adoption at only 10%. In other words: you're early, and that's an advantage.
The format is simple. Create /llms.txt with this structure:
```markdown
# Your project name

> One-line description of what this is.

## Links

- [Documentation](https://yoursite.com/docs)
- [GitHub](https://github.com/you/project)
- [Blog](https://yoursite.com/blog)

## About

Short paragraph explaining the project, its purpose,
key features, and what makes it different.
```
After creating it, submit it to directory.llmstxt.cloud, llmstxt.site, and the llms-txt-hub repo on GitHub via pull request. If you have multiple sites, make each llms.txt link to the others: they form a discovery mesh where any crawler entering through one site can find the rest.
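There's no official validator yet, but a quick structural check against the layout above is easy to script. A minimal Python sketch; the checks mirror the example format shown here, not a formal spec:

```python
# Informal structural check for llms.txt: H1 title, "> " summary,
# "## " sections, and at least one markdown link.
import re
from pathlib import Path

def check_llms_txt(path: str = "llms.txt") -> list[str]:
    text = Path(path).read_text(encoding="utf-8")
    lines = [l for l in text.splitlines() if l.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 title ('# Your project name')")
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing one-line '> ' summary")
    if not any(l.startswith("## ") for l in lines):
        problems.append("missing '## ' sections (Links, About, ...)")
    if not re.search(r"\[[^\]]+\]\(https?://[^)]+\)", text):
        problems.append("no markdown links found")
    return problems

issues = check_llms_txt()
print("structure looks good" if not issues else "\n".join(issues))
```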
While llms.txt is the summary, llms-full.txt is the complete version: typically between 30 and 60 KB, with project descriptions, use cases, competitor comparisons, and README excerpts. Mintlify's CDN analysis shows that llms-full.txt receives 3 to 4 times more traffic than llms.txt. When an AI system finds the summary, it almost always goes looking for the full version.
In practice, llms-full.txt is where you concentrate the three highest-impact ingredients from the Princeton paper: authoritative citations, direct quotations, and statistics. It's the file that most influences whether AI cites you.
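If your pages already exist as Markdown, one low-effort way to assemble llms-full.txt is to concatenate llms.txt with the sources of your key pages. A sketch, assuming a hypothetical content/ directory and page list; adapt both to your own layout:

```python
# Build llms-full.txt from llms.txt plus the Markdown sources of
# your key pages. PAGES and the content/ directory are assumptions.
from pathlib import Path

PAGES = ["docs/index.md", "docs/install.md", "blog/announcement.md"]

parts = [Path("llms.txt").read_text(encoding="utf-8")]
for page in PAGES:
    md = Path("content") / page
    parts.append(f"\n\n## {md.stem}\n\n" + md.read_text(encoding="utf-8"))

out = Path("llms-full.txt")
out.write_text("".join(parts), encoding="utf-8")
print(f"wrote {out} ({out.stat().st_size / 1024:.1f} KB)")  # aim for ~30-60 KB
```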
A typical 15,000-token HTML page becomes a 3,000-token Markdown document. That's 80% less noise for AI. Evil Martians recommends serving a .md version of every page on your site.
The simplest way to tell AI that a Markdown version exists is to add this line to your HTML <head>:
<link rel="alternate" type="text/markdown" href="/page.md" />
Claude Code and Cursor already send Accept: text/markdown headers by default when fetching documentation. This is standard HTTP/1.1 content negotiation, around since 1997: not magic, just protocol.
Important: never serve different content to bots and humans based on User-Agent. That's cloaking and Google will penalize you. Use the alternate mechanism instead, which is the clean path.
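For illustration, here's what the clean path can look like server-side. A minimal Flask sketch, where the route and content/ paths are assumptions: the same URL serves Markdown or HTML depending on the Accept header, never on the User-Agent.

```python
# Accept-based content negotiation: same content, two representations.
# This is standard HTTP, not cloaking, because the choice is driven
# by the Accept header, not by who the client claims to be.
from pathlib import Path
from flask import Flask, Response, request

app = Flask(__name__)

def respond(body: str, mimetype: str) -> Response:
    resp = Response(body, mimetype=mimetype)
    resp.headers["Vary"] = "Accept"  # tell caches the reply depends on Accept
    return resp

@app.route("/docs/<page>")
def docs(page: str):
    if "text/markdown" in request.headers.get("Accept", ""):
        md = Path(f"content/{page}.md").read_text(encoding="utf-8")
        return respond(md, "text/markdown")
    html = Path(f"content/{page}.html").read_text(encoding="utf-8")
    return respond(html, "text/html")
```

The Vary: Accept header matters: it tells caches that the same URL legitimately has two representations.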
The work on robots.txt and llms.txt makes your content readable to AI, but AI has to find you first. ChatGPT's search runs on Bing, Google AI Overview uses Google's own index, and Perplexity also relies on search APIs. If your pages aren't indexed, none of the above matters.
Register your site in Google Search Console and Bing Webmaster Tools and submit your sitemap.xml. Check the "Pages" report to see what's indexed and what isn't.
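Before submitting, you can sanity-check the sitemap itself. This sketch (yoursite.com is a placeholder) confirms every listed URL actually resolves; it says nothing about whether a page is indexed, only that crawlers can reach it:

```python
# Fetch sitemap.xml and HEAD-check every URL it lists.
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://yoursite.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

for loc in tree.findall(".//sm:loc", NS):
    url = loc.text.strip()
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as r:
            print(r.status, url)
    except urllib.error.HTTPError as e:
        print(e.code, url, "<- fix or remove from sitemap")
```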
This is one of the most practical findings from Ahrefs' research on why ChatGPT cites some pages and not others: cited pages have titles with higher semantic similarity to user queries, and natural-language slugs (like /projects/pake) are cited more than opaque IDs (like /page?id=47).
URL structure matters because AI makes decisions before reading a single line of body content. /projects/pake tells it what the page is about; /page?id=47 tells it nothing. So if your site has multiple topics or products, give each one its own page with a descriptive slug.
Another practical consequence: don't concentrate everything on one giant page with anchors (#install, #commands). AI's citation granularity is the URL, not the anchor. A user asking "how to install Claude Code on Mac" deserves a dedicated /install/ page, not a fragment within the home page.
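If you generate pages programmatically, a descriptive slug costs almost nothing. A minimal sketch:

```python
# Turn a page title into a natural-language slug for the URL.
import re
import unicodedata

def slugify(title: str) -> str:
    # Strip accents, lowercase, keep alphanumeric runs, join with hyphens
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    return "-".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))

print(slugify("Pake: Turn any webpage into a desktop app"))
# -> pake-turn-any-webpage-into-a-desktop-app
```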
The paper "GEO: Generative Engine Optimization" from Princeton and IIT Delhi, presented at KDD 2024, measured which content changes most increase visibility in AI responses. These are the three highest-impact factors:
1. Authoritative citations (up to +115% visibility): add links to the original source. It works because AI prefers verifiable content over orphan content.
2. Direct quotations (+43%): include quotes from credible sources. AI can pass them straight to the user, increasing their utility as an answer.
3. Relevant statistics (+33%): use concrete numbers. Avoid generalities; specific data gets cited more.
The geo-citation-lab complements this paper by analyzing 602 prompts across three platforms and tens of thousands of pages. The practical findings for content creators:
Not all AIs cite the same way. Knowing this changes your content strategy:
| Platform | Citation style | Optimal strategy |
|---|---|---|
| ChatGPT | Cites few sources but uses each deeply. Per-citation impact: 5x Google's. | Depth. Few excellent, long pages. |
| Perplexity | Cites more than 2x as many sources as ChatGPT. Wider net. | Volume. Multiple medium, specific pages. |
| Claude (Anthropic) | Cites conservatively. Prioritizes verifiable sources. | Authority. External citations and concrete data. |
| Bing/Copilot | The only AI where JSON-LD directly helps. | Keep your schema markup clean. |
Another key data point: 83% of global citations go to English-language content. If your goal is an international audience, you need an English version. If your goal is only your local-language audience, optimize for niches where English doesn't yet dominate.
Before adding things to your site in search of "more GEO," cross these off your list. The following techniques circulate widely but aren't supported by any major AI system:
<meta name="ai-content-url"> and <meta name="llms">: no specification, no adoption./.well-known/ai.txt: competing proposals, no winner yet. Wait.
Special case: JSON-LD isn't as useful for GEO as you'd think. SearchVIU ran an experiment: they put data only in JSON-LD without showing it on the page. The five AI systems they tested didn't find the data. Mark Williams-Cook confirmed that LLMs treat <script type="application/ld+json"> as plain text, reading the words but not understanding the semantics. The one confirmed exception is Bing/Copilot. Conclusion: keep your existing JSON-LD because it helps Bing and Google rich results, but don't expect ChatGPT or Claude to cite you more for adding it.
The biggest challenge of GEO compared to SEO is measurement. There's no official Search Console for AI citations (except Bing, which has a partial panel). Here's what you can actually do:
- Check your server logs for OAI-SearchBot, Claude-SearchBot, and PerplexityBot. Seeing a crawler download your llms.txt is the strongest signal that your setup works.
- Watch your analytics for referrals from chat.openai.com, claude.ai, and perplexity.ai. That's the definitive proof that a user arrived through an AI citation.

An honest warning: the CJR Tow Center report analyzed 200 AI-generated citations and found that 153 contained partial or complete errors. Do the structural work because it makes your content accessible and easier to cite accurately, but don't take an AI citation as proof that the user saw your exact words.
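The log check above is easy to automate. A sketch assuming an nginx/Apache-style access.log whose lines include the user-agent string; the filename is a placeholder:

```python
# Count AI crawler hits in an access log and flag fetches of
# llms.txt / llms-full.txt, the strongest signal your setup works.
from collections import Counter

BOTS = ["OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
        "Claude-User", "PerplexityBot", "GPTBot", "CCBot", "Bytespider"]

hits, llms_hits = Counter(), Counter()

with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in BOTS:
            if bot in line:
                hits[bot] += 1
                if "llms.txt" in line or "llms-full.txt" in line:
                    llms_hits[bot] += 1

for bot, n in hits.most_common():
    print(f"{bot:18} {n:6} hits, {llms_hits[bot]} on llms*.txt")
```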
The good news: everything described can be automated with Claude Code. If you've already read our main guide on how to use Claude Code, this is a perfect practical application. Follow these five steps:
1. Open a terminal at the root of your site's repository and run claude. If you don't have it installed yet, follow the installation guide.
2. Ask it to review your robots.txt and apply the allow/block strategy from this guide.
3. Ask it to generate llms.txt and validate it against the standard format.
4. Repeat for llms-full.txt and the Markdown versions of your key pages, with their link rel="alternate" tags.
5. Then commit and deploy.

Claude Code is ideal for this because it has access to your files, understands multi-file context, and can make modifications without you copying and pasting between tools. What would take you an afternoon in a traditional editor takes minutes in Claude Code.
If this guide helped you, the natural next step is to learn Claude Code so you can implement everything on your own site in one hour. The complete step-by-step guide is waiting for you.
See the complete Claude Code guide