AI Readiness
AI Readiness (AEO/GEO) measures how well the page is optimised for AI-powered search, including ChatGPT, Perplexity, Google AI Overviews (formerly SGE), and voice assistants.
Details
What is AEO/GEO
Answer Engine Optimisation (AEO) and Generative Engine Optimisation (GEO) are emerging disciplines focused on making content discoverable and citable by AI systems. As AI Overviews and AI chatbots replace traditional blue-link search for some queries, being featured in AI-generated answers becomes a traffic source.
Entity clarity and Schema
AI systems extract entities (people, organisations, products, places) from web pages. Entity Schema (Person, Organization, Product, etc.) in JSON-LD makes entities machine-readable and helps AI correctly identify what the page is about. sameAs links connecting to Wikidata or Wikipedia entries strengthen entity recognition.
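As an illustration of the entity markup described above, the following minimal sketch builds an Organization entity with sameAs links and wraps it in a JSON-LD script tag. The function names, the company name, and both sameAs URLs (including the Wikidata ID) are placeholders invented for this example, not values CheckSEO expects.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a schema.org Organization entity as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points at external records describing the same entity.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap a JSON-LD string in the script tag that goes into the page HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values: replace with the real entity and its real identifiers.
example = organization_jsonld(
    "Example Ltd",
    "https://www.example.com",
    [
        "https://www.wikidata.org/wiki/Q00000000",
        "https://en.wikipedia.org/wiki/Example",
    ],
)
print(as_script_tag(example))
```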
FAQ and Q&A content
FAQPage schema and visible Q&A sections are frequently cited by AI systems because they are already structured as question-answer pairs. Content with a high Q&A ratio is more likely to be surfaced in AI Overviews and voice search responses.
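A minimal sketch of FAQPage markup generated from question-answer pairs. The helper name faq_jsonld and the sample questions are assumptions for illustration. The same questions and answers should also appear as visible text on the page.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Illustrative content only.
faq = faq_jsonld(
    [
        ("What is AEO?", "Answer Engine Optimisation makes content citable by AI systems."),
        ("What is llms.txt?", "A proposed file that gives LLMs an overview of a site."),
    ]
)
print(faq)
```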
Speakable schema
Speakable markup (the schema.org speakable property, whose value is a SpeakableSpecification) marks sections of text that are suitable for text-to-speech (TTS) playback, for example by Google Assistant and smart speakers. It signals the most important, concise passages for audio delivery.
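In schema.org terms, this markup is a speakable property holding a SpeakableSpecification that points at page elements by CSS selector (or XPath). The sketch below shows one possible shape. The helper name and the selectors .headline and .summary are hypothetical and must match classes that actually exist on the page.

```python
import json


def speakable_jsonld(page_name: str, page_url: str, css_selectors: list[str]) -> str:
    """Build WebPage JSON-LD whose speakable property targets CSS selectors."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": page_name,
        "url": page_url,
        "speakable": {
            "@type": "SpeakableSpecification",
            # Each selector should match a short, self-contained passage.
            "cssSelector": css_selectors,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical page and selectors.
speakable = speakable_jsonld(
    "How to fix a broken canonical tag",
    "https://www.example.com/guides/canonical",
    [".headline", ".summary"],
)
print(speakable)
```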
llms.txt
llms.txt is a proposed standard (similar in placement to robots.txt) for a Markdown file, served at /llms.txt, that gives LLMs and AI crawlers a structured overview of a site. A well-formed llms.txt file helps AI systems understand site structure, content types, and contact information without crawling every page.
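The llms.txt proposal describes a Markdown file with an H1 title, an optional blockquote summary, and H2 sections containing lists of links. The sketch below shows a small sample file and a rough way to tell a structured file from a stub. The sample content, the function name, and the stub rule (title plus at least one section and one link) are assumptions for illustration, not CheckSEO's actual quality check.

```python
import re

# Illustrative file content; all names and URLs are placeholders.
SAMPLE_LLMS_TXT = """\
# Example Site

> Documentation and guides for the Example product.

## Docs

- [Getting started](https://www.example.com/docs/start.md): Install and first steps
- [API reference](https://www.example.com/docs/api.md): Endpoints and parameters

## Optional

- [Changelog](https://www.example.com/changelog.md)
"""

# A Markdown list item that begins with a [label](url) link.
LINK_ITEM = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\([^)]+\)")


def summarize_llms_txt(text: str) -> dict:
    """Report the basic structure found in an llms.txt body."""
    lines = text.splitlines()
    has_title = any(line.startswith("# ") for line in lines)
    has_summary = any(line.startswith("> ") for line in lines)
    sections = [line[3:].strip() for line in lines if line.startswith("## ")]
    link_count = sum(1 for line in lines if LINK_ITEM.match(line))
    return {
        "has_title": has_title,
        "has_summary": has_summary,
        "sections": sections,
        "link_count": link_count,
        # Assumed rule: a file without a title, sections, or links is a stub.
        "is_stub": not (has_title and sections and link_count),
    }


print(summarize_llms_txt(SAMPLE_LLMS_TXT))
```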
AI bot crawlability in robots.txt
Many sites explicitly block AI crawlers in robots.txt, sometimes intentionally (to prevent training) and sometimes by mistake (copying a template). CheckSEO checks access for six major AI user agents:
• GPTBot — OpenAI's crawler that collects content which may be used to train its models (ChatGPT's live browsing uses ChatGPT-User instead)
• ChatGPT-User — ChatGPT's real-time browsing agent
• ClaudeBot — Anthropic's crawler for Claude
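• Google-Extended — Google's robots.txt control token (not a separate crawler) that governs whether crawled content may be used for Gemini (formerly Bard) training and grounding; it does not affect Google Search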
• PerplexityBot — Perplexity AI's search crawler
• CCBot — Common Crawl bot used by many AI systems
A blocked crawler cannot fetch the site's pages, so the corresponding AI system is far less likely to use or cite them. The check evaluates robots.txt rules per user-agent with the Protego parser, a Python library designed to follow Google's robots.txt specification. A WARNING is raised for each blocked bot under the AI Readiness category.
To unblock, remove or comment out the Disallow rules for these user-agents in robots.txt. If you want to block AI training but stay visible in AI search, note that GPTBot, ClaudeBot, and CCBot collect data that may be used for training, and Google-Extended controls Gemini training use. Consider blocking those four while allowing ChatGPT-User and PerplexityBot.
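The per-user-agent check can be sketched with Python's standard library. This example deliberately uses urllib.robotparser rather than Protego so that it runs without extra dependencies. The two parsers can disagree on edge cases such as wildcard paths and rule precedence, so treat the result as an approximation. The sample robots.txt and the function name are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The six user agents listed above.
AI_BOTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "CCBot",
]

# Illustrative robots.txt: one group blocks four agents, everything else is allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_ai_bots(robots_txt: str, url: str = "https://www.example.com/") -> list[str]:
    """Return the AI user agents that may not fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


print(blocked_ai_bots(SAMPLE_ROBOTS_TXT))
# → ['GPTBot', 'ClaudeBot', 'Google-Extended', 'CCBot']
```

Each name returned here corresponds to one WARNING in the report. Deleting an agent's User-agent line from the blocking group lets it fall back to the catch-all group and removes it from the list.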
Cloudflare AI bot blocking
Some sites use Cloudflare's 'Block AI Scrapers' feature or custom rules to prevent AI training crawlers. While this may be intentional, it can also inadvertently block AI search and browsing agents such as ChatGPT-User or PerplexityBot, so those systems cannot fetch the page even when robots.txt allows them.
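A rough way to spot Cloudflare in front of a site is to inspect response headers: Cloudflare-proxied responses normally carry a CF-RAY header and a Server value of cloudflare. The sketch below works on headers and status codes that have already been fetched. Both function names are hypothetical, and treating a 403 for a bot user-agent as a block is an assumption, since sites can return 403 for other reasons.

```python
def looks_like_cloudflare(headers: dict[str, str]) -> bool:
    """Heuristic: True if the response headers suggest a Cloudflare proxy."""
    normalized = {key.lower(): value.lower() for key, value in headers.items()}
    return "cf-ray" in normalized or "cloudflare" in normalized.get("server", "")


def bot_possibly_blocked(browser_status: int, bot_status: int) -> bool:
    """Heuristic: the page loads for a browser but a bot user-agent gets 403.

    Assumption: 403 is treated as a block signal. It is not proof that
    Cloudflare's AI bot rules are the cause.
    """
    return browser_status < 400 and bot_status == 403


# Example headers as they might appear on a proxied response (values invented).
sample_headers = {"Server": "cloudflare", "CF-RAY": "0000000000000000-LHR"}
print(looks_like_cloudflare(sample_headers))
print(bot_possibly_blocked(browser_status=200, bot_status=403))
```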
Comparison tables and structured patterns
AI systems frequently extract data from comparison tables, TL;DR summaries, and definition lists because they are already pre-structured for machine consumption.
• Comparison tables — side-by-side feature/product comparisons are highly citable by AI.
• TL;DR pattern — a concise summary at the top or bottom of the article helps AI extract the key takeaway.
• Long-tail headings — headings phrased as specific questions (e.g., 'How to fix a broken canonical tag') match the long-tail queries AI systems handle.
• Conversational headings — headings using natural language patterns improve voice search and AI assistant responses.
• Authority backlinks — links from/to authoritative domains signal content trustworthiness to AI systems.
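The heading patterns above can be approximated with simple text heuristics. This is a sketch only: the word list, the five-word threshold for a long-tail heading, and the TL;DR keywords are assumptions chosen for the example, not CheckSEO's actual rules.

```python
import re

# Assumed list of words that typically open a question-style heading.
QUESTION_START = re.compile(
    r"^(how|what|why|when|where|which|who|can|does|do|is|are|should)\b",
    re.IGNORECASE,
)
# Assumed markers of a summary section.
TLDR = re.compile(r"\b(tl;?dr|summary|key takeaways?)\b", re.IGNORECASE)


def classify_heading(text: str) -> dict[str, bool]:
    """Label one heading with the patterns it appears to match."""
    stripped = text.strip()
    return {
        "question_like": bool(QUESTION_START.match(stripped)) or stripped.endswith("?"),
        # Assumed threshold: five or more words counts as long-tail.
        "long_tail": len(stripped.split()) >= 5,
        "tldr": bool(TLDR.search(stripped)),
    }


for heading in ["How to fix a broken canonical tag", "Pricing", "TL;DR"]:
    print(heading, classify_heading(heading))
```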
Metrics
| Metric | Description |
|---|---|
| Entity Schema | Whether Person, Organization, Product, or similar entity schema is present. |
| FAQ Schema | Whether FAQPage JSON-LD is present. |
| Speakable | Whether Speakable schema is present. |
| HowTo Schema | Whether HowTo JSON-LD (with HowToStep items) is present. |
| BreadcrumbList Schema | Whether BreadcrumbList JSON-LD is present. |
| sameAs links | Number of sameAs links to authoritative sources. |
| llms.txt | Whether /llms.txt exists and returns HTTP 200. |
| llms.txt quality | Whether llms.txt contains structured sections (not just a stub). |
| Markdown for Agents | Whether a .md or /llms.txt version of the page exists for AI agents. |
| Cloudflare | Whether Cloudflare is detected (may affect AI crawler access). |
| Content sections | Number of distinct content sections detected. |
| Q&A ratio | Percentage of content structured as question-answer pairs. |
| Named entities | Number of named entities detected in the page text. |
| Citation passages | Number of citable passages with statistics, dates, or data. |
| Comparison tables | Whether the page contains comparison or feature tables. |
| TL;DR pattern | Whether a TL;DR or summary section is present. |
| Long-tail headings | Whether headings are phrased as specific, question-like queries. |
| Conversational headings | Whether headings use natural conversational language patterns. |
| Authority backlinks | Whether the page links to/from authoritative domains. |
| AI bots in robots.txt | Which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) are blocked in robots.txt. |
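Several of the schema metrics in the table come down to asking which @type values appear in the page's JSON-LD. The sketch below collects them with the standard library HTML parser. The class and function names and the sample HTML are assumptions for illustration. It ignores Microdata and RDFa, which a full check might also need to read.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of application/ld+json script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD rather than failing the scan.


def _walk_types(node, found: set) -> None:
    """Recursively gather every @type value in a JSON-LD structure."""
    if isinstance(node, dict):
        type_value = node.get("@type")
        if isinstance(type_value, str):
            found.add(type_value)
        elif isinstance(type_value, list):
            found.update(item for item in type_value if isinstance(item, str))
        for value in node.values():
            _walk_types(value, found)
    elif isinstance(node, list):
        for item in node:
            _walk_types(item, found)


def schema_types(html: str) -> set:
    """Return the set of schema.org types declared in a page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        _walk_types(block, found)
    return found


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "What is AEO?",
   "acceptedAnswer": {"@type": "Answer", "text": "Answer Engine Optimisation."}}]}
</script>
</head><body></body></html>
"""

types = schema_types(SAMPLE_HTML)
print("FAQ Schema:", "FAQPage" in types)
print("HowTo Schema:", "HowTo" in types)
```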