AI Readiness
AI Readiness (AEO/GEO) measures how well the page is optimised for AI-powered search — ChatGPT, Perplexity, Google AI Overviews, Claude, and voice assistants. 21 signals are evaluated. Weight: 10%.
Details
What is AEO/GEO
Answer Engine Optimisation (AEO) and Generative Engine Optimisation (GEO) are emerging disciplines focused on making content discoverable and citable by AI systems. As AI Overviews and AI chatbots replace traditional blue-link search for some queries, being featured in AI-generated answers becomes a critical traffic source.
The AI Readiness score = (number of signals present / 21) × 100. Each signal is binary (present or not).
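The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the `signal_N` keys are hypothetical and are not taken from the audit tool itself.

```python
def ai_readiness_score(signals: dict, total: int = 21) -> float:
    """Return the share of signals present as a percentage (0-100)."""
    present = sum(1 for is_present in signals.values() if is_present)
    return 100 * present / total


# Hypothetical audit result: signals 1-7 present, signals 8-21 missing.
example = {f"signal_{i}": i <= 7 for i in range(1, 22)}
print(round(ai_readiness_score(example), 1))
```

Because every signal is binary and equally weighted, adding any single missing signal raises the score by the same amount (100 / 21, roughly 4.8 points).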
Entity clarity and Schema (3 signals)
AI systems extract entities (people, organisations, products, places) from web pages.
1. Entity Schema — Person, Organization, Product, etc. in JSON-LD makes entities machine-readable
2. sameAs links — connections to Wikidata, Wikipedia, LinkedIn, Crunchbase strengthen entity recognition
3. BreadcrumbList Schema — helps AI understand page position in site hierarchy
Missing Entity Schema or sameAs links are warning-level issues (−10 pts).
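As a rough sketch of what the three signals above look like in practice, the Python snippet below builds an Organization entity with sameAs links and a BreadcrumbList, then wraps each in a JSON-LD script tag. The company name and every URL are placeholders, and `to_jsonld_script` is a hypothetical helper.

```python
import json

# Placeholder entity: replace the name and URLs with your own.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com/",
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
    ],
}

# Placeholder breadcrumb trail: Home > Guides.
breadcrumbs = {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Home",
         "item": "https://www.example.com/"},
        {"@type": "ListItem", "position": 2, "name": "Guides",
         "item": "https://www.example.com/guides/"},
    ],
}


def to_jsonld_script(data: dict) -> str:
    """Wrap a dict in the script tag used to embed JSON-LD in HTML."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(organization))
print(to_jsonld_script(breadcrumbs))
```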
FAQ and Q&A content (3 signals)
FAQ-style content is frequently cited by AI systems because it is already structured as question-answer pairs.
4. FAQPage Schema — JSON-LD FAQPage markup. Missing = warning
5. FAQ HTML patterns — question-phrased headings (ending with '?' or starting with how/what/why/when) detected independently of schema
6. Q&A ratio — percentage of headings (H2+H3) phrased as questions. ≥10% indicates good Q&A content structure
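Signals 5 and 6 can be approximated with a small heading classifier. The sketch below follows the rules stated above (a heading ends with '?' or starts with how/what/why/when, and the ratio threshold is 10%); the function names are hypothetical, and the audit's real detection may differ.

```python
import re

# Heading starts with one of the question words named above.
QUESTION_START = re.compile(r"^(how|what|why|when)\b", re.IGNORECASE)


def is_question_heading(text: str) -> bool:
    """True if the heading ends with '?' or starts with a question word."""
    text = text.strip()
    return text.endswith("?") or bool(QUESTION_START.match(text))


def qa_ratio(headings: list) -> float:
    """Percentage of headings phrased as questions."""
    if not headings:
        return 0.0
    questions = sum(is_question_heading(h) for h in headings)
    return 100 * questions / len(headings)


# Hypothetical H2/H3 headings taken from one page.
headings = ["What is AEO?", "Pricing", "How to add FAQPage Schema",
            "Contact", "Our team"]
ratio = qa_ratio(headings)
print(ratio, ratio >= 10)
```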
Speakable and HowTo Schema (2 signals)
7. Speakable Schema — marks sections suitable for text-to-speech (Google Assistant, smart speakers)
8. HowTo Schema — step-by-step instructions in JSON-LD, used by AI for procedural queries
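For illustration, the snippet below builds minimal Speakable and HowTo markup as Python dicts and prints them as JSON-LD. The page name, CSS selectors, and step text are placeholders; the property names (`speakable`, `SpeakableSpecification`, `cssSelector`, `HowToStep`) come from the schema.org vocabulary.

```python
import json

# Speakable: points at the page sections suited to text-to-speech.
speakable_page = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "How to fix a broken canonical tag",
    "speakable": {
        "@type": "SpeakableSpecification",
        "cssSelector": ["#summary", "#steps"],  # placeholder selectors
    },
}

# HowTo: one HowToStep per instruction, in order.
howto = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "How to fix a broken canonical tag",
    "step": [
        {"@type": "HowToStep", "name": "Find the tag",
         "text": "Locate the rel=canonical link element in the page head."},
        {"@type": "HowToStep", "name": "Correct the URL",
         "text": "Point the href at the preferred URL of the page."},
    ],
}

print(json.dumps([speakable_page, howto], indent=2))
```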
Content structure (3 signals)
Well-structured content is easier for AI systems to parse and cite.
9. Clear content sections — at least 3 distinct sections with H2/H3 headings and body text
10. Definitions — 'X is Y' patterns or explicit definition sentences that AI can extract as direct answers
11. Named entities — 3+ multi-word proper nouns (people, companies, places) detected via NLP patterns
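The definition and named-entity checks could be approximated with regular expressions, as in the sketch below. Both patterns are assumptions made for illustration; the audit's actual NLP patterns are not documented here and are likely more thorough.

```python
import re

# Assumed pattern: a capitalised subject followed by "is/are a/an/the".
DEFINITION = re.compile(r"\b[A-Z][A-Za-z /()-]{2,60}? (?:is|are) (?:a|an|the) ")

# Assumed pattern: two or more consecutive capitalised words.
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def has_definition(text: str) -> bool:
    """True if the text contains an 'X is Y' style sentence."""
    return DEFINITION.search(text) is not None


def named_entities(text: str) -> set:
    """Distinct multi-word proper nouns found in the text."""
    return set(PROPER_NOUN.findall(text))


sample = ("Answer Engine Optimisation is a discipline for AI search. "
          "Ada Lovelace worked with Charles Babbage.")
entities = named_entities(sample)
print(has_definition(sample), len(entities) >= 3)
```

A capitalisation-based pattern like this one also matches title-case phrases that are not real entities, so treat the count as a heuristic.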
Machine-readable formats (2 signals)
12. llms.txt — a proposed standard (similar to robots.txt) at /llms.txt that provides a structured overview of the site for LLMs and AI crawlers
13. Markdown for agents — whether the server returns Content-Type: text/markdown when an AI agent sends Accept: text/markdown. This lets AI agents receive clean, parseable text instead of HTML.
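The Markdown-for-agents check is a content negotiation test: send `Accept: text/markdown` and inspect the response's Content-Type. A minimal sketch using only the Python standard library follows; the function names are hypothetical, and the real audit may also follow redirects or apply other rules.

```python
import urllib.request
from typing import Optional


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server for Markdown."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def is_markdown_response(content_type: Optional[str]) -> bool:
    """True if the Content-Type media type is text/markdown."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/markdown"


def serves_markdown(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and report whether Markdown came back."""
    request = build_markdown_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_markdown_response(response.headers.get("Content-Type"))


# Usage (performs a network request):
# serves_markdown("https://www.example.com/")
```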
Citable content patterns (4 signals)
14. Citation passages — 2+ paragraphs containing statistics, percentages, dollar amounts, years, or large numbers. AI systems prefer citing content with concrete data
15. Comparison tables — HTML tables with ≥2 rows and ≥2 columns. Frequently extracted by AI for feature/product comparisons
16. TL;DR pattern — ≥30% of sections have a short opening paragraph (≤3 sentences, ≤80 words) that works as a summary
17. Authority backlinks — sameAs links pointing to Wikipedia, Reddit, Quora, or Crunchbase (signals credibility to AI)
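As an example of how one of these thresholds might be evaluated, the sketch below implements the TL;DR rule (signal 16) using the limits stated above: at most 3 sentences, at most 80 words, and at least 30% of sections. The sentence splitter is deliberately naive and the function names are hypothetical.

```python
import re


def is_short_opening(paragraph: str, max_sentences: int = 3,
                     max_words: int = 80) -> bool:
    """True if the paragraph is non-empty and within both limits."""
    words = paragraph.split()
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return 0 < len(words) <= max_words and len(sentences) <= max_sentences


def has_tldr_pattern(opening_paragraphs: list, min_share: float = 0.30) -> bool:
    """True if enough sections open with a short summary paragraph."""
    if not opening_paragraphs:
        return False
    short = sum(is_short_opening(p) for p in opening_paragraphs)
    return short / len(opening_paragraphs) >= min_share


# Hypothetical first paragraphs of three sections.
long_paragraph = " ".join(["word"] * 100) + "."
openings = ["AEO makes content citable by AI systems.",
            long_paragraph, long_paragraph]
print(has_tldr_pattern(openings))
```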
Search and heading patterns (2 signals)
18. Long-tail headings — 2+ headings (H2/H3/H4) with ≥4 words and a modifier word (e.g., 'How to fix a broken canonical tag'). These match the specific queries AI systems handle
19. Conversational headings — 2+ headings with ≥5 words using natural language patterns. Improves voice search and AI assistant responses
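A long-tail heading check (signal 18) might look like the sketch below. The modifier list is an assumption made purely for illustration, since the audit's actual word list is not given here; the "natural language patterns" behind signal 19 are likewise unspecified, so only signal 18 is sketched.

```python
# Assumed modifier words; the audit's real list is not documented here.
MODIFIERS = {"how", "best", "fix", "why", "vs", "without", "guide"}


def is_long_tail(heading: str, min_words: int = 4) -> bool:
    """True if the heading has enough words and at least one modifier."""
    words = [w.strip("?,.:!'\"").lower() for w in heading.split()]
    return len(words) >= min_words and any(w in MODIFIERS for w in words)


def has_long_tail_signal(headings: list, min_count: int = 2) -> bool:
    """True if at least min_count headings qualify as long-tail."""
    return sum(is_long_tail(h) for h in headings) >= min_count


# Hypothetical H2/H3/H4 headings from one page.
headings = ["How to fix a broken canonical tag",
            "Best schema types for product pages",
            "Pricing"]
print(has_long_tail_signal(headings))
```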
Technical AI integration (2 signals)
20. AI plugin manifest — whether /.well-known/ai-plugin.json exists with a valid schema_version (ChatGPT Actions / OpenAI plugin integration)
21. max-snippet unlimited — whether meta robots allows unlimited snippet length (no max-snippet restriction, or max-snippet:-1). AI systems need full-length snippets to generate accurate answers
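The max-snippet rule can be expressed as a small parser over the `content` attribute of the robots meta tag. This sketch covers only the two cases named above (no `max-snippet` directive, or `max-snippet:-1`); the function name is hypothetical, and other directives that may also limit snippets are not handled.

```python
import re


def allows_unlimited_snippet(robots_content: str) -> bool:
    """True if the robots meta content places no limit on snippet length."""
    match = re.search(r"max-snippet\s*:\s*(-?\d+)", robots_content,
                      re.IGNORECASE)
    if match is None:
        return True  # no max-snippet restriction at all
    return int(match.group(1)) == -1  # -1 means no limit


print(allows_unlimited_snippet("index, follow"))
print(allows_unlimited_snippet("max-snippet:-1, max-image-preview:large"))
print(allows_unlimited_snippet("index, max-snippet:50"))
```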
AI bot crawlability in robots.txt
Separately from the 21 signals, the audit checks robots.txt access for 6 major AI bots:
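• GPTBot: OpenAI's web crawler; OpenAI documents it as collecting content that may be used to train its models
• ChatGPT-User: the agent ChatGPT uses to fetch pages in real time on a user's behalf
• ClaudeBot: Anthropic's crawler for Claude
• Google-Extended: a robots.txt control token (not a separate crawler) that governs whether Google may use content for Gemini
• PerplexityBot: Perplexity AI's search crawler
• CCBot: Common Crawl's bot, whose dataset is used by many AI systems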
If all six bots are blocked at once, this is a critical issue (−20 pts): the site is invisible to the AI systems these bots serve. If only some bots are blocked, each blocked bot is a warning (−10 pts).
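To unblock, remove or comment out the Disallow rules for these user-agents in robots.txt. If you only want to block AI training but allow AI search, block the training-oriented tokens (CCBot and Google-Extended) and allow the bots that fetch pages for live answers (ChatGPT-User and PerplexityBot). GPTBot and ClaudeBot need a deliberate choice: their vendors document them as collecting content that may be used for model training, so check the current OpenAI and Anthropic crawler documentation before deciding. Any bot you leave blocked is still reported as a warning.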
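The robots.txt check and its severity rule can be sketched with Python's standard `urllib.robotparser`. The sample robots.txt below blocks only CCBot and Google-Extended; the function names and the example URL are hypothetical, and the audit's own parser may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot",
           "Google-Extended", "PerplexityBot", "CCBot"]


def blocked_ai_bots(robots_txt: str,
                    url: str = "https://www.example.com/") -> list:
    """Return the AI user-agents that may not fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


def severity(blocked: list) -> str:
    """Critical if every bot is blocked, warning if only some are."""
    if len(blocked) == len(AI_BOTS):
        return "critical"
    return "warning" if blocked else "ok"


# Example robots.txt: block two tokens, allow everything else.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_ai_bots(ROBOTS_TXT)
print(blocked, severity(blocked))
```

Note that this only reflects robots.txt rules. A firewall or CDN rule can still block a bot that robots.txt allows.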
Cloudflare AI bot blocking
Some sites use Cloudflare's 'Block AI Scrapers' feature or custom WAF rules to keep AI crawlers out. While this may be intentional, it can also inadvertently block bots such as GPTBot or PerplexityBot, preventing the page from being cited by AI systems.
Metrics
| Metric | Description |
|---|---|
| AI Readiness score | Percentage of 21 signals present (0–100). |
| Entity Schema | Whether Person, Organization, Product, or similar entity schema is present. |
| FAQ Schema | Whether FAQPage JSON-LD is present. |
| FAQ HTML patterns | Whether question-phrased headings are detected in HTML. |
| Speakable | Whether Speakable schema is present. |
| HowTo Schema | Whether HowTo JSON-LD is present. |
| BreadcrumbList Schema | Whether BreadcrumbList JSON-LD is present. |
| sameAs links | Number of sameAs links to authoritative sources. |
| Content sections | Number of distinct content sections detected (≥3 = signal present). |
| Definitions | Whether 'X is Y' definition patterns are detected in body text. |
| llms.txt | Whether /llms.txt exists and returns HTTP 200. |
| llms.txt quality | Whether llms.txt contains structured sections (not just a stub). |
| Markdown for agents | Whether the server returns Content-Type: text/markdown for Accept: text/markdown requests. |
| Cloudflare | Whether Cloudflare is detected (may affect AI crawler access). |
| Q&A ratio | Percentage of headings phrased as questions (≥10% = signal present). |
| Named entities | Number of multi-word proper nouns detected (≥3 = signal present). |
| Citation passages | Number of paragraphs with statistics, dates, or data (≥2 = signal present). |
| Comparison tables | Whether HTML tables with ≥2 rows and ≥2 columns exist. |
| TL;DR pattern | Whether ≥30% of sections start with a short summary paragraph. |
| Long-tail headings | Whether ≥2 headings use ≥4-word specific queries with modifier words. |
| Conversational headings | Whether ≥2 headings use ≥5-word natural language patterns. |
| Authority backlinks | Whether sameAs links point to Wikipedia, Reddit, Quora, or Crunchbase. |
| AI plugin manifest | Whether /.well-known/ai-plugin.json exists with valid schema_version. |
| max-snippet unlimited | Whether meta robots allows unlimited snippet length. |
| AI bots blocked | Which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) are blocked in robots.txt. |