
AI Readiness

AI Readiness (AEO/GEO) measures how well a page is optimised for AI-powered search: ChatGPT, Perplexity, Google AI Overviews, Claude, and voice assistants. The audit evaluates 21 signals. Weight: 10%.

Details

What is AEO/GEO

Answer Engine Optimisation (AEO) and Generative Engine Optimisation (GEO) are emerging disciplines focused on making content discoverable and citable by AI systems. As AI Overviews and AI chatbots replace traditional blue-link search for some queries, being cited in AI-generated answers is becoming an increasingly important source of traffic.

The AI Readiness score = (number of signals present / 21) × 100. Each signal is binary (present or not).
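The formula above can be sketched in a few lines of Python. The function name and the input check are illustrative assumptions, not the audit tool's actual code.

```python
TOTAL_SIGNALS = 21  # number of binary signals described on this page


def ai_readiness_score(signals_present: int) -> float:
    """Return the score as a percentage of the 21 signals that are present."""
    if not 0 <= signals_present <= TOTAL_SIGNALS:
        raise ValueError("signals_present must be between 0 and 21")
    # Same value as (signals_present / 21) * 100
    return 100 * signals_present / TOTAL_SIGNALS


# Example: a page with 14 of the 21 signals present
print(round(ai_readiness_score(14), 1))
```

Because every signal is binary, the score moves in steps of roughly 4.8 points per signal.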

Entity clarity and Schema (3 signals)

AI systems extract entities (people, organisations, products, places) from web pages.

1. Entity Schema — Person, Organization, Product, etc. in JSON-LD makes entities machine-readable

2. sameAs links — connections to Wikidata, Wikipedia, LinkedIn, Crunchbase strengthen entity recognition

3. BreadcrumbList Schema — helps AI understand page position in site hierarchy

Missing Entity Schema or sameAs links are warning-level issues (−10 pts).
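As an illustration of signals 1 and 2, the sketch below builds an Organization entity with sameAs links and wraps it in a JSON-LD script tag. The company name and all URLs are placeholders, and the helper names are hypothetical.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list) -> dict:
    """Build a schema.org Organization entity with sameAs links."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


def as_script_tag(data: dict) -> str:
    """Serialise the entity as a JSON-LD script tag for the page head."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


# Placeholder values: replace with the real entity and its profile URLs
entity = organization_jsonld(
    "Example Co",
    "https://www.example.com/",
    [
        "https://en.wikipedia.org/wiki/Example",
        "https://www.linkedin.com/company/example",
    ],
)
print(as_script_tag(entity))
```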

FAQ and Q&A content (3 signals)

FAQ-style content is frequently cited by AI systems because it is already structured as question-answer pairs.

4. FAQPage Schema — JSON-LD FAQPage markup. Missing = warning

5. FAQ HTML patterns — question-phrased headings (ending with '?' or starting with how/what/why/when) detected independently of schema

6. Q&A ratio — percentage of headings (H2+H3) phrased as questions. ≥10% indicates good Q&A content structure
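Signals 5 and 6 can be approximated as follows. This is a minimal sketch that assumes the headings have already been extracted as plain strings; the audit's real detection rules may differ.

```python
import re

QUESTION_STARTERS = ("how", "what", "why", "when")


def is_question_heading(text: str) -> bool:
    """True if the heading ends with '?' or starts with a question word."""
    cleaned = text.strip().lower()
    if cleaned.endswith("?"):
        return True
    first_word = re.match(r"[a-z]+", cleaned)
    return bool(first_word) and first_word.group(0) in QUESTION_STARTERS


def qa_ratio(headings: list) -> float:
    """Percentage of H2/H3 headings phrased as questions."""
    if not headings:
        return 0.0
    questions = sum(is_question_heading(h) for h in headings)
    return 100 * questions / len(headings)


headings = ["What is AEO?", "Pricing", "How to add FAQ schema", "Contact", "Our team"]
ratio = qa_ratio(headings)
print(ratio, ratio >= 10)  # the signal is present when the ratio is at least 10%
```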

Speakable and HowTo Schema (2 signals)

7. Speakable Schema — marks sections suitable for text-to-speech (Google Assistant, smart speakers)

8. HowTo Schema — step-by-step instructions in JSON-LD, used by AI for procedural queries

Content structure (3 signals)

Well-structured content is easier for AI systems to parse and cite.

9. Clear content sections — at least 3 distinct sections with H2/H3 headings and body text

10. Definitions — 'X is Y' patterns or explicit definition sentences that AI can extract as direct answers

11. Named entities — 3+ multi-word proper nouns (people, companies, places) detected via NLP patterns
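The page does not specify the exact patterns behind signals 10 and 11, so the regular expressions below are assumptions chosen only to show the idea: one matches simple 'X is a Y' sentences, the other matches runs of two or more capitalised words.

```python
import re

# Assumed pattern: a capitalised subject followed by "is/are a/an/the"
DEFINITION_RE = re.compile(r"\b[A-Z][A-Za-z0-9 ()/-]{1,60}? (?:is|are) (?:a|an|the) ")

# Assumed pattern: two or more consecutive capitalised words
NAMED_ENTITY_RE = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def has_definition(text: str) -> bool:
    """True if the text contains an 'X is a Y' style sentence."""
    return DEFINITION_RE.search(text) is not None


def named_entities(text: str) -> set:
    """Return the distinct multi-word proper nouns found in the text."""
    return set(NAMED_ENTITY_RE.findall(text))


sample = "Acme Robotics was founded in New York by Jane Doe."
entities = named_entities(sample)
print(sorted(entities), len(entities) >= 3)  # signal needs 3 or more
```

A pattern this simple also matches capitalised sentence openers such as 'The World Bank', so a production check would need filtering.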

Machine-readable formats (2 signals)

12. llms.txt — a proposed standard (similar to robots.txt) at /llms.txt that provides a structured overview of the site for LLMs and AI crawlers

13. Markdown for agents — whether the server returns Content-Type: text/markdown when an AI agent sends Accept: text/markdown. This lets AI agents receive clean, parseable text instead of HTML.
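A check for signal 13 could look like the sketch below, which uses only the Python standard library. The function names are hypothetical, and the network call is shown for illustration; it is not the audit's own implementation.

```python
from typing import Optional
from urllib.request import Request, urlopen


def is_markdown_content_type(content_type: Optional[str]) -> bool:
    """True if a Content-Type header value names the text/markdown media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/markdown"


def serves_markdown(url: str, timeout: float = 10.0) -> bool:
    """Request the URL with Accept: text/markdown and inspect the response type."""
    request = Request(url, headers={"Accept": "text/markdown"})
    with urlopen(request, timeout=timeout) as response:
        return is_markdown_content_type(response.headers.get("Content-Type"))


print(is_markdown_content_type("text/markdown; charset=utf-8"))
```

Calling serves_markdown("https://www.example.com/") would perform the live request; the URL here is a placeholder.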

Citable content patterns (4 signals)

14. Citation passages — 2+ paragraphs containing statistics, percentages, dollar amounts, years, or large numbers. AI systems prefer citing content with concrete data

15. Comparison tables — HTML tables with ≥2 rows and ≥2 columns. Frequently extracted by AI for feature/product comparisons

16. TL;DR pattern — ≥30% of sections have a short opening paragraph (≤3 sentences, ≤80 words) that works as a summary

17. Authority backlinks — sameAs links pointing to Wikipedia, Reddit, Quora, or Crunchbase (signals credibility to AI)
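The TL;DR thresholds in signal 16 translate into a short check. This sketch assumes the opening paragraph of each section has already been extracted, and it splits sentences naively on terminal punctuation.

```python
import re


def is_short_summary(paragraph: str, max_sentences: int = 3, max_words: int = 80) -> bool:
    """True if the paragraph has at most 3 sentences and at most 80 words."""
    words = paragraph.split()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return 0 < len(words) <= max_words and len(sentences) <= max_sentences


def has_tldr_pattern(opening_paragraphs: list, threshold_percent: int = 30) -> bool:
    """True if at least 30% of sections open with a short summary paragraph."""
    total = len(opening_paragraphs)
    if total == 0:
        return False
    short = sum(is_short_summary(p) for p in opening_paragraphs)
    # Integer comparison avoids floating-point edge cases at exactly 30%
    return 100 * short >= threshold_percent * total


short_opening = "AEO makes content citable by AI systems."
long_opening = "One. Two. Three. Four."  # four sentences, so not a summary
print(has_tldr_pattern([short_opening, short_opening, long_opening]))
```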

Search and heading patterns (2 signals)

18. Long-tail headings — 2+ headings (H2/H3/H4) with ≥4 words and a modifier word (e.g., 'How to fix a broken canonical tag'). These match the specific queries AI systems handle

19. Conversational headings — 2+ headings with ≥5 words using natural language patterns. Improves voice search and AI assistant responses
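Signal 18 can be sketched as below. The list of modifier words is not given on this page, so MODIFIER_WORDS is a made-up sample for illustration only.

```python
# Assumed sample of modifier words; the audit's actual list is not documented here
MODIFIER_WORDS = {"how", "best", "fix", "without", "vs", "for", "guide"}


def is_long_tail(heading: str, min_words: int = 4) -> bool:
    """True if the heading has at least 4 words and contains a modifier word."""
    words = [w.strip(".,:;!?'\"()").lower() for w in heading.split()]
    words = [w for w in words if w]
    return len(words) >= min_words and any(w in MODIFIER_WORDS for w in words)


def has_long_tail_headings(headings: list, min_count: int = 2) -> bool:
    """True if at least 2 headings qualify as long-tail."""
    return sum(is_long_tail(h) for h in headings) >= min_count


headings = ["How to fix a broken canonical tag", "Best CRM for small teams", "Pricing"]
print(has_long_tail_headings(headings))
```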

Technical AI integration (2 signals)

20. AI plugin manifest — whether /.well-known/ai-plugin.json exists with a valid schema_version (ChatGPT Actions / OpenAI plugin integration)

21. max-snippet unlimited — whether meta robots allows unlimited snippet length (no max-snippet restriction, or max-snippet:-1). AI systems need full-length snippets to generate accurate answers
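The rule in signal 21 can be expressed as a small check on the content attribute of the meta robots tag. This sketch follows the rule as stated above and does not consider other directives such as nosnippet.

```python
import re


def allows_unlimited_snippet(robots_content: str) -> bool:
    """True if meta robots has no max-snippet limit or sets max-snippet:-1."""
    match = re.search(r"max-snippet\s*:\s*(-?\d+)", robots_content, re.IGNORECASE)
    if match is None:
        return True  # no max-snippet directive means no restriction
    return int(match.group(1)) == -1


print(allows_unlimited_snippet("index, follow, max-snippet:-1"))
print(allows_unlimited_snippet("index, follow, max-snippet:50"))
```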

AI bot crawlability in robots.txt

Separately from the 21 signals, the audit checks robots.txt access for 6 major AI bots:

• GPTBot — OpenAI's general web crawler

• ChatGPT-User — ChatGPT's real-time browsing agent

• ClaudeBot — Anthropic's crawler for Claude

• Google-Extended — Google's robots.txt token that controls whether content may be used for Gemini AI training (a control token, not a separate crawler)

• PerplexityBot — Perplexity AI's search crawler

• CCBot — Common Crawl bot used by many AI systems

If all six bots are blocked at once, this is a critical issue (−20 pts): the site is effectively invisible to these AI systems. If only some bots are blocked, each is a warning (−10 pts).

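To unblock, remove or comment out the Disallow rules for these user-agents in robots.txt. If you only want to block AI training but allow AI search, block CCBot and Google-Extended while allowing ChatGPT-User and PerplexityBot. OpenAI and Anthropic describe GPTBot and ClaudeBot as crawlers whose collected content may be used for model training, so decide on those two according to your training policy and check each vendor's current crawler documentation.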
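The robots.txt check described in this section can be approximated with Python's standard urllib.robotparser module. The helper names and the tested URL are assumptions, and the mapping to issues follows the severities quoted above rather than the audit's actual code.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended", "PerplexityBot", "CCBot"]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the AI user-agents that robots.txt disallows for the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


def crawlability_issues(blocked: list) -> list:
    """Map blocked bots to (severity, points, message) tuples."""
    if len(blocked) == len(AI_BOTS):
        return [("critical", -20, "all AI bots blocked")]
    return [("warning", -10, bot + " blocked") for bot in blocked]


robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""
print(crawlability_issues(blocked_ai_bots(robots_txt)))
```

This only reflects robots.txt rules. Blocking at the CDN or firewall level is not visible to a check like this.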

Cloudflare AI bot blocking

Some sites use Cloudflare's 'Block AI Scrapers' feature or custom WAF rules to keep AI crawlers out. While this may be intentional, it can also inadvertently block crawlers such as GPTBot or PerplexityBot, preventing the page from being cited by AI systems.

Metrics

Metric	Description
AI Readiness score	Percentage of 21 signals present (0–100).
Entity Schema	Whether Person, Organization, Product, or similar entity schema is present.
FAQ Schema	Whether FAQPage JSON-LD is present.
FAQ HTML patterns	Whether question-phrased headings are detected in HTML.
Speakable	Whether Speakable schema is present.
HowTo Schema	Whether HowTo JSON-LD is present.
BreadcrumbList Schema	Whether BreadcrumbList JSON-LD is present.
sameAs links	Number of sameAs links to authoritative sources.
Content sections	Number of distinct content sections detected (≥3 = signal present).
Definitions	Whether 'X is Y' definition patterns are detected in body text.
llms.txt	Whether /llms.txt exists and returns HTTP 200.
llms.txt quality	Whether llms.txt contains structured sections (not just a stub).
Markdown for agents	Whether the server returns Content-Type: text/markdown for Accept: text/markdown requests.
Cloudflare	Whether Cloudflare is detected (may affect AI crawler access).
Q&A ratio	Percentage of headings phrased as questions (≥10% = signal present).
Named entities	Number of multi-word proper nouns detected (≥3 = signal present).
Citation passages	Number of paragraphs with statistics, dates, or data (≥2 = signal present).
Comparison tables	Whether HTML tables with ≥2 rows and ≥2 columns exist.
TL;DR pattern	Whether ≥30% of sections start with a short summary paragraph.
Long-tail headings	Whether ≥2 headings use ≥4-word specific queries with modifier words.
Conversational headings	Whether ≥2 headings use ≥5-word natural language patterns.
Authority backlinks	Whether sameAs links point to Wikipedia, Reddit, Quora, or Crunchbase.
AI plugin manifest	Whether /.well-known/ai-plugin.json exists with valid schema_version.
max-snippet unlimited	Whether meta robots allows unlimited snippet length.
AI bots blocked	Which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) are blocked in robots.txt.
