AI Readiness

AI Readiness (AEO/GEO) measures how well the page is optimised for AI-powered search, including ChatGPT, Perplexity, Google AI Overviews (formerly SGE), and voice assistants.

Details

What is AEO/GEO

Answer Engine Optimisation (AEO) and Generative Engine Optimisation (GEO) are emerging disciplines focused on making content discoverable and citable by AI systems. As AI Overviews and AI chatbots replace traditional blue-link search for some queries, being featured in AI-generated answers becomes a traffic source.

Entity clarity and Schema

AI systems extract entities (people, organisations, products, places) from web pages. Entity Schema (Person, Organization, Product, etc.) in JSON-LD makes entities machine-readable and helps AI correctly identify what the page is about. sameAs links connecting to Wikidata or Wikipedia entries strengthen entity recognition.
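
As a minimal sketch of the entity markup described above, the Python snippet below builds an Organization entity with sameAs links and wraps it in the script element used to embed JSON-LD in a page. The organisation name, URLs, and Wikidata ID are placeholders, and the helper names are hypothetical.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return an Organization entity as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points at external profiles that identify the same entity.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element used to embed it in HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


snippet = as_script_tag(
    organization_jsonld(
        name="Example Widgets Ltd",  # hypothetical organisation
        url="https://www.example.com/",
        same_as=[
            "https://www.wikidata.org/wiki/Q0000000",  # placeholder ID
            "https://en.wikipedia.org/wiki/Example",  # placeholder article
        ],
    )
)
print(snippet)
```

The same shape should work for Person or Product by changing "@type" and the type-specific properties.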

FAQ and Q&A content

FAQPage schema and visible Q&A sections are frequently cited by AI systems because they are already structured as question-answer pairs. Content with a high Q&A ratio is more likely to be surfaced in AI Overviews and voice search responses.
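
The sketch below shows one way to generate FAQPage JSON-LD from question-answer pairs that already appear on the page. The function name and the sample questions are illustrative assumptions; the markup should mirror the visible Q&A text rather than add content that users cannot see.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage structure from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_jsonld(
    [
        # Sample pairs for illustration only.
        ("What is AEO?", "Answer Engine Optimisation makes content citable by AI systems."),
        ("What is llms.txt?", "A proposed file that gives LLMs an overview of a site."),
    ]
)
print(json.dumps(faq, indent=2))
```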

Speakable schema

Speakable markup (the schema.org speakable property, whose value is a SpeakableSpecification) marks sections of text that are suitable for text-to-speech (TTS), as used by Google Assistant and smart speakers. It signals the most important, concise passages for audio delivery.
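
In schema.org terms, this markup is a speakable property on the page whose value is a SpeakableSpecification pointing at the relevant elements. The sketch below builds such a block using CSS selectors; the page name, URL, and selectors are placeholders, and the helper name is hypothetical.

```python
import json


def speakable_jsonld(name: str, url: str, css_selectors: list[str]) -> dict:
    """Build WebPage JSON-LD that flags elements as suitable for TTS."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            # Selectors for the short passages meant to be read aloud.
            "cssSelector": css_selectors,
        },
    }


page = speakable_jsonld(
    name="How to fix a broken canonical tag",  # placeholder title
    url="https://www.example.com/canonical-fix",  # placeholder URL
    css_selectors=[".article-summary", ".key-steps"],  # placeholder selectors
)
print(json.dumps(page, indent=2))
```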

llms.txt

llms.txt is a proposed standard (similar to robots.txt) that provides a structured overview of a site for LLMs and AI crawlers. A well-formed llms.txt file helps AI systems understand site structure, content types, and contact information without crawling every page.
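
The llms.txt proposal describes a Markdown file with a title, a short blockquote summary, and sections of annotated links. The sketch below generates such a file and applies a simple structural check. The check is an illustrative assumption and is not necessarily how CheckSEO scores llms.txt quality; the site name and URLs are placeholders.

```python
import re


def build_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a title, summary, and link sections as llms.txt Markdown."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def looks_structured(text: str) -> bool:
    """Hypothetical check: a title, a section, and at least one link item."""
    has_title = re.search(r"^# \S", text, flags=re.MULTILINE) is not None
    has_section = re.search(r"^## \S", text, flags=re.MULTILINE) is not None
    has_link = (
        re.search(r"^- \[[^\]]+\]\([^)]+\)", text, flags=re.MULTILINE) is not None
    )
    return has_title and has_section and has_link


doc = build_llms_txt(
    title="Example Site",  # placeholder
    summary="Guides and reference material about technical SEO.",
    sections={
        "Docs": [
            ("Getting started", "https://www.example.com/start.md", "setup guide"),
        ],
    },
)
print(doc)
```

A file containing only a title would fail this check, which is roughly what distinguishes a stub from a structured file.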

AI bot crawlability in robots.txt

Many sites explicitly block AI crawlers in robots.txt — sometimes intentionally (to prevent training), sometimes by mistake (copying a template). CheckSEO checks access for 6 major AI bots:

• GPTBot — OpenAI's crawler that collects content for training its generative AI models

• ChatGPT-User — ChatGPT's real-time browsing agent

• ClaudeBot — Anthropic's crawler for Claude

• Google-Extended — Google's robots.txt control token for AI training use (Gemini, formerly Bard); it is not a separate crawler

• PerplexityBot — Perplexity AI's search crawler

• CCBot — Common Crawl bot used by many AI systems

If any of these are blocked, the corresponding AI system may be unable to fetch the site, which makes the site less likely to appear in that system's answers. The check uses the Protego parser, which follows Google's robots.txt specification, to evaluate robots.txt rules per user-agent. A WARNING is raised for each blocked bot under the AI Readiness category.
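
A per-user-agent check of this kind can be sketched as follows. This example deliberately swaps in Python's standard-library urllib.robotparser so that it runs without extra dependencies; it is not the Protego-based implementation, and the two parsers can differ on edge cases such as wildcards. The sample robots.txt and the function name are assumptions.

```python
from urllib.robotparser import RobotFileParser

# The six user-agent tokens listed above.
AI_BOTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Google-Extended",
    "PerplexityBot",
    "CCBot",
]

# Sample robots.txt for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_ai_bots(robots_txt: str, url: str) -> list[str]:
    """Return the AI bot tokens that may not fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


blocked = blocked_ai_bots(SAMPLE_ROBOTS_TXT, "https://www.example.com/")
for bot in blocked:
    print(f"WARNING: {bot} is blocked in robots.txt")
```

Parsing the file contents directly, rather than fetching them inside the function, keeps the check easy to test against saved robots.txt files.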

To unblock, remove or comment out the Disallow rules for these user-agents in robots.txt. If you only want to block AI training but allow search, note that OpenAI and Anthropic document GPTBot and ClaudeBot as crawlers that collect training data. Consider blocking GPTBot, ClaudeBot, CCBot, and Google-Extended while allowing ChatGPT-User and PerplexityBot.
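
The sketch below generates robots.txt groups from a list of tokens to block and a list to allow, then verifies the result with the standard-library parser. Which token belongs in which list is a policy decision for the site owner; the split shown is only an illustration, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """Emit one robots.txt group per user-agent token."""
    groups = []
    for token in blocked:
        groups.append(f"User-agent: {token}\nDisallow: /")
    for token in allowed:
        groups.append(f"User-agent: {token}\nAllow: /")
    return "\n\n".join(groups) + "\n"


# Illustrative policy: block two training-related tokens, allow two agents.
robots_txt = build_robots_txt(
    blocked=["CCBot", "Google-Extended"],
    allowed=["ChatGPT-User", "PerplexityBot"],
)
print(robots_txt)

# Verify the generated rules before deploying them.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
home = "https://www.example.com/"  # placeholder URL
status = {
    bot: parser.can_fetch(bot, home)
    for bot in ["CCBot", "Google-Extended", "ChatGPT-User", "PerplexityBot"]
}
print(status)
```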

Cloudflare AI bot blocking

Some sites use Cloudflare's 'Block AI Scrapers' feature or custom rules to prevent AI training crawlers. While this may be intentional, it can also inadvertently block AI search and browsing agents such as PerplexityBot or ChatGPT-User, preventing the page from being fetched by those AI systems.
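
One common way to tell whether a response passed through Cloudflare is to inspect its headers: Cloudflare typically sets Server: cloudflare and adds a CF-RAY header. The sketch below applies that heuristic to a header mapping. It is an assumption about how detection could work, not a description of CheckSEO's method, and it cannot tell whether AI bot blocking is actually enabled.

```python
from collections.abc import Mapping


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic: Cloudflare-served responses carry these headers."""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    return lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered


# Sample header sets for illustration only.
print(looks_like_cloudflare({"Server": "cloudflare", "CF-RAY": "0000000000000000-LHR"}))
print(looks_like_cloudflare({"Server": "nginx"}))
```

To confirm real blocking, the page would also need to be requested with each AI bot's user-agent string and the response status compared.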

Comparison tables and structured patterns

AI systems frequently extract data from comparison tables, TL;DR summaries, and definition lists because these formats are already structured for machine consumption.

• Comparison tables — side-by-side feature/product comparisons are highly citable by AI.

• TL;DR pattern — a concise summary at the top or bottom of the article helps AI extract the key takeaway.

• Long-tail headings — headings phrased as specific questions (e.g., 'How to fix a broken canonical tag') match the long-tail queries AI systems handle.

• Conversational headings — headings using natural language patterns improve voice search and AI assistant responses.

• Authority backlinks — links from/to authoritative domains signal content trustworthiness to AI systems.
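
To make the heading patterns above concrete, the sketch below flags headings that read like questions and reports the share of such headings on a page. The word list and the scoring rule are illustrative assumptions, not CheckSEO's detection logic.

```python
import re

# Hypothetical list of words that often open a question-style heading.
QUESTION_STARTERS = (
    "how", "what", "why", "when", "where", "which",
    "who", "can", "does", "is", "should",
)


def is_question_like(heading: str) -> bool:
    """True if the heading ends with '?' or opens with a question word."""
    text = heading.strip().lower()
    if text.endswith("?"):
        return True
    first_word = re.match(r"[a-z']+", text)
    return first_word is not None and first_word.group(0) in QUESTION_STARTERS


def question_heading_ratio(headings: list[str]) -> float:
    """Fraction of headings that look like questions (0.0 if none given)."""
    if not headings:
        return 0.0
    hits = sum(is_question_like(heading) for heading in headings)
    return hits / len(headings)


sample = [
    "How to fix a broken canonical tag",
    "Pricing",
    "What is AEO?",
    "Changelog",
]
print(question_heading_ratio(sample))
```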

Metrics

Metric	Description
Entity Schema	Whether Person, Organization, Product, or similar entity schema is present.
FAQ Schema	Whether FAQPage JSON-LD is present.
Speakable	Whether Speakable schema is present.
HowTo Schema	Whether HowTo JSON-LD (including HowToStep items) is present.
BreadcrumbList Schema	Whether BreadcrumbList JSON-LD is present.
sameAs links	Number of sameAs links to authoritative sources.
llms.txt	Whether /llms.txt exists and returns HTTP 200.
llms.txt quality	Whether llms.txt contains structured sections (not just a stub).
Markdown for Agents	Whether a .md or /llms.txt version of the page exists for AI agents.
Cloudflare	Whether Cloudflare is detected (may affect AI crawler access).
Content sections	Number of distinct content sections detected.
Q&A ratio	Percentage of content structured as question-answer pairs.
Named entities	Number of named entities detected in the page text.
Citation passages	Number of citable passages with statistics, dates, or data.
Comparison tables	Whether the page contains comparison or feature tables.
TL;DR pattern	Whether a TL;DR or summary section is present.
Long-tail headings	Whether headings are phrased as specific, question-like queries.
Conversational headings	Whether headings use natural conversational language patterns.
Authority backlinks	Whether the page links to/from authoritative domains.
AI bots in robots.txt	Which AI crawlers (GPTBot, ClaudeBot, PerplexityBot, etc.) are blocked in robots.txt.
