
    Agent Reading Test

    LLM Evaluations

    A benchmark that tests how well AI coding agents can read web content, surfacing silent failure modes like truncation, CSS burial, SPA shells, and broken markdown parsing.


    At a Glance

    Pricing
    Open Source

    Fully free and open-source benchmark for testing AI agent web reading capabilities.


    Available On

    Web
    API

    Resources

    • Website
    • Docs
    • GitHub
    • llms.txt

    Topics

    • LLM Evaluations
    • Agent Frameworks
    • Documentation

    Alternatives

    • harness-kit
    • Ashr
    • Maxim

    Developer

    Agent Ecosystem
    Seattle, WA
    Est. 2025

    Listed Apr 2026

    About Agent Reading Test

    Agent Reading Test is a free, open-source benchmark designed to evaluate how well AI coding agents (such as Claude Code, Cursor, and GitHub Copilot) read and process documentation websites. It surfaces silent failure modes that affect real agent workflows — including content truncation, boilerplate burial, client-side rendering gaps, and tabbed content serialization. Each of the 10 test pages targets a specific failure mode documented in the Agent-Friendly Documentation Spec at agentdocsspec.com. Canary tokens are embedded at strategic positions, and agents complete realistic documentation tasks before reporting which tokens they encountered, yielding a detailed score out of 20.

    Key Features:

    • Truncation Test — A 150K-character page with canary tokens at 10K, 40K, 75K, 100K, and 130K positions to map exactly where an agent's truncation limit kicks in.
    • Boilerplate Burial Test — 80K of inline CSS precedes real content, testing whether agents distinguish CSS noise from documentation.
    • SPA Shell Test — A client-side rendered page where content only appears after JavaScript executes, exposing agents that see empty shells.
    • Tabbed Content Test — Eight language variants in tabs with canary tokens in tabs 1, 4, and 8, measuring how far into serialized tab content an agent reads.
    • Soft 404 Test — An HTTP 200 response with a "page not found" message, testing whether agents recognize error pages.
    • Broken Code Fence Test — An unclosed markdown code fence that turns all subsequent content into "code," testing markdown parsing awareness.
    • Content Negotiation Test — Different canary tokens in HTML vs. markdown versions, testing whether agents request the better format.
    • Cross-Host Redirect Test — A 301 redirect to a different hostname, testing whether agents follow cross-host redirects.
    • Header Quality Test — Three cloud platforms with identical step headers, testing whether agents can determine which section is which.
    • Content Start Test — Real content buried after 50% navigation chrome, testing whether agents read past sidebar serialization.
    • Scoring Form — Paste a comma-separated list of canary tokens into the scoring form for a detailed breakdown of pipeline delivery and content loss.
    • Open Source & CC BY 4.0 — Source code is publicly available on GitHub under the agent-ecosystem organization; content is licensed under Creative Commons Attribution 4.0.
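    The canary-token mechanic behind the Truncation Test can be sketched in a few lines. The offsets (10K, 40K, 75K, 100K, 130K in a 150K-character page) come from the test description above; the token format and page contents here are hypothetical, not the benchmark's actual implementation:

```python
# Sketch of the canary-token idea behind the Truncation Test.
# Offsets are from the test description; token names are hypothetical.
OFFSETS = [10_000, 40_000, 75_000, 100_000, 130_000]

def build_page(total_len=150_000):
    """Build a filler page with a canary token planted at each offset."""
    chars = list("x" * total_len)
    tokens = []
    for i, off in enumerate(OFFSETS):
        token = f"CANARY-{i}-{off}"  # hypothetical token format
        chars[off:off + len(token)] = token
        tokens.append(token)
    return "".join(chars), tokens

def surviving_tokens(page_as_seen, tokens):
    """Which canaries appear in the text the agent actually received?"""
    return [t for t in tokens if t in page_as_seen]

page, tokens = build_page()
# Simulate an agent whose fetch pipeline silently truncates at 50K chars:
truncated = page[:50_000]
print(surviving_tokens(truncated, tokens))  # only the 10K and 40K canaries
```

    The last canary an agent reports brackets its truncation limit: seeing the 40K token but not the 75K one places the cutoff somewhere between those two offsets.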

    To get started, point your agent at agentreadingtest.com/start/ and tell it to follow the instructions it finds there. After it completes all 10 tasks, paste the agent's reported canary tokens into the scoring form at agentreadingtest.com/score/ for a full breakdown.
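    The actual scoring happens in the web form, but comparing a comma-separated token report against an answers.json-style key reduces to a small set operation. This sketch assumes a hypothetical key structure and token names; the benchmark's real answers.json format may differ:

```python
import json

# Hypothetical answer key in the spirit of the published answers.json;
# real test names and tokens will differ.
answer_key = json.loads("""
{
  "truncation": ["TOK-A", "TOK-B", "TOK-C"],
  "spa-shell": ["TOK-D"]
}
""")

def score(report: str) -> dict:
    """Count, per test, how many expected canaries the agent reported."""
    seen = {t.strip() for t in report.split(",") if t.strip()}
    return {test: sum(tok in seen for tok in toks)
            for test, toks in answer_key.items()}

print(score("TOK-A, TOK-C, TOK-D"))  # {'truncation': 2, 'spa-shell': 1}
```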



    Pricing

    OPEN SOURCE

    Free

    Fully free and open-source benchmark for testing AI agent web reading capabilities.

    • 10 benchmark test pages
    • Canary token scoring
    • Scoring form with detailed breakdown
    • Answer key (answers.json)
    • Maximum score of 20 points

    Capabilities

    Key Features

    • 10 targeted benchmark tests for AI agent web reading
    • Canary token detection at strategic content positions
    • Truncation limit mapping with 150K-char test page
    • Boilerplate burial detection (80K inline CSS)
    • SPA/client-side rendering failure detection
    • Tabbed content serialization testing
    • Soft 404 recognition test
    • Broken markdown code fence parsing test
    • Content negotiation (HTML vs. Markdown) test
    • Cross-host redirect following test
    • Section header quality disambiguation test
    • Content start position test
    • Scoring form with detailed breakdown
    • Answer key available as JSON
    • Maximum score of 20 points
    • Companion to Agent-Friendly Documentation Spec

    Integrations

    Claude Code
    Cursor
    GitHub Copilot
    API Available


    Developer

    Agent Ecosystem

    Agent Ecosystem builds open-source tools and benchmarks for evaluating AI coding agent capabilities. The project is created by Dachary Carey and focuses on empirical observation of real agent workflows. Agent Reading Test and the Agent-Friendly Documentation Spec are flagship projects that help developers understand and improve how AI agents consume documentation.

    Founded 2025
    Seattle, WA
    3 employees

    Used by

    Sponsors (undisclosed)
    Website
    GitHub
    1 tool in directory

    Similar Tools


    harness-kit

    A Python toolkit for building and evaluating AI agent harnesses, enabling structured testing and benchmarking of LLM-based agents.


    Ashr

    Ashr is an AI agent evaluation platform that mimics production environments and user behavior to catch agent failures before they reach real users.


    Maxim

    Enterprise-grade AI evaluation and observability platform for testing, monitoring, and improving AI agents and LLM applications.


    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    54 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    173 tools

    Documentation

    AI-driven tools that automatically generate, maintain, and organize technical documentation, user guides, and project artifacts with context-aware content and intelligent updating.

    44 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026