EveryDev.ai

    Inspect AI

    LLM Evaluations
    Featured

    An open-source Python framework for large language model evaluations developed by the UK AI Security Institute, supporting agentic tasks, tool use, multi-turn dialog, and 200+ pre-built benchmarks.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. Install via pip and use with any supported model provider.

    Available On

    Linux
    API
    VS Code
    SDK
    CLI

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    LLM Evaluations, Agent Frameworks, AI Development Libraries

    Alternatives

    ZeroEval, Arize AI, Gambit

    Developer

    UK AI Security Institute, London, United Kingdom, Est. 2023

    Listed May 2026

    About Inspect AI

    Inspect is an open-source Python framework for large language model (LLM) evaluations, developed by the UK AI Security Institute (AISI) and Meridian Labs. It is available on GitHub under the MIT License and installable via PyPI. The framework targets a broad range of evaluation types, including coding, agentic tasks, reasoning, knowledge, behavior, and multimodal understanding, and ships with over 200 pre-built evaluations ready to run against any supported model.

    What It Is

    Inspect is a structured evaluation framework that organizes LLM assessments around three composable primitives: Datasets (labelled input/target samples), Solvers (chained prompt engineering and agent logic), and Scorers (output evaluation via text comparison, model grading, or custom schemes). This architecture lets researchers and engineers define reusable evaluation components and combine them into reproducible tasks. The @task decorator and inspect eval CLI command make it straightforward to run evaluations against any supported model provider from the command line or directly from Python.
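
    The dataset, solver, and scorer flow described above can be illustrated without the library itself. What follows is a minimal, library-free Python sketch, not Inspect's actual API: every name in it (Sample, State, add_instruction, generate, exact_match, run_task, and the stand-in fake_model) is hypothetical and chosen only for illustration. The official documentation remains the reference for the real @task, solver, and scorer interfaces.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Library-free illustration: every name here is hypothetical, not inspect_ai's API.


@dataclass
class Sample:
    input: str   # question shown to the model
    target: str  # labelled expected answer


@dataclass
class State:
    prompt: str
    output: str = ""


Solver = Callable[[State], State]
Scorer = Callable[[str, str], bool]


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: sums every integer found in the prompt."""
    return str(sum(int(n) for n in re.findall(r"\d+", prompt)))


def add_instruction(state: State) -> State:
    """Solver step 1: prompt engineering (prepend an instruction)."""
    state.prompt = "Answer with a number only.\n" + state.prompt
    return state


def generate(state: State) -> State:
    """Solver step 2: call the (fake) model and record its output."""
    state.output = fake_model(state.prompt)
    return state


def exact_match(output: str, target: str) -> bool:
    """Scorer: plain text comparison, ignoring surrounding whitespace."""
    return output.strip() == target.strip()


def run_task(dataset: List[Sample], solvers: List[Solver], scorer: Scorer) -> float:
    """Run each sample through the solver chain, score it, and return accuracy."""
    correct = 0
    for sample in dataset:
        state = State(prompt=sample.input)
        for solver in solvers:
            state = solver(state)
        correct += scorer(state.output, sample.target)
    return correct / len(dataset)


dataset = [
    Sample("What is 2 + 3?", "5"),
    Sample("What is 10 + 7?", "17"),
    Sample("What is 1 + 1?", "3"),  # deliberately wrong label, so it scores as a miss
]
accuracy = run_task(dataset, [add_instruction, generate], exact_match)
print(f"accuracy: {accuracy:.2f}")
```

    Keeping the three pieces separate is what makes them reusable: the same solver chain can be run against a different dataset, or the same dataset scored with a different scorer, without touching the other parts.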

    Model Provider Coverage

    Inspect supports a wide range of model providers out of the box:

    • Cloud APIs: OpenAI, Anthropic, Google (Gemini), Grok, Mistral, AWS Bedrock, Azure AI, TogetherAI, Groq, Cloudflare, Goodfire
    • Local inference: vLLM, Ollama, llama-cpp-python, TransformerLens, nnterp, Hugging Face Transformers

    Each provider is typically configured by installing the relevant Python package and, for hosted APIs, setting the appropriate API key environment variable, which keeps the setup path consistent across providers.
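
    As a rough illustration of that setup path, the sketch below checks whether the API key for a chosen model appears to be set before a run starts. It rests on assumptions not stated in this listing: that models are named in a "provider/model" form, and that OPENAI_API_KEY and ANTHROPIC_API_KEY are the variables for those two providers. The helper names split_model and missing_key are hypothetical and are not part of Inspect.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed mapping from provider prefix to API-key variable (illustrative subset only).
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def split_model(model: str) -> Tuple[str, str]:
    """Split an assumed 'provider/model' string at the first slash."""
    provider, _, name = model.partition("/")
    if not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the unset API-key variable, or None if nothing is missing."""
    provider, _ = split_model(model)
    var = API_KEY_VARS.get(provider)
    if var is None:
        return None  # local or unlisted provider: no key is checked in this sketch
    return None if env.get(var) else var


if __name__ == "__main__":
    for model in ("openai/gpt-4o", "ollama/llama3.1"):
        problem = missing_key(model)
        print(model, "->", f"set {problem} first" if problem else "ok")
```

    Passing the environment in as a mapping keeps the check easy to test and avoids reading real credentials in examples.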

    Agentic and Tool Evaluation Capabilities

    Inspect includes flexible support for evaluating agents and tool-using models:

    • Built-in tools for bash execution, Python execution, text editing, web search, web browsing, and computer use
    • Custom tool definitions and MCP (Model Context Protocol) tool integration
    • Multi-agent primitives and support for running external agents such as Claude Code, Codex CLI, and Gemini CLI
    • A sandboxing system for isolating untrusted model-generated code, with backends for Docker, Kubernetes, Modal, Proxmox, and a custom extension API
    • Tool approval policies for fine-grained control over which tool calls models are permitted to make
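
    To make the idea of a tool approval policy concrete, here is a small library-free Python sketch of one possible design: an ordered list of rules that match tool names by glob pattern, with the first match deciding the outcome. This is an illustration of the general technique, not Inspect's own policy format or API. The Rule class, the decide function, and the tool names used are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical policy model for illustration; not inspect_ai's approval API.


@dataclass(frozen=True)
class Rule:
    tool_pattern: str  # glob pattern matched against the tool name
    decision: str      # "approve" or "reject"


def decide(tool_name: str, rules: Sequence[Rule], default: str = "reject") -> str:
    """Return the decision of the first matching rule, else the default."""
    for rule in rules:
        if fnmatchcase(tool_name, rule.tool_pattern):
            return rule.decision
    return default


POLICY = [
    Rule("web_*", "approve"),   # any tool whose name starts with "web_"
    Rule("bash", "reject"),     # shell access is denied explicitly
    Rule("python", "approve"),
]

for tool in ("web_search", "bash", "python", "text_editor"):
    print(f"{tool}: {decide(tool, POLICY)}")
```

    Defaulting to "reject" means a tool that no rule mentions is denied, which is the safer choice when the calls come from untrusted model output.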

    Tooling and Developer Experience

    Beyond the core evaluation engine, Inspect ships with Inspect View, a web-based log viewer for monitoring and visualizing evaluation runs, and a VS Code extension for authoring and debugging evaluations and browsing logs in the editor. Evaluation logs are written locally by default and can be opened in the browser with the inspect view command. Alongside the CLI, the framework exposes a Python API (eval()) for programmatic use, and it supports structured output, reasoning model options, batch processing, adaptive concurrency, and early stopping.

    Open-Source Lineage and Current Status

    The repository was created in November 2023 and, according to the GitHub project page, was last updated in May 2026. It has accumulated over 2,100 stars and 517 forks. The project is maintained under the UKGovernmentBEIS GitHub organization and is released under the MIT License, making it freely usable, modifiable, and distributable. The documentation site at inspect.aisi.org.uk is maintained alongside the codebase, and the project supports a uv-based workflow for reproducible development environments.

    Pricing

    Open Source

    Fully free and open-source under the MIT License. Install via pip and use with any supported model provider.

    • Full framework access
    • 200+ pre-built evaluations
    • All built-in solvers, scorers, and tools
    • VS Code Extension
    • Web-based log viewer

    Capabilities

    Key Features

    • 200+ pre-built LLM evaluations
    • Composable Datasets, Solvers, and Scorers
    • Built-in prompt engineering solvers (chain-of-thought, self-critique)
    • Model-graded scoring
    • Multi-turn dialog support
    • Tool calling (bash, Python, text editing, web search, web browsing, computer use)
    • MCP (Model Context Protocol) tool integration
    • Custom tool definitions
    • Multi-agent evaluation primitives
    • Support for external agents (Claude Code, Codex CLI, Gemini CLI)
    • Sandboxing via Docker, Kubernetes, Modal, Proxmox
    • Tool approval policies
    • Web-based Inspect View log viewer
    • VS Code Extension for authoring and debugging
    • CLI and Python API
    • Structured output support
    • Reasoning model support
    • Batch processing mode
    • Adaptive concurrency and rate-limit handling
    • Multimodal evaluation (images, audio, video)
    • Eval Sets for large-scale evaluation runs
    • Early stopping API
    • Caching of model outputs
    • Extensions API for custom model providers, sandboxes, and storage

    Integrations

    OpenAI
    Anthropic
    Google Gemini
    Grok
    Mistral
    Hugging Face Transformers
    AWS Bedrock
    Azure AI
    TogetherAI
    Groq
    Cloudflare
    Goodfire
    vLLM
    Ollama
    llama-cpp-python
    TransformerLens
    nnterp
    Docker
    Kubernetes
    Modal
    Proxmox
    Model Context Protocol (MCP)
    Claude Code
    Codex CLI
    Gemini CLI
    OpenAI Agents SDK
    LangChain
    Pydantic AI
    VS Code

    Developer

    UK AI Security Institute

    The UK AI Security Institute (AISI) builds tools and conducts research to evaluate the safety and security of frontier AI systems. AISI developed Inspect AI in collaboration with Meridian Labs as an open-source framework for rigorous LLM evaluations. The institute operates under the UK Government's Department for Science, Innovation and Technology and publishes its evaluation tooling publicly under permissive open-source licenses.

    Founded 2023
    London, United Kingdom
    150 employees

    Used by

    OpenAI
    Google DeepMind
    Anthropic
    Mistral AI

    Similar Tools

    ZeroEval

    Open-source evaluation framework for testing large language models with zero-shot prompting on reasoning and coding tasks.

    Arize AI

    Arize AI is an enterprise AI and agent engineering platform for development, observability, and evaluation of LLM applications, AI agents, and ML models in production.

    Gambit

    Gambit is an open-source agent harness framework by Bolt Foundry for building, running, and verifying LLM workflows using typed decks.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    82 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    341 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    189 tools