Inspect AI

Name: Inspect AI
Availability: OnlineOnly
Author: UK AI Security Institute

An open-source Python framework for evaluating large language models, developed by the UK AI Security Institute. It supports agentic tasks, tool use, and multi-turn dialog, and includes 200+ pre-built benchmarks.

At a Glance

Pricing

Open Source

Fully free and open-source under the MIT License. Install via pip and use with any supported model provider.

Available On

Linux

API

VS Code

SDK

CLI

UK AI Security Institute
London, United Kingdom
Est. 2023

Listed May 2026

About Inspect AI

Inspect is an open-source Python framework for large language model (LLM) evaluations, developed by the UK AI Security Institute (AISI) and Meridian Labs. It is available on GitHub under the MIT License and installable via PyPI. The framework targets a broad range of evaluation types—coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding—and ships with over 200 pre-built evaluations ready to run against any supported model.

What It Is

Inspect is a structured evaluation framework that organizes LLM assessments around three composable primitives: Datasets (labelled input/target samples), Solvers (chained prompt engineering and agent logic), and Scorers (output evaluation via text comparison, model grading, or custom schemes). This architecture lets researchers and engineers define reusable evaluation components and combine them into reproducible tasks. Tasks are declared with the @task decorator and can be run against any supported model provider, either from the command line with the inspect eval command or directly from Python.
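To make the Dataset → Solver → Scorer flow concrete, here is a stdlib-only sketch in plain Python. It is not Inspect's API: Sample, State, prompt_prefix, call_model, exact_match, run_task, and fake_model are hypothetical names chosen for illustration, and fake_model stands in for a real model provider. It only shows how the three roles compose into a task that yields a score.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sample:
    """One labelled input/target pair (hypothetical stand-in for a dataset row)."""
    input: str
    target: str


@dataclass
class State:
    """Per-sample state threaded through the solver chain."""
    sample: Sample
    messages: list[str] = field(default_factory=list)
    output: str = ""


Solver = Callable[[State], State]
Scorer = Callable[[State], float]


def prompt_prefix(text: str) -> Solver:
    """Solver that places a fixed instruction at the start of the conversation."""
    def solve(state: State) -> State:
        state.messages.insert(0, text)
        return state
    return solve


def call_model(model: Callable[[list[str]], str]) -> Solver:
    """Solver that appends the sample input and records the model's reply."""
    def solve(state: State) -> State:
        state.messages.append(state.sample.input)
        state.output = model(state.messages)
        return state
    return solve


def exact_match(state: State) -> float:
    """Text-comparison scorer: 1.0 if the normalised output equals the target."""
    return float(state.output.strip().lower() == state.sample.target.strip().lower())


def run_task(dataset: list[Sample], solvers: list[Solver], scorer: Scorer) -> float:
    """Run each sample through the solver chain and return mean score."""
    scores = []
    for sample in dataset:
        state = State(sample=sample)
        for solve in solvers:
            state = solve(state)
        scores.append(scorer(state))
    return sum(scores) / len(scores) if scores else 0.0


def fake_model(messages: list[str]) -> str:
    """Toy 'model' that answers 'What is A + B?' questions from the last message."""
    question = messages[-1]
    a, b = question.removeprefix("What is ").rstrip("?").split(" + ")
    return str(int(a) + int(b))


dataset = [Sample("What is 2 + 2?", "4"), Sample("What is 3 + 5?", "8")]
solvers = [prompt_prefix("Answer with a number only."), call_model(fake_model)]
accuracy = run_task(dataset, solvers, exact_match)
print(accuracy)  # 1.0
```

Keeping solvers as plain functions over a shared state is what makes them chainable: any prompt-engineering step can be inserted before the model call without changing the scorer.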

Model Provider Coverage

Inspect supports a wide range of model providers out of the box:

Cloud APIs: OpenAI, Anthropic, Google (Gemini), Grok, Mistral, AWS Bedrock, Azure AI, TogetherAI, Groq, Cloudflare, Goodfire
Local inference: vLLM, Ollama, llama-cpp-python, TransformerLens, nnterp, Hugging Face Transformers

Each cloud provider is configured by installing the relevant Python package and setting the provider's API key environment variable, while local backends need their inference runtime installed rather than a key. Either way, the setup path stays consistent across providers.
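Because Inspect addresses models with provider/model names (for example openai/gpt-4o), a preflight check that the matching API key is set before launching a long run is easy to write. The helper below is a hypothetical illustration, not part of Inspect, and the provider-to-variable mapping is an assumption based on each vendor's usual naming that should be checked against Inspect's provider documentation.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: provider prefix -> API key variable, following common vendor
# conventions. Verify against Inspect's provider docs before relying on it.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def missing_api_keys(model: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the API key variables that a 'provider/model' name needs but env lacks.

    Providers absent from API_KEY_VARS (e.g. local backends) are treated as
    needing no key. Hypothetical helper, not part of Inspect.
    """
    env = os.environ if env is None else env
    provider, _, name = model.partition("/")
    if not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    var = API_KEY_VARS.get(provider)
    return [var] if var and not env.get(var) else []


print(missing_api_keys("openai/gpt-4o", env={}))  # ['OPENAI_API_KEY']
```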

Agentic and Tool Evaluation Capabilities

Inspect includes flexible support for evaluating agents and tool-using models:

Built-in tools for bash execution, Python execution, text editing, web search, web browsing, and computer use
Custom tool definitions and MCP (Model Context Protocol) tool integration
Multi-agent primitives and support for running external agents such as Claude Code, Codex CLI, and Gemini CLI
A sandboxing system for isolating untrusted model-generated code, with backends for Docker, Kubernetes, Modal, and Proxmox, plus an extension API for custom backends
Tool approval policies for fine-grained control over which tool calls models are permitted to make
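What a tool approval policy decides can be modelled as an ordered rule list that maps each requested tool call to approve, reject, or escalate. The first-match-wins, glob-on-tool-name sketch below is a hypothetical stdlib model, not Inspect's approver interface; ToolCall, Rule, decide, and POLICY are illustrative names, and the tool names in the policy are examples only.

```python
import fnmatch
from dataclasses import dataclass
from typing import Literal

Decision = Literal["approve", "reject", "escalate"]


@dataclass
class ToolCall:
    """A model-requested tool invocation (hypothetical shape)."""
    function: str
    arguments: dict


@dataclass(frozen=True)
class Rule:
    """Glob pattern on the tool name plus the decision applied when it matches."""
    pattern: str
    decision: Decision


def decide(call: ToolCall, rules: list[Rule], default: Decision = "escalate") -> Decision:
    """Return the decision of the first rule whose pattern matches the tool name.

    Ordering matters: specific rules should come before broad catch-alls.
    Falls back to `default` when no rule matches.
    """
    for rule in rules:
        if fnmatch.fnmatchcase(call.function, rule.pattern):
            return rule.decision
    return default


POLICY = [
    Rule("bash", "escalate"),  # shell commands go to a human reviewer
    Rule("web_*", "approve"),  # search/browsing tools are allowed outright
    Rule("python", "approve"),
    Rule("*", "reject"),       # anything else is refused
]

print(decide(ToolCall("web_search", {"query": "inspect ai"}), POLICY))  # approve
```

Escalation as the default keeps an unrecognised tool from being silently approved when a policy has no catch-all rule.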

Tooling and Developer Experience

Beyond the core evaluation engine, Inspect ships with Inspect View, a web-based log viewer for monitoring and visualizing evaluation runs, and a VS Code Extension for authoring, debugging, and browsing logs directly in the editor. Evaluation logs are written locally by default and can be opened in the browser with the inspect view command. Alongside the CLI, the framework exposes a Python API (eval()) for programmatic use, and it supports structured output, reasoning model options, batch processing, adaptive concurrency, and early stopping.
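Adaptive concurrency means raising the number of in-flight model requests while calls succeed and backing off when the provider rate-limits. One generic way to do this is additive-increase/multiplicative-decrease (AIMD), the rule used in TCP congestion control; the sketch below shows that technique only. Inspect's own implementation may work differently, and AimdLimiter and its parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Additive-increase/multiplicative-decrease concurrency controller.

    Generic technique for illustration; not Inspect's actual algorithm.
    """
    limit: float = 4.0
    min_limit: float = 1.0
    max_limit: float = 64.0
    increase: float = 1.0   # added to the limit after each successful request
    decrease: float = 0.5   # multiplier applied after a rate-limit error

    def on_success(self) -> None:
        self.limit = min(self.max_limit, self.limit + self.increase)

    def on_rate_limit(self) -> None:
        self.limit = max(self.min_limit, self.limit * self.decrease)

    @property
    def slots(self) -> int:
        """Whole number of requests allowed in flight right now."""
        return int(self.limit)


limiter = AimdLimiter()
for outcome in ["ok", "ok", "rate_limited", "ok"]:
    if outcome == "ok":
        limiter.on_success()
    else:
        limiter.on_rate_limit()
print(limiter.slots)  # 4  (4 -> 5 -> 6 -> 3 -> 4)
```

The asymmetry is the point of the design: slow growth probes for spare capacity, while halving on a rate-limit error clears the backlog quickly.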

Open-Source Lineage and Current Status

The repository was created in November 2023 and, according to the GitHub project page, was last updated in May 2026. It has accumulated over 2,100 stars and 517 forks. The project is maintained under the UKGovernmentBEIS GitHub organization and released under the MIT License, which makes it freely usable, modifiable, and distributable. The documentation site at inspect.aisi.org.uk is actively maintained alongside the codebase, and the project supports a uv-based workflow for reproducible development environments.

Pricing

Open Source

Full framework access
200+ pre-built evaluations
All built-in solvers, scorers, and tools
VS Code Extension
Web-based log viewer

Capabilities

Key Features

200+ pre-built LLM evaluations
Composable Datasets, Solvers, and Scorers
Built-in prompt engineering solvers (chain-of-thought, self-critique)
Model-graded scoring
Multi-turn dialog support
Tool calling (bash, Python, text editing, web search, web browsing, computer use)
MCP (Model Context Protocol) tool integration
Custom tool definitions
Multi-agent evaluation primitives
Support for external agents (Claude Code, Codex CLI, Gemini CLI)
Sandboxing via Docker, Kubernetes, Modal, Proxmox
Tool approval policies
Web-based Inspect View log viewer
VS Code Extension for authoring and debugging
CLI and Python API
Structured output support
Reasoning model support
Batch processing mode
Adaptive concurrency and rate-limit handling
Multimodal evaluation (images, audio, video)
Eval Sets for large-scale evaluation runs
Early stopping API
Caching of model outputs
Extensions API for custom model providers, sandboxes, and storage
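Caching model outputs pays off when the same prompt is re-run under an identical configuration, for example while iterating on a scorer. A minimal sketch, assuming the cache key is built from the model name, the prompt, and the generation config, looks like this; OutputCache and fake_generate are hypothetical names, not Inspect's cache API.

```python
import hashlib
import json
from typing import Callable


class OutputCache:
    """In-memory cache of model outputs keyed by model, prompt, and config.

    Hypothetical illustration only; not Inspect's cache implementation.
    """

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str, config: dict) -> str:
        # sort_keys makes the key independent of config dict ordering
        payload = json.dumps(
            {"model": model, "prompt": prompt, "config": config}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(
        self, model: str, prompt: str, config: dict, generate: Callable[[str], str]
    ) -> str:
        """Return a cached output, calling `generate` only on a cache miss."""
        k = self.key(model, prompt, config)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = generate(prompt)
        return self._store[k]


calls: list[str] = []


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return prompt.upper()


cache = OutputCache()
cache.get_or_generate("openai/gpt-4o", "hi", {"temperature": 0.0}, fake_generate)
cache.get_or_generate("openai/gpt-4o", "hi", {"temperature": 0.0}, fake_generate)
cache.get_or_generate("openai/gpt-4o", "hi", {"temperature": 1.0}, fake_generate)
print(cache.hits, cache.misses)  # 1 2
```

Including the config in the key means a change such as a different temperature forces a fresh call; note that a cached response to a sampled (non-zero temperature) request is replayed as-is rather than re-sampled.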

Integrations

OpenAI

Anthropic

Google Gemini

Grok

Mistral

Hugging Face Transformers

AWS Bedrock

Azure AI

TogetherAI

Groq

Cloudflare

Goodfire

vLLM

Ollama

llama-cpp-python

TransformerLens

nnterp

Docker

Kubernetes

Modal

Proxmox

Model Context Protocol (MCP)

Claude Code

Codex CLI

Gemini CLI

OpenAI Agents SDK

LangChain

Pydantic AI