AgentDoG

Name: AgentDoG
Availability: OnlineOnly
Author: AI45Lab

A risk-aware evaluation and guardrail framework for autonomous agents that analyzes full execution trajectories to detect safety risks in AI agent systems.

Visit Website

At a Glance

Pricing

Open Source

Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.

Engagement

Available On

API

CLI

SDK

AI45LabAI45Lab builds research frameworks and tools for AI safety a…

Listed May 2026

About AgentDoG

AgentDoG is a diagnostic guardrail framework for AI agent safety and security, focusing on trajectory-level risk assessment of autonomous agents. Unlike single-step content moderation or final-output filtering, AgentDoG analyzes the full execution trace of tool-using agents to detect risks that emerge mid-trajectory. It provides fine-grained risk labels across three dimensions—risk source, failure mode, and real-world harm—and outperforms existing approaches on R-Judge, ASSE-Safety, and ATBench benchmarks.

Trajectory-Level Monitoring: Evaluates multi-step agent executions spanning observations, reasoning, and actions to catch risks at any point during execution.
Taxonomy-Guided Diagnosis: Provides fine-grained risk labels (risk source, failure mode, and real-world harm) with 8 risk-source categories, 14 failure modes, and 10 real-world harm categories.
ATBench Dataset: Includes a released benchmark of 500 trajectories (250 safe / 250 unsafe) with ~8.97 turns per trajectory and 1575 unique tools for evaluation.
Multiple Model Variants: Fine-tuned guard models available on Hugging Face based on Qwen3-4B, Qwen2.5-7B, and Llama3.1-8B for both binary and fine-grained classification tasks.
Flexible Deployment: Supports SGLang and vLLM for OpenAI-compatible API endpoints, as well as direct Transformers inference.
Agentic XAI Attribution: Hierarchical framework for explaining internal drivers behind agent actions, decomposing trajectories into pivotal components and fine-grained textual evidence.
State-of-the-Art Performance: AgentDoG-4B achieves 91.8% on R-Judge, 80.4% on ASSE-Safety, and 92.8% on ATBench, outperforming LlamaGuard, Qwen3-Guard, and ShieldAgent.
Customizable Prompts and Taxonomy: Edit prompt templates and taxonomy labels to adapt the framework to custom agent safety requirements.

Community Discussions

Be the first to start a conversation about AgentDoG

Share your experience with AgentDoG, ask questions, or help others learn from your insights.

Pricing

OPEN SOURCE

Open Source

Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.

All guard model variants (4B, 7B, 8B)
ATBench benchmark dataset
Prompt templates and taxonomy
SGLang and vLLM deployment scripts
Agentic XAI Attribution framework

Capabilities

Key Features

Trajectory-level safety evaluation (binary safe/unsafe classification)
Fine-grained risk diagnosis (Risk Source, Failure Mode, Real-World Harm)
ATBench benchmark dataset with 500 annotated trajectories
Multiple fine-tuned guard models (4B, 7B, 8B parameters)
SGLang and vLLM deployment support
OpenAI-compatible API endpoint
Direct Transformers inference support
Agentic XAI Attribution framework
Interactive HTML heatmap visualization
Customizable prompt templates and taxonomy labels
Three-stage taxonomy-guided data synthesis pipeline

Integrations

Hugging Face

ModelScope

SGLang

vLLM

Transformers

Qwen3

Qwen2.5

Llama3.1

API Available

View Docs

Back to all tools

AgentDoG

Application Security

A risk-aware evaluation and guardrail framework for autonomous agents that analyzes full execution trajectories to detect safety risks in AI agent systems.

Visit Website

At a Glance

Pricing

Open Source

Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.

Engagement

Discussions

Available On

API

CLI

SDK

Resources

Website Docs GitHub llms.txt

Topics

Application Security Autonomous Systems LLM Evaluations

Alternatives

General Analysis promptfoo CodeWall

Developer

AI45LabAI45Lab builds research frameworks and tools for AI safety a…

Listed May 2026

About AgentDoG

Trajectory-Level Monitoring: Evaluates multi-step agent executions spanning observations, reasoning, and actions to catch risks at any point during execution.
Taxonomy-Guided Diagnosis: Provides fine-grained risk labels (risk source, failure mode, and real-world harm) with 8 risk-source categories, 14 failure modes, and 10 real-world harm categories.
ATBench Dataset: Includes a released benchmark of 500 trajectories (250 safe / 250 unsafe) with ~8.97 turns per trajectory and 1575 unique tools for evaluation.
Multiple Model Variants: Fine-tuned guard models available on Hugging Face based on Qwen3-4B, Qwen2.5-7B, and Llama3.1-8B for both binary and fine-grained classification tasks.
Flexible Deployment: Supports SGLang and vLLM for OpenAI-compatible API endpoints, as well as direct Transformers inference.
Agentic XAI Attribution: Hierarchical framework for explaining internal drivers behind agent actions, decomposing trajectories into pivotal components and fine-grained textual evidence.
State-of-the-Art Performance: AgentDoG-4B achieves 91.8% on R-Judge, 80.4% on ASSE-Safety, and 92.8% on ATBench, outperforming LlamaGuard, Qwen3-Guard, and ShieldAgent.
Customizable Prompts and Taxonomy: Edit prompt templates and taxonomy labels to adapt the framework to custom agent safety requirements.

Community Discussions

Be the first to start a conversation about AgentDoG

Share your experience with AgentDoG, ask questions, or help others learn from your insights.

Pricing

OPEN SOURCE

Open Source

Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.

All guard model variants (4B, 7B, 8B)
ATBench benchmark dataset
Prompt templates and taxonomy
SGLang and vLLM deployment scripts
Agentic XAI Attribution framework

Capabilities

Key Features

Trajectory-level safety evaluation (binary safe/unsafe classification)
Fine-grained risk diagnosis (Risk Source, Failure Mode, Real-World Harm)
ATBench benchmark dataset with 500 annotated trajectories
Multiple fine-tuned guard models (4B, 7B, 8B parameters)
SGLang and vLLM deployment support
OpenAI-compatible API endpoint
Direct Transformers inference support
Agentic XAI Attribution framework
Interactive HTML heatmap visualization
Customizable prompt templates and taxonomy labels
Three-stage taxonomy-guided data synthesis pipeline

Integrations

Hugging Face

ModelScope

SGLang

vLLM

Transformers

Qwen3

Qwen2.5

Llama3.1

API Available

View Docs

Back to all tools