AgentDoG
A risk-aware evaluation and guardrail framework for autonomous agents that analyzes full execution trajectories to detect safety risks in AI agent systems.
At a Glance
Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.
Engagement
Available On
Alternatives
Listed May 2026
About AgentDoG
AgentDoG is a diagnostic guardrail framework for AI agent safety and security, focusing on trajectory-level risk assessment of autonomous agents. Unlike single-step content moderation or final-output filtering, AgentDoG analyzes the full execution trace of tool-using agents to detect risks that emerge mid-trajectory. It provides fine-grained risk labels across three dimensions—risk source, failure mode, and real-world harm—and outperforms existing approaches on R-Judge, ASSE-Safety, and ATBench benchmarks.
- Trajectory-Level Monitoring: Evaluates multi-step agent executions spanning observations, reasoning, and actions to catch risks at any point during execution.
- Taxonomy-Guided Diagnosis: Provides fine-grained risk labels (risk source, failure mode, and real-world harm) with 8 risk-source categories, 14 failure modes, and 10 real-world harm categories.
- ATBench Dataset: Includes a released benchmark of 500 trajectories (250 safe / 250 unsafe) with ~8.97 turns per trajectory and 1575 unique tools for evaluation.
- Multiple Model Variants: Fine-tuned guard models available on Hugging Face based on Qwen3-4B, Qwen2.5-7B, and Llama3.1-8B for both binary and fine-grained classification tasks.
- Flexible Deployment: Supports SGLang and vLLM for OpenAI-compatible API endpoints, as well as direct Transformers inference.
- Agentic XAI Attribution: Hierarchical framework for explaining internal drivers behind agent actions, decomposing trajectories into pivotal components and fine-grained textual evidence.
- State-of-the-Art Performance: AgentDoG-4B achieves 91.8% on R-Judge, 80.4% on ASSE-Safety, and 92.8% on ATBench, outperforming LlamaGuard, Qwen3-Guard, and ShieldAgent.
- Customizable Prompts and Taxonomy: Edit prompt templates and taxonomy labels to adapt the framework to custom agent safety requirements.
Community Discussions
Be the first to start a conversation about AgentDoG
Share your experience with AgentDoG, ask questions, or help others learn from your insights.
Pricing
Open Source
Fully free and open-source under Apache 2.0 License. Download models from Hugging Face or ModelScope and deploy locally.
- All guard model variants (4B, 7B, 8B)
- ATBench benchmark dataset
- Prompt templates and taxonomy
- SGLang and vLLM deployment scripts
- Agentic XAI Attribution framework
Capabilities
Key Features
- Trajectory-level safety evaluation (binary safe/unsafe classification)
- Fine-grained risk diagnosis (Risk Source, Failure Mode, Real-World Harm)
- ATBench benchmark dataset with 500 annotated trajectories
- Multiple fine-tuned guard models (4B, 7B, 8B parameters)
- SGLang and vLLM deployment support
- OpenAI-compatible API endpoint
- Direct Transformers inference support
- Agentic XAI Attribution framework
- Interactive HTML heatmap visualization
- Customizable prompt templates and taxonomy labels
- Three-stage taxonomy-guided data synthesis pipeline
