# OpenTraces

> A CLI tool to parse, sanitize, and commit AI agent session traces to HuggingFace Hub for training, evaluation, and open data sharing.

OpenTraces is an open-source CLI tool that treats AI agent session traces as first-class data artifacts: it captures, sanitizes, and publishes them to HuggingFace Hub for use in fine-tuning, reinforcement learning, and evaluation pipelines. It follows a git-like workflow (`init`, `status`, `review`, `push`) so developers can manage agent traces the same way they manage code. Every trace is scanned for secrets, API keys, and PII before it leaves the machine, so sessions are private by default. Public traces can be queried through the HuggingFace Datasets API, with no proprietary lock-in.
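
As a rough illustration of querying published traces, the sketch below streams a dataset with `datasets.load_dataset()`. The repository ID in the usage comment and the `train` split name are placeholders and assumptions, not values documented here; substitute the dataset you actually pushed.

```python
def load_public_traces(repo_id: str, split: str = "train"):
    """Stream trace records from a public HF Hub dataset.

    `repo_id` is whatever dataset repository the traces were pushed to;
    the default split name "train" is an assumption.
    """
    # Imported lazily so this module loads even without the library installed.
    from datasets import load_dataset  # pip install datasets

    # streaming=True iterates over records without downloading every shard first.
    return load_dataset(repo_id, split=split, streaming=True)


# Usage (hypothetical repository ID):
# for record in load_public_traces("your-username/your-traces"):
#     print(record.keys())
#     break
```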

- **Git-like workflow** — *Use `init`, `status`, `review`, and `push` commands to manage agent session traces just like source code.*
- **Private-first redaction** — *19 regex patterns plus Shannon entropy analysis automatically scrub API keys, emails, database credentials, and filesystem paths before any data leaves your machine.*
- **Auto or review push modes** — *Choose between fully automatic capture-and-push or a human-in-the-loop inbox (TUI and web UI) to approve, redact, or reject sessions before committing.*
- **HuggingFace native** — *Publishes JSONL shards directly to HF Hub, loadable via `datasets.load_dataset()` with no proprietary format or subscription required.*
- **Traces as training data** — *Structured TraceRecord schema captures alternating role sequences, tool call/observation pairs, reasoning paths, and outcome signals — validated against 10 quality checks before upload.*
- **RL/RLHF support** — *Committed patches serve as reward proxies; per-step token costs enable cost-penalized reward; sub-agent hierarchy supports credit assignment.*
- **Observability and eval** — *Cache hit rates, per-step token breakdowns, duration timelines, and model distribution metrics turn production traces into reproducible eval datasets.*
- **Automatic deduplication** — *Content-hash dedup on push prevents duplicates when resetting state, switching machines, or re-pushing sessions.*
- **Agent-native CLI** — *Every command outputs structured JSON so agents can drive other agents programmatically.*
- **Agent harness integrations** — *Works with Claude Code, Codex, Cursor, OpenCode, Hermes, and other dev-time and run-time agents via a session hook.*
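
The redaction approach described above (regex patterns plus Shannon entropy analysis) can be sketched as follows. This is a minimal illustration, not the tool's implementation: the two patterns, the candidate-token regex, the `4.0` bits-per-character threshold, and the `[REDACTED:...]` placeholder format are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; the tool's actual 19 patterns are not listed here.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# Candidate secrets for the entropy check: long runs of key-like characters.
TOKEN = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def redact(text: str, threshold: float = 4.0) -> str:
    """Mask known secret formats first, then any remaining high-entropy tokens."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)

    def mask(match: re.Match) -> str:
        token = match.group(0)
        if shannon_entropy(token) >= threshold:
            return "[REDACTED:high_entropy]"
        return token

    return TOKEN.sub(mask, text)
```

The entropy pass exists to catch random-looking strings that no fixed pattern anticipates; the threshold trades false positives (long hashes, identifiers) against missed secrets and would need tuning on real traces.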

## Features
- Git-like trace workflow (init, status, review, push)
- Automatic PII and secret redaction (19 regex patterns + Shannon entropy)
- HuggingFace Hub native JSONL publishing
- TUI and web inbox for human-in-the-loop review
- Auto or review push modes per project
- Content-hash deduplication on push
- Structured TraceRecord schema with 10 quality checks
- RL/RLHF reward proxy support via committed patches
- Per-step token cost metrics
- Agent-queryable public datasets via HF Datasets API
- Structured JSON CLI output for agent-driven workflows
- Session hook for automatic trace capture
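
To make content-hash deduplication concrete, here is one way it could work. The hashing scheme is an assumption for illustration (SHA-256 over canonical JSON); the scheme OpenTraces actually uses is not specified here, and `content_hash` and `filter_new` are hypothetical helper names.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Hash a record by content, independent of key order."""
    # Sorted keys and fixed separators give a canonical serialization.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_new(records: list[dict], seen_hashes: set[str]) -> list[dict]:
    """Return only records whose hash is not in `seen_hashes`, updating the set."""
    fresh = []
    for record in records:
        digest = content_hash(record)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(record)
    return fresh
```

Because the hash depends only on record content, re-pushing the same session from another machine or after a state reset yields the same digest and is skipped.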

## Integrations
HuggingFace Hub, Claude Code, Codex, Cursor, OpenCode, Hermes, OpenClaw, DeepAgents

## Platforms
CLI, Web, API

## Pricing
Open Source

## Version
0.2.1

## Links
- Website: https://www.opentraces.ai
- Documentation: https://www.opentraces.ai/docs
- Repository: https://github.com/jayfarei/opentraces
- EveryDev.ai: https://www.everydev.ai/tools/opentraces