Distributed Systems Testing Skills
Two AI agent skills (SKILL.md files) that design and execute claim-driven test plans for distributed and stateful systems, producing structured Markdown artifacts with 9-state verdicts and blame classification.
At a Glance
Fully free and open source under the MIT License. Clone and use with any compatible AI coding agent.
Listed May 2026
About Distributed Systems Testing Skills
Distributed Systems Testing Skills is an open-source project by GitHub user shenli that provides two plain Markdown skill files for AI coding agents. The skills guide agents through designing and executing rigorous, claim-driven test plans for distributed and stateful systems, producing structured artifacts a human reviewer can read to decide whether to ship — without re-running any tests.
What It Is
This project delivers two SKILL.md files — designing-distributed-system-tests and executing-distributed-system-tests — that any AI coding agent capable of reading Markdown and running shell commands can execute. Compatible agents include Claude Code, Codex, Copilot CLI, Cursor, and Gemini. The skills are not a testing framework themselves; they are opinionated workflow instructions that direct an agent to produce a structured test plan and a findings report.
The Claim-Driven Workflow
The design skill starts from the product's own documented claims rather than from test setup. Every scenario is named after the claim it tries to falsify, which makes the scenario harder to weaken over time. The plan spans ten numbered sections (§0–§9) that together cover an architectural summary, scope, claims under test, the SUT model, an existing test inventory, failure-mode hypotheses, a coverage matrix, technique selection, scenarios with optional §7.M model/history/checker discipline blocks, a coverage adequacy argument, residual uncertainty, and a confidence statement.
For consistency-critical scenarios — those falsifying claims about safety, durability, idempotency, isolation, ordering, or membership — each scenario must declare:
- An abstract model (register, queue, log, lock, lease, ledger, etc.)
- An operation-history schema
- A named checker (linearizability, serializability, session-consistency, no-lost-ack, exactly-once, etc.)
- A nemesis with observable landing evidence
- An ambiguous-outcome handling rule and a reduction plan with SUT/harness/checker/environment blame classification
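To make the declaration above concrete, here is a minimal sketch of a no-lost-ack checker over an append-only log model. It is not taken from the skills: the `Op` record, its field names, the `"ok"`/`"fail"`/`"unknown"` outcome labels, and the `check_no_lost_ack` function are hypothetical, chosen only for illustration. The sketch assumes every written value is unique, that `"fail"` means the write definitely did not apply, and that an `"unknown"` outcome (for example a timeout) may or may not have applied.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

# Hypothetical operation-history schema: one record per client write.
# "unknown" is the ambiguous outcome (e.g. a timeout during a partition).
Outcome = Literal["ok", "fail", "unknown"]


@dataclass(frozen=True)
class Op:
    client: str
    value: int          # assumed unique across the whole history
    outcome: Outcome


def check_no_lost_ack(history: list[Op], final_read: set[int]) -> dict:
    """Check that every acknowledged write is visible in the final read.

    Ambiguous-outcome rule (an assumption of this sketch): writes with an
    "unknown" outcome are allowed to be present or absent. Values that were
    never acknowledged or ambiguous must not appear at all.
    """
    acked = {op.value for op in history if op.outcome == "ok"}
    ambiguous = {op.value for op in history if op.outcome == "unknown"}

    lost = acked - final_read                      # acknowledged but missing
    unexpected = final_read - (acked | ambiguous)  # present but never allowed

    return {
        "valid": not lost and not unexpected,
        "lost": sorted(lost),
        "unexpected": sorted(unexpected),
        "ambiguous_recovered": sorted(ambiguous & final_read),
    }


history = [
    Op("c1", 1, "ok"),
    Op("c1", 2, "ok"),
    Op("c2", 3, "unknown"),
    Op("c2", 4, "fail"),
]
result = check_no_lost_ack(history, final_read={1, 3})
```

In this example the acknowledged write `2` is missing from the final read, so `result["valid"]` is false and `result["lost"]` names the value a reduction plan would start from. The ambiguous write `3` is reported separately instead of being counted as a violation.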
The Execute Skill and 9-State Verdicts
The execute skill reads the plan, discovers the SUT's existing toolbox, probes the environment, and runs scenarios with checkpoint discipline. Every PASS must cite oracle execution evidence and proof that the fault actually fired — preventing "the chaos script ran cleanly" from being misread as "the claim survived." Every FAIL carries a blame tag (SUT, harness, checker, or environment) so reproducers reach the right queue. Verdicts come from a 9-state taxonomy that includes states like PASS-hardening, FAIL-reproducible, INCONCLUSIVE-fault-not-proven, and PARTIAL-model.
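The evidence rules in this paragraph can be sketched as a small audit over a finding record. This is an illustration under stated assumptions, not the skill's actual format: the `Finding` class, its field names, and the `audit` function are hypothetical. Only the verdict names and the four blame tags come from the description above, and the sketch assumes that verdict names begin with their top-level state (`PASS`, `FAIL`, and so on).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# The four blame tags named in the description above.
BLAME_TAGS = {"SUT", "harness", "checker", "environment"}


@dataclass
class Finding:
    # All field names here are hypothetical, chosen for illustration.
    scenario: str
    verdict: str                           # e.g. "PASS-hardening", "FAIL-reproducible"
    oracle_evidence: Optional[str] = None  # where the checker output lives
    fault_evidence: Optional[str] = None   # proof that the fault actually landed
    blame: Optional[str] = None


def audit(finding: Finding) -> list[str]:
    """Return the rule violations for one finding (empty list means clean)."""
    problems: list[str] = []
    if finding.verdict.startswith("PASS"):
        if not finding.oracle_evidence:
            problems.append("PASS without oracle execution evidence")
        if not finding.fault_evidence:
            problems.append("PASS without proof the fault fired")
    if finding.verdict.startswith("FAIL") and finding.blame not in BLAME_TAGS:
        problems.append("FAIL without a valid blame tag")
    return problems


# A PASS whose chaos script ran but whose fault was never observed to land.
suspicious = Finding("lease-expiry", "PASS-hardening", oracle_evidence="checker.log")
problems = audit(suspicious)
```

Here `problems` flags the missing fault evidence, which is the "chaos script ran cleanly" case the execute skill is designed to catch.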
Output artifacts are written to testing-plans/<slug>.md for the plan and test-sessions/<UTC>/ for session logs, per-scenario findings, metrics, and a summary findings report.
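As a rough sketch of that layout, the two locations could be derived as follows. The helper names `plan_path` and `session_dir` are hypothetical, and the compact `YYYYMMDDTHHMMSSZ` timestamp is an assumption: the description only says the session directory is named by a UTC time, not how that time is formatted.

```python
from datetime import datetime, timezone
from pathlib import Path


def plan_path(root: Path, slug: str) -> Path:
    """Location of the plan: testing-plans/<slug>.md under the repo root."""
    return root / "testing-plans" / f"{slug}.md"


def session_dir(root: Path, started: datetime) -> Path:
    """Location of one session: test-sessions/<UTC>/ under the repo root.

    The timestamp format is an assumption of this sketch. `started` must be
    timezone-aware so the conversion to UTC is unambiguous.
    """
    if started.tzinfo is None:
        raise ValueError("started must be timezone-aware")
    stamp = started.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return root / "test-sessions" / stamp


root = Path("repo")
plan = plan_path(root, "lease-renewal")
session = session_dir(root, datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc))
```

Keeping one directory per session start time means repeated runs of the same plan never overwrite each other's logs or findings.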
Technique Catalog
Eight reference files distilled from the distributed systems testing literature are bundled under the design skill's references/ directory:
- jepsen-and-elle.md — linearizability/serializability under faults
- deterministic-simulation.md — reproducible bugs from a seed
- chaos-and-fault-injection.md — real-cluster partial/asymmetric faults
- fuzzing.md — input or concurrency fuzzing under sanitizers
- formal-methods-tla.md — protocol correctness at design time
- property-and-metamorphic.md — algebraic-law/metamorphic-relation testing
- performance-and-benchmarking.md — tail latency, throughput, fairness
- crash-recovery-and-upgrade.md — durability, replay, idempotency, mixed-version
Each file follows a consistent shape: when to reach for it, what it detects well, what it misses, concrete tools, papers, cost signal, and a plan checklist.
Current Status and Verification
The repository was created in May 2026 and had 159 stars and 9 forks as of its last recorded update. The project self-describes as "early but exercised": both skills have been run end-to-end against AgentDB (a distributed agent runtime in Rust) multiple times, surfacing six findings including one P0-candidate (now closed) and two P1s shipped as a PR. Real plan outputs, session directories, and findings reports from those runs live under verification/, with one subdirectory per run. An eval suite under evals/ validates behavioral changes to the SKILL.md bodies between iterations. The skill bodies are expected to evolve as harness experience accumulates.
Pricing
Open Source
- Both SKILL.md files (design and execute)
- Eight-file technique catalog
- Plan and findings report templates
- Idempotent one-line install
- Eval suite
Capabilities
Key Features
- Claim-driven test plan design starting from product promises
- Two SKILL.md files: one for designing plans, one for executing them
- §7.M model/history/checker discipline blocks for consistency-critical scenarios
- 9-state verdict taxonomy (PASS, FAIL, INCONCLUSIVE, PARTIAL, etc.)
- SUT/harness/checker/environment blame classification for every FAIL
- Coverage adequacy argument and confidence statement in every plan
- Eight-file technique catalog (Jepsen/Elle, chaos, fuzzing, TLA+, etc.)
- Change-scoped and project-wide design modes
- Default (read-only SUT) and author mode for execute skill
- Idempotent one-line install via INSTALL.md
- Compatible with Claude Code, Codex, Copilot CLI, Cursor, Gemini
- Structured Markdown output: plan (§0–§9) and findings report
- Checkpoint discipline for long-running execute sessions
- Eval suite for validating SKILL.md behavioral changes
