Distributed Systems Testing Skills

Name: Distributed Systems Testing Skills
Availability: OnlineOnly
Author: shenli

Two AI agent skills (SKILL.md files) that design and execute claim-driven test plans for distributed and stateful systems, producing structured Markdown artifacts with 9-state verdicts and blame classification.

At a Glance

Pricing

Open Source

Fully free and open source under the MIT License. Clone and use with any compatible AI coding agent.

Available On

CLI

API

shenli is an independent developer on GitHub who builds open…

Listed May 2026

About Distributed Systems Testing Skills

Distributed Systems Testing Skills is an open-source project by GitHub user shenli that provides two plain Markdown skill files for AI coding agents. The skills guide agents through designing and executing rigorous, claim-driven test plans for distributed and stateful systems, producing structured artifacts a human reviewer can read to decide whether to ship — without re-running any tests.

What It Is

This project delivers two SKILL.md files — designing-distributed-system-tests and executing-distributed-system-tests — that any AI coding agent capable of reading Markdown and running shell commands can execute. Compatible agents include Claude Code, Codex, Copilot CLI, Cursor, and Gemini. The skills are not a testing framework themselves; they are opinionated workflow instructions that direct an agent to produce a structured test plan and a findings report.

The Claim-Driven Workflow

The design skill starts from the product's own documented claims rather than from test setup. Every scenario is named after the claim it tries to falsify, which makes the scenario harder to quietly weaken over time. The plan spans ten numbered sections (§0–§9) that together cover an architectural summary, scope, claims under test, the SUT model, an existing test inventory, failure-mode hypotheses, a coverage matrix, technique selection, scenarios (with optional §7.M model/history/checker discipline blocks), a coverage adequacy argument, residual uncertainty, and a confidence statement.

For consistency-critical scenarios — those falsifying claims about safety, durability, idempotency, isolation, ordering, or membership — each scenario must declare:

An abstract model (register, queue, log, lock, lease, ledger, etc.)
An operation-history schema
A named checker (linearizability, serializability, session-consistency, no-lost-ack, exactly-once, etc.)
A nemesis with observable landing evidence
An ambiguous-outcome handling rule and a reduction plan with SUT/harness/checker/environment blame classification
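To make the model/history/checker declaration concrete, here is a minimal sketch of a no-lost-ack checker over a set model. The history schema (`WriteOp`, with outcomes `ok`, `fail`, and `info` for ambiguous results such as timeouts), the function name, and the assumption that every written value is unique are all illustrative and are not taken from the skill files.

```python
from dataclasses import dataclass
from typing import Iterable, Literal

# Hypothetical operation-history schema: one record per client write.
# "info" marks an ambiguous outcome (for example, a timed-out request).
Outcome = Literal["ok", "fail", "info"]


@dataclass(frozen=True)
class WriteOp:
    client: str
    value: int          # assumed unique across the whole history
    outcome: Outcome


@dataclass(frozen=True)
class CheckResult:
    valid: bool
    lost: frozenset        # acknowledged writes missing from the final read
    unexpected: frozenset  # values present that no write could explain


def check_no_lost_ack(history: Iterable[WriteOp],
                      final_read: Iterable[int]) -> CheckResult:
    """Check a set model: every acknowledged write must survive.

    Ambiguous-outcome rule used here: an "info" write may or may not be
    present, so it never counts as lost and never counts as unexpected.
    """
    ops = list(history)
    present = set(final_read)
    acked = {op.value for op in ops if op.outcome == "ok"}
    maybe = {op.value for op in ops if op.outcome == "info"}

    lost = acked - present
    unexpected = present - acked - maybe
    return CheckResult(
        valid=not lost and not unexpected,
        lost=frozenset(lost),
        unexpected=frozenset(unexpected),
    )
```

With a history in which write 3 was acknowledged, a final read of `[1, 2]` reports 3 as lost, while a missing ambiguous write is tolerated. A checker verdict like this says nothing about blame on its own; deciding between SUT, harness, checker, and environment still needs the reduction plan.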

The Execute Skill and 9-State Verdicts

The execute skill reads the plan, discovers the SUT's existing toolbox, probes the environment, and runs scenarios with checkpoint discipline. Every PASS must cite oracle execution evidence and proof that the fault actually fired — preventing "the chaos script ran cleanly" from being misread as "the claim survived." Every FAIL carries a blame tag (SUT, harness, checker, or environment) so reproducers reach the right queue. Verdicts come from a 9-state taxonomy that includes states like PASS-hardening, FAIL-reproducible, INCONCLUSIVE-fault-not-proven, and PARTIAL-model.
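The gating rule described above can be sketched as a small function. This is an illustration only: the `Evidence` fields and `gate_verdict` are hypothetical names, and the labels returned are coarse stand-ins rather than the project's full 9-state taxonomy, of which only `INCONCLUSIVE-fault-not-proven` is reproduced verbatim here.

```python
from dataclasses import dataclass
from typing import Optional

# Blame tags named in the listing; every FAIL must carry one of them.
BLAME_TAGS = {"SUT", "harness", "checker", "environment"}


@dataclass(frozen=True)
class Evidence:
    oracle_ran: bool             # the oracle/checker actually executed
    oracle_passed: bool          # the oracle found no violation
    fault_landed: bool           # observable proof the injected fault fired
    blame: Optional[str] = None  # required when the oracle reports a violation


def gate_verdict(ev: Evidence) -> str:
    """Return a coarse verdict label (hypothetical, not the real taxonomy)."""
    if not ev.oracle_ran:
        # Without oracle execution evidence nothing can be claimed.
        return "INCONCLUSIVE"
    if not ev.oracle_passed:
        if ev.blame not in BLAME_TAGS:
            raise ValueError("a FAIL must carry a blame tag")
        return f"FAIL (blame: {ev.blame})"
    if not ev.fault_landed:
        # The run was clean, but the claim was never actually stressed.
        return "INCONCLUSIVE-fault-not-proven"
    return "PASS"
```

The ordering is the point of the sketch: a clean oracle result is only promoted to PASS after the fault-landing check, so a chaos script that silently did nothing cannot produce a PASS.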

Output artifacts are written to testing-plans/<slug>.md for the plan and test-sessions/<UTC>/ for session logs, per-scenario findings, metrics, and a summary findings report.
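As a rough illustration of that layout, the helpers below build the two paths. The function names and the compact `YYYYMMDDTHHMMSSZ` timestamp format are assumptions; the listing only states that the session directory is named by a UTC value.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def plan_path(root: Path, slug: str) -> Path:
    """Location of the plan file: testing-plans/<slug>.md."""
    return root / "testing-plans" / f"{slug}.md"


def session_dir(root: Path, now: Optional[datetime] = None) -> Path:
    """Location of one session: test-sessions/<UTC>/.

    `now` should be timezone-aware; it is converted to UTC before
    formatting. The timestamp format is an assumed convention.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return root / "test-sessions" / moment.strftime("%Y%m%dT%H%M%SZ")
```

Keeping one directory per session means earlier runs are never overwritten, which fits the stated goal of letting a reviewer read the artifacts without re-running tests.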

Technique Catalog

Eight reference files distilled from the distributed systems testing literature are bundled under the design skill's references/ directory:

jepsen-and-elle.md — linearizability/serializability under faults
deterministic-simulation.md — reproducible bugs from a seed
chaos-and-fault-injection.md — real-cluster partial/asymmetric faults
fuzzing.md — input or concurrency fuzzing under sanitizers
formal-methods-tla.md — protocol correctness at design time
property-and-metamorphic.md — algebraic-law/metamorphic-relation testing
performance-and-benchmarking.md — tail latency, throughput, fairness
crash-recovery-and-upgrade.md — durability, replay, idempotency, mixed-version

Each file follows a consistent shape: when to reach for it, what it detects well, what it misses, concrete tools, papers, cost signal, and a plan checklist.

Current Status and Verification

The repository was created in May 2026 and had 159 stars and 9 forks as of its last recorded update. The project self-describes as "early but exercised": both skills have been run end-to-end against AgentDB (a distributed agent runtime in Rust) multiple times, surfacing six findings including one P0-candidate (now closed) and two P1s shipped as a PR. Real plan outputs, session directories, and findings reports from those runs live under verification/, with one subdirectory per run. An eval suite under evals/ validates behavioral changes to the SKILL.md bodies between iterations. The skill bodies are expected to evolve as harness experience accumulates.

Pricing

Open Source

Free under the MIT License. The release includes:

Both SKILL.md files (design and execute)
Eight-file technique catalog
Plan and findings report templates
Idempotent one-line install
Eval suite

Capabilities

Key Features

Claim-driven test plan design starting from product promises
Two SKILL.md files: one for designing plans, one for executing them
§7.M model/history/checker discipline blocks for consistency-critical scenarios
9-state verdict taxonomy (e.g. PASS-hardening, FAIL-reproducible, INCONCLUSIVE-fault-not-proven, PARTIAL-model)
SUT/harness/checker/environment blame classification for every FAIL
Coverage adequacy argument and confidence statement in every plan
Eight-file technique catalog (Jepsen/Elle, chaos, fuzzing, TLA+, etc.)
Change-scoped and project-wide design modes
Default (read-only SUT) and author modes for the execute skill
Idempotent one-line install via INSTALL.md
Compatible with Claude Code, Codex, Copilot CLI, Cursor, Gemini
Structured Markdown output: plan (§0–§9) and findings report
Checkpoint discipline for long-running execute sessions
Eval suite for validating SKILL.md behavioral changes

Integrations

Claude Code

OpenAI Codex

GitHub Copilot CLI

Cursor

Google Gemini

AgentDB

Porcupine (linearizability checker)

Elle (isolation anomaly checker)

iptables (fault injection)

Jepsen

API Available
