Crux

Name: Crux
Availability: Online only
Author: Crux

Open-source TypeScript toolkit for harness engineering — typed building blocks for prompts, context, memory, retrieval, guardrails, and observability around your LLM calls.

At a Glance

Pricing

Open Source

Free and open-source under the Apache-2.0 license. Use, modify, and distribute according to license terms.

Available On

Linux

Web

API

SDK

CLI

Crux | New York, NY | Est. 2023 | $77M raised

Listed Jun 2026

About Crux

Crux is an open-source TypeScript toolkit for what its authors call "harness engineering" — the discipline of deliberately assembling, inspecting, and testing everything around a model call. It sits alongside your existing SDK (Vercel AI SDK, OpenAI, Anthropic, Google GenAI) rather than replacing it, providing typed building blocks for the pieces that most commonly cause AI feature failures: stale context, missing memory, dropped instructions, unsafe inputs, and untested regressions. The project is licensed under Apache-2.0 and is currently in public alpha.

What It Is

Crux is a modular TypeScript library that structures the "harness" around LLM calls. The core idea is that bad model output is rarely a model problem — it is usually a problem with what gets sent to the model. Crux makes those surrounding pieces explicit: typed prompt() definitions with Zod input/output schemas, composable context() blocks, memory() for recent messages and facts, retriever() for RAG pipelines, guardrail() for PII and injection filtering, constraint() for semantic output validation with retry, and evaluate() for quality suites and CI-friendly baselines. All of these plug into a single use: array on a prompt definition, and the SDK you already use still makes the actual model call.
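To make the composition idea concrete, here is a minimal, dependency-free sketch of a typed prompt definition whose blocks are attached through a use array. It is not Crux's actual API: definePrompt, resolvePrompt, stringFields, and the Block and PromptDefinition shapes are hypothetical names invented for illustration, and a tiny hand-rolled schema stands in for Zod.

```typescript
// Hypothetical sketch: none of these names are Crux's real API.
// A tiny hand-rolled schema stands in for Zod so the block has no dependencies.
type Schema<T> = { parse(value: unknown): T };

// Minimal object schema: every listed key must be a non-empty string.
function stringFields<K extends string>(keys: readonly K[]): Schema<Record<K, string>> {
  return {
    parse(value: unknown): Record<K, string> {
      if (typeof value !== "object" || value === null) {
        throw new Error("input must be an object");
      }
      const source = value as Record<string, unknown>;
      const result = {} as Record<K, string>;
      for (const key of keys) {
        const field = source[key];
        if (typeof field !== "string" || field.length === 0) {
          throw new Error(`missing or invalid field: ${key}`);
        }
        result[key] = field;
      }
      return result;
    },
  };
}

// A block contributes one piece of text to the system message.
interface Block {
  id: string;
  render(): string;
}

interface PromptDefinition<I> {
  id: string;
  input: Schema<I>;
  use: Block[];
  user(input: I): string;
}

// Identity helper that exists only to drive type inference.
function definePrompt<I>(def: PromptDefinition<I>): PromptDefinition<I> {
  return def;
}

// Validates the input first, then assembles what the model would see.
function resolvePrompt<I>(def: PromptDefinition<I>, rawInput: unknown) {
  const input = def.input.parse(rawInput); // throws before any model call
  return {
    system: def.use.map((block) => block.render()).join("\n"),
    user: def.user(input),
  };
}

const brandVoice: Block = { id: "brand-voice", render: () => "Write in a friendly, concise tone." };
const refundPolicy: Block = { id: "refund-policy", render: () => "Refunds are available within 30 days." };

const supportReply = definePrompt({
  id: "support-reply",
  input: stringFields(["customerName", "question"] as const),
  use: [brandVoice, refundPolicy],
  user: (input) => `${input.customerName} asks: ${input.question}`,
});

const resolvedSupport = resolvePrompt(supportReply, {
  customerName: "Ada",
  question: "Can I get a refund?",
});
```

In this sketch the provider SDK never appears: resolvedSupport is plain data that could be inspected or handed to whichever client makes the actual call.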

Architecture: Define → Resolve → Adapt → Observe

Every Crux execution follows a four-stage pipeline:

Define — Author pure TypeScript definitions (prompts, contexts, memory blocks, tools, agents, flows, tests) that do not import a provider SDK.
Resolve — At call time, Crux validates input, filters conditional blocks, merges tools and settings, applies token budgets, and produces a provider-agnostic resolved prompt.
Adapt — An adapter maps the resolved prompt to Vercel AI SDK, OpenAI, Anthropic, Google GenAI, Convex Agent, or another runner.
Observe — Hooks emit structured events for generations, context resolution, memory reads/writes, retrieval, tools, evals, judge scores, artifacts, errors, and cost.

This separation means you can inspect what the model will see before the call runs, execute the same prompt through multiple providers, and keep quality checks tied to the definitions they protect.
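The four stages can be sketched in a few dozen lines. Everything below is an assumption made for illustration rather than Crux's implementation: the ContextBlock and ResolvedPrompt shapes, the event names, the priority-based budget policy, and the four-characters-per-token estimate are all hypothetical. The two adapter functions only loosely mirror the request shapes of common chat APIs.

```typescript
// Hypothetical sketch of Define -> Resolve -> Adapt -> Observe; not Crux's API.

// --- Define: plain data, no provider SDK imported ---
interface TicketInput {
  question: string;
  isEnterprise: boolean;
}
interface ContextBlock {
  id: string;
  text: string;
  priority: number; // higher priority is kept first under a budget
  when?: (input: TicketInput) => boolean; // optional conditional inclusion
}

const ticketBlocks: ContextBlock[] = [
  { id: "tone", text: "Answer politely and briefly.", priority: 3 },
  {
    id: "enterprise-sla",
    text: "Enterprise customers get a 4 hour response time.",
    priority: 2,
    when: (input) => input.isEnterprise,
  },
  { id: "changelog", text: "Long release notes: " + "x".repeat(400), priority: 1 },
];

// --- Observe: structured events collected by a hook ---
type HarnessEvent =
  | { type: "context.resolved"; included: string[]; dropped: string[] }
  | { type: "adapted"; target: string };

// --- Resolve: filter conditional blocks, apply a token budget ---
interface ResolvedPrompt {
  system: string;
  user: string;
  includedBlocks: string[];
}

// Rough estimate of about 4 characters per token; a real tokenizer would differ.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function resolveTicket(
  input: TicketInput,
  blocks: ContextBlock[],
  tokenBudget: number,
  emit: (event: HarnessEvent) => void,
): ResolvedPrompt {
  const candidates = blocks
    .filter((block) => block.when === undefined || block.when(input))
    .sort((a, b) => b.priority - a.priority);
  const included: ContextBlock[] = [];
  const dropped: string[] = [];
  let used = 0;
  for (const block of candidates) {
    const cost = estimateTokens(block.text);
    if (used + cost <= tokenBudget) {
      included.push(block);
      used += cost;
    } else {
      dropped.push(block.id);
    }
  }
  const includedBlocks = included.map((block) => block.id);
  emit({ type: "context.resolved", included: includedBlocks, dropped });
  return { system: included.map((block) => block.text).join("\n"), user: input.question, includedBlocks };
}

// --- Adapt: map the same resolved prompt to two request shapes ---
function toChatMessages(prompt: ResolvedPrompt, emit: (event: HarnessEvent) => void) {
  emit({ type: "adapted", target: "chat-messages" });
  return [
    { role: "system", content: prompt.system },
    { role: "user", content: prompt.user },
  ];
}
function toSystemPlusMessages(prompt: ResolvedPrompt, emit: (event: HarnessEvent) => void) {
  emit({ type: "adapted", target: "system-plus-messages" });
  return { system: prompt.system, messages: [{ role: "user", content: prompt.user }] };
}

const harnessEvents: HarnessEvent[] = [];
const recordEvent = (event: HarnessEvent): void => {
  harnessEvents.push(event);
};

const resolvedTicket = resolveTicket(
  { question: "When will support reply?", isEnterprise: true },
  ticketBlocks,
  25,
  recordEvent,
);
const chatPayload = toChatMessages(resolvedTicket, recordEvent);
const splitPayload = toSystemPlusMessages(resolvedTicket, recordEvent);
```

Because resolveTicket returns plain data, the included and dropped blocks can be checked before any request is sent, and both payloads are derived from the same resolved prompt.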

Package Ecosystem

Crux ships as a family of focused packages rather than a monolithic framework:

@crux/core — SDK-agnostic primitives for prompts, contexts, memory, retrieval, safety, routing, quality, agents, and observability
@crux/ai — Vercel AI SDK adapter for generate, stream, and structured output
@crux/openai, @crux/anthropic, @crux/google — Provider-specific adapters
@crux/convex — Convex storage, server boundaries, agent bridge, and swarm integration
@crux/upstash — Upstash Vector and Redis-backed storage adapters
@crux/otel — OpenTelemetry integration for production traces (Datadog, Honeycomb, Grafana, New Relic)
@crux/local — Native local runtime, CLI, TUI, HTTP/WS server, embedded devtools, eval runner, and catalog
@crux/devtools — React devtools UI for traces, evals, source catalog, memory, plans, and runtime inspection

Observability and Evaluation

Crux provides two observability surfaces. In development, crux dev and crux traces open a visual devtools UI and terminal dashboard showing live trace timelines, resolved system previews, memory operations, and rolling quality averages. In production, @crux/otel exports OpenTelemetry spans to any compatible platform and is documented to work in Lambda, Convex, and Cloudflare Workers. For evaluation, Crux supports local quality suites with built-in judges (faithfulness, relevance, safety), prompt tests, variants, cassettes, baselines, and CI-friendly runs via crux quality run.
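As an illustration of what a baseline check in CI might involve, the sketch below averages per-judge scores across runs and flags any judge that falls below a stored baseline by more than a tolerance. The score shape, the 0 to 1 range, the tolerance default, and the function names are assumptions for this example; they are not taken from Crux's evaluation runner.

```typescript
// Hypothetical sketch of a baseline comparison; not Crux's evaluation runner.
type JudgeScores = Record<string, number>; // judge name -> score, assumed to be in [0, 1]

// Average each judge's score across several evaluation runs.
function averageScores(runs: JudgeScores[]): JudgeScores {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const run of runs) {
    for (const [judge, score] of Object.entries(run)) {
      const entry = totals[judge] ?? { sum: 0, count: 0 };
      entry.sum += score;
      entry.count += 1;
      totals[judge] = entry;
    }
  }
  const averages: JudgeScores = {};
  for (const [judge, { sum, count }] of Object.entries(totals)) {
    averages[judge] = sum / count;
  }
  return averages;
}

interface Regression {
  judge: string;
  baseline: number;
  current: number;
}

// Flag judges whose current average is below baseline by more than the tolerance.
function findRegressions(baseline: JudgeScores, current: JudgeScores, tolerance = 0.02): Regression[] {
  const regressions: Regression[] = [];
  for (const [judge, expected] of Object.entries(baseline)) {
    const actual: number | undefined = current[judge];
    // A judge missing from the current run is treated as a regression.
    if (actual === undefined || actual < expected - tolerance) {
      regressions.push({ judge, baseline: expected, current: actual ?? 0 });
    }
  }
  return regressions;
}

const storedBaseline: JudgeScores = { faithfulness: 0.9, relevance: 0.8 };
const currentAverages = averageScores([
  { faithfulness: 1, relevance: 0.75 },
  { faithfulness: 0.5, relevance: 1 },
]);
const qualityRegressions = findRegressions(storedBaseline, currentAverages);
```

A CI step built on this idea would fail the build whenever qualityRegressions is non-empty, which is one way to keep quality checks tied to the definitions they protect.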

Current Status: Alpha

The GitHub README explicitly labels Crux as alpha software: "APIs may change, things may break, and examples may lag behind the implementation until the first stable release." The repository was created in May 2026 and last pushed in June 2026. The shipped foundation includes typed prompts and contexts, conditional and budgeted context resolution, memory blocks, retrieval and grounding, guardrails and constraints, routing and fallback, quality suites, a canonical observability graph, local devtools/runtime, and OpenTelemetry export. The README notes that a deeper "proof layer" — whole-call decision reports, richer rationale artifacts, a unified freshness vocabulary, and a polished harness-decision matcher library — is still in progress. Public npm packages are listed as "pending" on the homepage. TypeScript compatibility is verified against >=5.5 <7, with TypeScript 7 tracked as a preview lane.

Pricing

Open Source

Full @crux/core primitives
Provider adapters (Vercel AI SDK, OpenAI, Anthropic, Google GenAI)
Memory, retrieval, guardrails, and constraints
Quality suites and evaluation runner
Local devtools and CLI

Capabilities

Key Features

Typed prompt() definitions with Zod input/output schemas
Composable context() blocks for brand voice, policies, and shared tools
Memory blocks: recent messages, facts, episodes, procedures, and policies
Retrieval: indexers, corpora, retrievers, rerankers, grounding, and citations
Guardrails for PII detection, prompt injection, and safety filtering
Constraints for semantic output validation with retry and feedback
Model routing, fallback, semantic cache, pricing tables, and budgets
Quality suites with built-in judges (faithfulness, relevance, safety)
CI-friendly evaluation runner with baselines and variants
Local devtools with live trace timeline and terminal dashboard
OpenTelemetry export for Datadog, Honeycomb, Grafana, and New Relic
Agent composition: pipelines, parallel runs, consensus, swarms, blackboards, handoffs
SDK-agnostic prompt definitions with provider adapters
Single use: array composition model for all blocks
TypeScript >=5.5 <7 compatibility
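
The "validation with retry and feedback" feature in the list above can be pictured as a loop that regenerates output until every constraint passes, feeding failure messages back into the next attempt. The sketch below is a hypothetical illustration and not Crux's constraint() implementation: the Constraint and Generate types, generateWithConstraints, and maxWords are invented names, and the model call is a synchronous stub to keep the example short (a real call would be asynchronous).

```typescript
// Hypothetical sketch of constraint checking with retry and feedback; not Crux's API.
interface ConstraintResult {
  ok: boolean;
  feedback?: string; // explanation passed to the next attempt when ok is false
}
type Constraint = (output: string) => ConstraintResult;

// Stand-in for a model call; it receives feedback gathered from earlier attempts.
type Generate = (prompt: string, feedback: string[]) => string;

interface ConstrainedOutcome {
  output: string;
  attempts: number;
  satisfied: boolean;
  feedback: string[];
}

function generateWithConstraints(
  prompt: string,
  generate: Generate,
  constraints: Constraint[],
  maxAttempts = 3,
): ConstrainedOutcome {
  const feedback: string[] = [];
  let output = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    output = generate(prompt, feedback);
    const failures: string[] = [];
    for (const check of constraints) {
      const result = check(output);
      if (!result.ok) {
        failures.push(result.feedback ?? "constraint failed");
      }
    }
    if (failures.length === 0) {
      return { output, attempts: attempt, satisfied: true, feedback };
    }
    feedback.push(...failures); // carried into the next attempt
  }
  // Attempts exhausted: return the last output and mark it unsatisfied.
  return { output, attempts: maxAttempts, satisfied: false, feedback };
}

// Example constraint: limit the number of whitespace-separated words.
const maxWords = (limit: number): Constraint => (output) => {
  const count = output.trim().split(/\s+/).length;
  return count <= limit
    ? { ok: true }
    : { ok: false, feedback: `Use at most ${limit} words; got ${count}.` };
};

// Stub model: answers at length first, then shortens once it sees feedback.
const stubModel: Generate = (_prompt, feedback) =>
  feedback.length === 0 ? "This answer is far too long for the stated limit" : "Short answer";

const constrainedOutcome = generateWithConstraints("Summarize the policy.", stubModel, [maxWords(5)]);
```

Here the first attempt fails the word limit, its feedback is passed to the second attempt, and the loop stops as soon as all constraints pass or the attempt limit is reached.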

Integrations

Vercel AI SDK

OpenAI SDK

Anthropic SDK

Google GenAI SDK

Convex

Upstash Vector

Upstash Redis

Datadog

Honeycomb

Grafana

New Relic

OpenTelemetry

Next.js

Node.js

Expo / React Native

Cloudflare Workers

AWS Lambda

Zod

API Available