What Anthropic shipped at Code with Claude 2026
Ami Vora walked on stage at Code with Claude in San Francisco this morning, and the first thing she announced wasn't a model. API volume was up 17x year-on-year. No new Claude. The framing for the day, per Vora: "making our products work better for you."
That's a strange keynote to give in 2026. Every other lab on the calendar is selling capability: bigger context, faster tokens, cheaper inference. Anthropic spent the day on five new primitives in Claude Managed Agents, an enterprise GA of Claude Cowork, Code Review, Routines for scheduled overnight work, and a SpaceX deal for 220k GPUs arriving in the next month. What Anthropic announced is the model's runtime.
For developers building on Managed Agents, that runtime is a different shape today. Five features changed at once, and they fit together cleanly enough that ignoring one probably means hand-rolling something the platform now does for you.
Memory: the base layer
Memory is the foundation. It is also the easiest feature to underrate, given the API surface is tiny.
A memory store is a filesystem-shaped data structure that persists across sessions for a given agent. You read it at the start of a session, write to it during the run, and the next session sees the result. Audit logs are first-class. Stores are portable, so you can attach the same store to multiple agents or copy a store between environments. Everything is API-controlled, which means listing, snapshotting, attaching, and deleting all happen from your own code.
Memory was already in private preview. As of today, it is public beta under the managed-agents-2026-04-01 header.
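The docs don't ship a full lifecycle example, so here is a minimal sketch of what "API-controlled" plausibly looks like in practice. The beta header is from the announcement; every method name below is an assumption following the SDK's client.beta.* shape, not a confirmed signature.

```python
import anthropic

client = anthropic.Anthropic()
BETA = {"anthropic-beta": "managed-agents-2026-04-01"}  # header from the announcement

# Hypothetical resource methods: the docs say listing, snapshotting, attaching,
# and deleting are all API-controlled, but these exact names are assumptions.
store = client.beta.memory_stores.create(
    name="support-agent-memory",
    extra_headers=BETA,
)

# Stores are portable: attach the same store to more than one agent.
for agent_id in ("agent_support", "agent_triage"):
    client.beta.memory_stores.attach(
        memory_store_id=store.id,
        agent_id=agent_id,
        extra_headers=BETA,
    )

# Snapshot before anything risky; delete when the store is retired.
snapshot = client.beta.memory_stores.snapshot(memory_store_id=store.id, extra_headers=BETA)
client.beta.memory_stores.delete(memory_store_id=store.id, extra_headers=BETA)
```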
Until today, "give my agent memory" usually meant gluing a vector DB or a Postgres row to the prompt and hoping the embedding retrieved the right thing. Memory replaces that with a structured store the runtime understands and the agent can read and write deterministically. Less magical than RAG. More reliable.
Memory matters more than it looks because the next four features all assume it exists.
Dreams: memory that curates itself
Memory stores rot. Duplicates pile up. A user changes their preferred coding style; the old preference still sits in the store, and the agent now sees both. Stale entries start to outnumber fresh ones.
Dreams is an async job that reads your existing memory store plus up to 100 past session transcripts and produces a new memory store. It cleans up duplicates and resolves contradictions in favor of the latest value. It also pulls fresh insights out of the session transcripts rather than only curating what's already in the store.
```python
dream = client.beta.dreams.create(
    inputs=[
        {"type": "memory_store", "memory_store_id": store_id},
        {"type": "sessions", "session_ids": [...]},
    ],
    model="claude-opus-4-7",
    instructions="Focus on coding-style preferences; ignore one-off debugging notes.",
)
```
The input store is never modified. The dream produces a separate output store. You review it, then either attach it to future sessions or discard it. There is no destructive in-place edit, which means you can run dreams aggressively without risking the production memory.
You can pass instructions to steer what gets curated. Dreams run on opus-4-7 or sonnet-4-6 and are billed at standard token rates. Limits: 100 sessions per dream, 4,096-character instructions. Status is research preview, gated behind a request-access form, with the additional dreaming-2026-04-21 beta header.
While the dream is running, it exposes a session_id that points at the underlying session executing the curation pipeline. You can stream events from that session and watch the curation happen in real time, which is unusually transparent for an async cleanup job.
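In sketch form, assuming a retrieve method on dreams, an event stream on sessions, and an output_memory_store_id field on the finished dream (all assumed names, following the shape of the rest of the beta surface):

```python
# Hypothetical method and field names; only the session_id behavior is documented.
dream = client.beta.dreams.retrieve(dream_id=dream.id)

# Stream events from the underlying curation session in real time.
for event in client.beta.sessions.events.stream(session_id=dream.session_id):
    print(event.type)

# The input store is never modified: the finished dream points at a separate
# output store, which you review and then attach or discard.
dream = client.beta.dreams.retrieve(dream_id=dream.id)
if dream.status == "completed":
    client.beta.memory_stores.attach(
        memory_store_id=dream.output_memory_store_id,  # assumed field name
        agent_id="agent_support",
    )
```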
Anthropic shared one customer data point during the keynote. Harvey ran Dreams against the memory stores their legal agents use to remember filetype workarounds and reported a roughly 6x improvement in completion rate on the same tasks. One proof point, but a meaningful one, because filetype workarounds are exactly the kind of stale, duplicate-prone knowledge memory tends to accumulate.
Multiagent orchestration: coordinators without coordination code
Yesterday, a "multiagent" Claude system meant you had written a coordinator yourself. You managed threads, routed tool calls, passed data between agents, deduplicated work. Real engineers were rolling their own LangGraph at 3am.
Now a coordinator agent declares its subagents with callable_agents, and the runtime handles the rest. Each subagent runs in its own context-isolated thread with parallel execution and a separate conversation history. They share the container and filesystem, so they can hand each other files, but tools and context are not shared.
There is one constraint that matters: one level of delegation. Subagents cannot spawn their own subagents. That sounds limiting until you remember how many "agentic" architectures collapse because the recursion fan-out gets out of control. One level deep is opinionated, and the opinion is right for most real workflows.
The threads are persistent, which is easy to miss in the docs. The coordinator can return to a previous subagent days later, and that subagent remembers everything. Combined with Memory, this gives you long-running specialist agents you delegate to, instead of recreating each one from scratch.
The pattern that fits today is specialist subagents called by a coordinator. A code reviewer that takes a diff and returns comments. A test-writer that takes a function signature and returns a suite. None of them needs to know the others exist; they just need to do one thing well when called.
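Wiring that up might look something like the sketch below. callable_agents is the documented field; the agent-creation calls around it are my assumption about the SDK shape, not confirmed signatures.

```python
# Hypothetical agent-creation calls; callable_agents is from the docs.
reviewer = client.beta.agents.create(
    name="code-reviewer",
    instructions="Take a unified diff, return review comments.",
)
test_writer = client.beta.agents.create(
    name="test-writer",
    instructions="Take a function signature, return a pytest suite.",
)

coordinator = client.beta.agents.create(
    name="coordinator",
    instructions="Delegate review and test generation; merge the results.",
    callable_agents=[reviewer.id, test_writer.id],
)
# Each subagent gets its own context-isolated thread with a separate
# conversation history. They share the container and filesystem, so they
# can hand each other files, but one level of delegation only.
```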
Outcomes loop: the sleeper
This is the one I think changes the most about how you would structure agentic workflows.
Most agent runs today end the way conversations end: when the message queue runs out. The agent does what it does, you read the result, and you decide whether it is good. If not, you tell it what to fix. This works, but it puts the success criterion in your head rather than in the system.
The Outcomes loop moves it into the system. You define a rubric in markdown and attach it to the session via user.define_outcome. The agent works toward it while a separate grader, with its own context window and no access to the agent's reasoning, evaluates each iteration against the rubric.
```json
{
  "type": "user.define_outcome",
  "description": "Build a DCF model for Costco in .xlsx",
  "rubric": { "type": "text", "content": "..." },
  "max_iterations": 5
}
```
The grader returns either "satisfied" or per-criterion gaps. The agent iterates until it passes or hits max_iterations (default 3, max 20; verify against the API reference, since the docs and keynote materials disagreed slightly on these numbers).
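A minimal sketch of wiring the loop, with the caveat that only the user.define_outcome payload above is documented; the send_event method and the grader event names are my assumptions.

```python
# Hypothetical wiring; the payload shape is documented, the rest is assumed.
client.beta.sessions.send_event(
    session_id=session_id,
    event={
        "type": "user.define_outcome",
        "description": "Build a DCF model for Costco in .xlsx",
        "rubric": {"type": "text", "content": open("rubric.md").read()},
        "max_iterations": 5,
    },
)

# Watch for grader verdicts; "grader.result" is an assumed event name.
for event in client.beta.sessions.events.stream(session_id=session_id):
    if event.type == "grader.result":
        if event.verdict == "satisfied":
            break
        print(event.gaps)  # per-criterion gaps the agent will iterate on
```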
This turns the agent's job from "have a conversation" into "produce a deliverable that meets a spec." Think evals built into the runtime. The grader is structurally separate from the agent, and that separation is what makes it work; if the same model that produced the output also graded it, you would get the same self-flattering "looks good to me" we have seen in every weak QA agent shipped this year.
The use case I keep coming back to: anything where the deliverable is a structured artifact (a spreadsheet, a config file, an SQL migration, a refactor that has to pass a test suite). Anywhere there is a clear pass-fail criterion the agent can self-check against, this loop replaces the human-in-the-loop review cycle.
One caveat. Press coverage describes Outcomes as public beta. The official docs page lists it as research preview with request access gating. If that contradiction matters for your project, treat it as research preview until the docs page changes.
Webhooks: stop polling
Webhooks are the least flashy of the five, and the one most teams will adopt fastest, because they fix something everyone has wired around.
Subscribe to session and vault events and get notified when state changes. The signing setup is standard: 32-byte whsec_-prefixed secrets delivered with an X-Webhook-Signature header, plus an unwrap() SDK helper that verifies the signature and rejects payloads older than 5 minutes.
Two details matter for production. First, payloads only contain type and id, not the full object. You fetch the resource on receipt. This avoids the classic webhook race in which the payload you are holding is older than what's in the database. Second, the endpoint auto-disables after roughly 20 consecutive failures or if it resolves to a private IP. That is an opinionated default that will save you from the deployment where someone accidentally points production webhooks at a staging URL.
The key event to subscribe to is session.status_idled, which fires when an agent is done or is waiting for a tool result. It is the “your agent needs you” event, which lets you stop polling for completion.
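A minimal receiving endpoint, sketched with Flask. unwrap(), the signature header, and the type-and-id-only payload come from the docs above; the route, the module path of unwrap(), and the fetch-on-receipt calls are assumptions.

```python
from flask import Flask, request
import anthropic

app = Flask(__name__)
client = anthropic.Anthropic()
WEBHOOK_SECRET = "whsec_..."  # the 32-byte secret issued at registration

def kick_off_next_step(session):
    ...  # your application logic, not part of the SDK

@app.post("/claude-webhooks")
def handle():
    # unwrap() verifies X-Webhook-Signature and rejects payloads older than
    # 5 minutes; its exact module path here is an assumption.
    event = client.beta.webhooks.unwrap(
        payload=request.get_data(),
        headers=request.headers,
        secret=WEBHOOK_SECRET,
    )
    # Payloads carry only type and id, so fetch the full resource on receipt
    # and never act on data older than what's in the database.
    if event.type == "session.status_idled":
        session = client.beta.sessions.retrieve(session_id=event.id)
        kick_off_next_step(session)
    return "", 204
```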
What to plug in first
Five features, four under one beta header and Dreams behind a second, each with integration cost. The order I would pick for a team building on Managed Agents today:
| Order | Feature | Why this position |
|---|---|---|
| 1 | Webhooks | Lowest integration cost, biggest immediate win. If you are polling for session state today, this is a one-day rewrite that improves latency and cuts API calls. |
| 2 | Memory | The other features assume it. Get the substrate in place before you build on top. |
| 3 | Multiagent | If your current system has a hand-rolled coordinator, replace that code with callable_agents and delete the orchestration layer you have been maintaining. |
| 4 | Outcomes | Bigger architectural shift, since you are now writing rubrics rather than prompts. Pays off in any workflow where the deliverable has a clear pass-fail. |
| 5 | Dreams | Research preview, gated, and most useful once memory accumulation has produced real drift. Wait until your stores have a backlog of duplicates and stale entries to clean up. |
If you only have time to integrate one feature this week, make it Webhooks. If you only have time to redesign one part of your agent architecture this quarter, make it Outcomes.
The pattern across all five
Anthropic shipped the runtime substrate for long-running agents today: memory that does not rot, parallelism without orchestration code, success criteria the agent self-checks against, and async notifications so you are not babysitting sessions.
The framing in the engineering blog post is "decoupling the brain from the hands." For developers, the practical version is simpler: the things you have been hand-rolling around the model are now part of the platform. Less code in your repo, more capability you can rely on.
Code with Claude 2026 was not a model day. It was the day the platform around the model became the product.
References
Anthropic engineering post: "Scaling Managed Agents: Decoupling the brain from the hands"
Managed Agents docs
- Overview: https://platform.claude.com/docs/en/managed-agents/overview
- Memory: https://platform.claude.com/docs/en/managed-agents/memory
- Dreams: https://platform.claude.com/docs/en/managed-agents/dreams
- Multi-agent: https://platform.claude.com/docs/en/managed-agents/multi-agent
- Define outcomes: https://platform.claude.com/docs/en/managed-agents/define-outcomes
- Webhooks: https://platform.claude.com/docs/en/managed-agents/webhooks
- Sessions: https://platform.claude.com/docs/en/managed-agents/sessions
- Event stream: https://platform.claude.com/docs/en/managed-agents/events-and-streaming
- Production cookbook: https://platform.claude.com/cookbook/managed-agents-cma-operate-in-production
- Request access: https://claude.com/form/claude-managed-agents
Coverage and live blogs
- Simon Willison's live blog: https://simonwillison.net/2026/May/6/code-w-claude-2026/
- The New Stack: https://thenewstack.io/anthropic-managed-agents-dreaming-outcomes/
- SiliconANGLE on Dreams: https://siliconangle.com/2026/05/06/anthropic-letting-claude-agents-dream-dont-sleep-job/
- VentureBeat on Cowork: https://venturebeat.com/orchestration/anthropic-says-claude-code-transformed-programming-now-claude-cowork-is