
Issue #23 · Weekly Digest
Weekly AI Dev News Digest: May 30 - June 5, 2026
Microsoft, GitHub, Cognition, and Cloudflare all reshaped their coding tools around AI agents, and most of them changed how they charge for agent usage too.
Microsoft made its own 5-billion-parameter coding model a default in VS Code. GitHub turned Copilot into a runtime other apps can embed. Cognition deleted the Windsurf brand and relaunched it as Devin Desktop. Cloudflare bought the company behind Vite. All of it points the same way: the tools now expect an AI agent to be the one using them, not just a person. Two separate "let any agent talk to any tool" protocols landed within days of each other.
On models, Nvidia and Google shipped the best open-weight models US labs have ever released, and both still lost to a Chinese model from April. Almost every vendor also changed how it charges for agent usage, because agents burn real compute and somebody has to pay for it. And Anthropic, in the middle of filing to go public, disclosed that Claude now writes more than 80% of its own merged code, which reads as either the payoff or the warning, depending on how closely it gets read.
550B
params in Nvidia's largest open model
48 to 54
the US open-weight gap to China
100M
weekly Vite downloads now under Cloudflare
80%
of Anthropic's merged code written by Claude
$44B
Ramp's valuation selling token-spend tracking
In Focus
Nvidia and Google Released New Open-Weight Models
Nvidia put its biggest open model ever on Hugging Face. Nemotron 3 Ultra is a 550-billion-parameter mixture-of-experts model with only 55B active, built on a hybrid Mamba-2 and attention design, with a one-million-token context and a clear target: long-running agents that plan and call tools. Weights went live June 4 on Hugging Face, OpenRouter, ModelScope, and NVIDIA NIM under a license that permits commercial use, and early endpoints serve north of 300 tokens per second (The New Stack). The day before, Google released Gemma 4 12B under Apache 2.0, small enough to run multimodal AI on a 16GB laptop. It reads text, images, audio, and video through a single decoder with no separate vision or audio encoders, which roughly halves the memory footprint versus encoder-based designs, and Google claims it beats the older Gemma 3 27B on GPQA Diamond, MMLU Pro, and DocVQA (Google).
Both are the best open weights their labs have produced. Both still sit behind a Chinese model that came out in April. On the Artificial Analysis Intelligence Index, Nemotron 3 Ultra scores 48, the highest of any American open model, well ahead of Gemma and gpt-oss, and still six points short of Moonshot's Kimi K2.6 at 54 (Artificial Analysis).
Model | Lab | AA Intelligence Index
Kimi K2.6 (China) | Moonshot | 54
Nemotron 3 Ultra | Nvidia | 48
Gemma 4 31B | Google | 39
Nemotron 3 Super | Nvidia | 36
gpt-oss-120b | OpenAI | 33
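The gaps in the table come down to subtraction from the top score. A minimal Python sketch, using only the index scores listed above (the variable and function names are illustrative):

```python
# Artificial Analysis Intelligence Index scores, copied from the table above.
scores = {
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

# The top-scoring model in the table.
leader = max(scores, key=scores.get)


def gap_to_leader(model: str) -> int:
    """Index points separating `model` from the top-scoring model."""
    return scores[leader] - scores[model]


for model, score in scores.items():
    print(f"{model}: {score} ({gap_to_leader(model)} behind {leader})")
```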
In Focus
Coding Tools Rebuilt Around AI Agents
GitHub turned Copilot from an autocomplete into a platform other software builds on. The Copilot SDK hit general availability June 2, letting developers embed Copilot's agent runtime (planning, tool invocation, file edits, streaming, and multi-turn sessions) in their own apps instead of building orchestration from scratch. It ships for Node/TypeScript and Java, registers custom tools, connects MCP servers, and is open to every tier including Free (GitHub). Alongside it came cloud and local sandboxes that give an agent isolated environments to run shell commands in without alarming anyone (GitHub), plus a code-review upgrade that pulls context from issue trackers and docs over MCP and routes security-sensitive pull requests to a higher-reasoning tier (GitHub). VS Code got an agent-first Agents window and remote agents that survive a client disconnect (GitHub), and Visual Studio added a read-only Plan agent that writes an implementation plan before touching any code (GitHub). GitHub also put Copilot on the desktop with a native GitHub Copilot app, built for directing several coding agents in parallel and wired into issues, pull requests, and the full PR lifecycle.
Microsoft's own coding models
At Build, June 2 to 3 in San Francisco, Microsoft shipped its own model stack, the first time it could after its OpenAI exclusivity terms loosened in April. The one to actually try is MAI-Code-1-Flash, a 5-billion-parameter coding model trained inside Copilot's production tool harnesses, scoring about 51% on SWE-Bench Pro while using up to 60% fewer tokens than comparable models, and rolling out June 2 as a default in the VS Code model picker (Microsoft). All seven new MAI models are reachable outside Azure on OpenRouter, Fireworks, and Baseten, with the flagship MAI-Thinking-1, a 35B-active reasoning MoE with a 256K context built without distillation, in private preview (Microsoft AI). The backend got its own releases too: Azure HorizonDB, a managed Postgres built for AI-era workloads, and Project Rayfin, a managed backend-as-a-service on Microsoft Fabric defined through GitHub workflows (Microsoft). Microsoft's edge here is distribution, not raw quality. A coding model that is already a default in VS Code reaches more developers than any standalone launch.
Microsoft Scout, an always-on desktop agent
Build's headline release was Scout, which Microsoft calls its first "Autopilot": an always-on desktop agent for Windows and macOS that carries its own governed Entra identity and acts on the user's behalf without a fresh prompt each time (Microsoft). It is capable. Scout reads and writes a workspace directory, runs shell commands under a three-tier permission model, drives a real browser through Playwright, reaches into Microsoft 365 mail, calendar, Teams, and SharePoint, spawns parallel sub-agents, and fires a heartbeat prompt every 15 to 120 minutes while the user is away. It is built on OpenClaw, the open-source agent framework that went viral earlier this year, repackaged for non-technical users inside Microsoft 365, and it needs both a Microsoft 365 Copilot license and a GitHub Copilot Business or Enterprise seat, which puts Microsoft on both sides of the agent desktop.
Internal planning documents obtained by 404 Media label phase one of Scout's rollout "Make people addicted," part of a stated path from "addictive app to agentic platform"; the internal pilot, codenamed ClawPilot, grew from about 100 daily users to more than 3,000 in a few weeks (404 Media). Read next to Scout's careful defaults, that line is less a contradiction than a roadmap. A persistent agent earns broad standing access fastest from the people who lean on it daily, so the habit is doing real work in the design. Our full breakdown walks the permission-creep loop step by step.
Coding-agent releases and new agent protocols
Cognition retired Windsurf on June 2 and relaunched the IDE as Devin Desktop, an over-the-air update that keeps accounts and keybindings intact but reframes the default surface as an Agent Command Center, a Kanban board for managing every local and cloud agent at once. It replaces the Cascade local agent with a Rust rewrite called Devin Local, roughly 30% more token-efficient, and ships the Agent Client Protocol, an Apache-2.0 spec that lets Codex, Claude Agent, and Gemini CLI run inside any ACP-compatible editor (Devin). OpenAI shipped real Codex tooling rather than a teaser: a Build iOS Apps plugin that views and tests the app in an in-app browser with SwiftUI previews and hot reload, plus a Sites plugin that deploys small web apps OpenAI hosts (OpenAI Devs). Cursor's Composer 2.5, built on Moonshot's Kimi K2.5 base, landed third on the Coding Agent Index at 62, behind Claude Opus 4.7 and GPT-5.5 in Codex, at a tenth to a sixtieth of their per-task cost (Artificial Analysis). And Cloudflare acquired VoidZero, Evan You's company behind Vite, Vitest, Rolldown, and Oxc, keeping all of it MIT-licensed; Vite alone is downloaded more than 100 million times a week, and VoidZero notes more of its tool usage now comes from AI agents than from humans (VoidZero).
In Focus
Usage-Based Billing and AI Funding
If agents are the new users, the question every finance team is suddenly asking is what they cost. Anthropic's own answer arrived buried in an Institute post: as of May 2026, Claude authored more than 80% of the code merged to production, and a typical engineer now merges roughly 8x as much code per day as in 2024 (Anthropic). Anthropic concedes a sentence later that lines of code measure quantity over quality, which is the honest call: the metric counts typing speed, not judgment. The real signal sits under the chart: as volume climbed, the bottleneck slid from writing code to reviewing it. One engineer's quote does more than the graph does, admitting that on the days everything breaks, "I have no idea what I've been up to anymore."
Other vendors changed how they bill for that compute. GitHub moved all Copilot plans onto metered AI Credits starting June 1, with code review now also burning Actions minutes (GitHub). Cursor reworked Teams billing around model choice, giving every seat separate pools for first-party and third-party models and claiming lower costs for 90% of teams (Releasebot). Anthropic splits subscription usage on June 15, moving programmatic Agent SDK and claude -p calls into a separate credit pool, a change aimed at CI pipelines more than hand-driven Claude Code (DevToolPicks). The cost-routing playbook showed up in the wild too: a Harvey and Fireworks AI hybrid setup reportedly beats Claude Opus 4.7 on legal benchmarks at about 61% lower cost by routing each step to the smallest model that can handle it (Digg). The market priced the trend directly. Ramp raised $750M at a $44B valuation on June 4, partly on selling tooling that tracks and caps AI token spend, which it pitches as a third category of business cost after people and vendors (TechCrunch). And the biggest spender of all, Anthropic, confidentially filed its S-1 with the SEC on June 1, the first formal step toward an IPO on a roughly $47B revenue run-rate, up from about $10B a year earlier (CNBC).
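The routing idea itself is easy to sketch. The Python below is a hypothetical illustration, not the Harvey and Fireworks AI setup: the model names, capability scores, and per-token prices are all invented, and a real router would also need some way to estimate how hard each step is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str            # hypothetical model name
    capability: int      # invented score: higher handles harder steps
    usd_per_mtok: float  # invented price per million tokens


# Hypothetical catalog; every number here is made up for illustration.
CATALOG = [
    Model("small", 1, 0.20),
    Model("medium", 2, 1.00),
    Model("large", 3, 10.00),
]


def route(difficulty: int) -> Model:
    """Return the cheapest model whose capability covers the step."""
    eligible = [m for m in CATALOG if m.capability >= difficulty]
    if not eligible:
        raise ValueError(f"no model can handle difficulty {difficulty}")
    return min(eligible, key=lambda m: m.usd_per_mtok)


def pipeline_cost(steps: list[tuple[int, int]]) -> float:
    """Total USD for (difficulty, tokens) steps, each routed separately."""
    return sum(route(d).usd_per_mtok * t / 1_000_000 for d, t in steps)


# Two easy steps and one hard step, as (difficulty, tokens).
steps = [(1, 500_000), (1, 300_000), (3, 200_000)]
routed = pipeline_cost(steps)
always_large = sum(CATALOG[-1].usd_per_mtok * t / 1_000_000 for _, t in steps)
print(f"routed: ${routed:.2f}, always-large: ${always_large:.2f}")
```

The saving depends entirely on how much of the workload is easy: if every step needs the largest model, routing changes nothing.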
In Focus
Federal Bills to Preempt State AI Laws
OpenAI published a blueprint June 2 asking Congress to set one federal framework for frontier AI and to preempt state laws covering "the same frontier safety risks," an approach it calls "reverse federalism." Its proposed evaluator, CAISI, would assess models and recommend mitigations but explicitly not block deployments (OpenAI). Two days later, Reps. Obernolte and Trahan unveiled the 269-page Great American Artificial Intelligence Act, a bipartisan House draft that codifies CAISI, funds it at $100M a year through 2029, requires frontier developers to publish risk frameworks and disclose release dates, and preempts state AI laws for three years, which would freeze Colorado's AI Act days before it takes effect June 30 (Roll Call).
Signals
Signals from the Edges
Arena added an Agent Mode
It runs head-to-head benchmarking of GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on multi-step agentic tasks rather than single-turn chat, a more honest test of how models behave inside real agent loops.
cua spiked on GitHub
The trycua/cua project is an open-source set of sandboxes, SDKs, and benchmarks for computer-use agents that drive full macOS, Windows, and Linux desktops.
Hugging Face open-sourced SynthTraces
A minimal codebase from co-founder Julien Chaumond for generating synthetic coding-agent session traces, useful for training or evaluating agents when real session data is scarce.
Guide Labs launched Clarity
An interpretability platform for inspecting and steering the specific concepts driving a model's outputs instead of treating it as a black box.
Perplexity and Chai went vertical
Perplexity Computer connects businesses to 400+ tools for automated workflows, pushing past search into agent territory, and Chai Discovery partnered with Pfizer to put its Chai-3 model into drug discovery and antibody design.
Looking Ahead
What to Watch
- 1
WWDC opens June 8
Apple's keynote and the Platforms State of the Union land Monday; watch the Xcode, Swift, and on-device Foundation Models tracks, plus a reworked Siri reportedly pulling in Gemini through a January partnership. Apple's AI story has been mostly promises, so this is the show-don't-tell one.
- 2
Anthropic's metering split hits June 15
Agent SDK and claude -p usage moves to a separate credit pool, so audit any CI or third-party agent that authenticates against a subscription before then.
- 3
A Claude Sonnet 4.8 rumor is circulating
Developer chatter points to a launch around mid-June on the strength of leaked filter strings, one of which (opus-4-7) later became a real release. Treat it as rumor until there is a model card.
- 4
Devin Local fully replaces Cascade July 1
Update any Cascade-named automations before the old agent is removed.
- 5
More Codex is close
The new iOS and Sites plugins read like the opening moves of a larger push to make Codex run the whole build-and-preview loop, not just write code.
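For the June 15 metering split in item 2, a first-pass audit can be as simple as searching CI configuration for claude -p calls. A rough Python sketch, assuming workflows live under .github/workflows (adjust the path for other CI systems). The function name and regex are illustrative, and the search will miss calls assembled from variables, hidden in wrapper scripts, or with other flags placed before -p:

```python
import re
from pathlib import Path

# Matches `claude -p` with the flag directly after the command name.
PATTERN = re.compile(r"\bclaude\s+-p\b")


def find_headless_calls(root: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each YAML line invoking claude -p."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".yml", ".yaml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location; GitHub Actions keeps workflows here.
    for path, lineno, line in find_headless_calls(Path(".github/workflows")):
        print(f"{path}:{lineno}: {line}")
```

Each hit is a place to check which credentials the job uses before the split takes effect.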
Ted Chiang argued that we only call the chatbots conscious because they produce text, while structurally similar models like AlphaFold and Sora that fold proteins or generate video get none of the same treatment. It is a useful lens for a week whose loudest story was a model writing 80% of its own code. The systems doing the most work are the ones that go unnoticed, right up until the day nobody can read as fast as they write.