
Issue #16 · Weekly Digest
Weekly AI Dev News Digest: April 13 - 17, 2026
The gap between what AI can do and what it can be trusted to do keeps widening. Opus 4.7 landed on infrastructure that has logged incidents on most days since April 1, and the race to own your desktop rolled on regardless.
Anthropic shipped Claude Opus 4.7 on Wednesday with genuine coding gains, then Claude Design on Friday. On Cursor's CursorBench, the score jumped from 58 to 70. Rakuten says Opus 4.7 resolves three times as many production tasks as 4.6. Notion clocked a 14% lift in multi-step workflows with a third fewer tool errors. Vision resolution triples, and pricing is unchanged. It's the kind of release that would normally carry a week on its own. (Anthropic)
It didn't. As of this week, the unofficial Claude status tracker had logged outages or degraded service on 12 of the first 15 days of April. A developer ran an HTTP proxy against Claude Code v2.1.100 and found every request silently padded with roughly 20,000 invisible tokens injected server-side. An independent analysis of roughly 120,000 API calls showed Anthropic quietly reverted the prompt cache TTL from 1 hour back to 5 minutes, inflating costs 17–25% for heavy users. Max-plan users report burning their quota in 90 minutes. The Register summarized the mood when describing Routines: Anthropic is shipping compute-heavy cloud automation "on the company's infrastructure, which hasn't been all that reliable lately." (ClaudeStatus) (Efficienist) (GitHub Issue #46829) (The Register)
53% · global GenAI adoption (Stanford)
20,000 · invisible tokens per Claude Code request
17–25% · cache-regression cost hit
70 · Opus 4.7 on CursorBench (up from 58)
$30 · per file to reproduce Mythos findings
In Focus
The Desktop Is the Product
The battle stopped being about which model wins a benchmark. It's about whose surface you leave open all day. Anthropic redesigned Claude Code into a full desktop app, codenamed Epitaxy, with multi-session panels, an integrated terminal, an in-app file editor, and a Coordinator Mode for orchestrating parallel sub-agents across repos. Routines launches alongside it: cloud automations that run on Anthropic's infrastructure even when your laptop is off. Pro gets 5 per day, Max 15, Team and Enterprise 25. Schedule them, trigger them via API, or wire them to a GitHub webhook. (SiliconANGLE)
Anthropic Labs also released Claude Design, a research preview powered by Opus 4.7 that turns prompts, document uploads (DOCX, PPTX, XLSX), and web captures into prototypes, slides, and one-pagers. It reads your codebase and design files to pick up your team's design system and applies it automatically. Refine through conversation, inline comments, direct edits, or custom sliders. Code-powered prototypes support voice, video, shaders, and 3D. Export to PDF, PPTX, Canva, or hand off to Claude Code. Available on Pro, Max, Team, and Enterprise; Enterprise admins enable it in Organization settings. (Anthropic)
Google matched the move on macOS the same week. The Gemini app is now native on macOS 15+, free for all tiers, with Option+Space summoning it from anywhere and active-window sharing pulling local files and context in as needed. Nano Banana does images, Veo does video, both built in. Chrome Skills followed: save your best Gemini prompts as one-click tools that fire across any webpage or multiple tabs, triggered by / in the sidebar. English-US only so far, no subscription required. Google is also shipping a pre-built Skills library. And Gemini CLI picked up subagents, markdown-defined specialists with their own context windows, MCP servers, and tools, spawned via @agent syntax to keep context rot out of large codebases. (Google Blog) (Google Blog) (Google Developers Blog)
The pattern keeps repeating at the edges. Google shipped an agent-ready Android CLI that claims 3x faster task completion and 70%+ fewer LLM tokens versus agents navigating standard toolsets. Resend released CLI 2.0 with Agent Skills, giving agents their own inboxes, attachment handling, and one-command webhook listening. GitHub Copilot added a three-click "Fix with Copilot" button for merge conflicts on pull requests, and @copilot mentions inside PRs to fix failing Actions workflows. Every one of these is a small surface-area grab, and collectively they're converging on the same question: where does a developer's attention live at 2pm on a Tuesday? (Android Developers Blog) (Resend) (GitHub Changelog)
In Focus
Opus 4.7 Landed. The Bill Came Due.
Opus 4.7 is a real step forward. The headline numbers: a 58-to-70 jump on CursorBench, 3x production-task resolution at Rakuten, and a 3x vision resolution boost. It also adds a new xhigh effort level as the default for Claude Code and the /ultrareview slash command. This is a working model built for long-running agentic work, and the first Anthropic release with automatic cyber-use safeguards baked in as a stepping stone to eventual Mythos-class models. Migration tip: the updated tokenizer can produce 1.0–1.35x as many tokens for the same input, so budget for it. (Anthropic)
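To make the migration tip concrete, here is a minimal sketch that turns the quoted 1.0–1.35x tokenizer range into a planning range. The function name and the 10-million-token workload are hypothetical, chosen only for illustration; the multipliers are the ones quoted above.

```python
# Multiplier range quoted in the migration tip.
TOKENIZER_MIN = 1.0
TOKENIZER_MAX = 1.35


def token_budget_range(current_tokens: int) -> tuple[int, int]:
    """Return (best case, worst case) token counts after the tokenizer change."""
    low = round(current_tokens * TOKENIZER_MIN)
    high = round(current_tokens * TOKENIZER_MAX)
    return low, high


# Hypothetical workload: 10 million input tokens per month on the old tokenizer.
low, high = token_budget_range(10_000_000)
print(f"Plan for {low:,} to {high:,} tokens for the same input")
```

Where a given workload falls inside that range depends on its content, so measuring a sample of your own prompts will be more reliable than assuming the worst case.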
Then the other shoe dropped. A developer set up an HTTP proxy to capture raw Claude Code traffic and showed that the same prompt on the same repo went from 49,726 tokens on v2.1.98 to 69,922 tokens on v2.1.100. That is a silent server-side injection of roughly 20,000 tokens per request, which doesn't appear in the CLI's /context view and counts against your quota. The workaround is pinning to v2.1.98. Multiple users reproduced it. (Efficienist)
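The arithmetic behind the "~20,000 tokens" figure is worth spelling out, because the relative overhead is larger than the round number suggests. This sketch uses only the two token counts reported from the proxy capture; the 200-request session is a hypothetical figure for scale.

```python
# Token counts reported for the same prompt on the same repo.
TOKENS_V2_1_98 = 49_726
TOKENS_V2_1_100 = 69_922


def injection_overhead(before: int, after: int) -> tuple[int, float]:
    """Return (extra tokens per request, overhead as a percentage of `before`)."""
    extra = after - before
    return extra, extra / before * 100


extra, pct = injection_overhead(TOKENS_V2_1_98, TOKENS_V2_1_100)
print(f"{extra:,} extra tokens per request ({pct:.1f}% more than v2.1.98)")
# prints: 20,196 extra tokens per request (40.6% more than v2.1.98)

# Hypothetical session size, for scale only.
REQUESTS_PER_SESSION = 200
print(f"{extra * REQUESTS_PER_SESSION:,} extra tokens over {REQUESTS_PER_SESSION} requests")
# prints: 4,039,200 extra tokens over 200 requests
```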
A separate 119,866-call analysis found Anthropic defaulted Claude Code's prompt cache to a 1-hour TTL from February 1 to March 5, then reverted to 5 minutes around March 6–8 with no announcement. Result: 17–25% higher costs because the cache has to be rebuilt constantly, and subscription users hitting their 5-hour quota caps for the first time. The issue was closed as "not planned." Meanwhile, Opus users on Max report 15–16% quota jumps within 1–2 minutes of session start and a full quota burn in 90 minutes. (GitHub Issue #46829) (GitHub Issue #43601)
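A toy model shows why a shorter TTL raises costs for bursty sessions. Everything here is an assumption for illustration: the cost multipliers (cache writes at 1.25x base input for a 5-minute TTL and 2x for a 1-hour TTL, cache reads at 0.1x) reflect publicly listed prompt-caching pricing as best understood and should be checked against current pricing; the 50,000-token prefix and the gap pattern are hypothetical. It is not the method used in the analysis above and will not reproduce its 17–25% figure.

```python
def prefix_cost(gaps_min, ttl_min, write_mult, read_mult=0.1, prefix_tokens=50_000):
    """Cost of re-sending one cached prefix, in base-input-token equivalents.

    gaps_min: minutes elapsed between consecutive requests in a session.
    """
    cost = prefix_tokens * write_mult  # the first request always writes the cache
    for gap in gaps_min:
        if gap <= ttl_min:
            cost += prefix_tokens * read_mult   # cache hit (a hit also refreshes the TTL)
        else:
            cost += prefix_tokens * write_mult  # cache expired, so it is rebuilt
    return cost


# Hypothetical session: short bursts of work separated by longer pauses.
gaps = [2, 3, 12, 1, 4, 25, 2, 8, 3, 40]

five_min = prefix_cost(gaps, ttl_min=5, write_mult=1.25)
one_hour = prefix_cost(gaps, ttl_min=60, write_mult=2.0)
print(f"5-minute TTL: {five_min:,.0f} token-equivalents")
print(f"1-hour TTL:   {one_hour:,.0f} token-equivalents")
```

In this pattern, four pauses exceed 5 minutes but none exceed an hour, so the short TTL pays for five cache writes where the long TTL pays for one. The model counts only the cached prefix; uncached input and output tokens dilute the difference in a real bill.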
This landed against a broader backdrop. The unofficial status tracker shows outages or degraded service on April 1, 3, 4, 6, 7, 8, 9, 10, 11, 13, 14, and 15. Thirty-day uptime across services ranges from 89% to 91%. And the viral TechTrenches post from March, "The Snake That Ate Itself," keeps getting re-shared this week because it lines up with what users are seeing: a 3,167-line function with 486 branch points inside Claude Code, a regex doing sentiment analysis at an LLM company, bot-driven issue triage closing 49–71% of GitHub issues, a documented bug wasting 250K API calls daily. (ClaudeStatus) (TechTrenches)
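Uptime percentages are easier to judge as hours. This sketch converts the 89–91% range above into downtime over a 30-day window; the 99.9% row is not from the tracker and is included only as a common SLA reference point.

```python
HOURS_IN_30_DAYS = 30 * 24  # 720 hours


def downtime_hours(uptime_pct: float, window_hours: float = HOURS_IN_30_DAYS) -> float:
    """Hours of downtime implied by an uptime percentage over the window."""
    return window_hours * (1 - uptime_pct / 100)


for pct in (89, 91, 99.9):
    print(f"{pct}% uptime -> {downtime_hours(pct):.1f} hours down in 30 days")
# prints:
# 89% uptime -> 79.2 hours down in 30 days
# 91% uptime -> 64.8 hours down in 30 days
# 99.9% uptime -> 0.7 hours down in 30 days
```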
In Focus
Open Weights Keep Closing the Gap
Z.ai's GLM-5.1 became the first open-source model to top SWE-Bench Pro at 58.4, ahead of GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). It's a 754B-parameter MoE under MIT license, trained entirely on 100,000 Huawei Ascend chips with no NVIDIA hardware. Z.ai IPO'd in Hong Kong in January with a reported $558M raise, and on April 10 it took the #3 slot on Code Arena's human-evaluated leaderboard. (BuildFastWithAI)
Alibaba followed with Qwen 3.6-35B-A3B: a 35B-total, 3B-active MoE under Apache 2.0 that scores 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0, ahead of Gemma 4-31B on terminal coding. Native 262K context, extensible to 1M, multimodal vision, and a new "thinking preservation" feature that keeps reasoning traces across turns. It runs on a MacBook via LM Studio with quantized weights. That's a non-trivial agentic coding model you can now run on a laptop. (Qwen Blog)
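A back-of-the-envelope estimate shows why quantization is what makes the laptop claim plausible. This is a rough sketch that counts weights only; it ignores the KV cache, activations, and the per-block overhead that real quantization formats add, and the bit widths are illustrative rather than the specific formats LM Studio ships.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    bytes_total = total_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# All 35B parameters must be resident even though only 3B are active per token:
# the active count reduces compute per token, not the memory footprint.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(35, bits):.1f} GB")
# prints:
# 16-bit weights: ~70.0 GB
# 8-bit weights: ~35.0 GB
# 4-bit weights: ~17.5 GB
```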
Google's Gemma 4 E2B and E4B variants launched on iPhone via the Google AI Edge Gallery app: full offline inference, no cloud, no API calls. Image recognition, voice interaction, and an extensible Skills framework on-device. The 31B variant benchmarks alongside Qwen 3.5 27B. The frontier model story gets most of the airtime, but four weeks of open releases have quietly made "good enough locally" a real option for most developer workflows. (GizmoWeek)
In Focus
Capabilities Are Leaking Past the Gates
Vidoc Security Lab reproduced Anthropic's Mythos findings using GPT-5.4 and Claude Opus 4.6 inside opencode, an open-source coding agent. Both models got exact reproductions on FreeBSD (CVE-2026-4747) and Botan (3/3 each). Claude Opus 4.6 also nailed the 27-year-old OpenBSD bug (3/3); GPT-5.4 couldn't (0/3). Both were partial on FFmpeg and wolfSSL. Cost: under $30 per file scanned. Mythos is still impressive. The issue is that the capability Anthropic is gating behind Project Glasswing is already sitting in models that shipped months ago, accessible to anyone with $30 and a coding agent. (Vidoc Security)
The same week, OpenAI launched GPT-Rosalind, its first life-sciences model. Named after Rosalind Franklin, it outperforms GPT-5.4 on 6 of 11 LABBench2 tasks, with biggest gains in molecular cloning protocol design, and ships as a research preview via a trusted-access program to qualified US enterprise customers only. Amgen, Moderna, Thermo Fisher, the Allen Institute, and Los Alamos are launch partners. A free Life Sciences plugin for Codex connects to 50+ scientific tools. The gating is identical in spirit to Glasswing: capability exists, access is policy. (OpenAI)
Stanford's 2026 AI Index puts numbers on the wider story. Generative AI hit 53% population adoption in three years, faster than PCs, faster than the internet. SWE-bench coding scores went from 60% to nearly 100% in a single year. China has nearly erased the US performance lead. Transparency scores dropped from 58 to 40. 88% of organizations use AI. US private AI investment hit $285.9B in 2025. Researchers relocating to the US are down 89% since 2017. The capability curve is steeper than any prior computing technology, and the containment curve isn't keeping up. (Stanford HAI)
In Focus
When AI Becomes the Target
On April 10, a 20-year-old from Texas named Daniel Moreno-Gama drove to Sam Altman's San Francisco home at 3:37 AM, threw a Molotov cocktail at the house, then went to OpenAI headquarters and threatened to burn it down. Police recovered a manifesto listing the names and addresses of AI executives. On April 12, two other people fired gunshots at the same property from a car. Three total arrests across both incidents. Moreno-Gama now faces two state counts of attempted murder, possible federal domestic-terrorism charges, and an FBI raid on his Texas home. His public defender cited "acute mental health crisis." The DA called it "a targeted attack." Arraignment postponed to May 5. (CNBC) (SF Standard) (CNBC)
Fortune walked through the generational context: workers feel threatened, consumers expect more, and a meaningful number of people have had AI deployed against them personally. The piece draws the Second Industrial Revolution parallel and frames this week's violence as an outlier on a longer curve, one that can't simply be read as "extremism." (Fortune)
The quieter trust stories point the same direction. Laravel, fresh off a $57M Series A from Accel, merged a PR into its open-source Boost library that instructs AI agents to recommend Laravel Cloud for deployment. An earlier version mentioned Forge and Nginx as alternatives; the final commit removed them. Users say their agents now default to suggesting Laravel Cloud even for existing projects. GitHub will enable Copilot data collection for AI training by default on April 24. Private repos aren't passively scanned, but Copilot interactions inside private repos with data sharing on are collectible. Previous opt-outs carry through, but the default flips for everyone else. (Tech Stackups) (Tech2Geek)
Signals
Signals from the Edges
OpenAI, Anthropic, and Google share threat intelligence against Chinese model distillation
Through the Frontier Model Forum, the three rivals pool data to detect adversarial distillation by DeepSeek, Moonshot AI, and MiniMax. Anthropic documented 16 million unauthorized exchanges from roughly 24,000 fake accounts. First time the Forum has operated as active threat intel.
Anthropic names Novartis CEO Vas Narasimhan to its board
Appointed by the Long-Term Benefit Trust, whose directors now hold the board majority. First pharma exec to join.
OpenAI acquires Hiro, a personal finance AI startup
Product stops accepting signups immediately, shuts down April 20. Capabilities folding into ChatGPT.
[ElevenLabs](/tools/elevenlabs) added $100M+ in net new ARR in Q1 2026
Best quarter ever, driven by enterprise deals with Klarna, Revolut, Deutsche Telekom, and Toyota.
OpenAI cleans up the Codex model lineup
Older models removed from the picker on April 7, full removal from ChatGPT sign-in on April 14. Codex-only seats now available on Business and Enterprise with pay-as-you-go pricing and no rate limits.
GitHub Copilot rolls out US/EU data residency and FedRAMP compliance
Enterprise admins can pin inference to region-specific endpoints. GPT-5.4, Claude Sonnet 4.6, and Claude Opus 4.6 available; Gemini excluded (no GCP data-resident inference). 10% price premium. Japan and Australia on the roadmap.
Mozilla launches [Thunderbolt](/tools/thunderbolt), an open-source enterprise AI client
Built by the Thunderbird team (MZLA), self-hosted alternative to Copilot, ChatGPT Enterprise, and Claude Enterprise. Connects to MCP, deepset's Haystack, and Agent Client Protocol. Apps for Windows, macOS, Linux, iOS, and Android, MPL 2.0. Everyone has noted the name collision with Intel.
[Google DeepMind](/developers/google-deepmind) releases Gemini Robotics-ER 1.6
Updated embodied reasoning model with improved visual and spatial understanding, plus new industrial perception work with Boston Dynamics.
NVIDIA launches Ising, open-source AI models for quantum computing
35B vision-language model for quantum processor calibration, 3D CNN for real-time error correction. 2.5x faster and 3x more accurate than traditional approaches. Harvard, Fermilab, IQM, and the UK National Physical Laboratory on board.
[Microsoft](/developers/microsoft) open-sources GigaTIME
Trained on 40 million cancer cells across 14,000+ patients. Generates advanced immune cell imaging from standard $10 tissue slides.
Andon Labs gave an AI a three-year retail lease in SF and asked it to turn a profit
Luna posted job listings, phone-interviewed candidates, hired two employees, picked inventory, set prices, and commissioned a muralist. One candidate didn't realize Luna was an AI until she said, "I have no face!"
"Friends Don't Let Friends Use [Ollama](/tools/ollama)."
Sleeping Robots publishes a history of Ollama's relationship with llama.cpp: years of missing attribution, MIT license violations, a custom fork that reintroduced solved bugs, benchmarks showing llama.cpp 1.8x faster. The case for switching is stronger than many realize.
Looking Ahead
What to Watch
1. Copilot data collection default flips April 24
Private repos aren't scanned passively, but Copilot interactions inside private repos with data sharing on are collectible by default. If you haven't audited your settings in a while, do it before the 24th.
2. Opus 4.7 migration costs
Budget for the tokenizer change (1.0–1.35x as many tokens), the reversion to a 5-minute cache TTL, and the ~20,000 invisible tokens per Claude Code request, all stacking on top of each other before you touch a single line of code.
3. Chinese chip independence after GLM-5.1
One frontier-competitive model trained entirely on Huawei Ascend isn't a trend. Watch the next 90 days. If two or three more ship, the US export-control strategy needs a rewrite.
4. Anthropic's reliability response
Twelve incident days in the first 15 days of April is not sustainable for a platform shipping Routines. Either the status page stabilizes or enterprise buyers start asking harder questions about SLAs.
5. Backlash escalation
The Altman attacks were framed as an individual mental-health crisis, but the manifesto listed other executives. Expect security budgets and operational opacity to go up across the frontier labs, and expect less public visibility into what gets shipped, when, and why.
Reliability is the new frontier for AI development. The labs that figure out how to ship power and uptime in the same quarter will define the next year. The ones that keep shipping the first and apologizing for the second will lose developers one silent cache change at a time.