
Issue #17 · Weekly Digest
Weekly AI Dev News Digest: April 20 - 24, 2026
Five labs shipped frontier models in five days. DeepSeek's V4 runs roughly 7x cheaper than Opus and targets Huawei silicon instead of NVIDIA's, and Anthropic spent the same week publishing a postmortem on why Claude has been getting worse. The frontier got bigger, cheaper, and messier in one week, and the two labs with the most to lose from any of it had bad weeks for reasons entirely their own.
GPT-5.5 landed Thursday with native desktop automation, a 1.1M token context window, and a doubled per-token price that OpenAI says is offset by 40% fewer tokens per task. It's OpenAI's new Intelligence Index leader, and on a normal week it would have owned the cycle. This wasn't a normal week.
Four other frontier-class releases landed within 120 hours of it. DeepSeek V4 shipped at a fraction of the price on Huawei silicon, Alibaba's Qwen3.6-27B ran on a MacBook and beat Anthropic on terminal coding, Claude Opus 4.7 went GA on Bedrock, and ChatGPT Images 2.0 dominated Image Arena within a day of release. The pricing floor fell out and the chip stack cracked, all in the same five days.
- $0.14/M · DeepSeek V4-Flash input price
- 7x cheaper · V4-Pro output vs Opus
- 75% · of new Google code is AI-generated
- $60B · SpaceX's option on Cursor
- 22 sec · Wiz threat handoff time (down from 8 hours)
In Focus
Five Frontiers in Five Days
DeepSeek V4 is the one that rewrites the brief. V4-Pro (1.6T total / 49B active) and V4-Flash (284B / 13B active) shipped Friday as MIT-licensed MoE models with a 1M context window and a new Hybrid Attention Architecture that uses 27% of V3.2's inference FLOPs and 10% of its KV cache at 1M tokens. Pro runs at $1.74 in / $3.48 out per million tokens, Flash at $0.14 / $0.28. Pro undercuts Opus 4.7 on output by roughly 7x. The second story is the chip stack: DeepSeek optimized V4 for Huawei Ascend 950 and Cambricon chips instead of NVIDIA and didn't give Western chipmakers early access. (Simon Willison)
Alibaba's Qwen team answered with Qwen3.6-27B, a dense Apache-2.0 multimodal model that scores 77.2% on SWE-bench Verified and 59.3% on Terminal-Bench 2.0. The Terminal-Bench number matches Claude Opus 4.5 exactly and beats Alibaba's own 397B-parameter MoE. A 4-bit quantized GGUF build weighs 16.8 GB and fits on a 32 GB MacBook Pro. Flagship-class coding agent performance, no US vendor in the loop, runs offline. (Hugging Face)
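The 16.8 GB figure lines up with simple quantization arithmetic. A back-of-envelope sketch; the 5.0 effective bits/weight is an assumption about the quant mix, since real GGUF builds (Q4_K_M and friends) keep embeddings and some layers above 4 bits:

```python
# Rough check on why a 27B dense model fits a 32 GB MacBook at 4-bit
# quantization. The bits-per-weight value is an assumption, not a spec:
# mixed-precision GGUF quants average a little above 4 bits per weight.

def quantized_weights_gb(params_billions: float, bits_per_weight: float = 5.0) -> float:
    """Approximate resident weight size in GB (10^9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

weights = quantized_weights_gb(27)   # ~16.9 GB, close to the 16.8 GB build
headroom = 32 - weights              # what's left for OS, KV cache, context
print(f"weights ~{weights:.1f} GB, headroom ~{headroom:.1f} GB")
```

At exactly 4 bits the same model would need only about 13.5 GB, so the published 16.8 GB implies a mixed quant in the ~5 bits/weight range.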
GPT-5.5 brings native computer use into ChatGPT and Codex and scores 82.7% on Terminal-Bench 2.0. Pricing doubles to $5 in / $30 out per million tokens, but OpenAI's internal numbers claim 40% fewer tokens used per task, so the effective bill depends on your workload. API access is promised "very soon" behind separate safeguards.

Anthropic's Claude Opus 4.7 went generally available on Amazon Bedrock the same week with dynamic capacity allocation, adaptive thinking budgets, the full 1M context, high-resolution image support, and up to 10,000 RPM per account per region. Published scores: 64.3% on SWE-bench Pro, 87.6% on SWE-bench Verified. (AWS)

And OpenAI's ChatGPT Images 2.0 shipped Tuesday with native reasoning in the architecture, hitting #1 on Image Arena within 12 hours by a +242 point margin, the largest lead ever recorded there. Instant mode is free-tier eligible. DALL-E 2 and 3 both retire May 12. (OpenAI)
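GPT-5.5's doubled price and the 40%-fewer-tokens claim combine into a simple ratio: at exactly 40% fewer tokens, per-task bills rise about 20%, and break-even sits at 50% fewer. A sketch, where the old $2.50/$15 price is inferred from "doubles" and the task's token counts and 70/30 input/output split are illustrative assumptions:

```python
# Effective-cost arithmetic behind the GPT-5.5 pricing question. New prices
# ($5/$30 per million tokens) are from this issue; old prices are inferred
# from "doubles"; the token counts and split are illustrative assumptions.

def task_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Dollar cost of one task; prices are per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1e6

old = task_cost(70_000, 30_000, 2.50, 15.00)          # previous-gen pricing
reduced_in, reduced_out = 70_000 * 60 // 100, 30_000 * 60 // 100  # 40% fewer
new = task_cost(reduced_in, reduced_out, 5.00, 30.00)
print(f"old ${old:.3f} vs new ${new:.3f} -> {new / old:.2f}x")   # 1.20x
```

The ratio is independent of the split as long as input and output shrink equally, which is exactly the part teams will want to verify on their own workloads once the API opens.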
In Focus
The $20 Plan Can't Hold the Product Anymore
Anthropic had the loud week. On Tuesday afternoon the public pricing page swapped a checkbox for an X on Claude Code under Pro, and the support docs were quietly retitled to describe it as Max-only. No email, no blog post, no changelog. After heavy developer backlash, Head of Growth Amol Avasare called it "a small test on ~2% of new prosumer signups," and Anthropic reverted the public page within hours even as the server-side experiment kept running. OpenAI's Codex lead Thibault Sottiaux jumped in on X to note that Codex "will continue to be available both in the FREE and PLUS ($20) plans." (Simon Willison)
Thursday's postmortem from Anthropic is the document to read. It traces recent "Claude got dumber" reports to three separate regressions that all stacked: a March default-effort downgrade from high to medium on Opus 4.6 in Claude Code (reversed April 7), a March prompt-caching bug that dropped reasoning history every turn after a one-hour idle instead of once (fixed April 10), and an April 16 system prompt addition limiting output to 25 words between tool calls and 100 words in final responses, which cost ~3% on coding evals (reverted April 20). All three fixes shipped by April 20, and usage limits are being reset for all subscribers. It's an unusually candid writeup about how the changes passed reviews and evals. (Anthropic)
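The caching regression is the most instructive of the three. A hypothetical sketch of its shape, assuming the postmortem's description maps to a stale-expiry pattern (illustrative only, not Anthropic's actual code): the intended behavior drops history once per idle period, while the buggy path keeps dropping it because the expiry clock never resets.

```python
# Hypothetical reconstruction of the idle-expiry bug pattern -- not
# Anthropic's implementation. Intended: after a one-hour idle, drop cached
# reasoning history once and rebuild. Buggy: expiry is judged against the
# original timestamp, so every later turn sees an "expired" cache too.

class PromptCache:
    TTL = 3600.0  # one hour, in seconds

    def __init__(self, now: float = 0.0):
        self.stamp = now
        self.history: list[str] = []

    def read(self, now: float, buggy: bool = True) -> list[str]:
        if now - self.stamp > self.TTL:
            self.history = []       # drop reasoning history (by design, once)
            if not buggy:
                self.stamp = now    # the fix: restart the clock after a drop
        return self.history

cache = PromptCache()
cache.history = ["turn-1 reasoning"]
cache.read(now=4000)                  # >1h idle: dropped, as designed
cache.history = ["turn-2 reasoning"]  # rebuilt on the next turn
print(cache.read(now=4010))           # buggy path drops it again: []
```

Bugs of this shape pass evals easily because any single-turn test sees exactly the intended behavior; only multi-turn sessions after an idle gap hit the repeated drop.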
The third story is the Acceptable Use classifier on Opus 4.7, which is refusing legitimate security research, computational structural biology, and even Russian-language prompts. The Register counted 30+ GitHub issues filed in April. The director of LSU's Cyber Center posted that Claude refused to help him edit his own cybersecurity textbook. The stated remediation path is applying to Anthropic's Cyber Verification Program, which in practice favors researchers with a public CVE or conference talk track record, not the early-career researchers most likely to benefit from AI-assisted workflows. (The Register) Fortune's Kemba Walden extends the argument in an op-ed this week: the Mythos Preview's 83% first-try exploit-creation rate exposes gaps in US critical infrastructure defense that under-resourced sectors can't close without public-private investment. (Fortune)
Amazon also put another $5B into Anthropic this week, bringing the total to $13B with up to $20B more tied to milestones, and secured 5 GW of compute capacity via Graviton plus Trainium2 through Trainium4. Project Rainier is expanding on roughly 500K Trainium2 chips. (Anthropic) On the other end of the capital stack, SpaceX announced an option to acquire Cursor for $60B later this year (or pay a $10B termination fee), preempting Cursor's in-progress $2B round at a $50B valuation. SpaceX says it will use its Colossus cluster, described as "a million H100 equivalent," to scale Composer. Cursor still resells Claude and GPT while both vendors compete with it directly, which is why this deal exists now. (TechCrunch)
In Focus
Google's Full-Stack Bet
Google Cloud Next landed alongside the frontier chaos with the most complete stack story of the quarter. The 8th-gen TPU line split in two: TPU 8t for training, scaling to 9,600 chips and 2 PB of shared HBM per superpod at 121 ExaFlops and 3x compute per pod over Ironwood, and TPU 8i for inference, with 288 GB HBM per chip, 384 MB on-chip SRAM, and an 80% better perf-per-dollar claim over the previous generation. Both land later in 2026 on Google's Axion ARM CPU hosts with native JAX, PyTorch, SGLang, and vLLM support, plus bare-metal access. (Google)
Above the silicon, Vertex AI got rebundled as Gemini Enterprise Agent Platform: Agent Designer, a unified Inbox, long-running agents, Projects, and Skills. The Agentic Data Cloud now includes a cross-cloud Lakehouse and a Knowledge Catalog that grounds agents in business context without the usual data engineering overhead. Gemini Embedding 2 shipped GA alongside it with multimodal support across 100+ languages. (Google)

Workspace Intelligence maps the relationships between your emails, chats, files, collaborators, and active projects into a shared knowledge graph that Gemini agents can reason over. "Ask Gemini in Chat" becomes a unified command line for work, and a new Workspace MCP Server exposes the same context layer to outside agents. (Google Workspace)

And Wiz surfaced for the first time as Agentic Defense, with Google citing the drop in initial-compromise-to-secondary-handoff time, from 8 hours to 22 seconds, as the gap the platform is pitched against. (Google)
The line that will get quoted the most came from Sundar Pichai's keynote: 75% of new Google code is AI-generated. That's AI-generated code that engineers then approve, not autonomous merges. Google also demoed a recent complex migration running 6x faster than a year ago, and says its Gemini macOS app went from Swift prototype to shipping release in days. (Google)
In Focus
The Supply Chain Is the Soft Target
The clearest post-incident writeup of the week is EveryDev.ai's reconstruction of the Vercel breach. A February Lumma Stealer infection on a Context.ai employee's laptop (via a Roblox cheat download) led to stolen Google Workspace OAuth tokens, one of which belonged to a Vercel employee who had authorized Context.ai's AI Office Suite with full Workspace scopes. The attacker pivoted into Vercel's internal environments and enumerated non-sensitive env vars, which by default decrypt to plaintext when read internally. Vercel flipped the default to sensitive within 24 hours of disclosure, but older non-sensitive vars stay in the old state until reclassified, and old deployments hold their build-time values until redeployed. (EveryDev.ai)
The pattern keeps showing up. Privacy researcher Alexander Hanff found that Claude Desktop for macOS writes Native Messaging manifest files into Chromium browser paths, including for browsers the user hasn't installed yet, pre-authorizing three Chrome extension IDs to run local binaries. The manifest is rewritten every time Claude Desktop launches, and removing it without uninstalling the app doesn't work. Anthropic hasn't responded publicly. Independent security researcher Noah Kenney pushed back on the "spyware" framing but agreed the technical claims are reproducible and the behavior breaks a widely understood trust boundary. (The Register)
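You can check your own machine for this pattern. A sketch that parses Chrome's documented Native Messaging host manifest format (name, path, allowed_origins); the directory list is an assumption covering common macOS profile locations, and is easy to extend for other Chromium forks:

```python
# Audit sketch: list native messaging hosts registered for Chromium-family
# browsers and the extension IDs each one pre-authorizes to launch a local
# binary. Manifest fields follow Chrome's documented host-manifest format.

import json
from pathlib import Path

DEFAULT_DIRS = [
    "~/Library/Application Support/Google/Chrome/NativeMessagingHosts",
    "~/Library/Application Support/Chromium/NativeMessagingHosts",
    "~/Library/Application Support/Microsoft Edge/NativeMessagingHosts",
]

def audit_native_hosts(dirs=DEFAULT_DIRS):
    """Yield (manifest file, launched binary, authorized extension IDs)."""
    for d in dirs:
        for manifest in sorted(Path(d).expanduser().glob("*.json")):
            data = json.loads(manifest.read_text())
            ids = [o.removeprefix("chrome-extension://").rstrip("/")
                   for o in data.get("allowed_origins", [])]
            yield str(manifest), data.get("path", "?"), ids

for mf, binary, ids in audit_native_hosts():
    print(f"{mf}\n  launches: {binary}\n  authorized: {ids}")
```

Any manifest here whose binary you don't recognize, or that reappears after deletion, is exactly the behavior Hanff flagged.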
On the delivery side, Malwarebytes caught google-antigravity[.]com (lookalike typosquat, served via sponsored search results) shipping a 138 MB installer that installs the real Antigravity IDE, then runs an extra PowerShell step that deploys the Amatera infostealer targeting browser sessions, saved passwords, autofill, and crypto wallets. On Windows it spawns via mshta.exe to look like a trusted Microsoft binary. On macOS it stages through a base64-obfuscated second script. (Malwarebytes) And Pillar Security disclosed a prompt-injection RCE in the actual Antigravity IDE: the find_by_name tool passed user input directly to fd without validation, so injecting the -X (exec-batch) flag turned a file search into arbitrary code execution. The technique bypassed Antigravity's Strict Mode entirely. Disclosed January 7, patched February 28, publicly detailed this week. (The Hacker News)
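The injection shape Pillar describes is easy to reproduce in miniature. A sketch of the generic pattern (Antigravity's actual code isn't public; function names here are illustrative), plus the conventional fix of refusing flag-shaped input before it can reach the CLI's option parser:

```python
# Generic flag-injection shape behind the find_by_name RCE -- illustrative,
# not Antigravity's code. fd's -X/--exec-batch runs a command over matches,
# so a flag smuggled in as the "pattern" turns a search into execution.

def build_fd_argv_vulnerable(user_pattern: str) -> list[str]:
    # "-X ..." arrives where fd expects a search pattern and is parsed
    # as its --exec-batch option instead.
    return ["fd", user_pattern]

def build_fd_argv_fixed(user_pattern: str) -> list[str]:
    # Conventional mitigation: reject flag-shaped input outright, so nothing
    # user-controlled can be read as an option.
    if user_pattern.startswith("-"):
        raise ValueError(f"flag-like pattern rejected: {user_pattern!r}")
    return ["fd", user_pattern]

print(build_fd_argv_vulnerable("-X"))  # ['fd', '-X'] -- parsed as a flag
print(build_fd_argv_fixed("notes"))    # ['fd', 'notes'] -- safe positional
```

Some CLIs also honor a `--` end-of-options separator, which is a stronger fix when the tool's parser supports it; that should be verified against fd's parser before relying on it.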
Signals
Signals from the Edges
Google Labs open-sources DESIGN.md for AI coding agents
Google Stitch published an Apache 2.0 spec on GitHub: a single markdown file at your project root, YAML front matter for tokens plus prose for rationale, read as persistent context by Claude Code, Cursor, Copilot, and Antigravity. Ships with npx @google/design.md lint, which validates token integrity and checks WCAG AA contrast. The AGENTS.md pattern applied to design systems. The community awesome-design-md repo already has reference files for Stripe, Vercel, Figma, and Supabase.
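The contrast half of that lint is reproducible from the WCAG 2.x formulas alone. A sketch using the standard relative-luminance and contrast-ratio definitions (nothing here is DESIGN.md-specific):

```python
# WCAG 2.x contrast check, as a lint step might implement it: linearize sRGB
# channels, compute relative luminance, take the ratio. AA requires 4.5:1
# for normal text and 3:1 for large text.

def _linear(channel: int) -> float:
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)

def contrast_ratio(fg: str, bg: str) -> float:
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)

print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
```

Black on white is the 21:1 maximum; #767676 on white squeaks past AA at about 4.54:1 while the visually similar #777777 fails, which is exactly the kind of near-miss an automated lint catches and eyeballs don't.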
OpenAI open-weights a PII redaction model
Privacy Filter is a 1.5B-parameter model (50M active), derived from gpt-oss, Apache 2.0, 128K context, bidirectional token classifier architecture. 96% F1 on PII-Masking-300k. Runs locally in a browser or on a laptop. Preprocessing infrastructure, not a frontier model.
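For a sense of the task itself, here is a toy regex baseline (no relation to Privacy Filter's classifier): replace matched spans with type tags, the way masking datasets like PII-Masking-300k label them. The token-classifier approach exists precisely because patterns like these miss names, addresses, and anything context-dependent.

```python
# Toy PII-redaction baseline, purely to illustrate the task a token
# classifier is trained for. The patterns are deliberately simplistic.

import re

PATTERNS = {
    "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "PHONE": r"\+?\d[\d\s().-]{7,}\d",
}

def redact(text: str) -> str:
    for tag, pat in PATTERNS.items():
        text = re.sub(pat, f"[{tag}]", text)
    return text

print(redact("Mail ana@example.com or call +1 415 555 0100."))
# Mail [EMAIL] or call [PHONE].
```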
Codex Chronicle adds opt-in screen memory in preview
Sandboxed background agents periodically capture screen content, OCR it, and write markdown memory files to disk, unencrypted. ChatGPT Pro on macOS only, blocked in the EU, UK, and Switzerland. Security researchers immediately compared it to Microsoft's Recall. Burns rate limits quickly and widens prompt-injection exposure from on-screen content.
GitHub Copilot tightens plans and exposes a token meter
Pro, Pro+, and Student signups in GitHub Copilot are paused (Free stays open), Opus is stripped from Pro, Opus 4.7 is Pro+ only at 7.5x request multiplier through April 30. The actual change that matters: a token-based usage meter running alongside the legacy premium-request count. You can have premium requests left over and still get throttled. Cursor walked this road in June 2025 before killing "fast requests" for usage-based credits entirely.
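Why "premium requests left over but still throttled" is possible comes down to two gates ANDed together. A minimal sketch with illustrative numbers (GitHub hasn't published the token meter's limits):

```python
# Minimal model of dual metering: a call goes through only when every
# active meter still has headroom. Caps here are made up for illustration.

def allowed(requests_used: int, request_cap: int,
            tokens_used: int, token_cap: int) -> bool:
    return requests_used < request_cap and tokens_used < token_cap

# 120 of 300 premium requests used, but the token budget is exhausted:
print(allowed(120, 300, 5_000_000, 5_000_000))  # False -- throttled anyway
```

Once the stricter meter is the token one in practice, the request count becomes vestigial, which is the path Cursor already walked.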
Microsoft Teams SDK ships a three-line HTTP adapter
The renamed Teams AI Library now includes ExpressAdapter in TypeScript and FastAPIAdapter in Python that wrap your existing HTTP server and inject a POST /api/messages route. Docs walk through running a Slack Bolt bot and Teams on the same Express server, bridging a LangChain chain, and forwarding messages to an Azure AI Foundry agent.
AWS Interconnect reaches general availability
Layer 3 private connections between AWS VPCs and Google Cloud (Azure and OCI later in 2026) with built-in MACsec encryption over the AWS backbone. Interconnect Last Mile auto-provisions 4 redundant connections across 2 locations with BGP, MACsec, and Jumbo Frames by default. Spec is on GitHub under Apache 2.0.
Microsoft wires Claude Mythos Preview into its SDL toolchain
Developers can run vulnerability detection earlier in the dev cycle. The integration lands during active external scrutiny of Mythos's own release constraints.
Gil Pignol on Meta's $80B metaverse pattern
Medium essay framing Meta's April 8 switch from open-source Llama to proprietary Muse Spark as the same playbook that produced the metaverse writedown: pick a platform layer, pour capex into owning it, pivot when the extraction mechanism fails to materialize. Useful less for the Meta specifics than for the framework about which AI companies have a natural interface between models and revenue, and which ones are spending to manufacture one.
Looking Ahead
What to Watch
1. GPT-5.5 API rollout
OpenAI said "very soon" and gated it behind separate safeguards. The pricing/effective-cost picture changes meaningfully once teams can benchmark the 40% fewer tokens per task claim on their own workloads.
2. DeepSeek's non-NVIDIA chip stack
One frontier model on Huawei Ascend plus Cambricon isn't a trend. If two or three more ship over the next 90 days, the US export-control strategy needs a rewrite, not an amendment.
3. The Anthropic trust tax
Three simultaneous stories (Pro pricing, the quality postmortem, AUP over-refusal) all erode the willingness of developers to stake a year on Claude. Watch whether April 20's fixes hold and whether the Cyber Verification Program actually opens access or formalizes gatekeeping.
4. Copilot's token meter
GitHub is the last major AI coding tool still priced primarily in requests. The meter is the leading indicator of the switch. If Copilot follows Cursor into pure usage-based pricing, the whole "flat monthly sub with soft limits" era ends in one quarter.
5. SpaceX/Cursor closing
A $60B option plus a $10B termination fee is a real number, not a signaling round. Whether it closes (and whether Anysphere keeps reselling Claude and GPT through the transition) will reshape where indie developer tooling sits in the stack.
A year ago the question was which model to use. This quarter it's which stack is worth a year of work, and the hard part isn't the capital or the chips. It's figuring out which lab will still be shipping cleanly in twelve months.