AI Dev News Digest - Dec 12th, 2025

By Joe Seifi
GPT-5.2, MCP, and Disney at the Agentic AI Foundation ribbon-cutting ceremony

OpenAI declared a “code red” after Gemini 3 topped the LMArena benchmarks, then this week, less than two weeks later, rolled out GPT-5.2. That is the headline, but the bigger story happened on December 9th.

Anthropic, OpenAI, and Block, companies that normally compete, jointly announced the Agentic AI Foundation, with all three logos on the press release. MCP joined the Linux Foundation. OpenAI contributed AGENTS.md. Block gave Goose to the project. Seeing these rivals cooperate feels like watching cats share a bowl of water.

Other news: Disney invested a billion dollars in OpenAI and licensed Mickey Mouse to Sora. Claude Code appeared in Slack. The Pentagon plans to deploy Gemini to 3 million people by Friday. Harness raised $240 million on the pitch that “the money comes after the code is written.” It’s been one of those weeks when you scroll through the news and everything shifts.

The race for foundation models is still alive, but the real battle now is who controls the infrastructure that powers agents.

Foundation Models

  • OpenAI releases GPT-5.2. The "code red" response to Gemini 3 is here. Three flavors: Instant (speed), Thinking (reasoning/coding), and Pro (max accuracy). Knowledge cutoff moved to August 2025. Better at long context, tool use, and vision. Already in GitHub Copilot. (OpenAI)
  • GPT-5.2 available in GitHub Copilot. You can now select GPT-5.2 in VS Code, github.com, GitHub Mobile, and Copilot CLI. Admins need to enable it for Enterprise/Business plans. (GitHub)
  • OpenAI warns future models pose "high" cybersecurity risk. GPT-5.1-Codex-Max scored 76% on capture-the-flag exercises (up from 27% with GPT-5). They're establishing a Frontier Risk Council and testing Aardvark, a tool for finding security vulnerabilities. (Axios)
  • Google releases Deep Research API for developers. The reimagined Gemini Deep Research agent is now accessible via the Interactions API. Developers can embed autonomous research capabilities into their apps. Google also open-sourced DeepSearchQA, a benchmark for complex web research with 900 multi-step tasks. Pricing: $2 per million input tokens. (Google)
  • Mistral updates its Devstral coding-model family with Devstral 2 and Devstral Small 2. Devstral 2, the flagship large model, hits 72.2% on SWE-Bench Verified and is up to 7× cheaper than Claude Sonnet. Devstral Small 2 (24B) targets local and edge use and runs on consumer GPUs. Both models shipped on Dec 9, 2025, alongside Mistral Vibe CLI, a terminal-first agentic coding tool built on Devstral. (EveryDev)
  • Agentic AI Foundation launches under Linux Foundation. OpenAI, Anthropic, and Block co-founded it. Anthropic donated MCP, OpenAI contributed AGENTS.md, Block gave Goose. Google, Microsoft, AWS, Bloomberg, and Cloudflare are platinum members. This is the "USB-C for AI agents" moment. (Linux Foundation)
  • Anthropic donates MCP to Linux Foundation. One year after launch, MCP has 10,000+ active public servers, adoption by ChatGPT, Cursor, Gemini, Copilot, and VS Code, plus 97M+ monthly SDK downloads. Governance stays community-driven. (Anthropic)
  • GitHub MCP Server gets tool-specific config and Lockdown mode. You can now enable only the tools you need via X-MCP-Tools header for 60-90% context window savings. Lockdown mode restricts content from untrusted contributors in public repos. Migrated to official MCP Go SDK. (GitHub)
  • Google launches managed MCP servers. Google Cloud now offers fully managed, remote MCP servers for Maps, BigQuery, Compute Engine, and Kubernetes Engine. Just paste a URL instead of spending a week configuring connectors. Works with Gemini, Claude, and ChatGPT as clients. (TechCrunch)
  • NVIDIA and Lakera release agentic AI security framework. New taxonomy maps component risks to system harms for autonomous agents. Includes Agent Red Teaming via Probes methodology and a dataset with 10K+ traces from attack/defense runs. Covers memory poisoning, tool misuse, and privilege compromise. (Help Net Security)
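
For anyone curious what the X-MCP-Tools item above might look like in practice, here is a minimal sketch that builds a client config for a remote MCP server with only a few tools enabled. Everything beyond the header name is an assumption for illustration: the server URL, the "servers" layout (modeled on a VS Code style mcp.json), the comma-separated header value, and the tool names are all placeholders, so check GitHub's documentation before relying on any of them.

```python
import json


def github_mcp_server_config(tools: list[str]) -> dict:
    """Build a hypothetical MCP client config that enables only `tools`.

    Assumptions (not confirmed by the digest): the server URL, the
    "servers" layout, and the comma-separated format of the header value.
    """
    if not tools:
        raise ValueError("pass at least one tool name")
    return {
        "servers": {
            "github": {
                "type": "http",
                # Assumed remote endpoint; replace with the documented URL.
                "url": "https://api.githubcopilot.com/mcp/",
                # Only the listed tools are requested, which is where the
                # context window savings would come from.
                "headers": {"X-MCP-Tools": ",".join(tools)},
            }
        }
    }


# Hypothetical tool names, used purely for illustration.
config = github_mcp_server_config(["get_file_contents", "create_pull_request"])
print(json.dumps(config, indent=2))
```

The idea is simply that fewer advertised tools means fewer tool descriptions loaded into the model's context.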

AI Coding Tools

  • Cursor 2.2 ships Debug Mode, Plan Mode diagrams, and multi-agent judging. Debug Mode instruments your code with runtime logs, spins up servers to capture execution data, and feeds it back to the agent. Plan Mode now generates inline Mermaid diagrams. When running parallel agents, Cursor automatically judges which solution is best. (Cursor)
  • VS Code 1.107 unifies agent sessions and adds Agent HQ. Agent Sessions are now integrated into the Chat view instead of a standalone panel. Local agents keep running when you close chat. Background agents work in isolated Git worktrees. "Continue in…" lets you hand off a local session to cloud. Basically a mission control for multi-agent workflows. (VS Code)
  • LangChain ships Polly and LangSmith Fetch CLI for agent debugging. Polly is an AI agent built into LangSmith that debugs your agents—analyzes traces with hundreds of steps, spots patterns across conversations, and engineers better prompts. LangSmith Fetch is a CLI that pulls traces directly into your terminal for Claude Code or Cursor to analyze. (LangChain, LangChain)
  • Artificial Analysis releases Stirrup agent harness and GDPval-AA leaderboard. Stirrup is a lightweight, open-source framework for building agents that "gets out of the way and lets the model choose its approach." GDPval-AA is their leaderboard for OpenAI's GDPval benchmark. Models get shell access and web browsing in an agentic loop, with Elo ratings from blind pairwise comparisons. (GitHub, Artificial Analysis)
  • Claude Code coming to Slack. Beta feature lets devs tag @Claude in Slack to spin up complete coding sessions. It reads channel context for bug reports or feature requests, posts progress in threads, and opens PRs. Workflow-first, not just snippets. (TechCrunch)
  • Claude Code updates: async agents, named sessions, stats. You can now run agents asynchronously while you work, name sessions with /rename, and see usage stats with /stats. Added .claude/rules/ support and image dimension metadata for coordinate mappings. (Anthropic)
  • Kilo Code raises $8M seed. GitLab co-founder Sid Sijbrandij's open-source coding agent hits 750K downloads and #1 on OpenRouter. Positions itself as the Cursor alternative with 500+ model options, no rate limits, and transparent pricing. Cota Capital led. (CNBC)
  • Black Duck launches Signal for AI-powered application security. Agentic AI solution uses MCP services to find, prioritize, and fix vulnerabilities across source code, binaries, and supply chain. Integrates directly with GitHub Copilot, Cursor, Claude Code, and Gemini. Language-agnostic, trained on 20 years of security data. (Black Duck)
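
The GDPval-AA item above mentions Elo ratings from blind pairwise comparisons. As a rough illustration of how such ratings move, here is the textbook Elo update for a single comparison. This is the standard formula, not necessarily how Artificial Analysis computes its leaderboard; the K-factor of 32 and the starting ratings are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return new (A, B) ratings after one pairwise comparison.

    score_a is 1.0 if A's output was preferred, 0.0 if B's was,
    and 0.5 for a tie. k is an assumed K-factor.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating total stays constant.
    return rating_a + delta, rating_b - delta


# Two models start level; a grader prefers model A's output.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)
```

An upset win over a higher-rated model moves ratings more than an expected win, which is why a handful of blind comparisons can reshuffle a leaderboard early on.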

Enterprise & Partnerships

  • Accenture and Anthropic launch multi-year partnership. Accenture is forming the Accenture Anthropic Business Group with ~30,000 professionals trained on Claude. Claude Code goes to tens of thousands of Accenture devs. First joint offering helps CIOs measure and scale AI-powered software development. (Accenture)
  • OpenAI partners with Deutsche Telekom. Multi-year deal to bring AI to 261 million customers across Europe. Deutsche Telekom gets early access to alpha-phase models. ChatGPT Enterprise rolling out internally. Pilots start Q1 2026. (Deutsche Telekom)
  • US DoD selects Google Gemini for GenAI.mil. 3 million military and civilian personnel will have Gemini for Government on their desktops by end of week. First AI deployed on the new GenAI.mil platform. IL5 authorized for sensitive data. (Google Cloud)
  • Disney invests $1B in OpenAI, opens Sora to its characters. Three-year licensing deal gives Sora and ChatGPT Images access to 200+ Disney, Marvel, Pixar, and Star Wars characters starting 2026. Mickey Mouse, Iron Man, Darth Vader all included. No talent likeness or voices. (CNBC)
  • Cursor CEO says no IPO planned, reveals 80% internal ticket automation. Michael Truell at Fortune Brainstorm AI: Cursor hit $1B ARR, has 300+ employees, and automated 80% of internal support tickets with AI. Focus is on serving teams as the "atomic unit" and expanding beyond code writing. (TechCrunch)
  • Slack CEO joins OpenAI as first Chief Revenue Officer. Denise Dresser leaves Salesforce after 14+ years to lead OpenAI's global revenue strategy. The hire signals OpenAI is serious about profitability as it manages multibillion-dollar losses and $1 trillion in financial commitments. (Fortune)
  • US Department of Transportation deploys Salesforce Agentforce. AI agents will provide 24/7 citizen support, analyze traffic and safety data for real-time alerts, and review grant applications. USDOT is unifying operations on the Agentforce 360 Platform to manage billions in federal infrastructure grants. (Morningstar)
  • Dell in talks to acquire Israeli AI startup Dataloop. Potential deal would deepen Dell's push into enterprise AI infrastructure. Dataloop's platform handles labeling, managing, and processing unstructured data for training AI models. Company has raised ~$50M from NGP Capital, Alpha Wave Global, and others. (Calcalist)

Funding & Startups

  • Harness raises $240M Series E at $5.5B valuation. Goldman Sachs led. Harness AI automates the "after-code" phase—testing, deployment, security, compliance—using AI agents and a Software Delivery Knowledge Graph. On track for $250M ARR. (TechCrunch)
  • Unconventional AI raises $475M seed at $4.5B valuation. Former Databricks AI chief Naveen Rao's new startup is building biologically-inspired, energy-efficient AI computers. Largest seed round in AI history. (TechStartups)
  • Fal raises $140M for real-time generative AI infrastructure. Platform runs open-source or proprietary AI models at ultra-low latency. NVIDIA, Kleiner Perkins, a16z participated. Third raise in 2025. (TechStartups)
  • Port raises $100M at $800M valuation. Israeli startup takes on Spotify's Backstage with a managed internal developer portal. Now pivoting to agentic engineering platform with AI agent orchestration and governance. Customers include GitHub, British Telecom, LG. General Atlantic led. (TechCrunch)
  • Emergent gets Google AI Futures Fund investment. Vibe-coding platform from Dunzo co-founder hits 2.5M users and $25M ARR in five months. Lets non-technical founders build full-stack apps with AI agents. Google provides model access and infrastructure support. (Business Standard)
  • a16z Speedrun invests in Loops AI. Istanbul-based startup building autonomous AI agents for e-commerce gets backing from Andreessen Horowitz's early-stage program. Claims 20x higher engagement and 70% sales uplift per visitor through "agentic commerce" that handles sales, service, and marketing autonomously. (Yahoo Finance)

Developer Experience

  • Google Labs unveils Disco browser with GenTabs. Experimental AI browser turns your open tabs into custom interactive web apps using Gemini 3. Describe what you need in natural language and it generates trip planners, meal plans, or study tools. Links back to original sources. macOS waitlist open. (Google)
  • Instacart launches in-ChatGPT shopping with Instant Checkout. First app to offer embedded checkout in ChatGPT via the Agentic Commerce Protocol. Go from "help me shop for apple pie ingredients" to doorstep delivery without leaving the chat. (OpenAI)
  • OpenAI launches first Certifications courses. Now you can get officially certified on OpenAI tools. (OpenAI)
  • OpenAI enterprise usage up 8x year-over-year. Custom GPT usage jumped 19x. Reasoning token consumption up 320x. Heavy users save 10+ hours per week. But most enterprise users still aren't using advanced features like data analysis or search. (OpenAI)
  • OpenAI and Jony Ive's AI device takes shape. Reuters reports the device will be "always present, always sensing and listening" with visible signals when paying attention. OpenAI's compact Mini models could enable meaningful on-device AI. No screen, designed to feel like "sitting in a cabin by a lake" versus the Times Square chaos of smartphones. (MacDailyNews)
  • System76 releases Pop!_OS 24.04 LTS with COSMIC desktop. After three years of development, the Rust-based COSMIC desktop environment hits general availability. Features modular and composable design, intuitive window tiling, workspace management, and native apps. Replaces GNOME apps with COSMIC Files, Terminal, Text Editor, and Store. Runs on Linux 6.17 kernel. (System76)

Industry News

  • Time names "Architects of AI" as 2025 Person of the Year. Recognition that AI stopped being about the future and "roared into the present." Cover features Sam Altman, Dario Amodei, Jensen Huang, and other AI leaders in a nod to the famous "Lunch atop a Skyscraper" photo. (Time)
  • OpenAI expects to exit "code red" after GPT-5.2 launch. Sam Altman says the company expects to be out of crisis mode by January. It declared "code red" after Gemini 3 topped LMArena benchmarks. Fidji Simo says the model was months in the works, not a rushed response. (CNBC)
