Joe Seifi's avatar

AI Dev News Digest - Oct 10th, 2025

By Joe Seifi 0 comments • about 6 hours ago
1760196530234-kx44xm

OpenAI's going all-in on what I'm calling "vibe-coded agents". They dropped AgentKit for building function-calling agents that actually do stuff, plus you can now ship mini-apps inside ChatGPT instead of just prompts. Codex is back, there's a Batch API for queuing heavy workloads, and the whole push feels less like demos and more like "let's build real products." Google's matching the energy with CodeGemma 2 for better code generation and Gemini 2.5's computer-use agents that can click and type through apps via API. DeepMind's auto-patching open-source bugs with CodeMender, and Anthropic just made Claude Code way more flexible with plugins while also naming Stripe's ex-CTO as their new CTO.

The ecosystem's moving fast. GitHub Copilot CLI got faster, Together AI's new speculative system is claiming serious inference speedups, and workflow automation tools are getting massive funding rounds. Zendesk says their AI agent can knock out most support tickets autonomously, and GitHub patched a Copilot Chat vuln that could've leaked secrets via prompt injection. If you want the full picture, the State of AI Report just dropped with charts and takeaways on where we're actually at. Curious how you're using any of this stuff?

OpenAI: Apps, Agents, APIs

  • AgentKit for function-calling agents. A lightweight toolkit for tool use, memory, and routing so you can assemble safe, task-oriented agents. (OpenAI)
  • Apps in ChatGPT + Apps SDK. Build mini-apps with UI, permissions, and actions that run inside ChatGPT; ship workflows instead of prompts. (OpenAI)
  • Codex (GA). OpenAI re-introduces a code-focused model for generation, refactors, and explanations across languages and IDEs. (OpenAI)
  • OpenAI’s broader dev push. New models and features target real product work, not toy demos; the theme is “agents with grounding.” (TechCrunch)
  • Batch API for async jobs. Queue large workloads, lower cost, and avoid rate-limit pain when running long AI tasks. (OpenAI)

Google & DeepMind: Coding + Computer Use

  • CodeGemma 2. Next-gen code models with better pass@1, explain/repair, and tight toolchain hooks for debugging. (Google AI Blog)
  • Gemini 2.5 Computer Use. Programmable, permissioned “computer use” agents that can click, type, and navigate apps, usable via API. (Google Blog)
  • DeepMind’s CodeMender. An auto-patching agent that scans OSS and drafts fixes; humans still approve merges. (DeepMind)

Claude Code & Anthropic

  • Claude Code adds plugins for easy customization Bundle slash commands, sub-agents, MCP servers, and hooks; install with /plugin and share via marketplaces (public beta). (Anthropic)
  • Anthropic names Rahul Patil CTO former Stripe CTO; he’ll lead product, compute, infra, inference, data science, and security as Sam McCandlish shifts to Chief Architect. (Anthropic)

GitHub & Dev Tooling

  • Copilot CLI update. 15% faster, cleaner output, and a simpler flow for common commands. (GitHub Changelog)

Infra & Self-Hosting

  • Together AI debuts ATLAS, an adaptive-learning speculator for LLM inference that adapts to live workloads to boost acceptance rates and throughput, with reported speedups up to 4× (≈501 TPS on DeepSeek-V3.1, ≈460 TPS on Kimi-K2). (Together)
  • CoreWeave dev tools. New offerings to make standing up inference and workloads on their GPU cloud simpler for teams. (CoreWeave)
  • AI21 tiny model (open source). A small, efficient LLM for edge and constrained environments. (Ai21)

Support & Ops Automation

  • Zendesk autonomous agent. Claims big ticket resolution rates by letting an AI handle end-to-end support flows with guardrails. (TechCrunch)

Security & Policy

  • Copilot Chat vuln patched. Researchers showed prompt-injection + CSP bypass (“CamoLeak”) could leak secrets; GitHub disabled risky paths and shipped fixes. (The Register)
  • Disrupting malicious AI use. OpenAI details takedowns, red-team learnings, and coordination with platforms and law enforcement. Useful reading for abuse defenses. (OpenAI)

Funding & Ecosystem

  • n8n raises $180M at $2.5B. Workflow/agent orchestration gets serious backing from Accel and NVentures. (n8n)
  • Reflection AI raises $2B at $8B (report). Outlet reports a massive round for an open-source model lab aiming at “superintelligent” systems. (ReflectionAI)

Research & Weekend Reading

  • State of AI Report 2025. Annual snapshot on compute, model quality, safety, and industry shifts. Good charts; sober takeaways. (State of AI)
  • Personalized long-term LLM interactions. Paper proposes memory and preference systems so LLM agents stay useful over weeks, not minutes. (arXiv)

Sign in to join the discussion.

No comments yet. Be the first to reply!