Composer is Cursor's New Fast LLM for Agentic Coding
Cursor dropped their first custom model yesterday. Composer is built into Cursor 2.0. It's a Mixture-of-Experts model trained with reinforcement learning, positioned as the "fast frontier" option for agentic coding. They claim it's 4× faster than similarly capable models and say most tasks finish in under 30 seconds, with generation speed around 250 tokens/sec.
What they're actually saying. This isn't a SOTA play. Sonnet 4.5 and GPT-5 both outperform Composer on their internal benchmark (Cursor Bench). The pitch is more like: "smart enough to be useful, but way faster than the smartest models." They trained it with RL in hundreds of thousands of concurrent sandboxed coding environments, not just on static datasets. During training, the model had access to the same tools you use: semantic search, file editing, terminal commands, grep.
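To make the "RL with tool access" claim concrete, here's a minimal sketch of what one training episode in a sandboxed coding environment could look like. The tool set, the sparse pass/fail reward, and the `policy` stub are all assumptions for illustration, not details from Cursor's actual harness.

```python
# Minimal sketch of one RL episode in a sandboxed coding environment.
# Everything here (tool names, reward shape, the policy stub) is assumed,
# not taken from Cursor's actual training setup.
import subprocess
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str   # e.g. "grep", "edit", "terminal"
    args: dict

def run_tool(call: ToolCall, repo_dir: str) -> str:
    """Execute one tool call inside the sandboxed repo and return its output."""
    if call.name == "grep":
        out = subprocess.run(["grep", "-rn", call.args["pattern"], repo_dir],
                             capture_output=True, text=True)
        return out.stdout
    if call.name == "terminal":
        out = subprocess.run(call.args["cmd"], shell=True, cwd=repo_dir,
                             capture_output=True, text=True)
        return out.stdout + out.stderr
    if call.name == "edit":
        with open(f"{repo_dir}/{call.args['path']}", "w") as f:
            f.write(call.args["content"])
        return "ok"
    raise ValueError(f"unknown tool {call.name}")

def episode(policy, repo_dir: str, max_steps: int = 32) -> float:
    """Roll out one episode; reward is whether the test suite passes at the end."""
    observation = "task: make the failing tests pass"
    for _ in range(max_steps):
        call = policy(observation)   # model proposes the next tool call
        if call is None:             # model decides it is done
            break
        observation = run_tool(call, repo_dir)
    tests = subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo_dir,
                           capture_output=True)
    return 1.0 if tests.returncode == 0 else 0.0   # sparse pass/fail reward
```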
The transparency problem. Cursor has been evasive about whether they trained from scratch or fine-tuned an existing open model. Their researcher (Sasha Rush) was on Hacker News yesterday dodging the question: when directly asked if Composer is a fine-tune, he said, "Our primary focus is on RL post-training." Make of that what you will. For context, their earlier prototype "Cheetah" was rumored to be Grok-based, which they denied.
The speed vs. intelligence split. This is where it gets interesting. On HN, developers are divided into two camps:
- Team Speed: "I know what I want built, I just need it implemented fast. If I can read faster than it writes, that's the bottleneck."
- Team Quality: "Sonnet 4.5 is as low as I'm willing to go. I'd rather wait for the right answer than iterate on garbage."
One user put it bluntly: "Sonnet 4.5 quality is about as low as I'm willing to go. Speed isn't the problem. Wrestling with bad output is." Another countered: "Engineering is the art of 'good enough,' and Composer is clearly good enough but a lot faster."
For the infrastructure nerds. They built custom MXFP8 MoE kernels in CUDA/PTX and used PyTorch + Ray for asynchronous RL at scale. Training ran on thousands of NVIDIA GPUs with hybrid sharded data parallelism, and they adapted their Background Agents infrastructure to schedule VMs around the bursty load of RL runs. They claim this lets them train natively at low precision without post-training quantization, which is why inference is so fast.
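For context on the MXFP8 part: microscaling FP8 stores values in 8-bit floating point with a shared power-of-two scale per small block (typically 32 elements). Below is a rough simulation of just the numerics in plain PyTorch; the real speedup comes from their fused CUDA/PTX kernels, which this doesn't attempt to reproduce.

```python
# Rough simulation of MXFP8-style block quantization in plain PyTorch.
# Illustrates only the numerics (per-32-element power-of-two scales + FP8
# storage); Cursor's actual kernels are fused CUDA/PTX, not this.
import torch

def quantize_mxfp8(x: torch.Tensor, block: int = 32):
    """Quantize a tensor to simulated MXFP8: fp8 values plus one scale per block."""
    x = x.flatten()
    n = x.numel()
    pad = (-n) % block
    x = torch.nn.functional.pad(x, (0, pad)).view(-1, block)
    amax = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    # Shared power-of-two scale per block so the largest value fits e4m3's range (~448).
    scale = torch.pow(2.0, torch.ceil(torch.log2(amax / 448.0)))
    q = (x / scale).to(torch.float8_e4m3fn)
    return q, scale, n

def dequantize_mxfp8(q, scale, n):
    return (q.to(torch.float32) * scale).view(-1)[:n]

x = torch.randn(1000)
q, s, n = quantize_mxfp8(x)
print("max abs error:", (x - dequantize_mxfp8(q, s, n)).abs().max().item())
```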
Cursor 2.0 context. The model is part of a bigger UX overhaul. Cursor 2.0 is agent-centric: you can run up to 8 agents in parallel using git worktrees or remote machines. They've added in-editor browser testing, sandboxed terminals, improved code review, and voice mode. The interface is designed around agents, not files.
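If you want to approximate the parallel-agent layout outside Cursor, the underlying isolation mechanism is plain git worktrees. A hypothetical sketch; `run_agent` is a stand-in for whatever agent or CLI you'd actually launch in each checkout:

```python
# Hypothetical sketch of fanning tasks out across git worktrees, roughly the
# isolation mechanism Cursor 2.0 describes for parallel agents.
import subprocess
from pathlib import Path

def make_worktrees(repo: Path, tasks: list[str]) -> list[Path]:
    """Create one isolated worktree + branch per task so agents can't clobber each other."""
    worktrees = []
    for i, _task in enumerate(tasks):
        path = repo.parent / f"{repo.name}-agent-{i}"
        subprocess.run(
            ["git", "worktree", "add", "-b", f"agent-task-{i}", str(path)],
            cwd=repo, check=True,
        )
        worktrees.append(path)
    return worktrees

def run_agent(worktree: Path, task: str) -> None:
    # Placeholder: launch your agent of choice against `worktree` with `task`.
    print(f"[{worktree.name}] would run agent on: {task}")

if __name__ == "__main__":
    repo = Path(".").resolve()
    tasks = ["fix flaky tests", "migrate class components", "tighten lint config"]
    for wt, task in zip(make_worktrees(repo, tasks), tasks):
        run_agent(wt, task)
    # Clean up later with: git worktree remove <path>
```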
Pricing. Composer costs $1.25 per million input tokens and $10 per million output tokens, the same as GPT-5 and Gemini 2.5 Pro; cache reads are $0.13/M. It's currently included in Pro plans, but several users noted Cursor's pricing has been "constantly changing and confusing."
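Back-of-the-envelope math from those list prices, assuming pure per-token billing and ignoring whatever usage the Pro plan includes:

```python
# Back-of-the-envelope Composer cost from the list prices above.
# Assumes pure per-token billing, ignoring included Pro-plan usage.
INPUT_PER_M, OUTPUT_PER_M, CACHE_READ_PER_M = 1.25, 10.00, 0.13

def composer_cost(input_tokens, output_tokens, cached_input_tokens=0):
    fresh_input = input_tokens - cached_input_tokens
    return (fresh_input * INPUT_PER_M
            + cached_input_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 200k-token context (150k of it cache hits) producing 8k output tokens.
print(f"${composer_cost(200_000, 8_000, cached_input_tokens=150_000):.3f}")  # ≈ $0.162
```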
Benchmark weirdness. Cursor Bench is internal-only, built from real dev requests at Cursor plus hand-curated "optimal" PRs. They won't release it publicly because it'd immediately leak into training sets, as one commenter noted. But that also means you can't independently verify the claims. The charts in their blog post group models into vague categories like "Best Open" and "Fast Frontier" without naming them directly, which rubbed some people the wrong way.
Early feedback. Some devs on HN reported Cursor 2.0 crashing while running agents. Others said Composer feels "fast and solid" but not as thorough as GPT-5 or Sonnet 4.5 at planning. One tester said: "I had already tested other fast models, but with poor quality. Composer is the first one that combines speed and quality." Another: "Auto mode is only good for trivial stuff at this point."
Questions for the crowd:
- What's your speed tolerance? If Composer generates faster than you can read, does that matter? Or is max intelligence always worth the wait?
- Where does it break? What kinds of tasks is Composer reliably good/bad at vs Sonnet/GPT?
- Routing sanity check: Does Auto mode ever pick Composer for you, and does it make sense when it does?
- Test-writing: The RL training emphasized running tests. Does Composer actually write and iterate on tests in your stack?
- Is it training on our code? Several users assume Cursor trains on user data unless you're on an enterprise plan. Anyone have clarity here?
Starter task to try:
"Identify the flakiest unit tests in this repo. Stabilize at least two of them and add regression coverage. Explain your plan first, run tests, iterate until green."
If you test it, share: repo size/language, task description, wall-clock time, which model(s) you used, and how it compared to your baseline.
References:
- Cursor 2.0 announcement: https://cursor.com/blog/2-0
- Composer technical deep-dive: https://cursor.com/blog/composer
- MXFP8 MoE kernels post: https://cursor.com/blog/kernels
- Pricing docs: https://cursor.com/docs/account/pricing
- Simon Willison's take: https://simonwillison.net/2025/Oct/29/cursor-composer/
- VentureBeat coverage: https://venturebeat.com/ai/vibe-coding-platform-cursor-releases-first-in-house-llm-composer-promising
- Hacker News discussion: https://news.ycombinator.com/item?id=45748725
- Developer site: https://anysphere.inc
- Twitter/X: https://x.com/cursor_ai
Been testing Composer vs Sonnet 4.5 on a Next.js refactor. The speed part is def real, Composer is basically instant for mid-sized stuff. Sonnet takes way longer but gives way cleaner output. Composer struggles with complex state stuff: asked it to refactor a Redux slice and it kept adding bugs, Sonnet nailed it first try. But for grunt work like converting class components or fixing linter errors it's perfect, fast enough that I don't get distracted.
Trying a workflow now: Composer for quick iterations, Sonnet when I actually need it to think. The speed difference is real, but so is the intelligence gap. Anyone else getting crashes when running multiple agents, or is it just me?