# SubQ

> SubQ is a sub-quadratic LLM built for 12M-token reasoning, enabling agents to work across full repositories, long histories, and persistent state at one-fifth the cost of leading LLMs.

SubQ is the first large language model built on a fully sub-quadratic sparse-attention architecture, designed specifically for long-context tasks at scale. Unlike transformer-based models that process every possible token relationship at O(n²) complexity, SubQ operates at O(n) by focusing only on the relationships that matter — reducing attention compute by nearly 1,000× at 12M tokens. It delivers 12M-token reasoning at 150 tokens per second and one-fifth the cost of other leading LLMs, with no quality loss. SubQ is available as a developer API and as a coding agent integration layer.
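To make the scaling difference concrete, here is a back-of-the-envelope sketch (not SubQ's actual kernel: the per-token budget `k` below is an arbitrary placeholder, not SubQ's real sparsity pattern) comparing how many attention score pairs dense and linear attention compute as context grows:

```python
def dense_attention_pairs(n: int) -> int:
    """Dense self-attention scores every token against every token: O(n^2)."""
    return n * n

def linear_attention_pairs(n: int, k: int = 1024) -> int:
    """A linear-scaling scheme bounds each token to k scored positions: O(n).

    k is a made-up placeholder here, not SubQ's documented sparsity budget.
    """
    return n * k

if __name__ == "__main__":
    n = 12_000_000  # 12M-token context
    print(f"dense pairs:  {dense_attention_pairs(n):.3e}")
    print(f"linear pairs: {linear_attention_pairs(n):.3e}")
```

Doubling the context doubles the linear cost but quadruples the dense cost, which is why quadratic attention becomes impractical long before 12M tokens.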

- **12M Token Context Window** — *Process entire codebases, months of pull request history, and long-running agent state in a single prompt.*
- **Sub-Quadratic Architecture** — *Built on Sparse Structured Attention (SSA), SubQ scales linearly rather than quadratically, making long-context inference practical and affordable.*
- **OpenAI-Compatible API** — *Drop-in API endpoints with streaming and tool use support, making integration straightforward for existing developer workflows.*
- **SubQ Code Integration** — *A long-context layer for coding agents that plugs into Claude Code, Codex, and Cursor, delivering ~25% lower bills and 10× faster codebase exploration via a one-line install.*
- **Auto-Redirect for Expensive Turns** — *SubQ Code automatically redirects token-heavy questions away from expensive frontier models, optimizing cost without changing agent behavior.*
- **Benchmark-Validated Performance** — *SubQ 1M-Preview achieves 81.8% on SWE-Bench Verified and 95.0% on RULER @ 128K, with results third-party validated.*
- **Enterprise API Access** — *Full-context API for enterprise teams to process full repositories and pipeline states in a single call at linear cost.*
- **Research-Driven Team** — *Built by researchers from Meta, Google, Oxford, Cambridge, and BYU, pushing foundational change at the model architecture level.*
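Because the API is OpenAI-compatible, an existing OpenAI-style request should work unchanged. A minimal sketch of building such a request, assuming a hypothetical base URL and model id (neither is documented here, so treat both as placeholders):

```python
import json

# Placeholder values: the base URL and model id below are assumptions,
# not published SubQ identifiers.
BASE_URL = "https://api.subq.ai/v1"

def build_chat_request(prompt: str, stream: bool = True) -> str:
    """Build an OpenAI-style /chat/completions request body as JSON."""
    payload = {
        "model": "subq-1m-preview",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # SubQ advertises streaming support
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the open pull requests in this repo.")
```

With an OpenAI-compatible endpoint, existing SDK call sites should only need the base URL and API key swapped; the request and response shapes stay the same.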

## Features
- 12M token context window
- Sub-quadratic sparse attention architecture
- O(n) linear scaling vs. O(n²) transformer attention
- 150 tokens per second inference speed
- 1/5 cost of leading LLMs
- OpenAI-compatible API endpoints
- Streaming and tool use support
- SubQ Code for coding agent integration
- Auto-redirect for expensive model turns
- One-line install for coding agents
- SWE-Bench Verified 81.8% score
- RULER @ 128K 95.0% score
- Third-party validated benchmarks

## Integrations
Claude Code, Codex, Cursor

## Platforms
API

## Pricing
Paid

## Version
SubQ 1M-Preview

## Links
- Website: https://subq.ai
- Early access: https://subq.ai/request-early-access
- EveryDev.ai: https://www.everydev.ai/tools/subq
