SubQ is a sub-quadratic LLM built for 12M-token reasoning, enabling agents to work across full repositories, long histories, and persistent state at one-fifth the cost of leading LLMs.
At a Glance
About SubQ
SubQ is the first large language model built on a fully sub-quadratic sparse-attention architecture, designed specifically for long-context tasks at scale. Unlike standard transformers, which compute every pairwise token relationship at O(n²) cost, SubQ operates at O(n) by attending only to the relationships that matter, reducing attention compute by nearly 1,000× at 12M tokens. It delivers 12M-token reasoning at 150 tokens per second, at one-fifth the cost of other leading LLMs, with no loss in quality. SubQ is available as a developer API and as an integration layer for coding agents.
- 12M Token Context Window — Process entire codebases, months of pull request history, and long-running agent state in a single prompt.
- Sub-Quadratic Architecture — Built on Sparse Structured Attention (SSA), SubQ scales linearly rather than quadratically, making long-context inference practical and affordable.
- OpenAI-Compatible API — Drop-in API endpoints with streaming and tool use support, making integration straightforward for existing developer workflows.
- SubQ Code Integration — A long-context layer for coding agents that plugs into Claude Code, Codex, and Cursor, delivering ~25% lower bills and 10× faster codebase exploration via a one-line install.
- Auto-Redirect for Expensive Turns — SubQ Code automatically redirects token-heavy questions away from expensive frontier models, optimizing cost without changing agent behavior.
- Benchmark-Validated Performance — SubQ 1M-Preview achieves 81.8% on SWE-Bench Verified and 95.0% on RULER @ 128K, with third-party-validated results.
- Enterprise API Access — Full-context API for enterprise teams to process full repositories and pipeline states in a single call at linear cost.
- Research-Driven Team — Built by researchers from Meta, Google, Oxford, Cambridge, and BYU, pushing foundational change at the model architecture level.
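The "nearly 1,000×" figure above is consistent with a simple back-of-envelope model in which each token attends to a fixed budget of tokens rather than all of them. A minimal sketch (the window size `w` is an assumption for illustration; SubQ's actual SSA sparsity pattern is not described here):

```python
# Back-of-envelope comparison of dense O(n^2) attention vs. a sparse
# O(n) scheme. The window size w is a hypothetical parameter chosen
# only to illustrate the scaling, not SubQ's real configuration.

def dense_attention_ops(n: int) -> int:
    """Pairwise score computations for full attention."""
    return n * n

def sparse_attention_ops(n: int, w: int = 12_000) -> int:
    """Each token attends to ~w selected tokens instead of all n."""
    return n * w

n = 12_000_000  # 12M-token context
ratio = dense_attention_ops(n) / sparse_attention_ops(n)
print(f"compute reduction at {n:,} tokens: {ratio:,.0f}x")  # 1,000x
```

Under this model the reduction factor is simply n/w, so it grows with context length: the longer the prompt, the larger the saving relative to dense attention.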
Pricing
API
Full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.
- 12M token context window
- Streaming + tool use
- OpenAI-compatible endpoints
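Because the endpoints are OpenAI-compatible, an existing client can be pointed at SubQ's base URL with no other changes. A minimal sketch of a streaming chat-completion request using only the Python standard library (the base URL and model id are placeholders, not documented values; check SubQ's API docs for the real ones):

```python
import json
import urllib.request

# Hypothetical base URL and model id, for illustration only.
BASE_URL = "https://api.subq.example/v1"

# Payload follows the OpenAI Chat Completions request shape,
# which SubQ's endpoints advertise compatibility with.
payload = {
    "model": "subq-1m-preview",  # placeholder model id
    "stream": True,              # streaming is supported
    "messages": [
        {"role": "system", "content": "You are a code-navigation agent."},
        {"role": "user", "content": "Summarize the call graph of this repo."},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $SUBQ_API_KEY",  # substitute a real key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; omitted here
# because the URL above is a placeholder.
```

In practice the same payload works through any OpenAI-compatible SDK by overriding its base URL, which is what makes the API a drop-in swap for existing workflows.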
SubQ Code
Long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases and answer token-heavy questions faster.
- ~25% lower bill
- 10× faster codebase exploration
- Auto-redirects expensive model turns
- One-line install
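The auto-redirect behavior above can be pictured as a simple routing policy: estimate a turn's token weight and send heavy turns to the long-context model while short turns stay on the agent's default model. A hypothetical sketch (the threshold and model ids are invented for illustration; SubQ Code's actual routing logic is not public):

```python
def route_turn(prompt_tokens: int, heavy_threshold: int = 100_000) -> str:
    """Route token-heavy turns to the cheap long-context model,
    leaving short turns on the frontier model unchanged."""
    if prompt_tokens > heavy_threshold:
        return "subq-long-context"  # hypothetical model id
    return "frontier-model"         # hypothetical model id

# A repo-wide 2M-token question goes to the linear-cost model;
# a short follow-up stays on the default model.
print(route_turn(2_000_000))  # subq-long-context
print(route_turn(500))        # frontier-model
```

Because only the model selection changes, the agent's prompts and tool calls are untouched, which is what allows cost optimization "without changing agent behavior."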
Capabilities
Key Features
- 12M token context window
- Sub-quadratic sparse attention architecture
- O(n) linear scaling vs O(n²) transformer
- 150 tokens per second inference speed
- 1/5 the cost of leading LLMs
- OpenAI-compatible API endpoints
- Streaming and tool use support
- SubQ Code for coding agent integration
- Auto-redirect for expensive model turns
- One-line install for coding agents
- SWE-Bench Verified 81.8% score
- RULER @ 128K 95.0% score
- Third-party validated benchmarks
