SubQ is a sub-quadratic LLM built for 12M-token reasoning, enabling agents to work across full repositories, long histories, and persistent state at one-fifth the cost of leading LLMs.
At a Glance
About SubQ
SubQ is the first large language model built on a fully sub-quadratic sparse-attention architecture, designed specifically for long-context tasks at scale. Unlike standard transformers, which compute every pairwise token relationship at O(n²) cost, SubQ operates at O(n) by attending only to the relationships that matter, reducing attention compute by nearly 1,000× at 12M tokens. It delivers 12M-token reasoning at 150 tokens per second, at one-fifth the cost of other leading LLMs, with no loss in quality. SubQ is available as a developer API and as an integration layer for coding agents.
- 12M Token Context Window — Process entire codebases, months of pull request history, and long-running agent state in a single prompt.
- Sub-Quadratic Architecture — Built on Sparse Structured Attention (SSA), SubQ scales linearly rather than quadratically, making long-context inference practical and affordable.
- OpenAI-Compatible API — Drop-in API endpoints with streaming and tool use support, making integration straightforward for existing developer workflows.
- SubQ Code Integration — A long-context layer for coding agents that plugs into Claude Code, Codex, and Cursor, delivering ~25% lower bills and 10× faster codebase exploration via a one-line install.
- Auto-Redirect for Expensive Turns — SubQ Code automatically redirects token-heavy questions away from expensive frontier models, optimizing cost without changing agent behavior.
- Benchmark-Validated Performance — SubQ 1M-Preview achieves 81.8% on SWE-Bench Verified and 95.0% on RULER @ 128K, with third-party-validated results.
- Enterprise API Access — Full-context API for enterprise teams to process full repositories and pipeline states in a single call at linear cost.
- Research-Driven Team — Built by researchers from Meta, Google, Oxford, Cambridge, and BYU, pushing foundational change at the model architecture level.
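The "nearly 1,000×" figure above is consistent with a simple back-of-envelope model in which each token attends to a fixed budget of tokens rather than all of them. A minimal sketch (the window size `w` is an assumption for illustration; SubQ's actual SSA sparsity pattern is not described here):

```python
# Back-of-envelope comparison of dense O(n^2) attention vs. a sparse
# O(n) scheme. The window size w is a hypothetical parameter chosen
# only to illustrate the scaling, not SubQ's real configuration.

def dense_attention_ops(n: int) -> int:
    """Pairwise score computations for full attention."""
    return n * n

def sparse_attention_ops(n: int, w: int = 12_000) -> int:
    """Each token attends to ~w selected tokens instead of all n."""
    return n * w

n = 12_000_000  # 12M-token context
ratio = dense_attention_ops(n) / sparse_attention_ops(n)
print(f"compute reduction at {n:,} tokens: {ratio:,.0f}x")  # 1,000x
```

Under this model the reduction factor is simply n/w, so it grows with context length: the longer the prompt, the larger the saving relative to dense attention.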
Pricing
API
Full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost.
- 12M token context window
- Streaming + tool use
- OpenAI-compatible endpoints
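Because the endpoints are OpenAI-compatible, an existing client can be pointed at SubQ's base URL with no other changes. A minimal sketch of a streaming chat-completion request using only the Python standard library (the base URL and model id are placeholders, not documented values; check SubQ's API docs for the real ones):

```python
import json
import urllib.request

# Hypothetical base URL and model id, for illustration only.
BASE_URL = "https://api.subq.example/v1"

# Payload follows the OpenAI Chat Completions request shape,
# which SubQ's endpoints advertise compatibility with.
payload = {
    "model": "subq-1m-preview",  # placeholder model id
    "stream": True,              # streaming is supported
    "messages": [
        {"role": "system", "content": "You are a code-navigation agent."},
        {"role": "user", "content": "Summarize the call graph of this repo."},
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $SUBQ_API_KEY",  # substitute a real key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send the request; omitted here
# because the URL above is a placeholder.
```

In practice the same payload works through any OpenAI-compatible SDK by overriding its base URL, which is what makes the API a drop-in swap for existing workflows.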
SubQ Code
Long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases and answer token-heavy questions faster.
- ~25% lower bill
- 10× faster codebase exploration
- Auto-redirects expensive model turns
- One-line install
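The auto-redirect behavior above can be pictured as a simple routing policy: estimate a turn's token weight and send heavy turns to the long-context model while short turns stay on the agent's default model. A hypothetical sketch (the threshold and model ids are invented for illustration; SubQ Code's actual routing logic is not public):

```python
def route_turn(prompt_tokens: int, heavy_threshold: int = 100_000) -> str:
    """Route token-heavy turns to the cheap long-context model,
    leaving short turns on the frontier model unchanged."""
    if prompt_tokens > heavy_threshold:
        return "subq-long-context"  # hypothetical model id
    return "frontier-model"         # hypothetical model id

# A repo-wide 2M-token question goes to the linear-cost model;
# a short follow-up stays on the default model.
print(route_turn(2_000_000))  # subq-long-context
print(route_turn(500))        # frontier-model
```

Because only the model selection changes, the agent's prompts and tool calls are untouched, which is what allows cost optimization "without changing agent behavior."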
Capabilities
Key Features
- 12M token context window
- Sub-quadratic sparse attention architecture
- O(n) linear scaling vs O(n²) transformer
- 150 tokens per second inference speed
- 1/5 the cost of leading LLMs
- OpenAI-compatible API endpoints
- Streaming and tool use support
- SubQ Code for coding agent integration
- Auto-redirect for expensive model turns
- One-line install for coding agents
- SWE-Bench Verified 81.8% score
- RULER @ 128K 95.0% score
- Third-party validated benchmarks
