Inception Labs
Diffusion-based large language models that generate tokens in parallel, delivering 5x faster inference with best-in-class quality at lower cost.
At a Glance
- Pricing: Paid
About Inception Labs
Inception Labs builds and deploys next-generation large language models (LLMs) powered by diffusion rather than traditional autoregressive generation. By using diffusion, their Mercury models produce many tokens in parallel, making them several times faster than, and less than half the cost of, conventional LLMs. The diffusion framework provides fine-grained control over outputs, allowing adherence to specific schemas and semantic constraints while offering a unified paradigm for combining language with other data modalities.
- Parallel Token Generation: Mercury models generate multiple tokens simultaneously instead of one at a time, yielding inference speeds 5x faster than traditional LLMs.
- Mercury 2 Reasoning Model: the fastest reasoning LLM and the first reasoning diffusion LLM, built for complex applications where both quality and speed are crucial.
- Mercury Edit: a small, coding-focused diffusion LLM designed for code editing and the most latency-sensitive components of coding workflows.
- OpenAI API Compatible: Mercury models integrate into existing LLM workflows as a drop-in replacement with minimal code changes.
- Enterprise-Grade Deployment: available via the Inception API, AWS Bedrock, Azure Foundry, and model routers such as OpenRouter, with configurable data retention, private networking, and custom SLAs.
- Real-Time Voice Applications: low latency enables natural AI engagement in voice-powered workflows such as customer support, translation, and immersive gaming.
- Lightning-Fast Agents: automate complex coding and business workflows with ultra-responsive AI that keeps users in flow.
- Cost-Effective Pricing: $0.25 per 1M input tokens and $0.75 per 1M output tokens makes high-performance AI accessible for production applications.
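The listed rates make per-request cost straightforward to estimate. A minimal sketch using the $0.25/$0.75 per 1M token prices above; the token counts in the example are illustrative, not measured values:

```python
# Cost estimate at the listed Mercury rates:
# $0.25 per 1M input tokens, $0.75 per 1M output tokens.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.75 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a request with 2,000 input tokens and 500 output tokens.
cost = estimate_cost(2_000, 500)
print(f"${cost:.6f}")  # 2000*0.25/1e6 + 500*0.75/1e6 = $0.000875
```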
To get started, request early access through the Inception website or access Mercury through AWS Bedrock, Azure Foundry, or model routers. The API is OpenAI-compatible, requiring only a one-line code change for integration. Documentation is available at docs.inceptionlabs.ai for detailed implementation guidance.
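Because the API follows the OpenAI request format, integration amounts to pointing an OpenAI-style client at the Inception endpoint. A minimal sketch of assembling such a request; the base URL and model ID below are assumptions for illustration, so check docs.inceptionlabs.ai for the actual values:

```python
# Sketch of an OpenAI-compatible /chat/completions request.
# The base URL and model name are hypothetical placeholders;
# consult docs.inceptionlabs.ai for the real endpoint and model IDs.
import json

def build_chat_request(base_url: str, model: str, prompt: str) -> dict:
    """Assemble the URL and JSON body for a chat completion call."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request(
    base_url="https://api.inceptionlabs.ai/v1",  # hypothetical endpoint
    model="mercury",                             # hypothetical model ID
    prompt="Refactor this function for clarity.",
)
print(json.dumps(req["body"], indent=2))
```

With the official `openai` Python package, the same shape applies: construct the client with `base_url` and an Inception API key, and existing chat-completion code works unchanged, which is what the "one-line code change" refers to.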
Pricing
Mercury 2
Input pricing per 1M tokens for the fastest reasoning LLM
- Fastest reasoning LLM
- First reasoning dLLM
- Complex applications support
- OpenAI API compatible
Mercury 2 Output
Output pricing per 1M tokens for Mercury 2
- Parallel token generation
- Best-in-class quality
- Enterprise-grade reliability
Mercury Edit
Input pricing per 1M tokens for coding-focused dLLM
- Small coding-focused model
- Code editing optimized
- Extremely latency-sensitive workflows
Mercury Edit Output
Output pricing per 1M tokens for Mercury Edit
- Fast code completions
- Tab suggestions
- Chat responses
Enterprise
Custom enterprise deployment with dedicated support
- Private networking
- Dedicated capacity
- Custom SLAs
- 99.5%+ uptime
- Priority support
- No prompt logging options
Capabilities
Key Features
- Parallel token generation
- Diffusion-based language models
- Mercury 2 reasoning model
- Mercury Edit coding model
- OpenAI API compatible
- Real-time voice applications
- Lightning fast agents
- Instant code editing
- Rapid search capabilities
- Enterprise-grade privacy
- AWS Bedrock integration
- Azure Foundry integration
- Custom SLAs
- No training on customer data
- Configurable data retention
