Inception Labs
Diffusion-based large language models that generate tokens in parallel, delivering 5x faster inference with best-in-class quality at lower cost.
At a Glance
- Pricing: Paid
About Inception Labs
Inception Labs builds and deploys next-generation large language models (LLMs) powered by diffusion rather than traditional autoregressive generation. By using diffusion, their Mercury models produce many tokens in parallel, making them several times faster than, and less than half the cost of, conventional LLMs. The diffusion framework provides fine-grained control over outputs, allowing adherence to specific schemas and semantic constraints while offering a unified paradigm for combining language with other data modalities.
- Parallel Token Generation: Mercury models generate multiple tokens simultaneously instead of one at a time, yielding inference speeds 5x faster than traditional LLMs.
- Mercury 2 Reasoning Model: the fastest reasoning LLM and the first reasoning diffusion LLM, built for complex applications where both quality and speed are crucial.
- Mercury Edit: a small, coding-focused diffusion LLM designed for code editing and the most latency-sensitive components of coding workflows.
- OpenAI API Compatible: Mercury models integrate into existing LLM workflows as a drop-in replacement with minimal code changes.
- Enterprise-Grade Deployment: available via the Inception API, AWS Bedrock, Azure Foundry, and model routers such as OpenRouter, with configurable data retention, private networking, and custom SLAs.
- Real-Time Voice Applications: low latency enables natural AI engagement in voice-powered workflows such as customer support, translation, and immersive gaming.
- Lightning-Fast Agents: automate complex coding and business workflows with ultra-responsive AI that keeps users in flow.
- Cost-Effective Pricing: $0.25 per 1M input tokens and $0.75 per 1M output tokens makes high-performance AI accessible for production applications.
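The listed rates make per-request cost straightforward to estimate. A minimal sketch using the $0.25/$0.75 per 1M token prices above; the token counts in the example are illustrative, not measured values:

```python
# Cost estimate at the listed Mercury rates:
# $0.25 per 1M input tokens, $0.75 per 1M output tokens.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.75 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a request with 2,000 input tokens and 500 output tokens.
cost = estimate_cost(2_000, 500)
print(f"${cost:.6f}")  # 2000*0.25/1e6 + 500*0.75/1e6 = $0.000875
```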
To get started, request early access through the Inception website or access Mercury through AWS Bedrock, Azure Foundry, or model routers. The API is OpenAI-compatible, requiring only a one-line code change for integration. Documentation is available at docs.inceptionlabs.ai for detailed implementation guidance.
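Because the API follows the OpenAI request format, integration amounts to pointing an OpenAI-style client at the Inception endpoint. A minimal sketch of assembling such a request; the base URL and model ID below are assumptions for illustration, so check docs.inceptionlabs.ai for the actual values:

```python
# Sketch of an OpenAI-compatible /chat/completions request.
# The base URL and model name are hypothetical placeholders;
# consult docs.inceptionlabs.ai for the real endpoint and model IDs.
import json

def build_chat_request(base_url: str, model: str, prompt: str) -> dict:
    """Assemble the URL and JSON body for a chat completion call."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request(
    base_url="https://api.inceptionlabs.ai/v1",  # hypothetical endpoint
    model="mercury",                             # hypothetical model ID
    prompt="Refactor this function for clarity.",
)
print(json.dumps(req["body"], indent=2))
```

With the official `openai` Python package, the same shape applies: construct the client with `base_url` and an Inception API key, and existing chat-completion code works unchanged, which is what the "one-line code change" refers to.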
Pricing
Mercury 2
Input pricing per 1M tokens for the fastest reasoning LLM
- Fastest reasoning LLM
- First reasoning dLLM
- Complex applications support
- OpenAI API compatible
Mercury 2 Output
Output pricing per 1M tokens for Mercury 2
- Parallel token generation
- Best-in-class quality
- Enterprise-grade reliability
Mercury Edit
Input pricing per 1M tokens for coding-focused dLLM
- Small coding-focused model
- Code editing optimized
- Extremely latency-sensitive workflows
Mercury Edit Output
Output pricing per 1M tokens for Mercury Edit
- Fast code completions
- Tab suggestions
- Chat responses
Enterprise
Custom enterprise deployment with dedicated support
- Private networking
- Dedicated capacity
- Custom SLAs
- 99.5%+ uptime
- Priority support
- No prompt logging options
Capabilities
Key Features
- Parallel token generation
- Diffusion-based language models
- Mercury 2 reasoning model
- Mercury Edit coding model
- OpenAI API compatible
- Real-time voice applications
- Lightning fast agents
- Instant code editing
- Rapid search capabilities
- Enterprise-grade privacy
- AWS Bedrock integration
- Azure Foundry integration
- Custom SLAs
- No training on customer data
- Configurable data retention
