IonRouter
High-throughput, low-cost AI inference API powered by IonAttention, supporting LLM, vision, image, video, and audio models with OpenAI-compatible endpoints.
Listed Mar 2026
About IonRouter
IonRouter is a high-performance AI inference platform built by Cumulus Labs, powered by its custom IonAttention engine, which multiplexes models on a single GPU with millisecond swap times. It delivers significantly higher throughput than standard inference providers (over 7,000 tokens/second for Qwen2.5-7B on a single GH200) while keeping costs low with per-token, usage-based pricing. The platform is OpenAI API-compatible, so teams can switch with a one-line code change, and it supports a wide range of model types including language, vision, image, video, and audio.
- IonAttention Engine: A custom inference stack built from the ground up for NVIDIA Grace Hopper Superchips, enabling real-time model multiplexing and adaptive traffic handling for maximum throughput.
- OpenAI-Compatible API: Point any existing OpenAI client (Python, TypeScript, Go, etc.) at api.ionrouter.io/v1 with no other code changes required.
- Broad Model Support: Access frontier and open-source models across language (Qwen3.5, GPT-OSS-120B, Kimi-K2.5), vision (VLMs), image (Flux Schnell), video (Wan2.2), and audio categories.
- Custom Model Deployment: Deploy your own finetunes, custom LoRAs, or any open-source model on dedicated GPU streams with no cold starts and per-second billing.
- Usage-Based Pricing: Pay per million tokens (input and output priced separately) or per GPU-second for video/image generation — no idle costs or seat fees.
- Iondex Leaderboard: Browse and compare available models via the Iondex leaderboard and category explorer to find the best model for your use case.
- Playground: Test any supported model directly in the browser playground before integrating into your application.
- Enterprise Support: Book a call for dedicated GPU streams, custom deployments, and enterprise-grade SLAs tailored to high-volume workloads like robotics, surveillance, and AI video pipelines.
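Because the endpoints follow the OpenAI wire format, a chat completion request can be sketched with nothing but the Python standard library. The API key and model id below are placeholders for illustration, not real credentials or a confirmed model identifier:

```python
import json
import urllib.request

API_URL = "https://api.ionrouter.io/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen3.5",
                       api_key: str = "ION_API_KEY") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at IonRouter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Sending it is one call:
#   resp = urllib.request.urlopen(build_chat_request("Hello"))
#   print(json.load(resp)["choices"][0]["message"]["content"])
```

Any existing OpenAI SDK works the same way: only the base URL changes, which is the one-line switch the listing describes.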
Pricing
Free Plan Available
Join Discord to receive $5 of free credits to start building on IonRouter.
- $5 free credits
- Access to all models
- OpenAI-compatible API
- Playground access
Pay-as-you-go
Pay per million tokens (input/output priced separately) or per GPU-second for image/video generation. No idle costs.
- Per-token billing (input and output)
- Per-GPU-second billing for video/image
- No idle costs
- Access to all available models
- OpenAI-compatible API
- Playground access
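Both billing modes reduce to simple arithmetic. The rates below are hypothetical placeholders chosen for illustration, not published IonRouter prices:

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost when input and output tokens are priced per million."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

def gpu_second_cost(seconds: float, rate_per_gpu_second: float) -> float:
    """Dollar cost under per-GPU-second billing (image/video generation)."""
    return seconds * rate_per_gpu_second

# Hypothetical rates for illustration only:
#   $0.10 / 1M input tokens, $0.40 / 1M output tokens, $0.002 / GPU-second.
chat_bill = token_cost(2_000_000, 500_000, 0.10, 0.40)   # -> 0.40
video_bill = gpu_second_cost(90, 0.002)                   # -> 0.18
```

Since there are no idle costs, a month's bill is just the sum of these per-request charges.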
Enterprise
Dedicated GPU streams, custom model deployments, and enterprise-grade SLAs for high-volume workloads.
- Dedicated GPU streams
- Custom LoRA and finetune deployment
- No cold starts
- Per-second billing
- Enterprise SLAs
- Tailored for robotics, surveillance, and AI video pipelines
Capabilities
Key Features
- IonAttention custom inference engine
- OpenAI-compatible API
- Multi-model support (language, vision, image, video, audio)
- Custom model and LoRA deployment
- Dedicated GPU streams
- No cold starts
- Per-token and per-GPU-second billing
- Iondex model leaderboard
- Browser playground
- Enterprise deployments
- NVIDIA Grace Hopper Superchip infrastructure
