# IonRouter

> High throughput, low cost AI inference API powered by IonAttention, supporting LLMs, vision, image, video, and audio models with OpenAI-compatible endpoints.

IonRouter is a high-performance AI inference platform built by Cumulus Labs, powered by its custom IonAttention engine, which multiplexes models on a single GPU with millisecond swap times. It delivers significantly higher throughput than standard inference providers (over 7,000 tokens/second for Qwen2.5-7B on a single GH200) while keeping costs low with per-token, usage-based pricing. The platform is OpenAI API-compatible, so teams can switch with a one-line code change, and it supports a wide range of model types, including language, vision, image, video, and audio.

- **IonAttention Engine**: *A custom inference stack built from the ground up for NVIDIA Grace Hopper Superchips, enabling real-time model multiplexing and adaptive traffic handling for maximum throughput.*
- **OpenAI-Compatible API**: *Point any existing OpenAI client (Python, TypeScript, Go, etc.) at `api.ionrouter.io/v1` with no other code changes required.*
- **Broad Model Support**: *Access frontier and open-source models across language (Qwen3.5, GPT-OSS-120B, Kimi-K2.5), vision (VLMs), image (Flux Schnell), video (Wan2.2), and audio categories.*
- **Custom Model Deployment**: *Deploy your own finetunes, custom LoRAs, or any open-source model on dedicated GPU streams with no cold starts and per-second billing.*
- **Usage-Based Pricing**: *Pay per million tokens (input and output priced separately) or per GPU-second for video and image generation; no idle costs or seat fees.*
- **Iondex Leaderboard**: *Browse and compare available models via the Iondex leaderboard and category explorer to find the best model for your use case.*
- **Playground**: *Test any supported model directly in the browser playground before integrating it into your application.*
- **Enterprise Support**: *Book a call for dedicated GPU streams, custom deployments, and enterprise-grade SLAs tailored to high-volume workloads like robotics, surveillance, and AI video pipelines.*

## Features

- IonAttention custom inference engine
- OpenAI-compatible API
- Multi-model support (language, vision, image, video, audio)
- Custom model and LoRA deployment
- Dedicated GPU streams
- No cold starts
- Per-token and per-GPU-second billing
- Iondex model leaderboard
- Browser playground
- Enterprise deployments
- NVIDIA Grace Hopper Superchip infrastructure

## Integrations

OpenAI Python SDK, OpenAI TypeScript SDK, Go HTTP clients, any OpenAI-compatible framework

## Platforms

API, Web

## Pricing

Freemium: free tier available with paid upgrades

## Links

- Website: https://ionrouter.io
- Documentation: https://ionrouter.io/docs
- Repository: https://github.com/cumulus-compute-labs
- EveryDev.ai: https://www.everydev.ai/tools/ionrouter
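As a minimal sketch of what "OpenAI-compatible" means in practice, the request below assembles an OpenAI-style chat-completions call against the `api.ionrouter.io/v1` base URL from the listing. The model id is a hypothetical example (browse the Iondex leaderboard for real ids), and the helper function is ours, not part of any SDK:

```python
import json

# Base URL for IonRouter's OpenAI-compatible routes (from the listing above).
BASE_URL = "https://api.ionrouter.io/v1"


def build_chat_request(model: str, messages: list) -> tuple:
    """Assemble the URL and JSON payload for an OpenAI-style chat completion.

    Any OpenAI client produces an equivalent request; only the base URL differs.
    """
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, payload


url, payload = build_chat_request(
    "qwen2.5-7b-instruct",  # hypothetical model id; check Iondex for real ones
    [{"role": "user", "content": "Hello from IonRouter"}],
)
print(url)  # https://api.ionrouter.io/v1/chat/completions
print(json.dumps(payload))
```

With the official OpenAI Python SDK, the equivalent call is `OpenAI(base_url="https://api.ionrouter.io/v1", api_key=...)` followed by `client.chat.completions.create(...)`; overriding `base_url` is the one-line change the listing refers to.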