# IonRouter

> High throughput, low cost AI inference API powered by IonAttention, supporting LLMs, vision, image, video, and audio models with OpenAI-compatible endpoints.

IonRouter is a high-performance AI inference platform built by Cumulus Labs, powered by its custom IonAttention engine, which multiplexes models on a single GPU with millisecond swap times. It delivers significantly higher throughput than standard inference providers (over 7,000 tokens/second for Qwen2.5-7B on a single GH200) while keeping costs low with per-token, usage-based pricing. The platform is OpenAI API-compatible, so teams can switch with a one-line code change, and it supports a wide range of model types, including language, vision, image, video, and audio.

- **IonAttention Engine**: *A custom inference stack built from the ground up for NVIDIA Grace Hopper Superchips, enabling real-time model multiplexing and adaptive traffic handling for maximum throughput.*
- **OpenAI-Compatible API**: *Point any existing OpenAI client (Python, TypeScript, Go, etc.) at `api.ionrouter.io/v1` with no other code changes required.*
- **Broad Model Support**: *Access frontier and open-source models across language (Qwen3.5, GPT-OSS-120B, Kimi-K2.5), vision (VLMs), image (Flux Schnell), video (Wan2.2), and audio categories.*
- **Custom Model Deployment**: *Deploy your own finetunes, custom LoRAs, or any open-source model on dedicated GPU streams with no cold starts and per-second billing.*
- **Usage-Based Pricing**: *Pay per million tokens (input and output priced separately) or per GPU-second for video and image generation; no idle costs or seat fees.*
- **Iondex Leaderboard**: *Browse and compare available models via the Iondex leaderboard and category explorer to find the best model for your use case.*
- **Playground**: *Test any supported model directly in the browser playground before integrating it into your application.*
- **Enterprise Support**: *Book a call for dedicated GPU streams, custom deployments, and enterprise-grade SLAs tailored to high-volume workloads like robotics, surveillance, and AI video pipelines.*

## Features

- IonAttention custom inference engine
- OpenAI-compatible API
- Multi-model support (language, vision, image, video, audio)
- Custom model and LoRA deployment
- Dedicated GPU streams
- No cold starts
- Per-token and per-GPU-second billing
- Iondex model leaderboard
- Browser playground
- Enterprise deployments
- NVIDIA Grace Hopper Superchip infrastructure

## Integrations

OpenAI Python SDK, OpenAI TypeScript SDK, Go HTTP clients, any OpenAI-compatible framework

## Platforms

API, Web

## Pricing

Freemium: free tier available with paid upgrades

## Links

- Website: https://ionrouter.io
- Documentation: https://ionrouter.io/docs
- Repository: https://github.com/cumulus-compute-labs
- EveryDev.ai: https://www.everydev.ai/tools/ionrouter
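As a minimal sketch of what "OpenAI-compatible" means in practice, the request below assembles an OpenAI-style chat-completions call against the `api.ionrouter.io/v1` base URL from the listing. The model id is a hypothetical example (browse the Iondex leaderboard for real ids), and the helper function is ours, not part of any SDK:

```python
import json

# Base URL for IonRouter's OpenAI-compatible routes (from the listing above).
BASE_URL = "https://api.ionrouter.io/v1"


def build_chat_request(model: str, messages: list) -> tuple:
    """Assemble the URL and JSON payload for an OpenAI-style chat completion.

    Any OpenAI client produces an equivalent request; only the base URL differs.
    """
    url = f"{BASE_URL}/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, payload


url, payload = build_chat_request(
    "qwen2.5-7b-instruct",  # hypothetical model id; check Iondex for real ones
    [{"role": "user", "content": "Hello from IonRouter"}],
)
print(url)  # https://api.ionrouter.io/v1/chat/completions
print(json.dumps(payload))
```

With the official OpenAI Python SDK, the equivalent call is `OpenAI(base_url="https://api.ionrouter.io/v1", api_key=...)` followed by `client.chat.completions.create(...)`; overriding `base_url` is the one-line change the listing refers to.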