Cumulus Labs
Cumulus Labs delivers fast, scalable GPU compute and serverless AI inference by activating idle cloud capacity and multiplexing models on single GPUs.
At a Glance
- AI Development Teams
- Robotics Companies
- Game Developers
- Content Creators
AI Tools by Cumulus Labs
IonRouter
AI Inference Platform API
Latest News
Cumulus Labs launches IonRouter: High-throughput, low-cost inference API
Cumulus Labs joins Y Combinator Winter 2026 Batch
Cumulus Labs joins NVIDIA Inception program for AI startups
Cumulus Labs launches performant GPU Cloud for AI teams
Products & Services
A serverless inference API for open-source and fine-tuned AI models (e.g., Kimi K2.5, Qwen 3.5), featuring high throughput and low-cost pricing.
A custom inference engine designed to multiplex multiple AI models on a single GPU with millisecond swap times.
A performant GPU cloud that optimizes training and inference workloads with preemptive resource management.
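A serverless inference API like the one described above is typically exercised through an OpenAI-compatible chat-completions route. The sketch below is illustrative only: the endpoint URL, authentication scheme, and model identifier are assumptions, not documented Cumulus Labs APIs.

```python
import json

# Assumed endpoint; Cumulus Labs has not published its actual API URL here.
API_URL = "https://api.example.com/v1/chat/completions"


def build_inference_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a chat-completion payload in the common OpenAI-compatible shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Model name is illustrative, based on the open-source models named above.
payload = build_inference_request("kimi-k2.5", "Summarize GPU multiplexing.")
print(json.dumps(payload, indent=2))
```

In practice the payload would be POSTed to the provider's endpoint with an API key in the `Authorization` header; only the request shape is shown here.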
Market Position
Positions itself as the most cost-effective, highest-throughput inference provider for open-source models, aiming to outperform incumbents through custom multiplexing on GH200 hardware.
Leadership
Founders
Veer Shah
Founder and CEO at Cumulus Labs (YC W26). Previously led a Space Force SBIR contract developing Kubernetes infrastructure for military satellites.
Suryaa Rajinikanth
Founder and CTO at Cumulus Labs (YC W26). Previously Lead Engineer at TensorDock building distributed GPU marketplaces; Forward Deployed Software Engineer at Palantir; Software Engineer at Boston University; roles at Blackstone, Fidelity, and Georgia Tech (Lead Undergraduate Researcher).
Executive Team
Veer Shah
Co-Founder & CEO
Experience in cloud infrastructure and distributed systems; previously led Space Force SBIR contracts.
Suryaa Rajinikanth
Co-Founder & CTO
Former Lead Engineer at TensorDock and Forward Deployed Software Engineer at Palantir.
Founding Story
Founders Suryaa Rajinikanth and Veer Shah, who have known each other for years, started Cumulus Labs to address high costs and billing complexities in GPU inference. Leveraging their backgrounds in distributed systems and robotics, they built the IonAttention engine to multiplex models and maximize GPU efficiency.
Business Model
Revenue Model
Serverless inference fees (pay-per-million tokens), GPU cloud usage fees (per-cycle/per-second billing), and, potentially, custom enterprise deployments.
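As a rough illustration of how pay-per-million-tokens billing works, the calculation below uses made-up placeholder rates, not published Cumulus Labs prices.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float) -> float:
    """Cost in dollars for one request, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder rates: $0.20 per 1M input tokens, $0.80 per 1M output tokens.
cost = inference_cost(input_tokens=50_000, output_tokens=10_000,
                      input_rate=0.20, output_rate=0.80)
print(f"${cost:.4f}")  # prints "$0.0180"
```

Because billing is metered per token with no idle costs, a workload that sends no requests incurs no charge, unlike reserved-GPU pricing.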
Pricing Tiers
- Pay-per-token with no idle costs.
- High-throughput serving on GH200 hardware.
- On-demand video generation.
- Plus per-image output cost.
Target Markets
- AI Development Teams
- Robotics Companies
- Game Developers
- Content Creators
- Enterprise AI Operations
- Real-time robotics perception
- Multi-camera surveillance video analysis
- AI video generation pipelines
- On-demand game asset generation
- General-purpose high-speed LLM inference
- AI infrastructure startups
- Content creation platforms