Cumulus Labs
Cumulus Labs delivers fast, scalable GPU compute and serverless AI inference by activating idle cloud capacity and multiplexing models on single GPUs.
At a Glance
- AI Development Teams
- Robotics Companies
- Game Developers
- Content Creators
AI Tools by Cumulus Labs
IonRouter
AI Inference Platform API
Latest News
Cumulus Labs launches IonRouter: High-throughput, low-cost inference API
Cumulus Labs joins Y Combinator Winter 2026 Batch
Cumulus Labs joins NVIDIA Inception program for AI startups
Cumulus Labs launches performant GPU Cloud for AI teams
Products & Services
A serverless inference API for open-source and fine-tuned AI models (e.g., Kimi K2.5, Qwen 3.5), featuring high throughput and low-cost pricing.
A custom inference engine designed to multiplex multiple AI models on a single GPU with millisecond swap times.
A performant GPU cloud that optimizes training and inference workloads with preemptive resource management.
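A serverless inference API like the one described above is typically exercised through an OpenAI-compatible chat-completions route. The sketch below is illustrative only: the endpoint URL, authentication scheme, and model identifier are assumptions, not documented Cumulus Labs APIs.

```python
import json

# Assumed endpoint; Cumulus Labs has not published its actual API URL here.
API_URL = "https://api.example.com/v1/chat/completions"


def build_inference_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble a chat-completion payload in the common OpenAI-compatible shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# Model name is illustrative, based on the open-source models named above.
payload = build_inference_request("kimi-k2.5", "Summarize GPU multiplexing.")
print(json.dumps(payload, indent=2))
```

In practice the payload would be POSTed to the provider's endpoint with an API key in the `Authorization` header; only the request shape is shown here.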
Market Position
Positions itself as the most cost-effective, highest-throughput inference provider for open-source models, aiming to outperform incumbents through custom multiplexing on GH200 hardware.
Leadership
Founders
Veer Shah
Founder and CEO at Cumulus Labs (YC W26). Previously led a Space Force SBIR contract developing Kubernetes infrastructure for military satellites.
Suryaa Rajinikanth
Founder and CTO at Cumulus Labs (YC W26). Previously Lead Engineer at TensorDock building distributed GPU marketplaces; Forward Deployed Software Engineer at Palantir; Software Engineer at Boston University; roles at Blackstone, Fidelity, and Georgia Tech (Lead Undergraduate Researcher).
Executive Team
Veer Shah
Co-Founder & CEO
Experience in cloud infrastructure and distributed systems; previously led Space Force SBIR contracts.
Suryaa Rajinikanth
Co-Founder & CTO
Former Lead Engineer at TensorDock and Forward Deployed Software Engineer at Palantir.
Founding Story
Founders Suryaa Rajinikanth and Veer Shah, who have known each other for years, started Cumulus Labs to address high costs and billing complexities in GPU inference. Leveraging their backgrounds in distributed systems and robotics, they built the IonAttention engine to multiplex models and maximize GPU efficiency.
Business Model
Revenue Model
Serverless inference fees (pay-per-million tokens), GPU cloud usage fees (per-cycle/per-second billing), and, potentially, custom enterprise deployments.
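As a rough illustration of how pay-per-million-tokens billing works, the calculation below uses made-up placeholder rates, not published Cumulus Labs prices.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float) -> float:
    """Cost in dollars for one request, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Placeholder rates: $0.20 per 1M input tokens, $0.80 per 1M output tokens.
cost = inference_cost(input_tokens=50_000, output_tokens=10_000,
                      input_rate=0.20, output_rate=0.80)
print(f"${cost:.4f}")  # prints "$0.0180"
```

Because billing is metered per token with no idle costs, a workload that sends no requests incurs no charge, unlike reserved-GPU pricing.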
Pricing Tiers
- Pay-per-token with no idle costs.
- High-throughput serving on GH200 hardware.
- On-demand video generation.
- Plus per-image output cost.
Target Markets
- AI Development Teams
- Robotics Companies
- Game Developers
- Content Creators
- Enterprise AI Operations
- Real-time robotics perception
- Multi-camera surveillance video analysis
- AI video generation pipelines
- On-demand game asset generation
- General-purpose high-speed LLM inference
- AI infrastructure startups
- Content creation platforms