# Cerebrium

> Serverless AI infrastructure for deploying LLMs, agents, and vision models globally with low latency, zero DevOps, and per-second billing.

Cerebrium provides serverless infrastructure for real-time AI applications, enabling developers to deploy LLMs, agents, and vision models globally with low latency and zero DevOps overhead. The platform offers per-second billing, automatic scaling from zero to thousands of containers, and 12+ GPU types including T4, A10, A100, H100, and H200. Trusted by companies like Deepgram, Vapi, Tavus, and LiveKit, Cerebrium simplifies the entire development workflow from configuration to observability.

- **Fast Cold Starts** - Apps cold-start in 2 seconds or less on average, keeping latency low for real-time applications
- **Auto-scaling** - Scale from zero to thousands of requests automatically and pay only for the compute you actually use
- **Multi-region Deployments** - Deploy across multiple regions for regional compliance and better performance for users worldwide
- **12+ GPU Types** - Choose from T4, L4, A10, A100, L40s, H100, H200, Trainium, Inferentia, and other accelerators to match each use case
- **WebSocket & Streaming Endpoints** - Native support for real-time interactions, low-latency responses, and streaming tokens as they are generated
- **Batching & Concurrency** - Combine requests into batches to minimize GPU idle time and scale dynamically to handle thousands of simultaneous requests
- **Distributed Storage** - Persist model weights, logs, and artifacts across deployments with no external setup required
- **OpenTelemetry Integration** - Track app performance end to end with unified metrics, traces, and logs
- **Bring Your Own Runtime** - Use custom Dockerfiles or runtimes for full control over the app environment
- **CI/CD & Gradual Rollouts** - Plug into CI/CD pipelines and roll out changes gradually for zero-downtime updates
- **Secrets Management** - Store and manage secrets securely via the dashboard to keep API keys hidden and safe
- **SOC 2 & HIPAA Compliance** - Enterprise-grade security keeps data secure, available, and private

To get started, sign up for a free account with $30 in free credits (no credit card required), initialize a project, choose your hardware, and deploy; a minimal sketch of this flow appears at the end of this page. The platform handles scaling, infrastructure management, and observability automatically.

## Features

- Fast cold starts (2 seconds or less)
- Auto-scaling from zero to thousands
- Multi-region deployments
- 12+ GPU types (T4, L4, A10, A100, L40s, H100, H200)
- WebSocket endpoints
- Streaming endpoints
- REST API endpoints
- Batching
- Concurrency handling
- Asynchronous jobs
- Distributed storage
- OpenTelemetry observability
- Bring your own runtime
- CI/CD & gradual rollouts
- Secrets management
- SOC 2 compliance
- HIPAA compliance
- Per-second billing

## Integrations

Deepgram, Vapi, Tavus, BitHuman, LiveKit, Lelapa AI, Akool

## Platforms

WEB, API

## Pricing

Freemium: free tier available with paid upgrades

## Links

- Website: https://www.cerebrium.ai
- Documentation: https://docs.cerebrium.ai/
- EveryDev.ai: https://www.everydev.ai/tools/cerebrium
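
## Quick Start Sketch

As a rough illustration of the getting-started flow described above, the sketch below shows what a minimal Cerebrium app can look like. The CLI commands in the comments follow the pattern outlined in the Cerebrium documentation, while the function name `run`, its signature, and the placeholder logic are illustrative assumptions rather than a fixed contract; hardware, scaling, and dependency settings live in the project's `cerebrium.toml`, whose exact keys are documented at https://docs.cerebrium.ai/.

```python
# main.py -- minimal sketch of a Cerebrium app (illustrative, not authoritative).
#
# Typical workflow (verify against docs.cerebrium.ai):
#   pip install cerebrium          # install the CLI
#   cerebrium login                # authenticate your account
#   cerebrium init my-first-app    # scaffold main.py and cerebrium.toml
#   cerebrium deploy               # build and deploy; scaling follows cerebrium.toml
#
# GPU type, CPU/memory, and min/max replicas are declared in cerebrium.toml,
# not in code; the platform then scales containers from zero on demand.

def run(prompt: str, max_tokens: int = 128) -> dict:
    """Top-level functions in main.py are exposed as HTTP endpoints after
    deployment, with JSON request fields mapped to parameters (assumed
    behaviour; `run` is a hypothetical name)."""
    # Placeholder for real inference. Load model weights at module import
    # time so warm containers skip initialisation on each request.
    return {"output": f"echo: {prompt}", "max_tokens": max_tokens}
```

Once deployed, the dashboard provides the endpoint URL and credentials needed to call each function, and usage is billed per second of compute while the app is running.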