Modal

Modal is a serverless cloud platform designed to run compute-intensive AI and machine learning applications without requiring users to manage infrastructure. It provides an AI-native runtime with fast model initialization and autoscaling, a built-in globally distributed storage layer for high-throughput model and data access, and first-class GPU and CPU support for inference, training, and batch processing. Modal exposes APIs and SDKs to deploy OpenAI-compatible model endpoints, run notebooks and sandboxes, and orchestrate large-scale batch jobs.
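
The snippet below is a minimal sketch of that workflow using the Python SDK's modal.App and @app.function interfaces; the app name, GPU type, and sentiment model are illustrative choices rather than platform defaults.

```python
import modal

# Container image with the Python dependencies the remote function needs.
image = modal.Image.debian_slim().pip_install("transformers", "torch")

app = modal.App("sentiment-demo", image=image)


@app.function(gpu="A10G")  # request a single GPU; the type here is an example
def classify(text: str) -> dict:
    """Run a sentiment pipeline inside the serverless container."""
    from transformers import pipeline  # imported inside the remote container

    clf = pipeline("sentiment-analysis")
    return clf(text)[0]


@app.local_entrypoint()
def main():
    # `modal run this_file.py` runs main() locally; classify() executes
    # remotely in a container that Modal starts and scales on demand.
    print(classify.remote("Modal makes GPU functions feel local."))
```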

  • AI-native runtime — a serverless runtime engineered for heavy AI workloads with fast startup and autoscaling to handle parallel GPU/CPU jobs.
  • Built-in distributed storage — globally distributed storage optimized for fast model loading and dataset access to reduce I/O bottlenecks.
  • Flexible compute options — per-second GPU and CPU pricing for many GPU types and hardware configurations, enabling fine-grained usage-based billing.
  • Model serving & inference — deploy OpenAI-compatible LLM endpoints and host custom model APIs at scale.
  • Batch, notebooks, and sandboxes — run large-scale batch workflows, interactive notebooks, and isolated sandboxes for development and testing (see the batch sketch after this list).
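
The sketch below illustrates the batch item above: the SDK's .map() helper fans inputs out across autoscaled containers and streams results back. The score function and input range are hypothetical stand-ins for a real per-item workload.

```python
import modal

app = modal.App("batch-demo")


@app.function(cpu=2)  # each input is processed in its own autoscaled container
def score(item: int) -> int:
    # Stand-in for a real per-item workload (feature extraction, scoring, etc.).
    return item * item


@app.local_entrypoint()
def main():
    # .map() runs score() across many containers in parallel and yields
    # results back in input order.
    results = list(score.map(range(1000)))
    print(sum(results))
```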

To get started, sign up on the web console, consult the documentation for SDK and API examples, and deploy a sample model or notebook to validate autoscaling and compute billing.
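
As a rough sketch of that first deployment, the example below defines a small web endpoint and deploys it from the CLI; the @modal.web_endpoint decorator and the modal setup auth command have shifted names across SDK releases, so treat the exact identifiers as assumptions to verify against the current docs.

```python
import modal

# fastapi is needed to serve the web endpoint inside the container.
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

app = modal.App("hello-endpoint", image=image)


@app.function()
@modal.web_endpoint(method="GET")
def hello(name: str = "world") -> dict:
    # Served at an autoscaling HTTPS URL that Modal provisions on deploy.
    return {"message": f"Hello, {name}!"}


# A typical first-run workflow from a terminal (after signing up):
#   pip install modal
#   modal setup              # authenticate the CLI against your workspace
#   modal deploy hello.py    # deploys the app and prints the endpoint URL
```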

Developer

Modal builds a serverless cloud platform that lets engineers and researchers run compute-intensive AI and ML applications without managing infrastructure.

Pricing and Plans

(Freemium)

Starter

Free
  • $30/month in free compute credits
  • Up to 3 workspace seats
  • Up to 100 containers and 10 GPUs running concurrently
  • Crons and web endpoints (limited)
  • Real-time metrics and logs
  • Region selection
  • Access to Modal Community Slack

Team

$250/month
  • $100/month in free compute credits
  • Unlimited seats
  • Up to 1,000 containers and 50 GPUs running concurrently
  • Unlimited crons and web endpoints
  • Custom domains
  • Static IP proxy
  • Deployment rollbacks
  • Access to Modal Community Slack

Enterprise

Contact for pricing

Custom compute pricing with advanced security, support, and higher concurrency.

  • Contact sales
  • Volume-based discounts
  • Unlimited seats
  • Higher GPU concurrency
  • Embedded ML engineering services
  • Support via private Slack
  • Audit logs, Okta SSO, and HIPAA compliance

System Requirements

  • Operating System: any OS with a modern web browser
  • Memory (RAM): 4 GB or more
  • Processor: any modern 64-bit CPU
  • Disk Space: no local storage required (cloud-based)

AI Capabilities

  • Model serving
  • Inference
  • Training
  • Fine-tuning
  • Batch processing
  • Text-to-speech (Chatterbox)