Replicate

Replicate is a developer platform and API for running, fine‑tuning, deploying, and scaling machine learning models. It exposes models as production-ready APIs and supports running community and private models with per-second hardware billing. Teams can deploy custom models (via Cog), fine-tune models with their data, and monitor predictions with logs and metrics.

  • One-line API access: call any model with a single API request using the official Node or Python SDK, or plain HTTP.
  • Pay‑for‑what‑you‑use billing: models are billed by runtime (per second) and by hardware type, so you pay only for the compute you use.
  • Deploy custom models: package your own model with Cog and deploy it as a scalable API endpoint.
  • Fine-tuning support: train or fine-tune models on Replicate to produce custom versions for specific tasks.
  • Hardware choices & scaling: choose CPU or GPU hardware (T4, L40S, A100, etc.) and scale automatically as demand increases.
  • Logging & monitoring: built-in metrics and logs let teams track model performance and debug predictions.

To get started, sign up on the web, obtain an API token, and use the Node or Python SDK (or the HTTP API directly) to run a published model or deploy your own model packaged with Cog.
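
The "obtain a token, then call a model" step can be sketched with the Python standard library alone. This is a minimal illustration, not the official SDK: the endpoint URL, the `Bearer` authorization scheme, and the `version`/`input` body fields are assumptions based on Replicate's public HTTP API and should be checked against the current API reference. The helper name `build_prediction_request` and the placeholder version string are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for creating a prediction; verify against the API reference.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token: str, version: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks Replicate to run a model.

    token:       the API token obtained after signing up
    version:     identifier of the published model version to run (placeholder here)
    model_input: model-specific inputs, e.g. {"prompt": "..."}
    """
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only; nothing is sent over the network.
request = build_prediction_request(
    token="YOUR_API_TOKEN",
    version="MODEL_VERSION_ID",
    model_input={"prompt": "an astronaut riding a horse"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the call. In practice the official Node or Python SDK wraps this request and the follow-up polling for results, so the sketch mainly shows what a single API request carries.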

Developer

Replicate builds a developer platform that makes models available as production-ready APIs and infrastructure. The team of engineers an…

Pricing and Plans

(Paid)

CPU (standard)

$0.36/hour

Standard CPU runtime, billed per second (example rate from the pricing page).

  • Per-second billing by runtime
  • Runs on shared CPU hardware

Nvidia T4 GPU

$0.81/hour

Nvidia T4 GPU runtime, billed per second (example rate from the pricing page).

  • Per-second billing for GPU inference
  • Lower-cost GPU option for image and model inference

Nvidia A100 (80GB) GPU

Popular
$5.04/hour

Nvidia A100 (80GB) GPU runtime, billed per second (example rate from the pricing page).

  • High-memory GPU for large models and training
  • Per-second billing, including multi‑GPU configurations
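
Per-second billing makes the cost of a single run easy to estimate. The sketch below assumes the three figures listed above are hourly rates in US dollars. The listing does not state the unit, so treat this as an assumption. The hardware keys and the function name `estimate_cost` are illustrative, not part of any Replicate API.

```python
# Example rates from the listing, ASSUMED to be USD per hour of runtime.
HOURLY_RATES_USD = {
    "cpu": 0.36,
    "t4": 0.81,
    "a100-80gb": 5.04,
}

SECONDS_PER_HOUR = 3600


def estimate_cost(hardware: str, runtime_seconds: float) -> float:
    """Estimate the cost in USD of a run billed per second on the given hardware."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    per_second_rate = HOURLY_RATES_USD[hardware] / SECONDS_PER_HOUR
    return per_second_rate * runtime_seconds


# Compare the same 30-second run across the three hardware tiers.
for name in HOURLY_RATES_USD:
    print(f"{name}: ${estimate_cost(name, 30):.5f} for 30 s")
```

Under this assumption, the same 30-second run on the A100 costs 14 times what it costs on CPU (5.04 / 0.36), so the choice of hardware matters more than small differences in runtime.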

System Requirements

Operating System: Any OS with a modern web browser or server environment for API clients
Memory (RAM): 4 GB+ RAM
Processor: Any modern 64-bit CPU
Disk Space: No local storage required (cloud-based)

AI Capabilities

Text-to-image generation
Text-to-video generation
Speech generation
Music generation
Image restoration
Language models (LLMs)