Replicate
Replicate provides a developer platform and API to run, fine-tune, deploy, and scale machine learning models with pay‑for‑what‑you‑use hardware billing.
At a Glance
Pricing: Paid
About Replicate
Replicate is a developer platform and API for running, fine‑tuning, deploying, and scaling machine learning models. It exposes models as production-ready APIs and supports running community and private models with per-second hardware billing. Teams can deploy custom models (via Cog), fine-tune models with their data, and monitor predictions with logs and metrics.
- One-line API access: call any model with a single API request using official SDKs (Node, Python, HTTP).
- Pay‑for‑what‑you‑use billing: models are billed by runtime (per second) and by hardware type, so you only pay for the compute you use.
- Deploy custom models: package and deploy your own model with Cog to create a scalable API endpoint.
- Fine-tuning support: train or fine-tune models on Replicate to produce custom versions for specific tasks.
- Hardware choices & scaling: choose CPU or GPU hardware (T4, L40S, A100, etc.) and scale automatically when demand increases.
- Logging & monitoring: built-in metrics and logs let teams track model performance and debug predictions.
To get started, sign up on the web, obtain an API token, and use the Node/Python/HTTP SDKs to run a published model or deploy your own model packaged with Cog.
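Under the hood, the HTTP API creates predictions by POSTing a model version and an input object to the predictions endpoint with your token in the `Authorization` header. The sketch below assembles such a request without sending it; the version hash and input are placeholders, not a real model pin.

```python
import json

API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(token: str, version: str, model_input: dict) -> dict:
    """Assemble the pieces of a Replicate prediction request.

    Returns the URL, headers, and JSON body you would POST (e.g. with
    `requests` or `urllib`); nothing is sent over the network here.
    """
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"version": version, "input": model_input}),
    }

# Placeholder token and version hash for illustration only.
req = build_prediction_request(
    token="r8_...",  # your API token from replicate.com
    version="<model-version-hash>",
    model_input={"prompt": "an astronaut riding a horse"},
)
print(req["url"])
```

The official SDKs wrap this same flow: the Python client's `replicate.run(...)` creates the prediction and waits for its output for you.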

Pricing
CPU (standard)
Standard CPU runtime, billed per second (see Replicate's pricing page for current rates).
- Per-second billing by runtime
- Runs on shared CPU hardware
Nvidia T4 GPU
Nvidia T4 GPU runtime, billed per second (see Replicate's pricing page for current rates).
- Per-second billing for GPU inference
- Lower-cost GPU option for image and model inference
Nvidia A100 (80GB) GPU
Nvidia A100 (80GB) GPU runtime, billed per second (see Replicate's pricing page for current rates).
- High-memory GPU for large models and training
- Per-second billing, including multi‑GPU configurations
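Per-second billing means the cost of a prediction is simply its runtime multiplied by the hardware's per-second rate. A minimal sketch, using made-up placeholder rates rather than Replicate's actual prices:

```python
# Hypothetical per-second rates for illustration only; real rates are
# listed on Replicate's pricing page and vary by hardware type.
RATES_PER_SECOND = {
    "cpu": 0.000100,
    "t4": 0.000225,
    "a100-80gb": 0.001400,
}

def estimate_cost(hardware: str, runtime_seconds: float) -> float:
    """Per-second billing: cost = runtime in seconds * hardware rate."""
    return runtime_seconds * RATES_PER_SECOND[hardware]

# e.g. a 12-second image generation on a T4 under these placeholder rates
print(round(estimate_cost("t4", 12), 6))
```

The same runtime on an A100 would cost more per second but may finish faster, so the cheapest hardware for a workload depends on both rate and speed.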
Capabilities
Key Features
- Run community and private models via API with one line of code
- Fine-tune and train models with your data
- Deploy custom models using Cog
- Per-second, hardware-based billing (CPU and multiple GPU types)
- Logging, metrics, and automatic scaling for deployed models
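Deploying a custom model with Cog centers on a `cog.yaml` that declares the build environment and points at a predictor class. A minimal sketch, with placeholder versions and packages:

```yaml
# cog.yaml — minimal sketch; versions and packages are illustrative.
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.2.0"
predict: "predict.py:Predictor"
```

Here `predict.py:Predictor` names a class subclassing `cog.BasePredictor`, whose `setup()` loads the model once and whose `predict()` handles each request; pushing the packaged model to Replicate exposes it as an API endpoint.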