Replicate
Replicate is a developer platform and API for running, fine‑tuning, deploying, and scaling machine learning models. It exposes models as production-ready APIs and supports running community and private models with per-second hardware billing. Teams can deploy custom models (via Cog), fine-tune models with their data, and monitor predictions with logs and metrics.
- One-line API access: call any model with a single API request using the official SDKs (Node, Python, or plain HTTP).
- Pay-for-what-you-use billing: models are billed by runtime (per second) and by hardware type, so you only pay for the compute you use.
- Deploy custom models: package and deploy your own model with Cog to create a scalable API endpoint (see the sketch after this list).
- Fine-tuning support: train or fine-tune models on Replicate to produce custom versions for specific tasks.
- Hardware choices & scaling: choose CPU or GPU hardware (T4, L40S, A100, etc.) and scale automatically as demand increases.
- Logging & monitoring: built-in metrics and logs let teams track model performance and debug predictions.
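A Cog-packaged model is defined by a predictor class plus a `cog.yaml` that declares the environment. The sketch below is a minimal, hypothetical predictor: the class body and the "echo" logic are placeholders, but the `BasePredictor`/`Input` interface is what Cog expects.

```python
# predict.py -- a minimal Cog predictor sketch (model logic is a placeholder)
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        # Load weights or other expensive resources once, when the container starts.
        # A real model would load checkpoints from disk here.
        self.prefix = "echo: "

    def predict(self, prompt: str = Input(description="Text to process")) -> str:
        # Each prediction request to the deployed endpoint runs this method.
        return self.prefix + prompt
```

With a `cog.yaml` alongside it declaring the Python version and dependencies, `cog push` builds the container and publishes it as an API endpoint on Replicate.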
To get started, sign up on the web, obtain an API token, and use the Node or Python SDK (or the plain HTTP API) to run a published model, or deploy your own model packaged with Cog.
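For example, the Python SDK exposes a `replicate.run()` helper that takes a model identifier and an input dict. The snippet below is a minimal sketch: the model name and input fields are placeholders, and the client reads your token from the `REPLICATE_API_TOKEN` environment variable.

```python
# Minimal sketch using the official Python SDK (pip install replicate).
# Auth is taken from the REPLICATE_API_TOKEN environment variable.
import replicate

# "owner/model-name" is a placeholder -- substitute any published model
# identifier from replicate.com, optionally pinned to a version hash.
output = replicate.run(
    "owner/model-name",
    input={"prompt": "a watercolor painting of a lighthouse"},
)
print(output)
```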
Pricing and Plans
CPU (standard)
Standard CPU runtime, billed per second (example rate shown on the pricing page).
- Per-second billing by runtime
- Runs on shared CPU hardware
Nvidia T4 GPU
Nvidia T4 GPU runtime, billed per second (example rate shown on the pricing page).
- Per-second billing for GPU inference
- Lower-cost GPU option for image and model inference
Nvidia A100 (80GB) GPU
Nvidia A100 (80GB) GPU runtime, billed per second (example rate shown on the pricing page).
- High-memory GPU for large models and training
- Per-second billing, with multi-GPU options available
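Because billing is metered per second of runtime on the chosen hardware, estimating a bill is simple arithmetic. The sketch below uses made-up per-second rates purely for illustration; the actual rates and hardware names are listed on the pricing page.

```python
# Illustrative cost estimate for per-second hardware billing.
# The rates below are placeholders, NOT actual Replicate prices --
# check the pricing page for current per-second rates.
RATE_PER_SECOND = {
    "cpu": 0.0001,               # hypothetical $/s
    "nvidia-t4": 0.000225,       # hypothetical $/s
    "nvidia-a100-80gb": 0.0014,  # hypothetical $/s
}


def estimate_cost(hardware: str, seconds_per_run: float, runs: int) -> float:
    """Cost = per-second rate x runtime per prediction x number of predictions."""
    return RATE_PER_SECOND[hardware] * seconds_per_run * runs


# e.g. 10,000 predictions that each take 4 seconds on a T4
print(f"${estimate_cost('nvidia-t4', 4.0, 10_000):.2f}")
```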