Deep Infra
Deep Infra provides developer-friendly, pay-as-you-go inference APIs and hosted infrastructure to run a large catalog of machine learning models and custom LLMs at scale. The platform offers OpenAI-compatible endpoints, native DeepInfra APIs, SDKs, and streaming support so teams can migrate or integrate with existing toolchains. Deep Infra also offers dedicated GPU instances and private deployments, with SOC 2 and ISO 27001 security controls and a zero-retention policy for user data.
- OpenAI-compatible API — Use existing OpenAI-style requests and SDKs to call models hosted on Deep Infra with minimal changes.
- Model marketplace (100+ models) — Access text, embedding, image, audio, and multimodal models and choose per-model token or execution pricing.
- Custom LLM hosting — Deploy your own model on dedicated GPUs (A100, H100, H200, B200) and pay for GPU uptime with autoscaling options.
- Token- and usage-based pricing — Separate per-input and per-output token rates for hosted models, plus per-minute or per-hour execution billing for models and GPUs; all billing is pay-as-you-go.
- Security & compliance — SOC 2 and ISO 27001 certifications and a stated zero-retention policy for inputs and outputs.
- Integrations & SDKs — Official docs and SDKs (REST, Python, JavaScript), OpenAI-compatible endpoints, and integrations like LangChain and LlamaIndex.
Getting started: create an account on the web dashboard and obtain an API token, then call the OpenAI-compatible or DeepInfra-native endpoints. The docs include Python and JavaScript SDK examples, and dedicated instances or private deployments can be enabled from the dashboard when needed.
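As a minimal sketch of the OpenAI-compatible flow described above, the snippet below builds a chat-completions request payload using only the standard library. The base URL, model id, and token value are illustrative assumptions, not confirmed by this listing; substitute your own token and a model id from the Deep Infra catalog.

```python
import json

# Hypothetical values: replace with your real API token and a model id
# from the Deep Infra model catalog.
API_TOKEN = "YOUR_DEEPINFRA_TOKEN"
BASE_URL = "https://api.deepinfra.com/v1/openai"  # assumed OpenAI-compatible base URL

# Standard OpenAI-style chat-completions payload.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # example model id
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}

# To send the request, POST json.dumps(payload) to
# f"{BASE_URL}/chat/completions" with the headers above
# (e.g. via urllib.request, the `requests` library, or the openai SDK
# configured with this base URL).
print(json.dumps(payload, indent=2))
```

Because the payload follows the OpenAI chat-completions shape, existing OpenAI client code typically only needs the base URL and token swapped to target Deep Infra.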
Pricing and Plans
Token-based inference
Per-token pricing for model inference; input and output tokens are billed separately and shown per 1M tokens.
- Input tokens billed (example: $0.27 per 1M input tokens)
- Output tokens billed (example: $0.40 per 1M output tokens)
- Access to hosted model catalog and streaming
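Using the example rates quoted above ($0.27 per 1M input tokens, $0.40 per 1M output tokens), the per-request cost works out as a simple weighted sum. The rates are the listing's examples, not current prices; actual per-model rates vary.

```python
# Example rates from the pricing section above (USD per 1M tokens).
INPUT_RATE_PER_M = 0.27
OUTPUT_RATE_PER_M = 0.40

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the example rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000

# e.g. a request with 2,000 input tokens and 500 output tokens:
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")  # 2000*0.27/1e6 + 500*0.40/1e6 = $0.000740
```

Note that output tokens cost roughly 1.5x input tokens at these rates, so long completions dominate the bill for generation-heavy workloads.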
Dedicated GPU (A100 example)
Example pricing for an A100 dedicated GPU, billed per GPU-hour; other GPU types (H100, H200, B200) have different hourly rates.
- Dedicated GPU instances for custom model hosting
- Billed per GPU-hour with autoscaling options
- Suitable for private deployments and high-throughput inference
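To budget a dedicated deployment billed per GPU-hour, multiply the hourly rate by GPU count and hours of uptime. The hourly rate below is a placeholder assumption (the listing quotes no A100 figure); check the dashboard for current per-GPU prices, and note that autoscaling means billed hours can be lower than wall-clock hours.

```python
# Rough monthly estimate for a dedicated GPU deployment billed per GPU-hour.
HOURLY_RATE_USD = 1.50   # placeholder rate per GPU-hour (assumption, not a quoted price)
GPUS = 2                 # number of dedicated GPUs
HOURS_PER_MONTH = 730    # average hours in a month (8,760 h/yr ÷ 12)

monthly_cost = HOURLY_RATE_USD * GPUS * HOURS_PER_MONTH
print(f"${monthly_cost:,.2f}/month")  # 1.50 * 2 * 730 = $2,190.00/month
```

This kind of fixed GPU-hour cost is worth comparing against per-token pricing at your expected request volume: dedicated instances win when utilization is consistently high.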