Deep Infra
Cloud inference platform providing low-cost, scalable APIs and infrastructure to run, host, and deploy machine learning models and custom LLMs.
At a Glance
Pricing: Paid
About Deep Infra
Deep Infra provides developer-friendly, pay-as-you-go inference APIs and hosted infrastructure to run a large catalog of machine learning models and custom LLMs at scale. The platform offers OpenAI-compatible endpoints, native DeepInfra APIs, SDKs, and streaming support so teams can migrate or integrate with existing toolchains. Deep Infra also offers dedicated GPU instances and private deployments, with SOC 2 and ISO 27001 security controls and a zero-retention policy for user data.
- OpenAI-compatible API — Use existing OpenAI-style requests and SDKs to call models hosted on Deep Infra with minimal changes.
- Model marketplace (100+ models) — Access text, embedding, image, audio, and multimodal models and choose per-model token or execution pricing.
- Custom LLM hosting — Deploy your own model on dedicated GPUs (A100, H100, H200, B200) and pay for GPU uptime with autoscaling options.
- Token- and usage-based pricing — Separate per-input and per-output token rates for model inference, plus per-minute or per-hour execution billing for GPUs; everything is pay-as-you-go.
- Security & compliance — SOC 2 and ISO 27001 certifications and a stated zero-retention policy for inputs and outputs.
- Integrations & SDKs — Official docs and SDKs (REST, Python, JavaScript), OpenAI-compatible endpoints, and integrations like LangChain and LlamaIndex.
Getting started: create an account on the web dashboard, obtain an API token, and call the OpenAI-compatible or DeepInfra-native endpoints. The docs include Python and JavaScript SDK examples, and dedicated instances or private deployments can be enabled from the dashboard when needed.
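As a minimal sketch of what an OpenAI-style call to such an endpoint might look like, the snippet below builds the request payload without sending it. The base URL, model name, and token are placeholders chosen for illustration, not confirmed Deep Infra values:

```python
import json

# Placeholder values -- the base URL, token, and model name below are
# illustrative assumptions, not confirmed Deep Infra identifiers.
BASE_URL = "https://api.deepinfra.com/v1/openai"
API_TOKEN = "YOUR_DEEPINFRA_TOKEN"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completion request for an
    OpenAI-compatible endpoint such as Deep Infra's."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
            "stream": False,  # set True to use streaming responses
        }),
    }

req = build_chat_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!")
```

Because the request shape matches the OpenAI chat-completions schema, existing OpenAI SDKs can usually be pointed at the compatible base URL instead of hand-building requests like this.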

Pricing
Token-based inference
Per-token pricing for model inference; input and output tokens are billed separately and shown per 1M tokens.
- Input tokens billed (example: $0.27 per 1M input tokens)
- Output tokens billed (example: $0.40 per 1M output tokens)
- Access to hosted model catalog and streaming
Dedicated GPU (A100 example)
A100 dedicated GPUs are billed per GPU-hour; other GPU types (H100, H200, B200) have different hourly rates.
- Dedicated GPU instances for custom model hosting
- Billed per GPU-hour with autoscaling options
- Suitable for private deployments and high-throughput inference
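A rough monthly estimate for a dedicated instance is GPU count × hours of uptime × hourly rate. The hourly rate below is a placeholder, since the listing does not state a number:

```python
def monthly_gpu_cost(gpu_count: int, hourly_rate_usd: float,
                     hours: float = 730.0) -> float:
    """Estimate monthly cost for dedicated GPUs billed per GPU-hour.
    730 hours is roughly one month of continuous uptime."""
    return gpu_count * hours * hourly_rate_usd

# Placeholder rate: $2.00/GPU-hour is illustrative, not a Deep Infra price.
estimate = monthly_gpu_cost(2, 2.00)
```

With autoscaling, `hours` becomes the actual scaled GPU-hours consumed rather than full-month uptime, so idle periods reduce the bill accordingly.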
Capabilities
Key Features
- OpenAI-compatible API and native DeepInfra API
- 100+ hosted models across text, embedding, image, audio, and multimodal
- Custom LLM deployment on dedicated GPUs
- Per-token and per-execution billing (pay-as-you-go)
- Streaming responses and SDKs for REST, Python, JavaScript
- SOC 2 and ISO 27001 certified with zero-retention policy