
Deep Infra

Deep Infra provides developer-friendly, pay-as-you-go inference APIs and hosted infrastructure to run a large catalog of machine learning models and custom LLMs at scale. The platform offers OpenAI-compatible endpoints, native DeepInfra APIs, SDKs, and streaming support so teams can migrate or integrate with existing toolchains. Deep Infra also offers dedicated GPU instances and private deployments, with SOC 2 and ISO 27001 security controls and a zero-retention policy for user data.

  • OpenAI-compatible API — Use existing OpenAI-style requests and SDKs to call models hosted on Deep Infra with minimal changes.
  • Model marketplace (100+ models) — Access text, embedding, image, audio, and multimodal models and choose per-model token or execution pricing.
  • Custom LLM hosting — Deploy your own model on dedicated GPUs (A100, H100, H200, B200) and pay for GPU uptime with autoscaling options.
  • Token- and usage-based pricing — Per-input and per-output token pricing and per-minute / per-hour execution billing for models and GPUs; billing is pay-as-you-go.
  • Security & compliance — SOC 2 and ISO 27001 certifications and a stated zero-retention policy for inputs and outputs.
  • Integrations & SDKs — Official docs and SDKs (REST, Python, JavaScript), OpenAI-compatible endpoints, and integrations like LangChain and LlamaIndex.

Getting started: create an account on the web dashboard, obtain an API token, and call the OpenAI-compatible or DeepInfra-native endpoints; use the docs and SDKs for Python/JS examples and enable dedicated instances or private deployments via the dashboard when needed.
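The getting-started flow above can be sketched as a single OpenAI-style request. This is a minimal illustration, not official sample code: the base URL, model ID, and token placeholder are assumptions drawn from Deep Infra's stated OpenAI compatibility, so check the docs for current values. The request is only built here, not sent, since sending requires a real API token.

```python
import json
import urllib.request

# Assumed OpenAI-compatible base URL and model ID; verify against the docs.
BASE_URL = "https://api.deepinfra.com/v1/openai"
API_TOKEN = "YOUR_DEEPINFRA_TOKEN"  # obtained from the web dashboard

def build_chat_request(model: str, prompt: str,
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # set True for server-sent-event streaming
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/Meta-Llama-3-8B-Instruct", "Hello!")
# urllib.request.urlopen(req) would send it; omitted here without a real token.
```

Because the endpoint is OpenAI-compatible, the official `openai` Python SDK should also work by pointing its `base_url` at the same address, which is usually the smaller change for teams migrating an existing toolchain.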


Developer

Deep Infra builds low-latency, cost-efficient inference infrastructure and developer APIs for running modern machine learning models.

Pricing and Plans

(Paid)

Token-based inference

From $0.27 per 1M input tokens (example rate)

Per-token pricing for model inference; input and output tokens are billed separately and shown per 1M tokens.

  • Input tokens billed (example: $0.27 per 1M input tokens)
  • Output tokens billed (example: $0.40 per 1M output tokens)
  • Access to hosted model catalog and streaming
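Under the example rates above, the cost of a single request is a straightforward per-token calculation. The sketch below assumes those illustrative rates; actual rates vary per model.

```python
# Cost estimate using the example rates above ($0.27 per 1M input tokens,
# $0.40 per 1M output tokens). Rates are illustrative and vary per model.
def token_cost(input_tokens: int, output_tokens: int,
               input_rate: float = 0.27, output_rate: float = 0.40) -> float:
    """Return the USD cost of one request; rates are per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. a 2,000-token prompt with an 800-token reply:
cost = token_cost(2_000, 800)  # ~$0.00086
```

Because input and output are billed separately, long prompts and long completions contribute to cost at different rates.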

Dedicated GPU (A100 example)

From $0.89 per GPU-hour (example rate)

Example price for A100 dedicated GPU per GPU-hour; other GPU types (H100, H200, B200) have different hourly rates.

  • Dedicated GPU instances for custom model hosting
  • Billed per GPU-hour with autoscaling options
  • Suitable for private deployments and high-throughput inference
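Dedicated-instance billing is a function of GPU uptime rather than tokens. A rough sketch, assuming the example A100 rate above (other GPU types bill at different hourly rates):

```python
# GPU uptime cost under the example A100 rate above ($0.89 per GPU-hour);
# H100, H200, and B200 instances have different hourly rates.
def gpu_cost(gpu_hours: float, hourly_rate: float = 0.89,
             num_gpus: int = 1) -> float:
    """Return the USD cost of a dedicated deployment billed per GPU-hour."""
    return gpu_hours * hourly_rate * num_gpus

# e.g. a 2-GPU deployment kept up for 24 hours:
daily = gpu_cost(24, num_gpus=2)  # ~$42.72
```

With autoscaling enabled, `gpu_hours` reflects actual scaled uptime rather than wall-clock time, so sustained high-throughput workloads are the case where dedicated instances beat per-token pricing.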

System Requirements

Operating System
Any OS with a modern web browser
Memory (RAM)
4 GB+ RAM
Processor
Any modern 64-bit CPU
Disk Space
No local storage required (cloud-based)

AI Capabilities

Text Generation
Embeddings
Reranker
Text To Image
Text To Speech
Text To Video
Automatic Speech Recognition
Zero-shot Image Classification