inference.sh
An agent runtime platform that handles durable execution, tool orchestration, observability, and infrastructure so developers can run reliable AI agents in production.
About inference.sh
inference.sh is an agent runtime platform that eliminates the infrastructure burden of running AI agents in production. It provides durable execution, 150+ pre-built tool integrations, real-time observability, and human-in-the-loop controls so developers can focus on what their agents do rather than how to keep them running. The platform supports no-code, low-code, and full API workflows, making it accessible to builders at every level. It is built around a trust-first philosophy where every action is traceable, failures are graceful, and automation is never a black box.
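The trust-first behavior described above (approval-gated actions, traced tool calls, failures that are recorded rather than lost) can be illustrated with a minimal Python sketch. It is not the inference.sh SDK. Tool, Runner, TraceEvent, requires_approval and the approve callback are hypothetical names chosen for illustration, and the approval decision is a plain callback standing in for a human reviewer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical names throughout; this is NOT the inference.sh SDK.


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    requires_approval: bool = False  # a single flag that adds an approval gate


@dataclass
class TraceEvent:
    tool: str
    args: dict
    status: str  # "ok", "rejected", or "error"
    result: Any = None


@dataclass
class Runner:
    approve: Callable[[str, dict], bool]  # consulted before any gated tool runs
    trace: list[TraceEvent] = field(default_factory=list)

    def call(self, tool: Tool, **args: Any) -> Any:
        # Pause at the gate: show the intended action and wait for a decision.
        if tool.requires_approval and not self.approve(tool.name, args):
            self.trace.append(TraceEvent(tool.name, args, "rejected"))
            return None
        try:
            result = tool.fn(**args)
        except Exception as exc:
            # Record the failure in the trace instead of losing it, then re-raise.
            self.trace.append(TraceEvent(tool.name, args, "error", repr(exc)))
            raise
        self.trace.append(TraceEvent(tool.name, args, "ok", result))
        return result


if __name__ == "__main__":
    search = Tool("search", lambda query: f"results for {query}")
    send_email = Tool("send_email", lambda to, body: "sent", requires_approval=True)

    # Stand-in reviewer: only approve emails to one domain.
    runner = Runner(approve=lambda name, args: args.get("to", "").endswith("@example.com"))
    runner.call(search, query="quarterly report")
    runner.call(send_email, to="someone@elsewhere.org", body="hi")
    for event in runner.trace:
        print(event.tool, event.status)  # prints "search ok", then "send_email rejected"
```

Every call, including a rejected or failed one, leaves a TraceEvent behind, so the trace doubles as an audit log of what the agent attempted.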
- Durable Execution: Event-driven, checkpoint-based execution ensures agents resume from the last successful step after failures, timeouts, or restarts — no lost state.
- Tool Orchestration: Access 150+ apps as agent tools via a single API, with structured execution, approval gates, and full visibility into what ran.
- Observability: Every tool call, decision, and action is automatically traced and streamed in real time — no instrumentation required.
- Human-in-the-Loop: Add approval gates with a single flag; agents pause, show their intended action, and wait for confirmation before proceeding.
- Deep Agents (Sub-Agents): Orchestrator agents can spawn specialist sub-agents as tools, delegating tasks and collecting structured results back up the chain.
- Dynamic Widgets: Agents generate interactive UI elements — forms, charts, selections — rendered inline in the chat interface.
- Pay-Per-Execution Pricing: Credits-based model with no idle costs; tiers unlock automatically based on cumulative usage.
- Custom App Creation: Scaffold, code, and deploy your own apps using the CLI and Python or JavaScript SDKs; schemas automatically become tool parameters.
- Visual Workflow Builder: Drag-and-drop flow editor chains apps into multi-step pipelines, deployable as a single callable app.
- Real OAuth Integrations: Durable, encrypted integrations with Google, Slack, Discord, X.com, Microsoft, Salesforce, Notion, and more — with automatic token refresh.
- Bring Your Own Keys (BYOK): Use your own GCP, Azure, or AWS billing and credits for AI models.
- Agentic Payments (x402): Managed wallets and budget controls let agents make programmatic payments autonomously via the x402 protocol.
- Self-Hosted Option: Deploy inference.sh in your own VPC or on-premises for maximum data control and privacy.
- Python & JavaScript SDKs: Install with pip install inferencesh (Python) or npm i @inferencesh/sdk (JavaScript) to create, manage, and run agents fully programmatically.
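Checkpoint-based durable execution, as described in the feature list, can be approximated in a few lines. The sketch below illustrates the general technique under stated assumptions and is not inference.sh's implementation: run_durably and the JSON checkpoint file are hypothetical, step outputs must be JSON-serializable, and a production runtime would persist to a database or event log rather than a local file.

```python
import json
from pathlib import Path
from typing import Any, Callable

# Minimal sketch of checkpoint-based durable execution (hypothetical, not the
# inference.sh API). Each step's output is saved after it succeeds; on restart,
# completed steps are skipped and execution resumes at the first unfinished step.

Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_durably(steps: list[Step], checkpoint_file: Path) -> dict[str, Any]:
    state: dict[str, Any] = {}
    if checkpoint_file.exists():
        state = json.loads(checkpoint_file.read_text())

    for name, fn in steps:
        if name in state:
            continue  # already completed in an earlier run
        state[name] = fn(state)  # may raise; earlier checkpoints are kept
        # Write to a temporary file, then replace the checkpoint in one step
        # so a crash mid-write cannot leave a half-written checkpoint behind.
        tmp = checkpoint_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(checkpoint_file)
    return state
```

If a step raises (for example a timeout), the checkpoint still holds every earlier result, so calling run_durably again with the same file skips those steps and retries from the one that failed. The write-then-replace pattern relies on os.replace semantics, which are atomic on POSIX filesystems.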
Pricing
Free Plan Available
Entry-level tier with base concurrency and standard result storage. Unlocked automatically on sign-up.
- Base concurrent agents
- Base concurrent API calls
- Standard result storage
Growth
Higher concurrency and extended storage. Unlocks automatically based on cumulative usage, or by contacting sales.
- More concurrent agents
- More concurrent API calls
- Extended result storage
- BYOK (own API keys)
- Team workspaces
- Private apps
- Priority queue
- Custom integrations
- Priority support
Scale
Highest concurrency and maximum storage for large-scale production workloads.
- Highest concurrent agents
- Highest concurrent API calls
- Maximum result storage
- BYOK (own API keys)
- Team workspaces
- Private apps
- Priority queue
- Custom integrations
- Priority support
Enterprise
Custom concurrency, pooled credits, SSO/SAML, audit logs, self-hosted deployment, and dedicated support with SLAs.
- Custom concurrency
- Pooled credits
- SSO/SAML
- Audit logs
- Self-hosted deployment
- Dedicated support
- SLAs
Capabilities
Key Features
- Durable execution with checkpoint-based state persistence
- 150+ pre-built tool integrations via single API
- Real-time observability and automatic tracing
- Human-in-the-loop approval gates
- Deep agents / sub-agent orchestration
- Dynamic inline UI widgets
- Visual drag-and-drop workflow builder
- Custom app creation with CLI
- Python and JavaScript SDKs
- Real OAuth integrations with token refresh
- Bring Your Own Keys (BYOK)
- Agentic payments via x402 protocol
- Self-hosted / on-premises deployment
- Pay-per-execution credits model
- Webhooks and async callbacks
- Built-in key-value memory per conversation
- Multi-step planning with interruption resume
- Structured output for orchestrators
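Sub-agent delegation, structured results for the orchestrator, and per-conversation key-value memory fit together roughly as follows. This is a hypothetical Python sketch rather than the platform's API: Orchestrator, SubAgentResult, delegate, remember and recall are illustrative names, and each sub-agent is a plain function standing in for a model-backed specialist.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical sketch, not the inference.sh SDK: sub-agents are exposed to an
# orchestrator as tools that return structured results, and each conversation
# gets its own key-value memory.


@dataclass
class SubAgentResult:
    agent: str
    ok: bool
    output: str


@dataclass
class Orchestrator:
    sub_agents: dict[str, Callable[[str], str]]
    # conversation_id -> {key: value}
    memory: dict[str, dict[str, str]] = field(default_factory=dict)

    def remember(self, conversation_id: str, key: str, value: str) -> None:
        self.memory.setdefault(conversation_id, {})[key] = value

    def recall(self, conversation_id: str, key: str) -> Optional[str]:
        return self.memory.get(conversation_id, {}).get(key)

    def delegate(self, agent: str, task: str) -> SubAgentResult:
        try:
            return SubAgentResult(agent, True, self.sub_agents[agent](task))
        except Exception as exc:
            # An unknown or failing specialist becomes a structured failure
            # instead of crashing the orchestrator.
            return SubAgentResult(agent, False, repr(exc))

    def run(self, conversation_id: str, plan: list[tuple[str, str]]) -> list[SubAgentResult]:
        results = []
        for agent, task in plan:
            result = self.delegate(agent, task)
            if result.ok:
                # Only successful outputs are stored in this conversation's memory.
                self.remember(conversation_id, agent, result.output)
            results.append(result)
        return results
```

Returning a SubAgentResult with an ok flag, instead of letting exceptions propagate, lets the orchestrator inspect every specialist's outcome and decide how to proceed, while memory stays isolated per conversation_id.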
