BentoML
AI inference platform for deploying, scaling, and optimizing any ML model in production with full control over infrastructure.
About BentoML
BentoML is an AI inference platform designed for speed and control, enabling teams to deploy any model anywhere with tailored optimization, efficient scaling, and streamlined operations. The platform offers both a managed cloud service (Bento Inference Platform) and an open-source framework for serving AI/ML models and custom inference pipelines in production.
BentoML simplifies inference infrastructure while providing full control over deployments, supporting popular open-source models like Llama, DeepSeek, Flux, and Qwen, as well as custom fine-tuned models across any architecture, framework, or modality.
- Open Model Catalog allows deploying popular open-source models with just a few clicks, including day-one access to newly released models.
- Custom Model Serving provides a unified framework for packaging and deploying models using vLLM, TRT-LLM, JAX, SGLang, PyTorch, and Transformers; see the service sketch after this list.
- Tailored Optimization automatically finds a serving configuration that meets your latency, throughput, or cost targets, with advanced performance tuning and distributed LLM inference across multiple GPUs.
- Smart Scaling features intelligent auto-scaling that adapts to inference-specific metrics and patterns, with blazing-fast cold starts and scale-to-zero capabilities.
- Advanced Serving Patterns support interactive applications, async long-running tasks (illustrated by the task endpoint in the sketch below), large-scale batch inference, and complex workflow orchestration for RAG and compound AI systems.
- Dev Codespace lets you iterate in the cloud as quickly as locally, spinning up cloud GPU runs in seconds.
- LLM Gateway provides a unified interface for all LLM providers with centralized cost control and optimization.
- Full Observability offers comprehensive monitoring including compute and performance tracking, LLM-specific metrics, and system health monitoring.
- Enterprise Features include self-hosting on any cloud or on-premises, SOC 2 Type II and ISO 27001 compliance, HIPAA support, SSO, audit logs, and dedicated support engineering.
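To make the custom-serving and task-endpoint points concrete, here is a minimal sketch of a BentoML service, assuming the library's 1.2+ Python API. The `Summarizer` class and the Transformers summarization pipeline are illustrative choices for this example, not part of the platform description above.

```python
import bentoml

# A minimal sketch of a custom BentoML service, assuming the 1.2+ Python API.
# The Summarizer class and the Transformers summarization pipeline are
# illustrative choices, not prescribed by the platform.
@bentoml.service(
    resources={"gpu": 1},      # ask the platform for one GPU
    traffic={"timeout": 60},   # per-request timeout, in seconds
)
class Summarizer:
    def __init__(self) -> None:
        # The model loads once per worker at startup, not once per request.
        from transformers import pipeline
        self.pipeline = pipeline("summarization")

    @bentoml.api
    def summarize(self, text: str) -> str:
        # Each @bentoml.api method is exposed as an HTTP endpoint.
        return self.pipeline(text)[0]["summary_text"]

    @bentoml.task
    def summarize_batch(self, texts: list[str]) -> list[str]:
        # @bentoml.task marks a long-running endpoint that clients submit
        # and poll, matching the async task pattern described above.
        return [self.pipeline(t)[0]["summary_text"] for t in texts]
```

The same definition serves locally (`bentoml serve`) and deploys unchanged to the managed platform.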
To get started, sign up for the Starter plan with free compute credits to prototype and test deployments. Use the BentoML open-source library to package your models, then deploy to the cloud with automatic scaling and monitoring. For enterprise needs, contact the team for custom SLAs and bring-your-own-cloud options.
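As a concrete starting point, here is a minimal sketch of calling a running service with BentoML's HTTP client. The URL and endpoint name assume the hypothetical `Summarizer` service sketched above, served locally via `bentoml serve`.

```python
import bentoml

# Call the hypothetical Summarizer service sketched above.
# Swap the URL for your deployment's endpoint once it runs in the cloud.
client = bentoml.SyncHTTPClient("http://localhost:3000")

summary = client.summarize(
    text="BentoML is an AI inference platform for deploying, scaling, "
    "and optimizing ML models in production."
)
print(summary)
```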

Pricing
Free Trial (until credits are exhausted)
Full access to the Bento Inference Platform with a one-time free compute credit
- Deploy open-source LLMs
- Deploy custom models with BentoML
- Spin up GPUs and test deployments
Starter
Learn and prototype with no up-front commitment
- Dedicated deployments
- Pay only for the compute you use
- Fast cold start and auto-scaling
- SOC 2 Type II compliant
- Monitoring and logging dashboard
- Community Slack support
Scale
Cost-efficient scaling for growing workloads with committed use discount
- Priority access to H100, H200 and more
- Unlimited seats and deployments
- Dedicated compute pool and cold-start guarantee
- Region selection
- Dedicated Slack channel
Enterprise
Full control and dedicated support in your environment
- Full control in your VPC or on-prem
- Tailored performance research and tuning
- Custom SLAs
- Use existing cloud commitments
- Full control over data and network policies
- Multi-cloud, hybrid compute orchestration
- Audit logs, SSO, compliance evidence kit
- Dedicated support engineering
Capabilities
Key Features
- Open model catalog with one-click deployment
- Custom model serving across any framework
- Automatic performance optimization
- Intelligent auto-scaling with scale-to-zero
- Distributed LLM inference across multiple GPUs
- Dev codespace for cloud iteration
- LLM Gateway for unified API access
- Comprehensive observability and monitoring
- Deployment automation and CI/CD
- Canary, shadow, and A/B testing
- Multi-cloud and hybrid compute orchestration
- Cross-region scaling
- Cold-start acceleration
- Batch inference processing
- SOC 2 Type II compliance
- HIPAA compliance
- SSO and audit logs