Inworld AI

Name: Inworld AI
Availability: OnlineOnly
Author: Inworld AI

Production-grade voice AI APIs offering top-ranked text-to-speech, speech-to-speech, speech-to-text, and LLM routing for developers building natural conversational applications.

At a Glance

Pricing

Free tier available

For evaluation and prototyping: pay-as-you-go usage with up to 40 minutes of TTS included free.

Creator: $25/mo

Developer: $300/mo

Growth: $1,500/mo

Enterprise: custom pricing

Available On

API

Web

Inworld AI | Mountain View, CA | Est. 2021 | $125M raised

Listed May 2026

About Inworld AI

Inworld AI provides production-grade voice AI APIs ranked #1 on the Artificial Analysis Speech Arena, offering realtime text-to-speech, speech-to-speech, speech-to-text, and intelligent LLM routing. The platform delivers sub-130ms first-chunk latency and supports over 100 languages, making it suitable for companions, agentic workforces, learning platforms, health and wellness apps, and interactive media. Developers access all capabilities through a unified API with SOC2 Type II, HIPAA, and GDPR compliance built in.

Realtime TTS: Top-ranked text-to-speech with sub-130ms first-chunk latency, starting at $15 per 1M characters; supports voice cloning from 15 seconds of audio, text-based voice design, advanced inline voice direction, and cross-lingual output in 100+ languages.
Realtime Speech-to-Speech API: End-to-end full-duplex audio streaming over WebSocket or WebRTC, with custom voices, tool calling, intelligent turn detection, and dynamic mid-session context management.
Realtime STT: Speech-to-text with real-time voice profiling (emotion, age, accent, pitch, style), semantic and acoustic voice activity detection (VAD), word-level timestamps, speaker diarization, and custom vocabulary support.
Realtime LLM Router: A single API that routes requests across 200+ models from providers including OpenAI, Anthropic, Google, xAI, Groq, and Mistral, with built-in failover, A/B testing, and user-aware and context-aware routing, with no added latency.
Voice Cloning & Design: Clone a voice from 15 seconds of audio, or describe a voice in natural language to generate a production-ready custom voice without any recording.
Advanced Voice Direction: Add bracketed instructions anywhere in the text to adjust tone, speed, volume, vocal style, and pauses in real time.
Enterprise Security: SOC2 Type II certified, HIPAA compliant, and GDPR compliant; optional zero data retention, on-prem deployment, and EU/India data residency are available.
Credit-Based Billing: Monthly credits are usable across TTS, STT, and LLMs; higher tiers unlock volume discounts of up to 40% off standard rates.
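
To make the "built-in failover" idea in the LLM Router entry concrete, here is a minimal, generic sketch of ordered failover across providers. It is not Inworld's implementation or API: route_with_failover, flaky_provider, and backup_provider are hypothetical names, and the provider callables are local stand-ins rather than real network clients.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def route_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, callable) pair in order; return the first success.

    Returns (provider_name, reply). Hypothetical helper for illustration only.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real model providers.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated timeout")


def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"


used, reply = route_with_failover(
    "hello",
    [("primary", flaky_provider), ("backup", backup_provider)],
)
print(used, reply)
```

A hosted router would presumably apply this kind of logic server-side, so the client makes a single request; the sketch only shows the control flow that "failover" implies.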

Pricing

On-Demand

Evaluation and prototyping with pay-as-you-go usage and up to 40 minutes of TTS included free.

Free

Up to 40 min TTS included
5 custom voices
Voice cloning & voice design
Realtime API access
220+ LLM models via Router

Creator

Content creation and small projects with $25 in monthly credits.

$25

per month

$25 in credits per month
100 custom voices
Audio downloads
40K chars per TTS Playground request
Workspace creation & sharing
Everything in On-Demand

Developer

Production applications with $300 in monthly credits and up to 20% off rates.

$300

per month

$300 in credits per month
Up to 20% off rates
1,000 custom voices
Increased concurrency limits
Workspace creation and sharing
Priority email support
Everything in Creator

Growth

Large deployments and compliance with $1,500 in monthly credits and up to 40% off rates.

$1,500

per month

$1,500 in credits per month
Up to 40% off rates
3,000 custom voices
Higher API concurrency & limits
Professional voice cloning (add-on)
ZDR, HIPAA & BAA (add-ons)
Everything in Developer

Enterprise

Custom pricing, limits, and terms for the highest-volume deployments.

Custom

contact sales

As low as $10 per 1M characters for Realtime TTS-2 & 1.5 Max and $5 per 1M characters for 1.5 Mini
Custom limits
SLA & DPA
On-prem deployment
EU & India data residency
Dedicated AM & Slack channel
Everything in Growth
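
As a rough worked example of how monthly credits and tier discounts interact, the sketch below estimates TTS spend from a character count. It uses the $15 per 1M characters starting rate from this listing and assumes, purely for illustration, that a tier's maximum discount (20% on Developer, 40% on Growth) applies uniformly to that rate; the listing only says "up to", so actual rates may differ. The function names are hypothetical.

```python
from decimal import Decimal

# From the listing: Realtime TTS starts at $15 per 1M characters.
BASE_TTS_RATE_PER_MILLION = Decimal("15")


def tts_cost(characters: int, discount_percent: int = 0) -> Decimal:
    """Estimated TTS cost in dollars, assuming a flat percentage discount."""
    rate = BASE_TTS_RATE_PER_MILLION * Decimal(100 - discount_percent) / Decimal(100)
    cost = rate * Decimal(characters) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


def credits_remaining(monthly_credits: int, characters: int, discount_percent: int) -> Decimal:
    """Credits left after TTS usage; a negative value means overage."""
    return Decimal(monthly_credits) - tts_cost(characters, discount_percent)


# Developer tier: $300 in credits, assumed 20% discount, 10M characters.
developer_cost = tts_cost(10_000_000, 20)
developer_left = credits_remaining(300, 10_000_000, 20)

# Growth tier: $1,500 in credits, assumed 40% discount, 30M characters.
growth_cost = tts_cost(30_000_000, 40)
growth_left = credits_remaining(1500, 30_000_000, 40)

print(developer_cost, developer_left)  # 120.00 180.00
print(growth_cost, growth_left)        # 270.00 1230.00
```

Because credits are shared across TTS, STT, and LLM usage, a fuller estimate would add those costs too; their rates are not given in this listing, so they are left out here.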

Capabilities

Key Features

Realtime text-to-speech (TTS)
Speech-to-speech API
Speech-to-text (STT)
LLM routing across 200+ models
Voice cloning from 15 seconds of audio
Text-based voice design
Advanced inline voice direction
Cross-lingual support (100+ languages)
Full-duplex WebSocket/WebRTC streaming
Intelligent turn detection
Function calling mid-session
Voice profiling (emotion, age, accent, pitch, style)
Word-level timestamps and speaker diarization
Custom vocabulary support
User-aware and context-aware LLM routing
Built-in A/B testing and failover
SOC2 Type II, HIPAA, GDPR compliance
Zero data retention (add-on)
On-prem deployment (Enterprise)
EU and India data residency (Enterprise)

Integrations

OpenAI

Anthropic

Google

xAI

Groq

Mistral

WebSocket

WebRTC

API Available
