Inworld AI
Production-grade voice AI APIs offering top-ranked text-to-speech, speech-to-speech, speech-to-text, and LLM routing for developers building natural conversational applications.
At a Glance
About Inworld AI
Inworld AI provides production-grade voice AI APIs ranked #1 on the Artificial Analysis Speech Arena, offering realtime text-to-speech, speech-to-speech, speech-to-text, and intelligent LLM routing. The platform delivers sub-130ms first-chunk latency and supports over 100 languages, which suits it to companions, agentic workforces, learning platforms, health and wellness apps, and interactive media. Developers access all capabilities through a unified API. The platform is SOC2 Type II certified and HIPAA and GDPR compliant, with HIPAA and BAA coverage sold as add-ons on the Growth tier.
- Realtime TTS — Top-ranked text-to-speech with sub-130ms latency, starting at $15/1M characters; supports voice cloning from 15 seconds of audio, text-based voice design, advanced inline voice direction, and cross-lingual output in 100+ languages.
- Realtime Speech-to-Speech API — End-to-end full-duplex audio streaming over WebSocket or WebRTC with custom voices, tool calling, intelligent turn detection, and dynamic context management mid-session.
- Realtime STT — Speech-to-text with real-time voice profiling (emotion, age, accent, pitch, style), semantic and acoustic VAD, word-level timestamps, speaker diarization, and custom vocabulary support.
- Realtime LLM Router — Single API that routes requests across OpenAI, Anthropic, Google, xAI, Groq, Mistral, and 200+ models with built-in failover, A/B testing, user-aware and context-aware routing, and no added latency.
- Voice Cloning & Design — Clone a voice from 15 seconds of audio or describe a voice in natural language to generate a production-ready custom voice without recording.
- Advanced Voice Direction — Add bracketed instructions anywhere in text to adjust tone, speed, volume, vocal style, and pauses in real time.
- Enterprise Security — SOC2 Type II certified, HIPAA compliant, GDPR compliant; optional zero data retention, on-prem deployment, and EU/India data residency available.
- Credit-Based Billing — Monthly credits usable across TTS, STT, and LLMs; higher tiers unlock volume discounts up to 40% off standard rates.
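To make the "built-in failover" idea in the LLM Router bullet concrete, here is a minimal client-side sketch of the general technique: try providers in priority order and fall through to the next one on error. It does not use or represent Inworld's API. The function `route_with_failover`, the exception `AllProvidersFailed`, and the stand-in providers `flaky` and `healthy` are all hypothetical names invented for illustration.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the priority list has errored."""


def route_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return the first success.

    Hypothetical helper: a real router would also handle timeouts,
    retries, and health checks, which are omitted here.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers (assumptions, not real model clients).
def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = route_with_failover(
    "hello", [("primary", flaky), ("backup", healthy)]
)
print(name, reply)  # prints "backup echo: hello"
```

A managed router moves this loop server-side, so application code makes one call and never sees the fallback.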
Pricing
On-Demand
Evaluation and prototyping with pay-as-you-go usage and up to 40 minutes of TTS included free.
- Up to 40 min TTS included
- 5 custom voices
- Voice cloning & voice design
- Realtime API access
- 220+ LLMs via Router
Creator
Content creation and small projects with $25 in monthly credits.
- $25 in credits per month
- 100 custom voices
- Audio downloads
- 40K chars per TTS Playground request
- Workspace creation & sharing
- Everything in On-Demand
Developer
Production applications with $300 in monthly credits and up to 20% off rates.
- $300 in credits per month
- Up to 20% off rates
- 1,000 custom voices
- Increased concurrency limits
- Workspace creation and sharing
- Priority email support
- Everything in Creator
Growth
Large deployments and compliance with $1,500 in monthly credits and up to 40% off rates.
- $1,500 in credits per month
- Up to 40% off rates
- 3,000 custom voices
- Higher API concurrency & limits
- Professional voice cloning (add-on)
- Zero data retention (ZDR), HIPAA & BAA (add-ons)
- Everything in Developer
Enterprise
Custom pricing, limits, and terms for the highest-volume deployments.
- As low as $10/1M characters for Realtime TTS-2 & 1.5 Max and $5/1M characters for 1.5 Mini
- Custom limits
- SLA & DPA
- On-prem deployment
- EU & India data residency
- Dedicated account manager & Slack channel
- Everything in Growth
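The per-character rates and tier discounts above can be turned into a rough monthly estimate. The sketch below assumes the $15/1M-character starting TTS rate quoted earlier and assumes a tier discount applies uniformly to that rate. Neither assumption is confirmed by the listing, and actual billing may differ by model and tier. The helper names `tts_cost_usd` and `credits_remaining` are hypothetical.

```python
from decimal import Decimal


def tts_cost_usd(
    characters: int,
    rate_per_million: Decimal = Decimal("15"),  # assumed list rate, USD
    discount_pct: Decimal = Decimal("0"),       # assumed uniform tier discount
) -> Decimal:
    """Estimate TTS spend for a character count, rounded to cents."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    if not (Decimal("0") <= discount_pct <= Decimal("100")):
        raise ValueError("discount_pct must be between 0 and 100")
    gross = Decimal(characters) / Decimal(1_000_000) * rate_per_million
    net = gross * (Decimal("100") - discount_pct) / Decimal("100")
    return net.quantize(Decimal("0.01"))


def credits_remaining(monthly_credits: Decimal, cost: Decimal) -> Decimal:
    """Credits left after spend, floored at zero (overage not modeled)."""
    return max(monthly_credits - cost, Decimal("0"))


# 2M characters at the assumed list rate, then with a 20% discount.
list_cost = tts_cost_usd(2_000_000)
discounted = tts_cost_usd(2_000_000, discount_pct=Decimal("20"))
print(list_cost, discounted)  # prints "30.00 24.00"
print(credits_remaining(Decimal("300"), discounted))  # prints "276.00"
```

`Decimal` is used instead of floats so the cent rounding is exact. STT and LLM usage draw on the same credits and would need their own rates.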
Capabilities
Key Features
- Realtime text-to-speech (TTS)
- Speech-to-speech API
- Speech-to-text (STT)
- LLM routing across 200+ models
- Voice cloning from 15 seconds of audio
- Text-based voice design
- Advanced inline voice direction
- Cross-lingual support (100+ languages)
- Full-duplex WebSocket/WebRTC streaming
- Intelligent turn detection
- Function calling mid-session
- Voice profiling (emotion, age, accent, pitch, style)
- Word-level timestamps and speaker diarization
- Custom vocabulary support
- User-aware and context-aware LLM routing
- Built-in A/B testing and failover
- SOC2 Type II, HIPAA, GDPR compliance
- Zero data retention (add-on)
- On-prem deployment (Enterprise)
- EU and India data residency (Enterprise)
