Cartesia Sonic
Ultra-low latency text-to-speech model with a 90ms time-to-first-audio, designed for real-time voice AI applications and voice agents.
About Cartesia Sonic
Cartesia Sonic is a text-to-speech (TTS) model that generates voice with a time-to-first-audio of just 90ms. Built for fluid, real-time voice AI experiences, Sonic powers voice agents, customer service applications, localization, and interactive conversational systems. Sonic-3 is the current flagship TTS model, and the surrounding platform adds Ink-Whisper for speech-to-text and Line for voice agent development.
- Ultra-Low Latency TTS: Sonic-3 delivers a 90ms time-to-first-audio, which Cartesia describes as industry-leading, so real-time conversations and voice interactions proceed without perceptible delay.
- Voice Cloning: Offers instant voice cloning (Pro and above) and professional voice cloning (Startup and above) for creating high-fidelity custom voice profiles.
- Voice Changer: Transforms existing audio by altering its voice characteristics in real time.
- Multilingual Support: Language support for global applications, including localization across Asia Pacific, Europe, Latin America, the Middle East, and more.
- Voice Library: A curated collection of pre-built voices that can be used in applications immediately, without custom training.
- Design a Voice: Create custom voice profiles tailored to specific brand requirements and use cases.
- Infilling: Generates in-between audio that joins surrounding segments with natural, seamless transitions.
- Line Voice Agent Platform: Takes voice agents from a first prototype to production-ready deployment, with an SDK, CLI, telephony integration, call analytics, and observability tools.
- Ink Speech-to-Text: Ink-Whisper, which Cartesia describes as the fastest streaming speech-to-text model, is offered at competitive pricing and complements the TTS models for end-to-end voice AI workflows.
- API Access: A REST API for TTS, with concurrent-request limits that scale from 2 (Free) to custom limits (Enterprise).
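To check the 90ms time-to-first-audio figure against your own network path, you can time how long a streaming response takes to yield its first non-empty audio chunk. The sketch below assumes only that your client exposes the stream as an iterable of byte chunks; `fake_stream` is a hypothetical stand-in, not part of any Cartesia SDK, so swap in your real streaming client.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, all audio bytes).

    The timer starts when iteration begins, so pass a lazily created stream
    (one that sends its request on first iteration) to include request time.
    """
    start = time.perf_counter()
    first_audio_at: Optional[float] = None
    buffer = bytearray()
    for chunk in chunks:
        # Record the latency of the first chunk that actually carries audio.
        if chunk and first_audio_at is None:
            first_audio_at = time.perf_counter() - start
        buffer.extend(chunk)
    if first_audio_at is None:
        raise ValueError("stream produced no audio")
    return first_audio_at, bytes(buffer)


def fake_stream(delay_s: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS stream: waits, then yields silence."""
    time.sleep(delay_s)  # simulated time before the first audio arrives
    for _ in range(n_chunks):
        yield b"\x00" * 320
```

For example, `measure_time_to_first_audio(fake_stream(0.09, 5))` returns a latency of at least 0.09 seconds together with 1,600 bytes of audio. Averaging several runs gives a more realistic picture than a single measurement.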
To get started, sign up for a free account to receive 20K credits for models and $1 prepaid for agents. Upgrade to Pro for commercial use and instant voice cloning, or choose the Startup or Scale plan for team collaboration, higher concurrency limits, and professional voice cloning.
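Because each plan caps concurrent TTS requests (2 on Free, 3 on Pro, 5 on Startup, and 15 on Scale, per the plan details on this page), a client that fans out many requests can enforce the cap locally instead of relying on the server to reject the excess. Here is a minimal sketch using `asyncio.Semaphore`, assuming a hypothetical async `synthesize(text)` coroutine that wraps whatever TTS call you actually use.

```python
import asyncio
from typing import Awaitable, Callable, List

# TTS concurrent-request limits as listed on this page (Enterprise is custom).
TTS_CONCURRENCY = {"free": 2, "pro": 3, "startup": 5, "scale": 15}


async def synthesize_all(
    texts: List[str],
    synthesize: Callable[[str], Awaitable[bytes]],  # hypothetical TTS wrapper
    plan: str,
) -> List[bytes]:
    """Run `synthesize` over `texts` without exceeding the plan's concurrency."""
    limit = asyncio.Semaphore(TTS_CONCURRENCY[plan])

    async def guarded(text: str) -> bytes:
        async with limit:  # at most TTS_CONCURRENCY[plan] requests in flight
            return await synthesize(text)

    # gather returns results in the same order as `texts`.
    return list(await asyncio.gather(*(guarded(t) for t in texts)))
```

A semaphore keeps every request queued locally until a slot frees up, which is simpler than retrying on rate-limit errors; for very large batches you may also want to bound how many coroutines are created at once.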
Pricing
Free
Get introduced to ultra-low latency voice AI through core models and your own voice agent
- 20K credits for models
- $1 prepaid for agents
- Personal use
- Discord support
- Sonic-3 API access
Pro
Upgrade for instant voice cloning and commercial use to try voice AI in production
- 100K credits for models
- $5 prepaid for agents
- Instant voice cloning
- Commercial use
- 3 TTS concurrent requests
- 3 agent slots
- 12 concurrent calls
Startup
For teams starting to use voice AI in production that need shared API keys, pro voice cloning, and multiple agents
- 1.25M credits for models
- $49 prepaid for agents
- Pro voice cloning
- Organizations
- 5 TTS concurrent requests
- 5 agent slots
- 20 concurrent calls
Scale
For businesses with large-scale use cases requiring high concurrency and multiple agents
- 8M credits for models
- $299 prepaid for agents
- Priority support
- High concurrency limits
- 15 TTS concurrent requests
- 10 agent slots
- 60 concurrent calls
Enterprise
Custom supported models and agents with mission-critical guarantees for uptime, security, and compliance
- Custom usage pricing
- Custom concurrency
- Enterprise support via Slack
- Enterprise-grade security & compliance
- Priority dedicated support via Slack
- Single Sign-On (SSO)
- PCI compliance
- Custom SLAs
- Custom Security Review
- HIPAA compliance
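The tier limits above can be encoded as data to find the smallest self-serve plan that covers a given workload. The following is a hypothetical helper built only from the figures listed on this page. It ignores the prepaid agent balance and any overage pricing, leaves out Free because that tier is for personal use and lists no agent limits, and falls back to Enterprise when no listed tier fits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plan:
    name: str
    model_credits: int      # included model credits as listed
    tts_concurrency: int    # concurrent TTS requests
    agent_slots: int
    concurrent_calls: int


# Commercial self-serve tiers as listed on this page, smallest first.
PLANS: List[Plan] = [
    Plan("Pro", 100_000, 3, 3, 12),
    Plan("Startup", 1_250_000, 5, 5, 20),
    Plan("Scale", 8_000_000, 15, 10, 60),
]


def smallest_plan(credits: int, tts_concurrency: int, agents: int, calls: int) -> str:
    """Return the first listed tier that meets every requirement."""
    for plan in PLANS:
        if (plan.model_credits >= credits
                and plan.tts_concurrency >= tts_concurrency
                and plan.agent_slots >= agents
                and plan.concurrent_calls >= calls):
            return plan.name
    # Nothing fits: Enterprise offers custom usage pricing and concurrency.
    return "Enterprise"
```

For instance, `smallest_plan(500_000, 4, 2, 10)` returns `"Startup"`: Pro's 100K credits fall short, while Startup covers all four requirements.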
Capabilities
Key Features
- Ultra-low latency TTS (90ms time-to-first-audio)
- Sonic-3 flagship text-to-speech model
- Instant voice cloning
- Pro voice cloning
- Voice changer
- Voice library
- Design a voice
- Infilling
- Multilingual support
- Sonic-Turbo API access
- Line voice agent development platform
- Ink-Whisper speech-to-text
- Telephony integration
- Call analytics
- Text-to-Agent creation
- Reasoning templates
- CLI and SDK
- Observability tools
- Background agents
- GitHub integration
