# Cartesia Sonic

> Ultra-low latency text-to-speech model with a 90ms time-to-first-audio, designed for real-time voice AI applications and voice agents.

Cartesia Sonic is a text-to-speech (TTS) model that delivers ultra-low latency voice generation, with a time-to-first-audio of just 90ms. Designed for fluid, real-time voice AI experiences, Sonic powers voice agents, customer service applications, localization, and interactive conversational systems. The Cartesia platform includes Sonic-3 as the flagship TTS model, along with Ink-Whisper for speech-to-text and Line for voice agent development.
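Time-to-first-audio is the delay between sending text and receiving the first playable audio chunk. The sketch below shows one way to measure it against any streaming source. It is a minimal illustration only: `fake_tts_stream` is a hypothetical stand-in that simulates a 90ms delay, not a Cartesia API, and `time_to_first_audio` is a helper name introduced here.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_audio(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, that chunk)."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # ignore empty chunks that carry no audio
            return time.perf_counter() - start, chunk
    raise ValueError("stream ended without producing audio")


def fake_tts_stream(text: str, first_chunk_delay_s: float = 0.09) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS client; not a Cartesia API."""
    time.sleep(first_chunk_delay_s)  # simulated model and network latency
    for word in text.split():
        yield word.encode("utf-8")  # placeholder bytes instead of real audio


latency_s, first_chunk = time_to_first_audio(fake_tts_stream("hello from a voice agent"))
print(f"time-to-first-audio: {latency_s * 1000:.0f} ms")
```

With a real streaming client, the same helper could wrap whatever iterator of audio chunks the client returns. The timer starts when consumption begins, so it includes request and network time.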

- **Ultra-Low Latency TTS**: Sonic-3 delivers a 90ms time-to-first-audio, low enough for natural real-time conversations and voice interactions without perceptible delays.

- **Voice Cloning**: Offers both instant voice cloning (Pro plan and above) and professional voice cloning (Startup plan and above) for creating high-fidelity custom voice profiles.

- **Voice Changer**: Transform audio with voice modification capabilities, allowing users to alter voice characteristics in real time.

- **Multilingual Support**: Language support for global applications, including localization across Asia Pacific, Europe, Latin America, the Middle East, and more.

- **Voice Library**: Access a curated collection of pre-built voices for immediate use in applications without custom training.

- **Design a Voice**: Create custom voice profiles tailored to specific brand requirements and use cases.

- **Infilling**: Advanced text infilling capabilities for seamless audio generation with natural transitions.

- **Line Voice Agent Platform**: Take voice agents from a first prototype to production-ready deployment with an SDK, CLI, telephony integration, call analytics, and observability tools.

- **Ink Speech-to-Text**: Ink-Whisper provides the fastest streaming speech-to-text at competitive pricing, complementing the TTS capabilities for full voice AI workflows.

- **API Access**: RESTful API for TTS operations, with concurrency limits ranging from 2 concurrent requests (Free) to custom limits (Enterprise).
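Because each plan caps the number of concurrent TTS requests, a client that sends many requests at once needs to throttle itself. The following is a minimal sketch of one common approach, an `asyncio.Semaphore`, assuming the Free-tier limit of 2. `fake_synthesize` is a hypothetical placeholder for a real request function, not a Cartesia API, and `synthesize_all` is a helper name introduced here.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT = 2  # Free-tier limit quoted above; adjust for your plan


async def synthesize_all(
    texts: List[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    limit: int = MAX_CONCURRENT,
) -> List[bytes]:
    """Run synthesize() over texts with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text: str) -> bytes:
        async with semaphore:  # wait here while `limit` calls are active
            return await synthesize(text)

    # gather() preserves input order in its results
    return await asyncio.gather(*(bounded(t) for t in texts))


in_flight = 0
peak_in_flight = 0


async def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a TTS request; not a Cartesia API."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulated request time
    in_flight -= 1
    return text.encode("utf-8")


results = asyncio.run(synthesize_all(["one", "two", "three", "four"], fake_synthesize))
print(len(results), peak_in_flight)
```

The semaphore keeps excess requests waiting on the client side rather than sending them and having them rejected. Raising `limit` to match a higher plan is the only change needed.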

To get started, sign up for a free account, which includes 20K credits for models and $1 in prepaid credit for agents. Upgrade to Pro for commercial use and instant voice cloning, or choose the Startup or Scale plan for team collaboration, higher concurrency limits, and professional voice cloning.

## Features
- Ultra-low latency TTS (90ms time-to-first-audio)
- Sonic-3 flagship text-to-speech model
- Instant voice cloning
- Professional voice cloning
- Voice changer
- Voice library
- Design a voice
- Infilling
- Multilingual support
- Sonic-Turbo API access
- Line voice agent development platform
- Ink-Whisper speech-to-text
- Telephony integration
- Call analytics
- Text-to-Agent creation
- Reasoning templates
- CLI and SDK
- Observability tools
- Background agents
- GitHub integration

## Integrations
Telephony systems, GitHub

## Platforms
Web, API, Developer SDK

## Pricing
Open Source, Free tier available

## Links
- Website: https://cartesia.ai/sonic
- Documentation: https://docs.cartesia.ai/
- Repository: https://github.com/cartesia-ai
- EveryDev.ai: https://www.everydev.ai/tools/cartesia-sonic