Cartesia Sonic

Name: Cartesia Sonic
Availability: Online only
Author: Cartesia

Ultra-low latency text-to-speech model with 90ms time-to-first-audio designed for real-time voice AI applications and voice agents.


At a Glance

Pricing

Free tier available

Get introduced to ultra-low latency voice AI through core models and your own voice agent

Pro: $4/mo

Startup: $39/mo

Scale: $239/mo

Enterprise: Custom pricing

Available On

Web

API

SDK

Cartesia | San Francisco, CA | Est. 2023 | $191M raised

Listed Feb 2026

About Cartesia Sonic

Cartesia Sonic is Cartesia's flagship text-to-speech (TTS) model, generating speech with a time-to-first-audio of 90ms. It is designed for fluid, real-time voice AI and powers voice agents, customer service applications, localization, and interactive conversational systems. The current flagship model is Sonic-3; the wider platform adds Ink-Whisper for speech-to-text and Line for building voice agents.
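
Time-to-first-audio is the delay between issuing a request and receiving the first chunk of audio. The sketch below shows one way to measure it on the client side. `fake_stream` is a hypothetical stand-in for a streaming TTS response, not part of any Cartesia SDK. In a real client the timer should start just before the request is sent, and because the result includes network time it need not match the 90ms figure quoted above.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_ttfa(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfa = None
    total = 0
    for chunk in chunks:
        if chunk and ttfa is None:
            # First audio data has arrived: this is the time-to-first-audio.
            ttfa = time.perf_counter() - start
        total += len(chunk)
    if ttfa is None:
        raise ValueError("stream produced no audio")
    return ttfa, total


def fake_stream(first_delay: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical streaming response: waits, then yields 1 KiB chunks."""
    time.sleep(first_delay)  # runs lazily, on the first iteration
    for _ in range(n_chunks):
        yield b"\x00" * 1024


if __name__ == "__main__":
    ttfa, total = measure_ttfa(fake_stream(0.09, 10))
    print(f"TTFA: {ttfa * 1000:.0f} ms, {total} bytes")
```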

Ultra-Low Latency TTS: Sonic-3 delivers a 90ms time-to-first-audio, which Cartesia describes as industry-leading, enabling natural real-time conversations and voice interactions without noticeable delay.
Voice Cloning: Offers both instant voice cloning (available on Pro and above) and professional voice cloning (Startup and above) for creating custom voice profiles with high fidelity.
Voice Changer: Modify the voice characteristics of existing audio in real time.
Multilingual Support: Supports multiple languages for global applications, including localization for Asia Pacific, Europe, Latin America, the Middle East, and more.
Voice Library: Access a curated collection of pre-built voices for immediate use in applications without custom training.
Design a Voice: Create custom voice profiles tailored to specific brand requirements and use cases.
Infilling: Generate speech that fills a gap between existing audio segments, with natural transitions on both sides.
Line Voice Agent Platform: Build voice agents from first agent to production-ready deployment with SDK, CLI, telephony integration, call analytics, and observability tools.
Ink Speech-to-Text: Ink-Whisper provides streaming speech-to-text, which Cartesia markets as the fastest available, at competitive pricing; it complements the TTS models for end-to-end voice AI workflows.
API Access: RESTful API for TTS, with concurrent-request limits that scale from 2 on Free to custom limits on Enterprise.
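Because each plan caps simultaneous TTS requests (2 on Free, 3 on Pro, 5 on Startup, 15 on Scale, per this listing), a client that fans out many requests should throttle itself. This is a minimal asyncio sketch that sizes a semaphore to the plan's cap. `synthesize` is a hypothetical placeholder, not a Cartesia SDK call; it just echoes the text back as bytes so the throttling logic can run on its own.

```python
import asyncio
from typing import List, Tuple

# TTS concurrent-request caps as listed on this page (Enterprise is custom).
TTS_CONCURRENCY = {"free": 2, "pro": 3, "startup": 5, "scale": 15}


async def synthesize(text: str) -> bytes:
    """Hypothetical placeholder for one TTS request; replace with a real call."""
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def synthesize_all(texts: List[str], plan: str) -> Tuple[List[bytes], int]:
    """Run all requests, never exceeding the plan's cap; return results and peak."""
    limit = asyncio.Semaphore(TTS_CONCURRENCY[plan])
    in_flight = 0
    peak = 0

    async def worker(text: str) -> bytes:
        nonlocal in_flight, peak
        async with limit:  # wait here until a slot under the cap is free
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await synthesize(text)
            finally:
                in_flight -= 1

    # gather() keeps results in the same order as the inputs.
    results = await asyncio.gather(*(worker(t) for t in texts))
    return list(results), peak


if __name__ == "__main__":
    audio, peak = asyncio.run(synthesize_all([f"line {i}" for i in range(10)], "pro"))
    print(len(audio), "clips, peak concurrency", peak)
```

Queuing on the client side this way avoids sending requests that the plan would reject for exceeding its concurrency limit.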

To get started, sign up for a free account to receive 20K credits for models and $1 prepaid for agents. Upgrade to Pro for commercial use and instant voice cloning, or choose Startup/Scale plans for team collaboration, higher concurrency limits, and professional voice cloning capabilities.
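The upgrade path above can be expressed as a small lookup. This sketch assumes each tier includes the features of the tiers below it (for example, that Startup and Scale keep Pro's commercial use and instant voice cloning), and it takes Free's concurrency of 2 from the API Access note. The `Plan` type and `cheapest_plan` helper are illustrative only, not part of any Cartesia tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: float      # month-to-month price
    tts_concurrency: int    # concurrent TTS requests
    commercial_use: bool
    instant_cloning: bool
    pro_cloning: bool


# Figures from the pricing list on this page; Enterprise (custom) is omitted.
PLANS: List[Plan] = [
    Plan("Free", 0, 2, False, False, False),
    Plan("Pro", 5, 3, True, True, False),
    Plan("Startup", 49, 5, True, True, True),
    Plan("Scale", 299, 15, True, True, True),
]


def cheapest_plan(concurrency: int, commercial: bool = False,
                  instant_cloning: bool = False,
                  pro_cloning: bool = False) -> Optional[Plan]:
    """Return the cheapest self-serve plan meeting every requirement, or None."""
    for plan in sorted(PLANS, key=lambda p: p.monthly_usd):
        if (plan.tts_concurrency >= concurrency
                and (plan.commercial_use or not commercial)
                and (plan.instant_cloning or not instant_cloning)
                and (plan.pro_cloning or not pro_cloning)):
            return plan
    return None  # nothing self-serve fits: Enterprise (contact sales)
```

For example, a commercial app needing professional voice cloning and three concurrent requests lands on Startup, while anything above 15 concurrent requests falls through to Enterprise.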


Pricing

Free

Get introduced to ultra-low latency voice AI through core models and your own voice agent

20K credits for models
$1 prepaid for agents
Personal use
Discord support
Sonic-3 API access

Pro

Upgrade for instant voice cloning and to try voice AI in production for commercial use

$4/mo billed annually ($5/mo billed monthly)

100K credits for models
$5 prepaid for agents
Instant voice cloning
Commercial use
3 TTS concurrent requests
3 agent slots
12 concurrent calls

Startup

For teams starting to use voice AI in production and need shared API keys, pro voice cloning, and multiple agents

$39/mo billed annually ($49/mo billed monthly)

1.25M credits for models
$49 prepaid for agents
Pro voice cloning
Organizations
5 TTS concurrent requests
5 agent slots
20 concurrent calls

Scale

For businesses with large-scale use cases requiring high concurrencies and multiple agents

$239/mo billed annually ($299/mo billed monthly)

8M credits for models
$299 prepaid for agents
Priority support
High concurrency limits
15 TTS concurrent requests
10 agent slots
60 concurrent calls

Enterprise

Custom supported models and agents with mission-critical guarantees for uptime, security, and compliance

Custom pricing (contact sales)

Custom usage pricing
Custom concurrency
Enterprise support via Slack
Enterprise-grade security & compliance
Priority Dedicated Support via Slack
Single Sign-On (SSO)
PCI compliance
Custom SLAs
Custom Security Review
HIPAA compliance
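The listed prices make it straightforward to compare annual and monthly billing. This sketch assumes the model credits are a monthly allowance, which the listing does not state explicitly; `PRICING` and `billing_summary` are illustrative names.

```python
from typing import Dict, Tuple

# (per-month price when billed annually, month-to-month price, model credits).
# Credits are assumed to be granted per month.
PRICING: Dict[str, Tuple[int, int, int]] = {
    "Pro": (4, 5, 100_000),
    "Startup": (39, 49, 1_250_000),
    "Scale": (239, 299, 8_000_000),
}


def billing_summary(annual_mo: int, monthly: int, credits: int) -> Dict[str, float]:
    return {
        "yearly_saving_usd": (monthly - annual_mo) * 12,  # annual plan vs 12 monthly bills
        "discount": 1 - annual_mo / monthly,              # fraction off the monthly price
        "credits_per_usd": credits / monthly,             # at the month-to-month price
    }


for plan, prices in PRICING.items():
    s = billing_summary(*prices)
    print(f"{plan:8} saves ${s['yearly_saving_usd']}/yr "
          f"({s['discount']:.1%} off), "
          f"{s['credits_per_usd']:,.0f} credits per dollar")
```

Under these assumptions, annual billing saves $12, $120, and $720 per year on Pro, Startup, and Scale respectively (roughly 20% in each case), and the credits per dollar rise slightly with each tier.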


Capabilities

Key Features

Ultra-low latency TTS (90ms time-to-first-audio)
Sonic-3 flagship text-to-speech model
Instant voice cloning
Pro voice cloning
Voice changer
Voice library
Design a voice
Infilling
Multilingual support
Sonic-Turbo API access
Line voice agent development platform
Ink-Whisper speech-to-text
Telephony integration
Call analytics
Text-to-Agent creation
Reasoning templates
CLI and SDK
Observability tools
Background agents
GitHub integration

Integrations

Telephony systems

GitHub

API Available
