Fish Audio
Fish Audio provides studio-grade AI text-to-speech and voice cloning tools with emotion control, bridging the gap between synthetic and natural speech.
At a Glance
- AI Developers
- Content Creators
- Gaming Industry
- Enterprise Solutions
AI Tools by Fish Audio
Fish Audio
AI Voice Cloning and TTS API
Latest News
Fish Audio Open-Sources S2: Fine-Grained Control Meets Production Streaming
Best Speech to Text APIs 2026: Technical Comparison & Integration Guide
How to Use SAM Audio for Audio Separation Step by Step
Launching Fish Audio S1: A Frontier Text-to-Speech Audio Foundation Model
Products & Services
- Fish Audio S1: a frontier text-to-speech audio foundation model, touted as the most expressive and natural TTS model on the market.
- S2: an open-source model featuring fine-grained control and support for production streaming.
- A next-generation multilingual text-to-speech and realistic voice cloning engine.
- Studio-grade voice cloning that sounds like the user, with support for emotion control.
Market Position
Positions itself against incumbents such as ElevenLabs on three points: emotion control, real-time directing capabilities, and open-source models (S2).
Leadership
Founders
Leng Yue (冷月)
Founder of Fish Audio and former NVIDIA researcher. A prolific open-source developer who turned a personal interest in human-level AI voice synthesis into a business scaling to $5M ARR.
Shijia Liao
Chief Scientist at Fish Audio. Former researcher at NVIDIA and University of Maryland. Expert in vision foundation models and multi-modality models.
Executive Team
Leng Yue
CEO & Founder
Former NVIDIA researcher and open-source developer.
Shijia Liao
Chief Scientist
Former NVIDIA and UMD researcher specializing in multi-modality models.
Founding Story
Founded by a Gen Z team led by former NVIDIA researcher Leng Yue, who turned a personal focus on human-level AI voice synthesis (reportedly motivated by 'heartbreak') into a high-growth startup. The company scaled from $400k to $5M ARR in early 2025.
Business Model
Revenue Model
Subscription-based tiered plans and pay-as-you-go API usage.
Pricing Tiers
- 8,000 credits per month: up to 7 minutes of S1 generation, 3 public voice slots.
- 250,000 credits per month: up to 200 minutes of S1 generation, unlimited public and 10 private voice slots, commercial use.
- 2,000,000 credits per month: up to 27 hours of S1 generation, unlimited voice slots, commercial use.
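The tier figures above imply a roughly constant credit cost per minute of S1 audio. The short Python sketch below works through that arithmetic using only the numbers listed. The tier labels (`tier_1` and so on) are hypothetical placeholders, since tier names are not given here, and the per-minute rate is derived from the "up to" limits rather than taken from any published price list.

```python
# Monthly credits and "up to" S1 minutes per tier, copied from the list above.
# The labels are placeholders (assumption): tier names are not given in this profile.
TIERS = {
    "tier_1": (8_000, 7),
    "tier_2": (250_000, 200),
    "tier_3": (2_000_000, 27 * 60),  # 27 hours expressed in minutes
}


def credits_per_minute(credits: int, minutes: float) -> float:
    """Implied credits consumed per minute of S1 generation."""
    return credits / minutes


for label, (credits, minutes) in TIERS.items():
    rate = credits_per_minute(credits, minutes)
    print(f"{label}: ~{rate:.0f} credits per minute")
```

Because the minute limits are rounded "up to" values, the three implied rates differ slightly. They all fall between roughly 1,100 and 1,250 credits per minute.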
Target Markets
- AI Developers
- Content Creators
- Gaming Industry
- Enterprise Solutions
Use Cases
- AI companions
- Content creation
- Game development
- Voice-overs
- Real-time avatars
User Base
- 20,000+ active developers
- Over 1.2M creators