Miso TTS 8B
An 8-billion-parameter open-source text-to-speech model designed for high-quality, highly emotive conversational speech generation with voice cloning support.
At a Glance
Free to use under Modified MIT License. Run locally or access via Hugging Face.
Listed Jun 2026
About Miso TTS 8B
Miso TTS 8B is an open-source, 8-billion-parameter text-to-dialogue model built by Miso Labs (Kamino Learning, Inc.) for high-quality conversational speech synthesis. The model is available on GitHub and Hugging Face and can be run locally on CUDA-capable hardware. A live demo is hosted on the Miso Labs landing page at misolabs.ai.
What It Is
Miso TTS 8B is a text-to-speech model in the RVQ (Residual Vector Quantization) Transformer category, inspired by the Sesame CSM architecture. It generates Mimi audio codes from text and optional audio context, making it suitable for conversational speech generation rather than simple single-utterance synthesis. The model currently supports English only.
Architecture
The model uses two transformer components working in tandem:
- Backbone transformer (Llama 8B): Consumes interleaved text and audio-frame embeddings, conditioning generation on conversation history.
- Audio decoder transformer (Llama 300M): Autoregressively predicts higher-order audio codebooks within each frame.
Key model specs include a text vocabulary of 128,256 tokens, an audio vocabulary of 2,051 tokens, 32 audio codebooks, the Mimi audio tokenizer, and a maximum sequence length of 2,048. Default inference uses torch.bfloat16 precision.
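The division of labour between the two transformers can be illustrated with a toy loop. This is a minimal sketch, not the repository's code: `backbone_step` and `decoder_step` are hypothetical stand-ins that return random codes, and the assumption that the backbone emits the first codebook of each frame (as in Sesame CSM) is not confirmed by this listing. Only the constants come from the specs above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MisoSpecs:
    """Constants taken from the published model specs."""
    text_vocab_size: int = 128_256
    audio_vocab_size: int = 2_051
    audio_num_codebooks: int = 32
    max_seq_len: int = 2_048


def backbone_step(history, specs, rng):
    """Hypothetical stand-in for the backbone: one code for codebook 0.

    A real backbone would condition on the interleaved text/audio history;
    this stub ignores it and samples uniformly.
    """
    return rng.randrange(specs.audio_vocab_size)


def decoder_step(frame_so_far, specs, rng):
    """Hypothetical stand-in for the audio decoder: the next codebook's code."""
    return rng.randrange(specs.audio_vocab_size)


def generate_frames(num_frames, specs=MisoSpecs(), seed=0):
    """Build frames of 32 codes: one from the backbone, the rest from the decoder."""
    rng = random.Random(seed)
    frames = []
    for _ in range(num_frames):
        # Assumed split: the backbone proposes the first codebook of the frame.
        frame = [backbone_step(frames, specs, rng)]
        # The decoder fills in the higher-order codebooks one at a time.
        while len(frame) < specs.audio_num_codebooks:
            frame.append(decoder_step(frame, specs, rng))
        frames.append(frame)
    return frames
```

In the real model, each completed frame would be passed to the Mimi tokenizer's decoder to reconstruct a waveform; that step is omitted here.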
Voice Cloning and Prompted Generation
Miso TTS 8B supports optional prompted generation, allowing the model to condition on prior audio for voice cloning. Users supply a Segment object containing a speaker ID, transcript, and audio waveform as context. Without a prompt, the model generates speech from text alone. Generated audio is watermarked by default using the SilentCipher watermarking model from Sony.
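As a rough illustration of the prompt structure, a context segment could be modelled as below. This is a sketch under stated assumptions: the field names `speaker`, `text`, and `audio` and the helper `generation_mode` are hypothetical, and the project's actual `Segment` class may use different names and a tensor rather than a list of floats.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical shape of one context segment (names are assumptions)."""
    speaker: int                      # speaker ID
    text: str                         # transcript of the audio
    audio: List[float] = field(default_factory=list)  # mono waveform samples


def generation_mode(context: Sequence[Segment]) -> str:
    """Report whether a request is text-only or conditioned on prior audio."""
    if not context:
        return "text-only"
    for seg in context:
        # A usable prompt pairs a transcript with its waveform.
        if not seg.text or not seg.audio:
            raise ValueError("each context segment needs a transcript and audio")
    return "prompted"
```

An empty context corresponds to generating speech from text alone; one or more complete segments correspond to the voice-cloning path.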
Setup Path
The repository supports two installation paths:
- uv (recommended): Clone the repo, run `uv sync --python 3.10`, activate the virtual environment, and run `uv run python run_misotts.py`.
- pip: Create a Python 3.10 venv, install with `pip install -e .`, and run `python run_misotts.py`.
Model weights are hosted publicly on Hugging Face at MisoLabs/MisoTTS and are downloaded automatically on first run via the Hugging Face Hub cache.
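To check where the downloaded weights are likely to land, the default Hugging Face Hub cache layout can be reproduced with the standard library. This is a sketch that assumes the project uses the Hub's default cache settings and does not pass a custom cache directory; `expected_cache_dir` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path


def expected_cache_dir(repo_id, env=None):
    """Return the folder the Hub cache would use for a model repo.

    Follows the documented precedence: HF_HUB_CACHE, then HF_HOME/hub,
    then $XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        hub = Path(env["HF_HUB_CACHE"])
    elif env.get("HF_HOME"):
        hub = Path(env["HF_HOME"]) / "hub"
    else:
        xdg = env.get("XDG_CACHE_HOME")
        base = Path(xdg) if xdg else Path.home() / ".cache"
        hub = base / "huggingface" / "hub"
    # Model repos are stored as models--<org>--<name>.
    return hub / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    print(expected_cache_dir("MisoLabs/MisoTTS"))
```

Setting `HF_HOME` before the first run is one way to place the checkpoint on a larger disk.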
Deployment Notes and Safety
The model requires a CUDA GPU with sufficient VRAM for the checkpoint precision being loaded. The repository notes that Miso TTS 8B is a large model and recommends GPU inference for best results. The project's safety guidelines explicitly prohibit using the model to impersonate people, create deceptive audio, commit fraud, or generate harmful content. Deployers are advised to use their own private watermark key.
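A back-of-the-envelope estimate of the VRAM needed for the weights alone follows from parameter count times bytes per parameter. The sketch below assumes roughly 8 billion parameters for the backbone and 300 million for the decoder; it ignores activations, attention caches, the Mimi tokenizer, and the watermarking model, so real usage will be higher.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params, dtype="bfloat16"):
    """Estimate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    backbone = 8_000_000_000   # assumed backbone size
    decoder = 300_000_000      # assumed decoder size
    for dtype in ("bfloat16", "float32"):
        total = weight_memory_gib(backbone + decoder, dtype)
        print(f"{dtype}: about {total:.1f} GiB for weights")
```

At the default bfloat16 precision, this works out to roughly 15 GiB for the weights, which is why a GPU with more than 16 GB of memory is a reasonable starting assumption.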
Current Status
The GitHub repository was created in May 2026 and last updated in early June 2026, with 1,662 stars and 134 forks as reported by the repository metadata. The project is released under a Modified MIT License, with a commercial attribution clause applying to products exceeding 50 million monthly active users or $10 million USD in monthly revenue.
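The attribution thresholds can be expressed as a simple check. This is a simplified reading of the clause as summarized above, not a substitute for the license text; the function name and the treatment of values exactly at the threshold are assumptions.

```python
MAU_THRESHOLD = 50_000_000          # monthly active users
REVENUE_THRESHOLD_USD = 10_000_000  # monthly revenue in USD


def attribution_clause_applies(monthly_active_users, monthly_revenue_usd):
    """Return True if either threshold is exceeded (strictly greater than)."""
    return (
        monthly_active_users > MAU_THRESHOLD
        or monthly_revenue_usd > REVENUE_THRESHOLD_USD
    )
```

For example, a product with 2 million monthly active users and $12 million in monthly revenue would fall under the clause because of its revenue alone.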
Pricing
Open Source
- Full model weights on Hugging Face
- Local inference via Python
- Voice cloning support
- Audio watermarking
- Commercial use allowed with attribution for large-scale deployments
Capabilities
Key Features
- 8B parameter text-to-speech model
- High-quality conversational speech generation
- Voice cloning via prompted generation
- RVQ Transformer architecture
- Llama 8B backbone with Llama 300M audio decoder
- Mimi audio tokenizer
- 32 audio codebooks
- SilentCipher audio watermarking
- Hugging Face model hosting
- Local inference support
- Python API
- English language support