Dia is an open-source text-to-speech model from Nari Labs that generates realistic dialogue audio directly from text transcripts, with support for multiple speakers, emotional delivery, and non-verbal sounds.
At a Glance
Pricing
Fully open-source model available for free download and local use.
Listed Mar 2026
About Dia
Dia is an open-source 1.6B parameter text-to-speech model developed by Nari Labs, designed to generate highly realistic dialogue directly from transcripts. It supports multi-speaker audio generation, non-verbal cues like laughter and coughing, and fine-grained emotion and tone control. Dia can also perform voice cloning using an audio reference, making it a powerful tool for content creators, researchers, and developers building conversational AI applications.
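For a concrete sense of how transcript-driven generation works, here is a minimal inference sketch based on the usage shown in the project's README; the package layout, speaker-tag syntax, and output sample rate are taken from that example and may differ between releases, so check the repository for your version.

```python
# Minimal Dia inference sketch (based on the project README; names may vary by version).
import soundfile as sf

from dia.model import Dia

# Load the 1.6B open-weights checkpoint from Hugging Face.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Speaker tags ([S1], [S2]) mark dialogue turns; parenthesized cues such as
# (laughs) request non-verbal sounds in the generated audio.
script = (
    "[S1] Welcome back to the show. (laughs) We have a lot to get through today. "
    "[S2] Thanks for having me. Let's dive right in."
)

audio = model.generate(script)

# The README example writes the output at 44.1 kHz.
sf.write("dialogue.wav", audio, 44100)
```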
- Multi-speaker dialogue generation: Generate realistic conversations between multiple speakers directly from a text transcript using speaker tags.
- Non-verbal audio support: Include sounds like laughter, coughing, and sighs in generated audio by adding special tokens in the transcript.
- Emotion and tone control: Guide the emotional delivery of speech through natural language descriptions embedded in the transcript.
- Voice cloning: Provide an audio reference clip to clone a specific voice and use it in generated dialogue (see the sketch after this list).
- Open-source model weights: Download and run the 1.6B parameter model locally via Hugging Face or the GitHub repository.
- Gradio demo: Try Dia instantly through the hosted Hugging Face Spaces demo without any local setup.
- Python API: Integrate Dia into your own applications using the provided Python package and inference scripts.
- Local inference: Run the model on your own hardware for full control over privacy and customization.
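As referenced in the voice-cloning item above, the following rough sketch shows how an audio reference might be supplied alongside the transcript through the Python API. It reuses the `Dia` class from the earlier example; the `audio_prompt` argument name and the convention of prefixing the reference clip's transcript are assumptions drawn from the repository's voice-cloning example and may not match every release.

```python
# Voice-cloning sketch. Assumptions: the `audio_prompt` argument name and the
# transcript-prefix convention; verify against the repository's examples.
import soundfile as sf

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Transcript of the reference clip, followed by the new lines to generate
# in the cloned voice.
reference_transcript = "[S1] This is a short sample of the voice I want to clone."
new_lines = " [S1] And this is brand new dialogue spoken in that same voice."

audio = model.generate(
    reference_transcript + new_lines,
    audio_prompt="reference_clip.mp3",  # assumed parameter name for the reference audio
)

sf.write("cloned_dialogue.wav", audio, 44100)
```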
Pricing
Open Source
Fully open-source model available for free download and local use.
- 1.6B parameter TTS model
- Multi-speaker dialogue generation
- Voice cloning
- Non-verbal audio cues
- Emotion control
Capabilities
Key Features
- Multi-speaker dialogue generation
- Non-verbal audio cues (laughter, coughing, sighs)
- Emotion and tone control via transcript
- Voice cloning from audio reference
- 1.6B parameter open-source model
- Hugging Face Spaces demo
- Python API
- Local inference support
