    EveryDev.ai

    VibeVoice

    Speech Recognition
    Featured

    An open-source family of frontier voice AI models from Microsoft, including long-form TTS, multi-speaker speech synthesis, real-time streaming TTS, and long-form ASR with speaker diarization.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. All model weights and code are publicly available.

    Available On

    Windows
    Linux
    API
    SDK
    CLI

    Resources

    Website
    Docs
    GitHub
    llms.txt

    Topics

    Speech Recognition
    Voice Synthesis
    Generative Media

    Alternatives

    KlicStudio
    Sarvam AI
    Resemble AI
    Developer
    Microsoft
    One Microsoft Way, Redmond, Washington 98052-7329
    Est. 1975
    $30B raised

    Listed Apr 2026

    About VibeVoice

    VibeVoice is a family of open-source frontier voice AI models developed by Microsoft Research, covering both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR). It uses continuous speech tokenizers operating at an ultra-low frame rate of 7.5 Hz and a next-token diffusion framework combining a Large Language Model with a diffusion head for high-fidelity audio generation. The project is released under the MIT License and is intended for research and development purposes.
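To give a sense of what a 7.5 Hz frame rate means for sequence length, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration: the helper name `frames_for` is hypothetical, and the durations are the 10, 60, and 90 minute figures quoted elsewhere in this listing. It counts tokenizer frames only and makes no claim about how many model tokens each frame maps to.

```python
# Back-of-the-envelope sketch: tokenizer frames needed to cover a given
# duration at the 7.5 Hz frame rate quoted in the listing.
# `frames_for` is a hypothetical helper, not part of VibeVoice.

FRAME_RATE_HZ = 7.5  # frames per second, as stated in the listing


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of frames spanning `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


if __name__ == "__main__":
    # Durations taken from the model descriptions in this listing.
    for label, minutes in [
        ("Realtime TTS (~10 min)", 10),
        ("ASR (60 min)", 60),
        ("TTS (90 min)", 90),
    ]:
        print(f"{label}: {frames_for(minutes):,} frames")
```

At this rate a 90-minute session spans 40,500 frames, which helps explain why a low frame rate matters for long-form generation.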

    Key models and features include:

    • VibeVoice-ASR — A unified speech-to-text model that handles up to 60-minute long-form audio in a single pass, producing structured transcriptions with speaker identity (Who), timestamps (When), and content (What). Supports 50+ languages and customized hotwords.
    • VibeVoice-TTS — A long-form multi-speaker TTS model capable of synthesizing up to 90 minutes of speech with up to 4 distinct speakers. Supports English, Chinese, and other languages with expressive, natural-sounding output.
    • VibeVoice-Realtime-0.5B — A lightweight 0.5B parameter real-time streaming TTS model with ~300ms first-audible latency, supporting streaming text input and robust long-form generation (~10 minutes).
    • Hugging Face Integration — All model weights are available on Hugging Face Hub; VibeVoice-ASR is natively supported via the Hugging Face Transformers library.
    • vLLM Inference Support — VibeVoice-ASR supports vLLM for accelerated inference.
    • Finetuning Support — Finetuning code for VibeVoice-ASR is publicly available in the repository.
    • Google Colab Demos — Interactive Colab notebooks are provided for quick experimentation with streaming TTS and realtime models.
    • Next-Token Diffusion Architecture — Core innovation using acoustic and semantic tokenizers at 7.5 Hz for efficient long-sequence processing while preserving audio fidelity.
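The Who / When / What structure described for VibeVoice-ASR can be pictured with a small data structure. The sketch below is an assumption made for illustration: the `Segment` class, its field names, and the rendering format are hypothetical and are not the model's actual output schema.

```python
# Illustrative sketch of a diarized transcript as Who / When / What records.
# The schema here is hypothetical; VibeVoice-ASR's real output format may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    speaker: str    # Who
    start_s: float  # When: segment start, in seconds
    end_s: float    # When: segment end, in seconds
    text: str       # What


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(segments: list[Segment]) -> str:
    """Render segments in chronological order, one line per segment."""
    ordered = sorted(segments, key=lambda seg: seg.start_s)
    return "\n".join(
        f"[{format_timestamp(seg.start_s)} - {format_timestamp(seg.end_s)}] "
        f"{seg.speaker}: {seg.text}"
        for seg in ordered
    )


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Sum speaking time in seconds per speaker."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end_s - seg.start_s)
    return totals


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        Segment("Speaker 2", 4.5, 9.0, "Thanks, glad to be here."),
        Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    ]
    print(render(sample))
    print(talk_time(sample))
```

Keeping speaker, time span, and text together in one record makes downstream steps such as per-speaker talk time or subtitle export straightforward.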

    Pricing

    Open Source (MIT)

    • VibeVoice-ASR model weights
    • VibeVoice-Realtime-0.5B model weights
    • ASR finetuning code
    • Colab demo notebooks
    • Hugging Face Transformers integration

    Capabilities

    Key Features

    • Long-form ASR up to 60 minutes in a single pass
    • Speaker diarization with timestamps
    • Customized hotword support
    • Multilingual ASR covering 50+ languages
    • Long-form multi-speaker TTS up to 90 minutes
    • Up to 4 distinct speakers in a single TTS pass
    • Real-time streaming TTS with ~300 ms first-audible latency
    • Next-token diffusion architecture
    • vLLM inference support
    • Hugging Face Transformers integration
    • Finetuning code available
    • Google Colab demos
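The 60-minute single-pass figure for ASR suggests a simple planning step for recordings that run longer. The sketch below only works out window boundaries; `plan_windows` is a hypothetical helper, not a VibeVoice API. It also assumes that consecutive, non-overlapping windows are acceptable, and speaker labels from separate passes would presumably need to be reconciled afterwards.

```python
# Sketch: split a long recording into consecutive windows that each fit
# within the 60-minute single-pass figure quoted in the listing.
# `plan_windows` is a hypothetical helper, not part of VibeVoice.

MAX_PASS_MINUTES = 60.0  # single-pass ASR limit stated in the listing


def plan_windows(
    total_minutes: float, max_minutes: float = MAX_PASS_MINUTES
) -> list[tuple[float, float]]:
    """Return (start, end) windows in minutes covering the whole recording."""
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    windows: list[tuple[float, float]] = []
    start = 0.0
    while start < total_minutes:
        end = min(start + max_minutes, total_minutes)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    # A 150-minute recording needs three passes under this plan.
    for start, end in plan_windows(150):
        print(f"{start:.0f} to {end:.0f} min")
```

A recording at or under the limit yields a single window, so the same code path covers both the short and the long case.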

    Integrations

    Hugging Face Transformers
    vLLM
    Google Colab
    Gradio

    Developer

    Microsoft

    Microsoft is a multinational technology company that develops and supports software, services, devices, and solutions including Visual Studio Code, Azure AI Services, and developer tools.

    Founded 1975
    One Microsoft Way
    $30B raised
    228,000 employees

    Used by

    Nearly 70% of the Fortune 500 use…
    More than 85% of the Fortune 500 use…
    Disney
    Dow
    9 tools in directory

    Similar Tools

    KlicStudio

    KlicStudio is an open-source AI-powered video localization and dubbing tool that automates subtitle generation, translation, and voice synthesis for videos.

    Sarvam AI

    India's sovereign full-stack AI platform offering frontier-class models, APIs, and conversational agents optimized for 22 Indian languages and English.

    Resemble AI

    AI voice generator platform offering voice cloning, text-to-speech, speech-to-speech, and deepfake detection capabilities.

    Related Topics

    Speech Recognition

    AI tools that convert spoken language into text.

    34 tools

    Voice Synthesis

    AI tools that generate human-like speech from text.

    22 tools

    Generative Media

    AI platforms providing comprehensive generative capabilities across multiple media types including images, video, audio, and 3D content.

    64 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026