EveryDev.ai
With AI, Everyone is a Dev. EveryDev.ai © 2026
    VibeVoice

    Speech Recognition
    Featured

    An open-source family of frontier voice AI models from Microsoft, including long-form TTS, multi-speaker speech synthesis, real-time streaming TTS, and long-form ASR with speaker diarization.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. All model weights and code are publicly available.

    Available On

    Windows
    Linux
    API
    SDK
    CLI

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    Speech Recognition, Voice Synthesis, Audio

    Alternatives

    Deepgram, Ultravox, Krisp
    Developer
    Microsoft
    One Microsoft Way, Washington 98052-7329
    Est. 1975
    $30B raised

    Listed Apr 2026

    About VibeVoice

    VibeVoice is a family of open-source frontier voice AI models developed by Microsoft Research, covering both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR). It uses continuous speech tokenizers operating at an ultra-low frame rate of 7.5 Hz and a next-token diffusion framework combining a Large Language Model with a diffusion head for high-fidelity audio generation. The project is released under the MIT License and is intended for research and development purposes.
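
To make the 7.5 Hz figure concrete, the following back-of-the-envelope sketch counts how many tokenizer frames the stated maximum durations would produce. It assumes one frame per tokenizer step and ignores text tokens, speaker prompts, and any additional streams, so the real sequence lengths may differ. `frames_for` is a hypothetical helper and not part of VibeVoice.

```python
FRAME_RATE_HZ = 7.5  # tokenizer frame rate stated in the description


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frames_for(60)       # 60-minute ASR input
tts_frames = frames_for(90)       # 90-minute TTS output
realtime_frames = frames_for(10)  # ~10-minute realtime generation

# 50 Hz is an arbitrary comparison rate chosen only for illustration.
comparison_frames = frames_for(60, frame_rate_hz=50.0)

print(asr_frames, tts_frames, realtime_frames, comparison_frames)
```

Under these assumptions an hour of audio is 27,000 frames at 7.5 Hz, against 180,000 frames at the illustrative 50 Hz rate, which is why a low frame rate helps fit long-form audio into a single pass.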

    Key models and features include:

    • VibeVoice-ASR — A unified speech-to-text model that handles up to 60-minute long-form audio in a single pass, producing structured transcriptions with speaker identity (Who), timestamps (When), and content (What). Supports 50+ languages and customized hotwords.
    • VibeVoice-TTS — A long-form multi-speaker TTS model capable of synthesizing up to 90 minutes of speech with up to 4 distinct speakers. Supports English, Chinese, and other languages with expressive, natural-sounding output.
    • VibeVoice-Realtime-0.5B — A lightweight 0.5B parameter real-time streaming TTS model with ~300ms first-audible latency, supporting streaming text input and robust long-form generation (~10 minutes).
    • Hugging Face Integration — All model weights are available on Hugging Face Hub; VibeVoice-ASR is natively supported via the Hugging Face Transformers library.
    • vLLM Inference Support — VibeVoice-ASR supports vLLM for accelerated inference.
    • Finetuning Support — Finetuning code for VibeVoice-ASR is publicly available in the repository.
    • Google Colab Demos — Interactive Colab notebooks are provided for quick experimentation with streaming TTS and realtime models.
    • Next-Token Diffusion Architecture — Core innovation using acoustic and semantic tokenizers at 7.5 Hz for efficient long-sequence processing while preserving audio fidelity.
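
The Who / When / What structure described for VibeVoice-ASR can be pictured as a list of speaker-attributed, timestamped segments. The sketch below is an illustration only: the `Segment` fields, the helper names, and the sample data are all assumptions made for this example and do not reflect VibeVoice's actual output schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized transcript segment (hypothetical schema)."""
    speaker: str    # Who
    start_s: float  # When (start, in seconds)
    end_s: float    # When (end, in seconds)
    text: str       # What


def talk_time(segments):
    """Total speaking time in seconds per speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end_s - seg.start_s
    return dict(totals)


def render(segments):
    """Format segments as one line each, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return "\n".join(
        f"[{s.start_s:7.2f}-{s.end_s:7.2f}] {s.speaker}: {s.text}"
        for s in ordered
    )


# Made-up sample data, not real model output.
segments = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    Segment("Speaker 2", 4.5, 9.0, "Thanks for having me."),
    Segment("Speaker 1", 9.0, 12.0, "Let's get started."),
]

print(render(segments))
print(talk_time(segments))
```

Keeping speaker, time span, and text together in one record makes downstream tasks such as per-speaker summaries or subtitle export a matter of simple filtering and sorting.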

    Pricing

    Open Source (MIT)

    • VibeVoice-ASR model weights
    • VibeVoice-Realtime-0.5B model weights
    • ASR finetuning code
    • Colab demo notebooks
    • Hugging Face Transformers integration

    Capabilities

    Key Features

    • Long-form ASR up to 60 minutes in a single pass
    • Speaker diarization with timestamps
    • Customized hotword support
    • 50+ language multilingual ASR
    • Long-form multi-speaker TTS up to 90 minutes
    • Up to 4 distinct speakers in a single TTS pass
    • Real-time streaming TTS with ~300ms latency
    • Next-token diffusion architecture
    • vLLM inference support
    • Hugging Face Transformers integration
    • Finetuning code available
    • Google Colab demos

    Integrations

    Hugging Face Transformers
    vLLM
    Google Colab
    Gradio
    API Available

    Developer

    Microsoft

    Microsoft is a multinational technology company that develops and supports software, services, devices, and solutions including Visual Studio Code, Azure AI Services, and developer tools.

    Founded 1975
    One Microsoft Way
    $30B raised
    228,000 employees

    Used by

    Nearly 70% of the Fortune 500 use…
    More than 85% of the Fortune 500 use…
    Disney
    Dow
    15 tools in directory

    Similar Tools

    Deepgram

    AI-powered APIs for speech recognition, voice agents, audio intelligence, and text-to-speech.

    Ultravox

    Real-time voice AI platform with speech-native models for building and scaling conversational voice agents.

    Krisp

    Krisp is a Voice AI platform offering noise cancellation, accent conversion, AI meeting notes, and call center productivity tools for teams and developers.

    Related Topics

    Speech Recognition

    AI tools that convert spoken language into text.

    40 tools

    Voice Synthesis

    AI tools that generate human-like speech from text.

    30 tools

    Audio

    AI tools that generate or edit audio — music, sound effects, voice and speech, and podcast production.

    25 tools