Vocode
Open source Python library for building real-time voice-based LLM agents with integrations for transcription, synthesis, and telephony.
About Vocode
Vocode is an open source voice AI library built by vocodedev, released under the MIT License. It provides the abstractions, integrations, and orchestration needed to build real-time streaming voice conversations on top of any AI stack, and can be deployed to phone calls, Zoom meetings, and system audio. The project is hosted on GitHub with over 3,700 stars and 655 forks as of mid-2024.
What It Is
Vocode is a Python library (installable via pip install vocode) that lets developers build voice-based LLM applications. It handles the full pipeline, from speech-to-text transcription through LLM inference to text-to-speech synthesis, in a real-time streaming architecture. The library is modular: each component (transcriber, agent, synthesizer) can be swapped independently. A companion enterprise-grade API (Vocode API) is built on top of the open source core and is designed for managing AI agents on phone calls at scale.
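To make the modular pipeline idea concrete, here is a minimal sketch of the transcriber, agent, and synthesizer pattern in plain Python. It is not Vocode's actual API: every class and method name below (Transcriber, Agent, Synthesizer, Pipeline, and the toy implementations) is hypothetical and chosen only for illustration. It assumes each stage can be described by a small interface, so one stage can be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


# Hypothetical interfaces for illustration only; these are NOT Vocode classes.
class Transcriber(Protocol):
    def transcribe(self, audio_chunks: Iterable[bytes]) -> Iterator[str]: ...


class Agent(Protocol):
    def respond(self, utterance: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-ins: a real transcriber would call a speech-to-text service.
class EchoTranscriber:
    def transcribe(self, audio_chunks: Iterable[bytes]) -> Iterator[str]:
        for chunk in audio_chunks:
            # Pretend each audio chunk decodes directly to one utterance.
            yield chunk.decode("utf-8")


# A real agent would call an LLM; these just transform the text.
class UppercaseAgent:
    def respond(self, utterance: str) -> str:
        return utterance.upper()


class ReverseAgent:
    def respond(self, utterance: str) -> str:
        return utterance[::-1]


# A real synthesizer would return audio; this returns the text as bytes.
class BytesSynthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class Pipeline:
    transcriber: Transcriber
    agent: Agent
    synthesizer: Synthesizer

    def run(self, audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
        # Generators keep the flow incremental: each utterance is answered
        # and synthesized as soon as it is transcribed.
        for utterance in self.transcriber.transcribe(audio_chunks):
            yield self.synthesizer.synthesize(self.agent.respond(utterance))


if __name__ == "__main__":
    pipeline = Pipeline(EchoTranscriber(), UppercaseAgent(), BytesSynthesizer())
    print(list(pipeline.run([b"hello", b"world"])))

    # Swap only the agent; the transcriber and synthesizer stay the same.
    pipeline.agent = ReverseAgent()
    print(list(pipeline.run([b"hello"])))
```

Because the pipeline depends only on the three small interfaces, replacing one stage is a one-line change, which is the property the modular design is meant to provide.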
Two-Layer Architecture
Vocode ships as two distinct products:
- Vocode Core: the open source library providing integrations, orchestration, and abstractions for building voice applications on any AI stack. Available on GitHub under the MIT License.
- Vocode API: a hosted, enterprise-grade API for managing AI agents on phone calls, built on top of Vocode Core. Accessible via a dashboard and supported by Python and Node.js SDKs.
This separation lets developers self-host the full stack using Vocode Core or use the managed API for production telephony workloads.
Integrations and Supported Services
Vocode Core ships with out-of-the-box integrations across the entire voice pipeline:
- Transcription: AssemblyAI, Deepgram, Gladia, Google Cloud, Microsoft Azure, RevAI, OpenAI Whisper, Whisper.cpp
- LLMs: OpenAI, Anthropic
- Speech synthesis: Rime.ai, Microsoft Azure, Google Cloud, Play.ht, ElevenLabs, Cartesia, Coqui (OSS), gTTS, StreamElements, Bark, AWS Polly
- Telephony: Inbound and outbound phone calls, Zoom dial-in
- Frameworks: LangChain agent integration for outbound calls
A React SDK (vocode-react-sdk) is also available for browser-based voice applications.
Use Cases and Deployment Targets
The library supports several deployment patterns:
- Real-time microphone/speaker conversations on local system audio
- Inbound phone numbers that respond with LLM-based agents
- Outbound phone calls initiated and managed by LLM agents
- Zoom meeting dial-in bots
- Voice-based personal assistants and interactive apps (e.g., voice chess)
- LangChain-integrated agents that can place real phone calls
Current Status
The latest release of vocode-core is v0.1.113, published on June 18, 2024. The most recent code push to the main branch was on November 15, 2024. The project's GitHub page notes that the team is actively looking for community maintainers. The repository remains public and the MIT License is unchanged. The Vocode API dashboard is accessible at dashboard.vocode.dev for hosted usage.
Pricing
Open Source
Full open source library available under MIT License via GitHub.
- Real-time streaming voice conversations
- Inbound and outbound phone call support
- Modular transcriber/agent/synthesizer pipeline
- All built-in integrations
- Self-hosted telephony server
Capabilities
Key Features
- Real-time streaming voice conversations
- Inbound and outbound phone call support
- Zoom meeting dial-in
- Modular transcriber/agent/synthesizer pipeline
- LLM agent orchestration
- LangChain integration
- Python and Node.js SDKs
- React SDK for browser-based voice apps
- Self-hosted telephony server
- Open source MIT license
