Parlor
On-device, real-time multimodal AI that enables natural voice and vision conversations running entirely on your local machine using Gemma 4 E2B and Kokoro TTS.
At a Glance
Fully free and open-source under Apache License 2.0. Self-host on your own machine.
Listed Apr 2026
About Parlor
Parlor is an open-source, on-device multimodal AI assistant for real-time voice and vision conversations with no cloud dependency. It uses Google's Gemma 4 E2B model for speech and vision understanding and Kokoro for text-to-speech, with everything running locally on Apple Silicon or on Linux with a supported GPU. The project aims to eliminate server costs for AI-powered language learning and conversation; on an Apple M3 Pro, end-to-end latency is roughly 2.5–3.0 seconds.
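Because Kokoro runs through MLX on Mac and ONNX on Linux, the TTS backend has to be chosen per platform. The sketch below shows one way to make that choice, assuming it is based on `platform.system()` and `platform.machine()`. The helper name `pick_tts_backend` is hypothetical and not taken from Parlor's source.

```python
import platform
from typing import Optional


def pick_tts_backend(system: Optional[str] = None,
                     machine: Optional[str] = None) -> str:
    """Return a Kokoro backend name ("mlx" or "onnx") for a host.

    Hypothetical helper: it mirrors the listing's "MLX on Mac, ONNX on
    Linux" description and is not Parlor's actual code. It does not
    check whether a supported GPU is present.
    """
    system = system or platform.system()     # e.g. "Darwin", "Linux"
    machine = machine or platform.machine()  # e.g. "arm64", "x86_64"
    if system == "Darwin" and machine == "arm64":
        return "mlx"   # Apple Silicon Mac
    if system == "Linux":
        return "onnx"  # Linux host
    raise RuntimeError(f"unsupported platform: {system}/{machine}")
```

Calling `pick_tts_backend()` with no arguments inspects the current machine. Passing both values explicitly keeps the function easy to test.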
- On-device inference: Runs entirely on your local machine, using LiteRT-LM (GPU) for Gemma 4 E2B and MLX (Mac) or ONNX (Linux) for Kokoro TTS; no cloud API calls are required.
- Real-time voice activity detection: Silero VAD runs in the browser, so conversation is hands-free with no push-to-talk button.
- Barge-in support: Interrupt the AI mid-sentence by speaking, enabling natural conversational flow.
- Sentence-level TTS streaming: Audio playback begins before the full response is generated, reducing perceived latency.
- Multimodal vision + speech: Point your camera at objects and discuss them in real time; the model processes both audio and JPEG video frames simultaneously.
- Multilingual support: Gemma 4 E2B supports multiple languages, allowing users to fall back to their native language during conversations.
- FastAPI WebSocket backend: A lightweight Python server handles audio PCM and JPEG frame ingestion over WebSocket and streams audio chunks back to the browser.
- Quick start with uv: Clone the repo, run `uv sync` and `uv run server.py`, then open http://localhost:8000. Models (~2.6 GB) download automatically on first run.
- Configurable model path and port: Set the `MODEL_PATH` environment variable to use a local model file and `PORT` to change the server port.
- Apache 2.0 licensed: Free to use, modify, and distribute.
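The two environment overrides can be read with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: port 8000 is taken as the default (from the quick-start URL), and an unset or empty `MODEL_PATH` is taken to mean "download the model automatically". The names `Settings` and `load_settings` are hypothetical, not Parlor's own.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_path: Optional[str]  # None -> assume the model is auto-downloaded
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MODEL_PATH and PORT as the listing describes.

    Hypothetical helper; defaults are assumptions (port 8000 from the
    quick-start URL, no MODEL_PATH meaning automatic download).
    """
    model_path = env.get("MODEL_PATH") or None  # treat "" as unset
    port = int(env.get("PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return Settings(model_path=model_path, port=port)
```

With the server reading variables this way, `PORT=9000 uv run server.py` would serve on port 9000 instead of 8000.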
Pricing
Open Source
- On-device real-time multimodal AI
- Voice and vision conversations
- No cloud costs
- Apache 2.0 license
- Full source code access
Capabilities
Key Features
- On-device real-time multimodal AI
- Voice and vision conversations
- Gemma 4 E2B model integration
- Kokoro TTS (MLX on Mac, ONNX on Linux)
- Browser-based voice activity detection (Silero VAD)
- Barge-in interruption support
- Sentence-level TTS streaming
- FastAPI WebSocket server
- Automatic model download on first run
- Multilingual support
- No cloud dependency
- Configurable model path and server port
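Sentence-level TTS streaming means each sentence can be sent to text-to-speech as soon as the model finishes it, without waiting for the whole reply. A minimal sketch of that buffering step follows. It assumes a simple regex rule (sentence-ending punctuation followed by whitespace), and the function name `sentences_from_stream` is hypothetical. Parlor's actual splitter may work differently.

```python
import re
from typing import Iterable, Iterator

# Assumed rule: one or more of . ! ? followed by whitespace ends a sentence.
_SENTENCE_END = re.compile(r"[.!?]+\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as they appear in a stream of text chunks.

    Each yielded sentence could be passed to TTS immediately, so audio
    playback can start before generation finishes.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently in the buffer.
        while (match := _SENTENCE_END.search(buffer)) is not None:
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    tail = buffer.strip()
    if tail:  # flush the final, unterminated fragment when the stream ends
        yield tail
```

For barge-in, a caller could stop iterating the generator and drop any queued audio when the user starts speaking. The regex rule also splits on abbreviations such as "Dr.", which is a known limitation of this simple approach. The walrus operator requires Python 3.8 or later.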
