Parlor

Name: Parlor
Availability: Online only
Author: Fikri Karim

On-device, real-time multimodal AI that enables natural voice and vision conversations running entirely on your local machine using Gemma 4 E2B and Kokoro TTS.

At a Glance

Pricing

Open Source

Fully free and open-source under Apache License 2.0. Self-host on your own machine.

Available On

macOS

Linux

Web

API

CLI

Fikri Karim · Atlanta, GA · Est. 2025

Listed Apr 2026

About Parlor

Parlor is an open-source, on-device multimodal AI assistant that lets you have real-time voice and vision conversations without any cloud dependency. It uses Google's Gemma 4 E2B model for speech and vision understanding and Kokoro for text-to-speech, all running locally on Apple Silicon or on Linux with a supported GPU. The project is designed to eliminate server costs for AI-powered language learning and conversation. Total end-to-end latency is roughly 2.5–3.0 seconds on an Apple M3 Pro.

On-device inference: Runs entirely on your local machine, using LiteRT-LM (GPU) for Gemma 4 E2B and MLX (Mac) or ONNX (Linux) for Kokoro TTS. No cloud API calls are required.
Real-time voice activity detection: Uses Silero VAD in the browser for hands-free conversation, with no push-to-talk button.
Barge-in support: Interrupt the AI mid-sentence by speaking, enabling natural conversational flow.
Sentence-level TTS streaming: Audio playback begins before the full response is generated, reducing perceived latency.
Multimodal vision + speech: Point your camera at objects and discuss them in real time; the model processes both audio and JPEG video frames simultaneously.
Multilingual support: Gemma 4 E2B supports multiple languages, allowing users to fall back to their native language during conversations.
FastAPI WebSocket backend: A lightweight Python server handles audio PCM and JPEG frame ingestion over WebSocket and streams audio chunks back to the browser.
Quick start with uv: Clone the repo, run uv sync and then uv run server.py, and open http://localhost:8000. Models (~2.6 GB) download automatically on first run.
Configurable model path and port: Set the MODEL_PATH environment variable to use a local model file, and PORT to change the server port.
Apache 2.0 licensed: Free to use, modify, and distribute.
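
The sentence-level TTS streaming described above can be illustrated with a small sketch. This is not Parlor's actual code: the function name stream_sentences and the punctuation-based boundary rule are assumptions made for illustration. The idea is to buffer model output as it arrives and release each sentence as soon as it is complete, so that speech synthesis can start before the full response exists.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ., ! or ? followed by whitespace ends a sentence.
# A real splitter would also need to handle abbreviations such as "Dr.".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Everything before the last boundary is complete; the tail may still grow.
        *complete, buffer = _SENTENCE_END.split(buffer)
        for sentence in complete:
            if sentence:
                yield sentence
    # Flush whatever remains once generation has finished.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Stand-in for a model's token stream; each sentence would go to TTS.
    for sentence in stream_sentences(["Hi", ".", " Ready", "?"]):
        print(sentence)
```

In this sketch a sentence is released only when the following token brings the whitespace that confirms the boundary, so the first sentence can be handed to the speech synthesizer while later ones are still being generated.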

Pricing

Open Source

Fully free and open-source under Apache License 2.0. Self-host on your own machine.

On-device real-time multimodal AI
Voice and vision conversations
No cloud costs
Apache 2.0 license
Full source code access

Capabilities

Key Features

On-device real-time multimodal AI
Voice and vision conversations
Gemma 4 E2B model integration
Kokoro TTS (MLX on Mac, ONNX on Linux)
Browser-based voice activity detection (Silero VAD)
Barge-in interruption support
Sentence-level TTS streaming
FastAPI WebSocket server
Automatic model download on first run
Multilingual support
No cloud dependency
Configurable model path and server port
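
As a rough illustration of the configurable model path and server port, the sketch below reads MODEL_PATH and PORT from the environment. Only the two variable names and the default port 8000 come from the listing; the ServerConfig class, the load_config function, and the validation rules are hypothetical and may differ from what Parlor's server.py actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Matches the http://localhost:8000 address given in the quick start.
DEFAULT_PORT = 8000


@dataclass(frozen=True)
class ServerConfig:
    # None means "no override": in this sketch the model would be downloaded.
    model_path: Optional[str]
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Build a config from environment variables (hypothetical helper)."""
    model_path = env.get("MODEL_PATH") or None
    raw_port = env.get("PORT", "").strip()
    port = int(raw_port) if raw_port else DEFAULT_PORT
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return ServerConfig(model_path=model_path, port=port)
```

With a loader along these lines, starting the server as PORT=9000 uv run server.py would move it to port 9000, and leaving both variables unset would keep the defaults.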

Integrations

Gemma 4 E2B (Google DeepMind)

LiteRT-LM (Google AI Edge)

Kokoro TTS (Hexgrad)

Silero VAD

HuggingFace

MLX

ONNX

FastAPI
