# ScreenMind

> AI-powered screen memory tool that captures your screen, analyzes it with Gemma 4, and builds a searchable, conversational local memory — 100% private with zero cloud dependencies.

ScreenMind is an open-source, privacy-first screen memory tool built by Ayush Shekhar and released under the MIT License. It continuously captures your screen, runs multimodal AI analysis locally using Gemma 4 E2B via llama.cpp, and stores everything in a searchable SQLite database — no cloud, no telemetry, no data leaving your machine. The project positions itself explicitly as a privacy-respecting alternative to Microsoft Recall, which the README notes "stores data in plaintext, sends telemetry, and was met with massive privacy backlash."

## What It Is

ScreenMind is a local AI memory system for your desktop. It sits in the background, detects meaningful screen changes (rather than firing on a fixed timer), and sends each screenshot through a multi-model pipeline: EasyOCR extracts raw text, Gemma 4 E2B analyzes the image and text together to produce structured JSON (app name, activity category, mood, scene description, spatial layout), MiniLM-L6-v2 generates semantic embeddings, and FTS5 indexes everything for keyword search. The result is a personal knowledge base you can query conversationally — asking things like "what did Ishaa say on Discord?" and getting the actual message back.
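
The structured output described above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class name `ScreenAnalysis`, the helper `parse_analysis`, and every field name are hypothetical, inferred only from the fields listed in the paragraph.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScreenAnalysis:
    """Hypothetical record for one analyzed screenshot (field names are assumptions)."""
    app_name: str
    category: str
    mood: str
    scene: str
    layout: list = field(default_factory=list)
    ocr_text: str = ""


def parse_analysis(raw_json: str, ocr_text: str = "") -> ScreenAnalysis:
    """Parse model output defensively: missing keys fall back to neutral defaults."""
    data = json.loads(raw_json)
    return ScreenAnalysis(
        app_name=str(data.get("app_name", "unknown")),
        category=str(data.get("category", "other")),
        mood=str(data.get("mood", "neutral")),
        scene=str(data.get("scene", "")),
        layout=list(data.get("layout", [])),
        ocr_text=ocr_text,
    )
```

Keeping the OCR text next to the model's fields means a single row can feed both the keyword index and the embedding step.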

## Architecture and AI Pipeline

The system runs four AI models in concert:

- **EasyOCR** — extracts raw screen text fed as context to Gemma
- **Gemma 4 E2B (via llama.cpp)** — vision + audio + reasoning in one model; handles screenshot analysis, voice memo transcription, meeting transcription, daily summaries, and chat answers
- **MiniLM-L6-v2** — 80MB CPU-resident model generating 384-dim semantic vectors
- **SQLite WAL + FTS5** — zero-config database with concurrent reads and full-text search
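
How the semantic and keyword results are merged is not described here. One common way to combine two ranked lists is reciprocal rank fusion (RRF), sketched below as an illustration rather than as ScreenMind's method. The function name and the constant `k=60` are assumptions; the two input lists stand in for MiniLM similarity results and FTS5 matches.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic_hits = [3, 1, 7]  # ids ordered by embedding similarity (made-up data)
keyword_hits = [1, 9, 3]   # ids ordered by keyword rank (made-up data)
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)  # [1, 3, 9, 7]
```

RRF needs only rank positions, so it avoids having to normalize cosine similarities against keyword scores that live on a different scale.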

Three analysis modes trade speed for depth: Accurate (~76s, deep thinking + layout detection), Balanced (~40s, thinking enabled), and Fast (~12s, no thinking, layout via OCR clustering). A per-app perceptual hash cache lets the pipeline reuse an earlier result when a screen looks unchanged, and its staleness window is app-aware (communication apps refresh faster than IDEs), which cuts the number of inference calls.
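
The caching idea can be sketched as below. This is an illustration under stated assumptions: it uses a simple average hash over a small grayscale grid, and the staleness windows, the `max_distance` threshold, and all names (`AnalysisCache`, `STALENESS_SECONDS`) are hypothetical rather than taken from the project.

```python
import time

# Hypothetical staleness windows in seconds; the real values are not documented here.
STALENESS_SECONDS = {"communication": 30, "ide": 300}
DEFAULT_STALENESS = 120


def average_hash(pixels):
    """Average hash of a small grayscale grid (rows of values in 0-255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class AnalysisCache:
    """Keeps the last analysis per app and reuses it while it is fresh and similar."""

    def __init__(self, max_distance=4):
        self.max_distance = max_distance
        self._entries = {}  # app name -> (hash, stored_at, result)

    def get(self, app, category, image_hash, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(app)
        if entry is None:
            return None
        cached_hash, stored_at, result = entry
        ttl = STALENESS_SECONDS.get(category, DEFAULT_STALENESS)
        if now - stored_at > ttl:
            return None  # too old for this kind of app
        if hamming(cached_hash, image_hash) > self.max_distance:
            return None  # screen changed too much
        return result

    def put(self, app, image_hash, result, now=None):
        now = time.time() if now is None else now
        self._entries[app] = (image_hash, now, result)
```

A cache hit returns the stored analysis and skips the model call; a miss (returned as `None`) means the screenshot should go through the full pipeline and then be stored with `put`.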

## Privacy and Security Model

All computation happens on-device after the one-time model download (~5GB GGUF). Key privacy controls include:

- **Sensitive data filter** — auto-redacts credit cards, SSNs, API keys, and passwords before storage
- **AES encryption at rest** — Fernet encryption for screenshots with OS keyring integration
- **Dashboard PIN lock** — session-based auth with configurable auto-lock timeout
- **Incognito mode** — one-click pause; nothing recorded
- **App blocklist** — silently skips capture for specified applications
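
A redaction pass like the sensitive data filter in the list above might look like the following sketch. The patterns are assumptions chosen for illustration (the project's real rules are not shown here), the API-key prefixes are hypothetical examples, and password detection is left out because it depends on context that a simple regex cannot capture.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical API-key shape: a known prefix followed by a long token.
API_KEY_RE = re.compile(r"\b(?:sk|pk|ghp|AKIA)[-_]?[A-Za-z0-9]{16,}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to avoid redacting arbitrary long numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace SSNs, key-like tokens, and Luhn-valid card numbers with placeholders."""
    def card_sub(match):
        digits = re.sub(r"\D", "", match.group())
        return "[REDACTED_CARD]" if luhn_valid(digits) else match.group()

    text = SSN_RE.sub("[REDACTED_SSN]", text)
    text = API_KEY_RE.sub("[REDACTED_KEY]", text)
    return CARD_RE.sub(card_sub, text)


print(redact("card 4111 1111 1111 1111 ok"))  # card [REDACTED_CARD] ok
```

Running the filter on OCR text before it is written to the database means the raw value never reaches disk, which matches the "before storage" wording above.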

## Agent Platform and MCP Integration

ScreenMind ships a full agent/plugin system. Markdown agents (`.md` files with YAML frontmatter) let anyone write English prompts that Gemma executes against screen data on a schedule. Python plugins (`.py`) get full SDK access with state persistence and direct LLM calls. Four built-in agents cover daily journaling, focus reporting, meeting action extraction, and code changelog summarization.
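
To illustrate the Markdown agent format, the sketch below splits a file into frontmatter and prompt body. It is a simplified stand-in, not the project's loader: it handles only flat `key: value` lines instead of full YAML, and the keys `name` and `schedule` in the sample agent are hypothetical.

```python
def split_frontmatter(source: str):
    """Split a Markdown agent file into (metadata dict, prompt body).

    Handles only flat `key: value` lines; a real loader would use a YAML parser.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return {}, source  # no closing delimiter: treat the whole file as body


# Hypothetical agent file; the field names are illustrative only.
AGENT = """---
name: daily-journal
schedule: daily
---
Summarize what I worked on today in five bullet points.
"""

meta, prompt = split_frontmatter(AGENT)
print(meta["name"], "->", prompt)
```

The metadata would tell a scheduler when to run the agent, while the body is the English prompt handed to the model together with the relevant screen data.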

The project also exposes an MCP server (`mcp_server.py`) over stdio transport, making screen history available to Claude Desktop, Cursor, and VS Code. MCP tools include `search_screen`, `get_recent_activity`, `get_daily_summary`, `capture_now`, and `search_audio` for meeting transcripts.
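
MCP clients that launch stdio servers are usually pointed at a command plus arguments. The sketch below builds such an entry in the `mcpServers` layout used by Claude Desktop's configuration file. The server label `screenmind`, the repository path, and the choice of interpreter are placeholders; the project's `MCP_SETUP.md` is the authority on the exact values.

```python
import json
import sys
from pathlib import Path


def mcp_client_entry(repo_dir: str, python_exe: str = sys.executable) -> dict:
    """Build a client config entry that starts mcp_server.py over stdio."""
    server = Path(repo_dir) / "mcp_server.py"
    return {
        "mcpServers": {
            "screenmind": {  # label is arbitrary; chosen here for illustration
                "command": python_exe,
                "args": [str(server)],
            }
        }
    }


# "/path/to/ScreenMind" is a placeholder for the local clone location.
print(json.dumps(mcp_client_entry("/path/to/ScreenMind"), indent=2))
```

Using the virtual environment's interpreter as `command` matters in practice, because the server needs the same installed dependencies as the main application.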

## Setup Path and System Requirements

ScreenMind requires Python 3.10+ and approximately 5GB of disk space for the Gemma 4 E2B GGUF model; a GPU with 4GB+ VRAM is recommended. It runs on Windows, macOS, and Linux via OS-specific platform adapters. Setup takes four steps: clone the repo, create a virtual environment, install the requirements, and run `main.py`. On first launch it auto-downloads the model, starts `llama-server`, and opens a web dashboard at `http://127.0.0.1:7777`. Configuration is available via `.env`, environment variables, or the Settings tab in the dashboard.

## Current Status

The repository was created in May 2026 and last pushed in June 2026, with 80 stars and 3 forks at time of indexing. The project is actively maintained with CI via GitHub Actions and Codecov coverage tracking. The README notes macOS/Linux platform adapters exist but flags real-hardware testing as a high-impact contribution area, suggesting Windows is the most tested platform currently.

## Features
- Smart screen capture with content-change detection
- Gemma 4 E2B multimodal analysis (vision + audio + reasoning)
- Hybrid semantic + keyword search (MiniLM + FTS5)
- Conversational RAG chat with screen memory
- Voice memo recording and transcription
- Meeting auto-detection and transcription (Zoom/Teams/Meet)
- Analytics dashboard with category breakdown and hourly heatmap
- Day Rewind timelapse playback
- Three analysis modes: Accurate, Balanced, Fast
- Per-app perceptual hash cache
- 100% local inference via llama.cpp
- Sensitive data auto-redaction (credit cards, SSNs, API keys)
- AES encryption at rest with OS keyring
- Dashboard PIN lock with auto-lock timeout
- Incognito mode (one-click pause)
- Agent platform: Markdown AI agents and Python plugins
- MCP server for Claude Desktop, Cursor, VS Code
- Obsidian and Notion integration
- Webhook support (HMAC signed, auto-retry)
- System-wide hotkeys (bookmark, pause, voice memo)
- FastAPI REST server with Swagger docs
- Cross-platform support: Windows, macOS, Linux
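
For the HMAC-signed webhooks listed above, a receiver would typically recompute the signature over the raw request body and compare it in constant time. The sketch below assumes HMAC-SHA256 with a hex-encoded signature; the hash algorithm, the encoding, and the function names are assumptions, since the listing only states that webhooks are HMAC signed.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (algorithm is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Constant-time comparison to avoid leaking how many characters matched."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received)


# Made-up secret and payload for illustration.
body = b'{"event": "bookmark"}'
signature = sign_payload(b"shared-secret", body)
print(verify_signature(b"shared-secret", body, signature))  # True
```

Verification should run on the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and break the signature.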

## Integrations
Claude Desktop (MCP), Cursor (MCP), VS Code (MCP), Obsidian, Notion, Slack (webhooks), Discord (webhooks), IFTTT (webhooks), Zoom (meeting detection), Microsoft Teams (meeting detection), Google Meet (meeting detection), HuggingFace (model download), llama.cpp / llama-server, EasyOCR, MiniLM-L6-v2

## Platforms
Windows, macOS, Linux, Web, API, VS Code extension, CLI

## Pricing
Open Source

## Version
main

## Links
- Website: https://github.com/ayushh0110/ScreenMind
- Documentation: https://github.com/ayushh0110/ScreenMind/blob/main/MCP_SETUP.md
- Repository: https://github.com/ayushh0110/ScreenMind
- EveryDev.ai: https://www.everydev.ai/tools/screenmind
