A capture-first personal knowledge base that collects links, text, screenshots, and notes, then uses AI to process them into searchable, reusable knowledge pages.
At a Glance
Free to run locally or self-host under the source-available license for personal, educational, research, and internal organizational use.
Listed May 2026
About Sift
Sift is a capture-first personal knowledge base built by Yuanlw, written primarily in TypeScript and hosted on GitHub. It is designed to close the gap between saving information and actually reusing it: users dump links, text, screenshots, and notes first, and AI analysis, association, and indexing happen in the background. The project is currently a functional personal MVP, not yet a mature public SaaS product.
What It Is
Sift sits in the personal knowledge management category, but with a specific philosophy: saving must be fast, and understanding can happen later. Rather than requiring users to assign titles, categories, or tags at capture time, Sift accepts raw material — URLs, copied text, images, quick notes — and processes it asynchronously into structured source records, readable wiki-style knowledge pages, and retrieval-ready vector chunks. The result is a personal knowledge asset that can be searched, queried, and fed into external Agent workflows.
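A minimal sketch of what "save first, understand later" can look like as a data model. All type and function names here (`Capture`, `SourceRecord`, `capture`, `deriveSourceRecord`) are hypothetical and chosen for illustration; Sift's actual schema is not described in this listing. The point is only that the capture record carries no title, category, or tags, and that those are derived in a later step.

```typescript
// Hypothetical shapes for illustration; not Sift's actual schema.
type CaptureKind = "url" | "text" | "image" | "note";

interface Capture {
  id: string;
  kind: CaptureKind;
  payload: string; // URL, raw text, or a reference to an uploaded image
  capturedAt: Date;
  // Deliberately no title, category, or tags: nothing is required at capture time.
}

interface SourceRecord {
  captureId: string;
  title: string;
  text: string;
}

let captureCounter = 0;

// Saving is cheap: store the raw material in the inbox and return immediately.
function capture(inbox: Capture[], kind: CaptureKind, payload: string): Capture {
  captureCounter += 1;
  const item: Capture = {
    id: `capture-${captureCounter}`,
    kind,
    payload,
    capturedAt: new Date(),
  };
  inbox.push(item);
  return item;
}

// Stand-in for the background AI step. The title is derived later, here naively
// from the first non-empty line; a real pipeline would call a chat model instead.
function deriveSourceRecord(item: Capture): SourceRecord {
  const firstLine =
    item.payload
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)[0] ?? "Untitled";
  return { captureId: item.id, title: firstLine.slice(0, 80), text: item.payload };
}
```

In this sketch, `capture` never blocks on analysis, so structure is added only when `deriveSourceRecord` (or its model-backed equivalent) runs afterwards.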
Core Workflow
The pipeline Sift implements follows a clear sequence:
- Collect — Quick capture of links, text, screenshots, and notes via an inbox interface; supports batch URL import, browser bookmark HTML import, and bulk photo/screenshot import.
- Process — Background extraction, structuring, source record generation, knowledge page generation, semantic chunking, and vector indexing.
- Organize — Inbox views for today's captures, in-progress, failed, pending notes, ignored, and test data; failed items can be retried, supplemented, or ignored.
- Retrieve — Full-library Q&A, per-knowledge-page Q&A with history, full-text search, semantic recall, recent review, knowledge discovery, and duplicate detection.
- Expose — Agent API and MCP endpoint so external tools can read Sift's knowledge context.
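The inbox views in the Organize step imply a small set of item states, with failed items able to be retried, supplemented, or ignored. One way to model that is a transition table, sketched below. The status and action names (`queued`, `processing`, `supplement`, and so on) are assumptions made for this example and are not taken from Sift's code.

```typescript
// Hypothetical status and action names; Sift's real state model may differ.
type ItemStatus = "queued" | "processing" | "done" | "failed" | "ignored";
type ItemAction = "start" | "succeed" | "fail" | "retry" | "supplement" | "ignore";

// Allowed transitions. Anything not listed is rejected.
const transitions: Record<ItemStatus, Partial<Record<ItemAction, ItemStatus>>> = {
  queued: { start: "processing", ignore: "ignored" },
  processing: { succeed: "done", fail: "failed" },
  // A failed item can be retried as-is, re-queued after the user adds
  // more material, or dropped from the active views.
  failed: { retry: "queued", supplement: "queued", ignore: "ignored" },
  done: {},
  ignored: {},
};

function applyAction(status: ItemStatus, action: ItemAction): ItemStatus {
  const next = transitions[status][action];
  if (next === undefined) {
    throw new Error(`Cannot apply "${action}" to an item that is "${status}"`);
  }
  return next;
}
```

Keeping the allowed moves in one table makes it straightforward to drive both the background worker and the inbox filters from the same source of truth.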
Architecture and Model Configuration
Sift works with three model types: a text/chat model for extraction, structuring, knowledge page generation, and Q&A; an embedding model for retrieval; and an optional vision model for image OCR. The /settings page offers two modes: using Sift's default models (with quota tracking) or configuring a custom OpenAI-compatible endpoint. Custom API keys are never returned to the frontend; multi-user deployments can encrypt keys server-side. The model layer is designed to support OpenAI, Anthropic, Google Gemini, Qwen, DeepSeek, and local model gateways without changing the product's core boundaries.
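To illustrate the two settings modes and the rule that custom API keys never reach the frontend, here is a hedged sketch of a settings type plus a redaction function. The type names, field names, and the `hasApiKey` flag are assumptions for this example, not Sift's actual settings schema or API.

```typescript
// Hypothetical settings shapes; not Sift's actual schema.
interface CustomEndpoint {
  baseUrl: string; // an OpenAI-compatible endpoint
  apiKey: string;  // stored server-side only
  model: string;
}

type ModelSettings =
  | { mode: "default" }
  | { mode: "custom"; chat: CustomEndpoint; embedding: CustomEndpoint; vision?: CustomEndpoint };

// What the browser is allowed to see: the key is replaced by a boolean.
interface PublicEndpoint {
  baseUrl: string;
  model: string;
  hasApiKey: boolean;
}

interface PublicDefaultSettings {
  mode: "default";
}

interface PublicCustomSettings {
  mode: "custom";
  chat: PublicEndpoint;
  embedding: PublicEndpoint;
  vision?: PublicEndpoint;
}

type PublicSettings = PublicDefaultSettings | PublicCustomSettings;

function redactEndpoint(endpoint: CustomEndpoint): PublicEndpoint {
  return {
    baseUrl: endpoint.baseUrl,
    model: endpoint.model,
    hasApiKey: endpoint.apiKey.length > 0,
  };
}

// Build the frontend-safe view; the apiKey field is never copied.
function toPublicSettings(settings: ModelSettings): PublicSettings {
  if (settings.mode === "default") {
    return { mode: "default" };
  }
  const result: PublicCustomSettings = {
    mode: "custom",
    chat: redactEndpoint(settings.chat),
    embedding: redactEndpoint(settings.embedding),
  };
  if (settings.vision !== undefined) {
    result.vision = redactEndpoint(settings.vision); // vision is optional
  }
  return result;
}
```

Because `PublicEndpoint` has no `apiKey` field at all, the type checker helps catch accidental leaks in addition to the runtime copy being selective.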
Knowledge Graph and Retrieval
Phases P10 and P11 introduced an invisible knowledge relationship layer and graph-aware retrieval. Ask and Agent queries can now expand recall using Source-Wiki relationships, related wiki pages, and duplicate source signals. Phase P12 added one-click human-confirmed merging of high-confidence related or duplicate discoveries into existing wiki pages, preserving merge history, source relationships, and chunk reconstruction.
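Graph-aware retrieval of this kind can be pictured as a one-hop expansion: pages directly matched by search pull in their related pages at a discounted score. The sketch below is one plausible approach under stated assumptions, not Sift's algorithm; the `Hit` and `RelationGraph` shapes, the `decay` factor, and the function name are all hypothetical.

```typescript
// Hypothetical shapes; the listing does not describe Sift's actual retrieval code.
interface Hit {
  pageId: string;
  score: number; // relevance from full-text or semantic search, higher is better
}

// pageId -> ids of related wiki pages (one direction per entry)
type RelationGraph = Record<string, string[]>;

// Expand direct hits with one hop of related pages. A neighbor inherits the
// hit's score multiplied by `decay`, and each page keeps its best score.
function expandRecall(hits: Hit[], graph: RelationGraph, decay = 0.5, limit = 10): Hit[] {
  const best: Record<string, number> = {};
  for (const hit of hits) {
    best[hit.pageId] = Math.max(best[hit.pageId] ?? 0, hit.score);
  }
  for (const hit of hits) {
    for (const neighbor of graph[hit.pageId] ?? []) {
      const inherited = hit.score * decay;
      if (inherited > (best[neighbor] ?? 0)) {
        best[neighbor] = inherited;
      }
    }
  }
  return Object.keys(best)
    .map((pageId) => ({ pageId, score: best[pageId] ?? 0 }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

With a `decay` below 1, a related page never outranks the hit that introduced it, so expansion widens recall without displacing direct matches.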
Current Status
The README explicitly states that Sift is a usable personal MVP suitable for daily personal use and ongoing product review, but not yet a mature public hosted SaaS. Completed phases (P0–P12) cover the full capture-first foundation, extraction, source/knowledge page generation, search, Q&A, Agent API, MCP, mobile-first capture, external import, review/discovery, model metering, account security, knowledge graph, and merge workflows.
Still needed before broader deployment: email verification, password recovery, team/multi-tenant support, production task queues, model provider expansion, regression testing, and a clearer account/deployment system.
The license is source-available (not OSI open source): personal, educational, research, and internal organizational use is permitted, but offering it as a public SaaS or resale service without explicit written permission is not allowed.
Pricing
Self-Hosted
Free to run locally or self-host under the source-available license for personal, educational, research, and internal organizational use.
- Full capture-first inbox
- AI extraction and knowledge page generation
- Full-text and semantic search
- Q&A with history
- Agent API and MCP endpoint
Capabilities
Key Features
- Capture-first inbox for links, text, screenshots, and notes
- Background AI extraction and structuring
- Source record and knowledge page generation
- Full-text search and semantic recall
- Full-library and per-page Q&A with history
- Batch URL import, browser bookmark HTML import, and bulk screenshot import
- Recent review, knowledge discovery, and duplicate detection
- Agent API and MCP endpoint for external tool integration
- Knowledge relationship graph and graph-aware retrieval
- One-click human-confirmed merge of related/duplicate content
- Model configuration via settings UI (default or custom OpenAI-compatible)
- Model usage metering and quota tracking
- Inbox views: today, in-progress, failed, pending notes, ignored
