# Sift

> A capture-first personal knowledge base that collects links, text, screenshots, and notes, then uses AI to process them into searchable, reusable knowledge pages.

Sift is a capture-first personal knowledge base built by Yuanlw, written primarily in TypeScript and hosted on GitHub. It is designed to close the gap between saving information and actually reusing it — letting users dump links, text, screenshots, and notes first, then letting AI analysis, association, and indexing happen in the background. The project is currently a functional personal MVP, not yet a mature public SaaS product.

## What It Is

Sift sits in the personal knowledge management category, but with a specific philosophy: saving must be fast, and understanding can happen later. Rather than requiring users to assign titles, categories, or tags at capture time, Sift accepts raw material — URLs, copied text, images, quick notes — and processes it asynchronously into structured source records, readable wiki-style knowledge pages, and retrieval-ready vector chunks. The result is a personal knowledge asset that can be searched, queried, and fed into external Agent workflows.
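
The README does not say how Sift splits a knowledge page into retrieval-ready chunks. As a stand-in, here is a minimal sketch that packs whole paragraphs into chunks under a character budget. The `Chunk` shape, the `chunkText` name, and the 800-character default are hypothetical. A true semantic chunker would also weigh sentence boundaries or embedding similarity, which this simplified version does not.

```typescript
// Hypothetical shape of a retrieval chunk; field names are illustrative, not Sift's schema.
interface Chunk {
  sourceId: string;
  index: number;
  text: string;
}

// Pack whole paragraphs into chunks of at most `maxChars` characters.
// A single paragraph longer than maxChars becomes its own oversized chunk.
function chunkText(sourceId: string, text: string, maxChars = 800): Chunk[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let buffer = "";

  for (const p of paragraphs) {
    const candidate = buffer === "" ? p : `${buffer}\n\n${p}`;
    if (candidate.length > maxChars && buffer !== "") {
      // Budget exceeded: emit what we have and start a new chunk with this paragraph.
      chunks.push({ sourceId, index: chunks.length, text: buffer });
      buffer = p;
    } else {
      buffer = candidate;
    }
  }
  if (buffer !== "") {
    chunks.push({ sourceId, index: chunks.length, text: buffer });
  }
  return chunks;
}
```

Each chunk would then be embedded and stored with its `sourceId`, so a semantic hit can be traced back to the source record and knowledge page it came from.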

## Core Workflow

The pipeline Sift implements follows a clear sequence:

- **Collect** — Quick capture of links, text, screenshots, and notes via an inbox interface; supports batch URL import, browser bookmark HTML import, and bulk photo/screenshot import.
- **Process** — Background extraction, structuring, source record generation, knowledge page generation, semantic chunking, and vector indexing.
- **Organize** — Inbox views for today's captures, in-progress, failed, pending notes, ignored, and test data; failed items can be retried, supplemented, or ignored.
- **Retrieve** — Full-library Q&A, per-knowledge-page Q&A with history, full-text search, semantic recall, recent review, knowledge discovery, and duplicate detection.
- **Expose** — Agent API and MCP endpoint so external tools can read Sift's knowledge context.
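
The README lists the inbox views and says failed items can be retried, supplemented, or ignored, but it does not describe how item states are stored. A minimal sketch, assuming a single status field guarded by an explicit transition table; every status and action name below is hypothetical:

```typescript
// Hypothetical lifecycle for a captured item. Status names loosely mirror the inbox
// views (in-progress, failed, pending notes, ignored); they are not Sift's real values.
type CaptureStatus = "queued" | "processing" | "done" | "failed" | "needs_note" | "ignored";

type CaptureAction =
  | "start"
  | "succeed"
  | "fail"
  | "requestNote"
  | "retry"
  | "supplement"
  | "ignore";

// For each action: the statuses it may be applied from, and the status it produces.
const TRANSITIONS: Record<CaptureAction, { from: CaptureStatus[]; to: CaptureStatus }> = {
  start: { from: ["queued"], to: "processing" },
  succeed: { from: ["processing"], to: "done" },
  fail: { from: ["processing"], to: "failed" },
  // Background processing decided it needs more context from the user.
  requestNote: { from: ["processing"], to: "needs_note" },
  retry: { from: ["failed"], to: "queued" },
  // The user adds text to a failed or note-pending item, which requeues it.
  supplement: { from: ["failed", "needs_note"], to: "queued" },
  ignore: { from: ["queued", "failed", "needs_note"], to: "ignored" },
};

// Apply an action, rejecting transitions the lifecycle does not allow.
function applyAction(status: CaptureStatus, action: CaptureAction): CaptureStatus {
  const rule = TRANSITIONS[action];
  if (!rule.from.includes(status)) {
    throw new Error(`cannot ${action} a capture in status "${status}"`);
  }
  return rule.to;
}
```

Routing both retries and supplements back to `queued` means one background worker handles first attempts and reprocessing alike.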

## Architecture and Model Configuration

Sift requires three model types: a text/chat model for extraction, structuring, knowledge page generation, and Q&A; an embedding model for retrieval; and an optional vision model for image OCR. The `/settings` page offers two modes — using Sift's default models (with quota tracking) or configuring a custom OpenAI-compatible endpoint. Custom API keys are never returned to the frontend; multi-user deployments can encrypt keys server-side. The model layer is designed to support OpenAI, Anthropic, Google Gemini, Qwen, DeepSeek, and local model gateways without changing the product's core boundaries.
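
The rule that custom API keys never reach the frontend can be illustrated with a server-side redaction step. This is a sketch under assumptions: the `ModelSettings` shapes, `toClientView`, and `endpointFor` are hypothetical names, not Sift's code. The only external convention relied on is that OpenAI-compatible servers expose `/chat/completions` and `/embeddings` under a base URL such as `.../v1`.

```typescript
// Hypothetical settings shapes; Sift's real field names may differ.
interface CustomModelSettings {
  mode: "custom";
  baseUrl: string; // OpenAI-compatible base URL, typically ending in /v1
  apiKey: string; // kept server-side only
  chatModel: string; // extraction, structuring, page generation, Q&A
  embeddingModel: string; // retrieval
  visionModel?: string; // optional, for image OCR
}

interface DefaultModelSettings {
  mode: "default"; // Sift's default models, metered against a quota
  quotaUsed: number;
  quotaLimit: number;
}

type ModelSettings = CustomModelSettings | DefaultModelSettings;

// What the /settings page may receive: a flag instead of the raw key.
type ClientModelSettings =
  | (Omit<CustomModelSettings, "apiKey"> & { hasApiKey: boolean })
  | DefaultModelSettings;

function toClientView(settings: ModelSettings): ClientModelSettings {
  if (settings.mode === "default") return { ...settings };
  const { apiKey, ...rest } = settings;
  return { ...rest, hasApiKey: apiKey.length > 0 };
}

// Build an OpenAI-compatible endpoint URL from the configured base URL.
function endpointFor(settings: CustomModelSettings, kind: "chat" | "embeddings"): string {
  const base = settings.baseUrl.replace(/\/+$/, "");
  return kind === "chat" ? `${base}/chat/completions` : `${base}/embeddings`;
}
```

Because the client type has no `apiKey` field at all, the compiler rejects any code path that tries to send the key to the browser.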

## Knowledge Graph and Retrieval

Phases P10 and P11 introduced a knowledge relationship layer that runs behind the scenes, invisible to the user, along with graph-aware retrieval. Ask and Agent queries can now expand recall using Source-Wiki relationships, related wiki pages, and duplicate source signals. Phase P12 added one-click, human-confirmed merging of high-confidence related or duplicate discoveries into existing wiki pages; each merge keeps its history, preserves source relationships, and rebuilds the affected chunks.
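
The README names the relationship types used to expand recall but not how their scores are combined. A minimal sketch, assuming one-hop expansion from vector-search seeds with a fixed decay weight per edge type, where a node reached by several paths keeps its best score. The edge kinds, weights, and `expandRecall` function are all hypothetical:

```typescript
// Hypothetical relationship edges; Sift's real graph schema is not documented here.
type EdgeKind = "source_wiki" | "related_wiki" | "duplicate_source";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

// Assumed decay: the fraction of a seed's score that a neighbor inherits.
const EDGE_WEIGHT: Record<EdgeKind, number> = {
  source_wiki: 0.8,
  related_wiki: 0.6,
  duplicate_source: 0.9,
};

// Expand vector-search seeds by exactly one hop over the relationship graph.
function expandRecall(
  seeds: Map<string, number>,
  edges: Edge[],
  topK = 10,
): Array<[string, number]> {
  const scores = new Map(seeds);
  for (const edge of edges) {
    // Treat edges as undirected for recall purposes.
    for (const [a, b] of [[edge.from, edge.to], [edge.to, edge.from]] as const) {
      const seedScore = seeds.get(a);
      if (seedScore === undefined) continue; // only seeds propagate, so no multi-hop chains
      const propagated = seedScore * EDGE_WEIGHT[edge.kind];
      if (propagated > (scores.get(b) ?? 0)) scores.set(b, propagated);
    }
  }
  return Array.from(scores.entries())
    .sort((x, y) => y[1] - x[1])
    .slice(0, topK);
}
```

Limiting expansion to one hop keeps recall close to the original query; a wiki page related only to another related page is not pulled in.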

## Current Status

The README explicitly states that Sift is a usable personal MVP suitable for daily personal use and ongoing product review, but not yet a mature public hosted SaaS. Completed phases (P0–P12) cover the full capture-first foundation, extraction, source/knowledge page generation, search, Q&A, Agent API, MCP, mobile-first capture, external import, review/discovery, model metering, account security, knowledge graph, and merge workflows. Still needed before broader deployment: email verification, password recovery, team/multi-tenant support, production task queues, model provider expansion, regression testing, and a clearer account/deployment system.

The license is source-available (not OSI open source): personal, educational, research, and internal organizational use is permitted, but offering it as a public SaaS or resale service without explicit written permission is not allowed.

## Features
- Capture-first inbox for links, text, screenshots, and notes
- Background AI extraction and structuring
- Source record and knowledge page generation
- Full-text search and semantic recall
- Full-library and per-page Q&A with history
- Batch URL import, browser bookmark HTML import, and bulk screenshot import
- Recent review, knowledge discovery, and duplicate detection
- Agent API and MCP endpoint for external tool integration
- Knowledge relationship graph and graph-aware retrieval
- One-click human-confirmed merge of related/duplicate content
- Model configuration via settings UI (default or custom OpenAI-compatible)
- Model usage metering and quota tracking
- Inbox views: today, in-progress, failed, pending notes, ignored, test data

## Integrations
OpenAI-compatible text/chat models, OpenAI-compatible embedding models, Vision/OCR models, Anthropic, Google Gemini, Qwen, DeepSeek, Local model gateways, MCP (Model Context Protocol), Stripe (for SaaS billing in hosted deployments), Docker / Docker Compose, PostgreSQL

## Platforms
Web, API, CLI

## Pricing
Free

## Version
P12 (MVP)

## Links
- Website: https://github.com/Yuanlw/Sift
- Documentation: https://github.com/Yuanlw/Sift/blob/main/docs/local-setup.md
- Repository: https://github.com/Yuanlw/Sift
- EveryDev.ai: https://www.everydev.ai/tools/sift-knowledge-base
