Sift

Name: Sift
Availability: OnlineOnly
Author: Yuanlw

A capture-first personal knowledge base that collects links, text, screenshots, and notes, then uses AI to process them into searchable, reusable knowledge pages.

At a Glance

Pricing

Free

Free to run locally or self-host under the source-available license for personal, educational, research, and internal organizational use.

Available On

Web

API

CLI

Listed May 2026

About Sift

Sift is a capture-first personal knowledge base built by Yuanlw, written primarily in TypeScript and hosted on GitHub. It is designed to close the gap between saving information and actually reusing it: users dump links, text, screenshots, and notes first, and AI analysis, association, and indexing run in the background afterward. The project is currently a functional personal MVP, not yet a mature public SaaS product.

What It Is

Sift sits in the personal knowledge management category, but with a specific philosophy: saving must be fast, and understanding can happen later. Rather than requiring users to assign titles, categories, or tags at capture time, Sift accepts raw material — URLs, copied text, images, quick notes — and processes it asynchronously into structured source records, readable wiki-style knowledge pages, and retrieval-ready vector chunks. The result is a personal knowledge asset that can be searched, queried, and fed into external Agent workflows.
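The three outputs named above (source records, knowledge pages, and vector chunks) can be pictured as plain data shapes. The sketch below is illustrative only: every type name, field, and the `chunkPage` helper are hypothetical and not taken from Sift's codebase, and the chunker uses naive paragraph packing as a stand-in for whatever semantic chunking Sift actually performs.

```typescript
// Hypothetical data shapes; none of these names come from Sift's source.
interface SourceRecord {
  id: string;
  kind: "link" | "text" | "screenshot" | "note";
  extractedText: string;
}

interface KnowledgePage {
  id: string;
  title: string;
  body: string;
  sourceIds: string[]; // which source records the page was built from
}

interface Chunk {
  pageId: string;
  index: number;
  text: string;
}

// Naive paragraph packing (NOT semantic chunking): paragraphs are appended
// to a buffer until adding the next one would exceed maxChars.
// A single paragraph longer than maxChars becomes its own oversized chunk.
function chunkPage(page: KnowledgePage, maxChars = 500): Chunk[] {
  const paragraphs = page.body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let buffer = "";
  for (const p of paragraphs) {
    // +2 accounts for the blank-line separator between paragraphs.
    if (buffer.length > 0 && buffer.length + p.length + 2 > maxChars) {
      chunks.push({ pageId: page.id, index: chunks.length, text: buffer });
      buffer = "";
    }
    buffer = buffer.length > 0 ? buffer + "\n\n" + p : p;
  }
  if (buffer.length > 0) {
    chunks.push({ pageId: page.id, index: chunks.length, text: buffer });
  }
  return chunks;
}
```

Each chunk would then be passed to the embedding model and stored alongside its `pageId`, so a retrieval hit can be traced back to the page it came from.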

Core Workflow

The pipeline Sift implements follows a clear sequence:

Collect — Quick capture of links, text, screenshots, and notes via an inbox interface; supports batch URL import, browser bookmark HTML import, and bulk photo/screenshot import.
Process — Background extraction, structuring, source record generation, knowledge page generation, semantic chunking, and vector indexing.
Organize — Inbox views for today's captures, in-progress, failed, pending notes, ignored, and test data; failed items can be retried, supplemented, or ignored.
Retrieve — Full-library Q&A, per-knowledge-page Q&A with history, full-text search, semantic recall, recent review, knowledge discovery, and duplicate detection.
Expose — Agent API and MCP endpoint so external tools can read Sift's knowledge context.
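The Collect, Process, and Organize stages can be read as a small state machine over captured items. The following is a minimal sketch under that assumption; the status names, the `Capture` shape, and the `collect` and `advance` functions are hypothetical and only loosely mirror the inbox views listed above.

```typescript
// Hypothetical model of a captured item; names are illustrative only.
type CaptureKind = "link" | "text" | "screenshot" | "note";
type CaptureStatus = "queued" | "processing" | "done" | "failed" | "ignored";

interface Capture {
  id: number;
  kind: CaptureKind;
  raw: string;
  status: CaptureStatus;
  attempts: number; // how many times processing has started
}

// Allowed transitions: failed items can be retried (re-queued) or ignored.
const transitions: Record<CaptureStatus, CaptureStatus[]> = {
  queued: ["processing", "ignored"],
  processing: ["done", "failed"],
  done: [],
  failed: ["queued", "ignored"],
  ignored: [],
};

// Collect: saving returns immediately with no title, tags, or category.
function collect(id: number, kind: CaptureKind, raw: string): Capture {
  return { id, kind, raw, status: "queued", attempts: 0 };
}

// Move a capture to the next status, rejecting transitions not listed above.
function advance(capture: Capture, next: CaptureStatus): Capture {
  if (transitions[capture.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition: ${capture.status} -> ${next}`);
  }
  const attempts =
    next === "processing" ? capture.attempts + 1 : capture.attempts;
  return { ...capture, status: next, attempts };
}
```

Keeping `collect` free of any model call is the point of the capture-first design: the expensive extraction and indexing work only begins once a background worker moves the item to `processing`.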

Architecture and Model Configuration

Sift requires three model types: a text/chat model for extraction, structuring, knowledge page generation, and Q&A; an embedding model for retrieval; and an optional vision model for image OCR. The /settings page offers two modes — using Sift's default models (with quota tracking) or configuring a custom OpenAI-compatible endpoint. Custom API keys are never returned to the frontend; multi-user deployments can encrypt keys server-side. The model layer is designed to support OpenAI, Anthropic, Google Gemini, Qwen, DeepSeek, and local model gateways without changing the product's core boundaries.
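One way to picture the two settings modes and the rule that custom keys never reach the frontend is a discriminated union with a separate public view. This is a sketch under assumptions: the type names, `DEFAULT_MODELS` placeholders, and the `toPublic` and `resolveModel` helpers are hypothetical, not Sift's actual settings schema.

```typescript
// Hypothetical settings model; not Sift's real schema.
type ModelRole = "chat" | "embedding" | "vision";

interface CustomEndpoint {
  baseUrl: string; // an OpenAI-compatible endpoint
  apiKey: string; // stays on the server
  models: { chat: string; embedding: string; vision?: string };
}

type ModelSettings =
  | { mode: "default" }
  | { mode: "custom"; endpoint: CustomEndpoint };

// The shape the frontend may see: it has no apiKey field at all.
interface PublicModelSettings {
  mode: "default" | "custom";
  baseUrl?: string;
  models?: { chat: string; embedding: string; vision?: string };
  hasApiKey: boolean;
}

// Placeholder model names, used only for illustration.
const DEFAULT_MODELS: Record<ModelRole, string> = {
  chat: "default-chat",
  embedding: "default-embedding",
  vision: "default-vision",
};

// Strip the secret before anything is serialized to the client.
function toPublic(settings: ModelSettings): PublicModelSettings {
  if (settings.mode === "default") {
    return { mode: "default", hasApiKey: false };
  }
  const { baseUrl, apiKey, models } = settings.endpoint;
  return { mode: "custom", baseUrl, models, hasApiKey: apiKey.length > 0 };
}

// Pick the model for a role; vision may be undefined because OCR is optional.
function resolveModel(
  settings: ModelSettings,
  role: ModelRole
): string | undefined {
  if (settings.mode === "default") {
    return DEFAULT_MODELS[role];
  }
  return settings.endpoint.models[role];
}
```

Returning a `hasApiKey` flag instead of the key lets a settings page show that a key is configured without ever echoing it back.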

Knowledge Graph and Retrieval

Phases P10 and P11 introduced an invisible knowledge relationship layer and graph-aware retrieval. Ask and Agent queries can now expand recall using Source-Wiki relationships, related wiki pages, and duplicate source signals. Phase P12 added one-click human-confirmed merging of high-confidence related or duplicate discoveries into existing wiki pages, preserving merge history, source relationships, and chunk reconstruction.
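Graph-aware retrieval of this kind can be sketched as one hop of neighbor expansion over the initial search hits. The example below is an assumption-laden illustration, not Sift's algorithm: the `Hit` and `RelationGraph` types, the `expandRecall` function, and the `decay` factor are all hypothetical, and scores are assumed to be non-negative.

```typescript
// Hypothetical types; Sift's real relationship layer is not documented here.
interface Hit {
  pageId: string;
  score: number; // assumed non-negative, higher is better
}

// Adjacency list: wiki page id -> ids of related pages.
type RelationGraph = Map<string, string[]>;

// Expand direct hits by one hop. Each neighbor inherits the hit's score
// multiplied by `decay`; a page keeps the highest score it receives.
function expandRecall(
  hits: Hit[],
  graph: RelationGraph,
  decay = 0.5,
  limit = 10
): Hit[] {
  const best = new Map<string, number>();

  // Seed with the direct hits.
  for (const h of hits) {
    best.set(h.pageId, Math.max(best.get(h.pageId) ?? 0, h.score));
  }

  // Add related pages at a discounted score.
  for (const h of hits) {
    for (const neighbor of graph.get(h.pageId) ?? []) {
      const candidate = h.score * decay;
      if (candidate > (best.get(neighbor) ?? 0)) {
        best.set(neighbor, candidate);
      }
    }
  }

  const expanded: Hit[] = [];
  best.forEach((score, pageId) => expanded.push({ pageId, score }));
  return expanded.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

With a `decay` below 1, a related page can never outrank the hit that pulled it in, which keeps expansion from drowning out direct matches.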

Current Status

The README explicitly states that Sift is a usable personal MVP suitable for daily personal use and ongoing product review, but not yet a mature public hosted SaaS. Completed phases (P0–P12) cover the full capture-first foundation, extraction, source/knowledge page generation, search, Q&A, Agent API, MCP, mobile-first capture, external import, review/discovery, model metering, account security, knowledge graph, and merge workflows.

Still needed before broader deployment: email verification, password recovery, team/multi-tenant support, production task queues, model provider expansion, regression testing, and a clearer account/deployment system.

The license is source-available (not OSI open source): personal, educational, research, and internal organizational use is permitted, but offering it as a public SaaS or resale service without explicit written permission is not allowed.

Pricing

FREE

Self-Hosted

Full capture-first inbox
AI extraction and knowledge page generation
Full-text and semantic search
Q&A with history
Agent API and MCP endpoint

Capabilities

Key Features

Capture-first inbox for links, text, screenshots, and notes
Background AI extraction and structuring
Source record and knowledge page generation
Full-text search and semantic recall
Full-library and per-page Q&A with history
Batch URL import, browser bookmark HTML import, and bulk screenshot import
Recent review, knowledge discovery, and duplicate detection
Agent API and MCP endpoint for external tool integration
Knowledge relationship graph and graph-aware retrieval
One-click human-confirmed merge of related/duplicate content
Model configuration via settings UI (default or custom OpenAI-compatible)
Model usage metering and quota tracking
Inbox views: today, in-progress, failed, pending notes, ignored

Integrations

OpenAI-compatible text/chat models

OpenAI-compatible embedding models

Vision/OCR models

Anthropic

Google Gemini

Qwen

DeepSeek

Local model gateways

MCP (Model Context Protocol)

Stripe (for SaaS billing in hosted deployments)

Docker / Docker Compose

PostgreSQL

API Available

