little-coder
A coding agent tuned for small local language models, built on top of the pi agent framework, enabling offline AI-assisted coding on consumer hardware.
About little-coder
little-coder is an open-source coding agent designed specifically to maximize performance from small local language models (LLMs) running on consumer-grade hardware. Built by Itay Inbar and published on GitHub under the Apache 2.0 license, it layers 20+ TypeScript extensions and 30 skill markdown files on top of the minimal pi agent framework. The project is accompanied by a research write-up on Substack titled "Honey, I Shrunk the Coding Agent", which documents the "scaffold–model fit" thesis behind the design.
What It Is
little-coder is a CLI coding agent that runs entirely offline against local inference servers (llama.cpp, Ollama, LM Studio) while also supporting cloud providers (Anthropic, OpenAI, etc.) through the same interface. It is not a fork of pi — pi is a plain npm dependency providing the agent loop, multi-provider API, TUI, session tree, compaction, and extension model. little-coder adds its small-model-specific scaffolding on top: skill injection, knowledge injection, output repair, quality monitoring, thinking-budget capping, a bash permission gate, checkpoint snapshots, browser automation, and an evidence store. All small-model-specific extensions auto-disable for large or cloud models.
Scaffold–Model Fit: The Core Idea
The project's central claim, documented in the Substack paper, is that architectural adaptation of the agent scaffold — not model scale — is the primary lever for improving small-model coding performance. The paper reports that a 9.7B Qwen3.5 model running through little-coder's scaffold achieved 45.56% on the Aider Polyglot benchmark (225 exercises), compared to a matched-model vanilla Aider baseline of 19.11% on the same benchmark. The project attributes this gap to mechanisms like per-turn skill selection, output-parser repair of malformed tool calls, quality-monitor loop detection, and thinking-budget management.
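Restated as a difference and a ratio, the two reported Aider Polyglot scores work out as follows. This is plain arithmetic on the figures above, not an additional result from the paper:

```latex
\begin{aligned}
\text{absolute gain} &= 45.56\% - 19.11\% = 26.45 \text{ percentage points} \\
\text{relative gain} &= \frac{45.56}{19.11} \approx 2.38\times
\end{aligned}
```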
Benchmark Results
The repository tracks a growing set of benchmark results, all run on a single consumer laptop (i9-14900HX, 32 GB RAM, 8 GB VRAM on RTX 5070 Laptop) with no cloud inference:
- v0.0.2 (paper): Qwen3.5-9B via Ollama — 45.56% on Aider Polyglot (225 exercises)
- v0.0.5: Qwen3.6-35B-A3B via llama.cpp — 78.67% on Aider Polyglot
- v0.1.4: Qwen3.6-35B-A3B — 40.0% on Terminal-Bench-Core v0.1.1 (80 tasks)
- v0.1.13: Qwen3.6-35B-A3B — 24.6% ± 3.2 on Terminal-Bench 2.0 (89 tasks × 5 trials), accepted to the official Terminal-Bench 2.0 leaderboard at rank 120
- v0.1.24: Qwen3.5-9B (Q4_K_M, 5.3 GB on GPU) — 9.2% ± 2.4 on Terminal-Bench 2.0, leaderboard rank 142
- v0.1.27: Qwen3.6-35B-A3B — 40.00% (66/165) on GAIA validation set
The project homepage claims the Qwen3.6-35B-A3B + little-coder combination ranked above Gemini CLI + Gemini 2.5 Pro on the Terminal-Bench 2.0 leaderboard.
Architecture and Extension Model
little-coder's architecture is organized around pi's lifecycle hooks (before_agent_start, context, before_provider_request, tool_call, tool_result, turn_end, session_compact). The 23 bundled TypeScript extensions include:
- skill-inject — per-turn tool-skill selection (error > recency > intent)
- knowledge-inject — algorithm cheat-sheet scoring (word=1.0, bigram=2.0, threshold=2.0)
- output-parser — repairs malformed tool calls (```tool, <tool_call>, bare JSON)
- quality-monitor — detects empty/hallucinated/loop responses and triggers correction
- thinking-budget — caps thinking tokens per turn, retries with thinking off
- permission-gate — bash whitelist (ls, cat, git log/status/diff, find, grep, etc.)
- checkpoint — snapshots files before Write/Edit
- shell-session — tmux-proxy and subprocess backends for persistent shell state
- browser — Playwright-based BrowserNavigate/Click/Type/Scroll/Extract
- evidence — per-session evidence store with 1 KB snippet cap and compaction awareness
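The knowledge-inject entry above gives weights (word=1.0, bigram=2.0) and a threshold (2.0) but not the matching rule. The sketch below is one plausible reading, not the project's code: each cheat sheet carries a keyword list, a single-word hit adds 1.0, a two-word phrase hit adds 2.0, and a sheet is injected once its score reaches the threshold. The `CheatSheet` type, the function names, the tokenizer, and the use of `>=` are all assumptions made for illustration.

```typescript
// Hypothetical sketch: only the weights (word = 1.0, bigram = 2.0) and the
// threshold (2.0) come from the text. All names and the matching rule are assumed.
interface CheatSheet {
  name: string;
  keywords: string[]; // single words ("heap") or two-word phrases ("binary search")
}

const WORD_WEIGHT = 1.0;
const BIGRAM_WEIGHT = 2.0;
const THRESHOLD = 2.0;

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Sum the weights of every keyword that appears in the prompt.
function scoreSheet(prompt: string, sheet: CheatSheet): number {
  const tokens = tokenize(prompt);
  const words = new Set(tokens);
  const bigrams = new Set<string>();
  for (let i = 0; i + 1 < tokens.length; i++) {
    bigrams.add(`${tokens[i]} ${tokens[i + 1]}`);
  }
  let score = 0;
  for (const keyword of sheet.keywords) {
    const parts = tokenize(keyword);
    const key = parts.join(" ");
    if (parts.length === 1 && words.has(key)) score += WORD_WEIGHT;
    else if (parts.length === 2 && bigrams.has(key)) score += BIGRAM_WEIGHT;
  }
  return score;
}

// Keep sheets scoring at or above the threshold (">=" is an assumption), best first.
function selectSheets(prompt: string, sheets: CheatSheet[]): string[] {
  return sheets
    .map((sheet) => ({ name: sheet.name, score: scoreSheet(prompt, sheet) }))
    .filter((entry) => entry.score >= THRESHOLD)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.name);
}
```

Under these assumptions, the prompt "Implement binary search over a sorted array" scores 3.0 against a sheet with the keywords "binary search" and "sorted" (2.0 + 1.0), so the sheet is selected. A lone single-word match scores 1.0 and stays below the threshold.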
Update: v1.8.2
The latest release is v1.8.2, published on 2026-05-30, as shown in the GitHub repository. The project was created in April 2026 and has seen rapid iteration, moving from a Python-based substrate (v0.0.x) to a TypeScript/pi-based architecture (v0.1.0+). The current development focus (Phase 2) has shifted from benchmark coverage to operating real knowledge bases — medical, athletic, and educational — with many markdown files at once, stressing retrieval, compaction, and context-budgeting on histories longer than any single benchmark task. The repository reports 1,388 stars and 90 forks as of the last update.
Pricing
Open Source
Fully free and open-source under Apache License 2.0. Install via npm or bun.
- Full coding agent with 20+ extensions
- Local inference support (llama.cpp, Ollama, LM Studio)
- Cloud provider support (Anthropic, OpenAI)
- 30 skill markdown files
- Python benchmark harness
Capabilities
Key Features
- Runs entirely offline against local inference servers (llama.cpp, Ollama, LM Studio)
- Supports cloud providers (Anthropic, OpenAI) through the same interface
- 20+ TypeScript extensions built on the pi agent framework
- Per-turn skill injection from 30 markdown skill files
- Knowledge injection with algorithm cheat-sheet scoring
- Output-parser repairs malformed tool calls
- Quality monitor detects empty, hallucinated, or looping responses
- Thinking-budget cap with retry logic
- Bash permission gate with configurable whitelist
- File checkpoint snapshots before Write/Edit operations
- Persistent shell session via tmux-proxy and subprocess backends
- Playwright-based browser automation (navigate, click, type, scroll, extract)
- Per-session evidence store with compaction awareness
- MoE model support: experts in RAM, attention on GPU (22 GB model on 8 GB VRAM)
- LAN inference support via configurable base URL env vars
- User-override model configuration file
- Benchmark harness for Aider Polyglot, Terminal-Bench, and GAIA
- All small-model extensions auto-disable for large/cloud models
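The output-parser feature is described only by the three malformed shapes it handles: a fenced block tagged tool, a <tool_call> tag, and bare JSON. A minimal sketch of such a repair pass follows. It assumes the payload is a JSON object with `name` and `arguments` fields, which this page does not state. The `ToolCall` type, `repairToolCall`, and the tool names in the usage note are hypothetical and do not describe the extension's actual API.

```typescript
// Hypothetical repair pass. The three input shapes come from the text; the
// ToolCall shape, the function names and the JSON payload format are assumptions.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Built at runtime so this listing contains no literal triple backticks.
const FENCE = "`".repeat(3);
const FENCED_TOOL = new RegExp(FENCE + "tool\\s*([\\s\\S]*?)" + FENCE);
// The closing tag is treated as optional, assuming a model may omit it.
const TAGGED_TOOL = /<tool_call>([\s\S]*?)(?:<\/tool_call>|$)/;

// Type guard: an object with a string name and an object-valued arguments field.
function isToolCall(value: unknown): value is ToolCall {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const args = record["arguments"];
  return typeof record["name"] === "string" && typeof args === "object" && args !== null;
}

// Try each shape in turn and return the first candidate that parses as a tool call.
function repairToolCall(raw: string): ToolCall | null {
  const candidates: string[] = [];
  for (const pattern of [FENCED_TOOL, TAGGED_TOOL]) {
    const body = raw.match(pattern)?.[1];
    if (body !== undefined) candidates.push(body);
  }
  // Bare JSON: take the span from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const text of candidates) {
    try {
      const value: unknown = JSON.parse(text.trim());
      if (isToolCall(value)) return value;
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return null;
}
```

For example, `repairToolCall('<tool_call>{"name":"Bash","arguments":{"command":"ls"}}')` recovers the call even though the closing tag is missing, while text with no JSON object yields `null`. In that case a caller could hand the turn to a correction step such as the quality monitor.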
