little-coder

Name: little-coder
Availability: OnlineOnly
Author: Itay Inbar

A coding agent tuned for small local language models, built on top of the pi agent framework, enabling offline AI-assisted coding on consumer hardware.

At a Glance

Pricing

Open Source

Fully free and open-source under Apache License 2.0. Install via npm or bun.

Available On

Windows

macOS

Linux

API

CLI

Itay Inbar · Tel Aviv, Israel · Est. 2026

Listed May 2026

About little-coder

little-coder is an open-source coding agent designed to get the most out of small local language models running on consumer-grade hardware. Built by Itay Inbar and published on GitHub under the Apache 2.0 license, it layers 20+ TypeScript extensions and 30 skill markdown files on top of the minimal pi agent framework. The project is accompanied by a research write-up on Substack titled "Honey, I Shrunk the Coding Agent", which documents the "scaffold–model fit" thesis behind the design.

What It Is

little-coder is a CLI coding agent that can run entirely offline against local inference servers (llama.cpp, Ollama, LM Studio) and also supports cloud providers (Anthropic, OpenAI, etc.) through the same interface. It is not a fork of pi: pi is a plain npm dependency that provides the agent loop, multi-provider API, TUI, session tree, compaction, and extension model. little-coder adds its small-model-specific scaffolding on top: skill injection, knowledge injection, output repair, quality monitoring, thinking-budget capping, a bash permission gate, checkpoint snapshots, browser automation, and an evidence store. All small-model-specific extensions auto-disable for large or cloud models.
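
Of the scaffolding pieces listed above, the bash permission gate is the easiest to picture: each command is checked against a whitelist before it runs. The following is a minimal sketch under stated assumptions, not little-coder's actual code. The function name `isAllowed`, the exact whitelist contents, and the choice to reject any command containing shell metacharacters are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch of a whitelist-based bash gate.
// Names and rules here are assumptions, not little-coder's real implementation.

// Programs treated as read-only in this sketch.
const ALLOWED_COMMANDS = new Set(["ls", "cat", "find", "grep"]);

// git is only allowed with inspection-style subcommands.
const ALLOWED_GIT_SUBCOMMANDS = new Set(["log", "status", "diff"]);

// Characters that chain, pipe, redirect, or substitute commands.
// Rejecting them outright is a simplifying assumption of this sketch.
const SHELL_METACHARACTERS = /[;&|><`$()]/;

function isAllowed(command: string): boolean {
  const trimmed = command.trim();
  if (trimmed === "" || SHELL_METACHARACTERS.test(trimmed)) {
    return false;
  }
  const parts = trimmed.split(/\s+/);
  const program = parts[0] ?? "";
  const subcommand = parts[1] ?? "";
  if (program === "git") {
    return ALLOWED_GIT_SUBCOMMANDS.has(subcommand);
  }
  return ALLOWED_COMMANDS.has(program);
}
```

A real gate would likely need to look at arguments as well as the program name, since a whitelisted program such as find can still modify files through some of its options.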

Scaffold–Model Fit: The Core Idea

The project's central claim, documented in the Substack paper, is that architectural adaptation of the agent scaffold — not model scale — is the primary lever for improving small-model coding performance. The paper reports that a 9.7B Qwen3.5 model running through little-coder's scaffold achieved 45.56% on the Aider Polyglot benchmark (225 exercises), compared to a matched-model vanilla Aider baseline of 19.11% on the same benchmark. The project attributes this gap to mechanisms like per-turn skill selection, output-parser repair of malformed tool calls, quality-monitor loop detection, and thinking-budget management.
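
To make the output-parser repair mechanism concrete, here is a rough sketch of how a tool call might be recovered from the three malformed shapes the project names: a fenced block tagged tool, a `<tool_call>` wrapper, and bare JSON. This is an illustration only. The `ToolCall` shape, the function names, and the fallback order are assumptions, and the real extension may work quite differently.

```typescript
// Hypothetical sketch of tool-call repair; not little-coder's actual parser.

// Assumed shape of a parsed tool call.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Parse a candidate string and accept it only if it looks like a tool call.
function tryParse(candidate: string): ToolCall | null {
  try {
    const value: unknown = JSON.parse(candidate);
    if (typeof value !== "object" || value === null) return null;
    const record = value as Record<string, unknown>;
    const name = record["name"];
    if (typeof name !== "string") return null;
    const args = record["arguments"];
    const isPlainObject =
      typeof args === "object" && args !== null && !Array.isArray(args);
    return {
      name,
      arguments: isPlainObject ? (args as Record<string, unknown>) : {},
    };
  } catch {
    return null; // Not valid JSON.
  }
}

// Wrappers tried in order; \u0060 is the backtick character.
const WRAPPER_PATTERNS: RegExp[] = [
  /\u0060{3}tool\s*([\s\S]*?)\u0060{3}/, // fenced block tagged "tool"
  /<tool_call>([\s\S]*?)<\/tool_call>/, // XML-style wrapper
];

function extractToolCall(text: string): ToolCall | null {
  for (const pattern of WRAPPER_PATTERNS) {
    const match = pattern.exec(text);
    if (match) {
      const parsed = tryParse((match[1] ?? "").trim());
      if (parsed) return parsed;
    }
  }
  // Bare JSON fallback: take the span from the first "{" to the last "}".
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  return tryParse(text.slice(start, end + 1));
}
```

The point of the sketch is the ordering: explicit wrappers are tried first because they mark the call boundaries unambiguously, and the brace-matching fallback only runs when no wrapper yields valid JSON.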

Benchmark Results

The repository tracks a growing set of benchmark results, all run on a single consumer laptop (i9-14900HX, 32 GB RAM, 8 GB VRAM on RTX 5070 Laptop) with no cloud inference:

v0.0.2 (paper): Qwen3.5-9B via Ollama — 45.56% on Aider Polyglot (225 exercises)
v0.0.5: Qwen3.6-35B-A3B via llama.cpp — 78.67% on Aider Polyglot
v0.1.4: Qwen3.6-35B-A3B — 40.0% on Terminal-Bench-Core v0.1.1 (80 tasks)
v0.1.13: Qwen3.6-35B-A3B — 24.6% ± 3.2 on Terminal-Bench 2.0 (89 tasks × 5 trials), accepted to the official Terminal-Bench 2.0 leaderboard at rank 120
v0.1.24: Qwen3.5-9B (Q4_K_M, 5.3 GB on GPU) — 9.2% ± 2.4 on Terminal-Bench 2.0, leaderboard rank 142
v0.1.27: Qwen3.6-35B-A3B — 40.00% (66/165) on GAIA validation set

The project homepage claims the Qwen3.6-35B-A3B + little-coder combination ranked above Gemini CLI + Gemini 2.5 Pro on the Terminal-Bench 2.0 leaderboard.

Architecture and Extension Model

little-coder's architecture is organized around pi's lifecycle hooks (before_agent_start, context, before_provider_request, tool_call, tool_result, turn_end, session_compact). The 23 bundled TypeScript extensions include:

skill-inject — per-turn tool-skill selection (error > recency > intent)
knowledge-inject — algorithm cheat-sheet scoring (word=1.0, bigram=2.0, threshold=2.0)
output-parser — repairs malformed tool calls (```tool, <tool_call>, bare JSON)
quality-monitor — detects empty/hallucinated/loop responses and triggers correction
thinking-budget — caps thinking tokens per turn, retries with thinking off
permission-gate — bash whitelist (ls, cat, git log/status/diff, find, grep, etc.)
checkpoint — snapshots files before Write/Edit
shell-session — tmux-proxy and subprocess backends for persistent shell state
browser — Playwright-based BrowserNavigate/Click/Type/Scroll/Extract
evidence — per-session evidence store with 1 KB snippet cap and compaction awareness
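
The knowledge-inject weights listed above (word=1.0, bigram=2.0, threshold=2.0) suggest a simple keyword-scoring scheme. The sketch below shows one plausible reading, assuming each cheat sheet carries a list of keywords, a single-word match adds 1.0, a two-word phrase match adds 2.0, and a sheet is injected when its score reaches the threshold. Only the three numbers come from the listing. The `CheatSheet` type, the function names, the tokenizer, and the use of "greater than or equal" for the threshold are assumptions.

```typescript
// Hypothetical sketch of cheat-sheet scoring; only the three weights are from the listing.
const WORD_WEIGHT = 1.0;
const BIGRAM_WEIGHT = 2.0;
const THRESHOLD = 2.0;

// Assumed shape: keywords are lowercase single words or two-word phrases.
interface CheatSheet {
  title: string;
  keywords: string[];
}

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function scoreSheet(prompt: string, sheet: CheatSheet): number {
  const words = tokenize(prompt);
  const wordSet = new Set(words);
  // Collect every pair of adjacent words in the prompt.
  const bigramSet = new Set<string>();
  for (let i = 0; i + 1 < words.length; i++) {
    bigramSet.add(`${words[i]} ${words[i + 1]}`);
  }
  let score = 0;
  for (const keyword of sheet.keywords) {
    if (keyword.includes(" ")) {
      if (bigramSet.has(keyword)) score += BIGRAM_WEIGHT;
    } else if (wordSet.has(keyword)) {
      score += WORD_WEIGHT;
    }
  }
  return score;
}

// Keep sheets that reach the threshold, highest score first.
function selectSheets(prompt: string, sheets: CheatSheet[]): CheatSheet[] {
  return sheets
    .map((sheet) => ({ sheet, score: scoreSheet(prompt, sheet) }))
    .filter((entry) => entry.score >= THRESHOLD)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.sheet);
}
```

Under this reading, a single bigram hit or two separate word hits is enough to inject a sheet, while one stray word match is not, which keeps loosely related cheat sheets out of a small model's limited context.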

Update: v1.8.2

The latest release is v1.8.2, published on 2026-05-30, as shown in the GitHub repository. The project was created in April 2026 and has seen rapid iteration, moving from a Python-based substrate (v0.0.x) to a TypeScript/pi-based architecture (v0.1.0+). The current development focus (Phase 2) has shifted from benchmark coverage to operating real knowledge bases — medical, athletic, and educational — with many markdown files at once, stressing retrieval, compaction, and context-budgeting on histories longer than any single benchmark task. The repository reports 1,388 stars and 90 forks as of the last update.

Pricing

Open Source

Full coding agent with 20+ extensions
Local inference support (llama.cpp, Ollama, LM Studio)
Cloud provider support (Anthropic, OpenAI)
30 skill markdown files
Python benchmark harness

Capabilities

Key Features

Runs entirely offline against local inference servers (llama.cpp, Ollama, LM Studio)
Supports cloud providers (Anthropic, OpenAI) through the same interface
20+ TypeScript extensions built on the pi agent framework
Per-turn skill injection from 30 markdown skill files
Knowledge injection with algorithm cheat-sheet scoring
Output-parser repairs malformed tool calls
Quality monitor detects empty, hallucinated, or looping responses
Thinking-budget cap with retry logic
Bash permission gate with configurable whitelist
File checkpoint snapshots before Write/Edit operations
Persistent shell session via tmux-proxy and subprocess backends
Playwright-based browser automation (navigate, click, type, scroll, extract)
Per-session evidence store with compaction awareness
MoE model support: experts in RAM, attention on GPU (22 GB model on 8 GB VRAM)
LAN inference support via configurable base URL env vars
User-override model configuration file
Benchmark harness for Aider Polyglot, Terminal-Bench, and GAIA
All small-model extensions auto-disable for large/cloud models
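
The LAN inference feature above relies on base URLs that can be overridden through environment variables. A minimal sketch of that pattern follows. The variable names (`EXAMPLE_*_BASE_URL`) are invented for illustration and are not little-coder's documented settings. The defaults are the ports these servers commonly listen on locally, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical sketch: env var names are invented, not little-coder's real configuration.
type Backend = "llamacpp" | "ollama" | "lmstudio";

// Common local defaults for each inference server.
const DEFAULT_BASE_URLS: Record<Backend, string> = {
  llamacpp: "http://localhost:8080",
  ollama: "http://localhost:11434",
  lmstudio: "http://localhost:1234",
};

// Placeholder variable names used only in this example.
const ENV_VAR_NAMES: Record<Backend, string> = {
  llamacpp: "EXAMPLE_LLAMACPP_BASE_URL",
  ollama: "EXAMPLE_OLLAMA_BASE_URL",
  lmstudio: "EXAMPLE_LMSTUDIO_BASE_URL",
};

// Return the override if one is set and non-empty, otherwise the local default.
// Trailing slashes are stripped so paths can be appended consistently.
function resolveBaseUrl(
  backend: Backend,
  env: Record<string, string | undefined>,
): string {
  const override = env[ENV_VAR_NAMES[backend]]?.trim();
  const raw = override ? override : DEFAULT_BASE_URLS[backend];
  return raw.replace(/\/+$/, "");
}
```

With this shape, pointing the agent at another machine on the LAN is a matter of setting one variable to that machine's address, for example a desktop with a larger GPU serving a laptop.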

Integrations

llama.cpp

Ollama

LM Studio

Anthropic Claude

OpenAI

Qwen models

pi agent framework

Playwright

tmux

Node.js

npm

bun

Hugging Face Hub

API Available