nanochat
End-to-end, open-source recipe to train and serve a small chat LLM (~560M params) for about $100 on one 8×H100 node, with tokenizer, pretrain→midtrain→SFT→optional RL, FastAPI web UI, and a KV-cached inference engine.
About nanochat
nanochat is an open-source, from-scratch codebase for training and serving your own small chat LLM on a tight budget. It’s designed to run a full “speedrun” on a single 8×H100 box in roughly a few hours (~$100): tokenization, base pretraining, mid-training on chat data, supervised finetuning, optional RL on GSM8K, evaluation, and a simple web UI to talk to the model.
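The tokenization stage trains byte-pair-encoding merges over raw bytes. As a toy illustration of the algorithm (nanochat's actual tokenizer is a Rust implementation; this sketch is only for intuition), BPE training boils down to repeatedly merging the most frequent adjacent pair:

```python
# Toy BPE training: repeatedly merge the most frequent adjacent pair of
# token ids, assigning each merge a fresh id above the 256 byte values.
from collections import Counter

def train_bpe(text, num_merges):
    tokens = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = []
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing worth merging
        merges.append((pair, next_id))
        # Replace every occurrence of the pair with the new token id.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(next_id)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
        next_id += 1
    return tokens, merges
```

A production tokenizer adds regex pre-splitting, parallelism, and fast merge lookups, but the core loop is the same.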
What it includes:
- Tokenizer & data: a custom Rust BPE tokenizer and scripts to pull a shuffled subset of FineWeb-EDU for pretraining.
- Training stages: base pretraining → mid-training (SmolTalk + MMLU aux + GSM8K) → SFT; optional RL (simplified GRPO) on GSM8K.
- Evaluation: CORE / ChatCORE metrics plus task-specific scores (ARC-Easy/Challenge, MMLU, GSM8K, HumanEval), and an auto-generated report.md summarizing runs.
- Inference & serving: a compact engine with KV caching (prefill + decode) and a FastAPI server with a lightweight chat web UI.
- Scalability knob: model depth as the primary “slider” (e.g., d20 ≈ ~560M params), with auto-adjusted batch/accumulation.
Use it to understand the full training loop, tweak data or hyperparameters, and stand up a private, hackable chat model end-to-end.
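The engine's two inference phases can be pictured with a toy KV cache: prefill processes the prompt once and stores its keys/values, then each decode step appends a single new entry and attends over the whole cache instead of recomputing past tokens. A minimal sketch for intuition only (nanochat's real engine works on batched tensors):

```python
# Toy prefill + decode with a KV cache, using plain lists of floats.
import math

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all cached keys.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def prefill(cache, prompt_kvs):
    # Prefill: process the whole prompt once, filling the cache.
    for k, v in prompt_kvs:
        cache.append(k, v)

def decode_step(cache, q, k, v):
    # Decode: append the new token's K/V, then attend over the full cache.
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)
```

The payoff is that each decode step costs one attention pass over the cache rather than reprocessing the entire sequence.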
Pricing
Free Plan Available
Open-source repository available for local use, modification, and learning.
- Full repository source code
- Permissive open-source usage for experimentation
- Reference implementation for an end-to-end chat LLM pipeline
Capabilities
Key Features
- End-to-end LLM training pipeline (tokenizer → pretrain → mid-train → SFT → optional RL)
- Custom Rust BPE tokenizer and data helpers
- Evaluation scripts (CORE/ChatCORE, ARC-Easy/Challenge, MMLU, GSM8K, HumanEval) with an auto-generated report
- KV-cached inference engine and FastAPI web UI for chat
- Single-node speedrun scripts for one 8×H100 box; depth-based scaling knob
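The depth knob lends itself to back-of-the-envelope arithmetic. The sketch below assumes a GPT-style layout with model width equal to 64 × depth and an untied 65,536-entry vocabulary (illustrative assumptions, not necessarily nanochat's exact configuration), plus a helper that derives gradient-accumulation steps from a target batch size in tokens:

```python
# Rough parameter count from depth, plus gradient-accumulation steps
# needed to reach a target token batch on a fixed device configuration.
# width = 64 * depth and vocab_size = 65_536 are illustrative assumptions.

def estimate_params(depth, width_per_layer=64, vocab_size=65_536):
    d_model = depth * width_per_layer
    block = 12 * d_model ** 2              # ~attention + MLP weights per layer
    embeddings = 2 * vocab_size * d_model  # untied input/output embeddings
    return depth * block + embeddings

def grad_accum_steps(target_batch_tokens, seq_len, device_batch, n_devices):
    tokens_per_step = seq_len * device_batch * n_devices
    # Round up so the effective batch is at least the target.
    return -(-target_batch_tokens // tokens_per_step)

print(f"d20 ≈ {estimate_params(20)/1e6:.0f}M params")  # → d20 ≈ 561M params
```

Under these assumptions d20 lands near the quoted ~560M parameters, and the accumulation helper shows how a fixed per-device batch is stretched to a larger effective batch.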