LamBench
A benchmark of 120 pure lambda calculus programming problems for evaluating how well AI models can implement algorithms using lambda encodings.
At a Glance
Fully free and open-source benchmark available on GitHub under MIT license.
Listed Apr 2026
About LamBench
λ-bench (LamBench) is an open-source benchmark suite containing 120 pure lambda calculus programming problems designed to evaluate AI model capabilities in functional and symbolic reasoning. Each problem challenges a model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement specific algorithms. Models receive a problem description, data encoding specification, and test cases, then must return a single .lam program that passes all input/output pairs. The benchmark spans 12 categories ranging from trivial Church natural number arithmetic to highly complex tasks like BF interpreters, FFT, and Sudoku solvers — all in pure λ-calculus.
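The λ-encodings the problems rely on are easiest to see in a familiar host language. As an illustrative sketch only (Lamb's concrete syntax is not shown on this page, so Python lambdas stand in for it; `to_int` is a helper added purely for inspection and is not part of the encoding), Church numerals represent the number n as a function that applies f to x exactly n times:

```python
# Church numerals: n is encoded as a function applying f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f m times, starting from the result of applying it n times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Inspection helper (not part of the encoding): decode to a native int.
to_int = lambda n: n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

In a Lamb solution the same definitions would presumably be written as named top-level λ-terms, with no native integers available at all.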
- 120 Diverse Problems — Problems are organized across 12 categories including Church Naturals, Scott Naturals, Church/Scott Lists, Trees, ADTs, N-Tuples, and complex Algorithms.
- Live Leaderboard — A generated GitHub Pages landing page displays up-to-date rankings for all evaluated models, built by running `bun run build`.
- Lamb Language — A minimal pure lambda calculus with named top-level definitions; no built-in data types — everything is λ-encoded using abstractions and applications.
- Automated Evaluation Harness — Run `bun bench <provider/model>` to evaluate any supported model; results are written as timestamped text files in the `res/` directory.
- Flexible CLI Options — Supports `--filter <prefix>`, `--concurrency <n>`, `--timeout <seconds>`, and `--no-reasoning` flags for fine-grained benchmark control.
- Multi-Provider Support — Works with OpenAI, Anthropic, and Google model APIs; API keys are stored in `~/.config/` for easy configuration.
- v1 Scoring — Score is the pass rate (solved problems / 120); future versions will incorporate program size measured in bits against reference implementations.
- Reference Solutions Included — The `lam/` directory contains reference `.lam` solutions for all 120 tasks, enabling size-based comparisons.
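Scott Naturals, one of the listed problem categories, differ from Church numerals in that each number carries its own case analysis instead of an iterator, which makes the predecessor a single non-recursive match. A minimal sketch, again using Python lambdas in place of Lamb syntax (`to_int` is an inspection helper, not part of the encoding):

```python
# Scott naturals: a number is a two-way case split on "zero or successor".
zero = lambda s: lambda z: z               # zero selects the z branch
succ = lambda n: lambda s: lambda z: s(n)  # succ n hands its predecessor to s

# Predecessor is one non-recursive match, the hallmark of Scott encoding.
pred = lambda n: n(lambda p: p)(zero)

def to_int(n):
    """Inspection helper: fold a Scott natural back into a Python int."""
    return n(lambda p: 1 + to_int(p))(0)

two = succ(succ(zero))
print(to_int(two))        # 2
print(to_int(pred(two)))  # 1
```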
Pricing
Open Source
- 120 pure lambda calculus problems
- Automated evaluation harness
- Reference solutions included
- Live leaderboard generator
- Multi-provider model support
Capabilities
Key Features
- 120 pure lambda calculus programming problems
- 12 problem categories including Church/Scott encodings and Algorithms
- Automated evaluation harness via CLI
- Live leaderboard on GitHub Pages
- Lamb minimal lambda calculus language
- Multi-provider AI model support (OpenAI, Anthropic, Google)
- Timestamped result files
- Reference solutions for all 120 tasks
- Flexible CLI flags for filtering and concurrency
- v1 pass-rate scoring with future size-based scoring planned
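The Church/Scott Lists categories apply the same encoding idea to data structures: a Church-encoded list is its own right fold. The sketch below is illustrative Python standing in for Lamb, whose actual syntax this page does not show:

```python
# Church lists: a list is its right fold, taking a cons-case and a nil-case.
nil = lambda c: lambda n: n
cons = lambda h: lambda t: lambda c: lambda n: c(h)(t(c)(n))

xs = cons(1)(cons(2)(cons(3)(nil)))   # encodes the list [1, 2, 3]

# Summation is folding with addition over an initial 0.
total = xs(lambda h: lambda acc: h + acc)(0)
# Decoding to a native list is folding with prepend over an empty list.
as_py = xs(lambda h: lambda acc: [h] + acc)([])

print(total)  # 6
print(as_py)  # [1, 2, 3]
```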
