# LamBench

> A benchmark of 120 pure lambda calculus programming problems for evaluating how well AI models can implement algorithms using lambda encodings.

LamBench (λ-bench) is an open-source benchmark suite of 120 pure lambda calculus programming problems designed to evaluate AI models' functional and symbolic reasoning. Each problem challenges a model to write a program in **Lamb**, a minimal lambda calculus language, using λ-encodings of data structures to implement a specific algorithm. Models receive a problem description, a data encoding specification, and test cases, then must return a single `.lam` program that passes all input/output pairs. The benchmark spans 12 categories ranging from trivial Church natural number arithmetic to highly complex tasks such as Brainfuck (BF) interpreters, FFT, and Sudoku solvers — all in pure λ-calculus.
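To give a sense of what "λ-encodings of data structures" means, here is a minimal sketch of Church numerals. The syntax below is TypeScript rather than Lamb (whose concrete syntax isn't shown here), but the encoding itself is the standard one: a numeral *n* is a function that applies `f` to `x` *n* times.

```typescript
// A Church numeral applies f to x n times.
type Church = <T>(f: (x: T) => T) => (x: T) => T;

const zero: Church = f => x => x;                         // λf.λx. x
const succ = (n: Church): Church =>
  f => x => f(n(f)(x));                                   // λn.λf.λx. f (n f x)
const add = (m: Church) => (n: Church): Church =>
  f => x => m(f)(n(f)(x));                                // λm.λn.λf.λx. m f (n f x)

// Decode back to a native number by counting applications.
const toInt = (n: Church): number => n((k: number) => k + 1)(0);

const two = succ(succ(zero));
console.log(toInt(add(two)(succ(zero)))); // prints 3
```

In pure λ-calculus, everything — numbers, lists, trees, even interpreters — is built from abstractions and applications exactly like this.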

- **120 Diverse Problems** — *Problems are organized across 12 categories including Church Naturals, Scott Naturals, Church/Scott Lists, Trees, ADTs, N-Tuples, and complex Algorithms.*
- **Live Leaderboard** — *A GitHub Pages landing page, generated via `bun run build`, displays up-to-date rankings for all evaluated models.*
- **Lamb Language** — *A minimal pure lambda calculus with named top-level definitions; no built-in data types — everything is λ-encoded using abstractions and applications.*
- **Automated Evaluation Harness** — *Run `bun bench <provider/model>` to evaluate any supported model; results are written as timestamped text files in the `res/` directory.*
- **Flexible CLI Options** — *Supports `--filter <prefix>`, `--concurrency <n>`, `--timeout <seconds>`, and `--no-reasoning` flags for fine-grained benchmark control.*
- **Multi-Provider Support** — *Works with OpenAI, Anthropic, and Google model APIs; API keys are stored in `~/.config/` for easy configuration.*
- **v1 Scoring** — *Score is the pass rate (solved problems / 120); future versions will incorporate program size measured in bits against reference implementations.*
- **Reference Solutions Included** — *The `lam/` directory contains reference `.lam` solutions for all 120 tasks, enabling size-based comparisons.*
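Putting the harness and CLI flags together, a typical evaluation run might look like the following (the model name and filter prefix are only examples):

```shell
# Evaluate a model on the full suite; results are written as
# timestamped text files in the res/ directory.
bun bench openai/gpt-4o

# Restrict to problems matching a prefix, raise concurrency,
# cap each run at 60 seconds, and disable extended reasoning.
bun bench openai/gpt-4o --filter church --concurrency 8 --timeout 60 --no-reasoning
```

API keys for the configured provider are read from `~/.config/`, so no flags are needed for authentication.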

## Features
- 120 pure lambda calculus programming problems
- 12 problem categories including Church/Scott encodings and Algorithms
- Automated evaluation harness via CLI
- Live leaderboard on GitHub Pages
- Lamb, a minimal lambda calculus language
- Multi-provider AI model support (OpenAI, Anthropic, Google)
- Timestamped result files
- Reference solutions for all 120 tasks
- Flexible CLI flags for filtering and concurrency
- v1 pass-rate scoring with future size-based scoring planned

## Integrations
OpenAI API, Anthropic API, Google AI API, Bun runtime

## Platforms
CLI, Web, API

## Pricing
Open Source

## Links
- Website: https://victortaelin.github.io/lambench/
- Documentation: https://github.com/VictorTaelin/LamBench#readme
- Repository: https://github.com/VictorTaelin/LamBench
- EveryDev.ai: https://www.everydev.ai/tools/lambench
