    LamBench

    LLM Evaluations

    A benchmark of 120 pure lambda calculus programming problems for evaluating how well AI models can implement algorithms using lambda encodings.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source benchmark available on GitHub under MIT license.

    Available On

    CLI
    Web
    API

    Resources

    Website
    Docs
    GitHub
    llms.txt

    Topics

    LLM Evaluations
    AI Development Libraries
    Local Inference

    Alternatives

    ZeroEval
    Artificial Analysis
    TruLens

    Developer
    VictorTaelin

    Listed Apr 2026

    About LamBench

    λ-bench (LamBench) is an open-source benchmark suite containing 120 pure lambda calculus programming problems designed to evaluate AI model capabilities in functional and symbolic reasoning. Each problem challenges a model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement specific algorithms. Models receive a problem description, data encoding specification, and test cases, then must return a single .lam program that passes all input/output pairs. The benchmark spans 12 categories ranging from trivial Church natural number arithmetic to highly complex tasks like BF interpreters, FFT, and Sudoku solvers — all in pure λ-calculus.
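    To make the idea of λ-encoding concrete, the sketch below shows Church naturals and their arithmetic in Haskell. This is an illustrative analogue only, not Lamb code: the listing does not spell out Lamb's concrete syntax, so the definitions merely mirror what a .lam solution in the simplest category (Church natural arithmetic) has to encode.

        {-# LANGUAGE RankNTypes #-}
        -- A Church natural n is the function that applies a step `s` n times to a base `z`.
        type Church = forall a. (a -> a) -> a -> a

        zero :: Church
        zero _ z = z

        suc :: Church -> Church
        suc n s z = s (n s z)

        add :: Church -> Church -> Church
        add m n s z = m s (n s z)

        mul :: Church -> Church -> Church
        mul m n s = m (n s)

        -- Decode back to Int so a result can be checked against an expected output.
        toInt :: Church -> Int
        toInt n = n (+ 1) 0

        main :: IO ()
        main = print (toInt (add (suc (suc zero)) (mul (suc (suc zero)) (suc (suc (suc zero))))))  -- 2 + 2*3 = 8

    The same pattern, folding a data structure into the functions that consume it, is what the Scott, list, tree, and ADT categories exercise at larger scale.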

    • 120 Diverse Problems — Problems are organized across 12 categories including Church Naturals, Scott Naturals, Church/Scott Lists, Trees, ADTs, N-Tuples, and complex Algorithms.
    • Live Leaderboard — A generated GitHub Pages landing page displays up-to-date rankings for all evaluated models, built by running bun run build.
    • Lamb Language — A minimal pure lambda calculus with named top-level definitions; no built-in data types — everything is λ-encoded using abstractions and applications.
    • Automated Evaluation Harness — Run bun bench <provider/model> to evaluate any supported model; results are written as timestamped text files in the res/ directory.
    • Flexible CLI Options — Supports --filter <prefix>, --concurrency <n>, --timeout <seconds>, and --no-reasoning flags for fine-grained benchmark control; a usage sketch follows this list.
    • Multi-Provider Support — Works with OpenAI, Anthropic, and Google model APIs; API keys are stored in ~/.config/ for easy configuration.
    • v1 Scoring — Score is the pass rate (solved problems / 120); future versions will incorporate program size measured in bits against reference implementations.
    • Reference Solutions Included — The lam/ directory contains reference .lam solutions for all 120 tasks, enabling size-based comparisons.
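    Putting the commands above together, a typical evaluation session might look like the following. The model identifier, filter prefix, and numeric values are placeholders for illustration, not taken from the repository.

        # Evaluate a model on all 120 problems; results are written to res/ as timestamped text files.
        bun bench <provider/model>

        # Restrict to a problem prefix and tune concurrency, timeout, and reasoning behaviour.
        bun bench <provider/model> --filter <prefix> --concurrency 8 --timeout 120 --no-reasoning

        # Regenerate the GitHub Pages leaderboard from the collected results.
        bun run build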

    Pricing

    Open Source

    Fully free and open-source benchmark available on GitHub under MIT license.

    • 120 pure lambda calculus problems
    • Automated evaluation harness
    • Reference solutions included
    • Live leaderboard generator
    • Multi-provider model support

    Capabilities

    Key Features

    • 120 pure lambda calculus programming problems
    • 12 problem categories including Church/Scott encodings and Algorithms
    • Automated evaluation harness via CLI
    • Live leaderboard on GitHub Pages
    • Lamb minimal lambda calculus language
    • Multi-provider AI model support (OpenAI, Anthropic, Google)
    • Timestamped result files
    • Reference solutions for all 120 tasks
    • Flexible CLI flags for filtering and concurrency
    • v1 pass-rate scoring with future size-based scoring planned

    Integrations

    OpenAI API
    Anthropic API
    Google AI API
    Bun runtime

    Developer

    VictorTaelin

    VictorTaelin builds open-source tools and languages focused on functional programming and formal methods. The LamBench project provides a rigorous benchmark for evaluating AI model capabilities in pure lambda calculus. The repository is hosted on GitHub and maintained as an open community resource.

    Website
    GitHub

    Similar Tools

    ZeroEval

    Open-source evaluation framework for testing large language models with zero-shot prompting on reasoning and coding tasks.

    Artificial Analysis

    Independent AI model benchmarking platform providing comprehensive performance analysis across intelligence, speed, cost, and quality metrics.

    TruLens

    Open-source library for evaluating and tracking LLM applications with feedback functions and observability tools.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    61 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    141 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    84 tools