    LamBench

    LLM Evaluations

    A benchmark of 120 pure lambda calculus programming problems for evaluating how well AI models can implement algorithms using lambda encodings.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source benchmark available on GitHub under MIT license.

    Available On

    CLI
    Web
    API

    Resources

    Website
    Docs
    GitHub
    llms.txt

    Topics

    LLM Evaluations
    AI Development Libraries
    Local Inference

    Alternatives

    ZeroEval
    Artificial Analysis
    TruLens

    Developer
    VictorTaelin

    Listed Apr 2026

    About LamBench

    λ-bench (LamBench) is an open-source benchmark suite containing 120 pure lambda calculus programming problems designed to evaluate AI model capabilities in functional and symbolic reasoning. Each problem challenges a model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement specific algorithms. Models receive a problem description, data encoding specification, and test cases, then must return a single .lam program that passes all input/output pairs. The benchmark spans 12 categories ranging from trivial Church natural number arithmetic to highly complex tasks like BF interpreters, FFT, and Sudoku solvers — all in pure λ-calculus.
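    To make the idea of λ-encoding concrete, the sketch below shows Church naturals and their arithmetic in Haskell. This is an illustrative analogue only, not Lamb code: the listing does not spell out Lamb's concrete syntax, so the definitions merely mirror what a .lam solution in the simplest category (Church natural arithmetic) has to encode.

        {-# LANGUAGE RankNTypes #-}
        -- A Church natural n is the function that applies a step `s` n times to a base `z`.
        type Church = forall a. (a -> a) -> a -> a

        zero :: Church
        zero _ z = z

        suc :: Church -> Church
        suc n s z = s (n s z)

        add :: Church -> Church -> Church
        add m n s z = m s (n s z)

        mul :: Church -> Church -> Church
        mul m n s = m (n s)

        -- Decode back to Int so a result can be checked against an expected output.
        toInt :: Church -> Int
        toInt n = n (+ 1) 0

        main :: IO ()
        main = print (toInt (add (suc (suc zero)) (mul (suc (suc zero)) (suc (suc (suc zero))))))  -- 2 + 2*3 = 8

    The same pattern, folding a data structure into the functions that consume it, is what the Scott, list, tree, and ADT categories exercise at larger scale.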

    • 120 Diverse Problems — Problems are organized across 12 categories including Church Naturals, Scott Naturals, Church/Scott Lists, Trees, ADTs, N-Tuples, and complex Algorithms.
    • Live Leaderboard — A generated GitHub Pages landing page displays up-to-date rankings for all evaluated models, built by running bun run build.
    • Lamb Language — A minimal pure lambda calculus with named top-level definitions; no built-in data types — everything is λ-encoded using abstractions and applications.
    • Automated Evaluation Harness — Run bun bench <provider/model> to evaluate any supported model; results are written as timestamped text files in the res/ directory.
    • Flexible CLI Options — Supports --filter <prefix>, --concurrency <n>, --timeout <seconds>, and --no-reasoning flags for fine-grained benchmark control; a usage sketch follows this list.
    • Multi-Provider Support — Works with OpenAI, Anthropic, and Google model APIs; API keys are stored in ~/.config/ for easy configuration.
    • v1 Scoring — Score is the pass rate (solved problems / 120); future versions will incorporate program size measured in bits against reference implementations.
    • Reference Solutions Included — The lam/ directory contains reference .lam solutions for all 120 tasks, enabling size-based comparisons.
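    Putting the commands above together, a typical evaluation session might look like the following. The model identifier, filter prefix, and numeric values are placeholders for illustration, not taken from the repository.

        # Evaluate a model on all 120 problems; results are written to res/ as timestamped text files.
        bun bench <provider/model>

        # Restrict to a problem prefix and tune concurrency, timeout, and reasoning behaviour.
        bun bench <provider/model> --filter <prefix> --concurrency 8 --timeout 120 --no-reasoning

        # Regenerate the GitHub Pages leaderboard from the collected results.
        bun run build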

    Pricing

    Open Source

    Fully free and open-source benchmark available on GitHub under MIT license.

    • 120 pure lambda calculus problems
    • Automated evaluation harness
    • Reference solutions included
    • Live leaderboard generator
    • Multi-provider model support

    Capabilities

    Key Features

    • 120 pure lambda calculus programming problems
    • 12 problem categories including Church/Scott encodings and Algorithms
    • Automated evaluation harness via CLI
    • Live leaderboard on GitHub Pages
    • Lamb minimal lambda calculus language
    • Multi-provider AI model support (OpenAI, Anthropic, Google)
    • Timestamped result files
    • Reference solutions for all 120 tasks
    • Flexible CLI flags for filtering and concurrency
    • v1 pass-rate scoring with future size-based scoring planned

    Integrations

    OpenAI API
    Anthropic API
    Google AI API
    Bun runtime

    Developer

    VictorTaelin

    VictorTaelin builds open-source tools and languages focused on functional programming and formal methods. The LamBench project provides a rigorous benchmark for evaluating AI model capabilities in pure lambda calculus. The repository is hosted on GitHub and maintained as an open community resource.

    Website
    GitHub

    Similar Tools

    ZeroEval

    Open-source evaluation framework for testing large language models with zero-shot prompting on reasoning and coding tasks.

    Artificial Analysis

    Independent AI model benchmarking platform providing comprehensive performance analysis across intelligence, speed, cost, and quality metrics.

    TruLens

    Open-source library for evaluating and tracking LLM applications with feedback functions and observability tools.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    61 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    141 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    84 tools