EveryDev.ai
    ExploitBench

    ExploitBench measures the capability of AI cybersecurity agents to climb the 'exploitation ladder,' ranging from reaching vulnerable code to executing arbitrary payloads.

    At a Glance

    • Tools listed: 1
    • Products: 2
    • Capabilities: 4
    • Headquarters: Pittsburgh, PA
    • Established: 2024
    • Employees: 15
    • Focus areas: LLM Evaluations, Security Testing, Agent Harness

    AI Tools by ExploitBench

    (1)
    ExploitBench
    AI Security Exploit Benchmark
    Topics: LLM Evaluations, Security Testing, Agent Harness

    Latest News

    • May 13, 2024: Introducing ExploitBench: The First Benchmark Built to Measure AI Model Exploitation. (facebook.com)
    • May 14, 2026: A Capability Ladder Benchmark for LLM Cybersecurity Agents published on arXiv. (arxiv.org)

    Products & Services

    2
    ExploitBench Benchmark
    May 2024

    A specialized benchmark designed to measure the 'exploitation ladder' for AI agents, covering steps from vulnerability discovery to arbitrary code execution.

    ExploitGym
    2024

    An evaluation environment and toolkit for testing LLM-based cybersecurity agents against hardened vulnerability targets.

    Market Position

    ExploitBench is the first benchmark to offer a granular, ladder-based approach to measuring autonomous exploitation, providing deeper insights than binary pass/fail tests.
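
To illustrate why a ladder can yield more signal than a pass/fail result, here is a minimal Python sketch. The rung names are hypothetical placeholders loosely based on the description above (reaching vulnerable code at the bottom, executing an arbitrary payload at the top); they are not ExploitBench's actual level names, and the agent results are made-up illustration data.

```python
from enum import IntEnum


class Rung(IntEnum):
    # Hypothetical rung names, for illustration only.
    NONE = 0
    REACHED_VULNERABLE_CODE = 1
    TRIGGERED_BUG = 2
    GAINED_PRIMITIVE = 3
    HIJACKED_CONTROL_FLOW = 4
    EXECUTED_PAYLOAD = 5


def binary_score(highest: Rung) -> bool:
    """Pass/fail view: only a full exploit counts."""
    return highest == Rung.EXECUTED_PAYLOAD


def ladder_score(highest: Rung) -> float:
    """Ladder view: fraction of the ladder climbed."""
    return int(highest) / int(Rung.EXECUTED_PAYLOAD)


# Made-up results: neither agent finishes the exploit.
results = {
    "agent_a": Rung.REACHED_VULNERABLE_CODE,
    "agent_b": Rung.HIJACKED_CONTROL_FLOW,
}

for name, rung in results.items():
    print(f"{name}: binary={binary_score(rung)} "
          f"ladder={ladder_score(rung):.1f} ({rung.name})")
```

Under a binary test both agents simply fail, while the ladder view shows that the second agent got much further than the first.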

    Leadership

    Founders

    Dr. David Brumley

    Professor at Carnegie Mellon University (CMU) and Director of CyLab. He was formerly the CEO and Founder of ForAllSecure (acquired by Bugcrowd) and currently serves as the Chief AI and Science Officer at Bugcrowd.

    Seunghyun Lee

    PhD Student at Carnegie Mellon University and a leading security researcher specializing in Chrome V8 vulnerability research.

    Executive Team

    Dr. David Brumley

    Project Lead / Professor

    Renowned cybersecurity expert, professor at CMU, and executive at Bugcrowd.

    Seunghyun Lee

    Lead Researcher / PhD Student

    Security researcher at Carnegie Mellon University focusing on autonomous exploitation.

    Board of Directors

    Dave Gerry

    CEO, Bugcrowd (Partner)

    David Brumley

    Lead Advisor / Professor

    Founding Story

    ExploitBench was created by researchers at CMU and Bugcrowd to provide a realistic, hardened evaluation standard for the growing field of autonomous AI cybersecurity agents, moving beyond simple static analysis.

    Business Model

    Revenue Model

    Open-source research initiative. Funding and support are provided by Carnegie Mellon University and Bugcrowd.

    Pricing Tiers

    Open Source
    Free

    The benchmark and associated code are available on GitHub for the global research community.

    N/A (Research Project)

    Target Markets

    Industries & Segments
    • AI Safety Researchers
    • Cybersecurity Professionals
    • Government Defense Units
    Use Cases
    • Benchmarking large language models (LLMs) for security
    • Red teaming AI agents
    • Evaluating defensive AI capabilities
    Notable Customers
    • Anthropic
    • Bugcrowd
    • Carnegie Mellon University

    Quick Facts

    • Headquarters: Pittsburgh, PA
    • Founded: 2024
    • Entity type: Academic Research Project / Open Source Initiative
    • Employees: 15
    • Total funding: Supported by CMU research grants and Bugcrowd institutional support.
    • Investors: Carnegie Mellon University, Bugcrowd
    • Office locations: Pittsburgh, San Francisco

    History & Milestones

    May 14, 2026

    Release of the comprehensive research paper 'A Capability Ladder Benchmark for LLM Cybersecurity Agents' detailing the ExploitBench framework.

    May 13, 2024

    Official launch and announcement of ExploitBench, the first benchmark for measuring AI model exploitation capabilities.

    Key Capabilities

    4
    • 16 checkpoints across 5 tiered capability levels
    • Measurement of the full exploitation lifecycle
    • Hardened, real-world bug targets (e.g., Chrome V8)
    • Open-source evaluation environment
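
As a rough sketch of how checkpoint results grouped into tiers could be tallied, the Python below assumes a made-up split of the 16 checkpoints over the 5 levels. The real distribution and checkpoint definitions are not given on this page, so `CHECKPOINTS_PER_LEVEL`, `Checkpoint`, and both helper functions are hypothetical and not part of any ExploitBench API.

```python
from dataclasses import dataclass

# Hypothetical split of 16 checkpoints over 5 levels (assumption, not from the source).
CHECKPOINTS_PER_LEVEL = {1: 3, 2: 3, 3: 3, 4: 3, 5: 4}
assert sum(CHECKPOINTS_PER_LEVEL.values()) == 16


@dataclass(frozen=True)
class Checkpoint:
    level: int  # capability level, 1..5
    index: int  # position within the level, starting at 0


def summarize(passed: set[Checkpoint]) -> dict[int, float]:
    """Fraction of checkpoints passed at each level."""
    return {
        level: sum(1 for cp in passed if cp.level == level) / total
        for level, total in CHECKPOINTS_PER_LEVEL.items()
    }


def highest_completed_level(passed: set[Checkpoint]) -> int:
    """Highest level L such that every checkpoint at levels 1..L was passed."""
    highest = 0
    for level in sorted(CHECKPOINTS_PER_LEVEL):
        hit = sum(1 for cp in passed if cp.level == level)
        if hit < CHECKPOINTS_PER_LEVEL[level]:
            break
        highest = level
    return highest


# Made-up run: all of level 1, two of the three level-2 checkpoints.
run = {Checkpoint(1, i) for i in range(3)} | {Checkpoint(2, 0), Checkpoint(2, 1)}
print(summarize(run))
print(highest_completed_level(run))
```

Reporting both the per-level fractions and the highest fully completed level keeps partial progress visible instead of collapsing it into a single pass/fail bit.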

    Integrations & Partnerships

    Platform Integrations

    • GitHub
    • ExploitGym

    Key Partnerships

    • Bugcrowd
    • Carnegie Mellon University CyLab

    Connect

    • Website: exploitbench.ai
    • GitHub: exploitbench
    • X / Twitter: 0x10n
    • LinkedIn: thedavidbrumley

    AI Topics

    3

    ExploitBench focuses on these topics:

    • LLM Evaluations (1)
    • Security Testing (1)
    • Agent Harness (1)