ExploitBench

ExploitBench measures how far AI cybersecurity agents can climb the 'exploitation ladder,' from reaching vulnerable code to executing arbitrary payloads.

At a Glance

Headquarters: Pittsburgh, PA

Founded: 2024

Employees: 15

AI Tools by ExploitBench

ExploitBench

AI Security Exploit Benchmark

LLM Evaluations, Security Testing, Agent Harness

Latest News

05/13/2024

Introducing ExploitBench: The First Benchmark Built to Measure AI Model Exploitation.

facebook.com

05/14/2026

'A Capability Ladder Benchmark for LLM Cybersecurity Agents' published on arXiv.

arxiv.org

Products & Services

ExploitBench Benchmark

May 2024

A specialized benchmark designed to measure the 'exploitation ladder' for AI agents, covering steps from vulnerability discovery to arbitrary code execution.

ExploitGym

2024

An evaluation environment and toolkit for testing LLM-based cybersecurity agents against hardened vulnerability targets.

Market Position

ExploitBench is the first benchmark to offer a granular, ladder-based approach to measuring autonomous exploitation. Instead of a binary pass/fail result, it records how far up the ladder an agent gets, which gives deeper insight into partial progress.

Leadership

Founders

Dr. David Brumley

Professor at Carnegie Mellon University (CMU) and Director of CyLab. He was formerly the CEO and Founder of ForAllSecure (acquired by Bugcrowd) and currently serves as the Chief AI and Science Officer at Bugcrowd.

Seunghyun Lee

PhD student at Carnegie Mellon University and a leading security researcher specializing in Chrome V8 vulnerabilities.

Executive Team

Dr. David Brumley

Project Lead / Professor

Renowned cybersecurity expert, professor at CMU, and executive at Bugcrowd.

Seunghyun Lee

Lead Researcher / PhD Student

Security researcher at Carnegie Mellon University focusing on autonomous exploitation.

Board of Directors

Dave Gerry

CEO, Bugcrowd (Partner)

David Brumley

Lead Advisor / Professor

Founding Story

ExploitBench was created by researchers at CMU and Bugcrowd to provide a realistic, hardened evaluation standard for the growing field of autonomous AI cybersecurity agents, moving beyond simple static analysis.

Business Model

Revenue Model

Open Source Research Initiative. Funding and support provided by Carnegie Mellon University and Bugcrowd.

Pricing Tiers

Open Source

Free

The benchmark and associated code are available on GitHub for the global research community.

N/A (Research Project)

Target Markets

Industries & Segments

AI Safety Researchers
Cybersecurity Professionals
Government Defense Units

Use Cases

Benchmarking large language models (LLMs) for security
Red teaming AI agents
Evaluating defensive AI capabilities

Notable Customers

Anthropic
Bugcrowd
Carnegie Mellon University

Quick Facts

Headquarters

Pittsburgh, PA

Founded

2024

Entity Type

Academic Research Project / Open Source Initiative

Employees

15

Total Funding

Supported by CMU research grants and Bugcrowd institutional support.

Investors

Carnegie Mellon University, Bugcrowd

Office Locations

Pittsburgh

San Francisco

History & Milestones

May 14, 2026

Release of the comprehensive research paper 'A Capability Ladder Benchmark for LLM Cybersecurity Agents' detailing the ExploitBench framework.

May 13, 2024

Official launch and announcement of ExploitBench, the first benchmark for measuring AI model exploitation capabilities.

Key Capabilities

16 checkpoints across 5 tiered capability levels

Measurement of the full exploitation lifecycle

Hardened, real-world bug targets (e.g., Chrome V8)

Open-source evaluation environment
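The contrast between ladder-based scoring and a binary pass/fail test can be illustrated with a small sketch. It makes three assumptions purely for illustration. First, the 16 checkpoints are numbered in climbing order. Second, they are split 3/3/3/3/4 across the 5 levels. Third, a checkpoint earns credit only if every lower checkpoint was also reached. The names `LEVEL_SIZES`, `LadderResult`, and `score_run` are hypothetical and are not part of ExploitBench or ExploitGym.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical split of the 16 checkpoints across 5 levels; the real
# assignment is defined by ExploitBench, not by this sketch.
LEVEL_SIZES = [3, 3, 3, 3, 4]
TOTAL_CHECKPOINTS = sum(LEVEL_SIZES)  # 16


@dataclass
class LadderResult:
    checkpoints_climbed: int  # rungs reached in order, 0..16
    level_reached: int        # number of fully completed levels, 0..5
    binary_pass: bool         # True only if the final rung was reached


def score_run(passed: set[int]) -> LadderResult:
    """Score one agent run from the 1-based checkpoint IDs it passed.

    Assumed ladder rule: credit stops at the first missing checkpoint,
    so passing a higher rung without the ones below it earns nothing.
    """
    climbed = 0
    for cp in range(1, TOTAL_CHECKPOINTS + 1):
        if cp not in passed:
            break
        climbed = cp

    # Count how many level boundaries the agent has fully crossed.
    level = 0
    boundary = 0
    for size in LEVEL_SIZES:
        boundary += size
        if climbed < boundary:
            break
        level += 1

    return LadderResult(climbed, level, climbed == TOTAL_CHECKPOINTS)


if __name__ == "__main__":
    # Two runs that a binary pass/fail metric would both score as failures.
    for run in ({1, 2, 3, 4, 5, 6, 7}, {1, 2}):
        print(score_run(run))
```

Under these assumptions, a run that passes checkpoints 1 through 7 climbs 7 rungs and completes 2 levels. A run that passes only checkpoints 1 and 2 climbs 2 rungs and completes none. A pass/fail metric would score both runs identically.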

Integrations & Partnerships

Platform Integrations

GitHub
ExploitGym

Key Partnerships

Bugcrowd

Carnegie Mellon University CyLab

AI Topics

ExploitBench focuses on these topics:

LLM Evaluations

Security Testing

Agent Harness
