SWE-bench

SWE-bench is a benchmark for evaluating the ability of AI models to resolve real-world software engineering issues in popular GitHub repositories.

At a Glance

Headquarters: Princeton, NJ

Established: 2023

Employees: 15

AI Tools by SWE-bench (2)

SWE-smith

SWE Agent Training Data Toolkit

Agent Harness, AI Development Libraries, Human-in-the-Loop Training

SWE-bench

LLM Software Engineering Benchmark

LLM Evaluations, Automated Testing, AI Coding Assistants

Latest News

05/01/2026

Claude 4.5 Opus takes the top spot on the SWE-bench Verified leaderboard with a 76.8% resolution rate.

Source: swebench.com

10/01/2024

SWE-bench Multimodal released, adding tasks that test visual understanding.

Source: arXiv

08/01/2024

OpenAI and Princeton release SWE-bench Verified to improve evaluation reliability.

Source: OpenAI Blog

Products & Services

SWE-bench

2023

A benchmark for evaluating large language models on real-world software engineering tasks mined from GitHub.

SWE-agent

2024

An open-source system that turns LLMs into software engineering agents capable of fixing bugs in real repositories.

SWE-bench Verified

Aug 2024

A 500-issue subset of SWE-bench whose issues were validated by human annotators as reliable for evaluation.

SWE-bench Multimodal

2024

A variant of the benchmark whose issues include visual elements, such as screenshots of user-interface bugs.

Market Position

A widely used standard benchmark for evaluating autonomous software engineering agents; major model developers routinely report SWE-bench Verified results.

Leadership

Founders

Carlos E. Jimenez

Ph.D. candidate at Princeton University focusing on natural language processing and software engineering agents.

John Yang

Ph.D. student at Stanford University (previously Princeton), creator of InterCode and lead developer of SWE-bench and SWE-agent.

Karthik Narasimhan

Assistant Professor of Computer Science at Princeton University, co-director of Princeton Language and Intelligence (PLI).

Executive Team

Carlos E. Jimenez

Lead Researcher

Princeton University Ph.D. student.

John Yang

Lead Developer

Ph.D. student at Stanford University (previously at Princeton).

Advisors & Researchers

Ofir Press

Advisor (Researcher at Princeton)

Alexander Wettig

Researcher (Princeton)

Shunyu Yao

Researcher (Princeton)

Founding Story

Started as a research project at Princeton University to bridge the gap between simple coding tasks and real-world software maintenance.

Business Model

Revenue Model

Open-source research project with no direct revenue model. Supported by university grants and by industry partnerships that provide compute and human validation.

Pricing Tiers

Open Source

Free to download from GitHub; submissions to the official leaderboard are also free.

Price: N/A (academic project)

Target Markets

Industries & Segments

AI Research Labs
Software Engineering Companies
LLM Providers

Use Cases

LLM Benchmarking
AI Agent Development
Software Engineering Automation

Notable Users

OpenAI
Anthropic
Google DeepMind
Meta AI

Quick Facts

Headquarters

Princeton, NJ

Founded

2023

Entity Type

Academic Research Project / University-affiliated Entity

Employees

15

Total Funding

Undisclosed. Supported by Princeton Language and Intelligence (PLI) and industry partners such as OpenAI.

Sponsors

Princeton University, OpenAI

Office Locations

Princeton University

Stanford University

Funding History

2024: Research sponsorship (amount undisclosed) covering compute and human validation

Sponsors: OpenAI, Amazon Web Services (AWS)

History & Milestones

May 2026

Leaderboard updated with frontier models such as Claude 4.5 and Gemini 3 Flash, showing significant performance improvements.

Late 2024

Introduction of SWE-bench Multimodal and SWE-bench Multilingual.

Aug 2024

Release of SWE-bench Verified in collaboration with OpenAI, featuring human-validated issues.

May 2024

SWE-bench presented as an oral paper at ICLR 2024.

Oct 2023

Initial release of the SWE-bench paper and dataset.

Key Capabilities

Automated evaluation harness

Human-verified subset

Multimodal support

Real-world repository context

Integrations & Partnerships

Platform Integrations

GitHub
Docker
PyPI

Key Partnerships

OpenAI (Verified subset)

Scale AI (Pro version collaboration)

Connect

Website

swebench.com

GitHub

SWE-bench

X / Twitter

jyangballin

AI Topics

SWE-bench focuses on these topics:

LLM Evaluations

Automated Testing

AI Coding Assistants

Agent Harness

AI Development Libraries

Human-in-the-Loop Training
