SWE-bench
SWE-bench is a benchmark for evaluating the ability of AI models to resolve real-world software engineering issues in popular GitHub repositories.
At a Glance
- AI Research Labs
- Software Engineering Companies
- LLM Providers
AI Tools by SWE-bench (2)
SWE-smith
SWE Agent Training Data Toolkit
SWE-bench
LLM Software Engineering Benchmark
Latest News
Claude Opus 4.5 takes the top spot on the SWE-bench Verified leaderboard with a 76.8% resolution rate.
OpenAI and Princeton release SWE-bench Verified to improve evaluation reliability.
SWE-bench Multimodal released, adding visual task capability testing.
Products & Services
SWE-bench
A benchmark for evaluating large language models on real-world software engineering tasks mined from GitHub.
SWE-agent
An open-source system that turns LLMs into software engineering agents capable of fixing bugs in real repositories.
SWE-bench Verified
A subset of 500 SWE-bench issues that human annotators have verified as reliable for evaluation.
SWE-bench Multimodal
A version of the benchmark whose issues include visual information, such as screenshots of UI problems.
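The "resolution rate" reported on the leaderboards is the share of benchmark issues a model resolves. An issue counts as resolved when, after the model's patch is applied, the tests that the reference fix was meant to repair now pass and the tests that already passed still pass. The sketch below illustrates that calculation only; the `InstanceResult` class, its field names, and the sample IDs are hypothetical and are not the official evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Test outcomes for one issue after applying a model's patch (hypothetical structure)."""
    instance_id: str
    fail_to_pass: dict[str, bool]  # tests that failed before the fix; True means it passes now
    pass_to_pass: dict[str, bool]  # tests that passed before the fix; True means it still passes


def is_resolved(result: InstanceResult) -> bool:
    # Resolved only if every target test now passes and nothing regressed.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolution_rate(results: list[InstanceResult]) -> float:
    """Percentage of issues resolved; returns 0.0 for an empty list."""
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


# Made-up sample data for illustration only.
sample = [
    InstanceResult("example-1", {"test_bug": True}, {"test_other": True}),
    InstanceResult("example-2", {"test_bug": False}, {"test_other": True}),
    InstanceResult("example-3", {"test_bug": True}, {"test_other": False}),
    InstanceResult("example-4", {"test_bug": True}, {"test_other": True}),
]
print(f"{resolution_rate(sample):.1f}%")
```

In this sample, the second issue is unresolved because the target test still fails, and the third because a previously passing test regressed.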
Market Position
Widely used as the de facto standard benchmark for evaluating autonomous software engineering agents.
Leadership
Founders
Carlos E. Jimenez
Ph.D. Candidate at Princeton University, focused on natural language processing and software engineering agents.
John Yang
Ph.D. student at Stanford University (previously Princeton), creator of InterCode and lead developer of SWE-bench and SWE-agent.
Karthik Narasimhan
Assistant Professor of Computer Science at Princeton University, co-director of Princeton Language and Intelligence (PLI).
Executive Team
Carlos E. Jimenez
Lead Researcher
Princeton University Ph.D. student.
John Yang
Lead Developer
Ph.D. student at Stanford University (previously Princeton).
Founding Story
Started as a research project at Princeton University to bridge the gap between simple coding tasks and real-world software maintenance.
Business Model
Revenue Model
Open-source research project with no direct revenue model. Supported by university grants and by industry partnerships that contribute compute and validation.
Pricing Tiers
Free. The benchmark is available on GitHub, and results can be submitted to the official leaderboard at no cost.
Target Markets
- AI Research Labs
- Software Engineering Companies
- LLM Providers
- LLM Benchmarking
- AI Agent Development
- Software Engineering Automation
- OpenAI
- Anthropic
- Google DeepMind
- Meta AI