EveryDev.ai
    SWE-bench

    SWE-bench is a benchmark for evaluating the ability of AI models to resolve real-world software engineering issues in popular GitHub repositories.

    At a Glance

    Tools Listed: 2
    Products: 4
    Capabilities: 4
    Discussions: 0
    Headquarters: Princeton, NJ
    Established: 2023
    Employees: 15
    Focus Areas
    LLM Evaluations
    Automated Testing
    AI Coding Assistants
    Agent Harness

    AI Tools by SWE-bench (2)

    SWE-smith

    SWE Agent Training Data Toolkit

    Agent Harness, AI Dev Libraries, HITL Training

    SWE-bench

    LLM Software Engineering Benchmark

    LLM Evaluations, Automated Testing, AI Coding Assistants

    Latest News

    05/01/2026 (swebench.com)

    Claude 4.5 Opus achieves top spot on SWE-bench Verified leaderboard with 76.8% resolution rate.

    10/01/2024 (arXiv)

    SWE-bench Multimodal released, adding visual task capability testing.

    08/01/2024 (OpenAI Blog)

    OpenAI and Princeton release SWE-bench Verified to improve evaluation reliability.

    Products & Services (4)

    SWE-bench
    2023

    A benchmark for evaluating large language models on real-world software engineering tasks mined from GitHub.

    SWE-agent
    2024

    An open-source system that turns LLMs into software engineering agents capable of fixing bugs in real repositories.

    SWE-bench Verified
    Aug 2024

    A subset of 500 issues from SWE-bench that have been human-verified to be reliable for evaluation.

    SWE-bench Multimodal
    2024

    A version of the benchmark that includes visual information from UI issues and screenshots.

    Market Position

    Widely regarded as a standard benchmark for evaluating autonomous software engineering agents.

    Leadership

    Founders

    Carlos E. Jimenez

    Ph.D. Candidate at Princeton University, focused on natural language processing and software engineering agents.

    John Yang

    Ph.D. student at Stanford University (previously Princeton), creator of InterCode and lead developer of SWE-bench and SWE-agent.

    Karthik Narasimhan

    Assistant Professor of Computer Science at Princeton University, co-director of Princeton Language and Intelligence (PLI).

    Executive Team

    Carlos E. Jimenez

    Lead Researcher

    Princeton University Ph.D. student.

    John Yang

    Lead Developer

    Stanford/Princeton Ph.D. student.

    Board of Directors

    Ofir Press: Advisor (Researcher at Princeton)
    Alexander Wettig: Researcher (Princeton)
    Shunyu Yao: Researcher (Princeton)

    Founding Story

    Started as a research project at Princeton University to bridge the gap between simple coding tasks and real-world software maintenance.

    Business Model

    Revenue Model

    Open-source research project; no direct revenue model. Supported by university grants and industry partnerships (compute/validation).

    Pricing Tiers

    Open Source
    $0

    Free to download from GitHub; submissions to the official leaderboard are also free of charge.

    N/A (Academic Project)

    Target Markets

    Industries & Segments
    • AI Research Labs
    • Software Engineering Companies
    • LLM Providers
    Use Cases
    • LLM Benchmarking
    • AI Agent Development
    • Software Engineering Automation
    Notable Customers
    • OpenAI
    • Anthropic
    • Google DeepMind
    • Meta AI

    Quick Facts

    Headquarters
    Princeton, NJ
    Founded
    2023
    Entity Type
    Academic Research Project / University-affiliated Entity
    Employees
    15
    Total Funding
    Funded by Princeton Language and Intelligence (PLI) and industry partners like OpenAI.
    Investors
    Princeton University, OpenAI
    Office Locations
    Princeton University
    Stanford University

    Funding History

    Research Sponsorship: Undisclosed (Compute/Human validation)
    2024
    OpenAI
    Amazon Web Services (AWS)

    History & Milestones

    May 2026

    Leaderboard updated with frontier models such as Claude 4.5 and Gemini 3 Flash, showing significant performance improvements.

    Late 2024

    Introduction of SWE-bench Multimodal and SWE-bench Multilingual.

    Aug 2024

    Release of SWE-bench Verified in collaboration with OpenAI, featuring human-validated issues.

    May 2024

    SWE-bench presented as an oral at ICLR 2024.

    Oct 2023

    Initial release of the SWE-bench paper and dataset.

    Key Capabilities (4)

    Automated evaluation harness
    Human-verified subset
    Multimodal support
    Real-world repository context
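
    To make the "automated evaluation harness" and "resolution rate" ideas concrete, here is a minimal sketch of how such a score could be computed. It assumes the commonly described criterion that a task counts as resolved only when the tests targeted by the fix now pass and the previously passing tests still pass. The class, field, and function names below are illustrative and are not the official harness API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class InstanceResult:
    """Test outcomes for one benchmark task after applying a model's patch.

    Hypothetical record type for illustration; not the official harness schema.
    """
    instance_id: str
    fail_to_pass: Dict[str, bool]  # tests the fix should make pass -> did it pass?
    pass_to_pass: Dict[str, bool]  # tests that must keep passing -> did it pass?


def is_resolved(result: InstanceResult) -> bool:
    """Return True if every target test passes and no existing test regressed."""
    # Require at least one target test so an empty record is never counted as resolved.
    if not result.fail_to_pass:
        return False
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolution_rate(results: Iterable[InstanceResult]) -> float:
    """Percentage of tasks resolved; 0.0 for an empty input."""
    items = list(results)
    if not items:
        return 0.0
    resolved = sum(is_resolved(r) for r in items)
    return 100.0 * resolved / len(items)


if __name__ == "__main__":
    demo = [
        InstanceResult("fixed", {"test_bug": True}, {"test_other": True}),
        InstanceResult("not-fixed", {"test_bug": False}, {"test_other": True}),
        InstanceResult("regressed", {"test_bug": True}, {"test_other": False}),
    ]
    print(f"{resolution_rate(demo):.1f}% resolved")
```

    Under this definition, a 76.8% resolution rate on a 500-task set would correspond to 384 resolved tasks.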

    Integrations & Partnerships

    Platform Integrations

    • GitHub
    • Docker
    • PyPI

    Key Partnerships

    OpenAI (Verified subset)
    Scale AI (Pro version collaboration)

    Connect

    Website
    swebench.com
    GitHub
    SWE-bench
    X / Twitter
    jyangballin

    AI Topics (6)

    SWE-bench focuses on these topics:

    LLM Evaluations (1)
    Automated Testing (1)
    AI Coding Assistants (1)
    Agent Harness (1)
    AI Development Libraries (1)
    Human-in-the-Loop Training (1)