SWE-smith

Name: SWE-smith
Availability: OnlineOnly
Author: SWE-bench

An open-source toolkit for generating training data and task instances for software engineering agents, enabling fine-tuning of LMs on real GitHub repositories.

At a Glance

Pricing

Open Source

Fully free and open-source under the MIT License. All features, dataset, and model weights are freely available.

Available On

Linux (developed and tested on Ubuntu 22.04.4 LTS; the project does not plan to support Windows or macOS)

API

SDK

CLI

SWE-bench, Princeton, NJ (Est. 2023)

Listed May 2026

About SWE-smith

SWE-smith is an open-source toolkit developed by researchers at Stanford University, Princeton Language & Intelligence, and Alibaba Qwen for generating training data for software engineering (SWE) agents. Released in April 2025, it lets users turn any GitHub repository into a SWE-gym and synthesize hundreds to thousands of task instances (including file localization, program repair, and SWE-bench-style tasks) for training language models. The paper was accepted as a Spotlight in the NeurIPS 2025 Datasets & Benchmarks Track.

What It Is

SWE-smith is a data generation pipeline and training toolkit targeting the problem of scarce, high-quality training data for software engineering agents. It automates the process of creating execution environments from GitHub repositories, synthesizing bug-inducing task instances, filtering them by unit test breakage, and generating natural-language issue descriptions. The result is a scalable dataset factory that can produce task instances for any Python-based GitHub repository.

How the Pipeline Works

The SWE-smith workflow follows four main steps:

Environment construction: Wrap a GitHub repository in a Docker-based execution environment.
Task synthesis: Automatically generate code mutations that introduce bugs or regressions.
Harness filtering: Keep only tasks that break one or more unit tests, ensuring task validity.
Issue generation: Produce natural-language issue descriptions for each task, mimicking real GitHub issues.
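
Steps 2 and 3 above can be illustrated with a toy sketch. It is not SWE-smith's implementation: the `clamp` function, the `FlipComparison` transformer, and the in-memory `TESTS` table are hypothetical stand-ins chosen so the example runs without Docker or a real repository. It plants a bug by rewriting comparison operators with Python's `ast` module, then keeps the candidate only if the original code passes every test and the mutated code fails at least one.

```python
import ast

# Hypothetical target code; a real pipeline would read this from a repository.
SOURCE = '''
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value
'''


class FlipComparison(ast.NodeTransformer):
    """Swap < with >= and > with <= to plant a behavioural bug (illustrative only)."""

    SWAPS = {ast.Lt: ast.GtE, ast.Gt: ast.LtE}

    def visit_Compare(self, node):
        self.generic_visit(node)
        # Replace each operator that has a swap; leave the others unchanged.
        node.ops = [self.SWAPS.get(type(op), type(op))() for op in node.ops]
        return node


def mutate(source):
    """Return the source text with every < and > comparison flipped."""
    tree = FlipComparison().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def load_clamp(source):
    """Execute the source in a fresh namespace and return its clamp function."""
    namespace = {}
    exec(source, namespace)
    return namespace["clamp"]


# Stand-in for a unit test suite: test name -> check on the function under test.
TESTS = {
    "test_below": lambda f: f(-5, 0, 10) == 0,
    "test_above": lambda f: f(50, 0, 10) == 10,
    "test_inside": lambda f: f(3, 0, 10) == 3,
}


def failing_tests(func):
    return sorted(name for name, check in TESTS.items() if not check(func))


def keep_candidate(original_src, mutated_src):
    """Harness filter: keep only if the original is clean and the mutant breaks a test."""
    baseline = failing_tests(load_clamp(original_src))
    broken = failing_tests(load_clamp(mutated_src))
    return not baseline and bool(broken), broken


keep, broken = keep_candidate(SOURCE, mutate(SOURCE))
print(keep, broken)
```

A mutation that changes no test outcome is discarded by the same check, which is the purpose of the filtering step: a task with no failing test gives an agent nothing verifiable to fix.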

The toolkit requires Docker and was developed and tested on Ubuntu 22.04.4 LTS. The project explicitly states it does not plan to support Windows or macOS.
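
Given those host requirements, a small preflight check can catch an unsupported setup before any environment is built. The sketch below is an assumption about how one might do this with the Python standard library; `check_host` is a hypothetical helper, not part of SWE-smith.

```python
import platform
import shutil


def check_host(system, docker_path):
    """Return a list of human-readable problems with the host (empty if none)."""
    problems = []
    if system != "Linux":
        problems.append(f"{system} is not a supported host; the project targets Linux")
    if docker_path is None:
        problems.append("docker executable not found on PATH")
    return problems


# platform.system() returns e.g. "Linux", "Darwin", or "Windows";
# shutil.which returns None when the executable is not on PATH.
for problem in check_host(platform.system(), shutil.which("docker")):
    print("warning:", problem)
```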

Dataset and Model Resources

The SWE-bench organization publishes several artifacts alongside the toolkit:

52,000+ task instances across 128 popular GitHub repositories, available on Hugging Face as SWE-bench/SWE-smith.
SWE-agent-LM-32B, a fine-tuned version of Qwen 2.5 Coder trained on SWE-smith data, which the project reports as achieving 40.2% pass@1 on SWE-bench Verified; the authors describe this as a +32% jump over the base model.
26,000 SWE-agent trajectories, including the 5,000 used to train SWE-agent-LM-32B.
250+ Docker environments, one per repository represented in the dataset.
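
The published task instances can be pulled with the Hugging Face `datasets` library using the dataset ID given above. In the sketch below, the `train` split name and the `repo` column are assumptions that should be checked against the dataset card, and the sample rows are invented placeholders so that the counting helper can be tried offline.

```python
from collections import Counter


def load_task_instances(split="train"):
    """Download the published instances (needs `pip install datasets` and network access).

    The split name is an assumption; check the dataset card for the real one.
    """
    from datasets import load_dataset  # imported lazily so the helper below works offline

    return load_dataset("SWE-bench/SWE-smith", split=split)


def instances_per_repo(rows, repo_field="repo"):
    """Count task instances per repository; `repo_field` is an assumed column name."""
    return Counter(row[repo_field] for row in rows)


# Placeholder rows standing in for real dataset records.
sample_rows = [
    {"repo": "example/repo-a"},
    {"repo": "example/repo-a"},
    {"repo": "example/repo-b"},
]
print(instances_per_repo(sample_rows).most_common())
```

With network access, `instances_per_repo(load_task_instances())` would produce the same summary for the full dataset. At the reported scale, 52,000 instances over 128 repositories works out to roughly 406 instances per repository on average.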

Training Integrations

SWE-smith has been used for two training paradigms according to the project documentation:

Supervised fine-tuning of Qwen 2.5 Coder into SWE-agent-LM-32B using the SWE-agent framework.
GRPO-style reinforcement learning using the SkyRL framework from NovaSky-AI.
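
For readers unfamiliar with the second paradigm, the core of GRPO is a group-relative advantage: several rollouts are sampled for the same task, and each rollout's reward is normalized against the mean and standard deviation of its group. The sketch below shows only that normalization. It is not SkyRL's implementation, and treating the reward as 1 when the task's tests pass and 0 otherwise is an assumption made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group: (r - mean) / (std + eps).

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts on one task: two resolved it (reward 1), two did not (reward 0).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

# If every rollout gets the same reward, all advantages are zero and the group
# contributes no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```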

The Python API makes it straightforward to load task instances from Hugging Face Datasets and spin up Docker containers pre-initialized with each task, leaving the training loop to the user.
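
The toolkit's own Python API is not reproduced here. As a rough picture of what "spinning up a container per task" involves, the sketch below builds a plain `docker run` command for one task record. The `instance_id` and `image_name` keys, the `swesmith-` name prefix, and the sample record are all assumptions for illustration; the `docker run` flags used (`--detach`, `--rm`, `--name`) are standard Docker CLI options.

```python
import re
import shlex


def container_name_for(instance_id):
    """Derive a Docker-safe container name (letters, digits, '_', '.', '-')."""
    return "swesmith-" + re.sub(r"[^a-zA-Z0-9_.-]", "-", instance_id)


def docker_run_command(task):
    """Build (but do not execute) a command that starts a long-lived task container."""
    return [
        "docker", "run",
        "--detach",   # run in the background
        "--rm",       # remove the container once it stops
        "--name", container_name_for(task["instance_id"]),
        task["image_name"],
        "sleep", "infinity",  # keep the container alive for later `docker exec` calls
    ]


# Hypothetical task record; real field names may differ.
task = {"instance_id": "example__repo.abc123.bug_1", "image_name": "example/image:latest"}
print(shlex.join(docker_run_command(task)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the container; the training loop would then execute agent actions inside it and tear it down afterwards.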

Update: NeurIPS 2025 Spotlight and Open-Source Release

SWE-smith was publicly released on April 30, 2025, with the full toolkit, dataset, model weights, and trajectories open-sourced under the MIT License. The paper was accepted as a Spotlight in the NeurIPS 2025 Datasets & Benchmarks Track (arXiv:2504.21798). The repository was last pushed in May 2026, which suggests ongoing development. The project is part of a broader SWE-bench ecosystem that includes SWE-bench, SWE-agent, Mini-SWE-Agent, SWE-ReX, and sb-cli.

Pricing

Open Source

Fully free and open-source under the MIT License. All features, dataset, and model weights are freely available.

Full toolkit source code under MIT License
52k+ task instances on Hugging Face
250+ Docker environments
SWE-agent-LM-32B model weights
26k SWE-agent trajectories

Capabilities

Key Features

Turn any GitHub repository into a SWE-gym execution environment
Synthesize an arbitrary number of task instances (file localization, program repair, SWE-bench-style)
Filter tasks by unit test breakage for quality assurance
Generate natural-language issue descriptions for tasks
52k+ pre-built task instances across 128 GitHub repositories
Docker-based isolated execution environments
Python API for loading tasks and spinning up containers
Supports supervised fine-tuning and GRPO-style reinforcement learning
Compatible with SWE-agent training framework
Pre-trained SWE-agent-LM-32B model weights available on Hugging Face

Integrations

Docker

Hugging Face Datasets

SWE-agent

SkyRL

Qwen 2.5 Coder

GitHub

SWE-bench

SWE-ReX

API Available
