SWE-smith
An open-source toolkit for generating training data and task instances for software engineering agents, enabling fine-tuning of LMs on real GitHub repositories.
At a Glance
Fully free and open-source under the MIT License. All features, dataset, and model weights are freely available.
Listed May 2026
About SWE-smith
SWE-smith is an open-source toolkit developed by researchers at Stanford University, Princeton Language & Intelligence, and Alibaba Qwen for generating training data for software engineering (SWE) agents. Released in April 2025, it lets users turn any GitHub repository into a SWE-gym and synthesize hundreds to thousands of task instances (including file localization, program repair, and SWE-bench-style tasks) for training language models. The project was accepted as a Spotlight paper in the NeurIPS 2025 Datasets & Benchmarks Track.
What It Is
SWE-smith is a data generation pipeline and training toolkit targeting the problem of scarce, high-quality training data for software engineering agents. It automates the process of creating execution environments from GitHub repositories, synthesizing bug-inducing task instances, filtering them by unit test breakage, and generating natural-language issue descriptions. The result is a scalable dataset factory that can produce task instances for any Python-based GitHub repository.
How the Pipeline Works
The SWE-smith workflow follows four main steps:
- Environment construction: Wrap a GitHub repository in a Docker-based execution environment.
- Task synthesis: Automatically generate code mutations that introduce bugs or regressions.
- Harness filtering: Keep only tasks that break one or more unit tests, ensuring task validity.
- Issue generation: Produce natural-language issue descriptions for each task, mimicking real GitHub issues.
The toolkit requires Docker and was developed and tested on Ubuntu 22.04.4 LTS. The project explicitly states it does not plan to support Windows or macOS.
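The task synthesis and harness filtering steps above can be illustrated with a small, self-contained sketch. It is not SWE-smith's implementation: the `clamp` function, the three tests, and the operator-swapping mutation are hypothetical stand-ins chosen for illustration, and everything runs in-process rather than in a Docker environment. The point is only the filtering rule, which keeps a candidate bug when it breaks one or more tests and discards it otherwise.

```python
import ast

# Hypothetical "repository" code to mutate (not taken from SWE-smith).
SOURCE = """
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value
"""

# Toy mutation space: each comparison operator may be replaced by these.
ALTERNATIVES = {ast.Lt: (ast.Gt, ast.LtE), ast.Gt: (ast.Lt, ast.GtE)}


def run_tests(namespace):
    """Return the names of the (hypothetical) unit tests that fail."""
    clamp = namespace["clamp"]
    cases = {
        "test_below": lambda: clamp(-5, 0, 10) == 0,
        "test_above": lambda: clamp(15, 0, 10) == 10,
        "test_inside": lambda: clamp(5, 0, 10) == 5,
    }
    failing = []
    for name, check in cases.items():
        try:
            passed = check()
        except Exception:
            passed = False
        if not passed:
            failing.append(name)
    return failing


class CollectComparisons(ast.NodeVisitor):
    """Record the operator type of every comparison, in visit order."""

    def __init__(self):
        self.ops = []

    def visit_Compare(self, node):
        self.generic_visit(node)
        self.ops.append(type(node.ops[0]))


class ReplaceComparison(ast.NodeTransformer):
    """Replace the operator of the target-th comparison with new_op."""

    def __init__(self, target, new_op):
        self.target = target
        self.new_op = new_op
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if self.seen == self.target:
            node.ops[0] = self.new_op()
        self.seen += 1
        return node


def load(tree):
    """Compile a module AST and return its namespace."""
    namespace = {}
    exec(compile(tree, "<candidate>", "exec"), namespace)
    return namespace


def synthesize_and_filter(source):
    """Generate candidate bugs, keeping those that break at least one test."""
    collector = CollectComparisons()
    collector.visit(ast.parse(source))
    kept = []
    for index, op_type in enumerate(collector.ops):
        for new_op in ALTERNATIVES.get(op_type, ()):
            tree = ReplaceComparison(index, new_op).visit(ast.parse(source))
            ast.fix_missing_locations(tree)
            failing = run_tests(load(tree))
            if failing:  # harness filter: the bug must be observable by tests
                label = f"compare #{index}: {op_type.__name__} -> {new_op.__name__}"
                kept.append({"bug": label, "failing_tests": failing})
    return kept


kept = synthesize_and_filter(SOURCE)
for task in kept:
    print(task["bug"], task["failing_tests"])
```

In this toy case, swapping `<` for `<=` (or `>` for `>=`) leaves the behavior of `clamp` unchanged, so those candidates break no tests and are dropped, while the two operator flips survive. A real pipeline would run the repository's own test suite inside its container instead of in-process lambdas.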
Dataset and Model Resources
The SWE-bench organization publishes several artifacts alongside the toolkit:
- 52,000+ task instances across 128 popular GitHub repositories, available on Hugging Face as SWE-bench/SWE-smith.
- SWE-agent-LM-32B, a fine-tuned version of Qwen 2.5 Coder trained on SWE-smith data, which the project reports achieves 40.2% pass@1 on SWE-bench Verified, described by the authors as a +32% jump over the base model.
- 26,000 SWE-agent trajectories, including the 5,000 used to train SWE-agent-LM-32B.
- 250+ Docker environments, one per repository represented in the dataset.
Training Integrations
According to the project documentation, SWE-smith has been used for two training paradigms:
- Supervised fine-tuning of Qwen 2.5 Coder into SWE-agent-LM-32B using the SWE-agent framework.
- GRPO-style reinforcement learning using the SkyRL framework from NovaSky-AI.
The Python API makes it straightforward to load task instances from Hugging Face Datasets and spin up Docker containers pre-initialized with each task, leaving the training loop to the user.
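For the GRPO-style reinforcement learning mentioned above, the core idea of GRPO is that each rollout's reward is scored relative to the other rollouts sampled for the same task. The sketch below shows that normalization using only the standard library. It is not SkyRL's or SWE-smith's code: the function name, the binary pass/fail reward, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts on the same task instance:
# 1.0 if the rollout's patch makes the failing tests pass, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that resolve the task receive a positive advantage and the others a negative one. When every rollout in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.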
Update: NeurIPS 2025 Spotlight and Open-Source Release
SWE-smith was publicly released on April 30, 2025, with the full toolkit, dataset, model weights, and trajectories open-sourced under the MIT License. The paper was accepted as a Spotlight in the NeurIPS 2025 Datasets & Benchmarks Track (arXiv:2504.21798). The repository was last pushed in May 2026, which suggests ongoing development. The project is part of a broader SWE-bench ecosystem that includes SWE-bench, SWE-agent, Mini-SWE-Agent, SWE-ReX, and sb-cli.
Pricing
Open Source
- Full toolkit source code under MIT License
- 52k+ task instances on Hugging Face
- 250+ Docker environments
- SWE-agent-LM-32B model weights
- 26k SWE-agent trajectories
Capabilities
Key Features
- Turn any GitHub repository into a SWE-gym execution environment
- Synthesize hundreds to thousands of task instances from a repository (file localization, program repair, SWE-bench-style)
- Filter tasks by unit test breakage for quality assurance
- Generate natural-language issue descriptions for tasks
- 52k+ pre-built task instances across 128 GitHub repositories
- Docker-based isolated execution environments
- Python API for loading tasks and spinning up containers
- Supports supervised fine-tuning and GRPO-style reinforcement learning
- Compatible with SWE-agent training framework
- Pre-trained SWE-agent-LM-32B model weights available on Hugging Face