TMax

An open-source research codebase for training, evaluating, and deploying simple yet powerful terminal-using LLM agents, covering data generation, SFT, and RL training pipelines.

At a Glance

Pricing

Open Source

Fully free and open-source under Apache 2.0. Self-host and use the codebase, models, and datasets at no cost.

Available On

CLI

API

SDK

WAI

Seattle, WA

Est. 2026

Listed Jun 2026

About TMax

TMax is an open-source project from AllenAI (Allen Institute for AI) focused on building simple, powerful terminal-using agents. Released under the Apache 2.0 license, the codebase covers the full lifecycle of terminal agent development: synthetic data generation, supervised fine-tuning (SFT), reinforcement learning (RL) training, and evaluation against benchmarks like Terminal-Bench and SWE-bench.

What It Is

TMax is a research framework for training LLM-based agents that interact with a terminal (bash shell) to complete tasks. The project trains a series of models, referred to as the "tmax" series, and provides the tooling needed to reproduce or extend that work. It is accompanied by a paper on arXiv (2606.23321) and a blog post from the WAI organization. The codebase is written primarily in Python and uses uv for dependency management.

Four-Stage Pipeline Architecture

The repository is organized around four distinct stages:

Data generation (rl_data/): A scalable, diversity-aware pipeline that synthesizes terminal-agent tasks by sampling from structured compositional axes. Tasks are packaged as self-contained Apptainer/Docker environments with programmatic verifiers, then solved at pass@k and published to Hugging Face Hub.
Agent (Vanillux2Agent/): A direct LiteLLM agent built on the vanillux prompt harness — derived from mini-SWE-agent prompts — with a bash tool schema, submit marker, format-error recovery, and output truncation. It executes commands through Harbor's active environment.
Training (training/open-instruct/): A fork of AllenAI's open-instruct repository with fixes for Qwen 3.5 and terminal-agent training. SFT and DPPO RL launch scripts for tmax models are provided under training/open-instruct/scripts/tmax/.
Evaluation (scripts/ + beaker_configs/): Shell/Slurm launchers and a Beaker pipeline that serves a model with vLLM and runs Harbor datasets against it.
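The agent-side mechanics named above (a bash tool schema, a submit marker, format-error recovery, and output truncation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the marker string, the character budget, and the function names are hypothetical and are not taken from the TMax repository. The tool description follows the OpenAI-style function-calling format.

```python
import json

# Hypothetical values: neither the marker nor the budget comes from TMax.
SUBMIT_MARKER = "TASK_COMPLETE"
MAX_OUTPUT_CHARS = 2000

# A bash tool described in the OpenAI-style function-calling format.
BASH_TOOL = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the head and tail of long output and drop the middle."""
    if len(text) <= limit:
        return text
    half = max(limit // 2, 1)
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n... [{omitted} characters omitted] ...\n{text[-half:]}"


def is_submission(command_output: str) -> bool:
    """Treat the episode as finished when the first output line is the marker."""
    lines = command_output.strip().splitlines()
    return bool(lines) and lines[0].strip() == SUBMIT_MARKER


def parse_tool_call(arguments_json: str):
    """Return (command, error); the error text is meant to be sent back to the model."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return None, f"Format error: arguments were not valid JSON ({exc.msg}). Please retry."
    if not isinstance(args, dict) or not isinstance(args.get("command"), str):
        return None, "Format error: expected an object with a string 'command' field. Please retry."
    return args["command"], None
```

In a loop built this way, a malformed tool call does not end the episode: the error string is returned to the model as the next observation so it can retry, which is one plausible reading of "format-error recovery".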

Task Data and the Harbor Ecosystem

TMax ships a full 15k-task corpus in Harbor format, published on the Harbor registry as tmax/TMax-15K-Harbor. The corpus combines a legacy set of 10k self-contained tasks with 5k newer, more intricate multi-modal tasks. Every task includes a self-contained Harbor environment and a programmatic verifier, so any agent or model can be evaluated directly without regenerating data. The Harbor framework supports both local Docker and cloud-based Daytona sandbox execution.
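To make "programmatic verifier" concrete, here is a minimal sketch of the idea: a function that inspects the state an agent left behind and returns pass or fail. The task (sort the lines of input.txt into output.txt), the file names, and the function names are hypothetical; this does not reproduce Harbor's actual task or verifier format.

```python
import tempfile
from pathlib import Path


def verify_sorted_copy(workdir: Path) -> bool:
    """Hypothetical verifier: pass if output.txt holds the sorted lines of input.txt."""
    source = workdir / "input.txt"
    result = workdir / "output.txt"
    if not source.is_file() or not result.is_file():
        return False
    expected = sorted(source.read_text().splitlines())
    return result.read_text().splitlines() == expected


def demo() -> tuple:
    """Run the verifier before and after a simulated agent writes its answer."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "input.txt").write_text("pear\napple\nfig\n")
        before = verify_sorted_copy(workdir)  # output.txt does not exist yet
        (workdir / "output.txt").write_text("apple\nfig\npear\n")
        after = verify_sorted_copy(workdir)
    return before, after
```

Because the check depends only on final state and not on which commands were run, the same verifier can score any agent or model, which is the property the paragraph above relies on.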

Requirements and Setup Path

Running TMax requires:

uv for Python dependency management
An LLM API key (e.g., GEMINI_API_KEY) or a local vLLM/Ollama/OpenAI-compatible endpoint
apptainer on PATH for building and running task containers (data generation only)
A Docker Hub login and personal access token for training at scale
HF_TOKEN for Hugging Face upload and gated model access
A container runtime (Docker or Daytona) for evaluating on the published Harbor dataset

The quickstart involves running uv sync, then using provided shell scripts to generate tasks, solve them, analyze pass@k statistics, train models, and run evaluations.
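For the pass@k analysis step, the source does not say which estimator the TMax scripts use. The sketch below shows the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of attempts sampled for a task and c is the number that passed the verifier. The function name is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 4 attempts and 1 success, `pass_at_k(4, 1, 1)` gives 0.25, matching the raw success rate; larger k values raise the estimate because more draws are allowed per task.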

Update: Initial Release

The repository was created in March 2026 and last updated in June 2026, with the initial release of the codebase, models, and the accompanying arXiv paper ("Tmax: A simple recipe for terminal agents"). The authors include Hamish Ivison, Junjie Oscar Yin, Rulin Shao, Teng Xiao, Nathan Lambert, and Hannaneh Hajishirzi. Models and datasets are published on Hugging Face under the allenai/tmax collection.

Pricing

Open Source

Full codebase access under Apache 2.0
Data generation pipeline
SFT and RL training scripts
Evaluation pipeline (Terminal-Bench, SWE-bench)
15k Harbor task corpus

Capabilities

Key Features

Terminal-using LLM agent training and evaluation
Compositional synthetic task data generation pipeline
Pass@k task solving with programmatic verifiers
SFT and DPPO RL training via open-instruct fork
Vanillux2Agent with bash tool schema and format-error recovery
15k Harbor task corpus with self-contained environments
vLLM model serving integration
Beaker and Slurm evaluation pipeline
Daytona and Docker sandbox support
Hugging Face Hub dataset publishing

Integrations

Hugging Face Hub
vLLM
LiteLLM
Harbor framework
Daytona
Docker
Apptainer
Beaker
Slurm
open-instruct
Qwen 3.5
Terminal-Bench
SWE-bench

API Available

