# Anserini

> A Lucene-based toolkit for reproducible information retrieval research, bridging academic IR research and real-world search application development.

Anserini is an open-source Java toolkit built on Apache Lucene, designed to make information retrieval research reproducible and practically applicable. Maintained by the Castorini research group, it grew out of a 2016 reproducibility study of open-source retrieval engines (Lin et al., ECIR 2016) and has since been described in peer-reviewed publications at SIGIR 2017 and the Journal of Data and Information Quality (2018). The project is licensed under Apache 2.0 and is actively developed on GitHub with over 380 contributors.

## What It Is

Anserini wraps Apache Lucene to provide a reproducible environment for information retrieval (IR) experiments. Its core job is to let researchers index document collections, run retrieval experiments, and reproduce published baselines from a single, version-controlled codebase. The project explicitly positions itself as a bridge between academic IR research and the engineering of real-world search systems. A companion Python interface, Pyserini, exposes most Anserini features for users who prefer Python over Java.

## Architecture and Setup Paths

Anserini offers two primary installation modes:

- **Fatjar**: A self-contained JAR downloaded via `curl`, requiring no repository clone. This is the fastest path for running experiments.
- **Dev environment**: A full repository clone for contributors or users who need to modify source code.

The toolkit is primarily written in Java (83%), with Python (14%) and Shell scripts rounding out the codebase. It is distributed on Maven Central under the `io.anserini` namespace, making it easy to include as a dependency in other Java projects.
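Because the toolkit is published under `io.anserini`, the download location of a release JAR follows the standard Maven repository layout. The sketch below shows how such a URL is assembled from Maven coordinates. The artifact ID `anserini` and the `fatjar` classifier are assumptions made for illustration, as is the availability of version 2.0.0 at that path; check the README for the exact `curl` command.

```python
from dataclasses import dataclass

# Public root of Maven Central's repository tree.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


@dataclass(frozen=True)
class MavenCoordinate:
    group_id: str
    artifact_id: str
    version: str
    classifier: str = ""  # empty string means the main artifact

    def jar_url(self, base: str = MAVEN_CENTRAL) -> str:
        """Build the JAR URL using the standard Maven repository layout."""
        group_path = self.group_id.replace(".", "/")
        suffix = f"-{self.classifier}" if self.classifier else ""
        filename = f"{self.artifact_id}-{self.version}{suffix}.jar"
        return f"{base}/{group_path}/{self.artifact_id}/{self.version}/{filename}"


# Assumed coordinates: artifact "anserini" with classifier "fatjar".
fatjar = MavenCoordinate("io.anserini", "anserini", "2.0.0", "fatjar")
print(fatjar.jar_url())
```

The same coordinates (group, artifact, version) are what a Maven or Gradle build would declare to pull Anserini in as a dependency.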

## Reproducibility as a First-Class Goal

The project's stated mission is reproducible IR research. It ships with prebuilt index registries and topic registries so that published experimental results can be re-run with a single command. Two reproduction workflows are documented: one that starts from prebuilt indexes (faster) and one that starts from raw document collections (more thorough, since it also exercises indexing). The repository includes dedicated `runs/` and `logs/` directories to capture experiment outputs in a structured way, and CI badges report the build and test status of the master branch.
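To make the idea of structured experiment output concrete, here is a minimal sketch of reading a run file. It assumes the files under `runs/` use the standard six-column TREC run format (`qid Q0 docid rank score tag`) that `trec_eval` consumes. The function name `parse_trec_run` and the sample lines are hypothetical and are not taken from Anserini.

```python
from collections import defaultdict
from typing import NamedTuple


class RunEntry(NamedTuple):
    qid: str
    docid: str
    rank: int
    score: float
    tag: str


def parse_trec_run(lines):
    """Group TREC-format run lines by query ID, ordered by rank."""
    by_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 6:
            raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
        qid, _q0, docid, rank, score, tag = parts
        by_query[qid].append(RunEntry(qid, docid, int(rank), float(score), tag))
    for entries in by_query.values():
        entries.sort(key=lambda e: e.rank)
    return dict(by_query)


# Made-up sample lines for illustration only.
sample = [
    "q1 Q0 doc3 2 11.0 demo",
    "q1 Q0 doc7 1 12.5 demo",
    "q2 Q0 doc9 1 8.25 demo",
]
run = parse_trec_run(sample)
for qid, entries in run.items():
    print(qid, [e.docid for e in entries])
```

Comparing two parsed runs query by query is one simple way to check that a reproduction matches a previously recorded result.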

## Agent-Aware Workflow

Anserini has added explicit support for coding agents (such as those powered by large language models). The repository includes a `.agents/skills/` directory with structured skill files for:

- Installing the dev environment or fatjar
- Running CLI commands (prebuilt-index registry, topics registry, search, REST workflows)
- Executing reproducibility experiments

The README provides prompt templates that users can give directly to their coding agents, which makes Anserini one of the earlier research toolkits to formally document agent-oriented onboarding.

## Update: v2.0.0 and Lucene 10.4.0

On April 12, 2026 (commit `c6eed6`), Anserini was upgraded to Lucene 10.4.0 as part of the v2.0.0 release. Indexes built with Lucene 9 remain readable by the new code, but indexes built with Lucene 10 cannot be read by older versions of Anserini. Development remains active, with commits as recent as May 20, 2026, including SPLADE-v3 ONNX reproduction updates and fixes that keep reproduction output stable across locales.
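The compatibility rule can be expressed as a small predicate. This is a toy model, not part of Anserini or Lucene: the function name `can_read_index` is hypothetical, and the limit of one major version of backward compatibility is an assumption based on Lucene's general policy rather than something stated above.

```python
def can_read_index(reader_major: int, index_major: int) -> bool:
    """Return True if code on `reader_major` can open an index written by `index_major`."""
    # Indexes written by a newer Lucene are not readable by older code.
    if index_major > reader_major:
        return False
    # Assumption: only the previous major version is supported when reading older indexes.
    return reader_major - index_major <= 1


# (reader Lucene major version, index Lucene major version)
for reader, index in [(10, 9), (10, 10), (9, 10)]:
    status = "readable" if can_read_index(reader, index) else "not readable"
    print(f"Lucene {reader} code, Lucene {index} index: {status}")
```

In practice this means an existing Lucene 9 index can be kept after upgrading, but an index rebuilt with v2.0.0 cannot be shared with collaborators still on an older release.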

## Features
- Lucene-based indexing and retrieval
- Reproducible IR experiment framework
- Prebuilt index registry
- Topics registry
- BM25 and dense retrieval support
- SPLADE and ONNX model support
- Fatjar self-contained distribution
- Maven Central package
- Pyserini Python interface
- Agent-oriented skill files for coding agents
- REST API workflows
- Prebuilt and raw document collection reproduction paths

## Integrations
Apache Lucene, Pyserini, Maven Central, ONNX, SPLADE, trec_eval, MS MARCO, BEIR

## Platforms
CLI, API, Developer SDK

## Pricing
Open Source

## Version
2.0.0

## Links
- Website: http://anserini.io/
- Documentation: https://github.com/castorini/anserini/blob/master/docs/additional-docs.md
- Repository: https://github.com/castorini/anserini
- EveryDev.ai: https://www.everydev.ai/tools/anserini
