LOFT
LOFT (Long-Context Frontiers) is a Google DeepMind benchmark for evaluating large language models on long-context retrieval and reasoning tasks across diverse modalities.
At a Glance
Pricing
Freely available open-source benchmark for long-context LLM evaluation.
Listed Mar 2026
About LOFT
LOFT (Long-Context Frontiers) is an open-source benchmark released by Google DeepMind to evaluate large language models (LLMs) on tasks that require long-context understanding, retrieval, and multi-step reasoning. It spans a wide range of task types and modalities, probing the limits of what LLMs can do with extended context windows. The benchmark is designed to surface real-world challenges in retrieval-augmented generation, multi-hop reasoning, and in-context learning at scale.
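To make the retrieval setting concrete, here is a minimal sketch of the "corpus-in-context" style of prompting that long-context benchmarks like LOFT exercise: the entire corpus is placed directly in the model's prompt, and the model must locate the relevant document itself rather than rely on an external retriever. The corpus contents, document IDs, and prompt layout below are illustrative assumptions, not LOFT's actual data format.

```python
# Illustrative sketch: build one long prompt containing an entire corpus,
# so retrieval happens in-context instead of through a separate retriever.
# All documents, IDs, and the prompt layout here are hypothetical.

corpus = {
    "doc_001": "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "doc_002": "Mount Everest is the highest mountain above sea level.",
    # A real long-context task might inline thousands of documents,
    # pushing the prompt toward a million tokens.
}

query = "In what year was the Eiffel Tower completed?"


def build_long_context_prompt(corpus: dict, query: str) -> str:
    """Concatenate every document into a single prompt, then append the query."""
    parts = ["Answer the question using only the documents below.", ""]
    for doc_id, text in corpus.items():
        parts.append(f"[{doc_id}] {text}")
    parts += ["", f"Question: {query}",
              "Reply with the supporting document ID and the answer."]
    return "\n".join(parts)


prompt = build_long_context_prompt(corpus, query)
# `prompt` would be sent to a long-context LLM; the model must find the
# relevant document inside the prompt itself.
```

The question a benchmark like this probes is whether, at sufficient context length, this in-context setup can stand in for a traditional retriever-plus-reader pipeline.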
- Long-context evaluation — Tests LLMs on tasks that require processing and reasoning over very long inputs, with contexts scaling up to one million tokens.
- Multi-task coverage — Includes diverse task types such as retrieval, retrieval-augmented generation, multi-hop question answering, and SQL-like reasoning, spanning text and other modalities.
- Open-source research tool — Hosted on GitHub under Google DeepMind, making it freely accessible for researchers to reproduce, extend, and build upon.
- Benchmark suite — Provides standardized datasets and evaluation scripts so researchers can compare model performance consistently.
- RAG and in-context learning focus — Specifically designed to stress-test retrieval-augmented generation pipelines and many-shot in-context learning at long context lengths.
- Getting started — Clone the repository from GitHub, follow the setup instructions in the README, and run the provided evaluation scripts against your model of choice (a sketch of such an evaluation loop follows this list).
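As a rough illustration of that last step, the snippet below scores model predictions against gold answers for a LOFT-style task stored as JSONL. The file name, the "query"/"answer" fields, the exact-match metric, and the `run_model` stub are all assumptions made for the sketch; the repository's README and evaluation scripts define the real formats and metrics.

```python
import json

# Hypothetical sketch of a minimal evaluation loop; not the repository's
# actual evaluation script. Field names and file layout are assumed.


def run_model(query: str) -> str:
    """Stand-in for an inference call to your long-context model of choice."""
    raise NotImplementedError("Plug in your model's API or local inference here.")


def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()


def evaluate(path: str) -> float:
    correct = total = 0
    with open(path) as f:
        for line in f:
            example = json.loads(line)  # assumed fields: "query", "answer"
            correct += exact_match(run_model(example["query"]), example["answer"])
            total += 1
    return correct / total if total else 0.0


# score = evaluate("loft_task.jsonl")  # hypothetical file name
```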
Pricing
Open Source
Freely available open-source benchmark for long-context LLM evaluation.
- Full benchmark suite access
- Standardized datasets
- Evaluation scripts
- Multi-task coverage
- Reproducible research
Capabilities
Key Features
- Long-context LLM evaluation
- Multi-task benchmark suite
- Retrieval and reasoning tasks
- Multi-hop question answering
- RAG pipeline stress-testing
- In-context learning evaluation
- Standardized datasets and evaluation scripts
- Open-source and reproducible
