    LOFT

    LLM Evaluations

    LOFT (Long-context Frontiers) is a Google DeepMind benchmark for evaluating large language models on long-context retrieval and reasoning tasks across diverse modalities.


    At a Glance

    Pricing

    Open Source

    Freely available open-source benchmark for long-context LLM evaluation.

    Available On

    Web
    API
    SDK

    Resources

Website
GitHub
llms.txt

    Topics

LLM Evaluations
Retrieval-Augmented Generation
Academic Research

    Listed Mar 2026

    About LOFT

    LOFT (Long-context Frontiers) is an open-source benchmark released by Google DeepMind to evaluate large language models (LLMs) on tasks requiring long-context understanding, retrieval, and multi-step reasoning. It covers a wide range of task types and modalities, pushing the frontier of what LLMs can do with extended context windows. The benchmark is designed to surface real-world challenges in retrieval-augmented generation, multi-hop reasoning, and in-context learning at scale.

    • Long-context evaluation — Tests LLMs on tasks that require processing and reasoning over very long input contexts, up to millions of tokens.
    • Multi-task coverage — Includes diverse task types such as retrieval, multi-hop QA, summarization, and more, spanning text and other modalities.
    • Open-source research tool — Hosted on GitHub under Google DeepMind, making it freely accessible for researchers to reproduce, extend, and build upon.
    • Benchmark suite — Provides standardized datasets and evaluation scripts so researchers can compare model performance consistently.
    • RAG and in-context learning focus — Specifically designed to stress-test retrieval-augmented generation pipelines and in-context few-shot learning at long context lengths.
    • Getting started — Clone the repository from GitHub, follow the setup instructions in the README, and run the provided evaluation scripts against your model of choice; a minimal sketch of the evaluation pattern follows below.
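
    To make the pattern concrete, here is a minimal Python sketch of the "corpus-in-context" setup LOFT centers on: the whole corpus goes into the prompt and the model must retrieve and answer in-context. The toy data, field names, and generate stub are illustrative stand-ins, not LOFT's actual script interface; the repository README documents the real setup.

```python
# A minimal sketch of a "corpus-in-context" eval: the whole corpus is
# placed in the prompt and the model must retrieve the relevant passage
# in-context, with no external retriever. Data and model stub below are
# illustrative only; LOFT's real scripts and formats are in its README.

def score(examples, generate) -> float:
    """Exact-match accuracy over (corpus, query, answer) examples."""
    correct = 0
    for ex in examples:
        # Entire corpus in the prompt: long-context retrieval.
        prompt = f"{ex['corpus']}\n\nQuestion: {ex['query']}\nAnswer:"
        pred = generate(prompt)
        correct += pred.strip().lower() == ex["answer"].strip().lower()
    return correct / len(examples)

# Toy data and a stand-in "model" so the sketch runs end to end;
# swap generate for a real LLM client in practice.
examples = [{
    "corpus": "[doc 1] LOFT was released by Google DeepMind.\n"
              "[doc 2] The capital of France is Paris.",
    "query": "Who released LOFT?",
    "answer": "Google DeepMind",
}]
generate = lambda p: "Google DeepMind"  # placeholder model call
print(f"exact-match accuracy: {score(examples, generate):.2f}")  # 1.00
```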


    Pricing


    Open Source

    Freely available open-source benchmark for long-context LLM evaluation.

    • Full benchmark suite access
    • Standardized datasets
    • Evaluation scripts
    • Multi-task coverage
    • Reproducible research

    Capabilities

    Key Features

    • Long-context LLM evaluation
    • Multi-task benchmark suite
    • Retrieval and reasoning tasks
    • Multi-hop question answering
    • RAG pipeline stress-testing
    • In-context learning evaluation
    • Standardized datasets and evaluation scripts (see the metric sketch below)
    • Open-source and reproducible
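
    As an illustration of the kind of standardized metrics such a suite ships, exact match for answer tasks and recall over gold passage IDs for retrieval tasks might look like the following. The function names here are hypothetical, not LOFT's actual API.

```python
# Illustrative metric helpers of the kind a benchmark suite ships;
# the names are hypothetical, not LOFT's actual API.
def exact_match(pred: str, gold: str) -> bool:
    """Case- and whitespace-insensitive string equality."""
    return pred.strip().lower() == gold.strip().lower()

def retrieval_recall(pred_ids: list[str], gold_ids: list[str]) -> float:
    """Fraction of gold passage IDs that appear among the predictions."""
    if not gold_ids:
        return 0.0
    return sum(g in set(pred_ids) for g in gold_ids) / len(gold_ids)

assert exact_match(" Paris ", "paris")
assert retrieval_recall(["d3", "d7"], ["d7", "d9"]) == 0.5
```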

    Integrations

    Python
    HuggingFace
    Google DeepMind models
    API Available


    Developer

    Google DeepMind

    Google DeepMind is the artificial intelligence division of Alphabet Inc. that serves as the central engine behind Google's AI strategy. The organization was formed in April 2023 through the merger of two influential teams: Google Brain, the internal team that invented the Transformer architecture, and DeepMind, the London-based research lab acquired by Google in 2014 for approximately $500 million. The lab employs over 2,000 scientists and engineers across research centers in the United Kingdom, United States, Canada, France, Germany, and Switzerland. Google DeepMind develops the Gemini family of large language models alongside generative AI tools including Veo for video, Imagen for images, and Lyria for music. As of 2026, the Gemini App team operates under DeepMind, giving researchers direct control over both model development and product experience.

    Research Breakthroughs

    AlphaGo (2016) became the first program to defeat a world champion at Go, beating Lee Sedol 4-1 in a match watched by over 200 million viewers. AlphaFold solved the 50-year grand challenge of protein structure prediction, mapping over 200 million proteins and earning CEO Demis Hassabis and colleague John Jumper the 2024 Nobel Prize in Chemistry. Other notable projects include AlphaStar for StarCraft II mastery and AlphaGeometry for mathematical reasoning.

    2026 Strategic Focus

    Google DeepMind has shifted focus from generative AI toward three major initiatives:

    • Agentic AI — Building models with Deep Think reasoning capabilities that can independently plan and execute complex workflows, moving AI from a conversational tool to an autonomous partner.
    • Physical AI and Robotics — A partnership with Boston Dynamics integrates Gemini Robotics models into the Atlas humanoid robot. The hire of Aaron Saunders, former CTO of Boston Dynamics, as VP of Hardware Engineering signals expanded investment in bridging AI software with physical hardware.
    • AI for Science — The Genesis Mission, in partnership with the US Department of Energy, includes opening a fully automated materials science laboratory in the UK. AlphaGenome, targeting non-coding DNA mapping for genetic disease research, is expected in mid-2026.

    Leadership

    Sir Demis Hassabis serves as CEO. A former chess prodigy ranked second in the world for his age at 13, he later designed AI systems for video games including Theme Park before earning a PhD in cognitive neuroscience studying memory and imagination. He co-founded DeepMind in 2010 with Shane Legg and Mustafa Suleyman around a two-step mission: solve intelligence, then use it to solve everything else. Dr. Jeff Dean serves as Chief Scientist, overseeing technical infrastructure including Google's TPU supercomputers. He previously built foundational Google systems including MapReduce, BigTable, and Spanner.

    Founded 2010
    London, England
    $400M raised
    6,000 employees

    Used by

    Google (internal divisions: Android,…
    National Health Service (NHS) - Royal…
    Moorfields Eye Hospital
    University College London Hospital
    +9 more
    Website
    GitHub
    LinkedIn
    X / Twitter
    3 tools in directory

    Similar Tools


    SkillsBench

    An open-source evaluation framework that benchmarks how well AI agent skills work across diverse, expert-curated tasks in high-GDP-value domains.


    LLM Stats

    Public leaderboards and benchmark site that publishes verifiable evaluations, scores, and performance metrics for large language models and AI providers.


    SciArena

    Open evaluation platform from the Allen Institute for AI where researchers compare and rank foundation models on scientific literature tasks using head-to-head, literature-grounded responses.


    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    47 tools
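
    The "LLM-as-a-judge" pattern mentioned above, reduced to a minimal sketch. The rubric, the judge_model callable, and the one-word verdict format are illustrative assumptions, not any particular tool's API.

```python
# Minimal LLM-as-a-judge sketch: a second model grades a candidate
# answer against a reference. Rubric and verdict format are assumptions.
JUDGE_PROMPT = """You are grading an AI answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def judge(judge_model, question: str, reference: str, candidate: str) -> bool:
    """Ask a judge model to grade a candidate answer against a reference."""
    verdict = judge_model(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    return verdict.strip().upper().startswith("CORRECT")

# Stand-in judge that string-matches, so the sketch runs end to end;
# in practice judge_model would call a strong LLM.
fake_judge = lambda p: ("CORRECT" if "Paris" in p.split("Candidate answer:")[1]
                        else "INCORRECT")
print(judge(fake_judge, "Capital of France?", "Paris", "Paris"))  # True
```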

    Retrieval-Augmented Generation

    Systems that enhance LLM outputs by retrieving relevant information from external knowledge bases, combining the power of generative AI with information retrieval for more accurate and contextual responses.

    40 tools
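
    A bare-bones sketch of the retrieve-then-generate loop these systems implement. The keyword-overlap retriever is purely illustrative (production systems use dense embeddings and a vector index), and the llm callable is a placeholder.

```python
# Bare-bones retrieve-then-generate loop. The keyword-overlap retriever
# stands in for a real dense retriever; llm stands in for a model call.
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query, return top-k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query: str, docs: list[str], llm) -> str:
    """Prepend retrieved passages to the prompt before generating."""
    context = "\n".join(retrieve(query, docs))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = ["LOFT tests long-context retrieval.", "Paris is in France."]
# Placeholder "model" that echoes the top retrieved line:
print(answer("What does LOFT test?", docs, llm=lambda p: p.splitlines()[1]))
```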

    Academic Research

    AI tools designed specifically for academic and scientific research.

    27 tools