    Ragas

    LLM Evaluations

    Ragas is an open-source framework for evaluating and testing LLM applications, helping teams measure retrieval-augmented generation (RAG) pipeline quality with automated metrics.

At a Glance

Pricing: Open Source
Available On: API, SDK
Resources: Website · Docs · GitHub · llms.txt
Topics: LLM Evaluations · Retrieval-Augmented Generation · Observability Platforms
Listed: Mar 2026

    About Ragas

    Ragas is an open-source evaluation framework purpose-built for LLM applications, with a strong focus on retrieval-augmented generation (RAG) pipelines. It provides a suite of automated metrics that measure faithfulness, answer relevancy, context precision, and more — enabling teams to objectively assess and improve their AI systems. Ragas integrates with popular LLM frameworks and supports both unit-test-style evaluations and continuous monitoring in production. It is widely used by AI engineers and researchers who need reliable, reproducible quality signals for their LLM-powered products.

    • RAG Evaluation Metrics: Automatically score RAG pipelines on faithfulness, answer relevancy, context recall, context precision, and more using reference-free and reference-based metrics.
    • LLM-as-a-Judge: Leverage LLMs to evaluate generated outputs against ground truth or without reference, reducing the need for manual annotation.
    • Test Dataset Generation: Synthetically generate evaluation datasets from your documents to bootstrap testing without manual labeling (see the sketch after this list).
    • Integration with LLM Frameworks: Works seamlessly with LlamaIndex, LangChain, and other popular orchestration frameworks to evaluate pipelines end-to-end.
    • CI/CD-Ready Evaluations: Run evaluations as part of automated pipelines to catch regressions before they reach production.
    • Observability & Monitoring: Track evaluation metrics over time to monitor model and pipeline quality in production environments.
    • Customizable Metrics: Define and extend custom metrics tailored to your specific use case and domain requirements.
    • Open Source: Freely available on GitHub, with an active community and transparent development.
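
    As an illustration of the test dataset generation feature above: the sketch below assumes a recent Ragas release where TestsetGenerator is imported from ragas.testset and consumes LangChain documents. The model names, wrapper classes, and directory path are illustrative choices, not part of this listing, and the API has shifted between releases, so check the official docs for your installed version.

    ```python
    # Hedged sketch: the TestsetGenerator API has changed across Ragas releases;
    # this follows the LangChain-document path from recent versions.
    from langchain_community.document_loaders import DirectoryLoader
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from ragas.embeddings import LangchainEmbeddingsWrapper
    from ragas.llms import LangchainLLMWrapper
    from ragas.testset import TestsetGenerator

    # Load the source documents the synthetic questions will be grounded in.
    docs = DirectoryLoader("docs/", glob="**/*.md").load()

    # Wrap the generator LLM and embedding model Ragas should drive.
    generator = TestsetGenerator(
        llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
        embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
    )

    # Produce a small synthetic evaluation set of question/context/reference rows.
    testset = generator.generate_with_langchain_docs(docs, testset_size=10)
    print(testset.to_pandas().head())
    ```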

    To get started, install Ragas via pip, connect it to your LLM provider, and run evaluations on your RAG pipeline outputs using the built-in metric suite or your own custom metrics.
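
    A minimal sketch of that flow, assuming the classic evaluate() API and an OpenAI key in the environment; the sample record and printed scores are illustrative, and newer Ragas releases expose the same metrics through EvaluationDataset and metric classes:

    ```python
    # pip install ragas datasets
    # Hedged sketch of a basic Ragas evaluation (v0.1-style evaluate() API).
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import answer_relevancy, context_precision, faithfulness

    # One evaluation record: the user question, the contexts your retriever
    # returned, the generated answer, and an optional reference answer.
    data = {
        "question": ["What does Ragas measure?"],
        "contexts": [[
            "Ragas scores RAG pipelines on faithfulness, answer relevancy, "
            "and context quality using automated metrics."
        ]],
        "answer": ["Ragas measures faithfulness, answer relevancy, and context precision."],
        "ground_truth": ["Ragas provides automated quality metrics for RAG pipelines."],
    }

    # evaluate() calls the configured judge LLM (OpenAI by default, via
    # OPENAI_API_KEY) to score each sample on the selected metrics.
    results = evaluate(
        Dataset.from_dict(data),
        metrics=[faithfulness, answer_relevancy, context_precision],
    )
    print(results)  # e.g. {'faithfulness': 1.00, 'answer_relevancy': 0.97, ...}
    ```

    The same evaluate() call can run inside a CI job, which is how the CI/CD-ready evaluations listed above are typically wired up.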

    Pricing

    Open Source

    Fully open-source framework available via pip with all core evaluation metrics and features.

    • RAG evaluation metrics
    • LLM-as-a-Judge
    • Synthetic dataset generation
    • LangChain & LlamaIndex integration
    • Custom metrics

    Capabilities

    Key Features

    • RAG pipeline evaluation
    • LLM-as-a-Judge scoring
    • Synthetic test dataset generation
    • Faithfulness metric
    • Answer relevancy metric
    • Context precision and recall metrics
    • CI/CD integration
    • Production monitoring
    • Custom metric support
    • LangChain integration
    • LlamaIndex integration

    Integrations

    LlamaIndex
    LangChain
    OpenAI
    Hugging Face
    AWS Bedrock
    Azure OpenAI

    Developer

    Ragas Team

    Ragas builds open-source evaluation tooling for LLM applications, with a focus on RAG pipelines. The project provides automated metrics and testing frameworks that help AI engineers measure and improve the quality of their language model systems. Ragas integrates with leading LLM orchestration frameworks and supports both offline evaluation and production monitoring.

    Website · GitHub · X / Twitter
    1 tool in directory

    Similar Tools

    DeepEval

    DeepEval is an open-source LLM evaluation framework that enables developers to build reliable evaluation pipelines and test any AI system with 50+ research-backed metrics.

    Opik

    Open-source platform for evaluating, testing, and monitoring LLM applications with tracing and observability features.

    Agenta

    Open-source LLMOps platform for prompt management, evaluation, and observability for developer and product teams.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    39 tools

    Retrieval-Augmented Generation

    Systems that enhance LLM outputs by retrieving relevant information from external knowledge bases, combining generative AI with information retrieval for more accurate and contextual responses.

    35 tools

    Observability Platforms

    Comprehensive platforms that combine metrics, logs, and traces with AI-powered analytics to provide deep insights into complex distributed systems and application behavior.

    42 tools