# DeepEval

> DeepEval is an open-source LLM evaluation framework that enables developers to build reliable evaluation pipelines and test any AI system with 50+ research-backed metrics.

DeepEval is a comprehensive LLM evaluation framework used by leading AI companies including OpenAI, Google, Adobe, and Walmart. It provides a native Pytest integration that fits directly into CI/CD workflows, enabling unit testing for LLMs with over 50 research-backed metrics. The framework supports single- and multi-turn evaluations, multi-modal test cases (text, images, audio), synthetic data generation, and automatic prompt optimization.

- **Unit-Testing for LLMs** — *Install via `pip install deepeval` and integrate natively with Pytest to run evaluations in your CI/CD pipeline (see the Pytest sketch under Examples below).*
- **LLM-as-a-Judge Metrics** — *Access 50+ research-backed metrics, including G-Eval (chain-of-thought criteria scoring), DAG (directed acyclic graph for multi-step scoring), and QAG (question-answer generation scoring).*
- **Single and Multi-Turn Evaluations** — *Evaluate any use case and system architecture, including multi-turn conversational agents.*
- **Native Multi-Modal Support** — *Evaluate text, images, and audio with built-in multi-modal test cases.*
- **Synthetic Data Generation** — *Generate synthetic test datasets and simulate conversations when no test data is available (see the synthesizer sketch under Examples below).*
- **Auto-Optimize Prompts** — *Automatically optimize prompts without manual tweaking using DeepEval's built-in prompt optimization.*
- **Confident AI Cloud Platform** — *Use DeepEval on Confident AI for team-wide collaborative AI testing, regression testing, dataset management, observability, tracing, online monitoring, and human annotations.*
- **Wide Framework Integrations** — *Integrates with OpenAI, LangChain, LlamaIndex, LangGraph, Pydantic AI, CrewAI, Anthropic, and OpenAI Agents.*

## Features

- 50+ research-backed LLM evaluation metrics
- G-Eval chain-of-thought scoring
- DAG (directed acyclic graph) evaluation
- QAG (question-answer generation) scoring
- Native Pytest integration
- CI/CD pipeline support
- Single- and multi-turn evaluations
- Multi-modal test cases (text, images, audio)
- Synthetic data generation
- Conversation simulation
- Automatic prompt optimization
- LLM-as-a-Judge
- Regression testing
- Dataset management
- Observability and tracing
- Online monitoring
- Human annotations

## Integrations

OpenAI, LangChain, LlamaIndex, LangGraph, Pydantic AI, CrewAI, Anthropic, OpenAI Agents, Pytest, Confident AI

## Platforms

Web, API, Developer SDK

## Pricing

Open source; free tier available

## Links

- Website: https://deepeval.com
- Documentation: https://deepeval.com/docs/getting-started
- Repository: https://github.com/confident-ai/deepeval
- EveryDev.ai: https://www.everydev.ai/tools/deepeval
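## Examples

A minimal Pytest-style unit test using G-Eval, following the quickstart pattern in DeepEval's documentation. The judge model defaults to OpenAI, so an `OPENAI_API_KEY` must be set; the question and criteria below are illustrative, and exact class signatures may vary across versions.

```python
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams


def test_correctness():
    # A single-turn test case: the model's actual output vs. a reference answer.
    test_case = LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris is the capital of France.",
        expected_output="Paris",
    )
    # G-Eval scores the output against natural-language criteria
    # using chain-of-thought LLM-as-a-judge scoring.
    correctness = GEval(
        name="Correctness",
        criteria="Check whether the actual output agrees factually with the expected output.",
        evaluation_params=[
            LLMTestCaseParams.ACTUAL_OUTPUT,
            LLMTestCaseParams.EXPECTED_OUTPUT,
        ],
        threshold=0.7,  # the test fails if the judge scores below 0.7
    )
    assert_test(test_case, [correctness])
```

Run it with `deepeval test run test_correctness.py` (or plain `pytest`) to surface pass/fail results in CI.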
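For bulk scoring outside a test runner, the same metrics can be passed to the `evaluate()` entry point. A sketch using the built-in `AnswerRelevancyMetric`; the test case content is illustrative.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Score a batch of test cases in one call instead of one Pytest test each.
test_cases = [
    LLMTestCase(
        input="How do I reset my password?",
        actual_output="Go to Settings > Security and click 'Reset password'.",
    ),
]
evaluate(test_cases=test_cases, metrics=[AnswerRelevancyMetric(threshold=0.5)])
```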
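When no test data exists, goldens (input/expected-output pairs) can be generated from source documents. This sketch assumes the `Synthesizer` API with a `generate_goldens_from_docs` method as described in DeepEval's docs; the file path is a placeholder and parameters may differ by version.

```python
from deepeval.synthesizer import Synthesizer

# Generate goldens from your own documents to bootstrap an evaluation dataset.
synthesizer = Synthesizer()
synthesizer.generate_goldens_from_docs(
    document_paths=["knowledge_base/handbook.pdf"],  # placeholder path
)
# Generated goldens are collected on the synthesizer instance.
print(synthesizer.synthetic_goldens)
```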