    Arize AI

    LLM Evaluations

    Arize AI is an enterprise AI and agent engineering platform for development, observability, and evaluation of LLM applications, AI agents, and ML models in production.

    At a Glance

    Pricing
    Free tier available
    Phoenix OSS: free, fully open-source, self-hosted observability and evaluation tool for LLM applications.
    AX Pro: $50/mo
    AX Enterprise: Custom (contact sales)

    Available On

    Windows
    iOS
    Web
    API
    SDK

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    LLM Evaluations, Observability Platforms, Agent Frameworks

    Alternatives

    Maxim, AgentOps, LangWatch
    Developer
    Arize AI; Berkeley, CA; Est. 2019; $131M raised

    Updated May 2026

    About Arize AI

    Arize AI builds the Arize AX platform — a unified environment for developing, evaluating, and observing AI agents and LLM applications in production. Founded by Jason Lopatecki and Aparna Dhinakaran, the company is backed by Battery Ventures, Foundation Capital, TCV, Microsoft Ventures, and others. The platform is available as a cloud SaaS offering and as a self-hosted option, with an open-source counterpart called Phoenix.

    What It Is

    Arize AX is an AI and agent engineering platform that closes the loop between development and production. It covers three core pillars: development tooling (prompt management, optimization, and playground), evaluation (LLM-as-a-Judge, CI/CD experiments, human annotation), and observability (OpenTelemetry-based tracing, online evals, monitoring dashboards). The platform also includes Alyx, an AI engineering agent that helps teams debug faster and build with greater confidence. A purpose-built datastore called adb underpins the platform, designed for real-time ingestion, sub-second queries, and elastic compute at petabyte scale.

    Core Capabilities

    • Open Standard Tracing: Agent and framework tracing powered by OpenTelemetry (OTEL) and the OpenInference conventions, making it vendor-, framework-, and language-agnostic.
    • LLM as a Judge: Automated evaluation of prompts and agent actions at scale using LLM-based evaluators; the evaluators themselves are open-source.
    • CI/CD Experiments: Regression detection for prompts and agents integrated into development pipelines.
    • Human Annotation & Queues: Labeling queues, production annotations, and golden dataset creation in one place.
    • Prompt Hub: Prompt serving, management, environment tags, multi-prompt comparison, and automatic optimization.
    • ML & Computer Vision Observability: Model drift detection, embedding monitoring, heatmap-based failure analysis, and data curation for traditional ML and CV models.
    • Alyx: A context-aware AI agent that assists with span/trace debugging, dashboard creation, prompt optimization, and AI trace search.
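    The LLM-as-a-Judge capability listed above can be illustrated with a minimal, library-free sketch. It is not Arize's or Phoenix's evaluator API: the prompt template, the label set, and the `judge` and `fake_llm` names are all hypothetical, and `fake_llm` is an offline stand-in for a real model call.

```python
from typing import Callable

# Hypothetical grading prompt; real evaluators use more carefully tuned templates.
JUDGE_TEMPLATE = (
    "You are grading an answer for factual support.\n"
    "Question: {question}\n"
    "Reference: {reference}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: 'factual' or 'hallucinated'."
)

ALLOWED_LABELS = ("factual", "hallucinated")


def judge(question: str, reference: str, answer: str,
          llm: Callable[[str], str]) -> str:
    """Ask a judge model for a label and normalize its reply."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, answer=answer
    )
    reply = llm(prompt).strip().lower()
    # Anything outside the allowed label set is reported as unparseable
    # instead of guessing which label the model meant.
    return reply if reply in ALLOWED_LABELS else "unparseable"


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model call (assumption, not a real client)."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    # Crude rule: the answer is "supported" if it appears in the reference.
    supported = fields["Answer"].lower() in fields["Reference"].lower()
    return "factual" if supported else "hallucinated"


question = "Where is Arize AI based?"
reference = "Arize AI is based in Berkeley, CA."
for answer in ("Berkeley, CA", "Paris"):
    print(answer, "->", judge(question, reference, answer, fake_llm))
```

    Constraining the judge to a closed label set keeps the results easy to aggregate, and the explicit "unparseable" bucket makes malformed judge replies visible instead of silently counted.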

    Open-Source Foundation

    Arize publishes Phoenix, a fully open-source, self-hostable observability and evaluation tool for LLM applications, available on GitHub. The evaluation libraries and eval models are also open-source. The platform is built on OpenTelemetry and uses standard data file formats to avoid data lock-in. According to Arize, Phoenix reaches approximately 5 million downloads per month.
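    To make the OpenTelemetry-style tracing model concrete, here is a toy sketch of how nested spans share a trace ID and point at their parent. It deliberately uses only the Python standard library and is not the OpenTelemetry SDK or the Phoenix client; the `span` helper, the in-memory `FINISHED_SPANS` list, and the attribute names are illustrative assumptions, not OpenInference conventions.

```python
import json
import time
import uuid
from contextlib import contextmanager

FINISHED_SPANS = []  # in-memory stand-in for an exporter
_stack = []          # currently open spans, innermost last


@contextmanager
def span(name, **attributes):
    """Record one timed unit of work and link it to its parent span."""
    parent = _stack[-1] if _stack else None
    record = {
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "attributes": dict(attributes),
        "start_ns": time.time_ns(),
    }
    _stack.append(record)
    try:
        yield record
    finally:
        record["end_ns"] = time.time_ns()
        _stack.pop()
        FINISHED_SPANS.append(record)


with span("agent.run", user_query="refund policy?") as root:
    with span("retriever.search", top_k=3):
        pass
    with span("llm.call", model="example-model") as llm_span:
        llm_span["attributes"]["total_tokens"] = 42

# Spans are plain dictionaries, so they serialize to a standard format.
print(json.dumps(FINISHED_SPANS, indent=2))
```

    Spans finish innermost first, so the root span is exported last; a backend reassembles the tree from the `trace_id` and `parent_id` fields.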

    Platform Architecture and Deployment

    Arize AX is available as a multi-tenant SaaS with data region options (US, EU, or CA) and as a self-hosted enterprise deployment supporting data residency and multi-region configurations. The underlying adb datastore is purpose-built for generative AI workloads, handling real-time ingestion and sub-second analytical queries. The enterprise tier adds SOC2 Type II, HIPAA compliance, SSO enforcement, space-level RBAC, audit logs, and uptime SLAs.

    Scale and Adoption Signals

    Arize's homepage states the platform has processed 1 trillion spans and runs 50 million evaluations per month. The vendor's customer page lists DoorDash, Instacart, Reddit, Uber, Booking.com, PagerDuty, Siemens, PepsiCo, TripAdvisor, and other organizations as customers. The Defense Innovation Unit page notes that the U.S. Navy awarded prototype agreements to Arize AI as part of Project AMMO for ML-based maritime target recognition.

    Why It Matters for AI Teams

    The platform addresses a core challenge in production AI: without visibility into agent accuracy, hallucinations, and model drift, teams cannot reliably improve or trust their systems. By integrating tracing, evaluation, and prompt management in one place, and by connecting production data back to the development loop, Arize AX enables what the company describes as a "data-driven iteration cycle." Alyx, the in-platform AI agent, and the open-source Phoenix project, a free self-hosted entry point, together position Arize to serve both individual developers and large enterprise engineering teams.

    Pricing

    Phoenix OSS

    Fully open-source, self-hosted observability and evaluation tool for LLM applications.

    Free

    • Self-hosted deployment
    • User-managed trace spans
    • User-managed ingestion volume
    • User-managed projects
    • User-managed retention

    AX Free

    For individuals and startups. SaaS deployment with limited spans and retention.

    Free

    • 25k trace spans per month
    • 1 GB ingestion per month
    • 15-day retention
    • Alyx (Arize agent)
    • Online evals

    AX Pro

    For small teams and startups. Higher rate limits, longer retention, and email support.

    $50 per month
    • 50k trace spans per month
    • 10 GB ingestion per month
    • 30-day retention
    • Higher rate limits
    • Longer retention
    • Email support
    • Everything in AX Free

    AX Enterprise

    For enterprise teams. Custom spans, ingestion, retention, SLA, compliance, and self-hosting options.

    Custom (contact sales)
    • Custom trace spans
    • Custom ingestion volume
    • Configurable retention
    • Dedicated support
    • Uptime SLA
    • SOC2 Type II and HIPAA compliance
    • Training sessions
    • adb Data Fabric
    • Enterprise SSO (Okta, AzureAD/EntraID)
    • SSO enforcement
    • Space-level RBAC
    • Audit logs
    • Data residency (self-hosting add-on)
    • Multi-region deployments (self-hosting add-on)
    • Everything in AX Pro
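    As a rough aid for reading the tiers above, the sketch below picks the smallest SaaS plan whose listed limits cover a given monthly workload. The limits are copied from the plan lists on this page; the `smallest_plan` function and its fall-through to AX Enterprise are assumptions for illustration, and self-hosted Phoenix OSS is left out because its limits are user-managed.

```python
# Limits as listed on this page; verify against official pricing before relying on them.
PLANS = [
    {"name": "AX Free", "spans": 25_000, "ingest_gb": 1, "retention_days": 15},
    {"name": "AX Pro", "spans": 50_000, "ingest_gb": 10, "retention_days": 30},
]


def smallest_plan(spans_per_month, ingest_gb_per_month, retention_days):
    """Return the first listed plan whose limits cover the workload."""
    for plan in PLANS:
        fits = (
            spans_per_month <= plan["spans"]
            and ingest_gb_per_month <= plan["ingest_gb"]
            and retention_days <= plan["retention_days"]
        )
        if fits:
            return plan["name"]
    # Anything beyond the fixed tiers needs custom limits.
    return "AX Enterprise"


print(smallest_plan(20_000, 0.5, 7))
print(smallest_plan(40_000, 2, 30))
print(smallest_plan(200_000, 5, 30))
```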

    Capabilities

    Key Features

    • LLM and agent tracing with OpenTelemetry
    • LLM-as-a-Judge automated evaluation
    • CI/CD experiment integration for regression detection
    • Human annotation queues and golden dataset creation
    • Prompt management, serving, and optimization
    • Prompt playground with trace replay
    • Online evaluations (LLM judge, code evals, session evals, agent path evals)
    • Real-time monitoring and custom dashboards
    • ML model drift detection and embedding monitoring
    • Computer vision observability
    • Alyx AI engineering agent for debugging and optimization
    • adb purpose-built datastore for generative AI workloads
    • Multi-agent tracing graphs
    • Token and cost tracking
    • SOC2 Type II and HIPAA compliance (Enterprise)
    • SSO and RBAC (Enterprise)
    • Self-hosting with data residency options
    • Open-source Phoenix OSS
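    The CI/CD regression-detection feature listed above boils down to comparing evaluation scores for a candidate prompt or agent against a baseline and failing the pipeline when quality drops. The following is a generic sketch of that idea, not Arize's experiments API; the `regression_check` function, the `max_drop` threshold, and the sample scores are hypothetical.

```python
from statistics import mean


def regression_check(baseline, candidate, max_drop=0.02):
    """Compare mean eval scores per metric; return (passed, report)."""
    report = {}
    passed = True
    for metric, base_scores in baseline.items():
        base = mean(base_scores)
        cand = mean(candidate[metric])
        # A metric regresses when its mean falls by more than max_drop.
        ok = (base - cand) <= max_drop
        report[metric] = {"baseline": base, "candidate": cand, "ok": ok}
        passed = passed and ok
    return passed, report


# Made-up per-example scores (1 = pass, 0 = fail) for two eval metrics.
baseline = {"correctness": [1, 1, 0, 1], "relevance": [1, 1, 1, 1]}
candidate = {"correctness": [1, 1, 1, 1], "relevance": [1, 0, 1, 1]}

passed, report = regression_check(baseline, candidate)
for metric, row in report.items():
    print(metric, row)
print("gate passed:", passed)
```

    In a real pipeline the final step would typically turn `passed` into the process exit code so that a regression blocks the merge.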

    Integrations

    OpenTelemetry
    LangChain
    LlamaIndex
    OpenAI
    Anthropic
    Cohere
    AWS
    Azure
    Google Cloud
    Okta
    Azure AD / Entra ID
    GitHub SSO
    Google SSO
    Python SDK
    JavaScript SDK
    API Available

    Developer

    Arize AI Team

    Founded 2019
    Berkeley, CA
    $131M raised
    120 employees

    Used by

    Uber
    Klaviyo
    PepsiCo
    Adobe

    Similar Tools

    Maxim

    Enterprise-grade AI evaluation and observability platform for testing, monitoring, and improving AI agents and LLM applications.

    AgentOps

    AgentOps is a developer platform for tracing, debugging, and deploying reliable AI agents and LLM apps with observability across 400+ LLMs and frameworks.

    LangWatch

    LangWatch is a developer-first platform for testing, evaluating, and monitoring AI agents and LLM applications, with agent simulations, real-time evals, and LLM observability.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    74 tools

    Observability Platforms

    Comprehensive platforms that combine metrics, logs, and traces with AI-powered analytics to provide deep insights into complex distributed systems and application behavior.

    77 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    307 tools