LM Arena
To build the world's most trusted AI evaluation platform that measures AI reliability through real-world human preferences, serving as the voice of humans shaping and improving AI to ensure responsible deployment.
At a Glance
- AI developers and researchers
- Model providers (OpenAI, Google, Anthropic, Meta, xAI, etc.)
- Enterprises implementing AI systems
- Software developers and engineers
AI Tools by LM Arena
LM Arena
LLM Evaluation and Deployment Platform
Latest News
Launch of Chatbot Arena research project at UC Berkeley
Claude 3 surpasses GPT-4 for the first time on Chatbot Arena
Chatbot Arena rebrands as LMArena and becomes a formal company (Arena Intelligence Inc.)
LMArena Secures $100M in Seed Funding at $600M valuation
Products & Services
Side-by-side blind model comparisons in which users vote on responses; the crowdsourced pairwise votes are fit with the Bradley-Terry model to produce Elo-style rankings.
AI coding competition for web-development challenges; powered by the Code Arena experience as of November 12, 2025.
VSCode extension for benchmarking AI coding assistants on real-world code completion with paired autocomplete and in-line editing features.
Benchmarking environment for AI software engineers working with real-world GitHub codebases.
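The pairwise-voting approach described above can be sketched in code. The following is an illustrative implementation of a Bradley-Terry fit (using Hunter's classic minorization-maximization update) mapped onto an Elo-style scale; the function names, toy vote counts, and the 400-point log scale are assumptions for the sketch, not LMArena's actual ranking code.

```python
import math

def bradley_terry(win_counts, iters=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    win_counts[(i, j)] = number of times model i beat model j.
    Uses the minorization-maximization update: each model's strength is
    its total wins divided by the sum over opponents of
    (games played against opponent) / (own strength + opponent strength).
    Returns positive strengths normalized to sum to 1.
    """
    models = sorted({m for pair in win_counts for m in pair})
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            wins_i = sum(w for (a, b), w in win_counts.items() if a == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                n_ij = win_counts.get((i, j), 0) + win_counts.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p[i] = wins_i / denom if denom else p[i]
        total = sum(new_p.values())
        p = {m: v / total for m, v in new_p.items()}
    return p

def to_elo(p, base=1000.0, scale=400.0):
    """Map Bradley-Terry strengths onto an Elo-style rating scale."""
    return {m: base + scale * math.log10(v) for m, v in p.items()}

# Toy example: hypothetical vote tallies between three anonymous models.
votes = {
    ("A", "B"): 70, ("B", "A"): 30,
    ("B", "C"): 60, ("C", "B"): 40,
    ("A", "C"): 80, ("C", "A"): 20,
}
ratings = to_elo(bradley_terry(votes))  # A ranks above B, B above C
```

Under the Bradley-Terry model, the fitted strengths have a direct interpretation: the predicted probability that model i beats model j is p[i] / (p[i] + p[j]), so the leaderboard ordering follows from the fitted strengths rather than from raw win percentages alone.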
Market Position
LMArena is positioned as a neutral, science-driven alternative to static academic benchmarks, aiming to be the gold standard for evaluating real-world model performance through rigorous science and human judgment. It differentiates itself from Scale AI (expert-driven, private services) and Hugging Face (automated, objective benchmarks) by providing crowdsourced human preference signals drawn from actual user interactions. The platform is best known for its crowdsourced AI leaderboards, which have become an industry standard for model makers. It serves as a transparent, reproducible, community-driven infrastructure layer that evaluates models on real-world prompts rather than through proprietary or closed testing.
Leadership
Founders
Anastasios N. Angelopoulos
CEO. PhD from UC Berkeley with expertise in trustworthy AI systems, black-box decision-making, and medical machine learning. Former researcher at Google DeepMind. UC Berkeley postdoc and researcher at Sky Computing Lab.
Wei-Lin Chiang
CTO. Studied distributed systems and deep learning frameworks at UC Berkeley SkyLab. Former research experience at Google Research, Amazon, and Microsoft. UC Berkeley postdoc.
Ion Stoica
Co-founder and Advisor. UC Berkeley professor and serial founder of Databricks, Anyscale, and Conviva. Advisor to the founding team at Berkeley Sky Computing Lab.
Executive Team
Anastasios N. Angelopoulos
Co-Founder and CEO
PhD from UC Berkeley with expertise in trustworthy AI systems, black-box decision-making, and medical machine learning. Former researcher at Google DeepMind.
Wei-Lin Chiang
Co-Founder and CTO
Studied distributed systems and deep learning frameworks at UC Berkeley SkyLab. Former research experience at Google Research, Amazon, and Microsoft.
Founding Story
LMArena began in early 2023 as Chatbot Arena, a scrappy academic side project by two UC Berkeley Ph.D. roommates, Anastasios Angelopoulos and Wei-Lin Chiang, at the Berkeley Sky Computing Lab. Originally built to test their own open-source large language model, Vicuna, it addressed the industry-wide challenge that technical benchmarks often fail to reflect real-world user experience. The platform was created as a blind taste test for AI models, determining which ones provide the best user experience on tasks like coding, content creation, and conversation. Within one week of launch, the site received 4,700 votes. By December 2023, the Wall Street Journal described it as the AI industry's obsession. In April 2025, the founders announced that Chatbot Arena had become a startup called LMArena (Arena Intelligence Inc.), and in May 2025 the company raised $100 million in seed funding.
Business Model
Revenue Model
Freemium model with commercial evaluation services. The core platform is free to the public, which generates the crowdsourced preference data. Revenue comes from charging model providers and enterprise clients for private arenas, evaluation tooling, analytics dashboards, API/SDK access, and premium support for custom assessments, with annual recurring revenue (ARR) driven by consumption-based commercial model evaluations.
Pricing Tiers
Free: Open access to compare AI models, vote on responses, and view public leaderboards. Core participation is open to the public at no cost.
Commercial: Service for enterprises, model labs, and developers. Includes private arenas for proprietary datasets, evaluation tooling, analytics dashboards, diagnostic reports, API/SDK access, and premium support for custom assessments. Pay-as-you-go, consumption-based pricing.
Target Markets
- AI developers and researchers
- Model providers (OpenAI, Google, Anthropic, Meta, xAI, etc.)
- Enterprises implementing AI systems
- Software developers and engineers
- Web developers
- Content creators and designers
Use Cases
- Model benchmarking and performance comparison
- Specialized task evaluation (coding, math, creative writing)
- Product development and model selection
- Search and grounding evaluation
- Media editing and generation assessment
- Enterprise model evaluation on proprietary data
Key Model Providers
- OpenAI
- Google DeepMind
- Anthropic
- Meta