SciArena
Open evaluation platform from the Allen Institute for AI where researchers compare and rank foundation models on scientific literature tasks using head-to-head, literature-grounded responses.
At a Glance
Pricing
Free access to core SciArena search, summarization, and conversational features.
About SciArena
SciArena is an open evaluation platform from the Allen Institute for AI (Ai2) for benchmarking foundation models on scientific literature tasks. Instead of relying on static benchmarks, SciArena collects head-to-head comparisons from human researchers: users submit research questions, see side-by-side, literature-grounded answers from two models, and vote for the better response. These votes drive a public leaderboard and power SciArena-Eval, a meta-evaluation benchmark for testing LLM-as-judge systems.
- Arena-style model comparison — Submit scientific questions, inspect long-form, citation-attributed answers from two foundation models, and cast a vote for the preferred output.
- Leaderboard with Elo-style ratings — Track how models like o3, Claude, Gemini, and DeepSeek rank overall and by scientific discipline using an Elo-style rating system.
- SciArena-Eval benchmark — Use the released human preference data and code to study automated evaluators, LLM-as-judge setups, and model alignment with expert judgments.
- Literature-grounded retrieval — Behind the scenes, SciArena uses a multi-stage retrieval pipeline over the Semantic Scholar corpus to ground answers in relevant, up-to-date papers.
- Research-grade data quality controls — Expert annotators, training, blind ratings, and agreement checks help ensure the preference data is reliable enough for serious evaluation work.
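The path from pairwise votes to a ranked leaderboard can be sketched as follows. This is a minimal illustration, assuming a classic online Elo update with a starting rating of 1000 and a K-factor of 32. The listing only says the ratings are "Elo-style", so these constants, the function names, and the model labels are hypothetical and should not be read as SciArena's actual implementation. The judge_agreement helper shows one simple way an automated judge could be scored against human votes, in the spirit of SciArena-Eval; it is likewise an assumption.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Apply one head-to-head vote to the ratings table, in place."""
    expected_win = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - expected_win)  # larger gain for an unexpected win
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_models(votes, initial=1000.0, k=32.0):
    """Replay (winner, loser) votes in order and return models sorted by rating.

    `initial` and `k` are illustrative defaults, not values from SciArena.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


def judge_agreement(human_votes, judge_votes):
    """Fraction of comparisons where an automated judge picks the human's choice."""
    if not human_votes or len(human_votes) != len(judge_votes):
        raise ValueError("vote lists must be non-empty and the same length")
    matches = sum(h == j for h, j in zip(human_votes, judge_votes))
    return matches / len(human_votes)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (preferred model, other model).
    votes = [
        ("model_a", "model_b"),
        ("model_a", "model_c"),
        ("model_c", "model_b"),
    ]
    for name, rating in rank_models(votes):
        print(f"{name}: {rating:.1f}")
```

With these assumed defaults, a single vote between two equally rated models moves the winner up by 16 points and the loser down by 16, so the total rating mass stays constant. An online update like this depends on the order in which votes are replayed; a batch fit over all votes is a common alternative when order should not matter.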
Pricing
Free Plan Available
- Core semantic search
- AI-generated summaries
- Conversational Q&A
- Basic filters and citation export
Capabilities
Key Features
- Semantic search across scientific literature
- AI-generated paper summaries
- Conversational Q&A over papers
- Filters for date/venue/author and citation export
