EveryDev.ai

    IsItNerfed?

    LLM Evaluations

    Continuous LLM evaluation platform that tracks AI model performance over time through community voting and automated coding task metrics.

    At a Glance

    Pricing
    Free

    Free access to LLM performance tracking and community voting

    Available On

    Web

    Resources

    Website, llms.txt

    Topics

    LLM Evaluations, AI Coding Assistants, Performance Metrics

    Alternatives

    Confident AI, Toolathlon, Artificial Analysis
    Developer
    IsItNerfed?, Saint Johns, FL, Est. 2025

    Listed Feb 2026

    About IsItNerfed?

    IsItNerfed? is a continuous LLM evaluation platform that helps developers and AI users track whether large language models are performing better or worse over time. The platform combines community-driven "vibe checks" with automated metrics to provide real-time insights into AI model performance changes.

    The platform addresses a common concern in the AI community: whether LLM providers silently degrade or "nerf" their models. By aggregating user feedback and running standardized coding tasks, IsItNerfed? provides transparency into model performance fluctuations.

    • Vibe Check System allows users to vote on whether specific AI agents feel "Smarter," "Same," or "Nerfed" compared to previous experiences, with real-time aggregation of community sentiment over 24-hour periods.

    • AI Agent Tracking monitors popular coding assistants including Claude Code, Codex CLI, and Gemini CLI, displaying hourly and daily performance indicators based on user votes.

    • Metrics Check continuously runs standardized coding tasks against LLMs to objectively measure failure rates over time; a lower failure rate indicates better performance.

    • Historical Charts powered by TradingView display failure rate trends over 7-day and 30-day periods for models like Claude Code (Sonnet 4.5), Claude Code (Sonnet 4), and GPT-4.1.

    • Model-Specific Tracking provides separate performance metrics for different model versions, allowing users to compare how specific models perform on coding tasks.
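
    The failure-rate idea behind Metrics Check and the 7-day and 30-day charts can be illustrated with a small sketch. This is not IsItNerfed?'s actual implementation: the `TaskRun` record, the `failure_rate` function, and the model names are hypothetical, and the sketch assumes the rate is simply failed runs divided by total runs within a time window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaskRun:
    """One automated coding-task run (hypothetical record shape)."""
    model: str
    finished_at: datetime
    passed: bool


def failure_rate(
    runs: Iterable[TaskRun],
    model: str,
    now: datetime,
    window: timedelta,
) -> Optional[float]:
    """Return the share of failed runs for `model` in (now - window, now].

    Returns None when the model has no runs in the window, so that
    "no data" is not confused with a perfect score of 0.0.
    """
    cutoff = now - window
    relevant = [
        run for run in runs
        if run.model == model and cutoff < run.finished_at <= now
    ]
    if not relevant:
        return None
    failures = sum(1 for run in relevant if not run.passed)
    return failures / len(relevant)


# Made-up sample data for two placeholder models.
now = datetime(2026, 2, 1, 12, 0)
runs = [
    TaskRun("model-a", now - timedelta(days=1), True),
    TaskRun("model-a", now - timedelta(days=2), False),
    TaskRun("model-a", now - timedelta(days=20), False),
    TaskRun("model-a", now - timedelta(days=25), False),
    TaskRun("model-b", now - timedelta(days=1), True),
]

week_rate = failure_rate(runs, "model-a", now, timedelta(days=7))
month_rate = failure_rate(runs, "model-a", now, timedelta(days=30))
print(week_rate, month_rate)  # lower is better
```

    In this made-up data the 7-day rate is lower than the 30-day rate, which is the kind of gap a trend chart would surface as a recent improvement.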

    To get started, visit the website and vote in a vibe check on how AI agents are performing for you, then open the metrics dashboard to see failure rate data and historical trends. Basic usage requires no account and gives immediate access to community sentiment and performance data.
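
    The 24-hour vote aggregation behind the vibe check could look roughly like the sketch below. It is an assumption-laden illustration rather than the platform's code: the `tally_votes` function and the `(agent, choice, cast_at)` tuple layout are hypothetical, and only the three vote labels and the 24-hour window come from the description above.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# The three vote labels named in the Vibe Check description.
VOTE_OPTIONS = ("Smarter", "Same", "Nerfed")

# Hypothetical vote shape: (agent name, chosen label, time the vote was cast).
Vote = Tuple[str, str, datetime]


def tally_votes(
    votes: Iterable[Vote],
    agent: str,
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, float]:
    """Return each label's share of the votes for `agent` in the window.

    Votes outside (now - window, now], votes for other agents, and
    unknown labels are ignored. With no votes, every share is 0.0.
    """
    cutoff = now - window
    counts: Counter = Counter()
    for voted_agent, choice, cast_at in votes:
        in_window = cutoff < cast_at <= now
        if voted_agent == agent and choice in VOTE_OPTIONS and in_window:
            counts[choice] += 1
    total = sum(counts.values())
    return {
        option: (counts[option] / total if total else 0.0)
        for option in VOTE_OPTIONS
    }


# Made-up votes: four recent ones for Claude Code, one stale, one for another agent.
now = datetime(2026, 2, 1, 12, 0)
votes = [
    ("Claude Code", "Nerfed", now - timedelta(hours=1)),
    ("Claude Code", "Nerfed", now - timedelta(hours=5)),
    ("Claude Code", "Same", now - timedelta(hours=2)),
    ("Claude Code", "Smarter", now - timedelta(hours=23)),
    ("Claude Code", "Nerfed", now - timedelta(hours=30)),  # outside the window
    ("Codex CLI", "Nerfed", now - timedelta(hours=1)),     # different agent
]

shares = tally_votes(votes, "Claude Code", now)
print(shares)
```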

    Pricing

    Free

    Free access to LLM performance tracking and community voting

    • Vibe check voting
    • View AI agent performance indicators
    • Access metrics check data
    • Historical performance charts
    • Community sentiment tracking

    Capabilities

    Key Features

    • Community vibe check voting system
    • Real-time LLM performance indicators
    • Automated coding task failure rate tracking
    • Historical performance charts
    • AI agent monitoring (Claude Code, Codex CLI, Gemini CLI)
    • Model-specific performance metrics
    • 24-hour trend visualization
    • TradingView-powered charting

    Integrations

    TradingView

    Developer

    IsItNerfed? Team

    IsItNerfed? builds a continuous LLM evaluation platform that tracks AI model performance through community voting and automated coding task metrics. The team provides transparency into whether AI models are being silently degraded by aggregating user feedback and running standardized benchmarks.

    Founded 2025
    Saint Johns, FL
    2 employees

    Used by

    Individual developers and AI teams…
    1 tool in directory

    Similar Tools

    Confident AI

    End-to-end platform for LLM evaluation and observability that benchmarks, tests, monitors, and traces LLM applications to prevent regressions and optimize performance.

    Toolathlon

    Toolathlon is an open-source benchmark for evaluating language agents on diverse, realistic, and long-horizon tool-use tasks across 32 software applications and 604 tools.

    Artificial Analysis

    Independent AI model benchmarking platform providing comprehensive performance analysis across intelligence, speed, cost, and quality metrics.

    Related Topics

    LLM Evaluations

    Platforms and frameworks for evaluating, testing, and benchmarking LLM systems and AI applications. These tools provide evaluators and evaluation models to score AI outputs, measure hallucinations, assess RAG quality, detect failures, and optimize model performance. Features include automated testing with LLM-as-a-judge metrics, component-level evaluation with tracing, regression testing in CI/CD pipelines, custom evaluator creation, dataset curation, and real-time monitoring of production systems. Teams use these solutions to validate prompt effectiveness, compare models side-by-side, ensure answer correctness and relevance, identify bias and toxicity, prevent PII leakage, and continuously improve AI product quality through experiments, benchmarks, and performance analytics.

    88 tools

    AI Coding Assistants

    AI tools that help write, edit, and understand code with intelligent suggestions.

    530 tools

    Performance Metrics

    Specialized tools for measuring, evaluating, and optimizing AI model performance across accuracy, speed, resource utilization, and other critical parameters.

    44 tools