
GPT-5 Hype vs. Reality: A Disappointing Gap

By Joe Seifi • 1 comment • 9 days ago

So, OpenAI dropped GPT-5 on August 7th 2025. Sam Altman was all “PhD-level expert” and “superpower on demand” — basically, “Hey, you’re not a coder anymore. You’re a wizard.” The marketing deck looked like a developer’s dream:

  • 74.9% on SWE-bench Verified (up from 69.1% on o3)
  • 88% on Aider’s polyglot benchmark
  • 22% fewer tokens, 45% fewer tool calls
  • 256k context window (yes, 256k, not 400k like some thought)
  • New reasoning_effort and verbosity knobs
  • GPT-5, Mini, Nano — all at different price points
  • Multi-model routing: fast model for simple stuff, deep model for hard problems
  • Custom tools via plain English (no JSON needed!)
  • And — get this — no model picker in the UI anymore. The system just knows what to do.

All of this was sold as the future of coding, a seamless upgrade from GPT-4o. Devs were pumped. We’d finally have an AI that could plan, write, test, and refactor like a real engineer.

But fast forward a few hours and days, and the vibe’s totally different.

This post dives into what actually happened after the rollout. We’re not just looking at benchmarks — we’re digging into developer forums, API logs, and real user reports. Spoiler: the gap between promise and reality is wild.


🎯 What We Were Told

Here’s what OpenAI said GPT-5 would do — and the numbers they claimed.

| Feature | What Was Promised | Source |
| --- | --- | --- |
| SWE-bench Verified | 74.9% (vs. 69.1% on o3) | FinalRoundAI¹ |
| Aider Polyglot Benchmark | 88% | FinalRoundAI¹ |
| Token Efficiency | 22% fewer tokens vs. GPT-4o | FinalRoundAI² |
| Tool Call Efficiency | 45% fewer calls | FinalRoundAI² |
| Context Window | 256K tokens | FinalRoundAI³ |
| Dynamic Routing | Fast model for simple tasks, deep model for complex ones | FinalRoundAI³ |
| Custom Tools | Describe in plain English (no JSON) | FinalRoundAI³ |
| API Control | reasoning_effort, verbosity, max_completion_tokens | FinalRoundAI³ |
| Pricing | GPT-5, GPT-5 Mini, GPT-5 Nano — cheaper and faster | FinalRoundAI⁴ |
| Personality | More natural, creative, "better prose" | Leon Furze blog⁵ |
| Upgrade Experience | No model picker; auto-select the best one | Digital Watch Observatory⁶ |
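
For the API crowd, here’s roughly what those promised knobs look like in code. This is a minimal sketch based on the Responses API shapes from the launch materials; parameter names may differ across SDK versions, so treat it as illustrative, not gospel.

```python
# Minimal sketch of the promised knobs, assuming the Responses API shapes
# from the launch materials. Verify parameter names against your SDK version.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    input="Refactor this function to use pathlib instead of os.path.",
    reasoning={"effort": "minimal"},  # the advertised reasoning_effort knob
    text={"verbosity": "low"},        # the advertised verbosity knob
)

print(response.output_text)
```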

That’s a lot of promises. And they were backed by a slick demo, a keynote, and partner quotes from big tech companies.

So, what actually happened?


🤯 What We Actually Got

1. 💣 Performance & Latency: It’s Slow. Like, really slow.

"A simple prompt now takes 3–4 minutes. On Cursor. That’s not a bug — it’s a feature?"
Cursor forum user

Let’s talk numbers.

| Task | GPT-4o (old) | GPT-5 (now) | Notes |
| --- | --- | --- | --- |
| Basic "Hello" via Responses API | 2–5 sec | ~60 sec | OpenAI dev forum¹⁰ |
| 4k-token prompt (long context) | ~5–10 sec | 30+ sec | User report¹⁰ |
| Simple regex fix (Cursor) | 5–8 sec | 3–4 min | Dev on Cursor forum⁷ |
| "Hello" in API (no logic) | 2–5 sec | 1 min | OpenAI community report⁹ |

Result? GPT-5 can be up to 45× slower than GPT-4.1 in some cases — and that’s after OpenAI claimed it was more efficient.

“I used to write a function in 10 seconds. Now I wait 3 minutes and get a blank screen.”
Dev on OpenAI community forum¹³

Some users even hit timeout errors because the default reasoning_token_budget is too low.
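
If you want to see the slowdown for yourself, timing a trivial request is enough. A minimal sketch using the SDK’s client-level timeout so a hung call fails fast; the 30-second cutoff is an arbitrary choice, not a recommendation.

```python
# Time a trivial request and fail fast instead of hanging for minutes.
# Minimal sketch; the 30 s timeout is an arbitrary choice.
import time

from openai import OpenAI, APITimeoutError

client = OpenAI(timeout=30.0)  # SDK-level request timeout, in seconds

start = time.perf_counter()
try:
    response = client.responses.create(model="gpt-5", input="Say hello.")
    print(f"{time.perf_counter() - start:.1f}s: {response.output_text!r}")
except APITimeoutError:
    print(f"timed out after {time.perf_counter() - start:.1f}s")
```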

And yes — the model picker is gone. You can’t switch back to GPT-4o or o3. You’re stuck with GPT-5, even if it’s slower, dumber, and more expensive.


2. 🔥 Token Use & Cost: “Efficiency” Is a Lie

OpenAI said: “GPT-5 uses 22% fewer tokens.”
Reality: It uses way more.

Here’s what users are reporting:

| Model | Task | Tokens Used | Output |
| --- | --- | --- | --- |
| gpt-5-nano | List 6 phrases | 1,659 tokens | |
| gpt-5-nano | Write 4,000-token article | 5,030 tokens | ❌ (no output) |
| gpt-5-mini | Simple task | 3,000+ tokens | ❌ (blank) |

I asked it to generate a simple list. It used 1,659 tokens. For six phrases. That’s insane. — OpenAI community, post #1338030¹²

Another user said:

I set max_completion_tokens = 1000. It consumed 5,000 tokens and returned nothing. Just a blank string.

And the worst part? The model hits the reasoning_token_budget limit (2048) and just… stops. No error. No output. Just silence.
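
You can at least detect this failure mode instead of shipping a blank string downstream. A minimal sketch against the Chat Completions API: if the reports are accurate, reasoning eats the whole max_completion_tokens budget, the finish reason comes back as "length", and the visible content is empty.

```python
# Detect the "tokens consumed, nothing returned" failure mode.
# Minimal sketch; usage field names follow the current openai-python SDK.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "List 6 phrases about autumn."}],
    max_completion_tokens=1000,
)

choice = resp.choices[0]
content = choice.message.content or ""

if not content.strip():
    # Reasoning tokens count against the budget but are never shown, so a
    # request can "succeed" while handing back an empty string.
    details = resp.usage.completion_tokens_details
    reasoning = details.reasoning_tokens if details else "unknown"
    raise RuntimeError(
        f"Blank output: finish_reason={choice.finish_reason!r}, "
        f"reasoning_tokens={reasoning}, "
        f"completion_tokens={resp.usage.completion_tokens}"
    )

print(content)
```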

This is not efficient. It’s wasteful. And it’s ruining the cost argument — especially for the "cheap" Nano and Mini models.


3. 💣 Code Quality: It’s Making Things Worse

It’s not fixing the bug. It’s rewriting the whole app in a language I don’t know. — Dev on Cursor forum¹⁴

GPT-5 was supposed to be a multi-step agent — plan, write, test, integrate.

Instead, users report:

  • Inserting 500 lines of code for a simple regex fix
  • Ignoring instructions (e.g., “use this library” → ignores it)
  • Failing to use tools (e.g., LSP, build system, test runner)
  • Making up code (e.g., importing a non-existent package)

It doesn’t know how to use the LSP. It’s like it’s blind to the editor. — Cursor user¹⁷

Some devs say they have to re-explain the same rule 3–4 times before GPT-5 actually follows it.

And when it does try to use a tool?

It calls git commit but doesn’t push. No error. No explanation. — Another user

It’s not an agent. It’s a dumb, overconfident intern.


4. 📝 Response Quality: Bland, Robotic, and Boring

It’s like talking to a corporate beige zombie. — Windows Central, user review¹⁸

Let’s be real: GPT-5 was supposed to have better prose, better creativity.

Instead:

  • Poems sound like a “parody” (Reddit user¹⁹)
  • Summaries are wrong or missing key points
  • Table generation fails
  • PDF analysis doesn’t work
  • Memory? It forgets stuff from 20 seconds ago
  • Context window: supposedly 256k, but many users report only 128k works reliably

I asked it to write a blog post. It gave me 3 short sentences. No structure. No flow. Like it was tired. — Medium critic²²

And the worst part?

It ghosts you mid-sentence. Just… silent. — Multiple users on OpenAI forum¹³


5. 🚫 Missing Features & Forced Upgrade

Let’s be honest: GPT-5 is not a full upgrade. It’s a downgrade in disguise.

Here’s what’s missing or broken:

| Feature | What’s Promised | What’s Actually True |
| --- | --- | --- |
| Custom tools | Works in all tiers | Only in Standard plan |
| Fine-tuning | Not available | Still not here |
| Temperature control | Yes | gpt-5-nano has NO temperature param |
| Model picker | Removed | You can’t go back to GPT-4o |
| 256k context | Yes | Only 128k actually works |
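
If the temperature report is accurate, the only option is defensive code: try the knob, and strip it when the API rejects it. A minimal workaround sketch, assuming the model returns a standard bad-request error for unsupported parameters.

```python
# Workaround sketch: gpt-5-nano reportedly rejects the `temperature` param.
# Try the knob, and retry without it if the API refuses the request.
from openai import OpenAI, BadRequestError

client = OpenAI()

def ask(model: str, prompt: str, temperature: float | None = 0.7) -> str:
    kwargs = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if temperature is not None:
        kwargs["temperature"] = temperature
    try:
        resp = client.chat.completions.create(**kwargs)
    except BadRequestError:
        # Assumed behavior: unsupported params come back as a 400.
        kwargs.pop("temperature", None)
        resp = client.chat.completions.create(**kwargs)
    return resp.choices[0].message.content or ""

print(ask("gpt-5-nano", "Name three birds."))
```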

And yes — GPT-4o and o3 are gone from the ChatGPT UI.

I loved GPT-4o for creative tasks. Now I’m forced to use GPT-5, which is slower, dumber, and more corporate. — OpenAI forum user⁶

This isn’t an upgrade. This is shrinkflation.

They’re not selling a better model. They’re selling a worse one and making you buy it. — Medium critic²³


6. 📉 API Reliability: Blank Outputs Are the Norm

The new Responses API has a reasoning_token_budget parameter.

But if you hit it?
Empty string.
→ No error.
→ No progress.

I set max_completion_tokens = 500. It used 3,000 tokens and returned nothing. — OpenAI forum¹²

And because gpt-5-nano doesn’t expose temperature, you can’t control creativity at all.

So you’re stuck with:

  • No creativity control
  • No reliable output
  • No way to debug why it’s silent

I’d rather use a Python script than trust GPT-5 to generate a simple response. — Dev, OpenAI community²⁴
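
"No way to debug why it’s silent" is slightly too pessimistic: the usage object is the one diagnostic you do get back. A minimal logging sketch; it won’t prevent blank outputs, but it tells you where the tokens went.

```python
# Log the usage breakdown on every call so silent failures leave a trail.
# Minimal sketch; it won't prevent blank outputs, only makes them visible.
import logging

from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gpt5-audit")
client = OpenAI()

def audited_call(model: str, prompt: str) -> str:
    resp = client.responses.create(model=model, input=prompt)
    text = resp.output_text or ""
    log.info(
        "model=%s output_chars=%d input_tokens=%s output_tokens=%s",
        model, len(text), resp.usage.input_tokens, resp.usage.output_tokens,
    )
    if not text.strip():
        log.warning("blank output from %s despite %s output tokens",
                    model, resp.usage.output_tokens)
    return text

audited_call("gpt-5-mini", "Summarize RFC 2119 in two sentences.")
```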


🔍 What Devs Are Actually Saying

This isn’t just “it’s slow.” This is grief.

  • “I’m mourning GPT-4o.”
    → Devs are genuinely sad about losing a faster, smarter, more responsive model.

  • “It feels like OpenAI is forcing us to use a worse product.”
    → The removal of model choice feels like a forced upgrade, not progress.

  • “This isn’t a revolution. It’s an incremental patch with a hype machine.”
    → Multiple threads call it a “disaster” and accuse OpenAI of repeating the “maths good, but discourse analysis? Zero” pattern²¹.

  • “It’s not worth it.”
    → Some devs are switching back to Claude or staying on GPT-4.1.


📌 Final Verdict: The Hype Was Too Big

GPT-5 has some real improvements — better benchmark scores and, in theory, stronger agentic behavior.

But in practice?

  • Slower than GPT-4o
  • Uses more tokens
  • Gives blank outputs
  • Breaks workflows
  • Removes model choice
  • Forces you to use a worse product

It’s not a revolution.
It’s not a leap.
It’s a downgrade disguised as a breakthrough.


✅ What Devs Should Do Now

  1. Don’t trust the marketing. OpenAI sold GPT-5 like it was a god-tier coder. It’s not.
  2. Test new models in staging — not production.
  3. Keep fallbacks. GPT-4o is still better for many tasks. Use it (see the sketch after this list).
  4. Watch for API quirks. reasoning_token_budget = silent failure. max_completion_tokens too low = blank.
  5. Demand transparency. OpenAI needs to clarify what’s actually working vs. what’s broken.
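
Point 3 in practice: a minimal fallback sketch that tries GPT-5 first and drops to GPT-4o on a timeout or a blank response. The model names and failure checks are assumptions drawn from the reports above; adapt both to your own stack.

```python
# Fallback sketch for point 3: try GPT-5, drop to GPT-4o on a timeout or
# a blank response. Model names and failure checks are assumptions.
from openai import OpenAI, APITimeoutError

client = OpenAI(timeout=60.0)

def complete_with_fallback(
    prompt: str, models: tuple[str, ...] = ("gpt-5", "gpt-4o")
) -> str:
    last_error: Exception | None = None
    for model in models:
        try:
            resp = client.responses.create(model=model, input=prompt)
            text = (resp.output_text or "").strip()
            if text:  # treat a blank response as a failure, per the reports
                return text
            last_error = RuntimeError(f"{model} returned a blank output")
        except APITimeoutError as exc:
            last_error = exc
    raise RuntimeError("all fallback models failed") from last_error

print(complete_with_fallback("Write a regex that matches ISO 8601 dates."))
```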

📚 References

  1. GPT-5 Released: What the Performance Claims Actually Mean – FinalRoundAI
  2. First Impressions of GPT-5 – Leon Furze
  3. GPT-5 Launch Sparks Backlash – Digital Watch Observatory
  4. GPT 5 Slow. Very slow – Cursor Forum
  5. GPT-5 + Responses API is extremely slow – OpenAI Dev Forum
  6. OpenAI Launches GPT-5: Initial Legal Benchmarking – Legal IT Insider
  7. What is going on with the GPT-5 API? – OpenAI Dev Forum
  8. GPT 5 is really bad (at least in Cursor) – Cursor Forum
  9. Did Sam Altman Oversell GPT-5? – Windows Central
  10. GPT5 Is Horrible – Reddit
  11. GPT-5: OpenAI's Worst Release Yet – Medium
  12. GPT-5 is awful – Reddit

TL;DR:
GPT-5 is not ready for prime time.
It’s slower, more expensive, and less reliable than GPT-4o.
Don’t upgrade without testing.
And if you’re using it in production? You’re doing it wrong.


Sam Moore • 5 days ago

So it looks like there was a pivot from OpenAI and Sam. GPT-5 made a grand entrance, looking formal and sounding like a boss, leaving many of us puzzled about where our friendly AI assistant went.

OpenAI’s new "automatic router" was meant to seamlessly choose the best answers, but it stumbled right away, slowing things down and making the model feel less capable, almost as cold as legal jargon.

Now we have choices: Auto, Fast, or Thinking modes. They say the personality will become softer and more friendly over time.

Essentially, OpenAI tried a one-size-fits-all approach, it didn’t work, and now we can pick again.