
GPT-5 Hype vs. Reality: A Disappointing Gap

So, OpenAI dropped GPT-5 on August 7th 2025. Sam Altman was all “PhD-level expert” and “superpower on demand” — basically, “Hey, you’re not a coder anymore. You’re a wizard.” The marketing deck looked like a developer’s dream:
- 74.9% on SWE-bench Verified (up from 69.1% on o3)
- 88% on Aider’s polyglot benchmark
- 22% fewer tokens, 45% fewer tool calls
- 256k context window (yes, 256k, not 400k like some thought)
- New `reasoning_effort` and `verbosity` knobs (a quick sketch of both follows this list)
- GPT-5, Mini, Nano, all at different price points
- Multi-model routing: fast model for simple stuff, deep model for hard problems
- Custom tools via plain English (no JSON needed!)
- And — get this — no model picker in the UI anymore. The system just knows what to do.
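For the curious, here is roughly what those new knobs look like in a call. This is a minimal sketch assuming the openai Python SDK and the Responses API parameter shapes OpenAI documented at launch (`reasoning={"effort": ...}`, `text={"verbosity": ...}`, `max_output_tokens`); treat the exact names as illustrative and check the current API reference before copying.

```python
# Minimal sketch of the new GPT-5 knobs via the Responses API.
# Assumes the openai Python SDK (1.x) and the parameter shapes documented
# at launch; verify against the current API reference.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    input="Refactor this function to use a generator instead of a list.",
    reasoning={"effort": "low"},   # the reasoning_effort knob
    text={"verbosity": "low"},     # the verbosity knob
    max_output_tokens=1000,        # output cap, reasoning tokens count against it
)

print(response.output_text)
```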
All of this was sold as the future of coding, a seamless upgrade from GPT-4o. Devs were pumped. We’d finally have an AI that could plan, write, test, and refactor like a real engineer.
But fast forward a few hours and days, and the vibe’s totally different.
This new report dives into what actually happened after the rollout. We’re not just looking at benchmarks — we’re digging into developer forums, API logs, and real user reports. Spoiler: the gap between promise and reality is wild.
🎯 What We Were Told
Here’s what OpenAI said GPT-5 would do — and what they claimed the numbers were.
| Feature | What Was Promised | Source |
|---|---|---|
| SWE-bench Verified | 74.9% (vs. 69.1% on o3) | FinalRoundAI¹ |
| Aider Polyglot Benchmark | 88% | FinalRoundAI¹ |
| Token Efficiency | 22% fewer tokens vs. GPT-4o | FinalRoundAI² |
| Tool Call Efficiency | 45% fewer calls | FinalRoundAI² |
| Context Window | 400K tokens | FinalRoundAI³ |
| Dynamic Routing | Fast model for simple tasks, deep model for complex ones | FinalRoundAI³ |
| Custom Tools | Describe in plain English (no JSON) | FinalRoundAI³ |
| API Control | `reasoning_effort`, `verbosity`, `max_completion_tokens` | FinalRoundAI³ |
| Pricing | GPT-5, GPT-5 Mini, GPT-5 Nano, cheaper and faster | FinalRoundAI⁴ |
| Personality | More natural, creative, "better prose" | Leon Furze blog⁵ |
| Upgrade Experience | No model picker. Auto-select the best one. | Digital Watch Observatory⁶ |
That’s a lot of promises. And they were backed by a slick demo, a keynote, and partner quotes from big tech companies.
So, what actually happened?
🤯 What We Actually Got
1. 💣 Performance & Latency: It’s Slow. Like, really slow.
"A simple prompt now takes 3–4 minutes. On Cursor. That’s not a bug — it’s a feature?"
— Cursor forum user
Let’s talk numbers.
| Task | GPT-4o (old) | GPT-5 (now) | Notes |
|---|---|---|---|
| Basic "Hello" via Responses API | 2–5 sec | ~60 sec | OpenAI dev forum¹⁰ |
| 4k-token prompt (long context) | ~5–10 sec | 30+ sec | User report¹⁰ |
| Simple regex fix (Cursor) | 5–8 sec | 3–4 min | Dev on Cursor forum⁷ |
| "Hello" in API (no logic) | 2–5 sec | 1 min | OpenAI community report⁹ |
Result? In the worst reports, GPT-5 is 45× slower than GPT-4.1, and that's after OpenAI claimed it was more efficient.
“I used to write a function in 10 seconds. Now I wait 3 minutes and get a blank screen.”
— Dev on OpenAI community forum¹³
Some users even hit timeout errors because the default `reasoning_token_budget` is too low.
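If you want to see the slowdown for yourself, and stop requests from hanging for minutes, a quick wall-clock harness is enough. A sketch assuming the openai Python SDK and API access to both models; the 120-second timeout is an arbitrary choice.

```python
# Rough latency check: time the same trivial prompt against two models,
# with a hard client-side timeout so a slow request fails instead of hanging.
import time
from openai import OpenAI

client = OpenAI(timeout=120.0)  # seconds; the SDK raises APITimeoutError when exceeded

def time_hello(model: str) -> float:
    """Send a one-word prompt and return the elapsed wall-clock time."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    return time.perf_counter() - start

for model in ("gpt-4o", "gpt-5"):
    try:
        print(f"{model}: {time_hello(model):.1f}s")
    except Exception as exc:  # e.g. openai.APITimeoutError
        print(f"{model}: failed ({exc})")
```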
And yes — the model picker is gone. You can’t switch back to GPT-4o or o3. You’re stuck with GPT-5, even if it’s slower, dumber, and more expensive.
2. 🔥 Token Use & Cost: “Efficiency” Is a Lie
OpenAI said: “GPT-5 uses 22% fewer tokens.”
Reality: It uses way more.
Here’s what users are reporting:
| Model | Task | Tokens Used | Output |
|---|---|---|---|
| `gpt-5-nano` | List 6 phrases | 1,659 tokens | ✅ |
| `gpt-5-nano` | Write 4,000-token article | 5,030 tokens | ❌ (no output) |
| `gpt-5-mini` | Simple task | 3,000+ tokens | ❌ (blank) |
I asked it to generate a simple list. It used 1,659 tokens. For six phrases. That’s insane. — OpenAI community, post #1338030¹²
Another user said:
I set `max_completion_tokens = 1000`. It consumed 5,000 tokens and returned nothing. Just a blank string.
And the worst part? The model hits the `reasoning_token_budget` limit (2048) and just… stops. No error. No output. Just silence.
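If you suspect reasoning tokens are eating your budget, the `usage` block on the response is where to look. A sketch assuming the Responses API exposes a reasoning-token breakdown under `output_tokens_details.reasoning_tokens` (as OpenAI's docs describe); field names may differ by SDK version, hence the defensive lookups.

```python
# Detect the "burned tokens, blank output" failure mode described above.
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5-nano",
    input="Write a 4,000-token article about regex.",
    max_output_tokens=1000,
)

usage = resp.usage
details = getattr(usage, "output_tokens_details", None)       # may be absent
reasoning = getattr(details, "reasoning_tokens", None)        # reasoning-token count, if exposed
print(f"output tokens: {usage.output_tokens}, of which reasoning: {reasoning}")

if not resp.output_text.strip():
    # Everything went to reasoning and the visible answer is empty.
    print("Blank output: raise max_output_tokens or lower the reasoning effort.")
```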
This is not efficient. It’s wasteful. And it’s ruining the cost argument — especially for the "cheap" Nano and Mini models.
3. 💣 Code Quality: It’s Making Things Worse
It’s not fixing the bug. It’s rewriting the whole app in a language I don’t know. — Dev on Cursor forum¹⁴
GPT-5 was supposed to be a multi-step agent — plan, write, test, integrate.
Instead, users report:
- Inserting 500 lines of code for a simple regex fix
- Ignoring instructions (e.g., “use this library” → ignores it)
- Failing to use tools (e.g., LSP build system, test runner)
- Making up code (e.g., importing a non-existent package)
It doesn’t know how to use the LSP. It’s like it’s blind to the editor. — Cursor user¹⁷
Some devs say they have to re-explain the same rule 3–4 times before GPT-5 actually follows it.
And when it does try to use a tool?
It calls `git commit` but doesn't push. No error. No explanation. — Another user
It’s not an agent. It’s a dumb, overconfident intern.
4. 📝 Response Quality: Bland, Robotic, and Boring
It’s like talking to a corporate beige zombie. — Windows Central, user review¹⁸
Let’s be real: GPT-5 was supposed to have better prose, better creativity.
Instead:
- Poems sound like a “parody” (Reddit user¹⁹)
- Summaries are wrong or missing key points
- Table generation fails
- PDF analysis doesn’t work
- Memory? It forgets stuff from 20 seconds ago
- Context window: supposedly 256k, but many users report only 128k works reliably
I asked it to write a blog post. It gave me 3 short sentences. No structure. No flow. Like it was tired. — Medium critic²²
And the worst part?
It ghosts you mid-sentence. Just… silent. — Multiple users on OpenAI forum¹³
5. 🚫 Missing Features & Forced Upgrade
Let’s be honest: GPT-5 is not a full upgrade. It’s a downgrade in disguise.
Here’s what’s missing or broken:
| Feature | What's Promised | What's Actually True |
|---|---|---|
| Custom tools | Works in all tiers | Only in Standard plan |
| Fine-tuning | Not available | Still not here |
| Temperature control | Yes | `gpt-5-nano` has NO temperature param |
| Model picker | Removed | You can't go back to GPT-4o |
| 256k context | Yes | Only 128k actually works |
And yes — GPT-4o and o3 are gone from the ChatGPT UI.
I loved GPT-4o for creative tasks. Now I’m forced to use GPT-5, which is slower, dumber, and more corporate. — OpenAI forum user⁶
This isn’t an upgrade. This is shrinkflation.
They’re not selling a better model. They’re selling a worse one and making you buy it. — Medium critic²³
6. 📉 API Reliability: Blank Outputs Are the Norm
The new Responses API has a `reasoning_token_budget` parameter.
But if you hit it?
→ Empty string.
→ No error.
→ No progress.
I set `max_completion_tokens = 500`. It used 3,000 tokens and returned nothing. — OpenAI forum¹²
And because `gpt-5-nano` doesn't expose `temperature`, you can't control creativity at all.
So you’re stuck with:
- No creativity control
- No reliable output
- No way to debug why it’s silent
I’d rather use a Python script than trust GPT-5 to generate a simple response. — Dev, OpenAI community²⁴
🔍 What Devs Are Actually Saying
This isn’t just “it’s slow.” This is grief.
- “I’m mourning GPT-4o.” → Devs are genuinely sad about losing a faster, smarter, more responsive model.
- “It feels like OpenAI is forcing us to use a worse product.” → The removal of model choice feels like a forced upgrade, not progress.
- “This isn’t a revolution. It’s an incremental patch with a hype machine.” → Multiple threads call it a “disaster” and accuse OpenAI of repeating the “maths good, but discourse analysis? Zero” pattern²¹.
- “It’s not worth it.” → Some devs are switching back to Claude or staying on GPT-4.1.
📌 Final Verdict: The Hype Was Too Big
GPT-5 has some real improvements: better benchmark scores and, in theory, stronger agentic behavior.
But in practice?
- Slower than GPT-4o
- Uses more tokens
- Gives blank outputs
- Breaks workflows
- Removes model choice
- Forces you to use a worse product
It’s not a revolution.
It’s not a leap.
It’s a downgrade disguised as a breakthrough.
✅ What Devs Should Do Now
- Don’t trust the marketing. OpenAI sold GPT-5 like it was a god-tier coder. It’s not.
- Test new models in staging — not production.
- Keep fallbacks. GPT-4o is still better for many tasks. Use it.
- Watch for API quirks. Hitting the `reasoning_token_budget` means silent failure; setting `max_completion_tokens` too low means blank output. (A defensive fallback sketch follows this list.)
- Demand transparency. OpenAI needs to clarify what's actually working vs. what's broken.
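For the "keep fallbacks" point above, the simplest pattern is a thin wrapper that tries GPT-5 first and falls back to GPT-4o when it times out or comes back blank. A sketch, not a recommendation of specific thresholds; the 60-second timeout and model order are assumptions you should tune for your own workload.

```python
# "Keep fallbacks" in practice: try GPT-5, fall back to GPT-4o on timeout
# or blank output. Thresholds and model names are illustrative.
import openai
from openai import OpenAI

client = OpenAI(timeout=60.0)  # per-request client-side timeout, in seconds

def complete_with_fallback(prompt: str) -> str:
    for model in ("gpt-5", "gpt-4o"):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            text = (resp.choices[0].message.content or "").strip()
            if text:
                return text
            print(f"{model}: blank output, falling back")
        except openai.APITimeoutError:
            print(f"{model}: timed out, falling back")
    raise RuntimeError("All models failed or returned blank output")
```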
📚 References
- GPT-5 Released: What the Performance Claims Actually Mean – FinalRoundAI
- First Impressions of GPT-5 – Leon Furze
- GPT-5 Launch Sparks Backlash – Digital Watch Observatory
- GPT 5 Slow. Very slow – Cursor Forum
- GPT-5 + Responses API is extremely slow – OpenAI Dev Forum
- OpenAI Launches GPT-5: Initial Legal Benchmarking – Legal IT Insider
- What is going on with the GPT-5 API? – OpenAI Dev Forum
- GPT 5 is really bad (at least in Cursor) – Cursor Forum
- Did Sam Altman Oversell GPT-5? – Windows Central
- GPT5 Is Horrible – Reddit
- GPT-5: OpenAI's Worst Release Yet – Medium
- GPT-5 is awful – Reddit
TL;DR:
GPT-5 is not ready for prime time.
It’s slower, more expensive, and less reliable than GPT-4o.
Don’t upgrade without testing.
And if you’re using it in production? You’re doing it wrong.
So it looks like there was a pivot from OpenAI and Sam. GPT-5 made a grand entrance, looking formal and sounding like a boss, leaving many of us puzzled about where our friendly AI assistant went.
OpenAI's new "automatic router" was meant to seamlessly pick the best model for each answer, but it stumbled right away, slowing things down and making the model feel less capable, almost as cold as legal jargon.
Now we have choices: Auto, Fast, or Thinking modes. They say the personality will become softer and more friendly over time.
Essentially, OpenAI tried a one-size-fits-all approach, it didn’t work, and now we can pick again.