MC.97: Gemini 3.0 Pro, look mamma no hype
Why it is a good thing
The initial benchmarks came in for Gemini 3.0 on November 18th, and the reaction was... measured.
Not disappointing. Not revolutionary. Just... measured.
After months of teasing, after the Cloudflare outage that made it feel like Google was literally breaking the internet with the weight of their model announcement, after the promises of "multimodal understanding" and "agentic capabilities," the response from the community was remarkably subdued: it's good, but not transformative.
This matters more than you might think. Because what we're witnessing isn't the end of AI progress. It's the end of the hype cycle around model releases. And that's actually a sign of maturity.

Time to defy hype and gravity
The Hype Cycle is Dead. Long Live the Hype Cycle.
For the past 18 months, every major model release has followed a predictable pattern:
Teasing phase: Cryptic tweets, leaked benchmarks, speculation
Launch day: Massive benchmark jumps, breathless coverage, "this changes everything"
Reality check (24-48 hours): Users discover the model hallucinates, ignores instructions, or performs worse than expected on real tasks
Rationalization: "Well, the benchmarks are still impressive, even if..."
We saw this with GPT-4, Claude 4 Opus, Grok 4.1, and now Gemini 3.0.
But something shifted this time. The reality check didn't come 48 hours after launch. It came during the launch.
Within hours of Gemini 3.0 becoming available, users reported:
Better performance on benchmarks, but inconsistent on real-world tasks
Aggressive code rewriting that ignored user preferences
Creative writing that felt mechanical compared to Claude
Math reasoning that was solid but not revolutionary
The benchmarks showed a 31% improvement on ARC-AGI-2. The lived experience showed... incremental improvement.
What the Plateau Reveals
This plateau reveals something crucial: we've hit a point where benchmark performance and user experience have decoupled.
Gemini 3.0 is genuinely better at many things. The benchmarks aren't lying. But the marginal improvement in what users can actually do with the model is smaller than the marginal improvement in the numbers.
This is what maturity looks like in technology.
We're no longer in the phase where each new model unlocks entirely new capabilities. We're in the phase where each new model refines existing capabilities. The jump from GPT-3 to GPT-4 felt transformative because it was. The jump from GPT-4 to GPT-5 feels incremental because it is.
And that's fine. That's actually healthy.
The Real Story: Consolidation and Specialization
While everyone was watching Gemini 3.0's benchmarks, the more interesting story was happening elsewhere:
1. Coding Models are Diverging
OpenAI released GPT-5.1-Codex-Max with "Extra High" reasoning and claims of 24+ hour autonomous operation. Google released Antigravity, their agentic IDE. Anthropic is quietly shipping Claude with better tool use.
These products aren't competing on general capability anymore. They're competing on specific workflows. Coding. Search. Agentic tasks.
The era of "one model to rule them all" is ending. The era of "the right model for the right job" is beginning.
2. Open Source is Catching Up (Slowly)
Deep Cogito's Cogito v2.1 hit the leaderboards. It's not beating Gemini 3.0. But it's close enough that for many use cases, the cost difference matters more than the capability difference.
This is the pattern we saw with open source software. Eventually, the gap closes enough that factors like cost, control, and customization become the deciding factors.
3. The Infrastructure Layer is Where the Real Competition Is
While everyone watches model releases, the actual competitive advantage is shifting to:
Inference optimization (vLLM, SGLang, Ollama)
Context management and retrieval (RAG systems, MCP)
Agent orchestration frameworks
Fine-tuning and adaptation tools
The model itself is becoming commoditized. The stack around it is where differentiation happens.
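One way to see the commoditization in practice: vLLM, SGLang, and Ollama all expose OpenAI-compatible endpoints, so swapping a hosted frontier model for a local open-weights one is a configuration change, not a rewrite. A minimal sketch in Python (the endpoint URL is Ollama's default; the model name and API keys are illustrative assumptions):

```python
from openai import OpenAI

# A hosted frontier model behind the official API...
hosted = OpenAI(api_key="sk-...")  # hypothetical key

# ...or a local open-weights model served via Ollama's OpenAI-compatible API.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def ask(client: OpenAI, model: str, prompt: str) -> str:
    # Identical calling code either way; only the client and model name differ.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# print(ask(local, "llama3.1", "Summarize RAG in one sentence."))
```

If your product's value survives that one-line swap, it was never the model.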
What This Means for Builders
If you're building AI products, the plateau is actually good news:
You can stop chasing the latest model: The marginal gains from switching to the newest model are getting smaller. Focus on using the current generation well.
Specialization pays: Instead of "works with all models," build "works best with X model for Y task." That's a more defensible position (see the routing sketch after this list).
The moat is in the data and workflow, not the model: The companies winning with AI aren't the ones with the best models. They're the ones with the best data pipelines and the best understanding of their users' workflows.
Stability matters more than novelty: Users care about reliability and consistency more than they care about the latest benchmark improvements.
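On the specialization point above, even a toy routing table makes the idea concrete: pick the model per task, not per product. A sketch under obvious assumptions; every model name below is a placeholder, not a recommendation:

```python
# Route each request to whichever model you've validated for that task.
TASK_ROUTES = {
    "coding": "codex-class-model",      # agentic coding workflows
    "writing": "prose-tuned-model",     # long-form drafting
    "extraction": "small-cheap-model",  # high volume, latency-sensitive
}

def pick_model(task: str) -> str:
    # Fall back to a general-purpose default for anything unrouted.
    return TASK_ROUTES.get(task, "general-default-model")

print(pick_model("coding"))  # -> codex-class-model
```

The routing table is boring on purpose: the defensible part is knowing which model wins for which task, and that knowledge lives in your evals, not in the dictionary.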
Le french CunCun
Here's what nobody wants to say out loud: we might be approaching the limits of what scale alone can achieve.
Gemini 3.0 is rumored to be around 10 trillion parameters. That's massive. The benchmarks are impressive. But the lived experience is... fine. Good. Not revolutionary.
This doesn't mean AI progress is stopping. It means the nature of progress is changing. We're moving from "bigger models = better results" to "smarter training = better results" and "better integration = better results."
The next breakthrough probably isn't another 10x parameter increase. It's probably:
Better reasoning at inference time (which OpenAI is exploring with their reasoning models; see the sketch below)
Better training data quality and curation
Better alignment and instruction-following
Better integration with external tools and knowledge
These are harder problems than just scaling. They require more creativity, more domain expertise, more careful engineering.
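To make the inference-time point concrete: techniques like self-consistency trade extra sampling compute for accuracy without touching the model's parameters. A toy sketch, with a stubbed generate() standing in for any real model call:

```python
import random
from collections import Counter

def generate(prompt: str) -> str:
    # Stub for a sampled model call (temperature > 0, so answers vary).
    return random.choice(["42", "42", "42", "41"])

def self_consistent_answer(prompt: str, n: int = 8) -> str:
    # Spend more compute at inference: draw n samples, keep the majority vote.
    samples = [generate(prompt) for _ in range(n)]
    answer, _count = Counter(samples).most_common(1)[0]
    return answer

print(self_consistent_answer("What is 6 * 7?"))
```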
The Hype Cycle Isn't Dead. It's Just Maturing.
We're not at the end of AI progress. We're at the end of the easy progress.
The Gemini 3.0 plateau isn't a sign that AI has peaked. It's a sign that we've moved from the "exponential growth" phase to the "refinement" phase. It's the same transition that happened with:
Mobile: From "smartphones are revolutionary" to "which smartphone has the best camera"
Cloud computing: From "cloud is the future" to "which cloud provider has the best pricing and features"
Deep learning: From "neural networks can learn anything" to "which architecture is best for this specific task"
Each transition felt like a plateau. Each transition was actually a sign of maturity.
What Comes Next
The real story of AI in the next 12 months won't be about model releases. It will be about:
Agentic systems that actually work: Not just in benchmarks, but in production
Specialized models for specific domains: Medicine, law, finance, etc.
Better integration with human workflows: Not replacement, but collaboration
The infrastructure layer solidifying: The winners will be the ones with the best inference, the best retrieval, the best orchestration
The cost curve flattening: Cheaper inference, better efficiency, more accessible AI
The plateau is here. And it's actually fine.
Until next Thursday 🎉
Olivier
Like this newsletter? Forward it to a friend and have them sign up here.