ESSAY 05 // AI

OPEN SOURCE WON. THE FRONTIER BROKE.

The story the industry has been telling about AI goes like this. A handful of labs with the biggest compute clusters and the most capital pull further ahead every year. Open source has the spirit but not the capability. The frontier is where real progress happens, and you either access it through an API or you wait eighteen months for a version of it to trickle down.

Two things happened at once, and together they broke that story.

Open source caught the frontier. And the frontier hit a wall.

The gap that was supposed to define the next decade of AI closed in twelve months. The labs that were supposed to be pulling away are quietly running out of road.

Open source ate the gap

Across widely watched evaluations, the gap between top open-weight and top proprietary models narrowed sharply through 2025. The exact gap depends on benchmark, harness, and prompting setup, but the practical story is the same: open models are now competitive for many production tasks.

The part worth paying attention to is not that one open model got lucky. Five independent model families reached near-frontier quality within months of each other. DeepSeek. Qwen. Kimi. GLM. Mistral. Five separate research organizations, five separate compute budgets, five separate training approaches, all arriving at or near the frontier within the same window. That is not a one-off anomaly. That is convergence. And convergence is structural.

The specific models make this concrete.

  • DeepSeek V4 pushed past eighty percent on SWE-bench Verified, with weights anyone can download.
  • Qwen-family models and other open releases are increasingly present near the top of public reasoning and coding evaluations.
  • Long-context capabilities in open models have expanded quickly, reducing one of the historical advantages of closed APIs.

Then there is the cost angle, which is the part that should be making API-first startups nervous. Pricing and performance now vary by workload more than by branding. In many real deployments, teams can trade some absolute peak quality for large cost savings by using open-weight or lower-cost models with good orchestration.

For a large share of workloads, the case for paying frontier API prices just collapsed. Not weakened. Collapsed.


The frontier is hitting a wall

At the same moment open source was closing the gap from below, something was going wrong at the top.

Ilya Sutskever publicly acknowledged what many inside the labs had been saying quietly for a year. In his NeurIPS 2024 talk he said that pretraining as we know it will end, because compute keeps growing while the supply of human-written data does not. Throwing more compute and more data at transformer models is yielding diminishing returns. The gains are getting smaller. The costs are not.

Recent research and industry analyses point to diminishing returns from pure pretraining scale: improvements continue, but each marginal gain can require much more compute and spending than before.

The industry's response has been to pivot. OpenAI's o-series. Extended thinking in Claude. More inference-time reasoning compute instead of bigger pretraining runs. That is a sensible engineering move. It is also an implicit admission that the old path is running out.

Notice what happened. The labs did not announce that pretraining was working and they were doubling down. They announced new product lines built around a different paradigm. When a company changes what it is selling, that is usually a signal about what stopped working.

Rich Sutton's 2019 essay The Bitter Lesson argued that general methods powered by compute tend to win over hand-engineered approaches. That framing shaped the modern AI era. Today, leading researchers, Sutton among them, openly debate whether scaling current LLM paradigms alone is enough to reach AGI.


LeCun's case: LLMs were never the path

Yann LeCun's argument is more specific than it sounds. He is not saying LLMs are useless. He is saying they are excellent at one particular thing, a kind of sophisticated pattern recall across everything ever written down, and that the thing they excel at is not the same thing as general intelligence.

The framing he uses is PhD-like recall. LLMs can retrieve and recombine knowledge with remarkable precision. What they cannot do is generate genuinely novel breakthroughs. They operate with vast stored knowledge but without true reasoning, without persistent memory, and without the ability to plan across time. They are very impressive text predictors.

The deeper problem he points to is the data. The bulk of human knowledge does not exist in language. It exists in embodied experience, in the way a child learns to walk, to catch a ball, to understand that objects persist when you cannot see them. None of that is in the training corpus. Language captures a thin slice of what minds actually do. LLMs are trained almost entirely on that slice.

LLMs learned to talk about the world better than any system in history. What they did not learn is how to be in it.

That distinction matters for AGI. If the thing you are scaling is a text predictor, scaling it further gives you a better text predictor. It does not give you an agent that reasons, plans, persists, and adapts to new situations it has never seen. It gives you a very confident autocomplete.

This is the part of the argument that is hardest to dismiss, because it comes from inside the field. LeCun is not a skeptic who never believed. He is a Turing Award winner whose work on convolutional networks helped build the foundation. When someone that deep in a paradigm starts questioning the paradigm, that is worth taking seriously.


Mythos: the brick wall made visible

Then there is the safety bottleneck.

Multiple labs now discuss frontier-capability models in terms of controlled access, staged rollout, and tighter safety gates before broad release.

The exact model names and internal eval numbers differ by lab, but the broader pattern is consistent: capability is moving faster than governance, and release decisions increasingly depend on misuse risk, not only benchmark rank.

Sit with that for a moment. The industry is no longer optimizing for capability alone; it is also optimizing for controllability, deployment safety, and institutional readiness.

The uncomfortable question is this: if a model is powerful but hard to control, is raw intelligence still the limiting factor? Increasingly, the limiting factor is whether we can deploy it responsibly at scale.
Progress that cannot ship is not progress. It is a proof of concept for a problem nobody knows how to solve.

What this actually means

Three things are true at the same time, and they are all pointing at the same inflection point.

Open source has closed the capability gap that was supposed to define the decade. The frontier, instead of pulling further ahead, has run into diminishing returns that more compute alone does not fix. And researchers whose work shaped the modern paradigm, among them Sutton, LeCun, and Hassabis, are publicly questioning whether it leads where everyone assumed it did.

That is not pessimism. It is an accurate picture of where things actually stand. And accuracy about where things stand is more useful than optimism about where they were supposed to be.

Here is what it means depending on where you are standing.

If you are building on LLMs

The model is no longer the differentiator. A year ago, frontier access was the price of admission to frontier work. Today, an open model at a fraction of the cost does the same job on many production tasks. The gap was real. It closed. Build your architecture so the model behind it can be swapped, because the API bill you are paying right now is a cost center, not a moat.
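One way to act on that is to keep the model behind a narrow interface, so moving traffic from an API model to an open-weight one is a configuration change rather than a rewrite. The Python below is a minimal sketch under stated assumptions: the model names, the per-million-token prices, and the keyword-based difficulty check are all hypothetical placeholders, and `StubModel` stands in for whatever real client you would use.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Model(Protocol):
    """Any text model: a local open-weight server or a hosted API."""

    name: str
    usd_per_million_tokens: float

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Hypothetical stand-in for a real client; it just echoes the prompt."""

    name: str
    usd_per_million_tokens: float

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:20]}"


@dataclass
class Router:
    default: Model  # cheaper model that serves most traffic
    escalation: Model  # pricier model kept for the hard cases
    is_hard: Callable[[str], bool]  # hypothetical difficulty check

    def pick(self, prompt: str) -> Model:
        return self.escalation if self.is_hard(prompt) else self.default

    def complete(self, prompt: str) -> str:
        return self.pick(prompt).complete(prompt)

    def estimated_cost(self, prompts: list[str], tokens_per_call: int) -> float:
        """Blended USD cost, assuming every call uses the same token count."""
        return sum(
            self.pick(p).usd_per_million_tokens * tokens_per_call / 1_000_000
            for p in prompts
        )


# Illustrative names and prices only; not real models or real rates.
open_model = StubModel("open-weight-model", usd_per_million_tokens=0.5)
api_model = StubModel("frontier-api-model", usd_per_million_tokens=10.0)
router = Router(open_model, api_model, is_hard=lambda p: "prove" in p.lower())

prompts = ["summarize this ticket", "prove this invariant holds", "tag this email"]
print([router.pick(p).name for p in prompts])
print(router.estimated_cost(prompts, tokens_per_call=2_000))
```

Because callers only ever talk to `Router`, swapping either model is a one-line change. The keyword check is the piece a real system would replace with something measured against its own workload.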

If you are watching the lab race

The benchmark wars are not over, but they are no longer the only signal worth watching. The labs are changing what they are building, from scale to reasoning, from pretraining to inference-time compute. Something in the original thesis changed, and the product roadmaps are the evidence. Follow the roadmaps more closely than the press releases.

If you are thinking about where this goes

The question LeCun, Hassabis, and Sutton are circling is the right one. If AGI does not come from scaling LLMs further, what does it come from? Embodied learning. World models. Hybrid architectures that combine language with something else entirely. Nobody has a confident answer, and anyone who tells you otherwise is selling a narrative, not a roadmap.

That is the genuinely interesting moment. Not the benchmark numbers. Not the valuations. The fact that the field's most serious researchers are pointing at the current path and saying it does not go all the way, and nobody has yet agreed on what the next path looks like.

Open models are catching up, frontier labs are changing strategy, and safety constraints are becoming product constraints. One era is ending, and the next one is still being defined.

Sources