The AGI benchmark has a winner: a 5-year-old, not the billion-dollar software
AGI achieved; score just above zero. How does a sentence like that survive a press conference?
The benchmark that broke itself
Huang went on a podcast and said AGI has been achieved. Two days later, his competitor scored 0.37 percent on the only test that takes that claim seriously. Grok scored zero. Not close to zero. Zero. Every five-year-old did better, without instructions, without training, without a funding round.
This is not a mistake. This is the product.
Huang sells chips. Altman sells subscriptions. Arm named a processor the "AGI CPU." The language does exactly what it's supposed to do: raise money from people who understand they'll miss out if they don't move now. Nothing has gone wrong. Nothing needs fixing. Welcome.
How numbers survive the truth
ARC-AGI-1 fell. ARC-AGI-2 fell. Each time, the labs threw compute and training data at the problem until the benchmark was dead. Now that there's no training data to throw, the system scores 0.37 percent. They call that a methodological debate. Of course they do.
The Duke harness pushed Claude to 97.1 percent on one variant. One. Out of a hundred and thirty-five. The official score stayed at 0.25 percent. But 97.1 exists now. It circulates. It shows up in presentations, in press releases, in conversations with people who make decisions about someone else's job. That's how you send a number into the world while technically telling the truth. Nobody lied. Nobody had to. That's the beauty of it.
Billions have been raised on a promise that is architecturally impossible to keep. The system interpolates within its training distribution. Outside that distribution, it collapses. This is in the papers. The people raising the money read the papers. They raise the money anyway.
Somewhere, someone in an organization decides which roles are "AGI-proof." That decision is based on what Huang said on a podcast. The person who gets laid off doesn't know Grok scored zero. They're told the timing makes sense. That it's not personal.
It's never personal.
Nvidia. OpenAI. Microsoft. Arm. The names are in the article. At the moment they need to be spoken as an answer, the piece reaches for a rhetorical question instead. That's how you protect someone while pretending to hold them accountable.