The AGI benchmark has a winner: a 5-year-old, not the billion-dollar software

AGI achieved, score just above zero. How does a sentence like that survive a press conference?

The benchmark that broke itself

Huang went on a podcast and said AGI has been achieved. Two days later, his competitor scored 0.37 percent on the only test that takes that claim seriously. Grok scored zero. Not close to zero. Zero. Every five-year-old did better, without instructions, without training, without a funding round.

This is not a mistake. This is the product.

Huang sells chips. Altman sells subscriptions. Arm named a processor the “AGI CPU.” The language does exactly what it’s supposed to do: raise money from people who understand they’ll miss out if they don’t move now. Nothing has gone wrong. Nothing needs fixing. Welcome.

How numbers survive the truth

ARC-AGI-1 fell. ARC-AGI-2 fell. Each time, the labs threw compute and training data at the problem until the benchmark was dead. Now that there’s no training data to throw, the system scores 0.37 percent. They call that a methodological debate. Of course they do.

The Duke harness pushed Claude to 97.1 percent on one variant. One. Out of a hundred and thirty-five. The official score stayed at 0.25 percent. But 97.1 exists now. It circulates. It shows up in presentations, in press releases, in conversations with people who make decisions about someone else’s job. That’s how you send a number into the world while technically telling the truth. Nobody lied. Nobody had to. That’s the beauty of it.

Billions have been raised on a promise that is architecturally impossible to keep. The system interpolates within its training distribution. Outside that distribution, it collapses. This is in the papers. The people raising the money read the papers. They raise the money anyway.

Somewhere, someone in an organization decides which roles are “AGI-proof.” That decision is based on what Huang said on a podcast. The person who gets laid off doesn’t know Grok scored zero. They’re told the timing makes sense. That it’s not personal.

It’s never personal.

Nvidia. OpenAI. Microsoft. Arm. The names are in the article. At the moment they need to be spoken as an answer, the piece reaches for a rhetorical question instead. That’s how you protect someone while pretending to hold them accountable.