Everyone talks about artificial general intelligence. Tech leaders promise it's around the corner. Sceptics say it's a fantasy. But here's the awkward truth at the heart of the debate: nobody can agree on what AGI actually means, let alone how to measure it.

Google DeepMind is trying to change that. The research lab this week published a paper, "Measuring Progress Toward AGI: A Cognitive Taxonomy," laying out a science-backed framework for evaluating how close AI systems are to genuine general intelligence. Alongside it, DeepMind has launched a $200,000 Kaggle hackathon inviting researchers worldwide to help build the benchmarks that would put the framework into practice.

Defining intelligence, piece by piece

Rather than treating intelligence as a single score on a leaderboard, the DeepMind team drew on decades of research from psychology, neuroscience and cognitive science to break general intelligence into ten distinct cognitive abilities.

Eight are foundational building blocks: perception, generation, attention, learning, memory, reasoning, metacognition (awareness of one's own thinking), and executive functions (planning, flexibility and inhibition). These combine into two higher-order faculties: problem solving and social cognition — the ability to read social situations and respond appropriately.
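To make the taxonomy concrete, here is a minimal sketch (in Python, purely illustrative; the ability names come from the paper's list, the scores and the class itself are hypothetical, not DeepMind code) of what a per-ability cognitive profile might look like as a data structure:

```python
from dataclasses import dataclass, fields

# The eight foundational abilities plus the two higher-order faculties
# named in the taxonomy. Scores are hypothetical, expressed relative to
# a human baseline (1.0 = median human performance on matched tasks).
@dataclass
class CognitiveProfile:
    perception: float
    generation: float
    attention: float
    learning: float
    memory: float
    reasoning: float
    metacognition: float
    executive_functions: float
    problem_solving: float
    social_cognition: float

    def weakest(self) -> str:
        """Name of the ability with the lowest relative score."""
        return min(fields(self), key=lambda f: getattr(self, f.name)).name

# Illustrative numbers only.
profile = CognitiveProfile(
    perception=1.1, generation=1.0, attention=0.8, learning=0.6,
    memory=1.2, reasoning=0.9, metacognition=0.5, executive_functions=0.7,
    problem_solving=0.85, social_cognition=0.6,
)
print(profile.weakest())  # -> "metacognition"
```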

The idea is straightforward: if we can measure how an AI performs across all ten abilities, relative to how humans perform on the same tasks, we get a much clearer picture of where the technology actually stands.

A three-step test

DeepMind proposes a three-stage evaluation protocol. First, run AI systems through a broad suite of cognitive tasks covering each of the ten abilities, using held-out test data to prevent contamination — a persistent problem with existing benchmarks. Second, collect human baselines from a demographically representative sample of adults performing the same tasks. Third, map each AI system's performance against the human distribution.
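The third step, mapping a model's score onto the human distribution, is the part most leaderboards skip. A minimal sketch of what that mapping could look like, assuming per-task scores and a sample of human baseline scores (the function name, variable names and numbers are illustrative, not taken from the paper):

```python
import numpy as np

def percentile_vs_humans(ai_score: float, human_scores: list[float]) -> float:
    """Fraction of the human baseline sample that the AI score meets or exceeds."""
    humans = np.asarray(human_scores, dtype=float)
    return float((humans <= ai_score).mean())

# Hypothetical numbers: the AI scores 0.72 on a reasoning task battery,
# against a baseline sample from a demographically representative adult pool.
human_baseline = [0.55, 0.61, 0.68, 0.70, 0.74, 0.79, 0.83, 0.90]
print(percentile_vs_humans(0.72, human_baseline))  # -> 0.5, i.e. matches half the sample
```

Repeating this per ability, rather than averaging everything into one number, is what yields the profile described below.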

The result would not be a single "intelligence score" but a detailed profile showing where AI matches, exceeds or falls short of human cognitive ability.

The hackathon: help wanted

A framework on paper is one thing. Turning it into real, usable tests is another — and that's where the hackathon comes in. DeepMind has partnered with Kaggle to crowdsource evaluations for the five cognitive abilities where current benchmarks are weakest: learning, metacognition, attention, executive functions and social cognition.

The competition carries a $200,000 prize pool, with $10,000 going to each of the top two submissions in each of the five tracks and $25,000 grand prizes for the four best overall entries. Submissions are open from 17 March through 16 April, with winners announced on 1 June.

Participants can use Kaggle's Community Benchmarks platform to test their evaluations against frontier AI models — a practical sandbox for some genuinely tricky questions. How do you test whether an AI is aware of its own reasoning limitations? How do you measure cognitive flexibility under pressure?
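On the first of those questions, one angle researchers often reach for (purely illustrative here, not something the framework prescribes) is calibration: ask the model to state a confidence alongside each answer, then measure how far those confidences drift from its actual accuracy. A minimal sketch, with hypothetical names and numbers:

```python
# Hypothetical calibration check: a well-calibrated model's stated confidences
# track its observed accuracy, giving an error near 0.
def calibration_error(confidences: list[float], correct: list[bool]) -> float:
    """Weighted mean gap between stated confidence and observed accuracy,
    computed over coarse confidence buckets."""
    buckets: dict[int, list[tuple[float, bool]]] = {}
    for conf, ok in zip(confidences, correct):
        buckets.setdefault(int(conf * 10), []).append((conf, ok))
    gaps = []
    for items in buckets.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(ok for _, ok in items) / len(items)
        gaps.append(abs(avg_conf - accuracy) * len(items))
    return sum(gaps) / len(confidences)

# Illustrative run: a model that is overconfident on the harder questions.
print(calibration_error([0.9, 0.9, 0.6, 0.3], [True, False, True, False]))
```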

Why it matters now

The timing is notable. As AI companies race to release ever more capable models, the question of what "capable" really means keeps getting murkier. Traditional benchmarks — the standardised tests that dominate AI leaderboards — are increasingly criticised for measuring knowledge retrieval rather than genuine thinking, and for being vulnerable to data contamination.

DeepMind's framework doesn't claim to have all the answers. AGI remains, by most expert assessments, a distant prospect. But the team hopes to "move the conversation around AGI from one of subjective claims and speculation toward a grounded, measurable scientific endeavour."

In a field often driven by hype, that sounds like a step worth taking.