Google DeepMind's Raia Hadsell is building the reasoning engine that could make current AI look like a calculator

Google DeepMind VP of Research Raia Hadsell is leading an aggressive push toward “System 2” AI thinking, using reinforcement learning and synthetic data at a scale that could fundamentally change what Gemini is capable of.

The AI industry has spent years scaling compute and harvesting internet text, and it has hit a wall. Models hallucinate, stumble on multi-step logic, and struggle with anything that requires genuine planning rather than sophisticated pattern matching. Raia Hadsell, Google DeepMind’s VP of Research, is the person tasked with solving that problem, and the approach her team is taking suggests the next version of Gemini will be a fundamentally different kind of system than what came before.

The technical framing Hadsell’s team is working within is the distinction between System 1 and System 2 cognition, borrowed from behavioral economics, where System 1 is fast and intuitive and System 2 is slow, deliberate, and logical. Current large language models are essentially System 1 machines: extraordinarily fast at pattern completion, but structurally incapable of pausing to verify their own reasoning. The research agenda at DeepMind is aimed at building the scaffolding for System 2, which means AI that reasons rather than predicts.

One of the more striking details from recent technical briefings is the scale of synthetic training data now feeding Gemini’s development pipeline, reportedly exceeding trillions of tokens. This is a direct response to the looming data wall: the supply of high-quality human-generated internet text is finite and increasingly exhausted as a training source. By generating and verifying synthetic data internally, DeepMind can create training environments where models check their own reasoning chains rather than simply absorbing text correlations. It is a shift in the epistemology of how these systems learn.
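The generate-and-verify idea is easy to sketch in miniature. The toy below is an illustration, not DeepMind's actual pipeline: a hypothetical `propose_example` stands in for a model emitting candidate training pairs (which are sometimes wrong, as model outputs are), and a programmatic `verify` check filters them before they enter the synthetic set.

```python
import random

def propose_example(rng: random.Random) -> tuple[str, int]:
    """Stand-in for a model proposing a (question, answer) training pair.
    Deliberately imperfect: about 20% of proposed answers are wrong."""
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    ans = a + b if rng.random() < 0.8 else a + b + rng.randint(1, 5)
    return f"{a} + {b} = ?", ans

def verify(question: str, answer: int) -> bool:
    """Programmatic checker: recompute the answer and compare."""
    a, b = (int(x) for x in question.rstrip(" =?").split(" + "))
    return a + b == answer

def build_synthetic_set(n: int, seed: int = 0) -> list[tuple[str, int]]:
    """Keep proposing examples, admitting only verified ones."""
    rng = random.Random(seed)
    kept: list[tuple[str, int]] = []
    while len(kept) < n:
        q, ans = propose_example(rng)
        if verify(q, ans):  # only checked examples reach the training set
            kept.append((q, ans))
    return kept
```

The design point is the filter, not the generator: because verification is cheap and mechanical here, the resulting dataset is clean even though the proposer is unreliable, which is the property that makes self-generated data usable at scale.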

That synthetic data pipeline connects directly to what Hadsell has described as generative agents: systems capable of producing their own training scenarios, stress-testing their outputs, and iterating. The architecture builds on the foundation laid by AlphaGeometry and the Gemini 1.5 and 2.0 model families, which demonstrated that formal reasoning tasks could be approached with AI-native logic rather than retrieved human solutions.

Reinforcement Learning Enters the Gemini Pipeline

Perhaps the most consequential structural change is the direct integration of DeepMind’s reinforcement learning research into Gemini’s training pipeline. The technique mirrors what made AlphaGo work: a feedback loop in which the model plays against itself, fails, updates, and gradually learns strategies that no human trainer explicitly encoded. Applied to language and reasoning, this creates a pathway toward AI that learns to learn, generating novel approaches rather than interpolating between examples it has seen before.
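The propose-score-update loop can be illustrated with a deliberately tiny sketch. Everything here is hypothetical scaffolding, not DeepMind's training code: the `verifier` stands in for an automated reward signal, and the preference table stands in for a policy that gets reinforced whenever its own attempt succeeds, with no human-labeled examples involved.

```python
import random

def verifier(answer: int, target: int) -> float:
    """Stand-in reward: 1.0 if the attempted answer is correct, else 0.0."""
    return 1.0 if answer == target else 0.0

def self_improvement_loop(target: int, actions: list[int],
                          steps: int = 2000) -> dict[int, float]:
    """Toy version of the AlphaGo-style loop: try, score, reinforce."""
    rng = random.Random(0)
    prefs = {a: 1.0 for a in actions}  # uniform initial "policy"
    for _ in range(steps):
        # Sample an action in proportion to current preference weights.
        r, cum, choice = rng.uniform(0, sum(prefs.values())), 0.0, actions[-1]
        for a in actions:
            cum += prefs[a]
            if r <= cum:
                choice = a
                break
        # Reinforce choices the verifier rewards; wrong ones stay flat.
        prefs[choice] += 0.1 * verifier(choice, target)
    return prefs
```

Run on a four-option toy problem, the correct action's weight grows every time it is sampled and rewarded, so the policy concentrates on it without anyone ever encoding the answer into the update rule.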

Internal benchmarks, which DeepMind has not yet published in full, reportedly show meaningful reductions in error rates on complex mathematical reasoning tasks. The specifics remain closely guarded, but the directional claim is significant: the team believes it is approaching a point where reasoning capability is genuinely distinct from probabilistic text generation, not merely a better approximation of it.

What This Means for the Competitive Landscape

OpenAI and Anthropic are chasing the same bottleneck, variously described in leaks and research discussions as Q-star or Strawberry-style reasoning. The race is not just academic. Enterprise buyers are waiting for AI agents capable of autonomous scientific research, high-level strategic planning, and multi-step decision-making, none of which current models reliably deliver. Whoever cracks reasoning first is positioned to redefine what enterprise AI contracts are actually worth.

There is also a capital argument embedded in Hadsell’s work. Google has committed tens of billions to AI infrastructure, and the traditional path of simply scaling compute is producing diminishing returns on reasoning tasks. The pivot her team represents, toward scaling test-time compute (letting models think longer before answering), offers a way to extract more capability from existing hardware investments rather than indefinitely expanding them. It reframes the ROI calculation on infrastructure spend in a way that matters to investors watching Google’s AI capex with increasing scrutiny.

Hadsell’s research is not a product announcement; it is a signal about where the capability frontier is actually moving. The question worth watching is how quickly these internal benchmark gains translate into deployable Gemini features, and whether Google can move fast enough to convert research leadership into market position before its competitors close the reasoning gap.
