Depending on who you ask in Silicon Valley or beyond, artificial general intelligence, or AGI, is either a finished product, a work in progress, or a total misunderstanding of what intelligence actually is.

“I’d say I’m basically 70 to 80% there,” OpenAI co-founder and president Greg Brockman said this week, describing how close he believes we are to AGI: AI that can match or exceed human-level performance across virtually any intellectual task. “I think it’s extremely clear that we are going to have AGI within the next couple years in a way that is still going to be jagged, but that the floor of task will just be almost for any intellectual task of how you use your computer, the AI will be able to do that.”

His remarks echo a line from OpenAI co-founder Andrej Karpathy, who noted that state-of-the-art LLMs can “perform extremely impressive tasks (e.g. solve complex math problems)” while simultaneously struggling with “some very dumb problems.”

Meanwhile, Nvidia CEO Jensen Huang recently went further, telling Lex Fridman that AGI has already been achieved, though his working definition was modest: an AI agent that creates a briefly viral app worth a billion dollars before it “kind of dies away.” Building an actual Nvidia? “The odds of 100,000 of those agents building Nvidia is 0%,” Huang conceded.

But Gary Marcus, a cognitive scientist, NYU professor emeritus, and one of the AI field’s most devoted curmudgeons (he also built and sold an AI company to Uber), says the entire framing is wrong.

“We as a society are placing truly massive bets around the premise that AGI is close,” Marcus said in an October 2025 keynote at the Royal Society in London, posted online in March 2026. “I am talking about literally a trillion dollar bet.” The bet, he argues, rests on a confusion: “Large language models are deeply flawed imitators that are preying on the Eliza effect,” the psychological phenomenon in which users treat programs as if they have human-like capabilities.

‘Spud’ and the scaling argument

OpenAI has reportedly just completed a new pre-training run codenamed Spud, and Brockman sees it as the distillation of years of research. “We have maybe two years’ worth of research that is coming to fruition in this model,” Brockman told Big Technology’s Alex Kantrowitz. “It’s going to be very exciting.”

He said OpenAI had achieved something of a domino effect with its latest AI model, where a degree of recursion simplifies model development: “When we improve the pre-training, it makes all the other steps much easier. It’s a model that is more capable to start. When it’s trying out different ideas and learning from its own mistakes, that process just is faster. It needs to make fewer mistakes.”

The models, he said, are already transforming real work. “New model releases really went from the AI being able to do like 20% of your tasks to like 80%. And that was this massive shift because it went from being kind of a nice thing to do, to you absolutely need to retool your workflow around these AIs.”

“We had this result recently where a physicist had been working on a problem for some time. He gave it to our model. 12 hours later we have a solution,” Brockman said. “And he said this is the first time he’d seen a model where he felt like it was thinking; that this is a problem that maybe humanity would never solve and our AI solved it.”

A contrarian take

Marcus has heard this pitch before. For more than two decades, in fact. “What I’ve heard for a quarter century is ‘we’re working on it, we’re going to solve it next year with a little bit more data,’” Marcus said. “I think by now maybe we can see that that’s not really the case.” His argument is that scaling, the principle that more data and more compute reliably produce smarter models, is an empirical trend that has already begun to fade.
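
For context, the “scaling laws” at issue are empirical curve fits, not physical laws: researchers fit power-law relationships between a model’s test loss, its parameter count, and its training data, then extrapolate. A common additive form, similar to the one used in DeepMind’s Chinchilla analysis, is sketched below; the symbols are illustrative placeholders rather than figures cited by either side of this debate.

```latex
% Illustrative empirical scaling-law form: loss falls as a power law in
% parameter count N and training tokens D, toward an irreducible floor E.
% All constants (E, A, B, alpha, beta) are fitted to past training runs,
% which is Marcus's point: the curve is descriptive, not guaranteed.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```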

“Sam Altman talks about scaling laws like they’re a property of the universe. They aren’t. They’re empirical observations, like Moore’s Law. And Moore’s Law ran out,” Marcus said. He pointed to Meta’s Llama 4, which “just wasn’t that good” and “fell off the curve.” In any case, the model was controversial. And then GPT-5: “Altman spent three years campaigning for this… when it came out, it was just okay.”

Here, Marcus is being charitable. At launch, GPT-5 was met with considerable backlash from some quarters. Marcus predicted GPT-5 would be underwhelming, and for years had signaled that the scaling hypothesis was running out of steam.

“I argued in 2022 in a paper that basically got me excommunicated from the AI field, I’m not even kidding, that scaling was not a law of the universe and that it was not going to solve the core problems,” Marcus said, referring to his Nautilus publication.

Then, in a development Marcus relishes, one of the field’s most respected voices came around. Richard Sutton, the reinforcement learning pioneer whose influential essay “The Bitter Lesson” argued that brute-force compute always wins, publicly reversed course on LLMs.

“You were never alone, Gary, though you were the first to bite the bullet, to fight the good fight and make the argument well, again and again, for the limitations of LLMs,” Sutton wrote. “I salute you for this good service.”

“This is the man who invented the bitter lesson,” Marcus told the audience, “and he came to see that LLMs were not it.”

The jaggedness question

Brockman acknowledges that current models are uneven. “The technology we have right now is very jagged. It is absolutely superhuman at many tasks. When it comes to writing code, those kinds of things, the AI can just do it. But there’s some very basic tasks that a human can do that our AI still struggles with.”

Brockman described an engineer on his team who went from being unable to get the AI to handle “low-level hardcore systems engineering” to giving it a design doc and watching it “actually implement it, add metrics, observability, run the profiler, improve it to the point that it’s the exact thing that he was hoping to produce.” The change came between model versions. “It’s almost slowly, slowly, slowly, all at once.”

“Anybody who studies these systems systematically, especially if they have a background in cognitive science, will realize that they do stupid things all the time,” Marcus said. He rattled off examples: models that can’t label parts of an elephant, that produce five hands when asked for three, that generate wiring diagrams “that might actually kill people if they took them seriously.”

The problem, Marcus said, is structural. “When you take these systems out of the kinds of things they’ve been trained on, they can often have problems. I wrote about this in 1998. We’ve seen this over and over and over again.”

One model or many minds?

Brockman is betting on unification: one model architecture that handles everything, and OpenAI is currently building a so-called “superapp” that fuses disparate AI functions together. “The pretty wild thing about what AGI is, is that sometimes these very different-looking applications, between speech to speech, image generation, text, it’s all kind of one model and we just sort of tweak that in slightly different ways.”

OpenAI pulled back from Sora, its video generation model, precisely because it ran on a different tech tree… and was very expensive. “If you branch too far and you have two different artifacts, that is very hard to sustain in a world where there is limited compute,” Brockman said. The future, in his view, is “one AI layer that can be pointed at specific applications in a very thin way.”

Marcus rejects this premise. “There’s no one way the mind works because the mind is not one thing,” he said, quoting two cognitive psychologists. “Instead, the mind has parts, and different parts of the mind operate in different ways.”

“The whole LLM hypothesis is that the mind works in one way: by using this thing called attention and looking over large amounts of data and doing sequence prediction. This is just one of the things the mind does.”

His alternative? Modularity and hybrid architectures. “The very idea of AGI, of artificial general intelligence, might not even be the right goal, at least right now.”

His exhibit A: AlphaFold 3, DeepMind’s protein structure prediction system. “My favorite example of any AI that’s ever been built. It does not try to write sonnets or evaluate sonnets. All it does is take a sequence of DNA nucleotides and predict where a bunch of atoms will be. It’s about as specialized as you can get, and it’s fantastic. Every biologist in the world knows about it and many of them use it every day.”

“Maybe we don’t need AGI right now,” Marcus said.