With the support of Meta, and the unwitting assistance of the press, Yann LeCun has for the last decade run one of the more successful PR campaigns in recent scientific history, persistently allowing himself to be painted as the inventor of ideas, techniques, and arguments that he did not in fact originate. The culmination of that PR campaign came Friday: a puff piece in The Wall Street Journal, tied to a new startup LeCun is apparently launching, under a grossly misleading headline that styled him as a lone genius.

The headline is wrong in almost every way possible: LeCun has not been right about everything, or even consistent in his own views over time, particularly with respect to LLMs (see below). And on many of the points he is now pressing, he is far from alone, though the headlines sometimes paint him that way, and far from the first to raise them.

In fact, most of what LeCun says has a long history that precedes him. And in every way possible he conveniently ignores that history, in order to exaggerate the originality of his ideas. By and large he has been effective at this. The myth of him as lone genius has worked for him, at least in the popular imagination.

But it is not true — or even remotely so.

For the most part, LeCun is nowadays known in the general public for five ideas: convolutional neural networks, his critique of large language models, his critique of the scaling hypothesis, his advocacy of commonsense and physical reasoning, and his advocacy of world models. The thing is, (a) he originated exactly none of these ideas, and (b) he rarely if ever credits any of the people who actually did. This reflects a consistent pattern known as the plagiarism of ideas.

Per the US Office of Research Integrity, the plagiarism of ideas is defined as “Appropriating someone else’s idea (e.g., an explanation, a theory, a conclusion, a hypothesis, a metaphor) in whole or in part, or with superficial modifications without giving credit to its originator”. He has done this over and over and over.

Convolutional neural networks (CNNs) are, without a doubt, a foundational contribution to AI; they have found applications in image recognition, speech recognition, natural language processing, recommendation systems, and many other areas. Until large language models became dominant, they were one of the leading techniques used in machine learning. (The LSTM, published by Hochreiter and Schmidhuber in 1997, was also in very widespread use, exceeding CNNs in commercial use according to one study.) And there is no doubt that LeCun played a role in developing convolutional neural networks. But he neither invented them nor was he the first to apply the back-propagation algorithm to learning their weights (though many people mistakenly believe this).

The foundational work was done by Kunihiko Fukushima in 1979-1980; Wei Zhang et al. beat LeCun to adding back-propagation to convolutional neural networks in little-known work published in Japanese (with an English abstract) in 1988. LeCun had the good fortune of publishing in more prominent places, in English, the following year, and he devised important tricks for improving their performance, but his work wasn't first. LeCun rarely mentions his predecessors.

Schmidhuber has documented this ongoing pattern numerous times, presenting receipts in a short history of convolutional neural networks, in documentation of how a key paper by LeCun neglects critical past work, in an explication of how a list by LeCun of key recent inventions again neglected critical past work, and in a detailed discussion of LeCun and his collaborators' consistent omissions of previous work.

Yet LeCun often slights Zhang’s pioneering work; strikingly, no mention of it was made at all in LeCun’s most cited survey.

The Wall Street Journal article is in part about LeCun’s critique of LLMs, and his critique has been noted in the press numerous times. In no way was LeCun there first, either.

I was likely the first, with a series of challenges to GPT-2 and GPT-3 on Twitter in the fall of 2019 and in a series of articles in 2019 and 2020.

What is striking is that LeCun was, at the time, publicly hostile to those critiques, accusing me of a “rearguard action.” I spent much of the next several years arguing against LLMs; LeCun frequently tussled with me, and never once publicly supported my critique. (It was only when ChatGPT eclipsed Meta that LeCun began to be sharply, publicly critical of LLMs.) LeCun has in fact probably never cited my critiques, and has always presented his criticisms as if they were his own original ideas.

Likewise, LeCun rarely if ever cites the prominent 2021 stochastic parrots paper by Emily Bender et al., an influential and important critique of LLMs that also preceded the era in which LeCun began to loudly critique LLMs.

Over a year later, in November 2022, LeCun was still loudly promoting (his company’s) LLMs as “amazing work”:

He only fully changed his mind a few weeks later, after Galactica tanked and ChatGPT ate his lunch. Consistent “for 40 years” he has not been.

Most ludicrous of all is the WSJ portrayal of LeCun as somehow isolated in doubting that LLMs can reach AGI. A recent AAAI survey showed that this is in fact the majority opinion among a broad sampling of researchers and academic scientists, by a wide margin.

LeCun is also making waves for his criticism of scaling. The Wall Street Journal reports:

“We are not going to get to human-level AI just by scaling LLMs,” [LeCun] said on Alex Kantrowitz’s Big Technology podcast this spring. “There’s no way, absolutely no way, and whatever you can hear from some of my more adventurous colleagues, it’s not going to happen within the next two years. There’s absolutely no way in hell to–pardon my French.”

But, again, LeCun wasn’t here first. Instead, I was probably the first person to doubt this publicly, back in 2022:

There are serious holes in the scaling argument. To begin with, the measures that have scaled have not captured what we desperately need to improve: genuine comprehension…

What’s more, the so-called scaling laws aren’t universal laws like gravity but rather mere observations that might not hold forever, much like Moore’s law, a trend in computer chip production that held for decades but arguably began to slow a decade ago.

At the time LeCun was hardly supportive; instead he trolled my critique on Facebook. In no way has LeCun ever publicly acknowledged that I was on target with my conjecture that scaling LLMs would not lead to AGI. To the contrary, LeCun has repeatedly pretended that the anti-scaling idea originated with him, and he continues to falsely paint himself as the first and lone person to see this.

Another argument LeCun has made often in recent years is that LLMs lack common sense and are poor at physical reasoning. For years, though, he hardly seemed to emphasize the problem at all; in LeCun’s famous 2015 Nature paper on deep learning, common sense is mentioned only once, in passing, with zero citations.

In reality, others whom he rarely or never cites had been concerned about the problem for decades: John McCarthy (one of AI’s actual godfathers) in the late 1950s, Pat Hayes in the 1970s and 1980s, and, over the last two decades, my long-term collaborator Ernest Davis, who has literally been in the same department as LeCun at NYU for the last quarter century. Yet LeCun mentions Davis’s work on commonsense reasoning scandalously rarely.

Also mentioned in the WSJ article is that LeCun is excited about world models, “a technology that LeCun thinks is more likely to advance the state of AI than Meta’s current language models”.

As it happens, the idea of world models is not new. As noted here a few months ago, it goes back to the 1950s and Herb Simon’s General Problem Solver. Schmidhuber has been advocating adding world models to neural networks since as far back as 1990, in important technical work in 2015, and in a more recent article with David Ha, now CEO of the very well-funded Sakana AI. True to form, LeCun rarely refers to this work in his public presentations.

Likewise, Ernest Davis and I argued strenuously that the field should pay more attention to world (cognitive) models in our 2019 book Rebooting AI, which LeCun was dismissive of. Challenges with LLMs’ ability to represent world models have been central to my own critiques of LLMs since the fall of 2019, and were one of four key foci in my 2020 article The Next Decade in AI, perhaps the first lengthy discussion of the need for integrating world (cognitive) models specifically with LLMs. I don’t believe LeCun has ever acknowledged any of this, except in 2019 when he originally dismissed my claims.

Similarly, Fei-Fei Li is building a world-model-focused AI startup, World Labs; based on past experience, I will be surprised if LeCun ever gives her endeavors much mention.

Someone on X summed it up well Saturday:

The eminent AI researcher Hector Zenil made similar points on LinkedIn on Sunday morning:

Yann LeCun has, without a doubt, made genuine contributions to AI, and I am pleased to see him speak out about the limits of LLMs. But he has also systematically dismissed and ignored the work of others for years, including Schmidhuber, Fukushima, Zhang, Bender, Li, and myself, in order to exaggerate his own contributions. With the help of Meta’s media lobby he has succeeded in fooling most of the press and some fraction of the public. LeCun has lapped it up, and done absolutely nothing to set the record straight.

But the myths about the originality of his thought simply aren’t true.

Whether he can produce genuinely original ideas in his new startup remains to be seen.

Gary Marcus began critiquing traditional neural networks and calling for hybrid neurosymbolic architectures in his first publication, in 1992, and advocated vociferously for neurosymbolic cognitive models in his 2001 book The Algebraic Mind, in which he anticipated current troubles with hallucinations and unreliable reasoning. He first warned that these limits would apply to LLMs in 2019, emphasizing their lack of stable world models.