AI 'Godfather' Yann LeCun: LLMs Are Nearing the End, but Better AI Is Coming

Yann LeCun on How to Teach AI Some Common Sense


Yann LeCun always reminds me of the very best of Bell Labs' scientists and engineers: a unique breed of individual, fiercely independent in thought and action, who thrive within company structures that typically value obedience and conformity to the corporate mantra and goals. In my experience, corporate parents tolerate such independence of thought and deed only as the price to pay for attracting the best and brightest minds, who are the fundamental catalyst for disruptive innovation. But I find that this understanding and "détente" is increasingly rare in a modern business culture in which uniformity and alignment seem to be prized above all, possibly due to the hyperbolic nature of our times and the denigration of dissenting voices. So, as they say, props to Meta for fostering LeCun and his radically honest and contrarian voice.

This independence of view is most clearly demonstrated by LeCun's observation in our Newsweek AI Impact interview that LLMs are doomed to the proverbial technology scrap heap in a matter of years because of their inability to represent the continuous, high-dimensional spaces that characterize nearly all aspects of our world. This honest critique from one of the "Godfathers of AI" flies in the face of Meta's significant investment in, and LeCun's own contributions to, the Llama LLM family and the company's clear ambition to develop commercial solutions from such systems.

However, the argument that LeCun makes has an irrefutable logic. The essence of generative AI systems is that they are able to learn an entire body of knowledge and use that as the basis for generating "new" content of value, thanks to the massive volume of information to which they have access during training and an enigmatic inference process that produces summary output that is readily accessible and contextually relevant. But, as LeCun points out, without a model that allows this massive volume of data to be efficiently mapped into an abstract representation that constrains and constructs the rules of the domain of interest (i.e., one that can accurately simulate the world), one has to resort to learning every possible scenario and perspective in order to produce outputs that are free from unacceptable errors and robust enough to predict any next state of the space. Moreover, this must be done reliably and repeatedly over any timescale or length-scale of interest, and with the requisite foundational principles, e.g., gravity, object permanence, thermodynamics or physical and chemical laws, correctly encapsulated.

It has come as a surprise to nearly everyone, even experts in linguistics, that human language fits this requirement of being a discrete, low-dimensional space, and so LLMs have been able to digest the vast majority of everything that has ever been written and generate startlingly coherent answers to a wealth of different queries. No one knows exactly how LLMs achieve this, but it is likely through a process that both mirrors how humans learn from language, by clustering related concepts and creating dynamic concept maps, and echoes human writings, producing what David Eagleman described in an earlier Newsweek AI Impact interview as "intelligent echo illusions."

Yann LeCun AI Impact. Photo-illustration by Newsweek/Getty

LeCun and I discussed this language and conversational proficiency, and we agree that these models are more than "parrots" in that they have indeed developed some understanding of concepts through clever statistical clustering via the attention mechanism, but they cannot reason or plan without additional assistance from dedicated tools or expert agents. They appear to do so, using the intelligent echo illusion trick as well as recent techniques such as chain-of-thought reasoning. But, in reality, this is still just a statistical sampling of the possible answer space along different trajectories, which does not connote any real understanding. The apt analogy LeCun uses is of two types of students: some learn by rote memorization and do well on tests but lack the ability to answer questions beyond the learned scope, whereas others learn the underlying concepts and create mental models that they can use to answer a wide(r) range of questions. It is clear that current LLMs apply the former approach and have the attendant limitations, whereas most humans can employ either approach depending on the task at hand.
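
For readers curious about what "statistical clustering via the attention mechanism" means mechanically, here is a minimal, self-contained sketch of scaled dot-product attention in Python (NumPy only; the toy token embeddings are invented purely for illustration). Each token's output is a similarity-weighted blend of all the tokens' vectors, which is the sense in which related concepts get grouped together.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: each output row is a similarity-weighted
    mixture of the value vectors, i.e. a soft clustering of related tokens."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ values                          # blend related concepts

# Toy example: four 8-dimensional token embeddings attending to one another.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
mixed = scaled_dot_product_attention(tokens, tokens, tokens)
print(mixed.shape)  # (4, 8)
```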

A recurring theme of this series of interviews has been the wide support for Daniel Kahneman's System 1 versus System 2 framing of how the human brain operates to allow both rapid, heuristic-based, automated responses based on efficient pattern recognition (System 1) and more deliberate, analytical and exploratory cognitive processing (System 2).

LeCun argues that AI systems today are all forms of System 1. For example, an LLM produces one token after another through a fixed amount of computation, reacting only to the tokens seen so far without any larger reasoning. The addition of chain-of-thought-type reasoning is an attempt to allow contemplation but, as mentioned above, this is still just a statistical approach, exploring multiple paths with different probabilities before rendering an answer. This is, at best, what one might describe as a "System 1.1," and it is far more expensive than System 1 due to the roughly n² explorations that these methods require, rendering it inefficient relative to the alternative System 2-type approach.
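
To illustrate what "statistical sampling of the possible answer space along different trajectories" looks like in practice, here is a minimal Python sketch in the spirit of self-consistency-style chain-of-thought prompting. The sample_reasoning_path function and its answer distribution are invented placeholders standing in for a real model or API call.

```python
import random
from collections import Counter

def sample_reasoning_path(prompt: str) -> str:
    """Hypothetical stand-in for an LLM sampling one chain of thought and
    returning a final answer; in reality this would be a model or API call."""
    return random.choice(["A", "A", "A", "B", "C"])  # placeholder answer distribution

def chain_of_thought_vote(prompt: str, n_paths: int = 32) -> str:
    """Explore many trajectories and return the statistically most common
    answer: a sampling of the answer space, not reasoning over a world model."""
    tally = Counter(sample_reasoning_path(prompt) for _ in range(n_paths))
    return tally.most_common(1)[0][0]

print(chain_of_thought_vote("Which option is correct?"))
```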

Importantly, System 2 requires a model of the world to reason and plan over multiple timescales and abstraction levels to find the optimal answer. As LeCun points out, these representations must be at the right level of abstraction to make useful predictions for the task at hand. LeCun again provides an illustrative example of how a person manages to compute and execute all the tasks required to travel from New York to Paris using a combination of different world model abstractions, e.g., creating a mental map of the relative locations of New York and Paris; computing the optimal path and which modes of transportation to employ; creating a plan that combines these elements into a sequence with timescales; executing the set of actions required to make the relevant reservations; and evaluating other tasks that must be completed prior to departure, including the belongings to pack, to name just a few of the planning and reasoning tasks undertaken. This example is currently quite pertinent, as automated agents have begun to appear that offer to perform the travel-planning part of the task, leveraging a variety of different agentic services "under the hood," and do so with quite impressive proficiency. But these systems are still not modeling the world as humans do; they are compiling statistically likely routes for human digestion and selection. If they were truly behaving in a human-like way, they would be able to reuse the underlying framework for myriad different tasks at different time- and length-scales, including controlling all the necessary motion and sensing of the environment with adaptation to every experienced nuance. They would also be able to replan and re-optimize every second, minute and hour, keeping the objective clearly in mind, all while processing multiple other unrelated tasks.
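
As a purely hypothetical sketch of the kind of multi-level decomposition just described (every goal, sub-goal and timescale below is invented for illustration), a world-model-driven planner would hold something like this nested structure and replan at each level as circumstances change.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    description: str
    timescale: str                        # e.g. "weeks", "hours", "seconds"
    subgoals: list["Goal"] = field(default_factory=list)

# Invented example of a plan decomposed across abstraction levels.
trip = Goal("Travel from New York to Paris", "weeks", [
    Goal("Choose dates and book a flight", "hours", [
        Goal("Compare routes and fares", "minutes"),
        Goal("Make the reservation", "seconds"),
    ]),
    Goal("Prepare for departure", "days", [
        Goal("Pack belongings", "hours"),
        Goal("Arrange transport to the airport", "minutes"),
    ]),
])

def walk_plan(goal: Goal, depth: int = 0) -> None:
    """Traverse the plan top-down; a true world-model agent would re-optimize
    each level continuously while keeping the top-level objective fixed."""
    print("  " * depth + f"[{goal.timescale}] {goal.description}")
    for sub in goal.subgoals:
        walk_plan(sub, depth + 1)

walk_plan(trip)
```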

Importantly, LeCun highlights how all prior attempts by AI systems to use visual information directly to predict the next state(s) of a real-world scene have failed, as the direct visual information is too continuous, with too high a dimensionality in object, sensory and resolution space to allow reliable prediction—in short, there are simply too many unknowns when operating at the "sensory pixel level" in the real world. Indeed, it is no coincidence that this is not how humans process the physical world. As Eagleman points out, the brain is creating and running a phenomenally rich model of the world that only receives minimal input from the senses to "ground" it at each point in time. So, in essence, all the extraneous information not required by the model is eliminated, allowing continuous computation of trajectories through the mental model space (either real or imagined) with unparalleled efficiency.

However, it is important to note that the brain does not always employ its System 2 in this way; in fact, Kahneman argues that it desperately tries not to employ System 2 at all, as it is effortful, time-consuming and can only focus on a single task at a time, so there is a real cost associated with doing so. Instead, System 2 summarizes its prior thinking into System 1 automata to allow fast response times to familiar circumstances with minimal effort. In computing terminology, System 2 compiles its complex, high-dimensional, symbolic code into System 1 executables.
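
To make this compilation analogy concrete with a toy Python example (the planner function and its half-second "deliberation" are invented), memoization captures the same idea: an effortful routine runs once, and its result is cached so that familiar situations thereafter get an instant, automatic response.

```python
import functools
import time

def deliberate_route_planner(origin: str, destination: str) -> str:
    """Stand-in for slow, effortful System 2 deliberation (details invented)."""
    time.sleep(0.5)                       # simulate costly, single-task thought
    return f"{origin} -> hub -> {destination}"

# "Compile" the System 2 result into a System 1 reflex: cache it so familiar
# situations are handled instantly, without re-deliberating.
fast_route = functools.lru_cache(maxsize=None)(deliberate_route_planner)

print(fast_route("home", "office"))   # slow the first time (System 2 engaged)
print(fast_route("home", "office"))   # instant thereafter (System 1 habit)
```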

The absence of System 2 capabilities in current AI systems raises the question of what it will take to build such machines. Jeff Hawkins and Max Bennett, authors of recent books on the nature of human intelligence, concur with prior interviewees Rodney Brooks and David Eagleman, as well as LeCun, that the creation of human world models requires observation of, and movement in, the physical world (so-called "sensorimotor learning"). The neocortex uses these learning algorithms to build a hierarchy of representations or abstractions of the world that allows accurate prediction of the world, both as it is and as it could be.

But this process is not arbitrary or open-ended—it is driven by a clear set of goals that we have as a species, some of which are hardwired (e.g. fear or reaction to certain shapes, sounds or smells) with a direct representation of the percept. But many others are encoded as a functional requirement, which we learn to satisfy by experience. LeCun uses the example of "prevent hunger" as an objective the primitive brain possesses and which is used to trigger the neocortex to create a rich array of models for how to eliminate hunger in a wide array of different circumstances and on different timescales, e.g. planting seeds and harvesting versus hunting with tools and cooking versus procuring foodstuffs from others. This latter mechanism is a much more powerful way to operate as it allows intelligence to be leveraged to find diverse solutions. Importantly, this intelligent exploration is bounded by guardrails that provide the necessary limits for survival and efficient path selection.

Defining Future AI Systems

Given the preceding discussion, it is reasonable to argue two things about the future of AI systems:

1. They must possess a System 2–type capability that uses abstract representations of the world or domains of interest

2. They must have well-defined goals and guardrails that appropriately bound the scope of exploration

These two elements are clearly defining aspects of intelligence. I have written at length about the nature of intelligence in a previous article. LeCun chooses to define it as "a collection of skills and an ability to acquire new skills quickly, with minimal or no learning." If this is generalized to include the acquisition of new perspectives as well as new skills, then it is as good a definition as exists.

LeCun argues that language plays a role in intelligent processing, but it suffers from being too compressed and too simplified, so that a lot of the complexity of the real world just cannot be expressed in this way. Indeed, as Eagleman says, "Language is a super-compressed package of meaning that gets unpacked by the receiver," but this is almost by definition a "lossy" compression, and the subsequent unpacking and mapping onto the receiver's (different) mental model compounds the errors.

Fortunately, many types of intelligence are not language-based. For example, many engineering, scientific, or mathematical representations are pictorial or abstract. The representations we create in our neocortex of these spaces are much more robust than language-based models as they are more specific and less subjective. It is, therefore, reasonable to conjecture that the most progress will be made in building AI systems that emulate the more objective functions and domains before we tackle subjective domains, e.g., the creative arts and social dynamics.

Now, considering the second requirement above, a note of caution is in order with respect to goals for AI systems: Without a lived experience of their own, they will need to be imbued with human goals and objectives, but which ones, and whose variants? It is widely accepted that human society relies on two driving behaviors: i) physical and/or psychological dominance, and ii) intellectual predominance, also known as prestige. The former is hardwired in the primitive brain (selecting for physical and social strength) and the latter is probably hardwired into the neocortex (selecting for the strength of intelligence). But the relative interplay of these different human frameworks has not resulted in a stable human equilibrium, so it is valid to ask how we can encode such values safely into machines. Isaac Asimov's famous laws of robotics are an attempt to describe the allowed set of machine-human interactions, but they focus on physical safety alone, which is a sub-space for which it is much easier to find common ground between different people and cultures.

Leaping (or JEPA-ing) Forward

Despite the obvious challenges, LeCun is focusing his team's efforts on building such System 2 AIs, concentrating on two variants of his Joint Embedding Predictive Architecture (JEPA). One focus area is an AI system that generates software code for a task while understanding its own state, e.g., the contents of memory and registers, as the code executes, so that it can self-determine how well it is progressing towards its goal. The other system uses massive amounts of video of the world to build an abstraction of the physical world that allows successful prediction of "what happens next" in different scenarios. These efforts are currently showing promise in amplifying two common human tasks: a relatively recent cognitive task (coding), for which we have no special ability, and one of the oldest (predicting outcomes in a visual scene), for which we have exceptional ability. Time will tell whether this type of architecture, which has analogous elements to the functional centers of the human brain and their interplay, will help put us on a better path to a brave new world of amplified human intelligence.
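
For the technically inclined, the snippet below is a deliberately simplified sketch of the central JEPA idea as LeCun describes it publicly: predict the abstract representation of the next observation rather than its raw pixels. It is written in Python with PyTorch, and the network sizes, random "observations" and training details are placeholders of my own, not Meta's actual models; real JEPA training adds further machinery (for instance, to prevent the learned representations from collapsing) that is omitted here.

```python
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    """Toy joint-embedding predictive model: encode observations into a
    low-dimensional latent space and learn to predict the *next* latent,
    never the next frame's pixels."""
    def __init__(self, obs_dim: int = 1024, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.predictor = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                       nn.Linear(128, latent_dim))

    def loss(self, obs_now: torch.Tensor, obs_next: torch.Tensor) -> torch.Tensor:
        z_now = self.encoder(obs_now)
        with torch.no_grad():                  # target is a representation, not pixels
            z_target = self.encoder(obs_next)
        z_pred = self.predictor(z_now)         # prediction in abstract space
        return nn.functional.mse_loss(z_pred, z_target)

# Placeholder "observations": batches of flattened frames.
model = TinyJEPA()
obs_now, obs_next = torch.randn(8, 1024), torch.randn(8, 1024)
print(model.loss(obs_now, obs_next).item())
```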

But beyond playing a pioneering role in what is developed, LeCun is also clear on how these models should be released and refined. He remains a passionate advocate for open source, as he sees it as vital to allow the creation of the requisite diversity of models and transparency regarding the model operation, as well as to support reliable, inclusive, complete, contextual representations of reality (or realities) with the right guardrails for any culture. Furthermore, he argues that open-sourcing is essential for national sovereignty, so that even if only the richest countries or companies are able to afford to develop the foundational models, anyone can build on top of these models at a lower cost and with the highest degree of certainty that their value sets are reliably represented.

Wrapping up, the combined views expressed in this series so far make abundantly clear that the prevailing wisdom is that LLMs are not the primary path forward for AI, as they don't have a model for anything other than language, which is a relatively simple discrete space. More provocatively, LeCun argues that LLMs will be more or less obsolete in five years, replaced by new models, such as the JEPA-based models he is developing, that have an understanding of the physical world and System 2-type abilities. It is also acknowledged that humans will not be able to comprehend the internal representations used by these AI systems, as they will not, in all likelihood, be built on humanlike neocortical substrates and so will not be exactly analogous to the human "way of thinking." But, similar to Eagleman's "Team of Rivals," LeCun argues that there is going to be an interactive "Society of Machines" whose members keep an eye on one another, with one machine taking another down when it misbehaves or goes rogue, and yet another explaining the essential operation and decision-making in human terms.

LeCun also mirrors Eagleman's and Brooks' views that foresee the emergence of a new social reality in which humanity steps up in the hierarchy and people increasingly manage AI systems more than they manage other people. The key point in this conception is that although these AIs will be smarter than us in many domains, they will sit below us in this new hierarchy and will therefore be required to do our bidding, constrained by the objectives we set and the guardrails we impose. Furthermore, LeCun concurs with Hawkins' observation that we should not attribute too much power to intelligence, since physical, psychological or biochemical strength is arguably more dangerous than intelligence. If this were not the case, a virus like COVID-19 would not have been able to bring the human population to its knees and, conversely, the most intelligent people would always be in positions of power, which, it is increasingly evident, is not the norm.

Netting this all out, one can only conclude that AI will absolutely not lead to human extinction, due both to the limited practical impact of intelligence on societal evolution and to the embedded guardrails in these systems that will ensure human-compatible operation is not violated. In human terms: There will be no "free will" in the future of AI systems.

About the writer

Marcus Weldon, a Newsweek senior contributing editor, is the former president of Bell Labs and a leader with the ability to connect pioneering research to innovative product development and novel business strategies. Previously, he was the chief technology officer at Nokia and at Alcatel-Lucent and Lucent Technologies. He has a Ph.D. in physical chemistry from Harvard University, and he served as the Neil Armstrong Visiting Professor at Purdue University in 2023 and 2024.

