With the rise of LLM's like ChatGPT, the commentary has only gotten worse. We see the same kinds of mistakes and bad argumentation tactics that have been used for decades. Here are some of them.
- Commenters will always say things like "It looks like an LLM is thinking, but it's not really thinking." But they'll never explain what the difference is between "thinking" and "really thinking".
- Commenters will always say things like "It's not really thinking, it's just X", where X is some mechanical or physical process. This is a mistake that goes back to Leibniz, and just amounts to a categorical denial that atomic computational processes can give rise to minds.
- Whenever a new advance in AI comes along and falsifies a prediction of the form "computers, lacking true intelligence, will never do X", commenters won't admit that computers do have "true intelligence" after all. They'll just say, "Oh, I guess I was wrong; doing X doesn't require true intelligence."
- Commenters will use vague words like "understand" and "think", but never provide any definitive tests by which we could determine whether a computational system actually does these things.
- Commenters will always compare AI only to the most competent humans, never to the average person. For example, when ChatGPT matches or slightly exceeds average human performance on some benchmark, this will not be regarded as evidence of thought. "Their results are not always trustworthy!" people will say of LLM's, ignoring the fact that the same is true of people, encyclopedias, and every other source of information we depend on.
- Commenters will claim that some new AI implementation lacks some quality of humans. Maybe the AI is not "embodied", or "lacks feelings", or "doesn't have beliefs", or "lacks ground truth". But they won't provide any clean, direct argument that these things are needed to think or be intelligent. It's the old "airplanes aren't really flying because they don't flap their wings like birds" fallacy. This is where a background in the theory of computation helps. One of the first things you learn is that one computational model can simulate another; gears, transistors, and neurons are not fundamental aspects of computation (see the sketch below).
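To make the simulation point concrete, here is a minimal sketch of my own (a toy illustration, not anything drawn from the commentators I'm criticizing): a Turing machine specified purely as a transition table, run by a few lines of Python. The particular machine below just flips the bits of its input; the point is that nothing in the table cares whether the steps are carried out by transistors, gears, or neurons.

```python
# A tiny Turing machine simulator: the machine is nothing but a transition
# table, and the table says nothing about gears, transistors, or neurons.
# The example machine below (my own toy) flips every bit of its input.

def run_tm(transitions, tape, state="start", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine given as a dict:
    (state, symbol) -> (new_state, new_symbol, move), with move in {-1, +1}."""
    tape = dict(enumerate(tape))   # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, blank)
        state, tape[head], move = transitions[(state, symbol)]
        head += move
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# Transition table for the bit-flipping machine.
FLIP = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt",  "_", +1),
}

print(run_tm(FLIP, "010011"))   # prints 101100
```

Any substrate that faithfully follows the table computes the same function; that substrate-independence is exactly what the airplane/bird comparison misses.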
This is not to say that commenters like Bergstrom and West are entirely wrong. Some of the things they say make sense. They talk about the limitations of some current LLM's, and they warn about the dangers of relying on current LLM's without checking their results independently. They caution that putting LLM's in charge of critical systems is dangerous. They argue that use of LLM's by students may result in decreased educational gain. I agree with all these things. But they also make mistakes, and they use some of the tactics mentioned above.
I will quote some passages from the current (July 12 2025) version of their online course, followed by my commentary.
#1: "Given a string of words, you guessed the next one in the sequence. This is basically all that ChatGPT and other LLMs are doing."
This is not true, or at best it is an oversimplification, and it illustrates one of my main criticisms: they make assertions about LLM's that are meant to apply to all such models, but that don't. For example, o3 does much more than simply "guess the next one". It appears to do step-by-step reasoning, it can back up its claims with cited references, and it narrates the steps of its reasoning so you can follow along.
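To see why "guessing the next word" describes a mechanism rather than a capability, here is a deliberately crude next-word predictor of my own: a bigram counter over a tiny corpus. It really does nothing but guess the next word, and the gulf between its output and what a frontier model produces shows how little the slogan by itself tells us.

```python
# A toy next-word predictor (my own illustration, not how GPT-style models
# are actually built): count which word follows which in a training text,
# then repeatedly emit the most likely next word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# "Training": count bigram frequencies.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(word, n=6):
    out = [word]
    for _ in range(n):
        if word not in following:
            break
        word = following[word].most_common(1)[0][0]  # greedy choice
        out.append(word)
    return " ".join(out)

print(generate("the"))   # prints something like: the cat sat on the cat sat
```

Real LLM's replace the bigram table with an enormous network trained on a vast corpus, but the phrase "it just predicts the next word" applies equally to this toy and to o3, which is exactly why the phrase explains so little.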
This assertion of Bergstrom and West is also a classic example of the Leibniz mill fallacy, one that was well-satirized here.
#2: "They don’t reason the way that people do. They don’t have any sort of embodied understanding of the world." and then later on "It doesn’t ‘think’ about your question the way a person does."
Perhaps this is true, perhaps not. But they don't know with certainty, for example, that next-word prediction plays no part in most human interactions; I would argue that it does! The assertions are also misleading because of an implication hiding in the background, one for which they offer no argument at all: the suggestion that these differences amount to some fundamental limitation on what LLM's can do. Even if we grant that LLM's don't reason the way people do and lack an "embodied understanding of the world", that conclusion doesn't follow. LLM's produce correct results at a rate that can't be explained by chance alone.
#3: "They don’t even have a fundamental sense of truth and falsehood."
This criticism fails because they don't say what it would mean for anything (a person, a machine) to have a "fundamental sense" about anything. Suppose I say in response, actually, some LLM's do indeed have a "fundamental sense of truth and falsehood". How could we test that assertion? What method would they use? I think the assertion can be judged as "not even wrong", to use a phrase of Pauli.
#4: "But don’t let the impressive capabilities of LLMs lure you into thinking that they understand human experience or are capable of logical reasoning."
Two claims here: one, that LLM's are not "capable of logical reasoning", is simply wrong for some of them; the other, that they can't "understand human experience", is not even wrong. The reason it is "not even wrong" is that they offer no definition of "understand" for which we could carry out a test. "Understand" is such a vague word that one can make such assertions without ever backing them up.
#5: "It's not intelligent. It doesn't understand anything. It doesn't think."
Yet more assertions made without any supporting evidence. And assertions using words with vague, complicated, and multifaceted meanings like "intelligent", "understand", "think".
I would argue that, in any reasonable understanding of these words, some LLM's are intelligent. It is reasonable to say that many of them do understand things, and that many of them do indeed think. The fact that Bergstrom and West themselves use these words and then characterize this use as "fall[ing] in[to] a trap" illustrates that the colloquial use of these words to describe what LLM's do is, in fact, quite reasonable.
But if we wish to argue about these things, I think there is a need to provide some definitive tests by which we could decide:
- Is this particular LLM intelligent? How intelligent?
- Does this particular LLM understand anything?
- Does this particular LLM think?
Another problem with these assertions is that they seem to suggest that, for example, something is either intelligent or it is not. This is the kind of black-and-white thinking that pervades so much of the discussion about thinking machines. I have criticized it in detail in a number of my blog posts here. Why could it not be that LLM's display an intelligence that does not yet match that of humans in some areas, but outperforms them in others? Why must intelligence be reduced to a single capability? Why do we not measure it on a continuum? Intelligence is multifaceted, and many people have written about the multiple kinds of intelligence.
#6: "Teddy: Should I trust them?
Carl: No.
I think you can use them to suggest things to try, suggest questions to ask, suggest things to research, but you can't trust them to give you correct answers."
This is interesting to me because it illustrates yet another implication hiding behind the words. Here the implied syllogism seems to be: because LLM's can't be trusted to give correct answers all the time, they can't be trusted or useful at all. And yet exactly the same thing is true of people, encyclopedias, dictionaries, newspaper articles, books, scientific journals, and all the other ways we gain an understanding of the world! What academic does not have a colleague who makes pronouncements with confidence on a broad range of issues, both inside and outside their own competence? Such a colleague is very often right, but sometimes wildly wrong. Should we then condemn all of academia in the same way? We always face the same problem: we need to check the answers, whether they come from an LLM or a person.
Whenever you come up with a criticism of LLM's that applies in exactly the same way to people, you're not really criticizing LLM's at all. You're just facing a fundamental problem of epistemology.
#7: "These systems have no ground truth, no underlying model of the world, and no rules of logic."
Well, once again, a combination of what I would consider false statements and statements that are "not even wrong". No underlying model of the world? Of course they have one: a model of the world built from reading literally millions of texts. If we were communicating with aliens light-years away, their understanding of our world, their model of it, would be based on our transmissions in much the same way. No rules of logic? That's another claim made but not supported. How do you know that a prediction model can't give rise to reasoning that is logic-based? I see no mathematical proof of this.
Some LLM's are indeed trained on corpora of "ground truth", that is, sets of assertions about the world that have been created and checked by experts: textbooks, academic papers, case law, and so on. I'd include Wikipedia, except that a lot of academics have a low opinion of Wikipedia too (an opinion that, in my view, is unjustified). Ground truth also plays a role in the fine-tuning of models, as sketched below.
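Here is a minimal sketch (my own hypothetical examples, not from any real dataset) of one common way expert-checked ground truth enters supervised fine-tuning: each vetted question-and-answer pair becomes one training record. The JSONL chat format below follows the general style used by several fine-tuning APIs; the exact schema varies by provider.

```python
# A toy illustration of packaging vetted "ground truth" as fine-tuning data.
# The pairs and the output file name are hypothetical.
import json

vetted_pairs = [  # imagine these came from textbooks or reviewed case law
    ("What is the capital of France?", "Paris."),
    ("Is the square root of 2 rational?", "No; this has been known since antiquity."),
]

with open("ground_truth_finetune.jsonl", "w") as f:
    for question, answer in vetted_pairs:
        record = {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        f.write(json.dumps(record) + "\n")
```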
But more importantly, I think the emphasis on "ground truth" as an important foundation of knowledge is misplaced. To name just one example, people functioned for hundreds of thousands of years believing that the world was flat. This was their "ground truth", no pun intended. The fact that it is not correct didn't mean that they were not intelligent, that they didn't have minds, or that their reasoning was fundamentally flawed; they were still able to live their lives and exhibit intelligent behavior such as navigating successfully in unfamiliar terrain, and travelling long distances by sea.
The same thing applies, over a shorter span of time, to beliefs like the idea that light travels instantaneously. So I think the criticism that there is no "ground truth" for LLM's does not, once again, amount to a substantive criticism. To turn it into one, you would have to give a convincing reason why a thinking machine must have some particular ground truth in order to be considered intelligent. And if it does need this, how much ground truth is needed? Quantify it.
#8: "Not only do LLMs sometimes fabricate incorrect answers, they also obscure the information sourcing—the blue links—that are part and parcel of traditional search."
Another example of a blanket assertion about LLM's that fails on models like o3, which can cite their sources.
#9: "But when an LLM is the author, there is no mind there for a reader to glimpse."
Another example of "not even wrong", one that rests on the vagueness of "mind". Given a computer system, what method would we use to decide whether it has a "mind" or not? Is having a mind really an all-or-nothing matter? Do only people have minds? What about animals? Where do you draw the line?
I would argue not only that it is reasonable to say there is a "mind" in some LLM's, but that this mind is actually a representation of pieces of thousands or millions of human minds. I think that is a much more useful way to think about them.
There is now a history of 60 years or more of people asserting that machines "don't have minds", "don't really think", and "aren't really intelligent". With each new advance in AI, instead of saying "oh, maybe we were wrong; maybe they really do think after all", the consistent response has been "sigh, I guess you don't need real intelligence to be able to do X after all". Witness Doug Hofstadter and his remarks about chess after Deep Blue. There is also a 60-year history of firm predictions that "machines will never be able to" do a variety of things: play chess, play Go, play ping-pong, transcribe human speech, translate human language, write a good song, paint a good painting, and so forth. The fact that all of these predictions have now proved false should give everyone pause before making similar assertions.