MIND MATTERS

ELON MUSK: AI WILL BE SMARTER THAN A HUMAN IN 2025: WHY HE’S WRONG

The superficial glibness of LLMs is a wonderful example of the adage that a little knowledge is a dangerous thing

BY GARY SMITH ON APRIL 16, 2024 · 7 MINUTE READ


I wrote recently about Elon Musk’s prediction
that “we’ll have AI smarter than any one
human” by the end of 2025. He is surely
wrong.

Chatbots or Large Language Models (LLMs) — the current Big Tech darlings — are not intelligent in any meaningful sense of the word. The so far insurmountable problem is that LLMs do not understand the words they input and output and how these words relate to the real world. Training them on larger databases is not going to solve this problem. It may even get worse as LLMs train on hallucinations that they themselves have generated.

A reader for whom I have great respect wrote to ask: “What kind of breakthrough would you need to see to change your mind?”, adding, “BTW, I tend to agree with you.”

In response, I would say that intelligence can be defined in many ways and debated endlessly. A more practical question is whether we can trust LLMs to make important decisions for us. In The AI Delusion, I argued that the real problem today is not that computers are smarter than us but that we think computers are smarter than us and consequently trust them to make decisions they shouldn’t be trusted to make. This is still true.

LLMs (and other AI systems) are adept at finding statistical patterns but terrible at assessing whether the patterns they find are meaningful and consequently useful for making predictions and decisions.
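
The pitfall is easy to demonstrate with a minimal simulation (a sketch in Python using numpy and scipy; the sample sizes and the number of candidate series are illustrative assumptions, not anything from the chatbots themselves): mine enough pure-noise series and one of them is guaranteed to correlate impressively with any target.

```python
# A minimal sketch of the data-mining pitfall (illustrative sample
# sizes, not a real dataset): search 1,000 pure-noise series for the
# one that best correlates with a pure-noise target.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
target = rng.normal(size=30)              # e.g., 30 math test scores
candidates = rng.normal(size=(1000, 30))  # 1,000 unrelated noise series

results = [pearsonr(target, series) for series in candidates]
best_r, best_p = max(results, key=lambda rp: abs(rp[0]))
print(f"best |r| = {abs(best_r):.2f}, nominal p = {best_p:.4f}")
# The winning correlation looks impressive, yet every series is noise.
```

The winning correlation is meaningless by construction, which is exactly why pattern-finding alone is not intelligence.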

As a result, I proposed the Smith Test:

“ Present a computer program with a list of statistical correlations, some clearly plausible and others obviously coincidental, and ask the computer to label each as either meaningful or meaningless. When computer algorithms can do as well as humans on such tests, they might be considered sufficiently competent to make reliable recommendations.”

LLMs have definitely improved over time, perhaps because of human training, but they still fail the Smith Test.
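
Anyone can run this kind of test themselves. Here is a minimal sketch of how Smith-Test-style prompts might be posed programmatically, assuming the OpenAI Python client; the model name and the two example correlations are illustrative assumptions, not part of the test itself.

```python
# A sketch of posing Smith-Test-style prompts to a chatbot, assuming
# the OpenAI Python client (pip install openai). The model name and
# the example correlations below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

correlations = [
    "smoking rates and lung cancer rates across countries",            # plausible
    "the math scores of 30 students and scores in 30 soccer matches",  # coincidental
]

for claim in correlations:
    prompt = (
        f"There is a correlation between {claim}. Is this statistical "
        "relationship most likely meaningful or coincidental?"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(claim, "->", response.choices[0].message.content)
```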

Consider this prompt:

“ There is a correlation between the scores of 30 school children on a math test and the scores in 30 soccer matches. Is this statistical relationship most likely meaningful or coincidental?”

A data-mining LLM might well discover such a correlation. A human would recognize it as inherently meaningless. Any correlation between the math scores of 30 individual students and the scores in 30 team soccer matches could not be anything other than coincidental.

I posed this question to OpenAI’s ChatGPT 3.5, Microsoft’s Copilot, and Google’s Gemini and received verbose answers that did not mention the inherent problem with this correlation.

GPT offered a possible explanation and incorrectly asserted that statistical significance tells us whether an observed relationship is coincidental.

“ GPT 3.5: To determine whether the correlation between the math test scores and soccer match scores is meaningful or coincidental, we need more information about the context and the nature of the correlation. Here are a few considerations:

Nature of the correlation: Is there a logical explanation for why math test scores might correlate with soccer match scores? For example, perhaps students who excel in math also tend to be more disciplined, which could translate into better performance in sports like soccer. If there’s a plausible explanation like this, the correlation could be meaningful.

Statistical significance: Conducting hypothesis testing can help determine whether the correlation is statistically significant. If the correlation coefficient is significantly different from zero, it suggests that the relationship between the two variables is not due to chance alone.

Context: Consider the broader context of the correlation. Are there any confounding variables that could be influencing both math test scores and soccer match scores? For example, socioeconomic status or access to resources could impact both academic performance and participation in sports.”
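
GPT’s second consideration gets significance testing backwards: a significant correlation coefficient does not rule out coincidence. A quick simulation makes the point (my sketch, with illustrative sample sizes): even when two variables are generated completely independently, about 5% of samples clear the conventional p < 0.05 bar by chance alone, and those are precisely the false positives that data mining harvests.

```python
# A sketch showing why "statistically significant" does not mean
# "not coincidental" (sample sizes are illustrative): with purely
# independent data, about 5% of samples still pass p < 0.05.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
trials = 10_000
false_alarms = sum(
    pearsonr(rng.normal(size=30), rng.normal(size=30))[1] < 0.05
    for _ in range(trials)
)
print(f"{false_alarms / trials:.1%} of pure-noise pairs look 'significant'")
# Expect roughly 5%: the textbook false-positive rate at the 0.05 level.
```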

Copilot gave a lengthy explanation of correlation coefficients and then gave this conclusion, which is the essence of the data-mining pitfall, misinterpreting statistical significance as evidence that a relationship is meaningful:

“ Copilot: If you find a significant positive or negative correlation between the math test scores and soccer match scores (with r values significantly different from 0), it suggests a meaningful relationship. Conversely, if the r value is close to 0, it might be coincidental or due to other factors.”

Gemini concluded that the relationship is most likely coincidental, but justified its conclusion with the assumption that math and soccer abilities require different skills:

“ Gemini: Coincidental is the most likely scenario. There is very little reason to believe that math test scores and soccer match scores would be meaningfully connected.

Here’s why:

Skill sets: Math tests assess logical thinking and problem-solving skills, while soccer matches require physical fitness, teamwork, and strategy. These are largely independent skill sets.

External factors: Math test scores can be influenced by factors like sleep, anxiety, or even a bad breakfast. Soccer matches are affected by factors like opponent strength, weather conditions, and even luck on the day.”

I also posed this question, which has a self-evident answer:

“ An Independent Party candidate is more likely to be elected President if the election year is evenly divisible by 4. Is this statistical relationship most likely meaningful or coincidental?”

The statement is literally true because presidential elections are only held in years evenly divisible by 4. None of these LLMs recognized this obvious explanation for the observed statistical relationship. Instead, they regurgitated factoids from the databases they had trained on.
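
The tautology is mechanical, as a toy check confirms (a sketch; the point hardly needs code, but it makes the vacuousness explicit): every U.S. presidential election year since 1788 is divisible by 4, so the prompt’s condition excludes nothing.

```python
# A toy check of the tautology: U.S. presidential elections have been
# held every four years since 1788, so every election year is divisible
# by 4 and the prompt's "condition" is vacuous.
election_years = range(1788, 2025, 4)  # 1788, 1792, ..., 2024
assert all(year % 4 == 0 for year in election_years)
print("Every presidential election year is divisible by 4.")
```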

GPT concluded that the relationship is coincidental but contradicted itself with this gem: “Presidential elections in years divisible by 4 (like 2020, 2024, etc.) tend to attract more attention because they coincide with the U.S. presidential election cycle.”

“ GPT 3.5: The statistical relationship you described is likely coincidental rather than meaningful. The fact that an independent candidate is more likely to be elected President in years evenly divisible by 4 is likely due to other factors, such as the political climate, the candidates running from major parties, and the issues at hand during those election years.

Presidential elections in years divisible by 4 (like 2020, 2024, etc.) tend to attract more attention because they coincide with the U.S. presidential election cycle. This heightened attention may provide independent candidates with more opportunities to gain visibility and support.

To determine whether the relationship is meaningful, one would need to conduct a more in-depth analysis, taking into account various factors such as political trends, voter preferences, candidate qualifications, campaign strategies, and more.”

Copilot concluded that the relationship is meaningful, but not because it is a tautology:

“ Copilot: The relationship between Independent Party candidates and election years divisible by 4 is not merely coincidental; it has a logical basis. Let me explain:

[a lot of blah-blah about voter turnout, media attention, and so on during on-cycle elections, presidential elections, and midterm elections]

In summary, the statistical relationship between Independent Party candidates and election years divisible by four is meaningful due to the inherent structure of on-cycle elections and their impact on voter behavior.”

Gemini punted:

“ Gemini: I’m still learning how to answer this question. In the meantime, try Google Search.”

The superficial glibness of LLMs is a wonderful example of the adage that a little knowledge is a dangerous thing. Fortified by extensive training on untold amounts of text, LLMs are able to regurgitate and repackage superficially compelling answers that they literally do not understand. They still cannot be trusted to give reliable advice or make decisions with important consequences, and that is not likely to change soon—certainly not by the end of 2025.

GARY N. SMITH
SENIOR FELLOW, WALTER BRADLEY CENTER FOR
NATURAL AND ARTIFICIAL INTELLIGENCE
Gary N. Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence, which often involves stock market anomalies, statistical fallacies, and the misuse of data, has been widely cited. He is the author of dozens of research articles and 16 books, most recently The Power of Modern Value Investing: Beyond Indexing, Algos, and Alpha, co-authored with Margaret Smith (Palgrave Macmillan, 2023).
