Machine Learning Algos Often Fail: They Focus on Data, Ignore Theory
Without a theory, a pattern is just a pattern

A crucial element of the scientific revolution is an insistence that theories be tested empirically — when possible, with randomized controlled trials. Machine learning algorithms reverse this wisdom.
Instead of using data to test plausible theories, they ransack data for statistical patterns, with no particular theory to guide their plundering. In fact, theories are considered an obstacle to “knowledge discovery.” This reversal of the scientific method is called “data mining” or HARKing (Hypothesizing After the Results are Known).
ChatGPT and other large language models (LLMs) are statistical pattern-seekers, but the allure of data mining predates LLMs. Back in 2008, Chris Anderson, at the time the editor-in-chief of Wired, wrote an article with the provocative title, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”:
With enough data, the numbers speak for themselves…. The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
Before the ascent of machine-learning algorithms, data mining was considered a sin comparable to plagiarism. “Data without theory” was known to be unreliable. “Correlation is not causation” was understood as a truism. Now, data mining is considered virtuous. Correlation is said to supersede causation.
The black box problem
These algorithms are fatally hobbled by their inability to relate the data they input and output to the world we live in. They have no way of assessing whether the statistical patterns they uncover are knowledge or nonsense. If the algorithms are inside a black box, humans cannot save them from their folly.
Here is an example that I recently discussed in an investments class I teach. Merrill Lynch featured the graph below on the cover of the April 7, 1986, issue of its newsletter Investment Strategy Focus:

The accompanying analysis argued:
While it is not a perfect fit, there is enough correlation to provide an interesting reading. At the present time, the bond yield is about as far below the crude oil price as it was during the 1976/78 period. If one is banking on a lower price of oil in the near term, it is conceivable that long term rates could trend lower as well.
The visual impression is striking and there is, in fact, a 0.9 correlation between these two variables. There is even a superficially persuasive theory: oil prices are likely to increase during inflationary periods and interest rates tend to rise when inflation increases.
However, the theory is more subtle. Interest rates tend to increase when the rate of inflation increases, true enough, but the figure shows the price level, not the rate of change of prices, and these are two separate things. The level of housing prices and the rate of change of housing prices, for example, are different. So, too, are the level of stock prices and the rate of change of stock prices. Stock prices could be high, but falling, or stock prices could be low, but rising. The same is true of oil prices.
Imagine that there is a double-digit annual rate of inflation for a few years and then inflation stops, leaving the price level constant. Interest rates might rise to double-digit levels during the inflation. But, once the inflation stops, interest rates will come back down to a level consistent with zero inflation. The Merrill Lynch figure, in contrast, implies that if oil prices were to rise to $40 and stay there, interest rates would rise to 14% and stay there.
Using data without theory, a machine-learning algorithm might stumble on the 0.9 correlation between oil prices and the 10-year Treasury rate during these years and use this “knowledge discovery” to predict oil prices from interest rates, or vice versa, in other years.
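Here is a minimal Python sketch of what that theory-free step looks like. The numbers are simulated, not the actual 1973–1986 series: two series that merely share an upward trend are highly correlated in levels even though their period-to-period changes are nearly uncorrelated, yet a straight-line fit on the levels will still cheerfully generate predictions.

```python
import numpy as np

# Simulated illustration, not the actual Merrill Lynch data: two series that
# both trend upward because they share a common driver during the sample period.
rng = np.random.default_rng(0)
n = 56                                    # e.g., quarterly observations, 1973-1986
trend = np.linspace(0, 1, n)
oil_price = 4 + 36 * trend + rng.normal(0, 2.0, n)   # dollars per barrel
bond_rate = 6 + 8 * trend + rng.normal(0, 0.5, n)    # percent

# The LEVELS are highly correlated simply because both series trend upward.
print("correlation of levels: ", np.corrcoef(oil_price, bond_rate)[0, 1])

# The CHANGES (what studies of "oil returns" actually measure) are not.
print("correlation of changes:", np.corrcoef(np.diff(oil_price), np.diff(bond_rate))[0, 1])

# A theory-free "knowledge discovery" step: fit a line to the levels and use it
# to predict the interest rate from the oil price.
slope, intercept = np.polyfit(oil_price, bond_rate, 1)
print(f"fitted line: rate = {intercept:.1f} + {slope:.2f} * oil price")
print(f"predicted rate at $40 oil: {intercept + slope * 40:.1f}%")
```

The point is not the particular numbers, which are made up, but that nothing in this procedure asks whether the relationship makes economic sense.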
ChatGPT and other LLMs are data-mining algorithms, but they can go beyond simple correlations: they can draw on the immense databases they have been trained on to generate explanations for the patterns they find.
Testing ChatGPT on the same data
I gave ChatGPT the data in the Merrill Lynch figure and asked:
Can you analyze these data for me on R (the interest rate on 10-year Treasury bonds) and P (the price per barrel of West Texas crude oil)? Can either of these variables be used to predict the other?
ChatGPT regurgitated an impressive (but verbose and largely irrelevant) recounting of events that affected oil prices and interest rates during these years and then generated this conclusion:
Research indicates that bond yields [interest rates] can effectively predict oil prices. A study published in Energy Economics found that long-term government bond yields have substantial explanatory power on oil returns.…
Another study in Financial Innovation demonstrated that the forward-looking volatility of the 10-year U.S. Treasury yield is a useful state variable capable of predicting the returns and volatility of crude oil prices.…
These findings suggest that while bond yields can be used to predict oil prices, the reverse relationship—using oil prices to predict bond yields—is less evident. The causality appears to flow primarily from bond yields to oil prices.
ChatGPT’s citation of published research is impressive, though — as we learned with Dr. Watson’s ill-fated attempts to find a cure for cancer — “there’s many a slip ‘twixt the cup and the lip” when it comes to drawing conclusions from research papers. For example, LLMs are unlikely to differentiate between good and bad research, to recognize when some studies have been contradicted by other studies, or to notice subtleties in the research.
Here, ChatGPT made the same mistake as the Merrill Lynch analysts. It cited studies reporting a relationship between interest rates and oil returns (i.e., changes in oil prices) and misinterpreted this as a relationship between interest rates and the level of oil prices. It also reversed the direction of causation.
The figure below shows what happened over the next several years. Instead of converging, the gap between the two lines widened. The statistical relationship in the 1973–1986 data implies that, when the price of oil went above $130 in 2008, the 10-year bond rate should have been above 30%; the actual rate was 4%. And when the 10-year bond rate dropped to 0.62% in 2020, the predicted price of oil was –$26. (Yep, that’s a negative price!) The actual price was $41.

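As a quick arithmetic check, these out-of-sample failures follow directly from any straight line consistent with the numbers quoted above. The coefficients below are a hypothetical fit backed out from those figures ($40 oil pairing with a 14% rate, and a 0.62% rate implying roughly minus $26 oil); they are illustrative, not the actual regression.

```python
# Hypothetical linear relationship backed out from the numbers quoted above;
# not the actual regression fitted to the 1973-1986 data.
intercept, slope = 5.9, 0.20          # rate (%) = intercept + slope * oil price ($/bbl)

def predict_rate(oil_price):
    return intercept + slope * oil_price

def predict_oil(rate):
    return (rate - intercept) / slope

print(f"{predict_rate(40):.0f}%")     # ~14%, the in-sample relationship
print(f"{predict_rate(130):.0f}%")    # ~32%, versus an actual 10-year rate near 4% in 2008
print(f"${predict_oil(0.62):.0f}")    # ~ -$26 per barrel, versus an actual price of $41 in 2020
```

Extrapolating a line fitted to price levels far outside the range of the original data is exactly what produces the absurd negative price.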
Of course, the Merrill Lynch analysts made the same mistake as ChatGPT, but humans are educable. That is why I talked about this example in my investments class. We can learn the general principle and apply it in a variety of contexts. Data-mining algorithms, in contrast, might recognize, after the fact, that a discovered statistical pattern has vanished, but they will not know why because they never had a logical explanation for the pattern in the first place. They put data before theory.
I’ve written elsewhere about the nutty claim by some researchers that jobs that require intelligence are more at risk of being taken over by LLMs. The stupidity of LLMs and other machine-learning algorithms is precisely why they cannot be trusted to make decisions requiring genuine intelligence — and why they are little threat to jobs that benefit from intelligence.