Mind Matters Natural and Artificial Intelligence News and Analysis
3D Molecular Visualization: Complex Biomolecule Structure Models. Generative AI.
Image Credit: Vectorezy - Adobe Stock

AI in Biology: The Future AI Didn’t Predict

It doesn’t look like the past. Physical systems that evolve over time but don’t follow a fixed formula have always presented a deep challenge to AI

This is the fifth and final part of Erik J. Larson’s series on the attempt to understand protein folding using AlphaFold. The preceding parts are linked below.

If AlphaFold’s struggles with TIM-3 were an anomaly, that would be one thing. But the deeper problem is that TIM-3 is not the exception but rather the rule.

Many of biology’s most important proteins, especially those implicated in cancer, neurodegeneration, and immune signaling, are functionally disordered. They do not lock into a single predictable shape, and yet AI models trained on static structural data insist on forcing them into one.

This is not just a limitation; it is a fundamental flaw in the logic of AI-driven protein prediction. Until AI moves beyond pattern recognition confined to past observations, it will keep hitting walls that researchers will have to address sooner or later. It is not that AlphaFold has failed to improve structural biology; it has, in ways that no scientist disputes. But its usefulness is conditional, and like TIM-3, the future of AI in protein science remains “unfolded.”

How IDRs were studied before AI — and still are

The difficulty of understanding intrinsically disordered regions (IDRs) has plagued experimental biology for decades. AI, as we’ve seen, can’t tackle this problem, but experimental methods have proven extremely limited as well.

This area of cell biology calls out for a true revolution. That would mean not just fast methods for predicting stable folds but real headway on the difficult, frequent, and important cases of intrinsic disorder. The problem with both AI and lab techniques is simple: traditional structural methods rely on proteins having a consistent shape. IDRs refuse to cooperate.

The elaborate process of protein folding, essential for proper function within living organisms
Image Credit: Nilima - Adobe Stock

For instance, X-ray crystallography, the gold standard for determining protein structures since the mid-20th century, requires proteins to be locked into an orderly, repeating crystal lattice before they can be analyzed (hence “crystallography”). But a protein that never settles into a single form cannot be crystallized. The method that solved the structure of DNA and thousands of well-folded proteins is, for IDRs, useless.

This is a hard truth for biology and modern science to swallow, akin perhaps to the discovery that we can detect chaos in physical systems (with, say, turbulence) but can’t predict it. We don’t know yet the outcome of what we might call “the problem of intrinsic disorder” in protein folding, but we know already that existing methods are largely inadequate — and AI isn’t an exception but more proof of the rule.

Cryo-electron microscopy (cryo-EM) fares only slightly better. Unlike X-ray crystallography, cryo-EM does not require crystallization. That makes it invaluable for the study of proteins that resist the rigid constraints of crystallography. But there is a catch: cryo-EM works by averaging thousands of images of the same molecule to reconstruct a 3D model. When the protein being studied is in constant motion, shifting between dozens or even hundreds of conformations, the resulting image is often a blurry approximation rather than a true representation of its behavior.
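To make the averaging problem concrete, here is a minimal sketch in Python. It is not a model of real cryo-EM processing. It assumes a one-dimensional, noise-free “image” in which a single blob of density stands in for a molecular feature; the function names (`image`, `average`, `width_at_half_max`) and all the numbers are illustrative assumptions, not anything taken from actual reconstruction software.

```python
import math

def image(center, width=1.0, size=41):
    """One noiseless 1D 'image': a Gaussian blob of density at `center`."""
    return [math.exp(-((x - center) ** 2) / (2 * width ** 2)) for x in range(size)]

def average(images):
    """Pixel-wise mean over a stack of equally sized images."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]

def width_at_half_max(profile):
    """Count the pixels at or above half of the profile's maximum."""
    half = max(profile) / 2
    return sum(1 for value in profile if value >= half)

# Rigid molecule (assumed): the feature sits at the same place in every image.
rigid = average([image(20) for _ in range(1000)])

# Flexible molecule (assumed): the feature cycles through 21 positions,
# standing in for a protein that keeps shifting between conformations.
positions = [10 + (i % 21) for i in range(1000)]
flexible = average([image(center) for center in positions])

print("rigid:    peak", round(max(rigid), 3), "width", width_at_half_max(rigid))
print("flexible: peak", round(max(flexible), 3), "width", width_at_half_max(flexible))
```

Under these assumptions the rigid average keeps a tall, narrow peak, while the flexible average is far lower and several times broader. No single image was blurry; the blur comes entirely from averaging over different shapes.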

Nuclear magnetic resonance (NMR) spectroscopy

For researchers trying to peer into the world of IDRs, the most useful tool has been nuclear magnetic resonance (NMR) spectroscopy. Unlike crystallography or cryo-EM, which seek a single, static image of a protein, NMR captures signals from the atomic nuclei within a protein as it moves in solution. The method can thus provide a kind of moving picture — a glimpse into the full range of shapes an IDR might take at any given moment. It does not offer a single, definitive structure but rather a collection of possibilities, a representation of IDRs as they exist in the cell.

Other techniques — such as small-angle X-ray scattering (SAXS), which provides information about the overall shape of IDRs in solution, and single-molecule Förster resonance energy transfer (smFRET), which tracks changes in protein conformation in real time — have added to the toolkit.

But even with these advances, IDRs remain significantly harder to study than their well-folded counterparts. Unlike stable proteins, which can be frozen in time, IDRs exist as a shifting ensemble of possibilities. Their function is inseparable from their ever-changing form.

The litany of methods structural biologists and other scientists have developed over the years to tame the problem of protein folding has yielded impressive results. Unfortunately, many of the more powerful methods are not the end-to-end solutions that were hoped for. But they do enable scientists to peek at the complexity of protein folding in the absence of accurate predictions of how a particular protein will fold, given an evolutionary history and a set of physicochemical constraints. This isn’t nothing, of course. But the fundamental problem with our dynamic, shape-shifting IDRs and IDPs is one of change.

Modeling and predicting physical change is a bugbear for AI that has bedeviled the field since its inception (as is, say, the concept of cause). Computers are good at processing data in a spreadsheet that doesn’t “evolve over time.” But as we’ve seen, evolving unpredictably over time is what proteins often do. This is a problem for AI that can be summed up more philosophically as: the future doesn’t look like the past.

The future doesn’t look like the past

Physical systems that evolve over time but don’t follow a fixed formula have always presented a deep challenge to AI. A rich vocabulary has developed in the field to point to different problems stemming from dynamism, or change over time: the ramification problem, the qualification problem (related to dynamic change), the frame problem, and many others.

It’s one thing to predict outcomes in a system that adheres to a set of repeatable rules: the orbits of planets, the trajectory of a cannonball, even the way a well-folded protein settles into its lowest-energy state. It’s another thing entirely when the system changes unpredictably or follows so many interacting rules that we can’t reduce it to a simple model. Machine learning methods, including AlphaFold, rely on training sets containing past observations, and those datasets, by definition, only contain static snapshots of what has already happened. If that kind of information about the future were available, well, we wouldn’t be calling it “prediction.” But the claim of prediction follows from the mistaken premise that problems like protein folding resemble problems like landing a lunar module on the Moon. They are very different problems.

The curse of dynamism

It’s not hard to see how the “curse of dynamism” in AI shows up outside biology, most famously in autonomous navigation. Level 5 self-driving cars remain an elusive goal because the real world doesn’t conform to practice runs or training data. There’s no universal equation that will tell a car whether the large shape ahead is a semi-truck making a wide right turn or a low-hanging bridge, but the distinction matters immensely if you plan to drive underneath it. Recognizing a child running into the street is obviously critical, and so too is recognizing that a child wearing a dinosaur costume on Halloween is still a child, not a bizarre, out-of-distribution object to be ignored.

Machine learning excels at pattern recognition. But what’s known as its “generalization” performance depends on a statistical model built from the examples encountered during training (the process called “training a model”). The problem of outliers or “edge cases” has frustrated AI scientists and engineers (and now structural biologists) for decades, and it’s safe to say that we still don’t have a satisfactory answer to how to handle them in computational systems, whether billed as “AI” or not.
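A toy illustration of that dependence on training examples, offered as a sketch under stated assumptions rather than a claim about how AlphaFold or any production system works: fit a straight line to a curve using only observations from one narrow range, then ask it about a range it never saw. The “true” process (`math.sin`), the two ranges, and the helper names `fit_line` and `mean_abs_error` are all choices made here for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, xs, truth):
    """Average absolute gap between the line's predictions and the truth."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - truth(x)) for x in xs) / len(xs)

# "Past observations": the process is only ever seen on the range 0 to 1.
train_x = [i / 100 for i in range(101)]
model = fit_line(train_x, [math.sin(x) for x in train_x])

# Same range as training: the pattern learned from the past still holds.
in_distribution_error = mean_abs_error(model, train_x, math.sin)

# A range never observed (3 to 4): the same process, behaving differently.
shifted_x = [3 + i / 100 for i in range(101)]
out_of_distribution_error = mean_abs_error(model, shifted_x, math.sin)

print("error on familiar inputs:  ", round(in_distribution_error, 3))
print("error on unfamiliar inputs:", round(out_of_distribution_error, 3))
```

The model is not wrong about what it saw; its error on the familiar range is small. It fails only where the underlying behavior departs from the training data, which is exactly the situation an edge case represents.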

This problem of the outlier is, fortunately, relatively rare in other AI tasks, like self-driving (even so, it’s edge cases that should scare you if you ride in a self-driving car). But in biology, phenomena like intrinsically disordered regions or entirely disordered proteins are not really outliers; they’re part of the central domain of inquiry. It is estimated that 30–50% of all human proteins contain at least one intrinsically disordered region longer than 30 amino acids, and 10–25% of human proteins are entirely disordered, lacking any stable structure at all.

In other words, IDRs are not fringe cases. They represent a major, indispensable class of proteins — proteins that drive core biological functions like cell signaling, gene regulation, triggering immune responses, and many aspects of neurological function. Some of the most crucial molecular pathways in human biology are governed by disordered proteins, and when these proteins malfunction, they are often implicated in cancer, neurodegeneration, and other devastating diseases.

AlphaFold is a “revolution” in structural biology? Hardly.

Note: Erik J. Larson writes the Substack Colligo.

Here are the first four articles in this series by Larson:

AI in biology: AI meets intrinsically disordered proteins. Protein folding — the process by which a protein arrives at its functional shape — is one of the most complex unsolved problems in biology. The mystery of protein folding remains unsolved because, as is so often the case with AI narratives, the reality is much more complicated than the hype.

AI in biology: What difference did the rise of the machines make? AI works very well for proteins that lock into a single configuration, as many do. But intrinsically disordered ones don’t play by those rules. The resulting problems aren’t a temporary bug — they’re a basic limitation of training a machine learning model on a dataset where proteins always fold neatly.

AI in biology: So is this the end of the experiment? No. But a continuing challenge is that many of the most biologically important proteins don’t adopt a single stable structure. Their functions depend on structural fluidity. The core issue AI isn’t just missing data — AlphaFold’s entire approach is

and

AI in biology: The disease connection — when proteins go wrong Some of the most crucial proteins for human health—the ones we need to understand most urgently—are the very ones that AI has the hardest time modeling. The issue is not simply that AI struggles with intrinsically disordered regions — it is that the very premise of IDR behavior contradicts the way these models operate. This isn’t just a flaw — it’s a fundamental crack in the foundation of AI’s “revolutionary” claims.


Erik J. Larson

Fellow, Technology and Democracy Project
Erik J. Larson is a Fellow of the Technology & Democracy Project at Discovery Institute and author of The Myth of Artificial Intelligence (Harvard University Press, 2021). The book is a finalist for the Media Ecology Association Awards and has been nominated for the Robert K. Merton Book Award. He works on issues in computational technology and intelligence (AI). He is presently writing a book critiquing the overselling of AI. He earned his Ph.D. in Philosophy from The University of Texas at Austin in 2009. His dissertation was a hybrid that combined work in analytic philosophy, computer science, and linguistics and included faculty from all three departments. Larson writes for the Substack Colligo.
