
Part 2: The Fiction of Generalizable AI: How to Game the System

Progress toward real generalization, by any substantive measure, is nil. Perhaps we should reexamine the very concept of the “I” in AI

In Part 1, we looked at the assumption that a network can start blank and be transformed into an intelligent agent simply via enough data. Now let’s look at what follows.

One consequence of our embrace of machine learning — and our quasi-philosophical faith in “blank slate” AI — is that it becomes relatively easy for researchers to game the system. As AI analyst François Chollet points out, researchers can often succeed simply by “buying” the data and data features required to solve a particular problem.

This phenomenon is well known on popular data science competition sites like Kaggle, where winning teams often find that their systems underperform in real-world settings precisely because they have been finely tuned (overfit) to win the specific contest they entered.


Over the years, such observations have been wrapped into what’s often called the AI Effect: the idea that every time AI succeeds at something, the goalposts are simply moved. “That’s not real intelligence,” critics complain, “it’s just another trick.”

But proponents of the AI Effect get it exactly backwards.

Whereas an AI can display high skill at a particular task when engineered narrowly and specifically for that task, humans display high skill at a specific task only because we have the capacity to acquire skills efficiently in general. Garry Kasparov couldn’t “hack” chess the way IBM’s Deep Blue team engineered a machine to beat him in 1997. He had to use general intelligence to acquire the specific expertise. The so-called AI Effect isn’t a complaint about AI’s successes. It is a statement of the radical difference between minds and machines.

Let’s move on now to two types of generalization that feature prominently in Chollet’s work, and that undergird much of the broader discussion about the true nature of the “I” in AI.

Two Types of Generalization

System-centric generalization refers to the ability of an AI system to handle situations it has not itself encountered before. This is the formal notion of generalization we see in statistical learning theory: an engineer trains a system on a set of N examples and then tests it on a separate holdout set not included in those N examples.

The system is said to generalize well when it achieves a high score — such as a high F-measure or low classification error — on the unseen test set.
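
Here is a minimal sketch of what that measurement looks like in practice, assuming a scikit-learn workflow and synthetic data chosen purely for illustration: the system is fit on N training examples and scored, via the F-measure, on a holdout set it has never seen.

```python
# A minimal sketch of system-centric generalization measurement:
# fit on N training examples, then score on a holdout set the system
# has never seen. scikit-learn and synthetic data, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Draw all examples from a single, fixed distribution (the "known" world).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# N training examples; the remainder form the unseen holdout set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A high F-measure on the holdout set is what "the system generalizes well"
# means in the system-centric, statistical-learning sense.
print("F-measure on held-out data:", f1_score(y_test, model.predict(X_test)))
```

A high number here tells us the system handles unseen samples drawn from the same distribution it was trained on. It says nothing about who did the work of framing the problem in the first place.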

However, it’s important to recognize that system-centric generalization ignores any prior knowledge that the developer may have injected into the system. Architectural choices, feature engineering, data pre-processing — all these can “bake in” assumptions that help the system succeed, but which are not accounted for in the generalization measure itself.

By contrast, developer-aware generalization shifts the frame.

Here, neither the system nor the developer has previously encountered the problem. This form of generalization accounts for any prior knowledge injected by the developer — it treats the combined system-plus-developer preparation as the true agent facing a novel task.

In other words, developer-aware generalization asks:

Can this system perform well on genuinely new tasks that neither it nor its human creators anticipated or designed for?

If we treat the developer and the system as a single unified entity, developer-aware generalization reduces to system-centric generalization. But in practice, the distinction matters greatly: most current AI systems only generalize because of heavy developer foresight, not because they autonomously discover radically new capabilities.

Degrees of Generalization: Local, Broad, and Extreme

Perhaps the most important distinctions for anyone hoping to make sense of the current “AGI debate” around foundation models are the degrees of generalization themselves. Generalizing is a good thing, to be sure. But as with much in life, the devil is in the details.

Consider:

Local generalization handles data from a known distribution within a well-scoped set of tasks. Think of an image classifier that distinguishes previously unseen cat images from dog images. Chollet calls this “adaptation to known unknowns within a single task or well-defined set of tasks.” He also notes — correctly — that the field of AI has been absorbed in local generalization problems from its inception up to the present day.

We can call local generalization what it effectively is: robustness.
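
To see where the boundary of robustness lies, consider this illustrative sketch (synthetic Gaussian data, with a shift parameter invented for the example): the same classifier that scores well on unseen data from its training distribution is also scored on data drawn from a shifted distribution, which is exactly the situation local generalization makes no promises about.

```python
# Illustrative sketch: local generalization (robustness) holds within the
# training distribution but promises nothing under distribution shift.
# Synthetic data and an invented shift parameter, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two Gaussian classes; `shift` moves the class means toward (and past) each other."""
    X0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))  # class 0
    X1 = rng.normal(loc=2.0 - shift, scale=1.0, size=(n, 2))  # class 1
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_train, y_train = sample(500)           # the "known" distribution
X_in, y_in = sample(500)                 # unseen data, same distribution
X_out, y_out = sample(500, shift=1.5)    # unseen data, shifted distribution

model = LogisticRegression().fit(X_train, y_train)

print("in-distribution accuracy:     ", accuracy_score(y_in, model.predict(X_in)))
print("shifted-distribution accuracy:", accuracy_score(y_out, model.predict(X_out)))
```

Within the known distribution the model is robust; move the distribution and nothing in the training procedure guarantees anything at all. Broad and extreme generalization begin where that guarantee runs out.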

Broad generalization (flexibility) is the ability of a system to handle a broad category of tasks and environments without further human intervention. These are sometimes referred to as “unknown unknowns” problems. They encompass thorny challenges like Level 5 autonomous driving or Rosie the Robot making beds, dusting furniture, and cooking dinner for the family.

Chollet characterizes broad generalization as “adaptation to unknown unknowns across a broad category of related tasks.” Steve Wozniak’s famous coffee cup test — where a system must enter a random kitchen and successfully make a cup of coffee — would be a classic example.

It is important to be clear: we do not have systems like this today.

It is not even obvious that we are making substantive progress that escapes the gravitational pull of local generalization, despite enormous interest (and hype) around achieving it.

What’s especially dispiriting is that the hype surrounding an approaching “artificial general intelligence,” or “AGI,” effectively sidesteps serious discussion of these distinctions, treating intelligence as if it were a linear quantity that more scaling and a few extra gigabytes of data will eventually deliver.

The ARC challenge: Cutting through the hype

Chollet, to his credit, developed the Abstraction and Reasoning Corpus (ARC), a test designed to probe developer-aware generalization: it assumes only very basic cognitive priors, things like “objects don’t just disappear” and “counting is possible.” Reporting in full on the ARC results would take us too far afield for now, but in a recent conversation, Chollet noted that even the best current systems essentially fail on the latest versions. Progress toward real generalization, by any substantive measure, is nil.
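
For readers who want a concrete feel for what an ARC task involves, here is a toy sketch. The input/output grid-pair structure mirrors the publicly released ARC dataset; the specific task and the hand-written rule below are invented for illustration.

```python
# Toy sketch of an ARC-style task: a few input/output grid pairs sharing an
# abstract rule, plus a hand-written candidate rule checked against them.
# The grid-pair structure mirrors the public ARC dataset; this particular
# task and rule are invented for illustration.

task = {
    "train": [
        {"input": [[1, 0, 0],
                   [0, 2, 0]],
         "output": [[0, 0, 1],
                    [0, 2, 0]]},
        {"input": [[0, 3],
                   [4, 0]],
         "output": [[3, 0],
                    [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 0, 0],
                   [0, 0, 6, 0]]},
    ],
}

def candidate_rule(grid):
    """A human's guess at the abstract rule: mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Check the guess against the demonstration pairs, then apply it to the test input.
assert all(candidate_rule(pair["input"]) == pair["output"] for pair in task["train"])
print(candidate_rule(task["test"][0]["input"]))
```

The catch, of course, is that the next ARC task obeys a different abstract rule, so no hand-written rule (and no amount of task-specific tuning by a developer) carries over. That is what makes it a test of developer-aware generalization rather than of pattern memorization.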

We can say here that we should cut the hype and get to work — or perhaps we should go even deeper and reexamine the very concept of the “I” in AI, asking harder questions than the ballyhooed debates online ever seem willing to confront.

Here’s a zinger worth posing:

Is extreme generalization — the kind that natural intelligence exhibits — even possible in a machine? And if it is, would it actually enhance the human world in any meaningful way?

Would it augment human intelligence — or merely automate and flatten it?

These are value questions, not just technical ones. And my purpose here is to begin laying out the rich conceptual landscape of intelligence — to show that most of our contemporary dialogue about today’s and tomorrow’s AI remains profoundly chimeric.

Let’s get back to the main point.

Human-centric extreme generalization: What minds actually do

Finally, extreme generalization refers to open-ended systems that can handle entirely novel situations — situations that share only abstract relational structures with any previous task.

In other words: this is us.

Again, Chollet: “Adaptation to unknown unknowns across an unknown range of tasks and domains.”

Our only examples are biological: natural intelligence.

There is an even larger class I’ll just gloss here — something like “universality,” where the testing domain would be the class of all problems requiring intelligence anywhere in the universe. Naturally, humans would perform poorly at such a benchmark, being finite beings rather than omniscient gods.


Thus, the proper domain we should focus on is human-centric extreme generalization — the kind that enables us, for example, to play chess, to envision the sun rather than the Earth at the center of the solar system, to see that Newtonian mechanics had to be revised to accommodate electromagnetism, to discover special relativity, and to predict the existence of dark matter.

We’re also quite good at doing the profoundly ordinary. We can fix a leaky faucet. Drive to the store. Order a book online. Chat with a neighbor about the approaching storm and whether the garbage pickup will be delayed. Read a room.

Human-centric extreme generalization is, quite literally, what we do when we use our minds.

It remains unclear where the true horizon of this kind of intelligence lies. But what is absolutely clear is this:

We do not — even remotely — have systems today that can move from statistical learning theory’s model of local generalization to the kind of open-ended, adaptive creativity that empowered us to build computers in the first place, and to found the field of AI itself.

Toward a humanistic theory of intelligence

In my next essay, I want to dive deeper into why the framework of “known knowns,” “known unknowns,” and “unknown unknowns” is so fertile — not only for understanding the “I” in AI, but for understanding the broader class of problems that we human beings ourselves uniquely encounter.

We are, after all, moving toward a humanistic theory.

We must pass through the dominant cognitive ideas of our time — only, I hope, to arrive, as the poet hoped, back at where we began, and to know the place for the first time.

Here’s Part 1: The fiction of generalizable AI: A tale in two parts. Why intelligence isn’t a linear scale — and why true generalization remains unsolved. The big idea behind generative AI (mistakenly) assumes a network can start blank and be transformed into an intelligent agent simply via enough data. (Erik J. Larson)


Erik J. Larson

Fellow, Technology and Democracy Project
Erik J. Larson is a Fellow of the Technology & Democracy Project at Discovery Institute and author of The Myth of Artificial Intelligence (Harvard University Press, 2021). The book is a finalist for the Media Ecology Association Awards and has been nominated for the Robert K. Merton Book Award. He works on issues in computational technology and intelligence (AI). He is presently writing a book critiquing the overselling of AI. He earned his Ph.D. in Philosophy from The University of Texas at Austin in 2009. His dissertation was a hybrid that combined work in analytic philosophy, computer science, and linguistics and included faculty from all three departments. Larson writes for the Substack Colligo.
