Marvin Minsky Asks: Can GPT4 Hack Itself?
Will AI of the future be able to count the number of objects in an image?
Marvin Minsky, who co-founded the Massachusetts Institute of Technology’s AI laboratory, was not a fan of artificial neural networks. In the late 1960s he co-authored the book Perceptrons,[i] which was a factor in the near disappearance of AI research funding. Even after the objections raised in the book were overcome, Minsky’s dislike of neural networks persisted. In a YouTube video posted in 2016,[ii] he echoed grunge band Nirvana’s Kurt Cobain with the refrain “Hey! Wait! I got a new complaint.” Minsky said:
“One [of] the things they [neural networks] couldn’t learn was to count the number of objects in the picture.”
Ironically, as a safeguard, today’s ChatGPT4o (GPT4) can require you to count objects in a picture before allowing access. If GPT4 were capable of counting the number of objects in an image, it could potentially hack itself.
So, let’s give it a try.
Shown is a picture of an object-counting puzzle that was presented to me to gain access to GPT4.[iii] There are five possible answers and only one is right. I asked GPT4 the following: “Here are five images labeled 1 through 5. Which image solves the puzzle?”

The answer obvious to us humans is the image labeled #1. GPT4 flunked the overall test and said three of the five images, including #1, contained exactly three objects. But image #1 is the only image with three objects.
Here is the detailed response from GPT4, with my grading of “Correct” or “Wrong” next to each item:
To solve this puzzle, we need to identify the image that contains exactly three of the given shapes (which appear to be some sort of boxes or pouches).
Upon examining the images:
- Image 1: Contains three matching shapes. (Correct and the right answer.)
- Image 2: Contains three matching shapes. (Wrong – there are two)
- Image 3: Contains one matching shape. (Correct)
- Image 4: Contains no matching shapes. (Correct)
- Image 5: Contains three matching shapes. (Wrong – there are four)
Images 1, 2, and 5 each contain exactly three matching shapes.
So, the images that solve the puzzle are 1, 2, and 5.
This is a poor response with two false positives. But maybe it’s a fluke. Let’s try again with an example using six different images. I asked the same question, adjusted for six images: “Here are six images labeled 1 through 6. Which image solves the puzzle?”

GPT4 gets it wrong again. Before, it overestimated. Here, it underestimates. GPT4 said none of the images solved the puzzle. To a human, the obvious correct answer is image #2, where two of the four flasks are partially obscured by another flask.
With my grading, here is GPT4’s detailed response to the six-image query.
To solve this puzzle, we need to find the set of pieces that match the given image showing four identical flasks. Each image (1 through 6) has various shapes, and we need to find the one that contains exactly four flasks matching the given one.
Upon examining the images:
- Image 1: Contains two matching flasks. (Correct)
- Image 2: Contains no matching flasks. (Wrong. There are four.)
- Image 3: Contains no matching flasks. (Correct)
- Image 4: Contains one matching flask. (Correct)
- Image 5: Contains one matching flask. (Wrong. There are two.)
- Image 6: Contains two matching flasks. (Wrong. There are three.)
None of the images contain exactly four flasks as required by the puzzle. Therefore, there seems to be no correct answer among the given options based on the provided image and the requirements of the puzzle.
Again, GPT4 couldn’t count. After many years, Minsky’s counting claim seems to stand strong.
Will AI of the future be able to count the number of objects in an image? It looks like a vanilla generative transformer is not able to count based on training data alone. This does not mean, however, that a future augmented GPT will be unable to count objects in images. The immediate solution is not beefing up generative transformers but adding new modules to the software.
Any smart software can be made smarter by adding modules that work together in a mixture of experts. When you ask Google to do some complicated arithmetic like multiplying or dividing, it does not search the web for an answer. Rather, it switches to a calculator mode. In fact, I suspect[iv] that GPT4 similarly integrates a math module when tasked with complex mathematical problems.[v] Today’s GPT4 can solve problems in trigonometry, algebra, calculus and differential equations. But it can’t count.
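To make the calculator-mode idea concrete, here is a minimal sketch in Python of how a query might be routed to a math module before it ever reaches a language model. It is an illustration only: the names calculate, answer and language_model are hypothetical, and nothing here reflects how Google or OpenAI actually implement this.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator module understands.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate plain arithmetic such as '12 * (3 + 4)' without using eval()."""
    def _eval(node):
        # A bare number (bool is excluded by the exact type check).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # A binary operation like a + b or a * b.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # A leading minus sign.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("not plain arithmetic")

    return _eval(ast.parse(expression, mode="eval").body)


def answer(query, language_model):
    """Try the calculator first; fall back to the (assumed) language model."""
    try:
        return calculate(query)
    except (ValueError, SyntaxError):
        return language_model(query)
```

Calling answer("12 * (3 + 4)", some_model) returns 84 from the calculator without consulting the model, while a question such as “Who was Minsky?” fails to parse as arithmetic and is handed to the model instead.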
Like the math module, an image intelligence module that counts objects could be folded into GPT4 and other AI models.
Minsky may be technically right. Neural networks by themselves can’t count. Maybe they never will. This raises the question: Are there other seemingly simple tasks that neural networks cannot perform? The answer is probably yes.
But limitations on neural networks are not necessarily limitations on AI. Neural networks are only a subset of AI. And just because neural networks used in generative transformers are incapable of doing something like counting doesn’t mean computers in general can’t do it. An object counting module could be added to GPT4.
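As a small illustration that counting is well within reach of ordinary software, here is a sketch in Python of a classical, non-neural technique: connected-component counting on a black-and-white image. The function name count_objects and the toy image are my own, for illustration. This toy version assumes the objects are already separated from the background and do not touch or overlap, so it would not handle the partially obscured flasks in the puzzle above.

```python
from collections import deque


def count_objects(grid):
    """Count groups of 1s that are connected up, down, left or right."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # An unvisited foreground pixel starts a new object.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill marks every pixel of this object.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# A toy 4x5 "picture": 1 is part of an object, 0 is background.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_objects(image))  # prints 3
```

The point is not that this particular routine would solve the access puzzle, only that counting itself is a routine job for a purpose-built module.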
Don’t miss the bigger picture here. Recognizing the limitations of a specific AI program and developing augmenting modules is a function of human intellect — not AI. As outlined in my book Non-Computable You, creativity and understanding, properly defined, lie beyond the capability of the computers of today and tomorrow.
[i] Marvin Minsky and Seymour Papert, Perceptrons: An Introduction to Computational Geometry (Cambridge, MA: MIT Press, 1969).
[ii] The video was posted October 17, 2016, but Minsky died on January 24, 2016, so it must have been recorded earlier.
[iii] I inserted the numbering on both pictures.
[iv] I “suspect” because I have not examined OpenAI’s secretive code.
[v] Using such a module in GPT was suggested by Stephen Wolfram in his book What Is ChatGPT Doing … and Why Does It Work? Wolfram is the brains behind the mathy software package Mathematica that can solve problems in trigonometry, algebra, calculus and differential equations.