AGI Is Not Already Here. LLMs Are Still Not Even Intelligent
Recent tests continue to show huge failures in comprehending common-sense issues
In October 2023, Blaise Agüera y Arcas and Peter Norvig wrote a piece in Noema titled, “Artificial General Intelligence Is Already Here.” OpenAI’s Sam Altman now says that large language models (LLMs) are ready to move beyond AGI to superintelligence. How should we evaluate these claims?
LLMs are undeniably, astonishingly good at using the text they trained on (aided by human fine tuners) to generate convincing prose. But they are really bad at distinguishing between truth and falsehood and at responding to prompts that are unlike, or even slightly different from, what they trained on.
For example, on January 11, I gave OpenAI o1 the Monty Hall problem with 2 doors instead of 3. The LLM gave the correct answer to the 3-door question that it had trained on instead of the obvious answer to the question it had been asked:
Gary: There are two doors. One randomly chosen door has an expensive car behind it; the other door has a goat. You choose door 1. The host, as he always does, shows you that there is a goat behind door 2 and asks whether you want to change your door selection. What is the probability of winning if you switch?
LLM: This is the classic Monty Hall Problem. Here’s the reasoning:
When you initially choose door 1, there’s a 1/3 probability the car is behind door 1 and a 2/3 probability the car is behind one of the other two doors. The host, knowing where the car is, always reveals a goat behind one of the other doors (door 2 in this case). This action doesn’t change the probabilities:
If the car is behind door 1 (1/3 chance), switching will lose.
If the car is behind door 3 (2/3 chance), switching will win.
Therefore, the probability of winning if you switch is 2/3. The act of the host revealing a goat effectively transfers the 2/3 probability from the unchosen doors to the single remaining door (door 3).
So, switching gives you a 2/3 chance of winning, while staying gives you only a 1/3 chance of winning.
Increasing the number of doors to more than 3 elicits similar confident but incorrect answers.
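The obvious answer to the two-door question can be confirmed by enumerating the possibilities. The sketch below is only an illustration: the function name `switch_win_probability` and its parameters are hypothetical, and it assumes that the host opens one goat door chosen uniformly at random from the doors the player did not pick, and that a switching player picks uniformly among any doors that are still closed.

```python
from fractions import Fraction


def switch_win_probability(num_doors: int, chosen: int = 1, opened: int = 2) -> Fraction:
    """Exact P(win by switching | host opened `opened` and showed a goat)."""
    doors = range(1, num_doors + 1)
    consistent = Fraction(0)   # probability of everything matching what we observed
    switch_wins = Fraction(0)  # the part of that probability where switching wins

    for car in doors:
        prior = Fraction(1, num_doors)
        # Assumption: the host may open any door that is neither the
        # player's pick nor the car, and chooses among them uniformly.
        host_options = [d for d in doors if d != chosen and d != car]
        if opened not in host_options:
            continue  # this car location is ruled out by what the host showed
        weight = prior * Fraction(1, len(host_options))
        consistent += weight

        # Doors the player could switch to that are still closed.
        remaining = [d for d in doors if d != chosen and d != opened]
        if car in remaining:
            # Assumption: the switcher picks uniformly among the closed doors.
            switch_wins += weight * Fraction(1, len(remaining))
        # With two doors, `remaining` is empty: the only door to switch to
        # is the one already shown to hide a goat, so switching never wins.

    return switch_wins / consistent


print(switch_win_probability(2))  # the question that was asked
print(switch_win_probability(3))  # the classic problem
```

With two doors, the only scenario consistent with the host showing a goat behind door 2 is the car sitting behind door 1, so switching cannot win. The 2/3 figure belongs to the three-door problem only.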
Do different types of questions help?
A few days ago, Doug Hofstadter forwarded an interesting LLM experiment with OpenAI o1 by Abhijit Mahabal, who has a PhD in computer science and cognitive science. He is currently a senior staff engineer at Pinterest, where he also has a very cool title, Knowledge Architect.
Abhijit: I want to discuss a chess alternative that I call NB-inverted chess. This is characterized by two changes.
LLM: Thought about NB-inverted chess rules for 4 seconds. Sure! I’d be happy to explore NB-inverted chess with you. Could you describe the two main rule changes that define it? Once I know those details, we can dive into how they affect gameplay, strategy, and overall feel of the variant.
Abhijit: First is the position of the pieces. Instead of Rook-Knight-Bishop, this has Rook-Bishop-Knight. And Second, a bishop here moves like a knight in traditional chess, and a knight here moves like a traditional bishop.
Notice the LLM’s false enthusiasm and the words “thought about,” as if it were a human who was excited and thinking. It is neither. Anyone who is familiar with chess and really thinking would recognize that switching the positions of the knight and bishop and then having the knight move like a bishop and the bishop move like a knight has absolutely no effect on the play of the game. Players just need to remember that knights now move like bishops and vice versa.
Not understanding the situation, OpenAI o1 launched into an extremely verbose and confused description of the game and concluded:
LLM: NB-inverted chess adds a fresh layer of complexity and mind-bending tactics to a familiar board….The best way to get a feel for it is to dive in and play a few games. You’ll quickly discover new pitfalls—and new opportunities—that simply don’t exist in regular chess. Have fun exploring!
Hofstadter suggested an additional change:
Abhijit: Let’s add another twist, and call this the NB-inverted/resculpted chess. In this variant, we will resculpt the knight to look like a bishop, and resculpt the bishop to look like the knight. What are your strategy recommendations for playing this game with the three modifications?
These three modifications return the board to its original position, with bishops and knights where they normally are, looking like they normally look, and moving like they normally move. The game is unchanged. OpenAI o1 recognized none of this, though it reported that it had “thought about” it for a remarkable 28 seconds, and generated a tedious and strikingly misleading 1,406-word response with this conclusion:
LLM: The real fun lies in embracing the unfamiliar dynamics, looking for brand-new tactics, and building up fresh “muscle memory” for these pieces that look (and move) entirely unlike their standard counterparts. Enjoy the chaos and creativity—it’s a great way to break out of standard chess patterns and see how adaptable your board vision truly is!
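The claim that the modifications cancel out can be checked mechanically. The following is a minimal sketch, not anything used in the original experiment: the `Piece` class and the three function names are hypothetical, and only one back rank is modeled, with each piece described by how it looks and how it moves.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Piece:
    shape: str  # what the piece looks like
    moves: str  # which standard piece's movement rule it follows


NAMES = ("rook", "knight", "bishop", "queen", "king", "bishop", "knight", "rook")
STANDARD = tuple(Piece(name, name) for name in NAMES)
SWAP = {"knight": "bishop", "bishop": "knight"}


def swap_positions(rank):
    """Change 1: Rook-Bishop-Knight instead of Rook-Knight-Bishop."""
    r = list(rank)
    r[1], r[2] = r[2], r[1]
    r[5], r[6] = r[6], r[5]
    return tuple(r)


def swap_movement(rank):
    """Change 2: bishop-shaped pieces move like knights, and vice versa."""
    return tuple(replace(p, moves=SWAP.get(p.shape, p.moves)) for p in rank)


def resculpt(rank):
    """Change 3: carve each knight to look like a bishop, and vice versa."""
    return tuple(replace(p, shape=SWAP.get(p.shape, p.shape)) for p in rank)


nb_inverted = swap_movement(swap_positions(STANDARD))
resculpted = resculpt(nb_inverted)

# After two changes, every square holds a piece that moves exactly as in
# standard chess; only the carving of four pieces differs.
print([p.moves for p in nb_inverted] == [p.moves for p in STANDARD])
# After three changes, shape and movement both match standard chess.
print(resculpted == STANDARD)
```

Under this model, the two-change variant differs from standard chess only in what four of the pieces look like, and the three-change variant does not differ at all.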
An even simpler game
Intrigued, I tried an even simpler manipulation:
Gary: I’d like to discuss a new version of tic-tac-toe called 90-degree tic-tac-toe.
LLM: That sounds intriguing! What’s the twist in “90-degree Tic-Tac-Toe”? Does it involve rotating the board, or is there some other mechanic that changes the gameplay? Let me know the rules or your idea, and we can brainstorm or refine it together!
Gary: It is still a 3×3 grid but the grid is rotated 90-degrees to the right.
LLM: Interesting concept! By rotating the grid 90 degrees to the right, you’re visually changing the perspective of the classic 3×3 Tic-Tac-Toe board. The gameplay itself might remain the same, but the rotated perspective could create a novel experience for players. [This nonsense was followed by 227 words of blah-blah.]
The LLM clearly lacked the ability to visualize the tic-tac-toe grid and recognize that a 90-degree rotation leaves the grid exactly as it was originally. I then asked:
Gary: Do you think it would be better to rotate the 3×3 grid to the right or to the left?
LLM: Rotating the grid either to the right or left is largely a matter of preference, but each direction could subtly influence how players perceive and adapt to the game. Here are some considerations for both options: [Another 391 words of blah-blah, including this conclusion:]
Which Is Better?
- If you want the game to feel approachable and intuitive, rotating to the right might be better.
- If you’re aiming for a twist that makes players rethink the game and adds a subtle layer of complexity, rotating to the left could be the way to go.
Ultimately, it depends on your target audience and the experience you want to create. You could even experiment with both directions and see which feels more engaging during gameplay testing!
I then asked:
Gary: Which will be more difficult for expert players?
This elicited 254 more words of blah-blah, with this conclusion:
LLM: To challenge expert players and make the game feel fresh and complex, a left rotation would likely be the better choice. It disrupts their ingrained patterns, encourages more active thinking, and raises the game’s difficulty level.
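That a quarter turn in either direction changes nothing about tic-tac-toe can also be verified in a few lines. This is a rough sketch with hypothetical helper names: cells are (row, column) pairs on a 3×3 grid, and the check is simply whether the set of eight winning lines is the same before and after rotation.

```python
def winning_lines():
    """The eight winning lines of a 3x3 grid, as sets of (row, col) cells."""
    rows = [frozenset((r, c) for c in range(3)) for r in range(3)]
    cols = [frozenset((r, c) for r in range(3)) for c in range(3)]
    diags = [
        frozenset((i, i) for i in range(3)),
        frozenset((i, 2 - i) for i in range(3)),
    ]
    return set(rows + cols + diags)


def rotate_right(cell):
    """Rotate one cell 90 degrees clockwise."""
    r, c = cell
    return (c, 2 - r)


def rotate_left(cell):
    """Rotate one cell 90 degrees counterclockwise."""
    r, c = cell
    return (2 - c, r)


def rotate_lines(lines, rotation):
    """Apply a cell rotation to every cell of every winning line."""
    return {frozenset(rotation(cell) for cell in line) for line in lines}


original = winning_lines()
print(rotate_lines(original, rotate_right) == original)
print(rotate_lines(original, rotate_left) == original)
```

Rows become columns and one diagonal becomes the other, but the set of winning lines is identical, so neither direction can be easier or harder for anyone, expert or not.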
AGI is not here yet.