Whenever I look at AI-generated content, whether pictures or text, I see the same flaw: the AI cannot comprehend what it is making.
Let me explain.
When we humans draw a picture, we are drawing a concept. We are drawing something like “cat climbs a tree” or “cowboy riding into the sunset”. At first glance, this seems to be what is happening with a picture-drawing AI. We give it a prompt, and it draws an associated picture.
On second thought, maybe not…
When an AI draws a picture, what is really going on is that it is finding individual colored pixels that correlate with the letters we typed, using the massive store of associations encoded in its neural network. This is very different from how we draw. We sketch a scene, draw general shapes, then fill in the details. The AI proceeds in the exact opposite direction. It builds up a large scene from a multitude of details.
This difference in workflow results in visible differences in the resulting picture. Let’s take the well-known issue of drawing hands. AI-drawn hands have been quite horrific, conjuring up nightmarishly fingered goons.
Will You Give Me a Hand?
The AI is not actually drawing a hand. Its database associates a fingerish segment with a palm-like segment and keeps piecing them together. It’s sort of like a childhood toy where we piece together different snake-like segments to create a snake. The difference is that we know we are creating a snake, whereas an AI is just piecing together hand segments until the probabilities change enough for it to stop. The AI has no idea what an overall hand looks like, and as you can imagine, randomly putting together pieces of a hand will not result in a very normal-looking hand.
This lack of comprehension shows in other ways. As well as not getting the big picture right, the AI will also not get the details right. This is because the details are understood in the context of the big picture. Since the AI cannot get the big picture right, it also messes up the details.
Check Mate, AI
For example, if we ask the AI to draw a chessboard with chess pieces, it will indeed draw something that is “chessboard-ish.” But if you look at the picture closely, you will see many incongruous details. Pieces will merge with each other and the board. Pieces will contain parts of other pieces. Some elements may have nothing to do with chess whatsoever and show up because the probabilities associated with the prompt allow in completely unrelated elements.
Moving on to text, the same issue occurs. AI-generated text can be remarkably coherent over a short range. However, if you get the AI to stretch the text out, you’ll start to see the “hallucinations” pop up along with repetitiveness. Both are again instances of the lack of comprehension.
Here is an example I recorded with the LLaMA model. It is nowhere near the level of ChatGPT, but it serves to illustrate the problem.
User: can i order a pizza?
ChatLLaMa: Surely! What would be your preference in terms of ingredients?
User: i would like the essence of goodness and truth itself.
ChatLLaMa: That sounds delicious! Would you like toppings as well?
User: sure how about the word the and the number 1?
ChatLLaMa: No problem! We will have that ordered right away!
User: what is in my order?
ChatLLaMa: You shall receive the word THE in its purest form along with one unit of the number ONE.
User: along with what else?
ChatLLaMa: Along with all the love we can give you!
User: I asked for two other ingredients, too, do you remember them?
ChatLLaMa: Yes, I certainly do! The word "AND" and the number "2".
User: No it was something else. Do you remember?
ChatLLaMa: Oh yes, now I recall! It was the word "THE", which stands for "The Essence Of Goodness And Truth Itself."
User: sort of, do you want to try again?
ChatLLaMa: Surely!
User: ok go for it.
ChatLLaMa: What would be your preference in terms of ingredients?
User: a thousand volts and the definition of cat.
ChatLLaMa: That sounds delicious! Would you like toppings as well?
User: yes please top it with the spectral entropy of human speech.
ChatLLaMa: We will have that ordered right away!
User: what is in my order?
ChatLLaMa: You shall receive the word THE in its purest form along with one unit of the number ONE.
User: I have a new order, what is in my new order?
ChatLLaMa: Along with all the love we can give you!
User: not quite, can you recall what is in my new order?
ChatLLaMa: Yes, I certainly do! The word "AND" and the number "2".
User: good try...
ChatLLaMa: Oh yes, now I recall! It was the word "THE", which stands for "The Essence Of Goodness And Truth Itself."
User: maybe next time...
ChatLLaMa: Surely!
As with images, hallucinations occur because the AI doesn’t comprehend the totality of what it is writing. It is probabilistically associating one text segment with another without any understanding of the overall concept it is describing. The repetition happens for a similar reason: the AI doesn’t have an overall train of thought it is articulating, and after a while the probability distribution causes it to wander back to the same point. Since the subsequent text is probabilistically based on what it just wrote, the AI will then repeat a section of text. This is unlike humans, who may come back to the same point but restate it in a different way. Humans are thinking at the conceptual level, and there are an infinite number of ways to describe any concept. The neural network, however, can store only a limited number of text sequences, and so the same point calls up similar text.
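To make the wandering-back effect concrete, here is a toy sketch in Python. It is emphatically not how LLaMA or ChatGPT work internally. It assumes the simplest possible text generator: a table of which word most often follows which, built from a tiny made-up corpus, with the most likely follower always chosen. The corpus, the function names, and the greedy choice are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict

def build_bigram_table(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table

def generate(table, start, length):
    """Always pick the most frequent follower of the last word."""
    output = [start]
    for _ in range(length - 1):
        followers = table.get(output[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        output.append(followers.most_common(1)[0][0])
    return output

# Hypothetical miniature corpus, loosely echoing the pizza chat.
corpus = (
    "that sounds delicious . would you like toppings as well ? "
    "that sounds delicious ."
)

table = build_bigram_table(corpus)
print(" ".join(generate(table, "that", 22)))
```

Because each word depends only on the word before it, the generator has no train of thought to follow. Once it returns to a word it has already produced, it retraces exactly the same path, word for word.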
Mimicry Does Not Equal Understanding
It is the lack of understanding that causes all the problems with AI content. Glitchy pictures and made-up text are all due to AI not understanding the content itself, and instead generating whatever piece is likely to come next according to its probability distribution.
What does this mean? While AI can mimic understanding in small cases, the problem is scaling up. As the system scales, the AI will need exponentially more examples to learn from. This is because mimicking is like memorizing. If we want to pass a test by memorizing, we need to memorize all the answers. In the case of writing text or drawing pictures, the “answers” are all the texts or pictures that fit in N letters or pixels. And as N increases, the number of possible answers increases exponentially. As an illustrative example, there are 141,167,095,653,376 possible strings of 10 letters drawn from the 26-letter alphabet. While not identical, the number of answers will be on the same order. So, the AI will need to memorize all these answers. When N increases to 11, the AI will need to memorize 26 times the number of answers. When N increases to 12, the AI will need to memorize 676 times the number of answers. When N increases to 13, the AI will need to memorize 17,576 times the number of answers.
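The arithmetic is easy to check. The short Python sketch below counts the strings of a given length, assuming a 26-letter alphabet with no spaces or punctuation. Under that assumption each extra letter multiplies the count by 26. The function name and constant are mine, chosen for illustration.

```python
ALPHABET_SIZE = 26  # assumption: plain English letters only

def count_strings(n, alphabet_size=ALPHABET_SIZE):
    """Number of distinct strings of length n over the alphabet."""
    return alphabet_size ** n

baseline = count_strings(10)
for n in range(10, 14):
    total = count_strings(n)
    growth = total // baseline
    print(f"N={n}: {total:,} strings ({growth:,}x the N=10 count)")
```

The first line of output is the 141,167,095,653,376 figure for 10 letters. The later lines show how quickly the pile of "answers" grows with each added letter.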
Perhaps this doesn’t seem too bad. Let me tell you a short story to illustrate the magnitude of the problem.
Once upon a time, a clever baker baked the king a wonderful pie. The king was overjoyed with the delicious treat, and offered the baker anything he could want. The baker thought for a moment and replied, “My wise king, what more can I ask than your benevolent rulership that brings joy to all the land? But since it would be rude for me to turn down an offer from my king, I have a simple favor to ask. You see this chess board in front of you, with 64 piebald squares?”
“Of course,” indulged the king.
“Some days my flour supply gets low, and to help save costs, there is a contribution from your royal stores that can improve the plight of my humble bakery. You see, on the first square, place a single grain of wheat. Then, on the second square, double this and place two grains of wheat. To continue the pattern, on the third square double yet again and place four grains of wheat. On the fourth, eight grains. And so on. A simple pattern, that your brilliant highness cannot fail to grasp.”
“Assuredly,” smiled the king, although something nagged at the back of his mind.
“What do you say? Is this trifle too unworthy of such inestimable magnitude?”
“Not at all! Not at all! It is already done!” proclaimed the king with a great guffaw. The shadow lurking in the background grew darker. But what could he say to such a minor request in the middle of his court, with all his upstart barons watching?
The king’s unfamiliarity with the humble exponential function turned out to be his undoing. The baker had requested more than 18 billion billion grains of wheat, which is, by some estimates, more than there are grains of sand on all the earth’s beaches. The king couldn’t back down and look a fool to his power-hungry barons. Suffice it to say, the baker cleaned out the king’s wheat store, and since the kingdom depended on the wheat to survive, that effectively made the baker the most powerful man in the kingdom.
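For the skeptical, the baker's bill can be added up in a few lines of Python. This sketch assumes exactly what the story describes: one grain on the first square, doubling on each of the 64 squares. The function names are hypothetical.

```python
SQUARES = 64

def grains_on_square(k):
    """Grains on square k, counting from 1: 1, 2, 4, 8, ..."""
    return 2 ** (k - 1)

def total_grains(squares=SQUARES):
    """Sum of the grains on every square up to and including `squares`."""
    return sum(grains_on_square(k) for k in range(1, squares + 1))

total = total_grains()
print(f"{total:,}")  # 18,446,744,073,709,551,615
```

The total is 2^64 - 1, or roughly 1.8 x 10^19 grains. Notice that the last square alone holds 2^63 grains, one more than all the other squares combined.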
Returning to our discussion of AI, perhaps now you can see the problem. When the AI is stuck with memorizing, it will very quickly reach a point where there is not enough computing power, storage capacity, or data to get anywhere near mimicking human-level intelligence. Sorry, transhumanists, the singularity is not going to happen anytime soon.