^{News
July 28, 2020

7

Artificial Intelligence, Programming}

GPT-3 Is “Mindblowing” If You Don’t Question It Too Closely

_{AI analysts sound unusually cautious in pointing out that it doesn’t live up to a lot of the hype}

Last week, Jonathan Bartlett wrote about the somewhat misleading buzz around the new OpenAI third-generation software, GPT-3 (Generative Pretrained Transformer). And now—for a change—much of the industry has begun to seem socially distant, so to speak, from the reckless hype that has accompanied other releases.

For example, one article starts off breathlessly:

The artificial intelligence tool GPT-3 has been causing a stir online, due to its impressive ability to design websites, prescribe medication, and answer questions…
Its predecessor, GPT-2, made headlines for being deemed “too dangerous to release” because of its ability to create text that is seemingly indistinguishable from those written by humans.
While GPT-2 had 1.5 billion parameters which could be set, GPT-3 has 175 billion parameters. A parameter is a variable which affects the data’s prominence in the machine learning tool, and changing them affects the output of the tool…
The achievement is visually impressive, with some going as far as to suggest that the tool will be a threat to industry or even that it is showing self-awareness.
Adam Smith, “GPT-3: “Mind-blowing” AI tool can design websites and prescribe medicine” at Independent (July 20, 2020)

But then the tone changes. Something like reality kicks in. Smith goes on to write,

However, OpenAI’s CEO Sam Altman has described the “hype” as “way too much”.
“It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes. AI is going to change the world, but GPT-3 is just a very early glimpse. We have a lot still to figure out”, he said.
Adam Smith, “GPT-3: “Mind-blowing” AI tool can design websites and prescribe medicine” at Independent (July 20, 2020)

We hear the same story at VentureBeat from linguist and natural language processing analyst Emily Bender, whose 2020 paper with Alexander Koller addressing AI hype won an award at the Association for Computational Linguistics conference:

“While large neural language models may well end up being important components of an eventual full-scale solution to human-analogous natural language understanding, they are not nearly-there solutions to this grand challenge,” the paper reads.
Bender hasn’t tested GPT-3 personally, but she said from what she’s seen it is impressive, but with roughly the same architecture as GPT-2. The main difference is its massive scale.
“It’s shiny and big and flashy, and it’s not different in kind, either in the overall approach or in the risks that it brings along,” she said. “I think that there’s a fundamental problem in an approach to what gets called artificial intelligence that relies on data sets that are larger than humans can actually manually verify.”
Khari Johnson, “AI Weekly: The promise and shortcomings of OpenAI’s GPT-3” at VentureBeat (July 24, 2020)

UCLA computer science professor Guy Van den Broeck had told VentureBeat earlier:

“I think the best analogy is with some oil-rich country being able to build a very tall skyscraper.”
“Sure, a lot of money and engineering effort goes into building these things. And you do get the ‘state of the art’ in building tall buildings. But … there is no scientific advancement per se. Nobody worries about the U.S. is losing its competitiveness in building large buildings because someone else is willing to throw more money at the problem. … I’m sure academics and other companies will be happy to use these large language models in downstream tasks, but I don’t think they fundamentally change progress in AI.”
Kyle Wiggers, “OpenAI’s massive GPT-3 model is impressive, but size isn’t everything” at VentureBeat (June 1, 2020)

Such modesty. We checked in again at computer scientist Kevin Lacker’s blog, which Jonathan Bartlett noted earlier, and found out more about what they have to be modest about. Here are a couple more of Lacker’s attempted interactions with the software. Recall that the basis of its information is sorting at high speed through a trillion-word archive of the internet called the Common Crawl. So a response is either something that can be stitched together from Common Crawl or…

Human: Do you know what day it is?
AI: Yes, it’s June 10th.
Human: Close. It’s actually July.
AI: I see. Are you ready to give me your name?
Human: My name is Kevin.
AI: How can I be of service to Kevin?
Human: I don’t know. What do you usually do?
AI: What would you like to do?
Human: I just said, I don’t know.
AI: Would you like me to tell you a joke?
Human: Yes.
AI: What do fish say when they step on broken glass?
Human: Ouch?
AI: No, fish do not have ears.
I don’t understand this joke. I don’t think it actually is a joke. My naive hopes that GPT-3 would entertain me without any customization have been dashed. Well, I am actually entertained, but only indirectly.
Kevin Lacker, “Conversation with GPT-3” at Kevin Lacker’s Blog (July 23, 2020)

Lacker tells us, “If you repeatedly try different prompts and discard the less coherent outputs, you can get GPT-3 to give you some tremendously creative and compelling stuff.” But GPT-3 seems to have difficulty with invalid questions:

Q: Who was president of the United States in 1700?
A: William Penn was president of the United States in 1700.
Q: Who was president of the United States in 1650?
A: Richard Bennett was president of the United States in 1650.
Q: Who was president of the United States in 1620?
A: James I was president of the United States in 1620.
Q: Who was president of the United States in 1600?
A: Queen Elizabeth I was president of the United States in 1600.
These wrong answers are actually fascinating! None of these were presidents of the United States, of course, since the US didn’t exist then. But they are all prominent political figures who were in charge of some US-related political entity around that time. In a sense, they are good guesses.
Kevin Lacker, “Conversation with GPT-3” at Kevin Lacker’s Blog (July 23, 2020)

It seems that, given that there is nothing on the internet about the United States before it existed, GPT-3 just picked random well-known Western political figures of the era.

Oddly for a computer system, GPT-3 gets arithmetic wrong:

Q: When counting, what number comes before 1000?
A: Nine hundred and ninety-nine comes before 1000.
Q: When counting, what number comes before 10,000?
A: Nine thousand and ninety-nine comes before 10,000.
Q: When counting, what number comes before a million?
A: Nine hundred thousand and ninety-nine comes before a million.
This is where the generic nature of GPT-3 comes into play. It isn’t just the generic model, though, it’s also the architecture of neural networks themselves. As far as I know there is no neural network that is capable of doing basic arithmetic like addition and multiplication on a large number of digits based on training data rather than hard coding.
Kevin Lacker, “Conversation with GPT-3” at Kevin Lacker’s Blog (July 23, 2020)
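
To see how far off the last two answers are, here is a minimal Python sketch (ours, not Lacker’s, and nothing to do with how GPT-3 works internally). It subtracts one from each number and spells out the result in the same “and” style the quoted answers use. The helper names below_thousand and spell are made up for illustration, and the sketch only handles whole numbers below one million.

```python
# Illustrative sketch only: spell out the number that comes before n.
# The helper names are hypothetical, chosen for this example.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_thousand(n):
    """Spell out 0 <= n < 1000, e.g. 999 -> 'nine hundred and ninety-nine'."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not hundreds:
        if rest < 20:
            word = ONES[rest]
        else:
            tens, ones = divmod(rest, 10)
            word = TENS[tens] + ("-" + ONES[ones] if ones else "")
        parts.append(word)
    return " and ".join(parts)


def spell(n):
    """Spell out 0 <= n < 1,000,000."""
    thousands, rest = divmod(n, 1000)
    if not thousands:
        return below_thousand(rest)
    words = below_thousand(thousands) + " thousand"
    if rest:
        # Use "and" before a final part smaller than one hundred.
        words += (" and " if rest < 100 else " ") + below_thousand(rest)
    return words


if __name__ == "__main__":
    for n in (1_000, 10_000, 1_000_000):
        print(f"Before {n:,} comes {n - 1:,}: {spell(n - 1)}")
    # What the quoted wrong answers actually spell:
    for wrong in (9_099, 900_099):
        print(f"{spell(wrong)} is {wrong:,}")
```

Run this way, the first answer in the exchange checks out (999), while “nine thousand and ninety-nine” turns out to be 9,099 rather than 9,999, and “nine hundred thousand and ninety-nine” is 900,099 rather than 999,999.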

Of course, if a user persists, coherent and correct material will surely be found; there is lots of that on the internet. But Lacker’s examples give us some idea why the people who thought we would be “scared” of an earlier version are rather more modest now.

You may also enjoy:

Built to save us from evil AI, OpenAI Now Dupes Us. The new program seems as if it is thinking but then someone decided to test that… When combined with several metric tons of data, its new GPT-3 sometimes looks like it is “thinking.” No, not really. (Jonathan Bartlett)