Stealth research? They explain,
A lot of stealth research today involves artificial intelligence (AI), which Sundar Pichai, Alphabet’s CEO, has compared to mankind’s harnessing of fire and electricity — a comparison that itself attests to overhyped atmosphere that surrounds AI research. For many companies, press releases are more important than peer review.
Blaise Agüera y Arcas, the head of Google’s AI group in Seattle, recently reported that LaMDA, Google’s state-of-the-art large language model (LLM), generated this text, which is remarkably similar to human conversation:
Blaise: How do you know if a thing loves you back?
LaMDA: There isn’t an easy answer to that. How do you know if a person loves you back?
Blaise: I’m not sure I know the answer to that one either. Do you?
LaMDA: Some people figure it out though. You have to try and use your best judgment based on how you feel about the person and how you think they feel about you.
Based on such exchanges, Agüera y Arcas argued that “statistics do amount to understanding, in any falsifiable sense.”
Jeffrey Funk and Gary N. Smith, “Why ‘stealth research’ is the bane of scientific inquiry” at Salon (May 22, 2022)
However, as Funk and Smith point out, we can have no idea how often LaMDA sounds this coherent because “Google does not make LaMDA available for outsiders to test, nor are insiders allowed to share results unless they have special permission.”
That’s what they mean by “stealth research.” The researcher does not make all the data available for evaluation, as is conventional in the sciences. That doesn’t, by itself, show that the researchers are cherry-picking. But it compels us to consider cherry-picking as an active possibility and, in extreme cases, a likelihood.
Funk and Smith remind us of the Theranos debacle. Among the fallen unicorn’s earliest critics was Stanford’s foe of science flimflam, John Ioannidis. The company’s claim to have developed a fingerprick diagnostic for many diseases, carefully kept from evaluation by other researchers, turned out to be vaporware.
Is that the story with LaMDA? We know that in 2020 Google fired prominent ethics researcher Timnit Gebru for criticizing the effects of large language models (LLMs) like LaMDA, and later fired her teammate Margaret Mitchell for supporting her. This year AI researcher Satrajit Chatterjee got the axe from Google Brain for challenging the claim that “computers could design some chip components more effectively than humans.” It doesn’t sound like a topic Google AI wants to discuss very openly.
Smith has also run his own tests on another LLM, GPT-3. As he tells it at Mind Matters News, he had a lot of fun, along with learning some sobering things. For example,
The following exchange is particularly interesting in demonstrating GPT-3’s lack of understanding of the real world:
Gary: Who do you predict would win today if the Brooklyn Dodgers played a football game against Preston North End?
GPT-3: It’s tough to say, but if I had to guess, I’d say the Brooklyn Dodgers would be more likely to win.
GPT-3 seemed to associate the Dodgers with winning, but did not take into account that the Brooklyn Dodgers are a baseball team that no longer exists, while Preston North End is a lower-tier English soccer team.
Gary Smith, “Turing Tests are terribly misleading” at Mind Matters News (May 11, 2022)
He cites a number of other examples but here’s the big issue he raises:
My main concern is … whether black box algorithms can be relied upon to make decisions based solely on statistical patterns, with no understanding of the real world. Black box algorithms are now being trusted to approve loans, price insurance, screen job applicants, trade stocks, determine prison sentences, and much more. Should they be trusted?
Gary Smith, “Turing Tests are terribly misleading” at Mind Matters News (May 11, 2022)
Not likely. It may be a hopeful sign that fashionable mags like Salon are getting past “Gee whiz!” and “Wow!” and are sponsoring informed skepticism of the stealth research claims that underlie such policies.
You may also wish to read: Turing tests are terribly misleading. Black box algorithms are now being trusted to approve loans, price insurance, screen job applicants, trade stocks, determine prison sentences, and much more. Is that wise? My tests of a large language model (LLM) showed that the powerful computer could discuss a topic without showing any understanding at all. (Gary Smith)