Mind Matters News and Analysis on Natural and Artificial Intelligence

Researchers: Deep Learning Vision Is Very Different from Human Vision

Mistaking a teapot shape for a golf ball, due to surface features, is one striking example from a recent open-access paper
An artificial intelligence network thought there was a 0.41 percent chance that this object was a teapot. Its first choice was a golf ball. (Image: Nicholas Baker, PLOS Computational Biology)

Recent experiments show the “severe limitations” of deep learning networks’ vision, limitations we wouldn’t usually think of, says a team of psychologists. From ScienceDaily:

In the first experiment, the psychologists showed one of the best deep learning networks, called VGG-19, color images of animals and objects. The images had been altered. For example, the surface of a golf ball was displayed on a teapot; zebra stripes were placed on a camel; and the pattern of a blue and red argyle sock was shown on an elephant. VGG-19 ranked its top choices and chose the correct item as its first choice for only five of 40 objects…

VGG-19 thought there was a 0 percent chance that the elephant was an elephant and only a 0.41 percent chance the teapot was a teapot. Its first choice for the teapot was a golf ball, which shows that the artificial intelligence network looks at the texture of an object more so than its shape, said lead author Nicholas Baker, a UCLA psychology graduate student.

“It’s absolutely reasonable for the golf ball to come up, but alarming that the teapot doesn’t come up anywhere among the choices,” Kellman said. “It’s not picking up shape.”

Paper (open access): Nicholas Baker, Hongjing Lu, Gennady Erlikhman, Philip J. Kellman, “Deep convolutional networks do not classify based on global object shape,” PLOS Computational Biology, 2018; 14 (12): e1006613. DOI: 10.1371/journal.pcbi.1006613
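For readers wondering what it means for a network to “rank its top choices” and assign a “0.41 percent chance,” here is a minimal sketch in Python. An image classifier of this kind typically produces one raw score per category, and a softmax function turns those scores into probabilities that sum to 1; the ranking is just those probabilities sorted from highest to lowest. The labels and scores below are made up purely for illustration and are not values from the paper, and the function names are our own.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, logits, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for a teapot covered in golf-ball texture.
# These numbers are invented for illustration only.
labels = ["golf ball", "teapot", "elephant", "sock", "zebra"]
logits = [6.0, 0.5, -2.0, 1.0, 0.0]

for label, prob in top_k(labels, logits, k=3):
    print(f"{label}: {prob:.2%}")
```

Because softmax is exponential, a modest gap in raw scores becomes a very large gap in probability, which is one reason the correct answer can end up with a fraction of a percent when a texture cue dominates.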

Co-author Hongjing Lu raised one concern: “We can fool these artificial systems pretty easily.” The aim of the study, however, was not to “fool” the networks but to determine whether they identify objects in the same way humans do. According to the researchers, the networks did “a poor job of identifying such items as a butterfly, an airplane and a banana.” The explanation they propose is that “Humans see the entire object, while the artificial intelligence networks identify fragments of the object.”

Of course, if someone did intend to fool a deep learning network, it might be deceived by images that would not fool a human viewer.

Author summary: “Deep learning” systems – specifically, deep convolutional neural networks (DCNNs) – have recently achieved near human levels of performance in object recognition tasks. It has been suggested that the processing in these systems may model or explain object perception abilities in biological vision. For humans, shape is the most important cue for recognizing objects. We tested whether deep convolutional neural networks trained to recognize objects make use of object shape. Our findings indicate that other cues, such as surface texture, play a larger role in deep network classification than in human recognition. Most crucially, we show that deep learning systems have no sensitivity to the overall shape of an object. Whereas deep learning systems can access some local shape features, such as local orientation relations, they are not sensitive to the arrangement of these edge features or global shape in general, and they do not appear to distinguish bounding contours of objects from other edge information. These findings show a crucial divergence between artificial visual systems and biological visual processes.

See also: Can an algorithm be racist?

and AI Hype Top Ten 2018: Help, not hype, from a computer science prof