AI vs. Artists?: Here’s What DALL-E 2 Just Can’t Do
Artificial intelligence may be able to create generic images of particular items, but it can’t grasp and convey general concepts, says engineering prof Karl Stephan
(This article by engineering prof Karl D. Stephan originally appeared at Engineering Ethics Blog (August 14, 2022) under the title “AI Illustrator Still Looking for Work: The Shortcomings of DALL-E 2,” and is reprinted with permission.)
The field of artificial intelligence has made great strides in the last couple of decades, and any time a new AI breakthrough is announced, critics voice concerns that yet another field of human endeavor has fallen victim to automation and will disappear from the earth when machines replace the people who do it now.
Until recently, the occupation of art illustrator seemed reasonably safe from assault by AI innovations. Only a human, it seemed, can start with a set of verbal ordinary-language instructions and come up with a finished work of art that fulfills those instructions. And that was mostly true until AI folks began tackling that problem.
The AI research lab OpenAI has publicized its work in this area using a “transformer” type of program that has gone through two versions so far, DALL-E and now DALL-E 2. I’m sure the late surrealist artist Salvador Dali would be pleased at the honor of having a robot artist named after him, but the connection between his works and the productions of DALL-E 2 is perhaps closer than the researchers would like. In an article in IEEE Spectrum, journalist Eliza Strickland highlights the shortcomings of DALL-E 2’s productions and speculates on whether such systems will ever be generally useful.
As is the case with most such human-task-imitating systems, the first step the researchers took was to compile a large number of examples (650 million in the case of DALL-E 2) to train the software about what illustrations look like. The images and their accompanying descriptive texts were all from the Internet, which has its own biases, of course. The researchers have learned some lessons from previous fiascos with AI software that allowed random users to request sketchy or offensive products, so they have been very careful about pre-screening the training images and (in some cases) manually censoring what DALL-E 2 comes up with. They have also not released the program for general use yet, but have carefully selected users under controlled conditions. If you just let your imagination roam with the scenario of letting some randy teenage boys loose with a program that will make a picture of whatever they describe to it, you can see the potential for abuse.
For certain purposes, DALL-E 2 does fine. If you want a generic type of picture that would look good as a filler for a brochure about a meeting, and you just want to show some people in a corporate setting, DALL-E 2 can do that. But so can tons of free clip-art websites. If you want an image that conveys specific information, however (a diagram, say, or even text), DALL-E 2 tends to fall flat on its digital face. The Spectrum journalist was privileged to make a few text requests to DALL-E 2 for specific images. She asked for an image of “a technology journalist writing an article about a new AI system that can create remarkable and strange images.” She got back three photo-like pictures, but they were all of guys, and only one seemed to have anything to do with AI. Then she asked for “an illustration of the solar system, drawn to scale.” All three results had a sun-like thing somewhere, and planet-like things, and white circular lines on a black background showing the orbits, but the number of planets and what they looked like were pretty random. So technical illustrators (those who are left after software like Adobe Illustrator has made every man or woman his or her own illustrator) need not file for unemployment insurance right away. DALL-E 2 has a way to go yet.
There is a fundamental question lurking in the background of AI exploits like this. It can be phrased a number of ways, but it basically amounts to this: will AI software ever show something that amounts to human-like general intelligence? And believe it or not, your humble scribe, along with Gyula Klima, a philosopher then at Fordham University, recently published a paper addressing just that question, and we concluded that the answer was “No.”
As you might guess when a philosopher gets involved, the details are somewhat complicated. But we began with the notion that the intellect, which is a specific power of the human mind, relies on the use of concepts. In the limited space I have, I can best illustrate concepts with examples. The specific house I live in is a particular thing. There is only one house exactly like mine. I can remember it, I can form a mental image of it, and I can even imagine it with a different color of trim than it actually has. And software programs can do what amounts to these sorts of mental operations as well. In my mind, my house is a perception of a real, individual thing.
By contrast, take the concept of “house.” Not my house or your house, just “house.” Any mental image you have that is inspired by “house” is not identical to “house”—it’s only an example of it. The idea or concept denoted by the word “house” is not reducible to anything specific. The same goes for ideas such as freedom or conservatism. You can’t draw a picture of conservatism, but you can draw a picture of a particular conservative.
In our paper, we gave strong evidence in favor of the notion that because concepts cannot be reduced to representations of individual things, AI programs will never be able to use them. In the article, we used examples from an art-generating AI program that in some ways resembles DALL-E 2, in that it was trained on thousands of artworks and then made to generate artwork-like images. The results showed the same kind of fidelity to superficial details and total absence of underlying coherence that DALL-E 2’s productions showed. As one of the OpenAI researchers quoted in the Spectrum article noted, “DALL-E doesn’t know what science is . . . . [S]o it tries to make up something that’s visually similar without understanding the meaning.” That is, without having any concept of what it’s doing.
The big question is whether further research in AI will produce programs that truly understand concepts, and use that understanding to guide their production of art, text, or what have you. Klima and I think not, and you can look up our article to understand why. But we may be wrong, and only time and more AI research will tell.
Sources: Eliza Strickland’s article “DALL-E 2’s Failures Reveal the Limits of AI” appeared on pp. 5-7 of the August 2022 print issue of IEEE Spectrum. “Artificial intelligence and its natural limits” by Karl D. Stephan and Gyula Klima appeared in vol. 36, no. 1, pp. 9-18, of AI & Society in 2021.