Mind Matters Natural and Artificial Intelligence News and Analysis

Does ChatGPT Pass the Creativity Test?

What does ChatGPT have to do in order to be considered creative?

What is creativity? Where does it come from? Why are some things humans do considered creative, while other things mundane? Can AI be creative?

To answer these questions, let’s come up with a definition.

At a minimum, creativity means that something new has been done. Work that merely copies what has come before is not considered creative.

Criteria for Creativity

Just doing something new is not enough, either. If it were, I could easily be creative by flipping a coin 100 times. That specific sequence of coin flips will almost certainly never occur again in the entire history of humanity. But no one would say I was being creative when I flipped a coin.
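To put a number on how "new" a run of 100 coin flips is, here is a minimal sketch. It assumes a fair coin and independent flips; the function name `sequence_probability` is made up for this illustration.

```python
from fractions import Fraction


def sequence_probability(n_flips: int) -> Fraction:
    """Probability of one specific sequence of n fair, independent coin flips."""
    return Fraction(1, 2 ** n_flips)


# Number of equally likely sequences of 100 flips.
print(2 ** 100)  # 1267650600228229401496703205376

# Chance that any given run of 100 flips matches one particular sequence.
print(float(sequence_probability(100)))
```

With roughly 1.27 x 10^30 equally likely sequences, any particular run is new in practice, which is exactly why novelty alone cannot be the whole definition.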

This means creativity has to generate a new insight.

However, these two criteria are not adequate, either. I could flip a coin often enough that by chance it generates a meaningful sequence. So, the final part of our definition is that creativity must efficiently generate a new insight.
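The efficiency requirement can be illustrated with a rough calculation. The sketch below assumes each attempt is an independent, uniformly random string and that exactly one target string counts as "meaningful"; both are simplifying assumptions, and `expected_attempts` is a hypothetical helper.

```python
def expected_attempts(target_length: int, alphabet_size: int = 2) -> int:
    """Expected number of independent random tries needed to hit one specific
    target string (the mean of a geometric distribution with success
    probability alphabet_size ** -target_length)."""
    return alphabet_size ** target_length


# Typing one specific 8-letter word by picking letters at random.
print(expected_attempts(8, alphabet_size=26))  # 208827064576
```

Chance can produce a meaningful result eventually, but only after an enormous number of tries. A creative process gets there in far fewer.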

With this brief definition in hand, we can evaluate whether AI is creative.

Recently, an article made headlines by claiming that ChatGPT (a popular AI) had surpassed most humans on a test of creativity called the Torrance Tests of Creative Thinking. An independent testing company rated ChatGPT’s responses using the following criteria.

  • “Fluency. The total number of interpretable, meaningful, and relevant ideas generated in response to the stimulus.
  • Flexibility. The number of different categories of relevant responses.
  • Originality. The statistical rarity of the responses.
  • Elaboration. The amount of detail in the responses.”
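As a rough illustration of how the first three criteria turn into numbers, here is a toy sketch. It is not the scoring procedure used by the testing company: the rarity threshold, the category labels, and the reference counts are all invented for the example, and elaboration is left out because it needs human judgment.

```python
from collections import Counter


def score_responses(responses, reference_counts, rarity_threshold=0.05):
    """Toy scores for a list of (idea, category) pairs.

    reference_counts maps each idea to how often it appeared in a
    (hypothetical) reference sample of other test takers.
    """
    total = sum(reference_counts.values())
    fluency = len(responses)
    flexibility = len({category for _, category in responses})
    originality = sum(
        1
        for idea, _ in responses
        if reference_counts.get(idea, 0) / total < rarity_threshold
    )
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Made-up data: uses for a brick.
reference = Counter({"doorstop": 40, "paperweight": 50, "build a wall": 9, "grind into pigment": 1})
answers = [("doorstop", "household"), ("grind into pigment", "art"), ("build a wall", "construction")]
print(score_responses(answers, reference))
# {'fluency': 3, 'flexibility': 3, 'originality': 1}
```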

ChatGPT was rated in the top 1% for fluency and originality, meaning it has the ability to create a large number of new, statistically rare ideas. It was also placed in the top 3% for flexibility, which is the ability to generate a variety of idea categories. The rating for elaboration was not given. ChatGPT was compared to a nationwide group of 2,700 college students who took the same test.

Seems pretty convincing, but have we been provided enough information to determine whether ChatGPT is creative?

According to the definition of creativity given above, the answer is no. While the test results suggest ChatGPT fulfills the “new” and “insightful” parts of the definition, it is not clear whether ChatGPT does this efficiently. All we are told is this:

“The researchers submitted eight responses generated by ChatGPT, the application powered by the GPT-4 artificial intelligence engine.”

We are not told how many responses ChatGPT generated altogether. The researchers could have had ChatGPT generate many responses and then selected the eight most promising ones. Since ChatGPT basically amounts to a massive database of the internet, in the form of a neural network, it is unsurprising that ChatGPT can generate a large variety of responses from which humans then select the most coherent. In that case, the creativity is entirely reducible to humans: the humans who wrote the original internet content, and those who selected the best permutations of that content.
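The effect of picking the best eight out of a larger pool can be seen in a small simulation. This is only a sketch of the selection concern: the scores are random numbers from a normal distribution, not real test ratings, and the pool size of 1,000 is an arbitrary assumption, since the number of responses actually generated is not reported.

```python
import random
import statistics


def best_of_n(scores, k=8):
    """Return the k highest scores, as a human curator might select them."""
    return sorted(scores, reverse=True)[:k]


def simulate(n_generated, k=8, seed=0):
    """Average score of all generated responses vs. the k hand-picked best."""
    rng = random.Random(seed)
    scores = [rng.gauss(0.0, 1.0) for _ in range(n_generated)]
    return statistics.mean(scores), statistics.mean(best_of_n(scores, k))


overall, selected = simulate(1000)
print(f"average of all 1000: {overall:.2f}")
print(f"average of best 8:   {selected:.2f}")
```

The selected eight score well above the pool average even though nothing about the generator changed. Without knowing the pool size, a high score on eight submitted responses says little about how efficiently they were produced.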

Without further information, all we can say is that ChatGPT is a good creativity tool, but the jury is still out on whether ChatGPT is creative itself.

As a side note, the testers used an old version of the Torrance tests from prior to 1984. The newer version removes the “flexibility” criterion and replaces it with two new criteria.

“Resistance to Premature Closure: The ability to keep thoughts open and delay the closure of ideas to make the most original ideas possible.

Abstractness of Titles: Ability to synthesize and organize processes of thinking to see the picture more deeply and richly.”

Gauging ChatGPT’s Creativity

In my opinion, it would be much more interesting to see how ChatGPT rates according to these criteria. “Flexibility” is just a combination of the “fluency” and “originality” criteria, since a large number of statistically rare responses will also cover a large number of categories. However, the new openness and abstractness criteria are much closer to true creativity, and would require that ChatGPT actually understand the test instead of spouting a large volume of randomly generated sentences.

For comparison, I have run three experiments testing ChatGPT’s creativity, all with the same outcome. While ChatGPT appears to create original content, the content becomes repetitive over a long enough interval. Nor are the responses always coherent. Transcripts are linked in the following list of the experiments.

1. Invent a new word.

2. Tell me about climate change.

3. Tell me everything you know about popcorn.
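One simple way to quantify the repetition seen in experiments like these is to count what fraction of the responses are distinct. The sketch below is an assumption about how one might measure it, not the method used for the transcripts; the sample responses are invented, and matching is done on lowercased, whitespace-trimmed text only.

```python
def distinct_fraction(responses):
    """Fraction of responses that are unique after simple normalization."""
    if not responses:
        raise ValueError("need at least one response")
    normalized = [r.strip().lower() for r in responses]
    return len(set(normalized)) / len(normalized)


# Made-up answers to "Invent a new word."
sample = ["Flibbet", "flibbet ", "Zorple", "Flibbet", "Quandle"]
print(distinct_fraction(sample))  # 0.6
```

A fraction that falls as more responses are collected means more and more generations are needed to get each new answer, which bears directly on the efficiency criterion.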

So, according to the definition of creativity given at the beginning of the article, ChatGPT is not creative:

1. Efficient: the repetition means many responses must be generated to get a brand-new response

2. New: the responses are obviously restricted to the training data and do not venture outside

3. Insightful: responses are not always coherent, showing that ChatGPT doesn’t really understand but just gets lucky

Eric Holloway

Senior Fellow, Walter Bradley Center for Natural & Artificial Intelligence
Eric Holloway is a Senior Fellow with the Walter Bradley Center for Natural & Artificial Intelligence, and holds a PhD in Electrical & Computer Engineering from Baylor University. A Captain in the United States Air Force, he served in the US and Afghanistan. He is the co-editor of Naturalism and Its Alternatives in Scientific Methodologies.
