LLMs & creativity: measuring divergent association

How does the performance of large language models on creative tasks compare to that of humans?

There has been some recent machine learning research on the subject of creativity. Following these leads, I found myself in a fascinating area of psychology concerned with how “divergent” thinking can be used to study the creative process. Is there some way of quantifying creativity? What methods are used to do this? Can large language models (LLMs) be considered creative?

I decided to have a look at one such quantitative task, as outlined in the paper “Naming unrelated words predicts creativity” by Olson et al. The basic concept behind the study is that creative people are able to generate more divergent ideas; the authors suggest that naming unrelated words and then measuring the semantic distance between them could serve as an objective measure of divergent thinking.

The study introduced a new measure of divergent thinking called the Divergent Association Task (DAT). The task asks participants to generate 10 nouns that are as different from each other as possible, in all meanings and uses of the words. Responses are then scored by computing the semantic distance between the words.
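
To make the scoring concrete, here is a minimal Python sketch, assuming word embeddings (e.g. GloVe vectors) are available in a dict-like `vectors`; the paper’s own scoring also validates the words and uses only the first seven valid responses, which this sketch omits.

```python
from itertools import combinations

import numpy as np


def dat_score(words, vectors):
    """Score a DAT response as the mean pairwise cosine distance x 100.

    `words` is the list of generated nouns; `vectors` maps a lowercase
    word to its embedding vector. Assumes every word is in `vectors`.
    """
    embs = [np.asarray(vectors[w.lower()]) for w in words]
    dists = [
        1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        for a, b in combinations(embs, 2)
    ]
    return 100.0 * float(np.mean(dists))
```

Higher scores mean the words sit further apart in embedding space, which is the paper’s proxy for more divergent thinking.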

This led to the fairly intuitive idea of using LLMs to attempt the same task; the results are given in the PDF in the repo (code etc. is available). This suggests further related questions: can we apply LLMs to other creative tasks, such as the Alternate Uses Test? How do LLMs compare to human performance on these and other related tasks?
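
For illustration, here is a rough sketch of posing the DAT to an LLM, assuming the OpenAI Python client; the model name is a hypothetical choice, and this is not necessarily the setup used for the results in the repo.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Please enter 10 words that are as different from each other as "
    "possible, in all meanings and uses of the words. Rules: single "
    "English words, nouns only, no proper nouns. "
    "Reply with one word per line and nothing else."
)

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice, for illustration
    messages=[{"role": "user", "content": PROMPT}],
)
words = [
    w.strip()
    for w in response.choices[0].message.content.splitlines()
    if w.strip()
]
print(words)  # the list can then be scored with dat_score() above
```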