Written by Human or Bot? Researchers Have a New Way to Tell
After the use of chatbots surged in 2023, some common words appeared much more often in the abstracts of journal papers, hinting at AI origin

Lots of stuff we read sounds like it was written by a bot — a chatbot or Large Language Model (LLM) like ChatGPT. But, according to a recent article at Wired, even the experts have a hard time being sure.

After all, some people have always written like bots. Long before bots were invented. Others — because they work in bureaucracies — have bottiness thrust upon them.
But then there are also students who trust the internet to write the papers that they should be learning to write themselves. That is more of a problem. The risk is not that they will grow up to write like bots but that they won’t be able to write anything because they’ve never done it.
A sudden surge in familiar words?
A research group hopes it has come up with a new detection method, at least for academic writing. As Ars Technica editor Kyle Orland tells us, certain “excess words” started appearing much more often post-LLM (i.e., in 2023 and 2024). Analyzing 14 million abstracts of science papers published on PubMed from 2010 through 2024, the researchers found,
… a number of words that were extremely uncommon in these scientific abstracts before 2023 that suddenly surged in popularity after LLMs were introduced. The word “delves,” for instance, shows up in 25 times as many 2024 papers as the pre-LLM trend would expect; words like “showcasing” and “underscores” increased in usage by nine times as well. Other previously common words became notably more common in post-LLM abstracts: the frequency of “potential” increased 4.1 percentage points; “findings” by 2.7 percentage points; and “crucial” by 2.6 percentage points, for instance.
Kyle Orland, “The telltale words that could identify generative AI text,” Ars Technica, July 7, 2024
Of course, as Orland points out, some words — “ebola,” for example — can surge rapidly due to world events. But words like “across, additionally, comprehensive, crucial, enhancing, exhibited, insights, notably, particularly, within” surged post-LLM, without any obvious new need arising.
Based on their findings, the four researchers, from the University of Tübingen and Northwestern University, think that at least 10 percent of 2024 abstracts were written with some AI help.
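The underlying method is simple enough to sketch. Below is a minimal Python illustration, our own simplification rather than the authors’ actual pipeline (which extrapolates each word’s 2010–2022 frequency trend to get an expected 2024 value): tally how often each word appears in abstracts before and after the LLM era, then flag words whose observed post-LLM frequency far exceeds the pre-LLM expectation.

```python
from collections import Counter

def word_frequencies(abstracts):
    """Fraction of abstracts in which each word occurs at least once."""
    counts = Counter()
    for text in abstracts:
        counts.update(set(text.lower().split()))
    return {w: n / len(abstracts) for w, n in counts.items()}

def excess_words(pre_llm, post_llm, min_ratio=2.0):
    """Flag words whose post-LLM frequency far exceeds the pre-LLM baseline.

    The published analysis compares observed 2024 frequencies against a
    per-word trend extrapolated from earlier years; here we simply use the
    pre-LLM frequency as the expectation, which captures the same
    observed-versus-expected idea.
    """
    pre = word_frequencies(pre_llm)
    post = word_frequencies(post_llm)
    flagged = {}
    for word, observed in post.items():
        # Smooth words never seen pre-LLM so we never divide by zero.
        expected = pre.get(word, 1 / (len(pre_llm) + 1))
        ratio = observed / expected
        if ratio >= min_ratio:
            flagged[word] = round(ratio, 1)
    return dict(sorted(flagged.items(), key=lambda kv: -kv[1]))

# Toy demonstration: "delves" never appears pre-LLM, then shows up everywhere.
pre_2023 = ["we report findings on protein folding",
            "results indicate a modest effect on protein stability"]
post_2023 = ["this study delves into protein folding",
             "our comprehensive analysis delves into protein stability"]
print(excess_words(pre_2023, post_2023))
```

Even this toy run flags a common connective (“into”) alongside “delves,” which is one reason the real analysis measures excess against a per-word trend fitted over more than a decade of data rather than a crude two-bucket comparison.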
Should the bot be included as one of the authors?
The researchers explain it like this in the abstract of their paper, posted at arXiv,
Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
Kobak, Dmitry, Rita González-Márquez, Emőke-Ágnes Horvát, and Jan Lause. “Delving into ChatGPT usage in academic writing through excess vocabulary.” arXiv:2406.07016 (2024).
Why it matters
It matters because, as the researchers say somewhat bluntly in their paper, “LLMs are infamous for making up references, providing inaccurate summaries, and making false claims that sound authoritative and convincing.”
Kobak et al. wouldn’t need to convince business prof Gary Smith of that. He has recounted here at Mind Matters News his experiences with chatbots that chatter on about how the Soviet Union sent bears into space (no) or that it’s safe to walk downstairs backwards with your eyes closed (um…). They routinely flunked Grade Nine math. Overusing certain words in research papers is far less serious than deficiencies like these, which academics may run into if they rely on the bots to speed up their work.
Trying our hand at detection
Tech advice site Techwiser reviewed a number of programs that try to determine, based on probability, whether text is AI-generated. Their favorite is Copyleaks. Ravi Teja writes, “Of all the tools I tried, Copyleaks offered the most accurate results in finding AI-generated content and is also free to use. There is a website and a Chrome extension for faster access. They say it works best for detecting ChatGPT and Gemini-generated text. But in our testing, it worked just as well with other language models like Claude.” (April 11, 2024)
You can test it yourself here.
We tried it at Mind Matters News on three excerpts: some lines from a personal letter, some copy from one of our articles that we know was not AI-generated, and some AI-generated stuff from Squibler.io. The detector got all three right immediately: two human, one bot.
In fairness, that was probably an easy test because the AI copy was absolute socko boffo boilerplate. Things will be more complex with hybrid texts, part bot and part human, from fields where a lot of boilerplate is required; there, only a probability score can be assigned.
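To make the probability point concrete, here is a hypothetical sketch. Copyleaks does not publish its scoring internals, so `p_ai` below simply stands in for whatever probability score a detector reports, and the 0.2/0.8 cutoffs are our invention:

```python
def verdict(p_ai: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a detector's AI-probability score to a verdict with an explicit
    middle band, since hybrid part-bot/part-human texts tend to land there.
    The 0.2/0.8 thresholds are illustrative, not any real tool's values."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability between 0 and 1")
    if p_ai >= high:
        return "likely AI-generated"
    if p_ai <= low:
        return "likely human-written"
    return "uncertain: possibly a bot/human hybrid"

print(verdict(0.95))  # likely AI-generated
print(verdict(0.55))  # uncertain: possibly a bot/human hybrid
```

An explicit middle band is more honest than a forced binary call for exactly the hybrid cases described above.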
One can imagine the yell of dismay when the policy paper, over which a hapless clerk in a real-life equivalent of Charles Dickens’s famous Circumlocution Office has toiled all night, is pronounced to be, probably, the work of a bot. But, and this is what saves him from a breakdown, it’s also certain to be safe for work. He can definitely live with that.