
Business prof: Big data is making science worse, not better

It’s helping fuel the sources of bad science, says Pomona College’s Gary Smith in International Higher Education:

The scientific revolution has been fueled by using data to test theories, so it might be assumed that big data has now created a golden age for science. If anything, the opposite is true. Big data and powerful computers have inflamed the replication crisis that is undermining the credibility of scientists and scientific research…

The prevalence of completely fabricated papers is now increasing because LLMs [chatbots] generate articulate papers that generally must be read carefully to detect the deception, and reviewers have little incentive to read carefully. Even papers clearly written by LLMs can slip through the review process. One paper published in an Elsevier journal began, “Certainly, here is a possible introduction for your topic,” while another Elsevier paper included this: “I’m very sorry, but I don’t have access to real-time information or patient-specific data, as I am an AI language model.” It is, of course, more difficult to detect LLM-generated papers that have been stripped of such obvious markers.

“Big Data and the Assault on Science” (September 15, 2024)

The replication crisis?

If a paper’s results are good science, other papers should be able to reproduce them. But in too many cases, that’s not happening:

To gauge the extent of the crisis, a team led by Brian Nosek tried to replicate 100 studies published in three premier psychology journals; 64 failed. Teams led by Colin Camerer reexamined 18 experimental economics papers published in two top economics journals and 21 experimental social science studies published in Nature and Science; 40 percent did not replicate.

Assault on Science

Computers don’t create the types of cheating, such as p-hacking and HARKing (hypothesizing after the results are known), that go into low-quality research, he says. But they do make these flawed practices “convenient and powerful.”

LLMs trained on enormous text databases can write remarkably articulate fake papers and do so faster than any human. Large databases also facilitate systematic p-hacking by providing many more ways in which data can be manipulated until a desired statistically significant result is obtained. Big data also creates an essentially unlimited number of ways to look for patterns—any patterns—until something statistically significant is found. In each case, the results are flawed and unlikely to replicate.

Assault on Science
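
The p-hacking mechanism Smith describes is easy to demonstrate. Below is a minimal Python sketch (not from Smith’s article; the dataset and all numbers are illustrative) showing that if a researcher trawls through enough unrelated variables, roughly 5 percent of them will clear the conventional p < 0.05 bar by chance alone, and none of those “findings” would be expected to replicate.

```python
# Illustrative sketch: trawling a big dataset of pure noise still yields
# "statistically significant" correlations, simply because many tests are run.
import random
import statistics
import math

random.seed(0)

N_SUBJECTS = 200    # observations per variable
N_VARIABLES = 500   # candidate predictors to trawl through
ALPHA = 0.05        # conventional significance threshold

# The outcome is pure noise: no predictor is truly related to it.
outcome = [random.gauss(0, 1) for _ in range(N_SUBJECTS)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def approx_p_value(r, n):
    """Two-sided p-value for r, using a normal approximation to the t statistic."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # P(|Z| > |t|) for a standard normal Z.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

false_positives = 0
for _ in range(N_VARIABLES):
    predictor = [random.gauss(0, 1) for _ in range(N_SUBJECTS)]  # also pure noise
    r = pearson_r(predictor, outcome)
    if approx_p_value(r, N_SUBJECTS) < ALPHA:
        false_positives += 1

# Expect roughly ALPHA * N_VARIABLES, about 25, spurious "discoveries".
print(f"{false_positives} of {N_VARIABLES} noise variables were 'significant' at p < {ALPHA}")
```

The point of the sketch is Smith’s: the more variables a database offers, the more chances there are for noise to masquerade as a publishable result, and such results are unlikely to survive replication.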

He suggests two reforms: journals should not publish papers until the authors disclose their non-confidential data and methods, and reviewers “who do careful, thorough appraisals” should be compensated, since they are unlikely to be able, let alone willing, to do the required work for free.

Dr. Smith writes to say that his article has been so well received that it will be republished in University World News.

You may also wish to read: P-hacking: The perils of presidential election models. History professor Alan Lichtman’s model uses 13 true/false questions reflecting likely voter interests, but some of them seem rather subjective. Lichtman’s Thirteen Keys to the White House prediction model shows some evidence of p-hacking, as does Helmut Norpoth’s Primary Model. (Gary Smith)

