^{News
November 30, 2018

Machine Learning}

Quantity vs Quality: Can AI Help Scientists Produce Better Papers?

_{What happens when scientists simply can't read all their peers' papers and still find time for original research?}


Who will read these papers awaiting peer review if they are published?/Niklas Bildhauer, Wikimedia

Quantity is definitely a solved problem. STM, the “voice of scholarly publishing,” estimated in 2015 that roughly 2.5 million science papers are published each year. Some are, admittedly, in predatory or fake journals. But over 2800 journals are assumed to be genuine. From all this, we can deduce that most scientists have not read most of the literature in their field, though they have probably read the immediately relevant or ground-breaking findings.

But the question has arisen whether, in some cases, scientists have even read papers in which they are listed as authors. A report in Nature (September 2018) revealed that “Thousands of scientists publish a paper every five days” or 72 papers a year:

The sensible reaction is of the absurd. At that prodigious rate, the scientists who authored more than 72 papers a year are unlikely even to have read all of them. Let alone contributed directly to the work inside, writing a few words or providing some key idea. Let alone doing some meaningful science in their construction.
Mark Humphries, “Handing Science Over to the Machines” at Medium (The Spike)

Of course, prominent scientists may be listed as authors merely to draw attention to their students’ work. Others are techies whose main concern is whether the instruments generated meaningful data. Maybe the only person who truly read a given published paper was the proofreader, who was looking for small errors, not the big picture.

When we add all the published papers together, the picture is too big for any one person to see.

Can AI help? Humphries points out that most papers are “archival knowledge,” not intended to be pored over in detail unless needed. In his own area, neuroscience, researchers have already used machine learning to build tree diagrams to help neuroscientists focus on “need to read” vs. “good to read”:

Building a machine to tell us what we don’t know is exactly what Jessica and Bradley Voytek did. Scraping 3.5 million abstracts from PubMed, and linking them by key-words for brain regions, disorders, and cognitive functions, they built a model of neuroscientific knowledge. This model naturally has a hierarchy: “cortex”, “thalamus”, and “striatum” are all children of “brain”, for example. Which opened up a simple but effective hypothesis generator: find two concepts that share a parent, but have not been linked together in the existing literature. That pair of concepts are then candidates for linking together.
Mark Humphries, “Handing Science Over to the Machines” at Medium (The Spike)

The machine learning program won’t generate a groundbreaking idea or do the research, but it can sort through trees much more quickly than a researcher could. The patterns that emerge might generate testable ideas. The information load can stay manageable despite its size, even if no one person knows the whole, rapidly growing picture.
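For readers who want to see how simple the core idea is, here is a minimal Python sketch of the hypothesis generator Humphries describes: find two concepts that share a parent but have never been linked. It is not the Voyteks' actual code. The tiny hierarchy, the made-up "abstracts," the function names, and the choice to treat two keywords as "linked" when they appear in the same abstract are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy hierarchy (child concept -> parent concept),
# borrowing the example terms from the quote above.
PARENT = {
    "cortex": "brain",
    "thalamus": "brain",
    "striatum": "brain",
}

# Made-up "abstracts", each reduced to the set of keywords it mentions.
ABSTRACTS = [
    {"cortex", "thalamus"},
    {"cortex", "striatum"},
    {"cortex"},
]


def linked_pairs(abstracts):
    """Return every unordered pair of concepts that co-occur in some abstract.

    Assumption: co-occurrence in one abstract counts as "linked".
    """
    pairs = set()
    for keywords in abstracts:
        # Sorting gives each pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(keywords), 2):
            pairs.add((a, b))
    return pairs


def candidate_hypotheses(parent, abstracts):
    """Return sibling pairs (same parent) that never co-occur in an abstract."""
    known = linked_pairs(abstracts)

    # Group children under their shared parent.
    siblings = {}
    for child, par in parent.items():
        siblings.setdefault(par, []).append(child)

    candidates = []
    for children in siblings.values():
        for a, b in combinations(sorted(children), 2):
            if (a, b) not in known:
                candidates.append((a, b))
    return sorted(candidates)


print(candidate_hypotheses(PARENT, ABSTRACTS))
```

With this invented data, the only sibling pair that never appears together is striatum and thalamus, so that pair is what the sketch flags. That says nothing about the real literature. A flagged pair is a prompt for a researcher to look closer, not a finding.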

See also: Fake reviews, sure; but fake science journals? (MercatorNet)