Mind Matters Natural and Artificial Intelligence News and Analysis

AI Can Fight COVID by Detecting Changes in Virus “Language”

One research team is experimenting with natural language processing (NLP), a technique used to analyze human language, to detect virus mutations in a similar way

One strategy in the fight against COVID-19 relies on the curious fact that genetics is actually a language. Geneticist Francis Collins, who led the Human Genome Project, has even called it The Language of God.

More practically, AI programs that act as natural language processors can help catch deadly coronavirus mutations. The same strategies the AIs use for reading sentences can be used to read the virus’s attempts to escape destruction by mutating:

Galileo once observed that nature is written in math. Biology might be written in words. Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the coronavirus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences.

Will Douglas Heaven, “AIs that read sentences are now catching coronavirus mutations” at MIT Technology Review

NLP can help predict “viral immune escape” — where the virus’s mutations prevent it from being destroyed by the human immune system. A research team led by computational biologist Bonnie Berger sees the virus’s ability to infect a host as “grammatical correctness” and its ability to stay there, despite efforts to get rid of it, as “semantic meaning.”


The comparison with the grammar of human speech might go something like this: Let’s suppose you want to say “I would like a glass of water.” You need a subject (I), a verb (would like), and an object (a glass of water). Without grammar to organize your communications, no one will know what you want and you won’t get your glass of water. Similarly, the virus must use certain genetic sequences to infect a host; otherwise, it doesn’t get in.

Semantic meaning is the actual meaning of a word. Words in a sentence can be substituted (mutations), which makes the sentence mean something different:

Changing just one word in the sentence “wine growers revel in good season” can produce the sentences “wine growers revel in strong season” or “wine growers revel in flu season.” Both share the same grammatical structure but one has changed its meaning more than the other. The tool looks for similar changes in a virus, flagging those that change its meaning most.

Will Douglas Heaven, “AIs that read sentences are now catching coronavirus mutations” at MIT Technology Review
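The wine-growers example can be sketched in a few lines of Python. This is only a toy illustration, not the research team's model: the three-number "embeddings" in TOY_VECTORS are invented for this example, and cosine distance is an assumed stand-in for whatever measure of semantic change a real tool would use.

```python
import math

# Hypothetical 3-D word vectors, invented purely for illustration.
# A real NLP model would learn such vectors from large amounts of text.
TOY_VECTORS = {
    "good": (0.9, 0.8, 0.1),
    "strong": (0.8, 0.9, 0.2),
    "flu": (0.1, 0.2, 0.9),
}


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_substitutions(original, candidates, vectors):
    """Order candidate words from largest to smallest change in 'meaning'."""
    scored = [
        (cosine_distance(vectors[original], vectors[word]), word)
        for word in candidates
    ]
    return [word for _, word in sorted(scored, reverse=True)]


# Substitute one word in "wine growers revel in good season".
ranking = rank_substitutions("good", ["strong", "flu"], TOY_VECTORS)
print(ranking)  # ['flu', 'strong']
```

With these made-up vectors, "flu" lands far from "good" while "strong" stays close, so the "flu" substitution is the one flagged first.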

The mutations that change the “meaning” of the viral genome most might be the hardest for a given configuration of the immune system to detect and deal with. That’s where NLP might help, by scanning large volumes of sequence data.

The paper is open access.

It is conventional for geneticists to speak of the genome as a language. For example:

The DNA molecule carries information in the form of a sequence of four nucleotide bases, adenine (A), cytosine (C), guanine (G) and thymine (T), which can be thought of as the letters of the genomic language. Short sequences of the letters form ‘DNA words’ that determine when and where proteins are made in the body.

Karolinska Institutet, “Understanding the Language of the Genome” at Genomics Research (May 8, 2017)

One genetics site explains how the sentences in our genomes work:

But words alone aren’t enough to convey meaning. You need to string words together to form sentences. In the same way, amino acids combine together through DNA translation to form protein.

These sentences need punctuation. Punctuation serves to let you know when a sentence begins, when it ends, and any pauses or gaps in-between. DNA is no different. It uses specific codons to indicate the beginning or ending of a sentence.

For example, the codon “ATG” indicates the beginning of an amino acid sequence. For this reason, scientists refer to ATG as the “START” codon. It is always at the beginning of a sentence. Without a START codon, your cells wouldn’t know where to begin making proteins.

There are also three codons that act as a “STOP” codon. These three codons (TGA, TAA, TAG) always indicate the end of a sentence. Without a STOP codon, your cells wouldn’t know when to stop making a given protein.

Admin, “The Language of DNA” at CRIGeneticsblog (November 15, 2019)
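The punctuation idea can be made concrete with a short Python sketch. It is deliberately simplified: it reads only one strand, treats the first ATG it finds as the start, and ignores much of what real gene-finding software has to handle. The function name first_open_reading_frame and the sample sequence are made up for this illustration.

```python
START_CODON = "ATG"
STOP_CODONS = {"TGA", "TAA", "TAG"}


def first_open_reading_frame(dna):
    """Return the codons from the first START codon up to, but not
    including, the first STOP codon in the same reading frame.

    Returns an empty list if no START codon is found, or if the
    sequence ends before a STOP codon appears.
    """
    dna = dna.upper()
    start = dna.find(START_CODON)
    if start == -1:
        return []  # no "capital letter": nowhere to begin reading

    codons = []
    # Step through the sequence three letters at a time from START.
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            return codons  # the "full stop" ends the sentence
        codons.append(codon)
    return []  # ran off the end without reaching a STOP codon


# Made-up example sequence: two letters of padding, then ATG GCT TGG TAA.
print(first_open_reading_frame("CCATGGCTTGGTAAGG"))  # ['ATG', 'GCT', 'TGG']
```

Each three-letter codon in the returned list would correspond to one amino acid in the resulting protein "sentence."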

Examples of sentences in our DNA that give instructions to proteins are offered at the CRIGenetics site.

Some enterprising researchers have turned genetic sequences into music as well.

Some people claim that we live in a meaningless world but — if the language of the genome is any guide — we live in a world so packed with meaning that we will likely never comprehend it all.

You may also enjoy:

There is a glitch in the description of DNA as software. In contemporary culture, we are asked to believe — in an impressive break with observed reality — that the code wrote itself.

Mind Matters News

Breaking and noteworthy news from the exciting world of natural and artificial intelligence at MindMatters.ai.
