When I first began to look into intelligent design (ID) theory while I was considering becoming an atheist, I was struck by Bill Dembski’s claim that ID could be demonstrated mathematically through information theory. A number of authors who were experts in computer science and information theory disagreed with Dembski’s argument. They offered two criticisms: that he did not provide enough details to make the argument coherent and that he was making claims that were at odds with established information theory.
In online discussions, I pressed a number of them, including Jeffrey Shallit, Tom English, Joe Felsenstein, and Joshua Swamidass. I also read a number of their articles. But I have not been able to discover a precise reason why they think Dembski is wrong. Ironically, they actually tend to agree with Dembski when the topic lies within their respective realms of expertise. For example, in his rebuttal Shallit considered an idea which is very similar to the ID concept of “algorithmic specified complexity”. The critics tended to pounce when addressing Dembski’s claims outside their realms of expertise.
To better understand intelligent design’s relationship to information theory and thus get to the root of the controversy, I spent two and a half years studying information theory and associated topics during PhD studies with one of Dembski’s co-authors, Robert Marks. I expected to get some clarity on the theorems that would contradict Dembski’s argument. Instead, I found the opposite.
The two primary approaches to information theory are Shannon information and Kolmogorov complexity. The difference is usually illustrated with the amount of information in a sequence of fair coin flips, say ten in a row. Shannon’s theory says there is nothing distinctive about a sequence that is all heads compared to a garbled mix of heads and tails: the two sequences are equally probable and thus have equal information content. Kolmogorov’s theory, on the other hand, distinguishes between the two sequences based on their “description lengths.” How long would it take to describe the sequence? The all-heads sequence has a much shorter description (“all heads”) than the garbled sequence (perhaps “three tails, one head, two tails, three heads, one tail”). The latter is considered a typical “random” sequence in Kolmogorov’s theory.
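Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a rough, computable stand-in for description length. Here is a minimal sketch in Python; the sequence length, the fixed seed, and the use of `zlib` are my own illustrative choices, not anything from the theory’s formal statement:

```python
import random
import zlib

# Two 1,000-flip sequences: one all heads, one a garbled mix.
random.seed(0)  # fixed seed so the example is reproducible
all_heads = "H" * 1000
garbled = "".join(random.choice("HT") for _ in range(1000))

# Compressed size approximates description length: an easy-to-describe
# sequence yields a small compressed output.
len_heads = len(zlib.compress(all_heads.encode()))
len_garbled = len(zlib.compress(garbled.encode()))

print(len_heads, len_garbled)  # the all-heads sequence compresses far more
```

Both sequences are equally probable under Shannon’s analysis, yet the compressor needs only a few bytes for the all-heads string, which matches Kolmogorov’s distinction between the two.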
Kolmogorov’s approach to information theory is motivated by Laplace’s observation that we do not assign equal probability to all patterns:
“if heads comes up a hundred times in a row, then this appears to us extraordinary, because the almost infinite number of combinations that can arise in a hundred throws are divided in regular sequences, or those in which we observe a rule that is easy to grasp, and in irregular sequences, that are incomparably more numerous.” (Laplace, 1951: 16–17)
Given a fair coin flip, highly compressible patterns (patterns that are easy to describe) are rarer than incompressible patterns. When compressibility is high enough (two hundred heads in a row, for example), we generally look for a more plausible explanation than a fair coin flip.
Dembski applies this line of reasoning in “The Explanatory Filter.” It is motivated by a similar consideration: how to distinguish a typical random sequence from non-random sequences after the fact. The standard way of testing hypotheses, developed in the 1920s by Ronald Fisher and thus called “Fisherian hypothesis testing,” requires that any hypotheses to be tested must be stated before the experiment is performed. Fisher did not provide a way for patterns to be detected after the fact. If we are wondering whether our universe or life forms show evidence of design, we must, of course, examine them after the fact. However, Kolmogorov’s theory of information shows that we can detect patterns after the fact because sequences that can be concisely described are rarer than sequences that require lengthy descriptions.
Additionally, I found Dembski’s key indicator of intelligent design, “complex specified information (CSI)”, to be a more refined form of the information theory concept of “mutual information,” with the additional constraint that the random variable for specification is independent of the described event. This additional constraint results in the second keystone of intelligent design theory: the conservation of information.
Dembski proved that searching for a good search algorithm (the “search for a search”) is no easier than performing the original search for the target itself. The implication is that there is no shortcut by which the natural processes of law and chance can produce information from chaos or increase the amount of existing information. Thus natural processes, including the various mechanisms by which evolution may occur, cannot be said to create information.
There is a similar theorem in computer science called the “no free lunch theorem” (NFLT). It states that all search algorithms perform identically when their performance is averaged across all possible problems. I initially thought that ID’s similarity to established theory would end there. Instead, I discovered a couple of conservation theorems in information theory that resemble the NFLT.
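The averaging claim can be checked exhaustively on a toy problem. The sketch below is my own illustration of the theorem’s flavor, not Wolpert and Macready’s formal statement: over all eight functions from a three-point domain to {0, 1}, two different fixed search orders take exactly the same average number of steps to find a 1.

```python
from itertools import product

def steps_to_target(f, order):
    """Number of evaluations a fixed search order needs to find a 1."""
    for step, x in enumerate(order, start=1):
        if f[x] == 1:
            return step
    return len(order) + 1  # target absent: charge a full failed search

# Every function f: {0, 1, 2} -> {0, 1}, represented as a 3-tuple.
all_functions = list(product([0, 1], repeat=3))

# Two non-repeating search algorithms that visit points in different orders.
avg_a = sum(steps_to_target(f, [0, 1, 2]) for f in all_functions) / len(all_functions)
avg_b = sum(steps_to_target(f, [2, 0, 1]) for f in all_functions) / len(all_functions)

print(avg_a, avg_b)  # identical averages over all problems
```

Neither order has an advantage once every possible problem is weighted equally; any apparent superiority of one search over another comes from the subset of problems it is tested on.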
In Shannon’s information theory, for example, there is the data processing inequality. It states that processing data cannot increase the information the data carries about its origin. In Kolmogorov’s information theory, there is Leonid Levin’s law of independence conservation, which states that no combination of random and deterministic processing can increase the algorithmic mutual information between independently specified bitstrings. In addition, there are a number of variations on these conservation laws involving related quantities such as the Kullback-Leibler distance, which show that determinism and randomness are incapable of creating CSI.
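The data processing inequality is easy to check numerically. In the sketch below (the helper name and the flip probabilities are my own choices), X passes through one binary symmetric channel to produce Y, and Y through a second channel to produce Z, forming the Markov chain X → Y → Z; the information Z retains about X never exceeds what Y retains, i.e. I(X;Z) ≤ I(X;Y).

```python
from math import log2

def mutual_information(joint):
    """Mutual information (bits) between the two coordinates of a
    joint distribution given as a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

f1, f2 = 0.1, 0.2  # channel flip probabilities (arbitrary choices)
joint_xy, joint_xz = {}, {}
for x in (0, 1):  # X is a uniform bit
    for y in (0, 1):
        p_xy = 0.5 * (f1 if y != x else 1 - f1)
        joint_xy[(x, y)] = joint_xy.get((x, y), 0) + p_xy
        for z in (0, 1):  # Z depends on X only through Y
            p_xyz = p_xy * (f2 if z != y else 1 - f2)
            joint_xz[(x, z)] = joint_xz.get((x, z), 0) + p_xyz

print(mutual_information(joint_xy), mutual_information(joint_xz))
```

The second channel only degrades the signal: no downstream processing of Y, deterministic or random, recovers information about X that Y has already lost.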
Intelligent design theory is sometimes said to lack any practical application. One straightforward application is this: because intelligence can create information and computation cannot, human interaction should improve computational performance. Addressing this observation, a growing field known as “human computation” investigates whether human-in-the-loop computation is more effective than a purely computational approach. It turns out that the answer is yes.
There are numerous tasks that humans find trivial but that are extremely difficult or impossible for algorithms to perform. This phenomenon is known as Moravec’s paradox. Combining human and computational approaches allows the computers to “amplify” the humans’ capabilities, and vice versa. Big tech companies such as Microsoft, Google, Facebook, and Amazon all use forms of human computation to power their search and recommendation algorithms. Incidentally, a number of artificial intelligence companies have been caught faking their AI with human workers posing as bots.
After these years of study, I found that, rather than being at odds with established information theory, Dembski’s Explanatory Filter is very much in line with well-known theorems. This left me wondering why there was so much controversy around his theory in the first place. I have still not been able to answer this question, but whatever the cause of the controversy, it is not lack of theoretical and practical justification.
Laplace, P. S. (1951). A philosophical essay on probabilities (F. W. Truscott & F. L. Emory, Trans.; from the 6th French ed.).
This observation is used by Ray Solomonoff, in combination with Bayes’ theorem, to derive a mathematical formalization of Occam’s razor.
Note: A version of this piece was also published October 16, 2018, by John Mark Reynolds, in his regular column space, Eidos, at Patheos.
Eric Holloway has a Ph.D. in Electrical & Computer Engineering from Baylor University. He is currently a Captain in the United States Air Force, where he has served in the US and Afghanistan. He is the co-editor of the book Naturalism and Its Alternatives in Scientific Methodologies. Dr. Holloway is an Associate Fellow of the Walter Bradley Center for Natural and Artificial Intelligence.
Also by Eric Holloway: Artificial intelligence is impossible
Could one single machine invent everything?
Human intelligence as a halting oracle (Eric Holloway)
A formal proof that a halting oracle can create information (Eric Holloway)