^{Eric Holloway

July 5, 2019

7

Machine Learning}

How Business Intelligence Can Break the Data Deadlock

_{Companies today are awash in information. But which patterns are real? Which are cloud bunnies?}


Companies today don’t have a problem with finding information; they are up to their collective ears in it! But what items matter and how will we know?

It is very difficult to mine vast masses of data to find significant insights, especially if you don’t have a Ph.D. in mathematics and computer science. The data disciplines require deep technical knowledge, most of which is divorced from the day-to-day running of a company. Thus the directors who need the insights the most are the least qualified to analyze the data.

Business intelligence is an emerging field in the information technology industry that tries to bridge the gap between the data and the directors. The data is presented in visual formats to help people spot patterns. However, that’s not all we need. We are prone to seeing bunnies in the clouds, faces in the mountains, and highways on Mars.

One way business intelligence can address this problem of false positives is hypothesis testing. The data analyst can compute how likely a pattern at least that striking would be to arise by chance alone; a small figure suggests that the pattern is real, not imagined. The difficulty is that, for strong guarantees, the patterns must be proposed before they are seen in the data. The more the analyst looks at the data to derive a pattern, the more that analyst falls prey to seeing patterns that are not really there. Thus, the need to state all patterns up front is a huge restriction and deadlocks our ability to gain insight from the data.
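The cost of looking first and testing afterward can be illustrated with a small simulation. The sketch below is not from any particular business intelligence tool; the function names, the 100-flip dataset size, and the 5% threshold are all assumptions chosen for illustration. It generates pure noise (fair coin flips), lets a hypothetical analyst examine several candidate patterns, and records how often at least one of them passes a standard significance test.

```python
import math
import random
from functools import lru_cache

N_FLIPS = 100  # observations per candidate pattern (illustrative choice)


@lru_cache(maxsize=None)
def two_sided_p_value(heads: int, n: int = N_FLIPS) -> float:
    """Exact p-value for a fair coin: chance of a head count at least this far from n/2."""
    deviation = abs(heads - n / 2)
    tail = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return tail / 2 ** n


def spurious_hit_rate(patterns_examined: int, trials: int = 1000,
                      alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of pure-noise datasets in which at least one pattern looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(patterns_examined):
            # Count heads in N_FLIPS fair flips by counting the set bits.
            heads = bin(rng.getrandbits(N_FLIPS)).count("1")
            if two_sided_p_value(heads) < alpha:
                hits += 1
                break  # one spurious "discovery" is enough for this dataset
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, spurious_hit_rate(k))
```

With a single pattern stated up front, the spurious hit rate stays below the chosen threshold. As the number of patterns examined grows, the rate climbs steeply even though every dataset is pure noise, which is the restriction the paragraph above describes.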

Welcome to Data Deadlock. Should we just go home now?

Intelligent design theory might help us make new headway in the fields of information theory and statistics. The problem is familiar: how can we be sure that a pattern we see, for example, apparent design in the biological record, is not merely a chance outcome? Intelligent design theory makes the novel proposal that we can derive patterns from the data after the fact while retaining the strong guarantees of hypothesis testing.

The insight behind intelligent design theory goes back hundreds of years, at least to the great mathematician Pierre-Simon Laplace (1749–1827). Laplace remarked that certain numbers seem very unlikely in and of themselves. For example, he might see a carriage with the number 1000. Since the numbers are assigned in order and there are at least a thousand carriages, someone is bound to see that number. But happening to be that particular person is much less likely.

The notion that some numbers are less random than others sounds counterintuitive because we tend to think of all numbers as alike. For example, what if the same person wins the Powerball lottery ten times in a row? We would think something is fishy, despite the fact that that occurrence is just as likely as any other specific configuration of ten wins. What about flipping a hundred heads in a row? Each sequence of a hundred coin flips is equally likely, so why do we think a hundred heads is any more significant than any other sequence? Yet, if a street gambler flipped a hundred heads, we would immediately suspect foul play.

Note that we are seeing these apparently non-random patterns after the fact. Yet at the same time, we are certain their chance occurrence is very small. And analysis of lottery scandals shows that our suspicions are well-founded. So, as these examples show, contrary to the dogma of hypothesis testing, it is possible to do after-the-fact pattern analysis while limiting the probability of false positives.

Intelligent design theory formalizes after-the-fact pattern analysis. The analysis takes the difference between two measures, both expressed in bits, and calls the result the event’s measure of complex specified information (CSI). The first measure is the negative base-2 logarithm of the probability of the patterned event, known as the event’s complexity. The second measure is the description length of the pattern, known as the event’s specification. The amount of complex specified information is the complexity minus the specification. Two to the power of the negative complex specified information measure gives an upper bound on the probability that the pattern occurred by chance.

Returning to the coin example, we can see how this analysis would register one hundred heads as less likely to be a chance outcome than a more evenly distributed sequence of heads and tails. Each specific sequence of a hundred fair coin flips has a probability of one in 2^100, so the negative base-2 logarithm of the probability is one hundred. So, the coin sequence complexity is one hundred. At the same time, a sequence of a hundred heads has a short description, so its specification is, let’s say, eighty. Thus, the amount of complex specified information in a sequence of one hundred heads is twenty, and the chance of such a sequence arising at random is bounded by 2^-20, roughly one in a million. On the other hand, coin sequences with a fairly equal count of heads and tails will tend to have long descriptions and consequently long specifications, leaving little or no complex specified information and thus a high probability of occurring by chance.
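The arithmetic of the coin example can be written out as a minimal sketch. The function names here are hypothetical, and the 80-bit specification is the illustrative figure used in the text, not a measured description length. The 100-bit specification for an irregular sequence is likewise an assumed value, chosen to show what happens when the description is as long as the sequence itself.

```python
import math


def complexity_bits(probability: float) -> float:
    """Complexity: negative base-2 logarithm of the event's chance probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def csi_bits(probability: float, specification_bits: float) -> float:
    """Complex specified information: complexity minus specification."""
    return complexity_bits(probability) - specification_bits


def chance_bound(csi: float) -> float:
    """Upper bound 2**(-CSI) on the probability of a chance occurrence, capped at 1."""
    return min(1.0, 2.0 ** -csi)


# One specific sequence of 100 fair flips has probability 2**-100.
all_heads_csi = csi_bits(2.0 ** -100, specification_bits=80)  # 80 is the text's figure
print(all_heads_csi, chance_bound(all_heads_csi))

# Assumed case: the shortest description is as long as the sequence itself,
# so there is no CSI and the bound says nothing against chance.
irregular_csi = csi_bits(2.0 ** -100, specification_bits=100)
print(irregular_csi, chance_bound(irregular_csi))
```

The first case yields twenty bits and a bound of 2^-20; the second yields zero bits and a bound of 1, meaning the measure gives no grounds for rejecting chance.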

However, a reliable probability bound must meet one important criterion: The description must be independent of the event’s occurrence. Here is an example of a description that is not independent: Let’s say I flip a fair coin one hundred times and write the sequence on a piece of paper. I then describe the sequence as what is “on the paper,” a very short specification of about fifty-seven. This method of creating a specification would mean that every sequence of a hundred coin flips has an amount of complex specified information of forty-three, so each of these sequences is considered non-random. This, of course, would make the metric meaningless for distinguishing random from non-random events. Thus, it is essential that the means of specifying the event be independent of the event’s occurrence.
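A short simulation makes the failure concrete. This is only a sketch: the function names are hypothetical, and the 57-bit length of the “on the paper” description is the figure assumed in the paragraph above. Because the description is written after the flips and simply points at them, every random sequence earns the same score.

```python
import random

N_FLIPS = 100
PAPER_SPEC_BITS = 57  # assumed length of the description "on the paper"


def csi_on_the_paper(flips: str) -> int:
    """Score a sequence using a description that depends on the outcome itself."""
    complexity = len(flips)  # fair coin: -log2(2**-n) = n bits
    return complexity - PAPER_SPEC_BITS


def fraction_flagged(trials: int = 1000, threshold_bits: int = 43, seed: int = 0) -> float:
    """Fraction of purely random sequences that reach the CSI threshold."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(trials):
        flips = "".join(rng.choice("HT") for _ in range(N_FLIPS))
        if csi_on_the_paper(flips) >= threshold_bits:
            flagged += 1
    return flagged / trials


if __name__ == "__main__":
    print("fraction of random sequences flagged:", fraction_flagged())
    print("bound the score is supposed to guarantee:", 2.0 ** -43)
```

Every random sequence is flagged, while a 43-bit score is supposed to bound the chance probability by 2^-43. The gap between the two figures shows that the bound holds only when the description is fixed independently of the outcome.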

Circling back to business intelligence, we see that complex specified information is exactly what is required to make business intelligence work because it addresses the issue of false positives. To be clear, complex specified information does not guarantee that the pattern the director sees is real. What it provides is an upper bound on how likely the pattern is to be an accident. Raising two to the power of the negative information measure bounds the probability that the director is wrong if he decides the pattern is real. If the information measure is large enough, there is a very small probability the pattern is a false positive.
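How large is “large enough” depends on the false-positive risk a director is willing to accept. As a rough sketch, with hypothetical function names and an assumed tolerance of one in a thousand, the bound can be inverted to find the minimum information measure a pattern needs:

```python
import math


def required_csi_bits(max_false_positive_prob: float) -> float:
    """Smallest CSI (in bits) whose bound 2**(-CSI) does not exceed the tolerance."""
    if not 0.0 < max_false_positive_prob <= 1.0:
        raise ValueError("tolerance must be in (0, 1]")
    return -math.log2(max_false_positive_prob)


def pattern_accepted(csi: float, max_false_positive_prob: float) -> bool:
    """Accept the pattern as real only if its CSI meets the required threshold."""
    return csi >= required_csi_bits(max_false_positive_prob)


if __name__ == "__main__":
    tolerance = 0.001  # assumed: at most a one-in-a-thousand chance of being wrong
    print(required_csi_bits(tolerance))
    print(pattern_accepted(20, tolerance), pattern_accepted(5, tolerance))
```

Under this tolerance a pattern needs just under ten bits, so a twenty-bit pattern passes and a five-bit pattern does not.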

As a result, we see that by using intelligent design theory, we can go beyond the restrictions of the data disciplines’ concept of hypothesis testing. The data disciplines require us to state all our patterns up front, before we have even seen the data. This requirement is very restrictive and can cause us to miss many valuable insights.

Intelligent design theory enables us to examine the data as we wish in order to discover potential patterns and then measure the patterns’ information content to distinguish the real from the false. That would free the company director to leverage business knowledge to interpret the data and maximize the insight he can extract.

See also: Machine learning tip: Set boundaries for the problems. We cannot take a giant pile of unorganized data, shove it into a machine, and expect useful results (Jonathan Bartlett)