Last time out, we talked about Occam’s Razor: Of two competing theories, the simpler explanation that accounts for all that needs explaining is better. Here’s an illustration you can try yourself:
Imagine that you have a deck of 52 cards with an unknown proportion of red to black cards. The deck holds more red than black, more black than red, or equal numbers of each. After shuffling the deck, you flip over 10 of the cards and they are all black. Which do you suppose is more likely: that the deck holds more black cards or more red cards?
Most likely you would say black. But how did you come to this conclusion? Within the framework we’ve been discussing, we would say it is because we have a very simple description, i.e. a model, for the cards we drew: they are all black. A simple model that accounts for everything we observed gives us high confidence that the deck is predominantly black, and therefore high confidence that the next card we draw will be black.
Now what if the 10 cards you flipped are evenly split between red and black? What do you now conclude? Probably that the deck is not predominantly either color. In that case, we do not have high confidence about the color of the next card we draw.
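The intuition in both scenarios can be checked with a small calculation. The sketch below is one way to do it, not the method this series prescribes: it assumes that every possible number of black cards in the deck (0 through 52) is equally likely before we draw, and then updates on the cards we saw. The function names `posterior_black_count` and `summarize` are made up for this illustration.

```python
from fractions import Fraction
from math import comb

DECK_SIZE = 52

def posterior_black_count(n_black, n_drawn, deck_size=DECK_SIZE):
    """Posterior over K, the number of black cards in the whole deck.

    Assumption (not from the article): a uniform prior over K = 0..deck_size.
    """
    n_red = n_drawn - n_black
    # Hypergeometric likelihood of the draw for each K, up to a constant factor.
    # math.comb returns 0 when K is too small or too large to produce the draw.
    weights = [comb(k, n_black) * comb(deck_size - k, n_red)
               for k in range(deck_size + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def summarize(n_black, n_drawn, deck_size=DECK_SIZE):
    """Return (P(more black), P(more red), P(next card is black))."""
    post = posterior_black_count(n_black, n_drawn, deck_size)
    half = deck_size / 2
    p_more_black = sum(p for k, p in enumerate(post) if k > half)
    p_more_red = sum(p for k, p in enumerate(post) if k < half)
    remaining = deck_size - n_drawn
    # Chance the next card is black: black cards left over cards left,
    # averaged over the posterior for K.
    p_next_black = sum(p * Fraction(k - n_black, remaining)
                       for k, p in enumerate(post))
    return p_more_black, p_more_red, p_next_black

all_black = summarize(10, 10)   # ten draws, all black
even_split = summarize(5, 10)   # ten draws, five black and five red

for label, result in (("all black", all_black), ("even split", even_split)):
    more_black, more_red, next_black = (float(x) for x in result)
    print(f"{label}: P(more black)={more_black:.4f}, "
          f"P(more red)={more_red:.4f}, P(next is black)={next_black:.4f}")
```

Under that assumed prior, ten black cards push nearly all of the probability toward a predominantly black deck, while the even split leaves the two colors exactly balanced and the next card a coin flip. A different prior would change the numbers but not the direction of the conclusion.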
This scenario also matches our framework nicely. When we draw an even split of cards, we do not have a concise way to describe that exact draw of 10 cards. We would have to name each card individually, e.g. we drew a red, then a black, then another black, then a red, and so on. In this case, our model for the data we have observed is not concise. The model merely restates the data card by card, and thus it cannot give us any confidence regarding the color of unseen cards, i.e. unseen data.
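To make "concise description" concrete, here is a rough sketch that uses run-length encoding as a stand-in for description length. That choice is an assumption made for illustration, as is the particular order of the mixed draw; the article does not specify either. Run-length encoding is a crude measure: a mixed draw that happened to come out as five reds followed by five blacks would also look short under it.

```python
from itertools import groupby

def run_length_description(draws):
    """Describe a sequence of draws as (color, run length) pairs."""
    return [(color, len(list(group))) for color, group in groupby(draws)]

def description_length(draws):
    """Number of pieces needed to describe the draw (a crude proxy)."""
    return len(run_length_description(draws))

all_black = "BBBBBBBBBB"
even_split = "RBBRBRRBRB"  # hypothetical order: five red, five black

for label, draws in (("all black", all_black), ("even split", even_split)):
    print(label, run_length_description(draws), description_length(draws))
```

The all-black draw collapses to a single statement, "ten blacks," while the mixed draw needs nearly as many pieces as there are cards, which is what it means for the model to simply restate the data.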
In both cases, drawing all black and drawing an even split, we’ve been able to identify models after the fact: we came up with each model only after we saw the cards we drew. In the case of all black, our after-the-fact model still gives us high predictive ability for cards we have not seen yet. This approach is contrary to Fisher’s method, where we formulate all theories before encountering the data (Part 3).
Thus we can use Occam’s Razor to generalize.
Machine learning isn’t difficult; just different. A few simple principles open many doors:
Part 1 in this series by Eric Holloway is The challenge of teaching machines to generalize. Teaching students simply to pass tests provides a good illustration of the problems. We want the machine learning algorithms to learn general principles from the data we provide and not merely little tricks and nonessential features that score high but ignore problems.
Part 2: Supervised Learning. Let’s start with the most common type of machine learning, distinguishing between something simple, like big and small.
Part 3: Don’t snoop on your data. You risk using a feature for prediction that is common to the dataset, but not to the problem you are studying.
Part 4: Occam’s Razor Can Shave Away Data Snooping. The greater an object’s information content, the lower its probability. If there is only a small probability that our model is mistaken with the data we have, there is a very small probability that our model will be mistaken with data we do not have.
For more general background on machine learning:
Part 1: Navigating the machine learning landscape. To choose the right type of machine learning model for your project, you need to answer a few specific questions (Jonathan Bartlett)
Part 2: Navigating the machine learning landscape — supervised classifiers. Supervised classifiers can sort items like posts to a discussion group or medical images, using one of many algorithms developed for the purpose. (Jonathan Bartlett)