Mind Matters News and Analysis on Natural and Artificial Intelligence

Tag: Data snooping

Occam’s Razor Can Shave Away Data Snooping

The greater an object's information content, the lower its probability.

One technique for avoiding data snooping draws on the intersection of information theory and probability: an object’s probability is related to its information content. The greater an object’s information content, the lower its probability. We measure a model’s information content as the difference between the negative logarithm of the probability that the data occurred by chance and the number of bits required to store the model. The negative exponential of that difference is the model’s probability of occurring by chance. If the data cannot be compressed, these two values are equal, so the model has zero information and we cannot know whether the data was generated by chance. For a dataset that is incompressible and uninformative, swirl some tea …
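The arithmetic in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: logarithms are taken base 2 so that everything is measured in bits, the function names `model_information_bits` and `chance_probability_bound` are hypothetical, and capping the result at 1 is a convenience added here, not something the article specifies.

```python
import math


def model_information_bits(p_chance: float, model_bits: float) -> float:
    """Bits saved by the model: -log2(p_chance) minus the bits needed to store it.

    Assumption: base-2 logarithms, so both terms are expressed in bits.
    """
    if not 0.0 < p_chance <= 1.0:
        raise ValueError("p_chance must lie in (0, 1]")
    return -math.log2(p_chance) - model_bits


def chance_probability_bound(info_bits: float) -> float:
    """Negative exponential (base 2) of the information, capped at 1.

    The cap is an assumption made here so the result stays a valid probability
    when the model is larger than the data it describes.
    """
    return min(1.0, 2.0 ** (-info_bits))


# Hypothetical example: 100 fair coin flips have chance probability 2**-100.
p_chance = 2.0 ** -100

# A model that stores the data in 60 bits carries 40 bits of information.
compressed = model_information_bits(p_chance, model_bits=60)
print(compressed, chance_probability_bound(compressed))

# A "model" that needs all 100 bits compresses nothing: zero information.
incompressible = model_information_bits(p_chance, model_bits=100)
print(incompressible, chance_probability_bound(incompressible))
```

In the incompressible case the two quantities cancel, the information is zero, and the bound is 1, which matches the excerpt's point that such a model tells us nothing about whether the data arose by chance.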

Machine Learning, Part 3: Don’t Snoop on Your Data

You risk using a feature for prediction that is common to the dataset, but not to the problem you are studying

As long as we can establish that our theories, hypotheses, and/or models are independent of the data, then we can trust that their predictive power will generalize beyond the data we have observed.
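To see why independence from the data matters, here is a minimal simulation sketch, not taken from the article. It assumes purely random binary labels and purely random binary features, so no feature has any real predictive power. The function name `snooped_feature_accuracy` and all parameter values are hypothetical choices for illustration.

```python
import random


def snooped_feature_accuracy(n_train=40, n_holdout=2000, n_features=300, seed=0):
    """Pick the random feature that best matches the training labels,
    then score that same feature on data it was not selected on."""
    rng = random.Random(seed)
    n = n_train + n_holdout
    labels = [rng.getrandbits(1) for _ in range(n)]

    best_train_acc, best_feature = -1.0, None
    for _ in range(n_features):
        # Each feature is pure noise, unrelated to the labels.
        feature = [rng.getrandbits(1) for _ in range(n)]
        matches = sum(f == y for f, y in zip(feature[:n_train], labels[:n_train]))
        train_acc = matches / n_train
        if train_acc > best_train_acc:
            best_train_acc, best_feature = train_acc, feature

    # Evaluate the snooped feature on the held-out portion only.
    holdout_matches = sum(
        f == y for f, y in zip(best_feature[n_train:], labels[n_train:])
    )
    return best_train_acc, holdout_matches / n_holdout


train_acc, holdout_acc = snooped_feature_accuracy()
print(f"accuracy on the data we snooped: {train_acc:.2f}")
print(f"accuracy on unseen data:         {holdout_acc:.2f}")
```

Because the feature was chosen by looking at the training labels, its training accuracy sits well above chance, while on unseen data it falls back to roughly a coin flip. A feature chosen before seeing the data would not show that gap.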