Step Away From Stepwise Regression (and Other Data Mining)

Stepwise regression, which is making a comeback, is just another form of HARKing — Hypothesizing After the Results are Known

There is a strong correlation between the number of lawyers in Nevada and the number of people who died after tripping over their own two feet. There are similarly impressive correlations between U.S. crude oil imports and the per capita consumption of chicken, and between the number of letters in the winning word in the Scripps National Spelling Bee and the number of people killed by venomous spiders. If you find these amusing (as I do), there are many more at the website Spurious Correlations. These silly statistical relationships are intended to demonstrate that correlation is not causation. But no matter how often or how loudly statisticians shout that warning, many people do not hear it. When there is a…
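
The subtitle's point, that stepwise regression finds "predictors" whether or not any exist, can be illustrated with a small simulation. This is a hypothetical sketch, not the author's code: it assumes greedy forward selection by largest gain in R², generates every variable as independent Gaussian noise, and uses illustrative sizes (50 observations, 100 candidate predictors, 5 steps). Because no candidate has any real relationship to y, whatever R² the procedure reports was fit to chance.

```python
import random


def center(v):
    """Subtract the mean so OLS without an intercept matches OLS with one."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, columns):
    """R^2 of an OLS fit of centered y on centered predictor columns."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]
    sse = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum(yt ** 2 for yt in y)
    return 1 - sse / sst


def forward_stepwise(y, candidates, steps):
    """Greedy forward selection: each step adds the candidate that most raises R^2."""
    selected, history = [], []
    for _ in range(steps):
        best = max(
            (i for i in range(len(candidates)) if i not in selected),
            key=lambda i: r_squared(y, [candidates[j] for j in selected + [i]]),
        )
        selected.append(best)
        history.append(r_squared(y, [candidates[j] for j in selected]))
    return selected, history


if __name__ == "__main__":
    rng = random.Random(42)
    n_obs, n_candidates = 50, 100
    # Every series is pure noise: no predictor is truly related to y.
    y = center([rng.gauss(0, 1) for _ in range(n_obs)])
    candidates = [center([rng.gauss(0, 1) for _ in range(n_obs)])
                  for _ in range(n_candidates)]
    chosen, r2_path = forward_stepwise(y, candidates, steps=5)
    for step, (idx, r2) in enumerate(zip(chosen, r2_path), start=1):
        print(f"step {step}: added x{idx}, R^2 = {r2:.2f}")
```

R² can only rise as variables are added, so the procedure always appears to be "explaining" more. Rerunning with a different seed typically selects an entirely different set of predictors, which is what fitting noise looks like.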

Data Mining: A Plague, Not a Cure

It is tempting to believe that patterns are unusual and their discovery meaningful; in large data sets, patterns are inevitable and generally meaningless

Finding patterns in data is easy. Finding meaningful patterns that have a logical basis and can be used to make accurate predictions is elusive. We can see this in examples ranging from 18th-century attempts to cure scurvy to 21st-century claims about the stock market and history.
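
The claim that patterns are inevitable in large data sets can be checked directly. This minimal sketch, with hypothetical function names and illustrative sizes (a 20-observation target and 1,000 candidate series), generates every series as independent noise and then reports the strongest correlation it can find. Every true correlation is zero, yet with 1,000 tries on 20 observations the winning |r| typically exceeds 0.5.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mine_for_patterns(n_obs=20, n_candidates=1000, seed=1):
    """Search many purely random series for the one most correlated with a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_index, best_r = None, 0.0
    for i in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) > abs(best_r):
            best_index, best_r = i, r
    return best_index, best_r


if __name__ == "__main__":
    idx, r = mine_for_patterns()
    print(f"Best of 1000 random series: #{idx}, r = {r:.2f}")
```

The winning correlation is computed correctly; it is meaningless because it was selected from a thousand chances, and a fresh sample of the same "predictor" would have no reason to repeat it.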

