Mind Matters News and Analysis on Natural and Artificial Intelligence

The Numbers Don’t Speak for Themselves

The patterns uncovered by machine learning may reflect a larger reality or just a bias in gathering data
Henry Potter (Lionel Barrymore) was #6 among the American Film Institute’s all-time film villains; George Bailey (Jimmy Stewart) was #9 among heroes.

The heroism of George Bailey, the central character in Frank Capra’s film “It’s a Wonderful Life” (1946), grew out of his view of people. To George, the people his bank served were fully human, with families and lives, not mere numbers recorded in a bank book.

The film’s villain—Henry Potter, a crotchety, greedy old man guided only by pursuit of the highest return on investment—reduces people to mere numbers. One scene pits Bailey against Potter as they review loans George has made. George argues for the character of the families he knows while Potter insists on what he regards as cold, hard “facts.” The film’s enduring popularity shows which world we prefer, and it’s not Mr. Potter’s.

Sadly, the unbounded push for efficiency, measured by nothing more than reducing events to numbers, encourages us to use artificial intelligence (AI) in a range of situations where a human being would serve better. And a price is being paid. In MIT Technology Review, Karen Hao discusses police departments’ use of computerized risk assessment, which tries to determine who might commit a crime if granted bail instead of being held in jail. But relying on numbers alone mistakes incidental circumstances that turn up in the data for actual causes:

…machine-learning algorithms use statistics to find patterns in data. So if you feed it historical crime data, it will pick out the patterns associated with crime. But those patterns are statistical correlations—nowhere near the same as causations. If an algorithm found, for example, that low income was correlated with high recidivism, it would leave you none the wiser about whether low income actually caused crime. But this is precisely what risk assessment tools do: they turn correlative insights into causal scoring mechanisms.

Karen Hao, “AI is sending people to jail—and getting it wrong” at Technology Review
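The gap between correlation and causation that Hao describes can be made concrete with a small simulation. The sketch below is illustrative only: it uses synthetic data, and every name in it (`simulate`, `hidden`, `feature`, `outcome`) is hypothetical rather than drawn from any real risk assessment tool. It assumes a hidden circumstance that influences both an observed feature (think of a low-income score) and an observed outcome (think of a recorded re-arrest score), while the feature has no effect on the outcome at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Synthetic data: a hidden factor drives both columns.

    The feature never influences the outcome; they are linked only
    through the hidden factor, which the "algorithm" does not see.
    """
    rng = random.Random(seed)
    hidden, feature, outcome = [], [], []
    for _ in range(n):
        h = rng.gauss(0, 1)                  # unobserved circumstance
        hidden.append(h)
        feature.append(h + rng.gauss(0, 1))  # e.g. a low-income score
        outcome.append(h + rng.gauss(0, 1))  # e.g. a re-arrest score
    return hidden, feature, outcome


def correlation_within_slice(hidden, feature, outcome, width=0.1):
    """Correlation among records whose hidden factor is nearly equal."""
    rows = [(f, o) for h, f, o in zip(hidden, feature, outcome)
            if abs(h) < width]
    return pearson([f for f, _ in rows], [o for _, o in rows])


if __name__ == "__main__":
    hidden, feature, outcome = simulate()
    print("overall correlation:", round(pearson(feature, outcome), 3))
    print("with hidden factor held nearly fixed:",
          round(correlation_within_slice(hidden, feature, outcome), 3))
```

In this toy setup the feature and the outcome are clearly correlated across the whole data set, yet the link largely disappears once the hidden factor is held nearly constant. A scoring tool that sees only the two observed columns cannot tell this situation apart from one in which the feature really does cause the outcome.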

Data scientist Cathy O’Neil, first on her blog mathbabe and then later in her award-winning book, Weapons of Math Destruction, has called attention to the data part of Big Data and Machine Learning.

Machine Learning, despite all the hyperbole and expansive language, is statistics. The data stream driving those statistics determines the outcome even when we cannot see the connection. Because Machine Learning is opaque (even experts cannot clearly explain how a system arrived at a conclusion), we treat it as magic. Dr. O’Neil warns that we should instead mistrust these systems until they are proven innocent (and correct). When misapplied to criminal risk assessment, AI assigns risk where it may not belong, possibly sending, as Hao notes at Technology Review, the “wrong people” to jail.

Data analyst Kalev Leetaru, who has written several good pieces on AI at Forbes, also points out that machine learning is “about correlations, not causation.” The fact that two pieces of data are linked does not mean that one causes the other. Nonetheless:

Developers and data scientists increasingly treat their creations as silicon lifeforms ‘learning’ concrete facts about the world, rather than what they truly are: piles of numbers detached from what they represent, mere statistical patterns encoded into software. We must recognize that those patterns are merely correlations amongst vast reams of data, rather than causative truths or natural laws governing our world.

Kalev Leetaru, “A Reminder That Machine Learning Is About Correlations Not Causation” at Forbes

At times, verifying and studying these statistical patterns is useful. For example, machine learning is well suited to assisting in the analysis of complex patterns, such as MRI results. Properly used, it could help save many lives. But when it is misapplied, for the sake of “efficiency,” to broad swathes of people, using data sets gathered for other purposes (sets that nearly always carry embedded biases), it damages many lives. Leetaru reminds us:

Perhaps the biggest issue with current machine learning trends, however, is our flawed tendency to interpret or describe the patterns captured in models as causative rather than correlations of unknown veracity, accuracy or impact.

“A Reminder That Machine Learning Is About Correlations Not Causation” at Forbes

In the end, we need to choose the world we prefer: George Bailey’s world that sees people or Mr. Potter’s world that sees mere numbers. The choice we make affects how we will deploy or, in some critical cases, not deploy AI. And that choice affects us all.

Brendan Dixon is a Software Architect with experience designing, creating, and managing projects of all sizes. His first foray into Artificial Intelligence was in the 1980s when he built an Expert System to assist in the diagnosis of software problems at IBM. Though he’s spent the majority of his career on other types of software, he’s remained engaged and interested in the field.
