Analysis
July 26, 2019

Machine Learning, Social Media

We Built the Power Big Social Media Have Over Us

Click by click, the machines learned the patterns. Now we aren’t sure who is in charge.


If you are living on planet Earth, connected to the internet, you are likely making Google, Facebook, Amazon, Netflix, and other web giants even more powerful.

Computer scientists call the algorithms these companies depend on machine learning. Though machine learning (ML) is endlessly promoted as an emerging new form of artificial intelligence (AI), it isn’t actually new. “Deep Learning,” for instance, a popular machine learning approach used by Google and nearly everyone else, is a more sophisticated version of neural networks. Neural networks appeared almost contemporaneously with the field of artificial intelligence itself, in the 1950s.

Deep Learning identifies our friends in photos on Facebook, improves search results from Google, Bing, and other major engines, and personalizes our news feeds and the recommendations we see on sites like Amazon, Netflix, and Spotify. These results are impressive, of course. To many enthusiasts, they mark the launch of a rocket that will finally arrive at truly intelligent machines.

As an area of research and development in computer science, ML is in fact a valuable tool for scientists and developers. Massive volumes of data from the World Wide Web have given this old field new life. Without “Big Data,” machine learning systems are ho-hum performers. But feed them mega data sets (and run them on fast enough computers), and they soon outperform other approaches on a vast array of problems that were once considered hopeless.

Here’s an example of how it works: feed the system messy handwritten letters, keep correcting its mistakes, and before long it recognizes every letter of the alphabet (an easy problem). Feed it vast quantities of text in different languages and you get Google Translate. The same approach gives us voice recognition systems like Siri (mistakes and all), facial recognition, and even self-driving cars. These systems aren’t perfect, but they have the virtue of bypassing clunky rule-based approaches by ferreting out patterns directly from data, and they are beginning to be used everywhere. It is no surprise that ML is now hugely popular in academic research labs, in Silicon Valley, and pretty much anywhere else we find software systems.
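In its simplest form, the “keep correcting its mistakes” loop is the perceptron learning rule, one of the original 1950s neural-network ideas. The sketch below assumes two made-up 3x3 bitmaps standing in for handwritten letters, plus slightly messy variants; real handwriting recognizers use far larger networks and data sets, so treat this as an illustration of the principle only.

```python
# A toy perceptron: it "learns" only by having its mistakes corrected.
# The 3x3 bitmaps below are hypothetical stand-ins for handwritten letters.

LETTER_L = [1, 0, 0,
            1, 0, 0,
            1, 1, 1]
LETTER_T = [1, 1, 1,
            0, 1, 0,
            0, 1, 0]

TRAINING_DATA = [
    (LETTER_L, +1),                      # +1 means "L"
    (LETTER_T, -1),                      # -1 means "T"
    ([1, 0, 0, 1, 0, 0, 1, 1, 0], +1),   # messy L: bottom-right pixel missing
    ([1, 1, 0, 0, 1, 0, 0, 1, 0], -1),   # messy T: top bar shortened
]

def predict(weights, bias, pixels):
    """Return +1 ("L") or -1 ("T") depending on the sign of the weighted sum."""
    activation = bias + sum(w * x for w, x in zip(weights, pixels))
    return 1 if activation > 0 else -1

def train_perceptron(data, epochs=20):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in data:
            if predict(weights, bias, pixels) != label:
                # The "correction": nudge the weights toward the right answer.
                weights = [w + label * x for w, x in zip(weights, pixels)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias

weights, bias = train_perceptron(TRAINING_DATA)
print([predict(weights, bias, pixels) for pixels, _ in TRAINING_DATA])  # → [1, -1, 1, -1]

unseen_l = [1, 0, 0, 1, 0, 0, 0, 1, 1]   # an L missing its bottom-left pixel
print(predict(weights, bias, unseen_l))   # → 1
```

Nobody wrote a rule such as “an L has a vertical stroke on the left”; the weights that separate the two letters come entirely from corrected examples.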

Because ML is data-driven, our clickstreams, blogs, tweets, and other content are its raw material. Like Wikipedia, ML has become a global group project (though, it turns out, we’re all working for free). Take news feeds on Facebook. When we, “out in the wild,” decide to click on a link, the Deep Learning-based system Facebook uses treats that decision as training data, folding the content we clicked back into the model that personalizes our news feeds. The result is a continual feedback loop that keeps zeroing in on what we like (or think we like), based on what we choose. It’s a seemingly virtuous cycle: our online activity feeds the system data, and it, in turn, spits back more and more relevant content for us. Voilà! A personalized feed, courtesy of ML.
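The feedback loop can be sketched with a toy recommender that sets each topic’s share of the feed in proportion to past clicks. This is not Facebook’s Deep Learning system: the topics, the click rates, and the proportional rule are all hypothetical, and the simulation uses expected clicks rather than random ones so that the result is deterministic.

```python
# Hypothetical topics and per-topic click rates for one simulated user.
TOPICS = ["politics", "sports", "science"]
CLICK_RATE = {"politics": 0.6, "sports": 0.2, "science": 0.2}

def run_feedback_loop(rounds=30, feed_size=20):
    """Expected-value simulation: each round's clicks retrain the next round's feed."""
    click_counts = {topic: 1.0 for topic in TOPICS}  # start from an even prior
    politics_share = []
    for _ in range(rounds):
        total = sum(click_counts.values())
        # The feed's topic mix is proportional to accumulated clicks.
        feed_mix = {topic: click_counts[topic] / total for topic in TOPICS}
        politics_share.append(feed_mix["politics"])
        for topic in TOPICS:
            # Expected clicks on this topic become new training data.
            click_counts[topic] += feed_size * feed_mix[topic] * CLICK_RATE[topic]
    return politics_share

shares = run_feedback_loop()
print(f"politics share of the feed: round 1 = {shares[0]:.2f}, round 30 = {shares[-1]:.2f}")
```

Under these assumptions the politics share rises every single round, even though the simulated user’s underlying click rates never change: the loop amplifies whatever it sees first.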

The approach has big drawbacks, however. Because we train ML simply by going about our business on platforms like Facebook, we’re all data points, and that has consequences. Consider the recent Cambridge Analytica scandal, in which identifiable, detailed personal data on millions of Facebook users was harvested through a third-party app and passed to a political consulting firm, and Facebook may have violated laws by failing to protect it. ML systems can’t help us unless, through our web behavior, we tell them who we are and what we like. But then whoever controls those systems knows that information too. Concerns about privacy and manipulation are obvious.

It gets worse. Even part-time students of ML know that algorithms like Deep Learning don’t work without a human-engineered design, a set of initial constraints that point the “learning” (really numerical optimization techniques, i.e., math) in a given direction. That is a kind of built-in bias, but in this context, the term isn’t meant as an accusation. A system with no bias can’t optimize anything, and so doesn’t “learn” what we want it to. The initial choices made in the design of an ML system enable it to build a “theory,” a model of the data. A news system, for instance, might learn about content by topic (politics, sports, science), by political leaning (right, left), or even by mood (positive or negative sentiment). If what’s desired is “more and more conservative political news in the UK,” the ML system, once trained on our inputs, populates our feeds with relevant pages automatically.
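To make the “built-in bias” point concrete, here is a minimal sketch using a toy logistic-regression click model trained by plain gradient descent (the “numerical optimization” mentioned above), a much simpler stand-in for Deep Learning. The story attributes, the click log, and both feature designs are hypothetical; the point is that whatever the designer leaves out of the representation, the model cannot learn.

```python
import math

# Hypothetical click log: (story, clicked?). The user clicks right-leaning
# political stories, skips left-leaning ones, and clicks sports.
CLICK_LOG = [
    ({"topic": "politics", "leaning": "right"}, 1),
    ({"topic": "politics", "leaning": "left"}, 0),
    ({"topic": "sports", "leaning": None}, 1),
]

def topic_only(story):
    """Design A: the designer decided only the topic matters (first entry is a bias term)."""
    return [1.0, float(story["topic"] == "politics"), float(story["topic"] == "sports")]

def topic_and_leaning(story):
    """Design B: the designer also encodes political leaning."""
    return topic_only(story) + [float(story["leaning"] == "right"),
                                float(story["leaning"] == "left")]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def click_probability(weights, featurize, story):
    return sigmoid(sum(w * x for w, x in zip(weights, featurize(story))))

def train(featurize, log, steps=500, learning_rate=0.5):
    """Fit the click model by full-batch gradient descent on logistic loss."""
    n_features = len(featurize(log[0][0]))
    weights = [0.0] * n_features
    for _ in range(steps):
        gradient = [0.0] * n_features
        for story, clicked in log:
            error = click_probability(weights, featurize, story) - clicked
            for i, x in enumerate(featurize(story)):
                gradient[i] += error * x / len(log)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

RIGHT_STORY = {"topic": "politics", "leaning": "right"}
LEFT_STORY = {"topic": "politics", "leaning": "left"}

weights_a = train(topic_only, CLICK_LOG)
weights_b = train(topic_and_leaning, CLICK_LOG)
for name, featurize, weights in [("topic only", topic_only, weights_a),
                                 ("topic + leaning", topic_and_leaning, weights_b)]:
    right = click_probability(weights, featurize, RIGHT_STORY)
    left = click_probability(weights, featurize, LEFT_STORY)
    print(f"{name}: right={right:.2f}, left={left:.2f}")
```

Design A cannot tell the two political stories apart because, as far as its features are concerned, they are the same story, so its estimate sits near 0.5 for both. Design B separates them. The optimizer is identical in both cases; only the designer’s choice of representation differs.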

Unfortunately, this approach also ties us to our past clickstream, showing us (in effect) our own past preferences, not necessarily our information goals. If a user clicks mostly left-leaning news items, for instance, Facebook’s news feed will screen out (say) conservative friends’ posts and links. Opinion diversity silently disappears as the ML system does what it was designed to do: give us more of the same. It’s a “filter bubble,” as internet activist Eli Pariser put it. Or, it’s a new gatekeeper, but a hidden one. We can correct for this blindness manually, but ML systems themselves won’t do it for us. Vigilance is not built in.
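A minimal sketch of the “more of the same” dynamic, assuming a feed that simply ranks posts by a model’s predicted click rate. The posts and scores are invented, and `rank_feed_with_quota` is a hypothetical example of the kind of manual correction described above, not something any platform is known to use.

```python
# Hypothetical candidate posts: (title, leaning, predicted click rate) for one
# user whose past clicks were mostly left-leaning. All values are invented.
CANDIDATES = [
    ("Left-leaning op-ed", "left", 0.31),
    ("Left-leaning explainer", "left", 0.27),
    ("Left-leaning news item", "left", 0.22),
    ("Friend's conservative link", "right", 0.08),
    ("Conservative news analysis", "right", 0.05),
]

def by_predicted_clicks(post):
    return post[2]

def rank_feed(candidates, k):
    """Pure 'more of the same': keep the k posts with the highest predicted click rate."""
    return sorted(candidates, key=by_predicted_clicks, reverse=True)[:k]

def rank_feed_with_quota(candidates, k):
    """Hypothetical manual correction: reserve a slot for each leaning, then fill by score."""
    ranked = sorted(candidates, key=by_predicted_clicks, reverse=True)
    feed = []
    for leaning in sorted({post[1] for post in candidates}):
        # The highest-scoring post of each leaning gets a guaranteed slot.
        feed.append(next(post for post in ranked if post[1] == leaning))
    for post in ranked:
        if len(feed) >= k:
            break
        if post not in feed:
            feed.append(post)
    return sorted(feed[:k], key=by_predicted_clicks, reverse=True)

def leaning_mix(feed):
    """Count how many posts of each leaning made it into the feed."""
    counts = {}
    for _, leaning, _ in feed:
        counts[leaning] = counts.get(leaning, 0) + 1
    return counts

print(leaning_mix(rank_feed(CANDIDATES, k=3)))             # → {'left': 3}
print(leaning_mix(rank_feed_with_quota(CANDIDATES, k=3)))  # → {'left': 2, 'right': 1}
```

The plain ranker never shows a conservative post, not because of any explicit rule, but because those posts never score high enough. Diversity appears only when someone deliberately designs it in.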

Data-driven, human-designed machine learning systems also give corporate interests a powerful new tool. Amazon recommends books based on what we view and buy (cool), but Netflix offers up less expensive movies first, not for our benefit (once we’ve subscribed, they’re all included anyway) but for Netflix’s. Blockbusters cost more to license (Netflix pays, not us), so the design choices made by Netflix silently steer us away from them, toward other titles Netflix hopes we’ll enjoy. We’re getting what Netflix decides is best. Fine, sure, but we might have wanted the blockbuster (who knows?). Of course, we can always search for movies manually, but then the good, helpful side of ML goes away, too.
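The sketch below does not model Netflix’s actual system. It assumes a hypothetical ranking score that subtracts a designer-chosen cost penalty from predicted enjoyment, to show how a single weight in the objective can push a blockbuster down the list without the viewer seeing why. The titles, enjoyment scores, costs, and `cost_weight` values are all invented.

```python
# Hypothetical catalog for one subscriber: (title, predicted enjoyment 0-1,
# relative licensing cost to the service). All numbers are invented.
CATALOG = [
    ("Big-budget blockbuster", 0.90, 1.00),
    ("Mid-budget drama", 0.75, 0.30),
    ("In-house original", 0.70, 0.10),
]

def rank(catalog, cost_weight):
    """Order titles by predicted enjoyment minus a designer-chosen cost penalty."""
    def score(movie):
        _, enjoyment, cost = movie
        return enjoyment - cost_weight * cost
    return [title for title, _, _ in sorted(catalog, key=score, reverse=True)]

print(rank(CATALOG, cost_weight=0.0))
# → ['Big-budget blockbuster', 'Mid-budget drama', 'In-house original']
print(rank(CATALOG, cost_weight=0.3))
# → ['In-house original', 'Mid-budget drama', 'Big-budget blockbuster']
```

With `cost_weight=0.0` the viewer’s predicted favorite comes first; at 0.3 it drops to last, even though the enjoyment predictions are unchanged. Nothing the viewer sees reveals which weight is in use.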

ML is flawed, in other words, and much of the benefit goes to the designers (who, after all, own the systems). Dyed-in-the-wool AI hopefuls might argue that today’s ML systems are a mere stepping stone to yet more powerful AI. Perhaps a “Super-AI” is on its way next, one that through sheer intelligence can mitigate these troubling human concerns (oh, but why should it care?).

Next-generation AI might escape its masters and start really helping us, like a friend (or a slave). But no one has a clue how to create anything but the idiot savants we now have. We feed them and train them with our clickstreams so that they can feed their idiot-savant results back to us in turn. So we’re stuck, working for free, training the Web giants’ ML systems to reap benefits for them while enduring (assuming we notice) the downsides.

Be warned. ML isn’t going to “evolve” into something entirely new and different on its own. New designs and approaches, hopefully more sensitive to human concerns, will have to come from future human innovation. In the meantime, we would do well to recognize, and guard against, the manipulative game we’re all now playing.

