Microsoft Flight Simulator: Promise and Problems of Big Open Data
For some software, bad data doesn’t matter; for other software, working off of month-old data could be life-threatening
Last week, Microsoft released its critically acclaimed Microsoft Flight Simulator, to much cheering and applause. The game creates a photorealistic journey across the planet. Artificial intelligence combines multiple data sets to create a magnificent virtual experience of flying through the world.
The data comes from satellite maps, which supply terrain and texture information, and from OpenStreetMap, which adds three-dimensional information such as building heights to city data. The program combines these sources into a 3D world using a variety of AI photogrammetry techniques, then streams that world to you as you fly through it.
Additionally, the system streams in real-world weather data, so that the weather experienced in any part of the world is transmitted to your game. You get a simulated experience of flying through the weather that is happening at that time.
The rendered world, however, requires an enormous amount of data. The terabytes involved simply can’t be downloaded onto your computer; they must be streamed in as you move through the world. This limitation has caused problems for a number of users when the massive amount of data arriving as they fly to new places overwhelms their internet connection. (Note that this streaming can also be turned off.)
What’s really instructive, however, are the mistakes the program can make.
At the time of release, in the middle of Melbourne, Australia, Microsoft Flight Simulator showed a skinny but extremely tall 212-story building reaching out into the sky (in comparison, the tallest building in the world only has 163 floors). Why? It turns out that, because one of the data sources for the game was OpenStreetMap, the program inherited errors from that project.
Last year, a user accidentally mistyped the number of stories in a building. This error was quickly corrected in the original data. But between the time that it was originally typed and subsequently corrected, the developers had taken their snapshot of the data to use in the flight simulator.
This isn’t a problem for Microsoft Flight Simulator itself. For the game, it is actually a fun Easter egg to find. One player’s video of trying to land on it (and succeeding) has garnered almost half a million views. But it does raise questions about other efforts to combine Big Data sources, especially those that include crowdsourced data.
With sufficiently large amounts of data, it is essentially impossible to verify all data points. Thus, any system that uses the data must be equipped with the ability to withstand errors in that data.
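The article does not describe how the game’s pipeline handles bad records, so what follows is only a minimal sketch of what “withstanding errors” can look like in practice. It assumes each building arrives as an OpenStreetMap-style tag dictionary whose floor count sits under the `building:levels` key. The function name `check_levels` and the 163-floor ceiling (borrowed from the tallest-building comparison above) are hypothetical choices for illustration, not anything Microsoft is known to use.

```python
from typing import Optional, Tuple

# Hypothetical plausibility ceiling: the tallest building cited in the
# article has 163 floors. A real system would choose its own bound.
MAX_PLAUSIBLE_LEVELS = 163


def check_levels(tags: dict) -> Tuple[Optional[int], str]:
    """Classify one building's floor count from an OSM-style tag dict.

    Returns (levels, status), where status is one of:
      "ok"          -> value is usable for rendering
      "missing"     -> no building:levels tag at all
      "unparseable" -> tag present but not a whole number
      "implausible" -> number is non-positive or above the ceiling
    Anything other than "ok" should go to review or a fallback
    (for example, a default height) instead of being rendered as-is.
    """
    raw = tags.get("building:levels")
    if raw is None:
        return None, "missing"
    try:
        levels = int(raw)
    except ValueError:
        return None, "unparseable"
    if not 0 < levels <= MAX_PLAUSIBLE_LEVELS:
        return levels, "implausible"
    return levels, "ok"
```

Run against a mistyped record such as `{"building:levels": "212"}`, this returns `(212, "implausible")`, so the value is flagged rather than silently rendered. A fixed ceiling only catches gross errors, though; a plausible-looking typo (say, 21 floors instead of 2) would still pass, which is why checks like this reduce rather than eliminate the risk.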
Automotive journalist Ed Niedermeyer, for instance, has pointed out that these advances in AI create outstanding features, but what many situations require is reliability and safety. Because the problematic cases are unpredictable (both in occurrence and, as is evident here, in size), the edge cases may feature not merely degraded behavior but potentially catastrophic behavior.
Additionally, there is the question of how such data gets updated in large-scale systems. Streets and buildings are continually modified. Even if a perfect AI data processor could stitch the pieces together without error, the time required to process that data would be immense, so there would be long delays between updates to the original data source and updates to its rendering.
For some software, bad data doesn’t matter; for other software, working off of month-old data could be life-threatening.
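To make that contrast concrete, here is a minimal sketch of a staleness guard, assuming every data snapshot carries a timezone-aware timestamp of when it was taken. The application names and age limits in `MAX_AGE` are hypothetical placeholders; real limits would have to come from a safety analysis of the particular system.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical per-application limits on snapshot age.
# None means "any age is acceptable" (e.g. scenery in a game).
MAX_AGE = {
    "game": None,
    "safety_critical": timedelta(days=1),
}


def is_fresh_enough(snapshot_time: datetime, application: str,
                    now: Optional[datetime] = None) -> bool:
    """Return True if a snapshot taken at snapshot_time is recent enough
    for the given application. Both datetimes must be timezone-aware."""
    limit = MAX_AGE[application]
    if limit is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - snapshot_time <= limit
```

With a snapshot from July 1 checked on August 1, the `"game"` case passes while the `"safety_critical"` case fails. Note that a guard like this only detects stale data; it does nothing to shorten the processing delay described above, so a safety-critical system would still need a fallback for when the check fails.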
Finally, there is the combining process itself. Not only does the data need continual updating, but the process of combining the data sources into a final rendered environment also requires continual rework. If an error is found, is it an error in the source data or in the machine rendering of it? These errors take time to track down, and it is hard to fix them without inadvertently introducing others.
In short, having lots of data and fancy ways to combine it gives you a great first-pass approximation of the real world. The problem is that as the data stream grows, it gets harder to update and harder to verify. For games and for assistance-type tools, this works great. For situations where safety and reliability are paramount, it’s a problem.
The biggest threat AI brings is not that the robots will overthrow us but that we will be too enamored with it to take its shortcomings seriously.