How Do We Know What Superintelligent AI Will Do?
If superintelligent systems existed, logic demonstrates that they would be unpredictable
With the increase in the capabilities of artificial intelligence over the last decade, a significant number of researchers have realized the importance of creating intelligent systems that are both capable and also safe and secure [1-6]. Unfortunately, the field of AI Safety is very young and researchers are still working to identify its main challenges and limitations. Impossibility results are well known in many fields of inquiry [7-13] and some have now been identified in AI Safety [14-16].
In this post*, I want to concentrate on a poorly understood concept of unpredictability in intelligent systems [17] which limits our ability to understand the impact of the intelligent systems we are developing. It is a challenge for software verification and intelligent systems control, as well as for AI Safety in general.
In theoretical computer science and in software development in general, many impossibility results are well established. Some of them are strongly related to the subject of this post; for example, Rice’s Theorem states that no computationally effective method can decide whether a program will exhibit a particular non-trivial behavior, such as producing a specific output [18]. Similarly, Wolfram’s Computational Irreducibility states that complex behaviors of programs can’t be predicted without actually running those programs [19]. Any physical system which could be mapped onto a Turing Machine will similarly exhibit Unpredictability [20, 21].
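The flavor of these results can be illustrated with the diagonal argument that underlies the halting problem and Rice’s Theorem. The sketch below is a toy illustration, not a proof of either result. It assumes a predictor is a total function that inspects a program and returns a guess about its output without running it; all names (make_contrarian, always_true, guess_from_name, defeated) are hypothetical and chosen only for this example.

```python
from typing import Callable

# A "program" here is a zero-argument function returning a bool.
Program = Callable[[], bool]
# A "predictor" inspects a program and guesses what it will return.
# Assumption: the predictor never calls the program it is given.
Predictor = Callable[[Program], bool]


def make_contrarian(predictor: Predictor) -> Program:
    """Build a program that asks the predictor about itself and does the opposite."""
    def contrarian() -> bool:
        return not predictor(contrarian)
    return contrarian


# Three toy predictors (hypothetical, for illustration only).
def always_true(program: Program) -> bool:
    return True


def always_false(program: Program) -> bool:
    return False


def guess_from_name(program: Program) -> bool:
    # Inspects the program's name instead of running it.
    return "contrarian" in program.__name__


def defeated(predictor: Predictor) -> bool:
    """True if the predictor is wrong about the contrarian program built from it."""
    program = make_contrarian(predictor)
    return predictor(program) != program()


if __name__ == "__main__":
    for p in (always_true, always_false, guess_from_name):
        print(p.__name__, "defeated:", defeated(p))
```

Whatever the predictor answers, the contrarian program returns the opposite, so every predictor of this kind is wrong about at least one program. A predictor that tried to escape by running the program would call itself recursively and never return an answer, which is the toy counterpart of non-termination.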
Unpredictability of AI is one of many impossibility results in AI Safety, also known as Unknowability [22] or Cognitive Uncontainability [23]. It is defined as our inability to precisely and consistently predict what specific actions an intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. It is related to the unexplainability and incomprehensibility of AI but is not quite the same thing. Unpredictability does not imply that better-than-random statistical analysis is impossible; it simply points out a general limitation on how well such efforts can perform. It is particularly pronounced with advanced generally intelligent systems (superintelligence) in novel domains. In fact, we can present a proof of unpredictability for such superintelligent systems.
Proof: This is a proof by contradiction. Suppose that unpredictability is false and that a person can accurately predict the decisions of a superintelligence. That person can then make the same decisions as the superintelligence, which makes them as smart as the superintelligence. But that is a contradiction, because superintelligence is defined as a system smarter than any person. Therefore our initial assumption was false, and superintelligence is unpredictable.
The amount of unpredictability can be formally measured via the theory of Bayesian surprise, which measures the difference between the posterior and prior beliefs of the predicting agent [24-27]. “The unpredictability of intelligence is a very special and unusual kind of surprise, which is not at all like noise or randomness. There is a weird balance between the unpredictability of actions and the predictability of outcomes.” [28]. A simple heuristic is to estimate the amount of surprise as proportional to the difference in intelligence between the predictor and the predicted agent. See Yudkowsky [29, 30] for an accessible discussion of this topic.
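As a concrete illustration, Bayesian surprise is commonly computed as the Kullback-Leibler divergence between the predictor’s posterior and prior beliefs. The sketch below is minimal and assumes a finite set of hypotheses about the agent; the function names, the two-hypothesis setup, and the numbers are invented for this example and do not come from the cited works.

```python
import math
from typing import List


def posterior(prior: List[float], likelihood: List[float]) -> List[float]:
    """Bayes' rule over a finite hypothesis set.

    prior[i]      = P(hypothesis i) before the observation
    likelihood[i] = P(observation | hypothesis i)
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return [u / total for u in unnormalized]


def bayesian_surprise(prior: List[float], likelihood: List[float]) -> float:
    """KL(posterior || prior), in bits: how far one observation moved the beliefs."""
    post = posterior(prior, likelihood)
    # Terms with zero posterior mass contribute nothing to the sum.
    return sum(q * math.log2(q / p) for q, p in zip(post, prior) if q > 0)


if __name__ == "__main__":
    # Hypothetical predictor beliefs: 90% "agent follows the plan I expect",
    # 10% "agent follows some plan I have not thought of".
    prior = [0.9, 0.1]

    # An action equally likely under both hypotheses changes nothing.
    print(bayesian_surprise(prior, [0.5, 0.5]))

    # An action the expected plan almost never produces shifts beliefs sharply.
    print(bayesian_surprise(prior, [0.01, 0.9]))
```

An observation that both hypotheses explain equally well leaves the beliefs unchanged and yields zero surprise, while an action that the favored hypothesis almost never produces yields a large value. Under this measure, the predictor’s surprise depends on its own prior, which is consistent with the heuristic that surprise grows with the gap between predictor and predicted agent.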
Unpredictability is practically observable in current narrow domain systems with superhuman performance. Developers of famous intelligent systems such as Deep Blue (Chess) [31, 32], IBM Watson (Jeopardy) [33], and AlphaZero (Go) [34, 35] did not know what specific decisions their AI was going to make for every turn. All they could predict was that it would try to win using any actions available to it, and win it did. AGI developers are in exactly the same situation; they may know the ultimate goals of their system but do not know the actual step-by-step plan it will execute, which of course has serious consequences for AI Safety [36-39]. A reader interested in concrete examples of unanticipated actions of intelligent agents is advised to read two surveys on the subject, one in the domain of evolutionary algorithms [40] and another on narrow AI agents [41].
There are infinitely many paths to every desirable state of the world. The great majority of them are completely undesirable and unsafe, and most have negative side effects. In harder cases, which is to say most real-world cases, even the overall goal of the system may not be precisely known or may be known only in abstract terms, such as “make the world better.” While in some cases the terminal goal(s) could be learned, even if you can learn to predict an overall outcome with some statistical certainty, you cannot learn to predict all the steps to the goal a system of superior intelligence would take. A lower intelligence can’t accurately predict all decisions of a higher intelligence, a concept known as Vinge’s Principle [42]. “Vinge’s Principle implies that when an agent is designing another agent (or modifying its own code), it needs to approve the other agent’s design without knowing the other agent’s exact future actions.” [43].
Unpredictability is an intuitively familiar concept. We can usually predict the outcome of common physical processes without knowing specific behavior of particular atoms, just as we can typically predict the overall behavior of the intelligent system without knowing specific intermediate steps. Rahwan and Cebrian observe that “… complex AI agents often exhibit inherent unpredictability: they demonstrate emergent behaviors that are impossible to predict with precision—even by their own programmers. These behaviors manifest themselves only through interaction with the world and with other agents in the environment… In fact, Alan Turing and Alonzo Church showed the fundamental impossibility of ensuring an algorithm fulfills certain properties without actually running said algorithm. There are fundamental theoretical limits to our ability to verify that a particular piece of code will always satisfy desirable properties unless we execute the code, and observe its behavior.” [44]. (See Rahwan et al. for additional discussion on unpredictability and related issues with machine behavior [45].)
Others have arrived at similar conclusions. “Given the inherent unpredictability of AI, it may not always be feasible to implement specific controls for every activity in which a bot engages.” [46]. “As computer programs become more intelligent and less transparent, not only are the harmful effects less predictable, but their decision-making process may also be unpredictable.” [47]. “The AI could become so complex that it results in errors and unpredictability, as the AI will be not able to predict its own behavior.” [48]. “… the behavior of [artificial intellects] will be so complex as to be unpredictable, and therefore potentially threatening to human beings.” [49].
We can conclude that the Unpredictability of AI will forever make 100% safe AI an impossibility. But we can still strive for Safer AI because we are able to make some predictions about AIs we design.
*Note: This post is based on “Unpredictability of AI”.
References
1. Yampolskiy, R.V., Artificial Intelligence Safety and Security. 2018: Chapman and Hall/CRC.
2. Callaghan, V., et al., Technological Singularity. 2017: Springer.
3. Baum, S.D., et al., Long-term trajectories of human civilization. foresight, 2019. 21(1): p. 53-83.
4. Duettmann, A., et al., Artificial General Intelligence: Coordination & Great Powers.
5. Charisi, V., et al., Towards Moral Autonomous Systems. arXiv preprint arXiv:1703.04741, 2017.
6. Brundage, M., et al., The malicious use of artificial intelligence: Forecasting, prevention, and mitigation. arXiv preprint arXiv:1802.07228, 2018.
7. Fischer, M., N. Lynch, and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 1985. 32(2): p. 374-382.
8. Grossman, S.J. and J.E. Stiglitz, On the impossibility of informationally efficient markets. The American economic review, 1980. 70(3): p. 393-408.
9. Kleinberg, J.M. An impossibility theorem for clustering. in Advances in neural information processing systems. 2003.
10. Strawson, G., The impossibility of moral responsibility. Philosophical studies, 1994. 75(1): p. 5-24.
11. Bazerman, M.H., K.P. Morgan, and G.F. Loewenstein, The impossibility of auditor independence. Sloan Management Review, 1997. 38: p. 89-94.
12. List, C. and P. Pettit, Aggregating sets of judgments: An impossibility result. Economics & Philosophy, 2002. 18(1): p. 89-110.
13. Dufour, J.-M., Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica: Journal of the Econometric Society, 1997: p. 1365-1387.
14. Yampolskiy, R.V., What are the ultimate limits to computational techniques: verifier theory and unverifiability. Physica Scripta, 2017. 92(9): p. 093001.
15. Armstrong, S. and S. Mindermann, Impossibility of deducing preferences and rationality from human policy. arXiv preprint arXiv:1712.05812, 2017.
16. Eckersley, P., Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). arXiv preprint arXiv:1901.00064, 2018.
17. Yampolskiy, R.V. The space of possible mind designs. in International Conference on Artificial General Intelligence. 2015. Springer.
18. Rice, H.G., Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 1953. 74(2): p. 358-366.
19. Wolfram, S., A new kind of science. Vol. 5. 2002: Wolfram Media, Champaign.
20. Moore, C., Unpredictability and undecidability in dynamical systems. Physical Review Letters, 1990. 64(20): p. 2354.
21. Moore, C., Generalized shifts: unpredictability and undecidability in dynamical systems. Nonlinearity, 1991. 4(2): p. 199.
22. Vinge, V. Technological singularity. in VISION-21 Symposium sponsored by NASA Lewis Research Center and the Ohio Aerospace Institute. 1993.
23. Cognitive Uncontainability, in Arbital. Retrieved May 19, 2019: Available at: https://arbital.com/p/uncontainability/.
24. Itti, L. and P. Baldi. A principled approach to detecting surprising events in video. in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). 2005. IEEE.
25. Itti, L. and P.F. Baldi. Bayesian surprise attracts human attention. in Advances in neural information processing systems. 2006. MIT Press.
26. Storck, J., S. Hochreiter, and J. Schmidhuber. Reinforcement driven information acquisition in non-deterministic environments. in Proceedings of the international conference on artificial neural networks, Paris. 1995. Citeseer.
27. Schmidhuber, J., Simple algorithmic theory of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. Journal of SICE, 2009. 48(1).
28. Yudkowsky, E., Expected Creative Surprises, in Less Wrong. October 24, 2008: https://www.lesswrong.com/posts/rEDpaTTEzhPLz4fHh/expected-creative-surprises.
29. Yudkowsky, E., Belief in Intelligence, in Less Wrong. October 25, 2008: Available at: https://www.lesswrong.com/posts/HktFCy6dgsqJ9WPpX/belief-in-intelligence.
30. Yudkowsky, E., Aiming at the Target, in Less Wrong. October 26, 2008: Available at: https://www.lesswrong.com/posts/CW6HDvodPpNe38Cry/aiming-at-the-target.
31. Vingean Uncertainty, in Arbital. Retrieved May 19, 2019: Available at: https://arbital.com/p/Vingean_uncertainty/.
32. Campbell, M., A.J. Hoane Jr, and F.-h. Hsu, Deep blue. Artificial intelligence, 2002. 134(1-2): p. 57-83.
33. Ferrucci, D.A., Introduction to “this is watson”. IBM Journal of Research and Development, 2012. 56(3.4): p. 1: 1-1: 15.
34. Yudkowsky, E., Eliezer Yudkowsky on AlphaGo’s Wins, in Future of Life Institute. March 15, 2016: https://futureoflife.org/2016/03/15/eliezer-yudkowsky-on-alphagos-wins/.
35. Silver, D., et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018. 362(6419): p. 1140-1144.
36. Pistono, F. and R.V. Yampolskiy, Unethical Research: How to Create a Malevolent Artificial Intelligence. arXiv preprint arXiv:1605.02817, 2016.
37. Yampolskiy, R.V., What to Do with the Singularity Paradox?, in Philosophy and Theory of Artificial Intelligence. 2013, Springer Berlin Heidelberg. p. 397-413.
38. Babcock, J., J. Kramar, and R. Yampolskiy, The AGI Containment Problem, in The Ninth Conference on Artificial General Intelligence (AGI2016). July 16-19, 2016: NYC, USA.
39. Majot, A.M. and R.V. Yampolskiy. AI safety engineering through introduction of self-reference into felicific calculus via artificial pain and pleasure. in IEEE International Symposium on Ethics in Science, Technology and Engineering. May 23-24, 2014. Chicago, IL: IEEE.
40. Lehman, J., J. Clune, and D. Misevic. The surprising creativity of digital evolution. in Artificial Life Conference Proceedings. 2018. MIT Press.
41. Yampolskiy, R.V., Predicting future AI failures from historic examples. foresight, 2019. 21(1): p. 138-152.
42. Vinge’s Principle, in Arbital. Retrieved May 19, 2019: Available at: https://arbital.com/p/Vinge_principle/.
43. Vingean Reflection, in Arbital. Retrieved May 19, 2019: Available at: https://arbital.com/p/Vingean_reflection/.
44. Rahwan, I. and M. Cebrian, Machine Behavior Needs to Be an Academic Discipline, in Nautilus. March 29, 2018: Available at: http://nautil.us/issue/58/self/machine-behavior-needs-to-be-an-academic-discipline.
45. Rahwan, I., et al., Machine behaviour. Nature, 2019. 568(7753): p. 477.
46. Mokhtarian, E., The Bot Legal Code: Developing a Legally Compliant Artificial Intelligence. Vanderbilt Journal of Entertainment & Technology Law, 2018. 21: p. 145.
47. Bathaee, Y., The artificial intelligence black box and the failure of intent and causation. Harvard Journal of Law & Technology, 2018. 31(2): p. 889.
48. Turchin, A. and D. Denkenberger, Classification of global catastrophic risks connected with artificial intelligence. AI & SOCIETY, 2018: p. 1-17.
49. De Garis, H., The Artilect War. Available at https://agi-conf.org/2008/artilectwar.pdf, 2008.