Unexplainability and Incomprehensibility of AI
In the domain of AI safety, the more accurate the explanation is, the less comprehensible it is
Explainability and comprehensibility of AI are important requirements for intelligent systems deployed in real-world domains. Users want and frequently need to understand how decisions impacting them are made. Similarly, it is important to understand how an intelligent system functions for safety and security reasons.1
For decades, AI projects relied on human expertise, distilled by knowledge engineers. They were both explicitly designed and easily understood by people. For example, expert systems, frequently based on decision trees, are perfect models of human decision making and so are naturally understandable by both developers and end-users. With a paradigm shift in the leading AI methodology over the last decade to machine learning systems based on Deep Neural Networks (DNN), this natural ease of understanding was sacrificed. The current systems are seen as “black boxes” (not to be confused with AI boxing [1, 2]), opaque to human understanding but extremely capable both with respect to results and the learning of new domains. As long as Big Data and Huge Compute are available, zero human knowledge is required to achieve superhuman performance.
With their newfound capabilities, DNN-based AI systems are tasked with making decisions in employment, admissions, investing, matching, diversity, security [10, 11], recommendations, banking, and countless other critical domains. As many such domains are legally regulated, it is a desirable property (and frequently a requirement [14, 15]) that such systems be able to explain how they arrived at their decisions, particularly to show that they are bias-free. Additionally, and perhaps this is an even more important factor in making artificially intelligent systems safe and secure, it is essential that we understand what they are doing and why. A particular area of interest in AI Safety [18-25] is predicting and explaining causes of AI failures.
A number of impossibility results are well-known in many areas of research [27-35] and some are starting to be discovered in the domain of AI research, for example: Unverifiability, Unpredictability2, and limits on preference deduction or alignment. In this section we introduce Unexplainability of AI and show that some decisions of superintelligent systems will never be explainable, even in principle. We will concentrate on the most interesting case, a superintelligent AI acting in novel and unrestricted domains. Simple cases of Narrow AIs making decisions in restricted domains (e.g., Tic-Tac-Toe) are both explainable and comprehensible. Consequently, a whole spectrum of AIs can be developed, ranging from completely explainable/comprehensible to completely unexplainable/incomprehensible. We define Unexplainability as the impossibility of providing an explanation for certain decisions made by an intelligent system which is both 100% accurate and comprehensible.
Artificial Deep Neural Networks continue increasing in size and may already comprise millions of neurons, thousands of layers and billions of connecting weights, ultimately targeting and perhaps surpassing the size of the human brain. They are trained on Big Data from which million-feature vectors are extracted and on which decisions are based, with each feature contributing to the decision in proportion to a set of weights. To explain such a decision, which relies on literally billions of contributing factors, AI must either simplify the explanation and so make it less accurate/specific/detailed or report it exactly but elucidate nothing by virtue of the explanation’s semantic complexity, large size, and abstract data representation. Such precise reporting is just a copy of a trained DNN model.
For example, an AI utilized in the mortgage industry may look at an application to decide the creditworthiness of a person in order to approve a loan. For simplicity, let’s say the system looks at only a hundred descriptors of the applicant and utilizes a neural network to arrive at a binary approval decision. An explanation which included all hundred features and weights of the neural network would not be very useful, so the system may instead select one or two of the most important features and explain its decision with respect to just those top properties, ignoring the rest. This highly simplified explanation would not be accurate: the other 98 features all contributed to the decision, and if only the one or two top features had been considered, the decision could have been different. This is similar to the way Principal Component Analysis works for dimensionality reduction.
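The gap between the full decision and its simplified explanation can be illustrated with a minimal sketch. It assumes a plain linear scorer standing in for the neural network; the weights, the applicant values, the threshold, and the function names (`score`, `approve`, `top_k`) are all hypothetical and chosen only to make the effect visible.

```python
# Hypothetical linear scorer standing in for the neural network in the text.
N_FEATURES = 100

# Two dominant features plus 98 individually small ones (made-up weights).
weights = [5.0, 4.0] + [-0.1] * 98
applicant = [1.0] * N_FEATURES  # every descriptor set to 1.0 for simplicity


def score(weights, features, keep=None):
    """Weighted sum; if keep is given, only those feature indices are counted."""
    indices = range(len(weights)) if keep is None else keep
    return sum(weights[i] * features[i] for i in indices)


def approve(s, threshold=0.0):
    """Binary decision: approve when the score exceeds the threshold."""
    return s > threshold


def top_k(weights, features, k):
    """Indices of the k features with the largest absolute contribution."""
    contributions = [abs(w * x) for w, x in zip(weights, features)]
    order = sorted(range(len(weights)), key=lambda i: contributions[i], reverse=True)
    return order[:k]


full_score = score(weights, applicant)                    # all 100 features
top2 = top_k(weights, applicant, 2)                       # the "explanation"
simplified_score = score(weights, applicant, keep=top2)   # only the top two

print("decision using all features:   ", approve(full_score))
print("decision implied by top-2 only:", approve(simplified_score))
```

In this toy setup the two most important features both argue for approval, yet the 98 small features together tip the actual decision to rejection, so an explanation built only on the top two would misdescribe what the model did.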
Any decision made by the AI is a function of some input data and is completely derived from the code/model of the AI. But a useful explanation must be simpler than just the presentation of the complete model while retaining all information that is relevant to the decision. We can reduce this problem of explaining to the problem of lossless compression. Any possible decision derived from data/model can be represented by an integer encoding such data/model combination and it is a proven fact that some random integers can’t be compressed without loss of information due to the Counting argument. “The pigeonhole principle prohibits a bijection between the collection of sequences of length N and any subset of the collection of sequences of length N – 1. Therefore, it is not possible to produce a lossless algorithm that reduces the size of every possible input sequence.”3 To avoid this problem, an AI could try to produce only decisions that it knows are explainable/compressible, but that means that it is not making the best decision with regard to the given problem. Doing so is suboptimal and may have safety consequences and so should be discouraged.
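The counting argument can be checked directly for small cases. The sketch below is only an illustration: `drop_last_bit` is a hypothetical "compressor" invented for this example, and `find_collision` simply searches for two inputs that it maps to the same output, which is what makes lossless recovery impossible.

```python
from itertools import product


def count_strings_of_length(n):
    """Number of distinct bit strings of length exactly n."""
    return 2 ** n


def count_strings_shorter_than(n):
    """Number of distinct bit strings of length 0 .. n-1, i.e. 2^n - 1."""
    return sum(2 ** k for k in range(n))


def all_bit_strings(n):
    """All bit strings of length n, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def find_collision(compress, n):
    """Return two distinct length-n inputs with the same output, or None."""
    seen = {}
    for s in all_bit_strings(n):
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None


def drop_last_bit(s):
    """Hypothetical compressor that always shortens its input by one bit."""
    return s[:-1]


# There is always one fewer shorter string than there are strings of length n.
for n in (1, 8, 64):
    assert count_strings_shorter_than(n) == count_strings_of_length(n) - 1

collision = find_collision(drop_last_bit, 3)
print(collision)  # two different inputs that can no longer be told apart
```

Any function that shortens every input of length n must map at least two inputs to the same output, so at least one of them cannot be recovered. The same holds for any other always-shortening compressor passed to `find_collision`.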
A complementary concept to Unexplainability, Incomprehensibility of AI addresses the capacity of people to completely understand an explanation provided by an AI or superintelligence. We define Incomprehensibility as the impossibility of complete understanding of any 100%-accurate explanation for certain decisions of an intelligent system by any human.
Artificially intelligent systems are designed to make good decisions in their domains of deployment. Optimality of the decision with respect to available information and computational resources is what we expect from a successful and highly intelligent system. An explanation of the decision, in its ideal form, is a proof of correctness of the decision. (For example, a superintelligent chess playing system may explain why it sacrificed a queen by showing that it forces a checkmate in 12 moves and, by doing so, it proves the correctness of its decision.) As decisions and their proofs can be arbitrarily complex, impossibility results native to mathematical proofs become applicable to explanations. For example, explanations may be too long to be surveyed [43, 44] (Unsurveyability), Unverifiable, or too complex to be understood, making the explanation incomprehensible to the user. Any AI, including black box neural networks, can in principle be converted to a large decision tree of nothing but “if” statements, but that will only make it human-readable, not human-understandable.
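The difference between readable and understandable can be sketched with a toy conversion. The code below assumes a hypothetical three-input threshold network with made-up weights and expands it into one “if” rule per input combination; this truth-table expansion is a degenerate form of decision tree used only for illustration, not a method taken from the literature.

```python
from itertools import product


def tiny_network(x):
    """Hypothetical 3-input, 2-hidden-unit threshold network (made-up weights)."""
    h1 = 1 if (2 * x[0] - 1 * x[1] + 1 * x[2]) > 0 else 0
    h2 = 1 if (-1 * x[0] + 2 * x[1] + 1 * x[2]) > 1 else 0
    return 1 if (h1 + h2) > 1 else 0


def to_if_rules(model, n_inputs):
    """Expand a model over binary inputs into one 'if' rule per input pattern."""
    rules = []
    for x in product((0, 1), repeat=n_inputs):
        rules.append(f"if x == {x}: return {model(x)}")
    return rules


def rule_count(n_inputs):
    """Number of rules the expansion needs for n binary inputs."""
    return 2 ** n_inputs


rules = to_if_rules(tiny_network, 3)
for rule in rules:
    print(rule)

print("rules needed for 100 binary inputs:", rule_count(100))
```

Every individual rule is trivially readable, but the number of rules doubles with each added binary input. For the hundred-descriptor example discussed earlier, restricted to binary descriptors, the expansion would already need 2^100 rules, far more than any person could survey.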
It is generally accepted that, in order to understand certain information, a person needs a particular level of cognitive ability. This is the reason students are required to take standardized exams such as SAT, ACT, GRE, MCAT or LSAT, etc., and score at a particular percentile in order to be admitted to their desired program of study at a selective university. Similar tests are given to those wishing to join the military or government service. All such exams indirectly measure a person’s IQ (Intelligence Quotient) [46, 47] but vary significantly in how closely they correlate with standard IQ test scores (g-factor loading). The more demanding the program of study (even at the same university), the higher the cognitive ability expected from students. For example, the average quantitative GRE score of students targeting mathematical sciences is 163, compared with 148 for students interested in studying history.4 The trend may be reversed for verbal scores.
We can predict a certain complexity barrier to human understanding for any concept for which a relative IQ of above 250 would be necessary, as no person has ever tested so high. In practice, the barrier may be much lower because average IQ is just 100 and additional complications from limited memory and attention span can place even relatively easy concepts outside of human grasp. To paraphrase Wittgenstein: if superintelligence explained itself, we would not understand it.
Incomprehensibility results are well-known for different members of the Chomsky hierarchy, with finite state automata unable to recognize all context-free languages, pushdown automata unable to recognize all context-sensitive languages, and linear-bounded non-deterministic Turing machines unable to recognize all recursively enumerable languages. Simpler machines can’t recognize all the languages that more complex machines can recognize.
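A small sketch can make the first of these gaps concrete for the context-free language a^n b^n. It compares a recognizer with an unbounded counter (playing the role of a pushdown stack) against one whose counter saturates at a fixed bound, which makes it a finite-state machine. The function names and the bound of 3 are hypothetical, and the code demonstrates failures of this one bounded machine rather than proving the general result.

```python
def counter_recognizer(s):
    """Accept a^n b^n (n >= 0) using an unbounded counter, like a pushdown stack."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after a 'b' is never allowed
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's
                return False
        else:
            return False
    return count == 0


def bounded_recognizer(s, max_count):
    """Same logic, but the counter saturates at max_count (finite memory)."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            count = min(count + 1, max_count)  # cannot count past the bound
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False
        else:
            return False
    return count == 0


for word in ("aaabbb", "a" * 5 + "b" * 5, "a" * 5 + "b" * 3):
    print(word, counter_recognizer(word), bounded_recognizer(word, max_count=3))
```

Within its bound the finite machine agrees with the counter machine, but once the input exceeds what it can count it both rejects a valid string (five a's followed by five b's) and accepts an invalid one (five a's followed by three b's). Raising the bound only moves the point of failure.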
While people are frequently equated with unrestricted Turing machines via the Church-Turing thesis, Blum et al. formalize human computation, in practice, as a much more restricted class. However, Turing machines are not an upper limit on what is theoretically computable, as described by different hypercomputation models. Even if our advanced AIs (superintelligence) fail to achieve true hypercomputation capacity, for all practical purposes and compared to the human computational capabilities, they would be outside of what human-equivalent agents can recognize/comprehend.
Superintelligence would be a different type of computation, far superior to humans in practice. It is obviously not the case that superintelligent machines would actually have infinite memories or speeds, but to unaugmented humans they would appear to act as if they do. For example, a machine capable of remembering one trillion items vs. seven, as in the short-term memory of most people, would appear to have an infinite capacity to memorize. In algorithmic complexity theory, some algorithms become the most efficient for a particular problem type only on inputs so large as to be unusable in practice. But such inputs are nonetheless finite. So, just as a finite state automaton can’t recognize recursively enumerable languages, so will people fail in practice to comprehend some explanations produced by superintelligent systems. They are simply not in the same class of automata, even if theoretically, given infinite time, they are.
Additionally, decisions made by AI could be mapped onto the space of mathematical conjectures about the natural numbers. An explanation for why a particular mathematical conjecture is true or false would be equivalent to a proof (for that conjecture). However, due to Gödel’s First Incompleteness Theorem, we know that some true conjectures are unprovable. As we have mapped decisions onto conjectures and explanations onto proofs, some decisions made by AI must be fundamentally unexplainable/incomprehensible. Explanations as proofs would be subject to all the other limitations known about proofs, including Unsurveyability, Unverifiability and Undefinability [53, 54]. Finally, it is important to note that we are not saying that such a decision/conjecture reduction would preserve the semantics of the subject, just that it is a useful tool for showing the impossibility of explainability/comprehensibility in some cases.
The issues described in this paper can be seen as a communication problem between an AI encoding and sending information (sender) and a person receiving and decoding information (receiver). Efficient encoding and decoding of complex symbolic information is difficult enough, as described by Shannon’s Information Theory, but with respect to the Explainability and Comprehensibility of AI we must also worry about the complexity of semantic communication. Explainability and Comprehensibility are another conjugate pair [36, 57] in the domain of AI safety. The more accurate the explanation is, the less comprehensible it is, and vice versa, the more comprehensible the explanation, the less accurate it is. A non-trivial explanation can’t be both accurate and understandable but it can be inaccurate and comprehensible. There is a huge difference between understanding something and almost understanding it. Incomprehensibility is a general result applicable to many domains including science, social interactions, etc., depending on the mental capacity of the participating person(s).
As human beings, we are finite in our abilities. For example, our short-term memory is about 7 units on average. In contrast, an AI can remember billions of items and AI capacity to do so is growing exponentially. While never infinite in a true mathematical sense, machine capabilities can be considered such in comparison with ours. This is true for memory, compute speed, and communication abilities. Hence the famous dictum, Finitum Non Capax Infiniti (the finite cannot contain the infinite), is highly applicable to understanding the incomprehensibility of the god-like superintelligent AIs.
1 This post is based on https://arxiv.org/abs/1907.03869
2 Unpredictability is not the same as Unexplainability or Incomprehensibility; see ref. 37 for details.
1. Yampolskiy, R.V., Leakproofing Singularity-Artificial Intelligence Confinement Problem. Journal of Consciousness Studies JCS, 2012.
2. Armstrong, S. and R.V. Yampolskiy, Security solutions for intelligent and complex systems, in Security Solutions for Hyperconnectivity and the Internet of Things. 2017, IGI Global. p. 37-88.
3. Silver, D., et al., Mastering the game of go without human knowledge. Nature, 2017. 550(7676): p. 354.
4. Bostrom, N., Superintelligence: Paths, dangers, strategies. 2014: Oxford University Press.
5. Strohmeier, S. and F. Piazza, Artificial Intelligence Techniques in Human Resource Management—A Conceptual Exploration, in Intelligent Techniques in Engineering Management. 2015, Springer. p. 149-172.
6. Walczak, S. and T. Sincich, A comparative analysis of regression and neural networks for university admissions. Information Sciences, 1999. 119(1-2): p. 1-20.
7. Trippi, R.R. and E. Turban, Neural networks in finance and investing: Using artificial intelligence to improve real world performance. 1992: McGraw-Hill, Inc.
8. Joel, S., P.W. Eastwick, and E.J. Finkel, Is romantic desire predictable? Machine learning applied to initial romantic attraction. Psychological science, 2017. 28(10): p. 1478-1489.
9. Chekanov, K., et al., Evaluating race and sex diversity in the world’s largest companies using deep neural networks. arXiv preprint arXiv:1707.02353, 2017.
10. Novikov, D., R.V. Yampolskiy, and L. Reznik. Artificial intelligence approaches for intrusion detection. in 2006 IEEE Long Island Systems, Applications and Technology Conference. 2006. IEEE.
11. Novikov, D., R.V. Yampolskiy, and L. Reznik. Anomaly detection based intrusion detection. in Third International Conference on Information Technology: New Generations (ITNG’06). 2006. IEEE.
12. Wang, H., N. Wang, and D.-Y. Yeung. Collaborative deep learning for recommender systems. in Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 2015. ACM.
13. Galindo, J. and P. Tamayo, Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Computational Economics, 2000. 15(1-2): p. 107-143.
14. Goodman, B. and S. Flaxman, European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 2017. 38(3): p. 50-57.
15. Doshi-Velez, F., et al., Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134, 2017.
16. Osoba, O.A. and W. Welser IV, An intelligence in our image: The risks of bias and errors in artificial intelligence. 2017: Rand Corporation.
17. Yampolskiy, R.V., Artificial Intelligence Safety and Security. 2018: Chapman and Hall/CRC.
18. Yampolskiy, R.V., Artificial superintelligence: a futuristic approach. 2015: Chapman and Hall/CRC.
19. Yampolskiy, R.V., What to Do with the Singularity Paradox?, in Philosophy and Theory of Artificial Intelligence. 2013, Springer. p. 397-413.
20. Pistono, F. and R.V. Yampolskiy, Unethical research: how to create a malevolent artificial intelligence. arXiv preprint arXiv:1605.02817, 2016.
21. Umbrello, S. and R. Yampolskiy, Designing AI for Explainability and Verifiability: A Value Sensitive Design Approach to Avoid Artificial Stupidity in Autonomous Vehicles.
22. Trazzi, M. and R.V. Yampolskiy, Building Safer AGI by introducing Artificial Stupidity. arXiv preprint arXiv:1808.03644, 2018.
23. Yampolskiy, R.V., Personal Universes: A Solution to the Multi-Agent Value Alignment Problem. arXiv preprint arXiv:1901.01851, 2019.
24. Behzadan, V., R.V. Yampolskiy, and A. Munir, Emergence of Addictive Behaviors in Reinforcement Learning Agents. arXiv preprint arXiv:1811.05590, 2018.
25. Behzadan, V., A. Munir, and R.V. Yampolskiy. A psychopathological approach to safety engineering in ai and agi. in International Conference on Computer Safety, Reliability, and Security. 2018. Springer.
26. Yampolskiy, R.V., Predicting future AI failures from historic examples. Foresight, 2019. 21(1): p. 138-152.
27. Gödel, K., On formally undecidable propositions of Principia Mathematica and related systems. 1992: Courier Corporation.
28. Heisenberg, W., Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik, in Original Scientific Papers Wissenschaftliche Originalarbeiten. 1985, Springer. p. 478-504.
29. Fischer, M., N. Lynch, and M. Paterson, Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 1985. 32(2): p. 374-382.
30. Grossman, S.J. and J.E. Stiglitz, On the impossibility of informationally efficient markets. The American economic review, 1980. 70(3): p. 393-408.
31. Kleinberg, J.M. An impossibility theorem for clustering. in Advances in neural information processing systems. 2003.
32. Strawson, G., The impossibility of moral responsibility. Philosophical studies, 1994. 75(1): p. 5-24.
33. Bazerman, M.H., K.P. Morgan, and G.F. Loewenstein, The impossibility of auditor independence. Sloan Management Review, 1997. 38: p. 89-94.
34. List, C. and P. Pettit, Aggregating sets of judgments: An impossibility result. Economics & Philosophy, 2002. 18(1): p. 89-110.
35. Dufour, J.-M., Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica: Journal of the Econometric Society, 1997: p. 1365-1387.
36. Yampolskiy, R.V., What are the ultimate limits to computational techniques: verifier theory and unverifiability. Physica Scripta, 2017. 92(9): p. 093001.
37. Yampolskiy, R.V., Unpredictability of AI. arXiv preprint arXiv:1905.13053, 2019.
38. Armstrong, S. and S. Mindermann, Impossibility of deducing preferences and rationality from human policy. arXiv preprint arXiv:1712.05812, 2017.
39. Eckersley, P., Impossibility and Uncertainty Theorems in AI Value Alignment.
40. Brinton, C. A framework for explanation of machine learning decisions. in IJCAI-17 Workshop on Explainable AI (XAI). 2017.
41. Hutter, M., The Human knowledge compression prize. URL http://prize.hutter1.net, 2006.
42. Compression of random data (WEB, Gilbert and others), in Faqs. Retrieved June 16, 2019: Available at: http://www.faqs.org/faqs/compression-faq/part1/section-8.html.
43. Bassler, O.B., The surveyability of mathematical proof: A historical perspective. Synthese, 2006. 148(1): p. 99-133.
44. Coleman, E., The surveyability of long proofs. Foundations of Science, 2009. 14(1-2): p. 27-43.
45. Yampolskiy, R.V., Efficiency Theory: a Unifying Theory for Information, Computation and Intelligence. Journal of Discrete Mathematical Sciences & Cryptography, 2013. 16(4-5): p. 259-277.
46. Abramov, P.S. and R.V. Yampolskiy, Automatic IQ Estimation Using Stylometric Methods, in Handbook of Research on Learning in the Age of Transhumanism. 2019, IGI Global. p. 32-45.
47. Hendrix, A. and R. Yampolskiy. Automated IQ Estimation from Writing Samples. in MAICS. 2017.
48. Chomsky, N., Three models for the description of language. IRE Transactions on information theory, 1956. 2(3): p. 113-124.
49. Yampolskiy, R., The Singularity May Be Near. Information, 2018. 9(8): p. 190.
50. Blum, M. and S. Vempala, The Complexity of Human Computation: A Concrete Model with an Application to Passwords. arXiv preprint arXiv:1707.01204, 2017.
51. Ord, T., Hypercomputation: computing more than the Turing machine. arXiv preprint math/0209332, 2002.
52. Lipton, R.J. and K.W. Regan, David Johnson: Galactic Algorithms, in People, Problems, and Proofs. 2013, Springer. p. 109-112.
53. Tarski, A., Der Wahrheitsbegriff in den formalisierten Sprachen. Studia Philosophica, 1936. 1: p. 261–405.
54. Tarski, A., The concept of truth in formalized languages. Logic, semantics, metamathematics, 1956. 2: p. 152-278.
55. Shannon, C.E., A mathematical theory of communication. Bell system technical journal, 1948. 27(3): p. 379-423.
56. Wooldridge, M., Semantic issues in the verification of agent communication languages. Autonomous agents and multi-agent systems, 2000. 3(1): p. 9-31.
57. Calude, C.S., E. Calude, and S. Marcus. Passages of Proof. December 2001 Workshop Truths and Proofs. in Annual Conference of the Australasian Association of Philosophy (New Zealand Division), Auckland. 2001.
58. Rahner, K., Thomas Aquinas on the Incomprehensibility of God. The Journal of Religion, 1978. 58: p. S107-S125.