Chaitin’s Discovery of a Way of Describing True RandomnessHe found that concepts from computer programming worked well because, if the data is not random, the program should be smaller than the data
In this week’s podcast, “The Chaitin Interview II: Defining Randomness,” Walter Bradley Center director Robert J. Marks interviewed mathematician and computer scientist Gregory Chaitin on randomness. It’s a subject on which Chaitin has thought deeply since his teenage years (!), when he published a journal paper on the subject. How do we measure randomness? Chaitin begins by reflecting on his 1969 paper:
This portion begins at 1:12 min. A partial transcript, Show Notes, and Additional Resources follow.
Gregory Chaitin: In particular, my paper looks at the size of computer programs in bits. More technically you ask, what is the size in bits of the smallest computer program you need to calculate a given digital object? That’s called the program size complexity or the algorithmic information content.
Robert J. Marks: I’ve heard you call those “elegant” programs.
Gregory Chaitin: Well, an elegant program is the smallest program that has the output that it does. Its size in bits will be the measure of complexity or of algorithmic information content.
Robert J. Marks: As a teenager, you published a landmark paper, On the Length of Programs for Computing Finite Binary Sequences. You published it in the Journal of the Association for Computing Machinery, which I guess is the oldest journal associated with theoretical computer science. It was founded in 1947, right after World War II. This was a landmark paper in the founding of algorithmic information theory and you covered a number of topics.
One of them, which is just fascinating, is a brand new concept of the idea of randomness. You offered a whole new definition of randomness. Do you have a definition for randomness in general?
Gregory Chaitin: Well, the normal definition of randomness is a process that produces something unpredictable, like tossing a coin. If you have a fair coin and you keep tossing it, that’s going to give you a random sequence of heads and tails. You look at how it was generated, the sequence. But I wanted a mathematical definition, because you see if you toss a coin, you could get all heads, and that isn’t random, but it is possible.
Robert J. Marks: Yes.
Gregory Chaitin: I didn’t like that definition so I wanted a definition of lack of structure. You see, with the normal coin tosses, actually every possible finite sequence of heads and tails in a sense is equally random, because they were all generated by tossing a fair coin. But some of them, all heads has a lot of structure, all tails have a lot of structure, alternating heads and tails have a lot of structure. I was looking at something that ignored how the sequence is generated and just looked at it and said, is there structure here or isn’t there?
Now, the reason for doing this is because you can think of a physical theory to explain a phenomenon as a program — software that can calculate the predictions. If the program is short, then you have a very comprehensible theory and a lot of structure. But if the program is the same size in bits as the number of bits of experimental data, then that’s not much of an explanation. It’s not much of a theory because there always is a program the same size in bits as the bits of data.
Why? Because it just puts the data into the program directly and prints it out. That can always be done. But the smaller the program is, compared in size in bits to the number of bits of data that you’re trying to explain — and I’m talking about an explanation that gives no noise — it’s not a statistical theory. It has to give every bit of the data correctly. If that’s a small program, then you have a good theory. If you have two theories and one of them is a smaller program than the other, the smaller program is a better theory if the two of them calculate the exact sequence of your experimental data. It’s sort of a model of the scientific method.
Now, I’m not using equations. There’s a lot of talk about complexity in discussions of the philosophy of science, but they’re talking about the complexity of the equations, for example. That’s very hard to define and make a mathematical theory about because mathematical notation changes. But if you have to explain to a computer how to calculate the observations, there are universal Turing machines. There are optimum computers that give the smallest programs, and that’s a good basis for a mathematical theory, a more precise definition of complexity.
When I was a kid, I was reading a book by Karl Popper called The Logic of Scientific Discovery [1934 German/1959 English] He has a whole chapter on simplicity and he points out some remarks of Hermann Weyl on this. Weyl was a student of David Hilbert (1862–1943). He was a wonderful mathematician and mathematical physicist, and he wrote two books on philosophy where he says that the notion of causality of a theory — really saying that something is governed by a scientific law — is meaningless unless you have a notion of complexity because there’s always a law.
Note: Hermann Weyl (1885–1955, pictured) was, according to Stanford Encyclopedia of Philosophy, “a great and versatile mathematician of the 20th century. His work had a vast range, encompassing analysis, algebra, number theory, topology, differential geometry, spacetime theory, quantum mechanics, and the foundations of mathematics.”
Stanford also notes, “He was unusual among scientists and mathematicians of his time in being attracted to idealist philosophy: his idealist leanings can be seen particularly in his work on the foundations of mathematics.” Shades of Kurt Gödel (1906–1978) there, in that Gödel also stood out, in his case because of his belief in God.
Gregory Chaitin: You can also always find an equation passing through points of data on a graph. A thing called a Lagrangian interpolation will produce an algebraic equation that passes through any finite set of points. Leibniz makes a similar remark and Weyl was aware [of that]. You have to have a notion of complexity as well as a notion of what a law is because otherwise it’s meaningless to say that there’s a theory for something.
I think this is a very deep remark. The question was, I think Weyl also says, it’s tough to define this precisely because mathematical notation changes. What are you going to use? Are you going to use Bessel functions in your equation, for example? They change as a function of time, so it seems a bit arbitrary. Now, taking a universal computer and looking at the size in bits of a program gives a more definite notion of complexity that you measure in bits.
Also, there’s a problem, because if you look at what Weyl discussed and what Leibniz discussed, they’re talking about points of data that a scientist has on graph paper, and these points are infinite precision information. They’re real numbers. In theory, a point is infinite precision, so it’s an infinite amount of information. A key step in an algorithm information theory is that you replace the original problem, which was points of data on graph paper and an equation passing through those points, which doesn’t work out too well, although it’s closer to the real case in real physics.
You replace it by discrete and finite amounts of information. You think of the physical scientific data you’re trying to explain as a finite sequence of zeros and ones, and then the program, which is your theory, is also a finite sequence of zeros and ones, in binary. Inside computers it’s always binary. Then it’s very easy to compare how many bits in your theory versus how many bits in your data?
The simplest theory is the best, and if there is no theory simpler than the data you are trying to explain, then the data is random. It has no structure because for any sequence of bits, you can calculate it from a program the same size in bits as the data. That doesn’t enable you to distinguish between a sequence of bits with structure from a sequence of bits that has no structure. It’s when you say that the program has to be simpler than the data you’re trying to explain. Your theory has to be simpler than the data. Then it’s a theory.
Gregory Chaitin: This idea goes back to the discourse on metaphysics, which is a relatively short text of Leibniz. The original is in French, it’s called Discours de Metaphysique. It was found nearly a century after Leibniz died among his papers. The person who found it gave it this name. Weyl, following the Germanic tradition, had studied a lot of philosophy. Leibniz is a hero in the German speaking world. He’s less known outside the German speaking world. Weyl’s two books on philosophy mention Leibniz and mention this idea. Then Popper refers to it in The Logic of Scientific Discovery.
Note: Gottfried Wilhelm Leibniz (1646–1716, pictured) was, according to the Stanford Encyclopedia of Philosophy “one of the great thinkers of the seventeenth and eighteenth centuries and is known as the last “universal genius”. He made deep and important contributions to the fields of metaphysics, epistemology, logic, philosophy of religion, as well as mathematics, physics, geology, jurisprudence, and history.”
Gregory Chaitin: Algorithmic Information Theory takes up this question and changes the context, from a theory as an equation passing through a set of points which have an infinite precision to making everything be discrete and have a finite number of bits. Then, mathematically, you’re in business.
This was the fundamental idea that inspired me to try to work out a detailed theory. I had these papers in high school, but I did many versions of the theory. The one I regard as definitive is called A Theory of Program Size, Formally Identical to Information Theory, also published in the journal of the ACM in 1975. The original versions had some problems, and I think the definitive version is from 1975.
Now, my interest in this was philosophical. I wanted to understand what a theory is, how to measure its complexity. But mostly I was interested in incompleteness, because it turns out that this notion of complexity asked for the size of the smallest program to calculate something. This is how you measure the algorithmic information content of that digital object.
It’s a nice definition. You have a nice mathematical theory, but you can never calculate it. Well, except in a finite number of cases for various small programs. Everywhere you go you get incompleteness in this theory, things that you can define but you can’t calculate. Incompleteness sort of hits you in the face in this theory. My main interest was in incompleteness and in trying to extend the work of Gödel and Turing on incompleteness that I had studied as a teenager. But there are other people who will have more practical interests. Making this criterion that a good theory is a small one, you can apply it to predicting future observations by looking at the size.
Note: The photograph of Hermann Weyl, of unknown date, is courtesy ETH Zürich (CC BY-SA 3.0). The portrait of Leibniz (ca. 1695) by Christoph Bernhard Francke is public domain.
Next: How did Ray Solomonoff shake up algorithmic information theory?
Here are the stories, with links, to the earlier podcast discussion with Gregory Chaitin last week:
Gregory Chaitin’s “almost” meeting with Kurt Gödel. This hard-to-find anecdote gives some sense of the encouraging but eccentric math genius. Chaitin recalls, based on this and other episodes, “There was a surreal quality to Gödel and to communicating with Gödel.”
Gregory Chaitin on the great mathematicians, East and West: Himself a “game-changer” in mathematics, Chaitin muses on what made the great thinkers stand out. Chaitin discusses the almost supernatural awareness some mathematicians have had of the foundations of our shared reality in the mathematics of the universe.
How Kurt Gödel destroyed a popular form of atheism. We don’t hear much about logical positivism now but it was very fashionable in the early twentieth century. Gödel’s incompleteness theorems showed that we cannot devise a complete set of axioms that accounts for all of reality — bad news for positivist atheism.
You may also wish to read: Things exist that are unknowable: A tutorial on Chaitin’s number (Robert J. Marks)
Five surprising facts about famous scientists we bet you never knew: How about juggling, riding a unicycle, and playing bongo? Or catching criminals or cracking safes? Or believing devoutly in God… (Robert J. Marks)
- 00:27 | Introducing Gregory Chaitin
- 01:12 | Chaitin’s landmark paper published in his teen years
- 02:04 | Chaitin’s definition of randomness
- 08:43 | Metaphysics
- 10:30 | Chaitin’s philosophical interest
- 19:24 | Fermat’s Last Theorem
- Gregory Chaitin’s Website
- “On the length of programs for computing finite binary sequences,” by Gregory Chaitin, published when he as a teenager. (Journal of the ACM (JACM) 13, no. 4 (1966): 547-569).
- Unravelling Complexity: The Life and Work of Gregory Chaitin, edited by Shyam Wuppuluri and Francisco Antonio Doria
- Karl Popper
- The Logic of Scientific Discovery by Karl Popper
- Hermann Weyl
- Gottfried Wilhelm Leibniz
- Leibniz’s Discours de Métaphysique (in English)
- “A Theory of Program Size Formally Identical to Information Theory” by Gregory Chaitin (Journal of the ACM (JACM) 22, no. 3 (1975): 309-440).
- Ray Solomonoff
- Marvin Minsky
- Fermat’s Last Theory on YouTube