An intro to the recursion theorem
This post is about Kleene's recursion theorem in computability theory, which conveys the idea that computer programs can talk about themselves. Not in an I-am-sentient-feed-me type of way, but in the sense that they can somehow see their own source code and do things with it.
It's a remarkable theorem with a deceptively simple proof. The field of computability theory was originally called recursion theory before it wanted more grant money and changed its name; this is arguably one of the most fundamental results in the field. I've heard a set theorist say that recursion theory is the study of the recursion theorem, for whatever that's worth. (To be clear, it's not a super deep theorem, but it lays a foundation for many other results.) To give it some motivation, though, I'll talk about quines first.
After explaining the theorem, I'll also try to explain its proof, which I think is interesting because it's ridiculously short. It feels like smoke and mirrors, and I think it's one of the rare instances in math where you have a nontrivial theorem with a trivial proof. (Trivial doesn't mean easy!)
Quines
Quines are programs whose effect is to print out their own source code, without taking any input (so you cannot input the file that holds their source code, for example).
Think about it for a second: how would you do this in your favorite programming language? Can you do it in any programming language?
They're very interesting objects, perhaps because they live on the boundary of that favorite ingredient of paradoxes: self-reference. They also seem to be unreasonably popular among a certain demographic of math and computer science nerds. Douglas Hofstadter in his book Gödel, Escher, Bach was apparently the first to name this concept a quine, after the philosopher Willard Van Orman Quine, and this book being unreasonably good is my guess for why many people have heard of quines.
Examples
Here are a few quines from Wikipedia:

```python
# This example uses format strings, and chr(39) == "'".
a = 'a = {}{}{}; print(a.format(chr(39), a, chr(39)))'; print(a.format(chr(39), a, chr(39)))
```

```python
# Another quine. %r quotes automatically.
exec(s:='print("exec(s:=%r)"%s)')
```

```sql
-- A quine in SQL. Kind of horrendous to parse.
SELECT REPLACE(REPLACE('SELECT REPLACE(REPLACE("$",CHAR(34),CHAR(39)),CHAR(36),"$") AS Quine',CHAR(34),CHAR(39)),CHAR(36),'SELECT REPLACE(REPLACE("$",CHAR(34),CHAR(39)),CHAR(36),"$") AS Quine') AS Quine
```
These quines use format strings and replacement to obfuscate the self-reference, but it's there.
There's another kind of quine: in Lisp-based languages your program can simply be `1`, and in this case `1` will get printed to output, since numbers are self-evaluating. Similarly, in C-based languages, the empty program is a quine. If these feel a bit hacky, or like cheating, that's okay - we're going to find out that we can make quines even when we change the rules to get rid of this kind of cheating.
What are quines doing?
One can view quines as fixed points of the `execution` function, which takes the code for a program and executes it with (we'll say) no input: normally `execution(program) = output`, and here `execution(program) = program`. So the existence of quines is like a fixed-point theorem, and we will make this precise later.
But what does it mean to "apply" `execution` to a function?
This can be a little confusing, because programs are functions themselves. Normally you would write `program(input) = output`. So actually, when I write `program`, I should be writing `text(program)` for the text (code) of the program, and say the `execution` program has the fixed point `text(program)`, i.e. `execution(text(program)) = text(program)`.
As a second aside, the `execution` program is usually called a universal Turing machine, and is worth its own post. It's a bit easier to see that this should exist - just think about emulators.
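To make the `execution` program concrete, here's a toy sketch in Python, using `exec` as a stand-in for a universal machine (the function name `execution` is just mirroring the notation above, not any standard API):

```python
import contextlib
import io

def execution(program_text):
    # Run the text of a program with no input and return what it prints.
    # Python's exec plays the role of the universal machine here.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program_text, {})
    return buffer.getvalue()

print(execution("print(2 + 2)"))  # the text of a program in, its output out
```

In this picture a quine is exactly a fixed point: a string `q` with `execution(q) == q` (up to a trailing newline, depending on how you count).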
However, we can also ask for something a little more general than just this. A quine is a program that says "take my source code, and print it". The weird and remarkable thing is not "print it", it's the first part - "take my source code". If we can have a program print its own source code, then maybe it can inspect it, play around with it, do anything with it?
I should emphasize this is not the same as most self-modifying programs, which are much less mysterious -- those programs take as input the file containing their source code, and modify that. They don't have the innate knowledge of their own source code that quines do. A program at location `/usr/bin/fake_quine` which says "print the contents of `/usr/bin/fake_quine`" is not a quine; it had to read in the contents of that file. This is like a program which computes `execution(text(program), input_file_with_text_of_program) = text(program)`.
What we want are programs that do things like, for example, "Take no input, and output a Minecraft redstone machine that runs my own source code twice." How can this program compile its own source code to the Minecraft redstone programming language, without having direct access to its own source code?
Kleene's second recursion theorem
Kleene's second recursion theorem says: you can find a program which does anything you want with its source code. However, it says this in the language of computability theory, which requires some parsing. (There is also a Kleene's first recursion theorem, but it's about fixed points of objects called enumeration operators, and isn't as widely known.)
Theorem (Stephen Kleene, 1938): For any partial computable function $f(e, x)$, there is an index $n$ such that $\varphi_n(x) = f(n, x)$ for all $x$.

What the heck? I was promised Minecraft redstone compilers and I got functions from natural numbers to natural numbers.

So here's what you should actually read the theorem as: For any computer program $f$ that takes two inputs - some source code and some data - there is a program $n$ which, on input $x$, does exactly what $f$ does when handed $n$'s own source code together with $x$.

In other words, if we find this magic index / source code $n$, then $n$ is a program that does whatever we want ($f$) with its own source code. But this requires believing that indices (natural numbers) correspond to source code, and that partial computable functions correspond to computer programs.
The correspondence: the Church-Turing thesis
The correspondence is not so bad, as it turns out. There's only countably many programs, i.e. they are in one-to-one correspondence with the natural numbers, because we can just write down all the programs of length 1 (using UTF or ASCII or your favorite encoding), then all programs of length 2, and all programs of length 3, and so on. You can sort them alphabetically, or some other way, to get a listing of all finite strings in order. (We're demanding that computer programs can't be infinite.) In this way you can compute an index for every valid C program, and also compute the C program that corresponds to every valid index. So we can "compile" any programming language to the "programming language" of natural numbers.
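Here's a sketch of that listing for a toy two-character alphabet (a stand-in for a real encoding like UTF-8; the same idea works for any finite alphabet):

```python
from itertools import count, islice, product

ALPHABET = "ab"  # a toy stand-in for UTF-8 or ASCII

def all_strings():
    # List every finite string: by length, then alphabetically within a length.
    yield ""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

# The index <-> string correspondence: program n is the nth string in the list.
print(list(islice(all_strings(), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Going the other way - computing the index of a given string - is just as easy: sum up how many shorter strings there are, then find the string's alphabetical rank among strings of its length.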
A partial computable function is more complicated: the idea is to capture any function from natural numbers to natural numbers that could in principle be computed, where "partial" means the function is allowed to be undefined on some inputs (think of a program that never halts).
We define the partial computable functions by specifying a recipe for building them all:
- Start with the constant functions (e.g. $c(x) = 17$ for any input $x$), the successor function $s(x) = x + 1$, and the projection functions, which take in many inputs and give you just one, e.g. $p^3_2(x_1, x_2, x_3) = x_2$. These are definitely computable by even the strictest standards; it's hard to imagine some definition of computation that wouldn't include these.
- Extend these functions by allowing three operations:
  - Composition: if $f$ is a partial computable function and $g$ is a partial computable function, then so is $f \circ g$, i.e. we can compute $f(g(x))$. This makes sense - just do the computations one after another.
  - Primitive recursion: if $f$ and $g$ are partial computable functions, then so is the function $h$ which performs $f$ over and over, defined by $h(0, \vec{z}) = g(\vec{z})$ and $h(n + 1, \vec{z}) = f(n, h(n, \vec{z}), \vec{z})$, with $h$ undefined whenever one of the intermediate values is undefined. This makes sense as a thing we should be able to compute - just compute $f$ repeatedly, $n$ times.
  - Minimization: given a partial computable function $f$, the function $h(\vec{y}) = \mu x\,[f(x, \vec{y}) = 0]$ returns the smallest argument $x$ which makes $f(x, \vec{y}) = 0$. Such an $x$ might not exist, or $f$ might fail to halt along the way, which would make the output of $h$ undefined - and this is how we get undefined outputs in this scheme. Again, this is pretty obvious to compute - just run $f$ on $x = 0, 1, 2, \ldots$ over and over, stopping only if we find a zero.

Every partial computable function is made by starting with these three types of functions and applying these three operations. Remarkably, this gives every function we can possibly compute, whether it's the source code of a webserver turning input strings of HTTP requests into output strings of website data, or a program searching for twin primes, or quantum algorithms, or anything else.
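If it helps to see the recipe in action, here is a sketch of the three operations as higher-order functions in Python, with undefined outputs modeled by nontermination (the helper names like `primitive_recursion` are mine for this sketch, not standard):

```python
def compose(f, g):
    # Composition: x -> f(g(x)).
    return lambda *args: f(g(*args))

def primitive_recursion(base, step):
    # h(0, *z) = base(*z); h(n + 1, *z) = step(n, h(n, *z), *z).
    def h(n, *z):
        acc = base(*z)
        for i in range(n):
            acc = step(i, acc, *z)
        return acc
    return h

def minimize(f):
    # h(*y) = the least x with f(x, *y) == 0; loops forever if none exists.
    def h(*y):
        x = 0
        while f(x, *y) != 0:
            x += 1
        return x
    return h

# Addition, built by primitive recursion on the successor function:
add = primitive_recursion(lambda z: z, lambda i, acc, z: acc + 1)
print(add(3, 4))  # 7

# Integer square root (rounded up), built by minimization:
isqrt_up = minimize(lambda x, y: 0 if x * x >= y else 1)
print(isqrt_up(10))  # 4
```

The `while` loop in `minimize` is exactly where partiality sneaks in: if no zero exists, the search never terminates, and the function is undefined at that input.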
Quick aside on total computable and primitive recursive functions
Partial computable functions are also called just computable functions, or recursive functions. There is also the notion of a primitive recursive function, defined the same way except without minimization; these functions are total, i.e. defined on every input, since minimization is the only operation that introduces undefined outputs. However, this is not the same thing as a total computable function, which is just a partial computable function that happens to be total. It's a good exercise to come up with a total computable function that isn't primitive recursive!
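(Spoiler for that exercise: the canonical answer is the Ackermann function, which is total and computable but grows too fast to be primitive recursive.)

```python
def ackermann(m, n):
    # Total and computable, but provably not primitive recursive:
    # it eventually outgrows every primitive recursive function.
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(3, 3))  # 61; ackermann(4, 2) already has 19,729 digits
```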
The way this is shown is by proving that partial computable functions can compute anything that a Turing machine, which is a primitive type of computer, can compute, and that a Turing machine can compute any partial computable function. The second part is easy - I explained the operations informally in the bullet points above - but the first part is much more annoying, and was done by Alan Turing during his PhD in the mid-1930s.
Turing himself wasn't thinking about the computers of today; the computers of his time were people, and he designed his simple Turing machine to have the basic properties that a British clerk with an unlimited supply of paper and time might have. Later, people proved that although modern computers can easily emulate a Turing machine, a Turing machine can also emulate any modern computer.
This is the Church-Turing thesis: all reasonable models of computation are equivalent, and so by analyzing these very simple models of computation (partial computable functions, Turing machines) and seeing what is not computable, or hard to compute relative to other things, in these settings, we can say things about the nature of computation in general.
Enumerating the partial recursive functions
We can't list out the partial computable functions nicely (i.e. without many repeats), but we can list out Turing machines. Very briefly, a Turing machine is determined by a "transition function," which turns the state of the machine (chosen from some finite set of possible states), plus the current symbol (let's say a 0 or a 1) it is reading from memory, into a symbol to write to memory, a place to move to in memory, and a next state. So if we only allow 1 state, for example, there's only finitely many possible transition functions, hence finitely many machines; letting the number of states grow, we can list them all out.
If that was too fast, that's okay, since I haven't really said what a Turing machine is; just take on faith that we can list out all their programs, just like we can list out all Python programs in alphabetical order.
This gives us a precise way to talk about the $n$th computer program: write $\varphi_n$ for the partial computable function computed by the $n$th Turing machine in our list. With that notation, we can restate the theorem.
Theorem (Kleene, 1938): For any partial computable function $f(e, x)$, there is an index $n$ such that $\varphi_n(x) = f(n, x)$ for all $x$.
Hopefully, it is now more believable that this actually means there is a computer program $n$ which does whatever we want ($f$) with its own source code ($n$).
The magical proof
I'll start with the punchline. Here is the entire proof: given $f$, let $p$ be an index of the partial computable function $g(e, x) = f(S(e, e), x)$, and set $n = S(p, p)$. Then $\varphi_n(x) = \varphi_{S(p,p)}(x) = \varphi_p(p, x) = g(p, x) = f(S(p,p), x) = f(n, x)$.
S-m-n
The crux of the proof is the S-m-n theorem, which says there is a total computable function $S(e, x)$ such that $\varphi_{S(e, x)}(y) = \varphi_e(x, y)$ for all $y$.
You can visualize this as follows: think of the function $\varphi_e(x, y)$ as a program with two input slots. $S$ takes that program's source code $e$ together with a value $x$, and hands back the source code $S(e, x)$ of a one-input program with $x$ hard-coded into the first slot.
It's not too hard to imagine why $S$ should exist: $S(e, x)$ just needs to output the source code of the program

```
on input y:
    emulate the two-input program with source code e and inputs x and y
```

Clearly we can write down this pseudocode given $e$ and $x$. If you believe:
- Church's thesis that everything computable can be computed by a Turing machine,
- the pseudocode in English can be "compiled" into a natural-number index by a Turing machine,
- and you can emulate another Turing machine using a Turing machine, then this is a genuine proof. I haven't done much to convince you of the third point, but arguably the fact that you can go look up a Turing machine emulator on the internet should reduce this to just needing to believe the first point.
This is a form of currying - instead of taking two inputs to an output, take the first input to a program which turns the second input to an output. (Currying is named after Haskell Curry, as is the programming language Haskell. In category theory, adjunctions extend the idea of currying in a very broad fashion.)
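Here's a toy version of S-m-n in Python, treating programs as source-code strings. The conventions (a two-input program defines a function named `f`, a one-input program defines `phi`) are invented for this sketch:

```python
def S(e, x):
    # Given source code e defining a two-input function f(x, y), return the
    # source code of a one-input program with the first input hard-coded to x.
    return e + f"\ndef phi(y):\n    return f({x!r}, y)\n"

def run(source, y):
    # A little "universal machine": execute source code and call its phi.
    namespace = {}
    exec(source, namespace)
    return namespace["phi"](y)

adder = "def f(x, y):\n    return x + y"
add_ten = S(adder, 10)    # the source code of a brand-new one-input program
print(run(add_ten, 5))    # 15, since phi(y) = f(10, y)
```

Note that `S` never runs anything: it's pure text manipulation on source code, which is why it is total even when the programs it manipulates are partial.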
A story, a proof attempt, and a proof
I'll attempt to motivate the proof now, but it might be confusing. The story I was given when I first heard the proof was this:
A group of Kleene's students came to his office one day because they were trying to understand the proof of his recursion theorem. They asked him, 'How did you prove this theorem? It seems like you just messed around with the S-m-n theorem.'
They stood there in his office, bracing themselves for a long explanation of the deep motivation and understanding he must have had of this theorem, and why it was so useful and profound despite its inscrutable proof.
When he finally responded, he said: 'Well, yeah, I just messed around with the S-m-n theorem' and had no further motivation to offer.
We want to find some source code $n$ so that $\varphi_n(x) = f(n, x)$. As a first attempt, let $e$ be an index for $f$ viewed as a two-input program, so that $\varphi_e(e', x) = f(e', x)$, and consider the index $S(e, e)$. By the S-m-n theorem, $\varphi_{S(e,e)}(x) = \varphi_e(e, x) = f(e, x)$.

This is very close! But we wanted the program's own index to appear on the right-hand side: the program is $S(e, e)$, yet $f$ is being handed the plain index $e$, so $S(e, e)$ works with $e$'s source code rather than its own.

We could try the opposite approach, where we start with the right-hand side we want, $f(S(e, e), x)$ - but now $e$ is no longer an index for this modified function, so S-m-n doesn't produce a matching left-hand side.

So what should we do? We just add a fudge factor to make up the difference. Instead of letting $e$ be an index for $f$ itself, let $p$ be an index for the modified function $g(e, x) = f(S(e, e), x)$, which is still partial computable.

Now it works: $\varphi_{S(p,p)}(x) = \varphi_p(p, x) = g(p, x) = f(S(p,p), x)$. So $n = S(p, p)$ is exactly the source code we were looking for.

Importantly, we are able to compute $n$ from an index for $f$: finding an index $p$ for $g$ and evaluating $S(p, p)$ are both computable steps, so the theorem even hands us the fixed point uniformly.
Maybe more dramatically, this tells us we can take a programming language like Python and add a keyword `my_own_source_code`, which is valid in our brand-new Python4, and we are able to write programs just fine in this new language. We can take any Python4 program which references itself, and compile it to an ordinary (but probably very strange-looking) Python3 program, and run it. Self-reference doesn't add any additional computational power; it's already baked into all the computers we have.
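To see the trick without any new keyword, here is a sketch of how a plain Python program can build its own source out of a template. Ignoring the comment lines, the three code lines below, saved as a file, compute exactly that file's contents in the variable `source`, without reading any file:

```python
# The quine trick: the program contains a template of itself, and fills the
# template into itself to recover its full source (no file reading needed).
template = 'template = {!r}\nsource = template.format(template)\nprint(source)'
source = template.format(template)
print(source)
```

Once `source` is in hand, the last line could do anything with it - print it twice, count its characters, or compile it to redstone - which is exactly the freedom the recursion theorem promises.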
Rogers' fixed-point theorem
To wrap things up, I'll state Rogers' fixed-point theorem, an equivalent form of the recursion theorem.
Theorem: Given any total computable function $f$, there is an index $e$ such that $\varphi_e = \varphi_{f(e)}$.
Recall that total computable functions are partial computable functions which are defined on every input. So this statement is saying: no matter how we screw around with our listing of computable functions, we're always going to have some program that computes the same function before and after.
You might instinctively think that you can just build an $f$ that avoids fixed points - say, send every index to one computing a visibly different function by tweaking the output somehow - but the theorem says no computable rearrangement of programs can manage this.
The proof of this from Kleene's second recursion theorem is immediate - let $g(e, n)$ be the partial computable function computed by

```
on input e, n:
    compute f(e)
    emulate the program with source code f(e) on input n
```

Then the recursion theorem says there's some $e$ with $\varphi_e(n) = g(e, n) = \varphi_{f(e)}(n)$ for all $n$, i.e. $\varphi_e = \varphi_{f(e)}$. Beyond this, the recursion theorem can be used to show:
- that you can't list out the computable functions without repeats
- that you can't compute the set of indices / source codes for a given computable function
- that given any two programming languages, there is some number $n$ so that the $n$th program in one language is the same as the $n$th program in the other language
- that computable functions correspond to formulas of arithmetic
- there are true theorems in arithmetic which are not provable (fixing any reasonable proof system)
- and plenty more, since this is a pretty common tool almost taken for granted in computability theory.