Reading time: 8 minutes.

# On Rogozhin's Universal Turing Machines

Every now and then, we see some headline about Turing Completeness of *something*. For example, Minecraft or Dwarf Fortress, or even Minesweeper are famous examples of accidentally Turing complete systems.

If you know what a Turing machine is (and you should) you will have an intuitive idea of the claim: you know that *X can compute any computable function*. Sure; but if you stop thinking about that for a bit, it is not so intuitive **how** we can prove that. If we think long enough, we can start understanding how X can simulate a Turing machine, but **how can we be sure that we can encode a Universal Turing Machine** and **what is the program of a UTM and how we can prove that it is, in fact, universal?**

Rogozhin’s Turing machines^{1} are in fact the way these problems are tackled nowadays. They provide a set of **minimal** specifications to which we can reduce a different system: if the system can implement a Rogozhin’s UTM then the system is Turing complete.

To understand why they are important and why they work we need to take a small step back.

## Tag Systems and Turing Completeness

Righozhin’s UTMs proof is founded on a much less popular computational model: **Tag Systems**. Tag System is a deterministic computational model published by Emil Leon Post in 1943.

Tag Systems are defined by three elements:

- The
**deletion number**\(m\). This parameter is so important that tag system are often called*m*-tag system. - The
**alphabet**\(A\) - The set of
**production function**\(P(x \in A)\). - A
**tape**initialized with any word from \(A\).

At each computation state, the tag system will:

- It reads the
**first**symbol on the tape. This will be the input for the production function. - It writes the output of the production function
**at the end**of the tape. - Delete the first \(m\) symbol from the head of the tape.

The tag system will stop when the tape is empty or contains a number of symbols smaller than \(m\).^{2}

### A Simple Example

Let’s write a simple 2-tag system that is capable of computing if a certain number N is odd or even. The system has the following parameters:

- \(m=2\)
- \(A = { E, O, X }\)
- Rules:
- \(X \rightarrow e\) (empty string)
- \(E \rightarrow E\)
- \(O \rightarrow EO\)

The tape will be initialized with the string \(OOXXX…X\) with N Xs depending on the input number. At the end, we will find on the tape the symbol \(E\) if the number is *even* and \(O\) if it is *odd*.

So let’s check an even number, for instance 4:

Or an odd number, like 3:

### Tag Systems and Universal Turing Machines

You may ask why this digression on tag systems. The reason is that *tag systems* are way less complex than Turing Machines and, yet, **they are Turing Complete**. Therefore, they are the preferred way to prove that a certain computational model is Turing complete.

In particular, *Cocke & Minsky 1964 ^{3}* proved that 2-tag system can emulate a Universal Turing Machine.

At this point, we can answer our first answer: how can we prove that a Turing Machine is a Universal Turing Machine? The answer is straightforward: **a Turing machine can be shown to be a Universal Turing Machine by proving that it can emulate a Turing-complete class of m-tag systems.**

And that’s exactly how Rogozhin does in his paper: it describes a Turing machine that can simulate *any* 2-tag system. The details of the proof are quite long and filled of extremely tedious details. Given that they will be better explained in the original paper, I will try to give you a very high level intuition of the proof.

### Encoding the Alphabet

Each letter in the alphabet of the Tag System is associated with a number \(N_i\). The Turing machine encodes the \(i\)-th letter in \(A\) by repeating a certain string \( \omega \) \(N_i\) times.

So, for instance, in the example tag system above we have 3 letters (E, O, and X) and we may decide to associate \(E\) with \(N_0 = 1\), \(O\) with \(N_1 = 2\) and \(X\) with \(N_2 = 3\). So, if our Turing machine has just two character \(a\) and \(b\), we may decide to encode \(E\) as \(ab\), \(O\) as \(abab\) and \(X\) as \(ababab\). We call this encoding respectively \(A_E\), \(A_O\) and \(A_X\).

Of course, you cannot chose **any** number for \(N_i\), but for now let’s keep things simple.^{4}

### Encoding the Production Function

The element of the Production Function are in the form \(a_i \rightarrow a_x a_y \ldots a_z\).

The Turing machine will encode this as

$$P_i = A_{N_z} S \ldots S A_{N_y} S A_{N_x}$$

Where \(S\) is some kind of separation **mark**. Note that the production rule is written in **reverse order**. In the example above the \(O -> EO\) production rule would be \(P_O = (abab)a(ab)\) (assuming \(a\) as \(S\)).

### Encoding the Initial Word

Of course, we need to encode the initial word somewhere. We call it \(I\) and we use the same approach used for the words in the production function. So, if we want to encode the initial word \(OOXX\) we will get

$$(abab)b(abab)b(ababab)b(ababab)$$

Note that the separation mark in this string may different from the marks in the production rules!

### Initial tape of the Turing machine

Now we just need to write everything on the tape. First we list the halting symbol ^{5}, then the production rules (in order), then the separator string, and finally the initial word.

$$A_H P_{n} \ldots P_0 \hat{S} I$$

Where \(\hat{S}\) is an extra code containing one or more separation marks in such a way to make the number \(N_i\) useful for computation.^{6} The head of the Turing machine starts on the left of \(I\) (except for the case \(UTM(2,18)\)).

### How the Turing machine works

It may be useful to think at the initial code

- On the first step, the Turing machine searches for the production rule corresponding to the first code on the right of the head.
- The Turing machine writes the output of the selected production rule
**in reverse order**(that is the right order, because it was originally reversed). - The Turing machine restores its tape for a new cycle: that is two codes right to the previous position.

## Back to the Rogozhin’s Universal Turing Machines

At this point Rogozhin proceeds to show how some **minimal** Turing machines can implement this general algorithm.

What does it mean to be **minimal** for a TM? To answer this we define \(UTM(m,n)\) as a UTM with \(m\) states and \(n\) symbols. The product \(mn\) represents the number of instructions of the TM.

Rogozhin’s UTMs are a class of 7 UTMs. In particular:

- \(UTM(24,2)\) is a UTM that
**minimizes the number of symbols**and has a total of 48 instructions. - \(UTM(5,5)\) is a UTM that
**minimizes the number of instructions**(they are just 25, but they handle only a specific – yet Turing complete – format of 2-tag systems). - \(UTM(2,18)\) is a UTM that
**minimizes the number of states**and has a total of 36 instructions.

There are more of them, but in general, when we are proving that something is a UTM, we want to use a representation with few symbols or few states (depending on the problem). Usually, defining a lot of symbols is much easier than explaining a lot of states and, therefore, researchers commonly use the \(UTM(2,18)\) machine. In fact, if you can show that a game/system/things can implement the program of a \(UTM(2,18)\), then you have a proof that the game/system/thing is, in fact, a Universal Turing Machine and, therefore, Turing complete.

At this point I can do a boring example involving some lame universal machines and show you that it can implement the “software” of \(UTM(2,18)\). But I do not want to be lame.

So, following a very funny paper I read recently, we are gonna show it is possible to implement a \(UTM(2,18)\) in * Magic: The Gathering* (and, therefore yes: MTG is a Turing complete game).

This will probably require some time. But it will be the description of the most nerdy thing ever done! It will be fun!

Propose in 1996 by Yurii Rogozhin in

*”Small universal Turing machines.”*↩︎There are also alternative halting method. For instance, we may have a special

*halting symbol*in the alphabet like Post described in the original paper. ↩︎Cocke, John, and Marvin Minsky. “Universality of tag systems with P= 2.” Journal of the ACM (JACM) 11.1 (1964): 15-20. ↩︎

If you are really curious, the number N_i is exactly the number of

*marks*there are between the first code and the i-th production rule on the Turing machine’s tape. So, the Turing machine, when it read the code can “jump” directly to the rule! This is fascinating, almost magical, but this article is already heavy as it is! ↩︎Rogozhin’s works uses a tag system with the halting symbol, but I omitted this to avoid more complications. ↩︎

Read the other foot note. I will not enter into that mess here! ↩︎