Large Numbers — Notes

This page goes into greater detail on the background of some of the large numbers and functions described on my large numbers page. The topics are presented in the same order as on that page.

Contents of this page:
Notes on Fermat numbers
Perfect numbers
Mersenne Numbers
Mersenne Primes
Latin Number Names
Googol and Googolplex
Real-valued hyper4 operator
Gödel's undecidable sentence
Notes on Skewes Number
Ackermann's Function
Comparing Graham's Number and Chained Arrow Numbers
What Turing and Church Did Not Prove
Notes on Busy Beaver Function
Busy Beavers and BB Candidate Record-Setters
Analysis of the 1997 Marxen-Buntrock BB6 Record-Setter
Analysis of the Oct 2000 Marxen-Buntrock BB6 Record-Setter 'q'

Notes on Fermat numbers

The Fermat Numbers, Sloane's A000215, are all numbers of the form 22N+1:


3
5
17
257
65537
4294967297
18446744073709551617
340282366920938463463374607431768211457
...

Their factorizations are:


3
5
17
257
65537
641 × 6700417
274177 × 67280421310721
59649589127497217 × 5704689200685129054721
...

All of the factors of all of the Fermat numbers, arranged in ascending order, make Sloane's sequence A023394. There is a project, similar to the Mersenne prime search, to find all terms in this sequence.

Goldbach's Proof That There are an Infinite Number of Primes

There are many proofs of the "infinitude" of primes. Eleven are listed in the book by Paulo Ribenboim, The New Book of Prime Number Records, 3rd edition (Springer-Verlag, New York, 1995, ISBN 0-387-94457-5).

This is Goldbach's proof, which he gave in a letter written to Euler in 1730. I have expanded and rewritten it from this page (in the past this page has been sometimes unavailable).

Lemma (Goldbach, 1730): Any two Fermat numbers Fn=22n+1 are relatively prime to each other.

Proof: First notice that

Fx - 2 = 22x - 1

Now set

Fn+1 - 2 = 22n+1 - 1

Substitute a^2 - 1 for (a - 1) × (a + 1) on the right hand side:

Fn+1 - 2 = (22n - 1) × (22n + 1)

therefore (substituting x for n+1) we have

Fx - 2 = (Fx-1 - 2) × Fx-1

we also have

F1 - 2 = F0

Therefore (by induction) it is true in general that

Fn - 2 = F0 × F1 × ... × Fn-1

Let m be any integer less than n. Fm is one of the terms on the right-hand side. Let d by any common factor of Fm and Fn. Since Fm appears on the right side, d must be a factor of Fn-2. Since d is a factor of Fn and of Fn-2, d is a factor of 2. But every Fermat number is odd, so d must be 1. Thus, the only common factor of any Fermat number and any other smaller Fermat number is 1: so the two Fermat numbers are relatively prime.

Theorem (Goldbach, 1730): There are infinitely many primes.

Proof: For each Fermat number Fn, choose a single prime divisor pn. Since the Fermat numbers are relatively prime to each other (just proven) we know that any two primes pm and pn are different. Therefore there is at least one prime pn for each Fermat number Fn, that is, at least one prime for every integer n.

Since any two Fermat numbers are relatively prime to each other, the Fermat sequence is called pairwise relatively prime. Any other pairwise relatively prime sequence can also be used to prove that there is an infinite number of primes. Here is another example:


w1 = 2
w2 = w1 + 1
w3 = w1 × w2 + 1
w4 = w1 × w2 × w3 + 1
etc.

(Other valid sequences result if you change the initial 2 and the 1's at the end of subsequent lines to any other pair of numbers that are relatively prime to each other). Another form of the same sequence is:


a1 = 2
an+1 = an2 - an + 1

This sequence starts:


2 (prime)
3 (prime)
7 (prime)
43 (prime)
1807 = 13 × 139
3263443 (prime)
10650056950807 = 547 × 607 × 1033 × 31051
113423713055421844361000443 = 29881 × 67003 × 9119521 × 6212157481
etc.

Other sequences, such as the Mersenne numbers are relatively prime, but there is no simple iterative formula to generate them.

Perfect numbers

Following are the smaller perfect numbers. All are of the form

Pp = 2p-1(2p-1)

where p is one of the Mersenne primes listed below. The issue of odd perfect numbers is still unresolved.


n p Pp
1 2 6
2 3 28
3 5 496
4 7 8128
5 13 33550336
6 17 8589869056
7 19 137438691328
8 31 2305843008139952128
9 61 2658455991569831744654692615953842176
10 89 191561942608236107294793378084303638130997321548169216
11 107 13164036458569648337239753460458722910223472318386943117783728128
12 127 14474011154664524427946373126085988481573677491474835889066354349131199152
13 521 A bit too long to show here
14 607 Even longer (and so on...)

For more perfect numbers, check the list of Mersenne primes below and use the formula Pp = 2p-1(2p-1).

Mersenne Numbers

The "Mersenne numbers" are numbers of the form 2p-1 where p is prime. Here are the first 25. Because the Mersennes are pairwise relatively prime, each prime number will occur at most once in the factors column of this table:


p 2p-1 factors
2 3 (prime)
3 7 (prime)
5 31 (prime)
7 127 (prime)
11 2047 = 23 × 89
13 8191 (prime)
17 131071 (prime)
19 524287 (prime)
23 8388607 = 47 × 178481
29 536870911 = 233 × 1103 × 2089
31 2147483647 (prime)
37 137438953471 = 223 × 616318177
41 2199023255551 = 13367 × 164511353
43 8796093022207 = 431 × 9719 × 2099863
47 140737488355327 = 2351 × 4513 × 13264529
53 9007199254740991 = 6361 × 69431 × 20394401
59 576460752303423487 = 179951 × 3203431780337
61 2305843009213693951 (prime)
67 147573952589676412927 = 193707721 × 761838257287
71 2361183241434822606847 = 228479 × 48544121 × 212885833
73 9444732965739290427391 = 439 × 2298041 × 9361973132609
79 604462909807314587353087 = 2687 × 202029703 × 1113491139767
83 9671406556917033397649407 = 167 × 57912614113275649087721
89 618970019642690137449562111 (prime)
97 158456325028528675187087900671 = 11447 × 13842607235828485645766393

Mersenne Primes

A Mersenne number is a Mersenne prime if it is prime. In the above list, the Mersenne primes are M2 = 3, M3 = 7, M5 = 31, M7 = 127, M13 = 8191, etc. For each Mersenne prime there is also a perfect number Pp given by

Pp = 2p-1(2p-1).

Here is a (reasonably complete) list of known primes p for which Mp is a Mersenne prime. For the first few, the "discovery" credit refers to the discovery of the corresponding perfect number:


n p digits
in Mp
digits
in Pp
year
found
discoverer(s)
1 2 1 1 ~500BC unknown
2 3 1 2 ~275BC Euclid
3 5 2 3 ~275BC Euclid
4 7 3 4 ~275BC Euclid
5 13 4 8 1456 unknown
6 17 6 10 1588 Cataldi
7 19 6 12 1588 Cataldi
8 31 10 19 1772 Euler
9 61 19 37 1883 Pervouchine (or "Pervushin")
10 89 27 54 1911 Powers
11 107 33 65 1914 Powers (also claimed by Fauquembergue)
12 127 39 77 1876 Lucas
13 521 157 314 19520130 Robinson
14 607 183 366 19520130 Robinson
15 1279 386 770 19520626 Robinson
16 2203 664 1327 19521007 Robinson
17 2281 687 1373 19521009 Robinson
18 3217 969 1937 19570908 Riesel
19 4253 1281 2561 19611103 Hurwitz & Selfridge
20 4423 1332 2663 19611103 Hurwitz & Selfridge
21 9689 2917 5834 19630511 Gillies
22 9941 2993 5985 19630516 Gillies
23 11213 3376 6751 19630602 Gillies
24 19937 6002 12003 19710304 Tuckerman
25 21701 6533 13066 19781030 Noll & Nickel
26 23209 6987 13973 19790209 Noll
27 44497 13395 26790 19790408 Nelson & Slowinski
28 86243 25962 51924 19821025 Slowinski
29 110503 33265 66530 19880128 Colquitt & Welsh
30 132049 39751 79502 19830919 Slowinski
31 216091 65050 130100 19850901 Slowinski
32 756839 227832 455663 19920401 Slowinski & Gage
33 859433 258716 517430 19940201 Slowinski & Gage
34 1257787 378632 757263 19960903 Slowinski & Gage
35 1398269 420921 841842 19961113 Armengaud, Woltman et. al.
36 2976221 895932 1791864 19970824 Spence, Woltman et. al.
37 3021377 909526 1819050 19980127 Clarkson, Woltman, Kurowski et. al.
38 6972593 2098960 4197919 19990601 Hajratwala, Woltman, Kurowski et. al.
39 13466917 4053946 8107892 20011114 Cameron, Woltman, Kurowski, et. al.
?? 20996011 6320430 12640858 20031202 Shafer, GIMPS et. al.
?? 24036583 7235733 14471465 20040515 Findley, GIMPS et. al.
?? 25964951 7816230 15632458 20050218 Nowak, GIMPS et. al.
?? 30402457 9152052 20051215 Cooper, Boone, GIMPS et. al.
?? 32582657 9808358 20060904 Cooper, Boone, GIMPS et. al.

For the primes near the end of the list labeled with "??", it is not known if there are Mersenne primes in the region between them, because the eligible primes have not all been completely tested. (For example, the 29th Mersenne prime was found in 1988, five years after the immediately preceding and following Mersenne primes.) All values of p less than 4395400 have been tested (only the prime values need to be tested, because p must be prime for Mp = 2p-1 to be prime.) See The Mersenne Search Status page for the most current information (and find out how you can set up your computer to join in the search!).

Searches for Mersenne primes are simplified by a number of special tests, that are not generally applicable to searches for all primes. For example, if p is prime, then N = 2p-1, when represented in binary, has p 1's and no 0's. It is not a multiple of 3 (except for the case p=2) because the number of binary digits is odd — so if you group the binary digits in groups of 2 and add modulo 3 (a process analagous to "casting out 9's") you'll get 1 left over. A similar process for groups of 3 1's shows that N is not a multiple of 7 except for the case p=3.

Latin Number Names

This table of Latin number names is adapted from the French textbook formerly at this URL: http://www.guetali.fr/home/jeanneau/chap24.html.


Arabic Roman Latin
1 I unus, una, unum
2 II duo, duae, duo
3 III tres, tres, tria
4 IV quattuor
5 V quinque
6 VI sex
7 VII septem
8 VIII octo
9 IX novem
10 X decem
11 XI undecim
12 XII duodecim
13 XIII tredecim
14 XIV quattuordecim
15 XV quindecim
16 XVI se(x)decim
17 XVII septemdecim
18 XVIII duodeviginti
19 XIX undeviginti
20 XX viginti
21 XXI viginti unus ; unus et viginti
22 XXII viginti duo ; duo et viginti
25 XXV viginti quinque ; quinque et viginti
(there are similar double forms for all numbers below 100)
28 XXVIII duodetriginta
29 XXIX undetriginta
30 XXX triginta
38 XXXVIII duodequadraginta
40 XL quadraginta
50 L quinquaginta
60 LX sexaginta
70 LXX septuaginta
80 LXXX octoginta
90 XC nonaginta
100 C centum
122 CXXII centum viginti duo
200 CC ducenti, ae, a
300 CCC trecenti, ae, a
400 CCCC quadringenti, ae, a
500 D quingenti, ae, a
600 DC sescenti, ae, a
700 DCC septingenti, ae, a
800 DCCC octingenti, ae, a
900 DCCCC nongenti, ae, a
1000 M mille
1124 MCXXIV mille centum viginti quattuor
2000 MM duo milia
3200 MMMCC tria milia ducenti
10,000 decem milia
100,000 centum milia
1,000,000 ducenta milia

Googol and Googolplex

Some time before 1940, mathematician Edward Kasner asked his (at the time) nine-year-old nephew to name the number 10100, and the nephew invented the name googol. Kasner then named 10googol with the name googolplex.

googol is a class-2 number, and is reasonably easy to comprehend. You can even write it out:

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

The number of particles (protons, electrons, photons, etc.) in the universe has been estimated as 1080, so while a googol is an awful lot bigger, it's still roughly the same idea.

Googolplex on the other hand, is 10googol, or:

1010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

and that's a number far too big to relate to anything physical — or is it?

This description comes from Don Page, and is highly paraphrased:

In quantum physics one talks about distinct and indistinct states, overlapping wave functions and many other things, and one of the important principles that comes out of it the concept of how much time it takes for a system with many possible states to spontaneously assume a certain, given state. For simplicity it is imagined that the system is inside an impermeable, rigid box (which thus prevents anything from going in or out and thus isolates the system). If you're outside the box you have no idea what's inside, and any state can spontaneously occur (even a higher-energy state). For example, if the system is a proton and a neutron, the possible states include: A hydrogen atom, a hydrogen atom with the electron in a higher "shell", a separate proton and electron, a neutron, or many other things. The amount of time you might have to wait for one of these states to occur depends on the number of possible states, which depends on the number of particles inside.

Cosmologists and physicists often use these concepts to discuss what might happen inside black holes. The inside of a black hole is isolated from us (nothing can get out) and so one can talk about all the possible states that might exist inside the black hole and how long it might take for a certain state to arise. In theory, if left inside a rigid nonpermeable box for long enough (so that nothing new can go into the black hole) the contents of the box will at some point in the future cease to be a black hole and instead be real matter.

The formula for the number of possible states is a really big exponential function with Planck constants in it and other such things. I don't have the exact formula but we do know that if the black hole had a mass equal to the Milky Way Galaxy, plus the Andromeda galaxy, and Magellanic clouds, and a few other similar globular clusters in our region, then the number of states, and the amount of time you'd have to wait, is somewhere around a googolplex. Its such a big number that there's no way to distinguish what units should be used. It could be a googolplex seconds or a googolplex millenia. Whatever the units, thats the amount of time you'd have to wait for the contents of an impermeable, rigid box containing a black hole with the mass of the local galactic neighborhood to spontaneously become identical to the local galactic neighborhood as we know it.

The original (and much better) description is here (old URL was http://www.uni-frankfurt.de/~fp/Tools/GetAGoogol.html)

Reference:

Edward Kasner and James Newman, Mathematics and the Imagination, Penguin, 1940.

Extension of the hyper4 function to reals

The hyperexponential (also called "tetration" or "power tower", see this page ) xy is defined for integer y>0 by:

x1 = x
xy = x^(x^^(y-1))          for y>1

To remove ambiguity in other contexts, I am using the symbol instead of the commonly-used ^^. This function is an extension of the series (addition, multiplication, exponentiation). Several times, people have asked me: how can one define xy for real y? For example, what is πe)?

There are three other functions that have been extended to the reals in ways that seem promising: the factorials by the Gamma function, hyperfactorials by the K-function, and the lower (Simon and Plouffe 1995) superfactorial by Barnes' G-function. All of these are defined in terms of an integral or an infinite product and they all have a "recurrence relation" like Gamma(x+1) = x Gamma(x). However, defining the tetration operator in terms of an infinite product, or even an infinite product of infinite products of integrals, seems like it will not work since it needs to behave more like an infinite exponential "tower". Using this method for the hyper4 function would probably require using an integral that acts like the xy function in one asymptote and like a lower function in the other.

Daniel Geisler has collected a lot of information related to this issue at his website, tetration.org. The best solution I have found so far was developed by I. N. Galidakis and is described here1. It uses the concept of a power tower with gradually growing exponents — for example, it definines x0.8=x0.8, x0.9=x0.9, x1.0=x1.0=xx0.0, x1.1=xx0.1, x1.2=xx0.2, etc. While this function is continuous, its 1st, 2nd, etc. derivatives are not, except for the 1st derivative when x=e.

Here is my solution:

Begin by imagining an infinite power tower of the form:

xy = x^(x^(a^(b^(c^...^(1^(1^(1)))...))))

where there are a finite number of x's on the bottom, and an infinite number of 1's on the top. The 1's can be completely ignored, and all you need to look at are the x's and the special terms a, b, c, ... in the middle. These terms have values between 1 and x and are called "transition terms". (Galidakis' method fits this model but only has one transition term.)

Now imagine that the x's at the bottom are actually just a tiny bit smaller than x, and that the 1's at the top are actually a tiny bit larger than 1. In fact, every term of the power tower is between 1 and x. The terms shown as x are just very very close to x.

We want something that is well-behaved, continuous and smooth. That means, essentially, x>a>b>c>1 and as y increases, the values of a, b, c increase gradually until eventually a=x. To achieve this result I decided to use the error function erf(z) = 2/π integral(0...z) [e(-z2)dt]

The terms in the power tower need to vary from x to 1 in a smooth manner with two asymptotes and an exponential bias. This is accomplished by the function:

g(x,z) = x^((1+erf(2z-1))/2)

This function increases monotonically from 1 for z<<0 to x for z>>1, and most of the transition from 1 to x occurs in the range 0<z<1. Here are examples for x=2 and x=3:


g(2,-1.0) = 1
g(2,-0.8) = 1
g(2,-0.6) = 1.001
g(2,-0.4) = 1.004
g(2,-0.2) = 1.017
g(2,0.0) = 1.056
g(2,0.2) = 1.147
g(2,0.4) = 1.309
g(2,0.6) = 1.528
g(2,0.8) = 1.743
g(2,1.0) = 1.894
g(2,1.2) = 1.967
g(2,1.4) = 1.992
g(2,1.6) = 1.999
g(2,1.8) = 2
g(2,2.0) = 2

All terms end up being between 1 and x, and there are none that are precisely equal to 1 or to x. However, they get arbitrarily close.

Then I define a finite approximation power-tower:

T(x,y,n) = g(x,y) ^ (g(x,y-1) ^ (g(x,y-2) ^ (... ^ (g(x,y-n+1)))))

for example: T(x,y,3) = g(x,y)^(g(x,y-1)^g(x,y-2))

Then finally, the first approximation to a real-valued tetration operator is:

xy ≅ H(x,y) = limit(n→infinity) T(x,y,n)

Here are some examples:


to approximate use
xy xabc1
22.0 21.8941.05611
22.2 21.9671.14711
22.4 21.9921.3091.0011
22.5 21.9971.4141.0021
22.6 21.9991.5281.0041
22.8 221.7431.0171
23.0 221.8941.0561


Notice that when y is an integer, there are two terms (1.894 and 1.056 in this example) that are almost equivalent to an x. (But not exact: 1.8941.056 is a little less than 2; more on this later.)

Also notice that when y is a half-integer, one of the terms is the square root of x and two others (1.997 and 1.002 in this example) just about "cancel out".

Finally, notice that the example for 23 is the same as the one for 22, except that all the values have moved up one step in the power tower.

Computer implemenation of this function yields the following values:


H(2, 1.0) = 1.962860044952 H(3, 1.0) = 3.01484034768
H(2, 1.2) = 2.173179457529 H(3, 1.2) = 3.792914579042
H(2, 1.4) = 2.466135238417 H(3, 1.4) = 5.340372413507
H(2, 1.6) = 2.885294328336 H(3, 1.6) = 8.646517688235
H(2, 1.8) = 3.385703944243 H(3, 1.8) = 15.08586176829
H(2, 2.0) = 3.898340329384 H(3, 2.0) = 27.44381034902
H(2, 2.2) = 4.510162634986 H(3, 2.2) = 64.5178910787
H(2, 2.4) = 5.525615740224 H(3, 2.4) = 353.1873877405
H(2, 2.6) = 7.388565694475 H(3, 2.6) = 13348.65068561
H(2, 2.8) = 10.45197695863 H(3, 2.8) = 15768315.29971
H(2, 3.0) = 14.91136402023 H(3, 3.0) = 1.241724436834×1013
H(2, 3.2) = 22.78737180478 H(3, 3.2) = 6.065367827881×1030
H(2, 3.4) = 46.06553106953 H(3, 3.4) = 3.259939872799×10168
H(2, 3.6) = 167.5636835838 H(3, 3.6) = 8.41325266884×106368
H(2, 3.8) = 1400.743379664 H(3, 3.8) = 2.40222047×107523398
H(2, 4.0) = 30815.4026169 H(3, 4.0) = 2.43×105924531213185


Immediately you should notice that H(3,1), H(2,3) etc. do not have the expected integer values. This results from the fact that the terms given by the g() function do not cancel out in the way that they need to, and for sufficiently low values of y, the first term is not x. To compensate for this, two techniques are used.

The first technique relies on the expected property that

x(y-1) = logx(xy)

The technique is to calculate xy2 for y2=y, or y2=y+1, or y2=y+2, using whatever value of y2 is high enough to ensure that the power tower has at least one term equal to x, then take the log of the result an appropriate number of times. More formally:

H2(x,y,0) = H(x,y),
H2(x,y,n) = logx(H2(x,y+1,n-1)) for n>0
    and
xy ≅ H3(x,y) = limit(n→inf) H2(x,y,n)

The second adjustment that needs to be done is to add a constant bias to y:

H4(x,y) = H3(x,y+Kx)

The bias Kx depends on x but not on y; it corrects for the fact that the transition terms do not "cancel out" perfectly. To compute Kx any value of y can be used, for simplicity set y=1 and find the value of Kx such that H4(x,y)=x.

Applying these techniques yields the following:


H4(2, 1.0) = 2 H4(3, 1.0) = 3
H4(2, 1.2) = 2.220044044808 H4(3, 1.2) = 3.767454385038
H4(2, 1.4) = 2.535796688385 H4(3, 1.4) = 5.286052796168
H4(2, 1.6) = 2.976545041727 H4(3, 1.6) = 8.533218056394
H4(2, 1.8) = 3.48226088408 H4(3, 1.8) = 14.87566955748
H4(2, 2.0) = 4 H4(3, 2.0) = 27
H4(2, 2.2) = 4.659076583142 H4(3, 2.2) = 62.73827352165
H4(2, 2.4) = 5.798970054811 H4(3, 2.4) = 332.7270753606
H4(2, 2.6) = 7.870989644306 H4(3, 2.6) = 11786.360106
H4(2, 2.8) = 11.17544894536 H4(3, 2.8) = 12516938.66315
H4(2, 3.0) = 16 H4(3, 3.0) = 7.625597484988×1012
H4(2, 3.2) = 25.26514549836 H4(3, 3.2) = 8.585464199454×1029
H4(2, 3.4) = 55.67547493765 H4(3, 3.4) = 5.638449242549×10158
H4(2, 3.6) = 234.1013825216 H4(3, 3.6) = 3.33366805307×105623
H4(2, 3.8) = 2312.838710271 H4(3, 3.8) = 3.02141334×105972097
H4(2, 4.0) = 65536 H4(3, 4.0) = 4.89×103638334640024


Similar results are produced with other values of x>1. However, notice:

- I am not certain that the limit is well-defined for arbitrarily large y. The error function converges quickly, but since it's being used for terms in a power tower, it might not converge fast enough if the power tower is tall. Due to this, even when the "bias" is perfectly adjusted for a small value of y the same bias will not produce the expected result for larger y. However, the formulas seem to behave well for the values I have tested, which all fall within the range 1 < xy < 1010308.

- As I just mentioned, the bias is different for different values of x. For x greater than about 2.834 the bias is positive, otherwise it is negative. Beyond establishing that this crossover point is not e=2.71828... I have not investigated further. I suspect a "better" g(x,z) would involve a function similar to the error function but with a slightly different shape so that a bias is not necessary. Determining the proper definition of this function is currently beyond my abilities.

I have not yet speculated on extensions handling y<1, x<1 or for complex x and y.

(Following is a brief description of my own history with this function.

I have been obsessed with the higher-order compound counting functions (increment, add, multiply, etc...) in 1974 when I was 10 years old. At age 12 I began trying to make slide rules to compute the "lower hyper4 function" defined by the left-associative repetition of exponents: xy = (((x^x)^x)^...)^x — and did not succeed for the first few attempts. I discovered the formula x^(x^(y-1)) at age 14 and had a working slide rule a couple months later. In that part of my life (which I call "my classical period") I never could crack the right-associative "higher hyper4 function", but the obsession never died.

By my late 30's I had seen the common threads between the various generalized functions Gamma (for the factorials), Barnes' G-function (for the lower superfactorial), and the K-function (for the hyperfactorials). Eventually I decided to just put the assumptions down on paper and manually construct a function that would meet all of them. What I have now is not wholly satisfactory because it requires an adjustment for the base.)

Gödel's undecidable sentence

The discipline of formal logic was described by Aristotle around 350 BC — although it was probably developed well before him. He described the syllogism, an inference of truth based on other established truths. For example:

if all integers are real numbers
and all primes are integers
then all primes are real numbers

Around 1850, George Boole refined formal logic by defining more precise symbols, axioms and rules, the result is called boolean algebra and propositional calculus.

Meanwhile, set theory was being developed by Georg Cantor and others.

Then Friedrich Ludwig Gottlob Frege began developing the predicate calculus. He applied existing formalism techniques to set theory and began developing a complete formalized system of arithmetic, symbolic logic, and set theory from basic axioms and inference rules. This work was just being published as Grundgesetze der Arithmetik vol. 2 when Bertrand Russell (a mathematician who was also well-known as a writer) presented Frege with the following paradox:

Consider the class of all classes that are not a member of itself. Is this class a member of itself or not?

This paradox challenged Frege so completely that he had to withdraw his work because its foundation was flawed.

After toppling Frege's work, Bertrand Russell and Alfred North Whitehead began work on a similar and even more massive project, with corrections to the problems in Frege's approach and using some of the foundation axioms of Giuseppe Peano. Their goal was to make it possible to derive every true theorem in number theory by starting with a set of axioms and a set of inference rules, and methodically applying all the inference rules to the axioms and existing theorems to create new theorems. For examples, see the book Gödel, Escher Bach.

The technique of deriving all truth by an automatic process is appealing — it suggests the possibility of automation (e.g. by a mechanical or electronic computer) to eliminate human error in discovering proofs. However, from the start there was much doubt about whether it could ever be used to discover every truth in number theory (if it were possible, the technique would have completeness). For one thing, there was the issue of time and algorithmic complexity: the number of theorems that are provable in such a system is infinite, and all but a very very tiny fraction of those theorems are completely useless (true, but useless).

The goal of formalization was not questioned, and was continued across various fields of mathematics, e.g. by Hilbert.

Gödel finally showed that the Russell/Whitehead approach would not achieve completeness, in fact that no (sufficiently powerful) axiomatic system of number theory can prove all statements which are true in that system. He did this by showing how statements, expressions and theorems in a number theory system S could be used to encode its own symbols, axioms and inference rules. He then demonstrated that there exists a statement G in that system which asserts "Statement G cannot be derived through the axioms and inference rules of formal system S.". If G is indeed inferrable, then it is false — and the formal system S is inconsistent. But if G cannot be inferred, then it is true, and exists as an example of a true statement that cannot be inferred, and therefore demonstrates that S is incomplete. Thus, all sufficiently powerful formal systems of number theory are either inconsistent or incomplete.

It is now generally agreed that this phenomenon of incompleteness is a property of all axiomatic systems.

Notes on Skewes Number

The Skewes Numbers concern the distribution of primes, specifically how many primes exist that are below a certain number. The Skewes numbers set an upper bound for the point at which the number of primes ends up being less than the approximation "li(x)". See here for a complete description of the problem.

The following table gives the number of primes less than each power of 10:


n 10n primes
1 10 4
2 100 25
3 1,000 168
4 10,000 1,229
5 100,000 9,592
6 1,000,000 78,498
7 10,000,000 664,579
8 100,000,000 5,761,455
9 1,000,000,000 50,847,534
10 10,000,000,000 455,052,511
11 100,000,000,000 4,118,054,813
12 1,000,000,000,000 37,607,912,018
13 10,000,000,000,000 346,065,536,839
14 100,000,000,000,000 3,204,941,750,802
15 1,000,000,000,000,000 29,844,570,422,669
16 10,000,000,000,000,000 279,238,341,033,925
17 100,000,000,000,000,000 2,623,557,157,654,233
18 1,000,000,000,000,000,000 24,739,954,287,740,860
19 10,000,000,000,000,000,000 234,057,667,276,344,607
20 100,000,000,000,000,000,000 2,220,819,602,560,918,840
21 1,000,000,000,000,000,000,000 21,127,269,486,018,731,928
22 10,000,000,000,000,000,000,000 201,467,286,689,315,906,290

(source: Caldwell)

Beyond about n=12 (and slightly larger values as computers get faster) this table cannot be produced by the direct method of finding each prime and counting. There was sufficient interest in this that, as early as 1867, Kulik had completed a table of the smallest factor of every integer (and therefore, giving all the primes) up to slightly above 100,000,000!

Soon after Kulik's table, Meissel found an indirect way to count the exact number of primes up to far higher numbers. His method was simplified by Lehmer in 1959, then improved by Lagarias, Miller and Odlyzko in 1985, and then improved further by Deleglise and Rivat.

The method has been improved to such an extent that it is actually possible to derive values of pi(x) for really huge values of x like 10100 or 101000. In 2005 Patrick Demichel found the actual point at which the pi(x) vs. Li(x) crossover occurs, 1.397162914×10316. .

Ackermann's Function

See this page for descriptions of the different versions of Ackermann's function.

Comparing Graham's Number and Chained Arrow Numbers

The following three numbers are listed in ascending order:

3→3→64→2
Graham's number
3→3→65→2.

First, note the function-based definition of Graham's number:


gn(n) = { hy(3,6,3) for n = 1 { { hy(3, gn(n-1)+2, 3) for n > 1 Graham's Number = gn(64)


And we will give another similar definition for a function we'll call c3(N):


c3(n) = { 27 for n = 1 { { hy(3, c3(n-1), 3) for n > 1


Here is the expansion process for 3→3→N→2:

3→3→1→2 = 27 = c3(1)

3→3→2→2 = 3→3→(3→3→1→2) = 3→3→c3(1) = hy(3,c3(1),3) = c3(2)

3→3→3→2 = 3→3→(3→3→2→2) = 3→3→c3(2) = hy(3,c3(2),3) = c3(3)

and in general,

3→3→N→2 = c3(N)

Now note that:

c3(1) << gn(1) << c3(2)

c3(2) = hy(3, c3(1), 3) << hy(3, gn(1)+2, 3) = gn(2) << c3(2)

and in general

c3(N) = hy(3, c3(N-1), 3) << hy(3, gn(N-1)+2, 3) = gn(N) << c3(N+1)

therefore,

c3(64) = 3→3→64→2 << gn(64) = Graham's number << 3→3→65→2 = c3(65)

A reader asked me how 3→3→3→3 compares to these:

3→3→3→3
= 3→3→(3→3→2→3)→2
= 3→3→(3→3→(3→3→1→3)→2)→2
= 3→3→(3→3→27→2)→2
= 3→3→c3(27)→2
= c3(c3(27))

therefore 3→3→3→3 is much bigger than Graham's Number.

What Turing and Church Did Not Prove

Turing is often credited with having proven that there are limits to what can be computed by computers, including all conceivable designs of real-world computers. This actually depends a lot on what you mean by computer. Here are some types of computers that a Turing machine cannot emulate:

Each of these has one or more properties that make Turing's argument inapplicable. Usually this is some form of unpredictability on a fundamental level. For the ideal analog computers, Turing's argument does not apply because of the cardinality of the reals It is known that Turing planned to create a limits-to-computability proof for such computers, but that he never did (I suspect it was not because such computers are actually without limit!)

Notes on Busy Beaver Function

In 1964 Milton Green developed a lower bound for the Busy Beaver function that was published in the proceedings of the 1964 IEEE symposium on switching circuit theory and logical design. Heiner Marxen and Juergen Buntrock described it as "a non-trivial (not primitive recursive) lower bound". This lower bound can be calculated but is too complex to state as a single expression in terms of n. When n=8 the method gives

BB(8) >= 3 × (7 × 392 - 1) / 2 ≅ 8.248×1044

Ray Kurzweil, in his 1999 book The Age of Spiritual Machines, discusses the Busy Beaver function in a footnote. I believe he was quoting the results of Green or Marxen and Buntrock, because his description is suggestive of the Ackermann function and because of the near-agreement of the values for BB(8). Kurzweil states that a 6-state Turing machine "can add" and BB(6) is 35; that a 7-state machine can multiply and BB(7) is 22961, and that an 8-state machine can exponentiate and BB(8) is "about 1043". Then he goes on to say that BB(10) requires a notation in which the height of a stack of exponents is given by another stack of exponents whose height is given by... and so on; this might be equivalent to the generalized hyper operator. Presumably BB(9) is something in between like hyper4 or hyper5. He says that BB(12) requires an "even more exotic notation", and that the limits of human intelligence are reached well before BB(100).

It has since been shown that at least one possible BB(6) candidate performs iterated exponentiation, the equivalent of hyper4.

The direct approach to finding actual values of the busy beaver function is to simulate Turing machines with a computer program, trying each possible set of rules to see what it does. The first impediment to this is that the number of machines to test grows so quickly. Lin and Rado give the forumla (2N-1)4N2N-2, which incorporates some of the rules listed below:

N (2N-1) 4N2N-2
1: 1
2: 192
3: 103680
4: 117440512
5: 230400000000
6: 697437190619136
7: 3018837446159761408
8: 17708874310761169551360
etc.

To actually implement a complete test of all Turing machines of a given state, you have to deal with many different types of cases. Many are easy to deal with and many are very difficult.

(If your browser supports Java you can run any of the Turing machines given here on this simulator. If you have Linux, you can use my own simulator written in perl

Since the tape is symmetrical, left and right can be interchanged. These two programs differ only in the directions of the arrows, so one can be ignored:

1,_:   2,1,>     1,1:   2,1,<
2,_:   1,1,<     2,1:   H,1,>

1,_:   2,1,<     1,1:   2,1,>
2,_:   1,1,>     2,1:   H,1,<

the latter program is skipped by setting as a rule that the "1,_" state always moves to the right.

If the "1,_" state writes a "_" and moves to state N, the machine will find itself in state N with a completely empty tape. Such a machine is identical to a machine in which states 1 and N are switched, except that it takes one extra step. Thus, for the purpose of the Busy Beaver problem, the first machine can be ignored. This is accomplished by setting a rule that the "1,_" state should always write a 1.

If the "1,_" state moves to state N, and if N is greater than 2, states N and 2 can be swapped and the resulting machine is equivalent, and as a result one of these two machines can be ignored. This is accomplished by setting a rule that the "1,_" state always proceeds to state 1 or 2.

If the "1,_" state goes to state 1, then the machine will stay in state 1 forever and write 1's forever, and therefore will never halt. Thus, such machines can be skipped. Combined with the previous rules, this completely defines the actions for the "1,_" state: It should be "1,_: 2,1,>".

The aforegoing rules establish that the "2,_" action will be the second to execute. Therefore it should not halt: if it did the machine would never use the "1,1" action, which could be used by having the "2,_" action perform "1,1,<".

Furthermore, if the "2,_" action goes to state 1 or state 2, it should not move to the right because if it did it would create an infinite-loop machine (such a machine would continually move right, either staying in state 2 or alternating between states 1 and 2.).

No state should have both of its actions go to itself (i.e. not change state) because that would result in an infinite loop as soon as that state is entered. An example of this is if the "3,_" action and the "3,1" action both went to state 3.

Except for state 1 (which is by definition the starting state) and state 2 (which, if using the aforegoing rules, is the state entered from the "1,_" state), all the other states can be renumbered in arbitrary order, and the resulting machines all have the same behavior. This is important for machines with 4 or more states and allows lots of programs to be eliminated quickly.

If the program doesn't have any states that contain a HALT action, we know it will never halt and it can be skipped.

Programs containing two or more HALT actions can be skipped because only one of the HALTs will ever be used (whichever one it gets to first) and the other HALTs would be superfluous.

If the HALT writes a 0, the same machine with a HALT that writes a 1 will be at least as good or better; thus the 0-writing version can be ignored.

A machine that moves left when HALTing is the same as a machine that moves right when HALTing, thus only one of these two possibilities needs to be looked at. This is accomplished by setting a rule that the HALT always moves to the right.

Some programs contain unused states. For example:

1,_:   1,1,>     1,1:   1,1,<
2,_:   2,1,>     2,1:   H,1,>

State 1 always goes to state 1, so state 2 is never reached. Programs like this can be skipped because there is always a way to modify the program so that it uses the extra state and produces at least as much output.

A similar example is:

1,_:   2,1,>     1,1:   2,1,<
2,_:   1,1,<     2,1:   1,1,<
3,_:   4,1,>     3,1:   4,1,<
4,_:   H,1,>     4,1:   3,1,<

There is no way to get past states 1 and 2. Such machines are eliminated by looking for loops in the state graph.

Some programs halt very quickly, for example:

1,_:   2,1,>     1,1:   2,1,<
2,_:   3,1,>     2,1:   3,1,<
3,_:   4,1,>     3,1:   4,1,<
4,_:   H,1,>     4,1:   H,1,>

This program halts quickly because each state goes to a higher-numbered state, and the highest state (state 4) always halts. Based on that alone you know it will run for only 4 steps. (However, it doesn't hurt much to run such machines because they don't take long to run)

Based on all these examples, it seems as though it ought to be easy to search for the busy beaver using brute force. The trouble is in the programs that have to be run to determine what happens. Some exhibit very complex behavior. Only the simplest behaviors can be detected automatically.

If a program is simulated, and the simulation shows that the program halts before it has used all of its rules, then the unused rules are irrelevant and we know right away that all other programs that differ only in these unused rules will have the same behavior. This allows those other programs to be skipped.

Any programs that don't halt real quickly have to be analyzed in some way, usually by human inspection.

The simplest behaviors are like this trivial example:


1,_:   2,1,>     1,1:   2,1,<
2,_:   1,1,>     2,1:   H,1,>

which will keep writing 1's and moving to the right.

Many programs have similar behavior, writing some continuous pattern and moving steadily and repeatedly either to the left or to the right. Here's a slightly less trivial example — it moves forever to the left in a two steps forward, one step back manner, leaving a trail of alternating 1's and 0's:

1,_:   2,1,>     1,1:   3,1,<
2,_:   3,1,<     2,1:   H,1,>
3,_:   1,_,<     3,1:   3,_,<

Automatically identifying such programs can be difficult, because sometimes the pattern it generates is complex and it takes a large number of steps to repeat each cycle.

Some programs write a fixed number of 1's but never halt, for example:

1,_:   2,1,>     1,1:   H,1,>
2,_:   2,_,>     2,1:   1,1,>

which writes a single 1 and then goes off forever to the right.

Some programs grow forever but don't move steadily in one direction or the other. For example:

1,_:   2,1,>     1,1:   1,1,<
2,_:   3,1,>     2,1:   2,1,>
3,_:   1,_,<     3,1:   H,1,>

It produces an ever-expanding row of 1's in both directions, by moving back and forth. Each time it gets to an end it writes a new 1 and then reverses direction. State 3 is never reached while on a 1, so it never halts. Many programs behave in a similar manner but produce more complicated patterns or move in more complicated ways.

Some programs grow forever without any repeating pattern. The simplest example is this machine:

1,_:   2,1,>     1,1:   1,1,<
2,_:   1,_,<     2,1:   2,_,>
3,_:   1,_,<     3,1:   H,1,>

which is a binary counter. It keeps moving back and forth producing each binary number in sequence. The string of 1's and 0's is always changing, has a pattern that never repeats and grows forever (although, of course, the growth is very slow).

The next example is a little more complicated. It writes a string of 1's to the right, and simultaneously updates a binary number on the left; the binary number tells how many 1's have been written to the right:

1,_:   2,1,>     1,1:   1,1,<
2,_:   3,_,>     2,1:   2,_,>
3,_:   4,1,>     3,1:   3,1,>
4,_:   5,_,<     4,1:   H,1,>
5,_:   1,_,<     5,1:   5,1,<

Here is another machine (given by Marxen and Buntrock) that works like a binary counter, but produces a more complex pattern. Whenever all the 1's are to the left of the head (which occurs often) the binary value is recognizable. The digits are coded in groups of 3 cells: what would be a "1" digit appears as "111" and what would be a "0" appears as "_11":

1,_:   2,1,>     1,1:   1,1,<
2,_:   1,_,<     2,1:   3,_,>
3,_:   3,_,<     3,1:   4,1,>
4,_:   5,1,>     4,1:   1,_,<
5,_:   2,_,>     5,1:   H,1,>

Many other programs move around erratically, writing seemingly random patterns of 1's and 0's and transitioning between states in a seemingly random order, not moving steadily in any direction. Here's an example, the 5-state machine found by Schult in 1983:

1,_:   2,1,>     1,1:   3,_,<
2,_:   3,1,>     2,1:   4,1,>
3,_:   1,1,<     3,1:   2,_,>
4,_:   5,_,>     4,1:   H,1,>
5,_:   3,1,<     5,1:   1,1,>

This machine was a record-holder for a while in the 5-state "contest". In order to halt it has to be in state 4 with a 1 on the tape. The only way to be in state 4 is coming from state 2 with a 1 on the tape. The only way to be in state 2 is coming from state 1 with a 0 on the tape, or state 3 with a 1 on the tape. All other states lead to states 1 and 3, which is where it spends most of its time. When you run it, it appears to fill the tape with ever longer patterns of "111_111_111_...", then backtrack over those patterns and convert them into "1_1_1_1_1_...", but every time it completes a cycle there is a little difference, like an extra 1 at the left or right end. It runs for 134,467 steps before finally producing just the right pattern of 1's and 0's to lead itself into its HALT state.

There are many programs like this that run for a long time and eventually halt, and many others that have been simulated for a long time without following any type of pattern that anyone has been able to figure out. Such machines resemble this one:

1,_:   2,1,>     1,1:   2,1,<
2,_:   3,1,<     2,1:   5,_,>
3,_:   4,_,<     3,1:   1,_,>
4,_:   1,1,>     4,1:   4,_,<
5,_:   H,1,>     5,1:   3,_,>

Its behavior was sufficiently complex as to elude Marxen and Buntrock, but not their program. Designed to analyze Turing machines and prove that they do or do not halt, the program did manage to create a proof that this machine does not halt. The trouble is, the proof is so complex that it is difficult to check by hand.

Some programs run so long before halting that direct "one-step-at-a-time" simulation isn't fast enough to show how long they run. Many machines can be run to completion with first-order acceleration techniques (skipping through any repeated sequence of states and/or tape cells). However, many others require a combination of different types of acceleration. The current record-holder for 6-state machines, which takes 8,690,333,381,690,951 (over 8×1015) steps to halt, is a good example. And, when all types of automated acceleration fall short, "maunal" (human) analysis can sometimes be used to prove that a machine halts and calculate the relevant numbers. (An example of this follows the table).

Busy Beavers and BB Candidate Record-Setters

You can run any of these on my turing machine simulator but be warned — the simulator is a command-line-oriented Perl program, certainly not for the faint of heart.


When Who Marks Steps Program Published
1962 Lin & Rado BB1 = 1 1
1,_:H,1,>  1,1:1,_,<
Rado, On non-computable functions. The Bell System Technical Journal, vol. 41, no. 3, pp. 877-884, May 1962.
1962 Lin & Rado BB2 = 4 6
1,_:2,1,>  1,1:2,1,<
1,_:1,1,<  1,1:H,1,<
Rado, On non-computable functions. The Bell System Technical Journal, vol. 41, no. 3, pp. 877-884, May 1962.
1965? Lin & Rado BB3 = 6 21 ? Lin & Rado. Computer Studies of Turing Machine Problems. Journal for the Association of Computing Machinery, vol. 12, no. 2, pp. 196-212, April 1965.
1973? Brady BB4 = 13 107 ? A. H. Brady. Solution of the non-computable "Busy Beaver" game for k=4. Abstracts for ACM Computer Science Conference (Washington DC, 1975), p. 27, ACM, 1975.
January 1983 Uwe Schult BB5 >= 501 134,467 1,_:2,1,>  1,1:3,_,<
2,_:3,1,>  2,1:4,1,>
3,_:1,1,<  3,1:2,_,>
4,_:5,_,>  4,1:H,1,>
5,_:3,1,<  5,1:1,1,>
J. Ludewig, U. Schult, and F. Wankmueller. Chasing the Busy Beaver — Notes and Observations on a Competition to Find the 5-state Busy Beaver Forschungsberichte des Fachbereichs Informatik, 159, Universitaet Dortmund, Dortmund, 1983.
December 1984 George Uhing BB5 >= 1915 2,133,492 1,_:2,1,>  1,1:3,1,<
2,_:1,_,<  2,1:4,_,<
3,_:1,1,<  3,1:H,1,>
4,_:2,1,<  4,1:5,1,>
5,_:4,_,>  5,1:2,_,>
A.K. Dewdney, Computer Recreations, Scientific American, vol. 252, no. 4, pp. 12-16, April 1985.
September 1989 Heiner Marxen, Juergen Buntrock BB5 >= 4098 47,176,870 1,_:2,1,<  1,1:3,1,>
2,_:3,1,<  2,1:2,1,<
3,_:4,1,<  3,1:5,_,>
4,_:1,1,>  4,1:4,1,>
5,_:H,1,>  5,1:1,_,>
H. Marxen and J. Buntrock. Attacking the Busy Beaver 5 Bulletin of the EATCS, 40, pp. 247-251, February 1990.
1989? Heiner Marxen, Juergen Buntrock BB6 >= 136,612 >= 13,122,572,797 1,_:2,1,<  1,1:1,1,<
2,_:3,1,>  2,1:2,1,>
3,_:6,_,>  3,1:4,1,>
4,_:1,1,<  4,1:5,_,>
5,_:1,_,<  5,1:3,1,>
6,_:5,1,<  6,1:H,1,>
H. Marxen and J. Buntrock. Attacking the Busy Beaver 5. Bulletin of the EATCS, 40, pp. 247-251, February 1990.
September 1997 Heiner Marxen BB6 >= 95,524,079 8,690,333,381,690,951 1,_:2,1,>  1,1:1,1,>
2,_:3,1,<  2,1:2,1,<
3,_:6,_,>  3,1:4,1,<
4,_:1,1,>  4,1:5,_,<
5,_:H,1,>  5,1:6,1,<
6,_:1,_,<  6,1:3,_,<
Heiner Marxen, web page
Oct 2000 Heiner Marxen, Juergen Buntrock BB6 >= 6.427499 × 10462 6.196913 × 10925 1,_:2,1,>  1,1:2,_,<
2,_:3,_,>  2,1:2,1,<
3,_:4,1,>  3,1:1,_,<
4,_:5,1,<  4,1:6,1,<
5,_:1,1,<  5,1:4,_,<
6,_:H,1,>  6,1:5,1,<
Heiner Marxen, web page
Mar 2001 Heiner Marxen, Juergen Buntrock BB6 >= 1.29149 × 10865 3.00233 × 101730 1,_:2,1,>  1,1:6,_,<
2,_:3,_,>  2,1:4,_,>
3,_:4,1,<  3,1:5,1,>
4,_:5,_,<  4,1:4,_,<
5,_:1,_,>  5,1:3,1,>
6,_:1,1,<  6,1:H,1,>
Heiner Marxen, web page

Analysis of the 1997 Marxen-Buntrock BB6 Record-Setter

This section describes the machine that set the record BB6 >= 95,524,079. It is a 6-state candidate discovered by Marxen and Buntrock in 1997. (The Oct 2000 record-holder is discussed in the next section)

The machine's operation lends itself easily to analysis, which reveals a beautiful combination of a few simple processes that interact in just the right way. The machine implements an iterated function that grows at an exponential rate and uses a test condition based on a low-probability, deterministic, and statistically random condition to determine when to stop growing. As a result, it grows to a very large size, so large that it takes a really sophisticated Turing machine simulator to show that it halts by brute-force iteration.

Simulation of the first few steps shows that it creates a row of 1's and repeatedly expands it to a row of 1's about 2.5 times longer. The exit condition depends on the number of 1's modulo 4 (the remainder of the number when divided by 4). But the number of 1's increases in a way that also depends in its value modulo 4, with the result that the value modulo 4 changes in a "chaotic" manner. This means that the row of 1's can multiply many times before the machine halts. Furthermore, the process of performing the mod-4 test is combined with the process of performing the division by 2 (which is then followed by a multiplication by 5). This combination of functions — accomplishing two things simultaneously with the same code — is part of what makes this much complexity possible in a Turing machine with only 6 states.

To analyze, we started by finding an example of a simple state that the machine enters over and over again. I call such a state "canonical" because it it has a simple pattern and recurs multiple times during the machine's history. As already implied, the canonical state in this case is a single continuous set of N 1's. In addition we specify that the machine is in state 1 and the head is positioned over any of the 1's. It doesn't matter which of the 1's the head is on, because in all such states the machine just moves to the right, staying in state 1 until it gets to the end of the row of 1's.

This condition occurs at steps 4, 37, 259, and several more times as listed below. After it reaches the right-hand end of the row of 1's, it adds two more 1's to the right and reverses direction, then plows through the rest of the 1's changing them into a "1_1_" pattern (this is easy to see in the machine's state table: it cycles through states 3,4,5,6, each time moving to the left and writing alternating 1 and _). When the head reaches the left end, what happens next depends on the value of N modulo 4. Here is what the tape will end up looking like for each of the 4 possibilities:

N = 0 mod 4 (4, 8, 12, etc.): 1(10)M11
N = 1 mod 4 (5, 9, 13, etc.): Halts with 1(10)M11 on the tape
N = 2 mod 4 (6, 10, 14, etc.): 111(10)M11
N = 3 mod 4 (7, 11, 15, etc.): 111(10)M11

(Where M is equal to N/2, rounded down.) As you can see, only values of N that are 1 more than a multiple of 4 will result in a halt.

In the remaining cases the machine begins a process that takes about 5 N2 / 4 steps, during which it repeatedly shuttles back and forth turning each copy of '1_' into '11' and then appending a '111' to the left, thereby adding a total of 3M 1's to the left. When this is done the tape is once again a single solid row of 1's, with the machine in state 1. The new value of N for this new, longer string of 1's is different for each of the three cases:

N = 0 mod 4: 3 M + 1 + 2 M + 2 = 5 N / 2 + 3 = (1 or 3) mod 4
N = 1 mod 4: (machine will have halted)
N = 2 mod 4: 3 M + 3 + 2 M + 2 = 5 N / 2 + 5 = (0 or 2) mod 4
N = 3 mod 4: 3 M + 3 + 2 M + 2 = 5 (N-1) / 2 + 5 = (0 or 2) mod 4

Notice that the formula for the new N involves a division by 2, and adds either 3 or 5. Because of the division by 2, the new N's mod-4 value depends on the old N's value modulo 8:

N = 0 mod 8: new N = 3 mod 4
N = 1 mod 8: (machine will have halted)
N = 2 mod 8: new N = 2 mod 4
N = 3 mod 8: new N = 2 mod 4
N = 4 mod 8: new N = 1 mod 4
N = 5 mod 8: (machine will have halted)
N = 6 mod 8: new N = 0 mod 4
N = 7 mod 8: new N = 0 mod 4

As you can see, out of the 6 possible cases, only one of them leads to an N=1 mod 4 value (the 'halting' value) for the new N. This is an example of a generalized Collatz mapping1. (The parameters are: T(x) = floor(mi.x / d) + Xi, where d=4, i such that x = i mod d, mi = {10,0,10,10}, Xi = {3,1,5,3}, and initial value x=3).

A "Collatz mapping" is an iterated function on integers, where at each iteration you perform a modulo test (like the remainder after dividing by 2) and then select from one of a number of linear functions (such as X/2 or 3X+1) based on what the remainder was. The simplest and best-known Collatz mapping is the "3N+1 problem", which states that for any X, the next value of X in the iteration is:


X/2 if X is even
3X + 1 if X is odd


Such iterations are notable for the fact that the behavior of the iteration is very difficult to characterize. Does any value for X cause the iteration to grow forever, or do they all always end up in the cycle 4, 2, 1, 4, 2, 1, ...? Some values of X grow pretty high.

Our Turing machine differs from the standard Collatz mapping in having a "halt" condition as one possible outcome. It also differs from the 3N+1 problem in that all of the linear functions grow, thus there is no question of it ever getting smaller — only how long will it grow before it stops?

Here are all the N values, and for the first 12 cases the step number (found by simulation) at which a row of N 1's is first seen. (The precise number of steps for each N can be calculated using a little more analysis, but it's not important for answering the halting question). Also given are the mod-4 value and the value of M. Notice in particular the fairly erratic pattern of mod-4 values.


at step N i such that
N = i mod 4
M
4 3 3 1
37 10 2 5
259 30 2 15
1661 80 0 40
10,224 203 3 101
63,057 510 2 255
392,779 1280 0 640
2,449,742 3203 3 1601
15,294,575 8010 2 4005
95,566,797 20,030 2 10,015
597,248,199 50,080 0 25,040
3,732,606,762 125,203 3 62,601
313,010 2 156,505
782,530 2 391,265
1,956,330 2 978,165
4,890,830 2 2,445,415
12,227,080 0 6,113,540
30,567,703 3 15,283,851
76,419,260 0 38,209,630
191,048,153 1 95,524,076

It might seem a little disappointing to some readers to note that this machine, a record-setter for the number of 1's produced, actually produces as many as twice that record, only to cut the number almost in half just before halting.

Analysis of the Oct 2000 Marxen-Buntrock BB6 Record-Setter 'q'

This is a high-level description of an old BB(6) candidate champion, discovered in October 2000 and later surpassed on March 2001. It writes over 6×10462 ones before halting. For a more thorough and rigorous analysis read this page.

This machine operates on similar principles to the one just discussed, but instead of iterating a process of multiplying the number of 1's by 5/2, it iterates a process of raising 2 to the power of the number of 1's! Yes, that's right, iterated exponents. Combined with the modulo dependency mechanism (this time on a modulo-3 test rather than modulo-4) it's easy to see how it gets up to 10462 ones.

To analyze, we use the same method as before of finding a simple "canonical" state (like a run of 1's) and looking at what it produces when the number of 1's changes. In this case, the simplest such state is when the head is in state 2, positioned on a 0 cell, with N consecutive 1's just to the left of the head, and nothing else. Call such a state C(N).

Looking at the fates of the first few C(N)'s, we find:

C(0) becomes C(10) after 86 steps.
C(1) becomes C(24) after 408 steps.
C(2) halts after 8 steps, leaving 4 1's on the tape.
C(3) becomes C(28) after 544 steps.
C(4) becomes C(56) after 2098 steps.
C(5) halts after 11 steps, leaving 6 1's on the tape.
C(6) becomes C(64) after 2730 steps.
C(7) becomes C(120) after 9548 steps.
C(8) halts after 14 steps, leaving 8 1's on the tape.
C(9) becomes C(136) after 12260 steps.
C(10) becomes C(248) after 40806 steps.
C(11) halts after 17 steps, leaving 10 1's on the tape.
C(12) becomes C(280) after 52030 steps.
C(13) becomes C(504) after 168832 steps.
C(14) halts after 20 steps, leaving 12 1's on the tape.
C(15) becomes C(568) after 214488 steps.

It is clear right away that there is a modulo-3 effect going on. A general formula is only evident for the case of N=2 modulo 3:

C(3N+2) halts after 3N+8 steps, leaving (2(N+1)/3)+2 1's on the tape.

For the others, we have to consider another "canonical" state; this one is a bit more complex. It consists of bunch of groups of 1's separated by single 0's, with the head in state 2 and positioned just to the left of the leftmost 1. I will represent such a state as D(a,b,c,...) where the a, b, c etc. are numbers giving the number of 1's in each group. Each of the C(N)'s above (except the ones that halt) leads to one of these D(...) states, which then passes through one or more additional D(...) states and ultimately becomes another C() state. Here is the first D(...) state for each C() state:

C(0) becomes 111_1 which we call D(3,1)
C(1) becomes 111_1_1 which we call D(3,1,1)

C(3) becomes 1111_1_1 which we call D(4,1,1)
C(4) becomes 111_11_1_1 which we call D(3,2,1,1)

C(6) becomes D(4,2,1,1)
C(7) becomes D(3,2,2,1,1)

C(9) becomes D(4,2,2,1,1)
C(10) becomes D(3,2,2,2,1,1)

C(12) becomes D(4,2,2,2,1,1)
C(13) becomes D(3,2,2,2,2,1,1)

C(15) becomes D(4,2,2,2,2,1,1)

and in general:

C(3N) becomes D(4,X,1,1) where X is a string of N-1 2's in a row
C(3N+1) becomes D(3,X,1,1) where X is a string of N 2's in a row

Then we can look at each of the D's and see what happens:

D(3,1) becomes C(10)

D(3,1,1) becomes D(10,1)
D(10,1) becomes C(24)

D(4,1,1) becomes D(12,1)
D(12,1) becomes C(28)

D(3,2,1,1) becomes D(11,1,1)
D(11,1,1) becomes D(26,1)
D(26,1) becomes C(56)

D(4,2,1,1) becomes D(13,1,1)
D(13,1,1) becomes D(28,1)
D(28,1) becomes C(60)

The pattern that emerges is:

D(N,1) becomes C(2N+4)

D(N,1,1) becomes D(2N+4,1)

D(N,2,1,1) becomes D(2N+5,1,1)

and in general:

D(N,2,X) becomes D(2N+5,X)

where X is any sequence of groups.

So, to figure out how far this machine goes before halting, we have to find the first occurance of a C(N) state, then apply these rules to find each subsequent C(N) state until we get to one with N=2 modulo 3. Since the transition from each C(N) to the next higher C(N) involves approximately N/3 steps iterating 2N+4 or 2N+5 each step, it amounts to calculating (approximately) 2N/3 each time. If you count each C(N) as a "step", the machine only takes three "steps" to produce the huge number 6×10462.

C(1) occurs at step 1 of the machine's operation.

C(1) becomes D(3,1,1) which becomes C(24) at step 409.

C(24) = C(8×3), becomes D(4,2,2,2,2,2,2,2,1,1), which becomes C(4600).

C(4600) = C(1533×3+1), becomes D(3,X,1,1) where "X" is a string of 1533 2's in a row, which becomes C(B) where B is calculated by the following BASIC code:


B = 3 FOR X = 1 to 1533 B = 2 * B + 5 NEXT X B = 2 * B + 4 B = 2 * B + 4 PRINT 2 * (B+1) / 3 + 2


or the following equivalent code for the bc tool in Unix:


b = 3 x = 1533 while (x > 0) { x = x - 1 b = 2 * b + 5 } b = 2 * b + 4 b = 2 * b + 4 2 * (b + 1) / 3 + 2


The first few values of B are: 3, 11, 27, 59, 123, 251, ... It gets doubled another 1530 times, which produces a final value of B around 9×10462. As luck would have it, this value of B is equal to 2 modulo 3, so this C(N) is the last. The machine proceeds to cut the number of 1's by 2/3 and halt.

The final line of the sample-code prints the number of 1's that the machine will leave on the tape just before halting. That value is:


64274998051227535695469776160755043010989153278551705369591334375073\ 53044244808535117560064286256132435863376749034679434299778610119888\ 86658563463437539907054723557168433393272208554111381733247158811503\ 30341868202595889544122642918970323255986879066608417102461351403452\ 62280750979809098117770580762320987650584291257578514811285364167355\ 86688154556760240264990637942260439101648404017014632900048602801665\ 8647567729882589366735582198403287964952012694238177960


References:

Heiner Marxen, table of record setters
Heiner Marxen, table of BB(6) candidates found by a search in 2000
Robert Munafo, Detailed Analysis of the Oct 2000 Marxen-Buntrock BB(6) Record-Setter 'q'.



1. "Collatz mappings" are described here and also in the book "CRC Concise Encyclopedia of Mathematics".


1 : I. N. Galidakis, formerly known to some as Ioannis, has been exploring a number of related issues for several years. There is a lot of work related to the Lambert W function, which is important in anything involving coupled exponentials such as xx and xex, or their inverses. An older version of real-valued hyper4 function description is available on web.archive.org under the URL http://users.forthnet.gr/ath/jgal/math/exponents4.html:>.


Robert Munafo's home pages at Pair Networks

Men Core Values


© 1996-2008 Robert P. Munafo. Email the author