| Large Numbers Notes |
This page goes into greater detail on the background of some of the large numbers and functions described on my large numbers page. The topics are presented in the same order as on that page.
Contents of this page:
Notes on Fermat numbers
Perfect numbers
Mersenne Numbers
Mersenne Primes
Latin Number Names
Googol and Googolplex
Real-valued hyper4 operator
Gödel's undecidable sentence
Notes on Skewes Number
Ackermann's Function
Comparing Graham's Number and Chained Arrow Numbers
What Turing and Church Did Not Prove
Notes on Busy Beaver Function
Busy Beavers and BB Candidate Record-Setters
Analysis of the 1997 Marxen-Buntrock BB6 Record-Setter
Analysis of the Oct 2000 Marxen-Buntrock BB6 Record-Setter 'q'
The Fermat Numbers, Sloane's A000215, are all numbers of the form 22N+1:
3
5
17
257
65537
4294967297
18446744073709551617
340282366920938463463374607431768211457
...
Their factorizations are:
3
5
17
257
65537
641 × 6700417
274177 × 67280421310721
59649589127497217 × 5704689200685129054721
...
All of the factors of all of the Fermat numbers, arranged in ascending order, make Sloane's sequence A023394. There is a project, similar to the Mersenne prime search, to find all terms in this sequence.
Goldbach's Proof That There are an Infinite Number of Primes
There are many proofs of the "infinitude" of primes. Eleven are listed in the book by Paulo Ribenboim, The New Book of Prime Number Records, 3rd edition (Springer-Verlag, New York, 1995, ISBN 0-387-94457-5).
This is Goldbach's proof, which he gave in a letter written to Euler in 1730. I have expanded and rewritten it from this page (in the past this page has been sometimes unavailable).
Lemma (Goldbach, 1730): Any two Fermat numbers Fn=22n+1 are relatively prime to each other.
Proof: First notice that
Fx - 2 = 22x - 1
Now set
Fn+1 - 2 = 22n+1 - 1
Substitute a^2 - 1 for (a - 1) × (a + 1) on the right hand side:
Fn+1 - 2 = (22n - 1) × (22n + 1)
therefore (substituting x for n+1) we have
Fx - 2 = (Fx-1 - 2) × Fx-1
we also have
F1 - 2 = F0
Therefore (by induction) it is true in general that
Fn - 2 = F0 × F1 × ... × Fn-1
Let m be any integer less than n. Fm is one of the terms on the right-hand side. Let d by any common factor of Fm and Fn. Since Fm appears on the right side, d must be a factor of Fn-2. Since d is a factor of Fn and of Fn-2, d is a factor of 2. But every Fermat number is odd, so d must be 1. Thus, the only common factor of any Fermat number and any other smaller Fermat number is 1: so the two Fermat numbers are relatively prime.
Theorem (Goldbach, 1730): There are infinitely many primes.
Proof: For each Fermat number Fn, choose a single prime divisor pn. Since the Fermat numbers are relatively prime to each other (just proven) we know that any two primes pm and pn are different. Therefore there is at least one prime pn for each Fermat number Fn, that is, at least one prime for every integer n.
Since any two Fermat numbers are relatively prime to each other, the Fermat sequence is called pairwise relatively prime. Any other pairwise relatively prime sequence can also be used to prove that there is an infinite number of primes. Here is another example:
w1 = 2
w2 = w1 + 1
w3 = w1 × w2 + 1
w4 = w1 × w2 × w3 + 1
etc.
(Other valid sequences result if you change the initial 2 and the 1's at the end of subsequent lines to any other pair of numbers that are relatively prime to each other). Another form of the same sequence is:
a1 = 2
an+1 = an2 - an + 1
This sequence starts:
| 2 | (prime) |
| 3 | (prime) |
| 7 | (prime) |
| 43 | (prime) |
| 1807 | = 13 × 139 |
| 3263443 | (prime) |
| 10650056950807 | = 547 × 607 × 1033 × 31051 |
| 113423713055421844361000443 | = 29881 × 67003 × 9119521 × 6212157481 |
| etc. |
Other sequences, such as the Mersenne numbers are relatively prime, but there is no simple iterative formula to generate them.
Following are the smaller perfect numbers. All are of the form
Pp = 2p-1(2p-1)
where p is one of the Mersenne primes listed below. The issue of odd perfect numbers is still unresolved.
| n | p | Pp |
| 1 | 2 | 6 |
| 2 | 3 | 28 |
| 3 | 5 | 496 |
| 4 | 7 | 8128 |
| 5 | 13 | 33550336 |
| 6 | 17 | 8589869056 |
| 7 | 19 | 137438691328 |
| 8 | 31 | 2305843008139952128 |
| 9 | 61 | 2658455991569831744654692615953842176 |
| 10 | 89 | 191561942608236107294793378084303638130997321548169216 |
| 11 | 107 | 13164036458569648337239753460458722910223472318386943117783728128 |
| 12 | 127 | 14474011154664524427946373126085988481573677491474835889066354349131199152 |
| 13 | 521 | A bit too long to show here |
| 14 | 607 | Even longer (and so on...) |
For more perfect numbers, check the list of Mersenne primes below and use the formula Pp = 2p-1(2p-1).
The "Mersenne numbers" are numbers of the form 2p-1 where p is prime. Here are the first 25. Because the Mersennes are pairwise relatively prime, each prime number will occur at most once in the factors column of this table:
| p | 2p-1 | factors |
| 2 | 3 | (prime) |
| 3 | 7 | (prime) |
| 5 | 31 | (prime) |
| 7 | 127 | (prime) |
| 11 | 2047 | = 23 × 89 |
| 13 | 8191 | (prime) |
| 17 | 131071 | (prime) |
| 19 | 524287 | (prime) |
| 23 | 8388607 | = 47 × 178481 |
| 29 | 536870911 | = 233 × 1103 × 2089 |
| 31 | 2147483647 | (prime) |
| 37 | 137438953471 | = 223 × 616318177 |
| 41 | 2199023255551 | = 13367 × 164511353 |
| 43 | 8796093022207 | = 431 × 9719 × 2099863 |
| 47 | 140737488355327 | = 2351 × 4513 × 13264529 |
| 53 | 9007199254740991 | = 6361 × 69431 × 20394401 |
| 59 | 576460752303423487 | = 179951 × 3203431780337 |
| 61 | 2305843009213693951 | (prime) |
| 67 | 147573952589676412927 | = 193707721 × 761838257287 |
| 71 | 2361183241434822606847 | = 228479 × 48544121 × 212885833 |
| 73 | 9444732965739290427391 | = 439 × 2298041 × 9361973132609 |
| 79 | 604462909807314587353087 | = 2687 × 202029703 × 1113491139767 |
| 83 | 9671406556917033397649407 | = 167 × 57912614113275649087721 |
| 89 | 618970019642690137449562111 | (prime) |
| 97 | 158456325028528675187087900671 | = 11447 × 13842607235828485645766393 |
A Mersenne number is a Mersenne prime if it is prime. In the above list, the Mersenne primes are M2 = 3, M3 = 7, M5 = 31, M7 = 127, M13 = 8191, etc. For each Mersenne prime there is also a perfect number Pp given by
Pp = 2p-1(2p-1).
Here is a (reasonably complete) list of known primes p for which Mp is a Mersenne prime. For the first few, the "discovery" credit refers to the discovery of the corresponding perfect number:
| n | p | digits in Mp | digits in Pp | year found | discoverer(s) |
| 1 | 2 | 1 | 1 | ~500BC | unknown |
| 2 | 3 | 1 | 2 | ~275BC | Euclid |
| 3 | 5 | 2 | 3 | ~275BC | Euclid |
| 4 | 7 | 3 | 4 | ~275BC | Euclid |
| 5 | 13 | 4 | 8 | 1456 | unknown |
| 6 | 17 | 6 | 10 | 1588 | Cataldi |
| 7 | 19 | 6 | 12 | 1588 | Cataldi |
| 8 | 31 | 10 | 19 | 1772 | Euler |
| 9 | 61 | 19 | 37 | 1883 | Pervouchine (or "Pervushin") |
| 10 | 89 | 27 | 54 | 1911 | Powers |
| 11 | 107 | 33 | 65 | 1914 | Powers (also claimed by Fauquembergue) |
| 12 | 127 | 39 | 77 | 1876 | Lucas |
| 13 | 521 | 157 | 314 | 19520130 | Robinson |
| 14 | 607 | 183 | 366 | 19520130 | Robinson |
| 15 | 1279 | 386 | 770 | 19520626 | Robinson |
| 16 | 2203 | 664 | 1327 | 19521007 | Robinson |
| 17 | 2281 | 687 | 1373 | 19521009 | Robinson |
| 18 | 3217 | 969 | 1937 | 19570908 | Riesel |
| 19 | 4253 | 1281 | 2561 | 19611103 | Hurwitz & Selfridge |
| 20 | 4423 | 1332 | 2663 | 19611103 | Hurwitz & Selfridge |
| 21 | 9689 | 2917 | 5834 | 19630511 | Gillies |
| 22 | 9941 | 2993 | 5985 | 19630516 | Gillies |
| 23 | 11213 | 3376 | 6751 | 19630602 | Gillies |
| 24 | 19937 | 6002 | 12003 | 19710304 | Tuckerman |
| 25 | 21701 | 6533 | 13066 | 19781030 | Noll & Nickel |
| 26 | 23209 | 6987 | 13973 | 19790209 | Noll |
| 27 | 44497 | 13395 | 26790 | 19790408 | Nelson & Slowinski |
| 28 | 86243 | 25962 | 51924 | 19821025 | Slowinski |
| 29 | 110503 | 33265 | 66530 | 19880128 | Colquitt & Welsh |
| 30 | 132049 | 39751 | 79502 | 19830919 | Slowinski |
| 31 | 216091 | 65050 | 130100 | 19850901 | Slowinski |
| 32 | 756839 | 227832 | 455663 | 19920401 | Slowinski & Gage |
| 33 | 859433 | 258716 | 517430 | 19940201 | Slowinski & Gage |
| 34 | 1257787 | 378632 | 757263 | 19960903 | Slowinski & Gage |
| 35 | 1398269 | 420921 | 841842 | 19961113 | Armengaud, Woltman et. al. |
| 36 | 2976221 | 895932 | 1791864 | 19970824 | Spence, Woltman et. al. |
| 37 | 3021377 | 909526 | 1819050 | 19980127 | Clarkson, Woltman, Kurowski et. al. |
| 38 | 6972593 | 2098960 | 4197919 | 19990601 | Hajratwala, Woltman, Kurowski et. al. |
| 39 | 13466917 | 4053946 | 8107892 | 20011114 | Cameron, Woltman, Kurowski, et. al. |
| ?? | 20996011 | 6320430 | 12640858 | 20031202 | Shafer, GIMPS et. al. |
| ?? | 24036583 | 7235733 | 14471465 | 20040515 | Findley, GIMPS et. al. |
| ?? | 25964951 | 7816230 | 15632458 | 20050218 | Nowak, GIMPS et. al. |
| ?? | 30402457 | 9152052 | 20051215 | Cooper, Boone, GIMPS et. al. | |
| ?? | 32582657 | 9808358 | 20060904 | Cooper, Boone, GIMPS et. al. |
For the primes near the end of the list labeled with "??", it is not known if there are Mersenne primes in the region between them, because the eligible primes have not all been completely tested. (For example, the 29th Mersenne prime was found in 1988, five years after the immediately preceding and following Mersenne primes.) All values of p less than 4395400 have been tested (only the prime values need to be tested, because p must be prime for Mp = 2p-1 to be prime.) See The Mersenne Search Status page for the most current information (and find out how you can set up your computer to join in the search!).
Searches for Mersenne primes are simplified by a number of special tests, that are not generally applicable to searches for all primes. For example, if p is prime, then N = 2p-1, when represented in binary, has p 1's and no 0's. It is not a multiple of 3 (except for the case p=2) because the number of binary digits is odd so if you group the binary digits in groups of 2 and add modulo 3 (a process analagous to "casting out 9's") you'll get 1 left over. A similar process for groups of 3 1's shows that N is not a multiple of 7 except for the case p=3.
This table of Latin number names is adapted from the French textbook formerly at this URL: http://www.guetali.fr/home/jeanneau/chap24.html.
| Arabic | Roman | Latin |
| 1 | I | unus, una, unum |
| 2 | II | duo, duae, duo |
| 3 | III | tres, tres, tria |
| 4 | IV | quattuor |
| 5 | V | quinque |
| 6 | VI | sex |
| 7 | VII | septem |
| 8 | VIII | octo |
| 9 | IX | novem |
| 10 | X | decem |
| 11 | XI | undecim |
| 12 | XII | duodecim |
| 13 | XIII | tredecim |
| 14 | XIV | quattuordecim |
| 15 | XV | quindecim |
| 16 | XVI | se(x)decim |
| 17 | XVII | septemdecim |
| 18 | XVIII | duodeviginti |
| 19 | XIX | undeviginti |
| 20 | XX | viginti |
| 21 | XXI | viginti unus ; unus et viginti |
| 22 | XXII | viginti duo ; duo et viginti |
| 25 | XXV | viginti quinque ; quinque et viginti |
| (there are similar double forms for all numbers below 100) | ||
| 28 | XXVIII | duodetriginta |
| 29 | XXIX | undetriginta |
| 30 | XXX | triginta |
| 38 | XXXVIII | duodequadraginta |
| 40 | XL | quadraginta |
| 50 | L | quinquaginta |
| 60 | LX | sexaginta |
| 70 | LXX | septuaginta |
| 80 | LXXX | octoginta |
| 90 | XC | nonaginta |
| 100 | C | centum |
| 122 | CXXII | centum viginti duo |
| 200 | CC | ducenti, ae, a |
| 300 | CCC | trecenti, ae, a |
| 400 | CCCC | quadringenti, ae, a |
| 500 | D | quingenti, ae, a |
| 600 | DC | sescenti, ae, a |
| 700 | DCC | septingenti, ae, a |
| 800 | DCCC | octingenti, ae, a |
| 900 | DCCCC | nongenti, ae, a |
| 1000 | M | mille |
| 1124 | MCXXIV | mille centum viginti quattuor |
| 2000 | MM | duo milia |
| 3200 | MMMCC | tria milia ducenti |
| 10,000 | decem milia | |
| 100,000 | centum milia | |
| 1,000,000 | ducenta milia |
Some time before 1940, mathematician Edward Kasner asked his (at the time) nine-year-old nephew to name the number 10100, and the nephew invented the name googol. Kasner then named 10googol with the name googolplex.
googol is a class-2 number, and is reasonably easy to comprehend. You can even write it out:
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The number of particles (protons, electrons, photons, etc.) in the universe has been estimated as 1080, so while a googol is an awful lot bigger, it's still roughly the same idea.
Googolplex on the other hand, is 10googol, or:
1010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
and that's a number far too big to relate to anything physical or is it?
This description comes from Don Page, and is highly paraphrased:
In quantum physics one talks about distinct and indistinct states, overlapping wave functions and many other things, and one of the important principles that comes out of it the concept of how much time it takes for a system with many possible states to spontaneously assume a certain, given state. For simplicity it is imagined that the system is inside an impermeable, rigid box (which thus prevents anything from going in or out and thus isolates the system). If you're outside the box you have no idea what's inside, and any state can spontaneously occur (even a higher-energy state). For example, if the system is a proton and a neutron, the possible states include: A hydrogen atom, a hydrogen atom with the electron in a higher "shell", a separate proton and electron, a neutron, or many other things. The amount of time you might have to wait for one of these states to occur depends on the number of possible states, which depends on the number of particles inside.
Cosmologists and physicists often use these concepts to discuss what might happen inside black holes. The inside of a black hole is isolated from us (nothing can get out) and so one can talk about all the possible states that might exist inside the black hole and how long it might take for a certain state to arise. In theory, if left inside a rigid nonpermeable box for long enough (so that nothing new can go into the black hole) the contents of the box will at some point in the future cease to be a black hole and instead be real matter.
The formula for the number of possible states is a really big exponential function with Planck constants in it and other such things. I don't have the exact formula but we do know that if the black hole had a mass equal to the Milky Way Galaxy, plus the Andromeda galaxy, and Magellanic clouds, and a few other similar globular clusters in our region, then the number of states, and the amount of time you'd have to wait, is somewhere around a googolplex. Its such a big number that there's no way to distinguish what units should be used. It could be a googolplex seconds or a googolplex millenia. Whatever the units, thats the amount of time you'd have to wait for the contents of an impermeable, rigid box containing a black hole with the mass of the local galactic neighborhood to spontaneously become identical to the local galactic neighborhood as we know it.
The original (and much better) description is here (old URL was http://www.uni-frankfurt.de/~fp/Tools/GetAGoogol.html)
Reference:
Edward Kasner and James Newman, Mathematics and the Imagination, Penguin, 1940.
Extension of the hyper4 function to reals
The hyperexponential (also called "tetration" or "power tower", see this page ) x④y is defined for integer y>0 by:
x④1 = x
x④y = x^(x^^(y-1)) for y>1
To remove ambiguity in other contexts, I am using the symbol ④ instead of the commonly-used ^^. This function is an extension of the series (addition, multiplication, exponentiation). Several times, people have asked me: how can one define x④y for real y? For example, what is π④e)?
There are three other functions that have been extended to the reals in ways that seem promising: the factorials by the Gamma function, hyperfactorials by the K-function, and the lower (Simon and Plouffe 1995) superfactorial by Barnes' G-function. All of these are defined in terms of an integral or an infinite product and they all have a "recurrence relation" like Gamma(x+1) = x Gamma(x). However, defining the tetration operator in terms of an infinite product, or even an infinite product of infinite products of integrals, seems like it will not work since it needs to behave more like an infinite exponential "tower". Using this method for the hyper4 function would probably require using an integral that acts like the xy function in one asymptote and like a lower function in the other.
Daniel Geisler has collected a lot of information related to this issue at his website, tetration.org. The best solution I have found so far was developed by I. N. Galidakis and is described here1. It uses the concept of a power tower with gradually growing exponents for example, it definines x④0.8=x0.8, x④0.9=x0.9, x④1.0=x1.0=xx0.0, x④1.1=xx0.1, x④1.2=xx0.2, etc. While this function is continuous, its 1st, 2nd, etc. derivatives are not, except for the 1st derivative when x=e.
Here is my solution:
Begin by imagining an infinite power tower of the form:
x④y = x^(x^(a^(b^(c^...^(1^(1^(1)))...))))
where there are a finite number of x's on the bottom, and an infinite number of 1's on the top. The 1's can be completely ignored, and all you need to look at are the x's and the special terms a, b, c, ... in the middle. These terms have values between 1 and x and are called "transition terms". (Galidakis' method fits this model but only has one transition term.)
Now imagine that the x's at the bottom are actually just a tiny bit smaller than x, and that the 1's at the top are actually a tiny bit larger than 1. In fact, every term of the power tower is between 1 and x. The terms shown as x are just very very close to x.
We want something that is well-behaved, continuous and smooth. That means, essentially, x>a>b>c>1 and as y increases, the values of a, b, c increase gradually until eventually a=x. To achieve this result I decided to use the error function erf(z) = 2/π integral(0...z) [e(-z2)dt]
The terms in the power tower need to vary from x to 1 in a smooth manner with two asymptotes and an exponential bias. This is accomplished by the function:
g(x,z) = x^((1+erf(2z-1))/2)
This function increases monotonically from 1 for z<<0 to x for z>>1, and most of the transition from 1 to x occurs in the range 0<z<1. Here are examples for x=2 and x=3:
| g(2,-1.0) = 1 |
| g(2,-0.8) = 1 |
| g(2,-0.6) = 1.001 |
| g(2,-0.4) = 1.004 |
| g(2,-0.2) = 1.017 |
| g(2,0.0) = 1.056 |
| g(2,0.2) = 1.147 |
| g(2,0.4) = 1.309 |
| g(2,0.6) = 1.528 |
| g(2,0.8) = 1.743 |
| g(2,1.0) = 1.894 |
| g(2,1.2) = 1.967 |
| g(2,1.4) = 1.992 |
| g(2,1.6) = 1.999 |
| g(2,1.8) = 2 |
| g(2,2.0) = 2 |
All terms end up being between 1 and x, and there are none that are precisely equal to 1 or to x. However, they get arbitrarily close.
Then I define a finite approximation power-tower:
T(x,y,n) = g(x,y) ^ (g(x,y-1) ^ (g(x,y-2) ^ (... ^ (g(x,y-n+1)))))
for example: T(x,y,3) = g(x,y)^(g(x,y-1)^g(x,y-2))
Then finally, the first approximation to a real-valued tetration operator is:
x④y ≅ H(x,y) = limit(n→infinity) T(x,y,n)
Here are some examples:
| to approximate | use |
| x④y | xabc1 |
| 2④2.0 | 21.8941.05611 |
| 2④2.2 | 21.9671.14711 |
| 2④2.4 | 21.9921.3091.0011 |
| 2④2.5 | 21.9971.4141.0021 |
| 2④2.6 | 21.9991.5281.0041 |
| 2④2.8 | 221.7431.0171 |
| 2④3.0 | 221.8941.0561
|
Notice that when y is an integer, there are two terms (1.894 and 1.056 in this example) that are almost equivalent to an x. (But not exact: 1.8941.056 is a little less than 2; more on this later.)
Also notice that when y is a half-integer, one of the terms is the square root of x and two others (1.997 and 1.002 in this example) just about "cancel out".
Finally, notice that the example for 2④3 is the same as the one for 2④2, except that all the values have moved up one step in the power tower.
Computer implemenation of this function yields the following values:
| H(2, 1.0) = 1.962860044952 | H(3, 1.0) = 3.01484034768 |
| H(2, 1.2) = 2.173179457529 | H(3, 1.2) = 3.792914579042 |
| H(2, 1.4) = 2.466135238417 | H(3, 1.4) = 5.340372413507 |
| H(2, 1.6) = 2.885294328336 | H(3, 1.6) = 8.646517688235 |
| H(2, 1.8) = 3.385703944243 | H(3, 1.8) = 15.08586176829 |
| H(2, 2.0) = 3.898340329384 | H(3, 2.0) = 27.44381034902 |
| H(2, 2.2) = 4.510162634986 | H(3, 2.2) = 64.5178910787 |
| H(2, 2.4) = 5.525615740224 | H(3, 2.4) = 353.1873877405 |
| H(2, 2.6) = 7.388565694475 | H(3, 2.6) = 13348.65068561 |
| H(2, 2.8) = 10.45197695863 | H(3, 2.8) = 15768315.29971 |
| H(2, 3.0) = 14.91136402023 | H(3, 3.0) = 1.241724436834×1013 |
| H(2, 3.2) = 22.78737180478 | H(3, 3.2) = 6.065367827881×1030 |
| H(2, 3.4) = 46.06553106953 | H(3, 3.4) = 3.259939872799×10168 |
| H(2, 3.6) = 167.5636835838 | H(3, 3.6) = 8.41325266884×106368 |
| H(2, 3.8) = 1400.743379664 | H(3, 3.8) = 2.40222047×107523398 |
| H(2, 4.0) = 30815.4026169 | H(3, 4.0) = 2.43×105924531213185
|
Immediately you should notice that H(3,1), H(2,3) etc. do not have the expected integer values. This results from the fact that the terms given by the g() function do not cancel out in the way that they need to, and for sufficiently low values of y, the first term is not x. To compensate for this, two techniques are used.
The first technique relies on the expected property that
x④(y-1) = logx(x④y)
The technique is to calculate x④y2 for y2=y, or y2=y+1, or y2=y+2, using whatever value of y2 is high enough to ensure that the power tower has at least one term equal to x, then take the log of the result an appropriate number of times. More formally:
H2(x,y,0) = H(x,y),
H2(x,y,n) = logx(H2(x,y+1,n-1)) for n>0
and
x④y ≅ H3(x,y) = limit(n→inf) H2(x,y,n)
The second adjustment that needs to be done is to add a constant bias to y:
H4(x,y) = H3(x,y+Kx)
The bias Kx depends on x but not on y; it corrects for the fact that the transition terms do not "cancel out" perfectly. To compute Kx any value of y can be used, for simplicity set y=1 and find the value of Kx such that H4(x,y)=x.
Applying these techniques yields the following:
| H4(2, 1.0) = 2 | H4(3, 1.0) = 3 |
| H4(2, 1.2) = 2.220044044808 | H4(3, 1.2) = 3.767454385038 |
| H4(2, 1.4) = 2.535796688385 | H4(3, 1.4) = 5.286052796168 |
| H4(2, 1.6) = 2.976545041727 | H4(3, 1.6) = 8.533218056394 |
| H4(2, 1.8) = 3.48226088408 | H4(3, 1.8) = 14.87566955748 |
| H4(2, 2.0) = 4 | H4(3, 2.0) = 27 |
| H4(2, 2.2) = 4.659076583142 | H4(3, 2.2) = 62.73827352165 |
| H4(2, 2.4) = 5.798970054811 | H4(3, 2.4) = 332.7270753606 |
| H4(2, 2.6) = 7.870989644306 | H4(3, 2.6) = 11786.360106 |
| H4(2, 2.8) = 11.17544894536 | H4(3, 2.8) = 12516938.66315 |
| H4(2, 3.0) = 16 | H4(3, 3.0) = 7.625597484988×1012 |
| H4(2, 3.2) = 25.26514549836 | H4(3, 3.2) = 8.585464199454×1029 |
| H4(2, 3.4) = 55.67547493765 | H4(3, 3.4) = 5.638449242549×10158 |
| H4(2, 3.6) = 234.1013825216 | H4(3, 3.6) = 3.33366805307×105623 |
| H4(2, 3.8) = 2312.838710271 | H4(3, 3.8) = 3.02141334×105972097 |
| H4(2, 4.0) = 65536 | H4(3, 4.0) = 4.89×103638334640024
|
Similar results are produced with other values of x>1. However, notice:
- I am not certain that the limit is well-defined for arbitrarily large y. The error function converges quickly, but since it's being used for terms in a power tower, it might not converge fast enough if the power tower is tall. Due to this, even when the "bias" is perfectly adjusted for a small value of y the same bias will not produce the expected result for larger y. However, the formulas seem to behave well for the values I have tested, which all fall within the range 1 < x④y < 1010308.
- As I just mentioned, the bias is different for different values of x. For x greater than about 2.834 the bias is positive, otherwise it is negative. Beyond establishing that this crossover point is not e=2.71828... I have not investigated further. I suspect a "better" g(x,z) would involve a function similar to the error function but with a slightly different shape so that a bias is not necessary. Determining the proper definition of this function is currently beyond my abilities.
I have not yet speculated on extensions handling y<1, x<1 or for complex x and y.
(Following is a brief description of my own history with this function.
I have been obsessed with the higher-order compound counting functions (increment, add, multiply, etc...) in 1974 when I was 10 years old. At age 12 I began trying to make slide rules to compute the "lower hyper4 function" defined by the left-associative repetition of exponents: x④y = (((x^x)^x)^...)^x and did not succeed for the first few attempts. I discovered the formula x^(x^(y-1)) at age 14 and had a working slide rule a couple months later. In that part of my life (which I call "my classical period") I never could crack the right-associative "higher hyper4 function", but the obsession never died.
By my late 30's I had seen the common threads between the various generalized functions Gamma (for the factorials), Barnes' G-function (for the lower superfactorial), and the K-function (for the hyperfactorials). Eventually I decided to just put the assumptions down on paper and manually construct a function that would meet all of them. What I have now is not wholly satisfactory because it requires an adjustment for the base.)
The discipline of formal logic was described by Aristotle around 350 BC although it was probably developed well before him. He described the syllogism, an inference of truth based on other established truths. For example:
if all integers are real numbers
and all primes are integers
then all primes are real numbers
Around 1850, George Boole refined formal logic by defining more precise symbols, axioms and rules, the result is called boolean algebra and propositional calculus.
Meanwhile, set theory was being developed by Georg Cantor and others.
Then Friedrich Ludwig Gottlob Frege began developing the predicate calculus. He applied existing formalism techniques to set theory and began developing a complete formalized system of arithmetic, symbolic logic, and set theory from basic axioms and inference rules. This work was just being published as Grundgesetze der Arithmetik vol. 2 when Bertrand Russell (a mathematician who was also well-known as a writer) presented Frege with the following paradox:
Consider the class of all classes that are not a member of itself. Is this class a member of itself or not?
This paradox challenged Frege so completely that he had to withdraw his work because its foundation was flawed.
After toppling Frege's work, Bertrand Russell and Alfred North Whitehead began work on a similar and even more massive project, with corrections to the problems in Frege's approach and using some of the foundation axioms of Giuseppe Peano. Their goal was to make it possible to derive every true theorem in number theory by starting with a set of axioms and a set of inference rules, and methodically applying all the inference rules to the axioms and existing theorems to create new theorems. For examples, see the book Gödel, Escher Bach.
The technique of deriving all truth by an automatic process is appealing it suggests the possibility of automation (e.g. by a mechanical or electronic computer) to eliminate human error in discovering proofs. However, from the start there was much doubt about whether it could ever be used to discover every truth in number theory (if it were possible, the technique would have completeness). For one thing, there was the issue of time and algorithmic complexity: the number of theorems that are provable in such a system is infinite, and all but a very very tiny fraction of those theorems are completely useless (true, but useless).
The goal of formalization was not questioned, and was continued across various fields of mathematics, e.g. by Hilbert.
Gödel finally showed that the Russell/Whitehead approach would not achieve completeness, in fact that no (sufficiently powerful) axiomatic system of number theory can prove all statements which are true in that system. He did this by showing how statements, expressions and theorems in a number theory system S could be used to encode its own symbols, axioms and inference rules. He then demonstrated that there exists a statement G in that system which asserts "Statement G cannot be derived through the axioms and inference rules of formal system S.". If G is indeed inferrable, then it is false and the formal system S is inconsistent. But if G cannot be inferred, then it is true, and exists as an example of a true statement that cannot be inferred, and therefore demonstrates that S is incomplete. Thus, all sufficiently powerful formal systems of number theory are either inconsistent or incomplete.
It is now generally agreed that this phenomenon of incompleteness is a property of all axiomatic systems.
The Skewes Numbers concern the distribution of primes, specifically how many primes exist that are below a certain number. The Skewes numbers set an upper bound for the point at which the number of primes ends up being less than the approximation "li(x)". See here for a complete description of the problem.
The following table gives the number of primes less than each power of 10:
| n | 10n | primes |
| 1 | 10 | 4 |
| 2 | 100 | 25 |
| 3 | 1,000 | 168 |
| 4 | 10,000 | 1,229 |
| 5 | 100,000 | 9,592 |
| 6 | 1,000,000 | 78,498 |
| 7 | 10,000,000 | 664,579 |
| 8 | 100,000,000 | 5,761,455 |
| 9 | 1,000,000,000 | 50,847,534 |
| 10 | 10,000,000,000 | 455,052,511 |
| 11 | 100,000,000,000 | 4,118,054,813 |
| 12 | 1,000,000,000,000 | 37,607,912,018 |
| 13 | 10,000,000,000,000 | 346,065,536,839 |
| 14 | 100,000,000,000,000 | 3,204,941,750,802 |
| 15 | 1,000,000,000,000,000 | 29,844,570,422,669 |
| 16 | 10,000,000,000,000,000 | 279,238,341,033,925 |
| 17 | 100,000,000,000,000,000 | 2,623,557,157,654,233 |
| 18 | 1,000,000,000,000,000,000 | 24,739,954,287,740,860 |
| 19 | 10,000,000,000,000,000,000 | 234,057,667,276,344,607 |
| 20 | 100,000,000,000,000,000,000 | 2,220,819,602,560,918,840 |
| 21 | 1,000,000,000,000,000,000,000 | 21,127,269,486,018,731,928 |
| 22 | 10,000,000,000,000,000,000,000 | 201,467,286,689,315,906,290 |
(source: Caldwell)
Beyond about n=12 (and slightly larger values as computers get faster) this table cannot be produced by the direct method of finding each prime and counting. There was sufficient interest in this that, as early as 1867, Kulik had completed a table of the smallest factor of every integer (and therefore, giving all the primes) up to slightly above 100,000,000!
Soon after Kulik's table, Meissel found an indirect way to count the exact number of primes up to far higher numbers. His method was simplified by Lehmer in 1959, then improved by Lagarias, Miller and Odlyzko in 1985, and then improved further by Deleglise and Rivat.
The method has been improved to such an extent that it is actually possible to derive values of pi(x) for really huge values of x like 10100 or 101000. In 2005 Patrick Demichel found the actual point at which the pi(x) vs. Li(x) crossover occurs, 1.397162914×10316. .
See this page for descriptions of the different versions of Ackermann's function.
Comparing Graham's Number and Chained Arrow Numbers
The following three numbers are listed in ascending order:
3→3→64→2
Graham's number
3→3→65→2.
First, note the function-based definition of Graham's number:
And we will give another similar definition for a function we'll call c3(N):
Here is the expansion process for 3→3→N→2:
3→3→1→2 = 27 = c3(1)
3→3→2→2 = 3→3→(3→3→1→2) = 3→3→c3(1) = hy(3,c3(1),3) = c3(2)
3→3→3→2 = 3→3→(3→3→2→2) = 3→3→c3(2) = hy(3,c3(2),3) = c3(3)
and in general,
3→3→N→2 = c3(N)
Now note that:
c3(1) << gn(1) << c3(2)
c3(2) = hy(3, c3(1), 3) << hy(3, gn(1)+2, 3) = gn(2) << c3(2)
and in general
c3(N) = hy(3, c3(N-1), 3) << hy(3, gn(N-1)+2, 3) = gn(N) << c3(N+1)
therefore,
c3(64) = 3→3→64→2 << gn(64) = Graham's number << 3→3→65→2 = c3(65)
A reader asked me how 3→3→3→3 compares to these:
3→3→3→3
= 3→3→(3→3→2→3)→2
= 3→3→(3→3→(3→3→1→3)→2)→2
= 3→3→(3→3→27→2)→2
= 3→3→c3(27)→2
= c3(c3(27))
therefore 3→3→3→3 is much bigger than Graham's Number.
What Turing and Church Did Not Prove
Turing is often credited with having proven that there are limits to what can be computed by computers, including all conceivable designs of real-world computers. This actually depends a lot on what you mean by computer. Here are some types of computers that a Turing machine cannot emulate:
Each of these has one or more properties that make Turing's argument inapplicable. Usually this is some form of unpredictability on a fundamental level. For the ideal analog computers, Turing's argument does not apply because of the cardinality of the reals It is known that Turing planned to create a limits-to-computability proof for such computers, but that he never did (I suspect it was not because such computers are actually without limit!)
In 1964 Milton Green developed a lower bound for the Busy Beaver function that was published in the proceedings of the 1964 IEEE symposium on switching circuit theory and logical design. Heiner Marxen and Juergen Buntrock described it as "a non-trivial (not primitive recursive) lower bound". This lower bound can be calculated but is too complex to state as a single expression in terms of n. When n=8 the method gives
BB(8) >= 3 × (7 × 392 - 1) / 2 ≅ 8.248×1044
Ray Kurzweil, in his 1999 book The Age of Spiritual Machines, discusses the Busy Beaver function in a footnote. I believe he was quoting the results of Green or Marxen and Buntrock, because his description is suggestive of the Ackermann function and because of the near-agreement of the values for BB(8). Kurzweil states that a 6-state Turing machine "can add" and BB(6) is 35; that a 7-state machine can multiply and BB(7) is 22961, and that an 8-state machine can exponentiate and BB(8) is "about 1043". Then he goes on to say that BB(10) requires a notation in which the height of a stack of exponents is given by another stack of exponents whose height is given by... and so on; this might be equivalent to the generalized hyper operator. Presumably BB(9) is something in between like hyper4 or hyper5. He says that BB(12) requires an "even more exotic notation", and that the limits of human intelligence are reached well before BB(100).
It has since been shown that at least one possible BB(6) candidate performs iterated exponentiation, the equivalent of hyper4.
The direct approach to finding actual values of the busy beaver function is to simulate Turing machines with a computer program, trying each possible set of rules to see what it does. The first impediment to this is that the number of machines to test grows so quickly. Lin and Rado give the forumla (2N-1)4N2N-2, which incorporates some of the rules listed below:
N (2N-1) 4N2N-2
1: 1
2: 192
3: 103680
4: 117440512
5: 230400000000
6: 697437190619136
7: 3018837446159761408
8: 17708874310761169551360
etc.
To actually implement a complete test of all Turing machines of a given state, you have to deal with many different types of cases. Many are easy to deal with and many are very difficult.
(If your browser supports Java you can run any of the Turing machines given here on this simulator. If you have Linux, you can use my own simulator written in perl
Since the tape is symmetrical, left and right can be interchanged. These two programs differ only in the directions of the arrows, so one can be ignored:
1,_: 2,1,> 1,1: 2,1,<
2,_: 1,1,< 2,1: H,1,>
1,_: 2,1,< 1,1: 2,1,>
2,_: 1,1,> 2,1: H,1,<
the latter program is skipped by setting as a rule that the "1,_" state always moves to the right.
If the "1,_" state writes a "_" and moves to state N, the machine will find itself in state N with a completely empty tape. Such a machine is identical to a machine in which states 1 and N are switched, except that it takes one extra step. Thus, for the purpose of the Busy Beaver problem, the first machine can be ignored. This is accomplished by setting a rule that the "1,_" state should always write a 1.
If the "1,_" state moves to state N, and if N is greater than 2, states N and 2 can be swapped and the resulting machine is equivalent, and as a result one of these two machines can be ignored. This is accomplished by setting a rule that the "1,_" state always proceeds to state 1 or 2.
If the "1,_" state goes to state 1, then the machine will stay in state 1 forever and write 1's forever, and therefore will never halt. Thus, such machines can be skipped. Combined with the previous rules, this completely defines the actions for the "1,_" state: It should be "1,_: 2,1,>".
The aforegoing rules establish that the "2,_" action will be the second to execute. Therefore it should not halt: if it did the machine would never use the "1,1" action, which could be used by having the "2,_" action perform "1,1,<".
Furthermore, if the "2,_" action goes to state 1 or state 2, it should not move to the right because if it did it would create an infinite-loop machine (such a machine would continually move right, either staying in state 2 or alternating between states 1 and 2.).
No state should have both of its actions go to itself (i.e. not change state) because that would result in an infinite loop as soon as that state is entered. An example of this is if the "3,_" action and the "3,1" action both went to state 3.
Except for state 1 (which is by definition the starting state) and state 2 (which, if using the aforegoing rules, is the state entered from the "1,_" state), all the other states can be renumbered in arbitrary order, and the resulting machines all have the same behavior. This is important for machines with 4 or more states and allows lots of programs to be eliminated quickly.
If the program doesn't have any states that contain a HALT action, we know it will never halt and it can be skipped.
Programs containing two or more HALT actions can be skipped because only one of the HALTs will ever be used (whichever one it gets to first) and the other HALTs would be superfluous.
If the HALT writes a 0, the same machine with a HALT that writes a 1 will be at least as good or better; thus the 0-writing version can be ignored.
A machine that moves left when HALTing is the same as a machine that moves right when HALTing, thus only one of these two possibilities needs to be looked at. This is accomplished by setting a rule that the HALT always moves to the right.
Some programs contain unused states. For example:
1,_: 1,1,> 1,1: 1,1,<
2,_: 2,1,> 2,1: H,1,>
State 1 always goes to state 1, so state 2 is never reached. Programs like this can be skipped because there is always a way to modify the program so that it uses the extra state and produces at least as much output.
A similar example is:
1,_: 2,1,> 1,1: 2,1,<
2,_: 1,1,< 2,1: 1,1,<
3,_: 4,1,> 3,1: 4,1,<
4,_: H,1,> 4,1: 3,1,<
There is no way to get past states 1 and 2. Such machines are eliminated by looking for loops in the state graph.
Some programs halt very quickly, for example:
1,_: 2,1,> 1,1: 2,1,<
2,_: 3,1,> 2,1: 3,1,<
3,_: 4,1,> 3,1: 4,1,<
4,_: H,1,> 4,1: H,1,>
This program halts quickly because each state goes to a higher-numbered state, and the highest state (state 4) always halts. Based on that alone you know it will run for only 4 steps. (However, it doesn't hurt much to run such machines because they don't take long to run)
Based on all these examples, it seems as though it ought to be easy to search for the busy beaver using brute force. The trouble is in the programs that have to be run to determine what happens. Some exhibit very complex behavior. Only the simplest behaviors can be detected automatically.
If a program is simulated, and the simulation shows that the program halts before it has used all of its rules, then the unused rules are irrelevant and we know right away that all other programs that differ only in these unused rules will have the same behavior. This allows those other programs to be skipped.
Any programs that don't halt real quickly have to be analyzed in some way, usually by human inspection.
The simplest behaviors are like this trivial example:
1,_: 2,1,> 1,1: 2,1,<
2,_: 1,1,> 2,1: H,1,>
which will keep writing 1's and moving to the right.
Many programs have similar behavior, writing some continuous pattern and moving steadily and repeatedly either to the left or to the right. Here's a slightly less trivial example it moves forever to the left in a two steps forward, one step back manner, leaving a trail of alternating 1's and 0's:
1,_: 2,1,> 1,1: 3,1,<
2,_: 3,1,< 2,1: H,1,>
3,_: 1,_,< 3,1: 3,_,<
Automatically identifying such programs can be difficult, because sometimes the pattern it generates is complex and it takes a large number of steps to repeat each cycle.
Some programs write a fixed number of 1's but never halt, for example:
1,_: 2,1,> 1,1: H,1,>
2,_: 2,_,> 2,1: 1,1,>
which writes a single 1 and then goes off forever to the right.
Some programs grow forever but don't move steadily in one direction or the other. For example:
1,_: 2,1,> 1,1: 1,1,<
2,_: 3,1,> 2,1: 2,1,>
3,_: 1,_,< 3,1: H,1,>
It produces an ever-expanding row of 1's in both directions, by moving back and forth. Each time it gets to an end it writes a new 1 and then reverses direction. State 3 is never reached while on a 1, so it never halts. Many programs behave in a similar manner but produce more complicated patterns or move in more complicated ways.
Some programs grow forever without any repeating pattern. The simplest example is this machine:
1,_: 2,1,> 1,1: 1,1,<
2,_: 1,_,< 2,1: 2,_,>
3,_: 1,_,< 3,1: H,1,>
which is a binary counter. It keeps moving back and forth producing each binary number in sequence. The string of 1's and 0's is always changing, has a pattern that never repeats and grows forever (although, of course, the growth is very slow).
The next example is a little more complicated. It writes a string of 1's to the right, and simultaneously updates a binary number on the left; the binary number tells how many 1's have been written to the right:
1,_: 2,1,> 1,1: 1,1,<
2,_: 3,_,> 2,1: 2,_,>
3,_: 4,1,> 3,1: 3,1,>
4,_: 5,_,< 4,1: H,1,>
5,_: 1,_,< 5,1: 5,1,<
Here is another machine (given by Marxen and Buntrock) that works like a binary counter, but produces a more complex pattern. Whenever all the 1's are to the left of the head (which occurs often) the binary value is recognizable. The digits are coded in groups of 3 cells: what would be a "1" digit appears as "111" and what would be a "0" appears as "_11":
1,_: 2,1,> 1,1: 1,1,<
2,_: 1,_,< 2,1: 3,_,>
3,_: 3,_,< 3,1: 4,1,>
4,_: 5,1,> 4,1: 1,_,<
5,_: 2,_,> 5,1: H,1,>
Many other programs move around erratically, writing seemingly random patterns of 1's and 0's and transitioning between states in a seemingly random order, not moving steadily in any direction. Here's an example, the 5-state machine found by Schult in 1983:
1,_: 2,1,> 1,1: 3,_,<
2,_: 3,1,> 2,1: 4,1,>
3,_: 1,1,< 3,1: 2,_,>
4,_: 5,_,> 4,1: H,1,>
5,_: 3,1,< 5,1: 1,1,>
This machine was a record-holder for a while in the 5-state "contest". In order to halt it has to be in state 4 with a 1 on the tape. The only way to be in state 4 is coming from state 2 with a 1 on the tape. The only way to be in state 2 is coming from state 1 with a 0 on the tape, or state 3 with a 1 on the tape. All other states lead to states 1 and 3, which is where it spends most of its time. When you run it, it appears to fill the tape with ever longer patterns of "111_111_111_...", then backtrack over those patterns and convert them into "1_1_1_1_1_...", but every time it completes a cycle there is a little difference, like an extra 1 at the left or right end. It runs for 134,467 steps before finally producing just the right pattern of 1's and 0's to lead itself into its HALT state.
There are many programs like this that run for a long time and eventually halt, and many others that have been simulated for a long time without following any type of pattern that anyone has been able to figure out. Such machines resemble this one:
1,_: 2,1,> 1,1: 2,1,<
2,_: 3,1,< 2,1: 5,_,>
3,_: 4,_,< 3,1: 1,_,>
4,_: 1,1,> 4,1: 4,_,<
5,_: H,1,> 5,1: 3,_,>
Its behavior was sufficiently complex as to elude Marxen and Buntrock, but not their program. Designed to analyze Turing machines and prove that they do or do not halt, the program did manage to create a proof that this machine does not halt. The trouble is, the proof is so complex that it is difficult to check by hand.
Some programs run so long before halting that direct "one-step-at-a-time" simulation isn't fast enough to show how long they run. Many machines can be run to completion with first-order acceleration techniques (skipping through any repeated sequence of states and/or tape cells). However, many others require a combination of different types of acceleration. The current record-holder for 6-state machines, which takes 8,690,333,381,690,951 (over 8×1015) steps to halt, is a good example. And, when all types of automated acceleration fall short, "maunal" (human) analysis can sometimes be used to prove that a machine halts and calculate the relevant numbers. (An example of this follows the table).
Busy Beavers and BB Candidate Record-Setters
You can run any of these on my turing machine simulator but be warned the simulator is a command-line-oriented Perl program, certainly not for the faint of heart.
| When | Who | Marks | Steps | Program | Published |
| 1962 | Lin & Rado | BB1 = 1 | 1 |
1,_:H,1,> 1,1:1,_,< | Rado, On non-computable functions. The Bell System Technical Journal, vol. 41, no. 3, pp. 877-884, May 1962. |
| 1962 | Lin & Rado | BB2 = 4 | 6 |
1,_:2,1,> 1,1:2,1,< 1,_:1,1,< 1,1:H,1,< | Rado, On non-computable functions. The Bell System Technical Journal, vol. 41, no. 3, pp. 877-884, May 1962. |
| 1965? | Lin & Rado | BB3 = 6 | 21 | ? | Lin & Rado. Computer Studies of Turing Machine Problems. Journal for the Association of Computing Machinery, vol. 12, no. 2, pp. 196-212, April 1965. |
| 1973? | Brady | BB4 = 13 | 107 | ? | A. H. Brady. Solution of the non-computable "Busy Beaver" game for k=4. Abstracts for ACM Computer Science Conference (Washington DC, 1975), p. 27, ACM, 1975. |
| January 1983 | Uwe Schult | BB5 >= 501 | 134,467 |
1,_:2,1,> 1,1:3,_,<
2,_:3,1,> 2,1:4,1,> 3,_:1,1,< 3,1:2,_,> 4,_:5,_,> 4,1:H,1,> 5,_:3,1,< 5,1:1,1,> | J. Ludewig, U. Schult, and F. Wankmueller. Chasing the Busy Beaver Notes and Observations on a Competition to Find the 5-state Busy Beaver Forschungsberichte des Fachbereichs Informatik, 159, Universitaet Dortmund, Dortmund, 1983. |
| December 1984 | George Uhing | BB5 >= 1915 | 2,133,492 |
1,_:2,1,> 1,1:3,1,<
2,_:1,_,< 2,1:4,_,< 3,_:1,1,< 3,1:H,1,> 4,_:2,1,< 4,1:5,1,> 5,_:4,_,> 5,1:2,_,> | A.K. Dewdney, Computer Recreations, Scientific American, vol. 252, no. 4, pp. 12-16, April 1985. |
| September 1989 | Heiner Marxen, Juergen Buntrock | BB5 >= 4098 | 47,176,870 |
1,_:2,1,< 1,1:3,1,>
2,_:3,1,< 2,1:2,1,< 3,_:4,1,< 3,1:5,_,> 4,_:1,1,> 4,1:4,1,> 5,_:H,1,> 5,1:1,_,> | H. Marxen and J. Buntrock. Attacking the Busy Beaver 5 Bulletin of the EATCS, 40, pp. 247-251, February 1990. |
| 1989? | Heiner Marxen, Juergen Buntrock | BB6 >= 136,612 | >= 13,122,572,797 |
1,_:2,1,< 1,1:1,1,<
2,_:3,1,> 2,1:2,1,> 3,_:6,_,> 3,1:4,1,> 4,_:1,1,< 4,1:5,_,> 5,_:1,_,< 5,1:3,1,> 6,_:5,1,< 6,1:H,1,> | H. Marxen and J. Buntrock. Attacking the Busy Beaver 5. Bulletin of the EATCS, 40, pp. 247-251, February 1990. |
| September 1997 | Heiner Marxen | BB6 >= 95,524,079 | 8,690,333,381,690,951 |
1,_:2,1,> 1,1:1,1,>
2,_:3,1,< 2,1:2,1,< 3,_:6,_,> 3,1:4,1,< 4,_:1,1,> 4,1:5,_,< 5,_:H,1,> 5,1:6,1,< 6,_:1,_,< 6,1:3,_,< | Heiner Marxen, web page |
| Oct 2000 | Heiner Marxen, Juergen Buntrock | BB6 >= 6.427499 × 10462 | 6.196913 × 10925 |
1,_:2,1,> 1,1:2,_,<
2,_:3,_,> 2,1:2,1,< 3,_:4,1,> 3,1:1,_,< 4,_:5,1,< 4,1:6,1,< 5,_:1,1,< 5,1:4,_,< 6,_:H,1,> 6,1:5,1,< | Heiner Marxen, web page |
| Mar 2001 | Heiner Marxen, Juergen Buntrock | BB6 >= 1.29149 × 10865 | 3.00233 × 101730 |
1,_:2,1,> 1,1:6,_,<
2,_:3,_,> 2,1:4,_,> 3,_:4,1,< 3,1:5,1,> 4,_:5,_,< 4,1:4,_,< 5,_:1,_,> 5,1:3,1,> 6,_:1,1,< 6,1:H,1,> | Heiner Marxen, web page |
Analysis of the 1997 Marxen-Buntrock BB6 Record-Setter
This section describes the machine that set the record BB6 >= 95,524,079. It is a 6-state candidate discovered by Marxen and Buntrock in 1997. (The Oct 2000 record-holder is discussed in the next section)
The machine's operation lends itself easily to analysis, which reveals a beautiful combination of a few simple processes that interact in just the right way. The machine implements an iterated function that grows at an exponential rate and uses a test condition based on a low-probability, deterministic, and statistically random condition to determine when to stop growing. As a result, it grows to a very large size, so large that it takes a really sophisticated Turing machine simulator to show that it halts by brute-force iteration.
Simulation of the first few steps shows that it creates a row of 1's and repeatedly expands it to a row of 1's about 2.5 times longer. The exit condition depends on the number of 1's modulo 4 (the remainder of the number when divided by 4). But the number of 1's increases in a way that also depends in its value modulo 4, with the result that the value modulo 4 changes in a "chaotic" manner. This means that the row of 1's can multiply many times before the machine halts. Furthermore, the process of performing the mod-4 test is combined with the process of performing the division by 2 (which is then followed by a multiplication by 5). This combination of functions accomplishing two things simultaneously with the same code is part of what makes this much complexity possible in a Turing machine with only 6 states.
To analyze, we started by finding an example of a simple state that the machine enters over and over again. I call such a state "canonical" because it it has a simple pattern and recurs multiple times during the machine's history. As already implied, the canonical state in this case is a single continuous set of N 1's. In addition we specify that the machine is in state 1 and the head is positioned over any of the 1's. It doesn't matter which of the 1's the head is on, because in all such states the machine just moves to the right, staying in state 1 until it gets to the end of the row of 1's.
This condition occurs at steps 4, 37, 259, and several more times as listed below. After it reaches the right-hand end of the row of 1's, it adds two more 1's to the right and reverses direction, then plows through the rest of the 1's changing them into a "1_1_" pattern (this is easy to see in the machine's state table: it cycles through states 3,4,5,6, each time moving to the left and writing alternating 1 and _). When the head reaches the left end, what happens next depends on the value of N modulo 4. Here is what the tape will end up looking like for each of the 4 possibilities:
N = 0 mod 4 (4, 8, 12, etc.): 1(10)M11
N = 1 mod 4 (5, 9, 13, etc.): Halts with 1(10)M11 on the tape
N = 2 mod 4 (6, 10, 14, etc.): 111(10)M11
N = 3 mod 4 (7, 11, 15, etc.): 111(10)M11
(Where M is equal to N/2, rounded down.) As you can see, only values of N that are 1 more than a multiple of 4 will result in a halt.
In the remaining cases the machine begins a process that takes about 5 N2 / 4 steps, during which it repeatedly shuttles back and forth turning each copy of '1_' into '11' and then appending a '111' to the left, thereby adding a total of 3M 1's to the left. When this is done the tape is once again a single solid row of 1's, with the machine in state 1. The new value of N for this new, longer string of 1's is different for each of the three cases:
N = 0 mod 4: 3 M + 1 + 2 M + 2 = 5 N / 2 + 3 = (1 or 3) mod 4
N = 1 mod 4: (machine will have halted)
N = 2 mod 4: 3 M + 3 + 2 M + 2 = 5 N / 2 + 5 = (0 or 2) mod 4
N = 3 mod 4: 3 M + 3 + 2 M + 2 = 5 (N-1) / 2 + 5 = (0 or 2) mod 4
Notice that the formula for the new N involves a division by 2, and adds either 3 or 5. Because of the division by 2, the new N's mod-4 value depends on the old N's value modulo 8:
N = 0 mod 8: new N = 3 mod 4
N = 1 mod 8: (machine will have halted)
N = 2 mod 8: new N = 2 mod 4
N = 3 mod 8: new N = 2 mod 4
N = 4 mod 8: new N = 1 mod 4
N = 5 mod 8: (machine will have halted)
N = 6 mod 8: new N = 0 mod 4
N = 7 mod 8: new N = 0 mod 4
As you can see, out of the 6 possible cases, only one of them leads to an N=1 mod 4 value (the 'halting' value) for the new N. This is an example of a generalized Collatz mapping1. (The parameters are: T(x) = floor(mi.x / d) + Xi, where d=4, i such that x = i mod d, mi = {10,0,10,10}, Xi = {3,1,5,3}, and initial value x=3).
A "Collatz mapping" is an iterated function on integers, where at each iteration you perform a modulo test (like the remainder after dividing by 2) and then select from one of a number of linear functions (such as X/2 or 3X+1) based on what the remainder was. The simplest and best-known Collatz mapping is the "3N+1 problem", which states that for any X, the next value of X in the iteration is:
| X/2 | if X is even | ||
| 3X + 1 | if X is odd
|
Such iterations are notable for the fact that the behavior of the iteration is very difficult to characterize. Does any value for X cause the iteration to grow forever, or do they all always end up in the cycle 4, 2, 1, 4, 2, 1, ...? Some values of X grow pretty high.
Our Turing machine differs from the standard Collatz mapping in having a "halt" condition as one possible outcome. It also differs from the 3N+1 problem in that all of the linear functions grow, thus there is no question of it ever getting smaller only how long will it grow before it stops?
Here are all the N values, and for the first 12 cases the step number (found by simulation) at which a row of N 1's is first seen. (The precise number of steps for each N can be calculated using a little more analysis, but it's not important for answering the halting question). Also given are the mod-4 value and the value of M. Notice in particular the fairly erratic pattern of mod-4 values.
| at step | N | i such that N = i mod 4 | M |
| 4 | 3 | 3 | 1 |
| 37 | 10 | 2 | 5 |
| 259 | 30 | 2 | 15 |
| 1661 | 80 | 0 | 40 |
| 10,224 | 203 | 3 | 101 |
| 63,057 | 510 | 2 | 255 |
| 392,779 | 1280 | 0 | 640 |
| 2,449,742 | 3203 | 3 | 1601 |
| 15,294,575 | 8010 | 2 | 4005 |
| 95,566,797 | 20,030 | 2 | 10,015 |
| 597,248,199 | 50,080 | 0 | 25,040 |
| 3,732,606,762 | 125,203 | 3 | 62,601 |
| 313,010 | 2 | 156,505 | |
| 782,530 | 2 | 391,265 | |
| 1,956,330 | 2 | 978,165 | |
| 4,890,830 | 2 | 2,445,415 | |
| 12,227,080 | 0 | 6,113,540 | |
| 30,567,703 | 3 | 15,283,851 | |
| 76,419,260 | 0 | 38,209,630 | |
| 191,048,153 | 1 | 95,524,076 |
It might seem a little disappointing to some readers to note that this machine, a record-setter for the number of 1's produced, actually produces as many as twice that record, only to cut the number almost in half just before halting.
Analysis of the Oct 2000 Marxen-Buntrock BB6 Record-Setter 'q'
This is a high-level description of an old BB(6) candidate champion, discovered in October 2000 and later surpassed on March 2001. It writes over 6×10462 ones before halting. For a more thorough and rigorous analysis read this page.
This machine operates on similar principles to the one just discussed, but instead of iterating a process of multiplying the number of 1's by 5/2, it iterates a process of raising 2 to the power of the number of 1's! Yes, that's right, iterated exponents. Combined with the modulo dependency mechanism (this time on a modulo-3 test rather than modulo-4) it's easy to see how it gets up to 10462 ones.
To analyze, we use the same method as before of finding a simple "canonical" state (like a run of 1's) and looking at what it produces when the number of 1's changes. In this case, the simplest such state is when the head is in state 2, positioned on a 0 cell, with N consecutive 1's just to the left of the head, and nothing else. Call such a state C(N).
Looking at the fates of the first few C(N)'s, we find:
C(0) becomes C(10) after 86 steps.
C(1) becomes C(24) after 408 steps.
C(2) halts after 8 steps, leaving 4 1's on the tape.
C(3) becomes C(28) after 544 steps.
C(4) becomes C(56) after 2098 steps.
C(5) halts after 11 steps, leaving 6 1's on the tape.
C(6) becomes C(64) after 2730 steps.
C(7) becomes C(120) after 9548 steps.
C(8) halts after 14 steps, leaving 8 1's on the tape.
C(9) becomes C(136) after 12260 steps.
C(10) becomes C(248) after 40806 steps.
C(11) halts after 17 steps, leaving 10 1's on the tape.
C(12) becomes C(280) after 52030 steps.
C(13) becomes C(504) after 168832 steps.
C(14) halts after 20 steps, leaving 12 1's on the tape.
C(15) becomes C(568) after 214488 steps.
It is clear right away that there is a modulo-3 effect going on. A general formula is only evident for the case of N=2 modulo 3:
C(3N+2) halts after 3N+8 steps, leaving (2(N+1)/3)+2 1's on the tape.
For the others, we have to consider another "canonical" state; this one is a bit more complex. It consists of bunch of groups of 1's separated by single 0's, with the head in state 2 and positioned just to the left of the leftmost 1. I will represent such a state as D(a,b,c,...) where the a, b, c etc. are numbers giving the number of 1's in each group. Each of the C(N)'s above (except the ones that halt) leads to one of these D(...) states, which then passes through one or more additional D(...) states and ultimately becomes another C() state. Here is the first D(...) state for each C() state:
C(0) becomes 111_1 which we call D(3,1)
C(1) becomes 111_1_1 which we call D(3,1,1)
C(3) becomes 1111_1_1 which we call D(4,1,1)
C(4) becomes 111_11_1_1 which we call D(3,2,1,1)
C(6) becomes D(4,2,1,1)
C(7) becomes D(3,2,2,1,1)
C(9) becomes D(4,2,2,1,1)
C(10) becomes D(3,2,2,2,1,1)
C(12) becomes D(4,2,2,2,1,1)
C(13) becomes D(3,2,2,2,2,1,1)
C(15) becomes D(4,2,2,2,2,1,1)
and in general:
C(3N) becomes D(4,X,1,1) where X is a string of N-1 2's in a row
C(3N+1) becomes D(3,X,1,1) where X is a string of N 2's in a row
Then we can look at each of the D's and see what happens:
D(3,1) becomes C(10)
D(3,1,1) becomes D(10,1)
D(10,1) becomes C(24)
D(4,1,1) becomes D(12,1)
D(12,1) becomes C(28)
D(3,2,1,1) becomes D(11,1,1)
D(11,1,1) becomes D(26,1)
D(26,1) becomes C(56)
D(4,2,1,1) becomes D(13,1,1)
D(13,1,1) becomes D(28,1)
D(28,1) becomes C(60)
The pattern that emerges is:
D(N,1) becomes C(2N+4)
D(N,1,1) becomes D(2N+4,1)
D(N,2,1,1) becomes D(2N+5,1,1)
and in general:
D(N,2,X) becomes D(2N+5,X)
where X is any sequence of groups.
So, to figure out how far this machine goes before halting, we have to find the first occurance of a C(N) state, then apply these rules to find each subsequent C(N) state until we get to one with N=2 modulo 3. Since the transition from each C(N) to the next higher C(N) involves approximately N/3 steps iterating 2N+4 or 2N+5 each step, it amounts to calculating (approximately) 2N/3 each time. If you count each C(N) as a "step", the machine only takes three "steps" to produce the huge number 6×10462.
C(1) occurs at step 1 of the machine's operation.
C(1) becomes D(3,1,1) which becomes C(24) at step 409.
C(24) = C(8×3), becomes D(4,2,2,2,2,2,2,2,1,1), which becomes C(4600).
C(4600) = C(1533×3+1), becomes D(3,X,1,1) where "X" is a string of 1533 2's in a row, which becomes C(B) where B is calculated by the following BASIC code:
or the following equivalent code for the bc tool in Unix:
The first few values of B are: 3, 11, 27, 59, 123, 251, ... It gets doubled another 1530 times, which produces a final value of B around 9×10462. As luck would have it, this value of B is equal to 2 modulo 3, so this C(N) is the last. The machine proceeds to cut the number of 1's by 2/3 and halt.
The final line of the sample-code prints the number of 1's that the machine will leave on the tape just before halting. That value is:
References:
Heiner Marxen, table of record setters
Heiner Marxen, table of BB(6) candidates found by a search in 2000
Robert Munafo, Detailed Analysis of the Oct 2000 Marxen-Buntrock BB(6) Record-Setter 'q'.
1. "Collatz mappings" are described here and also in the book "CRC Concise Encyclopedia of Mathematics".
1 :
I. N. Galidakis, formerly known to some as Ioannis, has been
exploring a number of related issues for several years. There is a lot
of work related to the Lambert W function, which is important in
anything involving coupled exponentials such as xx and xex,
or their inverses. An older version of real-valued hyper4 function
description is available on web.archive.org under the URL
http://users.forthnet.gr/ath/jgal/math/exponents4.html:>.
Robert Munafo's home pages at Pair Networks
© 1996-2008 Robert P. Munafo.
Email the author