Large Numbers


First page . . . Back to page 3 . . . Forward to page 5 . . . Last page (page 7)


The Steinhaus-Moser-Ackermann operators

The Ackermann function and the Steinhaus-Moser notation are both equivalent to a triadic operator that is somewhat more powerful than the hy(a,b,c) function above. The Ackermann function and Steinhaus-Moser are roughly equivalent to each other so we'll discuss them together.

Ackermann's Function

A recursive function first described by W. Ackermann in 1928 to demonstrate a property of computability in the field of mathematics, and also used more recently as an example of pathological recursive functions in computer science. There are many different versions of the function; for a complete description of each go here. I will use the version that is the simplest to convert to the hyper operators:

ack-b(a,b) = { y+1 , for x=0 { { 2 , for x=1, y=0 { { 0 , for x=2, y=0 { { 1 , for x>2, y=0 { { ack-b(x-1,ack-b(x,y-1)) for x,y > 0 ack-rm(a,b) = { 2b for a = 1 { { 2 for b = 1, a > 1 { { ack-rm(a-1, ack-rm(a,b-1)) for a,b > 1

which yields to analysis as follows:

ack-rm(1,b) = 2b ack-rm(a,1) = 2 ack-rm(2,b) = ack-rm(1, ack-rm(2,b-1)) = 2*ack-rm(2,b-1) and by induction, ack-rm(2,b) = 2^b ack-rm(3,b) = ack-rm(2, ack-rm(3,b-1)) = 2^ack-rm(3,b-1) and by induction, ack-rm(3,b) = 2b ack-rm(4,b) = ack-rm(3, ack-rm(4,b-1)) = 2^^ack-rm(4,b-1) and by induction, ack-rm(4,b) = 2b and by induction, ack-rm(a,b) = hy(2,a+1,b)

The example value most commonly cited is ack-rm(3,5), 25 which is 265536, a large class-2 number. Of course, as with Steinhaus-Moser notation it is easy to transcend the classes entirely.

At this point it is tempting to try to avoid the "triadic function requirement" noted above by defining a single-variable function, such as:

a1(n) = ack-h(n,n,n)

While it is true that a1(x) grows just as fast as the ack-h() function, and therefore serves as a good way of defining large numbers as a function of one variable, actually computing those numbers involves the recursive definition of the function. If x>1, we have:

a1(x) = ack-h(x,x,x) = ack-h(x-1, x, ack-h(x,x,x-1))

note that the arguments of the two ack-h functions on the right are not equal to each other, and therefore we can't substitute from the definition of a1(n) to make the right side be in terms of the a1() function.

However, as seen above it is possible to reduce the Ackermann function to two arguments. Furthermore, it is the fastest-growing function you can get using two arguments, if the function is defined only in terms of calls to itself and the "successor function" f(x)=x+1.

The Mega and the Moser

These numbers were constructed by Hugo Steinhaus and Leo Moser (in the late 1970's or earlier, exact date unknown) just to show that it is easy to create a notation for extremely large numbers. In the special "Steinhaus-Moser notation", numbers can have polygons (like triangles or squares) drawn around them:

X inside a triangle equals XX

X inside a square equals X inside X concentric triangles

X inside a pentagon equals X inside X concentric squares

and so on.

2 inside a pentagon is called mega, and 2 inside a mega-gon is called moser.

(In an earlier version, by Steinhaus before Moser got involved, the pentagon is replaced by a circle, so mega is 2 inside a circle, and there are no higher polygons so moser is unrepresentable. Steinhaus also defines megiston as 10 inside a circle. moser is much bigger than megiston. Both versions of the notation predate 1983; I learned about the moser at an academic library in June 1979.)

Here is the notation in functional form:

sm(a,n,p) = { a^a for n = 1, p = 3 { { sm(a,a,p-1) for n = 1, p > 3 { { sm(sm(a,1,p),n-1,p) for n > 1

and here are a few values:

sm(2,1,3) = 2^2 = 4 sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256 sm(2,1,4) = sm(2,2,3) = 256 mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3) = sm(256^256,255,3) = sm((256^256)^(256^256),254,3) = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ... megiston = sm(10,1,5) moser = sm(2,1,Mega)

We can approximate the sm function in terms of the hy function for small values of p:

sm(n,1,3) = n^n = hy(n,3,n) = hy(n,4,2) sm(n,2,3) = (n^n)^(n^n) = n^(n*n^n) = n^(n^(n+1)) ≅ hy(n,4,3+epsilon) sm(n,3,3) = sm(n,2,3)^sm(n,2,3) = (n^(n^(n+1)))^(n^(n^(n+1))) = n^(n^(n+1) * n^(n^(n+1))) = n^(n^(1+n+n^(n+1))) ≅ n^(n^(n^(n+1+epsilon))) ≅ hy(n,4,4+epsilon) by induction sm(n,x,3) ≅ hy(n,4,x+1+epsilon) sm(n,1,4) = sm(n,n,3) ≅ hy(n,4,n+1+epsilon) sm(n,2,4) = sm(sm(n,1,4),1,4) ≅ sm(hy(n,4,n+1+epsilon),1,4) ≅ hy(hy(n,4,n+1+epsilon),4,hy(n,4,n+1+epsilon)+1+epsilon) "intuitively", sm(n,1,4) ≅ hy(n,4,n+1+epsilon) sm(n,1,5) ≅ hy(n,5,n^2)

The value of mega can be defined in BASIC (-: as follows:

let M = 256 for N=1 to 256 let M = M^M next N print "mega is", M

The first few steps of this generate the numbers:

256 = 28 = 223
256256 = 22048 = 2211 ≅ 3.231700607153×10616
(256256)(256256) = 222059 ≅ 10(1.992373902866×10619)
[(256256)(256256)][(256256)(256256)] = 222059+22059 ≅ 1010(1.992373902866×10619)

Each time through the loop there are twice as many 256's — so there are 2256 256's in mega. However, the parentheses are grouped differently from the power towers discussed above. After two times through the loop, for example, it's (256256)^(256256). That is not as big as 256(256(256256)) — the former is 10(1.992373902866×10619), the latter is 10[10(7.78271055807×10616)]. This discrepency continues, with the result that the mega is about 10^(10^(10^...^(1.992373902866×10619)...)), with 255 10's before the "1.992373902866×10619" part. For reasons explained here, that is equivalent to what you get if you replace all but the last few 10's with nearly any other number between say 2 and 1000. (You can do these calculations with hypercalc.)

Friedman Sequences

In a 1998 paper14, Harvey Friedman describes the problem of finding the longest sequence of letters (where there are N allowed letters) such that no subsequence of letters i through 2i occurs anywhere further on in the sequence. For 1 letter the maximum length is 3: AAA. For 2 letters the longest sequence is 11: ABBBAAAAAAA. For 3 letters the longest sequence is very very long, but not infinite.

He then goes on to show how to construct proofs of lower bounds for N-character sequences using certain special (N+1)-character sequences. With help from R. Dougherty, he found a lower bound for the N=3 case, A7198(158386) = ack-rm(7198,158386) = ack(7198,2,158386) = hy(2,7199,158386) = 2(7199)158386, where x(7199)y represents the 7199th hyper operator.13

Graham's Number

This number is an upper bound for a problem in Ramsey theory, (graph coloring, combinatorics). The problem is to determine the lowest dimension of a hypercube such that if the lines joining all pairs of corners are two-colored, a planar graph K4 of one color will be forced. (K4 is a totally-connected graph with 4 edges, topologically equivalent to the 4 vertices and 6 edges of a tetrahedron). Graham and Rothschild proved that an answer exists and the best upper bound they could find is Graham's Number. Graham's Number is gn(64), where gn() is defined as follows:

gn(n) = { hy(3,6,3) for n = 1 { { hy(3, gn(n-1)+2, 3) for n > 1

Todd Cesere and Tim Chow have both proven that Graham's Number is bigger than the Moser, and in fact even gn(2) is much bigger than the Moser. Tim's proof is outlined on a page by Susan Stepney, here.

Superclasses

Now let's take a break from the specific examples and functions for a moment to describe another type of "number class" that I have seen recently in the way people describe large numbers. We'll call these superclasses.

I first saw the concept in Eliezer Yudkowsky's description18 of Graham's number, apparently adapted from the very similar description by Martin Gardner16. I'm adapting it further to illustrate the superclasses.

(1) Let's start with 3↑3. This is 33 = 27, a number that is small enough for anyone to visualize.

(2) Next, consider 3↑↑3 = 33. This is 333 = 327 = 7625597484987, about 7 trillion. Even 7 billion would be too big for anyone to visualize, but most people can understand it as being comparable to something familiar. (For example, it is 1000 times more than the world population but about a tenth the number of cells in a human body.)

(3) 3↑↑↑3 or 33, is 3(33), which is 37625597484987. That would be a power tower of 3's, 7 trillion levels high. Now we are far beyond the ability to understand — nothing in the real world is this numerous, and even combinations of things (such as the number of ways to shuffle every particle in the known universe) only require, at most, 4 or 5 levels of a power-tower to express. However, even though the quantity is beyond understanding, the procedure for computing it can be visualized: Start with n=1. Now replace n with 3n. Replace n with 3n again, and so on — repeat seven trillion times. This is a procedure that one can visualize doing, and it can even be completed in a reasonable amount of time using a fast computer and a suitable representation format such as level-index.

(4) Now consider 3↑↑↑↑3, or 33. This is 3(33), which is 3(3(3(...(33))...)), a chain of 7 trillion 3's and hyper4 operators. Every one of these requires performing the exponent operation an unimaginable number of times. So now, the procedure is beyond human ability to visualize, although it can be understood. Start with n=3. Now replace n with an exponential tower of threes of height n. Repeat 33 - 2 more times, where 33 is the huge number whose calculation we visualized in the previous paragraph! Martin Gardner, also writing about Graham's Number, said:

3↑↑↑↑3 is unimaginably larger than 3↑↑↑3, but it is still small as finite numbers go, since most finite numbers are very much larger.

(5) We have just seen four numbers in a sequence: 3↑3, 3↑↑3, 3↑↑↑3, and 3↑↑↑↑3. Now consider the formula for Graham's number. Begin with x = 3↑↑↑↑3, the number whose calculation procedure cannot even be visualized — then increase the number of up-arrows from 4 to x. Then increase the number of up-arrows to this new, larger value of x again. Then — repeat 61 more times! Here's what Yudkowsky had to say:

Graham's number is far beyond my ability to grasp. I can describe it, but I cannot properly appreciate it. (Perhaps Graham can appreciate it, having written a mathematical proof that uses it.) This number is far larger than most people's conception of infinity. I know that it was larger than mine. ...

This example illustrates the numbers (including those on my numbers page and on this page up to this point) can be seen as going from one level of abstraction to another. Here are the definitions of the superclasses:

Superclass 1: The number can be visualized. (Example: 27)

Superclass 2: The number cannot be visualized but it can be understood. (Example: 3↑↑3 ≅ 7 trillion)

Superclass 3: The number cannot be understood, but the procedure for computing it can be visualized. (Example: 3↑↑↑3)

Superclass 4: The procedure cannot be visualized, but it can be understood. (3↑↑↑↑3)

Superclass 5: The procedure for generating the number is so abstract that it cannot be understood (by whoever's talking, by the author, by yourself, or perhaps by some imagined typical person).

To complete the analogy to the normal classes, I will also add:

Superclass 0: The number can be experienced (whatever that means) by unsentient animals. (Example: 2)

Like the classes, the superclasses have important qualities that distinguish them, and this insight is what makes "superclass" a useful distinction. Each requires an additional amount of mental effort to ensure that the number is understood well anough to answer questions such as which number is larger, or which algorithm grows faster.

Notice that the first three (superclass 0, 1 and 2) are roughly equivalent to class 0, class 1 and class 2 above. The division points between them will be similar for most readers, but the upper range of superclass 2 will probably vary from one reader to another. I think I "understand" numbers about as big as the size of the visible universe, although I find it harder on some days than on others. There are probably people who have an understanding of such things as the size of the universe after inflation, or the size that would be needed for a spontaneous origin of life.

The first three superclasses can all be calculated fairly easily, using familiar techniques or modest towers of exponents — in the latter case sizes can easily be compared by looking at the number of levels in the tower and the value of the number at the top. Practical tools like hypercalc exist to actually work with these numbers.

Superclass 3 begins wherever your ability to "understand" ends. However, the procedure for computing it can still be visualized. Somewhere within superclass 3, the practical tools like hypercalc stop working. At the top end of superclass 3, we are well beyond the ability to work with actual numbers and have to start using symbols and words, because the numbers outstrip the available tools. However, it is still possible to make fairly simple proofs to show how one function converts to another. An example is proving that the higer-valued hyper4 operator ab grows about as quickly as the lower-valued hyper5 operator ab.

With superclass 4, such proofs become more difficult. This is why it is a bit tougher to compare Graham's number to the Moser. It can be worked out by most who have time and patience, and the explanation can be followed and understood fairly easy, but it probably isn't too easy to remember.

With Graham's number, and the others of its size and larger which we are about to get into, most readers will have stepped firmly into superclass 5. The last paragraph in the Yudkowsky quote captures what superclass 5 is about: perception and understanding are so difficult that it is hard even to compare the number to infinity. I cannot say where your superclass 5 begins, but once you reach it, it is better to stop trying to grasp the quantities in any direct way and instead use rigorous deduction techniques, or trust others who have done so.

I have found my own boundary between 4 and 5 varies widely from one day to another. Clearly there are those (such as the people who worked on the numbers we are about to discuss) for whom "superclass 4" extends far higher than mine. I suggest it might be useful to define:

Superclass 6: The number is so big that no-one can understand its definition well enough to know anything useful about it.

In other words, superclass 6 is finite, but so large that no person, nor humanity as a whole, stands a chance of discovering anything useful about it. This happens with specific values of the Busy Beaver function beyond a certain point.

Conway's Chained Arrow Notation

This notation is based on Knuth's up-arrow notation. In this notation, three or more numbers are joined by right-arrows. The arrow is not an ordinary dyadic operator. You can have three or more numbers joined by arrows, in which case the arrows don't act separately, the whole chain has to be considered as a unit. It might be thought of as a function with a variable number of arguments, or perhaps a function whose single argument is an ordered list or vector.

Here is Conway's notation, adapted from Susan Stepney's excellent web page on large numbers:

A. Three-number chains are equivalent to the generalized hyper operator:

a→b→c = hy(a, c+2, b) = a(c+2)b
= a ^^..^^ b, with c up-arrows

B. A chain with a final 1 is equivalent to the same chain with the 1 removed:

a→b→...→x→1 = a→b→...→x

C. Conway is silent about the meaning of a two-element chain10, but a definition is necessary. The most logical definition combines the two previous rules and assumes that they work in reverse:

ab = a→b→1 (by rule A)
   = a→b (by rule B)
therefore the reverse is true: a→b = ab

D. The last two numbers in a chain can be dropped if the second-to-last is a 1:

a→b→...→x→1→(z + 1) = a→b→...→x

E. The last number in the chain can be reduced in size by 1 by taking the second-to-last number and replacing it with a copy of the entire chain with its second-to-last number reduced by 1:

a→b→...→x→ (y+1) → (z+1)
= a→b→...→x→ (a→b→...→x→y→(z+1)) →z

F. If the last number in the chain is 2, rules D and E combine into:

a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→y→2) →1
= a→b→...→x→ (a→b→...→x→y→2)
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2) →1 )
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2))

and in general,

a→b→...→x→ (y+1) → 2
= a→b→...→x→ (     y times
         a→b→...→x
             ) y times

G. A more general expansion of rule E follows:

a→b→...→x→2→ (z+1)
= a→b→...→x→ (a→b→...→x→1→(z+1)) →z
= a→b→...→x→ (a→b→...→x ) →z

a→b→...→x→3→ (z+1)
= a→b→...→x→ (a→b→...→x→2→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z

a→b→...→x→4→ (z+1)
= a→b→...→x→ (a→b→...→x→3→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z) →z

and in general,

a→b→...→x→ (y+1) → (z+1)
= a→b→...→x→ (     y times
         a→b→...→x
             ) →z     y times

The parentheses can only be removed after the chain inside the parentheses has been evaluated into a single number. Remember, the arrows are all taken together at once. You can't group them to evaluate, you can only reduce terms off the end using one of the rules above. a→b→c→d is not equivalent to (a→b→c)→d, a→(b→c→d), a→(b→c)→d, a→b→(c→d) or anything else like that.

Ackermann's function is equivalent to a 3-element chain:

A(m, n) = ( 2 → (n + 3) → (m - 2) ) - 3

It can also be shown that Graham's number is bigger than 3→3→64→2 and smaller than 3→3→65→2. See here for details.


First page . . . Back to page 3 . . . Forward to page 5 . . . Last page (page 7)



If you like this you might also enjoy my numbers page.

Robert Munafo's home pages on HostMDS   (c) 1996-2010 Robert P. Munafo.   about   contact

This work is licensed under a Creative Commons Attribution 2.5 License. Details here s.13