| Large Numbers |
Back to page 3 . . . Forward to page 5
These numbers were constructed by Hugo Steinhaus and Leo Moser (in the late 1970's or earlier, exact date unknown) just to show that it is easy to create a notation for extremely large numbers. In the special "Steinhaus-Moser notation", numbers can have polygons (like triangles or squares) drawn around them:
X inside a triangle equals XX
X inside a square equals X inside X concentric triangles
X inside a pentagon equals X inside X concentric squares
and so on.
2 inside a pentagon is called mega, and 2 inside a mega-gon is called moser.
(In an earlier version, by Steinhaus before Moser got involved, the pentagon is replaced by a circle, so mega is 2 inside a circle, and there are no higher polygons so moser is unrepresentable. Steinhaus also defines megiston as 10 inside a circle. moser is much bigger than megiston. Both versions of the notation predate 1983; I learned about the moser at an academic library in June 1979.)
Here is the notation in functional form:
sm(a,n,p) = { a^a for n = 1, p = 3 { { sm(a,a,p-1) for n = 1, p > 3 { { sm(sm(a,1,p),n-1,p) for n > 1and here are a few values:
sm(2,1,3) = 2^2 = 4 sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256 sm(2,1,4) = sm(2,2,3) = 256 mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3) = sm(256^256,255,3) = sm((256^256)^(256^256),254,3) = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ... megiston = sm(10,1,5) moser = sm(2,1,Mega)We can approximate the sm function in terms of the hy function for small values of p:
sm(n,1,3) = n^n = hy(n,3,n) = hy(n,4,2) sm(n,2,3) = (n^n)^(n^n) = n^(n*n^n) = n^(n^(n+1)) ≅ hy(n,4,3+epsilon) sm(n,3,3) = sm(n,2,3)^sm(n,2,3) = (n^(n^(n+1)))^(n^(n^(n+1))) = n^(n^(n+1) * n^(n^(n+1))) = n^(n^(1+n+n^(n+1))) ≅ n^(n^(n^(n+1+epsilon))) ≅ hy(n,4,4+epsilon) by induction sm(n,x,3) ≅ hy(n,4,x+1+epsilon) sm(n,1,4) = sm(n,n,3) ≅ hy(n,4,n+1+epsilon) sm(n,2,4) = sm(sm(n,1,4),1,4) ≅ sm(hy(n,4,n+1+epsilon),1,4) ≅ hy(hy(n,4,n+1+epsilon),4,hy(n,4,n+1+epsilon)+1+epsilon) "intuitively", sm(n,1,4) ≅ hy(n,4,n+1+epsilon) sm(n,1,5) ≅ hy(n,5,n^2)The value of mega can be defined in BASIC (-: as follows:
let M = 256 for N=1 to 256 let M = M^M next N print "mega is", MThe first few steps of this generate the numbers:
256 = 28 = 223
256256 = 22048 = 2211 ≅ 3.231700607153×10616
(256256)(256256) = 222059 ≅ 10(1.992373902866×10619)
[(256256)(256256)][(256256)(256256)] =
222059+22059 ≅ 1010(1.992373902866×10619)
Each time through the loop there are twice as many 256's so there are 2256 256's in mega. However, the parentheses are grouped differently from the power towers discussed above. After two times through the loop, for example, it's (256256)^(256256). That is not as big as 256(256(256256)) the former is 10(1.992373902866×10619), the latter is 10[10(7.78271055807×10616)]. This discrepency continues, with the result that the mega is about 10^(10^(10^...^(1.992373902866×10619)...)), with 255 10's before the "1.992373902866×10619" part. For reasons explained here, that is equivalent to what you get if you replace all but the last few 10's with nearly any other number between say 2 and 1000. (You can do these calculations with hypercalc.)
In a 1998 paper14, Harvey Friedman describes the problem of finding the longest sequence of letters (where there are N allowed letters) such that no subsequence of letters i through 2i occurs anywhere further on in the sequence. For 1 letter the maximum length is 3: AAA. For 2 letters the longest sequence is 11: ABBBAAAAAAA. For 3 letters the longest sequence is very very long, but not infinite.
He then goes on to show how to construct proofs of lower bounds for N-character sequences using certain special (N+1)-character sequences. With help from R. Dougherty, he found a lower bound for the N=3 case, A7198(158386) = ack-rm(7198,158386) = ack(7198,2,158386) = hy(2,7199,158386) = 2(7199)158386, where x(7199)y represents the 7199th hyper operator.13
This number is an upper bound for a problem in Ramsey theory, (graph coloring, combinatorics). The problem is to determine the lowest dimension of a hypercube such that if the lines joining all pairs of corners are two-colored, a planar graph K4 of one color will be forced. (K4 is a totally-connected graph with 4 edges, topologically equivalent to the 4 vertices and 6 edges of a tetrahedron). Graham and Rothschild proved that an answer exists and the best upper bound they could find is Graham's Number. Graham's Number is gn(64), where gn() is defined as follows:
gn(n) = { hy(3,6,3) for n = 1 { { hy(3, gn(n-1)+2, 3) for n > 1
Todd Cesere and Tim Chow have both proven that Graham's Number is bigger than the Moser, and in fact even gn(2) is much bigger than the Moser. Tim's proof is outlined on a page by Susan Stepney, here.
Now let's take a break from the specific examples and functions for a moment to describe another type of "number class" that I have seen recently in the way people describe large numbers. We'll call these superclasses.
I first saw the concept in Eliezer Yudkowsky's description18 of Graham's number, apparently adapted from the very similar description by Martin Gardner16. I'm adapting it further to illustrate the superclasses.
(1) Let's start with 3↑3. This is 33 = 27, a number that is small enough for anyone to visualize.
(2) Next, consider 3↑↑3 = 3④3. This is 333 = 327 = 7625597484987, about 7 trillion. Even 7 billion would be too big for anyone to visualize, but most people can understand it as being comparable to something familiar. (For example, it is 1000 times more than the world population but about a tenth the number of cells in a human body.)
(3) 3↑↑↑3 or 3⑤3, is 3④(3④3), which is 3④7625597484987. That would be a power tower of 3's, 7 trillion levels high. Now we are far beyond the ability to understand nothing in the real world is this numerous, and even combinations of things (such as the number of ways to shuffle every particle in the known universe) only require, at most, 4 or 5 levels of a power-tower to express. However, even though the quantity is beyond understanding, the procedure for computing it can be visualized: Start with n=1. Now replace n with 3n. Replace n with 3n again, and so on repeat seven trillion times. This is a procedure that one can visualize doing, and it can even be completed in a reasonable amount of time using a fast computer and a suitable representation format such as level-index.
(4) Now consider 3↑↑↑↑3, or 3⑥3. This is 3⑤(3⑤3), which is 3④(3④(3④(...④(3④3))...)), a chain of 7 trillion 3's and hyper4 operators. Every one of these requires performing the exponent operation an unimaginable number of times. So now, the procedure is beyond human ability to visualize, although it can be understood. Start with n=3. Now replace n with an exponential tower of threes of height n. Repeat 3⑤3 - 2 more times, where 3⑤3 is the huge number whose calculation we visualized in the previous paragraph! Martin Gardner, also writing about Graham's Number, said:
3↑↑↑↑3 is unimaginably larger than 3↑↑↑3, but it is still small as finite numbers go, since most finite numbers are very much larger.
(5) We have just seen four numbers in a sequence: 3↑3, 3↑↑3, 3↑↑↑3, and 3↑↑↑↑3. Now consider the formula for Graham's number. Begin with x = 3↑↑↑↑3, the number whose calculation procedure cannot even be visualized then increase the number of up-arrows from 4 to x. Then increase the number of up-arrows to this new, larger value of x again. Then repeat 61 more times! Here's what Yudkowsky had to say:
Graham's number is far beyond my ability to grasp. I can describe it, but I cannot properly appreciate it. (Perhaps Graham can appreciate it, having written a mathematical proof that uses it.) This number is far larger than most people's conception of infinity. I know that it was larger than mine. ...
This example illustrates the numbers (including those on my numbers page and on this page up to this point) can be seen as going from one level of abstraction to another. Here are the definitions of the superclasses:
Superclass 1: The number can be visualized. (Example: 27)
Superclass 2: The number cannot be visualized but it can be understood. (Example: 3↑↑3 ≅ 7 trillion)
Superclass 3: The number cannot be understood, but the procedure for computing it can be visualized. (Example: 3↑↑↑3)
Superclass 4: The procedure cannot be visualized, but it can be understood. (3↑↑↑↑3)
Superclass 5: The procedure for generating the number is so abstract that it cannot be understood (by whoever's talking, by the author, by yourself, or perhaps by some imagined typical person).
To complete the analogy to the normal classes, I will also add:
Superclass 0: The number can be experienced (whatever that means) by unsentient animals. (Example: 2)
Like the classes, the superclasses have important qualities that distinguish them, and this insight is what makes "superclass" a useful distinction. Each requires an additional amount of mental effort to ensure that the number is understood well anough to answer questions such as which number is larger, or which algorithm grows faster.
Notice that the first three (superclass 0, 1 and 2) are roughly equivalent to class 0, class 1 and class 2 above. The division points between them will be similar for most readers, but the upper range of superclass 2 will probably vary from one reader to another. I think I "understand" numbers about as big as the size of the visible universe, although I find it harder on some days than on others. There are probably people who have an understanding of such things as the size of the universe after inflation, or the size that would be needed for a spontaneous origin of life.
The first three superclasses can all be calculated fairly easily, using familiar techniques or modest towers of exponents in the latter case sizes can easily be compared by looking at the number of levels in the tower and the value of the number at the top. Practical tools like hypercalc exist to actually work with these numbers.
Superclass 3 begins wherever your ability to "understand" ends. However, the procedure for computing it can still be visualized. Somewhere within superclass 3, the practical tools like hypercalc stop working. At the top end of superclass 3, we are well beyond the ability to work with actual numbers and have to start using symbols and words, because the numbers outstrip the available tools. However, it is still possible to make fairly simple proofs to show how one function converts to another. An example is proving that the higer-valued hyper4 operator a④b grows about as quickly as the lower-valued hyper5 operator a⑤b.
With superclass 4, such proofs become more difficult. This is why it is a bit tougher to compare Graham's number to the Moser. It can be worked out by most who have time and patience, and the explanation can be followed and understood fairly easy, but it probably isn't too easy to remember.
With Graham's number, and the others of its size and larger which we are about to get into, most readers will have stepped firmly into superclass 5. The last paragraph in the Yudkowsky quote captures what superclass 5 is about: perception and understanding are so difficult that it is hard even to compare the number to infinity. I cannot say where your superclass 5 begins, but once you reach it, it is better to stop trying to grasp the quantities in any direct way and instead use rigorous deduction techniques, or trust others who have done so.
I have found my own boundary between 4 and 5 varies widely from one day to another. Clearly there are those (such as the people who worked on the numbers we are about to discuss) for whom "superclass 4" extends far higher than mine. I suggest it might be useful to define:
Superclass 6: The number is so big that no-one can understand its definition well enough to know anything useful about it.
Conway's Chained Arrow Notation
This notation is based on Knuth's up-arrow notation. In this notation, three or more numbers are joined by right-arrows. The arrow is not an ordinary dyadic operator. You can have three or more numbers joined by arrows, in which case the arrows don't act separately, the whole chain has to be considered as a unit. It might be thought of as a function with a variable number of arguments, or perhaps a function whose single argument is an ordered list or vector.
Here is Conway's notation, adapted from Susan Stepney's excellent web page on large numbers:
A. Three-number chains are equivalent to the generalized hyper operator:
a→b→c = hy(a, c+2, b) = a(c+2)b
= a ^^..^^ b, with c up-arrows
B. A chain with a final 1 is equivalent to the same chain with the 1 removed:
a→b→...→x→1 = a→b→...→x
C. Conway is silent about the meaning of a two-element chain10, but a definition is necessary. The most logical definition combines the two previous rules and assumes that they work in reverse:
ab = a→b→1 (by rule A)
= a→b (by rule B)
therefore the reverse is true: a→b = ab
D. The last two numbers in a chain can be dropped if the second-to-last is a 1:
a→b→...→x→1→(z + 1) = a→b→...→x
E. The last number in the chain can be reduced in size by 1 by taking the second-to-last number and replacing it with a copy of the entire chain with its second-to-last number reduced by 1:
a→b→...→x→ (y+1) → (z+1)
= a→b→...→x→ (a→b→...→x→y→(z+1)) →z
F. If the last number in the chain is 2, rules D and E combine into:
a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→y→2) →1
= a→b→...→x→ (a→b→...→x→y→2)
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2) →1 )
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2))
and in general,
a→b→...→x→ (y+1) → 2
= a→b→...→x→ ( y times
a→b→...→x
) y times
G. A more general expansion of rule E follows:
a→b→...→x→2→ (z+1)
= a→b→...→x→ (a→b→...→x→1→(z+1)) →z
= a→b→...→x→ (a→b→...→x ) →z
a→b→...→x→3→ (z+1)
= a→b→...→x→ (a→b→...→x→2→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z
a→b→...→x→4→ (z+1)
= a→b→...→x→ (a→b→...→x→3→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z) →z
and in general,
a→b→...→x→ (y+1) → (z+1)
= a→b→...→x→ ( y times
a→b→...→x
) →z y times
The parentheses can only be removed after the chain inside the parentheses has been evaluated into a single number. Remember, the arrows are all taken together at once. You can't group them to evaluate, you can only reduce terms off the end using one of the rules above. a→b→c→d is not equivalent to (a→b→c)→d, a→(b→c→d), a→(b→c)→d, a→b→(c→d) or anything else like that.
Ackermann's function is equivalent to a 3-element chain:
A(m, n) = ( 2 → (n + 3) → (m - 2) ) - 3
It can also be shown that Graham's number is bigger than 3→3→64→2 and smaller than 3→3→65→2. See here for details.
Jonathan Bowers' Extended Operators
Jonathan Bowers has defined a whole series of notations that surpass everything mentioned so far except perhaps the Conway chained-arrow notation. We can start by defining his extended operators, which are kind of analagous to the hyper operators. In fact, he starts with the hyper operators, which can be thought of as the original, "unextended" set of operators. Instead of putting the operator number inside a raised circle, he uses a pair of "angle-brackets":
a <n> b = aⓝb = hy(a,n,b)
So, <1> is addition, a <1> b = a+b; <2> is multiplication and so on.
Using this notation makes it easier to make the operator itself a variable or expression, and unlike using the hy() function it retains the look of applying an operator (because the operator part is in the middle where it "belongs". For example:
a <1+2> b = a <3> b = ab
gn(1) = 3 <6> 3
gn(2) = 3 <3 <6> 3> 3
Mega ≅ 256 <4> 2
Moser ≅ Mega <Mega+1> 2 ≅ (256 <4> 2) <256 <4> 2> 2
Here is his first extended operator:
a <<1>> 2 = a <a> a
a <<1>> 3 = a <a <a> a> a
a <<1>> 4 = a <a <a <a> a> a> a
and so on
a <<1>> b is "a expanded to b".
If you wish, you might prefer to represent this operator in a way similar to the higher hyper operators with a 1 inside two circles: a⓵b. Since the two circles are hard to see in normal font sizes, I'll instead use the hyper1 operator symbol ① inside a set of parentheses: a(①)b.
Using this notation, Graham's number is shown to be between 3(①)65 and 3(①)66 (the gn(x) function is as defined here):
3(①)2 = 3 <3> 3
gn(1) = 3 <6> 3
3(①)3 = 3 <3 <3> 3> 3 = 3 <27> 3
gn(2) = 3 <3 <6> 3> 3
and so on
3(①)65
gn(64) = Graham's number
3(①)66
Going back to Bowers' notation, here is the definition of the next extended operator. It is a right-associative recursive iteration of the first:
a <<2>> 2 = a <<1>> a
a <<2>> 3 = a <<1>> (a <<1>> a)
a <<2>> 4 = a <<1>> (a <<1>> (a <<1>> a))
and so on
a <<2>> b = a <<1>> (a <<2>> (b-1))
a <<2>> b is called "a multiexpanded to b".
Note the similarity of the definition a <<2>> b = a <<1>> (a <<2>> (b-1)) to the corresponding definition for a hyper operator, e.g. a②b = a①(a②(b-1)).
The subsequent extended operators are defined in a similar way, each in terms of the previous.
a <<3>> b is called "a powerexpanded to b".
a <<4>> b is called "a expandotetrated to b".
Then Bowers defines a third series of extended operators, in a similar way:
a <<<1>>> 2 = a <<a>> a
a <<<1>>> 3 = a <<a <<a>> a>> a
a <<<1>>> 4 = a <<a <<a <<a>> a>> a>> a
and so on
a <<<1>>> b is "a exploded to b".
then the next operator:
a <<<2>>> 2 = a <<<1>>> a
a <<<2>>> 3 = a <<<1>>> (a <<<1>>> a)
a <<<2>>> 4 = a <<<1>>> (a <<<1>>> (a <<<1>>> a))
and so on
a <<<2>>> b is called "a multiexploded to b".
and so on:
a <<<3>>> b is called "a powerexploded to b".
a <<<4>>> b is called "a explodotetrated to b".
You can see that this generalizes easily to a function of four numbers, a, b, the number inside the angle-brackets and a number telling how many angle-brackets there are. This can be written as a function, f(a,b,c,d) or something like that but Bowers wanted to go further.
Back to page 3 . . . Forward to page 5