Proceed to Safety

Large Numbers    

First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 11)

The Mega and the Moser

These numbers were constructed by Hugo Steinhaus and Leo Moser (in the late 1960's or earlier, exact dates unknown) just to show that it is easy to create a notation for extremely large numbers.

In the earlier "Steinhaus polygon notation", there were 3 special symbols: a triangle, a square, and a circle.

X inside a triangle equals XX
X inside a square equals X inside X concentric triangles
X inside a circle equals X inside X concentric squares

Steinhaus called 2 inside a circle "Mega" and 10 in a circle "Megiston".

(These descriptions are in the 1969 edition of the book "Mathematical Snapshots"; I learned about Megiston at a university library in June 1979.)

Later, in what is now called "Steinhaus-Moser notation", the circle was replaced with a pentagon and higher-order polygons (hexagons, etc.) were added to the system.

As before, X inside a triangle equals XX
As before, X inside a square equals X inside X concentric triangles
X inside a pentagon equals X inside X concentric squares
X inside a hexagon equals X inside X concentric pentagons
and so on.

The "Mega" is now represented by 2 inside a pentagon, and "Megiston" is 10 inside a pentagon. A new, much larger quantity called "Moser's number" is "2 inside a megagon", where a "megagon" is a polygon with a Mega sides.

Here is the notation in functional form:

sm(a,n,p) = { a^a for n = 1, p = 3 { { sm(a,a,p-1) for n = 1, p > 3 { { sm(sm(a,1,p),n-1,p) for n > 1

and here are a few values:

sm(2,1,3) = 2^2 = 4 sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256 sm(2,1,4) = sm(2,2,3) = 256 mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3) = sm(256^256,255,3) = sm((256^256)^(256^256),254,3) = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ... megiston = sm(10,1,5) moser = sm(2,1,Mega)

We can approximate the sm function in terms of the hy function for small values of p:

sm(n,1,3) = n^n = hy(n,3,n) = hy(n,4,2) sm(n,2,3) = (n^n)^(n^n) = n^(n*n^n) = n^(n^(n+1)) ~= hy(n,4,3+epsilon) sm(n,3,3) = sm(n,2,3)^sm(n,2,3) = (n^(n^(n+1)))^(n^(n^(n+1))) = n^(n^(n+1) * n^(n^(n+1))) = n^(n^(1+n+n^(n+1))) ~= n^(n^(n^(n+1+epsilon))) ~= hy(n,4,4+epsilon) by induction sm(n,x,3) ~= hy(n,4,x+1+epsilon) sm(n,1,4) = sm(n,n,3) ~= hy(n,4,n+1+epsilon) sm(n,2,4) = sm(sm(n,1,4),1,4) ~= sm(hy(n,4,n+1+epsilon),1,4) ~= hy(hy(n,4,n+1+epsilon),4,hy(n,4,n+1+epsilon)+1+epsilon)    "intuitively", sm(n,1,4) ~= hy(n,4,n+1+epsilon) sm(n,1,5) ~= hy(n,5,n^2)

The value of Mega can be computed by hypercalc's internal BASIC interpreter with the following code:

10 let mega = 256; 20 for n=1 to 256; 40 let mega = mega ^ mega; 80 next n 160 print "Mega = ", mega 320 end

The first few steps of this generate the numbers:

256 = 28 = 223
256256 = 22048 = 2211 ≈ 3.231700607153×10616
(256256)(256256) = 222059 ≈ 10(1.992373902866×10619)
[(256256)(256256)][(256256)(256256)] = 222059+22059 ≈ 1010(1.992373902866×10619)

Each time through the loop there are twice as many 256's — so there are 2256 256's in mega. However, the parentheses are grouped differently from the power towers discussed above. After two times through the loop, for example, it's (256256)^(256256). That is not as big as 256(256(256256)) — the former is 10(1.992373902866×10619), the latter is 10[10(7.78271055807×10616)]. This discrepancy continues, with the result that the mega is about 10^(10^(10^...^(1.992373902866×10619)...)), with 255 10's before the "1.992373902866×10619" part. For reasons explained here, that is equivalent to what you get if you replace all but the last few 10's with nearly any other number between say 2 and 1000. hypercalc's final answer is:

255 PT ( 1.992373902865×10619 )

which represents a power tower with 255 powers of 10 ("255 P.T.") and 1.992373...×10619 at the top.


Fast Growing Hierarchies

We have a lot of larger finite and well-defined numbers to come. As we proceed it becomes steadily harder to put them in ascending order. This is where it becomes really useful to use a formally-defined function hierarchy, as discussed earlier.

The (unfortunately somewhat poorly-named) phrase "fast growing hierarchy" refers to a family of construction and proof techniques that together can be used to demonstrate a lot of useful results that are important for working out the relative sizes of the large numbers to be discussed below.

Many examples of function hierarchies are possible, depending on the purpose for which they are to be used. An often-used example is the Hardy hierarchy, which relates closely to the Goodstein sequences we will discuss next. Here is how it begins:

f0(n) = n+1
f1(n) = f0n(n) = n×2
f2(n) = f1n(n) = n×2n
f3(n) = f2n(n) = n×2n×2n×2n ...
. . .

These functions fall into a pattern that should look familiar:

*f0() performs addition K+n,
f1() performs multiplication K  n,
f2() slightly exceeds performs exponentiation Kn,
f3() slightly exceeds tetration or "hyper4" kn
. . .

And the next functions after that are equivalent in growth rate to the higher "hyper" operators.

As in our earlier examples there are an infinite number of functions with integer indices like this followed by many more with cardinal infinite indices.

This function hierarchy is quite well described in the Numberphile video TREE vs Graham's number.

That's enough on this for now, but we'll revisit this rich topic later.


Goodstein Sequences

Reuben Goodstein in 1944 proved [30] a remarkable theorem about the finite limit of an iterative process that seems to most observers to be destined to go on forever.

Higher than any of the finite-indexed fn() functions, but not quite as high as the single-argument Ackermann function, is gw(n), the length of the weak Goodstein sequence starting with n. After we discuss that we'll move on to other things, and eventually come back to Goodstein sequences again using the "strong" definition.

"Goodstein sequence of n" = the sequence of integers constructed as in Goodstein's Theorem from a starting value of n.

The length of a Goodstein sequence, and the highest value achieved, are both guaranteed to be finite, and are given by a fast-growing function in n. Two definitions are common:

gw(n) = highest value in the Goodstein sequence of n.

gw(n) = number of terms in the Goodstein sequence of n (counting from the initial term equal to n and the first term equal to 0); or equivalently: the value of the base b of the term with a value of 1.

There are two ways to do this, "weak" and "strong". We'll start with the "weak" version, where the numbers grow less quickly and the sequence gets to 0 sooner.

Start with a number, like 7; and start with base 2. Write out the number "in base 2" by breaking it up into a sum of "digits" multiplied by powers of 2:

7 = 1×22 + 1×21 + 1×20

Now "increase the base to 3" by changing the 2's to 3's (but leave the exponents alone):

      1×32 + 1×31 + 1×30

Now interpret this as a number, which is the new (larger) value:

1×32 + 1×31 + 1×30 = 13

Now subtract one:

12 = 1×32 + 1×31 + 0×30

12 is the second term in the sequence, and 3 is the base.

Now repeat the process: Change the 3's to 4's, compute a new value, and subtract one:

      1×42 + 1×41 + 0×40
1×42 + 1×41 + 0×40 = 20
19 = 1×42 + 0×41 + 3×40    ←*

Repeating this a few more times, we get:

27 = 1×52 + 0×51 + 2×50
37 = 1×62 + 0×61 + 1×60
49 = 1×72 + 0×71 + 0×70
63 =             7×81 + 7×80    ←*

Clearly every time we make a new term in the sequence, the base is increasing by one. This means that if we want to know the length of the sequence we can just look at the value of the base. We'll do this because it makes it easier to work out how long these sequences will be. It also means:

Every time the b0 term of the sum decreases, b becomes b+1

We're just adding one, but we will say that there is a function f(b) = b+1. Expressing it that way is more important than it may seem at the moment. Let's do the next 8 steps:

69 =             7×91 + 6×90
75 =             7×101 + 5×100
81 =             7×111 + 4×110
87 =             7×121 + 3×120
93 =             7×131 + 2×130
99 =             7×141 + 1×140
105 =             7×151 + 0×150
111 =             6×161 + 15×160    ←*

Also, at this point, notice the lines marked with "←*". These are times where the rightmost "digit", the b0 term of the sum, has switched from 0×b0 to (b-1)b0. Notice that the base is 4 the first time, and it is 8 the next time, and then 16 the time after that. It doubles each time.

Here is another example, starting with 27 (in base 2, 110112) and writing things more concisely with digits and the base as a subscript:

                     110112 = 2710
110113 = 11210. 110103 = 11110
110104 = 32410. 110034 = 32310    ←*
110035 = 75310. 110025 = 75210
110026 = 151410. 110016 = 151310
110017 = 274510.
110008 = 460810. 107778 = 460710    ←*
107779 = 719810.
1077610 = 10776
1077511 = 1557010
1077412 = 2183210
1077313 = 2983810
1077214 = 3988810
1077115 = 5230610
1077016 = 6744010. 1076F16 = 6743910    ←*
1076F17 = 8566110

Again the lines marked with "←*" show where the lowest digit was a zero, and again the base doubles from 4 to 8 and then to 16.

This doubling is a universal phenomenon; it happens in any Goodstein sequence even if we were to start with a base other than 2. The reason is pretty easy to see: When it is 0×b0, we're going to need to subtract one from the next "digit" to the left, which takes one step to do, and then we have b-1 in the rightmost "digit" and it will therefore take another b-1 steps to get this down to zero. So we spend exactly b steps, and during this time the base goes up from b to 2b.

Every time we go from ... + k×b1 + 0×b0 to ... + (k-1)×b1 + 0×b0, b becomes 2b

This time there is a function g(b) = 2b, which is equivalent to repeating the function f(b) = b+1 a total of b times.

For similar reasons, the number of steps needed each time a b2 digit decreases are expressed by a function h(b) that is a repetition of the g() function:

Every time we go from ... + k×b2 + 0×b1 + 0×b0 to (k-1)×b2 + 0×b1 + 0×b0, b becomes 2bb.

The new function h(b) is 2bb, which is exactly what you expect to get if you start with b and double it b times.

The f(), g(), and h() functions we have defined so far fall into a pattern that should look familiar:

f(n) = n+1 = f0(n)
g(n) = 2n = f1(n)
h(n) = 2nn = f2(n)

These are none other than the first three functions of the fast-growing hierarchy described earlier.

Of course, the next-higher digits will grow the sequence length at a rate like that of tetration and the higher "hyper" operators. In general, if our starting value is 2n then the sequence will start with 1111113 (with n 1's) and the length of the sequence will involve the nth hyper operator. Very approximately,

gw(2n) ≈ 2↑↑...↑(2n)       (with n-2 arrows)
            = hy(2, n, 2n)

The much more slowly growing "Goodstein numbers" gn(n), as seen in Sloane's sequence A266202, considers the nth term in the "Goodstein sequence of n" as defined above. So for example, when we started with 7 and iterated 7 times we had 69 = 7×91, so 69 is the 7th member of sequence A266202 (the starting 0 is considered the 0th member of the sequence). The corresponding sequence for the "strong" definition of Goodstein iteration is A266201, and there the 7th term is 37665879.

For more, you can skip ahead a bit to Goodstein sequences (strong).

First page . . . Back to page 4 . . . Forward to page 6 . . . Last page (page 11)

Japanese readers should see: 巨大数論 (from @kyodaisuu on Twitter)

If you like this you might also enjoy my numbers page.

Robert Munafo's home pages on AWS    © 1996-2024 Robert P. Munafo.    about    contact
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.