| Large Numbers |
Back to page 4 . . . Forward to page 6
Bowers' Array Notation (4-element Subset)
In order to generalize his operators while also making it easy to extend the system further, Bowers created his array notation. The 3-element version of Bowers array notation was already covered above. It is easy to convert from the extended operator notation to an array notation version of the same number sacrificing a bit by making the rules for "expanding" a number from the array notation a little complex.
All of the operators defined thus far can be expressed as an array with up to four elements, as follows:
[a] = a
[a,b] = a+b
[a,b,1] = [a,b] = a+b = a①b
[a,b,2] = a×b = a②b
[a,b,3] = ab = a③b
[a,b,c] = a <c> b = hy(a,c,b)
[a,b,c,1] = a <c> b (combining a and b with the cth operator from the added, multiplied, exponentiated, ... sequence)
[a,b,c,2] = a <<c>> b (combining a and b with the cth operator from the expanded, multiexpanded, powerexpanded, ... sequence)
[a,b,c,3] = a <<<c>>> b (combining a and b with the cth operator from the exploded, multiexploded, powerexploded, ... sequence)
Here are the rules:
1. For one- and two-element arrays, just add the elements. [a] = a
and [a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's,
remove them: [a,b,1,1] = [a,b];
[a,b,c,1] = [a,b,c], etc.
3. If neither previous rule applies,
and the 2nd entry is a 1, remove all but the first element:
[a,1,b,c] = [a] = a.
4. If none of the previous rules applies, and the 3rd entry is a 1:
[a,b,1,c] becomes [a,a,[a,b-1,1,c],c-1]
5. Otherwise all four elements are greater than 1: [a,b,c,d]
becomes [a,[a,b-1,c,d],c-1,d]
At this point, it is interesting to note how similar the Bowers array rules are to the definition of a recursive function like the Ackermann function. The Ackermann function was originally developed with the restriction that functions must be defined entirely in terms of calls to themselves and each other, and in terms of the "successor function" s(x) = x+1. The 5 rules above can be restated:
12. f(s(s(a)),1,1,1) = a
1b. f(s(s(a)),s(s(s(b))),1,1) = f(s(s(s(a))),s(s(b)),1,1)
3. f(s(s(a)),1,s(s(b)),s(s(c))) = a
4. f(s(s(a)),s(s(s(b))),1,s(s(s(c)))) = f(s(s(a)),s(s(a)),f(s(s(a)),s(s(b)),1,s(s(s(c)))),s(s(c)))
5. f(s(s(a)),s(s(s(b))),s(s(s(c))),s(s(d))) = f(1,f(1,s(s(b)),s(s(s(c))),s(s(d))),s(s(c)),s(s(d)))
Here is an example of applying the rules to the simplest non-trivial 4-element array:
[2,2,2,2] = [2,[2,1,2,2],1,2] (by rule 5)
[2,1,2,2] = 2 (by rule 3)
so we have [2,2,1,2]
[2,2,1,2] = [2,2,[2,1,1,2],1] (by rule 4)
[2,1,1,2] = 2 (by rule 3)
so we have [2,2,2,1]
[2,2,2,1] = [2,2,2] (by rule 2}
[2,2,2] = [2,[2,1,2],1] (by rule 5)
[2,1,2] = 2 (by rule 3)
so we have [2,2,1]
[2,2,1] = [2,2] (by rule 2)
[2,2] = 2+2 = 4 (by rule 1)
With a little effort you can see that anything starting with [2,2 is equal to 4. To get anything bigger than 4, you have to have at least one 3. Here is the simplest example:
[3,2,1,2] = [3,3,[3,1,1,2],1] (by rule 4)
= [3,3,3,1] (because [3,1,1,2] = 3 by rule 3)
= [3,3,3] (by rule 2)
Once it is reduced to a 3-element array, we can convert to hyper operator notation as established earlier. So [3,2,1,2] is 3③3 = 33 = 27. Now using the extended operator <<1>>, 3 <<1>> 2 = 3 {3} 3. This is the same as [3,2,1,2}.
Here is another example:
[3,2,2,2] = [3,[3,1,2,2],1,2] (by rule 5)
= [3,3,1,2] (because [3,1,2,2] = 3 by rule 3)
= [3,3,[3,2,1,2],1] (by rule 4)
= [3,3,[3,3,[3,1,1,2],1],1] (by rule 4 again)
= [3,3,[3,3,3,1],1] (because [3,1,1,2] = 3 by rule 3)
= [3,3,[3,3,3]] (by rule 2)
= [3,3,[3,[3,2,3],2]] (by rule 5)
= [3,3,[3,[3,[3,1,3],2],2]] (by rule 5)
= [3,3,[3,[3,3,2],2]] ([3,1,3] = 3 by rule 3)
= [3,3,[3,[3,[3,2,2],1],2]] (by rule 5)
= [3,3,[3,[3,[3,[3,1,2],1],1],2]] (by rule 5)
= [3,3,[3,[3,[3,3]],2]] (by rules 2 and 3)
= [3,3,[3,[3,6],2]] (by rule 1)
= [3,3,[3,9,2]] (by rule 1)
... (about 8 repeats of rules 5 and 1 to turn [3,9,2] into 27)
= [3,3,27] = 3 {27} 3 = 3㉗3 = 3 {26} (3 {26} (3...))
This is equivalent to 3 <<2>> 2, which expands as follows:
3 <<2>> 2 = 3 <<1>> 3 = 3 <3 <3> 3> 3 = 3 <27> 3 = 3㉗3
3 <<2>> 2 is the same as [3,2,2,2]. In fact, the rules for the 4-element array notation are equivalent to definitions of the extended operators. The array [a,b,c,2] is equal to a <<c>> b; [a,b,c,3] is a <<<c>>> b; and in general [a,b,c,d] is a <<<<<c>>>>> b with d sets of brackets around the c.
Bowers Arrays with 5 or More Elements
Of course, Bowers wanted to extend the system, so the rules were designed to work with arrays of arbitrary length. This is done by changing rules 4 and 5 to the following:
4. If none of the previous rules applies, and the 3rd entry is a 1: Define the variables a,b,S,d and R so that the array is [a,b,S,1,d,R] where a,b are the first two elements, [S,1] is the string of 1 or more 1's; d is the first element bigger than 1 and [R] is the remaining part of the array. Replace the array with [a,a,S',[a,b-1,S,1,d,R],d-1,R] where [S'] is a string of a's of equal length as string [S].
5. If none of the previous rules applies, replace the second element: [a,b,c,R] becomes [a,[a,b-1,c,R],c-1,R]
I am fairly well convinced that Bowers is right in stating that the value represented by the 5-element array [n,n,n,n,n] is at least as large as n→n→n→...→n in the Conway chained-arrow notation, where there are n items in the chain.
For more on Bowers' notation, including updated definitions and a great many more steps in defining new recursive functions, read here (or the newer, longer version here).
Generalized Invention of Recursive Functions
At this point it is best to just describe the general process. Larger numbers are described by defining various types of recursive functions, always defined in terms of themselves and other previously defined recursive functions. Each new definition adds a little more complexity to the system. In order to understand any one function, you have to understand all the functions it is defined in terms of. Once you have defined a new function, you can invoke it with larger and larger arguments: f(2), f(10), f(f(1000)), etc. until the amount of digits and notation symbols becomes inconvenient, then you define a new function g(x).
It is important to note that you keep adding information: plugging in larger numbers like 2, 10, 1000 increases the information, and defining functions greatly increases the information. In general, larger numbers require more information.
But defining functions is just an operation in itself. If you define a standardized way to define new functions, then you can abstractify the process of defining the functions, and define a new function based in the idea of iterating the process of defining functions. This requires modeling the process of recursive definition and computation, something that can be done with, say, a computer program that emulates another simpler computer.
This is a jump into a second-higher level of abstraction. Just as arithmetic is an algorithmic abstractification of counting, and defining functions is an algorithmic abstractifcation of the mechanics of arithmetic, this new process of automatically repeatedly defining functions is an abstractification of that.
All of these ideas were formalized and the process of algorithmic abstractification was studied in the theory of computation by Turing, Church and Kleene, among others. They showed that all algorithmic processes within a certain limited definition of algorithmic process could be reproduced by a certain, minimal definition of computation, and used that model to show that there were certain limits to what types of things could be computed.
If the foregoing makes little sense, consider this concrete (but somewhat non-rigorous) example. Select any well defined, "sufficiently powerful" grammar G, consisting of a symbol-set of S symbols, finite in number, and well-defined rules of what constitutes a syntactically valid string of symbols specifying an integer. An example grammar that should be fairly familiar uses the symbols:
0 1 2 3 4 5 6 7 8 9 + * ^ ( )
and the rules that these symbols are to be strung together to make a legal set of additions, multiplications and exponentiations yielding an integer result; in this example S = 15. Just to be unambiguous, we'll require parentheses whenever two operators appear in a string.
Given this grammar G, for every integer N there is a set of integers EN consisting of all the integers that can be specified as a combination of N symbols in G using G's defined grammar. This set is finite, (it has, at most, SN elements, since there are that many combinations of N symbols from a set of S). Since EN has a finite number of elements, it therefore has a maximum element. Define m(N) to be a new function (not a part of the grammar G) giving the value of this maximum expressible integer in the grammar G for each N. Now we have a function which is guaranteed to grow at least as fast as any function defined within G, or faster. (Technically, it is only guaranteed to grow faster above a certain minimum value of N this is part of what we vaguely called "sufficiently powerful"). In any case, this function, or any larger function definition from f(x) = m(x) + 1 to f(x) = m(m(m(x))) or beyond, can be defined as part of a new, larger grammar G' incorporating all of the definitions of G plus the new definition of f().
So, in the specific example given here, we find in particular that for N = 1, N = 3, N = 7, N = 11, the largest expressible integers in G are:
9, therefore m(1) = 9
9^9, therefore m(3) = 9 ^ 9
9^(9^9), therefore m(7) = 9④3
9^(9^(9^9)), therefore m(11) = 9④4
and in general for N = 4 X + 3 for any integer X > 0,
m(N) = 9④(X+2)
Since N is always larger than X + 2 we can define our new grammar G' just by adding the symbol:
h
(which represents the ④ or hyper4 operator) and the new syntax:
a h b
where a and b are valid strings, and interpreted as a④b. This function grows faster than G's m(x) function. In this new grammar, which we now call G', m(3) is 9④9, m(4) is 9④99 and m(7) is 9④(9④9), so the process can continue to grammar G'' and so on. If you continue the same idea indefinitely you just get higher hyper operators, but you could also define new symbols using the ideas given above the Ackermann function, the Conway chained-arrow notation, etc. At each stage you have a grammar Gx with its maximal function m(n) to which the same idea can be applied to generate another bigger function.
The Lin-Rado Busy Beaver Function
The process described in the previous section defines higher functions while adding to the number of information necessary to describe the function and its formal system. To do this in absolutely the least amount of data, it's hard to beat the busy beaver problem.
The Turing machine is often used to demonstrate fundamental principles of computation. It is equivalent to many (but not all) actual real-world computer techniques. A Turing machine consists of a state machine that has a certain (finite) number of states, an infinitely large memory (usually described as an infinitely long linear strip of paper that can be marked in two different ways) and a set of rules describing what the machine will do given its current state and the marks on the paper. The rules are limited to things like moving one way or the other on the strip, writing a symbol (like 0 or 1) on the paper, looking at what's written at the current position, changing to another state, and stopping. The rules are often described as "five-tuples": each rule is five numbers (A,B,C,D,E) and is interpreted as "If you're in state A and the tape has symbol B then go to state C, write symbol D, and move the tape E units to the right". (A must be from 1 to N, C must be from 1 to N or 'H' for halt, B and D must be 0 or 1 and E must be -1 or 1. Note that a "don't move the tape" option wouldn't gain anything, because then you'll just apply another rule and overwrite the tape anyway.).
The Busy Beaver Function was originally defined by Tibor Rado at Ohio State in 1962. It is defined by specifying that you must start with a blank tape (all 0's), with a finite number of symbols per position on the tape (we usually use two: 0 and 1) and you're limited to N states in the state machine. What is the most number of marks (1's) you can have it write on the tape before stopping? A set of rules that never stops doesn't count. The maximum number of marks for N states is BBN. This is a well-defined function and it grows very very fast.
In this table, the column labeled "Machines" tells how many Turing machines of N states exist; this is (4N+4)2N (the number that actually have to be checked is lower). The column labeled "steps" shows how many steps are taken by the current record-holder before halting. Here are the record holders and a more detailed description of the difficulty of the problem. A good page for recent infomation about the problem is Marxen's own page.
| N | Machines | BBN | Steps | Found by |
| 1 | 64 | 1 | 1 | Lin & Rado |
| 2 | 20736 | 4 | 6 | Lin & Rado |
| 3 | 16777216 | 6 | 21 | Lin & Rado |
| 4 | 25600000000 | 13 | 107 | Brady |
| 5 | 63403380965376 | >= 4098 | 47176870 | Marxen & Buntrock |
| 6 | 232218265089212416 | >= 4.6×101439 | 2.5×102879 | Terry and Shawn Ligocki |
| 7 | ||||
| 8 | >= 8.248×1044 | Milton Green |
When it comes to implementing fast-growing functions of integers, Turing machines appear to do a very good job of going at least as high as anything else we've defined. For example, a Turing machine with only 6 states is sufficient to implement an interated exponential function with a chaotic deterministic low-probability exit condition. The machine that set the 1.29149×10865 record is essentially performing the iteration X=2K×X several times in a row before halting. There are few involved with Turing machines who doubt that with only a few more states, massively higher numbers can be computed by much faster-growing functions.
BBN is not "computable" in the formal sense you cannot predict how long it might take to count the number of 1's written by all Turing Machines with N states for arbitrary values of N. But for specific small values of N, it is possible to do a brute-force search, with human assistance to examine all the "non-halting" candidates and equip your program with pattern-matching techniques to identify these as non-halting.
However, this takes massively greater amounts of work for each higher value of N, and so the Busy Beaver function is unwieldy to calculate. No-one has been able to complete the brute-force search for any value of N greater than 4.
So the Busy Beaver function is not actually a good way to calculate big numbers for example, 101027 isn't nearly as big as BB27, but it's bigger than any BBN value we've been able to calculate, and it can be calculated much more quickly and easily.
The only way in which the Busy Beaver function "grows fastest" is when you look at it in terms of the function's value compared to the amount of information required to specify the formal system, the function, and the function's parameter(s). This is a highly abstract concept and shouldn't be considered important unless you are studying the theory of deterministic algorithms specified within formal systems. To understand this, you can imagine, defining a precise set of rules for manipulating symbols, which define all of the functions above (starting with addition and going up through chained arrow notation, iteratively defining new functions, and so on). Each new rule, symbol and function would take a bit more information to define completely. If you wrote a computer program to compute each function, each program would be a bit more complex. You could also do the same thing by starting with a definition of the rules of the Turing machine, then start with 1-state Turing machines and then increase the number of states by adding a few extra bits of information per state. It is generally believed that, as the amount of information used gets higher, the Turing machine based system will produce higher function values than any other formal system.
In other words, the Turing machine is believed to be the most concise known general-purpose algorithmic computation system. It seems to grow faster than any other function in any other formal system when both the system's definition and the function's arguments are counted as part of the data length.
Beyond the Busy Beaver Function
Attempts to go beyond the Busy Beaver function, by necessity, have to go beyond functions, algorithms and formal systems. Imagine any sufficiently general definition of formalism (such as the Turing machine) and then define a function f(N) giving the maximum value of the results of its computation in terms of N, a suitably formalized specification of the amount of information used to define the formal system and the algorithm. f(N) will have a finite value for any finite N and can be said to grow at a certain rate. Because all f(N) are finite for all finite N, there exists a g() such that g(N) is greater than f(N) for all N.
By necessity, it is impossible to define g(N) in any specific way because the entire realm of formal systems and algorithmic definition is already a part of the definition of f(N). By necessity, g(N) cannot have a clear definition: if it did that definition is formalizable and capable of being computed by the Turing machine, and is therefore already part of f(N).
At this point in the discussion (or usually sooner) it becomes apparent that there is additional knowledge and assumptions "outside the system". An effort is made to identify these, define them precisely and add them into the quantity N. After doing this, it is soon discovered that the resulting formal system itself depends on things outside itself, and so on. I have encountered many expositions, discussion threads, etc. over the years, that begin with an optimistic determination to formalize the problem and quantify exactly how large numbers can be derived from first principles; they all have ended up somewhere in this jungle of abstraction. Here is a relevent quote:
I have this vision of hoards of shadowy numbers lurking out there in the dark, beyond the small sphere of light cast by the candle of reason. They are whsipering to each other; plotting who knows what. Perhaps they don't like us very much for capturing their smaller brethren with our minds. Or perhaps they just live uniquely numberish lifestyles, out there beyond our ken.12
The Frontier
As of this writing (last checked in 2007), this seems to be the frontier of development for the expression of large finite numbers. Of course, many people try, but everything seen so far appears to duplicate or fall short of the results of computation theory.
If you're interested in defining larger functions, go right ahead, but please note that you'll want to check your new function to see if it really pushes the limits a significant amount. If you use any of the methods described above, then your new function will not push the limits a significant amount.
Back to page 4 . . . Forward to page 6