mdbtxt1
mdbtxt2
Proceed to Safety

Alternative Number Formats    

Contents:

Residue Number System

p-adic Numbers

Double-Base Number System

Multiple-Base Composite Integers

Factorial Base Number System

Logarithmic Number System

Level-Index and Symmetric Level-Index

Lexicographic Strings

     A Ubiquitous Practice

     Knuth's Binary Lexicographic Strings

     Munafo Alphabetic PT System

     Order-Preserving Verbal Names

Alternative Algorithms for Standard Formats

Incremental Games

Bibliography


Residue Number System (RNS)

A base is chosen consisting of a set of relatively-prime numbers. Each normal integer is represented in the RNS as a set of digits, each digit is the number's value modulo each of the relatively-prime numbers in the base. For example, if the base is {2,3,5} the numbers 0 through 29 can be represented as follows:

{2,3,5} {2,3,5} {2,3,5} 0 0,0,0 10 0,1,0 20 0,2,0 1 1,1,1 11 1,2,1 21 1,0,1 2 0,2,2 12 0,0,2 22 0,1,2 3 1,0,3 13 1,1,3 23 1,2,3 4 0,1,4 14 0,2,4 24 0,0,4 5 1,2,0 15 1,0,0 25 1,1,0 6 0,0,1 16 0,1,1 26 0,2,1 7 1,1,2 17 1,2,2 27 1,0,2 8 0,2,3 18 0,0,3 28 0,1,3 9 1,0,4 19 1,1,4 29 1,2,4

For example, 17 is (1,2,2) because 17 = 1 mod 2, 17 = 2 mod 3 and 17 = 2 mod 5. Notice that none of the other numbers in the table has the representation (1,2,2), in fact no two numbers in the table have the same representation.

Residue number systems closely resemble certain astronomical and calendrical calculations.

The calculation of Easter uses a {19,7} residue system with additional periodic corrections to achieve greater agreement with the actual phases of the moon. With all the corections the current Gregorian system has a cycle of 2081882250=2×34×53×7×19×773 solar days, equivalent to 5700000 Gregorian years of 365.2425 solar days.

The Akan calendar involves a base {6,7} system formed by the use of a traditional six-day week concurrently with the seven-day week to form a 42-day period. The Rokuyo of the Japanese calendar are a repeating 6-day cycle, but are reset at the beginning of every month, before even a single 42-day cycle has taken place.

Similar to the Akan situation, the Javanese calendar involves a traditional five-day cycle combined with the 7-day week, creating a 35-day period called the Wetonan cycle.

A base {13,20} system is used in the Mayan liturgical year (see [12] pp 312-313 and Maya calendar), a day-counting system still in use by some residents of Oaxaca and the Guatemala. In ancient usage, combined with a 365-day civil year it constituted a base-{365,13,20} system with the same range as a {73,13,20} system; 73×13×20=18980, the number of days in the 52-year "calendar wheel" (see [12] pp. 314-315). By contrast, the "Long Count" (referenced in the movie 2012) is a comparatively much simpler {20,20,20,18,20} system, a cycle of 2880000 days (see [12] pp. 316-322).

Residue number systems have been proposed as a way to make computers faster because addition, subtraction and multiplication can be done with far less circuit complexity. However, other operations, notably division and conversion to/from a standard base for input and output, make the system impractical for most applications.

To add, subtract or multiply, you perform a modulo addition, subtraction or multiplication within each field. For example, to multiply 3 by 7:

{2,3,5} 3 = (1,0,3) 1 x 1 = 1 mod 2 (2's column) 7 = (1,1,2) because 0 x 1 = 0 mod 3 (3's column) 21 = (1,0,1) 3 x 2 = 1 mod 5 (5's column)

Converting between binary and RNS is a little difficult, and is based on the Chinese Remainder Theorem.

Division is very difficult, and is the subject of much research.

Articles on residue number systems:

"Residue Number System Based Multiple Code DS-CDMA Systems", Lie-Liang Yang and Lajos Hanzo, The IEEE Vehicular Technology Conference, Houston, Texas, USA, May 16-19, 1999, pp1450-1454 http://www-mobile.ecs.soton.ac.uk/lly/papers/VTC99_RNS1-web.pdf

"Novel High-Radix Residue Number System Processors", V. Paliouras and T. Stouraitis, IEEE Transactions on Circuits and Systems-Part II, vol. 47, no. 10, pp. 1059-1073, October 2000. http://www.vlsi.ee.upatras.gr/Users/Associates/Paliuras/47csii10-paliouras.pdf

http://www.dspvlsi.uniroma2.it/pubs/iscas01a/iscas01a.html


p-adic Numbers

A p-adic number system extends integer and rational numbers to a system of "real numbers" in a different way from the ordinary real numbers, by defining "closeness" in a very different way: Given a prime number p, two numbers that differ by any multiple of p are equally close, unless they differ by a multiple of p2 in which case they are considered closer (and even closer if their difference is a multiple of p3, and so on). Since differences of pn are less and less significant as n increases, numbers can be written in "base-p" notation with the digits in reverse order, with higher powers of p trailing off to the right.

Arithmetic with p-adic numbers is similar in ways to the residue number systems.

Examples of 5-adic arithmetic:

5-adic multiplication table

a×b .0 .1 .2 .3 .4
.0 .0 .0 .0 .0 .0
.1 .0 .1 .2 .3 .4
.2 .0 .2 .4 .11 .31
.3 .0 .3 .11 .41 .22
.4 .0 .4 .31 .22 .13

Notice the similarity to base-5 multiplication; for example in base 5, 25×45 = 135; and in 5-adic representation .2×.4 = .31; the answer is the same but with the digits reversed.

Computing the p-adic Expansion of a Ratio A/B

We'll continue with the p=5 or "5-adic" system. For this example we want to represent 14/105, so a=10 and b=105.

First, remove any common factor(s): reduce 14/105 to 2/15.

Next, remove any power(s) of p, and remember them for later. Here we have a power of 5 in the denominator: a/b = c/5d, where c=2 and d=3. We'll compute 2/3, then multiply the result by 1/5 (which is as simple as shifting the "decimal point", because p-adic representation is like base-p).

If the numerator is not 1, we will first compute 1/d, then multiply by c using p-adic multiplication. So in the example, we compute 1/3 then multiply by 2 afterwards.

At this point we've reduced the problem to computing the reciprocal of an integer 1/d. Let e = d mod p and use division modulo p to determine 1/e mod p. In this case, 1/3 = 2 mod 5 (in row 2, colmn 1 of the following table):

Division modulo 5

dn 0 1 2 3 4
1 0 1 2 3 4
2 0 3 1 4 2
3 0 2 4 1 3
4 0 4 3 2 1
(from row d and column n, read n/d from the table)

We will first compute 1/de, then multiply by e to get e/de = 1/d. Since e is 2, we need to compute 1/6.

Now we have a fraction of the form 1/(Np+1) for some N. The full p-adic representation can be computed by a process similar to long division:

.1404040404.. _____________________________ .11 ) .1 .11 .4344444.. ----- .0444444.. .044 .0104444.. ----- .00044444..

We are computing 1/6 which is .1/.11 in 5-adic representation. Using the division table above, 1/1 = 1 mod 5, so the first digit of the quotient is 1. Multiply this by the divisor to get .11 and subtract. To subtract .11 from .1, we first complement .11 getting .434444... and add this to .1, carrying one to the right: .1 plus .434444... equals .0444444...

Divide the first digit of the remainer by the first digit of the dividend. Using the division table again, 1/4 = 4 mod 5. So the second digit of the quotient is 4. Multiplying by the dividend we get .044, complement to get .0104444... and add to get .0004444...

The next digit of the partial remainder is 0, giving a quotient digit of 0 and no change to the partial remainder. Then we get 4 for the next quotient digit, and the same calculation repeats.

So the 5-adic representation of 1/6 is .14040404... and to get 1/3 = 1/d, we multiply this by 2:

.14040404.. x .2 ---------- .23131313..

To get 2/3 = c/d, we multiply again by 2:

.23131313.. x .2 ---------- .41313131..

To get 2/15 = a/b = c/5d, we divide by 5, which is just a shift of the "decimal point". Thus, the 5-adic representation of 2/15 is 0.04131313...


Double-Base Number System (DBNS)

This is a form of alternate representation being studied for use in digital signal processing at the University of Windsor, Ontario Canada. Numbers are expressed as a sum of one or more terms of the form 2a3b. Each term is either present or absent, represented by a coefficient of 1 or 0 respectively. DBNS representation allows for addition, subtraction and multiplication that are very efficient (by gate count) so is well suited for signal processing or similar applications; the only drawback is conversion to and from standard representation.

"Theory and Applications of the Double-Base Number System", Vassil S. Dimitrov, Graham A. Jullien, and William C. Miller, IEEE Transactions on Computers, Vol. 48, No. 10, October 1999


Multiple-Base Composite Integers

This system was suggested by George Spelvin1, who also suggested some integer-only microfloat formats. MBCI integers are used to express integer values from a universe in which all the prime factors are in a fairly small set. For example, if the eligible primes are 2, 3, and 5, an MBCI system will represent any integer of the form 2a3b5c for integer values of a, b and c. Such values form the sequence: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, ... (Sloane's A051037).

For fixed-length representations, the typical strategy is to limit the exponent of each eligible prime to a range that can be expressed in a given number of binary digits. For example, using the primes 2 3 and 5 again, one could limit the exponent of 2 to fall in the range 0..15, and the exponents of 3 and 5 to fall in the range 0..3; this allows the exponents to be stored in 4, 2, and 2 bits respectively for a total of 8 bits. Such a system encodes lots of commonly used integer values, including most of the commonly-encountered highly-composite numbers (such as 24, 144, 3600, etc.), all of the common video resolution heights and widths (like 640, 480, 1024, 768, 720 and 1080), and all of the common modem baud rates (like 2400, 19200, and 57600).

Another fixed-width representation does not use explicit bit fields for each exponent, but instead simply indexes into a (hypothetical infinite) list of all integers with the eligible prime factors. For example, a 5-bit representation would be used to specify any of the 32 smallest positive integers of the form 2a3b5c, which are the 32 values of sequence A051037 shown above.


Factorial Base Number System (FBNS)

Like Multiple-Base systems but taking the idea further, a factorial base number system (or "factorial number system" as called by Knuth)
has digits in each position that range from 0 to N-1, with the "place value" of each position being (N-1)! with N starting at 1. Counting in this system looks like this: 0, 10, 100, 110, 200, 210, 1000, 1010, 1100, 1110, 1200, 1210, 2000, 2010, 2100, 2110, 2200, 2210, 3000, 3010, 3100, 3110, 3200, 3210, 10000, 10010, ... (Sloane's sequence A124252). The rightmost digit is always 0; to avoid this one might say that N starts at 2, in which case you get the same thing without the final 0: 0, 1, 10, 11, 20, 21, 100, 101, 110, 111, 120, 121, 200, 201, 210, 211, 220, 221, 300, 301, 310, 311, 320, 321, 1000, 1001, ... (Sloane's sequence A7623).

If you want to handle fractions, then an FBNS provides a way to express any rational number with a finite number of digits to the right of the "decimal point". The digits to the right are for multiples of the fraction 1/N!=, with N starting at 2. The expansions are often a bit non-intuitive, as these examples from the Wikipedia page show:

1/2=0.1!
1/3=0.02!
2/3=0.11!
1/4=0.012!
3/4=0.112!
1/5=0.0104!
1/6=0.01!
5/6=0.12!
1/7=0.003206!
1/8=0.003!
1/9=0.00232!
1/10=0.0022!
1/11=0.002053140A!
2/11=0.0101462819!
9/11=0.1133105082!
10/11=0.1214036491!

Brent Sanders has made an implementation in Ruby (for integers only, no fractions), here is the source code


Logarithmic Number System (LNS)

Some articles on "logarithmic" number systems:

M. Arnold, T. Bailey, J. Cowles: "Comments on 'An Architecture for Addition and Subtraction of Long Word Length Numbers in the Logarithmic Number System'. , IEEE Trans. on Computers, Jun 1992, pp 786-788

"Semi-Logarithmic Number Systems", J.-M. Muller, A. Scherbyna, and A. Tisserand, IEEE Trans. on Computers, Feb 1998 http://www.ens-lyon.fr/~jmmuller/IEEETC-Fev98.pdf

"Efficient algorithms for binary logarithmic conversion and addition", Y. Wan and C.L. Wey, IEEE Trans. on Computers and Digital Techniques, May 1999, p.168

"Efficient conversion algorithms for long-word-length binary logarithmic numbers and logic implementation", Y. Wan, M.A. Khalil and C.-L. Wey, IEEE Trans. on Computers and Digital Techniques, Nov 1999, p.295

R. Zimmermann, "Computer Arithmetic: Principles, Architectures, and VLSI Design," Lecture notes, Integrated Systems Laboratory, ETH Zurich, 1997.


Level-Index (LI) and Symmetric Level-Index (SLI) Number Systems

This is also called the anti-tetrational number system.

LI and SLI use the same presentation for numbers greater than or equal to 1: A real-number base is chosen (usually e=2.71828... but sometimes 2, or could be any other base). A number x is represented as a triple (s,l,i). s is a sign (0 or 1), l is the level and is a positive integer, i is the index and is in the range [0,1). l and i are related to x via a "generalized exponential function" Fge in the following way:

x = -1s Fge(l,i)
Fge(l,i) = ei, if i=0
Fge(l,i) = eFge(l-1,i), if i>0

To represent numbers in the range (-1,1), the LI system uses l=0 and i=|x|. To represent numbers in the range [1,e), it uses l=1, and i=ln(x) where i is in the range [0,1). l=2 is used for values from e through ee, and so on. Here are a few examples:

sl i represents the value
1 2 0.0 -2.71828183... = -e
1 1 0.5 -1.64872127... = -√e
1 1 0.0 -1
1 0 0.5 -0.5
0 0 0.0 0
0 0 0.5 0.5
0 1 0.0 1
0 1 0.5 1.6487212707... = √e
0 2 0.0 2.71828183... = e
0 2 0.5 5.20032576... = ee
0 2 0.834... 10
0 3 0.0 15.1542622... = ee
0 3 0.423... 100
0 4 0.0 3814279.104... = eee
0 5 0.0 2.3315...×101656520 = eeee

Any real number can be turned into a unique combination of sign s, integer portion l and fraction i and used to represent a (usually larger) quantity via this definition. This could be used as a mapping from reals onto reals, that "compresses" a large range of values into a fairly small range of function arguments. This function is shaped sort of like the hyperbolic sine sinh(x) but with a straight line section in the area from -1 to 1. It has several nice properties, including that it is continuous, differentiable, and monotonic (everywhere increasing).

The differentiable property makes Level-Index particularly good for a computer number format, because it means that the roundoff error does not suddenly change at certain values. Normal floating-point numbers are not as well-behaved: the maximum (and mean expected) magnitude of roundoff errors doubles at every power of 2.

Symmetric Level-Index

The original purpose of Level-Index was to avoid roundoff and overflow errors in calculations. One common cause of roundoff error comes from calculating a very very small quantity, such as 10-1000. This can happen as an intermediate result in calculations whose correct final answer depends on not rounding this value to 0.

The symmetric level-index system (SLI) was created to address this. In the SLI system, a reciprocal bit r is added. As in level-index, there is a sign bit s and a level l. If r=0 we have the same interpretation as in LI: the value is -1sFge(l,i). If r=1, the quantity being represented is -1s/Fge(l,i).

Once you have the reciprocal bit, values from 0 to 1 can be handled by setting the bit with a value above 1. For example, 0.5 and 2.0 are the same except that 0.5 has the reciprocal bit set. This means that a level l=0 no longer happens. In order not to "waste" the value l=0, the level is just decreased by 1.

Here are some examples of numbers in the SLI system listed in ascending numerical order. These are all 32-bit word representations, and the level l is a 3-bit quantity that can range from 0 to 7. This leaves 27 bits for the i (index) field, which is here shown in hexadecimal. In the rightmost column, extra digits are used to show the size of the difference between adjacent values. You can clearly see how there is more precision for small numbers (like 5) than for big numbers (like 5×10100). I have not seen any descriptions of the handling of special numbers like infinity or negative 0, so I had to guess.

srl i represents the value
1 1 7 7FFFFFF negative infinity
1 0 3 0000000 -3814279.1
1 0 2 7FFFFFF -3814277.9
1 0 2 0000000 -15.1542622
1 0 1 7FFFFFF -15.1542619
1 0 1 0000000 -2.71828183 = -e
1 0 0 7FFFFFF -2.71828180
1 0 0 0000001 -1.0000000075
1 1 0 0000000 -1 (exact)
1 1 0 0000001 -0.9999999925
1 1 0 58B90BF -0.5000000028
1 1 0 58B90C0 -0.4999999990
1 1 1 0000000 -0.3678794411 = -1/e
1 0 0 0000000 -0 (not used)
0 0 0 0000000 0 (exact)
0 1 1 0000000 0.3678794411 = 1/e
0 1 0 7FFFFFF 0.3678794439
0 1 0 58B90C0 0.4999999990
0 1 0 58B90BF 0.5000000028
0 1 0 0000001 0.9999999925
0 1 0 0000000 1 (exact)
0 0 0 0000001 1.0000000075
0 0 0 58B90BF 1.999999988
0 0 0 58B90C0 2.000000003
0 0 0 7FFFFFF 2.71828180
0 0 1 0000000 2.71828183 = e
0 0 1 3CE9CCA 4.999999948 note 8 usable digits
0 0 1 3CE9CCB 5.000000008 and no exact value for 5
0 0 2 0000000 15.1542622 = ee
0 0 3 0000000 3814279.1 = eee
0 0 3 4389695 4.999954139×10100 note 5 usable digits
0 0 3 4389696 5.000033872×10100
0 0 3 7FFFFFF 7.2×101656519
0 0 4 0000000 2.3×101656520
0 0 5 0000000 101.0125×101656520
0 0 6 0000000 10101.0125×101656520
0 0 7 0000000 1010101.0125×101656520
0 0 7 7FFFFFE 101010109.7×101656518
0 1 0 0000000 infinity

Papers about LI and SLI: [2], [3], [4], [5], [6], [7], [8], [9], [10]


Lexicographic Strings

The following systems all share the properties:

The Ubiquitous Practice of Communicating the Biggest Part First

In Ifrah's 1999 book The Universal History of Numbers [12] there are dozens of examples of written and spoken number systems, covering 6000 years of history and including indigenous cultures from five continents. In 37 cases there are illustrations of specific large numbers made of parts like "one hundred and forty-three". All but three present the largest part of a number first, and two of those three (Turkish 8th century and Inca 12th century) are unclear. In this table, the location in [12] is given as a page number with "a" or "b" to indicate which column contains the illustration/example, and B or S indicates whether the bigger or smaller part of the number is written first:

page and column culture and date B/S page and column culture and date B/S
26a English present B 238b Jewish 3c CE B
26b Tibetan present B 244b Arabia 10c CE B
27a Mongolian present B 247b Ethiopia 4c CE B
27b Turkey 8c CE ? 264a China 15c CE B
29a Sanskrit 5c BCE S 267a China 1c CE B
68b Inca 12c CE ? 270b China 12c BCE B
88a Sumeria 4m BCE B 277a China, Japan current B
118a Proto-Elamite 3m BCE B 309b Mayan 7c CE B
138b Sumero-Akkadian 17c BCE B 319a Mayan 4c CE B
147b Babylonia 19c BCE B 326b Hittite 14c BCE B
152b Uruk Seleucid 2c BCE B 327a Aztec 12c CE B
166b Egypt 26c BCE B 329b Armenia 5c CE B
171b Hieratic 3m BCE B 331a Egypt 2c BCE B
180a Crete 16c BCE B 332a Aramaea 8c BCE B
182b Greece 5c BCE B 332a Sighalese 7c CE B
186b Sheba 2c BCE B 334b Tamil 7c CE B
189b Rome 3c BCE B 335a Malay 7c CE B
190a Etruscan 4c BCE B 336a Mari 19c BCE B
221b Greece 3c BCE B

To a modern observer it is easy to see why so many cultures chose to put the biggest part of a number first: bigger is usually more important, and in speech one usually benefits from delivering the most important part of a communication up-front.

Following this "most important first" principle, a lexicographic string representation makes it easy to distinguish large numbers from small just by looking at the beginning of the text representing each number.

To illustrate how our written numerals fail to meet this ideal, consider the numbers 1111 and 999. The first is larger, but you cannot tell just by looking at the first few digits. It is only after examining all of the digits of both numbers that you can see which is larger.

Lexicographic string representations have two properties: they are text-string representation systems, and they are lexicographic.

In a text-string representation system, any number N has a text representation consisting of a sequence of characters from some type of "alphabet". For example, we normally use an alphabet of ten symbols, the digits 0 through 9.

Lexicographic is defined in the following way: For any two numbers A and B, consider their written forms w(A) and w(B). In a lexicographic system, the following rules always hold true:

If the first character of the w(A) comes after the first character of w(B), then A is larger than B.

If the first character of the w(A) comes before the first character of w(B), then A is smaller than B.

Furthermore, if the first characters of w(A) and w(B). are the same, then the size of A relative to B can be determined by looking at the second character. In general:

If A is greater than B, then the text representing A comes after the text representing B, using a standard dictionary-type ordering.

Knuth's Systems

Three systems described by Knuth [1] use an alphabet of just two symbols: the binary digits 0 and 1. All comprise encodings of the nonnegative integers.

The first and simplest represents each N with by a string of N 1's followed by a single 0:


0 = 0
1 = 10
2 = 110
3 = 1110
4 = 11110
5 = 111110
. . .

Interpreted as binary numbers these strings are the terms of Sloane's sequence A000918(N+1).

The second Knuth system is based on the ordinary binary representation of N, but replaces the initial 1 with the the string from the above system, representing the value of length(N)-1 where "length(N)" represents the number of binary digits in the ordinary representation of N. So, for example the number 5 has the binary representation 101, and the length of this is 3 and length-1 is 2. So Knuth takes the initial 1 and replaces it with the representation of 2 above: 110 followed by 01, making the 5-character string 11001. Here is an illustration of this system:


0 = 00
1 = 01
2 = 100
3 = 101
4 = 11000
5 = 11001
6 = 11010
7 = 11011
8 = 1110000
9 = 1110001
10 = 1110010
11 = 1110011
. . .
15 = 1110111
16 = 111100000
17 = 111100001
. . .
31 = 111101111
32 = 11111000000
. . .

Interpreted as binary numbers these strings are the terms of Sloane's sequence A171885(N).

In each string, the length of the initial 111111... portion is the same as the length of the 110101... portion. At each power of 2, both parts increase in length by 1 symbol. That means that on average, this system uses about twice as many binary digits as a normal binary number, but it does it without needing to know in advance how many binary digits you're going to use.

The third Knuth system takes this one step further: the initial "11110..." string is replaced with a more compact representation expressing how many 1's appear in that string, and (the really clever bit) uses its own representation for that encoding. This never results in problems because each needed value is smaller than the value we're trying to define, so as long as we start with 0=0 the rest of the system lifts itself up by its bootstraps, so to speak:

N structure string
0 (0) 0
1 (1 0) = (1 0) 10
2 (1 1 0) = (1 10 0) 1100
3 (1 1 1) = (1 10 1) 1101
4 (1 2 00) = (1 1100 00) 1110000
5 (1 2 01) = (1 1100 01) 1110001
6 (1 2 10) = (1 1100 10) 1110010
7 (1 2 11) = (1 1100 11) 1110011
8 (1 3 000) = (1 1101 000) 11101000
9 (1 3 001) = (1 1101 001) 11101001
10 (1 3 010) = (1 1101 010) 11101010
11 (1 3 011) = (1 1101 011) 11101011
12 (1 3 100) = (1 1101 100) 11101100
13 (1 3 101) = (1 1101 101) 11101101
14 (1 3 110) = (1 1101 110) 11101110
15 (1 3 111) = (1 1101 111) 11101111
16 (1 4 0000) = (1 1110000 0000) 111100000000
17 (1 4 0001) = (1 1110000 0001) 111100000001
... . . . . . .
31 (1 4 1111) = (1 1110000 1111) 111100001111
32 (1 5 00000) = (1 1110001 00000) 1111000100000
... . . . . . .
255 (1 7 1111111) = (1 1110011 1111111) 111100111111111
256 (1 8 00000000) = (1 11101000 00000000) 11110100000000000
... . . . . . .
511 (1 8 11111111) = (1 11101000 11111111) 11110100011111111
512 (1 9 000000000) = (1 11101001 000000000) 111101001000000000
... . . . . . .
65535 (1 15 111111111111111) = (1 11101111 111111111111111) 111101111111111111111111
65536 (1 16 0000000000000000) = (1 111100000000 0000000000000000) 11111000000000000000000000000
... . . . . . .

Interpreted as binary numbers these strings are the terms of Sloane's sequence A010097(N). According to that entry, these strings are also known as "prefix" or Levenshtein codes.

Munafo Alphabetic PT System

I invented this system around the time I began my numbers webpage, to have a consistent way to name the labels for cross-references in the RHTF. The alphabet for this system has seventeen symbols: letters a through e and p, digits 0 through 9, and underscore _.

It represents nonnegative values of the form A/10N where A and N are both nonegaive integers. In other words, anything that can be expressed with a finite number of digits and no signs or symbols except perhaps a single decimal point.

The basic approach is to divide numbers into a balanced set of "ranges" that each start with a different letter, and make sure that the numbers within each range are in order.

The initial design for the system had seven ranges, distinguished by the first character of the string representation, as follows:

0-9 : numbers less than 10.0
a : numbers from 10.0 to 99.999...
b : numbers from 100.0 to 999.999...
c : numbers from 1000.0 to 9999.999...
d : numbers from 10000.0 to 99999.999...
e : numbers from 105 to 10300
p : numbers from 10300 to 1010300

Later I modified it to extend the p range to include everything from 10300 on up.

For the first range (numbers less than 10), the string has one digit, an optional _ symbol representing the decimal point, and (optionally) more digits representing the fractional part.

For the next four ranges, the letter tells you how many digits to expect before the optional _ character. The string for 27 is a27; the string for 123.45 is b123_45, and so on.

I could have defined more letters beyond a, b, c and d this way, but I felt it was becoming difficult to keep track of how many digits are indicated by each letter, and the letter e has a familiar meaning related to scientific notation.

So the next range, using strings starting with e, is for numbers that are expressed in "exponential notation" on a calculator. The e is followed by a three-digit exponent representing a power of 10, then after that are the first one (or several) digits of the number. For example, 6.02×1023 is represented by the string e023_602. A decimal point is not given with a _ character because it must always be after the first digit of the number (in this case, right after the 6). However, I do include a _ character right after the 3-digit exponent to make it clear where the exponent ends. If the number is an exact power of ten, I still have a digit "1", so googol is e100_1, not simply "e100".

Above 10300, a letter p is added to the beginning to indicate that we start with "ten to the power of..." (the p stands for "power"). Immediately after this p should be a string representing what 10 is raised to the power of. So googolplex 1010100 could be represented by the string "pe100_1".

To go higher than 1010300 I considered adding more p's to the beginning of the string to represent extra 10's, but I needed to discuss power-towers high enough to make that impracticl for use as labels. I wanted a system that could go a further on a single p, and then add a second p to extend the range much more. (The numbers I wanted to represent are discussed on my large numbers page starting around the hyper5 section.)

Therefore, instead of just using n consecutive p's to make a power-tower of height n+1 or n+2, I made the rule for the p group require following the p immediately with the encoding of how many powers of 10 are added at the beginning; then an underscore _. This means that we start with p1_ instead of just "p", then p2_ (instead of "pp") to make the power-tower one step higher, then p3_, and so on. Thus, a string that starts with p has symbols representing two numbers — the height of the tower and the "thing at the top".

So to make the string for googolplex we put "p1_" in front of the string for googol (which is e100_1 as above); the whole string is p1_e100_1. 10 to the power of googolplex, 101010100, is p2_e100_1, and 10 to the power of that is p3_e100_1, and so on.

That first number after the p can any string in this format: "a22" means 22, so pa22e033184 is a tower with twenty-two 10's and 1.84×1033 on top. Think of it as "p(a22)_e033_184".

We get to the numbers requiring two p's when the height of the tower is 10300 or more. For example, given that 10↑↑10 is represented by p8_e010_1, we can represent 10↑↑↑3 = 10↑↑(10↑↑10) concisely with the string pp8_e010_1_1. This should be thought of as "p(p8_e10_1)_1", to indicate a power-tower of 10's of height p8_e010_1, with a 1 on top. This effectively means that the result of a hyper5 or "pentation" operation a↑↑↑b will start with about b copies of p. In such cases it is likely to end with a lot of ..._1_1_1. I decided that extra duplicates of "_1" at the end can be left off.

Until recently I didn't need any RHTF labels with multiple p's at the beginning, as my numbers page did not go beyond 2↑↑1000. There is now a single double-p entry, thanks to

Patashu

's break_eternity.js library, which has a computational limit of 10↑↑(1.8×10308).

Here is an extensive list of examples to show how the system handles these cases. For convenience, scientific notation is used in the explanations here, so for example I use "6.02×1023" for the integer value 602000000000000000000000.

ASCII representation Value
0 0
0_0000000000000234 0.0000000000000234 = 2.34×10-14    (All 0's are shown, negative exponents in scientific notation are not supported.)
0_1 0.1
2_7 2.7
2_7000 2.7    (You can add extra 0's on the end)
9_34 9.34
a10_2 10.2    (Numbers with 2, 3, 4, or 5 digits use the letters a through d)
b256 256
b300_0 300.0
c1 1000    (Trailing 0's can be left off.)
c1000 1000
c1729 1729
d19683 19683
d99999 99999
e005_1 100000 = 1×105    (Numbers from 6 to 300 digits use e prefix, a 3-digit exponent and a mantissa)
e005_100 100000    (Extra 0's can be left off at the end provided it doesn't result in a different value from that which is intended.)
e005_1000007 100000.7
e005_13 130000    (Again, extra 0's left off)
e023_602 6.02×1023
e100_1 10100    and the "1" at the end is not optional.
e299_9999999999999999999 9.999999999999999999×10299    (This is close to the end of the strings that start with e.)
p1_b300_0 10300    (I call this "1 P.T. 300", which means "One Power of Ten, 300". The string is formed from the prefix "p1_", representing the single power of 10, followed by "b300_0" which is the string representing 300.0)
p1_c2345_602 4×102345 ≈ 102345.602 = "1 P.T. 2345.602" (The ".602" is approximately the logarithm to base 10 of 4, so 10 to the power of 2345.602 is 4 times 10 to the power of 2345.)
p1_e006_3 103000003 = 103×106+3    ("milli-millillion", the largest named number in the list published by Edward Brooks in 1904.)
p1_e007_144 4.4823309×1014471464 = 224036582(224036583-1)    (Held the record for a while as the largest known perfect number.)
p1_e009_345 103.45×109 = "1 P.T. 3.45×109"
p1_e010_1 1010000000000 = 101010    (NOTE that the representations "p2_10" for "2 P.T. 10" and "p3_1" for "3 P.T. 1" are not allowed, even though their interpretations equal the same value.)
p1_e299_999 109.99×10299 = "1 P.T. 9.99×10299"
p2_b300_0 1010300    (Called "2 P.T. 300" because it is "two powers of ten" with 300 above the two 10's. This is the first string that starts with p2.)
p3_b300_0 101010300    (Called "3 P.T. 300" because it is "three powers of ten" with 300 above the three 10's. This is the first string that starts with p3.)
pa10_c2345_6 10^(10^(...^(102345.6)))    (with a total of ten 10's. This is "10 P.T. 2345.6", that is, ten powers of 10 with 2345.6 at the top.)
pa27 27 P.T. 1    (A power-tower containing 27 10's and a 1 at the top. Since 101=10, the 1 can be ignored so it's just 27 powers of 10.)
pa99_c2345_6 99 P.T. 2345.6
pb100 100 P.T. 1    (100 powers of 10 with a 1 at the top. Since 101=10, the 1 can be ignored so it's just 100 powers of 10, and the "_1" can be left off.)
pb100_1 100 P.T. 1    (100 powers of 10 with a 1 at the top. Since 101=10, the 1 can be ignored so it's just 100 powers of 10.)
pb100_b234.5 100 P.T. 234.5
pb100_c1000 100 P.T. 1000
pb101_3 101 P.T. 3    (101 powers of 10 with a 3 at the top. Since 103=1000, this is the same as 100 P.T. 1000. This is the only potential ordering problem in the system. You have to be consistent, and not use "pb100_c1000" and "pb101_3" in the same set of strings.)
pb256_b619_299 256 P.T. 619.299    (255 powers of 10 with 1.99237...×10619 at the top. This is the large number called mega by Hugo Steinhaus and Leo Moser)
pd99999_c2345.6 99999 P.T. 2345.6
pe005_1_1 100000 P.T. 1    (Again, in this case, extra 0's can be left off at the end provided it doesn't result in a different value from that which is intended)
pe005_100000_1 100000 P.T. 1
pe005_100000_2 100000 P.T. 2    (If the exponent at the top of the power-tower is anything other than 1 (in this case a "2"), then all the 0's in 100000 need to be shown.)
pe005_100000_c2345_6 100000 P.T. 2345.6
pe010_1 10000000000 P.T. 1    =    (1010) P.T. 1
pe010_100000 10000000000 P.T. 1    =    (1010) P.T. 1
pe010_10000000000 10000000000 P.T. 1    =    (1010) P.T. 1
pe010_10000000000_1 10000000000 P.T. 1    =    (1010) P.T. 1
pe010_10000000000_2 10000000000 P.T. 2    =    (1010) P.T. 2 (Again, all 0's need to be shown in this case because there is a precise exponent at the top.)
pe010__1000__2__1 (This string is illegal. It has an ambiguous interpretaton: either pe010_10000000000_2_1 which is "2.1" at the top of 10000000000 10's, or pe010_10002000000_1 which is "1.0" at the top of 10002000000 10's. In addition to being ambiguous it would sort out of order: "pe010_1000_2_1" comes before pe010_10000000000_1 in lexicogrphical ordering even though both of its possible interpretations are larger numbers.)
pe299_9999999 (9.999999×10299) P.T. 1
pp1_b300 (10300) P.T. 1    (A power tower of 10's, of height 10300. This is the first string that starts with two p's.)
pp1_e006_1 (101000000) P.T. 1    (A power tower of 10's, of height 101000000 = 10106.)
pp2_b300_1 (1010300) P.T. 1    (A power tower of 10's, of height 1010300.)
ppb27_1 (27 P.T. 1) P.T. 1    (A power tower of 10's, of height X, where X is a power tower of 10's of height 27.)
ppp1_b300 (10300 P.T. 1) P.T. 1    (A power tower of 10's, of height X, where X is a power tower of 10's of height 10300. This is the first string that starts with three p's.)
. . .

The system continues in this way, adding one p for each repetition of "... where X is a power tower of 10's of height Y, ...". When the number of p's becomes unwieldy, the numbers are hard to work with in any system because the numbers I wanted to discuss were not equal to two numbers with a single "hyper" operator. In general it will eventually occur that definitive ordering is nonobvious, so I suppose any mechanical lexicographic system will fail to reach as far as the fast-growing functions used to define the largest numbers.

Order-Preserving Verbal Names

As mentioned above using the examples from [12], in the vast majority of cultures, the largest part of a number is said or written first. For example, consider 220=1048576. In standard English, this number is said "one million, forty-eight thousand, five hundred and seventy-six". Saying the largest part first has a benefit that is shared with the lexicographic string systems: the listener gets the most important part of the communication up-front.

In describing his bit-string systems, Knuth describes them as order-preserving because if expressed as a string of 1's and 0's, sorting lexicographically results in numeric value ordering. I generalise the designation "order-preserving" to include the concept of expressing the most significant information first.

In English the "largest first" custom is broken when communicating large numbers using scientific notation. When someone says "6.02×1023" in words, they say "six point zero two times ten to the power of twenty-three". Listening to these words, it initially sounds as though they are describing a number close to "six". The listener has no idea how large the number might be until the very end.

one million,forty-eight-thousand,five hundred andseventy-six

biggest part

smallest part
but
six point zero two times ten to the power of twenty-three

small part

even smaller

really big!
??huh??

To address this, the speaker can reverse the two parts of 6.02×1023, as if written "1023×6.02". When spoken, this becomes "ten to the power of twenty-three, times six point zero two" with the comma naturally serving the same function as it does after "one million" in my previous example:

*ten to the power
of twenty-three,*
times six point zero two

biggest part
much better!

smallest part

In general, the "inverted scientific notation grammar" is:

ten to the power of some number times some number

This works pretty well for a while, until we get to numbers like 1010100. This would normally be said "ten to the power of ten to the power of one hundred". Once we begin to encounter such "power towers" that contain two or more numbers, a new problem arises: the listener won't have a clue how big the number is until discovering how many times the speaker is going to repeat the words "ten to the power of...". Here is the old value of the larger Skewes' number:

ten to the power of ten to the power of ten to the power of thirty-four

it's big

getting bigger...

when will it end?

oh, I guess we're done
   backwards again...   

Once again we can address the problem by giving the most important bit of information up-front. This time, the most important bit of information is the number of 10's in the exponent stack before the final number at the top, which is then spoken using the existing convention for normal of "scientific notation" numbers. So, for example, to express 10101034, we can use the following (somewhat awkward) construction: "A power tower of three tens, with thirty-four at the top".

a power tower of three tens, with thirty-four at the top

really big

now we know how big

now we know more precisely

For reasons that become clear when working with such numbers, the "number at the top" can be pretty nearly anything that would normally be expressed in scientific notation (see the latter parts of my numbers pages for plenty of examples). Therefore we generalize the grammar:

a power-tower of some number tens, with some number at the top

where in this case "some number" can be anything from the "inverted scientific notation grammar" above.

Let us devise a way to describe the value of 10101010101.1, which comes up in a discussion by Don Page of the quantum Poincare recurrence time of a hypothetical huge universe, and works out to 10101.51×103883775501690 or 1010103.88×1012. The power-tower is five numbers high, and the first three are tens. The part at the top is 3.88×1012, but as previously discussed, this would be better if rearranged to 1012×3.88. So the full name works out to:

A power-tower of three tens, with ten to the power of twelve times three point eight eight at the top.

Now I'll present some more examples in increasing numerical order. For these examples I imagine that the following abbreviations have been adopted through gradual evolution and speaker laziness: tenpow for "ten to the power of"; powtow for "a power-tower of" (rhymes with kowtow); with for "tens, with"2; and omitting the final words "at the top".

2.54 "two point five four"
1000 "one thousand"
6.02×1023 "tenpow twenty-three times six point zero two"
10100, better known as "googol" "tenpow one hundred times one" or simply "tenpow one hundred"
1010000000000 "tenpow ten billion times one" or simply "tenpow ten billion"
101010 = 1010000000000 "powtow two with ten"
1010100, better known as "googolplex" "powtow two with one hundred"
101010000000 = 1010107 "powtow two with ten million" or "powtow three with seven"
10↑↑10 "powtow nine with ten" or more simply "powtow ten tens"
(↑↑ is iterated exponentiation)
10↑↑(10↑↑10) "powtow powtow ten tens tens"

After this point, the repeated "powtow" again causes a problem — the reader may wish to speculate on how to extend the grammar further.

However, if we stick with just one repetition of "power tower", we have a grammar that is order-preserving, that communicates the "most important bits" about a number's size right from the start, preserves the standard wording for the most common sizes of numbers, and also allows the communication of almost incomprehensibly large values on the high end.


Alternative Algorithms for Standard Formats

For now, this section is just a bibliography listing.

Exception Handling

John R. Hauser, "Handling Floating-Point Exceptions in Numeric Programs", ACM trans. on Prog. Lang. and Systems, v 18 no 2 (Mar 1996), 139--174

CORDIC algorithms

C. W. Schelin, "Calculator function approximation", Amer. Math. Monthly 90 (1983), 317--325.

Neil Eklund, "CORDIC: Elementary Function Computation Using Recursive Sequences"

"A Survey of CORDIC Algorithms for FPGA Based Computers", Ray Andraka, FPGA '98. Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, Feb. 22-24, 1998, Monterey, CA. pp. 191--200 http://www.andraka.com/files/crdcsrvy.pdf

Composite and Hybrid Systems

"Towards Exact Geometric Computation" (To Appear, Computational Geometry: Theory and Applications) Chee-Keng Yap, 1994

"Floating Point and Composite Arithmetics", W.N.Holmes, Proceedings of the Eighth Biennial Computational Techniques and Applications Conference, Adelaide, South Australia, Sep 20 - Oct 1, 1997

Neville Holmes, "Composite Arithmetic: Proposal for a New Standard." IEEE Computer 30(3): 65-73 (1997) Not online, but see: HolmesArith.pdf, HolmesDisp.pdf and HolmesStor.pdf

Unevaluated Sum of Multiple Native Floating-Point Values

This format is a simple way to get higher precision, without higher range, from another existing format. It relies on certain properties of rounding that are followed by most hardware floating point formats including IEEE 754. See my page on f161 "triple-double" precision for more details.

Dekker, T. (1971) A floating-point technique for extending the available precision. In Numerische Mathematik 18, 224-242.

Wyatt, W.T., et. al. (1976) A portable extended precision arithmetic package and library with Fortran precompiler. ACM Trans. Math. Softw. 2(3), 209-231.

Brent, R. (1978) A Fortran multiple-precision arithmetic package. ACM Trans. Math. Softw. 4(1), 57-70.

Briggs, K. (1998) Doubledouble floating point arithmetic. http://keithbriggs.info/software.html

Li, X., et. al. (2000). Design, implementation and testing of extended and mixed precision BLAS. Lawrence Berkeley National Laboratory. Technical Report LBNL-45991

Hida, Y., et. al. (2001) Quad-double arithmetic: algorithms, implementation, and application. Lawrence Berkeley National Laboratory, Technical Report LBNL-46996.

Priest (1991) Algorithms for Arbitrary Preciion Floating Point Arithmetic. Proc. 10th Symposium on Computer Arithmetic, IEEE Computer Society Press, Caif., 1991.

Priest (1992) On Properties of Floating Point Arithmetics: Numerical Stability and the Cost of Accurate Computations.

Shewchuk (1997) Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates. Discrete and Computational Geometry 18(3), 305-363.

Arbitrary Precision

R.P. Brent, "Fast multiple-precision evaluation of elementary functions", Journal of the ACM, 23(2), 1976, pp. 242-251.

General and Survey papers

"Computer Arithmetic: Principles, Architectures, and VLSI Design (1999)", Reto Zimmermann, Lecture notes March 16, 1999


Incremental Games

Many online (browser-based) games of the "idle" or "incremental" variety require very large numbers, and for these a variety of special representation formats has been invented. These include simple modifications such as storing a number as its logarithm (extending the overflow limit to 1021024), the use of a level-index system for repeated exponentiation, and more elaborate multi-level-index systems to handle repeated use of the tetration or "hyper4" function a↑↑b and higher operators with more arrows. The ExpantaNum.js library by Naruyoko can handle anything up to Graham's number

As an example, I discuss the Omagenum.js library in more detail, in the online game section of my large numbers notes.


Bibliography

[1] Donald E. Knuth, Supernatural numbers. Appears on pages 310-325 in The Mathematical Gardner, ed. David A. Klarner (1981).

[2] Clenshaw, C. W. and Olver, F. W. J.: "Beyond floating point." J. Assoc. Comput. Mach. 31 (1984), 319--328.

[3] C. W. Clenshaw, D. W. Lozier, F. W. J. Olver, and P. R. Turner, "Generalized exponential and logarithmic functions", Comput. Math. Appl. 12B (1986), 1091--1101.

[4] C.W. Clenshaw, F.W.J. Olver, and P.R. Turner. "Level-index arithmetic: an introductory survey." In P.R. Turner, editor, Numerical Analysis and Parallel Processing, pages 95--168. Springer-Verlag, 1987. Lecture Notes in Mathematics, No.1397.

[5] Clenshaw, C. W. and Olver, F. W. J. 1987. "Level-index arithmetic operations." SIAM J. Num. Anal. 24, 2 (Apr.), 470--485.

[6] C. W. Clenshaw and P. R. Turner, "The symmetric level-index system", IMA J. Numer. Anal. 8 (1988), 517--526.

[7] Peter R. Turner. "A software implementation of SLI arithmetic." In Proc. 9th Symp. on Computer Arithmetic, pp. 18--24. IEEE Computer Society Press, 1989.

[8] D.W.Lozier and F.W.J.Olver, "Closure and precision in level-index arithmetic", SIAM J. Numer. Anal. 27 (1990), 1295-1304.

*In this paper he shows that arithmetic using the four standard operators is closed in a level-index system with level <= 6 and mantissa <= 5,500,000 bits.*

[9] D.W. Lozier and P.R. Turner, "Error-Bounding in Level-Index Computer Arithmetic", in G. Alefeld and J. Herzberger, eds., Numerical Methods and Error Bounds, Akademie Verlag, Berlin, 1996, pp. 138-145.

[10] Michael A. Anuta, Daniel W. Lozier, Nicolas Schabanel, And Peter R. Turner, "Basic Linear Algebra Operations In Sli Arithmetic", Euro-Par '96, LNCS 1124, Vol. II, Springer-Verlag (1996), pp. 193-202.

[11] Michael A. Anuta, Daniel W. Lozier, and Peter R. Turner, "The MasPar MP-1 As a Computer Arithmetic Laboratory" J. Res. Nat. Inst. Standards and Technology 101,2 (1996) pp. 165-174. http://nvl.nist.gov/pub/nistpubs/jres/101/2/j2anut.pdf

Very good survey, includes several other number systems as well

[12] Georges Ifrah, The Universal History of Numbers, ISBN 0-471-37568-3. (1999). (A concise summary of many ancient systems, all with dates and illustrated examples appears in chapter 23, "The Final Stage of Numerical Notation".)


Footnotes

1 : George Spelvin, email correspondence.

2 : Due to the power tower paradox, the number of items in a power tower matters far more than their exact values, so tens is unnecessarily specific. For the purposes of this naming system, as well as for nearly all computational purposes, a power tower of three 10's with googol at the top is equivalent to a power tower of three 2's with googol at the top, or three 1000's with googol at the top:

222googol ≈ 101010googol ≈ 100010001000googol

See also the uncomparably larger discussion on my large numbers page.


Robert Munafo's home pages on AWS    © 1996-2024 Robert P. Munafo.    about    contact
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.

This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2024 Jan 04. s.27