mdbtxt1
mdbtxt2
Proceed to Safety

Minimally Complex Sequences    

Quick Links:    MCS Source Code    License (Creative Commons Attribution-NonCommercial 4.0 International)

This is a listing of integer sequences generated by what I call 'classical' definitions. These include virtually all number sequences known by ancient cultures (the notable exception is the prime numbers). There are also many others that have equally simple definitions. Examples include:

The subject of this page is a broader class of recurrence relations than those described on the 2nd-order linear recurrence page, but not broad enough to include the recurrence formulas on my accelerating sequences page.

My comprehensive sequences page is broader still, and has pointers to many other specialized pages on other subcategories of sequences.

Contents

Types of Sequences in my Catalog
Detailed Description of Sequence Numbering System
C Source Code
Ordering Used in the Table
Main Table
      There is an index to the main table at the bottom of each page.
Statistics
Footnotes

Types of Sequence Definitions

The following broad categories of sequence definitions are used:

Direct: Each term AN is computed directly from a formula involving N but not depending on other terms in the sequence.

(This category includes the even numbers, if defined as EN=2N.)

First-Order Recurrence1: After an initial term A0, subsequent terms AN+1 are defined in terms of a formula involving AN (and possibly also involving N).

(This category includes the factorials, if defined as F0=1, FN=N  FN-1, and the even numbers, if defined as E0=0, EN=2+EN-1. Notice that the same sequence can have both a direct definition and a recurrence definition.)

Second-Order Recurrence1: After two initial terms A0 and A1, subsequent terms AN+1 are defined in terms of a formula involving AN-1, and possibly also AN and/or N.

(This category includes the Fibonacci numbers, F0=0, F1=1, FN+1=FN+FN-1. It includes all of the 2nd-order linear recurrence sequences described here but also a lot of others such as MCS54400, which is nonlinear because its formula has an NAN term.).

Third-Order Recurrence1: After three initial terms A0, A1 and A2, subsequent terms AN+1 are defined in terms of a formula involving AN-2, and possibly also AN, AN-1, and/or N.

All formulas are a sum of terms involving a constant coefficient multiplied by some combination of a power of N and zero or more previous terms. This table shows the combinations that are used:

initial
terms
   coefficients
        1         N           N2         N3
A0    AN    NAN
A1    AN-1 NAN-1
A2    AN-2

Direct formulas use only the first row; first-order recurrence formulas use the first two rows, and so on.

Horadam sequences

Some of the sequences belong to a more narrowly-defined category studied by Horadam and possessing some interesting properties. They are discussed more fully on my linear recurrences page: The Horadam Sequences.

Lucas sequences

The 2nd-order Lucas sequences are another narrowly-defined category of linear recurrence relations. Again, see my linear recurrences page: The Lucas Sequences.

Sequence ID Number

Each sequence definition is given an identification number that uniquely specifies the initial term(s) (if any) and formula. All ID numbers begin with the letters MCS (for Minimally Complex Sequence) followed by one or more decimal digits. The decimal digits are chosen by the following steps:

1. The sequence type, initial value(s) if needed and coefficients are encoded into fields, either by using the coefficient directly as a field value, or using a translation table that ensures the "default" coefficient has a field value of 0.
2. The fields are Huffman-encided into binary strings.
3. The binary strings are concatenated into a single bitstring.
4. The bitstring is modified into a truncated bitstring.
5. The truncated bitstring is treated as a single positive integer; this is the sequence number, and the decimal representation of this integer is placed after MCS to make the sequence ID number.

Step 1: Fields

The first field gives the sequence type as defined above: 4 for direct, 3 for 1st-order recurrence, 2 for 2nd-order recurrence and 1 for 3rd-order recurrence. After this there are several more fields for the coefficients, in the order shown here:

For direct definitions: K, CN, CN2, CN3

For 1st-order recurrence: K, CAN, CN, CN AN, CN2, CN3, A0

For 2nd-order recurrence: K, CAN, CN, CAN-1, CN AN, CN2, CN3, CN AN-1, D1, A0

For 3rd-order recurrence: K, CAN, CN, CAN-1, CN AN, CN2, CAN-2, CN AN-1, CN3, D2, D1, A0

where D1=A1-A0-1 and D2=A2-A1-1

Step 2: Huffman Encoding

All fields are then converted to one or more binary digits according to the following table:

  0 → 0
  1 → 100
  2 → 1010
-2 → 10110
  4 → 101110
-4 → 1011110
-5 → 1011111
-1 → 110
  3 → 1110
  5 → 111100
  6 → 1111010
-6 → 1111011
-3 → 111110
  7 → 111111000
  8 → 111111001
  x → 11111101x (for future expansion)
-7 → 111111100
-8 → 111111101
  x → 11111111x (for future expansion)

Then the fields, so encoded, are concatenated to form the bitstring. This table was generated by a normal Huffman statistical division based on the frequency of coefficient values in all sequence definitions with a complexity score (see below) of 9 or less; then the lengths were diminished as much as possible with the goal of reducing the average overall bitstring length; it was then reordered so that the shorter (more common) bitstrings end in 0, and so that larger coefficients map onto larger bitstrings.

Steps 3-4: Truncated Bitstring, and Sequence Number

To make the truncated bitstring, first an initial 1 is added (because otherwise, initial 0's would be lost when the bitstring is converted to a decimal integer). Then any trailing 0's are removed (there is no loss of information because, as specified below, extra 0 terms at the end are always assumed and allow future expansion). Then a single trailing 1 is removed (again there is no loss of information because at this point we know there is at least one trailing 1). The bitstring is then converted to decimal as if it were a binary integer. "MCS" followed by this decimal number is the sequence ID number. No zeros are added; the sequence numbers have a variable number of digits. In general, sequences with simpler definitions will have smaller sequence numbers.

Future Expansion

For future expansion of the sequence generator, sequence types other than 1 through 4 are unused (which means that there are currently no bitstrings starting with 10, corresponding to sequence numbers between 2N and 3×2N-1 for all N; plus other smaller ranges defined similarly). This provides space to define new sequence types and formulas, without having to resort to long sequence ID numbers. These new sequence types can use any encoding whatsoever, even abandoning the requirements given in the next few paragraphs, so long as the bitstring of all 0's remains undefined (because it results in a 0 or null sequence number).

Additional fields can be added at the end of the existing coefficient lists. For example, an N4 term could be added, with all existing sequences having a CN4 coefficient of 0. Any new fields and the corresponding formula terms will be defined with this compatibility in mind. (Here is an example: suppose it is decided to allow the coefficient of AN to be a fraction. This necessitates a new "denominator of CAN" term. Its useful values are positive integers; the value of 1 is the one that has no effect. So, a 1 in this coefficient would be encoded as a field value of 0; the higher values 2, 3, 4, 5, ... could be encoded as 1, -1, 2, 3, etc. respectively, in order to take maximum advantage of the huffman field lengths.) Since the Huffman table maps 0→0, and trailing 0's are discarded before converting to a decimal sequence ID, it is easy to see that any number of extra 0 fields can be added, with no effect on the resulting sequence ID number. Even after the hypothetical N4 and "denominator of CAN" terms are added, all previously-defined sequences will still have the same ID numbers.

Furthermore, for any new fields, a different Huffman table could be used, or the expansion could even use a non-Huffman type of encoding, so long as all sequence numbers as currently defined result in the sequences currently defined. (For example, suppose it is decided that the constant, N, N2 and N3 terms should each have an optional factor of -1N. In this case the best approach is to add 4 fields that are each exactly 1 bit wide (or a single field of 4 bits). For backwards compatibility, bit values of 0 mean there is no factor of -1N; and all sequence numbers would be interpreted as having this extra 4 bits at the end of the bitstring.)

There are by definition no sequence numbers with a leading 0 when expressed in decimal, such as MCS027. These are intended to be used as manually-assigned sequence numbers, as in the Sloane database (except that there is still no fixed number of digits). Thus, to find out the definition of MCS027 or MCS0143 you would have to look in the database. (No such database exists at present, however it is anticipated that these will be used to attach notes and cross-references, for example to document the multiple different equivalent definitions of a sequence; an example follows.)

Notes

The sequence number is not affected by the scoring system, although the scoring system affects which sequences are listed in the main table. There is, however, a rough correlation between the coefficient value weights in the scoring system and their Huffman symbol lengths.

Aliases

Some sequences are shown with multiple sequence numbers — an initial sequence number in larger print, followed by "(alias ...)" with additional sequence numbers. The aliases represent different formulas that create the same sequence. Here is an example of two definitions that are considered to be aliases:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
   MCS220: AN = N (score: 1)
-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, ...
   MCS886: AN = N - 1 (score: 2)

These sequences are considered equivalent because they differ only in the initial -1 at the beginning of MCS886. In the main table they are both listed under MCS220. The number MCS220 is chosen not because 220 is less than 886, but because MCS220 has a lower complexity score (1 vs. 2).

For the purposes of matching aliases, any initial -1, 0 or 1 terms are ignored but all other terms are significant, including -1, 0 or 1 terms that occur after the first term of magnitude 2 or greater.

Most of the time, aliases will have quite different sequence numbers, as in the example with MCS220 and MCS886. However, in a few rare cases the sequence numbers are very close (such as MCS860675 and MCS860676, two versions of the Lucas numbers that differ only by an initial "1"). Another curiousity (that has an easy explanation if you look at the encoding scheme) is seen in MCS220 = N, MCS440 = N2 and MCS880 = N3. And although it is not listed here, MCS110 is N0.

Examples

The Fibonacci numbers have a second-order recurrence definition: A0=0; A1=1; AN+1=AN+AN-1. The coefficients that need to be encoded are: sequence type=2; K=0; CAN=1; CN=0; CAN-1=1; CN AN=0; CN2=0; CN3=0; CN AN-1=0; A1-A0-1=0; A0=0. So the fields, before encoding are: 2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0. By the Huffman table, this becomes the bitstring 1.1010.0.100.0.100.0.0.0.0.0.0 (after adding the initial 1). Then trailing zeros are dropped, and then a single trailing 1 is dropped, making 1.1010.0.100.0 or 1101001000; this is converted to decimal giving the value 840. So, this definition of the Fibonacci numbers is given sequence number MCS456.

Note that some sequences have multiple definitions. For example, the natural numbers may be defined as a direct sequence: AN = N. Using this definition the bitstring is 1.10110.0.100.0, and the sequence number is MCS108. However, the natural numbers can also be defined by a 1st-order recurrence A0=0; AN+1 = AN + 1. By this definition, the bitstring is 1.1110.100.100.0.0.0.0, and the sequence number is MCS244. As you might expect, the first definition has a lower complexity score, and that is the one shown in the main table.

Complexity Score

Each sequence definition is given a complexity score based on the number of terms and the size of its coefficients. The basic idea is to take the number of terms in the formula and add the absolute value of the initial terms plus any coefficients (a coefficient of 1 does not count). After a while I made the score calculation a bit more complex to better handle 2nd- and 3rd-order recurrence formulas.

For examples of how it works just browse the following table and search for "score: 3", etc.

The main table includes those sequences with complexity scores up to 5.

Efficient Implementation

I have written an efficient search engine that finds MCS sequences matching a given query, with results ordered by complexity score. The source code is here:

mcsfind.c
rpmtypes.h
int128.c
int128.h
type_s16.h
type_s32.h
type_s64.h

Build Instructions

These instructions are for UNIX, Linux or MacOS (using a Terminal or Shell window).

Save each of the source files on your computer, and remove the ".txt" from the end of each of the source files' names. Put them into directories that are set up like the following. The name of the directory "include" is important, but you can use different names for "development" and "mcs" if you wish:

development include rpmtypes.h int128.c int128.h type_s16.h type_s32.h type_s64.h mcs mcsfind.c

Then from within the development/mcs directory, use the following commands:

g++ -c -m64 ../include/int128.c -o int128.o g++ -m64 -I../include mcsfind.c -lm int128.o -o mcsfind

If the compilation is successful, you should be able to perform MCS searches like the following example:

   $ time ./mcsfind -w1 2 3 5 9 2, 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049, 4097, 8193, 16385, 32769, 65537, 131073, 262145, 524289, 1048577, 2097153, 4194305, 8388609, 16777217, ... MCS252546 : A[0] = 2; A[N] = 2 A[N-1] - 1 (score: 4) 1, 2, 3, 5, 9, 15, 25, 43, 73, 123, 209, 355, 601, 1019, 1729, 2931, 4969, 8427, 14289, 24227, 41081, 69659, 118113, 200275, 339593, 575819, 976369, ... MCS802976 : A[0] = 1; A[1] = 2; A[2] = 3; A[N] = A[N-1] + 2 A[N-3] (score: 5) 1, 1, 2, 3, 5, 9, 18, 39, 89, 209, 498, 1195, 2877, 6937, 16738, 40399, 97521, 235425, 568354, 1372115, 3312565, 7997225, 19306994, 46611191, 112529353, ... MCS6904326 : A[0] = 1; A[1] = 1; A[K+1] = 2 A[K] + A[K-1] - K (score: 5)    real 0m0.008s user 0m0.006s sys 0m0.002s $

The mcsfind command has many options and includes built-in help. To see the built-in help, use the command ./mcsfind --help

Performance

Using the --report-all and -n options, along with the wc tool in UNIX, one can measure the speed with which mcsfind generates the N "simplest" sequences. For example, the command:

time mcs -ls20 --report-all -n10000 | wc

might show that mcsfind took 0.326 seconds to generate 10,000 sequences. In the following table I measure the number of sequences, and time taken, for various values of the -ls option, which sets an upper limit on the score of all the generated sequences. A sample command is:

time mcs --report-all -ls8 | wc

max score n time sequences/sec
3 73 0.180s 405
4 328 0.112s 2929
5 1316 0.161s 8174
6 4905 0.204s 24040
7 17278 0.567s 30470
8 58170 1.308s 44470
9 265303 6.605s 40170
10 866247 17.398s 49790
11 1805961 43.129s 41870
12 9727114 196.153s 49590
13 24867423 555.400s 44770
14 73177857 1447.398s 50560
15 73177857
16 242924041 4938.383s 49190

All tests were run on a 2.16 GHz Nehalem Xeon workstation, but the workstation was busier when the odd-numbered entries were run, giving those a lower benchmark in the sequences/sec column. I expect that any CPU with a speed of 2 GHz or faster will have no problem meeting or exceeding 40,000 sequences/sec on runs of a few seconds or longer.

Note also that the results for maximum score of 15 are the same as for 16. This results from the scoring algorithm and limits on coefficient magnitudes; there are simply no sequences with a score in the range (14..15].


Ordering Used in the Table

I use the ordering originally established by N.J.A. Sloane [1].

Here the sequences are presented in lexicographic order, as if minus signs and any leading -1, 0 or 1 terms were ignored. For example, "1, -1, -2, 1, 6, 5, -6, ..." is treated as if it starts "2, 1, 6, 5, 6, ...".

Note in particular that any aliases (see above) are considered identical and listed together just as they would have been in Sloane's books ([1] and [2]).



. . . Forward to page 2 . . . Last page (page 30)



Quick Index:
      Sequences Beginning 2,0,...
      Sequences Beginning 2,1,...
      Sequences Beginning 2,2,0,... ; 2,2,1,... ; etc.
      Sequences Beginning 2,2,4,... ; 2,2,5,... ; etc.
      Sequences Beginning 2,3,0,... ; 2,3,1,... ; etc.
      Sequences Beginning 2,3,4,... ; 2,3,5,... ; etc.
      Sequences Beginning 2,4,...
      Sequences Beginning 2,5,... ; 2,6,... and 2,7,...
      Sequences Beginning 2,8,... ; 2,9,... ; etc.
      Sequences Beginning 3,0,... ; 3,1,... ; etc.
      Sequences Beginning 3,4,... ; 3,5,... ; etc.
      Sequences Beginning 3,8,... ; 3,9,... ; etc.
      Sequences Beginning 4,0,... ; 4,1,... ; etc.
      Sequences Beginning 4,6,... ; 4,7,... ; etc.
      Sequences Beginning 5,...
      Sequences Beginning 6,... ; 7,... ; etc.



Robert Munafo's home pages on AWS    © 1996-2024 Robert P. Munafo.    about    contact
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here.

This page was written in the "embarrassingly readable" markup language RHTF, and some sections were last updated on 2024 Mar 02. s.30