Minimally Complex Sequences

Quick Links: MCS Source Code License (Creative Commons Attribution-NonCommercial 4.0 International)

This is a listing of integer sequences generated by what I call 'classical' definitions. These include virtually all number sequences known by ancient cultures (the notable exception is the prime numbers). There are also many others that have equally simple definitions. Examples include:

The natural numbers: 0, 1, 2, 3, 4, 5, 6, 7, ...
The squares: 0, 1, 4, 9, 16, 25, 36, 49, ... (and others that have a "figurate" interpretation)
The powers of 2: 1, 2, 4, 8, 16, 32, 64, ...
The Fibonacci numbers: 0, 1, 1, 2, 3, 5, 8, 13, ...
The factorials: 1, 1, 2, 6, 24, 120, 720, ...

The subject of this page is a broader class of recurrence relations than those described on the 2^nd-order linear recurrence page, but not broad enough to include the recurrence formulas on my accelerating sequences page. This page is a list of links to other pages I've written on subcategories of sequences.

Types of Sequences in my Catalog
Detailed Description of Sequence Numbering System
C Source Code
Ordering Used in the Table
Main Table
There is an index to the main table at the bottom of each page.
Statistics
Footnotes

Types of Sequence Definitions

The following broad categories of sequence definitions are used:

Direct: Each term A_N is computed directly from a formula involving N but not depending on other terms in the sequence.

(This category includes the even numbers, if defined as E_N=2N.)

First-Order Recurrence¹: After an initial term A₀, subsequent terms A_N+1 are defined in terms of a formula involving A_N (and possibly also involving N).

(This category includes the factorials, if defined as F₀=1, F_N=N F_N-1, and the even numbers, if defined as E₀=0, E_N=2+E_N-1. Notice that the same sequence can have both a direct definition and a recurrence definition.)

Second-Order Recurrence¹: After two initial terms A₀ and A₁, subsequent terms A_N+1 are defined in terms of a formula involving A_N-1, and possibly also A_N and/or N.

(This category includes the Fibonacci numbers, F₀=0, F₁=1, F_N+1=F_N+F_N-1. It includes all of the 2^nd-order linear recurrence sequences described here but also a lot of others such as MCS54400, which is nonlinear because its formula has an NA_N term.).

Third-Order Recurrence¹: After three initial terms A₀, A₁ and A₂, subsequent terms A_N+1 are defined in terms of a formula involving A_N-2, and possibly also A_N, A_N-1, and/or N.

All formulas are a sum of terms involving a constant coefficient multiplied by some combination of a power of N and zero or more previous terms. This table shows the combinations that are used:

initial terms		coefficients
		1 N N² N³
A₀		A_N NA_N
A₁		A_N-1 NA_N-1
A₂		A_N-2

Direct formulas use only the first row; first-order recurrence formulas use the first two rows, and so on.

Horadam sequences

Some of the sequences belong to a more narrowly-defined category studied by Horadam and possessing some interesting properties. They are discussed more fully on my linear recurrences page: The Horadam Sequences.

Lucas sequences

The 2^nd-order Lucas sequences are another narrowly-defined category of linear recurrence relations. Again, see my linear recurrences page: The Lucas Sequences.

Sequence ID Number

Each sequence definition is given an identification number that uniquely specifies the initial term(s) (if any) and formula. All ID numbers begin with the letters MCS (for Minimally Complex Sequence) followed by one or more decimal digits. The decimal digits are chosen by the following steps:

1. The sequence type, initial value(s) if needed and coefficients are encoded into fields, either by using the coefficient directly as a field value, or using a translation table that ensures the "default" coefficient has a field value of 0.
2. The fields are Huffman-encided into binary strings.
3. The binary strings are concatenated into a single bitstring.
4. The bitstring is modified into a truncated bitstring.
5. The truncated bitstring is treated as a single positive integer; this is the sequence number, and the decimal representation of this integer is placed after MCS to make the sequence ID number.

Step 1: Fields

The first field gives the sequence type as defined above: 4 for direct, 3 for 1^st-order recurrence, 2 for 2^nd-order recurrence and 1 for 3^rd-order recurrence. After this there are several more fields for the coefficients, in the order shown here:

For direct definitions: K, C_N, C_N², C_N³

For 1^st-order recurrence: K, C_{A_N}, C_N, C_{N A_N}, C_N², C_N³, A₀

For 2^nd-order recurrence: K, C_{A_N}, C_N, C_{A_N-1}, C_{N A_N}, C_N², C_N³, C_{N A_N-1}, D₁, A₀

For 3^rd-order recurrence: K, C_{A_N}, C_N, C_{A_N-1}, C_{N A_N}, C_N², C_{A_N-2}, C_{N A_N-1}, C_N³, D₂, D₁, A₀

where D₁=A₁-A₀-1 and D₂=A₂-A₁-1

Step 2: Huffman Encoding

All fields are then converted to one or more binary digits according to the following table:

  0 → 0
  1 → 100
  2 → 1010
-2 → 10110
  4 → 101110
-4 → 1011110
-5 → 1011111
-1 → 110
  3 → 1110
  5 → 111100
  6 → 1111010
-6 → 1111011
-3 → 111110
  7 → 111111000
  8 → 111111001
  x → 11111101x (for future expansion)
-7 → 111111100
-8 → 111111101
  x → 11111111x (for future expansion)

Then the fields, so encoded, are concatenated to form the bitstring. This table was generated by a normal Huffman statistical division based on the frequency of coefficient values in all sequence definitions with a complexity score (see below) of 9 or less; then the lengths were diminished as much as possible with the goal of reducing the average overall bitstring length; it was then reordered so that the shorter (more common) bitstrings end in 0, and so that larger coefficients map onto larger bitstrings.

Steps 3-4: Truncated Bitstring, and Sequence Number

To make the truncated bitstring, first an initial 1 is added (because otherwise, initial 0's would be lost when the bitstring is converted to a decimal integer). Then any trailing 0's are removed (there is no loss of information because, as specified below, extra 0 terms at the end are always assumed and allow future expansion). Then a single trailing 1 is removed (again there is no loss of information because at this point we know there is at least one trailing 1). The bitstring is then converted to decimal as if it were a binary integer. "MCS" followed by this decimal number is the sequence ID number. No zeros are added; the sequence numbers have a variable number of digits. In general, sequences with simpler definitions will have smaller sequence numbers.

Future Expansion

For future expansion of the sequence generator, sequence types other than 1 through 4 are unused (which means that there are currently no bitstrings starting with 10, corresponding to sequence numbers between 2^N and 3×2^N-1 for all N; plus other smaller ranges defined similarly). This provides space to define new sequence types and formulas, without having to resort to long sequence ID numbers. These new sequence types can use any encoding whatsoever, even abandoning the requirements given in the next few paragraphs, so long as the bitstring of all 0's remains undefined (because it results in a 0 or null sequence number).

Additional fields can be added at the end of the existing coefficient lists. For example, an N⁴ term could be added, with all existing sequences having a C_N⁴ coefficient of 0. Any new fields and the corresponding formula terms will be defined with this compatibility in mind. (Here is an example: suppose it is decided to allow the coefficient of A_N to be a fraction. This necessitates a new "denominator of C_{A_N}" term. Its useful values are positive integers; the value of 1 is the one that has no effect. So, a 1 in this coefficient would be encoded as a field value of 0; the higher values 2, 3, 4, 5, ... could be encoded as 1, -1, 2, 3, etc. respectively, in order to take maximum advantage of the huffman field lengths.) Since the Huffman table maps 0→0, and trailing 0's are discarded before converting to a decimal sequence ID, it is easy to see that any number of extra 0 fields can be added, with no effect on the resulting sequence ID number. Even after the hypothetical N⁴ and "denominator of C_{A_N}" terms are added, all previously-defined sequences will still have the same ID numbers.

Furthermore, for any new fields, a different Huffman table could be used, or the expansion could even use a non-Huffman type of encoding, so long as all sequence numbers as currently defined result in the sequences currently defined. (For example, suppose it is decided that the constant, N, N² and N³ terms should each have an optional factor of -1^N. In this case the best approach is to add 4 fields that are each exactly 1 bit wide (or a single field of 4 bits). For backwards compatibility, bit values of 0 mean there is no factor of -1^N; and all sequence numbers would be interpreted as having this extra 4 bits at the end of the bitstring.)

There are by definition no sequence numbers with a leading 0 when expressed in decimal, such as MCS027. These are intended to be used as manually-assigned sequence numbers, as in the Sloane database (except that there is still no fixed number of digits). Thus, to find out the definition of MCS027 or MCS0143 you would have to look in the database. (No such database exists at present, however it is anticipated that these will be used to attach notes and cross-references, for example to document the multiple different equivalent definitions of a sequence; an example follows.)

Notes

The sequence number is not affected by the scoring system, although the scoring system affects which sequences are listed in the main table. There is, however, a rough correlation between the coefficient value weights in the scoring system and their Huffman symbol lengths.

Aliases

Some sequences are shown with multiple sequence numbers — an initial sequence number in larger print, followed by "(alias ...)" with additional sequence numbers. The aliases represent different formulas that create the same sequence. Here is an example of two definitions that are considered to be aliases:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
MCS220: A_N = N (score: 1)
-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, ...
MCS886: A_N = N - 1 (score: 2)

These sequences are considered equivalent because they differ only in the initial -1 at the beginning of MCS886. In the main table they are both listed under MCS220. The number MCS220 is chosen not because 220 is less than 886, but because MCS220 has a lower complexity score (1 vs. 2).

For the purposes of matching aliases, any initial -1, 0 or 1 terms are ignored but all other terms are significant, including -1, 0 or 1 terms that occur after the first term of magnitude 2 or greater.

Most of the time, aliases will have quite different sequence numbers, as in the example with MCS220 and MCS886. However, in a few rare cases the sequence numbers are very close (such as MCS860675 and MCS860676, two versions of the Lucas numbers that differ only by an initial "1"). Another curiousity (that has an easy explanation if you look at the encoding scheme) is seen in MCS220 = N, MCS440 = N² and MCS880 = N³. And although it is not listed here, MCS110 is N⁰.

Examples

The Fibonacci numbers have a second-order recurrence definition: A₀=0; A₁=1; A_N+1=A_N+A_N-1. The coefficients that need to be encoded are: sequence type=2; K=0; C_{A_N}=1; C_N=0; C_{A_N-1}=1; C_{N A_N}=0; C_N²=0; C_N³=0; C_{N A_N-1}=0; A₁-A₀-1=0; A₀=0. So the fields, before encoding are: 2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0. By the Huffman table, this becomes the bitstring 1.1010.0.100.0.100.0.0.0.0.0.0 (after adding the initial 1). Then trailing zeros are dropped, and then a single trailing 1 is dropped, making 1.1010.0.100.0 or 1101001000; this is converted to decimal giving the value 840. So, this definition of the Fibonacci numbers is given sequence number MCS456.

Note that some sequences have multiple definitions. For example, the natural numbers may be defined as a direct sequence: A_N = N. Using this definition the bitstring is 1.10110.0.100.0, and the sequence number is MCS108. However, the natural numbers can also be defined by a 1^st-order recurrence A₀=0; A_N+1 = A_N + 1. By this definition, the bitstring is 1.1110.100.100.0.0.0.0, and the sequence number is MCS244. As you might expect, the first definition has a lower complexity score, and that is the one shown in the main table.

Complexity Score

Each sequence definition is given a complexity score based on the number of terms and the size of its coefficients. The basic idea is to take the number of terms in the formula and add the absolute value of the initial terms plus any coefficients (a coefficient of 1 does not count). After a while I made the score calculation a bit more complex to better handle 2^nd- and 3^rd-order recurrence formulas.

For examples of how it works just browse the following table and search for "score: 3", etc.

The main table includes those sequences with complexity scores up to 5.

Efficient Implementation

I have written an efficient search engine that finds MCS sequences matching a given query, with results ordered by complexity score. The source code is here:

mcsfind.c
rpmtypes.h
int128.c
int128.h
type_s16.h
type_s32.h
type_s64.h

Build Instructions

These instructions are for UNIX, Linux or MacOS (using a Terminal or Shell window).

Save each of the source files on your computer, and remove the ".txt" from the end of each of the source files' names. Put them into directories that are set up like the following. The name of the directory "include" is important, but you can use different names for "development" and "mcs" if you wish:

development include rpmtypes.h int128.c int128.h type_s16.h type_s32.h type_s64.h mcs mcsfind.c

Then from within the development/mcs directory, use the following commands:

g++ -c -m64 ../include/int128.c -o int128.o g++ -m64 -I../include mcsfind.c -lm int128.o -o mcsfind

If the compilation is successful, you should be able to perform MCS searches like the following example:

$ time ./mcsfind -w1 2 3 5 9 2, 3, 5, 9, 17, 33, 65, 129, 257, 513, 1025, 2049, 4097, 8193, 16385, 32769, 65537, 131073, 262145, 524289, 1048577, 2097153, 4194305, 8388609, 16777217, ... MCS252546 : A[0] = 2; A[N] = 2 A[N-1] - 1 (score: 4) 1, 2, 3, 5, 9, 15, 25, 43, 73, 123, 209, 355, 601, 1019, 1729, 2931, 4969, 8427, 14289, 24227, 41081, 69659, 118113, 200275, 339593, 575819, 976369, ... MCS802976 : A[0] = 1; A[1] = 2; A[2] = 3; A[N] = A[N-1] + 2 A[N-3] (score: 5) 1, 1, 2, 3, 5, 9, 18, 39, 89, 209, 498, 1195, 2877, 6937, 16738, 40399, 97521, 235425, 568354, 1372115, 3312565, 7997225, 19306994, 46611191, 112529353, ... MCS6904326 : A[0] = 1; A[1] = 1; A[K+1] = 2 A[K] + A[K-1] - K (score: 5) real 0m0.008s user 0m0.006s sys 0m0.002s $

The mcsfind command has many options and includes built-in help. To see the built-in help, use the command ./mcsfind --help

Performance

Using the --report-all and -n options, along with the wc tool in UNIX, one can measure the speed with which mcsfind generates the N "simplest" sequences. For example, the command:

time mcs -ls20 --report-all -n10000 | wc

might show that mcsfind took 0.326 seconds to generate 10,000 sequences. In the following table I measure the number of sequences, and time taken, for various values of the -ls option, which sets an upper limit on the score of all the generated sequences. A sample command is:

time mcs --report-all -ls8 | wc

max score	n	time	sequences/sec
3	73	0.180s	405
4	328	0.112s	2929
5	1316	0.161s	8174
6	4905	0.204s	24040
7	17278	0.567s	30470
8	58170	1.308s	44470
9	265303	6.605s	40170
10	866247	17.398s	49790
11	1805961	43.129s	41870
12	9727114	196.153s	49590
13	24867423	555.400s	44770
14	73177857	1447.398s	50560
15	73177857	*	*
16	242924041	4938.383s	49190

All tests were run on a 2.16 GHz Nehalem Xeon workstation, but the workstation was busier when the odd-numbered entries were run, giving those a lower benchmark in the sequences/sec column. I expect that any CPU with a speed of 2 GHz or faster will have no problem meeting or exceeding 40,000 sequences/sec on runs of a few seconds or longer.

* NOTE that the results for maximum score of 15 are the same as for 14. This results from the scoring algorithm and limits on coefficient magnitudes; there are no sequences with a score in the range (14..15], so the total for 15 ends up being the same as for 14.

Ordering Used in the Table

I use the ordering originally established by N.J.A. Sloane [1].

Here the sequences are presented in lexicographic order, as if minus signs and any leading -1, 0 or 1 terms were ignored. For example, "1, -1, -2, 1, 6, 5, -6, ..." is treated as if it starts "2, 1, 6, 5, 6, ...".

Note in particular that any aliases (see above) are considered identical and listed together just as they would have been in Sloane's books ([1] and [2]).

. . . Forward to page 2 . . . Last page (page 30)

This page was written in the "embarrassingly readable" markup language RHTF, and some sections were last updated on 2025 Apr 29.