/* mcsfind.c.txt is a read-only copy of .../MCS/mcsfind.c created by .../pub/math/MCS.per This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Details here: http://mrob.com/cc-license.html mcsfind - Search for a sequence A description of the problem this program solves, and details of the sequence numbering system, etc. are at mrob.com/pub/math/MCS.html For usage instructions, invoke with -h as the sole argument. PURPOSE This program is designed to augment MCS.per by performing a deep search for a single desired sequence. It avoids MCS's huge memory overhead by not storing all the sequences, at the cost of having to re-generate everything for every search. TTD To support webcam mode, add a --index-by-gen-order option which prints out a single sequence using the coefficients generated by the Nth instance of the iterator. Provide a way to resume a search "from the middle" : When a search at -ls12 ends, print out the current values of all the loop index variables; then another search (presumably for the same sequence terms) can be started from that point, by setting all the variables and jumping into the innermost loop. (This could also be done by a simple adaptation of the previously described --index-by-gen-order option, but that's less efficient because it requires the iterator to retrace its steps). This has application in a future server implementation, in that a search result can contain up to three MCS numbers, but if there aren't three MCS numbers then the "vector" of loop values for continuing the search would be stored. If the same query comes again after a while (conditional on recent server load) then the search could be extended. I found a puzzle that was stated "find the wrong number in the series 1, 4, 9, 18, 30, 38, 49". To solve this with MCS it would be a lot easier if I could specify wildcard terms. (These questions are often seen on aptitude tests, but there are likely to be harder, e.g. two interleaved and otherwise unrelated sequences. Of course if the answer were "numbers retired by the New York Yankees", MCS would be completely helpless). Support matching terms with unspecified middle, e.g. "mcsfind 27 .. 143" would find sequences that contain 27 and 143 in that order, with any number of terms in between. The number of dots could specify a maximum gap length, or use '?' for a single term, '...' for one or more terms, or some other similar system. After a few seconds, split process into multiple threads; each thread traversing the tree in a different way such that total coverage is guaranteed without overlap (example: when a parent node's complexity PC is in the range [max_cplx-MAXC .. max_cplx], explore only those child nodes whose branch number N plus PC == TN mod TT, where TN is our thread number and TT the total number of threads.) Split again each time the workload has increased fourfold. Compare wall-clock time to utime to regulate number of threads. Changes in Weights and Generation: (None of these changes should "break" existing sequence numbers, unless there turns out to be a bug in the current implementation of &seqnum) Find a better balance between fine granularity and efficiency (as it stands, "mcs -n1 388 4957" if over 5 times slower than "mcs -w1 -n1 388 4957") Make surcharge for negative a little less than it is now. Add score improvements to MCS.per and adjust its $max_cplx value accordingly Surcharge for position in sequence (for seq-howto) should be logarithmic. In general, if the score component weights are proportional to the bitlength of the corresponding Huffman symbols, the distribution of MCS numbers in the webpage listing will make more sense. &seqnum seems to have a bug -- "$b2 =~ s/[1.]$//;" allowing a ". to be deleted instead of a "1". Actually this is saved by the greedy quality of regexp phrases, but I'd like to make it more explicit huffman table needs to be expanded for larger coefficients If both Am1 and Nam1 terms exist, print formula in factored form, and less surcharge for complexity score (Similar for An and Nan) Try to do similar tests for other factored forms, general case is (K1 N + K2) (C1 + C2 An + C3 Am1) */ /* REVISION HISTORY (Brief summary of the Perl version "MCS.per"; see its revision history for more): 20041111 Began development of MCS.per 20041204 MCS.rhtf is pretty much finished (long hiatus) 20070711 Begin modifications to make it useful as a way to search for sequences. This leads to the decision to write a faster version in C. 20070921 First version. Pretty much get it generating the same sequences that MCS.per generates. {To test this: invoke these two commands and compare their output: ORIGINAL COMMAND SYNTAX circa 20070922: ./mcsfind -n100 ./MCS.per -n 100 CURRENT SYNTAX as of 20091130: ./b-mcsfind -r -w1 -lp10 -n100 -20091201} 20070922 Pass numerical arguments to search for a subsequence. It will start with very simple formulas, then progressively more complex ones. Add coeff and print2. Improve print2 formatting to agree with MCS.per. 20070923 Change default max_match to 3. Add -r option to select RHTF output (default now uses ANSI terminal color styles). Add some UTF-8 encoded superscripts and subscripts. 20070924 Add -f option to select plain ASCII, RHTF or UTF-8 in addition to the (default) ANSI. "-r" still selects RHTF (same as "-f 1"). 20070926 Add higher-granularity weight option (specified with "-w 2") with logarithmic weights. Tweak it so that the following sequences are given in the order shown: seqnum score huffman formula MCS 220 1.0 1.101110.0'100 N MCS 884 1.7 1.101110.100'100 N+1 MCS 882 2.0 1.101110.0.10 10 2 N MCS 440 2.3 1.101110.0.0'100 N^2 MCS 15520 2.7 1.1110.0.1010.0.0.0.0'100 2 A[n] (In the original weighting, all of these had the same score, 2 points.) 20071005 More changes to high-granularity weights to make things more consistent. 20071006 WEIGHT_LOG is now the default. Detect and suppress alternate definitions ("variations") of the same sequence; add -v option to get original behavior. 20071007 Change argument format ("-n 10" becomes "-n10") to make it a little less confusing when arguments and sequence members are combined in the same command line. Add enhuff() and supporting functions. Output still agrees with old perl script, using the following method to compare: ./MCS.per -n 1000 > outMCS mcs -w1 -n1000 -v -f1 > outmcsfind diff outMCS outmcsfind 20071008 Improve efficiency of search with new weighting by adding the "range" parameter to scan.1. It's still pretty bad though; I'd like to find a better way without losing the benefits of higher granularity. 20071010 Add -D (debug) option. Use arrays to cache map1, map2, etc. functions. 20080206 Add --help option. 20090617 Fix minor errors in help text; add some comments. 20091122 Add -seqNNN option. Add -lcNNN option. Increase SCAN.LIMIT to 32. This option affects search speed; to test it I am using commands like "time ./mcsfind -n27 3 5 7 8 9 10". Increasing SCAN.LIMIT from 24 to 32 increased the time of this command from 25.746s to 28.987s. Add 128-bit versions of generate() and print1() to support more output from the new command -seqNNN. "./mcsfind -lc1000 MCS30992" now prints the factorials up to 31!=8.22e33. Spend quite a while diagnosing the output of MCS206182547, which stopped mysteriously after the first negative term; I discover a bug in the 128-bit multiply. (MCS206182547 is fairly interesting, an exponentially increasing oscillating sequence with a period of about 9 terms per oscillation) 20091123 Add -lo option, to find lower order recurrence relations and other things that can't be found as easily otherwise. Now "MCS 3 7 27 -lo0" finds pure polynomial formulas (in this case 2 N^3 + 2 N^2 + 3) and "MCS -lo2 2 4 6 7 8" finds complex oscillations like MCS3616484370 that more typically show up as 3rd-order recurrences. {Now with the negative sequence changes, you should add "-lp10" and "-n10" to see some oscillating sequences. Another pattern, "mcs -n7 -lo2 2 4 6 7 6 -lp10" gives 7 sequences with different oscillation periods: MCS445983762, MCS1784118290, MCS884446988, MCS7204960786, MCS1769335200, MCS57871556818, MCS57640999442 20091126 Make it possible to search for and match negative values, mainly to support my work in recreating the Floretion sequences and things like "mcs 1 0 -1 0 1 0 -1 0" (the real component of the powers of i). Add -lp option, with "-lp10" making it act as it did before today's changes. When a sequence matches, subsequent negative terms are printed out so long as it satisfied the -lpNN criterion (an example of this is "mcs -lo2 2 4 6 7 6 -lp10", which finds MCS445983762; before today this sequence was found but no terms after the "-49" were not shown, and you have to do "mcs MCS445983762" to see them.) This change has some drawbacks, in particular regarding the uniqueness matching. If you do "mcs 2 5 9", now you get MCS1972, MCS62017 and MCS250433 -- three sequences that differ only in initial 0 and -1 terms. 20091128 Add match.multi and addseq. These implement a new, more sophisticated CRC system. Each sequence is CRC'd multiple times based on different subsequences all starting with an initial start term. (If a sequence is reported and is CRC'd out to the 40th term, and another identical sequence is found later, the new identical sequence might only go out to the 38th term because of an extra 2 terms at the beginning. That means that the already-reported sequence now needs to be CRC'd out to 38 terms to effectively match against this one. I implement this by computing all CRCs when the sequence is found, even though many of the shorter ones might not actually get used.) The result of this change is to restore the old behavior regarding 0 and 1 terms: It now no longer reports sequences as unique that only differ in initial -1, 0 or 1 terms (as before, you can use the -v flag if you want these). 20091130 Increase ALLOC_LIMIT to 256 20091201 Add -lm### and -old-negprint options; increase SCAN.LIMIT to 64. These changes are to support the #compare# script and to help ensure that my changes to the MCS.html webpage remain consistent with recent changes to mcsfind.c (most importantly, increasing the number of displayed terms, and including sequences with negative terms). 20091202 Allow commas with arbitrary spacing in argument list, e.g. "mcs 2,3,5,8,13,21,34" or "mcs 5,25 ,125, 625, 3125 , 15625". This makes it more compatible with the #DBIS# script; I frequently use both to search fr the same sequence. Add -ls option, with default 12: #mcs# will no longer search forever if you give it an impossible problem (specify -ls99 if you still want that behavior) Major change: After discovering that I cannot find the MCS number corresponding to A077966, I decide to remove the tests that were designed to eliminate certain types of constant and trivial interleaved sequences. 20091203 fflush after writing each result to stdout 20091208 Remove requirement that mode-0 sequences must have a non-negative initial term. 20091216 Add horadam command option and cmd.horadam(). Add print3 and print.d; display sequences in "A[N]=..." form whenever possible and otherwise use K instead of N. 20100108 Add crc32_init. 20101029 Fix a few compiler warnings for GCC 4.2.1 20101030 Add seqnum.load(), seqnum.def() and fmt.seqnum(); RHTF format labels now include a specific label in the link, when available (using ~/data/MCS-links.txt generated by MCS.per). 20101106 Use string array in usage() 20120107 Make the "Extra arguments were ignored" message more useful. 20120419 Add advice on increasing the -ls option. Add -linNN option. This controls now many non-matching terms can precede the search terms. For example, "mcs 1 2 3 5" normally reports: MCS856582: 1, 1, 2, 3, 5, 7, 10, ... MCS840: 0, 1, 1, 2, 3, 5, 8, 13, ... MCS3187078: 1, 1, 2, 3, 5, 6, 7, ... All of these have initial non-matching terms that precede the "1 2 3 5" terms. With the option -lin1, MCS840 is excluded and we instead get: MCS856582: 1, 1, 2, 3, 5, 7, 10, ... MCS860678: 1, 1, 2, 3, 5, 8, 13, ... MCS434496: 1, 2, 3, 5, 7, 11, ... With the option -lin0, all matches must start with 1 2 3 5: MCS215168: 1, 2, 3, 5, 8, 13, ... MCS869504: 1, 2, 3, 5, 7, 10, ... MCS434496: 1, 2, 3, 5, 7, 11, ... Add -lfmNN option. This controls how many following terms are allowed to be identical for a sequence to be considered distinct from any other matches reported thus far. For example, using a command like "mcs 1 2 3 -n27 -lin0 | sort", we get the following matches (among others): 1, 2, 3, 3, 6, 9, 9, ... 1, 2, 3, 4, 5, 6, 7, ... 1, 2, 3, 5, 7, 10, 13, ... 1, 2, 3, 5, 7, 11, 15, ... 1, 2, 3, 5, 8, 13, 21, ... Notice there are two matches that start "1 2 3 5 7", both of these have the search terms "1 2 3" plus two more terms "5 7" . With the option -lfm1, only one of these "5 7" extensions is reported, and another, slightly higher-complexity sequence takes its place: 1, 2, 3, 3, 6, 9, 9, ... 1, 2, 3, 4, 5, 6, 7, ... 1, 2, 3, 4, 6, 9, 13, ... 1, 2, 3, 5, 7, 10, 13, ... 1, 2, 3, 5, 8, 13, 21, ... With the option -lfm0, each reported match must have a distinct value following the "1 2 3": 1, 2, 3, 3, 5, 7, 8, ... 1, 2, 3, 4, 5, 6, 7, ... 1, 2, 3, 5, 8, 13, 21, ... 1, 2, 3, 6, 9, 18, 27, ... 1, 2, 3, 7, 16, 44, 124, ... In the course of adding these options, I have also fixed bugs in match.multi. Before these changes, "mcs -w1 1 1 2 3 5" did not find MCS840. Here is an example comparison of the effects of these changes: Before : mcs 3 7 27 -n7 0, -1, 0, 0, 3, 7, 27, 75, 271, 879, 3327, 12127, 48735, 194271, 827839, ... MCS3444773 : A[0]=0; A[1]=-1; A[K+1]=A[K] + K A[K-1] + K (score: 7.2) 0, 1, 6, 1, 3, 13, 3, 7, 27, 7, 15, 55, 15, 31, 111, 31, 63, 223, 63, ... MCS3277451 : A[0]=0; A[1]=1; A[2]=6; A[N]=2 A[N-3] + 1 (score: 8.4) 1, 2, 7, -1, -3, -13, 3, 7, 27, -5, -13, -53, 11, 27, 107, -21, -53, ... MCS52440156 : A[0]=1; A[1]=2; A[2]=7; A[N]=- 2 A[N-3] + 1 (score: 8.6) 1, 2, 3, 7, 27, 65, 130, 241, 406, 640, 968, 1404, 1969, 2694, 3599, ... MCS6652176 : A[0]=1; A[1]=2; A[2]=3; A[K+1]=A[K-2] + K^3 - 2 (score: 8.7) 1, 3, 3, 7, 27, 63, 127, 237, 399, 631, 957, 1389, 1951, 2673, 3573, ... MCS25563444 : A[0]=1; A[1]=3; A[2]=3; A[K+1]=A[K-2] + K^3 - K (score: 8.8) 2, 1, 2, 7, 3, 7, 27, 11, 27, 107, 43, 107, 427, 171, 427, 1707, 683, ... MCS213956698 : A[0]=2; A[1]=1; A[2]=2; A[N]=4 A[N-3] - 1 (score: 8.5) 1, -3, 3, 7, 27, 51, 97, 153, 237, 337, 471, 627, 823, 1047, 1317, 1621, ... MCS2012708 : A[0]=1; A[K+1]=- A[K] + K^3 + K^2 - 2 (score: 9.0) After : mcs 3 7 27 -n7 0, -1, 0, 0, 3, 7, 27, 75, 271, ... MCS3444773 (as before) 1, 1, 3, 7, 27, 81, 383, 1383, 7667, 31817, 199351, 922703, 6379243, ... MCS6854774 : A[0]=1; A[1]=1; A[K+1]=3 K A[K-1] - A[K-1] + K (score: 8.0) 1, 1, 3, 7, 27, 125, 701, 4575, 34121, 286685, 2682519, 27685235, ... MCS6872326 : A[0]=1; A[1]=1; A[K+1]=K A[K] + 3 A[K-1] - K (score: 8.0) 0, 1, 6, 1, 3, 13, 3, 7, 27, 7, ... MCS3277451 (as before) 0, 3, 7, 27, 159, 1267, 12663, 151947, 2127247, 34035939, 612646887, ... MCS126554 : A[0]=0; A[K+1]=2 K A[K] - 2 K + 3 (score: 8.1) 1, 2, 7, -1, -3, -13, 3, 7, 27, ... MCS52440156 (as before) 1, 2, 3, 7, 27, 65, 130, 241, 406, ... MCS6652176 (as before) So there are three sequences (MCS6854774, MCS6872326, and MCS126554) that should have been reported but were skipped over because of the flaws in matchmulti. 20130207 Add -f4 (OEIS style ASCII) formatting mode. Add lucas option Fix a bug in coeff_expand that caused it to treat the cNam1 coefficient as 0 in any sequence of mode 3. An example is MCS1574540, which printed as MCS393612 before I fixed the bug. 20130208 Add HTML format; add letter-string format for -D; better handling of -lm; remove unused got2 variable in gen_128. 20130209 Add --report-all option. 20130210 Add lots of --show-xxx and --hide-xxx options. 20130211 Add scan.2() using a fancy "iterator", it runs a tiny bit faster, but more importantly it isolates the rather large and complex nested loops from everything else, and will enable important refactoring and improvements in the future. (For example, I could implement a faster search that is used when you're trying to match a single set vs. multiple sets of search terms). 20130212 Better checks against g_too_big; remove one o_negokay case for compatibility with MCS.per 20130223 Add validate_types() and ifdefs to help with porting s16 and s32 types. Add utf8_encode(). 20130224 Change -r and -r1 to --rhtf and --rhtf-simple 20130225 Use type_s16.h, etc. Move block header more towards the beginning of the file. 20130226 Several changes to support -ls option with non-integer argument; fix bug caused by calling cache_weights before setting the default value of o_weight; add -Dz. 20151210 Add g_total_gen 20160203 Add -encode option with variable number of arguments, then immediately refactor it so that it always takes the same 12. (It doesn't yet do quite what I want: there needs to be a way to specify something like A0=1, A1=0, A2=0, A_(n+1)=A_n+1 where the user wants 3 initial terms but the auto-detection wrongly guesses it's a 1st-order recurrence) 20160229 Add show.weights() and -S option. 20160306 Add gettimeofday, gettime, inittime; print elapsed time after results. 20160309 Add -m/--minimal-inits/--allow-extra-inits options (which presently do nothing) 20160311 -D (debug) option now takes letters for each flag, chosen to be close to corresponding debug options in RIES. Current debug letters are 'gpPyZ'. Remove scan1 and --old-search option. --allow-extra-inits is now the default; it now finds sequence MCS6967314 given "2 4 9 19 39" as search terms. 20170216 Add MK_SUBST_DATE and --version option. 20170402 Add simpler initialisation of g_toobig128 20230623 Add -is option */ /* BUGS and TTD (TO DO and FUTURE IMPROVEMENTS) Add a -decode option that turns an MCS number into the coefficients Searching for 3,0,2,3,2,5 (even with -v option) doesn't report MCS404818299 (but instead only finds MCS404819757). -encode option can't tell if you want initial terms of '0'. I should give the user a way to specify the mode directly, by either adding a -mode option that hints the mode to -encode, or perhaps adding a special syntax like: -encode -1 0 - 2 2 0 0 -1 2 0 0 0 where the placement of the '-' indicates the end of the initial terms (in this example, the user is requesting mode 2) I'd like to have three different options for handling terms prior to the match pattern for purposes of determining uniqueness. Given a search for "2 5 9", the three options would cause the following three different behaviors: * MCS1972, MCS62017 and MCS250433 are distinct (current "mcs -v 2 5 9") * MCS1972, MCS62017 and MCS250433 are "identical" but MCS1972, MCS2012290 and MCS13781018 are distinct (current "mcs 2 5 9") * MCS1972 and MCS2012290 are "identical"; MCS13781018 is still distinct The new one can be implemented through the current system of CRCs for each reported sequence, but starting the CRC at the first matched term, rather than at the first "significant" term. */ #ifndef MK_SUBST_DATE # define MK_SUBST_DATE 20170216 #endif #include #include #include #include /* -------------- typedefs -------------------------------------------------- / These work with the function validate_types(); see RIES source code for / details. */ #include #include #include void validate_types(void); void validate_types(void) { TYPE_S16_CHECK_SIZE("validate_types") TYPE_S32_CHECK_SIZE("validate_types") TYPE_S64_CHECK_SIZE("validate_types") } #include /* ---------gettimeofday--------- mcsfind uses gettimeofday() to measure how much time was used in the search. To port it to another OS, add another #ifdef case to provide another gettimeofday() function. */ #ifdef _WIN32 /* Windows version. Note that _WIN32 is defined even if building for a 64-bit target (which in addition defines _WIN64) */ /* and are needed for _ftime() http://msdn.microsoft.com/en-us/library/z54t9z5f(v=vs.71).aspx */ # include # include /* defines the UNIX-compatible "timeval" structure that we take as a parameter, allowing us to emulate the UNIX routine that RIES was designed to use. It's in winsock because network protocols use a lot of data structures that were originally defined by UNIX systems. */ # include /* from www.linuxjournal.com/article/5574 */ void gettimeofday(struct timeval* t,void* timezone) { struct _timeb timebuffer; _ftime( &timebuffer ); t->tv_sec=timebuffer.time; t->tv_usec=1000*timebuffer.millitm; } #else /* On UNIX, Linux, MacOS X, and CygWin systems the gettimeofday function is in the standard libraries and is defined by these #includes. */ # include # include #endif double tod_start; /* gettime() value at program start */ /* gettime is the only really OS-specific routine in this whole program. It returns how long the program has been running, measured in tenths of a second. For example, it returns 14 if the program has run for 1.4 seconds. This is actual elapsed "clock-on-the-wall" time, not necessarily a measure of how much time the CPU has spent working on RIES. For example, if you close your laptop while RIES is running it will include the time the system was "asleep" in its final printout of time used. */ double gettime(void) { struct timeval tod_record; double t_now; /* There is a block of code inside an #ifdef above (search for "---gettimeofday---") that sets up the appropriate include files and/or defines a gettimeofday() function based on compile-time flags like _WIN32. See the block-comment there for more details. */ gettimeofday(&tod_record, 0); t_now = ((double) (tod_record.tv_sec)) + ((double) (tod_record.tv_usec)) / 1.0e6; return(t_now - tod_start); } double gettime(void); void inittime(void); void inittime(void) { double now; /* Init the global so gettime() has a valid value to subtract from its new measurement */ tod_start = 0; /* With tod_start set to 0 gettime will return the absolute current time */ now = gettime(); /* Set the global to this value. */ tod_start = now; /* From now on, calls to gettime() will measure time from the moment we made the preceding gettime() call. */ } /* Limits for lengths of generated sequences; the first is for queries and the other is for the display commands -seq##, MCS## and horadam */ #define SCAN_LIMIT 64 #define ALLOC_LIMIT 256 /* Defaults for the -lc## option: 2 and 4 lines on an 80-column TTY */ #define DEFAULT_MAX_CHARS 159 #define DEFAULT_MC_CMDSEQ 319 #define FORMAT_PLAIN 0 // Purely plain ASCII and no RHTF #define FORMAT_RHTF 1 // Plain ASCII; styles expressed via RHTF #define FORMAT_ANSI 2 // Use ANSI terminal colors, etc. to display some // additional info #define FORMAT_UTF8 3 // Use ANSI plus UTF-8 (works in Tiger, not in Jaguar) #define FORMAT_OEIS 4 // ASCII with OEIS-like expressions #define FORMAT_HTML 5 // ASCII with HTML markup #define WEIGHT_ORIG 1 // original weights #define WEIGHT_LOG 2 // finer weights used to give coefficients a logarithmic // distribution int o_format; int o_ignore_sign; s64 o_max_match; int o_rhtf_simpler; int o_weight; int o_variations; int o_report_all; int o_max_charlen; int o_cmdseq_lc; int o_max_order; int o_negokay; int o_poswindow; int o_legacy_negprint; int o_cplx_limit; int o_cplx_custom; int o_encode; int o_horadam; int o_min_inits; int o_show_formula; int o_show_seqnum; int o_show_terms; int o_show_score; int o_show_rhtf_links; int o_lucas; int o_initial_nonmatch; int o_lim_follow_match; s64 g_too_big; s128 g_toobig128; char g_tb_str[100]; s64 g_total_gen; /* %%% for completeness we want to define huffman table values for all coefficients and then we can set maxc = max_cplx; */ #define MINC -8 #define MAXC 8 int debug_g=0; /* Show seqs found to be valid by generate Old o_dbg1 */ int debug_p=0; /* Show coefficients on report.match and enhuff (old -D24) */ int debug_P=0; /* Show dehuff and extracted coefficients */ int debug_y=0; /* Show statistics for each pass through main loop (old -D1) */ int debug_Z=0; /* Show weight and search method (old -DZ) */ /* ----------------------------- prototypes ------------------------------ */ void h_for_help(void); void help_encode(void); void usage(void); void crc32_init(void); void crc32_start(u32 *crc); void crc32_add8(u32 *crc, u8 data); u32 crcseq(s64 *s, int n); int crc_match(u32 c, int siglen); char * coeff(int x, int one); void init_format(void); char * coeff_hsym(int x); void enhuff(int *a, int n, char *s, int maxlen); int dehuff(char *s, int *a, int maxlen); s64 str10_s64(char * bits); int s64_str10(s64 x, char * bits, int maxlen, int pure); s64 coeff_seqnum(int mode, int cK, int cAn, int cN, int cAm1, int cNan, int cN2, int cAm2, int cNam1, int cN3, int d2, int d1, int i0); void fmt_seqnum(s64 sn); void coeff_expand(int * ca, int * mode, int * cK, int * cAn, int * cN, int * cAm1, int * cNan, int * cN2, int * cAm2, int * cNam1, int * cN3, int * d2, int * d1, int * i0); void print2(int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2); void print3(int mode, int i0, int cAn, int i1, int cAm1); void print_d(int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2); int generate(s64 * A, int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2, int limit, int negokay, int * p_final_i); void gen_128(s128 * A, int mode, int pcK, int pcN, int pcN2, int pcN3, int pi0, int pcAn, int pcNan, int pi1, int pcAm1, int pcNam1, int pi2, int pcAm2, int limit, int * p_final_i); void print128(s128 * A, int max_cl, int final_i, int match_where); void print1(s64 * A, int max_cl, int final_i, int match_where); int iabs(int x); void find_sig(s64 * A, int final_i, int *firstsig, int *siglen); int ilw(int x); int map1(int x); int map2(int x, int surcharge); int map3(int x); int map4(int x); int match1(s64 * A, int final_i, int * p_match_where); int matchmulti(s64 * s, int firstsig, int siglen, int match_where); void addseq(s64 * s, int siglen); void report_match(s64 * A, int max_cl, int final_i, int match_where, int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2, int cplx); int s2_iterator(int init_range); void scan2(int min_cplx, int range); s64 str_s64(char *s, char * *r); void cmd_encode(s64 * coeff); void cmd_horadam(int p, int q, int r, int s); void cmd_lucas(int P, int Q, int UV); void cmd_seq(s64 p1); void cache_weights(void); void show_weights(void); /* -------------------------------- help --------------------------------- */ void h_for_help(void) { printf("Give -h option for help.\n"); } void help_encode(void) { fprintf(stderr, "%s", "-encode requires exactly 12 parameters:\n" "\n" " A0, A1, A2 Initial 3 terms\n" " cK, cN, cN2, cN3 polynomial terms\n" " cAn, cNan coefficients of A[n] and of N A[n]\n" " cAm1, cNam1 coefficients of A[n-1] and of N A[n-1]\n" " cAm2 coefficient of A[n-2]\n" "\n" "All are to be given in this order; use 0 for any unused terms.\n" ); } void usage(void) { const char * help_text[] = { "NAME", "", " mcsfind - Search for Minimally Complex Sequences, given some terms", "", "USAGE", "", " mcsfind [-n#] [-v] [-f#|-r] [-w#] [-lx##]... [-D##] N1 N2 N3 ...", " mcsfind -seq###", " mcsfind horadam A B R S", " mcsfind lucas P Q", "", "mcsfind orders its output by progressively increasing formula complexity.", "It will search more and more complex sequences until the specified terms", "are found or other optional limits (-n#, -ls#) are reached.", "", "The following options are available:", "", "Searching options", " -w1 Original simple weighting (as used on mrob.com/pub/math/MCS.html)", " -w2 Logarithmically weighted coefficients (this is the default)", " This option makes finer distinctions as to the 'complexity' of", " each sequence, and also affects the number of sequences that", " will be checked for any given value of the -ls option.", " -m synonym for --minimal-inits", " -n7 Match up to 7 sequences (default is 3)", " -v Show variations (different formulas that produce the same", " sequence)", " -is Ignore sign: allows 7 to match -7", " -lc300 Limit characters: display up to 300 characters of each sequence", " (default is 159, or 319 when you give -seqNN or MCSNN)", " -lfm2 Limit following matching terms (2 in this example). For a new", " match to be reported, it must differ from all matches reported so", " far, not counting the search terms and this many additional terms.", " (default value is 10; smaller values give fewer matches)", " -lin2 Limit number of initial non-matching terms (2 in this example)", " (default value is 999; smaller values give fewer matches)", " -lm999 Limit magnitude: Do not display (or attempt to calculate) terms", " exceeding the given amount (999 in this example) in absolute value", " Default is about 10^15 when searching, or 2*10^34 when you give", " -seqNN or MCSNN. Use this if you can't handle such big integers.", " -lo2 Limit order: Consider only defintions up to 2nd-order recurrence", " (default allows all, currently up to 3rd order)", " -ls5.2 Limit score: Consider only defintions of complexity less than", " or equal to 5.2 (default is -ls12.0)", " -lp7 Limit to positive values: In order to match, a candidate sequence", " must produce a value of 2 or more within the first 7 terms. Legacy", " behavior of #mcs# is achieved by specifying -lp10", "", " --allow-extra-inits Allow extra initial terms in a definition, e.g. if", " the recurrence is 1st-order, there may be 1, 2, or", " even 3 initial term(s)", "", " -f0 Plain ASCII format", " -f1 ASCII with RHTF markup", " -f2 ANSI terminal (recommended on MacOS X 10.2)", " -f3 ANSI plus UTF-8 characters (subscripts, etc.)", " -f4 ASCII formatting with lowercase letters (like OEIS)", " -f5 ASCII with HTML markup", " -rhtf same as -f1", "", " --hide-formula Do not display formula (sequence definition)", " --hide-score Do not display complexity score", " --hide-seqnum Do not display the sequence number (MCS number)", "", " --minimal-inits (or -m) Do not allow more initial terms than the", " order of the recurrence", "", " --report-all Show every sequence generated (also sets -n very", " large; if followed by -n the latter will override)", "", " --show-rhtf-links In RHTF format, include links to MCS catalog", "", " --version Shows the date it was compiled", "", "EXAMPLES", "", " -seq440 Display sequence MCS440", " MCS440 Display sequence MCS440", " hor 0 1 2 3 Display Horadam sequence (0,1,2,3)", " luc 1 -1 Display Lucas sequences U(1,-1) and V(1,-1)", " -encode 2 4 0 1 0 0 0 2 0 0 0 0", " Given coefficients, get MCS number (details below)", "", " -D27 Set debug bitmask", " -S show coefficient weights", "", "Additional explanation and background is available at:", "", " www.mrob.com/pub/math/MCS.html", "", 0}; int i; for(i=0; help_text[i]; i++) { printf("%s\n", help_text[i]); } help_encode(); } /* ---------------------------- seqnum cache ----------------------------- */ /* These routines keep track of the seqnums that were generated by MCS.per which we use to figure out what type of link to emit in RHTF output mode */ int g_snc_init = 0; long g_snc_nseq; s64 * g_snc_def; /* Contains S if seqnum S is defined */ s64 * g_snc_ref; /* Seqnum of seq this is an alias for, else equal to S */ /* Read the list of sequences (MCS-links.txt) which is generated by MCS.per */ void seqnum_load(void) { char * hd; char pn[200]; FILE * IN; /* Init all the globals to default "empty" settings */ g_snc_nseq = 0; g_snc_def = 0; g_snc_ref = 0; /* Regardless of whether we actually load any links, we count this as "being initialized" */ g_snc_init = 1; /* We only bother with this if running under my home account. %%% On a server, there might be no HOME, and PWD would not contain "/munafo". Probably I should have a PHP script supply a pathname to a file containing the URLs. */ hd = getenv("HOME"); if (hd) { if (strstr(hd, "/munafo") == 0) { return; } } else { return; } sprintf(pn, "%s/data/MCS-links.txt", hd); /* printf("pn is %s\n", pn); */ IN = fopen(pn, "r"); if (IN) { char * res; int na; long sn1, sn2, nseq; nseq = 0; res = hd; while (res) { res = fgets(pn, 200, IN); if (res) { switch(pn[0]) { case 'N' : /* This field tells us how many sequences to allocate */ na = sscanf(pn, "N %ld", &g_snc_nseq); if (na) { // printf("N %ld\n", g_snc_nseq); g_snc_def = (s64 *) malloc((size_t) g_snc_nseq * sizeof(s64)); g_snc_ref = (s64 *) malloc((size_t) g_snc_nseq * sizeof(s64)); if (g_snc_def && g_snc_ref) { // printf("malloc successful\n"); nseq = 0; /* This is redundant but meant to cover the case of multiple N lines in the file */ } else { g_snc_nseq = 0; } } break; case 'a' : /* Normal sequence definition */ na = sscanf(pn, "a %ld", &sn1); if (na && (nseq < g_snc_nseq)) { /* printf("a %ld\n", sn1); */ g_snc_def[nseq] = sn1; g_snc_ref[nseq++] = sn1; } break; case 'b' : /* Alias sequence definition */ na = sscanf(pn, "b %ld %ld", &sn1, &sn2); if (na && (nseq < g_snc_nseq)) { /* printf("b %ld %ld\n", sn1, sn2); */ g_snc_def[nseq] = sn1; g_snc_ref[nseq++] = sn2; } break; default: break; } } } fclose(IN); if (nseq) { /* Set g_snc_nseq to the actual number of sequences we got */ g_snc_nseq = nseq; } } } /* seqnum_def tells if a seqnum is defined in the RHTF pages emitted my MCS.per, and if so, which seqnum it is an alias to. Pass in the seqnum S you are interested in. If the return value is: 0 The sequence is not defined S The sequence is defined and not an alias > 0 The sequence is an alias to this other seqnum */ s64 seqnum_def(s64 seqnum) { long i; /* We only show RHTF links in RHTF mode and when the o_show_rhtf_links option is set. */ if (o_show_rhtf_links == 0) { return 0; } if (o_format != FORMAT_RHTF) { return 0; } if (g_snc_init == 0) { seqnum_load(); g_snc_init = 1; } /* Paranoia: make sure there's a pointer */ if ((g_snc_def == 0) || (g_snc_ref == 0)) { return 0; } for(i=0; i> 24L) ^ data; *crc = (*crc << 8L) ^ crc32_conv_8[index]; } /* Given a pointer to a sequence, and the number of terms, returns a CRC of that sequence. */ u32 crcseq(s64 *s, int n) { u32 rv; s64 t; int i, j; crc32_start(&rv); for(i=0; i>= 8; } } return rv; } int crc_match(u32 c, int siglen) { int rv; crcnode * l; rv = 0; l = crclists[siglen]; while(l) { if (l->crc == c) { return 1; } l = l->next; } return 0; } /* ---------------- coefficient handling and conversion ------------------ */ char coeff_scratch[40]; int do_plus; /* coeff turns a number into a string, with special treatment for -1, 0 and -1. This is for printing expressions like "2 x - y + 1", where the coefficients are 2, -1 and 1. The parameter "novar" indicates if there is no variable following, meaning that we need to print a "1" or "-1". */ char * coeff(int x, int novar) { coeff_scratch[0] = 0; if (x < -1) { sprintf(coeff_scratch, " - %d", -x); } else if (x == -1) { if (novar) { sprintf(coeff_scratch, " - 1"); } else { sprintf(coeff_scratch, " -"); } } else if (x == 1) { if (do_plus) { if (novar) { sprintf(coeff_scratch, " + 1"); } else { sprintf(coeff_scratch, " +"); } } else if (novar) { sprintf(coeff_scratch, " 1"); } } else { if (do_plus) { sprintf(coeff_scratch, " + %d", x); } else { sprintf(coeff_scratch, " %d", x); } } if (x != 0) { do_plus = 1; } return coeff_scratch; } /* ------------------------- output formatting --------------------------- */ char ks_MCS[20]; char ital_on[20]; char ital_off[20]; char bold_on[20]; char bold_off[20]; char small_on[20]; char small_off[20]; char sub_on[20]; char sub_off[20]; char sup_on[20]; char sup_off[20]; char text_A[10]; char text_N[10]; char text_subN[10]; char text_K[10]; char text_subK[10]; char text_subplus[10]; char text_subminus[10]; char text_sub0[10]; char text_sub1[10]; char text_sub2[10]; char text_sub3[10]; char text_sup2[10]; char text_sup3[10]; void utf8_encode(u32 codepoint, char * dst) { u32 cp; unsigned char c1, c2, c3, c4, c5, c6; if (dst == 0) { return; } else if (codepoint < 0) { *dst = 0; return; } /* Break the codepoint up into 6-bit pieces */ cp = codepoint; c6 = (unsigned char) (cp & 0x3f); c6 |= 0x80; cp >>= 6; c5 = (unsigned char) (cp & 0x3f); c5 |= 0x80; cp >>= 6; c4 = (unsigned char) (cp & 0x3f); c4 |= 0x80; cp >>= 6; c3 = (unsigned char) (cp & 0x3f); c3 |= 0x80; cp >>= 6; c2 = (unsigned char) (cp & 0x3f); c2 |= 0x80; cp >>= 6; c1 = (unsigned char) cp; c1 |= 0x80; if (codepoint <= 0x7f) { /* Normal ASCII */ sprintf(dst, "%c", (unsigned char) codepoint); } else if (codepoint <= 0x7ff) { /* 11 bits up to U+07FF: 110xxxxx 10xxxxxx */ sprintf(dst, "%c%c", (0xc0 | c5), 0x80 | c6); } else if (codepoint <= 0xffff) { /* 16 bits up to U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx */ sprintf(dst, "%c%c%c", (0xe0 | c4), c5, c6); } else if (codepoint <= 0x1fffff) { /* 21 bits up to U+1FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */ sprintf(dst, "%c%c%c%c", (0xf0 | c3), c4, c5, c6); } else if (codepoint <= 0x3ffffff) { /* 26 bits up to U+3FFFFFF: 111110xx 10xxxxxx ... (3 more) */ sprintf(dst, "%c%c%c%c%c", (0xf8 | c2), c3, c4, c5, c6); } else { /* 31 bits up to U+7FFFFFFF: 111110xx 10xxxxxx ... (4 more) */ sprintf(dst, "%c%c%c%c%c%c", (0xfc | c1), c2, c3, c4, c5, c6); } } void init_format(void) { /* Defaults for everything */ strcpy(ks_MCS, "MCS"); strcpy(ital_on, ""); strcpy(ital_off, ""); strcpy(bold_on, ""); strcpy(bold_off, ""); strcpy(small_on, ""); strcpy(small_off, ""); strcpy(sub_on, "["); strcpy(sub_off, "]"); strcpy(sup_on, "^"); strcpy(sup_off, ""); strcpy(text_A, "A"); strcpy(text_N, "N"); strcpy(text_subN, "N"); strcpy(text_K, "K"); strcpy(text_subK, "K"); strcpy(text_subplus, "+"); strcpy(text_subminus, "-"); strcpy(text_sub0, "0"); strcpy(text_sub1, "1"); strcpy(text_sub2, "2"); strcpy(text_sub3, "3"); strcpy(text_sup2, "2"); strcpy(text_sup3, "3"); switch(o_format) { case FORMAT_PLAIN: default: break; case FORMAT_OEIS: /* a(n)=2*a(n-1)-2*a(n-3)+a(n-4); a(0)=1, a(1)=1, a(2)=2, a(3)=3. There is no distinction between N and K (in other words, it doesn't matter whether it's a recurrence formula like "a(n)=a(n-1)+2*n+1" or a direct closed-form like "a(n)=n^2"). */ strcpy(text_A, "a"); strcpy(text_N, "n"); strcpy(text_subN, "n"); strcpy(text_K, "n"); strcpy(text_subK, "n"); strcpy(ital_on, ""); strcpy(ital_off, ""); strcpy(bold_on, ""); strcpy(bold_off, ""); strcpy(small_on, ""); strcpy(small_off, ""); strcpy(sub_on, "("); strcpy(sub_off, ")"); strcpy(sup_on, "^"); strcpy(sup_off, ""); break; case FORMAT_RHTF: if (o_rhtf_simpler) { strcpy(ks_MCS, "MCS"); } else { strcpy(ks_MCS, "[MCS|+mcs]"); } strcpy(ital_on, "*"); strcpy(ital_off, "*"); strcpy(bold_on, "#"); strcpy(bold_off, "#"); strcpy(small_on, "<$:"); strcpy(small_off, ":>"); strcpy(sub_on, "v{"); strcpy(sub_off, "}"); strcpy(sup_on, "^{"); strcpy(sup_off, "}"); break; case FORMAT_HTML: /* An=2*An-1-n2, A(0)=1. */ strcpy(text_N, "n"); strcpy(text_subN, "n"); strcpy(text_K, "k"); strcpy(text_subK, "k"); strcpy(ital_on, ""); strcpy(ital_off, ""); strcpy(bold_on, ""); strcpy(bold_off, ""); strcpy(small_on, ""); strcpy(small_off, ""); strcpy(sub_on, ""); strcpy(sub_off, ""); strcpy(sup_on, ""); strcpy(sup_off, ""); break; case FORMAT_UTF8: strcpy(sub_on, ""); strcpy(sub_off, ""); strcpy(sup_on, ""); strcpy(text_N, "X"); /* We use the Greek letter chi subscript, which looks like an "X", in place of both N and K */ strcpy(text_K, "X"); utf8_encode(0x1d6a, text_subN); /* 0x1d6a == subscript 'x' (really chi) */ strcpy(text_subK, text_subN); utf8_encode(0x208a, text_subplus); /* 0x208a == subscript '+' */ utf8_encode(0x208b, text_subminus); /* 0x208b == subscript '-' */ utf8_encode(0x2080, text_sub0); /* 0x2080 == subscript '0' */ utf8_encode(0x2081, text_sub1); /* 0x2081 == subscript '1' */ utf8_encode(0x2082, text_sub2); /* 0x2082 == subscript '2' */ utf8_encode(0x2083, text_sub3); /* 0x2083 == subscript '3' */ utf8_encode(0xb2, text_sup2); /* 0xb2 == supersript '2' */ utf8_encode(0xb3, text_sup3); /* 0xb3 == supersript '3' */ // fall through to ANSI case FORMAT_ANSI: strcpy(ital_on, "\033[1m"); strcpy(ital_off, "\033[22m"); strcpy(bold_on, "\033[1;4;94m"); strcpy(bold_off, "\033[22;24;30m"); strcpy(small_on, "\033[32m"); strcpy(small_off, "\033[0m"); break; } } #define N_HSYM 18 const char *huff_sym[N_HSYM] = {"", "0", /* 0 */ "100", /* 1 */ "110", /* -1 */ "1010", /* 2 */ "1110", /* 3 */ "10110", /* -2 */ "101110", /* 4 */ "111100", /* 5 */ "111110", /* -3 */ "1011110", /* -4 */ "1011111", /* -5 Last symbol from original code tree */ "1111010", /* 6 First expansion symbol */ "1111011", /* -6 */ "111111000", /* 7 */ "111111100", /* -7 */ "111111001", /* 8 */ "111111101"};/* -8 */ /* More could be added here if desired in the future */ int huff_name[20] = { 999, 0, 1, -1, 2, 3, -2, 4, 5, -3, -4, -5, 6, -6, 7, -7, 8, -8}; /* coeff_hsym takes a coefficient (like 3 or -7) and returns a pointer to a string of 1's and 0's representing its huffman symbol */ char * coeff_hsym(int x) { int i; for(i=0; i 0) && (bits[i-1] == '0')) { i--; } /* Remove a single trailing '1' */ if (i>=2) { i--; } bits[i] = 0; //printf(" after: %s\n", bits); rv = str_bindec(bits); if (i>=63) { fprintf(stderr, "Seqnum huffman-encoding %s is too big for s64\n", bits); exit(-1); } return(rv); } /* Converts a 64-bit integer into a character string of 1's and 0's. If the 'pure' parameter is true, it will not add the extra '1' and fill the rest of the string with '0's to the end. The reverse of this operation is performed in str_bindec The equivalent function in MCS.per is &decbin */ int s64_str10(s64 x, char * bits, int maxlen, int pure) { char *s; s64 n; int bitlen; int i; i = 0; /* Count number of significant bits in x */ bitlen = 0; n = x; while(n) { n >>= 1; bitlen++; } /* Restore value of n for use in the conversion */ n = x; s = bits; if (maxlen <= 0) { return(0); } if (maxlen == 1) { s[i] = 0; return(0); } if (n == 0) { s[i++] = '0'; s[i] = 0; return(1); } /* Get the first '1' bit into the sign position */ while(n >= 0) { n <<= 1; } /* Get all the 1 bits into the string */ while ((n != 0) && (i+1 < maxlen)) { s[i++] = ((n<0) ? '1' : '0'); n <<= 1; } /* Check for string too short. */ if (i+1 >= maxlen) { s[i] = 0; fprintf(stderr, "s64.str10: err1 x=%lld\n", x); exit(-1); } /* Add any more needed 0's */ while((i+1 < maxlen) && (i= maxlen) || (i= 1) { ca[i++] = cAn; } ca[i++] = cN; if (mode >= 2) { ca[i++] = cAm1; } if (mode >= 1) { ca[i++] = cNan; } ca[i++] = cN2; if (mode >= 3) { ca[i++] = cAm2; } if (mode == 3) { ca[i++] = cNam1; } ca[i++] = cN3; if (mode == 2) { ca[i++] = cNam1; } if (mode >= 3) { ca[i++] = ci2; } if (mode >= 2) { ca[i++] = ci1; } if (mode >= 1) { ca[i++] = i0; } if (debug_p) { printf("coeff.seqnum: calling enhuff\n"); } enhuff(ca, i, bits, 100); if (debug_p) { printf("coeff.seqnum: got bits: %s ", bits); } seqnum = str10_s64(bits); if (debug_p) { printf(" = %lld\n", (long long) seqnum); } return(seqnum); } /* Given a pointer to a list of tokens (coefficients), fills in all the values of the coefficient variables. The inverse of this function is performed by coeff.seqnum. */ void coeff_expand(int * ca, int * mode, int * cK, int * cAn, int * cN, int * cAm1, int * cNan, int * cN2, int * cAm2, int * cNam1, int * cN3, int * d2, int * d1, int * i0) { int i, m, ci1, ci2; i = 0; *cNam1 = 0; m = 4-ca[i++]; *mode = m; *cK = ca[i++]; if (m >= 1) { *cAn = ca[i++]; } else { *cAn = 0; } *cN = ca[i++]; if (m >= 2) { *cAm1 = ca[i++]; } else { *cAm1 = 0; } if (m >= 1) { *cNan = ca[i++]; } else { *cNan = 0; } *cN2 = ca[i++]; if (m >= 3) { *cAm2 = ca[i++]; } else { *cAm2 = 0; } if (m == 3) { *cNam1 = ca[i++]; } *cN3 = ca[i++]; if (m == 2) { *cNam1 = ca[i++]; } if (m >= 3) { ci2 = ca[i++]; } else { ci2 = 0; } *d2 = ci2 + 1; if (m >= 2) { ci1 = ca[i++]; } else { ci1 = 0; } *d1 = ci1 + 1; if (m >= 1) { *i0 = ca[i++]; } else { *i0 = 0; } if (debug_p) { int j; printf(" coeff_expand: in Huffman order the coefficients are: "); for(j=0; j= 1) { /* A[0] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub0, sub_off, i0); } if (mode >= 2) { /* A[1] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub1, sub_off, i1); } if (mode >= 3) { /* A[2] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub2, sub_off, i2); } /* Print initial "A[K] =" or "A[N+1] =" */ if (mode == 0) { printf("%s%s%s%s%s%s =", ital_on, text_A, sub_on, text_subK, sub_off, ital_off); } else { printf("%s%s%s%s", ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s =", ital_on, text_subK, ital_off, text_subplus, text_sub1, sub_off); } do_plus = 0; /* +- C N A[N] */ if (cNan) { printf("%s %s%s%s %s", coeff(cNan, 0), ital_on, text_K, ital_off, ital_on); printf("%s%s%s%s%s", text_A, sub_on, text_subK, sub_off, ital_off); } /* +- C A[N] */ if (cAn) { printf("%s %s", coeff(cAn, 0), ital_on); printf("%s%s%s%s%s", text_A, sub_on, text_subK, sub_off, ital_off); } /* +- C N A[N-1] */ if (cNam1) { printf("%s %s%s%s %s%s%s%s", coeff(cNam1, 0), ital_on, text_K, ital_off, ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subK, ital_off, text_subminus, text_sub1, sub_off); } /* +- C A[N-1] */ if (cAm1) { printf("%s %s%s%s%s", coeff(cAm1, 0), ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subK, ital_off, text_subminus, text_sub1, sub_off); } /* +- C A[N-2] */ if (cAm2) { printf("%s %s%s%s%s", coeff(cAm2, 0), ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subK, ital_off, text_subminus, text_sub2, sub_off); } /* +- C N^3 */ if (cN3) { printf("%s %s%s%s%s%s%s", coeff(cN3, 0), ital_on, text_K, ital_off, sup_on, text_sup3, sup_off); } /* +- C N^2 */ if (cN2) { printf("%s %s%s%s%s%s%s", coeff(cN2, 0), ital_on, text_K, ital_off, sup_on, text_sup2, sup_off); } /* +- C N */ if (cN) { printf("%s %s%s%s", coeff(cN, 0), ital_on, text_K, ital_off); } /* +- C */ if (cK) { printf("%s", coeff(cK, 1)); } } /* This is like print2, but only displays formulas that fit the standard "A[N]=..." form. These are the pure recurrence relations, where the sequence index N does not appear in the formula. As a rule, these sequences can be defined in multiple ways with the same formula, but changing the initial terms. NOTE: You should call this routine as print3(mode, i0, i1, i2, cAn, cAm1, cAm2, cK) even though the arguments are internally called "cAm1" through "aAm3" */ void print3(int mode, int i0, int i1, int i2, int cAm1, int cAm2, int cAm3, int cK) { if (mode >= 1) { /* A[0] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub0, sub_off, i0); } if (mode >= 2) { /* A[1] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub1, sub_off, i1); } if (mode >= 3) { /* A[2] = C */ printf("%s%s%s%s%s%s = %d; ", ital_on, text_A, ital_off, sub_on, text_sub2, sub_off, i2); } /* A[N] = */ printf("%s%s%s%s%s%s =", ital_on, text_A, sub_on, text_subN, sub_off, ital_off); do_plus = 0; if (cAm1) { /* +- C A[N-1] */ printf("%s %s%s%s%s", coeff(cAm1, 0), ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subN, ital_off, text_subminus, text_sub1, sub_off); } if (cAm2) { /* +- C A[N-2] */ printf("%s %s%s%s%s", coeff(cAm2, 0), ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subN, ital_off, text_subminus, text_sub2, sub_off); } if (cAm3) { /* +- C A[N-3] */ printf("%s %s%s%s%s", coeff(cAm3, 0), ital_on, text_A, ital_off, sub_on); printf("%s%s%s%s%s%s", ital_on, text_subN, ital_off, text_subminus, text_sub3, sub_off); } if (cK) { printf("%s", coeff(cK, 1)); } else if (do_plus == 0) { /* We didn't have any terms at all! */ printf(" 0"); } } /* Dispatches to print2 or print3 depending on which is most appropriate */ void print_d(int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2) { if (o_show_seqnum) { s64 sn; sn = coeff_seqnum(mode, cK, cAn, cN, cAm1, cNan, cN2, cAm2, cNam1, cN3, i2-i1, i1-i0, i0); printf(" "); fmt_seqnum(sn); } if (o_show_seqnum && o_show_formula) { printf(" : "); } if (o_show_formula) { if ((cN==0) && (cN2==0) && (cN3==0) && (cNan==0) && (cNam1==0)) { print3(mode, i0, i1, i2, cAn, cAm1, cAm2, cK); } else { print2(mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2); } } } /* ----------------------- scanning and generation ----------------------- */ /* Generate a sequence from the 13-element vector of coefficients. */ int generate(s64 * A, int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2, int limit, int o_negokay, int * p_final_i) { int i, lv, lvs, N, got2, stop; s64 An, Am1, Am2, Ap1; int final_i; An = Am1 = Am2 = 0; /* To avoid warnings */ i = 0; if (mode == 0) { N = 0; A[i++] = cK; } if (mode >= 1) { A[i++] = i0; } if (mode >= 2) { A[i++] = i1; } if (mode >= 3) { A[i++] = i2; } final_i = i-1; got2 = 0; /* default */ if (o_negokay) { got2 = mode ? mode : 1; } else if ((mode == 0) && (A[0] < 0)) { got2 = 0; /* We formerly had 'got2 = -1' here, which rejected any non-recurrence-based sequence with a negative constant term */ } else if ((mode < 2) && (A[0] > 1)) { got2 = 1; } else if ((mode == 2) && (A[0] > 1)) { got2 = 2; } else if ((mode == 3) && (A[0] > 1)) { got2 = 3; } else if ((mode == 2) && (A[1] > 1)) { got2 = 1; } else if ((mode == 3) && (A[1] > 1)) { got2 = 2; } else if ((mode == 3) && (A[2] > 1)) { got2 = 1; } else { got2 = 0; } if (mode == 1) { An = A[0]; } if (mode == 2) { Am1 = A[0]; An = A[1]; } if (mode == 3) { Am2 = A[0]; Am1 = A[1]; An = A[2]; } stop = 0; if (mode == 0) { lvs = 1; } else { lvs = mode; } /* Loop until we've computed the requested number of terms */ for (lv = lvs; (got2 >=0) && (lv <= limit); lv++) { if (mode == 0) { N = lv; } else { N = lv - 1; } Ap1 = cK + (cN * N) + (cN2 * N * N) + (cN3 * N * N * N) + (cAn * An) + (cNan * N * An) + (cAm1 * Am1) + (cNam1 * N * Am1) + (cAm2 * Am2); if (i g_too_big) || (-Ap1 > g_too_big)) { /* %%% If I ever add an AnAm1 term, self-squaring sequences become possible and the limit testing will need to be handled differently */ stop++; } else if (o_negokay) { got2++; } else if (got2 < 0) { } else if (got2 > 0) { got2++; } else if (Ap1 > 1) { got2 = 1; } Am2 = Am1; Am1 = An; An = Ap1; } /* end of for(lv...) */ /* Increment global count of number of seqs generated */ g_total_gen++; *p_final_i = final_i; return got2; } /* End of generate() */ /* Generate a sequence using 128-bit arithmetic. The setup and calculation is the same as generate() above, except that we're using 128-bit variables. The main difference is that we don't count the number of significant terms ("got2" in generate()) and the only termination condition is overflow. */ void gen_128(s128 * A, int mode, int pcK, int pcN, int pcN2, int pcN3, int pi0, int pcAn, int pcNan, int pi1, int pcAm1, int pcNam1, int pi2, int pcAm2, int limit, int * p_final_i) { int i, lv, lvs, N, stop; s128 An, Am1, Am2, Ap1; s128 cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2; int final_i; cK = pcK; cN = pcN; cN2 = pcN2; cN3 = pcN3; i0 = pi0; cAn = pcAn; cNan = pcNan; i1 = pi1; cAm1 = pcAm1; cNam1 = pcNam1; i2 = pi2; cAm2 = pcAm2; An = Am1 = Am2 = 0; /* To avoid warnings */ i = 0; if (mode == 0) { N = 0; A[i++] = cK; } if (mode >= 1) { A[i++] = i0; } if (mode >= 2) { A[i++] = i1; } if (mode >= 3) { A[i++] = i2; } final_i = i; if (mode == 1) { An = A[0]; } if (mode == 2) { Am1 = A[0]; An = A[1]; } if (mode == 3) { Am2 = A[0]; Am1 = A[1]; An = A[2]; } stop = 0; if (mode == 0) { lvs = 1; } else { lvs = mode; } for (lv = lvs; lv <= limit; lv++) { if (mode == 0) { N = lv; } else { N = lv - 1; } Ap1 = cK + (cN * N) + (cN2 * N * N) + (cN3 * N * N * N) + (cAn * An) + (cNan * N * An) + (cAm1 * Am1) + (cNam1 * N * Am1) + (cAm2 * Am2); if (i g_toobig128) { /* %%% If I ever add an AnAm1 term, self-squaring sequences become possible and the limit testing will need to be handled differently */ stop++; } Am2 = Am1; Am1 = An; An = Ap1; } /* Increment global count of number of seqs generated */ g_total_gen++; *p_final_i = final_i; } s64 match[ALLOC_LIMIT]; int g_match_len; long g_matches_found; int g_num_crc; int x_match_where; /* Print out the sequence */ void print128(s128 * A, int max_cl, int final_i, int match_where) { int i, stop, charlen; char ai[100]; stop = 0; charlen = 3; // for the final "..." for(i=0; (stop == 0) && (i<=final_i) && (charlen < max_cl); i++) { s128_str(A[i], ai); charlen = charlen + strlen(ai) + 2; if (charlen > max_cl) { stop = 1; } else if (A[i] > g_toobig128) { stop = 1; } else { if ((i >= match_where) && (i < match_where + g_match_len)) { printf("%s%s%s, ", bold_on, ai, bold_off); } else { printf("%s, ", ai); } } } printf("...\n"); } /* Print out the sequence */ void print1(s64 * A, int max_cl, int final_i, int match_where) { int i, stop, charlen; char ai[100]; stop = 0; charlen = 3; /* for the final "..." */ for(i=0; (stop == 0) && (i<=final_i) && (charlen < max_cl); i++) { sprintf(ai, "%lld", A[i]); charlen = charlen + strlen(ai) + 2; /* +2 for the ", " */ if (charlen > max_cl) { stop = 1; } else if ((A[i] > g_too_big) || (-A[i] > g_too_big)) { stop = 1; } else { if ((i >= match_where) && (i < match_where + g_match_len)) { printf("%s%s%s, ", bold_on, ai, bold_off); } else { printf("%s, ", ai); } } } printf("...\n"); } int iabs(int x) { if (x>=0) { return x; } return -x; } /* Given a sequence, find the first "significant" term (defined as the first term whose absolute value is at least 2) and the number of terms we have calculated starting from that term. */ void find_sig(s64 * A, int final_i, int *firstsig, int *siglen) { int i, stop, Ai; int i_firstsig, i_toobig; i_firstsig = -1; i_toobig = -1; stop = 0; for(i=0; i<=final_i; i++) { Ai = iabs(A[i]); if (stop) { } else if ((Ai > g_too_big) || (-Ai > g_too_big)) { i_toobig = i; stop++; } else { if ((i_firstsig < 0) && (Ai > 1)) { i_firstsig = i; } } } if (i_toobig < 0) { i_toobig = i; } *firstsig = i_firstsig; *siglen = i_toobig - i_firstsig; } int a_ilw[33] = {0, 10, 20, 26, 30, 33, 36, 38, 40, 42, 43, 45, 46, 47, 48, 49, 50, 51, 52, 52, 53, 54, 55, 55, 56, 56, 57, 58, 58, 59, 59, 60, 60}; // Integer logarithmic weight. Functions as iabs if using original weights. int ilw(int x) { int ora; // over-range adjust int nsc; // negative surcharge nsc = 0; if (x<0) { x = -x; nsc = 1; } if (o_weight == WEIGHT_ORIG) { return x; } ora = 0; while (x>32) { ora += 10; x >>= 1; } return a_ilw[x] + ora + nsc; } // Maps -2 -1 0 1 2 ... onto 2 1 0 0 1 ... int map1(int x) { if (x<0) { return ilw(-x); } else if (x == 0) { return 0; } else if (x == 1) { if (o_weight == WEIGHT_ORIG) { return 0; } else { return 1; // 0.1 surcharge } } /* else */ return ilw(x-1); } // Absolute value plus a surcharge if nonzero. For example, if the // surcharge is 1, this maps -2 -1 0 1 2 ... onto 3 2 0 2 3 ... int map2(int x, int surcharge) { if (o_weight == WEIGHT_ORIG) { surcharge /= 10; } if (x != 0) { return (surcharge + ilw(x)); } return 0; } // Simple absolute-value mapping. int map3(int x) { return ilw(x); } /* Like map1, but not logarithmically skewed. This means that if the log * weight method is selected we need to multiply the number by 10 so it * is on the same scale as the values returned by map1/map2/map3. */ int map4(int x) { if (o_weight == WEIGHT_LOG) { if (x <= 0) { return (-10 * x); } /* else */ return (10 * (x-1)); } /* else */ return (map1(x)); } /* The "coefficient range" CRANGE is the number of possible values coefficients * can take on. If MINC, MAXC are -8, 8 then CRANGE is 17 because there are * 17 values from -8 to 8 inclusive. */ #define CRANGE (1 + MAXC - MINC) /* Cached versions of the map functions */ int a_map1[CRANGE]; int a_map2_n3[CRANGE]; int a_map2_6[CRANGE]; int a_map2_13[CRANGE]; int a_map2_16[CRANGE]; int a_map2_20[CRANGE]; int a_map3[CRANGE]; int a_map4[CRANGE]; void cache_weights(void) { int i; /* printf("cache_weights\n"); */ for(i=MINC; i<=MAXC; i++) { a_map1[i-MINC] = map1(i); a_map2_n3[i-MINC] = map2(i, -3); a_map2_6[i-MINC] = map2(i, 6); a_map2_13[i-MINC] = map2(i, 13); a_map2_16[i-MINC] = map2(i, 16); a_map2_20[i-MINC] = map2(i, 20); a_map3[i-MINC] = map3(i); a_map4[i-MINC] = map4(i); } } void show_weights(void) { int i; printf("Selected weights: %s\n", ((o_weight == WEIGHT_ORIG) ? "original" : "logarithmic")); printf("%s", "Recurrence formula template:\n" "A[n+1] = K + c N + c N^2 + c N^3 + c A[n] + c A[n-1] + c A[n-2]\n" " + c N A[n] + c N A[n-1]\n" "Where all 'c's are coefficients. In the table:\n" " I_n = Initial terms: A[0], (A[1]-A[0]), and (A[2]-A[1])\n" " K = constant K N = coefficient of N\n" " N^2 = coefficient of N^2 N^3 = coefficient of N^3\n" " Ann = coefficients of A[n], A[n-1] and A[n-2]\n" " nAn = coefficients of N A[n] and N A[n-1]\n" "\n" ); printf("value mode I_n K N N^2 N^3 Ann nAn\n"); for(i=MINC; i<=MAXC; i++) { printf(" %2d ", i); /* value */ printf(" %3d", a_map4[i-MINC]); /* mode */ printf(" %3d", a_map1[i-MINC]); /* i0, d1, d2 */ printf(" %3d", a_map2_n3[i-MINC]); /* cK */ printf(" %3d", a_map3[i-MINC]); /* cN */ printf(" %3d", a_map2_13[i-MINC]); /* cN2 */ printf(" %3d", a_map2_20[i-MINC]); /* cN3 */ printf(" %3d", a_map2_6[i-MINC]); /* cAn, cAm1, cAm2 */ printf(" %3d", a_map2_16[i-MINC]); /* cNan, cNam1 */ printf("\n"); } } /* See if the given sequence matches the search terms. */ int match1(s64 * A, int final_i, int * p_match_where) { int i, j, gg; #if 0 printf("final_i %d g_match_len %d\n", final_i, g_match_len); for(i=0; i 0) && (j final_i) { gg = -1; /* ran off end of sequence */ } else if (o_ignore_sign && (A[i+j] == -match[j])) { /* pass */ } else if (A[i+j] != match[j]) { gg = 0; /* no match */ } } if (gg > 0) { /* we made it through all the match terms! */ *p_match_where = i; return 1; } } *p_match_where = 0; return 0; } /* CRC a sequence at its current length and all shorter lengths, looking for a match. It stops CRCing as soon as it finds a match. The return value is 0 if any matches were found, and 1 if the sequence is "unique" (no match) The inputs indicate: firstsig Index of the first "significant" term siglen Numbers of terms from the firstsig to the end of the array match_where Index of first term matching the search terms For example, if the search terms are "1 2 3", the following sequences are found. F = firstsig, M = match_where, and showing the values of siglen and the computed value of cl_min: M F 0 _1__2__3_ 4 5 6 7 SL=6 cl_min = 2 F M -2 -1 0 _1__2__3_ 4 5 SL=8 cl_min = 6 F M -3 -2 -1 0 _1__2__3_ 4 SL=8 cl_min = 7 */ int matchmulti(s64 * seq, int firstsig, int siglen, int match_where) { int check_len, cl_min; u32 crc; check_len = siglen; /* Compute the minimum number of terms that we will be CRCing to flag a "redundant" sequence. This must be large enough so that at the very least we are matching against all the terms in the query, plus any additional leading terms that precede the query terms. The -lfm option also affects this. */ cl_min = g_match_len + match_where - firstsig + o_lim_follow_match; if (cl_min < 1) { cl_min = 1; } while(check_len > cl_min) { crc = crcseq(seq + firstsig, check_len); if (crc_match(crc, check_len)) { return 0; } check_len--; } //printf("mm: g_match_len %d, match_where %d, firstsig %d, cl_min %d\n", // g_match_len, match_where, firstsig, cl_min); return 1; } /* Remember this CRC, and re-checksum the sequence at each shorter length */ void addseq(s64 * s, int siglen) { int sl; u32 crc; crcnode * n; sl = siglen; while (sl > 1) { crc = crcseq(s, sl); n = (crcnode *) malloc(sizeof(crcnode)); if (n) { n->crc = crc; n->next = crclists[sl]; crclists[sl] = n; } else { fprintf(stderr, "could not allocate %dth crcnode\n", g_num_crc); exit(-1); } g_num_crc++; sl--; } } /* report.match performs all the printout operations for a sequence that is considered a "match" by scan.1 */ void report_match(s64 * A, int max_cl, int final_i, int match_where, int mode, int cK, int cN, int cN2, int cN3, int i0, int cAn, int cNan, int i1, int cAm1, int cNam1, int i2, int cAm2, int cplx) { if (o_show_terms) { print1(A, o_max_charlen, final_i, match_where); } if (debug_p) { printf("report_match: mode=%d", mode); printf(" cK=%d cN=%d cN2=%d cN3=%d", cK, cN, cN2, cN3); if (mode>=1) { printf(" i0=%d cAn=%d cNan=%d", i0, cAn, cNan); } if (mode>=2) { printf(" i1=%d cAm1=%d cNam1=%d", i1, cAm1, cNam1); } if (mode>=3) { printf(" i2=%d cAm2=%d", i2, cAm2); } printf("\n"); } print_d(mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2); if (o_show_score) { if (o_weight == WEIGHT_ORIG) { printf(" %s(score: %d)%s", small_on, cplx, small_off); } else { printf(" %s(score: %d.%d)%s", small_on, cplx/10, cplx%10, small_off); } } printf("\n"); g_matches_found++; fflush(stdout); } s64 g_sc1, g_sc2, thispass, thishit, maxpass; /* X + 1 cK surcharge -0.3 2 X cN surcharge 0 X^2 cN2 surcharge 1.3 2 A[n] cAn surcharge 0.6 Cost of a "2" coefficient for each term: K N N^2 N^3 mode 17 20 33 40 0 i0 An N An 10 26 36 10 d1 Am1 N Am1 10 26 36 20 d2 Am2 10 26 30 */ int j_min_cplx; int j_max_cplx; int j_minmod; int j_maxmod; int j_mode; int j_cK; int j_cN; int j_cN2; int j_cN3; int j_i0; int j_cAn; int j_cNan; int j_i1; int j_cAm1; int j_cNam1; int j_i2; int j_cAm2; int j_cpl1; int j_cpl2; int j_cpl3; int j_cpl4; int j_cpl5; int j_cpl5b; int j_cpl6; int j_cpl7; int j_cpl8; int j_cpl9; int j_cp10; int j_cp11; int j_cplx; int j_i0lo; int j_i0hi; int j_d1; int j_d1lo; int j_d1hi; int j_i1lo; int j_i1hi; int j_d2; int j_d2lo; int j_d2hi; int j_i2lo; int j_i2hi; /* Uses global parameter j_min_cplx; sets j_max_cplx To use the iterator: j_min_cplx = min_cplx; gg = s2.iterator(range); while(gg) { ... Use the coefficients here ... gg = s2.iterator(0); } g_sc1, g_sc2, thispass, thishit will now contain total stats for the iteration */ int s2_iterator(int init_range) { if (init_range == 0) { /* Normal case: Iterator is already running, we just want to get the next vector */ goto getnext; } /* Start the iterator, with 'range' giving the amount of complexity to cover (with g_min_cplx already set) */ j_max_cplx = j_min_cplx + init_range - 1; thispass = thishit = 0; for (j_cAn = MINC; j_cAn <= MAXC; j_cAn++) { j_cpl1 = a_map2_6[j_cAn + MAXC]; for (j_cNan = MINC; (j_cNan <= MAXC) && (j_cpl1 <= j_max_cplx); j_cNan++) { j_cpl2 = j_cpl1 + a_map2_16[j_cNan + MAXC]; for (j_cAm1 = MINC; (j_cAm1 <= MAXC) && (j_cpl2 <= j_max_cplx); j_cAm1++) { j_cpl3 = j_cpl2 + a_map2_6[j_cAm1 + MAXC]; for (j_cNam1 = MINC; (j_cNam1 <= MAXC) && (j_cpl3 <= j_max_cplx); j_cNam1++) { j_cpl4 = j_cpl3 + a_map2_16[j_cNam1 + MAXC]; for (j_cAm2 = MINC; (j_cAm2 <= MAXC) && (j_cpl4 <= j_max_cplx); j_cAm2++) { j_cpl5 = j_cpl4 + a_map2_6[j_cAm2 + MAXC]; /* To find the MCS for a given query we might need to try a higher mode, even if the recurrence formula has almost all 0 coefficients. Here we find out how low the mode can be (it can always be as high as 3) */ if ((j_cAn == 0) && (j_cNan == 0) && (j_cAm1 == 0) && (j_cNam1 == 0) && (j_cAm2 == 0)) { j_minmod = 0; } else if ((j_cAm1 == 0) && (j_cNam1 == 0) && (j_cAm2 == 0)) { j_minmod = 1; } else if (j_cAm2 == 0) { j_minmod = 2; } else { j_minmod = 3; } if (o_min_inits) { j_maxmod = j_minmod; } else { j_maxmod = 3; } /* -lo option */ if (j_maxmod > o_max_order) { j_maxmod = o_max_order; } /* Try permitted modes */ for (j_mode = j_minmod; j_mode <= j_maxmod; j_mode++) { /* add complexity points for mode */ j_cpl5b = j_cpl5 + a_map4[j_mode + MAXC]; /* Select ranges for i0, d1, d2 */ if (j_mode == 0) { j_i0lo = 0; j_i0hi = 0; j_d1lo = 0; j_d1hi = 0; j_d2lo = 0; j_d2hi = 0; } else if (j_mode == 1) { j_i0lo = -1; j_i0hi = MAXC; j_d1lo = 0; j_d1hi = 0; j_d2lo = 0; j_d2hi = 0; } else if (j_mode == 2) { j_i0lo = -1; j_i0hi = MAXC; j_d1lo = -1; j_d1hi = MAXC; j_d2lo = 0; j_d2hi = 0; } else { /* j_mode == 3 */ j_i0lo = -1; j_i0hi = MAXC; j_d1lo = -1; j_d1hi = MAXC; j_d2lo = -1; j_d2hi = MAXC; } for (j_cK = MINC; (j_cK <= MAXC) && (j_cpl5b <= j_max_cplx); j_cK++) { j_cpl6 = j_cpl5b + a_map2_n3[j_cK + MAXC]; for (j_cN = MINC; (j_cN <= MAXC) && (j_cpl6 <= j_max_cplx); j_cN++) { j_cpl7 = j_cpl6 + a_map3[j_cN + MAXC]; for (j_cN2 = MINC; (j_cN2 <= MAXC) && (j_cpl7 <= j_max_cplx); j_cN2++) { j_cpl8 = j_cpl7 + a_map2_13[j_cN2 + MAXC]; for (j_cN3 = MINC; (j_cN3 <= MAXC) && (j_cpl8 <= j_max_cplx); j_cN3++) { j_cpl9 = j_cpl8 + a_map2_20[j_cN3 + MAXC]; for(j_i0 = j_i0lo; (j_i0 <= j_i0hi) && (j_cpl9 <= j_max_cplx); j_i0++) { j_cp10 = j_cpl9 + a_map1[j_i0 + MAXC]; /* i1 = i0 + d1 d1 = i1 - i0 i2 = i1 + d2 d2 = i2 - i1 */ j_i1lo = j_i0 + j_d1lo; if (j_i1lo < MINC) { j_i1lo = MINC; } j_i1hi = j_i0 + j_d1hi; if (j_i1hi > MAXC) { j_i1hi = MAXC; } //for(j_d1 = j_d1lo; (j_d1 <= j_d1hi) && (j_cp10 <= j_max_cplx); j_d1++) { //j_cp11 = j_cp10 + a_map1[j_d1 + MAXC]; //j_i1 = j_i0 + j_d1; for(j_i1 = j_i1lo; (j_i1 <= j_i1hi) && (j_cp10 <= j_max_cplx); j_i1++) { j_d1 = j_i1 - j_i0; #if 1 j_cp11 = j_cp10 + a_map1[j_d1 + MAXC]; #else j_cp11 = j_cp10 + a_map1[j_i1 + MAXC]; #endif j_i2lo = j_i1 + j_d2lo; if (j_i2lo < MINC) { j_i2lo = MINC; } j_i2hi = j_i1 + j_d2hi; if (j_i2hi > MAXC) { j_i2hi = MAXC; } //for(j_d2 = j_d2lo; (j_d2 <= j_d2hi) && (j_cp11 <= j_max_cplx); j_d2++) { //j_cplx = j_cp11 + a_map1[j_d2 + MAXC]; //j_i2 = j_i1 + j_d2; for(j_i2 = j_i2lo; (j_i2 <= j_i2hi) && (j_cp11 <= j_max_cplx); j_i2++) { j_d2 = j_i2 - j_i1; #if 1 j_cplx = j_cp11 + a_map1[j_d2 + MAXC]; #else j_cplx = j_cp11 + a_map1[j_i2 + MAXC]; #endif g_sc1++; thispass++; if ((j_cplx >= j_min_cplx) && (j_cplx <= j_max_cplx)) { thishit++; g_sc2++; return 1; /* The code to generate and test a sequence was here */ } getnext: ; /* This is the entry point */ } } } } } } } } } } } } } if (thispass > maxpass) { maxpass = thispass; } /* Iterator is complete. Return 0 to indicate there is nothing more to process */ return 0; } void scan2(int min_cplx, int range) { s64 A[ALLOC_LIMIT]; int max_cplx; int limit, valid, doit, final_i, match_where, uniq, gg; max_cplx = min_cplx + range - 1; if (debug_y) { if (min_cplx == max_cplx) { printf("sc2 [%d]: ", max_cplx); } else { printf("sc2 [%d..%d]: ", min_cplx, max_cplx); } } limit = SCAN_LIMIT; j_min_cplx = min_cplx; gg = s2_iterator(range); while(gg) { uniq = 0; /* First challenge: are there any numbers greater than 2 within the first o_poswindow terms of the sequence? */ if (o_report_all) { valid = o_poswindow; } else { valid = generate(A, j_mode, j_cK, j_cN, j_cN2, j_cN3, j_i0, j_cAn, j_cNan, j_i1, j_cAm1, j_cNam1, j_i2, j_cAm2, o_poswindow, o_negokay, &final_i); if (debug_g) { int k; printf("valid %d: ", valid); for(k=0; (k<=final_i) && (k<=10); k++) { printf("%4ld,", (long)(A[k])); } printf("...\n"); } if ((valid == 0) && o_negokay) { fprintf(stderr, "negative okay but not valid?\n"); print1(A, DEFAULT_MAX_CHARS, final_i, match_where); exit(-1); } } if (o_negokay || (valid > 0)) { /* "Limit + o_poswindow - valid" here is to ensure that the number of "valid" terms generated is always the same in each sequence we report */ generate(A, j_mode, j_cK, j_cN, j_cN2, j_cN3, j_i0, j_cAn, j_cNan, j_i1, j_cAm1, j_cNam1, j_i2, j_cAm2, limit + (o_poswindow - valid), (o_legacy_negprint == 0), &final_i); /* Second challenge: does the sequence match the search terms? */ if (o_report_all) { uniq = 1; } else if (match1(A, final_i, &match_where) && (match_where <= o_initial_nonmatch)) { int i_firstsig; int siglen; uniq = 0; if (o_variations) { uniq = 1; } else { find_sig(A, final_i, &i_firstsig, &siglen); /* Third challenge: Did we actually find something new? */ /* Find out if this sequence is "unique" in the sense that its initial digits do not match any sequence reported so far */ uniq = matchmulti(A, i_firstsig, siglen, match_where); if (uniq) { /* Yes, it's "unique" */ addseq(A + i_firstsig, siglen); } } } } if (uniq) { report_match(A, o_max_charlen, final_i, match_where, j_mode, j_cK, j_cN, j_cN2, j_cN3, j_i0, j_cAn, j_cNan, j_i1, j_cAm1, j_cNam1, j_i2, j_cAm2, j_cplx); if (g_matches_found >= o_max_match) { goto exit1; // return; } } /* Get the next set of coefficients */ gg = s2_iterator(0); } exit1: ; if (debug_y) { printf(" %9lld for %lld", thishit, thispass); if (thispass) { printf(", %.3g %%", 100.0*((double) thishit) / ((double) thispass)); } printf("\n"); } } /* End of scan.2 */ /* --- end of iterator version --- */ /* Given a string s, returns an integer (with optional - sign) scanned from the beginning of the string. If the parameter r is provided, it is set to point to the first character after the end of the scanned integer. */ s64 str_s64(char *s, char * *r) { s64 rv; char *c; s64 sign; sign = 1; rv = 0; c = s; if (*c == '-') { sign = -1; c++; } while(*c) { if (((*c) >= '0') && ((*c) <= '9')) { rv *= 10LL; rv += ((s64) ((*c) - '0')); } else { goto str_s64_done; } c++; } str_s64_done: ; if (r) { *r = c; } return (sign * rv); } /* testing with -seq1808378250 mode=2 cK=5 cN=-1 cN2=-1 cN3=0 i0=1 cAn=1 cNan=0 i1=4 cAm1=2 cNam1=0 bits: 1101011110010011010100110001010100 A[0] = 1; A[1] = 4; A[N+1] = A[N] + 2 A[N-1] - N^2 - N + 5 1, 4, 9, 16, 27, 44, 73, 124, 219, 400, 753, 1448, 2827, 5572, 11049, ... */ /* Does the -seqNNN command, which takes an MCS number and prints out the definition and some terms of the sequence. */ #define CMD_SEQ_LEN 100 void cmd_seq(s64 p1) { s128 A[ALLOC_LIMIT]; char bits[CMD_SEQ_LEN]; int coeff[CMD_SEQ_LEN]; int i, d1, d2; int mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2; int limit, final_i; int nsym; /* these globals affect print1() */ g_match_len = g_matches_found = 0; limit = ALLOC_LIMIT; if (debug_P) { s64_str10(p1, bits, CMD_SEQ_LEN, 1); printf("cmd_seq: seqnum %lldd = %sb\n", p1, bits); } s64_str10(p1, bits, CMD_SEQ_LEN, 0); nsym = dehuff(bits, coeff, CMD_SEQ_LEN); if (debug_P) { for(i=0; i=1) { printf(" i0=%d cAn=%d cNan=%d", i0, cAn, cNan); } if (mode>=2) { printf(" i1=%d cAm1=%d cNam1=%d", i1, cAm1, cNam1); } if (mode>=3) { printf(" i2=%d cAm2=%d", i2, cAm2); } printf("\n"); } gen_128(A, mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2, limit, &final_i); print128(A, o_cmdseq_lc, final_i, 0); print_d(mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2); printf("\n"); } void cmd_encode(s64 * ca) { int i; s64 snum; int mode, cK, cN, cN2, cN3, A0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2; int A1, A2; int d1, d2; mode = cK = cN = cN2 = cN3 = A0 = cAn = cNan = 0; i1 = cAm1 = cNam1 = i2 = cAm2 = A1 = A2 = d1 = d2 = 0; /* Set all the coefficients */ i = 0; A0 = ca[i++]; A1 = ca[i++]; A2 = ca[i++]; cK = ca[i++]; cN = ca[i++]; cN2 = ca[i++]; cN3 = ca[i++]; cAn = ca[i++]; cNan = ca[i++]; cAm1 = ca[i++]; cNam1 = ca[i++]; cAm2 = ca[i++]; if (A2 || cAm2) { mode = 3; } else if (A1 || cNam1 || cAm1) { mode = 2; } else if (A0 || cNan || cAn) { mode = 1; } else { mode = 0; } d1 = A1 - A0; d2 = A2 - A1; /* these globals affect print1() */ g_match_len = g_matches_found = 0; snum = coeff_seqnum(mode, cK, cAn, cN, cAm1, cNan, cN2, cAm2, cNam1, cN3, d2, d1, A0); cmd_seq(snum); } /* End of cmd.encode */ /* Like the -seqNNN command, but creates a Horadam sequence given its (p,q,r,s) */ void cmd_horadam(int p, int q, int r, int s) { s64 p1; int mode, cK, cN, cN2, cN3, i0, cAn, cNan, i1, cAm1, cNam1, i2, cAm2; int d1, d2; /* Set all the coefficients */ mode = 2; cK = cN = cN2 = cN3 = 0; i0 = p; cAn = s; cNan = 0; i1 = q; cAm1 = r; cNam1 = 0; d1 = i1 - i0; i2 = 0; cAm2 = 0; d2 = i2 - i1; /* these globals affect print1() */ g_match_len = g_matches_found = 0; p1 = coeff_seqnum(mode, cK, cAn, cN, cAm1, cNan, cN2, cAm2, cNam1, cN3, d2, d1, i0); cmd_seq(p1); } /* Like cmd.horadam, but makes a Lucas U or V sequence */ void cmd_lucas(int P, int Q, int UV) { int hor_p, hor_q, hor_r, hor_s; if (UV) { /* Generate the V parameters */ hor_p = 2; hor_q = P; } else { /* Generate the V parameters */ hor_p = 0; hor_q = 1; } /* These are always the same */ hor_r = P; hor_s = -Q; cmd_horadam(hor_p, hor_q, hor_r, hor_s); } int main(int argc, char** argv) { int i, np; s64 mt; /* match term */ s64 n, p1; int i1, i2, i3, i4; float f1; long l1; char c1; char * arg_warn; char *s; validate_types(); crclists = (crcnode * *) malloc(sizeof(crcnode *) * (1+SCAN_LIMIT)); if (crclists == 0) { fprintf(stderr, "could not malloc for crclists\n"); exit(-1); } for(i=0; i<=SCAN_LIMIT; i++) { crclists[i] = 0; } crc32_init(); init_format(); o_weight = WEIGHT_LOG; cache_weights(); /* compute MAXINT */ g_too_big = 1; while (g_too_big<<1 > 0) { g_too_big <<= 1; } /* At any step the value can increase by about 2N*MAXC, due to the (cNan * N * An) and (cNam1 * N * Am1) terms. Therefore, we stop trying to calculate a sequence when a term A[n] gets bigger than MAXINT / (2*MAXC*MAXT) where MAXT is the max number of terms. */ g_too_big = g_too_big / ((s64) (2 * MAXC * SCAN_LIMIT)); /* printf("g_too_big is %lld\n", (long long) g_too_big); */ #if 0 /* Old code for computing g_toobig128, which doesn't work on some (newer?) versions of GCC */ g_toobig128 = 1; while (g_toobig128+g_toobig128 > 0) { g_toobig128 = g_toobig128+g_toobig128; } printf("gb2 %g\n", (double) g_toobig128); printf("2mal = %g\n", (double)(2 * MAXC * ALLOC_LIMIT)); g_toobig128 = g_toobig128 / ((s128) (2 * MAXC * ALLOC_LIMIT)); printf("gb3 %g\n", (double) g_toobig128); #else g_toobig128 = (s128)(170141183460469231731687303715884105728.0 / (double)(2 * MAXC * ALLOC_LIMIT)); #endif /* Convert to a string, needed for the error message we print if they exceed the limit when giving an option like -lm */ s128_str(g_toobig128, g_tb_str); /* printf("g_toobig128 is %s\n", g_tb_str); */ /* Options defaults */ o_max_match = 3; o_format = 2; o_ignore_sign = 0; o_rhtf_simpler = 0; o_variations = 0; o_report_all = 0; o_max_charlen = DEFAULT_MAX_CHARS; o_cmdseq_lc = DEFAULT_MC_CMDSEQ; o_max_order = 10; // Effectively no limit o_negokay = 1; o_poswindow = 10; o_legacy_negprint = 0; o_cplx_limit = 12; o_cplx_custom = 0; o_encode = 0; o_horadam = 0; o_min_inits = 0; o_show_formula = 1; o_show_seqnum = 1; o_show_terms = 1; o_show_score = 1; o_show_rhtf_links = 1; o_lucas = 0; o_initial_nonmatch = 999; o_lim_follow_match = 10; g_match_len = 0; arg_warn = 0; for(i=1; i= FORMAT_PLAIN) && (n <= FORMAT_HTML)) { o_format = n; init_format(); // printf("opt format %d\n", o_format); } } else if ((strcmp(argv[i], "-h") == 0) || (strcmp(argv[i], "--h") == 0) || (strcmp(argv[i], "-help") == 0) || (strcmp(argv[i], "--help") == 0)) { /* Built-in help */ usage(); arg_warn = argv[i]; } else if (strcmp(argv[i], "--hide-formula") == 0) { /* Do not display formula (sequence definition) */ o_show_formula = 0; } else if (strcmp(argv[i], "--hide-score") == 0) { /* Do not display complexity score */ o_show_score = 0; } else if (strcmp(argv[i], "--hide-seqnum") == 0) { /* Do not display sequence number (MCS-number) */ o_show_seqnum = 0; } else if (strcmp(argv[i], "--hide-terms") == 0) { /* Do not display the sequence itself */ o_show_terms = 0; } else if (strcmp(argv[i], "--hide-rhtf-links") == 0) { /* Do not include RHTF-style links with sequence number (MCS-number) */ o_show_rhtf_links = 0; } else if (strncmp(argv[i], "horadam", strlen(argv[i])) == 0) { /* Treat the search terms as elements of a Horadam sequence vector */ o_horadam = 1; } else if (strcmp(argv[i], "-is") == 0) { /* Ignore sign, e.g. matches 3 7 27 against 3 -7 27 */ o_ignore_sign = 1; } else if (strncmp(argv[i], "-lc", 3) == 0) { /* Limit complexity-score */ if (np = sscanf(argv[i], "-lc%d", &i1), np == 1) { if (i1 > 10) { o_max_charlen = i1; o_cmdseq_lc = i1; } else { fprintf(stderr, "-lc value should be at least 11\n"); h_for_help(); exit(-1); } } else { fprintf(stderr, "-lc needs a number, e.g. '-lc500'.\n"); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-lfm%d", &i1), np == 1) { /* Limit following matching terms */ if (i1 >= 0) { o_lim_follow_match = i1; } else { fprintf(stderr, "-lfm value should be at least 0\n"); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-lin%d", &i1), np == 1) { /* Limit number of initial non-matching terms */ if (i1 >= 0) { o_initial_nonmatch = i1; /* printf("o_initial_nonmatch == %d\n", o_initial_nonmatch); */ } else { fprintf(stderr, "-lin value should be at least 0\n"); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-lm%lld", &p1), np == 1) { /* Limit magnitude: Do not display (or attempt to calculate) any terms whose absolute value exceeds the given amount */ /* Set the s64 limit, if we can */ if ((p1 >= 1) && (p1 <= g_too_big)) { g_too_big = p1; } /* Set the s128 limit. %%% NOTE: Because we sscanf into an s64, the biggest -lm arg you can give is 9223372036854775807. I could fix this by sscanf'ing a string and converting the string to s128; it seems I'd have to write a str_128 function. A cheaper solution would be to sscanf into a double, but it would be less precise. */ if ((p1 >= 1) && ((s128)p1 <= g_toobig128)) { g_toobig128 = p1; } else { fprintf(stderr, "-lm value must be in range [1..%s]\n", g_tb_str); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-lo%d", &i1), np == 1) { /* Limit order */ if (i1 >= 0) { o_max_order = i1; } else { fprintf(stderr, "-lo value should be at least 0\n"); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-lp%d", &i1), np == 1) { /* Limit to positive values: In order to match, a candidate sequence must produce a value of 2 or more within the first N terms. Legacy behavior of #mcs# is achieved by specifying -lp10 */ if (i1 >= (SCAN_LIMIT/2)) { fprintf(stderr, "-lp value cannot be greater than %d\n", (SCAN_LIMIT/2)-1); h_for_help(); exit(-1); } if (i1 > 0) { o_negokay = 0; o_poswindow = i1; } else { fprintf(stderr, "-lp value should be at least 1\n"); h_for_help(); exit(-1); } } else if (np = sscanf(argv[i], "-ls%f", &f1), np == 1) { if (f1 >= 2.0) { if (o_weight == WEIGHT_LOG) { o_cplx_limit = int(f1 * 10.0 + 0.5); o_cplx_custom = 1; printf("Logarithmic weights; custom o_cplx_limit %.1f\n", ((float)o_cplx_limit)/10.0); } else { o_cplx_limit = int(f1 + 0.5); printf("Simple weights; custom o_cplx_limit %d\n", o_cplx_limit); } } else { fprintf(stderr, "-ls value should be at least 2\n"); h_for_help(); exit(-1); } } else if (strncmp(argv[i], "lucas", strlen(argv[i])) == 0) { o_lucas = 1; } else if (np = sscanf(argv[i], "MCS%lld", &p1), np == 1) { cmd_seq(p1); arg_warn = argv[i]; } else if ((strcmp(argv[i], "-m") == 0) || (strcmp(argv[i], "--minimal-inits") == 0) ) { /* Do not allow extra initial terms in a definition, e.g. if the recurrence is mode 1, there cannot be more than 1 initial term(s) */ o_min_inits = 1; } else if (strncmp(argv[i], "--old-negprint", strlen(argv[i])) == 0) { o_legacy_negprint = 1; } else if (np = sscanf(argv[i], "-n%ld", &l1), np == 1) { if (l1 <= 0) { fprintf(stderr, "-n argument should be positive\n"); h_for_help(); exit(-1); } else { o_max_match = l1; // printf("opt max_match %d\n", o_max_match); } } else if (strcmp(argv[i], "-r") == 0) { fprintf(stderr, "-r is deprecated; use --rhtf or --rhtf-simple instead.\n"); h_for_help(); exit(-1); } else if (strcmp(argv[i], "--report-all") == 0) { /* Report all sequences generated, regardless of match */ o_report_all = 1; o_max_match = (s64) 1.0e12; } else if (strcmp(argv[i], "--rhtf") == 0) { o_format = FORMAT_RHTF; init_format(); } else if (strcmp(argv[i], "--rhtf-simple") == 0) { o_format = FORMAT_RHTF; o_rhtf_simpler = 1; o_show_rhtf_links = 0; init_format(); } else if (strncmp(argv[i], "-S", 2) == 0) { show_weights(); exit(0); } else if (np = sscanf(argv[i], "-seq%lld", &p1), np == 1) { cmd_seq(p1); arg_warn = argv[i]; } else if (strcmp(argv[i], "--show-formula") == 0) { /* Display formula (sequence definition) */ o_show_formula = 1; } else if (strcmp(argv[i], "--show-score") == 0) { /* Display complexity score */ o_show_score = 1; } else if (strcmp(argv[i], "--show-seqnum") == 0) { /* Display sequence number (MCS-number) */ o_show_seqnum = 1; } else if (strcmp(argv[i], "--show-terms") == 0) { /* Display the sequence itself */ o_show_terms = 1; } else if (strcmp(argv[i], "--show-rhtf-links") == 0) { /* Include RHTF-style links with sequence number (MCS-number) */ o_show_rhtf_links = 1; } else if (strcmp(argv[i], "--version") == 0) { printf("mcsfind version %d\n", MK_SUBST_DATE); exit(0); } else if (strcmp(argv[i], "-v") == 0) { o_variations = 1; } else if (strncmp(argv[i], "-w", 2) == 0) { s = argv[i]; n = str_s64(s+2, 0); if ((n >= WEIGHT_ORIG) && (n <= WEIGHT_LOG)) { o_weight = n; cache_weights(); // printf("opt weight %d\n", o_weight); } } else if ( ((argv[i])[0] == '-') && isalpha((argv[i])[1]) ) { /* '-' followed by a letter, and not caught by the above */ fprintf(stderr, "unrecognized option '%s', or option parameter missing\n", argv[i]); h_for_help(); exit(-1); } else if ( ( ((argv[i])[0] == '-') && isdigit((argv[i])[1]) ) || ((argv[i])[0] == ',') || isdigit((argv[i])[0]) ) { /* A sequence term */ char *s; char *r; s = argv[i]; r = 0; if (*s == ',') { s++; } while(s && (*s) && (g_match_len < SCAN_LIMIT)) { mt = str_s64(s, &r); if (mt < 0) { o_negokay = 1; } if (o_negokay || (mt > 0)) { match[g_match_len++] = mt; /* printf("got term %lld leaving %s\n", mt, r); */ } s = 0; if (r && (*r == ',')) { r++; if ((*r == '-') || (isdigit(*r))) { s = r; } } } } else if (strcmp(argv[i], ",") == 0) { /* We also allow bare commas */ } else { fprintf(stderr, "Unrecognized argument '%s'\n", argv[i]); exit(-1); } } if (arg_warn) { exit(0); } if (o_encode) { if (g_match_len == 12) { cmd_encode(match); exit(0); } else { help_encode(); exit(-1); } } if (o_horadam) { if (g_match_len >= 4) { printf("Horadam sequence Wn(%lld, %lld; %lld, %lld) :\n", match[0], match[1], match[2], match[3]); cmd_horadam(match[0], match[1], match[2], match[3]); exit(0); } else { fprintf(stderr, "Horadam requires 4 integer arguments\n"); exit(-1); } } if (o_lucas) { if (g_match_len >= 2) { printf("Lucas sequence U(%lld, %lld) :\n", match[0], match[1]); cmd_lucas(match[0], match[1], 0); printf("Lucas sequence V(%lld, %lld) :\n", match[0], match[1]); cmd_lucas(match[0], match[1], 1); exit(0); } else { fprintf(stderr, "Lucas requires 2 integer arguments\n"); exit(-1); } } if (debug_y) { printf("Searching for: "); for(i=0; i 60)) { range = 1 + (i-60)/7; } else { range = 1; } scan2(i, range); i = i + range; } if (g_matches_found < o_max_match) { char s_higher[16]; printf("Search ended after %ld matches, (exceeded complexity limit ", g_matches_found); if (o_weight == WEIGHT_LOG) { printf("%d).\n", o_cplx_limit/10); sprintf(s_higher, "%d", (o_cplx_limit/10)+2); } else { printf("%d).\n", o_cplx_limit); sprintf(s_higher, "%d", o_cplx_limit+2); } printf("To search further, use -ls%s\n", s_higher); } if (debug_y) { printf("Total: %lld for %lld", g_sc2, g_sc1); if (g_sc1) { printf(" (%3.2g %%)", 100.0 * ((double) g_sc2) / ((double) (g_sc1))); } printf(" maxpass == %lld\n", maxpass); } { double tsec = gettime(); printf("Sequences generated: %ld\n", ((long) g_total_gen)); printf("Time: %7.3f\n", tsec); } return 0; } /* end of mcsfind.c */