/*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 luams3.c

A summary of this and other HPC projects is in highperformance.rhtf
19961015.xxxx Create LUAMS2 to answer a new thread on sci.fractals
about the Mandelbrot area. It seems they want to do a new GIPPP on our
newer, faster machines.
19961017.2xxx Find my postings to sci.fractals from 1993. Notable
points:
- I was about to run a GIPPP myself but called it off in order to
accommodate the distance-estimate GIPPP, which at the time was more
mathematically interesting.
- The 1.506595 result was calculated with a dwell limit of 512*1024
and period detection. It was an average of 40 runs, half of which
straddled Re=-0.5 (Seahorse Valley) and the other half of which ran
right down it.
- The error term 0.000002 came from the standard deviation for the 40
runs divided by sqrt(40).
- An additional 0.0000056 should have been subtracted to account for
the dwell limit 524288 that was used in the 40 runs.
- It would be nice to calculate the total number of iterations performed
in the course of the area estimate.
- It takes 7 years for computing power to grow enough to use a 2x2
larger grid.
And other things I thought of while reading:
- Given the trend in stderr, it takes a little under 20 years for
computing power to grow enough to gain one decimal digit.
- It would be nice to compute the center-of-gravity of the Mandelbrot
Set at the same time as the area.
19961018.14xx. Add period detection.
grid itmax area stderr time
32 256 1.509723 0.039114 1.60
32 512 1.500947 0.038372 0.63
64 512 1.516968 0.015629 2.00
64 1024 1.512485 0.016269 2.30
128 1024 1.510655 0.007460 7.92
128 2048 1.508534 0.007052 8.87
256 2048 1.508178 0.002845 33.2
256 4096 1.507292 0.002688 36.6
512 4096 1.507517 0.001122 141
512 8192 1.507116 0.001097 155
1024 8192 1.506971 0.000489 607
1024 16384 1.506779 0.000501 661
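The period detection added in this entry can be sketched roughly as
below. This is an illustrative reconstruction (Brent-style cycle
checking against a saved orbit point whose save interval grows), not
the actual LUAMS code; m_iterate and the constants are made up:

```c
/* Escape-time iteration with simple periodicity checking: compare the
   orbit against a saved reference point; the save interval roughly
   doubles each time a new reference is taken.  Returns 1 if the point
   is taken to be inside the Set, 0 if it escapes. */
int m_iterate(double cr, double ci, long itmax)
{
    double zr = 0.0, zi = 0.0, sr = 0.0, si = 0.0;
    long it, check = 3, next_save = 10;
    for (it = 0; it < itmax; it++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            return 0;                  /* escaped: outside the Set   */
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        if (zr == sr && zi == si)
            return 1;                  /* limit cycle: inside the Set */
        if (it == next_save) {         /* save a new reference point  */
            sr = zr; si = zi;
            next_save += check;
            check *= 2;
        }
    }
    return 1;                          /* never escaped by itmax */
}
```

Points that settle into a limit cycle (the interior of every mu-atom)
are detected long before the dwell limit, which is what makes large
itmax values affordable.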
19961018.16xx. Add test for period-2 mu-atoms, but it doesn't work.
Add skewing along the Im axis and code to deal with the first row
differently.
32 256 1.530539 0.039807 1.45
32 512 1.524781 0.038548 0.63
64 512 1.508679 0.017331 2.00
64 1024 1.504898 0.019480 2.32
128 1024 1.509324 0.005810 7.90
128 2048 1.507078 0.005430 8.88
256 2048 1.508248 0.002059 33.2
256 4096 1.507333 0.002191 36.6
512 4096 1.507676 0.001369 149
512 8192 1.507299 0.001376 166
1024 8192 1.507195 0.000489 675
19961018.1635. Fix period-2 test. Increase number of iterations
greatly to reduce that source of error.
19961018.1717. Add printout of total aggregate iterations.
32 4096 1.512 842 730 0.035 980 710 771,615 2.07
64 8192 1.502 917 751 0.019 617 425 2,714,431 4.40
128 16384 1.505 115 841 0.005 550 857 10,725,998 12.3
256 32768 1.506 413 466 0.002 277 286 45,943,157 49.6
512 65536 1.506 898 665 0.001 368 183 200,294,650 218
1024 131072 1.506 801 245 0.000 503 518 877,059,533 895
2048 262144 1.506 675 767 0.000 151 678 3,830,269,312 3737
4096 524288 1.506 588 196 0.000 074 581 16,898,067,806 15840
8192 1048576 1.506 595 617 0.000 027 484 74,884,551,084 67570
19961021.1248. Make m_area save to a file periodically, with filename
based on gridsize and n_max so (potentially) different projects can be
started and stopped at different times. It also has the nice side-effect
that if you run the same gridsize again after recompiling (or whatever)
it returns the area for that gridsize immediately.
1024 262144 1.506 622 314 46,910,573
2048 524288 1.506 286 621 213,925,295
19961021.1315. Go back to m_area_stat, which now automatically benefits
from the changes to m_area. Increase itmax again and begin a new set of
runs (Also, the skew parameter has become an integer, so two things have
changed):
32 8192 1.511 458 647 0.037 221 915 904,123
64 16384 1.502 640 743 0.019 431 398 3,133,053
128 32768 1.504 840 699 0.005 686 364 11,973,129
256 65536 1.506 341 352 0.002 261 414 51,835,404
512 131072 1.506 886 924 0.001 371 587 225,038,584
1024 262144 1.506 783 392 0.000 493 761 971,556,504
2048 524288 1.506 674 263 0.000 152 765 4,248,279,832
19961022
4096 1048576 1.506 585 067 0.000 074 146 18,687,285,458
19961023
8192 2097152 1.506 593 810 0.000 027 584 83,170,102,744
19961028
16384 4194304 1.506 588 027 0.000 012 606 368,748,991,052
(19961107)
32768 8388608 1.506 591 544 0.000 004 484 1,416,385,072,113
(19961209)
65536 16777216 1.506 592 300 0.000 001 900 3,728,239,546,503
19961028.17xx. Add nice() routine, so that it will avoid slowing
down the Mac when the mouse is being moved. This does a pretty good
job of keeping LUAMS out of the way while I'm doing development.
19961029.1550. Small changes to nice().
19961104.2xxx. Rebuild Power MANDELZOOM projects, preparing to merge
pipelined iteration code into LUAMS.
19961105.1458. Finally add Jay Hill's tests for period-1 and -2
mu-atoms (replacing my old test for the period-2 mu-atom). The
iterations column shows how much this speeds up the program:
32 8192 1.511 458 647 0.037 221 915 276,466
64 16384 1.502 640 743 0.019 431 398 993,155
128 32768 1.504 840 699 0.005 686 364 4,144,440
256 65536 1.506 341 352 0.002 261 414 17,877,456
512 131072 1.506 886 924 0.001 371 587 79,693,035
Immediately replace the ongoing background task so it can benefit from
the speedup.
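The period-1 and period-2 tests can be sketched as closed-form
membership checks on c, as below. This is the standard formulation of
those tests; whether it matches Jay Hill's exact algebra is an
assumption, and the function names are made up:

```c
/* Closed-form checks for the two largest components of the Set:
   the period-1 cardioid and the period-2 disk centered at -1.
   A point passing either test is inside; no iteration needed. */
int in_period1_cardioid(double cr, double ci)
{
    double xq = cr - 0.25;
    double q = xq * xq + ci * ci;
    return q * (q + xq) < 0.25 * ci * ci;  /* strict interior test */
}

int in_period2_disk(double cr, double ci)
{
    double xr = cr + 1.0;                  /* disk center is -1+0i  */
    return xr * xr + ci * ci < 0.0625;     /* radius 1/4, so r^2=1/16 */
}
```

Since the interiors of these two mu-atoms cover most of the Set's
area, skipping iteration for them removes the bulk of the work.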
19961105.1650. Add knowledge of exact value (7 pi / 16) for period-1
and -2 mu-atoms; pixel counting only used to estimate area of the
higher mu-atoms. As expected, this lowers the standard deviation a
lot at first, then substantially less and eventually not at all:
32 8192 1.510 347 371 0.028 337 143 276,466
64 16384 1.501 618 093 0.013 830 156 993,155
128 32768 1.506 875 646 0.005 529 593 4,144,440
256 65536 1.506 404 500 0.001 983 789 17,877,456
512 131072 1.506 870 205 0.001 167 508 79,693,035
1024 262144 1.506 775 316 0.000 508 381 361,842,027
(This happens because the standard error has two components: that
which comes from the period-1 and -2 mu-atoms and that which comes
from the rest of the Set. The component of error from the two mu-atoms
goes down as (grid size ^ 1.0)/(grid size^2) because the mu-atoms'
boundaries have dimension 1.0; the other error term goes down more
slowly, and as a result it dominates when the grid size is arbitrarily
large.)
19961107.xxxx. Merge in pipelined iteration code and (almost) verify
accuracy. It seems to differ on a few points, probably because of the
differences in the iteration algorithms. Also, it doesn't provide a
speed improvement (yet) because it doesn't have orbit detection. (In
other words, the simple iterator with orbit detection is faster than
the pipelined iterator without orbit detection.)
19961107.17xx. Two sources of discrepancy were found: One was a bug
(not flushing the pipeline before doing the first row adjustment) and
the other results from the differences in the iteration formulas (such
as using zi2 = 2*(zr*zi + ci/2) rather than zi2 = 2*zr*zi + ci). The
following are the new standard results to compare to (times are
approximate):
32 8192 1.511 458 647 0.037 221 915 275,459 0.267
64 16384 1.502 640 743 0.019 431 398 967,521 0.867
128 32768 1.504 914 742 0.005 692 132 4,188,562 4.533
256 65536 1.506 386 129 0.002 312 484 17,981,647 15.87
512 131072 1.506 869 368 0.001 349 102 80,190,202 64.72
{For the next 15 months the program's performance is hindered because
I don't realize that the loop unrolling makes the period detection so
inefficient as to cancel out the benefit of iterating two points at
once. 19990520}
19961107.2039. Pipeline now includes orbit detection, but isn't any
faster than single iteration. Don't really know why...
Subject: Mandelbrot Set Area, new result
From: Munafo@prepress.pps.com
Date: 1996/12/23
Message-Id: <32BED4D5.BF8@prepress.pps.com>
Newsgroups: sci.fractals
I have continued to compute the area of the Mandelbrot Set using
my method of averaging 20 runs by the pixel-counting method,
with slightly different grid placements and grid spacings. The
latest result (after about 3.7 trillion Mandelbrot iterations)
is 1.50659230 +- 0.0000006.
Here are the results of a set of such averaged runs. For each
grid size, 20 runs were computed and averaged together.
grid     dwell      average      standard
size     limit      area         deviation
-----    -------    ---------    ---------
32 8192 1.511 4 0.037 2
64 16384 1.502 6 0.019 4
128 32768 1.504 84 0.005 68
256 65536 1.506 34 0.002 26
512 131072 1.506 88 0.001 37
1024 262144 1.506 783 0.000 493
2048 524288 1.506 674 0.000 152
4096 1048576 1.506 585 0 0.000 074 1
8192 2097152 1.506 593 8 0.000 027 5
16384 4194304 1.506 588 0 0.000 012 6
32768 8388608 1.506 591 54 0.000 004 48
65536 16777216 1.506 592 30 0.000 001 90
The standard deviation measures how much (on average) each
run differs from the average. This means that the average
itself almost certainly varies even less from the true area.
For 20 samples, the standard error of the mean is about 0.23
of the standard deviation. A safe estimate of the error
would be about 1/3 of the standard deviation.
(My old estimate, from April 13th 1993, was 1.506595
+- 0.000002. That estimate was based on an average of 40
runs, and used a dwell limit of 524288. I should have
subtracted 0.000005 to account for the error in using such
a low dwell limit; I also was a little too "liberal" in my
calculation of the expected error.)
- Robert Munafo
{Big hiatus here 19990520} 
19971010.0055. Resurrect the project. I have just gotten it to run
again after updating to the new CodeWarrior. It was crashing because
g_dp was being set to dwell2, which caused an infinite loop (the fix
was to set g_dp to dwell3). Can't tell how fast it's
running because the iteration counts are way off due to orbit
detection -- it's adding up the dwell values returned by the pipeline,
not the actual number of iterations performed. With this new build
on the 6100/80 I am starting a new set of runs:
32 8192 1.511 458 647 0.037 221 916 1,427,000 0.367
64 16384 1.502 640 744 0.019 431 398 7,705,483 0.983
128 32768 1.504 914 742 0.005 692 132 53,725,852 3.667
256 65536 1.506 386 130 0.002 312 485 389,356,427 13.72
512 131072 1.506 869 368 0.001 349 103 2,983,154,869 56.55
1024 262144 1.506 781 916 0.000 492 708 23,278,493,347 262.9
2048 524288 1.506 673 891 0.000 151 484 183,890,967,883 1405
4096 1048576 1.506 583 929 0.000 074 489 1,461,504,579,703 5106
19971011
8192 2097152 1.506 593 644 0.000 027 695 11,658,311,598,595 59242
19971012.1454. Make it measure ticks only while it's actually calculating,
and make it record ticks in the files. This requires trashing all the
files so far, so I have to start over. While I'm at it, I also make it record
the actual number of iterations and the "simulated" number of iterations
separately, which had stopped being done right after loop detection was added.
It is somewhat disturbing to notice that the actual area values are changing
too. Probably something to do with the pipeline restart after loading from a
file -- I think perhaps it isn't flushing the pipeline before doing saves.
32 1.511 458 647 0.037 221 916 13,698,616
8192 460,236 0.55
64 1.502 640 744 0.019 431 398 84,071,307
16384 1,452,625 1.1667
128 1.504 914 742 0.005 692 132 585,681,564
32768 6,091,786 3.5833
256 1.506 386 130 0.002 312 485 4,358,544,267
65536 30,530,867 13.05
512 1.506 751 662 0.001 348 801 33,606,713,474
131072 165,141,228 54.817
1024 1.506 771 394 0.000 495 651 263,814,226,416
262144 1,048,193,486 251.7
2048 1.506 673 891 0.000 151 484 2,090,491,542,859
524288 7,455,656,801 1349.6
4096 1.506 584 024 0.000 074 387 16,643,863,788,711
1048576 55,389,875,749 8218.1
19971012.2347. Yep, it wasn't flushing the pipeline before doing file
operations. Change format of output and begin a whole new set of runs.
32 1.511 458 647 0.037 221 916 13,698,616
8192 460,236 0.55 5.8575 MFLOPs
64 1.502 640 744 0.019 431 398 84,071,307
16384 1,452,625 1.1667 8.7157 MFLOPs
128 1.504 914 742 0.005 692 132 585,681,564
32768 6,091,786 3.55 12.012 MFLOPs
256 1.506 386 130 0.002 312 485 4,358,495,089
65536 30,543,955 13.05 16.384 MFLOPs
512 1.506 869 368 0.001 349 103 33,607,097,921
131072 165,722,959 55.183 21.022 MFLOPs
1024 1.506 781 916 0.000 492 708 263,815,177,892
262144 1,050,371,926 252.75 29.090 MFLOPs
2048 1.506 673 891 0.000 151 484 2,090,497,201,424
524288 7,464,872,712 1359 38.451 MFLOPs
19971013:
4096 1.506 584 024 0.000 074 387 16,643,919,148,551
1048576 55,472,219,207 8321.5 46.663 MFLOPs
8192 1.506 593 644 0.000 027 695 132,836,944,645,477
2097152 428,665,503,505 57369 52.304 MFLOPs
19971018:
16384 1.506 588 076 0.000 012 669 1,061,430,262,321,482
4194304 3,365,249,417,036 4.0937e5 57.544 MFLOPs
19971014.14xx Look over the code thoroughly, trying to figure out why
the pipeline is running only half as fast as theory indicates it should.
I don't see anything, so I'll have to do some profiling. Given the magnitude
of the discrepancy, just breaking into Macsbug a bunch of times should be
sufficient to find where it's spending half its time.
19971014.235x I check this and find that it's actually spending almost all
of its time in the unrolled loop. However, there are a lot of pipeline
stalls because the compiler reordered the instructions.
After I turn off optimization and recompile it is in fact a bit faster,
but not nearly as fast as I thought it would be. It looks like the
asymptote is a little over 60 MFLOPs. Then I remember that it's double
precision, and the next day I check Power MANDELZOOM and see that it
only performs at 64 MFLOPs double anyway.
19971018.23xx. Read the notes from 19961017 and realize I still haven't
done the center-of-gravity thing. Oops.
32 8192 1.511 458 647 0.037 221 916 13,698,616
0.292 565 786 0.014 640 563 460,236
0.517 6.236 MFLOPs
64 16384 1.502 640 744 0.019 431 398 84,071,307
0.287 474 451 0.004 811 761 1,452,625
1.78 5.702 MFLOPs
128 32768 1.504 914 742 0.005 692 132 585,639,290
0.286 495 853 0.002 352 688 6,094,438
3.58 11.905 MFLOPs
256 65536 1.506 386 130 0.002 312 485 4,358,537,348
0.286 741 015 0.000 912 340 30,588,764
12.8 16.750 MFLOPs
512 131072 1.506 869 368 0.001 349 103 33,607,168,662
0.286 880 847 0.000 416 546 165,498,826
54.5 21.263 MFLOPs
1024 262144 1.506 781 916 0.000 492 708 263,814,920,625
0.286 801 539 0.000 173 164 1,049,624,871
246 29.910 MFLOPs
2048 524288 1.506 673 891 0.000 151 484 2,090,496,819,135
0.286 811 014 0.000 074 787 7,461,538,307
1,310 40.011 MFLOPs
4096 1048576 1.506 584 024 0.000 074 387 16,643,914,697,009
0.286 765 808 0.000 024 839 55,470,040,257
7,970 48.708 MFLOPs
8192 2097152 1.506 593 644 0.000 027 695 132,836,936,354,151
0.286 769 509 0.000 009 733 428,624,195,429
54,700 54.845 MFLOPs
19971021.xxxx. I get ready to post a new message to sci.fractals:
Recently I have decided to restart the computing effort I undertook
about a year ago at this time to compute the area of the Mandelbrot
Set. After reviewing the notes from that time I saw that there had
been some interest in the past of computing the Mandelbrot Set's
"center of gravity" as well. I have also done some optimizations of
the code in the mean time and fixed a few bugs, and am ready to do
some long runs.
The method I use involves counting pixels on a grid. There is error
in the estimate resulting from grid placement and the fact that the
grid squares on the boundary get counted as completely in or
completely out when in fact only part of their area is in. It is hard
to estimate the magnitude of this error, so I use statistical methods
to measure it. I use a bunch of grids of slightly different grid
spacings with slight vertical and horizontal offsets, and take the
mean and standard deviation of all the runs.
These are the figures I've compiled so far. Each run is the average
of 20 grids for which the standard deviation is shown.
grid    iteration    area and             std deviation
size    limit        center-of-gravity    (20 runs)
256 65536 1.506 39 0.002 31
0.286 741 0.000 912
512 131072 1.506 87 0.001 35
0.286 881 0.000 417
1024 262144 1.506 782 0.000 493
0.286 802 0.000 173
2048 524288 1.506 674 0.000 151
0.286 811 0 0.000 074 8
4096 1048576 1.506 584 0 0.000 074 4
0.286 765 8 0.000 024 8
8192 2097152 1.506 593 6 0.000 027 7
0.286 769 51 0.000 009 73
For each average, the standard deviation gives the expected error
in each of the 20 individual grids. The expected error of the mean
is 1/sqrt(19) of the standard deviation (this is called "the standard
error of the mean"). I use 1/3 as a conservative estimate. So, my
best estimate of the center-of-gravity is -0.286769 +- 0.000003.
A few other observations:
- My program currently runs at 8 million Mandelbrot iterations per
second. The current model Macs are about 4 times faster. Intel
machines get nowhere close because they don't have enough
floating-point registers to keep the pipeline full.
- Each time I double the grid size it takes almost 8 times as long.
It takes 7 years for the industry to increase computing power of
personal computers by 8.
- Each time the gridsize is doubled the error goes down to about 0.4
of the previous value. It would take about 20 years for personal
computers to improve enough to give us one more digit in the area
estimate (given the same run time).
- My best area estimate is still the figure from last year:
1.5065923 +- 0.0000006.
and then I notice that the current version is much slower than the version I
was running when I posted the results a year ago. The number of iterations
is going up by about 7.5 each time, and last year's version was more like 4.5.
I guess almost right away that it's because of the new pipelined routines'
rather poor ability to detect limit cycles.
19971021.23xx. I do some runs with the old iterator routine just to verify this.
32 8192 1.511 458 647 0.037 221 916 13,648,582
0.292 565 786 0.014 640 563 275,459
0.383 5.0301 MFLOPs
64 16384 1.502 640 744 0.019 431 398 83,832,862
0.287 474 451 0.004 811 761 967,521
0.867 7.8146 MFLOPs
128 32768 1.504 914 742 0.005 692 132 584,564,367
0.286 495 853 0.002 352 688 4,188,562
2.97 9.8831 MFLOPs
256 65536 1.506 386 130 0.002 312 485 4,354,401,151
0.286 741 015 0.000 912 340 17,981,647
11.9 10.533 MFLOPs
512 131072 1.506 869 368 0.001 349 103 33,588,336,186
0.286 880 847 0.000 416 546 80,190,202
51.1 10.989 MFLOPs
1024 262144 1.506 781 916 0.000 492 708 263,748,416,082
0.286 801 539 0.000 173 164 362,847,780
224 11.347 MFLOPs
2048 524288 1.506 673 891 0.000 151 484 2,090,234,952,071
0.286 811 014 0.000 074 787 1,678,084,515
1,010 11.608 MFLOPs
4096 1048576 1.506 584 024 0.000 074 387 16,642,875,255,072
0.286 765 808 0.000 024 839 7,780,618,202
4,600 11.846 MFLOPs
and I quickly decide that I should try to figure out a way I can write some
code to *guess* whether each point will be in the set or not (e.g., based on
the result of the most recently completed point). It can use the double pipeline
for points guessed outside the set and the single pipeline for points
guessed in the set.
19971022.01xx. Optimize the single iterator routine a bit (the rad2 test
in particular could use some work). This improves the times substantially.
32 8192 1.511 458 647 0.037 221 916 13,664,870
0.292 565 786 0.014 640 563 291,747
0.183 11.139 MFLOPs
64 16384 1.502 640 744 0.019 431 398 83,882,823
0.287 474 451 0.004 811 761 1,017,482
0.600 11.871 MFLOPs
128 32768 1.504 914 742 0.005 692 132 584,737,298
0.286 495 853 0.002 352 688 4,361,493
2.23 13.670 MFLOPs
256 65536 1.506 386 130 0.002 312 485 4,355,042,622
0.286 741 015 0.000 912 340 18,623,118
9.37 13.918 MFLOPs
512 131072 1.506 869 368 0.001 349 103 33,590,805,167
0.286 880 847 0.000 416 546 82,659,183
40.2 14.376 MFLOPs
1024 262144 1.506 781 916 0.000 492 708 263,758,102,020
0.286 801 539 0.000 173 164 372,533,718
176 14.839 MFLOPs
2048 524288 1.506 673 891 0.000 151 484 2,090,273,319,600
0.286 811 014 0.000 074 787 1,716,452,044
788 15.246 MFLOPs
I try making it call the dual pipeline whenever the dwell was greater than
50 but not in the set; the results are encouraging at first and then
much, much worse at higher grid sizes:
256 65536 22,167,716 9.2 sec 16.867 MFLOPs
512 131072 109,955,129 38.9 sec 19.769 MFLOPs
1024 262144 623,655,741 176 24.776
2048 524288 4,006,852,998 888 31.588
4096 1048576 27,710,036,024 4,860 39.904
8192 2097152 177,665,018,000 27,300 45.543
so I go back to the all-dwell-1 version:
4096 1048576 1.506 584 024 0.000 074 387 16,643,027,976,653
0.286 765 808 0.000 024 839 7,933,339,783
3,580 15.502 MFLOPs
8192 2097152 1.506 593 644 0.000 027 695 132,833,077,382,170
0.286 769 509 0.000 009 733 36,987,339,092
16,300 15.852 MFLOPs
16384 4194304 1.506 588 076 0.000 012 669 1,061,413,532,170,594
0.286 767 267 0.000 003 204 171,493,547,507
74,400 16.138 MFLOPs
32768 8388608 1.506 591 501 0.000 004 512 8,486,342,263,388,365
0.286 768 373 0.000 001 774 814,136,316,927
347,000 16.442 MFLOPs
65536 16777216 1.506 592 311 0.000 001 904 67,870,833,876,143,188
0.286 768 423 0.000 000 510 3,844,750,451,387
1,630,000 16.529 MFLOPs
19980218 compose a new article for sci.fractals:
Mandelbrot Set center-of-gravity is -0.28676842 +- 0.00000017
Recently I restarted the computing effort I undertook
about a year ago at this time to compute the area of the Mandelbrot
Set. After reviewing the notes from that time I saw that there had
been some interest in the past of computing the Mandelbrot Set's
"center of gravity" as well. I have also done some optimizations of
the code in the mean time and fixed a few bugs, and have started
doing a long run.
The method I use involves counting pixels on a grid. There is error
in the estimate resulting from grid placement and the fact that the
grid squares on the boundary get counted as completely in or
completely out when in fact only part of their area is in. It is hard
to estimate the magnitude of this error, so I use statistical methods
to measure it. I use a bunch of grids of slightly different grid
spacings with slight vertical and horizontal offsets, and take the
mean and standard deviation of all the runs.
These are the figures I've compiled so far. Each run is the average
of 20 grids for which the standard deviations are shown.
grid    iter     area and          standard      iterations
size    limit    center-of-        deviations    time (seconds)
                 gravity           (20 runs)     performance (MFLOPs)
256 65536 1.506 38 0.002 31 18,623,118
0.286 741 0.000 912 9.4 13.918
512 131072 1.506 87 0.001 35 82,659,183
0.286 881 0.000 417 40.2 14.376
1024 262144 1.506 782 0.000 493 372,533,718
0.286 802 0.000 173 176 14.839
2048 524288 1.506 674 0.000 151 1,716,452,044
0.286 811 0 0.000 074 8 788 15.246
4096 1048576 1.506 584 0 0.000 074 4 7,933,339,783
0.286 765 8 0.000 024 8 3,580 15.502
8192 2097152 1.506 593 6 0.000 027 7 36,987,339,092
0.286 769 51 0.000 009 73 16,300 15.852
16384 4194304 1.506 588 1 0.000 012 7 171,493,547,507
0.286 767 27 0.000 003 20 74,400 16.138
32768 8388608 1.506 591 50 0.000 004 51 814,136,316,927
0.286 768 37 0.000 001 77 347,000 16.442
65536 16777216 1.506 592 31 0.000 001 90 3,844,750,451,387
0.286 768 423 0.000 000 510 1,630,000 16.529
For each average, the standard deviation gives the expected error
in each of the 20 individual grids. The expected error of the mean
is 1/sqrt(19) of the standard deviation (this is called "the standard
error of the mean"). I use 1/3 as a conservative estimate. So, my
best estimate of the center-of-gravity is -0.28676842 +- 0.00000017.
A few other observations:
- My program currently runs at about 2.35 million Mandelbrot iterations
per second. The latest model Macs are about 4 times faster. Intel
machines get nowhere close because they don't have enough
floating-point registers to keep the pipeline full.
- Each time I double the grid size it takes almost 4.7 times as long.
The current rate of progress in computer performance is
about 1.56 per year; at that rate it takes about 3.5 years
to increase computing power by 4.7.
- Each time the gridsize is doubled the error goes down to about 0.4
of the previous value (but it varies from 0.3 to 0.55). It would take
about 8 to 10 years for personal computers to improve enough to give
us one more digit in the area estimate (given the same run time).
- My new program reproduces my best area estimate from late 1996:
1.5065923 +- 0.0000006.
http://www.cecm.sfu.ca/projects/ISC/ISCmain.html gives the
following for 0.286... (and way too much output for 1.506...):
2867682063800133 = (0250) F(44/59;39/56;1)
2867682149714066 = (0250) F(19/21;11/13;1)
2867682521500307 = (0404) Psi(5/12)+Psi(3/4)-Psi(19/20)
2867682633829350 = (0001) (ln(3)-1/3)^Feig1
2867682783580957 = (0010) sum((11/3*n^3-20*n^2+157/3*n-25)/(n!+1),n=1..inf)
2867682869758240 = (0064) sum((7/3*n^3-13*n^2+83/3*n+3)/(Fibo(n)+1),n=1..inf)
2867682944514431 = (0326) 4^(3/4)*(12-12^(1/4))
2867682960509936 = (0001) Catalan+2/3+GAM(17/24)
2867682970307599 = (0192) (GAM(5/6)+1/3)/(ln(3)+4)
2867683491264107 = (0400) sum(1/(n!+7/6*n^3-5*n^2+131/6*n-14),n=1..inf)
2867683638369271 = (0395) sum(1/C(2*n,n)/(17/6*n^3-13*n^2+133/6*n-10),n=1..inf)
2867683747960658 = (0013) sum((8/3*n^3-27/2*n^2+167/6*n-3)/n^(n-1),n=1..inf)
2867683843843252 = (0259) F(3/11,9/10;4/9,4/5,1/2;1)
Artin = 0.3739558136192022
gamma = 0.57721566490153286060651209008240
TwinPrim = 0.6601618158468695
Catalan = 0.9159655941772190150546035149324
Feig2 = 2.502907875095892822283902873218215786381271376727149977336192056
Feig1 = 4.6692016091029906718532038
Gibbs + BesI(1,1) = 2.417096155974951
5606298864626305 = (0001) Gibbs+3/2*Feig2
5606312839756551 = (0001) Artin+Gibbs^Khint
5606382942821305 = (0001) Gibbs-ln(3)^E
8606371538195862 = (0405) Cahen/Parking
5606401453580847 = (0192) (sin(1)+3)/(Gibbs+5)
19980302 get the new result:
131072 33554432 1.506 591 734 0.000 000 624 18,173,551,931,685
0.286 768 317 0.000 000 295 7,590,000 16.754
19980304 It seems clear that the Mandelbrot Set center of gravity is
(ln(3)-1/3)^Feig1, which comes out to about 0.2867682633829350268529586
{Another big hiatus here 19990520} 
19990518.1xxx. Port the program to gcc under Linux, and call it
luams3.c. It seems to produce fairly similar results, and runs at
about 32 MFLOPs on a 233MHz Pentium MMX.
19990518.1820. I've gotten it up to 38 MFLOPs by changing the
optimization flags and rearranging the inner loop a bit. For some
reason, optimization makes it print different values of g_act_its. I
need to figure out why in case this represents a bug.
Gridsize 32, itmax 8192
Area 1.5114586 +- 0.0372219 Its: 13664870
Center of gravity -0.29256579 +- 0.01464056, Actual: 291766
Gridsize 64, itmax 16384
Area 1.5026407 +- 0.0194314 Its: 83882823
Center of gravity -0.28747445 +- 0.00481176, Actual: 1011791
Gridsize 128, itmax 32768
Area 1.5049147 +- 0.0056921 Its: 584735910
Center of gravity -0.28649585 +- 0.00235269, Actual: 4367224
Gridsize 256, itmax 65536
Area 1.5063889 +- 0.0022895 Its: 4355035389
Center of gravity -0.28675522 +- 0.00091069, Actual: 18674354
Gridsize 512, itmax 131072
Area 1.5068630 +- 0.0013543 Its: 33590760535
Center of gravity -0.28687915 +- 0.00041832, Actual: 82279430
Gridsize 1024, itmax 262144
Area 1.5067819 +- 0.0004926 Its: 263758161877
Center of gravity -0.28680108 +- 0.00017393, Actual: 372254099
19990519.1xxx. Add im_sqrt parameters so the inner loop doesn't have
to test for iterations overflow. This doesn't seem to speed it up as
much as I was hoping it would; I would guess that this results from
speculative execution of the floating-point operations.
19990519.2356. Today I make it catch the SIGTERM signal so it will
automatically stop when the system is rebooted normally, make .bashrc
launch luams at startup if it's not already running, and write a
script to find and kill the luams process to use when I need to.
I also restore the l_ticks and g_ticks variables and related
benchmarking calculations, which I removed during yesterday's port
because I didn't yet know how to measure user CPU time under Linux.
All of these changes bring LUAMS back up to the level of usability it
had in the Power Macintosh incarnation, and now it's on a more stable
OS and significantly faster hardware.
19990520.xxxx
Add iter_type typedef to make it easy to change the floating-point type
for benchmarking purposes. float isn't any faster than double on a
Celeron.
Also, I discover that the 333MHz Celeron is in fact a lot faster than
the 233MHz Pentium MMX at work, despite the Linux "BogoMIPS" values.
19990603.2430
Change prototype of main() so gcc doesn't complain.
19990628.2422
Check current state of luams3 output (it has been running as a
background task for about 5 weeks now) and find that it has finished
the 131072 gridsize and one iteration (out of 20) of the next
gridsize:
Gridsize 131072, itmax 67108864
Area 1.506591696 +- 0.000000627 Its: 1085772250631444522
Center of gravity -0.2867683146 +- 0.0000002961 Actual: 22749243215253
Total computation time: 2100630400 msec, 75808 KFLOPs
It is notable that the area estimate is a little lower (by 0.000 000 030),
which I attribute to the higher itmax.
19991021
It has finished its 262144 grid:
Gridsize 262144 (34359738368 pixels) ....................
Gridsize 262144, itmax 134217728
Area 1.506591772 +- 0.000000242 Its: 8685544402515235059
Center of gravity -0.2867684377 +- 0.0000000748 Actual: 107475449884576
Total computation time: 1303136528 msec, 76045 KFLOPs
The total computation time is wrong (because of overflow). The actual
time was about 9.9 million seconds or about 115 days; in real time the
program was running continuously since 19990519.
From this I derive my estimates:
area = 1.50659177 +- 0.00000008
cg = -0.28676844 +- 0.000000025
After a few weeks I add these results to my web pages but I don't post
them to a newsgroup until 2000.
{Another big hiatus here} 
20030925
Make luams3 a standard background task on S3. Replace
WRITE_INTERVAL with a variable, to make it cache a little more on each
of the already-finished cases each time it restarts. Increase max
gridsize to 1048576 so it can be left running for a very long time
without running out of things to do.
Port it to MacOS X 10.2.x (mainly involves using getrusage() instead
of clock(), and changes to how time is counted and printed out).
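The getrusage() replacement for clock() mentioned above amounts to
reading the process's user CPU time from the rusage struct; a minimal
sketch (user_cpu_msec is an illustrative name, not from luams3.c):

```c
#include <sys/resource.h>

/* User CPU time consumed by this process, in milliseconds, via
   getrusage().  Unlike clock(), this doesn't wrap after ~36 minutes
   on systems where clock_t is 32 bits with CLOCKS_PER_SEC=1000000. */
long user_cpu_msec(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;                       /* should not happen */
    return (long) ru.ru_utime.tv_sec * 1000
         + ru.ru_utime.tv_usec / 1000;
}
```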
2003121x
Divide std deviation by 3 when printing to more accurately reflect
the error in the area estimate.
20040903 gridsize 524288 has finished, but the work was wasted because
g_inset overflowed. Change it to 64-bit.
20040906 Fix overflow bug in g_inset that caused loss of entire
524288-size grid results.
{Another big hiatus here} 
20101017 Get it running again (mainly just updating the makefile after
an upgrade to MacOS 10.6 caused me to audit all of my makefiles)
20101019 After about 1.5 days as a single thread, it has completed
all gridsizes up to 131072.
Add COMPILE_DATE, usage() and command-line argument parsing. Add
slice option to permit use of multiple copies of #luams# for
concurrency (which can be run on separate machines and then merged by
copying the cache files to one location, or can all be run on the same
machine). Compute KFLOPs with floating-point math.
Add check_sizes() and update the definitions of signed16, etc. for
64-bit compilers. (The existing cache files have to be trashed because
some of the variables were the wrong size. They are now archived as
"cache20101019.1400.zip")
I do some sanity testing with gridsize 4096, then start a set of 4
slices, now starting over from scratch.
20101023:
* Change R_LO, R_HI, and I_HI; add standard deviation to output
* Calculate standard error based on actual number of samples being
averaged (rather than hard-coded to 3.0 as before), and multiply by
1.96 to get the 95% confidence interval; these combine to make the new
std error estimate more conservative than before (because 1.96/sqrt(20)
is greater than 1/3.0).
* Std dev and std err are both now returned explicitly from
m_area_stat, and both are printed
* Do not print center of gravity average if PERFECT_CARDIOID is defined
* Keep track of total number of pixels tested (g_tested variable),
which is now also saved in the cache files.
This version is in "oldversions/luams20101023.txt"
20101024 I have done several important changes over the past 24 hours,
and now have a sync issue: the cache files from the 20101019 and
earlier versions will not work with the new changes, but I want to
"salvage" as much of that cached work as I can to get some results for
future comparison. So I am going to redo the recent changes in an
order that allows me to keep the old cache structure for as long as
possible.
First few changes:
* Add SKEW_INC_MAX define, use it in the one place where STAT_NSAMP
was being used to compute skew_inc, so that I can change the number of
samples without affecting the skews (and therefore the cache file
names)
* Do not print center of gravity average if PERFECT_CARDIOID is defined
* Display PERFECT_CARDIOID and USE_FILES options after COMPILE_DATE
* Calculate standard error based on actual number of samples being
averaged (rather than hardcoded to 3.0 as before), and multiply by
1.96 to get the 95% confidence interval; these combine to make the new
std error estimate more conservative than before (because
1.96/sqrt(20) is greater than 1/3.0).
* Std dev and std err are both now returned explicitly from
m_area_stat, and both are printed
This version of the code is in "oldversions/luams20101024.txt", and
the cache files are saved as "cache20101024.1930.zip".
I continue to make changes, now diverging from the old cache file format:
* Store R_LO, R_HI and I_HI in each cache file and abort with an error
if an existing cache file has different values
* Change R_LO, R_HI, and I_HI (as was done yesterday); the ranges are now
(-2.02 .. 0.49) and (0.00 .. 1.15)
* Disable PERFECT_CARDIOID (probably permanently)
* Add gmin and gmax options
* Add g_row_mass, used to accumulate the mass total one row at a
time, to minimize the cumulative effect of roundoff error that would
otherwise influence the precision of the answer at the larger
gridsizes.
* DEFAULT_GS_MIN is now 25; gridsizes are now centered on the nominal
size (for example, for gridsize 128 it used to go from 128 to 147, but
now it will go from 118 to 137). Add save_or_reload().
20101028 Finish gridsize groups from 25 up to 204800 on MP16. Stats
for the last few gridwidths are:
Gridwidth 51200:
Area 1.506592211(521), CoG -0.286768176(129); 8223 sec.
Gridwidth 102400:
Area 1.506591964(223), CoG -0.2867683969(465); 8912 sec.
Gridwidth 204800:
Area 1.5065919790(936), CoG -0.2867683744(266); 8912 sec.
20111228 In response to email from Warren Smith, I decide to continue
with the next gridwidth 409600 (using first 1 task, then 4) on MP16.
This should take about 7 or 8 days to complete (each task is taking
just a little over 30 hours to complete a grid, and each task is doing
5 grids).
20120105 Add a few formatting routines (f64_expon, snfu_tag, and
snformat_unc), not yet complete.
20120107 At the end of the gridwidth 409600 run started on 12-28, I
have these results:
Gridwidths 409590-409609 at Nmax 209715200....................
Gridsize 409600, itmax 209715200
Area 1.506591856 +- 2.54e-08 (n=20, sigma 5.80e-08)
Center of gravity -0.2867684229 +- 1.11e-08 (n=20, sigma 2.54e-08)
Its: 1618709677335174482; Actual: 140360269258091
pixels in first sample: 20059349338 of 76864256651
Total computation time: 691255 sec, 1.42 GFLOPs
This result shows a bug: the "Its:" number (effective number of
iterations) overflowed its range (which was a 64-bit integer). This
needs to be changed to 128-bit or double.
In addition, in order for the numbers to be right when used with
-slice, I need to make each cache file store only its own g_agg_its
and g_act_its values, not the running totals for the entire set of
grids.
20120208 Report error on fopen(filename, "w"). Eliminate use of ugly "goto"s.
20120209 Replace g_agg_its and g_act_its with g_this_its, g_effc_its,
g_ttl_its and g_t_efc_its, all of which are now 128-bit integers. This
makes the cache files incompatible with anything I did prior to this
point.
I also work on snformat_unc; get it mostly working.
Change use of l_ticks and g_ticks; add f_ticks (this parallels the
changes made to the g_xxx_its variables so that the benchmark stats
work regardless of the slice option).
20120210 Add s128_str
20120222 Report ns_done rather than STAT_NSAMP for (n=XX) value (only
affects -slice mode)
20120224 Add g_dots and display current imaginary coordinate as a sign
of progress.
FUTURE CHANGES:
 Area of cubic Mandelbrot set (Z^3+C) and higher orders.
 Add an argument option to override STAT_NSAMP
 end of revision history
____________________________________________________________________________*/
/* #include "sysincludes.h" */
#ifdef MACOSX
# include
# include
#else
# include
# include
# include
#endif
#include
#include
#include
#include
#include
#include
// Parameters that control the amount and precision of area measurement.
// Because of the way the cache files are named, any of these parameters can
// be changed and the program can be recompiled without losing earlier work
// and without clearing the cache.
// Gridsizes to calculate
#define DEFAULT_GS_MIN 25
#define DEFAULT_GS_MAX 2000000
// Iterations limit multipliers to try.
// The 19980218 results used a multiplier of 256.
#define IM_MIN 512
#define IM_MAX 512
// Number of runs to average together to determine standard deviation.
// The standard error goes down by the square root of this value, and
// goes down just a little faster when increasing the gridsize.
#define STAT_NSAMP 20
#define SKEW_INC_MAX 20
/*
The real coordinates in the Mandelbrot set vary from -2.0 to
+0.471185... and the imaginary coordinates range up to +1.122757...
See:
mrob.com/pub/muency/easternmostpoint.html
mrob.com/pub/muency/northernmostpoint.html
We conservatively add about 0.02 in all directions.
*/
#define R_LO -2.02
#define R_HI 0.49
#define I_LO 0.00
#define I_HI 1.15
typedef unsigned short unsigned16;
typedef signed int signed32;
typedef unsigned int unsigned32;
typedef signed long signed64;
typedef unsigned long unsigned64;
/* For 128-bit I probably should be converting LUAMS to C++ so I can
use my own int128 library. However, this works in the short term.
According to some on the internet, something like "int128_t"
or "__int128_t" might be provided by compilers. */
typedef signed __attribute__((__mode__(__TI__))) signed128;
typedef unsigned __attribute__((__mode__(__TI__))) unsigned128;
typedef double iter_type;
typedef struct {
double ar_area;
double ar_gravity;
} area_result;
// dwell completion procedure pointer
typedef void(*dwell_cpp)(iter_type, long, long);
typedef void(*dwell_proc)(iter_type r, iter_type i, signed32 itmax,
signed32 im_sqrt, dwell_cpp cp);
signed64 rpm_clock(void);
void dcp1(iter_type r, long dwell, long its);
void dwell1(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
dwell_cpp cp);
void dwell2(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
dwell_cpp cp);
area_result m_area(signed32 gridsize, signed32 itmax, signed32 im_sqrt,
signed32 skew_v);
area_result m_area_stat(signed32 gridsize, signed32 itmax,
signed32 nsamp, area_result *std_dev, area_result *std_err, int * ns_done);
void loop1(void);
int main(int, char **);
signed32 g_gs_min;
signed32 g_gs_max;
signed128 g_this_its;
signed128 g_effc_its;
signed128 g_ttl_its;
signed128 g_t_efc_its;
unsigned32 g_itmax;
unsigned64 g_inset;
unsigned64 g_tested;
unsigned64 g_1st_inset;
unsigned64 g_1st_tested;
double g_row_mass;
double g_mass;
// l_ticks counts the total number of ticks (microseconds on Linux) spent on
// the current grid. f_ticks is the number of ticks that have elapsed since
// the last time we wrote out to a cache file. Every time we write a file we
// transfer f_ticks into l_ticks and zero out f_ticks, so that f_ticks doesn't
// overflow.
signed64 g_ticks;
signed64 l_ticks;
signed64 f_ticks;
unsigned16 g_quit;
dwell_proc g_dp;
const double k256 = 256.0;
const double k96 = 96.0;
const double k32 = 32.0;
const double k16 = 16.0;
const double k7 = 7.0;
const iter_type k4 = 4.0;
const double k3 = 3.0;
const iter_type k2 = 2.0;
const double k1 = 1.0;
const double kpi = 3.1415926535897932;
void s128_str(char *s, signed128 x);
void s128_str(char *s, signed128 x)
{
int d, nd;
signed128 t, k10;
if (x == 0) {
*s = '0'; s++;
*s = 0; return;
}
if (x < 0) {
*s = '-'; s++;
x = -x;
}
k10 = 10;
// Extract each digit (in reverse order)
nd = 0;
while(x > 0) {
t = x / k10;
d = x - (t * k10);
s[nd++] = ('0' + d);
x = t;
}
s[nd] = 0;
/* Unreverse the output string */
int i; char tc;
for (i=0; i*2 < nd; i++) {
tc = s[nd-1-i]; s[nd-1-i] = s[i]; s[i] = tc;
}
}
// Define PERFECT_CARDIOID to add the exact value of the period-1 and -2
// components' areas rather than pixel-counting them. This reduces the
// computation time a little (less than you'd think!) and also reduces the
// standard error for small grid sizes, but not for big grid sizes.
//#define PERFECT_CARDIOID
#define USE_FILES // To save work in disk files, to recover from power failures
/* The write interval constants are in the same units as rpm_clock() */
#define INIT_WRITE_INTERVAL 500000LL
#define MAX_WRITE_INTERVAL 5000000LL
signed64 write_interval;
//#define DEBUG_WRITES
int g_slice; int g_of;
/* Measure elapsed runtime in microseconds */
signed64 rpm_clock(void)
{
signed64 rv;
struct rusage ru;
int err;
err = getrusage(RUSAGE_SELF, &ru);
if (err) {
printf("rpm_clock err %d\n", err);
exit(1);
}
rv = (signed64) ru.ru_utime.tv_sec;
rv = (1000000LL * rv) + (signed64) ru.ru_utime.tv_usec;
return(rv);
}
void check_sizes(void)
{
int s;
s = (int) sizeof(unsigned16);
if (s != 2) {
printf("sizeof(u16) == %d, expected size was 2\n", s);
exit(1);
}
s = (int) sizeof(unsigned32);
if (s != 4) {
printf("sizeof(u32) == %d, expected size was 4\n", s);
exit(1);
}
s = (int) sizeof(unsigned64);
if (s != 8) {
printf("sizeof(u64) == %d, expected size was 8\n", s);
exit(1);
}
s = (int) sizeof(unsigned128);
if (s != 16) {
printf("sizeof(u128) == %d, expected size was 16\n", s);
exit(1);
}
if (0) {
/* Test signed128 type. Note that printf has no format selector for
this type. */
signed128 x;
x = 1;
for(s=0; s<128; s++) {
/* Note that printf has no format selector for 128bit integers. */
printf("%d %10g\n", s, ((double) x));
x += x;
}
exit(0);
}
}
void fmt_1_999(char * s, double x);
void fmt_1_999(char * s, double x)
{
strcpy(s, "");
if (x >= 100.0) {
sprintf(s, "%3.0f", x);
if (strlen(s) > 3) {
strcpy(s, "999");
}
} else if (x >= 10.0) {
sprintf(s, "%3.1f", x);
if (strlen(s) > 4) {
strcpy(s, "99.9");
}
} else if (x >= 1.0) {
sprintf(s, "%3.2f", x);
if (strlen(s) > 4) {
strcpy(s, "9.99");
}
} else {
strcpy(s, "1.00");
}
}
void format_si(char * s, double x);
void format_si(char * s, double x)
{
if (x < 0.0) {
strcpy(s, "neg err");
} else if (x == 0) {
strcpy(s, " 0 ");
} else if (x >= 1.0e18) {
sprintf(s, "%2.1e", x);
} else if (x >= 1.0e15) {
fmt_1_999(s, x/1.0e15);
strcat(s, " P");
} else if (x >= 1.0e12) {
fmt_1_999(s, x/1.0e12);
strcat(s, " T");
} else if (x >= 1.0e9) {
fmt_1_999(s, x/1.0e9);
strcat(s, " G");
} else if (x >= 1.0e6) {
fmt_1_999(s, x/1.0e6);
strcat(s, " M");
} else if (x >= 1.0e3) {
fmt_1_999(s, x/1.0e3);
strcat(s, " K");
} else if (x >= 1.0) {
fmt_1_999(s, x);
strcat(s, " ");
} else {
sprintf(s, "%2.1e", x);
}
}
int f64_expon(double x, signed32 * expon);
int f64_expon(double x, signed32 * expon)
{
int rv;
if ((x == 0) || isnan(x) || isinf(x)) {
return -1;
}
signed32 e = 0;
x = fabs(x);
while(x < 1.0) {
e--;
x *= 10.0;
}
while(x >= 10.0) {
e++;
x /= 10.0;
}
if (expon) { *expon = e; }
return 0;
}
char snfu_scratch[32];
int snfu_tag(char * s, long n, char * fmt, double x,
char **lead, char **digits, char **trail);
/* Perform an snprintf with the given format string, and "tag" the result
with pointers to the "leader", digits, and "trailer". */
int snfu_tag(char * s, long n, char * fmt, double x,
char **lead, char **digits, char **trail)
{
int l1, l2;
if (n <= 0) {
return -1;
}
if ((lead == 0) || (digits == 0) || (trail == 0)) {
return -1;
}
*lead = 0;
*digits = 0;
*trail = 0;
/* %%% Here we should make sure fmt fits within the constraints we're
willing to deal with: contains exactly one '%', has an 'e', 'f' or 'g'
at the end, etc. */
/* Get the digits, and also find out how much snprintf wants */
l1 = snprintf(s, n, fmt, x);
if (l1+1 > n) {
return -1;
}
*lead = s;
/* Find lead digit */
char * ld = s;
int gg = 1;
while(gg) {
if ((*ld >= '1') && (*ld <= '9')) {
gg = 0;
} else if (*ld == 0) {
gg = 0;
} else {
ld++;
}
}
if (*ld == 0) {
return 0;
}
*digits = ld;
/* Skip digits */
gg = 1;
while((*ld == '.') || ((*ld >= '0') && (*ld <= '9'))) {
ld++; /* Skip a character */
}
*trail = ld;
return l1;
}
int snformat_unc(char * s, long n, double x, double unc);
int snformat_unc(char * s, long n, double x, double unc)
{
int l1, l2, i;
const char * spec1 = "(Uncertain)";
char ts[30];
char * lead; char * digits; char * trail;
if (fabs(unc) > fabs(x)) {
/* The uncertainty is bigger than the value */
if (strlen(spec1)+1 > (size_t) n) {
/* Buffer too small; report the length that would be needed */
return strlen(spec1);
}
strncpy(s, spec1, n);
return strlen(s);
}
/* Check for 0 */
if (x == 0.0) {
s[0] = '0'; s[1] = 0;
return 1;
}
/* Deal with negatives */
if (x < 0) {
*s++ = '-';
n--;
x = -x;
}
if (unc < 0) {
unc = -unc;
}
/* Get the digits, and also find out how much snprintf wrote */
l1 = snfu_tag(ts, sizeof(ts), "%.17g", x, &lead, &digits, &trail);
if (digits == 0) {
s[0] = 0;
return 0;
}
char t2[30];
int gotdot = 0;
for(i=0; (i+1 < sizeof(t2)) && (digits + i < trail); i++) {
t2[i] = digits[i];
if (t2[i] == '.') {
gotdot = 1;
}
}
if (gotdot == 0) {
t2[i++] = '.';
}
while (i+1 < sizeof(t2)) {
t2[i++] = '0';
}
t2[i] = 0;
/* We want to add three uncertainty digits inside '()' plus the
trailing null */
l2 = l1+5;
if (l2+1 > n) {
s[0] = 0;
return 0;
}
/* Determine how many lead digits are in the clear */
int ex = 0; int eu = 0;
f64_expon(x, &ex); f64_expon(unc, &eu);
int need = ex - eu;
/* Put at most 15 digits in the lead */
if (need > 15) {
need = 15;
}
/* Find lead digit */
char * ld = digits;
int len = digits - lead;
/* Copy the leader (sign, spaces) */
for (i=0; i<len; i++) {
*s++ = lead[i];
}
/* Copy value digits until we reach the uncertainty's position */
i = 0;
while (need > 0) {
/* If this character is a digit, count it */
if ((t2[i] >= '0') && (t2[i] <= '9')) {
need--;
}
*s++ = t2[i];
i++; /* Next digit */
}
/* Then 3 more value digits */
need = 3;
while (need > 0) {
/* If this character is a digit, count it */
if ((t2[i] >= '0') && (t2[i] <= '9')) {
need--;
}
*s++ = t2[i];
i++; /* Next digit */
}
*s++ = '(';
/* Then 3 digits of the uncertainty */
char us[30];
char * ulead; char * udig; char * utr;
/* Get the digits, and also find out how much snprintf wrote */
l1 = snfu_tag(us, sizeof(us), "%.3e", unc, &ulead, &udig, &utr);
need = 3;
int j = 0;
while ((need > 0) && (udig[j])) {
/* If this character is a digit, count it */
if ((udig[j] >= '0') && (udig[j] <= '9')) {
need--;
*s++ = udig[j];
}
j++; /* Next digit */
}
while (need > 0) {
*s++ = '0';
need--;
}
*s++ = ')';
/* Add trail */
j = 0;
while(trail[j]) {
*s++ = trail[j++];
}
/* Trailing null, and we are done. (Callers ignore the return value.) */
*s = 0;
return strlen(s);
}
void my_handler(int signal);
void my_handler(int s)
{
g_quit = 1;
printf("\n\nGot signal %d, exiting!", s);
}
void dcp1(iter_type r, long dwell, long act_its)
{
g_tested++;
if (dwell > g_itmax) {
g_inset++;
g_row_mass += r;
g_dp = &dwell1; // dwell1 or dwell3
} else if (dwell > 50) {
g_dp = &dwell1; // dwell1 or dwell3
} else {
g_dp = &dwell1; // dwell1 or dwell3
}
g_effc_its += dwell;
g_this_its += act_its;
}
// This is a relatively straightforward iterate routine, except that it
// detects loops.
void dwell1(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
dwell_cpp d1_cp)
{
iter_type cr, ci;
register iter_type zr, zi;
register iter_type sr, si;
register iter_type zr2, zi2, rad2;
signed32 rv_its;
signed32 act_its;
signed32 n1, n2;
int escape_flag = 0;
int cycle_flag = 0;
rv_its = 1;
act_its = 1;
zr = r;
zi = i;
cr = r;
ci = i / k2;
sr = zr;
si = zi;
for (n1=2; n1<=im_sqrt; n1++) {
sr = zr;
si = zi;
for (n2=0; n2<n1; n2++) {
zr2 = zr * zr;
zi2 = zi * zi;
rad2 = zr2 + zi2;
zi = k2 * zr * zi + ci;
zr = zr2 - zi2 + cr;
act_its++;
if (rad2 > k4) {
escape_flag = 1;
n1 = n2 = im_sqrt + 1; /* Make both loops exit */
} else if ((zr == sr) && (zi == si)) {
rv_its = itmax+1;
cycle_flag = 1;
n1 = n2 = im_sqrt + 1; /* Make both loops exit */
}
}
}
if ((cycle_flag == 0) && (escape_flag == 0)) {
/* We did not cycle or escape, and we ran out of iterations.
Return act_its+1 just to make sure it meets the criteria. */
act_its++;
}
if (cycle_flag == 0) {
/* We did not cycle, but might have escaped. Here I want to return the
actual iteration count. */
rv_its = act_its;
}
/* Now call the completion routine, to update the statistics for this
grid (total number of pixels, area total, etc.) */
(*d1_cp)(r, rv_its, act_its);
}
// dwell2 checks to see if the point is in R2a or R2.1/2a, and
// calls (*g_dp) if it isn't. (*g_dp) should point to dwell1 or
// another dwell routine.
void dwell2(iter_type r, iter_type i, signed32 itmax, signed32 im_sqrt,
dwell_cpp cp)
{
iter_type i2, s, s2;
signed32 dwell, act_its;
// Test for the main cardioid (R2a)
i2 = i * i;
s = r * r + i2;
s2 = (k256*s  k96)*s + k32*r;
if (s2 < k3) {
/* We do not need to call the dwell routine */
} else {
// Test for the period-2 disk (R2.1/2a)
s2 = k16*s + k32*r + k16;
if (s2 < k1) {
/* We do not need to call the dwell routine */
} else {
/* Yes we do need to call the dwell routine */
(*g_dp)(r, i, itmax, im_sqrt, cp); // dwell1 or dwell3
/* dwell routine calls *cp, so we don't need to do that. */
return;
}
}
/* If we get here, it means the point is either in the cardioid or the
period-2 component (largest mu-atom). */
act_its = 1;
// If we're using the PERFECT_CARDIOID method, we report this
// point as being *outside* the set because the exact areas of
// R2a and R2.1/2a are added by m_area
#ifdef PERFECT_CARDIOID
dwell = 1;
#else
dwell = itmax+1;
#endif
(*cp)(r, dwell, act_its); // dcp1
return;
}
void save_or_reload (int * loaded, signed64 * wrote_time, signed64 * now_time,
iter_type * i, int * firstrow, char * filename);
void save_or_reload (int * loaded, signed64 * wrote_time, signed64 * now_time,
iter_type * i, int * firstrow, char * filename)
{
FILE *f;
if (*loaded) {
// Time to write?
*now_time = rpm_clock();
*now_time = *now_time - *wrote_time;
#ifdef USE_FILES
if (*now_time > write_interval) {
// Only write if we're still running "normally". This is
// mainly so I don't have to worry too much about making
// my loops exit gracefully.
if (g_quit == 0) {
iter_type tfloat;
f_ticks += rpm_clock(); // stop timer to write value to file
// printf("about to write; f_ticks == %ld\n", f_ticks);
l_ticks += f_ticks;
// printf("l_ticks now %ld\n", l_ticks);
// write
f = fopen(filename, "w");
if (f == 0) {
fprintf(stderr, "Could not open '%s' for writing.\n", filename);
exit(1);
}
fwrite(i, sizeof(*i), 1, f);
fwrite(&g_inset, sizeof(g_inset), 1, f);
fwrite(&g_tested, sizeof(g_tested), 1, f);
fwrite(&g_mass, sizeof(g_mass), 1, f);
fwrite(firstrow, sizeof(*firstrow), 1, f);
fwrite(&g_this_its, sizeof(g_this_its), 1, f);
fwrite(&g_effc_its, sizeof(g_effc_its), 1, f);
fwrite(&l_ticks, sizeof(l_ticks), 1, f);
tfloat = R_LO; fwrite(&tfloat, sizeof(tfloat), 1, f);
tfloat = R_HI; fwrite(&tfloat, sizeof(tfloat), 1, f);
tfloat = I_HI; fwrite(&tfloat, sizeof(tfloat), 1, f);
fclose(f);
*wrote_time = rpm_clock();
#ifdef DEBUG_WRITES
printf("Wrote i = %10.7f\n", i);
#endif
// printf("restart f_ticks at 0\n");
f_ticks = 0; // zero timer
f_ticks -= rpm_clock(); // start timer
}
if (write_interval < MAX_WRITE_INTERVAL) {
write_interval += INIT_WRITE_INTERVAL;
}
}
#endif // USE_FILES
} else {
/* Not yet loaded cache file */
#ifdef USE_FILES
f = fopen(filename, "r");
if (f) {
iter_type t_rlo, t_rhi, t_ihi;
// Reload from file
fread(i, sizeof(*i), 1, f);
fread(&g_inset, sizeof(g_inset), 1, f);
fread(&g_tested, sizeof(g_tested), 1, f);
fread(&g_mass, sizeof(g_mass), 1, f);
fread(firstrow, sizeof(*firstrow), 1, f);
fread(&g_this_its, sizeof(g_this_its), 1, f);
fread(&g_effc_its, sizeof(g_effc_its), 1, f);
fread(&l_ticks, sizeof(l_ticks), 1, f);
// printf("read l_ticks == %ld\n", l_ticks);
fread(&t_rlo, sizeof(t_rlo), 1, f);
fread(&t_rhi, sizeof(t_rhi), 1, f);
fread(&t_ihi, sizeof(t_ihi), 1, f);
fclose(f);
/* The following checks also help ensure endianness and alignment of the
preceding data fields */
if (t_rlo != R_LO) {
printf("\n");
printf("Cannot use cache file '%s'\n", filename);
printf(" because its R_LO value is %10g and we expect %10g\n",
t_rlo, R_LO);
exit(1);
}
if (t_rhi != R_HI) {
printf("\n");
printf("Cannot use cache file '%s'\n", filename);
printf(" because its R_HI value is %10g and we expect %10g\n",
t_rhi, R_HI);
exit(1);
}
if (t_ihi != I_HI) {
printf("\n");
printf("Cannot use cache file '%s'\n", filename);
printf(" because its I_HI value is %10g and we expect %10g\n",
t_ihi, I_HI);
exit(1);
}
#ifdef DEBUG_WRITES
printf("Continuing from i = %10.7f\n", *i);
#endif
f_ticks = 0; // Since any calculation up to this point has been replaced
// with the cache values, we need to restart the timer at 0
f_ticks -= rpm_clock();
} else {
// No cache file. Schedule a cache write 1 second from now.
*wrote_time = rpm_clock() - write_interval + 1000000L;
}
#endif // USE_FILES
*loaded = 1;
}
}
int g_num_dots;
char g_dots[STAT_NSAMP+1];
area_result m_area(signed32 gridsize, signed32 itmax, signed32 im_sqrt,
signed32 skew_v)
{
area_result rv;
iter_type r, i2;
iter_type ri_inc;
iter_type row0_adjust;
double pix_area;
double result;
double skew_d;
iter_type i;
int firstrow;
char filename[200];
signed64 wrote_time, now_time;
int loaded;
wrote_time = rpm_clock();
loaded = 0;
sprintf(filename, "cache/%ld,%ld@%ld",
(long) gridsize, (long) skew_v, (long) itmax);
// printf("filename: %s", filename);
ri_inc = (R_HI - R_LO) / gridsize;
skew_d = skew_v;
skew_d /= 65536.0;
skew_d *= ri_inc;
pix_area = ri_inc*ri_inc;
g_inset = 0;
g_tested = 0;
g_mass = 0;
g_itmax = itmax;
firstrow = 1;
for (i = I_LO + skew_d;
(i < I_HI) && (g_quit == 0);
i += ri_inc)
{
save_or_reload(&loaded, &wrote_time, &now_time, &i, &firstrow, filename);
i2 = i;
if (firstrow) {
i2 = (i+(ri_inc/2.0))/2.0;
}
#ifndef DEBUG_WRITES
if (g_quit == 0) {
printf("\r%s%9.6f ", g_dots, i2);
fflush(stdout);
}
#endif
g_row_mass = 0;
for (r = R_LO - skew_d;
(r < R_HI) && (g_quit == 0);
r += ri_inc)
{
dwell2(r, i2, itmax, im_sqrt, &dcp1);
}
if (firstrow) {
// printf("%qd>", g_inset);
i2 = g_inset;
row0_adjust = (i+(ri_inc/2.0))/ri_inc;
i2 *= row0_adjust;
g_inset = i2;
g_row_mass *= row0_adjust;
// printf("%qd ", g_inset);
firstrow = 0;
}
g_mass = g_mass + g_row_mass;
}
result = g_inset;
result *= pix_area;
result *= k2;
#ifdef PERFECT_CARDIOID
// Add the exact area of the cardioid and the period2 component.
// %%% this works for the area but not for g_mass, so we get the wrong answer
// for the center of gravity
result += (k7 * kpi)/k16;
#endif
rv.ar_area = result;
rv.ar_gravity = g_mass / g_inset;
return rv;
}
area_result m_area_stat(
signed32 gridsize, // Initial gridsize
signed32 itmax, // Itmax for all samples
signed32 nsamp, // Number of samles to take
area_result *std_dev, // Sample standard deviation results will go here
area_result *std_err, // Standard error results will go here
int * ns_done) // Number of samples taken will go here
{
double a_sum, g_sum;
double a_sum_sqr, g_sum_sqr;
signed32 i, skew_v, skew_inc;
double area, gravity;
double a_mean, g_mean;
area_result mar;
area_result rv;
signed32 its, n1, n2, im_sqrt;
signed32 gs_start, gs_end;
gs_start = gridsize - (nsamp/2);
gs_end = gs_start + nsamp;
printf("Gridwidths %ld%ld at Nmax %ld\n",
(long) gs_start, (long) (gs_end1), (long) itmax);
fflush(stdout);
g_num_dots = 0;
*ns_done = 0;
(*std_dev).ar_area = 0.0;
(*std_dev).ar_gravity = 0.0;
(*std_err).ar_area = 0.0;
(*std_err).ar_gravity = 0.0;
rv.ar_area = 0.0;
rv.ar_gravity = 0.0;
/* Compute im_sqrt. The algorithm is dumb but doesn't matter because we
only do it once per unique value of itmax */
its = 0;
im_sqrt = 0;
for (n1=2; n1<itmax; n1++) {
for (n2=0; n2<n1; n2++) {
if (its > itmax) {
im_sqrt = n1;
n1 = n2 = itmax; /* Make both loops end now */
}
its++;
}
}
if (im_sqrt == 0) {
fprintf(stderr, "Error: did not detect im_sqrt\n");
exit(1);
}
g_ttl_its = 0;
g_t_efc_its = 0;
a_sum = a_sum_sqr = 0;
g_sum = g_sum_sqr = 0;
skew_inc = 65536 / SKEW_INC_MAX; // was "nsamp"
skew_v = skew_inc / 2;
g_ticks = 0;
g_1st_inset = g_1st_tested = 0;
gridsize = gs_start;
g_dots[g_num_dots] = 0;
g_dp = &dwell1;
for (i=0; (i < nsamp) && (g_quit == 0); i++) {
/* In -slice mode, compute only the samples assigned to this slice */
if ((g_of <= 1) || ((i % g_of) == g_slice)) {
g_this_its = 0;
g_effc_its = 0;
l_ticks = 0;
f_ticks = -rpm_clock();
mar = m_area(gridsize, itmax, im_sqrt, skew_v);
f_ticks += rpm_clock();
l_ticks += f_ticks;
f_ticks = 0;
g_ticks += l_ticks;
g_ttl_its += g_this_its;
g_t_efc_its += g_effc_its;
if (*ns_done == 0) {
g_1st_inset = g_inset;
g_1st_tested = g_tested;
}
area = mar.ar_area;
gravity = mar.ar_gravity;
a_sum += area;
a_sum_sqr += area * area;
g_sum += gravity;
g_sum_sqr += gravity * gravity;
(*ns_done)++;
g_dots[g_num_dots++] = '.';
g_dots[g_num_dots] = 0;
}
gridsize++;
skew_v += skew_inc;
}
if (*ns_done > 1) {
double nn = (double) *ns_done;
a_mean = a_sum / nn;
g_mean = g_sum / nn;
/* Sample standard deviation */
(*std_dev).ar_area = sqrt((a_sum_sqr - nn*a_mean*a_mean) / (nn - 1.0));
(*std_dev).ar_gravity = sqrt((g_sum_sqr - nn*g_mean*g_mean) / (nn - 1.0));
/* Standard error from the actual number of samples averaged */
(*std_err).ar_area = (*std_dev).ar_area / sqrt(nn);
(*std_err).ar_gravity = (*std_dev).ar_gravity / sqrt(nn);
rv.ar_area = a_mean;
rv.ar_gravity = g_mean;
}
return rv;
}

void loop1(void)
{
signed32 gs, im, itmax;
area_result mar, std_dev, std_err;
int ns_done;
double perf;
for (gs = g_gs_min; (gs <= g_gs_max) && (g_quit == 0); gs *= 2) {
for (im = IM_MIN; (im <= IM_MAX) && (g_quit == 0); im *= 2) {
itmax = gs * im;
mar = m_area_stat(gs, itmax, STAT_NSAMP, &std_dev, &std_err, &ns_done);
if (g_quit == 0) {
if (g_of > 1) {
printf("RUNNING AS SLICE %d of %d SLICES\n", g_slice, g_of);
}
if (g_of > 4) {
/* More than 4 slices means less than 5 samples per slice */
printf(" Not enough samples for meaningful averages.\n");
} else if (g_of > 1) {
printf(
" (These stats reflect %d samples out of the full set of %d)\n",
ns_done, STAT_NSAMP);
printf("\n");
}
if (g_of <= 4) {
char s[30];
/* 4 or fewer slices means we have enough samples to calculate
the average and standard error, etc. */
printf(" Gridsize %ld, itmax %ld\n", (long) gs, (long) itmax);
snformat_unc(s, sizeof(s), mar.ar_area, 1.96 * std_err.ar_area);
printf(" Area %s (n=%d, sigma %7.2e)\n",
s, ns_done, std_dev.ar_area);
// printf(
// " Area %11.9f + %7.2e (n=%d, sigma %7.2e)\n",
// mar.ar_area, 1.96 * std_err.ar_area, ns_done, std_dev.ar_area);
#ifndef PERFECT_CARDIOID
snformat_unc(s, sizeof(s), mar.ar_gravity, 1.96*std_err.ar_gravity);
printf(" Center of gravity %s (n=%d, sigma %7.2e)\n",
s, ns_done, std_dev.ar_gravity);
// printf(
// " Center of gravity %13.10f + %7.2e (n=%d, sigma %7.2e)\n",
// mar.ar_gravity, 1.96 * std_err.ar_gravity, ns_done,
// std_dev.ar_gravity);
#endif
s128_str(s, g_t_efc_its);
printf(" Its: %20s", s);
s128_str(s, g_ttl_its);
printf("; Actual: %s\n", s);
// printf(" Its: %10.05g; Actual: %10.05g\n",
// ((double) g_t_efc_its), ((double) g_ttl_its));
printf(" pixels in first sample: %lu of %lu\n",
g_1st_inset, g_1st_tested);
perf = ((double) g_ticks)/1.0e6; /* Seconds */
printf(" Total computation time: %g sec", perf);
if (perf > 0.01) {
char s[20];
/* Roughly 7 floating-point operations per Mandelbrot iteration */
perf = ((double) g_ttl_its) * 7.0 / perf;
format_si(s, perf);
printf(", %sFLOPs", s);
} else {
/* Ran for less than 10 milliseconds */
printf(" (too short for KFLOPs measure)");
}
printf("\n");
}
}
}
printf("\n");
}
}
void usage(void);
void usage()
{
fprintf(stderr, "%s",
"NAME\n"
" luams  measure area of Mandelbrot Set\n"
"\n"
"SYNOPSIS\n"
" luams [slice N/M] [gmin A] [gmax B]\n"
"\n"
"EXAMPLES\n"
" luams slice 3/4 (slice number is 0based)\n"
"\n"
);
}
int main(int nargs, char **argv)
{
char ** av;
char *arg;
int args;
printf("LUAMS3, %s", COMPILE_DATE);
#ifdef PERFECT_CARDIOID
printf(", PERFECT_CARDIOID");
#endif
#ifdef USE_FILES
printf(", USE_FILES");
#endif
printf("\n\n");
check_sizes();
/* Set defaults */
write_interval = INIT_WRITE_INTERVAL;
g_slice = 0; g_of = 1;
g_gs_min = DEFAULT_GS_MIN;
g_gs_max = DEFAULT_GS_MAX;
if (nargs > 1) {
av = argv;
args = nargs;
av++; /* skip our program name/path */
args--;
while(args) {
arg = *av; av++; args--;
if ((strcmp(arg, "-help") == 0) || (strcmp(arg, "-h") == 0)
|| (strcmp(arg, "--help") == 0) || (strcmp(arg, "--h") == 0)) {
usage();
exit(0);
} else if (strcmp(arg, "gmin") == 0) {
char * s;
int na, g;
s = *av; av++; args;
na = sscanf(s, "%d", &g);
if (na != 1) {
fprintf(stderr, "gmin requires an argument\n");
exit(1);
}
if (g < 32) {
fprintf(stderr, "gmin argument must be at least 32.\n");
exit(1);
}
g_gs_min = g;
printf("Min gridsize %d\n", g_gs_min);
} else if (strcmp(arg, "gmax") == 0) {
char * s;
int na, g;
s = *av; av++; args;
na = sscanf(s, "%d", &g);
if (na != 1) {
fprintf(stderr, "gmax requires an argument\n");
exit(1);
}
if (g < 32) {
fprintf(stderr, "gmax argument must be at least 32.\n");
exit(1);
}
g_gs_max = g;
printf("Max gridsize %d\n", g_gs_max);
} else if (strcmp(arg, "slice") == 0) {
char * s;
int na, slice, of;
s = *av; av++; args;
na = sscanf(s, "%d/%d", &slice, &of);
if (na != 2) {
fprintf(stderr, "slice requires an argument like '3/7'\n");
exit(1);
}
if (of <= 0) {
fprintf(stderr, "second part of slice argument must be a positive integer.\n");
exit(1);
}
if ((slice < 0)  (slice >= of)) {
fprintf(stderr, "first part of slice argument must be in the range 0 .. %d inclusive\n", (int) (of1));
exit(1);
}
if (of > STAT_NSAMP) {
fprintf(stderr, "number of slices should not be more than %d\n", (int) STAT_NSAMP);
exit(1);
}
g_slice = slice; g_of = of;
printf("Will run as slice %d of %d\n", g_slice, g_of);
}
}
}
if (g_gs_max < g_gs_min) {
fprintf(stderr, "gmax must be at least as large as gmin.\n");
exit(1);
}
signal(SIGINT, &my_handler);
signal(SIGTERM, &my_handler);
loop1();
return 0;
}