The SPEC Benchmarks  

This page describes the SPEC benchmarks, giving their history, a description of how they are measured and how the scores are calculated, and useful formulas for converting between one SPEC benchmark and another.

Contents


History
Issues Preventing Comparison
The SPEC CPU Suites:
  SPEC CPU 1989
  SPEC CPU 1992
  SPEC CPU 1995
  SPEC CPU 2000
  SPEC CPU 2006
Conversions:
  Converting between Dhrystone/Whetstone and SPEC89
  Converting between SPEC89 and SPEC92
  Converting between SPEC92 and SPEC95
  Converting between SPEC95 and SPEC2000
  Converting between SPEC2000 and SPEC2006
  Relationship Between Speed and Rate Metrics
  Speed and Rate conversion for CPU92
  Speed and Rate conversion for CPU95
  Speed and Rate conversion for CPU2000
  Speed and Rate conversion for CPU2006
Data:
  Dhrystone and Whetstone data for some early systems
  Reference Machine Times for the 1989 SPEC Benchmarks
  Reference Machine Times for SPEC_CPU_92
  Reference Machine Times for SPEC_CPU_95
  Reference Machine Times for SPEC CPU2000
  Reference Machine Times for SPEC CPU2006
  Thermal data for SPEC CPU2006
Footnotes and References
Bibliography

History

SPEC, the Standard Performance Evaluation Corporation, is an organization dedicated to producing benchmarks that are reasonably scientific, unbiased, meaningful and relevant. The group was formed in September 1988 and released its first set of benchmarks, the SPEC Benchmark Suite for UNIX Systems version 1.0, in October 1989 27.

The suite consisted of 10 programs which could be run and measured to produce three scores: integer SPECmark, floating-point SPECmark, and overall SPECmark. For each program, the speed of running that program on a test machine was measured relative to the speed of the same program running on the "Reference Machine", a VAX 11/780. The integer SPECmark was the geometric mean of the speed ratios of the 4 programs used to measure integer performance, and the floating-point SPECmark was the geometric mean of the other 6 programs. The geometric mean of all 10 programs yielded the overall SPECmark score.

In December 1991, SPEC renamed the integer and floating-point SPECmarks to SPECint and SPECfp, respectively 18. The next month, they released SPECint92 and SPECfp92, and the 1989 versions became known as SPECint89 and SPECfp89 18. There was no overall "SPECmark92" combining the integer and floating-point performance into a single score.

The 1992 suite used a greater number of programs to evaluate performance, and introduced the SPECrate metric for multi-CPU machines. The SPECrate measurement involves running multiple copies of a benchmark program simultaneously, and the formula is a bit more complicated because it needs to include an extra variable for the number of copies that were running simultaneously.

In 1995, 2000 and 2006, SPEC released updated benchmark suites, each with a greater number of programs than its predecessor, and each using larger amounts of code and larger datasets. This brings the total to five 49 suites: 1989, 1992, 1995, 2000 and 2006. Each suite defines a standard reference machine and a set of programs to run, and formulas for computing the scores. All but the 1989 version also measure multi-CPU throughput.

The SPEC benchmarks are continually updated because of two problems that affect most (but not all) benchmark methods.

Issues Preventing Comparison

Readers looking for a thorough and scientific comparison of the various SPEC benchmark suites should look at the papers by Phansalkar, Hoste, et al. listed in the bibliography [61], [62], [63], [65].

The rest of this section (and indeed, pretty much this entire web page) concerns simplistic comparisons, such as a one-dimensional linear relation "A = K×B" where A and B are the scores given by two different benchmarks and K is an (approximate) constant of proportionality.


An Engineering--Marketing Synergy


Comparing SPECint vs. SPECrate_int, etc.

SPEC advises against comparing SPECrate_int and SPECrate_fp scores to SPECint and SPECfp, respectively. Prior to the 2006 suite, they made it a little difficult by using formulas that create quite different results, and by not describing the formulas explicitly. However, it is easy to determine what the formulas are, and if you know what you're comparing, a direct comparison can be quite meaningful. Such a comparison is meaningful when, for example, a computer user has to perform 12 independent runs of the same program on 12 different but equally demanding datasets, and has a choice between running 4 copies at a time (in a total of 3 batches) on a 4-CPU machine, or running them one at a time on a 1-CPU machine. In such scenarios the 12-run workload is described as being "easily" or "trivially" parallelized.

Comparing SPECint95 vs. SPECint2000, etc.

SPEC discourages its general audience from converting between one version of the SPEC metric and another. 50 Here I explain the reasons in greater detail.

Each new benchmark suite involves a greater number of programs, generally with a larger memory usage and greater running time (when compared to programs from the older suite running on the same test machine). Because of the phenomena described here, this means that the SPEC scores from different suites cannot be directly compared. For example, SPECint89 programs use less memory than SPECint95 programs. Therefore, a system with a large CPU data cache and relatively small amount of RAM will do relatively well on SPECint89 and relatively poorly on SPECint95, and a machine with a small CPU data cache and large amount of RAM will do comparatively worse on SPECint89 and comparatively better on SPECint95. However, most actual machines have a balanced amount of cache, memory, and other necessary system components. As a result, SPEC benchmarks of consecutive releases are actually quite closely correlated.

If you want a closer understanding of the issues involved, I would suggest starting with the 1999 paper by Gustafson and Todi [60]. In particular note the HINT graphs that are annotated showing where various benchmarks like Dhrystone, LINPACK, SPEC, etc. fall on the HINT graphs.

Each SPEC benchmark has multiple components that fall in different places on the HINT curve, and many have behavior that reflects multiple points or segments of the curve. For example, a test might spend half of its time doing intense calculations that stay in the level-2 cache all the time, and the other half of its time doing huge matrix transforms that take huge amounts of memory, perhaps even paging blocks of memory in and out from disk.

Over the course of many years, there are two phenomena that affect how the benchmark numbers change.

Since the overall benchmark is an average of several benchmark programs which are scattered across the HINT curve, some of these effects cancel out. But of course some variation remains. The best we can hope for when doing conversions like those described below is to get a rough idea of the order of magnitude of a comparison. For example, the formulas below indicate that a 2.8 GHz Intel Core 2 processor would get a SPECint95 score about 28 times as high as a Sun Ultra 5/10 300 MHz 47. Rather than using such numbers directly, it is best to say something like "A Core 2 system can probably handle Sun Ultra 5/10 -sized workloads at about 15 to 60 times the speed of the Sun Ultra 5/10". In this case we have added a factor-of-2 uncertainty (from 28/2=14 to 28×2=56, or roughly 15 to 60).

It is also important to realize that the Sun Ultra 5/10 probably cannot do much of anything that we now expect a Core 2 to do. In other words:

Even if the Core 2 is 30 times as fast as the Sun Ultra 5/10, the Sun Ultra 5/10 will be much more than 30 times slower than a Core 2.

If you can't comprehend that paradox, you probably shouldn't be going around casually converting benchmark results from one suite to another.



So you still want to proceed?



Having given considerable warning, I will now proceed to details:


The 1989 SPEC Benchmark Suite for UNIX Systems

This suite, later renamed "SPECint89" and "SPECfp89", consists of 10 programs: GCC, ESP, LI, EQNtott, SPICE2G6, DODUC, NASA7, MATRIX300, FPPPP and TOMCATV. Each is representative of a type of task that computers were being used for at the time the suite was developed. The reference machine and program run times are listed here.

SPECratios are obtained by dividing program run time into VAX 11/780 run time. Using GCC as an example, we have:

SPECratioGCC = 1481.5 / runtimeTM(GCC)

where

1481.5 = runtime of GCC on the reference machine
TM = test machine

Thus, SPECratios are higher for faster machines.

SPECint89 is the geometric mean of the SPECratios for the four SPECint89 programs, and SPECfp89 is defined similarly. A machine with a SPECint89 of 2.0 is about twice as fast (at integer calculations) as a VAX 11/780.
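The scoring machinery is simple enough to sketch in a few lines of Python. In this illustration, only GCC's 1481.5-second reference time comes from the suite; the 740.75-second test-machine runtime is a made-up value chosen to represent a machine exactly twice as fast as the VAX:

```python
from math import prod

def spec_ratio(ref_time, run_time):
    """SPECratio: reference-machine runtime divided by test-machine runtime."""
    return ref_time / run_time

def geometric_mean(values):
    """Geometric mean, as used for SPECint89/SPECfp89/SPECmark."""
    return prod(values) ** (1.0 / len(values))

# 1481.5 s is the documented VAX 11/780 runtime for GCC; 740.75 s is a
# hypothetical test-machine runtime (a machine twice as fast as the VAX).
print(spec_ratio(1481.5, 740.75))  # 2.0
```

The overall score is then just `geometric_mean` applied to the per-program ratios.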


SPEC CPU92

For SPEC CPU92, the reference machine is the same as in CPU89, a VAX 11/780. The VAX is given a SPECint92 and SPECfp92 score of 1.0. Each program selected for use in CPU92 is run on the VAX to determine its reference time, denoted below by Tref(program). The programs and reference times are listed here.

Single-CPU Benchmarks: SPECint92 and SPECfp92

Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run and its runtime measured. The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:

Ratio = RefTime / RunTime

For example, running 085.gcc on an IBM POWERstation M20/220 takes 413.0 seconds, and the reference machine takes 5460 seconds. So, the ratio for 085.gcc on the POWERstation M20/220 is: 34

ratio085.gcc = 5460 / runtimeSUT(085.gcc) = 5460 / 413.0 = 13.22

where

5460 = Tref(085.gcc), the runtime of 085.gcc on the reference machine (a VAX 11/780)
SUT = system under test
413.0 = runtime of 085.gcc on SUT

In the integer test suite there are 6 programs. SPECint92 is the geometric mean of the ratios for the 6 programs, and SPECfp92 is defined similarly. A machine with SPECint92 of 2.0 is about twice as fast (at integer calculations) as the VAX 11/780. Here are the runtimes and ratios for the six CINT92 programs on the POWERstation M20/220: 34

program         ref-time  runtime  ratio
--------------  --------  -------  -----
008.espresso        2270    115.1   19.7
022.li              6210    397.1   15.6
023.eqntott         1100     38.0   28.9
026.compress        2770    197.9   14.0
072.sc              4530    282.4   16.0
085.gcc             5460    413.0   13.2

To determine the SPECint92 score for the POWERstation, the geometric mean of the 6 ratios is computed:

mean = (19.7 × 15.6 × 28.9 × 14.0 × 16.0 × 13.2)^(1/6) = 17.24

This average 17.24 is the SPECint92 score.
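As a check on the arithmetic, the geometric mean can be recomputed directly from the six ratios in the table:

```python
from math import prod

# Ratios for the six CINT92 programs on the POWERstation M20/220 (from the table above)
ratios = [19.7, 15.6, 28.9, 14.0, 16.0, 13.2]
specint92 = prod(ratios) ** (1 / len(ratios))
print(round(specint92, 2))  # 17.24
```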

SPECfp92 scores are determined similarly from a set of 14 programs. Details of the 6 integer and 14 floating-point programs are given here.

Correlation studies between the various components of the SPEC92 suite were performed by Giladi and Ahituv [57].

SPECrate92

The formula for calculating the rate metrics is more complex than it was in CPU89, largely because of misunderstanding and misuse of the earlier metric. This quotation describes the problem: 13

Unfortunately it was easy to make invalid comparisons between SPECmark89s and SPECthruput89s or even mistake values between these metrics. It is not fair to compare the speed of a uni-processor machine against the throughput of a multi-processor. However, many believed that it would be acceptable to compare SPECthruput89s against SPECmark89s, because the SPECthruput89 looked like a SPECmark89 both in terms of the results and the means to calculate those results.

A SPECrate92 for a given machine is calculated as follows:

SPECrate = geom mean [ SPECrate(program) ]

SPECrate(program) = N × [ Tref(program) / Tref(056.ear) ] × [ 604800 / TSUT(program) ]

where

N = number of copies run concurrently (this can be different for each program, chosen by the tester to maximize the SPECrate — for example see the results for the HP Apollo 9000/755. 35)
Tref(program) = time to run program on the reference machine, a VAX 11/780
Tref(056.ear) = time to run 056.ear on the reference machine = 25500 seconds. 12
604800 = number of seconds in a week
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test

For example, running 2 copies of 008.espresso on an HP Apollo 9000/755 took 48 seconds. 35 Applying the formula:

SPECrate(008.espresso) = N × [ Tref(008.espresso) / Tref(056.ear) ] × [ 604800 / TSUT(008.espresso) ]
= 2 × [ 2270 / 25500 ] × [ 604800 / 48 ]
= 2243.29

Thus the SPECrate for 008.espresso on the HP Apollo 9000/755 is 2243, as you can see in that machine's report. 35

It is useful to note that for this machine, different programs were run with different values of N. This is allowed by the SPEC CPU92 run rules and provides for the possibility that a tester might recognize that certain tasks are more suited to greater parallelism than others. In the specific case of the HP Apollo 9000/755, which is a single-processor machine, some programs were run with N=1 and others with N=2 or N=3. (This can produce greater throughput because of multiple-resource contention. For example, if the test program uses a lot of CPU and also uses a lot of disk I/O, one copy can be using the CPU while the other is waiting for the disk drive to seek to a requested data block. In this situation, it is still possible for two tasks to be waiting for the same resource at the same time, so adding a third task can increase the odds that both hardware resources always have at least one waiting client.)

The overall SPECint_rate92 for the machine is the geometric mean of the SPECrates for each of the integer programs:

SPECrate_int92 = (2243 × 2055 × 2115 × 1564 × 1852 × 1742)^(1/6)
= 1914.17

rounded off to 1914 as you can see in the report. 35
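A sketch of the SPECrate92 arithmetic in Python, using the documented 056.ear reference time and the 008.espresso example above:

```python
from math import prod

T_REF_EAR = 25500  # Tref(056.ear), the longest reference runtime in CPU92 (seconds)
WEEK = 604800      # seconds in a week

def specrate92(n_copies, t_ref, t_sut):
    """Per-program SPECrate92: jobs per week, scaled by the program's reference factor."""
    return n_copies * (t_ref / T_REF_EAR) * (WEEK / t_sut)

# 008.espresso on the HP Apollo 9000/755: 2 copies, Tref = 2270 s, 48 s to finish both
print(round(specrate92(2, 2270, 48), 2))  # 2243.29

# Overall SPECrate_int92: geometric mean of the six per-program rates
rates = [2243, 2055, 2115, 1564, 1852, 1742]
print(round(prod(rates) ** (1 / len(rates))))  # 1914
```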


SPEC CPU95: SPECint95, SPECfp95 and SPECrate95

For SPEC CPU95, the reference machine is a SPARCstation 10/40 with 128MB of memory, and that machine is given a SPECint95 and SPECfp95 score of 1.0. The programs and reference times are listed here.

Single-CPU Benchmarks

Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run and its runtime measured. The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:

Ratio = RefTime / RunTime

For example, running 126.gcc on the AlphaStation 200 4/100 takes 1280 seconds, and the reference machine takes 1700 seconds. So, the ratio for 126.gcc on the AlphaStation 200 4/100 is: 14

ratio126.gcc = 1700 / runtimeSUT(126.gcc) = 1700 / 1280 = 1.328.

where

1700 = Tref(126.gcc), the runtime of 126.gcc on the reference machine (a SPARCstation 10/40)
SUT = system under test
1280 = runtime of 126.gcc on SUT

In the integer test suite there are 8 programs. SPECint95 is the geometric mean of the ratios for the 8 programs, and SPECfp95 is defined similarly. A machine with SPECint95 of 2.0 is about twice as fast (at integer calculations) as the SPARCstation 10/40. Here are the runtimes and ratios for all 8 of the CINT95 programs on the AlphaStation 200 4/100: 14

program         ref-time  runtime  ratio
--------------  --------  -------  -----
099.go              4600     2240   2.05
124.m88ksim         1900     1350   1.41
126.gcc             1700     1280   1.33
129.compress        1800     1216   1.48
130.li              1900     1299   1.46
132.ijpeg           2400     1540   1.56
134.perl            1900     1291   1.47
147.vortex          2700     2197   1.23

To determine the SPECint95 score for the AlphaStation, the geometric mean of the 8 ratios is computed:

mean = (2.05 × 1.41 × 1.33 × 1.48 × 1.46 × 1.56 × 1.47 × 1.23)^(1/8) = 1.48

This average 1.48 is the SPECint_base95 score.

SPECfp95 scores are determined similarly from a set of 10 programs. Details of the 8 integer and 10 floating-point programs are given here.

Rate (Throughput) Benchmarks

The formulas used in the SPECrate95 metric are not publicly described by SPEC, but it is reasonable to assume that the design is the same as for SPECrate92. That would imply that the formula is:

SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 604800 / TSUT(program) ]

with the terms defined as for SPECrate92. Note that the role of 056.ear has been taken over by 145.fpppp, the test program with the longest runtime on the reference machine (9600 seconds).

Using 126.gcc as an example, the CINT95rate results submitted by the tester 15 indicate that one copy was run and the runtime was 1280 seconds. Using the formula, we would predict a SPECrate for 126.gcc of 1 × [ 1700 / 9600 ] × [ 604800 / 1280 ], which is 83.67. However, the submitted results list a "base rate" of 11.9. These numbers differ by a ratio of 7.03. Similar results are found for each of the other programs in the CINT95rate report for the AlphaStation:

program        copies  Tref  runtime  rate  predicted  ratio
-------------  ------  ----  -------  ----  ---------  -----
099.go              1  4600     2240  18.5     129.4    6.99
124.m88ksim         1  1900     1350  12.7      88.67   6.98
126.gcc             1  1700     1280  11.9      83.67   7.03
129.compress        1  1800     1216  13.3      93.26   7.01
130.li              1  1900     1299  13.2      92.15   6.98
132.ijpeg           1  2400     1540  14.0      98.18   7.01
134.perl            1  1900     1291  13.2      92.72   7.02
147.vortex          1  2700     2197  11.1      77.42   6.97

The easiest way to make the formula fit these numbers is to replace the time constant 604800 (the number of seconds in a week) with the number of seconds in a day, 86400. The remaining discrepancy (e.g. 6.97 versus 7) is explained by roundoff error in the results report. Applying the same calculation to the numbers in other CPU95 rate reports gives similar results.

So, SPECrate95 uses this formula to compute the individual rates:

SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / TSUT(program) ]

where

N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate. 41)
Tref(program) = time to run program on the reference machine, a SPARCstation 10/40
Tref(145.fpppp) = time to run 145.fpppp on the reference machine = 9600 seconds.
86400 = number of seconds in a day
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test

This formula is (mostly) given in the CPU95 run rules 41, which state (section 4.2.2):

The "rate" calculated for each benchmark is a function of the number of copies run * reference factor for the benchmark * number of seconds in a day / elapsed time in seconds, which yield a rate in jobs/day.

Here, the phrase "reference factor for the benchmark" corresponds to the ratio Tref(program) / Tref(145.fpppp).
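The deduced formula can be checked in a few lines of Python against the AlphaStation report. The computed value comes out at about 11.95 against the published base rate of 11.9; the small difference is the roundoff discussed above:

```python
T_REF_FPPPP = 9600  # Tref(145.fpppp), the longest reference runtime in CPU95 (seconds)
DAY = 86400         # seconds in a day

def specrate95(n_copies, t_ref, t_sut):
    """Per-program SPECrate95 (the formula deduced in the text): jobs per day."""
    return n_copies * (t_ref / T_REF_FPPPP) * (DAY / t_sut)

# 126.gcc on the AlphaStation 200 4/100: 1 copy, Tref = 1700 s, runtime 1280 s.
# The published report lists a base rate of 11.9.
print(round(specrate95(1, 1700, 1280), 2))
```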


SPEC CPU2000

For SPEC CPU2000, the reference machine is a Sun Ultra 5/10 workstation with a 300-MHz SPARC processor and 256MB of memory, and this machine is given a SPECint2000 and SPECfp2000 score of 100. The program names and their reference run times are listed here.

Once again, there are two important changes in the actual formulas. The longest-runtime program is now 171.swim with a reference runtime of 3100 seconds, and the marathon-batch period has been decreased again from one day to one hour (3600 seconds).

SPEC CPU2000 - The benchmark suites, their method of use, and the results produced

CINT2000 - The suite of 12 compute-intensive programs used to measure integer performance

CFP2000 - The suite of 14 compute-intensive programs used to measure floating-point performance

SPECint2000 - A measure of speed for single-CPU machines, measures how fast the machine runs all the CINT2000 programs when told to run them one at a time.

SPECint_rate2000 - A measure of throughput for multi-CPU machines, measures how fast the machine can complete a number of simultaneous runs of programs from the CINT2000 suite, when told to run N copies of the same CINT2000 program at the same time.

score - A number produced by performing a carefully regulated test run of the programs in a suite and averaging the results, normalized by comparing to the reference machine (the Sun Ultra 5/10)

Single-CPU Benchmarks

Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run three times; each runtime is measured and the median time is used. 40 The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:

Ratio = 100 × RefTime / RunTime

For example, running 168.wupwise on a Compaq AlphaServer GS160 Model 6/731 produced a base run time of 399 seconds 25. The Base Ratio is 100 × 1600 / 399 = 401. That number tells us that, when running the 168.wupwise program compiled conservatively, the AlphaServer GS160 6/731 completed the run about 4.01 times faster than a Sun Ultra 5/10 300MHz workstation running the same program compiled the same way.

The formula for the overall SPECint2000 or SPECfp2000 score is a geometric mean of the ratios for all the programs in the benchmark suite (12 for INT, 14 for FP). For example, in the floating-point suite, the Compaq AlphaServer GS160 Model 6/731 got ratios as low as 145 (for 183.equake) and as high as 1217 (for 179.art), and the geometric mean of all the ratios was 405.
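A one-line sketch of the CPU2000 ratio calculation, checked against the 168.wupwise example:

```python
def cpu2000_ratio(t_ref, t_run):
    """SPEC CPU2000 per-program ratio, normalized so the Sun Ultra 5/10 scores 100."""
    return 100 * t_ref / t_run

# 168.wupwise on the Compaq AlphaServer GS160 6/731: Tref = 1600 s, base runtime 399 s
print(round(cpu2000_ratio(1600, 399)))  # 401
```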

Throughput (Rate) Benchmarks

For the concurrent throughput ratings (SPECint_rate2000 and SPECfp_rate2000), there are also "base" and "peak" versions, where the "peak" version is done with aggressive compiler optimization and "base" is compiled conservatively. The formula for the SPEC{int|fp}_rate2000 score is:

CPU2000_Rate = geom mean [ CPU2000_Rate(program) ]

CPU2000_Rate(program) = N × [ Tref(program) / Tref(171.swim) ] × [ 3600 / TSUT(program) ]

where

N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate. 40)
Tref(program) = time to run program on the Sun Ultra 5/10
Tref(171.swim) = time to run 171.swim on the Sun Ultra 5/10 = 3100
3600 = number of seconds in an hour
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test

This formula is (mostly) given in the CPU2000 run rules 40, which state (section 4.3.2):

The "rate" calculated for each benchmark is a function of:
  the number of copies run *
  reference factor for the benchmark *
  number of seconds in an hour /
  elapsed time in seconds
which yields a rate in jobs/hour.

Here, the phrase "reference factor for the benchmark" corresponds to the ratio Tref(program) / Tref(171.swim).
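Combining the speed and rate formulas shows that, for a single-copy run, the two CPU2000 metrics differ by a fixed factor of 100 × 3100 / 3600 ≈ 86.1 regardless of the program chosen. A quick check (the runtimes here are the 168.wupwise numbers from above, but any values give the same factor):

```python
T_REF_SWIM = 3100  # Tref(171.swim), the longest reference runtime in CPU2000 (seconds)
HOUR = 3600        # seconds in an hour

def cpu2000_speed(t_ref, t_sut):
    """CPU2000 speed ratio (reference machine scores 100)."""
    return 100 * t_ref / t_sut

def cpu2000_rate(n, t_ref, t_sut):
    """CPU2000 per-program rate: jobs per hour, scaled by the reference factor."""
    return n * (t_ref / T_REF_SWIM) * (HOUR / t_sut)

# The runtimes cancel: speed / rate = 100 * T_REF_SWIM / HOUR for any single-copy run.
t_ref, t_sut = 1600, 399
print(round(cpu2000_speed(t_ref, t_sut) / cpu2000_rate(1, t_ref, t_sut), 1))  # 86.1
```

This is where the 86.10 speed/rate conversion factor in the chart above comes from.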


SPEC CPU2006

For SPEC CPU2006, the reference machine is a Sun Ultra Enterprise 2 workstation 1 with a 296-MHz UltraSPARC II processor 2. This machine is similar to the Ultra 5/10 used in the CPU2000 suite, but has better cache and more RAM.

The reference machine is given a SPECint2006 and SPECfp2006 score of 1.00 3. The program names and their reference run times are listed here.

Single-Processor Integer (CINT2006) Calculation

There are 12 programs in the test suite. Each program is compiled and run three times, the runtimes are measured and the median 10 is used to calculate a runtime ratio 5. Ratios are obtained by dividing program run time into the reference machine run time. For example, consider the Sun Blade 1000 and the program 403.gcc 1,4:

ratio403.gcc = Tref(403.gcc) / TSUT(403.gcc) = 8050 / 2702 = 2.98

where

Tref(403.gcc) = runtime of 403.gcc on reference machine = 8050
SUT = system under test
TSUT(403.gcc) = runtime of 403.gcc on SUT = 2702

As you can see, ratios are higher for faster machines.

SPECint2006 is the geometric mean of the ratios for the 12 SPECint2006 programs, and SPECfp2006 is defined similarly. A machine with SPECint2006 of 2.0 is about twice as fast (at integer calculations) as the Sun Ultra Enterprise 2. Again, using the Sun Blade 1000 as an example, here are the runtimes and ratios for each of the 12 CINT2006 programs 4:

program         ref-time  runtime  ratio
--------------  --------  -------  -----
400.perlbench       9770     3077   3.18
401.bzip2           9650     3260   2.96
403.gcc             8050     2702   2.98
429.mcf             9120     2331   3.91
445.gobmk          10490     3310   3.17
456.hmmer           9330     2587   3.61
458.sjeng          12100     3449   3.51
462.libquantum     20720    10318   2.01
464.h264ref        22130     5259   4.21
471.omnetpp         6250     2572   2.43
473.astar           7020     2554   2.75
483.xalancbmk       6900     2018   3.42

To determine the SPECint2006 score for the Sun Blade 1000, the geometric mean of the 12 ratios is computed:

mean = (3.18 × 2.96 × 2.98 × 3.91 × 3.17 × 3.61 × 3.51 × 2.01 × 4.21 × 2.43 × 2.75 × 3.42)^(1/12) = 3.12

Since the test was performed without special adjustments in compiler flags or other similar optimizations, it is a SPECint_base2006 score.
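Recomputing the geometric mean from the twelve ratios in the table:

```python
from math import prod

# Ratios for the twelve CINT2006 programs on the Sun Blade 1000 (from the table above)
ratios = [3.18, 2.96, 2.98, 3.91, 3.17, 3.61, 3.51, 2.01, 4.21, 2.43, 2.75, 3.42]
print(round(prod(ratios) ** (1 / len(ratios)), 2))  # 3.12
```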

Multi-Processor Integer Throughput (CINT2006 Rate) Calculation

The same test suite is used. The reference machine is the same Sun Ultra Enterprise 2, and its SPECint_rate2006 and SPECfp_rate2006 scores are both 1.00.

To test a machine, a number of copies N is selected — usually this is equal to the number of CPUs, cores, hardware threads, or user register sets on the test system, but that is not required. 6,11

NOTE: It is now very common for systems to implement hardware-supported simultaneous multithreading (SMT). This is the feature Intel calls "Hyper-Threading" in its Core i7 processors, and it also exists in Intel's latest Xeon processors as well as the IBM POWER5 and POWER6. The UltraSPARC T1 and IBM's POWER7 have four-way multithreading. The future "Bulldozer"-based CPUs from AMD will accelerate multiple threads in a way that is not directly comparable to standard SMT. In most cases, any posted SPEC results will have been achieved in whatever way produces the highest result, and that often means running more than one copy per CPU core. Do not assume the number of copies is the number of cores; read the SPEC CPU2006 results report for the system in question to find out.

The rate score for the system under test is determined from a geometric mean of rates for each program in the test suite: 7

CPU2006_Rate = geom mean [ rate(program) ]

Each individual test program's rate is determined by taking the median 10 of three runs 5 (as above for the speed metric). Each run consists of N copies of the program running simultaneously on the test system. Its time is the time it takes for all the copies to finish (that is, the time from when the first copy starts until the last copy has finished). The rate metric for that program is calculated by the following formula: 8

rate(program) = N × Tref(program) / TSUT(program)

where

N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate. 39)
Tref(program) = time to run one copy of the program on the Sun Ultra Enterprise 2
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test

This formula is (mostly) given in the CPU2006 run rules 42, which state (section 4.3.2):

The "rate" calculated for each benchmark is a function of:
  the number of copies run *
  reference factor for the benchmark /
  elapsed time in seconds
which yields a rate in jobs/time.

Here, the phrase "reference factor for the benchmark" refers simply to Tref(program).

For example, consider the AMD Shuttle SN25P and the program 403.gcc 9:

CPU2006_Rate(403.gcc) = 2 × Tref(403.gcc) / TSUT(403.gcc) = 2 × 8050 / 875 = 18.4

where

Tref(403.gcc) = time to run one copy of 403.gcc on the Sun Ultra Enterprise 2 reference system = 8050 (see above)
TSUT(403.gcc) = time to run 2 copies of 403.gcc on the Shuttle SN25P = 875

This calculation is performed for each program in the test suite. The figures for all 12 integer programs on the AMD Shuttle SN25P follow:

program         copies  ref-time  runtime  ratio
--------------  ------  --------  -------  -----
400.perlbench        2      9770      823   23.7
401.bzip2            2      9650     1485   13.0
403.gcc              2      8050      875   18.4
429.mcf              2      9120     1218   15.0
445.gobmk            2     10490      782   26.8
456.hmmer            2      9330     1244   15.0
458.sjeng            2     12100     1009   24.0
462.libquantum       2     20720     2703   15.3
464.h264ref          2     22130     1529   29.0
471.omnetpp          2      6250     1003   12.5
473.astar            2      7020     1098   12.8
483.xalancbmk        2      6900      994   13.9

To determine the SPECint_rate2006 score for the Shuttle SN25P, the geometric mean of the 12 ratios is computed:

mean = (23.7 × 13.0 × 18.4 × 15.0 × 26.8 × 15.0 × 24.0 × 15.3 × 29.0 × 12.5 × 12.8 × 13.9)^(1/12) = 17.5
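Both steps of the rate calculation can be verified in a few lines:

```python
from math import prod

def cpu2006_rate(n_copies, t_ref, t_sut):
    """SPEC CPU2006 per-program rate: copies times reference time over elapsed time."""
    return n_copies * t_ref / t_sut

# 403.gcc on the AMD Shuttle SN25P: 2 copies, Tref = 8050 s, 875 s to finish both
print(round(cpu2006_rate(2, 8050, 875), 1))  # 18.4

# Overall SPECint_rate2006: geometric mean of the twelve per-program rates
rates = [23.7, 13.0, 18.4, 15.0, 26.8, 15.0, 24.0, 15.3, 29.0, 12.5, 12.8, 13.9]
print(round(prod(rates) ** (1 / len(rates)), 1))  # 17.5
```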


Conversions


So you still use Dhrystone?


As described in each section below, it is possible (and in most cases easy) to convert the results of one benchmark into "equivalent" results for another benchmark of the same type (either integer or floating-point).

However, the results of such conversions are only meaningful in certain limited cases.

Approximate conversion factors between the metrics (each line reads "left-hand score ≈ factor × right-hand score"):

Integer metrics:
  Dhrystone (loops/s)  = 1757  × Dhry-MIPS
  Dhry-MIPS            = 1.386 × SPECint89
  SPECint89            = 1.000 × SPECint92
  SPECrateInt92        = 23.72 × SPECint92
  SPECint92            = 50.20 × SPECint95
  SPECrateInt92        = 132.3 × SPECrateInt95
  SPECrateInt95        = 9.000 × SPECint95
  SPECint2000          = 8.264 × SPECint95
  SPECrateInt95        = 93.76 × SPECint2000_rate
  SPECint2000          = 86.10 × SPECint2000_rate
  SPECint2000          = 116.0 × SPECint2006
  SPECint2000_rate     = 1.347 × SPECint2006_rate
  SPECint2006          = 1.000 × SPECint2006_rate

Floating-point metrics:
  SPECfp89             = 1.239 × Whetstone MWIPS
  SPECfp89             = 1.000 × SPECfp92
  SPECrateFp92         = 23.72 × SPECfp92
  SPECfp92             = 60.20 × SPECfp95
  SPECrateFp92         = 158.7 × SPECrateFp95
  SPECrateFp95         = 9.000 × SPECfp95
  SPECfp2000           = 7.752 × SPECfp95
  SPECrateFp95         = 99.96 × SPECfp2000_rate
  SPECfp2000           = 86.10 × SPECfp2000_rate
  SPECfp2000           = 147.0 × SPECfp2006
  SPECfp2000_rate      = 1.707 × SPECfp2006_rate
  SPECfp2006           = 1.000 × SPECfp2006_rate

Converting Between Dhrystone and SPECint89

Prior to SPEC, the de-facto standard benchmark of integer CPU performance was Dhrystone, which had often been criticized for many reasons, including its small code size, its abuse by salesmen and customers, and its disproportionate emphasis on string operations. Dhrystone was replaced by the 1989 SPEC integer metric.

However, despite the lack of faith in Dhrystone, there was actually a very good correlation between the SPEC integer benchmark and the Dhrystone benchmark. Using data from 28 systems for which Dhrystone and SPECint89 results were readily available and 4 systems for which the SPEC score existed and the Dhrystone score could be easily deduced, Al Aburto performed a least-squares linear regression and found a correlation coefficient of 0.971 between Dhrystone version 1.1 and SPECint89 28.

Based on the data from these 32 systems, we get these (approximate) conversions:


Dhrystone-score ≈ 2435 × SPECint89-score

Dhrystone "MIPS" score ≈ 1.39 × SPECint89-score

"Dhrystone MIPS" is an old interpretation of the Dhrystone benchmark, calculated to give the VAX 11/780 a score of 1.0. This was simply the raw score (loops per second) divided by 1757 (the raw score of a VAX 11/780). The acronym "MIPS" stands for Million Instructions Per Second, and "Dhrystone MIPS" is so-called in reference to "VAX MIPS". The VAX 11/780 had performance similar to the IBM System/370 model 158-3, which was marketed as a "1 MIPS" machine. The term "VAX MIPS" was also common in those days, and was a unit of performance expressed in terms of the VAX. Note that 2435 is about 1757 times 1.39.

The value of 1.39 stated above is based on statistics from a large number of machines. There seems to be a paradox — the Dhrystone MIPS conversion gives the VAX a score of 1.0, and SPEC was calibrated to give the VAX a score of 1.0 as well — so why does the linear regression indicate that Dhrystone MIPS is 1.39 times as great as SPECint89?

The answer to the paradox lies in the fact that the VAX 11/780 was actually not a very typical machine. Compared to machines of the following decade (which comprise the bulk of the Dhrystone and SPEC data), the VAX 11/780 performs less well on Dhrystone, as compared to real-world applications, than one would expect. This might have been the result of the aforementioned heavy emphasis on string operations.

So in the interest of being more accurate for a larger number of machines, we can either ignore the VAX 11/780, or refer to it as a "1.39 MIPS" machine.
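These conversions are easy to express in code. A sketch (the 2435 and 1757 factors are the ones derived above; treat the output as order-of-magnitude only):

```python
VAX_DHRYSTONES = 1757  # raw Dhrystone 1.1 loops/s of the VAX 11/780

def dhrystone_to_specint89(loops_per_sec):
    """Approximate SPECint89 from a raw Dhrystone score (factor 2435 from the text)."""
    return loops_per_sec / 2435

def dmips(loops_per_sec):
    """Dhrystone 'VAX MIPS': raw score normalized to the VAX 11/780."""
    return loops_per_sec / VAX_DHRYSTONES

# The VAX itself: 1.0 Dhrystone MIPS, but only ~0.72 SPECint89-equivalent,
# reflecting the regression's factor of 1.39 discussed above.
print(round(dmips(1757), 2), round(dhrystone_to_specint89(1757), 2))  # 1.0 0.72
```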

Converting Between Whetstone and SPECfp89

In the 1980s there were a few different floating-point benchmarks, all subject to criticisms similar to those raised over Dhrystone. The one most similar to Dhrystone for our purposes is Whetstone, because it also had a small code size and was normalized to a nominally "1-MIPS" machine.

A similar correlation study can be done and the results almost always show a far lower correlation, because there was a far greater variation in floating-point performance among the machines that were typically being measured with the Whetstone benchmark. To cite one simple example, some CPUs, notably the Intel 80486, only implemented double precision in hardware and provided single precision at the same speed as double precision.

Based on the 13 systems listed here, the average ratio between MWIPS and SPECfp89 is 0.9867 for single precision and 0.8072 for double precision (for which only 11 systems were measured).


Converting between SPEC89 and SPEC92

SPEC89 and SPEC92 both use the same reference machine (a VAX 11/780), and both metrics give the VAX a score of 1.00, so they are directly comparable:


SPECint92-score ≈ SPECint89-score

SPECfp92-score ≈ SPECfp89-score


Converting between SPEC92 and SPEC95

SPEC95 uses the SPARCstation 10 model 40 as its reference machine; this machine has a SPECint92 score of 50.2 16 and SPECfp92 of 60.2 17. By definition, its CPU95 scores are both 1. So to convert (approximately) between SPEC92 and SPEC95 numbers, multiply or divide by 50.2 or 60.2.

SPECint95-score ≈ SPECint92-score / 50.2

SPECfp95-score ≈ SPECfp92-score / 60.2


Converting between SPEC95 and SPEC2000

SPEC2000 uses the Sun Ultra 5/10 300MHz as its reference machine; this machine has a SPECint95 score of 12.1 30 and SPECfp95 of 12.9 31. By definition, its CPU2000 scores are both 100. So to convert (approximately) between CPU95 and CPU2000 numbers, multiply or divide by 100/12.1=8.26 or 100/12.9=7.75 for SPECint and SPECfp respectively:

SPECint2000-score ≈ 100 * SPECint95-score / 12.1

SPECfp2000-score ≈ 100 * SPECfp95-score / 12.9


Converting between SPEC2000 and SPEC2006

SPEC CPU2006 uses the Sun Ultra Enterprise 2 with a 296 MHz processor as its reference machine. It has a SPECint2000 score of 116 32 and SPECfp2000 of 147 33. By definition, its CPU2006 scores are both 1.00. So to convert (approximately) between CPU2000 and CPU2006 numbers, multiply or divide by 1/116=0.00862 or 1/147=0.00680 for SPECint and SPECfp respectively:

SPECint2006-score ≈ SPECint2000-score / 116

SPECfp2006-score ≈ SPECfp2000-score / 147

(The Sun Ultra Enterprise 2 is similar to, but slightly better than, the Ultra 5/10 300 MHz used as the reference system for SPEC CPU2000. Because of this, the Ultra Enterprise 2 was already a fairly old and relatively slow machine by 2006. That's why nearly all SPEC2006 scores are much higher than the reference machine's score.)
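Since SPEC89 and SPEC92 scores are directly comparable, the four conversions above can be chained to take a SPEC89/SPEC92 score all the way to SPEC2006. A minimal sketch (the divisors are the ones stated in the sections above; remember that chaining compounds each step's approximation error, so treat the result as a rough estimate only):

```python
# Chain the approximate suite-to-suite conversions: divide a SPEC92 score
# by each reference-machine factor in order (92 -> 95 -> 2000 -> 2006).

INT_STEPS = [50.2, 12.1 / 100.0, 116.0]   # SPECint divisors, in order
FP_STEPS  = [60.2, 12.9 / 100.0, 147.0]   # SPECfp divisors, in order

def spec2006_from_spec92(score92, fp=False):
    """Rough SPEC2006 estimate from a SPEC92 (or SPEC89) score."""
    score = score92
    for divisor in (FP_STEPS if fp else INT_STEPS):
        score /= divisor
    return score
```

For example, feeding in the SPARCstation 10/40's SPECint92 of 50.2 yields 1 for SPECint95, about 8.26 for SPECint2000, and about 0.071 for SPECint2006.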


Relationship Between Speed and Rate Metrics

The speed metrics (SPECint and SPECfp) measure how quickly a single task can be completed (implicitly a single-threaded task running on one CPU core). The rate metrics (SPECint_rate and SPECfp_rate) measure the overall capacity for the system to complete tasks (with run-rules allowing the tester to run as few or as many simultaneous tasks as it takes to attain the highest rate).

This is the type of thing people seem either to understand completely or not at all. I will make an analogy that is similar to (but, I believe, a bit better than) the one used by an author at SPEC back in the 1990s.

Yesterday I went to a favorite diner for breakfast. It is a small place with few customers, and they were not busy when I arrived. I ordered a ham and cheese omelette. They have one chef, who is equipped with one omelette pan, one stove, and one square meter of counter space next to the stove. He heard my order, got to work immediately, and I had my omelette in 5 minutes.

This morning I went to a much larger restaurant. They have five chefs, each of whom has his own stove with five burners, five omelette pans and five square meters of counter space. This restaurant has at least five times as much of everything as the one I went to yesterday, and when I arrived none of them were busy. I gave my order (an omelette) to the waiter. The order was delivered to the kitchen in one minute. Five minutes later my omelette was ready, and another minute later it was delivered to me by the waiter. Total time, 7 minutes.

The larger restaurant had much greater capacity than the first one, but it took them the same time (actually a bit longer) to fulfill my order. The reason is clear: no matter how much equipment and manpower you throw at it, the task of cooking an omelette cannot be accelerated beyond certain basic limits.

If I wanted to have breakfast with four friends and we all wanted omelettes, then the larger restaurant would indeed be faster. The diner with one chef and one pan would take about 25 minutes to finish preparing our breakfast, while the larger one would probably finish the task in 7 or 8 minutes.

This is an analogy to what computers are doing. Most modern computers have more than one processor, and are able to run two or more tasks at full speed. While there are many tasks that can be broken up and shared among multiple processors, some cannot, and some have pieces that are fundamentally atomic (indivisible). In addition, certain tasks (such as delivering the data to the processor) might happen in parallel but still incur a delay that isn't there with a single-processor machine.

The speed metrics (SPECint and SPECfp) measure the speed at which these "atomic" tasks can be completed under ideal circumstances.

Note that although I just referred to "processors", the same principles apply to "cores", "virtual cores", "hardware threads" and other similar developments in modern hardware today.

Why it is important to know the difference between speed and rate metrics

Example: Alex is using a dual-processor workstation with a SPECint rating of 5. He is shopping for another to use on the same or similar work, and decides to buy a single-CPU system with a rating of 7. He is disappointed to discover that the new machine is a little slower than the old — he expected it to be faster. The reason is that, unknown to him, he had been utilizing the old machine's ability to perform two tasks simultaneously. The new machine, while faster at performing single tasks, has a lesser capacity to finish multiple tasks over a period of time. Alex would have been better off comparing machines based on their SPECint_rate metrics. Unfortunately, it is likely that the rate metric for the single-CPU machine is not available.

Why it is important to have rate metrics for single-CPU machines

Example: Brett runs a research project related to weather forecasting. He has just gotten permission to procure a workstation to replace his current setup, a pair of brand-A single-CPU workstations rated at 10 SPECfp each. The goal for the purchase is to get the job onto a single system that can do the work in 2 days. The current setup takes 3.5 days to finish processing 126 separate datasets, a task performed weekly. Therefore Brett estimates that any system that will deliver 35 SPECfp can accomplish his goal (because 35×2 days = (10+10)×3.5 days).

He finds a few single-CPU systems from brand-B rated 45 to 50 on the SPECfp metric. He knows these will do the job, but he also discovers that for the same money he could buy a brand-C dual-CPU workstation using CPUs that individually rate about 30. Even though 30×2=60, we have no idea how close the brand-C workstation would come to that figure, because running multiple concurrent tasks (which is what SPECrate_fp measures) might be sub-optimal on a dual-CPU machine due to shared resource contention.

Brett knows that his workload is easily broken up among multiple CPUs because that is what he is already doing each week. Unfortunately, he cannot compare the brand-B and brand-C workstations because they do not all have the same metrics. The dual-CPU systems have SPECrate_fp scores and the single-CPU systems have SPECfp scores. This problem could be solved easily if brand-B would provide SPECrate_fp scores for its single-CPU systems.


Impossible Concurrency



Speed and Rate conversion for CPU92

As described above, the SPECrate for each individual program in the integer or FP suite is calculated as follows:

SPECrate(program) = N × [ Tref(program) / Tref(056.ear) ] × [ 604800 / TSUT(program) ]

where

N = number of copies run concurrently
Tref(program) = time to run program on the reference machine, a VAX 11/780
Tref(056.ear) = time to run 056.ear on the reference machine = 25500 seconds. 12
604800 = number of seconds in a week
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test
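The CPU92 SPECrate formula above translates directly into code. A minimal sketch (function and variable names are my own):

```python
# SPECrate for one CPU92 program, per the formula above:
#   SPECrate = N * (Tref / Tref_056.ear) * (seconds-per-week / TSUT)

TREF_056_EAR = 25500.0       # reference time of 056.ear on the VAX 11/780
SECONDS_PER_WEEK = 604800.0

def specrate92(n_copies, t_ref, t_sut):
    """SPECrate for a single program; times in seconds."""
    return n_copies * (t_ref / TREF_056_EAR) * (SECONDS_PER_WEEK / t_sut)
```

Running one copy of any program on the reference machine itself (so that t_sut equals t_ref) makes the t_ref terms cancel, giving 604800/25500 ≈ 23.72 regardless of which program is used.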

Therefore, a SPECrate for the VAX 11/780, based on running just a single copy of a particular program, would be:

SPECrate(program) = 1 × [ Tref(program) / Tref(056.ear) ] × [ 604800 / Tref(program) ]

the Tref(program) terms cancel out, leaving

SPECrate(program) = 604800 / Tref(056.ear) = 23.72

for all programs. Since the SPECrates for the programs in the INT and FP suites are all computed normalized to 056.ear 12, SPECrate_int92 and SPECrate_fp92 for a VAX 11/780 will be 23.72.

This allows us to convert from SPEC92 to SPECrate92 for single-processor machines:

SPECrateInt92 = 23.72 × SPECInt92
SPECrateFP92 = 23.72 × SPECFP92

For example, the Dell Dimension XPS (133MHz, 512KB L2) is a single-processor system with a reported SPECint92 of 177.9. 36 The SPECrate_int92 was also reported for that system, and the tester ran each program in the suite with N=1 as the number of concurrent copies. They reported 37 a SPECrate_int92 of 4144, very close to the predicted value 23.72 × 177.9 = 4220.

This conversion for single-processor machines was frequently performed by third parties to allow comparison of a single-processor system to a dual-processor system by customers only interested in long-term homogeneous capacity: 24,23

Computed specrates are indicated by "c". They're computed from SpecInt92, SpecFP92 (for uniprocessors) using a scaling factor. This number is usually slightly less than or equal to a measured specrate on a uniprocessor. The scaling factor is the number of seconds in a week, divided by the time of the longest-running benchmark on the reference SPEC VAX 11/780, which is 604800/25500, or about 23.7. - John DiMarco

A more general conversion formula factors in the number of processors:

SPECrateInt92 = 23.72 × P × SPECInt92
SPECrateFP92 = 23.72 × P × SPECFP92

where P is the number of CPUs45. It should be understood that this formula only gives a theoretical ideal maximum which is never achieved. For example, if a 4-CPU system is built from CPUs that deliver 112 SPECint92 in single-CPU systems, then the 4-CPU system, if designed really well, will have a SPECrateInt92 close to but noticeably less than 23.72 × 4 × 112 = 10626.

This formula can be reversed to convert the other way:

SPECInt92 = SPECrateInt92 / (23.72 × P)
SPECFP92 = SPECrateFP92 / (23.72 × P)

For example, a 4-CPU server with a SPECrate_int92 of 2372 would not deliver 100 SPECint92, because SPECint92 is based on running a single task on a single processor. Instead, each of its 4 cpus would deliver 25 SPECint92 (or probably a little more, because the SPECrate figure includes the inefficiency of the OS overhead for multiprocessing).


Speed and Rate conversion for CPU95

As described above, the SPECrate score for each program in the integer or FP suite is calculated as follows:

SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / TSUT(program) ]

where

N = number of copies run concurrently
Tref(program) = time to run program on the reference machine, a SPARCstation 10/40
Tref(145.fpppp) = time to run 145.fpppp on the reference machine = 9600 seconds.
86400 = number of seconds in a day
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test

If the system under test has just one processor, the tester producing the CINT95 or CFP95 rate scores would run just one copy of each program.

ratio(program) = Tref(program) / runtime(program)

rate(program) = 1 × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / TSUT(program) ]

Dividing a program's rate by its ratio gives:

rate(program) / ratio(program) = [ runtime(program) × Tref(program) × 86400 ] / [ TSUT(program) × Tref(program) × Tref(145.fpppp) ]

Since only one copy is being run, runtime(program) and TSUT(program) are the same. Also, the Tref(program) terms cancel out, so we get:

rate(program) / ratio(program) = 86400 / Tref(145.fpppp) = 9.00

predicting that the base rates in a single-processor system's SPECint_rate95 report will be 9 times the base ratios in the SPECint95 report.

The AlphaStation 200 4/100 results 14,15 provide a convenient example of this:

program       ratio  rate  rate/ratio
------------  -----  ----  ----------
099.go         2.05  18.5     9.02
124.m88ksim    1.41  12.7     9.01
126.gcc        1.33  11.9     8.95
129.compress   1.48  13.3     8.99
130.li         1.46  13.2     9.04
132.ijpeg      1.56  14.0     8.97
134.perl       1.47  13.2     8.98
147.vortex     1.23  11.1     9.02

confirming the prediction made by the formulas:

SPECrate_{int|fp}95 = 9.00 × SPEC{int|fp}95 for single-processor systems
SPEC{int|fp}95 = SPECrate_{int|fp}95 / 9.00 for single-processor systems

Looking at many SPEC95 and SPECrate95 results for various single-CPU systems, this appears to actually be the case.
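The 9.00 prediction can be checked mechanically against the AlphaStation 200 4/100 figures quoted above. A small sketch that recomputes rate/ratio for each CINT95 program from the published values:

```python
# Published AlphaStation 200 4/100 results: program -> (ratio, rate).
# Every rate/ratio quotient should land within rounding error of 9.00.

published = {
    "099.go":       (2.05, 18.5),
    "124.m88ksim":  (1.41, 12.7),
    "126.gcc":      (1.33, 11.9),
    "129.compress": (1.48, 13.3),
    "130.li":       (1.46, 13.2),
    "132.ijpeg":    (1.56, 14.0),
    "134.perl":     (1.47, 13.2),
    "147.vortex":   (1.23, 11.1),
}

quotients = {prog: rate / ratio for prog, (ratio, rate) in published.items()}
```

All eight quotients come out between roughly 8.95 and 9.04, confirming the 86400/9600 = 9.00 factor to within the precision of the published numbers.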

When multiple copies are run on multiple CPUs46, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P simultaneous processes44, given the speed scores of a single processor:

maximum SPECrate{Int|FP}95 = 9.00 × P × SPEC{Int|FP}95
SPEC{Int|FP}95 = maximum SPECrate{Int|FP}95 / (9.00 × P)


Speed and Rate conversion for CPU2000

As described before, the formula for SPEC{int|fp}rate2000 is:

CPU2000_Rate = geom mean [ CPU2000_Rate(program) ]

CPU2000_Rate(program) = N × [ Tref(program) / Tref(171.swim) ] × [ 3600 / TSUT(program) ]

where

N = number of copies run concurrently
Tref(program) = time to run program on the Sun Ultra 5/10
Tref(171.swim) = time to run 171.swim on the Sun Ultra 5/10 = 3100
3600 = number of seconds in an hour
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test

The reference machine is a Sun Ultra 5/10. The CINT2000 or CFP2000 Rate for the Sun Ultra 5/10, based on running just a single copy of a particular program, would be:

CPU2000_Rate(program) = 1 × [ Tref(program) / Tref(171.swim) ] × [ 3600 / Tref(program) ]

The Tref(program) terms cancel out, leaving

CPU2000_Rate(program) = 3600 / Tref(171.swim) = 1.161

for all programs. Since the rates for the programs in the CINT2000 and CFP2000 suites are all computed normalized to 171.swim, CINT2000 and CFP2000 rates for a Sun Ultra 5/10 will be 1.161.

Since the SPECint2000 and SPECfp2000 scores for the Sun Ultra 5/10 are both 100, the formulas for conversion for a single-processor machine are:

SPECint2000 = (100 / 1.161) × SPECint_rate2000 = 86.1 × SPECint_rate2000 for single-processor systems
SPECfp2000 = (100 / 1.161) × SPECfp_rate2000 = 86.1 × SPECfp_rate2000 for single-processor systems

Using the published results for certain single-processor machines (such as the Compaq AlphaServer GS160 Model 6/731, which was tested and published in April 2000) 25,26 and comparing the base ratios in its C{int|fp}2000 results to its C{int|fp}2000_Rate results, it is easy to see that the conversion formulas do in fact work. For example, it has a base SPECint2000 of 353 and a base SPECint_rate2000 of 4.09; 353 divided by 4.09 equals 86.3, matching the predicted factor of 86.1 to within rounding.

When multiple copies are run on multiple CPUs46, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P simultaneous processes44, given the speed scores of a single processor:

maximum SPEC{int|fp}2000_Rate ≈ P × SPEC{int|fp}2000 / 86.1
SPEC{int|fp}2000 ≈ maximum SPEC{int|fp}2000_Rate × 86.1 / P

This conversion rate is evident from the documentation for the SPEC CPU2000 utility program rawformat, which gives an example conversion51 showing that a SPECfp_base2000 score of 176 is equivalent to a SPECfp_rate_base2000 score of 2.05.
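The 86.1 factor and the derivation behind it can be captured in a short sketch (function names are my own; the 3600 and 3100 figures are the ones used in the formulas above):

```python
# CPU2000 speed <-> rate conversion for single-processor systems.
# A single copy on the reference machine rates 3600/3100 ~= 1.161, and the
# reference machine's speed score is 100, so speed ~= (100/1.161) * rate.

SINGLE_COPY_RATE = 3600.0 / 3100.0   # rate of one copy on the Sun Ultra 5/10

def cpu2000_speed_from_rate(rate):
    """Approximate SPEC{int|fp}2000 from a 1-CPU SPEC{int|fp}_rate2000."""
    return (100.0 / SINGLE_COPY_RATE) * rate

def cpu2000_rate_from_speed(speed):
    """Approximate 1-CPU SPEC{int|fp}_rate2000 from SPEC{int|fp}2000."""
    return speed * SINGLE_COPY_RATE / 100.0
```

Feeding in the rawformat example, a speed score of 176 converts to a rate of about 2.04, agreeing with the documented 2.05 to within rounding.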


Speed and Rate conversion for CPU2006

Unlike the previous versions of SPEC{int|fp}_rate, CPU2006 has no scaling factors in the equation to make it difficult to compare the speed and rate metrics for single-processor systems.43 This is clear both from the scores of the reference machine under the speed and rate metrics 2,38, and from the formulas. As stated above, for the speed metric:

ratio(program) = Tref(program) / TSUT(program)

and for the rate metric:

rate(program) = N × Tref(program) / TSUT(program)

On a single-processor machine, N will usually be 1 and the TSUT(program) values will be the same, resulting in ratios and rates being equal. Since SPECint2006 is the geometric mean of the ratios, and SPECint_rate2006 is the geometric mean of the rates, these end up being directly comparable:

SPECrate_{int|fp}2006 ≈ SPEC{int|fp}2006 for single-processor systems
SPEC{int|fp}2006 ≈ SPECrate_{int|fp}2006 for single-processor systems

This one-to-one conversion rate is evident from the documentation for the SPEC CPU2006 utility program rawformat, which gives an example conversion52 similar to the example in the CPU2000 suite 51, showing that a SPECint(R)_base2006 score of 10.1 is equivalent to a SPECint(R)_rate_base2006 score of 10.1.

When multiple copies are run on multiple CPUs46, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P simultaneous processes44, given the speed scores of a single processor:

ideal maximum SPECrate{Int|FP}2006 = P × SPEC{Int|FP}2006
SPEC{Int|FP}2006 = ideal maximum SPECrate{Int|FP}2006 / P
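Under CPU2006 the conversion is just multiplication by the processor count. A trivial sketch of the ideal ceiling (function names are my own):

```python
# CPU2006 ideal speed <-> rate conversion: the theoretical maximum rate is
# P copies with zero contention, so it is simply P times the speed score.

def ideal_specrate2006(speed_score, n_processors):
    """Upper bound on SPECrate2006 given a single-task speed score."""
    return speed_score * n_processors

def speed_from_ideal_rate2006(rate_score, n_processors):
    """Per-processor speed score implied by an ideal rate score."""
    return rate_score / n_processors
```

A real multi-CPU system will land below the ceiling because memory, disk, and sometimes cache are shared among the concurrent copies.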

It is interesting to note that CPU2006 has made conversion quite easy: the speed metric times the number of processors equals the (theoretical maximum achievable) rate metric. This change is counter to the 1992 philosophy discouraging this sort of comparison, but is in keeping with recent developments in the marketplace: single-processor, single-core workstations are getting to be rather rare, and software developers are under pressure to adapt their products to take advantage of dual or quad CPU cores in order to remain competitive and to meet customers' needs (or at least expectations) of increased performance.

I imagine that in a future CPU suite, the speed metric will be redesigned to allow (and perhaps emphasize) tasks that use multiple threads and multiple CPU cores when available. I note that in the submission guidelines for the CPUv6 Suite search, after describing what sorts of real-world applications would be considered for the new suite, there is the sentence49:

It is also acceptable for parallel/threaded codes or applications to be submitted.

The libquantum Dispute

It is generally agreed that automatic optimization by compilers is useful and should be reflected in benchmarks, but that it is also useful to have benchmarks that restrict optimization. Thus, the SPEC CPU suites include a "base" measurement and a "peak" measurement, with per-program optimization settings allowed only for "peak".

There is considerably more debate about what types of automatic optimizations should be allowed. The most common types, such as subexpression elimination and loop unrolling, are usually accepted without question. More aggressive techniques, such as vectorization (the use of an AltiVec or SSE vector unit to parallelize a loop), are more controversial but seem to be accepted because they are so commonly used in actual real-world applications.

The 462.libquantum benchmark in CPU2006 took this debate to a new level. The Intel and Sun compilers (since well before 2010), GCC (starting with version 4.6) and the AMD Open64 compiler (since version 4.2.4) are able to turn certain types of nested loops into a set of threads running in parallel. In some test cases (such as [%%% use bench-src/SPEC2006/dbg-libquantum to find examples]) the 462.libquantum component of a CPU2006 benchmark run can give a result up to 70 times greater than the geometric mean of the other components.

There are two competing points of view in the libquantum debate, each with merit and each serving different objectives. Both arguments are valid, and I personally find both approaches useful at different times.

The above arguments, particularly the second one, can be inflamed into hyperbole. A commonly seen example:

[...] everyone is aware that libquantum is a broken benchmark [...]

Of course, the benchmark is broken if you are counting on SPEC{int|fp}2006 to be a "single-core" benchmark. But it is not at all broken if you treat it as a "single task" benchmark. The problem is that a lot of people want SPEC{int|fp}2006 to be a single-core benchmark and others want it to be a single-task benchmark. There is little understanding among the benchmarking community about which of these is the "intended" or "correct" purpose of SPEC{int|fp}2006.

What we really need is both: a SPEC CPU "single core" benchmark and a separate SPEC CPU "single task" benchmark (along with the existing, un-"broken", SPECrate, which measures "many identical tasks in parallel").

To resolve the issue, the SPEC consortium will probably want to offer more types of benchmark results in CPUv6 than the four that are in CPU2006. Something like this will almost certainly happen, given the "parallel/threaded codes or applications" quote I gave in the previous section. Perhaps we'll have three benchmarks as I've just suggested: the pre-existing SPEC{int|fp}2006 and SPECrate{Int|FP}2006, with the former serving as a pure "single core" benchmark, plus a third, "single task" benchmark. I suspect that explicitly multi-threadable tasks/applications will be grouped together with ostensibly "single-threaded" but parallelizable tasks/applications (like libquantum) for this new "single-task, possibly parallelizable" benchmark. It would fit nicely between the existing speed metrics (single-task, single-threaded) and rate metrics (many identical tasks, each single-threaded) that are in CPU2006.


Dhrystone and Whetstone data for some early systems

The following tables are from Al Aburto's old speccorr.tbl file28. Here is Aburto's introduction from that file:

This is the data set of Dhrystone and SPECratio and SPECin89 data I have.   Not much data really. I have about 75 SPECratio and SPECint89 data points and about 3 times as many Dhrystone1.1 data points, but it is difficult lining up the results with the same systems and compilers (impossible really). I just do the best I can, and despite that, the results really turn out pretty good. The Dhrystone results I obtained from the French Unix Users group. You can ftp these results (below) via anonymous ftp from ftp.nosc.mil (128.49.192.51) in directory 'pub/aburto'.

Here is the actual data (showing just the relevant columns) from systems that were measured via both SPEC 89 and at least one of the Dhrystone and Whetstone benchmarks. In the first table note there is no row "09".

Last update: 28 Sep 1992

                                  Dhrystone1.1      SPEC
                                  ------------     int89
   System                  MHz      D/S    Ratio     *
00 DEC VAX 11/780          5.00     1757     1.0    1.0
01 HP 9000/340            16.67     6677     3.8    2.7
02 HP 9000/370            33.33    14407     8.2    5.2
03 DECstation 2100        12.50    18273    10.4    8.7
04 Sun 4/260              16.67    19900    11.3    8.7
05 Sun SPARCstation 1     20.00    20206    11.5    9.5
06 HP 9000/834            15.00    23441    13.3   10.2
07 Sun SPARCstation 1+    25.00    23720    13.5   11.2
08 MIPS RC2030            16.67    26179    14.9   11.3
10 DG AV 310              20.00   37073?   21.1?   11.6
11 DECstation 3100        16.67    26600    15.1   11.8
12 HP Apollo 10000        18.20    27000    15.4   11.9
13 Sun SPARCstation 330   25.00    27777    15.8   12.3
14 HP 9000/425s           25.00   35140?   20.0?   12.9
15 MIPS M/120-5           16.67    30572    17.4   13.0
16 i486                   25.00    25477    14.5   13.3
17 SGI 4D/25S             20.00    29342    16.7   14.0
18 IBM RS6000/320         20.00   51832?   29.3?   15.8
19 IBM RS6000/520         20.00   52183?   29.7?   15.8
20 AT&T Starserver E      33.00    47439    27.0   17.2
21 Stardent 3010          32.00    42695    24.3   18.6
22 DECstation 5000/200    25.00    45331    25.8   18.9
23 MIPS RC3260            25.00    42735    24.3   19.3
24 MIPS Magnum (RC3230)   25.00    43103    24.5   19.5
25 CDC 4360               25.00    46209    26.3   19.7
26 MIPS M/2000            25.00    47400    27.0   19.8
27 Sun SPARCstation 2     40.00    50075    27.5   20.2
28 IBM RS6000/530         25.00    64789    36.9   20.2
29 IBM RS6000/540         30.00    78187    44.5   24.0
30 HP 9000/720            50.00    98041    55.8   38.5
31 CDC 4680 (Beta)        60.00    97338    55.4   42.0
32 HP 9000/730            66.67   130897    74.5   51.0

Linear Correlation of:
[...]
MHz to SPECint89: 0.890

Dhrystone 1.1 to GCC1.35:   0.939
Dhrystone 1.1 to ESP:       0.974
Dhrystone 1.1 to LI:        0.939
Dhrystone 1.1 to EQN:       0.980
Dhrystone 1.1 to SPECint89: 0.971
[...]

Aburto also shows correlation coefficients to each of the components of SPECint89 and to clock speed. Note how much closer Dhrystone agrees with SPECint89 and each of its components than to clock speed.

Since Dhrystone is so closely correlated to SPECint89, it makes sense to use this data to compute the conversion factor. Note that the Dhrystone "MIPS" is the column labeled "Ratio" in Aburto's table. I wrote a simple program to compute an average ratio from the Aburto data, and got:

Out of 32 systems, geometric mean of Dhry/SPECint89 = 2435.09
Geometric mean of Dhrystone MIPS/SPECint89 = 1.39
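The averaging here is just a geometric mean of per-system ratios. A minimal sketch using four sample rows from Aburto's table (the 2435.09 figure comes from all 32 systems, so this small subset will not reproduce it exactly):

```python
import math

# (Dhrystone 1.1 loops/sec, SPECint89) for a few rows of Aburto's table
samples = [
    (1757,    1.0),  # DEC VAX 11/780
    (20206,   9.5),  # Sun SPARCstation 1
    (45331,  18.9),  # DECstation 5000/200
    (130897, 51.0),  # HP 9000/730
]

# Geometric mean of the Dhrystone/SPECint89 ratios
ratios = [dhry / spec for dhry, spec in samples]
geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

Dividing the result by 1757 (the VAX's raw Dhrystone score) gives the corresponding Dhrystone-MIPS-per-SPECint89 factor.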

Another data file from Aburto gives similar data for Whetstone and SPECfp89:

                              MWIPs        SPEC
   System                MHz   singl  doubl  fp89
-- --------------------  ----- -----  -----  ----
00 VAX 11/780             5.00  1.18   0.76   1.0
01 HP 9000/340           16.67  1.70   1.50   1.1
02 Sun 4/260             16.67  8.50   6.80   4.3
03 Sun SPARCstation 1    20.00  8.00   5.60   7.5
04 HP 9000/834           15.00  9.00   6.60   9.1
05 DECstation 3100       16.67 13.00  10.30  10.9
06 DEC DS5400            20.00 17.20         11.3
07 Sun SPARCstation 330  25.00 12.30   9.90  11.6
08 Sun SPARCstation 490  33.00 17.40  14.0   17.0
09 HP DN10010            18.2  14.3   14.1   20.7
10 Intel 486 66DX2       66.0  15.3   15.5   21.2
11 IBM RS6000/530        25.0  18.8          36.7
12 Convex C240           25.0  17.0   12.5   38.7

Aburto did not try to correlate Whetstone to SPECfp89 (there really aren't enough data points) but the average ratios are:

Geometric means:
  Whetstone Single/SPECfp ratio = 0.9867 (13 systems)
  Whetstone Double/SPECfp ratio = 0.8072 (11 systems)


Reference Machine Times for the 1989 SPEC Benchmark Suite for UNIX Systems

The times are from the old "specin89.tbl" and "specfp89.tbl" files formerly made available by Aburto and Simizu. 21 The similarities to SPEC92 programs were gleaned from a paper by Aashish Phansalkar et al. [62]

Reference machine: Digital VAX 11/780

CPU: 5 MHz VAX
Primary Cache: 8K
Secondary Cache: None
Memory: ?

SPECint89
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- --------------------
GCC             1481.5 GNU C compiler
ESP             2266.0 Minimization of boolean functions (Similar to 008.espresso)
LI              6206.2 Lisp interpreter (Similar to 022.li)
EQNtott         1100.8 Conversion from equation to truth table (Similar to 023.eqntott)

SPECfp89
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- --------------------
SPICE2G6       23951.4 Circuit simulation (Similar to 013.spice2g6)
DODUC           1863.0 Monte Carlo simulation (Similar to 015.doduc)
NASA7          20093.1 NASA Ames FORTRAN Kernels (Similar to 093.nasa7)
MATRIX300       4525.1 Matrix multiplication
FPPPP           3038.4 Quantum chemistry: simulates chemical reactions by evaluating definite integrals (Similar to 094.fpppp)
TOMCATV         2648.6 Vectorized mesh generation (Similar to 047.tomcatv)


Reference Machine Times for SPEC_CPU_92

The descriptions are from Jeffrey Reilly's SPEC FAQ 19.

Reference machine: Digital VAX 11/780

CPU: 5 MHz VAX
Primary Cache: 8K
Secondary Cache: None
Memory: ?

CINT92 (a.k.a. SPECint92)
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- -------------------------------
008.espresso      2270 Generates and optimizes Programmable Logic Arrays
022.li            6210 Uses a LISP interpreter to solve the nine queens problem, using a recursive backtracking algorithm
023.eqntott       1100 Translates a logical representation of a Boolean equation to a truth table
026.compress      2770 Reduces the size of input files by using Lempel-Ziv coding
072.sc            4530 Calculates budgets, SPEC metrics and amortization schedules in a spreadsheet based on the UNIX cursor-controlled package "curses"
085.gcc           5460 Translates preprocessed C source files into optimized Sun-3 assembly language output

CFP92 (a.k.a. SPECfp92)
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- -------------------------------
013.spice2g6     24000 Simulates analog circuits (double precision)
015.doduc         1860 Performs Monte-Carlo simulation of the time evolution of a thermo-hydraulic model for a nuclear reactor's component (double precision)
034.mdljdp2       7090 Solves motion equations for a model of 500 atoms interacting through the idealized Lennard-Jones potential (double precision)
039.wave5         3700 Solves particle and Maxwell's equations on a Cartesian mesh (single precision)
047.tomcatv       2650 Generates two-dimensional, boundary-fitted coordinate systems around general geometric domains (vectorizable, double precision)
048.ora           7420 Traces rays through an optical surface containing spherical and planar surfaces (double precision)
052.alvinn        7690 Trains a neural network using back propagation (single precision)
056.ear          25500 * Simulates the human ear by converting a sound file to a cochleogram using Fast Fourier Transforms and other math library functions (single precision)
077.mdljsp2       3350 Similar to 034.mdljdp2, solves motion equations for a model of 500 atoms (single precision)
078.swm256       12700 Solves the system of shallow water equations using finite difference approximations (single precision)
089.su2cor       12900 Calculates masses of elementary particles in the framework of the Quark Gluon theory (vectorizable, double precision)
090.hydro2d      13700 Uses hydrodynamical Navier Stokes equations to calculate galactical jets (vectorizable, double precision)
093.nasa7        16800 Executes seven program kernels of operations used frequently in NASA applications, such as Fourier transforms and matrix manipulations (double precision)
094.fpppp         9200 Calculates multi-electron integral derivatives (double precision)

* denotes the program with the longest runtime on the reference machine.


Reference Machine Times for SPEC_CPU_95

Reference machine: Sun SPARCstation 10 Model 40

CPU: 40MHz SuperSPARC I
Primary Cache: 20KBI+16KBD on chip
Secondary Cache: None
Memory: 128MB

CINT95 (a.k.a. SPECint95)
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- --------------------
099.go            4600 An internationally ranked go-playing program
124.m88ksim       1900 A chip simulator for the Motorola 88100 microprocessor
126.gcc           1700 Based on the GNU C compiler version 2.5.3
129.compress      1800 An in-memory version of the common UNIX utility
130.li            1900 Xlisp interpreter
132.ijpeg         2400 Image compression/decompression on in-memory images
134.perl          1900 An interpreter for the Perl language
147.vortex        2700 An object oriented database

CFP95 (a.k.a. SPECfp95)
Benchmark    Reference
Name         Time(Sec) Description
------------ --------- ---------------------
101.tomcatv       3700 Vectorized mesh generation
102.swim          8600 Shallow water equations
103.su2cor        1400 Monte-Carlo method
104.hydro2d       2400 Navier Stokes equations
107.mgrid         2500 3d potential field
110.applu         2200 Partial differential equations
125.turb3d        4100 Turbulence modeling
141.apsi          2100 Weather prediction
145.fpppp         9600 * From Gaussian series of quantum chemistry benchmarks
146.wave5         3000 Maxwell's equations

* denotes the program with the longest runtime on the reference machine.


Reference Machine Times for SPEC CPU2000

Reference machine: Sun Ultra 5/10 300MHz
  CPU: 300 MHz SPARC
  Primary Cache: 16KB I + 16KB D on chip
  Secondary Cache: 2MB (I+D) off chip
  Memory: 256MB

SPEC CINT2000

Benchmark      Reference
Name           Time(Sec)  Description
------------   ---------  ---------------------
164.gzip            1400  Compression
175.vpr             1400  FPGA Circuit Placement and Routing
176.gcc             1100  C Programming Language Compiler
181.mcf             1800  Combinatorial Optimization
186.crafty          1000  Game Playing: Chess
197.parser          1800  Word Processing
252.eon             1300  Computer Visualization
253.perlbmk         1800  PERL Programming Language
254.gap             1100  Group Theory, Interpreter
255.vortex          1900  Object-oriented Database
256.bzip2           1500  Compression
300.twolf           3000  Place and Route Simulator

SPEC CFP2000

Benchmark      Reference
Name           Time(Sec)  Description
------------   ---------  ----------------------
168.wupwise         1600  Physics / Quantum Chromodynamics
171.swim            3100* Shallow Water Modeling
172.mgrid           1800  Multi-grid Solver: 3D Potential Field
173.applu           2100  Parabolic / Elliptic Partial Differential Equations
177.mesa            1400  3-D Graphics Library
178.galgel          2900  Computational Fluid Dynamics
179.art             2600  Image Recognition / Neural Networks
183.equake          1300  Seismic Wave Propagation Simulation
187.facerec         1900  Image Processing: Face Recognition
188.ammp            2200  Computational Chemistry
189.lucas           2000  Number Theory / Primality Testing
191.fma3d           2100  Finite-element Crash Simulation
200.sixtrack        1100  High Energy Nuclear Physics Accelerator Design
301.apsi            2600  Meteorology: Pollutant Distribution

* denotes the program with the longest runtime on the reference machine.


Reference Machine Times for SPEC CPU2006

Reference machine: Sun Ultra Enterprise 2
  CPU: 296 MHz SPARC, 2 cores, 2 chips, 1 core/chip
  Primary Cache: 16KB inst + 16KB data on chip (each chip)
  Secondary Cache: 2MB (inst + data) off chip (each chip)
  Tertiary Cache: none
  Memory: 2 GB
  Storage: Two 36 GB 10,000 RPM SCSI hard drives (one dedicated to the
    operating system, one dedicated to SPEC code and dataset)

SPEC CINT2006

Benchmark       Reference
Name            Time(Sec)  Description
-------------   ---------  ---------------------
400.perlbench        9770  PERL programming language interpreter
401.bzip2            9650  General-purpose data compression
403.gcc              8050  C language optimizing compiler
429.mcf              9120  Combinatorial optimization (vehicle scheduling)
445.gobmk           10490  Plays Go and analyzes Go positions
456.hmmer            9330  Gene sequence search using Profile Hidden Markov
                           Models
458.sjeng           12100  Plays chess and several chess variants
462.libquantum      20720  Simulation of a quantum computer
464.h264ref         22130* H.264/AVC video compression
471.omnetpp          6250  Simulation of a large Ethernet network
473.astar            7020  2D path-finding used in game A.I.
483.xalancbmk        6900  Transforms XML documents into HTML, text, etc.

SPEC CFP2006

Benchmark       Reference
Name            Time(Sec)  Description
-------------   ---------  ----------------------
410.bwaves          13590  Fluid dynamics (blast waves)
416.gamess          19580* Quantum chemical computations
433.milc             9180  Physics / Quantum chromodynamics (QCD)
434.zeusmp           9100  Physics / Magnetohydrodynamics
435.gromacs          7140  Chemistry / Molecular Dynamics
436.cactusADM       11950  Physics / General Relativity
437.leslie3d         9400  Computational Fluid Dynamics (CFD)
444.namd             8020  Classical Molecular Dynamics
447.dealII          11440  PDE solving by Adaptive Finite Element Method
450.soplex           8340  Linear algebra solution by the Simplex algorithm
453.povray           5320  Ray Tracing / rendering
454.calculix         8250  Finite element modeling of 3D structures
459.GemsFDTD        10610  Maxwell equation solution by finite-difference
                           time-domain (FDTD) method
465.tonto            9840  Quantum chemistry / crystallography
470.lbm             13740  Incompressible fluid simulation by Lattice
                           Boltzmann Method (LBM)
481.wrf             11170  Weather Research and Forecasting (WRF) Model
482.sphinx3         19490  Speech recognition system

* Unlike previous suites, there is no special role given to the program with the longest runtime on the reference machine.
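As described in the History section, each component's score is the ratio of the reference machine's time to the measured time, and the suite score is the geometric mean of those ratios. A minimal sketch in Python, using reference times from the CINT2006 table above; the measured run times here are made up purely for illustration:

```python
from math import prod

def spec_ratio(ref_time, run_time):
    # A component's ratio: how many times faster than the reference machine
    return ref_time / run_time

def composite(ratios):
    # The suite score is the geometric mean of the component ratios
    return prod(ratios) ** (1.0 / len(ratios))

# Reference times (seconds) from the CINT2006 table; the run times are
# hypothetical, chosen only to illustrate the calculation.
ref = {"400.perlbench": 9770, "401.bzip2": 9650, "403.gcc": 8050}
run = {"400.perlbench": 977, "401.bzip2": 1930, "403.gcc": 805}

ratios = [spec_ratio(ref[b], run[b]) for b in ref]
print(round(composite(ratios), 2))  # geometric mean of 10.0, 5.0, 10.0 -> 7.94
```

Note that on the reference machine itself every ratio is 1 by construction, which is why the Ultra Enterprise 2 defines the CPU2006 baseline.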

Here are slightly longer descriptions of the benchmark components. For even more thorough descriptions, see the "CPU2006 Benchmark Descriptions" document from SPEC[64].

SPEC CINT2006 components:

400.perlbench : PERL programming language interpreter, with most OS-specific features removed

401.bzip2 : General-purpose data compression — No I/O other than reading input. All compression and decompression happens entirely in memory.

403.gcc : C language optimizing compiler, targeting AMD Opteron, with inlining heuristics altered slightly to use more memory

429.mcf : Combinatorial optimization / Single-depot vehicle scheduling

445.gobmk : Plays Go and analyzes Go positions

456.hmmer : Search a gene sequence database using Profile Hidden Markov Models

458.sjeng : based on Sjeng 11.2 (freeware), plays chess and several chess variants.

462.libquantum : simulation of a quantum computer using libquantum library

464.h264ref : H.264/AVC (Advanced Video Coding) Video compression

471.omnetpp : Simulation of a large Ethernet network, based on the OMNeT++ discrete event simulation system

473.astar : A.I. path finding; a portable 2D path-finding library used in game AI.

483.xalancbmk : modified version of Xalan-C++, an XSLT processor for transforming XML documents into HTML, text, or other XML document types.

SPEC CFP2006 components:

410.bwaves : numerically simulates blast waves in three dimensional transonic transient laminar viscous flow.

416.gamess : Quantum chemical computations

433.milc : Physics / Quantum chromodynamics (QCD); serial (single CPU) version of su3imp, simulates behavior of quarks and gluons according to lattice gauge theory.

434.zeusmp : Physics / Magnetohydrodynamics — ZEUS-MP, simulates astrophysical phenomena

435.gromacs : Chemistry / Molecular Dynamics — simulation of Newtonian equations of motion for systems with hundreds to millions of particles

436.cactusADM : Physics / General Relativity — combination of Cactus, an open source problem solving environment, and BenchADM, a computational kernel representative of many applications in numerical relativity

437.leslie3d : LESlie3d, a research level Computational Fluid Dynamics (CFD) code used to investigate a wide array of turbulence phenomena

444.namd : Classical Molecular Dynamics Simulation. Data layout and inner loop of NAMD, a parallel program for the simulation of large biomolecular systems.

447.dealII : PDE solving by Adaptive Finite Element Method

450.soplex : based on SoPlex Version 1.2.1; solves a linear program using the Simplex algorithm.

453.povray : Computer Visualization / Ray Tracing / rendering

454.calculix : CalculiX, a free finite element code for modeling linear and nonlinear three dimensional structures using classical theory of finite elements.

459.GemsFDTD : Solves the Maxwell equations in 3D in the time domain using the finite-difference time-domain (FDTD) method.

465.tonto : Tonto is an open source quantum chemistry package, adapted for crystallographic tasks

470.lbm : implements the Lattice Boltzmann Method (LBM) to simulate incompressible fluids.

481.wrf : Weather Research and Forecasting (WRF) Model, a mesoscale numerical weather prediction system serving both operational forecasting and atmospheric research needs.

482.sphinx3 : based on the Sphinx-3 speech recognition system.


Thermal Data for SPEC CPU2006

Huan Liu studied[1] patterns in the usage of "public" (i.e. rented to customers) cloud servers by measuring their CPU temperature at various times over a long period. To establish the correlation (or lack thereof) between "CPU utilization" and temperature, he also measured the amount of heat generated in a workstation (with an Intel Core 2 Duo E6300 processor) while it ran various individual components of the SPEC CPU2006 suite:

name          temperature†   name            temperature†
410.bwaves       -44.65      400.perlbench      -42.94
416.gamess       -44.73      401.bzip2          -45.35
433.milc         -49.42      403.gcc            -47.81
434.zeusmp       -45.63      429.mcf            -50.12
435.gromacs      -49.47      445.gobmk          -44
436.cactusADM    -48.99      456.hmmer          -43.61
437.leslie3d     -48.07      458.sjeng          -44.01
444.namd            --       462.libquantum     -48.85
447.dealII          --       464.h264ref        -45
450.soplex          --       471.omnetpp           --
453.povray          --       473.astar             --
454.calculix     -42.67      483.xalancbmk         --
459.GemsFDTD     -47.63
465.tonto        -45.47
470.lbm          -50
481.wrf             --
482.sphinx3      -45.74
† Temperature is shown in degrees Celsius below the CPU's shutdown temperature: higher (less negative) numbers are hotter. A "--" entry means no value was reported. Source: Liu[1], table 1.
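Since hotter components have less negative values, ranking them is a simple matter of sorting. A small Python sketch over the reported values from Liu's table (components with no reported temperature omitted):

```python
# Temperatures from Liu's table, in degrees Celsius below the CPU's
# shutdown temperature (less negative = hotter). Components with no
# reported value are omitted.
temps = {
    "410.bwaves": -44.65, "416.gamess": -44.73, "433.milc": -49.42,
    "434.zeusmp": -45.63, "435.gromacs": -49.47, "436.cactusADM": -48.99,
    "437.leslie3d": -48.07, "454.calculix": -42.67, "459.GemsFDTD": -47.63,
    "465.tonto": -45.47, "470.lbm": -50.0, "482.sphinx3": -45.74,
    "400.perlbench": -42.94, "401.bzip2": -45.35, "403.gcc": -47.81,
    "429.mcf": -50.12, "445.gobmk": -44.0, "456.hmmer": -43.61,
    "458.sjeng": -44.01, "462.libquantum": -48.85, "464.h264ref": -45.0,
}

hottest = max(temps, key=temps.get)  # least negative value
coolest = min(temps, key=temps.get)  # most negative value
print(hottest, coolest)  # 454.calculix 429.mcf
```

Interestingly, the hottest-running component (454.calculix) is an integer-heavy finite-element code rather than one of the dense floating-point kernels.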

[1] Huan Liu, A Measurement Study of Server Utilization in Public Clouds, 2011.


Footnotes and References

1 : http://www.spec.org/cpu2006/Docs/readme1st.html SPEC, CPU2006 Read Me First, question 23 (reference machine)

2 : http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00001.txt SPEC, CINT2006 results posted by Sun Microsystems for Ultra Enterprise 2 system, March 2006.

3 : http://www.spec.org/cpu2006/results/res2006q3/ SPEC, CPU2006 Results submitted in Third Quarter 2006.

4 : http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00047.txt Spec, CINT2006 results posted by Sun Microsystems for Sun Blade 1000 system.

5 : The use of three runs and discard of highest and lowest times is clear from any of the result reports. For example, look at the 401.bzip2 times in the CINT2006 summary for AMD Shuttle SN25P.

6 : http://www.spec.org/cpu2006/Docs/readme1st.html SPEC, CPU2006 Read Me First, question 15 (concerning "rate" vs. "speed").

7 : http://www.spec.org/cpu2006/Docs/readme1st.html SPEC, CPU2006 Read Me First, question 13 (definitions of the metrics).

8 : This simple and fairly obvious formula was verified by looking at numbers from posted results. For example, see the test run times in the CINT_rate2006 results for the AMD Shuttle SN25P and the CINT2006 results for the reference machine.

9 : http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00008.txt SPEC, CINT_rate2006 results submitted by Advanced Micro Devices for Shuttle SN25P, Q3 2006.

10 : median: the middle of a set of data values. For example, in the set of three values {3, 8, 10}, the median is 8. Note this is different from the mean or average, which in this example would be 7=(3+8+10)/3.
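The distinction is easy to check with Python's standard statistics module, using the same three values as the footnote:

```python
from statistics import median, mean

runs = [3, 8, 10]    # e.g. three measured run times
print(median(runs))  # 8 -- the middle value
print(mean(runs))    # 7 -- (3+8+10)/3
```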

11 : http://www.spec.org/cpu2006/results/res2006q4/ SPEC, CPU2006 Results submitted in Fourth Quarter 2006. Note the CINT2006 Rates entry for IBM System X 3800 — they ran 16 copies on a system with 8 cores, because the system's Intel Xeon processor cores support 2 threads per core. (Intel's "hyperthreading" is essentially a hardware implementation of two virtual CPUs using duplicate register sets, register renaming, and interleaved scheduling of instructions from the two instruction streams. This adds efficiency because the small number of registers in the x86 user programming model is not enough to keep a long pipeline busy.)

12 : http://www.spec.org/cpu92/specrate.txt Alexander Carlton, "CINT92 and CFP92 Homogeneous Capacity Method". Describes the normalization formula used in the rate metrics (scroll down to the description of "ReferenceFactor").

13 : http://www.spec.org/cpu92/specrate.txt ibid.; in "History" section near the end.

14 : http://www.spec.org/cpu95/results/res9509/p019.html SPEC, SPECint95 results submitted by Digital Equipment Corp. for the AlphaStation 200 4/100, September 1995.

15 : http://www.spec.org/cpu95/results/res9509/p074.html SPEC, CINT95rate results submitted by Digital Equipment Corp. for the AlphaStation 200 4/100, September 1995.

16 : http://performance.netlib.org/performance/html/spec.suni40.cint92.6_93.notes.html The Performance Database Server at Netlib, Benchmark CINT92 summary, results for SPARCstation 10 Model 40 submitted by Sun Microsystems Inc. and published by SPEC in June 1993.

17 : http://performance.netlib.org/performance/html/spec.sunf41.cfp92.6_93.notes.html The Performance Database Server at Netlib, Benchmark CFP92 summary, results for SPARCstation 10 Model 40 submitted by Sun Microsystems Inc. and published by SPEC in June 1993.

18 : http://www.islandnet.com/~kpolsson/workstat/work1991.htm Ken Polsson, Chronology of Workstation Computers 1991-1992

19 : http://ftp.sunet.se/pub/benchmark/aburto/faq/spec.faq Jeffrey Reilly, Answers to Frequently Asked Questions about SPEC Benchmarks, 1993 June 4 (formerly at http://gd.tuwien.ac.at/perf/benchmark/aburto/faq/spec.faq)

20 : http://groups.google.com/group/comp.benchmarks/msg/57d2ed8f7915deba John DiMarco, "SPECmark table", comp.benchmarks article, 1994 Jan 10. Lists SPEC CPU92 and SPEC89 results in three tables. A copy is here

21 : Al Aburto, specin89.tbl and specfp89.tbl, text files published via anonymous FTP at ftp.nosc.mil (no longer available).

22 : See [62].

23 : http://staff.stir.ac.uk/b.m.bullen/tech97.htm Brian Bullen (University of Stirling), "Note on PROCESSORS" from Technical Strategy. Describes features and gives performance figures for a number of machines that were being used at the university in 1996.

24 : http://groups.google.com/group/comp.benchmarks/msg/50df1ead101fa5e8 John DiMarco, "SPECmark table", comp.benchmarks article, 1995 May 16. Lists many SPEC CPU92 results in a table.

25 : http://www.spec.org/osg/cpu2000/results/res2000q2/cpu2000-20000511-00100.asc SPEC, CFP2000 results submitted by Compaq Computer Corporation for the AlphaServer GS160 Model 6/731 in May 2000. The integer results are here.

26 : http://www.spec.org/osg/cpu2000/results/res2000q2/cpu2000-20000605-00120.asc SPEC, CFP2000 rate results submitted by Compaq Computer Corporation for the AlphaServer GS160 Model 6/731 in May 2000. Its integer rate results are here.

27 : http://www.islandnet.com/~kpolsson/workstat/work1998.htm Ken Polsson, Chronology of Workstation Computers 1987-1990

28 : Al Aburto, "This is the data set of Dhrystone and SPECratio and SPECin89 data I have" (speccorr.tbl), September 1992. Text file made available by anonymous FTP from ftp.nosc.mil — no longer available online, but he described an earlier version of this work in this article in comp.benchmarks.

29 : http://www.unixnerd.demon.co.uk/sun_unix.html John Burns, "Sun - The Unix Enthusiast's Choice", web page giving SPEC ratings of SUN workstations including a few models not in SPEC's published reports.

30 : http://www.spec.org/cpu95/results/res98q1/cpu95-980128-02369.asc SPEC, CINT95 results posted by Sun Microsystems for Ultra 10 300MHz system.

31 : http://www.spec.org/cpu95/results/res98q1/cpu95-980128-02370.asc SPEC, CFP95 results posted by Sun Microsystems for Ultra 10 300MHz system.

32 : http://www.spec.org/osg/cpu2000/results/res2006q2/cpu2000-20060612-06193.asc SPEC, CINT2000 results posted by Sun Microsystems for Ultra Enterprise 2 system.

33 : http://www.spec.org/osg/cpu2000/results/res2006q2/cpu2000-20060612-06194.asc SPEC, CFP2000 results posted by Sun Microsystems for Ultra Enterprise 2 system.

34 : http://performance.netlib.org/performance/html/spec.ibm220.cint92.6_93.notes.html SPEC, SPECint92 results posted by IBM for RISC System/6000 POWERstation M20/220 system, June 1993.

35 : http://performance.netlib.org/performance/html/spec.hp755.crint.6_93.notes.html SPEC, SPECint92 results posted by Hewlett-Packard Company for HP Apollo 9000/755 system, June 1993.

36 : http://performance.netlib.org/performance/html/spec.delli1.cint92.12_95.notes.html SPEC, SPECint92 results posted by Dell Computer Corporation for Dell Dimension XPS (133MHz, 512KB L2) system, December 1995.

37 : http://performance.netlib.org/performance/html/spec.dellir1.crint92.12_95.notes.html SPEC, SPECrate_int92 results posted by Dell Computer Corporation for Dell Dimension XPS (133MHz, 512KB L2) system, December 1995.

38 : http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00003.txt SPEC, CINT2006 Rate results posted by Sun Microsystems for Ultra Enterprise 2 system, March 2006.

39 : http://www.spec.org/cpu2006/Docs/runrules.html SPEC CPU2006 Run Rules

40 : http://www.spec.org/cpu2000/docs/runrules.html SPEC, CPU2000 Run Rules, v1.3A, June 2006. (Previously at http://www.spec.org/cpu2000/docs/runrules.html)

41 : http://www.spec.org/cpu95/rules/RUNRULES.txt SPEC, CPU95 Run Rules.

42 : http://www.spec.org/cpu2006/Docs/runrules.html SPEC, CPU2006 Run Rules, candidate 4, July 2006

43 : This change to a more obvious relation between the speed and rate metrics reflects the development in both hardware and software: by 2006, almost all systems being subjected to SPEC CPU testing had at least two processors or at least two cores, and software that utilized multiple cores had become quite common. Note also the subtle but significant change in wording in section 4.3.2. The CPU2000 run rules state:

It is permitted to use the SPEC tools to generate a 1-cpu rate disclosure from a 1-cpu speed run. The reverse is not permitted.

The CPU2006 run rules have the same statement, but with "not" changed to "also" and "cpu" changed to "copy":

It is permitted to use the SPEC tools to generate a 1-copy rate disclosure from a 1-copy speed run. The reverse is also permitted.

(In both quotes, the emphasis is mine.) The shift in emphasis from "cpus" to "copies" reflects the advent of multithreading and the confusion in nomenclature of "CPU" vs. "core" and so on; see the following footnotes.44,45,46 See also the Read Me First, question 19.50

44 : processes : The number of programs you want to run at once, or the number of pieces of a program that are run simultaneously in order to divide the work of a large task. These pieces can be called "processes", "tasks" or (software) "threads". Depending on the hardware, this might be one per "CPU" or "processor", one per "core", one per "virtual core" or "hardware thread", etc. Companies doing SPEC benchmarks on their systems figure out how many processes will produce the best score (highest performance).

45 : CPUs : At the time of SPECrate92, it was still the case that systems had "processors" each containing one "core" and there was as yet no simultaneous multithreading. However, really what was being measured was the number of processes44 being run at once, and for modern systems that is almost always different from the number of "CPUs".

46 : Or on multiple processor cores in a single-"CPU" system, or even multiple threads in a single core system that implements hardware simultaneous multithreading.

47 : Here is my comparison of the Sun Ultra 5/10 300 MHz to an Intel Core 2, used as an example in the introduction:

A Sun Ultra 5/10 300 MHz is the reference machine for CPU2000. Its CPU95 scores are: SPECint95 = 12.1 and SPECfp95 = 12.9. By definition its SPECint2000 and SPECfp2000 scores are both 100. Using the conversion given above, estimates of its CPU2006 scores would be SPECint2006 = 0.862 and SPECfp2006 = 0.680.

For a 2.8 GHz Core 2 Duo, I will choose the IBM System x3200 M2 which uses a 2.8 GHz processor.48 For that system we have SPECint2006 = 23.8 and SPECfp2006 = 21.5.

So the ratio of speeds between the Sun Ultra and the Intel Core 2 would be about 23.8/0.862 = 27.6 for int, and about 21.5/0.68 = 31.6 for floating-point.
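The arithmetic in this comparison can be reproduced directly, using the scores quoted above:

```python
# Estimated CPU2006 scores for the Sun Ultra 5/10 (from the conversion above)
sun_int, sun_fp = 0.862, 0.680
# Published CPU2006 scores for the IBM System x3200 M2 (2.8 GHz Core 2 Duo)
ibm_int, ibm_fp = 23.8, 21.5

print(round(ibm_int / sun_int, 1))  # ~27.6x for integer
print(round(ibm_fp / sun_fp, 1))    # ~31.6x for floating-point
```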

48 : http://www.spec.org/cpu2006/results/res2009q3/cpu2006-20090817-08395.pdf SPEC, SPECint2006 results posted by IBM for System x3200 M2(Intel Core 2 Duo E7400) system, May 19 2009. FP results here.

49 : http://www.spec.org/cpu/cpuv6/ SPEC, CPUv6 Benchmark Search Program (loaded in early 2010).

As of early 2010, work was underway to qualify candidate benchmarks for a successor to the CPU2006 suite. Submissions were accepted through the end of June 2010, and each submission was to go through a certification process that could last one or two years.

50 : http://www.spec.org/cpu2006/Docs/readme1st.html SPEC, SPEC CPU2006: Read Me First, question 19. In full, it states:

Q19: Is there a way to translate SPEC CPU2000 results to SPEC CPU2006 results or vice versa?

There is no formula for converting CPU2000 results to CPU2006 results and vice versa; they are different products. There probably will be some correlation between CPU2000 and CPU2006 results (i.e., machines with higher CPU2000 results often will have higher CPU2006 results), but there is no universal formula for all systems.

SPEC encourages SPEC licensees to publish CPU2006 numbers on older platforms to provide a historical perspective on performance.

51 : http://www.spec.org/cpu2000/Docs/utility.html SPEC, SPEC CPU2000 Utility Programs, rawformat section, question 3. In full, it states:

How do you generate a 1-cpu rate result from a 1-cpu speed result?

To generate a 1-cpu rate result from a speed run, copy the original rawfile to another location, and use rawformat to both generate the new rawfile and whatever other reports you want. For example:

$ grep SPECf CFP2000.015.asc
   SPECfp_base2000              176
   SPECfp2000                    --
$ cp CFP2000.015.raw convertme
$ rawformat --output_format asc,raw,ps --rate convertme
runspec v2.00 - Copyright (C) 1999 Standard Performance Evaluation Corporation
Loading standard modules..............
Loading runspec modules.............
Identifying output formats...asc...config...html...pdf...ps...raw...
Formatting convertme
  format: ASCII -> convertme.asc
  format: raw -> convertme.raw
  format: PostScript -> convertme.ps
$ grep SPECf convertme.asc
   SPECfp_rate_base2000        2.05
   SPECfp_rate2000               --

NT Notes: on NT systems, you may find that --output_format will only accept one argument at a time. So, first create the rawfile, by using --output_format raw, then use the new rawfile to create the other reports.

52 : http://www.spec.org/cpu2006/Docs/utility.html SPEC, SPEC CPU2006 Utility Programs, rawformat section, question "How do you generate a rate result from a speed result? (or vice-versa)". Sample output is shown that is similar to the CPU2000 version51, converting from 10.1 to 10.1 and back again.


Bibliography

Here are several papers related directly or indirectly to the SPEC benchmarks. All discuss issues that matter to anyone who hopes to implement or use benchmarks.

[54] H J Curnow and B A Wichmann, A synthetic benchmark. Computer Journal 19 (1) pp. 43-49 (1976).

Describes the Whetstone benchmark design and limitations; gives test results and source code in ALGOL.

[55] John Gustafson et al., The Design of a Scalable, Fixed-Time Computer Benchmark. Journal of Parallel and Distributed Computing 12 (4) pp. 388-401 (1991). At Gustafson's website.

Describes a benchmark that automatically adapts to growing computer memory and speed, while meeting the objectives of portability, usefulness and relevance, and based on a graphics algorithm called "radiosity". This is a predecessor to the HINT benchmark.

[56] Kaivalya M. Dixit, Overview of the SPEC Benchmarks. The Benchmark Handbook, (1993).

Describes the first two versions (1989 and 1992) of the SPEC benchmark suite; includes tables of CPU92 results.

[57] R. Giladi and N. Ahituv, SPEC as a performance evaluation measure. Computer 28 (8) pp. 33-42, August 1995.

Gives descriptions and stats for the CPU89 and CPU92 components, and regression analysis of all components against each other.

[58] John L. Gustafson and Quinn O. Snell, HINT: A New Way To Measure Computer Performance. Proceedings of the 28th Hawaii International Conference on System Sciences, ISBN:0-8186-6935-7, p. 392 (1995). At citeseer and the Ames Laboratory

Describes reasons for the unsuitability of SPEC and other non-scalable benchmarks, mainly from the criticism that they do not adapt to variable size and capability of systems, and variations in problem sizes that real users perform. Gives a detailed description of the HINT benchmark, with notes on how to parallelize, and test results, but no source code.

[59] Mark Claypool, Touchstone — A Lightweight Processor Benchmark. Computer Science Technical Report series, Worcester Polytechnic Institute. At citeseer and the WPI FTP server

Describes a counter-loop benchmark similar to BogoMIPs and correlates it to gcc, LINPACK, SPEC and quicksort.

[60] John L. Gustafson and Rajat Todi, Conventional Benchmarks as a Sample of the Performance Spectrum, The Journal of Supercomputing 13 (3) pp. 321-342, May 1999. Available from the HINT homepage

Describes several popular benchmark approaches, including LINPACK, SPEC, STREAM, Whetstone/Dhrystone, and several others, and characterizes them as points along a typical curve of the tradeoff between performance and memory footprint given by the HINT benchmark. Lots of specific data and cool formulas like "spec95int ~= 7.5 x (MQUIPS at 180 KB)"

[61] Aashish Phansalkar, et al., Four Generations of SPEC CPU Benchmarks: What has Changed and What has Not, 2004. At citeseer and U Texas.

Similar to the 2005 paper by the same authors.

[62] Aashish Phansalkar, et al., Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites. Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005 ISBN 0-7803-8965-4 pp. 10-20 (2005). At citeseer and U Texas.

Lists each of the component programs of CPU89, CPU92, CPU95 and CPU2000, and gives their similarities, differences, and classifications into similar categories. Includes extensive analysis based on runtime profiling.

[63] K. Hoste, et al., Performance Prediction based on Inherent Program Similarity, PACT '06 Proceedings of the 15th international conference on Parallel architectures and compilation techniques, ISBN:1-59593-264-X (2006). At citeseer and U Virginia.

This paper explains the analysis techniques and objectives of the 2004, 2005 and 2007 papers by Phansalkar, et al.

[64] SPEC, SPEC CPU2006 Benchmark Descriptions (PDF file), 2006.

Gives extensive descriptions of each of the component programs in the SPEC CPU2006 benchmark suite.

[65] Aashish Phansalkar, et al., Analysis of Redundancy and Application Balance in the SPEC CPU2006 Benchmark Suite, ISCA '07 Proceedings of the 34th annual international symposium on Computer architecture ISBN: 978-1-59593-706-3. At U Texas.

The authors of the above papers from 2004 and 2005 perform a similar analysis on the benchmarks in the SPEC CPU2006 suite.


SPEC® and the benchmark names SPECint89, SPECfp89, SPECmark, SPEC CPU95, CINT95, SPECint_base95, SPECint95, SPECint_base_rate95, SPECint_rate95, CFP95, SPECfp_base95, SPECfp95, SPECfp_base_rate95, SPECfp_rate95, SPEC CPU2000, CINT2000, SPECint_base2000, SPECint2000, SPECint_rate_base2000, SPECint_rate2000, CFP2000, SPECfp_base2000, SPECfp2000, SPECfp_rate_base2000, SPECfp_rate2000, SPEC CPU2006, CINT2006, SPECint®_base2006, SPECint2006, SPECint®_rate_base2006, SPECint_rate2006, CFP2006, SPECfp®_base2006, SPECfp2006, SPECfp®_rate_base2006, and SPECfp_rate2006 are registered trademarks of the Standard Performance Evaluation Corporation. The examples given above are drawn from test results, disclosures and documentation published on www.spec.org on (and prior to) Jan 25, 2007.


Robert Munafo's home pages on HostMDS   © 1996-2014 Robert P. Munafo.
This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. Details here.

This page was last updated on 2014 Nov 21. s.11