The SPEC Benchmarks
This page describes the SPEC benchmarks, giving their history, a description of how they are measured and how the scores are calculated, and useful formulas for converting between one SPEC benchmark and another.
History
SPEC, the Standard Performance Evaluation Corporation, is an organization dedicated to producing benchmarks that are reasonably scientific, unbiased, meaningful and relevant. It was formed in September 1988 and released its first set of benchmarks, the SPEC Benchmark Suite for UNIX Systems version 1.0, in October 1989 [27].
The suite consisted of 10 programs which could be run and measured to produce three scores: integer SPECmark, floating-point SPECmark, and overall SPECmark. For each program, the speed of running that program on a test machine was measured relative to the speed of the same program running on the "Reference Machine", a VAX 11/780. The SPECint score was a geometric mean of the speeds of the 4 programs used to measure integer performance, and SPECfp was the geometric mean of the other 6 programs. The mean of all 10 programs yielded the SPECmark score.
In December 1991, SPEC renamed the integer and floating-point SPECmarks to SPECint and SPECfp, respectively [18]. The next month, they released SPECint92 and SPECfp92, and the 1989 versions became known as SPECint89 and SPECfp89 [18]. There was no overall "SPECmark92" combining the integer and floating-point performance into a single score.
The 1992 suite used a greater number of programs to evaluate performance, and introduced the SPECrate metric for multi-CPU machines. The SPECrate measurement involves running multiple copies of a benchmark program simultaneously, and the formula is a bit more complicated because it needs to include an extra variable for the number of copies that were running simultaneously.
In 1995, 2000 and 2006, SPEC released updated benchmark suites, each with a greater number of programs than its predecessor, and each using larger amounts of code and larger datasets. This brought the total to five suites: 1989, 1992, 1995, 2000 and 2006. Each suite defines a standard reference machine and a set of programs to run, and formulas for computing the scores. All but the 1989 version also measure multi-CPU throughput.
The SPEC benchmarks are continually updated because of two problems that affect most (but not all) benchmark methods:
Issues Preventing Comparison
Comparing SPECint vs. SPECrate_int, etc.
SPEC advises against comparing SPECrate_int and SPECrate_fp scores to SPECint and SPECfp, respectively. Prior to the 2006 suite, they made it a little difficult by using formulas that create quite different results, and by not describing the formulas explicitly. However, it is easy to determine what the formulas are, and if you know what you're comparing, a direct comparison can be quite meaningful. Such a comparison is meaningful when, for example, a computer user has to perform 12 independent runs of the same program on 12 different but equally demanding datasets, and has a choice between running 4 copies at a time (for a total of 3 runs) on a 4-CPU machine, or running them one at a time on a 1-CPU machine. In such scenarios the application is described as being "easily" or "trivially" parallelized.
Comparing SPECint95 vs. SPECint2000, etc.
Each new benchmark suite involves a greater number of programs, generally with a larger memory usage and greater running time (when compared to programs from the older suite running on the same test machine). Because of the phenomena described here, this means that the SPEC scores from different suites cannot be directly compared. For example, SPECint89 programs use less memory than SPECint95 programs. Therefore, a system with a large CPU data cache and relatively small amount of RAM will do relatively well on SPECint89 and relatively poorly on SPECint95, and a machine with a small CPU data cache and large amount of RAM will do worse on SPECint89 and better on SPECint95. However, most actual machines have a balanced amount of cache, memory, and other necessary system components. As a result, SPEC benchmarks from the different years are actually quite closely correlated.
The 1989 SPEC Benchmark Suite for UNIX Systems
This suite, later renamed "SPECint89" and "SPECfp89", consists of 10 programs: GCC, ESP, LI, EQNtott, SPICE2G6, DODUC, NASA7, MATRIX300, FPPPP and TOMCATV. Each is representative of a type of task that computers were being used for at the time the suite was developed. The reference machine and program run times are listed here.
SPECratios are obtained by dividing program run time into VAX 11/780 run time:
SPECratio(GCC) = 1481.5 / runtime_TM(GCC)
1481.5 = runtime of GCC on the reference machine
TM = test machine
Thus, SPECratios are higher for faster machines.
SPECint89 is the geometric mean of the SPECratios for the 4 SPECint89 programs, and SPECfp89 is defined similarly. A machine with SPECint89 of 2.0 is about twice as fast (at integer calculations) as a VAX 11/780.
For SPEC CPU92, the reference machine is the same as in CPU89, a VAX 11/780. The VAX is given a SPECint92 and SPECfp92 score of 1.0. Each program selected for use in CPU92 is run on the VAX to determine its reference time, denoted below by Tref(program). The programs and reference times are listed here.
Single-CPU Benchmarks: SPECint92 and SPECfp92
Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run and its runtime measured. The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:
Ratio = RefTime / RunTime
For example, running 085.gcc on an IBM POWERstation M20/220 takes 413.0 seconds, and the reference machine takes 5460 seconds. So, the ratio for 085.gcc on the POWERstation M20/220 is [34]:
ratio(085.gcc) = Tref(085.gcc) / TSUT(085.gcc) = 5460 / 413.0 = 13.22
Tref(085.gcc) = runtime of 085.gcc on the reference machine (a VAX 11/780) = 5460
SUT = system under test
TSUT(085.gcc) = runtime of 085.gcc on SUT = 413.0
In the integer test suite there are 6 programs. SPECint92 is the geometric mean of the ratios for the 6 programs, and SPECfp92 is defined similarly. A machine with SPECint92 of 2.0 is about twice as fast (at integer calculations) as the VAX 11/780. Here are the runtimes and ratios for all 6 of the CINT92 programs on the POWERstation M20/220 [34]:
To determine the SPECint92 score for the POWERstation, the geometric mean of the 6 ratios is computed:
mean = (19.7 × 15.6 × 28.9 × 14.0 × 16.0 × 13.2)^(1/6) = 17.24
This average 17.24 is the SPECint92 score.
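The ratio and geometric-mean steps above can be checked with a short Python sketch. The function names are illustrative only and are not part of any SPEC tool; the numbers are the ones quoted above for the POWERstation M20/220.

```python
import math

def spec_ratio(ref_seconds, run_seconds):
    """Reference runtime divided by measured runtime (higher means faster)."""
    return ref_seconds / run_seconds

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# 085.gcc: 5460 s on the VAX 11/780, 413.0 s on the system under test.
gcc_ratio = spec_ratio(5460, 413.0)

# The six CINT92 ratios used in the mean above.
cint92_ratios = [19.7, 15.6, 28.9, 14.0, 16.0, 13.2]
specint92 = geometric_mean(cint92_ratios)

print(f"085.gcc ratio: {gcc_ratio:.2f}")
print(f"SPECint92:     {specint92:.2f}")
```

A geometric mean is used rather than an arithmetic mean so that no single program with a very large ratio dominates the overall score.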
SPECrate92
The formula for calculating the rate metrics is more complex than it was in CPU89, largely because of misunderstanding and misuse. This quotation describes the problem [13]:
Unfortunately it was easy to make invalid comparisons between SPECmark89s and SPECthruput89s or even mistake values between these metrics. It is not fair to compare the speed of a uni-processor machine against the throughput of a multi-processor. However, many believed that it would be acceptable to compare SPECthruput89s against SPECmark89s, because the SPECthruput89 looked like a SPECmark89 both in terms of the results and the means to calculate those results.
A SPECrate92 for a given machine is calculated as follows:
SPECrate = geom mean [ SPECrate(program) ]
SPECrate(program) = N × [ Tref(program) / Tref(056.ear) ] × [ 604800 / TSUT(program) ]
N = number of copies run concurrently (this can be different for each program, chosen by the tester to maximize the SPECrate; for an example see the results for the HP Apollo 9000/755 [35])
Tref(program) = time to run program on the reference machine, a VAX 11/780
Tref(056.ear) = time to run 056.ear on the reference machine = 25500 seconds [12]
604800 = number of seconds in a week
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test
For example, running 2 copies of 008.espresso on an HP Apollo 9000/755 took 48 seconds [35]. Applying the formula:
SPECrate(008.espresso) = N × [ Tref(008.espresso) / Tref(056.ear) ] × [ 604800 / TSUT(008.espresso) ]
= 2 × [ 2270 / 25500 ] × [ 604800 / 48 ]
= 2243.29
Thus the SPECrate for 008.espresso on the HP Apollo 9000/755 is 2243, as you can see in that machine's report [35].
It is useful to note that for this machine, different programs were run with different values of N. This is allowed by the SPEC CPU92 run rules and provides for the possibility that a tester might recognize that certain tasks are more suited to greater parallelism than others. In the specific case of the HP Apollo 9000/755, which is a single-processor machine, some programs were run with N=1 and others with N=2 or N=3.
The overall SPECrate_int92 for the machine is the geometric mean of the SPECrates for each of the integer programs:
SPECrate_int92 = (2243 × 2055 × 2115 × 1564 × 1852 × 1742)^(1/6) = 1914.17
rounded off to 1914, as you can see in the report [35].
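As a rough check of the SPECrate92 arithmetic, the per-program formula and the overall geometric mean can be written as a small Python sketch. The names `specrate92`, `SECONDS_PER_WEEK` and `TREF_EAR` are chosen here for illustration; the inputs are the HP Apollo 9000/755 figures quoted above.

```python
import math

SECONDS_PER_WEEK = 604800
TREF_EAR = 25500  # VAX 11/780 runtime of 056.ear, in seconds

def specrate92(copies, tref_program, tsut_program):
    """Per-program SPECrate92: N x (Tref / Tref(056.ear)) x (604800 / TSUT)."""
    return copies * (tref_program / TREF_EAR) * (SECONDS_PER_WEEK / tsut_program)

# 008.espresso: 2 copies, Tref = 2270 s, last copy finished after 48 s.
espresso_rate = specrate92(2, 2270, 48)

# Per-program integer rates reported for the same machine.
program_rates = [2243, 2055, 2115, 1564, 1852, 1742]
overall_rate = math.prod(program_rates) ** (1 / len(program_rates))

print(f"008.espresso rate: {espresso_rate:.2f}")
print(f"overall rate:      {overall_rate:.2f}")
```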
SPEC CPU95: SPECint95, SPECfp95 and SPECrate95
For SPEC CPU95, the reference machine is a SPARCstation 10/40 with 128MB of memory, and that machine is given a SPECint95 and SPECfp95 score of 1.0. The programs and reference times are listed here.
Single-CPU Benchmarks
Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run and its runtime measured. The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:
Ratio = RefTime / RunTime
For example, running 126.gcc on the AlphaStation 200 4/100 takes 1280 seconds, and the reference machine takes 1700 seconds. So, the ratio for 126.gcc on the AlphaStation 200 4/100 is [14]:
ratio(126.gcc) = Tref(126.gcc) / TSUT(126.gcc) = 1700 / 1280 = 1.328
Tref(126.gcc) = runtime of 126.gcc on the reference machine (a SPARCstation 10/40) = 1700
SUT = system under test
TSUT(126.gcc) = runtime of 126.gcc on SUT = 1280
In the integer test suite there are 8 programs. SPECint95 is the geometric mean of the ratios for the 8 programs, and SPECfp95 is defined similarly. A machine with SPECint95 of 2.0 is about twice as fast (at integer calculations) as the SPARCstation 10/40. Here are the runtimes and ratios for all 8 of the CINT95 programs on the AlphaStation 200 4/100 [14]:
To determine the SPECint95 score for the AlphaStation, the geometric mean of the 8 ratios is computed:
mean = (2.05 × 1.41 × 1.33 × 1.48 × 1.46 × 1.56 × 1.47 × 1.23)^(1/8) = 1.48
This average 1.48 is the SPECint_base95 score.
Rate (Throughput) Benchmarks
The formulas used in the SPECrate95 metric are not publicly described by SPEC, but it is possible to assume that the design is the same as for SPECrate92. That would imply that the formula is:
SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 604800 / TSUT(program) ]
With the terms defined similarly to above. Note that the role of 056.ear has been replaced by 145.fpppp because 145.fpppp is the test program with the longest runtime on the reference machine (9600 seconds).
Using 126.gcc as an example, the CINT95rate results submitted by the tester [15] indicate that one copy was run and the runtime was 1280 seconds. Using the formula, we would predict that the SPECrate for 126.gcc would be 1 × [ 1700 / 9600 ] × [ 604800 / 1280 ], which is 83.67. However, the submitted results indicate that the "base rate" was 11.9. These numbers differ by a ratio of 7.03. Similar results are found for each of the other programs in the CINT95rate report for the AlphaStation:
The easiest way to make the formula fit these numbers is to replace the time constant 604800 (the number of seconds in a week) with the number of seconds in a day, 86400. The remaining discrepancy (e.g. 6.97 versus 7) is explained by roundoff error in the results report. Applying the same calculation to the numbers in other CPU95 rate reports gives similar results.
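The reasoning about the time constant can be reproduced numerically. The sketch below, with illustrative names, evaluates the assumed formula for 126.gcc using both candidate periods and compares each against the published base rate of 11.9.

```python
TREF_FPPPP = 9600        # SPARCstation 10/40 runtime of 145.fpppp, seconds
SECONDS_PER_WEEK = 604800
SECONDS_PER_DAY = 86400

def rate_with_period(copies, tref, tsut, period_seconds):
    """Rate formula with the batch period left as a parameter."""
    return copies * (tref / TREF_FPPPP) * (period_seconds / tsut)

published_rate = 11.9  # base rate for 126.gcc in the AlphaStation report

# 126.gcc: one copy, Tref = 1700 s, measured runtime = 1280 s.
week_rate = rate_with_period(1, 1700, 1280, SECONDS_PER_WEEK)
day_rate = rate_with_period(1, 1700, 1280, SECONDS_PER_DAY)

print(f"week-based rate: {week_rate:.2f} ({week_rate / published_rate:.2f}x published)")
print(f"day-based rate:  {day_rate:.2f} ({day_rate / published_rate:.2f}x published)")
```

The week-based value is off by a factor of about 7, which is exactly the number of days in a week, while the day-based value agrees with the report to within rounding.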
So, SPECrate95 uses this formula to compute the individual rates:
SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / TSUT(program) ]
N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate [41])
Tref(program) = time to run program on the reference machine, a SPARCstation 10/40
Tref(145.fpppp) = time to run 145.fpppp on the reference machine = 9600 seconds
86400 = number of seconds in a day
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test
This formula is (mostly) given in the CPU95 run rules [41], which state (section 4.2.2):
The "rate" calculated for each benchmark is a function of the number of copies run * reference factor for the benchmark * number of seconds in a day / elapsed time in seconds, which yield a rate in jobs/day.
Here, the phrase "reference factor for the benchmark" corresponds to the ratio Tref(program) / Tref(145.fpppp).
For SPEC CPU2000, the reference machine is a Sun Ultra 5/10 workstation with a 300-MHz SPARC processor and 256MB of memory, and this machine is given a SPECint2000 and SPECfp2000 score of 100. The program names and their reference run times are listed here.
Once again, there are two important changes in the actual formulas. The longest-runtime program is now 171.swim with a runtime of 3100, and the marathon-batch period has been decreased again from one day to one hour (3600).
SPEC CPU2000 - The benchmark suites, their method of use, and the results produced
CINT2000 - The suite of 12 compute-intensive programs used to measure integer performance
CFP2000 - The suite of 14 compute-intensive programs used to measure floating-point performance
SPECint2000 - A measure of speed for single-CPU machines, measures how fast the machine runs all the CINT2000 programs when told to run them one at a time.
SPECint_rate2000 - A measure of throughput for multi-CPU machines, measures how fast the machine can complete a number of simultaneous runs of programs from the CINT2000 suite, when told to run N copies of the same CINT2000 program at the same time.
score - A number produced by performing a carefully regulated test run of the programs in a suite and averaging the results, normalized by comparing to the reference machine (the Sun Ultra 5/10)
Single-CPU Benchmarks
Each program is compiled with standard flags for a "base" measurement, or with tester-selected optimization flags for a "peak" measurement. The program is run three times; each runtime is measured and the median time is used [40]. The "Base Ratio" or "Peak Ratio" for that program run is computed as follows:
Ratio = 100 × RefTime / RunTime
For example, running 168.wupwise on a Compaq AlphaServer GS160 Model 6/731 produced a base run time of 399 seconds [25]. The Base Ratio is 100×1600/399 = 401. That number tells us that, when running the 168.wupwise program compiled conservatively, the AlphaServer GS160 6/731 completed the run about 4.01 times faster than a Sun Ultra 5/10 300MHz workstation running the same program compiled the same way.
The formula for the overall SPECint2000 or SPECfp2000 score is a geometric mean of the ratios for all the programs in the benchmark suite (12 for INT, 14 for FP). For example, in the floating-point suite, the Compaq AlphaServer GS160 Model 6/731 got ratios as low as 145 (for 183.equake) and as high as 1217 (for 179.art), and the geometric mean of all the ratios was 405.
Throughput (Rate) Benchmarks
For the concurrent throughput ratings (SPECint_rate2000 and SPECfp_rate2000), there are also "base" and "peak" versions, where the "peak" version is done with aggressive compiler optimization and "base" is compiled conservatively. The formula for the SPEC{int|fp}_rate2000 score is:
CPU2000_Rate = geom mean [ CPU2000_Rate(program) ]
CPU2000_Rate(program) = N × [ Tref(program) / Tref(171.swim) ] × [ 3600 / TSUT(program) ]
N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate [40])
Tref(program) = time to run program on the Sun Ultra 5/10
Tref(171.swim) = time to run 171.swim on the Sun Ultra 5/10 = 3100
3600 = number of seconds in an hour
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test
This formula is (mostly) given in the CPU2000 run rules [40], which state (section 4.3.2):
The "rate" calculated for each benchmark is a function of:
the number of copies run *
reference factor for the benchmark *
number of seconds in an hour /
elapsed time in seconds
which yields a rate in jobs/hour.
Here, the phrase "reference factor for the benchmark" corresponds to the ratio Tref(program) / Tref(171.swim).
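A minimal Python sketch of the two CPU2000 formulas follows. The function names are illustrative. The speed example uses the 168.wupwise figures quoted earlier; the rate example is hypothetical (4 copies that all finish in the same 399 seconds) and is included only to show how the two metrics relate.

```python
SECONDS_PER_HOUR = 3600
TREF_SWIM = 3100  # Sun Ultra 5/10 runtime of 171.swim, seconds

def cpu2000_ratio(tref, tsut):
    """Speed ratio: 100 x reference runtime / measured runtime."""
    return 100 * tref / tsut

def cpu2000_rate(copies, tref, tsut):
    """Per-program rate: N x (Tref / Tref(171.swim)) x (3600 / TSUT)."""
    return copies * (tref / TREF_SWIM) * (SECONDS_PER_HOUR / tsut)

# 168.wupwise: Tref = 1600 s, base runtime = 399 s.
ratio = cpu2000_ratio(1600, 399)

# Hypothetical rate run (assumption): 4 copies, last one done after 399 s.
rate = cpu2000_rate(4, 1600, 399)

# With identical runtimes the two metrics differ only by a constant factor.
factor = SECONDS_PER_HOUR / (100 * TREF_SWIM)
assert abs(rate - 4 * ratio * factor) < 1e-9

print(f"ratio: {ratio:.1f}, rate: {rate:.2f}, rate per ratio point: {factor:.5f}")
```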
For SPEC CPU2006, the reference machine is a Sun Ultra Enterprise 2 workstation [1] with a 296-MHz UltraSPARC II processor [2]. This machine is similar to the Ultra 5/10 used in the CPU2000 suite, but has better cache and more RAM.
The reference machine is given a SPECint2006 and SPECfp2006 score of 1.00 [3]. The program names and their reference run times are listed here.
Single-Processor Integer (CINT2006) Calculation
There are 12 programs in the test suite. Each program is compiled and run three times, the runtimes are measured and the median [10] is used to calculate a runtime ratio [5]. Ratios are obtained by dividing program run time into the reference machine run time. For example, consider the Sun Blade 1000 and the program 403.gcc [1, 4]:
ratio403.gcc = Tref(403.gcc) / TSUT(403.gcc) = 8050 / 2702 = 2.98
Tref(403.gcc) = runtime of 403.gcc on reference machine = 8050
SUT = system under test
TSUT(403.gcc) = runtime of 403.gcc on SUT = 2702
As you can see, ratios are higher for faster machines.
SPECint2006 is the geometric mean of the ratios for the 12 SPECint2006 programs, and SPECfp2006 is defined similarly. A machine with SPECint2006 of 2.0 is about twice as fast (at integer calculations) as the Sun Ultra Enterprise 2. Again, using the Sun Blade 1000 as an example, here are the runtimes and ratios for each of the 12 CINT2006 programs [4]:
To determine the SPECint2006 score for the Sun Blade 1000, the geometric mean of the 12 ratios is computed:
mean = (3.18 × 2.96 × 2.98 × 3.91 × 3.17 × 3.61 × 3.51 × 2.01 × 4.21 × 2.43 × 2.75 × 3.42)^(1/12) = 3.12
Since the test was performed without special adjustments in compiler flags or other similar optimizations, it is a SPECint®_base2006 score.
Multi-Processor Integer Throughput (CINT2006 Rate) Calculation
The same test suite is used. The reference machine is the same Sun Ultra Enterprise 2, and its SPECint_rate2006 and SPECfp_rate2006 scores are both 1.00.
To test a machine, a number of copies N is selected; usually this is equal to the number of CPU cores or threads (register sets) on the test system, but that is not required [6, 11].
The rate score for the system under test is determined from a geometric mean of rates for each program in the test suite [7]:
CPU2006_Rate = geom mean [ rate(program) ]
Each individual test program's rate is determined by taking the median [10] of three runs [5] (as above for the speed metric). Each run consists of N copies of the program running simultaneously on the test system. Its time is the time it takes for all the copies to finish (that is, the time from when the first copy starts until the last copy finishes). The rate metric for that program is calculated by the following formula [8]:
rate(program) = N × Tref(program) / TSUT(program)
N = number of copies run concurrently (this may be different for each program if a peak SPECrate is being measured, but not for a base SPECrate [39])
Tref(program) = time to run one copy of the program on the Sun
Ultra Enterprise 2
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test
This formula is (mostly) given in the CPU2006 run rules [42], which state (section 4.3.2):
The "rate" calculated for each benchmark is a function of:
the number of copies run *
reference factor for the benchmark /
elapsed time in seconds
which yields a rate in jobs/time.
Here, the phrase "reference factor for the benchmark" refers simply to Tref(program).
For example, consider the AMD Shuttle SN25P and the program 403.gcc [9]:
CPU2006_Rate(403.gcc) = 2 × Tref(403.gcc) / TSUT(403.gcc) = 2 × 8050 / 875 = 18.4
8050 = time to run one copy of 403.gcc on the Sun Ultra Enterprise 2 reference system (see above)
875 = time to run 2 copies of 403.gcc on the Shuttle SN25P
This calculation is performed for each program in the test suite. The figures for all 12 integer programs on the AMD Shuttle SN25P follow:
To determine the SPECint_rate2006 score for the Shuttle SN25P, the geometric mean of the 12 ratios is computed:
mean = (23.7 × 13.0 × 18.4 × 15.0 × 26.8 × 15.0 × 24.0 × 15.3 × 29.0 × 12.5 × 12.8 × 13.9)^(1/12) = 17.5
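The CPU2006 rate calculation is simple enough to verify in a few lines of Python. This sketch uses `statistics.geometric_mean` from the standard library (Python 3.8 or later); the function name `cpu2006_rate` is illustrative, and the inputs are the Shuttle SN25P figures given above.

```python
from statistics import geometric_mean

def cpu2006_rate(copies, tref, tsut):
    """Per-program rate: N x Tref / TSUT (no extra normalizing constants)."""
    return copies * tref / tsut

# 403.gcc: 2 copies, Tref = 8050 s, both copies done after 875 s.
gcc_rate = cpu2006_rate(2, 8050, 875)

# Per-program rates for the 12 integer programs, as listed in the mean above.
program_rates = [23.7, 13.0, 18.4, 15.0, 26.8, 15.0,
                 24.0, 15.3, 29.0, 12.5, 12.8, 13.9]
specint_rate2006 = geometric_mean(program_rates)

print(f"403.gcc rate:     {gcc_rate:.1f}")
print(f"SPECint_rate2006: {specint_rate2006:.1f}")
```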
Conversions
Converting Between Dhrystone and SPECint89
Prior to SPEC, the de-facto accepted standard benchmark of integer CPU performance was Dhrystone. Dhrystone was replaced by the 1989 SPEC integer metric. Dhrystone had often been criticized for many reasons, including its small code size, abuse by salesmen and customers, and disproportionate emphasis on string operations.
However, despite all the lack of faith in Dhrystone, there was actually a very good correlation between the SPEC integer benchmark and the Dhrystone benchmark. Using data from 28 systems for which Dhrystone and SPECint89 results were readily available and 4 systems for which the SPEC existed and the Dhrystone could be easily deduced, Al Aburto performed linear regression least-squares analysis and found a correlation coefficient of 0.971 between Dhrystone version 1.1 and SPECint89 [28].
Based on the data from these 32 systems, we get these (approximate) conversions:
Dhrystone = 2510 × SPECint89
Dhrystone MIPS = 1.43 × SPECint89
"Dhrystone MIPS" is an old interpretation of the Dhrystone benchmark, calculated to give the VAX 11/780 a score of 1.0. This was simply the raw score (loops per second) divided by 1757 (the raw score of a VAX 11/780). The acronym "MIPS" stands for Million Instructions Per Second, and "Dhrystone MIPS" is so-called in reference to "VAX MIPS". The VAX 11/780 had performance similar to the IBM System/370 model 158-3, which was marketed as a "1 MIPS" machine. The term "VAX MIPS" was also common in those days, and was a unit of performance expressed in terms of the VAX.
The value of 1.43 stated above is based on statistics from a large number of machines. It indicates a much more significant fact: the VAX 11/780 was, in fact, not a very typical machine. Compared to machines of the following decade, the VAX 11/780 performs less well on Dhrystone than one would expect.
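The approximate Dhrystone conversions can be expressed as a short Python sketch. The function names are illustrative, and the results are only as good as the regression fit described above. Feeding the VAX 11/780's own raw Dhrystone score through the fitted line shows why it is described as atypical.

```python
VAX_DHRYSTONES_PER_SECOND = 1757   # raw Dhrystone score of the VAX 11/780
DHRYSTONES_PER_SPECINT89 = 2510    # slope of the fitted regression line

def dhrystone_mips(raw_dhrystones):
    """Raw Dhrystones per second normalized so the VAX 11/780 scores 1.0."""
    return raw_dhrystones / VAX_DHRYSTONES_PER_SECOND

def dhrystones_from_specint89(specint89):
    """Approximate raw Dhrystone score for a given SPECint89."""
    return DHRYSTONES_PER_SPECINT89 * specint89

def specint89_from_dhrystones(raw_dhrystones):
    """Approximate SPECint89 for a given raw Dhrystone score."""
    return raw_dhrystones / DHRYSTONES_PER_SPECINT89

# A machine with SPECint89 = 1.0 is predicted to reach about 1.43 Dhrystone MIPS.
mips_per_specint = dhrystone_mips(dhrystones_from_specint89(1.0))

# The VAX itself scores 1.0 on SPECint89 by definition, but the fitted line
# predicts a lower value from its raw Dhrystone score.
vax_predicted = specint89_from_dhrystones(VAX_DHRYSTONES_PER_SECOND)

print(f"Dhrystone MIPS per SPECint89: {mips_per_specint:.2f}")
print(f"VAX 11/780 predicted SPECint89: {vax_predicted:.2f}")
```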
Converting between SPEC89 and SPEC92
SPEC89 and SPEC92 both use the same reference machine (a VAX 11/780), and both metrics give the VAX a score of 1.00, so they are directly comparable:
SPECint92 ≅ SPECint89
SPECfp92 ≅ SPECfp89
Converting between SPEC92 and SPEC95
SPEC95 uses the SPARCstation 10 model 40 as its reference machine; this machine has a SPECint92 score of 50.2 [16] and SPECfp92 of 60.2 [17]. By definition, its CPU95 scores are both 1. So to convert (approximately) from SPEC92 to SPEC95 numbers, divide by 50.2 or 60.2 for SPECint and SPECfp respectively; to convert the other way, multiply.
SPECint95 ≅ SPECint92 / 50.2
SPECfp95 ≅ SPECfp92 / 60.2
Converting between SPEC95 and SPEC2000
SPEC2000 uses the Sun Ultra 5/10 300MHz as its reference machine; this machine has a SPECint95 score of 12.1 [30] and SPECfp95 of 12.9 [31]. By definition, its CPU2000 scores are both 100. So to convert (approximately) between CPU95 and CPU2000 numbers, multiply or divide by 100/12.1=8.26 or 100/12.9=7.75 for SPECint and SPECfp respectively:
SPECint2000 ≅ 100 * SPECint95 / 12.1
SPECfp2000 ≅ 100 * SPECfp95 / 12.9
Converting between SPEC2000 and SPEC2006
SPEC CPU2006 uses the Sun Ultra Enterprise 2 with a 296 MHz processor as its reference machine; this machine is similar to but slightly better than the Ultra 5/10 300 MHz used for SPEC CPU2000. The Ultra Enterprise 2 has a SPECint2000 score of 116 [32] and SPECfp2000 of 147 [33]. By definition, its CPU2006 scores are both 1.00. So to convert (approximately) from CPU2000 to CPU2006 numbers, divide by 116 or 147 for SPECint and SPECfp respectively; to convert the other way, multiply.
SPECint2006 ≅ SPECint2000 / 116
SPECfp2006 ≅ SPECfp2000 / 147
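The suite-to-suite conversions all follow one pattern: divide by the new reference machine's score in the older suite, then multiply by the score that machine is assigned in the newer suite. The Python sketch below encodes that pattern for the integer metrics. The table layout and function names are illustrative, the CPU2006 reference score is taken as 1.00 as stated in the CPU2006 section, and the results are approximations only.

```python
# Each entry maps (older suite, newer suite) to a pair:
#   (score of the newer suite's reference machine in the older suite,
#    score assigned to that machine in the newer suite)
INT_STEPS = {
    ("CPU92", "CPU95"): (50.2, 1.0),       # SPARCstation 10/40
    ("CPU95", "CPU2000"): (12.1, 100.0),   # Sun Ultra 5/10 300MHz
    ("CPU2000", "CPU2006"): (116.0, 1.0),  # Sun Ultra Enterprise 2
}

def convert_forward(score, older, newer, steps=INT_STEPS):
    """Approximate a newer-suite score from an older-suite score."""
    ref_in_older, ref_in_newer = steps[(older, newer)]
    return score * ref_in_newer / ref_in_older

def convert_backward(score, older, newer, steps=INT_STEPS):
    """Approximate an older-suite score from a newer-suite score."""
    ref_in_older, ref_in_newer = steps[(older, newer)]
    return score * ref_in_older / ref_in_newer

# Sanity check: each reference machine must map onto its own defined score.
for (older, newer), (ref_in_older, ref_in_newer) in INT_STEPS.items():
    converted = convert_forward(ref_in_older, older, newer)
    assert abs(converted - ref_in_newer) < 1e-9

# Hypothetical machine with SPECint95 = 6.05 (half the Ultra 5/10's score).
estimate_2000 = convert_forward(6.05, "CPU95", "CPU2000")
print(f"estimated SPECint2000: {estimate_2000:.1f}")
```

The floating-point metrics follow the same pattern with 60.2, 12.9 and 147 in place of 50.2, 12.1 and 116.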
Relationship Between Speed and Rate Metrics
The speed metrics (SPECint and SPECfp) measure how quickly a single task can be completed (implicitly a single-threaded task running on one CPU core). The rate metrics (SPECint_rate and SPECfp_rate) measure the overall capacity for the system to complete tasks (with the implication that it's okay to run as many simultaneous tasks as you want to attain the highest rate).
This is the type of thing people seem to completely understand, or not understand at all. I will make an analogy that is similar to (but I believe a bit better than) the one used by an author at SPEC back in the 1990s.
Yesterday I went to a favorite diner for breakfast. It is a small place with few customers, and they were not busy when I arrived. I ordered a ham and cheese omelette. They have one chef, who possesses one omelette pan, one stove, and one square meter of counter space next to the stove. He heard my order, got to work immediately, and I had my omelette in 5 minutes.
This morning I went to a much larger restaurant. They have five chefs, each of whom has his own stove with five burners, five omelette pans and five square meters of counter space. This restaurant has at least five times as much of everything as the one I went to yesterday, and when I arrived none of them were busy. I gave my order (an omelette) to the waiter. The order was delivered to the kitchen in one minute. Five minutes later my omelette was ready, and another minute later it was delivered to me by the waiter. Total time, 7 minutes.
The larger restaurant had much greater capacity than the first one, but it took them the same time (actually a bit longer) to fulfill my order. The reason is clear: no matter how much equipment and manpower you throw at it, the task of cooking an omelette cannot be accelerated beyond certain basic limits.
If I wanted to have breakfast with four friends and we all wanted omelettes, then the larger restaurant would indeed be faster. The diner with one chef and one pan would take about 25 minutes to finish preparing our breakfast, while the larger one would probably finish the task in 7 or 8 minutes.
This is an analogy to what computers are doing. Most modern computers have more than one processor, and are able to run two or more tasks at full speed. While there are many tasks that can be broken up and shared among multiple processors, some cannot, and some have pieces that are fundamentally atomic (indivisible).
The speed metrics (SPECint and SPECfp) measure the speed at which these "atomic" tasks can be completed under ideal circumstances.
Why it is important to know the difference between speed and rate metrics
Example: Alex is using a dual-processor workstation with a SPECint rating of 5. He is shopping for another to use on the same or similar work, and decides to buy a single-CPU system with a rating of 7. He is disappointed to discover that the new machine is a little slower than the old one; he expected it to be faster. The reason is that, unknown to him, he had been utilizing the old machine's ability to perform two tasks simultaneously. The new machine, while faster at performing single tasks, has a lesser capacity to finish multiple tasks over a period of time. Alex would have been better off comparing machines based on their SPECint_rate metrics. Unfortunately, it is likely that the rate metric for the single-CPU machine is not available.
Why it is important to have rate metrics for single-CPU machines
Example: Brett runs a research project related to weather forecasting. He has just gotten permission to procure a workstation to replace his current setup, a pair of brand-A single-CPU workstations rated at 10 SPECfp each. The goal for the purchase is to get the job onto a single system that can do the work in 2 days. The current setup takes 3.5 days to finish processing 126 separate datasets, a task performed weekly. Therefore Brett knows any system that will deliver 35 SPECfp can accomplish his goal. He finds a few single-CPU systems from brand-B rated 45 to 50. He knows these will do the job, but he also discovers that for the same money, a brand-C dual-CPU workstation can be bought, using CPUs that individually rate about 30. Brett knows that his workload is easily broken up among multiple CPUs because that is what he is already doing each week. Unfortunately, he cannot compare the brand-B and brand-C workstations because they do not all have the same metrics. The dual-CPU systems have SPECrate_fp scores and the single-CPU systems have SPECfp scores. This problem could be solved easily if brand-B would provide SPECrate_fp scores for its single-CPU systems.
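Brett's sizing arithmetic can be spelled out in a brief Python sketch. It assumes, as the example does, that the datasets are independent and that throughput scales in proportion to the combined score of the CPUs doing the work; the names are illustrative.

```python
def required_aggregate_score(current_aggregate, current_days, target_days):
    """Combined score needed to shrink the job from current_days to target_days."""
    return current_aggregate * current_days / target_days

# Two brand-A workstations at 10 SPECfp each finish the job in 3.5 days.
current_aggregate = 2 * 10
needed = required_aggregate_score(current_aggregate, 3.5, 2)

# Brand-B: one CPU rated 45 (low end of the quoted range).
brand_b_score = 45

# Brand-C: two CPUs rated about 30 each. The sum is only an ideal upper
# bound, because shared memory and disk reduce real dual-CPU throughput.
brand_c_upper_bound = 2 * 30

print(f"needed: {needed}, brand-B: {brand_b_score}, brand-C (ideal): {brand_c_upper_bound}")
```

How far brand-C falls below its ideal bound is exactly what a published rate score would tell him, which is why the missing metric matters.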
Speed and Rate conversion for CPU92
As described above, the SPECrate for each individual program in the integer or FP suite is calculated as follows:
SPECrate(program) = N × [ Tref(program) / Tref(056.ear) ] × [ 604800 / TSUT(program) ]
N = number of copies run concurrently
Tref(program) = time to run program on the reference machine, a VAX 11/780
Tref(056.ear) = time to run 056.ear on the reference machine = 25500 seconds [12]
604800 = number of seconds in a week
TSUT(program) = time to finish last concurrent copy on system under test
SUT = system under test
Therefore, a SPECrate for the VAX 11/780, based on running just a single copy of a particular program, would be:
SPECrate(program) = 1 × [ Tref(program) / Tref(056.ear) ] × [ 604800 / Tref(program) ]
the Tref(program) terms cancel out, leaving
SPECrate(program) = 604800 / Tref(056.ear) = 23.72
for all programs. Since the SPECrates for the programs in the INT and FP suites are all computed normalized to 056.ear [12], SPECrate_int92 and SPECrate_fp92 for a VAX 11/780 will be 23.72.
This allows us to convert from SPEC92 to SPECrate92 for single-processor machines:
SPECrateInt92 = 23.72 × SPECInt92
SPECrateFP92 = 23.72 × SPECFP92
For example, the Dell Dimension XPS (133MHz, 512KB L2) is a single-processor system with a reported SPECint92 of 177.9 [36]. The SPECrate_int92 was also reported for that system, and the tester ran each program in the suite with N=1 as the number of concurrent copies. They reported [37] a SPECrate_int92 of 4144, very close to the predicted value 23.72 × 177.9 = 4220.
This conversion for single-processor machines was frequently performed by third parties to allow comparison of a single-processor system to a dual-processor system by customers only interested in long-term homogeneous capacity [23, 24]:
Computed specrates are indicated by "c". They're computed from SpecInt92, SpecFP92 (for uniprocessors) using a scaling factor. This number is usually slightly less than or equal to a measured specrate on a uniprocessor. The scaling factor is the number of seconds in a week, divided by the time of the longest-running benchmark on the reference SPEC VAX 11/780, which is 604800/25500, or about 23.7. - John DiMarco
A more general conversion formula factors in the number of processors:
SPECrateInt92 = 23.72 × P × SPECInt92
SPECrateFP92 = 23.72 × P × SPECFP92
where P is the number of CPUs. It should be understood that this formula only gives a theoretical ideal maximum which is never achieved. For example, if a 4-CPU system is built from CPUs that deliver 112 SPECint92 in single-CPU systems, then the 4-CPU system, if designed well, will have a SPECrateInt92 close to but noticeably less than 23.72 × 4 × 112 = 10626.
This formula can be reversed to convert the other way:
SPECInt92 = SPECrateInt92 / (23.72 × P)
SPECFP92 = SPECrateFP92 / (23.72 × P)
For example, a 4-CPU server with a SPECrate_int92 of 2372 would not deliver 100 SPECint92, because SPECint92 is based on running a single task on a single processor. Instead, each of its 4 cpus would deliver 25 SPECint92 (or probably a little more, because the SPECrate figure includes the inefficiency of the OS overhead for multiprocessing).
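The CPU92 speed and rate conversions can be sketched in Python as follows. The function names are illustrative. The sketch uses the unrounded factor 604800/25500, so its results can differ in the last digit from figures computed with the rounded 23.72.

```python
SECONDS_PER_WEEK = 604800
TREF_EAR = 25500  # VAX 11/780 runtime of 056.ear, seconds

# Rate score of a single-processor machine whose speed score is 1.0.
RATE_PER_SPEED_UNIT = SECONDS_PER_WEEK / TREF_EAR  # about 23.72

def ideal_specrate92(speed_score, processors=1):
    """Theoretical maximum SPECrate92 for P processors of a given speed score."""
    return RATE_PER_SPEED_UNIT * processors * speed_score

def per_cpu_speed92(rate_score, processors=1):
    """Per-processor speed score implied by a SPECrate92 score."""
    return rate_score / (RATE_PER_SPEED_UNIT * processors)

# Dell Dimension XPS: SPECint92 = 177.9, one processor (reported rate: 4144).
dell_predicted = ideal_specrate92(177.9)

# 4-CPU system built from CPUs that deliver 112 SPECint92 each.
four_cpu_bound = ideal_specrate92(112, processors=4)

# 4-CPU server with SPECrate_int92 = 2372.
per_cpu = per_cpu_speed92(2372, processors=4)

print(f"Dell predicted rate: {dell_predicted:.0f}")
print(f"4-CPU upper bound:   {four_cpu_bound:.0f}")
print(f"per-CPU SPECint92:   {per_cpu:.1f}")
```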
Speed and Rate conversion for CPU95
As described above, the SPECrate score for each program in the integer or FP suite is calculated as follows:
SPECrate(program) = N × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / Ttest(program) ]
N = number of copies run concurrently
Tref(program) = time to run program on the reference machine,
a SPARCstation 10/40
Tref(145.fpppp) = time to run 145.fpppp on the reference
machine = 9600 seconds.
86400 = number of seconds in a day
Ttest(program) = time to finish last concurrent copy on
system under test (SUT)
If the system under test has just one processor, the tester producing the CINT95 or CFP95 rate scores would run just one copy of each program.
ratio(program) = Tref(program) / runtime(program)
rate(program) = 1 × [ Tref(program) / Tref(145.fpppp) ] × [ 86400 / Ttest(program) ]
Dividing a program's rate by its ratio gives:
rate(program) / ratio(program) = [ runtime(program) × Tref(program) × 86400 ] / [ Ttest(program) × Tref(program) × Tref(145.fpppp) ]
Since only one copy is being run, runtime(program) and Ttest(program) are the same. Also, the Tref(program) terms cancel out, so we get:
rate(program) / ratio(program) = 86400 / Tref(145.fpppp) = 9.00
predicting that the base rates in a single-processor system's SPECint_rate95 report will be 9 times the base ratios in the SPECint95 report.
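The cancellation can be checked numerically with a short Python sketch. The two timing values below are hypothetical, chosen only for illustration, and the function names `ratio95` and `rate95` are not SPEC terminology; only the constants 86400 and 9600 come from the definitions above.

```python
SECONDS_PER_DAY = 86400
TREF_FPPPP = 9600  # seconds, reference time of 145.fpppp from the definitions above


def ratio95(t_ref, runtime):
    """Speed ratio of one program: reference time over measured time."""
    return t_ref / runtime


def rate95(t_ref, t_test, copies=1):
    """Rate of one program when `copies` copies run concurrently."""
    return copies * (t_ref / TREF_FPPPP) * (SECONDS_PER_DAY / t_test)


# Hypothetical times in seconds (not taken from any published result).
t_ref = 1900.0       # assumed reference-machine time for some program
t_measured = 250.0   # assumed time on a single-processor system under test

# With one copy, runtime and Ttest are the same measurement.
quotient = rate95(t_ref, t_measured) / ratio95(t_ref, t_measured)
print(quotient)  # equals 86400 / 9600, up to floating-point rounding
```

Changing either hypothetical time leaves the quotient unchanged, since both terms cancel as in the derivation.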
The AlphaStation 200 4/100 results 14,15 provide a convenient example of this:
confirming the prediction made by the formulas:
SPECrate_{int|fp}95 = 9.00 × SPEC{int|fp}95
for single-processor systems
SPEC{int|fp}95 = SPECrate_{int|fp}95 / 9.00
for single-processor systems
Looking at many SPEC95 and SPECrate95 results for various single-CPU systems, this appears to actually be the case.
When multiple copies are run on multiple CPUs, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P processors, given the speed scores of a single processor:
maximum SPECrate{Int|FP}95 = 9.00 × P × SPEC{Int|FP}95
SPEC{Int|FP}95 = maximum SPECrate{Int|FP}95 / (9.00 × P)
Speed and Rate conversion for CPU2000
As described before, the formula for SPEC{int|fp}rate2000 is:
CPU2000_Rate = geom mean [ CPU2000_Rate(program) ]
CPU2000_Rate(program) = N × [ Tref(program) / Tref(171.swim) ] × [ 3600 / TSUT(program) ]
N = number of copies run concurrently
Tref(program) = time to run program on the Sun Ultra 5/10
Tref(171.swim) = time to run 171.swim on the Sun Ultra 5/10 = 3100
3600 = number of seconds in an hour
TSUT(program) = time to finish last concurrent copy on system being tested
SUT = system under test
The reference machine is a Sun Ultra 5/10. The CINT2000 or CFP2000 Rate for the Sun Ultra 5/10, based on running just a single copy of a particular program, would be:
CPU2000_Rate(program) = 1 × [ Tref(program) / Tref(171.swim) ] × [ 3600 / Tref(program) ]
The Tref(program) terms cancel out, leaving
CPU2000_Rate(program) = 3600 / Tref(171.swim) = 1.161
for all programs. Since the rates for the programs in the CINT2000 and CFP2000 suites are all computed normalized to 171.swim, CINT2000 and CFP2000 rates for a Sun Ultra 5/10 will be 1.161.
Since the SPECint2000 and SPECfp2000 scores for the Sun Ultra 5/10 are both 100, the formulas for conversion for a single-processor machine are:
SPECint2000 = (100 / 1.161) × SPECint_rate2000
= 86.1 × SPECint_rate2000
for single-processor systems
SPECfp2000 = (100 / 1.161) × SPECfp_rate2000
= 86.1 × SPECfp_rate2000
for single-processor systems
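As a rough check of the arithmetic, the single-processor CPU2000 conversion can be written in Python. The names `speed_from_rate2000` and `rate_from_speed2000` are invented for this sketch; the constants 3600, 3100 and 100 are the ones given above.

```python
SECONDS_PER_HOUR = 3600
TREF_SWIM = 3100        # seconds, reference time of 171.swim
REFERENCE_SPEED = 100   # SPECint2000 / SPECfp2000 of the Sun Ultra 5/10

# Rate score of the reference machine when running a single copy.
REFERENCE_RATE = SECONDS_PER_HOUR / TREF_SWIM        # about 1.161

# Conversion factor between rate and speed on a single-processor system.
SPEED_PER_RATE = REFERENCE_SPEED / REFERENCE_RATE    # about 86.1


def speed_from_rate2000(rate):
    """Approximate SPEC{int|fp}2000 from SPEC{int|fp}_rate2000 (one CPU)."""
    return SPEED_PER_RATE * rate


def rate_from_speed2000(speed):
    """Approximate SPEC{int|fp}_rate2000 from SPEC{int|fp}2000 (one CPU)."""
    return speed / SPEED_PER_RATE


if __name__ == "__main__":
    print(REFERENCE_RATE)
    print(SPEED_PER_RATE)
    print(speed_from_rate2000(REFERENCE_RATE))  # recovers the reference speed
```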
Published results for single-processor machines bear this out. Take the Compaq AlphaServer GS160 Model 6/731, which was tested and published in April 2000.25,26 Comparing the base ratios in its C{int|fp}2000 results to its C{int|fp}2000_Rate results shows that the conversion formulas do in fact work. For example, it has a base SPECint2000 of 353 and a base SPECint_rate2000 of 4.09; 353 divided by 4.09 equals 86.3, within about 0.2% of the predicted 86.1.
When multiple copies are run on multiple CPUs, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P processors, given the speed scores of a single processor:
maximum SPEC{int|fp}2000_Rate ≅ P × SPEC{int|fp}2000 / 86.1
SPEC{int|fp}2000 ≅ maximum SPEC{int|fp}2000_Rate × 86.1 / P
Speed and Rate conversion for CPU2006
Unlike each of the previous versions of SPEC{int|fp}_rate, there are no scaling factors in the equation that make it difficult to compare the speed and rate metrics for single-processor systems. This is clear both from the scores of the reference machine under the speed and rate metrics2,38, and from the formulas. As stated above, for the speed metric:
ratio(program) = Tref(program) / TSUT(program)
and for the rate metric:
rate(program) = N × Tref(program) / TSUT(program)
On a single-processor machine, N will usually be 1 and the TSUT(program) values will be the same, resulting in ratios and rates being equal. Since SPECint2006 is the geometric mean of the ratios, and SPECint_rate2006 is the geometric mean of the rates, these end up being directly comparable:
SPECrate_{int|fp}2006 ≅ SPEC{int|fp}2006
for single-processor systems
SPEC{int|fp}2006 ≅ SPECrate_{int|fp}2006
for single-processor systems
When multiple copies are run on multiple CPUs, the disk and memory, and sometimes the cache, are being shared. This prevents a multi-CPU system from attaining 100% efficiency. However, these formulas can predict the maximum theoretical performance that could be attained with P processors, given the speed scores of a single processor:
maximum SPECrate{Int|FP}2006 = P × SPEC{Int|FP}2006
SPEC{Int|FP}2006 = maximum SPECrate{Int|FP}2006 / P
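The CPU2006 relationship can be illustrated with a small Python sketch that takes the geometric mean of per-program ratios and rates. All timing values here are hypothetical, and the idealized 4-copy case assumes each copy finishes in the same time as a single copy, which real systems do not achieve.

```python
import math


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratio2006(t_ref, t_sut):
    """Speed ratio of one program."""
    return t_ref / t_sut


def rate2006(t_ref, t_sut, copies=1):
    """Rate of one program when `copies` copies run concurrently."""
    return copies * t_ref / t_sut


# Hypothetical (Tref, TSUT) pairs in seconds, for illustration only.
times = [(9000.0, 1200.0), (6000.0, 800.0), (7500.0, 950.0)]

speed = geometric_mean(ratio2006(r, s) for r, s in times)
rate_one_copy = geometric_mean(rate2006(r, s, 1) for r, s in times)
# Idealized case: 4 copies on 4 CPUs with no slowdown per copy.
rate_four_copies = geometric_mean(rate2006(r, s, 4) for r, s in times)

print(speed, rate_one_copy, rate_four_copies)
```

With one copy the two means coincide, and in the idealized case the rate is exactly P times the speed score, matching the maximum given by the formulas above.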
It is interesting to note that CPU2006 has made conversion quite easy: the speed metric times the number of processors equals the (theoretical maximum achievable) rate metric. This change is counter to the 1992 philosophy discouraging this sort of comparison, but is in keeping with recent developments in the marketplace: single-processor, single-core workstations are getting to be rather rare, and software developers are under pressure to adapt their products to take advantage of dual or quad CPU cores in order to remain competitive and to meet customers' expectations of increased performance. I imagine that in a future CPU suite, the speed metric will be redesigned to allow (and perhaps emphasize) tasks that use multiple threads and multiple CPU cores when available.
Reference Machine Times for the 1989 SPEC Benchmark Suite for UNIX Systems
The times are from the old "specin89.tbl" and "specfp89.tbl" files formerly made available by Aburto and Simizu.21 The similarities to SPEC92 programs were gleaned from a paper by Aashish Phansalkar et al.22
Reference Machine Times for SPEC_CPU_92
The descriptions are from Jeffrey Reilly's SPEC FAQ19.
* denotes the program with the longest runtime on the reference machine.
Reference Machine Times for SPEC_CPU_95
* denotes the program with the longest runtime on the reference machine.
Reference Machine Times for SPEC CPU2000
* denotes the program with the longest runtime on the reference machine.
Reference Machine Times for SPEC CPU2006
Footnotes and References:
1 :
http://www.spec.org/cpu2006/Docs/readme1st.html
SPEC, CPU2006
Read Me First, question 23 (reference machine)
2 :
http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00001.txt
SPEC, CINT2006 results posted by Sun Microsystems for Ultra Enterprise
2 system, March 2006.
3 :
http://www.spec.org/cpu2006/results/res2006q3/
SPEC, CPU2006
Results submitted in Third Quarter 2006.
4 :
http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00047.txt
SPEC, CINT2006 results posted by Sun Microsystems for Sun Blade 1000
system.
5 :
The use of three runs and discard of highest and lowest times
is clear from any of the result reports. For example, look at the
401.bzip2 times in the
CINT2006 summary for AMD Shuttle SN25P.
6 :
http://www.spec.org/cpu2006/Docs/readme1st.html
SPEC, CPU2006
Read Me First, question 15 (concerning "rate" vs. "speed").
7 :
http://www.spec.org/cpu2006/Docs/readme1st.html
SPEC, CPU2006
Read Me First, question 13 (definitions of the metrics).
8 :
This simple and fairly obvious formula was verified by looking
at numbers from posted results. For example, see the test run times in
the CINT_rate2006 results for the AMD
Shuttle SN25P
and the CINT2006 results for the
reference machine.
9 :
http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00008.txt
SPEC, CINT_rate2006 results submitted by Advanced Micro Devices for
Shuttle SN25P, Q3 2006.
10 :
median: the middle of a set of data values. For example, in
the set of three values {3, 8, 10}, the median is 8. Note this is
different from the mean or average, which in this example would be 7.
11 :
http://www.spec.org/cpu2006/results/res2006q4/
SPEC, CPU2006
Results submitted in Fourth Quarter 2006. Note the CINT2006 Rates
entry for IBM System X 3800: they ran 16 copies on a system with 8
cores, because the system's Intel Xeon processor cores support 2
threads per core. (Intel's "hyperthreading" is essentially a hardware
implementation of two virtual CPUs using duplicate register sets,
register renaming, and interleaved scheduling of instructions from the
two instruction streams. This adds efficiency because the small number
of registers in the x86 user programming model is not enough to keep a
long pipeline busy.)
12 :
http://www.spec.org/cpu92/specrate.txt
Alexander Carlton,
"CINT92 and CFP92 Homogeneous Capacity Method". Describes the
normalization formula used in the rate metrics (scroll down to the
description of "ReferenceFactor").
13 :
http://www.spec.org/cpu92/specrate.txt
ibid.; in "History"
section near the end.
14 :
http://www.spec.org/cpu95/results/res9509/p019.html
SPEC,
SPECint95 results submitted by Digital Equipment Corp. for the
AlphaStation 200 4/100, September 1995.
15 :
http://www.spec.org/cpu95/results/res9509/p074.html
SPEC,
CINT95rate results submitted by Digital Equipment Corp. for the
AlphaStation 200 4/100, September 1995.
16 :
http://performance.netlib.org/performance/html/spec.suni40.cint92.6_93.notes.html
The Performance Database Server at Netlib, Benchmark CINT92 summary,
results for SPARCstation 10 Model 40 submitted by Sun Microsystems
Inc. and published by SPEC in June 1993.
17 :
http://performance.netlib.org/performance/html/spec.sunf41.cfp92.6_93.notes.html
The Performance Database Server at Netlib, Benchmark CFP92 summary,
results for SPARCstation 10 Model 40 submitted by Sun Microsystems
Inc. and published by SPEC in June 1993.
18 :
http://www.islandnet.com/~kpolsson/workstat/work1991.htm
Ken
Polsson, Chronology of Workstation Computers 1991-1992
19 :
http://gd.tuwien.ac.at/perf/benchmark/aburto/faq/spec.faq
Jeffrey Reilly, Answers to Frequently Asked Questions about SPEC
Benchmarks, 1993 June 4
20 :
http://groups.google.com/group/comp.benchmarks/msg/57d2ed8f7915deba
John DiMarco, "SPECmark table", comp.benchmarks article, 1994 Jan 5.
Lists SPEC CPU92 and SPEC89 results in three tables. A copy is
here.
21 :
Al Aburto, specin89.tbl and specfp89.tbl, text files published
via anonymous FTP at ftp.nosc.mil (no longer available).
22 :
http://lca.ece.utexas.edu/pubs/techreport/TR-050127-01.pdf
Phansalkar, Joshi, Eeckhout, and John, "Measuring Program Similarity",
2005. A similar paper by the same authors is
here
23 :
http://staff.stir.ac.uk/b.m.bullen/tech97.htm
Brian Bullen
(University of Stirling), "Note on PROCESSORS" from
Technical Strategy. Describes features and gives performance
figures for a number of machines that were being used at the
university in 1996.
24 :
http://groups.google.com/group/comp.benchmarks/msg/50df1ead101fa5e8
John DiMarco, "SPECmark table", comp.benchmarks article, 1995 May 16.
Lists many SPEC CPU92 results in a table.
25 :
http://www.spec.org/osg/cpu2000/results/res2000q2/cpu2000-20000511-00100.asc
SPEC, CFP2000 results submitted by Compaq Computer Corporation for the
AlphaServer GS160 Model 6/731 in May 2000. The integer results are
here.
26 :
http://www.spec.org/osg/cpu2000/results/res2000q2/cpu2000-20000605-00120.asc
SPEC, CFP2000 rate results submitted by Compaq Computer Corporation
for the AlphaServer GS160 Model 6/731 in May 2000. Its integer rate
results are
here.
27 :
http://www.islandnet.com/~kpolsson/workstat/work1991.htm
Ken
Polsson, Chronology of Workstation Computers 1987-1990
28 :
Al Aburto, "This is the data set of Dhrystone and SPECratio
and SPECin89 data I have" (speccorr.tbl), September 1992. Text file
formerly made available by anonymous FTP from ftp.nosc.mil (no longer
available online); he described an earlier version of this work in
this article
in comp.benchmarks.
29 :
http://www.unixnerd.demon.co.uk/sun_unix.html
John Burns, "Sun
- The Unix Enthusiast's Choice", web page giving SPEC ratings of SUN
workstations including a few models not in SPEC's published reports.
30 :
http://www.spec.org/cpu95/results/res98q1/cpu95-980128-02369.asc
SPEC, CINT95 results posted by Sun Microsystems for Ultra 10 300MHz
system.
31 :
http://www.spec.org/cpu95/results/res98q1/cpu95-980128-02370.asc
SPEC, CFP95 results posted by Sun Microsystems for Ultra 10 300MHz
system.
32 :
http://www.spec.org/cpu/results/res2006q2/cpu2000-20060612-06193.asc
SPEC, CINT2000 results posted by Sun Microsystems for Ultra Enterprise 2
system.
33 :
http://www.spec.org/cpu/results/res2006q2/cpu2000-20060612-06194.asc
SPEC, CFP2000 results posted by Sun Microsystems for Ultra Enterprise 2
system.
34 :
http://performance.netlib.org/performance/html/spec.ibm220.cint92.6_93.notes.html
SPEC, SPECint92 results posted by IBM for RISC System/6000
POWERstation M20/220 system, June 1993.
35 :
http://performance.netlib.org/performance/html/spec.hp755.crint.6_93.notes.html
SPEC, SPECint92 results posted by Hewlett-Packard Company for HP
Apollo 9000/755 system, June 1993.
36 :
http://performance.netlib.org/performance/html/spec.delli1.cint92.12_95.notes.html
SPEC, SPECint92 results posted by Dell Computer Corporation for Dell
Dimension XPS (133MHz, 512KB L2) system, December 1995.
37 :
http://performance.netlib.org/performance/html/spec.dellir1.crint92.12_95.notes.html
SPEC, SPECrate_int92 results posted by Dell Computer Corporation for
Dell Dimension XPS (133MHz, 512KB L2) system, December 1995.
38 :
http://www.spec.org/cpu2006/results/res2006q3/cpu2006-20060513-00003.txt
SPEC, CINT2006 Rate results posted by Sun Microsystems for Ultra
Enterprise 2 system, March 2006.
39 :
http://www.spec.org/cpu2006/Docs/runrules.html
SPEC CPU2006
Run Rules
40 :
http://www.spec.org/cpu/docs/runrules.html
SPEC, CPU2000 Run
Rules, v1.3A, June 2006.
41 :
http://www.spec.org/cpu95/rules/RUNRULES.txt
SPEC, CPU95 Run
Rules.
42 :
http://www.spec.org/cpu2006/Docs/runrules.html
SPEC, CPU2006
Run Rules, candidate 4, July 2006
SPEC® and the benchmark names SPECint89, SPECfp89, SPECmark, SPEC CPU95, CINT95, SPECint_base95, SPECint95, SPECint_base_rate95, SPECint_rate95, CFP95, SPECfp_base95, SPECfp95, SPECfp_base_rate95, SPECfp_rate95, SPEC CPU2000, CINT2000, SPECint_base2000, SPECint2000, SPECint_rate_base2000, SPECint_rate2000, CFP2000, SPECfp_base2000, SPECfp2000, SPECfp_rate_base2000, SPECfp_rate2000, SPEC CPU2006, CINT2006, SPECint®_base2006, SPECint2006, SPECint®_rate_base2006, SPECint_rate2006, CFP2006, SPECfp®_base2006, SPECfp2006, SPECfp®_rate_base2006, and SPECfp_rate2006 are registered trademarks of the Standard Performance Evaluation Corporation. The examples given above are drawn from test results, disclosures and documentation published on www.spec.org as of Jan 25, 2007.
Robert Munafo's home pages at Pair Networks
© 1996-2008 Robert P. Munafo.
This work is licensed under a
Creative Commons Attribution 2.5 License.