| Computer History |
Define three classes of computers based on size and wealth of user (all prices adjusted to US dollars in late 1990's):
A. High-end scientific and military computers costing around $20 million. Usually called "large mainframe" or "supercomputer".
B. Mainstream business (or engineering) computers costing around $200,000. Usually called "MIS server" or "high-end workstation".
C. "Consumer" machines costing around $2000. Usually called "home computer" or "personal computer".
If we look at the history of each of these three classes of computers we see that computer history has been repeating itself remarkably well. Here are the major phases in the evolution of the computer. Each phase represents about a factor of 20 in speed improvement. For each class, the first significant representative system from that phase is listed:
| Phase |
-----Class-A----- |
-----Class-B----- |
-----Class-C----- | Developments that define this phase |
| 1 |
1947 |
year |
1976 | Smallest CPU design that can do any useful work, with whatever switching technology and clock speed allow for cheapest manufacture. Simple multistate sequential operation: fetch, decode, fetch operands, execute, write back, increment PC, repeat. |
| 2 |
1955 |
year |
1984 | Simple improvements: Push the clock speed, vary the voltages and fabrication techniques, widen the ALU and opcode, more registers and memory, more instructions and addressing modes, instruction prefetch. |
| 3 |
1964 |
year |
1991 | Pipeline the 5 major parts of the CPU (fetch, decode, operands, execute, write back). CPU becomes faster than main memory; add a cache. Add floating-point and memory management hardware. |
| 4 |
1976 |
year |
1994 | Multiple ALUs, register scoreboarding, out-of-order execution, multiple sets of condition codes, speculative execution, floating-point pipeline, etc. Instruction set and programming model are redefined to fit the architecture. |
| 5 |
1988 |
1997 |
2001 | Market splits between modestly parallel (2 to 8 CPUs; suitable for multitasking and business, database, expert systems) and vector processor designs (suitable for scientific, graphics, visualization, imaging). Algorithms become obsolete; applications are rewritten to fit the architecture. |
| 6 |
1995 |
year. |
year. | Market converges again, on massively parallel systems (32 or more CPUs, same program running on all CPUs). 30-50% of the hardware cost is dedicated to inter-CPU communication. All old programming techniques become obsolete; programmers re-learn how to program. |
| 7 |
2002?. |
year. |
year. | Development focused on inter-CPU communication. As number of nodes increases, percentage of hardware dedicated to communication increases also. Bisection bandwidth grows faster than aggregate processing power. |
| 8 |
year. |
year. |
year. | Development focused on physical size and cooling. Speed of light limits the number of nodes that can work on a single problem. 3-D fabrication (stacked wafers); all air space is replaced with heat fins, coolant pipes, communication lines, power wires and thermally conductive glue. |
| 9 |
year. |
year. |
year. | (It's difficult to see what happens next. By this time I expect that the fundamental switching elements will have switched from transistors to some kind of optical, chemical or biological phenomenon.) |
| 10 |
year. |
year. |
year. |
The plateau. Computers stop growing exponentially as the laws of
physics are finally reached. Architecture stops changing and is
perfected. Hardware stops changing and is perfected. Existing software
stops changing and is perfected. New software continues to be written
as always.
|
If N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel, (and s + p = 1) then Amdahl's law says that speedup is given by
speedup = 1 / (s + p / N )
This would seem to limit the benefit of parallel processing to 1/s. However, it has been found that when faster hardware becomes available users run bigger problems on the faster hardware. It has also been found that for all problems that need to be run on "big" computers, the value of s goes down as the problem size increases.
First stated by Gordon Moore, a co-founder of Intel, in 1965. He predicted that the logic density of computer circuitry would follow an exponential function approximately equal to:
bits per square inch = 2 (T-1962)
Where T is the current year. Moore used memory density as a measure of logic densisty, rather than the "gate count" or "transistor count" that is more common today. So, (for example) in 1965 the logic density would be 23 = 8 bits per square inch (which was true), and in 1997 it would be 235 = 34 billion (which is an overestimate, in fact we're only up to about 1 billion).
Moore didn't actually say this, but it is the version most people have heard the statement that computer speed increases exponentially. To make it precise, we'll relate the speed in floating point operations per second to the price P in dollars and the time T:
speed = P * 1.53 (T-1973)
Including the price as a linear term in the formula is deceiving: usually you cannot count on getting twice as much power for twice as much money. There are actually many different factors at work. Some of them work against each other:
high-end technology: Higher-cost machines have a smaller market, so the develpment costs can't be spread out as much, and faster designs usually require more expensive technology (like multi-port caches for shared-memory designs). This effect causes bigger machines to cost more per megaflop than they otherwise would.
adjusting job size to match the machine: Programs designed to run efficiently on small machines don't run efficiently on large machines. If you buy a faster machine and run the same program on it, expecting performance proportional to your machine's rated performance, you'll be disappointed. But if you adjust (recompile, optimize, etc.) the program and its data for the larger machine you can keep up.
software optimization effort: Higher-priced machines get utilized more efficiently. A lot of money was spent on a fast machine and a lot of effort will get spent on optimizing the software to run fast. This effect causes bigger machines to cost less per megaflop than they otherwise would.
commodity hardware: In some cases, notably the ASCI projects, very large supercomputers are built from ordinary low-end technology. This effect counteracts the high-end technology effect, causing bigger machines to cost less per megaflop than they otherwise would.
inflation: The formula was calculated to ignore inflation. If you account for inflation by converting all prices to 1973 dollars, then you would need to change 1.53 in the formula to around 1.58. The difference of about 3% is the average inflation rate during the whole period. That higher figure is the actual rate of computer speed improvement.
recent trends not fully explained: In the last few years there has been a noticable difference in Moore's Law as compared to the 70's and 80's. *Scientific American, September 1995: "The rate of improvement in microprocessor technology has risen from 35 percent a year only a decade ago to its current high of approximately 55 percent a year, or almost 4 percent each month. Processors are now three times faster than had been predicted in the early 1980's; it is as if our wish was granted, and we now have machines from the year 2000.*" Theories explaining this include a possible greater push for development as the computer market becomes bigger (and thus more lucrative), and/or the spinoff of US government-funded projects such as the space program and military projects like ARPANET and ASCI.
To illustrate the competing effects, here is a comparison across the three classes: In the beginning of 1997, the Intel ASCI Option Red machine (a "class A" machine) had just been completed at a cost of $46 million and its rated speed was about 1.34 TFLOPs. A 16-CPU #Sun HPC 10000# (a "class B" machine) would cost about $750 thousand with a rated speed of 7.0 GFLOPs. A Gateway 2000 P5-200 with a 200 MHz Pentium Pro (a "class C" machine) cost about $2500 and was rated at 63 MFLOPs. All three MFLOPs ratings come from the LINPACK benchmark, with array sizes appropriate to get the most performance out of each machine. Let's see what the formula says:
| machine | performance based on formula | actual performance |
| Option Red | 46000000 * 1.53(1997-1973) = 1.24 TFLOPs | 1.34 TFLOPs |
| Sun HPC 10000 | 750000 * 1.53(1997-1973) = 20.3 GFLOPs | 7.0 GFLOPs |
| Gateway 2000 PPro-200 | 2500 * 1.53(1997-1973) = 67.7 MFLOPs | 63 MFLOPs |
Several of the effects can be seen at work. The high-end technology effect drives up the cost of the Sun HPC 10000. The commodity hardware effect is evident in the Option Red, and a lot of software optimization was done there as well. (The Pentium Pro 200 in the Gateway can actually do over 150 MFLOPs if you optimize the code to use the pipeline and cache well.)
More Data
Here is some additional information about a few machines organized by phase (as in the above table) with some more information, plus a long list of links to places where I cound information I used to compile the figures.
Here is a list of price data I collected while trying to establish the validity of the idea that the three classes shown in the table above have followed the same track of evolution over roughly the same time scale.
This table is my attempt to give the first example of a microprocessor using each industry-standard process size. In each case I count it only when Intel (or IBM) has shipped it in a "volume quantity" to customers, and if it uses the new process size (or smaller) for the entire (or almost the entire) die. A complete
| process | transistors | when | speed and name |
| 10 u | 2300 | 1971.11 | 108 KHz Intel 4004 |
| 6 u | 6000 | 1974.04 | 2 MHz Intel 8080 |
| 3 u | 6500 | 1976.03 | 5 MHz Intel 8085 |
| 29K | 1978.06 | 5 MHz Intel 8086 | |
| 1.5 u | 134K | 1982.02 | 6 MHz 80286 |
| 1 u | ? | 1988.04 | 25 MHz 386 DX |
| .8 u | 1.2 M | 1991.06 | 50 MHz 486 DX |
| .6 u | 3.2 M | 1994.03 | 90 MHz Pentium (P54C) |
| .35 u | 3.2 M | 1995.06 | 133 MHz Pentium (mixed .6/.35 process was used in 1995.03 for the 120-MHz Pentium) |
| .25 u (P856) | 4.5 M | 1997.09 | 200MHz Pentium MMX |
| .18 u (P858) | 27.4 M | 1999.06 | 400 MHz Pentium II |
| 130 nm (Px60) | 28 M | 2002.02 | Pentium-IIIM "coppermine" First process to use copper, rather than aluminum, interconnects; first to use 300-mm wafers (but 200-mm wafers are also used with this process) |
| 90 nm (P1262) | >= 58M | 2004.02 | 2 GHz IBM PowerPC 970FX "CMOS 10S" process: first process to use strained Silicon for the channel (semiconducting layer). Now using 300-mm wafers exclusively. |
| 65 nm (P1264) | - | 2007.06 | prediction from 2003 ITRS: completion of first three-year retooling cycle |
| 45 nm (P1266) | x | 2010.06 | prediction from 2003 ITRS: metallic gates plan is to use high-K dielectric rather than SiO2, and metal gate electrode rather than polysilicon |
| 32 nm (P1268) | x | 2013.06 | prediction from 2003 ITRS: extreme ultraviolet (EUV) lithography |
| 22 nm (P1270) | x | 2016.06 | prediction from 2003 ITRS will probably require an interconnect material superior to copper |
| 16 nm | x | 2020.06 | At this point quantum tunneling will be a problem |
Notes:
"process length" is the DRAM 1/2 pitch, a dimension relevant to memory chips. For CPU cores, the relevant dimension is "physical gate length", and is about 45% of the process length. For example, when the 90nm process is used, the gate length is 37nm.
Sources:
http://www.informit.com/articles/article.asp?p=130978&seqNum=3 http://www.architosh.com/news/2003-12/2003c-1203-tb-moore.phtml http://www.willus.com/archive/timeline.shtml http://public.itrs.net/Files/2003ITRS/ExecSum2003.pdf http://news.com.com/2100-73373-5112061.html?tag=stpop http://www.intel.com/research/silicon/itroadmap.htm
Most benchmark numbers are from old LINPACK reports.
test: see Amdahl
Robert Munafo's home pages at Pair Networks
© 1996-2008 Robert P. Munafo.
Email the author
This work is licensed under a
Creative Commons Attribution 2.5 License.
Back to my main page
s.13