[Author's note 11th Mar 2011 - In the IT business writing predictive articles like this one is simply asking to be humiliated, through massively underestimating the pace of change in our industry. The reason I'm uploading the following Byte article from 1997 is that I'm quite proud of the fact it defies such humiliation. My predictions of limiting transistor feature sizes and clock speeds by 2010 lie within the same order of magnitude as today's actual parameters - roughly speaking 35nm (0.035 micron) feature-size and 1.5 GHz clock speed - which is as close to a bullseye as one can hope for in this game!]
An edited version of this material originally appeared in the March, 1998 issue of Byte magazine, a McGraw-Hill publication.
THE LIMITS OF COMPUTING
We've been living with Moore's Law for more than 30 years now, so the idea that one day soon it might cease to apply is rather hard to accept. It was back in 1965, three years before he co-founded Intel, that the promising young engineer Gordon Moore noticed that the number of transistors you could squeeze onto a single chip was doubling every year. Those early chips contained only a few dozen transistors, but once the semiconductor business got into its stride the pace slackened a little, doubling the transistor count only every 18 months, and it's been doing that ever since. Now the Pentium Pro contains 5.5 million transistors, and the next generation Merced will have closer to 10 million. What's more, as the number of transistors doubles, the next generation chip doesn't cost much more per unit to manufacture, so the cost per transistor actually halves every 18 months.
This happens because chips are fabricated using lithography - in effect a kind of printing process - so the economics work rather like print on paper. It would cost almost exactly the same for Byte to print twice as many words, at half the point-size, on this page, though you might destroy your eyesight trying to read them. That problem doesn't arise with transistors, because a half-size transistor is just as good as its predecessor. Actually it's better, because smaller transistors are faster than big ones, at least in the CMOS (Complementary Metal Oxide Semiconductor) technology from which all modern chips are built. As you shrink CMOS transistors you can switch them proportionally faster by raising the clock frequency, while the power consumed by each transistor remains the same. The fortunes of Intel and the other semiconductor giants have rested for nearly 20 years on this virtuous circle of physics. It's the reason that the 166MHz Pentium machine I'm writing this article on is hundreds of times more powerful than the single mainframe that served the whole of London University during my student days in the 1960s.
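To put numbers on that virtuous circle, here's a toy sketch of what doubling every 18 months implies, starting from the Pentium Pro's 5.5 million transistors. The function name and figures beyond those quoted above are my own illustration, not anything from the chipmakers:

```python
# Illustrative sketch of Moore's Law arithmetic - the 18-month doubling
# period comes from the article; the function itself is hypothetical.

def transistors_after(years, start=5_500_000, doubling_months=18):
    """Project a transistor count forward under an 18-month doubling law."""
    periods = (years * 12) / doubling_months
    return start * 2 ** periods

# Two doublings in three years turns 5.5 million transistors into 22 million,
# while the cost per transistor falls by the same factor of four.
print(f"{transistors_after(3):,.0f}")  # 22,000,000
```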
Every few years someone predicts the end of this exponential growth of chip density, but every time so far they've been wrong. Now, though, there are signs that a slackening of the pace is in sight, not next year but within the next decade, and the reasons may be as much financial as technical. Going back to that print-on-paper analogy, you may have already spotted a weakness in the argument - printing twice the words on each page might not raise the incremental cost of printing the page, but Byte's overhead would double because the writers would need paying twice as much for all the extra words. The analogy with semiconductors holds good; as chips get denser, the cost of designing them and the cost of fabrication equipment goes up too, not quite so fast perhaps but following a similarly exponential growth curve. An increasing number of industry commentators are coming to believe that it will be the cost of building new fab plants, rather than technological obstacles, that causes Moore's curve to flatten off into a plateau.

May The Physics Be With You
The Pentium Pro and PowerPC microprocessors that power our current generation of desktop machines are built in CMOS technology with a 0.35 micron feature size (roughly speaking the size of each transistor and the width of the metal tracks that connect them) while the next versions of these chips, currently being sampled, will shrink to a 0.25 micron process. As Tom Thompson explained in this spot last year ["Limits of Computing", Byte April 1996] these sizes are already pushing up against one kind of technical barrier, in the lithography process itself. The laws of optics require you to use light of wavelength no greater than the smallest feature to accurately copy the masks that describe a chip's layout onto silicon. Current litho machines use mercury arc light of 0.365 micron wavelength, and next generation machines will move to exotic krypton/fluorine UV lasers which emit at 0.248 microns. As chip fabricators try to reduce the feature size further, the illumination sources they need get ever more exotic, more expensive and less tractable. UV lasers, plus optical tricks to enhance resolution, are good down to about 0.1 micron, but below that you need X-rays and then things get really tricky. No method exists for focussing X-rays, so you can't exploit optical reduction; masks must now operate like 'contact prints', meaning they need to be the same size as the chip itself, which makes them terribly difficult to manufacture. It's also hard to find mask materials that are sufficiently opaque to X-rays. Despite decades of research, no commercial chip has yet been made using X-ray lithography.
Lithography is only the first obstacle. At around 0.1 micron feature sizes, problems with transistor threshold voltages, capacitance and resistance in the metal interconnects also hit crisis point. The attraction of CMOS is that - in contrast to competing technologies - transistors made in CMOS consume almost no current when they're turned off, the benign characteristic that enables immensely complex chips like the Pentium to keep reasonable power consumption. As you shrink chip designs each individual transistor can run faster while consuming the same amount of power, but of course the total number of transistors per unit area is increased, and so the power consumed (and hence heat dissipated) per unit area rises by a square law - halve the feature size, four times the watts/squ cm. Another benign property of CMOS technology comes to the rescue - the transistors' voltage and power consumption are also related by a square law, so that lowering the supply voltage can compensate for this increased power consumption. Dropping from 5 volt to 2 volt operation brings a roughly 6-fold power saving (5²/2² = 25/4) and dropping to 1 volt would reduce power consumption 25-fold. That's why chip designers have been reducing CPU core voltages over the last decade as feature sizes get smaller, by steps from 5v, through 3.3v and 2.8v, down to 2.5 and even 1.8 volts for next-generation chips. As a result we keep getting double the computing power for almost the same power consumption.
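The square-law trade-off is easy to check on the back of an envelope. A quick sketch (the function is my own illustration, not anything from a design manual):

```python
# Dynamic power in CMOS scales with the square of the supply voltage,
# so a voltage drop buys a quadratic power saving - the figures below
# reproduce the article's 6-fold and 25-fold examples.

def power_ratio(v_old, v_new):
    """Relative power saving from a supply-voltage drop (P ~ V squared)."""
    return (v_old / v_new) ** 2

print(power_ratio(5.0, 2.0))  # 6.25 - the roughly 6-fold saving at 2 volts
print(power_ratio(5.0, 1.0))  # 25.0 - the 25-fold saving at 1 volt
```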
But in semiconductor design the trade-offs just keep on coming at you. Reducing the supply voltage reduces the maximum clock rate for the chip by increasing the gate delay (ie. the transistors switch slower), unless you offset this by also reducing the transistors' threshold voltage (the voltage at which they switch on). Lower the threshold voltage too far and the transistors will not work at all, but even before that point they begin to leak current even when turned off. Steve Furber, Professor of Computing at Manchester University, is an authority on low-power chip architectures (he designed the original ARM chip), and he describes the situation like this: "in standard bulk CMOS, for every 60 millivolts you lower the threshold, leakage goes up by an order of magnitude. At the moment the industry is lowering the supply voltage only as fast as it has to, to avoid transistor breakdown." The bottom line of this mind-boggling series of trade-offs is that just under 1 volt operation looks like a fundamental limit for CMOS technology, and that's exactly where sub-0.1 micron chips are headed.
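Furber's rule of thumb can be turned into a toy formula - an illustration only, not a real device model, and the function name is my own:

```python
# Subthreshold leakage rule of thumb from the quote above: in standard
# bulk CMOS, leakage grows tenfold for every 60 mV the threshold drops.

def leakage_multiplier(delta_vt_mv):
    """Leakage increase for a threshold reduction of delta_vt_mv millivolts."""
    return 10 ** (delta_vt_mv / 60)

print(leakage_multiplier(60))   # 10.0 - one order of magnitude
print(leakage_multiplier(180))  # 1000.0 - three orders of magnitude
```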
Cross-talk is another gremlin that will make 0.1 micron hard to achieve. Today's chips are essentially 2-dimensional objects, so as you shrink them you have to cram everything closer together. The tracks that join the transistors must get narrower, but to avoid increasing their resistance too much they have to be left relatively thick. Eventually they look more like walls than roads (see fig. 1), and neighboring pairs of tracks start to behave as long parallel plate capacitors, which both slows down signal transmission and allows the signal on one wire to leak into its neighbour (cross-talk).
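A rough parallel-plate estimate shows why those wall-like tracks couple so strongly. The dimensions below are illustrative guesses, not real process figures, and the function is my own sketch:

```python
# Neighbouring tracks behave as parallel-plate capacitors: C = eps * A / d.
# Keep the tracks tall (to hold resistance down) while halving the gap
# between them, and the coupling capacitance per unit length doubles.

EPS_OXIDE = 3.9 * 8.854e-12  # permittivity of an SiO2 dielectric, F/m

def coupling_cap_per_mm(track_height_um, gap_um):
    """Parallel-plate estimate of track-to-track capacitance per mm, in fF."""
    c_per_m = EPS_OXIDE * (track_height_um * 1e-6) / (gap_um * 1e-6)
    return c_per_m * 1e-3 * 1e15  # farads per metre -> femtofarads per mm

print(coupling_cap_per_mm(0.6, 0.35))   # tall track, 0.35 micron gap
print(coupling_cap_per_mm(0.6, 0.175))  # half the gap -> twice the coupling
```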
Chip designers will be fighting an uphill battle against cross-talk. For example the latest generation of CAD tools, when asked to lay out a long straight run of tracks, will automatically space them further apart than the minimum width, but as Furber observes "of course that's very good for reducing cross-talk, but it starts to negate the advantage of the small feature size". One solution that promises to delay the inevitable is employing a better conductor, copper, for the interconnect so that tracks can be made with a lower profile to reduce intertrack capacitance (see Text Box "Copper-bottomed chips"). Keeping up exponential growth beyond 0.1 micron will mean moving beyond the current universe of 2-dimensional CMOS fabrication. There are hundreds of different university research projects into 3-D chip structures and molecular level transistors built from exotic organic polymers, but none of these technologies yet looks like a viable basis for such a huge industry.
It's a combination of all these difficulties that may cause Moore's exponential curve to level off, at somewhere just below 0.1 micron feature size, with over 100 million transistors on a chip, a 1 Gigahertz clock rate and 1 volt operation, sometime during the next decade. Steve Furber thinks "we've been blessed so far, in that the sort of things ECL [Emitter Coupled Logic as used in 1970s mainframes] board designers used to worry about - transmission line effects, ringing and so on - have not existed in CMOS chips to any significant extent, but they're coming". The SIA's (Semiconductor Industry Association) National Technology Roadmap for Semiconductors, a collection of best-guess predictions from experts across the industry, agrees, hinting at a slackening of the increase in clock speed around the 1 GHz level by year 2010. Outside of CPU design, there are other obstacles. In DRAM fabrication, progress is slowing as cell sizes fall below 0.2 micron, and a move to expensive Silicon-on-Insulator technologies may be needed to get past the 1 Gbit level. Bus speeds currently lag CPU speeds by a factor of five (eg. a 300 MHz Pentium on a 66 MHz bus), putting tremendous pressure on cache technologies to absorb the difference. A 1 GHz CPU will demand at least a 100 MHz bus - Intel already has motherboards that fast running in its labs - but such boards are hard to design and expensive to make, which may upset the economics of the commodity PC market. However in a sense none of these technical obstacles are the real problem - instead it may be the business model that is in danger of topping out (see Text Box: The Cost of Fab).
A Quantum Leap?
What also makes the 0.1 micron threshold hard to cross is that at those sizes quantum effects start to become important - the number of electrons in the body of each transistor is only a few thousand, rather than billions. Ohm's Law and other rules that electrical engineers work by only describe the statistical average behaviour of huge numbers of electrons - once you get electrons on their own they follow the very different laws of quantum mechanics. As Steve Furber puts it "the probability that a few hundred electrons will spontaneously migrate to the next transistor may be only zero-point-many-noughts percent, but given billions of transistors, in a chip that's expected to perform billions of operations per second, without error for years, you'd eventually start to see that happen."
But what about the possibility that quantum effects might be put to use to create a new kind of computer? Though many researchers are pursuing that goal, there aren't any working quantum computers yet. Present experiments are still at the level of demonstrating one or two working quantum gates. However the excitement that drives the search for a quantum computer is that it promises to be able to solve classes of problem that are for ever denied to conventional computers.
Rather than operating on the bulk properties of electrons like a 'classical' digital computer, quantum computers operate at the level of single quantum particles - atoms, ions, single electrons or photons. Tom Thompson described some of the current experimental systems in his 1996 article - promising systems include beryllium or caesium atoms trapped inside resonant cavities, and quantum 'dots', which are single electrons trapped in the energy wells at the junction of two layers of doped silicon. Many of the experiments on quantum computing use lasers and mirrors because quantum optics is a well understood field, rather than because that's the way to practical devices. Which physical implementation eventually triumphs is not too important at this stage though, as the principle underlying them all is the same, in a very profound sense (see text box "The Square Root of Not").
Quantum computers operate on basic entities called qubits, analogous to the bits in a classical computer but more powerful. A qubit can be in one of three states: 0 and 1 are the same as for a classical bit, but the third state is a superposition of the two, both 0 and 1 at the same time. Were you to build a 3 qubit-wide register and put all 3 qubits into the superposed state, that register would contain all the numbers between 0 and 7 at the same time. Any arithmetic operations you perform on this register will be applied to all these numbers simultaneously, so quantum computers are intrinsically highly parallel machines, and the implications of this are startling (see box "Taming Tough Algorithms").
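A classical program can at least mimic the bookkeeping for a tiny register. The sketch below is my own illustration (not a real quantum program): it shows how 3 superposed qubits hold 8 numbers at once, each with an equal share of the measurement probability:

```python
# Toy state-vector view of a 3-qubit register in full superposition:
# one amplitude per value 0..7, each equal to 1/sqrt(8), so that the
# squared amplitudes (measurement probabilities) sum to 1.
import math

n_qubits = 3
amp = 1 / math.sqrt(2 ** n_qubits)   # equal weight for each value
register = [amp] * (2 ** n_qubits)   # amplitudes for 0, 1, ..., 7

# Any operation on the register acts on all 8 numbers simultaneously;
# reading it collapses the state to just one of them at random.
probabilities = [a * a for a in register]
print(len(register))                   # 8 numbers held at once
print(round(sum(probabilities), 10))   # 1.0
```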
In order to exploit quantum parallelism you need the right kind of problem and algorithm. For example adding two superpositions of the numbers 0..7 would produce a superposition of 0..15, but reading the register would collapse it to one random number from that range, probably not the correct result - addition is not a quantum operation. Quantum computers have a different range of usefulness from classical computers, and in the opinion of David Deutsch of Oxford University "they will probably never be used to add up accounts in banks". Deutsch wrote the seminal paper on quantum computing in 1985, and is a leading figure in the quantum computing community. He went on to explain "the way that all quantum computers will be used is as a module that you plug into a classical computer, which will program the lasers to first set up the quantum bits in their initial state, then to do the desired computation on them, and finally to read out the answer to be incorporated into some classical computation."

Deutsch doubts that quantum computers will ever be fabricated with the sort of large scale integration we see in CMOS devices. You can't transmit superpositions down wires, and the problem of decoherence would forbid that sort of approach. Instead he sees the first practical quantum computers as consisting of a single ion-trap cavity, containing a few tens of ions. These would be kept close to absolute zero by laser cooling, using the same tunable laser that performs the computations. The laser would excite the ions by pulsing at different frequencies, causing different pairs to interact, rather like the way that the metal balls in a Newton's Cradle toy interact. The exponentially parallel nature of the quantum computer means you could still solve huge problems with only a few qubits. As Deutsch says "for classical computers the difficulty has been to make them smaller and faster, but for a quantum computer the difficulty will be to put more bits into the interactive region".
Exciting as quantum computers are, they are not in direct line of descent from MOS technologies, and the same economics are unlikely to apply to their fabrication - they are likely to have their own, different equivalent to Moore's Law.
Taming Tough Algorithms
Perhaps the hardest to bear of the limits on computing has nothing to do with physics or materials, but with algorithms. Computer scientists classify algorithms according to how fast their execution time grows with the problem size n. The set P (for Polynomial) is the set of all algorithms whose running time grows only as a constant power of the problem size n (eg. as n² or n³). P contains many well-known polynomial algorithms for sorting, searching and so on that form the basis for most of our computer software.
On the dark side is NP, the set of all algorithms whose time grows exponentially (or worse) with problem size (eg. as 2ⁿ). The running time of NP problems can soon overwhelm any known computer for even quite moderate problem sizes - and there are many economically important problems for which only NP algorithms are known. For example Public Key Cryptography is secure precisely because factorizing large numbers is an NP problem. These problems are not all so exotic either - even humble tasks like creating an optimum school timetable, or planning the best routes between several cities are NP problems if you need an exact solution. An NP routing problem that takes 17 minutes (2¹⁰ secs) to solve for 10 cities would take 40 billion trillion years (2¹⁰⁰ secs) for 100 cities. The real buzz about the quantum computer is that it promises to cut NP problems down to size. That's because quantum parallelism is itself exponential in nature - a register made up of n superposed qubits operates on 2ⁿ numbers in a single 'cycle'. That means that a quantum computer could in theory tackle what are at present NP problems in polynomial time, making them just another mundane computing task.
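The arithmetic behind those running times is easy to verify. A quick sketch using the same approximate figures:

```python
# Exponential blow-up in running time: 2^10 seconds is about 17 minutes,
# while 2^100 seconds is on the order of 4 x 10^22 years - the article's
# "40 billion trillion years" (a billion trillion being 10^21).

SECONDS_PER_YEAR = 60 * 60 * 24 * 365

print(2 ** 10 / 60)                 # minutes for the 10-city problem
print(2 ** 100 / SECONDS_PER_YEAR)  # years for the 100-city problem
```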
If and when a working quantum computer can be built, its first "killer app" will be the factorizing algorithm discovered by Peter Shor of Bell Labs in 1994. Shor's algorithm works by filling a register with n qubits and setting them to the superposed state, then raising a random number x to the power of the register contents (ie. all the integers up to 2ⁿ), dividing by the number to be factored, N, and storing the remainders in another register. The remainders form a repeating sequence, whose repeat frequency f is revealed by allowing the qubits in the second register to interfere with one another, and from f you can calculate a factor of N. The method sometimes fails, but trying again with a different x will succeed, and since the whole process takes only three quantum steps, you can afford to keep trying until it does. No doubt government agencies (and businesses) that depend on Public Key encryption are watching these developments with interest. Fortunately it seems likely that quantum encryption algorithms can be found which are wholly uncrackable.
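The number theory at the heart of Shor's algorithm can be followed classically. The brute-force sketch below is my own toy code with a hypothetical function name - it loses all of the quantum speed-up, finding the repeat period by trial rather than by interference, but it shows how a period yields a factor:

```python
# Classical walk-through of the arithmetic Shor's algorithm exploits:
# find the period r of x^a mod N, then (if r is even) gcd(x^(r/2) - 1, N)
# is often a non-trivial factor of N. Hopelessly slow for real key sizes.
from math import gcd

def factor_via_period(N, x):
    """Try to factor N from the period of x^a mod N; may fail for some x."""
    if gcd(x, N) != 1:
        return gcd(x, N)           # lucky: x already shares a factor with N
    r, value = 1, x % N            # find the smallest r with x^r = 1 mod N
    while value != 1:
        value = (value * x) % N
        r += 1
    if r % 2:                      # odd period: retry with a different x
        return None
    candidate = gcd(pow(x, r // 2) - 1, N)
    return candidate if 1 < candidate < N else None

print(factor_via_period(15, 7))    # period of 7 mod 15 is 4 -> factor 3
```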