Have we finally broken Moore's Law? Are the last two years' announcements of multi-core processors*, rather than denser and faster single cores, a subtle signal that semiconductor manufacturers are at the end of the line? The short answer is: of course not. It's a convergent infinite series, and the real question is where the asymptote lies. But I think something else may happen before we get there. Here's why.
Please recognize that I understand just enough about device physics and process technology to be dangerous, but enough may do. I recently got a briefing that illustrated just how few dopant atoms there actually are in our next generation of transistors. When you're talking about mere handfuls of atoms, statistical process variations are correspondingly huge, and we may soon reach a point where the math offers too few handles for grappling with them. The briefing concerned a DSP chip built on a 28-nm process node; it works on the bench now and should be available in commercial quantities in two years or so.
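To put a rough number on "correspondingly huge," here is a back-of-the-envelope sketch. It assumes dopant atoms land independently, so the count in a transistor channel follows Poisson statistics (a textbook model, not something from the briefing). For a Poisson count with mean N, the standard deviation is √N, so the relative spread is 1/√N. The dopant counts in the loop are hypothetical, not figures for TI's 28-nm process.

```python
import math

def relative_sigma(mean_dopants: float) -> float:
    """Relative 1-sigma spread of a Poisson-distributed dopant count.

    For a Poisson count with mean N, the standard deviation is sqrt(N),
    so sigma / N = 1 / sqrt(N).
    """
    return 1.0 / math.sqrt(mean_dopants)

# Hypothetical mean dopant counts per channel, for illustration only.
for n in (10_000, 1_000, 100, 10):
    print(f"N = {n:>6}: 1-sigma = {relative_sigma(n):6.1%}, "
          f"3-sigma = {3 * relative_sigma(n):6.1%}")
```

Under this simple model, a channel averaging 100 dopants has a 3σ spread of 30% in its dopant count, and one averaging 10 has a 3σ spread of nearly 95%. That is why shrinking the count makes the statistics, not the lithography, the hard part.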
For what it's worth, here's a link to the Wikipedia entry on Moore's law. The article says what you'd expect, but it includes some interesting commentary.
Now for the ISSCC briefing. At the conference, I talked with TI's Gordon Gammie, the TI lead on a combined TI/MIT team that reported a breakthrough at 28 nm. (The MIT side of the team was headed by Professor Anantha Chandrakasan.) The team's paper, "A 28nm 0.6V Low Power Digital Signal Processor (DSP) for Mobile Applications," detailed the design methodologies developed in the joint research project. Gammie said the chip is a 28-nm version of a TI TMS320 VLIW DSP. When the IC reaches production, the virtue of the advanced process technology will be longer battery life, thanks to the lower operating voltage. For engineers who understand more about process technologies than I do, the paper says the technology uses "a dual-gate poly/SiON gate stack, double patterning at gate, high-NA 193i lithography and epitaxial S/D SiGe for pMOS performance enhancement."
As I usually do, I asked Gammie, "What was the hard part?" He explained that there were multiple hard parts, but the most significant had to do with process variations that led to timing closure and threshold voltage problems. No surprise there, but he said the issues were not caused by the lithography; they were caused by how few dopant atoms there were in the lattice of each down-scaled transistor. In turn, this led to variability in propagation delay, which created challenges in achieving timing closure.
As the paper puts it: "Timing closure is a challenging problem for ULV designs. . . . . [F]or a representative library cell, the local 3σ delay variation is larger than the 3σ global corner delay by 1.5×. As the supply voltage decreases from 1.0 V to 0.5 V, the 3σ global corner delay increases by 15× and the standard deviation of the local delay increases by 100×."
Also: "Predicting path-level delay distributions for timing closure is another challenge of ULV design. Modeling delay with Gaussian PDFs, as traditional statistical static timing analysis (SSTA) tools do, leads to a consistent underestimate of the actual delay distribution by 10-70%."
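Why would Gaussian models underestimate delay at ultra-low voltage? One textbook explanation (an assumption here, not the paper's own analysis) is that near or below threshold, drive current depends roughly exponentially on threshold voltage. A Gaussian spread in Vth therefore produces a skewed, roughly lognormal spread in delay with a long slow tail. The sketch below samples such a delay model and compares the sampled 99.87th percentile (the one-sided 3σ point) with the mean + 3σ estimate from a Gaussian fitted to the same samples. The constants `SIGMA_VTH`, `N_SLOPE`, and `V_THERMAL` are hypothetical illustration values.

```python
import math
import random
import statistics

# Hypothetical device parameters, chosen only for illustration.
SIGMA_VTH = 0.030   # V, local 1-sigma threshold-voltage variation
N_SLOPE = 1.5       # subthreshold slope factor
V_THERMAL = 0.0259  # V, kT/q at roughly room temperature

def sample_delays(num_samples: int, seed: int = 1) -> list[float]:
    """Sample normalized gate delays under a simple subthreshold model.

    Delay is assumed to scale as exp(dVth / (n * kT/q)), so a Gaussian
    dVth yields a lognormal delay distribution (nominal delay = 1.0).
    """
    rng = random.Random(seed)
    return [math.exp(rng.gauss(0.0, SIGMA_VTH) / (N_SLOPE * V_THERMAL))
            for _ in range(num_samples)]

def compare_tail(delays: list[float]) -> tuple[float, float]:
    """Return (Gaussian-fit 3-sigma delay, sampled 99.87th percentile)."""
    mu = statistics.fmean(delays)
    sigma = statistics.stdev(delays)
    gaussian_3sigma = mu + 3.0 * sigma
    ordered = sorted(delays)
    sampled_3sigma = ordered[int(0.9987 * (len(ordered) - 1))]
    return gaussian_3sigma, sampled_3sigma

delays = sample_delays(200_000)
gauss_est, actual = compare_tail(delays)
print(f"Gaussian-fit 3-sigma delay: {gauss_est:.3f}")
print(f"Sampled 99.87th percentile: {actual:.3f}")
print(f"Underestimate: {(actual - gauss_est) / actual:.1%}")
```

With these made-up parameters, the Gaussian fit falls well short of the sampled tail. The size of the gap depends entirely on the assumed σ(Vth), so it should not be read as a reproduction of the paper's 10-70% figure.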
To deal with these hard parts, TI and MIT developed some new methodologies. One was a new statistical static-timing analysis technique. "For this design, setup and hold time margins at 0.5 V are verified using a new ULV SSTA design methodology based on Non-linear Operating Point Analysis for Local Variations (NLOPALV). NLOPALV models cell delay distributions to within 5%, and can be used in conjunction with existing static timing analysis (STA) tools to predict path delays to within 8%."
By way of results the paper offers: "The DSP SoC design has been fabricated and demonstrated to be operational from 587 MHz at 1.0 V (113 mW) down to 3.6 MHz at 0.34 V (720 μW) when operating from external memory (caches disabled). At the ULV target voltage of 0.5 V, the maximum frequency is 43.4 MHz. The on-chip caches are functional for supply voltages above 0.6 V. When executing from cache, the chip scales from 145 mW at 331 MHz (1.0 V) down to 5.9 mW at 14.4 MHz (0.6 V). For lower voltage and reliable ULV cache operation in production, redundancy and repair should be implemented. Active and leakage power scale by 60× and 8.5×, respectively, when executing from cache, and by 1240× and 39× when executing from external memory. The measured leakages are representative of early development silicon with transistors not yet at final leakage targets for the technology. The minimum energy per cycle occurs at 0.75 V (cache on), or 0.5 V (external memory) and is expected to reduce slightly as leakage is reduced."
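Dividing power by clock frequency gives energy per cycle, the figure of merit behind the paper's "minimum energy per cycle" remark. The sketch below does that arithmetic for the four operating points quoted above. It uses only the paper's published numbers; the dictionary name and its keys are just labels.

```python
# Operating points quoted from the paper: (power in watts, frequency in Hz).
OPERATING_POINTS = {
    "external memory, 1.0 V": (113e-3, 587e6),
    "external memory, 0.34 V": (720e-6, 3.6e6),
    "cache, 1.0 V": (145e-3, 331e6),
    "cache, 0.6 V": (5.9e-3, 14.4e6),
}

def energy_per_cycle_pj(power_w: float, freq_hz: float) -> float:
    """Energy per clock cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12

for label, (power, freq) in OPERATING_POINTS.items():
    print(f"{label:>24}: {energy_per_cycle_pj(power, freq):6.1f} pJ/cycle")
```

The arithmetic gives about 193 pJ at 1.0 V versus 200 pJ at 0.34 V from external memory, and about 438 pJ versus 410 pJ from cache. Energy per cycle does not keep falling as voltage drops, because leakage energy per cycle grows as the clock slows. That fits the paper's placing the minimum at an intermediate voltage (0.75 V with cache, 0.5 V from external memory).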
So multiple processor cores may be a sign of impending power problems for Moore's law, but on the scaling front, companies like TI, working with students and professors like the ones at the 'Tute, are still squeezing transistors nanometer by nanometer, and the lithography still works at far smaller scales than we ever thought it could be pushed. But, sheesh! Gammie was talking about bare handfuls of dopant atoms per transistor. And these are quantum entities; it's not like loading billiard balls into a rack on a pool table. The next few turns are bound to be interesting.
*Here's a historical footnote about multi-core processors. Circa 1990, when I was working at Cypress Semiconductor, Sun Microsystems announced a 4-core version of the SuperSPARC processor, and Cypress developed a pair of cache-controller chips. The challenge at the time was figuring out what to do with four cores. This was before Web browsers, before server farms, before virtualization. Essentially, the only boxes that could use the chips were workstations, but there were no C compilers that could break an application program into multiple parts and put it back together again, and few applications needed that capability. ("Hey! We could simulate all four process corners at once!" somebody shouted.) Multiple cores were a better mousetrap, but first we needed better mice.