Much has been written in the past few months about the upcoming sunset of Moore’s Law. Stated succinctly, Gordon Moore predicted in 1965 that the number of transistors in ICs would double every 12 months [1]. There have been other versions of his prediction, including shifts to a performance metric and altered timeframes. However, the message is clear: a key metric doubles on a regular schedule and has for many years.
Moore has also stated that no exponential growth can continue forever, and it appears we are nearing the end of conventional silicon semiconductor scaling as we know it. Fundamental limits are being approached now both in planar silicon transistor technology and in on-chip interconnects. Furthermore, the costs for leading-edge photolithography have greatly accelerated as process nodes have shrunk.
Making faster transistors that are smaller and don’t leak when they’re turned off is getting extraordinarily difficult and expensive. They need to be good switches with low on-resistance and high off-resistance, and they need to change states very quickly. Despite the largely planar architecture of the process they’re made on, 45-nm generation transistors are highly complex.
Such transistors combine strain engineering and raised source/drain structures with exotic gate stacks employing high-K gate dielectrics and metal replacement gates as the control electrode (Fig. 1) [2]. The 32-nm process node will continue to feature planar transistors, but what happens at the 22-nm node and beyond isn’t so clear.
Interconnecting the transistors is also becoming more challenging. Interconnect delay and interconnect density are the primary issues to address [3]. As transistors scale, so must the on-chip electrical interconnects.
Depending on the distance separating two interconnected gates, the dominant factor determining the overall propagation delay changes. For localized wiring, the resistance of the transistor dominates. But for longer connections, the resistance of the wires becomes the dominant factor once a certain critical length of wiring is reached.
The critical length is that length where the RC delay of an interconnect line equals the delay of the same length line with a buffer inserted mid-line driving a fanout of four (Fig. 2). As the process node shrinks, this critical length shrinks as well. For a 65-nm process node, the critical length is 100 µm. For 45 nm, it’s 70 µm. And for the 32-nm node, it’s 50 µm (Fig. 3).
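The break-even point can be sketched with a simplified Elmore-delay model: an unbuffered distributed RC line has a delay of roughly 0.5·r·c·L², while a mid-line buffer splits it into two half-length segments at the cost of the buffer’s own delay. The wire parasitics and buffer delay below are assumed round numbers for illustration, not figures from any process design kit:

```python
import math

def critical_length_um(r_per_um, c_per_um, t_buf):
    """Length at which an unbuffered distributed RC line's delay
    (0.5*r*c*L^2) equals the buffered version's delay
    (2 * 0.5*r*c*(L/2)^2 + t_buf). Equating the two and solving
    for L gives L_crit = sqrt(4*t_buf / (r*c))."""
    return math.sqrt(4.0 * t_buf / (r_per_um * c_per_um))

# Assumed values: 20 ohms/um wire resistance, 0.2 fF/um capacitance,
# 10-ps delay for a buffer driving a fanout of four.
L_crit = critical_length_um(20.0, 0.2e-15, 10e-12)
print(f"critical length: {L_crit:.0f} um")  # 100 um with these inputs
```

With these assumed parasitics, the result happens to land near the 100-µm figure quoted for the 65-nm node; the point is the square-root dependence, which shrinks the critical length as scaled wires become more resistive.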
As these longer wires are scaled, their resistance increases faster than their capacitance decreases, owing to fringing capacitance from the conductor sidewalls. But another factor is looming on the scaling horizon: electron scattering in the interconnect metallization.
As the physical dimensions of a conductor approach the mean free path of its charge carrier (electrons), scattering at edges and grain boundaries greatly increases. The scattering of the electrons impedes the flow of current, increasing the bulk resistivity of the material.
For 30-nm dimensions in copper, this can more than double the bulk resistivity of the copper interconnect (Fig. 4). Figure 5 shows NEC data for the critical dimensions of metallization structures as process nodes progress from 65 to 32 nm. The dimensions are approaching the mean free path, so this phenomenon is now becoming an issue.
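The size effect can be estimated with a simplified form of the Fuchs surface-scattering model. The 39-nm mean-free-path value for copper and the 3/8 prefactor are textbook approximations; grain-boundary scattering (not modeled here) adds further resistivity, which is how the total increase can exceed 2×:

```python
def resistivity_ratio(d_nm, mfp_nm=39.0, p=0.0):
    """Simplified Fuchs estimate of thin-conductor resistivity:
        rho / rho_bulk ~= 1 + (3/8) * (1 - p) * (mfp / d)
    where p is the fraction of electrons reflected specularly at the
    surfaces (p=0 is fully diffuse scattering, the worst case)."""
    return 1.0 + 0.375 * (1.0 - p) * (mfp_nm / d_nm)

# Copper's electron mean free path is roughly 39 nm at room temperature.
print(resistivity_ratio(30.0))  # ~1.49x from surface scattering alone
```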
This is a serious technological challenge with no known solution [4]. On the one hand, we need scaled interconnect dimensions to pack more circuitry per unit area. But on the other hand, if we make wire dimensions too small, the electrical properties of the conductor are seriously degraded to the point where the result can be a slower chip after shrinking. In these cases, it makes more sense to use vertical connections to a 3D stacked chip rather than routing high-speed signals across a big die.
POWER AND BANDWIDTH
Power is another serious concern that has been driving scaling. Historically, a given processor scaled to a more advanced process node will dissipate less power at a given performance level. Holding power constant, performance can therefore be increased. The advanced process node is simply more energy-efficient.
Instead of pure clock-rate increases, it is more power-efficient to limit clock rates to moderate levels and to scale performance by increasing the number of processor cores on the die. As a result, the need for memory bandwidth increases. Each core needs its own data, and now there are more cores to feed.
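The trade-off follows from the standard dynamic-power relation P = αCV²f: raising the clock rate usually also requires raising the supply voltage, so power grows roughly with the cube of frequency, while adding cores grows power only linearly. The capacitance, voltage, and frequency values below are assumed for illustration:

```python
def dynamic_power(c_eff, v, f, activity=1.0):
    """Dynamic switching power: P = activity * C * V^2 * f."""
    return activity * c_eff * v * v * f

# Hypothetical baseline core (assumed values):
C, V, F = 1e-9, 1.0, 2e9        # 1 nF switched capacitance, 1 V, 2 GHz
base = dynamic_power(C, V, F)   # 2 W

# Doubling throughput via clock rate, assuming voltage must scale
# with frequency: power grows ~f^3, so 8x the baseline.
faster = dynamic_power(C, 2 * V, 2 * F)

# Doubling throughput via a second core at the original V and f: 2x.
two_cores = 2 * base
print(faster / base, two_cores / base)  # 8.0 2.0
```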
On-chip caches reduce the average external bandwidth requirement when viewed over a broad mix of workloads. But they come at a price of die area and power. Because the caches tend to be tightly coupled with a processing unit, including them on-chip tends to spread the cores apart, aggravating the interconnect length/delay issue.
As the industry transitions from multi-core to many-core architectures, the bandwidth explosion issue must be effectively addressed for the potential of the technologies to be realized. For example, Intel has reported that an 80-core processor needs approximately 1 Tbit/s of external cache memory bandwidth.
Adequately sized third-level caches aren’t practical for on-chip integration, so there is no choice but to implement them on separate die. But delivering that sort of bandwidth to a single CPU chip is challenging from a signal-interconnection and power perspective.
State-of-the-art high-speed chip-to-chip signaling consumes about 25 to 30 mW/Gbit/s of bandwidth [5]. For 1-Tbit/s bandwidth, this works out to 25 to 30 W. Running the links at 5 Gbits/s would require 200 signals that would need to be implemented as differential links, or a total of 400 pins, not counting power and ground connections. Using a power/ground pair for every four pins demands another 200 pins. So, at least 600 pins are needed to connect to this off-chip cache.
Because the third-level cache simply needs to be wide and shallow, short, slow, unterminated links can be used if the memory die can be placed close enough to the processor, since there is little fan-out on any single bus wire. Significant power savings can be realized through simple drivers and unterminated off-chip signaling. In this physically constrained configuration, the bus can be implemented at approximately 2 mW/Gbit/s per link.
If each signal operates at 1 Gbit/s single-ended, the 1-Tbit/s link can be accomplished using 1000 signals. Using a power and ground pair for every four signals adds another 500 connections, for a total of 1500 connections to the memory. Such a simple brute-force scheme trades a large increase in wires for a big decrease in complexity and power.
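The pin and power arithmetic for both signaling schemes can be checked with a short sketch (taking the low end of the quoted 25-to-30-mW/Gbit/s range for the serial case):

```python
def link_budget(total_gbps, gbps_per_signal, mw_per_gbps, differential):
    """Total pins (signal plus power/ground) and link power for a
    chip-to-chip interface, using one power/ground pair per four
    signal pins, as in the text."""
    signals = total_gbps // gbps_per_signal
    signal_pins = signals * (2 if differential else 1)
    pg_pins = 2 * (signal_pins // 4)
    power_w = total_gbps * mw_per_gbps / 1000.0
    return signal_pins + pg_pins, power_w

# Serial SerDes scheme: 5-Gbit/s differential links at 25 mW/Gbit/s.
print(link_budget(1000, 5, 25, differential=True))   # (600, 25.0)

# Wide, slow, unterminated 3D scheme: 1-Gbit/s single-ended links at
# 2 mW/Gbit/s, practical only with the memory die within ~2 mm.
print(link_budget(1000, 1, 2, differential=False))   # (1500, 2.0)
```

The wide scheme spends 2.5× the pins to save more than 10× the signaling power, which is exactly the trade that dense 3D vertical connections make affordable.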
However, this scheme will only work when the memory can be placed physically close to the processor die: less than 2 mm away. That is only practical when the memory can be connected using 3D methods. Conventional packaging cannot provide the density of signal connections and the short physical length required to make such a scheme possible.
By adding large third-level caches implemented as memory die laminated onto a many-core processor die, large and scalable memory bandwidth can be supplied to the processor using 3D through silicon vias (TSVs). The solution avoids the fanout and power issues that are associated with conventional packaging. It additionally avoids the wire dimension scaling problem mentioned above by sending signals up and down to separate die instead of across large die.
It’s faster to go up than it is to go across. This solution to the processor/memory bandwidth explosion likely will be the pathway to continue the Moore’s Law scaling, but using a hybrid combination of traditional semiconductor scaling married to advanced packaging technologies.
CAD TOOLS AND THERMAL ISSUES
Many issues must be addressed to make 3D technology practical. On the design side, the CAD tools needed for 3D IC design aren’t very mature. Today, it is possible to build structures such as the cache atop the many-core processor mentioned earlier. But to truly exploit the full capabilities offered by 3D, additional functionality must be efficiently supported in CAD.
Modern CAD tools aren’t well optimized to partition a many-core processor into a multi-level vertical structure with various elements of the processor stacked on each other to minimize delay and power. They can, however, be used to co-design a memory die for stacking onto a many-core processor using TSV technology. That’s sufficient to begin converting the industry to manufacturing 3D products.
Processors can consume a lot of power, so thermal considerations are of great importance. Increasing clock rate, functionality, and density tends to aggravate thermal issues, and stacking die atop each other concentrates the heat—all undesirable. One of the challenges of 3D integration will be intelligent management of thermals. Once again, CAD tools can bring relief by predicting “hot spots” on the die. Coupled with 3D floor planning, a more uniform heat distribution can be designed into the 3D chip architecture.
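As a toy illustration of how a tool might flag hot spots, the sketch below relaxes a 2D heat-diffusion grid over a hypothetical die floor plan. The grid size, boundary condition, coupling constant, and power map are all invented for illustration:

```python
def hotspot(power_map, ambient=45.0, k=0.25, iters=2000):
    """Jacobi relaxation of a 2D steady-state diffusion equation with
    per-tile heat injection. Edge tiles are held at ambient, standing
    in for the heat-removal boundary. Returns the hottest tile."""
    rows, cols = len(power_map), len(power_map[0])
    t = [[ambient] * cols for _ in range(rows)]
    for _ in range(iters):
        nxt = [row[:] for row in t]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                nxt[i][j] = 0.25 * (t[i - 1][j] + t[i + 1][j] +
                                    t[i][j - 1] + t[i][j + 1]) \
                            + k * power_map[i][j]
        t = nxt
    return max((t[i][j], (i, j)) for i in range(rows)
               for j in range(cols))[1]

# 6x6 floor plan with one high-power core tile at row 2, column 2:
pmap = [[0.0] * 6 for _ in range(6)]
pmap[2][2] = 10.0
print(hotspot(pmap))  # (2, 2) -- the hot spot sits on the power source
```

A real thermal tool solves a full 3D conduction problem with material properties and package models, but the feedback loop into floor planning is the same idea.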
Physical packaging of 3D structures is another challenge. For the processor/memory example above, the question of placing the memory on top of or below the many-core processor die must be answered. Placing the memory atop the processor avoids the need to pass processor signals through the memory die. But the memory die interferes with the heat removal path for the processor because it is placed between the processor die and the heatsink.
On the other hand, placing the memory die between the processor die and printed-circuit board (PCB) means all signals for the processor must pass through the memory die, including power bussing. This complicates the design because many “keepout” regions must be designed into the memory so the vias can be drilled and placed. Passing the processor connections through the memory die increases their length and, hence, the parasitic packaging resistance, capacitance, and inductance.
In some cases, it may be advantageous to embed components such as decoupling capacitors, terminating resistors, or even active die into the package substrate. The key concept is to get these components and ICs electrically and physically close together to minimize undesirable parasitic loading of chip-to-chip interconnect and the time-of-flight delay between chips. Advanced package-substrate technology is needed in this area, involving multi-chip system-in-package applications including embedded active and passive die (Fig. 6).
Another packaging challenge arises from the use of low-K dielectrics for the insulators between the on-chip wiring layers. Low-K dielectrics often feature porous structures to reduce their relative permittivity. Many of these low-K structures are very fragile and can be damaged during normal packaging processes, including back grinding (thinning), wire bonding, and flip-chipping, as well as in subsequent thermal cycles.
An additional challenge is preventing the solder intermetallics that result from flip-chipping from causing electromigration-induced voids in the metallization structures. Tessera has been developing fine-pitch copper-pillar flip-chip substrate technologies with the specific goal of addressing these failure modes (Fig. 7).
A VIABLE MANUFACTURING INFRASTRUCTURE
Critical to all of these advanced technologies is a viable supporting manufacturing infrastructure. More is needed than just semiconductor chips designed to be used in 3D applications. Low-cost manufacturing solutions including testing schemes are needed to assemble known-good die into 3D structures in high volume for the technology to be successfully deployed.
Today’s semiconductor manufacturers extensively use contract assembly and test houses. One of the logistical issues to be addressed for 3D is who performs what function. Depending on the TSV technology used, such as “via first” or “via last,” the foundry or the assembly/test house may be the optimum place to assemble the 3D structures.
The so-called “via first” methods are best performed inside the wafer fabs while the “via last” processes can be effectively deployed in third-party contract assembly facilities. The methods used for bonding can similarly vary, and what works well for a wafer fab may not work so well at a contract assembly house. Bonding methods include oxide-to-oxide bonding (best done in a wafer fab), copper-to-copper bonding, and adhesive-to-adhesive bonding.
Prior to lamination, the devices can be handled in different ways. For example, wafers can be laminated directly to other wafers. This has advantages in terms of throughput, but there are issues with defective die being laminated atop good die. Another approach is to “reconstitute” wafers by placing only good die together onto a virtual wafer for lamination. Another method involves laminating die atop wafers. Each method has its tradeoffs.
The end-use applications will drive the methods used for 3D integration and when the conversion makes sense. The requirements for processors are different than for cell phones or camera modules. The industry has a financial incentive to continue Moore’s Law scaling, but the pathway forward necessarily involves more reliance upon advanced packaging technologies to complement and augment wafer fab approaches.
This is a fundamental change from the way semiconductor device scaling has historically progressed. However, it is required due to the real physical hurdles and skyrocketing cost associated with monotonic process node scaling. The challenge lies in seamlessly implementing this transition and continuing the scaling treadmill without disrupting the industry’s business.
1. G.E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics, Vol. 38, No. 8, April 19, 1965.
2. K. Mistry et al., “A 45-nm Logic Technology with High-K + Metal Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193-nm Dry Patterning, and 100% Pb-free Packaging,” IEDM 2007.
3. ITRS Roadmap 2007, “Interconnect,” pp. 39-40.
4. SEMATECH/Novellus Copper Resistivity Workshop, June 2005.
5. Shekhar Borkar (Intel), “3D Technology: A System Perspective,” presentation at the 3D Conference, Burlingame, Calif., November 2008.
RICHARD CRISP is the director of semiconductor technology and applications for Tessera. He graduated cum laude in electrical engineering from Texas A&M University.