Next-generation packet-processing line cards will need large amounts of memory with very high bandwidth. Such cards could pack 50 Mbytes of high-performance SRAM in addition to 500 Mbytes or more of fast DRAM. Beyond the memory, the line cards will usually include various packet-handling functions that could require 5 Mgates or more of logic.
To meet system performance objectives while keeping each card under 150 W (the typical power limit a card rack can supply per slot), designers must treat optimization of the memory subsystems as a top design priority, carefully evaluating how to implement the various memory blocks. These blocks can be off-chip or on-chip (embedded), and they're generally built from SRAM, DRAM, or content-addressable memory (CAM). Designers must weigh the subtle tradeoffs among capacity, memory type, bus width, latency, permanency, power consumption, cost, and other factors. A thorough analysis of memory tradeoffs at the very outset of card design will drive optimal partitioning and the most efficient use of the system logic.
In a line card, four distinct applications often require memory: packet handling, header handling, routing-table lookups, and program storage. Past generations of line cards for slower network data rates could satisfy most of those tasks with external SRAM and DRAM. Discrete DRAM remains the best choice for packet buffers, which tend to be large (32 to 500 Mbytes) and must deliver ever-higher bandwidth as wire speeds increase. Header memory is usually in the 4-Mbit range and, like the packet buffer, has low permanency because it's immediately related to the packet stream.
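The buffer sizes above track the line rate directly. As a rough sketch (using the common rule of thumb that a packet buffer should hold about one round-trip time of traffic; the 0.25-s RTT figure is an illustrative assumption, not from the article):

```python
# Rough packet-buffer sizing: buffer ~= line rate x round-trip time.
# The 0.25-s RTT is an assumed illustrative value.

def buffer_mbytes(line_rate_gbps: float, rtt_s: float = 0.25) -> float:
    """Return the required buffer size in megabytes."""
    bits = line_rate_gbps * 1e9 * rtt_s
    return bits / 8 / 1e6

for rate in (2.5, 10.0):  # e.g., OC-48 and OC-192 line rates, Gbits/s
    print(f"{rate:5.1f} Gbits/s -> {buffer_mbytes(rate):6.1f} Mbytes")
```

At 2.5 Gbits/s this yields about 78 Mbytes, and at 10 Gbits/s about 313 Mbytes, consistent with the 32- to 500-Mbyte range cited above.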
The routing-table lookup memory typically is larger than 4 Mbits, and it must allow very fast access with a short cycle time. It also has much higher permanency, and in most previous-generation line cards it's implemented with SRAM or CAM. New high-performance SRAMs that lack bus-turnaround penalties offer data transfer speeds of 400 to 500 Mbits/pin, so they're well suited when off-chip lookup tables or other fast-access memories are necessary. Reduced-latency DRAMs and devices such as fast-column DRAMs would be well suited to form the large buffer memories. Finally, the program memory for network processors has high permanency and is normally no larger than 8 kwords. Flash or standard SRAM would probably serve well in this portion of the line-card application.
Both header and lookup memory have become worthy candidates for embedded DRAM, because DRAM cores in conservative process technologies now allow 250-MHz page-mode cycling and random-access times as short as 10 ns, while also enabling memory bus widths two to 16 times those possible with SRAM cores. Wide buses are the key to achieving the bandwidth essential for handling packet transactions at the rates that new line cards will require.
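The bandwidth argument can be made concrete with a back-of-the-envelope calculation. The 250-MHz cycle rate comes from the text; the specific bus widths below are illustrative assumptions chosen to show an 8x ratio, within the two-to-16x range stated above:

```python
# Peak-bandwidth sketch: bandwidth = bus width x cycle rate.
# Bus widths are hypothetical examples; the article states only that
# embedded-DRAM buses can be 2x to 16x wider than embedded-SRAM buses.

def peak_gbps(bus_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in Gbits/s for a given bus width and clock."""
    return bus_bits * clock_mhz * 1e6 / 1e9

sram_bus, dram_bus = 32, 256  # assumed widths (8x ratio)
clock_mhz = 250.0             # page-mode cycle rate from the text

print(f"embedded SRAM : {peak_gbps(sram_bus, clock_mhz):5.1f} Gbits/s")
print(f"embedded DRAM : {peak_gbps(dram_bus, clock_mhz):5.1f} Gbits/s")
```

With these assumed widths, the embedded-DRAM bus delivers 64 Gbits/s of peak bandwidth versus 8 Gbits/s for the narrower SRAM bus, which is the whole case for wide embedded buses in packet-rate datapaths.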
If sheer random-access speed were the sole memory criterion, only SRAM would be considered. But embedded SRAMs have several inherent limitations (relatively narrow bus widths, high power consumption, and low bit density) that weigh against them in these applications. On the other hand, program memory may not be most efficiently implemented with embedded DRAM.
Given comparable fabrication process geometries, DRAM cells are much more compact than SRAM cells. When blocks of memory exceeding several kilobits are needed, embedded DRAM arrays become much more area-efficient than SRAM blocks, even when the overhead control circuitry is included in the analysis: the embedded DRAM requires only about one quarter of the silicon area that an equivalent-capacity SRAM would occupy.
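The area tradeoff reduces to simple arithmetic on the roughly 4:1 figure above. A minimal sketch (the 40-mm^2 SRAM block area is a hypothetical example, not a figure from the article):

```python
# Area-tradeoff sketch using the article's ~4:1 ratio: an embedded DRAM
# block occupies about one quarter the silicon of an equivalent-capacity
# SRAM block (control-circuit overhead already folded into the ratio).

def dram_area_mm2(sram_area_mm2: float, ratio: float = 0.25) -> float:
    """Estimate embedded-DRAM area from the equivalent SRAM block area."""
    return sram_area_mm2 * ratio

sram_area = 40.0  # hypothetical area of an embedded SRAM block, mm^2
print(f"SRAM block: {sram_area:.1f} mm^2 -> "
      f"DRAM equivalent: {dram_area_mm2(sram_area):.1f} mm^2")
```

For any block big enough that the DRAM control overhead is amortized, the saved silicon can be returned to the 5-Mgate logic budget mentioned earlier.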
Some ASIC libraries offer SRAM-like cores based upon single-transistor cells (versus conventional SRAM cells comprising four to six transistors). These libraries can be used to build memory with very fast access comparable to a conventional SRAM. Yet such cores consume far more silicon than competitive DRAM and are optimized for use in processes targeted at logic-dominated chips. So this memory option best suits applications that need only small blocks of memory.
Some ASIC vendors have merged DRAM and logic process technologies for greatly improved memory density. The use of a trench-capacitor storage cell, like those employed by Toshiba and a few other companies, enables DRAM cells to be as small as possible, yet permits the accompanying logic to perform as fast as in logic-optimized ASIC processes. Logic speed, memory density, memory performance, and cost certainly differentiate ASIC vendors offering memory cores, even at comparable technology nodes. Recognizing the relative advantages and disadvantages is mandatory for deciding how to best partition card functions.