New Interfaces Squeeze Top Performance From SRAMs

The need to cut bus latency and access times gave rise to a trio of standards now vying for incorporation in next-generation system designs.

Dave Bursky

May 27, 2002

As system speeds continue climbing to new heights, memory subsystems must keep up. Standard asynchronous and synchronous-static RAMs can no longer deliver the data bandwidth needed to feed system buses that operate at 200-, 300-, and even 400-MHz clock speeds. New generations of static RAMs are necessary to eliminate the latency between Read and Write operations; double the data transfer speed by offering data transfers on both the leading and trailing edges of the data clock; and provide wider buses to increase the data bandwidth to and from the memory.

These new generations of static RAMs, and even some dynamic RAMs, meet the high-performance demands of such applications as network switches, routers, and servers. The memories can deliver up to 24-Gbit/s data bandwidths through a combination of architectural enhancements.

Competing in this high-performance area are three fairly new static memory interfaces. The oldest, zero bus turnaround (ZBT), eliminates the latency encountered in previous memory designs when the data bus must switch from reading data to writing data (or vice versa). Created by Integrated Device Technology, it's now available from many SRAM suppliers and goes by several other names, like no-bus-latency (NoBL), as coined by Cypress Semiconductor, and no-turnaround RAM (NtRAM), as touted by Samsung.

By transferring a word on both the rising and falling edges of the clock signal, DRAM and SRAM designers doubled the memory's basic data rate. This double-data-rate (DDR) interface is widely available on high-performance DRAMs, and it's just starting to appear on high-bandwidth SRAMs, including the new SigmaRAM interface.

So if doubling the data transfer rate is helpful, why not quadruple it? That's happening with quad-data-rate (QDR) SRAMs. To double the data rate over the DDR memories, designers separated the I/O paths rather than sharing a common bus between input and output. Thus, QDR memories pack two data buses, one for data transfers into the chip, and one for transfers from the chip to the system. Because the two buses simultaneously transfer data using both clock edges, QDR devices quadruple the data bandwidth over single-data-rate SRAMs with common I/O pins.
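
The step from single to double to quad data rate comes down to simple arithmetic. The Python sketch below is illustrative only: the function name, the 200-MHz clock, and the 18-bit word width are assumptions chosen for the example, not figures for any particular device.

```python
def peak_bandwidth_gbps(clock_mhz: float, width_bits: int,
                        edges_per_clock: int = 1, data_buses: int = 1) -> float:
    """Peak data bandwidth in Gbit/s.

    edges_per_clock: 1 for single data rate, 2 when both clock edges carry data.
    data_buses: 1 for common I/O, 2 for separate input and output buses.
    """
    transfers_per_second = clock_mhz * 1e6 * edges_per_clock * data_buses
    return transfers_per_second * width_bits / 1e9


# Hypothetical 200-MHz, 18-bit part in each of the three styles.
sdr = peak_bandwidth_gbps(200, 18)                                   # one edge, one bus
ddr = peak_bandwidth_gbps(200, 18, edges_per_clock=2)                # both edges, one bus
qdr = peak_bandwidth_gbps(200, 18, edges_per_clock=2, data_buses=2)  # both edges, two buses

print(f"SDR {sdr:.1f}  DDR {ddr:.1f}  QDR {qdr:.1f} Gbit/s")
```

The factor of four for QDR holds only when both buses are busy at once, which is the situation the separate Read and Write ports are meant to create.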

With all these options, traditional synchronous-burst pipelined and asynchronous RAMs no longer provide adequate solutions. The reason mostly stems from technology advances that now let PC and RISC processor designers integrate a megabyte or more of first- and second-level cache on the CPU chip. This eliminates the need for external caches. In PC and workstation applications, the burst pipelined approach works well because the CPU would typically request a full cache-line replacement, so it actually streams in four or more 32-bit words. That matches well with the burst pipelined memory architecture.

Yet in applications other than PCs and workstations, like network servers, routers, and switches, the more random nature of the memory accesses would play havoc with the burst pipeline, and burst pipelined SRAMs (BPSRAMs) would perform poorly. That explains the need for the new memory interfaces and features. This doesn't mean that those applications only use one kind of high-performance memory. Instead, these complex system products often contain multiple memory types to best match differing performance aspects of the system.

Today's designers face the challenge of selecting the best memory type to fit their applications. For which application would a ZBT-type part work best, or which system does the QDR II memory fit better? Or would a competitive DRAM solution, like the fast-column DRAM or reduced-latency DRAM (FCRAM and RLDRAM, respectively), be more suitable due to its fourfold advantage in bit capacity?

In the SRAM world, designers must now pick from almost half a dozen interface options, and then choose among various operating modes and features. These choices offer a wide range of performance levels, but they also complicate the search for an optimal solution.

Over half a dozen companies offer SRAMs that lack bus-turnaround latency. These memories eliminate the idle cycles typically encountered when switching a bus from performing a Read operation to performing a Write operation. To do that, the memory designers realigned the Write data so the address-to-data relationship is the same whether the memory reads or writes data. A 2-bit burst counter on the memory chip allows up to four words to be transferred, reducing the address-bus bandwidth requirements.
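
To see why removing the turnaround matters, consider a rough cycle count. The sketch below is a simplified model, not a description of any vendor's timing: each operation is assumed to occupy one bus cycle, and `turnaround_penalty` is a hypothetical number of idle cycles inserted whenever the bus changes direction.

```python
def bus_cycles(ops: str, turnaround_penalty: int) -> int:
    """Count bus cycles for a string of 'R'/'W' operations.

    Each operation takes one data cycle. Every change of direction
    (Read to Write or Write to Read) adds `turnaround_penalty` idle cycles.
    """
    cycles = 0
    previous = None
    for op in ops:
        if previous is not None and op != previous:
            cycles += turnaround_penalty  # idle cycles while the bus turns around
        cycles += 1                       # the data transfer itself
        previous = op
    return cycles


def bus_utilization(ops: str, turnaround_penalty: int) -> float:
    """Fraction of bus cycles that actually carry data."""
    return len(ops) / bus_cycles(ops, turnaround_penalty)


alternating = "RW" * 8  # worst case: direction changes on every access
print(bus_utilization(alternating, turnaround_penalty=1))  # conventional, assumed 1 idle cycle
print(bus_utilization(alternating, turnaround_penalty=0))  # no-turnaround behavior
```

With strictly alternating traffic and an assumed one-cycle penalty, the modeled bus carries data in only about half of its cycles; with no penalty, every cycle is used.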

No-latency memories usually have a flowthrough or a pipelined architecture. Flowthrough devices always have a one-cycle delay from address-in to data-out in the system. This minimizes the system latency, but designers pay a price in design margins: The SRAM must fetch data and deliver it in time for the next clock edge. That places an upper limit on the clock frequency and typically restricts the flowthrough approach to single-data-rate applications.

Pipelined versions of the no-latency memories include a pipeline register in the output datapath. This register lets an internal read use a full clock cycle. During the next clock cycle, the data is delivered to the outputs, and the memory array is free to perform another access. The address-to-data relationship has one additional cycle of delay compared to the flowthrough approach, but the no-latency pipelined architecture can operate at higher clock frequencies than the flowthrough approach.
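
The trade-off between the two architectures can be sketched with a simple timing model. The model and its numbers are assumptions for illustration: a flowthrough part is treated as needing array access and output delay to fit in a single cycle, while a pipelined part needs only the slower of the two stages to fit, at the cost of one extra cycle of latency.

```python
def min_cycle_ns(array_access_ns: float, output_delay_ns: float, pipelined: bool) -> float:
    """Shortest usable clock period under the simplified model."""
    if pipelined:
        # The output register splits the path into two stages.
        return max(array_access_ns, output_delay_ns)
    # Flowthrough: fetch and deliver before the next clock edge.
    return array_access_ns + output_delay_ns


def max_clock_mhz(array_access_ns: float, output_delay_ns: float, pipelined: bool) -> float:
    return 1000.0 / min_cycle_ns(array_access_ns, output_delay_ns, pipelined)


def read_latency_cycles(pipelined: bool) -> int:
    """Address-in to data-out delay, in clock cycles."""
    return 2 if pipelined else 1


# Hypothetical stage delays, chosen only to make the comparison concrete.
ARRAY_NS, OUTPUT_NS = 3.0, 1.5
for pipelined in (False, True):
    name = "pipelined" if pipelined else "flowthrough"
    print(name,
          f"{max_clock_mhz(ARRAY_NS, OUTPUT_NS, pipelined):.0f} MHz,",
          read_latency_cycles(pipelined), "cycle(s) of latency")
```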

QDR SRAMs, initially defined by Cypress Semiconductor, Integrated Device Technology, and Micron Technology, now include suppliers like NEC and Samsung as part of the codevelopment group. Though the QDR specification was first released about two years ago, the group is focusing heavily on the second-generation QDR interface, QDR II. It doesn't recommend the original QDR interface for new systems because that specification defined operation at speeds only up to 200 MHz. QDR II is scalable and enables speeds to 750 MHz on the input clocks, allowing 1.5-GHz data transfer rates on the I/O pins.

Able to operate at 333-MHz clock speeds, the QDR II and second-generation DDR interface (DDR II) enable data transfer speeds of 666 Mtransfers/s per pin. Depending on the word width (8, 18, or 36 bits), the aggregate data rate of the QDR memory is 5.3, 12, or 24 Gbits/s, respectively. Initial QDR II SRAMs are available with 18-Mbit densities. Most QDR II suppliers expect to sample or produce a 36-Mbit version with either 18- or 36-bit I/O buses in the second half of this year.
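
The quoted data rates can be checked with a few lines of Python. This is only an arithmetic sketch; the function name is an assumption, and the calculation multiplies the word width by the 666-Mtransfers/s per-pin rate for one data bus.

```python
TRANSFERS_PER_PIN_PER_S = 666e6  # 333-MHz clock, data on both edges


def bus_rate_gbps(width_bits: int) -> float:
    """Data rate of one data bus in Gbit/s for the given word width."""
    return width_bits * TRANSFERS_PER_PIN_PER_S / 1e9


for width in (8, 18, 36):
    print(f"{width:2d} bits: {bus_rate_gbps(width):.1f} Gbit/s")
```

The results round to the 5.3, 12, and 24 Gbits/s cited above, which suggests those figures describe a single data bus running at 666 Mtransfers/s per pin.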

Many Options Let You Optimize: The QDR II specification defines five memory versions. QDR II SRAMs are defined by a burst length of 2 or 4, whether they use common I/O or individual I/O paths, or a DDR-type interface with separate I/O paths. Each of the five is available with three word-width options. Not counting speed grades, there are 15 physical variations of QDR II memory at each density level. But all versions operate from a 1.8-V supply and use a 165-contact, 13- by 15-mm fine-pitch ball-grid array (FBGA) package.

Implementing a common FBGA package for all versions lets designers create a universal memory site on the circuit board so they can change the memory without redesigning the pc-board layout. The standard outlines a memory density roadmap of up to 288 Mbits in the same 165-contact package.

Separate I/O bus versions are well suited to applications with a mostly alternating mix of Read and Write operations. For applications that do a lot of data streaming, like 16 Reads followed by 16 Writes, a common I/O bus QDR or DDR memory would be more appropriate because only one bus is used at a time.
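
A rough model shows why the traffic mix matters. The sketch below is an assumption-laden simplification: operations are issued in order, a separate-I/O part can accept at most one Read and one Write per time slot, a common-I/O part accepts one operation per slot, and turnaround effects are ignored.

```python
def slots_separate_io(ops: str) -> int:
    """Time slots needed when one Read and one Write can share a slot."""
    slots = 0
    ports_in_use = set()  # ports already claimed in the current slot
    for op in ops:
        if slots == 0 or op in ports_in_use:
            slots += 1            # port busy (or first op): open a new slot
            ports_in_use = {op}
        else:
            ports_in_use.add(op)  # the other port is free in this slot
    return slots


def slots_common_io(ops: str) -> int:
    """Time slots needed when a single shared bus carries every operation."""
    return len(ops)


def pin_utilization(ops: str, slots: int, buses: int) -> float:
    """Fraction of available bus slots that carry data."""
    return len(ops) / (slots * buses)


alternating = "RW" * 16
streaming = "R" * 16 + "W" * 16
for name, ops in (("alternating", alternating), ("streaming", streaming)):
    sep = slots_separate_io(ops)
    com = slots_common_io(ops)
    print(name,
          f"separate: {sep} slots, {pin_utilization(ops, sep, 2):.0%} of pins busy;",
          f"common: {com} slots, {pin_utilization(ops, com, 1):.0%} of pins busy")
```

Under these assumptions, alternating traffic finishes in half the slots on the separate-I/O part, while streaming traffic takes about as long on either part and leaves roughly half of the separate-I/O pins idle.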

Basically, QDR memories have two data ports that operate independently at twice the clock rate. That permits a total data transfer of four data words in one clock cycle. Designers can additionally select a two- or four-word burst option (see the table). The two-word burst implementation can indefinitely sustain both a two-word Read and a two-word Write every clock cycle. Internally, the memory array uses the first half of the clock cycle to execute the read portion of the operation and the second half of the clock cycle to execute the write operation (Fig. 1).

To do the dual Read and Write operations, the memory must accept two addresses per clock cycle. Because the address bus is shared among the Read and Write ports, the address rate on the bus must be doubled to accommodate the multiple address transfers.

In the four-word burst QDR, the memory sustains both a four-word Read and a four-word Write every two clock cycles, and it internally uses the clock cycles in a different mode than the two-word burst memory. The first cycle performs an internal Read. Data read from the memory array is transferred out of the chip via two consecutive clock cycles, making a total of four output words. The clock cycle following the Read operation can be a Write, which only needs to supply four words of input data and just one address per clock cycle.

A system that can take advantage of the longer four-word bursts is easier to design because the effective address rate is half that of the two-word burst QDR device. Assuming that the silicon of either burst-size memory operates at the same top speed, the four-word burst device can inherently run at higher clock frequencies than the two-word burst memories.
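
The address-rate difference between the two burst lengths follows from the descriptions above and can be sketched as below. The function names and the 333-MHz example clock are assumptions for illustration; the model assumes both ports are kept continuously busy and that each port moves two words per clock.

```python
WORDS_PER_CLOCK_PER_PORT = 2  # one word on each clock edge


def addresses_per_clock(burst_length: int) -> float:
    """Addresses the shared address bus must carry per clock cycle
    when the Read and Write ports are both kept busy."""
    clocks_per_burst = burst_length / WORDS_PER_CLOCK_PER_PORT
    addresses_per_burst_period = 2  # one Read address plus one Write address
    return addresses_per_burst_period / clocks_per_burst


def address_rate_mhz(clock_mhz: float, burst_length: int) -> float:
    return clock_mhz * addresses_per_clock(burst_length)


for burst in (2, 4):
    print(f"burst of {burst}: {addresses_per_clock(burst):.0f} address(es) per clock,",
          f"{address_rate_mhz(333, burst):.0f} M addresses/s at 333 MHz")
```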

QDR SRAMs also use two pairs of clock signals: a master input clock pair that's nominally 180° out of phase (referred to as K and K#), and an output data clock pair that synchronizes the timing of the output data bus (C and C#). Only the rising edges of the K and K# signals are used on the input portion of the QDR SRAMs. The output clocks are optionally used to control when data is clocked out of the SRAMs. These output clock signals are handy in systems that have multiple SRAMs located at different physical distances from the bus master. If not required to synchronize the data, the C and C# clocks can be strapped High.

One way to increase data bandwidth is to widen the data bus. But that comes with a penalty: higher chip pin counts and potentially higher power dissipation. Still, BGA packaging makes higher contact counts less unwieldy. The last of the three new interfaces, the SigmaRAM, is available in 18-, 36-, or 72-bit wide versions with densities of 18 or 36 Mbits. The SigmaRAM consortium consists of Alliance Semiconductor, Integrated Silicon Solution, GSI Technology, Mitsubishi Electric, Toshiba, and Sony, which jointly defined the memory operation, pinout, and specifications.

Already, the consortium has received approval for a family of devices with densities from 18 to 144 Mbits. Versions in the family will have common I/O with either SDR or DDR signalling, and separate I/O with either SDR or DDR signalling. Common I/O versions will include devices with bus widths of up to 72 bits, and separate I/O devices will offer 36-bit input and 36-bit output buses. That's double the maximum width offered by the QDR devices.

The first SigmaRAM version to be sampled by most suppliers will operate at 333 MHz and offer data word widths of 18, 36, and 72 bits. Targeted for operation from a 1.8-V supply, the SigmaRAMs will come in 209-bump BGA packages and use a single-data-rate common I/O bus. The RAMs can deliver data in only 1.6 ns (from clock edge to data output). Also, the SRAMs have a short cycle time—just 3 ns when clocked at 333 MHz.

Second-generation SigmaRAMs, targeted for sampling in the last quarter of this year, will employ DDR signalling on the common I/O bus, doubling the data transfer rate to 666 Mtransfers/s per pin. Initially, these memories will come in 18- and 36-bit data bus widths. A third version of the SigmaRAM, with separate I/O buses and DDR signalling, is in the planning stage for release in 2003. It will include models with 9-bit data input and output buses.

DRAMs Can Compete With SRAMs: While all this development has been taking place in the SRAM arena, DRAM designers haven't ignored the need for high data bandwidth and low latency. DRAM-based solutions offer at least four to eight times the bit capacity per chip. Top SRAM capacities being released this year are at the 32-Mbit level. DRAM-based alternatives are available at 256 Mbits and can readily move to 512 Mbits/chip in the near future.

When it comes to performance, readily available 256-Mbit DDR SDRAMs provide a first-level improvement over standard SDRAMs by allowing data transfers to take place at twice the 133-MHz clock rate (266 Mtransfers/s). Companies are already sampling next-generation designs that will permit 333 Mtransfers/s, and even faster DDR interfaces are on the drawing board.

To deal with latency concerns, Micron and Infineon have formed a consortium to offer a reduced-latency DRAM (RLDRAM) that also uses a DDR bus interface. These RLDRAMs can supply a sustained data bandwidth of 19.2 Gbits/s with random accesses. Latency has been reduced to 22.9 ns and the row cycle time kept at 25 ns.

An alternative to the RLDRAM is the fast-column DRAM (FCRAM) developed by Fujitsu and Toshiba. The latest versions will offer 256-Mbit densities. The highest-speed option will let the clock rate hit a top speed of 200 MHz. Random accesses can be done in as little as 22 ns. The full random-access cycle time is just 25 ns.

To achieve its high performance, the FCRAM employs a direct-address input. The address bus isn't multiplexed like a traditional SDRAM address bus that uses two cycles to send the row and column addresses. Rather, the entire address is presented in a single cycle, reducing the access time by almost 50% versus a DDR SDRAM (Fig. 2). The interface can be considered a superset of the DDR SDRAM standard from JEDEC, so controllers can readily be designed to handle both memory types.
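
The difference between the two addressing schemes can be sketched as follows. This is an illustration of the idea only, not a model of either device's command protocol; the function names and the 13-bit row and 9-bit column widths are hypothetical.

```python
def split_address(addr: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a flat address into (row, column) fields."""
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    return row, col


def multiplexed_phases(addr: int, row_bits: int, col_bits: int) -> list[tuple[str, int]]:
    """Multiplexed bus: row and column share pins and go out in two steps."""
    row, col = split_address(addr, row_bits, col_bits)
    return [("ROW", row), ("COL", col)]


def direct_phases(addr: int, row_bits: int, col_bits: int) -> list[tuple[str, int]]:
    """Direct addressing: the whole address is presented in one step."""
    mask = (1 << (row_bits + col_bits)) - 1
    return [("ADDR", addr & mask)]


ROW_BITS, COL_BITS = 13, 9  # hypothetical geometry
addr = 0x2A5F3
print(len(multiplexed_phases(addr, ROW_BITS, COL_BITS)), "address phases when multiplexed")
print(len(direct_phases(addr, ROW_BITS, COL_BITS)), "address phase when direct")
```

The direct scheme needs pins for every address bit, but it removes one step from the start of each random access.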

Even the well-established Rambus RDRAM provides a potential high-speed solution with data transfer rates currently peaking at 600 Mtransfers/s per pin, and a roadmap that will take the architecture past 800 Mtransfers/s. But its architecture better suits applications that do more sequential addressing than pure random addressing of the data locations.

Need More Information?

Alliance Semiconductor Corp.
(408) 855-4900
www.alsc.com

Cypress Semiconductor Corp.
(408) 943-2600
www.cypress.com

Denali Software Inc.
(650) 325-7241
www.denali.com

Enhanced Memory Systems Inc.
(719) 481-7000
www.edram.com

Fujitsu Microelectronics Inc.
(408) 922-9104
www.fma.fujitsu.com

GSI Technology
(408) 980-8388
www.gsitechnology.com

Hitachi America Inc.
(408) 456-2180
www.semiconductor.hitachi.com

Infineon Technologies Corp.
(888) 463-4636
www.infineon.com

Integrated Device Technology Inc.
(800) 438-2667
www.idt.com

Integrated Silicon Solution Inc.
(408) 969-4747
www.issi.com

Micron Technology Inc.
(208) 368-4400
www.micron.com

Mitsubishi Electric and Electronics USA,
(408) 730-5900
www.mitsubishichips.com

NanoAmp Solutions Inc.
(408) 573-8878
www.nanoamp.com

NEC Electronics Inc.
(408) 588-6000
www.necel.com

Samsung Semiconductor Inc.
(800) 423-7364
www.samsung.com

Sony Semiconductor Corp.
(408) 432-1600
www.sel.sony.com

Toshiba America Electronic Components Inc.
(408) 965-4200
www.toshiba.com/tae

Virage Logic Corp.
(510) 360-8000
www.viragelogic.com

Organization Web sites:
www.rldram.org
www.qdrsram.com
www.sigmaram.org

About the Author

Dave Bursky

Technologist

Dave Bursky is the founder of New Ideas in Communications, a publication website featuring the blog column Chipnastics – the Art and Science of Chip Design. He is also president of PRN Engineering, a technical writing and market consulting company. Before that, he spent about a dozen years as a contributing editor to Chip Design magazine. Concurrent with Chip Design, he was also the technical editorial manager at Maxim Integrated Products. Prior to Maxim, Dave spent over 35 years working as an engineer for the U.S. Army Electronics Command and as an editor with Electronic Design Magazine.