DDR FIFOs Deliver The Goods For Advanced Networking

FIFOs play an integral role in a diverse range of demanding applications, such as storage-area networks (SANs) and network routers. To keep pace with these fast-moving markets and meet new performance requirements, FIFO products continuously evolve. Today, the challenges that FIFO devices face include supporting the bandwidth and throughput of next-generation designs while helping to drive down overall system costs.

Of course, the continuing push for higher-bandwidth data communication has created the need for FIFO devices with a higher data throughput. Traditionally, FIFO interfaces have been based on single-data-rate (SDR) parallel buses. However, this method has trouble keeping pace with applications that have higher throughputs. By moving to a double-data-rate (DDR) format, it's possible to leap to an entirely new level of performance.

DDR Approach: For years, designers have relied on FIFOs with SDR parallel bus architectures to provide crucial buffering between subsystems running at different clock rates. As overall system speeds rose in accordance with Moore's Law, designers helped FIFOs scale accordingly by increasing bus speeds or expanding bus widths. However, increasing the device speed meant that designers had to contend with augmented board-level noise and heat dissipation.

To circumvent the problems associated with increased clock speeds, some designers instead expanded the bus width (maintaining conventional clock speeds). By placing multiple devices in parallel, or increasing device bus widths, more data can flow through the system—effectively boosting the overall bandwidth.

This approach has its drawbacks too. The design requires substantially more data pins from the control ASICs or FPGAs, creating dozens more high-speed signals to route. Extra traces not only make it difficult to interface with other devices, but can also erode noise margins and complicate impedance control. Additionally, I/O pins are a valuable commodity on ASICs and FPGAs. Designers are reluctant to increase pin counts because cost rises in proportion to the number of I/Os used.

Recently, a new memory architecture has emerged, designed to scale with the rising data rates found in current high-speed systems. By employing a DDR format, these devices address existing design issues while delivering an entirely new level of performance.

With a DDR device, data is clocked on the rising and falling edges of the respective read or write clock signals (Fig. 1). This effectively doubles the device's bandwidth without increasing the clock speed or bus width. For example, a 250-MHz DDR device operates at a performance equivalent to that of a 500-MHz SDR device.
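The equivalence between a 250-MHz DDR port and a 500-MHz SDR port can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name is made up for this example, and the 40-bit bus width is borrowed from the devices discussed later in the article.

```python
def peak_bandwidth_bps(clock_hz, bus_width_bits, edges_per_cycle):
    """Peak transfer rate of a parallel port, in bits per second."""
    return clock_hz * edges_per_cycle * bus_width_bits

SDR, DDR = 1, 2  # data transfers per clock cycle

ddr_250 = peak_bandwidth_bps(250_000_000, 40, DDR)
sdr_500 = peak_bandwidth_bps(500_000_000, 40, SDR)

print(ddr_250 == sdr_500)  # True
print(ddr_250)             # 20000000000, i.e. 20 Gbits/s
```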

In a DDR device, a write operation will occur on both the rising and falling edge of the write clock input (WCLK), if the active-low write enable pin (WEN) is asserted during the rising edge of the clock. The same is true on the read port, where a data-read operation will take place on both the rising and falling edges of the read clock input (RCLK), provided that the active-low read enable pin (REN) is asserted during the rising edge of the clock.

By using a 2n-prefetch architecture, where the internal data bus is twice the width of the external data bus, a DDR device can capture double the amount of data per clock cycle. As a result, a single write cycle will write two data words into memory, and one read cycle will fetch two data words from memory.
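A toy software model may help picture the 2n-prefetch idea. It is a sketch under simplifying assumptions, not a description of any real device: the class and method names are hypothetical, and each internal memory entry is simply twice the external word width.

```python
from collections import deque

class PrefetchFifo:
    """Toy 2n-prefetch FIFO: one internal entry holds two external words."""

    def __init__(self, word_bits=40):
        self.word_bits = word_bits
        self.mask = (1 << word_bits) - 1
        self.mem = deque()

    def write_cycle(self, rising_word, falling_word):
        # The internal bus is twice as wide, so the words captured on the
        # rising and falling edges are stored together as one entry.
        entry = ((rising_word & self.mask) << self.word_bits) | (falling_word & self.mask)
        self.mem.append(entry)

    def read_cycle(self):
        # One read cycle fetches one internal entry, i.e. two external words.
        entry = self.mem.popleft()
        return entry >> self.word_bits, entry & self.mask


fifo = PrefetchFifo()
fifo.write_cycle(0xAAAAAAAAAA, 0x5555555555)
print([hex(w) for w in fifo.read_cycle()])  # ['0xaaaaaaaaaa', '0x5555555555']
```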

Two types of memory architectures are available today: DDR SRAMs and DDR FIFOs. Also, leading memory-device manufacturers, including IDT, Micron Technology, and Cypress, together developed the quad-data-rate (QDR) SRAM. It comprises an SRAM device that supports two DDR ports—hence the term "quad" (two ports with DDR). As each data bus operates on two words of data per clock cycle on the input and output ports, the QDR device will transfer a total of four data words per clock cycle. Like other SRAMs, these devices provide addressable memory capabilities based on substantial logic circuitry.

However, a wide range of applications don't require an addressable memory because data will transfer in a first-in-first-out sequence. Often a design also requires independent clock-cycle speeds on the input and output ports to support and connect subsystems operating at different frequencies. In these cases, the DDR FIFO provides an inherent advantage. It delivers the exact functionality required and features two independent clock domains versus QDR SRAMs, which typically operate on a single-clock domain.

The DDR FIFO architecture can handle up to 20 Gbits/s of data throughput, satisfying the demands for higher bandwidth in next-generation applications. These devices are tailored to support high-speed buffering applications. They automatically generate status flags to indicate if the memory is full, partially full/empty, or empty, helping to determine how much data the queue holds. Because the flags are accurate, they can be used directly to generate the backpressure signal. The QDR SRAM lacks these flags and consequently requires additional counters and compare logic to supply a similar capability.
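The role of the status flags in generating backpressure can be sketched in software. The model below is an assumption-laden illustration: the flag names, the margin parameters, and the class itself are hypothetical and do not correspond to a particular device's pinout or programmable offsets.

```python
from collections import deque

class FlaggedFifo:
    """Toy FIFO that derives status flags from its current fill level."""

    def __init__(self, depth, almost_full_margin=2, almost_empty_margin=2):
        self.depth = depth
        self.af_margin = almost_full_margin
        self.ae_margin = almost_empty_margin
        self.words = deque()

    def flags(self):
        n = len(self.words)
        return {
            "empty": n == 0,
            "almost_empty": n <= self.ae_margin,
            "almost_full": n >= self.depth - self.af_margin,
            "full": n == self.depth,
        }

    def write(self, word):
        if self.flags()["full"]:
            return False  # word refused: upstream ignored the backpressure
        self.words.append(word)
        return True

    def read(self):
        return None if self.flags()["empty"] else self.words.popleft()


fifo = FlaggedFifo(depth=8)
sent = 0
# Upstream logic keeps writing until the almost-full flag raises backpressure.
while not fifo.flags()["almost_full"]:
    fifo.write(sent)
    sent += 1
print(sent)  # 6
```

With an addressable SRAM, the fill-level bookkeeping done here by `flags()` is what the external counters and compare logic would have to provide.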

In addition, the DDR FIFO architecture enables unique device configurations. These devices implement x40/x20/x10 data formats, where typically 32 bits are data, four bits are parity, and four additional bits can be used for application-specific functions like packet markers. For example, when implementing a data bus width of 40 bits (bits 0 through 39), data bits 39 and 38 can denote the start and end of packets. Data bits 37 and 36 can indicate valid bytes within the respective word, letting the devices perform packet delineation at the packet boundaries.
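One way to picture this 40-bit word layout is as a pack/unpack pair. Only the positions of bits 39 through 36 come from the example above; placing the parity bits at 35 through 32, using one even-parity bit per data byte, and treating bits 37 and 36 as a 2-bit code are assumptions made for this sketch, as are all function names.

```python
def byte_parity(data32):
    """One even-parity bit per data byte (an assumed parity scheme)."""
    bits = 0
    for i in range(4):
        byte = (data32 >> (8 * i)) & 0xFF
        bits |= (bin(byte).count("1") & 1) << i
    return bits

def pack_word(data32, sop, eop, valid_code):
    """Build a 40-bit word: markers in 39..36, parity in 35..32, data in 31..0."""
    if not 0 <= data32 < (1 << 32) or not 0 <= valid_code < 4:
        raise ValueError("field out of range")
    return (
        (int(sop) << 39)                # start-of-packet marker
        | (int(eop) << 38)              # end-of-packet marker
        | (valid_code << 36)            # valid-byte code for this word
        | (byte_parity(data32) << 32)   # assumed parity position
        | data32
    )

def unpack_word(word40):
    return {
        "sop": bool((word40 >> 39) & 1),
        "eop": bool((word40 >> 38) & 1),
        "valid_code": (word40 >> 36) & 0x3,
        "parity": (word40 >> 32) & 0xF,
        "data": word40 & 0xFFFFFFFF,
    }


word = pack_word(0xDEADBEEF, sop=True, eop=False, valid_code=3)
fields = unpack_word(word)
print(fields["sop"], fields["eop"], hex(fields["data"]))  # True False 0xdeadbeef
```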

The DDR FIFOs can provide multiple data-rate options: SDR or DDR per port, and even triple-data-rate (TDR) options. TDR is a new concept made available by the data-rate-matching feature on DDR FIFOs. It allows the ports to operate at different data rates. For instance, the write port may operate in SDR mode and the read port might operate in DDR mode (or vice versa), producing a TDR operation.
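A short calculation shows what data-rate matching implies for the two ports. This is a hedged sketch with hypothetical function names; it only compares word rates and ignores flag latency and stalls.

```python
EDGES_PER_CYCLE = {"SDR": 1, "DDR": 2}

def port_rate_words_per_s(clock_hz, mode):
    """Words transferred per second by one port in the given mode."""
    return clock_hz * EDGES_PER_CYCLE[mode]

def fill_rate_words_per_s(wclk_hz, write_mode, rclk_hz, read_mode):
    """Positive: the FIFO fills; negative: it drains; zero: the rates match."""
    return (port_rate_words_per_s(wclk_hz, write_mode)
            - port_rate_words_per_s(rclk_hz, read_mode))

# An SDR write port at 250 MHz is matched by a DDR read port at 125 MHz.
print(fill_rate_words_per_s(250_000_000, "SDR", 125_000_000, "DDR"))  # 0
# With equal clocks, the DDR read port can drain twice as fast as SDR writes arrive.
print(fill_rate_words_per_s(125_000_000, "SDR", 125_000_000, "DDR"))  # -125000000
```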

Setup-And-Hold Issues: As clock rates rise, setup-and-hold times for incoming data are reduced, making it increasingly important to control clock skew. This is especially imperative for DDR devices, in which both the rising and falling clock edges are used to sample data.

Tiny differences in propagation delay and clock jitter lead to unacceptable degradations of overall system timing margins. The slightest variation can mean the difference between achieving or failing setup and hold. Plus, due to the nature of DDR devices, designers must deal with data running at the same high frequencies as the read clock.

To ensure reliable setup edges, some DDR FIFOs include echo output signals that can help the system tolerate unwanted propagation delay and clock jitter. For example, IDT's TeraSync DDR FIFOs have two output signals: Echo Read Clock, or ERCLK, and Echo Read Enable, or EREN (Fig. 2). They provide tighter synchronization between the data read from the FIFO and the read clock as they're received at the receiving device. Together, these signals act as an independent clock source and enable signal.

The EREN output signal is asserted whenever the DDR FIFO places a new data word onto the output bus. Used in conjunction with ERCLK, it provides a seamless way to receive data without any external logic or state machine—a more reliable method for reading output data from the DDR FIFO. This signal informs the user that a new word has been read out. When connected to the write port of the receiving device, it guarantees that every word transferred is a new word. This eliminates the need for a counter or state machine.

The output clock ERCLK always follows the input read clock, RCLK. The ERCLK output always transitions after the data switching point when a read operation takes place. The associated delay of ERCLK has a direct relationship to the data access time and is guaranteed to be slower than the slowest output data switch. Therefore, it can be used on the receiving device to "strobe" the data word. This delay permits the DDR FIFO to accommodate clock jitter and variations in data access time due to V_CC fluctuations, operating temperature, and device characteristics.
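The way a receiving device might use the two echo signals can be modeled in a few lines. The timing values and all names below are hypothetical, chosen only so that the echo-clock delay exceeds the slowest data switch, as described above; real values come from the device datasheet.

```python
from dataclasses import dataclass

T_ACCESS_MAX_NS = 3.0  # hypothetical worst-case data access time
T_ECHO_NS = 3.2        # hypothetical ERCLK delay, chosen > T_ACCESS_MAX_NS

@dataclass
class ReadEdge:
    eren: bool        # True when the FIFO places a new word on the bus
    word: int         # value on the bus once the outputs have switched
    access_ns: float  # when the outputs actually switch after the RCLK edge

def receive(edges):
    """Capture words as a receiver strobed by ERCLK and gated by EREN would."""
    captured = []
    bus = None
    for edge in edges:
        # By the time ERCLK arrives, any output switch no later than
        # T_ECHO_NS has already settled on the bus.
        if edge.eren and edge.access_ns <= T_ECHO_NS:
            bus = edge.word
        # EREN gates the write, so idle edges never duplicate a word.
        if edge.eren:
            captured.append(bus)
    return captured


edges = [
    ReadEdge(eren=True, word=0x11, access_ns=2.1),
    ReadEdge(eren=False, word=0x11, access_ns=0.0),  # no read on this edge
    ReadEdge(eren=True, word=0x22, access_ns=T_ACCESS_MAX_NS),
]
print(receive(edges))  # [17, 34]
```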

Fibre Channel SAN: Long-distance applications are becoming more common with the growth of SAN deployment and the increased use of applications such as remote backup, restore, and campus-wide SANs. Also contributing is the need to move high-bandwidth data like images, video, and large files. While long-distance connections improve the utility of the Fibre Channel network, they also create challenges for high-performance data movement that can be addressed by data buffering.

Buffering is key to improving the performance of long-distance connections, especially when operating at 2-Gbit/s (or higher) data rates. DDR FIFOs can easily provide the buffering necessary for SAN applications, keeping pace with the higher data rates of long-distance connections.

Fibre Channel data is organized into frames that can be conjoined into sequences to create large block transfers. Multiple sequences can be combined into one exchange, allowing up to 128 Mbytes of data to be transferred with a single I/O command. The Fibre Channel standard has defined buffer credit as a mechanism that establishes the maximum amount of data that can be sent at once. Buffer credit governs the maximum amount of frame data that can be in flight at any given time. This can severely limit performance if it's insufficient for the link distance and speed.

Consequently, the total data-buffer size available in the system dictates the amount that the device can extend the buffer credit. That, in turn, governs how much data can be in flight at any given time. Increasing the size of the receive buffer eliminates the dead time, allowing the remote transmitter to send frames continuously.
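The relationship between link distance, link speed, and buffer credit can be estimated as follows. This is a rough sketch, not the standard's procedure: the 5-µs/km fiber delay is a common rule of thumb, the 2112-byte frame payload is assumed as the negotiated frame size, and encoding overhead and node processing time are ignored.

```python
import math

FIBER_DELAY_US_PER_KM = 5.0  # assumed propagation delay in fiber

def credits_to_fill_link(distance_km, rate_gbps, frame_bytes=2112):
    """Estimate the credits needed so the transmitter never waits on R_RDY."""
    # A credit is only returned after the frame has arrived and R_RDY has
    # traveled back, so the full round trip must be covered.
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    bits_in_flight = round_trip_us * rate_gbps * 1e3  # 1 us at 1 Gbit/s = 1,000 bits
    return math.ceil(bits_in_flight / (frame_bytes * 8))

def buffer_bits_for_credits(credits, frame_bytes=2112):
    """Receive-buffer size needed to back that many credits."""
    return credits * frame_bytes * 8

for km in (10, 40):
    n = credits_to_fill_link(km, rate_gbps=2)
    print(km, n, buffer_bits_for_credits(n))
```

Under these assumptions, a 2-Gbit/s link needs roughly four times as many credits at 40 km as at 10 km, and the receive buffer must grow accordingly.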

The DDR FIFO is ideal for buffering SANs, where it's used on the host bus adapter (Fig. 3). It supplies high-density FIFO buffering while providing for greater data throughput without dramatically increasing bus frequencies and bus widths. The amount of receive buffer credit is based on the size of the DDR FIFO and the maximum frame size negotiated. External buffer-control logic tracks the amount of data in the DDR FIFO and calculates the amount of buffer credit that can be extended. The external control logic must know the size of the DDR FIFO to perform these calculations.

Today's DDR FIFOs support 40-bit wide buses and can easily operate at 125 MHz (250 MHz maximum), resulting in:

125 MHz × 2(DDR) × 40 bits
= 10-Gbit/s throughput

An increase in Fibre Channel system bandwidth is achieved with a reasonable bus width and a relatively small increase in frequency. Any new requirement to increase the transmission-line length can be easily accommodated. In fact, the goal is to push it to 40 km. Today's DDR FIFOs provide up to 5 Mbits of buffering. At a rate of 10 Gbits/s, this translates to 500 µs of buffering. This 500-µs buffering is greater than the delay for the R_RDY signal to propagate, from the receive end to the transmit end, over 40 km within a SAN (approximately four times the buffering for a 10-km link). The R_RDY signal is an indication from the receiving node to the transmitting node of whether or not it's ready to receive data.
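The throughput and buffering figures above can be reproduced directly. The only number in this sketch that does not come from the text is the fiber propagation delay of about 5 µs/km, a rule-of-thumb assumption used to estimate the one-way R_RDY delay.

```python
FIFO_BITS = 5_000_000                  # 5 Mbits of buffering
THROUGHPUT_BPS = 125_000_000 * 2 * 40  # 125 MHz x 2 (DDR) x 40 bits
FIBER_DELAY_US_PER_KM = 5.0            # assumed propagation delay in fiber

# Time the FIFO can absorb data arriving at full rate, in microseconds.
buffer_time_us = FIFO_BITS * 1_000_000 / THROUGHPUT_BPS

def r_rdy_delay_us(distance_km):
    """One-way propagation delay for R_RDY over the given distance."""
    return distance_km * FIBER_DELAY_US_PER_KM

print(THROUGHPUT_BPS)                           # 10000000000
print(buffer_time_us)                           # 500.0
print(r_rdy_delay_us(40))                       # 200.0
print(r_rdy_delay_us(40) / r_rdy_delay_us(10))  # 4.0
```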

10- To 40-Gbit/s Routing: A comparison of the different memory approaches in a high-speed routing application gives a better appreciation of the relative strengths of the two types of DDR architectures, DDR/QDR SRAMs and DDR FIFOs. To better service ever-increasing amounts of data, voice, and video traffic within high-bandwidth communications systems, packet processing at line rates of 10 Gbits/s (OC-192) and beyond becomes increasingly important in high-end router designs. As these designs approach the next level of functionality, 40 Gbits/s (OC-768), they require even higher performance.

This push for performance is especially true for devices on the ingress of a router that are used to store incoming data packets within the router. Data memory structures are implemented at this critical juncture to absorb incoming data after the receiving router has generated a stop signal. Depending on the distance between the nodes and the link data rate, the ingress memory needs to absorb all data in the line between the two routers until the stop signal is recognized at the far end and the sending router stops transmitting data.

Longer span distances and higher link-bit rates require larger ingress memories. To achieve OC-192 (10-Gbit/s) and OC-768 (40-Gbit/s) line rates, the other key parameter is bandwidth, or throughput. For example, an OC-768 interface and 100-m fiber span demand that the ingress memory has a bandwidth of at least 40 Gbits/s and 13 Mbits of density.

A QDR SRAM could be deployed as an ingress memory on an OC-768 line card, with an FPGA for controlling the operation (Fig. 4). Because the SRAM is an addressable device, it requires address management along with a significant number of I/O pins on the controlling FPGA. Up to 252 pins are consumed for data-line purposes to achieve the 40-Gbit/s data throughput, while 18 pins are dedicated to address-line purposes.

But the addressable functionality in a QDR SRAM isn't necessary for the ingress interface in a router. The memory's primary function is to act as a simple FIFO, which precludes the need for addressing. Plus, more restrictions are imposed on the FPGA design because the QDR SRAM requires common clock domains on each port. As a result, the FPGA has to control the clocking domains of the memory device.

Because it doesn't need an address bus, the DDR FIFO has an inherent advantage in this router application, delivering the exact functionality required by the ingress interface. The same performance as the combined FPGA and four QDR SRAMs is achieved with two DDR FIFOs used in parallel (Fig. 5).

Together, two 40-bit DDR FIFOs can accomplish a bandwidth of 40 Gbits/s, using fewer I/O pins—only 160—and managing asynchronous clock domains. This greatly simplifies the hardware design. Then, an additional FIFO pair can be cascaded in series to provide sufficient memory density. The result is a pin-efficient solution.
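The pin and bandwidth figures for the two approaches can be tallied in a short sketch. The breakdown of the 160 pins as two devices, each with a 40-bit write bus and a 40-bit read bus, is an assumed reading of the figure quoted above, and the 250-MHz clock is the maximum rate cited earlier; the function name is hypothetical.

```python
def ddr_fifo_bank(devices, width_bits, clock_hz):
    """Data-pin count and aggregate bandwidth of DDR FIFOs used in parallel."""
    data_pins = devices * width_bits * 2                 # write bus + read bus (assumed)
    bandwidth_bps = devices * clock_hz * 2 * width_bits  # two edges per clock
    return data_pins, bandwidth_bps

QDR_DATA_PINS, QDR_ADDRESS_PINS = 252, 18  # figures from the QDR SRAM example

pins, bandwidth = ddr_fifo_bank(devices=2, width_bits=40, clock_hz=250_000_000)
print(pins, bandwidth)                          # 160 40000000000
print(QDR_DATA_PINS + QDR_ADDRESS_PINS - pins)  # 110 fewer pins
```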

As designers tackle the creation of routers for 40-Gbit/s applications, they now have an option for developing an elegant, efficient solution for the FIFO ingress portion of the design. With a 40-bit wide data bus, a DDR FIFO interface delivers 20 Gbits/s of data throughput. This satisfies the demands for even the most aggressive bandwidths in packet-processing applications, such as OC-768 line cards. Although the ingress FIFO is one of many challenges that designers face in next-generation router designs, solutions like this one will help make OC-768 a reality sooner, rather than later.

Stewart Speed is an application engineer in IDT's FIFO division (www.idt.com), Santa Clara, Calif. He holds a BS (with honors) in electrical/electronic engineering from John Moores University, Liverpool, England. Stewart can be contacted at (800) 345-7015 or via e-mail to [email protected].

Stefan Schoettl is product marketing manager for IDT's FIFO division. He received a BS in industrial engineering from FH Munich/Germany. Schoettl can also be reached at (800) 345-7015 or via e-mail to [email protected].