FIFOs Excel At Matching Frequencies, Buses, And Data Rates

Designs that once required discrete SRAM-based solutions can now take advantage of advanced FIFOs that offer a wide range of capabilities. In addition to fast synchronous operation and high densities, today's FIFOs can connect buses of different widths and clock frequencies, as well as retransmit data on demand. Programmable features also simplify design-in.

This article describes some of the latest FIFO features and illustrates how they are beneficial in three different designs. The first of these design examples, an X-ray image-processing system, presents a method for calculating the required FIFO depth. This technique can be applied to other designs. The other two design examples describe a PCI bus algorithm accelerator and an HDTV encoder system.

FIFO architectures have come a long way in recent years, as have FIFO speeds and densities. With synchronous speeds as high as 133 MHz and densities up to 4 Mbits in a single chip, FIFOs can perform data-buffering tasks better than ever. Considering that a FIFO's separate read and write ports result in an effective 2X bandwidth compared to discrete SRAMs, FIFO speeds are high enough to solve a wide variety of high-performance challenges.

One common challenge occurs when two data buses must communicate, but they have different widths. Some FIFOs can be configured (via control pins) to accommodate input or output bus widths of x9, x18, or x36. This capability enables the FIFO to connect a 9-bit bus to a 36-bit bus, for example.

The data multiplexing to handle the different bus widths is all done transparently in the FIFO, which also can manage different clock rates on the input and output to provide frequency coupling between the buses. The X-ray and algorithm accelerator design examples presented later in this article demonstrate both bus matching and frequency coupling.

Full and empty flags are another aspect of FIFOs that affords a great deal of design flexibility. Programmable almost-full and almost-empty flags, whose offsets can be set to any value, make it easier to get the full benefit of a FIFO's capacity in a system.

For applications in which the default settings of the programmable flags aren't suitable, consider taking advantage of a FIFO that lets you load the flag-offset registers serially. The serial approach can furnish an advantage when multiplexing the FIFO's input data bus with flag-offset-programming lines isn't possible. Some FIFOs also feature partial reset, which allows you to reset the device without losing the settings of the initialization registers. The partial reset affects only the position of the read and write pointers.

You also can choose between synchronous and asynchronous almost-full/-empty-flag timing modes. Using the almost-empty flag as an example, in synchronous mode it is asserted and de-asserted with respect to the read clock. In asynchronous mode, the almost-empty flag is asserted based on the read clock and de-asserted based on the write clock. Both of these modes may be available in the same FIFO, so you can adapt the FIFO to different design requirements.
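The programmable flag thresholds discussed above can be sketched in a few lines of Python. This is a behavioral illustration only, not a model of any particular device: the names `FlagModel`, `empty_offset`, and `full_offset` are hypothetical, and the comparison convention (almost-empty at or below the offset, almost-full within the offset of the top) is an assumption that should be checked against the data sheet of the FIFO in use. Flag timing relative to the read and write clocks is not modeled.

```python
from dataclasses import dataclass


@dataclass
class FlagModel:
    """Behavioral model of FIFO status flags (not cycle-accurate)."""
    depth: int          # total number of words the FIFO can hold
    empty_offset: int   # hypothetical almost-empty offset, in words
    full_offset: int    # hypothetical almost-full offset, in words

    def flags(self, count: int) -> dict:
        """Return the four status flags for a given fill level."""
        if not 0 <= count <= self.depth:
            raise ValueError("count outside FIFO depth")
        return {
            "empty": count == 0,
            # Assumed convention: asserted at or below the offset.
            "almost_empty": count <= self.empty_offset,
            # Assumed convention: asserted within full_offset words of full.
            "almost_full": count >= self.depth - self.full_offset,
            "full": count == self.depth,
        }


# Example: a 256-kword FIFO with both offsets set to 1,024 words.
model = FlagModel(depth=256 * 1024, empty_offset=1024, full_offset=1024)
for level in (0, 1024, 1025, 256 * 1024 - 1024, 256 * 1024):
    print(level, model.flags(level))
```

Moving the offsets changes how early the surrounding logic is warned, which is the flexibility the programmable flags provide.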

Similarly, some FIFOs give designers a choice of either first-word-fall-through (FWFT) or Integrated Device Technology (IDT) standard mode timing. The latter is used in most FIFOs today, and it's useful in synchronous designs. In FWFT mode, the FIFO places the first word written to the FIFO on the output register without a read operation. FWFT is valuable for applications such as DSP-based systems that need to minimize latency.
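The difference between the two timing modes can be illustrated with a word-level Python sketch. It is an assumption-laden simplification: the class name `TimingModel` and its methods are hypothetical, clocks and enable signals are ignored, and a read from an empty FIFO simply returns `None`. It shows only when the first word becomes visible on the output register.

```python
from collections import deque


class TimingModel:
    """Word-level model of the FIFO output register (not cycle-accurate)."""

    def __init__(self, fwft: bool):
        self.fwft = fwft
        self.memory = deque()  # words waiting behind the output register
        self.output = None     # None means no valid word on the output

    def write(self, word):
        if self.fwft and self.output is None:
            # FWFT: the first word falls through to the output with no read.
            self.output = word
        else:
            self.memory.append(word)

    def read(self):
        """Model one read operation and return the word the reader obtains."""
        if self.fwft:
            # The reader takes the word already on the output, and the
            # next stored word (if any) falls through behind it.
            word = self.output
            self.output = self.memory.popleft() if self.memory else None
            return word
        # Standard mode: a read moves the next word onto the output.
        # Reading while empty returns None in this simplified model.
        self.output = self.memory.popleft() if self.memory else None
        return self.output


std = TimingModel(fwft=False)
fw = TimingModel(fwft=True)
for fifo in (std, fw):
    fifo.write(0xA1)
    fifo.write(0xB2)

# Before any read: standard mode shows nothing, FWFT already shows 0xA1.
print(std.output, fw.output)
```

In both modes the reader receives the words in the same order. The modes differ only in whether a read operation is needed before the first word is available.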

A feature that goes beyond the capabilities of traditional FIFOs is mark and retransmit. With a FIFO that offers this, you can internally mark the start of a protocol data unit (PDU). If necessary, you then can retransmit the PDU from the FIFO rather than request a retransmit from the original source.
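As a rough illustration of mark and retransmit, the Python sketch below models only the pointer behavior. The class `RetransmitFifo` and its method names are hypothetical. The model uses an unbounded list, so it does not capture the requirement in a real device that marked data must not be overwritten before it is retransmitted.

```python
class RetransmitFifo:
    """Word-level model of mark and retransmit (pointer behavior only)."""

    def __init__(self):
        self.words = []    # stored words; a real FIFO uses fixed-size memory
        self.read_ptr = 0  # index of the next word to read
        self.mark_ptr = 0  # index remembered by the last mark operation

    def write(self, word):
        self.words.append(word)

    def read(self):
        if self.read_ptr >= len(self.words):
            raise IndexError("FIFO empty")
        word = self.words[self.read_ptr]
        self.read_ptr += 1
        return word

    def mark(self):
        """Remember the current read position, e.g. the start of a PDU."""
        self.mark_ptr = self.read_ptr

    def retransmit(self):
        """Move the read pointer back to the marked position."""
        self.read_ptr = self.mark_ptr


fifo = RetransmitFifo()
for w in ("hdr", "d0", "d1"):
    fifo.write(w)

fifo.mark()                                   # mark the start of the PDU
first_pass = [fifo.read() for _ in range(3)]
fifo.retransmit()                             # receiver asked for a resend
second_pass = [fifo.read() for _ in range(3)]
print(first_pass == second_pass)
```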

X-ray Image-Processing Example: The X-ray image-processing application includes several 256-kword by 18-bit FIFOs combined to achieve depth expansion to 1.5 Mwords. This example also illustrates frequency coupling, with the FIFO's read and write clocks running at different speeds. In addition, it presents a method for calculating the required FIFO depth based on input and output data rates and packet sizes.

The design described here is an I/O module in a system used to develop image-processing algorithms for X-ray applications (Fig. 1). In this system, I/O modules interconnect a display subsystem, an X-ray detector, and multiple processing modules based on the ADSP-21060 floating-point DSP from Analog Devices, Norwood, Mass. The I/O channels consist of dedicated Super Harvard Architecture Computer (SHARC) links.

In input mode, an I/O module receives the image-data stream from the detector and distributes the data to the processing nodes. During output mode, an I/O module collects the processed image data from the processing nodes and sends it to a display subsystem.

Three FIFOs per input channel decouple the DSP system clock from the external data clocks and relax the demand of the data bursts at the input or output. This creates an optimized load for the processors. When the FIFOs are at the input to a system, the write clock varies from 0 to 25 MHz and the read clock is at 40 MHz. When the FIFOs are located at the output of a system, the write clock is at 40 MHz and the read clock varies from 0 to 25 MHz. With the input and output buses operating at different frequencies, data must be synchronized to the local clock before use. The FIFOs perform this function automatically.

A calculation for the most demanding mode determines the minimum FIFO capacity required to bridge the data gap between two successive images. The FIFO data contents increase at clock rate r, represented by the slope of the solid line in Figure 2.

At this rate, an image of size C is written into the FIFO in C/r seconds. When the image speed (the number of images per second) is v, the next image input starts after 1/v seconds. At this time, the FIFO needs to have enough free space for the next image. By extending a dashed line as shown in the figure, you see a representation of the required continuous readout speed with slope C * v. The vertical distance between the solid line and the dashed line represents the amount of data in the FIFO at any time, and the FIFO's maximum fill point is at time C/r, when the last pixel of the image has been written. If x is the amount of data already read out during those C/r seconds, the required FIFO capacity is C - x.

For the X-ray system, each image is 4.62 Mpixels, the image-delivery speed is 7.5 images per second, and the clock rate is 50 Mpixels per second. Therefore:

$$\text{FIFO capacity} = C - x$$

$$x = (C \cdot v) \cdot \frac{C}{r} = \frac{C^2 v}{r}$$

$$\text{FIFO capacity} = C - \frac{C^2 v}{r} = 4.62 - \frac{(4.62)^2 \times 7.5}{50} \approx 1.42\ \text{Mpixels}$$

Note that C * v must be less than the SHARC bus speed. The selected FIFO for the I/O module is a synchronous 256-kword by 18-bit device. The total FIFO capacity in a 2-by-3-device setup is 1.5 Mwords, with each word holding one 16-bit pixel.
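The capacity calculation above can be checked with a short Python sketch. The function name `required_fifo_capacity` and its parameter names are hypothetical. The sketch assumes the simplified model in the text: a constant write rate r during the image burst and a continuous readout at C * v. The comparison against the installed depth treats 1 Mpixel as 10^6 pixels, which is also an assumption.

```python
def required_fifo_capacity(image_size, images_per_second, write_rate):
    """Return C - C**2 * v / r, in the same units as image_size.

    image_size         -- C, size of one image
    images_per_second  -- v, image-delivery speed
    write_rate         -- r, rate at which the image is written
    """
    readout_rate = image_size * images_per_second      # slope C * v
    if readout_rate > write_rate:
        raise ValueError("readout rate C*v exceeds write rate r")
    fill_time = image_size / write_rate                # C / r seconds
    drained = readout_rate * fill_time                 # x = C**2 * v / r
    return image_size - drained


C = 4.62   # Mpixels per image
v = 7.5    # images per second
r = 50.0   # Mpixels per second

capacity = required_fifo_capacity(C, v, r)
print(f"required capacity: {capacity:.2f} Mpixels")

# Six 256-kword devices (the 2-by-3 arrangement described in the text).
available_words = 6 * 256 * 1024
print("fits:", available_words >= capacity * 1e6)
```

The same function can be reused for other designs by substituting the packet size, packet rate, and burst write rate of that system.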

Algorithm accelerator example: The FIFOs in the algorithm-accelerator application buffer data going to and from a PCI bus that's connected to a rapid prototyping board. The algorithm accelerator resides on the prototyping board and typically implements complex DSP algorithms such as vocoders and modems. The FIFOs provide bus matching and frequency coupling by interfacing the 32-bit, 33-MHz PCI bus to an 18-bit data bus whose speed can vary up to 100 MHz.

The general structure of the PCI-based rapid prototyping board is shown in Figure 3. The main purpose of the FIFOs in this application is to pass data between the two buses that are asynchronous from each other. The FIFO's frequency-coupling ability is especially useful for synchronizing data coming from the PCI bus to the prototyping board prior to distributing it to the board's processing elements.

Bus matching also is useful here because the data from the PCI bus is in bursts of 32-bit words, but the application doesn't require such a wide word. Data samples for the algorithm accelerator can be represented adequately in 18 bits. By running the bus speed of the accelerator at twice the rate of the PCI bus, the 18-bit bus can maintain throughput.
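The throughput argument above reduces to simple arithmetic, sketched here in Python. The function name is hypothetical, and the sketch assumes one word transferred per clock cycle on each bus with no idle cycles, which is a best-case simplification of real PCI burst behavior.

```python
def min_narrow_bus_clock_mhz(wide_clock_mhz, narrow_words_per_wide_word):
    """Clock needed on the narrow bus to keep pace with the wide bus,
    assuming one word per clock on both sides (an idealization)."""
    return wide_clock_mhz * narrow_words_per_wide_word


PCI_CLOCK_MHZ = 33          # from the text
WORDS_PER_PCI_WORD = 2      # each PCI word leaves the FIFO as two 18-bit words
ACCEL_MAX_CLOCK_MHZ = 100   # upper limit of the 18-bit bus, from the text

needed = min_narrow_bus_clock_mhz(PCI_CLOCK_MHZ, WORDS_PER_PCI_WORD)
print(f"accelerator bus must run at {needed} MHz or faster")
print("within range:", needed <= ACCEL_MAX_CLOCK_MHZ)
```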

The FIFOs used on the board are unidirectional 128-kword by 36-bit devices that connect to the 32-bit PCI bus on one side and the 18-bit accelerator bus on the other. On the 18-bit side, only the lower 18 bits of the FIFO are connected.

A PCI interface chip buffers the 32-bit words from the PCI bus and moves the words to the FIFO's 36-bit input data bus, giving the upper 4 bits a value of zero. The FIFO splits the resulting 36-bit word into two 18-bit words and includes a provision to select which part shifts out first.

In the other direction, the prototype board must write two 18-bit words into the FIFO before the PCI interface can read the 32-bit word. The PCI interface checks the readiness of the complete 32-bit word by sampling the FIFO's empty flag, which will be de-asserted only after both 18-bit words are written.
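The word splitting and reassembly described above can be sketched with bit operations in Python. The function names and the `low_first` parameter are hypothetical, and the bit layout (the 32-bit PCI word in the low bits of the 36-bit FIFO word, split at bit 18) is an assumption made for illustration. The actual mapping depends on how the board wires the buses.

```python
MASK_18 = (1 << 18) - 1
MASK_32 = (1 << 32) - 1


def split_to_18(pci_word, low_first=True):
    """Zero-extend a 32-bit word to 36 bits and split it into two 18-bit words."""
    if not 0 <= pci_word <= MASK_32:
        raise ValueError("expected a 32-bit value")
    word36 = pci_word                    # upper 4 bits are zero
    low = word36 & MASK_18               # bits 17..0
    high = (word36 >> 18) & MASK_18      # bits 35..18
    # The order selection mirrors the FIFO's choice of which half shifts out first.
    return (low, high) if low_first else (high, low)


def join_to_32(first, second, low_first=True):
    """Combine two 18-bit words into a 36-bit word and keep the low 32 bits."""
    low, high = (first, second) if low_first else (second, first)
    return ((high << 18) | low) & MASK_32


halves = split_to_18(0xDEADBEEF)
restored = join_to_32(*halves)
print([hex(h) for h in halves], hex(restored))
```

Because the upper 4 bits are zero, the high half carries only 14 significant bits in this layout.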

HDTV example: The HDTV encoder application compresses an input video stream using MPEG-2, generating compressed HDTV video. Compression of the digital-video-pixel data, coupled with higher-order modulation techniques, is necessary to fit the HDTV broadcast signal into the spectrum allocated by the FCC. The relatively high pixel sample rates and picture density, compared to conventional digitized NTSC rates, lead to demanding requirements for data buffering and storage.

This is especially true at the network broadcast centers, where the equipment must maintain the highest possible picture quality (and in turn, a high number of bits per compressed picture frame). That's important because affiliate stations may impose several concatenations of encoding and decoding on a signal prior to its delivery to the home.

Developed by Tiernan Communications Inc., San Diego, Calif., the HDTV system profiled here uses FIFOs at various points in the data path, mainly to allow frequency coupling and data buffering. Figure 4 shows a block diagram of the HDTV video encoder section.

The write and read clocks (WCLK and RCLK, respectively) coupled by the various FIFOs are:

Video parser → FIFO → MPEG-2: WCLK = 74.25 MHz; RCLK = 63 MHz

MPEG-2 → FIFO → Tile stitching: WCLK = 63 MHz; RCLK = 54 MHz

Tile stitching → FIFO → Output process: WCLK = 54 MHz; RCLK = 27 MHz

Output process → FIFO: WCLK = 27 MHz; RCLK = 27 MHz

To compress the raw digitized video data using the MPEG-2 algorithm, the system must first capture a field or frame of active video pixel data. For this purpose, 20-bit pixel data (10 luminance and 10 chrominance bits) is sampled at 74.25 MHz. A field or frame buffer, then, must hold more than 20 Mbits and maintain write speeds of 74.25 MHz during the active video scan time. That is more than three times the depth required for conventional NTSC field buffers, at a sample rate 2.75 times faster. The specialized DRAM-based field-buffer FIFOs typically used in video equipment do not have the required depths. They're also difficult to cascade, and they cannot meet the speed requirements.

Tiernan Communications met the requirements with a cascade of IDT's SuperSync II FIFOs. The pixel data can be logically partitioned into luminance and chrominance channels and stored separately for easier processing (Fig. 5). So, two 10-Mbit banks hold one field. A depth cascade of two IDT72V2113 SuperSync II FIFOs at 512 kwords each meets the depth requirement.

The system also must accommodate the 10-bit word width. Although the system designers could easily expand the FIFO's 9-bit word width using additional parts, it turns out that this expansion isn't necessary. The MPEG-2 algorithm only operates on 8-bit data samples, so the system can truncate or round the original 10-bit pixel data to 8 bits.

The parallel MPEG-2 processors indicated in Figure 4 each compress a tile of the HDTV picture. The system must stitch the tiles together to achieve seamless motion of objects between tiles. SuperSync II FIFOs again serve in this part of the system to achieve the necessary depth with a low parts count, while accommodating the high-speed, bursty data from the compression engines. Similarly, the fast-access FIFOs also act as rate buffers between the picture-processing block and the final output-processing stage.

The final output-processing block in the HDTV system forms the video elementary stream. This includes higher-level MPEG-2 syntax elements, such as the presentation time stamps used at the receive end of the broadcast chain to synchronize audio and video.

The video elementary stream is multiplexed with other bit streams, such as Dolby AC-3 encoded audio, and user data and identification tables in a broadcast-grade HDTV encoder. Together, these form the final transport stream that goes to an RF modulator. While the read/write access requirements for the final output FIFO are low compared to those of the upstream signal-processing stages, the depth requirements of this last FIFO can be significant. The system therefore employs depth-cascaded SuperSync II FIFOs to take advantage of their 4-Mbit densities.

In the HDTV application, as with the other examples described here, FIFOs play a major role in data buffering, bus matching, and frequency coupling. With a single FIFO or a few cascaded devices, you can drop in the buffer capacity required by the application and solve several design challenges at once. If system requirements change or you need to configure a range of systems with similar capabilities, it's possible to substitute different FIFOs to accommodate different data rates and capacities. As a result, the key characteristic of today's FIFOs is flexibility.