Designing A Multi-Gbps Memory Interface Requires Scrutiny

Popular consumer electronics products like gaming consoles, digital TVs (DTVs), and PCs offer more features and greater performance with each successive product generation. The data-intensive nature of these products tightly links the capability of their DRAM memory interfaces with the ability of the product to support larger feature sets and greater performance.

Multi-gigabit-per-second (multi-Gbps) memory-interface architectures enable these products to achieve the function and performance required. However, memory-interface designs must overcome significant challenges so that product performance and quality can be attained.

Newer-generation DDR3 DRAM and XDR DRAM physical-layer interfaces (PHYs) are tailored with special features to overcome the challenges and issues posed by multi-Gbps memory-interface architectures. But DDR3 and XDR DRAMs have attributes that suit them to certain application classes.

In digital TV applications, for example, XDR DRAMs hold cost and certain design advantages over DDR3. On the other hand, DDR3 is well-suited for designs needing high capacity at the lowest cost-per-bit of storage. Like its DDR2 predecessor, DDR3 is a high-volume commodity DRAM that provides as much capacity as the system designer requires at the lowest possible per-bit cost.

Yet if high capacity at a low cost-per-bit isn’t the principal design metric, XDR DRAM can be a better choice, especially for consumer electronics applications like DTV and HDTV. These particular designs require high bandwidth and small access granularity, but don’t need high capacity.

For example, a typical DTV application requires 6.4 Gbytes/s. This can be achieved with either two 512-Mbit x8 XDR DRAM devices (providing 128 Mbytes of capacity, with a desirable 16-byte access granularity) or with four 1-Gbit x8 DDR3-1600 devices (providing 512 Mbytes of capacity, with 32-byte granularity). In such systems, an XDR solution matches the system’s bandwidth, capacity, and access-granularity needs better than DDR3 does. As discussed later, an XDR DRAM can actually be less expensive as well, not in terms of cost-per-bit, but in overall system terms, including component count, board complexity, and design time.
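The bandwidth and capacity figures in this comparison can be checked with a few lines of arithmetic. The sketch below assumes that device densities are expressed in megabits (which is what makes two devices add up to 128 Mbytes and four devices add up to 512 Mbytes) and that each XDR data pin runs at 3.2 Gbps; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    name: str
    devices: int        # number of DRAM devices on the data bus
    width_bits: int     # data pins per device (x8 -> 8)
    rate_gbps: float    # per-pin data rate (XDR value is an assumption)
    density_mbit: int   # per-device density in megabits (assumed unit)

    def bandwidth_gbytes_per_s(self) -> float:
        # total pins * per-pin rate, converted from bits to bytes
        return self.devices * self.width_bits * self.rate_gbps / 8

    def capacity_mbytes(self) -> float:
        return self.devices * self.density_mbit / 8


xdr = MemoryConfig("XDR", devices=2, width_bits=8, rate_gbps=3.2, density_mbit=512)
ddr3 = MemoryConfig("DDR3-1600", devices=4, width_bits=8, rate_gbps=1.6, density_mbit=1024)

for cfg in (xdr, ddr3):
    print(cfg.name, cfg.bandwidth_gbytes_per_s(), "Gbytes/s,",
          cfg.capacity_mbytes(), "Mbytes")
```

Under these assumptions both configurations reach the same bandwidth, but the DDR3 option needs twice as many devices and carries four times the capacity, which a DTV design may not use.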

Demanding Physical Effects

When engineering multi-Gbps interface architectures, the design must overcome physical effects that degrade signal timing and voltage margins and hence constrain the system’s performance. Those physical effects are well known to seasoned system designers, who have faced and ultimately resolved them over generations of new designs to maintain signal integrity. But for multi-Gbps interface designs, those issues are exacerbated; they become more challenging and demand newer solutions.

For example, multi-Gbps signaling suffers substantial degradation due to transmission-line discontinuities. These discontinuities appear in several places in a typical memory channel, from the memory-controller silicon’s attachment to its package, to the package attachment to the board, and on to the board-level imperfections in the transmission line. Even more disruptive discontinuities appear in the DRAM side of the channel, especially if memory module sockets are employed.

These many sources of impedance discontinuity in a memory-channel transmission line create reflections, which high-speed IO designers will recognize as a form of signal interference known as “inter-symbol interference” (ISI). Here, the channel appears to have residual memory, such that information in a previously transmitted bit ends up adversely affecting information in a newly transmitted bit.
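The “residual memory” of the channel can be illustrated with a simple discrete-time model in which each received sample is the current symbol plus scaled copies of earlier symbols. This is only a sketch: the tap values are hypothetical and stand in for a single reflection, not for any measured memory channel.

```python
def channel_output(bits, taps):
    """Pass NRZ symbols (+1/-1) through a channel described by taps.

    taps[0] is the main cursor; taps[k] is the fraction of the symbol sent
    k bit times earlier that leaks into the current sample.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * symbols[n - k]
        out.append(acc)
    return out


def worst_case_eye_height(taps):
    """Vertical eye opening when every trailing tap opposes the main cursor."""
    return 2.0 * (abs(taps[0]) - sum(abs(t) for t in taps[1:]))


# Hypothetical channel: a reflection worth 25% of the signal arrives
# two bit times after the bit that caused it.
taps = [1.0, 0.0, 0.25]
print(channel_output([1, 1, 0, 1], taps))
print(worst_case_eye_height(taps), "versus 2.0 for an ideal channel")
```

In this model the third sample is pulled toward the wrong level by the first bit, which is the mechanism the text describes.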

Treating a memory channel as a transmission line poses other challenges, too. For instance, a 50-Ω terminator at the receiver end of a 50-Ω transmission line is meant to perfectly match the line’s impedance, eliminating reflections and the resulting ISI. Even with modern on-die termination schemes, though, it’s impossible to achieve perfect impedance matching, because the transmission line has so many discontinuities.

Furthermore, perfect on-die impedance can’t be attained due to the parasitic input capacitance (or C_I) of the on-die receiver. This causes a 50-Ω resistor to appear non-ideal at higher frequencies, again creating reflections and ISI.
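The effect of C_I can be estimated by treating the termination as a resistor in parallel with a capacitor and computing the reflection coefficient it presents to a 50-Ω line. The sketch below is a first-order model only; the 2-pF capacitance and the frequencies are hypothetical values chosen for illustration.

```python
import math


def termination_impedance(r_ohms, c_farads, freq_hz):
    """Complex impedance of a resistor in parallel with a capacitor."""
    omega = 2 * math.pi * freq_hz
    return r_ohms / (1 + 1j * omega * r_ohms * c_farads)


def reflection_magnitude(z_load, z0=50.0):
    """Magnitude of the reflection coefficient (Z_L - Z0) / (Z_L + Z0)."""
    return abs((z_load - z0) / (z_load + z0))


C_I = 2e-12  # hypothetical receiver input capacitance, in farads
for freq_hz in (100e6, 800e6, 1.6e9):
    z = termination_impedance(50.0, C_I, freq_hz)
    print(f"{freq_hz / 1e9:.1f} GHz: |reflection| = {reflection_magnitude(z):.3f}")
```

At low frequency the capacitor is effectively invisible and the match is nearly perfect; as frequency rises, more of the incident signal is reflected.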

Impedance discontinuities and ISI effects like these don’t present major issues at sub-gigabit-per-second transmission rates. However, it’s a different story at multi-Gbps rates, where 625-ps data eyes are common. If terminations aren’t matched, if too many discontinuities are in the channel, or if C_I is too high, the ideal 625-ps data eye the designer hopes to transmit becomes a 300-ps eye opening by the time it reaches the receiver.
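The 625-ps figure is the bit time (unit interval) at 1.6 Gbps, and the numbers above imply that more than half of that interval can be lost. A minimal sketch of the arithmetic, with illustrative function names:

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Ideal width of one bit, in picoseconds, at the given per-pin rate."""
    return 1000.0 / rate_gbps


def eye_opening_fraction(open_ps: float, ui_ps: float) -> float:
    """Portion of the unit interval that is still usable at the receiver."""
    return open_ps / ui_ps


ui = unit_interval_ps(1.6)
print(ui, "ps unit interval at 1.6 Gbps")
print(eye_opening_fraction(300.0, ui), "of the bit time remains open")
```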

Additionally, electrical traces on the board have parasitic capacitances of their own, which cause a natural amount of signal attenuation. For instance, a signal may start out with 500 mV of amplitude, but the electrical system transporting that signal acts like a low-pass filter. As signaling rates rise, the energy arriving at the receiver becomes substantially less than what was originally transmitted. As a result, the original 500 mV may appear closer to 200 mV at the receiver due to natural channel attenuation.
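To get a feel for how a low-pass channel turns 500 mV into 200 mV, the sketch below models the channel as a single-pole filter. That is a deliberate simplification made for illustration (real board traces also exhibit skin-effect and dielectric losses), and the 1-GHz cutoff is a hypothetical value.

```python
import math


def lowpass_gain(freq_hz: float, cutoff_hz: float) -> float:
    """Magnitude response of a single-pole low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


def frequency_ratio_for_gain(gain: float) -> float:
    """Ratio f / f_cutoff at which a single-pole filter has the given gain."""
    return math.sqrt(1.0 / gain ** 2 - 1.0)


target_gain = 200.0 / 500.0            # amplitude ratio from the example above
ratio = frequency_ratio_for_gain(target_gain)
cutoff_hz = 1e9                        # hypothetical channel cutoff
print(f"signal content at {ratio:.2f}x the cutoff frequency is attenuated "
      f"to {500.0 * lowpass_gain(ratio * cutoff_hz, cutoff_hz):.0f} mV")
```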

The channel-equalization techniques commonly used to address this high-frequency attenuation in high-performance SERDES applications may not be applicable to DRAM systems. That’s because the I/O circuits of such systems must be optimized for latency, power, and cost, ruling out the use of most well-known equalization techniques.

Crosstalk is another major cause of signal degradation, involving undesired capacitive, inductive, or conductive coupling from one signal pair to a neighboring one. Crosstalk is, in fact, the major reason for speed limitations in single-ended signaling systems such as DDR3 (or its higher-speed cousin, GDDR3). Because XDR DRAMs utilize differential signaling (similar to high-performance SERDES systems), they are orders of magnitude more resilient to crosstalk emissions than DDR3 DRAMs.

Consequently, single-ended signaling must address crosstalk with an assortment of design techniques that isolate signals at the board level. As data rates rise, designers must space electrical channels farther and farther apart to avoid crosstalk effects. In other words, designers must engineer a more expensive transmission-line system between transmitter and receiver, and between controller and DRAM, to accommodate single-ended signaling at multi-gigabit data rates.

Differential signaling also offers a cost benefit in terms of memory-controller package cost. A memory-controller ASIC package with 200 memory I/O, for instance, is substantially cheaper using wirebond packaging versus flip-chip packaging. Such a cost benefit is highly appreciated in cost-critical consumer applications like DTVs.

But due to crosstalk and power-supply noise issues, single-ended signaling is very difficult to get fully operational in a wirebond package at multi-Gbps interface rates. It usually needs a more expensive flip-chip memory-controller package. Moreover, a very wide single-ended signaling bus creates electromagnetic interference (EMI), which tends to broadcast in RF domains. The level of EMI shielding required by consumer electronics applications is therefore more expensive to achieve with single-ended signaling than with differential signaling.

In addition to the physical effects of transmission-line discontinuity and the advantages of differential signaling discussed above, other multi-Gbps interface design issues that require consideration for memory-system design are trace-length matching, skew management, and high-speed clock distribution.

Trace-Length Matching, Skew, and High-Speed Clock Distribution

Trace-length matching can be easily ignored in slow-speed interface design. The designer’s thinking is often that electrical signals propagate at the speed of light, so very little attention needs to be given to electrical trace lengths. However, for multi-Gbps interfaces, it can no longer be ignored: a signal takes about 100 ps to travel one inch on a typical motherboard. For example, it may take half a nanosecond (500 ps) for a signal to propagate along a typical memory channel. In some multi-Gbps systems, that 500 ps is as large as the entire data eye.
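The relationship between trace length, flight time, and the data eye can be sketched as follows. The 100-ps-per-inch figure comes from the text; the trace lengths and the 3.2-Gbps rate in the example are hypothetical.

```python
PS_PER_INCH = 100.0  # approximate board propagation delay quoted in the text


def flight_time_ps(length_in: float) -> float:
    """Time for a signal to travel a trace of the given length."""
    return length_in * PS_PER_INCH


def skew_ps(lengths_in) -> float:
    """Arrival-time spread between the shortest and longest trace of a bus."""
    return (max(lengths_in) - min(lengths_in)) * PS_PER_INCH


def skew_fraction_of_eye(lengths_in, rate_gbps: float) -> float:
    """Skew expressed as a fraction of the ideal bit time."""
    return skew_ps(lengths_in) / (1000.0 / rate_gbps)


print(flight_time_ps(5.0), "ps for a 5-inch channel")
bus = [4.0, 4.5, 5.0]  # hypothetical trace lengths, in inches
print(skew_ps(bus), "ps of skew,",
      skew_fraction_of_eye(bus, 3.2), "of a bit time at 3.2 Gbps")
```

Even a one-inch mismatch consumes a large share of the bit time at these rates, which is why the skew has to be either routed out on the board or compensated in the PHY.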

When signal-propagation time on an electrical interconnect is on the order of the data eye, no new major concerns crop up when only one chip-to-chip signal is at issue. But it’s quite another thing when dealing with a bus of signals, say 16, 32, or 64 interconnects, all operating at multi-Gbps data rates.

Each signal between the controller and the DRAM travels on its own trace, and each trace has its own length and therefore its own propagation delay. Attention to this detail must be given at both the system and circuit levels (in both the memory controller and the DRAM) to ensure reliable operation. The different approaches taken by DDR3 and XDR memory systems are discussed in further detail later.

As for high-speed clock distribution, it’s fundamentally different in a memory system than in a SERDES or telecom design that utilizes advanced clock/data-recovery (CDR) techniques. In a memory system, transmissions can generally be considered to be “source synchronous.” For example, the memory controller has both a data interface to the DRAM as well as a clock reference interface (usually part of the command bus) to the DRAM. So, the DRAM gets a clock signal that’s directly related to the clock used by the memory controller to synchronize its data transmissions.

With source-synchronous transmission, the major clocking concern is phase, not frequency. Unlike SERDES or datacom applications, where the clock reference sources on either side of the channel have frequency offsets, the transmitter and receiver in a memory system share a single frequency reference and are misaligned only in phase. Such a system is generally known as meso-synchronous or mesochronous. While they share a frequency reference, the transmitter and receiver circuits must still compensate for their random phase discrepancy.
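The practical difference between the two cases is whether the sampling point drifts over time. The sketch below compares a link whose two ends differ in frequency with a mesochronous link that has no frequency offset; the 100-ppm offset and the 3.2-Gbps rate are hypothetical values.

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Ideal bit time in picoseconds."""
    return 1000.0 / rate_gbps


def sampling_drift_ps(bits_elapsed: int, rate_gbps: float,
                      freq_offset_ppm: float) -> float:
    """Accumulated sampling-point drift caused by a frequency offset."""
    return bits_elapsed * unit_interval_ps(rate_gbps) * freq_offset_ppm * 1e-6


# Separate references (hypothetical 100-ppm offset): the sampling point slides
# by a full bit time after 10,000 bits, so it must be tracked continuously.
print(sampling_drift_ps(10_000, 3.2, 100.0), "ps of drift")

# Shared reference (mesochronous): no drift, only a fixed phase offset that
# can be calibrated out once and then adjusted occasionally.
print(sampling_drift_ps(10_000, 3.2, 0.0), "ps of drift")
```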

In an XDR DRAM system, FlexPhase circuitry in the memory controller addresses both the trace-length matching and mesochronous clocking issues by intelligently preskewing data when it’s transmitted to the DRAM (during a write operation) and deskewing it when it’s received from the DRAM (during a read operation). Advanced calibration techniques are used to automatically optimize the required deskew and preskew values. FlexPhase is discussed in more detail later.
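The text does not describe how the calibration works, so the following is only a generic eye-centering sketch, not Rambus’s algorithm: sweep every phase code, record which ones pass a link test, and pick the middle of the widest passing window. The `link_passes` callback is hypothetical and stands in for whatever pattern test the hardware would run.

```python
def sweep_phase_codes(link_passes, n_codes=256):
    """Run the (hypothetical) link test at every phase code."""
    return [bool(link_passes(code)) for code in range(n_codes)]


def center_of_passing_window(results):
    """Return the phase code at the middle of the longest passing run.

    The code space is treated as circular, because a phase of 360 degrees is
    the same as 0 degrees. Returns None when no code passes.
    """
    n = len(results)
    if not any(results):
        return None
    if all(results):
        return 0  # every code works, so any choice is acceptable
    start = results.index(False)  # scan starts just after a failing code
    best_len, best_start = 0, 0
    run_len, run_start = 0, 0
    for offset in range(1, n + 1):
        idx = (start + offset) % n
        if results[idx]:
            if run_len == 0:
                run_start = idx
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return (best_start + (best_len - 1) // 2) % n


# Example: a made-up link that only works for phase codes 100 through 139.
results = sweep_phase_codes(lambda code: 100 <= code < 140)
print(center_of_passing_window(results))
```

Choosing the center rather than the first passing code leaves roughly equal timing margin on both sides of the sampling point.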

In a DDR3 memory system, the data interface uses strobe groups, a DDR hallmark, to handle trace-length matching and the mesochronous clocking issues. A data strobe, or DQS, is defined as a timing reference that accompanies data transmitted from the DRAM to the controller (during a read operation) or from the controller to the DRAM (during a write operation).

During a read operation, the controller employs DQS for bits associated with that particular strobe domain. For example, if the DRAM transmits eight bits plus a DQS, the controller uses a delayed version of DQS to clock all eight input samplers taking data off the memory channel. This scheme allows the system designer to mitigate many trace-length matching effects of the data bus, at the expense of the additional controller pins and board traces associated with the per-byte strobe signals.
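The reason a strobe helps with trace-length effects is that any delay shared by the data bits and their strobe cancels out at the sampler. The sketch below assumes the common arrangement in which the read strobe is edge-aligned with the data and the controller delays it by a quarter of a clock period to sample in the middle of the bit; that detail is an assumption and is not stated in the text.

```python
def sample_offset_from_eye_center_ps(dq_delay_ps: float, dqs_delay_ps: float,
                                     clock_period_ps: float) -> float:
    """How far from the center of the data eye the delayed strobe samples.

    dq_delay_ps and dqs_delay_ps are the flight times of a data bit and of
    its strobe. With double data rate, one bit lasts half a clock period.
    """
    bit_time_ps = clock_period_ps / 2
    strobe_shift_ps = clock_period_ps / 4       # assumed quarter-period delay
    sample_time_ps = dqs_delay_ps + strobe_shift_ps
    eye_center_ps = dq_delay_ps + bit_time_ps / 2
    return sample_time_ps - eye_center_ps


# Matched within the byte group: sampling is centered whatever the absolute delay.
print(sample_offset_from_eye_center_ps(400.0, 400.0, 1250.0))
print(sample_offset_from_eye_center_ps(900.0, 900.0, 1250.0))
# A 30-ps mismatch between a data bit and its strobe shifts the sample point by 30 ps.
print(sample_offset_from_eye_center_ps(400.0, 430.0, 1250.0))
```

Only the mismatch inside a strobe group matters, so the length-matching burden moves from the whole bus to each byte lane.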

As for the command bus, DDR3 memory systems with multiple DDR3 DRAMs utilize a “fly-by” topology for command bus routing (Fig. 1). The command bus is routed so signals on this bus traverse past each of the DRAMs in sequence. This is unlike prior generations of DDR and DDR2, in which the command bus was “hybrid-T” routed.

In the older systems, each of the DRAMs on the command bus contributed to an overall, cumulative parasitic load that limited the operational rate of that bus. However, the benefit was that commands, clocks, and addresses arrived at all of the DRAMs on the command bus at approximately the same time, which simplifies some data-bus clocking issues.

Fly-by routing is now used by both XDR’s and DDR3’s command bus, which enables more reliable, higher-speed signaling by separating the parasitic load of each device on the channel. (In fact, fly-by routing has always been utilized as the command-bus topology of Rambus memory products.) But fly-by routing of the command bus necessitates more complicated deskew and preskew circuits on the data bus, since the delay between DRAM devices can easily end up being larger than a bit time at high speed. FlexPhase circuit techniques implemented in the XDR memory controller PHY are intended to address this issue.

Compare and Contrast

As main-memory and consumer-application performance requirements increase, DDR3 and XDR DRAMs and their memory-controller physical-layer interfaces must be designed to address the issues discussed above as well as other emerging system design issues. The following explanations compare and contrast the two and show how they apply to multi-Gbps interface designs.

The FlexPhase circuits and differential signaling used by an XDR memory controller present several system benefits to address the memory-channel challenges described earlier. For example, the FlexPhase circuits relax pc-board trace-length matching requirements by anticipating and calibrating the signaling-phase offsets caused by variations in trace lengths and impedances.

To graphically show these benefits, Figure 2 compares a DDR DRAM-equipped PC motherboard and an XDR DRAM-equipped Sony PlayStation 3 board. In the DDR PC motherboard, single-ended trace-length matching is difficult for wide buses, which increases board area (and possibly board-layer count) for the memory system. In the Sony PlayStation 3 board, though, FlexPhase techniques eliminate the need for trace length matching, while differential signaling eliminates the need for large trace spacing (to reduce crosstalk effects). This promotes a compact memory-system design that saves board area and reduces engineering costs.

FlexPhase timing adjustments allow for simpler, more compact, and cost-efficient memory layouts. Those timing adjustments also provide for in-system test and characterization of key data signals, enabling performance characterization and in-situ testing of high-speed links.

FlexPhase circuits are found in the clocking system of the XDR memory controller’s I/O (Fig. 3). These circuits essentially enable high-precision, digitally controlled clock generation, which allows for 8-bit accuracy of any clock’s phase position (from 0 to 360°, relative to a fixed reference). During read operations, per-I/O FlexPhase circuits compensate for the phase difference between signals arriving on different traces (a process commonly referred to as “deskew”). During write operations, FlexPhase circuits control data-bit transmission so that byte-wise data arrives at the memory device with a known timing relationship to the DRAM’s clock signal (a process commonly referred to as “pre-skew”).
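An 8-bit phase control divides one full cycle into 256 steps. The sketch below shows the conversion between a phase code, an angle, and a time delay; the function names and the 625-ps reference period used in the example are hypothetical, since the text does not specify the period of the clock being adjusted.

```python
PHASE_BITS = 8
PHASE_STEPS = 1 << PHASE_BITS  # 256 codes spanning 0 to 360 degrees


def code_to_degrees(code: int) -> float:
    """Phase angle selected by a code, relative to the fixed reference."""
    return (code % PHASE_STEPS) * 360.0 / PHASE_STEPS


def code_to_ps(code: int, period_ps: float) -> float:
    """Delay selected by a code for a clock with the given period."""
    return (code % PHASE_STEPS) * period_ps / PHASE_STEPS


def ps_to_code(delay_ps: float, period_ps: float) -> int:
    """Nearest phase code for a desired delay (wraps every full period)."""
    return round(delay_ps / period_ps * PHASE_STEPS) % PHASE_STEPS


period_ps = 625.0  # hypothetical reference-clock period
print(code_to_degrees(64), "degrees for code 64")
print(code_to_ps(1, period_ps), "ps per step")
print(ps_to_code(312.5, period_ps), "is the code for a half-period delay")
```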

These circuits can also be combined with advanced digital-DLL circuits that minimize the PHY latency (i.e., from the memory-channel clock domain to the internal clock domain of the memory controller). FlexPhase also provides for in-system timing characterization and self-test functionality, which enables aggressive timing resolutions of 2.5 ps at 3.2-Gbps data rates in high-performance memory systems.

FlexPhase was built to minimize the system-level cost and complexity of routing an XDR memory system. However, FlexPhase techniques can also be used to address the clock-distribution challenges of other fly-by command/address systems, such as DDR3.

In DDR3’s fly-by topology, the time required for data, strobe, clock, address, and command signals to propagate between the controller and DRAMs is primarily determined by the lengths of the traces carrying those signals. Clock, command, and address (CCA) signals arrive at each DRAM at a different time. As described in further detail below, this results in data signals being transmitted from each DRAM at different times.

FlexPhase techniques can therefore be used in the DDR3 memory-controller PHY to deskew data signals, which eliminates the offset due to the fly-by architecture in addition to any inherent timing offsets of the memory system. For example, during READ operations, the memory controller with FlexPhase circuits can reliably compensate for the difference between the transmitted control signals and the data received from each memory device.

Using a 32-bit memory bus to illustrate, the DDR3 CCA bus “flies by” the DRAMs, causing read-data launch-time offsets (Fig. 4). Even if trace lengths were exactly matched, each DRAM’s data would arrive at the controller at a different time. Consequently, the burden rests on the controller to be smart enough to independently capture each DRAM’s read data and reliably reassemble the whole 32-bit word. As stated earlier, FlexPhase circuits can be used at the controller to deskew data signals, nullifying the offset due to the fly-by clocking architecture and minimizing the latency of the 32-bit word in the PHY itself.
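The launch-time offsets and the deskew needed to remove them can be modeled in a few lines. In this sketch the flight times are hypothetical: four x8 DRAMs sit 100 ps apart along the CCA bus, while their data traces back to the controller are perfectly matched.

```python
def read_arrival_times_ps(cca_flight_ps, dq_flight_ps):
    """Time at which each DRAM's read data reaches the controller.

    Each DRAM launches its data only after the command reaches it, so its
    arrival time is the CCA flight time plus the data flight time.
    """
    return [cca + dq for cca, dq in zip(cca_flight_ps, dq_flight_ps)]


def deskew_delays_ps(arrivals_ps):
    """Extra delay to add per byte lane so every lane lines up with the latest."""
    latest = max(arrivals_ps)
    return [latest - t for t in arrivals_ps]


cca_flight = [300.0, 400.0, 500.0, 600.0]  # hypothetical fly-by delays
dq_flight = [400.0, 400.0, 400.0, 400.0]   # perfectly matched data traces

arrivals = read_arrival_times_ps(cca_flight, dq_flight)
delays = deskew_delays_ps(arrivals)
print(arrivals)
print(delays)
print([a + d for a, d in zip(arrivals, delays)])  # all lanes aligned
```

The 300-ps spread in this example comes entirely from the command-bus topology, which is why matching the data traces alone cannot remove it.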

During WRITE operations, a similar process is performed in which the memory controller can preskew the timing delay between transmitted clock, command, and address signals and the data/strobe signals sent to each DRAM. DDR3 DRAMs include a special operational mode for write-strobe deskew, also known as “levelization,” to assist with optimal preskew timing by the memory-controller PHY. Figure 5 depicts an example of FlexPhase timing preskew used in conjunction with such a levelization mode in a fly-by memory system architecture.
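The mode referred to here as levelization is commonly known as write leveling: the DRAM samples its clock input on each rising strobe edge and reports the sampled level back on a data pin, and the controller steps the strobe delay until that feedback changes from 0 to 1. The sketch below uses a deliberately simplified DRAM model (ideal clock, no jitter), and all names and delay values are hypothetical.

```python
import math


def dram_sample_ck(dqs_delay_ps, ck_arrival_ps, period_ps):
    """Simplified DRAM in leveling mode: CK level seen at the DQS rising edge."""
    phase = (dqs_delay_ps - ck_arrival_ps) % period_ps
    return 1 if phase < period_ps / 2 else 0  # CK is high for the first half cycle


def level_write_strobe(ck_arrival_ps, period_ps, step_ps):
    """Step the strobe delay until the DRAM feedback flips from 0 to 1.

    That transition marks the delay at which the strobe edge lines up with
    the rising clock edge at this DRAM, to within one step. Returns None if
    no transition is seen within one clock period.
    """
    n_steps = math.ceil(period_ps / step_ps)
    previous = dram_sample_ck(0, ck_arrival_ps, period_ps)
    for i in range(1, n_steps + 1):
        delay = i * step_ps
        current = dram_sample_ck(delay, ck_arrival_ps, period_ps)
        if previous == 0 and current == 1:
            return delay
        previous = current
    return None


# Hypothetical fly-by clock arrival times at two DRAMs on the same command bus.
for ck_arrival in (300, 900):
    print(ck_arrival, "->", level_write_strobe(ck_arrival, 1250, 10))
```

Because each DRAM sees the clock at a different time, the controller ends up with a separate strobe delay per byte lane, which is the per-lane preskew described above.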

In summary, the FlexPhase circuit techniques and differential signaling used by XDR memory present several system, design, and cost benefits to address the challenges faced by designers of multi-Gbps memory systems. And while DDR3’s use of fly-by command bus and clock distribution allows for higher-speed command-bus signaling with multiple DRAM loads, it creates new data-bus challenges to reliably communicate between the memory controller and the DDR3 DRAMs.

And, although DDR3 DRAMs include a special “levelization” mode to assist with write-mode deskew, the newly added complexity of the memory-controller PHY is significant, especially as clock frequencies continue to increase. In systems where DDR3’s low cost-per-bit is the principal design metric, the FlexPhase circuit techniques that address similar concerns in XDR memory systems can be used in DDR3 memory-controller PHYs to achieve reliable multi-Gbps memory operation.