Troubleshoot High-Speed Buses By Clearing "Clock Scheme Fog"

Full article begins on Page 2

High-speed digital buses continuously evolve. Not only are they faster, but they're changing how a system clocks data. To improve data throughput, emerging synchronous digital buses are sending data multiple times per cycle via an array of clocking schemes. High-speed synchronous data transfers are becoming more common: Synchronous clocking modes that began life in high-end computing equipment are now trickling down to mid-market products. Thus, there's a greater demand for labor-saving digital troubleshooting solutions.

One of the most productive tools for debugging synchronous systems is the logic analyzer. When properly equipped, it can directly capture high-speed synchronous data. At the heart of the high-speed synchronous data-acquisition challenge, though, is the requirement for great flexibility in clocking and triggering.

Digital system designers have learned that in conventional parallel-bus architectures, brute-force increases in clock rate can yield diminishing returns. In response to this lesson, digital architects have devised a number of innovative clocking approaches, including "double-pumped," "quad-pumped," and "source-synchronous." This article defines and explains how these different approaches work and how to capture the right data at the right time.

One approach that gets particular focus is source-synchronous. Here, dedicated strobe signals are used instead of, or sometimes in addition to, a normal clock pulse. This makes acquisition inherently more complex. Yet despite the fact that several more steps are involved in its setup, as opposed to the other approaches, a dedicated source-synchronous mode makes the setup process straightforward. The "pyramid" step approach is detailed in the article.

HIGHLIGHTS:
New Clocking Scheme	"Double-pumped," "quad-pumped," and "source-synchronous" are three clocking approaches created as an alternative to brute-force clock-rate increases.
Special Acquisition Modes	Today's logic analyzers must be able to handle the many edge and data combinations. One proven method is to pair high sample rates with multiplexing techniques. This "doubling," known as 2X clocking, enables the capture of signals with narrow edge spacing.
Source-Synchronous Acquisition	Source-synchronous clocking uses dedicated strobe signals instead of, or with, a normal clock pulse. Most logic analyzers require external interfaces to preprocess source-synchronous acquisitions, while some have the ability built in.
Source-Synchronous Pyramid	Though it has several more steps than other approaches, a dedicated source-synchronous mode makes setup straightforward. Discussed are the successive layers of configuration choices that form the setup "pyramid."
Table: Synchronous Operation Types	Various types of synchronous operation currently in use are broken down for comparison. The edge and data combinations mentioned in "Special Acquisition Modes" are listed here.


Understand how double-pumped, quad-pumped, and source-synchronous devices work, and you can capture the right data at the right time.

High-speed digital buses continuously evolve. Not only are they faster, but perhaps more importantly, these buses are changing how a system clocks data. Gone are the simple days when all data was transferred on the rising edge of the master clock. In the never-ending pursuit of improved data throughput, emerging synchronous digital buses send data multiple times per cycle via a diverse array of clocking schemes. High-speed synchronous data transfers are becoming more common in emerging computing, networking, and communications architectures. In fact, synchronous clocking modes that began life in high-end computing equipment are trickling down to mid-market products. As a result, designers need labor-saving digital troubleshooting solutions more than ever.

One of the most productive tools for debugging synchronous systems is the logic analyzer. When properly equipped, it can directly capture high-speed synchronous data. At the heart of the high-speed synchronous data-acquisition challenge is the requirement for great flexibility in clocking and triggering.

NEW CLOCKING SCHEMES, SERIAL TRANSMISSION

Pursuing ever-increasing data throughput in PCs, servers, and communications elements, digital system designers have learned that in conventional parallel-bus architectures, brute-force increases in clock rate can yield diminishing returns. In response to this lesson, digital architects have devised a number of innovative clocking approaches, including "double-pumped," "quad-pumped," and "source-synchronous."

Perhaps the most innovative of these is the source-synchronous clocking architecture. This approach is gaining wider acceptance and will be the focus of more discussion later in this article. In a typical source-synchronous transaction, the transmitting device sends a strobe and multiple data bits in each cycle. The receiving device uses this strobe to latch the data, then resynchronizes it to the master or common clock. Some double-data-rate (DDR) memory buses and front-side buses, as well as AGPnX graphics cards, employ this technique. DDR memories have an equivalent data-transfer rate of up to 800 million transfers per second (Mtransfers/s). Front-side buses have an equivalent transfer rate of up to 533 Mtransfers/s.
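
The "equivalent" transfer rates quoted above follow from multiplying the bus clock by the number of transfers per clock cycle. The short Python sketch below shows that arithmetic. The base clock values (400 MHz and 133.33 MHz) are illustrative assumptions chosen to reproduce the quoted rates; they are not taken from this article.

```python
def transfers_per_second(clock_hz: float, transfers_per_cycle: int) -> float:
    """Equivalent transfer rate = bus clock frequency x transfers per cycle."""
    return clock_hz * transfers_per_cycle


# Assumed example clocks (hypothetical, for illustration only).
ddr_rate = transfers_per_second(400e6, 2)      # double-pumped: 2 transfers/cycle
fsb_rate = transfers_per_second(133.33e6, 4)   # quad-pumped: 4 transfers/cycle

print(f"Double-pumped: {ddr_rate / 1e6:.0f} Mtransfers/s")
print(f"Quad-pumped:   {fsb_rate / 1e6:.0f} Mtransfers/s")
```

The same relation works in reverse: dividing a quoted transfer rate by the pumping factor gives the clock frequency the logic analyzer must actually follow.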

The table summarizes the types of synchronous operation currently in use. Note that some architectures may have more than one implementation. So one DDR device, for example, may employ double-pumped techniques, while another device takes a source-synchronous approach.

Another method of improving data bandwidth is reducing the number of data channels, multiplexing the data, and substantially increasing the synchronous clock frequency. To achieve these higher clock frequencies, the data is sent differentially and at reduced signal amplitudes.

In the face of all these advances, some engineers are finding that their logic analyzers can acquire synchronous serial data, but the instruments are cumbersome to use. Most analyzers need an external preprocessor on the front end to "precondition" the data so that the analyzer can interpret the newer data protocols. This is necessary because the instrument lacks the complex clocking capabilities to interpret the high-speed synchronous 2X, 4X, and source-synchronous data protocols prevalent in today's digital buses.

Unfortunately, the preconditioning process also latches the data, making it impossible to see the raw timing of the signals. This complicates the location of timing problems. Designers also require other important functions:

The ability to handle the steadily increasing clock frequencies and data-transfer rates in today's digital systems
A way to probe buses without degrading the signals
Device-specific support for the growing number of processors and buses that use synchronous techniques
High-resolution time-stamping of individual bit acquisitions.

In addition, features that support interconnection with analog acquisition tools (typically oscilloscopes) are valuable because digital problems often originate in the analog-signal domain.

While synchronous and serial-bus architectures have moved ahead, logic-analyzer capabilities haven't rested on their laurels. Logic analyzers built on flexible, modular platforms have adapted to the new requirements, gaining new high-speed clock modes and other data-acquisition features. These include selectable clock (sampling) frequency, demultiplexing, bus width and memory depth, and, of course, the clocking mode. In terms of clocking, logic-analyzer acquisitions fall into one of two categories:

Internally clocked: This instrument supplies its own clock, capturing the data with conventional sampling techniques to produce timing displays of the digital data. The acquisition is asynchronous. The internal clock runs at a set frequency unrelated to that of the system under test (SUT).
Asynchronous acquisitions are valuable for troubleshooting and functional verification on straightforward bus architectures. In certain sampling modes, high-performance logic analyzers can sample asynchronously at equivalent internal clock frequencies of up to 8 GHz.
Externally clocked: The logic analyzer depends on a clocking signal from the SUT and doesn't acquire data until it receives one. External (synchronous) clocking is the key to accumulating data most efficiently. Each sample equates to a specific clock event on the bus, and the resulting data can be disassembled to produce detailed listings of bus activity as well as timing diagrams. Only valid events are recorded, not the passage of time between them.
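
The difference between the two categories can be illustrated with a small Python simulation. This is only a sketch of the concept, not a model of any particular instrument: the bus contents, the 2-ns internal sample period, and the SUT clock-edge times are all made-up values, and the helper names are hypothetical.

```python
from bisect import bisect_right


def value_at(transitions, t_ns):
    """Bus value at time t_ns, given sorted (time_ns, value) transitions."""
    times = [ts for ts, _ in transitions]
    i = bisect_right(times, t_ns) - 1
    return transitions[i][1] if i >= 0 else None


def acquire_async(transitions, period_ns, n_samples):
    """Internally clocked: sample at a fixed period unrelated to the SUT clock."""
    return [(i * period_ns, value_at(transitions, i * period_ns))
            for i in range(n_samples)]


def acquire_sync(transitions, clock_edges_ns):
    """Externally clocked: store exactly one sample per SUT clock edge."""
    return [value_at(transitions, t) for t in clock_edges_ns]


# Made-up bus activity: a new value every 10 ns.
bus = [(0, 0xA), (10, 0xB), (20, 0xC), (30, 0xD)]
# Made-up SUT clock: rising edges in the middle of each data-valid window.
sut_clock_edges = [5, 15, 25, 35]

async_samples = acquire_async(bus, period_ns=2, n_samples=20)
sync_samples = acquire_sync(bus, sut_clock_edges)

print(len(async_samples), "asynchronous samples (timing view)")
print([hex(v) for v in sync_samples], "synchronous samples (state listing)")
```

The asynchronous run records the passage of time (each bus value appears in several consecutive samples), while the synchronous run stores one entry per valid bus event, which is what makes disassembly into a listing possible.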

Some modern logic analyzers offer synchronous clocking modes that can acquire data at rates up to 1.25 Gbits/s. This is essential for emerging serial communications buses.

SPECIAL ACQUISITION MODES

Today's state-of-the-art logic analyzers deliver up to 2-GHz asynchronous sampling at full memory depth. But this number doesn't tell the whole story. What matters as much as raw speed is the instrument's way of handling the many edge and data combinations summarized earlier in the table.

One proven method is to pair high sample rates with multiplexing techniques. Here, two or more acquisition channels pool their resources to effectively double, or even quadruple, the instrument's normal data rate.

This doubling, known as 2X clocking, allows the capture of signals with narrow edge spacing (down to 1.25 ns in some cases). In this mode, up to four different clock sources can be selected. Using the 2X clocking mode, one might, for example, sample double-pumped data at two different times with synchronous clock rates up to 800 MHz and DDRs up to 800 Mbits/s.

Figure 1 shows the advantage of 2X clocking over basic synchronous (1X) clocking. With 1X clocking on the 800-MHz source, exactly half of the data is lost because samples are taken at intervals of 2.5 ns, rather than 1.25 ns. With 2X clocking, two groups work together to sample twice as often. The data in group D3 is meaningless. But group A3 contains all data from the time period of interest. In this case, D3 is "sacrificed."
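
The effect can be reproduced with a few lines of Python. The sketch below assumes a data word every 1.25 ns (1250 ps) and a 1X sample every 2.5 ns, as in the example above; the function names and the way the two pooled acquisition paths are interleaved are simplifying assumptions, not a description of the instrument's internal design.

```python
DATA_INTERVAL_PS = 1250     # one data word every 1.25 ns
ONE_X_INTERVAL_PS = 2500    # 1X clocking samples every 2.5 ns


def sample(words, sample_interval_ps, offset_ps):
    """Sample the word stream at a fixed interval, starting at offset_ps."""
    total_ps = len(words) * DATA_INTERVAL_PS
    captured = []
    t = offset_ps
    while t < total_ps:
        captured.append(words[t // DATA_INTERVAL_PS])
        t += sample_interval_ps
    return captured


def sample_2x(words):
    """Two pooled paths, each running at the 1X rate, staggered by one word."""
    mid = DATA_INTERVAL_PS // 2  # sample in the middle of each data window
    first = sample(words, ONE_X_INTERVAL_PS, mid)
    second = sample(words, ONE_X_INTERVAL_PS, mid + DATA_INTERVAL_PS)
    merged = []
    for i, word in enumerate(first):
        merged.append(word)
        if i < len(second):
            merged.append(second[i])
    return merged


words = list(range(8))  # made-up data words 0..7
one_x = sample(words, ONE_X_INTERVAL_PS, DATA_INTERVAL_PS // 2)
two_x = sample_2x(words)

print("1X:", one_x)  # every other word is missed
print("2X:", two_x)  # all words captured
```

In this model the merged list plays the role of the group that ends up holding all of the data, while the second path's resources are given up to make that possible.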

Figure 2 depicts the listing and waveform views that result from a 2X external acquisition. In the waveform view, the line titled "LA1: Mag_Sample" shows a series of tick marks, each representing 125 ps of time. This is the logic analyzer's timing sample period. The 2X clocking mode enables the synchronous sampling of data, shown as "LA1: Mag_Data," at an 800-MHz clock rate. The resulting sampled data appears on the line titled "LA1: Data."

Quad-pumped data is the next step up. To acquire quad-pumped data, some logic analyzers use a combination of 1X clocking, two-way demultiplexing, and dual-edge capture. Alternatively, some logic analyzers offer a dedicated 4X clocking mode. In either case, the instrument samples four events per clock cycle.

The 4X approach is very similar to that of the double-pumped example explained above. A pair of groups is sampled on both leading and trailing edges—a total of four samples per cycle. The setup/hold timing settings on the groups determine when the logic analyzer samples the data associated with each clock edge.

Using the 4X mode to capture quad-pumped data yields the highest data rates available from the logic analyzer. To achieve this speed, the 4X clocking mode commits four groups to sampling the data. Of these, the demultiplex target group accumulates valid data. The data from the three demultiplexed source groups is discarded.
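
As a rough illustration of four samples per clock cycle, the Python sketch below applies two sampling offsets to both the rising and the falling edge of each cycle. The 8-ns clock period and the offset values are made-up numbers standing in for the setup/hold settings described above, and the function names are hypothetical.

```python
def quad_sample_times(cycle_start_ps, period_ps, offsets_ps):
    """Sample instants for one clock cycle.

    Each offset (a stand-in for a group's setup/hold setting) is applied
    to both the rising edge and the falling edge of the clock.
    """
    rising = cycle_start_ps
    falling = cycle_start_ps + period_ps // 2
    return sorted(edge + off for edge in (rising, falling) for off in offsets_ps)


def acquire_quad(words, period_ps, offsets_ps):
    """Capture four words per clock cycle into a single target list."""
    slot_ps = period_ps // 4          # each word occupies a quarter cycle
    captured = []
    for cycle in range(len(words) // 4):
        start = cycle * period_ps
        for t in quad_sample_times(start, period_ps, offsets_ps):
            captured.append(words[t // slot_ps])
    return captured


PERIOD_PS = 8000                      # assumed 8-ns clock period
OFFSETS_PS = (1000, 3000)             # assumed sampling offsets after each edge

times = quad_sample_times(0, PERIOD_PS, OFFSETS_PS)
quad = acquire_quad(list(range(8)), PERIOD_PS, OFFSETS_PS)

print(times)
print(quad)
```

Choosing the offsets so that each sample lands inside a distinct quarter-cycle window is the software analogue of adjusting setup/hold timing per group.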

SOURCE-SYNCHRONOUS ACQUISITION

Capturing data from a source-synchronous bus requires a special source-synchronous acquisition mode. All modes discussed up until this point rely on a conventional clock signal to clock data into the logic analyzer. Source-synchronous clocking uses dedicated strobe signals instead of, or sometimes in addition to, a normal clock pulse. Acquisition is inherently more complex than that of other synchronous modes. Most logic analyzers need external interfaces to preprocess source-synchronous acquisitions.

However, certain logic analyzers include built-in features that perform uncompromised source-synchronous acquisition. For example, at the heart of the TLA7Axx logic-analyzer modules for the TLA700 Series is a "Clock Group Complete" function that latches and holds a succession of completed events. When a final enabling event occurs, it advances all of the accumulated data. Note how closely this process matches the description of a typical source-synchronous operation.

Figure 3 is a simplified timing diagram for a source-synchronous data transfer. Here, the transmitting device sends data multiple times within each cycle (symbolized by the DATA[15:0] and DATA[31:16] groups), plus a strobe associated with each data group (Strobes 0 and 1, respectively). The receiving device uses the strobe to latch the data, then resynchronizes the data to the master clock.
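
The latch-then-resynchronize behavior can be sketched in Python as a small event loop. This is a conceptual model only: the event names, the 16-bit group widths taken from the DATA[15:0] and DATA[31:16] labels, and the rule that a clock edge with an incomplete set of latched groups stores nothing are assumptions made for illustration, not the instrument's documented behavior.

```python
def source_sync_capture(events):
    """Latch each data group on its strobe; advance all groups on the clock.

    events: time-ordered (kind, payload) tuples, where kind is
      "strobe0" -> payload is DATA[15:0]
      "strobe1" -> payload is DATA[31:16]
      "clock"   -> master clock edge (payload ignored)
    """
    latched = {}
    words = []
    for kind, payload in events:
        if kind in ("strobe0", "strobe1"):
            latched[kind] = payload & 0xFFFF      # hold until the clock arrives
        elif kind == "clock":
            if len(latched) == 2:                 # both groups are complete
                words.append((latched["strobe1"] << 16) | latched["strobe0"])
                latched.clear()
    return words


events = [
    ("strobe0", 0xBEEF), ("strobe1", 0xDEAD), ("clock", None),
    ("strobe0", 0x5678), ("clock", None),         # group incomplete: no sample
    ("strobe1", 0x1234), ("clock", None),
]
captured = source_sync_capture(events)
print([hex(w) for w in captured])
```

Because each group is held from its own strobe until the enabling clock edge, the two halves of the word may arrive at different times and still be stored as one aligned sample.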

SOURCE-SYNCHRONOUS PYRAMID

Although it has several more steps than the modes discussed earlier, a dedicated source-synchronous mode makes the setup process straightforward. These steps can be viewed as a setup "pyramid" consisting of successive layers of configuration choices (Fig. 4). The steps are numbered in the order in which they are normally performed. Let's follow the process from "bottom to top."

Assign Edge Detectors: Every synchronous digital system requires a clock. Source-synchronous systems also use one or more strobe signals. The logic analyzer implements edge detectors to mark the passing of such events. Four edge detectors are available per module, with rising and falling edges assigned independently from one another. Linking the strobe signal to a detector completes the logic analyzer's definition of when a cycle's data is valid.
Bind Edge Detectors Into Clock Groups: A clock group consists of one or more of the previously defined edge detectors. In turn, the clock group defines exactly which edges are needed to complete an event.
Define Sample Clocks: The validity of SUT operations, such as Read or Write, is determined by Boolean conditions that encompass several signals. Setting up the sample clocks prepares the logic analyzer to sample according to these Boolean equations. Just as the clock groups are made up of edge detectors, the sample clocks consist of clock groups related by various Boolean OR and AND invocations using qualifier signals.
Set Up Group Clocking: The group clocking setup step is the point at which all of the previous definitions (edge detectors, clock groups, and sample clocks) converge. The group clocking menu also imposes timing parameters on all of the logical conditions programmed so far.
Define Probe Demultiplexing: Probe demultiplexing tells the logic analyzer which groups to demultiplex so that they're consistent with the source-synchronous acquisition. The logic analyzer has default mappings for various levels of demultiplexing, which are satisfactory for source-synchronous acquisitions. Therefore, this setup step simply specifies the default map.

Figure 5 shows the result of a source-synchronous acquisition. Here, the strobe latches data on its rising edge. This data is then resynchronized to the master clock, as shown in the waveform view.
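
The layering of the first three setup steps (edge detectors, clock groups, sample clocks) can be expressed as a small data model. The Python sketch below is purely illustrative: the class names, signal names, and the "RD" qualifier are hypothetical and do not correspond to any real logic-analyzer API. Only the limit of four edge detectors per module comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

MAX_EDGE_DETECTORS = 4  # per module, as stated in the setup description


@dataclass(frozen=True)
class EdgeDetector:
    signal: str          # clock or strobe name (hypothetical)
    edge: str            # "rising" or "falling"


@dataclass(frozen=True)
class ClockGroup:
    name: str
    detectors: Tuple[EdgeDetector, ...]

    def is_complete(self, seen: FrozenSet[EdgeDetector]) -> bool:
        """A group is complete once every one of its edges has been seen."""
        return all(d in seen for d in self.detectors)


@dataclass(frozen=True)
class SampleClock:
    name: str
    groups: Tuple[ClockGroup, ...]
    qualifier: Callable[[Dict[str, int]], bool]

    def fires(self, seen: FrozenSet[EdgeDetector], levels: Dict[str, int]) -> bool:
        """AND of all clock groups, further gated by a qualifier condition."""
        return all(g.is_complete(seen) for g in self.groups) and self.qualifier(levels)


def check_detector_budget(detectors) -> None:
    if len(detectors) > MAX_EDGE_DETECTORS:
        raise ValueError("more edge detectors requested than the module provides")


# Layer 1: edge detectors.
strobe0 = EdgeDetector("STROBE0", "rising")
strobe1 = EdgeDetector("STROBE1", "rising")
clk = EdgeDetector("CLK", "rising")
check_detector_budget((strobe0, strobe1, clk))

# Layer 2: clock groups built from detectors.
data_strobes = ClockGroup("data_strobes", (strobe0, strobe1))
master = ClockGroup("master", (clk,))

# Layer 3: a sample clock combining groups with a qualifier (assumed "RD" line).
read_clock = SampleClock("read", (data_strobes, master),
                         lambda levels: levels.get("RD", 0) == 1)

all_edges = frozenset({strobe0, strobe1, clk})
print(read_clock.fires(all_edges, {"RD": 1}))
```

Each layer refers only to the layer beneath it, which is why the setup steps are normally performed from the bottom of the pyramid upward.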

To increase the data throughput in digital systems, innovative data-transfer techniques are gaining popularity. These include increasing the basic clock and data speed, sending the data differentially, reducing the signal amplitude, transferring data multiple times in one clock cycle, and sending the data in a source-synchronous format.

When capturing digital data from buses that implement these transfer techniques, it's necessary to use an advanced logic-analysis tool that can acquire them without a front-end "preprocessor" that manipulates, and possibly distorts, the data.

A new generation of logic-analyzer modules is rising to meet this need. These tools can handle high-speed synchronous clocks and capture multiplexed bus data and source-synchronous data transfers.