It Takes A Super Sleuth To Really Debug Clock-Timing Problems

Feb. 16, 2004

Today's digital-based systems anchor their reliability and performance on the integrity of their clocking systems. Due to the wide-reaching effect these signals have on system performance, it's paramount that problems be located and corrected. Like a good detective, the engineer must ask the right probing questions to determine exactly what is cause and what is effect.

Fixing the effect is a less than optimal solution and may allow the guilty party to escape and commit an even more serious timing crime later. For example, suppose there's a smoking bypass filter at the scene of the crime (high jitter, badly shaped waveforms, or design parameters that aren't being met). It must be proven to be the cause and not a piece of the evidence trail that leads back to a less than obvious source. Possibly the capacitor was used in "self defense" against noise in a frequency band that it couldn't protect the clock device from.

A correct-for-design termination circuit that becomes incorrect because of inappropriate source-driver-component impedance is another good example. That is, a 50-Ω termination impedance for a 50-Ω characteristic impedance transmission line is usually, but not always, correct for maximum signal integrity. The instinctive reaction is to tune the terminator for the circuit. But this will disrupt the impedance of the transmission line connecting the driver and termination. Indeed, the obvious source of trouble is very often far from the real cause.

It's best to correct this problem at its root cause by choosing an appropriate driver, rather than waste time tuning the terminator to a working but less than optimal result. By correcting the root cause, you may also fix other problems that aren't significant enough to cause timing violations but, nonetheless, degrade the timing margin.

We must also be aware that failure may not be caused by a single and easily identified source. Failure may result from several slightly out-of-spec timing entities that together push a timing edge past a limit.

For instance, the combined effects of a marginal ground, incorrect bypass component values, and poor trace-routing topologies can produce enough noise to cause a failure. To make matters even more difficult, clock components are connected to power and ground planes that are shared with the rest of the design. Thus, they're affected by the noise environment of the system they control. There's little wonder that finding the problem's root cause is so difficult.

TIME DOMAIN The first view we have into the crime scene is time-domain information. Things displaced in time from each other typically show up as skew and delay, causing timing events to be regularly or irregularly displaced in time from the desired point of occurrence. They have two prevalent root causes: deterministic-jitter and random-jitter noise sources.

Deterministic jitter can be absolutely traced back to a root source. The displacement is regular and has a discoverable periodic occurrence or fingerprint. By reading the time displacement of the period peaks on a multimodal distribution measurement (Fig. 1), you can often quickly determine the aggressor signal's frequency. Once that's accomplished, finding the absolute source is only a few steps away.

Random jitter is a little more of a problem. It can come from the components themselves or be passed forward from devices that drive the offending component.

FREQUENCY DOMAIN The other important view of clocking problems lies in the frequency domain, where we observe how energy is distributed at and around the desired clock frequency. Clocking systems demand fast rise times to minimize the time spent in the loads' switching-threshold region. This adds many odd harmonics to the spectral content of the clock. (A pure square wave with zero rise time is a sum of the fundamental and all of the odd harmonics of the desired frequency.) In the frequency domain, we're looking for the results of other clocks mixing with the desired clock frequency.

Four mixing products are created when clocks modulate each other: the sum, the difference, and the two initial frequencies. In most cases, the aggressor frequency will be quite close to the desired frequency, as will be the resulting sum and difference products.
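As a rough illustration of the four mixing products, the sketch below lists them for a pair of clocks. The 100-MHz and 66-MHz values are hypothetical, chosen only for the example, and the function name is not from any tool.

```python
def mixing_products(f_clock_hz, f_aggressor_hz):
    """Return the four frequencies present when two clocks mix."""
    return {
        "clock": f_clock_hz,
        "aggressor": f_aggressor_hz,
        "sum": f_clock_hz + f_aggressor_hz,
        "difference": abs(f_clock_hz - f_aggressor_hz),
    }

# Hypothetical example: a 100-MHz clock disturbed by a 66-MHz aggressor.
products = mixing_products(100e6, 66e6)
for name, freq_hz in products.items():
    print(f"{name:>10}: {freq_hz / 1e6:.1f} MHz")
```

Spectral lines at the sum and difference frequencies are the ones to look for on a spectrum analyzer when a second clock is suspected.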

ROGUES GALLERY Often an event is noted, but it's not the actual source of the problem. Consider the kingpin of timing issues—jitter. Jitter takes many forms in clocking systems. Sometimes jitter can be analyzed and the exact frequency of a guilty aggressor signal can be determined.

Sometimes jitter is a gang of misbehaving sources that, together, tip the scales and raise the noise level. The sources often are so complex in nature that a great deal of evidence must be gathered just to plan how to measure the magnitude and source of the relevant noise. An example of this is excessive electromagnetic interference (EMI), where electromagnetic energy, in the form of electromagnetic waves, escapes the very transmission lines created to contain it.

While detecting the noise is routinely done with FCC and other organization-mandated testing, correcting the source is often seen as an art. Major noise sources have several typical causes to track down.

First of all, abnormally fast rise times create overdriven transmission lines. In Figure 2 the rise time is measured in picoseconds, and in Figure 3 it's about 2 ns. Using Fourier analysis, you can see that the harmonic components of the shorter rise time have much higher amplitudes at any given frequency than those of the longer rise time. This energy must be dissipated somewhere, and radiation from the traces is the least desired method.
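The effect of rise time on harmonic content can be sketched with the standard Fourier-series result for a trapezoidal waveform with equal rise and fall times. The clock frequency, voltage swing, and the 100-ps "fast" edge below are assumptions made for illustration; they are not the values behind Figures 2 and 3.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the x = 0 limit handled."""
    return 1.0 if x == 0 else math.sin(x) / x

def harmonic_amplitude(n, swing_v, period_s, rise_s, duty=0.5):
    """Amplitude of the n-th harmonic of a trapezoidal clock
    (equal rise and fall times, pulse width taken at the 50% points)."""
    return (2.0 * swing_v * duty
            * abs(sinc(n * math.pi * duty))
            * abs(sinc(n * math.pi * rise_s / period_s)))

PERIOD_S = 20e-9       # assumed 50-MHz clock
SWING_V = 3.3          # assumed logic swing
FAST_RISE_S = 100e-12  # assumed picosecond-class edge
SLOW_RISE_S = 2e-9     # roughly the slower edge described in the text

for n in (1, 3, 5, 7, 9):
    fast = harmonic_amplitude(n, SWING_V, PERIOD_S, FAST_RISE_S)
    slow = harmonic_amplitude(n, SWING_V, PERIOD_S, SLOW_RISE_S)
    print(f"harmonic {n}: fast edge {fast:.3f} V, slow edge {slow:.3f} V")
```

The gap between the two columns widens with harmonic number, which is why slowing an edge is often the cheapest EMI fix when timing margin allows it.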

A pc-board trace exactly 1/4 wavelength long between two components becomes a very efficient antenna. Naturally, these antennas couple electromagnetic waves into the space around them. The solution is to know the wavelengths of the fundamental clock frequency and its harmonics. Then you have to make sure that no contiguous section of a trace carrying those clocks is 1/4, 1/2, or 1× these lengths. Furthermore, if the clock signal is improperly terminated, reflections will cause the wave to traverse the line for an increased time. The longer the incident wave exists on the antenna, the more radiation occurs.
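A minimal sketch of that length check follows. It assumes the wavelength on the board is the free-space wavelength divided by the square root of an effective dielectric constant; the value 4.0 and the 100-MHz clock are placeholders, so substitute the figures for your own stackup.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum

def lengths_to_avoid(clock_hz, eps_eff, harmonics=(1, 3, 5)):
    """Map each harmonic number to its 1/4, 1/2, and 1x wavelength
    trace lengths in meters."""
    table = {}
    for n in harmonics:
        wavelength_m = C_M_PER_S / (n * clock_hz * math.sqrt(eps_eff))
        table[n] = {frac: frac * wavelength_m for frac in (0.25, 0.5, 1.0)}
    return table

# Assumed values: 100-MHz clock, effective dielectric constant of 4.0.
for n, lengths in lengths_to_avoid(100e6, 4.0).items():
    row = ", ".join(f"{frac}x = {length_m * 100:.1f} cm"
                    for frac, length_m in lengths.items())
    print(f"harmonic {n}: {row}")
```

The higher harmonics matter most on real boards, because their critical lengths shrink into the range of ordinary trace runs.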

Slots in shielding, contrary to popular belief, don't simply let EMI waves leak out. The size of the aperture controls the emission because the electromagnetic-field (EMF) waves develop circulating currents along the slot's edges, and the edges themselves become the antennas. Knowing the frequency of the harmonic energy on the inside will let designers determine the largest permissible shield aperture.

Adjacent excitation also can be an issue. If an electromagnetic wave crosses a conductor, it will induce energy into it. And if the conductor happens to be an input trace to a clock, this will modulate that clock signal's amplitude at the time that it crosses the switching point of the device it drives. This translates into jitter in the input signal of that device.

Power-supply noise is the simplest source of clock-signal performance degradation, and the one to which clock devices are most susceptible. Given that most clocking components are built in CMOS processes, their switching thresholds are usually determined by the difference between the voltages on their ground (V_SS) and power (V_DD) supply pins. If either level shifts in potential, then the switching threshold of almost every internal digital node will change.
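To get a feel for how a threshold shift turns into timing error, the sketch below divides the voltage shift by the edge's slew rate. It assumes a linear edge whose rise time is specified between the 10% and 90% points; the 3.3-V swing, 1-ns rise time, and 100-mV shift are illustrative numbers only.

```python
def edge_displacement_s(noise_v, swing_v, rise_time_s):
    """Time shift of a threshold crossing caused by a voltage shift.

    Assumes a linear edge with a 10%-90% rise time, so the slew rate is
    0.8 * swing / rise time.
    """
    slew_v_per_s = 0.8 * swing_v / rise_time_s
    return noise_v / slew_v_per_s

# Assumed example: 100 mV of threshold shift on a 3.3-V, 1-ns edge.
shift_s = edge_displacement_s(0.1, 3.3, 1e-9)
print(f"edge displacement: {shift_s * 1e12:.1f} ps")
```

The same arithmetic shows why slow edges are more vulnerable: halving the slew rate doubles the displacement for the same noise voltage.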

Note that there's no protection from the noise that gets into the power and ground pins. To the clock device, this noise immediately becomes internal system noise. If a phase-locked loop (PLL) is present and the frequency of the noise is within its loop bandwidth, it will pass and potentially amplify the noise. Even more difficult to predict is the effect in the frequency domain. When power-supply noise gets into the voltage-controlled oscillator (VCO) stages, increased jitter and loss of lock are not uncommon.

There are two defenses against power-supply noise. First, filter it to an acceptable level using adequate bypass (decoupling) techniques and components. Second, incorporate a differential architecture. A couple of key benefits of differential signaling technology are superior power-supply noise immunity and higher-frequency operation due to reduced voltage swings.

Ground noise is fully added (and subtracted) in real time to the output clock's waveform, because output signals are referenced to the device's ground pin. While modulating the T_HIGH (high-time) and T_LOW (low-time) portions of a square-wave clock seems to only threaten the signal-to-noise ratio (SNR), the real problem pertains to the clock edges. If the modulation is synchronous to the clock and occurs around the switching point, jitter increases. This is deterministic jitter and can usually be seen as a bimodal distribution of clock periods on a timing-interval analyzer. If the noise is asynchronous to the clock, then the random-jitter shape of the clock's distribution is deformed into a multimodal shape with more than two peaks (Fig. 4).

Thermal effects on clock systems typically induce low-frequency errors because of the time it takes to move the thermal mass of the device from one temperature to another. Your choices are either to stabilize temperature with an oven or fan or change the design to avoid the sensitivity (that is, use two components that track and compensate for each other).

Control pins, such as output-enable and configuration-selection lines, are a commonly overlooked noise source. These lines often have internal pull-up devices that are in the tens of kilohms range. Noise induced onto these signals rides a relatively uninhibited path directly into the device, through device die capacitances, and out to all the gates to which it's coupled. Terminating these to a clean power or ground signal is more prudent than letting them float and act as noise-input antennas.

Voltage or current nonlinear loading is a concern when designers need to add level shifting to voltages. Typically the source is unequal capacitive and inductive loading with respect to the driving device's power and ground pins. Placing a pull-up resistor to act as a termination is a good example. The pull-up resistor aids the internal semiconductor pull-up device in the clock driver, thereby decreasing the waveform's rise time by supplying an aiding pull-up current. Similarly, it works against the pull-down device in the clock driver, causing an increase in fall time.

These two opposing current sink/source effects cause both timing and duty-cycle distortion of the clock signal seen by the load device. Placing capacitors and clamp diodes in the circuitry only exacerbates the problem. You'll do better to use a symmetrical load. If a termination load of 50 Ω is required, the appropriate circuitry is a 100-Ω resistor to V_DD and a 100-Ω resistor to ground (V_SS) to preserve duty-cycle symmetry.
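The symmetrical split termination can be checked with a Thevenin equivalent: the two resistors appear in parallel to the line, and the idle bias sits at the divider voltage. In this sketch the upper resistor is taken to go to V_DD and the lower one to ground; the 3.3-V supply is an assumed value.

```python
def split_termination(r_to_vdd_ohm, r_to_gnd_ohm, vdd_v):
    """Return (Thevenin resistance, Thevenin voltage) of a resistor pair
    tied from the line to V_DD and from the line to ground."""
    r_thevenin = (r_to_vdd_ohm * r_to_gnd_ohm) / (r_to_vdd_ohm + r_to_gnd_ohm)
    v_thevenin = vdd_v * r_to_gnd_ohm / (r_to_vdd_ohm + r_to_gnd_ohm)
    return r_thevenin, v_thevenin

# Two 100-ohm resistors on an assumed 3.3-V supply.
r_th, v_th = split_termination(100.0, 100.0, 3.3)
print(f"termination looks like {r_th:.0f} ohms biased at {v_th:.2f} V")
```

Equal resistors bias the line at mid-supply, so the driver's pull-up and pull-down devices see the same load and the duty cycle is preserved.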

Distorted source signals also can affect clocking events. Most PLL devices use either a single input clock edge (single-ended) or the crossing of two input clocks at the same voltage (differential) to mark a precise point in time. On a non-PLL-based device, the output duty cycle will be affected directly by input waveform integrity. That's because it acts as a buffering delay line to the incoming waveform. If it includes internal dividing logic, then the duty cycle is less important.

Rise and fall times are critical because logic doesn't switch instantaneously. The longer an input clock spends in the hysteresis zone, the less drive the input gates get. The voltage charging their gate capacitances is lower for a longer period of time, so they charge more slowly. Thus, the output switches at a slower rate and at a slightly more ambiguous point in time. Although this is usually only a few picoseconds, if a margin can be designed in, it's almost always prudent to take it.

Poorly terminated clock signals cause unwanted reflections. Even though reflections are anticipated in series-terminated cases, excessive reflections add to and subtract from the desired clock-signal waveform. This amplitude modulation reduces the signal-to-noise margin. If it's strong, it can sum with the clock and cause the switching threshold crossing event to be displaced in time.
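The size of a reflection can be estimated from the standard voltage reflection coefficient, (Z_L − Z_0) / (Z_L + Z_0). The sketch below applies it to a few load values on a 50-Ω line; the load values are arbitrary examples.

```python
def reflection_coefficient(z_load_ohm, z_line_ohm):
    """Fraction of the incident voltage reflected at a resistive load."""
    return (z_load_ohm - z_line_ohm) / (z_load_ohm + z_line_ohm)

Z_LINE_OHM = 50.0
for z_load in (50.0, 75.0, 33.0):  # arbitrary example loads
    gamma = reflection_coefficient(z_load, Z_LINE_OHM)
    print(f"load {z_load:.0f} ohms: {gamma * 100:+.1f}% of the wave reflects")
```

A positive value adds to the incident wave at the load and a negative value subtracts from it, which is the amplitude modulation the paragraph above describes.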

System noise, not the clock source, accounts for most noise in timing circuits. To ferret out these sources, you must add a single-frequency noise at different amplitudes to the power supply, sweep it in frequency, and measure the critical clock parameters to see the effect and what fails. You can simulate this, but injecting known noise into a completed design and then finding what amplitudes at what frequencies cause failure is the optimal way to determine design tolerance to system noise.

And don't stop at the first frequency that breaks things. Test the complete spectrum across all critical timing points. Of critical interest are the frequencies that can be measured on the power system when the design is functioning normally, because they are known suspects.
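The sweep procedure can be organized as a simple search for the smallest noise amplitude that causes failure at each frequency. In the sketch below, `measure_jitter_ps` stands for whatever bench routine injects noise and returns a measurement; it and the `toy_measure` model are hypothetical stand-ins, not a real instrument interface.

```python
def find_failure_thresholds(frequencies_hz, amplitudes_v,
                            measure_jitter_ps, limit_ps):
    """For each frequency, return the smallest injected amplitude whose
    measured jitter exceeds the limit, or None if none does."""
    thresholds = {}
    for freq_hz in frequencies_hz:
        thresholds[freq_hz] = None
        for amp_v in sorted(amplitudes_v):
            if measure_jitter_ps(freq_hz, amp_v) > limit_ps:
                thresholds[freq_hz] = amp_v
                break
    return thresholds

def toy_measure(freq_hz, amp_v):
    """Hypothetical stand-in for a bench measurement: the design is ten
    times more sensitive to noise between 0.5 MHz and 2 MHz."""
    sensitivity_ps_per_v = 2000.0 if 0.5e6 <= freq_hz <= 2e6 else 200.0
    return 10.0 + sensitivity_ps_per_v * amp_v

result = find_failure_thresholds(
    [100e3, 1e6, 10e6], [0.01, 0.02, 0.05, 0.1], toy_measure, limit_ps=60.0)
for freq_hz, amp_v in result.items():
    print(f"{freq_hz / 1e6:g} MHz: fails at {amp_v} V")
```

Recording the full table, rather than stopping at the first failure, shows which bands have the least margin.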

Inductively or capacitively induced noise coupled into traces isn't often thought of as a major contributor to timing errors. As frequencies get higher, however, capacitive reactance falls, so capacitive coupling increases, as does crosstalk. Looking for the adjacent trace frequencies will help identify their effects quickly. Be aware that while their fundamental frequency may not be easily coupled, the opposite may be the case for the rich harmonic components.
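Coupling through a small trace-to-trace capacitance gets stronger with frequency because the capacitive reactance, 1 / (2πfC), falls. The sketch below evaluates it at a clock's odd harmonics; the 1-pF coupling capacitance and the 100-MHz fundamental are assumed values.

```python
import math

def capacitive_reactance_ohm(freq_hz, capacitance_f):
    """Magnitude of a capacitor's reactance at a given frequency."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_f)

COUPLING_F = 1e-12    # assumed 1 pF between adjacent traces
FUNDAMENTAL_HZ = 100e6

for n in (1, 3, 5, 7):
    x_c = capacitive_reactance_ohm(n * FUNDAMENTAL_HZ, COUPLING_F)
    print(f"harmonic {n} ({n * 100} MHz): {x_c:.0f} ohms")
```

The path that looks like a high impedance at the fundamental is several times lower at the upper harmonics, which is why the harmonics couple when the fundamental does not.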

Loop bandwidth is the particular input-noise frequency-range sensitivity of components with PLLs. The component will pass, and in some cases amplify, noise within this band. You must know this value to understand how to minimize this effect. The input pin isn't always the sole source of noise into the part. Noise entering through the power-supply system and control pins is just as destructive.

Incorrectly terminated transmission lines cause energy to remain on the line longer, creating a magnetic field around the line and thus radiating EMI. If the traces are modally tuned in length (that is, have standing waves present), the reflections will add constructively, creating even higher voltage amplitudes and therefore stronger magnetic fields.

Electromagnetic fields that radiate from conductive traces often result in coupling and crosstalk in a system. The shorter the pc-board traces are, and the more closely they're tuned to the load, line, and source, the less they will leak energy. Also, if the lines are terminated correctly, the signals spend less time on the line. Consequently, the overall power they radiate is diminished.

The feedback clock signal of a zero-delay buffer also enters through an input pin. Long meandering traces acting as delay lines are cheap and accurate. But they're also paths through which ground-plane and adjacent-trace noise can couple capacitively and inductively.

The tolerance error of other components (such as pc-board fabrication spec ranges) will later compromise performance if you design a tight clock circuit with components that are thermally sensitive or have loose tolerances. For example, the ε_r (dielectric constant) of a pc board's solder mask has a wide range of values, and its physical thickness varies widely, when hand silk screening is used to apply it. Therefore, the best practice is to bury clock traces between a pair of internal layers of known ε_r to keep transmission-line impedance well controlled.

Physical positioning of sensitive clock traces and components in the vicinity of noise will also influence clock-signal integrity. However, routing clock traces through noisy areas is sometimes unavoidable. In such cases, it's best to utilize differential traces for their superior noise-canceling characteristics. Using straight and uninterrupted traces will not give noise the opportunity to rob you of your timing margins. Traces that meander from layer to layer and around sharp corners only increase noise sensitivity.

Cross coupling from parallel signals on adjacent layers is the major cause of signal crosstalk. Remember to keep a clock trace referenced to the same plane wherever possible. If you don't, the change in impedance when transitioning across plane boundaries will cause reflections.

UNDERSTAND THE EFFECT TO FIND THE CAUSE Measuring and eliminating suspects requires a logical and systematic data-collection approach, careful analysis of that data, and often several iterations of this procedure.

For example, impedance-mismatch issues don't just involve the circuitry that lies above ground. The ground-return path and the components' ground connections both are a critical part of the investigation. Elements such as ground bounce momentarily change the overall transmission line's impedance and cause impedance mismatches between the driver and the line it's driving. Decoupling capacitors act as coulomb buckets of electrons. Inadequate decoupling lets these buckets run empty just when the parts need the charge, causing power-supply dips.

The decoupling circuitry performs another important function. It's a low-pass filter that passes dc current and blocks high-frequency system noise from affecting the component. If it's designed incorrectly, noise may leak in and possibly degrade the device's performance. If the noise is synchronous with the clock's input signal, it can easily hide in the rise or fall time of the clock signal and may then be harder to detect. It usually appears as harmless synchronous crosstalk noise. But if the coupled noise occurs coincidentally with the signal traveling through a triggering threshold of the load component, its effect will be catastrophic.

To connect the cause with the effect, you need to make the cause change and then see an appropriate change in the effect. Changing series termination resistors, changing parallel loading resistors, and adding small loading capacitors can disrupt a transmission line, and the load that a clock-driving buffer sees, enough to verify it's the cause of the problem. You can also lower a trace's impedance by adding a thick coating of solder, or ground a point with the touch of a grounded body. Or you can cause an aberration in a transmission line with metal dental probe tips and locate a specific physical point of interest. Although these procedures are far from precise, they lead us to the problem and show which direction an impedance, resistance, inductance, or capacitance value needs to move to correct a less than optimal clock-signal environment.

Being creative in your investigation will be rewarding. But remember that to really solve a problem and not merely address its effect, you must understand what is happening and why. Only then can you base your investigation on knowledge and not be misled by deceptive clues.