Calculating and minimizing jitter is a multifaceted problem. It involves understanding the jitter dependencies and specifications, as well as identifying the proper noise profile. Specifically, jitter is the time variation of a periodic signal from its ideal location in time.
In a simpler context, it’s how early or late a signal transitions with reference to when it should transition, often used in relation to a reference clock source. Jitter emerges from noise introduced by the power supply, its distribution network, devices on the network, routing coupling, neighboring devices in the silicon substrate, core switching, and other factors.
Jitter comes in two forms: random jitter (RJ) and deterministic jitter (DJ). RJ is primarily caused by localized thermal variations and by microscopic variations in the resistance and impedance of circuit traces. The latter arise from inevitable small variations in trace width, dielectric properties, and many other microscopic effects that are statistically impossible to isolate.
DJ is timing jitter that’s repeatable and predictable. Thus, its peak-to-peak value is bounded, and the bounds can usually be observed or predicted with high confidence based on a reasonably low number of observations.
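The practical difference between the two forms can be illustrated with a short simulation. The sketch below is not from the case study: it assumes RJ is Gaussian and DJ is a single sinusoidal tone, and the function names and numeric values (`rj_rms_ps`, `dj_amp_ps`, `dj_period_edges`) are hypothetical. It prints the peak-to-peak value of each component for a growing number of observed edges, so the bounded DJ can be compared with the RJ, whose peak-to-peak value can only grow as more edges are observed.

```python
import math
import random


def peak_to_peak(samples):
    """Peak-to-peak spread of a list of timing errors."""
    return max(samples) - min(samples)


def simulate_jitter(n_edges, rj_rms_ps=0.5, dj_amp_ps=1.0,
                    dj_period_edges=37, seed=1):
    """Return (rj, dj) timing errors in ps for n_edges clock edges.

    Assumptions (illustrative only): RJ is zero-mean Gaussian and
    DJ is one sinusoidal tone repeating every dj_period_edges edges.
    """
    rng = random.Random(seed)
    rj = [rng.gauss(0.0, rj_rms_ps) for _ in range(n_edges)]
    dj = [dj_amp_ps * math.sin(2.0 * math.pi * k / dj_period_edges)
          for k in range(n_edges)]
    return rj, dj


rj, dj = simulate_jitter(100_000)
for n in (100, 10_000, 100_000):
    # DJ stays within 2 * dj_amp_ps; RJ peak-to-peak never shrinks with n.
    print(n, round(peak_to_peak(rj[:n]), 3), round(peak_to_peak(dj[:n]), 3))
```

A small number of edges is already enough to see the DJ bound, which matches the observation that DJ bounds can be predicted from relatively few observations.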
In present-day systems-on-a-chip (SoCs), the higher operating frequencies of the serializer-deserializer (SERDES), phase-locked loop (PLL), and other components dictate a jitter specification required at their respective reference-clock inputs, which can be less than a picosecond. For large designs, it’s a challenging task to select components and construct reference-clock networks to achieve the target jitter.
Furthermore, lots of thought must go into choosing the correct modeling of input waveforms and noise. That’s because different types of jitter sources will be superimposed to cover deterioration over the voltage source, board, package, and die. Failing to anticipate noise sources, to plan how to mitigate them, and to implement the clock distribution network accordingly will degrade the SoC’s performance and, in extreme cases, cause functional failures.
The proper planning, optimization, and implementation of a SERDES reference-clock distribution network is essential to achieve jitter of less than three-quarters of a picosecond. These elements will be viewed through our case study (a design at 40 nm), for which several SERDES instances need a reference clock delivered through a distribution network that spans more than 20 mm and multiple gate levels, with an rms jitter specification of less than 0.75 ps.
During the planning phase, accurate noise profiling and modeling can be tricky. We performed an initial analysis using the standard practice of modeling supply noise at half the signal frequency with an amplitude of 5% of the supply voltage, applied on the power supply (Fig. 1a).
The noise was added on the power-supply network at the C4 bumps. The layout parasitics of the power-supply network were extracted, and the noisy voltage supply could reach the power supply and n-well biasing pin of the repeaters and I/O pads.
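As a rough illustration of this noise profile, the sketch below generates a supply waveform with noise at half the clock frequency and a peak amplitude of 5% of the supply. The sinusoidal shape, the reading of "5%" as a peak (not peak-to-peak) value, and the values of `VDD_V`, `F_CLK_HZ`, and `STEP_S` are assumptions made for the example; they are not taken from the case study.

```python
import math


def noisy_supply(t_s, vdd_v, f_clk_hz, noise_fraction=0.05):
    """Supply voltage at time t_s with sinusoidal noise at f_clk / 2.

    noise_fraction is the peak noise amplitude relative to vdd_v
    (assumed here to be a peak value, not peak-to-peak).
    """
    f_noise_hz = f_clk_hz / 2.0
    ripple = noise_fraction * math.sin(2.0 * math.pi * f_noise_hz * t_s)
    return vdd_v * (1.0 + ripple)


VDD_V = 1.1        # hypothetical nominal supply voltage
F_CLK_HZ = 100e6   # hypothetical reference-clock frequency
STEP_S = 1e-9      # sample spacing for the generated waveform

# Two full noise periods (20 ns each at 50 MHz), sampled every 1 ns.
samples = [noisy_supply(k * STEP_S, VDD_V, F_CLK_HZ) for k in range(40)]
print(min(samples), max(samples))
```

A list of time/voltage pairs like this could then be exported as a piecewise-linear source for the simulator, although the exact netlist syntax depends on the tool in use.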
Pre-Layout Modeled Spice Simulations
Before putting extensive effort into the layout of the clock-distribution network, it’s crucial to check the feasibility of achieving the target jitter. Here, we model a virtual power-mesh network that closely matches the actual power network once constructed, along with the optimum repeater selection, drive strength, spatial separation, VT class, substrate isolation, regulated power supply, careful layout placement and routing, and so on. (Later, we’ll discuss how to arrive at optimum values for different parameters in layout implementation to ensure minimum jitter is introduced in the distribution network.)
Once the modeled Spice file is completed, we check the feasibility of meeting the jitter specification (Fig. 2). Note that how well this pre-layout result correlates with the final Spice run depends on the choices made for these factors.
Different factors must be kept in mind to minimize jitter. Preliminary implementation was done in a placement and routing (PnR) tool using simple buffer tree synthesis. (Good distribution was observed.) We specifically considered the following points while implementing the clock network:
- Repeater selection: We had three options: single-ended CMOS implementation, differential implementation using differential-to-differential (d2d) and differential-to-single-ended (d2s) buffers, and enforcer-kicker implementation having feedback. After some experiments comparing the overall jitter improvement, we settled on single-ended CMOS for the buffer, which is readily available as part of the standard cell library.
- Repeater’s drive strength, VT class, spatial separation: We performed experiments (parasitic modeling, simulations) to arrive at optimum parameters.
- Substrate isolation using guard rings: This was done to keep substrate and ground quiet, which could otherwise cause DJ.
- Isolated and regulated power supply: We planned to use separate VDD/VSS power supplies to avoid noise from the core digital circuitry. Also, we used decoupling capacitor cells to reduce the dynamic IR on the power-supply network.
- Careful routing: We studied different patterns of routing and came up with an optimum non-default routing rule to have minimum parasitic and noise coupling. During layout design, we also avoided crossing any digital tracks in subsequent layers by isolation and shielding.
- Careful layout: We identified the cells that could contribute to phase noise, and selected the right drive strength and load to minimize variations in the output from these cells. We placed all of the differential pairs so as to minimize layout-dependent variations, for example by keeping them in the same orientation and perfectly matching their routing.
Several of these parameters were optimized experimentally. A separate analysis and optimization exercise was carried out to find the optimum repeater selection, spatial separation, non-default routing (NDR) rule, drive strength, and VT-class selection. The resulting values are used in the pre-layout Spice simulations as well as in the final layout implementation.
In the final layout implementation, the clock network was executed with the aforementioned optimum parameter selections in addition to manual implementation techniques such as substrate isolation and decoupling capacitor-cell placements for the power supply.
For example, to find the optimum NDR rule for routing, we measured length versus delay using different net widths. We also observed that delay and jitter were proportional. Since the delay experiments were quicker, we used delay as a proxy during the sweeps and confirmed the result in the final jitter runs.
In Figures 3a and 3b, the x axis represents the length, and the y axis is the delay needed to traverse that length with the number of repeaters denoted by “N.” For example, N = 2 (the green line in Figure 3a) means that 20 mm is traversed using two repeaters, which takes 2 ns with the 0.63-µm NDR. Similar plots were obtained with other NDR rules (e.g., 1 µm in Figure 3b). We swept 10 different NDR rules, from minimum width to 1 µm. The minimum delay/jitter was found at a width of 0.63 µm.
We then superimposed and curve-fitted the 2D data into 3D with the added dimension of NDR rule (Fig. 4). Hence, we could arrive at the optimum NDR rule, which turned out to be 0.63-µm metal width for 1.5-mm repeater separation at 40-nm technology. Similarly, we found the optimum repeaters (inverters), spatial separation between repeaters (N = 15 for a 20-mm span of clock distribution network or nearly 1.5-mm distance between repeaters), VT selection, and drive strengths.
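The idea of locating an optimum from a swept parameter can be sketched in a few lines. The example below is not the curve fit used in the case study: it swaps in a simple three-point parabolic interpolation around the best sample of a one-dimensional sweep. The delay values come from `synthetic_delay_ns`, a made-up convex function chosen only so that the example has a minimum near 0.63 µm; they are placeholders, not measured data.

```python
def parabola_vertex(p0, p1, p2):
    """x position of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0.0:          # collinear points: no vertex to refine
        return x1
    return x1 - 0.5 * num / den


def optimum_width(sweep):
    """Refine the minimum-delay width from a list of (width_um, delay_ns)."""
    sweep = sorted(sweep)
    i = min(range(len(sweep)), key=lambda k: sweep[k][1])
    if i == 0 or i == len(sweep) - 1:
        return sweep[i][0]  # minimum at the edge of the sweep: no refinement
    return parabola_vertex(sweep[i - 1], sweep[i], sweep[i + 1])


def synthetic_delay_ns(width_um):
    """Placeholder delay curve (hypothetical, not simulation output)."""
    return 2.0 + 3.0 * (width_um - 0.63) ** 2


# Ten widths from a hypothetical 0.1-um minimum up to 1.0 um.
widths_um = [0.1 * k for k in range(1, 11)]
sweep = [(w, synthetic_delay_ns(w)) for w in widths_um]
print(round(optimum_width(sweep), 3))
```

With real sweep data, the same selection could be repeated per repeater count N to build up the second dimension of the fit.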
Post-Layout Extracted Spice Simulations
The pre-layout modeling and component selection in the optimization were key measures for robust clock network implementation, as well as for quick layout of the reference-clock network distribution and quick closure without iterations.
For very long networks, we could determine early which approaches cannot meet the jitter specification and then switch to alternative methods to get the required result. For confidentiality, we cannot share the actual organization and layout details, but we offer an initial clock distribution network and its zoomed version (Fig. 5).
Once the layout with the optimum component values was created, we performed Spice simulations of the final extracted netlist. We used industry-standard tools to implement, simulate, and analyze results. Some scripting was required for implementation and analysis of the results.
The eye diagrams, time-domain jitter, and period superimpositions showed the jitter at the input of the SERDES (Fig. 6). The results show that we could meet the requirement of less than 1 ps of rms jitter for a clock network fanned out over 20 mm on the die.
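For readers who want to post-process simulation output themselves, the sketch below shows one common way to turn a list of edge timestamps into an rms jitter number and compare it with the 0.75-ps specification. It is a minimal example under stated assumptions, not the scripts used in the case study: the ideal period is assumed to be known, the edge times are synthetic (Gaussian, generated with a fixed seed), and the names `rms_tie_ps`, `PERIOD_PS`, and `SPEC_PS` are hypothetical.

```python
import random
import statistics


def rms_tie_ps(edge_times_ps, period_ps):
    """RMS time-interval error: deviation of each edge from its ideal slot.

    The mean offset is removed (population standard deviation), so a
    constant delay through the network does not count as jitter.
    """
    tie = [t - k * period_ps for k, t in enumerate(edge_times_ps)]
    return statistics.pstdev(tie)


PERIOD_PS = 10_000.0   # hypothetical 100-MHz reference clock
SPEC_PS = 0.75         # rms jitter specification from the case study

# Synthetic edges: ideal position plus 0.5-ps Gaussian error (placeholder).
rng = random.Random(7)
edges_ps = [k * PERIOD_PS + rng.gauss(0.0, 0.5) for k in range(10_000)]

rms = rms_tie_ps(edges_ps, PERIOD_PS)
print(f"rms jitter = {rms:.3f} ps, meets spec: {rms < SPEC_PS}")
```

In practice the edge times would come from threshold crossings in the extracted-netlist waveform, and the period may need to be estimated from the data instead of assumed.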
We would like to thank the following colleagues at Open-Silicon for their valuable input and contribution: Shrikrishna Mehetre, engineering manager; Rahul Deshmukh, engineering manager; Robert Fulton, IP manager; and Seshagiri Yalavarthy, senior ASIC design engineer.