High-Density Switching Systems Pose Interface Timing Challenges

Explore the issues involved in achieving consistent timing integrity for Serial Media Independent Interface designs.

Oct. 2, 2000

Over the past decade, physical layer (PHY) interface technology has kept pace with the proliferation of high-performance networks by providing more flexibility, faster performance, higher densities, and lower cost. This is especially true regarding the media-independent interfaces required in the multiport switching systems that form the core of modern internetworking infrastructures. Although the pin counts for media-independent interface components have dropped dramatically, the higher operating frequencies impose tighter timing constraints, such as the setup-and-hold margins necessary for robust system signal integrity.

The interface between the media access controller (MAC) and the PHY components has undergone significant refinement and transformation over the past few years in response to evolving system-level requirements. As port densities have become more important, one critical factor in system design has been the drive to reduce the number of individual pins required for each MAC interface in a high-density switching ASIC.

For example, the IEEE 802.3 Ethernet standard originally defined the Media Independent Interface (MII) with 16 pins per port for handling data and control functions. The number of pins required by MII wasn't a major problem for early, low-density switching systems. In addition, the buffering circuitry in MIIs allowed typical 10/100 MII implementations to use relatively manageable clock speeds in the 25-MHz range.

In early 1998, a group of industry-leading companies developed the Reduced Media Independent Interface (RMII). The RMII specification was an effort to reduce circuit complexity in MAC-to-PHY designs and improve port densities while reducing costs. Essentially, the RMII cut the required number of pins per port by more than half. This reduction streamlined the design and improved the economic viability of devices with high port densities, such as network switches and multiport switched repeaters, that use multiple independent data paths between MAC and PHY functions.

For instance, employing RMII in a typical 24-port switch system can reduce the number of pins per MAC from 16 down to six. This results in an overall system-level savings of approximately 240 pins. The operating speeds for the RMII, however, shot into the 50-MHz range.

The next step to improving media interface efficiency is increased industry adoption of the Serial Media Independent Interface (SMII) standard developed by Cisco Systems Inc. While the use of SMII can further reduce the pin-count complexity down to only three or four pins per MAC, the interface essentially runs at full line-rate speeds of 125 MHz, with 8-ns cycle times. In combination with RMII, SMII provides system-level designers with another important option for minimizing PHY-to-MAC circuit complexity, reducing pin counts, and improving port densities.

SMII is aimed at increasing the switch system port count while maintaining the switch ASIC pin count. Most system companies today are targeting systems with 24 ports or more. Moving from RMII to SMII saves three pins per port, so a 48-port system could save approximately 144 pins. The effective use of SMII, however, requires careful attention to critical timing issues.
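The pin-count arithmetic quoted above can be checked with a few lines of Python. This is only a sketch: the per-port figures are the ones given in the text, and the SMII value of three pins is an assumption taken from the "saves three pins per port" statement (the text allows three or four).

```python
# Pins per port for each interface, using the figures quoted in the article.
# SMII is assumed to use three pins here; the article allows three or four.
PINS_PER_PORT = {"MII": 16, "RMII": 6, "SMII": 3}


def pins_saved(old: str, new: str, ports: int) -> int:
    """Total switch-ASIC pins saved by moving `ports` ports from `old` to `new`."""
    return (PINS_PER_PORT[old] - PINS_PER_PORT[new]) * ports


print(pins_saved("MII", "RMII", 24))   # 240
print(pins_saved("RMII", "SMII", 48))  # 144
```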

The higher speeds of SMII significantly increase the need for designers to control signal integrity with regard to I/O setup-and-hold timing. The setup-and-hold calculations are fairly straightforward in concept. But when the overall cycle time is compressed to only 8 ns and the setup time consumes 1.5 ns (18.75% of that total cycle), the management of these critical timing parameters becomes paramount to creating a robust interface. According to Cisco's SMII specifications, the critical timing parameters include:

Input setup: minimum of 1.5 ns
Input hold: minimum of 1.0 ns
Output delay: minimum of 1.5 ns and maximum of 5.0 ns

All parameters are to be measured at the timing reference points between the PHY and MAC (Fig. 1).
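For quick checks during design, the limits above can be collected in one place. The following Python sketch is illustrative only: the class and function names are hypothetical, and the values are simply the SMII numbers quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmiiTiming:
    """SMII timing limits in nanoseconds, as quoted in the article."""
    cycle_ns: float = 8.0            # 125-MHz clock period
    input_setup_min_ns: float = 1.5
    input_hold_min_ns: float = 1.0
    output_delay_min_ns: float = 1.5
    output_delay_max_ns: float = 5.0


SPEC = SmiiTiming()


def setup_share_of_cycle(spec: SmiiTiming = SPEC) -> float:
    """Fraction of the clock period consumed by the input setup requirement."""
    return spec.input_setup_min_ns / spec.cycle_ns


def output_delay_in_spec(t_min_ns: float, t_max_ns: float,
                         spec: SmiiTiming = SPEC) -> bool:
    """True if a device's output-delay range lies inside the SMII window."""
    return spec.output_delay_min_ns <= t_min_ns <= t_max_ns <= spec.output_delay_max_ns


print(f"{setup_share_of_cycle():.2%}")   # 18.75%
print(output_delay_in_spec(3.0, 4.2))    # True
print(output_delay_in_spec(1.0, 5.5))    # False
```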

In practice, SMII setup and hold calculations must take into account all delay and skew factors that can occur in real-world circuitry. This can become especially difficult for high-speed designs in which even relatively short traces can exhibit many of the delay and slew characteristics of analog transmission lines, rather than crisp digital waveforms. Successfully overcoming these timing and signal integrity obstacles requires very astute design decisions at both the system level and the silicon level. Not only do system engineers need to implement optimal board design and layout rules to minimize noise, jitter, and interference, they must leverage new PHY-level semiconductor capabilities in order to maximize available margins and headroom.

There are a number of key components in setup-and-hold calculations (Fig. 2). These include:

T_{CLK SKEW BUF}
T_{CLK SKEW TRACE}
T_{CLK DLY ASIC(SKEW)}
T_{BUFFER DLY}, T_{TRACE DELAY}, and T_{INPUT BUFFER}
T_{SETUP} and T_{HOLD}
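One common way to combine components like these is sketched here in Python. It is not the calculation from Fig. 2, which isn't reproduced in the text. The sketch assumes a common-clock interface in which the three clock-skew terms add in the worst case and the data-path terms are lumped into a single minimum and maximum delay. All times are integers in picoseconds, and the sample numbers are placeholders.

```python
def total_clock_skew_ps(clk_skew_buf: int, clk_skew_trace: int,
                        clk_dly_asic_skew: int) -> int:
    """Worst-case clock skew, assuming the three skew terms simply add."""
    return clk_skew_buf + clk_skew_trace + clk_dly_asic_skew


def setup_margin_ps(cycle: int, data_delay_max: int, setup: int,
                    clock_skew: int) -> int:
    """Slack left after the slowest data arrives and setup and skew are paid."""
    return cycle - data_delay_max - setup - clock_skew


def hold_margin_ps(data_delay_min: int, hold: int, clock_skew: int) -> int:
    """Slack left when the fastest data must still outlast hold plus skew."""
    return data_delay_min - hold - clock_skew


# Placeholder numbers: 300/100/200 ps of skew, 700 ps (max) and 500 ps (min)
# of trace delay added to the 5.0-ns and 1.5-ns output-delay limits.
skew = total_clock_skew_ps(300, 100, 200)
print(setup_margin_ps(8000, 5000 + 700, 1500, skew))  # 200
print(hold_margin_ps(1500 + 500, 1000, skew))         # 400
```

A negative result from either function means the corresponding requirement is violated under the assumed worst case.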

Although modifications of the clock-trace length can be used to adjust the setup-and-hold timing margins, the ability to adjust delay timing from the PHY end can provide a much more straightforward and easily implemented solution. In either case, the pin-to-pin skew of the clock buffer itself must normally be held within a tight 100- to 300-ps range.

Depending upon the clock tree loading, the delay attributed to the clock input buffer and clock distribution tree can become a significant factor. For example, two adjacent MAC ASICs communicating with each other could exhibit a significant skew factor between them due to one operating at the slow end of the process range and the other operating at the fast end. One method for reducing the clock input buffer delay and clock distribution delay is to use an on-chip PLL/DLL (Fig. 3).

At operating speeds of 100 MHz and beyond, it isn't adequate to simply estimate the minimum and maximum output delay with a lumped load. Detailed circuit simulations, like those performed with Spice or XTK/TLC, are needed to obtain more accurate calculations. Plus, to conduct a meaningful Spice simulation, the designer has to know the worst- and best-case output driver and input receiver Spice models; package parasitic characteristics for components at both ends; line length, line impedance, and the number of vias; and the worst- and best-case operating conditions.

A Spice simulation output will show fast and slow corner characteristics (Fig. 4). Furthermore, it reveals where the minimum and maximum delay numbers can be extracted from the Spice model for use in the setup-and-hold calculations. T_LH(MAX) and T_HL(MAX) should be used in the setup calculations, while T_LH(MIN) and T_HL(MIN) should be used in the hold-time calculations.
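The selection rule can be written as a short Python sketch. It assumes the conservative reading that the slower of the two maximum edges governs setup and the faster of the two minimum edges governs hold. The function name and the sample delays are hypothetical.

```python
def delays_for_timing(t_lh_min: int, t_lh_max: int,
                      t_hl_min: int, t_hl_max: int) -> tuple[int, int]:
    """Return (delay for setup check, delay for hold check) in picoseconds.

    Setup uses the slowest edge from the slow corner; hold uses the
    fastest edge from the fast corner.
    """
    setup_delay = max(t_lh_max, t_hl_max)
    hold_delay = min(t_lh_min, t_hl_min)
    return setup_delay, hold_delay


# Hypothetical corner results extracted from a simulation, in picoseconds.
print(delays_for_timing(t_lh_min=1900, t_lh_max=4300,
                        t_hl_min=1700, t_hl_max=4100))  # (4300, 1700)
```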

Minimizing PHY Output Delay

As seen from the previous discussion, one of the key factors in successfully designing robust SMIIs lies in providing tight control over the output delay characteristics of the PHY components. Because not all system engineers have the extensive experience necessary to deal with these high-speed signal integrity issues from scratch, PHY component designers provide flexible solutions and tools for integrating SMII capabilities into system designs.

When driving the MAC ASIC from the PHY, if the spread of min-to-max delay is too wide, the interface design can run into problems at one or both ends of the range. If the maximum delay is too large, it can create setup problems. Similarly, at the other end of the range, a PHY delay that falls below the minimum specification can create hold-time problems at the ASIC end.

With a PHY delay time of 5.0 ns (at the high end of the acceptable specification), the required minimum of 1.5 ns for setup leaves only 1.5 ns of the total 8-ns budget for trace delay and skew. For instance, a nominal circuit design that pairs a typical PHY component meeting the stated specifications with approximately 5- to 6-in. TX data traces can end up with a negative setup margin. The bottom line is that under real-world conditions, simply meeting the specifications at the PHY end isn't enough to ensure a solid design.
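To see how quickly the setup budget disappears, consider the following Python sketch. The cycle time, setup time, and maximum PHY delay come from the text. The per-inch trace delay (170 ps/in.) and the lumped clock skew (600 ps) are assumptions chosen purely for illustration and will differ from board to board.

```python
CYCLE_PS = 8000          # 125-MHz SMII clock period
SETUP_MIN_PS = 1500      # SMII input setup requirement
PHY_DELAY_MAX_PS = 5000  # slowest PHY output delay allowed by the spec

# Assumptions, not from the article: trace delay per inch and total skew.
TRACE_DELAY_PS_PER_IN = 170
CLOCK_SKEW_PS = 600


def setup_margin_for_trace_ps(trace_length_in: int) -> int:
    """Setup margin left for a TX data trace of the given length in inches."""
    budget = CYCLE_PS - PHY_DELAY_MAX_PS - SETUP_MIN_PS  # 1500 ps
    return budget - CLOCK_SKEW_PS - trace_length_in * TRACE_DELAY_PS_PER_IN


for length_in in (5, 6):
    print(length_in, setup_margin_for_trace_ps(length_in))
# 5 50
# 6 -120
```

Under these assumed values, a 5-in. trace barely passes and a 6-in. trace fails, even though the PHY itself is within specification.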

Some of the newest PHY components address these issues with two features. The first feature, tight process control parameters, narrows the PHY-output range to exceed the stated SMII specifications. It thereby provides an additional headroom margin at the ASIC end.

The second emerging feature in new PHY devices is a built-in capability to adjust the center of the min-/max-delay range to meet specific design requirements. By attaching different values of capacitors to a specified pin on the PHY, an internal PLL can be adjusted precisely to shift the range either up or down.

For instance, with no capacitor attached, the default center is around 4.0 ns with a minimum of 3.0 ns and a maximum of 4.2 ns. If the system design called for the delay to optimally be centered at a lower point, however, then the designer could simply attach a capacitor with a value of 10 pF to shift to a 3.1-ns-centered range. Similarly, cap values of 30, 40, and 60 pF would shift the center to around 2.1, 1.8, and 1.0 ns, respectively.
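The capacitor choice amounts to a table lookup, as the following Python sketch shows. The table holds only the capacitor values and delay centers quoted above. The helper name is hypothetical, and a real part's data sheet would be the authority on the available settings.

```python
# Delay-range center (ns) for each capacitor value (pF) quoted in the article.
# 0 pF stands for "no capacitor attached".
CENTER_NS_BY_CAP_PF = {0: 4.0, 10: 3.1, 30: 2.1, 40: 1.8, 60: 1.0}


def pick_capacitor_pf(target_center_ns: float) -> int:
    """Return the listed capacitor whose delay center is closest to the target."""
    return min(
        CENTER_NS_BY_CAP_PF,
        key=lambda cap_pf: abs(CENTER_NS_BY_CAP_PF[cap_pf] - target_center_ns),
    )


print(pick_capacitor_pf(2.0))  # 30
print(pick_capacitor_pf(3.5))  # 10
```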

With this adjustment feature, the spread between the minimum and maximum output delay is also better controlled. For example, in a typical PHY, the output min-to-max delay can vary by around 2.0 to 2.5 ns, while this new feature can hold the variation to within about 1.5 ns. By preadjusting the PHYs' signal characteristics to precisely match the overall system requirements, designers avoid having to choose between the risk of marginal timing and the cost of adding complex clock management functions to the overall design. Plus, it reduces the need to adjust the clock traces in order to meet the setup-and-hold requirement, and it minimizes potential EMI issues.

A Straightforward Design

At the system level, the ability to precisely define and maintain robust delay and skew timing between the PHY and ASIC is a critical cornerstone in simplifying the overall design of multiport SMII switching systems. For example, creating a 32-port system can be relatively straightforward when implementing two 16-port ASIC switches and eight quad-PHY devices in a unified synchronous design (Fig. 5). The extra timing headroom provided by the PHY components enables the system to utilize a single clock buffer for driving all of the components. On the other hand, the more marginal timing associated with conventional PHY components could require significantly more complexity in the form of segregated clock buffers and/or precisely equalized clock traces.

In today's high-performance switch architectures, SMII is rapidly becoming another key enabling technology for reducing the pin count and the associated power requirements that have become so critical to achieving targeted port densities. In some instances, using simplified low-power SMII-optimized PHY components can reduce per-port power requirements by as much as 30% compared to traditional designs.

Essentially, designing for SMII at 125 MHz isn't a trivial challenge. It requires designers at both the system level and the component level to account for all critical signal-integrity issues. By leveraging the capabilities of new-generation integrated PHY components, however, system designers can shortcut many of the low-level timing issues while creating more robust multiport switching fabrics. As overall network architectures become more complex and demanding, these underlying SMII-based switching fabrics will provide a fundamental part of the continuing drive for higher port densities and lower-cost system capabilities.