Chip Set Creates High-Speed, Fail-Safe Switch Fabrics

Distributed queuing and a highly integrated control architecture let this 160-Gbit/s single-stage switch-fabric chip set remove data bottlenecks.

Dave Bursky

Sept. 3, 2001

As more data moves around and between various networks, data-switching speeds need to increase. But higher speed isn't the only requirement because networks now carry audio, video, and other time-dependent information. So, quality of service (QoS) has become a key issue in current and future systems to ensure the delivery of time-dependent packets without delays. Additionally, networks have permeated all levels of business and industry, and disruptions in service can cause both economic and professional damage. Therefore, future switching systems must also incorporate fail-safe redundancy capabilities.

High data bandwidth, QoS, and redundancy can all be addressed with a plethora of components, but the result is an expensive, power-hungry, rack-filling system. To tackle all of these issues at once, engineers at Vitesse Semiconductor have developed TeraStream, a synchronous switch-fabric chip set that lets designers craft network-protocol-independent Layer-1 switch fabrics with the lowest cost per 10-Gbit port commercially available: less than $1270 per port for a fully redundant system.

The chips allow the creation of fail-safe switch fabrics with a scalable user bandwidth of up to 160 Gbits/s. Moreover, the company has defined a future roadmap that will lead to switch bandwidths of 320 to 640 Gbits/s.

Before releasing the TeraStream chip set, Vitesse had released the less feature-rich GigaStream and CrossStream switch fabrics. TeraStream promises to greatly reduce system power, size, complexity, and cost, while adding features such as queuing, QoS, and redundancy to satisfy future system demands. To accomplish that goal, the chip set consists of two key components: the VSC871 queuing engine and the VSC881 packet exchange matrix (PEM). Both chips are fabricated in CMOS using 0.18-µm design rules and operate from 2.5- and 1.8-V power supplies.

The high operating speeds of the chips will require some careful cooling considerations, however. On average, the switch fabric will consume about 1.4 W per gigabit/s of bandwidth. For a 160-Gbit/s system, that translates into about 224 W, not counting the network interface circuits and the network processor or other control subsystem. This power level is still well below the power that alternative system solutions would consume.

In a system implementation, the queuing engine chip resides on each line card in a switching system, while multiple instances of the PEM chip form the actual switch-matrix fabric (Fig. 1). A 16-port OC-192 switch fabric might include 16 line cards, each with a queuing engine, and a card or two containing the switch matrix, which is composed of multiple PEM chips.

On one side, the queuing engine communicates through an industry-standard CSIX-compatible interface with the network processor or other traffic manager that also resides on the line card. The CSIX interface on each VSC871 can be organized as four OC-48-capable 32-bit ports, or as one OC-192-capable 128-bit port. The VSC871 contains data flow paths for both unicast and multicast data streams, enabling the chip to perform one-to-one or one-to-many data transfers.

Unlike previous multicast approaches, which required the line card to repeatedly replicate the data packets for transfer to the switch matrix, the new chip set only needs to send one set of data to the switch matrix. Circuitry on the PEM chip will then replicate the data for multicast operations. That greatly reduces data traffic on the serial backplane and decouples ingress traffic from egress congestion.

For unicast traffic, the chip creates a set of virtual-output queues that minimize head-of-line blocking at the switch-fabric input ports. A total of 64 virtual-output queuing planes are implemented. The PEM can address 32 of them. Plus, a total of 16 per-class queues are implemented for both unicast and multicast traffic. These queues can be arbitrarily divided into a mix of strict priority queues and weighted round robin queues, allowing the designer to set the desired QoS level for each stream.

A key characteristic of the VSC881 matrix is that it implements crosspoint queues for unicast traffic and input queues for multicast traffic. With 32 inputs and 32 outputs, the total number of unicast crosspoints is 1024. But because every crosspoint implements two separate queue priorities, the grand total of unicast queues is 2048. For the multicast queues, two separate queue priorities are implemented per input. Across the 32 inputs, that translates into 64 available multicast queues.
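The queue totals quoted for the PEM follow from simple counting, which the short Python sketch below reproduces. The port count of 32 is an assumption inferred from the article's own figures (1024 crosspoints, and 64 multicast queues at two priorities per input); the function and variable names are purely illustrative.

```python
# Assumption: 32 fabric ports, inferred from the article's totals
# (1024 crosspoints = 32 x 32; 64 multicast queues / 2 priorities = 32 inputs).
PORTS = 32
PRIORITIES = 2  # two queue priorities per crosspoint and per multicast input


def pem_queue_counts(ports: int = PORTS, priorities: int = PRIORITIES) -> dict:
    """Count the PEM queues implied by crosspoint (unicast) and input (multicast) queuing."""
    crosspoints = ports * ports  # one crosspoint per input/output pair
    return {
        "unicast_crosspoints": crosspoints,
        "unicast_queues": crosspoints * priorities,
        "multicast_queues": ports * priorities,  # queued per input, not per crosspoint
    }


counts = pem_queue_counts()
print(counts)
```

The difference between the two totals comes from where the queues sit: unicast queues scale with the square of the port count, while multicast queues scale linearly with the number of inputs.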

The on-card network processor or traffic manager, in turn, communicates with the network through Ethernet or other protocol interfaces. The other end of the queuing engine contains 16 serializer-deserializer (SERDES) ports, each capable of transferring data at 2.5 Gbits/s using low-voltage differential signaling (Fig. 2, left). Eight of the ports are used to connect the queuing engine to the primary PEMs over a high-speed serial backplane. The remaining eight SERDES ports connect to the redundant switch fabric and can be automatically brought online if any primary ports fail.

The fabric uses distributed scheduling: each fabric chip makes its own scheduling decisions rather than relying on a "master" scheduler for the entire fabric. The queuing engine has separate RAMs for ingress and egress queuing, and the memory available to an active CSIX interface acts as a shared memory pool. The virtual output queues draw from this pool as required.

The PEM chip has separate RAMs available for unicast and multicast queuing. The unicast queuing memory has fixed distribution across all the crosspoints, and the multicast queuing memory also is distributed evenly across all inputs.

The queuing engine has 16 per-class queues. A programmable number (Ns,u) are considered high priority (HP) and are serviced in strict priority. The remaining (16-Ns,u) are considered low priority (LP) and are serviced either in strict priority or by using a weighted round robin (WRR) algorithm.

For multicast traffic, an identical queuing structure with Ns,m HP classes (strict priority) and 16-Ns,m LP classes (strict priority or WRR) is used. For each working CSIX interface, an independent scheduling process is implemented. When different unicast planes have the same highest priority, or when only WRR classes are non-empty, the planes are serviced round robin. For each WRR class, an 8-bit programmable weight is used. All unicast planes share a common set of WRR weights, while the weights for the multicast plane are independent from the unicast plane.
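The per-class service order described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Vitesse's implementation: the article does not say how classes are ordered or how the WRR rounds are run, so the sketch assumes that class 0 is the highest priority, that weights are nonzero 8-bit values, and that each low-priority class receives up to its weight in service opportunities per round. It models only the WRR option for the low-priority classes, and the class name ClassScheduler is hypothetical.

```python
from collections import deque


class ClassScheduler:
    """Toy model of one scheduling process: strict-priority HP classes, WRR LP classes."""

    def __init__(self, n_strict, lp_weights, n_classes=16):
        if len(lp_weights) != n_classes - n_strict:
            raise ValueError("need exactly one weight per low-priority class")
        if any(not 1 <= w <= 255 for w in lp_weights):
            raise ValueError("weights are modeled as nonzero 8-bit values (assumption)")
        self.n_strict = n_strict
        self.queues = [deque() for _ in range(n_classes)]
        self.weights = list(lp_weights)
        self.credits = list(lp_weights)  # service opportunities left in this WRR round

    def enqueue(self, cls, frame):
        self.queues[cls].append(frame)

    def dequeue(self):
        # High-priority classes: strict priority, class 0 first (assumed ordering).
        for cls in range(self.n_strict):
            if self.queues[cls]:
                return self.queues[cls].popleft()
        # Low-priority classes: weighted round robin over the backlogged classes.
        backlogged = [c for c in range(self.n_strict, len(self.queues)) if self.queues[c]]
        if not backlogged:
            return None
        if all(self.credits[c - self.n_strict] == 0 for c in backlogged):
            self.credits = list(self.weights)  # all backlogged classes are out of credit: new round
        for c in backlogged:
            if self.credits[c - self.n_strict] > 0:
                self.credits[c - self.n_strict] -= 1
                return self.queues[c].popleft()


# Two HP classes (0 and 1); LP class 2 has weight 2, LP class 3 has weight 1.
sched = ClassScheduler(n_strict=2, lp_weights=[2, 1] + [1] * 12)
for cls, frame in [(2, "a1"), (2, "a2"), (2, "a3"), (3, "b1"), (3, "b2"), (1, "h1"), (0, "h0")]:
    sched.enqueue(cls, frame)

order = []
while (frame := sched.dequeue()) is not None:
    order.append(frame)
print(order)
```

In this model the high-priority frames always drain first, and class 2 then receives two service opportunities for every one given to class 3, which is the bandwidth split the weights are meant to express.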

The PEM has four possible combinations of traffic classes: high priority/low priority and unicast/multicast. Within each traffic class, the scheduling decision is determined by round robin scheduling.

The PEM chips have matching SERDES ports with which they tie into the high-speed serial backplane (Fig. 2, right). All communications are in-band and use a Vitesse-proprietary 16/17 framing scheme. The PEM chips are more than just crossbars that route inputs to outputs. They include such functions as scheduling, advanced multicast replication, and cross-connect queues that enhance overall scheduling efficiency.

In the TeraStream architecture, designers can use two approaches to add incremental bandwidth. The first approach, called switch slicing, employs two or four PEM chips connected in a master/slave configuration. The frame is sliced into two or four sections and carried in parallel across the individual switch slices. The second approach is known as link bundling. Here, two or four serial links are "bundled" together to double or quadruple the backplane bandwidth. A wide range of system requirements also can be met with combinations of these two approaches.
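The two scaling approaches can be sketched in a few lines of Python. The article does not describe how a frame is divided among the switch slices, so the sketch assumes equal, contiguous sections; the real hardware may well interleave at a finer granularity. All function names are illustrative.

```python
def slice_frame(frame: bytes, n_slices: int) -> list:
    """Switch slicing: split one frame into sections carried in parallel by separate PEMs."""
    if n_slices not in (1, 2, 4):
        raise ValueError("the article describes slicing across two or four PEMs")
    if len(frame) % n_slices:
        raise ValueError("sketch assumes the frame length divides evenly")
    size = len(frame) // n_slices
    # Assumption: equal contiguous sections, one per switch slice.
    return [frame[i * size:(i + 1) * size] for i in range(n_slices)]


def reassemble(slices: list) -> bytes:
    """Egress side: rejoin the sections in slice order."""
    return b"".join(slices)


def bundled_bandwidth(link_rate_gbps: float, links: int) -> float:
    """Link bundling: two or four serial links act as one wider backplane connection."""
    return link_rate_gbps * links


frame = bytes(range(16))
slices = slice_frame(frame, 4)
print([s.hex() for s in slices])
print(reassemble(slices) == frame)
print(bundled_bandwidth(2.5, 4))
```

Slicing spreads one frame across several PEM chips, while bundling widens the path between a queuing engine and the fabric; combining them multiplies both effects.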

Implementing redundant, fail-safe switch fabrics is easy with the chip set. Because the queuing engine chips have eight integrated, redundant, SERDES ports, designers can craft a system in which those eight spare ports connect to a second PEM array (Fig. 3). For a nonredundant system, a maximum of four PEMs can be used to form the switching matrix. In the redundant design, only four additional PEMs are needed to implement the backup switch fabric (the second PEM array).

The integrated SERDES ports can actually handle data transfer speeds ranging from 2.125 to 2.64 Gbits/s and deliver data at 2 or 2.5 Gbits/s. The ability to deliver data at those two rates is due to the Vitesse proprietary 16/17 framing scheme that optimizes backplane bandwidth utilization. Integrating the SERDES ports into the TeraStream chips also brings system-level benefits: lower component count, reduced overall power, and simplified board layout. Furthermore, the electrical characteristics of the SERDES transceivers allow designers to use standard FR4 pc-board material and readily available connectors.
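The relationship between line rate and delivered data rate can be checked with a quick calculation. The framing scheme is proprietary, so the sketch below simply assumes that "16/17" means 16 payload bits are carried in every 17 line bits; the helper name payload_rate is hypothetical.

```python
from fractions import Fraction

# Assumption: 16 payload bits per 17 bits on the serial link.
FRAMING_EFFICIENCY = Fraction(16, 17)


def payload_rate(line_rate_gbps: str) -> Fraction:
    """Data rate delivered after framing overhead, for a line rate given in Gbits/s."""
    return Fraction(line_rate_gbps) * FRAMING_EFFICIENCY


low = payload_rate("2.125")
high = payload_rate("2.64")
print(float(low))             # delivered rate at the low end of the line-rate range
print(round(float(high), 3))  # delivered rate at the high end of the line-rate range
```

Under this assumption, the 2.125-Gbit/s line rate yields exactly 2 Gbits/s of data, while 2.64 Gbits/s yields slightly under 2.5 Gbits/s. An exact 2.5 Gbits/s would need a line rate nearer 2.66 Gbits/s, so one of the quoted figures is presumably rounded.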

In addition to performing the serializing/deserializing function, the integrated SERDES transceivers also carry the clocking information for the fabric. Each PEM and queuing engine has its own clock management unit, and each individual SERDES transceiver has an integrated clock recovery unit. TeraStream uses local clocking on each line and fabric card to avoid the signal integrity and jitter problems that occur when using distributed clocking across the backplane. And to help reduce power consumption, SERDES interfaces are powered down when not in use.

Price & Availability

The VSC871 queuing engine comes in a 784-contact ball-grid array (BGA) package that measures 45 mm on a side and consumes about 12 W. The VSC881 comes in a 520-contact BGA package and consumes about 15 W. In units of 1000, the VSC871 and VSC881 sell for $690 and $1154 apiece, respectively. Samples will be available in the late fourth quarter, with production slated for the second quarter of 2002.
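These unit prices appear consistent with the per-port cost quoted earlier, as the rough calculation below suggests. The article does not say how that figure was derived, so the configuration is an assumption: a 16-port OC-192 fabric with one VSC871 per port, four VSC881 PEMs in the primary fabric, and four more in the redundant fabric. Only silicon cost is counted.

```python
QE_PRICE = 690    # VSC871 queuing engine, USD each in 1000-unit quantities
PEM_PRICE = 1154  # VSC881 packet exchange matrix, USD each in 1000-unit quantities


def cost_per_port(ports: int = 16, pems: int = 8) -> float:
    """Chip-set cost per 10-Gbit port: one queuing engine per port plus a shared PEM array."""
    return (ports * QE_PRICE + pems * PEM_PRICE) / ports


redundant = cost_per_port(pems=8)     # assumed: 4 primary + 4 backup PEMs
nonredundant = cost_per_port(pems=4)  # assumed: primary fabric only
print(redundant, nonredundant)
```

With those assumptions, the fully redundant configuration works out to $1267 per port, just under the $1270 figure, and the nonredundant version to roughly $980 per port.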

Vitesse Semiconductor Corp., 741 Calle Plano, Camarillo, CA 93012; (805) 388-3700, (408) 986-4388, contact Anita Weemaes; www.vitesse.com.

About the Author

Dave Bursky

Technologist

Dave Bursky is the founder of New Ideas in Communications, a publication website featuring the blog column Chipnastics – the Art and Science of Chip Design. He is also president of PRN Engineering, a technical writing and market consulting company. Prior to these organizations, he spent about a dozen years as a contributing editor to Chip Design magazine. Concurrent with Chip Design, he was also the technical editorial manager at Maxim Integrated Products, and prior to Maxim, Dave spent over 35 years working as an engineer for the U.S. Army Electronics Command and as an editor with Electronic Design Magazine.