Novel Interconnect Scheme Enhances Comm Design

DESIGN VIEW is the summary of the complete DESIGN SOLUTION contributed article, which begins on Page 2.

The trend toward switched interconnects started with switch backplanes, and has now worked its way down to the board level. At this level, it presents a new way to resolve several critical issues in the design of equipment supporting enhanced services at high speed.

Up until the 1-Gbit generation of networking equipment, it was common to build I/O subsystems based on bus interconnects. At higher data rates, engineers adopted high-speed point-to-point protocols. The interconnect design resembled a "daisy chain" with separate ingress and egress paths. However, because today's networking equipment is routinely expected to provide wire-rate gigabit speeds and perform layer 3 through 7 operations, bus and daisy-chain architectures fall short.

One emerging solution is to implement a switched interconnect. This trend can be seen in the evolution of several popular bus standards to support switched interconnects. Standards bodies and promoters of HyperTransport, RapidIO, and PCI Express have positioned each of these to evolve to switch-based designs.

The article delves into switched-network design, and points out potential pitfalls such as the effects of latency on transport and flow control. Also discussed are a couple of variations on switched design, specifically the role it plays in the new Advanced Telecom Computing Architecture (ATCA). An ATCA line-card design example drives the point home.

HIGHLIGHTS:
Benefits Of Switched Interconnects	To overcome resource-inefficiency and heavier-processing-load problems associated with bus and daisy-chain architectures, engineers can go with a switched interconnect. For instance, an FPGA can be configured to provide an interconnect.
Designing With A Switch	Transitioning to a switched network means learning some new terminology (switch latency, flow control, head-of-line blocking) and being alert to a few design considerations (e.g., interface ports on a switch should be channelized to the full extent of the interface protocol for maximum flexibility).
Design Possibilities From Switch Interconnects	Many new networking designs require packets to flow back to a device they have already passed through. For a switched design, adding a "mid-plane" ultimately optimizes data-plane performance, and allows the design to offer layer 4 to 7 features.
Design Example	A switched interconnect is central to some of the design-flexibility benefits built into the new Advanced Telecom Computing Architecture (ATCA) standard. The reference design for an ATCA line card illustrates the flexibility promised by such a setup.

Full article begins on Page 2.

When it comes to interconnect in gigabit-speed communications equipment, engineers are discovering that what's good for the network is also good for the networking equipment. Over the last two decades, Ethernet has evolved from bus-based to switch-based. Now the device interconnect inside the networking gear is undergoing the same transformation. The trend started with switch-backplane interconnects, and has now worked its way down to the board level, where it presents a new way to resolve several critical issues in the design of equipment supporting enhanced services at high speed.

Up until the 1-Gbit generation of networking equipment, it was common to build I/O subsystems based on bus interconnects (Fig. 1a). The data rates of these systems are limited by electrical load from bus fan-out and bus-clock distribution to multiple chips on the board. At higher data rates, engineers adopted high-speed point-to-point protocols, first based on LVDS and then on SERDES technology. The interconnect design resembled a "daisy chain" with separate ingress and egress paths (Fig. 1b). In such a system, packets must flow through every device, and may not "flow back" to a device they have already passed through.

Today's networking equipment is routinely expected to provide wire-rate multi-gigabit speeds and perform layer 3 through 7 operations. The impact on interconnect is dramatic. When services were limited to layer 3, the datapath through the network could be a linear daisy chain. This data plane could be optimized for wire-speed performance with fast ASICs providing the services. When an exception occurred, such as a routing-table update, it was taken out of the data plane and processed offline before being reinserted into the data flow.

With an increased workload of layer 4 through 7 functions, the data flow has become more complex. A larger number of packets now need additional processing at wire speed, and some must use several resources before being forwarded by the switch to their destination. For example, an encrypted packet arrives at a network processing unit (NPU), but must be sent off to a security coprocessor before coming back to the NPU, where it's processed and then forwarded to its destination. The straight, linear data plane yields to a complex traffic pattern that can be different for each packet that passes through the system.

Another issue with bus and daisy-chain architectures is resource inefficiency due to the separate ingress and egress packet pathways. Certain coprocessors may be needed on both paths. This means the board design must bear the cost and power draw of two devices when the throughput rating of one device is sufficient to meet the design's performance requirements. Changes in some layer 4 to 7 services could also mean a heavier processing load on one of these datapaths, resulting in a board re-spin to upgrade the processor because the interconnect can't be diverted.

Benefits Of Switched Interconnects

One emerging solution for this design constraint is to implement a switched interconnect (Fig. 2). This trend can be seen in the evolution of several popular bus standards, such as PCI and RapidIO, to support switched interconnects. The standards bodies and promoters of HyperTransport, Serial RapidIO, and PCI Express have positioned each of these to embrace switch-based designs.

Engineers can implement a switched interconnect in several ways. An FPGA, while not a true switch, can be configured to provide interconnection between multiple devices. An FPGA's ports are programmed to communicate with each other in datapath configurations. Due to the complexity of streaming data interfaces and the requirement for high-speed buffers and flow control, the FPGA implementation can typically support only a small number of ports and generally doesn't provide full non-blocking connectivity between ports at 10-Gbit/s data rates.

It's possible to overcome some of these issues by converting an FPGA design to an ASIC. However, this substantial effort requires the resources of high-speed interface and interconnect experts, and is more akin to developing a custom ASIC than to the push-button synthesis typically associated with converting simple FPGA designs into ASICs.

Commercial crossbar switch chips are also available from a growing number of vendors. These are full switch implementations with reprogrammability and flow-control capabilities. The first chips to emerge support the SPI-4.2 protocol, with other protocols under development.

Designing With A Switch

Transitioning to a switched network means learning some new terminology and being alert to a few design considerations. A switch's performance is determined by its capacity, which is a combination of the chip's port count, the speed of those ports, and the throughput of its switch fabric. For example, on an SPI-4.2 switch, each port operates at up to 16 Gbits/s. If the chip has two ports, the switch fabric must support at least 32 Gbits/s in order to not block data. Often switch fabrics are designed to run at a multiple of the total port speed, a margin called overspeed.
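The capacity arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and exist only for illustration. The two ports at 16 Gbits/s come from the example in the text, while the 64-Gbit/s fabric is an assumed figure chosen to show an overspeed ratio.

```python
def min_fabric_gbps(port_count: int, port_speed_gbps: float) -> float:
    """Smallest fabric throughput that can carry every port at full rate."""
    return port_count * port_speed_gbps


def overspeed_ratio(fabric_gbps: float, port_count: int, port_speed_gbps: float) -> float:
    """Fabric throughput expressed as a multiple of the aggregate port speed."""
    return fabric_gbps / min_fabric_gbps(port_count, port_speed_gbps)


# Two SPI-4.2 ports at 16 Gbits/s each, as in the example above.
print(min_fabric_gbps(2, 16.0))        # 32.0
# Assumed 64-Gbit/s fabric behind the same two ports.
print(overspeed_ratio(64.0, 2, 16.0))  # 2.0
```

A ratio below 1.0 would describe an under-speed fabric, one that cannot sustain all ports at full rate at the same time.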

Switch latency is the time it takes a data packet to traverse the switch, and is as critical as throughput in selecting the proper switch. Two kinds of switching paradigms exist, and each can have a dramatic impact on latency. In cut-through switching, a packet's destination is read and the packet is forwarded through the switch before the switch receives the entire packet. In store-and-forward switching, by contrast, the entire packet must enter the switch before being forwarded. Generally, a cut-through switch will offer the lowest latency, especially on designs that anticipate large packets.
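A first-order model makes the difference between the two paradigms concrete. The sketch below is an illustration under stated assumptions, not a model of any particular switch: it ignores contention and queuing, and the link rate, header size, and fixed forwarding delay are all assumed values.

```python
def serialization_ns(num_bytes: int, link_gbps: float) -> float:
    """Time to receive num_bytes from a link, in nanoseconds."""
    # Bits divided by Gbit/s gives nanoseconds.
    return num_bytes * 8 / link_gbps


def store_and_forward_ns(packet_bytes: int, link_gbps: float, forwarding_ns: float) -> float:
    """The whole packet must arrive before forwarding starts."""
    return serialization_ns(packet_bytes, link_gbps) + forwarding_ns


def cut_through_ns(header_bytes: int, link_gbps: float, forwarding_ns: float) -> float:
    """Only the bytes needed to read the destination must arrive."""
    return serialization_ns(header_bytes, link_gbps) + forwarding_ns


LINK_GBPS = 10.0       # assumed link rate
HEADER_BYTES = 16      # assumed bytes needed to read the destination
FORWARDING_NS = 100.0  # assumed fixed delay inside the switch

for packet_bytes in (64, 1500, 9000):
    sf = store_and_forward_ns(packet_bytes, LINK_GBPS, FORWARDING_NS)
    ct = cut_through_ns(HEADER_BYTES, LINK_GBPS, FORWARDING_NS)
    print(f"{packet_bytes:5d} bytes: store-and-forward {sf:7.1f} ns, cut-through {ct:7.1f} ns")
```

In this model the cut-through figure does not depend on packet length, while the store-and-forward figure grows linearly with it, which is why the gap widens on designs that anticipate large packets.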

If a switch with significant internal overspeed doesn't also offer low-latency transport and low-latency flow control, throughput problems are likely to occur. For instance, bursty traffic will be translated to sporadic "sawtooth" performance due to an oscillation between congestion, delayed flow control, and drained buffers. Designers will need to tune the size of their buffers to the speed and latency of the switch chip. Larger buffers are needed with either an under-speed product (one whose switch-fabric throughput is less than the aggregate of the link speeds) or one with high latency.
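One common rule of thumb for the latency side of this sizing problem is that a buffer must absorb whatever keeps arriving while a flow-control signal is still taking effect. The sketch below applies that rule. It is an assumption brought in for illustration, not a formula from any switch vendor, and the link rate and latencies are assumed values.

```python
def min_buffer_bytes(link_gbps: float, flow_control_latency_ns: float) -> float:
    """Bytes that keep arriving while a flow-control signal takes effect."""
    # Gbit/s multiplied by nanoseconds gives bits; divide by 8 for bytes.
    return link_gbps * flow_control_latency_ns / 8


# Assumed 10-Gbit/s link with two assumed flow-control latencies.
for latency_ns in (100.0, 1000.0):
    print(latency_ns, min_buffer_bytes(10.0, latency_ns))
```

Under this estimate, a tenfold increase in flow-control latency calls for a tenfold larger buffer at the same link speed.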

Flow control, another key factor of system performance, is also significantly affected by latency (Fig. 3). Flow control through a switch must be broken down by the link so that at each stage—ingress, in transit, and egress—there's the opportunity to report congestion so that the sender knows sooner and can stop transmitting. The more tightly coupled the flow control, and the lower the latency of the flow-control path, the more efficiently the overall system will operate.

The interface ports on a switch should be channelized to the full extent of the interface protocol to get the maximum flexibility from a switch. In the case of SPI-4.2, this can be as high as 256 channels. These channels, sometimes called ports, may be used to transport separated traffic flows. They have distinct hardware resources, buffers, and flow-control mechanisms, but share a common physical interface. The ability to map any channel on any interface to any other channel on any other interface is also important to allow complex data flows through multiple devices.
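The any-to-any channel mapping described here can be pictured as a lookup table keyed by interface and channel. The class below is a hypothetical sketch of that idea, not the programming model of any real switch chip. The interface numbering and the NPU and coprocessor placement are assumptions.

```python
from typing import Dict, Tuple

Endpoint = Tuple[int, int]  # (interface index, channel index)
MAX_CHANNELS = 256          # SPI-4.2 channel limit quoted in the text


class ChannelMap:
    """Hypothetical any-to-any channel mapping table for a switch."""

    def __init__(self, num_interfaces: int, channels_per_interface: int = MAX_CHANNELS):
        if not 1 <= channels_per_interface <= MAX_CHANNELS:
            raise ValueError("channel count outside the SPI-4.2 range")
        self.num_interfaces = num_interfaces
        self.channels_per_interface = channels_per_interface
        self._routes: Dict[Endpoint, Endpoint] = {}

    def _check(self, endpoint: Endpoint) -> None:
        interface, channel = endpoint
        if not 0 <= interface < self.num_interfaces:
            raise ValueError(f"no such interface: {interface}")
        if not 0 <= channel < self.channels_per_interface:
            raise ValueError(f"no such channel: {channel}")

    def connect(self, src: Endpoint, dst: Endpoint) -> None:
        """Send everything arriving on src out through dst."""
        self._check(src)
        self._check(dst)
        self._routes[src] = dst

    def lookup(self, src: Endpoint) -> Endpoint:
        return self._routes[src]


# Assumed layout: NPU on interface 0, security coprocessor on interface 1.
cmap = ChannelMap(num_interfaces=2, channels_per_interface=16)
cmap.connect((0, 5), (1, 0))  # NPU channel 5 -> coprocessor channel 0
cmap.connect((1, 0), (0, 6))  # coprocessor channel 0 -> back to NPU channel 6
print(cmap.lookup((0, 5)))    # (1, 0)
```

The second `connect` call is what lets a packet return to a device it has already visited, something a daisy chain cannot express.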

A key issue involves head-of-line (HOL) blocking, where a packet at the head of a queue is blocked because its egress port is congested. That keeps the packets queued behind it, whose datapaths aren't affected by the congestion, from progressing. HOL blocking will impact the actual throughput of the switch regardless of the switch's capacity. Port channelization and per-channel flow control are important features within a chip to overcome HOL blocking.
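A toy simulation shows why per-channel queuing helps. The sketch below is a simplified illustration under stated assumptions: each packet is represented only by its destination label, at most one packet is delivered per cycle, and destination "B" stays congested for the whole run. The function names are hypothetical.

```python
from collections import deque


def shared_fifo(packets, congested, cycles):
    """One queue: a blocked head packet stalls everything behind it."""
    queue = deque(packets)
    delivered = []
    for _ in range(cycles):
        if queue and queue[0] not in congested:
            delivered.append(queue.popleft())
    return delivered


def per_channel_queues(packets, congested, cycles):
    """One queue per destination: only the congested channel stalls."""
    queues = {}
    for dest in packets:
        queues.setdefault(dest, deque()).append(dest)
    delivered = []
    for _ in range(cycles):
        for dest, queue in queues.items():
            if queue and dest not in congested:
                delivered.append(queue.popleft())
                break  # one packet per cycle keeps the comparison fair
    return delivered


arrivals = ["A", "B", "A", "C", "A"]
print(shared_fifo(arrivals, {"B"}, cycles=5))         # ['A']
print(per_channel_queues(arrivals, {"B"}, cycles=5))  # ['A', 'A', 'A', 'C']
```

With a single queue, the packet for "B" reaches the head after one delivery and holds back the traffic for "A" and "C". With separate queues, only the "B" packet waits.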

Switch Interconnects Open Up Design Possibilities

Many new networking designs require packets to flow back to a device they have already passed through. Packets must often be processed multiple times by multiple resources before they can be forwarded. For example, an encrypted packet arrives at the NPU for classification. The NPU must send it to a security coprocessor for decryption before it flows back to the NPU for final processing.

To accomplish this without a switched interconnect would mean putting a security coprocessor in the data plane with the NPU, and requiring all packets to filter through it to accommodate the few packets that were encrypted. In this design, the security processor becomes a bottleneck if it doesn't operate at wire speed.

An extension of this switched design adds a "mid-plane" to the design where security, SNMP, content management, and other resources reside, with the switch feeding packets to each resource as necessary. This optimizes the performance of the data plane, while allowing the design to offer layer 4 to 7 features.
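The per-packet data flow this enables can be sketched as a path-planning function. Everything in the block is a hypothetical illustration: the resource names are taken from the examples in the text, and the rule that a packet flows back to the NPU after each mid-plane service is an assumption.

```python
# Assumed mid-plane resources, taken from the examples in the text.
MID_PLANE = ("security", "content management")


def plan_path(needs):
    """Return the ordered list of devices a packet visits.

    needs is the set of mid-plane services the NPU decided the packet
    requires during classification.
    """
    path = ["NPU"]
    for service in MID_PLANE:
        if service in needs:
            # Detour through the resource, then flow back to the NPU.
            path += [service, "NPU"]
    path.append("egress")
    return path


print(plan_path(set()))         # ['NPU', 'egress']
print(plan_path({"security"}))  # ['NPU', 'security', 'NPU', 'egress']
```

Packets that need no mid-plane service stay on the short path, so the security coprocessor handles only the traffic that requires it.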

The concept of a switched interconnect is also central to some of the design-flexibility benefits built into the new Advanced Telecom Computing Architecture (ATCA). The ATCA is the first-ever standard for the system design of central-office-based networking equipment. Built into the specification is support for mezzanine cards, which can be inserted into a line card for more functionality, physical interface flexibility, or even added computing or storage networking capability. With a switch interconnect on the line card, these mezzanine cards can be flexibly connected and mixed and matched as needed. Taken to the extreme, this could lead to a "universal" access line card that can be populated with whatever mix of Ethernet, DSL, Wi-Fi, or other connectivity standards the application requires.

Design Example

The flexibility promise can be seen in a reference design for an ATCA line card (Fig. 4). The board is designed to offer removable PHY/MAC mezzanine cards so that the interfaces can be changed. These mezzanine cards can support any physical media or networking technology, such as 10/100/1000 Ethernet or WAN protocols.

The board carries an NPU that provides packet classification and services processing, a traffic manager for scheduling and quality-of-service (QoS) enforcement, a security coprocessor for encryption, and a fabric interface chip for linking to the switch backplane. Connecting all of these elements is the PivotPoint FM1010, a six-port SPI-4.2 switch developed by Fulcrum Microsystems.

The FM1010 provides a high-speed interconnect to all components. The design anticipates multi-gigabit-speed interfaces. Each SPI-4.2 port can support as many as 16 channels, all of which can operate at full line rate. The internal switch function of the FM1010 supplies 192 Gbits/s of capacity, and ensures that these line rates can be maintained to every component on the board.

The switch can soft-terminate ports in the event that one of the mezzanine cards isn't populated. It also offers full connectivity to each processor that can be reconfigured, depending upon the application. A configuration that features two LAN-interface mezzanine cards may use the traffic manager more heavily than two WAN-interface cards that need more access to the security coprocessor.

This design shows how a switched interconnect can offer a new level of flexibility to networking designs. With more vendors entering the market, and improved switching support built into interconnect protocols, engineers will have more choices in how they can reshape their architectures to benefit from switching.