Switch-Fabric Chip Set Delivers Per-Channel QoS

May 27, 2002
Scalable and highly integrated, a two-chip solution simplifies the design of terabit switch fabrics.

As the amount of data that we send continues to grow exponentially, network switch fabrics must be able to scale from today's tens of gigabits per second to the terabit-per-second levels needed for tomorrow's systems. Additionally, as we rely on Internet Protocol (IP) networks to carry more voice, video, and digital entertainment, quality of service (QoS) becomes increasingly important to ensure clear voice transmission and smooth-flowing video streams. Such guaranteed levels of service have typically been available only to users of TDM, ATM, or Sonet networks.

But now, companies are moving to packet-based networking as a lower-cost, more flexible solution, which makes guaranteed bandwidth on each channel a key requirement. Beyond designing a scalable system with guaranteed QoS levels, designers must keep power consumption low while squeezing the system into ever-smaller amounts of rack space.

Broadcom has tackled all of these challenges with a scalable two-chip switch-fabric solution that lets packet-based systems guarantee services the way ATM/Sonet systems do. The system can leverage multiprotocol line cards, provide circuit emulation to replace ATM/Sonet systems, and deliver IP- or MPLS-based QoS.

The first of the two devices is the BCM8332, an 80-Gbit/s switch chip that contains the actual switch fabric and packs 32 input ports and 32 output ports (Fig. 1). Each port contains an integrated serializer/deserializer (SERDES) that can handle a 3.125-Gbit/s 8B/10B-encoded data stream (2.5 Gbits/s of raw data). The companion BCM8320 is a bidirectional dual 10-Gbit/s fabric interface that also contains the fabric management logic (Fig. 2). The two chips form the heart of a high-bandwidth switch that can be used in service edge routers, multiservice switches, subscriber management systems, and more.

Previous silicon solutions on the market achieved similar integration and scalability, but they couldn't guarantee bandwidth for the various services delivered over the fabric. To guarantee bandwidth, Broadcom developed specialized techniques that buffer every flow through the fabric separately. In contrast, other fabric architectures typically merge the buffers of different flows, which limits the degree of service differentiation that can be applied to each input.

Most fabric architectures place all traffic of a given priority that is headed to a particular output port in a single priority queue, regardless of which input port it came from. This limits the service differentiation that can be provided among input ports at the same priority level. The Broadcom chip set maintains separate buffers for each input, letting customers arrange guaranteed bandwidth allocations between input and output ports. Any allocation can be set up by programming different sets of weights, which makes the chip set well suited to systems that must honor service level agreements (SLAs).
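To illustrate why separate per-input queues matter, here is a minimal Python sketch of one output port serving per-input queues with weighted round robin. Weighted round robin is an assumed stand-in: the article doesn't name the scheduling algorithm the BCM8332 uses, and the function and parameter names are hypothetical. What it shows is that an input's share of the output is set by its weight, not by how much traffic it has queued.

```python
from collections import deque


def weighted_round_robin(queues, weights, slots):
    """Serve one output port's per-input queues for `slots` cell times.

    Assumption: weighted round robin is used purely for illustration.
    queues:  dict of input port -> deque of cells bound for this output
    weights: dict of input port -> positive integer weight (cells per round)
    Returns a dict of input port -> number of cells sent.
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    sent = {port: 0 for port in queues}
    credits = dict(weights)  # cells each input may still send in this round
    while slots > 0 and any(queues.values()):
        progress = False
        for port, q in queues.items():
            if slots == 0:
                break
            if q and credits[port] > 0:
                q.popleft()          # transmit one cell from this input
                credits[port] -= 1
                sent[port] += 1
                slots -= 1
                progress = True
        if not progress:
            credits = dict(weights)  # every backlogged input used its share: new round
    return sent


# Input 0 is heavily backlogged, but its weight, not its backlog, sets its share.
queues = {0: deque(range(100)), 1: deque(range(50)), 2: deque(range(50))}
print(weighted_round_robin(queues, {0: 2, 1: 1, 2: 1}, slots=40))  # {0: 20, 1: 10, 2: 10}
```

With a single shared queue per priority, the same backlog from input 0 would crowd out inputs 1 and 2 in proportion to arrival volume instead.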

The switch fabric is implemented logically as parallel switching planes that are connected to the switch interface chip through serial links. Because the chip set doesn't stripe cells across the serial links, service degrades gracefully in 2.5-Gbit/s increments when a link fails. Designers can therefore configure the architecture for anything from N+1 to 2N redundancy.

A distinctive aspect of the chip set is its flexible bandwidth allocation. There are no bundling restrictions on the 3.125-Gbit/s SERDES links from the BCM8320 to the BCM8332s. When designing a system, designers must decide how much front-panel bandwidth the system's users need and how much "speedup bandwidth" is required both before and after a failure. Together, these determine how many switch cards the system must contain. System architects therefore have wide latitude in trading off performance, redundancy, and cost for specific system needs.

Each serial link provides the switch interface with 2.5 Gbits/s of backplane bandwidth. By selecting the number of links that delivers the desired total bandwidth, system integrators can provision systems with just the right amount of redundancy instead of duplicating the entire fabric.
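The provisioning arithmetic in the two paragraphs above can be sketched in a few lines of Python. The 2.5-Gbit/s link rate and the 16 links per BCM8320 come from this article; the speedup target and the spare-link count are example inputs a designer would choose, and the helper itself is hypothetical.

```python
import math
from fractions import Fraction

LINK_GBPS = Fraction(5, 2)  # 2.5 Gbits/s of data per 3.125-Gbit/s 8B/10B link
MAX_LINKS = 16              # serial links on one BCM8320


def links_needed(front_panel_gbps, speedup, spare_links=0):
    """Serial links one fabric interface needs (hypothetical helper).

    front_panel_gbps: line-side bandwidth served by this BCM8320
    speedup:          example design choice: backplane / front-panel bandwidth
    spare_links:      link failures to ride through while keeping full speedup
    """
    # Fraction(str(x)) keeps decimal inputs such as 1.2 exact.
    backplane = Fraction(str(front_panel_gbps)) * Fraction(str(speedup))
    working = math.ceil(backplane / LINK_GBPS)
    total = working + spare_links
    if total > MAX_LINKS:
        raise ValueError(f"{total} links needed, only {MAX_LINKS} available")
    return total


print(links_needed(10, 2))                 # 8 links = 20 Gbits/s
print(links_needed(10, 2, spare_links=1))  # 9 links: one failure still leaves 20 Gbits/s
```

Adding a single spare link, rather than a second full fabric, is what the article means by provisioning "just the right amount of redundancy."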

Building The Fabric: A typical switch fabric system would use a BCM8320 fabric interface chip on each line card and one to 16 BCM8332 chips to form the switch fabric (Fig. 3). Beyond the BCM8320, an Ethernet line card might contain a network processor for control and a port aggregation chip or two, such as the BCM8842, that could each aggregate up to 12 1-Gbit ports into a 10-Gbit/s SPI-4 Phase 2-compatible data stream. Then the data would be sent from the aggregation chip to the network processor over the SPI-4 Phase 2 interface.

Next, the network processor on the line card would perform its management/analysis functions on the data packet and pass the data on to the BCM8320 over a port compatible with CSIX (the Common Switch Interface standard). With two CSIX ports, the BCM8320 can be used in system architectures that employ dual network processors on a line card, or one network processor and a high-end traffic manager. Each CSIX port can run at 2.5, 5, or 10 Gbits/s. Regardless of the interface speed selected, any bandwidth from 2.5 to 40 Gbits/s in 2.5-Gbit/s increments can be implemented across the backplane. Interface chips employing different bandwidths can readily be mixed in the system.

Each fabric interface chip houses 16 bidirectional SERDES ports (16 input and 16 output channels) that carry the data streams to and from the switch fabric. Both of its CSIX-compatible ports support an OC-192 data stream and can connect to traffic managers or network processors over a 32-, 64-, or 128-bit-wide interface. The switch fabric can be implemented with one to 16 BCM8332s to deliver a scalable bandwidth of 80 Gbits/s to 1.28 Tbits/s. Each BCM8332 chip in the fabric operates independently, providing multiple switching planes to the BCM8320 fabric interfaces.
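The scaling claim works out as follows. This Python sketch multiplies the per-chip link count by the link rate; the `full_rate_interfaces` helper additionally assumes that every BCM8320 attaches with all 16 of its links, which is an illustration rather than a configuration rule from Broadcom.

```python
LINK_GBPS = 2.5          # data bandwidth per serial link
SWITCH_LINKS = 32        # serial links on one BCM8332
INTERFACE_LINKS = 16     # serial links on one BCM8320


def fabric_capacity_gbps(num_switch_chips):
    """Aggregate fabric bandwidth for a fabric built from 1 to 16 BCM8332s."""
    if not 1 <= num_switch_chips <= 16:
        raise ValueError("the fabric uses 1 to 16 BCM8332 chips")
    return num_switch_chips * SWITCH_LINKS * LINK_GBPS


def full_rate_interfaces(num_switch_chips):
    """BCM8320s that can attach with all 16 links each.

    Assumption: every interface uses all of its links (illustrative only).
    """
    return num_switch_chips * SWITCH_LINKS // INTERFACE_LINKS


for n in (1, 4, 16):
    print(n, fabric_capacity_gbps(n), full_rate_interfaces(n))
# 1 80.0 2
# 4 320.0 8
# 16 1280.0 32
```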

The fabric interface chip includes all queue management and flow-control functions as well as the cell-parsing logic. The BCM8320 supports a fixed-length cell with a 15-byte header, a 1-byte trailer, and a 96- or 112-byte CSIX payload. A single BCM8320 can operate with up to a fourfold speedup (40 Gbits/s per port) to absorb the inefficiency of fixed-length cells and to provide additional redundancy. Each serial link carries a complete cell during each transfer cycle. Because the fabric consists of independent switching planes, the BCM8320s must reorder the cells received from every fabric chip on egress to guarantee in-order delivery.
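Two points in this paragraph lend themselves to a quick sketch: why fixed-length cells call for speedup, and what egress reordering involves. The Python below uses the cell sizes given above. The resequencer assumes each cell of a flow carries a sequence number and uses a min-heap; the article doesn't describe the BCM8320's actual reordering mechanism, so treat it as a hypothetical model.

```python
import heapq
import math

HEADER_BYTES = 15   # cell header, per the article
TRAILER_BYTES = 1   # cell trailer, per the article


def cells_for_packet(packet_bytes, payload_bytes=96):
    """Number of fixed-length cells a packet is segmented into."""
    return math.ceil(packet_bytes / payload_bytes)


def link_efficiency(packet_bytes, payload_bytes=96):
    """Fraction of transmitted cell bytes that carry packet data."""
    cell_bytes = HEADER_BYTES + payload_bytes + TRAILER_BYTES
    return packet_bytes / (cells_for_packet(packet_bytes, payload_bytes) * cell_bytes)


class Resequencer:
    """Release one flow's cells in order.

    Assumption: each cell carries a per-flow sequence number starting at 0.
    """

    def __init__(self):
        self.expected = 0
        self.pending = []  # min-heap of (sequence number, cell)

    def receive(self, seq, cell):
        heapq.heappush(self.pending, (seq, cell))
        released = []
        # Drain every cell that is now contiguous with what was already delivered.
        while self.pending and self.pending[0][0] == self.expected:
            released.append(heapq.heappop(self.pending)[1])
            self.expected += 1
        return released


print(round(link_efficiency(96), 3))  # 0.857: one full cell
print(round(link_efficiency(97), 3))  # 0.433: two cells, the second nearly empty
r = Resequencer()
print(r.receive(1, "b"), r.receive(2, "c"), r.receive(0, "a"))  # [] [] ['a', 'b', 'c']
```

A 97-byte packet spills one byte into a second cell, so less than 44% of the transmitted bytes carry packet data; that worst-case loss is the kind of inefficiency the speedup headroom is there to cover.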

The fabric interface chip handles multiple levels of priority for unicast traffic for up to 64 ports and two levels of priority for multicast traffic. Up to 16,000 multicast groups are supported as well. The device also handles Idle, Flow Control, Unicast, and Multicast IC cFrames. Each data cFrame consists of a 2-byte base header, an optional 4-byte extended header, a variable-length payload, potential padding to the interface width, and 2 bytes of vertical parity. A Flow-Control cFrame consists of a 2-byte base header, a payload containing a variable number of 4-byte flow-control entries, potential padding to the interface width, and 2 bytes of vertical parity.
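The cFrame layouts above translate directly into a length calculation. This Python sketch assumes that padding is applied to the headers and payload so they end on an interface-width boundary, with the 2-byte vertical parity following; the CSIX-L1 specification is the authority on the exact alignment rule, and the helper names are hypothetical.

```python
BASE_HEADER = 2       # bytes
EXTENSION_HEADER = 4  # bytes, optional
VERTICAL_PARITY = 2   # bytes
FC_ENTRY = 4          # bytes per flow-control entry
WIDTHS_BITS = (32, 64, 128)


def _padded(length, width_bits):
    """Round `length` bytes up to a multiple of the interface width.

    Assumption: padding covers headers + payload; parity is appended afterward.
    """
    if width_bits not in WIDTHS_BITS:
        raise ValueError("interface must be 32, 64, or 128 bits wide")
    width = width_bits // 8
    return length + (-length % width)


def data_cframe_bytes(payload_bytes, width_bits=32, extended=True):
    """Total bytes of a unicast/multicast data cFrame."""
    body = BASE_HEADER + (EXTENSION_HEADER if extended else 0) + payload_bytes
    return _padded(body, width_bits) + VERTICAL_PARITY


def flow_control_cframe_bytes(entries, width_bits=32):
    """Total bytes of a Flow-Control cFrame with `entries` 4-byte entries."""
    body = BASE_HEADER + entries * FC_ENTRY
    return _padded(body, width_bits) + VERTICAL_PARITY


print(data_cframe_bytes(96))                  # 106
print(data_cframe_bytes(96, width_bits=128))  # 114
print(flow_control_cframe_bytes(3))           # 18
```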

To control the data flow, the BCM8320 provides both the standard in-band flow-control mechanism and an optional proprietary out-of-band flow-control mechanism.

The BCM8332 scalable switch-fabric chip packs 32 differential serial links, each capable of operating at 3.125 Gbits/s. After the 8B/10B encoding overhead is subtracted, each serial link provides 2.5 Gbits/s of raw data bandwidth to and from the BCM8320 fabric interface chips.

Every serial link transmits a cell in every transfer cycle. When no cell data is available for transmission, idle cells are sent across all serial interfaces so that flow-control information is continuously refreshed. These serial interfaces must be bidirectional to carry the flow-control information that the switch fabric needs to operate properly.
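A minimal Python model of one link's transmit decision per transfer cycle follows. The `Cell` type and the dictionary used for flow-control state are hypothetical, and the sketch assumes flow-control information rides on data cells as well as idle cells.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Cell:
    kind: str                                         # "data" or "idle"
    payload: bytes = b""
    flow_control: dict = field(default_factory=dict)  # hypothetical flow -> XOFF map


def next_cell(tx_queue, flow_control_state):
    """Choose what one serial link sends in a single transfer cycle.

    A data cell goes out if one is queued; otherwise an idle cell is sent so the
    far end still receives a fresh copy of the flow-control state.
    Assumption: flow-control state is carried in every cell.
    """
    snapshot = dict(flow_control_state)
    if tx_queue:
        return Cell("data", tx_queue.popleft(), snapshot)
    return Cell("idle", b"", snapshot)


tx = deque([b"payload-0"])
fc = {"out3/cos0": True}
print([next_cell(tx, fc).kind for _ in range(3)])  # ['data', 'idle', 'idle']
```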

Two cells are buffered at the receive interface of each serial link on the fabric chip. This buffering helps to retime cell data from the clock domain of the serial receive interface to the core clock domain. These buffers also are used to provide temporary storage while the port destinations for multicast cells are being read from the multicast table and while access to the central queues is being scheduled.
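The multicast table lookup mentioned above can be modeled as a map from group ID to a bitmap of output ports. This Python sketch is hypothetical: the article doesn't describe the table's format, so the 32-bit port mask (one bit per BCM8332 serial link) is an assumption.

```python
NUM_PORTS = 32  # serial links on one BCM8332


class MulticastTable:
    """Hypothetical model: group ID -> bitmap of output ports on one fabric chip."""

    def __init__(self):
        self._masks = {}

    def set_group(self, group_id, ports):
        mask = 0
        for port in ports:
            if not 0 <= port < NUM_PORTS:
                raise ValueError(f"port {port} out of range")
            mask |= 1 << port
        self._masks[group_id] = mask

    def destinations(self, group_id):
        """Output ports a multicast cell for this group must be copied to."""
        mask = self._masks.get(group_id, 0)
        return [port for port in range(NUM_PORTS) if (mask >> port) & 1]


table = MulticastTable()
table.set_group(7, [0, 5, 31])
cell = "mcast-cell"
copies = [(port, cell) for port in table.destinations(7)]
print(copies)  # [(0, 'mcast-cell'), (5, 'mcast-cell'), (31, 'mcast-cell')]
```

While this lookup is in progress, the cell waits in one of the two receive buffers described above.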

Queued cells are scheduled for transmission to each egress port. During a scheduling interval, the output link scheduler looks at the queue status of all unicast and multicast queues as well as the programmable weights. Furthermore, the scheduler examines the status of the flow-control information for the target serial link of that scheduling interval and the synchronization status of the receiver for that serial link.
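A simplified Python model of one scheduling interval is sketched below. It checks the two link conditions named above before choosing between unicast and multicast queues by weight. The credit-based weighting and every name in the code are assumptions, not the BCM8332's documented algorithm.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LinkStatus:
    flow_controlled: bool   # far end has asserted flow control for this link
    rx_synchronized: bool   # receiver for this link is in sync


def schedule_interval(queues, weights, credits, link):
    """Pick the queue to serve on one output link for one scheduling interval.

    Assumption: credit-based weighted selection, for illustration only.
    Returns the chosen queue name, or None when nothing may be sent (an idle
    cell would go out instead). `credits` is updated in place.
    """
    if link.flow_controlled or not link.rx_synchronized:
        return None
    eligible = [name for name, q in queues.items() if q]
    if not eligible:
        return None
    if all(credits[name] <= 0 for name in eligible):
        credits.update(weights)  # every backlogged queue used its share: new round
    choice = max(eligible, key=lambda name: credits[name])
    credits[choice] -= 1
    queues[choice].popleft()
    return choice


queues = {"unicast": deque(range(10)), "multicast": deque(range(10))}
weights = {"unicast": 3, "multicast": 1}
credits = dict(weights)
up = LinkStatus(flow_controlled=False, rx_synchronized=True)
print([schedule_interval(queues, weights, credits, up) for _ in range(8)])
# ['unicast', 'unicast', 'unicast', 'multicast', 'unicast', 'unicast', 'unicast', 'multicast']
```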

Broadcom's approach of combining virtual output queue scheduling in the BCM8320 with dedicated output port scheduling in the BCM8332 is central to the fabric's performance. In particular, the BCM8332's output schedulers work with per-input-channel, per-class-of-service queues in the same chip, so the fabric can distinguish among input ports at each output port and thereby guarantee bandwidth from any input to any output.

Most importantly, complete flow control from output back to input is applied on a per-input, per-output, per-class-of-service basis. Every flow has its own end-to-end flow control through the fabric, so throttling one flow never restricts any other. Only the flow from that input port to that output port in that class of service is affected.
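Per-flow flow control can be modeled as one queue and one XOFF flag per (input, output, class of service) triple. In this hypothetical Python sketch, pausing one flow leaves flows from the same input, or to the same output in another class, free to send. The XON/XOFF naming is an assumption; the article doesn't describe the signaling itself.

```python
from collections import defaultdict, deque


class PerFlowQueues:
    """Hypothetical model: one queue and one XOFF flag per (input, output, class)."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.paused = set()

    def enqueue(self, inp, out, cos, cell):
        self.queues[(inp, out, cos)].append(cell)

    def xoff(self, inp, out, cos):
        self.paused.add((inp, out, cos))   # stop only this flow

    def xon(self, inp, out, cos):
        self.paused.discard((inp, out, cos))

    def sendable_flows(self):
        """Flows with queued cells that are not flow-controlled."""
        return sorted(key for key, q in self.queues.items()
                      if q and key not in self.paused)


f = PerFlowQueues()
f.enqueue(0, 1, 0, "a")   # input 0 -> output 1, class 0
f.enqueue(0, 1, 1, "b")   # same input and output, class 1
f.enqueue(2, 1, 0, "c")   # different input, same output and class
f.xoff(0, 1, 0)
print(f.sendable_flows())  # [(0, 1, 1), (2, 1, 0)]
```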

The fabric chips pack a CPU interface that lets a local processor control some or all of the fabric devices within a switch fabric. A local bus, the CPU interface is intended to connect devices on the same PC board. A complete fabric, though, would consist of many boards and thus many separate CPU buses. Those buses could be extended into a single logical bus through a local bus controller on each board, and the controllers would then communicate with a CPU control card.

To control the fabric and integrate it into the rest of the system, Broadcom engineers developed a modular common-switch application programming interface (API) that's used on all of the company's switch products. This allows designers to reuse the development work of one driver across many switch solutions. The switch fabric itself, though, requires minimal software to do its job. The essential software handles system initialization, link bring-up, multicast group configuration, setting scheduler weights to guarantee bandwidth, and device status checks.
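One of the tasks listed above, setting scheduler weights to guarantee bandwidth, can be illustrated without touching Broadcom's API, which isn't documented here. This hypothetical Python helper reduces a set of guaranteed rates to the smallest proportional integer weights and rejects guarantees that oversubscribe the output port.

```python
from functools import reduce
from math import gcd


def weights_from_guarantees(guarantees_mbps, port_capacity_mbps):
    """Smallest integer weights proportional to per-input guaranteed rates.

    Hypothetical helper, not part of Broadcom's API.
    guarantees_mbps:    dict of input -> guaranteed rate (positive integer Mbits/s)
    port_capacity_mbps: bandwidth of the output port being shared
    """
    if any(rate <= 0 for rate in guarantees_mbps.values()):
        raise ValueError("guaranteed rates must be positive")
    if sum(guarantees_mbps.values()) > port_capacity_mbps:
        raise ValueError("guarantees oversubscribe the output port")
    common = reduce(gcd, guarantees_mbps.values())
    return {inp: rate // common for inp, rate in guarantees_mbps.items()}


print(weights_from_guarantees({"in0": 5000, "in1": 2500, "in2": 2500}, 10000))
# {'in0': 2, 'in1': 1, 'in2': 1}
```

Keeping the weights small and proportional means a weighted scheduler cycles through a short round, so each input sees its guaranteed share over a brief interval.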

A key example of how easily Broadcom's fabric integrates with software is that redundancy requires no software at all. The hardware senses any failed link and continues transmitting on all remaining links.
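Because cells aren't striped, this recovery amounts to skipping dead links when choosing where each whole cell goes. Below is a minimal Python sketch, assuming simple round-robin distribution over healthy links; the article doesn't say how the hardware actually spreads cells, and the helper name is hypothetical.

```python
LINK_GBPS = 2.5


def assign_links(num_cells, link_up):
    """Spread whole cells round-robin over the links that are currently up.

    Assumption: round-robin distribution, for illustration only.
    """
    healthy = [i for i, up in enumerate(link_up) if up]
    if not healthy:
        raise RuntimeError("no working serial links")
    return [healthy[n % len(healthy)] for n in range(num_cells)]


links = [True, True, True, True]
links[2] = False  # hardware has detected a failure on link 2
print(assign_links(6, links))         # [0, 1, 3, 0, 1, 3]
print(links.count(True) * LINK_GBPS)  # 7.5 Gbits/s of backplane bandwidth remain
```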

Beyond the API and driver software, a fabric simulator is available for designers who want a jump start on their system architectures. The simulator runs on a Unix, Linux, or Solaris platform and lets designers model the fabric and the traffic across it at a functional level.

For rapid prototyping, the company also offers a full midplane-based chassis reference design and development platform. A single reference chassis can contain a full 1.28 Tbits/s of switching capacity. Furthermore, the chassis may be extended by cascading multiple subtended shelves via optical cables. Slots are available in the chassis for switching cards, line cards, evaluation cards, and a centralized processing card.

Each line card has a pair of switch-fabric interface chips and several FPGAs that let users implement their own packet-level modification and traffic-management algorithms. An included 12-port Gigabit Ethernet aggregator (the BCM8842) provides up to 12 1-Gbit/s input channels and a 10-Gbit/s SPI-4.2 packet interface to the FPGAs.

Broadcom also offers sample FPGA configurations that let the FPGAs convert many common networking interface standards to the standard CSIX interface. Code is currently available for XGMII, SPI-4.2, HyperTransport, and simple FIFO interfaces. Additional Broadcom silicon on the line card includes a SiByte MIPS-based RISC processor to handle control-plane and other system functions, and a XAUI interface port for linking to various Broadcom development systems.

Price & Availability
Samples of the BCM8320 fabric interface, the BCM8332 switch fabric, and the BCM8842 will be available in the third quarter. In 10,000-unit lots, the BCM8320 and BCM8332 will sell for $400 and $600 apiece, respectively, while the BCM8842 costs $285 each in 1,000-piece quantities.

Broadcom Corp., 16215 Alton Pkwy., P.O. Box 57013, Irvine, CA 92619-7013; (949) 450-8700; www.broadcom.com.

