A new statistically multiplexed Ethernet oversubscription design combines MAC device and interconnect to double the oversubscription ratio and deliver cost-effective layer 4-7 services to edge devices.
By Ravi Sajwan, Ample Communications and Prasad Vindla, Fulcrum Microsystems
The increasingly mission-critical nature of networking and new network applications like voice over IP are resulting in a new demand for layer 4-7 Ethernet services in even the smallest switch/router devices located at the furthermost edges of the network. This trend has given rise to implementation of statistically multiplexed Ethernet solutions as a way for engineers to lower the cost of providing these services. Implementations of statistically multiplexed Ethernet require silicon components supporting oversubscription at the media access control (MAC) layer.
Using oversubscription-enabled multi-port MAC silicon alone, network equipment designers can realize an oversubscription ratio of between 2:1 and 4:1 without affecting the performance of the system. But in extremely price-sensitive systems, it may be necessary to push the boundaries of oversubscription to offer competitive features. Accomplishing this can mean taking a new look at oversubscription architectures to ensure that throughput and quality of service (QoS) remain high even at peak traffic levels. By combining the oversubscribed MAC with the channelization capabilities of a SPI-4.2 system interconnect switch, leading-edge designers can double their oversubscription ratio, up to 8:1, while maintaining low-latency, wire-speed throughput.
The recent interest in oversubscription comes from designers under pressure to reduce the cost of networking devices, who see removing unused or idle bandwidth from a switch as a key way to do so. Ethernet is bursty, and in most applications runs at utilization levels that are a fraction of line capacity, estimated at anywhere from one percent for Gigabit Ethernet to between 10 percent and 50 percent for 10BaseT. By intelligently combining the data from these ports, a designer can spread network processor (NPU) cycles efficiently across many ports. The aggregate bandwidth of the ports, in this case, is higher than the throughput of the NPU. This enables the designer to increase the number of ports in a design and reduce the overall bill-of-material costs by decreasing the number of expensive NPUs required.
The issue is amplified in Gigabit Ethernet designs. With usage rates so low, these designs tend to over-provision high-performance packet processing resources. In a 24-port Gigabit Ethernet design, for example, a 2:1 oversubscription ratio means that an NPU with 10 Gigabits of throughput can easily service the entire design (see Figure 1).
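One way to get a feel for why low utilization makes oversubscription safe is to estimate how often more ports burst at once than the NPU can absorb. The sketch below is only an illustration under assumptions that are not from the article: each port is treated as independently either idle or bursting at line rate, and the 10 percent busy probability is a hypothetical figure. The function name `overload_probability` is likewise invented for this example.

```python
from math import comb


def overload_probability(ports: int, busy_prob: float, capacity_ports: int) -> float:
    """Probability that more than `capacity_ports` ports burst simultaneously.

    Assumes (hypothetically) that ports are independent and each is bursting
    at line rate with probability `busy_prob`, i.e. a binomial model.
    """
    return sum(
        comb(ports, k) * busy_prob**k * (1 - busy_prob) ** (ports - k)
        for k in range(capacity_ports + 1, ports + 1)
    )


# 24 Gigabit Ethernet ports sharing an NPU that can absorb
# 10 simultaneous line-rate bursts (10 Gbps).
p = overload_probability(ports=24, busy_prob=0.10, capacity_ports=10)
print(f"chance of exceeding NPU capacity at any instant: {p:.2e}")
```

Real Ethernet traffic is correlated and burstier than this model suggests, so the result is best read as an optimistic bound rather than a prediction.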
The economic argument becomes stronger with the implementation of higher levels of oversubscription. At 2:1 oversubscription, system costs can be reduced by up to 40 percent per port; at an 8:1 oversubscription ratio, the per-port cost can be reduced by up to 70 percent (see Figure 2).
Link utilizations are lowest at the edge of the network, where the switch port is connected directly to a PC, making it the best location for a highly oversubscribed switch solution. Because these devices make up the bulk of a network infrastructure, they are also under the most cost pressure. Oversubscription is generally not advised for backbone switches, where data is highly aggregated and link utilization is consistently high.
The starting point for an oversubscription solution is a media-access controller (MAC) with added circuitry for intelligently multiplexing multiple incoming Ethernet ports onto one NPU. For example, a 12-port Gigabit Ethernet MAC will mux all of the incoming data from each port onto one SPI-4.2 interface that would be directed to an interconnect switch or straight to an NPU for processing.
To ensure high throughput, the oversubscription circuitry must also have built-in mechanisms for responding to the congestion that arises when multiple ports burst data at the same time. Such simultaneous bursts are inevitable, and when they occur, too much data floods the MAC and forces it to drop packets. But the MAC’s packet processing intelligence can shape that barrage of data, enforcing QoS standards and allowing protected data streams, such as voice or video, to be forwarded while non-priority packets may be dropped. The goal is for the system to choose which packets are dropped and to minimize indiscriminate drops (tail drops).
When the congestion doesn’t affect prioritized packets, the standard carrier sense multiple access with collision detection (CSMA/CD) contention management and collision avoidance logic within Ethernet will make the best decisions about which packets to drop. However, when certain data must be prioritized, or in cases of extreme network congestion, the intelligence within the MAC must be used to enforce the service levels.
Multiple queues per port are built into the MAC to allow for prioritization. The MAC leverages QoS tagging schemes such as 802.1Q to direct packets to the appropriate queue. Once the data is classified and enqueued, the MAC relies on a combination of modified deficit round robin (MDRR) and weighted random early detection (WRED) for the queue management and memory management needed to pass the data at the priority its users require.
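The classification step can be sketched in a few lines. In an 802.1Q-tagged frame, the two bytes after the source address hold the tag protocol identifier 0x8100, and the top three bits of the following two bytes carry the priority. The sketch below reads that priority and maps it onto a queue index. The four-queue count and the linear priority-to-queue mapping are assumptions made for illustration, not details of any particular MAC.

```python
import struct

TPID_8021Q = 0x8100  # tag protocol identifier of an 802.1Q VLAN tag


def pcp_of(frame: bytes) -> int:
    """Return the 3-bit priority of a tagged frame, or 0 if untagged."""
    if len(frame) < 16:
        return 0
    # Bytes 12-13: TPID; bytes 14-15: tag control information (TCI).
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    return tci >> 13 if tpid == TPID_8021Q else 0


def queue_for(frame: bytes, queues_per_port: int = 4) -> int:
    """Map priority 0..7 onto queue 0..queues_per_port-1 (assumed mapping)."""
    return pcp_of(frame) * queues_per_port // 8


# A tagged frame with priority 5 on VLAN 100 (addresses and payload zeroed).
frame = bytes(12) + struct.pack("!HH", TPID_8021Q, (5 << 13) | 100) + bytes(46)
print(pcp_of(frame), queue_for(frame))
```

Untagged frames fall into the lowest queue here; a real device would typically make that default configurable per port.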
MDRR is a user-definable mechanism that ensures fair port servicing for high-priority traffic while avoiding starvation of low-priority queues. MDRR services queues in a round-robin fashion based on a per-queue credit counter that indicates the number of data transfers available for the queue for the current round. If the counter is positive, the queue is allowed to send data. One queue per port is designated as the low-latency, high-priority (LLHP) queue for special traffic such as voice, and it is always serviced first before the lower priority queues. Only the highest priority Layer 2 class of service (CoS) data is allowed in this queue (see Figure 3).
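The servicing rule described above can be sketched as follows. This is a simplified model, not the logic of any specific device: the class name `MdrrPort`, the byte-based credits and the per-queue quantum values are all assumptions. Each round, the LLHP queue is drained first, then every other queue receives its quantum of credit and may send while its counter is positive. The counter is allowed to go negative, and the debt is carried into the next round.

```python
from collections import deque


class MdrrPort:
    """Toy MDRR scheduler for one port (illustrative sketch only)."""

    def __init__(self, quanta):
        # quanta[i]: bytes of credit granted to queue i each round (assumed weights)
        self.quanta = list(quanta)
        self.queues = [deque() for _ in quanta]
        self.deficit = [0] * len(quanta)
        self.llhp = deque()  # low-latency, high-priority queue

    def enqueue(self, queue_id, size, llhp=False):
        (self.llhp if llhp else self.queues[queue_id]).append(size)

    def service_round(self):
        """Run one round; return (queue_label, packet_size) in send order."""
        sent = []
        while self.llhp:  # LLHP traffic always goes first
            sent.append(("LLHP", self.llhp.popleft()))
        for i, q in enumerate(self.queues):
            if not q:
                # An idle queue keeps any debt but cannot bank unused credit.
                self.deficit[i] = min(self.deficit[i], 0)
                continue
            self.deficit[i] += self.quanta[i]
            while q and self.deficit[i] > 0:
                size = q.popleft()
                self.deficit[i] -= size
                sent.append((i, size))
        return sent


port = MdrrPort(quanta=[1500, 500])
for _ in range(3):
    port.enqueue(0, 1000)
    port.enqueue(1, 1000)
port.enqueue(0, 200, llhp=True)
print(port.service_round())
print(port.service_round())
```

Because queue 1 has a smaller quantum, it overdraws on its first packet and must sit out the second round, which is how the weights translate into bandwidth shares without starving either queue.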
WRED is another critical element of throughput, serving as the key congestion avoidance system for oversubscription. WRED avoids congestion in an intelligent way by relying on IP precedence to drop low-priority packets when the network is congested. A Layer 2, CoS-aware WRED algorithm is best suited to Ethernet oversubscription.
WRED provides multiple programmable thresholds (watermarks) associated with each of the queues. Each of the thresholds has its own corresponding programmable probability levels, which creates threshold-probability pairs. The threshold is the value on the queue level, and the corresponding probability is the probability of dropping a frame if the corresponding threshold is exceeded. Thresholds can also be set on selected ports to guarantee no frame drops. By providing user-programmable WRED probability thresholds, equipment suppliers can fine-tune the frame drop behavior for a specific application.
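The threshold-probability pairs lend themselves to a small lookup. The sketch below follows the stepwise behavior described above, where the drop probability is that of the highest threshold the queue level has exceeded. Classical RED implementations often interpolate between thresholds and work from an averaged queue depth instead, so this is a simplification. The profile names and all numeric values are hypothetical.

```python
import random


def drop_probability(depth, profile):
    """Return the drop probability for a queue at the given depth.

    `profile` is a list of (threshold, probability) pairs sorted by rising
    threshold; the highest exceeded threshold wins.
    """
    prob = 0.0
    for threshold, p in profile:
        if depth > threshold:
            prob = p
    return prob


def should_drop(depth, profile, rng=random.random):
    """Decide randomly whether to drop an arriving frame."""
    return rng() < drop_probability(depth, profile)


# Hypothetical per-class profiles, with thresholds in frames.
PROFILES = {
    "best_effort": [(32, 0.10), (64, 0.50), (96, 1.00)],
    "voice": [],  # no thresholds: WRED never drops this class
}

for depth in (20, 40, 70, 100):
    print(depth, drop_probability(depth, PROFILES["best_effort"]))
```

Leaving a profile empty, or giving every threshold a probability of zero, is one way to model the ports that must never see a WRED drop.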
The Role of the Interconnect
Amplifying the work of oversubscription-capable MACs is a modern interconnect architecture, which can double the oversubscription ratio without a requisite increase in congestion issues. The change in interconnect requires a channelized interface protocol and a switch interconnect. This architecture leverages its switching capability to direct the data from two MACs to one output port, which is connected to a network processor (or other packet processing resource).
In practice, the interconnect protocol of choice for multi-gigabit networking designs is System Packet Interface Level 4 Phase 2 (SPI-4.2), which offers not only fine-grained channelized flow control but also support for streaming data. SPI-4.2 is a 10 Gbps system interconnect implementation agreement drafted by the Optical Internetworking Forum (OIF) for connecting Link layer and Physical layer devices on a board in multi-gigabit Ethernet and SONET applications. SPI-4.2 is a parallel interface that divides its throughput into as many as 256 independent streams, each capable of transmitting data at rates up to the full line rate. It is designed for the efficient transfer of both variable-sized packets and fixed-sized cells. The OIF implementation agreement for SPI-4.2 specifies a point-to-point protocol with 16-bit transmit/receive data paths and support for 256 channels (referred to as “ports” in the specification), which gives it the granularity to support the full range of both WAN and LAN applications.
An interconnect switch is a new device that provides any-to-any connectivity on a board, replacing a bus-based or daisy-chain interconnect architecture. Early implementations of this architecture were built with FPGA or ASIC devices, but suffered from limited performance, limited any-to-any flexibility and high design costs. Several merchant devices are currently available that overcome these limitations. In oversubscription applications, these devices must support the port density required to connect the MACs and the NPUs, and must also have the throughput and latency to keep data flowing smoothly without contributing to congestion during peak transmission periods.
In switches, latency is a function of the switch fabric speed, which is measured in multiples of the aggregate throughput of the ports, a figure called overspeed. For example, a switch with four SPI-4.2 ports, each running at 14.4 Gbps, has an aggregate port throughput of 57.6 Gbps and would need a total throughput of 115.2 Gbps to be two-times overspeed. In real-world applications, a three-times overspeed switch fabric has the capacity to handle peak data bursts with no congestion.
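The overspeed arithmetic is simple enough to check directly. The helper below, whose name is made up for this illustration, multiplies the port count, the per-port rate and the overspeed factor to give the fabric throughput required.

```python
def required_fabric_gbps(ports: int, port_gbps: float, overspeed: float) -> float:
    """Fabric throughput needed for a given overspeed factor."""
    return ports * port_gbps * overspeed


# Four SPI-4.2 ports at 14.4 Gbps each, at two-times overspeed.
print(round(required_fabric_gbps(4, 14.4, 2), 1))  # 115.2
# The same switch at three-times overspeed.
print(round(required_fabric_gbps(4, 14.4, 3), 1))
```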
The switch can also add its own flow control capabilities to minimize buffer size in the design and further manage congestion at the port buffer. The flow control mechanism detects congestion at the ingress, the switch arbiter and the egress, and relays congestion information back to the MAC to more quickly trigger its back-off signals during peak traffic times.
Application Example: 96 Ports and Only Two NPUs
When the oversubscription MAC and the interconnect work together, the doubling of the oversubscription ratio also reduces the number of NPUs needed to one-quarter of that in the original design. Without the switch, each MAC feeds its own NPU, but with the switch in place an NPU can handle the traffic from two MACs, each supporting oversubscription. Certainly, the throughput of the processor must be higher, but in most cases this will still result in a dramatic cost reduction (see Figure 4).
An example that builds on this illustration is a 96-port 10/100/1000 oversubscribed Ethernet board in the AdvancedTCA form factor for use in central-office telecommunications networks. In the general configuration of the board, all 96 ports must share two 10 Gbps NPUs for initial packet processing. Four 24-port 10/100/1000 MAC devices provide the connectivity and are configured for 4:1 oversubscription. Each MAC provides all MAC-layer processing and also the oversubscription processing, using WRED and MDRR algorithms to maintain quality of service when consolidating the data.
Once through the MAC, the data are output onto a SPI-4.2 interface using eight channels to maintain logical separation of the Ethernet streams. The switch accepts the eight streams from each of two 24-port MACs and aggregates the streams onto a single 16-channel SPI-4.2 interface that leads to the NPU. The MACs each provide intelligent mapping of 24 ports to eight SPI-4.2 channels, while maintaining per-port flow control. While the switch provides no additional oversubscription-related processing, it is able to aggregate the streams for one NPU and to use its intelligent flow control capabilities to propagate the QoS through to the NPU.
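The two-stage aggregation can be pictured as a pair of index calculations. The article does not say how a MAC assigns its 24 ports to eight channels, so the sketch below assumes the simplest scheme, in which three consecutive ports share a channel, and assumes that the switch places the second MAC's channels directly after the first's. The function names are hypothetical.

```python
PORTS_PER_MAC = 24
CHANNELS_PER_MAC = 8
MACS_PER_NPU = 2


def mac_channel(port: int) -> int:
    """SPI-4.2 channel a MAC port uses (assumed: 3 consecutive ports per channel)."""
    if not 0 <= port < PORTS_PER_MAC:
        raise ValueError("port out of range")
    return port // (PORTS_PER_MAC // CHANNELS_PER_MAC)


def npu_channel(mac: int, port: int) -> int:
    """Channel on the 16-channel SPI-4.2 link from the switch to the NPU."""
    if not 0 <= mac < MACS_PER_NPU:
        raise ValueError("mac out of range")
    return mac * CHANNELS_PER_MAC + mac_channel(port)


# Port 23 of the second MAC lands on the last of the 16 NPU-side channels.
print(npu_channel(mac=1, port=23))  # 15
```

Keeping the mapping a pure function of MAC and port is what lets flow-control status for an NPU-side channel be traced back to the ports that feed it.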
In the end, the design takes a device that would otherwise rely on up to eight 10 Gbps NPUs, each running at a fraction of its capacity, and consolidates it onto two 10 Gbps NPUs, putting this higher-layer processing capability within reach of even the lowest-cost switch devices.
To meet the needs of today’s corporations, which want faster response times and support for new data types, networking companies are designing their devices to better groom data as it enters the network, resulting in a new breed of edge devices with sophisticated processing. This is where statistical multiplexing and oversubscription can make a big difference in providing packet processing at the lowest possible cost. Taking a holistic look at the interplay of MAC and interconnect in these systems can help the system designer maximize that capability.