When it comes to processing wire-speed packets with a time-to-market that ASIC system designers will envy, network processors promise to beat the pants off of conventional processors. Envelope-pushing network-hardware designers simply have to decide which network processor to use. But network-processor architectures are so varied, designers need to choose carefully.
Designers also must determine which network processors will address their final product's market. The term network processor has been applied to a range of products, from Ethernet-to-DSL controllers to terabit switching. Many favored processors address mid- to high-end switching with speeds that range from OC-3 (155-Mbit/s) enterprise solutions to OC-192 (10-Gbit/s) channels used by ISPs and carriers. This range includes OC-12 (622 Mbits/s), OC-48 (2.4 Gbits/s), and Gigabit Ethernet. OC-768 is a long-term target for many network-processor vendors.
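The OC-n figures quoted above follow from the SONET hierarchy, in which an OC-n line runs at n times the 51.84-Mbit/s OC-1 rate. A short Python sketch reproduces them; the function name is only illustrative, and the results are raw line rates, so usable payload bandwidth is somewhat lower.

```python
OC1_MBITS = 51.84  # SONET OC-1 line rate in Mbit/s


def oc_rate_mbits(n: int) -> float:
    """Return the raw line rate of an OC-n link in Mbit/s."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_MBITS


if __name__ == "__main__":
    # Print the levels mentioned in the article.
    for level in (3, 12, 48, 192, 768):
        print(f"OC-{level}: {oc_rate_mbits(level):,.2f} Mbit/s")
```

The rounded values in the text (155 Mbits/s, 622 Mbits/s, 2.4 Gbits/s, 10 Gbits/s) are these rates truncated for convenience.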
Operating at wire speeds is critical. When comparing alternatives, though, designers must match network-processor vendors' performance claims against the services actually supported. Many vendors provide performance numbers based on basic routing and switching policies addressed by layers 2 through 4 of the International Organization for Standardization's (ISO's) Open Systems Interconnection (OSI) reference model.
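To see why wire speed is so demanding, it helps to work out how much time a processor has per packet. The sketch below is a rough estimate only. It assumes back-to-back 40-byte packets (a common worst-case figure for IP traffic), ignores framing overhead, and uses a hypothetical 200-MHz clock to convert the budget into cycles. None of these figures comes from a vendor.

```python
def packet_budget_ns(line_rate_bits_per_s: float, packet_bytes: int) -> float:
    """Time available per packet, in nanoseconds, for back-to-back packets."""
    if line_rate_bits_per_s <= 0 or packet_bytes <= 0:
        raise ValueError("line rate and packet size must be positive")
    return packet_bytes * 8 / line_rate_bits_per_s * 1e9


def cycles_per_packet(budget_ns: float, clock_hz: float) -> float:
    """Clock cycles that fit into the per-packet time budget."""
    return budget_ns * 1e-9 * clock_hz


if __name__ == "__main__":
    oc48 = 2.48832e9  # OC-48 raw line rate in bit/s
    budget = packet_budget_ns(oc48, packet_bytes=40)  # assumed minimum packet
    print(f"Budget per packet: {budget:.1f} ns")
    print(f"Cycles at 200 MHz: {cycles_per_packet(budget, 200e6):.1f}")
```

Under these assumptions, a single 200-MHz engine has only a few dozen cycles per packet at OC-48, which is why network processors spread the work across many engines and coprocessors.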
The more robust services associated with layers 5 through 7 include address-content switching, URL switching, security, load balancing, service-level agreements (SLAs), network address translation (NAT), multiprotocol label switching (MPLS), and voice-over-IP (VoIP). Supporting these services often reduces the maximum bandwidth that the network processor can sustain. Some solutions utilize multiple network processors while others restrict designs to a single network processor.
Though the enterprise network-processor space is at the low end of the performance scale, it's usually at the high end of the service scale. Some solutions approach the commodity level. For example, Switchcore's CXE-16 is a 16-port Gigabit Ethernet switch/router on a chip. Just add some Rambus RDRAM, some optional content-addressable memory (CAM), and an external control processor for a complete solution. There is room to add value, but not nearly as much as with some of the more expensive alternatives.
The IXP1200 Internet Exchange Processor by Intel is the quintessential network processor. Its six integrated programmable microengines have hardware context support (32 registers) for four threads for a total of 24 active threads (Fig. 1). An on-chip 200-MHz StrongARM processor coordinates system activities, although a PCI-bus interface provides integration with an external control processor.
The idea behind the hardware context switch is to maintain high utilization of resources, such as the built-in coprocessors and memory-access subsystem. Keeping every thread productively busy can be a complex programming task. A single IXP1200 may suffice, but multiple chips can be combined using a variety of bus architectures to increase throughput and functionality. This is another area that lets designers differentiate their product from the competition when providing layer-2 through layer-7 services.
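The benefit of hardware context switching can be illustrated with a textbook latency-hiding model. The sketch below is idealized and is not based on Intel's figures. It assumes every thread alternates between a fixed number of compute cycles and a fixed memory wait, that context switches cost nothing, and that the memory system never saturates. The cycle counts are hypothetical.

```python
def engine_utilization(threads: int, compute_cycles: int, wait_cycles: int) -> float:
    """Fraction of cycles an engine spends computing under the idealized model.

    Each thread computes for `compute_cycles`, then stalls for `wait_cycles`.
    While one thread stalls, another ready thread runs at no switching cost.
    """
    if threads < 1 or compute_cycles < 1 or wait_cycles < 0:
        raise ValueError("invalid thread count or cycle counts")
    return min(1.0, threads * compute_cycles / (compute_cycles + wait_cycles))


if __name__ == "__main__":
    # Hypothetical workload: 10 cycles of work per 30-cycle memory access.
    for n in (1, 2, 4):
        print(f"{n} thread(s): {engine_utilization(n, 10, 30):.0%} busy")
```

In this model a single thread leaves the engine idle most of the time, while four threads are just enough to cover the assumed memory latency. Real workloads are less regular, which is part of what makes the programming task difficult.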
The current offering from Allayer Technologies addresses layers 2 through 4. The AL100 uses a ring-of-switches (ROX) bus architecture. The bus supports up to four network processors, plus additional coprocessors that offer switch management and other services. Multiple processors provide incremental improvement and migration to the next-generation AL3000. The 12.8-Gbit/s ROX-II bus will supply a major leap in performance.
Under the watchful eye of an on-chip PowerPC core, IBM Microelectronics packs 16 programmable protocol processors into its network processor. A PCI control-bus interface provides access to an external control processor. The protocol processors are paired to share on-chip hardware coprocessors that accelerate tree searching and frame manipulation. IBM's design utilizes less-expensive DDR DRAM while supporting OC-48 rates. As with most network processors, the amount of traffic that the device can handle depends upon the type of analysis performed on each frame. Higher-level protocols force the processors to look further down into the data, which reduces throughput.
The CS2000 reconfigurable communication processors (RCPs) from Chameleon Systems offer an interesting alternative to the fixed configuration of most network processors (see "Scalable, Reconfigurable Processor Adjusts Logic For Top Performance," Electronic Design, May 15, p. 66). They consist of a dozen identical but configurable tiles, which are organized into four slices, for data processing (Fig. 2). Each tile has a control unit, seven 32-bit datapaths, two 16- by 24-bit single-cycle multipliers, and four 32-bit, 128-word memory blocks. The CS2000 also has 16 DMA engines, and its components are tied to the internal 128-bit RoadRunner system bus.
The ability to reconfigure all or part of the CS2000 on-the-fly impacts the overall system design as well as individual algorithms. Unfortunately, configuration switching isn't instantaneous, thereby limiting the throughput that can be handled under these circumstances. Even so, it allows efficient processing algorithms to be implemented with customizable hardware that usually performs better than software designed to perform the same job.
The high-end network-processor space maximizes performance. Compared to enterprise solutions, though, functionality is often sacrificed for speed. Still, features like quality of service (QoS) are usually included.
Formerly known as Agere Inc., Lucent Network Processors uses a pair of chips to handle OC-48 rates: the Fast Pattern Processor (FPP) and the Routing Switch Processor (RSP). Many OC-48 products actually support four OC-12 ports. Lucent indicates that its design should scale to OC-192 rates. It seems a good bet given that the FPP consists of only 4 million transistors, versus 53 million for C-Port. The company keeps costs down by utilizing PC133 SDRAM for off-chip memory. The PCI interface handles external management.
The FPP is programmed using a functional programming language that suits the chip's architecture better than it would a more conventional processor or a state machine. Functional programming may be new to many designers. Lucent, however, supplies programming samples, multiprotocol routing, and the segmentation and reassembly (SAR) needed for IP over asynchronous transfer mode (ATM), one of the target markets.
Processing is often complete when the frame is ready for routing because the FPP performs pattern matching while receiving a frame. The chip manages up to 64 threads at one time. Lucent is initially targeting layers 2 through 4, but the FPP can support up to layer 7. Throughput depends on how deep into the frame a program must look, not how many patterns are available for matching.
The C-Port by Motorola uses a more conventional high-speed bus architecture with its C-5 digital communications processor (DCP). The C-5 has 16 channel processors plus five coprocessors, including an executive processor, a fabric processor, a table lookup unit, queue management, and buffer management (Fig. 3).
The 16 channel processors can be used individually, organized in a bank to handle a data stream in parallel, or organized serially with each processor handling a different task. Multiple C-5 DCPs can be connected in parallel to manage even more traffic, although OC-48 is a C-5's limit. C-Port's conventional multiprocessor design makes programming the C-5 DCP easier to learn and manage. This can improve the time-to-market, but it may limit the differentiation with competitors that also use the C-5 DCP.
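The trade-off between a parallel bank and a serial arrangement can be sketched with a simple throughput model. This is an illustration under stated assumptions rather than a description of C-Port's scheduling. Per-packet work is split into stages with hypothetical fixed times, processors never contend for shared resources, and hand-offs between stages are free.

```python
from typing import Sequence


def parallel_throughput(stage_times: Sequence[float], processors: int) -> float:
    """Packets per time unit when each processor runs every stage on its own packet."""
    if processors < 1 or not stage_times:
        raise ValueError("need at least one processor and one stage")
    return processors / sum(stage_times)


def pipeline_throughput(stage_times: Sequence[float]) -> float:
    """Packets per time unit when one processor is dedicated to each stage."""
    if not stage_times:
        raise ValueError("need at least one stage")
    return 1.0 / max(stage_times)


if __name__ == "__main__":
    stages = [2.0, 4.0, 2.0]  # hypothetical stage times in microseconds
    print("Bank of 3:    ", parallel_throughput(stages, 3), "packets/us")
    print("3-stage chain:", pipeline_throughput(stages), "packets/us")
```

With the same three processors, the serial chain is limited by its slowest stage, while the bank is limited by the total work per packet. When the stages are evenly balanced, the two arrangements deliver the same throughput in this model.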
A routing engine built by Entridia employs a hardware state machine. It's less flexible in terms of handling new protocols and features that aren't currently defined, but it can push bits faster than most programmable solutions. Data moving through the system is deterministic with low latency. Meanwhile, Entridia's Optical Edge Routing Architecture (OPERA) provides OC-12 support. OC-768 is promised by the end of the year. Even at these speeds, Entridia's IP routing supports QoS.
The MXT5100 Edge Stream Processor (ESP) by Maker Communications Inc., a part of Conexant Systems, provides layer 2 processing. It employs a 16-bit SWAN RISC processor and operates at OC-3 speeds. The faster MXT4000 uses a 32-bit processor and operates at OC-48 speeds. Its high-speed CAM-like search improves performance without a CAM's hardware overhead.
Built by Sitera, the IQ2000 claims OC-48 performance. Sitera says its architecture is scalable to OC-192 speeds. It has a fairly conventional multiprocessor design with four built-in channel processors. Plus, it has specialized lookup, context-management, DMA-management, order-management, and multicast-support coprocessors connected by a 50-Gbit/s internal bus architecture. The IQ2000 also supports external application-specific processors. Third-party processors are available for a variety of services, from policy management to encryption. Sitera is providing direct RAMBUS and SDRAM support for external memory as well. Like the C-5 DCP, the IQ2000 can be used alone or in a group.
PaceMaker 2.4, manufactured by Vitesse, is designed for OC-48 speeds. It incorporates ATM AAL5 SAR support and manages traffic for layers 2 through 4. Its hardware state machine is built for ATM and IP support. Additionally, it handles up to 256k sessions.
Another Vitesse product, the TeraPOWER, increases the power and feature scale. It's designed for layers 2 through 7 at OC-192 wire speeds. Multiple active flow processors (AFP) are optimized for different jobs, such as parsing, searching, classification, editing, and queuing. And, the TeraPOWER uses standard 64-bit SSRAM, 64-bit SDRAM, and 32-bit CSIX interfaces.
One company, Stargate Solutions Inc., takes a different approach to delivery. Its Scalable Tile ARchitecture (STAR) Packet and Protocol Processor consists of approximately 3500 ASIC gates per tile. Stargate's SmartConnect links STAR tiles. A simple instruction set manages packet and protocol processing for the programmable state machine in each tile. The instruction set is optimized for networking protocols such as IPv4, IPv6, IP-over-SONET, MPLS, ATM, and Frame Relay.
The current incarnation of the STAR tile handles OC-48 speeds, including ATM SAR support. Future versions will support OC-192 and higher. The advantage of Stargate's approach is customization. An entire system can be placed on an ASIC. Tiles can be surrounded by custom support logic. The downside, though, is significantly higher design-expertise requirements, creation and test of an ASIC, and custom programming and configuration.
Pushing the envelope is the name of the game. MMC Networks Inc. and EZChip have their sights set on the very high end of the market. MMC Networks' nPX family should reach OC-192 speeds by 2001. The network processor will be built around six network-optimized instruction-set RISC processors called nPcore. EZChip is a bit more specific about its NP-1 design, which brings a superscalar, data-flow design to network processing (Fig. 4). While its PCI control interface is conventional, the NP-1's ambitious on-chip 512-bit wide memory will significantly boost performance.
EZChip designed task-optimized processors (TOPs) for four basic functions: parsing, searching, frame modification, and address resolution. Each provides better than a 10:1 improvement over a conventional network-optimized RISC processor. For instance, the search TOP has search enhancements and support for compressed trees while using an enhanced hashing algorithm.
Additionally, TOP dispatching is made transparent to the programmer as a frame winds its way through each stage. Multiple TOPs can be applied at each stage, depending on the program. The current distribution of TOPs at each stage is based on estimated use for typical network environments. EZChip estimates that the NP-1 will easily handle traffic at OC-192 speeds. This includes supporting features through layer 7.
Some network processors like Sitera's IQ2000 support external coprocessors. Many other network processors are self-contained except for access to the switch fabric, memory, and a control processor. They also can utilize these coprocessors via the control processor. In either case, coprocessors often can significantly improve overall system performance. Sometimes, the network processor may already incorporate support that some external coprocessors provide.
The Internet Protocol Routing Processor (IPRP) by Alliance Semiconductor is a device for managing routing tables. It's programmed across an 8-bit command bus, and its instruction set features about 30 instructions for database search and management. One IPRP can support three OC-192 channels.
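The core operation that a routing-table device accelerates is the longest-prefix match. The Python sketch below shows that operation in software with a simple binary trie. It is a generic illustration for IPv4 only, and it doesn't represent the IPRP's instruction set or internal data structures. The class name and the sample routes are hypothetical.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Binary trie that maps IPv4 prefixes to next hops."""

    def __init__(self) -> None:
        # Each node is a dict with optional keys "0", "1" (children) and "hop".
        self.root: dict = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address: str) -> Optional[str]:
        """Return the next hop of the longest matching prefix, or None."""
        bits = format(int(ipaddress.IPv4Address(address)), "032b")
        node = self.root
        best = node.get("hop")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("hop", best)  # remember the deepest match so far
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "port-A")
    table.insert("10.1.0.0/16", "port-B")
    print(table.lookup("10.1.2.3"))  # most specific route wins
```

A software trie like this walks one bit per step, so lookup time grows with prefix length. Dedicated search hardware exists to take that work off the network processor.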
The Policy Co-Processor family by NetLogic Microsystems provides sophisticated policy-address classification with CAM-like support. It's ideal for firewall and virtual private-network (VPN) support.
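CAM-like classification is usually described in terms of ternary matching, where each rule carries a value and a mask of bits that matter. The sketch below shows the idea in software. It is a generic illustration, not NetLogic's design, and the rule set is hypothetical. A hardware CAM compares all entries at once, whereas this loop checks them one by one in priority order.

```python
from typing import NamedTuple, Optional, Sequence


class TernaryRule(NamedTuple):
    value: int   # bit pattern to match
    mask: int    # 1 bits must match, 0 bits are "don't care"
    action: str


def classify(key: int, rules: Sequence[TernaryRule]) -> Optional[str]:
    """Return the action of the first rule that matches the key, or None."""
    for rule in rules:
        if (key & rule.mask) == (rule.value & rule.mask):
            return rule.action
    return None


if __name__ == "__main__":
    # Hypothetical policy keyed on a 32-bit IPv4 destination address.
    policy = [
        TernaryRule(0x0A010000, 0xFFFF0000, "vpn"),      # 10.1.x.x
        TernaryRule(0x0A000000, 0xFF000000, "inspect"),  # 10.x.x.x
        TernaryRule(0x00000000, 0x00000000, "permit"),   # catch-all
    ]
    print(classify(0x0A010203, policy))  # 10.1.2.3
```

Because the first match wins, more specific rules are placed ahead of broader ones, which mirrors how entries are typically prioritized in a CAM.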
Another classification coprocessor designed for OC-192 speeds, Solidum Systems' PAX.port 1100, integrates 16 classification engines. The engines are programmed using a high-level, nonprocedural pattern description language (PDL). Protocol libraries are available. Bandwidth is independent of the number of classification entries.
A cryptographic accelerator designed by Chrysalis-ITS meets FIPS 140-1 level 3 requirements. The Luna 340 packs four RISC processors on the chip. It supports dozens of private- and public-key encryption algorithms. Also, it can sustain OC-3 rates for IPSec-ESP and ATM encryption. The Luna 340 can handle bulk DES encryption at Gigabit Ethernet speeds, as well.
Two standards organizations may help bring some order to the market. Most of the companies in the network-processor space are members of one or both organizations.
The Common Switch Interface Forum was started to support multivendor interoperability for an OC-192 signaling system known as CSIX. This system doesn't use out-of-band signaling. It's a board-level definition that allows network-processor and support chips from different sources to be connected with minimal support.
The second group is the Common Programming Interface (CPIX) Forum. CPIX's goal is to promote application-programming interfaces (APIs) between control and network processors. It will be interesting to see how successful CPIX will be, given the architectural diversity of network processors.
These network processors cover a lot of ground. But it's clear that designers will have a number of models to choose from, regardless of the target market for their switch design. The flexibility offered by network processors will let vendors distinguish their products while enhancing software.
Companies Mentioned In This Report
Conexant Systems (Maker
Intel Corp. Network
MMC Networks Inc.
NetLogic Microsystems Inc.
Solidum Systems Inc.
Stargate Solutions Inc.
Vitesse Semiconductor Inc.