Network Processors Evolve To Meet Future Line Speeds

Sept. 17, 2001
Design experience with earlier network processors yields new improved products and techniques.

Continuous change in network speeds and services has forced network equipment designers and network-processor (NP) vendors to re-evaluate their designs. The quest is on for new and better ways to handle not only the higher speeds, but also the ever-increasing amount of network traffic.

Earlier generations of NPs or network-processing units (NPUs) were inadequate for line speeds greater than about OC-12 (622 Mbits/s). To meet the needs of those designing switches, routers, remote access servers, and other equipment for 1-Gbit/s and 10-Gbit/s Ethernet and Sonet OC-48, OC-192, and OC-768, designers are resorting to various new approaches, as semiconductor vendors provide a rich new mix of products and solutions.

An NP is a chip or chip set that performs packet processing at line speed. It may be a programmable RISC CPU, or multiple CPUs optimized for packet processing. An NP might also be one or more ASICs, a specialized chip, or a collection of chips that perform the desired functions. NPUs work at layers 2 through 7 of the OSI reference model. Earlier routers and switches worked at layers 2 and 3. But the growing number of new services, such as quality of service (QoS), differentiated services, and multiprotocol label switching (MPLS), now requires processing through layer 7. The result is longer packets that take more time to classify and process. Moreover, with line speeds increasing faster than Moore's Law improves CPU performance, processing packets at line speed has become excruciatingly difficult.
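
To put the difficulty in numbers, the short Python sketch below estimates how much time an NP has for each packet at a given line rate. The rates are nominal values with framing overhead ignored, the 40-byte minimum packet is an assumed worst case, and the function names are illustrative only.

```python
# Nominal line rates in bits per second. Framing overhead is ignored,
# which is a simplifying assumption.
LINE_RATES_BPS = {
    "OC-12": 622.08e6,
    "OC-48": 2488.32e6,
    "OC-192": 9953.28e6,
    "OC-768": 39813.12e6,
    "10GE": 10e9,
}


def packet_budget_ns(rate_bps, packet_bytes=40):
    """Return the time in nanoseconds that one packet occupies the line."""
    return packet_bytes * 8 / rate_bps * 1e9


def packets_per_second(rate_bps, packet_bytes=40):
    """Return how many back-to-back packets arrive per second in one direction."""
    return rate_bps / (packet_bytes * 8)


if __name__ == "__main__":
    for name, rate in LINE_RATES_BPS.items():
        print(f"{name}: {packet_budget_ns(rate):.1f} ns per packet, "
              f"{packets_per_second(rate) / 1e6:.2f} Mpackets/s")
```

Under these assumptions, a 40-byte packet at a nominal 10 Gbits/s leaves 32 ns of processing time, which corresponds to 31.25 Mpackets/s in each direction.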

An NP is a basic component of a typical router or switch in a line or port card (Fig. 1). The card includes the physical (PHY) layer, usually fiber optic components with appropriate serializer/deserializer (SERDES) transceivers. This is followed by a framer that deals with the specific protocol used, such as Ethernet, Sonet, or ATM. The resulting packets are sent to the processing circuits. The switch fabric follows the processing circuits and connects the line card to other line cards. This path is called the datapath or data plane. Note the bus connection, usually PCI, to an embedded RISC processor, which implements the control path or plane.

The data plane functions are:

  • Pattern matching and packet classification
  • Packet processing or data modification
  • Traffic or queue management and traffic shaping
  • Security

The control plane functions are:

  • Set-up/tear-down
  • Table updates
  • Register/buffer management
  • Exception handling
  • Statistics gathering

Earlier equipment used one or more fast custom ASICs to perform packet processing at line speed (Fig. 2a). This approach lacks flexibility, but it is still valid today and remains the best way to achieve line speeds of OC-192 and up.

If the NPU is fast enough, it can handle all of the datapath functions described above by itself (Fig. 2b). Current processors can handle line speeds up to about OC-12, with OC-48 on the way.

In the newest configuration, an NPU performs many of the operations, while several complex and time-consuming operations are offloaded to coprocessors or specialized chips, typically content-addressable memories (CAMs) for packet search and classification, and traffic-manager chips. Most current designs use this method (Fig. 2c).

Equipment manufacturers initially solved the line-speed processing problem by using specialized ASICs. Proprietary chips, however, take time to develop (12 to 24 months), cost a lot, and are structurally rigid. Once fabricated, an ASIC is fixed. It can't be easily changed to support new protocols, add new functions, or handle unexpected changes or upgrades.

The network processor was developed to solve this problem. Programmability promised fast and relatively easy changes and upgrades. Most NPUs feature two or more fast on-chip RISC processors (ARM, MIPS, and proprietary) that can perform a variety of packet processing operations in parallel. Although a parallel-processing approach works, programming such processors while maintaining context and tracking threads isn't trivial.

After nearly two years of experience in designing with NPUs, engineers and vendors are beginning to realize how to best deploy these elusive new chips. Furthermore, designers are discovering that the best way to keep up with the ever-increasing line speeds is to offload some functions to external chips. Generally called coprocessors, they not only speed up the design, but also meet the needs of higher data rates.

One thing is certain. No one size or type of NPU fits all situations. Too many variables exist in equipment types, applications, specifications, and protocols. Vendors use a variety of architectures and hardware/software to meet their needs.

Agere Systems is a major player in NPUs. Rather than taking a conventional RISC CPU/co-processor approach, Agere developed a complete three-chip solution that fits between the framer and the switch fabric. Packets out of the framer (32 bits wide) go into the first chip, the fast pattern processor (FPP). It performs a patented search similar to that used in CAMs in conjunction with an external SRAM. If necessary, it performs packet recognition, classification, filtering, functional processing, and assembly. This unique chip is fully programmable in a specially compiled higher-level language called the functional programming language (FPL), which makes programming quite fast and easy.

Only six lines of code, for example, are needed to implement a simple IPv4 router using the Agere chip set (see the code listing). Programming this function in assembly language might take several hundred lines of code. As Agere manager of NP marketing John Rolfe says, "Design or changes can be accomplished in minutes or hours by an experienced FPL programmer versus days, weeks, or months it takes an assembly language program."

The FPP output passes over a POS-PHY Level 3 bus to the routing switch processor (RSP). The FPP and RSP chips are tied together by the Agere Systems Interface (ASI) chip, which also provides the PCI interface to the control plane processor. The chip set handles virtually any type of traffic (Sonet, POS, frame relay, ATM, MPLS) and runs at OC-48 speed. The latest version runs at 10 Gbits/s, making the set a good choice for OC-192 and 10-Gbit/s Ethernet applications.

Perhaps the oldest NPU firm is MMC Networks, which coined the term "network processor." Now part of AMCC, the company is in the third generation of its nPcore NPU, which features a lean 32-bit RISC-like architecture optimized for packet processing. The original 50-MHz chip was upgraded to 200 MHz in the second generation, and third-generation chips have two CPUs running at 200 MHz. The most recent version, the nP7250, also contains on-chip coprocessors for packet classification, search, policing, and statistics. An even faster version, expected later this year, will have more CPU cores running at a higher clock rate and will handle OC-192 rates.

NP newcomer Internet Machines recently announced its IMpower network-processor and switch-fabric solutions. This product line consists of the NPE10 network processor, the TMC10 traffic manager chip, the SE200 switch fabric, and the IMC Development Workbench software.

The NPE10 is a set of massively parallel 32-bit RISC processors presented to the programmer as a single-processor, single-threaded programming model. The processor is designed to separate the header from the payload in an incoming packet. Because the header identifies the protocol, this arrangement makes the system virtually protocol independent. The processor can achieve up to a 50-Mpacket/s throughput, suiting it for OC-192/10GE full-duplex systems. Output from the NPE10 is sent to the TMC10 traffic manager chip via an OIF SPI-4 phase 2 interface, which offloads traffic management from both the NPU and the programmer. The TMC10 feeds the SE200 switch-fabric IC.

One of the hottest new NPUs, PMC-Sierra's RM9000x2, features two 1-GHz MIPS processors. A super-fast 32-bit switch lets the CPUs share a high-speed memory fabric for internal communications. The chip has a DDR controller for the external DRAM, a 200-MHz MIPS SysAD bus, and a 1-GHz HyperTransport bus.

It must be emphasized that the original, traditional ASIC approach to network processing is far from dead. Majid Benanian, marketing vice president for the networking enterprise division, and Denny Sharf, strategic marketing manager for the broadband networking division at LSI Logic, say ASIC designs still have plenty of life left, providing ongoing work for many networking-equipment vendors. When a custom design is the best and the highest speeds are needed, the ASIC approach is still valid.

The fastest and easiest way to implement an NPU design and achieve the desired line rate, while also providing lots of headroom for future changes, is to delegate some of the more processing-intensive functions to external chips. In this article, coprocessor refers to different circuits for packet classification, traffic management, security, and other functions. One of the most widely implemented coprocessor types is the CAM. It's widely used for packet search, lookup, and classification (see "Use CAMs To Increase Line Speeds," p. 74).
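
To make the CAM's role concrete, here is a minimal Python model of a ternary CAM lookup, in which each entry stores a value and a mask and the first matching entry wins. It is only a behavioral sketch: a real CAM compares all entries in parallel in hardware, and the class and method names here are hypothetical, not any vendor's API.

```python
import ipaddress


class TernaryCam:
    """Behavioral model of a ternary CAM: entries are (value, mask, result)."""

    def __init__(self):
        self.entries = []  # kept in priority order; earlier entries win

    def add(self, value, mask, result):
        # Bits outside the mask are "don't care", so clear them in the value.
        self.entries.append((value & mask, mask, result))

    def add_prefix(self, cidr, result):
        """Convenience helper: load an IPv4 prefix such as '10.1.0.0/16'."""
        net = ipaddress.ip_network(cidr)
        self.add(int(net.network_address), int(net.netmask), result)

    def lookup(self, key):
        # Hardware checks every entry at once; this loop only models priority.
        for value, mask, result in self.entries:
            if (key & mask) == value:
                return result
        return None

    def lookup_ip(self, address):
        return self.lookup(int(ipaddress.IPv4Address(address)))


if __name__ == "__main__":
    cam = TernaryCam()
    cam.add_prefix("10.1.0.0/16", "port2")  # most specific entry first
    cam.add_prefix("10.0.0.0/8", "port1")
    cam.add_prefix("0.0.0.0/0", "cpu")      # catch-all entry
    print(cam.lookup_ip("10.1.2.3"))
```

Because the first match wins, more specific entries must be loaded ahead of broader ones, which is why the catch-all entry goes last.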

Another major coprocessor category is traffic-management chips designed to fit between the NP and the switch fabric. While traffic management can be programmed in an NPU, doing it separately in hardware increases speed.
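
As an illustration of what traffic shaping involves, the Python sketch below implements a token bucket, a common textbook rate-limiting technique. The article does not say which algorithms any of these chips use, so this is a swapped-in example; the class name and parameters are assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Token-bucket shaper: sustained rate in bytes/s with a bounded burst."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call, in seconds

    def allow(self, packet_bytes, now):
        """Return True if the packet may be sent at time `now` (seconds)."""
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: a shaper would queue or drop it


if __name__ == "__main__":
    shaper = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
    print(shaper.allow(1500, now=0.0))  # the initial burst fits
    print(shaper.allow(100, now=0.0))   # bucket is empty, so this is refused
    print(shaper.allow(1000, now=1.0))  # one second later, 1000 tokens are back
```

A dedicated traffic manager performs this kind of bookkeeping for many queues at once in hardware, which is the work being taken off the NPU.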

Acorn Networks' new genFlow-10G is an example of one such chip. Designed for multiprotocol processing and traffic shaping at 10 Gbits/s, it supports most popular switching and internetworking protocols, like Ethernet, IP, MPLS, ATM, and Frame Relay.

Mindspeed, the networking division of Conexant, now offers the CX27470 Traffic Stream Processor (TSP) and PortMaker software for coprocessor applications. The CX27470 TSP chip, along with the PortMaker firmware, provides a complete set of traffic-management functions. The TSP is protocol agnostic and can define traffic streams as packets or cells, allowing it to manage ATM, packet over Sonet (POS), frame relay, and MPLS applications from OC-3 to OC-48.

Music Semiconductor's Alto priority queue packet scheduler coprocessor prioritizes and sorts 65,536 queues across as many as 16 physical ports.

Solidum Systems is offering its PAX.port 1100 and 1200 application specific standard product (ASSP) coprocessor chips as an alternative to CAMs for packet classification. These chips fit between the framer and the NP to do packet parsing, an operation not provided by CAMs. The PAX.port 1100 and 1200 use a programmable state machine to read, identify, and tag packets on-the-fly. The chips can be easily configured with Solidum's software. Using the PAX fourth-generation Pattern Description Language (PDL), designers define the type of processing needed. The software compiles the designer's policy rules into a state graph and then generates the minimal possible state machine representation for the chip.
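
The idea of classifying packets with a state machine can be sketched in a few lines of Python. This is not PDL and not the output of Solidum's compiler. The table layout, state names, and tags are hypothetical, and the example assumes untagged Ethernet II frames carrying IPv4.

```python
# Each state reads one field at (byte offset, width) and picks the next state.
# Names starting with "TAG:" are terminal states that label the packet.
STATES = {
    "ethertype": (12, 2, {0x0800: "ip_proto", 0x0806: "TAG:arp"}, "TAG:other"),
    "ip_proto": (23, 1, {6: "TAG:tcp", 17: "TAG:udp"}, "TAG:ipv4-other"),
}


def classify(frame, state="ethertype"):
    """Walk the state table over a raw frame and return its tag."""
    while not state.startswith("TAG:"):
        offset, width, transitions, default = STATES[state]
        field = frame[offset:offset + width]
        if len(field) < width:
            return "truncated"  # frame ended before the field was available
        state = transitions.get(int.from_bytes(field, "big"), default)
    return state[4:]  # strip the "TAG:" prefix


if __name__ == "__main__":
    # 12 address bytes, EtherType 0x0800, then IPv4 protocol 6 at byte 23.
    tcp_frame = bytes(12) + b"\x08\x00" + bytes(9) + b"\x06" + bytes(10)
    print(classify(tcp_frame))
```

Adding a protocol means adding rows to the table rather than changing the walker, which is roughly the flexibility a programmable state machine offers over fixed logic.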

Security is another function that makes sense to offload to a coprocessor. Typical examples of such chips are the Packet Armor and Socket Armor families from Corrent Corp. These chips have the capacity to encrypt and decrypt up to 10 million packets per second to secure the exchange of electronic information over the Internet and VPNs. The Socket Armor chip is used for SSL applications. It can generate over 5000 1024-bit RSA keys/s. The Packet Armor chip is used for OC-48c full-duplex and OC-192 half-duplex security applications, providing full support for the IPSec standard.

Software To The Rescue

The concept behind NPUs is sound, but engineers have discovered that design is now a software problem rather than a hardware problem. Many cite the software problem as one of the key reasons for the rather slow adoption of NPUs in networking equipment. NPU vendors and several new companies are addressing this software problem, and the obvious opportunity it presents.

NPU suppliers are upgrading their software. Newer, more sophisticated software features simulators, debuggers, and other tools. Also, a few vendors have added C/C++ compilers. The programming effort is typically faster and simpler if the programmer can use C. But the resulting code is commonly much less efficient than hand-coded programs using an assembler. Nevertheless, some newer NPUs have sufficient headroom in performance to accommodate C code inefficiencies. C also makes faster prototyping possible to prove a design that can be fine-tuned later with an assembler.

NPU vendors are beginning to provide libraries of common packet-processing routines and APIs too. Designers get a head start because they don't have to reinvent the packet-processing wheel for common protocols.

Also getting involved are third-party vendors, such as LVL7 Systems with its recently announced FASTPATH software. FASTPATH implements a nearly complete set of networking software already ported to and integrated on popular NPU platforms. It includes the critical NP microcode and device drivers as well as various protocols and applications code. The software is modular and verified in real-life environments.

With FASTPATH, network vendors can begin their designs with 75% to 95% of the necessary software, and that can shave six months or more from design time. As Eric Dixon, vice president of business development at LVL7 Systems, says, networking equipment vendors are getting smarter as they're now looking at what software is available before committing to a vendor's NPU chip. FASTPATH is currently available for both AMCC/MMC and Vitesse NPUs. FASTPATH for other NPUs will be available in the near future.

Another fresh approach to the software problem comes from Teja Technologies. Its recently announced Network Processing Operating System (NPOS) promises to cut software development time by as much as 50% when working with selected target NPUs. The NPOS provides an environment for developing both control plane and data plane software. NPOS runs primarily on the embedded control plane processor, typically a PowerPC, MIPS, or other RISC processor under an RTOS like Linux or VxWorks. Code produced by the software runs on both the control plane processor and on the multiple data plane processors.

Teja's added value is that NPOS allows developers to express their network application logic as state machines independent from the target NPU. Programming is in a graphical flow format. The state machine logic is then mapped to target hardware by generating C, C++, or assembler code for the target NPU, resulting in a relatively fast, painless way to create code for the system. The company's initial version of NPOS is designed to support the Intel IXP1200 NPU. But Teja expects to offer versions for the most widely used NPUs from Motorola C-Port, IBM, and others.

Functional High-Level IP Code Example Using Agere's Functional Programming Language (FPL)
0x4:4 // Ensure IP version 4 header
0x5:4 // Standard header, no options
fSkip (120) //Skip to Destination Address
IpAddr ( ) //Get routing information
fSkipToEnd( ) //Replay all data from memory
exitTransmit ($4) //Transmit the PDU
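
For readers unfamiliar with FPL, the header checks that the listing appears to express can be sketched in Python. This is an interpretation rather than a statement of FPL semantics, and the function name is hypothetical. The two 4-bit patterns cover the first header byte, and skipping a further 120 bits lands on bit 128, where the IPv4 destination address begins.

```python
import ipaddress


def parse_ipv4_destination(header):
    """Return the destination address of a plain IPv4 header, or None."""
    if len(header) < 20:
        return None                # too short to hold a basic IPv4 header
    version = header[0] >> 4       # upper 4 bits, matched by 0x4:4
    ihl = header[0] & 0x0F         # lower 4 bits, matched by 0x5:4
    if version != 4 or ihl != 5:
        return None                # not IPv4, or the header carries options
    # 8 bits already consumed plus 120 skipped = bit 128, i.e. byte 16.
    return ipaddress.IPv4Address(bytes(header[16:20]))


if __name__ == "__main__":
    header = bytes([0x45]) + bytes(15) + bytes([10, 1, 2, 3])
    print(parse_ipv4_destination(header))
```

The returned address is the key a router would then look up to obtain routing information. That lookup and the transmit step are left out of the sketch.
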

Need More Information?

Acorn Networks (703) 736-9397
Agere Systems (800) 372-2447
Applied Micro Circuits Corp. (AMCC) (858) 535-4260
Corrent Corp. (480) 648-2300
IBM Microelectronics (845) 892-5389
Internet Machines (818) 575-2175
Kawasaki LSI USA Inc. (408) 570-0555
Lara Networks (408) 942-2026
LSI Logic (866) 574-5741
LVL7 Systems (919) 865-2722
Mindspeed Technologies (508) 621-0657
MOSAID Technologies Inc. (613) 599-9539
Music Semiconductors (408) 942-0837
Network Processor Forum
PMC-Sierra (408) 239-8000
SiberCore Technologies (613) 271-8100
Solidum Systems Corp. (613) 724-6004
Teja Technologies Inc. (408) 288-2560
Virage Logic Corp. (877) 360-6690
Vitesse Semiconductor (805) 388-7452

About the Author

Louis E. Frenzel
