For science-fiction aficionados, the holodeck rooms used by starship crew members in the various versions of Star Trek are the epitome of unbounded data bandwidth. Creating computer-generated, lifelike 3D images that can interact with their surroundings represents a flow of trillions of petabits of data every second. That's somewhere in the neighborhood of 10^27 bits/s to and from the sensors and "emitters" in the imaginary holodeck. That data combines images pulled from distributed storage with images generated on the fly.
Similarly, the famous "beam me up" transporter defines a process of moving solid matter nearly instantaneously from location to location. Such a process, though only imaginary, would involve the movement of equally large or still larger amounts of data in a matter of a few seconds.
Even though we're nowhere near the stage of implementing such capabilities, the amount of data that has to be captured, routed, and delivered without errors or losses is staggering. Even the best network switches available today are just starting to address terabit speeds. Ongoing research promises to push aggregate data rates to petabit-per-second levels over the next few years.
Achieving that higher bandwidth means overcoming challenges that typically fall into three categories: data capture/extraction, data transmission/aggregation, and data manipulation/routing. All three areas must improve.
In some ways, the Internet and World Wide Web are analogous to the starship: a large, distributed collection of computer systems that intercommunicate. But the bandwidth currently available on the web is severely limited, partly because of the unorchestrated collection of servers, software, and interfaces that handles all of the transactions. On the starship, in contrast, all systems are designed to operate in unison and thus deliver top-notch performance.
We also suffer from a technology limitation. We haven't really figured out how to go beyond the 10-Gbit/s SONET backbones that manufacturers are putting into place. Data rates of 40 Gbits/s and beyond will take many more years of research to put into commercial use. In the meantime, data-transfer speeds are outstripping silicon's ability to support the transfers using today's architectures (Fig. 1). New architectures and technologies must be brought to bear.
The physical transport of the data streams has typically been done over microwave links, coaxial cables, and optical fibers. But as data rates increase, copper- and microwave-based schemes rapidly run out of bandwidth and give way to optical fibers. Those fibers provide the streams with much higher bandwidth.
One of the oldest techniques to increase throughput is to divide the information into multiple channels and transmit all of the channels in parallel. In copper-based interconnects, that requires a separate cable for each channel, making the cables bulky, heavy, and expensive. Plus, signal crosstalk between adjacent channels can corrupt the data.
Solutions based on optical fiber don't suffer from that crosstalk but, until recently, could only handle a single data stream (one laser beam) per fiber. That again leads to bulky, multifiber cables and limited upgrade options to increase the capacity. Running a second multifiber cable would entail a considerable expense.
Wavelength division multiplexing (WDM) and its next generation, dense WDM (DWDM), promise ten- to thousand-fold increases in data throughput without adding fiber cables. Those increases come from two sources. First, the base data rate of each channel will rise. Second, more channels (laser beams) can be transmitted on a single fiber.
To deal with the first issue, higher-speed laser diodes, detectors, and the basic driver and logic circuits to support them are under development. Advances in material-deposition technology will allow companies to manufacture higher-performance gallium-aluminum-arsenide (GaAlAs) lasers that produce light at precise frequencies.
Better-optimized optical fibers are being developed as well. Tests by researchers at Lucent Technologies' Bell Laboratories have demonstrated the ability of a single laser diode to transmit 160 Gbits/s over 300 km of optical fiber. The single-wavelength system employs a semiconductor-based laser transmitter and demultiplexer. Its data-transmission speed represents a four-fold improvement over today's best commercially deployed system.
The optical signals were sent over the company's high-performance TrueWave RS fiber, which offers the lowest dispersion slope in the industry. With that low dispersion, the signal can retain its pulse shape over a longer distance, reducing the number of repeaters that must be used in the fiber's path when signals are sent over long distances.
By tuning multiple laser diodes so that each generates light at a different wavelength, designers found that they could send multiple beams down the same fiber simultaneously using WDM. The approach requires forming multiple laser diodes on the same substrate. The diodes are designed so that their output frequencies (wavelengths) are spaced a fixed interval apart, typically 50 to 100 GHz, to minimize interference. The beams are then multiplexed onto the single fiber. On the receiving end, they're demultiplexed and the signals are recovered.
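To make the channel-spacing figures concrete, here is a small sketch that lays out a few channels on a frequency grid and converts each one to a wavelength with λ = c/f. It assumes the ITU-T G.694.1 grid anchored at 193.1 THz, which the article does not mention; the function name `channel_grid` is illustrative.

```python
# Sketch: place DWDM channels on a frequency grid and convert each to a wavelength.
# Assumption: the ITU-T G.694.1 grid anchored at 193.1 THz; the article names no grid.

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
ANCHOR_HZ = 193.1e12        # assumed grid anchor frequency

def channel_grid(num_channels, spacing_hz):
    """Return (frequency in THz, wavelength in nm) for channels 0..num_channels-1."""
    grid = []
    for n in range(num_channels):
        freq_hz = ANCHOR_HZ + n * spacing_hz
        wavelength_nm = C_M_PER_S / freq_hz * 1e9
        grid.append((freq_hz / 1e12, wavelength_nm))
    return grid

if __name__ == "__main__":
    for spacing in (100e9, 50e9):
        grid = channel_grid(4, spacing)
        step_nm = grid[0][1] - grid[1][1]
        print(f"{spacing / 1e9:.0f}-GHz spacing, about {step_nm:.2f} nm between channels")
        for f_thz, lam_nm in grid:
            print(f"  {f_thz:.2f} THz -> {lam_nm:.2f} nm")
```

Near 1550 nm, a 100-GHz spacing works out to roughly 0.8 nm between channels and 50 GHz to roughly 0.4 nm, which is why tighter spacing demands such precise control of each diode's wavelength.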
Currently deployed WDM systems typically carry a dozen to several dozen channels, each capable of transferring data at several hundred megabits per second. Tightening the channel spacing lets more channels share a single fiber, but holding each diode to its assigned wavelength that precisely requires improved processing and device fabrication. Commercial systems with 50-GHz channel spacing and up to about 100 channels are now being deployed, again typically carrying up to several hundred megabits per second per channel.
Processing improvements permit designers to fabricate the laser diodes with much closer frequency spacing, letting systems pack even more channels on a single fiber. The current record, set late last year, spaces channels just 10 GHz apart. Researchers from Bell Labs demonstrated this ultra-dense WDM (UDWDM) system, which packs 1022 channels on a single fiber. Although the initial system only transmits data at rates of 37 Mbits/s per channel, the aggregate data-transfer rate rises to more than 37 Gbits/s (Fig. 2).
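The aggregate figure is simple multiplication, and the same arithmetic shows how much optical spectrum the channel comb occupies. A minimal sketch using the numbers quoted above (1022 channels, 37 Mbits/s each, 10-GHz spacing); counting one spacing slot per channel is a simplifying assumption.

```python
# Sketch: back-of-the-envelope numbers for the 1022-channel UDWDM demonstration.
# "Grid width" counts one spacing slot per channel, a simplification.

def aggregate_gbps(channels, per_channel_mbps):
    """Total payload rate in Gbits/s for equally loaded channels."""
    return channels * per_channel_mbps / 1000.0

def grid_width_thz(channels, spacing_ghz):
    """Optical bandwidth allocated to the channel comb, in THz."""
    return channels * spacing_ghz / 1000.0

if __name__ == "__main__":
    print(f"aggregate: {aggregate_gbps(1022, 37):.1f} Gbits/s")
    print(f"grid width: {grid_width_thz(1022, 10):.2f} THz")
```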
In this case, the emitters and detectors proved to be the limiting factor. Running all of the co-integrated laser diodes at top speed would generate a great deal of heat, so the demonstration used a lower data rate to keep power dissipation manageable.
As manufacturing techniques improve, the power consumed per channel will decrease, letting the diodes be modulated fast enough to reach OC-48 rates (2.488 Gbits/s) per channel. For a 1022-channel system, that adds up to a total fiber capacity of more than 2.5 terabits per second. Consider that 1 terabit/s is roughly equivalent to 20 million simultaneous, two-way telephone conversations, or to the text from more than 300 years' worth of daily newspapers. With that kind of per-fiber capacity, cables bundling hundreds or thousands of fibers could achieve petabit-per-second data capacities. Scotty, start warming up that transporter.
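Scaling the same 1022-channel fiber to OC-48 per channel, and then to multifiber cables, can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch; the bundle sizes are illustrative and not taken from any product.

```python
# Sketch: scale a 1022-channel fiber to OC-48 per channel, then to a multifiber cable.
# OC-48's exact line rate is 2488.32 Mbits/s; the bundle sizes are illustrative only.

OC48_GBPS = 2.48832

def fiber_tbps(channels, per_channel_gbps=OC48_GBPS):
    """Capacity of one fiber in Tbits/s."""
    return channels * per_channel_gbps / 1000.0

def bundle_pbps(fibers, per_fiber_tbps):
    """Capacity of a cable of identical fibers in Pbits/s."""
    return fibers * per_fiber_tbps / 1000.0

if __name__ == "__main__":
    per_fiber = fiber_tbps(1022)
    print(f"one fiber: {per_fiber:.2f} Tbits/s")
    for fibers in (100, 400, 1000):
        print(f"{fibers:5d} fibers: {bundle_pbps(fibers, per_fiber):.2f} Pbits/s")
```

By this arithmetic, one fiber carries about 2.54 Tbits/s, and a cable of roughly 400 such fibers would pass 1 Pbit/s.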
Between the laser diodes and the host system, or between the detectors and the host, lie the high-speed logic circuits. These circuits take parallel digital data and convert it into serial data streams (or deserialize incoming streams back into parallel words). Advances in CMOS technology now let circuits operate at gigahertz speeds. To hit the 2.5- and 10-GHz marks, however, BiCMOS and silicon-germanium (SiGe) technologies are starting to replace GaAs circuits, which until now have been the workhorse technology for gigahertz performance. Above 10 GHz, GaAs and other III-V materials remain the only technologies capable of operation.
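Serialization and deserialization are, at heart, a trade of data-path width against clock rate. The following is a minimal behavioral model, not a circuit design: it flattens parallel words into an MSB-first bit stream and back, and it omits the line coding, framing, and clock recovery that real SerDes parts need. The helper `serial_rate_mbps` shows the rate relationship; the 16-bit, 155.52-MHz example is one illustrative way to feed an OC-48 stream.

```python
# Minimal serializer/deserializer model: parallel words <-> MSB-first bit stream.
# Real SerDes parts also add line coding (such as 8b/10b), framing, and clock
# recovery; none of that is modeled here.

def serialize(words, width):
    """Flatten fixed-width parallel words into a list of bits, MSB first."""
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word:#x} does not fit in {width} bits")
        for i in range(width - 1, -1, -1):
            bits.append((word >> i) & 1)
    return bits

def deserialize(bits, width):
    """Regroup an MSB-first bit stream into fixed-width words."""
    if len(bits) % width:
        raise ValueError("bit count is not a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words

def serial_rate_mbps(width, word_clock_mhz):
    """The serial side must run `width` times faster than the parallel word clock."""
    return width * word_clock_mhz

if __name__ == "__main__":
    stream = serialize([0xA5, 0x3C], 8)
    print(stream)
    print([hex(w) for w in deserialize(stream, 8)])
    print(serial_rate_mbps(16, 155.52), "Mbits/s")  # 16-bit bus at 155.52 MHz
```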
Manipulating The Data
It's one thing to move data across distances between computers or central exchanges. But before data is sent or after it's received, an entirely different set of challenges will have to be conquered. Data must be packetized and provided with headers that direct it to desired locations. In some cases, it also must be secured via various data-encryption schemes.
On the receiving end, or at points along the route through the network, each packet must be identified by its header information. If a header isn't needed to route the packet beyond that point, it's stripped off and replaced with new header information for the next leg of the route. All of this must happen at the full rate of the incoming link, usually referred to as "wire speed," which today means gigabits per second.
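The header handling described above can be illustrated with a toy model: data is split into packets that each carry a header, and every hop strips the old next-hop header and writes a new one from its own table. The header fields, payload size, and function names here are hypothetical and chosen for readability; real protocols differ in detail.

```python
# Sketch of packetizing data and rewriting a per-hop header along the path.
# The header fields (dest, next_hop, ttl) and the tiny payload size are
# hypothetical and chosen only for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    dest: str        # end-to-end destination, left unchanged in transit
    next_hop: str    # local routing header, replaced at every hop
    ttl: int         # hop limit, decremented at every hop
    payload: bytes

def packetize(data, dest, first_hop, payload_size=4, ttl=8):
    """Split data into packets and attach a header to each one."""
    return [Packet(dest, first_hop, ttl, data[i:i + payload_size])
            for i in range(0, len(data), payload_size)]

def forward(pkt, route_table):
    """Strip the old hop header and write a new one from this node's table."""
    if pkt.ttl <= 1:
        return None  # hop limit exhausted: drop the packet
    return Packet(pkt.dest, route_table[pkt.dest], pkt.ttl - 1, pkt.payload)

if __name__ == "__main__":
    pkts = packetize(b"hello world", dest="hostB", first_hop="r1")
    hop2 = [forward(p, {"hostB": "r2"}) for p in pkts]
    print(hop2[0])
    print(b"".join(p.payload for p in hop2))
```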
This process requires high-performance RISC processors, specialized memories like content-addressable memories (CAMs), and new static-memory approaches. The latest SRAM schemes, such as zero-bus-turnaround (ZBT) and no-bus-latency (NoBL) designs, let designers build still faster hardware, and the new quad-data-rate (QDR) SRAMs should do the same once systems using them start to emerge later this year.
Products like these won't be fast enough for future systems, though. A new class of circuit has emerged to solve some of the problems. Dubbed "network processors," these controller chips will typically sit at the edge of the enterprise and manage the flow of data.
Two major classes of chips are usually needed. One controls the switch fabric that ties the networks together, while the other processes the data packets to make sure they get to the desired destination. In both of these areas, no real standards exist. The data-transfer area, in contrast, leverages standards such as SONET to ensure data flows without any incompatibilities. For the moment, more than two dozen companies are vying to become market leaders in these areas. They've come up with significantly different architectures to try to provide solutions for the control and management of the packets.
To help bring about some order and hopefully move the market forward, many of the companies have cooperated to form two organizations: the Common Switch Interface (CSIX) Forum and the Common Programming Interface (CPIX) Forum. The CSIX industry consortium has been drafting a specification that defines the physical and message layers of the interconnections between network processors and the switching fabrics. The forum refers to this function as traffic management.
Such a standard would allow equipment manufacturers to accelerate the design of complex, "chassis-based" communications systems through the use of off-the-shelf components. These components provide the processing power and critical connectivity between the interconnect and the system. The first version of the CSIX specification will support up to 4096 ports, with communication at speeds of up to OC-192 (about 10 Gbits/s). (For more info about CSIX, check out www.csix.org.)
Trying to find some common ground between communications processors and other data and telecommunications entities, the CPIX forum plans to define a series of standardized application programming interfaces (APIs). The organization wants these interfaces to be used by designers to program and interact with communications processors. By developing them, it hopes to make it easier for system manufacturers to design new products, while letting them retain the ability to select the best communications processor to meet design requirements. (For more about CPIX, check out www.cpix.org.) In general, CPIX will focus on software standards, while CSIX will target hardware standards.
Standards are gaining even more importance as voice and data networks merge into one unified network. Interest has boomed around the use of the Internet Protocol (IP) as a transport scheme for moving voice and/or video packets. But unlike data transfers for still images and text, which can tolerate the late arrival of some information, voice or video packets must arrive in a predictable way. Otherwise, the sound or image will be choppy or unintelligible.
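One common way to measure how predictably voice or video packets arrive is the interarrival-jitter estimator defined in the RTP specification (RFC 3550, section 6.4.1). The sketch implements that formula; the 20-ms packet interval and the arrival times are made-up illustration data.

```python
# Interarrival-jitter estimator from the RTP specification (RFC 3550, section 6.4.1).
# Times may be in any unit (milliseconds below) as long as both lists agree.

def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate J, smoothed with gain 1/16 as in RFC 3550."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: change in transit time between consecutive packets
        d = (arrival_times[i] - arrival_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

if __name__ == "__main__":
    sent = [0, 20, 40, 60, 80]             # one voice packet every 20 ms (assumed)
    steady = [5, 25, 45, 65, 85]           # constant 5-ms delay
    bursty = [5, 32, 41, 70, 84]           # same packets, variable delay
    print(interarrival_jitter(sent, steady))
    print(interarrival_jitter(sent, bursty))
```

A steady stream scores zero no matter how long the fixed delay is; only variation in delay, which is what makes voice choppy, raises the estimate.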
Until recently, high-performance RISC and CISC processors were at the heart of most network hardware. These 32- and 64-bit engines are software-driven, so their packet-handling ability is limited by the processor's instruction throughput and by the complexity of the tasks they must execute. The best of today's crop includes the UltraSPARC III from Sun Microsystems, which runs at 600 MHz. During demonstrations, the Alpha processor from Alpha Processor Inc. has clocked at 1 GHz. Various MIPS-based processors run at 400+ MHz, and x86-based CPUs from Intel Corp. and Advanced Micro Devices Inc. now hit 700 MHz and faster.
As a loose rule of thumb, David Sawyer, the president of Northchurch Communications, Andover, Mass., estimates that software-driven CPUs can provide about 1 Mbit/s of bandwidth for each MIPS of throughput. Unfortunately, no matter how fast CPUs get, that won't be enough. Bandwidth handling must increase by a factor of at least 1000 for systems to keep up with data demands.
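Sawyer's rule of thumb, and the time budget it implies, can be worked out directly. This sketch assumes minimum-size 40-byte packets arriving back to back and ignores link-layer framing overhead. At 10 Gbits/s such a packet arrives every 32 ns, only about 19 clock cycles of a 600-MHz processor, which shows why software-driven CPUs fall short.

```python
# Sketch: apply the rule of thumb (about 1 Mbit/s per MIPS) and compute the
# per-packet time budget at a given line rate. Assumptions: minimum-size
# 40-byte packets and no link-layer framing overhead.

def mips_needed(line_rate_mbps, mbps_per_mips=1.0):
    """MIPS a software-driven CPU would need under the rule of thumb."""
    return line_rate_mbps / mbps_per_mips

def ns_per_packet(line_rate_gbps, packet_bytes=40):
    """Time available to process each back-to-back packet, in nanoseconds."""
    return packet_bytes * 8 / line_rate_gbps   # bits / (Gbits/s) = ns

if __name__ == "__main__":
    for name, mbps in (("OC-48", 2488.32), ("OC-192", 9953.28)):
        print(f"{name}: about {mips_needed(mbps):,.0f} MIPS, "
              f"{ns_per_packet(mbps / 1000):.1f} ns per 40-byte packet")
```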
Even desktop computers and other future devices will demand data rates of 1 Gbit/s and higher. Plus, more and more transfers involve audio and video information and use multiple packet-transfer protocols. So quality-of-service (QoS) guarantees in Layers 3, 4, and 5 of the ISO Open Systems Interconnection (OSI) reference model will be essential to ensure smooth, loss-free receipt of data.
Rather than cluster all of the horsepower in a single high-performance CPU, truly scalable end-to-end networks must distribute the intelligence across all of the elements in the network. Doing so makes each element network-aware. The switches, routers, and other pieces of network hardware must all be programmable and able to evaluate data flow to improve network efficiency. As long as they have these qualities, networks can be dynamically configured and hence respond faster to changing traffic patterns.
Custom silicon solutions have long been used to provide the higher performance that off-the-shelf CPUs can't deliver. The gestation period for an ASIC is 9 to 12 months, though. Much shorter turnaround times, or an intermediate solution, are needed to simplify system design and shorten time-to-market.
Consequently, programmable-logic devices, like field-programmable gate arrays (FPGAs), are taking on a more vital role in the network market. Their short configuration turnaround times and reprogrammability (for SRAM and flash-based devices) allow easy system updates for adding new features or fixing a bug.
Until the last few years, FPGAs were typically limited to support roles, because their densities were too low to implement the complex circuitry needed for network control or data-packet manipulation. But the latest offerings from companies like Altera Corp. and Xilinx Inc. break that mold, giving designers devices with capacities hitting 2 million gates. Still higher-capacity devices are on the drawing boards, as process feature sizes drop below 0.18 µm.
As mentioned earlier, circuits such as network processors, edge processors, switch fabrics, and specialty memories like CAMs and pattern-processing chips can provide the hardware portion of the solution. Then, system designers can concentrate on the control software and features that they must include in the network hardware. The standards fostered by the CSIX and CPIX forums should make that software even easier to develop.
An entire class of companies has evolved to meet the needs of such systems and make all of this a reality. The field does include a few long-established companies, such as IBM Corp., Intel Corp., and Texas Instruments Inc. But it's mostly composed of relatively new companies (typically five years old or younger) that specifically solve network-bottleneck problems. These include Agere, C-Port, Entridia, Extreme Packet Devices, Innovative Engineering, Maker, MMC Networks, NeoCore, PowerX, Sitera, Silicon Spice, T.sqware, and Xaqti (now part of Vitesse Semiconductor). And that's not even all of them. You can get a more complete list by checking the forums' web sites.
Other companies also are exploiting new CAM technologies. Kawasaki LSI, Lara Technology, Mosaid, Music Semiconductor, and NetLogic are among them, along with the ASIC and intellectual-property providers that have CAM functions in their design libraries.
The bandwidth needed by the networks increases as the data moves further from the desktop. At the desktop, for instance, data typically requires interfaces to ISDN, xDSL, and 10/100-Mbit/s Ethernet lines. At the corporate access point, the next level up, those lower-speed lines are generally aggregated onto OC-3 and OC-12 links (155 and 622 Mbits/s, respectively). The "edge" of the enterprise usually offers OC-12 and OC-48 interfaces (622 Mbits/s and 2.4 Gbits/s, respectively).
Interconnecting these enterprise organizations is the core backbone, which comprises both OC-48 and OC-192 (10-Gbit/s) SONET channels. As technology permits, the 10-Gbit/s interfaces will give way to 40-Gbit/s pathways and even faster solutions (Fig. 3).
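Every SONET level in this hierarchy is an integer multiple of the 51.84-Mbit/s STS-1 base rate, so the OC-n rates quoted above follow from one multiplication. A small sketch, including OC-768 (about 40 Gbits/s) as the next step beyond OC-192:

```python
# SONET optical-carrier line rates: OC-n runs at n times the 51.84-Mbit/s STS-1 rate.
# These are gross line rates; usable payload is somewhat lower because of overhead.

STS1_MBPS = 51.84

def oc_rate_mbps(n):
    """Line rate of OC-n in Mbits/s."""
    return n * STS1_MBPS

if __name__ == "__main__":
    for n in (3, 12, 48, 192, 768):
        rate = oc_rate_mbps(n)
        print(f"OC-{n:<4} {rate:>10.2f} Mbits/s  ({rate / 1000:.2f} Gbits/s)")
```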
At each of the above boundaries, designers face different challenges as they try to handle the ever-increasing quantity of data. Within the enterprise, for example, it's not enough to switch packets and prioritize them if you can't make any guarantees about delivering them. Systems must be able to identify specific critical traffic flows, such as data, video, voice, and lifeline, and then prioritize them in relation to all other flows. Under all conditions, the system must be able to guarantee that there will be availability of service (AoS) in the network for that traffic.
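Identifying critical flows and serving them ahead of everything else can be sketched as a strict-priority scheduler. The class names come from the text, but their ordering and the `StrictPriorityScheduler` class are hypothetical. Strict priority on its own can starve lower classes and cannot guarantee delivery if high-priority traffic exceeds the link rate, which is why real systems pair it with policing or admission control.

```python
# Strict-priority scheduler: classify each packet by flow type, always serve
# the highest-priority non-empty class, and keep FIFO order within a class.
import heapq
import itertools

# Hypothetical ordering of the flow types named in the text (0 = most urgent).
PRIORITY = {"lifeline": 0, "voice": 1, "video": 2, "data": 3}

class StrictPriorityScheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker preserving arrival order

    def enqueue(self, flow_class, packet):
        # Unrecognized traffic falls back to the best-effort "data" class.
        priority = PRIORITY.get(flow_class, PRIORITY["data"])
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        """Return the next packet to transmit, or None if all queues are empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

if __name__ == "__main__":
    sched = StrictPriorityScheduler()
    for cls, pkt in (("data", "d1"), ("voice", "v1"), ("lifeline", "l1"), ("voice", "v2")):
        sched.enqueue(cls, pkt)
    print([sched.dequeue() for _ in range(4)])   # ['l1', 'v1', 'v2', 'd1']
```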
In an effort to resolve some related issues, Galileo Technology Inc. has unveiled a set of chips targeted at the converged network. The chip set performs the functions of the Layer 2 switch, Layer 3 switch/router, Layer 2/3/4/5 bandwidth shaper, and Layer 2/3/4/5 firewall, and every layer's function runs at full wire speed. The set also provides full AoS on a voice-over-IP switch. (See "Voice/Data-Switch Processor Guarantees 'Availability Of Service,'" Electronic Design, Nov. 22, 1999, p. 57.)
To achieve the AoS capability, the GalNet 3 chip set borrows ATM's ideas of traffic policing and policy enforcement, as well as its practice of dedicating bandwidth to specific flows (such as voice packets). Those steps guarantee both AoS and QoS. The circuits also improve on ATM by letting the network make informed AoS/QoS decisions based on the packet's data contents.
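Traffic policing is commonly implemented with a token bucket: a flow may send at a sustained rate plus a bounded burst, and anything beyond that contract is dropped or marked. ATM's generic cell rate algorithm (GCRA) enforces a comparable contract. The class below is a generic sketch, not the GalNet 3 implementation; the 64-kbit/s rate and 160-byte packets are illustrative voice-like numbers.

```python
# Token-bucket policer: a flow may send at a sustained rate with a bounded burst.
# Generic sketch only; not the GalNet 3 implementation.

class TokenBucketPolicer:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0          # refill rate in bytes per second
        self.burst = burst_bytes            # bucket depth (largest allowed burst)
        self.tokens = float(burst_bytes)    # start with a full bucket
        self.last = 0.0                     # time of the previous update, seconds

    def conforms(self, now, size_bytes):
        """Return True if a packet of size_bytes arriving at time `now` is in profile."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False                        # out of profile: drop or mark

if __name__ == "__main__":
    voice = TokenBucketPolicer(rate_bps=64_000, burst_bytes=200)
    print(voice.conforms(0.0, 160))   # True: bucket starts full
    print(voice.conforms(0.0, 160))   # False: only 40 bytes of tokens left
    print(voice.conforms(1.0, 160))   # True: refilled after one second
```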
ATM systems can only do this at connection setup time. Once the connection is established, the bandwidth is dedicated—whether it's being used or not. The GalNet solution lets unused bandwidth be "reclaimed" and applied to other applications.
Managing the packet flow around a network is key to ensuring efficient operation and maximizing available bandwidth. Many companies are crafting first- and second-generation network processors that will form the heart of next-generation routers. As network traffic increases, classifying and routing the IP flow demands higher performance and more intelligence to prevent bottlenecks from slowing packet movements. CPU-powered routers traditionally handled such tasks, but their software overheads can bog down the routing process.
Custom architectures like the one developed by Entridia Corp. hold the promise of greatly accelerated packet movement. By placing the routing algorithms in silicon as hardwired operations, the company's Wisper chip (wire-speed edge router) can perform routing at 6 Gbits/s. On-chip routing engines reduce the latency typically encountered when those algorithms are executed on a host processor.
In the networks, both the routers and switches are vital to moving data. Typically, Layer 3 switches are used in the enterprise. They help simplify IP packet forwarding and achieve wire-speed operation. But switches haven't replaced routers at the edge of the network. There, the router's ability to handle multiple services and its higher network intelligence are critical to maintaining maximum throughput.
Since their inception, routers have usually employed hardware to handle packet forwarding and software for packet processing. As data rates start to hit the terabit level, however, software packet processing becomes the bottleneck, and that's where many of the network processors shine. Still higher data rates can be expected in the next few years, as data requirements inch toward the petabit-per-second level.
For fast packet processing, CAM technology has long been used to accelerate the matching process. Due to the large chip area required for the memory array, however, these devices come with a high price tag. That's changing with the CAM approaches released last year.
A ternary CAM created by NetLogic allows the use of large lookup tables with user-definable widths of 72, 144, or 288 bits. Rather than evaluating bits only as binary 0s and 1s, ternary CAMs can compare bits that are 1, 0, or "don't care," because the user can mask each entry on a per-bit basis. This capability is essential in network-routing applications, where longest-prefix-match searches are used for classless interdomain routing (CIDR) and subnet masking.
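A ternary CAM can be modeled in software as a list of (value, mask) entries, where a mask bit of 1 means "compare" and 0 means "don't care." If longer prefixes are stored at higher priority, the first matching entry is also the longest-prefix match. This is a behavioral sketch only: hardware compares every entry in parallel in a single search, while the loop here checks them one at a time. The `TernaryCAM` class is hypothetical; only Python's standard `ipaddress` module is used.

```python
# Behavioral model of a ternary CAM used for longest-prefix-match routing.
# Mask bit 1 = compare this bit, mask bit 0 = "don't care".
import ipaddress

class TernaryCAM:
    def __init__(self):
        self.entries = []   # (prefix_len, value, mask, result), longest prefix first

    def add_prefix(self, cidr, next_hop):
        """Store an IPv4 route; the netmask becomes the per-bit mask."""
        net = ipaddress.ip_network(cidr)
        self.entries.append((net.prefixlen, int(net.network_address), int(net.netmask), next_hop))
        # Longer prefixes first, so the first hit is the longest-prefix match.
        self.entries.sort(key=lambda entry: entry[0], reverse=True)

    def lookup(self, address):
        """Return the result of the highest-priority matching entry, or None."""
        key = int(ipaddress.ip_address(address))
        for _, value, mask, result in self.entries:   # hardware checks all at once
            if (key & mask) == value:
                return result
        return None

if __name__ == "__main__":
    cam = TernaryCAM()
    cam.add_prefix("0.0.0.0/0", "default")
    cam.add_prefix("10.0.0.0/8", "port A")
    cam.add_prefix("10.1.0.0/16", "port B")
    for addr in ("10.1.2.3", "10.2.0.1", "192.168.1.1"):
        print(addr, "->", cam.lookup(addr))
```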
Layer 3 and Layer 4 routing applications also can benefit from the mask-per-bit capability by letting users attach various policies and priorities to the router address-table entries. Such a scheme dramatically simplifies router design and can improve the time-to-market for the end product.
Not all of the new techniques for speeding network traffic rely on CAM technology. Specialized chips perform fast pattern processing, like the recently released FPP chip from Agere. That chip performs high-speed bit-stream processing and is controlled by a very-high-level programming language that makes the application simple to code and maximizes code reuse.
Able to provide throughputs at OC-48 speeds with only a 133-MHz clock, the processor will deliver still higher throughputs when implemented in more advanced CMOS processes. This chip is one of several in a set designed by the company to perform 2.5-Gbit/s, wire-speed, Layer 3 packet/cell processing. It also can implement a virtual segmentation and reassembly process for internetworking applications.
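Segmentation and reassembly in general can be illustrated by chopping a packet into fixed 48-byte cell payloads, as ATM does, and rebuilding it at the far end. This simplified sketch pads the last cell and appends a hypothetical 4-byte length trailer; real AAL5 uses an 8-byte trailer that includes a CRC-32, and the sketch does not model the chip's virtual SAR approach.

```python
# Simplified segmentation and reassembly (SAR) into fixed-size cell payloads.
# Assumption: a hypothetical 4-byte big-endian length trailer; real AAL5 uses
# an 8-byte trailer that also carries a CRC-32, which is not modeled here.
import struct

CELL_PAYLOAD = 48                 # bytes of payload per ATM cell
TRAILER = struct.Struct("!I")     # original packet length, 4 bytes

def segment(packet):
    """Pad the packet plus trailer to a multiple of 48 bytes and split it into cells."""
    pad = (-(len(packet) + TRAILER.size)) % CELL_PAYLOAD
    pdu = packet + bytes(pad) + TRAILER.pack(len(packet))
    return [pdu[i:i + CELL_PAYLOAD] for i in range(0, len(pdu), CELL_PAYLOAD)]

def reassemble(cells):
    """Join the cells, read the length from the trailer, and drop the padding."""
    pdu = b"".join(cells)
    (length,) = TRAILER.unpack(pdu[-TRAILER.size:])
    return pdu[:length]

if __name__ == "__main__":
    packet = bytes(range(100))
    cells = segment(packet)
    print(len(cells), "cells of", [len(c) for c in cells], "bytes")
    print(reassemble(cells) == packet)   # True
```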
With such an abundance of choices in network processors and switch fabrics, designers will be able to implement wire-speed solutions for the forthcoming generations of OC-48 systems. Future chip generations promise wire-speed operation at 10 Gbits/s. But that may take another year or two. Beyond that, yet another architecture may be needed to handle 40-Gbit/s and faster data-transmission requirements.