Networking Worlds Collide As 10-Gbit Ethernet LAN Enters The WAN

Throughout their existence, the local-area network (LAN) and the wide-area network (WAN) have remained separate, complementary entities. Their distinct cost/performance structures were defined by the nature of their application. Broadly stated, the LAN took care of intersystem communication within a building or complex, while the WAN facilitated distance requirements ranging from a couple of km to 40 km and beyond. Their roles were clear, and never the two should meet.

As is often the case, however, cost pressures are about to change all of that. The two are now prepared to battle for the heart and soul of communications today: the metropolitan-area network (MAN). This network boasts a diameter of roughly 40 km. It's proven to be the fastest-growing segment of the networking arena, with business-to-business communication forming the bulk of the traffic. Up until a couple of years ago, it would've fallen naturally under the umbrella of the WAN, with its SONET-based architecture's rate of 9.95 Gbits/s. But a lot has changed since then.

Ethernet, the LAN protocol of choice, has come a long way since its inception in the mid-'70s. It went from 1 to 10 to 100 Mbits/s by 1994. Then, between 1994 and 1998, it jumped from 100 Mbits/s to 1 Gbit/s. Now, as it stands at the threshold of 10 Gbits/s, it could cater to the expanding MAN. All it needs is an agreed-upon standard that would allow it to natively interface with the established, trillion-dollar SONET infrastructure. Therein lies the problem.

To date, improvements to the Ethernet standard have been dealt with by the IEEE-802 committee alone. Its participants have had similar LAN interests, so most decisions ended up holding a mutual benefit. Now, the 802.3ae Committee for 10-Gbit Ethernet includes members not only of the LAN community, but also of the WAN community. Their concerns don't exactly match up.

The top layer of contention is cost. From there the issues trace down to the media-access-control (MAC) mechanism and how it should interface with the physical-media-dependent (PMD) layer. No one is sure just what form that PMD should take. A key concern is the type of coding to be used. At 10 Gbits/s, the choice of code directly affects the demands made upon both the silicon and the optics, and hence overall cost.

Other issues to be debated include the base speed, distances, and the wavelengths of the lasers to be used. Those concerns spiral into discussion over the number of wavelengths, the number and type of fibers, and overall system quality in terms of bit error rate and component reliability. In addition, consideration must be given as to how other applications fit in, such as Fibre Channel and NGIO.

Fortunately, the structure of the IEEE is such that the concerns of both camps can be aired. No decision will be made final until 75% of the voting membership agree. So neither camp can dominate the proceedings. Also, five criteria exist under which the 802.3ae working group will entertain proposals to solve the many outstanding technical and market issues. They are: broad market potential, compatibility with IEEE 802.3, distinct identity, technical feasibility, and economic feasibility. Under these guidelines, anyone in the group has the right to shoot down any proposal, making it difficult to compromise the integrity of the proceedings.

This strict adherence to the cost principles, as outlined in the five criteria, is the chief bone of contention between the LAN and WAN proponents. Traditionally, the WAN has been technology-driven, with cost being almost an afterthought. If it could be done, it was done. Cost didn't matter. The LAN, in general, and Ethernet in particular, have taken the opposite tack. The LAN's low cost per node has been its selling point, and has helped boost the number of installed ports to over 700 million today and rising (Fig. 1).

As the number of nodes expanded, so did the technology. From 1 to 10 to 100 to 1000 Mbits/s it grew, all the while adhering to that mantra of the Ethernet community: 10 times the performance for three times the cost. Not many schemes can make that claim, especially in the WAN arena. It's the low cost of implementation, in combination with its quality of service and network-management capabilities, that has pushed Ethernet beyond its original confines. In fact, predictions are that in five to 10 years, OC-192 SONET will be completely replaced by Ethernet, which will maintain its cost/performance ratio.

This doesn't bode too well for the existing WAN infrastructure and equipment. Nor will it help pad the pockets of component suppliers, particularly in the optics realm, who have long demanded high premiums from WAN box builders. Unfortunately, it also makes the decision process as much a matter of protecting turf as it is technical viability.

Understanding The Layers

To understand the issues to be voted upon, look at the layers involved relative to the OSI reference-model layers. For LANs, the OSI Datalink and Physical layers (PHYs) are subdivided into the Logical Link Control/MAC and PMD layers, respectively (Fig. 2).

From the get-go, the 802.3ae Committee decided that there should be a common MAC for the LAN and the WAN. This should be full duplex only, with no demand for shared-bandwidth hubs at high speeds. Of course, if the market should change, shared-bandwidth hubs can be achieved with a full-duplex MAC using a buffered-distributor architecture.

Two other key mandates have been set down: The MAC will be speed-independent and not constrained by distance. These latter two characteristics will be defined by the physical-layer specification. Here's where the fun starts.

From the MAC out, through the parallel link to the physical-coding sublayer (PCS) that goes to the physical-medium attachment (PMA), and out through the serial link to the physical-medium-dependent (PMD) layer, everything's up in the air (Fig. 3). Even the medium itself has its options. These include copper, which is pretty much stillborn as far as voting members are concerned. Copper is followed by serial or parallel fiber interfaces over SONET or wide WDM.

Take the layer structure of Ethernet and break it apart. Look at the lower layers. A community has a forte in a specific area of the architecture's structure, whether it be the coding layer or the physical layer in the transceiver. Those individuals all have a strong desire for commonality at their layer, whether it be the PCS and PMA or PMD and transceiver (optics) layers.

So the interface between the LAN and WAN PHY and the MAC needs to be common. But because the MAC is to be common, significant energy is being spent by a number of companies to push that level of commonality further down. Of course, the chief beneficiaries of commonality would be the box makers—the Nortels and Ciscos of the world. It's no surprise that the first proposal for a unified LAN/WAN PHY has come from someone on that side, namely Howard Frazier of Cisco Systems.

Frazier's proposal takes the position that differentiation between PMDs for the LAN and WAN is unnecessary. He says that it will only result in two different sets of optics and will place limits on the economies of scale that could otherwise be achieved.

The proposal puts forward a PHY architecture suitable for serial transmission on both the LAN (10.000 Gbits/s) and WAN (OC-192c/SDH VC-4-64c). A mechanism can adapt the MAC/PLS data rate to the WAN PHY data rate.

The LAN PHY takes four lines of regular 8B/10B data (4 by 3.125 Gbits/s). Using the proposed Hari interface, it puts those lines through a 64B/66B codec and sends them out via a serializer/deserializer (SerDes) at 10.3125 GBaud. The WAN PHY is similar, but incorporates SONET framing after the 64B/66B codec. It also incorporates a two-polynomial scrambler system before the SerDes, with busy-idle rate control. It thereby achieves the required optical-interface-compatible data rate of 9.95328 GBaud.
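
The rates quoted above follow directly from the coding ratios. As a rough check, the short Python sketch below recomputes them. The function and variable names are illustrative only, and the 10-Gbit/s MAC rate is assumed to be split evenly across the four Hari lanes.

```python
from fractions import Fraction

def line_rate(payload_gbps, data_bits, coded_bits):
    """Line rate in GBaud after a block code that maps
    data_bits of payload onto coded_bits on the wire."""
    return Fraction(payload_gbps) * Fraction(coded_bits, data_bits)

MAC_RATE = Fraction(10)  # Gbits/s, the nominal 10-Gbit Ethernet data rate

# Hari side: the MAC data is assumed to be split evenly over four lanes,
# each carrying 8B/10B-coded data.
lane_payload = MAC_RATE / 4                   # 2.5 Gbits/s of data per lane
lane_baud = line_rate(lane_payload, 8, 10)    # 3.125 GBaud per lane

# Serial LAN PHY: the full 10 Gbits/s passes through the 64B/66B codec.
lan_serial_baud = line_rate(MAC_RATE, 64, 66)  # 10.3125 GBaud

# Coding overhead of each scheme, as a fraction of the payload.
overhead_8b10b = Fraction(10, 8) - 1     # 25%
overhead_64b66b = Fraction(66, 64) - 1   # about 3%

print(f"Hari lane rate:    {float(lane_baud)} GBaud")
print(f"LAN serial rate:   {float(lan_serial_baud)} GBaud")
print(f"8B/10B overhead:   {float(overhead_8b10b):.2%}")
print(f"64B/66B overhead:  {float(overhead_64b66b):.2%}")
```

The same arithmetic shows why the WAN PHY needs rate adaptation: a 10.3125-GBaud stream cannot fit the 9.95328-GBaud optical interface without it.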

A number of advantages exist here. The LAN PHY advocates get minimal cost, minimal complexity, and maximum compatibility. For those favoring WAN-PHY, there's compatibility between the photonic infrastructure and the operations, administration, maintenance, and provisioning facilities.

Unfortunately, it's not really so simple. The WAN group doesn't like some of the simplifications made, while the LAN people insist on achieving a lower-cost solution. There also are numerous unsolved issues with regard to the Hari serial interface itself, which has yet to be approved. The 64B/66B coding scheme suffers contention too, with its overhead and licensing issues.

The Hari serial interface resides between the PCS/PMA and the PMD (Fig. 3, again). Its objective is to provide a common PMD interface and reset the link's jitter budget within the transceiver. A third goal comes from the IEEE perspective and does not carry such importance. It's to have a common interface that will work across multiple disciplines, such as Fibre Channel.

Fibre Channel comes up frequently in the context of Ethernet. Although it's the technically superior scheme for the storage-area network (SAN), Ethernet is proving to be a viable, low-cost alternative. If it becomes possible to have Ethernet from the LAN, all the way to SAN, and over the WAN, then it's one frame going through all three networks. That would make it possible to save a lot of money in the long run.

Physically, the Hari interface would reside on a pc board and allow up to 20 in. of trace length between the ASIC/SerDes and the PMD. It would offer a lot of flexibility for device placement (Fig. 4).

As a common PMD interface, it also would let development go forward both in the transceiver and systems world. The Hari interface would lend a level of confidence that those two things will actually marry in the future. Whichever PMDs win or prove feasible from an economic or technical perspective, the system houses will have a solution.

The second purpose of Hari is crucial: its ability to reset the jitter budget. The concept in Fibre Channel was that the serializer, the electrical-optical transceiver, and the fiber plus the receiver were each allowed about 25% of the budget, as was the deserializer. Typically, jitter is measured by how much of the "eye" pattern is open. It has to be 50% open when it comes out of the transceiver. Otherwise, the link won't work. How that jitter budget is set is in negotiation, and it's a major point of contention among the voting members of the 802.3ae Committee.

The people who make the transceivers worry that they don't have enough of the budget. The same goes for the SerDes builders. The key is to balance the difficulty so that no particular industry bears too much. No one knows the ideal, but it's negotiable. One of the criteria for a decision is the limitation of current test equipment.

Assuming that there's a 10-Gbit signal, the total budget is 100 ps, or 25 ps for each section. But 25 ps is approaching the resolution capabilities of commonly available test equipment, which pushes the cost up.

What Hari does is reset the jitter budget, meaning that 100% of the jitter budget can be used between the serializer and transceiver. A new 100% goes from transceiver to transceiver, while a further 100% goes from the receiver up to the deserializer. In the end, each device has more latitude. The resetting is done by putting a phase-locked loop (PLL) inside the transceiver with clock recovery. It also requires adding circuitry to tune up the edges before it moves forward. In the end, the transceiver is 100% responsible for the signal leaving it.
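
To put numbers on the two approaches, the Python sketch below compares a budget shared four ways with one that is reset at each transceiver. It is a simplification under stated assumptions: the equal 25% split follows the Fibre Channel description above, the section names are illustrative, and the actual allocation was still under negotiation.

```python
def unit_interval_ps(rate_gbps):
    """Bit period in picoseconds for a serial rate given in Gbits/s."""
    return 1000.0 / rate_gbps

# Four link sections, each assumed to get an equal share of one budget.
SECTIONS = ("serializer", "E/O transceiver", "fiber plus receiver", "deserializer")

def shared_budget(rate_gbps, sections=SECTIONS):
    """One budget for the whole link, split equally across the sections."""
    share = unit_interval_ps(rate_gbps) / len(sections)
    return {name: share for name in sections}

def reset_budget(rate_gbps):
    """Retiming in each transceiver (PLL plus clock recovery) starts a
    fresh budget, so every segment gets the full bit period."""
    ui = unit_interval_ps(rate_gbps)
    return {
        "serializer to transceiver": ui,
        "transceiver to transceiver": ui,
        "receiver to deserializer": ui,
    }

for label, budget in (("shared", shared_budget(10)), ("reset", reset_budget(10))):
    for segment, ps in budget.items():
        print(f"{label:7s} {segment:28s} {ps:6.1f} ps")
```

At 10 Gbits/s the shared scheme leaves each section 25 ps, while the reset scheme gives each segment the full 100 ps, at the price of retiming circuitry in the transceiver.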

This puts pressure on the transceiver manufacturers, as more complicated circuitry is required, which would bump up the cost. The argument could be made, though, that it would be impossible to build a 10-Gbit transceiver of any kind without pushing up the price.

The challenges of implementing Hari were, to some extent, built in. It was intentionally designed to try and keep the pin count to an absolute minimum, while still allowing companies to design transceivers and SerDes chips that interface with it. On the interface, the selection was made of 3.125-Gbit differential lines, or those that have 2.5 Gbits/s of 8B/10B data. That decision was made under the assumption that current silicon CMOS may not be able to do that readily. But within a generation, it should be easily achievable, especially with the loose jitter tolerances. Already, companies have silicon ready to go.

The second challenge has to do with the protocol, which is essential for using this interface while accounting for lane alignment in terms of skew. The transceiver has to be able to take any inherent skew between the four lines. It then must get rid of it before it converges, multiplexes it down to one path, and sends it out. The logic there is high speed and thus expensive.
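
The deskew-then-multiplex step can be pictured with a toy model. The Python sketch below is purely illustrative: the alignment marker "A", the idle symbol "I", and the symbol-list representation are assumptions made for the example, not details of the Hari proposal.

```python
ALIGN = "A"  # hypothetical alignment marker sent on all four lanes at once

def deskew(lanes, marker=ALIGN):
    """Drop whatever precedes the marker on each lane so the lanes line up.
    Raises ValueError if a lane carries no marker."""
    offsets = [lane.index(marker) for lane in lanes]
    aligned = [lane[off:] for lane, off in zip(lanes, offsets)]
    length = min(len(lane) for lane in aligned)  # trim to the shortest lane
    return [lane[:length] for lane in aligned]

def multiplex(lanes):
    """Interleave aligned lanes, one symbol from each in turn, into one path."""
    return [symbol for column in zip(*lanes) for symbol in column]

# Four lanes arriving with different skew ("I" marks idle symbols seen
# before the marker reaches that lane).
skewed = [
    ["A", "d0", "d4"],
    ["I", "A", "d1", "d5"],
    ["I", "I", "A", "d2", "d6"],
    ["A", "d3", "d7"],
]

aligned = deskew(skewed)
stream = multiplex(aligned)
print(stream[len(skewed):])  # data symbols after the four markers
```

In hardware the same job is done with per-lane buffers running at the lane rate, which is where the cost concern comes from.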

The cost of silicon rears its head again, but this time in the context of the multiplexing/demultiplexing function. Some believe that in their architectures, it's easier to multiplex up 16:1 than it is for 4:1. The argument is that with 4- by 3.125-Gbit/s lines coming in, logic that runs at 3.125 Gbits/s is required. It's inefficient to multiplex it out to 16 wide within the chip. With 16:1, they don't have to multiplex it up until the very end, when it leaves the chip. Then, all of the logic on the chip is running slower. Again, companies believe CMOS can already do 3.125 Gbits/s, so the argument is basically invalid.

To date, no alternatives to Hari have been put forth. The only choices are either an improved Hari or no common PMD interface defined in the standard at all. This would leave a situation in which the individual companies would have to go out and work outside of the standards body to create a common interface. This would be perfect for companies lagging in the technology required to implement Hari. Marketing 101 says that if a company can't catch up, it should divide the market and take a leadership role in a segment.

Recouping An Investment In Optics

This brings up the whole argument over choosing the optics to be used. A lot of companies have invested a lot of research time and money to realize 12-Gbit lasers at the behest of WAN-based telecom companies. There's a premium to be paid for these devices. If a 10.3-Gbit serial scheme becomes the standard, the potential to charge that premium and recoup the investment goes out the window. That's especially true if predictions come through and Ethernet becomes the WAN standard of the future. Those forecasts cause optics companies to favor a separate LAN PMD architecture with a rate of 12.5 GBaud.

Regardless of the architecture, the optics are already available to meet all 10-Gbit Ethernet requirements. Options include: 65 m over installed multimode fiber (MMF) using 1.3-µm Fabry-Perot (FP) lasers; 300 m over enhanced MMF using 0.85-µm vertical-cavity surface-emitting lasers (VCSELs); and 2 km over standard single-mode fiber (SMF) using uncooled, unisolated, 1.3-µm FP lasers. Also looming are the choices of 10 km over standard SMF using 1.3-µm distributed-feedback (DFB) lasers, or 40 km over standard SMF using 1.55-µm DFB lasers.

The decision to use a particular scheme depends very much on the application. It's likely that more than one will be standardized. In addition, the actual data rate is very dependent on the coding decision.

Options for coding include scrambling, 8B/10B coding, 16B/18B coding, and multilevel analog signaling (MAS). There's also MB810 coding, 64B/66B coding, or combinations of two or more, as in Cisco's implementation. As outlined previously, however, problems do reside within Cisco's proposal. The proposed busy-idle character is defined in addition to the normal idle character. It also uses HP's 64B/66B code to provide frame delimiting without needing to know the frame length or overwriting the frame preamble with length information. This gives the 3% overhead for the 64B/66B coding, and HP hasn't yet disclosed 64B/66B licensing terms. These terms need to be ironed out if progress is to be made. Also, 64B/66B coding is a new, unproven protocol unlike 8B/10B coding, which is robust and well known.

As a result, and in keeping with the idea that a unified LAN/WAN design is better for all concerned in terms of cost and ease of implementation, Nortel and Vitesse Semiconductor are each supporting a unified alternative (Fig. 5). Their proposal incorporates the word-hold, null-word-insertion scheme proposed by Nortel, wherein the PHY asserts the word-hold line when the PHY's FIFO is nearly full. This causes the MAC to send a null word in the data stream out to the PHY. The PHY will discard the null word, lowering the actual data rate to 9.58 Gbits/s—the preferred data payload rate for OC-192 compatibility.
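
A small simulation shows how word-hold throttles the MAC. The Python sketch below is a toy model, not the Nortel/Vitesse design: the hold threshold and the one-word-per-tick timing are hypothetical, and the drain ratio assumes an OC-192c payload rate of 9.58464 Gbits/s, which the text above rounds to 9.58.

```python
def simulate_word_hold(ticks=200_000, mac_rate_gbps=10.0,
                       drain_num=958_464, drain_den=1_000_000,
                       hold_threshold=6):
    """Toy model: each tick the MAC emits one word at its full rate while
    the PHY drains its FIFO at drain_num/drain_den words per tick."""
    fifo = 0         # words waiting in the PHY FIFO
    credit = 0       # drain credit, counted in 1/drain_den of a word
    data_words = 0
    null_words = 0
    for _ in range(ticks):
        word_hold = fifo >= hold_threshold   # PHY asserts word-hold when nearly full
        if word_hold:
            null_words += 1                  # MAC sends a null word; PHY discards it
        else:
            fifo += 1                        # MAC sends a data word into the FIFO
            data_words += 1
        credit += drain_num                  # PHY side advances at the slower rate
        if credit >= drain_den and fifo > 0:
            fifo -= 1                        # one word leaves toward the SONET framer
            credit -= drain_den
    effective_gbps = mac_rate_gbps * data_words / ticks
    return effective_gbps, null_words / ticks

effective_gbps, null_fraction = simulate_word_hold()
print(f"effective MAC data rate: {effective_gbps:.4f} Gbits/s")
print(f"null words inserted:     {null_fraction:.2%}")
```

Under these assumptions, roughly one word in 24 is a null, and the FIFO never grows past the hold threshold.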

Avoiding Proprietary Technology

This approach yields a 3% improvement in data throughput compared to the 64B/66B scheme. It also avoids the potential issue of proprietary technology licensing, albeit with the necessity of overwriting the preamble with frame-length information.

Regardless of its shortcomings, though, Rick Walker and Richard Dugan of Agilent Technologies are getting a lot of support for their proposed 64B/66B implementation (Fig. 6). Much of this positive response is due to its balanced coding and relatively low overhead. It's also particularly suited to the Hari proposal for two reasons: Only a limited number of characters are needed, and data is transmitted in contiguous blocks of at least 64 octets.

Of the other coding schemes, MAS seems to be garnering the most interest. It's a narrowband, low-dispersion, low-EMI coding scheme that uses amplitude modulation at 2.5 GHz and 10 Gbits/s. MAS takes an 8B/10B precoded input and is compatible with GMII extensions and Gigabit Ethernet PMDs. Plus, it can use existing MMF and SMF cable. It also can utilize 2.5-Gbit/s optics and possibly GbE optics, but it requires linear lasers and high-speed digital-to-analog and analog-to-digital converters (DACs and ADCs).

The MB810 scheme seems to have all of the right characteristics. Its implementation is so complicated, though, that many voters are having a hard time understanding it. Its validity is to be determined.

So while this is being written, voters are gearing up for the next 802.3ae Committee meeting in Albuquerque, N.M., scheduled for March 6-9. At that meeting, the brainstorming phase will be officially closed and voting will begin to narrow down the proposals to a manageable few, from 17 to about five or six. It's expected that at least four serial links will be left, with one parallel (wide WDM) proposal. By July of this year, no more will be accepted. In theory, the standard should be completed by March 2002. If that's to become a reality, a lot of issues—both technical and political—must be put to bed. For now, though, the potential financial rewards seem to be motivating the various factions to resolve their differences.