Connect PCI Express Subsystems With Advanced Switching Fabrics
DESIGN VIEW is the summary of the complete DESIGN SOLUTION contributed article, which begins on Page 2.
Current parallel backplane technologies are rapidly being replaced with advanced serial-I/O-based solutions. Two such open standards include PCI Express Base and Advanced Switching (AS). These new technologies recognize the prevalence and widespread use of legacy software yet offer advanced features.
Possessing full code compatibility with PCI/PCI-X-based software, PCI Express technology is being deployed in fabrics as well as in host and I/O subsystems. With support for nontransparent bridging, used in PCI for many years, PCI Express solutions can create complex multihosted designs.
When solutions based on AS—including its full physical and data-link layer compatibility with PCI Express Base and its advanced features—become available, high-end designs will migrate to a mixed PCI Express/AS solution. Host and I/O subsystems will continue to run legacy PCI Express/PCI code, and AS will operate as the system switching fabric.
Because PCI Express Base and AS have divergent transaction layers, switches/bridges are necessary to ensure compatibility with both standards. Each of the switch's interfaces must ensure compatibility with the standard present on that interface. It's crucial that the switch provide complete protocol interoperability including, but not limited to, translation of routing techniques, enumeration of PCI Express subsystems across an AS fabric, queue management, and the ability to ensure transaction ordering to prevent deadlock.
This article investigates how to intertwine the two technologies via PI-8 switching devices. Using these switches will ensure proper translation of routing methods and provide system configuration, buffer-management techniques, and a transaction ordering process to bridge the different methodologies of each standard.
HIGHLIGHTS:
"Path Routing" PCI Express Packets | Unlike the address routing employed in PCI and PCI Express Base, packets in an AS domain are routed via a path description. AS routes PCI Express packets through the AS fabric by prepending an AS route header.
"Binding" PCI Express Base To AS | A PI-8 bridging mechanism is implemented as a PCI-Express-to-AS switch at both host and I/O nodes of an AS fabric.
Switch Initialization | The host switch, which presents itself as a PCI Express switch as enumeration software probes the RC subsystem, enables the software to discover I/O switches and devices attached to them through the AS network.
Buffer Management | Two buffer-management choices reside on the PCI Express side of the switch: PCI Express control logic can return credit to its interface when the packet crosses the internal boundary between PCI Express Base and AS, or when the packet is injected into the AS fabric.
Maintaining Order | AS ordering and deadlock-avoidance rules satisfy the corresponding rules for PCI Express Base, even though the two sets of rules aren't identical.
Full article begins on Page 2
Current parallel backplane technologies are rapidly being replaced with advanced serial-I/O-based solutions. Two such open standards include PCI Express Base and Advanced Switching (AS). These new technologies recognize the prevalence and widespread use of legacy software, yet offer advanced features.
Possessing full code compatibility with PCI/PCI-X-based software, PCI Express technology is being deployed in fabrics, as well as host and I/O subsystems. With support for nontransparent bridging, used in PCI for many years, PCI Express solutions can create complex multi-hosted designs. When solutions based on AS—including its full physical and data-link layer compatibility with PCI Express Base, and its advanced features—become available, high-end designs will migrate to a mixed PCI Express/AS solution. Host and I/O subsystems will continue to run legacy PCI Express/PCI code and AS will operate as the system switching fabric.
Because PCI Express Base and AS have divergent transaction layers, switches/bridges are necessary to ensure compatibility with both standards. Each of the switch’s interfaces must ensure compatibility with the respective standards present on that interface. It’s crucial that the switch provide complete protocol interoperability including, but not limited to, translation of routing techniques, enumeration of PCI Express subsystems across an AS fabric, queue management, and the ability to ensure transaction ordering to prevent deadlock.
"Path Routing" PCI Express Packets
PCI Express Base and AS differ greatly in the routing method used. Unlike the address routing employed in PCI and PCI Express Base, packets in an AS domain are routed via a path description. The path describes the packet’s route from the originating edge node, through the fabric, to the destination node. Software determines the route between edge nodes by building a graph of the system that identifies all possible paths. AS routes PCI Express packets through the AS fabric by prepending an AS route header, which includes the path, to the packet without modifying the underlying packet.
One of the many benefits of AS is its support for multi-protocol encapsulation and tunneling. Protocols include those for fabric services, as well as industry-standard and vendor-specific protocols. Fabric services cover spanning tree generation, device management, and event notification. Among the numerous industry-standard protocols are well-known and deployed technologies such as ATM, SONET, Ethernet, and PCI Express Base.
Implementing separate layers and headers for routing and protocol-specific information enables multi-protocol encapsulation. The underlying protocol being transported is identified by a unique protocol-interface (PI) value. Each PI has its own header, preceded by an AS routing header. Refer to Figure 1 for the protocol stacks for PCI Express Base and AS. For example, the PI-8 AS-to-PCI-Express Bridge is specifically designed to tunnel PCI Express packets through an AS-based fabric, and will be used to realize mixed AS and PCI Express systems. As defined by the ASI-SIG, PI-8 has the architectural model of a switch.
"Binding" PCI Express Base To AS
The PI-8 bridging mechanism is implemented as a PCI-Express-to-AS switch at both host and I/O nodes of an AS fabric (Fig. 2). Each device contains a PCI-Express-compatible switch with a single upstream port, and up to 32 downstream ports. There also is an AS interface with a PI-8 capability structure register set, including an AS turn pool and pointers for constructing an AS header. See Figure 3 for an illustration of a generic topology, where multiple PCI Express subsystems are connected via this fabric.
Packets received on a switch’s PCI Express interface are moved to the AS side of the switch. They’re encapsulated there by adding an AS header using PI-8 and sent through the AS fabric. Intervening AS switches are agnostic to the PI-specific payloads contained in the AS packets. At the receiving host or I/O switch’s AS interface, the AS header is stripped off and the PCI Express packet extracted. The packet is then moved to the switch’s PCI Express interface for routing through the PCI Express subsystem.
To route packets between these switches, a "binding" must be established between the host switch that connects a root complex (RC) to the AS fabric, and the I/O switch connecting the same AS fabric to a PCI Express subsystem that doesn’t contain an RC. This binding is a programmed AS path between the host and I/O switches. Multiple PCI Express systems may be simultaneously interconnected by an AS fabric. Each PCI Express I/O subsystem must be "bound" to a single host switch, but this host switch may be bound to multiple PCI Express I/O subsystems. Nodes in the AS portion of the network can communicate among themselves while the host switches communicate with their I/O subsystems.
To create a binding, binding registers in the switch are programmed with the path to the destination switch’s local address apertures (for memory and I/O accesses) or bus number (for configuration accesses). Host switches will uniquely identify each I/O subsystem’s address range and bus number in the RC’s address domain. I/O switches also need binding registers, but the binding is a single fixed path back to the host switch, as an I/O switch can only be tied to one host switch.
Each register set has two values. One value represents the path to generate requests or send completions. The other represents a path to check against incoming requests and completions to ensure that they come from a valid tunnel. When a PI-8 packet is created, the switch consults the binding registers to obtain the path portion of the header.
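The role of the two binding values, together with the encapsulation and extraction steps performed at the host and I/O switches, can be sketched roughly as follows. This is a minimal illustration only: the names Binding, ASPacket, encapsulate, and decapsulate are hypothetical, a path is modeled as a plain tuple rather than the real AS turn pool, and the check simply compares the path carried in the header with the expected value, which is an assumption rather than the mechanism defined by the ASI-SIG.

```python
from dataclasses import dataclass

PI_PCI_EXPRESS = 8  # PI-8: the protocol interface used here for PCI Express tunneling


@dataclass(frozen=True)
class Binding:
    """One binding register set (hypothetical model, not the real register layout)."""
    send_path: tuple   # path used to generate requests or send completions
    check_path: tuple  # path that incoming packets are expected to carry


@dataclass(frozen=True)
class ASPacket:
    path: tuple     # routing information from the AS route header
    pi: int         # protocol-interface value identifying the payload
    payload: bytes  # the tunneled PCI Express packet, carried unmodified


def encapsulate(tlp: bytes, binding: Binding) -> ASPacket:
    """Wrap a PCI Express packet, taking the path from the binding registers."""
    return ASPacket(path=binding.send_path, pi=PI_PCI_EXPRESS, payload=tlp)


def decapsulate(pkt: ASPacket, binding: Binding) -> bytes:
    """Check that the packet came through a valid tunnel, then strip the header."""
    if pkt.pi != PI_PCI_EXPRESS:
        raise ValueError("not a PI-8 packet")
    if pkt.path != binding.check_path:
        raise ValueError("packet did not arrive through the bound tunnel")
    return pkt.payload


# Example: a host switch and an I/O switch bound to each other.
host = Binding(send_path=(2, 0, 3), check_path=(1, 4, 0))
io = Binding(send_path=(1, 4, 0), check_path=(2, 0, 3))
tlp = b"\x40\x00\x00\x01"
assert decapsulate(encapsulate(tlp, host), io) == tlp
```

Keeping the payload as an opaque byte string mirrors the point that intervening AS switches never look inside the PI-specific payload.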
Several binding methods may be used. If modifying existing software is an option that the host switch supports, AS software running on the RC can create this binding. Techniques that don’t require software changes to PCI Express subsystem device drivers, system OS, or BIOS include strapping, SROM initialization, or running AS-aware configuration software on an AS-capable processor node.
In the latter case, AS fabric manager-oriented discovery, configuration, and fabric-management software will see a single AS endpoint configuration space when accessing the host switch from its AS interface. Just as with a pure AS topology, PI-4 packets are used to access the AS configuration space of the host and I/O switches and initialize the various registers. Whatever method is used, when binding a host switch to an I/O switch, requests generated from either switch to the other must follow the same path through the fabric to prevent timing issues and maintain producer/consumer rules in PCI Express Base.
Switch Initialization
Thanks to the PI-8 bridging mechanism, system enumeration of all PCI Express subsystems can occur despite the presence of an intervening AS fabric. The enumeration software starts running on each of the root complexes associated with a host switch. As the software probes the RC subsystem, it will find the host switch, which presents itself as a PCI Express switch containing a virtual P2P bridge per I/O subsystem.
With the binding registers in the host switches previously configured, this host switch enables the enumeration software to discover I/O switches and devices attached to them through the AS network. The I/O switches also present themselves to the software as PCI Express switches. Because the AS fabric is invisible to it, the legacy software sees only these PCI Express switches and their associated virtual PCI-to-PCI bridges. Each RC will see only the PCI Express subsystems bound to it and will be unaware of the other RCs, their subsystems, and the AS fabric. A single AS fabric can have many RCs and their associated subsystems connected together.
If the host switch is discovered prior to the binding register configuration, attempts to access a subordinate bus will be terminated by a completion code, "Unsupported Request." This makes it appear as though no devices exist downstream. If the binding is subsequently created, a hot-plug event will occur and result in the resumption of the configuration process. When configuration software now scans the virtual PCI bus created by the host switch, it will detect a new I/O switch.
With binding registers programmed, the host switch can then populate the aperture base and limit values by snooping on configuration transfers. When configuration software running on the local host accesses any bound I/O switch, the host switch snoops these accesses. Through this mechanism, the host switch can populate all information required to route every packet that has a destination on the given I/O switch. This information is placed in the host switch’s shadow registers, with one such register set for each I/O switch supported.
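As a rough illustration of this snooping, the sketch below captures writes to the bus-number and memory base/limit registers of a standard PCI-to-PCI bridge (Type 1) configuration header and stores the decoded values in a shadow set. The register offsets and the 1-Mbyte granularity of the memory window are those of the standard bridge header; the ShadowSet class, its method name, and the simplification that each write targets exactly one register are assumptions made for the example.

```python
# Offsets in the standard PCI-to-PCI bridge (Type 1) configuration header.
SECONDARY_BUS_OFFSET = 0x19
SUBORDINATE_BUS_OFFSET = 0x1A
MEM_BASE_OFFSET = 0x20
MEM_LIMIT_OFFSET = 0x22


class ShadowSet:
    """Hypothetical shadow register set kept by the host switch for one I/O switch."""

    def __init__(self):
        self.secondary_bus = None
        self.subordinate_bus = None
        self.mem_base = None
        self.mem_limit = None

    def snoop_config_write(self, offset: int, value: int) -> None:
        """Record the routing-relevant fields of a snooped configuration write.

        Simplification: each write is assumed to target exactly one register.
        """
        if offset == SECONDARY_BUS_OFFSET:
            self.secondary_bus = value & 0xFF
        elif offset == SUBORDINATE_BUS_OFFSET:
            self.subordinate_bus = value & 0xFF
        elif offset == MEM_BASE_OFFSET:
            # Upper 12 bits of the register hold address bits 31:20.
            self.mem_base = (value & 0xFFF0) << 16
        elif offset == MEM_LIMIT_OFFSET:
            # The limit covers the whole top 1-Mbyte block of the window.
            self.mem_limit = ((value & 0xFFF0) << 16) | 0xFFFFF
        # Writes to other registers are forwarded but not shadowed.


shadow = ShadowSet()
shadow.snoop_config_write(MEM_BASE_OFFSET, 0xE000)
shadow.snoop_config_write(MEM_LIMIT_OFFSET, 0xE010)
print(hex(shadow.mem_base), hex(shadow.mem_limit))
```

Because the values are captured as the configuration software writes them, the host switch needs no extra programming step to learn each I/O switch's apertures.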
The host switch implements a standard set of P2P configuration registers in the legacy configuration space, which includes base and limit registers, bus number registers, and bridge control and status registers. When a packet arrives on the interface, a decode operation must be performed to determine the I/O switch to which the packet should be forwarded. To make this determination, the host switch compares addresses or bus numbers (for configuration requests or completions) against each shadow register set to determine the path. An I/O switch uses an inverse decode of its base and limit registers to see if a transaction should be forwarded to the host switch.
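The decode step might look something like the following sketch. It assumes one shadow register set per bound I/O switch, each holding a single memory aperture, a bus-number range, and the bound path; the names ShadowRegisters, path_for_address, path_for_bus, and io_forwards_to_host are hypothetical. A result of None stands for the case in which nothing matches, which the switch would terminate (for example, with an Unsupported Request completion).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ShadowRegisters:
    """Hypothetical shadow set for one bound I/O switch."""
    mem_base: int         # first address of the memory aperture
    mem_limit: int        # last address of the memory aperture (inclusive)
    secondary_bus: int    # first bus number behind the I/O switch
    subordinate_bus: int  # last bus number behind the I/O switch
    path: tuple           # bound AS path to that I/O switch


def path_for_address(shadows: Sequence[ShadowRegisters], addr: int) -> Optional[tuple]:
    """Host-switch decode for memory requests: find the aperture containing addr."""
    for s in shadows:
        if s.mem_base <= addr <= s.mem_limit:
            return s.path
    return None


def path_for_bus(shadows: Sequence[ShadowRegisters], bus: int) -> Optional[tuple]:
    """Host-switch decode for configuration requests and completions."""
    for s in shadows:
        if s.secondary_bus <= bus <= s.subordinate_bus:
            return s.path
    return None


def io_forwards_to_host(mem_base: int, mem_limit: int, addr: int) -> bool:
    """I/O-switch inverse decode: anything outside the local aperture goes to the host."""
    return not (mem_base <= addr <= mem_limit)


shadows = [
    ShadowRegisters(0xE0000000, 0xE01FFFFF, 2, 5, path=(2, 0, 3)),
    ShadowRegisters(0xE0200000, 0xE02FFFFF, 6, 7, path=(2, 1, 1)),
]
print(path_for_address(shadows, 0xE0200010))
print(path_for_bus(shadows, 4))
```

A linear search is used only for clarity; a hardware implementation would compare against all shadow sets in parallel.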
When an I/O switch receives a tunneled packet, it must perform two functions. First, it extracts the reverse path of the request packet and stores it in a response path table using the packet requester ID as an index. Next, it extracts the PCI Express packet from the AS packet and forwards it to the PCI Express device interface. Once the I/O switch performs these tasks, the switch retains no other information about the packet.
When an I/O switch receives a packet on the PCI Express interface, it performs an operation similar to a host switch. However, the path used in the AS header depends on whether the PCI Express packet is a request or a response. For a request, the I/O switch decodes the packet address to determine the path to the destination host switch. For a response, the I/O switch uses the requester ID from the PCI Express packet, which contains the requester’s bus number, as an index into the response path table to obtain the correct path back to the requester.
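The response path table can be pictured as a small lookup keyed by requester ID, as in the sketch below. The 16-bit requester ID layout (8-bit bus, 5-bit device, 3-bit function) is the standard PCI Express encoding; the ResponsePathTable class, its method names, and the tuple representation of a path are assumptions for illustration.

```python
from typing import Optional


def requester_id(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into the 16-bit PCI Express requester ID."""
    return ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x07)


class ResponsePathTable:
    """Hypothetical table an I/O switch keeps to route completions back."""

    def __init__(self):
        self._paths = {}

    def record_request(self, req_id: int, reverse_path: tuple) -> None:
        """Store the reverse path of a tunneled request, indexed by requester ID."""
        self._paths[req_id] = reverse_path

    def path_for_completion(self, req_id: int) -> Optional[tuple]:
        """Look up the path for a completion; None if no request was seen."""
        return self._paths.get(req_id)


table = ResponsePathTable()
rid = requester_id(bus=1, device=0, function=0)
table.record_request(rid, reverse_path=(1, 4, 0))
print(hex(rid), table.path_for_completion(rid))
```

Storing only the reverse path per requester ID matches the point that the I/O switch retains no other information about the request.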
Buffer Management
The host and I/O switch’s interfaces have individual requirements for buffer management. The switch must ensure correct operation when returning flow-control credits to each interface. Two buffer management choices reside on the PCI Express side of the switch. The PCI Express control logic can return credit to its interface when the packet crosses the internal boundary between PCI Express Base and AS, or when the packet is injected into the AS fabric. In either case, the switch must not return credit to either interface unless the buffer is actually available for the link partner to transfer a new packet. The same is true for packets that enter on the AS interface and exit on the PCI Express interface.
The PCI Express interface is a three-queue model, while AS is a dual-queue model that includes a bypass queue. Figure 4a illustrates PCI Express packets being received on the PCI Express interface and the separate queues for posted writes, completions, and non-posted requests. On the AS side, all three packet types are stored in a single queue. However, if a request packet stalls at the head of the queue due to the lack of available bypass credits, the request is moved to the bypass queue. When these credits become available, the stalled request packets must be moved first to prevent additional latencies. This two-queue model is also implemented in all AS fabric switch components.
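The behavior of the dual-queue model on the AS side can be approximated with the sketch below. It is deliberately simplified: only bypass credits are modeled (writes and completions are assumed always able to proceed), each packet is a (kind, name) pair, and the class and method names are hypothetical.

```python
from collections import deque


class ASEgressQueue:
    """Simplified model of the AS single queue plus bypass queue."""

    def __init__(self, bypass_credits: int = 0):
        self.ordered = deque()  # all packet types, in arrival order
        self.bypass = deque()   # requests that stalled for lack of credit
        self.bypass_credits = bypass_credits

    def enqueue(self, kind: str, name: str) -> None:
        """kind is 'request', 'write', or 'completion'."""
        self.ordered.append((kind, name))

    def add_bypass_credits(self, count: int) -> None:
        self.bypass_credits += count

    def next_packet(self):
        """Return the next packet to send, or None if nothing can move."""
        # Stalled requests go first once credits are available.
        if self.bypass and self.bypass_credits > 0:
            self.bypass_credits -= 1
            return self.bypass.popleft()
        while self.ordered:
            kind, name = self.ordered.popleft()
            if kind != "request":
                return (kind, name)
            if self.bypass_credits > 0:
                self.bypass_credits -= 1
                return (kind, name)
            # No credit: park the request so later packets are not blocked.
            self.bypass.append((kind, name))
        return None


q = ASEgressQueue(bypass_credits=0)
q.enqueue("request", "R1")
q.enqueue("write", "W1")
print(q.next_packet())   # the write passes the stalled request
q.add_bypass_credits(1)
print(q.next_packet())   # the stalled request is sent once credit arrives
```

Parking the stalled request rather than holding it at the head of the queue is what lets writes and completions keep moving, which is the property the deadlock-avoidance rules depend on.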
As PI-8 traffic is received at the AS interface, it gets placed in a single queue (Fig. 4b). The queue structure changes to the standard three-queue model only when the traffic moves from the AS interface to the PCI Express interface. This ensures a deadlock-free operation, because the bypass required by PCI Express ordering rules is supported by the switch.
PCI Express Base and AS optionally support multiple virtual channels (VCs), each of which must provide the queue structure shown in Figure 4. This means that an arbiter or scheduler is needed to determine which VC is allowed to use the interface egress port next. On the switch’s AS side, the VC arbitration table is software-controlled via registers within the virtual-channel-capability structure.
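One simple way to picture a table-driven arbiter is shown below. The sketch assumes the arbitration table is just a list of VC numbers that software has written, with a VC's share of the egress port set by how often it appears; the TableArbiter class and this table format are illustrative assumptions, not the register layout of the virtual-channel-capability structure.

```python
class TableArbiter:
    """Hypothetical table-driven VC arbiter (weighted round-robin style)."""

    def __init__(self, table):
        if not table:
            raise ValueError("arbitration table must not be empty")
        self.table = list(table)  # e.g. [0, 0, 1] gives VC0 two slots per VC1 slot
        self.pos = 0              # next table entry to examine

    def pick(self, pending):
        """Return the next VC allowed to use the egress port, or None.

        pending is the set of VC numbers that currently have a packet ready.
        """
        for _ in range(len(self.table)):
            vc = self.table[self.pos]
            self.pos = (self.pos + 1) % len(self.table)
            if vc in pending:
                return vc
        return None


arbiter = TableArbiter([0, 0, 1])
print([arbiter.pick({0, 1}) for _ in range(6)])
```

Skipping table entries whose VC has nothing pending keeps the egress port busy whenever any VC has traffic.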
Maintaining Order
AS ordering and deadlock-avoidance rules satisfy the corresponding rules for PCI Express Base (and PCI), although the two sets of rules aren’t identical. Unlike AS, PCI Express technology requires that writes bypass completions to avoid deadlocks. This rule exists in PCI Express standards to support (now obsolete) PCI bridges that don’t support delayed transactions. AS doesn’t require the capability because it guarantees forward progress for completions. Nevertheless, writes can bypass completions on the PCI Express side of an AS-to-PCI Express bridge.
These differences are made transparent to PCI Express devices and software by AS flow-control mechanisms and the PI-8 bridge architecture. As a result, PCI Express/PCI transactions can be transported transparently from one subsystem to another via the AS fabric while maintaining strong ordering and remaining free from deadlock.
To summarize, PCI Express Base and AS are designed to be complementary, despite the divergence between the two standards at the transaction layer. AS is a natural fit for high-end switching fabrics, and through the use of PI-8 switching devices, both standards can be intertwined. These switches ensure proper translation of routing methods, and provide system configuration, buffer-management techniques, and a transaction ordering process to bridge the different methodologies of each standard. All legacy PCI Express software can be retained, with the fabric-manager node running on the AS fabric.