Server reliability, availability and serviceability (RAS) have become crucial for businesses, and as systems approach a guaranteed uptime target of 99.999 percent, or "five nines," the ability to replace or add components on the fly has become critical. PCI Express (PCIe) was introduced into the PC and server environments as a serial communications interface standard and has since built such traction that it is now the protocol of choice in the server interconnect arena. The need for hot-plug-ready PCIe slots on these servers has become extremely important, as is evident in next-generation designs. Hot-plug functionality is essential to maintaining "highly available" server systems.
The fundamental purpose of hot-plug functionality is to allow the orderly live insertion and extraction of boards or enclosures without adversely affecting system operation. This is typically done to repair or replace faulty devices, since it is often difficult, if not impossible, to schedule downtime on a server to replace or install peripheral cards. The ability to insert or replace I/O devices this way eliminates, or at least minimizes, system downtime. The same technique can also be used to add new functionality or to reconfigure a system. Laptop users also need hot-plug capability to swap cards that provide I/O functions, such as disk drives and communications ports built into docking stations.
The PCI Specification 2.1 was not designed to accommodate hot-plug applications. In 1997, the PCI Hot-Plug Specification 1.0 was introduced, defining the basic platform, add-in card and software requirements. However, system developers and users were left with a fair amount of flexibility in their interpretation and implementation of the specification, and no software-visible register set had been defined. In 2001, a new set of specifications was put forward: PCI Standard Hot-Plug Controller (SHPC) 1.0 and PCI Hot-Plug 1.1. These specifications tightened the user interface and defined a standard set of registers, greatly enhancing the compatibility of hot-plug software development.
Still, PCI had hardware limitations because it is a multi-drop shared bus that relies on reflected-wave signaling. A device could not be inserted without first electrically isolating the bus. To overcome this, there were two possible choices: make the platform responsible for the pluggable requirements, i.e., hot plug; or make the add-in card responsible for the pluggable requirements, i.e., hot swap.
Both approaches had their shortcomings, and innovative techniques, such as pre-charging the PCI pins before inserting a card into the system, were used to work around problems such as glitch generation and to maintain data integrity. The standard PCI connector had pins of the same length, so the power sequencing of add-in cards could not be accommodated in the hot-swap environment. A new set of connectors with a staggered-pin approach was therefore defined for CompactPCI. This resolved most of the limitations, but because PCI was a parallel bus, performance was becoming a bottleneck: the frequency at which the bus operated depended on bus loading, and more devices on the bus meant a slower bus. Lastly, due to its shared nature, the PCI bus could not survive a component failure, because any erratic behavior on the bus would prevent all of the devices on it from communicating.
PCI Express, by contrast, was designed from its inception to support hot-plug functionality. Hot-plug registers are part of PCIe's capability structure, providing the operating system with a standard hot-plug hardware register interface accessible through configuration accesses on the PCIe bus. PCI Express also defines a standard usage model by specifying the hot-plug capabilities required of hardware at a base architectural level. This native support for hot-plug control enables innovative server module form factors to be inserted or removed under power without requiring that the chassis be opened.
In a PCIe-based server system, hot-plug slots may be sourced either from the chipset or from the downstream ports of a switch. Because PCIe is a point-to-point bus, switches are usually required for slot expansion, given the limited number of ports on the root complex. These switches appear to software as PCI-to-PCI bridges, and each port implementing a hot-plug-capable slot contains its own set of hot-plug registers in its bridge configuration space. These registers report the presence or absence of the defined hot-plug mechanisms to software. They provide control of slot power and indicators, along with notification of card insertion/removal, latch open/close and attention button presses. Software is notified by an interrupt sent upstream to the root complex; the notification option is implementation-dependent.
PCIe switch developers have taken two approaches to providing hot-plug support: on-chip (Figure 1) or off-chip (Figure 2) SHPC support. In the first approach, the controller logic and signaling interface are embedded in the switch, and the device and slot-capabilities registers are also implemented within the switch. In the off-chip SHPC approach, the signaling interface is not present, and system hardware designers must implement additional circuitry on the board to mimic the hot-plug controller, either through an FPGA or an I/O expander with I2C support. This increases the bill-of-materials cost, the board space required and the complexity of the design. An example of an on-chip SHPC implementation can be found in PLX Technology PCIe switches, where all required hot-plug status registers are integrated on the chip and all required signals for hot-plug implementation are provided. The PLX switches support from three to eight ports with hot-plug capability. This substantially reduces the cost and complexity of implementation for systems needing SHPC functionality.
PCI Express hot-plug software support depends on three essential factors: firmware support, device driver support and operating system support. All three must support the hot-plug specification to enable the system to handle add-in card insertions and removals.
Upon power-up, during the power-on self test (POST), firmware creates and loads certain methods and tables. The implementation of these methods and tables is defined in the Advanced Configuration and Power Interface (ACPI) firmware specification. The firmware is also responsible for configuring the system address space for the operating system: it divides the address space into a number of specialized regions, including those used for system memory, system I/O and the PCI configuration space required by PCI devices. The firmware address map dictates how the operating system uses these regions.
When a PCIe endpoint is inserted into a PCIe switch's slot, the switch asserts an enabled interrupt, such as the Presence Detect Changed interrupt. The root complex forwards the interrupt to the bus driver, the Windows plug-and-play (PnP) manager and the Windows power manager. The Windows PnP manager, in turn, requests that the PCI bus driver re-enumerate the PCI bus devices, which may result in resource reallocation. When a new device is discovered, the corresponding driver is loaded, and the device is then initialized and prepared to handle I/O (see Figure 3).
All hot-plug events such as insertion, ejection and removal on Windows 2000 and Windows XP could be handled through ACPI 1.0b; Windows XP supports one additional object introduced in ACPI 2.0, the _HPP object. These operating systems were limited to resource allocation on a single bus segment and could not reallocate hardware resources (I/O and memory apertures) at runtime. Insertion of a bridge or a switch, however, requires multi-level resource allocation for both the switch and the devices behind it.
An operating system can properly start a device only if the pre-allocated resources are sufficient to accommodate it. If the allocated resources are insufficient, particularly when a bridge is inserted, the operating system usually cannot reconfigure PCI-to-PCI bridges at runtime to satisfy the device's resource requirements.
The ACPI Specification 3.0 resolved this by providing support for the Operating System Capabilities (_OSC) method and the Device Specific Method (_DSM). The _OSC method is a mechanism by which firmware and the operating system negotiate control of hardware features and convey parent-child dependencies, whereas _DSM is an optional ACPI control method that enables device-specific control functions and allows the OS to ignore a PCI device's boot configuration.
Windows Vista, a PCI Express-aware OS, supports multilevel rebalancing, which allows PCI bridge windows to be dynamically sized based on the resource requirements of the devices behind them. Methods such as _DSM allow platform firmware to fully boot-configure devices for Windows XP and Server 2003, while giving Windows Vista the liberty to ignore the boot-configured resources, allowing greater flexibility in resource allocation. This enables the operating system to relocate and/or expand PCI bridge resource windows to accommodate device requirements that would otherwise be impossible to meet while respecting the boot configuration. The multilevel rebalance code in Windows Vista therefore operates more efficiently, since it is no longer restricted by having to maintain the system's boot-configured resources.
To summarize, PCIe delivers a significant amount of resilience at the transaction, data-link and physical levels, which, along with its point-to-point nature, helps PCIe-based designs avoid a single point of failure. These characteristics, combined with PCIe’s native Hot-Plug support at the firmware and the OS levels, all confirm that PCIe is ushering in a new era of RAS.
Ali Jahangiri is manager of applications engineering at PLX Technology (www.plxtech.com), Sunnyvale, Calif. He can be reached at [email protected]
Company: PLX Technology