Leverage CPLD Flexibility In Customized PCI Interfaces

Although developed specifically for the PC industry, the Peripheral Component Interconnect (PCI) bus now serves everything from desktop peripherals to advanced network switches. For industrial or communications systems, which often use proprietary solutions, designers don't have to maintain 100% PCI bus compliance. There, programmable logic devices (PLDs) offer tremendous flexibility, allowing designers to implement just the right combination of PCI features. The challenge, however, is to find a PLD with the performance required for full-speed PCI transactions and enough logic capacity to handle the functions.

Historically, field-programmable gate arrays (FPGAs) have been big enough but, until recently, too slow. Complex PLDs (CPLDs), on the other hand, were fast enough but, until recently, too small. Over the last two years, however, FPGAs have sped up and CPLDs have grown larger, so designers have an abundance of devices from which to choose. But while both types of devices can handle a PCI implementation, the recent improvement in CPLD densities, together with their intrinsic features, makes CPLDs very attractive for PCI applications.

For instance, CPLD architectures are inherently good for state-machine-based designs, and many areas of a PCI design can take advantage of this. Another CPLD advantage is that performance is predictable, and remains constant throughout the design cycle. CPLDs also have abundant routing resources, so meeting a particular pinout, or using all of the logic resources available is relatively straightforward.

To show how CPLDs can be used to create a customized PCI interface, let's examine the design of a simple application that can be implemented with 128 macrocells. Larger CPLDs can also be used for PCI designs with more functionality, such as initiators or designs that incorporate more of the system logic on the chip. Soon, when CPLDs incorporate on-chip memory blocks, designers will be able to implement an entire PCI interface, including FIFO memory.

Designers must consider several issues when implementing PCI applications in programmable logic. The demands of PCI require high speed, generous routing resources, high I/O count, pinout flexibility, and a consistent-performance timing model.

Compliance with the 33-MHz PCI specification demands a set-up time no longer than 7 ns. In many implementations, especially in FPGAs, the fanout requirements for the address and data buses affect set-up time. One solution is to capture the address and data values in a register at every clock cycle, then route them to where they're needed. However, this adds a clock cycle of latency. A key advantage of CPLDs is that their performance specification doesn't depend on the fanout, so the design can meet the 7-ns set-up requirement regardless of where the address and data bus signals go.

Next, bused signals must be driven valid between 2 and 11 ns after the clock signal. In FPGAs, many delays contribute to a signal's total clock-to-output delay. These include clock-to-Q logic paths, routing, and I/O buffer delays. Often, they can add up to more than 11 ns, which would then require a wait-state to be added to the PCI interface. However, because most CPLDs have a fixed-delay timing model, they can easily propagate an output from a register to an output pin in less than 11 ns. This is done regardless of the routing, while incorporating an entire "pass" of logic (Fig. 1). In fact, most CPLD datasheets guarantee this timing delay, typically referred to as t_CO2.
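To see how tight these two limits are, it helps to add them up against the 30-ns period of a 33-MHz clock. The budget below is a worked check rather than part of the original design. The 10-ns bus propagation allowance and the 2-ns clock-skew allowance are the values commonly quoted from the PCI specification; they are assumptions here, because the text above gives only the 7-ns set-up and 11-ns clock-to-output figures.

```latex
\begin{aligned}
T_{cyc} &\ge T_{val} + T_{prop} + T_{skew} + T_{su} \\
30\ \text{ns} &\ge 11\ \text{ns} + 10\ \text{ns} + 2\ \text{ns} + 7\ \text{ns} = 30\ \text{ns}
\end{aligned}
```

Under these assumptions the budget closes with no slack, which is why any additional routing or fanout delay on either side of a register tends to cost a wait state.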

Another critical requirement is that the PCI device must respond to a transaction within three clocks after the address phase. If the PCI initiator does not get a response within four clock cycles, it will abort the transaction. A device that responds as early as one clock cycle after the address phase is called a "fast-response" device. Because the interface must also meet the 7-ns set-up time, most programmable-logic implementations fall into the medium or slow (two- or three-clock-cycle response) categories. By using a CPLD, designers get performance that is both fast and predictable. Additionally, CPLDs can perform a large number of logic operations within one pass of the device (or one clock cycle).

The ability to handle high-fanout signals is yet another important characteristic. Many signals within the PCI interface must be routed to 36 or more signal nodes simultaneously. This can cause significant loading problems for most FPGAs, so some type of buffering, duplication, or special routing must be used to handle these signals. Such additional support complicates the implementation significantly. However, most CPLDs have fanout-independent routing, so routing a signal to 36 nodes is no problem. Any signal from a pin or macrocell within a CPLD can be routed to as many places as needed with no change in device performance.

With most of the baseline conditions now established, let's examine the design of a PCI target interface using a CPLD containing 128 macrocells (in this case a Cypress Ultra 37128). This version can hold the smallest possible PCI target design, and allows engineers to craft a cost-effective solution while maintaining high performance and flexibility.

The design was entered using VHDL (Cypress' Warp2 development tool) with no device-specific structural components. The custom application ties into the PCI bus through a generic user interface that can easily be customized by modifying the VHDL source code. The design performs the protocol interactions of a PCI target interface (the active-low TRDY#, DEVSEL#, and STOP# signals), throttles the data bus between PCI and the user interface using two-cycle reads and one-cycle writes, includes an address counter for bursts, and determines transaction hits.

This design targets the smallest CPLD possible. It does not implement a parity generator or the configuration space registers. The designer could implement both of these functions on a separate device, leaving the CPLD to focus on the most essential, and usually most challenging, part of PCI interfacing: the control logic (Fig. 2). A minimum PCI target design incorporating parity generation and configuration space registers requires approximately 140 macrocells in a CPLD. Using a 192- or 256-macrocell device would leave sufficient resources for custom logic.

The design also includes an external FIFO memory to handle memory-write operations. This reduces latency, as the CPLD generates all FIFO control signals. Thus, the PCI target interface implements writes in 2-1-1-1 bursts, and reads in 2-2-2-2 bursts. To select between user-application data and configuration-space data, the CPLD generates a mode signal to switch between the two bidirectional data buses.

Different applications will require address spaces of different sizes, so designers should expect to modify the VHDL design description accordingly when implementing PCI in a PLD. Because a device's address space is determined by the number of hardwired zeroes in the lower bit positions, decreasing the address space size actually increases the resources required in a PLD. The reason is that hardwired zeroes require minimal resources within a CPLD, while real address bits must be stored in the CPLD's macrocell registers.

Decreasing the number of hardwired zeroes also increases the address-compare portion of the design, because more real bits need to be compared. Changing the address space can also impact the address counter used for burst reads and writes.

Another possible modification to the PCI design is the support of aborts and retries. This logic would be added to the control block, and would need to recognize the REQ_ABORT and REQ_RETRY signals. Finally, designers can replace the FIFO memory with internal registers within the device. The number of bits stored will affect the resources required for the PCI design.

The input registers at the PCI address/data bus continuously capture data on every clock cycle. As soon as the state machine transitions to the CMP_ADDR (compare-address) state, the control logic knows that the address of a new transaction is contained within the input registers. This data is used to determine a transaction hit in the AddrCompare block, and is captured in the AddrCounter block. In parallel, the command-decode logic determines which (if any) of the internal command flags should be raised.
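The capture-and-decode step described above can be sketched in a few lines of VHDL. This is a minimal illustration, not the article's source: the entity name, port names, and flag names are hypothetical, active-low signals carry an `_n` suffix, and only the memory-read and memory-write commands are decoded (using the standard PCI encodings 0110 and 0111).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical input stage: registers AD and C/BE# on every clock and
-- raises an internal flag for the two commands this sketch recognizes.
entity pci_input_regs is
  port (
    clk       : in  std_logic;
    ad        : in  std_logic_vector(31 downto 0);
    cbe_n     : in  std_logic_vector(3 downto 0);
    ad_q      : out std_logic_vector(31 downto 0);
    mem_read  : out std_logic;
    mem_write : out std_logic
  );
end pci_input_regs;

architecture rtl of pci_input_regs is
  signal cbe_q : std_logic_vector(3 downto 0);
begin
  -- Capture the bus unconditionally; the state machine decides later
  -- whether the captured value was an address or data.
  capture : process (clk)
  begin
    if rising_edge(clk) then
      ad_q  <= ad;
      cbe_q <= cbe_n;
    end if;
  end process capture;

  -- Command decode from the registered C/BE# lines
  -- (PCI encodings: 0110 = memory read, 0111 = memory write).
  mem_read  <= '1' when cbe_q = "0110" else '0';
  mem_write <= '1' when cbe_q = "0111" else '0';
end rtl;
```

The flags are only meaningful while the registers hold an address phase, so a surrounding design would qualify them with the state machine's CMP_ADDR state.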

For write transactions, when the state machine is in the TDATA state, the data-input register provides write data to the user data bus. The active-low FIFOWRITE signal is asserted when data on the user bus is valid. The data is available for one clock cycle only, so it must be captured on each clock during which FIFOWRITE is asserted.

For read transactions, the data buses from the user ports are routed to the PCI data bus. On the user interface, the address bus contains the address of the desired data. The user must produce the data as soon as possible (approximately 10 ns). If more time is required, additional wait states may be introduced. This is accomplished by modifying the TRDYIVAL equation. The high-impedance outputs of the user read-data-bus are controlled by the INOUT_USR signal. INOUT_USR in the high state indicates a write operation, which disables the user output drivers.

Finally, the mode of the transaction is reflected by the CFGMODE signal. When it is in the high state, the external configuration-space device responds; when in the low state, the user device responds.

The state machine within the PCI target design is the primary control circuit. Basically, the state machine remains in the idle state until it determines that the PCI initiator is attempting to interface to this target device. From there it moves to one of three other states as determined by various conditions (Fig. 3).

When the PCI bus is inactive, or the transaction on the PCI bus does not involve this particular target, the state machine remains in IDLE. Here, the C/BE bus is predecoded to assert the appropriate internal command mode. As soon as a new transaction is initiated (the FRAME# signal is asserted in this state), the state machine will move to the CMP_ADDR state.

In the compare-address state, the logic uses a two-stage, pipelined comparator to determine transaction hits. The first stage is completed in IDLE, and the second stage is completed in the compare-address state. If an address hit is determined, then the state machine will move to DTRANS, otherwise it will return to IDLE.

All PCI read and write data transfers occur in the DTRANS state. The state machine will remain here until the last transfer takes place (TRDY# and IRDY# are asserted, FRAME# is de-asserted), or a disconnect is signaled by the target (STOP# is asserted, and known to be recognized by the initiator because the IRDY# signal was also asserted). For these cases, the state machine will move back to IDLE or BACKOFF, respectively.

The state machine will change to BACKOFF when the initiator recognizes that the target is signaling a disconnect (STOP# and IRDY# both asserted). In this state, the target waits for the initiator to complete the transaction. The initiator will assert IRDY# and de-assert FRAME# to terminate the transaction. When this occurs, the state machine moves back to IDLE.
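The four states and their transitions can be written as a single clocked process. The sketch below follows the transitions described above but makes several assumptions that are not in the text: the entity and port names are hypothetical, active-low signals carry an `_n` suffix, the last-transfer exit is given priority over the disconnect exit, and a one-bit history of FRAME (`frame_q`) is added so that a transaction already in progress for another target is not mistaken for a new one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pci_target_fsm is
  port (
    clk         : in  std_logic;
    rst_n       : in  std_logic;  -- asynchronous, active-low reset
    frame_n     : in  std_logic;
    irdy_n      : in  std_logic;
    trdy_n      : in  std_logic;  -- the target's own registered TRDY
    stop_n      : in  std_logic;  -- the target's own registered STOP
    hit         : in  std_logic;  -- '1' when the address compare matches
    in_idle     : out std_logic;
    in_cmp_addr : out std_logic;
    in_dtrans   : out std_logic;
    in_backoff  : out std_logic
  );
end pci_target_fsm;

architecture rtl of pci_target_fsm is
  type state_t is (IDLE, CMP_ADDR, DTRANS, BACKOFF);
  signal state   : state_t;
  signal frame_q : std_logic;  -- FRAME as sampled on the previous clock
begin
  fsm : process (clk, rst_n)
  begin
    if rst_n = '0' then
      state   <= IDLE;
      frame_q <= '1';
    elsif rising_edge(clk) then
      frame_q <= frame_n;
      case state is
        when IDLE =>
          -- New transaction: FRAME newly asserted (assumed edge detect).
          if frame_n = '0' and frame_q = '1' then
            state <= CMP_ADDR;
          end if;
        when CMP_ADDR =>
          if hit = '1' then
            state <= DTRANS;
          else
            state <= IDLE;
          end if;
        when DTRANS =>
          if frame_n = '1' and irdy_n = '0' and trdy_n = '0' then
            state <= IDLE;     -- last data phase completed
          elsif stop_n = '0' and irdy_n = '0' then
            state <= BACKOFF;  -- initiator has seen the disconnect
          end if;
        when BACKOFF =>
          if frame_n = '1' and irdy_n = '0' then
            state <= IDLE;     -- initiator has ended the transaction
          end if;
      end case;
    end if;
  end process fsm;

  -- State decodes for the rest of the design.
  in_idle     <= '1' when state = IDLE     else '0';
  in_cmp_addr <= '1' when state = CMP_ADDR else '0';
  in_dtrans   <= '1' when state = DTRANS   else '0';
  in_backoff  <= '1' when state = BACKOFF  else '0';
end rtl;
```

Using an enumerated type leaves the state encoding to the synthesis tool, which suits the point made here about sequential encoding in product-term architectures.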

Although the VHDL code describing the PCI target design can be used in any VHDL synthesis tool, the particular code shown in this article is intended for use with the Cypress Warp2 VHDL development system. The VHDL design is divided into several sections: state machine, output registers, three-state I/O logic, pipeline registers, signal equations, data-path, address-compare block, and address counter.

CPLD architectures in general appear to be inherently better at implementing state-machine designs than their FPGA counterparts. This is true for several reasons:

The product-term logic implements equations in sum-of-product form, which is the natural, optimal form for state-machine designs.
CPLDs have a higher logic-to-register ratio than FPGAs. That simplifies the design because most state machines require more logic than registers, especially when using sequential encoding.
CPLDs have a fixed-delay timing model. This allows state-machine equations to run at a predictable frequency regardless of the routing or the logic in between. Unlike in FPGAs, the fixed-delay model also means that design changes do not alter the timing, so a revised CPLD implementation delivers the same predictable performance.

To meet the PCI specification, remember that clock-to-output delays must be between 2 and 11 ns. In a CPLD implementation, this is done by registering all of the PCI bus-control signals, and updating them on every clock cycle. At reset, the signals are preset to a logic high.
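A minimal sketch of that output stage is shown below, assuming hypothetical names (`*_next_n` for the values computed by the control logic, `*_n` for the registered, active-low outputs). The point it illustrates is that each pin is driven directly from a flip-flop, so the clock-to-output delay does not depend on the control equations, and that the registers come out of reset de-asserted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pci_ctrl_out_regs is
  port (
    clk           : in  std_logic;
    rst_n         : in  std_logic;  -- asynchronous, active-low reset
    trdy_next_n   : in  std_logic;  -- next-cycle values from control logic
    devsel_next_n : in  std_logic;
    stop_next_n   : in  std_logic;
    trdy_n        : out std_logic;
    devsel_n      : out std_logic;
    stop_n        : out std_logic
  );
end pci_ctrl_out_regs;

architecture rtl of pci_ctrl_out_regs is
begin
  regs : process (clk, rst_n)
  begin
    if rst_n = '0' then
      -- Preset to logic high, which is the de-asserted level.
      trdy_n   <= '1';
      devsel_n <= '1';
      stop_n   <= '1';
    elsif rising_edge(clk) then
      -- Updated on every clock; no logic sits between register and pin.
      trdy_n   <= trdy_next_n;
      devsel_n <= devsel_next_n;
      stop_n   <= stop_next_n;
    end if;
  end process regs;
end rtl;
```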

Furthermore, the PCI specification dictates that the PCI-bus data and control signals must not drive the bus when not actively participating in a transaction. This is easily implemented in a CPLD using the output-enable product terms at each I/O cell. In some architectures, such as the Cypress CPLDs, these product terms are dedicated to performing output enable. Thus, they don't take logic resources away from the rest of the design.
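In VHDL, this behavior is usually written as a conditional assignment of 'Z', which a synthesis tool can map onto the output-enable product term of each I/O cell. The fragment below is a sketch under assumed names: `drive_ctrl` and `drive_ad` stand for enables that the control logic would generate, and they are not signals named in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pci_tristate is
  port (
    drive_ctrl : in    std_logic;  -- '1' while this target owns the control lines
    drive_ad   : in    std_logic;  -- '1' while this target returns read data
    trdy_reg_n : in    std_logic;  -- registered control values
    stop_reg_n : in    std_logic;
    ad_out     : in    std_logic_vector(31 downto 0);
    ad_in      : out   std_logic_vector(31 downto 0);
    trdy_n     : out   std_logic;
    stop_n     : out   std_logic;
    ad         : inout std_logic_vector(31 downto 0)
  );
end pci_tristate;

architecture rtl of pci_tristate is
begin
  -- Control lines float unless this target is part of the transaction.
  trdy_n <= trdy_reg_n when drive_ctrl = '1' else 'Z';
  stop_n <= stop_reg_n when drive_ctrl = '1' else 'Z';

  -- The AD bus is driven only during the data phases of a read.
  ad    <= ad_out when drive_ad = '1' else (others => 'Z');
  ad_in <= ad;  -- the input path always sees the bus
end rtl;
```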

To deal with the pipeline registers and signal equations in the VHDL design, the code simply has to register necessary signals and define the logic equations for all internal and external PCI signals. The Warp2 development software can then optimize the equations for the CPLD architecture, and provide a summary of the final implementation in a report file.

The output-data bus path is also registered to meet the timing requirements of PCI. When wait states are induced by the initiator (IRDY# is sampled de-asserted), the data path must be halted until the initiator is again ready to accept data. The VHDL code must describe that behavior and provide registers for the data, byte enables, and a write-enable signal to the FIFO memory.
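One way to express the halt is a clock enable on the output-data register, so that the register simply keeps its value while the initiator is not ready. The sketch below is an assumption-laden simplification rather than the article's data path: all names are hypothetical, `in_dtrans` and `is_write` are taken to come from the state machine and command decode, and the FIFO write strobe is asserted for one clock after each completed write data phase.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pci_data_path is
  port (
    clk          : in  std_logic;
    rst_n        : in  std_logic;
    in_dtrans    : in  std_logic;  -- '1' during data transfer
    is_write     : in  std_logic;  -- '1' for a write transaction
    irdy_n       : in  std_logic;
    trdy_n       : in  std_logic;
    ad           : in  std_logic_vector(31 downto 0);
    cbe_n        : in  std_logic_vector(3 downto 0);
    usr_rd_data  : in  std_logic_vector(31 downto 0);
    usr_wr_data  : out std_logic_vector(31 downto 0);
    usr_be_n     : out std_logic_vector(3 downto 0);
    fifo_write_n : out std_logic;
    ad_out       : out std_logic_vector(31 downto 0)
  );
end pci_data_path;

architecture rtl of pci_data_path is
begin
  path : process (clk, rst_n)
  begin
    if rst_n = '0' then
      usr_wr_data  <= (others => '0');
      usr_be_n     <= (others => '1');
      fifo_write_n <= '1';
      ad_out       <= (others => '0');
    elsif rising_edge(clk) then
      -- Write side: register data and byte enables every clock, and
      -- strobe the FIFO only when a data phase actually completed.
      usr_wr_data <= ad;
      usr_be_n    <= cbe_n;
      if in_dtrans = '1' and is_write = '1'
         and irdy_n = '0' and trdy_n = '0' then
        fifo_write_n <= '0';
      else
        fifo_write_n <= '1';
      end if;

      -- Read side: hold the output register during initiator wait states.
      if not (in_dtrans = '1' and irdy_n = '1') then
        ad_out <= usr_rd_data;
      end if;
    end if;
  end process path;
end rtl;
```

Because the strobe and the write-data register are updated on the same clock edge, the data on the user bus and the active-low strobe line up for exactly one cycle.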

To recognize ownership of an initiated transaction, PCI target devices must latch the address on the PCI bus, and compare it to the contents of their base address register (BAR) to determine a match. The CPLD implementation performs a bit-by-bit equality comparison of the address and the BAR value. This produces a product-term-based solution that requires only two passes through the CPLD.
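A two-pass comparison can be sketched by splitting the significant address bits into two groups, registering one equality result per group, and combining the two results in the following cycle. The code below is one possible reading of that scheme, not the article's source. The names are hypothetical, the BAR value is assumed to arrive on a port, and the generic `ZERO_BITS` (the number of hardwired-zero low address bits) defaults to 16 purely for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity addr_compare is
  generic (
    ZERO_BITS : natural := 16  -- assumed: low bits hardwired to zero
  );
  port (
    clk : in  std_logic;
    ad  : in  std_logic_vector(31 downto 0);  -- PCI AD bus, address phase
    bar : in  std_logic_vector(31 downto 0);  -- base address value
    hit : out std_logic
  );
end addr_compare;

architecture rtl of addr_compare is
  -- Split point between the two groups of compared bits.
  constant MID : natural := (32 + ZERO_BITS) / 2;
  signal hi_match_q : std_logic;
  signal lo_match_q : std_logic;
begin
  -- First pass: one registered equality result per group.
  stage1 : process (clk)
  begin
    if rising_edge(clk) then
      if ad(31 downto MID) = bar(31 downto MID) then
        hi_match_q <= '1';
      else
        hi_match_q <= '0';
      end if;
      if ad(MID - 1 downto ZERO_BITS) = bar(MID - 1 downto ZERO_BITS) then
        lo_match_q <= '1';
      else
        lo_match_q <= '0';
      end if;
    end if;
  end process stage1;

  -- Second pass: combine the partial results into the hit signal.
  hit <= hi_match_q and lo_match_q;
end rtl;
```

Lowering `ZERO_BITS` widens both slices, which is the resource cost of a smaller address window: more real bits to register and compare.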

The final section of the VHDL design description is the address counter. During burst transfers, the target device must increment the address; this target design uses a 14-bit counter (counts on double word boundaries) to perform that task. The counter is loaded whenever the state machine is in IDLE (in anticipation of a new transaction), and is incremented whenever data is transferred.

The inclusion of a T flip-flop in each macrocell makes CPLDs inherently good at implementing binary counters. The CPLD's architecture allows it to implement fully loadable counters that can operate at well over 100 MHz. A further benefit of designing with VHDL is that the counter's description is simple and straightforward (see code listing).
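The following is not the original listing but a minimal sketch of what a loadable 14-bit address counter might look like in VHDL. The port names are hypothetical, `load` is assumed to be driven while the state machine is in IDLE, `inc` is assumed to pulse once per completed data transfer, and `ieee.numeric_std` is used for the arithmetic, which may differ from the packages a given tool provides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_counter is
  port (
    clk      : in  std_logic;
    load     : in  std_logic;  -- '1' in IDLE: track the bus address
    inc      : in  std_logic;  -- '1' when a data transfer completes
    ad       : in  std_logic_vector(15 downto 2);  -- double-word address bits
    usr_addr : out std_logic_vector(15 downto 2)
  );
end addr_counter;

architecture rtl of addr_counter is
  signal count : unsigned(13 downto 0);
begin
  counter : process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        count <= unsigned(ad);  -- load takes priority over increment
      elsif inc = '1' then
        count <= count + 1;     -- next double word; wraps at 2**14
      end if;
    end if;
  end process counter;

  usr_addr <= std_logic_vector(count);
end rtl;
```

Counting bits 15 down to 2 steps the address in double words, so 14 counter bits span a 64-KB window.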

This design, although it provides only a basic target-PCI interface, can be embellished with larger FIFO memories to improve the data transfer efficiency. Also, a larger CPLD would allow the designer to include more system logic functions and to implement a PCI initiator interface, which would require several thousand additional gates of logic.