CoreConnect: The On-Chip Bus System

Processor Local Bus (PLB)
  General processor local bus
  Synchronous, nonmultiplexed bus
  Separate Read, Write data buses
  Supports concurrent Reads, Writes
  Multimaster, programmable-priority, arbitrated bus
  66/133/183 MHz (32-/64-/128-bit)
  32-bit address
  32-/64-/128-bit implementations (to 256-bit)
  Pipelined, supports early split transactions
  Overlapped arbitration (last cycle)
  Supports fixed, variable-length bursts
  Bus locking

On-Chip Peripheral Bus (OPB)
  Peripheral bus for slower devices
  Synchronous, nonmultiplexed bus
  Multimaster, arbitrated bus
  32-bit address
  Separate 32-bit Read, Write buses
  Pipelined transactions
  Overlapped arbitration (last cycle)
  Supports bursts
  Dynamic bus sizing, 8-, 16-, 32-bit devices
  Single-cycle data transfers
  Bus locking (parking)

Device Control Register (DCR) Bus
  Provides alternate path to device control registers
  Synchronous, nonmultiplexed bus
  Separate Read, Write data buses
  Single-master, multiple-slave bus
  10-bit address bus
  32-bit data buses
  Two-cycle minimum Read/Write cycles
  Distributed multiplexer architecture
  Supports 8-, 16-, 32-bit devices
  Single-cycle data transfers

Overview

System-on-a-chip (SoC) and ASIC silicon densities now support system-level implementations. Such designs need buses to link processors, memory, peripherals, and special functions. On-chip multilevel bus systems have emerged to meet these needs. One is the CoreConnect on-chip bus system from IBM Microelectronics. Originally designed to support PowerPC cores for IBM ASICs, this bus system now handles other processors and can be licensed for deployment. It's a major competitor to Silicore's Wishbone (www.silicore.com), ARM's AMBA (www.arm.com), and other multilevel bus systems.

CoreConnect is an on-chip silicon bus for ASIC or FPGA designs. It consists of a three-level system: the processor local bus (PLB), the on-chip peripheral bus (OPB), and the device control register (DCR) bus. The first bus, the PLB, connects the processor to high-performance peripherals, such as memory, DMA controllers, and fast devices. Bridged to the PLB, the OPB supports the slower-speed peripherals. The third bus, the DCR, is a separate control bus that links to all of the devices, controllers, and bridges. It provides a separate path to set and monitor the individual control registers.

Processor Local Bus

The PLB is the main on-chip system bus. It links the processor with on-chip memory, memory controllers, and other high-speed peripherals, including DMA controllers. It's a synchronous, multimaster, arbitrated bus. For higher throughput, it supports concurrent Reads and Writes, even for the same master. To make this possible, each master has a single 32-bit address bus plus separate Read and Write data buses, which can be implemented 32, 64, or 128 bits wide. The design even allows 256-bit-wide data buses.

The PLB also supports pipelined addressing, enabling masters to request bus access while the current transaction(s) are executing. It implements four priority levels of bus access. In addition, masters can lock the bus for atomic operations, keeping out any other master until they have completed the locked transaction(s). Transaction address cycles have three phases: request, transfer, and acknowledge. Data cycles have two phases: transfer and acknowledge. For a single-clock data transfer, the acknowledge phase can occur in the trailing portion of the same clock as the transfer phase.

The centerpiece of the PLB is the PLB macro, which includes the arbiter and bus multiplexer switch. Each bus master connects its arbitration signals, bus control signals, address bus, and Read and Write data buses to the arbiter, which functions like a giant multiplexer. The PLB can carry a Read and a Write concurrently through its multiplexer. It presents only one master's address bus at a time, though, so it starts one transaction at a time. The arbiter supports up to 16 masters, with no logical restrictions on the number of slave devices.
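The arbitration rules described above (four priority levels, up to 16 masters, bus locking) can be sketched in a few lines of Python. This is only an illustrative model, not IBM's arbiter logic: the class and field names are hypothetical, and the tie-break rule (lowest master number wins among equal priorities) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    master_id: int      # 0..15: the arbiter supports up to 16 masters
    priority: int       # 0 (lowest) .. 3 (highest): four priority levels
    lock: bool = False  # master asks to hold the bus for an atomic sequence


class PlbArbiterSketch:
    """Toy model: highest priority wins; a locked bus stays with its owner."""

    MAX_MASTERS = 16
    PRIORITY_LEVELS = 4

    def __init__(self) -> None:
        self.locked_by: Optional[int] = None

    def grant(self, requests: List[Request]) -> Optional[int]:
        """Return the master granted the next address cycle, or None."""
        for r in requests:
            if not 0 <= r.master_id < self.MAX_MASTERS:
                raise ValueError("master_id out of range")
            if not 0 <= r.priority < self.PRIORITY_LEVELS:
                raise ValueError("priority out of range")

        if self.locked_by is not None:
            # While the bus is locked, only the lock owner can be granted.
            owner = [r for r in requests if r.master_id == self.locked_by]
            if not owner:
                return None
            winner = owner[0]
        else:
            if not requests:
                return None
            # Assumption: ties go to the lowest-numbered master.
            winner = max(requests, key=lambda r: (r.priority, -r.master_id))

        # The lock is held only as long as the winner keeps asserting it.
        self.locked_by = winner.master_id if winner.lock else None
        return winner.master_id
```

In this sketch, a low-priority master that wins the bus with `lock=True` keeps every other master out, whatever their priority, until it issues a request with `lock=False`.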

The PLB implements an interesting variation of a split-transaction: a forward split-transaction. The traditional split-transaction separates the master request from the slave response. This is invaluable for making Reads efficient, giving the device time to get its data ready to transfer. In contrast, CoreConnect's PLB permits masters to make early requests to the slave before the bus is allocated to them. This early warning enables the addressed device to set up before the bus is allocated to the master's request.

The PLB supports both fixed- and variable-length bursts. The fixed-length bursts are defined by a 3-bit field in the transaction request. Variable-length transactions are controlled by the master bus control signals.

Address pipelining also enables the bus to start a new transaction before the current transaction completes. This means that a Read or a Write can be started during a Write or a Read transaction in a master. The new transaction might be for the existing master or another master.
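A rough cycle count shows why this overlap matters. The sketch below treats the bus as a two-stage pipeline (address phase, then data phase) and compares back-to-back transactions with and without overlap. The function names and the cycle counts in the usage note are hypothetical and are not taken from the PLB specification.

```python
def cycles_without_overlap(n: int, addr_cycles: int, data_cycles: int) -> int:
    """Each transaction finishes its data phase before the next address phase starts."""
    return n * (addr_cycles + data_cycles)


def cycles_with_overlap(n: int, addr_cycles: int, data_cycles: int) -> int:
    """The next address phase runs while the current data phase is in progress."""
    if n == 0:
        return 0
    # Standard two-stage pipeline: the first address phase and the last data
    # phase are fully exposed; in between, the slower stage sets the pace.
    return addr_cycles + (n - 1) * max(addr_cycles, data_cycles) + data_cycles
```

For example, four transactions with an assumed 2-cycle address phase and 4-cycle data phase take `cycles_without_overlap(4, 2, 4)` = 24 cycles, against `cycles_with_overlap(4, 2, 4)` = 18 cycles when the address phases are hidden behind data transfers.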

For larger systems with multiple CoreConnect PLB buses, IBM has provided two crossbar switches. These switch cores let designers link multiple on-chip PLBs through a central clearinghouse switch.

On-Chip Peripheral Bus

Designed to support slower peripherals, the OPB is implemented as a straightforward multimaster, arbitrated bus. It's a synchronous bus with a common clock, but its devices can run with slower clocks, as long as all of the clocks' rising edges are in sync with the rising edge of the main clock. This bus uses a distributed multiplexer implementation.

The OPB implements a 32-bit address bus and a separate 32-bit data bus. Transaction widths can be full-word, half-word, or byte-size. The bus supports 8-, 16-, and 32-bit wide device interfaces (aligned on the left-most byte). Data transactions can take a single cycle (for matched clocks), and burst operations are supported.
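To make the dynamic bus sizing concrete, the following sketch splits one 32-bit word into the sequence of transfers a narrower device would need. It is a simplified model: the function name is hypothetical, big-endian byte order (left-most byte first) is an assumption based on the left-most-byte alignment mentioned above, and the real bus's byte-enable and acknowledge signaling is not modeled.

```python
from typing import List, Tuple


def split_word_transfer(address: int, word: int, device_bits: int) -> List[Tuple[int, int]]:
    """Return (address, value) pairs, one per bus transfer, left-most chunk first."""
    if device_bits not in (8, 16, 32):
        raise ValueError("device width must be 8, 16, or 32 bits")
    if not 0 <= word < (1 << 32):
        raise ValueError("word must fit in 32 bits")

    chunk_bytes = device_bits // 8
    data = word.to_bytes(4, "big")  # assumption: left-most byte is most significant
    transfers = []
    for offset in range(0, 4, chunk_bytes):
        chunk = int.from_bytes(data[offset:offset + chunk_bytes], "big")
        transfers.append((address + offset, chunk))
    return transfers
```

Under these assumptions, a word access to a 32-bit device is one transfer, a 16-bit device needs two, and an 8-bit device needs four.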

The bus masters compete for the bus via the arbiter. Each master connects directly to the arbiter via its Mn_request (bus request), Mn_busLock (bus lock), and OPB_MnGrant (bus grant) signals. A master may request to lock the bus, holding it until the bus is released.

The OPB consists of two multiplexer-based buses: the OPB address bus (OPB_ABus) and the OPB data bus (OPB_DBus). The address bus gates the selected master's address bus to the devices, and through "AND" and "OR" gating, the data bus picks up the selected master's data bus (Mn_DBus) for a Write or the addressed slave's data bus (Sln_DBus) for a Read.
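The "AND" and "OR" gating can be modeled directly: each source's bus is ANDed with its own select line, and the gated values are ORed onto the shared bus. The sketch below is a behavioral illustration only, with a hypothetical function name; it assumes the arbiter and address decode guarantee that at most one select is active.

```python
from typing import List


def and_or_mux(sources: List[int], selects: List[bool], width: int = 32) -> int:
    """Combine several source buses into one shared bus using AND-OR gating."""
    if len(sources) != len(selects):
        raise ValueError("need exactly one select per source")
    mask = (1 << width) - 1
    bus = 0
    for value, selected in zip(sources, selects):
        gate = mask if selected else 0  # AND stage: unselected sources drive zeros
        bus |= value & gate             # OR stage: merge onto the shared bus
    return bus
```

Because unselected sources contribute all zeros, no central multiplexer is needed. The gating can sit next to each device, which is what makes the multiplexer "distributed."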

Each clocked data transfer includes a transfer-acknowledge signal from the slave device indicating completion of the data transfer. If the slave device can't complete the data transfer or accept the transfer request, it can assert an error-acknowledge signal, Sln_errAck (ORed across the slaves to create OPB_errAck, the OPB error acknowledge). The OPB supports built-in DMA peripheral controllers with a special DMA channel. The DMA channel arbitrates for control of the PLB like a master.

Device Control Register Bus

The DCR bus provides an alternative path to the system for setting the individual device control registers. With it, the host CPU can set up the device-control-register sets without loading down the main PLB. This bus has a single master, the CPU interface, which can Read or Write to the individual device control registers. The bus employs a ring implementation to connect the CPU interface to the devices, which are addressed via a 10-bit address bus. A separate 32-bit data bus transfers register data. (On-chip silicon is cheap.)

This is a synchronous bus. The individual devices can run at a slower clock rate than the bus clock, but the rising edge of the device clocks must correspond to that of the faster DCR bus clock. The CPU, the master, connects to each slave device with its address bus, data bus, and control signals (dcrWrite, dcrRead). The output bus of the slave devices connects to the CPU interface via a multiplexer "OR" function. The addressed device simultaneously signals receipt of a Read transaction and the end of a Write transaction. Bursts aren't supported by this bus. Each transaction takes a minimum of three cycles (more with slower device clocks).
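As a behavioral illustration of the single-master scheme, the sketch below models a DCR bus with a 10-bit address space, 32-bit registers, and slave outputs merged by an "OR" function. The class names, register layout, and base addresses are hypothetical, and cycle timing is not modeled.

```python
from typing import List, Optional, Tuple


class DcrSlave:
    """A device exposing a block of 32-bit control registers."""

    def __init__(self, base: int, count: int) -> None:
        self.base = base
        self.regs = [0] * count

    def _index(self, address: int) -> Optional[int]:
        offset = address - self.base
        return offset if 0 <= offset < len(self.regs) else None

    def write(self, address: int, data: int) -> bool:
        i = self._index(address)
        if i is None:
            return False  # not addressed: no acknowledge
        self.regs[i] = data & 0xFFFFFFFF
        return True

    def read(self, address: int) -> Tuple[bool, int]:
        i = self._index(address)
        if i is None:
            return (False, 0)  # not addressed: drive zeros onto the OR
        return (True, self.regs[i])


class DcrBus:
    """Single master (the CPU interface) talking to multiple slaves."""

    ADDRESS_BITS = 10

    def __init__(self, slaves: List[DcrSlave]) -> None:
        self.slaves = slaves

    def _check(self, address: int) -> None:
        if not 0 <= address < (1 << self.ADDRESS_BITS):
            raise ValueError("DCR address must fit in 10 bits")

    def write(self, address: int, data: int) -> bool:
        self._check(address)
        # Every slave sees the write; only the addressed one acknowledges.
        return any([s.write(address, data) for s in self.slaves])

    def read(self, address: int) -> Tuple[bool, int]:
        self._check(address)
        ack, data = False, 0
        for s in self.slaves:
            slave_ack, slave_data = s.read(address)
            ack = ack or slave_ack
            data |= slave_data  # "OR" function on the slave output buses
        return (ack, data)
```

In this model, a read from an address that no slave claims returns no acknowledge and all-zero data. How a real master treats that case (for example, with a timeout) is outside the scope of the sketch.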