During the early days of microcomputing, system designers could achieve adequate performance using a relatively simple bus. These buses had a basic protocol with address, data, and control signals all present at the input when a clock edge arrived. Wait states were often added while the bus waited for a target to respond. But as processors became exponentially faster, buses became a cause for concern. Low bus bandwidth was causing extreme system bottlenecks.
Designers soon realized that there were a variety of ways to increase that bandwidth. Upping the number of signals on the data bus proved to be one simple way to do this. Today, growth in this area continues, but alternative methods to increase bandwidth have also been introduced. These practices include separating the location (address), intent (transaction type), and the relevant information (data). This separation allows for streaming data (bursting), reusing signal lines (multiplexing), and even overlapping transactions (pipelining). All of these techniques result in bus performance that has reasonably kept pace with advances in microprocessor technology.
Pipeline- and transaction-based buses are currently used in many common applications. In fact, nearly all computers have a variety of one or the other. The proliferation of bus architectures partially resulted from the separation of processor, I/O, graphics, and memory buses. The processor front-side bus (FSB), the PCI bus, and the accelerated graphics port (AGP) are just a few examples from a typical system (Fig. 1). Intel began utilizing advanced buses early in the x86 architecture, while Motorola's use of pipelined buses was evident in the PowerPC. Even ARM, the low-power newcomer, has implemented a pipeline in its processor-bus family. Though different, each of these buses contains the same basic transactional elements and pipelining architecture that present challenges to test equipment, such as a logic analyzer.
Advanced buses break bus action into different transactions. Early in the transfer, a transaction type identifies a device's intent in using the bus. This foresight enables intelligent recipients to begin processing before an entire transfer is complete. In addition, the bus can be partitioned into discrete logical chunks or phases with a defined end phase. The definition of these elements becomes part of the fundamental bus architecture. The entire transfer of all phases is defined as the transaction (Fig. 2). Multiplexing, bursting, and pipelining all use transactions as a foundation.
A data write can explain how a write transaction evolves into a burst transfer. A sender begins by announcing its intent to do a write and giving a start location. After this setup, the sender begins inundating the receiver with data. Because the receiver has already acknowledged the start location, it then takes on the responsibility of tracking the continued destination of this data. This is typically done by incrementing an address counter by the word size. The bus is thereby emancipated from the tedious, yet important, task of providing a destination for every data transfer. Plus, overhead work is greatly reduced.
An example of a transaction-based bus that allows bursting is the PCI bus. After announcing the start location, or address, of a read or write, the data is sent continually across the bus. Control signals determine the start and end of the transfer, while the receiver of the information has the task of incrementing the address counter and placing the data in the proper location.
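The receiver's side of a burst can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary standing in for memory, and the 4-byte word size are all hypothetical, and no real bus control signals are modeled.

```python
def burst_write(memory, start_address, words, word_size=4):
    """Receiver side of a burst: one address, then data-only transfers."""
    address = start_address        # latched once, during the address phase
    for word in words:             # each later transfer carries data only
        memory[address] = word     # place the data in the proper location
        address += word_size       # the receiver advances its own counter
    return address                 # next location after the burst


memory = {}
next_address = burst_write(memory, 0x1000, [0xAA, 0xBB, 0xCC])
```

The sender supplies the start address only once. Every later destination is derived by the receiver, which is what removes the per-transfer address overhead from the bus.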
The P6 family of processors, which includes the Pentium II Xeon, Pentium II, and Pentium Pro processors, as well as the Intel Celeron and Mobile Pentium II, utilizes the P6 family system bus as a processor front-side bus. This architecture introduced an 8-stage pipeline, which allows for much higher utilization of the electrical traces on the motherboard. A deferred-reply transaction was added to the P6 family system-bus architecture, adding a new and highly desirable dimension to the bus. Many microprocessor vendors currently use different permutations of these bus-architecture elements in their processor FSBs.
Not only does the pipelining, or queuing, of actions allow a device to begin work on a transaction before it is complete, it actually lets another transaction begin before its predecessor is complete. This is physically accomplished by defining a certain group of signals that can be used only for a phase of each transaction. Once the phase is over, another device is free to use those signals. Because there is less latency between the beginning of different actions, there is higher bandwidth. In the P6 family system-bus architecture, the pipeline is tracked by devices on the bus using an in-order queue.
Deferred response, a very apt name, is simply the ability for any bus agent to say, "Hey, I'm not ready yet. Go do something else, and I'll get back to you when I've got what you asked for." This simple concept dramatically increases bus efficiency by not requiring the initiator to continually retry the request.
The P6 family system bus is extremely complex and would take many pages to cover in detail. To understand some of the triggers below, look at how the in-order queue depth or request counter (rcnt), address (ADDR), request (REQ), data (DATA), response (RS), and some other lines behave for various pipelined and deferred transfers. To understand the details of other signals and the many exceptions to the operations shown here, please refer to the Intel P6 Family of Processors Hardware Developers' Manual, which is available on Intel's web site (www.intel.com). Or, check out the MindShare Pentium Pro and Pentium II System Architecture book (ISBN: 0-201-30973-4). Both are excellent references that can help you turn the triggering principles introduced here into valuable trigger sequences.
The diagram shown in Figure 3 begins with the yellow transaction on an idle bus (rcnt = 0). Simply asserting the address strobe (ADS#) increments the queue depth on the next clock cycle (rcnt = 1). After passing the required phases (error, snoop), the agent responsible for responding to the yellow request defers the response by asserting DEFER and indicating it on the response signals.
Before completion of the yellow transaction, however, the next transaction (green) arbitrated and finished its request phase. The beginning of the green transaction incremented the queue depth again. But notice that one clock after completion of the yellow transaction, it decremented back down to rcnt = 1. Watching the queue depth increment and decrement on subsequent transactions reveals that it is based on the number of outstanding transactions on the bus. However deep the queue grows, responses have to come back in order. Thus, it is called an in-order queue, or IOQ. Even if the red transaction has a response ready immediately, it must first wait for green.
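As a rough illustration, the queue behavior can be modeled in Python. This is a sketch, not the bus specification: the class and method names are invented for the example, the one-clock delay between ADS# and the rcnt update is ignored, and the depth limit of eight reflects the maximum number of outstanding transactions on the P6 family system bus.

```python
from collections import deque


class InOrderQueue:
    """Toy model of an in-order queue (IOQ); names are illustrative only."""

    def __init__(self, max_depth=8):
        self.max_depth = max_depth
        self.pending = deque()

    @property
    def rcnt(self):
        # Queue depth: the number of transactions still awaiting a response.
        return len(self.pending)

    def request(self, name):
        # A new request phase (ADS# asserted) adds one outstanding transaction.
        if self.rcnt >= self.max_depth:
            raise RuntimeError("in-order queue is full")
        self.pending.append(name)

    def respond(self):
        # Responses complete strictly in request order, oldest first.
        return self.pending.popleft()


ioq = InOrderQueue()
ioq.request("yellow")
ioq.request("green")
depth_after_two_requests = ioq.rcnt   # two transactions outstanding
first_response = ioq.respond()        # yellow completes (here, by deferring)
ioq.request("red")
second_response = ioq.respond()       # green, even if red is already ready
depth_at_end = ioq.rcnt               # red is still outstanding
```

A first-in, first-out structure is the natural fit, because the only transaction allowed to respond is the oldest one still in the queue.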
In this example, the blue transaction happens to be the deferred reply to the yellow transaction (indicated in the diagram by the asterisk). Instead of the yellow request agent retrying the entire transaction, the reply agent simply arbitrates and transfers the necessary data. (In practice, a deferred reply typically follows the original transaction by much more than two states.)
This short example highlights the information that's important to the triggers described here. When tailoring this information to a bus or triggers, determine what is important for you. Sketching a picture similar to this can prove invaluable.
A designer might look for many different combinations of events on a bus. Searching for a lockup event and detecting a reset are common examples. Another is transferring a specific data value to a particular location. Corrupt data, bus-to-bus transfer errors, POST code failures, and many other anomalies might be detected. The address location could be a memory address, an I/O port, or some other device on the bus. Building a trigger to find an address with a data value emphasizes the complexity of today's buses, while highlighting the power of current trigger systems.
Some analysis probes make triggering on complex buses, such as PCI and Pentium, relatively easy. These algorithmic analysis probes track the bus and latch address and data values. The realignment performed by the devices trivializes triggering on a specific address and data value. Unfortunately, the case differs for extremely complex buses like the P6 family. Their deep pipeline, deferred reply, and very high speed make similar bus alignment nearly impossible. Designers must rely on advanced logic-analyzer trigger systems to bring insight into this complexity.
The examples below use a flowchart style to evolve the bus operation shown earlier into trigger concepts. These flowcharts are directly applicable to logic-analyzer trigger sequences (see "Turning A Flowchart Into An Actual Logic-Analyzer Trigger Sequence," p. 100).
Always begin a trigger sequence by entering level 1 with RUN (Fig. 4). That level and its entire contents are evaluated during one clock cycle. The discoveries and actions defined in level 1 decide if the move is to the next level or back to the same level on the next clock edge. Branching allows non-sequential movement through the levels. When and if a trigger (point of interest) is found, there's a choice to either continue searching or stop and view the data.
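The level-by-level evaluation can be mimicked with a small Python model. It is a sketch under simplifying assumptions, not the behavior of any particular analyzer: the function name and the data layout are hypothetical, each level is a list of (condition, action) branches, the first matching branch wins, and a sample that matches no branch leaves the sequence in the same level.

```python
def run_sequence(levels, samples):
    """Evaluate one level per sample; return the trigger index or None."""
    level = 1                                # RUN always enters level 1
    for index, sample in enumerate(samples):
        for condition, action in levels[level]:
            if condition(sample):
                if action == "trigger":
                    return index             # point of interest found
                level = action               # branch to another level
                break
        # No branch matched: stay in the same level for the next clock edge.
    return None


# Find an "A" and then, on some later clock, a "B".
levels = {
    1: [(lambda s: s == "A", 2)],
    2: [(lambda s: s == "B", "trigger")],
}
trigger_index = run_sequence(levels, ["x", "B", "A", "x", "B", "x"])
```

The "B" at the second sample is ignored because the sequence is still in level 1. Only the "B" that follows the "A" triggers.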
Trigger Example 1: Transaction type with address. Triggering on a transaction type alone or with an address doesn't dramatically challenge today's advanced systems. Reflecting on the bus operation outlined earlier, the inception of a new transaction begins by asserting the ADS# signal. This action samples a transaction type from the REQ lines and a valid address from the ADDR lines. To perform the simple task of finding this, look for the combination of all three events. Upon their detection, trigger. Continue looking until they're found (Fig. 4, again).
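Expressed as a Python sketch, this single-level trigger is one combined condition tested on every sample. The sample layout is an assumption made for illustration: `ads` is True when ADS# is asserted, and the transaction types are placeholder strings rather than the real REQ line encodings.

```python
def find_trigger(samples, req_type, address):
    """Return the index of the first sample matching all three events."""
    for index, s in enumerate(samples):
        if s["ads"] and s["req"] == req_type and s["addr"] == address:
            return index      # trigger
    return None               # keep looking until the trace ends


samples = [
    {"ads": False, "req": None, "addr": None},
    {"ads": True, "req": "MEM_READ", "addr": 0x80},    # wrong type
    {"ads": True, "req": "IO_WRITE", "addr": 0x80},    # all three match
]
trigger_index = find_trigger(samples, "IO_WRITE", 0x80)
```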
Trigger Example 2: Capturing only I/O writes to a specific port. Build on the simple trigger sequence for capturing a transaction type with a specific address. Now, capture only I/O writes to a specific port address. This more complex trigger sequence introduces the concept of multi-way branching and store qualification, which is available on many modern logic analyzers. The flowchart (Fig. 5) should help make sense out of all this as each section is described.
The goal is to capture only I/O writes, so assume that the trigger system's default storing can be set to store nothing. By explicitly storing samples during each step, overriding that default, the designer can ensure that the display contains only I/O writes. Note the absence of a trigger point for this sequence. It's "capture only," so it will require that the user press the stop button on the logic analyzer to end the trace. A trigger point can easily be added, however.
To begin, build on the simple level that finds a particular transaction type with a specific address. Level 1 of this trigger sequence is essentially the same as the first trigger example. But instead of triggering, the discovery of the conditions causes a move to level 2.
The bus being examined has multiple phases and pipelining, so storing only the information found in level 1 won't be enough to complete a picture of the I/O writes. Determine the stage of the pipeline in which the I/O write occurs, and how many states must be captured to obtain all of the phases in the transaction.
The P6 family diagram outlined earlier conveniently showed the queue depth starting at 0. The discovery of an I/O write in sequence level 1 is possible when the queue depth (rcnt) is at any value, which means the rcnt value must be determined before proceeding. Level 2 utilizes a four-way branch to make this determination. The queue depth increments one clock after an ADS#, so it should be evaluated during level 2. Dispatch to the subsequent sequence level (3, 4, 5, or 6) depends on the value of rcnt that's discovered. If rcnt is 4, for example, the user must wait for three response phases to pass before capturing the fourth response. In this case, the fourth response is the correct response—the one associated with the address and I/O write found in level 1. A similar capture is necessary for a rcnt of 3, 2, and 1.
To look for one response each, build four sequence levels after level 2. This allows entrance into this structure (response capture levels) based on the rcnt value. The following paragraph explains movement through this structure.
If the I/O write to port 80 is found in level 1, the procedure moves into level 2. If the rcnt discovered in level 2 is three, two responses must be captured before the third response appears. This third response correlates to the original I/O write found in level 1. So, the procedure will skip from level 2 to level 4. It will remain in level 4, capturing information until the first response is found. At this point, the procedure can move to level 5. There's a wait at this level until the second response is found. Finally, after moving into level 6, the procedure captures states until the third response appears. Then, it jumps to the final level.
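The dispatch out of level 2 reduces to simple arithmetic, sketched here in Python. The function name is hypothetical, and the level numbering assumes the layout described above: levels 3 through 6 each wait for one response, and level 7 is the final level.

```python
def level_path(rcnt):
    """List the sequence levels visited for a given queue depth."""
    if not 1 <= rcnt <= 4:
        raise ValueError("the four-way branch covers only rcnt 1 to 4")
    entry = 7 - rcnt                       # rcnt 4 -> level 3 ... rcnt 1 -> level 6
    return [1, 2] + list(range(entry, 7)) + [7]


path_for_three = level_path(3)
path_for_one = level_path(1)
```

For an rcnt of three, the path passes through levels 4, 5, and 6, which is one level for each response that must go by.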
Once all of the responses are captured, including the one associated with the I/O write, little remains. After attending to some maintenance in level 7, the procedure simply returns to level 1 and begins looking for other I/O writes.
Studying this bus in more detail reveals that the data transfer isn't specifically designated to occur with the response signals. For most I/O writes, the data transfer will coincide with the response signals, but this doesn't have to be the case.
In most cases, a designer can create protection by storing some extra states after the response signals are discovered. The example uses five extra states, but experimentation is necessary to discover the right amount for a particular bus.
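Putting the pieces together, the whole capture-only sequence can be simulated in Python. Treat this as a sketch with several assumptions: the sample fields and the `IO_WRITE` label are placeholders rather than real signal encodings, level 7's "maintenance" is taken to be the storing of the five extra states, and no new matching ADS# is expected while the sequence sits in the response-capture levels.

```python
IO_WRITE = "IO_WRITE"   # placeholder label, not the real REQ encoding
EXTRA_STATES = 5        # states stored after the final response


def state(ads=False, req=None, addr=None, rcnt=0, rs=False):
    """Build one sampled bus state (a simplified, hypothetical layout)."""
    return {"ads": ads, "req": req, "addr": addr, "rcnt": rcnt, "rs": rs}


def capture_io_writes(samples, port):
    """Return indices of stored samples; default storing is 'nothing'."""
    stored = []
    level = 1
    extra = 0
    for index, s in enumerate(samples):
        if level == 1:
            # Level 1: find the I/O write to the port of interest.
            if s["ads"] and s["req"] == IO_WRITE and s["addr"] == port:
                stored.append(index)
                level = 2
        elif level == 2:
            # Level 2: four-way branch on rcnt (4->3, 3->4, 2->5, 1->6).
            if 1 <= s["rcnt"] <= 4:
                stored.append(index)
                level = 7 - s["rcnt"]
            # Any other rcnt matches no branch, so the sequence stays here.
        elif level <= 6:
            # Levels 3 to 6: store states; each response advances one level.
            stored.append(index)
            if s["rs"]:
                level += 1
                extra = EXTRA_STATES
        else:
            # Level 7: store the extra states, then look for the next write.
            stored.append(index)
            extra -= 1
            if extra == 0:
                level = 1
    return stored


trace = [
    state(rcnt=1),                                     # 0: earlier request pending
    state(ads=True, req=IO_WRITE, addr=0x80, rcnt=1),  # 1: I/O write to port 80
    state(rcnt=2),                                     # 2: rcnt updates a clock later
    state(rcnt=2),                                     # 3
    state(rcnt=2, rs=True),                            # 4: earlier request's response
    state(rcnt=1),                                     # 5
    state(rcnt=1, rs=True),                            # 6: response to the I/O write
] + [state() for _ in range(7)]                        # 7 to 13: idle bus
stored = capture_io_writes(trace, port=0x80)
```

In this trace, the sequence stores samples 1 through 11: the request, the rcnt evaluation, both responses, and five trailing states. The idle samples before and after are discarded.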
Trigger Example 3: Looking for a specific data value. Adding a small amount of complexity to the trigger above enables the detection of a specific data value transferred to or from the address of interest. The example of an I/O write might help a user stop the analyzer on a specific POST code write, for instance. This could prove invaluable if a system continually locks up before or after a certain reading of that code. To gain this insight, alter the default storing to capture everything. Also, change the pre or post store depending on the location of the problem.
The IOQ requires responses to come back in order. So the fourth response-capture level, level 6, will always see the response associated with the initiating address. If the transaction isn't deferred, that level also will see the data for that address. Just add a data qualifier to this level to be able to trigger on this data. The modified level 6 (Fig. 6) shows a data value of 0xC7 qualified with the data ready (DRDY) signal. A second branch is added so that the sequence still exits on a response discovery, even if this combination isn't found. The other levels remain the same, so the sequence tracks the I/O write of data value 0xC7 to port 80.
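The two branches of the modified level 6 can be written out as a short Python sketch. The function name, the sample fields, and the returned strings are hypothetical. The branch order is also an assumption: the data match is tested first so that it takes priority if both conditions occur in the same state.

```python
def level6_branch(sample, data_value=0xC7):
    """Decide what the modified level 6 does with one sampled state."""
    if sample["drdy"] and sample["data"] == data_value:
        return "trigger"            # first branch: qualified data value found
    if sample["rs"]:
        return "goto level 7"       # second branch: exit on any response
    return "stay"                   # otherwise keep waiting in level 6


match = level6_branch({"drdy": True, "data": 0xC7, "rs": True})
other_data = level6_branch({"drdy": True, "data": 0x12, "rs": True})
not_ready = level6_branch({"drdy": False, "data": 0xC7, "rs": False})
```

The last case shows why the DRDY qualifier matters. A data bus that happens to hold 0xC7 while no data is being transferred must not fire the trigger.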
There's no "trigger" in the trigger examples for capturing only I/O writes. The only way to stop a sequence is with the stop button, as mentioned earlier. But if desired, a trigger can be added to any level.
The P6 family system bus can have eight outstanding transactions. With a pipeline this deep, even the most advanced logic-analyzer trigger system will have difficulty. The four-way branching of the HP 16717A state and timing logic-analyzer module, for example, can trigger when the pipeline is up to four deep. In other words, the rcnt cannot exceed four. If it does, the trigger will get stuck in level 2. On a fully utilized bus, these triggers begin to break down. But they still can capture many hardware problems. The triggers assume that bus usage is low enough that the ADS# for a new transaction doesn't appear while the procedure is in the response-detector levels.
Inevitably, a time comes when the bus or sequence of events on the bus becomes too complex for a decent logic-analyzer trigger. In this case, the cross-triggering capability of modern logic analyzers is invaluable. A cross-trigger occurs when the trigger of one analyzer utilizes the "arm" input to the sequencer of another analyzer. This feature can be exploited to find elusive bugs on processor buses and many other system buses.
The example of capturing a POST write of specific data can be greatly simplified using cross-triggering. An algorithmic PCI analysis probe can easily find a write to a specific address with data. Using this event to cross trigger to the FSB analysis probe can produce a result very close to the one described in the complex trigger above. The endless combinations of similar cross triggers can find many problems.
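The arming relationship can be illustrated with a deliberately simplified Python sketch. It assumes both analyzers sample in lockstep on a common time base, which real PCI and FSB captures do not, and the function name, event callbacks, and sample values are all hypothetical.

```python
def cross_trigger(pci_samples, fsb_samples, pci_event, fsb_event):
    """Return the index where the FSB analyzer triggers after being armed."""
    armed = False
    for index, (pci, fsb) in enumerate(zip(pci_samples, fsb_samples)):
        if not armed:
            armed = pci_event(pci)   # the PCI analyzer's trigger arms the FSB one
            continue                 # arming takes effect on the next sample
        if fsb_event(fsb):
            return index             # the FSB sequencer fires only once armed
    return None


pci = [None, None, ("write", 0x80, 0xC7), None, None]
fsb = ["ads", "idle", "idle", "idle", "ads"]
trigger_index = cross_trigger(
    pci,
    fsb,
    pci_event=lambda p: p == ("write", 0x80, 0xC7),
    fsb_event=lambda f: f == "ads",
)
```

The FSB event at the first sample is ignored because the analyzer isn't armed yet. Only the event that follows the PCI write produces a trigger.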
There's no way that this article could cover everything one might find interesting on a highly utilized transaction-based bus. And many of the specifics described here aren't applicable to triggering and capturing on all buses. The concepts remain the same, however.
Understand the bus. Draw a picture to help locate the problem. Map a trigger strategy using a flowchart. Events that could happen during subsequent clock edges become the flowchart levels. The possible occurrences within the same clock edge evolve into the actions and branches within a level.
Using default storing can eliminate unneeded states or include all states. Utilize the memory depth of the analyzer and the definable trigger position to capture where the problem might be located. If the problem isn't found on a complex bus, cross-trigger from a simpler one.
The specific applications shown here are easily extendible to other sophisticated modern buses, since most build on the same foundations. A little understanding and planning can definitely help unravel the debug mystery.