Compact Tracing With 32-Bit Microcontrollers

Nov. 15, 2007

What debugging feature is more likely to be found in higher-end 32-bit microcontrollers?
Microcontrollers with advanced debugging capabilities typically augment the standard JTAG interface with trace facilities. Trace information can be captured on-chip or off-chip.

What is the difference between on-chip and off-chip trace?
On-chip trace stores the trace data in a RAM array located on the chip. It simplifies the off-chip debug hardware, but the amount of on-chip RAM limits how much trace a session can capture.

Off-chip trace streams data over physical pins where an external trace probe or logic analyzer captures the trace stream (see the figure). This approach requires additional pins but less on-chip real estate because of reduced RAM requirements.

Is it possible to eliminate the trace interface pins?
Yes. Microcontroller vendors offer a range of package sizes. The die will likely include all of the trace logic, but only the larger, higher-pin-count packages will have the trace block bonded out to pins.

Is it possible to improve off-chip tracing performance?
Most off-chip trace systems use double-data-rate (DDR) clocking to increase throughput. Minimizing the kinds of trace information is one alternative, but hardware compression typically provides more information with lower bandwidth requirements.

How is trace compression implemented?
One possible implementation encodes the trace stream based on the execution sequence of an application accounting for the sequential nature and locality of reference, as in:

  • Record 1 bit per sequential instruction executed; if no instruction executes on a clock cycle, no storage takes place
  • 2 bits per branch instruction when the destination is predictable from the code
  • 4 bits + offset for indirect jumps or interrupts; the offset is 8, 16, or 32 bits depending on the distance of the branch destination
  • 4 bits + 32 bits for sync record; the sync record is placed in the trace periodically to mark an absolute address from which trace disassembly can start
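
To make the record sizes above concrete, here is a minimal Python sketch that adds up the bits a short trace would consume. The record names and helper functions are hypothetical and chosen only for illustration; the bit counts are the only part taken from the list above.

```python
# Bit costs from the list above. The record kind names are hypothetical.
SEQUENTIAL_BITS = 1
DIRECT_BRANCH_BITS = 2
HEADER_BITS = 4
VALID_OFFSET_BITS = (8, 16, 32)
SYNC_ADDRESS_BITS = 32


def record_bits(kind, offset_bits=0):
    """Return the number of trace bits one record of the given kind consumes."""
    if kind == "sequential":
        return SEQUENTIAL_BITS
    if kind == "direct_branch":
        return DIRECT_BRANCH_BITS
    if kind == "indirect":  # indirect jump or interrupt
        if offset_bits not in VALID_OFFSET_BITS:
            raise ValueError("offset must be 8, 16, or 32 bits")
        return HEADER_BITS + offset_bits
    if kind == "sync":
        return HEADER_BITS + SYNC_ADDRESS_BITS
    raise ValueError(f"unknown record kind: {kind}")


def trace_bits(records):
    """Sum the bits for a sequence of (kind, offset_bits) records."""
    return sum(record_bits(kind, offset) for kind, offset in records)


# One sync record, ten sequential instructions, one predictable branch,
# and one indirect jump with a 16-bit offset.
example = (
    [("sync", 0)]
    + [("sequential", 0)] * 10
    + [("direct_branch", 0), ("indirect", 16)]
)
print(trace_bits(example))  # 36 + 10 + 2 + 20 = 68 bits
```

The same 13 instructions would need 13 x 32 = 416 bits if a full instruction pointer were stored for each one.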

What is a typical compression ratio using this approach?
Typically, 32 bits (the instruction pointer) of uncompressed trace data are available for each instruction. The amount of compression is variable because it depends on the frequency of branches, what types of branches are executed, and the distance of the jump.

The worst-case trace bandwidth occurs when nothing stalls the pipeline: no stalls caused by memory accesses, taken branches, and so on. Such stalls cause the processor to execute fewer instructions per clock, which reduces the trace bandwidth required.

Typical numbers are 20 to 30 instructions for a 64-bit compressed frame. At 20 instructions, the ratio is 10:1. At 30 instructions, the ratio is 15:1.
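
The ratios quoted above follow directly from the 32 bits of uncompressed data per instruction and the 64-bit frame size. A short Python sketch of the arithmetic, with a hypothetical function name:

```python
UNCOMPRESSED_BITS_PER_INSTRUCTION = 32  # one instruction pointer per instruction
FRAME_BITS = 64                         # compressed frame size quoted above


def compression_ratio(instructions_per_frame):
    """Uncompressed bits divided by compressed bits for one frame."""
    uncompressed = instructions_per_frame * UNCOMPRESSED_BITS_PER_INSTRUCTION
    return uncompressed / FRAME_BITS


for n in (20, 30):
    print(f"{n} instructions per frame -> {compression_ratio(n):.0f}:1")
```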

How is tracing tied in with breakpoints?
When a breakpoint is reached, the processor halts and the trace data is read out. Breakpoints can also be used to dynamically turn trace on or off when they occur, allowing the trace buffer to be filled with just the area of code execution desired.

Are there enhancements to the trigger system that can benefit the microcontroller space?
Yes. New microcontroller core designs now include what are called “complex triggers” to enable more sophisticated triggering. One feature is “primed breakpoints,” or the setup of a sequence of triggers so the code has to execute through that sequence before the trigger occurs. This is useful for triggering on a common lower-level function, but only when it is called from a sequence of other calls first. The sequence can also include data breakpoints so primed breakpoints can be used to trigger when a peripheral register is read or written, but only from a specific function.
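
The behavior of a primed breakpoint can be modeled in software as a small state machine. The Python sketch below is only an illustration of the idea, not a description of any vendor's trigger hardware. The class name, the addresses, and the choice to re-arm after firing are all assumptions.

```python
class PrimedBreakpoint:
    """Fires only after every address in `sequence` has been seen in order.

    Assumption: unrelated addresses are ignored rather than resetting the
    sequence, and the breakpoint re-arms itself after it fires.
    """

    def __init__(self, sequence):
        self.sequence = list(sequence)
        if not self.sequence:
            raise ValueError("sequence must contain at least one address")
        self.stage = 0  # index of the next address we are waiting for

    def observe(self, address):
        """Feed one executed address; return True when the final stage matches."""
        if address == self.sequence[self.stage]:
            self.stage += 1
            if self.stage == len(self.sequence):
                self.stage = 0  # re-arm for the next pass
                return True
        return False


# Hypothetical addresses: two callers, then the common low-level function.
bp = PrimedBreakpoint([0x1000, 0x2000, 0x3000])
executed = [0x3000, 0x1000, 0x3000, 0x2000, 0x3000]
print([bp.observe(address) for address in executed])
# [False, False, False, False, True]
```

The low-level function at 0x3000 executes three times, but the trigger occurs only on the call that follows both priming addresses.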

Another breakpoint enhancement, known as a tuple, combines an instruction breakpoint and a data breakpoint. The instruction breakpoint is placed on a load or store instruction, and the subsequent load or store memory cycle must match the data breakpoint. Again, this narrows the trigger down to one specific instruction that reads or writes memory.

Pass counters can be added to each instruction and data breakpoint so they only trigger when the pass count has been reached. This is useful if you want to trigger on a specific number of loops through code or after a certain number of memory accesses.
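
A pass counter can be pictured as a counter attached to a breakpoint match. The following Python sketch is a software model with hypothetical names. It assumes the trigger fires exactly once, on the hit that reaches the pass count; real hardware may behave differently after that point.

```python
class PassCounter:
    """Reports a trigger on the hit that reaches `pass_count`."""

    def __init__(self, pass_count):
        if pass_count < 1:
            raise ValueError("pass_count must be at least 1")
        self.pass_count = pass_count
        self.hits = 0

    def hit(self):
        """Record one breakpoint match; return True if this hit triggers."""
        self.hits += 1
        return self.hits == self.pass_count


# Trigger on the third pass through a loop body.
counter = PassCounter(3)
print([counter.hit() for _ in range(4)])  # [False, False, True, False]
```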

A qualified instruction breakpoint is enabled or disabled by a predefined data breakpoint: when a load or store hits the data address, the value transferred is compared with a specified value. If the value matches, the instruction breakpoint is enabled. If it does not match, the instruction breakpoint is disabled. This is useful for enabling execution breakpoints while a prespecified task is running in a real-time operating system (RTOS) and disabling them while any other task is running.
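
The RTOS case can be sketched as a software model in Python. Everything named here is an assumption made for illustration: the class, the address of a "current task" variable, the task IDs, and the choice to qualify on stores only.

```python
class QualifiedBreakpoint:
    """Instruction breakpoint gated by the value stored to a watched address."""

    def __init__(self, watch_address, qualifying_value, instruction_address):
        self.watch_address = watch_address
        self.qualifying_value = qualifying_value
        self.instruction_address = instruction_address
        self.enabled = False  # assumption: disabled until first qualified

    def on_store(self, address, value):
        """Data breakpoint: a store to the watched address sets the enable."""
        if address == self.watch_address:
            self.enabled = (value == self.qualifying_value)

    def on_execute(self, address):
        """Instruction breakpoint: triggers only while enabled."""
        return self.enabled and address == self.instruction_address


CURRENT_TASK_ADDR = 0x20000000  # hypothetical RTOS "current task ID" variable
TASK_OF_INTEREST = 1            # hypothetical task ID
SHARED_FUNCTION = 0x0800        # hypothetical code address

bp = QualifiedBreakpoint(CURRENT_TASK_ADDR, TASK_OF_INTEREST, SHARED_FUNCTION)

bp.on_store(CURRENT_TASK_ADDR, 2)      # scheduler switches to another task
print(bp.on_execute(SHARED_FUNCTION))  # False

bp.on_store(CURRENT_TASK_ADDR, 1)      # scheduler switches to the task of interest
print(bp.on_execute(SHARED_FUNCTION))  # True
```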


