Class 364: Real Time Embedded Trace For ARM

William Orme, Development Systems Programme Manager


Richard York, Senior ASIC Designer
ARM Ltd., Cambridge, UK

Abstract:

With the move to system-on-chip (SOC) devices come new requirements for on-chip debug

support. Higher frequencies and on-chip memories mean lack of visibility of processor activity

at the pins of the device. As memory sizes and thus software complexity on-chip increase, so

does the need for good debug tools, especially in embedded systems where real-time constraints

are common.

ARM is developing a new Embedded Trace Macrocell to complement its existing

EmbeddedICE technology for the ARM7 and ARM9 family of embedded cores. This will

provide, through a narrow port on the system-on-chip device, completely non-intrusive tracing of

both instruction execution and data in real-time.

ARM has developed trace configuration and display functionality as an extension to the

ARM Debugger for Windows. ARM is working with Hewlett-Packard and other third parties to

provide the off-chip trace data acquisition units.

The Need for Real Time Trace

Breakpointing and stepping application code allows users to run the application code to a

given point in code and then stop the processor. At this point the user has the option of

examining or changing memory or register contents, stepping or restarting the application.

The really difficult bugs to track down are those that occur in situations where there is an

unforeseen and hence unpredictable interaction between the application software and hardware.

These bugs can be intermittent and usually only occur when the system is running at full speed;

simply starting, stopping or stepping the processor does not expose the problem. An historical

non-intrusive trace of instruction flow and data accesses can provide the extra information

needed to identify the bug. For example, an application crashes during an interrupt routine. The

result of the crash is that a memory protection fault occurs; the cause cannot be found using

breakpoints and single stepping methods. The user sets up the trace filter facility to collect trace

data only during the interrupt routine, and the trigger to stop tracing when the protection fault

occurs. The filter facility limits the amount of information that is traced and must be analyzed. The

trigger ensures that the trace information around the bug is captured and not overwritten.

As trace buffer depths are finite, these features are important to ensure the buffer is only filled

with relevant information. They also save time by limiting the information that needs to be

analyzed to find the bug. Trigger and filter conditions can be changed to refine what trace data is

captured and when.
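
The interplay of filter, trigger and finite buffer described above can be modelled in a few lines. This is only an illustrative sketch: the `TraceEvent` record, the `capture` function and the address range of the interrupt routine are assumptions made for the example, not part of the ARM tools.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One traced cycle (hypothetical record layout)."""
    pc: int
    fault: bool = False


def capture(events, in_filter, is_trigger, depth):
    """Keep the last `depth` filtered events up to and including the trigger."""
    buffer = deque(maxlen=depth)  # finite buffer: oldest entries are overwritten
    for event in events:
        if in_filter(event):
            buffer.append(event)
        if is_trigger(event):
            break  # stop tracing so the history leading to the fault survives
    return list(buffer)


# Hypothetical interrupt routine occupying 0x1000-0x10FF.
ISR_START, ISR_END = 0x1000, 0x10FF
stream = [TraceEvent(0x8000), TraceEvent(0x1000), TraceEvent(0x1004),
          TraceEvent(0x1008, fault=True), TraceEvent(0x8004)]
captured = capture(stream,
                   in_filter=lambda e: ISR_START <= e.pc <= ISR_END,
                   is_trigger=lambda e: e.fault,
                   depth=2)
```

With a depth of only two entries, the buffer ends up holding the two interrupt-routine events closest to the fault, which is the behaviour the filter and trigger are meant to guarantee.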

Traditional software debug tools such as in-circuit emulators (ICE) or logic analyzers have

relied on having access to most of the signals of a microprocessor to provide trace functionality.

This is not the case when the microprocessor core is deeply embedded in a SOC. In an extreme

case, there may be no core signals visible on the pins of the chip. The lack of signals is not the

only problem that traditional methods have difficulty overcoming. As frequencies exceed

100 MHz, any addition to signal path lengths can skew the signals to such an extent that

the captured representation of processor activity is incorrect.

The pin-out problem can be overcome by using bondout versions of the SOC, which provide

all the signals needed. Bondouts take on one of two forms: either as an exact replica of the final

SOC or as an implementation of a common subset of the functionality for a product range. Both

have their problems. An exact replica is likely to be of use for only one product, therefore new

bondouts are needed for each new project. A subset requires further logic to be added around the

bondout to provide the functionality of the SOC, so it will most likely behave differently to the

final chip. The use of bondout technology always adds time and cost to the design cycle. The

additional work required is technically challenging, because signals must be routed off chip at

maximum frequency, and it diverts technical expertise away from the main objective.

ARM’s Real Time Debug Solution

The ARM solution puts the real time components of in-circuit emulation into the SOC. The

advantages of this approach are:

• Debug solution running at full processor speed

• Debug of final product, as well as the development version

• Scalable solution for multiprocessor devices

• Standard tools applicable for all ARM core-based SOCs

• Easy, reliable and small interconnect to target

• Low cost

Three elements provide a complete real time debug solution:

• EmbeddedICE logic

• Real Time Monitor

• Real Time Trace

The EmbeddedICE logic, which is an integral part of the ARM7TDMI, ARM9TDMI,

ARM9E and ARM10 cores, contains breakpoint registers that compare the value on the core

address, data and control busses against values programmed into the registers. For example, the

logic may be programmed to generate a breakpoint when an instruction is loaded from a

particular address or a particular data value is stored to a given location. When a breakpoint

occurs, the processor either stops and enters debug state, or takes an exception and

enter a debug monitor program. Memory and register contents can then be examined or

modified, images can be loaded, code can be stepped or execution restarted. The EmbeddedICE

logic provides a full set of run-control debug features.
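
The comparison performed by a breakpoint register can be sketched as follows. This is a simplified model, not the EmbeddedICE register layout: the `Comparator` and `BreakpointUnit` names, the mask convention (a 1 bit means "compare this bit") and the control-bus encoding are all assumptions made for illustration.

```python
from dataclasses import dataclass

ALL_BITS = 0xFFFFFFFF


@dataclass(frozen=True)
class Comparator:
    """Value/mask comparator; a 1 in `mask` means the bit is compared (assumed)."""
    value: int
    mask: int = ALL_BITS

    def matches(self, bus: int) -> bool:
        return (bus & self.mask) == (self.value & self.mask)


@dataclass(frozen=True)
class BreakpointUnit:
    """Raises a breakpoint only when address, data and control all match."""
    address: Comparator
    data: Comparator
    control: Comparator

    def hit(self, address: int, data: int, control: int) -> bool:
        return (self.address.matches(address)
                and self.data.matches(data)
                and self.control.matches(control))


WRITE = 0x1  # hypothetical control-bus encoding: bit 0 set means a store

# Break when the value 0xDEADBEEF is stored to address 0x40000010.
store_watch = BreakpointUnit(
    address=Comparator(0x40000010),
    data=Comparator(0xDEADBEEF),
    control=Comparator(WRITE, mask=0x1),
)
```

Here `store_watch.hit(0x40000010, 0xDEADBEEF, WRITE)` is true, while a read of the same location or a store to a neighbouring address does not match.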

The Real Time Monitor provides two major functions with minimal intrusion on the

application execution time:

1) The debug of foreground tasks while interrupts continue

2) The ability to read and write memory without stopping the processor

ARM’s Real Time Trace Solution

The Real Time Trace solution for ARM cores embedded within an SOC is made up of three

elements that provide the capability to trace instructions and data accesses in real time:

• The Embedded Trace Macrocell

• The Trace Port Analyzer

• The Trace Debug Tools

Figure 1 shows the overall system. The on-chip Embedded Trace Macrocell (“ETM”)

monitors the ARM core busses and passes compressed information via the trace port to the Trace

Port Analyzer (“the analyzer”). The ETM also contains trigger and filter logic to control what is

traced and around which event. The analyzer is an external device, which stores the information

from the trace port. The Trace Debug Tools (“the debug tools”) set up the trigger and filter logic,

retrieve the data from the analyzer and reconstruct an historical view of the processor’s activity.

The Embedded Trace Macrocell

Trace of the instruction flow of the processor is achieved by the ETM broadcasting branch

addresses via the trace port. The complete instruction flow is reconstructed later by the debug

tools using the binary image of the code to fill in the sequential instructions that must have been

executed. Note that self-modifying code therefore cannot be reconstructed. In order to

achieve 100% traceability of the code through as narrow a port as possible, two compression

techniques are used. First, for PC-relative ‘branch’ instructions (B and BL), no address is

broadcast, only a status bit indicating whether the branch was taken or not. An address is only

needed for exceptions and direct loads to the PC, which are infrequent. Second, only address bits

that have changed since the last branch are broadcast. The combination of address compression,

a small on-chip FIFO and the minimum three clock cycles required to fill the

fetch-decode-execute pipeline means that all branch addresses can be broadcast through a 4-bit data port.
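
The second compression technique, broadcasting only the address bits that have changed, can be illustrated with a small sketch. The byte-granular packet format below is an assumption chosen for simplicity and is not the actual ETM trace protocol; it also assumes the receiver can tell how long each packet is.

```python
def compress(prev: int, new: int) -> bytes:
    """Emit only the low-order bytes of `new` that cover every changed bit."""
    changed = (prev ^ new) & 0xFFFFFFFF
    nbytes = max(1, (changed.bit_length() + 7) // 8)  # always send at least one byte
    return (new & 0xFFFFFFFF).to_bytes(4, "little")[:nbytes]


def expand(prev: int, packet: bytes) -> int:
    """Rebuild the full address from the previous one plus the changed low bytes."""
    mask = (1 << (8 * len(packet))) - 1
    low = int.from_bytes(packet, "little")
    return (prev & ~mask & 0xFFFFFFFF) | low


last_branch = 0x00008120
near_target = 0x00008F44   # differs only in the low 12 bits
far_target = 0x20000000    # differs in the high bits as well

near_packet = compress(last_branch, near_target)
far_packet = compress(last_branch, far_target)
```

In this sketch a branch to a nearby address costs two bytes instead of four, and the receiver recovers the full address with `expand(last_branch, near_packet)`. Only a distant target needs all four bytes.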

The ETM can also be programmed to broadcast the address and/or data value of data reads

and/or writes. Again, only address bits changed since the last data address are broadcast. Full

data tracing of applications excluding data intensive functions, such as block copy, can be

achieved through an 8-bit data port with only a 40-byte on-chip FIFO. When tracing of all data

accesses is not possible, resources within the ETM can be used to control what data is traced. For

example: only accesses inside (or outside) of selected data regions; or only data accesses by

certain routines; or only writes of a given, bit-masked value. The instruction trace is always

broadcast with the data trace.
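
Two of the data-trace filters listed above (accesses inside or outside a region, and writes of a bit-masked value) can be expressed as simple predicates. The `Access` record and the helper names are hypothetical and only illustrate the selection logic, not how the ETM resources are programmed.

```python
from typing import Callable, NamedTuple


class Access(NamedTuple):
    """One data access seen on the core busses (hypothetical record)."""
    addr: int
    data: int
    is_write: bool


Filter = Callable[[Access], bool]


def inside(start: int, end: int) -> Filter:
    """Trace only accesses within [start, end]."""
    return lambda a: start <= a.addr <= end


def outside(start: int, end: int) -> Filter:
    """Trace only accesses that fall outside [start, end]."""
    return lambda a: not (start <= a.addr <= end)


def write_of(value: int, mask: int = 0xFFFFFFFF) -> Filter:
    """Trace only writes whose data matches `value` on the bits set in `mask`."""
    return lambda a: a.is_write and (a.data & mask) == (value & mask)


accesses = [
    Access(0x2000, 0x00000001, True),
    Access(0x2004, 0x0000FF80, True),
    Access(0x9000, 0x00000080, False),
]
in_buffer = [a for a in accesses if inside(0x2000, 0x2FFF)(a)]
flag_writes = [a for a in accesses if write_of(0x80, mask=0xFF)(a)]
```

The masked comparison selects the write to 0x2004 because only the low byte is compared. The read at 0x9000 carries the same low byte but is rejected because it is not a write.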

The same resources can be used to turn tracing on and off to monitor a suspect piece of code

over a longer elapsed time within the finite trace buffer of the analyzer. They are also used to

generate a complex trigger event about which the analyzer collects the trace data.

The quantity of controlling resources and size of on-chip FIFO are selectable by the ASIC

designer through synthesis parameters to best meet the trade-off between silicon area, pin count

and complexity of debug features. The maximum configuration allows:

• 16 full 32-bit address comparators, configurable as 8 range comparators

• 8 data comparators

• 16 address decodes

• 4 user-defined external inputs

• 2 inputs from the ARM core’s EmbeddedICE address and data comparators

• 4 16-bit counters

• a 3 stage sequencer

The number of pins of the ASIC utilized is configurable as 9, 13 or 21 depending on

whether a 4, 8 or 16-bit data port is implemented. The remaining five pins are used for three pipeline

status bits, a synchronization bit and a clock signal. Pins can be multiplexed with other signals.

The standard five JTAG interface pins are also required and are used to set up the ETM.
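
The pin counts quoted above follow directly from the port width plus the five fixed signals. A minimal check of that arithmetic, with function and constant names chosen only for this example:

```python
PIPELINE_STATUS_PINS = 3
SYNC_PINS = 1
CLOCK_PINS = 1
FIXED_PINS = PIPELINE_STATUS_PINS + SYNC_PINS + CLOCK_PINS


def trace_port_pins(data_width: int) -> int:
    """Total trace port pins for a 4, 8 or 16-bit data port (JTAG pins excluded)."""
    if data_width not in (4, 8, 16):
        raise ValueError("data port width must be 4, 8 or 16 bits")
    return data_width + FIXED_PINS


pin_counts = [trace_port_pins(width) for width in (4, 8, 16)]
```

The result is the list 9, 13, 21, matching the three configurations described in the text.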

The Trace Port Analyzer

The analyzer is an external device, which stores the information from the trace port. The

trace information is compressed so that the analyzer does not need to capture data at the same

bandwidth as an analyzer monitoring the core busses directly. This has the benefit of either

lowering the analyzer cost or increasing the amount of processor activity that can be traced.

ARM has been working with Hewlett-Packard to ensure timely support for ARM’s new

on-chip facilities with HP’s logic analyzers (16600 and 16700 series) and their low cost Trace Port

Analyzer. ARM is also enabling many other analyzer, emulator and debugger vendors to bring to

market new products which will support these new features.

The first generation of low cost HP trace port analyzers will support frequencies up to

100 MHz and 4 or 8-bit data port widths with 512K cycle deep buffers. The logic analyzers with

the current (at time of writing) generation of state/timing cards will support frequencies up to

333 MHz and 4, 8 or 16-bit data port widths with 2M cycle deep buffers. Cascading of logic

analyzer channels gives a maximum depth of 40M cycles (limited to 100 MHz). The logic

analyzer can be used to simultaneously watch hundreds of additional signals synchronous to the

processor activity.
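
To put these buffer depths in perspective, the unfiltered capture window can be estimated as depth divided by clock frequency. The sketch below assumes that one buffer entry is consumed per clock cycle and that "K" and "M" are binary multiples (1024 and 1024 x 1024); neither assumption is stated in the figures above.

```python
K = 1024          # assumed meaning of "K"
M = 1024 * 1024   # assumed meaning of "M"


def capture_window_seconds(depth_cycles: int, clock_hz: float) -> float:
    """Time covered by a full buffer when every clock cycle is stored."""
    return depth_cycles / clock_hz


windows = {
    "trace port analyzer, 512K at 100 MHz": capture_window_seconds(512 * K, 100e6),
    "logic analyzer, 2M at 333 MHz": capture_window_seconds(2 * M, 333e6),
    "cascaded logic analyzer, 40M at 100 MHz": capture_window_seconds(40 * M, 100e6),
}
```

Under these assumptions the three configurations cover roughly 5.2 ms, 6.3 ms and 0.42 s of continuous execution. Windows this short are why the on-chip filter logic is needed to stretch the buffer over a longer elapsed time.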

The Trace Debug Tools

The Trace Debug Tools retrieve the compressed trace data from the analyzer and reconstruct

an historical view of the processor’s activity using a stored copy of the binary image loaded into

the target. The display window shows a disassembly of code executed with full symbol

information and data accesses to memory interleaved. Auto-correlation to the source code

(C/C++ or assembler) is provided with highlight bars that scroll in lock step, allowing rapid

understanding of the trace data. The debug tools provide a configuration wizard to set up the

trigger and filter logic of the Embedded Trace Macrocell in a manner intuitive to the software

engineer and not requiring a detailed understanding of the ETM logic. The debug tools interface

to the target via an extension to ARM’s Remote Debug Interface (RDI 1.51) which is used by

third party debugger vendors wishing to support the ARM architecture.
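
The reconstruction step can be sketched as a walk over the stored image, consuming one taken/not-taken bit at each direct branch. This is a deliberately simplified model: it assumes fixed 4-byte instructions, represents the image as a hypothetical mapping from address to branch target (or `None` for a non-branch instruction), and ignores exceptions and loads to the PC.

```python
def reconstruct(image, start, branch_taken):
    """Return the executed addresses implied by the branch outcome bits.

    image: dict mapping address -> branch target, or None for a non-branch.
    branch_taken: iterable of booleans, one per direct branch encountered.
    """
    outcomes = iter(branch_taken)
    executed = []
    pc = start
    while pc in image:
        target = image[pc]
        if target is None:
            executed.append(pc)       # sequential instruction, filled in from the image
            pc += 4
            continue
        taken = next(outcomes, None)
        if taken is None:
            break                     # trace data exhausted at this branch
        executed.append(pc)
        pc = target if taken else pc + 4
    return executed


# Hypothetical four-instruction loop: the branch at 0x108 jumps back to 0x100.
image = {0x100: None, 0x104: None, 0x108: 0x100, 0x10C: None}
flow = reconstruct(image, start=0x100, branch_taken=[True, False])
```

Two status bits are enough to recover all seven executed instructions here, which shows why the trace port can be so much narrower than the core busses.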

The Trace Debug Tools will be available as an add-on to the new ARM Developer Suite on

Windows NT4 and 95/98.



Code Coverage and Performance Analysis

The most recent additions to the embedded engineer’s toolset are code coverage and

performance analysis tools, which can provide users with several useful benefits:

• Proof of test coverage

• Minimum and maximum execution times for an algorithm

• Code size reduction

• Code performance optimization

The code coverage and analysis tools are usually resident on the host controlling the ICE or

logic analyzer; the information required is provided by the trace facility. These tools set the trace

trigger and filter functions, and then use the captured data to provide the user with the relevant

information. The ARM trace solution can provide sufficient trace data for these tools in a

completely non-intrusive way that allows testing and analysis of the actual production code with

no instrumentation to bloat code and no requirement to slow or stall the processor to obtain the

trace.

With the availability of analyzer buffer depths of up to 40M cycles, ample trace

information is provided by ARM’s Real Time Trace solution for use by code coverage and

analysis tools.
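
As an illustration of how a host-side tool might turn reconstructed trace into a coverage figure, the sketch below compares the set of executed addresses with the set of instruction addresses in the image. The function name and the input representation are assumptions; real coverage tools typically report at function or source-line level.

```python
def coverage(image_addresses, executed_addresses):
    """Return (fraction of instructions executed, sorted list of those never executed)."""
    all_instructions = set(image_addresses)
    hit = all_instructions & set(executed_addresses)
    missed = sorted(all_instructions - hit)
    return len(hit) / len(all_instructions), missed


# Hypothetical five-instruction routine and a trace that revisits 0x100.
routine = range(0x100, 0x114, 4)
trace = [0x100, 0x104, 0x108, 0x100]
fraction, never_executed = coverage(routine, trace)
```

In this example three of the five instructions were executed, and the two addresses in `never_executed` point to code that the test run did not exercise.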

Summary

With the addition of the real time trace solution, ARM provides all the debug facilities

needed for SOC designs, even with no external visibility of core signals, running at frequencies

in excess of 200MHz. This solution is applicable to all ARM core-based designs available from

any of the ARM semiconductor partners who incorporate the Embedded Trace Macrocell in their

SOC devices.

It provides a completely non-intrusive real-time solution applicable to the actual hardware

and software product shipped, and it dramatically reduces the unit cost of a development seat.

###
Figure 1: ARM’s Real Time Trace Solution

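[Block diagram. Labelled blocks: System On Chip, containing the ARM CPU Macrocell (Execution Unit with Control, Address and Data busses; EmbeddedICE Logic with BREAKPT signal; JTAG TAP) and the Embedded Trace Macrocell. External blocks: Trace Debug Tools running on host, Multi-ICE on the 5-wire JTAG port, and the Trace Port Analyzer on the trace port.]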