Class 364: Real Time Embedded Trace For ARM
Abstract:
With the move to system-on-chip (SOC) devices come new requirements for on-chip debug
support. Higher frequencies and on-chip memories mean lack of visibility of processor activity
at the pins of the device. As on-chip memory sizes, and with them software complexity, increase, so
does the need for good debug tools, especially in embedded systems where real-time constraints
are common.
EmbeddedICE technology for the ARM7 and ARM9 family of embedded cores. This will
provide, through a narrow port on the system-on-chip device, completely non-intrusive tracing of
ARM has developed trace configuration and display functionality as an extension to the
ARM Debugger for Windows. ARM is working with Hewlett-Packard and other third parties to
Breakpointing and stepping application code allows users to run the application code to a
given point in code and then stop the processor. At this point the user has the option of
The really difficult bugs to track down are those that occur in situations where there is an
unforeseen and hence unpredictable interaction between the application software and hardware.
Real Time Embedded Trace for ARM Pg. 2
These bugs can be intermittent and usually only occur when the system is running at full speed;
simply starting, stopping or stepping the processor does not expose the problem. An historical
non-intrusive trace of instruction flow and data accesses can provide the extra information
needed to identify the bug. For example, an application crashes during an interrupt routine. The
result of the crash is that a memory protection fault occurs; the cause cannot be found using
breakpoints and single stepping methods. The user sets up the trace filter facility to collect trace
data only during the interrupt routine, and the trigger to stop tracing when the protection fault
occurs. The filter facility limits the amount of information that is traced and must be analyzed. The
trigger ensures that the trace information around the bug is captured and not overwritten.
As trace buffer depths are finite, these features are important to ensure the buffer is only filled
with relevant information. They also save time by limiting the information that needs to be
analyzed to find the bug. Trigger and filter conditions can be changed to refine what trace data is
Traditional software debug tools such as in-circuit emulators (ICE) or logic analyzers have
relied on having access to most of the signals of a microprocessor to provide trace functionality.
This is not the case when the microprocessor core is deeply embedded in a SOC. In an extreme
case, there may be no core signals visible on the pins of the chip. The lack of signals is not the
only problem that traditional methods have difficulty overcoming. As frequencies exceed
100MHz, any additions to signal path lengths can cause the skew of signals to such an extent that
The pin-out problem can be overcome by using bondout versions of the SOC, which provide
all the signals needed. Bondouts take on one of two forms: either as an exact replica of the final
SOC or as an implementation of a common subset of the functionality for a product range. Both
have their problems. An exact replica is likely to be of use for only one product, therefore new
bondouts are needed for each new project. A subset requires further logic to be added around the
bondout to provide the functionality of the SOC, so it will most likely behave differently to the
final chip. The use of bondout technology always adds time and cost to the design cycle. The
additional work required is technically challenging (difficulty of routing signals off chip at
maximum frequency) thus diverting technical expertise away from the main objective.
The ARM solution puts the real time components of in-circuit emulation into the SOC. The
• Low cost
• EmbeddedICE logic
ARM9E and ARM10 cores, contains breakpoint registers that compare the value on the core
address, data and control busses against values programmed into the registers. For example, the
particular address or a particular data value is stored to a given location. When a breakpoint
occurs the processor will be stopped and will then enter debug state or cause an exception and
enter a debug monitor program. Memory and register contents can then be examined or
modified, images can be loaded, code can be stepped or execution restarted. The EmbeddedICE
The Real Time Monitor provides two major functions with minimal intrusion on the
2) The ability to read and write memory without stopping the processor
The Real Time Trace solution for ARM cores embedded within an SOC is made up of three
elements that provide the capability to trace instructions and data accesses in real time:
Figure 1 shows the overall system. The on-chip Embedded Trace Macrocell (“ETM”)
monitors the ARM core busses and passes compressed information via the trace port to the Trace
Port Analyzer (“the analyzer”). The ETM also contains trigger and filter logic to control what is
traced and about what event. The analyzer is an external device, which stores the information
from the trace port. The Trace Debug Tools (“the debug tools”) set up the trigger and filter logic,
retrieve the data from the analyzer and reconstruct an historical view of the processor’s activity.
Trace of the instruction flow of the processor is achieved by the ETM broadcasting branch
addresses via the trace port. The complete instruction flow is reconstructed later by the debug
tools using the binary image of the code to fill in the sequential instructions that must have been
executed. Note that it is therefore not possible to reconstruct self-modifying code. In order to
achieve 100% traceability of the code through as narrow a port as possible, two compression
techniques are used. First, for PC-relative ‘branch’ instructions (B and BL) an address is not
broadcast, only a status bit indicating whether the branch was taken or not. An address is only
needed for exceptions and direct loads to the PC, which are infrequent. Second, only address bits
that have changed since the last branch are broadcast. The combination of address compression,
a small on-chip FIFO and the minimum three clock cycles required to fill the fetch-decode-execute
pipeline means that all branch addresses can be broadcast through a 4-bit data port.
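
The second compression technique, broadcasting only the address bits that have changed, can be illustrated with a minimal sketch. It assumes 4-bit groups sent low-order first and lets the length of the list tell the decoder how many groups arrived; the actual ETM packet format is not described in this paper, and the function names here are hypothetical.

```python
def compress_address(prev: int, addr: int, width_bits: int = 32) -> list:
    """Return the low-order 4-bit groups of addr that cover every changed bit."""
    diff = (prev ^ addr) & ((1 << width_bits) - 1)
    # bit_length() locates the highest changed bit; always send at least one group.
    groups = max(1, (diff.bit_length() + 3) // 4)
    return [(addr >> (4 * i)) & 0xF for i in range(groups)]


def decompress_address(prev: int, groups: list) -> int:
    """Rebuild the full address from the previous address and the received groups."""
    mask = (1 << (4 * len(groups))) - 1
    low = 0
    for i, nibble in enumerate(groups):
        low |= nibble << (4 * i)
    # Unchanged high-order bits come from the previous address.
    return (prev & ~mask) | low
```

With a previous branch address of 0x00008000 and a new one of 0x00008124, only three 4-bit groups are sent instead of eight, which is why a nearby branch target costs few cycles on a narrow port.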
The ETM can also be programmed to broadcast the address and/or data value of data reads
and/or writes. Again, only address bits changed since the last data address are broadcast. Full
data tracing of applications excluding data intensive functions, such as block copy, can be
achieved through an 8-bit data port with only a 40-byte on-chip FIFO. When tracing of all data
accesses is not possible, resources within the ETM can be used to control what data is traced. For
example: only accesses inside (or outside) of selected data regions; or only data accesses by
certain routines; or only writes of a given, bit-masked value. The instruction trace is always
The same resources can be used to turn tracing on and off to monitor a suspect piece of code
over a longer elapsed time within the finite trace buffer of the analyzer. They are also used to
generate a complex trigger event about which the analyzer collects the trace data.
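
The interaction of filter, trigger and a finite trace buffer can be sketched as follows. This is a simplified software model, not the ETM logic: the filter is a single address range, the trigger is a single address, capture stops at the trigger, and the function name and the addresses in the usage note are illustrative assumptions.

```python
from collections import deque


def capture_trace(addresses, depth, trace_range, trigger_addr):
    """Model a finite trace buffer with an address-range filter and a stop trigger."""
    low, high = trace_range
    buffer = deque(maxlen=depth)  # oldest entries are overwritten when full
    for addr in addresses:
        if addr == trigger_addr:
            break  # trigger: stop capture so the entries before the event survive
        if low <= addr < high:
            buffer.append(addr)  # filter: keep only the region of interest
    return list(buffer)
```

For example, with a hypothetical interrupt routine at 0x800 to 0x8FF and a fault handler at 0x10, `capture_trace(stream, 4, (0x800, 0x900), 0x10)` returns the last four interrupt-routine addresses executed before the fault, however long the main loop ran beforehand.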
The quantity of controlling resources and size of on-chip FIFO are selectable by the ASIC
designer through synthesis parameters to best meet the trade-off between silicon area, pin count
• 8 data comparators
• 16 address decodes
• 2 inputs from the ARM core’s EmbeddedICE address and data comparators
• 4 16-bit counters
• a 3 stage sequencer
whether a 4, 8 or 16-bit data port is implemented. The other pins are used for three pipeline
status bits, a synchronization bit and a clock signal. Pins can be multiplexed with other signals.
The standard five JTAG interface pins are also required and are used to set up the ETM.
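
The pin budget follows directly from the signal list above. The short sketch below only adds up the figures given in the text; the constant and function names are hypothetical, and it ignores any saving from multiplexing pins with other signals.

```python
PIPELINE_STATUS_PINS = 3  # three pipeline status bits
SYNC_PINS = 1             # synchronization bit
CLOCK_PINS = 1            # trace clock
JTAG_PINS = 5             # standard JTAG interface, used to set up the ETM


def trace_port_pins(data_width: int) -> int:
    """Trace port pins for a 4, 8 or 16-bit data port (JTAG not included)."""
    if data_width not in (4, 8, 16):
        raise ValueError("data port width must be 4, 8 or 16 bits")
    return data_width + PIPELINE_STATUS_PINS + SYNC_PINS + CLOCK_PINS
```

Under these figures a 4-bit port needs 9 trace pins, an 8-bit port 13 and a 16-bit port 21, plus the five JTAG pins in each case.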
The analyzer is an external device, which stores the information from the trace port. The
trace information is compressed so that the analyzer does not need to capture data at the same
bandwidth as an analyzer monitoring the core busses directly. This has the benefit of either
lowering the analyzer cost or increasing the amount of processor activity that can be traced.
ARM has been working with Hewlett-Packard to ensure timely support for ARM’s new on-chip
facilities with HP’s logic analyzers (16600 and 16700 series) and their low-cost Trace Port
Analyzer. ARM is also enabling many other analyzer, emulator and debugger vendors to bring to
The first generation of low cost HP trace port analyzers will support frequencies up to
100MHz and 4 or 8-bit data port widths with 512K cycle deep buffers. The logic analyzers with
the current (at time of writing) generation of state/timing cards will support frequencies up to
333 MHz and 4, 8 or 16-bit data port widths with 2M cycle deep buffers. Cascading of logic
analyzer channels gives a maximum depth of 40M cycles (limited to 100 MHz). The logic
analyzer can be used to simultaneously watch hundreds of additional signals synchronous to the
processor activity.
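
To put these buffer depths in perspective, the sketch below converts a depth in cycles into the elapsed time it covers when every cycle is captured. It assumes K and M are binary multiples (1024 and 1024 x 1024), which the text does not state, and the function name is hypothetical.

```python
K = 1024
M = 1024 * 1024


def capture_window_ms(depth_cycles: int, frequency_hz: float) -> float:
    """Elapsed time in milliseconds covered by a buffer filled on every cycle."""
    return depth_cycles / frequency_hz * 1e3


low_cost = capture_window_ms(512 * K, 100e6)   # about 5.2 ms
logic_an = capture_window_ms(2 * M, 333e6)     # about 6.3 ms
cascaded = capture_window_ms(40 * M, 100e6)    # about 419 ms
```

Even the deepest configuration covers well under a second of unfiltered execution, which is why the trigger and filter resources matter when a suspect piece of code must be observed over a longer period.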
The Trace Debug Tools retrieve the compressed trace data from the analyzer and reconstruct
an historical view of the processor’s activity using a stored copy of the binary image loaded into
the target. The display window shows a disassembly of code executed with full symbol
information and data accesses to memory interleaved. Auto-correlation to the source code
(C/C++ or assembler) is provided with highlight bars that scroll in lock step, allowing rapid
understanding of the trace data. The debug tools provide a configuration wizard to set up the
trigger and filter logic of the Embedded Trace Macrocell in a manner intuitive to the software
engineer and not requiring a detailed understanding of the ETM logic. The debug tools interface
to the target via an extension to ARM’s Remote Debug Interface (RDI 1.51) which is used by
The Trace Debug Tools will be available as an add-on to the new ARM Developer Suite on
The most recent additions to the embedded engineer’s toolset are code coverage and
performance analysis tools, which can provide users with several useful benefits:
The code coverage and analysis tools are usually resident on the host controlling the ICE or
logic analyzer; the information required is provided by the trace facility. These tools set the trace
trigger and filter functions, and then use the captured data to provide the user with the relevant
information. The ARM trace solution can provide sufficient trace data for these tools in a
completely non-intrusive way that allows testing and analysis of the actual production code with
no instrumentation to bloat code and no requirement to slow or stall the processor to obtain the
trace.
With the availability of analyzer buffer depths of up to 40 million cycles, ample trace
information is provided by ARM’s Real Time Trace solution for use by code coverage and
analysis tools.
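
As an illustration of how a coverage tool might use the trace, the sketch below rebuilds the instruction flow from a start address, one taken/not-taken status bit per branch, and a copy of the binary image, then reports which addresses were never executed. The image is a hypothetical toy model (address mapped to a branch target, or None for a non-branch instruction), not a real ARM decoder, and all names are assumptions.

```python
# Hypothetical toy image: address -> branch target, or None for non-branch code.
IMAGE = {
    0x8000: None,    # MOV
    0x8004: None,    # CMP
    0x8008: 0x8010,  # BEQ 0x8010
    0x800C: None,    # ADD
    0x8010: None,    # SUB
    0x8014: 0x8000,  # B 0x8000
    0x8018: None,    # not reached in the example trace
}


def reconstruct_flow(image, start, branch_bits, word=4):
    """Fill in sequential instructions between branches using the image."""
    bits = iter(branch_bits)
    pc, executed = start, []
    while pc in image:
        target = image[pc]
        if target is None:
            executed.append(pc)
            pc += word
            continue
        taken = next(bits, None)
        if taken is None:
            break  # trace exhausted before this branch was resolved
        executed.append(pc)
        pc = target if taken else pc + word
    return executed


def coverage(image, executed):
    """Return the fraction of image addresses executed and the uncovered ones."""
    covered = set(executed) & set(image)
    return len(covered) / len(image), sorted(set(image) - covered)
```

Calling `coverage(IMAGE, reconstruct_flow(IMAGE, 0x8000, [True, True, False]))` reports six of the seven addresses as executed and 0x8018 as uncovered, all derived from three status bits and the image with no instrumentation of the code.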
Summary
With the addition of the real time trace solution, ARM provides all the debug facilities
needed for SOC designs, even with no external visibility of core signals, running at frequencies
in excess of 200MHz. This solution is applicable to all ARM core-based designs available from
any of the ARM semiconductor partners who incorporate the Embedded Trace Macrocell in their
SOC devices.
It provides a completely non-intrusive real-time solution that applies to the actual hardware
and software product shipped, and it dramatically reduces the unit cost of a development seat.
###
Figure 1: ARM’s Real Time Trace Solution
[Block diagram. Labels: System On Chip; EmbeddedICE Logic; BREAKPT; JTAG TAP; Multi-ICE; 5 wire JTAG Port; Trace Port; Trace Port Analyzer]