Application Report

SPRA864 – November 2002

Choosing the Appropriate Simulator Configuration in Code Composer Studio IDE
Pankaj Ratan Lal, Ambar Gadkari Software Development Systems

ABSTRACT

Software development for digital signal processors (DSPs) goes through algorithm
development and application integration stages. Each stage involves validation and
optimization of the developed code. Code Composer Studio IDE has a set of simulator
configurations and tools to enable application development in these stages.

The simulator configurations for a DSP include: functional CPU simulator, cycle accurate
CPU simulator, functional device simulator, and device simulator. The simulators also
support features for validation and optimization such as the pipeline stall analyzer, code
coverage and multi-event profiler, and cache analysis tool. There are also features, such as
the pin connect and port connect, which provide external stimuli to the application. It is critical
to use the appropriate simulator configuration and a combination of features during the
different stages of application development.

This application note relates the application validation and optimization challenges to
available simulation configurations and supported tools. It helps the developer choose the
appropriate simulator configuration along with applicable features during different stages of
application development.

Contents
1 Introduction
2 Application Software Development Flow
  2.1 Validation and Optimization Challenges
3 Choosing the Appropriate Configuration
  3.1 Simulator Configurations
  3.2 Algorithm Validation
    3.2.1 Recommended Configurations
  3.3 Algorithm Optimization
    3.3.1 Recommended Configurations
    3.3.2 Profilers
    3.3.3 Pipeline Analysis
  3.4 Application Validation
    3.4.1 Recommended Configurations
  3.5 Application Optimization
    3.5.1 Recommended Configurations
    3.5.2 Cache Optimizations
    3.5.3 Buffer Transfer Optimizations
  3.6 Real-World Interactions
  3.7 Setting Up Simulator Configurations in Code Composer Studio IDE
4 User Scenarios
5 References
Appendix A
  A.1 Simulator Configurations Based on Extent and Detail of Device Simulated
  A.2 Features to Provide External Stimuli to Target Device
  A.3 Visibility and Analysis Features

List of Figures
Figure 1 Component View of an Application
Figure 2 Algorithm Validation and Optimization and the Simulator Configurations
Figure 3 Application Validation and Optimization and the Simulator Configurations

List of Tables
Table 1 TMS320C6x Simulator Configurations
Table 2 TMS320C55x Simulator Configurations
Table 3 Functional Device Simulators vs. Device Simulators
Table 4 Algorithm Validation and Optimization
Table 5 Application Validation and Optimization

Trademarks are the property of their respective owners.

1 Introduction
Code Composer Studio IDE for C55x and C6x has multiple simulator configurations available
through the import menu in Code Composer Studio setup. The simulators support features such
as the pipeline stall analyzer, code coverage and multi-event profiler, and cache analysis tool.
There are also features, such as the pin connect and port connect, which provide external
stimuli to the application. It is critical to use the appropriate simulator configuration and a
combination of features during the different stages of application development.
The organization of the application note is as follows:
• Section 2 describes a typical application development flow and associated validation and
optimization challenges.
• Section 3 enumerates the different simulator configurations, supported features, and tools.
Further, it describes how appropriate combinations of simulator configuration, features, and
tools address these challenges.
• Section 4 describes typical user scenarios, illustrating the usage of simulators to tackle the
validation and optimization challenges.

NOTE: The various simulator configurations and features discussed in this document refer to
those supported by Code Composer Studio IDE v2.2.

2 Application Software Development Flow


Typical application software development involves the following stages:
1. Identifying the application and partitioning it into modules.
2. Making heuristic estimates of CPU cycles and memory usage for individual modules and
the overall application.
3. Creating/reusing algorithms for the modules.
4. Verifying the functionality of individual modules on the target DSP.
5. Performing optimizations on each module, for code size and CPU cycles, to achieve or
better the cycle estimates that were arrived at in step 2.
Steps 4 and 5 are repeated until satisfactory results are obtained.
6. Integrating modules and verifying the full application. Integration typically involves creating
DSP/BIOS threads, and using the chip support library (CSL) to program the DMA and
other peripherals.
7. Optimizing the application for transaction latencies and efficient buffer management, to
match or better the heuristic estimates.
This process may require changes to the application code and hence involve iterating steps
3 through 7.
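
Steps 2 and 5 come down to comparing measured figures against the heuristic estimates. The following is a minimal sketch of that bookkeeping in C. The `module_budget_t` type, its fields, and the function names are hypothetical; they are not part of Code Composer Studio IDE or of this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-module record: heuristic budgets from step 2 and
 * the figures measured while iterating steps 4 and 5. */
typedef struct {
    const char *name;
    uint32_t budget_cycles;    /* heuristic CPU-cycle estimate */
    uint32_t budget_bytes;     /* heuristic memory estimate */
    uint32_t measured_cycles;  /* taken from a profiling run */
    uint32_t measured_bytes;   /* taken from the build output */
} module_budget_t;

/* A module is done when it meets or betters both estimates. */
static bool module_meets_budget(const module_budget_t *m)
{
    return m->measured_cycles <= m->budget_cycles &&
           m->measured_bytes  <= m->budget_bytes;
}

/* Count the modules that still need another pass through steps 4 and 5. */
static size_t modules_over_budget(const module_budget_t *mods, size_t n)
{
    size_t over = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!module_meets_budget(&mods[i])) {
            ++over;
        }
    }
    return over;
}
```

Integration (step 6) would start once `modules_over_budget` returns zero for the set of modules.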

Figure 1 depicts a typical structure of an application that contains multiple modules integrated
into the application framework. The framework includes using DSP/BIOS and CSL. The
application operates on external stimuli through peripherals such as a serial port.

Algorithms, such as FIR and VOL in the figure, are developed first. These are then integrated
into the application framework.

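[Figure 1: layered block diagram. The VOL and FIR algorithms sit inside the application
framework, which is built on DSP/BIOS and CSL, which in turn run on the TMS320 hardware
(CPU, DMA, McBSP, and other peripherals).]

Figure 1. Component View of an Application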

2.1 Validation and Optimization Challenges


Application software development involves creating correct and efficient applications. Ensuring
efficiency and correctness involves thorough validation of all aspects of the software and is
critical to the viability of real-time applications. The typical challenges faced during
application development fall into the following categories:
• Algorithm validation involves verifying the correctness of developed code. This includes
validating all parts of the code and debugging any problems encountered in the process.
• Algorithm optimization involves meeting cycle and memory budgets, typically estimated
using heuristics. The application developer can first meet the CPU cycle budgets assuming
an ideal memory system (where memory latency is assumed to be 0). Maximum utilization of
the CPU resources and instruction set features is needed at this stage to meet the
constraints.
• Application validation involves ensuring correctness after integrating different algorithms
and control code into the application framework. The integration process may involve
creating DSP/BIOS tasks, using CSL and device drivers for programming the peripherals,
and properly placing buffers in internal and external memories. The application needs to be
validated for correct data transfers across buffers, and proper task scheduling and
prioritizations to ensure overall correctness.
• Application optimization involves ensuring efficient code placement of all modules to
minimize memory latency and cache misses, efficient usage of the DMA for transferring data
across internal memories, peripherals and external memories, and minimizing the CPU idle
time. In this process, changes may have to be made to the application, which will require
iterating through the validation and optimization cycles.
• Real-world interactions involve supplying appropriate external stimuli to run the
application. These stimuli could be application data inputs/outputs or control signals, such as
interrupts.
Application development may involve going through multiple iterations of these steps. Figure 2
and Figure 3 depict this application development flow and broadly indicate which simulator
configurations are suitable during this flow.
The next section describes the available configurations and features to meet the developmental
challenges.

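[Figure 2: flow diagram. The algorithm is built, tested for functionality, and debugged
(bug-fix loop) on the functional CPU simulator. It is then tested for performance and tuned
(optimize loop) on the cycle accurate CPU simulator, yielding the optimized algorithm.]

Figure 2. Algorithm Validation and Optimization and the Simulator Configurations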

[Figure 3: flow diagram. VOL, FIR, and the application framework are integrated and built,
then tested for functionality and debugged (bug-fix loop) with real-world interaction on the
functional device simulator or device simulator. Performance is then tested and tuned
(optimize loop) on the functional device simulator, device simulator, or real device.]

Figure 3. Application Validation and Optimization and the Simulator Configurations

3 Choosing the Appropriate Configuration


Code Composer Studio IDE provides different simulator configurations that address the needs of
different stages of application development. These configurations differ in capabilities such as
the extent of the DSP device simulated, the level of detail to which the DSP device is simulated,
features for simulating external stimuli, and support for debug and efficiency analysis tools.

The simulator configurations are classified based on the extent of the DSP device simulated and
the level of detail to which the DSP device is simulated (for details, refer to section A.1):

• Core simulators
– functional CPU simulator
– cycle-accurate CPU simulator
– CPU and cache simulators
• Full-device simulators
– functional-device simulator
– device simulator
The features supported for simulating external stimuli to the DSP are classified as follows (for
details, refer to section A.2):
• pin connect
• port connect
• cross bar
• boot load
• external host port
The validation and optimization features supported by the simulators are classified as follows
(for details refer to section A.3):
• pipeline analysis
• simulator analysis events
• Code Composer Studio IDE profiler
• cache analysis (part of the analysis tool kit released with Code Composer Studio IDE v2.2)
• code coverage and multi-event profiler (part of the analysis tool kit released with Code
Composer Studio IDE v2.2)
Refer to Appendix A for further details on these capabilities. All these simulator configurations
and features are integrated into the Code Composer Studio IDE. Therefore, all the simulator
configurations support Code Composer Studio IDE features such as viewing CPU/peripheral
registers, viewing memory contents, setting breakpoints, etc.
Let us now focus on how the simulator configurations can be used to address the different
validation and optimization challenges identified.

3.1 Simulator Configurations


The tables below describe the simulator configurations available with Code
Composer Studio IDE v2.2.
Table 1. TMS320C6x Simulator Configurations

Configuration
    Description

C62xx Cycle Accurate Sim, Little Endian
    Simulates the core of the C62x processor. This is faster than the device simulator but does
    not simulate peripherals and cache system (uses a flat memory system).

C64xx Cycle Accurate Sim, Little Endian
    Simulates the core of the C64x processor. This is faster than the device simulator but does
    not simulate peripherals and cache system (uses a flat memory system).

C64xx Cache Simulator, Little Endian
    Simulates the core of the C64x processor. This also models the L1D, L1P and the L2 caches.
    Beyond L2 it uses a flat memory system.

C67xx Cycle Accurate Sim, Little Endian
    Simulates the core of the C67x processor. This is faster than the device simulator but does
    not simulate peripherals and cache system (uses a flat memory system).

C6201 Device Sim, Little Endian Map 0
    Simulates the C6201 processor. Supports PBUS, DMA, DMS, PMS, McBSP(2), timer(2), and EMIF.
    The EMIF supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully
    cycle accurate). Does not support HPI.

C6202 Device Sim, Little Endian Map 0
    Simulates the C6202 processor. Supports PBUS, DMA, DMS, PMS, McBSP(3), timer(2), and EMIF.
    The EMIF supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully
    cycle accurate). Does not support Exp.Bus 32-bit.

C6203 Device Sim, Little Endian Map 0
    Simulates the C6203 processor. Supports PBUS, DMA, DMS, PMS, McBSP(3), timer(2), and EMIF.
    The EMIF supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully
    cycle accurate). Does not support Exp.Bus 32-bit.

C6204 Device Sim, Little Endian Map 0
    Simulates the C6204 processor. Supports PBUS, DMA, DMS, PMS, McBSP(2), timer(2), and EMIF.
    The EMIF supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully
    cycle accurate). Does not support Exp.Bus 32-bit.

C6205 Device Sim, Little Endian Map 0
    Simulates the C6205 processor. Supports PBUS, DMA, DMS, PMS, McBSP(2), timer(2), and EMIF.
    The EMIF supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully
    cycle accurate). Does not support PCI.

C6211 Device Simulator, Little Endian
    Simulates the C6211 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, timer(2), McBSP(2),
    and EMIF. The EMIF supports interfacing with async and SDRAM memory models. Does not
    support HPI.

C6414 Device Simulator, Little Endian
    Simulates the C6414 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, interrupt selector,
    McBSP(3), timer(3), and EMIF. The EMIF supports interfacing with async, SDRAM and generic
    sync RAM memory models. Does not support HPI, Utopia.

C6415 Device Simulator, Little Endian
    Simulates the C6415 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, interrupt selector,
    McBSP(3), timer(3), and EMIF. The EMIF supports interfacing with async, SDRAM and generic
    sync RAM memory models. Does not support HPI, PCI, Utopia.

C6416 Device Simulator, Little Endian
    Simulates the C6416 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, interrupt selector,
    McBSP(3), timer(3), TCP, VCP, and EMIF. The EMIF supports interfacing with async, SDRAM and
    generic sync RAM memory models. Does not support HPI, PCI, Utopia.

C6416 Functional Simulator, Little Endian
    Simulates the C6416 processor. This is faster than the device simulator but does not
    simulate all the peripherals. Supports functional timer(2), interrupt selector, EDMA and
    QDMA (uses a flat memory system).

C6411 Device Simulator, Little Endian
    Simulates the C6411 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, interrupt selector,
    McBSP(2), timer(3), and EMIF. The EMIF supports interfacing with async, SDRAM and generic
    sync RAM memory models. Does not support HPI, Utopia.

C6701 Device Sim, Little Endian Map 0
    Simulates the C6701 processor. Supports DMA, McBSP(2), timer(2), and EMIF. The EMIF
    supports interfacing with async, SDRAM and SBSRAM memory models (EMIF not fully cycle
    accurate). Does not support HPI.

C6711 Device Simulator, Little Endian
    Simulates the C6711 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, McBSP(2), timer(2),
    and EMIF. The EMIF supports interfacing with async and SDRAM memory models. Does not
    support HPI.

C6712 Device Simulator, Little Endian
    Simulates the C6712 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, McBSP(2), timer(2),
    and EMIF. The EMIF supports interfacing with async and SDRAM memory models.

C6713 Device Simulator, Little Endian
    Simulates the C6713 processor. Supports L1D, L1P, L2 cache, EDMA, QDMA, timer(2), McBSP(2),
    McASP(2), interrupt selector, and EMIF. The EMIF supports interfacing with async and SDRAM
    memory models. Does not support HPI, IIC.

C6713 Functional Simulator, Little Endian
    Simulates the C6713 processor. This is faster than the device simulator but does not
    simulate all the peripherals. Supports functional timer(2), interrupt selector, EDMA and
    QDMA (uses a flat memory system).

Table 2. TMS320C55x Simulator Configurations

Configuration
    Description

C55xx Functional Simulator
    Simulates the C55x CPU Rev 2.1 core. This gives the fastest possible result, but pipeline
    effects of the CPU are neglected, meaning that instructions are executed one at a time.
    Supports the timer but doesn't support any other peripherals. This simulator will not be
    cycle or cycle count accurate.

C55xx Cycle Accurate Simulator
    Simulates the C55x CPU Rev 2.1 core. Supports program/data memory with latency. If the
    memory configuration is not provided, a flat memory system (memory with no latency, no
    DARAM/SARAM) is used as default. Supports the timer but doesn't support any other
    peripheral.

C55xx Cache Simulator
    Simulates the C55x CPU Rev 2.1 core. Supports program/data memory with latency. If the
    memory configuration is not provided, a flat memory system (memory with no latency, no
    DARAM/SARAM) is used as default. Also supports the timer and the C55x Instruction Cache.
    Does not support any other peripheral.

C5510 Device Simulator
    Simulates the C5510 processor. Supports ICache, DMA, EMIF, timers (2), McBSP (3), RHEA,
    and EHPI. Doesn't support DPLL and GPIO. Internal memory interface supports interfacing
    with SARAM and DARAM models. External memory supports interfacing with asynchronous and
    SBSRAM models.

C5502 Functional Simulator
    Simulates the C5502 processor. This is faster than the device simulator but does not
    simulate all the peripherals. Supports functional timers (3), watchdog timer, DMA, and
    ICache. Doesn't support EMIF, McBSP, VBUS, IIC, UART and UHPI peripherals. Uses flat
    memory system.

C5502 Device Simulator
    Simulates the C5502 processor. Supports ICache, DMA, EMIF, timers (3), watchdog timer,
    McBSP (3), VBUS, IIC, and UART peripherals. Doesn't support UHPI. Internal memory interface
    supports interfacing with SARAM and DARAM models. External memory supports interfacing
    with async, SBSRAM, and SDRAM models.

NOTE:
• All the configurations on the C6x have a corresponding big endian mode configuration
supported in the product. Subsequent references to simulator configurations apply to both
the endian modes.
• All the configurations on the C6x having map 0 specified in the above table have a
corresponding map1 configuration supported in the product.

• Since C55x has a protected pipelined architecture, there are two variants of CPU simulators
available on the C55x platform: functional CPU simulator (having no pipeline effects
modeled) and cycle accurate CPU simulator (having the pipeline effects modeled
accurately).
• Most of the capabilities of the simulator configurations mentioned in algorithmic development
stages are also applicable to the application development stage. If they are not applicable,
they will be specifically mentioned.

3.2 Algorithm Validation


Algorithm development is typically CPU centric; algorithms may be developed and validated in
isolation from the application framework. Debug visibility into application code variables and
data structures, CPU registers, and memory is key during this development stage. Algorithm
validation aims to ensure coverage of all developed code.
The functional CPU simulator on the C55x and the CPU simulators on the C6x are best suited
for algorithm validation. These run faster than other simulator configurations (refer to the
TMS320C6000 Instruction Set Simulator Technical Overview (SPRU600) and the TMS320C55x
Instruction Set Simulator Technical Overview (SPRU599) for details on performance numbers of various
simulator configurations). These configurations also simulate tasks running on the DSP/BIOS.

The code coverage tool gives information about the source code that was not exercised in a run
of the application.

3.2.1 Recommended Configurations


TMS320C55x:
C55x functional simulator
TMS320C6x:
C62x, C64x, C67x cycle accurate CPU simulators
Related features:
Breakpoints, register views, memory views, RTDX, probe points, pin connect, port connect,
DSP/BIOS, and code coverage and multi-event profiler.
Example:
In Figure 1, the algorithms VOL and FIR can be debugged using these simulator configurations.
Data may be fed to these algorithms through RTDX channels or probe points.
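
To make the example concrete, the sketch below shows what stand-ins for VOL and FIR might look like in portable C, so that they can be stepped through on a CPU simulator. The report does not give the actual implementations: the function names, the Q15 fixed-point format, and the saturation behavior are all assumptions made for illustration. The input arrays stand in for data that would arrive through RTDX channels or probe points.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a wide intermediate result to the 16-bit sample range. */
static int16_t sat16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* VOL stand-in: scale each sample by a Q15 gain (16384 represents 0.5). */
static void vol_apply(const int16_t *in, int16_t *out, size_t n,
                      int16_t gain_q15)
{
    for (size_t i = 0; i < n; ++i) {
        int32_t acc = (int32_t)in[i] * gain_q15;  /* Q15 x Q15 -> Q30 */
        out[i] = sat16(acc / 32768);              /* back to Q15, truncating toward zero */
    }
}

/* FIR stand-in: y[i] = sum over k of h[k] * x[i - k];
 * samples before x[0] are taken as zero. */
static void fir_apply(const int16_t *x, int16_t *y, size_t n,
                      const int16_t *h_q15, size_t taps)
{
    for (size_t i = 0; i < n; ++i) {
        int64_t acc = 0;
        for (size_t k = 0; k < taps && k <= i; ++k) {
            acc += (int32_t)h_q15[k] * x[i - k];
        }
        y[i] = sat16(acc / 32768);
    }
}
```

A simple functional check is to feed `fir_apply` an impulse and confirm that the output reproduces the scaled coefficients. A test of this kind runs quickly on a functional or CPU simulator, and its coverage can then be inspected with the code coverage tool.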

3.3 Algorithm Optimization


Algorithmic optimization involves optimizing for CPU cycles and code size. Identifying the
regions of application code that consume excess CPU cycles, as well as the causes of these
excess cycles, is key in this stage.
The C55x cycle accurate CPU simulator and the C6x CPU simulator are best suited for this
stage.

3.3.1 Recommended Configurations


TMS320C55x:
C55x cycle accurate simulator
TMS320C6x:
C62x, C64x, C67x cycle accurate CPU simulators
Related features:
• Pipeline stall analyzer (C55x).
• Code Composer Studio IDE profiler, code coverage and multi-event profiler.
3.3.2 Profilers
The Code Composer Studio IDE profiler helps identify hot spots by giving cumulative, min, max,
and average event counts over functions and ranges of code. The code coverage and
multi-event profiler helps identify causes of the performance losses by providing profile data over
events such as pipeline stalls, cache misses, and so on. It gives cumulative event counts for
multiple events on a source line basis. The multi-event profiler helps pinpoint the exact lines of
code contributing stall cycles. For more details, refer to online help for the Code Composer
Studio IDE profiler and Using Code Coverage and the Multi-event Profiler for Robustness and
Efficiency Analysis (SPRA868).
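
The cumulative, min, max, and average figures mentioned above can be pictured as a small running accumulator per function or code range. The sketch below is illustrative only: `profile_stats_t` and its helper functions are hypothetical names and do not reflect how the Code Composer Studio IDE profiler is implemented.

```c
#include <stdint.h>

/* Running statistics for one profiled function or code range. */
typedef struct {
    uint64_t total;  /* cumulative count over all invocations */
    uint32_t min;    /* smallest count seen in one invocation */
    uint32_t max;    /* largest count seen in one invocation */
    uint32_t calls;  /* number of invocations recorded */
} profile_stats_t;

static void profile_init(profile_stats_t *p)
{
    p->total = 0;
    p->min = UINT32_MAX;
    p->max = 0;
    p->calls = 0;
}

/* Record the event count (for example, CPU cycles) of one invocation. */
static void profile_record(profile_stats_t *p, uint32_t count)
{
    p->total += count;
    if (count < p->min) p->min = count;
    if (count > p->max) p->max = count;
    p->calls++;
}

/* Average count per invocation; 0.0 when nothing has been recorded. */
static double profile_average(const profile_stats_t *p)
{
    return p->calls ? (double)p->total / p->calls : 0.0;
}
```

A large gap between `min` and `max` for the same function is a hint that its cost depends on data or on stalls, which is where per-line multi-event data becomes useful.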
3.3.3 Pipeline Analysis
Once the hotspots are identified, it is important to know their causes. The CPU simulator
accurately simulates the instruction set behavior of the target DSP. For the C55x
pipeline-protected architecture, the simulator accurately simulates the stall behavior due to
various resource conflicts. This pipeline behavior can be seen on the simulator through the
pipeline stall analyzer. This will help identify causes for pipeline stalls and help optimize critical
routines coded in assembly language. For more details, refer to the online help for information
on the pipeline stall analyzer.
Example:
In Figure 1, VOL and FIR need to be optimized to meet cycle and memory budgets. The
multi-event profiler tells the exact location of cycle losses in the code. Further, on the C55x, the
pipeline stall analyzer can be used to trace the exact cause for such a stall.

3.4 Application Validation


Application correctness needs to be ensured after integrating the different algorithms and control
code into the application framework. The integration process may involve creation of DSP/BIOS
tasks, using CSL and device drivers for programming the peripherals, and proper placement of
buffers in internal and external memories.
The device simulator configurations model the DMA, serial ports, timers, and other peripherals.
These configurations help in validating the integrated applications. Since there may be multiple
iterations involved in ensuring correctness of the application, the functional device simulator
configurations are preferred due to their higher simulation speeds.
The differences between the functional and the cycle accurate device simulators are presented
in Table 3.

Table 3. Functional Device Simulators vs. Device Simulators

Functional vs. Cycle Accurate Memory Subsystems
    The functional device simulators model the cache system to give correct event counts for
    cache hits and misses. The cycle accurate simulators, besides giving correct event counts,
    also model the cycle latencies due to memory accesses. The cycle accurate cache models in
    the device simulators can be used to get an estimate of the cycles gained after optimizing
    the application, using the analysis information from the functional device simulators.

Functional vs. Cycle Accurate DMA
    Applications use DMA to transfer data across internal memories, peripherals and external
    memories. These transfers are typically synchronized with events such as interrupts, serial
    port events, etc. For simulating the application behavior correctly, it would suffice to
    simulate the DMA transfers synchronized on those events, and not necessarily model the
    latencies accurately.
    The functional simulator utilizes this characteristic to simulate applications correctly
    with higher speeds by not modeling the latencies.
    Cycle accurate DMA on device simulators mimics the real target behavior by modeling the
    latencies, the bus contentions, the transfer protocols, etc.

3.4.1 Recommended Configurations


TMS320C55x:

C5502 functional simulator


Use this simulator if the peripheral capabilities needed by the application match the
peripherals supported in this simulator. Otherwise, the C5502 or C5510 device simulator can
be used.
For example, if only DMA and timers are used, use the C5502 functional simulator. If
UART or I2C is used by the application, use the C5502 device simulator, since the C5502
functional simulator does not model these peripherals.

TMS320C6x:

C6713 functional simulator, C6416 functional simulator


Use these simulators if the peripheral capabilities needed by the application match the
peripherals supported in these simulators. Otherwise, the C6414/15/16, C620x, C621x, or
C671x device simulators can be used.
For example, if only DMA and timers are used, use the C6416 or C6713 functional simulator.
If TCP/VCP is used by the application, use the C6416 device simulator.

Related features:

Pin connect and port connect are used for providing external stimuli for peripherals such as the
serial port. Simulator analysis events can be used for observing events on the target. These
events can also be used for debugging by configuring them to stop the simulation when the
event occurs.

Example:

For the application in Figure 1, the VOL and FIR algorithms are integrated into an application
framework that uses DSP/BIOS and CSL. The functional device simulator may be used to verify
application correctness.
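
The rule "use the functional simulator if it supports every peripheral the application needs, otherwise use the device simulator" can be written as a bitmask check. The sketch below encodes the C5502 rows of Table 2. The flag names, `sim_choice_t`, and `choose_c5502_simulator` are hypothetical helpers for illustration, not a Code Composer Studio IDE API.

```c
#include <stdint.h>

/* One flag per peripheral named in the C5502 rows of Table 2. */
enum {
    PERIPH_TIMER    = 1u << 0,
    PERIPH_WATCHDOG = 1u << 1,
    PERIPH_DMA      = 1u << 2,
    PERIPH_ICACHE   = 1u << 3,
    PERIPH_EMIF     = 1u << 4,
    PERIPH_MCBSP    = 1u << 5,
    PERIPH_VBUS     = 1u << 6,
    PERIPH_IIC      = 1u << 7,
    PERIPH_UART     = 1u << 8,
    PERIPH_UHPI     = 1u << 9
};

typedef enum {
    SIM_NONE,              /* neither C5502 simulator models everything needed */
    SIM_C5502_FUNCTIONAL,  /* faster, fewer peripherals */
    SIM_C5502_DEVICE       /* slower, more peripherals */
} sim_choice_t;

#define C5502_FUNCTIONAL_SUPPORT \
    (PERIPH_TIMER | PERIPH_WATCHDOG | PERIPH_DMA | PERIPH_ICACHE)
#define C5502_DEVICE_SUPPORT \
    (C5502_FUNCTIONAL_SUPPORT | PERIPH_EMIF | PERIPH_MCBSP | \
     PERIPH_VBUS | PERIPH_IIC | PERIPH_UART)

/* Prefer the functional simulator; fall back to the device simulator
 * only when a needed peripheral is missing from the functional one. */
static sim_choice_t choose_c5502_simulator(uint32_t needed)
{
    if ((needed & ~(uint32_t)C5502_FUNCTIONAL_SUPPORT) == 0) {
        return SIM_C5502_FUNCTIONAL;
    }
    if ((needed & ~(uint32_t)C5502_DEVICE_SUPPORT) == 0) {
        return SIM_C5502_DEVICE;
    }
    return SIM_NONE;
}
```

For instance, `choose_c5502_simulator(PERIPH_DMA | PERIPH_TIMER)` selects the functional simulator, while adding `PERIPH_UART` moves the choice to the device simulator.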

3.5 Application Optimization


Application optimization involves ensuring efficient code placement of all modules to minimize
memory latency and cache misses, efficient usage of the DMA for transferring data across
internal memories, peripherals and external memories, and minimizing the CPU idle time. The
functional device simulators and the device simulators can be used to perform these
optimizations.
3.5.1 Recommended Configurations
Cache optimizations:
• TMS320C55x:
– C55x cycle accurate CPU simulator
This configuration simulates the I-cache and can be used with the cache analysis tool to
analyze the cache hit/misses.
• TMS320C6x:
– C6713 functional simulator, C6416 functional simulator
These functional device simulator configurations can be used to analyze cache
hits/misses. They support the cache analysis tool for analyzing the L1P and L1D cache
performance.
– C64x cache simulator
This configuration can be used to analyze the cycle effects of cache hits and misses on
the C64x devices. This simulator configuration simulates the cycle effects of the L1P,
L1D, and L2.
Note: The cache analysis tool is supported only with the functional device simulator
configurations.
Buffer transfer optimizations:
• TMS320C55x:
– C5502 device simulator, C5510 device simulator
• TMS320C6x:
– C6201/2/3/4/5 device simulator
– C6211 device simulator
– C6711/2/3 device simulator
– C6411/4/5/6 device simulator
3.5.2 Cache Optimizations
Optimizing for memory accesses involves analyzing program/data cache misses and memory
access patterns. The functional device simulator configurations can be used to quickly simulate
the application and obtain this analysis data. The cache analysis tool can be used to visualize
these access patterns and identify areas of application improvement. The higher simulation
speeds of the functional device simulator configurations enable performing these application
optimizations over multiple iterations.
Once these optimizations are completed, the user can run the application on the device
simulator or the hardware, such as the test and evaluation board (TEB), to get an accurate
estimate of the cycles gained.
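
The idea of a functional cache model, one that yields correct hit and miss counts without modeling cycle latencies, can be illustrated with a toy direct-mapped cache. The geometry below (64 lines of 32 bytes) and all names are assumptions chosen for the sketch; they do not describe the cache of any particular C55x or C6x device or the internals of the simulators.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u  /* assumed line size */
#define NUM_LINES  64u  /* assumed number of lines (2 KB cache) */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    uint32_t hits;
    uint32_t misses;
} cache_model_t;

static void cache_reset(cache_model_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Record one access; returns true on a hit. Only event counts are kept,
 * no latency is modeled. */
static bool cache_access(cache_model_t *c, uint32_t addr)
{
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;  /* fill the line, evicting any previous occupant */
    c->tag[index] = tag;
    c->misses++;
    return false;
}
```

In this model, two buffers placed 2048 bytes apart map to the same lines and evict each other. Conflicts of this kind are what placement changes aim to remove.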

3.5.3 Buffer Transfer Optimizations


In the background of CPU computations, applications transfer buffers using the DMA across
internal memories, peripherals and external memories. Optimizing these transfers may involve
choosing the right buffer sizes, and sequencing the DMA transfers appropriately to maximize
parallelism between CPU and DMA. The device simulator configurations can be used to
simulate and evaluate the effects of these optimizations.
Example:
When FIR and VOL are integrated into the application framework, the functional device
simulator configurations can be used for memory placement and cache optimizations. The
device simulator configurations can be used to perform buffer transfer optimizations and
measure the cycle effects of all optimizations.

3.6 Real-World Interactions


When simulating the integrated application, it is sometimes essential to provide appropriate
external stimuli to the peripherals, as would happen in a real system. These stimuli could be
some external interrupts or events indicating data availability from the external sources.
These stimuli can be simulated using the pin connect and the port connect features
in Code Composer Studio IDE.
Pin connect and port connect features are available on all the simulator configurations. Details
on these are available in the technical reference guide for each ISA simulator.
It is also possible to simulate the device boot process on the C6x simulators and external host
port interactions on the C55x simulators.
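Conceptually, a port connection ties a data file to a memory address so that each access to the address consumes or produces one value in the file. The Python sketch below models that idea only. The PortModel class, the port addresses, and the one-hexadecimal-value-per-line layout are assumptions made for illustration; they are not the Code Composer Studio file format or interface.

```python
import io


class PortModel:
    """Toy model of file-backed ports: each address maps to a data stream."""

    def __init__(self):
        self._read_streams = {}
        self._write_streams = {}

    def connect_read(self, address, stream):
        self._read_streams[address] = stream

    def connect_write(self, address, stream):
        self._write_streams[address] = stream

    def read(self, address):
        """Return the next value from the stream connected to this address."""
        line = self._read_streams[address].readline()
        if not line:
            raise EOFError("input stream exhausted")
        return int(line.strip(), 16)

    def write(self, address, value):
        """Append the value to the stream connected to this address."""
        self._write_streams[address].write(f"{value:04x}\n")


# Hypothetical port addresses and input data.
IN_PORT, OUT_PORT = 0x1000, 0x1004
source = io.StringIO("0001\n0002\n0003\n")
sink = io.StringIO()

ports = PortModel()
ports.connect_read(IN_PORT, source)
ports.connect_write(OUT_PORT, sink)

# The "application" reads three samples and writes each one doubled.
for _ in range(3):
    ports.write(OUT_PORT, 2 * ports.read(IN_PORT))

output_text = sink.getvalue()
print(output_text)
```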

3.7 Setting Up Simulator Configurations in Code Composer Studio IDE


Following are the steps to set up the desired simulator configuration in Code Composer Studio
IDE.
1. Click on the Code Composer Studio Setup icon. This will bring up the Import Configuration
window as shown below.


2. Select the desired configuration from the available list of configurations shown in the
window.
Click on Import and then Save and Quit.
This will set the selected simulator configuration in the Code Composer Studio setup.


3. Save and close the Code Composer Studio setup.

4 User Scenarios
Tables 4 and 5 list a few user scenarios during application development and highlight the
appropriate simulator configuration and the applicable features that can be used in Code
Composer Studio IDE.
Table 4. Algorithm Validation and Optimization
Platform Problem Solution
(a) Algorithm Validation
Platform: C55x/C6x
Problem: I am using FIR, IIR and LMS algorithms from a vendor. How can I quickly validate them
on the C55x/C6x platforms?
Solution: For quick validation, use the C55x functional simulators. For the C6x, the CPU
simulator configurations can be used.

(b) Pipeline Stalls
Platform: C55x
Problem: I am trying to minimize pipeline stalls by reordering certain instructions that seem to
create resource conflicts. How can I do this easily?
Solution: The C55x CPU cycle accurate simulator configuration supports the pipeline stall
analysis feature. This tool clearly shows the causes of pipeline stalls, IBU stalls, and the
instructions and resources that are resulting in conflicts. It allows stepping through the
application code to easily identify the exact areas of code where conflicts occur.
(c) Algorithm Profiling
Platform: C6x/C55x
Problem: I am working on the C6x/C55x platform. I need to get the cycles that are spent in
instructions, without memory latency.
Solution: Use any of the 62x, 64x, 67x ISA simulator configurations or the C55x CPU cycle
accurate simulator configurations. The Code Composer Studio IDE profiler or the multi-event
profiler may be used for profiling the application code.

(d) Profiling a Particular Event


Platform: C6x
Problem: How can I find out details of memory bank conflicts on a per-function basis?
Solution: A memory bank conflict event is supported on all C6x simulators. The Code Composer
Studio IDE profiler can be set up to profile the code on this event, instead of the default CPU
cycles. This enables profiling the code for memory bank conflicts and seeing the per-function
distribution of this event.
The multi-event profiler may be used when this profile information is needed for memory bank
conflicts simultaneously with other events such as CPU cycles, cache events, etc.

(e) Algorithm Optimization


Platform: C55x
Problem: I would like to see the cycle effects of assembly code optimizations such as using
dual MAC instructions, adding parallel instructions in various places in my code. How can I do
that?
Solution: The C55x cycle accurate CPU simulator configuration can be used to measure these
optimizations. Profile the portion of the code that needs to be optimized and the cycle
difference can be used for getting the best possible performance.

(f) Algorithmic Optimization


Platform: C55x
Problem: I am developing an algorithm for the C55x platform. I do not know how to choose the
ideal sizes for the local-repeat and block-repeat loops so that I get best performance from
C55x architecture.
Solution: The C55x CPU simulator configuration can be used with the Code Composer Studio IDE
profiler, code coverage and multi-event profiler and the pipeline stall analyzer to measure
branch overheads.
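For the algorithm validation scenario in Table 4(a), one common approach is to compare the output produced under simulation against a host-side reference. The Python sketch below shows such a check for a FIR filter. It assumes the simulator output has already been captured as a list of integers (how it is captured is outside the scope of this sketch); the function names and the sample data are hypothetical.

```python
def fir_reference(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * x[n - k], x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        output.append(acc)
    return output


def first_mismatch(expected, actual, tolerance=0):
    """Return the index of the first differing sample, or None if the outputs agree."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if abs(e - a) > tolerance:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))   # one output is truncated
    return None


coeffs = [1, 2, 1]
samples = [1, 0, 0, 4]
expected = fir_reference(samples, coeffs)

# Stand-in for output captured from a simulator run (hypothetical data).
captured = [1, 2, 1, 4]
print("first mismatch:", first_mismatch(expected, captured))
```

A nonzero tolerance can be useful when the target implementation uses fixed-point rounding that the reference does not reproduce exactly.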


Table 5. Application Validation and Optimization


Platform Problem Solution
(a) Cache Analysis
Platform: C55x
Problem: I am developing an application for C5502 device. I want to decide on the placement of
my program within the internal and external memory such that I can take full advantage of the
I-cache in C5502. How can I do it?
Solution: The C55x cycle accurate CPU simulator has I-cache information and can be used with
the cache analysis tool. The C5502 functional simulator configuration provides information on
I-cache hits and misses. The application can be simulated using any of the above configurations
and modified to minimize the number of misses. The C5502 device simulator configuration can be
used to measure the effects of cycle latencies for different types of internal and external
memories.
(b) Cache Analysis
Platform: C6x
Problem: I am developing an application for C6416 device. I want to decide on the placement of
my program within the internal and external memory such that I can take full advantage of the
caches in these devices.
Solution: The C6416 functional device simulator configuration can be used with the cache
analysis tool to visualize the memory access patterns and details of cache hits and misses.

(c) Simulating Full-Application


Platform: C55x
Problem: I am developing a vocoder application on the C5502 device. The application uses the
McBSP0 for input data and the McBSP1 for sending the decoded data streams. The data streams are
transferred to internal memory using the DMA for the CPU to operate on them. Can I use the
simulator to validate this application?
Solution: The C5502 functional device simulator configuration can be used initially while
integrating the application modules. The C5502 device simulator configuration can be used to
validate the programming of the McBSP0 and McBSP1 peripherals. The pin connect and port connect
features can be used to provide the input data stream to the McBSP0 and collect the output
through the McBSP1.
(d) Simulating Full Application
Platform: C6x
Problem: I have a gsm.729 algorithm running on a RF3 framework. The application uses tasks
through DSP/BIOS, programs DMA through CSL calls, and uses McBSP for the data transfers. I want
to test the functional correctness of the application.
Solution: A functional simulator would have been more appropriate if the McBSP was supported.
Since it is not supported, use the C6416 device simulator for verifying the functional
correctness of the application. Use the port connect feature to transfer in the data through
the McBSP. The pin connect feature can be used to provide the external frame syncs and the
clock to the McBSP if they are programmed for external clock and frame syncs.

(e) Difference Between CPU, Functional Device and Device Simulators
Platform: C6x/C55x
Problem: I have code that does not use any peripheral or the DMA. What are the differences I
would observe if I run it on the CPU simulator, functional device simulator, and device
simulator?
Solution:
Cycle counts: The CPU simulator and functional device simulator configurations will provide
identical cycle counts. The device simulator configurations may provide different cycle counts
if the memory ranges used by the application fall in the external memory ranges of the device.
On the C55x, the C55x simulator configuration will give accurate CPU cycle counts and will
differ from the C55x functional simulator configuration.
Simulation speed: The CPU simulator configurations will run faster than the functional device
simulator configurations. The device simulator configurations will run slower than the
functional device simulator configurations.
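The guidance in these scenarios can be condensed into a small lookup, sketched below in Python. The task names, the table contents, and the function suggest_configuration are a paraphrase made for illustration; they are not an exhaustive or official mapping, and scenarios such as an unsupported peripheral on a functional device simulator still need the case-by-case judgment described above.

```python
# Keys are (task, platform); values paraphrase the guidance in this report.
GUIDANCE = {
    ("algorithm validation", "C55x"): "functional CPU simulator",
    ("algorithm validation", "C6x"): "CPU simulator",
    ("algorithm optimization", "C55x"): "cycle accurate CPU simulator",
    ("algorithm optimization", "C6x"): "CPU simulator",
    ("cache optimization", "C55x"): "functional device simulator",
    ("cache optimization", "C6x"): "functional device simulator",
    ("buffer transfer optimization", "C55x"): "device simulator",
    ("buffer transfer optimization", "C6x"): "device simulator",
}


def suggest_configuration(task, platform):
    """Return the simulator type suggested for a task, or raise ValueError."""
    try:
        return GUIDANCE[(task, platform)]
    except KeyError:
        raise ValueError(f"no guidance recorded for {task!r} on {platform!r}") from None


print(suggest_configuration("cache optimization", "C6x"))
```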

5 References
1. Code Composer Studio Getting Started Guide (SPRU509)
2. Code Coverage And Multi-event Profiler User Guide (SPRU624)
3. Using the Code Coverage And Multi-event Profiler for Robustness and Efficiency Analysis
(SPRA868)
4. Cache Analysis User Guide (SPRU575)
5. Using the Cache Analysis Tool to Improve Cache Utilization (SPRA863)


Appendix A

A.1 Simulator Configurations Based on Extent and Detail of Device Simulated


• Functional CPU simulator
This models the ISA behavior without the pipeline effects. Memory latency is constant. This
helps in the functional validation of the modules. These simulators typically run faster than
the cycle accurate CPU simulator. This variant is available for C55x simulators only.
• Cycle accurate CPU simulator
This models the pipeline behavior accurately, giving the detailed behavior on cycle losses
due to CPU stalls and bank conflicts in on-chip memory. They help optimize the modules for
instruction and efficient utilization of the CPU resources.
• CPU and cache simulator
This models the cycle behavior of the L1P, L1D, and L2 caches for the C64x platform.
• Functional device simulators
The functionality of the DMA, caches, and other peripherals is modeled to the extent of
ensuring that applications may be run without modification. These simulators are an order of
magnitude faster than the device simulators. They compromise on the cycle counts, though they
still give accurate event counts. These are primarily used in the validation and certain
optimization phases at the application level.
• Device simulators
These simulators model very closely the cycle behavior of the caches, DMA, and other
peripherals. These are generally used to get an estimate for real cycles. They run at a much
slower speed compared to the functional device simulators.

A.2 Features to Provide External Stimuli to Target Device


• Pin connect
Pin connect enables the user to simulate and monitor signals from external interrupts. For
taking in external interrupts/triggers, some pins are simulated in the C6x/C55x devices. Any
file having the specified format can be connected to those pins. These formats specify the
absolute/relative cycle for these interrupts to occur.
• Port connect
The port connect tool allows the user to access a file through a memory address. This
feature is very useful to set up an input or output data stream to the simulator at supported
addresses. When a file is connected to a memory (port) address for read (write), data is read
from (written to) the file whenever there is a read (write) to the address.
• Cross bar
This tool is used for specifying the interconnectivity of the McBSPs. Different McBSPs can be
interconnected through their external pins, such as DR, FSR, and FSX. This is available on
only some C6x simulators.
• Boot load
This feature can be used to bootload some code into the device memory at the start of
simulation. This is available on some C6x simulators.


• External host port


This helps simulate the behavior of host interaction through a simple command-file based
mechanism. This is available on some C55x simulators.
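The pin connect description above mentions that interrupt times may be given as absolute or relative cycles. The Python sketch below shows one way such a schedule could be expanded into absolute cycle numbers. The tuple representation, the rule that a relative entry counts from the previous event, and the function name expand_schedule are assumptions for illustration; the actual file syntax is defined in the simulator reference guides and is not reproduced here.

```python
def expand_schedule(entries):
    """Convert ('abs', cycle) and ('rel', delta) entries into absolute cycles."""
    cycles = []
    current = 0
    for kind, value in entries:
        if kind == "abs":
            current = value
        elif kind == "rel":
            current += value          # assumed: relative to the previous event
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
        if cycles and current < cycles[-1]:
            raise ValueError("events must not go backwards in time")
        cycles.append(current)
    return cycles


# Hypothetical schedule: one interrupt at cycle 100, two more at 50-cycle
# intervals, and a final one at cycle 1000.
schedule = [("abs", 100), ("rel", 50), ("rel", 50), ("abs", 1000)]
interrupt_cycles = expand_schedule(schedule)
print(interrupt_cycles)
```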

A.3 Visibility and Analysis Features


• Pipeline analysis
This feature allows visibility into pipeline stages and instructions residing in each stage of the
pipeline as they enter and exit it. It reports stalls, including the instructions involved and
the conflicting resources.
• Simulator analysis
Simulator analysis allows the user to set up and monitor the occurrence of specific events.
The simulator analysis plug-in reports the occurrence of particular system events to monitor
and measure the performance of your program. The events can be set up to either
increment a counter when they are triggered or to halt the execution when they are
triggered.
• Code Composer Studio IDE profiler
This helps profile functions or ranges of code over clock cycles or analysis events.
• Cache analysis
It graphically visualizes the memory reference pattern of a program over time. This enables
the programmer to quickly target the areas of code and data that are incurring cache misses,
and provides a road map for applying optimizations and transformations to improve cache
performance.
• Code coverage and multi-event profiler
The code coverage and multi-event profiler tool, available in the Analysis Toolkit for CCS
V2.2 (see the User’s Guide, SPRU623), provides two capabilities:
– Code coverage information by identifying source code that was not exercised in a run of
the application.
– Profile data for functions over multiple events of interest in a single run of the application.



IMPORTANT NOTICE

Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications,
enhancements, improvements, and other changes to its products and services at any time and to discontinue
any product or service without notice. Customers should obtain the latest relevant information before placing
orders and should verify that such information is current and complete. All products are sold subject to TI’s terms
and conditions of sale supplied at the time of order acknowledgment.

TI warrants performance of its hardware products to the specifications applicable at the time of sale in
accordance with TI’s standard warranty. Testing and other quality control techniques are used to the extent TI
deems necessary to support this warranty. Except where mandated by government requirements, testing of all
parameters of each product is not necessarily performed.

TI assumes no liability for applications assistance or customer product design. Customers are responsible for
their products and applications using TI components. To minimize the risks associated with customer products
and applications, customers should provide adequate design and operating safeguards.

TI does not warrant or represent that any license, either express or implied, is granted under any TI patent right,
copyright, mask work right, or other TI intellectual property right relating to any combination, machine, or process
in which TI products or services are used. Information published by TI regarding third-party products or services
does not constitute a license from TI to use such products or services or a warranty or endorsement thereof.
Use of such information may require a license from a third party under the patents or other intellectual property
of the third party, or a license from TI under the patents or other intellectual property of TI.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without
alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction
of this information with alteration is an unfair and deceptive business practice. TI is not responsible or liable for
such altered documentation.

Resale of TI products or services with statements different from or beyond the parameters stated by TI for that
product or service voids all express and any implied warranties for the associated TI product or service and
is an unfair and deceptive business practice. TI is not responsible or liable for any such statements.

Mailing Address:

Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright © 2002, Texas Instruments Incorporated
