
2.8. COMPONENTS FOR EMBEDDED PROGRAMS
There are three important structures, or components, that we need to write almost any kind of embedded program. Those components are:

1. State machine

2. Circular buffer

3. Queue

Among these three components, state machines are well suited to reactive systems such as user interfaces, while circular buffers and queues are useful in digital signal processing.
2.8.1. STATE MACHINE
When inputs are given to a system, the reaction of most systems can be characterized in terms of the input received and the current state of the system. This leads to a technique known as the finite state machine style of programming.

The finite state machine style describes the behavior of a reactive system. The state machine style of programming is also an efficient implementation of a hardware design.
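As a minimal sketch only (the seat-belt-style states, inputs, and names below are illustrative assumptions, not taken from this text), a state machine can be coded in C as a switch on the current state:

    /* Hypothetical states and inputs for illustration. */
    typedef enum { IDLE, BUZZER, BELTED } state_t;

    state_t next_state(state_t state, int seat, int belt)
    {
        switch (state) {
        case IDLE:
            if (seat) state = BUZZER;        /* occupant sat down */
            break;
        case BUZZER:
            if (belt) state = BELTED;        /* belt fastened, stop buzzing */
            else if (!seat) state = IDLE;    /* occupant left */
            break;
        case BELTED:
            if (!belt) state = BUZZER;       /* belt unfastened */
            else if (!seat) state = IDLE;    /* occupant left */
            break;
        }
        return state;                        /* reaction depends on input and current state */
    }

Calling next_state() once per input event keeps the reactive behavior of the system in one place.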
2.8.2. CIRCULAR BUFFERS AND STREAM ORIENTED PROGRAMMING
A circular buffer is a data structure that handles streaming data in an efficient way. The figure below shows how a circular buffer stores a subset of the data stream. At each point in time, the algorithm needs a subset of the data stream that forms a window into the stream.

To avoid copying data within the buffer, we move the head of the buffer in time. The head points to the location at which the next sample will be placed. Every time we add a sample, we automatically overwrite the oldest sample, which is the one that needs to be thrown out. When the pointer gets to the end of the buffer, it wraps around to the top.

Many digital signal processors provide addressing modes to support circular buffers. For example, the C55x provides five circular buffer start address registers. These registers allow circular buffers to be placed without alignment constraints.
In the absence of specialized instructions, we can write our own C code for a circular buffer. This code also helps us to understand the operation of the buffer.
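A minimal sketch of such a buffer is shown below; the buffer length, the int sample type, and the function names are assumptions made for illustration:

    #define CBUF_SIZE 8                     /* assumed window length */

    int cbuf[CBUF_SIZE];                    /* storage for the window of samples */
    int cbuf_head = 0;                      /* location where the next sample is written */

    /* Add a new sample, overwriting the oldest one. */
    void cbuf_add(int sample)
    {
        cbuf[cbuf_head] = sample;
        cbuf_head++;
        if (cbuf_head == CBUF_SIZE)         /* wrap around at the end of the buffer */
            cbuf_head = 0;
    }

    /* Read the ith most recent sample (i = 0 is the newest). */
    int cbuf_get(int i)
    {
        int pos = cbuf_head - 1 - i;
        if (pos < 0)
            pos += CBUF_SIZE;
        return cbuf[pos];
    }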

Fig. 2.23. A circular buffer: the window of samples held at time t and at time t + 1 as new data arrives in the stream.

Signal flow graph

The FIR filter is only one type of digital filter. Signal flow graphs are used to represent many different filtering structures.

Fig. 2.24. A signal flow graph: input x(n), output y(n), a delay operator z^-1, and a coefficient b1.
The filter operates at a sample rate, with inputs arriving and outputs generated at the sample rate. The inputs x(n) and y(n) are sequences indexed by n, which corresponds to the sequence of samples.

In this graph, nodes can be either arithmetic operators or delay operators. The + node adds its two inputs and produces the output y(n). The box labeled z^-1 is a delay operator. The z notation represents the z-transform, and the -1 superscript means that the operation performs a time delay of one sample period. The edge from the delay operator to the addition operator is labeled b1, which means that the output of the delay operator is multiplied by b1.

The filter takes in a new sample on every sample period. The new input becomes the current sample x(t), the old x(t) becomes x(t - 1), and so on. The current sample is stored directly in the circular buffer but must be multiplied by its coefficient b1 before being added to the output sum.
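A short sketch of an FIR filter step built on the circular buffer above (the coefficient array b[] and the tap count are illustrative assumptions):

    #define NTAPS CBUF_SIZE

    double b[NTAPS];                        /* filter coefficients, assumed already set */

    /* Process one new sample and return the filter output. */
    double fir_step(int new_sample)
    {
        double y = 0.0;
        int i;

        cbuf_add(new_sample);               /* newest sample overwrites the oldest one */
        for (i = 0; i < NTAPS; i++)
            y += b[i] * cbuf_get(i);        /* b[0] multiplies the newest sample */
        return y;
    }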

2.8.3. QUEUES AND PRODUCER/CONSUMER SYSTEMS

Queues are also used in signal processing and event processing. Queues are used whenever data may arrive and depart at somewhat unpredictable times or when variable amounts of data may arrive. A queue is also called an elastic buffer.

Queues can be built in two ways: one using a linked list and another using an array. Building a queue with the linked list method allows the queue to grow to an arbitrary size. Designing the queue with an array assumes the array can hold all the data. A circular buffer always has a fixed number of data elements, while a queue may have a varying number of elements in it.

Digital filters always take in the same amount of data in each time period. But signal processing systems may take in varying amounts of data over time and produce varying amounts. When these systems operate in a chain, the variable-rate output of one stage becomes the variable-rate input of another stage. An array-based queue is sketched below.
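A minimal sketch of an array-based queue (the size, element type, and the use of -1 as the empty/full signal are illustrative assumptions):

    #define Q_SIZE 32

    int q[Q_SIZE];
    int q_head = 0, q_tail = 0, q_count = 0;

    /* Returns 0 on success, -1 if the queue is full. */
    int enqueue(int value)
    {
        if (q_count == Q_SIZE) return -1;
        q[q_tail] = value;
        q_tail = (q_tail + 1) % Q_SIZE;     /* wrap the tail index */
        q_count++;
        return 0;
    }

    /* Returns 0 on success, -1 (an "empty" signal) if no data is available. */
    int dequeue(int *value)
    {
        if (q_count == 0) return -1;
        *value = q[q_head];
        q_head = (q_head + 1) % Q_SIZE;     /* wrap the head index */
        q_count--;
        return 0;
    }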

Fig. 2.25. A producer/consumer system
The above figure shows a simple producer/consumer system. P1 and P2 are the blocks that perform algorithmic processing. The data is fed to them by queues that act as elastic buffers. The queues modify the flow of control in the system as well as store data.

For example, if P2 runs ahead of P1, it will eventually run out of data in its q12 input queue. At that point, the queue will return an empty signal to P2, and P2 should stop working until more data is available.

This method is easier to implement in a multitasking environment, and it is also possible to make effective use of queues in programs structured as nested procedures. A sketch of such a producer/consumer pair follows.
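As an illustrative sketch only (get_sample() and process_data() are hypothetical stand-ins for the work done by P1 and P2), the queue above can connect a producer and a consumer like this:

    int get_sample(void);                   /* hypothetical input source */
    void process_data(int value);           /* hypothetical processing block */

    /* Producer: pushes newly arrived samples into the queue. */
    void producer_step(void)
    {
        enqueue(get_sample());
    }

    /* Consumer: stops working when the queue signals empty. */
    void consumer_step(void)
    {
        int value;
        while (dequeue(&value) == 0)        /* empty signal ends the loop */
            process_data(value);
    }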
Data Structures in Queues

The queues in a producer/consumer system may hold either uniform-sized data elements or variable-sized data elements. In some cases, the consumer needs to know how many of a given type of data element are associated together. The queue can be structured to hold a complex data type. The data structure can be stored as bytes or integers in the queue.

2.9. MODELS OF PROGRAMS

Programs are collections of instructions that execute a specific task. Models for programs are more general than source code. We cannot use source code directly because we have different types of source code, such as assembly language, C code and so on. We must use a single model to describe all of them. A single model lets us perform many useful analyses on the program more easily.

Control/Data Flow Graph (CDFG)

The control/data flow graph is a fundamental model for programs. The CDFG has constructs that model both data operations and control operations. To understand the CDFG clearly, we must first understand data flow graph descriptions.
2.9.1. DATA FLOW GRAPHS
A data flow graph is a model of a program with no conditions. In a high-level programming language, a code segment with no conditions and only one entry and exit point is known as a basic block.
For example:

    w = a + b;
    x = a - c;
    y = x + d;
    x = a + c;
    z = y + e;

Fig. 2.26. A basic block in C

In the above code, x has two assignments; it appears twice on the left side of an assignment. So we need to rewrite the code in single-assignment form, because a variable with two assignments holds only the most recently assigned value. The modified form of the above code is:
    w = a + b;
    x1 = a - c;
    y = x1 + d;
    x2 = a + c;
    z = y + e;

Fig. 2.27. Basic block in single-assignment form


Single-assignment form is important because it clearly identifies a unique location for each value in the code. In single-assignment form the data flow graph is acyclic.
2.9.2. TYPES OF NODES IN A DATA FLOW GRAPH

There are two types of nodes in the graph:

1. Round nodes represent operators.

2. Square nodes represent values.

A value node may be either an input to the basic block, such as a and b, or a variable assigned to within the block, such as w and x1. The data flow graph for the above single-assignment code is shown in Fig. 2.28.

Advantages of a Data Flow Graph

1. The order of execution of operations is specified.

2. It reduces the pipeline process.

3. It can be used to determine feasible reorderings of the operations.

Fig. 2.28. Data flow graph for the single-assignment code


2.9.3. CONTROL/DATA FLOW GRAPHS (CDFG)

A CDFG uses data flow graphs as components of its construction and has two types of nodes: decision nodes and data flow nodes.

1. Decision nodes

Decision nodes are used to describe the control in a sequential program.
2. Data flow nodes

A data flow node encapsulates a complete data flow graph to represent a basic block.
    if (cond1)
        basic_block_1();
    else
        basic_block_2();
    basic_block_3();

    switch (test1) {
        case C1: basic_block_4(); break;
        case C2: basic_block_5(); break;
        case C3: basic_block_6(); break;
    }

For the above C code, the corresponding CDFG is shown in Fig. 2.29. The CDFG construction uses two kinds of nodes:

1. Rectangular nodes, used to represent the basic blocks.

2. Diamond-shaped nodes, used to represent the conditions.

A while loop consists of both a test and a loop body, each of which we know how to represent in a CDFG. We can also represent for loops, because in C a for loop is defined in terms of a while loop:
    for (i = 0; i < N; i++)
        loop_body();

is equivalent to

    i = 0;
    while (i < N) {
        loop_body();
        i++;
    }

Fig. 2.29. CDFG for the above C code: diamond nodes test cond1 and test1, and rectangular nodes hold basic_block_1() through basic_block_6().


For a complete CDFG model, a data flow graph is used to model each data flow node; the CDFG is a hierarchical representation, and a data flow node can contain a complete data flow graph.

We can also build a CDFG for an assembly language program. ARM and many VLIW processors support predicated execution of instructions, which requires special constructs in the CDFG.

2.10. ASSEMBLY, LINKING AND LOADING

Assembly and linking are the last steps in the compilation process. They convert a list of instructions into an image of the program's bits in memory. Loading puts the program in memory so that it can be executed.

Fig. 2.30. Program generation from compilation through loading: high-level language code → compiler → assembly code → assembler → object code → linker → executable binary → loader → execution.
The above diagram shows how an executable program is generated. The following sequence of steps is performed:

1. The compiler converts high-level language code into machine code. But most compilers do not generate machine code directly; they generate human-readable assembly language code.

2. The assembler's job is to translate symbolic assembly language statements into bit-level representations of instructions known as object code. The assembler also has to translate labels into addresses.

3. A program may be built from many files, so the final step of determining the addresses of instructions and data is performed by the linker. The linker produces an executable binary file. The loader then loads the program into memory for execution.
program into
Types of Addresses

1. Absolute addresses

2. Relative addresses

When the starting address of the assembly language program is specified by the programmer, the addresses are absolute addresses. With relative addresses, the origin of the assembly language module is computed later. These are the basic functions involved in converting a high-level language program into executable code.
2.10.1. ASSEMBLERS

Assemblers not only translate assembly code into object code; they also translate opcodes, format the bits in each instruction, and translate labels into addresses. Label translation is the more complex task of the assembler.

1. The first pass scans the code to determine the address of each label.

2. The second pass assembles the instructions using the label values computed in the first pass.

Fig. 2.31. Symbol table processing during assembly: the assembly code (add r0, r1, r2; xx: add r3, r4, r5; cmp r0, r3; yy: sub r5, r6, r7) and the symbol table mapping the labels xx and yy to their addresses (yy = 0x10).
processing during assembly
In the first pass, the name of each symbol and its address is stored in a symbol table. The symbol table is built by scanning from the first instruction to the last instruction.

During the scanning process the current location in memory is kept in a program location counter (PLC). The PLC is not used to execute the program, only to assign memory locations to labels. The PLC and the PC are not the same: the PLC makes exactly one pass through the program, whereas the program counter (PC) may make many passes over code in a loop.
After examining a line, the assembler updates the PLC to the next location and looks at the next instruction. If the instruction begins with a label, a new entry is made in the symbol table; it includes the label name and its value. The value of the label is equal to the current value of the PLC.

At the end of the first pass, the assembler rewinds to the beginning of the assembly language file to make the second pass. During the second pass, when a label name is found, the label is looked up in the symbol table and its value is substituted into the appropriate place in the instruction.
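A minimal sketch of the first pass in C, assuming a fixed four-byte instruction size and hypothetical helpers (line_t, symtab_add) invented for illustration:

    #define INSTR_SIZE 4                    /* assumed fixed instruction width */

    typedef struct {
        char label[16];                     /* empty string if the line has no label */
    } line_t;

    void symtab_add(const char *name, unsigned address);   /* hypothetical symbol table helper */

    /* First pass: assign an address to every label using the PLC. */
    void first_pass(line_t *lines, int nlines, unsigned start_address)
    {
        unsigned plc = start_address;       /* program location counter */
        int i;

        for (i = 0; i < nlines; i++) {
            if (lines[i].label[0] != '\0')
                symtab_add(lines[i].label, plc);   /* label value = current PLC */
            plc += INSTR_SIZE;                     /* advance to the next location */
        }
    }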
The assembler allows a label to be added to the symbol table without occupying space in the program memory. A typical name for this pseudo-op is EQU, for equate.
The ARM assembler supports one pseudo-op that is particular to the ARM
instruction set. In other architectures an address would be loaded into a register by
reading it from a memory location.
The assembler produces an object file that describes the instructions and data in binary format. A commonly used object file format is known as COFF (Common Object File Format).

The object file must describe the instructions, data and any addressing information; it also usually carries along the symbol table for later use in debugging. To understand the details of turning relocatable code into executable code, we must understand the linking process.

2.10.2. LINKING

Many assembly language programs are written as several smaller pieces rather than as a single large file. A linker allows a program to be stitched together out of several smaller pieces. The linker operates on the object files created by the assembler and modifies the assembled code to make the necessary links between files.

Some labels will be defined and used in the same file, while other labels will be defined in one file but used elsewhere. The place where a label is defined is known as an entry point, and the place where a label is used is known as an external reference. The main job of the loader is to resolve external references based on the available entry points. Even if the entire symbol table is not kept for later debugging purposes, it must at least pass along the entry points.
Phases of the Linker

There are two phases in the linking process.

1. First phase: determines the address of the start of each object file.

2. Second phase: the loader merges all symbol tables from the object files into a single large table.

Workstations and PCs provide dynamically linked libraries, and some embedded computing environments also provide them. Dynamically linked libraries are linked in at the start of program execution.
2.10.3. OBJECT CODE DESIGN

When designing an embedded system, we may need to control the placement of several types of data:

Interrupt vectors and other information for I/O devices must be placed in specific locations.

Memory management tables must be set up.

Global variables used for communication between processes must be put in locations that are accessible to all the users of that data.
Reentrancy
Many programs should be designed to be reentrant. A program is reentrant if it can be interrupted by another call to the function without changing the results of either call. If the program changes the value of global variables, it may give a different answer when it is called recursively.

Example

    int foo = 1;
    int task1()
    {
        foo = foo + 1;
        return foo;
    }

In the above example, the global variable foo is modified, so task1() gives a different answer on every invocation. We can avoid this problem by passing foo in as an argument:

    int task1(int foo)
    {
        return foo + 1;
    }

Relocatability
A program is relocatable if it can be executed when loaded into different parts of memory. Relocation normally relies on some support from hardware that provides address calculation, but it is possible to write relocatable code even for architectures without such support. Any addresses that are not fixed by the architecture or system configuration should be accessed using relocatable code.

2.11. COMPILATION TECHNIQUES


A compiler is a special program that processes statements written in a particular programming language and turns them into machine language or code that a computer's processor uses. Understanding how a compiler works will help us to write code and to direct the compiler to get the assembly language implementation we want.

2.11.1. COMPILATION PROCESS

Implementing an embedded computing system often requires controlling the instruction sequence used to handle interrupts, the placement of data and instructions in memory, and so on. It is useful to understand how a high-level language program is translated into instructions. Many applications are performance sensitive, so understanding how code is generated can help us meet our performance goals.

Compilation = Translation + Optimization

The high-level language program is translated into the lower-level form of instructions; optimizations try to generate better instruction sequences.

Fig. 2.32. The compilation process: high-level language code → parsing, symbol table generation, semantic analysis → machine-independent optimizations → instruction-level optimizations and code generation → assembly code.


Optimization techniques focus on more of the program to ensure that compilation decisions that appear to be good for one statement are not unnecessarily problematic for other parts of the program.


The compilation process begins with high-level language code such as C or C++ and generally produces assembly code.

The high-level language program is parsed to break it into statements and expressions. A symbol table is generated, which includes all the named objects in the program.

Compilers perform two kinds of optimization:

1. Instruction-level optimization

2. Machine-independent optimization

Simplifying arithmetic expressions is one example of a machine-independent optimization. Not all compilers perform these kinds of optimizations.
2.11.2. STATEMENT TRANSLATION

A large amount of code consists of arithmetic and logical expressions. Converting these expressions is one of the more complex tasks of the compiler. To understand how to compile a single expression, let us consider the following example.

Compiling an Arithmetic Expression

    x = a * b + 5 * (c - d)

In the above arithmetic expression, the expression is written in terms of program variables. On some machines we may be able to perform memory-to-memory arithmetic directly on the locations corresponding to those variables. On many machines, such as the ARM, we must first load the variables into registers. This requires choosing which registers receive not only the named variables but also intermediate results such as (c - d).

The temporary variables for the intermediate values and the final result have been named w, x, y and z. To generate code, we walk the tree from its root by traversing the nodes in post order. During the walk, we generate instructions to cover the operation at every node.

Fig. 2.33. Graph for the above expression

The nodes are numbered in the order in which code is generated; every node in the data flow graph corresponds to an operation that is directly supported by the instruction set. The ARM code for the above expression is shown below.

One optimization is to reuse a register whose value is no longer needed. In the case of the intermediate values w, x and y, we know that they cannot be used after the end of the expression. The final result z may in fact be used in a C assignment and the value reused later in the program.

Large programs contain multiple expressions, so we must allocate registers more carefully because CPUs have a limited number of registers.

Drawing a control flow graph based on the while form of the loop helps us understand how to translate it into instructions.

C compilers can generate assembler source, which some compilers intersperse with the C code. Such code is a very good way to learn about both assembly language programming and compilation.

    ; operator 1 (+)
    ADR r4, a       ; get address for a
    MOV r1, [r4]    ; load a
    ADR r4, b       ; get address for b
    MOV r2, [r4]    ; load b
    ADD r3, r1, r2  ; put w into r3
    ; operator 2 (-)
    ADR r4, c       ; get address for c
    MOV r4, [r4]    ; load c
    ADR r4, d       ; get address for d
    MOV r5, [r4]    ; load d
    SUB r6, r4, r5  ; put x into r6
    ; operator 3 (*)
    MUL r7, r6, #5  ; operator 3, puts y into r7
    ; operator 4 (+)
    ADD r8, r7, r3  ; operator 4, puts z into r8
    ; assign to x
    ADR r1, x       ; get address for x
    STR r8, [r1]    ; assign to x location

Fig. 2.18. CDFG for the loop: loop initiation code (i = 0; f = 0), loop test (i < N), loop body (f = f + c[i] * x[i]), loop variable update (i = i + 1), and loop exit.
2.11.3. PROCEDURES

The creation of procedures is a major problem in code generation. Generating code for the procedure body is relatively straightforward, but the procedure definition must also handle the procedure call and return.

In modern programming languages the CPU's subroutine call mechanism is usually not sufficient to directly support procedures. The procedure stack and the procedure linkage are the additional functions performed for procedures.

Procedure Linkage

The procedure linkage mechanism provides a way for the program to pass parameters into the procedure and for the procedure to return a value. It also provides help in restoring the values of registers that the procedure has modified.

All procedures in a given programming language use the same linkage mechanism. The mechanism can also be used to call handwritten assembly language routines from compiled code.
Procedure Stack

Procedure stacks are typically built to grow down from high addresses. The stack has two pointers: the stack pointer defines the end of the current frame, and the frame pointer defines the end of the last frame.

The procedure can refer to an element in the frame by addressing relative to the stack pointer. When a new procedure is called, the stack pointer and frame pointer are modified to push another frame onto the stack.

2.11.4. DATA STRUCTURES

Data structures are ways of organizing data. The compiler must translate references to data structures into references to raw memory. Converting data structures requires address computations; some of these computations can be done at compile time while others must be done at run time.

Example
Linked list

Array
Queue
Structure
Union etc.

Array

An array is an interesting data structure because the address of an array element is generally computed at run time. Arrays come in three kinds:

1. One-dimensional arrays

2. Two-dimensional arrays

3. Multi-dimensional arrays

(i) One-Dimensional Array

A one-dimensional array has only one subscript. Consider a one-dimensional array of the form a[i], which contains i values. The memory layout of a one-dimensional array is shown below.
    a → a[0]
        a[1]
        ...
        a[i]
The zeroth element is stored as the first element of the array, the first element directly below it, and so on. We can create a pointer for the array; an array pointer is a variable that contains the address of another variable. The array pointer points to the head of the array, namely a[0]. If we call that pointer aptr for convenience, then we can rewrite the reading of a[i] as

    *(aptr + i)
(ii) Two-Dimensional Arrays

A two-dimensional array has two subscripts. Two-dimensional arrays are more challenging: there are multiple possible ways to lay out a two-dimensional array in memory. One form of memory layout for two-dimensional arrays is row-major order. In row-major order the inner variable of the array (j in a[i, j]) varies most quickly.

    a[0,0]
    a[0,1]
    ...
    a[1,0]
    a[1,1]
Two-dimensional arrays also require more sophisticated addressing. First we must know the size of the array. In row-major form, if the a[] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access:

    a[i, j]  becomes  a[i * M + j]

where the maximum value for j is M - 1.
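A small sketch of this address computation in C (the array sizes here are illustrative assumptions):

    #define N 4
    #define M 6

    int a[N * M];                           /* two-dimensional array stored in row-major order */

    /* Compute the flat index for a[i][j] in row-major order. */
    int row_major_index(int i, int j)
    {
        return i * M + j;                   /* j varies fastest within a row */
    }

    /* Example: read the element a[2][3]. */
    int read_element(void)
    {
        return a[row_major_index(2, 3)];
    }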

2.11.5. COMPILER OPTIMIZATIONS


Basic compilation techniques can generate inefficient code. Compilers use a

wide range of algorithms to optimize the code they generate.

Loop Transformations

Loops are important program structures because they are compactly described in the source code and they account for a large fraction of the computation time. Many techniques have been designed to optimize loops.

Loop Unrolling
A simple and useful transformation is known as loop unrolling, sketched below. It is important because it helps to expose parallelism that can be used by later stages of the compiler.
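An illustrative sketch only (the arrays, the loop bound, and the assumption that N is even are mine):

    #define N 64                            /* assumed even loop bound */
    int a[N], b[N], c[N];

    void multiply_unrolled(void)
    {
        int i;
        /* The original loop executed a[i] = b[i] * c[i] once per iteration.
           Unrolling by a factor of two does two iterations per pass,
           exposing parallelism to later compiler stages. */
        for (i = 0; i < N; i += 2) {
            a[i]     = b[i]     * c[i];
            a[i + 1] = b[i + 1] * c[i + 1];
        }
    }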
Loop Fusion

Loop fusion combines two or more loops into a single loop. For this transformation to be legal, two conditions must be satisfied (a sketch follows the list):

1. The loops must iterate over the same values.

2. The loop bodies must not have dependencies that would be violated if they are executed together.
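An illustrative before/after sketch under those two conditions (the array names and bound are mine):

    #define N 64
    int a[N], b[N], c[N], d[N];

    void fused(void)
    {
        int i;
        /* Before fusion: two loops over the same range, one computing a[i]
           and one computing c[i]. After fusion the two independent bodies
           share a single loop. */
        for (i = 0; i < N; i++) {
            a[i] = b[i] + 1;                /* body of the first original loop */
            c[i] = d[i] * 2;                /* body of the second original loop */
        }
    }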

Loop Distribution
Loop distribution is the opposite of loop fusion, that is, decomposing a single loop into multiple loops, as sketched below.
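Again purely illustrative (names and bound assumed), the reverse transformation splits one loop into two:

    #define N 64
    int a[N], b[N], c[N], d[N];

    void distributed(void)
    {
        int i;
        /* A single loop whose body computed both a[i] and c[i] is
           decomposed into two loops, each touching fewer arrays. */
        for (i = 0; i < N; i++)
            a[i] = b[i] + 1;
        for (i = 0; i < N; i++)
            c[i] = d[i] * 2;
    }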
Loop Tiling
The loop tiling process breaks up a loop into a set of nested loops, with each inner loop performing the operation on a subset of the data.
Before tiling:

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            c[i] = a[i][j] * b[i];

After tiling (tile size 2):

    for (i = 0; i < N; i += 2)
        for (j = 0; j < N; j += 2)
            for (ii = i; ii < min(i + 2, N); ii++)
                for (jj = j; jj < min(j + 2, N); jj++)
                    c[ii] = a[ii][jj] * b[ii];

Fig. 2.35. Loop tiling: the access pattern in the array before and after tiling.


Each loop is split into two loops. The inner ii loop iterates within a tile and the outer i loop iterates across the tiles.

Drawback
It changes the order in which array elements are accessed, thereby allowing us to better control the behavior of cache conflicts during loop execution.
Dead Code Elimination
Dead code is code that can never be executed. Dead code can be generated by programmers or by compilers. Dead code can be identified by reachability analysis.

Reachability analysis is the process of finding the other statements or instructions from which a piece of code can be reached. If a given piece of code cannot be reached, or it can be reached only by a piece of code that is itself unreachable from the main program, then it can be eliminated. Dead code elimination analyzes the code for reachability and trims away dead code. A small example follows.
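A tiny illustrative fragment (the names and the DEBUG flag are mine); both marked lines are dead and would be trimmed away:

    #define DEBUG 0                         /* compile-time constant */

    void log_debug(int x);                  /* hypothetical logging routine */

    int compute(int x)
    {
        if (DEBUG)                          /* condition is always false ... */
            log_debug(x);                   /* ... so this call can never execute */
        return x * 2;
        x = x + 1;                          /* unreachable: follows the return */
    }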

Register Allocation
Register allocation is a very important compilation phase. For a block of code, we want to choose assignments of variables (both declared and temporary) to registers so as to minimize the total number of required registers.

If a section of code requires more registers than are available, we must spill some of the values out to memory temporarily. After computing some values, we write them to temporary memory locations, reuse those registers in other computations, and then reread the old values from the temporary locations to resume work.

Scheduling
Scheduling is the process of choosing the time at which each instruction will execute. CPU manufacturers generally disclose enough information about the microarchitecture to allow us to schedule instructions even when they do not provide a detailed description of the CPU's internals.
Reservation Table
A reservation table is used to keep track of CPU resources during instruction scheduling, as shown in the figure below.

    Time        Resource A    Resource B
    t + 1           X              X
    t + 2                          X
    t + 3           X

Fig. 2.36. Reservation table for instruction scheduling


Instruction types 1 and 2 both use resource A, while instruction types 3 and 4 use resource B. In the table, rows represent instruction execution time slots and columns represent the resources that must be scheduled.

Before scheduling an instruction to be executed at a particular time, we check the reservation table to determine whether all resources needed by the instruction are available at that time.

Upon scheduling the instruction, we update the table to note all resources used by that instruction. The reservation table provides a good summary of the state of an instruction scheduling problem in progress.

Software Pipelining

Software pipelining is a technique for reordering instructions across several loop iterations to reduce pipeline bubbles.
Instruction Selection

Selecting the instructions to use to implement each operation is not trivial. There may be several different instructions that can be used to accomplish the same goal, but they may have different execution times. Using one instruction for one part of the program may also affect the instructions that can be used in adjacent code.

2.12. PROGRAM LEVEL PERFORMANCE ANALYSIS


The CPU's performance is important to understanding the performance of any system. But CPU performance is not judged in the same way as program performance. The CPU clock rate is a very unreliable metric for program performance. More importantly, the fact that the CPU executes part of our program quickly doesn't mean that it will execute the entire program at the rate we desire.

The execution time of a program varies with the input data values because those values select different execution paths in the program. For example, loops may be executed a varying number of times, and different branches may execute blocks of varying complexity.

The CPU pipeline and cache act as windows into our program. The cache has a major effect on program performance, and the cache's behavior depends in part on the data values input to the program.
