basic operations.
In pure assembly language one assembly language statement corresponds to one
basic operation of the processor. When a programmer writes in assembly language
the programmer is asking for the basic operations of the processor.
Most processors endlessly repeat three basic steps: fetch the next instruction,
increment the program counter, and execute the instruction. Each machine cycle
results in the execution of one machine instruction.
Processor chips (and the computers that contain them) are often described in
terms of their clock speed. Clock speed is measured in Hertz, where one Hertz is
one clock tick per second.
MHz means megahertz, a million clock ticks per second; GHz means gigahertz, a
billion clock ticks per second.
Each byte of main storage has an address. Most modern processors use 32-bit
addresses, so there are 2^32 possible addresses. Think of main storage as if
it were an array:
byte[0x00000000 ... 0xFFFFFFFF] mainStorage;
Computer systems also have cache memory. Cache memory is very fast RAM that
is inside (or close to) the processor. It duplicates sections of main storage that are
heavily used by the currently running programs. Access to cache memory is much
faster than to normal main memory.
MIPS Architecture
A register is a part of the processor that holds a bit pattern. Processors have many
registers.
MIPS processors have 32 general purpose registers, each holding 32 bits. They
also have registers other than general purpose ones. MIPS instructions
are 4 bytes = 1 word = 32 bits.
The processor chip contains registers, which are electronic components
that can store bit patterns. The processor interacts with memory by
moving bit patterns between memory and its registers.
Load: bits starting at an address in memory are copied into a register
inside the processor.
Store: bits are copied from a processor register to memory at a
designated address.
or d, $s1, $zero
Instruction Types

R-Format

            Op (6 bits)  Rs (5 bits)  Rt (5 bits)  Rd (5 bits)  Shamt (5 bits)  Funct (6 bits)
  Assembly  0            $s1          $s2          $s0          0               32
  Decimal   0            17           18           16           0               32
  Binary    000000       10001        10010        10000        00000           100000

I-Format

            Op (6 bits)  Rs (5 bits)  Rt (5 bits)
  Assembly  35           $s3          $t0
  Decimal   35           19           8
  Binary    100011       10011        01000

  Inst  Op  Rs    Rt          Address
  lw    35  Base  Dest reg    Offset
  sw    43  Base  Source reg  Offset
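As a rough illustration of how the fields in the tables above pack into one 32-bit instruction word, here is a minimal Python sketch. The helper names `encode_r_format` and `encode_i_format` are hypothetical, and the offset of 32 used for the lw example is an assumed value, since the notes do not give one.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6+5+5+5+5+6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_format(op, rs, rt, offset):
    """Pack the I-format fields (6+5+5+16 bits); only the low 16 offset bits are kept."""
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


# add $s0, $s1, $s2, using the field values from the R-format table
add_word = encode_r_format(op=0, rs=17, rt=18, rd=16, shamt=0, funct=32)

# lw $t0, 32($s3); the offset 32 is an assumption made for this example
lw_word = encode_i_format(op=35, rs=19, rt=8, offset=32)

print(f"{add_word:032b}")   # the six binary fields from the table, concatenated
print(f"{lw_word:#010x}")
```

Reading the first printed line in groups of 6, 5, 5, 5, 5 and 6 bits gives back the Binary row of the R-format table.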
Chapter 4 Review
The performance of a computer is determined by three factors.
1. Instruction Count
2. Clock Cycle Time
3. Clock Cycles Per Instruction
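The three factors combine multiplicatively: CPU time = instruction count x clock cycles per instruction x clock cycle time. A small Python sketch of that arithmetic follows; the function name and all the numbers are made up for illustration.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * cycles per instruction * clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per clock tick
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 1 million instructions, CPI of 2, on a 1 GHz clock
t = cpu_time_seconds(instruction_count=1_000_000, cpi=2.0, clock_rate_hz=1_000_000_000)
print(f"{t * 1e3:.3f} ms")
```

Halving any one of the three factors while the other two stay fixed halves the CPU time.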
I. All instructions start by using the program counter to supply the instruction
address to the instruction memory. The program counter is just a register that
contains the address of the next instruction to execute.
II. After the instruction is fetched, the register operands used by an instruction
are specified by fields of that instruction.
III. Once the values of the registers have been fetched, they can be operated on by
the ALU to compute a memory address for a load or store, to compute an
arithmetic result, or to do a compare for branch instructions.
IV. If the instruction is arithmetic, the ALU result must be written to a register. If
the operation is a load or store, the ALU result is used as an address to store a
value from the registers or load a value from memory into the registers.
V. Branches require the use of the ALU output to determine the next instruction
address.
The simplicity and regularity of the MIPS instruction set simplifies the
implementation by making the execution of the three instruction classes similar. We
only need a few extra hardware components to support R-type, I-type and J-type
instructions.
For example, all instruction types except jump use the ALU after reading the
registers. Memory-reference instructions such as loads use the ALU to
calculate an address, and arithmetic-logical instructions use the
ALU to perform calculations on two registers.
A load instruction will need to access memory to read data and write that into a
register.
A store instruction will need to access memory to write data into memory from a
register.
The control lines drive the select inputs of the multiplexors, choosing which
input each multiplexor passes through. In this way the control steers data through
the datapath according to the instruction type.
The datapath contains elements used to operate on or hold data within a
processor. Datapath elements include the instruction memory, data memory, the
register file, the ALU, and adders.
Load and Store ALU OP: add
Branches ALU OP: subtract
R-Type ALU OP: depends on funct number.
MemRead asserted for load instructions, tells memory to do a read.
MemWrite asserted for store instructions, tells memory to do a write.
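The control decisions listed above can be sketched as a small lookup in Python. This is only an illustration: the opcodes for lw (35) and sw (43) come from the notes, while the beq opcode (4) and the funct codes other than add (32) are the standard MIPS encodings and are not stated in the notes. The function name `control` is hypothetical.

```python
OP_RTYPE, OP_BEQ, OP_LW, OP_SW = 0, 4, 35, 43

# funct field -> ALU operation for R-type instructions (standard MIPS encodings)
FUNCT_TO_ALU = {32: "add", 34: "subtract", 36: "and", 37: "or", 42: "slt"}


def control(opcode, funct=0):
    """Return the ALU operation and memory control signals for one instruction."""
    if opcode in (OP_LW, OP_SW):
        alu_op = "add"                 # base register + offset
    elif opcode == OP_BEQ:
        alu_op = "subtract"            # compare by subtracting
    elif opcode == OP_RTYPE:
        alu_op = FUNCT_TO_ALU[funct]   # depends on the funct field
    else:
        raise ValueError(f"opcode {opcode} is not covered by this sketch")
    return {
        "alu_op": alu_op,
        "mem_read": opcode == OP_LW,    # asserted only for loads
        "mem_write": opcode == OP_SW,   # asserted only for stores
    }


print(control(OP_LW))
print(control(OP_RTYPE, funct=32))
```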
Fetching Instructions
We must start by fetching the instruction from memory. To
prepare for executing the next instruction, we must also
increment the program counter so that it points at the
next instruction, 4 bytes later.
The program counter is a 32-bit register that is written at
the end of every clock cycle and thus does not need a
write control signal.
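A minimal Python sketch of the fetch step, assuming the instruction memory is modeled as a dictionary from byte address to 32-bit word. The function name, the starting address, and the memory contents are all hypothetical.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at pc and return it with the incremented pc."""
    word = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # next instruction is 4 bytes later
    return word, next_pc


# Two made-up instruction words at consecutive word addresses
imem = {0x00400000: 0x02328020, 0x00400004: 0x8E680020}

pc = 0x00400000
word, pc = fetch(imem, pc)
print(f"fetched {word:#010x}, next pc {pc:#010x}")
```

The program counter is updated on every call, mirroring the fact that it is written every clock cycle without a separate write control signal.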
Register File
R-Type instructions take in two
registers and the ALU performs an
operation on them, and writes back
the result to the write register.
The register number inputs are 5
bits wide to represent any of the
32 MIPS registers.
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit
signal if the result is zero. The 4-bit control signal of the ALU specifies which
operation will be performed on the inputs. For example, a beq instruction performs
subtraction.
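The ALU behavior described above can be modeled in a few lines of Python. The 4-bit control encodings used here (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than, 1100 NOR) follow the usual textbook assignment; the notes do not list them, so treat them as an assumption.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (32-bit result, zero flag) for two 32-bit inputs."""
    if control == 0b0000:
        result = a & b
    elif control == 0b0001:
        result = a | b
    elif control == 0b0010:
        result = (a + b) & MASK32
    elif control == 0b0110:
        result = (a - b) & MASK32
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:
        result = ~(a | b) & MASK32
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    zero = 1 if result == 0 else 0
    return result, zero


# beq compares two registers by subtracting; equal values raise the zero signal
print(alu(0b0110, 7, 7))
```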
Data Memory
Load and store instructions
read register operands and
calculate addresses using a 16-bit
offset, which must be sign-extended
before being sent to the ALU.
Load: Read memory and
update register.
Store: Write register value to memory.
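To make the address calculation concrete, here is a small Python sketch of sign extension followed by a load and a store. The register file and data memory are modeled as dictionaries, memory is keyed by byte address with one word per key for simplicity, and all names and values are hypothetical.

```python
def sign_extend_16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Base register value plus the sign-extended 16-bit offset."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


def load_word(regs, mem, rt, rs, offset16):
    """Load: read memory and update register rt."""
    regs[rt] = mem.get(effective_address(regs[rs], offset16), 0)


def store_word(regs, mem, rt, rs, offset16):
    """Store: write the value of register rt to memory."""
    mem[effective_address(regs[rs], offset16)] = regs[rt]


regs = {"$s3": 0x10010000, "$t0": 0, "$t1": 99}
mem = {}
store_word(regs, mem, "$t1", "$s3", 0xFFFC)   # offset field 0xFFFC means -4
load_word(regs, mem, "$t0", "$s3", 0xFFFC)
print(hex(effective_address(regs["$s3"], 0xFFFC)), regs["$t0"])
```

Without sign extension the offset 0xFFFC would be treated as +65532 rather than -4, so the store would land at the wrong address.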
Branch instructions read register operands, compare them by subtraction, and
calculate the branch target address. The branch is taken if the result is zero.
[Figures: R-Type Instructions, Load Instruction, Jump Instruction]
The single-cycle implementation is inefficient. The clock cycle must have
the same length for every instruction, so the longest possible path in the
processor determines the clock cycle. Varying the clock period for different
instructions is not feasible, so a single-cycle implementation violates
the design principle of making the common case fast.
We will instead improve performance with another datapath
implementation technique called pipelining, which has much higher
throughput because it executes multiple instructions simultaneously.
Pipelining
A pipeline is an implementation technique that increases the number of instructions we can
complete in a given amount of time. Parallelism improves performance, and we want to make the
common case fast.
The MIPS pipeline has five stages. The datapath cannot perform the same stage for more than one
instruction in a clock cycle, so each instruction advances one stage per cycle.
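IF: instruction fetch
ID: instruction decode and register read
EX: execute the operation or calculate an address
MEM: access data memory
WB: write the result back to a register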
Assume every stage takes the same time for every instruction, to make our
calculations easier. In reality some stages run faster than others, and
load word takes the longest because it uses all five stages.
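If all stages are balanced (if they all take the same time), then the speedup is
calculated by:
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of pipeline stages
Speedup = Time between instructions (nonpipelined) / Time between instructions (pipelined) = Number of pipeline stages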
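A short Python sketch of the timing comparison between a single-cycle and a pipelined datapath. The stage times (in picoseconds) are illustrative assumptions, chosen to be unbalanced on purpose so the effect of the slowest stage is visible; the function name is hypothetical.

```python
# Assumed stage latencies in picoseconds (illustrative, deliberately unbalanced)
stage_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Single cycle: the clock must cover the longest instruction (all five stages)
single_cycle_clock = sum(stage_ps.values())

# Pipelined: the clock must cover the slowest single stage
pipelined_clock = max(stage_ps.values())


def total_time_ps(n_instructions, pipelined):
    """Total time to run n instructions, ignoring hazards."""
    if pipelined:
        # The first instruction fills the pipeline; then one completes per cycle
        return (n_instructions + len(stage_ps) - 1) * pipelined_clock
    return n_instructions * single_cycle_clock


for n in (3, 1_000_000):
    speedup = total_time_ps(n, False) / total_time_ps(n, True)
    print(n, round(speedup, 3))
```

With these unbalanced stages the speedup for long programs approaches 800 / 200 = 4 rather than 5. Only with perfectly balanced stages does it approach the number of pipeline stages.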
With forwarding:
- Use the result as soon as it is computed; don't wait for it to be stored
  into a register.
- Requires extra connections in the datapath.
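Forwarding cannot remove every stall. When an instruction uses the result of the
load immediately before it, one stall is still needed: without the stall, the path
from memory access stage output to execution stage input would be going backward
in time, which is impossible.
For any load-use hazard without forwarding, you would have to stall for two cycles.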
The compiler will try using instruction scheduling to avoid stalls. The code will be
reordered to avoid use of load result in the next instruction.
Although we could try to rely on compilers to remove all such hazards, the results
would not be satisfactory.
These dependences happen just too often and the delay is just too long to expect
the compiler to rescue us from this dilemma.
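The load-use stall and the reordering idea described above can be sketched in Python. This is a simplified model that assumes forwarding is present, so the only stall counted is one cycle when an instruction reads the destination of the load immediately before it. The instruction representation and the example program are made up for illustration.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls, assuming forwarding.

    Each instruction is a tuple (op, dest, sources).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


original = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load
    ("sw",  None,  ("$t5", "$t0")),
]

# Same work, with the third load moved up so no load feeds the next instruction
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

Moving the third load earlier is the kind of instruction scheduling a compiler can do. It only works when an independent instruction is available to fill the slot.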
Structural Hazards
A structural hazard occurs when a planned instruction cannot execute in the proper
clock cycle because the hardware does not support the combination of instructions
that are set to execute.
Control Hazards
A control hazard arises from the need to make a decision based on the results of
one instruction while others are executing.
Solution #1:
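Stall: wait until the branch outcome is known before fetching the next instruction. In the laundry
analogy, just operate sequentially until the first batch is dry and then repeat until you have the
right formula. This conservative option certainly works, but it is slow.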
Solution #2:
Predict the outcome of the branch and only stall if the prediction is wrong.
Static Branch Prediction
Based on typical branch behavior.
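A rough Python sketch of how a static prediction affects stall cycles. The one-cycle misprediction penalty and the branch outcomes are assumptions made for illustration; real penalties depend on where in the pipeline the branch is resolved.

```python
def branch_stall_cycles(outcomes, predict_taken, penalty=1):
    """Stall cycles for a static prediction: pay the penalty only when wrong.

    outcomes is a list of booleans, True meaning the branch was taken.
    """
    return sum(penalty for taken in outcomes if taken != predict_taken)


# A loop-closing branch: taken on every iteration except the last (assumed data)
loop_branch = [True, True, True, False]

print(branch_stall_cycles(loop_branch, predict_taken=False))
print(branch_stall_cycles(loop_branch, predict_taken=True))
```

For a branch at the bottom of a loop, predicting taken is wrong only once, which is why static schemes are based on typical branch behavior.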
Pipeline Summary
Pipelining improves performance by increasing instruction throughput. It does this
by executing multiple instructions in parallel; the latency of each individual
instruction stays the same.
Pipelining is subject to three kinds of hazards: structural, data, and control.
Instruction set design affects the complexity of pipeline implementations.
Pipeline Registers -
pipeline.