
Midterm Recap

  Sections 1.1 ~ 1.4, 1.8 ~ 1.10
  Sections 2.1, 2.4 ~ 2.8
  Appendices A & B

Performance Evaluation

  Basic concepts
    Response time
    Throughput
    Speedup
    CPI
    IPC
  Amdahl’s law
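The basic concepts listed under Performance Evaluation can be tied together with a short numeric sketch. The function names and the sample numbers are illustrative assumptions, not taken from the slides; the formulas are the standard textbook definitions.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def ipc(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


def speedup(time_old, time_new):
    """Speedup = old execution time / new execution time."""
    return time_old / time_new


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's law: overall speedup when only a fraction of the
    execution time benefits from an enhancement."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Hypothetical example: 40% of the time is sped up by 10x.
overall = amdahl_speedup(0.4, 10.0)
print(round(overall, 4))
```

Even with an unbounded speedup of the enhanced part, the overall speedup in this example cannot exceed 1 / (1 - 0.4).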

How to Summarize Performance

  Arithmetic mean
  Weighted arithmetic mean
  Geometric mean
  Harmonic mean
    n / (1/s1 + 1/s2 + … + 1/sn)

Instruction Set Architecture

  ISA types
    Stack
    Accumulator
    General-purpose registers
      Register-memory
      Register-register
  Common memory addressing modes
    Register, immediate, displacement, …
  Byte ordering: big vs. little endian
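The four means listed under How to Summarize Performance can be sketched as follows. The function names are assumptions made for illustration; the harmonic mean follows the formula given on the slide, n / (1/s1 + 1/s2 + … + 1/sn).

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def weighted_arithmetic_mean(xs, weights):
    # Weights do not need to sum to 1; they are normalized here.
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)


def geometric_mean(xs):
    # Computed through logarithms to avoid overflow on long lists.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(rates):
    # n / (1/s1 + 1/s2 + ... + 1/sn)
    return len(rates) / sum(1.0 / s for s in rates)


# Hypothetical rates for two programs.
rates = [1.0, 3.0]
print(arithmetic_mean(rates), harmonic_mean(rates))
```

A common rule of thumb is to use the arithmetic mean for times, the harmonic mean for rates, and the geometric mean for normalized ratios.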

Basics of MIPS

  Instruction format
    I-type, R-type, J-type
  Instruction types
    ALU, load/store, control, FP
  5-stage pipeline
    IF, ID, EX, MEM, WB

Pipelined 5-Stage Data Path

  [Figure: pipelined 5-stage data path]

MIPS FP Pipeline

  [Figure: MIPS FP pipeline]

Dependences and Hazards

  Dependences → possible hazards
  Dependences
    Data, name (anti, output), control
  Hazards
    RAW, WAR, WAW, branch

Dependences vs. Hazards

  Data
    i: ADD R3, R1, R2
    j: ADD R5, R3, R4
    RAW (if j gets the “old” value of R3)
  Anti
    i: ADD R3, R2, R1
    j: ADD R2, R4, R5
    WAR (if i gets the “new” value of R2)
  Output
    i: ADD R3, R2, R1
    j: SUB R3, R4, R5
    WAW (if final result in R3 is produced by i)

MIPS Five-Stage Pipeline With/Without Data Forwarding

  Without data forwarding
    Results are exchanged via the register file
    Producer: WB → consumer: ID
  With data forwarding
    Result produced → result used
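The three cases on the Dependences vs. Hazards slide can be checked mechanically. The sketch below is a minimal illustration: the `Inst` tuple and the `dependences` function are hypothetical names, and each instruction is reduced to one destination register and a tuple of source registers.

```python
from collections import namedtuple

# Hypothetical encoding: one destination register, a tuple of sources.
Inst = namedtuple("Inst", "op dest srcs")


def dependences(i, j):
    """Return (dependence, possible hazard) pairs between i and j,
    where i precedes j in program order."""
    found = []
    if i.dest is not None and i.dest in j.srcs:
        found.append(("data", "RAW"))    # j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        found.append(("anti", "WAR"))    # j writes what i reads
    if i.dest is not None and i.dest == j.dest:
        found.append(("output", "WAW"))  # both write the same register
    return found


# The three instruction pairs from the slide.
print(dependences(Inst("ADD", "R3", ("R1", "R2")),
                  Inst("ADD", "R5", ("R3", "R4"))))
print(dependences(Inst("ADD", "R3", ("R2", "R1")),
                  Inst("ADD", "R2", ("R4", "R5"))))
print(dependences(Inst("ADD", "R3", ("R2", "R1")),
                  Inst("SUB", "R3", ("R4", "R5"))))
```

A dependence is a property of the program; whether it turns into an actual hazard depends on the pipeline, which this sketch does not model.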

Register Renaming

  Eliminate WAR and WAW hazards by register renaming

    Before renaming        After renaming
    DIV.D F0,F2,F4         DIV.D F0,F2,F4
    ADD.D F6,F0,F8         ADD.D S,F0,F8
    S.D   F6,0(R1)         S.D   S,0(R1)
    SUB.D F8,F10,F14       SUB.D T,F10,F14
    MUL.D F6,F10,F8        MUL.D F6,F10,T

Dynamic Scheduling

  Split ID stage into two
    1. Issue: Decode inst, check for structural hazards
    2. Read operands: Wait until no data hazards, then read operands
  In-order instruction issue
  Out-of-order execution
    An inst begins execution as soon as its data operands are available
  Out-of-order completion → complicates exception handling
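The register renaming example can be reproduced with a very small renamer. This is a sketch under stated assumptions: the `rename` function and the `P0`, `P1`, … names are hypothetical, and unlike the slide (which renames only the two conflicting destinations to S and T) it gives every destination a fresh name, which is simpler to code and removes the same WAR and WAW hazards.

```python
def rename(program, prefix="P"):
    """Give every destination a fresh name and redirect later reads.

    program: list of (op, dest, srcs); dest is None for stores.
    """
    mapping = {}   # architectural register -> latest fresh name
    renamed = []
    counter = 0
    for op, dest, srcs in program:
        # Sources read the most recent name of each register.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"{prefix}{counter}"
            counter += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# The five-instruction sequence from the slide.
program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D",   None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]
for inst in rename(program):
    print(inst)
```

After renaming, SUB.D no longer writes a register that ADD.D reads (WAR removed), and MUL.D no longer writes the register that ADD.D writes (WAW removed); the true data dependences are preserved.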

Tomasulo Components

  RS entry
    Op: Operation to perform in the unit
    Vj, Vk: Values of source operands
    Qj, Qk: Reservation stations producing source registers
      Qj, Qk = 0 → ready
    Busy: Indicates reservation station or FU is busy
  Register result status
    Indicates which RS will write each register
    Blank: no pending instructions writing the register

Three Stages of Tomasulo Algorithm

  1. Issue: get instruction from Inst Queue
     If reservation station free (no structural hazard), control issues inst & sends operands (renames registers).
  2. Execution: operate on operands (EX)
     When both operands ready then execute; if not ready, watch Common Data Bus for result
  3. Write result: finish execution (WB)
     Write on Common Data Bus to all awaiting units; mark reservation station available
  No speculation
  In-order issue, out-of-order execution, and out-of-order completion
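The reservation-station fields and the issue and write-result stages can be sketched as a small data structure. Everything here is an illustrative assumption rather than the lecture's code: the class and function names are hypothetical, `None` plays the role of "Qj, Qk = 0", and the register result status is a dictionary that holds only registers with a pending writer. Execution latency is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # source values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing station, None = ready
    qk: Optional[str] = None

    def ready(self):
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src1, src2, regs, reg_status):
    """Issue stage: fails (stall) if the station is busy."""
    if rs.busy:
        return False             # structural hazard
    rs.busy, rs.op = True, op
    for src, v, q in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        producer = reg_status.get(src)
        if producer is None:     # value is in the register file
            setattr(rs, v, regs[src])
            setattr(rs, q, None)
        else:                    # wait for the producer's broadcast
            setattr(rs, v, None)
            setattr(rs, q, producer)
    reg_status[dest] = rs.name   # this station now renames dest
    return True


def write_result(tag, value, stations, regs, reg_status):
    """Write-result stage: broadcast on the Common Data Bus."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
        if rs.name == tag:
            rs.busy = False      # mark the producer available
    for reg in [r for r, p in reg_status.items() if p == tag]:
        regs[reg] = value
        del reg_status[reg]


regs = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F6": 0.0, "F8": 1.0}
reg_status = {}
div1, add1 = ReservationStation("Div1"), ReservationStation("Add1")
issue(div1, "DIV.D", "F0", "F2", "F4", regs, reg_status)
issue(add1, "ADD.D", "F6", "F0", "F8", regs, reg_status)
print(add1.ready())              # ADD.D waits on Div1
write_result("Div1", 4.0, [div1, add1], regs, reg_status)
print(add1.ready())              # operand captured from the CDB
```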

Dynamic Scheduling, Single-Issue

  Iteration  Inst               Issue  Exe  Mem  W CDB
  1          LD  R2, 0(R1)      1      2    3    4
  1          ADD R2, R2, #1     2      5    -    6
  1          SD  R2, 0(R1)      3      4    7    -
  1          ADD R1, R1, #4     4      6    -    7
  1          BNE R2, R3, loop   5      7    -    -
  2          LD  R2, 0(R1)      6      8    9    10
  2          ADD R2, R2, #1     7      11   -    12
  2          SD  R2, 0(R1)      8      9    13   -
  2          ADD R1, R1, #4     9      10   -    11
  2          BNE R2, R3, loop   10     13   -    -

  (- = stage not used by the instruction)

Dynamic Scheduling, 2-Way Issue

  Iteration  Inst               Issue  Exe  Mem  W CDB
  1          LD  R2, 0(R1)      1      2    3    4
  1          ADD R2, R2, #1     1      5    -    6
  1          SD  R2, 0(R1)      2      3    7    -
  1          ADD R1, R1, #4     2      3    -    4
  1          BNE R2, R3, loop   3      7    -    -
  2          LD  R2, 0(R1)      4      8    9    10
  2          ADD R2, R2, #1     4      11   -    12
  2          SD  R2, 0(R1)      5      9    13   -
  2          ADD R1, R1, #4     5      8    -    9
  2          BNE R2, R3, loop   6      13   -    -

  2-way issue (branch single-issue), separate INT FUs for address, ALU, branch, two CDBs
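One way to read the single-issue and 2-way issue tables is to compare how far apart consecutive iterations are. The sketch below copies the cycle numbers from the two tables; the helper names and the choice of metrics (gap between first issues, gap between last events) are assumptions made for illustration.

```python
# (issue, exe, mem, write CDB) per instruction; None = stage not used.
SINGLE_ISSUE = {
    1: [(1, 2, 3, 4), (2, 5, None, 6), (3, 4, 7, None),
        (4, 6, None, 7), (5, 7, None, None)],
    2: [(6, 8, 9, 10), (7, 11, None, 12), (8, 9, 13, None),
        (9, 10, None, 11), (10, 13, None, None)],
}
TWO_WAY_ISSUE = {
    1: [(1, 2, 3, 4), (1, 5, None, 6), (2, 3, 7, None),
        (2, 3, None, 4), (3, 7, None, None)],
    2: [(4, 8, 9, 10), (4, 11, None, 12), (5, 9, 13, None),
        (5, 8, None, 9), (6, 13, None, None)],
}


def last_cycle(rows):
    """Latest cycle in which any instruction of the iteration is active."""
    return max(c for row in rows for c in row if c is not None)


def issue_gap(table):
    """Cycles between the first issue of iteration 1 and iteration 2."""
    return table[2][0][0] - table[1][0][0]


def completion_gap(table):
    """Cycles between the last event of iteration 1 and iteration 2."""
    return last_cycle(table[2]) - last_cycle(table[1])


for name, table in (("single", SINGLE_ISSUE), ("2-way", TWO_WAY_ISSUE)):
    print(name, issue_gap(table), completion_gap(table))
```

By these two measures, 2-way issue narrows the issue gap, while the gap between iteration completions is the same in both tables: without speculation, the second LD does not execute until cycle 8 in either case, after the first BNE has executed in cycle 7.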

Dynamic Scheduling vs. Speculative Execution

  Dynamic scheduling (w/o speculation)
    A branch must be resolved before actually executing any instructions in the successor basic block (those instructions can be issued)
    Issue, Exec, Memory (R/W), Write CDB
  Speculative execution (using dynamic scheduling)
    Allow the execution of later instructions before the branch is resolved (with the ability to undo the effect of an incorrectly speculated sequence)
    Issue, Exec, Read memory, Write CDB, Commit (Write memory)

Reorder Buffer

  Contains all in-flight instructions
  Reorders out-of-order insts to program order at the time of writing reg/memory (commit)
  Buffers results and supplies operands between execution complete and commit
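The reorder buffer's role (results arrive out of order, but registers are updated in program order at commit) can be sketched with a queue. The class and method names are hypothetical, and the sketch covers only register results; stores, operand supply, and exceptions are left out.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order, oldest at the left

    def issue(self, tag, dest):
        """Allocate an entry at issue time, in program order."""
        self.entries.append(
            {"tag": tag, "dest": dest, "value": None, "done": False})

    def write_result(self, tag, value):
        """Record a result; this may happen out of order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        raise KeyError(tag)

    def commit(self, regs):
        """Retire finished entries from the head only (in order)."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["dest"] is not None:
                regs[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed

    def flush(self):
        """Discard all in-flight work after a misprediction."""
        self.entries.clear()


rob, regs = ReorderBuffer(), {}
rob.issue("i1", "R2")
rob.issue("i2", "R1")
rob.write_result("i2", 4)        # younger instruction finishes first
print(rob.commit(regs))          # nothing commits: i1 is still pending
rob.write_result("i1", 11)
print(rob.commit(regs))          # both commit, in program order
```

Because nothing reaches the register file before commit, undoing an incorrectly speculated sequence reduces to discarding buffer entries.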

Example

  Iteration  Inst.              Issue @  Exec @  Read Mem @  Write CDB @  Commit @
  1          LD  R2, 0(R1)      1        2       3           4            5
  1          ADD R2, R2, #1     1        5       -           6            7
  1          SD  R2, 0(R1)      2        3       -           -            7
  1          ADD R1, R1, #4     2        3       -           4            8
  1          BNE R2, R3, loop   3        7       -           -            8
  2          LD  R2, 0(R1)      4        5       6           7            9
  2          ADD R2, R2, #1     4        8       -           9            10
  2          SD  R2, 0(R1)      5        6       -           -            10
  2          ADD R1, R1, #4     5        6       -           7            11
  2          BNE R2, R3, loop   6        10      -           -            11

  (- = stage not used by the instruction)

Architectural Simulator

  Measurement
    Accurate
    Only works on existing systems, not flexible
    Constructing a hardware prototype: slow, expensive, and complicated
  Analytical models
    Fast and with insights
    Hard to model the complexity of today’s processors
  Simulator
    Fast, cheap, flexible and relatively accurate
  What is an architectural simulator?
    A tool that reproduces the behavior of a computing device
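As a toy illustration of "a tool that reproduces the behavior of a computing device", the sketch below is a functional simulator for the five-instruction loop used in the scheduling tables. It is an assumption-laden sketch, not a real simulator: the tuple encoding, the `ADDI` mnemonic for the slides' `ADD ..., #imm`, and the initial register and memory contents are all hypothetical, and it models only results, not timing.

```python
def simulate(program, regs, mem, max_steps=1000):
    """Execute the program one instruction at a time.

    Returns the number of instructions executed.
    """
    pc = steps = 0
    while pc < len(program) and steps < max_steps:
        op, a, b, c = program[pc]
        steps += 1
        if op == "LD":            # LD a, b(c): a = mem[c + b]
            regs[a] = mem[regs[c] + b]
        elif op == "SD":          # SD a, b(c): mem[c + b] = a
            mem[regs[c] + b] = regs[a]
        elif op == "ADDI":        # a = b + immediate c
            regs[a] = regs[b] + c
        elif op == "BNE":         # if a != b, jump to index c
            if regs[a] != regs[b]:
                pc = c
                continue
        else:
            raise ValueError(f"unknown opcode {op}")
        pc += 1
    return steps


# The loop from the tables; "loop" is instruction index 0.
loop = [
    ("LD",   "R2", 0, "R1"),
    ("ADDI", "R2", "R2", 1),
    ("SD",   "R2", 0, "R1"),
    ("ADDI", "R1", "R1", 4),
    ("BNE",  "R2", "R3", 0),
]
regs = {"R1": 0, "R2": 0, "R3": 31}   # hypothetical initial state
mem = {0: 10, 4: 20, 8: 30}
executed = simulate(loop, regs, mem)
print(executed, mem)
```

A timing simulator would additionally track the cycle in which each instruction issues, executes, and commits, which is the information shown in the tables.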
