Typed summary in Hebrew

Digicomp

summaries

Brought to you by: Merav, Yuval and some "happy" helpers.


Contents

CARRY
BCD
MIN/MAX-TERM
    Minterm
    Maxterm
FLOATING POINT
    PRECISION
    ACCURACY
    FIXED POINT
    Base
    IEEE754
    NORMALIZED NUMBER
    DENORMALIZED NUMBER
    SPECIAL CASES
    OVERFLOW
    UNDERFLOW
COMBINATORIAL CIRCUITS
    Fredkin
    Toffoli
    Pmos
    Nmos
    MULTI BIT LINES
    REDUCTION GATES
    ONE/N DIGIT ADDER
    2-WAY, 1-BIT MUX
    D-MUX
    DECODER
    ENCODER
    ROM (READ ONLY MEMORY)
SEQUENTIAL CIRCUITS
    SEQUENTIAL CIRCUIT
    ASYNCHRONOUS SEQUENTIAL CIRCUIT
    SYNCHRONOUS SEQUENTIAL CIRCUIT
    COMBINATORIAL CIRCUIT
    D-LATCH
    CLOCK
    FLIP-FLOP
    1-BIT REGISTER
    RAM (RANDOM ACCESS MEMORY)
TIMING IN SEQUENTIAL CIRCUITS
    TLH
    THL
    TPD
    TCD
    Tr
    Tf
    TPD
    TCD
    FLIP-FLOP
    TPD,C->Q
    TCD,C->Q
    THOLD
    TSETUP
    T0
    T1
CPU
    ISA (INSTRUCTION SET ARCHITECTURE)
    MEMORY TYPES
    PC
    IR
    EXECUTING ARITHMETIC/LOGIC INSTRUCTION
    MIPS RTL CPU
    Load/Store architecture
    MIPS Architecture (Microprocessor without Interlocked Pipeline Stages)
    MIPS
    Instruction memory
    Instruction Register
    Register File
    ALU
    Sign extender
    Data Memory
PROCESSOR PERFORMANCE
    CPU TIME
    SPEED UP
    MIPS (Millions of instructions per second)
    FLOPS (number of floating point operations per second)
    Peaks
    Benchmarks
MEMORY HIERARCHY: CACHE, VIRTUAL MEMORY
    LATENCY (RESPONSE TIME)
    BANDWIDTH
    DRAM (DYNAMIC RANDOM ACCESS MEMORY)
    CACHE
    Hit
    Miss
    Block size (Line size)
    CACHE DESIGN
    Direct Mapping (1 WAY)
    Fully Associative (N WAY)
    Set Associative ([SET SIZE] WAY)
    REPLACEMENT POLICIES
    Write through policy
    Write back policy
    CACHES AND PERFORMANCE
    CACHE MISS-RATE
    Localities
    The 3 c's
    Miss rate
    Tradeoff
VIRTUAL MEMORY
    Page
    Page fault
    Page table
    TLB
PIPELINE
    IDEAL PIPELINE
    NON-IDEAL PIPELINE
    AMDAHL'S LAW (SPEED UP OF PARALLEL COMPUTING)
    DESIGNING AN INSTRUCTION PIPELINE
    The TYP pipeline
    Stalls
    Data dependency
    Control dependency
    Types of data dependencies
    Pipeline Hazard
    Forwarding
    Speculative forwarding
RTL (REGISTER TRANSFER LOGIC)
    CAR


Decimal: the digit at position i has weight 10^i.
145.3 = 1*10^2 + 4*10^1 + 5*10^0 + 3*10^(-1)
Binary: every digit is in {0,1}, and the digit at position i has weight 2^i.
1101.11 = 2^3 + 2^2 + 0*2^1 + 2^0 + 2^(-1) + 2^(-2)
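The two expansions above can be checked mechanically. The following is a minimal sketch, not part of the original summary; the function name `positional_value` is a hypothetical helper, and it assumes bases from 2 to 10 so that every digit is a single decimal character.

```python
from fractions import Fraction

def positional_value(digits: str, base: int) -> Fraction:
    """Evaluate a digit string such as '1101.11' in the given base (2..10)."""
    whole, _, frac = digits.partition(".")
    value = Fraction(0)
    # Integer digits: position i (counted leftwards from the point, from 0) weighs base**i.
    for i, ch in enumerate(reversed(whole)):
        value += int(ch) * Fraction(base) ** i
    # Fractional digits: the j-th digit after the point weighs base**(-j).
    for j, ch in enumerate(frac, start=1):
        value += int(ch) * Fraction(base) ** (-j)
    return value

print(float(positional_value("145.3", 10)))   # 145.3
print(float(positional_value("1101.11", 2)))  # 13.75
```

Exact fractions are used instead of floats so that the sum of weights is not affected by rounding.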

carry .
, , .
: , .

Carry:

"" ... , , carry . ,


, .


=0 .=1 ,.
.
... , .
1 NOT
- 2 , 2
* , 1 .1
* " "1 . ) (
Not.
* 2^m x = 2's complement.=m , =X . .
: . ...
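9's complement: 10^m - x - 1, where m = the number of digits and x = the number.
10's complement: 10^m - x.
In general, in base r there is an r's complement and an (r-1)'s complement:
R = the base.
N = the number of digits after the point.
M = the number of digits before the point.
X = the number.
R^(M) - (R^(-N)) - X = (R-1)'s complement
Example:
11.01 -> 1's complement: 100 - 11.01 - 0.01 = 00.10; 2's complement: 00.10 + 0.01 = 00.11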
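
A minimal sketch of both complements for whole numbers, assuming the number is given as a fixed-width digit string. The helper names (`to_base`, `radix_complement`, `diminished_complement`) are hypothetical.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(value: int, base: int, width: int) -> str:
    """Render value as exactly `width` digits in the given base."""
    out = []
    for _ in range(width):
        value, rem = divmod(value, base)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def radix_complement(digits: str, base: int) -> str:
    """r's complement of an m-digit number: r^m - x, kept to m digits."""
    m = len(digits)
    x = int(digits, base)
    return to_base((base ** m - x) % base ** m, base, m)


def diminished_complement(digits: str, base: int) -> str:
    """(r-1)'s complement of an m-digit number: r^m - 1 - x."""
    m = len(digits)
    x = int(digits, base)
    return to_base(base ** m - 1 - x, base, m)


print(diminished_complement("0110", 2), radix_complement("0110", 2))
print(diminished_complement("4", 10), radix_complement("4", 10))
```

In binary the (r-1)'s complement is simply a bitwise Not, and the r's complement is that result plus 1.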
, r " , ) (1.

...
-1
*
* " "1.
-2
* ) (.
* .
, .
3-4 = 3 + (10-4) = 10 - (4-3) = 10 - 1 = 9
.
-9 .1
-10 .2


,.
.2
x,y .2
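To compute x - y, add to x the 2's complement of y: x + (2^n - y) = 2^n + (x-y).
If x>y then x-y > 0, so the result 2^n + (x-y) does not fit in n digits.
Discarding the carry leaves exactly x-y.
If x<y then x-y < 0.
The result is 2^n - (y-x), which is the 2's complement representation of (y-x), i.e. the negative number x-y.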
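
The two cases can be sketched as follows, assuming unsigned n-digit binary operands. The function name is hypothetical.

```python
def subtract_twos_complement(x: int, y: int, n: int) -> int:
    """Return x - y for 0 <= x, y < 2**n using only addition."""
    total = x + (2 ** n - y)       # x plus the 2's complement of y
    if total >= 2 ** n:            # a carry came out of the n digits: x >= y
        return total - 2 ** n      # discard the carry
    return -(2 ** n - total)       # no carry: total is the 2's complement of (y - x)


print(subtract_twos_complement(5, 3, 4))
print(subtract_twos_complement(3, 5, 4))
```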
) (. 2 .
\ , , .


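Let x = a1a2a3...an . b1b2...bm be a number in base 10 that is to be converted to base k.
The integer part (a1...an) and the fractional part (b1...bm) are converted separately.
Integer part:
Divide by k; the remainder is digit i of the result.
Continue with the "floor" of the quotient,
until the quotient == 0.
Fractional part:
Multiply by k; the "floor" of the product is digit i of the result. Continue with the remaining fraction,
until it reaches 0 (or until enough digits have been produced).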
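
A minimal sketch of this conversion, assuming the usual repeated-division method for the integer part and repeated-multiplication method for the fraction. The function name and the `frac_digits` limit are illustrative assumptions.

```python
from math import floor

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base_k(x: float, k: int, frac_digits: int = 8) -> str:
    """Convert a non-negative number to a digit string in base k."""
    whole = floor(x)
    frac = x - whole

    # Integer part: the remainders of repeated division, last remainder first.
    int_out = []
    while whole > 0:
        whole, rem = divmod(whole, k)
        int_out.append(DIGITS[rem])
    int_str = "".join(reversed(int_out)) or "0"

    # Fraction part: the floors of repeated multiplication, in order.
    frac_out = []
    for _ in range(frac_digits):
        if frac == 0:
            break
        frac *= k
        digit = floor(frac)
        frac_out.append(DIGITS[digit])
        frac -= digit

    return int_str + ("." + "".join(frac_out) if frac_out else "")


print(to_base_k(13.75, 2))
```

The fraction loop is capped because many decimal fractions (0.1, for example) have no finite binary expansion.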


:
, , r .1
, .0

BCD
4 .

:
BCD 1001 : )9(.
.10
, , " 4 ) (17 = 0001,0111
.
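When the sum of two digits (4 bits each) is greater than 1001 (9), add 0110 (6) to that digit.
Example:
  96 ->   1001 0110
+ 15 -> + 0001 0101
        -----------
          1010 1011    (both digits are greater than 9)
        + 0110 0110
        -----------
     0001 0001 0001
In base 10 this is 111.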
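
The correction step can be sketched in code. This version handles the carry digit by digit instead of correcting all digits at once as in the worked example, but it produces the same result. Function names are hypothetical.

```python
def bcd_add(a: int, b: int) -> list[int]:
    """Add two non-negative decimals digit by digit; returns BCD digits, most significant first."""
    da = [int(c) for c in str(a)][::-1]   # least significant digit first
    db = [int(c) for c in str(b)][::-1]
    n = max(len(da), len(db))
    da += [0] * (n - len(da))
    db += [0] * (n - len(db))

    result, carry = [], 0
    for x, y in zip(da, db):
        s = x + y + carry          # plain 4-bit binary addition
        if s > 9:
            s += 6                 # correction: add 0110
        carry, s = s >> 4, s & 0b1111
        result.append(s)
    if carry:
        result.append(carry)
    return result[::-1]


def bcd_string(digits: list[int]) -> str:
    return " ".join(f"{d:04b}" for d in digits)


print(bcd_string(bcd_add(96, 15)))
```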


:
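Let x, y, z be Boolean values:
1. Closure: x+y and x*y are also Boolean values.
2. Identity elements: x*1=x, x+0=x
3. Commutativity: x+y=y+x, x*y=y*x
4. Distributivity:
x*(y+z) = x*y + x*z
x+y*z = (x+y)*(x+z)
5. Complement: x*x' = 0, x+x' = 1
6. Duality: every identity remains valid when * is swapped with + and 0 is swapped with 1.
7. De Morgan: (x+y)' = x'*y', (x*y)' = x'+y'
8. Operator precedence: parentheses, not, and, or.
Notation: the complement of x is written x'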
: .
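
Because each variable takes only the values 0 and 1, any of these identities can be verified by enumerating the truth table. A minimal sketch; the names `holds`, `NOT` and `LAWS` are illustrative.

```python
from itertools import product


def holds(law, nvars: int) -> bool:
    """True if the law evaluates to True for every 0/1 assignment."""
    return all(law(*bits) for bits in product((0, 1), repeat=nvars))


def NOT(x: int) -> int:
    return 1 - x


# & plays the role of *, | plays the role of +.
LAWS = {
    "distributive (* over +)": (lambda x, y, z: (x & (y | z)) == ((x & y) | (x & z)), 3),
    "distributive (+ over *)": (lambda x, y, z: (x | (y & z)) == ((x | y) & (x | z)), 3),
    "complement": (lambda x: (x & NOT(x)) == 0 and (x | NOT(x)) == 1, 1),
    "De Morgan (sum)": (lambda x, y: NOT(x | y) == (NOT(x) & NOT(y)), 2),
    "De Morgan (product)": (lambda x, y: NOT(x & y) == (NOT(x) | NOT(y)), 2),
}

results = {name: holds(law, n) for name, (law, n) in LAWS.items()}
for name, ok in results.items():
    print(name, ok)
```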

Min/Max-term
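Minterm m(i): equals 1 only in row i of the truth table and 0 in every other row (it is 1 for exactly one input combination).

Maxterm M(i): equals 0 only in row i and 1 in every other row (it is 0 for exactly one input combination).
Note: M(i) = m'(i)

Example:
If f equals 1 exactly in rows 1,2,4, then:
f = m(1)+m(2)+m(4).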
, f , .
f = ((f)')' = ((m(1)+m(2)+m(4))')' = M(3)*M(5) :
: = ) (

:
:
-1
-2
-3
-4
-5

.
.
-1- ) (
) .2 (2^0
, ) !(
8

:
, - .
.
, ,0 ,1.
:
-1 .
-2 -1.
-3 .
-4 .
, , not .

) 3(:

) 4(
4
) 3 " ,"w 2
4.
3.
2.
.

Floating Point
: . ,
.

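Precision: the resolution with which a number is written.
For example:
2.718 (10^(-3))
2.718281828 (10^(-9))
The second value is more precise.
Accuracy: how close the written value is to the true value. For example, as approximations of pi:
3.14 is more accurate than 3.25342,
even though the latter is more precise.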
:Fixed point :
.

....
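Representation in base r:
The number is stored as a mantissa and an exponent:
N=M*(r^E).
Normalization: 1 <= Mantissa < Base
In binary: M=1.F
Bias: the exponent is stored with a bias B, so that N=M*(r^(E-B)).
|E| = the number of exponent bits. B=2^(|E|-1) - 1.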

IEEE754

-1 , .
-2 ) (E B=2^(|E|-1)-1
0 -3 E=F=0
-4


, .

) .B , !(
)( . 2^(E-B) : , ) (E-B .
. .

, .

" 0) . 1 ,(
.
) ) ,1. * 2^(x =x , ) .
(.
F\ ) F(
B .x=E-B : X , ... .E

, . E .
) ."
(
.
) . .

:
-1 B<0 ,A>0 - ,B+A B-A .
.
) '( "" .
, 2 ) 2 ,10 '(.
:
-1.1001 + 0.010100 == -(1.1001 - 0.010100)
, .
.2
: "" )( F ' .
) .(|F|=4
: !


IEEE single precision:
-126 <= exponent <= +127
The values -127 and +128 are reserved for special cases.

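Normalized Number:
E != 00000 (all zeros) and E != 11111 (all ones)
M=1.F, Exp = E-B

Denormalized Number:
E=00000
M = 0.F, Exp = -B + 1
F=00000 represents 0. With the sign bit set, it represents -0.

Special cases:
E=11111

F=00000: infinity (positive or negative, according to the sign bit).
F != 00000: Not a Number (for example, the square root of -1).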
, ,
)" overflow (.
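Overflow: the magnitude of the result is too large to be represented; the result == infinity.
Underflow: the result is too close to 0 to be represented; the result == 0.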

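Precision: in single precision the mantissa has 24 significant bits,
so the number of significant decimal digits is log10(2^24) = 7.2, i.e. about 7 digits.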
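
A minimal sketch that decodes a single-precision value into its fields and classifies it using the rules above. It assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits), so the all-zeros and all-ones exponent patterns are 8 bits wide here rather than the 5-bit shorthand used in the text. The function name is hypothetical.

```python
import struct


def decode_single(x: float):
    """Return (sign, E, F, kind, value) for x stored as IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    bias = 2 ** (8 - 1) - 1                 # B = 2^(|E|-1) - 1 = 127

    if e == 0xFF:                           # special cases
        kind = "infinity" if f == 0 else "nan"
        value = None
    elif e == 0:                            # denormalized: M = 0.F, Exp = -B + 1
        kind = "zero" if f == 0 else "denormalized"
        value = (-1) ** sign * (f / 2 ** 23) * 2.0 ** (1 - bias)
    else:                                   # normalized: M = 1.F, Exp = E - B
        kind = "normalized"
        value = (-1) ** sign * (1 + f / 2 ** 23) * 2.0 ** (e - bias)
    return sign, e, f, kind, value


print(decode_single(0.15625))
print(decode_single(float("inf")))
```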


Combinatorial Circuits
: .Or, And, Not
:
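Fredkin: a gate with 3 inputs (c,I1,I2) and 3 outputs (c,O1,O2); the input c is passed unchanged to the output c.
o c=1: (O1,O2)=(I1,I2)
o c=0: (O1,O2)=(I2,I1)
{Fredkin,0,1} is a complete set.
Toffoli: a gate with 3 inputs (I1,I2,V) and 3 outputs (O1,O2,V1); (O1,O2)
equals (I1,I2), and:
o I1=I2=1: V1=V'
o otherwise, V1=V.
{Toffoli,1} is a complete set.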
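
A minimal sketch of why these sets are complete: Not and And can be built from a Fredkin gate plus the constants 0 and 1, and Nand from a Toffoli gate plus the constant 1. It assumes the convention that c=1 passes the inputs through and c=0 swaps them. The helper names are hypothetical.

```python
def fredkin(c: int, i1: int, i2: int):
    """Controlled swap: c=1 passes (i1, i2) through, c=0 swaps them."""
    return (c, i1, i2) if c == 1 else (c, i2, i1)


def toffoli(i1: int, i2: int, v: int):
    """(i1, i2) pass through; v is inverted only when i1 = i2 = 1."""
    return (i1, i2, v ^ (i1 & i2))


def not_from_fredkin(a: int) -> int:
    return fredkin(a, 0, 1)[1]      # a=1 -> O1=0, a=0 -> O1=1


def and_from_fredkin(a: int, b: int) -> int:
    return fredkin(a, b, 0)[1]      # a=1 -> O1=b, a=0 -> O1=0


def nand_from_toffoli(a: int, b: int) -> int:
    return toffoli(a, b, 1)[2]      # V=1 is inverted only when a=b=1


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_from_fredkin(a, b), nand_from_toffoli(a, b))
```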
(switch) : .
:Pmos .1
:Nmos .0
: Not :P/N mos
: Pmos Nmos !

: :And

:
Xor(X1,X2) = X1*X2' + X1'*X2:


:multi bit lines

:Reduction gates
:And n . 0 n , .0
:Or 1 1 n.
:Xor 1 " -1 n-.
:

:One/N digit adder


:One

:N-digit

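Subtractor: to compute (a-b), add to a the 2's complement of b. The 1's complement is obtained with Not(b), and the additional 1 is supplied by setting Cin=1.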
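
A minimal sketch of an N-digit ripple-carry adder built from one-digit full adders, and of the subtractor obtained by inverting b and setting Cin=1. Bit lists are least significant bit first; all names are illustrative.

```python
def full_adder(a: int, b: int, cin: int):
    """One-digit adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin: int = 0):
    """N-digit adder: chains full adders, passing the carry along."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry


def ripple_sub(a_bits, b_bits):
    """a - b computed as a + Not(b) + 1 (Cin = 1)."""
    return ripple_add(a_bits, [1 - b for b in b_bits], cin=1)


def to_bits(x: int, n: int):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits) -> int:
    return sum(b << i for i, b in enumerate(bits))


bits, carry = ripple_sub(to_bits(6, 4), to_bits(3, 4))
print(from_bits(bits), carry)
```

When a < b there is no carry out, and the output bits are the 2's complement of b - a.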


:2-way, 1-bit ,Mux

:
-n , " n , .
) n-way n (! 2 " mux ..2 2

, .

:D-mux
,mux . ," ,selector In - ,selector ,In .
) ( .2

:Decoder

A decoder is a Dmux whose input is fixed at In=1. For a selector of n bits it has 2^n outputs.


,selector ) (1 .
demux decoder -and.

:Encoder

. , 2 , .m
) "("3
: ) 00010000 3-( , m=011
: i ." i :
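
A behavioural sketch of the four components, treating the selector as an integer index and numbering the output lines from 0 on the left, as in the 00010000 example. The function names are illustrative.

```python
def mux(inputs, selector: int):
    """Pass the selected input to the single output."""
    return inputs[selector]


def dmux(value, selector: int, n_outputs: int):
    """Route the input to the selected output; all other outputs are 0."""
    return [value if i == selector else 0 for i in range(n_outputs)]


def decoder(selector: int, n_bits: int):
    """A dmux with In=1: exactly one of the 2^n outputs is 1."""
    return dmux(1, selector, 2 ** n_bits)


def encoder(lines) -> int:
    """Inverse of the decoder: the index of the single line that is 1."""
    return lines.index(1)


print(decoder(3, 3))
print(encoder([0, 0, 0, 1, 0, 0, 0, 0]))
```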

ROM Read Only Memory


:
ROM ) . )
((
" .ROM
ROM ) boot (
:ROM


:
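Any function can be implemented with a decoder, which produces the minterms
of A,B,C, followed by an or gate
over the minterms in which the function equals 1.
For example, F0 is the OR of m1,m2,m7.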


Sequential Circuits
, .
:Sequential circuit .
) ( .
)Asynchronous sequential circuit ( : .
.
)Synchronous sequential circuit ( :
) ( . .
) Combinatorial circuit ( : , , .
:D-Latch' :

:Clock - . , ,:
).(cycle
:
o .
o .
: clock enable - .d-latch
)Flip-Flop( flipflop : - .
.1 ,0 D1 - ) ,(enable=0
D2 ,(enable=1) input
,D1 ) .(0
.2 ,1 D1 - . .input
D2 , , .
output .
.3 ) 0 ( D1 ,
) (1 ,
) . (.
D2 , output ,D1
.
.4 , D1 , ,
D2 , ) (D2 ,
output .


:
:
Car ,1 0 .1
Car .1
, ,car=0 ,1 .mux
.A = A+1 ,
,car = 1 , (1+1 = 0) Mux
.A = 2 ,

:1-bit register
, flipflop . , ) load (.
, ,flipflop .

:RAM Random Access Memory


.1 .
.2 ,write address ,dmux -
.load=1
,w.e = 0 dmux
, 0 .
.3 , write data ,
, ) 1,2( .
.4 ,mux -
,read address .
.output
: dmux w.e
w.a - .dmux

:
\ " , .RAM
\ " , .Register File


Timing in sequential circuits


: :
.1 .
.2 .

:
:TLH input ,
output) ." - ,50% (

:THL input ,
output.

" :TPD " . .


TPD = Max(TLH,THL)

:TCD

"" . .

:Tr 10% 90%


:Tf - .

:
:TPD .
:

.1 , "" "
) . (.
(a , .
(b ," , , TPD :
Real Minimum time = TLH(A1) + ... + TLH(An) + (Tr(Input(A1)) + Tr(Input(An))) / 2
Estimated Minimum time = TLH(A1) + ... + TLH(An) + Max(Tr(InputA1), ... , Tr(InputAn))
Estimated Min Time > Real Min Time
.2 "" TPD , ) .
(.
(a ) (.
(b TPD .magic M
M = Max(all Tr and Tf of all the inputs in the circuit)
TPD = TPD(A1) + ... + TPD(An) + M


:TCD .
:
.1 ) ( TCD .
" , TCD , , TCD .

) :(flip-flop
:TPD,C->Q .
) , flip-flop(
: 90% 90%
.

:TCD,C->Q Flipflop ) .
FF (.

-C->Q : flip-flop , .
:THOLD FF
FF .
:TSETUP FF
FF .

:T0 .
:T0
.a -FF .
.b .
T0 = Max(a, b)
, , flip-flop b . :
.T0 = b
:T1 FF .
:T1
) -
.a TPD - .FF
.(Logic_1
.b ,TSETUP FF .
T1 = a + b.
:

time_0 = tpdCQ + tpd(L2)


) (:
T: T = T0 + T1
FF :
THOLD(FFi) < TCD(shortest path which enters FFi)
TCD TCD,C->Q FF.
, THOLD .FF
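
A minimal sketch of the two checks, under the assumption that a and b in T0 are the clock-to-output delay of the flip-flops and the arrival time of the external inputs, and that the contamination delay of a path into an FF is TCD,C->Q plus the TCD of the logic on that path. All parameter names are hypothetical.

```python
def min_cycle_time(t0_ff: float, t0_inputs: float, tpd_logic: float, t_setup: float) -> float:
    """T = T0 + T1, with T0 = Max(a, b) and T1 = TPD(logic) + TSETUP."""
    t0 = max(t0_ff, t0_inputs)
    t1 = tpd_logic + t_setup
    return t0 + t1


def hold_satisfied(t_hold: float, tcd_cq: float, tcd_logic: float) -> bool:
    """THOLD(FFi) < TCD(shortest path which enters FFi)."""
    return t_hold < tcd_cq + tcd_logic


print(min_cycle_time(2, 1, 10, 3))
print(hold_satisfied(1, 0.5, 1))
```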


CPU
ISA (Instruction Set Architecture):
. :
Arithmetic Ins : Add, Subtract
Logic Ins : And, Or, Not
Data Ins : move, Input, Load, Store
Control Flow Ins : Jump, JNE , call, return

:Memory Types

,program counter :PC . ,


) .IL1 cache(.
: , PC ,IR , .PC - , PC .
,instruction register :IR . ) IL1 cache(

:Executing Arithmetic/Logic instruction


PC . IR

,
)
( ,Read1,Read2
" "rt "."rs
, ,write address
" "rd .
,rd1,rd2 ,
, ,write data

,write address "."rd


, ,
,control ALU op -.
, ,
,ALU op - .control
5 .! cycles 2

CPU RTL :MIPS


:Load/Store architecture

32 MIPS .
.
, ) (.

:Load/Store architecture
load/store - ) (reference , .Load/Store architecture
) , (.
:Load .
:Store .

MIPS Architecture (Microprocessor without Interlocked Pipeline Stages):


IR = Instruction register. More to follow.

:MIPS
32.
32 , 5 )" 3 (.
MIPS :

MIPS :
:R-Type (Register Format) .1


.ALU

:Opcode ," controller , .ALU -


:Rs .
:Rt .
:Rd .
Shamt: shift amount.
Func: function code (6 bits); selects the operation performed by the ALU.
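
A minimal sketch of splitting a 32-bit R-Type instruction word into its fields, assuming the standard MIPS field widths (opcode 6, rs 5, rt 5, rd 5, shamt 5, func 6 bits). The function name is illustrative.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-Type instruction into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "func": word & 0x3F,
    }


# 0x012A4020 encodes: add $t0, $t1, $t2  (rd=8, rs=9, rt=10, func=0x20)
print(decode_r_type(0x012A4020))
```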
:


:I-Type (Immediate Format) .2


:
.Load/store/bne

:Opcode controller ) I-type .(Load, Store :


:Rs ) (base address .
:Rt :
:Load .
:Store .
:Immediate ) offset ).((rs
:
16 immediate 32 -.
" base address - .rs
:
Load:
Store:

MD .
, .bne (branch if not equal) :
.
, .immediate
. :
4 : 4 ! ! . pc , .
immediate ) sign extender
(.
, ,shift L/R 2
!
00011<<2 -> 01100.
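
A minimal sketch of the branch target computation for bne, assuming the standard MIPS convention that the 16-bit immediate is sign extended, shifted left by 2 (instructions are 4 bytes long), and added to PC+4. The function names are illustrative.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc: int, imm16: int) -> int:
    """Target address = PC + 4 + (sign-extended immediate << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend(imm16, 16) << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x1000, 0b00011)))
print(hex(branch_target(0x1000, 0xFFFF)))
```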

:J-Type (Jump Format) .3


PC
. IL1 cache



:Instruction memory
IL1 cache.
,IL1 cache ) PC ( ... .
32 . , , 4 )! (bytes PC - !!
:

:Instruction Register
,
CPU.
The IR is loaded with the instruction that the PC points to: IR <- MI[PC]

:Register File
2 ".
: RF

:RF

: dmux . , write enable .


:ALU
: ,ALU .ALU
:ALUOUT ,ALU
,ALU .

:
ALU cycle !
, ,ALUOUT - ,cycles :

:ALUZERO ALU - .
,true ,1 ALU .0 0.
: .bne rs rt , ,0 .
, ALUZERO ,1 .
, ,ALUZERO = 1 )" ,(PCWCOND = 1 -
ALUOUT .PC - :
.PCWrite = 0
, ,PCWCOND = 1
,ALU , 0 .ALUZERO= 1
, ,not ,0 and - PCWCOND .0
PC ALUOUT cycle . ) 4 (...
, , , 0 .ALUZERO = 0
, not ,1 and PCWCOND , 1 PC ALUOUT ,.

ALUOUT , mux - PCSOURCE.


, ,CPU .PC
:Sign extender
,sign extender .
,2.
Example (3 bits extended to 6):
100 -> 111100
010 -> 000010
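
The sign extender only copies the most significant bit into the new high-order positions. A minimal sketch on bit strings; the function name is illustrative.

```python
def sign_extend_bits(bits: str, width: int) -> str:
    """Extend a 2's complement bit string to `width` bits by repeating its sign bit."""
    return bits[0] * (width - len(bits)) + bits


print(sign_extend_bits("100", 6))
print(sign_extend_bits("010", 6))
```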
:Data Memory
,RAM .
, .RAM


, ) : (RTL

,PC .


:PROCESSOR PERFORMANCE
:CPU TIME
.CPU time
CPU time = time spent running a program (as in 1 program) = wall_clock_time / 1_program
CPU time = (Instructions / program) X (cycles / instruction) X (time / cycle) = "code size" X "CPI" X "cycle time"
" \ ) ,5 2 :(21,22,23,24
code size , )!( . ,
, .ISA -
- CPI ) ( , processor .ISA
Cycle time , chip .
CPU TIME :

speed up = Old_time / New_time = (P X Old_CPI X Old_Clock_Cycle_T) / (P X New_CPI X New_Clock_Cycle_T)
where P = Instructions / program.
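
A minimal sketch of both formulas. The numbers in the usage lines are made up for illustration and are not taken from the summary.

```python
def cpu_time(instructions: float, cpi: float, cycle_time: float) -> float:
    """CPU time = "code size" X "CPI" X "cycle time"."""
    return instructions * cpi * cycle_time


def speed_up(p: float, old_cpi: float, old_cycle: float, new_cpi: float, new_cycle: float) -> float:
    """speed up = Old_time / New_time for the same program of P instructions."""
    return cpu_time(p, old_cpi, old_cycle) / cpu_time(p, new_cpi, new_cycle)


# Hypothetical change: CPI drops from 2.0 to 1.6 while the cycle becomes 10% longer.
print(speed_up(1_000_000, 2.0, 1.0, 1.6, 1.1))
```

Because P appears in both the numerator and the denominator, it cancels whenever the program itself is unchanged.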

,store .1 cycle" .15%


) (speed up ?
, CPI's :
,cycle ) 15% T (15% :

, , !.


= SPEED UP


MIPS (Millions of instructions per second):

.CPU
) . , .(instruction/program
MIPS compiler .ISA

FLOPS (number of floating point operations per second):

,FLOPS standard benchmark .


" .
: , , ... , " .

Peaks
:Peaks .
MIPS ,FLOPS - .
, )!( ) CPU '('peak
) ,(peaks .

Benchmarks
\ .
' ) (:
:
o
o
o
kernels / microbenchmarks:
" o" .
o .
: instruction mixes
o CPU
o .CPI
':
' ...
o compiler/hardware/software ' .
" ' .
o'
' .
o .in realtime environment, choosing gcc :
'
o '
o 3
o data .
, ,
, ,



.1 .
.2 .
:
:

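1. By the arithmetic mean of the execution times, B is faster than A:
exec time, A = (1000 + 1) / 2 = 500.5
exec time, B = (100 + 10) / 2 = 55
55 is 9.1 times smaller than 500.5, so B is 9.1 times faster than A.
2. By the average CPI:
Average_CPI(A) = (1000 + 1) / (1/3 + 1000/7) = 6.99
Average_CPI(B) = (100 + 10) / (10/4 + 100/6) = 5.74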


Memory hierarchy cache, virtual memory


) :Latency (Response time ) . (
:Bandwidth - ) .x \(

:




)
:

DRAM (Dynamic Random Access Memory):


:
) (optimized )( , .
).(kabal
, ) . , "" (.
"" ) (.

:
) : (
".
".

Cache
.


:Hit cache .
:Miss cache , .
) :Block size (Line size . .
) (.

: cache
.1 :
.2 .cache
.3 :
o ,
o , .

Number of blocks in memory: 2^(Address_Size) / Block_size

) ' (

Cache Design
.Direct Mapping 1 WAY
:

.cache

o" hash
o" .
) ( high order bits = most significant bits

o .
o .hash

:
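Example: a cache with 128 slots (128 blocks).
Each slot has a 7-bit number, from 0000000 to 1111111.
How is a block placed?
There are 128 slots, so 7 bits of the address select the slot.
(Of the 32 address bits, bits 5-11 are the index and bits 12-31 are the tag.)
How is a block looked up?
Given a 32-bit address, bits 5-11 select the slot, and bits 12-31 are compared with the tag stored in it.
If they match, bits 0-4 are the offset within the block.
Besides the data, each slot stores the valid and dirty bits and the tag.
Bits 5-11 serve as the hash!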
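
A minimal sketch of splitting a 32-bit address for this direct-mapped cache. It assumes 32-byte blocks (offset bits 0-4) and 128 slots, which need 7 index bits, so the tag is whatever remains above them. The function name and default parameters are illustrative.

```python
def split_address(addr: int, offset_bits: int = 5, index_bits: int = 7):
    """Return (tag, index, offset) of an address for a direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # the "hash": selects the slot
    tag = addr >> (offset_bits + index_bits)                  # compared with the stored tag
    return tag, index, offset


addr = (0xABCDE << 12) | (0x55 << 5) | 0x13
print([hex(part) for part in split_address(addr)])
```

Two addresses with the same index but different tags compete for the same slot, which is the source of conflicts in direct mapping.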


:14

:
.1 : ,tag ,
.
:
.1 hash - ) - ( .
"" , .
.2 , , cache
.
?
"" : 10000 . , 100
2 , .
"" .offset -

Fully Associative N WAY


:

Fully Associative
o .
o ' .hash

o ,
tag check :
.1 .tags
.2 .tags
: ! , tag , .

:
Assume you have a parking lot where they have handed out many parking permits. In fact, there are
more parking permits than parking spots. This is not uncommon at a college. When a lot fills up, the
students park in an overflow lot.
Suppose there are 1000 parking spots, but 5000 students. With a fully associative scheme, a student can
park in any of the 1000 parking spots.
The advantage of such a scheme is that it makes full use of the parking lot.
?
, cache ,fully associative .
:
) ,valid bit (V) = 0 ( , .
-V , ,1 ) ( .
: ,dirty bit = 0 . dirty bit =1
, dirty bit ,0 . V .1


?
,fully associative , .
, . , ,tag
, tag ) . " .(XNOR
, .V=1 ,  V=1 tags - "".
, 0-4 offset.
index .
 " ."faulty fully associative cache scheme - ,
) , '(.
...14

.1 " .
.2 .hash

.1 index , . fully associative caches


.
.2 cache scheme , .cache
tag .offset
In general, if you have M bytes of data in the block, and M is a power of 2, then
log(M) bits (log base 2) are used for the offset. Thus Address[log(M)-1 , 0] are the offset bits.
Thus, there are Address.length() - log(M) bits for the tag,
and these are bits A[Address.length()-1 , log(M)].

Set Associative [SET SIZE] WAY


:
set-associative
o x.
o hash
x

o tag
o x .


.set

A set-associative cache scheme is a combination of fully associative and direct mapped schemes. You
group 'slots' into indices. You find the appropriate index for a given address (which is like the direct
mapped scheme), and within the index you find the appropriate slot (which is like the fully associative
scheme).
This scheme has fewer collisions because you have more slots to pick from, even when cache blocks are
mapped to the same set.
+ ..
, . 128,
0000000 .1111111
, k .
, ? k 128 , 8 , k=16.
,set log(16) bits 4 ,.
, .32 ,5-8 ,9-31 , ,tag - 0-4 ,
.offset
?
32 . ] Address[5,8 .
) 8 ( ) 9-31 (tag
) tags .(fully associative scheme
tag ]) Address[0-4 (offset- .
, )
(fully associative ) tag'(.
8-way set associative cache
: 14

\
n-way set associative cache .
.tags
) .(fully associative ,
) ( . .
, .direct mapping
, N .....fully associative


Replacement policies
?
:
FIFO
LRU
NMRU (not most recently used)
Pseudo-random
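
A minimal sketch of a set-associative cache that uses the LRU policy from the list above. It models only hits and misses (tags, no data). The default geometry, 16 sets of 8 ways with 32-byte blocks, follows the 8-way example; the class and method names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    def __init__(self, num_sets: int = 16, ways: int = 8, block_size: int = 32):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One ordered dict of tags per set; the oldest entry is the least recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a hit, False on a miss (the block is then brought in)."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)        # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)     # evict the least recently used tag
        entries[tag] = True
        return False


cache = SetAssociativeCache(num_sets=1, ways=2)
print([cache.access(a) for a in (0, 32, 0, 64, 0, 32)])
```

FIFO would differ only in skipping the `move_to_end` call on a hit.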

2 , :
.1 ,
.2 -cache .
.
o , , )( .
o ) ,dirty bit(...

Write through policy

, ,
) ,disk,memory ,L2 ,L1 (?! ....
:
.1 ) bandwidth( .
.2 ) (disk )(.
, , . L2 cache
o .L1,L2 caches -
o L2 write-back -

Write back policy

state :cache
Valid/invalid o .
Clean/Dirty o . )(dirty==1modified
,tag array ) address tag (.
o dirty bit .write
,cache .dirty bit
o == 1 .
write back : .cast-out :

:Caches and performance


In cache design we should optimize for the more common case: a cache hit (the block is
present). In addition, the design must handle the less common case: a cache miss.
In such a case we should:
a. Fetch the block from the next level (this may need to be applied recursively if it is
also a miss in that level).
b. Decide what to do in the meantime.
Overall we'll examine the performance impact of such cases, and notice that some optimizations are
possible.


Cache miss-rate
,cache miss rate :
.1 . .spatial/temporal localities
.2 cache , , ...

:localities

.
o , . , "".
" , .temporal locality
o " ,"taken branches " .
X 90% , ) :
If (X) { <code1> } else { <code2> }
:
If (!X) { <code2> } else { <code1> }
: , <code1> - cache CPU
) .if(X , , branch . code1
.miss
) (data - , .
o ' :' - .cache
" ) .temporal locality 6 cache
, (.
o commonly accessed fields) . struct (c
" temporal .spatial locality
o ) ( , "
heap manager , .spatial locality

cache
:
CPU time = time spent running a program (as in 1 program) = wall_clock_time / 1_program
CPU time = (Instructions / program) X (cycles / instruction) X (time / cycle) = "code size" X "CPI" X "cycle time"

:
cache - hit latency) .cycle time (.
Cache misses .CPI
:
cycles
Misses cycles
) .(hit
,miss penalty
cache -
hit latency
.miss latency -

:

) :P(l miss penalty n ) .cache
(...
) :MPI(l ,miss rate/instruction n .cache -


miss rate:
.CPI
, :
o , misses (misses/ref) .
o , misses fetch/load/store.
, :
:
L1 instruction cache with 98% hit rate per instruction.
L1 data cache with 96% hit rate per instruction
Shared L2 cache with 40% local miss rate
L1 miss penalty of 8 cycles
L2 miss penalty of 19 cycles
:

CPI misses ) .1.15 (.


,miss penalty * miss rate ) L1,L1 caches (.
, miss penalty .L1 L1 ins. , L1 data
miss rate .L1
miss penalty L2 L1 L2 :
miss .L2 L2 .ref
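
A minimal sketch of the contribution of misses to the CPI for the figures listed above. It assumes that every L1 miss (instruction or data) pays the L1 miss penalty, and that the fraction of those misses that also miss in L2 pays the L2 miss penalty on top of it. The function and parameter names are hypothetical.

```python
def miss_cpi(l1i_miss: float, l1d_miss: float, l2_local_miss: float,
             l1_penalty: float, l2_penalty: float) -> float:
    """Cycles per instruction added by cache misses."""
    l1_misses = l1i_miss + l1d_miss                  # L1 misses per instruction
    l2_misses = l1_misses * l2_local_miss            # L2 misses per instruction
    return l1_misses * l1_penalty + l2_misses * l2_penalty


# 98% and 96% L1 hit rates, 40% local L2 miss rate, penalties of 8 and 19 cycles.
print(miss_cpi(0.02, 0.04, 0.40, 8, 19))
```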

:The 3 c's

:Compulsory miss .
Capacity: The working set exceeds the cache capacity (working set: at any given moment, it
includes the blocks accessed in the last T instructions).
, T , .cache -
, ) ( .
:Conflict ) fully associative cache - (.
Capacity - .set

?miss rate
cache , ,
) .(6
: )!( . Conflicts .
: .conflicts ) (6
,8-way associative .
: , ) spatial locality ,
( ." miss rate 64-256 . 512 , :


, ) capacity misses
. , ,(capacity miss

tradeoff
....
.miss penalty compulsory misses
Number of compulsory misses = working set / block size
Number of transfers = block size / bus width (-the size we want to move / how much we can move)
: , conflict misses
Number of blocks = cache size / block size
. capacity misses

....
.conflict misses
compulsory misses
.(!) capacity misses
.

: (... )


Virtual Memory
:
) ( .
.
?
)"( , " " ) (
.
" ,
.
:


. , )"( . .,
) (.
:Page ." ) ( " ."
.4K-16K

Page fault
) .(?? Miss :
.page table
.
" .page fault
o ) (.
o ) (.
) (OS .
interrupt ) ( ) ( .page fault
:
PTBR (Page Table Base Register),
,
.
, offset
) (.
) ,(fetch, load, store
Paging . memory ,"
,multi level page tables " hash collisions
.

:Page table
.
: ... ":
,limit register "" " .pt
.multi level page table - pt .pt
pt ) pageable , pt( " .VM


:multi level page table

TLB
...pt cache , :Translation Lookaside Buffer
TLB .cache ) MMU ... ( .
) TLB " ( ) pt (.
" TLB cache "page numbers
,TLB ,cache .
, cache ,TLB's , .data
:
,
,TLB ,
.
, .Page table
, .
,
TLB ) .(cache
, ) page fault .(Pf
,pf .TLB -
, ,
,cache
.
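
A minimal sketch of address translation with a TLB in front of a single-level page table. It assumes 4K pages and models both structures as dictionaries from virtual page number to physical frame number; all names are hypothetical, and a real MMU does this in hardware.

```python
PAGE_SIZE = 4096  # assumed 4K pages


class PageFault(Exception):
    """Raised when the page is not mapped; the OS would bring it in and retry."""


def translate(vaddr: int, tlb: dict, page_table: dict) -> int:
    """Translate a virtual address to a physical address."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                      # TLB hit: no page table access needed
        frame = tlb[vpn]
    else:                               # TLB miss: consult the page table
        if vpn not in page_table:
            raise PageFault(vpn)
        frame = page_table[vpn]
        tlb[vpn] = frame                # cache the translation in the TLB
    return frame * PAGE_SIZE + offset   # the offset is not translated


tlb, page_table = {}, {0: 5, 1: 9}
print(translate(4100, tlb, page_table), tlb)
```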

virtual memory management . ,


cache , ,FF 2-..........


Pipeline
CPU throughput: the number of instructions performed per unit of time.
Instruction cycle: The latency for processing an instruction. It depends on the architecture (logical).
Machine cycle (cycle time): The latency of each pipeline stage. It depends on the hardware (the
micro-architecture).

) ( .
.
,
.
,
, .
, ,
" .
, ,

.
,
, )
(.

Ideal Pipeline
pipeline:
.1 : .
.2 : .
.3 " : " " .
tradeoff - pipeline - :
:
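Consider a pipeline with K stages. The first instruction takes
K*Machine Cycles, and each of the remaining n-1 instructions leaves the pipeline one machine cycle after the previous one.
So n instructions on a pipeline with K stages take:
Time_ideally_piped(n) = (K + (n-1))*(Machine Cycle)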
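
A minimal sketch comparing this with an unpipelined machine, assuming the unpipelined machine needs K machine cycles for every instruction. The function names mirror the formula but are otherwise illustrative.

```python
def time_ideally_piped(n: int, k: int, machine_cycle: float) -> float:
    """(K + (n-1)) * Machine Cycle"""
    return (k + (n - 1)) * machine_cycle


def time_unpiped(n: int, k: int, machine_cycle: float) -> float:
    """Each instruction passes through all K stages before the next one starts."""
    return n * k * machine_cycle


n, k = 100, 5
print(time_unpiped(n, k, 1.0) / time_ideally_piped(n, k, 1.0))
```

As n grows, the ratio approaches K, the number of stages.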
\:
Pipeline : .
: .


Non-Ideal Pipeline
3 "" :
.1 :
)
( ,
)
( .
.
.2 : multi -
function pipeline
)(
pipe )
( , ,
pipeline.

.3 ):(stall
,
. .

Amdahl's Law (speed up of parallel computing)


.
f = (number of instructions that can be performed in parallel) / (number of instructions)

With N processors:
speedup = 1 / ((1 - f) + f/N)
As N grows, the speedup is bounded by 1 / (1 - f).

:
-
f speedup.
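
A minimal sketch of Amdahl's Law, with f the fraction that can be performed in parallel and N the number of parallel units. The function name is illustrative.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """speedup = 1 / ((1 - f) + f / N)"""
    return 1.0 / ((1.0 - f) + f / n)


for n in (2, 10, 100, 10**6):
    print(n, round(amdahl_speedup(0.9, n), 3))
```

With f = 0.9 the speedup never exceeds 1 / (1 - 0.9) = 10, no matter how large N is.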

:Pipeline
pipeline
) f"( ,pipeline -
pipeline . g:
,N ,pipeline
' -' .
)( , g N.


Designing an instruction pipeline


pipeline " .non-ideal-pipeline
) .1( : ) (machine cycles
.
) .2( )( : . \ ,
.
) .3( )( " : . .stalls
, ...pipe
:
, ,
-operand ,
.


:
) . stage(..
) (.
) , ,(


:
Arithmetic operation
Data movement
Instruction sequencing

RISC: Each instruction carries out one generic task type, which allows bigger bandwidth.
CISC: Each instruction carries out more than one generic task type.
:

"
, .pipeline
, .


The TYP pipeline


,
.
,
, :

'' .
Pipeline
.
" Pipeline,
load
,Operand Fetch store
.Operand Store pipeline
,
ALU ,Branch
store load .
,scalar pipeline
store ,pipe store
pipe ) ...
(.
,ALU store ...
"" , , .

load ALU pipeline 5 , read


3 ) ...decode - (...


(STALLS)
:

(data dependency)
:
:
( )o
o

(control dependency)
:
.
.PC RAW

Types of data dependencies


Assume that instruction J occurs after instruction I
RAW -- read after write (True dependence) J tries to read a source before I writes to it.
Therefore J possibly gets an incorrect value.

WAR -- write after read (Anti dependence) J tries to write a destination before the destination
is read by I. Therefore I possibly reads an incorrect value.

WAW -- write after write (Output dependence) J tries to write an operand before it is written by
I. In this case the writes are performed in the wrong order.
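
A minimal sketch that classifies the dependencies between two instructions from their register sets, where J comes after I. The function name and the register names in the usage lines are illustrative.

```python
def dependencies(i_reads, i_writes, j_reads, j_writes) -> set:
    """Return the data dependencies of instruction J on the earlier instruction I."""
    deps = set()
    if set(i_writes) & set(j_reads):
        deps.add("RAW")   # J reads what I writes (true dependence)
    if set(i_reads) & set(j_writes):
        deps.add("WAR")   # J writes what I reads (anti dependence)
    if set(i_writes) & set(j_writes):
        deps.add("WAW")   # J writes what I writes (output dependence)
    return deps


# I: add r1, r2, r3   (writes r1, reads r2 and r3)
# J: sub r4, r1, r5   (writes r4, reads r1 and r5)
print(dependencies({"r2", "r3"}, {"r1"}, {"r1", "r5"}, {"r4"}))
```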

.(WAW WAR )" . , RAW )


:TYP pipeline
.TYP pipeline - WAW

.TYP pipeline - WAR


RAW , .memory

Pipeline Hazard
) .(RAW
:
structural hazards: attempt to use the same resource two different ways at the same time.
control hazards: attempt to make a decision before condition is evaluated.
data hazards: attempt to use item before it is ready.
!
: !STALL
, .stalls
.
RAW
,TYP pipeline

) ? (

Forwarding
) ,(WB
. : ) ( ...
A bit more formally: Forwarding involves feeding output data into a previous stage of the pipeline.
forwarding:TYP -
ALU ,TYP -
.
, stall
3 . ,
ALU .ALU,
, ,
,ALU ,ALU
.WB
, stall ALU
.
, ,branch PC
,
, branch
,
Ins. Fetch ) pc
.


Speculative forwarding
, forwarding stalls , .
, 4 cycles stall .branch
stalls :
.pipeline
branch )
, if ,
abory(.
branch , .
branch ) (
...
) .conditional jump unconditional
.(pipeline
4 cycle penalty only when branch is taken :
: )!(.
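
A rough sketch of what the 4 cycle penalty costs on average when it is paid only for taken branches. The base CPI of 1 and the branch frequencies in the usage line are assumptions for illustration, not figures from the summary.

```python
def average_cpi(base_cpi: float, branch_fraction: float, taken_fraction: float,
                penalty: int = 4) -> float:
    """Base CPI plus the stall cycles paid only when a branch is taken."""
    return base_cpi + branch_fraction * taken_fraction * penalty


# Hypothetical mix: 20% of the instructions are branches and half of them are taken.
print(average_cpi(1.0, 0.2, 0.5))
```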


RTL Register Transfer Logic


RTL CPU .
3 - :RTL
A
.1 Value :
.2 .goto line number :line number
.3 op .cond : op :condition=true
) :CAR , ... (...
:Current Address Register .RTL
,CAR++ goto CAR .line number


\\\ ...
:
: 1 RTL

, ) .RTL (.
: 2

: 3

.CAR

: 4 ) 2 (3


: 5 ) (If "" control




.

:control .control
.
RTL .
I ,A -mux
"."load
,CAR==1 , )(AM[i]+A ,I++
-mux . ,
,CAR==2 " .
) ,"
, Mux , (
"" .

,ROM ) . .(combinatorial circuits

And no eggs.

