Professional Documents
Culture Documents
slide kiến trúc máy tính
slide kiến trúc máy tính
Jan2016
NKK-HUST
Contact Information
n
n
Computer Architecture
Address: 502-B1
Mobile: 091-358-5533
e-mail: khanhnk@soict.hust.edu.vn
khanh.nguyenkim@hust.edu.vn
NKK-HUST
Computer Architecture
NKK-HUST
Mc tiu hc phn
n
Ti liu hc tp
n
n
n
Jan2016
Jan2016
Computer Architecture
Jan2016
NKK-HUST
Jan2016
NKK-HUST
Ni dung hc phn
Content
Chapter 1. Introduction
Chapter 2. The Basics of Digital Logic
Chapter 3. Computer Systems
Chapter 4. Computer Arithmetic
Chapter 5. Instruction Set Architecture
Chapter 6. The Processors
Chapter 7. Computer Memory
Chapter 8. Input-Output Systems
Chapter 9. Parallel Architectures
Computer Architecture
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Ni dung ca chng 1
Chng 1
GII THIU CHUNG
Jan2016
Computer Architecture
Jan2016
Computer Architecture
Jan2016
NKK-HUST
NKK-HUST
x l d liu
Cc
thit b vo
(Input
Devices)
Dy cc lnh nm trong b nh yu cu
my tnh thc hin cng vic c th gi l
chng trnh (program).
My tnh hot ng theo chng trnh
n
Jan2016
Computer Architecture
d liu vo
Jan2016
n
n
My ch (Servers) my phc v
n
n
11
Jan2016
Smartphones, Tablet
Kt ni Internet
Jan2016
10
d liu ra
NKK-HUST
chng trnh
ang thc hin
Computer Architecture
Cc
thit b ra
(Output
Devices)
B nh chnh
(Main Memory)
NKK-HUST
B x l
trung tm
(Central
Processing Unit)
12
Jan2016
NKK-HUST
NKK-HUST
Phn lp my tnh
Ngi
s dng
NKK-HUST
Ngn ng bc cao
Hp ng
n
n
Assembly language
M t lnh di dng text
n
n
Qun l b nh v lu tr
iu khin vo-ra
Phn cng
Jan2016
Computer Architecture
swap:
multi
add
lw
lw
sw
sw
jr
CPU
$2, $5,4
$2, $4,$2
$15, 0($2)
$16, 4($2)
$16, 0($2)
$15, 4($2)
$31
B nh chnh
Bus h thng
FIGURE 1.4 C program compiled into assembly language and then assembled into binary
machine language. Although the translation from high-level language to binary machine language is
shown in two steps, some compilers cut out the middleman and produce binary machine language directly.
These languages and this program are examined in more detail in Chapter 2.
Jan2016
15
00000000101000100000000100011000
00000000100000100001000000100001
10001101111000100000000000000000
10001110000100100000000000000100
10101110000100100000000000000000
10101101111000100000000000000100
00000011111000000000000000001000
iu khin hot ng ca my
tnh v x l d liu
Assembler
Binary machine
language
program
(for MIPS)
14
NKK-HUST
n
Assembly
language
program
(for MIPS)
Computer Architecture
Phn cng
H thng vo-ra
Machine language
M t theo phn cng
Cc lnh v d liu c
m ha theo nh phn
15
Compiler
Ngn ng my
n
Jan2016
High-level
language
program
(in C)
Phn mm h thng
Cc mc ca m chng trnh
Ngi
lp trnh
h thng
13
1.3
Phn mm h thng
Jan2016
Phn mm ng dng
n
Ngi
lp trnh
Phn mm ng dng
Computer Architecture
16
Jan2016
NKK-HUST
NKK-HUST
n
n
n
n
Jan2016
Computer Architecture
17
NKK-HUST
Jan2016
n
n
Computer Architecture
18
Massive Cluster
Gigabit Ethernet
B vi x l (Microprocessors)
n
Clusters
Refrigerators
Cars
n
Routers
Routers
Computer Architecture
19
Jan2016
RobotsRobots
B nh bn dn (Semiconductor Memory)
n
Jan2016
NKK-HUST
Sensor
Nets
20
Jan2016
NKK-HUST
NKK-HUST
S pht trin ca b vi x l
n
n
n
n
n
n
My tnh A nhanh hn my B k ln
1985: cc b x l 32-bit
2001: cc b x l 64-bit
2006: cc b x l a li (multicores)
n
PA / PB = tB / tA = k
n
Jan2016
Computer Architecture
21
NKK-HUST
Jan2016
Computer Architecture
Jan2016
T0
n
22
NKK-HUST
tCPU = n T0 =
n
f0
23
Jan2016
24
Jan2016
NKK-HUST
NKK-HUST
V d
n
n
Ta c:
t=
n
f
nA = t A f A = 10s 2GHz = 20 10 9
My tnh B:
n
V d (tip)
nB = 1.2 nA = 24 10 9
fB =
Jan2016
Computer Architecture
25
NKK-HUST
nB 24 10 9
=
= 4 10 9 Hz = 4GHz
tB
6
Jan2016
Computer Architecture
26
NKK-HUST
V d
n = IC CPI
n
n
n
IC CPI
f0
My tnh B:
n
tCPU = IC CPI T0 =
Hy xc nh my no nhanh hn v nhanh hn
bao nhiu ?
Computer Architecture
27
Jan2016
Computer Architecture
28
Jan2016
NKK-HUST
NKK-HUST
V d (tip)
Ta c:
tCPU = IC CPITB T0
ICA = ICB = IC
n = (CPI i ICi )
i=1
T ta c:
CPITB =
t B IC 600 ps
=
= 1.2
t A IC 500 ps
n
1 K
= (CPI i ICi )
IC IC i=1
Computer Architecture
29
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
V d
n
V d
Loi lnh
Loi lnh
IC trong dy lnh 1
20
10
20
IC trong dy lnh 1
20
10
20
IC trong dy lnh 2
40
10
10
IC trong dy lnh 2
40
10
10
Dy lnh 1: S lnh = 50
Computer Architecture
31
Jan2016
Dy lnh 2: S lnh = 60
S chu k =
= 1x20 + 2x10 + 3x20 = 100
n CPITB = 100/50 = 2.0
n
Jan2016
30
Computer Architecture
S chu k =
= 1x40 + 2x10 + 3x10 = 90
n CPITB = 90/60 = 1.5
n
32
Jan2016
NKK-HUST
NKK-HUST
Tm tt v Hiu nng
CPU Time =
Program
Instruction Clock cycle
tCPU
n
MIPS =
IC CPI
= IC CPI T0 =
f0
Instruction count
Instruction count
Clock rate
=
=
Execution time 106 Instruction count CPI 106 CPI106
Clock rate
Thut gii
Ngn ng lp trnh
Chng trnh dch
Kin trc tp lnh
Phn cng
Jan2016
Computer Architecture
MIPS =
33
NKK-HUST
f0
CPI 10 6
Jan2016
CPI =
f0
MIPS 10 6
Computer Architecture
34
NKK-HUST
V d
V d
0.5ns
2ns
Jan2016
Computer Architecture
35
Jan2016
36
Jan2016
NKK-HUST
NKK-HUST
V d
V d
1ns
Jan2016
Computer Architecture
37
NKK-HUST
Jan2016
Computer Architecture
38
NKK-HUST
MFLOPS
GFLOPS109
TFLOPS1012
PFLOPS (1015)
Jan2016
Computer Architecture
39
Jan2016
Computer Architecture
40
10
Jan2016
NKK-HUST
NKK-HUST
Chng 2
C BN V LOGIC S
Ht chng 1
Jan2016
Computer Architecture
41
NKK-HUST
Jan2016
42
NKK-HUST
Ni dung ca chng 2
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016
Computer Architecture
Computer Architecture
2.1. Cc h m c bn
2.2. i s Boole
2.3. Cc cng logic
2.4. Mch t hp
2.5. Mch dy
43
Jan2016
Computer Architecture
44
11
Jan2016
NKK-HUST
NKK-HUST
2.1. Cc h m c bn
n
Jan2016
1. H thp phn
Computer Architecture
C s 10
10 ch s: 0,1,2,3,4,5,6,7,8,9
Dng n ch s thp phn c th biu din
c 10n gi tr khc nhau:
45
NKK-HUST
00...000
= 0
99...999
= 10n - 1
Jan2016
Computer Architecture
46
NKK-HUST
V d s thp phn
Gi tr ca A c hiu nh sau:
A = an10 n + an110 n1 +... + a1101 + a010 0 + a110 1 +... + am10 m
n
A=
a 10
Computer Architecture
Cc ch s ca phn nguyn:
n
472 : 10 = 47 d
47 : 10 = 4 d
4 : 10 = 0 d
Cc ch s ca phn l:
i=m
Jan2016
47
Jan2016
Computer Architecture
48
12
Jan2016
NKK-HUST
NKK-HUST
S nh phn
2. H nh phn
n
n
n
n
n
C s 2
2 ch s nh phn: 0 v 1
Ch s nh phn c gi l bit (binary digit)
bit l n v thng tin nh nht
Dng n bit c th biu din c 2n gi tr khc
nhau:
n
n
00...000
11...111
Biu din
s nh phn
2-bit
3-bit
4-bit
S
thp phn
00
000
0000
01
001
0001
10
010
0010
11
011
0011
100
0100
101
0101
110
0110
111
0111
1000
= 0
= 2n - 1
Jan2016
Computer Architecture
49
NKK-HUST
Jan2016
Computer Architecture
1001
1010
10
1011
11
1100
12
1101
13
1110
14
1111
15
50
NKK-HUST
Byte = 8-bit
Jan2016
1-bit
Theo nh phn
n v
Vit tt
Gi tr
n v
Vit tt
Gi tr
kilobyte
KB
103
kibibyte
KiB
210 = 1024
megabyte
MB
106
mebibyte
MiB
220
gigabyte
GB
109
gibibyte
GiB
230
terabyte
TB
1012
tebibyte
TiB
240
petabyte
PB
1015
pebibyte
PiB
250
exabyte
EB
1018
exbibyte
EiB
260
Computer Architecture
vi ai=0 hoc 1
Gi tr ca A c tnh nh sau:
A=
a2
i=m
51
Jan2016
Computer Architecture
52
13
Jan2016
NKK-HUST
NKK-HUST
V d s nh phn
1101001.1011(2) =
6 5 4 3 2 1 0
-1 -2 -3 -4
= 26 + 25 + 23 + 20 + 2-1 + 2-3
2-4
= 105.6875(10)
Jan2016
Computer Architecture
53
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
V d: chuyn i 105(10)
n
105 : 2 =
52
52 : 2 =
26
26 : 2 =
13
13 : 2 =
6:2 =
3:2 =
1:2 =
V d 1: chuyn i 105(10)
6
5
3
0
n 105 = 64 + 32 + 8 +1 = 2 + 2 + 2 + 2
biu din
s d
theo chiu
mi tn
n
27
26
25
24
23
22
21
20
128
64
32
16
Kt qu:
214 + 29 + 26 + 25 + 23
Jan2016
54
Computer Architecture
55
Jan2016
7 6 5 4
Computer Architecture
3 2 1 0
56
14
Jan2016
NKK-HUST
NKK-HUST
V d 1: chuyn i 0.6875(10)
n
0.6875 x 2 = 1.375
phn nguyn = 1
0.375 x 2 = 0.75
phn nguyn = 0
0.75
x 2 = 1.5
phn nguyn = 1
0.5
x 2 = 1.0
phn nguyn = 1
biu din
theo
chiu
mi tn
Kt qu : 0.6875(10)= 0.1011(2)
Jan2016
Computer Architecture
57
0.81 x 2 =
1.62
phn nguyn
0.62 x 2 =
1.24
phn nguyn
0.24 x 2 =
0.48
phn nguyn
0.48 x 2 =
0.96
phn nguyn
0.96 x 2 =
1.92
phn nguyn
0.92 x 2 =
1.84
phn nguyn
0.84 x 2 =
1.68
phn nguyn
0.81(10) 0.1100111(2)
NKK-HUST
Jan2016
Computer Architecture
58
NKK-HUST
3. H mi su (Hexa)
n
n
n
Jan2016
V d 2: chuyn i 0.81(10)
C s 16
16 ch s: 0,1,2,3,4,5,6,7,8,9, A,B,C,D,E,F
Dng vit gn cho s nh phn: c mt
nhm 4-bit s c thay bng mt ch s
Hexa
Computer Architecture
59
Jan2016
Computer Architecture
4-bit
S Hexa
Thp phn
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
10
1011
11
1100
12
1101
13
1110
14
1111
15
60
15
Jan2016
NKK-HUST
NKK-HUST
2.2. i s Boole
n
n
n
A AND B :
A OR B :
A B hay AB
A+B
NOT A :
A NAND B:
A NOR B :
A XOR B:
AB
A+ B
A B = A B + A B
Jan2016
Computer Architecture
61
NKK-HUST
A AND B
AB
A OR B
A + B
A NAND B
AB
Jan2016
A NOR B
A XOR B
A + B
Computer Architecture
A B
62
NKK-HUST
A+B=B+A
A (B + C) = (A B) + (A C)
A + (B C) = (A + B) ( A + C)
1A=A
0+A=A
AA=0
A+A=1
0A=0
1+A=1
AA=A
A+A=A
A (B C) = (A B) C
A + (B + C) = (A + B) + C
A B = A + B (nh l De Morgan)
A + B = A B (nh l De Morgan)
Jan2016
NOT A
Computer Architecture
63
Jan2016
Cng NOT
Cng nhiu u vo
Computer Architecture
64
16
The fundamental building block of all digital logic circuits is the gate. Logical functions are implemented by the interconnection of gates.
A gate is an electronic circuit that produces an output signal that is a simple Boolean operation on its input signals. The basic gates used in digital logic are
AND, OR, NOT, NAND, NOR, and XOR. Figure 11.1 depicts these six gates. Each
gate is defined in three ways: graphic symbol, algebraic notation, and truth table.
The symbology used in this chapter is from the IEEE standard, IEEE Std 91. Note
that the inversion (NOT) operation is indicated by a circle.
Each gate shown in Figure 11.1 has one or two inputs and one output.
However, as indicated in Table 11.1b, all of the gates except NOT can have more
than two inputs. Thus, (X + Y + Z) can be implemented with a single OR gate
with three inputs. When one or more of the values at the input are changed, the
correct output signal appears almost instantaneously, delayed only by the propagation time of signals through the gate (known as the gate delay). The significance of
this delay is discussed in Section 11.3. In some cases, a gate is implemented with two
outputs, one output being the negation of the other output.
NKK-HUST
Jan2016
NKK-HUST
Algebraic
Function
Graphical Symbol
Name
Tp y
Truth Table
A B F
0 0 0
0 1 0
B
1 0 0 11.2 / GATES
1 1 1
A B F
Here we introduce
a common term: we say that to assert
0 0 0a signal is to cause a
A
F ! A " Bfalse (0) 0state
F its logically
1 1 to its logically true
signal line toOR
make aBtransition from
1 0 1
(1) state. The true (1) state is either a high or low voltage1 state,
1 1 depending on the
A
AND
F!AB
or
F ! AB
369
Jan2016
F ! AB
0 1 1
B
AND, OR, NOT
1 0 1
1 1 0
AND, NOT
A B F
0 0 1
A
OR, NOT
F!A"B
0 1 0
NOR
F
B
1 0 0
NAND
1 1 0
NOR
A B F
0 0 0
A
XORbe clear that AND, F
F ! NOT
A ! B gates constitute
0 1 1
It should
OR, and
a
B
1 0 1
complete set, because they represent
the
three
operations
of Boolean
Computer Architecture
1 1 0
functionally
algebra. For
the AND and NOT gates to form a functionally complete set, there must be a way
Figure 11.1 Basic Logic Gates
to synthesize
the OR operation from the AND and NOT operations. This can be
done by applying DeMorgans theorem:
65
{AND, NOT}
{OR, NOT}
{NAND}
{NOR}
Jan2016
Computer Architecture
66
A + B = A#B
A OR B = NOT ((NOT A) AND (NOT B))
NKK-HUST
NKK-HUST
A B
A
B
A B
(A+B)
A+B
A B
Jan2016
Computer Architecture
A+B
67
Jan2016
With gates, we have reached the most primitive circuit level of computer
hardware. An examination of theComputer
transistor
combinations used to construct gates
Architecture
departs from that realm and enters the realm of electrical engineering. For our purposes, however, we are content to describe how gates can be used as building blocks
to implement the essential logical circuits of a digital computer.
68
17
Jan2016
NKK-HUST
NKK-HUST
Mt s vi mch logic
1A
1B
1Y
2A
2B
2Y
GND
1A
1B
1Y
2A
2B
2Y
GND
Jan2016
1A
1B
NC
1C
1D
NKK-HUST
1Y
GND
14
13
12
11
10
7400 NAND
14
13
12
11
10
7408 AND
14
13
12
11
10
VDD
1Y
4B
1A
4A
1B
4Y
2Y
3B
2A
3A
2B
3Y
GND
VDD
1A
4B
1B
4A
2A
4Y
2B
3B
2C
3A
2Y
3Y
GND
VDD
1CLR
14
13
12
11
10
7402 NOR
14
13
12
11
10
7411 AND3
VDD
1A
4Y
1Y
4B
2A
4A
2Y
3Y
3A
3B
3Y
3A
GND
VDD
1A
1C
1B
1Y
1Y
3C
2A
3B
2B
3A
2Y
3Y
GND
Computer Architecture
2D
1D
2C
1CLK
NC
1PRE
2B
1Q
2A
1Q
2Y
GND
Mch t hp
7421 AND4
1
2
3
4
14
13
reset
D Q
Q
set
reset
Q D
Q
set
12
11
10
7474 FLOP
VDD
1A
2CLR
1B
2D
1Y
2CLK
2A
2PRE
2B
2Q
2Y
2Q
GND
2.4. Mch t hp
585
14
13
12
11
10
7404 NOT
14
13
12
11
10
9
8
7432 OR
14
13
12
11
10
7486 XOR
6A
6Y
5A
5Y
4A
4Y
VDD
VDD
Cc u vo (Inputs)
Cc u ra (Outputs)
c t chc nng (Functional specification)
c t thi gian (Timing specification)
4B
4Y
3B
3A
3Y
69
Jan2016
n
n
4A
4Y
3B
NKK-HUST
3Y
70
F = 1A B C2 # 1A B C2 # 1A B C2 # 1A B C2 # 1A B C2
V d
u vo
Computer Architecture
4B
There are three combinations of input values that cause F to be 1, and if any
one of these combinations occurs, the result is 1. This form of expression, for selfevident reasons, is known as the sum of products (SOP) form. Figure 11.4 shows a
straightforward implementation with AND, OR, and NOT gates.
Another form can also be derived from the truth table. The SOP form
expresses that the output is 1 if any of the input combinations that produce 1 is true.
We can also say that the output is 1 if none of the input combinations that produce
0 is true. Thus,
0
Mch c nh
1
1
0
1
1
0 ti
u ra c xc nh bi cc1 gi tr trc
v 1gi tr hin
ca u vo
1
VDD
3A
0
0
0
Mch khng nh
0
0
1
u ra c xc nh bi cc0 gi tr hin
1 ti ca 0u vo
371
Mch t hp (Combinational
Circuits)
A
B
n
4A
u ra
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
0
0
1
1
0
0
1
0
Figure 11.4
F = A BC + A BC + ABC
Jan2016
Computer Architecture
71
Jan2016
Computer Architecture
72
18
F = B(A + C)
D3
Because the complement of the complement of a value is just the original value,
output signal F. To select one of the four possible inputs, a 2-bit selection code is
F = B(A + C) = (AB + (BC) needed, and this is implemented as two select lines labeled S1 and S2.
An example 4-to-1 multiplexer is defined by the truth table in Table 11.7. This
is a simplified form of a truth table. Instead of showing all possible combinations of
input variables, it shows the output as data from line D0, D1, D2, or D3. Figure 11.13
shows an implementation using AND, OR, and NOT gates. S1 and S2 are connected
which has three NAND forms, as illustrated in Figure 11.11.to the AND gates in such a way that, for any combination of S1 and S2, three of the
AND gates will output 0. The fourth AND gate will output the value of the selected
line, which is either 0 or 1. Thus, three of the inputs to the OR gate are always 0,
Multiplexers
and the output of the OR gate will equal the value of the selected input gate. Using
The multiplexer connects
multiple inputs to a single output.this
At regular
any time,
one of theit is easy to construct multiplexers of size 8-to-1, 16-to-1,
NKK-HUST
organization,
so on. representation
inputs is selected to be passed to the output. A general blockand
diagram
Multiplexers
areinput
used in digital circuits to control signal and data routing. An
is shown in Figure 11.12. This represents a 4-to-1 multiplexer. There
are four
theprovide
loading of
the program counter (PC). The value to be loaded into the
lines, labeled D0, D1, D2, and D3. One of these lines is example
selectedisto
the
program counter may come from one of several different sources:
Jan2016
F = (AB)(BC)
NKK-HUST
B chn knh 4 u vo
S2
n
n
n
u vo d liu
n u vo chn
1 u ra d liu
Mi t hp u vo chn (S) xc nh u vo
d liu no (D) s c ni vi u ra (F)
Jan2016
Computer Architecture
D1
F
D0
D3
D1
S2
S1
73
S2
S1
0
0
1
1
0
1
0
1
D0
D1
D2
D3
F
D2
D3
Jan2016
Computer Architecture
74
NKK-HUST
B gii m (Decoder)
n
4-to-1
MUX
D2
NKK-HUST
S1
D0
n 2n
N u vo, 2N u ra
Vi mt t hp ca N u vo, ch c mt u ra tch cc
(khc vi cc u ra cn li)
V d: B gii m 2 ra 4
2:4
Decoder
11
10
01
00
A1
A0
A1
Y3
Y2
Y1
Y0
A
B
001 D
1
C
010 D
2
A0
011 D
3
100 D
4
Y3
101 D
5
Y2
A1
0
0
1
1
Jan2016
A0
0
1
0
1
Y3
0
0
0
1
Y2
0
0
1
0
Y1
0
1
0
0
383
000 D
0
Y0
1
0
0
0
Computer Architecture
Y1
110 D
6
Y0
111 D
7
75
Jan2016
Figure 11.15
76
A0
A7
256 ! 8
RAM
2-to-4
Decoder
Enable
256 ! 8
RAM
Enable
256 ! 8
RAM
Enable
256 ! 8
RAM
Enable
19
Jan2016
386
NKK-HUST
B cng
n
Sum
+ 0
0
A
B cng bn phn 1-bit
Cng 3 bit
Cho php xy dng b cng N-bit
0
1
+ 1
1
+ 0
1
A
B
C
A
B
C
u vo A
B
1+ 1
Cout
0
1Half
Adder
0
387
0
1
1
Sum
1
Cout
+1
0
1 CIRCUITS0
11.3 /1COMBINATIONAL
1
1
01-bit
1
10
0
S
+0 A
A1
u ra
B
0C
Cout
S = A B
+0
+1
B
0
0
0 1
C out = AB
00
1
10
C
0
1
1
0
However, addition canAstill be dealt with in Boolean terms. In Table 11.9a, we
show the logic
1 for adding
0 B two input
1 bits0 to produce a 1-bit sum and a carry bit.
This truth table could easily be implemented in digital logic. However, we are not
1
1 A addition
0 on just1 a single pair of bits. Rather, we wish to
interested in performing
Jan2016
Computer Architecture
77
NKK-HUST
Cin
387
A
B
C
S
u vo
u ra
Cin
Cout
A
B
C
A
B
C
Thus we have
the necessary
logic+toBC
implement a multiple-bit adder such as
Carry
= AB + AC
shown in Figure 11.21. Note that because the output from each adder depends on
Figure 11.20 is an implementation using AND, OR, and NOT gates.
the carry from the previous adder, there is an increasing delay from the least significant to the most significant bit. Each single-bit adder experiences a certain amount
of gate delay,A andBthis gate delay
larger adders,
A2 accumulates.
B2
AFor
B
A0 theB0accumulated
3
3
1
1
delay can become unacceptably high.
If the carry values could be determined without having to ripple through all
Overflow
the previous
stages,Cthen each
single-bit
adder
could CfunctionC independently,
and
0
C3
C2
Cin
C1
Cin
in
in
0
signal
delay would not accumulate. This can be achieved with an approach known as carry
lookahead. Let us look again at the 4-bit adder to explain this approach.
We would like to come up with an expression that specifies the carry input to
any stage of theSadder without reference
to previous
carry values. We
have
S
S
S
1-bit
Full
Adder
Cout
Sum
A31 B31
A
B
C
S31
Cout
Figure 11.21
Carry
A24 B24
A23 B23
C23
8-bit
adder
Cout
A
B
A
C
78
NKK-HUST
Jan2016
Jan2016
the
S24
A16 B16
C15
8-bit
adder
S23
A15 B15
S16
A8 B8
C7
8-bit
adder
S15
A 7 B7
S8
A0 B0
8-bit
adder
S7
Cin
S0
B
C
Computer Architecture
Figure 11.20 Implementation of an Adder
79
Jan2016
Computer Architecture
80
20
Jan2016
NKK-HUST
NKK-HUST
2.5. Mch dy
causing the output to be 1; if only the K input is asserted, the result is a reset function,
causing the output to be 0. When both J and K are 1, the function performed is
referred to as the toggle function: the output is reversed. Thus, if Q is 1 and 1 is applied
to J and K, then Q becomes 0. The reader should verify that the implementation of
Figure 11.26 produces this characteristic function.
Cc Flip-Flop c bn
Name
Truth Table
Qn!1
0
0
1
1
0
1
0
1
Qn
0
1
Qn!1
0
0
1
1
0
1
0
1
Qn
0
1
Qn
Qn!1
0
1
0
1
Ck
SR
Graphical Symbol
Ck
JK
Ck
Jan2016
Computer Architecture
81
NKK-HUST
Jan2016
Computer Architecture
82
NKK-HUST
389
391
393
Data lines
11.4 / SEQUENTIAL CIRCUITS
R
391
Clock
D18
D17
D16
D15
D14
D13
D12
D11
S
Q
ClockS
Figure 11.24
S-R Latch
S-R Flip-Flop
392
Clocked SR Flip-Flop
Clk
Clk
Clk
Clk
Clk
Clk
Clk
Clk
First, let us show that the circuit is bistable. Assume that both S and R are 0
Figure 11.24 Clocked SR Flip-Flop
and that Q is 0. The inputs to the lower NOR gate are Q = 0 and S = 0. Thus, the
Clock
output Q = 1 means that the inputs to the upper NOR gate are Q = 1 and R = 0,
Q
K
which has the output Q = 0. Thus, the state of the circuit is internally consistent
Q
and remains stable as long as S = R = 0. A similar line of reasoning shows that the
Q
state Q = 1, Q = 0 is also stable for R = S = 0.
D
Thus, this circuit can function as a 1-bit memory. We can view the output Q as
Clock
Figure
11.25 D Flip-Flop
Clock
the value of the bit. The
inputs S and R serve to write the values 1 and 0, respectively, into memory. To see this, consider the state Q = 0, Q = 1, S = 0, R = 0.
This device is referred to as a clocked SR flip-flop. Note that the
Suppose that S changes to the value 1. Now the inputs to the lower NOR gatearrangement.
are
Q
R be
and S inputs are passed
to the NOR gates only during the clock pulse.
S = 1, Q = 0. After some time delay ^t, the output of the lower NOR gate
will
J
Q
D
Q = 0 (see Figure 11.23). So, at this point in time, the inputs to the upper NORD
gate
FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, S = 1
11.25
Flip-Flop
become R = 0,Figure
Q = 0.
AfterDanother
gate delay of ^t the output Q becomes 1. must
This be avoided. One way to do this is to allow just a single input. The D flip-flop
is again a stable state. The inputs to the lower gate are now S = 1, Q = 1, which
Figure 11.26 JK Flip-Flop
accomplishes this.
Figure 11.25 shows a gate implementation of the D flip-flop. By
maintain
the output
= 0. As
long as Sto=as1 aand
R = 0,
theflip-flop.
outputs will
arrangement.
ThisQdevice
is referred
clocked
SR
Noteremain
that the
using an inverter, the nonclock inputs to the two AND gates are guaranteed to be
QR
= and
1, QS=inputs
0. Furthermore,
if the
S returns
0, the
outputs
unchanged.
are passed to
NOR to
gates
only
duringwill
theremain
clock pulse.
the opposite of each other.
The R output performs the opposite function. When R goes to 1, it forces
causing
theDoutput
to be
1; if only the
K inputtoisas
asserted,
the
result isbecause
a reset function,
The
flip-flop
is sometimes
referred
the data
flip-flop
it is, in
D FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, Sof= 1
Q = 0, Q = 1 regardless of the previous state of Q and Q. Again, a time delay
causing
the output
tobit
beof0.data.
When
both
J and
K are
1, the function
performed
effect, storage
for one
The
output
of the
D flip-flop
is always equal
to the is
must be avoided. One way to do this is to allow just a single input. The D flip-flop
2^t occurs before the final state is established (Figure 11.23).
referred
to asvalue
the toggle
function:
output
is reversed.
Thus, and
if Q is
1 and 1 is
applied
mostBy
recent
applied
to the the
input.
Hence,
it remembers
produces
the
last
accomplishes this. Figure 11.25 shows a gate implementation of the D flip-flop.
The SR latch can be defined with a table similar to a truth table, called
a
to
J and
K,also
thenreferred
Q becomes
0. The
reader
should
verifyitthat
theaimplementation
using an inverter, the nonclock inputs to the two AND gates are guaranteed
to
beIt is
input.
to as the
delay
flip-flop,
because
delays
0 or 1 applied toof
characteristic
table,
which
shows
the
next
state
or
states
of
a
sequential
circuit
as
Figure
11.26
produces
this characteristic
function.
Jan2016
Computer
Architecture
the
opposite of each other.
its
input
for
a
single
clock
pulse.
We
can
capture
the
logic
of
the
D
flip-flop
in
the83
a function of current states and inputs. In the case of the SR latch, the state can
The D flip-flop is sometimes referred to as the data flip-flop because following
it is, in truth table:
be defined by the value of Q. Table 11.10a shows the resulting characteristic table.
effect, storage for one bit of data. The output of the D flip-flop is always equal to the
Observe that the inputs S = 1, R = 1 are not allowed, because these would proD
Q
n
!
1
most recent value applied to the input. Hence, it remembers and produces the last
duce an inconsistent output (both Q and Q equal 0). The table can be expressed
input. It is also referred to as the delay flip-flop, because it delays a 0 or 1 applied to
Name
Graphical Symbol
Truth Table
0
0
more compactly, as in Table 11.10b. An illustration of the behavior of the SR latch
its input for a single clock pulse. We can capture the logic of the D flip-flop in the
1
1
is shown in Table 11.10c.
following truth table:
S
Q
S
R
Qn!1
CLOCKED SR FLIP-FLOP The output of the SR latch changes, after a brief
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
D input.
Qn !This
1
time delay, in response to a change in the
is referred to as asynchronous
0
0
it has two inputs. However, in this case all possible combinations
ofQinput
values are
n
operation. More typically, events in the digital
computer
are synchronized to a clock
Ck
0
0
SR
0and Figure 11.27
1
0
flip-flop,
pulse, so that changes occur only when a1clock 1pulse occurs. Figure 11.24 showsvalid.
this Figure 11.26 shows a gate implementation of the JK
shows its characteristic table (along with those for the1 SR 0and D1flip-flops). Note
1
1
Q same as for the
that the first three combinationsRare the
SR flip-flop. With no input
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
asserted, the output is stable. If only the J input is asserted, the result is a set function,
it has two inputs. However, in this case all possible combinations of input values are
valid. Figure 11.26 shows a gate implementation of the JK flip-flop, and Figure 11.27
J
Q
J
K
Qn!1
shows its characteristic table (along with those for the SR and D flip-flops). Note
0
0
Qn
that the first three combinations are the same as for the SR flip-flop. With no input
Ck
D Flip Flop
Clock
Load
D08
JK
D06
D05
D04
D03
D02
D01
Output lines
Registers
J-K Flip-Flop
D07
Jan2016
As an example of the use of flip-flops, let us first examine one of the essential elements of the CPU: the register. As we know, a register is a digital circuit used within
the CPU to store one or more bits of data. Two basic types of registers are commonly used: parallel registersComputer
and shift
registers.
Architecture
84
can be read or written simultaneously. It is used to store data. The registers that we
have discussed throughout this book are parallel registers.
The 8-bit register of Figure 11.28 illustrates the operation of a parallel register
using D flip-flops. A control signal, labeled load, controls writing into the register
from signal lines, D11 through D18. These lines might be the output of multiplexers,
so that data from a variety of sources can be loaded into the register.
SHIFT REGISTER A shift register accepts and/or transfers information serially.
Consider, for example, Figure 11.29, which shows a 5-bit shift register constructed
from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each
21
NKK-HUST
from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each
clock pulse, data are shifted to the right one position, and the rightmost bit is
transferred out.
Shift registers can be used to interface to serial I/O devices. In addition, they
can be used within the ALU to perform logical shift and rotate functions. In this
B m 4-bit
High
J
Serial in
Q
Clk
Clk
Q
Clk
Q
Clk
Jan2016
Serial out
Clk
Ck
Clock
K
Ck
Q
Ck
Q0
Q
Ck
Q1
K
Q2
Q
Q3
Clock
Q0
Q1
Q2
Q3
(b) Timing diagram
Figure 11.30
Jan2016
Computer Architecture
85
NKK-HUST
Jan2016
Ripple Counter
Computer Architecture
86
NKK-HUST
Chng 3
H THNG MY TNH
Ht chng 2
Jan2016
Computer Architecture
87
Jan2016
Computer Architecture
88
22
Jan2016
NKK-HUST
NKK-HUST
Ni dung ca chng 3
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016
Computer Architecture
89
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
CPU
B nh chnh
H thng vo-ra
Computer Architecture
Chc nng:
n
iu khin hot ng ca my
tnh v x l d liu
1. B x l trung tm (CPU)
B x l trung tm (Central
Processing Unit CPU)
n
Bus h thng
Jan2016
90
91
Jan2016
Computer Architecture
92
23
Jan2016
NKK-HUST
NKK-HUST
n v
s hc v logic
Bus
h thng
Tp thanh ghi
n
n
93
Jan2016
n
n
n
Jan2016
B nh
m
B nh
chnh
Cc
thit b
lu tr
Computer Architecture
94
NKK-HUST
CPU
NKK-HUST
Computer Architecture
Tp thanh ghi
n
Jan2016
n v s hc v logic
n
n v iu khin
n
n v
iu khin
2. B nh my tnh
Tn ti trn mi my tnh
Cha cc lnh v d liu ca
chng trnh ang c thc hin
S dng b nh bn dn
T chc thnh cc ngn nh
c nh a ch (thng nh
a ch cho tng byte nh)
Ni dung ca ngn nh c th
thay i, song a ch vt l ca
ngn nh lun c nh
CPU mun c/ghi ngn nh cn
phi bit a ch ngn nh
Computer Architecture
B nh m (Cache memory)
Ni dung
a ch
0100 1101
00...0000
0101 0101
00...0001
1010 1111
00...0010
0000 1110
00...0011
0111 0100
00...0100
1011 0010
00...0101
0010 1000
00...0110
1110 1111
00...0111
n
n
n
.
.
n
0110 0010
11...1110
0010 0001
11...1111
95
Jan2016
96
24
Jan2016
NKK-HUST
NKK-HUST
Cn c gi l b nh ngoi
Chc nng v c im
Tc chm
n
n
3. H thng vo-ra
n
n
Cc loi thit b lu tr
n
n
n
Computer Architecture
97
NKK-HUST
Thit b
vo-ra
Cc thit b vo-ra
(IO devices)
Cc m-un vo-ra
(IO modules)
M-un
vo-ra
Thit b
vo-ra
Computer Architecture
98
NKK-HUST
M-un vo-ra
n
n
n
n
Jan2016
M-un
vo-ra
Vo d liu (Input)
Ra d liu (Output)
Jan2016
Cc thit b vo-ra
n
Thit b
vo-ra
B nh t: a cng HDD
B nh bn dn: th rn SSD, nh flash, th nh
B nh quang: CD, DVD
Jan2016
Bus
h
thng
Computer Architecture
99
Jan2016
100
25
Jan2016
NKK-HUST
NKK-HUST
n
n
n
L hot ng c bn ca my tnh
My tnh lp i lp li chu trnh lnh gm
hai bc:
n
n
Jan2016
Computer Architecture
101
NKK-HUST
Jan2016
Jan2016
Computer Architecture
102
NKK-HUST
Nhn lnh
n
Nhn lnh
Thc hin lnh
CPU
lnh
300
CPU
lnh
300
PC
lnh
301
PC
lnh
301
302
lnh i
302
303
lnh i
302
lnh i+1
303
lnh i+2
304
IR
lnh i+1
303
lnh i+2
304
IR
lnh i
103
Jan2016
Computer Architecture
104
26
Jan2016
NKK-HUST
NKK-HUST
2. Ngt (Interrupt)
Jan2016
NKK-HUST
Computer Architecture
106
NKK-HUST
n
n
Jan2016
Jan2016
Cc loi ngt:
n
Computer Architecture
Chng trnh
ang thc hin
lnh
lnh
107
Ngt y
lnh
lnh
lnh
lnh
lnh i
lnh
lnh i+1
lnh
...
RETURN
...
lnh
Jan2016
Computer Architecture
108
27
Jan2016
NKK-HUST
NKK-HUST
X l ngt tun t
n
n
n
X l ngt u tin
n
n
Interrupt
handler X
User program
Interrupt
handler X
User program
Interrupt
User program
Interrupt
handler Y
Interrupt
handler Y
program
Computer User
Architecture
Jan2016
109
Jan2016
Figure 3.13
Computer Architecture
110
82
Interrupt
handler Y
NKK-HUST
NKK-HUST
3. Hot ng vo-ra
82
CPU
M-un nh
M-un vo-ra
cn c kt ni vi nhau
Jan2016
111
Jan2016
Computer Architecture
112
28
Jan2016
NKK-HUST
NKK-HUST
Kt ni m-un nh
Kt ni m-un nh (tip)
n
n
a ch
M-un
nh
d liu
a ch a n xc nh ngn nh
D liu c a n khi ghi
D liu hoc lnh c a ra khi c
n
Tn hiu iu khin c
iu khin c (Read)
iu khin ghi (Write)
Jan2016
Computer Architecture
113
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Kt ni m-un vo-ra
d liu t bn trong
d liu ra bn ngoi
d liu t bn ngoi
d liu vo bn trong
a ch a n xc nh cng vo-ra
Ra d liu (Output)
n
M-un
vo-ra
a ch
n
Cc tn hiu iu khin thit b
n
n
Computer Architecture
115
Jan2016
Vo d liu (Input)
n
tn hiu iu khin c
Jan2016
114
116
29
Jan2016
NKK-HUST
NKK-HUST
Kt ni CPU
Kt ni CPU (tip)
n
lnh
a ch
n
n
CPU
d liu
d liu
Cc tn hiu iu khin
b nh v vo-ra
Jan2016
Computer Architecture
117
NKK-HUST
Jan2016
n
n
Jan2016
118
S cu trc bus c bn
Bus: tp hp cc ng kt ni vn chuyn
thng tin gia cc m-un ca my tnh vi
nhau.
Cc bus chc nng:
n
Computer Architecture
NKK-HUST
2. Cu trc bus c bn
n
CPU
M-un
nh
M-un
vo-ra
M-un
vo-ra
bus a ch
bus d liu
M-un
nh
bus iu khin
119
Jan2016
Computer Architecture
120
30
Jan2016
NKK-HUST
NKK-HUST
Bus a ch
n
Bus d liu
Chc nng:
n
n
N bit:
AN-1, AN-2, ... A2, A1, A0
S lng a ch ti a c s dng l: 2N a ch
(gi l khng gian a ch)
n a ch nh nht:
00 ... 000 (2)
n a ch ln nht:
11 ... 111 (2)
n
Computer Architecture
V d:
My tnh c bus d liu kt ni CPU vi b nh l 64-bit
C th trao i 8 byte nh mt thi im
121
NKK-HUST
Jan2016
Computer Architecture
122
NKK-HUST
Bus iu khin
Jan2016
Jan2016
V d:
Computer Architecture
123
Jan2016
124
31
Jan2016
NKK-HUST
NKK-HUST
88
Jan2016
Computer Architecture
125
Jan2016
Local bus
Cache
Computer Architecture
126
Local I/O
controller
Main
memory
System bus
NKK-HUST
NKK-HUST
Expansion
bus interface
Network
SCSI
3. Phn cp bus
Phn cp bus
Serial
Modem
Expansion bus
Main
memory
Processor
SCSI
FAX
Bus ca b x l
Bus ca RAM
Cc bus vo-ra
Computer Architecture
Cache /
bridge
FireWire
System bus
Graphic
Video
LAN
High-speed bus
Expansion
bus interface
Serial
Modem
Expansion bus
(b) High-performance architecture
Figure 3.17
Jan2016
Local bus
127
Jan2016
Computer Architecture
so an external bus or other interconnect scheme is not needed, although there may
also be an external cache. As will be discussed in Chapter 4, the use of a cache structure insulates the processor from a requirement to access main memory frequently.
Hence, main memory can be moved off of the local bus onto a system bus. In this way,
I/O transfers to and from the main memory across the system bus do not interfere
with the processors activity.
128
32
Jan2016
NKK-HUST
NKK-HUST
4. Kt ni im-im
n
n
Point-to-point connection
Khc phc nhc im ca bus dng chung
(shared bus)
Kt ni QPI
Kt ni PCIe
I/O Hub
Core
A
Core
B
Core
DRAM
DRAM
I/O device
94
Gigabit
Ethernet
PCIe
PCIePCI
Bridge
PCIe
n
n
99
Core
Memory
Chipset
Memory
I/O Hub
DRAM
Core
D
PCIe
Legacy
endpoint
Figure 3.24
QPI
Jan2016
Figure 3.20
PCI Express
Memory bus
PCIe
Switch
PCIe
I/O device
I/O device
DRAM
PCIe
Core
C
PCIe
PCIe
endpoint
PCIe
endpoint
PCIe
endpoint
device and one or more that attach to a switch that manages multiple PCIe streams.
PCIe links from the chipset may attach to the following kinds of devices that imple129
ment PCIe:
Jan2016
Computer Architecture
130
NKK-HUST
9.6 (24.38cm)
The reader unfamiliar with the concept of a protocol architecture will find a brief overview in Appendix L.
Ht chng 3
5
6
11
13
4 12
10
14 15
11.6 (29.46cm)
Jan2016
Computer Architecture
131
Jan2016
Computer Architecture
132
33
Jan2016
NKK-HUST
NKK-HUST
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Chng 4
S HC MY TNH
Jan2016
Computer Architecture
133
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Ni dung chng 4
Jan2016
134
Computer Architecture
n
n
135
Jan2016
Computer Architecture
136
34
Jan2016
NKK-HUST
NKK-HUST
V d 1
A = ai 2 i
B = 150 = 128 + 16 + 4 + 2 = 27 + 24 + 22 + 21
150 = 1001 0110
i=0
Di biu din ca A:
Jan2016
[0, 2n 1]
Computer Architecture
137
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
V d 2
n
Vi n = 8 bit
Biu din c cc gi tr t 0 n 255 (28 - 1)
Ch :
Biu din
nh phn
Gi tr
thp phn
0000 0000
0000 0001
0000 0010
0000 0011
0
1
2
3
4
0000 0100
...
1111 1110
254
1111 1111
255
M = 0001 0010
N = 1011 1001
+ 0000 0001
Xc nh gi tr ca chng ?
n
n
M = 0001 0010 =
24
N = 1011 1001 =
27
21
25
c nh ra ngoi
(Carry out)
= 16 +2 = 18
+
24
23
255 + 1 = 0 ???
20
= 128 + 32 + 16 + 8 + 1 = 185
Computer Architecture
1111 1111
1 0000 0000
Gii:
Jan2016
138
139
Jan2016
Computer Architecture
140
35
Jan2016
NKK-HUST
NKK-HUST
Trc s hc vi n = 8 bit
Trc s hc:
255
0 1 2 3
n
n
Trc s hc my tnh:
254
255 0
n
n
n
n
Jan2016
Computer Architecture
141
NKK-HUST
= 0
= 255
= 256
= 65535
Jan2016
Computer Architecture
142
NKK-HUST
V d
Vi n = 8 bit, cho A = 0010 0101
S b mt v S b hai
n
- 0010 0101
S b hai ca A
(28 - 1)
(A)
1101 1010
S b mt ca A = (2n-1) - A
2n
S b mt ca A c tnh nh sau:
1111 1111
o cc bit ca A
-A
S b hai ca A = (S b mt ca A) +1
(28)
- 0010 0101
(A)
1101 1011
thc hin kh khn
Jan2016
Computer Architecture
143
Jan2016
Computer Architecture
144
36
Jan2016
NKK-HUST
n
n
n
NKK-HUST
Quy tc tm S b mt v S b hai
S b mt ca A = o gi tr cc bit ca A
(S b hai ca A) = (S b mt ca A) + 1
V d:
Cho
A
S b mt ca A
=
=
S b hai ca A
0010 0101
1101 1010
+
1
1101 1011
Nhn xt:
A
S b hai ca A
0010 0101
+ 1101 1011
1 0000 0000 = 0
(b qua bit nh ra ngoi)
S b hai ca A = -A
Jan2016
=
=
Computer Architecture
145
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
V d
n
Xc nh gi tr ca s dng
Jan2016
146
= - 80
Ta c: + 80
S b mt
0 an2 ... a2 a1 a0
n
Gi tr ca s dng:
n2
S b hai
= 0101 0000
= 1010 1111
+
1
= 1011 0000
Vy: B = - 80
= 1011 0000
Computer Architecture
A = ai 2 i
i=0
n
147
Jan2016
148
37
Jan2016
NKK-HUST
NKK-HUST
Xc nh gi tr ca s m
n
Cng thc xc nh gi tr s m
! = 1!!!!! !!!!! ! !!! !!! !!! !
Gi tr ca s m:
!!!
n2
A = 2
n1
+ ai 2
Chng minh?
(2!!!
!!!
i=0
n
!! 2! + 1!
1)
!1
! = 2
!2
+!
!! 2 !
!=0
Jan2016
Computer Architecture
149
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
V d
Gi tr ca A c xc nh nh sau:
n2
A = an1 2
n1
+ ai 2 i
Jan2016
Computer Architecture
Hy xc nh gi tr ca cc s nguyn c du
c biu din theo m b hai vi 8-bit nh
di y:
n
P = 0110 0010
Q = 1101 1011
Gii:
i=0
n
150
151
Jan2016
Computer Architecture
152
38
Jan2016
NKK-HUST
NKK-HUST
Vi n = 8 bit
Biu din c cc gi tr
t -27 n +27-1
Gi tr
thp phn
Biu din
b hai
-128 n +127
Ch c mt gi tr 0
Khng biu din cho gi tr
+128
0000 0000
0000 0001
+2
0000 0010
+126
0111 1110
+127
0111 1111
-128
1000 0000
-127
1000 0001
Trc s hc:
-128
...
Ch :
+127 + 1 = -128
(-128)+(-1) = +127
c trn xy ra (Overflow)
-2
1111 1110
-1
1111 1111
Computer Architecture
Trc s hc my tnh:
+1
+2
+3
-128
153
Jan2016
+127
Computer Architecture
154
0
+1
+32767 (215 - 1)
-32768 (-215)
+32767
S dng:
+19 =
(16bit)
S m:
1110 1101
(8bit)
-1
-231
0001 0011
- 19 =
NKK-HUST
Jan2016
-1
-3
+127
-2
NKK-HUST
-2 -1 0 1 2
...
Jan2016
0
+1
231-1
(8bit)
(16bit)
Jan2016
Computer Architecture
156
39
Jan2016
NKK-HUST
NKK-HUST
n bit
Cout
n bit
B cng n-bit
Cin
n
n bit
Nu Cout = 0 nhn c kt qu ng
Nu Cout = 1 nhn c kt qu sai, do
c nh ra ngoi (Carry Out)
Jan2016
Computer Architecture
157
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
2. Php o du
57
34
91
=
= +
209
73
282
=
= +
0011 1001
0010 0010
0101 1011 = 64+16+8+2+1=91 ng
1101 0001
0100 1001
1 0001 1010
kt qu = 0001 1010 =16+8+2=26 sai
do c nh ra ngoi (Cout=1)
158
Computer Architecture
159
Jan2016
Ta c:
+ 37
b mt
=
=
b hai
0010 0101
1101 1010
+
1
1101 1011 =
-37
Ly b hai ca s m:
- 37 =
1101 1011
b mt =
0010 0100
+
1
b hai
=
0010 0101 =
+37
160
40
Jan2016
NKK-HUST
NKK-HUST
3. Cng s nguyn c du
n
Jan2016
Computer Architecture
161
NKK-HUST
=
=
0100 0110
0010 1010
0111 0000 = +112
(+ 97)
+ (- 52)
+ 45
=
=
0110 0001
1100 1100 (+52=0011 0100)
1 0010 1101 = +45
( - 90)
+ ( +36)
- 54
=
=
( - 74)
+( - 30)
-104
=
=
Jan2016
Computer Architecture
162
NKK-HUST
= 0100 1011
= 0101 0010
1001 1101
= - 128+16+8+4+1= -99 sai
n
n
Computer Architecture
Jan2016
( + 70)
+ ( + 42)
+ 112
n-bit
B hai
B cng n-bit
n-bit
163
Jan2016
S= X-Y
Computer Architecture
164
41
Jan2016
NKK-HUST
NKK-HUST
S b nhn (11)
S nhn
(13)
n
n
Tch
(143)
Computer Architecture
165
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
S b nhn
Mn-1 Mn-2
...
M1
C 0; A 0
M S b nhn
Q S nhn
B m n
M0
iu
khin
cng
B cng n-bit
B logic iu khin
cng v dch
No
An-1 An-2
...
A1
A0
Qn-1 Qn-2
...
Q1
Q0
B m = 0 ?
S nhn
No
Yes
Kt thc
Computer Architecture
Yes
Q0 = 1 ?
C,A A+M
iu khin
dch phi
Jan2016
166
167
Jan2016
Tch trong AQ
Computer Architecture
168
42
Jan2016
NKK-HUST
NKK-HUST
S b nhn M =
S nhn Q =
Tch
=
C
0
0
0
0
0
1
0
Jan2016
1011
1101
1000 1111
2. Nhn s nguyn c du
(11)
(13)
(143)
n
n
A
Q
0000 1101 Cc gi tr khi u
+ 1011
1011 1101 A A + M
0101 1110 Dch phi
0010
+ 1011
1101
0110
+ 1011
0001
1000
1111 A A + M
1111 Dch phi
1111 A A + M
1111 Dch phi
Computer Architecture
169
Jan2016
335
as the twos complement value -7, then each partial product must be a negative
twos complement number of 2n (8) bits, as shown in Figure 10.11b. Note that this is
accomplished by padding out each partial product to the left with binary 1s.
If the multiplier is negative, straightforward multiplication also will not work.
The reason is that the bits of the multiplier no longer correspond to the shifts or
multiplications that must take place. For example, the 4-bit decimal number -3 is
written 1101 in twos complement. If we simply took partial products based on each
bit position, we would have the
following
correspondence:
Computer
Architecture
170
NKK-HUST
NKK-HUST
In fact, what is desired is -(21 + 20). So this multiplier cannot be used directly in
the manner we have been describing.
There are a number of ways out of this dilemma. One would be to convert
both multiplier and multiplicand to positive numbers, perform the multiplication,
and then take the twos complement of the result if and only if the sign of the two
original numbers differed. Implementers have preferred to use techniques that
do not require this final transformation step. One of the most common of these is
Booths algorithm. This algorithm also has the benefit of speeding up the multiplication process, relative to a more straightforward approach.
Booths algorithm is depicted in Figure 10.12 and can be described as follows.
As before, the multiplier and multiplicand are placed in the Q and M registers,
START
0
A
0, Q"1
M
Multiplicand
Q
Multiplier
Count
n
! 10
A"M
Q0, Q"1
! 01
! 11
! 00
A#M
Arithmetic shift
Right: A, Q, Q"1
Count
Count " 1
No
Count ! 0?
Yes
END
Jan2016
Computer Architecture
171
Jan2016
Computer Architecture
172
43
Jan2016
NKK-HUST
NKK-HUST
S b chia
10010011
- 1011
001110
- 1011
001111
- 1011
100
Jan2016
1011
00001101
S chia
Thng
M1
M0
iu
khin
cng/tr
B cng/tr n-bit
B logic iu khin
cng/tr v dch
iu khin
dch tri
An-1 An-2
Phn d
Computer Architecture
...
A1
A0
Qn-1 Qn-2
...
Q1
Q0
S b chia Q
173
NKK-HUST
Jan2016
Computer Architecture
174
NKK-HUST
4. Chia s nguyn c du
Bt u
n
A 0
M S chia
Q S b chia
B m n
n
Yes
Q0 0
AA+M
Q0 1
B mB m-1
No
B m = 0 ?
Yes
Kt thc
Thng Q
S d A
Computer Architecture
A<0?
Jan2016
...
175
Jan2016
S b chia
S chia
Thng
S d
dng
dng
gi nguyn
gi nguyn
dng
o du
gi nguyn
dng
o du
o du
gi nguyn
o du
Computer Architecture
176
44
Jan2016
NKK-HUST
NKK-HUST
4.4. S du phy ng
2. Chun IEEE754-2008
1. Nguyn tc chung
n Floating Point Number biu din cho s
thc
n Tng qut: mt s thc X c biu din
theo kiu s du phy ng nh sau:
X = M * RE
n
n
n
Sign Biased
bit exponent
8 bits
23 bits
(a) Binary32 format
Sign Biased
bit exponent
Computer Architecture
Dng 64-bit
M l phn nh tr (Mantissa),
R l c s (Radix),
E l phn m (Exponent).
Jan2016
Trailing
significand field
Dng 32-bit
52 bits
Sign
bit
Biased
exponent
Dng 128-bit
15 bits
(c) Binary128 format
Figure 10.21
177
112 bits
Jan2016
Table 10.3
Format
Binary32
Binary64
32
64
NKK-HUST
NKK-HUST
Dng 32-bit
e
S
1 bit
S = 0 s dng
S = 1 s m
128
11
15
Exponent bias
127
1023
16383
Maximum exponent
127
1023
16383
- 126
Minimum exponent
Approx normal number range
(base 10)
10-38, 10+38
- 1022
10-308, 10+308
- 16382
10-4932, 10+4932
23
52
112
Number of exponents
254
2046
32766
Number of fractions
223
252
2112
1.99 * 2128
23 bit
1.98 * 231
1.99 * 263
2-126
2-1022
2-16362
2128 - 2104
21024 - 2971
216384 - 216271
-149
-1074
2-16494
Note: *not including implied bit and not including sign bit
S = 1 s m
e = 1000 0010(2) = 130(10) E = 130 - 127 = 3
Vy
X = -1.10101100(2) * 23 = -1101.011(2) = -13.375(10)
n
n
8 bit
S l bit du:
n
V d 1
Binary128
M = 1.m
X = (-1)S*1.m * 2e-127
Jan2016
Computer Architecture
179
Jan2016
Computer Architecture
180
45
Jan2016
NKK-HUST
NKK-HUST
V d 2
Cc qui c c bit
Gii:
Ta c:
S = 0 v y l s dng
Vy:
Computer Architecture
181
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Di gi tr biu din
n
n
182
Dng 64-bit
2-127 n 2+127
10-38 n 10+38
n
n
S l bit du
e (11 bit) l gi tr dch chuyn ca phn
m E:
n
-2+127
-2-127 0
+2-127
+2+127
n
n
X = (-1)S*1.m * 2e-1023
n
Jan2016
Computer Architecture
183
Jan2016
184
46
Jan2016
NKK-HUST
NKK-HUST
Dng 128-bit
n
n
S l bit du
e (15 bit) l gi tr dch chuyn ca phn
m E:
n
n
n
X1 = M1 * RE1
X2 = M2 * RE2
Ta c
n
n
n
X = (-1)S*1.m * 2e-16383
n
Jan2016
Computer Architecture
185
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Cc kh nng trn s
n
Jan2016
186
187
n
n
n
Jan2016
Computer Architecture
188
47
Jan2016
NKK-HUST
NKK-HUST
TR
X=0?
i du ca Y
CNG
X=0?
Y=0?
Phn m
bng nhau?
ZY
N
Tng phn m
nh hn
Z X
TR V
Dch phi
phn nh tr
Z 0
Kt qu
chun ha?
nh tr = 0?
N
Dch phi
phn nh tr
Lm trn
kt qu
Z 0
TR V
TR V
Y=0?
Y
Cng phn m
Tr cho
lch
Trn trn
phn m?
Y Thng bo
trn trn
TR V
nh tr b
trn?
Gim phn m
Trn di
phn m?
Y
Dch phi
phn nh tr
Y=0?
Y
Ct s kia
vo Z
Cng c du
phn nh tr
N Phn m b
trn di?
Tng phn m
N
Nhn phn
nh tr
Y Thng bo
trn di
TR V
Y
Bo trn di
TR V
Chun ha
TR V
Bo trn
Phn m
b trn?
TR V
Lm trn
TR V
Jan2016
Computer Architecture
189
NKK-HUST
Jan2016
Computer Architecture
190
NKK-HUST
X=0?
Y=0?
Z 0
Tr phn m
Cng thm
lch
Trn trn
phn m?
TR V
N
Trn di
phn m?
N
Chia phn
nh tr
Ht chng 4
Y Thng bo
trn trn
Thng bo
trn di
TR V
Chun ha
Lm trn
TR V
Jan2016
Computer Architecture
191
Jan2016
Computer Architecture
192
48
Jan2016
NKK-HUST
NKK-HUST
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Chng 5
KIN TRC TP LNH
Jan2016
Computer Architecture
193
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Ni dung ca chng 5
Hp ng (assembly language):
n
n
Computer Architecture
195
Jan2016
cn gi l m my (machine code)
dng lnh c th c c bi my tnh
Jan2016
194
Computer Architecture
196
49
Jan2016
NKK-HUST
NKK-HUST
B nh chnh
CPU
lnh
lnh
lnh
lnh
PC
n v
iu khin
.
.
.
d liu
d liu
d liu
d liu
.
.
.
ALU
Tp thanh ghi
Vo-ra
B m chng trnh PC
(Program Counter) l thanh ghi
ca CPU gi a ch ca lnh cn
nhn vo thc hin
CPU pht a ch t PC n b
nh, lnh c nhn vo
Jan2016
Computer Architecture
NKK-HUST
Jan2016
lnh k tip
lnh
lnh
Ty thuc vo di ca lnh va
c nhn
MIPS: lnh c di 32-bit, PC tng 4
Computer Architecture
198
Jan2016
lnh c
nhn vo
NKK-HUST
lnh
PC
197
lnh
.
.
.
lnh
Computer Architecture
199
Jan2016
Computer Architecture
200
50
Jan2016
NKK-HUST
NKK-HUST
Hng s a ch
Trong lnh cho hng s
a ch c th
CPU pht gi tr a ch
ny n b nh tm
ra ngn nh d liu cn
c/ghi
d liu
d liu
Hng s a ch
d liu
d liu cn c/ghi
d liu
d liu
d liu
d liu
Jan2016
Computer Architecture
201
NKK-HUST
Jan2016
Jan2016
d liu
d liu
d liu
Thanh ghi
d liu cn c/ghi
d liu
d liu
d liu
d liu
Computer Architecture
202
NKK-HUST
a ch c s (base address):
a ch ca ngn nh c s
Gi tr dch chuyn a ch (offset):
gia s a ch gia ngn nh cn
c/ghi so vi ngn nh c s
a ch ca ngn nh cn c/ghi
= (a ch c s) + (offset)
C th s dng cc thanh ghi
qun l cc tham s ny
Trng hp ring:
n
a ch c s = 0
Offset = 0
Computer Architecture
Ngn xp (Stack)
n
a ch c s
Ngn nh c s
Offset
n
n
d liu cn oc/ghi
203
Jan2016
Computer Architecture
204
51
Jan2016
NKK-HUST
NKK-HUST
nh ngn xp
Gim ni dung ca SP
Thng tin c ct vo ngn nh
c tr bi SP
chiu
a
ch
tng
dn
y ngn xp
Jan2016
Computer Architecture
205
NKK-HUST
Cc sn phm thc t:
n
Jan2016
Computer Architecture
206
NKK-HUST
V d lu tr d liu 32-bit
S
nh phn
S Hexa
Tp lnh
2B
3C
n
n
4D
4D
4000
1A
4000
3C
4001
2B
4001
2B
4002
3C
4002
1A
4003
4D
4003
little-endian
Jan2016
Mi b x l c mt tp lnh xc nh
Tp lnh thng c hng chc n hng trm
lnh
Mi lnh my (m my) l mt chui cc bit (0,1)
m b x l hiu c thc hin mt thao tc
xc nh.
Cc lnh c m t bng cc k hiu gi nh
dng text, chnh l cc lnh ca hp ng
(assembly language)
big-endian
Computer Architecture
207
Jan2016
Computer Architecture
208
52
Jan2016
NKK-HUST
NKK-HUST
Dng lnh hp ng
M C:
a = b + c;
n V d lnh hp ng:
add a, b, c
#a=b+c
trong :
M thao tc
n
n
Jan2016
n
n
n
n
n
209
Jan2016
Computer Architecture
add r1, r2
# r1 = r1 + r2
S dng trn Intel x86, Motorola 680x0
add r1
# Acc = Acc + r1
c s dng trn kin trc th h trc
n
n
0 a ch ton hng:
n
n
Mt a ch ton hng:
n
Jan2016
210
Ba a ch ton hng:
n
NKK-HUST
Ton hng c th l:
n
NKK-HUST
a ch ton hng
211
Jan2016
Computer Architecture
212
53
Jan2016
NKK-HUST
NKK-HUST
n
n
n
n
n
n
S lng lnh t
Hu ht cc lnh truy nhp ton hng cc
thanh ghi
Truy nhp b nh bng cc lnh LOAD/STORE
(np/lu)
Thi gian thc hin cc lnh l nh nhau
Cc lnh c di c nh (thng l 32 bit)
S lng dng lnh t
C t phng php nh a ch ton hng
C nhiu thanh ghi
H tr cc thao tc ca ngn ng bc cao
Jan2016
Computer Architecture
n
n
n
n
n
213
Jan2016
add a, b, c # a = b + c
Hu ht cc lnh s hc/logic c dng
trn
Cc lnh s hc s dng ton hng
thanh ghi hoc hng s
n
n
n
215
Jan2016
Bt u bng du $
$t0, $t1, , $t9 cha cc gi tr tm thi
$s0, $s1, , $s7 ct cc bin
Computer Architecture
Jan2016
214
Computer Architecture
NKK-HUST
NKK-HUST
216
54
Jan2016
NKK-HUST
NKK-HUST
$at
$v0-$v1
2-3
$a0-$a3
4-7
$t0-$t7
8-15
$s0-$s7
16-23
$t8-$t9
24-25
$k0-$k1
26-27
OS temporaries, cc gi tr tm thi ca OS
$gp
28
$sp
29
$fp
30
$ra
31
Jan2016
n
n
217
NKK-HUST
# $t0 = g + h
# $t1 = i + j
# f = (g+h)-(i+j)
Computer Architecture
218
a ch byte nh v word nh
B nh c nh a ch theo byte
n
n
n
Jan2016
Jan2016
# (rd) = (rs)+(rt)
# (rd) = (rs)-(rt)
NKK-HUST
rd, rs, rt
rd, rs, rt
add
add
sub
Ton hng b nh
n
add
sub
V d m C:
f = (g + h) - (i + j);
n
a ch byte
(theo Hexa)
D liu
hoc lnh
a ch word
(theo Hexa)
byte (8-bit)
0x0000 0000
word (32-bit)
0x0000 0000
byte
0x0000 0001
word
0x0000 0004
byte
0x0000 0002
word
0x0000 0008
byte
0x0000 0003
word
0x0000 000C
byte
0x0000 0004
word
0x0000 0010
byte
0x0000 0005
word
0x0000 0014
byte
0x0000 0006
word
0x0000 0018
byte
0x0000 0007
Computer Architecture
D liu
hoc lnh
219
Jan2016
word
0xFFFF FFF4
byte
0xFFFF FFFB
word
0xFFFF FFF8
byte
0xFFFF FFFC
word
0xFFFF FFFC
byte
0xFFFF FFFD
byte
0xFFFF FFFE
byte
0xFFFF FFFF
232 bytes
230 words
Computer Architecture
220
55
Jan2016
NKK-HUST
NKK-HUST
V d ton hng b nh
M C:
// A l mng cc phn t 32-bit
g = h + A[8];
n
a ch c s
Offset = 32
A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]
Jan2016
Computer Architecture
221
NKK-HUST
Jan2016
Computer Architecture
222
NKK-HUST
V d ton hng b nh
M C:
// A l mng cc phn t 32-bit
g = h + A[8];
n
n
a ch c s
Offset = 32
M hp ng MIPS:
# Ch s 8, do offset = 32
lw
$t0, 32($s3)
# $t0 = A[8]
add $s1, $s2, $t0 # g = h+A[8]
n
offset
A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]
M C:
a ch c s
A[12] = h + A[8];
n h $s2
n $s3 cha a ch c s ca mng A
Offset = 48
A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]
base register
Computer Architecture
223
Jan2016
Computer Architecture
224
56
Jan2016
NKK-HUST
NKK-HUST
M C:
Thanh ghi vi B nh
a ch c s
A[12] = h + A[8];
n h $s2
n $s3 cha a ch c s ca mng A
Offset = 48
M hp ng MIPS:
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)
Jan2016
# $t0 = A[8]
# $t0 = h+A[8]
# A[12]=h+A[8]
A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]
Computer Architecture
n
n
225
Jan2016
226
NKK-HUST
X l vi s nguyn
# $s3 = $s3+4
227
Jan2016
Computer Architecture
Jan2016
Computer Architecture
NKK-HUST
Computer Architecture
228
57
Jan2016
NKK-HUST
NKK-HUST
Hng s Zero
n
Khng th thay i gi tr
Jan2016
Computer Architecture
229
NKK-HUST
Jan2016
Computer Architecture
230
NKK-HUST
op
rs
rt
rd
shamt
funct
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
op
rs
rt
imm
6 bits
5 bits
5 bits
16 bits
op
address
6 bits
26 bits
n
Computer Architecture
rs
rt
rd
shamt
funct
5 bits
5 bits
5 bits
5 bits
6 bits
n
n
Lnh kiu J
op
6 bits
Cc trng ca lnh
n
Lnh kiu I
Jan2016
231
Jan2016
232
58
Jan2016
NKK-HUST
NKK-HUST
rs
rt
rd
shamt
funct
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
$s1
$s2
$t0
add
17
18
32
000000
10001
10010
01000
00000
100000
$t3
$t5
11
000000
01011
Jan2016
13
01101
$s0
16
10000
00000
rs: s hiu thanh ghi ngun (addi) hoc thanh ghi c s (lw, sw)
rt: s hiu thanh ghi ch (addi, lw) hoc thanh ghi ngun (sw)
rt, imm(rs)
# (rt) = mem[(rs)+SignExtImm]
34
sw
rt, imm(rs)
# mem[(rs)+SignExtImm] = (rt)
100010
233
Jan2016
Computer Architecture
234
NKK-HUST
V d m my ca lnh addi
+5 =
16-bit
+5 =
32-bit
-12 =
16-bit
1111
1111
1111
1111
1111
Computer Architecture
1111
1111
1111 0100
1111 0100
op
rs
rt
imm
6 bits
5 bits
5 bits
16 bits
-12 =
Jan2016
lw
imm
16 bits
NKK-HUST
rt
5 bits
sub
(0x016D8022)
Computer Architecture
rs
5 bits
(0x02324020)
op
6 bits
$s1
$s0
17
16
001000
10001
10000
$s2
$t1
(0x22300005)
-12
18
-12
001000
10010
01001
32-bit
(0x2249FFF4)
235
Jan2016
Computer Architecture
236
59
Jan2016
NKK-HUST
NKK-HUST
rs
rt
imm
6 bits
5 bits
5 bits
16 bits
lw $t0, 32($s3)
35
$s3
$t0
32
35
19
32
100011
10011
01000
j (jump) op = 000010
jal (jump and link) op = 000011
(0x8E680020)
sw $s1, 4($t1)
Jan2016
43
$t1
$s1
43
17
101011
01001
10001
Computer Architecture
(0xAD310004)
op
237
NKK-HUST
Jan2016
26 bits
Computer Architecture
238
NKK-HUST
5.4. C bn v lp trnh hp ng
1.
2.
3.
4.
5.
6.
7.
8.
Jan2016
address
6 bits
1. Cc lnh logic
Cc lnh logic
Np hng s vo thanh ghi
To cc cu trc iu khin
Lp trnh mng d liu
Chng trnh con
D liu k t
Lnh nhn v lnh chia
Cc lnh vi s du phy ng
Computer Architecture
239
Jan2016
Ton t
trong C
Lnh ca
MIPS
Shift left
<<
sll
Shift right
>>
srl
Bitwise AND
&
and, andi
Bitwise OR
or, ori
Bitwise XOR
xor, xori
Bitwise NOT
nor
Computer Architecture
240
60
Jan2016
NKK-HUST
NKK-HUST
$s2 1111
$s2 1111
M hp ng
1111
1111
Kt qu thanh ghi ch
1111
M hp ng
1111
Kt qu thanh ghi ch
$s3
or
$s4
or
$s4 1111
$s5
$s6
Jan2016
Computer Architecture
241
NKK-HUST
Jan2016
1111
1111
Computer Architecture
242
NKK-HUST
Zero-extended
M hp ng
1111
Zero-extended
Kt qu thanh ghi ch
M hp ng
Kt qu thanh ghi ch
andi $s2,$s1,0xFA34 $s2 0000 0000 0000 0000 0000 0000 0011 0100
ori
ori
$s3,$s1,0xFA34 $s3
$s3,$s1,0xFA34 $s3 0000 0000 0000 0000 1111 1010 1111 1111
xori $s4,$s1,0xFA34 $s4 0000 0000 0000 0000 1111 1010 1100 1011
Computer Architecture
243
Jan2016
Computer Architecture
244
61
Jan2016
NKK-HUST
NKK-HUST
n
n
n
Jan2016
Computer Architecture
NKK-HUST
rt
rd
shamt
funct
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
245
rs
i 0 thnh 1, v i 1 thnh 0
MIPS khng c lnh NOT, nhng c lnh NOR vi 3
ton hng
a NOR b == NOT ( a OR b )
nor $t0, $t1, $zero # $t0 = not($t1)
op
Jan2016
Computer Architecture
246
NKK-HUST
M my:
M my:
op
rs
rt
rd
shamt
funct
op
rs
rt
rd
shamt
funct
16
10
17
18
000000
00000
10000
01010
00100
000000
000000
00000
10001
10010
00010
(0x00105100)
$s0
0000
0000
0000
0000
0000
0000
0000
1101
$t2
0000
0000
0000
0000
0000
0000
1101
0000
000010
(0x00119082)
= 13
$s1
0000
0000
0000
0000
0000
0000
0101
0110
= 86
= 208
$s2
0000
0000
0000
0000
0000
0000
0001
0101
= 21
(13x16)
[86/4]
Computer Architecture
247
Jan2016
Computer Architecture
248
62
Jan2016
NKK-HUST
NKK-HUST
n
n
imm
16 bits
15
$s0
0x21A0
15
16
0x21A0
10000
001111
00000
(0x3C1021A0)
Ni dung $s0 sau khi lnh c thc hin:
$s0
Computer Architecture
249
NKK-HUST
0010
0001
Jan2016
1010 0000
0000
0000
Computer Architecture
0000
0000
250
NKK-HUST
rt
5 bits
Lnh m my
Jan2016
rs
5 bits
ori rt,rt,constant_low16bit
n
op
6 bits
3. To cc cu trc iu khin
# np 0x21A0 vo na cao
# ca thanh ghi $s0
ori $s0,$s0,0x403B
# np 0x403B vo na thp
# ca thanh ghi $s0
n
n
n
0011
or
1011
if
if/else
switch/case
Cc cu trc lp
n
Cc cu trc r nhnh
while
do while
for
251
Jan2016
Computer Architecture
252
63
Jan2016
NKK-HUST
NKK-HUST
Dch cu lnh if
n
n
j L1
n
253
NKK-HUST
Jan2016
Computer Architecture
254
NKK-HUST
Dch cu lnh if
M C:
if (i==j)
f = g+h;
f = f-i;
n f, g, h, i, j $s0, $s1, $s2, $s3, $s4
i == j ?
No
Yes
i = j
M C:
f = f - i
i j
else
if (i==j) f = g+h;
else f = g-h;
f = g + h
i == j ?
f = g + h
f = g - h
M hp ng MIPS:
# $s0 =
# $s3 =
bne
add
L1: sub
Jan2016
f = f - i
Jan2016
Yes
f = g + h
Lnh nhy j
n
No
i == j ?
branch on equal
nu (rs == rt) r nhnh n lnh nhn L1
M C:
if (i==j)
f = g+h;
f = f-i;
n f, g, h, i, j $s0, $s1, $s2, $s3, $s4
f, $s1 = g, $s2
i, $s4 = j
$s3, $s4, L1 #
$s0, $s1, $s2 #
$s0, $s0, $s3 #
= h
Computer Architecture
Nu i=j
th f=g+h
f=f-i
iu kin hp
ng ngc vi
iu kin ca
ngn ng bc
cao
255
Jan2016
Computer Architecture
256
64
Jan2016
NKK-HUST
NKK-HUST
M C:
i j
M C:
else
if (i==j) f = g+h;
else f = g-h;
n
i == j ?
f = g + h
f = g - h
M hp ng MIPS:
Else:
Exit:
bne
add
j
sub
$s3,$s4,Else
$s0,$s1,$s2
Exit
$s0,$s1,$s2
Jan2016
#
#
#
#
Nu i=j
th f=g+h
thot
nu i<>j th f=g-h
Computer Architecture
257
NKK-HUST
switch (amount) {
case 20:
fee = 2; break;
case 50:
fee = 3; break;
case 100: fee = 5; break;
default: fee = 0;
}
// tng ng vi s dng cc cu lnh if/else
if(amount = = 20) fee = 2;
else if (amount = = 50) fee = 3;
else if (amount = = 100) fee = 5;
else fee = 0;
Jan2016
Computer Architecture
NKK-HUST
M hp ng MIPS
Jan2016
258
M C:
i += 1;
A[i] == k?
i $s3, k $s5
a ch c s ca mng A $s6
Yes
while (A[i] == k)
# $t0 = 20
# amount == 20? if not, skip to case50
# if so, fee = 2
# and break out of case
n
n
No
i = i + 1
# $t0 = 50
# amount == 50? if not, skip to case100
# if so, fee = 3
# and break out of case
# $t0 = 100
# amount == 100? if not, skip to default
# if so, fee = 5
# and break out of case
# fee = 0
Computer Architecture
259
Jan2016
Computer Architecture
260
65
Jan2016
NKK-HUST
NKK-HUST
M C:
while (A[i] == k)
n
n
i += 1;
i $s3, k $s5
a ch c s ca mng A $s6
A[i] == k?
No
M C:
Yes
i = i + 1
M hp ng MIPS:
Loop: sll
$t1, $s3, 2
# $t1 = 4*i
add
# $t1 = a ch A[i]
lw
$t0, 0($t1)
# $t0 = A[i]
bne
# nu A[i]<>k th Exit
# nu A[i]=k th i=i+1
# quay li Loop
Loop
Exit: ...
Jan2016
Computer Architecture
261
NKK-HUST
Jan2016
M C:
= sum
$0, 0
$0, $0
$0, 10
$t0, done
$s1, $s0
$s0, 1
#
#
#
#
#
#
#
sum = 0
i = 0
$t0 = 10
Nu i=10, thot
Nu i<10 th sum = sum+i
tng i thm 1
quay li for
Computer Architecture
M hp ng MIPS:
# $s0 = i, $s1
addi $s1,
add
$s0,
addi $t0,
for: beq
$s0,
add
$s1,
addi $s0,
j
for
done: ...
Jan2016
262
NKK-HUST
Computer Architecture
263
Jan2016
Computer Architecture
264
66
Jan2016
NKK-HUST
NKK-HUST
n
n
Lnh slti
slti rt, rs, constant
n
n
So snh s c du v khng du
1 < +1 $t0 = 1
# unsigned
L1:
Jan2016
Computer Architecture
265
NKK-HUST
Jan2016
266
NKK-HUST
Computer Architecture
M C
M hp ng MIPS
int sum = 0;
int i;
for (i=1; i < 101; i = i*2) {
sum = sum + i;
}
loop:
# sum = 0
# i = 1
# t0 = 101
slt
# Nu i>= 101
beq
# th thot
add
sll
# nu i<101 th sum=sum+i
# i= 2*i
loop
# lp li
done:
Jan2016
Computer Architecture
267
Jan2016
Computer Architecture
268
67
Jan2016
NKK-HUST
NKK-HUST
V d v mng
Jan2016
Computer Architecture
Mng 5-phn t, mi
phn t c di 32-bit,
chim 4 byte trong b
nh
a ch c s =
0x12348000 (a ch ca
phn t u tin ca
mng array[0])
Bc u tin truy cp
mng: np a ch c s
vo thanh ghi
269
NKK-HUST
Jan2016
array[0]
0x12348004
array[1]
0x12348008
array[2]
0x1234800C
array[3]
0x12348010
array[4]
Computer Architecture
270
NKK-HUST
V d truy cp cc phn t
n
M C
int array[5];
array[0] = array[0] * 2;
array[1] = array[1] * 2;
n
M hp ng MIPS
lw
sll
sw
$t1, 0($s0)
$t1, $t1, 1
$t1, 0($s0)
# $t1 = array[0]
# $t1 = $t1 * 2
# array[0] = $t1
lw
sll
sw
$t1, 4($s0)
$t1, $t1, 1
$t1, 4($s0)
# $t1 = array[1]
# $t1 = $t1 * 2
# array[1] = $t1
Computer Architecture
M C
int array[1000];
int i;
for (i=0; i < 1000; i = i + 1)
array[i] = array[i] * 8;
# np a ch c s ca mng vo $s0
lui $s0, 0x1234
# 0x1234 vo na cao ca $S0
ori $s0, $s0, 0x8000
# 0x8000 vo na thp ca $s0
Jan2016
0x12348000
// gi s a ch c s ca mng = 0x23b8f000
n
M hp ng MIPS
271
Jan2016
Computer Architecture
272
68
Jan2016
NKK-HUST
NKK-HUST
M hp ng MIPS
# $s0 = array base address (0x23b8f000), $s1 = i
# khi to cc thanh ghi
lui $s0, 0x23b8
# $s0 = 0x23b80000
ori $s0, $s0, 0xf000
# $s0 = 0x23b8f000
addi $s1, $0, 0
# i = 0
addi $t2, $0, 1000
# $t2 = 1000
# vng lp
loop: slt $t0, $s1, $t2
# i < 1000?
beq $t0, $0, done
# if not then done
sll $t0, $s1, 2
# $t0 = i*4
add $t0, $t0, $s0
# address of array[i]
lw
$t1, 0($t0)
# $t1 = array[i]
sll $t1, $t1, 3
# $t1 = array[i]*8
sw
$t1, 0($t0)
# array[i] = array[i]*8
addi $s1, $s1, 1
# i = i + 1
j
loop
# repeat
done:
Jan2016
Computer Architecture
1.
2.
3.
4.
273
NKK-HUST
Jan2016
Computer Architecture
n
n
n
n
n
Jan2016
C th c ghi li bi th tc c gi
Cc lnh gi th tc
274
NKK-HUST
Cc bc yu cu:
a ch ca lnh k tip (a ch tr v) c ct
thanh ghi $ra
Nhy n a ch ca th tc
275
Jan2016
276
69
Jan2016
NKK-HUST
NKK-HUST
Gi th tc lng nhau
Minh ha gi Th tc
main
main
PC
jal
proc
Prepare
to call
Prepare
to continue
PC
jal
proc
abc
Prepare
to call
Prepare
to continue
abc
Procedure
abc
Save
xyz
Save, etc.
jal
xyz
jr
$ra
Procedure
xyz
Restore
Restore
jr
Jan2016
Computer Architecture
277
NKK-HUST
Jan2016
Computer Architecture
278
M hp ng MIPS
Th tc l l th tc khng c li gi th tc khc
M C:
leaf_example:
addi $sp,
sw
$t1,
sw
$t0,
sw
$s0,
add
$t0,
add
$t1,
sub
$s0,
add
$v0,
lw
$s0,
lw
$t0,
lw
$t1,
addi $sp,
jr
$ra
$ra
NKK-HUST
V d Th tc l
n
jr
$ra
Computer Architecture
279
Jan2016
$sp, -12
8($sp)
4($sp)
0($sp)
$a0, $a1
$a2, $a3
$t0, $t1
$s0, $zero
0($sp)
4($sp)
8($sp)
$sp, 12
#
#
#
#
#
#
#
#
#
#
#
#
#
Computer Architecture
to 3 v tr stack
ct ni dung $t1
ct ni dung $t0
ct ni dung $s0
$t0 = g+h
$t1 = i+j
$s0 = (g+h)-(i+j)
tr kt qu sang $v0
khi phc $s0
khi phc $t0
khi phc $t1
xa 3 mc stack
tr v ni gi
280
70
Jan2016
NKK-HUST
NKK-HUST
V d Th tc cnh
n
n
M hp ng MIPS
fact:
addi
sw
sw
slti
beq
addi
addi
jr
L1: addi
jal
lw
lw
addi
mul
jr
L th tc c gi th tc khc
M C:
int fact (int n)
{
if (n < 1) return (1);
else return n * fact(n - 1);
}
n Tham s n $a0
n Kt qu $v0
Jan2016
Computer Architecture
281
NKK-HUST
Jan2016
$a0, -1
0($sp)
4($sp)
$sp, 8
$a0, $v0
#
#
#
#
#
#
#
#
#
#
#
#
#
#
nu ng, kt qu l 1
ly 2 mc t stack
v tr v
nu khng, gim n
gi qui
khi phc n ban u
v a ch tr v
ly 2 mc t stack
nhn nhn kt qu
v tr v
282
NKK-HUST
6. D liu k t
n
$sp
Local
variables
z
y
..
.
Saved
registers
c
b
a
..
.
$fp
Frame for
current
procedure
Before calling
Old ($fp)
c
b
a
..
.
ASCII: 128 k t
n
Frame for
previous
procedure
95 k th hin th , 33 m iu khin
Latin-1: 256 k t
n
ASCII v cc k t m rng
Unicode: Tp k t 32-bit
c s dng trong Java, C++,
n Hu ht cc k t ca cc ngn ng trn th
gii v cc k hiu
n
After calling
Computer Architecture
Cc tp k t c m ha theo byte
n
Frame for
current
procedure
$fp
Jan2016
$sp, -8
4($sp)
0($sp)
$a0, 1
$zero, L1
$zero, 1
$sp, 8
Computer Architecture
$sp
$sp,
$ra,
$a0,
$t0,
$t0,
$v0,
$sp,
$ra
$a0,
fact
$a0,
$ra,
$sp,
$v0,
$ra
283
Jan2016
Computer Architecture
284
71
Jan2016
NKK-HUST
NKK-HUST
Cc thao tc vi Byte/Halfword
n
n
V d copy String
lb rt, offset(rs)
lh rt, offset(rs)
n M rng du thnh 32 bits trong rt
sb rt, offset(rs)
sh rt, offset(rs)
n Ch lu byte/halfword bn phi
Jan2016
Computer Architecture
285
NKK-HUST
Jan2016
Computer Architecture
Jan2016
286
NKK-HUST
V d Copy String
n
M C:
M hp ng MIPS
strcpy:
addi
sw
add
L1: add
lbu
add
sb
beq
addi
j
L2: lw
addi
jr
$sp,
$s0,
$s0,
$t1,
$t2,
$t3,
$t2,
$t2,
$s0,
L1
$s0,
$sp,
$ra
$sp, -4
0($sp)
$zero, $zero
$s0, $a1
0($t1)
$s0, $a0
0($t3)
$zero, L2
$s0, 1
0($sp)
$sp, 4
#
#
#
#
#
#
#
#
#
#
#
#
#
Computer Architecture
n
n
n
n
287
Jan2016
# Move from Hi to rd
# Move from LO to rd
Computer Architecture
288
72
Jan2016
NKK-HUST
NKK-HUST
Cc lnh vi s du phy ng
n
n
n
V d:
n
n
n
Computer Architecture
289
NKK-HUST
Computer Architecture
Hu ht cc ch r nhnh l r nhnh gn
op
rs
rt
imm
6 bits
5 bits
5 bits
16 bits
n
n
n
rs
rt
imm
5 bits
5 bits
16 bits
4 or 5
16
17
Exit
khong cch tng i tnh theo word
Lnh m my
PC-relative addressing
a ch ch = PC + hng s imm 4
Ch : trc PC c tng ln
Hng s imm 16-bit c gi tr trong di [-215 , +215 - 1]
Computer Architecture
op
6 bits
nh a ch tng i vi PC
n
Jan2016
290
NKK-HUST
bc1t, bc1f
VD:
bc1t TargetLabel
Jan2016
Jan2016
Cc lnh so snh
291
Jan2016
beq
000100
10000
10001
bne
000101
10000
10001
Computer Architecture
292
73
Jan2016
NKK-HUST
NKK-HUST
V d m lnh j v jr
j
jr
Cn m ha y a ch trong lnh
op
address
6 bits
26 bits
31
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
From PC
31
op
Computer Architecture
293
rs
20
rt
15
rd
10
sh
fn
0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ALU
instruction
NKK-HUST
Source
register
Jan2016
Unused
Unused
Unused
jr = 8
Computer Architecture
294
NKK-HUST
V d m ha lnh
R nhnh xa
n
$t1, $s3, 2
0x8000
19
add
0x8004
22
32
lw
$t0, 0($t1)
0x8008
35
bne
Loop: sll
0x800C
21
0x8010
19
19
0x8014
Loop
Computer Architecture
Nu ch r nhnh l qu xa m ha vi
offset 16-bit, assembler s vit li code
V d
beq $s0, $s1, L1
(lnh k tip)
...
L1:
0x2000
0x8018
Exit:
Jan2016
25
j=2
a ch ch = PC3128 : (address 4)
Jan2016
op
0 0 0 0 1 0
# nhy n v tr c nhn L1
# nhy n v tr c a ch $ra;
# $ra cha a ch tr v
L1
$ra
Jan2016
Computer Architecture
296
74
Jan2016
NKK-HUST
NKK-HUST
Tm tt v cc phng php nh a ch
2.10
117
1. Immediate addressing
1. nh a ch tc th
op
rs
rt
Immediate
2. Register addressing
2. nh a ch thanh ghi
op
rs
rt
rd
. . . funct
Registers
Register
3. Base addressing
op
3. nh a ch c s
rs
rt
Address
Register
Memory
+
Byte Halfword
Word
4. PC-relative addressing
op
rs
rt
4. nh a ch tng i vi PC
Address
PC
MARS
MipsIt
PCSpim
Memory
+
Word
5. Pseudodirect addressing
op
5. nh a ch gi trc tip
Address
PC
Memory
Word
Jan2016
Computer
FIGURE
2.18 Architecture
Illustration of the five MIPS addressing modes. The operands are shaded297
in color.
The operand of mode 3 is in memory, whereas the operand for mode 2 is a register. Note that versions of
load and store access bytes, halfwords, or words. For mode 1, the operand is 16 bits of the instruction itself.
Modes 4 and 5 address instructions in memory, with mode 4 adding a 16-bit address shifted left 2 bits to the
PC and mode 5 concatenating a 26-bit address shifted left 2 bits with the 4 upper bits of the PC. Note that a
single operation can use more than one addressing mode. Add, for example, uses both immediate (addi)
and register (add) addressing.
NKK-HUST
Jan2016
Hardware/
Software
Interface
Cc lnh (instructions)
D liu
n
Assembly Code
Assembler
Object File
Object Files
Library Files
Linker
298
NKK-HUST
Compiler
Computer Architecture
B nh:
n
Executable
Loader
Memory
Jan2016
Computer Architecture
299
Jan2016
Computer Architecture
300
75
Jan2016
NKK-HUST
NKK-HUST
Bn b nh ca MIPS
Address
V d: M C
Segment
int f, g, y;
variables
0xFFFFFFFC
Reserved
0x80000000
0x7FFFFFFC
Stack
int main(void)
{
f = 2;
g = 3;
y = sum(f, g);
return y;
}
Dynamic Data
0x10010000
Heap
0x1000FFFC
Static Data
0x10000000
0x0FFFFFFC
Text
0x00400000
0x003FFFFC
Reserved
0x00000000
Jan2016
Computer Architecture
301
NKK-HUST
Jan2016
Computer Architecture
302
NKK-HUST
// global
Bng k hiu
K hiu
$sp,
$ra,
$a0,
$a0,
$a1,
$a1,
sum
$v0,
$ra,
$sp,
$ra
$sp, -4
0($sp)
$0, 2
f
$0, 3
g
#
#
#
#
#
#
#
#
#
#
#
y
0($sp)
$sp, 4
stack frame
store $ra
$a0 = 2
f = 2
$a1 = 3
g = 3
call sum
y = sum()
restore $ra
restore $sp
return to OS
a ch
0x10000000
0x10000004
0x10000008
main
0x00400000
sum
0x0040002C
# $v0 = a + b
# return
Computer Architecture
303
Jan2016
Computer Architecture
304
76
necessary. Figure 6.33 shows the executable file. It has three sections:
the executable file header, the text segment, and the data segment.
The executable file header reports the text size (code size) and data size
(amount of globally declared data). Both are given in units of bytes. The
text segment gives the instructions in the order that they are stored in
memory.
The figure shows the instructions in human-readable format next to
the machine code for ease of interpretation, but the executable file
includes only machine instructions. The data segment gives the address
of each global variable. The global variables are addressed with respect
NKK-HUST
to the base address given by the global pointer, $gp. For example, the first
Jan2016
NKK-HUST
Text segment
Data segment
Text Size
Data Size
Memory
Reserved
0x7FFFFFFC
Stack
0x10010000
Heap
Address
Instruction
0x00400000
0x23BDFFFC
0x00400004
0xAFBF0000
sw
0x00400008
0x20040002
0x0040000C
0xAF848000
sw
0x00400010
0x20050003
0x00400014
0xAF858004
sw
0x00400018
0x0C10000B
jal
0x0040002C
0x0040001C
0xAF828008
sw
$v0, 0x8008($gp)
0x00400020
0x8FBF0000
lw
$ra, 0($sp)
0x00400024
0x23BD0004
0x03E00008
0x00400028
0x03E00008
jr
0x00851020
0x0040002C
0x00851020
0x00400030
0x03E00008
jr
Address
Data
0x0C10000B
0x10000000
0x20050003
0x10000004
0xAF848000
0x10000008
$ra, 0($sp)
$a0, 0x8000($gp)
$sp = 0x7FFFFFFC
$gp = 0x10008000
y
$a1, 0x8004($gp)
g
0x10000000
$ra
0x03E00008
0x23BD0004
$ra
0x8FBF0000
0xAF828008
0xAF858004
0x20040002
0xAFBF0000
0x00400000
0x23BDFFFC
PC = 0x00400000
Reserved
Jan2016
Computer Architecture
305
NKK-HUST
Jan2016
Computer Architecture
306
NKK-HUST
V d lnh gi (Pseudoinstruction)
Pseudoinstruction
Jan2016
MIPS Instructions
li $s0, 0x1234AA77
clear $t0
nop
Computer Architecture
Ht chng 5
307
Jan2016
Computer Architecture
308
77
Jan2016
NKK-HUST
NKK-HUST
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Chng 6
B X L
Jan2016
Computer Architecture
309
NKK-HUST
Jan2016
Computer Architecture
310
NKK-HUST
Ni dung ca chng 6
Nhim v ca CPU:
n
Jan2016
Computer Architecture
311
Jan2016
312
78
Jan2016
NKK-HUST
NKK-HUST
S cu trc c bn ca CPU
n
n v
s hc
v logic
(ALU)
n v
iu khin
(CU)
2. n v s hc v logic
Tp
thanh ghi
(RF)
bus bn trong
bus iu khin
Jan2016
bus d liu
bus a ch
Computer Architecture
313
NKK-HUST
Jan2016
Computer Architecture
314
NKK-HUST
M hnh kt ni ALU
D liu t
cc thanh ghi
Cc tn hiu
t n v
iu khin
3. n v iu khin
D liu n
cc thanh ghi
Chc nng
iu khin nhn lnh t b nh a vo CPU
n Tng ni dung ca PC tr sang lnh k tip
n Gii m lnh c nhn xc nh thao
tc m lnh yu cu
n Pht ra cc tn hiu iu khin thc hin lnh
n Nhn cc tn hiu yu cu t bus h thng v
p ng vi cc yu cu .
n
n v
s hc v logic
(ALU)
Thanh ghi c
Computer Architecture
315
Jan2016
Computer Architecture
316
79
Jan2016
NKK-HUST
NKK-HUST
M hnh kt ni n v iu khin
Cc tn hiu a n n v iu khin
n
n
Cc tn hiu
iu khin
bn trong CPU
Cc c
n v iu khin
Clock
Cc tn hiu
iu khin t
bus h thng
Cc tn hiu
iu khin n
bus h thng
Bus iu khin
Jan2016
Computer Architecture
317
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
n
n
iu khin b nh
iu khin cc m-un vo-ra
n
n
n
Jan2016
318
Computer Architecture
319
Jan2016
Nhn lnh
Gii m lnh
Nhn ton hng
Thc hin lnh
Ct ton hng
Ngt
Computer Architecture
320
80
Jan2016
NKK-HUST
NKK-HUST
Nhn lnh
Gii m
thao tc
lnh
Tnh
a ch
ton hng
Nhiu
ton
hng
Thao tc
d liu
Tnh
a ch
ton hng
Quay li vi d liu
String hoc Vector
Jan2016
Ct
ton hng
Nhiu
ton
hng
Tnh
a ch
ca lnh
Nhn lnh
Kim tra
ngt
C
ngt
n
Ngt
Khng
ngt
Computer Architecture
321
NKK-HUST
Jan2016
Computer Architecture
322
NKK-HUST
Gii m lnh
CPU
PC
B nh
n v
iu khin
IR
Jan2016
Computer Architecture
323
Jan2016
Computer Architecture
324
81
Jan2016
NKK-HUST
NKK-HUST
Nhn d liu t b nh
S m t nhn d liu t b nh
CPU
MAR
n v
iu khin
MBR
Jan2016
Computer Architecture
325
NKK-HUST
Jan2016
326
NKK-HUST
Jan2016
Computer Architecture
B nh
c/Ghi b nh
Vo/Ra
Chuyn gia cc thanh ghi
Php ton s hc/logic
Chuyn iu khin (r nhnh)
...
Computer Architecture
CPU a a ch ra bus a ch
327
Jan2016
Computer Architecture
328
82
Jan2016
NKK-HUST
NKK-HUST
Ngt
n
CPU
MAR
n
n v
iu khin
B nh
n
n
MBR
Jan2016
Computer Architecture
329
NKK-HUST
Jan2016
330
NKK-HUST
CPU
SP
MAR
PC
n v
iu khin
B nh
MBR
Jan2016
Computer Architecture
331
Jan2016
Computer Architecture
332
83
Jan2016
NKK-HUST
NKK-HUST
B nh vi chng trnh
(ROM) lu tr cc vi
Cc tn hiu
iu khin t
chng trnh
bus h thng
(microprogram)
Clock
Mt vi chng trnh bao Cc c
gm cc vi lnh
(microinstruction)
Mi vi lnh m ho cho
mt vi thao tc
(microoperation)
hon thnh mt lnh
cn thc hin mt hoc
mt vi vi chng trnh
Tc chm
Jan2016
2. n v iu khin ni kt cng
n
B gii m
Mch dy
B nh
vi chng trnh
n
n
Vi lnh
tip
theo
S dng mch
cng gii m
v to cc tn hiu
iu khin thc
hin lnh
Tc nhanh
Clock
n v iu khin
phc tp
Tn hiu iu
khin bn trong
CPU
T1
T2
..
.
..
.
n v
iu khin
Cc
c
Tn
... C
m-1
Cc tn hiu iu khin
Tn hiu iu
khin n bus h
thng
333
Jan2016
Computer Architecture
334
NKK-HUST
lnh 1
lnh 2
lnh 3
Jan2016
Mch
phn
chia thi
gian
C0 C1
NKK-HUST
B gii m
B gii m vi lnh
Computer Architecture
Computer Architecture
lnh 4
lnh 5
lnh 6
lnh 7
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
lnh 8
10
11
12
WB
Jan2016
Computer Architecture
336
84
Jan2016
NKK-HUST
NKK-HUST
Hazard cu trc
Jan2016
Computer Architecture
337
NKK-HUST
Computer Architecture
338
NKK-HUST
Computer Architecture
Jan2016
Jan2016
Hazard d liu
n
339
Jan2016
Computer Architecture
340
85
Jan2016
NKK-HUST
NKK-HUST
stall
stall
lw
lw
add
sw
lw
add
sw
$t1,
$t2,
$t3,
$t3,
$t4,
$t5,
$t5,
0($t0)
4($t0)
$t1, $t2
12($t0)
8($t0)
$t1, $t4
16($t0)
13 cycles
Jan2016
Computer Architecture
341
NKK-HUST
Jan2016
Computer Architecture
0($t0)
4($t0)
8($t0)
$t1, $t2
12($t0)
$t1, $t4
16($t0)
11 cycles
342
$t1,
$t2,
$t4,
$t3,
$t3,
$t5,
$t5,
NKK-HUST
Hazard iu khin
n
lw
lw
lw
add
sw
add
sw
Vi ng ng ca MIPS
Cn so snh thanh ghi v tnh a ch ch
sm trong ng ng
n Thm phn cng thc hin vic
trong cng on ID
n
Jan2016
Computer Architecture
343
Jan2016
Computer Architecture
344
86
Jan2016
NKK-HUST
NKK-HUST
D on r nhnh
n
Nhng ng ng di hn khng th
sm xc nh d dng kt qu r nhnh
n
Prediction
correct
D on kt qu r nhnh
n
Vi MIPS
n
n
Jan2016
C th d on r nhnh khng xy ra
Nhn lnh ngay sau lnh r nhnh (khng
lm tr)
Computer Architecture
Prediction
incorrect
345
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
c im ca ng ng
Tng s cng on ca ng ng
Lnh 1
Lnh 2
Lnh 3
Lnh 4
Lnh 5
Lnh 6
Cc dng hazard:
n
346
Lnh 2
Lnh 3
Lnh 4
Lnh 5
Lnh 6
Jan2016
Computer Architecture
347
Jan2016
Computer Architecture
348
87
Jan2016
NKK-HUST
NKK-HUST
Ht chng 6
Jan2016
Computer Architecture
349
NKK-HUST
Jan2016
350
NKK-HUST
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Chng 7
B NH MY TNH
Jan2016
Computer Architecture
Computer Architecture
351
Jan2016
Computer Architecture
352
88
Jan2016
NKK-HUST
NKK-HUST
Ni dung ca chng 7
Bn trong CPU:
B nh trong:
n
n
n
Computer Architecture
353
NKK-HUST
Jan2016
Computer Architecture
354
NKK-HUST
Cc c trng ca b nh (tip)
n
Cc c trng ca b nh (tip)
n v truyn
n
T nh
Khi nh
n
n
n
Jan2016
cc thit b lu tr
Dung lng
n
Jan2016
b nh chnh
b nh m (cache)
B nh ngoi:
n
tp thanh ghi
Computer Architecture
Kiu vt l
n
n
n
355
Jan2016
Computer Architecture
356
89
Jan2016
NKK-HUST
NKK-HUST
Cc c trng ca b nh (tip)
n
2. Phn cp b nh
Cc c tnh vt l
B vi x l
CPU
Tp
thanh
ghi
Cache
B nh
chnh
Thit b
lu tr
(HDD, SSD)
T chc
B nh
mng
Computer Architecture
357
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Cng ngh b nh
Thi gian
truy nhp
Gi thnh/GiB
(2012)
SRAM
0,5 2,5 ns
$500 $1000
DRAM
50 70 ns
$10 $20
5 20 ms
$0,75 $1
$0,05 $0,1
B nh l tng
n
n
Jan2016
Cng ngh
b nh
358
359
Jan2016
Computer Architecture
360
90
NKK-HUST
Jan2016
NKK-HUST
7.2. B nh chnh
1. B nh bn dn
Kiu b nh
Read Only Memory
(ROM)
B nh
ch c
Programmable ROM
(PROM)
Erasable PROM
(EPROM)
Electrically Erasable
PROM (EEPROM)
Kh nng xo
Tiu
chun
Khng xo
c
B nh
hu nh
ch c
Flash memory
Random Access
Memory (RAM)
B nh
c-ghi
Jan2016
C ch ghi
Tnh
kh bin
Mt n
n
n
Bng in
Khng
kh bin
bng in,
mc tng byte
Bng in
Computer Architecture
Kh bin
361
Jan2016
Computer Architecture
Jan2016
B nh Flash
ROM mt n:
n
362
NKK-HUST
Cc kiu ROM
bng in,
tng khi
NKK-HUST
B nh khng kh bin
Lu tr cc thng tin sau:
363
Jan2016
Computer Architecture
364
91
Jan2016
NKK-HUST
NKK-HUST
Jan2016
Computer Architecture
365
NKK-HUST
Jan2016
n
n
n
n
n
Jan2016
366
NKK-HUST
Computer Architecture
Cc bit c lu tr trn t in
cn phi c mch lm ti
Cu trc n gin
Dung lng ln
Tc chm hn
R tin hn
Dng lm b nh chnh
Computer Architecture
n
n
n
n
367
Jan2016
Ci tin tng tc
Synchronous DRAM (SDRAM): lm vic
c ng b bi xung clock
DDR-SDRAM (Double Data Rate SDRAM)
DDR3, DDR4
Computer Architecture
368
92
Jan2016
NKK-HUST
NKK-HUST
T chc ca chip nh
S c bn ca chip nh
Cc tn hiu ca chip nh
n
n
A0
D0
A1
D1
.
.
.
Chip nh
2n x m bit
.
.
.
An-1
n
Chip
Dm-1
m-bit
CS
WE
Jan2016
Cc ng a ch: An-1 A0 c 2n t nh
Cc ng d liu: Dm-1 D0 di t
nh = m bit
Dung lng chip nh = 2n x m bit
Cc ng iu khin:
package. The package includes 32 pins, which is one of the standard chip package
sizes. The pins support the following signal lines:
OE
Computer Architecture
369
NKK-HUST
Jan2016
NKK-HUST
T chc ca DRAM
The address of the word being accessed. For 1M words, a total of 20 (220 = 1M)
pins are needed (A0A19).
The data to be read out, consisting of 8 lines (D0D7).
Computer Architecture
The power supply to the chip (Vcc).
370
a RAM can be updated, the data pins are input/output. The write enable (WE)
and output enable (OE) pins indicate whether this is a write or read operation.
A19
32
Vcc
Vcc
24
Vss
A16
31
A18
D0
23
D3
A15
30
A17
D1
22
D2
A12
29
A14
WE
21
CAS
A7
28
A13
RAS
A6
27
A8
NC
A5
26
A9
A4
25
A3
9 32-Pin Dip 24
A2
10
A1
11
A0
12
D0
20
6 24-Pin Dip 19
OE
A10
18
A8
A11
A0
17
A7
Vpp
A1
16
A6
23
A10
A2
10
15
A5
22
CE
A3
11
14
A4
21
D7
Vcc
12 Top View 13
Vss
13
20
D6
D1
14
19
D5
D2
15
18
D4
Vss
16 Top View 17
D3
0.6"
Figure 5.4
Jan2016
Computer Architecture
371
Jan2016
0.6"
A9
Computer Architecture
372
93
Jan2016
NKK-HUST
NKK-HUST
Thit k m-un nh bn dn
n
n
Tng di t nh
VD1:
n Cho chip nh SRAM 4K x 4 bit
n Thit k m-un nh 4K x 8 bit
Gii:
n Dung lng chip nh = 212 x 4 bit
n chip nh c:
2n x
Thit k tng di t nh
Thit k tng s lng t nh
Thit k kt hp
n
n
m-un nh cn c:
n
n
Jan2016
Computer Architecture
373
NKK-HUST
Jan2016
12 chn a ch
4 chn d liu
12 chn a ch
8 chn d liu
Computer Architecture
374
NKK-HUST
S v d tng di t nh
A11A0
A11A0
D3 D0
D3 D0
CS
WE
A11A0
D7 D4
CS
OE
WE
D3 D0
OE
CS
WE
OE
Jan2016
Computer Architecture
375
Jan2016
Computer Architecture
376
94
Jan2016
NKK-HUST
NKK-HUST
Tng s lng t nh
S v d tng s lng t nh
VD2:
n Cho chip nh SRAM 4K x 8 bit
n Thit k m-un nh 8K x 8 bit
Gii:
12 x 8 bit
n Dung lng chip nh = 2
n chip nh c:
n
n
A11A0
A11A0
D7 D0
CS
A12
OE
D7 D0
A11A0
D7 D0
13 chn a ch
8 chn d liu
Computer Architecture
377
NKK-HUST
Jan2016
CS
Y0 Y1
WE
WE
OE
OE
Computer Architecture
378
NKK-HUST
B gii m 24
A
B
Jan2016
Y1
12 chn a ch
8 chn d liu
Jan2016
WE
Y0
B
gii m
1 2
Thit k kt hp
Y0
Y1
Y2
Y3
Y0
Y1
B
gii m
2 4 Y2
Y3
Computer Architecture
V d 3:
n Cho chip nh SRAM 4K x 4 bit
n Thit k m-un nh 8K x 8 bit
Phng php thc hin:
n Kt hp v d 1 v v d 2
n T v s thit k
379
Jan2016
Computer Architecture
380
95
Jan2016
NKK-HUST
NKK-HUST
2. Cc c trng c bn ca b nh chnh
n
n
n
Jan2016
Computer Architecture
381
NKK-HUST
Jan2016
Computer Architecture
382
NKK-HUST
0
1
2
3
4
5
6
. . .
AN-1 - A0
Bng 0
1
3
5
7
9
...
i
...
2i+1
AN-1 - A1
...
. . .
0
2
4
6
8
2i
...
BE1
D15 - D8
D7 - D0
Jan2016
Computer Architecture
383
Jan2016
Computer Architecture
BE0
D7 - D0
384
96
Jan2016
NKK-HUST
NKK-HUST
bng 2
3
7
11
15
19
...
...
Jan2016
BE2
D23-D16
BE1
BE0
D63-D56
385
NKK-HUST
Jan2016
BE0
D7 - D0
386
V d v thao tc ca cache
n
n
n
n
CPU
cache
B nh
chnh
truyn theo t nh
Jan2016
BE1
D15 - D8
NKK-HUST
8i
...
Computer Architecture
...
...
BE7
D7 - D0
0
8
16
8i+1
...
Computer Architecture
...
8i+7
AN-1 - A3
...
D15 - D8
bng 0
1
9
17
...
...
4i
...
bng 1
7
15
23
...
4i+1
...
BE3
D31-D24
0
4
8
12
16
...
4i+2
bng 7
bng 0
1
5
9
13
17
...
4i+3
AN-1 - A2
bng 1
2
6
10
14
18
Computer Architecture
387
Jan2016
Computer Architecture
388
97
Jan2016
NKK-HUST
NKK-HUST
B nh chnh
Tag
Cache
L0
L1
L2
L3
CPU
...
B0
B1
B2
B3
n
n
B nh chnh c 2N byte nh
B nh chnh v cache c chia thnh cc
khi c kch thc bng nhau
...
Bj
Li
...
...
Lm-1
BP-1
Jan2016
Computer Architecture
389
NKK-HUST
Computer Architecture
390
NKK-HUST
2. Cc phng php nh x
Mt s Block ca b nh chnh c np vo
cc Line ca cache
Ni dung Tag (th nh) cho bit Block no ca
b nh chnh hin ang c cha Line
Ni dung Tag c cp nht mi khi Block t
b nh chnh np vo Line
Khi CPU truy nhp (c/ghi) mt t nh, c
hai kh nng xy ra:
n
n
Jan2016
Jan2016
Computer Architecture
391
Jan2016
Computer Architecture
392
98
Jan2016
NKK-HUST
NKK-HUST
nh x trc tip
n
Mi Block ca b nh chnh ch c th c np
vo mt Line ca cache:
n
n
n
n
n
n
n
B0 L0
B1 L1
....
Bm-1 Lm-1
Bm L0
Bm+1 L1
....
n
Jan2016
Line
Word
T bit
L bit
W bit
L0
B0
L1
B1
...
...
Li
So snh
...
cache hit
Bj ch c th np vo L j mod m
m l s Line ca cache.
Bj
...
Lm-1
Bm-1
Computer Architecture
cache miss
393
NKK-HUST
Jan2016
Computer Architecture
394
NKK-HUST
n
n
Computer Architecture
n
n
Jan2016
Tag
B nh
chnh
Cache
Tng qut
n
Tag
Nbit a ch b nh
395
Jan2016
396
99
Jan2016
NKK-HUST
NKK-HUST
Mi Block c th np vo bt k Line
no ca cache
a ch ca b nh chnh chia thnh hai
trng:
Tag
N bit a ch b nh
Tag
Word
T bit
W bit
L0
B0
L1
B1
...
...
Trng Word
n Trng Tag dng xc nh Block ca
b nh chnh
n
B nh
chnh
Cache
Li
So snh
...
cache hit
Bj
...
Lm-1
Bm-1
cache miss
Jan2016
Computer Architecture
397
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Jan2016
n
n
n
nh x lin kt tp hp
398
t s dng
n
Computer Architecture
399
Jan2016
400
100
Jan2016
NKK-HUST
NKK-HUST
nh x lin kt tp hp (tip)
Tag
Nbit a ch b nh
Tag
Set
Word
T bit
S bit
W bit
nh x lin kt tp hp (tip)
Cache
B nh
chnh
L0
B0
L1
L2
S0
L3
L6
B2
B3
L4
L5
B1
B4
S1
L7
B5
n
...
...
So snh
Sk X
...
cache hit
...
Sv-1
n
n
cache miss
Jan2016
Computer Architecture
401
NKK-HUST
n
n
n
n
Computer Architecture
402
NKK-HUST
Vi nh x trc tip
Jan2016
Jan2016
V d v nh x a ch
n
n
n
n
n
nh x trc tip
nh x lin kt ton phn
nh x lin kt tp hp 4 ng
Computer Architecture
403
Jan2016
Line
Word
14 bit
13 bit
5 bit
Computer Architecture
404
101
Jan2016
NKK-HUST
NKK-HUST
Vi nh x lin kt tp hp 4 ng
B nh chnh = 4GiB = 232 byte N = 32 bit
n Line = 32 byte = 25 byte W = 5 bit
n S Line trong cache = 218/ 25 = 213 Line
n Mt Set c 4 Line = 22 Line
s Set trong cache = 213/ 22 = 211 Set
S = 11 bit
n S bit ca trng Tag s l: T = 32 - (11 + 5)
= 16 bit
n
n
n
n
Tag
Word
27 bit
5 bit
Tag
16 bit
Jan2016
Computer Architecture
405
NKK-HUST
Jan2016
Word
11 bit
5 bit
Computer Architecture
406
NKK-HUST
Vi nh x trc tip:
n Khng phi la chn
n Mi Block ch nh x vo mt Line xc
nh
n Thay th Block Line
n
n
n
Jan2016
Set
Computer Architecture
407
Jan2016
408
102
Jan2016
NKK-HUST
NKK-HUST
7.4. B nh ngoi
tc chm
n
n
ch ghi ra cache
tc nhanh
n
n
Bng t: t s dng
a t: a cng HDD (Hard Disk Drive)
a quang: CD, DVD
B nh Flash:
n
Jan2016
Computer Architecture
n
n
409
Jan2016
NKK-HUST
Computer Architecture
410
NKK-HUST
188
Tn ti di dng cc thit b lu tr
Cc kiu b nh ngoi
Tracks
Intersector gap
S6
SN
Surface 9
Platter
Surface 8
Surface 7
n
n
n
Dung lng ln
Tc c/ghi chm
Tn nng lng
D b li c hc
R tin
Surface 2
Surface 1
SN
S5
S5
Surface 4
Surface 3
Intertrack gap
S6
Direction of
arm motion
Surface 6
Surface 5
HDD
191
S1
S4
S4
S1
Surface 0
S2
Spindle
S2
Figure 6.5
S3
S3
Jan2016
Computer Architecture
Figure 6.2 depicts this data layout. Adjacent tracks are separated by gaps. This
prevents, or at least minimizes, errors due to misalignment of the head or simply
interference of magnetic fields.
Data are transferred to and from the disk in sectors (Figure 6.2). There are
typically hundreds of sectors per track, and these may be of either fixed or variable
length. In most contemporary systems, fixed-length sectors are used, with 512 bytes
being the nearly universal sector size. To avoid imposing unreasonable precision
requirements on the system, adjacent sectors are separated by intratrack (intersector) gaps.
A bit near the center of a rotating disk travels past a fixed point (such as a read
write head) slower than a bit on the outside. Therefore, some way must be found
to compensate for the variation in speed so that the head can read all the bits at the
same rate. This can be done by increasing the spacing between bits of information
recorded in segments of the disk. The information can then be scanned at the same
rate by rotating the disk at a fixed speed, known as the constant angular velocity
(CAV). Figure 6.3a shows the layout of a disk using CAV. The disk is divided into
a number of pie-shaped sectors and into a series of concentric tracks. The advantage of using CAV is that individual blocks of data can be directly addressed by
track and sector. To move the head from its current location to a specific address, it
only takes a short movement of the head to a specific track and a short wait for the
Boom
tracks that are of equal distance from the center of the disk. The set of all the tracks
in the same relative position on the platter is referred to as a cylinder. For example,
all of the shaded tracks in Figure 6.6 are part of one cylinder.
Finally, the head mechanism provides a classification of disks into three types.
Traditionally, the read-write head has been positioned a fixed distance above the
platter, allowing an air gap. At the other extreme is a head mechanism that actually
comes into physical contact with the medium during a read or write operation. This
mechanism is used with the floppy disk, which is a small, flexible platter and the
least expensive type of disk.
Jan2016
Computer Architecture
412
103
Jan2016
NKK-HUST
NKK-HUST
n
n
Jan2016
a quang
B nh bn dn flash
Khng kh bin
Tc nhanh
Tiu th nng lng t
Gm nhiu chip nh flash
v cho php truy cp song
song
t b li
t tin
Computer Architecture
n
n
n
413
Jan2016
Computer Architecture
414
NKK-HUST
DVD
n
NKK-HUST
CD (Compact Disc)
RAID 0, 1, 2
198
strip 0
strip 1
strip 2
strip 4
strip 5
strip 6
strip 3
strip 7
strip 8
strip 9
strip 10
strip 11
strip 12
strip 13
strip 14
strip 15
strip 0
strip 1
strip 2
strip 3
strip 0
strip 1
strip 2
strip 4
strip 5
strip 6
strip 7
strip 4
strip 5
strip 6
strip 3
strip 7
strip 8
strip 9
strip 10
strip 11
strip 8
strip 9
strip 10
strip 11
strip 12
strip 13
strip 14
strip 15
strip 12
strip 13
strip 14
strip 15
b2
b3
f0(b)
f1(b)
f2(b)
b0
b1
Computer Architecture
415
Jan2016
Computer Architecture
416
RAID Level 0
RAID level 0 is not a true member of the RAID family because it does not include
redundancy to improve performance. However, there are a few applications, such
as some on supercomputers in which performance and capacity are primary concerns and low cost is more important than improved reliability.
For RAID 0, the user and system data are distributed across all of the disks
in the array. This has a notable advantage over the use of a single large disk: If two
-different I/O requests are pending for two different blocks of data, then there is a
good chance that the requested blocks are on different disks. Thus, the two requests
can be issued in parallel, reducing the I/O queuing time.
But RAID 0, as with all of the RAID levels, goes further than simply distribut-
104
b0
b1
b2
b3
P(b)
Jan2016
(d) RAID 3 (Bit-interleaved parity)
NKK-HUST
NKK-HUST
RAID 3 & 4
block 1
block 2
block 3
block 4
block 5
block 6
block 7
P(4-7)
block 8
block 9
block 10
block 11
P(8-11)
block 12
block 13
block 14
block 15
P(12-15)
RAID 5 & 6
6.2 / RAID
b0
block 0
b1
b2
b3
199
P(0-3)
P(b)
block 0
block 1
block 2
block 3
P(0-3)
block 4
block 5
block 6
P(4-7)
block 7
block 8
block 9
P(8-11)
block 10
block 11
block 12
P(12-15)
block 13
block 14
block 15
P(16-19)
block 16
block 17
block 18
block 19
block 0
block 1
block 2
block 3
block 4
block 5
block 6
block 7
P(0-3)
P(4-7)
block 8
block 9
block 10
block 11
P(8-11)
block 12
block 13
block 14
block 15
P(12-15)
block 0
block 1
block 2
block 3
P(0-3)
Q(0-3)
block 4
block 5
block 6
P(4-7)
Q(4-7)
block 7
block 8
block 9
P(8-11)
Q(8-11)
block 10
block 11
block 12
P(12-15)
Q(12-15)
block 13
block 14
block 15
Jan2016
block 0
block 1
block 4
block 5
block 3
P(0-3)
block 6
P(4-7)
Computer Architecture
P(8-11)
block 10
block 2
block 7
block 8
block 9
block 12
P(12-15)
block 13
block 14
block 15
P(16-19)
block 16
block 17
block 18
block 19
417
block 11
second
Jan2016
418
NKK-HUST
NKK-HUST
nh x d liu ca RAID 0
block 0
block 1
block 2
block 3
200 block
CHAPTER
6 / EXTERNAL
MEMORY
4
block 5
block 6
block 8
block 9
Physical
P(8-11)
Logical
disk
block
12
strip 0
strip 0
strip 1
strip 4
strip 2
P(4-7)
Q(4-7)
block 7
block 10
block 11
Physical
disk 1block
strip 8
RAID Levels
(continued)
strip
12
13
Physical
diskblock
2
strip 2
14
Q(0-3)
Q(8-11)
strip 1
Figure
strip 3 6.8
P(0-3)
Physical
disk
3
block
15
strip 3
strip 5
strip 6
strip 7
strip 9
strip 10
strip 11
strip 13
strip 14
strip 15
strip 4
strip 5
second
on each disk; and so on. The advantage of this layout is that if a single
stripstrips
6
I/O request
consists of Array
multiple logically contiguous strips, then up to n strips for
strip 7
management
that request
can be handled
in parallel, greatly reducing the I/O transfer time.
strip 8
software
Figure
6.9 indicates the use of array management software to map between
strip
9
logical
and
strip
10 physical disk space. This software may execute either in the disk subsystem
11 computer.
or in strip
a host
strip 12
strip 13
strip 14
strip 15
Figure 6.9
RAID 0
Jan2016
FOR HIGH
I/O
Khi nim b nh o: gm b nh
chnh v b nh ngoi m c CPU
coi nh l mt b nh duy nht (b nh
chnh).
Cc k thut thc hin b nh o:
419
Jan2016
Computer Architecture
420
105
A physical address is an actual location in main memory. When the processor executes a process, it automatically converts from logical to physical address by adding
the current starting location of the process, called its base address, to each logical
address. This is another example of a processor hardware feature designed to meet
an OS requirement. The exact nature of this hardware feature depends on the memory management strategy in use. We will see several examples later in this chapter.
Jan2016
Paging
NKK-HUST
NKK-HUST
Phn trang
Both unequal fixed-size and variable-size partitions are inefficient in the use of
memory. Suppose, however, that memory is partitioned into equal fixed-size chunks
that are relatively small, and that each process is also divided into small fixed-size
chunks of some size. Then the chunks of a program, known as pages, could be
assigned to available chunks of memory, known as frames, or page frames. At most,
then, the wasted space in memory for that process is a fraction of the last page.
Figure 8.15 shows an example of the use of pages and frames. At a given point
in time, some of the frames in memory are in use and some are free. The list of free
frames is maintained by the OS. Process A, stored on disk, consists of four pages.
Jan2016
NKK-HUST
process. Within the program, each logical address consists of a page number and
a relative address within the page. Recall that in the case of simple partitioning, a
logical address is the location of Computer
a word relative
to the beginning of the program;
Architecture
the processor translates that into a physical address. With paging, the logicalto-physical address translation is still done by processor hardware. The processor
must know how to access the page table of the current process. Presented with a
logical address (page number, relative address), the processor uses the page table
to produce a physical address (frame number, relative address). An example is
shown in Figure 8.16.
This approach solves the problems raised earlier. Main memory is divided
into many small equal-size frames. Each process is divided into frame-size pages:
smaller processes require fewer pages, larger processes require more. When a
process is brought in, its pages are loaded into available frames, and a page table
is set up.
Process A
Page 0
Page 1
Page 2
Page 3
Logical
address
30
Physical
address
14
30
Page 3
of A
15
421
Jan2016
In
use
17
In
use
18
16
In
use
17
In
use
Process A
page table
18
Page 0
of A
19
In
use
18
In
use
13
14
15
20
(b) After
422
n
n
18
Li trang
Page 0
of A
16
17
15
Page 2
of A
Page 3
15 of A
Computer Architecture
13
14
14
NKK-HUST
16
18
14
(a) Before
n
13
Page 1
of A
Page 2
of A
13
Page 0
Page 1
Page 2
Page 3
20
Process A
19
Main
memory
13
13
15
Page 1
of A
Main
memory
Main
memory
qun
for each process. The page table shows the frame location for each page of the
n
Process A
page table
Jan2016
Computer Architecture
423
Jan2016
Computer Architecture
424
106
Jan2016
NKK-HUST
NKK-HUST
Tht bi
n
n
n
n
n
Li ch
n
n
Jan2016
Computer Architecture
425
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
B nh trn my tnh PC
8.3 / MEMORY MANAGEMENT
291
Virtual address
n bits
Page # Offset
n bits
Hash
function
m bits
Page #
Control
bits
Process
ID
Chain
2m ! 1
Inverted page table
(one entry for each
physical memory frame)
Figure 8.17
Jan2016
426
B nh chnh: Tn ti di dng cc
m-un nh RAM
Frame # Offset
m bits
Real address
inverted page table for each real memory page frame rather than one per virtual
Computer Architecture
page. Thus a fixed proportion of real memory is required for the tables regardless of
the number of processes or virtual pages supported. Because more than one virtual
address may map into the same hash table entry, a chaining technique is used for
managing the overflow. The hashing technique results in chains that are typically
shortbetween one and two entries. The page tables structure is called inverted
because it indexes page table entries by frame number rather than by virtual page
number.
Translation
Lookaside Buffer
Nguyn Kim Khnh
DCE-HUST
In principle, then, every virtual memory reference can cause two physical memory accesses: one to fetch the appropriate page table entry, and one to fetch the
desired data. Thus, a straightforward virtual memory scheme would have the effect
427
Jan2016
Computer Architecture
428
107
Jan2016
NKK-HUST
NKK-HUST
B nh trn PC (tip)
n
n
n
n
Jan2016
Ht chng 7
CMOS RAM:
n
429
NKK-HUST
Jan2016
430
NKK-HUST
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Chng 8
H THNG VO-RA
Nguyn Kim Khnh
Trng i hc Bch khoa H Ni
Jan2016
Computer Architecture
Computer Architecture
431
Jan2016
Computer Architecture
432
108
Jan2016
NKK-HUST
NKK-HUST
Ni dung ca chng 8
Computer Architecture
433
NKK-HUST
M-un
vo-ra
Thit b
vo-ra
M-un
vo-ra
Thit b
vo-ra
Computer Architecture
434
NKK-HUST
Thit b vo-ra
n
n
Nguyn tc hot ng
Tc
Khun dng d liu
n
n
Computer Architecture
435
Jan2016
Giao tip:
n
Jan2016
Cc thit b vo-ra
Cc m-un vo-ra
Jan2016
c im ca h thng vo-ra
n
Thit b
vo-ra
Vo d liu (Input)
Ra d liu (Output)
Jan2016
Bus
h
thng
Ngi - my
My - my
Computer Architecture
436
109
Jan2016
NKK-HUST
NKK-HUST
B
m
d
liu
B
chuyn
i
tn hiu
M-un vo-ra
n
D liu
n/t
bn ngoi
Chc nng:
n
n
n
Tn hiu
iu khin
Tn hiu
trng thi
Jan2016
Computer Architecture
437
NKK-HUST
Jan2016
Computer Architecture
438
NKK-HUST
Bus
n
Cc
ng
d liu
d liu
B m
d liu
Cng
vo
ra
Tn hiu
iu khin
n
Cc
ng
a ch
d liu
Khi logic
iu khin
Cc
ng
iu
khin
Cng
vo
ra
Tn hiu
iu khin
439
Jan2016
Cc b x l 680x0 ca Motorola
Cc b x l theo kin trc RISC: MIPS, ARM, ...
Tn hiu
trng thi
Computer Architecture
Hu ht cc b x l ch c mt khng gian a
ch chung cho c cc ngn nh v cc cng
vo-ra
n
Tn hiu
trng thi
Jan2016
iu khin v nh thi
Trao i thng tin vi CPU hoc b nh chnh
Trao i thng tin vi thit b vo-ra
m gia bn trong my tnh vi thit b vo-ra
Pht hin li ca thit b vo-ra
Khng gian a ch b nh
Khng gian a ch vo-ra
V d: Intel x86
Computer Architecture
440
110
Jan2016
NKK-HUST
NKK-HUST
N bit
000...000
000...001
000...010
000...011
000...100
000...101
Khng gian a
ch vo-ra
.
.
.
.
.
.
00...00
00...01
00...10
00...11
Vo-ra theo bn b nh
(Memory mapped IO)
Vo-ra ring bit
(Isolated IO hay IO mapped IO)
.
.
.
11...11
.
.
.
111...111
Jan2016
Computer Architecture
441
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
Vo-ra theo bn b nh
n
n
n
n
n
n
Jan2016
442
n
n
Cng 1: 0xFFFFFFF4
Cng 2: 0xFFFFFFF8
# a gi tr 0x41
# ra cng 1
443
Jan2016
Computer Architecture
444
111
Jan2016
NKK-HUST
NKK-HUST
n
n
Jan2016
Computer Architecture
445
NKK-HUST
Jan2016
Computer Architecture
446
NKK-HUST
BaCHAPTER
k thut
thc hin vo mt khi d liu
7 / INPUT/OUTPUT
230
Issue read
command to
I/O module
CPU
Read status
of I/O
module
I/O
Not
ready
Check
Status
Issue read
command to
I/O module
I/O
CPU
Read status
of I/O
module
Error
condition
Check
status
Ready
No
CPU
I/O
Do something
else
Interrupt
I/O
CPU
Error
condition
Issue read
block command
to I/O module
Read status
of DMA
module
CPU
DMA
Do something
else
Nguyn tc chung:
Interrupt
DMA
CPU
Next instruction
(c) Direct Memory Access
Ready
Read word
from I/O
module
I/O
Write word
into memory
CPU
CPU
Memory
Done?
Yes
Next instruction
(a) Programmed I/O
Jan2016
No
Read word
from I/O
module
I/O
Write word
into memory
CPU
CPU
Memory
Done?
c trng thi
m-un vo-ra
Sn sng ?
Y
Trao i d liu
Yes
Next instruction
(b) Interrupt-Driven I/O
Computer Architecture
Figure 7.4a gives an example of the use of programmed I/O to read in a block of
data from a peripheral device (e.g., a record from tape) into memory. Data are read
in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must
remain in a status-checking cycle until it determines that the word is available in the
I/O modules data register. This flowchart highlights the main disadvantage of this
technique: it is a time-consuming process that keeps the processor busy needlessly.
447
Jan2016
Computer Architecture
448
112
Jan2016
NKK-HUST
NKK-HUST
Cc lnh vo-ra
Tn hiu iu khin ghi (Write): yu cu mun vo-ra ly d liu trn bus d liu a n
b m d liu ri chuyn ra thit b vo-ra
Jan2016
Computer Architecture
449
NKK-HUST
Jan2016
Computer Architecture
450
NKK-HUST
c im
n
n
Jan2016
Computer Architecture
Nguyn tc chung:
CPU khng phi i trng thi sn sng
ca m-un vo-ra, CPU thc hin mt
chng trnh no
n Khi m-un vo-ra sn sng th n pht tn
hiu ngt CPU
n CPU thc hin chng trnh con x l ngt
vo-ra tng ng trao i d liu
n CPU tr li tip tc thc hin chng trnh
ang b ngt
n
451
Jan2016
Computer Architecture
452
113
Jan2016
NKK-HUST
NKK-HUST
Chng trnh
ang thc hin
lnh
lnh
Ngt y
n
Chng trnh con
x l ngt
lnh
lnh
lnh
lnh
lnh i
lnh
lnh i+1
lnh
...
RETURN
...
lnh
Jan2016
Computer Architecture
453
NKK-HUST
Jan2016
Computer Architecture
n
n
454
NKK-HUST
Lm th no xc nh c m-un
vo-ra no pht tn hiu ngt ?
CPU lm nh th no khi c nhiu yu
cu ngt cng xy ra ?
Jan2016
Computer Architecture
455
Jan2016
Computer Architecture
456
114
Jan2016
NKK-HUST
NKK-HUST
Nhiu ng yu cu ngt
Thanh
ghi
yu
cu
ngt
n
n
n
Computer Architecture
457
NKK-HUST
CPU
Jan2016
M-un
vo-ra
M-un
vo-ra
M-un
vo-ra
Mi m-un vo-ra c ni vi mt ng yu cu
ngt
CPU phi c nhiu ng tn hiu yu cu ngt
Hn ch s lng m-un vo-ra
Cc ng ngt c qui nh mc u tin
Computer Architecture
458
NKK-HUST
C
ngt
M-un
vo-ra
Jan2016
INTR2
INTR1
INTR0
CPU
Jan2016
INTR3
INTR
Bus d liu
M-un
vo-ra
M-un
vo-ra
M-un
vo-ra
C
ngt
M-un
vo-ra
CPU
459
Jan2016
INTR
INTA
M-un
vo-ra
M-un
vo-ra
Computer Architecture
M-un
vo-ra
M-un
vo-ra
460
115
Jan2016
NKK-HUST
NKK-HUST
Bus d liu
INTRn-1
Computer Architecture
n
n
461
M-un
vo-ra
M-un
vo-ra
462
NKK-HUST
Jan2016
M-un
vo-ra
Computer Architecture
INTR0
Jan2016
INTR1
M-un
vo-ra
NKK-HUST
PIC
INTA
n
Jan2016
...
INTR
CPU
463
Jan2016
Computer Architecture
Computer Architecture
464
116
Jan2016
NKK-HUST
NKK-HUST
S cu trc ca DMAC
Cc ng d liu
B m d liu
Thanh ghi a ch
Cc ng a ch
iu khin c
Yu cu bus
Chuyn nhng bus
iu khin ghi
Ngt
Jan2016
Logic iu khin
Yu cu DMA
Ghi
Computer Architecture
465
NKK-HUST
Jan2016
Computer Architecture
n
n
n
n
Jan2016
Vo hay Ra d liu
a ch thit b vo-ra (cng vo-ra tng ng)
a ch u ca mng nh cha d liu np vo
thanh ghi a ch
S t d liu cn truyn np vo b m d liu
466
NKK-HUST
Hot ng DMA
n
467
Jan2016
468
117
Jan2016
NKK-HUST
NKK-HUST
System Bus
CPU
CPU
I/O
Module
DMAC
. .
.
I/O
Module
Memory
I/O
Module
Computer Architecture
469
NKK-HUST
DMAC
I/O
Module
Memory
I/O
Module
Jan2016
. .
.
DMAC
Jan2016
Gia DMAC vi b nh
Computer Architecture
470
NKK-HUST
c im ca DMA
System Bus
CPU
DMAC
Memory
IO Bus
I/O
Module
I/O
Module
. .
.
I/O
Module
n
n
n
Jan2016
Gia DMAC vi b nh
Computer Architecture
471
Jan2016
Computer Architecture
472
118
Jan2016
NKK-HUST
NKK-HUST
4. B x l vo-ra
n
Jan2016
Computer Architecture
473
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
M-un
vo-ra
song
song
n
bus
h
thng
Ni ghp ni tip
n
bus
h
thng
n
thit b
vo-ra
n
n
n
Jan2016
474
n
n
475
Jan2016
M-un
vo-ra
ni tip
n
thit b
vo-ra
476
119
Jan2016
NKK-HUST
NKK-HUST
2. Cc cu hnh ni ghp
Thunderbolt
7.7 / THE EXTERNAL INTERFACE: THUNDERBOLT AND INFINIBAND
im ti im (Point to Point)
n
COMPUTER
Processor
Platform
controller
hub (PCH)
DisplayPort
Memory
Graphics
Subsystem
im ti a im (Point to Multipoint)
251
DisplayPort
PCIe x4
TC
Thunderbolt
controller
Thunderbolt
connector
Thunderbolt
20 Gbps (max)
Daisy
chain
TC
TC
Jan2016
Computer Architecture
477
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
478
Chng 9
CC KIN TRC SONG SONG
3
The term hot plug is defined as pulling out a component from a system and plugging in a new one while
the main power is still on. It allows an external drive, network adapter, or other peripheral to be plugged
in without having to power down the computer.
Ht chng 8
Jan2016
Computer Architecture
479
Jan2016
Computer Architecture
480
120
Jan2016
NKK-HUST
NKK-HUST
Ni dung ca chng 9
Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016
Computer Architecture
481
NKK-HUST
Jan2016
Computer Architecture
NKK-HUST
SISD
CU
n
n
n
n
n
n
n
Jan2016
482
Computer Architecture
483
Jan2016
IS
PU
DS
MU
484
121
Jan2016
NKK-HUST
NKK-HUST
SIMD
SIMD (tip)
PU1
CU
PU2
IS
DS
DS
LM1
LM2
n
.
.
.
PUn
DS
LMn
n
n
Jan2016
Computer Architecture
485
NKK-HUST
Jan2016
Computer Architecture
n
n
MIMD
n
n
Tp cc b x l
Cc b x l ng thi thc hin cc
dy lnh khc nhau trn cc d liu
khc nhau
Cc m hnh MIMD
n
n
Jan2016
486
NKK-HUST
MISD
n
Vector Computer
Array processor
Computer Architecture
487
Jan2016
Computer Architecture
488
122
Jan2016
NKK-HUST
NKK-HUST
a x l b nh dng chung
(shared memory mutiprocessors)
CU1
CU2
IS
IS
PU1
PU2
.
.
.
CUn
DS
CU1
DS
.
.
.
IS
Jan2016
PUn
B nh
dng
chung
CU2
CUn
489
NKK-HUST
Jan2016
.
.
.
IS
PUn
LM2
.
.
.
DS
Mng
lin
kt
hiu
nng
cao
LMn
490
SIMD
MIMD
Song song mc yu cu
n
Jan2016
pipeline
superscalar
DS
LM1
PU2
DS
NKK-HUST
PU1
Computer Architecture
IS
.
.
.
DS
Computer Architecture
IS
Cloud computing
Computer Architecture
491
Jan2016
Computer Architecture
492
123
598
CHAP. 8
Memory consistency is not a done deal. Researchers are still proposing new
Bi ging Kinmodels
trc
my
(Naeem
et al.,tnh
2011, Sorin et al., 2011, and Tu et al., 2010).
Jan2016
SMP (tip)
n
n
(a)
SEC. 8.3
(b)
(c)
SHARED-MEMORY MULTIPROCESSORS
Figure 8-26. Three bus-based multiprocessors. (a) Without caching. (b) With
caching. (c) With caching and private memories.
607
system is called CC-NUMA (at least by the hardware people). The software people often
because
basically
the memory,
same as software
If thecall
busitishardware
busy whenDSM
a CPU
wantsittois read
or write
the CPUdisjust
tributed
shared
but implemented
thethe
hardware
small
page size.
waits until
the memory
bus becomes
idle. Hereinby
lies
problemusing
with athis
design.
With
One
of
the
first
NC-NUMA
machines
(although
the
name
had
not
yet
been
two or three CPUs, contention for the bus will be manageable; with 32 or 64 it will
coined)
was the The
Carnegie-Mellon
illustrated
in the
simplified
formofintheFig.
be unbearable.
system will beCm*,
totally
limited by
bandwidth
bus,8-32
and
(Swan
al.,CPUs
1977).
consisted
of of
a collection
most ofetthe
willIt be
idle most
the time. of LSI-11 CPUs, each with some
Jan2016
Computer
Architecture
493
memory
a local
bus. to(The
a single-chip
of the
The addressed
solution isover
to add
a cache
eachLSI-11
CPU, was
as depicted
in Fig.version
8-26(b).
The
DEC
a minicomputer
popular
In on
addition,
the LSI-11
sys-or
cachePDP-11,
can be inside
the CPU chip,
next in
to the
the 1970s.)
CPU chip,
the processor
board,
tems
connected
a system
bus.many
When
a memory
cameout
intoofthe
some were
combination
of by
all three.
Since
reads
can nowrequest
be satisfied
the
(specially
modified)
MMU,
a check
to see
thesystem
word needed
was inmore
the
local cache,
there will
be much
lesswas
busmade
traffic,
andifthe
can support
local
If so, a isrequest
the localasbus
get the
If not,
CPUs.memory.
Thus caching
a big was
win sent
here.over
However,
wetoshall
see word.
in a moment,
the
request
routed
over the
bus toisthe
keeping
the was
caches
consistent
withsystem
one another
not system
trivial. containing the word,
NKK-HUST
which
responded.
Of iscourse,
the latter
muchin longer
than CPU
the former.
Yetthen
another
possibility
the design
of Fig.took
8-26(c),
which each
has not
While
program
happily
outmemory
of remote
memory,
it took over
10 times
longer
only aacache
but could
also arun
local,
private
which
it accesses
a dedicated
to
execute
than
the
same
program
running
out
of
local
memory.
(private) bus. To use this configuration optimally, the compiler should place all the
program text, strings, constants and other read-only data, stacks, and local variCPU
Memory
CPU The
Memory
CPU Memory
ables in the
private
memories.
shared memory
is then used CPU
only Memory
for writable
shared variables. In most cases, this careful placement will greatly reduce bus traffic, but it does require active cooperation from the compiler.
n
Jan2016
Local bus
Local bus
Local bus
B x l a li (multicores)
666
8-32.
A NUMA gian
machine based
two levels
of buses.
The Cm*
the CPU
CFigure
mt
khng
a onch
chung
cho
ttwasc
first multiprocessor to use this design.
n Mi CPU c th truy cp t xa sang b nh ca
Memory coherence is guaranteed in an NC-NUMA machine because no caching isCPU
present.khc
Each word of memory lives in exactly one location, so there is no
danger of one copy having stale data: there are no copies. Of course, it now matn
nhp
nh
t xamemory
chm
hnthetruy
nhppenalty
b
ters aTruy
great deal
whichb
page
is in which
because
performance
for being
in cc
the wrong
place is so high. Consequently, NC-NUMA machines use
nh
b
elaborate software to move pages around to maximize performance.
Program counter
Instruction fetch unit
Issue logic
Single-thread register file
Execution units and queues
L1 instruction cache
L1 data cache
L2 cache
(a) Superscalar
Typically, a daemon process called a page scanner runs every few seconds.
job is to examine the usage statistics
and move pages around in an attempt to 495
Computer Architecture
improve performance. If a page appears to be in the wrong place, the page scanner
unmaps it so that the next reference to it will cause a page fault. When the fault
occurs, a decision is made about where to place the page, possibly in a different
memory. To prevent thrashing, usually there is some rule saying that once a page
is placed, it is frozen in place for a time T . Various algorithms have been studied,
but the conclusion is that no one algorithm performs best under all circumstances
(LaRowe and Ellis, 1991). Best performance depends on the application.
Thay i ca b x
l:
Tun t
n Pipeline
n Siu v hng
n a lung
n a li: nhiu CPU
trn mt chip
Local bus
System bus
Its
Jan2016
494
NKK-HUST
MMU
Computer Architecture
Issue logic
Registers n
Bus
L1 instruction cache
L1 data cache
L2 cache
Processor n
(superscalar or SMT)
Cache
L1-I
L1-D
PC n
CPU
Register 1
CPU
Processor 3
(superscalar or SMT)
L1-I
L1-D
CPU
PC 1
CPU
Processor 2
(superscalar or SMT)
Processor 1
(superscalar or SMT)
CPU
L1-I
L1-D
CPU
Shared
memory
Private memory
Shared memory
L1-I
L1-D
NKK-HUST
L2 cache
(c) Multicore
Jan2016
Computer Architecture
496 to
For each of these innovations, designers have over the years attempted
increase the performance of the system by adding complexity. In the case of pipelining, simple three-stage pipelines were replaced by pipelines with five stages, and
then many more stages, with some implementations having over a dozen stages.
There is a practical limit to how far this trend can be taken, because with more
stages, there is the need for more logic, more interconnections, and more control
signals. With superscalar organization, increased performance can be achieved by
increasing the number of parallel pipelines. Again, there are diminishing returns as
the number of pipelines increases. More logic is required to manage hazards and
to stage instruction resources. Eventually, a single thread of execution reaches the
point where hazards and resource dependencies prevent the full use of the multiple
124
Intel has introduced a number of multicore products in recent years. In this section,
we look at two examples: the Intel Core Duo and the Intel Core i7-990X.
CPU Core n
CPU Core 1
CPU Core n
L1-D L1-I
L1-D L1-I
L1-D L1-I
L1-D L1-I
L2 cache
L2 cache
Main memory
I/O
L2 cache
I/O
Main memory
2006
Two x86 superscalar,
shared L2 cache
Dedicated L1 cache
per core
n
CPU Core 1
CPU Core n
L1-D L1-I
L1-D L1-I
CPU Core 1
CPU Core n
L1-D L1-I
L1-D L1-I
L2 cache
L2 cache
L2 cache
Main memory
Jan2016
Thermal control
APIC
APIC
2 MB L2 shared cache
SEC. 8.4
Main memory
I/O
I/O
(d ) Shared L3 cache
497
Bus interface
617
MESSAGE-PASSING MULTICOMPUTERS
32 kB 32 kB
L1-I L1-D
32 kB 32 kB
L1-I L1-D
32 kB 32 kB
L1-I L1-D
32 kB 32 kB
L1-I L1-D
32 kB 32 kB
L1-I L1-D
32 kB 32 kB
L1-I L1-D
256 kB
L2 Cache
256 kB
L2 Cache
256 kB
L2 Cache
256 kB
L2 Cache
256 kB
L2 Cache
256 kB
L2 Cache
Node
Memory
Local interconnect
Disk
and
I/O
Local interconnect
Disk
and
I/O
Communication
processor
High-performance interconnection network
DDR3 Memory
Controllers
QuickPath
Interconnect
3 ! 8B @ 1.33 GT/s
The general structure of the Intel Core i7-990X is shown in Figure 18.10. Each
core has its own dedicated L2 cache and the four cores share a 12-MB L3 cache.
Computer
Architecture
One mechanism Intel uses to make
its caches
more effective is prefetching, in which
the hardware examines memory access patterns and attempts to fill the caches speculatively with data thats likely to be requested soon. It is interesting to compare the
performance of this three-level on chip cache organization with a comparable twolevel organization from Intel. Table 18.1 shows the cache access latency, in terms of
clock cycles for two Intel multicore systems running at the same clock frequency.
The Core 2 Quad has a shared L2 cache, similar to the Core Duo. The Core i7
improves on L2 cache performance with the use of the dedicated L2 caches, and
provides a relatively high-speed access to the L3 cache.
The Core i7-990X chip supports two forms of external communications to
other chips. The DDR3 memory controller brings the memory controller for the
498
9.3. a x l b nh phn tn
12 MB
L3 Cache
Jan2016
Thermal control
L3 cache
Figure 18.8
32-kB L1 Caches
CPU Core 1
675
Execution
resources
Arch. state
Cc dng t chc b x l a li
Arch. state
NKK-HUST
Execution
resources
NKK-HUST
Jan2016
The Intel Core Duo, introduced in 2006, implements two x86 superscalar processors
with a shared L2 cache (Figure 18.8c).
The general structure of the Intel Core Duo is shown in Figure 18.9. Let us
consider the key elements starting from the top of the figure. As is common in multicore systems, each core has its own dedicated L1 cache. In this case, each core has
a 32-kB instruction cache and a 32-kB data cache.
Each core has an independent thermal control unit. With the high transistor
density of todays chips, thermal management is a fundamental capability, especially for laptop and mobile systems. The Core Duo thermal control unit is designed
to manage chip heat dissipation to maximize processor performance within thermal
constraints. Thermal management also improves ergonomics with a cooler system
and lower fan acoustic noise. In essence, the thermal management unit monitors
digital sensors for high-accuracy die temperature measurements. Each core can
be defined as an independent thermal zone. The maximum temperature for each
32-kB L1 Caches
499
Jan2016
500
125
Jan2016
NKK-HUST
NKK-HUST
Mng lin kt
SEC. 8.4
MESSAGE-PASSING MULTICOMPUTERS
619
n
(a)
(b)
n
n
(c)
(d)
Jan2016
624
PARALLEL
(e)
(f)
(g)
(h)
Figure 8-37. Various topologies. The heavy dots represent switches. The CPUs
Computer
and memories are not shown.
(a) A star.Architecture
(b) A complete interconnect. (c) A tree.
COMPUTER
ARCHITECTURES
CHAP. 8
(d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
501
Jan2016
Computer Architecture
Cluster
n
n
2-GB
DDR2
DRAM
(a)
Card
1 Chip
4 CPUs
2 GB
Board
32 Cards
32 Chips
128 CPUs
64 GB
Cabinet
32 Boards
1024 Cards
1024 Chips
4096 CPUs
2 TB
System
72 Cabinets
73728 Cards
73728 Chips
294912 CPUs
144 TB
(b)
(c)
(d)
(e)
Figure 8-39. The BlueGene/P: (a) chip. (b) card. (c) board. (d) cabinet.
(e) system.
The cards are mounted on plug-in boards, with 32 cards per board for a total of
32 chips (and thus 128 CPUs) per board. Since each card contains 2 GB of
DRAM, the boards contain 64 GB apiece. One board is illustrated in Fig. 8-39(c).
At the next level, 32 of these boards are plugged into a cabinet, packing 4096
CPUs into a single cabinet. A cabinet is illustrated in Fig. 8-39(d).
Finally, a full system, consisting of up to 72 cabinets with 294,912 CPUs, is
depicted in Fig. 8-39(e). A PowerPC 450 can issue up to 6 instructions/cycle, thus
Jan2016
Computer Architecture
502
NKK-HUST
Chip:
4 processors
8-MB L3 cache
H thng qui m ln
t tin: nhiu triu USD
Dng cho tnh ton khoa hc v cc bi
ton c s php ton v d liu rt ln
Siu my tnh
n
n
503
Jan2016
504
126
Jan2016
NKK-HUST
NKK-HUST
PC Cluster ca Google
SEC. 8.4
9.4. B x l ha a dng
635
MESSAGE-PASSING MULTICOMPUTERS
hold exactly 80 PCs and switches can be larger or smaller than 128 ports; these are
just typical values for a Google cluster.
OC-12 Fiber
OC-48 Fiber
n
n
128-port Gigabit
Ethernet switch
128-port Gigabit
Ethernet switch
Two gigabit
Ethernet links
n
80-PC rack
Jan2016
505
Jan2016
Computer Architecture
506
NKK-HUST
NKK-HUST
3. Optimize price/performance.
B x l ha trong my tnh
8 Streaming
processors
Jan2016
Computer Architecture
507
Jan2016
Computer Architecture
508
127
and CUDA cores and other execution units in the SM execute threads. The SM executes
threads in groups of 32 threads called a warp. While programmers can generally ignore warp
execution for functional correctness and think of programming one thread, they can greatly
improve performance by having threads in a warp execute the same code path and access
memory in nearby addresses.
Jan2016
NKK-HUST 512 CUDA cores are organized in 16 SMs of 32 cores each. The GPU has six 64-bit memory
NKK-HUST
NVIDIA Fermi
Instruction Cache
C 16 Streaming
Multiprocessors (SM)
Mi SM c 32 CUDA
cores.
Mi CUDA core
(Cumpute Unified
Device Architecture) c
01 FPU v 01 IU
Jan2016
509
Jan2016
Warp Scheduler
Dispatch Unit
Dispatch Unit
Core
Core
Core
Core
Core
Core
Core
Core
LD/ST
LD/ST
Warp Scheduler
SFU
LD/ST
LD/ST
SFU
SFU
SFU
Computer
Architecture
510
In GT200, the integer
ALU was
limited to 24-bit precision for multiply operations; as a result,
multi-instruction emulation sequences were required for integer arithmetic. In Fermi, the newly
designed integer ALU supports full 32-bit precision for all instructions, consistent with standard
programming language requirements. The integer ALU is also optimized to efficiently support
64-bit and extended precision operations. Various instructions are supported, including
Boolean, shift, move, compare, convert, bit-field extract, bit-reverse insert, and population
count.
16 Load/Store Units
NKK-HUST
Each SM has 16 load/store units, allowing source and destination addresses to be calculated
for sixteen threads per clock. Supporting units load and store the data at each address to
cache or DRAM.
8
8
Ht
Jan2016
Computer Architecture
511
128