Computer Architecture lecture slides

Lecture slides: Computer Architecture (Kiến trúc máy tính)
Version: CA-Jan2016
Hanoi University of Science and Technology (HUST)

Nguyễn Kim Khánh
Department of Computer Engineering (DCE)
School of Information and Communication Technology (SoICT)

Contact Information
- Address: 502-B1
- Mobile: 091-358-5533
- e-mail: khanhnk@soict.hust.edu.vn, khanh.nguyenkim@hust.edu.vn

Course objectives
- Students are equipped with foundational knowledge of instruction set architecture and computer organization, as well as the basic issues in computer design.
- After completing this course, students are able to:
  - Study the instruction set architecture of specific processors
  - Program in assembly language
  - Evaluate computer performance and improve the performance of programs
  - Operate and administer computer systems effectively
  - Analyze and design computers

Learning materials
- Lecture slides, Computer Architecture: CA-Jan2016
  download at: ftp://dce.hust.edu.vn/khanhnk/CA/
- Reference books:
  [1] William Stallings, Computer Organization and Architecture, 9th edition, 2013
  [2] David A. Patterson, John L. Hennessy, Computer Organization and Design, Revised 4th edition, 2012
  [3] David Money Harris, Sarah L. Harris, Digital Design and Computer Architecture, 2nd edition, 2013
  [4] Andrew S. Tanenbaum, Structured Computer Organization, 6th edition, 2013
- Assembly programming and simulation software for MIPS:
  MARS (MIPS Assembler and Runtime Simulator)
  download at: http://courses.missouristate.edu/KenVollmar/MARS/

Course contents
Chapter 1. Introduction
Chapter 2. The Basics of Digital Logic
Chapter 3. Computer Systems
Chapter 4. Computer Arithmetic
Chapter 5. Instruction Set Architecture
Chapter 6. The Processors
Chapter 7. Computer Memory
Chapter 8. Input-Output Systems
Chapter 9. Parallel Architectures

Chapter 1
INTRODUCTION

Contents of Chapter 1
1.1. Computers and computer classification
1.2. The concept of computer architecture
1.3. The evolution of computer technology
1.4. Computer performance

1.1. Computers and computer classification

- A computer is an electronic device that performs the following tasks:
  - Accepts input data,
  - Processes the data according to a sequence of instructions stored inside it,
  - Outputs data (information).
- The sequence of instructions held in memory that directs the computer to perform a specific task is called a program.
- A computer operates according to a program.

Simple model of a computer
[Figure: input devices (input data) feed the Central Processing Unit, which processes data and works with the main memory holding the program being executed; results go to the output devices (output data)]

Computer classification in the PC era
- Personal computers (Personal Computers)
  - Desktop computers, Laptop computers
  - General-purpose computers
- Servers
  - Used in networks to manage and provide services
  - High performance and high reliability
  - Cost: thousands to millions of USD
- Supercomputers
  - Used for high-end computation in science and engineering
  - Cost: millions to hundreds of millions of USD
- Embedded computers
  - Placed inside other devices
  - Designed for a dedicated purpose

Computer classification in the post-PC era
- Personal mobile devices (PMD - Personal Mobile Devices)
  - Smartphones, Tablets
  - Connected to the Internet
- Cloud computing
  - Uses warehouse-scale computers (Warehouse Scale Computers), consisting of very many interconnected servers
  - Companies rent a portion of them to provide software services
  - Software as a Service (SaaS): part of the software runs on the PMD, part runs on the Cloud
  - Examples: Amazon, Google

1.2. The concept of computer architecture

Computer architecture comprises:
- Instruction Set Architecture: the study of the computer as seen by the programmer.
- Computer Organization, or Microarchitecture: the study of the high-level design of the computer (CPU design, memory system, bus structure, ...).
- Hardware: the study of the detailed logic design and the packaging technology of the computer.
- The same instruction set architecture can be realized by many different products (organizations, hardware).

Layers of a computer
[Figure: layered view with application software on top, system software in the middle and hardware at the bottom, as seen by the user, the programmer and the system programmer]
- Application software
  - Written in a high-level language
- System software
  - Compiler: translates high-level language code into machine language
  - Operating System: schedules tasks and shares resources, manages memory and storage, controls input-output
- Hardware
  - Processor, memory, input-output modules

Levels of program code
- High-level language (HLL)
  - Level of abstraction close to the problem to be solved
  - Efficient and flexible
- Assembly language
  - Describes instructions in text form
- Machine language
  - Describes instructions as the hardware sees them
  - Instructions and data are encoded in binary

Below Your Program

High-level language program (in C):

    swap(int v[], int k)
    {int temp;
       temp = v[k];
       v[k] = v[k+1];
       v[k+1] = temp;
    }

Compiler

Assembly language program (for MIPS):

    swap:
          multi $2, $5,4
          add   $2, $4,$2
          lw    $15, 0($2)
          lw    $16, 4($2)
          sw    $16, 0($2)
          sw    $15, 4($2)
          jr    $31

Assembler

Binary machine language program (for MIPS):

    00000000101000100000000100011000
    00000000100000100001000000100001
    10001101111000100000000000000000
    10001110000100100000000000000100
    10101110000100100000000000000000
    10101101111000100000000000000100
    00000011111000000000000000001000

FIGURE 1.4 C program compiled into assembly language and then assembled into binary machine language. Although the translation from high-level language to binary machine language is shown in two steps, some compilers cut out the middleman and produce binary machine language directly. These languages and this program are examined in more detail in Chapter 2.

high-level programming language: A portable language such as C, C++, Java, or Visual Basic that is composed of words and algebraic notation that can be translated by a compiler into assembly language.

The recognition that a program could be written to translate a more powerful language into computer instructions was one of the great breakthroughs in the early days of computing. Programmers today owe their productivity, and their sanity, to the creation of high-level programming languages and compilers that translate programs in such languages into instructions. Figure 1.4 shows the relationships among these programs and languages, which are more examples of the power of abstraction.

Basic components of a computer
[Figure: CPU, main memory and input-output system connected by the system bus]
These components are the same for all types of computers:
- Central Processing Unit (CPU): controls the operation of the computer and processes data
- Main Memory: holds the programs being executed
- Input/Output system: exchanges information between the computer and the outside world
- System bus: connects the components and transports information between them

1.3. The evolution of computer technology

- Vacuum-tube computers (1950s)
  - ENIAC: the first computer (1946)
  - IAS: the von Neumann computer (1952)
- Transistor computers (1960s)
- Computers using SSI, MSI and LSI integrated circuits (1970s)
  - SSI - Small Scale Integration
  - MSI - Medium Scale Integration
  - LSI - Large Scale Integration
- Computers using VLSI integrated circuits (1980s)
  - VLSI - Very Large Scale Integration
- Computers using ULSI integrated circuits (1990s to the present)
  - ULSI - Ultra Large Scale Integration

The first computers: ENIAC and IAS
- ENIAC
  - Electronic Numerical Integrator and Computer
  - A project of the US Department of Defense
  - Designed by John Mauchly at the University of Pennsylvania
  - 30 tons
  - Processed numbers in decimal
- IAS
  - Built at the Princeton Institute for Advanced Studies
  - Designed by John von Neumann according to the stored-program concept
  - Processed numbers in binary
  - Became the basic model of the computer

Some typical digital integrated circuits
- Microprocessors
  - One or several CPUs fabricated on one chip
- Chipset
  - Integrated circuits that connect the components of the computer to one another
- Semiconductor Memory
  - ROM, RAM, Flash memory
- System on Chip (SoC) or Microcontrollers
  - Integrate the main components of a computer on one chip
  - Used mainly in smartphones, tablets and embedded computers

Computers today
[Figure: examples of today's computers: massive clusters on Gigabit Ethernet, clusters, routers, robots, cars, refrigerators, sensor nets]

The development of microprocessors
- 1971: the 4-bit Intel 4004 microprocessor
- 1972: 8-bit processors
- 1978: 16-bit processors
  - The IBM-PC personal computer was introduced in 1981
- 1985: 32-bit processors
- 2001: 64-bit processors
- 2006: multicore processors
  - Multiple CPUs on one chip

1.4. Computer performance

- Definition of performance P (Performance):
  Performance = 1/(execution time), that is, $P = 1/t$
- Computer A is k times faster than computer B when:
  $P_A / P_B = t_B / t_A = k$
- Example: the running time of a program is 10s on computer A and 15s on computer B
  $t_B / t_A = 15\,s / 10\,s = 1.5$
  So computer A is 1.5 times faster than computer B.

CPU clock rate
- In terms of time, the CPU operates according to a clock with a fixed rate.
- Clock period $T_0$: the duration of one clock cycle.
- Clock rate $f_0$ (clock frequency): the number of cycles in 1s.
  $f_0 = 1/T_0$
- Example: a processor with $f_0 = 4\,GHz = 4 \times 10^9\,Hz$ has
  $T_0 = 1/(4 \times 10^9) = 0.25 \times 10^{-9}\,s = 0.25\,ns$

CPU execution time
- For simplicity, we consider only the time the CPU spends executing the program (CPU time):
  CPU execution time = Number of clock cycles x Duration of one cycle
  $t_{CPU} = n \times T_0 = \dfrac{n}{f_0}$
  where n is the number of clock cycles.
- Performance is increased by:
  - Reducing the number of clock cycles n
  - Increasing the clock rate $f_0$
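
The relations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `clock_period`, `cpu_time` and `speedup` are assumptions made for this example and do not come from the slides.

```python
def clock_period(f0_hz):
    """Clock period T0 in seconds for a clock rate f0 in Hz (T0 = 1/f0)."""
    return 1.0 / f0_hz


def cpu_time(n_cycles, f0_hz):
    """CPU time in seconds: t = n * T0 = n / f0."""
    return n_cycles / f0_hz


def speedup(t_a, t_b):
    """Factor k by which machine A is faster than B: P_A/P_B = t_B/t_A."""
    return t_b / t_a


print(clock_period(4e9))    # period of a 4 GHz clock, in seconds (0.25 ns)
print(speedup(10.0, 15.0))  # 1.5
```

Both ways of improving performance named on the slide are visible in `cpu_time`: a smaller `n_cycles` or a larger `f0_hz` gives a smaller result.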

Example
- Two computers A and B run the same program.
- Computer A:
  - CPU clock rate: $f_A = 2\,GHz$
  - CPU time to execute the program: $t_A = 10\,s$
- Computer B:
  - CPU time to execute the program: $t_B = 6\,s$
  - The number of clock cycles when running the program on computer B ($n_B$) is 1.2 times the number of clock cycles when running it on computer A ($n_A$)
- Determine the clock rate required for computer B ($f_B$).

Example (continued)
We have:
$t = \dfrac{n}{f}$
Number of clock cycles when running the program on computer A:
$n_A = t_A \times f_A = 10\,s \times 2\,GHz = 20 \times 10^9$
Number of clock cycles when running the program on computer B:
$n_B = 1.2 \times n_A = 24 \times 10^9$
Clock rate required for computer B:
$f_B = \dfrac{n_B}{t_B} = \dfrac{24 \times 10^9}{6} = 4 \times 10^9\,Hz = 4\,GHz$

Instruction count and cycles per instruction

Number of clock cycles of a program:
Number of cycles = Instruction count of the program x Cycles per instruction
$n = IC \times CPI$
- n: number of clock cycles
- IC: number of instructions of the program (Instruction Count)
- CPI: number of cycles per instruction (Cycles per Instruction)

Hence the CPU execution time is:
$t_{CPU} = IC \times CPI \times T_0 = \dfrac{IC \times CPI}{f_0}$

When different instructions have different CPI values, the average CPI must be used.

Example
- Two computers A and B have the same instruction set architecture.
- Computer A has:
  - Clock period: $T_A = 250\,ps$
  - Average cycles per instruction: $CPI_A = 2.0$
- Computer B has:
  - Clock period: $T_B = 500\,ps$
  - Average cycles per instruction: $CPI_B = 1.2$
- Determine which computer is faster, and by how much.

Example (continued)
We have:
$t_{CPU} = IC \times CPI_{avg} \times T_0$
The two computers have the same instruction set architecture, so the instruction count of the same program is equal on both:
$IC_A = IC_B = IC$
Execution time of the program on computer A and on computer B:
$t_A = IC_A \times CPI_A \times T_A = IC \times 2.0 \times 250\,ps = IC \times 500\,ps$
$t_B = IC_B \times CPI_B \times T_B = IC \times 1.2 \times 500\,ps = IC \times 600\,ps$
From this we obtain:
$\dfrac{t_B}{t_A} = \dfrac{IC \times 600\,ps}{IC \times 500\,ps} = 1.2$
Conclusion: computer A is 1.2 times faster than computer B.

Average CPI
If different instruction classes take different numbers of cycles, the total number of cycles is:
$n = \sum_{i=1}^{K} (CPI_i \times IC_i)$
Average CPI:
$CPI_{avg} = \dfrac{n}{IC} = \dfrac{1}{IC} \sum_{i=1}^{K} (CPI_i \times IC_i)$

Example
The table below gives two instruction sequences that use instructions of classes A, B and C. Compute the average CPI.

    Instruction class            A     B     C
    CPI per class                1     2     3
    IC in sequence 1             20    10    20
    IC in sequence 2             40    10    10

- Sequence 1: instruction count = 50
  Number of cycles = 1x20 + 2x10 + 3x20 = 100
  $CPI_{avg} = 100/50 = 2.0$
- Sequence 2: instruction count = 60
  Number of cycles = 1x40 + 2x10 + 3x10 = 90
  $CPI_{avg} = 90/60 = 1.5$
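
The average-CPI computation for instruction classes A, B and C (CPI 1, 2 and 3) can be checked with a short Python sketch. The helper name `average_cpi` is an assumption made for this illustration.

```python
def average_cpi(cpi_per_class, ic_per_class):
    """Average CPI = total cycles / total instruction count."""
    total_cycles = sum(c * n for c, n in zip(cpi_per_class, ic_per_class))
    total_instructions = sum(ic_per_class)
    return total_cycles / total_instructions


cpi = [1, 2, 3]  # CPI of instruction classes A, B, C

print(average_cpi(cpi, [20, 10, 20]))  # sequence 1 -> 2.0
print(average_cpi(cpi, [40, 10, 10]))  # sequence 2 -> 1.5
```

Sequence 2 executes more instructions (60 against 50) but needs fewer cycles (90 against 100), which is why instruction count alone is not a reliable performance measure.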
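
Summary of performance

$\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$

CPU time = Instruction count of the program x Cycles per instruction x Seconds per cycle

$t_{CPU} = IC \times CPI \times T_0 = \dfrac{IC \times CPI}{f_0}$

Performance depends on:
- The algorithm
- The programming language
- The compiler
- The instruction set architecture
- The hardware

MIPS as a performance metric

MIPS: Millions of Instructions Per Second

$MIPS = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \dfrac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times CPI}{\text{Clock rate}} \times 10^6} = \dfrac{\text{Clock rate}}{CPI \times 10^6}$

$MIPS = \dfrac{f_0}{CPI \times 10^6}$

$CPI = \dfrac{f_0}{MIPS \times 10^6}$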

Example
Compute the MIPS rating of a processor with clock rate = 2GHz and CPI = 4.
- Clock period $T_0 = 1/(2 \times 10^9) = 0.5\,ns$
- CPI = 4, so the time to execute one instruction is 4 x 0.5ns = 2ns
- Number of instructions executed in 1s = $(10^9\,ns)/(2\,ns) = 5 \times 10^8$ instructions
- So the processor executes 500 MIPS

Example
Compute the CPI of a processor with clock rate = 1GHz and a rating of 400 MIPS.
- Clock period $T_0 = 1/10^9 = 1\,ns$
- Number of instructions executed in 1s is 400 MIPS = $4 \times 10^8$ instructions
- Time to execute one instruction = $1/(4 \times 10^8)\,s = 2.5\,ns$
- So CPI = 2.5

MFLOPS
- Used for large computing systems
- Millions of Floating Point Operations per Second

$MFLOPS = \dfrac{\text{Executed floating point operations}}{\text{Execution time} \times 10^6}$

- GFLOPS ($10^9$)
- TFLOPS ($10^{12}$)
- PFLOPS ($10^{15}$)

The great ideas in computer architecture
1. Design for Moore's Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy
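
As a quick check of the MIPS relations, MIPS = f0/(CPI x 10^6) and CPI = f0/(MIPS x 10^6), here is a minimal Python sketch. The function names `mips` and `cpi_from_mips` are assumptions chosen for this example.

```python
def mips(f0_hz, cpi):
    """MIPS rating from clock rate (Hz) and average CPI."""
    return f0_hz / (cpi * 1e6)


def cpi_from_mips(f0_hz, mips_rating):
    """Average CPI from clock rate (Hz) and MIPS rating."""
    return f0_hz / (mips_rating * 1e6)


print(mips(2e9, 4))             # 2 GHz, CPI = 4   -> 500.0
print(cpi_from_mips(1e9, 400))  # 1 GHz, 400 MIPS  -> 2.5
```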

End of Chapter 1

Chapter 2
THE BASICS OF DIGITAL LOGIC

Contents of Chapter 2
2.1. Basic number systems
2.2. Boolean algebra
2.3. Logic gates
2.4. Combinational circuits
2.5. Sequential circuits

2.1. Basic number systems

- Decimal system: used by people
- Binary system: used by computers
- Hexadecimal system: used as a compact notation for binary numbers

1. Decimal system
- Base 10
- 10 digits: 0,1,2,3,4,5,6,7,8,9
- n decimal digits can represent $10^n$ different values:
  - 00...000 = 0
  - 99...999 = $10^n - 1$

General form of a decimal number

$A = a_n a_{n-1} \dots a_1 a_0 . a_{-1} \dots a_{-m}$

The value of A is understood as:

$A = a_n 10^n + a_{n-1} 10^{n-1} + \dots + a_1 10^1 + a_0 10^0 + a_{-1} 10^{-1} + \dots + a_{-m} 10^{-m}$

$A = \sum_{i=-m}^{n} a_i 10^i$

Example of a decimal number

$472.38 = 4 \times 10^2 + 7 \times 10^1 + 2 \times 10^0 + 3 \times 10^{-1} + 8 \times 10^{-2}$

- Digits of the integer part:
  - 472 : 10 = 47 remainder 2
  - 47 : 10 = 4 remainder 7
  - 4 : 10 = 0 remainder 4
- Digits of the fractional part:
  - 0.38 x 10 = 3.8, integer part = 3
  - 0.8 x 10 = 8.0, integer part = 8

2. Binary system
- Base 2
- 2 binary digits: 0 and 1
- A binary digit is called a bit (binary digit)
- The bit is the smallest unit of information
- n bits can represent $2^n$ different values:
  - 00...000 = 0
  - 11...111 = $2^n - 1$
- The instructions of a program and the data in a computer are all encoded as binary numbers

Binary numbers

    Decimal    1-bit    2-bit    3-bit    4-bit
    0          0        00       000      0000
    1          1        01       001      0001
    2                   10       010      0010
    3                   11       011      0011
    4                            100      0100
    5                            101      0101
    6                            110      0110
    7                            111      0111
    8                                     1000
    9                                     1001
    10                                    1010
    11                                    1011
    12                                    1100
    13                                    1101
    14                                    1110
    15                                    1111

Data units in computers
- Byte = 8 bits
- Newer conventions for data units in computers:

    Decimal prefixes                     Binary prefixes
    Unit        Abbrev.   Value          Unit        Abbrev.   Value
    kilobyte    KB        10^3           kibibyte    KiB       2^10 = 1024
    megabyte    MB        10^6           mebibyte    MiB       2^20
    gigabyte    GB        10^9           gibibyte    GiB       2^30
    terabyte    TB        10^12          tebibyte    TiB       2^40
    petabyte    PB        10^15          pebibyte    PiB       2^50
    exabyte     EB        10^18          exbibyte    EiB       2^60

General form of a binary number

$A = a_n a_{n-1} \dots a_1 a_0 . a_{-1} \dots a_{-m}$, with $a_i$ = 0 or 1

The value of A is computed as:

$A = a_n 2^n + a_{n-1} 2^{n-1} + \dots + a_1 2^1 + a_0 2^0 + a_{-1} 2^{-1} + \dots + a_{-m} 2^{-m}$

$A = \sum_{i=-m}^{n} a_i 2^i$

Example of a binary number

Bit positions: 6 5 4 3 2 1 0 . -1 -2 -3 -4

$1101001.1011_{(2)} = 2^6 + 2^5 + 2^3 + 2^0 + 2^{-1} + 2^{-3} + 2^{-4}$
$= 64 + 32 + 8 + 1 + 0.5 + 0.125 + 0.0625$
$= 105.6875_{(10)}$

Converting a decimal integer to binary
- Method 1: divide repeatedly by 2 and take the remainders
- Method 2: decompose the number into a sum of powers $2^i$ (faster)

Method of repeated division by 2
- Example: convert 105(10)

    105 : 2 = 52   remainder 1
     52 : 2 = 26   remainder 0
     26 : 2 = 13   remainder 0
     13 : 2 =  6   remainder 1
      6 : 2 =  3   remainder 0
      3 : 2 =  1   remainder 1
      1 : 2 =  0   remainder 1

- Write the remainders in the direction of the arrow, from the last remainder to the first.
- Result: 105(10) = 1101001(2)

Method of decomposition into a sum of powers $2^i$
- Example 1: convert 105(10)
  $105 = 64 + 32 + 8 + 1 = 2^6 + 2^5 + 2^3 + 2^0$

    2^7    2^6    2^5    2^4    2^3    2^2    2^1    2^0
    128    64     32     16     8      4      2      1
    0      1      1      0      1      0      0      1

  Result: 105(10) = 0110 1001(2)
- Example 2: $17000_{(10)} = 16384 + 512 + 64 + 32 + 8 = 2^{14} + 2^9 + 2^6 + 2^5 + 2^3$
  17000(10) = 0100 0010 0110 1000(2), with bit positions numbered 15 (leftmost) down to 0 (rightmost)
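
Both integer conversion methods, repeated division by 2 and decomposition into powers of 2, can be sketched in Python as follows. The function names are assumptions made for this illustration.

```python
def to_binary_by_division(value):
    """Method 1: divide repeatedly by 2; read the remainders last to first."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 2)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def powers_of_two(value):
    """Method 2: exponents i such that value is the sum of 2**i (largest first)."""
    exponents = []
    i = value.bit_length() - 1  # exponent of the largest power of 2 <= value
    while value > 0:
        if value >= 2 ** i:
            exponents.append(i)
            value -= 2 ** i
        i -= 1
    return exponents


print(to_binary_by_division(105))  # 1101001
print(powers_of_two(105))          # [6, 5, 3, 0]
print(powers_of_two(17000))        # [14, 9, 6, 5, 3]
```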

Converting a decimal fraction to binary
- Example 1: convert 0.6875(10)

    0.6875 x 2 = 1.375    integer part = 1
    0.375  x 2 = 0.75     integer part = 0
    0.75   x 2 = 1.5      integer part = 1
    0.5    x 2 = 1.0      integer part = 1

- Write the integer parts in the direction of the arrow, from the first to the last.
- Result: 0.6875(10) = 0.1011(2)

Converting a decimal fraction to binary (continued)
- Example 2: convert 0.81(10)

    0.81 x 2 = 1.62    integer part = 1
    0.62 x 2 = 1.24    integer part = 1
    0.24 x 2 = 0.48    integer part = 0
    0.48 x 2 = 0.96    integer part = 0
    0.96 x 2 = 1.92    integer part = 1
    0.92 x 2 = 1.84    integer part = 1
    0.84 x 2 = 1.68    integer part = 1

- Result (approximate): 0.81(10) ≈ 0.1100111(2)

3. Hexadecimal system (Hexa)
- Base 16
- 16 digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
- Used as a compact notation for binary numbers: each group of 4 bits is replaced by one hexadecimal digit

Relationship between binary and hexadecimal numbers

    4-bit    Hexa    Decimal
    0000     0       0
    0001     1       1
    0010     2       2
    0011     3       3
    0100     4       4
    0101     5       5
    0110     6       6
    0111     7       7
    1000     8       8
    1001     9       9
    1010     A       10
    1011     B       11
    1100     C       12
    1101     D       13
    1110     E       14
    1111     F       15

Examples:
- 1011 0011(2) = B3(16)
- 0000 0000(2) = 00(16)
- 0010 1101 1001 1010(2) = 2D9A(16)
- 1111 1111 1111 1111(2) = FFFF(16)
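
The multiply-by-2 method for fractions and the 4-bit grouping rule for hexadecimal can be sketched in Python. The function names and the `max_bits` cut-off parameter are assumptions introduced for this example; a cut-off is needed because a fraction such as 0.81 has no finite binary expansion.

```python
def fraction_to_binary(fraction, max_bits):
    """Multiply repeatedly by 2; each integer part is the next binary digit."""
    bits = []
    for _ in range(max_bits):
        fraction *= 2
        bit = int(fraction)      # integer part: 0 or 1
        bits.append(str(bit))
        fraction -= bit          # keep only the fractional part
        if fraction == 0:
            break                # exact representation reached
    return "0." + "".join(bits)


def binary_to_hex(bit_string):
    """Replace each group of 4 bits by one hexadecimal digit."""
    bits = bit_string.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad on the left to a multiple of 4
    return "".join("0123456789ABCDEF"[int(bits[i:i + 4], 2)]
                   for i in range(0, len(bits), 4))


print(fraction_to_binary(0.6875, 8))         # 0.1011
print(fraction_to_binary(0.81, 7))           # 0.1100111
print(binary_to_hex("0010 1101 1001 1010"))  # 2D9A
```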
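
2.2. Boolean algebra

- Boolean algebra uses logic variables and logic operations.
- A logic variable can take the value 1 (TRUE) or 0 (FALSE).
- Basic logic operations: AND, OR and NOT
  - A AND B: $A \cdot B$ or $AB$
  - A OR B: $A + B$
  - NOT A: $\bar{A}$
  - Precedence: NOT > AND > OR
- Additional logic operations: NAND, NOR, XOR
  - A NAND B: $\overline{A \cdot B}$
  - A NOR B: $\overline{A + B}$
  - A XOR B: $A \oplus B = A \cdot \bar{B} + \bar{A} \cdot B$

Boolean operations on two variables

    A  B  | NOT A | A AND B | A OR B | A NAND B | A NOR B | A XOR B
    0  0  |   1   |    0    |   0    |    1     |    1    |    0
    0  1  |   1   |    0    |   1    |    1     |    0    |    1
    1  0  |   0   |    0    |   1    |    1     |    0    |    1
    1  1  |   0   |    1    |   1    |    0     |    0    |    0

NOT is an operation on one variable.

Identities of Boolean algebra

    $A \cdot B = B \cdot A$                                  $A + B = B + A$
    $A \cdot (B + C) = (A \cdot B) + (A \cdot C)$            $A + (B \cdot C) = (A + B) \cdot (A + C)$
    $1 \cdot A = A$                                          $0 + A = A$
    $A \cdot \bar{A} = 0$                                    $A + \bar{A} = 1$
    $0 \cdot A = 0$                                          $1 + A = 1$
    $A \cdot A = A$                                          $A + A = A$
    $A \cdot (B \cdot C) = (A \cdot B) \cdot C$              $A + (B + C) = (A + B) + C$
    $\overline{A \cdot B} = \bar{A} + \bar{B}$ (De Morgan's theorem)
    $\overline{A + B} = \bar{A} \cdot \bar{B}$ (De Morgan's theorem)

2.3. Logic Gates

- Logic gates implement the logic functions:
  - NOT, AND, OR, NAND, NOR, XOR
- One-input logic gate:
  - NOT gate
- Two-input gates:
  - AND, OR, XOR, NAND, NOR
- Multi-input gates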
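
Boolean identities such as De Morgan's theorem can be verified by exhaustive enumeration, since every variable only takes the values 0 and 1. The sketch below is one possible way to do this in Python; the helper `holds_for_all` and the gate function names are assumptions made for this illustration.

```python
from itertools import product


def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def NAND(a, b):
    return NOT(AND(a, b))


def NOR(a, b):
    return NOT(OR(a, b))


def XOR(a, b):
    # A XOR B = A.(NOT B) + (NOT A).B
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def holds_for_all(identity, n_vars):
    """True if the identity holds for every combination of n_vars bits."""
    return all(identity(*values) for values in product((0, 1), repeat=n_vars))


de_morgan_1 = holds_for_all(lambda a, b: NAND(a, b) == OR(NOT(a), NOT(b)), 2)
de_morgan_2 = holds_for_all(lambda a, b: NOR(a, b) == AND(NOT(a), NOT(b)), 2)
distributive = holds_for_all(
    lambda a, b, c: OR(a, AND(b, c)) == AND(OR(a, b), OR(a, c)), 3)

print(de_morgan_1, de_morgan_2, distributive)  # True True True
```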
The fundamental building block of all digital logic circuits is the gate. Logical functions are implemented by the interconnection of gates.
A gate is an electronic circuit that produces an output signal that is a simple Boolean operation on its input signals. The basic gates used in digital logic are
AND, OR, NOT, NAND, NOR, and XOR. Figure 11.1 depicts these six gates. Each
gate is defined in three ways: graphic symbol, algebraic notation, and truth table.
The symbology used in this chapter is from the IEEE standard, IEEE Std 91. Note
that the inversion (NOT) operation is indicated by a circle.
Each gate shown in Figure 11.1 has one or two inputs and one output.
However, as indicated in Table 11.1b, all of the gates except NOT can have more
than two inputs. Thus, (X + Y + Z) can be implemented with a single OR gate
with three inputs. When one or more of the values at the input are changed, the
correct output signal appears almost instantaneously, delayed only by the propagation time of signals through the gate (known as the gate delay). The significance of
this delay is discussed in Section 11.3. In some cases, a gate is implemented with two
outputs, one output being the negation of the other output.

Logic gate symbols

    Name    Algebraic function                    Truth table (F for A B = 00, 01, 10, 11)
    AND     $F = A \cdot B$ or $F = AB$           0 0 0 1
    OR      $F = A + B$                           0 1 1 1
    NOT     $F = \bar{A}$ or $F = A'$             F = 1 for A = 0, F = 0 for A = 1
    NAND    $F = \overline{AB}$                   1 1 1 0
    NOR     $F = \overline{A + B}$                1 0 0 0
    XOR     $F = A \oplus B$                      0 1 1 0

Figure 11.1 Basic Logic Gates

Here we introduce a common term: we say that to assert a signal is to cause a signal line to make a transition from its logically false (0) state to its logically true (1) state. The true (1) state is either a high or low voltage state, depending on the type of electronic circuitry.

Typically, not all gate types are used in implementation. Design and fabrication are simpler if only one or two types of gates are used. Thus, it is important to identify functionally complete sets of gates. This means that any Boolean function can be implemented using only the gates in the set. The following are functionally complete sets:
- AND, OR, NOT
- AND, NOT
- OR, NOT
- NAND
- NOR

Complete sets
- A complete set is a set of gates from which any logic function can be implemented using only gates of the set.
- Some examples of complete sets:
  - {AND, OR, NOT}
  - {AND, NOT}
  - {OR, NOT}
  - {NAND}
  - {NOR}

It should be clear that AND, OR, and NOT gates constitute a functionally complete set, because they represent the three operations of Boolean algebra. For the AND and NOT gates to form a functionally complete set, there must be a way to synthesize the OR operation from the AND and NOT operations. This can be done by applying DeMorgan's theorem:

$A + B = \overline{\bar{A} \cdot \bar{B}}$
A OR B = NOT ((NOT A) AND (NOT B))
Similarly, the OR and NOT operations are functionally complete because

they can be used to synthesize the AND operation.
Figure 11.2 shows how the AND, OR, and NOT functions can be implemented
solely with NAND gates, and Figure 11.3 shows the same thing for NOR gates.
For this reason, digital circuits can be, and frequently are, implemented solely with
NAND gates or solely with NOR gates.
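
To make the claim about NAND-only circuits concrete, the following Python sketch builds NOT, AND and OR from a single `nand` function and prints their truth tables. The function names are assumptions for this illustration; the constructions themselves are the standard ones (NOT A = A NAND A, and OR obtained through DeMorgan's theorem).

```python
def nand(a, b):
    """The only primitive gate used below."""
    return 1 - (a & b)


def not_from_nand(a):
    # NOT A = A NAND A
    return nand(a, a)


def and_from_nand(a, b):
    # A AND B = NOT (A NAND B)
    return not_from_nand(nand(a, b))


def or_from_nand(a, b):
    # A OR B = (NOT A) NAND (NOT B), by DeMorgan's theorem
    return nand(not_from_nand(a), not_from_nand(b))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_from_nand(a, b), or_from_nand(a, b))
```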

Using NAND gates and using NOR gates
[Figure 11.2 Some Uses of NAND Gates: NOT, AND and OR built from NAND gates only]
[Figure 11.3 Some Uses of NOR Gates: NOT, OR and AND built from NOR gates only]

With gates, we have reached the most primitive circuit level of computer hardware. An examination of the transistor combinations used to construct gates departs from that realm and enters the realm of electrical engineering. For our purposes, however, we are content to describe how gates can be used as building blocks to implement the essential logical circuits of a digital computer.
11.3 COMBINATIONAL CIRCUITS

A combinational circuit is an interconnected set of gates whose output at any time
is a function only of the input at that time. As with a single gate, the appearance of
the input is followed almost immediately by the appearance of the output, with only

Some logic integrated circuits
[Figure A.1 Common 74xx-series logic gates: pin diagrams of the 7400 NAND, 7402 NOR, 7404 NOT, 7408 AND, 7411 AND3, 7421 AND4, 7432 OR, 7474 FLOP and 7486 XOR chips]

2.4. Combinational circuits

- A logic circuit is a circuit that comprises:
  - Inputs
  - Outputs
  - A functional specification
  - A timing specification
- Types of logic circuits:
  - Combinational Circuits
    - Circuits without memory
    - The outputs are determined by the current values of the inputs
  - Sequential Circuits
    - Circuits with memory
    - The outputs are determined by the previous values and the current values of the inputs

Combinational circuits
- A combinational circuit is a logic circuit whose output depends only on the inputs at the present time.
- It is a circuit without memory and is implemented with logic gates.
- A combinational circuit can be defined in three ways:
  - Truth table
  - Diagram
  - Boolean equation

Example

Table 11.3 A Boolean Function of Three Variables

    Inputs       Output
    A  B  C      F
    0  0  0      0
    0  0  1      0
    0  1  0      1
    0  1  1      1
    1  0  0      0
    1  0  1      0
    1  1  0      1
    1  1  1      0

$F = \bar{A}B\bar{C} + \bar{A}BC + AB\bar{C}$

There are three combinations of input values that cause F to be 1, and if any one of these combinations occurs, the result is 1. This form of expression, for self-evident reasons, is known as the sum of products (SOP) form. Figure 11.4 shows a straightforward implementation with AND, OR, and NOT gates.

[Figure 11.4 Sum-of-Products Implementation of Table 11.3]

Another form can also be derived from the truth table. The SOP form expresses that the output is 1 if any of the input combinations that produce 1 is true. We can also say that the output is 1 if none of the input combinations that produce 0 is true. Thus,

$F = \overline{(\bar{A}\bar{B}\bar{C})} \cdot \overline{(\bar{A}\bar{B}C)} \cdot \overline{(A\bar{B}\bar{C})} \cdot \overline{(A\bar{B}C)} \cdot \overline{(ABC)}$

This can be rewritten using a generalization of DeMorgan's theorem:

$\overline{(X \cdot Y \cdot Z)} = \bar{X} + \bar{Y} + \bar{Z}$
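
The sum-of-products form can be derived mechanically from a truth table: one product term per row whose output is 1. The Python sketch below does this for the three-variable function of Table 11.3, whose output is 1 for the inputs 010, 011 and 110. The names `TRUTH_TABLE`, `sum_of_products` and `f_sop` are assumptions for this illustration, and a trailing apostrophe is used here to denote the complement of a variable.

```python
# Output F for every combination of the inputs (A, B, C), as in Table 11.3.
TRUTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}


def sum_of_products(table, names="ABC"):
    """Build the SOP expression: one product term per row with output 1."""
    terms = []
    for inputs, output in table.items():
        if output == 1:
            # A variable appears complemented (X') when its input value is 0.
            terms.append("".join(n if v else n + "'"
                                 for n, v in zip(names, inputs)))
    return " + ".join(terms)


def f_sop(a, b, c):
    """Evaluate F = A'BC' + A'BC + ABC' with bitwise operators."""
    na, nc = 1 - a, 1 - c
    return (na & b & nc) | (na & b & c) | (a & b & nc)


print(sum_of_products(TRUTH_TABLE))  # A'BC' + A'BC + ABC'
```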

$F = B(\bar{A} + \bar{C})$

Because the complement of the complement of a value is just the original value,

$F = B(\bar{A} + \bar{C}) = \overline{\overline{(\bar{A}B) + (B\bar{C})}}$

Applying DeMorgan's theorem,

$F = \overline{(\overline{\bar{A}B}) \cdot (\overline{B\bar{C}})}$

which has three NAND forms, as illustrated in Figure 11.11.

Multiplexers

The multiplexer connects multiple inputs to a single output. At any time, one of the inputs is selected to be passed to the output. A general block diagram representation is shown in Figure 11.12. This represents a 4-to-1 multiplexer. There are four input lines, labeled D0, D1, D2, and D3. One of these lines is selected to provide the output signal F. To select one of the four possible inputs, a 2-bit selection code is needed, and this is implemented as two select lines labeled S1 and S2.

An example 4-to-1 multiplexer is defined by the truth table in Table 11.7. This is a simplified form of a truth table. Instead of showing all possible combinations of input variables, it shows the output as data from line D0, D1, D2, or D3. Figure 11.13 shows an implementation using AND, OR, and NOT gates. S1 and S2 are connected to the AND gates in such a way that, for any combination of S1 and S2, three of the AND gates will output 0. The fourth AND gate will output the value of the selected line, which is either 0 or 1. Thus, three of the inputs to the OR gate are always 0, and the output of the OR gate will equal the value of the selected input gate. Using this regular organization, it is easy to construct multiplexers of size 8-to-1, 16-to-1, and so on.

Multiplexers are used in digital circuits to control signal and data routing. An example is the loading of the program counter (PC). The value to be loaded into the program counter may come from one of several different sources:

Multiplexer (MUX)
- $2^n$ data inputs
- n select inputs
- 1 data output
- Each combination of the select inputs (S) determines which data input (D) is connected to the output (F)

4-input multiplexer
[Figure 11.12 4-to-1 Multiplexer Representation: data inputs D0, D1, D2, D3, select inputs S2, S1, output F]

    Select inputs    Output
    S2  S1           F
    0   0            D0
    0   1            D1
    1   0            D2
    1   1            D3

[Figure 11.13 Multiplexer Implementation]

$F = D_0 \bar{S_2} \bar{S_1} + D_1 \bar{S_2} S_1 + D_2 S_2 \bar{S_1} + D_3 S_2 S_1$
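
The 4-to-1 multiplexer can be modelled directly from its sum-of-products form, in which each data input D0..D3 is ANDed with the select combination (S2, S1) that picks it. A minimal Python sketch, where the function name `mux4` is an assumption for this example:

```python
def mux4(d0, d1, d2, d3, s2, s1):
    """4-to-1 multiplexer: exactly one AND term can be non-zero."""
    ns2, ns1 = 1 - s2, 1 - s1  # complemented select lines
    return ((d0 & ns2 & ns1) | (d1 & ns2 & s1) |
            (d2 & s2 & ns1) | (d3 & s2 & s1))


data = (1, 0, 0, 1)  # example values on D0, D1, D2, D3
for s2 in (0, 1):
    for s1 in (0, 1):
        print(s2, s1, mux4(*data, s2, s1))
```

For every select combination, three of the four AND terms are forced to 0, so the OR simply passes the selected data bit to the output.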

Decoder
- N inputs, $2^N$ outputs ($n \to 2^n$)
- For each combination of the N inputs, only one output is active (different from the remaining outputs)
- Example: 2-to-4 decoder
[Figure: 2:4 decoder with inputs A1, A0 and outputs Y3 (11), Y2 (10), Y1 (01), Y0 (00)]

    A1  A0  |  Y3  Y2  Y1  Y0
    0   0   |  0   0   0   1
    0   1   |  0   0   1   0
    1   0   |  0   1   0   0
    1   1   |  1   0   0   0

Implementing a 3-to-8 decoder
[Figure 11.15 Decoder with 3 Inputs and $2^3 = 8$ Outputs: inputs A, B, C; outputs D0 (000), D1 (001), D2 (010), D3 (011), D4 (100), D5 (101), D6 (110), D7 (111)]

[Figure: address decoding; address lines A0 to A7 feed four 256 x 8 RAM chips, and a 2-to-4 decoder drives the Enable input of each chip]
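
An N-to-2^N decoder activates exactly the output whose index equals the binary value on its inputs. The following Python sketch models this behaviour; the function name `decoder` and the convention that inputs are listed most significant bit first are assumptions made for this example.

```python
def decoder(inputs):
    """Return the 2**N outputs [Y0, Y1, ...] for N input bits (MSB first)."""
    index = 0
    for bit in inputs:
        index = index * 2 + bit  # binary value of the input combination
    return [1 if i == index else 0 for i in range(2 ** len(inputs))]


print(decoder([1, 0]))     # A1=1, A0=0 -> [0, 0, 1, 0], only Y2 is active
print(decoder([1, 1, 1]))  # 3-to-8 decoder, input 111 -> only the last output is 1
```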

Adders
- 1-bit half-adder (Half-adder)
  - Adds two bits to produce a sum bit and a carry-out bit
- 1-bit full-adder (Full-adder)
  - Adds 3 bits
  - Allows an N-bit adder to be built

1-bit half-adder
[Figure: 1-bit half adder with inputs A, B and outputs S, Cout]

    Inputs    Outputs
    A  B      S  Cout
    0  0      0  0
    0  1      1  0
    1  0      1  0
    1  1      0  1

$S = A \oplus B$
$C_{out} = AB$

1-bit full-adder
[Figure: 1-bit full adder with inputs A, B, Cin and outputs S, Cout]

    Inputs       Outputs
    Cin  A  B    S  Cout
    0    0  0    0  0
    0    0  1    1  0
    0    1  0    1  0
    0    1  1    0  1
    1    0  0    1  0
    1    0  1    0  1
    1    1  0    0  1
    1    1  1    1  1

With C denoting the carry input Cin:
$S = \bar{A}\bar{B}C + \bar{A}B\bar{C} + A\bar{B}\bar{C} + ABC$
$C_{out} = AB + AC + BC$

Table 11.9 Binary Addition Truth Tables: (a) Single-Bit Addition, (b) Addition with Carry Input

However, addition can still be dealt with in Boolean terms. In Table 11.9a, we show the logic for adding two input bits to produce a 1-bit sum and a carry bit. This truth table could easily be implemented in digital logic. However, we are not interested in performing addition on just a single pair of bits. Rather, we wish to add two n-bit numbers. This can be done by putting together a set of adders so that the carry from one adder is provided as input to the next. A 4-bit adder is depicted in Figure 11.19.

For a multiple-bit adder to work, each of the single-bit adders must have three inputs, including the carry from the next-lower-order adder. The revised truth table appears in Table 11.9b. The two outputs can be expressed:

$Sum = \bar{A}\bar{B}C + \bar{A}B\bar{C} + A\bar{B}\bar{C} + ABC$
$Carry = AB + AC + BC$

Figure 11.20 is an implementation using AND, OR, and NOT gates.

[Figure 11.20 Implementation of an Adder]

4-bit adder and 32-bit adder
[Figure 11.19 4-Bit Adder: four 1-bit full adders with inputs A3 B3 ... A0 B0, carries C3 ... C0 chained from Cin to Cout, sums S3 ... S0 and an overflow signal]
[Figure 11.21 Construction of a 32-Bit Adder Using 8-Bit Adders: four 8-bit adders handling bits 31-24, 23-16, 15-8 and 7-0, with carries C23, C15 and C7 passed between them]

Thus we have the necessary logic to implement a multiple-bit adder such as shown in Figure 11.21. Note that because the output from each adder depends on the carry from the previous adder, there is an increasing delay from the least significant to the most significant bit. Each single-bit adder experiences a certain amount of gate delay, and this gate delay accumulates. For larger adders, the accumulated delay can become unacceptably high.

If the carry values could be determined without having to ripple through all the previous stages, then each single-bit adder could function independently, and delay would not accumulate. This can be achieved with an approach known as carry lookahead. Let us look again at the 4-bit adder to explain this approach.

We would like to come up with an expression that specifies the carry input to any stage of the adder without reference to previous carry values. We have
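
A multiple-bit adder built by chaining 1-bit full adders, with the carry of each stage fed to the next, can be sketched in Python as follows. The function names and the least-significant-bit-first list convention are assumptions for this illustration; the sum bit is written as a three-input XOR, which is logically equivalent to the four-term sum-of-products expression for the full-adder sum.

```python
def half_adder(a, b):
    """Return (sum, carry_out) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Return (sum, carry_out) for two input bits and a carry input."""
    s = a ^ b ^ c_in                           # 1 when an odd number of inputs is 1
    c_out = (a & b) | (a & c_in) | (b & c_in)  # AB + AC + BC
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists given least significant bit first."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


# 4-bit example: 0110 (6) + 0111 (7) = 1101 (13), no carry out
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))  # ([1, 0, 1, 1], 0)
```

The loop makes the accumulated delay visible: stage i cannot produce its result before stage i-1 has produced its carry.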

2.5. Sequential circuits

- A sequential circuit is a logic circuit whose output depends on the input values at the present time and on the input values at past times.
- It is a circuit with memory, implemented with memory elements (Latch, Flip-Flop), possibly combined with logic gates.

S-R Latch and the Flip-Flops
[Figure 11.22 The SR Latch Implemented with NOR Gates: inputs R and S, outputs Q and its complement]
[Figure 11.24 Clocked SR Flip-Flop]
[Figure 11.26 JK Flip-Flop]

... causing the output to be 1; if only the K input is asserted, the result is a reset function, causing the output to be 0. When both J and K are 1, the function performed is referred to as the toggle function: the output is reversed. Thus, if Q is 1 and 1 is applied to J and K, then Q becomes 0. The reader should verify that the implementation of Figure 11.26 produces this characteristic function.

The basic Flip-Flops

    Name    Inputs    Q(n+1)
    S-R     S  R
            0  0      Q(n)
            0  1      0
            1  0      1
            1  1      not allowed
    J-K     J  K
            0  0      Q(n)
            0  1      0
            1  0      1
            1  1      complement of Q(n)
    D       D
            0         0
            1         1

Figure 11.27 Basic Flip-Flops (name, graphical symbol with clock input Ck, truth table)

8-bit parallel register
[Figure: eight clocked flip-flops sharing one Clock line, with data lines D18 to D11]
First, let us show that the circuit is bistable. Assume that both S and R are 0
Figure 11.24 Clocked SR Flip-Flop
and that Q is 0. The inputs to the lower NOR gate are Q = 0 and S = 0. Thus, the
Clock
output Q = 1 means that the inputs to the upper NOR gate are Q = 1 and R = 0,
Q
K
which has the output Q = 0. Thus, the state of the circuit is internally consistent
Q
and remains stable as long as S = R = 0. A similar line of reasoning shows that the
Q
state Q = 1, Q = 0 is also stable for R = S = 0.
D
Thus, this circuit can function as a 1-bit memory. We can view the output Q as
Clock
Figure
11.25 D Flip-Flop
Clock
the value of the bit. The
inputs S and R serve to write the values 1 and 0, respectively, into memory. To see this, consider the state Q = 0, Q = 1, S = 0, R = 0.
This device is referred to as a clocked SR flip-flop. Note that the
Suppose that S changes to the value 1. Now the inputs to the lower NOR gatearrangement.
are
Q
R be
and S inputs are passed
to the NOR gates only during the clock pulse.
S = 1, Q = 0. After some time delay ^t, the output of the lower NOR gate
will
J
Q
D
Q = 0 (see Figure 11.23). So, at this point in time, the inputs to the upper NORD
gate
FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, S = 1
11.25
Flip-Flop
become R = 0,Figure
Q = 0.
AfterDanother
gate delay of ^t the output Q becomes 1. must
This be avoided. One way to do this is to allow just a single input. The D flip-flop
is again a stable state. The inputs to the lower gate are now S = 1, Q = 1, which
Figure 11.26 JK Flip-Flop
accomplishes this.
Figure 11.25 shows a gate implementation of the D flip-flop. By
maintain
the output
= 0. As
long as Sto=as1 aand
R = 0,
theflip-flop.
outputs will
arrangement.
ThisQdevice
is referred
clocked
SR
Noteremain
that the
using an inverter, the nonclock inputs to the two AND gates are guaranteed to be
QR
= and
1, QS=inputs
0. Furthermore,
if the
S returns
0, the
outputs
unchanged.
are passed to
NOR to
gates
only
duringwill
theremain
clock pulse.
the opposite of each other.
The R output performs the opposite function. When R goes to 1, it forces
causing
theDoutput
to be
1; if only the
K inputtoisas
asserted,
the
result isbecause
a reset function,
The
flip-flop
is sometimes
referred
the data
flip-flop
it is, in
D FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, Sof= 1
Q = 0, Q = 1 regardless of the previous state of Q and Q. Again, a time delay
causing
the output
tobit
beof0.data.
When
both
J and
K are
1, the function
performed
effect, storage
for one
The
output
of the
D flip-flop
is always equal
to the is
must be avoided. One way to do this is to allow just a single input. The D flip-flop
2^t occurs before the final state is established (Figure 11.23).
referred
to asvalue
the toggle
function:
output
is reversed.
Thus, and
if Q is
1 and 1 is
applied
mostBy
recent
applied
to the the
input.
Hence,
it remembers
produces
the
last
accomplishes this. Figure 11.25 shows a gate implementation of the D flip-flop.
The SR latch can be defined with a table similar to a truth table, called
a
to
J and
K,also
thenreferred
Q becomes
0. The
reader
should
verifyitthat
theaimplementation
using an inverter, the nonclock inputs to the two AND gates are guaranteed
to
beIt is
input.
to as the
delay
flip-flop,
because
delays
0 or 1 applied toof
characteristic
table,
which
shows
the
next
state
or
states
of
a
sequential
circuit
as
Figure
11.26
produces
this characteristic
function.
Jan2016
Computer
Architecture
the
opposite of each other.
its
input
for
a
single
clock
pulse.
We
can
capture
the
logic
of
the
D
flip-flop
in
the83
a function of current states and inputs. In the case of the SR latch, the state can
The D flip-flop is sometimes referred to as the data flip-flop because following
it is, in truth table:
be defined by the value of Q. Table 11.10a shows the resulting characteristic table.
effect, storage for one bit of data. The output of the D flip-flop is always equal to the
Observe that the inputs S = 1, R = 1 are not allowed, because these would proD
Q
n
!
1
most recent value applied to the input. Hence, it remembers and produces the last
duce an inconsistent output (both Q and Q equal 0). The table can be expressed
input. It is also referred to as the delay flip-flop, because it delays a 0 or 1 applied to
Name
Graphical Symbol
Truth Table
0
0
more compactly, as in Table 11.10b. An illustration of the behavior of the SR latch
its input for a single clock pulse. We can capture the logic of the D flip-flop in the
1
1
is shown in Table 11.10c.
following truth table:
S
Q
S
R
Qn!1
CLOCKED SR FLIP-FLOP The output of the SR latch changes, after a brief
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
D input.
Qn !This
1
time delay, in response to a change in the
is referred to as asynchronous
0
0
it has two inputs. However, in this case all possible combinations
ofQinput
values are
n
operation. More typically, events in the digital
computer
are synchronized to a clock
Ck
0
0
SR
0and Figure 11.27
1
0
flip-flop,
pulse, so that changes occur only when a1clock 1pulse occurs. Figure 11.24 showsvalid.
this Figure 11.26 shows a gate implementation of the JK
shows its characteristic table (along with those for the1 SR 0and D1flip-flops). Note
1
1
Q same as for the
that the first three combinationsRare the
SR flip-flop. With no input
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
asserted, the output is stable. If only the J input is asserted, the result is a set function,
it has two inputs. However, in this case all possible combinations of input values are
valid. Figure 11.26 shows a gate implementation of the JK flip-flop, and Figure 11.27
J
Q
J
K
Qn!1
shows its characteristic table (along with those for the SR and D flip-flops). Note
0
0
Qn
that the first three combinations are the same as for the SR flip-flop. With no input
Ck
D Flip Flop
Clock
Load
D08
JK
D06
D05
D04
D03
D02
D01
Output lines
Figure 11.28 8-Bit Parallel Register
Registers
J-K Flip-Flop
D07
Jan2016
As an example of the use of flip-flops, let us first examine one of the essential elements of the CPU: the register. As we know, a register is a digital circuit used within
the CPU to store one or more bits of data. Two basic types of registers are commonly used: parallel registersComputer
and shift
registers.
Architecture
84
PARALLEL REGISTERS A parallel register consists of a set of 1-bit memories that
can be read or written simultaneously. It is used to store data. The registers that we
have discussed throughout this book are parallel registers.
The 8-bit register of Figure 11.28 illustrates the operation of a parallel register
using D flip-flops. A control signal, labeled load, controls writing into the register
from signal lines, D11 through D18. These lines might be the output of multiplexers,
so that data from a variety of sources can be loaded into the register.
SHIFT REGISTER A shift register accepts and/or transfers information serially.
Consider, for example, Figure 11.29, which shows a 5-bit shift register constructed
from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each
21
Thus, a register made up of n flip-flops can count up to 2 - 1. An example of a

counter in the CPU is the program counter.
Counters can be designated as asynchronous or synchronous, depending on
the way in which they operate. Asynchronous counters are relatively slow because
the output of one flip-flop triggers a change in the status of the next flip-flop. In a
synchronous counter, all of the flip-flops change state at the same time. Because the
latter type is much faster, it is the kind used in CPUs. However, it is useful to begin
the discussion with a description of an asynchronous counter.
monly used: parallel registers and shift registers.

PARALLEL REGISTERS A parallel register consists of a set of 1-bit memories that
can be read or written simultaneously. It is used to store data. The registers that we
have discussed throughout this book are parallel registers.
The 8-bit register of Figure 11.28 illustrates the operation of a parallel register
using D flip-flops. A control signal, labeled load, controls writing into the register
from signal lines, D11 through D18. These lines might be the output of multiplexers,
so that data from a variety of sources can be loaded into the register.
SHIFT REGISTER A shift register accepts and/or transfers information serially.

NKK-HUST
Consider, for example, Figure 11.29, which shows a 5-bit shift register constructed
NKK-HUST
from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each
clock pulse, data are shifted to the right one position, and the rightmost bit is
transferred out.
Shift registers can be used to interface to serial I/O devices. In addition, they
can be used within the ALU to perform logical shift and rotate functions. In this
Thanh ghi dch 5-bit
RIPPLE COUNTER An asynchronous counter is also referred to as a ripple counter,

because the change that occurs to increment the counter starts at one end and
ripples through to the other end. Figure 11.30 shows an implementation of a
4-bit counter using JK flip-flops, together with a timing diagram that illustrates its
behavior. The timing diagram is idealized in that it does not show the propagation
delay that occurs as the signals move down the series of flip-flops. The output of
the leftmost flip-flop (Q0) is the least significant bit. The design could clearly be
extended to an arbitrary number of bits by cascading more flip-flops.
B m 4-bit
High
J
Serial in
Q
Clk
Clk
Q
Clk
Q
Clk
Jan2016
Serial out
Clk
Ck
Clock
K
Ck
Q
Ck
Q0
Q
Ck
Q1
K
Q2
Q
Q3
(a) Sequential circuit

Clock
Clock
Figure 11.29 5-Bit Shift Register
Q0
Q1
Q2
Q3
(b) Timing diagram
Figure 11.30
Jan2016
85
NKK-HUST
Jan2016
Ripple Counter
86
NKK-HUST
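The characteristic tables of the flip-flops and the behaviour of the shift register and ripple counter can be sketched as small Python functions. This is a behavioural sketch only: the function names are hypothetical, each call stands for one clock pulse, and gate delays are ignored.

```python
def sr_next(q, s, r):
    """Next state of an SR flip-flop; S = R = 1 is not allowed."""
    if s and r:
        raise ValueError("S = 1, R = 1 is not allowed")
    if s:
        return 1      # set
    if r:
        return 0      # reset
    return q          # no input asserted: state is kept


def d_next(q, d):
    """Next state of a D flip-flop: the output follows the last input."""
    return d


def jk_next(q, j, k):
    """Next state of a JK flip-flop; J = K = 1 toggles the output."""
    if j and k:
        return 1 - q
    if j:
        return 1
    if k:
        return 0
    return q


def shift_right(bits, serial_in):
    """One clock pulse of a shift register; bits[0] is the leftmost flip-flop.

    Returns (new_bits, serial_out).
    """
    return [serial_in] + bits[:-1], bits[-1]


def ripple_count(q, pulses):
    """Ripple counter sketch: q is [Q0, Q1, ...] with Q0 the least significant bit.

    Every stage is a JK flip-flop in toggle mode; a stage toggles only when
    the previous stage falls from 1 to 0, so the change ripples along the chain.
    """
    q = list(q)
    for _ in range(pulses):
        for i in range(len(q)):
            q[i] = jk_next(q[i], 1, 1)
            if q[i] == 1:      # no 1 -> 0 transition, the ripple stops here
                break
    return q
```

Starting from `[0, 0, 0, 0]`, five clock pulses give `[1, 0, 1, 0]` (Q0 first, value 5), and sixteen pulses bring the 4-bit counter back to zero.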
End of Chapter 2

Computer Architecture
Chapter 3
COMPUTER SYSTEMS
Nguyen Kim Khanh

Contents of Chapter 3
3.1. Basic components of a computer
3.2. Basic operation of a computer
3.3. Computer buses
3.1. Basic components of a computer

- Central Processing Unit (CPU): controls the operation of the computer and processes data
- Main memory: holds the programs currently being executed
- Input/Output system: exchanges information between the computer and the outside world
- System bus: connects the components and carries information between them

[Figure: CPU, main memory and I/O system connected by the system bus]

1. Central Processing Unit (CPU)

- Function: controls the operation of the computer and processes data
- Basic operating principle: the CPU operates according to the program held in main memory.
- It is the fastest component in the system
Basic components of the CPU

- Control Unit (CU): controls the operation of the computer according to the stored program
- Arithmetic and Logic Unit (ALU): performs arithmetic and logic operations
- Register File (RF): a set of registers holding the information needed for the CPU's operation

[Figure: control unit, arithmetic and logic unit, and register file inside the CPU, connected to the system bus]

2. Computer memory

- Function: stores programs and data (in binary form)
- Basic memory operations:
  - Read
  - Write
- Main components:
  - Main memory
  - Cache memory
  - Storage devices

[Figure: CPU, cache memory, main memory, storage devices]

Main memory

- Present in every computer
- Holds the instructions and data of the program being executed
- Uses semiconductor memory
- Organized as addressed memory locations (usually one address per byte)
- The content of a memory location can change, but its physical address is fixed
- To read or write a memory location, the CPU must know its address

Content      Address
0100 1101    00...0000
0101 0101    00...0001
1010 1111    00...0010
0000 1110    00...0011
0111 0100    00...0100
1011 0010    00...0101
0010 1000    00...0110
1110 1111    00...0111
...          ...
0110 0010    11...1110
0010 0001    11...1111

Cache memory

- Fast memory placed between the CPU and main memory to speed up the CPU's memory accesses
- Smaller capacity than main memory
- Uses fast semiconductor memory
- The cache is usually divided into several levels (L1, L2, L3)
- The cache is usually integrated on the processor chip
- A cache may or may not be present
Storage devices

- Also called external memory
- Function and characteristics:
  - Store the software resources of the computer
  - Connected to the system as I/O devices
  - Large capacity
  - Slow speed
- Types of storage devices:
  - Magnetic storage: hard disk drive (HDD)
  - Semiconductor storage: solid-state drive (SSD), flash drive, memory card
  - Optical storage: CD, DVD

3. Input/Output system

- Function: exchanges information between the computer and the outside world
- Basic operations:
  - Input
  - Output
- Main components:
  - I/O devices
  - I/O modules

[Figure: system bus connected to I/O modules, each attached to I/O devices]

I/O devices

- Also called peripherals
- Function: convert data between the inside and the outside of the computer
- Types of I/O devices:
  - Input devices
  - Output devices
  - Storage devices
  - Communication devices

I/O modules

- Function: interface the I/O devices to the computer
- Each I/O module has one or several I/O ports
- Each I/O port is assigned a specific address
- I/O devices are connected to, and exchange data with, the computer through the I/O ports
- To exchange data with an I/O device, the CPU must know the address of the corresponding I/O port
3.2. Basic operation of a computer

- Program execution
- Interrupt handling
- Input/output

1. Program execution

- This is the basic activity of the computer
- The computer repeats the instruction cycle, which consists of two steps:
  - Instruction fetch
  - Instruction execution
- Program execution stops if:
  - An instruction fails during execution
  - A halt instruction is encountered
  - The machine is switched off

Instruction fetch

- At the start of each instruction cycle, the CPU fetches an instruction from main memory
- The Program Counter (PC) is a CPU register that holds the address of the next instruction to be fetched
- The CPU issues the address held in the PC to locate the memory location that contains the instruction
- The instruction read from memory is placed in the Instruction Register (IR)
- After the instruction has been fetched, the PC is automatically incremented to point to the next instruction.

Illustration of instruction fetch

Address   Content
300       instruction
301       instruction
302       instruction i
303       instruction i+1
304       instruction i+2

Before fetching instruction i: PC = 302
After fetching instruction i: PC = 303, IR = instruction i
Instruction execution

- The processor decodes the fetched instruction and issues control signals to carry out the operation the instruction requires
- Basic kinds of operation:
  - Data transfer between the CPU and main memory, or between the CPU and an I/O module
  - Arithmetic or logic operations on data
  - Transfer of control within the program: branch or jump to another location

2. Interrupts

- General concept: an interrupt is a mechanism that allows the CPU to suspend the program being executed in order to execute a subroutine already present in memory.
  - That subroutine is an interrupt handler
- Types of interrupt:
  - Exception: caused by an error during program execution (e.g. arithmetic overflow, invalid opcode, ...)
  - External interrupt: an I/O device (through its I/O module) sends an interrupt signal to the CPU to request a data exchange

Interrupt operation (cont.)

- After completing each instruction, the processor checks for interrupt signals
- If there is no interrupt, the processor fetches the next instruction of the current program
- If there is an interrupt signal:
  - Suspend the program being executed
  - Save the context (the information related to the interrupted program)
  - Set the program counter (PC) to point to the corresponding interrupt handler
  - Execute the interrupt handler
  - Restore the context and resume the suspended program

Operation with an external interrupt

[Figure: the running program (..., instruction i, instruction i+1, ...) is interrupted at the point marked "interrupt here"; control passes to the interrupt handler, which ends with RETURN and hands control back to the program]
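The instruction cycle with an interrupt check can be sketched as a toy simulator. Everything here is a hypothetical illustration: the opcodes `ADD`, `JUMP`, `IRET` and `HALT`, the dictionary used as memory, and the parameter `interrupt_at` are assumptions made for the example, and the saved context is reduced to the PC alone.

```python
def run(memory, pc, handler_pc, interrupt_at=None):
    """Toy instruction cycle.

    memory maps address -> (opcode, operand).
    interrupt_at: number of completed instructions after which a single
    external interrupt request arrives (None means no interrupt).
    Returns (accumulator, list of fetched addresses).
    """
    acc, executed, saved_pc, trace = 0, 0, None, []
    while True:
        ir = memory[pc]           # fetch: the instruction at address PC goes to IR
        trace.append(pc)
        pc += 1                   # PC now points to the next instruction
        op, arg = ir              # decode
        if op == "HALT":
            return acc, trace
        elif op == "ADD":         # arithmetic operation
            acc += arg
        elif op == "JUMP":        # transfer of control
            pc = arg
        elif op == "IRET":        # restore context, resume the suspended program
            pc = saved_pc
        executed += 1
        # After each completed instruction, check for an interrupt request.
        if interrupt_at is not None and executed == interrupt_at:
            saved_pc = pc         # save context (only the PC in this sketch)
            pc = handler_pc       # point the PC at the interrupt handler


program = {
    300: ("ADD", 1),
    301: ("ADD", 2),
    302: ("HALT", None),
    400: ("ADD", 10),             # interrupt handler
    401: ("IRET", None),
}
```

Running `run(program, 300, 400)` fetches addresses 300, 301, 302 in order. With `interrupt_at=1`, the handler at 400 runs after the first instruction and the program then resumes at 301.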
Handling multiple interrupt requests

- Sequential interrupt processing
  - While an interrupt is being handled, other interrupts are disabled (disabled interrupt)
  - The processor ignores further interrupt requests
  - Further interrupt requests remain pending and are checked once the current interrupt has been handled
  - Interrupts are handled sequentially

Handling multiple interrupt requests (cont.)

- Prioritized interrupt processing
  - Interrupts are assigned different priority levels
  - A lower-priority interrupt can be interrupted by a higher-priority interrupt
  - Nested interrupts occur

[Figure 3.13: Transfer of Control with Multiple Interrupts. (a) Sequential interrupt processing; (b) Nested interrupt processing. Each panel shows the user program, interrupt handler X and interrupt handler Y.]
3. Input/output operation

- I/O operation: the exchange of data between an I/O module and the inside of the computer.
- Types of I/O operation:
  - The CPU exchanges data with the I/O module through I/O instructions in the program
  - The CPU hands over control so that the I/O module exchanges data directly with main memory (DMA - Direct Memory Access).

3.3. Computer buses

1. Information flow in the computer

- Modules in the computer:
  - CPU
  - Memory modules
  - I/O modules
- These modules need to be connected to one another
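Memory module connections

[Figure: memory module. Inputs: address, data, read control signal, write control signal. Output: data or instruction.]

Memory module connections (cont.)

- The incoming address identifies the memory location
- Data is supplied to the module on a write
- Data or an instruction is delivered by the module on a read
- Memory does not distinguish between instructions and data
- The module receives control signals:
  - Read control
  - Write control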
I/O module connections

[Figure: I/O module. Inputs: address, data from inside, data from outside, read control signal, write control signal. Outputs: data to the outside, data to the inside, device control signals, interrupt signals.]

I/O module connections (cont.)

- The incoming address identifies the I/O port
- Output:
  - Receive data from inside (CPU or main memory)
  - Send the data out to the I/O device
- Input:
  - Receive data from the I/O device
  - Send the data inside (to the CPU or main memory)
- Receive control signals from the CPU
- Send control signals to the I/O devices
- Send interrupt signals to the CPU
CPU connections

[Figure: CPU. Inputs: instructions, data, interrupt signals. Outputs: address, data, control signals for memory and I/O.]

CPU connections (cont.)

- Issues addresses to the memory modules or the I/O modules
- Reads instructions from memory
- Reads data from memory or from an I/O module
- Sends data (after processing) to memory or to an I/O module
- Issues control signals to the memory modules and the I/O modules
- Receives interrupt signals

2. Basic bus structure

- Bus: a set of connecting lines that carry information between the modules of the computer.
- Functional buses:
  - Address bus
  - Data bus
  - Control bus
- Bus width: the number of lines of the bus that can carry bits of information simultaneously (applies only to the address bus and the data bus)

Basic bus structure diagram

[Figure: CPU, memory modules and I/O modules all attached to the address bus, the data bus and the control bus]
Address bus

- Function: carries the address that identifies a memory location or an I/O port
- Address bus width:
  - N bits: $A_{N-1}, A_{N-2}, \ldots, A_2, A_1, A_0$
  - Maximum number of usable addresses: $2^N$ addresses (called the address space)
  - Lowest address: 00 ... 000 (binary)
  - Highest address: 11 ... 111 (binary)
- Example:
  - A computer uses a 32-bit address bus (A31-A0), and main memory is byte-addressed
  - It can address $2^{32}$ bytes of memory = 4 GiB

Data bus

- Function:
  - Carries instructions from memory to the CPU
  - Carries data between the components of the computer
- Data bus width: the number of bits transferred simultaneously
  - M bits: $D_{M-1}, D_{M-2}, \ldots, D_2, D_1, D_0$
  - M is usually 8, 16, 32 or 64 bits
- Example:
  - A computer has a 64-bit data bus between the CPU and memory
  - It can transfer 8 bytes of memory at a time

Control bus

- Function: carries the control signals
- Types of control signal:
  - Read/write control signals
  - Interrupt control signals
  - Bus control signals

Some typical control signals

- Read/write control signals (issued by the CPU):
  - Memory Read (MEMR): reads data from the memory location at the specified address and places it on the data bus.
  - Memory Write (MEMW): writes the data present on the data bus to the memory location at the specified address.
  - I/O Read (IOR): reads data from the I/O port at the specified address and places it on the data bus.
  - I/O Write (IOW): writes the data present on the data bus to the port at the specified address.
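The two bus-width examples are simple arithmetic, sketched below. The helper names `address_space` and `bytes_per_transfer` are hypothetical and assume byte addressing, as in the address bus example.

```python
def address_space(n_bits):
    """Number of distinct addresses on an N-bit address bus: 2^N."""
    return 2 ** n_bits


def bytes_per_transfer(m_bits):
    """Bytes moved in one transfer on an M-bit data bus."""
    return m_bits // 8


GIB = 2 ** 30                                  # 1 GiB = 2^30 bytes
print(address_space(32) // GIB, "GiB")         # 32-bit address bus, byte-addressed
print(bytes_per_transfer(64), "bytes")         # 64-bit data bus
print(format(address_space(8) - 1, "08b"))     # highest address on an 8-bit bus
```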
Some typical control signals (cont.)

- Interrupt control signals:
  - Interrupt Request (INTR): sent by an I/O controller to request that the CPU be interrupted for an I/O exchange. INTR can be masked.
  - Interrupt Acknowledge (INTA): issued by the CPU to tell the I/O controller that the CPU accepts the interrupt for the I/O exchange.
  - Non-Maskable Interrupt (NMI): an interrupt signal sent to the CPU that cannot be masked.
  - Reset: sent from outside to the CPU and the other components to restart the computer.

Some typical control signals (cont.)

- Bus control signals:
  - Bus Request (BRQ): sent by an I/O module to ask the CPU to hand over use of the bus.
  - Bus Grant (BGT): issued by the CPU to grant use of the bus to the I/O module.
  - Lock/Unlock: disables or enables requests for bus handover.
3. Bus hierarchy

- Single bus: all modules are connected to one common bus
  - The bus can serve only one data transfer request at a time, which causes long delays
  - The bus must run at the speed of the fastest module in the system
- Multiple buses: a hierarchy of several buses for different modules, running at different speeds
  - Processor bus
  - RAM bus
  - I/O buses

Bus hierarchy

[Figure 3.17: Example Bus Configurations.
(a) Traditional bus architecture: processor, local bus, cache, local I/O controller, main memory, system bus, expansion bus interface, expansion bus with network, SCSI, serial and modem.
(b) High-performance architecture: processor, local bus, cache/bridge, main memory, system bus, high-speed bus with SCSI, FireWire, graphic, video and LAN, expansion bus interface, expansion bus with FAX, serial and modem.]

so an external bus or other interconnect scheme is not needed, although there may also be an external cache. As will be discussed in Chapter 4, the use of a cache structure insulates the processor from a requirement to access main memory frequently. Hence, main memory can be moved off of the local bus onto a system bus. In this way, I/O transfers to and from the main memory across the system bus do not interfere with the processor's activity.
4. Point-to-point connection

- Point-to-point connection
  - Overcomes the drawbacks of a shared bus

QPI connection

[Figure 3.20: Multicore Configuration Using QPI. Cores A, B, C and D, each with its own DRAM on a memory bus, are linked by QPI; I/O hubs connect to the I/O devices over PCIe.]

that enables data to move throughout the network. Direct QPI connections can be established between each pair of core processors. If core A in Figure 3.20 needs to access the memory controller in core D, it sends its request through either cores B or C, which must in turn forward that request on to the memory controller in core D. Similarly, larger systems with eight or more processors can be built using processors with three links and routing traffic through intermediate processors.
In addition, QPI is used to connect to an I/O module, called an I/O hub (IOH). The IOH acts as a switch directing traffic to and from I/O devices. Typically in newer systems, the link from the IOH to the I/O device controller uses an interconnect technology called PCI Express (PCIe), described later in this chapter. The IOH translates between the QPI protocols and formats and the PCIe protocols and formats. A core also links to a main memory module (typically the memory uses dynamic random access memory (DRAM) technology) using a dedicated memory bus.
QPI is defined as a four-layer protocol architecture,3 encompassing the following layers (Figure 3.21):
Physical: Consists of the actual wires carrying the signals, as well as circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s. The unit of transfer at the Physical layer is 20 bits, which is called a Phit (physical unit).

3 The reader unfamiliar with the concept of a protocol architecture will find a brief overview in Appendix L.

PCIe connection

[Figure 3.24: Typical Configuration Using PCIe. Cores and memory attach to the chipset; PCIe links lead to a Gigabit Ethernet device, a PCIe/PCI bridge, and a switch serving PCIe endpoints and a legacy endpoint.]

device and one or more that attach to a switch that manages multiple PCIe streams. PCIe links from the chipset may attach to the following kinds of devices that implement PCIe:
Switch: The switch manages multiple PCIe streams.
PCIe endpoint: An I/O device or controller that implements PCIe, such as a Gigabit Ethernet switch, a graphics or video controller, disk interface, or a communications controller.
Legacy endpoint: Legacy endpoint category is intended for existing designs that have been migrated to PCI Express, and it allows legacy behaviors such as use of I/O space and locked transactions. PCI Express endpoints are not permitted to require the use of I/O space at runtime and must not use locked transactions. By distinguishing these categories, it is possible for a system designer to restrict or eliminate legacy behaviors that have negative impacts on system performance and robustness.
PCIe/PCI bridge: Allows older PCI devices to be connected to PCIe-based systems.
As with QPI, PCIe interactions are defined using a protocol architecture. The PCIe protocol architecture encompasses the following layers (Figure 3.25):

Some typical buses in a computer

- QPI (Quick Path Interconnect)
- PCI bus (Peripheral Component Interconnect): general-purpose I/O bus
- PCIe (PCI Express): general-purpose high-speed point-to-point connection
- SATA (Serial Advanced Technology Attachment): bus connecting hard disks or CD/DVD drives
- USB (Universal Serial Bus): general-purpose serial bus

Example of buses in an Intel computer

[Figure: example of buses in an Intel computer]

End of Chapter 3
Computer Architecture
Chapter 4
COMPUTER ARITHMETIC
Nguyen Kim Khanh

Contents of Chapter 4
4.1. Integer representation
4.2. Integer addition and subtraction
4.3. Integer multiplication and division
4.4. Floating-point numbers

4.1. Integer representation

- Unsigned integers
- Signed integers
1. Unsigned integer representation

- General principle: n bits are used to represent an unsigned integer A:
  $a_{n-1}a_{n-2} \ldots a_2 a_1 a_0$
- The value of A is computed as:
  $A = \sum_{i=0}^{n-1} a_i 2^i$
- Range of A: $[0, 2^n - 1]$

Example 1

- Represent the following unsigned integers in 8 bits:
  A = 41 ; B = 150
  Solution:
  A = 41 = 32 + 8 + 1 = $2^5 + 2^3 + 2^0$
  41 = 0010 1001
  B = 150 = 128 + 16 + 4 + 2 = $2^7 + 2^4 + 2^2 + 2^1$
  150 = 1001 0110
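The value formula and the 8-bit examples can be checked with a short sketch. The function names `to_unsigned_bits` and `from_unsigned_bits` are hypothetical helpers introduced only for illustration.

```python
def to_unsigned_bits(value, n=8):
    """Represent an unsigned integer as an n-bit binary string."""
    if not 0 <= value <= 2 ** n - 1:
        raise OverflowError(f"{value} is outside [0, 2^{n} - 1]")
    return format(value, f"0{n}b")


def from_unsigned_bits(bits):
    """Value of a bit string a_{n-1}...a_0: the sum of a_i * 2^i."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


print(to_unsigned_bits(41))             # A = 41
print(to_unsigned_bits(150))            # B = 150
print(from_unsigned_bits("10010110"))   # decodes back to B
```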
Example 2

- The unsigned integers M and N are represented in 8 bits as follows:
  M = 0001 0010
  N = 1011 1001
  Determine their values.
  Solution:
  M = 0001 0010 = $2^4 + 2^1$ = 16 + 2 = 18
  N = 1011 1001 = $2^7 + 2^5 + 2^4 + 2^3 + 2^0$ = 128 + 32 + 16 + 8 + 1 = 185

With n = 8 bits

The values 0 to 255 ($2^8 - 1$) can be represented.

Binary representation    Decimal value
0000 0000                0
0000 0001                1
0000 0010                2
0000 0011                3
0000 0100                4
...                      ...
1111 1110                254
1111 1111                255

Note:
    1111 1111
  + 0000 0001
  -----------
  1 0000 0000
255 + 1 = 0 ???
The result is wrong because it falls outside the representable range: there is a carry out.
Number line with n = 8 bits

- Arithmetic number line: 0, 1, 2, 3, ..., 255
- Computer number line (circular): ..., 254, 255, 0, 1, 2, 3, ...

With n = 16 bits, 32 bits, 64 bits

- n = 16 bits: range from 0 to 65535 ($2^{16} - 1$)
  0000 0000 0000 0000 = 0
  ...
  0000 0000 1111 1111 = 255
  0000 0001 0000 0000 = 256
  ...
  1111 1111 1111 1111 = 65535
- n = 32 bits: range from 0 to $2^{32} - 1$
- n = 64 bits: range from 0 to $2^{64} - 1$
2. Signed integer representation

One's complement and two's complement

- Definition: for a binary number A represented in n bits:
  - One's complement of A = $(2^n - 1) - A$
  - Two's complement of A = $2^n - A$
  - Two's complement of A = (one's complement of A) + 1

Example

With n = 8 bits, let A = 0010 0101

- The one's complement of A is computed as:
    1111 1111    ($2^8 - 1$)
  - 0010 0101    (A)
  -----------
    1101 1010
  This amounts to inverting the bits of A.
- The two's complement of A is computed as:
  1 0000 0000    ($2^8$)
  - 0010 0101    (A)
  -----------
    1101 1011
  This subtraction is hard to carry out directly.
Rule for finding the one's complement and the two's complement

- One's complement of A = invert every bit of A
- (Two's complement of A) = (one's complement of A) + 1
- Example:
  Let A                    = 0010 0101
  One's complement of A    = 1101 1010
                           +         1
  Two's complement of A    = 1101 1011
- Observation:
  A                        =   0010 0101
  Two's complement of A    = + 1101 1011
                             -----------
                             1 0000 0000 = 0
  (ignoring the carry out)
  Hence: two's complement of A = -A

Signed integer representation in two's complement code

- General principle: n bits are used to represent a signed integer A:
  $a_{n-1}a_{n-2} \ldots a_2 a_1 a_0$
- If A is positive: bit $a_{n-1} = 0$, and the remaining bits represent the magnitude as an unsigned number
- If A is negative: it is represented by the two's complement of the corresponding positive number, so bit $a_{n-1} = 1$
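Both the definition and the invert-then-add-one rule can be sketched and compared directly. The function names below are hypothetical, and n defaults to 8 bits to match the example with A = 0010 0101.

```python
def ones_complement(a, n=8):
    """One's complement by definition: (2^n - 1) - A."""
    return (2 ** n - 1) - a


def twos_complement(a, n=8):
    """Two's complement by definition: 2^n - A, kept within n bits."""
    return (2 ** n - a) % 2 ** n


def twos_complement_by_rule(a, n=8):
    """Two's complement by the rule: invert every bit of A, then add 1."""
    mask = 2 ** n - 1
    return ((a ^ mask) + 1) & mask


A = 0b00100101
print(format(ones_complement(A), "08b"))
print(format(twos_complement(A), "08b"))
print((A + twos_complement(A)) % 2 ** 8)   # carry out ignored
```

The last line shows why the two's complement serves as -A: adding it to A gives 0 once the carry out is ignored.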
Example

- Represent the following signed integers in 8 bits:
  A = +58 ; B = -80
  Solution:
  A = +58 = 0011 1010
  B = -80
  We have: +80          = 0101 0000
  One's complement      = 1010 1111
                        +         1
  Two's complement      = 1011 0000
  Therefore: B = -80    = 1011 0000

Determining the value of a positive number

- General form of a positive number:
  $0\,a_{n-2} \ldots a_2 a_1 a_0$
- Value of the positive number:
  $A = \sum_{i=0}^{n-2} a_i 2^i$
- Range of positive numbers: $[0, 2^{n-1} - 1]$
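Determining the value of a negative number

- General form of a negative number:
  $1\,a_{n-2} \ldots a_2 a_1 a_0$
- Value of the negative number:
  $A = -2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
- Range of negative numbers: $[-2^{n-1}, -1]$

Formula for the value of a negative number

Proof?

$A = 1\,a_{n-2} a_{n-3} \ldots a_2 a_1 a_0$
$-A = 0\,\bar{a}_{n-2} \bar{a}_{n-3} \ldots \bar{a}_2 \bar{a}_1 \bar{a}_0 + 1$
$-A = 11 \ldots 111 - a_{n-2} a_{n-3} \ldots a_2 a_1 a_0 + 1$
$-A = (2^{n-1} - 1) - \sum_{i=0}^{n-2} a_i 2^i + 1$
$A = -2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$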
General formula for signed integers

- General form of a signed integer A:
  $a_{n-1}a_{n-2} \ldots a_2 a_1 a_0$
- The value of A is determined as:
  $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
- Range: $[-(2^{n-1}), +(2^{n-1} - 1)]$

Example

- Determine the values of the following signed integers, represented in 8-bit two's complement code:
  P = 0110 0010
  Q = 1101 1011
  Solution:
  P = 0110 0010 = 64 + 32 + 2 = +98
  Q = 1101 1011 = -128 + 64 + 16 + 8 + 2 + 1 = -37
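The general formula translates directly into code: the leading bit carries weight $-2^{n-1}$ and the remaining bits carry their usual positive weights. The function name `signed_value` is a hypothetical helper for this sketch.

```python
def signed_value(bits):
    """Value of a two's complement bit string a_{n-1}...a_0.

    A = -a_{n-1} * 2^(n-1) + sum of a_i * 2^i for i = 0 .. n-2
    """
    n = len(bits)
    magnitude = sum(int(b) << i for i, b in enumerate(reversed(bits[1:])))
    return -int(bits[0]) * 2 ** (n - 1) + magnitude


print(signed_value("01100010"))   # P
print(signed_value("11011011"))   # Q
```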
With n = 8 bits

- Representable values: from $-2^7$ to $+2^7 - 1$, that is, -128 to +127
- There is only one representation of 0
- The value +128 cannot be represented

Decimal value    Two's complement representation
0                0000 0000
+1               0000 0001
+2               0000 0010
...              ...
+126             0111 1110
+127             0111 1111
-128             1000 0000
-127             1000 0001
...              ...
-2               1111 1110
-1               1111 1111

Note:
+127 + 1 = -128
(-128) + (-1) = +127
Overflow occurs (the result falls outside the representable range)

Number line for signed integers with n = 8 bits

- Arithmetic number line: -128, ..., -2, -1, 0, 1, 2, ..., +127
- Computer number line (circular): 0, +1, +2, +3, ..., +127, -128, ..., -3, -2, -1, 0

With n = 16 bits, 32 bits, 64 bits

- n = 16 bits: range from $-2^{15}$ to $2^{15} - 1$
  0000 0000 0000 0000 = 0
  0000 0000 0000 0001 = +1
  ...
  0111 1111 1111 1111 = +32767 ($2^{15} - 1$)
  1000 0000 0000 0000 = -32768 ($-2^{15}$)
  1000 0000 0000 0001 = -32767
  ...
  1111 1111 1111 1111 = -1
- n = 32 bits: range from $-2^{31}$ to $2^{31} - 1$
- n = 64 bits: range from $-2^{63}$ to $2^{63} - 1$

Bit extension for integers

- Unsigned extension (zero-extended): add 0 bits on the left
- Signed extension (sign-extended):
  - Positive number:
    +19 = 0001 0011 (8 bits)
    +19 = 0000 0000 0001 0011 (16 bits)
  - Negative number:
    -19 = 1110 1101 (8 bits)
    -19 = 1111 1111 1110 1101 (16 bits)
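The two extension rules can be sketched on bit strings. The helper names are hypothetical; `sign_extend` pads with copies of the leftmost (sign) bit, which is what the +19 and -19 examples show.

```python
def zero_extend(bits, n):
    """Unsigned extension: pad with 0 bits on the left up to n bits."""
    return bits.rjust(n, "0")


def sign_extend(bits, n):
    """Signed extension: pad on the left with copies of the sign bit."""
    return bits.rjust(n, bits[0])


print(sign_extend("00010011", 16))   # +19 extended to 16 bits
print(sign_extend("11101101", 16))   # -19 extended to 16 bits
print(zero_extend("11101101", 16))   # same pattern read as unsigned 237
```

The last call illustrates why the two rules must not be mixed: zero-extending the pattern of -19 yields the 16-bit representation of 237, not of -19.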
4.2. Integer addition and subtraction

1. Unsigned integer addition

n-bit adder

[Figure: n-bit adder with n-bit inputs X and Y and carry input Cin, producing the n-bit sum S and the carry output Cout]

Principle of unsigned integer addition

- When two n-bit unsigned integers are added, the result obtained is n bits wide:
  - If Cout = 0, the result is correct
  - If Cout = 1, the result is wrong because there is a carry out
- A carry out occurs when: sum > $2^n - 1$
2. Php o du
V d cng s nguyn khng du

+
57
34
91
=
= +
209
73
282
=
= +
0011 1001
0010 0010
0101 1011 = 64+16+8+2+1=91 ng
1101 0001
0100 1001
1 0001 1010
kt qu = 0001 1010 =16+8+2=26 sai
do c nh ra ngoi (Cout=1)
c kt qu ng, ta thc hin cng theo 16-bit:

209 =
0000 0000 1101 0001
+ 73 = + 0000 0000 0100 1001
0000 0001 0001 1010 = 256+16+8+2 = 282
Jan2016
158
159
Jan2016
Ta c:
+ 37
b mt
=
=
b hai
0010 0101
1101 1010
+
1
1101 1011 =
-37
Ly b hai ca s m:
- 37 =
1101 1011
b mt =
0010 0100
+
1
b hai
=
0010 0101 =
+37
Kt lun: Php o du s nguyn trong my tnh

thc cht l ly b hai
160
40
Jan2016
NKK-HUST
NKK-HUST
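The carry-out rule for unsigned addition and the "invert, then add 1" rule for negation can both be reproduced in a few lines. This is a minimal Python sketch under the assumption that operands are given as non-negative integers holding n-bit patterns; `add_unsigned` and `negate` are hypothetical helper names.

```python
def add_unsigned(x: int, y: int, n: int = 8):
    """Add two n-bit unsigned integers; return (n-bit sum, Cout)."""
    total = x + y
    return total & ((1 << n) - 1), total >> n


def negate(x: int, n: int = 8) -> int:
    """Two's complement of an n-bit pattern: one's complement plus 1."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


print(add_unsigned(57, 34))       # Cout = 0: the 8-bit result is correct
print(add_unsigned(209, 73))      # Cout = 1: the 8-bit result is wrong
print(add_unsigned(209, 73, 16))  # 16 bits are enough for 282
print(f"{negate(0b00100101):08b}")  # pattern of -37
```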
3. Signed integer addition

- When two n-bit signed integers are added, the result is kept in n bits and the Cout bit is ignored
  - Adding two numbers of different signs always gives a correct result
  - Adding two numbers of the same sign: if the result has the same sign as the operands, it is correct
  - Adding two numbers of the same sign: if the result has the opposite sign, overflow has occurred and the result is wrong
- Overflow occurs when the sum lies outside the representable range [-2^{n-1}, +(2^{n-1} - 1)]

Examples of signed addition without overflow

      (+70) = 0100 0110
    + (+42) = 0010 1010
      +112  = 0111 0000 = +112

      (+97) = 0110 0001
    + (-52) = 1100 1100      (+52 = 0011 0100)
      +45   = 1 0010 1101 = +45

      (-90) = 1010 0110      (+90 = 0101 1010)
    + (+36) = 0010 0100
      -54   = 1100 1010 = -54

      (-74) = 1011 0110      (+74 = 0100 1010)
    + (-30) = 1110 0010      (+30 = 0001 1110)
      -104  = 1 1001 1000 = -104

Examples of signed addition with overflow

      (+75) = 0100 1011
    + (+82) = 0101 0010
      +157    1001 1101 = -128+16+8+4+1 = -99, wrong

      (-104) = 1001 1000     (+104 = 0110 1000)
    + (-43)  = 1101 0101     (+43  = 0010 1011)
      -147   = 1 0110 1101 = 64+32+8+4+1 = +109, wrong

- Both examples overflow because the sum lies outside the representable range [-128, +127]

4. Principle of subtraction

- Subtraction of two integers: X - Y = X + (-Y)
- Principle: take the two's complement of Y to obtain -Y, then add it to X

[Figure: Y (n-bit) passes through a two's-complement unit into an n-bit adder together with X (n-bit); the n-bit output is S = X - Y]
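The sign-based overflow rule and the "X + (-Y)" subtraction principle can be sketched as follows. This is an illustrative Python sketch, not the hardware design: operands are n-bit patterns stored in non-negative integers, the helper names are hypothetical, and the corner case Y = -2^(n-1) (whose negation is not representable) is deliberately left unhandled.

```python
def to_signed(pattern: int, n: int = 8) -> int:
    """Interpret an n-bit pattern as a two's-complement value."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


def add_signed(x: int, y: int, n: int = 8):
    """Add two n-bit patterns, ignore Cout; return (result, overflow)."""
    result = (x + y) & ((1 << n) - 1)
    sx, sy, sr = x >> (n - 1), y >> (n - 1), result >> (n - 1)
    # Overflow: operands share a sign but the result has the other sign.
    return result, (sx == sy and sr != sx)


def sub_signed(x: int, y: int, n: int = 8):
    """X - Y computed as X + (two's complement of Y)."""
    mask = (1 << n) - 1
    return add_signed(x, ((y ^ mask) + 1) & mask, n)


r, ovf = add_signed(0b01001011, 0b01010010)  # (+75) + (+82)
print(f"{r:08b}", to_signed(r), ovf)
```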
4.3. Integer multiplication and division

1. Unsigned integer multiplication

          1011      Multiplicand (11)
        x 1101      Multiplier (13)
          1011
         0000       Partial products
        1011
       1011
      10001111      Product (143)

Unsigned integer multiplication (continued)

- The partial products are determined as follows:
  - If the multiplier bit is 0, the partial product is 0
  - If the multiplier bit is 1, the partial product equals the multiplicand
  - Each partial product is shifted left by one bit relative to the previous one
- The product is the sum of the partial products
- Multiplying two n-bit integers gives a product 2n bits long (it never overflows)

Unsigned integer multiplier

[Figure: register M (M_{n-1} ... M_0) holds the multiplicand and feeds an n-bit adder; the add/shift control logic drives the add control and the shift-right control; C, A (A_{n-1} ... A_0) and Q (Q_{n-1} ... Q_0, the multiplier) form a register chain that is shifted right]

Flowchart of unsigned integer multiplication

1. Start: C <- 0; A <- 0; M <- multiplicand; Q <- multiplier; counter <- n
2. Q0 = 1? If yes: C,A <- A + M
3. Shift C, A, Q right by one bit; counter <- counter - 1
4. Counter = 0? If no, go back to step 2; if yes, stop: the product is in A,Q
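The add-and-shift flowchart maps directly onto a loop over the C, A, Q registers. The following Python sketch mirrors those register names; the function name `multiply_unsigned` and the default width n = 4 are assumptions made for illustration.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Shift-and-add multiplication with registers C, A, Q, M."""
    mask = (1 << n) - 1
    c, a = 0, 0
    q, m = multiplier & mask, multiplicand & mask
    for _ in range(n):  # the counter runs from n down to 0
        if q & 1:  # Q0 = 1: C,A <- A + M
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right by one bit as a single chain.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q  # the 2n-bit product sits in A,Q


print(f"{multiply_unsigned(0b1011, 0b1101):08b}")  # 11 x 13
```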
Example of unsigned integer multiplication

- Multiplicand M = 1011 (11)
- Multiplier   Q = 1101 (13)
- Product        = 1000 1111 (143)

      C   A      Q
      0   0000   1101   Initial values
      0   1011   1101   A <- A + M
      0   0101   1110   Shift right
      0   0010   1111   Shift right
      0   1101   1111   A <- A + M
      0   0110   1111   Shift right
      1   0001   1111   A <- A + M
      0   1000   1111   Shift right

2. Signed integer multiplication

- Using the unsigned multiplication algorithm
- Using Booth's algorithm
Using the unsigned multiplication algorithm

- Step 1. Convert the multiplicand and the multiplier to the corresponding positive numbers
- Step 2. Multiply the two positive numbers with the unsigned multiplication algorithm to obtain the product of the two positive numbers
- Step 3. Adjust the sign of the product:
  - If the two original factors have the same sign, keep the result of step 2
  - If the two original factors differ in sign, negate the result of step 2 (take its two's complement)

Booth's algorithm (see the COA textbook)

Textbook excerpt (Section 10.3, Integer Arithmetic):

    [...] as the twos complement value -7, then each partial product must be a negative
    twos complement number of 2n (8) bits, as shown in Figure 10.11b. Note that this is
    accomplished by padding out each partial product to the left with binary 1s.
    If the multiplier is negative, straightforward multiplication also will not work.
    The reason is that the bits of the multiplier no longer correspond to the shifts or
    multiplications that must take place. For example, the 4-bit decimal number -3 is
    written 1101 in twos complement. If we simply took partial products based on each
    bit position, we would have the following correspondence:

        1101 <-> -(1 * 2^3 + 1 * 2^2 + 0 * 2^1 + 1 * 2^0) = -(2^3 + 2^2 + 2^0)

    In fact, what is desired is -(2^1 + 2^0). So this multiplier cannot be used directly in
    the manner we have been describing.
    There are a number of ways out of this dilemma. One would be to convert
    both multiplier and multiplicand to positive numbers, perform the multiplication,
    and then take the twos complement of the result if and only if the sign of the two
    original numbers differed. Implementers have preferred to use techniques that
    do not require this final transformation step. One of the most common of these is
    Booth's algorithm. This algorithm also has the benefit of speeding up the
    multiplication process, relative to a more straightforward approach.
    Booth's algorithm is depicted in Figure 10.12 and can be described as follows.
    As before, the multiplier and multiplicand are placed in the Q and M registers, [...]

Figure 10.12 Booth's Algorithm for Twos Complement Multiplication

1. START: A <- 0, Q_{-1} <- 0; M <- Multiplicand; Q <- Multiplier; Count <- n
2. Examine Q0, Q_{-1}:
   - = 10: A <- A - M
   - = 01: A <- A + M
   - = 11 or = 00: no addition or subtraction
3. Arithmetic shift right: A, Q, Q_{-1}; Count <- Count - 1
4. Count = 0? No: go back to step 2. Yes: END
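The Booth flowchart can be traced in software with three variables standing in for A, Q and Q_{-1}. This is a minimal Python sketch; the function name and the default n = 4 are illustrative assumptions, and the sketch assumes the multiplicand is not the most negative n-bit value (-2^(n-1)), for which A - M does not fit in n bits.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    """Booth's algorithm on n-bit two's-complement operands."""
    mask = (1 << n) - 1
    a, q, q_1 = 0, multiplier & mask, 0
    m = multiplicand & mask
    for _ in range(n):  # Count runs from n down to 0
        pair = (q & 1, q_1)
        if pair == (1, 0):
            a = (a - m) & mask  # A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask  # A <- A + M
        # Arithmetic shift right of A, Q, Q_{-1}: the sign bit of A is kept.
        q_1 = q & 1
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (a & (1 << (n - 1))) | (a >> 1)
    product = (a << n) | q  # 2n-bit two's-complement product in A,Q
    if product >> (2 * n - 1):
        product -= 1 << (2 * n)
    return product


print(booth_multiply(7, 3), booth_multiply(-7, 3), booth_multiply(7, -3))
```

Unlike the three-step method, no sign correction is needed at the end: negative operands are handled by the subtract and arithmetic-shift steps themselves.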
3. Unsigned integer division

    Dividend: 10010011 (147)      Divisor: 1011 (11)

      10010011 | 1011
    -  1011    | 00001101    Quotient (13)
       001110
    -    1011
       001111
    -    1011
          100                Remainder (4)

Unsigned integer divider

[Figure: register M (M_{n-1} ... M_0) holds the divisor and feeds an n-bit adder/subtractor; the add/subtract and shift control logic drives the add/subtract control and the shift-left control; A (A_{n-1} ... A_0) holds the remainder and Q (Q_{n-1} ... Q_0) holds the dividend]

Flowchart of unsigned integer division

1. Start: A <- 0; M <- divisor; Q <- dividend; counter <- n
2. Shift A, Q left by one bit
3. A <- A - M
4. A < 0?
   - Yes: Q0 <- 0; A <- A + M
   - No: Q0 <- 1
5. counter <- counter - 1
6. Counter = 0? If no, go back to step 2; if yes, stop: the quotient is in Q and the remainder is in A

4. Signed integer division

- Step 1. Convert the dividend and the divisor to the corresponding positive numbers
- Step 2. Use the unsigned division algorithm to divide the two positive numbers; the resulting quotient Q and remainder R are both positive
- Step 3. Adjust the signs of the results as follows:

      Dividend    Divisor     Quotient    Remainder
      positive    positive    keep        keep
      positive    negative    negate      keep
      negative    positive    negate      negate
      negative    negative    keep        negate

  (Note: negation is, in effect, taking the two's complement)
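The division flowchart (shift left, subtract, restore on a negative result) and the sign-adjustment table can be sketched together. This is a Python illustration with hypothetical function names; A is held in an ordinary Python integer, so it can go negative temporarily instead of being tested through a sign bit.

```python
def divide_unsigned(dividend: int, divisor: int, n: int = 8):
    """Restoring division; returns (quotient in Q, remainder in A)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor & mask
    for _ in range(n):
        # Shift A,Q left by one bit.
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= m
        if a < 0:
            a += m  # restore A; Q0 stays 0
        else:
            q |= 1  # Q0 <- 1
    return q, a


def divide_signed(dividend: int, divisor: int, n: int = 8):
    """Divide magnitudes, then fix the signs as in the table."""
    q, r = divide_unsigned(abs(dividend), abs(divisor), n)
    if (dividend < 0) != (divisor < 0):
        q = -q  # operands differ in sign: negate the quotient
    if dividend < 0:
        r = -r  # the remainder takes the sign of the dividend
    return q, r


print(divide_unsigned(0b10010011, 0b1011))  # 147 / 11
print(divide_signed(-7, 2))
```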
4.4. Floating-point numbers

1. General principles

- A floating-point number represents a real number
- In general, a real number X is represented in floating-point form as:
      X = M * R^E
  where:
  - M is the mantissa,
  - R is the radix,
  - E is the exponent.

2. The IEEE 754-2008 standard

- Radix R = 2
- Formats (Figure 10.21, IEEE 754 Formats):

      Format                  Sign bit   Biased exponent   Trailing significand field
      32-bit  (binary32)      1 bit      8 bits            23 bits
      64-bit  (binary64)      1 bit      11 bits           52 bits
      128-bit (binary128)     1 bit      15 bits           112 bits

Table 10.3 IEEE 754 Format Parameters

      Parameter                                 Binary32          Binary64           Binary128
      Storage width (bits)                      32                64                 128
      Exponent width (bits)                     8                 11                 15
      Exponent bias                             127               1023               16383
      Maximum exponent                          127               1023               16383
      Minimum exponent                          -126              -1022              -16382
      Approx normal number range (base 10)      10^-38, 10^+38    10^-308, 10^+308   10^-4932, 10^+4932
      Trailing significand width (bits)*        23                52                 112
      Number of exponents                       254               2046               32766
      Number of fractions                       2^23              2^52               2^112
      Number of values                          1.98 * 2^31       1.99 * 2^63        1.99 * 2^128
      Smallest positive normal number           2^-126            2^-1022            2^-16382
      Largest positive normal number            2^128 - 2^104     2^1024 - 2^971     2^16384 - 2^16271
      Smallest subnormal magnitude              2^-149            2^-1074            2^-16494

      Note: * not including implied bit and not including sign bit

32-bit format

      S (1 bit) | e (8 bits) | m (23 bits)

- S is the sign bit:
  - S = 0: positive number
  - S = 1: negative number
- e (8 bits) is the biased value of the exponent E:
  - e = E + 127, so the exponent is E = e - 127
- m (23 bits) is the fractional part of the mantissa M:
  - M = 1.m
- Formula for the value of the real number:
      X = (-1)^S * 1.m * 2^(e-127)

Example 1

Determine the values of the real numbers represented by the following 32-bit patterns:

- 1100 0001 0101 0110 0000 0000 0000 0000
  - S = 1: negative number
  - e = 1000 0010(2) = 130(10), so E = 130 - 127 = 3
  - Therefore X = -1.10101100(2) * 2^3 = -1101.011(2) = -13.375(10)
- 0011 1111 1000 0000 0000 0000 0000 0000 = ?
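The value formula for the 32-bit format can be applied field by field. The following Python sketch handles only normal numbers (it ignores the all-zeros and all-ones exponent codes); `decode_binary32` is a hypothetical helper name.

```python
def decode_binary32(bits: str) -> float:
    """Apply X = (-1)^S * 1.m * 2^(e-127) to a 32-bit pattern."""
    digits = bits.replace(" ", "")
    if len(digits) != 32:
        raise ValueError("expected exactly 32 bits")
    s = int(digits[0])         # sign bit S
    e = int(digits[1:9], 2)    # 8-bit biased exponent e
    m = int(digits[9:], 2)     # 23-bit fraction m
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e - 127)


print(decode_binary32("1100 0001 0101 0110 0000 0000 0000 0000"))
```

The second pattern of Example 1 can be decoded by passing it to the same function.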
Example 2

Represent the real number X = 83.75(10) in the IEEE 754 32-bit floating-point format.

Solution:
- X = 83.75(10) = 1010011.11(2) = 1.01001111 x 2^6
- We have:
  - S = 0 because this is a positive number
  - E = e - 127 = 6, so e = 127 + 6 = 133(10) = 1000 0101(2)
- Therefore:
      X = 0100 0010 1010 0111 1000 0000 0000 0000

Special conventions

- All bits of e are 0 and all bits of m are 0: X = ±0
      x000 0000 0000 0000 0000 0000 0000 0000    X = ±0
- All bits of e are 1 and all bits of m are 0: X = ±infinity
      x111 1111 1000 0000 0000 0000 0000 0000    X = ±infinity
- All bits of e are 1 and m has at least one bit equal to 1: the pattern does not represent any number (NaN, not a number)
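Example 2 and the special conventions can be cross-checked with Python's standard `struct` module, which packs a float into the IEEE 754 binary32 layout. This is a minimal sketch; `encode_binary32` and `classify_binary32` are hypothetical helper names, and the label "other" simply covers every pattern the conventions above do not mention.

```python
import struct


def encode_binary32(x: float) -> str:
    """Return the binary32 pattern of x as eight 4-bit groups."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))


def classify_binary32(bits: str) -> str:
    """Apply the special conventions to the e and m fields."""
    digits = bits.replace(" ", "")
    e, m = digits[1:9], digits[9:]
    if e == "0" * 8 and m == "0" * 23:
        return "zero"
    if e == "1" * 8:
        return "infinity" if m == "0" * 23 else "NaN"
    return "other"


print(encode_binary32(83.75))
print(classify_binary32(encode_binary32(float("inf"))))
```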

Representable range (32-bit format)

- Approximately 2^-127 to 2^+127 in magnitude (more precisely, by Table 10.3, normal numbers run from 2^-126 up to just under 2^128)
- Approximately 10^-38 to 10^+38

[Figure: number line marked at -2^+127, -2^-127, 0, +2^-127, +2^+127]

64-bit format

- S is the sign bit
- e (11 bits) is the biased value of the exponent E:
  - e = E + 1023, so the exponent is E = e - 1023
- m (52 bits): the fractional part of the mantissa M
- Value of the real number:
      X = (-1)^S * 1.m * 2^(e-1023)
- Representable range: 10^-308 to 10^+308
128-bit format

- S is the sign bit
- e (15 bits) is the biased value of the exponent E:
  - e = E + 16383, so the exponent is E = e - 16383
- m (112 bits): the fractional part of the mantissa M
- Value of the real number:
      X = (-1)^S * 1.m * 2^(e-16383)
- Representable range: 10^-4932 to 10^+4932

3. Floating-point operations

- Given:
      X1 = M1 * R^E1
      X2 = M2 * R^E2
- We have:
      X1 * X2 = (M1 * M2) * R^(E1+E2)
      X1 / X2 = (M1 / M2) * R^(E1-E2)
      X1 ± X2 = (M1 * R^(E1-E2) ± M2) * R^E2, with E2 >= E1

Possible overflow and underflow conditions

- Exponent overflow: a positive exponent exceeds the largest representable positive exponent (the result tends to infinity)
- Exponent underflow: a negative exponent falls below the most negative representable exponent (the result tends to 0)
- Mantissa overflow: adding two mantissas of the same sign produces a carry out of the most significant bit
- Mantissa underflow: while the mantissas are being aligned, digits are lost off the right end of the mantissa

Addition and subtraction

- Check whether either operand is 0
- Align the mantissas
- Add or subtract the mantissas
- Normalize the result
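The three identities for floating-point operations can be verified with exact rational arithmetic. The sketch below is only an illustration of the algebra, not of IEEE 754 hardware: a number is a hypothetical pair (M, E), mantissas are `Fraction` objects so no bits are lost during alignment, and no normalization or rounding is performed.

```python
from fractions import Fraction


def fp_value(m, e, r=2):
    """Value of the pair (M, E): M * R^E."""
    return Fraction(m) * Fraction(r) ** e


def fp_mul(x1, x2):
    (m1, e1), (m2, e2) = x1, x2
    return m1 * m2, e1 + e2  # (M1 * M2) * R^(E1+E2)


def fp_div(x1, x2):
    (m1, e1), (m2, e2) = x1, x2
    return m1 / m2, e1 - e2  # (M1 / M2) * R^(E1-E2)


def fp_add(x1, x2, r=2):
    (m1, e1), (m2, e2) = x1, x2
    if e1 > e2:  # make sure E2 >= E1, as the formula requires
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    # Align M1 to the larger exponent E2, then add the mantissas.
    return m1 * Fraction(r) ** (e1 - e2) + m2, e2


x1 = (Fraction(3, 2), 3)  # 1.5  * 2^3 = 12
x2 = (Fraction(5, 4), 2)  # 1.25 * 2^2 = 5
print(fp_value(*fp_add(x1, x2)), fp_value(*fp_mul(x1, x2)))
```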
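Floating-point addition/subtraction algorithm

[Flowchart, summarized as steps]
1. SUBTRACT: change the sign of Y, then continue as ADD
2. ADD: X = 0? If yes: Z <- Y, RETURN
3. Y = 0? If yes: Z <- X, RETURN
4. Exponents equal? If no: increment the smaller exponent and shift its mantissa right
   - Mantissa = 0? If yes: put the other number in Z, RETURN; if no: repeat step 4
5. Add the signed mantissas
   - Mantissa = 0? If yes: Z <- 0, RETURN
6. Mantissa overflow? If yes: shift the mantissa right and increment the exponent
   - Exponent overflow? If yes: report overflow, RETURN
7. Result normalized? If no: shift the mantissa left and decrement the exponent
   - Exponent underflow? If yes: report underflow, RETURN; otherwise repeat step 7
8. Round the result, RETURN

Floating-point multiplication algorithm

[Flowchart, summarized as steps]
1. MULTIPLY: X = 0? If yes: Z <- 0, RETURN
2. Y = 0? If yes: Z <- 0, RETURN
3. Add the exponents
4. Subtract the bias
5. Exponent overflow? If yes: report overflow, RETURN
6. Exponent underflow? If yes: report underflow, RETURN
7. Multiply the mantissas
8. Normalize
9. Round, RETURN

Floating-point division algorithm

[Flowchart, summarized as steps]
1. DIVIDE: X = 0? If yes: Z <- 0, RETURN
2. Y = 0? If yes: Z <- infinity, RETURN
3. Subtract the exponents
4. Add the bias
5. Exponent overflow? If yes: report overflow, RETURN
6. Exponent underflow? If yes: report underflow, RETURN
7. Divide the mantissas
8. Normalize
9. Round, RETURN

End of Chapter 4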
Computer Architecture

Chapter 5
INSTRUCTION SET ARCHITECTURE

Nguyen Kim Khanh

Contents of Chapter 5

5.1. Introduction to instruction set architecture
5.2. Assembly instructions and operands
5.3. Machine code
5.4. Basics of assembly programming
5.5. Addressing modes
5.6. Translating and running assembly programs

5.1. Introduction to instruction set architecture

- Instruction set architecture (ISA): the computer as seen by the programmer
- Microarchitecture: the way the instruction set architecture is implemented in hardware
- Languages inside the computer:
  - Assembly language:
    - the form of an instruction that humans can read
    - represented as text
  - Machine language:
    - also called machine code
    - the form of an instruction that the computer can read
    - represented by the bits 0 and 1
Programming model of the computer

[Figure: main memory holding instructions and data; the CPU with the PC, the control unit, the ALU and the register file; input-output]

The CPU fetches instructions from memory

- The program counter PC is the CPU register that holds the address of the instruction to be fetched for execution
- The CPU sends the address in the PC to memory, and the instruction is fetched in
- After the instruction has been fetched, the content of the PC is automatically incremented so that it points to the next instruction
- By how much is the PC incremented?
  - It depends on the length of the instruction just fetched
  - MIPS: instructions are 32 bits long, so the PC is incremented by 4

[Figure: the PC points to the instruction being fetched, then to the next instruction in memory]

Decoding and executing instructions

- The processor decodes the fetched instruction and issues the control signals that carry out the operation the instruction requires
- The main kinds of operations are:
  - Transferring data between the CPU and main memory or I/O ports
  - Performing arithmetic or logic operations on data (carried out by the ALU)
  - Transferring control within the program (branch, jump)

The CPU reads/writes data in memory

- For instructions that exchange data with memory, the CPU must know and issue the address of the memory location to be read or written
- The address can be:
  - A constant address given directly in the instruction
  - An address value held in a pointer register
  - Address = base address + offset
Constant address

- The instruction specifies a concrete address constant
- The CPU sends this address value to memory to locate the data location to be read or written

[Figure: the address constant selects the data location to be read/written among the data locations in memory]

Using a pointer register

- The instruction names the pointer register
- The pointer register holds the address value
- The CPU sends this address out to locate the data location to be read or written

[Figure: the register content selects the data location to be read/written among the data locations in memory]

Using a base address and an offset

- Base address: the address of the base memory location
- Offset: the address increment from the base location to the location to be read or written
- Address of the location to be read/written = (base address) + (offset)
- Registers can be used to hold these parameters
- Special cases:
  - Base address = 0
  - Offset = 0

[Figure: the base address marks the base location; the offset leads from it to the data to be read/written]

Stack

- The stack is a data memory region with a LIFO structure (Last In, First Out)
- The stack is commonly used to support subroutines
- The bottom of the stack is a fixed memory location
- The top of the stack is the item at the topmost position in the stack
- The top of the stack can change
Stack pointer SP

- SP is the register that holds the address of the memory location at the top of the stack
- When an item is pushed onto the stack:
  - The content of SP is decremented
  - The item is stored in the memory location pointed to by SP
- When an item is popped from the stack:
  - The item is read from the memory location pointed to by SP
  - The content of SP is incremented
- When the stack is empty, SP points to the bottom

[Figure: SP points to the top of the stack, which lies above the bottom of the stack; addresses increase toward the bottom]

Byte ordering in main memory

- Main memory is addressed byte by byte
- There are two ways to store multi-byte data:
  - Little-endian: the least significant byte is stored at the lowest address, and the most significant byte at the highest address
  - Big-endian: the most significant byte is stored at the lowest address, and the least significant byte at the highest address
- Real products:
  - Intel x86: little-endian
  - Motorola 680x0, SunSPARC: big-endian
  - MIPS, IA-64: bi-endian (both kinds)

Example of storing 32-bit data

- Binary number: 0001 1010 0010 1011 0011 1100 0100 1101
- Hexadecimal number: 1A 2B 3C 4D

      Address    little-endian    big-endian
      4000       4D               1A
      4001       3C               2B
      4002       2B               3C
      4003       1A               4D

Instruction set

- Each processor has a defined instruction set
- An instruction set typically contains from tens to hundreds of instructions
- Each machine instruction (machine code) is a string of bits (0, 1) that the processor understands and uses to carry out a specific operation
- Instructions are described with textual mnemonics, which are the instructions of assembly language
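The 32-bit storage example for the two byte orders can be reproduced with Python's built-in `int.to_bytes`. This is a minimal sketch; `store_word` is a hypothetical helper, and the base address 4000 is simply the one used in the example.

```python
def store_word(value: int, base_address: int, byteorder: str) -> dict:
    """Return {address: byte in hex} for a 32-bit word stored in memory."""
    data = value.to_bytes(4, byteorder=byteorder)
    return {base_address + i: f"{b:02X}" for i, b in enumerate(data)}


print(store_word(0x1A2B3C4D, 4000, "little"))
print(store_word(0x1A2B3C4D, 4000, "big"))
```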
Format of an assembly instruction

- C code:
      a = b + c;
- Example assembly instruction:
      add a, b, c      # a = b + c
  where:
  - add: the mnemonic that names the operation to perform
    - Note: each instruction performs only one operation
  - b, c: the source operands of the operation
  - a: the destination operand (where the result is written)
  - the text after # is a comment (it extends only to the end of the line)

Components of a machine instruction

      | Opcode | Operand addresses |

- Opcode (operation code): encodes the operation the processor must perform
  - Data transfer operations
  - Arithmetic operations
  - Logic operations
  - Control transfer operations (branch, jump)
- Operand addresses: indicate where the operands of the operation are located
  - An operand can be:
    - A constant contained in the instruction itself
    - The content of a register
    - The content of a memory location (or an I/O port)

Number of operand addresses in an instruction

- Three operand addresses:
      add r1, r2, r3   # r1 = r2 + r3
  - Widely used in current architectures
- Two operand addresses:
      add r1, r2       # r1 = r1 + r2
  - Used in Intel x86, Motorola 680x0
- One operand address:
      add r1           # Acc = Acc + r1
  - Used in earlier-generation architectures
- Zero operand addresses:
  - All operands are implicitly on the stack
  - Not common

CISC and RISC instruction set architectures

- CISC: Complex Instruction Set Computer
  - A computer with a complex instruction set
  - Processors: Intel x86, Motorola 680x0
- RISC: Reduced Instruction Set Computer
  - A computer with a reduced instruction set
  - SunSPARC, Power PC, MIPS, ARM, ...
  - RISC is the opposite of CISC
  - An advanced instruction set architecture
Characteristics of the RISC architecture

- Few instructions
- Most instructions access operands in registers
- Memory is accessed through LOAD/STORE instructions
- All instructions take the same time to execute
- Instructions have a fixed length (typically 32 bits)
- Few instruction formats
- Few operand addressing modes
- Many registers
- Support for high-level language operations

The MIPS instruction set architecture

- MIPS stands for: Microprocessor without Interlocked Pipeline Stages
- Developed by John Hennessy and colleagues at Stanford University (1984)
- Commercialized by MIPS Technologies
- In 2013 the company was sold to Imagination Technologies (imgtec.com)
- A typical RISC architecture, easy to learn
- Used in many real products
- The following parts of this chapter study the 32-bit MIPS instruction set architecture
- References: MIPS Reference Data Sheet and Chapter 2 of COD

5.2. Assembly instructions and operands

- Addition takes 3 operands
  - It is the most common operation
  - Two source operands and one destination operand
        add a, b, c      # a = b + c
- Most arithmetic/logic instructions have this form
- Arithmetic instructions use register operands or constants

The MIPS register set

- MIPS has a set of 32 registers, each 32 bits wide
  - They are used frequently
  - They are numbered 0 to 31 (encoded in 5 bits)
- The assembler gives them names:
  - Names begin with $
  - $t0, $t1, ..., $t9 hold temporary values
  - $s0, $s1, ..., $s7 hold variables
- Data naming conventions in MIPS:
  - 32-bit data is called a word
  - 16-bit data is called a halfword
The MIPS register set

      Register name   Register number   Use
      $zero           0                 the constant value 0
      $at             1                 assembler temporary
      $v0-$v1         2-3               procedure return values
      $a0-$a3         4-7               procedure arguments
      $t0-$t7         8-15              temporaries
      $s0-$s7         16-23             saved variables
      $t8-$t9         24-25             more temporaries
      $k0-$k1         26-27             OS temporaries
      $gp             28                global pointer
      $sp             29                stack pointer
      $fp             30                frame pointer
      $ra             31                procedure return address

Register operands

- The add and sub (subtract) instructions operate only on register operands
      add rd, rs, rt      # (rd) = (rs)+(rt)
      sub rd, rs, rt      # (rd) = (rs)-(rt)
- Example C code:
      f = (g + h) - (i + j);
  - assume f, g, h, i, j are in $s0, $s1, $s2, $s3, $s4
- It is translated into MIPS assembly code:
      add $t0, $s1, $s2   # $t0 = g + h
      add $t1, $s3, $s4   # $t1 = i + j
      sub $s0, $t0, $t1   # f = (g+h)-(i+j)

Memory operands

- To perform an arithmetic operation on memory operands, it is necessary to:
  - Load the values from memory into registers
  - Perform the operation on the registers
  - Store the result from the register back to memory
- Memory is byte-addressed
  - MIPS uses 32-bit addresses for memory bytes and I/O ports
  - Address space: 0x00000000 to 0xFFFFFFFF
  - Each 32-bit word occupies 4 bytes in memory; the address of a word is a multiple of 4 (the address of its first byte)
- MIPS allows data to be stored in memory in either big-endian or little-endian order

Byte addresses and word addresses

      Byte address (hex)   Data or instruction      Word address (hex)   Data or instruction
      0x0000 0000          byte (8-bit)             0x0000 0000          word (32-bit)
      0x0000 0001          byte                     0x0000 0004          word
      0x0000 0002          byte                     0x0000 0008          word
      0x0000 0003          byte                     0x0000 000C          word
      0x0000 0004          byte                     0x0000 0010          word
      0x0000 0005          byte                     0x0000 0014          word
      0x0000 0006          byte                     0x0000 0018          word
      0x0000 0007          byte                     ...                  ...
      ...                  ...                      0xFFFF FFF4          word
      0xFFFF FFFB          byte                     0xFFFF FFF8          word
      0xFFFF FFFC          byte                     0xFFFF FFFC          word
      0xFFFF FFFD          byte
      0xFFFF FFFE          byte
      0xFFFF FFFF          byte

      2^32 bytes                                    2^30 words
The load and store instructions

- To read a 32-bit data word from memory into a register, use the load word instruction:
      lw rt, imm(rs)      # (rt) = mem[(rs)+imm]
  - rs: the register holding the base address
  - imm (immediate): a constant (offset)
    - address of the data word to read = base address + constant
  - rt: the destination register, which receives the data word read
- To write a 32-bit data word from a register to memory, use the store word instruction:
      sw rt, imm(rs)      # mem[(rs)+imm] = (rt)
  - rt: the source register, which holds the data word to write to memory
  - rs: the register holding the base address
  - imm: a constant (offset)
    - address where the data word is written = base address + constant

Example of a memory operand

- C code:
      // A is an array of 32-bit elements
      g = h + A[8];
  - g is in $s1, h is in $s2
  - $s3 holds the base address of array A
- MIPS assembly code:
      # Index 8, so offset = 32
      lw  $t0, 32($s3)    # $t0 = A[8]
      add $s1, $s2, $t0   # g = h+A[8]
  (in 32($s3), 32 is the offset and $s3 is the base register)
  (Note: the offset must be a constant; it can be positive or negative)

[Figure: array elements A[0] ... A[12] in memory; the base address points to A[0], and offset = 32 reaches A[8]]
Example of a memory operand (continued)

- C code:
      A[12] = h + A[8];
  - h is in $s2
  - $s3 holds the base address of array A
- MIPS assembly code:
      lw  $t0, 32($s3)    # $t0 = A[8]
      add $t0, $s2, $t0   # $t0 = h+A[8]
      sw  $t0, 48($s3)    # A[12] = h+A[8]

[Figure: array elements A[0] ... A[12] in memory; the base address points to A[0], and offset = 48 reaches A[12]]

Registers versus memory

- Register access is faster than memory access
- Operating on data in memory requires loads and stores
  - More instructions must be executed
- The compiler uses registers for variables as much as possible
  - Memory is used only for rarely used variables
  - Register usage needs to be optimized

Immediate operands

- Constant data is specified directly in the instruction
      addi $s3, $s3, 4    # $s3 = $s3+4
- There is no subtract-immediate instruction (subi)
  - Use a negative constant in addi to perform the subtraction
        addi $s2, $s1, -1   # $s2 = $s1-1

Handling integers

- Signed integers (represented in two's complement):
  - With n bits, the representable range is [-2^{n-1}, +(2^{n-1} - 1)]
  - The add and sub instructions are for signed integers
- Unsigned integers:
  - With n bits, the representable range is [0, 2^n - 1]
  - The addu and subu instructions are for unsigned integers
- Conventions for writing integer constants in MIPS assembly:
  - decimal numbers: 12; 3456; -18
  - hexadecimal numbers (starting with 0x): 0x12; 0x3456; 0x1AB6
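The three-instruction sequence for A[12] = h + A[8] can be simulated with a dictionary acting as word memory. This Python sketch is only a model of the base-plus-offset addressing: the helpers `lw`, `sw`, `run_example` and the base address 0x10010000 are assumptions made for illustration.

```python
def lw(regs, mem, rt, imm, rs):
    """(rt) = mem[(rs) + imm]"""
    regs[rt] = mem[regs[rs] + imm]


def sw(regs, mem, rt, imm, rs):
    """mem[(rs) + imm] = (rt)"""
    mem[regs[rs] + imm] = regs[rt]


def run_example(h, array, base=0x10010000):
    # Each 32-bit element occupies 4 bytes, so A[i] lives at base + 4*i.
    mem = {base + 4 * i: v for i, v in enumerate(array)}
    regs = {"$s2": h, "$s3": base, "$t0": 0}
    lw(regs, mem, "$t0", 32, "$s3")                          # $t0 = A[8]
    regs["$t0"] = (regs["$s2"] + regs["$t0"]) & 0xFFFFFFFF   # add $t0, $s2, $t0
    sw(regs, mem, "$t0", 48, "$s3")                          # A[12] = h + A[8]
    return [mem[base + 4 * i] for i in range(len(array))]


print(run_example(5, list(range(13))))
```

The offsets 32 and 48 are the indices 8 and 12 multiplied by the 4-byte element size.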
The constant zero

- MIPS register 0 ($zero or $0) always holds the constant 0
  - Its value cannot be changed
- It is useful for several common operations
  - For example, copying data between registers:
        add $t2, $s1, $zero   # $t2 = $s1

5.3. Machine code

- Instructions encoded in binary form are called machine code
- MIPS instructions:
  - Are encoded as 32-bit instruction words
  - Each instruction occupies 4 bytes in memory, so the address of an instruction in memory is a multiple of 4
  - There are few instruction formats
- Register numbers are encoded in 5 bits
  - $t0 to $t7 have numbers 8 to 15
  - $t8 to $t9 have numbers 24 to 25
  - $s0 to $s7 have numbers 16 to 23

MIPS machine instruction types

      R-type   | op     | rs     | rt     | rd     | shamt  | funct  |
               | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

      I-type   | op     | rs     | rt     | imm                      |
               | 6 bits | 5 bits | 5 bits | 16 bits                  |

      J-type   | op     | address                                    |
               | 6 bits | 26 bits                                    |

R-type instructions (Registers)

      | op     | rs     | rt     | rd     | shamt  | funct  |
      | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

- Instruction fields:
  - op (operation code, opcode): the operation code
    - for R-type instructions, op = 000000
  - rs: number of the first source register
  - rt: number of the second source register
  - rd: number of the destination register
  - shamt (shift amount): the number of bit positions to shift; used only by shift instructions, for other instructions shamt = 00000
  - funct (function code): the function code
Example machine code for the add and sub instructions

      | op     | rs     | rt     | rd     | shamt  | funct  |
      | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

add $t0, $s1, $s2

      | 0      | $s1    | $s2    | $t0    | 0      | add    |
      | 0      | 17     | 18     | 8      | 0      | 32     |
      | 000000 | 10001  | 10010  | 01000  | 00000  | 100000 |     (0x02324020)

sub $s0, $t3, $t5

      | 0      | $t3    | $t5    | $s0    | 0      | sub    |
      | 0      | 11     | 13     | 16     | 0      | 34     |
      | 000000 | 01011  | 01101  | 10000  | 00000  | 100010 |     (0x016D8022)

I-type instructions (Immediate)

      | op     | rs     | rt     | imm     |
      | 6 bits | 5 bits | 5 bits | 16 bits |

- Used for arithmetic/logic instructions with an immediate operand and for load/store instructions
  - rs: number of the source register (addi) or of the base register (lw, sw)
  - rt: number of the destination register (addi, lw) or of the source register (sw)
  - imm (immediate): a 16-bit integer constant

      addi rt, rs, imm    # (rt) = (rs)+SignExtImm
      lw   rt, imm(rs)    # (rt) = mem[(rs)+SignExtImm]
      sw   rt, imm(rs)    # mem[(rs)+SignExtImm] = (rt)

  (SignExtImm: the 16-bit constant imm sign-extended to 32 bits)

Sign extension of the constant

- The addi, lw and sw instructions add the content of a register to a constant:
  - The register is 32 bits wide
  - The constant imm is 16 bits wide and must be extended to 32 bits as a signed number (sign-extended)
- Examples of extending a 16-bit number to 32 bits as a signed number:

      +5  =                     0000 0000 0000 0101    (16-bit)
      +5  = 0000 0000 0000 0000 0000 0000 0000 0101    (32-bit)

      -12 =                     1111 1111 1111 0100    (16-bit)
      -12 = 1111 1111 1111 1111 1111 1111 1111 0100    (32-bit)

Example machine code for the addi instruction

      | op     | rs     | rt     | imm                 |
      | 6 bits | 5 bits | 5 bits | 16 bits             |

addi $s0, $s1, 5

      | 8      | $s1    | $s0    | 5                   |
      | 8      | 17     | 16     | 5                   |
      | 001000 | 10001  | 10000  | 0000 0000 0000 0101 |     (0x22300005)

addi $t1, $s2, -12

      | 8      | $s2    | $t1    | -12                 |
      | 8      | 18     | 9      | -12                 |
      | 001000 | 10010  | 01001  | 1111 1111 1111 0100 |     (0x2249FFF4)
Example machine code for the load and store instructions

      | op     | rs     | rt     | imm                 |
      | 6 bits | 5 bits | 5 bits | 16 bits             |

lw $t0, 32($s3)

      | 35     | $s3    | $t0    | 32                  |
      | 35     | 19     | 8      | 32                  |
      | 100011 | 10011  | 01000  | 0000 0000 0010 0000 |     (0x8E680020)

sw $s1, 4($t1)

      | 43     | $t1    | $s1    | 4                   |
      | 43     | 9      | 17     | 4                   |
      | 101011 | 01001  | 10001  | 0000 0000 0000 0100 |     (0xAD310004)

J-type instructions (Jump)

      | op     | address |
      | 6 bits | 26 bits |

- The operand is a 26-bit address
- Used for jump instructions:
  - j (jump): op = 000010
  - jal (jump and link): op = 000011
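The field layouts of the three formats can be checked by packing the fields into a 32-bit word and comparing against the hexadecimal encodings in the examples. This is a minimal Python sketch: the encoder functions and the small `REG` table are illustrative helpers, and the J-type encoder assumes the usual MIPS convention that the 26-bit field holds the target address divided by 4, a detail the slides do not spell out.

```python
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$t3": 11, "$t5": 13,
       "$s0": 16, "$s1": 17, "$s2": 18, "$s3": 19}


def encode_r(funct, rd, rs, rt, shamt=0):
    """R-type: op = 0 | rs | rt | rd | shamt | funct."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | (shamt << 6) | funct


def encode_i(op, rt, rs, imm):
    """I-type: op | rs | rt | 16-bit immediate (two's complement)."""
    return (op << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J-type: op | 26-bit word address (assumed to be target >> 2)."""
    return (op << 26) | ((target >> 2) & 0x3FFFFFF)


print(f"0x{encode_r(32, '$t0', '$s1', '$s2'):08X}")  # add  $t0, $s1, $s2
print(f"0x{encode_i(8, '$t1', '$s2', -12):08X}")     # addi $t1, $s2, -12
print(f"0x{encode_i(35, '$t0', '$s3', 32):08X}")     # lw   $t0, 32($s3)
```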
5.4. Basics of assembly programming

1. Logic instructions
2. Loading constants into registers
3. Building control structures
4. Programming with data arrays
5. Subroutines
6. Character data
7. Multiply and divide instructions
8. Floating-point instructions

1. Logic instructions

- Logic instructions operate on the bits of data

      Logic operation   C operator   MIPS instruction
      Shift left        <<           sll
      Shift right       >>           srl
      Bitwise AND       &            and, andi
      Bitwise OR        |            or, ori
      Bitwise XOR       ^            xor, xori
      Bitwise NOT       ~            nor
Example of R-type logic instructions

Contents of the source registers:

      $s1   0100 0110 1010 0001 1100 0000 1011 0111
      $s2   1111 1111 1111 1111 0000 0000 0000 0000

      Assembly code           Result in the destination register
      and $s3, $s1, $s2       $s3   0100 0110 1010 0001 0000 0000 0000 0000
      or  $s4, $s1, $s2       $s4   1111 1111 1111 1111 1100 0000 1011 0111
      xor $s5, $s1, $s2       $s5   1011 1001 0101 1110 1100 0000 1011 0111
      nor $s6, $s1, $s2       $s6   0000 0000 0000 0000 0011 1111 0100 1000

Example of I-type logic instructions

Values of the source operands:

      $s1   0000 0000 0000 0000 0000 0000 1111 1111
      imm   0000 0000 0000 0000 1111 1010 0011 0100    (zero-extended)

      Assembly code           Result in the destination register
      andi $s2,$s1,0xFA34     $s2   0000 0000 0000 0000 0000 0000 0011 0100
      ori  $s3,$s1,0xFA34     $s3   0000 0000 0000 0000 1111 1010 1111 1111
      xori $s4,$s1,0xFA34     $s4   0000 0000 0000 0000 1111 1010 1100 1011

- Note: for I-type logic instructions, the 16-bit constant imm is extended to 32 bits as an unsigned number (zero-extended)
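The R-type and I-type logic examples can be re-derived with Python's bitwise operators on 32-bit values. This is a minimal sketch; `logic_r` and `logic_i` are hypothetical helpers, and the mask `0xFFFFFFFF` models the 32-bit register width because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def logic_r(rs: int, rt: int) -> dict:
    """Results of and, or, xor, nor on two 32-bit register values."""
    return {
        "and": rs & rt,
        "or": rs | rt,
        "xor": rs ^ rt,
        "nor": ~(rs | rt) & MASK32,  # NOT (rs OR rt), kept to 32 bits
    }


def logic_i(rs: int, imm: int) -> dict:
    """andi, ori, xori: the 16-bit immediate is zero-extended."""
    imm &= 0xFFFF
    return {"andi": rs & imm, "ori": rs | imm, "xori": rs ^ imm}


for name, value in logic_r(0x46A1C0B7, 0xFFFF0000).items():
    print(f"{name:4} {value:032b}")
```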
Meaning of the logic operations

- AND is used to keep some bits of a word unchanged and clear the remaining bits to 0
- OR is used to keep some bits of a word unchanged and set the remaining bits to 1
- XOR is used to keep some bits of a word unchanged and invert the remaining bits
- NOT is used to invert the bits of a word
  - It changes 0 to 1 and 1 to 0
  - MIPS has no NOT instruction, but it has a NOR instruction with 3 operands
  - a NOR b == NOT ( a OR b )
        nor $t0, $t1, $zero   # $t0 = not($t1)

Shift instructions

      | op     | rs     | rt     | rd     | shamt  | funct  |
      | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

- shamt: how many positions to shift (shift amount)
- rs: not used, set to 00000
- The destination register rd receives the value of the source register rt shifted left or right; the content of rt does not change
- sll, shift left logical
  - Shifts the bits left and fills the vacated positions on the right with 0 bits
  - Shifting left by i bits multiplies by 2^i (if the result fits in the 32-bit range)
- srl, shift right logical
  - Shifts the bits right and fills the vacated positions on the left with 0 bits
  - Shifting right by i bits divides by 2^i (for unsigned integers only)
Example of the shift-left instruction sll

- Assembly instruction:
      sll $t2, $s0, 4     # $t2 = $s0 << 4
- Machine code:

      | op     | rs     | rt     | rd     | shamt  | funct  |
      | 0      | 0      | 16     | 10     | 4      | 0      |
      | 000000 | 00000  | 10000  | 01010  | 00100  | 000000 |     (0x00105100)

- Example result of executing the instruction:

      $s0   0000 0000 0000 0000 0000 0000 0000 1101   = 13
      $t2   0000 0000 0000 0000 0000 0000 1101 0000   = 208 (13 x 16)

- Note: the content of register $s0 is not changed

Example of the shift-right instruction srl

- Assembly instruction:
      srl $s2, $s1, 2     # $s2 = $s1 >> 2
- Machine code:

      | op     | rs     | rt     | rd     | shamt  | funct  |
      | 0      | 0      | 17     | 18     | 2      | 2      |
      | 000000 | 00000  | 10001  | 10010  | 00010  | 000010 |     (0x00119082)

- Example result of executing the instruction:

      $s1   0000 0000 0000 0000 0000 0000 0101 0110   = 86
      $s2   0000 0000 0000 0000 0000 0000 0001 0101   = 21 [86/4]
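Both shift examples, including their machine code, can be verified in a few lines. This is a minimal Python sketch with hypothetical helper names; register numbers are passed directly as integers.

```python
def sll(rt_value: int, shamt: int) -> int:
    """Shift left logical on a 32-bit value; bits shifted out are lost."""
    return (rt_value << shamt) & 0xFFFFFFFF


def srl(rt_value: int, shamt: int) -> int:
    """Shift right logical on a 32-bit value; zeros enter from the left."""
    return (rt_value & 0xFFFFFFFF) >> shamt


def encode_shift(funct: int, rd: int, rt: int, shamt: int) -> int:
    """R-type word with op = 0 and rs = 0 (rs is unused by shifts)."""
    return (rt << 16) | (rd << 11) | (shamt << 6) | funct


print(sll(13, 4), srl(86, 2))
print(f"0x{encode_shift(0, 10, 16, 4):08X}")  # sll $t2, $s0, 4
print(f"0x{encode_shift(2, 18, 17, 2):08X}")  # srl $s2, $s1, 2
```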
2. Loading constants into registers

- For a 16-bit constant, use the addi instruction:
  - Example: load the constant 0x4F3C into register $s0:
        addi $s0, $0, 0x4F3C    # $s0 = 0x4F3C
- For a 32-bit constant, use the lui and ori instructions:
      lui rt, constant_hi16bit
  - Copies the upper 16 bits of the 32-bit constant into the left 16 bits of rt
  - Clears the right 16 bits of rt to 0
      ori rt,rt,constant_low16bit
  - Places the lower 16 bits of the 32-bit constant into register rt

The lui instruction (load upper immediate)

      | op     | rs     | rt     | imm                 |
      | 6 bits | 5 bits | 5 bits | 16 bits             |

- Machine instruction for lui $s0, 0x21A0:

      | 15     | 0      | $s0    | 0x21A0              |
      | 15     | 0      | 16     | 0x21A0              |
      | 001111 | 00000  | 10000  | 0010 0001 1010 0000 |     (0x3C1021A0)

- Content of $s0 after the instruction is executed:

      $s0   0010 0001 1010 0000 0000 0000 0000 0000

Example of initializing a 32-bit register

- Load the following 32-bit value into register $s0:
      0010 0001 1010 0000 0100 0000 0011 1011 = 0x21A0 403B

      lui $s0,0x21A0          # load 0x21A0 into the upper half
                              # of register $s0
      ori $s0,$s0,0x403B      # load 0x403B into the lower half
                              # of register $s0

- Content of $s0 after the lui instruction:
      $s0   0010 0001 1010 0000 0000 0000 0000 0000
- OR with:
            0000 0000 0000 0000 0100 0000 0011 1011
- Content of $s0 after the ori instruction:
      $s0   0010 0001 1010 0000 0100 0000 0011 1011

3. Building control structures

- Branching structures
  - if
  - if/else
  - switch/case
- Loop structures
  - while
  - do while
  - for
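The lui/ori pair for building a 32-bit constant can be modeled on plain integers. This is a minimal Python sketch; `lui`, `ori` and `load_constant` are hypothetical helpers that only mimic the register effect of the instructions.

```python
def lui(imm: int) -> int:
    """Place the 16-bit immediate in the upper half; lower half is 0."""
    return (imm & 0xFFFF) << 16


def ori(rs_value: int, imm: int) -> int:
    """OR with the zero-extended 16-bit immediate."""
    return rs_value | (imm & 0xFFFF)


def load_constant(value: int) -> int:
    """Build a 32-bit constant from its upper and lower 16-bit halves."""
    hi, lo = (value >> 16) & 0xFFFF, value & 0xFFFF
    return ori(lui(hi), lo)


print(f"0x{lui(0x21A0):08X}")
print(f"0x{load_constant(0x21A0403B):08X}")
```

Because ori zero-extends its immediate, the lower half is merged in without disturbing the upper half, even when bit 15 of the lower half is 1.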
Branch and jump instructions

- Branch instructions: beq, bne
  - Branch to the labeled instruction if the condition is true; otherwise continue sequentially
  - beq rs, rt, L1
    - branch on equal
    - if (rs == rt), branch to the instruction at label L1
  - bne rs, rt, L1
    - branch on not equal
    - if (rs != rt), branch to the instruction at label L1
- Jump instruction j
  - j L1
    - jump unconditionally to the instruction at label L1

Translating an if statement

- C code:
      if (i==j)
          f = g+h;
      f = f-i;
  - f, g, h, i, j are in $s0, $s1, $s2, $s3, $s4

[Flowchart: i == j? Yes: f = g + h, then f = f - i. No: go directly to f = f - i]

- MIPS assembly code:
      # $s0 = f, $s1 = g, $s2 = h
      # $s3 = i, $s4 = j
            bne $s3, $s4, L1    # if i = j
            add $s0, $s1, $s2   # then f = g+h
      L1:   sub $s0, $s0, $s3   # f = f-i
- The condition tested in assembly is the opposite of the condition in the high-level language
Translating an if/else statement
- C code:
    if (i==j) f = g+h;
    else f = g-h;
- f, g, h, i, j are in $s0, $s1, $s2, $s3, $s4
- MIPS assembly code:
          bne $s3, $s4, Else    # if i == j
          add $s0, $s1, $s2     #   then f = g+h
          j   Exit              #   and exit
    Else: sub $s0, $s1, $s2     # if i != j then f = g-h
    Exit:
[Flowchart: i == j? Yes (i = j): f = g + h. No (i != j, else): f = g - h.]

Translating a switch/case statement
- C code:
    switch (amount) {
        case 20:  fee = 2; break;
        case 50:  fee = 3; break;
        case 100: fee = 5; break;
        default:  fee = 0;
    }
    // equivalent to using if/else statements
    if (amount == 20) fee = 2;
    else if (amount == 50) fee = 3;
    else if (amount == 100) fee = 5;
    else fee = 0;
Translating a switch/case statement
- MIPS assembly code
    # $s0 = amount, $s1 = fee
    case20:
        addi $t0, $0, 20          # $t0 = 20
        bne  $s0, $t0, case50     # amount == 20? if not, skip to case50
        addi $s1, $0, 2           # if so, fee = 2
        j    done                 # and break out of case
    case50:
        addi $t0, $0, 50          # $t0 = 50
        bne  $s0, $t0, case100    # amount == 50? if not, skip to case100
        addi $s1, $0, 3           # if so, fee = 3
        j    done
    case100:
        addi $t0, $0, 100         # $t0 = 100
        bne  $s0, $t0, default    # amount == 100? if not, skip to default
        addi $s1, $0, 5           # if so, fee = 5
        j    done
    default:
        add  $s1, $0, $0          # fee = 0
    done:
Translating a while loop
- C code:
    while (A[i] == k)
        i += 1;
- i is in $s3, k is in $s5
- The base address of array A is in $s6
- MIPS assembly code:
    Loop: sll  $t1, $s3, 2       # $t1 = 4*i
          add  $t1, $t1, $s6     # $t1 = address of A[i]
          lw   $t0, 0($t1)       # $t0 = A[i]
          bne  $t0, $s5, Exit    # if A[i] != k then Exit
          addi $s3, $s3, 1       # if A[i] == k then i = i+1
          j    Loop              # go back to Loop
    Exit: ...
[Flowchart: A[i] == k? Yes: i = i + 1 and test again. No: exit.]
Translating a for loop
- C code:
    // add the numbers from 0 to 9
    int sum = 0;
    int i;
    for (i=0; i!=10; i++) {
        sum = sum + i;
    }
- MIPS assembly code:
    # $s0 = i, $s1 = sum
          addi $s1, $0, 0        # sum = 0
          add  $s0, $0, $0       # i = 0
          addi $t0, $0, 10       # $t0 = 10
    for:  beq  $s0, $t0, done    # if i == 10, exit
          add  $s1, $s1, $s0     # if i < 10 then sum = sum+i
          addi $s0, $s0, 1       # increment i by 1
          j    for               # go back to for
    done: ...

Basic block
- A basic block is a sequence of instructions with
  - no embedded branches (except at the end)
  - no branch targets (except at the beginning)
- The compiler identifies basic blocks for optimization
- Advanced processors can accelerate the execution of basic blocks
More conditional operations
- slt (set on less than)
    slt rd, rs, rt
  - if (rs < rt) then rd = 1; otherwise rd = 0
- slti
    slti rt, rs, constant
  - if (rs < constant) then rt = 1; otherwise rt = 0
- Used in combination with beq and bne
        slt $t0, $s1, $s2      # if ($s1 < $s2)
        bne $t0, $zero, L1     #   branch to L1
        ...
    L1:

Signed and unsigned comparison
- Signed comparison: slt, slti
- Unsigned comparison: sltu, sltiu
- Example
    $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
    $s1 = 0000 0000 0000 0000 0000 0000 0000 0001
    slt  $t0, $s0, $s1    # signed
  - -1 < +1, so $t0 = 1
    sltu $t0, $s0, $s1    # unsigned
  - +4,294,967,295 > +1, so $t0 = 0
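The signed and unsigned comparison example can be mirrored in C by comparing one bit pattern under both interpretations. This is only a sketch: the function names are made up for illustration, and the cast to int32_t assumes the usual two's-complement behavior of mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring slt and sltu: return 1 if a < b, else 0. */
int slt(uint32_t a, uint32_t b)
{
    /* reinterpret both bit patterns as signed two's-complement values */
    return (int32_t)a < (int32_t)b;
}

int sltu(uint32_t a, uint32_t b)
{
    /* compare the same bit patterns as unsigned values */
    return a < b;
}
```

With a = 0xFFFFFFFF and b = 1, slt sees -1 < +1 while sltu sees 4,294,967,295 > 1, matching the two results on the slide.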
Example using the slt instruction
- C code
    int sum = 0;
    int i;
    for (i=1; i < 101; i = i*2) {
        sum = sum + i;
    }
- MIPS assembly code
    # $s0 = i, $s1 = sum
          addi $s1, $0, 0        # sum = 0
          addi $s0, $0, 1        # i = 1
          addi $t0, $0, 101      # $t0 = 101
    loop: slt  $t1, $s0, $t0     # if i >= 101
          beq  $t1, $0, done     #   then exit
          add  $s1, $s1, $s0     # if i < 101 then sum = sum+i
          sll  $s0, $s0, 1       # i = 2*i
          j    loop              # repeat
    done:
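As a quick check of what the slt loop computes, the C fragment can be wrapped in a function. The function name is an assumption added for illustration; the loop body is the one from the slide.

```c
/* Hypothetical wrapper around the slide's loop: adds the powers of two below 101. */
int sum_powers_of_two(void)
{
    int sum = 0;
    int i;
    for (i = 1; i < 101; i = i * 2) {
        sum = sum + i;    /* i takes the values 1, 2, 4, 8, 16, 32, 64 */
    }
    return sum;
}
```

The loop stops when i reaches 128, so the result is 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127, which is the value the MIPS version should leave in $s1.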
4. Programming with arrays
- Arrays give access to a large amount of data of the same type
- Index: used to access each element of the array
- Size: the number of elements in the array

Array example
- A 5-element array; each element is 32 bits long and occupies 4 bytes in memory
- Base address = 0x12348000 (the address of the first element, array[0])
- First step in accessing the array: load the base address into a register

    Address       Data
    0x12348000    array[0]
    0x12348004    array[1]
    0x12348008    array[2]
    0x1234800C    array[3]
    0x12348010    array[4]
Example: accessing array elements
- C code
    int array[5];
    array[0] = array[0] * 2;
    array[1] = array[1] * 2;
- MIPS assembly code
    # load the base address of the array into $s0
    lui $s0, 0x1234          # 0x1234 into the upper half of $s0
    ori $s0, $s0, 0x8000     # 0x8000 into the lower half of $s0

    lw  $t1, 0($s0)          # $t1 = array[0]
    sll $t1, $t1, 1          # $t1 = $t1 * 2
    sw  $t1, 0($s0)          # array[0] = $t1

    lw  $t1, 4($s0)          # $t1 = array[1]
    sll $t1, $t1, 1          # $t1 = $t1 * 2
    sw  $t1, 4($s0)          # array[1] = $t1

Example: a loop that accesses an array
- C code
    int array[1000];
    int i;
    for (i=0; i < 1000; i = i + 1)
        array[i] = array[i] * 8;
    // assume the base address of the array = 0x23b8f000
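The address arithmetic behind these array examples can be sketched in C. The helper names are hypothetical; they model how lui followed by ori builds the base address and how a 4-byte element offset is formed by shifting the index left by 2.

```c
#include <stdint.h>

/* Build a 32-bit constant the way lui + ori do: upper half first, then lower half. */
uint32_t lui_ori(uint16_t upper, uint16_t lower)
{
    return ((uint32_t)upper << 16) | lower;
}

/* Byte address of array[i] for 4-byte elements: base + 4*i (shift left by 2, then add). */
uint32_t element_address(uint32_t base, uint32_t i)
{
    return base + (i << 2);
}
```

For the base address 0x23b8f000 assumed above, element_address(base, 3) gives 0x23b8f00c, i.e. the fourth word of the array.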
Example: a loop that accesses an array (cont.)
- MIPS assembly code
    # $s0 = array base address (0x23b8f000), $s1 = i
    # initialize the registers
          lui  $s0, 0x23b8         # $s0 = 0x23b80000
          ori  $s0, $s0, 0xf000    # $s0 = 0x23b8f000
          addi $s1, $0, 0          # i = 0
          addi $t2, $0, 1000       # $t2 = 1000
    # loop
    loop: slt  $t0, $s1, $t2       # i < 1000?
          beq  $t0, $0, done       # if not then done
          sll  $t0, $s1, 2         # $t0 = i*4
          add  $t0, $t0, $s0       # address of array[i]
          lw   $t1, 0($t0)         # $t1 = array[i]
          sll  $t1, $t1, 3         # $t1 = array[i]*8
          sw   $t1, 0($t0)         # array[i] = array[i]*8
          addi $s1, $s1, 1         # i = i + 1
          j    loop                # repeat
    done:

5. Subroutines and procedures
- Required steps:
  1. Place the arguments in registers
  2. Transfer control to the procedure
  3. Perform the procedure's operations
  4. Place the result in a register for the calling program
  5. Return to the place of the call

Register usage
- $a0 to $a3: arguments (registers 4 to 7)
- $v0, $v1: result values (registers 2 and 3)
- $t0 to $t9: temporary values
  - may be overwritten by the called procedure
- $s0 to $s7: saved variables
  - must be saved and restored by the called procedure
- $gp: global pointer for static data (register 28)
- $sp: stack pointer (register 29)
- $fp: frame pointer (register 30)
- $ra: return address (register 31)

Procedure call instructions
- Procedure call: jump and link
    jal ProcedureAddress
  - The address of the following instruction (the return address) is saved in register $ra
  - Jumps to the address of the procedure
- Procedure return: jump register
    jr $ra
  - Copies the contents of $ra (which holds the return address) back into the program counter PC
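Illustration of a procedure call
[Figure: main prepares to call, executes jal proc (the PC moves to proc), proc saves registers, does its work, restores them and returns with jr $ra; main then prepares to continue.]

Nested procedure calls
[Figure: main prepares to call and executes jal abc; procedure abc saves registers, calls xyz with jal xyz, restores and returns with jr $ra; procedure xyz returns with jr $ra; main then prepares to continue.]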
Leaf procedure example
- A leaf procedure is a procedure that does not call any other procedure
- C code:
    int leaf_example (int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }
- The arguments g, h, i, j are in $a0, $a1, $a2, $a3
- f is in $s0 (so $s0 must be saved on the stack)
- $t0 and $t1 are used by the procedure for temporary values and are also saved before use
- The result is in $v0

MIPS assembly code
    leaf_example:
        addi $sp, $sp, -12      # make room for 3 items on the stack
        sw   $t1, 8($sp)        # save $t1
        sw   $t0, 4($sp)        # save $t0
        sw   $s0, 0($sp)        # save $s0
        add  $t0, $a0, $a1      # $t0 = g+h
        add  $t1, $a2, $a3      # $t1 = i+j
        sub  $s0, $t0, $t1      # $s0 = (g+h)-(i+j)
        add  $v0, $s0, $zero    # copy the result to $v0
        lw   $s0, 0($sp)        # restore $s0
        lw   $t0, 4($sp)        # restore $t0
        lw   $t1, 8($sp)        # restore $t1
        addi $sp, $sp, 12       # remove 3 items from the stack
        jr   $ra                # return to the caller
Non-leaf procedure example
- A non-leaf procedure is a procedure that calls another procedure
- C code:
    int fact (int n)
    {
        if (n < 1) return (1);
        else return n * fact(n - 1);
    }
- The argument n is in $a0
- The result is in $v0

MIPS assembly code
    fact:
        addi $sp, $sp, -8       # reserve stack space for 2 items
        sw   $ra, 4($sp)        # save the return address
        sw   $a0, 0($sp)        # save the argument n
        slti $t0, $a0, 1        # test n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1      # if so, the result is 1
        addi $sp, $sp, 8        #   pop 2 items from the stack
        jr   $ra                #   and return
    L1: addi $a0, $a0, -1       # if not, decrement n
        jal  fact               # recursive call
        lw   $a0, 0($sp)        # restore the original n
        lw   $ra, 4($sp)        #   and the return address
        addi $sp, $sp, 8        # pop 2 items from the stack
        mul  $v0, $a0, $v0      # multiply to get the result
        jr   $ra                # and return

Using the stack in procedure calls
[Figure: the stack before and after a call. Before calling: $fp and $sp delimit the frame for the current procedure (saved registers, local variables). After calling: the previous frame is kept, and a new frame for the current procedure holds the old $fp, saved registers and local variables, with $fp and $sp moved to the new frame.]

6. Character data
- Byte-encoded character sets
  - ASCII: 128 characters
    - 95 printable characters, 33 control codes
  - Latin-1: 256 characters
    - ASCII plus extended characters
- Unicode: 32-bit character set
  - Used in Java, C++, ...
  - Covers most characters of the world's languages, plus symbols
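The ASCII figures quoted above (95 printable characters and 33 control codes) can be checked with the standard <ctype.h> classification functions. This is a small sketch with made-up function names; it relies on the default "C" locale that a C program starts in.

```c
#include <ctype.h>

/* Count the printable characters among the 128 ASCII codes. */
int count_printable(void)
{
    int n = 0;
    for (int c = 0; c < 128; c++)
        if (isprint(c))
            n++;
    return n;
}

/* Count the control codes among the 128 ASCII codes. */
int count_control(void)
{
    int n = 0;
    for (int c = 0; c < 128; c++)
        if (iscntrl(c))
            n++;
    return n;
}
```

The printable range is 0x20 to 0x7E, and the control codes are 0x00 to 0x1F plus 0x7F, which accounts for all 128 codes.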
Byte/halfword operations
- Logical operations can be used
- MIPS byte/halfword load and store
    lb rt, offset(rs)
    lh rt, offset(rs)
  - Sign-extend to 32 bits in rt
    lbu rt, offset(rs)
    lhu rt, offset(rs)
  - Zero-extend to 32 bits in rt
    sb rt, offset(rs)
    sh rt, offset(rs)
  - Store only the rightmost byte/halfword

String copy example
- C code:
    void strcpy (char x[], char y[])
    { int i;
      i = 0;
      while ((x[i]=y[i])!='\0')
        i += 1;
    }
- The addresses of x and y are in $a0 and $a1
- i is in $s0
String copy example
- MIPS assembly code
    strcpy:
        addi $sp, $sp, -4         # adjust stack for 1 item
        sw   $s0, 0($sp)          # save $s0
        add  $s0, $zero, $zero    # i = 0
    L1: add  $t1, $s0, $a1        # addr of y[i] in $t1
        lbu  $t2, 0($t1)          # $t2 = y[i]
        add  $t3, $s0, $a0        # addr of x[i] in $t3
        sb   $t2, 0($t3)          # x[i] = y[i]
        beq  $t2, $zero, L2       # exit loop if y[i] == 0
        addi $s0, $s0, 1          # i = i + 1
        j    L1                   # next iteration of loop
    L2: lw   $s0, 0($sp)          # restore saved $s0
        addi $sp, $sp, 4          # pop 1 item from stack
        jr   $ra                  # and return

7. Integer multiply and divide instructions
- MIPS has two 32-bit registers: HI (high) and LO (low)
- Related instructions:
    mult  rs, rt    # signed integer multiply
    multu rs, rt    # unsigned integer multiply
  - The 64-bit product is placed in the register pair HI/LO
    div   rs, rt    # signed integer divide
    divu  rs, rt    # unsigned integer divide
  - HI holds the remainder, LO holds the quotient
    mfhi  rd        # move from HI to rd
    mflo  rd        # move from LO to rd
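How the HI/LO pair is filled can be modeled in C with 64-bit arithmetic. The struct and function names below are hypothetical, introduced only for this sketch; the signed division helper assumes a nonzero divisor and excludes the overflow case of dividing the most negative value by -1.

```c
#include <stdint.h>

/* Hypothetical stand-in for the HI/LO register pair. */
typedef struct { uint32_t hi, lo; } hilo_t;

/* mult: signed 32x32 -> 64-bit product, upper word in HI, lower word in LO. */
hilo_t mult(int32_t rs, int32_t rt)
{
    int64_t product = (int64_t)rs * (int64_t)rt;
    hilo_t r = { (uint32_t)((uint64_t)product >> 32), (uint32_t)product };
    return r;
}

/* div: remainder in HI, quotient in LO (rt must be nonzero). */
hilo_t div_signed(int32_t rs, int32_t rt)
{
    hilo_t r = { (uint32_t)(rs % rt), (uint32_t)(rs / rt) };
    return r;
}
```

For example, 0x10000 times 0x10000 is 2^32, which does not fit in 32 bits: LO becomes 0 and HI becomes 1, so a program that reads only mflo would see 0.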
8. Floating-point (FP) instructions
- Floating-point registers
  - 32 registers of 32 bits (single precision): $f0, $f1, ..., $f31
  - Paired to hold 64-bit data (double precision): $f0/$f1, $f2/$f3, ...
- Floating-point instructions operate only on floating-point registers
- FP load and store instructions
  - lwc1, ldc1, swc1, sdc1
  - Example: ldc1 $f8, 32($s2)

Floating-point instructions
- Arithmetic on 32-bit FP numbers (single precision)
  - add.s, sub.s, mul.s, div.s
  - Example: add.s $f0, $f1, $f6
- Arithmetic on 64-bit FP numbers (double precision)
  - add.d, sub.d, mul.d, div.d
  - Example: mul.d $f4, $f4, $f6
- Comparison instructions
  - c.xx.s, c.xx.d (where xx is eq, lt, le, ...)
  - Set or clear the FP condition-code bit
  - Example: c.lt.s $f3, $f4
- Branch instructions based on the condition code
  - bc1t, bc1f
  - Example: bc1t TargetLabel

5.5. MIPS addressing modes
- Branch instructions specify:
  - an opcode, two registers, and a constant
- Most branch targets are near the branch
  - forward or backward

    op       rs       rt       imm
    6 bits   5 bits   5 bits   16 bits

- PC-relative addressing
  - Target address = PC + imm × 4
  - Note: the PC has already been incremented beforehand
  - The 16-bit constant imm is in the range [-2^15, +2^15 - 1]

The beq and bne instructions
    beq $s0, $s1, Exit
    bne $s0, $s1, Exit
- Machine instruction

    op       rs       rt       imm
    6 bits   5 bits   5 bits   16 bits
    4 or 5   16       17       Exit (relative distance in words)

    beq   000100   10000   10001   0000 0000 0000 0110
    bne   000101   10000   10001   0000 0000 0000 0110
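The PC-relative rule can be written as a one-line computation. The sketch below uses a made-up function name and assumes, as in standard MIPS, that the PC has been incremented by 4 when the offset is added, so the offset is counted in words from the instruction after the branch.

```c
#include <stdint.h>

/* PC-relative target: the 16-bit immediate is a signed word offset from PC + 4. */
uint32_t branch_target(uint32_t branch_pc, int16_t imm)
{
    return branch_pc + 4 + (uint32_t)((int32_t)imm * 4);
}
```

With the immediate 0000 0000 0000 0110 (6) shown in the encodings above, a branch located at 0x00400000 would target 0x00400004 + 24 = 0x0040001C; a negative immediate reaches backward in the same way.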
Addressing for jump instructions
- The target of a jump instruction (j and jal) can be anywhere in the program
  - The full address has to be encoded in the instruction

    op       address
    6 bits   26 bits

- (Pseudo)direct jump addressing
  - Target address = PC[31..28] : (address × 4)

Example encodings of j and jr
    j  L1     # jump to the location labeled L1
    jr $ra    # jump to the address held in $ra;
              # $ra holds the return address
[Figure: j instruction: op = 000010 (j = 2) in bits 31 to 26, jump target address in bits 25 to 0. The effective 32-bit target address is formed from 4 bits taken from the PC, the 26-bit jump target address, and two 0 bits.]
[Figure: jr instruction: op = 000000 (ALU instruction), rs = 11111 (source register), rt, rd and sh unused (00000), fn = 001000 (jr = 8).]

Instruction encoding example

    Address   Instruction                  Fields (decimal)
    0x8000    Loop: sll  $t1, $s3, 2       0    0    19   9    2    0
    0x8004          add  $t1, $t1, $s6     0    9    22   9    0    32
    0x8008          lw   $t0, 0($t1)       35   9    8    0
    0x800C          bne  $t0, $s5, Exit    5    8    21   2
    0x8010          addi $s3, $s3, 1       8    19   19   1
    0x8014          j    Loop              2    0x2000
    0x8018    Exit: ...

Branching far away
- If the branch target is too far to encode with a 16-bit offset, the assembler rewrites the code
- Example
        beq $s0, $s1, L1
        (next instruction)
        ...
    L1:
  is replaced by the following sequence:
        bne $s0, $s1, L2
        j   L1
    L2: (next instruction)
        ...
    L1:
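The pseudodirect rule (upper 4 bits of the PC concatenated with the 26-bit address field times 4) can be sketched as follows. The function name is hypothetical, and the sketch assumes the upper bits are taken from PC + 4, as in standard MIPS.

```c
#include <stdint.h>

/* Pseudodirect target: PC[31..28] concatenated with (address26 x 4). */
uint32_t jump_target(uint32_t jump_pc, uint32_t address26)
{
    uint32_t upper = (jump_pc + 4) & 0xF0000000u;       /* 4 upper bits of the PC */
    return upper | ((address26 & 0x03FFFFFFu) << 2);    /* 26-bit word address x 4 */
}
```

For the loop example above, the j instruction at 0x8014 carries the address field 0x2000, and 0x2000 × 4 = 0x8000, the address of Loop.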
Summary of addressing modes
1. Immediate addressing
     op | rs | rt | Immediate
2. Register addressing
     op | rs | rt | rd | ... | funct        (operand: a register in the register file)
3. Base addressing
     op | rs | rt | Address                 (register + address selects a byte, halfword or word in memory)
4. PC-relative addressing
     op | rs | rt | Address                 (PC + address selects a word in memory)
5. Pseudodirect addressing
     op | Address                           (PC concatenated with address selects a word in memory)

FIGURE 2.18 Illustration of the five MIPS addressing modes. The operands are shaded in color. The operand of mode 3 is in memory, whereas the operand for mode 2 is a register. Note that versions of load and store access bytes, halfwords, or words. For mode 1, the operand is 16 bits of the instruction itself. Modes 4 and 5 address instructions in memory, with mode 4 adding a 16-bit address shifted left 2 bits to the PC and mode 5 concatenating a 26-bit address shifted left 2 bits with the 4 upper bits of the PC. Note that a single operation can use more than one addressing mode. Add, for example, uses both immediate (addi) and register (add) addressing.

Although we show MIPS as having 32-bit addresses, nearly all microprocessors (including MIPS) have 64-bit address extensions (see Appendix E and Section 2.18). These extensions were in response to the needs of software for larger programs. The process of instruction set extension allows architectures to expand in such a way that is able to move software compatibly upward to the next generation of architecture.

5.6. Translating and running assembly programs
- Software for MIPS assembly programming:
  - MARS
  - MipsIt
  - PCSpim
- MIPS Reference Data

Translating and running an application
[Figure: High Level Code -> Compiler -> Assembly Code -> Assembler -> Object File; Object Files and Library Files -> Linker -> Executable -> Loader -> Memory.]

Program in memory
- Instructions
- Data
  - Global/static: allocated before the program starts executing
  - Dynamic: allocated while the program is executing
- Memory:
  - 2^32 bytes = 4 GiB
  - Addresses from 0x00000000 to 0xFFFFFFFF
MIPS memory map

    Address                      Segment
    0xFFFFFFFC to 0x80000000     Reserved
    0x7FFFFFFC to 0x10010000     Dynamic Data (Stack at the top, Heap at the bottom)
    0x1000FFFC to 0x10000000     Static Data
    0x0FFFFFFC to 0x00400000     Text
    0x003FFFFC to 0x00000000     Reserved

Example: C code
    int f, g, y;    // global variables

    int sum(int a, int b);    // prototype so that main can call sum

    int main(void)
    {
        f = 2;
        g = 3;
        y = sum(f, g);
        return y;
    }

    int sum(int a, int b) {
        return (a + b);
    }
Example MIPS assembly program
    .data
    f:
    g:
    y:
    .text
    main:
        addi $sp, $sp, -4     # stack frame
        sw   $ra, 0($sp)      # store $ra
        addi $a0, $0, 2       # $a0 = 2
        sw   $a0, f           # f = 2
        addi $a1, $0, 3       # $a1 = 3
        sw   $a1, g           # g = 3
        jal  sum              # call sum
        sw   $v0, y           # y = sum()
        lw   $ra, 0($sp)      # restore $ra
        addi $sp, $sp, 4      # restore $sp
        jr   $ra              # return to OS
    sum:
        add  $v0, $a0, $a1    # $v0 = a + b
        jr   $ra              # return

Symbol table

    Symbol    Address
    f         0x10000000
    g         0x10000004
    y         0x10000008
    main      0x00400000
    sum       0x0040002C
Executable program

Figure 6.33 shows the executable file. It has three sections: the executable file header, the text segment, and the data segment. The executable file header reports the text size (code size) and data size (amount of globally declared data). Both are given in units of bytes. The text segment gives the instructions in the order that they are stored in memory. The figure shows the instructions in human-readable format next to the machine code for ease of interpretation, but the executable file includes only machine instructions. The data segment gives the address of each global variable. The global variables are addressed with respect to the base address given by the global pointer, $gp.

Figure 6.33 Executable

    Executable file header
        Text Size          Data Size
        0x34 (52 bytes)    0xC (12 bytes)

    Text segment
        Address       Instruction
        0x00400000    0x23BDFFFC    addi $sp, $sp, -4
        0x00400004    0xAFBF0000    sw   $ra, 0($sp)
        0x00400008    0x20040002    addi $a0, $0, 2
        0x0040000C    0xAF848000    sw   $a0, 0x8000($gp)
        0x00400010    0x20050003    addi $a1, $0, 3
        0x00400014    0xAF858004    sw   $a1, 0x8004($gp)
        0x00400018    0x0C10000B    jal  0x0040002C
        0x0040001C    0xAF828008    sw   $v0, 0x8008($gp)
        0x00400020    0x8FBF0000    lw   $ra, 0($sp)
        0x00400024    0x23BD0004    addi $sp, $sp, 4
        0x00400028    0x03E00008    jr   $ra
        0x0040002C    0x00851020    add  $v0, $a0, $a1
        0x00400030    0x03E00008    jr   $ra

    Data segment
        Address       Data
        0x10000000    f
        0x10000004    g
        0x10000008    y

Program in memory
[Figure: memory image of the loaded program. Reserved regions at the top and bottom of the address space; the stack at 0x7FFFFFFC ($sp = 0x7FFFFFFC); the heap from 0x10010000; the global variables f, g, y from 0x10000000 upward ($gp = 0x10008000); the machine instructions 0x23BDFFFC through 0x03E00008 stored from 0x00400000 upward (PC = 0x00400000).]
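The machine words in the text segment can be cross-checked by packing the instruction fields by hand. The two helpers below are a sketch with made-up names; the opcode values used in the check (8 for addi, 3 for jal) and the register numbers ($sp = 29, $a0 = 4) are standard MIPS values rather than something stated on this slide.

```c
#include <stdint.h>

/* I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits) */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int16_t imm)
{
    return (op << 26) | (rs << 21) | (rt << 16) | (uint16_t)imm;
}

/* J-type: op (6 bits) | word address (26 bits), i.e. the byte address divided by 4 */
uint32_t encode_j(uint32_t op, uint32_t target)
{
    return (op << 26) | ((target >> 2) & 0x03FFFFFFu);
}
```

encode_i(8, 29, 29, -4) reproduces 0x23BDFFFC for addi $sp, $sp, -4, and encode_j(3, 0x0040002C) reproduces 0x0C10000B for the jal to sum.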
Pseudoinstruction examples

    Pseudoinstruction        MIPS instructions
    li $s0, 0x1234AA77       lui $s0, 0x1234
                             ori $s0, $s0, 0xAA77
    mul $s0, $s1, $s2        mult $s1, $s2
                             mflo $s0
    clear $t0                add $t0, $0, $0
    move $s1, $s2            add $s1, $s2, $0
    nop                      sll $0, $0, 0

End of Chapter 5
Computer Architecture
Chapter 6
THE PROCESSOR
Nguyễn Kim Khánh

Course content
Chapter 4. Computer Arithmetic
Chapter 6. The Processor
Chapter 7. Computer Memory

Contents of Chapter 6
6.1. CPU organization
6.2. Control unit design
6.3. Instruction pipelining
6.4. Example: designing a processor based on the MIPS architecture*

6.1. CPU organization
1. Basic structure of the CPU
- Tasks of the CPU:
  - Fetch instruction: the CPU reads an instruction from memory
  - Decode instruction: determine the operation that the instruction requires
  - Fetch data: get data from memory or from I/O ports
  - Process data: perform an arithmetic or logic operation on the data
  - Write data: write data to memory or to an I/O port
Basic structure of the CPU
[Figure: arithmetic and logic unit (ALU), control unit (CU) and register file (RF) connected by an internal bus; a bus interface unit (BIU) connects the internal bus to the control bus, data bus and address bus.]

2. Arithmetic and logic unit
- Function: performs arithmetic operations and logic operations:
  - Arithmetic: add, subtract, multiply, divide, negate
  - Logic: AND, OR, XOR, NOT, bit shifts

ALU connection model
[Figure: the arithmetic and logic unit (ALU) receives data from the registers and control signals from the control unit; it sends data to the registers and sets the flag register.]
- Flag register: shows the status of the result of the operation

3. Control unit
- Functions
  - Controls fetching an instruction from memory into the CPU
  - Increments the PC to point to the next instruction
  - Decodes the fetched instruction to determine the operation it requires
  - Issues the control signals that carry out the instruction
  - Receives request signals from the system bus and responds to those requests
Control unit connection model
[Figure: inputs to the control unit: the instruction register, the flags, the clock, and control signals from the system bus. Outputs: control signals inside the CPU and control signals to the system bus (control bus).]

Signals sent to the control unit
- Clock: the timing signal from an external oscillator circuit
- The instruction from the instruction register, sent to be decoded
- Flags from the flag register, which indicate the status of the CPU
- Request signals from the control bus

Signals issued by the control unit
- Control signals inside the CPU:
  - control the registers
  - control the ALU
- Control signals outside the CPU:
  - control the memory
  - control the I/O modules

4. Operation of the instruction cycle
- Instruction cycle
  - Fetch instruction
  - Decode instruction
  - Fetch operand
  - Execute instruction
  - Store operand
  - Interrupt
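Instruction cycle state diagram
[Figure: states: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch (repeated for multiple operands), data operation, operand address calculation, operand store (repeated for multiple operands), interrupt check, interrupt (taken only if an interrupt is pending). The diagram loops back for string or vector data; when an instruction completes, the next instruction is fetched.]

Instruction fetch
- The CPU puts the address of the instruction to be fetched, taken from the program counter PC, onto the address bus
- The CPU issues the memory read control signal
- The instruction from memory is placed on the data bus and copied by the CPU into the instruction register IR
- The CPU increments the PC to point to the next instruction

Diagram of the instruction fetch process
[Figure: CPU containing PC, control unit and IR; memory; address bus, data bus and control bus.]
- PC: program counter
- IR: instruction register

Instruction decode
- The instruction in the instruction register IR is sent to the control unit
- The control unit decodes the instruction to determine the operation to perform
- Instruction decoding takes place inside the CPU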
Fetching data from memory
- The CPU puts the address of the operand onto the address bus
- The CPU issues the read control signal
- The operand is read into the CPU
- Similar to instruction fetch

Diagram of fetching data from memory
[Figure: CPU containing MAR, control unit and MBR; memory; address bus, data bus and control bus.]
- MAR: memory address register
- MBR: memory buffer register

Instruction execution
- Takes many forms, depending on the instruction
- It may be:
  - memory read/write
  - input/output
  - transfer between registers
  - arithmetic/logic operation
  - transfer of control (branch)
  - ...

Storing an operand
- The CPU puts the address onto the address bus
- The CPU puts the data to be written onto the data bus
- The CPU issues the write control signal
- The data on the data bus is copied to the specified location
Diagram of the operand store process
[Figure: CPU containing MAR, control unit and MBR; memory; address bus, data bus and control bus.]

Interrupt
- The contents of the program counter PC (the return address after the interrupt) are put onto the data bus
- The CPU puts an address (usually taken from the stack pointer SP) onto the address bus
- The CPU issues the memory write control signal
- The return address on the data bus is written to the specified location (on the stack)
- The address of the first instruction of the interrupt service routine is loaded into the PC

Diagram of the interrupt cycle
[Figure: CPU containing SP, MAR, PC, control unit and MBR; memory; address bus, data bus and control bus.]
- PC: program counter
- SP: stack pointer

6.2. Control unit design methods
- Microprogrammed control unit
- Hardwired control unit
1. Microprogrammed control unit
- The microprogram memory (ROM) stores the microprograms
- A microprogram consists of microinstructions
- Each microinstruction encodes one micro-operation
- Completing an instruction requires executing one or several microprograms
- Slow
[Figure: instruction register -> decoder -> sequencing logic (with the clock, the flags and control signals from the system bus as inputs) -> microinstruction address register -> microprogram memory -> microinstruction buffer register -> microinstruction decoder, which produces the control signals inside the CPU and the control signals to the system bus; the next-microinstruction information is fed back to the sequencing logic.]

2. Hardwired control unit
- Uses hardwired circuits to decode the instruction and generate the control signals that carry it out
- Fast
- The control unit is complex
[Figure: instruction register -> decoder -> control unit; a timing generator driven by the clock supplies T1, T2, ..., Tn; the flags are a further input; the outputs are the control signals C0, C1, ..., Cm-1.]

6.3. Instruction pipelining
- Instruction pipelining: divide the instruction cycle into stages and allow them to overlap in execution (like an assembly line)
- For example, the MIPS processor has 5 stages:
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Pipeline timing diagram

    Cycle     1    2    3    4    5    6    7    8    9    10   11   12
    instr 1   IF   ID   EX   MEM  WB
    instr 2        IF   ID   EX   MEM  WB
    instr 3             IF   ID   EX   MEM  WB
    instr 4                  IF   ID   EX   MEM  WB
    instr 5                       IF   ID   EX   MEM  WB
    instr 6                            IF   ID   EX   MEM  WB
    instr 7                                 IF   ID   EX   MEM  WB
    instr 8                                      IF   ID   EX   MEM  WB

- Time to execute one stage = T
- Time to execute 8 instructions sequentially: 8 × 5T = 40T
- Time to execute 8 instructions in the pipeline: (1 × 5T) + [(8-1) × T] = 12T
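The two timing formulas generalize to any number of stages and instructions. The sketch below uses hypothetical function names, counts time in units of T (one stage time), and assumes an ideal pipeline with no hazards or stalls.

```c
/* Sequential execution: every instruction takes all stages before the next one starts. */
unsigned sequential_time(unsigned stages, unsigned instructions)
{
    return stages * instructions;
}

/* Ideal pipeline: the first instruction takes all stages, each later one adds one cycle. */
unsigned pipelined_time(unsigned stages, unsigned instructions)
{
    if (instructions == 0)
        return 0;
    return stages + (instructions - 1);
}
```

For 5 stages and 8 instructions this gives 40T and 12T, the values on the slide; as the number of instructions grows, the ratio between the two approaches the number of stages.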
Pipeline hazards
- Hazard: a situation that prevents the next instruction from starting in the next cycle
  - Structural hazard: a required resource is busy
  - Data hazard: an instruction has to wait for a previous instruction to complete its data read/write
  - Control hazard: caused by branches

Structural hazard
- A conflict over the use of a resource
- In a MIPS pipeline with a single shared memory:
  - load/store instructions require a data access
  - instruction fetch would have to stall for that cycle
- Therefore, pipelined datapaths require separate instruction and data memories (or separate instruction and data caches)

Data hazard
- An instruction depends on the completion of a data access by a previous instruction
    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Forwarding (bypassing)
- Use the result as soon as it has been computed
  - Do not wait until the result has been stored in a register
  - Requires extra connections in the datapath
Data hazard with a load instruction
- Stalls cannot always be avoided by forwarding
  - if the value has not been computed when it is needed
  - a value cannot be forwarded backward in time
  - a stall (or bubble) has to be inserted

Code scheduling to avoid stalls
- Reorder the code to avoid using a load result in the very next instruction
- C code: a = b + e; c = b + f;

    Original order (13 cycles)        Reordered (11 cycles)
    lw   $t1, 0($t0)                  lw   $t1, 0($t0)
    lw   $t2, 4($t0)                  lw   $t2, 4($t0)
    (stall)                           lw   $t4, 8($t0)
    add  $t3, $t1, $t2                add  $t3, $t1, $t2
    sw   $t3, 12($t0)                 sw   $t3, 12($t0)
    lw   $t4, 8($t0)                  add  $t5, $t1, $t4
    (stall)                           sw   $t5, 16($t0)
    add  $t5, $t1, $t4
    sw   $t5, 16($t0)

Control hazard
- A branch determines the flow of control
  - Fetching the next instruction depends on the branch outcome
  - The pipeline cannot always fetch the correct instruction
    - It is still working on the instruction decode (ID) stage of the branch
- In the MIPS pipeline
  - The registers have to be compared and the target address computed early in the pipeline
  - Hardware is added to do this in the ID stage

Stall on branch
- Wait until the branch outcome has been determined before fetching the next instruction
Branch prediction
- Longer pipelines cannot readily determine the branch outcome early
  - Stalling becomes unacceptable
- Predict the outcome of the branch
  - Stall only if the prediction is wrong
- In MIPS
  - Branches can be predicted as not taken
  - The instruction after the branch is fetched immediately (no delay)

MIPS with predict-not-taken
[Figure: two pipeline timing diagrams, one where the prediction is correct and one where the prediction is incorrect.]

Characteristics of pipelining
- Pipelining improves performance by increasing instruction throughput
  - Several instructions are executed at the same time
  - Each instruction takes the same time to execute
- Types of hazard:
  - structural, data, control
- Instruction set design affects the complexity of the pipeline implementation

Increasing instruction-level parallelism
- Increase the number of pipeline stages
  [Figure: instructions 1 to 6 overlapped in a deeper pipeline.]
- Superscalar
  [Figure: instructions 1 to 6 issued in parallel, several per cycle.]
6.4. Designing a processor based on the MIPS architecture*
- "The Processor", Chapter 4 of [2]

End of Chapter 6

Computer Architecture
Chapter 7
COMPUTER MEMORY
Nguyễn Kim Khánh

Course content
Chapter 4. Computer Arithmetic
Chapter 6. The Processor
Chapter 7. Computer Memory
Contents of Chapter 7
7.1. Overview of the memory system
7.2. Main memory
7.3. Cache memory
7.4. External memory
7.5. Virtual memory

7.1. Overview of the memory system
1. Characteristics of memory
- Location
  - Inside the CPU:
    - register file
  - Internal memory:
    - main memory
    - cache memory
  - External memory:
    - storage devices
- Capacity
  - Word length (in bits)
  - Number of words

Characteristics of memory (cont.)
- Unit of transfer
  - Word
  - Block
- Access method
  - Sequential access (tape)
  - Direct access (disks)
  - Random access (semiconductor memory)
  - Associative access (cache)
- Performance
  - Access time
  - Memory cycle time
  - Transfer rate
- Physical type
  - Semiconductor memory
  - Magnetic memory
  - Optical memory
- Physical characteristics
  - Volatile / nonvolatile
  - Erasable / nonerasable
- Organization

2. Memory hierarchy
[Figure: the processor (CPU with register file and cache), main memory, storage devices (HDD, SSD), network storage.]
- From left to right:
  - capacity increases
  - speed decreases
  - cost per unit of capacity decreases

Memory technology

    Memory technology    Access time            Cost per GiB (2012)
    SRAM                 0.5 to 2.5 ns          $500 to $1000
    DRAM                 50 to 70 ns            $10 to $20
    Flash memory         5,000 to 50,000 ns     $0.75 to $1
    HDD                  5 to 20 ms             $0.05 to $0.1

- Ideal memory
  - Access time of SRAM
  - Capacity and cost of a hard disk

Principle of locality of memory references
- Within a short interval of time, the CPU usually references only information located in a local block of memory
- Examples:
  - Sequential program structure
  - Loops with a small body
  - Array data structures
7.2. Main memory
1. Semiconductor memory

    Memory type                           Category              Erasure                      Write mechanism   Volatility
    Read Only Memory (ROM)                Read-only memory      Not possible                 Mask              Nonvolatile
    Programmable ROM (PROM)               Read-only memory      Not possible                 Electrical        Nonvolatile
    Erasable PROM (EPROM)                 Read-mostly memory    UV light, whole chip         Electrical        Nonvolatile
    Electrically Erasable PROM (EEPROM)   Read-mostly memory    Electrical, byte level       Electrical        Nonvolatile
    Flash memory                          Read-mostly memory    Electrical, block level      Electrical        Nonvolatile
    Random Access Memory (RAM)            Read-write memory     Electrical, byte level       Electrical        Volatile

ROM (Read Only Memory)
- Nonvolatile memory
- Stores the following information:
  - Libraries of subroutines
  - System control programs (BIOS)
  - Function tables
  - Microprograms

Types of ROM
- Mask ROM:
  - the information is written at manufacturing time
- PROM (Programmable ROM)
  - Needs special equipment for writing
  - Can be written only once
- EPROM (Erasable PROM)
  - Needs special equipment for writing
  - Can be erased with ultraviolet light
  - Can be rewritten many times
- EEPROM (Electrically Erasable PROM)
  - Can be written byte by byte
  - Erased electrically

Flash memory
- Written in blocks
- Erased electrically
- Large capacity
RAM (Random Access Memory)
- Read/write memory
- Volatile
- Stores information temporarily
- Two types: SRAM and DRAM (Static and Dynamic)

SRAM (Static RAM)
- Bits are stored in flip-flops, so the information is stable
- Complex structure
- Small capacity per chip
- Fast
- Expensive
- Used for cache memory

DRAM (Dynamic RAM)
- Bits are stored on capacitors, so a refresh circuit is required
- Simple structure
- Large capacity
- Slower
- Cheaper
- Used for main memory

Some common advanced DRAMs
- Improved for higher speed
- Synchronous DRAM (SDRAM): operation is synchronized by a clock
- DDR-SDRAM (Double Data Rate SDRAM)
- DDR3, DDR4
Organization of a memory chip
- Basic diagram of a memory chip
[Figure: a memory chip of 2^n x m bits with address inputs A0, A1, ..., An-1, data lines D0, D1, ..., Dm-1, and control inputs CS, WE, OE.]

Signals of a memory chip
- Address lines: An-1 to A0, giving 2^n words
- Data lines: Dm-1 to D0, giving a word length of m bits
- Capacity of the memory chip = 2^n x m bits
- Control lines:
  - Chip select signal CS (Chip Select)
  - Read control signal OE (Output Enable)
  - Write control signal WE (Write Enable)
  (Control signals are usually active at level 0.)

Organization of DRAM
- Uses n multiplexed address lines, which allows 2n address bits to be transmitted
- Row address select signal RAS (Row Address Select)
- Column address select signal CAS (Column Address Select)
- Capacity of the DRAM = 2^(2n) x m bits

Memory chip examples

Packaging
As was mentioned in Chapter 2, an integrated circuit is mounted on a package that contains pins for connection to the outside world.
Figure 5.4a shows an example EPROM package, which is an 8-Mbit chip organized as 1M x 8. In this case, the organization is treated as a one-word-per-chip package. The package includes 32 pins, which is one of the standard chip package sizes. The pins support the following signal lines:
- The address of the word being accessed. For 1M words, a total of 20 (2^20 = 1M) pins are needed (A0 to A19).
- The data to be read out, consisting of 8 lines (D0 to D7).
- The power supply to the chip (Vcc).
- A ground pin (Vss).
- A chip enable (CE) pin. Because there may be more than one memory chip, each of which is connected to the same address bus, the CE pin is used to indicate whether or not the address is valid for this chip. The CE pin is activated by logic connected to the higher-order bits of the address bus (i.e., address bits above A19). The use of this signal is illustrated presently.
- A program voltage (Vpp) that is supplied during programming (write operations).
A typical DRAM pin configuration is shown in Figure 5.4b, for a 16-Mbit chip organized as 4M x 4. There are several differences from a ROM chip. Because a RAM can be updated, the data pins are input/output. The write enable (WE) and output enable (OE) pins indicate whether this is a write or read operation.

Figure 5.4 Typical Memory Package Pins and Signals
[Figure: top views of two packages. (a) 8-Mbit EPROM, 32-pin DIP: A0 to A19, D0 to D7, CE, Vpp, Vcc, Vss. (b) 16-Mbit DRAM, 24-pin DIP: A0 to A10, D0 to D3, RAS, CAS, WE, OE, NC, Vcc, Vss.]
Designing semiconductor memory modules
- Memory chip capacity: 2^n x m bits
- Designs that increase capacity:
  - Increase the word length
  - Increase the number of words
  - Combine both

Increasing the word length
- Example 1:
  - Given SRAM chips of 4K x 4 bits
  - Design a memory module of 4K x 8 bits
- Solution:
  - Chip capacity = 2^12 x 4 bits
  - The chip has:
    - 12 address pins
    - 4 data pins
  - The module needs:
    - 12 address pins
    - 8 data pins

Example diagram for increasing the word length
[Figure: two 4K x 4 chips share the address lines A11-A0 and the CS, WE and OE signals; one chip supplies data lines D3-D0 and the other D7-D4]

The general word-length problem
- Given memory chips of 2^n x m bits
- Design a memory module of 2^n x (k.m) bits
- Use k memory chips
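The word-length design of Example 1 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the course's own code: the class names `SramChip` and `WideModule` are hypothetical, and control signals (CS, WE, OE) are reduced to plain method calls.

```python
class SramChip:
    """Hypothetical model of a 2^n x m-bit SRAM chip."""

    def __init__(self, addr_bits, data_bits):
        self.data_bits = data_bits
        self.cells = [0] * (1 << addr_bits)

    def write(self, addr, value):
        # Only the chip's own m data pins are stored.
        self.cells[addr] = value & ((1 << self.data_bits) - 1)

    def read(self, addr):
        return self.cells[addr]


class WideModule:
    """k chips share the address lines; chip i drives data bits i*m .. i*m+m-1."""

    def __init__(self, addr_bits, data_bits, k):
        self.m = data_bits
        self.chips = [SramChip(addr_bits, data_bits) for _ in range(k)]

    def write(self, addr, value):
        for i, chip in enumerate(self.chips):
            chip.write(addr, value >> (i * self.m))

    def read(self, addr):
        return sum(chip.read(addr) << (i * self.m)
                   for i, chip in enumerate(self.chips))


# Example 1: two 4K x 4 chips form a 4K x 8 module.
module = WideModule(addr_bits=12, data_bits=4, k=2)
module.write(0xABC, 0x5A)
```

Every chip sees the same 12-bit address; only the slice of the data bus differs, which is why the address pin count of the module does not change.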
Increasing the number of words
- Example 2: build a 2^13 x 8-bit module from 2^12 x 8-bit chips
- Solution:
  - Chip capacity = 2^12 x 8 bits
  - The chip has:
    - 12 address pins
    - 8 data pins
  - Module capacity = 2^13 x 8 bits
  - The module needs:
    - 13 address pins
    - 8 data pins

Example diagram for increasing the number of words
[Figure: two chips share the address lines A11-A0, the data lines D7-D0 and the WE and OE signals; address line A12 drives a 1-to-2 decoder whose outputs Y0 and Y1 feed the CS input of each chip]

2-to-4 decoder
[Figure: inputs A, B; outputs Y0, Y1, Y2, Y3]

Combined design
- Example 3:
  - Method: combine Example 1 and Example 2
  - Draw the design diagram yourself
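The role of the decoder in Example 2 can be illustrated with a short sketch. It assumes the chips are selected by the address bits above the chip's own address pins; the function name `select_chip` is hypothetical.

```python
def select_chip(address, chip_addr_bits=12):
    """Split a module address into (chip index, address inside the chip)."""
    chip_index = address >> chip_addr_bits           # high bit(s) feed the decoder
    offset = address & ((1 << chip_addr_bits) - 1)   # low bits go to every chip
    return chip_index, offset
```

With 12-bit chips and a 13-bit module address, bit A12 alone decides which chip's CS input is asserted.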
2. Basic characteristics of main memory
- Holds the programs being executed and the data being used
- Present in every computer system
- Consists of memory locations addressed directly by the CPU
- The capacity of main memory is smaller than the memory address space managed by the CPU
- Logical management of main memory depends on the operating system

Interleaved memory organization
- Width of the data bus used to exchange data with memory: m = 8, 16, 32, 64, 128 ... bits
- Memory locations are organized by byte, so the physical memory organization differs with m
m = 8 bits: one linear memory bank
[Figure: a single bank holds bytes 0, 1, 2, 3, ...; address lines AN-1 - A0; data lines D7 - D0]

m = 16 bits: two interleaved memory banks
[Figure: bank 0 holds the even addresses 0, 2, 4, ..., 2i (enable BE0, data D7 - D0); bank 1 holds the odd addresses 1, 3, 5, ..., 2i+1 (enable BE1, data D15 - D8); address lines AN-1 - A1]

m = 32 bits: four interleaved memory banks
[Figure: banks 0 to 3 hold addresses 4i, 4i+1, 4i+2, 4i+3 (enables BE0 to BE3, data D7 - D0 up to D31 - D24); address lines AN-1 - A2]
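The interleaving pattern in these figures follows one rule: with an m-bit data bus there are m/8 byte-wide banks, and consecutive byte addresses rotate through them. A minimal sketch, with the hypothetical function name `bank_and_row`:

```python
def bank_and_row(byte_address, bus_width_bits):
    """Return (bank number, row inside the bank) for a byte address."""
    banks = bus_width_bits // 8          # one byte-wide bank per 8 data lines
    return byte_address % banks, byte_address // banks
```

For example, with a 32-bit bus address 11 is 4i+3 with i = 2, so it sits in bank 3, row 2.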
m = 64 bits: eight interleaved memory banks
[Figure: banks 0 to 7 hold addresses 8i, 8i+1, ..., 8i+7 (enables BE0 to BE7, data D7 - D0 up to D63 - D56); address lines AN-1 - A3]

7.3. Cache memory

1. General principles of cache
- Cache is faster than main memory
- Cache is placed between the CPU and main memory to speed up the CPU's memory accesses
- Cache may be located on the CPU chip
[Figure: CPU and cache exchange data by memory word; cache and main memory exchange data by memory block]

Example of cache operation
- The CPU requests the contents of a memory location
- The CPU checks the cache for this data
- If present, the CPU receives the data from the cache (fast)
- If not present, the Block containing the data is read from main memory into the cache
- The data is then passed from the cache to the CPU
General structure of cache / main memory
[Figure: the CPU connects to a cache of Lines L0, L1, ..., Li, ..., Lm-1, each with a Tag; main memory consists of Blocks B0, B1, ..., Bj, ..., Bp-1]

General structure of cache / main memory (cont.)
- Main memory has 2^N bytes
- Main memory and cache are divided into blocks of equal size
  - Main memory: B0, B1, B2, ..., Bp-1 (p Blocks)
  - Cache: L0, L1, L2, ..., Lm-1 (m Lines)
  - Block (Line) size = 8, 16, 32, 64, 128 bytes
- Each Line in the cache has a Tag attached to it

General structure of cache / main memory (cont.)
- Some Blocks of main memory are loaded into Lines of the cache
- The Tag content tells which Block of main memory is currently held in the Line
- The Tag content is updated each time a Block is loaded from main memory into the Line
- When the CPU accesses (reads/writes) a memory word, there are two possibilities:
  - The word is in the cache (cache hit)
  - The word is not in the cache (cache miss)

2. Mapping methods
(These are the methods of organizing cache memory)
- Direct mapping
- Fully associative mapping
- Set associative mapping
Direct mapping
- Each Block of main memory can be loaded into only one Line of the cache:
  - B0 -> L0
  - B1 -> L1
  - ....
  - Bm-1 -> Lm-1
  - Bm -> L0
  - Bm+1 -> L1
  - ....
- In general:
  - Bj can only be loaded into L(j mod m)
  - m is the number of Lines in the cache

Direct mapping (cont.)
[Figure: the N-bit memory address is split into Tag (T bits), Line (L bits) and Word (W bits); the Line field selects a cache Line, whose Tag is compared with the Tag field of the address to give a cache hit or a cache miss]

Direct mapping (cont.)
- The N-bit main memory address is divided into three fields:
  - The Word field has W bits and identifies a word within a Block or Line: 2^W = size of a Block or Line
  - The Line field has L bits and identifies one of the Lines in the cache: 2^L = number of Lines in the cache = m
  - The Tag field has T bits: T = N - (W + L)

Direct mapping (cont.)
- The Tag of each Line holds T bits
- When a Block is loaded from main memory into a Line of the cache, the Tag is updated with the T high-order address bits of the Block
- When the CPU wants to access a memory word, it issues a specific N-bit address
  - The L bits of the Line field select the corresponding Line
  - The Tag of that Line (T bits) is read and compared with the T leftmost bits of the issued address
    - Equal: cache hit
    - Different: cache miss
- Advantage: simple comparator
- Disadvantage: low cache hit probability
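The lookup procedure for direct mapping can be sketched as follows. This is an illustrative toy model, assuming byte addresses and storing only the Tags (no data); the class name `DirectMappedCache` and the tiny sizes are hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks Tags only."""

    def __init__(self, line_bits, word_bits):
        self.line_bits = line_bits
        self.word_bits = word_bits
        self.tags = [None] * (1 << line_bits)    # one Tag per Line; None = empty

    def access(self, address):
        """Return True on a cache hit, False on a miss (the Block is then loaded)."""
        line = (address >> self.word_bits) & ((1 << self.line_bits) - 1)
        tag = address >> (self.word_bits + self.line_bits)
        if self.tags[line] == tag:
            return True
        self.tags[line] = tag                    # replace whatever was in the Line
        return False


cache = DirectMappedCache(line_bits=2, word_bits=2)   # 4 Lines of 4 bytes
results = [cache.access(a) for a in (0x00, 0x01, 0x10, 0x00)]
```

Addresses 0x00 and 0x10 map to the same Line with different Tags, so they evict each other; this conflict behaviour is the reason for the low hit probability noted above.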
Fully associative mapping
- Each Block can be loaded into any Line of the cache
- The main memory address is divided into two fields:
  - The Word field
  - The Tag field, used to identify the Block of main memory
- The Tag identifies the Block currently held in the Line

Fully associative mapping (cont.)
[Figure: the N-bit memory address is split into Tag (T bits) and Word (W bits); the Tag field is compared with the Tags of all Lines L0 ... Lm-1 to give a cache hit or a cache miss]

Fully associative mapping (cont.)
- The Tag of each Line holds T bits
- When a Block is loaded from main memory into a Line of the cache, the Tag is updated with the T high-order address bits of the Block
- When the CPU issues a specific N-bit address:
  - The T leftmost bits of the issued address are compared with the contents of the Tags in the cache
    - If an equal value is found: cache hit at that Line
    - If no value is equal: cache miss
- Advantage: high cache hit probability
- Disadvantages:
  - Comparing against all the Tags at once takes a lot of time
  - Complex comparator
- Rarely used

Set associative mapping
- A compromise between the two methods above
- The cache is divided into Sets
- Each Set contains a number of Lines
- Example:
  - 4 Lines/Set -> 4-way associative mapping
- Mapping rule:
  - B0 -> S0
  - B1 -> S1
  - B2 -> S2
  - .......
Set associative mapping (cont.)
[Figure: the N-bit memory address is split into Tag (T bits), Set (S bits) and Word (W bits); the Set field selects one of the Sets S0 ... Sv-1, and the Tag field is compared with the Tags of the Lines in that Set to give a cache hit or a cache miss]

Set associative mapping (cont.)
- Block size = 2^W Words
- The Set field has S bits and identifies one of the Sets in the cache: 2^S = number of Sets in the cache
- The Tag field has T bits: T = N - (W + S)
- When the CPU issues a specific N-bit address:
  - The S bits of the Set field select the corresponding Set
  - The T leftmost bits of the issued address are compared with the contents of the Tags in the Set
    - If an equal value is found: cache hit at the corresponding Line
    - If no value is equal: cache miss
- Generalizes both methods above
- Commonly used with 2, 4, 8, 16 Lines/Set

Example of address mapping
- Assume the computer is byte-addressable
- Main memory address space = 4 GiB
- Cache capacity = 256 KiB
- Line (Block) size = 32 bytes
- Determine the number of bits in each address field for three organizations:
  - Direct mapping
  - Fully associative mapping
  - 4-way set associative mapping

With direct mapping
- Main memory = 4 GiB = 2^32 bytes -> N = 32 bits
- Cache = 256 KiB = 2^18 bytes
- Line = 32 bytes = 2^5 bytes -> W = 5 bits
- Number of Lines in the cache = 2^18 / 2^5 = 2^13 Lines -> L = 13 bits
- T = 32 - (13 + 5) = 14 bits
- Address format: Tag (14 bits) | Line (13 bits) | Word (5 bits)

With fully associative mapping
- Line = 32 bytes = 2^5 bytes -> W = 5 bits
- Number of bits in the Tag field: T = 32 - 5 = 27 bits
- Address format: Tag (27 bits) | Word (5 bits)

With 4-way set associative mapping
- Line = 32 bytes = 2^5 bytes -> W = 5 bits
- Number of Lines in the cache = 2^18 / 2^5 = 2^13 Lines
- One Set has 4 Lines = 2^2 Lines -> number of Sets in the cache = 2^13 / 2^2 = 2^11 Sets -> S = 11 bits
- Number of bits in the Tag field: T = 32 - (11 + 5) = 16 bits
- Address format: Tag (16 bits) | Set (11 bits) | Word (5 bits)
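The three calculations in the example share one formula, which the following sketch reproduces. It assumes all sizes are powers of two; the function name `field_widths` and its parameters are hypothetical. Direct mapping is treated as 1 Line per Set and fully associative mapping as a single Set holding every Line.

```python
def field_widths(addr_bits, cache_bytes, line_bytes, lines_per_set):
    """Return (tag bits, line-or-set bits, word bits) for a byte-addressed cache."""
    word_bits = (line_bytes - 1).bit_length()     # log2 of a power of two
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // lines_per_set
    index_bits = (num_sets - 1).bit_length()
    tag_bits = addr_bits - index_bits - word_bits
    return tag_bits, index_bits, word_bits


CACHE = 256 * 1024    # 256 KiB
LINE = 32             # bytes
direct = field_widths(32, CACHE, LINE, lines_per_set=1)
fully = field_widths(32, CACHE, LINE, lines_per_set=CACHE // LINE)
four_way = field_widths(32, CACHE, LINE, lines_per_set=4)
```

The Word field is the same in all three cases; raising the associativity only moves bits from the index field into the Tag.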
3. Block replacement in cache
With direct mapping:
- There is no choice to make
- Each Block maps to exactly one Line
- Replace the Block in that Line

Block replacement in cache (cont.)
With associative mapping, a replacement algorithm is needed:
- Random: replace a randomly chosen Block
- FIFO (First In First Out): replace the Block that has been in the Set the longest
- LFU (Least Frequently Used): replace the Block in the Set with the fewest accesses over the same period of time
- LRU (Least Recently Used): replace the Block in the corresponding Set that has gone unreferenced for the longest time
- Most effective: LRU
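LRU replacement within a single Set can be sketched with Python's `collections.OrderedDict`, which keeps entries in insertion order and can move an entry to the end on each reference. The class name `LruSet` is hypothetical and only Tags are tracked; real hardware typically uses a few status bits per Line rather than an ordered list.

```python
from collections import OrderedDict


class LruSet:
    """One cache Set with LRU replacement; stores Tags only."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()    # Tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (evicting the LRU Tag if full)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)        # now the most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)     # drop the least recently used
        self.lines[tag] = None
        return False


lru_set = LruSet(ways=2)
hits = [lru_set.access(t) for t in (1, 2, 1, 3, 2)]
```

In this trace the reference to Tag 1 protects it, so Tag 2 is the one evicted when Tag 3 arrives.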
4. Methods of writing data on a cache hit
- Write-through:
  - Writes to both the cache and main memory
  - Slow
- Write-back:
  - Writes only to the cache
  - Fast
  - When a Block in the cache is replaced, the whole Block must be written back to main memory

7.4. External memory
- Exists in the form of storage devices
- Types of external memory:
  - Magnetic tape: rarely used
  - Magnetic disk: hard disk drive (HDD)
  - Optical disc: CD, DVD
  - Flash memory:
    - Solid-state drive (SSD)
    - USB flash drive
    - Memory card

Hard disk (HDD - Hard Disk Drive)
- Large capacity
- Slow read/write speed
- High power consumption
- Prone to mechanical failure
- Cheap
[Figure 6.2 Disk Data Layout: tracks, sectors S1 ... SN, intersector gap, intertrack gap, read-write head (1 per surface)]
Figure 6.2 depicts this data layout. Adjacent tracks are separated by gaps. This
prevents, or at least minimizes, errors due to misalignment of the head or simply
interference of magnetic fields.
Data are transferred to and from the disk in sectors (Figure 6.2). There are
typically hundreds of sectors per track, and these may be of either fixed or variable
length. In most contemporary systems, fixed-length sectors are used, with 512 bytes
being the nearly universal sector size. To avoid imposing unreasonable precision
requirements on the system, adjacent sectors are separated by intratrack (intersector) gaps.
A bit near the center of a rotating disk travels past a fixed point (such as a read-write head) slower than a bit on the outside. Therefore, some way must be found
to compensate for the variation in speed so that the head can read all the bits at the
same rate. This can be done by increasing the spacing between bits of information
recorded in segments of the disk. The information can then be scanned at the same
rate by rotating the disk at a fixed speed, known as the constant angular velocity
(CAV). Figure 6.3a shows the layout of a disk using CAV. The disk is divided into
a number of pie-shaped sectors and into a series of concentric tracks. The advantage of using CAV is that individual blocks of data can be directly addressed by
track and sector. To move the head from its current location to a specific address, it
only takes a short movement of the head to a specific track and a short wait for the
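[Figure 6.5 Components of a Disk Drive: platters (surfaces 0 to 9), spindle, boom, read-write heads (1 per surface), direction of arm motion]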
tracks that are of equal distance from the center of the disk. The set of all the tracks
in the same relative position on the platter is referred to as a cylinder. For example,
all of the shaded tracks in Figure 6.6 are part of one cylinder.
Finally, the head mechanism provides a classification of disks into three types.
Traditionally, the read-write head has been positioned a fixed distance above the
platter, allowing an air gap. At the other extreme is a head mechanism that actually
comes into physical contact with the medium during a read or write operation. This
mechanism is used with the floppy disk, which is a small, flexible platter and the
least expensive type of disk.
SSD (Solid State Drive)
- Flash semiconductor memory
- Non-volatile
- Fast
- Low power consumption
- Consists of multiple flash memory chips and allows parallel access
- Less prone to failure
- Expensive

Optical discs
- CD (Compact Disc)
  - Common capacity: 650 MB
- DVD
  - Digital Video Disc or Digital Versatile Disc
  - Recorded on one or two sides
  - One or two layers per side
  - Common capacity: 4.7 GB/layer

Large-capacity storage systems: RAID
- Redundant Array of Inexpensive Disks (Redundant Array of Independent Disks)
- A set of physical hard disks that the OS treats as a single large-capacity logical drive
- Data is distributed across the physical disks -> parallel access (fast)
- Redundant information is stored as well, so that data can be recovered if a disk fails -> data safety
- 7 common levels (RAID 0 to 6)

RAID 0, 1, 2
[Figure 6.8 RAID Levels: (a) RAID 0 (Nonredundant), strips 0 to 15 distributed across four disks; (b) RAID 1 (Mirrored), every strip stored on two disks; (c) RAID 2 (Redundancy through Hamming code), data bits b0 to b3 with check bits f0(b), f1(b), f2(b)]
RAID Level 0
RAID level 0 is not a true member of the RAID family because it does not include
redundancy to improve performance. However, there are a few applications, such
as some on supercomputers in which performance and capacity are primary concerns and low cost is more important than improved reliability.
For RAID 0, the user and system data are distributed across all of the disks
in the array. This has a notable advantage over the use of a single large disk: If two
different I/O requests are pending for two different blocks of data, then there is a
good chance that the requested blocks are on different disks. Thus, the two requests
can be issued in parallel, reducing the I/O queuing time.
But RAID 0, as with all of the RAID levels, goes further than simply distribut-
RAID 3 & 4
[Figure 6.8 RAID Levels (continued): (d) RAID 3 (Bit-interleaved parity), data bits b0 to b3 with parity P(b); (e) RAID 4 (Block-level parity), blocks 0 to 15 with parity blocks P(0-3), P(4-7), P(8-11), P(12-15) on a dedicated disk]

RAID 5 & 6
[Figure 6.8 RAID Levels (continued): (f) RAID 5 (Block-level distributed parity), blocks 0 to 19 with parity blocks P(0-3) to P(16-19) spread across the disks; (g) RAID 6 (Dual redundancy), blocks 0 to 15 with two parity blocks P and Q per group of four blocks]
second strips on each disk; and so on. The advantage of this layout is that if a single I/O request consists of multiple logically contiguous strips, then up to n strips for that request can be handled in parallel, greatly reducing the I/O transfer time.
Figure 6.9 indicates the use of array management software to map between logical and physical disk space. This software may execute either in the disk subsystem or in a host computer.
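The round-robin striping that the array management software performs for RAID 0 reduces to a modulo and a division. A minimal sketch, assuming strips are numbered from 0 and placed on disks in turn; the function name `locate_strip` is hypothetical.

```python
def locate_strip(strip, num_disks):
    """Map a logical strip number to (physical disk, strip position on that disk)."""
    return strip % num_disks, strip // num_disks
```

With four disks, logical strips 4 to 7 become the second strip on disks 0 to 3, so a request covering them can be served by all four disks in parallel.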
RAID 0 data mapping
[Figure 6.9 Data Mapping for a RAID Level 0 Array: array management software maps strips 0 to 15 of the logical disk onto physical disks 0 to 3, with strips 0, 4, 8, 12 on disk 0, strips 1, 5, 9, 13 on disk 1, strips 2, 6, 10, 14 on disk 2 and strips 3, 7, 11, 15 on disk 3]

RAID 0 FOR HIGH DATA TRANSFER CAPACITY The performance of any of the RAID levels depends critically on the request patterns of the host system and on the layout of the data. These issues can be most clearly addressed in RAID 0, where the impact of redundancy does not interfere with the analysis. First, let us consider the use of RAID 0 to achieve a high data transfer rate. For applications to experience a high transfer rate, two requirements must be met. First, a high transfer capacity must exist along the entire path between host memory and the individual disk drives. This includes internal controller buses, host system I/O buses, I/O adapters, and host memory buses.
The second requirement is that the application must make I/O requests that drive the disk array efficiently. This requirement is met if the typical request is for large amounts of logically contiguous data, compared to the size of a strip. In this case, a single I/O request involves the parallel transfer of data from multiple disks, increasing the effective transfer rate compared to a single-disk transfer.

RAID 0 FOR HIGH I/O REQUEST RATE In a transaction-oriented environment,

7.5. Virtual memory
- The concept of virtual memory: main memory and external memory together, regarded by the CPU as a single memory (main memory)
- Techniques for implementing virtual memory:
  - Paging: divide the memory address space into pages of equal size lying next to one another
    - Common page size = 4 KiB
  - Segmentation: divide the memory space into segments of variable size; segments may overlap
A physical address is an actual location in main memory. When the processor executes a process, it automatically converts from logical to physical address by adding
the current starting location of the process, called its base address, to each logical
address. This is another example of a processor hardware feature designed to meet
an OS requirement. The exact nature of this hardware feature depends on the memory management strategy in use. We will see several examples later in this chapter.
Paging
- Divide memory into parts of equal size called page frames
- Divide the program (process) into pages
- Allocate the required number of page frames to the process
- The OS maintains a list of free page frames
- A process does not require contiguous page frames
- A page table is used for management

Both unequal fixed-size and variable-size partitions are inefficient in the use of memory. Suppose, however, that memory is partitioned into equal fixed-size chunks that are relatively small, and that each process is also divided into small fixed-size chunks of some size. Then the chunks of a program, known as pages, could be assigned to available chunks of memory, known as frames, or page frames. At most, then, the wasted space in memory for that process is a fraction of the last page.
Figure 8.15 shows an example of the use of pages and frames. At a given point in time, some of the frames in memory are in use and some are free. The list of free frames is maintained by the OS. Process A, stored on disk, consists of four pages. When it comes time to load this process, the OS finds four free frames and loads the four pages of the process A into the four frames.
Now suppose, as in this example, that there are not sufficient unused contiguous frames to hold the process. Does this prevent the OS from loading A? The answer is no, because we can once again use the concept of logical address. A simple base address will no longer suffice. Rather, the OS maintains a page table for each process. The page table shows the frame location for each page of the process. Within the program, each logical address consists of a page number and a relative address within the page. Recall that in the case of simple partitioning, a logical address is the location of a word relative to the beginning of the program; the processor translates that into a physical address. With paging, the logical-to-physical address translation is still done by processor hardware. The processor must know how to access the page table of the current process. Presented with a logical address (page number, relative address), the processor uses the page table to produce a physical address (frame number, relative address). An example is shown in Figure 8.16.
This approach solves the problems raised earlier. Main memory is divided into many small equal-size frames. Each process is divided into frame-size pages: smaller processes require fewer pages, larger processes require more. When a process is brought in, its pages are loaded into available frames, and a page table is set up.

Allocation of page frames
[Figure 8.15 Allocation of Free Frames: (a) Before: the free frame list holds 13, 14, 15, 18, 20 and process A has pages 0 to 3; (b) After: the four pages of A occupy frames taken from the list, the process A page table records the frame of each page, and the free frame list holds 20]

Logical and physical addresses in paging
[Figure 8.16 Logical and Physical Addresses: a logical address (page number, relative address within page) is translated through the process A page table into a physical address (frame number, relative address within frame)]

Working principle of paged virtual memory
- Demand paging
  - Not all pages of a process need to be in memory
  - Only the pages that are requested are loaded into memory
- Page fault
  - The requested page is not in memory
  - The OS must swap the requested page in
  - It may need to swap some page out to make room
  - The page to be swapped out must be chosen
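The logical-to-physical translation described above can be sketched as a table lookup. This is a software illustration of what the processor hardware does, under stated assumptions: the 4 KiB page size is the common value quoted earlier, and the page table contents `[18, 13, 14, 15]` are illustrative values only. The function name `translate` and the use of `None` for a page that is not in memory are hypothetical choices.

```python
PAGE_SIZE = 4096   # 4 KiB pages


def translate(logical_address, page_table, page_size=PAGE_SIZE):
    """Turn (page number, offset) into (frame number, offset) via the page table."""
    page, offset = divmod(logical_address, page_size)
    frame = page_table[page]
    if frame is None:
        # With demand paging, the OS would now bring the page in.
        raise LookupError(f"page fault on page {page}")
    return frame * page_size + offset


page_table = [18, 13, 14, 15]                  # illustrative: page i -> frame
physical = translate(1 * PAGE_SIZE + 30, page_table)
```

The offset within the page is copied unchanged; only the page number is replaced by a frame number, which is why the frames of a process need not be contiguous.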
Thrashing
- Too many processes in too little memory
- The OS spends all of its time swapping
- Little or no useful work gets done
- The disk activity light is always on
- Solutions:
  - Page replacement algorithms
  - Reduce the number of running processes
  - Add more memory

Benefits
- The whole process does not need to be in memory in order to run
- Pages can be swapped in on demand
- Processes larger than the total available memory can therefore be run
- Main memory is called real memory
- The user perceives a memory larger than real memory
Page table structure
[Figure 8.17 Inverted Page Table Structure: the page # (n bits) of the virtual address passes through a hash function to give an m-bit index into the inverted page table, which has one entry for each physical memory frame (entries 0 to 2^m - 1, each holding page #, process ID, control bits and chain); the frame # (m bits) and the offset form the real address]

inverted page table for each real memory page frame rather than one per virtual page. Thus a fixed proportion of real memory is required for the tables regardless of the number of processes or virtual pages supported. Because more than one virtual address may map into the same hash table entry, a chaining technique is used for managing the overflow. The hashing technique results in chains that are typically short, between one and two entries. The page table's structure is called inverted because it indexes page table entries by frame number rather than by virtual page number.

Translation Lookaside Buffer
In principle, then, every virtual memory reference can cause two physical memory accesses: one to fetch the appropriate page table entry, and one to fetch the desired data. Thus, a straightforward virtual memory scheme would have the effect

Memory on a PC
- Cache memory: integrated on the microprocessor chip:
  - L1: instruction cache and data cache
  - L2, L3
- Main memory: exists in the form of RAM modules
Memory on a PC (cont.)
- ROM BIOS contains the following programs:
  - POST (Power On Self Test) program
  - CMOS Setup program
  - Bootstrap loader program
  - Basic input-output drivers (BIOS)
- CMOS RAM:
  - Holds system configuration information
  - System clock
  - Has its own battery
- Video RAM: manages the information shown on the screen
- The various types of external memory

End of Chapter 7

Computer Architecture
Course contents
Chapter 1. Introduction
Chapter 2. The Basics of Digital Logic
Chapter 3. Computer Systems
Chapter 4. Computer Arithmetic
Chapter 5. Instruction Set Architecture
Chapter 6. The Processors
Chapter 7. Computer Memory
Chapter 8. Input-Output Systems
Chapter 9. Parallel Architectures

Chapter 8
INPUT-OUTPUT SYSTEMS
Nguyễn Kim Khánh
Contents of Chapter 8
8.1. Overview of the input-output system
8.2. I/O control methods
8.3. Interfacing I/O devices

8.1. Overview of the input-output system
- Function: exchange information between the computer and the outside world
- Basic operations:
  - Data input
  - Data output
- Main components:
  - I/O devices
  - I/O modules
[Figure: the system bus connects to I/O modules, each of which connects to one or more I/O devices]

Characteristics of the input-output system
- A wide variety of I/O devices exists, differing in:
  - Operating principle
  - Speed
  - Data format
- All I/O devices are slower than the CPU and RAM
- I/O modules are needed to interface the devices with the CPU and main memory

I/O devices
- Also called peripherals
- Function: convert data between the inside and the outside of the computer
- Classification:
  - Input devices
  - Output devices
  - Communication devices
- Communication:
  - Human - machine
  - Machine - machine
General structure of an I/O device
[Figure: a control logic block receives control signals and returns status signals; a data buffer exchanges data with the I/O module; a signal transducer exchanges data with the outside]

I/O module
- Functions:
  - Control and timing
  - Exchanging information with the CPU or main memory
  - Exchanging information with the I/O device
  - Buffering between the inside of the computer and the I/O device
  - Detecting errors of the I/O device

Structure of an I/O module
[Figure: on the bus side, the data lines connect to a data buffer and the address and control lines connect to a control logic block; on the device side, each I/O port carries data, control signals and status signals]

4. I/O port addressing (IO addressing)
- Most processors have a single address space shared by memory locations and I/O ports
  - Motorola 680x0 processors
  - RISC processors: MIPS, ARM, ...
- Some processors have two separate address spaces:
  - Memory address space
  - I/O address space
  - Example: Intel x86
Separate address spaces
[Figure: the memory address space uses N-bit addresses from 000...000 to 111...111; the I/O address space uses N1-bit addresses from 00...00 to 11...11]

I/O port addressing methods
- Memory-mapped I/O
- Isolated I/O (also called I/O-mapped I/O)
Memory-mapped I/O
- I/O ports are addressed within the memory address space
- The CPU treats an I/O port like a memory location
- Data exchange with I/O ports is programmed using memory access instructions
- Can be implemented on any system
- Example: the MIPS processor
  - 32-bit addresses for a single address space shared by memory locations and I/O ports
  - I/O ports are assigned addresses in a reserved address region
  - Data input/output: use load/store instructions

I/O programming example for MIPS
- Example: two I/O ports are assigned the addresses:
  - Port 1: 0xFFFFFFF4
  - Port 2: 0xFFFFFFF8
- Write the value 0x41 to port 1

      addi $t0, $0, 0x41      # put the value 0x41 in $t0
      sw   $t0, 0xFFF4($0)    # send it to port 1

  Note: the 16-bit value 0xFFF4 is sign-extended to the 32-bit value 0xFFFFFFF4
- Read data from port 2 into $t3

      lw   $t3, 0xFFF8($0)    # read data from port 2 into $t3
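The sign-extension step in the MIPS example can be checked numerically. The sketch below models how a load/store forms its effective address from a base register and a 16-bit offset; the function names `sign_extend_16` and `effective_address` are hypothetical helpers, not MARS or MIPS APIs.

```python
def sign_extend_16(imm):
    """Extend a 16-bit immediate to 32 bits by copying bit 15 into bits 31..16."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def effective_address(base, imm16):
    """Address used by lw/sw: base register plus the sign-extended offset, modulo 2^32."""
    return (base + sign_extend_16(imm16)) & 0xFFFFFFFF


port1 = effective_address(0, 0xFFF4)   # base register is $0, which always holds 0
port2 = effective_address(0, 0xFFF8)
```

Because bit 15 of 0xFFF4 is 1, the upper 16 bits become all ones, which is what lets a single instruction with base `$0` reach the top of the address space.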
Isolated I/O
- I/O ports are addressed in a separate I/O address space
- Data exchange with I/O ports is programmed using dedicated I/O instructions
- Example: Intel x86
  - Uses 8-bit or 16-bit addresses for the separate I/O address space
  - Has two dedicated I/O instructions
    - IN instruction: receives data from an input port
    - OUT instruction: sends data to an output port

8.2. I/O control methods
- Programmed I/O
- Interrupt-driven I/O
- Direct memory access - DMA

Three techniques for input of a block of data
[Figure 7.4 Three Techniques for Input of a Block of Data: (a) Programmed I/O: the CPU issues a read command to the I/O module, reads and checks the module status in a loop until it is ready (or an error condition occurs), reads a word from the I/O module, writes the word into memory, and repeats until done; (b) Interrupt-Driven I/O: after issuing the read command the CPU does something else until the I/O module interrupts it, then reads the status and transfers the word; (c) Direct Memory Access: the CPU issues a read block command to the DMA module, does something else, and reads the DMA module status when the DMA module interrupts it]

Figure 7.4a gives an example of the use of programmed I/O to read in a block of data from a peripheral device (e.g., a record from tape) into memory. Data are read in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must remain in a status-checking cycle until it determines that the word is available in the I/O module's data register. This flowchart highlights the main disadvantage of this technique: it is a time-consuming process that keeps the processor busy needlessly.

I/O Instructions
With programmed I/O, there is a close correspondence between the I/O-related

1. Programmed I/O
- General principle:
  - The CPU controls I/O directly by program, so I/O code must be written to exchange data between the CPU and the I/O module
  - The CPU is many times faster than I/O devices, so before executing an I/O instruction the program must read and check the ready status of the I/O module
[Flowchart: read the I/O module status -> ready? -> if no, read the status again; if yes, exchange data]
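The status-checking loop of programmed I/O can be simulated in a few lines. This is a sketch under stated assumptions: `SlowDevice` is a hypothetical stand-in for an I/O module that becomes ready only on every third status read, and `programmed_input` plays the role of the CPU's polling code.

```python
class SlowDevice:
    """Hypothetical I/O module that is ready only once every `delay` status reads."""

    def __init__(self, data, delay):
        self.data = list(data)
        self.delay = delay
        self.ticks = 0

    def ready(self):
        self.ticks += 1
        return self.ticks % self.delay == 0

    def read(self):
        return self.data.pop(0)


def programmed_input(device, count):
    """Read `count` words, polling the status before each transfer."""
    words, polls = [], 0
    while len(words) < count:
        polls += 1                      # the CPU does nothing useful here
        if device.ready():
            words.append(device.read())
    return words, polls


device = SlowDevice([0x41, 0x42], delay=3)
words, polls = programmed_input(device, 2)
```

Six status reads are spent to move two words; the wasted polls are the CPU time that interrupt-driven I/O and DMA aim to recover.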
I/O control signals
- Control signal: activates the I/O device
- Test signal: checks the status of the I/O module and the I/O device
- Read control signal: asks the I/O module to take data from the I/O device and place it in the data buffer, from which the CPU then receives it
- Write control signal: asks the I/O module to take data from the data bus into the data buffer and then pass it to the I/O device

I/O instructions
- With memory-mapped I/O: use the memory data-transfer instructions to exchange data with the I/O ports
- With isolated I/O: use the dedicated I/O instructions (IN, OUT)

Characteristics of programmed I/O
- I/O takes place when the programmer decides
- The CPU directly controls the data exchange between the CPU and the I/O module
- The CPU waits for the I/O module, which consumes a lot of CPU time

2. Interrupt-driven I/O
- General principle:
  - The CPU does not wait for the ready status of the I/O module; it executes some other program
  - When the I/O module is ready, it sends an interrupt signal to the CPU
  - The CPU executes the corresponding I/O interrupt service routine to exchange data
  - The CPU returns and continues the program that was interrupted
Transfer of control to the interrupt service routine
[Figure: the running program is interrupted after instruction i; control passes to the interrupt service routine, which ends with RETURN; execution then resumes at instruction i+1]

Data input operation: as seen from the I/O module
- The I/O module receives the read control signal from the CPU
- The I/O module takes data from the I/O device while the CPU does other work
- When the data is available, the I/O module sends an interrupt signal to the CPU
- The CPU requests the data
- The I/O module transfers the data to the CPU

Data input operation: as seen from the CPU
- Issue the read control signal
- Do other work
- At the end of each instruction cycle, check for an interrupt request signal
- If interrupted:
  - Save the context (the contents of the relevant registers)
  - Execute the interrupt service routine for data input
  - Restore the context of the program that was running

Design issues
- How can the I/O module that raised the interrupt be identified?
- What does the CPU do when several interrupt requests occur at the same time?
Interrupt interfacing methods
- Multiple interrupt request lines
- Software poll
- Hardware poll (daisy chain)
- Interrupt controller (PIC)

Multiple interrupt request lines
[Figure: the CPU has an interrupt request register with lines INTR0, INTR1, INTR2, INTR3, each connected to one I/O module]
- Each I/O module is connected to its own interrupt request line
- The CPU must have multiple interrupt request lines
- The number of I/O modules is limited
- The interrupt lines are assigned priority levels

Software poll
[Figure: I/O modules, each with an interrupt flag, share a single INTR line to the CPU]
- The CPU runs software that polls each I/O module in turn
- Slow
- The order in which the modules are polled is the priority order

Hardware poll
[Figure: I/O modules share the INTR line and the data bus; the INTA signal from the CPU passes through the modules one after another]
Hi vng bng phn cng (tip)
B iu khin ngt lp trnh c

INTR n
Bus d liu
CPU pht tn hiu chp nhn ngt

(INTA) n m-un vo-ra u tin
Nu m-un vo-ra khng gy ra
ngt th n gi tn hiu n m-un k
tip cho n khi xc nh c m-un
gy ngt
Th t cc m-un vo-ra kt ni trong
chui xc nh th t u tin
INTRn-1
n
n
461
M-un
vo-ra
M-un
vo-ra
462
NKK-HUST
3. DMA (Direct Memory Access)
C s kt hp gia phn cng v phn

mm
Phn cng: gy ngt CPU

n Phn mm: trao i d liu gia CPU vi
m-un vo-ra
Jan2016
M-un
vo-ra
INTR0
PIC Programmable Interrupt Controller

PIC c nhiu ng vo yu cu ngt c qui
nh mc u tin
PIC chn mt yu cu ngt khng b cm c
mc u tin cao nht gi ti CPU
Jan2016
c im ca vo-ra iu khin bng ngt
INTR1
M-un
vo-ra
NKK-HUST
PIC
INTA
n
Jan2016
...
INTR
CPU
463
Jan2016
Chim thi gian ca CPU
khc phc dng k thut DMA

n
CPU trc tip iu khin vo-ra

CPU khng phi i m-un vo-ra, do
hiu qu s dng CPU tt hn
Vo-ra bng chng trnh v bng ngt

do CPU trc tip iu khin:
S dng m-un iu khin vo-ra chuyn

dng, gi l DMAC (Controller), iu khin
trao i d liu gia m-un vo-ra vi b
nh chnh
464
116
Jan2016
NKK-HUST
NKK-HUST
Block diagram of the DMAC

[Figure: the DMAC contains a data counter, a data register, an address register and control logic. External connections: data lines, address lines, DMA request, DMA acknowledge, bus request, bus grant, interrupt, read control, write control.]

Components of the DMAC
- Data register: holds the data being exchanged.
- Address register: holds the address of the memory location of the data.
- Data counter: holds the number of data words to be exchanged.
- Control logic: controls the operation of the DMAC.

DMA operation
- The CPU "tells" the DMAC:
  - whether the operation is data input or output;
  - the address of the I/O device (the corresponding I/O port);
  - the starting address of the memory block holding the data, which is loaded into the address register;
  - the number of data words to transfer, which is loaded into the data counter.
- The CPU does other work.
- The DMAC controls the data exchange.
- After each data word is transferred:
  - the contents of the address register are incremented;
  - the contents of the data counter are decremented.
- When the data counter reaches 0, the DMAC sends an interrupt signal to the CPU to report that the DMA has finished.

DMA transfer modes
- Block-transfer DMA: the DMAC uses the bus until the whole block of data has been transferred.
- Cycle-stealing DMA: the DMAC forces the CPU to suspend temporarily for one bus cycle at a time; the DMAC takes the bus and transfers one data word.
- Transparent DMA: the DMAC detects the cycles in which the CPU is not using the bus and takes the bus to exchange one data word.
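The register updates listed under "DMA operation" can be sketched as a toy model of an input transfer. The class and method names (`DMAC`, `program`, `transfer_word`, `dma_input`) are assumptions made for this illustration; main memory is modelled as a Python list and bus arbitration is not modelled.

```python
class DMAC:
    """Toy DMA controller with an address register and a data counter."""

    def __init__(self):
        self.address_register = 0
        self.data_counter = 0
        self.interrupt = False

    def program(self, start_address, word_count):
        # The CPU loads the start address and the number of words, then moves on.
        self.address_register = start_address
        self.data_counter = word_count
        self.interrupt = False

    def transfer_word(self, memory, word):
        # One word moves from the I/O module into main memory.
        memory[self.address_register] = word
        self.address_register += 1      # address register is incremented
        self.data_counter -= 1          # data counter is decremented
        if self.data_counter == 0:
            self.interrupt = True       # tell the CPU the DMA has finished


def dma_input(device_words, memory, start_address):
    dmac = DMAC()
    dmac.program(start_address, len(device_words))
    for word in device_words:
        # In cycle-stealing mode each iteration would cost one stolen bus cycle;
        # in block-transfer mode the loop would run with the bus held throughout.
        dmac.transfer_word(memory, word)
    return dmac


memory = [0] * 8
dmac = dma_input([7, 8, 9], memory, start_address=2)
```

After the call, the three words sit at addresses 2 to 4, the address register points just past the block, the counter is 0 and the interrupt flag is set.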
DMA configuration (1)

[Figure: the CPU, the DMAC, the I/O modules and the memory are all attached to a single system bus.]

- For each data item exchanged, the DMAC uses the bus twice:
  - between the I/O module and the DMAC;
  - between the DMAC and memory.

DMA configuration (2)

[Figure: the CPU, the DMACs and the memory are attached to the system bus; the I/O modules are attached directly to a DMAC.]

- A DMAC controls one or several I/O modules.
- For each data item exchanged, the DMAC uses the bus once:
  - between the DMAC and memory.

DMA configuration (3)

[Figure: the CPU, the DMAC and the memory are attached to the system bus; the I/O modules are attached to the DMAC through a separate I/O bus.]

- A separate I/O bus supports all DMA-capable devices.
- For each data item exchanged, the DMAC uses the system bus once:
  - between the DMAC and memory.

Characteristics of DMA
- The CPU does not take part in the data exchange.
- The DMAC controls the data exchange between main memory and the I/O module entirely in hardware, so it is fast.
- Suitable for exchanging large blocks of data.
4. I/O processor
- I/O control is carried out by a dedicated I/O processor.
- The I/O processor runs its own program.
- The I/O processor's program may reside in main memory or in a separate memory.

8.3. Interfacing I/O devices

1. Types of I/O interfacing
- Parallel interfacing
- Serial interfacing

Parallel interfacing

[Figure: a parallel I/O module between the system bus and the I/O device.]

- Several bits are transferred in parallel.
- Fast.
- Requires many data lines.

Serial interfacing

[Figure: a serial I/O module between the system bus and the I/O device.]

- Bits are transferred one at a time.
- Requires a converter from parallel data to serial data and/or vice versa.
- Slower.
- Requires few data lines.
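The parallel-to-serial conversion mentioned for serial interfacing can be sketched with two short functions. This is an illustrative sketch only: the function names are invented for the example, and sending the least significant bit first is an assumption (the slides do not specify a bit order).

```python
def parallel_to_serial(byte):
    """Shift an 8-bit value out one bit at a time, least significant bit first."""
    return [(byte >> i) & 1 for i in range(8)]


def serial_to_parallel(bits):
    """Reassemble bits received one at a time (LSB first) into a single value."""
    value = 0
    for position, bit in enumerate(bits):
        value |= bit << position
    return value


bits = parallel_to_serial(0b10110010)
restored = serial_to_parallel(bits)
```

A parallel interface would place all eight bits on eight data lines in one step; the serial interface needs one line but eight steps, which is the speed and wiring trade-off listed above.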
2. Interfacing configurations

Point to point
- Through one I/O port, a connection is made to a single device.

Point to multipoint
- One I/O port allows connection to multiple devices.
- Examples:
  - USB (Universal Serial Bus): 127 devices
  - IEEE 1394 (FireWire): 63 devices

Thunderbolt

[Figure 7.17 Example Computer Configuration with Thunderbolt. Labels: computer, processor, memory, graphics subsystem, platform controller hub (PCH), DisplayPort, PCIe x4, Thunderbolt controller (TC), Thunderbolt connector, daisy chain of Thunderbolt devices, 20 Gbps (max).]

THUNDERBOLT PROTOCOL ARCHITECTURE Figure 7.18 illustrates the Thunderbolt protocol architecture. The cable and connector layer provides transmission medium access. This layer specifies the physical and electrical attributes of the connector port.

The Thunderbolt protocol physical layer is responsible for link maintenance including hot-plug detection (see the note below) and data encoding to provide highly efficient data transfer. The physical layer has been designed to introduce very minimal overhead and provides full-duplex 10 Gbps of usable capacity to the upper layers.

The common transport layer is the key to the operation of Thunderbolt and what makes it attractive as a high-speed peripheral I/O technology. Some of the features include:
- A high-performance, low-power, switching architecture.
- A highly efficient, low-overhead packet format with flexible quality of service (QoS) support that allows multiplexing of bursty PCI Express transactions

Note: The term hot plug is defined as pulling out a component from a system and plugging in a new one while the main power is still on. It allows an external drive, network adapter, or other peripheral to be plugged in without having to power down the computer.

End of Chapter 8

Computer Architecture
Chapter 9
PARALLEL ARCHITECTURES
Nguyễn Kim Khánh
Contents of Chapter 9
9.1. Classification of computer architectures
9.2. Shared-memory multiprocessors
9.3. Distributed-memory multiprocessors
9.4. General-purpose graphics processing units

9.1. Classification of computer architectures

Classification of computer architectures (Michael Flynn, 1966):
- SISD - Single Instruction Stream, Single Data Stream
- SIMD - Single Instruction Stream, Multiple Data Stream
- MISD - Multiple Instruction Stream, Single Data Stream
- MIMD - Multiple Instruction Stream, Multiple Data Stream

SISD

[Figure: the control unit (CU) sends the instruction stream (IS) to the processing unit (PU), which exchanges the data stream (DS) with the memory unit (MU).]

CU: Control Unit
PU: Processing Unit
MU: Memory Unit

- A single processor.
- A single instruction stream.
- Data is stored in a single memory.
- This is the (sequential) von Neumann architecture.
SIMD

[Figure: one control unit (CU) sends the instruction stream (IS) to the processing units PU1, PU2, ..., PUn; each processing unit exchanges a data stream (DS) with its own local memory LM1, LM2, ..., LMn.]

- A single instruction stream controls the processing units (PUs) simultaneously.
- Each processing unit has its own data memory, LM (local memory).
- Each instruction is executed on a set of different data.
- SIMD models:
  - Vector computer
  - Array processor

MISD
- A single data stream is sent to a set of processors.
- Each processor executes a different instruction sequence.
- No practical computer of this kind exists yet.
- Such machines may appear in the future.

MIMD
- A set of processors.
- The processors simultaneously execute different instruction sequences on different data.
- MIMD models:
  - Multiprocessors (Shared Memory)
  - Multicomputers (Distributed Memory)
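The difference between the SIMD and MIMD classes can be sketched in a few lines. This is a conceptual model only: the function names are invented for the example, each local memory is reduced to a single value, and the Python loops run sequentially, whereas real processing units would work at the same time.

```python
def simd_step(instruction, local_memories):
    """One instruction stream: every processing unit applies the same
    operation to the data in its own local memory."""
    return [instruction(data) for data in local_memories]


def mimd_step(instructions, local_memories):
    """Multiple instruction streams: processing unit i applies its own
    operation to its own data."""
    return [op(data) for op, data in zip(instructions, local_memories)]


simd_result = simd_step(lambda x: x * 2, [1, 2, 3])
mimd_result = mimd_step([lambda x: x + 1, lambda x: x * 10, abs], [1, 2, -3])
```

In the SIMD call a single operation is broadcast to all units; in the MIMD call each unit has its own operation, which corresponds to each processor having its own control unit.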
MIMD - Shared Memory

Shared-memory multiprocessors

[Figure: control units CU1, CU2, ..., CUn send instruction streams (IS) to processing units PU1, PU2, ..., PUn, which exchange data streams (DS) with a shared memory.]

MIMD - Distributed Memory

Distributed-memory multiprocessors (or multicomputers)

[Figure: each pair CUi and PUi has its own local memory LMi; the local memories are linked by a high-performance interconnection network.]

Classification of parallelism techniques
- Instruction-level parallelism: pipeline, superscalar
- Data-level parallelism: SIMD
- Thread-level parallelism: MIMD
- Request-level parallelism: Cloud computing

9.2. Shared-memory multiprocessors
- Symmetric multiprocessor systems (SMP: Symmetric Multiprocessors)
- Non-symmetric multiprocessor systems (NUMA: Non-Uniform Memory Access)
- Multicore processors
SMP or UMA (Uniform Memory Access)
- A computer with n >= 2 identical processors.
- The processors share the memory and the I/O system.
- Memory access time is the same for all processors.
- The processors can perform the same functions.
- The system is controlled by a distributed operating system.
- Performance: tasks can be executed in parallel.
- Fault tolerance.

SMP (continued)

Memory consistency is not a done deal. Researchers are still proposing new models (Naeem et al., 2011, Sorin et al., 2011, and Tu et al., 2010).

8.3.3 UMA Symmetric Multiprocessor Architectures

The simplest multiprocessors are based on a single bus, as illustrated in Fig. 8-26(a). Two or more CPUs and one or more memory modules all use the same bus for communication. When a CPU wants to read a memory word, it first checks to see whether the bus is busy. If the bus is idle, the CPU puts the address of the word it wants on the bus, asserts a few control signals, and waits until the memory puts the desired word on the bus.

If the bus is busy when a CPU wants to read or write memory, the CPU just waits until the bus becomes idle. Herein lies the problem with this design. With two or three CPUs, contention for the bus will be manageable; with 32 or 64 it will be unbearable. The system will be totally limited by the bandwidth of the bus, and most of the CPUs will be idle most of the time.

The solution is to add a cache to each CPU, as depicted in Fig. 8-26(b). The cache can be inside the CPU chip, next to the CPU chip, on the processor board, or some combination of all three. Since many reads can now be satisfied out of the local cache, there will be much less bus traffic, and the system can support more CPUs. Thus caching is a big win here. However, as we shall see in a moment, keeping the caches consistent with one another is not trivial.

Yet another possibility is the design of Fig. 8-26(c), in which each CPU has not only a cache but also a local, private memory which it accesses over a dedicated (private) bus. To use this configuration optimally, the compiler should place all the program text, strings, constants and other read-only data, stacks, and local variables in the private memories. The shared memory is then used only for writable shared variables. In most cases, this careful placement will greatly reduce bus traffic, but it does require active cooperation from the compiler.

[Figure 8-26. Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories. Labels: CPU, cache, private memory, shared memory, bus.]

NUMA (Non-Uniform Memory Access)
- There is a single shared address space for all CPUs.
- Each CPU can remotely access the memory of another CPU.
- Remote memory access is slower than local memory access.

system is called CC-NUMA (at least by the hardware people). The software people often call it hardware DSM because it is basically the same as software distributed shared memory but implemented by the hardware using a small page size.

One of the first NC-NUMA machines (although the name had not yet been coined) was the Carnegie-Mellon Cm*, illustrated in simplified form in Fig. 8-32 (Swan et al., 1977). It consisted of a collection of LSI-11 CPUs, each with some memory addressed over a local bus. (The LSI-11 was a single-chip version of the DEC PDP-11, a minicomputer popular in the 1970s.) In addition, the LSI-11 systems were connected by a system bus. When a memory request came into the (specially modified) MMU, a check was made to see if the word needed was in the local memory. If so, a request was sent over the local bus to get the word. If not, the request was routed over the system bus to the system containing the word, which then responded. Of course, the latter took much longer than the former. While a program could run happily out of remote memory, it took 10 times longer to execute than the same program running out of local memory.

[Figure 8-32. A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design. Labels: CPU, memory, MMU, local bus, system bus.]

Memory coherence is guaranteed in an NC-NUMA machine because no caching is present. Each word of memory lives in exactly one location, so there is no danger of one copy having stale data: there are no copies. Of course, it now matters a great deal which page is in which memory because the performance penalty for being in the wrong place is so high. Consequently, NC-NUMA machines use elaborate software to move pages around to maximize performance.

Typically, a daemon process called a page scanner runs every few seconds. Its job is to examine the usage statistics and move pages around in an attempt to improve performance. If a page appears to be in the wrong place, the page scanner unmaps it so that the next reference to it will cause a page fault. When the fault occurs, a decision is made about where to place the page, possibly in a different memory. To prevent thrashing, usually there is some rule saying that once a page is placed, it is frozen in place for a time T. Various algorithms have been studied, but the conclusion is that no one algorithm performs best under all circumstances (LaRowe and Ellis, 1991). Best performance depends on the application.

Multicore processors (multicores)
- Evolution of the processor:
  - Sequential
  - Pipeline
  - Superscalar
  - Multithreading
  - Multicore: multiple CPUs on one chip

[Figure 18.1 Alternative Chip Organizations. (a) Superscalar: program counter, instruction fetch unit, issue logic, single-thread register file, execution units and queues, L1 instruction cache, L1 data cache, L2 cache. (b) Simultaneous multithreading: PC 1 to PC n, register sets 1 to n, instruction fetch unit, issue logic, execution units and queues, L1 instruction cache, L1 data cache, L2 cache. (c) Multicore: processors 1 to n (superscalar or SMT), each with L1-I and L1-D caches, and an L2 cache.]

For each of these innovations, designers have over the years attempted to increase the performance of the system by adding complexity. In the case of pipelining, simple three-stage pipelines were replaced by pipelines with five stages, and then many more stages, with some implementations having over a dozen stages. There is a practical limit to how far this trend can be taken, because with more stages, there is the need for more logic, more interconnections, and more control signals. With superscalar organization, increased performance can be achieved by increasing the number of parallel pipelines. Again, there are diminishing returns as the number of pipelines increases. More logic is required to manage hazards and to stage instruction resources. Eventually, a single thread of execution reaches the point where hazards and resource dependencies prevent the full use of the multiple
Organization of multicore processors

[Figure 18.8 Multicore Organization Alternatives. (a) Dedicated L1 cache. (b) Dedicated L2 cache. (c) Shared L2 cache. (d) Shared L3 cache. Each panel shows CPU cores 1 to n with L1-D and L1-I caches, the L2 cache (and the L3 cache in (d)), main memory and I/O.]

4. Interprocessor communication is easy to implement, via shared memory locations.
5. The use of a shared L2 cache confines the cache coherency problem to the L1 cache level, which may provide some additional performance advantage.

A potential advantage to having only dedicated L2 caches on the chip is that each core enjoys more rapid access to its private L2 cache. This is advantageous for threads that exhibit strong locality.

As both the amount of memory available and the number of cores grow, the use of a shared L3 cache combined with either a shared L2 cache or dedicated per-core L2 caches seems likely to provide better performance than simply a massive shared L2 cache.

Another organizational design decision in a multicore system is whether the individual cores will be superscalar or will implement simultaneous multithreading (SMT). For example, the Intel Core Duo uses superscalar cores, whereas the Intel Core i7 uses SMT cores. SMT has the effect of scaling up the number of hardware-level threads that the multicore system supports. Thus, a multicore system with four cores and SMT that supports four simultaneous threads in each core appears the same to the application level as a multicore system with 16 cores. As software is developed to more fully exploit parallel resources, an SMT approach appears to be more attractive than a superscalar approach.

Intel - Core Duo
- 2006
- Two x86 superscalar cores, shared L2 cache
- Dedicated L1 cache per core: 32 KiB instruction and 32 KiB data
- 2 MiB shared L2 cache

Intel has introduced a number of multicore products in recent years. In this section, we look at two examples: the Intel Core Duo and the Intel Core i7-990X.

The Intel Core Duo, introduced in 2006, implements two x86 superscalar processors with a shared L2 cache (Figure 18.8c).

The general structure of the Intel Core Duo is shown in Figure 18.9. Let us consider the key elements starting from the top of the figure. As is common in multicore systems, each core has its own dedicated L1 cache. In this case, each core has a 32-kB instruction cache and a 32-kB data cache.

Each core has an independent thermal control unit. With the high transistor density of today's chips, thermal management is a fundamental capability, especially for laptop and mobile systems. The Core Duo thermal control unit is designed to manage chip heat dissipation to maximize processor performance within thermal constraints. Thermal management also improves ergonomics with a cooler system and lower fan acoustic noise. In essence, the thermal management unit monitors digital sensors for high-accuracy die temperature measurements. Each core can be defined as an independent thermal zone. The maximum temperature for each

[Figure 18.9 Intel Core Duo Block Diagram. For each of the two cores: 32-kB L1 caches, execution resources, architectural state, thermal control, APIC. Shared: power management logic, 2 MB L2 shared cache, bus interface, front-side bus.]

Intel Core i7-990X

[Figure 18.10 Intel Core i7-990X Block Diagram. Cores 0 to 5, each with 32 kB L1-I and 32 kB L1-D caches and a 256 kB L2 cache; a 12 MB L3 cache; DDR3 memory controllers (3 x 8B @ 1.33 GT/s); QuickPath Interconnect (4 x 20B @ 6.4 GT/s).]

The general structure of the Intel Core i7-990X is shown in Figure 18.10. Each core has its own dedicated L2 cache and the six cores share a 12-MB L3 cache. One mechanism Intel uses to make its caches more effective is prefetching, in which the hardware examines memory access patterns and attempts to fill the caches speculatively with data that's likely to be requested soon. It is interesting to compare the performance of this three-level on chip cache organization with a comparable two-level organization from Intel. Table 18.1 shows the cache access latency, in terms of clock cycles for two Intel multicore systems running at the same clock frequency. The Core 2 Quad has a shared L2 cache, similar to the Core Duo. The Core i7 improves on L2 cache performance with the use of the dedicated L2 caches, and provides a relatively high-speed access to the L3 cache.

The Core i7-990X chip supports two forms of external communications to other chips. The DDR3 memory controller brings the memory controller for the

9.3. Distributed-memory multiprocessors
- Large-scale computers (Warehouse Scale Computers or Massively Parallel Processors, MPP)
- Cluster computers (clusters)

As a consequence of these and other factors, there is a great deal of interest in building and using parallel computers in which each CPU has its own private memory, not directly accessible to any other CPU. These are the multicomputers. Programs on multicomputer CPUs interact using primitives like send and receive to explicitly pass messages because they cannot get at each other's memory with LOAD and STORE instructions. This difference completely changes the programming model.

Each node in a multicomputer consists of one or a few CPUs, some RAM (conceivably shared among the CPUs at that node only), a disk and/or other I/O devices, and a communication processor. The communication processors are connected by a high-speed interconnection network of the types we discussed in Sec. 8.3.3. Many different topologies, switching schemes, and routing algorithms are used. What all multicomputers have in common is that when an application program executes the send primitive, the communication processor is notified and transmits a block of user data to the destination machine (possibly after first asking for and getting permission). A generic multicomputer is shown in Fig. 8-36.

[Figure 8-36. A generic multicomputer. Each node contains CPUs, memory, a local interconnect, disk and I/O, and a communication processor; the nodes are joined by a high-performance interconnection network.]

8.4.1 Interconnection Networks

In Fig. 8-36 we see that multicomputers are held together by interconnection networks. Now it is time to look more closely at these interconnection networks. Interestingly enough, multiprocessors and multicomputers are surprisingly similar in this respect because multiprocessors often have multiple memory modules that must also be interconnected with one another and with the CPUs. Thus the material in this section frequently applies to both kinds of systems.

The fundamental reason why multiprocessor and multicomputer interconnection networks are similar is that at the very bottom both of them use message
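The send and receive style of interaction described for multicomputers can be sketched with a toy model. Everything here is an assumption made for illustration: the class names `Node` and `Network`, the mailbox queue standing in for the communication processor, and the single-process simulation in place of real machines.

```python
from collections import deque


class Node:
    """A multicomputer node: private memory plus a mailbox for incoming messages."""

    def __init__(self, name):
        self.name = name
        self.memory = {}          # private: other nodes never touch this directly
        self.mailbox = deque()    # filled by the interconnection network


class Network:
    """Toy interconnection network that delivers messages between nodes."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}

    def send(self, source, destination, payload):
        # Transmit a block of user data to the destination node.
        self.nodes[destination].mailbox.append((source, payload))

    def receive(self, name):
        # Take the oldest pending message for this node.
        return self.nodes[name].mailbox.popleft()


a, b = Node("a"), Node("b")
net = Network([a, b])
a.memory["x"] = [1, 2, 3]
net.send("a", "b", sum(a.memory["x"]))   # a shares a result, not its memory
sender, value = net.receive("b")
b.memory["total"] = value
```

Node b never reads `a.memory`; it only sees what a chose to send. That is the programming-model difference from a shared-memory multiprocessor, where b could simply load the value.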
Interconnection networks

[Figure 8-37. Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.]

Interconnection networks can be characterized by their dimensionality. For our purposes, the dimensionality is determined by the number of choices there are to get from the source to the destination. If there is never any choice (i.e., there is only one path from each source to each destination), the network is zero dimensional. If there is one dimension in which a choice can be made, for example, go

Massively Parallel Processors
- Large-scale systems.
- Expensive: many millions of USD.
- Used for scientific computing and for problems with a very large number of operations and a very large amount of data.
- Supercomputers.

IBM Blue Gene/P

Chip: 4 processors, 8-MB L3 cache

coherency between the L1 caches on the four CPUs. Thus when a shared piece of memory resides in more than one cache, accesses to that storage by one processor will be immediately visible to the other three processors. A memory reference that misses on the L1 cache but hits on the L2 cache takes about 11 clock cycles. A miss on L2 that hits on L3 takes about 28 cycles. Finally, a miss on L3 that has to go to the main DRAM takes about 75 cycles.

The four CPUs are connected via a high-bandwidth bus to a 3D torus network, which requires six connections: up, down, north, south, east, and west. In addition, each processor has a port to the collective network, used for broadcasting data to all processors. The barrier port is used to speed up synchronization operations, giving each processor fast access to a specialized synchronization network.

At the next level up, IBM designed a custom card that holds one of the chips shown in Fig. 8-38 along with 2 GB of DDR2 DRAM. The chip and the card are shown in Fig. 8-39(a)-(b) respectively.

The cards are mounted on plug-in boards, with 32 cards per board for a total of 32 chips (and thus 128 CPUs) per board. Since each card contains 2 GB of DRAM, the boards contain 64 GB apiece. One board is illustrated in Fig. 8-39(c). At the next level, 32 of these boards are plugged into a cabinet, packing 4096 CPUs into a single cabinet. A cabinet is illustrated in Fig. 8-39(d).

Finally, a full system, consisting of up to 72 cabinets with 294,912 CPUs, is depicted in Fig. 8-39(e). A PowerPC 450 can issue up to 6 instructions/cycle, thus

[Figure 8-39. The BlueGene/P: (a) chip. (b) card. (c) board. (d) cabinet. (e) system.]
- Card: 1 chip, 4 CPUs, 2 GB (2-GB DDR2 DRAM)
- Board: 32 cards, 32 chips, 128 CPUs, 64 GB
- Cabinet: 32 boards, 1024 cards, 1024 chips, 4096 CPUs, 2 TB
- System: 72 cabinets, 73728 cards, 73728 chips, 294912 CPUs, 144 TB
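The Blue Gene/P totals quoted above follow from multiplying up the packaging hierarchy, which a short calculation can confirm. The constant and function names are invented for this sketch; the per-level numbers are the ones given in the figure, and 1 TB is taken as 1024 GB.

```python
CPUS_PER_CHIP = 4
CHIPS_PER_CARD = 1
GB_PER_CARD = 2
CARDS_PER_BOARD = 32
BOARDS_PER_CABINET = 32
CABINETS_PER_SYSTEM = 72


def totals(cards):
    """Return (chips, cpus, memory_gb) for a given number of cards."""
    chips = cards * CHIPS_PER_CARD
    return chips, chips * CPUS_PER_CHIP, cards * GB_PER_CARD


board_cards = CARDS_PER_BOARD
cabinet_cards = board_cards * BOARDS_PER_CABINET
system_cards = cabinet_cards * CABINETS_PER_SYSTEM

board = totals(board_cards)        # chips, CPUs, GB per board
cabinet = totals(cabinet_cards)    # chips, CPUs, GB per cabinet
system = totals(system_cards)      # chips, CPUs, GB per full system
system_memory_tb = system[2] // 1024
```

The results match the figure: 128 CPUs and 64 GB per board, 4096 CPUs and 2 TB per cabinet, and 294912 CPUs and 144 TB for the 72-cabinet system.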
Cluster
- Many computers are connected to one another by a high-speed interconnection network (~ Gbps).
- Each computer can work independently (PC or SMP).
- Each computer is called a node.
- The computers can be managed so that they work in parallel as a group (cluster).
- The whole system can be regarded as a single parallel computer.
- High availability.
- High fault tolerance.

Google's PC cluster

hold exactly 80 PCs and switches can be larger or smaller than 128 ports; these are just typical values for a Google cluster.

[Figure 8-44. A typical Google cluster. Labels: OC-12 fiber, OC-48 fiber, two 128-port Gigabit Ethernet switches, two gigabit Ethernet links, 80-PC rack.]

Power density is also a key issue. A typical PC burns about 120 watts or about 10 kW per rack. A rack needs about 3 m² so that maintenance personnel can install and remove PCs and for the air conditioning to function. These parameters give a power density of over 3000 watts/m². Most data centers are designed for 600 to 1200 watts/m², so special measures are required to cool the racks.

Google has learned three key things about running massive Web servers that bear repeating.
1. Components will fail so plan for it.
2. Replicate everything for throughput and availability.
3. Optimize price/performance.

9.4. General-purpose graphics processing units
- SIMD architecture.
- Derived from the graphics processing unit (GPU, Graphic Processing Unit), which supports 2D and 3D graphics processing: data-parallel processing.
- GPGPU: General purpose Graphic Processing Unit.
- Hybrid CPU/GPGPU systems:
  - The CPU is the host: it executes sequentially.
  - The GPGPU: parallel computation.
GPUs in computers

GPGPU: NVIDIA Tesla
- Streaming multiprocessor
- 8 Streaming processors

and CUDA cores and other execution units in the SM execute threads. The SM executes threads in groups of 32 threads called a warp. While programmers can generally ignore warp execution for functional correctness and think of programming one thread, they can greatly improve performance by having threads in a warp execute the same code path and access memory in nearby addresses.
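The advice to keep the threads of a warp on the same code path can be illustrated with a deliberately simplified cost model. The model is an assumption made for this sketch, not something stated in the source: it charges one pass per distinct code path taken inside a warp. The function names and the `path_of` parameter are likewise invented for the example.

```python
WARP_SIZE = 32  # threads per warp, as stated above


def split_into_warps(thread_ids):
    """Group consecutive thread ids into warps of WARP_SIZE threads."""
    return [thread_ids[i:i + WARP_SIZE]
            for i in range(0, len(thread_ids), WARP_SIZE)]


def passes_per_warp(thread_ids, path_of):
    """Simplified model: a warp needs one pass for each distinct code path
    taken by its threads, so fewer distinct paths means less work."""
    return [len({path_of(t) for t in warp})
            for warp in split_into_warps(thread_ids)]


threads = list(range(64))
divergent = passes_per_warp(threads, lambda t: t % 2)             # odd/even split
uniform = passes_per_warp(threads, lambda t: t // WARP_SIZE)      # split per warp
```

With the odd/even branch, every warp contains threads on both paths and needs two passes. When the branch condition is constant within each warp, each warp needs only one.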
An Overview of the Fermi Architecture

The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The 512 CUDA cores are organized in 16 SMs of 32 cores each. The GPU has six 64-bit memory partitions, for a 384-bit memory interface, supporting up to a total of 6 GB of GDDR5 DRAM memory. A host interface connects the GPU to the CPU via PCI-Express. The GigaThread global scheduler distributes thread blocks to SM thread schedulers.

GPGPU: NVIDIA Fermi
- It has 16 Streaming Multiprocessors (SM).
- Each SM has 32 CUDA cores.
- Each CUDA core (Compute Unified Device Architecture) has one FPU and one IU.

Fermi's 16 SM are positioned around a common L2 cache. Each SM is a vertical rectangular strip that contains an orange portion (scheduler and dispatch), a green portion (execution units), and light blue portions (register file and L1 cache).

Third Generation Streaming Multiprocessor

The third generation SM introduces several architectural innovations that make it not only the most powerful SM yet built, but also the most programmable and efficient.

[Figure: Fermi Streaming Multiprocessor (SM). Instruction cache; two warp schedulers, each with a dispatch unit; register file (32,768 x 32-bit); 32 cores; 16 LD/ST units; 4 SFUs; interconnect network; 64 KB shared memory / L1 cache; uniform cache. Inset, CUDA core: dispatch port, operand collector, FP unit, INT unit, result queue.]

512 High Performance CUDA cores

Each SM features 32 CUDA processors, a fourfold increase over prior SM designs. Each CUDA processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Prior GPUs used IEEE 754-1985 floating point arithmetic. The Fermi architecture implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single and double precision arithmetic. FMA improves over a multiply-add (MAD) instruction by doing the multiplication and addition with a single final rounding step, with no loss of precision in the addition. FMA is more accurate than performing the operations separately. GT200 implemented double precision FMA.
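The effect of FMA's single final rounding step can be reproduced on a CPU with exact rational arithmetic. This is an emulation for illustration, not GPU code: the function names are invented, the fused result is obtained by computing a*b+c exactly with `fractions.Fraction` and rounding once to double precision, and the input values are chosen so that the two methods differ.

```python
from fractions import Fraction


def multiply_add_separate(a, b, c):
    """Multiply, round to double, then add and round again."""
    return a * b + c


def multiply_add_fused(a, b, c):
    """Emulated fused multiply-add: compute a*b+c exactly, round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = multiply_add_separate(a, b, c)
fused = multiply_add_fused(a, b, c)
```

The exact product is 1 - 2^-60, which rounds to 1.0 in double precision, so the separate version returns 0.0. The fused version keeps the small term and returns -2^-60.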
In GT200, the integer ALU was limited to 24-bit precision for multiply operations; as a result, multi-instruction emulation sequences were required for integer arithmetic. In Fermi, the newly designed integer ALU supports full 32-bit precision for all instructions, consistent with standard programming language requirements. The integer ALU is also optimized to efficiently support 64-bit and extended precision operations. Various instructions are supported, including Boolean, shift, move, compare, convert, bit-field extract, bit-reverse insert, and population count.

16 Load/Store Units

Each SM has 16 load/store units, allowing source and destination addresses to be calculated for sixteen threads per clock. Supporting units load and store the data at each address to cache or DRAM.

The End