Download as pdf or txt
Download as pdf or txt
You are on page 1of 128

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

TRNG I HC BCH KHOA H NI


Hanoi University of Science and Technology

Contact Information
n
n

KIN TRC MY TNH

Computer Architecture

Address: 502-B1
Mobile: 091-358-5533
e-mail: khanhnk@soict.hust.edu.vn
khanh.nguyenkim@hust.edu.vn

Nguyn Kim Khnh


B mn K thut my tnh
Vin Cng ngh thng tin v Truyn thng
Department of Computer Engineering (DCE)
School of Information and Communication Technology (SoICT)
Version: CA-Jan2016
Jan2016

NKK-HUST

Computer Architecture

NKK-HUST

Mc tiu hc phn
n

Ti liu hc tp
n

Sinh vin c trang b cc kin thc c s v


kin trc tp lnh v t chc ca my tnh, cng
nh nhng vn c bn trong thit k my
tnh.
Sau khi hc xong hc phn ny, sinh vin c
kh nng:
n
n
n

n
n
Jan2016

Nguyn Kim Khnh DCE-HUST

Sch tham kho:


[1] William Stallings
Computer Organization and Architecture 2013, 9th edition
[2] David A. Patterson, John L. Hennessy
Computer Organization and Design 2012, Revised 4th edition
[3] David Money Harris, Sarah L. Harris
Digital Design and Computer Architecture 2013, 2nd edition
[4] Andrew S. Tanenbaum
Structured Computer Organization 2013, 6th edition

Tm hiu kin trc tp lnh ca cc b x l c th


Lp trnh hp ng
nh gi hiu nng my tnh v ci thin hiu nng
ca chng trnh
Khai thc v qun tr hiu qu cc h thng my tnh
Phn tch v thit k my tnh
Computer Architecture

Bi ging Kin trc my tnh: CA-Jan2016


download ti: ftp://dce.hust.edu.vn/khanhnk/CA/

Phn mm lp trnh hp ng v m phng cho MIPS:


MARS (MIPS Assembler and Runtime Simulator)
download ti: http://courses.missouristate.edu/KenVollmar/MARS/

Jan2016

Computer Architecture

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

Jan2016

NKK-HUST

Ni dung hc phn

Content

Chng 1. Gii thiu chung


Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chapter 1. Introduction
Chapter 2. The Basics of Digital Logic
Chapter 3. Computer Systems
Chapter 4. Computer Arithmetic
Chapter 5. Instruction Set Architecture
Chapter 6. The Processors
Chapter 7. Computer Memory
Chapter 8. Input-Output Systems
Chapter 9. Parallel Architectures

Computer Architecture

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Kin trc my tnh

Ni dung ca chng 1

1.1. My tnh v phn loi my tnh


1.2. Khi nim kin trc my tnh
1.3. S tin ha ca cng ngh my tnh
1.4. Hiu nng my tnh

Chng 1
GII THIU CHUNG

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Jan2016

Computer Architecture

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

1.1. My tnh v phn loi my tnh


n

M hnh n gin ca my tnh

My tnh (Computer) l thit b in t thc


hin cc cng vic sau:
n
n

x l d liu

Nhn d liu vo,


X l d liu theo dy cc lnh c nh sn bn
trong,
a d liu (thng tin) ra.

Cc
thit b vo
(Input
Devices)

Dy cc lnh nm trong b nh yu cu
my tnh thc hin cng vic c th gi l
chng trnh (program).
My tnh hot ng theo chng trnh
n

Jan2016

Computer Architecture

d liu vo

Jan2016

Desktop computers, Laptop computers


My tnh a dng

n
n

Thit b di ng c nhn (PMD - Personal Mobile


Devices)
n

My ch (Servers) my phc v
n

Dng trong mng qun l v cung cp cc dch v


Hiu nng v tin cy cao
Hng nghn n hng triu USD

Dng cho tnh ton cao cp trong khoa hc v k thut


Hng triu n hng trm triu USD

My tnh nhng (Embedded Computers)


n
n

t n trong thit b khc


c thit k chuyn dng
Computer Architecture

Nguyn Kim Khnh DCE-HUST

n
11

Jan2016

Smartphones, Tablet
Kt ni Internet

in ton m my (Cloud Computing)

Siu my tnh (Supercomputers)


n

Jan2016

10

Phn loi my tnh k nguyn sau PC

My tnh c nhn (Personal Computers)


n

d liu ra

NKK-HUST

chng trnh
ang thc hin

Computer Architecture

Phn loi my tnh k nguyn PC

Cc
thit b ra
(Output
Devices)

B nh chnh
(Main Memory)

NKK-HUST

B x l
trung tm
(Central
Processing Unit)

S dng my tnh qui m ln (Warehouse Scale


Computers), gm rt nhiu servers kt ni vi nhau
Cho cc cng ty thu mt phn cung cp dch v
phn mm
Software as a Service (SaaS): mt phn ca phn
mm chy trn PMD, mt phn chy trn Cloud
V d: Amazon, Google
Computer Architecture

12

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

1.2. Khi nim kin trc my tnh


n

Kin trc my tnh bao gm:


n

Phn lp my tnh

Kin trc tp lnh (Instruction Set Architecture):


nghin cu my tnh theo cch nhn ca ngi lp
trnh

Ngi
s dng

T chc my tnh (Computer Organization) hay


Vi kin trc (Microarchitecture): nghin cu thit k
my tnh mc cao (thit k CPU, h thng nh,
cu trc bus, ...)

NKK-HUST

Ngn ng bc cao

Hp ng
n
n

Assembly language
M t lnh di dng text

n
n

Qun l b nh v lu tr

iu khin vo-ra

Phn cng

Jan2016

B x l, b nh, m-un vo-ra

Computer Architecture

swap:

multi
add
lw
lw
sw
sw
jr

Cc thnh phn c bn ca my tnh


high-level
programming
language A portable

language such as C, C!!,


Java, or Visual Basic that
is composed of words
and algebraic notation
that can be translated by
a compiler into assembly
language.

CPU

$2, $5,4
$2, $4,$2
$15, 0($2)
$16, 4($2)
$16, 0($2)
$15, 4($2)
$31

B nh chnh

Bus h thng

Ging nhau vi tt c cc loi


my tnh
B x l trung tm (Central
Processing Unit CPU)

FIGURE 1.4 C program compiled into assembly language and then assembled into binary
machine language. Although the translation from high-level language to binary machine language is
shown in two steps, some compilers cut out the middleman and produce binary machine language directly.
These languages and this program are examined in more detail in Chapter 2.

Jan2016

Trao i thng tin gia my tnh


vi bn ngoi

Bus h thng (System bus)


n

15

Cha cc chng trnh ang


thc hin

H thng vo-ra (Input/Output)


n

00000000101000100000000100011000
00000000100000100001000000100001
10001101111000100000000000000000
10001110000100100000000000000100
10101110000100100000000000000000
10101101111000100000000000000100
00000011111000000000000000001000

iu khin hot ng ca my
tnh v x l d liu

B nh chnh (Main Memory)


n

Assembler

Binary machine
language
program
(for MIPS)

14

NKK-HUST

n
Assembly
language
program
(for MIPS)

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Lp lch cho cc nhim v v chia s ti


nguyn

Phn cng

H thng vo-ra

Machine language
M t theo phn cng
Cc lnh v d liu c
m ha theo nh phn

H iu hnh (Operating System)

15

Below Your Program

swap(int v[], int k)


{int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}

Chng trnh dch (Compiler): dch m


ngn ng bc cao thnh ngn ng my
n

Compiler

Ngn ng my
n

Jan2016

High-level language HLL


Mc tru tng gn vi
vn cn gii quyt
Hiu qu v linh ng

High-level
language
program
(in C)

c vit theo ngn ng bc cao

Phn mm h thng

Cc mc ca m chng trnh

The recognition that a program could be written to translate a more powerful


language into computer instructions was one of the great breakthroughs in the
early days of computing. Programmers today owe their productivityand their
sanityto the creation of high-level programming languages and compilers
that translate programs in such languages into instructions. Figure 1.4 shows the
relationships among these programs and languages, which are more examples of
the power of abstraction.

Ngi
lp trnh
h thng

13
1.3

Phn mm h thng

Cng mt kin trc tp lnh c th c nhiu sn


phm (t chc, phn cng) khc nhau
Computer Architecture

Phn cng (Hardware): nghin cu thit k logic chi


tit v cng ngh ng gi ca my tnh.

Jan2016

Phn mm ng dng
n

Ngi
lp trnh

Phn mm ng dng

Kt ni v vn chuyn thng tin

Computer Architecture

16

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

1.3. S tin ha ca cng ngh my tnh


n

My tnh dng n in t chn khng (1950s)


n
n

n
n

n
n

SSI - Small Scale Integration


MSI - Medium Scale Integration
LSI - Large Scale Integration

VLSI - Very Large Scale Integration

My tnh dng vi mch ULSI (1990s-nay)


n

ULSI - Ultra Large Scale Integration

Jan2016

Electronic Numerical Intergator


and Computer
D n ca B Quc phng M
Do John Mauchly i hc
Pennsylvania thit k
30 tn
X l theo s thp phn

My tnh dng vi mch VLSI (1980s)


n

My tnh ENIAC: my tnh u tin (1946)


My tnh IAS: my tnh von Neumann (1952)

My tnh dng transistors (1960s)


My tnh dng vi mch SSI, MSI v LSI (1970s)
n

My tnh u tin: ENIAC v IAS

Computer Architecture

17

NKK-HUST

Jan2016

n
n

Computer Architecture

18

Mt s loi vi mch s in hnh


n

Massive Cluster

Gigabit Ethernet

B vi x l (Microprocessors)
n

Clusters

Refrigerators

Cars
n

Routers
Routers

Computer Architecture

Nguyn Kim Khnh DCE-HUST

19

Jan2016

ROM, RAM, Flash memory

H thng trn chip (SoC System on Chip) hay


B vi iu khin (Microcontrollers)
n

RobotsRobots

Vi mch thc hin cc chc nng ni ghp cc thnh


phn ca my tnh vi nhau

B nh bn dn (Semiconductor Memory)
n

Mt hoc mt vi CPU c ch to trn mt chip

Vi mch iu khin tng hp (Chipset)


n

Jan2016

Thc hin ti Princeton Institute


for Advanced Studies
Do John von Neumann thit k
theo tng stored program
X l theo s nh phn
Tr thnh m hnh c bn ca
my tnh

NKK-HUST

My tnh ngy nay

Sensor
Nets

Tch hp cc thnh phn chnh ca my tnh trn mt


chip vi mch
c s dng ch yu trn smartphone, tablet v cc
my tnh nhng
Computer Architecture

20

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

S pht trin ca b vi x l
n
n
n

1971: b vi x l 4-bit Intel 4004


1972: cc b x l 8-bit
1978: cc b x l 16-bit
n

n
n
n

1.4. Hiu nng my tnh


n

Hiu nng = 1/(thi gian thc hin)


hay l: P = 1/t

My tnh c nhn IBM-PC ra i nm 1981

My tnh A nhanh hn my B k ln

1985: cc b x l 32-bit
2001: cc b x l 64-bit
2006: cc b x l a li (multicores)
n

nh ngha hiu nng P (Performance):

PA / PB = tB / tA = k
n

Nhiu CPU trn 1 chip

V d: Thi gian chy chng trnh:


n
n
n

Jan2016

Computer Architecture

21

NKK-HUST

Jan2016

Computer Architecture

Thi gian thc hin ca CPU

V mt thi gian, CPU hot ng theo mt xung nhp


(clock) c tc xc nh

Jan2016

Chu k xung nhp T0 (Clock period): thi gian ca mt


chu k
Tc xung nhp f0 (Clock rate) hay l Tn s xung nhp:
s chu k trong 1s
n f0 = 1/T0
VD: B x l c f0 = 4GHz = 4109Hz
T0 = 1/(4x109) = 0.25x109s = 0.25ns
Computer Architecture

Nguyn Kim Khnh DCE-HUST

n gin, ta xt thi gian CPU thc hin


chng trnh (CPU time):
Thi gian thc hin ca CPU =
S chu k xung nhp x Thi gian mt chu k

T0
n

22

NKK-HUST

Tc xung nhp ca CPU


n

10s trn my A, 15s trn my B


tB / tA = 15s / 10s = 1.5
Vy my A nhanh hn my B 1.5 ln

tCPU = n T0 =

n
f0

n: s chu k xung nhp


n

Hiu nng c tng ln bng cch:


n
n

23

Jan2016

Gim s chu k xung nhp n


Tng tc xung nhp f0
Computer Architecture

24

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d
n
n

Hai my tnh A v B cng chy mt chng trnh


My tnh A:
n
n

Ta c:

Tc xung nhp ca CPU: fA = 2GHz


Thi gian CPU thc hin chng trnh: tA = 10s

t=

n
f

S chu k xung nhp khi chy chng trnh trn my A:

nA = t A f A = 10s 2GHz = 20 10 9

My tnh B:
n

V d (tip)

Thi gian CPU thc hin chng trnh: tB = 6s


S chu k xung nhp khi chy chng trnh trn my B (nB)
nhiu hn 1.2 ln s chu k xung nhp khi chy chng trnh
trn my A (nA)

S chu k xung nhp khi chy chng trnh trn my B:

nB = 1.2 nA = 24 10 9

Hy xc nh tc xung nhp cn thit cho my B (fB)?

Tc xung nhp cn thit cho my B:

fB =
Jan2016

Computer Architecture

25

NKK-HUST

nB 24 10 9
=
= 4 10 9 Hz = 4GHz
tB
6

Jan2016

Computer Architecture

26

NKK-HUST

S lnh v s chu k trn mt lnh

V d

S chu k xung nhp ca chng trnh:

S chu k = S lnh ca chng trnh x S chu k trn mt lnh

n = IC CPI
n
n
n

n - s chu k xung nhp


IC - s lnh ca chng trnh (Instruction Count)
CPI - s chu k trn mt lnh (Cycles per Instruction)

IC CPI
f0

Chu k xung nhp: TA = 250ps


S chu k/ lnh trung bnh: CPIA = 2.0

My tnh B:
n

Vy thi gian thc hin ca CPU:

tCPU = IC CPI T0 =

Hai my tnh A v B c cng kin trc tp lnh


My tnh A c:

Chu k xung nhp: TB = 500ps


S chu k/ lnh trung bnh: CPIB = 1.2

Hy xc nh my no nhanh hn v nhanh hn
bao nhiu ?

Trong trng hp cc lnh khc nhau c CPI khc nhau,


cn tnh CPI trung bnh
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

27

Jan2016

Computer Architecture

28

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d (tip)
Ta c:

CPI trung bnh

tCPU = IC CPITB T0

Hai my cng kin trc tp lnh, v vy s lnh ca cng


mt chng trnh trn hai my l bng nhau:

Nu loi lnh khc nhau c s chu k khc nhau,


ta c tng s chu k:
K

ICA = ICB = IC

n = (CPI i ICi )

Thi gian thc hin chng trnh trn my A v my B:

i=1

t A = ICA CPI A TA = IC 2.0 250 ps = IC 500 ps

CPI trung bnh:

t B = ICB CPI B TB = IC 1.2 500 ps = IC 600 ps

T ta c:

CPITB =

t B IC 600 ps
=
= 1.2
t A IC 500 ps

n
1 K
= (CPI i ICi )
IC IC i=1

Kt lun: my A nhanh hn my B 1.2 ln


Jan2016

Computer Architecture

29

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

V d
n

V d

Cho bng ch ra cc dy lnh s dng cc lnh


thuc cc loi A, B, C. Tnh CPI trung bnh?

Cho bng ch ra cc dy lnh s dng cc lnh


thuc cc loi A, B, C. Tnh CPI trung bnh?

Loi lnh

Loi lnh

CPI theo loi lnh

CPI theo loi lnh

IC trong dy lnh 1

20

10

20

IC trong dy lnh 1

20

10

20

IC trong dy lnh 2

40

10

10

IC trong dy lnh 2

40

10

10

Dy lnh 1: S lnh = 50

Computer Architecture

Nguyn Kim Khnh DCE-HUST

31

Jan2016

Dy lnh 2: S lnh = 60

S chu k =
= 1x20 + 2x10 + 3x20 = 100
n CPITB = 100/50 = 2.0
n

Jan2016

30

Computer Architecture

S chu k =
= 1x40 + 2x10 + 3x10 = 90
n CPITB = 90/60 = 1.5
n

32

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

MIPS nh l thc o hiu nng

Tm tt v Hiu nng
CPU Time =

Instructions Clock cycles Seconds

Program
Instruction Clock cycle

MIPS: Millions of Instructions Per Second


(S triu lnh trn 1 giy)

Thi gian CPU = S lnh ca chng trnh x S chu k/lnh


x S giy ca mt chu k

tCPU
n

MIPS =

IC CPI
= IC CPI T0 =
f0

Instruction count
Instruction count
Clock rate
=
=
Execution time 106 Instruction count CPI 106 CPI106
Clock rate

Hiu nng ph thuc vo:


n
n
n
n
n

Thut gii
Ngn ng lp trnh
Chng trnh dch
Kin trc tp lnh
Phn cng

Jan2016

Computer Architecture

MIPS =

33

NKK-HUST

f0
CPI 10 6

Jan2016

CPI =

f0
MIPS 10 6

Computer Architecture

34

NKK-HUST

V d

V d

Tnh MIPS ca b x l vi:


clock rate = 2GHz v CPI = 4

Tnh MIPS ca b x l vi:


clock rate = 2GHz v CPI = 4

0.5ns
2ns

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

35

Jan2016

Chu k T0 = 1/(2x109) = 0.5ns


CPI = 4 thi gian thc hin 1 lnh = 4 x 0.5ns = 2ns
S lnh thc hin trong 1s = (109ns)/(2ns) = 5x108 lnh
Vy b x l thc hin c 500 MIPS
Computer Architecture

36

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d

V d

Tnh CPI ca b x l vi:


clock rate = 1GHz v 400 MIPS

Tnh CPI ca b x l vi:


clock rate = 1GHz v 400 MIPS

1ns

Jan2016

Computer Architecture

37

NKK-HUST

Jan2016

Computer Architecture

38

NKK-HUST

MFLOPS

Cc tng v i trong kin trc my tnh


1. Design for Moores Law
Thit k theo lut Moore
2. Use abstraction to simplify design
S dng tru tng ha n gin thit k
3. Make the common case fast
Lm cho cc trng hp ph bin thc hin nhanh
4. Performance via parallelism
Tng hiu nng qua x l song song
5. Performance via pipelining
Tng hiu nng qua k thut ng ng
6. Performance via prediction
Tng hiu nng thng qua d on
7. Hierarchy of memories
Phn cp b nh
8. Dependability via redundancy
Tng tin cy thng qua d phng

S dng cho cc h thng tnh ton ln


Millions of Floating Point Operations per Second
S triu php ton s du phy ng trn mt giy
MFLOPS =

Executed floating point operations


Execution time 106

GFLOPS109
TFLOPS1012
PFLOPS (1015)

Jan2016

Chu k T0 = 1/109 = 1ns


S lnh thc hin trong 1 s l 400MIPS = 4x108 lnh
Thi gian thc hin 1 lnh = 1/(4x108)s = 2.5ns
Vy ta c: CPI = 2.5

Computer Architecture

Nguyn Kim Khnh DCE-HUST

39

Jan2016

Computer Architecture

40

10

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kin trc my tnh

Chng 2
C BN V LOGIC S

Ht chng 1

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

41

NKK-HUST

Jan2016

42

NKK-HUST

Ni dung ca chng 2

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016

Computer Architecture

Computer Architecture

Nguyn Kim Khnh DCE-HUST

2.1. Cc h m c bn
2.2. i s Boole
2.3. Cc cng logic
2.4. Mch t hp
2.5. Mch dy

43

Jan2016

Computer Architecture

44

11

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

2.1. Cc h m c bn
n

Jan2016

1. H thp phn

H thp phn (Decimal System)


con ngi s dng
H nh phn (Binary System)
my tnh s dng
H mi su (Hexadecimal System)
dng vit gn cho s nh phn

Computer Architecture

C s 10

10 ch s: 0,1,2,3,4,5,6,7,8,9
Dng n ch s thp phn c th biu din
c 10n gi tr khc nhau:

45

NKK-HUST

00...000

= 0

99...999

= 10n - 1

Jan2016

Computer Architecture

46

NKK-HUST

Dng tng qut ca s thp phn

V d s thp phn

A = an an1 ... a1a0 , a1 ... am

472.38 = 4x102 + 7x101 + 2x100 + 3x10-1 + 8x10-2


n

Gi tr ca A c hiu nh sau:
A = an10 n + an110 n1 +... + a1101 + a010 0 + a110 1 +... + am10 m
n

A=

a 10

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Cc ch s ca phn nguyn:
n

472 : 10 = 47 d

47 : 10 = 4 d

4 : 10 = 0 d

Cc ch s ca phn l:

i=m

Jan2016

47

Jan2016

0.38 x 10 = 3.8 phn nguyn =

0.8 x 10 = 8.0 phn nguyn =

Computer Architecture

48

12

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

S nh phn

2. H nh phn
n
n
n
n
n

C s 2
2 ch s nh phn: 0 v 1
Ch s nh phn c gi l bit (binary digit)
bit l n v thng tin nh nht
Dng n bit c th biu din c 2n gi tr khc
nhau:
n
n

00...000
11...111

Biu din
s nh phn

2-bit

3-bit

4-bit

S
thp phn

00

000

0000

01

001

0001

10

010

0010

11

011

0011

100

0100

101

0101

110

0110

111

0111

1000

= 0
= 2n - 1

Cc lnh ca chng trnh v d liu trong


my tnh u c m ha bng s nh phn

Jan2016

Computer Architecture

49

NKK-HUST

Jan2016

Computer Architecture

1001

1010

10

1011

11

1100

12

1101

13

1110

14

1111

15

50

NKK-HUST

n v d liu trong my tnh

Dng tng qut ca s nh phn

Byte = 8-bit

Qui c mi v n v d liu trong my tnh:


Theo thp phn

Jan2016

1-bit

A = an an1 ... a1a0 , a1 ... am

Theo nh phn

n v

Vit tt

Gi tr

n v

Vit tt

Gi tr

kilobyte

KB

103

kibibyte

KiB

210 = 1024

megabyte

MB

106

mebibyte

MiB

220

gigabyte

GB

109

gibibyte

GiB

230

terabyte

TB

1012

tebibyte

TiB

240

petabyte

PB

1015

pebibyte

PiB

250

exabyte

EB

1018

exbibyte

EiB

260

Computer Architecture

Nguyn Kim Khnh DCE-HUST

vi ai=0 hoc 1

Gi tr ca A c tnh nh sau:

A = an 2 n + an1 2 n1 +... + a1 21 + a0 2 0 + a1 2 1 +... + am 2 m


n

A=

a2

i=m

51

Jan2016

Computer Architecture

52

13

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d s nh phn

Chuyn i s nguyn thp phn sang nh phn

1101001.1011(2) =
6 5 4 3 2 1 0

-1 -2 -3 -4

= 26 + 25 + 23 + 20 + 2-1 + 2-3

2-4

= 64 + 32 + 8 + 1 + 0.5 + 0.125 + 0.0625

Phng php 1: chia dn cho 2 ri ly


phn d
Phng php 2: Phn tch thnh tng
ca cc s 2i nhanh hn

= 105.6875(10)

Jan2016

Computer Architecture

53

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Phng php chia dn cho 2


n

Phng php phn tch thnh tng ca cc 2i


n

V d: chuyn i 105(10)
n

105 : 2 =

52

52 : 2 =

26

26 : 2 =

13

13 : 2 =

6:2 =

3:2 =

1:2 =

V d 1: chuyn i 105(10)
6
5
3
0
n 105 = 64 + 32 + 8 +1 = 2 + 2 + 2 + 2

biu din
s d
theo chiu
mi tn
n

27

26

25

24

23

22

21

20

128

64

32

16

Kt qu:

105(10) = 0110 1001(2)

V d 2: 17000(10) = 16384 + 512 + 64 + 32 + 8


=

Kt qu: 105(10) = 1101001(2)

214 + 29 + 26 + 25 + 23

17000(10) = 0100 0010 0110 1000(2)


15 14 13 12 11 10 9 8

Jan2016

54

Computer Architecture

Nguyn Kim Khnh DCE-HUST

55

Jan2016

7 6 5 4

Computer Architecture

3 2 1 0

56

14

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Chuyn i s l thp phn sang nh phn


n

Chuyn i s l thp phn sang nh phn (tip)

V d 1: chuyn i 0.6875(10)
n

0.6875 x 2 = 1.375

phn nguyn = 1

0.375 x 2 = 0.75

phn nguyn = 0

0.75

x 2 = 1.5

phn nguyn = 1

0.5

x 2 = 1.0

phn nguyn = 1

biu din
theo
chiu
mi tn

Kt qu : 0.6875(10)= 0.1011(2)

Jan2016

Computer Architecture

57

0.81 x 2 =

1.62

phn nguyn

0.62 x 2 =

1.24

phn nguyn

0.24 x 2 =

0.48

phn nguyn

0.48 x 2 =

0.96

phn nguyn

0.96 x 2 =

1.92

phn nguyn

0.92 x 2 =

1.84

phn nguyn

0.84 x 2 =

1.68

phn nguyn

0.81(10) 0.1100111(2)

NKK-HUST

Jan2016

Computer Architecture

58

NKK-HUST

3. H mi su (Hexa)
n
n
n

Jan2016

V d 2: chuyn i 0.81(10)

Quan h gia s nh phn v s Hexa


V d:

C s 16
16 ch s: 0,1,2,3,4,5,6,7,8,9, A,B,C,D,E,F
Dng vit gn cho s nh phn: c mt
nhm 4-bit s c thay bng mt ch s
Hexa

Computer Architecture

Nguyn Kim Khnh DCE-HUST

1011 0011(2) = B3(16)

0000 0000(2) = 00(16)

59

0010 1101 1001 1010(2) = 2D9A(16)

1111 1111 1111 1111(2) = FFFF(16)

Jan2016

Computer Architecture

4-bit

S Hexa

Thp phn

0000

0001

0010

0011

0100

0101

0110

0111

1000

1001

1010

10

1011

11

1100

12

1101

13

1110

14

1111

15

60

15

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

2.2. i s Boole
n
n
n

Php ton i s Boole vi hai bin

i s Boole s dng cc bin logic v php ton logic


Bin logic c th nhn gi tr 1 (TRUE) hoc 0 (FALSE)
Cc php ton logic c bn: AND, OR v NOT
n

A AND B :

A OR B :

A B hay AB
A+B

NOT A :

Th t u tin: NOT > AND > OR


Thm cc php ton logic: NAND, NOR, XOR
n

A NAND B:

A NOR B :

A XOR B:

AB
A+ B
A B = A B + A B

Jan2016

Computer Architecture

61

NKK-HUST

A AND B
AB

A OR B
A + B

A NAND B
AB

Jan2016

NOT l php ton 1 bin

A NOR B

A XOR B

A + B

Computer Architecture

A B

62

NKK-HUST

Cc ng nht thc ca i s Boole


AB=BA

A+B=B+A

A (B + C) = (A B) + (A C)

A + (B C) = (A + B) ( A + C)

1A=A

0+A=A

AA=0

A+A=1

0A=0

1+A=1

AA=A

A+A=A

A (B C) = (A B) C

A + (B + C) = (A + B) + C

A B = A + B (nh l De Morgan)

A + B = A B (nh l De Morgan)

Jan2016

NOT A

Computer Architecture

Nguyn Kim Khnh DCE-HUST

2.3. Cc cng logic (Logic Gates)


n

Thc hin cc hm logic:


n

Cng logic mt u vo:


n

63

Jan2016

Cng NOT

Cng hai u vo:


n

NOT, AND, OR, NAND, NOR, XOR

AND, OR, XOR, NAND, NOR

Cng nhiu u vo

Computer Architecture

64

16

The fundamental building block of all digital logic circuits is the gate. Logical functions are implemented by the interconnection of gates.
A gate is an electronic circuit that produces an output signal that is a simple Boolean operation on its input signals. The basic gates used in digital logic are
AND, OR, NOT, NAND, NOR, and XOR. Figure 11.1 depicts these six gates. Each
gate is defined in three ways: graphic symbol, algebraic notation, and truth table.
The symbology used in this chapter is from the IEEE standard, IEEE Std 91. Note
that the inversion (NOT) operation is indicated by a circle.
Each gate shown in Figure 11.1 has one or two inputs and one output.
However, as indicated in Table 11.1b, all of the gates except NOT can have more
than two inputs. Thus, (X + Y + Z) can be implemented with a single OR gate
with three inputs. When one or more of the values at the input are changed, the
correct output signal appears almost instantaneously, delayed only by the propagation time of signals through the gate (known as the gate delay). The significance of
this delay is discussed in Section 11.3. In some cases, a gate is implemented with two
outputs, one output being the negation of the other output.

Bi ging Kin trc my tnh

NKK-HUST

Jan2016

NKK-HUST

K hiu cc cng logic

Algebraic
Function

Graphical Symbol

Name

Tp y

Truth Table

A B F
0 0 0
0 1 0
B
1 0 0 11.2 / GATES
1 1 1
A B F
Here we introduce
a common term: we say that to assert
0 0 0a signal is to cause a
A
F ! A " Bfalse (0) 0state
F its logically
1 1 to its logically true
signal line toOR
make aBtransition from
1 0 1
(1) state. The true (1) state is either a high or low voltage1 state,
1 1 depending on the
A

AND

F!AB
or
F ! AB

369

type of electronic circuitry.


A F
F!A
Typically,
not allA gate types are
used in implementation.
and fabrication
0 Design
1
NOT
F
or
are simpler if only one or two types of gates are
Thus, 1it is0 important to identify
F ! used.
A#
functionally complete sets of gates. This means that any Boolean
function can be impleA B F
mented using only theAgates in the set. The following are functionally
complete sets:
0 0 1
NAND

Jan2016

F ! AB

0 1 1
B
AND, OR, NOT
1 0 1
1 1 0
AND, NOT
A B F
0 0 1
A
OR, NOT
F!A"B
0 1 0
NOR
F
B
1 0 0
NAND
1 1 0
NOR
A B F
0 0 0
A
XORbe clear that AND, F
F ! NOT
A ! B gates constitute
0 1 1
It should
OR, and
a
B
1 0 1
complete set, because they represent
the
three
operations
of Boolean
Computer Architecture
1 1 0

functionally
algebra. For
the AND and NOT gates to form a functionally complete set, there must be a way
Figure 11.1 Basic Logic Gates
to synthesize
the OR operation from the AND and NOT operations. This can be
done by applying DeMorgans theorem:

65

L tp cc cng c th thc hin c bt k


hm logic no t cc cng ca tp
Mt s v d v tp y :
n

{AND, OR, NOT}

{AND, NOT}

{OR, NOT}

{NAND}

{NOR}

Jan2016

Computer Architecture

66

A + B = A#B
A OR B = NOT ((NOT A) AND (NOT B))

NKK-HUST

NKK-HUST

Similarly, the OR and NOT operations are functionally complete because


they can be used to synthesize the AND operation.
Figure 11.2 shows how the AND, OR, and NOT functions can be implemented
solely with NAND gates, and Figure 11.3 shows the same thing for NOR gates.
For this reason, digital circuits can be, and frequently are, implemented solely with
NAND gates or solely with NOR gates.

S dng cng NAND

S dng cng NOR


370

CHAPTER 11 / DIGITAL LOGIC


A

A B

A
B

A B

(A+B)

A+B

A B

Figure 11.2 Some Uses of NAND Gates

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

A+B

Figure 11.3 Some Uses of NOR Gates

67

Jan2016

With gates, we have reached the most primitive circuit level of computer
hardware. An examination of theComputer
transistor
combinations used to construct gates
Architecture
departs from that realm and enters the realm of electrical engineering. For our purposes, however, we are content to describe how gates can be used as building blocks
to implement the essential logical circuits of a digital computer.

11.3 COMBINATIONAL CIRCUITS


A combinational circuit is an interconnected set of gates whose output at any time
is a function only of the input at that time. As with a single gate, the appearance of
the input is followed almost immediately by the appearance of the output, with only

68

17

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Mt s vi mch logic
1A
1B
1Y
2A
2B
2Y
GND

1A
1B
1Y
2A
2B
2Y
GND

Jan2016
1A
1B
NC
1C
1D

NKK-HUST

1Y
GND

14

13

12

11

10

7400 NAND

14

13

12

11

10

7408 AND

14

13

12

11

10

VDD

1Y

4B

1A

4A

1B

4Y

2Y

3B

2A

3A

2B

3Y

GND

VDD

1A

4B

1B

4A

2A

4Y

2B

3B

2C

3A

2Y

3Y

GND

VDD

1CLR

14

13

12

11

10

7402 NOR

14

13

12

11

10

7411 AND3

VDD

1A

4Y

1Y

4B

2A

4A

2Y

3Y

3A

3B

3Y

3A

GND

VDD

1A

1C

1B

1Y

1Y

3C

2A

3B

2B

3A

2Y

3Y

GND

Computer Architecture
2D

1D

2C

1CLK

NC

1PRE

2B

1Q

2A

1Q

2Y

GND

Mch t hp
7421 AND4

1
2
3
4

14

13

reset
D Q
Q
set

reset
Q D
Q
set

12
11
10

7474 FLOP

VDD

1A

2CLR

1B

2D

1Y

2CLK

2A

2PRE

2B

2Q

2Y

2Q

GND

2.4. Mch t hp

585

A.3 Programmable Logic

14

13

12

11

10

7404 NOT

14

13

12

11

10

9
8

7432 OR

14

13

12

11

10

7486 XOR

Mch logic l mch bao gm:


n

6A
6Y

5A

5Y
4A

4Y

VDD

Cc kiu mch logic:


n

VDD

Cc u vo (Inputs)
Cc u ra (Outputs)
c t chc nng (Functional specification)
c t thi gian (Timing specification)

4B

4Y

3B

3A

3Y

69

Jan2016

n
n

4A
4Y
3B

NKK-HUST

3Y

70

F = 1A B C2 # 1A B C2 # 1A B C2 # 1A B C2 # 1A B C2

V d

This can be rewritten using a generalization of DeMorgans theorem:


(X # Y # Z) = X + Y + Z

u vo

Bng tht (True Table)


Dng s
Phng trnh Boole

Computer Architecture

4B

Mch t hp l mch logic trong u ra ch


ph thuc u vo thi im hin ti
L mch khng nh v c thc hin bng
cc cng logic
Mch t hp c th c nh ngha theo ba
cch:

There are three combinations of input values that cause F to be 1, and if any
one of these combinations occurs, the result is 1. This form of expression, for selfevident reasons, is known as the sum of products (SOP) form. Figure 11.4 shows a
straightforward implementation with AND, OR, and NOT gates.
Another form can also be derived from the truth table. The SOP form
expresses that the output is 1 if any of the input combinations that produce 1 is true.
We can also say that the output is 1 if none of the input combinations that produce
0 is true. Thus,

Figure A.1 Common 74xx-series logic gates

0
Mch c nh
1
1
0
1
1
0 ti
u ra c xc nh bi cc1 gi tr trc
v 1gi tr hin
ca u vo
1

VDD

3A

0
0
0
Mch khng nh
0
0
1
u ra c xc nh bi cc0 gi tr hin
1 ti ca 0u vo

Mch dy (Sequential Circuits)


1

371

A Boolean Function of Three Variables

Mch t hp (Combinational
Circuits)
A
B
n

4A

11.3 / COMBINATIONAL CIRCUITS


Table 11.3

u ra

0
0
0
0
1
1
1
1

0
0
1
1
0
0
1
1

0
1
0
1
0
1
0
1

0
0
1
1
0
0
1
0

Figure 11.4

Sum-of-Products Implementation of Table 11.3

F = A BC + A BC + ABC
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

71

Jan2016

Computer Architecture

72

18

F = B(A + C)

D3

Because the complement of the complement of a value is just the original value,

output signal F. To select one of the four possible inputs, a 2-bit selection code is

F = B(A + C) = (AB + (BC) needed, and this is implemented as two select lines labeled S1 and S2.

Bi ging Kin trc my tnh

An example 4-to-1 multiplexer is defined by the truth table in Table 11.7. This
is a simplified form of a truth table. Instead of showing all possible combinations of
input variables, it shows the output as data from line D0, D1, D2, or D3. Figure 11.13
shows an implementation using AND, OR, and NOT gates. S1 and S2 are connected
which has three NAND forms, as illustrated in Figure 11.11.to the AND gates in such a way that, for any combination of S1 and S2, three of the
AND gates will output 0. The fourth AND gate will output the value of the selected
line, which is either 0 or 1. Thus, three of the inputs to the OR gate are always 0,
Multiplexers
and the output of the OR gate will equal the value of the selected input gate. Using
The multiplexer connects
multiple inputs to a single output.this
At regular
any time,
one of theit is easy to construct multiplexers of size 8-to-1, 16-to-1,
NKK-HUST
organization,
so on. representation
inputs is selected to be passed to the output. A general blockand
diagram
Multiplexers
areinput
used in digital circuits to control signal and data routing. An
is shown in Figure 11.12. This represents a 4-to-1 multiplexer. There
are four
theprovide
loading of
the program counter (PC). The value to be loaded into the
lines, labeled D0, D1, D2, and D3. One of these lines is example
selectedisto
the
program counter may come from one of several different sources:

Applying DeMorgans theorem,

Jan2016

F = (AB)(BC)

NKK-HUST

B chn knh (Multiplexer - MUX)

B chn knh 4 u vo
S2

n
n
n

u vo d liu
n u vo chn
1 u ra d liu
Mi t hp u vo chn (S) xc nh u vo
d liu no (D) s c ni vi u ra (F)

Jan2016

Computer Architecture

D1

F
D0

D3

D1

S2

S1

Figure 11.12 4-to-1 Multiplexer


Representation
u vo chn u ra

73

S2

S1

0
0
1
1

0
1
0
1

D0
D1
D2
D3

F
D2

D3

Figure 11.13 Multiplexer Implementation

F = D0 S2 S1+ D1 S2 S1+ D2 S2 S1+ D3 S2 S1

Jan2016

Computer Architecture

74

NKK-HUST

B gii m (Decoder)
n

4-to-1
MUX

D2

NKK-HUST

S1

D0

n 2n

Thc hin b gii m 3 ra 8

11.3 / COMBINATIONAL CIRCUITS

N u vo, 2N u ra
Vi mt t hp ca N u vo, ch c mt u ra tch cc
(khc vi cc u ra cn li)
V d: B gii m 2 ra 4
2:4
Decoder
11
10
01
00

A1
A0

A1

Y3
Y2
Y1
Y0

A
B

001 D
1
C
010 D
2

A0

011 D
3

100 D
4

Y3
101 D
5

Y2

A1
0
0
1
1
Jan2016

A0
0
1
0
1

Y3
0
0
0
1

Y2
0
0
1
0

Y1
0
1
0
0

383

000 D
0

Y0
1
0
0
0
Computer Architecture

Y1

110 D
6

Y0

111 D
7

75

Jan2016

Figure 11.15

Decoder with 3 Inputs and 23 = 8 Outputs


Computer Architecture

76

A0
A7

Nguyn Kim Khnh DCE-HUST

256 ! 8
RAM
2-to-4
Decoder

Enable

256 ! 8
RAM
Enable

256 ! 8
RAM
Enable

256 ! 8
RAM
Enable

19

Bi ging Kin trc my tnh

Jan2016
386

Table 11.9 Binary Addition Truth Tables


NKK-HUST
(a) Single-Bit Addition
Cin
A
B
Sum
Carry

NKK-HUST

B cng
n

Cng hai bit to ra bit tng v bit nh ra

Sum

+ 0
0

B cng ton phn 1-bit (Full-adder)


n

(b) Addition with Carry Input

A
B cng bn phn 1-bit

B cng bn phn 1-bit (Half-adder)


n

CHAPTER 11 / DIGITAL LOGIC

Cng 3 bit
Cho php xy dng b cng N-bit

0
1

+ 1
1

+ 0
1

A
B
C
A
B
C

u vo A
B

1+ 1

Cout
0

1Half
Adder
0

387

0
1
1

Sum

1
Cout

+1

0
1 CIRCUITS0
11.3 /1COMBINATIONAL
1
1
01-bit
1

10

0
S

+0 A

A1

u ra

B
0C

Cout

S = A B

+0

+1

B
0
0
0 1
C out = AB
00
1
10
C
0
1
1
0
However, addition canAstill be dealt with in Boolean terms. In Table 11.9a, we
show the logic
1 for adding
0 B two input
1 bits0 to produce a 1-bit sum and a carry bit.
This truth table could easily be implemented in digital logic. However, we are not
1
1 A addition
0 on just1 a single pair of bits. Rather, we wish to
interested in performing
Jan2016

Computer Architecture

77

NKK-HUST

S = ABC + ABC + ABC + ABC


Cout = AB + AC + BC

Cin

11.3 / COMBINATIONAL CIRCUITS

387

A
B
C

S
u vo

u ra

Cin

Cout

A
B
C
A
B
C

Thus we have
the necessary
logic+toBC
implement a multiple-bit adder such as
Carry
= AB + AC
shown in Figure 11.21. Note that because the output from each adder depends on
Figure 11.20 is an implementation using AND, OR, and NOT gates.
the carry from the previous adder, there is an increasing delay from the least significant to the most significant bit. Each single-bit adder experiences a certain amount
of gate delay,A andBthis gate delay
larger adders,
A2 accumulates.
B2
AFor
B
A0 theB0accumulated
3
3
1
1
delay can become unacceptably high.
If the carry values could be determined without having to ripple through all
Overflow
the previous
stages,Cthen each
single-bit
adder
could CfunctionC independently,
and
0
C3
C2
Cin
C1
Cin
in
in
0
signal
delay would not accumulate. This can be achieved with an approach known as carry
lookahead. Let us look again at the 4-bit adder to explain this approach.
We would like to come up with an expression that specifies the carry input to
any stage of theSadder without reference
to previous
carry values. We
have
S
S
S

B cng 4-bit v b cng 32-bit

1-bit
Full
Adder

Cout

Figure 11.19 4-Bit Adder

Sum

A31 B31

A
B
C

S31

Cout

Figure 11.21

Carry

A24 B24

A23 B23
C23

8-bit
adder

Cout

A
B
A
C

78

Sum = A BC + ABC + ABC + ABC

NKK-HUST

B cng ton phn 1-bit

Jan2016

add two n-bit numbers. CThis can be done by putting togetherCarry


a set of adders so that
Architecture
carry from one adder is providedComputer
as input
to the next. A 4-bit adder is depicted
in Figure 11.19.
B
For a multiple-bit adder
to work, each of the single-bit adders must have three
C
inputs, including the carry from the next-lower-order adder. The revised truth table
Figure
11.20
Implementation
of an Adder
appears in Table 11.9b. The two
outputs
can be expressed:

Jan2016
the

S24

A16 B16

C15

8-bit
adder

S23

A15 B15

S16

A8 B8
C7

8-bit
adder

S15

A 7 B7

S8

A0 B0

8-bit
adder

S7

Cin

S0

Construction of a 32-Bit Adder Using 8-Bit Adders

B
C

Computer Architecture
Figure 11.20 Implementation of an Adder

79

Thus we have the necessary logic to implement a multiple-bit adder such as


shown in Figure 11.21. Note that because the output from each adder depends on
the carry from the previous adder, there is an increasing delay from the least significant to the most significant bit. Each single-bit adder experiences a certain amount
of gate delay, and this gate delay accumulates. For larger adders, the accumulated
delay can become unacceptably high.
If the carry values could be determined without having to ripple through all
the previous stages, then each single-bit adder could function independently, and
delay would not accumulate. This can be achieved with an approach known as carry
lookahead. Let us look again at the 4-bit adder to explain this approach.
We would like to come up with an expression that specifies the carry input to
any stage of the adder without reference to previous carry values. We have

Nguyn Kim Khnh DCE-HUST

Jan2016

Computer Architecture

80

20

Bi ging Kin trc my tnh

Jan2016

Figure 11.26 JK Flip-Flop

NKK-HUST

NKK-HUST

2.5. Mch dy

causing the output to be 1; if only the K input is asserted, the result is a reset function,
causing the output to be 0. When both J and K are 1, the function performed is
referred to as the toggle function: the output is reversed. Thus, if Q is 1 and 1 is applied
to J and K, then Q becomes 0. The reader should verify that the implementation of
Figure 11.26 produces this characteristic function.

Cc Flip-Flop c bn
Name

Mch dy l mch logic trong u ra ph


thuc gi tr u vo thi im hin ti v
u vo thi im qu kh

Truth Table

Qn!1

0
0
1
1

0
1
0
1

Qn
0
1

Qn!1

0
0
1
1

0
1
0
1

Qn
0
1
Qn

Qn!1

0
1

0
1

Ck

SR

L mch c nh, c thc hin bng phn


t nh (Latch, Flip-Flop) v c th kt hp
vi cc cng logic

Graphical Symbol

Ck

JK

Ck

Jan2016

Computer Architecture

81

NKK-HUST

Jan2016

Computer Architecture

82

Figure 11.27 Basic Flip-Flops

NKK-HUST

S-R Latch v cc Flip-Flop


11.4 / SEQUENTIAL CIRCUITS
R

Thanh ghi 8-bit song song

389

11.4 / SEQUENTIAL CIRCUITS

391

393

11.4 / SEQUENTIAL CIRCUITS

Data lines
11.4 / SEQUENTIAL CIRCUITS
R

391

Clock

D18

D17

D16

D15

D14

D13

D12

D11

S
Q

ClockS

Figure 11.24

Figure 11.22 The SR Latch Implemented


with NOR Gates

S-R Latch

S-R Flip-Flop

392

Clocked SR Flip-Flop

Clk

Clk

Clk

Clk

Clk

Clk

Clk

Clk

CHAPTER 11 / DIGITAL LOGIC

First, let us show that the circuit is bistable. Assume that both S and R are 0
Figure 11.24 Clocked SR Flip-Flop
and that Q is 0. The inputs to the lower NOR gate are Q = 0 and S = 0. Thus, the
Clock
output Q = 1 means that the inputs to the upper NOR gate are Q = 1 and R = 0,
Q
K
which has the output Q = 0. Thus, the state of the circuit is internally consistent
Q
and remains stable as long as S = R = 0. A similar line of reasoning shows that the
Q
state Q = 1, Q = 0 is also stable for R = S = 0.
D
Thus, this circuit can function as a 1-bit memory. We can view the output Q as
Clock
Figure
11.25 D Flip-Flop
Clock
the value of the bit. The
inputs S and R serve to write the values 1 and 0, respectively, into memory. To see this, consider the state Q = 0, Q = 1, S = 0, R = 0.
This device is referred to as a clocked SR flip-flop. Note that the
Suppose that S changes to the value 1. Now the inputs to the lower NOR gatearrangement.
are
Q
R be
and S inputs are passed
to the NOR gates only during the clock pulse.
S = 1, Q = 0. After some time delay ^t, the output of the lower NOR gate
will
J
Q
D
Q = 0 (see Figure 11.23). So, at this point in time, the inputs to the upper NORD
gate
FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, S = 1
11.25
Flip-Flop
become R = 0,Figure
Q = 0.
AfterDanother
gate delay of ^t the output Q becomes 1. must
This be avoided. One way to do this is to allow just a single input. The D flip-flop
is again a stable state. The inputs to the lower gate are now S = 1, Q = 1, which
Figure 11.26 JK Flip-Flop
accomplishes this.
Figure 11.25 shows a gate implementation of the D flip-flop. By
maintain
the output
= 0. As
long as Sto=as1 aand
R = 0,
theflip-flop.
outputs will
arrangement.
ThisQdevice
is referred
clocked
SR
Noteremain
that the
using an inverter, the nonclock inputs to the two AND gates are guaranteed to be
QR
= and
1, QS=inputs
0. Furthermore,
if the
S returns
0, the
outputs
unchanged.
are passed to
NOR to
gates
only
duringwill
theremain
clock pulse.
the opposite of each other.
The R output performs the opposite function. When R goes to 1, it forces
causing
theDoutput
to be
1; if only the
K inputtoisas
asserted,
the
result isbecause
a reset function,
The
flip-flop
is sometimes
referred
the data
flip-flop
it is, in
D FLIP-FLOP One problem with SR flip-flop is that the condition R = 1, Sof= 1
Q = 0, Q = 1 regardless of the previous state of Q and Q. Again, a time delay
causing
the output
tobit
beof0.data.
When
both
J and
K are
1, the function
performed
effect, storage
for one
The
output
of the
D flip-flop
is always equal
to the is
must be avoided. One way to do this is to allow just a single input. The D flip-flop
2^t occurs before the final state is established (Figure 11.23).
referred
to asvalue
the toggle
function:
output
is reversed.
Thus, and
if Q is
1 and 1 is
applied
mostBy
recent
applied
to the the
input.
Hence,
it remembers
produces
the
last
accomplishes this. Figure 11.25 shows a gate implementation of the D flip-flop.
The SR latch can be defined with a table similar to a truth table, called
a
to
J and
K,also
thenreferred
Q becomes
0. The
reader
should
verifyitthat
theaimplementation
using an inverter, the nonclock inputs to the two AND gates are guaranteed
to
beIt is
input.
to as the
delay
flip-flop,
because
delays
0 or 1 applied toof
characteristic
table,
which
shows
the
next
state
or
states
of
a
sequential
circuit
as
Figure
11.26
produces
this characteristic
function.
Jan2016
Computer
Architecture
the
opposite of each other.
its
input
for
a
single
clock
pulse.
We
can
capture
the
logic
of
the
D
flip-flop
in
the83
a function of current states and inputs. In the case of the SR latch, the state can
The D flip-flop is sometimes referred to as the data flip-flop because following
it is, in truth table:
be defined by the value of Q. Table 11.10a shows the resulting characteristic table.
effect, storage for one bit of data. The output of the D flip-flop is always equal to the
Observe that the inputs S = 1, R = 1 are not allowed, because these would proD
Q
n
!
1
most recent value applied to the input. Hence, it remembers and produces the last
duce an inconsistent output (both Q and Q equal 0). The table can be expressed
input. It is also referred to as the delay flip-flop, because it delays a 0 or 1 applied to
Name
Graphical Symbol
Truth Table
0
0
more compactly, as in Table 11.10b. An illustration of the behavior of the SR latch
its input for a single clock pulse. We can capture the logic of the D flip-flop in the
1
1
is shown in Table 11.10c.
following truth table:
S
Q
S
R
Qn!1
CLOCKED SR FLIP-FLOP The output of the SR latch changes, after a brief
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
D input.
Qn !This
1
time delay, in response to a change in the
is referred to as asynchronous
0
0
it has two inputs. However, in this case all possible combinations
ofQinput
values are
n
operation. More typically, events in the digital
computer
are synchronized to a clock
Ck
0
0
SR
0and Figure 11.27
1
0
flip-flop,
pulse, so that changes occur only when a1clock 1pulse occurs. Figure 11.24 showsvalid.
this Figure 11.26 shows a gate implementation of the JK
shows its characteristic table (along with those for the1 SR 0and D1flip-flops). Note

1
1
Q same as for the
that the first three combinationsRare the
SR flip-flop. With no input
JK FLIP-FLOP Another useful flip-flop is the JK flip-flop. Like the SR flip-flop,
asserted, the output is stable. If only the J input is asserted, the result is a set function,
it has two inputs. However, in this case all possible combinations of input values are
valid. Figure 11.26 shows a gate implementation of the JK flip-flop, and Figure 11.27
J
Q
J
K
Qn!1
shows its characteristic table (along with those for the SR and D flip-flops). Note
0
0
Qn
that the first three combinations are the same as for the SR flip-flop. With no input
Ck

D Flip Flop

Clock
Load
D08

JK

D06

D05

D04

D03

D02

D01

Output lines

Figure 11.28 8-Bit Parallel Register

Registers

J-K Flip-Flop

Nguyn Kim Khnh DCE-HUST

D07

Jan2016

As an example of the use of flip-flops, let us first examine one of the essential elements of the CPU: the register. As we know, a register is a digital circuit used within
the CPU to store one or more bits of data. Two basic types of registers are commonly used: parallel registersComputer
and shift
registers.
Architecture

84

PARALLEL REGISTERS A parallel register consists of a set of 1-bit memories that

can be read or written simultaneously. It is used to store data. The registers that we
have discussed throughout this book are parallel registers.
The 8-bit register of Figure 11.28 illustrates the operation of a parallel register
using D flip-flops. A control signal, labeled load, controls writing into the register
from signal lines, D11 through D18. These lines might be the output of multiplexers,
so that data from a variety of sources can be loaded into the register.
SHIFT REGISTER A shift register accepts and/or transfers information serially.

Consider, for example, Figure 11.29, which shows a 5-bit shift register constructed
from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each

21

Thus, a register made up of n flip-flops can count up to 2 - 1. An example of a


counter in the CPU is the program counter.
Counters can be designated as asynchronous or synchronous, depending on
the way in which they operate. Asynchronous counters are relatively slow because
the output of one flip-flop triggers a change in the status of the next flip-flop. In a
synchronous counter, all of the flip-flops change state at the same time. Because the
latter type is much faster, it is the kind used in CPUs. However, it is useful to begin
the discussion with a description of an asynchronous counter.

monly used: parallel registers and shift registers.


PARALLEL REGISTERS A parallel register consists of a set of 1-bit memories that
can be read or written simultaneously. It is used to store data. The registers that we
have discussed throughout this book are parallel registers.
The 8-bit register of Figure 11.28 illustrates the operation of a parallel register
using D flip-flops. A control signal, labeled load, controls writing into the register
from signal lines, D11 through D18. These lines might be the output of multiplexers,
so that data from a variety of sources can be loaded into the register.

Bi ging Kin trc my tnh

SHIFT REGISTER A shift register accepts and/or transfers information serially.


NKK-HUST
Consider, for example, Figure 11.29, which shows a 5-bit shift register constructed

NKK-HUST

from clocked D flip-flops. Data are input only to the leftmost flip-flop. With each
clock pulse, data are shifted to the right one position, and the rightmost bit is
transferred out.
Shift registers can be used to interface to serial I/O devices. In addition, they
can be used within the ALU to perform logical shift and rotate functions. In this

Thanh ghi dch 5-bit

RIPPLE COUNTER An asynchronous counter is also referred to as a ripple counter,


because the change that occurs to increment the counter starts at one end and
ripples through to the other end. Figure 11.30 shows an implementation of a
4-bit counter using JK flip-flops, together with a timing diagram that illustrates its
behavior. The timing diagram is idealized in that it does not show the propagation
delay that occurs as the signals move down the series of flip-flops. The output of
the leftmost flip-flop (Q0) is the least significant bit. The design could clearly be
extended to an arbitrary number of bits by cascading more flip-flops.

B m 4-bit
High
J

Serial in

Q
Clk

Clk

Q
Clk

Q
Clk

Jan2016

Serial out

Clk

Ck

Clock
K

Ck
Q

Ck

Q0

Q
Ck

Q1

K
Q2

Q
Q3

(a) Sequential circuit


Clock

Clock

Figure 11.29 5-Bit Shift Register

Q0
Q1
Q2
Q3
(b) Timing diagram

Figure 11.30
Jan2016

Computer Architecture

85

NKK-HUST

Jan2016

Ripple Counter
Computer Architecture

86

NKK-HUST

Kin trc my tnh

Chng 3
H THNG MY TNH

Ht chng 2

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

87

Jan2016

Computer Architecture

88

22

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Ni dung ca chng 3

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016

Computer Architecture

3.1. Cc thnh phn c bn ca my tnh


3.2. Hot ng c bn ca my tnh
3.3. Bus my tnh

89

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

3.1. Cc thnh phn c bn ca my tnh


n

CPU

B nh chnh

H thng vo-ra

Cha cc chng trnh ang


thc hin

Nguyn tc hot ng c bn:


n

Trao i thng tin gia my tnh


vi bn ngoi

iu khin hot ng ca my tnh


x l d liu
CPU hot ng theo chng trnh nm trong
b nh chnh.

L thnh phn nhanh nht trong h thng

Bus h thng (System bus)


n

Kt ni v vn chuyn thng tin

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Chc nng:
n

iu khin hot ng ca my
tnh v x l d liu

H thng vo-ra (Input/Output)


n

B nh chnh (Main Memory)


n

1. B x l trung tm (CPU)

B x l trung tm (Central
Processing Unit CPU)
n

Bus h thng

Jan2016

90

91

Jan2016

Computer Architecture

92

23

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc thnh phn c bn ca CPU


n

n v
s hc v logic

Bus
h thng

Tp thanh ghi

Arithmetic and Logic Unit (ALU)


Thc hin cc php ton s hc v
php ton logic

n
n

93

Jan2016

n
n

n
Jan2016

B nh
m

B nh
chnh

Cc
thit b
lu tr

Computer Architecture

94

NKK-HUST

B nh chnh (Main memory)


n

B nh chnh (Main memory)


B nh m (Cache memory)
Thit b lu tr (Storage Devices)

CPU

NKK-HUST

Thao tc ghi (Write)


Thao tc c (Read)

Cc thnh phn chnh:


n

Register File (RF)


Gm cc thanh ghi cha cc thng
tin phc v cho hot ng ca CPU

Computer Architecture

Chc nng: nh chng trnh v d liu (di dng


nh phn)
Cc thao tc c bn vi b nh:
n

Tp thanh ghi
n

Jan2016

Control Unit (CU)


iu khin hot ng ca my tnh
theo chng trnh nh sn

n v s hc v logic
n

n v iu khin
n

n v
iu khin

2. B nh my tnh

Tn ti trn mi my tnh
Cha cc lnh v d liu ca
chng trnh ang c thc hin
S dng b nh bn dn
T chc thnh cc ngn nh
c nh a ch (thng nh
a ch cho tng byte nh)
Ni dung ca ngn nh c th
thay i, song a ch vt l ca
ngn nh lun c nh
CPU mun c/ghi ngn nh cn
phi bit a ch ngn nh
Computer Architecture

Nguyn Kim Khnh DCE-HUST

B nh m (Cache memory)
Ni dung

a ch

0100 1101

00...0000

0101 0101

00...0001

1010 1111

00...0010

0000 1110

00...0011

0111 0100

00...0100

1011 0010

00...0101

0010 1000

00...0110

1110 1111

00...0111

n
n
n

.
.

n
0110 0010

11...1110

0010 0001

11...1111
95

Jan2016

B nh c tc nhanh c t m gia CPU


v b nh chnh nhm tng tc CPU truy cp
b nh
Dung lng nh hn b nh chnh
S dng b nh bn dn tc nhanh
Cache thng c chia thnh mt s mc (L1,
L2, L3)
Cache thng c tch hp trn cng chip b
x l
Cache c th c hoc khng
Computer Architecture

96

24

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Thit b lu tr (Storage Devices)


n
n

Cn c gi l b nh ngoi
Chc nng v c im

Lu gi ti nguyn phn mm ca my tnh


c kt ni vi h thng di dng cc thit b vo-ra
Dung lng ln

Tc chm

n
n

3. H thng vo-ra

n
n

Cc loi thit b lu tr
n
n
n

Chc nng: Trao i


thng tin gia my tnh
vi th gii bn ngoi
Cc thao tc c bn:

Computer Architecture

97

NKK-HUST

Thit b
vo-ra

Cc thit b vo-ra
(IO devices)
Cc m-un vo-ra
(IO modules)

M-un
vo-ra

Thit b
vo-ra

Computer Architecture

98

NKK-HUST

M-un vo-ra

Cn c gi l thit b ngoi vi (Peripherals)


Chc nng: chuyn i d liu gia bn trong
v bn ngoi my tnh
Cc loi thit b vo-ra:

Thit b vo (Input Devices)


Thit b ra (Output Devices)
Thit b lu tr (Stotage Devices)
Thit b truyn thng (Communication Devives)

n
n
n
n

Jan2016

M-un
vo-ra

Vo d liu (Input)
Ra d liu (Output)

Jan2016

Cc thit b vo-ra
n

Thit b
vo-ra

Cc thnh phn chnh:

B nh t: a cng HDD
B nh bn dn: th rn SSD, nh flash, th nh
B nh quang: CD, DVD

Jan2016

Bus
h
thng

Computer Architecture

Nguyn Kim Khnh DCE-HUST

99

Jan2016

Chc nng: ni ghp cc thit b vo-ra vi


my tnh
Mi m-un vo-ra c mt hoc mt vi cng
vo-ra (I/O Port)
Mi cng vo-ra c nh mt a ch xc
nh
Cc thit b vo-ra c kt ni v trao i d
liu vi my tnh thng qua cc cng vo-ra
CPU mun trao i d liu vi thit b vo-ra,
cn phi bit a ch ca cng vo-ra tng
ng
Computer Architecture

100

25

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

3.2. Hot ng c bn ca my tnh

1. Thc hin chng trnh


n

n
n
n

Thc hin chng trnh


Hot ng ngt
Hot ng vo-ra

L hot ng c bn ca my tnh
My tnh lp i lp li chu trnh lnh gm
hai bc:
n
n

Hot ng thc hin chng trnh b


dng nu:
n
n
n

Jan2016

Computer Architecture

101

NKK-HUST

Jan2016

Thc hin lnh b li


Gp lnh dng
Tt my

Jan2016

Computer Architecture

102

NKK-HUST

Nhn lnh
n

Nhn lnh
Thc hin lnh

Minh ha qu trnh nhn lnh

Bt u mi chu trnh lnh, CPU nhn lnh t b


nh chnh
B m chng trnh PC (Program Counter) l
thanh ghi ca CPU dng gi a ch ca lnh
s c nhn vo

CPU

lnh

300

CPU

lnh

300

PC

lnh

301

PC

lnh

301

302

lnh i

302

303

lnh i

302

lnh i+1

303

lnh i+2

304

IR

CPU pht ra a ch t b m chng trnh PC


tm ra ngn nh cha lnh
Lnh c c t b nh a vo thanh ghi
lnh IR (Instruction Register)

lnh i+1

303

lnh i+2

304

IR
lnh i

Sau khi nhn lnh i

Trc khi nhn lnh i

Sau khi lnh c nhn vo, ni dung PC t


ng tng tr n lnh k tip.
Computer Architecture

Nguyn Kim Khnh DCE-HUST

103

Jan2016

Computer Architecture

104

26

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Thc hin lnh


n

2. Ngt (Interrupt)

B x l gii m lnh c nhn v


pht tn hiu iu khin thc hin thao
tc m lnh yu cu

Cc kiu thao tc c bn ca lnh:


n

Jan2016

Trao i d liu gia CPU vi b nh chnh


hoc CPU vi m-un vo-ra

Chuyn iu khin trong chng trnh: r


nhnh hoc nhy n v tr khc
105

NKK-HUST

Computer Architecture

106

NKK-HUST

Hot ng ngt (tip)

Sau khi hon thnh mi mt lnh, b x l kim


tra tn hiu ngt
Nu khng c ngt, b x l nhn lnh tip theo
ca chng trnh hin ti
Nu c tn hiu ngt:
n
n

n
n
Jan2016

Bit l (exception): gy ra do li khi thc hin


chng trnh (VD: trn s, m lnh sai, ...)
Ngt t bn ngoi (external interrupt): do thit b
vo-ra (thng qua m-un vo-ra) gi tn hiu ngt
n CPU yu cu trao i d liu

Jan2016

Hot ng vi ngt t bn ngoi


n

Chng trnh con x l ngt (Interrupt handlers)

Cc loi ngt:
n

Thc hin cc php ton s hc hoc php


ton logic vi cc d liu

Computer Architecture

Khi nim chung v ngt: Ngt l c ch cho


php CPU tm dng chng trnh ang thc
hin chuyn sang thc hin mt chng
trnh con c sn trong b nh.

Chng trnh
ang thc hin

lnh
lnh

Tm dng (suspend) chng trnh ang thc hin


Ct ng cnh (cc thng tin lin quan n chng trnh
b ngt)
Thit lp b m chng trnh PC tr n chng trnh
con x l ngt tng ng
Chuyn sang thc hin chng trnh con x l ngt
Khi phc ng cnh v tr v tip tc thc hin
chng trnh ang b tm dng
Computer Architecture

Nguyn Kim Khnh DCE-HUST

107

Ngt y

Chng trnh con


x l ngt

lnh

lnh

lnh

lnh

lnh i

lnh

lnh i+1
lnh

...

RETURN
...

lnh

Jan2016

Computer Architecture

108

27

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

X l vi nhiu tn hiu yu cu ngt


n

X l vi nhiu tn hiu yu cu ngt (tip)

X l ngt tun t
n

n
n

X l ngt u tin

Khi mt ngt ang c thc hin, cc ngt khc b


cm (disabled interrupt)
B x l s b qua cc yu cu ngt tip theo
Cc yu cu ngt
tip theo vn ang
i v c kim tra
sau khi ngt hin ti
c x l xong
Cc ngt c thc
hin tun t

n
n

Interrupt
handler X

User program

Interrupt
handler X

User program

Interrupt

Cc ngt c nh ngha mc u handler


tinY khc nhau
Ngt c mc u tin thp hn c th b ngt bi
ngt c mc u tin cao hn
Xy ra ngt lng nhau
(a) Sequential interrupt processing
Interrupt
handler X

User program

Interrupt
handler Y

Interrupt
handler Y

(a) Sequential interrupt processing


Interrupt
handler X

program
Computer User
Architecture

Jan2016

109

(b) Nested interrupt processing

Jan2016

Figure 3.13

Computer Architecture

Transfer of Control with Multiple Interrupts

110

82
Interrupt
handler Y

NKK-HUST

NKK-HUST

3. Hot ng vo-ra

3.3. Bus my tnh


1. Lung thng tin trong my tnh

(b) Nested interrupt processing

Hot ng vo-ra: l hot ng trao i


d liu gia m-un vo-ra vi bn trong
my tnh.
Figure 3.13 Transfer of Control with Multiple Interrupts

82

CPU trao i d liu vi m-un vo-ra bi


lnh vo-ra trong chng trnh

Nguyn Kim Khnh DCE-HUST

CPU
M-un nh
M-un vo-ra

cn c kt ni vi nhau

CPU trao quyn iu khin cho php m-un


vo-ra trao i d liu trc tip vi b nh
chnh (DMA - Direct Memory Access).
Computer Architecture

Cc m-un trong my tnh:


n

Cc kiu hot ng vo-ra:


n

Jan2016

111

Jan2016

Computer Architecture

112

28

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kt ni m-un nh

Kt ni m-un nh (tip)
n
n

a ch

M-un
nh

d liu

a ch a n xc nh ngn nh
D liu c a n khi ghi
D liu hoc lnh c a ra khi c
n

d liu hoc lnh

B nh khng phn bit lnh v d liu

Nhn cc tn hiu iu khin:


n

Tn hiu iu khin c

iu khin c (Read)
iu khin ghi (Write)

Tn hiu iu khin ghi

Jan2016

Computer Architecture

113

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Kt ni m-un vo-ra

Kt ni m-un vo-ra (tip)


n

d liu t bn trong

d liu ra bn ngoi

d liu t bn ngoi

d liu vo bn trong

a ch a n xc nh cng vo-ra
Ra d liu (Output)
n

M-un
vo-ra

a ch

n
Cc tn hiu iu khin thit b

Cc tn hiu iu khin ngt

tn hiu iu khin ghi

n
n

Computer Architecture

Nguyn Kim Khnh DCE-HUST

115

Jan2016

Nhn d liu t bn trong (CPU hoc b nh chnh)


a d liu ra thit b vo-ra

Vo d liu (Input)
n

tn hiu iu khin c

Jan2016

114

Nhn d liu t thit b vo-ra


a d liu vo bn trong (CPU hoc b nh chnh)

Nhn cc tn hiu iu khin t CPU


Pht cc tn hiu iu khin n thit b vo-ra
Pht cc tn hiu ngt n CPU
Computer Architecture

116

29

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kt ni CPU

Kt ni CPU (tip)
n

lnh

a ch
n
n

CPU

d liu

d liu

Cc tn hiu iu khin
b nh v vo-ra

Cc tn hiu iu khin ngt

Jan2016

Computer Architecture

117

NKK-HUST

Jan2016

n
n

Jan2016

118

S cu trc bus c bn

Bus: tp hp cc ng kt ni vn chuyn
thng tin gia cc m-un ca my tnh vi
nhau.
Cc bus chc nng:
n

Computer Architecture

NKK-HUST

2. Cu trc bus c bn
n

Pht a ch n cc m-un nh hay cc mun vo-ra


c lnh t b nh
c d liu t b nh hoc m-un vo-ra
a d liu ra (sau khi x l) n b nh
hoc m-un vo-ra
Pht tn hiu iu khin n cc m-un nh
v cc m-un vo-ra
Nhn cc tn hiu ngt

CPU

Bus a ch (Address bus)


Bus d liu (Data bus)
Bus iu khin (Control bus)

Nguyn Kim Khnh DCE-HUST

M-un
nh

M-un
vo-ra

M-un
vo-ra

bus a ch
bus d liu

rng bus: l s ng dy ca bus c th


truyn cc bit thng tin ng thi (ch dng cho
bus a ch v bus d liu)
Computer Architecture

M-un
nh

bus iu khin

119

Jan2016

Computer Architecture

120

30

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Bus a ch
n

Bus d liu

Chc nng: vn chuyn a ch xc nh v tr


ngn nh hay cng vo-ra
rng bus a ch:

Chc nng:
n
n

N bit:
AN-1, AN-2, ... A2, A1, A0
S lng a ch ti a c s dng l: 2N a ch
(gi l khng gian a ch)
n a ch nh nht:
00 ... 000 (2)
n a ch ln nht:
11 ... 111 (2)
n

Computer Architecture

V d:
My tnh c bus d liu kt ni CPU vi b nh l 64-bit
C th trao i 8 byte nh mt thi im

121

NKK-HUST

Jan2016

Computer Architecture

122

NKK-HUST

Bus iu khin

Mt s tn hiu iu khin in hnh

Chc nng: vn chuyn cc tn hiu iu khin

Cc loi tn hiu iu khin:


n

Cc tn hiu iu khin c/ghi

Cc tn hiu iu khin ngt

Cc tn hiu iu khin bus

Cc tn hiu (pht ra t CPU) iu khin c/ghi:


n

Jan2016

M bit: DM-1, DM-2, ... D2, D1, D0


M thng l 8, 16, 32, 64 bit

My tnh s dng bus a ch 32-bit (A31-A0), b nh


chnh c nh a ch cho tng byte
C kh nng nh a ch cho 232 bytes nh = 4GiB

Jan2016

rng bus d liu: s bit c truyn ng thi


n

V d:

vn chuyn lnh t b nh n CPU


vn chuyn d liu gia cc thnh phn ca my tnh
vi nhau

Computer Architecture

Nguyn Kim Khnh DCE-HUST

123

Jan2016

Memory Read (MEMR): Tn hiu iu khin c d


liu t mt ngn nh c a ch xc nh a ln bus
d liu.
Memory Write (MEMW): Tn hiu iu khin ghi d
liu c sn trn bus d liu n mt ngn nh c a
ch xc nh.
I/O Read (IOR): Tn hiu iu khin c d liu t mt
cng vo-ra c a ch xc nh a ln bus d liu.
I/O Write (IOW): Tn hiu iu khin ghi d liu c sn
trn bus d liu ra mt cng c a ch xc nh.
Computer Architecture

124

31

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Mt s tn hiu iu khin in hnh (tip)


n

Mt s tn hiu iu khin in hnh (tip)

Cc tn hiu iu khin ngt:


n

Interrupt Request (INTR): Tn hiu t b iu khin


vo-ra gi n yu cu ngt CPU trao i vo-ra.
Tn hiu INTR c th b che.

Cc tn hiu iu khin bus:


n

Interrupt Acknowledge (INTA): Tn hiu pht ra t


CPU bo cho b iu khin vo-ra bit CPU chp
nhn ngt trao i vo-ra.

Non Maskable Interrupt (NMI): tn hiu ngt khng


che c gi n ngt CPU.

Reset: Tn hiu t bn ngoi gi n CPU v cc


thnh phn khc khi ng li my tnh.

88

Bus Request (BRQ) : Tn hiu t m-un vo-ra gi


n yu cu CPU chuyn nhng quyn s dng
bus.
Bus Grant (BGT): Tn hiu pht ra t CPU chp nhn
chuyn nhng quyn s dng bus cho m-un vora.
Lock/ Unlock: Tn hiu cm/cho-php xin chuyn
nhng bus.

CHAPTER 3 / A TOP-LEVEL VIEW OF COMPUTER FUNCTION


Processor

Jan2016

Computer Architecture

125

Jan2016

Local bus

Cache

Computer Architecture

126

Local I/O
controller

Main
memory

System bus

NKK-HUST

NKK-HUST

Expansion
bus interface

Network
SCSI

3. Phn cp bus

Phn cp bus

Serial
Modem

Expansion bus

n bus: Tt c cc m-un kt ni vo bus


chung
n

(a) Traditional bus architecture

Main
memory

Bus ch phc v c mt yu cu trao i d liu


ti mt thi im tr ln
Bus phi c tc bng tc bus ca m-un
nhanh nht trong h thng

Processor

SCSI

a bus: Phn cp thnh nhiu bus cho cc


m-un khc nhau v c tc khc nhau
n
n
n

FAX

Bus ca b x l
Bus ca RAM
Cc bus vo-ra
Computer Architecture

Nguyn Kim Khnh DCE-HUST

Cache /
bridge

FireWire

System bus

Graphic

Video

LAN

High-speed bus

Expansion
bus interface

Serial
Modem

Expansion bus
(b) High-performance architecture

Figure 3.17

Jan2016

Local bus

127

Jan2016

Example Bus Configurations

Computer Architecture

so an external bus or other interconnect scheme is not needed, although there may
also be an external cache. As will be discussed in Chapter 4, the use of a cache structure insulates the processor from a requirement to access main memory frequently.
Hence, main memory can be moved off of the local bus onto a system bus. In this way,
I/O transfers to and from the main memory across the system bus do not interfere
with the processors activity.

128

32

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4. Kt ni im-im
n
n

Point-to-point connection
Khc phc nhc im ca bus dng chung
(shared bus)
Kt ni QPI

Kt ni PCIe

I/O Hub

Core
A

Core
B

Core

DRAM

DRAM

I/O device

CHAPTER 3 / A TOP-LEVEL VIEW OF COMPUTER FUNCTION


I/O device

94

Mt s bus in hnh trong my tnh

Gigabit
Ethernet

PCIe

PCIePCI
Bridge

PCIe

n
n

3.6 / PCI EXPRESS

99

Core

Memory

Chipset
Memory

I/O Hub

DRAM

Core
D

PCIe

Legacy
endpoint

Figure 3.24

QPI

Jan2016

Figure 3.20

PCI Express

Memory bus

PCIe

Switch

PCIe

I/O device

I/O device

DRAM

PCIe

Core
C

QPI (Quick Path Interconnect)


PCI bus (Peripheral Component Interconnect):
bus vo-ra a nng
PCIe: (PCI express) kt ni im-im a nng
tc cao
SATA (Serial Advanced Technology Attachment):
Bus kt ni vi a cng hoc a CD/DVD
USB (Universal Serial Bus): Bus ni tip a nng

PCIe
PCIe
endpoint

PCIe
endpoint

PCIe
endpoint

Typical Configuration Using PCIe

device and one or more that attach to a switch that manages multiple PCIe streams.
PCIe links from the chipset may attach to the following kinds of devices that imple129
ment PCIe:

Multicore Configuration Using QPI


Computer Architecture

Jan2016

Computer Architecture

130

Switch: The switch manages multiple PCIe streams.


PCIe endpoint: An I/O device or controller that implements PCIe, such as
the network. Direct QPI connections can be a Gigabit Ethernet switch, a graphics or video controller, disk interface, or a
processors. If core A in Figure 3.20 needs to communications controller.

that enables data to move throughout


established between each pair of core
access the memory controller in core D, it sends its request through either cores B Legacy endpoint: Legacy endpoint category is intended for existing designs
or C, which must in turn forward that request on to the memory controller in core D. that have been migrated to PCI Express, and it allows legacy behaviors such
as use of I/O space and locked transactions. PCI Express endpoints are not
Similarly, larger systems with eight or more processors can be built using processors permitted to require the use of I/O space at runtime and must not use locked
with three links and routing traffic through intermediate processors.
transactions. By distinguishing these categories, it is possible for a system
In addition, QPI is used to connect to an I/O module, called an I/O hub (IOH). designer to restrict or eliminate legacy behaviors that have negative impacts
The IOH acts as a switch directing traffic to and from I/O devices. Typically in newer on system performance and robustness.
NKK-HUST
PCIe/PCI bridge: Allows older PCI devices to be connected to PCIe-based
systems, the link from the IOH to the I/O device controller uses an interconnect systems.
technology called PCI Express (PCIe), described later in this chapter. The IOH transAs with QPI, PCIe interactions are defined using a protocol architecture. The
lates between the QPI protocols and formats and the PCIe protocols and formats. A
PCIe protocol architecture encompasses the following layers (Figure 3.25):
core also links to a main memory module (typically the memory uses dynamic access
random memory (DRAM) technology) using a dedicated memory bus.
QPI is defined as a four-layer protocol architecture,3 encompassing the
following layers (Figure 3.21):
8
7
Physical: Consists of the actual wires carrying the signals, as well as circuitry
and logic to support ancillary features required in the transmission and receipt
of the 1s and 0s. The unit of transfer at the Physical layer is 20 bits, which is 3
9
called a Phit (physical unit).
2

NKK-HUST

V d bus trong my tnh Intel

9.6 (24.38cm)

The reader unfamiliar with the concept of a protocol architecture will find a brief overview in Appendix L.

Ht chng 3

5
6

11

13
4 12

10

14 15

11.6 (29.46cm)

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

131

Jan2016

Computer Architecture

132

33

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kin trc my tnh

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chng 4
S HC MY TNH

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

133

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Ni dung chng 4

4.1. Biu din s nguyn

4.1. Biu din s nguyn


4.2. Php cng v php tr s nguyn
4.3. Php nhn v php chia s nguyn
4.4. S du phy ng

Jan2016

134

Computer Architecture

Nguyn Kim Khnh DCE-HUST

n
n

135

Jan2016

S nguyn khng du (Unsigned Integer)


S nguyn c du (Signed Integer)

Computer Architecture

136

34

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

1. Biu din s nguyn khng du

V d 1

Nguyn tc tng qut: Dng n bit biu din s


nguyn khng du A:

Biu din cc s nguyn khng du sau y


bng 8-bit:
A = 41 ; B = 150
Gii:
A = 41 = 32 + 8 + 1 = 25 + 23 + 20
41 = 0010 1001

an1an2 ... a2 a1a0


Gi tr ca A c tnh nh sau:
n1

A = ai 2 i

B = 150 = 128 + 16 + 4 + 2 = 27 + 24 + 22 + 21
150 = 1001 0110

i=0

Di biu din ca A:
Jan2016

[0, 2n 1]

Computer Architecture

137

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

V d 2
n

Vi n = 8 bit
Biu din c cc gi tr t 0 n 255 (28 - 1)

Cho cc s nguyn khng du M, N c


biu din bng 8-bit nh sau:

Ch :

Biu din
nh phn

Gi tr
thp phn

0000 0000
0000 0001
0000 0010
0000 0011

0
1
2
3
4

do vt ra khi di biu din

0000 0100
...
1111 1110

254

1111 1111

255

M = 0001 0010

N = 1011 1001

+ 0000 0001

Xc nh gi tr ca chng ?
n
n

M = 0001 0010 =

24

N = 1011 1001 =

27

21

25

c nh ra ngoi
(Carry out)

= 16 +2 = 18
+

24

23

255 + 1 = 0 ???

20

= 128 + 32 + 16 + 8 + 1 = 185
Computer Architecture

Nguyn Kim Khnh DCE-HUST

1111 1111

1 0000 0000

Gii:

Jan2016

138

139

Jan2016

Computer Architecture

140

35

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Trc s hc vi n = 8 bit

Vi n = 16 bit, 32 bit, 64 bit

Trc s hc:

n= 16 bit: di biu din t 0 n 65535 (216 - 1)

255

0 1 2 3

n
n

Trc s hc my tnh:

254

255 0

n
n

n
n

Jan2016

Computer Architecture

141

NKK-HUST

0000 0000 0000 0000


...
0000 0000 1111 1111
0000 0001 0000 0000
...
1111 1111 1111 1111

= 0
= 255
= 256
= 65535

n= 32 bit: di biu din t 0 n 232 - 1


n= 64 bit: di biu din t 0 n 264 - 1

Jan2016

Computer Architecture

142

NKK-HUST

2. Biu din s nguyn c du

V d
Vi n = 8 bit, cho A = 0010 0101

S b mt v S b hai
n

- 0010 0101

S b hai ca A

(28 - 1)
(A)

1101 1010

S b mt ca A = (2n-1) - A
2n

S b mt ca A c tnh nh sau:
1111 1111

nh ngha: Cho mt s nh phn A


c biu din bng n bit, ta c:
n

o cc bit ca A

-A

S b hai ca A = (S b mt ca A) +1

S b hai ca A c tnh nh sau:


1 0000 0000

(28)

- 0010 0101

(A)

1101 1011
thc hin kh khn
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

143

Jan2016

Computer Architecture

144

36

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

n
n
n

NKK-HUST

Quy tc tm S b mt v S b hai

Biu din s nguyn c du theo m b hai

S b mt ca A = o gi tr cc bit ca A
(S b hai ca A) = (S b mt ca A) + 1
V d:

Nguyn tc tng qut: Dng n bit biu din s


nguyn c du A:

Cho
A
S b mt ca A

=
=

S b hai ca A

an1an2 ... a2 a1a0

0010 0101
1101 1010
+
1
1101 1011

Nhn xt:

A
S b hai ca A

0010 0101
+ 1101 1011
1 0000 0000 = 0
(b qua bit nh ra ngoi)
S b hai ca A = -A

Jan2016

=
=

Computer Architecture

Vi A l s dng: bit an-1 = 0, cc bit cn li


biu din ln nh s khng du
Vi A l s m: c biu din bi s b hai
ca s dng tng ng, v vy bit an-1 = 1

145

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

V d
n

Xc nh gi tr ca s dng

Biu din cc s nguyn c du sau y bng


8-bit:
A = + 58 ; B = - 80
Gii:
A
= + 58
= 0011 1010
B

Jan2016

146

= - 80
Ta c: + 80
S b mt

Dng tng qut ca s dng:

0 an2 ... a2 a1 a0
n

Gi tr ca s dng:
n2

S b hai

= 0101 0000
= 1010 1111
+
1
= 1011 0000

Vy: B = - 80

= 1011 0000

Computer Architecture

Nguyn Kim Khnh DCE-HUST

A = ai 2 i
i=0
n

147

Jan2016

Di biu din cho s dng: [0, (2n-1 - 1)]


Computer Architecture

148

37

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Xc nh gi tr ca s m
n

Cng thc xc nh gi tr s m
! = 1!!!!! !!!!! ! !!! !!! !!! !

Dng tng qut ca s m:

! = 0!!!!! !!!!! !! !!! !!! + 1!

1an2 ... a2 a1a0


n

= 11 111 !!!!! !!!!! ! !!! !!! !!! + 1!

Gi tr ca s m:

!!!

n2

A = 2

n1

+ ai 2

Chng minh?

(2!!!

!!!

i=0
n

!! 2! + 1!

1)

!1

Di biu din cho s m: [- 2n-1, -1]

! = 2

!2

+!

!! 2 !
!=0

Jan2016

Computer Architecture

149

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Cng thc tng qut cho s nguyn c du


n

V d

Dng tng qut ca s nguyn c du A:

an1an2 ... a2 a1a0


n

Gi tr ca A c xc nh nh sau:
n2

A = an1 2

n1

+ ai 2 i

Jan2016

Di biu din: [-(2n-1), +(2n-1-1)]

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Hy xc nh gi tr ca cc s nguyn c du
c biu din theo m b hai vi 8-bit nh
di y:
n

P = 0110 0010

Q = 1101 1011

Gii:

i=0
n

150

151

Jan2016

P = 0110 0010 = 64+32+2 = +98

Q = 1101 1011 = -128+64+16+8+2+1 = -37

Computer Architecture

152

38

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Vi n = 8 bit

Trc s hc s nguyn c du vi n = 8 bit

Biu din c cc gi tr
t -27 n +27-1

Gi tr
thp phn

Biu din
b hai

-128 n +127
Ch c mt gi tr 0
Khng biu din cho gi tr
+128

0000 0000
0000 0001

+2

0000 0010

+126

0111 1110

+127

0111 1111

-128

1000 0000

-127

1000 0001

Trc s hc:
-128

...

Ch :
+127 + 1 = -128
(-128)+(-1) = +127
c trn xy ra (Overflow)

-2

1111 1110

-1

1111 1111

Computer Architecture

Trc s hc my tnh:

+1

+2
+3

-128

153

Jan2016

+127

Computer Architecture

154

M rng bit cho s nguyn

Vi n = 16bit: biu din t -215 n 215-1


n
n
n
n
n
n
n

0000 0000 0000 0000 =


0000 0000 0000 0001 =
...
0111 1111 1111 1111 =
1000 0000 0000 0000 =
1000 0000 0000 0001 =
...
1111 1111 1111 1111 =

0
+1

M rng theo s khng du (Zero-extended):


thm cc bit 0 vo bn tri
M rng theo s c du (Sign-extended):
n

+32767 (215 - 1)
-32768 (-215)
+32767

S dng:
+19 =

Nguyn Kim Khnh DCE-HUST

(16bit)

S m:
1110 1101

- 19 = 1111 1111 1110 1101

Vi n = 32bit: biu din t


n
63
Vi n = 64bit: biu din t -2 n 263-1
Computer Architecture

(8bit)

thm cc bit 0 vo bn tri

-1

-231

0001 0011

+19 = 0000 0000 0001 0011

- 19 =

NKK-HUST

Jan2016

-1

-3

Vi n = 16 bit, 32 bit, 64 bit

+127

-2

NKK-HUST

-2 -1 0 1 2

...

(do vt ra khi di biu din)

Jan2016

0
+1

231-1

(8bit)
(16bit)

thm cc bit 1 vo bn tri


155

Jan2016

Computer Architecture

156

39

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Nguyn tc cng s nguyn khng du

4.2. Thc hin php cng/tr vi s nguyn

1. Php cng s nguyn khng du


B cng n-bit
Y

Khi cng hai s nguyn khng du n-bit,


kt qu nhn c l n-bit:
n

n bit
Cout

n bit

B cng n-bit

Cin
n

n bit

Nu Cout = 0 nhn c kt qu ng
Nu Cout = 1 nhn c kt qu sai, do
c nh ra ngoi (Carry Out)

Hin tng nh ra ngoi xy ra khi:


tng > (2n - 1)

Jan2016

Computer Architecture

157

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

2. Php o du

V d cng s nguyn khng du


+

57
34
91

=
= +

209
73
282

=
= +

0011 1001
0010 0010
0101 1011 = 64+16+8+2+1=91 ng

1101 0001
0100 1001
1 0001 1010
kt qu = 0001 1010 =16+8+2=26 sai
do c nh ra ngoi (Cout=1)

c kt qu ng, ta thc hin cng theo 16-bit:


209 =
0000 0000 1101 0001
+ 73 = + 0000 0000 0100 1001
0000 0001 0001 1010 = 256+16+8+2 = 282
Jan2016

158

Computer Architecture

Nguyn Kim Khnh DCE-HUST

159

Jan2016

Ta c:
+ 37
b mt

=
=

b hai

0010 0101
1101 1010
+
1
1101 1011 =

-37

Ly b hai ca s m:
- 37 =
1101 1011
b mt =
0010 0100
+
1
b hai
=
0010 0101 =

+37

Kt lun: Php o du s nguyn trong my tnh


thc cht l ly b hai
Computer Architecture

160

40

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

3. Cng s nguyn c du
n

Khi cng hai s nguyn c du n-bit, kt qu


nhn c l n-bit v khng cn quan tm n
bit Cout
n
n

V d cng s nguyn c du khng trn

Khi cng hai s khc du th kt qu lun lun ng


Khi cng hai s cng du, nu du kt qu cng du
vi cc s hng th kt qu l ng
Khi cng hai s cng du, nu kt qu c du ngc
li, khi c trn (Overflow) xy ra v kt qu b sai

Hin tng trn xy ra khi tng nm ngoi di


biu din: [ -(2n-1),+(2n-1-1)]

Jan2016

Computer Architecture

161

NKK-HUST

=
=

0100 0110
0010 1010
0111 0000 = +112

(+ 97)
+ (- 52)
+ 45

=
=

0110 0001
1100 1100 (+52=0011 0100)
1 0010 1101 = +45

( - 90)
+ ( +36)
- 54

=
=

1010 0110 (+90=0101 1010)


0010 0100
1100 1010 = - 54

( - 74)
+( - 30)
-104

=
=

1011 0110 (+74=0100 1010)


1110 0010 (+30=0001 1110)
1 1001 1000 = -104

Jan2016

Computer Architecture

162

NKK-HUST

V d cng s nguyn c du b trn


( + 75)
+( + 82)
+157

4. Nguyn tc thc hin php tr

= 0100 1011
= 0101 0010
1001 1101
= - 128+16+8+4+1= -99 sai

n
n

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Php tr hai s nguyn: X-Y = X+(-Y)


Nguyn tc: Ly b hai ca Y c Y,
ri cng vi X
n-bit

( - 104) = 1001 1000


(+104=0110 1000)
+ ( - 43) = 1101 0101
(+ 43 =0010 1011)
- 147
1 0110 1101
= 64+32+8+4+1= +109 sai
n C hai v d u trn v tng nm ngoi di
biu din [-128, +127]
n

Jan2016

( + 70)
+ ( + 42)
+ 112

n-bit

B hai

B cng n-bit
n-bit

163

Jan2016

S= X-Y

Computer Architecture

164

41

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4.3. Php nhn v php chia s nguyn

Nhn s nguyn khng du (tip)

1. Nhn s nguyn khng du


1011
x 1101
1011
0000
1011
1011
10001111
Jan2016

Cc tch ring phn c xc nh nh sau:


n

S b nhn (11)
S nhn
(13)

Cc tch ring phn

n
n

Tch

Nu bit ca s nhn bng 0 tch ring phn bng 0


Nu bit ca s nhn bng 1 tch ring phn bng s
b nhn
Tch ring phn tip theo c dch tri mt bit so vi
tch ring phn trc

Tch bng tng cc tch ring phn


Nhn hai s nguyn n-bit, tch c di 2n bit
(khng bao gi trn)

(143)

Computer Architecture

165

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

B nhn s nguyn khng du

Lu nhn s nguyn khng du


Bt u

S b nhn
Mn-1 Mn-2

...

M1

C 0; A 0
M S b nhn
Q S nhn
B m n

M0

iu
khin
cng

B cng n-bit

B logic iu khin
cng v dch

No

An-1 An-2

...

A1

A0

Qn-1 Qn-2

Dch phi C,A,Q


B m B m-1

...

Q1

Q0
B m = 0 ?

S nhn

Nguyn Kim Khnh DCE-HUST

No

Yes
Kt thc

Computer Architecture

Yes

Q0 = 1 ?

C,A A+M

iu khin
dch phi

Jan2016

166

167

Jan2016

Tch trong AQ

Computer Architecture

168

42

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d nhn s nguyn khng du


n
n
n

S b nhn M =
S nhn Q =
Tch
=

C
0

0
0

0
0

1
0

Jan2016

1011
1101
1000 1111

2. Nhn s nguyn c du

(11)
(13)
(143)

n
n

S dng thut gii nhn khng du


S dng thut gii Booth

A
Q
0000 1101 Cc gi tr khi u
+ 1011
1011 1101 A A + M
0101 1110 Dch phi
0010
+ 1011
1101
0110
+ 1011
0001
1000

1111 Dch phi


10.3 / INTEGER ARITHMETIC

1111 A A + M
1111 Dch phi
1111 A A + M
1111 Dch phi
Computer Architecture

169

Jan2016

335

as the twos complement value -7, then each partial product must be a negative
twos complement number of 2n (8) bits, as shown in Figure 10.11b. Note that this is
accomplished by padding out each partial product to the left with binary 1s.
If the multiplier is negative, straightforward multiplication also will not work.
The reason is that the bits of the multiplier no longer correspond to the shifts or
multiplications that must take place. For example, the 4-bit decimal number -3 is
written 1101 in twos complement. If we simply took partial products based on each
bit position, we would have the
following
correspondence:
Computer
Architecture

170

1101 g -(1 * 23 + 1 * 22 + 0 * 21 + 1 * 20) = -(23 + 22 + 20)

NKK-HUST

NKK-HUST

S dng thut gii nhn khng du


n

In fact, what is desired is -(21 + 20). So this multiplier cannot be used directly in
the manner we have been describing.
There are a number of ways out of this dilemma. One would be to convert
both multiplier and multiplicand to positive numbers, perform the multiplication,
and then take the twos complement of the result if and only if the sign of the two
original numbers differed. Implementers have preferred to use techniques that
do not require this final transformation step. One of the most common of these is
Booths algorithm. This algorithm also has the benefit of speeding up the multiplication process, relative to a more straightforward approach.
Booths algorithm is depicted in Figure 10.12 and can be described as follows.
As before, the multiplier and multiplicand are placed in the Q and M registers,

Thut gii Booth (tham kho sch COA)

Bc 1. Chuyn i s b nhn v s nhn


thnh s dng tng ng

START

0
A
0, Q"1
M
Multiplicand
Q
Multiplier
Count
n

Bc 2. Nhn hai s dng bng thut gii


nhn s nguyn khng du, c tch ca hai
s dng.

! 10

Bc 3. Hiu chnh du ca tch:


n

A"M

Nu hai tha s ban u cng du th gi nguyn


kt qu bc 2

Q0, Q"1

! 01

! 11
! 00

A#M

Arithmetic shift
Right: A, Q, Q"1
Count
Count " 1

Nu hai tha s ban u l khc du th o du


kt qu ca bc 2 (ly b hai)

No

Count ! 0?

Yes

END

Figure 10.12 Booths Algorithm for Twos


Complement Multiplication

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

171

Jan2016

Computer Architecture

172

43

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

3. Chia s nguyn khng du

B chia s nguyn khng du


S chia M
Mn-1 Mn-2

S b chia

10010011
- 1011
001110
- 1011
001111
- 1011
100

Jan2016

1011
00001101

S chia
Thng

M1

M0

iu
khin
cng/tr

B cng/tr n-bit

B logic iu khin
cng/tr v dch

iu khin
dch tri

An-1 An-2

Phn d

Computer Architecture

...

A1

A0

Qn-1 Qn-2

...

Q1

Q0

S b chia Q

173

NKK-HUST

Jan2016

Computer Architecture

174

NKK-HUST

Lu chia s nguyn khng du

4. Chia s nguyn c du

Bt u

n
A 0
M S chia
Q S b chia
B m n

Dch tri A,Q


AA-M
No

n
Yes

Q0 0
AA+M

Q0 1

B mB m-1
No
B m = 0 ?
Yes
Kt thc

Thng Q
S d A

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Bc 1. Chuyn i s b chia v s chia v thnh s


dng tng ng.
Bc 2. S dng thut gii chia s nguyn khng du
chia hai s dng, kt qu nhn c l thng Q v
phn d R u l dng
Bc 3. Hiu chnh du ca kt qu nh sau:
(Lu : php o du thc cht l thc hin php ly b hai)

A<0?

Jan2016

...

175

Jan2016

S b chia

S chia

Thng

S d

dng

dng

gi nguyn

gi nguyn

dng

o du

gi nguyn

dng

o du

o du

gi nguyn

o du

Computer Architecture

176

44

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4.4. S du phy ng

2. Chun IEEE754-2008

1. Nguyn tc chung
n Floating Point Number biu din cho s
thc
n Tng qut: mt s thc X c biu din
theo kiu s du phy ng nh sau:
X = M * RE
n
n
n

C s R =3462 CHAPTER 10 / COMPUTER ARITHMETIC


Cc dng:

Sign Biased
bit exponent

8 bits
23 bits
(a) Binary32 format
Sign Biased
bit exponent

Computer Architecture

Dng 64-bit

Trailing significand field


11 bits
(b) Binary64 format

M l phn nh tr (Mantissa),
R l c s (Radix),
E l phn m (Exponent).

Jan2016

Trailing
significand field

Dng 32-bit

52 bits

Sign
bit

Biased
exponent

Dng 128-bit

Trailing significand field

15 bits
(c) Binary128 format

Figure 10.21

177

112 bits

IEEE 754 Formats

is implementation dependent, but the standard places certain constraints on the


length of the exponent and significand. These formats are arithmetic format types
but not interchange format types. The extended formats are to be used for interComputer Architecture
mediate calculations.
With their greater precision, the extended formats lessen 178
the

Jan2016

Table 10.3

IEEE 754 Format Parameters


Parameter

Format
Binary32

Binary64

32

64

Storage width (bits)


Exponent width (bits)

NKK-HUST

NKK-HUST

Dng 32-bit
e

S
1 bit

S = 0 s dng
S = 1 s m

128

11

15

Exponent bias

127

1023

16383

Maximum exponent

127

1023

16383

- 126

Minimum exponent
Approx normal number range
(base 10)

10-38, 10+38

- 1022
10-308, 10+308

- 16382
10-4932, 10+4932

Trailing significand width (bits)*

23

52

112

Number of exponents

254

2046

32766

Number of fractions

223

252

2112

1.99 * 2128

Xc nh gi tr ca cc s thc c biu din


bng 32-bit sau y:
1100 0001 0101 0110 0000 0000 0000 0000
Number of values

23 bit

1.98 * 231

1.99 * 263

Smallest positive normal number

2-126

2-1022

2-16362

Largest positive normal number

2128 - 2104

21024 - 2971

216384 - 216271

Smallest subnormal magnitude

-149

-1074

2-16494

Note: *not including implied bit and not including sign bit

S = 1 s m
e = 1000 0010(2) = 130(10) E = 130 - 127 = 3
Vy
X = -1.10101100(2) * 23 = -1101.011(2) = -13.375(10)
n
n

e = E + 127 phn m E = e - 127

m (23 bit) l phn l ca phn nh tr M:


n

8 bit

e (8 bit) l gi tr dch chuyn ca phn m E:


n

S l bit du:
n

V d 1

Binary128

M = 1.m

0011 1111 1000 0000 0000 0000 0000 0000 = ?

Cng thc xc nh gi tr ca s thc:

X = (-1)S*1.m * 2e-127
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

179

Jan2016

Computer Architecture

180

45

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d 2

Cc qui c c bit

Biu din s thc X= 83.75(10) v dng s du


phy ng IEEE754 32-bit

Cc bit ca e bng 0, cc bit ca m bng 0, th X = 0


x000 0000 0000 0000 0000 0000 0000 0000 X = 0

Gii:

Cc bit ca e bng 1, cc bit ca m bng 0, th X =


x111 1111 1000 0000 0000 0000 0000 0000 X =

X = 83.75(10) = 1010011.11(2) = 1.01001111 x 26

Ta c:

Cc bit ca e bng 1, cn m c t nht mt bit bng 1, th


n khng biu din cho s no c (NaN - not a number)

S = 0 v y l s dng

E = e - 127 = 6 e = 127 + 6 = 133(10) = 1000 0101(2)

Vy:

X = 0100 0010 1010 0111 1000 0000 0000 0000


Jan2016

Computer Architecture

181

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Di gi tr biu din
n
n

182

Dng 64-bit

2-127 n 2+127
10-38 n 10+38

n
n

S l bit du
e (11 bit) l gi tr dch chuyn ca phn
m E:
n

-2+127

-2-127 0

+2-127

+2+127

n
n

e = E + 1023 phn m E = e - 1023

m (52 bit): phn l ca phn nh tr M


Gi tr s thc:

X = (-1)S*1.m * 2e-1023
n

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

183

Jan2016

Di gi tr biu din: 10-308 n 10+308


Computer Architecture

184

46

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Dng 128-bit
n
n

S l bit du
e (15 bit) l gi tr dch chuyn ca phn
m E:
n

n
n

3. Thc hin php ton s du phy ng


n
n
n

e = E + 16383 phn m E = e - 16383

X1 = M1 * RE1
X2 = M2 * RE2
Ta c
n

m (112 bit): phn l ca phn nh tr M


Gi tr s thc:

n
n

X1 * X2 = (M1* M2) * RE1+E2


X1 / X2 = (M1 / M2) * RE1-E2
X1 X2 = (M1*RE1-E2 M2) * RE2 , vi E2 E1

X = (-1)S*1.m * 2e-16383
n

Jan2016

Di gi tr biu din: 10-4932 n 10+4932

Computer Architecture

185

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Cc kh nng trn s
n

Jan2016

186

Php cng v php tr

Trn trn s m (Exponent Overflow): m


dng vt ra khi gi tr cc i ca s m
dng c th ( )
Trn di s m (Exponent Underflow): m m
vt ra khi gi tr cc i ca s m m c th
( 0)
Trn trn phn nh tr (Mantissa Overflow):
cng hai phn nh tr c cng du, kt qu b
nh ra ngoi bit cao nht
Trn di phn nh tr (Mantissa Underflow):
Khi hiu chnh phn nh tr, cc s b mt bn
phi phn nh tr
Computer Architecture

Nguyn Kim Khnh DCE-HUST

187

n
n
n

Jan2016

Kim tra cc s hng c bng 0 hay


khng
Hiu chnh phn nh tr
Cng hoc tr phn nh tr
Chun ho kt qu

Computer Architecture

188

47

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Thut ton cng/tr s du phy ng

Thut ton nhn s du phy ng


NHN

TR

X=0?

i du ca Y

CNG

X=0?

Y=0?

Phn m
bng nhau?

ZY

N
Tng phn m
nh hn

Z X

TR V

Dch phi
phn nh tr

Z 0

Kt qu
chun ha?

nh tr = 0?

N
Dch phi
phn nh tr

Lm trn
kt qu

Z 0

TR V

TR V

Y=0?
Y

Cng phn m
Tr cho
lch

Trn trn
phn m?

Y Thng bo
trn trn

TR V

nh tr b
trn?

Gim phn m

Trn di
phn m?

Y
Dch phi
phn nh tr

Y=0?
Y
Ct s kia
vo Z

Cng c du
phn nh tr

N Phn m b
trn di?

Tng phn m

N
Nhn phn
nh tr

Y Thng bo
trn di

TR V

Y
Bo trn di

TR V

Chun ha
TR V

Bo trn

Phn m
b trn?

TR V

Lm trn

TR V

Jan2016

Computer Architecture

189

NKK-HUST

Jan2016

Computer Architecture

190

NKK-HUST

Thut ton chia s du phy ng


CHIA

X=0?

Y=0?

Z 0

Tr phn m
Cng thm
lch

Trn trn
phn m?

TR V

N
Trn di
phn m?
N
Chia phn
nh tr

Ht chng 4

Y Thng bo
trn trn

Thng bo
trn di

TR V

Chun ha

Lm trn

TR V

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

191

Jan2016

Computer Architecture

192

48

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kin trc my tnh

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chng 5
KIN TRC TP LNH

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

193

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Ni dung ca chng 5

5.1. Gii thiu chung v kin trc tp lnh


n

5.1. Gii thiu chung v kin trc tp lnh


5.2. Lnh hp ng v ton hng
5.3. M my
5.4. C bn v lp trnh hp ng
5.5. Cc phng php nh a ch
5.6. Dch v chy chng trnh hp ng

Kin trc tp lnh (Instruction Set Architecture):


cch nhn my tnh bi ngi lp trnh
Vi kin trc (Microarchitecture): cch thc hin
kin trc tp lnh bng phn cng
Ngn ng trong my tnh:
n

Hp ng (assembly language):
n
n

Computer Architecture

Nguyn Kim Khnh DCE-HUST

195

Jan2016

dng lnh c th c c bi con ngi


biu din dng text

Ngn ng my (machine language):


n

cn gi l m my (machine code)
dng lnh c th c c bi my tnh

biu din bng cc bit 0 v 1

Jan2016

194

Computer Architecture

196

49

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

M hnh lp trnh ca my tnh

CPU nhn lnh t b nh

B nh chnh

CPU
lnh
lnh
lnh
lnh

PC
n v
iu khin

.
.
.

d liu
d liu
d liu
d liu
.
.
.

ALU

Tp thanh ghi

Vo-ra

B m chng trnh PC
(Program Counter) l thanh ghi
ca CPU gi a ch ca lnh cn
nhn vo thc hin
CPU pht a ch t PC n b
nh, lnh c nhn vo

Jan2016

Computer Architecture

NKK-HUST

Jan2016

lnh k tip
lnh
lnh

Ty thuc vo di ca lnh va
c nhn
MIPS: lnh c di 32-bit, PC tng 4
Computer Architecture

198

CPU c/ghi d liu b nh

B x l gii m lnh c nhn v pht cc


tn hiu iu khin thc hin thao tc m lnh yu
cu
Cc kiu thao tc chnh ca lnh:
n

Jan2016

lnh c
nhn vo

NKK-HUST

Gii m v thc hin lnh


n

lnh
PC

PC tng bao nhiu?

197

lnh

Sau khi lnh c nhn vo, ni


dung PC t ng tng tr sang
lnh k tip
n

.
.
.

lnh

Vi cc lnh trao i d liu vi b nh,


CPU cn bit v pht ra a ch ca ngn
nh cn c/ghi
a ch c th l:

Trao i d liu gia CPU v b nh chnh hoc cng


vo-ra

Thc hin cc php ton s hc hoc php ton logic


vi cc d liu (c thc hin bi ALU)

Hng s a ch c cho trc tip trong lnh


Gi tr a ch nm trong thanh ghi con tr
a ch = a ch c s + gi tr dch chuyn

Chuyn iu khin trong chng trnh (r nhnh, nhy)

Computer Architecture

Nguyn Kim Khnh DCE-HUST

199

Jan2016

Computer Architecture

200

50

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Hng s a ch
Trong lnh cho hng s
a ch c th
CPU pht gi tr a ch
ny n b nh tm
ra ngn nh d liu cn
c/ghi

S dng thanh ghi con tr


n

d liu
d liu
Hng s a ch

d liu

d liu cn c/ghi
d liu
d liu

d liu
d liu

Jan2016

Computer Architecture

201

NKK-HUST

Jan2016

Jan2016

d liu
d liu
d liu
Thanh ghi

d liu cn c/ghi
d liu
d liu
d liu
d liu

Computer Architecture

202

NKK-HUST

S dng a ch c s v dch chuyn


n

Trong lnh cho bit


tn thanh ghi con tr
Thanh ghi con tr
cha gi tr a ch
CPU pht a ch ny
ra tm ra ngn nh
d liu cn c/ghi

a ch c s (base address):
a ch ca ngn nh c s
Gi tr dch chuyn a ch (offset):
gia s a ch gia ngn nh cn
c/ghi so vi ngn nh c s
a ch ca ngn nh cn c/ghi
= (a ch c s) + (offset)
C th s dng cc thanh ghi
qun l cc tham s ny
Trng hp ring:
n

a ch c s = 0

Offset = 0
Computer Architecture

Nguyn Kim Khnh DCE-HUST

Ngn xp (Stack)
n

a ch c s

Ngn nh c s

Offset

n
n
d liu cn oc/ghi

203

Jan2016

Ngn xp l vng nh d liu c cu trc


LIFO (Last In - First Out vo sau - ra trc)
Ngn xp thng dng phc v cho
chng trnh con
y ngn xp l mt ngn nh xc nh
nh ngn xp l thng tin nm v tr trn
cng trong ngn xp
nh ngn xp c th b thay i

Computer Architecture

204

51

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Con tr ngn xp SP (Stack Pointer)


n

SP l thanh ghi cha a ch ca


ngn nh nh ngn xp
Khi ct mt thng tin vo ngn
xp:
SP
n
n

B nh chnh c nh a ch cho tng byte

Hai cch lu tr thng tin nhiu byte:


n

nh ngn xp

Gim ni dung ca SP
Thng tin c ct vo ngn nh
c tr bi SP

chiu
a
ch
tng
dn

Khi ly mt thng tin ra khi


ngn xp:
n

Th t lu tr cc byte trong b nh chnh

Thng tin c c t ngn nh


c tr bi SP
Tng ni dung ca SP

y ngn xp

Khi ngn xp rng, SP tr vo


y

Jan2016

Computer Architecture

205

NKK-HUST

u to (Big-endian): Byte c ngha cao c lu


tr ngn nh c a ch nh, byte c ngha thp
c lu tr ngn nh c a ch ln.

Cc sn phm thc t:
n

Intel x86: little-endian

Motorola 680x0, SunSPARC: big-endian

MIPS, IA-64: bi-endian (c hai kiu)

Jan2016

Computer Architecture

206

NKK-HUST

V d lu tr d liu 32-bit
S
nh phn

S Hexa

Tp lnh

0001 1010 0010 1011 0011 1100 0100 1101


1A

2B

3C

n
n

4D

4D

4000

1A

4000

3C

4001

2B

4001

2B

4002

3C

4002

1A

4003

4D

4003

little-endian

Jan2016

u nh (Little-endian): Byte c ngha thp c


lu tr ngn nh c a ch nh, byte c ngha
cao c lu tr ngn nh c a ch ln.

Mi b x l c mt tp lnh xc nh
Tp lnh thng c hng chc n hng trm
lnh
Mi lnh my (m my) l mt chui cc bit (0,1)
m b x l hiu c thc hin mt thao tc
xc nh.
Cc lnh c m t bng cc k hiu gi nh
dng text, chnh l cc lnh ca hp ng
(assembly language)

big-endian

Computer Architecture

Nguyn Kim Khnh DCE-HUST

207

Jan2016

Computer Architecture

208

52

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Dng lnh hp ng

Cc thnh phn ca lnh my

M C:
a = b + c;
n V d lnh hp ng:
add a, b, c
#a=b+c
trong :

M thao tc

n
n

Jan2016

n
n
n

Ch : mi lnh ch thc hin mt thao tc

b, c: cc ton hng ngun cho thao tc


a: ton hng ch (ni ghi kt qu)
phn sau du # l li gii thch (ch c tc dng
n ht dng)
Computer Architecture

n
n

209

Jan2016

Computer Architecture

add r1, r2, r3 # r1 = r2 + r3


S dng ph bin trn cc kin trc hin nay

add r1, r2
# r1 = r1 + r2
S dng trn Intel x86, Motorola 680x0

add r1
# Acc = Acc + r1
c s dng trn kin trc th h trc

n
n

0 a ch ton hng:
n
n

My tnh vi tp lnh phc tp


Cc b x l: Intel x86, Motorola 680x0

RISC: Reduced Instruction Set Computer


n

Mt a ch ton hng:
n

CISC: Complex Instruction Set Computer


n

Hai a ch ton hng:


n

Jan2016

210

Cc kin trc tp lnh CISC v RISC

Ba a ch ton hng:
n

Hng s nm ngay trong lnh


Ni dung ca thanh ghi
Ni dung ca ngn nh (hoc cng vo-ra)

NKK-HUST

Ton hng c th l:
n

S lng a ch ton hng trong lnh

Cc thao tc chuyn d liu


Cc php ton s hc
Cc php ton logic
Cc thao tc chuyn iu khin (r nhnh, nhy)

a ch ton hng: ch ra ni cha cc ton


hng m thao tc s tc ng
n

NKK-HUST

M thao tc (operation code hay opcode): m


ha cho thao tc m b x l phi thc hin
n

add: k hiu gi nh ch ra thao tc (php ton)


cn thc hin.
n

a ch ton hng

My tnh vi tp lnh thu gn


SunSPARC, Power PC, MIPS, ARM ...
RISC i nghch vi CISC
Kin trc tp lnh tin tin

Cc ton hng u c ngm nh ngn xp


Khng thng dng
Computer Architecture

Nguyn Kim Khnh DCE-HUST

211

Jan2016

Computer Architecture

212

53

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc c trng ca kin trc RISC


n
n

n
n
n
n
n
n

Kin trc tp lnh MIPS

S lng lnh t
Hu ht cc lnh truy nhp ton hng cc
thanh ghi
Truy nhp b nh bng cc lnh LOAD/STORE
(np/lu)
Thi gian thc hin cc lnh l nh nhau
Cc lnh c di c nh (thng l 32 bit)
S lng dng lnh t
C t phng php nh a ch ton hng
C nhiu thanh ghi
H tr cc thao tc ca ngn ng bc cao

Jan2016

Computer Architecture

n
n

n
n
n

213

Jan2016

Thc hin php cng: 3 ton hng

L php ton ph bin nht


n Hai ton hng ngun v mt ton hng
ch

MIPS c tp 32 thanh ghi 32-bit


n
n

add a, b, c # a = b + c
Hu ht cc lnh s hc/logic c dng
trn
Cc lnh s hc s dng ton hng
thanh ghi hoc hng s

n
n
n

Nguyn Kim Khnh DCE-HUST

215

Jan2016

Bt u bng du $
$t0, $t1, , $t9 cha cc gi tr tm thi
$s0, $s1, , $s7 ct cc bin

Qui c gi d liu trong MIPS:


n

Computer Architecture

c s dng thng xuyn


c nh s t 0 n 31 (m ha bng 5-bit)

Chng trnh hp dch Assembler t tn:

Jan2016

214

Tp thanh ghi ca MIPS

Computer Architecture

NKK-HUST

5.2. Lnh hp ng v cc ton hng

Ti liu: MIPS Reference Data Sheet v Chapter 2 COD

NKK-HUST

MIPS vit tt cho:


Microprocessor without Interlocked Pipeline Stages
c pht trin bi John Hennessy v cc ng nghip
i hc Stanford (1984)
c thng mi ha bi MIPS Technologies
Nm 2013 cng ty ny c bn cho Imagination
Technologies (imgtec.com)
L kin trc RISC in hnh, d hc
c s dng trong nhiu sn phm thc t
Cc phn tip theo trong chng ny s nghin cu kin
trc tp lnh MIPS 32-bit

D liu 32-bit c gi l word


D liu 16-bit c gi l halfword
Computer Architecture

216

54

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Tp thanh ghi ca MIPS


Tn thanh ghi S hiu thanh ghi
$zero

Ton hng thanh ghi


Cng dng

the constant value 0, cha hng s = 0


assembler temporary, gi tr tm thi cho hp ng

$at

$v0-$v1

2-3

procedure return values, cc gi tr tr v ca th tc

$a0-$a3

4-7

procedure arguments, cc tham s vo ca th tc

$t0-$t7

8-15

temporaries, cha cc gi tr tm thi

$s0-$s7

16-23

saved variables, lu cc bin

$t8-$t9

24-25

more temporarie, cha cc gi tr tm thi

$k0-$k1

26-27

OS temporaries, cc gi tr tm thi ca OS

$gp

28

global pointer, con tr ton cc

$sp

29

stack pointer, con tr ngn xp

$fp

30

frame pointer, con tr khung

$ra

31

Jan2016

n
n

217

NKK-HUST

$t0, $s1, $s2


$t1, $s3, $s4
$s0, $t0, $t1

# $t0 = g + h
# $t1 = i + j
# f = (g+h)-(i+j)

Computer Architecture

218

a ch byte nh v word nh

Np (load) gi tr t b nh vo thanh ghi


Thc hin php ton trn thanh ghi
Lu (store) kt qu t thanh ghi ra b nh

B nh c nh a ch theo byte
n

n
n

Jan2016

c dch thnh m hp ng MIPS:

Jan2016

Mun thc hin php ton s hc vi ton hng b


nh, cn phi:
n

# (rd) = (rs)+(rt)
# (rd) = (rs)-(rt)

NKK-HUST

rd, rs, rt
rd, rs, rt

gi thit: f, g, h, i, j nm $s0, $s1, $s2, $s3, $s4

add
add
sub

Ton hng b nh
n

add
sub

V d m C:
f = (g + h) - (i + j);
n

procedure return address, a ch tr v ca th tc


Computer Architecture

Lnh add, lnh sub (subtract) ch thao tc vi


ton hng thanh ghi

MIPS s dng 32-bit nh a ch cho cc byte nh v cc


cng vo-ra
Khng gian a ch: 0x00000000 0xFFFFFFFF

MIPS cho php lu tr trong b nh theo kiu u to


(big-endian) hoc kiu u nh (little-endian)

Nguyn Kim Khnh DCE-HUST

a ch byte
(theo Hexa)

D liu
hoc lnh

a ch word
(theo Hexa)

byte (8-bit)

0x0000 0000

word (32-bit)

0x0000 0000

byte

0x0000 0001

word

0x0000 0004

byte

0x0000 0002

word

0x0000 0008

byte

0x0000 0003

word

0x0000 000C

byte

0x0000 0004

word

0x0000 0010

byte

0x0000 0005

word

0x0000 0014

byte

0x0000 0006

word

0x0000 0018

byte

0x0000 0007

Mi word c di 32-bit chim 4-byte trong b nh, a ch ca


cc word l bi ca 4 (a ch ca byte u tin)

Computer Architecture

D liu
hoc lnh

219

Jan2016

word

0xFFFF FFF4

byte

0xFFFF FFFB

word

0xFFFF FFF8

byte

0xFFFF FFFC

word

0xFFFF FFFC

byte

0xFFFF FFFD

byte

0xFFFF FFFE

byte

0xFFFF FFFF

232 bytes

230 words

Computer Architecture

220

55

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Lnh load v lnh store


n

V d ton hng b nh

c word d liu 32-bit t b nh a vo thanh ghi,


s dng lnh load word
lw rt, imm(rs)
# (rt) = mem[(rs)+imm]

M C:
// A l mng cc phn t 32-bit
g = h + A[8];
n

rs: thanh ghi cha a ch c s (base address)


imm (immediate): hng s (offset)
a ch ca word d liu cn c = a ch c s + hng s
n rt: thanh ghi ch, cha word d liu c c vo
n

Cho g $s1, h $s2


$s3 cha a ch c s ca mng A

a ch c s

Offset = 32

ghi word d liu 32-bit t thanh ghi a ra b nh,


s dng lnh store word
sw rt, imm(rs)
# mem[(rs)+imm] = (rt)

A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]

rt: thanh ghi ngun, cha word d liu cn ghi ra b nh


rs: thanh ghi cha a ch c s (base address)
n imm: hng s (offset)
a ch ni ghi word d liu = a ch c s + hng s
n
n

Jan2016

Computer Architecture

221

NKK-HUST

Jan2016

Computer Architecture

222

NKK-HUST

V d ton hng b nh
M C:
// A l mng cc phn t 32-bit
g = h + A[8];

V d ton hng b nh (tip)

n
n

Cho g $s1, h $s2


$s3 cha a ch c s ca mng A

a ch c s

Offset = 32

M hp ng MIPS:
# Ch s 8, do offset = 32
lw
$t0, 32($s3)
# $t0 = A[8]
add $s1, $s2, $t0 # g = h+A[8]
n

offset

A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]

M C:

a ch c s

A[12] = h + A[8];
n h $s2
n $s3 cha a ch c s ca mng A
Offset = 48

A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]

base register

(Ch : offset phi l hng s, c th dng hoc m )


Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

223

Jan2016

Computer Architecture

224

56

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d ton hng b nh (tip)


n

M C:

Thanh ghi vi B nh
a ch c s

A[12] = h + A[8];
n h $s2
n $s3 cha a ch c s ca mng A
Offset = 48

M hp ng MIPS:
lw $t0, 32($s3)
add $t0, $s2, $t0
sw $t0, 48($s3)

Jan2016

# $t0 = A[8]
# $t0 = h+A[8]
# A[12]=h+A[8]

A[0]
A[1]
A[2]
A[3]
A[4]
A[5]
A[6]
A[7]
A[8]
A[9]
A[10]
A[11]
A[12]

Computer Architecture

n
n

225

Jan2016

226

NKK-HUST

X l vi s nguyn

D liu hng s c xc nh ngay trong


lnh
addi $s3, $s3, 4

S nguyn c du (biu din bng b hai):


n

# $s3 = $s3+4

Khng c lnh tr (subi) vi gi tr hng s


n

S dng hng s m trong lnh addi thc


hin php tr
addi $s2, $s1, -1 # $s2 = $s1-1

Nguyn Kim Khnh DCE-HUST

227

Jan2016

Vi n bit, di biu din: [0, 2n -1]


Cc lnh addu, subu dnh cho s nguyn khng du

Qui c biu din hng s nguyn trong hp


ng MIPS:
n

Computer Architecture

Vi n bit, di biu din: [-2n-1, +(2n-1-1)]


Cc lnh add, sub dnh cho s nguyn c du

S nguyn khng du:

Jan2016

Ch s dng b nh cho cc bin t c s dng


Cn ti u ha s dng thanh ghi

Computer Architecture

Ton hng tc th (immediate)

Cn thc hin nhiu lnh hn

Chng trnh dch s dng cc thanh ghi cho


cc bin nhiu nht c th
n

NKK-HUST

Truy nhp thanh ghi nhanh hn b nh


Thao tc d liu trn b nh yu cu np (load)
v lu (store)

s thp phn: 12; 3456; -18


s Hexa (bt u bng 0x): 0x12 ; 0x3456; 0x1AB6

Computer Architecture

228

57

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Hng s Zero
n

Thanh ghi 0 ca MIPS ($zero hay $0) lun


cha hng s 0
n

5.3. M my (Machine code)


n

Khng th thay i gi tr

Hu ch cho mt s thao tc thng dng


n

Cc lnh c m ha di dng nh phn


c gi l m my
Cc lnh ca MIPS:
n
n

Chng hn, chuyn d liu gia cc thanh ghi


add $t2, $s1, $zero # $t2 = $s1

S hiu thanh ghi c m ha bng 5-bit


n
n
n

Jan2016

Computer Architecture

229

NKK-HUST

$t0 $t7 c s hiu t 8 15


$t8 $t9 c s hiu t 24 25
$s0 $s7 c s hiu t 16 23

Jan2016

Computer Architecture

230

NKK-HUST

Lnh kiu R (Registers)

Cc kiu lnh my ca MIPS


Lnh kiu R

op

rs

rt

rd

shamt

funct

6 bits

5 bits

5 bits

5 bits

5 bits

6 bits

op

rs

rt

imm

6 bits

5 bits

5 bits

16 bits

op

address

6 bits

26 bits
n

Computer Architecture

Nguyn Kim Khnh DCE-HUST

rs

rt

rd

shamt

funct

5 bits

5 bits

5 bits

5 bits

6 bits

op (operation code - opcode): m thao tc


n

n
n

Lnh kiu J

op
6 bits

Cc trng ca lnh
n

Lnh kiu I

Jan2016

c m ha bng cc t lnh 32-bit


Mi lnh chim 4-byte trong b nh, do vy a ch
ca lnh trong b nh l bi ca 4
C t dng lnh

231

Jan2016

vi cc lnh kiu R, op = 000000

rs: s hiu thanh ghi ngun th nht


rt: s hiu thanh ghi ngun th hai
rd: s hiu thanh ghi ch
shamt (shift amount): s bit c dch, ch dng cho
lnh dch bit, vi cc lnh khc shamt = 00000
funct (function code): m hm
Computer Architecture

232

58

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Lnh kiu I (Immediate)

V d m my ca lnh add, sub


op

rs

rt

rd

shamt

funct

6 bits

5 bits

5 bits

5 bits

5 bits

6 bits

add $t0, $s1, $s2

$s1

$s2

$t0

add

17

18

32

000000

10001

10010

01000

00000

100000

$t3

$t5

11

000000

01011

Jan2016

13
01101

$s0

16

10000

00000

rs: s hiu thanh ghi ngun (addi) hoc thanh ghi c s (lw, sw)

rt: s hiu thanh ghi ch (addi, lw) hoc thanh ghi ngun (sw)

imm (immediate): hng s nguyn 16-bit


# (rt) = (rs)+SignExtImm

rt, imm(rs)

# (rt) = mem[(rs)+SignExtImm]

34

sw

rt, imm(rs)

# mem[(rs)+SignExtImm] = (rt)

(SignExtImm: hng s imm 16-bit c m rng theo kiu s c du thnh 32-bit)

100010
233

Jan2016

Computer Architecture

234

NKK-HUST

V d m my ca lnh addi

Vi cc lnh addi, lw, sw cn cng ni dung


thanh ghi vi hng s:
n
n

Thanh ghi c di 32-bit


Hng s imm 16-bit, cn m rng thnh 32-bit theo
kiu s c du (Sign-extended)

+5 =

0000 0000 0000 0000 0000 0000 0000 0101

16-bit

+5 =

0000 0000 0000 0000 0000 0000 0000 0101

32-bit

-12 =

0000 0000 0000 0000 1111

16-bit

1111

1111

1111

1111

1111

Computer Architecture

Nguyn Kim Khnh DCE-HUST

1111
1111

1111 0100
1111 0100

op

rs

rt

imm

6 bits

5 bits

5 bits

16 bits

addi $s0, $s1, 5

V d m rng s 16-bit thnh 32-bit theo kiu


s c du:

-12 =
Jan2016

lw

M rng bit cho hng s theo s c du

imm
16 bits

addi rt, rs, imm

NKK-HUST

rt
5 bits

sub

(0x016D8022)

Computer Architecture

rs
5 bits

Dng cho cc lnh s hc/logic vi ton hng tc th v cc


lnh load/store (np/lu)

(0x02324020)

sub $s0, $t3, $t5

op
6 bits

$s1

$s0

17

16

001000

10001

10000

0000 0000 0000 0101

addi $t1, $s2, -12


8

$s2

$t1

(0x22300005)
-12

18

-12

001000

10010

01001

1111 1111 1111 0100

32-bit

(0x2249FFF4)
235

Jan2016

Computer Architecture

236

59

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d m my ca lnh load v lnh store


op

rs

rt

imm

6 bits

5 bits

5 bits

16 bits

Lnh kiu J (Jump)


n

lw $t0, 32($s3)

35

$s3

$t0

32

35

19

32

100011

10011

01000

0000 0000 0010 0000

Ton hng 26-bit a ch


c s dng cho cc lnh nhy
n
n

j (jump) op = 000010
jal (jump and link) op = 000011

(0x8E680020)

sw $s1, 4($t1)

Jan2016

43

$t1

$s1

43

17

101011

01001

10001

0000 0000 0000 0100

Computer Architecture

(0xAD310004)

op

237

NKK-HUST

Jan2016

26 bits

Computer Architecture

238

NKK-HUST

5.4. C bn v lp trnh hp ng
1.
2.
3.
4.
5.
6.
7.
8.

Jan2016

address

6 bits

1. Cc lnh logic

Cc lnh logic
Np hng s vo thanh ghi
To cc cu trc iu khin
Lp trnh mng d liu
Chng trnh con
D liu k t
Lnh nhn v lnh chia
Cc lnh vi s du phy ng

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Cc lnh logic thao tc trn cc bit ca


d liu
Php ton
logic

239

Jan2016

Ton t
trong C

Lnh ca
MIPS

Shift left

<<

sll

Shift right

>>

srl

Bitwise AND

&

and, andi

Bitwise OR

or, ori

Bitwise XOR

xor, xori

Bitwise NOT

nor

Computer Architecture

240

60

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d lnh logic kiu R

V d lnh logic kiu R

Ni dung cc thanh ghi ngun

Ni dung cc thanh ghi ngun

$s1 0100 0110 1010 0001 1100 0000 1011 0111

$s1 0100 0110 1010 0001 1100 0000 1011 0111

$s2 1111

$s2 1111

M hp ng

1111

1111

1111 0000 0000 0000 0000

Kt qu thanh ghi ch

1111

M hp ng

1111

1111 0000 0000 0000 0000

Kt qu thanh ghi ch

and $s3, $s1, $s2

$s3

and $s3, $s1, $s2

$s3 0100 0110 1010 0001 0000 0000 0000 0000

or

$s4, $s1, $s2

$s4

or

$s4 1111

xor $s5, $s1, $s2

$s5

xor $s5, $s1, $s2

$s5 1011 1001 0101 1110 1100 0000 1011 0111

nor $s6, $s1, $s2

$s6

nor $s6, $s1, $s2

$s6 0000 0000 0000 0000 0011 1111 0100 1000

Jan2016

Computer Architecture

241

NKK-HUST

$s4, $s1, $s2

Jan2016

1111

1111

1111 1100 0000 1011 0111

Computer Architecture

242

NKK-HUST

V d lnh logic kiu I

V d lnh logic kiu I

Gi tr cc ton hng ngun


$s1 0000 0000 0000 0000 0000 0000 1111

Gi tr cc ton hng ngun


1111

$s1 0000 0000 0000 0000 0000 0000 1111

imm 0000 0000 0000 0000 1111 1010 0011 0100

imm 0000 0000 0000 0000 1111 1010 0011 0100

Zero-extended

M hp ng

1111

Zero-extended

Kt qu thanh ghi ch

M hp ng

Kt qu thanh ghi ch

andi $s2,$s1,0xFA34 $s2

andi $s2,$s1,0xFA34 $s2 0000 0000 0000 0000 0000 0000 0011 0100

ori

ori

$s3,$s1,0xFA34 $s3

xori $s4,$s1,0xFA34 $s4

$s3,$s1,0xFA34 $s3 0000 0000 0000 0000 1111 1010 1111 1111

xori $s4,$s1,0xFA34 $s4 0000 0000 0000 0000 1111 1010 1100 1011

Ch : Vi cc lnh logic kiu I, hng s imm 16-bit c


m rng thnh 32-bit theo s khng du (zero-extended)
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

243

Jan2016

Computer Architecture

244

61

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

ngha ca cc php ton logic


n

Lnh logic dch bit

Php AND dng gi nguyn mt s bit trong


word, xa cc bit cn li v 0
Php OR dng gi nguyn mt s bit trong
word, thit lp cc bit cn li ln 1
Php XOR dng gi nguyn mt s bit trong
word, o gi tr cc bit cn li
Php NOT dng o cc bit trong word
n
n

n
n
n

Jan2016

Computer Architecture

NKK-HUST

rt

rd

shamt

funct

6 bits

5 bits

5 bits

5 bits

5 bits

6 bits

Dch tri cc bit v in cc bit 0 vo bn phi


Dch tri i bits l nhn vi 2i (nu kt qu trong phm vi biu din
32-bit)

srl - shift right logical (dch phi logic)


n

245

rs

shamt: ch ra dch bao nhiu v tr (shift amount)


rs: khng s dng, thit lp = 00000
Thanh ghi ch rd nhn gi tr thanh ghi ngun rt c
dch tri hoc dch phi, rt khng thay i ni dung
sll - shift left logical (dch tri logic)
n

i 0 thnh 1, v i 1 thnh 0
MIPS khng c lnh NOT, nhng c lnh NOR vi 3
ton hng
a NOR b == NOT ( a OR b )
nor $t0, $t1, $zero # $t0 = not($t1)

op

Dch phi cc bit v in cc bit 0 vo bn tri


Dch phi i bits l chia cho 2i (ch vi s nguyn khng du)

Jan2016

Computer Architecture

246

NKK-HUST

V d lnh dch tri sll


Lnh hp ng:
sll $t2, $s0, 4

V d lnh dch phi srl


Lnh hp ng:
srl $s2, $s1, 2

# $t2 = $s0 << 4

M my:

# $s2 = $s1 >> 2

M my:
op

rs

rt

rd

shamt

funct

op

rs

rt

rd

shamt

funct

16

10

17

18

000000

00000

10000

01010

00100

000000

000000

00000

10001

10010

00010

(0x00105100)

V d kt qu thc hin lnh:

V d kt qu thc hin lnh:

$s0

0000

0000

0000

0000

0000

0000

0000

1101

$t2

0000

0000

0000

0000

0000

0000

1101

0000

000010

(0x00119082)

= 13

$s1

0000

0000

0000

0000

0000

0000

0101

0110

= 86

= 208

$s2

0000

0000

0000

0000

0000

0000

0001

0101

= 21

(13x16)

[86/4]

Ch : Ni dung thanh ghi $s0 khng b thay i


Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

247

Jan2016

Computer Architecture

248

62

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

2. Np hng s vo thanh ghi

Lnh lui (load upper immediate)

Trng hp hng s 16-bit s dng lnh addi:


n V d: np hng s 0x4F3C vo thanh ghi $s0:
addi $s0, $0, 0x4F3C #$s0 = 0x4F3C
Trong trng hp hng s 32-bit s dng lnh
lui v lnh ori:
lui rt, constant_hi16bit

n
n

imm
16 bits

15

$s0

0x21A0

15

16

0x21A0

10000

0010 0001 1010 0000

001111

00000

(0x3C1021A0)
Ni dung $s0 sau khi lnh c thc hin:

a 16 bit thp ca hng s 32-bit vo thanh ghi rt

$s0

Computer Architecture

249

NKK-HUST

0010

0001

Jan2016

1010 0000

0000

0000

Computer Architecture

0000

0000
250

NKK-HUST

V d khi to thanh ghi 32-bit


n

rt
5 bits

Lnh m my

Copy 16 bit cao ca hng s 32-bit vo 16 bit tri ca rt


Xa 16 bits bn phi ca rt v 0

Jan2016

rs
5 bits

lui $s0, 0x21A0

ori rt,rt,constant_low16bit
n

op
6 bits

3. To cc cu trc iu khin

Np vo thanh ghi $s0 gi tr 32-bit sau:


0010 0001 1010 0000 0100 0000 0011 1011 =0x21A0 403B
lui $s0,0x21A0

# np 0x21A0 vo na cao
# ca thanh ghi $s0

ori $s0,$s0,0x403B

# np 0x403B vo na thp
# ca thanh ghi $s0

n
n
n

Ni dung $s0 sau khi thc hin lnh lui


$s0 0010
0000

0011

or

1011

if
if/else
switch/case

Cc cu trc lp
n

0001 1010 0000 0000 0000 0000 0000


0000 0000 0000 0100 0000

Cc cu trc r nhnh

while
do while
for

Ni dung $s0 sau khi thc hin lnh ori


$s0 0010
Jan2016

0001 1010 0000 0100 0000 0011 1011


Computer Architecture

Nguyn Kim Khnh DCE-HUST

251

Jan2016

Computer Architecture

252

63

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc lnh r nhnh v lnh nhy


n

Dch cu lnh if

Cc lnh r nhnh: beq, bne

R nhnh n lnh c nh nhn nu iu


kin l ng, ngc li, thc hin tun t
n beq rs, rt, L1
n

n
n

j L1
n

nhy (jump) khng iu kin n lnh nhn L1


Computer Architecture

253

NKK-HUST

Jan2016

Computer Architecture

254

NKK-HUST

Dch cu lnh if

Dch cu lnh if/else

M C:
if (i==j)
f = g+h;
f = f-i;
n f, g, h, i, j $s0, $s1, $s2, $s3, $s4

i == j ?

No

Yes

i = j

M C:

f = f - i

i j
else

if (i==j) f = g+h;
else f = g-h;

f = g + h

i == j ?

f = g + h

f = g - h

f, g, h, i, j $s0, $s1, $s2, $s3, $s4

M hp ng MIPS:
# $s0 =
# $s3 =
bne
add
L1: sub

Jan2016

f = f - i

branch on not equal


nu (rs != rt) r nhnh n lnh nhn L1

Jan2016

Yes

f = g + h

Lnh nhy j
n

No

i == j ?

bne rs, rt, L1


n

branch on equal
nu (rs == rt) r nhnh n lnh nhn L1

M C:
if (i==j)
f = g+h;
f = f-i;
n f, g, h, i, j $s0, $s1, $s2, $s3, $s4

f, $s1 = g, $s2
i, $s4 = j
$s3, $s4, L1 #
$s0, $s1, $s2 #
$s0, $s0, $s3 #

= h

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Nu i=j
th f=g+h
f=f-i

iu kin hp
ng ngc vi
iu kin ca
ngn ng bc
cao

255

Jan2016

Computer Architecture

256

64

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Dch cu lnh if/else


n

Dch cu lnh switch/case


i = j

M C:

i j

M C:

else

if (i==j) f = g+h;
else f = g-h;
n

i == j ?

f = g + h

f = g - h

f, g, h, i, j $s0, $s1, $s2, $s3, $s4

M hp ng MIPS:

Else:
Exit:

bne
add
j
sub

$s3,$s4,Else
$s0,$s1,$s2
Exit
$s0,$s1,$s2

Jan2016

#
#
#
#

Nu i=j
th f=g+h
thot
nu i<>j th f=g-h

Computer Architecture

257

NKK-HUST

switch (amount) {
case 20:
fee = 2; break;
case 50:
fee = 3; break;
case 100: fee = 5; break;
default: fee = 0;
}
// tng ng vi s dng cc cu lnh if/else
if(amount = = 20) fee = 2;
else if (amount = = 50) fee = 3;
else if (amount = = 100) fee = 5;
else fee = 0;
Jan2016

Computer Architecture

NKK-HUST

Dch cu lnh switch/case

Dch cu lnh vng lp while

M hp ng MIPS

Jan2016

258

# $s0 = amount, $s1 = fee


case20:
addi $t0, $0, 20
bne $s0, $t0, case50
addi $s1, $0, 2
j
done
case50:
addi $t0, $0, 50
bne $s0, $t0, case100
addi $s1, $0, 3
j
done
case100:
addi $t0, $0, 100
bne $s0, $t0, default
addi $s1, $0, 5
j
done
default:
add $s1 ,$0, $0
done:

M C:
i += 1;

A[i] == k?

i $s3, k $s5
a ch c s ca mng A $s6

Yes

while (A[i] == k)

# $t0 = 20
# amount == 20? if not, skip to case50
# if so, fee = 2
# and break out of case

n
n

No

i = i + 1

# $t0 = 50
# amount == 50? if not, skip to case100
# if so, fee = 3
# and break out of case
# $t0 = 100
# amount == 100? if not, skip to default
# if so, fee = 5
# and break out of case
# fee = 0
Computer Architecture

Nguyn Kim Khnh DCE-HUST

259

Jan2016

Computer Architecture

260

65

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Dch cu lnh vng lp while


n

M C:

while (A[i] == k)
n
n

Dch cu lnh vng lp for

i += 1;

i $s3, k $s5
a ch c s ca mng A $s6

A[i] == k?

No

M C:

// add the numbers from 0 to 9


int sum = 0;
int i;
for (i=0; i!=10; i++) {
sum = sum + i;
}

Yes

i = i + 1

M hp ng MIPS:
Loop: sll

$t1, $s3, 2

# $t1 = 4*i

add

$t1, $t1, $s6

# $t1 = a ch A[i]

lw

$t0, 0($t1)

# $t0 = A[i]

bne

$t0, $s5, Exit

# nu A[i]<>k th Exit

addi $s3, $s3, 1

# nu A[i]=k th i=i+1

# quay li Loop

Loop

Exit: ...
Jan2016

Computer Architecture

261

NKK-HUST

Jan2016

Khi lnh c s (basic block)

M C:

// add the numbers from 0 to 9


int sum = 0;
int i;
for (i=0; i!=10; i++) {
sum = sum + i;
}

= sum
$0, 0
$0, $0
$0, 10
$t0, done
$s1, $s0
$s0, 1

#
#
#
#
#
#
#

Khng c lnh r nhnh nhng trong


(ngoi tr cui)
n Khng c ch r nhnh ti (ngoi tr v tr
u tin)

Nguyn Kim Khnh DCE-HUST

sum = 0
i = 0
$t0 = 10
Nu i=10, thot
Nu i<10 th sum = sum+i
tng i thm 1
quay li for

Computer Architecture

Khi lnh c s l dy cc lnh vi


n

M hp ng MIPS:

# $s0 = i, $s1
addi $s1,
add
$s0,
addi $t0,
for: beq
$s0,
add
$s1,
addi $s0,
j
for
done: ...
Jan2016

262

NKK-HUST

Dch cu lnh vng lp for


n

Computer Architecture

263

Jan2016

Chng trnh dch xc


nh khi c s ti u
ha
Cc b x l tin tin c
th tng tc thc hin
khi c s

Computer Architecture

264

66

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Thm cc lnh thao tc iu kin


n

Lnh slt (set on less than)


slt rd, rs, rt

n
n

Lnh slti
slti rt, rs, constant

n
n

S dng kt hp vi cc lnh beq, bne


slt $t0, $s1, $s2
bne $t0, $zero, L1
...

So snh s c du: slt, slti


So snh s khng du: sltu, sltiu
V d
n

Nu (rs < constant) th rt = 1; ngc li rt = 0;

Nu (rs < rt) th rd = 1; ngc li rd = 0;

So snh s c du v khng du

$s0 = 1111 1111 1111 1111 1111 1111 1111 1111


$s1 = 0000 0000 0000 0000 0000 0000 0000 0001
slt $t0, $s0, $s1 # signed
n

# nu ($s1 < $s2)


# r nhnh n L1

1 < +1 $t0 = 1

sltu $t0, $s0, $s1


n

# unsigned

+4,294,967,295 > +1 $t0 = 0

L1:
Jan2016

Computer Architecture

265

NKK-HUST

Jan2016

266

NKK-HUST

V d s dng lnh slt


n

Computer Architecture

V d s dng lnh slt

M C

M hp ng MIPS

# $s0 = i, $s1 = sum

int sum = 0;
int i;
for (i=1; i < 101; i = i*2) {
sum = sum + i;
}

loop:

addi $s1, $0, 0

# sum = 0

addi $s0, $0, 1


addi $t0, $0, 101

# i = 1
# t0 = 101

slt

$t1, $s0, $t0

# Nu i>= 101

beq

$t1, $0, done

# th thot

add
sll

$s1, $s1, $s0


$s0, $s0, 1

# nu i<101 th sum=sum+i
# i= 2*i

loop

# lp li

done:

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

267

Jan2016

Computer Architecture

268

67

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4. Lp trnh vi mng d liu


n

V d v mng

Truy cp s lng ln cc d liu cng


loi
Ch s (Index): truy cp tng phn t ca
mng
Kch c (Size): s phn t ca mng

Jan2016

Computer Architecture

Mng 5-phn t, mi
phn t c di 32-bit,
chim 4 byte trong b
nh
a ch c s =
0x12348000 (a ch ca
phn t u tin ca
mng array[0])
Bc u tin truy cp
mng: np a ch c s
vo thanh ghi
269

NKK-HUST

Jan2016

array[0]

0x12348004

array[1]

0x12348008

array[2]

0x1234800C

array[3]

0x12348010

array[4]

Computer Architecture

270

NKK-HUST

V d truy cp cc phn t
n

V d vng lp truy cp mng d liu

M C

int array[5];
array[0] = array[0] * 2;
array[1] = array[1] * 2;
n

M hp ng MIPS

lw
sll
sw

$t1, 0($s0)
$t1, $t1, 1
$t1, 0($s0)

# $t1 = array[0]
# $t1 = $t1 * 2
# array[0] = $t1

lw
sll
sw

$t1, 4($s0)
$t1, $t1, 1
$t1, 4($s0)

# $t1 = array[1]
# $t1 = $t1 * 2
# array[1] = $t1
Computer Architecture

Nguyn Kim Khnh DCE-HUST

M C
int array[1000];
int i;
for (i=0; i < 1000; i = i + 1)
array[i] = array[i] * 8;

# np a ch c s ca mng vo $s0
lui $s0, 0x1234
# 0x1234 vo na cao ca $S0
ori $s0, $s0, 0x8000
# 0x8000 vo na thp ca $s0

Jan2016

0x12348000

// gi s a ch c s ca mng = 0x23b8f000
n

M hp ng MIPS

# $s0 = array base address (0x23b8f000), $s1 = i

271

Jan2016

Computer Architecture

272

68

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d vng lp truy cp mng d liu (tip)

5. Chng trnh con - th tc

M hp ng MIPS
# $s0 = array base address (0x23b8f000), $s1 = i
# khi to cc thanh ghi
lui $s0, 0x23b8
# $s0 = 0x23b80000
ori $s0, $s0, 0xf000
# $s0 = 0x23b8f000
addi $s1, $0, 0
# i = 0
addi $t2, $0, 1000
# $t2 = 1000
# vng lp
loop: slt $t0, $s1, $t2
# i < 1000?
beq $t0, $0, done
# if not then done
sll $t0, $s1, 2
# $t0 = i*4
add $t0, $t0, $s0
# address of array[i]
lw
$t1, 0($t0)
# $t1 = array[i]
sll $t1, $t1, 3
# $t1 = array[i]*8
sw
$t1, 0($t0)
# array[i] = array[i]*8
addi $s1, $s1, 1
# i = i + 1
j
loop
# repeat
done:

Jan2016

Computer Architecture

1.
2.
3.
4.

t cc tham s vo cc thanh ghi


Chuyn iu khin n th tc
Thc hin cc thao tc ca th tc
t kt qu vo thanh ghi cho chng
trnh gi th tc
5. Tr v v tr gi

273

NKK-HUST

Jan2016

Computer Architecture

n
n

n
n
n
Jan2016

Gi th tc: jump and link


jal ProcedureAddress
n

C th c ghi li bi th tc c gi

$s0 $s7: ct gi cc bin


n

Cc lnh gi th tc

$a0 $a3: cc tham s vo (cc thanh ghi 4 7)


$v0, $v1: cc kt qu ra (cc thanh ghi 2 v 3)
$t0 $t9: cc gi tr tm thi
n

274

NKK-HUST

S dng cc thanh ghi


n

Cc bc yu cu:

a ch ca lnh k tip (a ch tr v) c ct
thanh ghi $ra
Nhy n a ch ca th tc

Cn phi ct/khi phc bi th tc c gi

$gp: global pointer - con tr ton cc cho d liu


tnh (thanh ghi 28)
$sp: stack pointer - con tr ngn xp (thanh ghi 29)
$fp: frame pointer - con tr khung (thanh ghi 30)
$ra: return address - a ch tr v (thanh ghi 31)
Computer Architecture

Nguyn Kim Khnh DCE-HUST

275

Tr v t th tc: jump register


jr $ra
n

Jan2016

Copy ni dung thanh ghi $ra (ang cha a ch tr v)


tr li cho b m chng trnh PC
Computer Architecture

276

69

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Gi th tc lng nhau

Minh ha gi Th tc
main

main

PC

jal

proc

Prepare
to call
Prepare
to continue

PC

jal

proc

abc

Prepare
to call
Prepare
to continue

abc

Procedure
abc
Save
xyz

Save, etc.
jal

xyz

jr

$ra

Procedure
xyz

Restore

Restore
jr

Jan2016

Computer Architecture

277

NKK-HUST

Jan2016

Computer Architecture

278

M hp ng MIPS

Th tc l l th tc khng c li gi th tc khc
M C:

leaf_example:
addi $sp,
sw
$t1,
sw
$t0,
sw
$s0,
add
$t0,
add
$t1,
sub
$s0,
add
$v0,
lw
$s0,
lw
$t0,
lw
$t1,
addi $sp,
jr
$ra

int leaf_example (int g, h, i, j)


{
int f;
f = (g + h) - (i + j);
return f;
}
n Cc tham s g, h, i, j $a0, $a1, $a2, $a3
n f $s0 (do , cn ct $s0 ra ngn xp)
n $t0 v $t1 c th tc dng cha cc gi tr tm
thi, cng cn ct trc khi s dng
n Kt qu $v0
Jan2016

$ra

NKK-HUST

V d Th tc l
n

jr

$ra

Computer Architecture

Nguyn Kim Khnh DCE-HUST

279

Jan2016

$sp, -12
8($sp)
4($sp)
0($sp)
$a0, $a1
$a2, $a3
$t0, $t1
$s0, $zero
0($sp)
4($sp)
8($sp)
$sp, 12

#
#
#
#
#
#
#
#
#
#
#
#
#

Computer Architecture

to 3 v tr stack
ct ni dung $t1
ct ni dung $t0
ct ni dung $s0
$t0 = g+h
$t1 = i+j
$s0 = (g+h)-(i+j)
tr kt qu sang $v0
khi phc $s0
khi phc $t0
khi phc $t1
xa 3 mc stack
tr v ni gi

280

70

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

V d Th tc cnh
n
n

M hp ng MIPS
fact:
addi
sw
sw
slti
beq
addi
addi
jr
L1: addi
jal
lw
lw
addi
mul
jr

L th tc c gi th tc khc
M C:
int fact (int n)
{
if (n < 1) return (1);
else return n * fact(n - 1);
}
n Tham s n $a0
n Kt qu $v0

Jan2016

Computer Architecture

281

NKK-HUST

Jan2016

$a0, -1
0($sp)
4($sp)
$sp, 8
$a0, $v0

#
#
#
#

dnh stack cho 2 mc


ct a ch tr v
ct tham s n
kim tra n < 1

#
#
#
#
#
#
#
#
#
#

nu ng, kt qu l 1
ly 2 mc t stack
v tr v
nu khng, gim n
gi qui
khi phc n ban u
v a ch tr v
ly 2 mc t stack
nhn nhn kt qu
v tr v
282

NKK-HUST

6. D liu k t
n

$sp
Local
variables

z
y
..
.

Saved
registers
c
b
a
..
.

$fp
Frame for
current
procedure

Before calling

Old ($fp)

c
b
a
..
.

Nguyn Kim Khnh DCE-HUST

ASCII: 128 k t
n

Frame for
previous
procedure

95 k th hin th , 33 m iu khin

Latin-1: 256 k t
n

ASCII v cc k t m rng

Unicode: Tp k t 32-bit
c s dng trong Java, C++,
n Hu ht cc k t ca cc ngn ng trn th
gii v cc k hiu
n

After calling

Computer Architecture

Cc tp k t c m ha theo byte
n

Frame for
current
procedure

$fp

Jan2016

$sp, -8
4($sp)
0($sp)
$a0, 1
$zero, L1
$zero, 1
$sp, 8

Computer Architecture

S dng Stack khi gi th tc

$sp

$sp,
$ra,
$a0,
$t0,
$t0,
$v0,
$sp,
$ra
$a0,
fact
$a0,
$ra,
$sp,
$v0,
$ra

283

Jan2016

Computer Architecture

284

71

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc thao tc vi Byte/Halfword
n
n

V d copy String

C th s dng cc php ton logic


Np/Lu byte/halfword trong MIPS

lb rt, offset(rs)
lh rt, offset(rs)
n M rng du thnh 32 bits trong rt

lbu rt, offset(rs)


lhu rt, offset(rs)
n M rng zero thnh 32 bits trong rt

sb rt, offset(rs)
sh rt, offset(rs)
n Ch lu byte/halfword bn phi

Jan2016

Computer Architecture

void strcpy (char x[], char y[])


{ int i;
i = 0;
while ((x[i]=y[i])!='\0')
i += 1;
}
n Cc a ch ca x, y $a0, $a1
n i $s0

285

NKK-HUST

Jan2016

Computer Architecture

Jan2016

286

NKK-HUST

V d Copy String
n

M C:

7. Cc lnh nhn v chia s nguyn

M hp ng MIPS

strcpy:
addi
sw
add
L1: add
lbu
add
sb
beq
addi
j
L2: lw
addi
jr

$sp,
$s0,
$s0,
$t1,
$t2,
$t3,
$t2,
$t2,
$s0,
L1
$s0,
$sp,
$ra

$sp, -4
0($sp)
$zero, $zero
$s0, $a1
0($t1)
$s0, $a0
0($t3)
$zero, L2
$s0, 1
0($sp)
$sp, 4

#
#
#
#
#
#
#
#
#
#
#
#
#

adjust stack for 1 item


save $s0
i = 0
addr of y[i] in $t1
$t2 = y[i]
addr of x[i] in $t3
x[i] = y[i]
exit loop if y[i] == 0
i = i + 1
next iteration of loop
restore saved $s0
pop 1 item from stack
and return

Computer Architecture

Nguyn Kim Khnh DCE-HUST

MIPS c hai thanh ghi 32-bit: HI (high) v LO (low)


Cc lnh lin quan:
n
n

n
n

n
n
287

Jan2016

mult rs, rt # nhn s nguyn c du


multu rs, rt # nhn s nguyn khng du
n Tch 64-bit nm trong cp thanh ghi HI/LO
div rs, rt
# chia s nguyn c du
divu rs, rt # chia s nguyn khng du
n HI: cha phn d, LO: cha thng
mfhi rd
mflo rd

# Move from Hi to rd
# Move from LO to rd
Computer Architecture

288

72

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

8. Cc lnh vi s du phy ng (FP)

Cc lnh vi s du phy ng
n

Cc thanh ghi s du phy ng

32 thanh ghi 32-bit (single-precision): $f0, $f1, $f31


Cp i cha d liu dng 64-bit (double-precision):
$f0/$f1, $f2/$f3,

n
n

V d:

n
n

ldc1 $f8, 32($s2)

n
Computer Architecture

289

NKK-HUST

Computer Architecture

M thao tc, hai thanh ghi, hng s

Hu ht cc ch r nhnh l r nhnh gn
op

rs

rt

imm

6 bits

5 bits

5 bits

16 bits

n
n
n

Nguyn Kim Khnh DCE-HUST

rs

rt

imm

5 bits

5 bits

16 bits

4 or 5

16

17

Exit
khong cch tng i tnh theo word

Lnh m my

PC-relative addressing
a ch ch = PC + hng s imm 4
Ch : trc PC c tng ln
Hng s imm 16-bit c gi tr trong di [-215 , +215 - 1]
Computer Architecture

op
6 bits

beq $s0, $s1, Exit


bne $s0, $s1, Exit

R xui hoc r ngc

nh a ch tng i vi PC
n

Jan2016

Lnh beq, bne

Cc lnh Branch ch ra:

290

NKK-HUST

bc1t, bc1f
VD:
bc1t TargetLabel

Jan2016

5.5. Cc phng php nh a ch ca MIPS


n

c.xx.s, c.xx.d (trong xx l eq, lt, le, )


Thit lp hoc xa cc bit m iu kin
VD:
c.lt.s $f3, $f4

Cc lnh r nhnh da trn m iu kin


n

Jan2016

add.d, sub.d, mul.d, div.d


VD:
mul.d $f4, $f4, $f6

Cc lnh so snh

lwc1, ldc1, swc1, sdc1

add.s, sub.s, mul.s, div.s


VD:
add.s $f0, $f1, $f6

Cc lnh s hc vi s FP 64-bit (double-precision)


n

Cc lnh s du phy ng ch thc hin trn cc


thanh ghi s du phy ng
Lnh load v store vi FP

Cc lnh s hc vi s FP 32-bit (single-precision)

291

Jan2016

beq

000100

10000

10001

0000 0000 0000 0110

bne

000101

10000

10001

0000 0000 0000 0110

Computer Architecture

292

73

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

a ch ha cho lnh Jump


n

V d m lnh j v jr
j
jr

ch ca lnh Jump (j v jal) c th l bt


k ch no trong chng trnh
n

Cn m ha y a ch trong lnh
op

address

6 bits

26 bits

31

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1

x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
From PC

31

op

Computer Architecture

293

Effective target address (32 bits)


25

rs

20

rt

15

rd

10

sh

fn

0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ALU
instruction

NKK-HUST

Source
register

Jan2016

Unused

Unused

Unused

jr = 8

Computer Architecture

294

NKK-HUST

V d m ha lnh

R nhnh xa
n

$t1, $s3, 2

0x8000

19

add

$t1, $t1, $s6

0x8004

22

32

lw

$t0, 0($t1)

0x8008

35

bne

$t0, $s5, Exit

Loop: sll

0x800C

21

addi $s3, $s3, 1

0x8010

19

19

0x8014

Loop

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Nu ch r nhnh l qu xa m ha vi
offset 16-bit, assembler s vit li code
V d
beq $s0, $s1, L1
(lnh k tip)
...

L1:

0x2000

0x8018

Exit:

Jan2016

jump target address

25

j=2

a ch ch = PC3128 : (address 4)

Jan2016

op

0 0 0 0 1 0

nh a ch nhy (gi) trc tip


(Pseudo)Direct jump addressing
n

# nhy n v tr c nhn L1
# nhy n v tr c a ch $ra;
# $ra cha a ch tr v

L1
$ra

s c thay bng on lnh sau:


bne $s0, $s1, L2
j L1
L2: (lnh k tip)
...
L1:
295

Jan2016

Computer Architecture

296

74

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Tm tt v cc phng php nh a ch
2.10

5.6. Dch v chy chng trnh hp ng

117

MIPS Addressing for 32-bit Immediates and Addresses

1. Immediate addressing

1. nh a ch tc th

op

rs

rt

Immediate

2. Register addressing

2. nh a ch thanh ghi

op

rs

rt

rd

. . . funct

Registers

Register

3. Base addressing
op

3. nh a ch c s

rs

rt

Address

Register

Memory
+

Byte Halfword

Word

4. PC-relative addressing
op

rs

rt

4. nh a ch tng i vi PC

Cc phn mm lp trnh hp ng MIPS:

Address

PC

MARS
MipsIt
PCSpim

MIPS Reference Data

Memory
+

Word

5. Pseudodirect addressing
op

5. nh a ch gi trc tip

Address

PC

Memory
Word

Jan2016

Computer
FIGURE
2.18 Architecture
Illustration of the five MIPS addressing modes. The operands are shaded297
in color.
The operand of mode 3 is in memory, whereas the operand for mode 2 is a register. Note that versions of
load and store access bytes, halfwords, or words. For mode 1, the operand is 16 bits of the instruction itself.
Modes 4 and 5 address instructions in memory, with mode 4 adding a 16-bit address shifted left 2 bits to the
PC and mode 5 concatenating a 26-bit address shifted left 2 bits with the 4 upper bits of the PC. Note that a
single operation can use more than one addressing mode. Add, for example, uses both immediate (addi)
and register (add) addressing.

NKK-HUST

Although we show MIPS as having 32-bit addresses, nearly all microprocessors


(including MIPS) have 64-bit address extensions (see Appendix E and Section
2.18). These extensions were in response to the needs of software for larger
programs. The process of instruction set extension allows architectures to expand in
such a way that is able to move software compatibly upward to the next generation
of architecture.

Dch v chy ng dng


High Level Code

Jan2016

Hardware/
Software
Interface

Chng trnh trong b nh

Cc lnh (instructions)
D liu
n

Assembly Code
Assembler
Object File

Object Files
Library Files

Linker

298

NKK-HUST

Compiler

Computer Architecture

B nh:
n

Executable

Ton cc/tnh: c cp pht trc khi chng trnh


bt u thc hin
ng: c cp pht trong khi chng trnh thc
hin

232 bytes = 4 GiB


a ch t 0x00000000 n 0xFFFFFFFF

Loader
Memory
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

299

Jan2016

Computer Architecture

300

75

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Bn b nh ca MIPS
Address

V d: M C

Segment

int f, g, y;
variables

0xFFFFFFFC

Reserved
0x80000000
0x7FFFFFFC

Stack

int main(void)
{
f = 2;
g = 3;
y = sum(f, g);
return y;
}

Dynamic Data
0x10010000

Heap

0x1000FFFC

Static Data
0x10000000
0x0FFFFFFC

Text

int sum(int a, int b) {


return (a + b);
}

0x00400000
0x003FFFFC

Reserved
0x00000000

Jan2016

Computer Architecture

301

NKK-HUST

Jan2016

Computer Architecture

302

NKK-HUST

V d chng trnh hp ng MIPS


.data
f:
g:
y:
.text
main:
addi
sw
addi
sw
addi
sw
jal
sw
lw
addi
jr
sum:
add
jr
Jan2016

// global

Bng k hiu
K hiu

$sp,
$ra,
$a0,
$a0,
$a1,
$a1,
sum
$v0,
$ra,
$sp,
$ra

$sp, -4
0($sp)
$0, 2
f
$0, 3
g

#
#
#
#
#
#
#
#
#
#
#

y
0($sp)
$sp, 4

$v0, $a0, $a1


$ra

stack frame
store $ra
$a0 = 2
f = 2
$a1 = 3
g = 3
call sum
y = sum()
restore $ra
restore $sp
return to OS

a ch

0x10000000

0x10000004

0x10000008

main

0x00400000

sum

0x0040002C

# $v0 = a + b
# return
Computer Architecture

Nguyn Kim Khnh DCE-HUST

303

Jan2016

Computer Architecture

304

76

necessary. Figure 6.33 shows the executable file. It has three sections:
the executable file header, the text segment, and the data segment.
The executable file header reports the text size (code size) and data size
(amount of globally declared data). Both are given in units of bytes. The
text segment gives the instructions in the order that they are stored in
memory.
The figure shows the instructions in human-readable format next to
the machine code for ease of interpretation, but the executable file
includes only machine instructions. The data segment gives the address
of each global variable. The global variables are addressed with respect
NKK-HUST
to the base address given by the global pointer, $gp. For example, the first

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

Chng trnh thc thi


Executable file header

Text segment

Figure 6.33 Executable

Data segment

Text Size

Data Size

0x34 (52 bytes)

0xC (12 bytes)

Chng trnh trong b nh


Address

Memory
Reserved

0x7FFFFFFC

Stack

0x10010000

Heap

Address

Instruction

0x00400000

0x23BDFFFC

addi $sp, $sp, 4

0x00400004

0xAFBF0000

sw

0x00400008

0x20040002

addi $a0, $0, 2

0x0040000C

0xAF848000

sw

0x00400010

0x20050003

addi $a1, $0, 3

0x00400014

0xAF858004

sw

0x00400018

0x0C10000B

jal

0x0040002C

0x0040001C

0xAF828008

sw

$v0, 0x8008($gp)

0x00400020

0x8FBF0000

lw

$ra, 0($sp)

0x00400024

0x23BD0004

addi $sp, $sp, 4

0x03E00008

0x00400028

0x03E00008

jr

0x00851020

0x0040002C

0x00851020

add $v0, $a0, $a1

0x00400030

0x03E00008

jr

Address

Data

0x0C10000B

0x10000000

0x20050003

0x10000004

0xAF848000

0x10000008

$ra, 0($sp)
$a0, 0x8000($gp)

$sp = 0x7FFFFFFC

$gp = 0x10008000
y

$a1, 0x8004($gp)

g
0x10000000

$ra

0x03E00008
0x23BD0004

$ra

0x8FBF0000
0xAF828008
0xAF858004

0x20040002
0xAFBF0000
0x00400000

0x23BDFFFC

PC = 0x00400000

Reserved

Jan2016

Computer Architecture

305

NKK-HUST

Jan2016

Computer Architecture

306

NKK-HUST

V d lnh gi (Pseudoinstruction)
Pseudoinstruction

Jan2016

MIPS Instructions

li $s0, 0x1234AA77

lui $s0, 0x1234


ori $s0, 0xAA77

mul $s0, $s1, $s2

mult $s1, $s2


mflo $s0

clear $t0

add $t0, $0, $0

move $s1, $s2

add $s2, $s1, $0

nop

sll $0, $0, 0

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Ht chng 5

307

Jan2016

Computer Architecture

308

77

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Kin trc my tnh

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chng 6
B X L

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

309

NKK-HUST

Jan2016

Computer Architecture

310

NKK-HUST

Ni dung ca chng 6

6.1. T chc ca CPU


1. Cu trc c bn ca CPU
n

6.1. T chc ca CPU


6.2. Thit k n v iu khin
6.3. K thut ng ng lnh
6.4. V d thit k b x l theo kin trc
MIPS*

Nhim v ca CPU:
n

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

311

Jan2016

Nhn lnh (Fetch Instruction): CPU c lnh t b


nh
Gii m lnh (Decode Instruction): xc nh thao tc
m lnh yu cu
Nhn d liu (Fetch Data): nhn d liu t b nh
hoc cc cng vo-ra
X l d liu (Process Data): thc hin php ton s
hc hay php ton logic vi cc d liu
Ghi d liu (Write Data): ghi d liu ra b nh hay
cng vo-ra
Computer Architecture

312

78

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

S cu trc c bn ca CPU
n

n v
s hc
v logic
(ALU)

n v
iu khin
(CU)

2. n v s hc v logic

Tp
thanh ghi
(RF)

Chc nng: Thc hin cc php ton


s hc v php ton logic:
n
n

S hc: cng, tr, nhn, chia, o du


Logic: AND, OR, XOR, NOT, php dch bit

bus bn trong

n v ni ghp bus (BIU)

bus iu khin

Jan2016

bus d liu

bus a ch

Computer Architecture

313

NKK-HUST

Jan2016

Computer Architecture

314

NKK-HUST

M hnh kt ni ALU
D liu t
cc thanh ghi

Cc tn hiu
t n v
iu khin

3. n v iu khin
D liu n
cc thanh ghi

Chc nng
iu khin nhn lnh t b nh a vo CPU
n Tng ni dung ca PC tr sang lnh k tip
n Gii m lnh c nhn xc nh thao
tc m lnh yu cu
n Pht ra cc tn hiu iu khin thc hin lnh
n Nhn cc tn hiu yu cu t bus h thng v
p ng vi cc yu cu .
n

n v
s hc v logic
(ALU)

Thanh ghi c

Thanh ghi c: hin th trng thi ca kt qu php ton


Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

315

Jan2016

Computer Architecture

316

79

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

M hnh kt ni n v iu khin

Cc tn hiu a n n v iu khin
n

Thanh ghi lnh

n
Cc tn hiu
iu khin
bn trong CPU

Cc c

n v iu khin
Clock

Cc tn hiu
iu khin t
bus h thng

Cc tn hiu
iu khin n
bus h thng

Clock: tn hiu nhp t mch to dao


ng bn ngoi
Lnh t thanh ghi lnh a n gii
m
Cc c t thanh ghi c cho bit trng
thi ca CPU
Cc tn hiu yu cu t bus iu khin

Bus iu khin

Jan2016

Computer Architecture

317

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Cc tn hiu pht ra t n v iu khin


n

4. Hot ng ca chu trnh lnh

Cc tn hiu iu khin bn trong CPU:


n
n

Chu trnh lnh

iu khin cc thanh ghi


iu khin ALU

n
n

Cc tn hiu iu khin bn ngoi CPU:


n
n

iu khin b nh
iu khin cc m-un vo-ra

n
n
n

Jan2016

318

Computer Architecture

Nguyn Kim Khnh DCE-HUST

319

Jan2016

Nhn lnh
Gii m lnh
Nhn ton hng
Thc hin lnh
Ct ton hng
Ngt
Computer Architecture

320

80

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Gin trng thi chu trnh lnh


Nhn
ton hng

Nhn lnh

Gii m
thao tc
lnh

Tnh
a ch
ton hng

Nhiu
ton
hng

Thao tc
d liu

Tnh
a ch
ton hng

Quay li vi d liu
String hoc Vector

Lnh hon thnh,


nhn lnh tip theo

Jan2016

Ct
ton hng

Nhiu
ton
hng

Tnh
a ch
ca lnh

Nhn lnh

Kim tra
ngt

C
ngt

n
Ngt

Khng
ngt

Computer Architecture

321

NKK-HUST

CPU a a ch ca lnh cn nhn t b


m chng trnh PC ra bus a ch
CPU pht tn hiu iu khin c b nh
Lnh t b nh c t ln bus d liu
v c CPU copy vo thanh ghi lnh IR
CPU tng ni dung PC tr sang lnh
k tip

Jan2016

Computer Architecture

322

NKK-HUST

S m t qu trnh nhn lnh

Gii m lnh

CPU
PC

B nh

n v
iu khin

Lnh t thanh ghi lnh IR c a


n n v iu khin
n v iu khin tin hnh gii m lnh
xc nh thao tc phi thc hin
Gii m lnh xy ra bn trong CPU

IR

Bus Bus Bus


a d iu
ch liu khin

PC: B m chng trnh


IR: Thanh ghi lnh

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

323

Jan2016

Computer Architecture

324

81

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Nhn d liu t b nh

S m t nhn d liu t b nh
CPU

CPU a a ch ca ton hng ra bus


a ch

CPU pht tn hiu iu khin c

Ton hng c c vo CPU

Tng t nh nhn lnh

MAR

n v
iu khin

MBR

MAR: Thanh ghi a ch b nh


MBR: Thanh ghi m b nh

Jan2016

Computer Architecture

325

NKK-HUST

Jan2016

326

NKK-HUST

Ghi ton hng

C nhiu dng tu thuc vo lnh


C th l:
n
n
n
n
n
n

Jan2016

Bus Bus Bus


a d iu
ch liu khin

Computer Architecture

Thc hin lnh


n

B nh

c/Ghi b nh
Vo/Ra
Chuyn gia cc thanh ghi
Php ton s hc/logic
Chuyn iu khin (r nhnh)
...

Computer Architecture

Nguyn Kim Khnh DCE-HUST

CPU a a ch ra bus a ch

CPU a d liu cn ghi ra bus d liu

CPU pht tn hiu iu khin ghi

327

Jan2016

D liu trn bus d liu c copy n


v tr xc nh

Computer Architecture

328

82

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

S m t qu trnh ghi ton hng

Ngt
n

CPU
MAR

n
n v
iu khin

B nh

n
n

MBR

MAR: Thanh ghi a ch b nh


MBR: Thanh ghi m b nh

Jan2016

Bus Bus Bus


a d iu
ch liu khin

Computer Architecture

329

NKK-HUST

Jan2016

Ni dung ca b m chng trnh PC (a


ch tr v sau khi ngt) c a ra bus d
liu
CPU a a ch (thng c ly t con tr
ngn xp SP) ra bus a ch
CPU pht tn hiu iu khin ghi b nh
a ch tr v trn bus d liu c ghi ra v
tr xc nh ( ngn xp)
a ch lnh u tin ca chng trnh con
iu khin ngt c np vo PC
Computer Architecture

330

NKK-HUST

S m t chu trnh ngt

6.2. Cc phng php thit k n v iu khin

CPU
SP

MAR

PC
n v
iu khin

B nh

n v iu khin vi chng trnh


(Microprogrammed Control Unit)
n v iu khin ni kt cng
(Hardwired Control Unit)

MBR

MAR: Thanh ghi a ch b nh


MBR: Thanh ghi m b nh
PC: B m chng trnh
SP: Con tr ngn xp

Jan2016

Bus Bus Bus


a d iu
ch liu khin

Computer Architecture

Nguyn Kim Khnh DCE-HUST

331

Jan2016

Computer Architecture

332

83

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

1. n v iu khin vi chng trnh


n

B nh vi chng trnh
(ROM) lu tr cc vi
Cc tn hiu
iu khin t
chng trnh
bus h thng
(microprogram)
Clock
Mt vi chng trnh bao Cc c
gm cc vi lnh
(microinstruction)
Mi vi lnh m ho cho
mt vi thao tc
(microoperation)
hon thnh mt lnh
cn thc hin mt hoc
mt vi vi chng trnh
Tc chm

Jan2016

2. n v iu khin ni kt cng
n

Thanh ghi lnh

B gii m

Mch dy

Thanh ghi a ch vi lnh

B nh
vi chng trnh

n
n

Vi lnh
tip
theo

Thanh ghi m vi lnh

S dng mch
cng gii m
v to cc tn hiu
iu khin thc
hin lnh
Tc nhanh
Clock
n v iu khin
phc tp

Tn hiu iu
khin bn trong
CPU

T1
T2
..
.

..
.

n v
iu khin

Cc
c

Tn
... C
m-1

Cc tn hiu iu khin

Tn hiu iu
khin n bus h
thng

333

Jan2016

Computer Architecture

334

NKK-HUST

6.3. K thut ng ng lnh

Biu thi gian ca ng ng lnh

K thut ng ng lnh (Instruction Pipelining): Chia


chu trnh lnh thnh cc cng on v cho php thc
hin gi ln nhau (nh dy chuyn lp rp)

lnh 1
lnh 2

Chng hn b x l MIPS c 5 cng on:

lnh 3

1. IF: Instruction fetch from memory Nhn lnh t b nh


2. ID: Instruction decode & register read Gii m lnh v c
thanh ghi
3. EX: Execute operation or calculate address Thc hin thao tc
hoc tnh ton a ch
4. MEM: Access memory operand Truy nhp ton hng b nh
5. WB: Write result back to register Ghi kt qu tr v thanh ghi

Jan2016

Mch
phn
chia thi
gian

C0 C1

NKK-HUST

B gii m

B gii m vi lnh

Computer Architecture

Thanh ghi lnh

Computer Architecture

Nguyn Kim Khnh DCE-HUST

lnh 4
lnh 5
lnh 6
lnh 7

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

WB

IF

ID

EX

MEM

lnh 8

10

11

12

WB

Thi gian thc hin 1 cng on = T


Thi gian thc hin tun t 8 lnh:
8 x 5T = 40T
Thi gian thc hin ng ng 8 lnh: (1 x 5T) + [ (8-1) x T ] = 12T
335

Jan2016

Computer Architecture

336

84

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Hazard cu trc

Cc mi tr ngi (Hazard) ca ng ng lnh


n

Hazard: Tnh hung ngn cn bt u


ca lnh tip theo chu k tip theo
n

Hazard cu trc: do ti nguyn c yu


cu ang bn

Hazard d liu: cn phi i lnh trc


hon thnh vic c/ghi d liu

Hazard iu khin: do r nhnh gy ra

Jan2016

Computer Architecture

337

NKK-HUST

Computer Architecture

338

NKK-HUST

Forwarding (gi vt trc)

Lnh ph thuc vo vic hon thnh truy


cp d liu ca lnh trc

Computer Architecture

Nguyn Kim Khnh DCE-HUST

S dng kt qu ngay sau khi n c tnh


Khng i n khi kt qu c lu n thanh
ghi
n Yu cu c ng kt ni thm trong datapath
n

add $s0, $t0, $t1


sub $t2, $s0, $t3

Jan2016

Lnh Load/store yu cu truy cp d liu


Nhn lnh cn tr hon cho chu k

Bi vy, datapath kiu ng ng yu


cu b nh lnh v b nh d liu tch
ri (hoc cache lnh/cache d liu tch
ri)

Jan2016

Hazard d liu
n

Xung t khi s dng ti nguyn


Trong ng ng ca MIPS vi mt b
nh dng chung

339

Jan2016

Computer Architecture

340

85

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Hazard d liu vi lnh load


n

Khng phi lun lun c th trnh tr hon


bng cch forwarding
n
n
n

Nu gi tr cha c tnh khi cn thit


Khng th chuyn ngc thi gian
Cn chn bc tr hon (stall hay bubble)

Lp lch m trnh tr hon


Thay i trnh t m trnh s dng kt
qu load lnh tip theo
M C:
a = b + e; c = b + f;

stall

stall

lw
lw
add
sw
lw
add
sw

$t1,
$t2,
$t3,
$t3,
$t4,
$t5,
$t5,

0($t0)
4($t0)
$t1, $t2
12($t0)
8($t0)
$t1, $t4
16($t0)

13 cycles
Jan2016

Computer Architecture

341

NKK-HUST

Jan2016

Computer Architecture

0($t0)
4($t0)
8($t0)
$t1, $t2
12($t0)
$t1, $t4
16($t0)

11 cycles
342

Tr hon khi r nhnh

R nhnh xc nh lung iu khin

Nhn lnh tip theo ph thuc vo kt qu


r nhnh
n ng ng khng th lun nhn ng lnh
n

$t1,
$t2,
$t4,
$t3,
$t3,
$t5,
$t5,

NKK-HUST

Hazard iu khin
n

lw
lw
lw
add
sw
add
sw

i cho n khi kt qu r nhnh c


xc nh trc khi nhn lnh tip theo

Vn ang lm cng on gii m lnh (ID)


ca lnh r nhnh

Vi ng ng ca MIPS
Cn so snh thanh ghi v tnh a ch ch
sm trong ng ng
n Thm phn cng thc hin vic
trong cng on ID
n

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

343

Jan2016

Computer Architecture

344

86

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

D on r nhnh
n

Nhng ng ng di hn khng th
sm xc nh d dng kt qu r nhnh
n

Prediction
correct

Cch tr hon khng p ng c

D on kt qu r nhnh
n

MIPS vi d on r nhnh khng xy ra

Ch tr hon khi d on l sai

Vi MIPS
n
n

Jan2016

C th d on r nhnh khng xy ra
Nhn lnh ngay sau lnh r nhnh (khng
lm tr)
Computer Architecture

Prediction
incorrect

345

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

c im ca ng ng

Tng cng kh nng song song mc lnh


n

K thut ng ng ci thin hiu nng


bng cch tng s lnh thc hin
n
n

Tng s cng on ca ng ng
Lnh 1
Lnh 2
Lnh 3

Thc hin nhiu lnh ng thi


Mi lnh c cng thi gian thc hin

Lnh 4
Lnh 5
Lnh 6

Cc dng hazard:
n

346

Cu trc, d liu, iu khin

Siu v hng (Superscalar)


Lnh 1

Thit k tp lnh nh hng n phc


tp ca vic thc hin ng ng

Lnh 2
Lnh 3
Lnh 4
Lnh 5
Lnh 6

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

347

Jan2016

Computer Architecture

348

87

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

6.4. Thit k b x l theo kin trc MIPS*


B x l MIPS Chapter 4 [2]

Ht chng 6

Jan2016

Computer Architecture

349

NKK-HUST

Jan2016

350

NKK-HUST

Kin trc my tnh

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chng 7
B NH MY TNH

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

Computer Architecture

Nguyn Kim Khnh DCE-HUST

351

Jan2016

Computer Architecture

352

88

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Ni dung ca chng 7

7.1. Tng quan h thng nh


1. Cc c trng ca b nh
n V tr

7.1. Tng quan h thng nh


7.2. B nh chnh
7.3. B nh m (cache)
7.4. B nh ngoi
7.5. B nh o

Bn trong CPU:

B nh trong:

n
n

n
Computer Architecture

353

NKK-HUST

di t nh (tnh bng bit)


S lng t nh

Jan2016

Computer Architecture

354

NKK-HUST

Cc c trng ca b nh (tip)
n

Cc c trng ca b nh (tip)

n v truyn
n

T nh
Khi nh

n
n
n

Hiu nng (performance)


n
n

Phng php truy nhp


n

Jan2016

cc thit b lu tr

Dung lng
n

Jan2016

b nh chnh
b nh m (cache)

B nh ngoi:
n

tp thanh ghi

Truy nhp tun t (bng t)


Truy nhp trc tip (cc loi a)
Truy nhp ngu nhin (b nh bn dn)
Truy nhp lin kt (cache)

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Kiu vt l
n
n
n

355

Jan2016

Thi gian truy nhp


Chu k nh
Tc truyn
B nh bn dn
B nh t
B nh quang

Computer Architecture

356

89

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc c trng ca b nh (tip)
n

2. Phn cp b nh

Cc c tnh vt l

B vi x l
CPU

Kh bin / Khng kh bin


(volatile / nonvolatile)
n Xo c / khng xo c
n

Tp
thanh
ghi

Cache

B nh
chnh

Thit b
lu tr
(HDD, SSD)

T chc

B nh
mng

T tri sang phi:


n dung lng tng dn
n tc gim dn
n gi thnh cng dung lng gim dn
Jan2016

Computer Architecture

357

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Cng ngh b nh
Thi gian
truy nhp

Gi thnh/GiB
(2012)

SRAM

0,5 2,5 ns

$500 $1000

DRAM

50 70 ns

$10 $20

Flash memory 5.000 50.000 ns


HDD

5 20 ms

Trong mt khong thi gian nh CPU


thng ch tham chiu cc thng tin
trong mt khi nh cc b
V d:
n

$0,75 $1

$0,05 $0,1

Cu trc chng trnh tun t


Vng lp c thn nh
Cu trc d liu mng

B nh l tng
n
n

Jan2016

Nguyn l cc b ho tham chiu b nh

Cng ngh
b nh

358

Thi gian truy nhp nh SRAM


Dung lng v gi thnh nh a cng
Computer Architecture

Nguyn Kim Khnh DCE-HUST

359

Jan2016

Computer Architecture

360

90

Bi ging Kin trc my tnh

NKK-HUST

Jan2016

NKK-HUST

7.2. B nh chnh

ROM (Read Only Memory)

1. B nh bn dn
Kiu b nh
Read Only Memory
(ROM)

B nh
ch c

Programmable ROM
(PROM)
Erasable PROM
(EPROM)
Electrically Erasable
PROM (EEPROM)

Kh nng xo

Tiu
chun

Khng xo
c

B nh
hu nh
ch c

Flash memory
Random Access
Memory (RAM)

B nh
c-ghi

Jan2016

bng tia cc tm,


c chip
bng in,
mc tng byte

C ch ghi

Tnh
kh bin

Mt n

n
n

Bng in

Khng
kh bin

bng in,
mc tng byte

Bng in

Computer Architecture

Kh bin
361

Jan2016

Computer Architecture

Cn thit b chuyn dng ghi


Ch ghi c mt ln

Ghi theo khi


Xa bng in
Dung lng ln

EPROM (Erasable PROM)


n
n
n

Cn thit b chuyn dng ghi


Xa c bng tia t ngoi
Ghi li c nhiu ln

EEPROM (Electrically Erasable PROM)


n
n

Jan2016

thng tin c ghi khi sn xut

PROM (Programmable ROM)


n

B nh Flash

ROM mt n:
n

362

NKK-HUST

Cc kiu ROM

Th vin cc chng trnh con


Cc chng trnh iu khin h thng (BIOS)
Cc bng chc nng
Vi chng trnh

bng in,
tng khi

NKK-HUST

B nh khng kh bin
Lu tr cc thng tin sau:

C th ghi theo tng byte


Xa bng in
Computer Architecture

Nguyn Kim Khnh DCE-HUST

363

Jan2016

Computer Architecture

364

91

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

RAM (Random Access Memory)


n
n
n
n

Jan2016

SRAM (Static) RAM tnh

B nh c-ghi (Read/Write Memory)


Kh bin
Lu tr thng tin tm thi
C hai loi: SRAM v DRAM
(Static and Dynamic)

Computer Architecture

Cc bit c lu tr bng cc Flip-Flop


thng tin n nh
n Cu trc phc tp
n Dung lng chip nh
n Tc nhanh
n t tin
n Dng lm b nh cache
n

365

NKK-HUST

Jan2016

n
n
n
n
n

Jan2016

366

NKK-HUST

DRAM (Dynamic) RAM ng


n

Computer Architecture

Mt s DRAM tin tin thng dng

Cc bit c lu tr trn t in
cn phi c mch lm ti
Cu trc n gin
Dung lng ln
Tc chm hn
R tin hn
Dng lm b nh chnh

Computer Architecture

Nguyn Kim Khnh DCE-HUST

n
n

n
n

367

Jan2016

Ci tin tng tc
Synchronous DRAM (SDRAM): lm vic
c ng b bi xung clock
DDR-SDRAM (Double Data Rate SDRAM)
DDR3, DDR4

Computer Architecture

368

92

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

T chc ca chip nh
S c bn ca chip nh

Cc tn hiu ca chip nh
n
n

A0

D0

A1

D1

.
.
.

Chip nh
2n x m bit

.
.
.

An-1

Tn hiu chn chip


(Chip Select)
5.1 CS
/ SEMICONDUCTOR
MAIN MEMORY 167
Tn
hiu
iu
khin
c
OE
(Output
Enable)
Packaging
As was
mentioned
in
Chapter
2, an integrated
circuit is(Write
mounted onEnable)
a package that
n
Tn
hiu
iu
khin
ghi
WE
contains pins for connection to the outside world.
Figure 5.4a shows an example EPROM package, which is an 8-Mbit chip
(Cc
tn hiu iu khin thng tch cc vi mc 0)
organized as 1M * 8. In this case, the organization is treated as a one-word-per-chip
n

n
Chip

Dm-1

m-bit

CS

WE

Jan2016

Cc ng a ch: An-1 A0 c 2n t nh
Cc ng d liu: Dm-1 D0 di t
nh = m bit
Dung lng chip nh = 2n x m bit
Cc ng iu khin:

package. The package includes 32 pins, which is one of the standard chip package
sizes. The pins support the following signal lines:

OE

Computer Architecture

369

NKK-HUST

Jan2016

NKK-HUST

T chc ca DRAM

The address of the word being accessed. For 1M words, a total of 20 (220 = 1M)
pins are needed (A0A19).
The data to be read out, consisting of 8 lines (D0D7).
Computer Architecture
The power supply to the chip (Vcc).

370

A ground pin (Vss).


A chip enable (CE) pin. Because there may be more than one memory chip,
each of which is connected to the same address bus, the CE pin is used to indicate whether or not the address is valid for this chip. The CE pin is activated
by logic connected to the higher-order bits of the address bus (i.e., address bits
above A19). The use of this signal is illustrated presently.
A program voltage (Vpp) that is supplied during programming (write operations).
A typical DRAM pin configuration is shown in Figure 5.4b, for a 16-Mbit chip
Vorganized
d
chip
nh
as
4M * 4. There
are several differences from a ROM chip. Because

a RAM can be updated, the data pins are input/output. The write enable (WE)
and output enable (OE) pins indicate whether this is a write or read operation.

Dng n ng a ch dn knh cho


php truyn 2n bit a ch
Tn hiu chn a ch hng RAS
(Row Address Select)
Tn hiu chn a ch ct CAS
(Column Address Select)
Dung lng ca DRAM= 22n x m bit

A19

32

Vcc

Vcc

24

Vss

A16

31

A18

D0

23

D3

A15

30

A17

D1

22

D2

A12

29

A14

WE

21

CAS

A7

28

A13

RAS

A6

27

A8

NC

A5

26

A9

A4

25

A3

9 32-Pin Dip 24

A2

10

A1

11

A0

12

D0

20
6 24-Pin Dip 19

OE

A10

18

A8

A11

A0

17

A7

Vpp

A1

16

A6

23

A10

A2

10

15

A5

22

CE

A3

11

14

A4

21

D7

Vcc

12 Top View 13

Vss

13

20

D6

D1

14

19

D5

D2

15

18

D4

Vss

16 Top View 17

D3

0.6"

(a) 8-Mbit EPROM

Figure 5.4

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

371

Jan2016

0.6"

A9

(b) 16-Mbit DRAM

Typical Memory Package Pins and Signals

Computer Architecture

372

93

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Thit k m-un nh bn dn
n
n

Tng di t nh
VD1:
n Cho chip nh SRAM 4K x 4 bit
n Thit k m-un nh 4K x 8 bit
Gii:
n Dung lng chip nh = 212 x 4 bit
n chip nh c:

2n x

Dung lng chip nh


m bit
Cn thit k tng dung lng:
n
n
n

Thit k tng di t nh
Thit k tng s lng t nh
Thit k kt hp

n
n

m-un nh cn c:
n
n

Jan2016

Computer Architecture

373

NKK-HUST

Jan2016

12 chn a ch
4 chn d liu
12 chn a ch
8 chn d liu
Computer Architecture

374

NKK-HUST

S v d tng di t nh

Bi ton tng di t nh tng qut

A11A0

A11A0
D3 D0

D3 D0

CS
WE

A11A0

D7 D4

CS
OE

WE

D3 D0

Cho chip nh 2n x mbit


Thit k m-un nh 2n x (k.m) bit
Dng k chip nh

OE

CS
WE
OE

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

375

Jan2016

Computer Architecture

376

94

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Tng s lng t nh

S v d tng s lng t nh

VD2:
n Cho chip nh SRAM 4K x 8 bit
n Thit k m-un nh 8K x 8 bit
Gii:
12 x 8 bit
n Dung lng chip nh = 2
n chip nh c:
n
n

A11A0
A11A0
D7 D0
CS
A12

OE
D7 D0

A11A0
D7 D0

13 chn a ch
8 chn d liu
Computer Architecture

377

NKK-HUST

Jan2016

CS

Y0 Y1

WE

WE

OE

OE

Computer Architecture

378

NKK-HUST

B gii m 24

A
B

Jan2016

Y1

12 chn a ch
8 chn d liu

Jan2016

WE

Y0
B
gii m
1 2

Dung lng m-un nh = 213 x 8 bit


n

Thit k kt hp

Y0

Y1

Y2

Y3

Y0

Y1
B
gii m
2 4 Y2

Y3

Computer Architecture

Nguyn Kim Khnh DCE-HUST

V d 3:
n Cho chip nh SRAM 4K x 4 bit
n Thit k m-un nh 8K x 8 bit
Phng php thc hin:
n Kt hp v d 1 v v d 2
n T v s thit k

379

Jan2016

Computer Architecture

380

95

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

2. Cc c trng c bn ca b nh chnh
n

n
n

Jan2016

T chc b nh an xen (interleaved memory)

Cha cc chng trnh ang thc hin v cc


d liu ang c s dng
Tn ti trn mi h thng my tnh
Bao gm cc ngn nh c nh a ch trc
tip bi CPU
Dung lng ca b nh chnh nh hn khng
gian a ch b nh m CPU qun l.
Vic qun l logic b nh chnh tu thuc vo
h iu hnh

Computer Architecture

rng ca bus d liu trao i vi


b nh: m = 8, 16, 32, 64,128 ... bit
n Cc ngn nh c t chc theo byte
t chc b nh vt l khc nhau
n

381

NKK-HUST

Jan2016

Computer Architecture

382

NKK-HUST

m=8bit mt bng nh tuyn tnh

m = 16bit hai bng nh an xen


Bng 1

0
1
2
3
4
5
6

. . .
AN-1 - A0

Bng 0
1
3
5
7
9

...
i

...
2i+1

AN-1 - A1

...

. . .

0
2
4
6
8

2i

...

BE1
D15 - D8

D7 - D0
Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

383

Jan2016

Computer Architecture

BE0
D7 - D0

384

96

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

m = 32bit bn bng nh an xen


bng 3

bng 2
3
7
11
15
19

...

...

Jan2016

BE2
D23-D16

BE1

BE0

D63-D56

385

NKK-HUST

Jan2016

BE0
D7 - D0

386

V d v thao tc ca cache
n

Cache c tc nhanh hn b nh chnh


Cache c t gia CPU v b nh chnh nhm
tng tc CPU truy cp b nh
Cache c th c t trn chip CPU

n
n

n
CPU

cache

B nh
chnh

truyn theo t nh
Jan2016

BE1
D15 - D8

NKK-HUST

1. Nguyn tc chung ca cache

8i

...

Computer Architecture

7.3. B nh m (cache memory)

...

...

BE7

D7 - D0

0
8
16

8i+1

...

Computer Architecture

...

8i+7

AN-1 - A3

...

D15 - D8

bng 0
1
9
17

...

...
4i

...

bng 1
7
15
23

...
4i+1

...

BE3
D31-D24

0
4
8
12
16

...
4i+2

bng 7

bng 0
1
5
9
13
17

...
4i+3

AN-1 - A2

bng 1
2
6
10
14
18

m = 64bit tm bng nh an xen

truyn theo block nh

Computer Architecture

Nguyn Kim Khnh DCE-HUST

CPU yu cu ni dung ca ngn nh


CPU kim tra trn cache vi d liu ny
Nu c, CPU nhn d liu t cache
(nhanh)
Nu khng c, c Block nh cha d
liu t b nh chnh vo cache
Tip chuyn d liu t cache vo
CPU

387

Jan2016

Computer Architecture

388

97

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cu trc chung ca cache / b nh chnh

Cu trc chung ca cache / b nh chnh (tip)

B nh chnh
Tag

Cache
L0
L1
L2
L3

CPU

...

B0
B1
B2
B3

n
n

B nh chnh c 2N byte nh
B nh chnh v cache c chia thnh cc
khi c kch thc bng nhau

...

B nh chnh: B0, B1, B2, ... , Bp-1 (p Blocks)

Bj

B nh cache: L0, L1, L2, ... , Lm-1 (m Lines)

Kch thc ca Block (Line) = 8,16,32,64,128 byte

Li
...

...

Lm-1

Mi Line trong cache c mt th nh (Tag) c


gn vo

BP-1
Jan2016

Computer Architecture

389

NKK-HUST

Computer Architecture

390

NKK-HUST

Cu trc chung ca cache / b nh chnh (tip)

2. Cc phng php nh x

Mt s Block ca b nh chnh c np vo
cc Line ca cache
Ni dung Tag (th nh) cho bit Block no ca
b nh chnh hin ang c cha Line
Ni dung Tag c cp nht mi khi Block t
b nh chnh np vo Line
Khi CPU truy nhp (c/ghi) mt t nh, c
hai kh nng xy ra:

(Chnh l cc phng php t chc b


nh cache)
nh x trc tip
(Direct mapping)
nh x lin kt ton phn
(Fully associative mapping)
nh x lin kt tp hp
(Set associative mapping)

n
n

Jan2016

Jan2016

T nh c trong cache (cache hit)


T nh khng c trong cache (cache miss).

Computer Architecture

Nguyn Kim Khnh DCE-HUST

391

Jan2016

Computer Architecture

392

98

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

nh x trc tip
n

Mi Block ca b nh chnh ch c th c np
vo mt Line ca cache:
n
n
n
n
n
n
n

nh x trc tip (tip)

B0 L0
B1 L1
....
Bm-1 Lm-1
Bm L0
Bm+1 L1
....

n
Jan2016

Line

Word

T bit

L bit

W bit

L0

B0

L1

B1

...

...
Li

So snh

...
cache hit

Bj ch c th np vo L j mod m
m l s Line ca cache.

Bj

...
Lm-1

Bm-1

Computer Architecture

cache miss

393

NKK-HUST

Jan2016

Computer Architecture

394

NKK-HUST

nh x trc tip (tip)

nh x trc tip (tip)

a ch N bit ca b nh chnh chia thnh ba


trng:
n

n
n

Trng Word gm W bit xc nh mt t nh trong


Block hay Line:
2W = kch thc ca Block hay Line
Trng Line gm L bit xc nh mt trong s cc Line
trong cache:
2L = s Line trong cache = m
Trng Tag gm T bit:
T = N - (W+L)

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Mi th nh (Tag) ca mt Line cha c T bit


Khi Block t b nh chnh c np vo Line ca cache
th Tag c cp nht gi tr l T bit a ch cao ca
Block
Khi CPU mun truy nhp mt t nh th n pht ra mt
a ch N bit c th
n
n

Nh vo gi tr L bit ca trng Line s tm ra Line tng ng


c ni dung Tag Line (T bit), ri so snh vi T bit bn tri
ca a ch va pht ra
n
n

n
n

Jan2016

Tag

B nh
chnh

Cache

Tng qut
n

Tag

Nbit a ch b nh

395

Jan2016

Ging nhau: cache hit


Khc nhau: cache miss

u im: B so snh n gin


Nhc im: Xc sut cache hit thp
Computer Architecture

396

99

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

nh x lin kt ton phn


n

nh x lin kt ton phn (tip)

Mi Block c th np vo bt k Line
no ca cache
a ch ca b nh chnh chia thnh hai
trng:

Tag
N bit a ch b nh
Tag

Word

T bit

W bit

L0

B0

L1

B1

...

...

Trng Word
n Trng Tag dng xc nh Block ca
b nh chnh
n

B nh
chnh

Cache

Li

So snh

...
cache hit

Tag xc nh Block ang nm Line

Bj

...
Lm-1

Bm-1

cache miss

Jan2016

Computer Architecture

397

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

nh x lin kt ton phn (tip)


n
n

Jan2016

n
n
n

Dung ha cho hai phng php trn


Cache c chia thnh cc Tp (Set)
Mi mt Set cha mt s Line
V d:
n

Nu gp gi tr bng nhau: cache hit xy ra Line


Nu khng c gi tr no bng: cache miss

u im: Xc sut cache hit cao


Nhc im:
n

So snh T bit bn tri ca a ch va pht ra vi ln lt ni


dung ca cc Tag trong cache
n

nh x lin kt tp hp

Mi th nh (Tag) ca mt Line cha c T bit


Khi Block t b nh chnh c np vo Line ca cache
th Tag c cp nht gi tr l T bit a ch cao ca
Block
Khi CPU mun truy nhp mt t nh th n pht ra mt
a ch N bit c th
n

398

nh x theo nguyn tc sau:


n
n

So snh ng thi vi tt c cc Tag mt nhiu thi gian


B so snh phc tp

t s dng

n
Computer Architecture

Nguyn Kim Khnh DCE-HUST

399

Jan2016

4 Line/Set 4-way associative mapping


B0 S0
B1 S1
B2 S2
.......
Computer Architecture

400

100

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

nh x lin kt tp hp (tip)
Tag

Nbit a ch b nh
Tag

Set

Word

T bit

S bit

W bit

nh x lin kt tp hp (tip)

Cache

B nh
chnh

L0

B0

L1
L2

S0

L3

L6

B2
B3

L4
L5

B1

B4

S1

L7

B5

n
...

...

Kch thc Block = 2W Word


Trng Set c S bit dng xc nh mt trong s cc
Set trong cache. 2S = S Set trong cache
Trng Tag c T bit: T = N - (W+S)
Khi CPU mun truy nhp mt t nh th n pht ra mt
a ch N bit c th
n

So snh

Sk X

...

cache hit

Nh vo gi tr S bit ca trng Set s tm ra Set tng ng


So snh T bit bn tri ca a ch va pht ra vi ln lt ni
dung ca cc Tag trong Set
n

...

Sv-1

n
n

cache miss

Jan2016

Computer Architecture

401

NKK-HUST

n
n
n
n

Computer Architecture

402

NKK-HUST

Vi nh x trc tip

Gi s my tnh nh a ch cho tng byte


Khng gian a ch b nh chnh = 4GiB
Dung lng b nh cache l 256KiB
Kch thc Line (Block) = 32byte.
Xc nh s bit ca cc trng a ch cho ba
trng hp t chc:
n
n
n

Jan2016

Tng qut cho c hai phng php trn


Thng dng vi: 2,4,8,16Lines/Set

Jan2016

V d v nh x a ch
n

Nu gp gi tr bng nhau: cache hit xy ra Line tng ng


Nu khng c gi tr no bng: cache miss

n
n
n
n

nh x trc tip
nh x lin kt ton phn
nh x lin kt tp hp 4 ng

Computer Architecture

Nguyn Kim Khnh DCE-HUST

403

Jan2016

B nh chnh = 4GiB = 232 byte N = 32 bit


Cache = 256 KiB = 218 byte.
Line = 32 byte = 25 byte W = 5 bit
S Line trong cache = 218/ 25 = 213 Line
L = 13 bit
T = 32 - (13 + 5) = 14 bit
Tag

Line

Word

14 bit

13 bit

5 bit

Computer Architecture

404

101

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Vi nh x lin kt ton phn

Vi nh x lin kt tp hp 4 ng
B nh chnh = 4GiB = 232 byte N = 32 bit
n Line = 32 byte = 25 byte W = 5 bit
n S Line trong cache = 218/ 25 = 213 Line
n Mt Set c 4 Line = 22 Line
s Set trong cache = 213/ 22 = 211 Set
S = 11 bit
n S bit ca trng Tag s l: T = 32 - (11 + 5)
= 16 bit
n

n
n
n

B nh chnh = 4GiB = 232 byte N = 32 bit


Line = 32 byte = 25 byte W = 5 bit
S bit ca trng Tag s l: T = 32 - 5 = 27 bit

Tag

Word

27 bit

5 bit

Tag
16 bit
Jan2016

Computer Architecture

405

NKK-HUST

Jan2016

Word

11 bit

5 bit

Computer Architecture

406

NKK-HUST

3. Thay th block trong cache

Thay th block trong cache (tip)


Vi nh x lin kt: cn c thut gii thay th:

Vi nh x trc tip:
n Khng phi la chn
n Mi Block ch nh x vo mt Line xc
nh
n Thay th Block Line

n
n

n
Jan2016

Set

Computer Architecture

Nguyn Kim Khnh DCE-HUST

407

Jan2016

Random: Thay th ngu nhin


FIFO (First In First Out): Thay th Block no
nm lu nht trong Set
LFU (Least Frequently Used): Thay th Block
no trong Set c s ln truy nhp t nht trong
cng mt khong thi gian
LRU (Least Recently Used): Thay th Block
trong Set tng ng c thi gian lu nht khng
c tham chiu ti
Ti u nht: LRU
Computer Architecture

408

102

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4. Phng php ghi d liu khi cache hit


n

7.4. B nh ngoi

Ghi xuyn qua (Write-through):

ghi c cache v c b nh chnh

tc chm

n
n

Ghi tr sau (Write-back):


n

ch ghi ra cache

tc nhanh

n
n

Bng t: t s dng
a t: a cng HDD (Hard Disk Drive)
a quang: CD, DVD
B nh Flash:
n

khi Block trong cache b thay th cn phi


ghi tr c Block v b nh chnh

Jan2016

Computer Architecture

n
n

409

nh th rn SSD (Solid State Drive)


USB flash
Th nh

Jan2016

NKK-HUST

Computer Architecture

410

NKK-HUST

a cng (HDD Hard Disk Drive)


6.1 / MAGNETIC DISK
Readwrite head (1 per surface)

188

Tn ti di dng cc thit b lu tr
Cc kiu b nh ngoi

Tracks

Intersector gap

S6

SN

Surface 9
Platter

Surface 8
Surface 7

n
n
n

Dung lng ln
Tc c/ghi chm
Tn nng lng
D b li c hc
R tin

Surface 2
Surface 1

SN

S5

S5

Surface 4
Surface 3

Intertrack gap

S6

Direction of
arm motion

Surface 6
Surface 5

CHAPTER 6 / EXTERNAL MEMORY


Sectors

HDD

191

S1

S4

S4

S1

Surface 0

S2

Spindle

S2

Figure 6.5

S3

S3

Figure 6.2 Disk Data Layout

Jan2016

Computer Architecture

Figure 6.2 depicts this data layout. Adjacent tracks are separated by gaps. This
prevents, or at least minimizes, errors due to misalignment of the head or simply
interference of magnetic fields.
Data are transferred to and from the disk in sectors (Figure 6.2). There are
typically hundreds of sectors per track, and these may be of either fixed or variable
length. In most contemporary systems, fixed-length sectors are used, with 512 bytes
being the nearly universal sector size. To avoid imposing unreasonable precision
requirements on the system, adjacent sectors are separated by intratrack (intersector) gaps.
A bit near the center of a rotating disk travels past a fixed point (such as a read
write head) slower than a bit on the outside. Therefore, some way must be found
to compensate for the variation in speed so that the head can read all the bits at the
same rate. This can be done by increasing the spacing between bits of information
recorded in segments of the disk. The information can then be scanned at the same
rate by rotating the disk at a fixed speed, known as the constant angular velocity
(CAV). Figure 6.3a shows the layout of a disk using CAV. The disk is divided into
a number of pie-shaped sectors and into a series of concentric tracks. The advantage of using CAV is that individual blocks of data can be directly addressed by
track and sector. To move the head from its current location to a specific address, it
only takes a short movement of the head to a specific track and a short wait for the

Boom

Components of a Disk Drive


411

tracks that are of equal distance from the center of the disk. The set of all the tracks
in the same relative position on the platter is referred to as a cylinder. For example,
all of the shaded tracks in Figure 6.6 are part of one cylinder.
Finally, the head mechanism provides a classification of disks into three types.
Traditionally, the read-write head has been positioned a fixed distance above the
platter, allowing an air gap. At the other extreme is a head mechanism that actually
comes into physical contact with the medium during a read or write operation. This
mechanism is used with the floppy disk, which is a small, flexible platter and the
least expensive type of disk.

Nguyn Kim Khnh DCE-HUST

Jan2016

Computer Architecture

412

103

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

SSD (Solid State Drive)


n
n
n
n
n

n
n
Jan2016

a quang

B nh bn dn flash
Khng kh bin
Tc nhanh
Tiu th nng lng t
Gm nhiu chip nh flash
v cho php truy cp song
song
t b li
t tin
Computer Architecture

n
n
n

413

Digital Video Disc hoc Digital Versatile Disk


Ghi mt hoc hai mt
Mt hoc hai lp trn mt mt
Thng dng: 4,7GB/lp

Jan2016

Computer Architecture

414

NKK-HUST

H thng lu tr dung lng ln: RAID


n

Dung lng thng dng 650MB

DVD
n

NKK-HUST

CD (Compact Disc)

RAID 0, 1, 2
198

Redundant Array of Inexpensive Disks


(Redundant Array of Independent Disks)
Tp cc a cng vt l c OS coi nh mt
logic duy nht dung lng ln
D liu c lu tr phn tn trn cc a
vt l truy cp song song (nhanh)
Lu tr thm thng tin d tha, cho php khi
phc li thng tin trong trng hp a b hng
an ton thng tin
7 loi ph bin (RAID 0 6)

CHAPTER 6 / EXTERNAL MEMORY

strip 0

strip 1

strip 2

strip 4

strip 5

strip 6

strip 3
strip 7

strip 8

strip 9

strip 10

strip 11

strip 12

strip 13

strip 14

strip 15

(a) RAID 0 (Nonredundant)

strip 0

strip 1

strip 2

strip 3

strip 0

strip 1

strip 2

strip 4

strip 5

strip 6

strip 7

strip 4

strip 5

strip 6

strip 3
strip 7

strip 8

strip 9

strip 10

strip 11

strip 8

strip 9

strip 10

strip 11

strip 12

strip 13

strip 14

strip 15

strip 12

strip 13

strip 14

strip 15

b2

b3

f0(b)

f1(b)

f2(b)

(b) RAID 1 (Mirrored)

b0

b1

(c) RAID 2 (Redundancy through Hamming code)

Figure 6.8 RAID Levels


Jan2016

Computer Architecture

415

Jan2016

Computer Architecture

416

RAID Level 0

Nguyn Kim Khnh DCE-HUST

RAID level 0 is not a true member of the RAID family because it does not include
redundancy to improve performance. However, there are a few applications, such
as some on supercomputers in which performance and capacity are primary concerns and low cost is more important than improved reliability.
For RAID 0, the user and system data are distributed across all of the disks
in the array. This has a notable advantage over the use of a single large disk: If two
-different I/O requests are pending for two different blocks of data, then there is a
good chance that the requested blocks are on different disks. Thus, the two requests
can be issued in parallel, reducing the I/O queuing time.
But RAID 0, as with all of the RAID levels, goes further than simply distribut-

104

b0

b1

b2

b3

P(b)

Bi ging Kin trc my tnh

Jan2016
(d) RAID 3 (Bit-interleaved parity)

NKK-HUST

NKK-HUST

RAID 3 & 4

block 1

block 2

block 3

block 4

block 5

block 6

block 7

P(4-7)

block 8

block 9

block 10

block 11

P(8-11)

block 12

block 13

block 14

block 15

P(12-15)

RAID 5 & 6

6.2 / RAID

b0

block 0

b1

b2

b3

199

P(0-3)

(e) RAID 4 (Block-level parity)

P(b)

block 0

block 1

block 2

block 3

P(0-3)

block 4

block 5

block 6

P(4-7)

block 7

block 8

block 9

P(8-11)

block 10

block 11

block 12

P(12-15)

block 13

block 14

block 15

P(16-19)

block 16

block 17

block 18

block 19

(d) RAID 3 (Bit-interleaved parity)


(f) RAID 5 (Block-level distributed parity)

block 0

block 1

block 2

block 3

block 4

block 5

block 6

block 7

P(0-3)
P(4-7)

block 8

block 9

block 10

block 11

P(8-11)

block 12

block 13

block 14

block 15

P(12-15)

(e) RAID 4 (Block-level parity)

block 0

block 1

block 2

block 3

P(0-3)

Q(0-3)

block 4

block 5

block 6

P(4-7)

Q(4-7)

block 7

block 8

block 9

P(8-11)

Q(8-11)

block 10

block 11

block 12

P(12-15)

Q(12-15)

block 13

block 14

block 15

(g) RAID 6 (Dual redundancy)

Figure 6.8 RAID Levels (continued)

Jan2016

block 0

block 1

block 4

block 5

block 3

P(0-3)

block 6
P(4-7)
Computer Architecture
P(8-11)
block 10

block 2

block 7

block 8

block 9

block 12

P(12-15)

block 13

block 14

block 15

P(16-19)

block 16

block 17

block 18

block 19

417

block 11

second
Jan2016

strips on each disk; and so on. The


advantage
of this layout is that if a single
Computer
Architecture
I/O request consists of multiple logically contiguous strips, then up to n strips for
that request can be handled in parallel, greatly reducing the I/O transfer time.
Figure 6.9 indicates the use of array management software to map between
logical and physical disk space. This software may execute either in the disk subsystem
or in a host computer.

418

(f ) RAID 5 (Block-level distributed parity)

NKK-HUST

NKK-HUST

nh x d liu ca RAID 0
block 0

block 1

block 2

block 3

200 block
CHAPTER
6 / EXTERNAL
MEMORY
4
block 5
block 6
block 8

block 9

Physical

P(8-11)

P(12-15) disk 0 Q(12-15)

Logical
disk
block
12
strip 0

strip 0

strip 1

strip 4

strip 2

P(4-7)

Q(4-7)

block 7

block 10

block 11

Physical
disk 1block

strip 8

RAID Levels
(continued)
strip
12

13

Physical
diskblock
2
strip 2

14

7.5. B nh o (Virtual Memory)

Q(0-3)

Q(8-11)

strip 1

(g) RAID 6 (Dual redundancy)

Figure
strip 3 6.8

P(0-3)

Physical
disk
3
block

15

strip 3

strip 5

strip 6

strip 7

strip 9

strip 10

strip 11

strip 13

strip 14

strip 15

strip 4
strip 5

second
on each disk; and so on. The advantage of this layout is that if a single
stripstrips
6
I/O request
consists of Array
multiple logically contiguous strips, then up to n strips for
strip 7
management
that request
can be handled
in parallel, greatly reducing the I/O transfer time.
strip 8
software
Figure
6.9 indicates the use of array management software to map between
strip
9
logical
and
strip
10 physical disk space. This software may execute either in the disk subsystem
11 computer.
or in strip
a host

K thut phn trang: Chia khng gian a


ch b nh thnh cc trang nh c kch
thc bng nhau v nm lin k nhau
Thng dng: kch thc trang = 4KiB
n K thut phn on: Chia khng gian nh
thnh cc on nh c kch thc thay
i, cc on nh c th gi ln nhau.
n

strip 12
strip 13
strip 14
strip 15

Figure 6.9

Data Mapping for a RAID Level 0 Array

RAID 0
Jan2016

FOR HIGH DATA TRANSFER CAPACITY The performance of any of the


RAID levels depends critically on the request patterns of the host system and on
Computer
Architecture
the layout of the data. These
issues can
be most clearly addressed in RAID 0, where
the impact of redundancy does not interfere with the analysis. First, let us consider
the use of RAID 0 to achieve a high data transfer rate. For applications to experience
a high transfer rate, two requirements must be met. First, a high transfer capacity
must exist along the entire path between host memory and the individual disk drives.
This includes internal controller buses, host system I/O buses, I/O adapters, and host
memory buses.
The second requirement is that the application must make I/O requests that
drive the disk array efficiently. This requirement is met if the typical request is for
large amounts of logically contiguous data, compared to the size of a strip. In this
case, a single I/O request involves the parallel transfer of data from multiple disks,
increasing the effective transfer rate compared to a single-disk transfer.

Nguyn Kim Khnh DCE-HUST


RAID 0

FOR HIGH

I/O

REQUEST RATE In a transaction-oriented environment,

Khi nim b nh o: gm b nh
chnh v b nh ngoi m c CPU
coi nh l mt b nh duy nht (b nh
chnh).
Cc k thut thc hin b nh o:

419

Jan2016

Computer Architecture

420

105

A physical address is an actual location in main memory. When the processor executes a process, it automatically converts from logical to physical address by adding
the current starting location of the process, called its base address, to each logical
address. This is another example of a processor hardware feature designed to meet
an OS requirement. The exact nature of this hardware feature depends on the memory management strategy in use. We will see several examples later in this chapter.

Bi ging Kin trc my tnh

Jan2016

Paging

NKK-HUST

NKK-HUST

Phn trang

Both unequal fixed-size and variable-size partitions are inefficient in the use of
memory. Suppose, however, that memory is partitioned into equal fixed-size chunks
that are relatively small, and that each process is also divided into small fixed-size
chunks of some size. Then the chunks of a program, known as pages, could be
assigned to available chunks of memory, known as frames, or page frames. At most,
then, the wasted space in memory for that process is a fraction of the last page.
Figure 8.15 shows an example of the use of pages and frames. At a given point
in time, some of the frames in memory are in use and some are free. The list of free
frames is maintained by the OS. Process A, stored on disk, consists of four pages.

Cp pht cc khung trang

Jan2016

NKK-HUST

process. Within the program, each logical address consists of a page number and
a relative address within the page. Recall that in the case of simple partitioning, a
logical address is the location of Computer
a word relative
to the beginning of the program;
Architecture
the processor translates that into a physical address. With paging, the logicalto-physical address translation is still done by processor hardware. The processor
must know how to access the page table of the current process. Presented with a
logical address (page number, relative address), the processor uses the page table
to produce a physical address (frame number, relative address). An example is
shown in Figure 8.16.
This approach solves the problems raised earlier. Main memory is divided
into many small equal-size frames. Each process is divided into frame-size pages:
smaller processes require fewer pages, larger processes require more. When a
process is brought in, its pages are loaded into available frames, and a page table
is set up.

Process A
Page 0
Page 1
Page 2
Page 3

Free frame list


13
14
15
18
20

Logical
address

30

Frame Relative address


number within frame

Physical
address

14

30

Page 3
of A

15

421

Jan2016

In
use

17

In
use

18

16

In
use

Free frame list


20

17

In
use

Process A
page table

18

Page 0
of A

19

In
use

18
In
use

13
14
15

20

(b) After

422

Phn trang theo yu cu

n
n

18

Khng yu cu tt c cc trang ca tin trnh nm


trong b nh
Ch np vo b nh nhng trang c yu cu

Li trang

Page 0
of A

16

Nguyn tc lm vic ca b nh o phn trang

17

15

Page 2
of A

Page 3
15 of A

Computer Architecture

13
14

14

NKK-HUST

16
18

14

(a) Before

n
13

Page 1
of A

Figure 8.15 Allocation of Free Frames

Page 2
of A

13

Page 0
Page 1
Page 2
Page 3

20

Page Relative address


number within page

Process A

19

Main
memory

13

13

15

a ch logic v a ch vt l ca phn trang

Page 1
of A

Main
memory

Main
memory

Phn chia b nh thnh cc phn c kch


thc bng nhau gi l cc khung trang
n Chia chng trnh (tin trnh) thnh cc trang
n Cp pht s hiu khung trang yu cu cho
tin trnh
n OS duy tr danh sch cc khung trang nh
288 CHAPTER 8 / OPERATING SYSTEM SUPPORT
trng
When it comes time to load this process, the OS finds four free frames and loads the
pages trnh
of the process
A into the
four frames.
n four
Tin
khng
yu
cu cc khung trang lin
Now suppose, as in this example, that there are not sufficient unused contiguous
tip frames to hold the process. Does this prevent the OS from loading A?
The answer is no, because we can once again use the concept of logical address. A
addressbng
will no longer
suffice.
Rather,
the OSl
maintains a page table
n simple
S base
dng
trang

qun
for each process. The page table shows the frame location for each page of the
n

Trang c yu cu khng c trong b nh


HH cn hon i trang yu cu vo
C th cn hon i mt trang no ra ly
ch
Cn chn trang a ra

Process A
page table

Jan2016

Figure 8.16 Logical and Physical Addresses

Computer Architecture

Nguyn Kim Khnh DCE-HUST

423

Jan2016

Computer Architecture

424

106

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Tht bi
n
n
n

n
n

Li ch

Qu nhiu tin trnh trong b nh qu nh


OS tiu tn ton b thi gian cho vic hon i
C t hoc khng c cng vic no c thc
hin
a lun lun sng
Gii php:
n
n
n

n
n

Thut ton thay trang


Gim bt s tin trnh ang chy
Thm b nh

Jan2016

Computer Architecture

425

NKK-HUST

Khng cn ton b tin trnh nm trong


b nh chy
C th hon i trang c yu cu
Nh vy c th chy nhng tin trnh
ln hn tng b nh sn dng
B nh chnh c gi l b nh thc
Ngi dng cm gic b nh ln hn
b nh thc

Jan2016

Computer Architecture

NKK-HUST

Cu trc bng trang

B nh trn my tnh PC
8.3 / MEMORY MANAGEMENT

291

Virtual address
n bits
Page # Offset

n bits
Hash
function

m bits

Page #

Control
bits
Process
ID
Chain

2m ! 1
Inverted page table
(one entry for each
physical memory frame)

Figure 8.17

B nh cache: tch hp trn chip vi


x l:

Jan2016

426

L1: cache lnh v cache d liu


L2, L3

B nh chnh: Tn ti di dng cc
m-un nh RAM

Frame # Offset
m bits
Real address

Inverted Page Table Structure

inverted page table for each real memory page frame rather than one per virtual
Computer Architecture
page. Thus a fixed proportion of real memory is required for the tables regardless of
the number of processes or virtual pages supported. Because more than one virtual
address may map into the same hash table entry, a chaining technique is used for
managing the overflow. The hashing technique results in chains that are typically
shortbetween one and two entries. The page tables structure is called inverted
because it indexes page table entries by frame number rather than by virtual page
number.

Translation
Lookaside Buffer
Nguyn Kim Khnh
DCE-HUST
In principle, then, every virtual memory reference can cause two physical memory accesses: one to fetch the appropriate page table entry, and one to fetch the
desired data. Thus, a straightforward virtual memory scheme would have the effect

427

Jan2016

Computer Architecture

428

107

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

B nh trn PC (tip)
n

ROM BIOS cha cc chng trnh sau:


n
n
n
n

n
n

n
Jan2016

Ht chng 7

CMOS RAM:
n

Chng trnh POST (Power On Self Test)


Chng trnh CMOS Setup
Chng trnh Bootstrap loader
Cc trnh iu khin vo-ra c bn (BIOS)
Cha thng tin cu hnh h thng
ng h h thng
C pin nui ring

Video RAM: qun l thng tin ca mn hnh


Cc loi b nh ngoi
Computer Architecture

429

NKK-HUST

Jan2016

430

NKK-HUST

Kin trc my tnh

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song

Chng 8
H THNG VO-RA
Nguyn Kim Khnh
Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

Computer Architecture

Nguyn Kim Khnh DCE-HUST

431

Jan2016

Computer Architecture

432

108

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Ni dung ca chng 8

8.1. Tng quan v h thng vo-ra


n

8.1. Tng quan v h thng vo-ra


8.2. Cc phng php iu khin vo-ra
8.3. Ni ghp thit b vo-ra

Chc nng: Trao i


thng tin gia my tnh
vi bn ngoi
Cc thao tc c bn:
n
n

Computer Architecture

433

NKK-HUST

M-un
vo-ra

Thit b
vo-ra

M-un
vo-ra

Thit b
vo-ra

Computer Architecture

434

NKK-HUST

Thit b vo-ra

Tn ti a dng cc thit b vo-ra khc


nhau v:
n
n
n

n
n

Nguyn tc hot ng
Tc
Khun dng d liu

Cn gi l thit b ngoi vi (Peripherals)


Chc nng: chuyn i d liu gia bn trong
v bn ngoi my tnh
Phn loi:
n

Tt c cc thit b vo-ra u chm hn


CPU v RAM
Cn c cc m-un vo-ra ni ghp
cc thit b vi CPU v b nh chnh

n
n

Computer Architecture

Nguyn Kim Khnh DCE-HUST

435

Jan2016

Thit b vo (Input Devices)


Thit b ra (Output Devices)
Thit b lu tr (Storage Devices)
Thit b truyn thng (Communication Devices)

Giao tip:
n

Jan2016

Cc thit b vo-ra
Cc m-un vo-ra

Jan2016

c im ca h thng vo-ra
n

Thit b
vo-ra

Vo d liu (Input)
Ra d liu (Output)

Cc thnh phn chnh:


n

Jan2016

Bus
h
thng

Ngi - my
My - my
Computer Architecture

436

109

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cu trc chung ca thit b vo-ra


D liu
t/n
m-un
vo-ra

B
m
d
liu

B
chuyn
i
tn hiu

M-un vo-ra
n

D liu
n/t
bn ngoi

Chc nng:
n
n
n

Tn hiu
iu khin

Khi logic iu khin

Tn hiu
trng thi

Jan2016

Computer Architecture

437

NKK-HUST

Jan2016

Computer Architecture

438

NKK-HUST

Cu trc ca m-un vo-ra

4. a ch ha cng vo-ra (IO addressing)

Bus

n
Cc
ng
d liu

d liu

B m
d liu

Cng
vo
ra

Tn hiu
iu khin

n
Cc
ng
a ch

d liu

Khi logic
iu khin
Cc
ng
iu
khin

Cng
vo
ra

Tn hiu
iu khin

Nguyn Kim Khnh DCE-HUST

439

Jan2016

Cc b x l 680x0 ca Motorola
Cc b x l theo kin trc RISC: MIPS, ARM, ...

Mt s b x l c hai khng gian a ch tch


bit:
n

Tn hiu
trng thi

Computer Architecture

Hu ht cc b x l ch c mt khng gian a
ch chung cho c cc ngn nh v cc cng
vo-ra
n

Tn hiu
trng thi

Jan2016

iu khin v nh thi
Trao i thng tin vi CPU hoc b nh chnh
Trao i thng tin vi thit b vo-ra
m gia bn trong my tnh vi thit b vo-ra
Pht hin li ca thit b vo-ra

Khng gian a ch b nh
Khng gian a ch vo-ra
V d: Intel x86

Computer Architecture

440

110

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Khng gian a ch tch bit


Khng gian a ch
b nh

N bit
000...000
000...001
000...010
000...011
000...100
000...101

Khng gian a
ch vo-ra

.
.
.

.
.
.

Cc phng php a ch ho cng vo-ra


N1 bit

00...00
00...01
00...10
00...11

Vo-ra theo bn b nh
(Memory mapped IO)
Vo-ra ring bit
(Isolated IO hay IO mapped IO)

.
.
.
11...11

.
.
.
111...111

Jan2016

Computer Architecture

441

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Vo-ra theo bn b nh
n

n
n

n
n

V d lp trnh vo-ra cho MIPS

Cng vo-ra c nh a ch theo khng gian


a ch b nh
CPU coi cng vo-ra nh ngn nh
Lp trnh trao i d liu vi cng vo-ra bng
cc lnh truy nhp d liu b nh
C th thc hin trn mi h thng
V d: B x l MIPS
n

n
Jan2016

442

n
n

Nguyn Kim Khnh DCE-HUST

Cng 1: 0xFFFFFFF4
Cng 2: 0xFFFFFFF8

Ghi gi tr 0x41 ra cng 1


addi $t0, $0, 0x41
sw $t0, 0xFFF4($0)

32-bit a ch cho mt khng gian a ch chung cho c


cc ngn nh v cc cng vo-ra
Cc cng vo-ra c gn cc a ch thuc vng a
ch d tr
Vo/ra d liu: s dng lnh load/store
Computer Architecture

V d: C hai cng vo-ra c gn a ch:

# a gi tr 0x41
# ra cng 1

Ch : gi tr 16-bit 0xFFF4 c sign-extended thnh 32-bit 0xFFFFFFF4


n

c d liu t cng 2 a vo $t3


lw $t3, 0xFFF8($0)

443

Jan2016

# c d liu cng 2 a vo $t3

Computer Architecture

444

111

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Vo-ra ring bit (Isolated IO)


n

8.2. Cc phng php iu khin vo-ra

Cng vo-ra c nh a ch theo khng gian


a ch vo-ra ring
Lp trnh trao i d liu vi cng vo-ra bng
cc lnh vo-ra chuyn dng
V d: Intel x86

n
n

Truy nhp b nh trc tip - DMA


(Direct Memory Access)

Lnh IN: nhn d liu t cng vo


Lnh OUT: a d liu n cng ra

Jan2016

Computer Architecture

445

NKK-HUST

Jan2016

Computer Architecture

446

NKK-HUST

BaCHAPTER
k thut
thc hin vo mt khi d liu
7 / INPUT/OUTPUT

1. Vo-ra bng chng trnh

230

Issue read
command to
I/O module

CPU

Read status
of I/O
module

I/O

Not
ready

Check
Status

Issue read
command to
I/O module

I/O

CPU

Read status
of I/O
module

Error
condition

Check
status

Ready

No

CPU

I/O
Do something
else
Interrupt

I/O

CPU

Error
condition

Issue read
block command
to I/O module
Read status
of DMA
module

CPU

DMA
Do something
else

Nguyn tc chung:

Interrupt
DMA

CPU

Next instruction
(c) Direct Memory Access

Ready

Read word
from I/O
module

I/O

Write word
into memory

CPU

CPU

Memory

Done?
Yes

Next instruction
(a) Programmed I/O

Jan2016

Vo-ra iu khin bng ngt


(Interrupt Driven IO)

Dng 8-bit hoc 16-bit a ch cho khng gian a ch


vo-ra ring
C hai lnh vo-ra chuyn dng

Vo-ra bng chng trnh


(Programmed IO)

No

Read word
from I/O
module

I/O

Write word
into memory

CPU

CPU

Memory

Done?

CPU iu khin trc tip vo-ra


bng chng trnh cn phi lp
trnh vo-ra trao i d liu
gia CPU vi m-un vo-ra
CPU nhanh hn thit b vo-ra rt
nhiu ln, v vy trc khi thc
hin lnh vo-ra, chng trnh cn
c v kim tra trng thi sn sng
ca m-un vo-ra

c trng thi
m-un vo-ra

Sn sng ?

Y
Trao i d liu

Yes
Next instruction
(b) Interrupt-Driven I/O

Figure 7.4 Three Techniques for Input of a Block of Data

Computer Architecture

Figure 7.4a gives an example of the use of programmed I/O to read in a block of
data from a peripheral device (e.g., a record from tape) into memory. Data are read
in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must
remain in a status-checking cycle until it determines that the word is available in the
I/O modules data register. This flowchart highlights the main disadvantage of this
technique: it is a time-consuming process that keeps the processor busy needlessly.

Nguyn Kim Khnh DCE-HUST


I/O Instructions

With programmed I/O, there is a close correspondence between the I/O-related

447

Jan2016

Computer Architecture

448

112

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc tn hiu iu khin vo-ra

Cc lnh vo-ra

Tn hiu iu khin (Control): kch hot thit b


vo-ra
Tn hiu kim tra (Test): kim tra trng thi
ca m-un vo-ra v thit b vo-ra
Tn hiu iu khin c (Read): yu cu mun vo-ra nhn d liu t thit b vo-ra v
a vo b m d liu, ri CPU nhn d liu

Tn hiu iu khin ghi (Write): yu cu mun vo-ra ly d liu trn bus d liu a n
b m d liu ri chuyn ra thit b vo-ra

Jan2016

Computer Architecture

449

NKK-HUST

Jan2016

Computer Architecture

450

NKK-HUST

c im
n
n

Jan2016

Vi vo-ra theo bn b nh: s


dng cc lnh trao i d liu vi b
nh trao i d liu vi cng vo-ra
Vi vo-ra ring bit: s dng cc lnh
vo-ra chuyn dng (IN, OUT)

2. Vo-ra iu khin bng ngt


n

Vo-ra do mun ca ngi lp trnh


CPU trc tip iu khin trao i d liu
gia CPU vi m-un vo-ra
CPU i m-un vo-ra tiu tn nhiu
thi gian ca CPU

Computer Architecture

Nguyn Kim Khnh DCE-HUST

Nguyn tc chung:
CPU khng phi i trng thi sn sng
ca m-un vo-ra, CPU thc hin mt
chng trnh no
n Khi m-un vo-ra sn sng th n pht tn
hiu ngt CPU
n CPU thc hin chng trnh con x l ngt
vo-ra tng ng trao i d liu
n CPU tr li tip tc thc hin chng trnh
ang b ngt
n

451

Jan2016

Computer Architecture

452

113

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Chuyn iu khin n chng trnh con ngt

Hot ng vo d liu: nhn t m-un vo-ra

Chng trnh
ang thc hin

lnh
lnh

Ngt y

n
Chng trnh con
x l ngt

lnh

lnh

lnh

lnh

lnh i

lnh

lnh i+1
lnh

...

RETURN

...

lnh

Jan2016

Computer Architecture

453

NKK-HUST

Jan2016

Computer Architecture

n
n

454

NKK-HUST

Hot ng vo d liu: nhn t CPU


n

M-un vo-ra nhn tn hiu iu khin


c t CPU
M-un vo-ra nhn d liu t thit b
vo-ra, trong khi CPU lm vic khc
Khi c d liu m-un vo-ra pht
tn hiu ngt CPU
CPU yu cu d liu
M-un vo-ra chuyn d liu n CPU

Cc vn ny sinh khi thit k

Pht tn hiu iu khin c


Lm vic khc
Cui mi chu trnh lnh, kim tra tn hiu
yu cu ngt
Nu b ngt:

Lm th no xc nh c m-un
vo-ra no pht tn hiu ngt ?
CPU lm nh th no khi c nhiu yu
cu ngt cng xy ra ?

Ct ng cnh (ni dung cc thanh ghi lin


quan)
n Thc hin chng trnh con x l ngt vo
d liu
n Khi phc ng cnh ca chng trnh ang
thc hin
n

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

455

Jan2016

Computer Architecture

456

114

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cc phng php ni ghp ngt


n
n

Nhiu ng yu cu ngt

S dng nhiu ng yu cu ngt


Hi vng bng phn mm (Software
Poll)
Hi vng bng phn cng (Daisy Chain
or Hardware Poll)
S dng b iu khin ngt (PIC)

Thanh
ghi
yu
cu
ngt

n
n
n
Computer Architecture

457

NKK-HUST

CPU

Jan2016

M-un
vo-ra

M-un
vo-ra

M-un
vo-ra

Mi m-un vo-ra c ni vi mt ng yu cu
ngt
CPU phi c nhiu ng tn hiu yu cu ngt
Hn ch s lng m-un vo-ra
Cc ng ngt c qui nh mc u tin
Computer Architecture

458

NKK-HUST

C
ngt

M-un
vo-ra

Jan2016

Hi vng bng phn mm

INTR2
INTR1
INTR0

CPU

Jan2016

INTR3

Hi vng bng phn cng

INTR
Bus d liu

M-un
vo-ra

M-un
vo-ra

M-un
vo-ra

C
ngt

M-un
vo-ra

CPU

CPU thc hin phn mm hi ln lt tng


m-un vo-ra
Chm
Th t cc m-un c hi vng chnh l
th t u tin
Computer Architecture

Nguyn Kim Khnh DCE-HUST

459

Jan2016

INTR
INTA

M-un
vo-ra

M-un
vo-ra

Computer Architecture

M-un
vo-ra

M-un
vo-ra

460

115

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Hi vng bng phn cng (tip)

B iu khin ngt lp trnh c


INTR n

Bus d liu

CPU pht tn hiu chp nhn ngt


(INTA) n m-un vo-ra u tin
Nu m-un vo-ra khng gy ra
ngt th n gi tn hiu n m-un k
tip cho n khi xc nh c m-un
gy ngt
Th t cc m-un vo-ra kt ni trong
chui xc nh th t u tin

INTRn-1

Computer Architecture

n
n

461

M-un
vo-ra

M-un
vo-ra

462

NKK-HUST

3. DMA (Direct Memory Access)

C s kt hp gia phn cng v phn


mm

Phn cng: gy ngt CPU


n Phn mm: trao i d liu gia CPU vi
m-un vo-ra

Jan2016

M-un
vo-ra

Computer Architecture

INTR0

PIC Programmable Interrupt Controller


PIC c nhiu ng vo yu cu ngt c qui
nh mc u tin
PIC chn mt yu cu ngt khng b cm c
mc u tin cao nht gi ti CPU

Jan2016

c im ca vo-ra iu khin bng ngt

INTR1

M-un
vo-ra

NKK-HUST

PIC
INTA

n
Jan2016

...

INTR
CPU

Nguyn Kim Khnh DCE-HUST

463

Jan2016

Chim thi gian ca CPU

khc phc dng k thut DMA


n

CPU trc tip iu khin vo-ra


CPU khng phi i m-un vo-ra, do
hiu qu s dng CPU tt hn

Computer Architecture

Vo-ra bng chng trnh v bng ngt


do CPU trc tip iu khin:

S dng m-un iu khin vo-ra chuyn


dng, gi l DMAC (Controller), iu khin
trao i d liu gia m-un vo-ra vi b
nh chnh

Computer Architecture

464

116

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

S cu trc ca DMAC

Cc ng d liu

B m d liu

Thanh ghi d liu

Thanh ghi a ch

Cc ng a ch

iu khin c

Yu cu bus
Chuyn nhng bus

iu khin ghi

Ngt

Jan2016

Cc thnh phn ca DMAC

Logic iu khin

Yu cu DMA

Ghi

Chp nhn DMA

Computer Architecture

465

NKK-HUST

Jan2016

Computer Architecture

n
n

n
n

Jan2016

Vo hay Ra d liu
a ch thit b vo-ra (cng vo-ra tng ng)
a ch u ca mng nh cha d liu np vo
thanh ghi a ch
S t d liu cn truyn np vo b m d liu

CPU lm vic khc


DMAC iu khin trao i d liu
Sau khi truyn c mt t d liu th:
n

Cc kiu thc hin DMA

CPU ni cho DMAC


n

466

NKK-HUST

Hot ng DMA
n

Thanh ghi d liu: cha d liu trao i


Thanh ghi a ch: cha a ch ngn
nh d liu
B m d liu: cha s t d liu cn
trao i
Logic iu khin: iu khin hot ng
ca DMAC

ni dung thanh ghi a ch tng


ni dung b m d liu gim

Khi b m d liu = 0, DMAC gi tn hiu ngt


CPU bo kt thc DMA
Computer Architecture

Nguyn Kim Khnh DCE-HUST

467

Jan2016

DMA truyn theo khi (Block-transfer DMA):


DMAC s dng bus truyn xong c khi
d liu
DMA ly chu k (Cycle Stealing DMA): DMAC
cng bc CPU treo tm thi tng chu k
bus, DMAC chim bus thc hin truyn mt
t d liu.
DMA trong sut (Transparent DMA): DMAC
nhn bit nhng chu k no CPU khng s
dng bus th chim bus trao i mt t d
liu.
Computer Architecture

468

117

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Cu hnh DMA (1)

Cu hnh DMA (2)


System Bus

System Bus
CPU

CPU

I/O
Module

DMAC

. .
.

I/O
Module

Memory

I/O
Module

Mi ln trao i mt d liu, DMAC s dng


bus hai ln
n
n

Gia m-un vo-ra vi DMAC


Gia DMAC vi b nh

Computer Architecture

469

NKK-HUST

DMAC

I/O
Module

Memory

I/O
Module

DMAC iu khin mt hoc vi m-un vo-ra


Mi ln trao i mt d liu, DMAC s dng
bus mt ln
n

Jan2016

. .
.

DMAC

Jan2016

Gia DMAC vi b nh
Computer Architecture

470

NKK-HUST

Cu hnh DMA (3)

c im ca DMA
System Bus

CPU

DMAC

Memory
IO Bus

I/O
Module

I/O
Module

. .
.

I/O
Module

n
n
n

Bus vo-ra tch ri h tr tt c cc thit b cho php DMA


Mi ln trao i mt d liu, DMAC s dng bus mt ln
n

Jan2016

CPU khng tham gia trong qu trnh


trao i d liu
DMAC iu khin trao i d liu gia
b nh chnh vi m-un vo-ra (hon
ton bng phn cng) tc nhanh
Ph hp vi cc yu cu trao i mng
d liu c kch thc ln

Gia DMAC vi b nh

Computer Architecture

Nguyn Kim Khnh DCE-HUST

471

Jan2016

Computer Architecture

472

118

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

4. B x l vo-ra
n

8.3. Ni ghp thit b vo-ra

Vic iu khin vo-ra c thc hin


bi mt b x l vo-ra chuyn dng
B x l vo-ra hot ng theo chng
trnh ca ring n
Chng trnh ca b x l vo-ra c th
nm trong b nh chnh hoc nm
trong mt b nh ring

Jan2016

Computer Architecture

1. Cc kiu ni ghp vo-ra


n Ni ghp song song
n Ni ghp ni tip

473

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

Ni ghp song song

M-un
vo-ra
song
song

n
bus
h
thng

Ni ghp ni tip

n
bus
h
thng

n
thit b
vo-ra

n
n
n
Jan2016

474

Truyn nhiu bit song song


Tc nhanh
Cn nhiu ng truyn d liu
Computer Architecture

Nguyn Kim Khnh DCE-HUST

n
n
475

Jan2016

M-un
vo-ra
ni tip

n
thit b
vo-ra

Truyn ln lt tng bit


Cn c b chuyn i t d liu song song sang
ni tip hoc/v ngc li
Tc chm hn
Cn t ng truyn d liu
Computer Architecture

476

119

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

2. Cc cu hnh ni ghp

Thunderbolt
7.7 / THE EXTERNAL INTERFACE: THUNDERBOLT AND INFINIBAND

im ti im (Point to Point)
n

COMPUTER

Thng qua mt cng vo-ra ni ghp vi mt


thit b

Thng qua mt cng vo-ra cho php ni


ghp c vi nhiu thit b
n V d:

Processor

Platform
controller
hub (PCH)

DisplayPort

Memory

Graphics
Subsystem

im ti a im (Point to Multipoint)

251

DisplayPort

PCIe x4
TC

Thunderbolt
controller

Thunderbolt
connector

USB (Universal Serial Bus): 127 thit b


IEEE 1394 (FireWire): 63 thit b
Thunderbolt

Thunderbolt
20 Gbps (max)

Daisy
chain

TC

TC

Figure 7.17 Example Computer Configuration with Thunderbolt

Jan2016

Computer Architecture

477

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

478

THUNDERBOLT PROTOCOL ARCHITECTURE Figure 7.18 illustrates the


Thunderbolt protocol architecture. The cable and connector layer provides
transmission medium access. This layer specifies the physical and electrical
attributes of the connector port.
The Thunderbolt protocol physical layer is responsible for link maintenance
including hot-plug3 detection and data encoding to provide highly efficient data
transfer. The physical layer has been designed to introduce very minimal overhead
and provides full-duplex 10 Gbps of usable capacity to the upper layers.
The common transport layer is the key to the operation of Thunderbolt and
what makes it attractive as a high-speed peripheral I/O technology. Some of the
features include:

Kin trc my tnh

A high-performance, low-power, switching architecture.


A highly efficient, low-overhead packet format with flexible quality of service
(QoS) support that allows multiplexing of bursty PCI Express transactions

Chng 9
CC KIN TRC SONG SONG
3

The term hot plug is defined as pulling out a component from a system and plugging in a new one while
the main power is still on. It allows an external drive, network adapter, or other peripheral to be plugged
in without having to power down the computer.

Ht chng 8

Nguyn Kim Khnh


Trng i hc Bch khoa H Ni

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

479

Jan2016

Computer Architecture

480

120

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Ni dung ca chng 9

Ni dung hc phn
Chng 1. Gii thiu chung
Chng 2. C bn v logic s
Chng 3. H thng my tnh
Chng 4. S hc my tnh
Chng 5. Kin trc tp lnh
Chng 6. B x l
Chng 7. B nh my tnh
Chng 8. H thng vo-ra
Chng 9. Cc kin trc song song
Jan2016

Computer Architecture

9.1. Phn loi kin trc my tnh


9.2. a x l b nh dng chung
9.3. a x l b nh phn tn
9.4. B x l ha a dng

481

NKK-HUST

Jan2016

Computer Architecture

NKK-HUST

9.1. Phn loi kin trc my tnh

SISD
CU

Phn loi kin trc my tnh (Michael Flynn -1966)


n

SISD - Single Instruction Stream, Single Data Stream

SIMD - Single Instruction Stream, Multiple Data Stream

MISD - Multiple Instruction Stream, Single Data Stream

MIMD - Multiple Instruction Stream, Multiple Data Stream

n
n
n
n
n
n
n

Jan2016

482

Computer Architecture

Nguyn Kim Khnh DCE-HUST

483

Jan2016

IS

PU

DS

MU

CU: Control Unit


PU: Processing Unit
MU: Memory Unit
Mt b x l
n dng lnh
D liu c lu tr trong mt b nh
Chnh l Kin trc von Neumann (tun t)
Computer Architecture

484

121

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

SIMD

SIMD (tip)
PU1

CU

PU2

IS

DS

DS

LM1

LM2
n

.
.
.

PUn

DS

n dng lnh iu khin ng thi cc


n v x l PUs
Mi n v x l c mt b nh d liu
ring LM (local memory)
Mi lnh c thc hin trn mt tp
cc d liu khc nhau
Cc m hnh SIMD

LMn

n
n

Jan2016

Computer Architecture

485

NKK-HUST

Jan2016

Computer Architecture

n
n

MIMD

Mt lung d liu cng c truyn n


mt tp cc b x l
Mi b x l thc hin mt dy lnh
khc nhau.
Cha tn ti my tnh thc t
C th c trong tng lai

n
n

Tp cc b x l
Cc b x l ng thi thc hin cc
dy lnh khc nhau trn cc d liu
khc nhau
Cc m hnh MIMD
n
n

Jan2016

486

NKK-HUST

MISD
n

Vector Computer
Array processor

Computer Architecture

Nguyn Kim Khnh DCE-HUST

487

Jan2016

Multiprocessors (Shared Memory)


Multicomputers (Distributed Memory)

Computer Architecture

488

122

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

MIMD - Shared Memory

MIMD - Distributed Memory


a x l b nh phn tn
(distributed memory mutiprocessors or
multicomputers)

a x l b nh dng chung
(shared memory mutiprocessors)
CU1

CU2

IS

IS

PU1

PU2

.
.
.

CUn

DS

CU1
DS

.
.
.

IS

Jan2016

PUn

B nh
dng
chung

CU2

CUn

489

NKK-HUST

Jan2016

.
.
.

IS

PUn

LM2
.
.
.

DS

Mng
lin
kt
hiu
nng
cao

LMn

490

SIMD

H thng a x l i xng (SMPSymmetric Multiprocessors)


H thng a x l khng i xng
(NUMA Non-Uniform Memory Access)
B x l a li (Multicore Processors)

Song song mc lung


n

MIMD

Song song mc yu cu
n

Jan2016

pipeline
superscalar

Song song mc d liu


n

DS

LM1

9.2. a x l b nh dng chung

Song song mc lnh


n

PU2

DS

NKK-HUST

PU1

Computer Architecture

Phn loi cc k thut song song


n

IS

.
.
.

DS

Computer Architecture

IS

Cloud computing
Computer Architecture

Nguyn Kim Khnh DCE-HUST

491

Jan2016

Computer Architecture

492

123

598

PARALLEL COMPUTER ARCHITECTURES

CHAP. 8

Memory consistency is not a done deal. Researchers are still proposing new

Bi ging Kinmodels
trc
my
(Naeem
et al.,tnh
2011, Sorin et al., 2011, and Tu et al., 2010).

Jan2016

8.3.3 UMA Symmetric Multiprocessor Architectures


NKK-HUST

SMP hay UMA (Uniform Memory Access)

SMP (tip)
n
n

(a)

SEC. 8.3

(b)

(c)

SHARED-MEMORY MULTIPROCESSORS

Figure 8-26. Three bus-based multiprocessors. (a) Without caching. (b) With
caching. (c) With caching and private memories.

607

system is called CC-NUMA (at least by the hardware people). The software people often
because
basically
the memory,
same as software
If thecall
busitishardware
busy whenDSM
a CPU
wantsittois read
or write
the CPUdisjust
tributed
shared
but implemented
thethe
hardware
small
page size.
waits until
the memory
bus becomes
idle. Hereinby
lies
problemusing
with athis
design.
With
One
of
the
first
NC-NUMA
machines
(although
the
name
had
not
yet
been
two or three CPUs, contention for the bus will be manageable; with 32 or 64 it will
coined)
was the The
Carnegie-Mellon
illustrated
in the
simplified
formofintheFig.
be unbearable.
system will beCm*,
totally
limited by
bandwidth
bus,8-32
and
(Swan
al.,CPUs
1977).
consisted
of of
a collection
most ofetthe
willIt be
idle most
the time. of LSI-11 CPUs, each with some
Jan2016
Computer
Architecture
493
memory
a local
bus. to(The
a single-chip
of the
The addressed
solution isover
to add
a cache
eachLSI-11
CPU, was
as depicted
in Fig.version
8-26(b).
The
DEC
a minicomputer
popular
In on
addition,
the LSI-11
sys-or
cachePDP-11,
can be inside
the CPU chip,
next in
to the
the 1970s.)
CPU chip,
the processor
board,
tems
connected
a system
bus.many
When
a memory
cameout
intoofthe
some were
combination
of by
all three.
Since
reads
can nowrequest
be satisfied
the
(specially
modified)
MMU,
a check
to see
thesystem
word needed
was inmore
the
local cache,
there will
be much
lesswas
busmade
traffic,
andifthe
can support
local
If so, a isrequest
the localasbus
get the
If not,
CPUs.memory.
Thus caching
a big was
win sent
here.over
However,
wetoshall
see word.
in a moment,
the
request
routed
over the
bus toisthe
keeping
the was
caches
consistent
withsystem
one another
not system
trivial. containing the word,
NKK-HUST
which
responded.
Of iscourse,
the latter
muchin longer
than CPU
the former.
Yetthen
another
possibility
the design
of Fig.took
8-26(c),
which each
has not
While
program
happily
outmemory
of remote
memory,
it took over
10 times
longer
only aacache
but could
also arun
local,
private
which
it accesses
a dedicated
to
execute
than
the
same
program
running
out
of
local
memory.
(private) bus. To use this configuration optimally, the compiler should place all the
program text, strings, constants and other read-only data, stacks, and local variCPU
Memory
CPU The
Memory
CPU Memory
ables in the
private
memories.
shared memory
is then used CPU
only Memory
for writable
shared variables. In most cases, this careful placement will greatly reduce bus traffic, but it does require active cooperation from the compiler.

n
Jan2016

Local bus

Local bus

Local bus

B x l a li (multicores)
666

8-32.
A NUMA gian
machine based
two levels
of buses.
The Cm*
the CPU
CFigure
mt
khng
a onch
chung
cho
ttwasc
first multiprocessor to use this design.
n Mi CPU c th truy cp t xa sang b nh ca
Memory coherence is guaranteed in an NC-NUMA machine because no caching isCPU
present.khc
Each word of memory lives in exactly one location, so there is no
danger of one copy having stale data: there are no copies. Of course, it now matn
nhp
nh
t xamemory
chm
hnthetruy
nhppenalty
b
ters aTruy
great deal
whichb
page
is in which
because
performance
for being
in cc
the wrong
place is so high. Consequently, NC-NUMA machines use
nh
b
elaborate software to move pages around to maximize performance.

Nguyn Kim Khnh DCE-HUST

Program counter
Instruction fetch unit

Issue logic
Single-thread register file
Execution units and queues

L1 instruction cache

L1 data cache
L2 cache

(a) Superscalar

Typically, a daemon process called a page scanner runs every few seconds.
job is to examine the usage statistics
and move pages around in an attempt to 495
Computer Architecture
improve performance. If a page appears to be in the wrong place, the page scanner
unmaps it so that the next reference to it will cause a page fault. When the fault
occurs, a decision is made about where to place the page, possibly in a different
memory. To prevent thrashing, usually there is some rule saying that once a page
is placed, it is frozen in place for a time T . Various algorithms have been studied,
but the conclusion is that no one algorithm performs best under all circumstances
(LaRowe and Ellis, 1991). Best performance depends on the application.

CHAPTER 18 / MULTICORE COMPUTERS

Thay i ca b x
l:
Tun t
n Pipeline
n Siu v hng
n a lung
n a li: nhiu CPU
trn mt chip

Local bus

System bus

Its
Jan2016

494

NKK-HUST

NUMA (Non-Uniform Memory Access)

MMU

Computer Architecture

Issue logic

Instruction fetch unit

Registers n

Bus

Execution units and queues

L1 instruction cache

L1 data cache
L2 cache

(b) Simultaneous multithreading

Processor n
(superscalar or SMT)

Cache

L1-I
L1-D

PC n

CPU

Register 1

CPU

Processor 3
(superscalar or SMT)

L1-I
L1-D

CPU

PC 1

CPU

Processor 2
(superscalar or SMT)

Processor 1
(superscalar or SMT)

CPU

L1-I
L1-D

CPU

Shared
memory

Private memory

Shared memory

Mt my tnh c n >= 2 b x l ging nhau


Cc b x l dng chung b nh v h thng
vo-ra
Thi gian truy cp b nh l bng nhau vi cc
b x l
Cc b x l c th thc hin chc nng ging
nhau
H thng c iu khin bi mt h iu hnh
phn tn
Hiu nng: Cc cng vic c th thc hin song
song
Kh nng chu li

L1-I
L1-D

The simplest multiprocessors are based on a single bus, as illustrated in


Fig. 8-26(a). Two or more CPUs and one or more memory modules all use the
same bus for communication. When a CPU wants to read a memory word, it first
checks to see whether the bus is busy. If the bus is idle, the CPU puts the address
of the word it wants on the bus, asserts a few control signals, and waits until the
memory puts the desired word on the bus.

NKK-HUST

L2 cache
(c) Multicore

Figure 18.1 Alternative Chip Organizations

Jan2016

Computer Architecture
496 to
For each of these innovations, designers have over the years attempted
increase the performance of the system by adding complexity. In the case of pipelining, simple three-stage pipelines were replaced by pipelines with five stages, and
then many more stages, with some implementations having over a dozen stages.
There is a practical limit to how far this trend can be taken, because with more
stages, there is the need for more logic, more interconnections, and more control
signals. With superscalar organization, increased performance can be achieved by
increasing the number of parallel pipelines. Again, there are diminishing returns as
the number of pipelines increases. More logic is required to manage hazards and
to stage instruction resources. Eventually, a single thread of execution reaches the
point where hazards and resource dependencies prevent the full use of the multiple

124

Intel has introduced a number of multicore products in recent years. In this section,
we look at two examples: the Intel Core Duo and the Intel Core i7-990X.

Intel Core Duo

CPU Core n

CPU Core 1

CPU Core n

L1-D L1-I

L1-D L1-I

L1-D L1-I

L1-D L1-I

L2 cache

L2 cache

Main memory

I/O

L2 cache

I/O

Main memory

(b) Dedicated L2 cache

(a) Dedicated L1 cache

2006
Two x86 superscalar,
shared L2 cache
Dedicated L1 cache
per core
n

CPU Core 1

CPU Core n

L1-D L1-I

L1-D L1-I

CPU Core 1

CPU Core n

L1-D L1-I

L1-D L1-I

L2 cache

L2 cache

L2 cache

Main memory

Jan2016

Thermal control

APIC

APIC

Power management logic

32KiB instruction and


32KiB data

2 MB L2 shared cache

2MiB shared L2 cache

SEC. 8.4

Main memory

I/O

I/O

(d ) Shared L3 cache

Multicore Organization Alternatives


Computer Architecture

4. Interprocessor communication is easy to implement, via shared memory locations.


5. The use of a shared L2 cache confines the cache coherency problem to the L1
cache level, which may provide some additional performance advantage.

497

A potential advantage to having only dedicated L2 caches on the chip is that


each core enjoys more rapid access to its private L2 cache. This is advantageous for
threads that exhibit strong locality.
As both the amount of memory available and the number of cores grow, the
NKK-HUST
use of a shared L3 cache combined with either a shared L2 cache or dedicated percore L2 caches seems likely to provide better performance than simply a massive
shared L2 cache.
Another organizational design decision in a multicore system is whether the
cores will be COMPUTERS
superscalar or will implement simultaneous multithreading
678 CHAPTERindividual
18 / MULTICORE
(SMT). For example, the Intel Core Duo uses superscalar cores, whereas the Intel
Core i7 uses SMT cores. SMT has the effect of scaling up the number of hardwarelevel threads that the multicore system supports. Thus, a multicore system with four
cores and SMT that supports four simultaneous threads in each core appears the
same
as a 2multicoreCore
system
cores.
As software
Core to
0 the application
Core 1 level Core
3 with 16
Core
4
Core 5 is
developed to more fully exploit parallel resources, an SMT approach appears to be
more attractive than a superscalar approach.

Intel Core i7-990X

Bus interface

617

MESSAGE-PASSING MULTICOMPUTERS

32 kB 32 kB
L1-I L1-D

32 kB 32 kB
L1-I L1-D

32 kB 32 kB
L1-I L1-D

32 kB 32 kB
L1-I L1-D

32 kB 32 kB
L1-I L1-D

32 kB 32 kB
L1-I L1-D

256 kB
L2 Cache

256 kB
L2 Cache

256 kB
L2 Cache

256 kB
L2 Cache

256 kB
L2 Cache

256 kB
L2 Cache

As a consequence of these and other factors, there is a great deal


of interest in
Front-side bus
building and using parallel computers in which each CPU has its own private memory, not directly accessible to any other CPU. These are
the18.9
multicomputers.
ProFigure
Intel Core Duo Block
Diagram
grams on multicomputer CPUs interact using primitives like send and receive to
explicitly pass messages because they cannot get at each others memory with
LOAD and STORE instructions. Computer
This difference
Jan2016
Architecture completely changes the programming model.
Each node in a multicomputer consists of one or a few CPUs, some RAM
(conceivably shared among the CPUs at that node only), a disk and/or other I/O devices, and a communication processor. The communication processors are connected by a high-speed interconnection network of the types we discussed in Sec.
8.3.3. Many different topologies, switching schemes, and routing algorithms are
used. What all multicomputers have in common is that when an application proNKK-HUST
gram executes the send primitive, the communication processor is notified and
transmits a block of user data to the destination machine (possibly after first asking
for and getting permission). A generic multicomputer is shown in Fig. 8-36.
CPU

Node

Memory

Local interconnect

Disk
and
I/O

Local interconnect

Disk
and
I/O

Communication
processor
High-performance interconnection network

DDR3 Memory
Controllers

QuickPath
Interconnect

3 ! 8B @ 1.33 GT/s

4 ! 20B @ 6.4 GT/s

Figure 8-36. A generic multicomputer.

Figure 18.10 Intel Core i7-990X Block Diagram

The general structure of the Intel Core i7-990X is shown in Figure 18.10. Each
core has its own dedicated L2 cache and the four cores share a 12-MB L3 cache.
Computer
Architecture
One mechanism Intel uses to make
its caches
more effective is prefetching, in which
the hardware examines memory access patterns and attempts to fill the caches speculatively with data thats likely to be requested soon. It is interesting to compare the
performance of this three-level on chip cache organization with a comparable twolevel organization from Intel. Table 18.1 shows the cache access latency, in terms of
clock cycles for two Intel multicore systems running at the same clock frequency.
The Core 2 Quad has a shared L2 cache, similar to the Core Duo. The Core i7
improves on L2 cache performance with the use of the dedicated L2 caches, and
provides a relatively high-speed access to the L3 cache.
The Core i7-990X chip supports two forms of external communications to
other chips. The DDR3 memory controller brings the memory controller for the

Nguyn Kim Khnh DCE-HUST

498

9.3. a x l b nh phn tn

12 MB
L3 Cache

Jan2016

Thermal control

L3 cache

(c) Shared L2 cache

Figure 18.8

32-kB L1 Caches

CPU Core 1

Intel - Core Duo

675

Execution
resources

18.3 / MULTICORE ORGANIZATION

Arch. state

Cc dng t chc b x l a li

Arch. state

NKK-HUST

Execution
resources

NKK-HUST

Jan2016

The Intel Core Duo, introduced in 2006, implements two x86 superscalar processors
with a shared L2 cache (Figure 18.8c).
The general structure of the Intel Core Duo is shown in Figure 18.9. Let us
consider the key elements starting from the top of the figure. As is common in multicore systems, each core has its own dedicated L1 cache. In this case, each core has
a 32-kB instruction cache and a 32-kB data cache.
Each core has an independent thermal control unit. With the high transistor
density of todays chips, thermal management is a fundamental capability, especially for laptop and mobile systems. The Core Duo thermal control unit is designed
to manage chip heat dissipation to maximize processor performance within thermal
constraints. Thermal management also improves ergonomics with a cooler system
and lower fan acoustic noise. In essence, the thermal management unit monitors
digital sensors for high-accuracy die temperature measurements. Each core can
be defined as an independent thermal zone. The maximum temperature for each

32-kB L1 Caches

Bi ging Kin trc my tnh

499

My tnh qui m ln (Warehouse Scale Computers


Interconnection
NetworksProcessors MPP)
or 8.4.1
Massively
Parallel
In Fig. 8-36 we see that multicomputers are held together by interconnection
My
tnhNow
cm
(clusters)
networks.
it is time
to look more closely at these interconnection networks.

Jan2016

Interestingly enough, multiprocessors and multicomputers are surprisingly similar


in this respect because multiprocessors often have multiple memory modules that
must also be interconnected with one another and with the CPUs. Thus the materComputer
Architecture
ial in this section frequently applies
to both
kinds of systems.
The fundamental reason why multiprocessor and multicomputer interconnection networks are similar is that at the very bottom both of them use message

500

125

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

Mng lin kt
SEC. 8.4

MESSAGE-PASSING MULTICOMPUTERS

Massively Parallel Processors

619

n
(a)

(b)

n
n
(c)

(d)

Jan2016

624

PARALLEL

(e)

(f)

(g)

(h)

Figure 8-37. Various topologies. The heavy dots represent switches. The CPUs
Computer
and memories are not shown.
(a) A star.Architecture
(b) A complete interconnect. (c) A tree.
COMPUTER
ARCHITECTURES
CHAP. 8
(d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.

501

coherency between the L1 caches on Interconnection


the four CPUs.networks
Thus when
piece ofby their dimensionality. For
can abeshared
characterized
memory resides in more than oneour
cache,
accesses
to that storageisby
one processor
purposes,
the dimensionality
determined
by the number of choices there are
will be immediately visible to the to
other
threethe
processors.
A memory
reference
get from
source to the
destination.
If therethat
is never any choice (i.e., there is
misses on the L1 cache but hits on
theone
L2path
cache
takes
cycles. A the network is zero dimenonly
from
eachabout
source11toclock
each destination),
miss on L2 that hits on L3 takes about
a miss inonwhich
L3 that
has tocan be made, for example, go
sional.28Ifcycles.
there is Finally,
one dimension
a choice
go to the main DRAM takes about 75 cycles.
The four CPUs are connected via a high-bandwidth bus to a 3D torus network,
which requires six connections: up, down, north, south, east, and west. In addition,
NKK-HUST
each processor has a port to the collective network, used for broadcasting data to
all processors. The barrier port is used to speed up synchronization operations, giving each processor fast access to a specialized synchronization network.
At the next level up, IBM designed a custom card that holds one of the chips
shown in Fig. 8-38 along with 2 GB of DDR2 DRAM. The chip and the card are
shown in Fig. 8-39(a)(b) respectively.

Jan2016

Computer Architecture

Cluster
n

n
2-GB
DDR2
DRAM

(a)

Card
1 Chip
4 CPUs
2 GB

Board
32 Cards
32 Chips
128 CPUs
64 GB

Cabinet
32 Boards
1024 Cards
1024 Chips
4096 CPUs
2 TB

System
72 Cabinets
73728 Cards
73728 Chips
294912 CPUs
144 TB

(b)

(c)

(d)

(e)

Figure 8-39. The BlueGene/P: (a) chip. (b) card. (c) board. (d) cabinet.
(e) system.

The cards are mounted on plug-in boards, with 32 cards per board for a total of
32 chips (and thus 128 CPUs) per board. Since each card contains 2 GB of
DRAM, the boards contain 64 GB apiece. One board is illustrated in Fig. 8-39(c).
At the next level, 32 of these boards are plugged into a cabinet, packing 4096
CPUs into a single cabinet. A cabinet is illustrated in Fig. 8-39(d).
Finally, a full system, consisting of up to 72 cabinets with 294,912 CPUs, is
depicted in Fig. 8-39(e). A PowerPC 450 can issue up to 6 instructions/cycle, thus

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

502

NKK-HUST

IBM Blue Gene/P

Chip:
4 processors
8-MB L3 cache

H thng qui m ln
t tin: nhiu triu USD
Dng cho tnh ton khoa hc v cc bi
ton c s php ton v d liu rt ln
Siu my tnh

n
n
503

Jan2016

Nhiu my tnh c kt ni vi nhau bng


mng lin kt tc cao (~ Gbps)
Mi my tnh c th lm vic c lp (PC
hoc SMP)
Mi my tnh c gi l mt node
Cc my tnh c th c qun l lm vic
song song theo nhm (cluster)
Ton b h thng c th coi nh l mt my
tnh song song
Tnh sn sng cao
Kh nng chu li ln
Computer Architecture

504

126

Bi ging Kin trc my tnh

Jan2016

NKK-HUST

NKK-HUST

PC Cluster ca Google
SEC. 8.4

9.4. B x l ha a dng

635

MESSAGE-PASSING MULTICOMPUTERS

hold exactly 80 PCs and switches can be larger or smaller than 128 ports; these are
just typical values for a Google cluster.
OC-12 Fiber

OC-48 Fiber

n
n

128-port Gigabit
Ethernet switch

128-port Gigabit
Ethernet switch

Two gigabit
Ethernet links

n
80-PC rack

Kin trc SIMD


Xut pht t b x l ha GPU (Graphic
Processing Unit) h tr x l ha 2D v
3D: x l d liu song song
GPGPU General purpose Graphic
Processing Unit
H thng lai CPU/GPGPU
n
n

CPU l host: thc hin theo tun t


GPGPU: tnh ton song song

Figure 8-44. A typical Google cluster.

Jan2016

Power density is also a key


issue. A Architecture
typical PC burns about 120 watts or about
Computer
10 kW per rack. A rack needs about 3 m2 so that maintenance personnel can install and remove PCs and for the air conditioning to function. These parameters
give a power density of over 3000 watts/m2 . Most data centers are designed for
6001200 watts/m2 , so special measures are required to cool the racks.
Google has learned three key things about running massive Web servers that
bear repeating.

505

Jan2016

Computer Architecture

506

1. Components will fail so plan for it.

NKK-HUST

2. Replicate everything for throughput and availability.

NKK-HUST

3. Optimize price/performance.

B x l ha trong my tnh

GPGPU: NVIDIA Tesla


nStreaming
multiprocessor

8 Streaming
processors

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

507

Jan2016

Computer Architecture

508

127

and CUDA cores and other execution units in the SM execute threads. The SM executes
threads in groups of 32 threads called a warp. While programmers can generally ignore warp
execution for functional correctness and think of programming one thread, they can greatly
improve performance by having threads in a warp execute the same code path and access
memory in nearby addresses.

Bi ging Kin trc my tnh

Jan2016

An Overview of the Fermi Architecture


The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA
cores. A CUDA core executes a floating point or integer instruction per clock for a thread. The

NKK-HUST 512 CUDA cores are organized in 16 SMs of 32 cores each. The GPU has six 64-bit memory

NKK-HUST

GPGPU: NVIDIA Fermi

NVIDIA Fermi

partitions, for a 384-bit memory interface, supporting up to a total of 6 GB of GDDR5 DRAM


memory. A host interface connects the GPU to the CPU via PCI-Express. The GigaThread
global scheduler distributes thread blocks to SM thread schedulers.

Instruction Cache

Third Generation Streaming


Multiprocessor

C 16 Streaming
Multiprocessors (SM)
Mi SM c 32 CUDA
cores.
Mi CUDA core
(Cumpute Unified
Device Architecture) c
01 FPU v 01 IU

The third generation SM introduces several


architectural innovations that make it not only the
most powerful SM yet built, but also the most
programmable and efficient.

Jan2016

Fermis 16 SM are positioned around a common L2 cache. Each SM is a vertical


rectangular strip that contain an orange portion (scheduler and dispatch), a green portion
Computer
Architecture
(execution units), and light
blue portions
(register file and L1 cache).

509

Jan2016

Warp Scheduler

Dispatch Unit

Dispatch Unit

Register File (32,768 x 32-bit)

Core

Core

Core

Core

Core

Core

Core

Core

LD/ST
LD/ST

512 High Performance CUDA cores

Warp Scheduler

SFU

LD/ST
LD/ST

Each SM features 32 CUDA


LD/ST
CUDA Core
Core
Core
Core
Core
Dispatch Port
LD/ST
processorsa fourfold
Operand Collector
LD/ST
increase over prior SM
Core
Core
Core
Core
LD/ST
designs. Each CUDA
FP Unit
INT Unit
LD/ST
processor has a fully
Core
Core
Core
Core
LD/ST
Result Queue
pipelined integer arithmetic
LD/ST
logic unit (ALU) and floating
Core
Core
Core
Core
LD/ST
point unit (FPU). Prior GPUs used IEEE 754-1985
LD/ST
floating point arithmetic. The Fermi architecture
Core
Core
Core
Core
LD/ST
implements the new IEEE 754-2008 floating-point
LD/ST
standard, providing the fused multiply-add (FMA)
Core
Core
Core
Core
LD/ST
instruction for both single and double precision
arithmetic. FMA improves over a multiply-add
Interconnect Network
(MAD) instruction by doing the multiplication and
64 KB Shared Memory / L1 Cache
addition with a single final rounding step, with no
Uniform Cache
Cache
Uniform
loss of precision in the addition. FMA is more
Fermi Streaming Multiprocessor (SM)
accurate than performing the operations
separately. GT200 implemented double precision FMA.

SFU

SFU

SFU

Computer
Architecture
510
In GT200, the integer
ALU was
limited to 24-bit precision for multiply operations; as a result,
multi-instruction emulation sequences were required for integer arithmetic. In Fermi, the newly
designed integer ALU supports full 32-bit precision for all instructions, consistent with standard
programming language requirements. The integer ALU is also optimized to efficiently support
64-bit and extended precision operations. Various instructions are supported, including
Boolean, shift, move, compare, convert, bit-field extract, bit-reverse insert, and population
count.
16 Load/Store Units

NKK-HUST

Each SM has 16 load/store units, allowing source and destination addresses to be calculated
for sixteen threads per clock. Supporting units load and store the data at each address to
cache or DRAM.

8
8

Ht

Jan2016

Computer Architecture

Nguyn Kim Khnh DCE-HUST

511

128

You might also like