Pcasm Book K2opt

PC Assembly Language
Paul A.Carter
July 23, 2006
Copyright ? 2001, 2002, 2003, 2004 by Paul

Carter
This
may
be reproduced
entirety (including this

permission
made
for
notice)
the
and distributed
au-thorship,
provided
document
authors
consent.
excerpts
like
This
reviews
in its
copyright and
that
no
charge
itself,
without
includes
fair
and
is
the
use
advertising,
and
derivative works like trans-lations.
Note that this restriction is not intended to

prohibit charging for the service of printing
or
copying the document.
Instructors are encouraged to use this document as

a class resource; however, the author would
appreciate being notified inthis
case.
Contents
Preface
1 Introduction
1.1
1.2
.........
..........
...........
.......
.....
..........
.........
...
..
........
.
..
.........
Number Systems
1.1.1
1.1.2
Decimal
Binary
1.1.3
Hexadecimal
Computer Organization
1.2.1
Memory
1.2.2
The CPU
1.2.3
The 80x86 family of CPUs
1.2.4
8086 16-bit Registers
1.2.5
80386 32-bit registers
1.2.6
Real Mode
1.2.7
16-bit Protected Mode
1.2.8
1.2.9
Interrupts
1.3
1.4
.......
....
....
...
....
.........
....
.........
.......
.......
..
...
...
Assembly Language
1.3.1
Machine language
1.3.2
Assembly language
1.3.3
Instruction operands
1.3.4
Basic instructions
1.3.5
Directives
1.3.6
Input and Output
1.3.7
Debugging
1.4.1
a Program
First program
1.4.2
Compiler dependencies
1.4.3
Assembling the code
1.4.4
Compiling the C code
1.4.5
Linking the object files
Creating
...............
1
1
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
...............
1
1
3
4
4
5
6
7
8
8
9
10
10
11
11
11
12
12
...............
...............
...............
...............
...............
...............
...............
...............
............... .
.......
1.4.6
13
16
16
18
18
22
22
23
23
Understanding
listing file
23
ii
1.5
an assembly
Skeleton File
..............
.........
.....
.........
Basic Assembly Language
2.1
Working with Integers
2.1.1
Integer representation
2.1.2
Sign extension
2.2
.......
...........
..........
......
.....
2.1.3
Twos complement arithmetic
2.1.4
Example
2.1.5
Extended precision arithmetic
program
Control Structures
2.2.1
Comparisons
2.2.2
Branch instructions
2.2.3
The loop instructions
.............
.............
.............
.............
CONTENTS
25
27
27
27
30
.............
.............
.............
.............
.............
.............
.............
.
.......... ..........
33
35
36
37
37
38
41
2.3
Translating Standard Control Structures

42
2.4
3
If statements
2.3.2
While loops
2.3.3
Do while loops
Example: Finding Prime Numbers
Bit Operations
3.1
..........
.........
..
............
..........
...........
........
..........
2.3.1
Shift Operations
3.1.1
Logical shifts
3.1.2
Use of shifts
3.1.3
Arithmetic shifts
3.1.4
Rotate shifts
3.1.5
3.2
.......
......
......
.......
......
......
......
.....
....
.........
...
Simple application
Boolean Bitwise Operations
3.2.1
The AND operation
3.2.2
The OR operation
3.2.3
The XOR operation
3.2.4
The NOT operation
3.2.5
The TEST instruction
3.2.6
Uses of bit operations
3.3
Avoiding Conditional Branches
3.4
Manipulating bits inC
3.4.1
The bitwise operators of C
3.4.2
Using bitwise operators in C
.............
.............
.............
.............
3.5
Big and Little Endian Representations
42
43
43
43
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
47
47
47
48
48
49
49
50
50
50
51
51
51
52
53
.............
.............
.............
.............
......
....
.......................
56
56
56
57
3.5.1
When
to Care About
Little
and Big
.
.......................
....................
...
.................
...........
.....
................
...
..........
59 3.6
Endian
Counting Bits
60 3.6.1
Method
one
60 3.6.2
Method two
61 3.6.3
Method three
123
CONTENTS
Subprograms
4.1
Indirect Addressing
4.2
Simple Subprogram Example
4.3
The Stack
4.4
The CALL and RET Instructions
4.5
Calling Conventions
4.5.1
Passing parameters
on the stack
4.5.2
Local variables
..
........
......
.........
.......
.......
on the stack
4.6
Multi-Module Programs
4.7
Interfacing Assembly with C
4.7.1
Saving registers
4.7.2
Labels of functions
4.7.3
Passing parameters
............
............
............
............
............
............
iii
65
65
66
68
69
70
70
............
............
............
............
............
............
.
.......
....
................
75
77
80
81
82
82
4.7.4
Calculating addresses of local variables
...........................
............
.............
.......... ..............
...
.......
...
82 4.7.5
Returning values
83 4.7.6
Other calling conventions
83 4.7.7
Examples
85 4.7.8
Calling C
88
functions from assembly
4.8
Reentrant and Recursive Subprograms
89 4.8.1
Recursive subprograms
89 4.8.2
Review of C variable storage types
91
Arrays
............ ...............
............ ..........
............
..
.......
...
............
...
......
..
............
....
95
5.1
Introduction
95
5.1.1
Defining
arrays
95
5.1.2
Accessing elements of
arrays
96
1 3
More advanced indirect addressing
98
5.2
5.1.4
Example
5.1.5
Multidimensional Arrays
Array/String Instructions
memory
5.2.1
Reading and writing
5.2.2
The REP instruction prefix
5.2.3
Comparison string instructions
5.2.4
5.2.5
The REPx instruction prefixes

Example
Floating Point
6.1
Floating Point Representation
6.1.1
Non-integral binary numbers
.............
.............
.............
.............
.............
.............
.............
.............
.............
.............
..........
...................
99
103
106
106
108
109
109
111
117
117
117
6.1.2
IEEE floating point representation
119 6.2
Arithmetic
122
Floating Point
..............
.........................
....
........
.............
............
.............
........
.....
..........
.................
...........
..........
.............
...........
6.2.1
Addition
122
6.2.2
Subtraction
iv
6.3
6.2.3
Multiplication and division
6.2.4
Ramifications for programming
The Numeric Coprocessor
6.3.1
Hardware
6.3.2
6.3.3
Instructions
Examples
6.3.4
Quadratic formula
6.3.5
Reading
6.3.6
Finding primes
array
from file
Structures and C++
7.1
7.2
Structures
7.1.1
Introduction
7.1.2
7.1.3
Memory alignment
Bit Fields
7.1.4
Using structures inassembly
Assembly and C++
7.2.1
Overloading and Name Mangling
.............
..........
..............
........
.....
.......
7.2.2
References
7.2.3
Inline functions
7.2.4
Classes
7.2.5
Inheritance and Polymorphism
7.2.6
Other C++ features
A 80x86 Instructions
A.1Non-floating Point Instructions

A.2 Floating Point Instructions
Index
............
............
............
............
CONTENTS
123
124
124
124
............
............
............
............
............
............
............
............
............
............
............
............
125
130
130
133
135
143
143
143
145
146
150
150
151
............
............
............
............
............
............
............
153
154
156
166
171
173
173
179
181
Preface
Purpose
purpose
The
of this book is to give the reader
better understanding
at
work
languages
lower
like
Pascal.
than
By
often be much
gaining
more
productive
assembly
program
used
language
supported
real
books
The
mode.
address
the computer.
multitasking
discusses how to
processors
teach
to
how
that the original PC
8086
processor
In this
mode,
only
any
device in
This mode is not suitable
instead
C and
this goal. Other PC
any memory or
secure,
later
as
in assembly language
still
processor
1981!
program may
program
way to achieve
the 8086
in
deeper
developing
software in higher level languages such

C++. Learning to
is an excellent
really
in programming
of how computers work, the reader
understanding
can
how computers
of
level
for
operating system. This book
program
the 80386 and
inprotected mode (the mode that
Windows
and
supports
the
systems
Linux
runs
in).
features
that
modern
expect,
such
memory protection.
use protected mode:
as
There
virtual
are
This
mode
operating
memory and
reasons to
several
1.It is easier to program inprotected mode than

inthe 8086 real mode
that other books
use.
2.All modern PC operating systems
run in
protected mode.
3.There is free software available that
runs in
this mode.
The lack of textbooks
for protected
assembly
is the
programming
main
mode PC
reason
that
the author wrote this book.

As alluded to above, this text makes
Free/Open
assembler
Both
Source software:
and the DJGPP
of these
the Internet.
use
NASM
are
use
of
namely, the NASM
available
C/C++
compiler.
to download
from
The text also dis-cusses how to
assembly
code
under
the Linux
operating
sys-tem
found
for all of these
on
my
drpaulcarter
the example
many
with
C/C++ compilers
Microsofts
Examples
and
web
Borlands
and
under Win-dows
platforms
can
be
site: http://www.
com/pcasm. You must download
code if you wish to assemble and
run
of the examples inthis tutorial.
v
vi
PREFACE
Be aware that this text does not attempt to

cover every aspect of assem-bly programming. The
author has tried to cover the most important
topics that all programmers should be acquainted
with.
Acknowledgements
The author would like to thank the many
programmers
around the world that have
contributed to the Free/Open Source movement.
All the programs
and even this book itself
were
produced using free software. Specifically, the

author would like to thank John S. Fine, Simon
Tatham, Julian Hall and others for developing
the NASM assembler that allthe examples in

this book
are based on; DJ Delorie
for developing
the DJGPP C/C++ compiler used; the numerous

people who have contributed to the GNU
compiler
on which DJGPP
is based
gcc
on; Donald
Knuth and others for developing the TEX and

LATE X2 typesetting languages that
were used to
produce the book; Richard Stallman (founder of

the Free Software Foundation), Linus Torvalds
(creator of the Linux kernel) and others who
produced the underlying soft-ware the author used
to produce this work.

Thanks to the following people for
corrections:
John S. Fine
Marcelo Henrique Pinto de Almeida
Sam Hopkins
Nick DImperio
Jeremiah Lawrence
Ed Beroset
Jerry Gembarowski
Ziqiang Peng
Eno Compton
Josh I
Cates
Mik Miffl in
Luke Wallis
Gaku Ueda
Brian Heward
vii
Chad Gorshing
F. Gotti
Bob Wilkinson
Markus Koegel
Louis Taber
Dave Kiddell
Eduardo Horowitz
S ebastien Le Ray
Nehal Mistry
Jianyue Wang
Jeremias Kleer
Marc Janicki
Resources
Authors
page
NASM SourceForge
DJGPP
on the Internet
http://www.drpaulcarter.com/
page
http://sourceforge.net/projects/nasm/
Linux Assembly
The Art of Assembly
http://www.delorie.com/djgpp
http://www.linuxassembly.org/
http://webster.cs.ucr.edu/
USENET
Intel documentation
comp.lang.asm.x86
http://developer.intel.com/design/Pentium4/documentation.htm
Feedback
The author welcomes
any feedback on this
work.
E-mail:
pacman128@gmail.com
WWW:
http://www.
drpaulcarter.com/pcasm
viii
PREFACE
Chapter 1
Introduction
1.1
Number Systems
Memory in a computer consists of numbers

Computer
in
memory
decimal
simplifies
does not store these numbers
(base
the
10).
hardware,
Because
it
computers
greatly
store
all
information in a binary (base 2) format. First lets
review the decimal system.
1.1.1
Decimal
are composed of 10 possible

digits (0-9). Each digit of a number has a power
of 10 associated with it based on its position in
Base 10 numbers
the number. For example:
234
=2
10
+3
10
+4
10
1.1.2
Binary
are composed of 2 possible

digits (0 and 1).Each digit of a number has a
power of 2 associated with it based on its
Base 2 numbers
position inthe number. (A single binary digit is

called
a bit.) For example:
110012
+1 2 + 0
16 + 8 + 1
1 2
=
=
+0
+1
25
This shows how binary
may
be converted to
decimal. Table 1.1shows how the first few

numbers
are represented
inbinary.
Figure 1.1shows how individual binary
digits (i.e., bits)
are added.
Heres
1
2
CHAPTER 1.INTRODUCTION
an example:
Decimal
Binary
Decimal
Binary
0000
1000
0001
1001
0010
10
1010
0011
11
1011
0100
12
1100
0101
13
1101
0110
14
1110
0111
15
1111
Table 1.1: Decimal 0 to 15 inBinary

No previous
+0
+1
carry
1
+0
1
Previous
+1
+0
0
+1
carry
1
+0
+1
Figure 1.1: Binary addition (c stands for carry)
110112
+100012
1011002
If one considers the following decimal division:
1234
can see
he
that
10
= 123 r 4
this division
strips
off the
rightmost decimal digit of the number and shifts

the other decimal digits
Dividing
one position to the right.

a similar operation, but
by two performs
for the binary digits of the number.

1:
following binary division
11012
This
fact
can
102
= 1102
1.2
shows.
rightmost digit
significant
r1
be used to convert
number to its equivalent binary
Figure
Consider the
This
decimal
representation
method
finds
as
the
first, this digit is called the least
bit (lsb). The leftmost digit is called
the most significant bit (msb). The basic unit of
memory
consists of 8 bits and is called
a byte.
The 2 subscript is used to show that the number is
represented inbinary, not decimal
1.1. NUMBER SYSTEMS
Decimal
25
= 12 r 1
2 =6 r0
2 =3 r0
2 =1
r1
2 =0 r1
12
6
3
Binary
= 1100 r 1
= 110 r 0
10 = 11r 0
10 = 1
r1
10 = 0 r 1
11001
10
1100
10
110
11
=110012
Thus 2510
Figure 1.2: Decimal conversion
589
16
36
16
16
=
=
=
Thus 589
36 r13
2 r4
0 r2
=24D16
Figure 1.3:
1.1.3
Hexadecimal
Hexadecimal
Hexadecimal
numbers
use
can
(or hex for short)
base
16.
be used
as a
shorthand
for
binary
numbers.
possible digits. This creates
Hex
a problem
has
16
since there
are no symbols to use for these extra digits after

9. By convention, letters are used for these
extra digits. The 16 hex digits are 0-9 then A,
B, C, D, Eand F.The digit A is equivalent to
10 indecimal, B is 11, etc

number
has
a power
Example:
=
=
=
2BD16
To convert
idea that
Each digit of
of 16 associated
+ 11
512 + 176 + 13
2
16
16
+ 13
16
use
the
701
from decimal to hex,
was
a hex
with it.
used for binary
conversion
same
except
divide by 16. See Figure 1.3 for an example.
reason that hex is useful

very simple way to convert
The
is that there is
4
between hex and binary. Binary numbers get large

and cumbersome
quickly.
Hex provides
much
more compact way to represent binary.

To convert a hex number to binary, simply
convert each hex digit to a 4-bit binary number.
For example, 24D16
is converted
to 0010 0100
zeros of the 4-bits

zero for the middle
digit of 24D16 is not used the result is wrong.
Converting from binary to hex is just as easy. One
does the process in reverse. Convert each 4-bit
11012
are
Note that the leading
important! If the leading
segments of the binary to hex. Start from the

right end, not the left end of the binary number.
This
ensures
that the
process uses
the correct 4-bit
segments 2.Example:
110
0000
0101
1010
0111
11102
E16
A 4-bit number
is called
nibble
Thus
each hex digit

nibbles
make
represented
value
by
ranges
a
so a
to
corresponds
a
a
byte and
nibble. Two
can
byte
be
2-digit hex number. A bytes
from 0 to 11111111 in binary, 0 to
FF inhex and 0 to 255 indecimal.
1.2
1.2.1
Computer Organization
Memory
Memory
is
memory
megabytes
can hold
measured
is
computer
units of kilobytes ( 2
10
32
with
memory
of
roughly 32 million bytes of information.
Each byte in 1,024

(2
20
and
labeled
The basic unit of
in
byte.
by
address
as
Address
bytes),
megabytes
=1,048,576
gigabytes
unique
2 30
number
as
known
is
shows.
1,073,741,824
1.4
Figure
memory
bytes)
its
bytes).
Memory
2A
45
B8
20
8F
CD
12
2E
Figure 1.4: Memory Addresses
memory
Often
single
have
is used in larger chunks than
bytes. On the PC
been
memory as
given
larger
sections
stored by
memory is numeric.
using a char-acter code
to characters.
numbers
common
character
(American
is supplanting
One
codes
Standard
Interchange)
difference
of
Table 1.2 shows.
All data in
are
names
architecture,
to these
Code
is
for
that
maps
the
most
as
is known
new, more
ASCII
of
Characters
ASCII
Informa- tion
complete code that
Unicode.
One
key
between the two codes is that ASCII
uses
2
If
it is not
clear
why
the
starting
difference, try converting the example
point
makes
starting at the left.
1.2. COMPUTER ORGANIZATION
word
2 bytes
double word
4 bytes
quad word
8 bytes
paragraph
16 bytes
Table 1.2: Units of Memory
one
a character, but
per character.
byte to encode
two bytes (or a word)
maps
ASCII
character
004116
the
capital
byte
4116
the ASCII val-ues to words

characters
important
for representing
The CPU
the
the word
limited to
Unicode extends
and allows
to be represented.
languages of the world.
1.2.2
(6510 ) to
maps
uses a byte, it is
3
uses
For example,
A; Unicode
Since ASCII
only 256 different characters
more
Unicode
characters
many
This
is
for all the
The Central Processing Unit (CPU) is the

physical device that performs
instructions. The instructions that CPUs perform
are generally very simple.

Instructions may require
the data they act
on to
be inspecial storage loca-
tions inthe CPU itself called registers. The CPU
can access
data in registers
much faster than data inmemory. However, the

number of registers in a
CPU is limited,
care to keep
so the programmer must
take
only currently
used data inregisters.
The instructions
make
a type of CPU executes
up the CPUs machine

programs
language. Machine
have
a much more
basic structure than higherlevel languages. Machine language instructions
are encoded as raw numbers,

not in friendly text formats. A CPU must be able
an instructions
purpose very quickly to run effi ciently.
to decode
Machine
language is designed with

this goal inmind, not to be easily deciphered by
humans. Programs written

inother languages must be converted to the
native machine language of
the CPU to run on the computer. A compiler

a program that translates
programs written ina programming language
is
into the machine language of
a particular computer architecture. Ingeneral,

every type of CPU has its
own unique machine language. This is one reason
why programs written for
a Mac can not run onan IBM-type PC.
Computers use a clock to synchronize the
execution of the instructions.
gigahertz
(known
GHz stands for
The clock pulses at a fixed frequency
as the clock
billion cycles per

buy a 1.5 GHz computer,
speed). When
you or one
1.5 GHz is the frequency of this clock
4.
The clock
does not keep track of minutes and seconds. It simply beats at a constant
second.
A 1.5 GHz CPU
has 1.5 billion clock pulses
per second.
3
In fact, ASCII only
uses
has 128 different values to

4
Actually, clock pulses
the lower 7-bits and
so only
use.
are used
in many different
components of
a computer.
other components often

CPU.
The
use different
clock speeds than the
6
rate. The electronics of the CPU uses the beats

to perform their operations correctly, like how the
a metronome
beats of
help
one
play music at
the correct rhythm. The number of beats (or

they
are
requires
usually
depends
model. The
called
on
number
cycles)
an
1.2.3
of cycles depends
as
on
and
the
well.
The 80x86 family of CPUs
IBM-type
PCs contain
80x86 family (or

this
instruction
the CPU generation
instructions before it and other factors
as
family
including
more recent
CPU from Intels
a clone of one). The CPUs in

some common features
all have
base machine language. However, the

members greatly enhance
the features.
8088,8086: These CPUs from the programming

standpoint
are
identical.
They
were
the CPUs
used in the earliest PCs. They provide several
16-bit registers: AX, BX, CX, DX, SI, DI, BP, SP,
CS, DS, SS, ES, IP, FLAGS. They only support
up to one
memory and only operate

a program may access
even the memory of other
megabyte of
in real mode. In this mode,
any memory
programs!
very
address,
This makes
diffi cult! Also,
divided
debugging
and security
program memory
into segments. Each segment
has to be
can not
be
larger than 64K.
80286: This CPU

adds
some new
was
used in AT class PCs. It
instructions
to the base machine
language of the 8088/86. However, its main
new
feature is 16-bit protected mode. In this mode, it
can access up to 16 megabytes and protect

programs from accessing each others memory.
However,
programs are still divided into
segments that could not be bigger than 64K.
80386: This CPU greatly enhanced the 80286.

First, it extends
the registers
many
of
to hold 32-bits (EAX, EBX, ECX,
EDX, ESI, EDI, EBP, ESP, EIP) and adds two
new
new
16-bit registers
FS and GS. It also adds
32-bit protected
mode.
In this mode, it
can access up to
again
divided
segment
into
gigabytes.
segments,
Programs
80486/Pentium/Pentium
now
but
can also be up to 4 gigabytes
are
each
insize!
Pro:
These
members of the 80x86 family add
very
few
new
up the execution
features. They mainly speed
of the
instructions.
Pentium MMX: This
processor
adds the MMX
(MultiMedia eXtensions)
instructions
instructions
to
the
Pentium.
can speed up common
graphics operations.
These
AX
AH
AL
Figure 1.5: The AX register
Pentium II:This is the Pentium Pro processor

with the MMX instructions
added. (The Pentium IIIis essentially just
a faster
Pentium II.)
1.2.4
8086 16-bit Registers
The original 8086 CPU provided four 16-bit

general
purpose
Each of these
into
registers: AX, BX, CX and DX.

registers
could be decomposed
two 8-bit registers. For example, the AX
register could be decomposed

AL registers
as
register contains
Figure
the
upper
into
the AH and
1.5 shows. The AH

(or high) 8 bits of AX
and AL contains the lower 8 bits of AX. Often
AH and AL
are
used
as
independent
one
byte
registers; however, it is important
they
are not
independent
to realize that
of AX. Changing AXs
value will change AH and AL and vice

general
purpose
registers
are
versa.
The
used in many of the
data movement and arithmetic instructions.
are two 16-bit index registers: SI and

are often used as pointers, but can be
used for many of the same purposes as the
general
registers. However, they can not be
There
DI. They
decomposed into 8-bit registers.

The 16-bit BP and SP registers
point to data inthe
are
ma-chine
are
used to
language stack and
called the Base Pointer and Stack Pointer,
respectively. These will be discussed later.
are
memory is
a program. CS
The 16-bit CS, DS, SS and ES registers
segment registers.
used
stands
for different
They denote what
parts
of
for Code Segment, DS for Data Segment,
SS for Stack Segment and ES for Extra Segment.

ES is used
as a temporary segment register. The

are in Sections 1.2.6 and
details of these registers
1.2.7.
The Instruction Pointer (IP) register is used
with the CS register
to
keep
track
of the
address of the next instruction to be executed
by the
as an
CPU. Normally,
executed, IP is advanced
instruction
is
to point to the next
instruction inmemory.
The
information
instruction
FLAGS
about
stores
register
the
These
results
results
of
are
important
previous
stored
as
individual bits inthe register. For example, the Z

bit is 1
if the result of the previous
was zero or
0 if not
zero.
instruction
Not all instructions
modify the bits in FLAGS, consult the table in

the appendix
to
see
how individual instructions
affect the FLAGS register.
8
1.2.5
80386 32-bit registers
80386
The
and
processors
later
extended registers. For example,
have
the 16-bit AX
register is extended to be 32-bits. To be backward
compatible,
AX still refers to the 16-bit register
and EAX is used to refer to the extended
32-bit
register. AX is the lower 16-bits of EAX just
as
AL is the lower 8-bits of AX (and EAX). There is
no way to access
the
upper
directly. The other extended
16-bits
registers
of EAX
are
EBX,
ECX, EDX, ESI and EDI.
Many of the other registers
well.
BP
becomes
EBP;
SP
are
extended
becomes
as
ESP;
FLAGS becomes EFLAGS and IP becomes EIP.
However
registers,
unlike the index and general

in 32-bit
below)
only
the
registers
are used.
protected
extended
The segment registers
are
mode
versions
purpose
(discussed
of
these
still 16-bit in the
80386. There
FS and
are
new segment registers:

names do not stand for
also two
GS. Their
anything
are extra temporary
They
segment
registers (like ES).
of the term word refers to
One of definitions
the size of the data registers
80x86 family, the term is

In Table 1.2, one
sees
was
80386
was
was
developed, it
given this meaning
released.
first
was
When
the
decided to leave the
of word unchanged,
definition
confusing.
that word is defined to be
2 bytes (or 16 bits). It

when the 8086
of the CPU. For the
now a little
even
though the
register size changed.
1.2.6
Real Mode
where
So
memory
did
is limited
bytes). Valid
range
mous
from?
required
The BIOS
some of the 1M
one
to only
DOS
like the video
megabyte
640K
limit
(2
20
address
to FFFFF. These
a 20-
bit number. Obviously,
a 20-bit
number will not fit into
any of the 8086s
16-bit registers. Intel solved this problem, by using two 16-bit values to
for its code and for hard-
ware devices
screen.
In real mode,
infa-
from (in hex) 00000
addresses require
come
the
determine
an address.
The first 16-bit value is called the selector. Selector
values must be stored in segment registers. The second 16-bit value is called
the offset. The physical address referenced by
computed by the formula
a 32-bit
selector:offset
pair is
16 selector
+ offset
Multiplying by 16 inhex is easy, just add
a 0 to
the right of the number. For example, the
physical addresses referenced by 047C:0048 is

given by:
047C0
+0048
04808
In effect, the selector value is a paragraph
number (see Table 1.2).
Real segmented addresses have disadvantages:
can only reference 64K

memory (the upper limit of the 16-bit offset).
What if a program has more than 64K of code?
A single value in CS can not be used for the
entire execution of the program. The program
must be split up into sections (called segments)
less than 64K in size. When execution moves
from one seg-ment to another, the value of CS
must be changed. Similar problems occur with
A single selector value
of
large amounts
This
of data
and the DS register.
can be very awkward!
Each byte in
memory
does not have
unique
segmented address. The

physical address 04808
can
be referenced
by
047C:0048, 047D:0038,
047E:0028
or
047B:0058.
the comparison of
This
can
seg-
mented addresses.
1.2.7
complicate
Inthe 80286s 16-bit protected mode,

selector values
are interpreted
completely differently than inreal mode.

real mode,
a selector
In
value
is a paragraph number of physical
memory. In
a selector
value is an index into a descriptor table.
In
both modes, programs are
protected mode,
divided into segments. Inreal mode, these
segments
are at fixed positions

memory and the selector
inphysical
value
denotes the paragraph number

of the beginning of the segment. Inprotected
mode, the segments
are not
at fixed positions inphysical
memory.
In fact,
they do not have to be in
memory at all!
Protected mode uses a technique called virtual
memory The basic idea
of a virtual memory system is to only keep the
data and code in memory that
programs are currently using. Other data and
code are stored temporarily
on disk until they are needed again. In16-bit
protected mode, segments
are
memory and disk as needed.

a segment is returned
to memory from disk, it is very likely that it will
be put into a different area
of memory that it was inbefore being moved to
moved between
When
disk. All of this is done

transparently
program
by the operating system. The
does not have to be
written differently for virtual
memory to work.
Inprotected mode, each segment is assigned
an entry
ina descriptor
table. This entry has allthe information that

the system needs to know
about the segment. This information includes:
is it currently in memory;
if in memory, where is it;access permissions
(e.g., read-only). The index

of the entry of the segment is the selector value
that is stored in segment

registers.
One big disadvantage of 16-bit protected
mode is that offsets
are still
One
well-known
PC 16-bit quantities. As a consequence of this,
segment sizes
the 286
arrays
dead.
are still limited to
columnist called
at most 64K. This makes the
problematic!
use of large
CPU brain
10
1.2.8
The 80386 introduced 32-bit protected mode.

There
are two major
dif-ferences between 386
32-bit and 286 16-bit protected modes:
are expanded to be 32-bits. This

an offset to range up
to 4 billion. Thus, segments can have sizes
1.Offsets
allows
up to 4 gigabytes.
can be divided into smaller 4K-sized

pages
The virtual memory system
works with pages now instead of segments. This
means that only parts of segment may be in
memory at any one time. In 286 16-bit mode,
either the entire segment is in memory or none of
2. Segments
units called
it
is. This
is not
practical
with
the
larger
segments that 32-bit mode allows.

InWindows 3.x, standard mode referred to
286 16-bit protected mode and enhanced mode
referred to 32-bit mode. Windows 9X, Windows

OS/2 and Linux all run inpaged
NT/2000/XP,
32-bit protected mode.
1.2.9
Interrupts
the ordinary
Sometimes
flow
of
a program
must be interrupted to process events that require
response
The hardware of a computer
a mechanism called interrupts to handle
these events. For example, when a mouse is
moved
the mouse hardware
interrupts
the
current program to handle the mouse movement
(to move the mouse cursor, etc. )Interrupts
cause control to be passed to an interrupt
handler. Interrupt handlers are routines that
process the interrupt. Each type of interrupt is
assigned an integer number.
At the beginning
of physical memory, a table of inter-rupt vectors
prompt
provides
resides that contain the segmented

the interrupt
essentially
an index
External
addresses of
handlers. The number of interrupt is

into this table.
interrupts
the CPU. (The
mouse
are
raised from outside
is an example of this type.)
Many
keyboard,
cards)
devices
I/O
raise
interrupts
(e.g.,
timer, disk drives, CD-ROM and sound
are raised from within

an error or the interrupt
interrupts are also called traps.
Internal interrupts
the CPU, either from

instruction. Error
Interrupts
generated
are
instruction
uses
called
these types of interrupts
(Application
modern
UNIX)
Interface)
interface.
Many interrupt
handlers
program
restore
values they
More
as Windows
and
return control back

when they
all the registers
to the
had before the interrupt
Thus, the interrupted
DOS
to implement its API
Programming
use a C based
interrupt
interrupts.
operating systems (such
to the interrupted
They
the
from
software
finish.
same
occurred.
program runs as if nothing

some CPU cycles).
happened (except that it lost

Traps generally
do not return. Often they abort
the program.
5
However,
kernel level.
they
may use a
lower level interface
at the
1.3. ASSEMBLY LANGUAGE
11
1.3
Assembly Language
1.3.1
Machine language
type
Every
machine
language
of
language.
are
understands
CPU
Instructions
its
in
own
machine
as bytes in memory.
own unique numeric
its operation code or opcode
for
80x86 processors instructions vary
numbers stored
Each instruc-tion has its

code
called
short. The
insize. The opcode is always at the beginning of

the instruction.
Many instructions
data (e.g., constants
or
addresses)
also include
used by the
instruction.
Machine
program
language
is
very
in directly. Deciphering
of the numerical-coded
for humans.
says to add
diffi cult
instructions
For example,
to
the meanings
is tedious
the instruction
the EAX and EBX registers
that
together
and store the result back into EAX is encoded by
the following hex codes:
C3
This is hardly obvious. Fortunately,

called
an assembler can do
a program
this tedious work for
the programmer.
1.3.2
Assembly language
An assembly
text (just
as a
language
program
is stored
higher level language
program).
Each assembly instruction represents exactly
ma-chine
as
one
instruction. For example, the addition
instruction described above would be represented
as:
inassembly language
add
eax, ebx
Here the meaning
of the instruction
is much
clearer than in machine code. The word add is

mnemonic
for
the
addition
instruction.
general form of an assembly instruction is:

mnemonic operand(s)
The
An assembler is a program that reads
a text
file with assembly instructions and converts the assembly into machine
code. Compilers
are programs
that do similar conversions for high-level
programming languages. An assembler is much simpler than
a compiler.
assembly language statement
years
for directly represents
Every
It took several
a single
machine
computer
instruction. High-level language statescientists to fig-ments
are much more complex
may
require many machine instructions.

out how to even write
and
ure
Another important difference between

assembly and high-level languages
that since
every
a compiler!
is
different type of CPU has its
own machine language, it

also has its own assembly language.
assembly programs between
Porting
12
computer
different
architectures
diffi cult than in a high-level

This
books
off the Internet
com-mon
More
examples
or NASM
Assembler
Assembler
is much
more
language.
uses
the
Netwide
for short. It is freely available
(see the preface for the URL).
are Microsofts
or Borlands Assembler
are some differences in the
assemblers
(MASM)
There
(TASM).
assembly syntax for MASM/-TASM and NASM.
1.3.3
Instruction operands
Machine
number
and
code
type
instructions
of
general, each instruction

number of
oper-ands
operands;
have
varying
however,
a fixed
can have
itself will have
(0 to 3). Operands
the following types:

register: These operands refer directly to the
contents of the CPUs regis-
ters.
in
memory:
These refer to data in
address of the data
may
a constant hardcoded
or may be computed using
values
registers.
of
memory.
The
be
into the instruction
are
Address
always
offsets from the beginning of a
segment.
immediate:
These
are
values
fixed
that
are
listed inthe instruction itself.

They
are
stored in the instruction itself (in
the code segment), not in

the data segment.
implied:
These
operands
are not
explicitly
shown. For example, the in-
crement instruction adds
or memory.
The
one to a register
one is
implied.
1.3.4
The
Basic instructions
most
instruction.
It
basic
moves
instruction
data from
is
one
the
MOV
location to
another
(like
the
assignment
operator
in
high-level language). It takes two operands:
mov
dest,
src
The data specified
by
src
is copied to dest
One restriction is that both operands
memory
operands.
This
quirk
assembly.
There
of
points
are
may not
out
be
another
often somewhat
arbitrary rules about how the various instructions
are
used. The operands
size. The value of AX

Here
comment):
is
an
must also be the
can not be stored
example
(semicolons
same
into BL.
start
13
mov
eax, 3
;
store
3 into EAX
register (3 is immediate operand)
mov
bx, ax
;
store
the value of AX
The ADD instruction is used to add integers.

add
eax, 4
add
al,ah
= eax + 4
;
al= al+ ah
;
eax
The SUB instruction subtracts integers.

sub
sub
-10
ebx, edi ;
ebx = ebx -edi
bx, 10
;
bx = bx
The INC and DEC instructions increment
or
into the BX register
decrement values by
Since the
one is an implicit
one.
operand, the
machine code for INC and DEC is smaller than for

the equivalent ADD and SUB instructions.
inc
ecx
;
ecx++
dec
dl
;
dl--
1.3.5
Directives
A directive is an artifact of the assembler not

the CPU. They
are gen-erally
used to either
instruct the assembler to do something

the assembler of something. They
or inform
are not
translated into machine code. Com-mon uses of

directives
are:
define constants
memory to store data into

group memory into segments
conditionally include source code
define
include other files
passes through a preprocessor

many of the same preprocessor
commands as C.However, NASMs preprocessor
di-rectives start with a % instead of a # as inC.
NASM code
just like C. It has
equ directive
equ directive can be used to define a
symbol. Symbols are named constants that can be
used inthe assembly program. The format is:
The
The
symbol
equ value
Symbol values
can not
be redefined later.
14
Unit
Letter
byte
word
double word
quad word
ten bytes
Table 1.3: Letters for RESX and DX Directives
The %define directive
This directive is similar to Cs #define

directive. It is most commonly
used to define constant
macros
just
as in C.
%define SIZE 100
mov
eax, SIZE
The above code defines

and shows its
use
a macro
named SIZE
in a MOV instruction. Macros
are more flexible than

Macros can be redefined
simple constant numbers.
symbols
and
can
in two
be
more
ways
than
Data directives
are used in data segments to

define room for memory. There are two ways
memory can be reserved. The first way only
defines room for data; the second way defines
room and an initial value. The first method uses
one of the RESX directives. The X is replaced with
a letter that determines the size of the object (or
Data directives
objects) that will be stored. Table 1.3 shows the

possible values.
The second method (that defines
uses one of the

are the same as
an
initial
value, too)
DX directives. The X
letters
those
in the
RESX
directives.
It is very
common to mark memory locations

with labels. Labels allow one to easily refer to
memory locations in code. Below are several
examples:
L1
db
;
byte
with initial value 0 L2
;
word
L3
dw
labeled L1
1000
labeled L2 with initial value 1000
db
110101b
;
byte
initialized
to binary 110101 (53 indecimal) L4
;
byte
12h
db
initialized to hex 12 (18
indecimal) L5
;
byte
17o
db
initialized to octal 17 (15 indecimal) L6
dd
;
double
1A92h
to hex 1A92 L7
word initialized
;
1
resb
uninitialized byte
L8
db
;
byte
"A"
initialized
to ASCII code for A (65)
Double quotes and single quotes
same. Consecutive data

definitions are stored sequentially
memory. That is,the word L2 is stored
immediately after L1in memory.
are
treated the
in
Sequences
1.3. ASSEMBLY
inC.
LANGUAGE
15
L9
db
d, 0
db
0,1,2,3
defines 4 bytes L10
db
"w", "o", "r",
;
defines a C string
= "word"
L11
;
same as
word, 0
L10
The DD directive
can
be used to define both
integer
and
single
precision
constants. However, the DQ
point
floating
can
only be used to
define double precision floating point constants.
For large
sequences,
NASMs
TIMES directive
is often useful. This direc-tive repeats its operand
a specified
L12
number of times. For example,
times 100 db 0
equivalent to 100 (db 0)s
L13
resw
;
reserves room
100
for 100 words
can be used to refer to

data in code. There are two ways that a label can
be used. If a plain label is used, it is interpreted as
Remember
that labels
the address (or offset) of the data. If the label is

placed inside
as
square
brackets ([]),it is interpreted
the data at the address. In other words,
one
a label as a pointer to the data
and
should think of
the square brackets dereferences
the pointer just
the asterisk does in C. (MASM/TASM

different
convention.)
mov
as
a
In 32-bit mode, addresses
are 32-bit. Here are some

1
follow
examples:
al,[L1]
;
copy
byte
at L1into AL
mov
;
EAX
eax, L1
mov
address of byte at L1 3
;
copy
ah
mov
[L1],
AH into byte at L1
;
copy
eax, [L6]
double word at L6 into EAX
eax, [L6]
;
EAX
at L6
add
add
= EAX +double
[L6], eax
word
double word at L6 += EAX
mov
al,[L6]
;
copy
first
byte of double word at L6 into AL
shows an important
assem-bler does not keep
track of the type of data that a label refers to. It
is up to the programmer to make sure that he (or
she) uses a label correctly. Later
it will be
common to store addresses of data in registers
and use the register like a pointer variable in C.
Again, no checking is made that a pointer is
used correctly. In this way, assembly is much
more error prone than even C.
Line
7 of the
examples
property of NASM. The
Consider the following instruction:
mov
[L6], 1
;
store a 1
at
L6
This
not
statement
specified
produces
error.
an
operation
Why?
Because
size
the
assembler does not know whether to store the 1

as
a byte,
word
or
double word. To fix this, add
size specifier:
mov
dword [L6], 1
;
store a 1
at L6
6
Single precision floating point is equivalent to a float
variable in C.
16
This tells the assembler to store
an 1
at the
double word that starts at L6. Other size
are: BYTE, WORD, QWORD
specifiers
1.3.6
and TWORD
7.
Input and Output
Input and output

activities.
It
are very system
involves
in-terfacing
dependent
with
the
systems hardware. High level languages, like C
provide standard libraries of routines that provide
a simple,
uniform programming
interface for I/O.
no standard libraries
access hardware
operation in pro-tected mode)
Assembly languages provide
must
They
either
(which is a privileged
or use
whatever
directly
low
level
routines
that
the
operating system provides.

It is
very common
be interfaced
that the assembly

library
the
code
I/O routines
rules
for assembly
with C. One
for
can use
However,
passing
routines to
advantage
of this is
the standard C
one must
information
know
between
complicated
uses.
These rules are too
to cover here. (They are covered
later!)
simplify
routines
that
To
developed
C
rules
the
I/O,
author
has
his own routines that hide the complex
and
1 .4
Table
interface.
provide
provided. All of the
more
much
describes
the
rou-tines preserve
simple
routines
the value of
all registers, except for the read routines. These
routines do modify the value of the EAX register.

To use these routines,
information
them.
that
one must
the
a
preprocessor
To include
%include
a file with
to use
NASM, use the
include
assembler
file in
needs
directive.
The
following
line includes the file needed by the authors I/O

8:
routines
%include "asm_io.inc"
To
use one
one
uses a
of the print routines,
EAX with the correct
value
and
loads
CALL
instruction to invoke it. The CALL instruction is

equivalent
to
function
call in
high level
language. It jumps execution to another
section
of code, but returns back to its origin after
the routine is over. The example
program
below
shows
several
examples
of calls to these I/O
routines.
1.3.7
Debugging
authors
library also contains some
for debugging programs
These
The
useful routines
debugging routines display information about the

state of the computer without modifying the state.
These
routines
are really macros
7
defines
TWORD
floating point
8
that
code
The
..
asm io
asm io
a ten
byte
coprocessor uses
downloads
(and
inc
requires)
inc
on
the
area
of
memory.
The
this data type.
the
are
asm io
in
web
object
the
page
file
example
for
this
tutorial, http://www.drpaulcarter.com/pcasm
17
print int
prints out to the screen the value
of the integer stored
inEAX
print char
prints out to the screen the
character whose ASCII
value stored inAL
print string
prints out to the
screen
the
contents of the string at

the address stored inEAX. The
string must be a C-
type string (i.e. null terminated).

print nl
prints out to the
line character. read int
screen a new
an integer
reads
from the keyboard and stores it into

the EAX register.
read char
reads
a single
character from the
keyboard and stores

its ASCII code into the EAX
register.
Table 1.4: Assembly I/O Routines
preserve the current state of the CPU and

a subroutine call. The macros are
defined inthe asm io.inc file discussed above.
Macros are used like ordinary instructions.
Operands of macros are separated by commas.
There are four debugging routines named
dump regs, dump mem, dump stack and dump math;
they display the values of registers, memory,
that
then make
stack and the math

dump
regs
This
coprocessor,
macro
respectively.
prints out the values of
the registers (in hexadeci-mal) of the computer to
stdout (i.e. the screen). It also displays the bits
set in the FLAGS
zero flag is 1,ZF

displayed.
register. For example, if the
is displayed
that is printed
distinguish
the
If it is 0,it is not
a single integer
out as well. This can
It takes
output
of
argument
be used to
dump
different
regs
commands.
dump
of
as
mem
a region
macro
of memory (in
This
.
.
ASCII characters.
arguments
delimited
that
dump
is used
regs
prints out the values
hexadecimal) and also
It takes
to label
argument)
the
paragraphs
memory
output
integer
(just
is
can
be
the number
as
will
start
on
is
label.)
of 16-byte
to display after the address
displayed
comma
an
The second argument
the address to display. (This

The last argument
three
The first is
the
The
first
paragraph boundary before the requested address.

dump stack This
macro
prints out the values
on the CPU stack. (The

stack will be covered
stack is organized
in Chapter
4.) The
as double
words and this routine displays them this
way.
9
It
takes
three
comma
Chapter 2 discusses this register
18
arguments
delimited
The first is
an
integer
label (like dump regs). The second is the number

of double words to display below the address
that
the EBP register holds and the third argument is

the number of double words to display above the
address inEBP.
dump math This
macro
prints out the values of
the registers of the math
coprocessor.
It
takes
single
integer
argument that is used to label

dump
1.4
the
output
regs
does.
Creating
just
as
the
of
a Program
Today, it is unusual to create
program
argument
written completely
stand alone
in assembly language
Assembly
is usually used to key certain critical
rou-tines.
Why? It is much easier to program in a
higher level language than in assembly. Also, using
assembly
makes
other platforms.
a program very hard to port to

In fact, it is rare to use assembly
at all.
So, why should
anyone
learn assembly at all?
1.Sometimes code written inassembly
can be
faster and smaller than
compiler generated code.
access to direct
2.Assembly allows
hardware
features of the system that
might be diffi cult
or impossible to use from a
higher level language.
3.Learning to program inassembly helps
one
gain a deeper understanding of how computers work.
4.Learning to program inassembly helps
one
understand better how
compilers and high level languages like C

work.
These
last
two
learning assembly
programs
points
demonstrate
that
can be useful even if one never
in it later. In fact, the author rarely
programs
in assembly, but he
uses
the ideas he
learned from it everyday.
1.4.1
First
The early
program
programs
in this text will all start
from the simple C driver

It
simply
asm main.
calls
program
another
This is really
in Figure 1.6
function
named
a routine that will be

are several advantages
written inassembly. There
in using the C driver routine. First, this lets the

C system set
protected
up the program to run correctly
mode.
corresponding
All the segments
code need not
worry
about
of this. Secondly, the C library will also be
available to be used by the assembly
1.4. CREATING A PROGRAM
19
in
their
segment registers will be initialized
by C. The assembly
any
and
code. The
int main()
int ret status
ret status
=asm main();
return ret status
Figure 1.6: driver.c code

advantage of this. They
use Cs
I/O
functions (printf, etc.). The following
shows
a simple
assembly
program.
first.asm
;
file: first.asm
;
First assembly program. This program
;
input and prints out their sum.
;
To create executable using djgpp:
;
nasm -f coff first.asm
;
gcc -o first first.o driver.c asm_io.
89
10
12
;
initialized
;
13
segment .data
11
data is put inthe .data
14
asks for two integers
as
segment
15
;
These
labels refer to strings used
for output
16
17
prompt1 db
;
dont
"Enter
a number:
forget null terminator
18
",0
prompt2
"Enter another number: ",0
db
19
outmsg1 db
"You entered ",0
20
outmsg2 db
"and ",0
21
outmsg3 db
",the
sum of these
is
",0
22
23
24
;
;
uninitialized
data is put inthe
.bss segment
25
26
segment .bss
27
28
;
;
These
labels refer to double words
used to store the inputs
29
20
30
input1
resd 1
31
input2
resd 1
32
35
;
;
code
;
36
segment .text
33
34
37
38
is put inthe .text segment
global
_asm_main
_asm_main:
39
enter
40
pusha
0,0
41
42
mov
eax, prompt1
43
call
print_string
45
call
read_int
46
mov
[input1],
48
mov
eax, prompt2
49
call
print_string
44
eax
;
;
47
50
51
call
read_int
52
mov
[input2],
eax
;
;
54
mov
55
add
56
mov
eax, [input1]
eax, [input2]
ebx, eax
;
;
;
53
57
dump_regs 1
58
dump_mem
59
60
61
62
2,outmsg1, 1
;
;
next
;
print out result
message
63
mov
eax, outmsg1
64
call
print_string
65
mov
eax, [input1]
66
call
print_int
67
mov
eax, outmsg2
68
call
print_string
69
mov
eax, [input2]
70
call
print_int
71
mov
eax, outmsg3
setup routine
print out prompt
read integer
store into input1

print out prompt
read integer
store into input2
eax
= dword at input1
eax += dword at input2

ebx
= eax
;
print
;
print
as series
;
print
;
print
;
print
;
print
out register values

out
memory
of steps
out first
message
out input1
out second
out input2
message
72
call
print_string
73
mov
eax, ebx
74
call
print_int
75
call
print_nl
76
77
popa
78
mov
79
leave
eax, 0
21
;
print
out third
;
print
out
;
print
new-line
;
return
80
message
sum (ebx)
back to C
ret
first.asm
program defines a section of

the program that specifies memory to be stored
in the data segment (whose name is .data)
Line 13 of the
initialized
Only
data should be defined
segment. On lines 17 to 21, several

declared.
They
will be printed
so must
library and
in this
be terminated
are
strings
with
the
with
character (ASCII code 0). Remember
null
there is
big difference between 0 and 0.
Uninitialized
data should be declared in the
on line 26). This

segment gets its name from an early UNIX-based
assem-bler operator that meant block started by
symbol. There is also a stack segment too. It
bss
segment
(named
.bss
will be discussed later.

The
code
historically.
segment
is
.text
named
are
It is where instructions
placed.
Note that the code label for the main routine

(line 38) has
an
underscore
of the C calling
specifies the rules C

It is
very
prefix. This is part
convention.
important
uses
now, one
will be presented;
only needs to know
(i.e., functions
and
conven-tion
to know this convention
when interfacing C and assembly

tire convention
This
when compiling code
global
Later the
en-
however,
for
that all C symbols

variables)
have
underscore
compiler.
to them by the C
prefix appended
(This
DOS/Windows,
rule
is
specifically
the Linux C compiler
for
does not
prepend anything to C sym-bol names.)
on line 37 tells the

asm main label global.
Unlike in C, labels have internal scope by default.
This means that only code in the same module
can use the label. The global directive gives the
specified label (or labels) external scope. This
type of label can be accessed by any module in
the program. The asm io module declares the
The
assembler
global
directive
to make the
print int, et.al. labels to be global. This is

why
one can use them inthe
first.asm module.
22
1.4.2
Compiler dependencies
The assembly code above is specific to the free

10-based
11
GNU
DJGPP C/C++ compiler.
This
compiler
can
be freely downloaded
ternet. It requires
runs
386-based PC
under DOS, Windows
compiler
uses
95/98
from the In-
or better and
or NT. This
object files in the COFF (Common
Object File Format) format. To assemble to this
format
use
the -f coff switch
with
nasm
(as
shown in the comments of the above code). The

extension of
the resulting object file will be o.
a GNU compiler also.

above to run under Linux,
The Linux C compiler is

To convert the code
simply
remove
the underscore prefixes in lines 37
and 38. Linux
uses
the ELF (Executable
and
Linkable Format) format for object files. Use the

-f elf switch for Linux. It also produces
an
object
The compiler specific
ex- with ano extension.
ample files, available from
another
Microsoft
popular
the authors
for object
Borland C/C++ is
compiler.
files. Use
uses
It
web site, have
the
OMF format
the -f obj
switch
for
Borland compilers.
already been modified to
work with the appropriate
compiler.
The extension of the object file willbe obj. The OMF format
uses differ-
ent segment directives than the other object formats. The data segment
(line 13) must be changed to:
segment
DATA public align=4 class=DATA
use32
The bss segment (line 26) must be changed to:

segment
BSS public align=4 class=BSS
use32
The text segment (line 36) must be changed to:

segment
TEXT public align=1 class=CODE
use32
Inaddition
line 36:
a new
line should be added before
group
The
DGROUP
Microsoft
BSS
DATA
compiler
C/C++
or the
given a
can use
either the OMF format
Win32 format for
object
OMF
files.
converts
(If
the
information
to
format,
Win32
internally.)
Win32 format allows segments
defined just
as
format
to be
for DJGPP and Linux. Use the
-f win32 switch to output
in this mode. The
extension of the object file will be obj.
1.4.3
Assembling the code
The first step is to assembly the code. From the

command line, type:
nasm
10
it
-f object-format first.asm
GNU is a project of the Free Software Foundation
(http://www.fsf.org)
11
http://www.delorie.com/djgpp
23
where object-format
is either coff, elf
win32 depending
on
used. (Remember
that the
what
C compiler
source
changed for both Linux and Borland
1.4.4
or
obj
will be
file must be
as well.)
Compiling the C code
Compile the driver.c file using
For DJGPP,
a C compiler.
use:
gcc -c driver.c
-c switch means to just compile, do not

same switch works on
Linux, Borland and Microsoft compilers as well.
The
attempt to link yet. This
1.4.5
Linking the object files
Linking
is
the
process
of
machine code and data in object

files together to create
combining
the
files and library
an executable
file. As will
be shown below, this process is complicated.

C code requires the standard
special startup code to
let the C compiler
correct
pa-rameters,
run.
C library and
It is much easier to
call the linker
directly. For example, to link the
first
program
using DJGPP,
gcc -o first
This creates
with the
than to try to call the linker
use:
driver.o first.o
an executable
code for the
asm io.o
called first.exe (or
just first under Linux).

With Borland,
one would use:
bcc32 first.obj driver.obj
Borland
uses the name
of the first file listed to
determine the executable
case, the program
asm io.obj
name. So inthe above
would be named first.exe.
It is possible to combine the compiling and

linking step. For example,
gcc -o first
Now
driver.c first.o
gcc will compile
asm io.o
driver.c and then link.
1.4.6
Understanding
an assembly
listing file
The -llisting-file
tell
nasm to create a listing
This file shows
how the
can be used to
file of a given name.
code was assembled.
switch
Here is how lines 17 and 18 (in the data segment)
appear
in the listing file. (The line numbers
are
in the listing file; however notice that the line

numbers in the
source
file may not be the
same as
and little endian. Big endian is t
24
48 00000000 456E7465722061206Eprompt1 db
"Enter
a number:
",0 49
00000009 756D6265723A2000
50 00000011 456E74657220616E6Fprompt2 db
510000001A 74686572206E756D6252 00000023 65723A2000
The first column in each line is the line number
and the second is the offset (in hex) of the data in
raw
the segment. The third column shows the

hex
values that will be stored. In this
case
the
hex data correspond to ASCII codes. Finally, the
The offsets listed in the second
on the line
column are very
likely not the true offsets that
the data will be
text from the
source
file is displayed
placed at in the complete
may
program.
Each module
define its own labels inthe data segment (and
the other segments,
too). In the link
section
these
1.4.5),
all
definitions
are
segment.
The
combined
new
final
data
to
step (see
segment
form
offsets
one
are
label
data
then
computed by the linker.

Here is
the
source
small section (lines 54 to 56 of
file) of the text segment in the listing
file:
94
0000002C
eax,
mov
A1 [00000000]
95 00000031 0305[04000000]
[input1]
add
eax,
mov
ebx, eax
[input2] 96
00000037
89C3
The
third
generated
code for
column
shows
the
by the assembly.
machine
Often
an instruction can not
code
the complete
be computed yet.
For example, in line 94 the offset (or address) of

input1 is not known until the code is linked.
The assembler
can compute
the op-code for the
mov
instruction
(which from the listing is A1),
but
it writes
the
In this
in
offset
because the exact value
can not
case, a temporary
square
brackets
be computed yet.
offset
of 0 is used
because input1 is at the beginning of the part of
the bss segment

this
does
beginning
program.
not
of
defined in this file. Remember
mean
the
that
final
When the code
it will be
bss
segment
at the
of
the
is linked, the linker
will insert the correct offset into the position.

Other instructions, like line 96, do not reference
any
labels. Here the assembler
can compute
the
complete machine code.
Big and Little Endian Representation
one looks closely at line 95, something seems

very strange about the offset in the square
If
brackets of the machine code. The input2 label

is at offset 4 (as defined in this file); however, the
offset that
appears
but 04000000.
multibyte
in
memory
is not 00000004,
Why? Different
processors store
integers in different orders in memory.

There
Endian is pronounced
like
endian and little endian

method indian.
are two popular
methods of
storing integers
Big endian
:
big
is the
1.5. SKELETON FILE
25
that
seems
the most natural
most significant)
next
biggest,
00000004
etc. For
would be stored
processors
method. However,
The biggest (i.e
example,
as
the
dword
the four bytes 00 00
processors
use this big endian
Intel-based processors use the
00 04. IBM mainframes,

and Motorola
byte is stored first, then the
most RISC
all
little endian method! Here the least significant
byte
is stored first. So, 00000004 is stored in memory
as 04 00 00 00. This format is hardwired into the

CPU and can not be changed. Normally, the
programmer does not need to worry about which
format is used. However, there are circumstances
where it is important.
1.When binary data is transfered between

different computers (either
from files
or through a network).
2.When binary data is written out to memory
as a multibyte
integer
and then read back
as individual
bytes
or vice
versa.
Endianness
array
always
strings
does not apply to the order of
elements.
The first element of
at the lowest
(which
are
address.
just
an array
This applies
character
Endianness still applies to the individual

of the
1.5
arrays)
is
to
elements
arrays.
Skeleton File
Figure 1.7 shows
as a starting
programs.
used
a skeleton
file that
can be
point for writing assembly
26
skel.
1
segment .data
;
initialized data
;
segment here
is put inthe data
segment .bss
8
9
;
;
uninitialized
data is put inthe bss
segment
10
11
12
13
14
segment .text
global
_asm_main
_asm_main:
15
enter
16
pusha
0,0
17
18
19
20
21
;
;
code is put inthe text
;
or after this comment.
;
22
23
popa
segment.
24
mov
25
leave
;
setup
eax, 0
routine
Do not modify the code before
;
return
26
back to C
ret
Figure 1.7: Skeleton Program
skel.asm
Chapter 2
Basic Assembly
Language
2.1
2.1.1
Working with Integers
Integer representation
Integers
signed
come
in two flavors: unsigned and
Unsigned
non-negative)
are
straightforward
binary
integers
(which
represented
manner.
in
are
very
The number 200
as an one byte unsigned integer would be represented as by 11001000 (or C8 inhex).

Signed integers (which may be positive or
negative) are represented in a more complicated
ways. For example, consider 56. +56 as a byte
would be represented by 00111000. repr
On esen
paper,
ted
one
by
could represent 56 as 111000, but how would
this be represented
memory.
in
byte in the computers
How would the minus sign be stored?
There
are
three
general
have been used to represent
computer
memory
techniques
that
signed integers in
All of these methods
the most significant
bit of the integer
use
as a sign
bit.This bit is 0 if the number is positive and 1

if negative.
Signed magnitude
The first method is the simplest and is called

signed magnitude. It
rep-resents
the integer
as two
parts. The first part is the sign bit and the second
is the magnitude
be represented
as
bit is underlined)
The
largest
+127
of the integer. So 56 would
the byte
or
(the sign
byte value would be 01111111
and the smallest
11111111
00111000
and 56 would be 10111000.
byte
127. To negate
bit is reversed.
value
a value,
or
be
the sign
This method is straightforward,
First, there are

zero, +0 (00000000) and
zero is neither positive nor
but it does have its drawbacks.
two possible values of

0 (10000000).
would
Since
negative, both of these representations
same.
the
This
complicates
the
should act
logic
of
arithmetic for the CPU. Secondly,
27
28
CHAPTER 2.BASIC
ASSEMBLY
LANGUAGE
general arithmetic is also complicated.

general arit If 10 is
added to 56, this must be recast
as 10
subtracted by 56. Again, this complicates the

logic of the CPU.
Ones complement
The
second
method
is
known
as
ones
complement representation. The ones complement

of
a number
number.
new
is found by reversing each bit in the
(Another
way to
look at it is that the
bit value is 1
oldbitvalue.) For example, the
bit
value.
)For
ones
complement
In ones
ones
of 0 0111000 (+56) is 11000111.
com-plement
complement
is
notation,
equivalent
computing
to
Thus, 11000111 is the representation
the
nega-tion.
for 56.
was
Note that the sign bit

by ones complement
automatically
and that
as one
pect taking the ones complement

the original number. As for
there
are two
(+0) and
representations
ex-
the first method,

of
zero:
00000000
with ones
numbers is complicated.
There is
complement
would
twice yields
11111111 (0). Arithmetic
complement
changed
a
of
handy trick to finding the ones
a number
in hexadecimal without
converting it to binary. The trick is to subtract the

hex digit from F(or 15 indecimal). This method
assumes that the number of bits in the number

a multiple of 4. Here is an example: +56
represented
is
is
by 38 in hex. To find the ones
complement, subtract each digit from F to get C7

inhex. This
agrees
with the result above.
Twos complement
The first two methods described

early computers
Modern
method called two s complement

The
twos complement
the following two steps:
of
were used on
use a third
computers
representation.
number is found by
1.Find the ones complement of the number
2.Add
Heres
one to the result
an example
of step 1
using 00111000 (56). First the
ones complement is com-puted: 11000111. Then
one is added:
11000111
1
11001000
In two complements
twos
complement
number.
Thus,
complement
negations
should reproduce
WITH INTEGERS
the
to negating
is equiv-alent
11001000
is
rep-resentation
Surprising
twos
The C language
29
notation, computing
of
the
twos
56.
Two
the original number.
complement
does
meet
2.1. WORKING
this
Number
Hex Representation
00
01
127
7F
-128
80
-127
81
-2
FE
-1
FF
Table 2.1: Twos Complement
Representation
complement of 11001000 by adding
one to the
ones complement.
00110111
1
00111000
When performing
complement
leftmost bit
not
used.
computer
number
may
the
produce
Remember
is of
of
the addition in the twos
operation,
some
bits).
addition
a carry.
that
fixed
Adding
This
all data
of
size (in terms
two
bytes
the
carry is
on the
of
always
produces
byte
words produces
important
twos
for
example,
as a result (just as adding two

a word, etc.) This property is
complement
complement
notation.
zero as a one
consider
number (00000000)
byte
For
twos
Computing
its
two complement produces the sum:

11111111
c
where
c represents
00000000
a carry.
carry,
shown how to detect this

in
the
result.)
Thus,
notation there is only

complement
arithmetic
in
(Later
it will be
but it is not stored
twos
one zero. This
complement
makes twos
simpler than the previous
methods.
twos
Using
byte
can
complement
notation,
be used to represent
to +127. Table 2.1 shows

If 16 bits
are
to +32, 767
signed
the numbers 128
some
selected values
used, the signed numbers 32, 768
can
be represented.
+32, 767 is
represented
be
re prese ntby
ed.7FFF, 32, 768 by 8000, -128
FF80 and -1 as
numbers
range
approximately.
as
FFFF. 32 bit twos complement

from
billion
to +2 billion
no
The CPU has

(or word
or double
idea what
particular byte
word) is supposed to represent.
Assembly does not have the idea of types that
high level language has. How data is interpreted

depends
Whether
on what
instruction is used
the hex value
data
FF is considered
a signed 1 or a unsigned
on the programmer. The C language
represent
on the
to
+255 depends
30
CHAPTER 2.BASIC
ASSEMBLY
LANGUAGE
defines signed and unsigned integer types.
This allows
a C compiler to
determine the
correct instructions to use with the data.
2.1.2
Sign extension
a specified size. It
uncommon to need to change the size of
to use it with other data. Decreasing size is
In assembly, all data has

is not
data
the easiest.
Decreasing size of data
To decrease the size of data, simply
the
more
remove
a
significant bits of the data. Heres
trivial example:
mov
;
ax =52 (stored
ax, 0034h
in16 bits)
mov
cl,al
lower 8-bits of
Of
course,
if
;
cl=
ax
the
number
can not
be
represented
correctly
in
the
smaller
size,
decreasing the size does not work. For example,

if AX
were
above
code
method
would
308 in decimal) then the
still set
CL to 34h. This
works with both signed and unsigned
numbers.
Consider
FFFFh (1
(1
0134h (or
as a
as a
byte).
signed numbers,
was
if AX
word), then CL would be FFh
However, note that this is not
correct if the value inAX was unsigned!
The rule for unsigned numbers is that all the

bits being removed must be 0 for the conversion
to be correct. The rule for signed numbers is

that the bits being removed must be either all 1s
or
all 0s. In addition,
removed
must
have
the first
the
same
removed bits. This bit will be the
bit not being

value
new
as
the smaller value. It is important that it be
as the original
the
sign bit of
same
sign bit!
Increasing size of data
Increasing the size of data is
than decreasing. Consider

is extended
to
more
complicated
the hex byte FF. If it
word, what value should the
word have? It depends
a unsigned
If FF is
on how
FF is interpreted.
byte (255 in decimal), then
the word should be 00FF; however, if it is
signed byte (1 in decimal), then the word should

be FFFF.
an
In general, to extend
one
makes
new
all the
unsigned
bits
number,
of the expanded
number 0. Thus, FF becomes
00FF. However,
a signed number, one must extend the

bit. This means that the new bits become
to extend
sign
copies of the sign bit. Since the sign bit of FF
is 1, the
new
bits must also be all
ones, to
produce FFFF. If the signed number 5A (90 in
decimal)
was extended,
the result would be 005A.
2.1. WORKING WITH INTEGERS
31
are
There
provides
that
several instructions that the 80386
for extension
the computer
number
is
signed
of
does
or
numbers.
not know
unsigned.
programmer to use the correct

For unsigned
numbers,
It is
Remember
whether
up to
the
instruction.
one can
simply
put
zeros
in the
upper
bits using
MOV instruction.
an
For example, to extend the byte in AL to

unsigned word inAX:
mov
;
zero
ah, 0
it is not
However,
out
upper
possible
an
unsigned
double
no way to
EAX in a MOV. The
providing
a new
There is
instruction
word
use a
to
instruction to convert the unsigned
8-bits
MOV
word in AX to
in EAX. Why
specify the
upper
not?
16 bits of
80386 solves this problem

instruction
has two operands.
MOVZX
by
This
The destination
a 16 or 32 bit register.
source (second operand) may be an 8 or 16 bit
register or a byte or word of memory. The other
(first operand) must be
The
restriction is that the destination

than the
source.
(Most
must be larger
instructions
require
source and destination to be the same

are some examples:
the
size.) Here
movzx
eax, ax
;
extends ax into eax
movzx
eax, al
;
extends
alinto
eax
movzx
ax, al
;
extends
alinto
ax
movzx
ebx, ax
;
extends ax into
ebx
no easy way to
any case. The 8086
For signed numbers, there is
use
the MOV instruction
provided
several
numbers.
The
for
instructions
CBW
to extend
instruction sign extends the AL register

The operands
are
Word to Double
signed
to Word)
Byte
(Convert
into AX.
implicit. The CWD (Convert
word) instruction
sign extends
means to
as one 32 bit
AX into DX:AX. The notation DX:AX

think of the DX and AX registers
register with the
upper
16 bits in DX and the
lower bits in AX. (Remember
any
new
that the 8086 did
not have
32 bit registers!) The 80386 added
several
instructions.
Word
to Double
sign extends
Double
word
word
The
(Convert
instruction
AX into EAX. The CDQ (Convert
to Quad word) instruction
extends EAX into EDX:EAX

the MOVSX instruction
uses the rules
CWDE
Extended)
sign
(64 bits!). Finally,
works like MOVZX except it
for signed numbers.
Application to C programming
Extending
also
occurs
of unsigned
and signed
in C. Variables in
integers
ANSI C does not
define
may
be declared
unsigned (int is signed)

char type is
the code inFigure 2.1. Inline 3,the variable
as
either signed
Consider
a is extended
using the rules
for unsigned values (using MOVZX), but inline 4,the signed rules
are used
signed
or not, it is up to
each individual compiler to

decide this.
for b(using MOVSX).
or
whether the
That is why
the type is explicitly de-
fined inFigure 2.1.
32
CHAPTER 2.BASIC
ASSEMBLY
LANGUAGE
=0xFF;
=0xFF;
unsigned char uchar
signed char
schar
a = (int) uchar;
int
int b
= (int) schar;
a = 255
/ b
= 1
(0x000000FF) /
(0xFFFFFFFF) /
Figure 2.1:
char ch;
while( (ch
= fgetc(fp))
!= EOF ){
/ do something with ch /
}
Figure 2.2:
There is a common C programming bug

that directly relates to this subject. Consider the
code inFigure 2.2. The prototype of fgetc()is:
int fgetc( FILE *);
One might question why does the function return
back
an int since
reason
an char
it reads characters? The
is that it normally
does return back
an int value using zero extension).

one value that it may return
that is not a character, EOF. This is a macro
that is usually defined as 1. Thus,
us fgetc()
ual ly d
either returns back a char extended to an int
value (which looks like 000000xx in hex) or EOF
(ex-tended to
However, there is
(which looks like FFFFFFFF inhex).

The
basic
problem
the program in
re-turns an int, but
with
Figure 2.2 is that fgetc()
this value is stored in a char. C will truncate the
higher
order bits to fit the int value into the
char. The only problem is that the numbers (in

hex)
000000FF
truncated
can not
and
FFFFFFFF
both
will be
to the byte FF. Thus, the while loop
distinguish between reading the byte FF
from the file and end of file.

Exactly
depends
Why?
case,
or unsigned.
what the code does in this
on whether
char is signed
Because in line 2, ch is compared with
an int value
ch will be
extended to an int so that two values being
2.
compared are of the same size
As Figure 2.1
showed, where the variable is signed or unsigned
is very important.
EOF.
Since
EOF is
If char is unsigned,
000000FF.
to be
FF is extended
This is compared
to EOF (FFFFFFFF)
and found to be not equal. Thus, the loop
never
ends!
1
It is
a common
misconception that files have
an EOF
character at their end. This is not true!

2
The
reason
for this requirement will be shown later.
33
char
If
is
signed,
FFFFFFFF. This does
FF
is
compare as
extended
to
equal and the
loop ends. However, since the byte FF may have
been read from the file, the loop could be ending

prematurely.
The solution to this problem is to define the
as an int, not a char. When this

no truncating or extension is done in line
ch variable
is
done,
2.
Inside
the loop, it is safe to truncate
the value
since ch must actually be a simple byte there.
2.1.3
As
Twos complement arithmetic
was seen
performs
performs
addition
earlier,
and
subtraction.
the
the
Two
add
instruction
sub
instruction
of the bits
FLAGS register that these instructions set
overflow and
carry
in the
are the
flag. The overflow flag is set
if the true result of the operation is too big to
fit into the destination
for signed
arithmetic.
carry flag is set if there is a carry in the

msb of an addition or a borrow inthe msb of a
subtraction. Thus, it can be used to detect
overflow for unsigned arithmetic. The uses of the
carry flag for signed arithmetic will be seen
The
shortly.
subtraction
integers.
signed
or
of the great
One
complement
advantages
is that the rules
are
same as
may
exactly the
Thus, add and sub
FFFF
on
43
a carry generated,
answer.
There are two different
There is
but it is not part of
multiply
instructions. First, to mul-tiply

IMUL
be used
(1)
(1)
002B
used
for unsigned
44
002C
or
2s
and
unsigned integers.
the
of
for addition
instruction.
and divide
use either
the MUL
The MUL instruction
is
to multiply unsigned numbers and IMUL is
used to multiply
different
signed integers.
instructions
multiplication
complement
are
needed?
different
Why
The
are two
rules
for
for unsigned and 2s
signed numbers. How so? Con-sider
the multiplication
yielding
of the byte FF with itself
word
result.
Using
multiplication this is 255 times 255
unsigned
or 65025
FE01 in hex). Using signed multiplication
or 1
(or 0001 in hex).
There
or 1
(or
are
0001
several
in hex).
forms of the
(or
this is
1 times 1
1
multiplication
instructions. The oldest form looks like:

mul
The
source
source
reference.
Exactly
depends
is either
It
what
on the
can not
register
be
an
multiplication
size of the
or a memory
immediate
source
is
value.
performed
operand. If the
operand is byte sized, it is multiplied by the byte
in the AL register and the result is stored in the
16 bits of AX. If
multiplied
the
source
is 16-bit, it is
by the word in AX and the 32-bit
34
BASIC ASSEMBLY
CHAPTER 2
LANGUAGE
dest
source1
source2
Action
reg/mem32
= AL*source1
DX:AX =AX*source1
EDX:EAX =EAX*source1
reg16
reg/mem16
dest *= source1
reg32
reg/mem32
dest *= source1
reg16
immed8
dest *= immed8
reg32
immed8
dest *= immed8
reg16
immed16
reg32
immed32
reg16
reg/mem16
immed8
dest
reg32
reg/mem32
immed8
dest
reg16
reg/mem16
immed16
reg32
reg/mem32
immed32
reg/mem8
AX
reg/mem16
dest *= immed16
dest *= immed32
=source1*source2
=source1*source2
dest =source1*source2
dest =source1*source2
Table 2.2: imul Instructions
is stored inDX:AX. If the
source
is 32-bit, it is
multiplied by EAX and the 64-bit result is stored
into EDX:EAX.
The IMUL instruction has the same formats
MUL, but also adds

There
some
as
other instruction formats.
are two and three operand
imul
dest, source1
imul
dest, source1, source2
formats:
Table 2.2 shows the possible combinations.

The two division operators
are DIV and
IDIV. They perform unsigned and signed integer

division respectively. The general format is:
div
source
If the
source
is 8-bit, then AX is divided by the
operand. The quotient

remainder
DX:AX
is stored in AL and the
in AH. If the
source
is stored into AX and remainder
source
is 16-bit, then
is divided by the operand. The quotient
into DX. If the
is 32-bit, then EDX:EAX
the operand
is divided by
and the quotient is stored into EAX
and the remainder into EDX. The IDIV instruction

works the
instructions
same way.
like
There
the special
are no
IMUL
special IDIV
ones.
If the
or the
zero, the program is interrupted and
terminates. A very common error is to forget to
initialize DX or EDX before division.
quotient
is too big to fit into its register
divisor is
The
NEG
instruction
negates
its
operand by computing its twos complement.

operand
may
be
any
8-bit, 16-bit,
or
single
Its
32-bit
register
or memory
location.
2.1.4
Example programmath.asm
segment .data
prompt
db
"Enter
square_msg
db
"Square of input
;
Output
strings
a number:
35
",0
is ",0
cube_msg
is ",0
db
"Cube of input
cube25_msg
db
input times 25 is ",0
db
8
"Cube of
quot_msg
"Quotient of cube/100 is ",0
rem_msg
db
"Remainder of
neg_msg
db
"The negation
10
11
segment .bss
12
input
resd 1
13
14
15
16
segment .text
global
_asm_main
_asm_main:
17
enter
18
pusha
0,0
19
20
mov
eax, prompt
21
call
print_string
23
call
read_int
24
mov
[input],
imul
eax
ebx, eax
28
mov
mov
29
call
print_string
30
mov
eax, ebx
31
call
print_int
32
call
print_nl
22
eax
25
26
27
33
eax, square_msg
;
;
34
mov
ebx, eax
35
imul
ebx, [input]
36
mov
eax, cube_msg
37
call
print_string
38
mov
eax, ebx
39
call
print_int
40
call
print_nl
cube/100 is ",0
of the remainder is ",0
setup routine
edx:eax
=eax *eax
save answer
inebx
ebx *= [input]
36
ASSEMBLY
CHAPTER 2.BASIC
LANGUAGE
41
42
imul
ecx, ebx, 25
43
mov
eax, cube25_msg
44
call
print_string
45
mov
eax, ecx
46
call
print_int
47
call
print_nl
49
mov
eax, ebx
50
cdq
51
mov
ecx, 100
52
idiv
54
mov
mov
ecx
ecx, eax
eax, quot_msg
55
call
print_string
56
mov
eax, ecx
57
call
print_int
58
call
print_nl
59
mov
eax, rem_msg
48
53
60
call
print_string
61
mov
eax, edx
62
call
print_int
63
call
print_nl
neg
mov
edx
66
67
call
print_string
68
mov
eax, edx
69
call
print_int
70
call
print_nl
64
65
eax, neg_msg
71
72
popa
73
mov
74
leave
;
ecx
eax, 0
=ebx*25
;
initialize
;
cant
edx by sign extension
divide by immediate value
;
edx:eax / ecx
;
save quotient
into
ecx
;
negate
the remainder
;
return
back to C
ret
75
2.1.5
Extended precision arithmetic
Assembly
.
that
math.asm
allow
language also provides
one
to
perform
instructions
addition
and
subtraction of numbers larger than double words
These in-structions
use
the
carry
flag. As stated
above, both the ADD and SUB instruc-tions modify
the
carry
flag if
respectively.
a carry or borrow are
generated,
2.2. CONTROL STRUCTURES
37
This information stored inthe

used to add
breaking
or subtract
carry
flag
can be
large numbers by
up the operation
into smaller double
word (or smaller) pieces.

The ADC and SBB instructions
use this
information inthe carry flag. The ADC instruction

performs the following operation:
operand1
flag
= operand1 + carry
+ operand2
The SBB instruction performs:

operand1
= operand1 -carry
flag
operand2
How
are these used? Consider
the sum of 64-bit
integers inEDX:EAX and EBX:ECX. The

following code would store the sum inEDX:EAX:
add
eax, ecx
;
add lower
adc
edx, ebx
;
add
32-bits
2
upper
32-bits and
carry
from previous
sum
Subtraction is very similar. The following code
subtracts EBX:ECX from

EDX:EAX:
lower 32-bits
sbb
;
subtract upper
For really
to
use
ADC
(instead of all but the first iteration).
can be done
instruction
right
initialize the
carry
there is
a loop could be
a sum loop, it would
instruction for every
large numbers,
be convenient
This
edx, ebx
32-bits and borrow
used (see Section 2.2). For
iteration
;
subtract
eax, ecx
sub
no
instructions.
by using the CLC (CLear Carry)

the
before
loop
flag to 0.If the
difference
The
starts
carry
to
flag is 0,
between the ADD and ADC
same
idea
can
be used
for
subtraction, too.
2.2
Control Structures
High
control
level
languages
structures
(e.g.,
provide
high
the
and
if
level
while
statements) that control the thread of execution.
Assembly lan-guage does not provide such complex

control structures
goto
and
used
It instead
uses the
can
inappropriately
infamous
result
in
spaghetti code! How-ever, it is possible to write

structured
assembly
basic procedure
using the familiar

and translate
assembly
language
programs
program
is to design the
high level
The
logic
control structures
the design into the appropriate
language
(much like
compiler
would
do).
2.2.1
Comparisons
Control structures decide what to do based
on comparisons of data. In assembly, the result of

a comparison is stored inthe FLAGS register to
be
38
CHAPTER 2.BASIC
ASSEMBLY
used
LANGUAGE
later.
The
80x86
instruction to perform comparisons

register is set based
operands
are
of
on the
provides
the
CMP
The FLAGS
difference of the two
the CMP instruction.
subtracted and the FLAGS
The operands
are set
based
on
the result, but the result is not stored anywhere.

If you need the
result
use the SUB instead
of the CMP instruction.
are two flags (bits

are important: the
zero (ZF) and carry (CF) flags. The zero flag is
set (1) if the resulting difference would be zero.
The carry flag is used as a
borrow
flag
for
subtraction.
Consider
a
For unsigned integers, there
in the FLAGS register)
that
comparison like:
cmp
vleft, vright
The difference
of vleft
- vright
is computed
are set accord-ingly. If the difference

of the of CMP is zero, vleft = vright, then ZF is
and the flags
set (i.e. 1) and the CF is unset (i.e. 0). If vleft
> vright,
then ZF is unset and CF is unset (no
<
borrow). If vleft
vright, then ZF is unset

and CF
is set (borrow).
are three flags that

are important: the zero
For signed integers, there
= OF if
Why does SF
(ZF) flag, the overflow
(OF) flag and the sign (SF) flag. The overflow
flag vleft
> vright?
an operation
If there
is set if the result of
overflows (or underflows)
flag
The sign
is no overflow, then the

will
difference
have
the
is
set
if the result of an operation is negative.
is
set
(just
If vleft
as for unsigned
integers). If vleft
= vright, the ZF
> vright, ZF is unset
correct
value
be non-negative.
SF
and
must
Thus,
SF
=OF =0.However,
if there is an overflow, the
and
= OF. If vleft < vright, ZF is unset and SF 6= OF.
Do not forget that other instructions
can also change
the FLAGS register,
not just CMP.
difference will not have the
correct
value
instructions
(and
will be negative).
SF
in fact
transfer
Branch
Thus,
= OF = 1
can
2.2.2
execution
Branch instructions
points of a
to arbitrary
program. In other
are two types
words, they act like

of branches:
conditional. An unconditional
a goto,
it
conditional
always
branch depending
JMP
(short
unconditional
usually
and
branch is just like
makes
the
branch.
branch does not make
conditional
the branch, control

The
There
may or may not make the

on the flags in the FLAGS
branch
register. If
a goto.
unconditional
passes to the next
for
jump)
branches
Its
instruction.
instruction
single
makes
argument
is
code label to the instruction to branch
to.The assembler
or linker
will replace the label
with correct address of the in-struction. This is
one
another
easier.
of the tedious operations
that the
does to make the programmers
assembler
It
statement
is
important
immediately
to realize
that
life
the
after the JMP instruction
will never be executed
unless another instruction branches to it!

There
are
several
variations
of
the
jump
instruction:
SHORT This jump is
very
can only move up or down
range. It
in memory.
limited in
128 bytes
39
JZ
branches only if ZF is set
JNZ
branches only if ZF is unset
JO
branches only if OF is set
JNO
branches only if OF is unset
JS
branches only if SF is set
JNS
branches only if SF is unset
JC
branches only if CF is set
JNC
branches only if CF is unset
JP
branches only if PF is set
JNP
branches only if PF is unset
Table 2.3: Simple Conditional Branches
memory
than the others.
signed byte to store the

jump. The displacement
move
ahead
or
behind.
added to EIP). To specify

SHORT
keyword
It
uses a
displacement
is how
(The
immediately
many
single
of the
bytes to
displacement
is
use
the
short jump,
before the label in
the JMP instruction.
NEAR This jump is the default type for both
unconditional
and condi-tional branches
it
can
be used to jump to any location in a seg-ment
Actually, the 80386 supports two types of
near
uses two bytes for the displacement.

one to move up or down roughly
32,000 bytes. The other type uses four bytes
for the displacement, which of course allows one
to move to any location in the code segment.
jumps. One
This allows
The
four
protected
byte
mode.
type
The
is the
two
default
byte
type
specified by putting the WORD keyword
in 386
can
be
before the
label inthe JMP instruction.

FAR This jump allows control to move to
another code segment. This is a
very rare thing to do in386 protected

mode.
Valid code labels follow

data labels. Code labels
them
in the
code
are
segment
the
same
rules
as
defined by placing
in front
of
the
statement they label. A colon is placed at the end
of the label at its point of definition. The colon is
not part of the name.

There
are many
instructions.
different conditional branch
a code label as their

simplest ones just look at a
They also take
single operand. The
single flag in the FLAGS register to determine
whether to branch
of these
which
indicates
number
or not.
instructions.
the
See Table 2.3 for a list
(PF is the parity
flag
or evenness
the
odd
of bits set in the lower 8-bits
of
of the
result.)
The following pseudo-code:
40
CHAPTER 2.BASIC
ASSEMBLY
LANGUAGE
== 0 )
= 1;
if (EAX
EBX
else
EBX
= 2;
could be written inassembly
cmp
as:
eax, 0
jz
thenblock
mov
ebx, 2
jmp
next
thenblock:
mov
ebx, 1
next:
Other comparisons
;
set
flags (ZF set if eax
;
if ZF is set
of IF
;
jump over
THEN part of IF
;
THEN part
of IF
so easy
-0 =0)
branch to thenblock
;
ELSE part
not
using the conditional branches in
Table 2.3. To illustrate, consider the following

pseudo-code:
if (EAX
EBX
>= 5 )
= 1;
are
else
EBX
= 2;
If EAX is greater than
may be set or unset
or equal to five, the
ZF
and SF will equal OF. Here
is assembly code that tests for these conditions

(assuming that EAX is signed):
cmp
eax, 5
js
signon if SF
=1
jo
;
goto
elseblock if OF
jmp
thenblock
if SF
jo
elseblock
=1
and SF =0
;
goto
= 0 and OF =0
;
goto
signon
signon:
;
goto
thenblock
thenblock if SF
thenblock
=1
and OF =1
elseblock:
8
mov
ebx, 2
jmp
next
10
thenblock:
mov
11
12
ebx, 1
next:
The above code is very awkward. Fortunately,

the 80x86 provides addi-tional branch instructions
to make these type of tests much easier. There
are signed
and unsigned versions of each
Table
2.4 shows these instruc-tions. The equal and not

equal branches (JE and JNE)
are
the
same
for
both signed and unsigned integers. (In fact, JE

and JNE are really identical
41
Unsigned
Signed
= vright
JE
branches if vleft
JNE
branches if vleft 6= vright
JL, JNGE
branches if vleft
JLE, JNG
branches if vleft vright
JG, JNLE
branches if vleft
JGE, JNL
< vright
> vright
=vright
JE
branches if vleft
JNE
branches if vleft 6= vright
JB, JNAE
branches if vleft
JBE, JNA
JA, JNBE
branches if vleft
JAE, JNB
<vright
>vright
Table 2.4: Signed and Unsigned Comparison Instructions
to JZ and JNZ, respectively.) Each of the other

branch
instructions
two
have
synonyms.
For
example, look at JL (jump less than) and JNGE

(jump not greater than
same instruction
or equal to). These are the
because:
x < y =
The unsigned branches
not(x y)
use
below instead of Land G.

Using these
A for above and B for
new branch instructions

the
can be translated to assembly
pseudo-code above
much easier.
cmp
eax, 5
jge
thenblock
ebx, 2
jmp
next
thenblock:
mov
mov
ebx, 1
next:2.2.3
The loop instructions
The 80x86 provides several instructions

designed to implement for-like loops. Each of
these instructions takes
a code label as its single
operand.
LOOP Decrements ECX, if ECX 6= 0, branches
to label
LOOPE, LOOPZ Decrements ECX (FLAGS
register is not modified), if

ECX 6= 0 and ZF
= 1, branches
LOOPNE, LOOPNZ Decrements ECX

(FLAGS unchanged), if ECX 6=
0 and ZF
= 0,branches
The last two loop instructions
are useful
sequential search loops. The following
for
pseudo-code:
42
CHAPTER 2
sum =0;
for( i=10; i
>0; i )
sum += i;
could be translated into assembly
as:
mov
eax, 0
;
eax
is
mov
ecx, 10
;
ecx
is
loop_start:
add
eax, ecx
loop
loop_start
BASIC ASSEMBLY
LANGUAGE
sum
i
2.3
Translating Standard
Control Structures
This section looks at how the standard

control structures of high level languages
can be
implemented inassembly language.
2.3.1
If statements
The following pseudo-code:

if (condition )
then block;
else
else block
could be implemented
so that
;
code
to set FLAGS
jxx
else_block
;
select xx
3
for then block
jmp
endif
else_block:
;
code
;
code
branches if condition false
as:
for else block
endif:
If there is no else, then the else block
branch
can be replaced
by a branch to endif.
;
code
to set FLAGS
jxx
endif
xx so that
;
code
4
for then block
endif:
;
select
branches if condition false
2.4. EXAMPLE: FINDING PRIME NUMBERS
43
2.3.2
While loops
The while loop is a top tested loop:

while( condition ){
body of loop;
}This
could be translated into:
1
while:
;
code
to set FLAGS based
jxx
endwhile
on
condition
3
xx so that
branches if false
;
select
4
body of loop
jmp
while
endwhile:
2.3.3
Do while loops
The do while loop is a bottom tested

loop:
do {
body of loop;
}while( condition );
This could be translated into:

1
do:
;
body
of loop
;
code
to set FLAGS based
jxx
do
on
condition
4
;
select xx
so that
branches if true
2.4
Example:
Finding
Prime
Numbers
This section looks at
a program
prime numbers. Recall that
that finds
prime numbers
are
evenly divisible by only 1

and themselves. There
no formula for doing this. The basic method

program uses is to find the factors of all odd
3
numbers
below a given limit. If no factor can be
found for an odd number, it is prime. Figure 2.3
is
this
C.
shows the basic algorithm written in
Heres the assembly version:

3
2 is the only
44
even prime
number.
CHAPTER
BASIC ASSEMBLY
guess;
LANGUAGE
/ current
guess
unsigned
unsigned factor;
/ possible factor of
unsigned limit;
/ find primes
for prime
guess
up to this
value /
45
printf (Find primes

scanf(%u,
up to:);
&limit);
printf (2\n);
/ treat
printf (3\n);
/ special
case
/ initial
guess
guess
=5;
while (guess
10
/ look for
11
factor
12
<= limit
){
a factor
of
guess
14
factor
15
guess
as
/
/
= 3;
while (factorfactor
13
first two primes
< guess
&&
% factor != 0 )
+= 2;
if (guess % factor != 0)
16
printf (%d\n,
17
guess += 2;
18
guess);
/ only look at odd numbers /
19
Figure 2.3:
prime.asm
1
segment .data
Message
db
"Find
resd
45
segment .bss
6
Limit
resd
Guess
89
segment .text
global
10
11
_asm_main
_asm_main:
12
enter
13
pusha
0,0
14
15
mov
eax, Message
16
call
print_string
17
call
read_int
18
mov
[Limit],
eax
19
primes
up to:",0
;
find primes up to this
;
the current guess
;
setup routine
;
scanf("%u",
limit
for prime
& limit );
2.4. EXAMPLE: FINDING PRIME NUMBERS
45
20
mov
eax, 2
21
call
print_int
22
call
print_nl
23
mov
eax, 3
24
call
print_int
25
call
print_nl
mov
dword [Guess], 5
26
27
28
while_limit:
29
mov
eax,[Guess]
30
cmp
eax, [Limit]
31
jnbe
end_while_limit
mov
ebx, 3
32
33
34
while_factor:
35
mov
eax,ebx
36
mul
eax
37
jo
end_while_factor
38
cmp
eax, [Guess]
39
jnb
end_while_factor
40
mov
eax,[Guess]
41
mov
edx,0
42
div
ebx
43
cmp
edx, 0
44
je
end_while_factor
46
add
ebx,2
47
jmp
while_factor
45
48
end_while_factor:
49
je
end_if
50
mov
eax,[Guess]
51
call
print_int
call
print_nl
54
add
dword [Guess], 2
55
jmp
while_limit
52
53
56
end_if:
end_while_limit:
57
58
popa
59
mov
60
leave
eax, 0
;
printf("2\n");
;
printf("3\n");
;
Guess
=5;
;
while
(Guess
<= Limit
;
use
jnbe since numbers
;
ebx
is factor
=3;
are
unsigned
;
edx:eax
= eax*eax
;
if answer
wont fit ineax alone
;
if !(factor*factor < guess)
;
edx
=edx:eax
;
if !(guess
% ebx
% factor != 0)
;
factor += 2;
;
if !(guess
% factor != 0)
;
printf("%u\n")
;
guess += 2
return back to C
61
ret
46
prime.asm
CHAPTER 2.BASIC
ASSEMBLY
LANGUAGE
Chapter 3
Bit Operations
3.1
Shift Operations
Assembly language allows the

manipulate
common
the
individual
bits
bit operation is called
programmer to
of
data.
One
shift. A shift
moves the position of the bits of some

data. Shifts can be either toward the left (i.e.
toward the most significant bits) or toward the
operation
right (the least significant bits).
3.1.1
Logical shifts
A logical shift is the simplest type of shift. It

shifts in a very straightforward
shows
an example
of
a shifted
manner.
Figure 3.1
single byte number.
Original
Left shifted
Right shifted
Figure 3.1: Logical shifts

Note that
new,
incoming bits
The SHL and SHR instruc-tions
are
allow
one to shift
positions. The number of

either be
any
by
These
number of
can
positions to shift
a constant or can
be stored in the CL
The last bit shifted out of the data is
register.
stored
zero.
are used to perform
logical left and right shifts respectively

instructions
always
in the
carry
flag. Here
are some
code
examples:
mov
to
bit
right,
ax,
ax
ax
ax
0C123H
8246H,
2091H,
CF
;
shift
4123H,
CF
;
shift
;
shift
ax
left,
ax,
shr
ax, 0C123H
shl
CF
15
=0
13
1
bit to
shr
1
bit to right,
mov
ax
47
48
,
,
CHAPTER 3.BIT OPERATIONS
ax
shl
to
bits
mov
cl,3
ax
shr
bits to right,
3.1.2
048CH,
CF
17
;
shift
cl
ax =0091H,
CF
=1
Use of shifts
Fast
most
;
shift
ax
left,
multiplication
common uses
of
and
a shift
division
are
operations. Recall
that in the decimal system, multiplication

division
digits.
binary.
the
and
a power of ten are simple, just shift

The same is true for powers of two in
by
For
example,
to double
the
binary
once to
the left to get 101102 (or 22). The quotient of a
division by a power of two is the result of a right
shift. To divide by just 2, use a single right shift;
number 10112
(or 11 in decimal), shift
to divide by 4 (2 2), shift right 2 places; to divide

by 8 (2
3),
instructions
shift 3 places to the right, etc. Shift
are very
basic and
are
much faster
than the corresponding MUL and DIV instructions!

Actually,
multiply
logical
can
shifts
and divide unsigned
be
used
to
val-ues. They do
not work in general for signed values. Consider

the 2-byte
logically
value FFFF
(signed
once,
right shifted
If value
it is
1).
the result is 7FFF
which is +32, 767! Another type of shift
can
be
used for signed values.
3.1.3
Arithmetic shifts
These
shifts
are
designed
to allow
signed
numbers to be quickly multi-plied and divided by
powers
of 2. They insure that the sign bit is
treated correctly.
SAL Shift Arithmetic
just
a synonym
is assembled
machine code
as
Left
- This instruction
is
for SHL. It
into
the exactly
the
same
as SHL. As long
the sign bit is not changed by the shift,
the result will be correct.

SAR Shift Arithmetic
instruction
Right
that does not shift
- This
is
a new
the sign bit (i.e.
the msb) of its operand.

shifted
as
normal except that the
enter from the left
are copies
is, if the sign bit is 1, the

Thus, if
The other
new
bits
are
bits that
of the sign bit (that
new
bits
are
also 1).
byte is shifted with this instruction,
only the lower 7 bits
are shifted.
As for the other
shifts, the last bit shifted out is stored in the
carry
flag.
8246H,
mov
ax,0C123H
ax,
sal
CF
13
1
sal
;
ax = 048CH, CF = 1
;
ax = 0123H, CF = 0
sar
;
ax =
ax, 1
ax, 2
3.1. SHIFT OPERATIONS
49
3.1.4
Rotate shifts
The rotate shift instructions
work like logical
shifts except that bits lost off one end of the data
are
shifted in on the other side. Thus, the data
is treated
simplest
as if it is a circular structure. The two

rotate instructions
are ROL and ROR
which make left and right rotations, respectively.
Just
as
for the other shifts, these shifts leave the
copy of the last
bit shifted around inthe carry flag.
mov
8247H,
CF
ax, 0C123H
ax
rol
13
are two
;
ax =
ax, 1
ax, 1
ax, 2
ax, 1
;
ax = 048FH, CF = 1
;
ax = 091EH, CF = 0
;
ax = 8247H, CF = 1
;
ax = C123H, CF = 1
There
rol
rol
ror
ror
additional rotate instructions
that shift the bits inthe data and the

named
RCL and RCR. For example,
carry
flag
if the AX
register
is rotated
17-bits made
up
with these instructions,
of AX and the
carry
flag
the
are
rotated.
mov
clc
the
carry
ax, 0C123H
flag (CF
= 0)
;
ax =8246H,
15
rcl
091BH, CF
=0
=1
rcr
CF
;
ax = C123H,
CF
CF
;
ax =
ax, 1
6
;
ax = 8246H,
3.1.5
CF
;
ax = 048DH,
ax, 1
rcl
;
clear
rcl
ax,
ax, 2
=1
=0
rcr
ax, 1
Simple application
Here is a code snippet that counts the

number of bits that
are on
(i.e. 1) inthe EAX
register.
1
mov
;
blwill
bl,0
contain the count of ON bits
;
ecx
ecx, 32
mov
is the loop counter
count_loop:
4
bit into
shl
carry
;
shift
eax, 1
flag
jnc
;
if CF == 0,goto
skip_inc
skip_inc
7
inc
bl
skip_inc:
loop
count_loop
50
X AND Y
Table 3.1: The AND operation
AND
Figure 3.2: ANDing
a byte
The above code destroys the original value of EAX

(EAX is zero at the end of the loop). If one wished
to retain the value of EAX, line 4 could

replaced
with rol eax, 1.
be
3.2
Boolean Bitwise Operations
are
There
four
common
boolean
operators:
AND, OR, XOR and NOT. A truth table shows

the result
of each operation
for each possible
value of its operands.
3.2.1
The AND operation
The result of the AND of two bits is only 1

if both bits
are 1, else the
result is 0 as the truth
table inTable 3.1shows.

Processors
instructions
support
these
operations
bits of data
in parallel.
contents of AL and BL
are
as
on
all the
For example,
if the
that act indepen-dently
ANDed together, the
basic AND operation is applied to each of the 8

pairs of corresponding
bits in the two registers
as Figure
3.2 shows. Below is a code example:
mov
ax,0C123H
and
ax, 82F6H
8022H
;
ax =
3.2.2
The OR operation
The inclusive OR of 2 bits is 0 only if both
bits are 0,else the result is 1

as the truth table in
Table 3.2 shows. Below is a code example:
mov
ax,0C123H
or
ax, 0E831H
= E933H
;
ax
3.2. BOOLEAN BITWISE OPERATIONS
51
X OR Y
Table 3.2: The OR operation
X XOR Y
Table 3.3: The XOR operation
3.2.3
The XOR operation
The exclusive OR of 2 bits is 0 if and only if

both bits are equal, else the result is 1
as the truth
table inTable 3.3 shows. Below is a code
example:
mov
ax, 0C123H
xor
ax, 0E831H
;
ax
= 2912H
3.2.4
The NOT operation
The NOT
operation
a unary
is
operation
on one operand, not two like binary

as AND). The NOT of a bit is
the opposite value of the bit as the truth table in
Table 3.4 shows. Below is a code example:
(i.e. it acts
operations such
mov
ax, 0C123H
not
ax
;
ax =
3EDCH
Note
that
complement.
the
NOT
finds
the
Unlike the other bitwise
the NOT instruction
does not change
ones
operations,
any
of the
an
A ND
bits inthe FLAGS register.
3.2.5
The
The TEST instruction
TEST
instruction
performs
operation, but does not store the result. It only
sets the FLAGS register based

would
be (much like how
performs
a subtraction
on what
but only sets FLAGS). For
example, if the result would be
set.
the result
the CMP instruction
zero, ZF would
be
52
Turn
on bit i
Turn offbit i
Complement
bit i
NOT X
Table 3.4: The NOT operation

OR the number with 2
(which is the binary
number with just bit i

on)
AND the number with the binary number

with only bit i
off.
called
This operand is often
a mask
XOR the number with 2
Table 3.5: Uses of boolean operations
3.2.6
Uses of bit operations
Bit operations
are very useful
for
manipulating individual bits of data without

modifying the other bits. Table 3.5 shows three
common uses
of these operations. Below is some
example code, implementing these ideas.
mov
xor
ax,
C12BH
410BH
0FFF0H
4F00H
or
invert
ax,
;
turn
nibbles
0FFFFH
ax
off nibble,
ax
= BF0FH
;
1s
bit
ax,
= 4F0BH
xor
ax = C10BH
bit 5,
;
invert
ax
ax
and
8000H
;
turn on nibble,
ax,
;
turn on
;
turn off
0FFDFH
ax
ax
ax
3,
bit
ax,0C123H
or
0F00H
and
ax
complement,
0F00FH
8
31,
xor
ax
40F0H
The AND operation

the remainder
of
can
To find the remainder of

the
number with
mask will contain
also be used to find
a power of two
a division by 2 i,AND
division
by
a mask equal to 2 i 1. This

ones from bit 0 up to bit i 1.
It
from
is just
bit 0 these bits that contain the remainder.
The result
zero out
of the AND will keep these bits and
the others. Next is a snippet of code that
finds the quotient and remainder of the division

of 100 by 16.
mov
64H
ebx, 0000000FH
= 16 -1
= 15 or F
;
ebx = remainder =4
Using the CL register
it is possible
bits of data. Next is
sets (turns
on)
an
arbitrary
an
;
mask
ebx, eax
and
arbitrary
;
100 =
eax, 100
mov
to modify
example that
bit in EAX. The
number of the bit to set is stored inBH.
3.3. AVOIDING CONDITIONAL BRANCHES
53
mov
;
blwill
bl,0
contain the count of ON bits
;
ecx
ecx, 32
mov
is the loop counter
count_loop:
bit into
carry
;
add just
loop
;
shift
eax, 1
shl
the
adc
carry
flag to bl
flag
bl,0
6
count_loop
Figure 3.3: Counting bits with ADC
mov
;
first
cl,bh
build the number to OR with
mov
ebx, 1
3
shl
left cltimes
;
shift
ebx, cl
or
eax, ebx
;
turn on bit
Turning
1
a bit offis
mov
just
a little
harder.
;
first
cl,bh
build the number to AND with
mov
ebx, 1
shl
left cltimes
;
invert
not
ebx
bits
;
turn off
eax, ebx
and
;
shift
ebx, cl
4
bit
Code to complement
an arbitrary
bit is left
as an
exercise for the reader.

It is not
uncommon to see the following

program:
puzzling instruction in a 80x86
xor
;
eax
eax, eax
=0
A number XORed with itself always results in
zero. This instruction
is used because its machine
code is smaller than the corresponding MOV

instruction.
3.3
Avoiding Conditional
Branches
Modern
processors
use very sophisticated

as quickly as possible.
is known as speculative
techniques to execute code

One
common
technique
execu-tion
This
processing
multiple
capabilities
processor,
the
parallel
of the CPU to
execute
once.
at
instructions
branches present
uses
technique
Conditional
prob-lem with this idea. The
ingeneral, does not know whether the
branch will be taken
or not.
different set of instructions
If it is taken,
will be executed than
if it is not taken. Processors
try to predict
whether
taken.
the
branch
will
be
wrong, the processor

time executing the wrong code.
prediciton is
If
has wasted
the
its
54
One
using
The
way to
avoid this problem is to avoid
conditional
sample
provides
code
a simple
branches
in
when
possible.
3.1.5
example of where
one
could do
this. In the previous example, the on bits of the
EAX register
are counted.
the INC instruction.

branch
can
be
a branch to skip
Figure 3.3 shows
removed
instruction to add the

The SETxx
It uses
carry
instructions
by
how the
using
the
ADC
flag directly.
provide
a way to
remove
branches
to
location
These
a byte register or
zero or one based on the
state of the FLAGS register.

after SET
cases.
certain
set the value of
instructions
memory
in
are
conditional
same
the
branches.
If
The characters
characters
the
used for
corresponding
condition of the SETxx is true, the result stored

is a one, if false
setz
a zero is stored. For example,

;
AL = 1
if Z flag is
al
set, else 0
Using these instructions,

clever
techniques
that
one can
cal-cuate
develop
values
some
without
branches.
For example, consider the problem of finding

the
maximum
of
two
values.
The
standard
use a
use a conditional branch to act on which
was larger. The example program below
how the maximum can be found without
approach to solving this problem would be to

CMP and
value
shows
any branches.
1
;
file: max.asm
segment .data
message1 db "Enter
a number:
",0
message2 db "Enter another number: ",
0
message3 db "The larger number is:",
0
89
segment .bss
10
11
input1
resd
;
first
12
13
14
15
segment .text
global
_asm_main
_asm_main:
16
enter
17
pusha
0,0
18
19
mov
eax, message1
20
call
print_string
21
call
read_int
number entered
;
setup
;
print
;
input
routine
out first
message
first number
3.3. AVOIDING CONDITIONAL BRANCHES
55
mov
[input1],
24
mov
eax, message2
25
call
print_string
26
call
read_int
28
xor
ebx, ebx
29
cmp
eax, [input1]
30
setg
bl
;
;
;
neg
mov
ebx
32
33
and
ecx, ebx
ecx, eax
34
not
ebx
35
and
ebx, [input1]
22
eax
23
27
31
;
;
;
;
;
or
ecx, ebx
38
mov
eax, message3
39
call
print_string
40
mov
eax, ecx
41
call
print_int
42
call
print_nl
36
37
43
44
popa
45
mov
46
leave
47
ret
print out second
eax, 0
message
input second number (ineax)
ebx
=0
compare
second and first number
= (input2
ebx = (input2
ecx = (input2
ecx = (input2
ebx = (input2
ebx = (input2
ecx = (input2
ebx
>input1)
>input1)
>input1)
>input1)
>input1)
>input1)
>input1)
1
:
0
?0xFFFFFFFF
:
0
?0xFFFFFFFF
:
0
input2
:
0
?
?
?
?
0:
0xFFFFFFFF
0:
input1
input2
:
input1
print out result
return back to C
The trick is to create

used to select the correct
a bit
mask that
can be
value for the maximum.
The SETG instruction in line 30 sets BL to 1

if
the second input is the maximum
or
0 otherwise.
This is not quite the bit mask desired. To create
bit mask, line 31 uses the NEG

on the entire EBX register
(Note
was zeroed out earlier.) If EBX is 0,
the required
instruction
that EBX
this does nothing; however,
if EBX is 1, the
result is the twos complement representation of -1
or
0xFFFFFFFF.
This
is just
the
bit
uses this bit

as the maximum.
required. The remaining code
to select the correct input
mask
mask
An alternative
trick is to use the DEC
statement. Inthe above code, if the NEG is replaced
with
DEC, again the result will either be 0
0xFFFFFFFF.
However, the values
than when using the NEG instruction.
are
or
reversed
56
3.4
3.4.1
Manipulating bits inC
The bitwise operators of C
Unlike
some
high-level languages, C does
provide operators for bitwise
operations. The AND operation is

1.
represented by the binary & operator
The OR operation is represented by the binary
|
operator. The XOR operation is represetned by the binary ^operator.
And the NOT operation is

represented by the unary
The shift operations
~operator.
are performed
<< and >> binary operators.

The << operator performs
>> operator performs right
by Cs
left shifts and the
shifts. These operators take two operands.
The left operand is the value to

shift and the right operand is the number of
bits to shift by. If the value
to shift is an unsigned type, a logical shift is

made. If the value is a signed
type (like int), then an arithmetic shift is

used. Below is some example C
code using these operators:
1
s;
short int
int is 16bit /
3
s = 1;
complement) /
assume
short unsigned
that short
u;
s = 0xFFFF
u= 100;
(2s
/
u=
0x0064 /
5
u = u|
0x0100;
u = 0x0164
s =s & 0xFFF0;
s = 0xFFF0
s = s u;
s = 0xFE94
u= u<< 3;
u = 0x0B20
(logical
shift )/
(arithmetic
3.4.2
s = s >> 2;
s = 0xFFA5
shift )/
Using bitwise operators inC
are used
same purposes as they are used
language.
They
allow
one to
individual bits of data and can be
The bitwise operators
multiplication
in C for the
in assembly
manipulate
used for fast
and division. In fact,
a smart
compiler will
use a shift
for
a multiplication
like,
x *= 2,automatically.
Many
POSIX
operating
system
API
2 s (such
and Win32) contain functions which
operands
that have data encoded
example,
systems
POSIX
as
(a better
others.
name
Each
would be owner),
user can
type of
bits. For
maintain
permissions for three different types of
programmer
to
be granted
a file requires
manipulate
POSIX defines several
file
users: user
group and
permission to read, write and/or execute
To change the permissions of
as
use
individual
macros to help
a file.
the C
bits
(see Table
3.6). The
1
This
unary
2
operator
is different
from
the
binary
&&
and
& operators!
Application Programming Interface

stands
Computer
for Portable
Environments.
IEEE based
on UNIX.
Operating
System
stan-dard
Interface
developed
by
for
the
3.5. BIG AND LITTLE ENDIAN
57
REPRESENTATIONS
Macro
Meaning
S IRUSR
user can read

user can write
user can execute
group can read
group can write
group can execute
others can read
others can write
others can execute
S IWUSR
S IXUSR
S IRGRP
S IWGRP
S IXGRP
S IROTH
S IWOTH
S IXOTH
Table 3.6: POSIX File Permission Macros

chmod function can be used to
permissions of file. This function takes
a string with
on and an integer 4
two parameters,
file to act
with the appropriate
the
set
name
the
of the
bits set for the desired
permissions. For example, the

code below sets the permissions
to allow the
owner
of the file to read and
write to it, users in the

and others have
chmod(foo,
group to read
the file
no access.
S IRUSR |
S IWUSR |
S IRGRP );
can
The POSIX stat function
be used to
find out the current permission

bits for the file. Used with the chmod function,
it is possible to modify
of the
permissions
Here is an example that

write
access to
some
without
changing
others.
removes
others and adds read
access
to the owner of the file. The

other permissions
1
struct stat
are not
file stats
altered.
/ struct
used by
stat () /
2
stat(foo,
& file stats );
//
read
read
filefil info
e
file sta
st mode holds permission bits /
ts
..
chmod(foo,
(file stats .st mode & S IWOTH) |

S IRUSR);
3.5
Big and Little Endian
Representations
Chapter 1introduced
the concept of big and
little endian
representations
of multibyte
data.
However, the author has found that this subject
many
confuses
people. This section
covers
the
topic inmore detail.

The reader will recall that endianness
refers
to the order that the in-dividual bytes (not bits)

of
a multibyte
data element is stored in memory.
Big endian is the most straightforward
method.
It stores the most signif-icant byte first, then the
next significant byte and
so on. In other
words
the big bits
are
stored first. Little endian stores
the
in
the
bytes
Actually
typedef to
opposite
a parameter
of
type
mode t
which
an integral type.
58
unsigned short word

unsigned char
if (p[0]
= 0x1234;
p = (unsigned
assumes
sizeof(short)
char ) &word;
== 0x12 )
printf (Big Endian Machine\n);
else
printf ( Little Endian Machine\n);
== 2 /
is
Figure 3.4: How to Determine Endianness
order (least significant
first). The x86 family of
processors use little endian representation.

As an example, consider the double
1234567816
representing
In
big
representation, the bytes would be stored
56 78. In little
would be stored
endian represenation,
now, why any sane

Intel
representations
would
seem
the bytes
for
asking himself
chip de-signer would
representation?
sadists
as 12 34
as 78 56 34 12.
The reader is probably
endian
word
endian
use
right
little
Were the engineers
inflicting
on multitudes
this
of
at
confusing
program-mers?
It
that the CPU has to do extra work
to store the bytes backward in memory like this
unreverse them when read back in to

memory). The answer is that the CPU does not
do any extra work to write and read memory
(and to
using little endian format. One has to realize
that
the CPU is composed
circuits that simply work

(and bytes)
CPU.
are not
of
many
electronic
on bit values. The bits

necessary order in the
in any
Consider
can
the 2-byte AX register. It
be
decomposed into the single byte registers: AH and
are
AL. There
circuits in the CPU that maintain
the values of AH and AL. Circuits
are not
in any
order in a CPU. That is, the circuits for AH
not before
instruction
or
are
mov
to memory
is not any
after the circuits for AL. A
that copies the value of AX
copies the value of AL then AH. This
harder for the CPU to do than storing AH first.
same argument applies to the individual

bits in a byte. They are not really in any order in
the circuits of the CPU (or memory for that
matter). However, since individual bits can not be
addressed in the CPU or memory, there is no way
to know (or care about) what order they seem
The
to be kept internally by the CPU.
The C code in Figure 3.4 shows how the

endianness of
a CPU can be
determined. The
pointer treats the word variable

char-acter
array.
as a two
element
Thus, p[0] evaluates to the first
byte of word in
memory
endianness of the CPU.
which
depends
on
the
3.5. BIG AND LITTLE ENDIAN
59
REPRESENTATIONS
unsigned invert endian( unsigned
unsigned invert;
const unsigned char
unsigned char ip
x)
xp = (const unsigned char ) &x;

= (unsigned char ) & invert;
67
= xp [3];
= xp[2];
ip[2] = xp[1];
ip[3] = xp[0];
ip[0]
reverse
the individual bytes /
ip[1]
10
11
return invert
12
13
/ return the bytes reversed /
Figure 3.5: invert endian Function
3.5.1
When to Care About Little and
Big Endian
For typical programming, the endianness of
the CPU is not significant.

The most
common
time that it is important is
when binary data is trans-
ferred between different computer systems. This is
usually either using
some
type of physical data media (such as a disk)

network. Since ASCII data
or a
With the advent of
multi-is single byte, endianness is not
an issue
for
byte character
it.
sets,
like
Allinternal TCP/IP headers store integers inbig endian format (called

network byte order). TCP/IP libraries provide C functions for dealing with
endianness issues in a portable
way. For example,

UNICODE, endianness is
important
even
for
the htonl() function
con-
text
data. UNICODE supports

verts
a double
either endianness and has
word (or long integer) from host to network format. The

5
a mechanism
for specifying
For a big endian
which endianness is being
system, the two functions just return their input unchanged. This allows
used to represent the data.
ntohl() function performs the opposite transformation.
one to write network programs

and run correctly onany
that will compile
system irrespective of its endianness. For more

information, about endi-
anness
and network programming
see W.
Richard Stevens excellent book,

UNIX Network Programming.
Figure 3.5 shows
a C function
that inverts
the endianness of a double

word. The 486
processor
introduced
a new
machine instruction named BSWAP

that
reverses
the bytes of
any
32-bit register. For
example,
bswap
;
swap
edx
bytes of edx
The instruction can not be used

registers.
However, the XCHG
5
Actually, reversing the endianness of
reverses
little
the bytes; thus,
to big is the
functions
do the
same
con-verting
same
thing.
on
an integer
16-bit
simply
from big to little
operation.
So both
of
or
these
60
int count bits (unsigned int data )
int cnt
=0;
45
while( data != 0 ){
=data
data
cnt++;
return cnt;
10
& (data 1);
Figure 3.6: Bit Counting: Method One
instruction
can be used to swap the bytes of the

can be decomposed into 8-bit
16-bit registers that
registers. For example:

xchg
3.6
ah,al
;
swap
bytes of ax
Counting Bits
Earlier
a straightforward
for counting the number
was given
are on in a
technique
of bits that
double
word. This
section
looks
direct methods of doing this
at other less
as an exercise
using
the bit operations discussed in this chapter.
3.6.1
Method
The first
one
method
very
is
simple, but not
obvious. Figure 3.6 shows the code.
How
does
this
work? In every
one bit is turned off in
bits are off (i.e. when data
method
iteration of the loop,

data. When all the
is zero), the loop stops. The number of iterations

required to make data
zero is equal to
the number
of bits inthe original value of data.
Line 6 is where
a bit of data
does this work? Consider

binary representation
is turned off. How
the general form of the
of data and the rightmost
1 in this representation. By definition,

after this 1
must be
binary representation
zero.
every
bit
Now, what will be the
of data
- 1? The bits to
the left of the rightmost 1

will be the
same as for
data, but at the point of the rightmost 1

the bits
will
be the
complement
of data. For example:

data
xxxxx10000
of
the
original
bits
data
-1 =
xxxxx01111
3.6. COUNTING BITS
61
1
static unsigned char byte bit count [256];
/ lookup table /
23
initialize count bits ()
void
4
int cnt, i data;
67
= 0; i< 256; i++ ){

=0;
data = i;
for( i
cnt
while( data != 0 ){
10
data
11
/ method
one
& (data 1);
cnt++;
12
13
byte bit count [i]
14
=cnt;
15
16
= data
17
18
int count bits (unsigned int data )
19
const unsigned char byte
20
= (unsigned
char ) & data;
21
22
return byte bit count [byte[0]]
23
byte bit count [byte[2]]
24
+byte bit count

+byte bit count
[byte[1]]
[byte [3]];
Figure 3.7: Method Two
where the xs
are
the
same
for both numbers.
When data is ANDed with data
-1, the result
will
zero
the rightmost 1
in data and leave all the
other bits unchanged.
Method two
3.6.2
A lookup table
bits
of
an
straightforward
precompute
can
arbitrary
also be used to count the

double
approach
the number
two related
would
be
The
to
of bits for each double
word and store this in an
are
word.
problems
array.
However, there
with this
approach.
are roughly 4 billion double word values!

means that the array will be very big and
that initializing
it will also be very time
consuming.
(In fact, unless one is going to
actually use the array more than 4 billion times,
more time will be taken to initialize the array than
There
This
it would require to just compute
using method one!)
the bit counts
62
A more realistic method would precompute

the bit counts for all possible byte values and store
these into an array. Then the double word
can be
up into four byte values. The bit counts of

are looked up from the
array and sumed to find the bit count of the
split
these four byte values
original double word. Figure 3.7 shows the to

code implement this approach.
The initialize count bits function must be
called before the first call to the count bits

function. This function initializes the global
array. The count bits function

looks at the data variable not as a double word,
but as an array of four bytes. The dword pointer
acts as a pointer to this four byte array. Thus,
dword[0] is one of the bytes indata (either the
least significant or the most significant byte
depending on if the hardware is little or big
endian, respectively.) Of course, one could use a
byte bit count
construction
like:
(data
>> 24) & 0x000000FF
to find
similar
most
the
ones
significant
byte
value
and
for the other bytes; however, these
constructions
will
be
slower
an array
than
reference.
a for loop could easily be

used to compute the sum on lines 22 and 23. But,
a for loop would include the overhead of
initializing a loop index, comparing the index
One last point,
after each iteration and incrementing
Computing
the
sum as
values will be faster.
the explicit
In fact,
the index.
sum
a smart
of four
compiler
would convert the for loop version to the explicit
sum.
This
loop
iterations
process
technique known
3.6.3
or
compiler
eliminating
optimization
as loop unrolling.
Method three
There
counting
of reducing
is
is
yet
another
the bits that
clever
are on
in
method
of
data. This
method literally adds the ones and zeros of the

data together. This
sum must
equal the number of
ones in the data. For example, consider

the ones in
byte stored in
data. The first step
counting
variable
named
is to perform the following
operation:
data
= (data & 0x55) +((data
>> 1) & 0x55);
What does this do? The hex constant
0x55 is
01010101 in binary. In the first operand of the
addition, data is ANDed with this, bits at the

odd
are
bit positions
bits
position
same
>>
1) & 0x55) first moves all

even positions to an odd
and uses the same mask to pull out these
operand ((data
the
pulled out. The second
at the
bits. Now, the first operand
contains
the
even bits
of data. When these two operands are added
together, the even and odd bits of data are
odd bits and the second operand the
added
together.
101100112
For
then:
3.6. COUNTING BITS
63
example,
if
data
is
int count bits(unsigned int
x)
static unsigned int mask[]
= {0x55555555,
0x33333333,
0x0F0F0F0F,
0x00FF00FF,
0x0000FFFF };
int i;
int shift
/ number of positions to shift to right /
10
for( i=0, shift=1; i

< 5; i++, shift = 2 )
11
x =(x & mask[i]) + ((x >> shift) & mask[i ]);
12
return
13
14
x;
Figure 3.8: Method 3

data & 010101012
+ (data >> 1) & 010101012
or
00
01
00
01
01
01
00
01
01
10
00
10
on the right shows the actual bits added together. The

bits of the byte are divided into four 2-bit fields to show that actually there
The addition
are four independent
additions being performed.
sums can be is two, there is

no possibility that the sum will overflow its field
and corrupt one of the other fields sums.
Of course, the total number of bits have not
been computed yet. How-ever, the same technique
that was used above can be used to compute the
total in a series of similar steps. The next step
Since the most these
would be:
data
= (data & 0x33) + ((data
>> 2) &
0x33);
Continuing the above example (remember that
data
now
is 011000102 ):
data & 001100112
+(data >> 2) & 001100112

Now there
are two 4-bit
or
fields to that
0010
0010
0001
0000
0011
0010
are
independently added.
The next step is to add these two bit
sums
together to form the final result:

data
= (data & 0x0F) + ((data >> 4) & 0x0F);
Using the example above (with data equal to
001100102 ):
data & 000011112
00000010
+ (data >> 4) & 000011112
or
00000011
00000101
64
Now
data
Figure
is 5 which
3.8 shows
an
is the correct
implementation
result.
of
this
method that counts the bits in a double word. It
uses a for
loop to compute the
sum.
It would be
faster to unroll the loop; however, the loop makes
it clearer how the method generalizes to different

sizes of data.
Chapter 4
Subprograms
This chapter
make modular
programs
level languages (like C)
are
looks at using subprograms
to
and to interface with high

Functions and procedures
high level language examples of subprograms.
a subprogram and the

agree on how data will be
passed between them. These rules on how data
will be passed are called calling conventions. A
The code that calls
subprogram itself must
large part of this chapter
will
standard C calling conventions
to
interface
programs
assembly
deal with the
that
can be
subprograms
used
with
This (and other conventions) often
pass
the addresses of data (i.e. pointers) to allow the

subprogram to access the data inmemory.
4.1
Indirect Addressing
Indirect addressing allows registers to act like

pointer variables. To in-dicate that
as a pointer,
be used indirectly
square
is to
brackets ([]).For example:
direct
mov
memory
mov
ebx, Data
a register
it is enclosed in
mov
ax, [Data]
addressing of
;
ebx
;
normal
a word
2
= & Data
ax, [ebx]
;
ax =
*ebx
Because AX holds
word, line 3 reads
word
starting at the address stored inEBX. If AX

replaced with AL, only
read. It is important
a single
was
byte would be
to realize that registers do
not have types like variables do in C. What EBX

is assumed to point to is completely
determined
are used. Furthermore, even

the fact that EBX is a pointer is completely
determined by the what instructions are used
If EBX is used incorrectly, often there will be no
assembler error; however, the program will not
work correctly.
This is one of the many
reasons that assembly programming is more error
prone than high level programming.
by what instructions
65
66
CHAPTER 4.SUBPROGRAMS
All the 32-bit
general
purpose
(EAX, EBX,
ECX, EDX) and index (ESI, EDI) registers
can be
used for indirect addressing. In general, the 16-bit

and 8-bit registers
4.2
can not be.
Simple Subprogram
Example
A subprogram
an
independent
unit of
a
a subprogram is like a
function in C. A jump can be used to invoke the
subprogram, but returning presents a problem. If
code that
program.
can
is
be used from different parts of
In other words,
the subprogram is to be used by different parts of

the program, it must return back to the section of
code that invoked it.Thus, the jump back from
the subprogram
The code below
can not
be hard coded to
a label.
shows how this could be done
using the indirect form of the JMP instruction.

form of the instruction
uses
the value of
This
a register
to determine where to jump to (thus, the register
a
program
acts much like
function pointer in C.) Here
the first
from chapter 1
rewritten to
is
use
a subprogram.
sub1.asm
;
file: sub1.asm
;
Subprogram example program
45
segment .data
a number:
prompt1 db
"Enter
prompt2 db
",0
outmsg1 db
"You entered ",0
outmsg2 db
"and ",0
10
outmsg3 db
",the
sum of these
is ",0
11
12
segment .bss
13
input1
resd 1
14
input2
resd 1
15
16
17
18
segment .text
global
_asm_main
_asm_main:
19
enter
20
pusha
0,0
;
setup
eax, prompt1
;
print
21
22
mov
23
call
print_string
mov
ebx, input1
24
25
;
dont
;
store
forget null terminator
routine
out prompt
address of input1 into ebx
4.2. SIMPLE SUBPROGRAM EXAMPLE
67
26
mov
ecx, ret1
27
jmp
short get_int
;
;
29
mov
eax, prompt2
30
call
print_string
32
mov
ebx, input2
33
mov
ecx, $ + 7
34
jmp
short get_int
mov
eax, [input1]
28
ret1:
31
35
36
;
;
37
add
38
mov
eax, [input2]
ebx, eax
40
mov
eax, outmsg1
41
call
print_string
42
mov
eax, [input1]
43
call
print_int
44
mov
eax, outmsg2
45
call
print_string
46
mov
eax, [input2]
47
call
print_int
48
mov
eax, outmsg3
49
call
print_string
50
mov
eax, ebx
51
call
print_int
52
call
print_nl
;
;
eax, 0
39
53
54
popa
55
mov
56
leave
57
ret
58
;
subprogram
59
;
Parameters:
60
ebx
get_int
-address
of dword to store
store return address into

read integer
print out prompt
= this address + 7
eax = dword at input1
ecx
eax += dword at input2

ebx
= eax
print out first
message
print out input1

print out second
message
print out input2

print out third
print out
message
sum (ebx)
print new-line
return back to C
ecx
integer into
61
ecx
-address
of instruction to
return to
63
;
Notes:
; value
64
get_int:
62
of eax is destroyed
65
call
66
mov
read_int
[ebx],
eax
67
store input into

jmp
;
jump
memory
ecx
sub1.
back to caller
68
get int
The
register-based
EBX
register
uses a simple,
conven-tion. It expects the
subprogram
calling
to
hold
the
address
of
the
DWORD to store the number input into and the
ECX register
instruction
to hold the code address
of the
to jump back to. In lines 25 to 28,
the ret1 label is used
to compute
this return
address. In lines 32 to 34, the $ operator is used
to compute the return address. The $ operator

returns the current address for the line it appears
on. The
expression $ + 7 computes the address
of the MOV instruction
on line 36.
Both of these return address computations
are
awkward.
to be defined
second
The first method requires

for each subprogram
instead of
simpler
a label, but
If a near jump was
method does not require
require careful thought.
would
a label
call. The
not
a short
does
used
jump, the number to add to $
be 7! Fortunately,
way to invoke
there is
much
subprograms. This method
uses
the stack.
4.3
The Stack
Many
stack.
CPUs
have built-in
A stack is
list. The stack is

organized
support
a Last-In First-Out
an area of memory
for
(LIFO)
that is
in this fashion. The PUSH instruction
adds data to the stack and the POP instruction
removes
data. The data removed
is always the
a last-in
last data added (that is why it is called

first-out list).
The
SS
segment
register
specifies
the
segment that contains the stack (usually this is

the
same segment
ESP register
data is stored into). The
contains
the address of the data
that would be removed
from the stack.
This
data is said to be at the top of the stack. Data
can only be added in double word units. That is,

one can not push a single byte on the stack.
1
The PUSH instruction inserts a double word
on the stack by subtracting 4 from ESP and then
stores
the
instruction
double
word
at [ESP]
The
POP
reads the double word at [ESP] and
then adds 4 to ESP. The code below demostrates

how these instructions
work and
assumes
that
ESP is initially 1000H.
push
= 0FFCh
;
2 stored
at 0FF8h, ESP
push
dword 3
;
1
stored
dword 1
0FFCh, ESP
= 0FF8h
;
3 stored
= 0FF4h
pop
EAX =3,ESP = 0FF8h
;
EBX = 2,ESP = 0FFCh
;
ECX = 1,ESP = 1000h
ESP
Actually words
words
on the stack.
at 0FF4h,
eax
pop
ebx
pop
can be pushed too, but
protected mode, it is better to work
at
dword 2
push
in32-bit
with only double
ecx
4.4. THE CALL AND RET INSTRUCTIONS
69
The stack
can
be used
as a convenient
to store data temporarily.

making
subprogram
place
It is also used for
calls, passing
parameters
and local variables.

The 80x86 also provides
that pushes
the values
of
a PUSHA
instruction
EAX, EBX, ECX,
EDX, ESI, EDI and EBP registers (not in this
order). The POPA instruction
can
be used to
pop
them allback off.
4.4
The CALL and RET
Instructions
The 80x86 provides two instructions
the stack to make calling subprograms
easy.
The
CALL
uncondi-tional jump to
instruction
subprogram
pops
of f
use
makes
an
and pushes
on the stack.
an address and
the address of the next instruction

The RET instruction
that
quick and
jumps
to
that
address.
When
using
instructions, it is very important that

the stack correctly
so
these
one man-age
that the right number is
popped offby the RET instruction!

The previous
these
new
program can be rewritten to use
instructions
by changing lines 25 to 34
to be:
mov
ebx, input1
call
get_int
mov
ebx, input2
call
get_int
and change the subprogram get int to:

get_int:
call
read_int
mov
[ebx],
eax
ret
There
are several
advantages to CALL and RET:
It is simpler!
al lows
ogram
sca
It allows subprogramsIt calls
to subpr
be nested
easily.
Notice
that
get int
calls read int. This call
pushes another address

of read ints code is
on the
a RET
stack. At the end

that
pops
off the
return address and jumps back to get ints code.
pops off
asm main.
Then when get ints RET is executed, it

the return address that jumps back to
This works correctly because of the LIFO property

of the stack.
70
Remember it is very important to pop offall

data that is pushed
on the
stack. For example,
consider the following:
get_int:
call
read_int
mov
[ebx],
push
eax
ret
eax
;
pops
off
EAX value, not return address!!
This code would not return correctly!
4.5
Calling Conventions
When
subprogram
code and the subprogram
on
how to
languages
as
the calling
(the callee) must
agree
data between them. High-level
have standard
calling
interface
pass
is invoked,
conventions.
with assembly
ways to pass
For
data known
high-level
language,
code
to
the assembly
language code must

the high-level
can
use
language.
the
same
on
how the code
are on or
optimizations
convention
a CALL
All
or may vary
is compiled
not). One
(e.g. if
universal
is that the code will be invoked with
instruction and return via

PC
compilers
support
convention that will be described

chapter
as
The calling conventions
differ from compiler to compiler
depending
RET.
one
calling
inthe rest of this
allow one
are reentrant. A
may be called at any point
in stages. These conventions
to create
subprograms
reentrant subprogram
of
conventions
a program
that
safely (even inside the subprogram
itself).
4.5.1
Passing parameters
Parameters
on the
stack.
the
a subprogram may be passed

are pushed onto the stack
instruction.
Just as in C, if
to
They
before the CALL
parameter
subprogram,
on the stack
is
to
be
changed
the address of the data
by
passed, not the value. If the parameters
less than
a double
the
must be
size is
word, it must be converted to a
double word before being pushed.

The
parameters
on
the
popped off by the subprogram,
stack
are not
are
instead they
accessed from the stack itself. Why?
Since
they
have
to be pushed
on
the stack
before the CALL instruction,
the return address would have to be popped off

irst (and then pushed
back
on again).
Often the parameters will have to be used in

several places in the
subprogram. Usually, they
can not
be kept in a
register for the entire

subprogram
and would have to be stored in
chapter
CONVENTIONS
71
4.5. CALLING
ESP + 4
Parameter
Return address
ESP
Figure 4.1:
ESP + 8
Parameter
ESP + 4
Return address
ESP
subprogram data
Figure 4.2:
on the stack keeps a copy of the data in memory

that can be accessed at any point of the
subprogram.
Consider
a subprogram that is passed a

on the stack. When using
single parameter
indirect ad-When the subprogram is invoked, the
stack looks like Figure 4.1. The

80x86
proces-
pa-
dressing, the
rameter
can be accessed
using indirect addressing ([ESP+4]
sor accesses
2).
If the stack is also used inside the subprogram to store data, the number
needed to be added to ESP will change. For example, Figure 4.2 shows what
the stack looks like if a DWORD is pushed the stack. Now the parameter is
at ESP +8 not ESP +4.Thus, it can be very
registers
indirect
sion.
ESP (and EBP)
error prone to use ESP when use the stack segment
referencing parameters. To solve this problem, the 80386 supplies another
EAX,
register to use: EBP. This registers only
is to reference data
different segon what

are used inthe
addressing expres-
ments depending
on the
EDX
EBX,
while
ECX
and
purpose
use the data
segment.
stack. The C calling convention mandates that
a subprogram
first
save the
value of EBP on the stack and then set EBP to be equal to ESP. This allows
ESP to change
as data
is pushed
or popped
offthe stack without modifying
EBP. At the end of the subprogram, the original value of EBP must be
restored (this is why it is saved at the start of the subprogram.) Figure 4.3
However,
this is usually
proprograms, be-
unimportant for most

tected mode
cause
for them the data
and stack segments
are the
same.
shows the general form of a subprogram that follows these conventions.
Lines 2 and 3 inFigure 4.3 make
up the general
prologue of a subprogram.
Lines 5 and 6 make
up the epilogue.
Figure 4.4
shows what the stack looks

like immediately after the prologue. Now the
parameter
can be access
any place
[EBP + 8] at
with
inthe subprogram
without worrying about what

else has been pushed onto the stack by the
subprogram.
After the subprogram is over, the parameters

that
were pushed on the
stack must be removed. The C calling

convention specifies that the caller
code must do this. Other conventions
are
different. For example, the Pascal
calling convention specifies that the subprogram
must
2
remove
the parame-
It is legal to add
a constant to a register
using indirect addressing.
complicated expressions
when
More
are
possible too. This topic is
covered inthe next chapter
72
subprogram_label:
push
ebp
;
save original
mov
ebp, esp
;
new
;
subprogram
pop
ret
EBP
EBP value
on stack
=ESP
code
ebp
;
restore
original EBP value
Figure 4.3: General subprogram form
ESP + 8
EBP + 8
Parameter
ESP + 4
EBP + 4
Return address
ESP
EBP
Figure 4.4:
saved EBP
ters.
(There
is
another
form
instruction that makes this
support
compilers
pascal keyword
is
of
easy to
the
RET
do.) Some C
too.
The
used in the prototype
and
this
convention
definition of the function to tell the compiler to
use
this
convention.
In
fact,
convention that the MS Windows
use
the
stdcall
API C functions
way. What is the advantage of

this way? It is a little more effi cient than the C
convention. Why do all C functions not use this
convention, then? In general, C allows a function
to have vary-ing number of arguments (e.g., the
also works this
types of
printf and scanf functions). For these

functions,
the
operation
parameters from the stack
remove the
will vary from one call
to
of the function to the next. The C convention
allows
the instructions
to perform this operation
to be easily varied from
one call to the next. The
Pascal
convention
and
stdcall
makes
operation
very
convention
(like the Pascal language)
diffi cult.
Thus,
the
does not
allow this type of function. MS Windows
this convention
since
none
this
Pascal
can use
of its API functions
take varying numbers of arguments.
Figure 4.5 shows how

C calling
removes
convention
subprogram
would
using the
be called.
Line
the parameter from the stack by directly
manipu-lating the stack pointer. A POP instruction

could be used to do this also, but would require the
useless result to be stored in a register. Actually,
case, many compilers would use

to remove the parameter.
would use a POP instead of an ADD
ADD requires more bytes for the
for this particular
POP ECX instruction
The compiler
because
the
instruction. However, the POP also changes

value! Next is another example
subprograms
that
use
program
ECXs
with two
the C calling conventions
discussed above. Line 54 (and other lines) shows

that multiple data and text segments
stack
4.5.
may
be
CALLING
CONVENTIONS
73
1
push
dword 1
call
fun
add
esp, 4
;
pass 1
as parameter
;
remove
parameter from stack
Figure 4.5: Sample subprogram call
source
file. They will be combined
into single
data and text segments

Splitting
up
in the linking
process.
the data and code into separate
segments allow the data that
a subprogram uses to
be defined close by the code of the

subprogram.
23
segment .data
4
sum
dd
56
segment .bss
7
input
resd 1
89
;
10
11
12
13
14
15
;
pseudo-code
;
i
=1;
;
sum = 0;
algorithm
;
while( get_int(i,
; sum += input;
; i++;
&input),
16
;
}
17
;
print_sum(num);
18
segment .text
global
19
20
_asm_main
_asm_main:
21
enter
22
pusha
0,0
23
mov
24
25
edx, 1
while_loop:
26
push
edx
27
push
dword input
28
call
get_int
29
add
esp, 8
sub3.asm
input != 0 ){
;
setup routine
;
edx is i inpseudo-code
;
save
i
on stack
;
push address on input on stack
;
remove i
and &input from stack
74
30
32
mov
cmp
eax, [input]
eax, 0
33
je
end_while
add
[sum],
37
inc
edx
38
jmp
short while_loop
31
34
35
eax
36
39
40
end_while:
41
push
dword [sum]
42
call
print_sum
43
pop
ecx
44
45
popa
46
leave
ret
47
48
49
50
;
subprogram
;
Parameters
get_int
(inorder pushed
on
;
sum += input
;
push value
;
remove
of
sum onto
stack
[sum] from stack
stack)
+12])
51
number of input (at [ebp
52
address of word to store input
into (at [ebp

54
+ 8])
53
;
Notes:
values of eax and ebx
are
destroyed
55
segment .data
56
prompt
db
")Enter
number (0 to quit): ",0

58
segment .text
59
get_int:
an integer
57
60
push
ebp
61
mov
ebp, esp
mov
eax, [ebp + 12]
62
63
call
print_int
66
mov
eax, prompt
67
call
print_string
69
call
read_int
70
mov
ebx, [ebp
71
mov
[ebx],
64
65
68
store input into
memory
eax
+ 8]
4.5. CALLING CONVENTIONS
72
73
pop
74
ret
ebp
75
81
;
subprogram print_sum
;
prints out the sum
;
Parameter:
; sum to print out (at [ebp+8])
;
Note: destroys value of eax
;
82
segment .data
83
result
76
77
78
79
80
db
"The
sum is ",0
84
85
segment .text
86
print_sum:
87
push
ebp
88
mov
ebp,
90
mov
eax, result
91
call
print_string
esp
89
92
93
mov
eax, [ebp+8]
94
call
print_int
95
call
print_nl
97
pop
ebp
98
ret
96
;
jump
4.5.2
sub3.
75
back to caller
Local variables
The stack
on the stack
can be used as a convenient
location
for local variables. This is exactly where C stores
normal (or automatic in C lingo) variables. Using

the stack for variables is important if
subprograms
to
be
reentrant
one
subprogram will work if it is invoked at

including the subprogram
Data not stored
any
place,
itself. In other words,
can be invoked recursively.

for variables also saves memory.
on the stack is using memory
reentrant subprograms
Using the stack
wishes
reentrant
from the beginning of the
program
until the end
program (C calls these types of variables

or static). Data stored on the stack only use
memory when the subprogram they are defined for
of the
global
is active.
Local
variables
are
stored
right
by subtracting
the number
after
the
are
allocated
of bytes
required
saved EBP value in the stack. They
76
subprogram_label:
push
ebp
;
save original
mov
ebp, esp
;
new
sub
esp, LOCAL_BYTES
;
=# bytes
;
subprogram
EBP
EBP value
onstack
=ESP
needed by locals
code
mov
esp, ebp
;
deallocate
pop
ebp
;
restore
ret
locals
original EBP value
Figure 4.6: General subprogram form with local

variables
void calc sum( int
int i
n,int
sump
sum =0;
45
for( i=1; i
<= n;i++ )
sum += i;
sump
= sum;
Figure 4.7: C version of
in the prologue of the subprogram.

shows the
register
new
is
subprogram
to
used
sum
Figure 4.6
skeleton. The EBP
access
local
variables.
Consider the C function in Figure 4.7. Figure 4.8

shows how the equivalent subprogram
could be written inassembly.
Figure 4.9 shows what the stack looks like after
pro-gram
the prologue of the
in Figure 4.8. This
section of the stack that contains the parameters,
return information
and local variable

called
Despite
the
fact
function creates
and
LEAVE
that
ENTER
a new
simplify
a stack
the
prologue and epilogue they
of a
on the stack.
invocation
stack frame
storage is
frame. Every
are not used very
often.
The prologue and epilogue of a subprogram
two special instructions that
Why
performs
performs
The
Because
the
they
prologue
can be simplified
instruction
For the
are
ENTER
code
takes
simplier
by using
specifically for this purpose. The
and
instruction
the
the slower than the equivalent
ENTER
operands
are designed
LEAVE
epilogue
two immediate
instructions!
This
calling convention, the second operand is always
0.The first operand is

an example of when the number bytes needed by local variables. The LEAVE instruction has no
one can not assume that a
operands. Figure 4.10 shows how these instructions are used. Note that the
one instruction sequence is
program skeleton (Figure 1.7) also uses ENTER and LEAVE.
faster than a multiple inis
struction
one.
4.6. MULTI-MODULE PROGRAMS
77
1
cal_sum:
push
ebp
mov
ebp, esp
sub
esp, 4
;
make room for local sum
56
-4],0
;
sum =0
mov
dword [ebp
mov
ebx, 1
;
ebx (i)= 1
for_loop:
cmp
ebx, [ebp+8]
;
is i
<= n?
jnle
end_for
12
add
[ebp-4], ebx
13
inc
ebx
14
jmp
short for_loop
10
11
;
sum += i
15
16
end_for:
17
mov
ebx, [ebp+12]
;
ebx =sump
18
mov
eax, [ebp-4]
;
eax =sum
19
mov
[ebx],
21
mov
esp, ebp
22
pop
ebp
23
ret
eax
;
*sump
=sum;
20
Figure 4.8: Assembly version of sum
4.6
Multi-Module Programs
program
A multi-module
more
than
presented
programs.
one
object
here
is
have
one
composed of
programs
All the
file.
been
multi-module
They consisted of the C driver object
file and the assembly
object
file (plus the
library
Recall
that
object
files).
combines the object files into
program.
the
linker
a single executable
up references
The linker must match
made to each label in one module (i.e. object

file) to its definition
in another
order for module A to

module B, the
After
the
use a
module.
In
label defined in
extern directive must be used.
extern
directive
delimited list of labels.
comes
a comma
The directive tells the
as external to the
are labels that can be
but are defined in another.
assembler to treat these labels
module. That is, these

used in this module,
The
asm io inc file

as external.
routines
In assembly,
defines the
labels
can not
78
read int, etc.
be
accessed
ESP
+ 16
+ 12
ESP + 8
ESP + 4
EBP + 12
ESP
EBP + 8
EBP + 4
Return address
ESP
EBP
EBP
sump
saved EBP
-4
sum
Figure 4.9:
1
subprogram_label:
enter
leave
ret
;
=# bytes
LOCAL_BYTES, 0
;
subprogram
needed by locals
code
Figure 4.10: General subprogram form with local

variables using ENTER and LEAVE
can be accessed from other modules than the

one it is defined in, it must be declared global in
its module.
Line
13
Figure
defined
of
The
global
does
this
program listing in
asm main label being
the skeleton
1.7 shows
as
directive
global.
there would be
the
Without
this
declaration,
a linker error. Why?
Because the
C code would not be able to refer to the internal
asm main label.
Next
rewritten
is the code for the previous
separate
use two
to
subprograms
modules.
two
The
(get int and print sum)
source
example,
are
in
file than the asm main routine.
main4.asm
1
segment .data
4
sum
dd
segment .bss
7
input
resd 1
segment .text
10
global
_asm_main
11
extern
get_int, print_sum
12
13
_asm_main:
enter
0,0
setup routine
14
pusha
4.6. MULTI-MODULE PROGRAMS
79
15
16
17
mov
edx, 1
while_loop:
18
push
edx
19
push
dword input
20
call
get_int
21
add
esp, 8
23
mov
eax, [input]
24
cmp
eax, 0
25
je
end_while
add
[sum],
29
inc
edx
30
jmp
short while_loop
22
26
27
eax
28
31
32
end_while:
33
push
dword [sum]
34
call
print_sum
35
pop
ecx
36
37
popa
leave
38
;
edx is i inpseudo-code
;
save i
on stack
;
push address on input on stack
;
remove
i
and &input from stack
;
sum += input
;
push value
;
remove
of
sum onto
[sum] from stack

ret
39
stack
main4.asm
sub4.asm
1
segment .data
4
prompt
db
")Enter
number (0 to quit): ",

0
an integer
segment .text
7
global
get_int, print_sum
get_int:
enter
0,0
10
11
mov
eax, [ebp + 12]
12
call
print_int
14
mov
eax, prompt
15
call
print_string
13
16
80
17
call
read_int
18
mov
ebx, [ebp
19
mov
[ebx],
+ 8]
eax
20
21
leave
22
ret
23
24
segment .data
25
result
db
"The
sum is ",0
26
27
segment .text
28
print_sum:
enter
0,0
31
mov
eax, result
32
call
print_string
34
mov
eax, [ebp+8]
35
call
print_int
36
call
print_nl
29
30
33
37
38
leave
39
ret
sub4.
;
store input into memory
;
jump back to caller
The previous example only has global code

labels; however, global data labels work exactly
same way.
the
4.7
Interfacing Assembly with C
Today,
very
few
programs
completely in assembly. Compilers

converting
are written
are very good at
high level code into effi cient machine
code. Since
it is much easier to write code in a
high
language,
level
it
is
more popular. In
more portable
addition, high level code is much
than assembly!
When assembly
is used, it is often only used
for small parts of the code. This
can be done
in
two
ways:
calling assembly subroutines from C
or inline assembly. Inline assembly allows the

programmer to place assembly statements directly
into C code. This can be very convenient; however,
there are disadvantages to inline assembly. The
assembly code must be written in the format the
compiler
uses.
supports
NASMs
require different
No
compiler
format.
formats
at
the
Different
moment
compilers
Borland and Microsoft
require MASM format. DJGPP and Linuxs

require GAS
GAS is the assembler
uses
gcc
format. The
that all GNU compilers
the AT&T syntax which
use. It
4.7. INTERFACING ASSEMBLY
WITH C
81
1
segment .data
dd
format
db
"x
45
= %d\n", 0
...
segment .text
6
push
dword [x]
;
push xs
push
dword format
;
push address
call
_printf
;
note underscore!
add
esp, 8
;
remove
10
value
of format string
parameters from stack
Figure 4.11: Call to printf
an assembly
more standardized on
technique of calling
much
subroutine is
the PC.
Assembly routines
the following
Direct
are usually
used with C for
reasons:
access
is needed to hardware features of
are
or impossible to access
the computer that
diffi cult
from C.
The routine must be
as fast as possible
and the
programmer can hand

optimize the code better than the compiler
can.
reason
as valid as it once was.

over the years
compilers can often generate very effi cient
(especially
if compiler optimizations
are
The last
is not
Compiler technology
and
code
turned
on)
routines
are: reduced
Most
has improved
The
of
the
disadvantages
of
assembly
portability and readability.
calling
conventions
already been specified. However,
there
have
are a
few
additional features that need to be described.
4.7.1
Saving registers
First, C
assumes
that
the values of the following
subroutine
maintains
The register keyword
can
registers: EBX, ESI, EDI, EBP, CS, DS, SS, ES.
This does not

dec-
mean
that
be used in
C variable
the subroutine
can not change
them internally. Instead, it means that if
it does change their values, it must restore their original values before the
subroutine returns.
The EBX, ESI and EDI values must be unmodified
because C uses these registers for register variables. Usually the stack is
used to save the original values of these registers.
ern
compilers
relatively
do this auto-is very
similar
syntaxes
laration to suggest to the

compiler that it use a register for this variable instead of
amemory locaare known as
tion. These
register variables.
different
from
Mod-
the
of MASM, TASM and NASM.
matically without requiring
any suggestions.
82
EBP + 12
value of
EBP + 8
address of format string
EBP + 4
Return address
EBP
saved EBP
Figure 4.12: Stack inside printf
4.7.2
Labels of functions
Most
compilers
prepend
single
underscore( ) character at the be-ginning of the
names
of functions
For example,
and global/static
a function
variables.
named f will be assigned
the label f.Thus, if this is to be
an
assembly
routine, it must be labelled f,not f.The Linux
gcc
compiler
does
not
Under Linux ELF executables,
use
any character.
one simply would
prepend
the label f for the C function f.However,
an underscore. Note
that inthe assembly skeleton program (Figure 1
.7), the label for the main routine is asm main.
DJGPP s
gcc
does prepend
Passing parameters
4.7.3
Under the C calling convention, the arguments of
a function are pushed on the stack inthe reverse

order that they appear inthe function call.
Consider the following C statement: printf("x
%d\n",x); Figure 4.11 shows how this would be

compiled (shown inthe equivalent NASM format)
Figure 4.12 shows what the stack looks like

after the prologue inside the printf function.
The printf function is one of the C library

functions that
can take any number
of arguments.
The rules of the C calling conventions
necessary to use were specifically written
It is not
to allow these types of functions. Since the

address assembly to process
anar- of the format
string is pushed last, its location
on the stack
will
always be at
bitrary
number
of
argu-
EBP + 8 no matter how
header file defines
many parameters are passed to the function. The

can then look at the format string to determine how many
parameters should have been passed and look for them on the stack.
Of course, if a mistake is made, printf("x = %d\n"), the printf code
good C book for details.
will still print out the double word value at [EBP
ments inC.The stdarg.h
macros
that can be used to process
them portably.
See any
printf code
+12]. However,
not be xs value!
4.7.4
Calculating addresses of local
this will
variables
Finding the address of

data
linker
or
bss segments is simple
does
this.
a label defined
However,
inthe
Basically, the
calculating
the
a local variable (or parameter) on the

stack is not as straightforward. However, this is
a very common need when calling subroutines
Consider the case of passing the address of a
variable (lets call it x) to a function
address of
WITH C
83
x is located at EBP
one cannot just use:
(lets call it foo). If

the stack,
mov
eax, ebp
8 on
-8
Why? The value
that MOV
must be computed by the

must in the end be
stores into EAX
as-sembler
a constant).
(that is, it
However, there
is an instruction that does the desired calculation.

It is called LEA (for Load Ef-fective Address). The
following
would calculate
the address
of
and
store it into EAX:
eax, [ebp
lea
-8]
Now EAX holds the address of

pushed
Do
on the
not
be
instruction
however,
is
x and
could be
stack when calling function foo
confused,
reading
it
the
looks
data
like
at
this is not true. The LEA
this
[EBP8];
instruction
not
never
true reads
The memory!
LEA in stru
It only
ction computes
the
address
that
instruction
would
be
read
by
and stores this address
another
in its first
register operand. Since it does not actually read
any memory, no memory

dword) is needed or allowed.
4.7.5
( e.g.
size designation
Returning values
Non-void
conventions
are
done. Return values
types
integral
returned
return back
C functions
The C calling
(char,
in the
value
how this is
specify
passed via registers. All
enum,
int,
EAX
register
etc.)
If they
are
are
are extended to 32-bits

are extended
are signed or unsigned types.)
smaller than 32-bits, they
when stored in EAX. (How they

depends
64-bit
on if they
are
values
returned
in the EDX:EAX
are also stored in

EAX. Floating point values are stored in the ST0
register of the math coprocessor. (This register
register
pair. Pointer values
is discussed inthe floating point chapter.)
4.7.6
The
Other calling conventions
rules
above
describe
the
standard
calling convention that is sup-ported by all 80x86

C compilers
calling
support
Often compilers
conventions
as
well.
other
When interfacing
with assembly language it is
very
know what calling
convention
using when it calls
your
important
to
the compiler
is
function. Usually, the
the standard calling con-vention;

4.
however, this is not always the case
Compilers
default is to
use
that
use
multiple
command
conventions
line switches
can
that
often
have
to
be used
change
4
The Watcom
does not
example
use
C compiler
the standard
source
is
an
example
conven-tion
of
one
that
by default. See the
code file for Watcom for details
84
the
calling
They
also
provide
to the C syntax to explicitly assign
conventions
However,
and
convention.
default
extensions
these
may vary
to
individual
extensions
are not
functions
standardized
from one compiler to another.
The GCC
compiler
allows
conventions. The convention
of
different
calling
a function can be
explicitly
declared
by using the
exten-sion. For example, to declare
uses the standard

that takes a single
that
following
a void
function
calling convention named f

int
parameter,
use
the
syntax for its prototype:
void f(int )
attribute
supports
also
GCC
attribute
the
((cdecl));
standard
convention. The function above
call
calling
could be declared
to use this convention by replacing the cdecl with

stdcall.
The difference in stdcall and cdecl is
that stdcall requires

the parameters
calling
convention
convention
take
can
a fixed
the subroutine
to
remove
from the stack (as the Pascal
does)
Thus, the stdcall
only be used with functions that
number of arguments
(i.e.
ones not
like printf and scanf).
an additional attribute
regparm that tells the compiler to use
registers to pass up to 3 integer arguments to a
function instead of using the stack.
This is a
common type of optimization
that
many
GCC also supports
called
compilers support.
Borland and Microsoft
to declare
calling
cdecl and
immediately
prototype.
would
conven-tions.
stdcall
as
act
keywords
use a common syntax
function
the
before
They
add the
to C. These
keywords
modifiers
appear
in a
and
name
function
For example, the function f above
be defined
as
follows
for Borland
and
Microsoft:
void
cdecl f(int );
There
each
are
advantages
the
of
calling
and disadvantages
conven-tions.
The
to
main
advantages
of the cdecl convention is that it is
simple and
very
flexible. It
can
be used for
any
type of C function and C compiler. Us-ing other

conventions
can
limit
the
portability
subroutine. Its main disadvantage
of
the
can
use more
is that it
some of the others and

every invocation of the function
to remove the parameters on the
be slower than
memory
(since
requires
code
stack).
The advantages of the stdcall convention is

that it
uses
less
memory
than cdecl. No stack
cleanup is required after the CALL instruction. Its
main
disadvantage
with functions
is that it
can not
that have variable
be used
numbers
of
arguments.
The advantage
uses
registers to
of using
pass
convention
that
integer parameters is speed.
The main disadvantage
is that the convention
more
parameters
complex.
Some
registers and others
on the stack.
may
be
is
in
WITH C
85
4.7.7
Examples
example
that shows how an
can be interfaced to a C program.
(Note that this program does not use the assembly
skeleton
program (Figure 1.7) or the driver.c
Next
is
an
assembly routine
module.)
main5.c
#include <stdio.h>
/ prototype
for assembly routine /

3
void calc sum( int

attribute
int main( void )

6
int
n,sum;
int )
((cdecl));
up to:);
printf (Sum integers
&n);
10
scanf(%d,
11
calc sum(n, &sum);
printf (S
(Sum is %d\n, sum);
12
return 0;
13
14
main5.c
sub5.asm
;
subroutine _calc_sum
;
finds the sum of the integers
through n
1
;
Parameters:
; n
what
to sum up to (at [ebp
+ 8])
5
sump
-pointer
into (at [ebp

7
;
void
;
{
10
11
;
;
;
+ 12])
to int to store
;
pseudo
calc_sum( int
int i,
sum
n,int
= 0;
for( i=1; i
<= n;i++ )
sum += i;
sum
C code:
*sump )
; *sump
;
}
12
13
=sum;
14
segment .text
15
global
16
_calc_sum
17
86
Sum integers
up to:10
Stack Dump # 1
EBP
= BFFFFB70
ESP
= BFFFFB68
+16
BFFFFB80
080499EC
+12
BFFFFB7C
BFFFFB80
+8
BFFFFB78
0000000A
+4
BFFFFB74
08048501
+0
BFFFFB70
BFFFFB88
-4
BFFFFB6C
00000000
-8
BFFFFB68
4010648C
Sum is 55
Figure 4.13:
18
;
local variable:
19
sum at [ebp-4]
20
_calc_sum:
21
enter
4,0
22
push
ebx
23
24
mov
25
dump_stack 1,2,4
;
;
26
mov
ecx, 1
;
;
27
dword [ebp-4],0
for_loop:
28
cmp
ecx, [ebp+8]
29
jnle
end_for
31
add
[ebp-4],
32
inc
ecx
33
jmp
short for_loop
36
mov
ebx, [ebp+12]
37
mov
eax, [ebp-4]
38
mov
[ebx],
40
pop
ebx
41
leave
30
ecx
34
35
end_for:
eax
39
Sample
run of sub5 program
;
make room for sum on stack
;
IMPORTANT!
sum =0
print out stack from ebp-8 to ebp+16
ecx
is i
inpseudocode
cmp
i
and
if not i
<= n,quit
sum += i
ebx
eax
=sump
=sum
restore ebx
42
ret
sub5.asm
WITH C
87
Why is line 22 of sub5.asm
so
important?
Because
the C calling
con-vention
value of EBX to be unmodified

call. If this is not done, it is
program
Line
macro
just a
requires
the
by the function
very
likely that the
will not work correctly.
25 demonstrates
how
works. Recall that the

numeric
the dump stack
first parameter
is
label, and the second and third
many double words to

display below and above EBP respec-tively. Figure
4.13 shows an example run of the program. For
this dump, one can see that the address of the
dword to store the sum is BFFFFB80 (at EBP +
12); the number to sum up to is 0000000A (at
parameters
EBP
determine how
+ 8); the
return address for the routine is
08048501 (at EBP

BFFFFB88
+ 4); the saved
EBP value is
(at EBP); the value
variable is 0 at (EBP
- 4); and
of the local
finally the saved
EBX value is 4010648C (at EBP
-8).
sum function could be rewritten to

return the sum as its return value instead of using
a pointer parameter. Since the sum is an
integral value, the sum should be left inthe EAX
register. Line 11of the main5
c file would be
The calc
changed to:
sum = calc
sum(n);
Also, the prototype of calc
sum would
need be
altered. Below is the modi-fied assembly code:
sub6.asm
;
subroutine _calc_sum
;
finds the sum of the integers
through n
1
2
;
Parameters:
-what to sum up to (at
;
Return value:
; value of sum
;
pseudo C code:
;
int calc_sum( int n)
;
{
=0;
10
int i,
sum
11
for( i=1; i
<= n;i++ )
13
;
;
return
14
;
}
15
segment .text
12
16
sum += i;
sum;
global
_calc_sum
17
18
;
local variable:
19
20
_calc_sum:
21
[ebp
sum at [ebp-4]
enter
4,0
+ 8])
;
make room for sum on stack
88
segment .data
format
34
db "%d", 0
...
segment .text
5
lea
eax, [ebp-16]
push
eax
push
dword format
call
_scanf
add
esp, 8
10
11
...
Figure 4.14: Calling scanf from assembly
22
23
mov
dword [ebp-4],0
24
mov
ecx, 1
25
for_loop:
26
cmp
ecx, [ebp+8]
27
jnle
end_for
29
add
[ebp-4],
30
inc
ecx
28
ecx
31
jmp
short for_loop
mov
eax, [ebp-4]
32
33
end_for:
34
35
leave
36
;
sum = 0
;
ecx is i
inpseudocode
;
cmp
i
and
;
if not
i
<= n,quit
;
sum += i
;
eax
37
= sum
ret
sub6.asm
4.7.8
Calling C functions from
assembly
One great advantage
assembly is that allows
of interfacing
as-sembly
code to
the large C library and user-written
C and
access
functions
For example, what if one wanted to call the scanf
function to read in an integer

Figure
from the keyboard?
4.14 shows code to do this. One
important
the C calling standard to the letter. This

that it
preserves
EDI registers;
registers
definitely
very
point to remember is that scanf follows
may
means
the values of the EBX, ESI and
however, the EAX, ECX and EDX

be modified!
be changed,
return value of the scanf
as
In fact, EAX
will
it will contain
the
4.8. REENTRANT AND RECURSIVE
89
SUBPROGRAMS
call. For other examples of using interfacing

with C,look at the code in
asm io.asm
which
was used to create asm io.obj.

4.8
Reentrant and Recursive
Subprograms
A reentrant subprogram must satisfy the

following properties:
It must not modify
any
code instructions.
In
a high level language

this would be diffi cult, but in assembly it is not
hard for a program to
try to modify its own code. For example:
mov
word [cs:$+7], 5
into the word 7 bytes ahead
;
previous
;
copy
add
ax, 2
statement changes 2 to 5!
This code would
work
in real mode, but in
protected
mode
operating
segment is marked
as
systems
the
code
read only. When the first
program will be aborted

on these systems. This type of programming is
bad for many reasons. It is confusing, hard to
line above executes, the
and does not allow code sharing (see
maintain
below).
It must not modify global data (such
as data
in
the data and the bss
tasegments).
in
All variables
are stored on the
stack.
There
are several
advantages to writing reentrant
code.
A reentrant subprogram
can be called
recursively.
program can
processes. On many
reentrant
multiple
multi-tasking
operating
multiple instances of
program
be
shared
by
systems, if there
are
running, only
one copy
of the code
is inmemory. Shared
libraries and DLLs (Dynamic
Link Libraries)
use this idea as well.
Reentrant
subprograms
multi-threaded
grams
work much better in
pro-
Windows 9x/NT and most UNIX-like
operating systems (Solaris,
Linux,
support
etc.)
multi-threaded
programs.
4.8.1
Recursive subprograms
These types of subprograms

The recursion
can be
occurs
Direct recursion
foo,
calls
recursion
itself
occurs
either
direct
when
inside
when
call themselves.
foos
or
indirect.
subprogram,
body.
a subprogram
say
Indirect
is not called
by itself directly, but by another subprogram
it
calls. For example, subprogram foo could call bar

and bar could call foo.
5
A multi-threaded
execution. That is,the
program
program
has multiple
threads
itself is multi-tasked.
90
of
;
finds
segment .text
global _fact
n!
_fact:
enter
0,0
67
mov
eax, [ebp+8]
cmp
eax, 1
;
eax =n
;
if n<= 1,terminate
jbe
term_cond
10
dec
eax
11
push
eax
12
call
_fact
;
eax =fact(n-1)
13
pop
ecx
;
answer
14
mul
dword [ebp+8]
;
edx:eax
15
jmp
short end_fact
16
=eax *[ebp+8]
term_cond:
mov
17
18
ineax
eax, 1
end_fact:
19
leave
20
ret
Figure 4.15: Recursive factorial function
Recursive
termination
true,
subprograms
condition.
no more
recursive
condition
recursion
routine
must
When this
recursive
calls
have
condition
is
are made. If a
a termination
does not have
or the condition never becomes true, the

will never end (much like an infinite
loop).
Figure 4.15 shows
a function
that calculates
factorials recursively.
It could be called from C
with:
x =fact
/ find 3!/
(3);
Figure 4.16 shows what the stack looks like at its

deepest point for the above function call.
Figures
4.17
and 4.18
complicated recursive example

respectively.
show
another
What is the output is for f(3)?
Note that the ENTER instruction creates
on the
stack for each recursive
recursive
more
in C and assembly,
instance of f has its
variable i. Defining
i
as
a new
call. Thus, each
own
independent
double word in the
data segment would not work the same.

SUBPROGRAMS
n=3 frame
n=2 frame
n=1frame
91
n(3)
Return address
Saved EBP
n(2)
Return address
Saved EBP
n(1)
Return address
Saved EBP
Figure 4.16: Stack frames for
factorial function
1
void f(int
x)
int i;
for( i=0; i
< x;i++ ){
printf (%d\n,
i);
f(i);
Figure 4.17: Another example (C version)
4.8.2
types
Review of C variable storage
C provides several types of variable storage.
are
global These variables
any
function and
locations
are
defined outside of
stored
(in the data
or
memory
at fixed
bss segments)
exist from the beginning of the
program
and
until the
can be accessed from any

program; however, if they are
end. By default, they

function in the
as
declared
same
static, only the functions
can access
module
in the
them (i.e. in assembly
terms, the label is internal, not external).

static These
that
are
are
local variables
the keyword
static
for two different
function
purposes!)
are also stored at fixed memory

or bss), but can only be directly
inthe functions they are defined in.
These variables
locations (in data
accessed
of
declared static. (Unfortunately, C uses
92
%define i
ebp-4
%define
segment .data
format
segment .text
;
useful macros
;
10 =\n
db "%d", 10,0
global _f
extern _printf
x ebp+8
_f:
enter
4,0
;
allocate room on stack
mov
dword [i],0
;
i
=0
13
mov
eax, [i]
;
is i
<x?
14
cmp
eax, [x]
15
jnl
quit
17
push
eax
18
push
format
19
call
_printf
20
add
esp, 8
22
push
dword [i]
23
call
_f
24
pop
eax
26
inc
dword [i]
27
jmp
short lp
for i
10
11
12
lp:
16
;
call printf
21
;
call f
25
28
;
i++
quit:
29
leave
30
ret
Figure 4.18: Another example (assembly version)

SUBPROGRAMS
93
automatic
type for
This is the default
a funcvariables are
variable defined inside

tion. This
allocated
on
the
stack when the function they
are
defined in is invoked and
are
deallocated
when the function returns.
Thus,
they
not
do
have
fixed
memory
locations.
register This keyword asks the compiler to
use
register for the data in this variable. This is
just
a request.
The compiler does not have to
honor it. If the address
of the variable
anywhere in the
program
(since registers
do not have addresses)
only
values.
simple
integral
Structured types
not fit in
automatically
types
can
be
Also,
register
can not be; they
register! C compilers
make normal
into register variables
is used
it will not be honored
automatic
without
any
would
will often
variables
hint from the
programmer.
volatile This keyword tells the compiler that the
value of the variable
may
any moment.
can not make any
change
compiler
assumptions
modified. Often
about
This
means
that the
when the variable
is
a compiler
in
the register in place of the variable in
might store the value of

register temporarily and
variable
use
section of code. It can not

do
these
types
of
optimizations
with
volatile variables. A common

example
one could
of
volatile
variable
be altered by two
threads
of
program
multi-threaded
Consider the following code:

1
If
would be
x =10;
y =20;
z = x;
could be altered by another
thread, it is
x between
so that z would not be 10.
if the x was not declared volatile, the
might assume that x is unchanged and
possible that the other thread changes
lines 1
and 3
However,
compiler
set z to 10.
use of volatile is to keep

from using a register for a variable.
Another
the compiler
94
Chapter 5
Arrays
5.1
Introduction
An array is a contiguous block of list of data

in memory. Each element
of the list must be the
same type and use exactly the same number of

bytes of memory for storage. Because of these
properties, arrays allow effi cient access of the data
by its position (or index) in the array. The
address of any element can be computed by
knowing three facts:
The address of the first element of the
array.
The number of bytes in each element

The index of the element
It is convenient
to consider the index of the
array to be zero (just as in C).

to use other values for the first
first element of the
It is possible
index, but it complicates the computations.
5.1.1
arrays
Defining
arrays
Defining
inthe data and bss
segments
To define
segment,
NASM
use
initialized
also provides
TIMES that
many
an
can
times
statements
array
in the data
the normal db, dw, etc. directives.
useful
directive
be used to repeat
without
having
by hand. Figure
named
a statement
to duplicate
the
5.1 shows several
examples of these.
array in the bss

resw, etc. directives.
Remember that these directives have an operand
that spec-ifies how many units of memory to
reserve. Figure 5.1 also shows examples of these
To define
segment,
use
an
uninitialized
the
resb,
types of definitions.
95
96
CHAPTER 5.ARRAYS
segment .data
;
define array
a1
;
define array
a2
;
same as before
a3
;
define array
a4
of 10 double words initialized to 1,2,..,10
1,2,3,4,5,6,7,8,9,10
dd
of 10 words initialized to 0
0,0,0,0,0,0,0,0,0,0
dw
using TIMES
times 10 dw 0
of bytes with 200 0s and then a 100 1s
times 200 db 0
times 100 db 1
10
11
12
segment .bss
13
;
define anarray
14
a5
15
;
define anarray
16
a6
resd
resw
of 10 uninitialized double words
10
of 100 uninitialized words
100
Figure 5.1: Defining
Defining
arrays
arrays as local variables on the
stack
no direct way to define a local

array variable on the stack. As before, one
There is
computes the total bytes required by all local

variables,
including
from ESP (either
arrays,
directly
and subtracts
or
using the
this
ENTER
instruction). For example, if
a function
needed
character variable, two double word integers and
a 50 element
4
+ 50
word
array, one
= 109 bytes.
would need 1+ 2
However, the number
a multiple of four
(112 in this case) to keep ESP on a double word
boundary.
One could arrange
the variables
inside this 109 bytes in several ways. Figure 5.2
shows two possible ways
The unused part of
subtracted
the first
from ESP should be
on double
memory accesses.
words
5.1.2
is there to keep the double
ordering
word boundaries
Accessing elements of
There
language
array,
is
as
its address
the following
array1
array
no
[ ] operator
in C. To
two
access an
arrays
in assembly
element of
an
definitions:
5, 4, 3, 2, 1
db
;
array
Here
up
must be computed. Consider
array
of bytes array2
2,1
to speed
are some
dw
of words
examples using this
5.1. INTRODUCTION
5,4,3
arrays:
97
EBP
-1
-8
EBP -12
EBP
char
unused
dword 1
dword 2
word
array
word
-100
-104
EBP -108
EBP -109
array
EBP
EBP
EBP
-112
dword 1
dword 2
char
unused
Figure 5.2: Arrangements of the stack
mov
al,[array1]
mov
al,[array1 + 1]
mov
[array1 + 3],al
mov
ax, [array2]
mov
ax, [array2 + 2]
mov
[array2
mov
ax, [array2 + 1]
+ 6],ax
Inline 5,element 1
of the word
array
;
al=array1[0]
;
al=array1[1]
;
array1[3]
=al
;
ax =array2[0]
;
ax =array2[1]
;
array2[3]
(NOT array2[2]!)
=ax
;
ax =??
is referenced, not element 2.Why?
are two byte units, so to move to the next

element of a word array, one must move two bytes
ahead, not one. Line 7 will read one byte from the
first element and one from the second. In C, the
compiler looks at the type
of a pointer in
determining how many bytes to move in an
expression that uses pointer arithmetic so that the
programmer does not have to. However, in
assembly, it is up to the programmer to take the
size of array elements in account when moving
Words
from element to element.
Figure 5.3 shows

all the
elements
example code.
code snippet that adds
of array1
in the
previous
In line 7, AX is added to DX.
Why not AL? First, the two operands of the ADD
same size. Secondly, it

would be easy to add up bytes and get a sum
that was too big to fit into a byte. By using DX,
sums up to 65,535 are allowed. However, it is
instruction
must be the
important to realize that AH is being added also.

This is why AH is set to zero
inline 3.
Figures 5.4 and 5.5 show two alternative
to calculate the
sum.
ways
The lines in italics replace
lines 6 and 7 of Figure 5.3.

1
Setting AH to
an
unsigned
zero
number.
is implicitly
If
action would be to insert

and 7
it is
a CBW
assuming that AL is
signed,
the
appropriate
instruction between lines 6
98
CHAPTER 5.ARRAYS
mov
ebx, array1
;
ebx =address
mov
dx,0
;
dx will hold sum
mov
ah,0
;
?
mov
ecx, 5
mov
al,[ebx]
;
al=*ebx
add
dx,ax
;
dx += ax (not
inc
ebx
;
bx++
loop
lp
of array1
lp:
al!)
Figure 5.3: Summing elements of an array

(Version 1)
1
mov
ebx, array1
;
ebx =address
mov
dx,0
;
dx will hold sum
mov
ecx, 5
add
dl,[ebx]
;
dl+= *ebx
jnc
next
;
if no carry
inc
dh
;
inc dh
inc
ebx
;
bx++
loop
lp
10
of array1
lp:
goto next
next:

(Version 2)
5.1.3
More advanced indirect
addressing
Not surprisingly, indirect addressing

is often used with arrays. The most
general form of an indirect
memory
reference is:
[base
reg + factor*index reg + constant]

where:
base
reg is one of the registers
EAX,
EDI.
EBX, ECX, EDX, EBP, ESP, ESI or
factor is either 1,2,4 or 8.(If 1,
factor is omitted.)
index
reg is one of the registers
EAX,
EBX, ECX, EDX, EBP, ESI, EDI.

(Note that ESP is not inlist.)
5.1. INTRODUCTION
99
1
mov
ebx, array1
;
ebx =address
mov
dx,0
;
dx will hold sum
mov
ecx, 5
add
dl,[ebx]
;
dl+= *ebx
adc
dh,0
;
dh+= carry
inc
ebx
;
bx++
loop
lp
of array1
lp:
flag
+0

(Version 3)
constant is a 32-bit constant.
The constant
can be a label (or a label

expression).
5.1.4
Example
Here is an example that uses anarray and

passes it to a function. It uses the array1c.c
program (listed below) as a driver, not the
driver.c program.
array1.
1
%define ARRAY_SIZE 100
%define NEW_LINE 10
segment .data
5
FirstMsg
db
"First 10
elements of array", 0
6
Prompt
db
"Enter index of
element to display: ",0

7
SecondMsg
db
"Element %d is
db
"Elements 20
%d", NEW_LINE, 0
8
ThirdMsg
through 29 of array", 0
9
InputFormat
db
"%d", 0
10
11
segment .bss
12
array
resd ARRAY_SIZE
13
14
segment .text
15
extern
_puts, _printf, _scanf
global
_asm_main
_dump_line
16
17
_asm_main:
enter
18
local dword variable at EBP

push
20
21
ebx
push
4,0
esi
-4
19
100
22
;
initialize array
to 100, 99,98
23
24
25
26
mov
ecx, ARRAY_SIZE
mov
ebx, array
init_loop:
27
mov
[ebx],
28
add
ebx, 4
ecx
29
loop
init_loop
31
push
dword FirstMsg
32
call
_puts
33
pop
ecx
35
push
dword 10
36
push
dword
37
call
_print_array
38
add
esp, 8
30
34
array
39
40
;
prompt user
41
Prompt_loop:
for element index
42
push
dword Prompt
43
call
_printf
44
pop
ecx
45
97,
...
CHAPTER 5.ARRAYS
;
print
out FirstMsg
;
print
first 10 elements of
lea
46
eax
= address
47
eax
push
48
push
49
call
_scanf
50
add
esp, 8
cmp
eax, 1
51
eax
eax, [ebp-4]
of local dword
array
dword InputFormat
= return value of scanf
;
je
52
InputOK
53
call
54
_dump_line
rest of line and start

Prompt_loop
57
InputOK:
over
;
dump
jmp
55
;
if input
invalid
56
58
mov
esi, [ebp-4]
59
push
dword [array
60
push
esi
61
push
dword SecondMsg
print out value of element
+ 4*esi]
;
call
62
_printf
63
add
esp, 12
5.1. INTRODUCTION
64
65
push
dword ThirdMsg
66
call
_puts
67
pop
ecx
69
push
dword 10
70
push
dword
71
call
_print_array
72
add
esp, 8
74
pop
esi
75
pop
ebx
76
mov
eax, 0
77
leave
68
array + 20*4
73
ret
78
79
80
81
;
;
routine
_print_array
101
;
print
out elements 20-29
;
address
of array[20]
return back to C
82
;
C-callable
routine that prints out
elements of a double word
array as
83
signed integers.
84
85
;
C prototype:
;
void print_array(
const int *a,int
n);
;
Parameters:
; a
pointer
(at ebp+8 on stack)
86
87
to array to print out

88
-number
integers to print out (at ebp+12
stack)
89
90
segment .data
on
of
91
OutputFormat
db
"%-5d %5d"
92
93
segment .text
global
94
95
_print_array
_print_array:
96
enter
0,0
97
push
esi
98
push
ebx
100
xor
esi, esi
101
mov
ecx, [ebp+12]
102
mov
ebx, [ebp+8]
99
103
print_loop:
push
104
ecx
105
NEW_LINE, 0
;
esi = 0
;
ecx = n
;
ebx = address of array
;
printf might change ecx!
102
+ 4*esi]
106
push
dword [ebx
107
push
esi
108
push
dword OutputFormat
109
call
_printf
110
add
esp, 12
112
inc
esi
113
pop
ecx
114
loop
print_loop
pop
pop
ebx
117
118
leave
119
ret
111
115
116
esi
array1.
CHAPTER 5.ARRAYS
;
push array[esi]
;
remove
parameters (leave ecx!)
array1c.c
#include <stdio.h>
int
4
asm main(
void );
void dump line( void );
int main()
7
{
{
int ret status
=asm main();
ret status
10
return ret status
11
12
13
14
function dump line
15
dumps all chars left in current line from
input buffer
16
17
void dump line()
18
int ch;
19
20
while( (ch
21
\n)
23
22
= getchar())
/ null body/
!= EOF && ch !=
array1c.c
5.1. INTRODUCTION
103
The LEA instruction revisited
The LEA instruction can be used

purposes than just calcuating addresses
common one is for fast computations.
the following:
lea
ebx, [4*eax
This effectively
+eax]
for other
A fairly
Consider
stores the value

of eff
5 ective
EAX ly
into
T his
st
EBX. Using LEA to do this
is both easier and
faster
than using MUL. However,
do this
one must
realize
that the expression inside the square brackets must

be a legal indirect address. Thus, for example, this
instruction
can not
be used to multiple
quickly.
5.1.5
Multidimensional Arrays
Multidimensional
different
by 6
arrays are not really very

one dimensional arrays
than the plain
mem-ory as
array.
just that,
Two Dimensional Arrays
Not
are represented in
a plain one dimensional
already discussed. In fact, they
surprisingly,
the
simplest
array is a two dimen-sional one.

A two dimensional array is often displayed as a
grid of elements. Each element is identified by a
multidimensional
pair of indices. By convention,
with the
identified
row
the first index
of the element
and the
rows
and two
second index the column.

Consider
an array
as:
with three
columns defined
int
a [3][2];
reserve
The C
room
compiler
for awould
6 (= 2r
map the elements as follows:
The C compiler would

integer
array
Index
Element
What
and
0
a[0][0]
1
a[0][1]
the table attempts
element
referenced
a[1][0]
a[1][1]
a[2][0]
a[2][1]
to show is that the
as a [0] [0] is stored at the

one dimensional array.
beginning of the 6 element

Element
is
a[0] [1] is stored in the next position
so on. Each row of the two

dimensional
array is stored contiguously in
memory. The last element of a row is followed
by the first element of the next row
This is
known as the rowwise representation of the array
and is how a C/C++ compiler would represent
the array.
(index
1) and
How does the compiler determine
[j]
appears
in the rowwise
where a[i]
representation?
simple formula will compute the index from i

and
j
The formula inthis
hard to
see
case is 2i + j.Its not too
how this formula is derived. Each
row is two elements long; so, the first element

of row i
is at position 2i. Then the position of
104
CHAPTER 5.ARRAYS
mov
2
eax, [ebp
sal
-44]
eax, 1
;
ebp
-44 isis location

;
multiple i
by 2
3
-48]
add
eax, [ebp
mov
eax, [ebp + 4*eax
add j
4
-40]
-40 is the address of a[0][0]

[ebp -52], eax
;
store
result into x (at ebp -52)
ebp
mov
Figure 5.6: Assembly for x
= a[i ][j]
This analysis also shows how the formula is
generalized to anarray with N columns: N i+j.

Notice that
columns
the formula does not depend
number of
on the
rows.
As an example, let
following code (using
us see how gcc compiles the

the array a defined above):
x = a[i ][ j];
Figure 5.6 shows the assembly this is translated
into. Thus, the compiler
essentially converts the
code to:
x = (&a[0][0] + 2i + j);
and infact, the programmer could write this way
with the
same result.
There is nothing magical about the choice of
the rowwise representation
of the
array. A
columnwise representation would work just
as
well:
0
Index
Element
a[0][0]
1
a[1][0]
a[2][0]
a[0][1]
a[1][1]
a[2][1]
Inthe columnwise representation, each column

is stored contiguously. El-ement [i][j] is stored
at position i
+ 3j.Other languages
use the columnwise
for example)
(FORTRAN,
representation. This is important when

interfacing code with multiple languages.
Dimensions Above Two
For dimensions above two, the same basic

idea is applied. Consider
three dimensional
array:
int b[4][3][2];
This
array
would be stored like it
dimensional
consecutively
arrays each of
in memory. The
how it starts out:

5.1. INTRODUCTION
105
was
size
four two
[3]
[2]
table below shows
Index
Element
b[0][0][1]
b[0][1][0]
b[0][1][1]
b[0][2][0]
b[0][2][1]
10
b[1][0][0]
b[1][0][1]
b[1][1][0]
b[1][1][1]
b[1][2][0]
Index
Element
b[0][0][0]
11
b[1][2][1]
The formula for computing the position of b[i]
[j][k] is 6i + 2j + k.The 6 is determined by
arrays. Ingeneral, for

as a[L][M][N] the position
the size of the [3][2]
anar-ray dimensioned
of element a[i][j][k] will be M
+ k.Notice
does not
i+ N
again that the first dimension (L)
appear
inthe formula.
same process
anndimen-sional array of
For higher dimensions, the
generalized. For
dimensions D1 to Dn
is
the position of element
denoted by the indices i1 to in is given by the

formula:
D2
D3
+ Dn
Dn
in1
i1
+D3
D4
Dn
i2
+ in
or for the uber math geek, it can be written more

succinctly as:
Dk
j=1
ij
k=j+1
does not
The first dimension, D1

formula.
appear
This is where
in the
you can
tell
For
the
columnwise
general formula would be:
representation,
the author
the
was
a physics
major. (Or
+ D1
i2 + + D1
was the refer-
i1
D2
D2 Dn1 in
Dn2
in1 + D1
ence to FORTRAN a give- or
away?)
inuber math geek notation:
j1
Dk
j=1
ij
k=1
Inthis case, it is the last dimension, Dn
not
appear
in the formula.
Passing Multidimensional Arrays
that does
as
Parameters inC
The
rowwise
representation
of
arrays has a direct effect in C

programming. For one dimensional arrays, the
size of the array is not required to compute where
any specific element is located in memory. This is
not true for multidimensional arrays
To access
the elements of these arrays, the compiler must
multidimensional
know all but the first dimension.
apparent
when considering
function that takes
This becomes
a
array as a
the prototype
a multidimensional
of
parameter. The following will not compile:

void
f(
int
information /
[][]);
no dimension
106
CHAPTER 5.ARRAYS
However, the following does compile:

void f(int
a [][2] );
Any two dimensional
array
with two columns
can
be passed to this function. The first dimension is

2.
not required
Do not be confused by
a function
with this
prototype:
void f( int
This defines
a []);
a single
array of integer
can be used to create an
acts much like a two
dimensional
pointers (which incidently
array
of arrays that
dimensional
array).
For higher dimensional
first dimension of the
arrays,
all but the
array must be specified

a four dimensional
parameters. For example,
array parameter
void f(int
5.2
might be passed like:
a [][4][3][2]
);
Array/String Instructions
for
The
80x86
family
of
These
instructions.
provide
are de-signed to work with

are called string
several instructions that
arrays
processors
instructions
They
use
the index registers
(ESI
an operation and then to

or decrement one or
reg-isters. The direction flag
and EDI) to perform

automatically
increment
both of the index
(DF) in the FLAGS register determines where the
are incremented or
are two instructions
that
index registers
There
decremented
modify
the
direction flag:
CLD clears the direction flag. Inthis state, the
index registers
are incre-
mented.
STD sets the direction flag. Inthis state, the
index registers
are decre-
mented.
very common
mistake in 80x86 programming
is
to forget to explicitly put the direction flag in the

correct
works
state. This often leads
to code that
most of the time (when the direction flag
happens to be in the desired state), but does not
work allthe time.
5.2.1
Reading and writing
memory
The simplest string instructions either read
or write memory or both. They may read or

write a byte, word or double word at a time.
Figure 5.7
2
A size
can be specified
here, but it is ignored by the
compiler.
5.2. ARRAY/STRING INSTRUCTIONS
107
=[DS:ESI]
= ESI 1
AX = [DS:ESI]
ESI = ESI 2
EAX = [DS:ESI]
ESI = ESI 4
AL
LODSB
STOSB
LODSW
LODSD
= AL
= EDI 1
[ES:EDI] = AX
EDI = EDI 2
[ES:EDI] = EAX
EDI = EDI 4
[ES:EDI]
EDI
ESI
STOSW
STOSD
Figure 5.7: Reading and writing string instructions
segment .data
array1
dd
1,2,3,4,5,6,7,8,9,10
34
segment .bss
5
array2
resd 10
67
segment .text
;
dont
cld
mov
esi, array1
10
mov
edi, array2
11
mov
ecx, 10
12
forget this!
lp:
13
lodsd
14
stosd
15
loop
lp
Figure 5.8: Load and store example
shows
these
pseudo-code
are several
instructions
description
with
short
of what they
do. There
points to notice here. First, ESI is used
for reading
and
easy to
EDI for writing. It is
remember this if
one
remembers that SI stands for
Source Index and DI stands for Destination Index.
Next, notice that the register that holds the data
or
is fixed (either AL, AX
that the storing instructions

the segment
as
DS
programming,
programmer
ES to detemine
this is not usually
there is only
one
should be automatically
(just
use
to write to, not DS. In protected
mode programming
since
EAX). Finally, note
is).
it
is
a problem,
data segment
initialized
However,
very
to initialize
3.
in real
important
ES
and ES
to reference it
to the
mode
for
the
correct
segment
selector
use
example
3
Another
value
of
Figure
these
complication
5.8 shows
instructions
is that
an
that
one can not copy
the
value of the DS register into the ES register directly using

single
copied
MOV instruction.
to
general
Instead,
purpose
the value of DS must
register
be
(like AX) and then
copied from that register to ES using two
108
CHAPTER 5.ARRAYS
MOVSB
byte [ES:EDI]
ESI
EDI
MOVSW
= byte [DS:ESI]
= ESI 1
= EDI 1
word [ES:EDI]
= word [DS:ESI]
= ESI 2
EDI = EDI 2
ESI
MOVSD
dword [ES:EDI]
= dword [DS:ESI]
= ESI 4
EDI = EDI 4
ESI
Figure 5.9: Memory

1
segment .bss
array
move
string instructions
resd 10
34
segment .text
;
dont
cld
mov
edi, array
mov
ecx, 10
xor
eax, eax
rep stosd
forget this!
Figure 5.10: Zero
array
example
copies
an array
The
into another.
combination
of
and
LODSx
STOSx
instruction (as in lines 13 and 14 of Figure 5.8) is
very common.
by
performed
instructions
5.8
single
5.9 describes
Figure
could
difference
be
string instruction.
MOVSx
the operations
that
these
perform. Lines 13 and 14 of Figure
be
instruction
can
In fact, this combination
replaced
with
the
with
same
single
MOVSD
The
effect.
only
would be that the EAX register would
not be used at all inthe loop.
5.2.2
The REP instruction prefix
The
80x86
family
provides
.
a
special
instruction prefix
called REP that can be used
with the above string instructions
This prefix
tells the CPU to repeat the next string instruction
a specified number of times. The ECX
4
MOV instructions.
4
A instruction
prefix
is not
special byte that is placed before
modifies
the instructions
an
a
instruction,
it is
string instruction that
behavior. Other prefixes
used to override segment defaults of memory
are
accesses
also
109
CMPSB
compares
byte [DS:ESI] and byte [ES:EDI]
= ESI 1
EDI = EDI 1
ESI
CMPSW
compares
word [DS:ESI] and word [ES:EDI]
= ESI 2
EDI = EDI 2
ESI
CMPSD
compares
dword [DS:ESI] and dword [ES:EDI]
= ESI 4
EDI = EDI 4
ESI
SCASB
compares
EDI
SCASW
compares
EDI
SCASD
AX and [ES:EDI]
compares
EDI
AL and [ES:EDI]
1
EAX and [ES:EDI]
Figure 5.11: Comparison string instructions
register is used to count the iterations (just

for the LOOP instruction).
as
Using the REP prefix,
the loop in Figure 5.8 (lines 12 to 15) could be

replaced
with a single line:
rep movsd
Figure 5.10 shows another example that
zeroes
out the contents of an array.
5.2.3
Comparison string instructions
Figure
5.11
shows
new
several
string
that can be used to

compare
memory with other memory or a register.
They are useful for comparing or searching
arrays. They set the FLAGS register just like
instructions
the
CMP instruction.
compare
corresponding
scan memory
SCASx
The
instructions
CMPSx
memory
locations and the
locations
for
specific
value.
Figure 5.12 shows
searches
array.
short code snippet that
for the number
12
The SCASD instruction
a double word
on line 10 always
in
even if the value searched for is

one wishes to find the address of
in the array, it is necessary to
adds 4 to EDI,
found. Thus, if
the 12 found
subtract 4 from EDI (as line 16 does).
5.2.4
The REPx instruction prefixes
There
are several other REP-like instruction

can be used with the comparison
prefixes that
string instructions.
Figure 5.13 shows the two
bel.
110
CHAPTER 5.ARRAYS
segment .bss
array
resd 100
34
segment .text
5
cld
mov
edi, array
;
pointer
mov
ecx, 100
;
number
of elements
mov
eax, 12
;
number
to scan for
lp:
10
scasd
11
je
found
12
loop
lp
13
;
code to perform
14
15
16
17
18
to start of array
jmp
onward
sub
edi, 4
if not found
found:
;
code to perform
;
edi now
points to 12 inarray
if found
onward:
Figure 5.12: Search example
or at most ECX times

or at most ECX
REPE, REPZ
repeats instruction while Z flag is set
REPNE, REPNZ
repeats instruction while Z flag is cleared

times
Figure 5.13: REPx instruction prefixes
prefixes
and
are
(as
describes
are
and REPZ
just
REPNE
comparison
their
synonyms
and
operation.
REPE
same
prefix
for the
REPNZ).
If the
repeated
stops because
string instruction
the result of the comparison, the index register
are
registers
decremented;
still
however,
incremented
and
of
or
ECX
the FLAGS register still
holds the state that terminated the repetition.

Why
can one not
use
the
because of
flag
to
comparisons
just look
Thus, it is possible to
to determine
see
if ECX is
if the
zero
a comparison or ECX
repeated
after
stopped
becoming
zero.
the repeated comparison?
Figure 5.14 shows
an
example
code snippet
that determines
if two
blocks
of
memory are
equal. The JE
example checks to
see
the result
on
line 7 of the
of the previous
instruction. If the repeated comparison
stopped
because it found two unequal bytes, the Z flag
will still be cleared

however,
if the
ECX became
and
no
comparisons
zero, the
branch is
stopped
made;
because
Z flag will still be set and
the code branches to the equal label.
111
1
segment .text
cld
mov
esi, block1
;
address
of first block
mov
edi, block2
;
address
of second block
mov
ecx, size
;
size of blocks
repe
cmpsb
;
repeat
je
equal
;
code to perform
jmp
inbytes
while Z flag is set
;
if Z set, blocks
if blocks
equal
are not equal
onward
equal:
10
11
;
code to perform
if equal
onward:
12
Figure 5.14: Comparing
5.2.5
memory
blocks
Example
This section contains
an assembly source
array
file
with several functions that implement
operations using string instructions. Many of the
functions
duplicate familiar C library functions.
global
_asm_copy,
_asm_find,
memory.asm
_asm_strlen, _asm_strcpy
2
segment .text
;
function _asm_copy
;
copies blocks of memory
;
C prototype
;
void asm_copy( void *dest, const
void *src, unsigned sz);
;
parameters:
; dest
pointer to buffer to copy
4
to
10
src
-pointer
sz
-number
to buffer to copy
from
11
of bytes to copy
12
13
;
next, some
helpful symbols
defined
14
15
%define dest [ebp+8]
16
%define
src
[ebp+12]
17
%define
sz
[ebp+16]
18
_asm_copy:
19
enter
0,0
20
push
esi
112
are
21
push
edi
mov
mov
mov
esi, src
22
23
24
25
edi, dest
ecx, sz
26
27
cld
28
rep
movsb
pop
pop
edi
31
32
leave
33
ret
29
30
esi
34
35
36
37
38
39
;
function _asm_find
;
searches memory for a given
;
void *asm_find( const void
;
parameters:
CHAPTER 5.ARRAYS
;
esi =address
of buffer to copy from
;
edi =address
of buffer to copy to
;
ecx
=number
;
clear
of bytes to copy
direction flag
;
execute
movsb ECX times
byte
*src, char target, unsigned sz);
40
src
-pointer
to buffer to
search
41
42
;
;
target
sz
-byte value to search for

-number of bytes inbuffer
;
return value:
; if target is found, pointer to
first occurrence of target inbuffer
43
44
45
is returned
46
else
;
NULL is returned
;
NOTE: target is a byte value,
pushed on stack as a dword value.
47
48
49
but is
The byte value is stored inthe lower
8-bits.
50
51
%define
52
%define target [ebp+12]
53
%define
src
sz
[ebp+8]
[ebp+16]
54
55
_asm_find:
56
enter
0,0
57
push
edi
mov
eax, target
58
59
has value to search for
;
al
mov
60
edi, src
61
mov
62
cld
ecx, sz
113
63
repne
64
until ECX
je
66
zero
mov
;
scan
scasb
== 0 or [ES:EDI] == AL
67
;
if not
eax, 0
short quit
;
if
found_it
flag set, then found value
return NULL pointer
65
68
found,
jmp
69
found_it:
70
mov
eax, edi
71
dec
eax
found return (DI

73
pop
74
leave
75
ret
-1)
72
;
if
quit:
edi
76
77
78
79
80
81
82
83
84
;
function _asm_strlen
;
returns the size of a string
;
unsigned asm_strlen( const char
;
parameter:
; src
pointer to string
*);
;
return value:
; number of chars
instring (not
counting, ending 0) (inEAX)

85
src [ebp +8]
86
%define
87
_asm_strlen:
88
enter
0,0
89
push
edi
91
mov
edi, src
92
mov
ecx, 0FFFFFFFFh ;
90
93
xor
94
cld
al,al
scasb
95
repnz
96
97
98
99
;
;
repnz
100
;
not
101
will go one step too far,
FFFFFFFF
-ECX
102
mov
eax,0FFFFFFFEh
103
sub
eax, ecx
104
edi
=pointer
use largest
to string
possible ECX
=0
al
scan for terminating

so length
length
-ECX,
-ecx
is FFFFFFFE
=0FFFFFFFEh
114
105
pop
106
leave
107
ret
edi
108
109
;
function
_asm_strcpy
114
;
copies a string
;
void asm_strcpy(
;
parameters:
; dest
pointer
; src
pointer
115
116
%define dest [ebp
117
%define
118
_asm_strcpy:
110
111
112
113
src
char *dest,
to string to
to string to
+ 8]
[ebp + 12]
119
enter
0,0
120
push
esi
121
push
edi
mov
edi, dest
122
123
124
mov
125
cld
126
esi, src
cpy_loop:
127
lodsb
128
stosb
129
or
al,al
130
jnz
cpy_loop
pop
pop
edi
133
134
leave
131
132
esi
CHAPTER 5.ARRAYS
const char *src);
copy to
copy from
;
load
AL & inc si
;
store
;
set
AL & inc di
condition flags
;
if not
past terminating 0,continue
ret
135
memory.
memex.c
#include <stdio.h>
#define STR SIZE 30

/ prototypes
void
asm copy(
unsigned )
void , const void void

,
attribute
asm find( const
((cdecl));
void ,
char target
void
unsigned )
5.2. ARRAY/STRING
INSTRUCTIONS
115
unsigned
attribute
asm strlen( const

((cdecl));
, const char )
10
char )
void
attribute
asm strcpy(
((cdecl));
11
char
12
int main()
13
14
char st1[STR SIZE]
15
char st2[STR SIZE];
16
char st;
17
char
= test
string;
ch;
18
19
asm copy(st2,
st1, STR SIZE);
30 chars of string /
copy
printf (%s\n,
20
all
st2);
21
22
printf (Enter
in string /
24
25
26
27
28
st
23
a char: );
/ look for byte
scanf(%c%[\n],
= asm find(st2,
&ch);
ch, STR SIZE);
if (st )
printf
pr
(Found it:%s\n, st);
else
printf (Not found\n);
29
30
31
st1[0]
= 0;
printf (Enter string :);
32
scanf(%s,
33
printf (len
st1);
= %u\n,
asm strlen(st1));
34
35
asm strcpy(
st2, st1);
copy
meaningful data in string /
36
printf (%s\n,
st2 );
37
return 0;
38
39
memex.c
116
CHAPTER 5.ARRAYS
Chapter 6
Floating Point
6.1
Floating Point
Representation
6.1.1
Non-integral binary numbers
When number systems
first chapter, only integer

Obviously,
it must
non-integral
numbers
decimal. In decimal,
were
discussed in the
were
values
to represent
be possible
in other bases
digits to the
0.123
=1
10
+2
10
as
well
+3
powers
10
Not surprisingly, binary numbers work similarly:
0.1012
=1
+0
+1
as
right of the
decimal point have associated negative
ten:
discussed.
= 0.625
of
This idea
can be combined
with the integer
methods of Chapter 1
to convert
a general
number:
= 4 + 2 +0.25 + 0.125 = 6.375

Converting from decimal to binary is not
very
diffi cult either. Ingeneral, divide the decimal

number into two parts: integer and fraction.
Convert the integer part to binary using the
methods from Chapter 1.The fractional part is

converted using the method described below.
Consider
a binary
fraction with the bits
..
labeled a,b,c,... The number inbinary then

looks like:
0.abcdef
Multiply the number by two. The binary

representation of the
new
...
number will be:
a.bcdef
117
118
CHAPTER 6.FLOATING POINT
0.5625
0.125
0.25
0.5
2
2
2
2
=
=
=
=
1.125
0.25
first bit
second bit
0.5
third bit
1.0
fourth bit
Figure 6.1: Converting 0.5625 to binary
=
=
=
=
1
0
0
1
0.85
0.7
0.4
0.8
0.6
0.2
0.4
0.8
2
2
2
2
2
2
2
2
=
=
=
=
=
=
=
=
1.7
1.4
0.8
1.6
1.2
0.4
0.8
1.6
Figure 6.2: Converting 0.85 to binary
Note that the first bit is
..
now
Replace the a with 0 to get:

0.bcdef
in the ones place.
...
and multiply by two again to get:

b.cdef
Now the second bit (b) is inthe ones position.

This procedure
bits
are
needed
can be repeated until as many

are found. Figure 6.1 shows a
real example that converts 0.5625 to binary. The

method stops when
fractional
part of
zero
is
reached.
As
another
example,
consider
converting
23.85 to binary. It is easy to convert the integral
part
(23
fractional
=
part
101112 ), but
(0.85)?
what
Figure
about
6.2 shows
beginning of this calculation. If one looks at
the
the
6.1. FLOATING POINT REPRESENTATION
119
the numbers carefully,

This
means
to
opposed
There
an infinite loop is found!

a repeating binary (as
that 0.85 is
repeating
a pattern to
is
decimal
the
in base 10)
numbers
calculation. Looking at the pattern,
that
0.85
10111.1101102
0.1101102
in the
one can see
23.85
Thus,
consequence of the above

can not be represented
using a finite number of bits
important
One
calculation is that 23.85
exactly in binary
(Just
13
as
a finite
can not
be represented in decimal with
number of digits.) As this chapter shows,
float and double variables in C
binary. Thus, values like 23.85

exactly
into
these
approximation of 23.85
To
simplify
the
are
can not
variables.
Only
an
hardware,
floating
point
are stored in a con-sistent format

This
uses scientific notation (but in binary,
powers of two, not ten). For example,
or 10111.11011001100110
.2 would be
format
23.85
be stored
can be stored.
numbers
using
stored in
..
stored
as:
1.011111011001100110...
100
(where the exponent (100) is in binary). A
normalized floating point number has the form:
1.ssssssssssssssss
eeeeeee
where 1.sssssssssssss is the significand and
eeeeeeee
6.1.2
is the exponent.
IEEE floating point
representation
The
IEEE
Electronic
(Institute
Engineers)
organization
is
on most
made
Often
binary
specific
(but not all!)
it
is
of the computer
Intels numeric (or math)
This
com-puters
supported
by
IEEE
different
defines
the
itself. For example,
coprocessors
two different
are
use it.
(which
built into all its CPUs since the Pentium)
The
and
inter-national
floating point numbers
format is used
today.
Electrical
an
that has designed
formats for storing
hardware
of
formats
with
precisions: single and double precision.
Single preci-sion is used by float variables in C
and double precision is used by double variables.
coprocessor also uses a third,

ex-tended precision. In fact,
all data in the coprocessor itself is inthis precision.
When it is stored in memory
from the
coprocessor it is converted to either single or
Intels math
higher precision called
double
precision
precision
uses a
automatically.
slightly
different
Extended
general format
than the IEEE float and double formats and
so
will not be discussed here.

1
not
It should
repeat
in
repeats
0.13
2
type
one
be
in decimal,
Some
compilers
this
use
double. (This
surprising
that
not another.
but in ternary
compilers
uses
so
base, but
(such
extended
as
number
Think
about
might
1
3
(base 3) it would
Borland)
precision.
long
However,
it
be
double
other
double precision for both double and long

is allowed by ANSI C.)
120
31
30
23
s
s
e
22
e
sign bit
-0 = positive, 1= negative
biased exponent (8-bits)
= true exponent
+ 7F (127 decimal).
The
values 00 and FF have special meaning (see text).

f
fraction
-the first 23-bits after the 1.inthe significand.

Figure 6.3: IEEE single precision
IEEE single precision
Single precision floating point
encode the number.
uses 32 bits to
It is usually accurate to 7
significant decimal digits. Floating point

numbers
are stored
in a much
more
complicated
format than integers. Figure 6.3 shows the basic

format of
a IEEE single
precision number.
are sev-eral quirks to the format. Floating

use the twos complement
representation for negative numbers. They use a
signed mag-nitude representation. Bit 31
determines the sign of the number as shown. The
There
point numbers do not
binary exponent is not stored directly.
Instead,
the sum of the exponent and 7F is stored from

bit 23 to 30. This biased exponent is always
non-negative.
The fraction part
assumes a normalized
(in the form 1.sssssssss).
significand
Since the first bit is
anone, the leading one is not stored!

This al-lows the storage of an additional bit at the
end and so increases the precision
slightly. This idea is know as the hidden one
always
representation.
How would 23.85 be
One should always keep in
stored? First, it is positive
so the sign bit is 0.
Next mind that the bytes 41BE
the true exponent
is 4, so the biased exponent is 7F+4
= 8316
Finally, the
CC CD can be interpreted
ways depending
on what aprogram does
with them!
As as single
different
precision
fraction is 01111101100110011001100 (remember the leading
one is hidden).
Putting this all together (to help clarify the different sections of the floating
point format, the sign bit and the faction have been underlined and the bits
floating
point
have been grouped into
4-bit nibbles):
number,
they
23.850000381,
represent
but
as
0100 00011
0111110 1100 1100 1100 11002
= 41BECCCC16
double word integer, they
represent
1,103,023,309!
This is not exactly 23.85

(since it is a repeating binary). If one converts
The CPU does not know
the above back to
decimal,
one finds that it is approximately

which
23.849998474.
is
correct
the
This
number is very close to 23.85, but it is not exact.
Actually, in C,23.85 interpretation!

would not be represented exactly
the left-most bit that
as above. Since
was
truncated from the exact representation is 1, the
up to 1. So 23.85 would be
as 41BE CC CD inhex using single
last bit is rounded
represented
precision. Converting this to decimal results in
23.850000381 which is a slightly better

approximation of 23.85.
6.1. FLOATING POINT REPRESENTATION
121
e =0
and
=0
denotes the number
zero (which can not be nor-
malized) Note that there is a +0 and -0.
e =0
and
f 6= 0
denotes
a denormalized
number. These
are dis-
cussed inthe next section.
e =FF
and
=0
denotes infinity (). There
are
both positive
and negative infinities.
e =FF
and
f 6= 0
an undefined
a Number).
denotes
(Not
result, known
Table 6.1: Special values of f and e
as NaN
63
62
52
0
s
51
Figure 6.4: IEEE double precision
How
would
-23.85
be
represented?
Just
change the sign bit: C1BE CC CD. Do not take

the twos complement!
Certain combinations
meanings
for IEEE floats.
these special values.
f have special
Table 6.1 describes
An infinity is produced
operation
a negative
number, adding two infinities, etc.
overflow
undefined
Normalized
or
e and
zero.
An
result is produced by an invalid
such as trying to find the square root of
by
an
of
by division
single
precision
by
numbers
N
orm alizedcan
sin
range
in magnitude
1.0
2in magnitude
( 1.1755
gle
precision
numbersfrom
can range
35)
127
35).
10
to 1.11111...
2
( 3.4028
10
126
Denormalized numbers
Denormalized
numbers
can be used to
represent numbers with magni-tudes too small to
126).
normalize tud
(i.e.
es too
below
sm all
1.0 to 2n ormal ize
For (i.e.
example,
be lo
129
consider
1.02 the number 1.0012
2
( 1.6530
39).
Inthe given normalized form, the exponent
can
is too small. However, it
be represented in
127.
the unnormal- ized form: 0.010012
2
To
store this number, the biased exponent is set to 0

(see Table 6.1) and the fraction is the complete
significand
with 2
the
of the
127
one to
number written
(i.e. all bits

the
left
of
are
the
The representation of 1.0012
as a
stored
decimal
the one
point)
129
product
including
is then:
0000 0000 00010010 0000 0000 0000 0000
122
IEEE double precision
IEEE double precision
uses 64 bits to
represent numbers and is usually accurate to

about 15 significant decimal digits. As Figure
6.4 shows, the basic format is very similar to

single precision. More bits
are used for the
biased
exponent (11) and the fraction (52) than for

single precision.
The larger
range
for the biased exponent has
two consequences. The first is that it is calculated
as the sum of the true exponent and 3FF (1023)

(not 7F as for single precision). Secondly, a large
range of true exponents (and thus a larger range
of magnitudes) is allowed. Double precision
magnitudes
308.
to 10
can range
from approximately 10
308
It is the larger field of the fraction that is

responsible for the increase in the number of
significant digits for double values.
As an example, consider 23.85 again. The
biased exponent will be 4 + 3FF
= 403 inhex.
Thus, the double representation would be:
0100 0000 0011

0111110110011001100110011001100110011001100110011010
or 40 37 D9 99 99 99 99 9A inhex. If one
converts this back to decimal, one finds
23.8500000000000014 (there are 12 zeros!) which
is a much better approximation of 23.85.
The double precision has the same special
3.
values as single precision
Denormalized
numbers are also very similar. The only main
difference is that double denormalized numbers
use 2 1023 instead of 2 127.
6.2
Floating Point Arithmetic
Floating
different
point arithmetic
than in continuous
mathematics,
all numbers
on a computer
mathematics
can
is
In
be considered
exact. As shown in the previous section,
on a
many numbers can not be represented

a finite number of bits. All
calculations are per-formed with limited precision.
In the examples of this section, numbers with an
computer
exactly
with
8-bit significand will be used for simplicity.
6.2.1
Addition
To add two floating point numbers, the
exponents must be equal. If they
are not
already equal, then they must be made equal by
shifting the significand of the number with the

smaller exponent. For example, consider 10.375
6.34375
= 16.71875
or inbinary:
1.0100110
1.0100110 2 2
1.1001011
1.1001011
The only difference is that for the infinity and
undefined values, the biased exponent
is 7FF not FF.
6.2. FLOATING POINT ARITHMETIC
123
These two numbers do not have the same
so shift
exponent
the significand
to make the
exponents the same and then add:
1.0100110
0.1100110
0.1100110
2
2
10.0001100
Note that the shifting of 1.1001011

the
Note trailing
tha
t the
one
sh
rounding
2
results
and
ift
in
drops off
after
ing of 1.100101
3.
0.11001102
nding
n0.11
2
The
result of the (or
ad
The result
result s iof
the 00110
addition,
10.00011002
4)
1.00001100
2
is equal to 10000.1102 or 16.75.
3
answer (16.71875)!
an approximation due to the round off
errors of the addition process.
This is not equal to the exact
It is only
It is important
always
mathematics
point
to realize that floating point
on a computer
an approximation
arithmetic
(or
calculator)
The
laws
is
of
do not always work with floating
numbers
on a computer.
Mathemat- ics
assumes
infinite precision which
no computer can
match. For example, mathematics
+ b)
exactly
6.2.2
= a; however,
this
teaches that (a
may not
hold true
ona computer!
Subtraction
Subtraction works very

same problems as addition.
consider 16.75 15.9375
Shifting 1.1111111
1.0000000
As
an example,
= 0.8125:
1.0000110
1.1111111
1.1111111
3
22
gives (rounding up)
0.0000110
similarly and has the
24
1.0000110
24
1.0000000
1.0000000
2
2
0.0000110
24
=0.112 =0.75 which is not
exactly correct.
6.2.3
For multiplication, the significands
are
multiplied and the exponents
10.375
2.5
added. Consider
= 25.9375:
1.0100110
1.0100000
1.0100000
2
2
10100110
10100110
1.10011111000000
are
124
Of
course,
the real result would be rounded to
8-bits to give:
1.1010000
= 11010.0002 = 26
Division is more complicated, but has similar
problems with round of f
6.2.4
errors.
Ramifications for programming
The main point of this section is that floating
calculations
are not exact.
The
programmer needs to be aware of this.
A
common mistake that programmers make with
floating point numbers is to compare them
assuming that a calculation is exact. For example,
consider a function named f(x) that makes a
complex calculation and a program is trying to
point
find the functions roots
4.
One might be tempted
to use the following statement to check to see if x

is a root:
if (f(x)
== 0.0 )
But, what if f(x) returns 1 10

likely
means
that
30?
x is a very good
This
very
approximation
of a true root; however, the equality will be false.
may not be any IEEE floating point value

x that returns exactly zero, due to round off
errors in f(x).
A much better method would be to use:
There
of
if (fabs(f(x))
< EPS )
where EPS is a macro defined to

be aEPS
very ismall
where
10).
This is true
positive value (like 110
whenever f(x) is very close to zero. Ingeneral,
to compare
another
a floating
(y) use:
point value (say x) to
if (fabs(x y)/fabs(y)
6.3
< EPS
The Numeric Coprocessor
6.3.1
Hardware
The earliest Intel
processors
had
support for floating point operations

not
mean
that
they
could
not
no
hardware
This does
perform
float
operations.
It just
by
performed
non-floating
systems
that they had to be
procedures
composed
point instructions.
Intel did
math
provide
coprocessor.
an
many
of
For these early

additional
chip
coprocessor
has machine
instructions
that perform many
floating point operations much faster than using a
software procedure (on early processors, at least
called
means
A math
10 times
4
A root of a function is a value
x such that
f(x)
=0
6.3. THE NUMERIC COPROCESSOR
125
faster! )
The
coprocessor
and
the
for
processor
80386,
80387.
integrated the math
80486 itself.
was
was a 80287
for the 8086/8088
called the 8087. For the 80286, there
The
80486DX
coprocessor
into the
Since the Pentium, all generations of
coprocessor;
however, it is still programmed as if it was a
separate unit. Even earlier systems without
a
coprocessor can install software that emulates a
math coprocessor. These emulator packages are
automatically
activated
when
a program
executes a coprocessor instruction and run a
software procedure that produces the same result
as the coprocessor would have (though much
80x86
processors
have
a builtin
math
slower, of course).
The numeric
point registers.
coprocessor
has eight
Each register
holds
data. Floating point numbers
.
as
80-bit
extended
precision
registers. The registers

ST7.
The
floating
are
point
..
always stored
numbers
are named
floating
80 bits of
in these
ST0, ST1, ST2,
registers
are
used
differently
than the integer registers of the main
are organized as
a Last-In First-Out
CPU. The floating point registers
a stack.
Recall that
a stack
(LIFO) list. ST0 always
top of the stack. All
is
refers to the value at the
new
numbers
top of the stack. Existing
the
pushed down
new
on the
are
added to
numbers
stack to make
room
are
for the
number.
register in the numeric
flags. Only the 4 flags
used for comparisons

and C3
6.3.2
The
,,
a status
It has several
There is also
coprocessor.
will be covered: C0
use of these
C1
C2
is discussed later.
Instructions
To make it easy to distinguish the normal

CPU instructions from copro-cessor
coprocessor
ones, all the
mnemonics start with an F.
Loading and storing
are several instructions that load data

copro-cessor register stack:
FLD source
loads a floating point number
from memory onto the top of
There
onto the top of the
the stack. The
source may
be a single, double
or extended
or a coprocessor register.
reads an integer from memory,
precision number
FILD
source
converts it to floating point
on top of the stack. The

source may be
a word, double word or quad word.
stores a one on the top of the
and stores the result
either
FLD1
stack.
stores
FLDZ
a zero on the top of the
stack.
are also several instructions that store

memory. Some of these
instructions also pop (i.e. remove) the number
There
data from the stack into
from
5
However, the 80486SX did not have have
integrated
coprocessor.
There
was a separate
an
80487SX
chip for these machines.
126
the stack
as it stores
FST dest
it.
stores the top of the stack (ST0)
into
memory.
The destina-
may either be a
number or a
coprocessor register.
tion
double precision
just
or
stores the top of the stack into
FSTP dest
memory
single
as FST; however,
after the number
is stored, its
value is popped from the stack.

The
single, double
may
destination
either
or extended precision number
or a coprocessor
register.
FIST dest
stores the value of the top of the
stack converted to an integer
memory.
or a double
into
may
either
a word
word.
The
The
destination
stack
itself
is
to
an
unchanged. How the floating point

number
integer depends
is
converted
onsome bits in
the
coprocessors
control
word.
This is a special (non-floating

point) word register that controls
how the coprocessor works.
By default, the control word is

initialized
so that
it rounds
to the nearest integer when it
converts to integer. However,

the
FSTCW
(Store
Control
Word) and FLDCW (Load Control

Word) instructions
can
be used
to change this behavior.

FISTP
dest
Same
as
FIST except for two
things. The top of the stack is
may
popped and the destination

also be a quad word.
There are two other instructions
move or remove data on the
that
can
stack itself.
FXCH STn
exchanges the values in ST0 and
STn on the stack (where
is register number from 1

to 7).
FFREE STn
up a register on the
as
unused or empty.
frees
stack
by marking the register
Addition and subtraction
Each of the addition instructions compute
the sum of ST0 and another operand. The result

is always stored in a coprocessor register.
FADD
may
src
ST0
coprocessor register
or a single or double
+= src. The src
be any
precision number in
memory.
dest
FADD dest, ST0
dest
may
be any
FADDP dest
coprocessor reg-ister.
or
stack. The dest
dest
may
+= ST0 then pop
be any
coprocessor
FADDP dest, STO

FIADD
+= ST0. The
src
ST0
register.
+= (float) src.
an integer to ST0. The

src must be a word or double word inmemory.
There are twice as many subtraction
Adds
instructions than addition because the order of

the operands is important for subtraction (i.e. a
+b = b + a,
but
b 6= b a!). For each
but
a b 6=there
b a!).
For
each instruction,
instruction,
is an
alternate
one that t
subtracts inthe reverse order. These
6.3.
COPROCESSOR
127
THE
reverse
NUMERIC
segment .bss
array
resq
SIZE
sum
resq
45
segment .text
6
mov
ecx, SIZE
mov
esi, array
fldz
;
ST0 =0
lp:
10
fadd
qword [esi]
;
ST0 += *(esi)
11
add
esi, 8
;
move
12
loop
lp
13
fstp
qword
to next double
;
store
sum
result into
sum
sum example
Figure 6.5: Array
R or RP. Figure 6.5 shows
a short code snippet

of anarray of doubles
up the elements
one must specify the size
memory operand. Otherwise the assembler
that adds
On lines 10 and 13,
of the
would not know whether the
memory
operand
FSUB
src
FSUBR
src
was a float
FSUB dest, ST0

FSUBR dest, ST0
FSUBP dest
or
FSUBP dest, STO
FSUBRP dest
or
FSUBRP dest, STO
FISUB
src
FISUBR
(dword)
ST0
or a double
src
(qword).
-= src. The src may be any coprocessor
register
or a single or double precision number in memory.

ST0 = src
ST0.
The src may be any coprocessor register or a single or double precision number in
memory.
dest
ister.
dest
cessor
dest
-= ST0. The dest may be any coprocessor
reg-
= ST0 -dest. The dest may be any coproregister.
-= ST0 then pop stack. The dest may be any
coprocessor
register.
= ST0 -dest then pop stack. The dest may
dest
be any
ST0 -= (float) src.
Subtracts an integer from
ST0. The src must be a word or double word in memory.
ST0 = (float) src
ST0. Subtracts ST0 from an
integer. The src must be a word or double word in
memory.
128
The multiplication instructions
are completely
analogous to the addition instructions.

FMUL
may
src
ST0 *= src. The
src
be any
precision number in
memory.
dest *= ST0. The
FMUL dest, ST0
dest
may
be any
FMULP dest
or
stack. The dest
dest *= ST0 then pop
may
be any
FMULP dest, STO

FIMUL
coprocessor
register.
ST0 *= (float) src.

an integer to ST0.
The src must be a word or double word in
memory.
src
Multiplies
are
Not surprisingly, the division instructions

analogous to the subtrac-tion instructions.
Division by
FDIV
may
zero results
inan infinity.
ST0 /= src. The
src
src
be any
precision number in
memory.
FDIVR
The
src
ST0
= src / ST0.
src may be any coprocessor register or a single or double
precision
number in
memory.
FDIV dest, ST0
dest
may
be any
dest /= ST0. The
FDIVR dest, ST0
dest
= ST0 / dest.
may be any coprocessor register.

FDIVP dest or
dest /= ST0 then pop
stack. The dest may be any
FDIVP dest, STO
FDIVRP dest or
dest = ST0 / dest
then pop stack. The dest may
FDIVRP dest, STO
be any coprocessor
The dest
register.
FIDIV
src
ST0 /= (float)
src.
Divides ST0 by an integer.

The
FIDIVR
ST0.
src must
src
Divides
ST0. The
be a word
or double word in
memory.
ST0 = (float) src /
an integer by
src must be a word or double
word in
memory.
Comparisons
The
coprocessor
also performs comparisons
of floating point numbers. The FCOM family of

instructions does this operation.
129
1
if (
x >y )
fld
qword [x]
;
ST0 =x
fcomp
qword [y]
;
compare
fstsw
ax
;
move
sahf
else_part
;
if x not above y,goto
10
11
12
13
jna
STO and
C bits into FLAGS
else_part
then_part:
;
code for then part
jmp
end_if
else_part:
;
code for else part
end_if:
Figure 6.6: Comparison example
FCOM src
compares ST0 and src. The src
can be a coprocessor register
or a float or double in memory.
FCOMP src
compares ST0 and src, then
pops stack. The src can be a
coprocessor register or a float or
double in memory.
FCOMPP
compares ST0 and ST1, then
pops stack twice.
FICOM src
compares ST0 and (float)
src. The src can be a word or

dword integer inmemory.
FICOMP
src
then pops stack.
compares ST0 and (float)src,

The src
can be a word or dword integer
inmemory.
compares
FTST
C3 bits of the coprocessor
,,
ST0 and 0.
These instructions change the C0
C1 C2 and
status register.
Unfortunately, it is not possible for the CPU to
access
these bits directly. The conditional branch
use the FLAGS register, not the

coprocessor status register. However, it is
instructions
relatively simple to trans-fer the bits of the status
word into the corresponding bits of the FLAGS

register using
some new
FSTSW dest
word into either
SAHF
instructions:
Stores the coprocessor status
a word in memory or the AX register.

Stores the AH register into the
FLAGS register.
LAHF
Loads the AH register with the
bits of the FLAGS register.

Figure 6.6 shows
a short
example code
snippet. Lines 5 and 6 transfer
the C0
,,
C1 C2
coprocessor status word into

are transfered so
are analogous to the result of a
and C3 bits of the
the FLAGS register. The bits

that they
comparison of two unsigned integers. This is

why line 7 uses
a JNA
instruction.
130
The Pentium Pro (and later
processors
new comparison
(Pentium IIand III)) support two
operators that directly modify the CPUs FLAGS

register.
compares ST0 and src. The

reg-ister.
FCOMIP src
compares ST0 and src, then
pops stack. The src must be a coprocessor
FCOMI
src must
src
be a coprocessor
register.
Figure 6.7 shows
an example
subroutine that finds
the maximum of two dou-bles using the FCOMIP

instruction. Do not confuse these instructions
with the integer comparison functions (FICOM and
FICOMP).
Miscellaneous instructions
covers some other miscellaneous

that the co-processor provides.
This section
instructions
FCHS
ST0
ST0
= -ST0 Changes
the sign of
= |ST0|
Takes the absolute
STO
= |ST0|
Takes the absolut
STO
ST0 =
Takes the square
ST0
FABS
value of ST0
ST0
value
FSQRT
of S
root of ST0
= ST02
ST0 by
a power
of 2 quickly. ST1
ST0
= ST02
multiples
bST1c
ST0
FSCALE
is not removed from the
coprocessor
stack. Figure 6.8 shows
an example
of how to use this
instruction.
6.3.3
Examples
6.3.4
Quadratic formula
The first example shows how the quadratic

formula
can be encoded
in assembly. Recall that
the quadratic formula computes the solutions to

the quadratic equation:
ax
+bx +c = 0
The formula itself gives two solutions for x:x1
and x2
x1,x2
b
2
2a
4ac
The expression inside the
square root
(b
4ac)
is called the discriminant.
Its value is useful in
determining
the
possibilities
which
1.There is only
4ac
of
following
three
are true for the solutions.
one real degenerate
solution. b
=0
2.There
are two real solutions.
3.There
are two complex
4ac
solutions. b
Here is a small C program that
uses
>0
4ac
<0
the assembly
6.3. THE NUMERIC

COPROCESSOR
131
quadt.c
#include <stdio.h>
2
int quadratic(
double, double, double, double

, double );
int main()
6
double a,b,c, root1, root2;
printf (Enter
scanf(%lf
10
if (quadratic(
11
a,b,c,&root1,
&root2) )
printf (roots:
pr
%.10g %.10g\n,
12
root2);
13
root1,
else
printf (
(No real roots\n);
14
return 0;
15
16
a,b,c:);
%lf %lf, &a, &b, &c);
quadt.c
Here is the assembly routine:

quad.
1
;
function
;
finds
quadratic
solutions to the quadratic
equation:
3
a*x^2
+ b*x +c = 0
;
C prototype:
int quadratic( double
double
6
a,double
b,
c,
double *root1,
double *root2 )
7
;
Parameters:
a,b,c
-coefficients
quadratic equation (see above)
-pointer
in
10
root1
to double to store first root

root2
-pointer
store second root in

12
of powers of
11
to double to
;
Return
value:
returns 1
if real roots found,
else 0
13
qword [ebp+8]
14
%define
15
%define b
qword [ebp+16]
16
%define
qword [ebp+24]
17
%define root1
dword [ebp+32]
18
%define root2
dword [ebp+36]
19
%define disc
qword [ebp-8]
132
20
%define one_over_2a
qword
21
22
segment .data
23
MinusFour
dw
-4
24
segment .text
25
global
26
_quadratic
_quadratic:
27
28
push
ebp
29
mov
ebp, esp
30
sub
esp, 16
31
push
ebx
32
[ebp-16]
;
allocate
2 doubles (disc & one_over_2a)
;
must save
original ebx
33
fild
word [MinusFour];
34
fld
35
fld
36
fmulp
st1
37
fmulp
st1
38
fld
39
fld
40
fmulp
st1
41
faddp
st1
42
ftst
;
;
43
fstsw
44
sahf
45
jb
46
fsqrt
47
fstp
48
fld1
49
fld
50
fscale
51
fdivp
st1
52
fst
one_over_2a
;
;
53
fld
54
fld
disc
55
fsubrp
st1
56
fmulp
st1
57
mov
ebx, root1
58
fstp
qword [ebx]
ax
no_real_solutions
;
disc
;
a
59
fld
60
fld
disc
61
fchs
stack -4
stack:
a,-4
stack:
c,a,-4
stack: a*c, -4
stack: -4*a*c
stack: b,b,-4*a*c
stack: b*b, -4*a*c
stack: b*b
-4*a*c
test with 0
;
if disc < 0,no real solutions
stack: sqrt(b*b
store and
-4*a*c)
pop stack
stack: 1.0
stack:
a,1.0
stack:
a *2^(1.0)
=2*a, 1
stack: 1/(2*a)
stack: 1/(2*a)
stack: b,1/(2*a)
stack: disc, b,1/(2*a)
stack: disc
stack: (-b
-b,1/(2*a)
+ disc)/(2*a)
store in*root1
stack: b
stack: disc, b
stack: -disc, b
133
62
fsubrp
st1
63
fmul
one_over_2a
64
mov
ebx, root2
65
fstp
qword [ebx]
66
mov
eax, 1
67
jmp
short quit
68
69
no_real_solutions:
mov
eax, 0
ebx
74
pop
mov
75
pop
ebp
76
ret
70
71
72
quit:
73
esp, ebp
-b
-disc)/(2*a)
;
stack:
-disc
;
stack:
(-b
;
store
in*root2
;
return
value is 1
;
return
value is 0
quad.asm
6.3.5
Reading
array
from file
an assembly routine reads

a file. Here is a short C test
Inthis example,
doubles from
program:
readt.c
This
assembly
from stdin
program tests
procedure.
the
32bit
read doubles
This
program
tests th
It reads the doubles
(Use redirection to read from file .)
/
5
#include <stdio.h>
extern int read doubles( FILE , double ,
int );
#define MAX 100
int main()
10
11
int i,n;
12
double a[MAX];
13
n= read
14
doubles(stdin
a,MAX);
15
for( i=0; i
< n;i++ )
16
printf (%3d %g\n, i,a[i ]);
17
return 0;
18
19
1 34
readt
Here is the assembly routine
r e
1
segment .data
format
34
db
ad
"%lf", 0
asm
segment .text
5
global
_read_doubles
extern
_fscanf
78
%define SIZEOF_DOUBLE
%define FP
dword
10
%define ARRAYP
dword
11
%define ARRAY_SIZE
dword
;
format
for fscanf()
+ 8]
[ebp + 12]
[ebp + 16]
[ebp
12
%define TEMP_DOUBLE
13
14
15
;
function
16
;
C prototype:
_read_doubles
[ebp
-8]
17
int read_doubles( FILE *fp,
double *arrayp, int array_size );18
This function reads doubles from

file into
array
anarray, until
a text
;
EOF or
19
is full.
20
;
Parameters:
21
-FILE pointer
fp
from (must be open for input)
-pointer
arrayp
read into
23
22
to double
array_size
to read
;
array to
-number of
elements inarray
24
;
Return
25
value:
number of doubles stored into
26
27
_read_doubles:
28
push
ebp
29
mov
ebp,esp
30
sub
esp, SIZEOF_DOUBLE
32
push
esi
33
mov
esi, ARRAYP
34
xor
edx, edx
31
35
36
while_loop:
37
cmp
edx, ARRAY_SIZE
38
jnl
short quit
array
(in EAX)
;
define one double on stack
;
save
esi
;
esi =ARRAYP
;
edx =array
index (initially 0)
;
is edx < ARRAY_SIZE?
;
if not, quit
loop
6.3. THE NUMERIC

COPROCESSOR
135
39
40
;
call fscanf()
to read
a double
into
TEMP_DOUBLE
41
;
fscanf()
42
might change edx
so save
43
push
edx
44
lea
eax, TEMP_DOUBLE
45
push
eax
46
push
dword format
47
push
FP
48
call
_fscanf
49
add
esp, 12
50
pop
edx
51
cmp
eax, 1
52
jne
short quit
it
53
54
56
;
copy
;
(The
57
55
TEMP_DOUBLE into ARRAYP[edx]
8-bytes of the double
are copied
-8]
58
mov
eax, [ebp
59
mov
[esi +8*edx],
eax
-4]
60
mov
eax, [ebp
61
mov
[esi +8*edx
63
inc
edx
64
jmp
while_loop
pop
esi
mov
eax, edx
71
mov
esp, ebp
72
pop
ebp
62
65
66
quit:
67
68
69
70
;
save
edx
;
push &TEMP_DOUBLE
;
push &format
;
push file pointer
;
restore
edx
;
did fscanf
return 1?
;
if not, quit
loop
+4],eax
by two 4-byte copies)
;
first copy
;
next copy
lowest 4 bytes
highest 4 bytes
;
restore esi
;
store return
value into
ret
73
6.3.6
eax
read.asm
Finding primes
This final example

numbers
looks at finding
prime
is more
one. It stores the
in an array and only divides
again. This imple-mentation
effi cient than the previous

primes it has found
by the previous
every
primes it has found
odd number to find
new
primes.
136
instead of
is that it computes
One other difference
square root
of the
guess
for
the
the next prime to
can stop searching for

coprocessor control word so
it stores the square root as an integer,
determine at what point it

factors. It alters the
that when
it
truncates
instead
of
rounding.
This
is
controlled by bits 10 and 11of the control word
are called the RC (Rounding Control)

bits. If they are both 0 (the default), the
coprocessor rounds when converting to integer. If
they are both 1, the coprocessor
truncates
These bits
integer
conversions.
careful to
save
Notice that the routine is
the original
control
word and
restore it before it returns.

Here is the C driver
program:
fprime.c
#include <stdio.h>
#include <stdlib.h>
function find primes
finds the indicated

i
number of primes
Parameters:
a array to hold
10
how
many
primes
primes to find
extern void find primes (int
a,unsigned n
);11
12
int main()
13
14
int status;
15
unsigned i;
16
unsigned
17
int
max;
a;
18
19
printf (How
find? );
20
many
primes do you wish to
scanf(%u,
&max);
21
22
a =calloc(
sizeof(int ),max);
23
24
if (a ){
25
26
find primes(a,max);
27
28
/ print out the last 20 primes found /
for(i= (max
29
? max 20
> 20 )
:0; i
< max;
i++ )
30
printf (%3d %d\n, i+1, a[i]);
3132
free(a);
status
33
34
35
else {
=0;
fprintf (stderr
36
status
37
=1;
Can not create
array
38
39
return status;
40
41
137
of %u ints\n, max);
fprime.c
Here is the assembly routine:
prime2.asm
1
segment .text
global
_find_primes
;
;
function find_primes
;
finds the indicated
number of primes
;
Parameters:
; array
array to hold primes
; n_find
how many primes to find
;
C Prototype:
10
;extern void find_primes( int *array,
11
12
%define
13
%define n_find
14
%define
15
%define isqrt
16
%define orig_cntl_wd
17
%define new_cntl_wd
array
n
18
19
_find_primes:
ebp
ebp
+8
+12
-4
-8
ebp -10
ebp -12
ebp
ebp
enter
12,0
22
push
ebx
23
push
esi
25
fstcw
word [orig_cntl_wd]
26
mov
ax, [orig_cntl_wd]
20
21
24
unsigned n_find )
;
number of primes found so far
;
floor of sqrt of guess
;
original control word
;
new
control word
;
make room for local variables
;
save possible register variables
;
get current control word
138
27
or
ax, 0C00h
28
mov
[new_cntl_wd],
29
fldcw
word [new_cntl_wd]
30
ax
31
mov
esi,[array]
32
mov
dword [esi], 2
33
mov
dword [esi + 4],3
34
mov
ebx, 5
35
mov
dword [n], 2
36
37
;
This
outer loop finds
a new
prime
;
set
rounding bits to 11(truncate)
;
esi points
;
array[0]
;
array[1]
;
ebx
to array
=2
=3
= guess = 5
;
n= 2
each iteration, which it adds to the
38
;
end of the array. Unlike
prime finding
program,
the earlier
this function
39
does not determine primeness by dividing

by allodd numbers. It only
40
;
divides
by the prime numbers that it
41
;
are stored
42
43
while_limit:
inthe array.)
44
mov
eax, [n]
45
cmp
eax, [n_find]
46
jnb
short quit_limit
48
mov
ecx, 1
49
push
ebx
50
fild
dword [esp]
51
pop
ebx
52
fsqrt
47
fistp
53
54
dword [isqrt]
has already found. (Thats why they
;
while
(n< n_find )
;
ecx is used as array
;
store guess on stack
;
load guess
;
get guess
onto
index
coprocessor
off stack
stack
;
find sqrt(guess)
;
isqrt
55
;
This
=floor(sqrt(quess))
inner loop divides
guess
(ebx) by
58
;
until it finds a prime factor of guess
;
or until the prime number to divide is
;
59
while_factor:
56
57
[esi + 4*ecx]
60
mov
eax, dword
61
cmp
eax, [isqrt]
62
jnbe
short quit_factor_prime
63
mov
eax, ebx
64
xor
edx, edx
65
div
dword [esi +4*ecx]
66
or
edx, edx
earlier computed prime numbers

(which
means guess
is not prime)
greater than floor(sqrt(guess))
;
eax
= array[ecx]
;
while
(isqrt
;
&& guess
67
jz
<array[ecx]
% array[ecx] != 0 )
short
quit_factor_not_prime
inc
68
ecx
jmp
69
short while_factor
70
71
72
;
found anew
73
74
quit_factor_prime:
prime !
mov
mov
eax, [n]
76
77
inc
eax
78
mov
[n], eax
75
dword [esi + 4*eax], ebx
79
80
quit_factor_not_prime:
81
add
ebx, 2
82
jmp
short while_limit
83
84
quit_limit:
85
86
fldcw
word [orig_cntl_wd]
87
pop
esi
88
pop
ebx
89
90
leave
139
;
add guess
;
inc n
;
try next
;
restore
;
restore
to end of
odd number
control word
register variables
ret
91
array
140
prime2.
global _dmax
23
segment .text
4
;
function
;
returns
;
C prototype
;
double
;
Parameters:
_dmax
the larger of its two double arguments
dmax( double d1,double d2 )
-first double
-second double
d1
10
d2
11
;
Return
12
13
%define d1
ebp+8
14
%define d2
ebp+16
15
_dmax:
value:
larger of d1
and d2 (inST0)
enter
0,0
18
fld
qword [d2]
19
fld
qword [d1]
;
ST0 =d1,ST1= d2
20
fcomip
st1
;
ST0 =d2
21
jna
short d2_bigger
22
fcomp
st0
;
pop d2 from stack
23
fld
qword [d1]
;
ST0 =d1
24
jmp
short exit
16
17
25
d2_bigger:
26
exit:
;
if d2 is max, nothing
27
leave
28
ret
Figure 6.7: FCOMIP example

6.3.
COPROCESSOR
141
THE
NUMERIC
to do
segment .data
dq
2.75
five
dw
;
converted
to double format
45
segment .text
6
fild
dword [five]
;
ST0 =5
fld
qword [x]
;
ST0 =2.75, ST1=5
fscale
;
ST0 =2.75 *32,ST1= 5
Figure 6.8: FSCALE example
142
Chapter 7
Structures and
C++
7.1
7.1.1
Structures
Introduction
Structures
are used inC to group together

a composite variable. This
related data into
technique has several advantages:
1.It clarifies the code by showing that the data

defined inthe structure
are intimately
related.
2.It simplifies passing the data to functions.

Instead of passing multiple
variables separately, they
single unit.
can be passed as a
3.It increases the locality
From the assembly standpoint
as an array
be considered
of the code.
a structure can
with elements of varying
arrays are always the

same size and type. This property is what allows
one to calculate the address of any element by
knowing the starting address of the array, the
size. The elements of real
size of the
elements
and the desired
elements
index.
A structures
same
elements do not have to be the
are not). Because of this

a structure must be explicitly
and is given a tag (or name) instead of a
size (and usually
each element
specified
of
numerical index.
a structure will
way as an element of an
array. To access an element, one must know the
In assembly, the element of
be accessed ina similar
starting address of the structure and the relative
offset of that element from the beginning of the
structure. However, unlike

offset
can
be calculated
element, the element of
offset by the compiler.
an array
where this
by the index
a structure
of the
is assigned
an
See the virtual
memory management
section of
Operating System text book for discussion of this term.
143
144
CHAPTER 7
STRUCTURES AND C++
Offset
Element
y
6
Figure 7.1: Structure S
Offset
Element
any
unused
y
8
Figure 7.2: Structure S
For example, consider the following structure:
struct S {
short int
x;
/ 2byte integer /
int
y;
/ 4byte integer /
double
z;
/ 8byte float
};
Figure 7.1 shows how
might look in the computers

C
standard
that
variable
of type S
memory.
The ANSI
the
elements
a
same
of
are arranged in the memory in the

as they are defined in the struct definition.
structure
order
states
It also states that the first element
very
is at the
beginning of the structure (i.e. offset zero).
It also defines another useful
macro
header file named offsetof().
inthe stddef
This
macro
computes and returns the offset of any element of

a structure. The macro takes two parameters, the
first is the name of the type of the structure, the
second is the name of the element to find the
offset of. Thus, the result of offsetof
would be 2 from Figure 7.1.
(S
y)
7.1. STRUCTURES
145
struct S {
short int
x;
/ 2byte integer /
int
y;
/ 4byte integer /
double
z;
attribute
/ 8byte float
((packed));
Figure 7.3: Packed struct using
7.1.2
gcc
Memory alignment
If one uses the offsetof

offset of y using the gcc
macro to find the
compiler, they will find that it returns 4,not 2!

Why? Because gcc (and Recall that an address is
on many other compilers) align variables on double
word boundaries by default. a double word
boundary if
In32-bit protected mode, the CPU
memory faster if the data starts at it is

divisible by 4 a double word boundary. Figure 7.2
reads
shows how the S structure really looks

using
gcc. The compiler
inserts two unused bytes
into the structure to align
y (and
z) ona double word boundary. This
shows why it is a good idea
to use offsetof to compute the offsets instead

of calculating them oneself
when using structures defined in C.
course, if the structure is only used in

assembly, the programmer
can determine the offsets himself. However, if
one is interfacing C and
assembly, it is very important that both the
Of
assembly code and the C code
agree on the offsets
of the elements of the
structure! One complication is

that different C compilers
may
give different
offsets to the elements. For

example, as we have seen, the gcc compiler creates
an S structure that looks
like Figure 7.2; however, Borlands compiler
would create
a structure
that
looks like Figure 7.1. C compilers provide
ways
to specify the alignment

used for data. However, the ANSI C standard
does not specify how this will
be done and thus, different compilers do it
differently.
The
gcc compiler
has
a flexible
and
complicated method of specifying the

alignment. The compiler allows
one to specify
the alignment of any type

using
a special syntax.
For example, the
following line:
typedef short int unaligned int
attribute
((
aligned (1)));
defines
aligned
a new type
on byte
parenthesis
named unaligned int that is

boundaries.
after
(Yes,
attribute
The 1
inthe aligned parameter
with
other
alignments.
powers
type,
gcc
the
are required!)
can be replaced
two to specify
other
(2 for word alignment, 4 for double
word alignment,
structure
of
all
was
etc.)
changed
would put
If the
to be
y at
y element of
an unaligned
offset 2.
the
int
However,
would still be at offset 8 since doubles
are
also
double word aligned by default. The definition of
zs type would have to be changed
put at offset 6.
as well for it to
146
CHAPTER 7
STRUCTURES AND C++
#pragma pack(push) /
save
alignment
state /
/ set byte alignment
#pragma pack(1)
struct S {
short int
x;
/ 2byte integer /
int
y;
/ 4byte integer /
double
z;
/ 8byte float
};
#pragma pack(pop)
/ restore
original alignment
Figure 7.4: Packed struct using Microsoft
or
Borland
The
gcc
compiler also allows
structure.
This tells the
minimum
possible
space
one to pack a
to use the
compiler
for
the
structure.
Figure 7.3 shows how S could be rewritten this
way.
This form of S would
use
the
minimum
bytes possible, 14 bytes.

Microsofts
support the
using
and
same
a #pragma
Borlands
compilers
both
method of specifying alignment
directive.
#pragma pack(1)
The directive above tells the compiler to pack
on
elements of structures
with
no extra
The
with two, four, eight
replaced
on word,
alignment
paragraph
stays
directive
these directives
in
effect
This
are
differently
lead
to
respectively.
until
The
overridden
by
problems since
often used in header files. If
files with structures,
can
be
or sixteen to specify
can cause
the header file is included
laid out
(i.e.,
one can
double word, quad word and
boundaries,
another directive
This
byte boundaries
padding).
before other header
these structures
may
be
than they would by default.
a very hard to find error

a program might lay out
modules of
Different
the elements of the structures indifferent places!

There
is
a way to
avoid
Microsoft and Borland support
current
alignment
state
and
this
problem.
a way to save
restore
the
it later
Figure 7.4 shows how this would be done.
7.1.3
Bit Fields
Bit fields allow
one to
specify members of
struct that only
use a spec-ified
number of bits.
The size of bits does not have to be
a multiple
an
of eight. A bit field member is defined like

unsigned int
or int
size appended to it.Figure
This defines
a colon and bit

7.5 shows an example.
member
a 32-bit
with
variable that is decomposed
inthe following parts:

7.1. STRUCTURES
147
struct S {
unsigned f1
:
3;
/ 3bit field
unsigned f2 :
10;
/ 10bit field /
unsigned f3 :
11;
/ 11bit field /
unsigned f4 :
8;
/ 8bit field
};
Figure 7.5: Bit Field Example
Byte \ Bit
Operation Code (08h)
Logical Unit #
msb of LBA
middle of Logical Block Address
lsb of Logicial Block Address
Transfer Length
Control
Figure 7.6: SCSI Read Command Format
8 bits
11bits
10 bits
3 bits
f4
The
f3
first
bitfield
f2
is
to the
assigned
significant bits of its double word.
f1
least
so simple if one
are actually stored in
memory. The diffi culty occurs when bitfields
span byte boundaries. Because the bytes on a
little endian processor will be reversed in memory.
However, the format is not
looks
at how the bits
For example, the S struct bitfields will look like

this in memory:
5 bits
3 bits
3 bits
5 bits
8 bits
8 bits
f2l
f1
f3l
f2m
f3m
f4
The f2l label refers to the last five bits (i.e., the
five least significant bits) of the f2 bit field. The
f2m label refers to the five most significant
bits
f2. The double vertical lines show the byte
of
boundaries.
If
one reverses
all
the bytes, the
pieces of the f2 and f3 fields will be reunited in

the correct place.
The physical
memory
layout
is not usually
important unless the data is being transfered in or
out of the
common
program
(which
with bit fields).
is actually
It is
quite
common
for
hardware devices interfaces to use odd number of
bits that bitfields could be useful to represent.

2
Actually, the ANSI/ISO
some
flexibility
However,
in exactly
common
C standard gives the compiler

how
the
compilers
bits
(gcc,
are
laid
Microsoft
Borland) will lay the fields out like this.
148
STRUCTURES AND C++
CHAPTER 7
out
and
#define MS OR BORLAND (defined( BORLANDC
)\
||
defined( MSC VER))
34
#if MS OR BORLAND
5
pragma
pack(push)
pragma
pack(1)
#endif
89
struct SCSI read cmd {
:
8;
10
unsigned opcode
11
unsigned lba msb
12
unsigned logical unit
13
unsigned lba mid
14
unsigned lba lsb :

8;
15
unsigned transfer length
16
unsigned control
:
5;
:
3;
:
8;
18
#if defined( GNUC )
20
attribute
21
:
8;
:
8;
17
19
/ middle bits /
((packed))
#endif
22
23
#if MS OR BORLAND
24
25
#endif
pragma
pack(pop)
Figure 7.7: SCSI Read Command Format

Structure
One example is SCSI
3.
A direct read command
for a SCSI device is spec-ified by sending
message to the
a six byte
device in the format specified in
Figure 7.6. The diffi culty representing
this using
bitfields is the logical block address which

different
7.6,
bytes of the command.
one sees
spans
that the data is stored
endian format.
Figure
From Figure
7.7 shows
in big
definition
that attempts to work with all compilers. The
first two lines define
a macro
that is true if the
or
code is compiled with the Microsoft

compilers.
The potentially
lines 11to 14. First
one
as a
are
might wonder why the
lba mid and lba lsb fields

and not
Borland
parts
confusing
are
defined separately
single 16-bit field? The
reason
is
that the data is inbig endian order. A 16-bit field

would be stored in little endian order by the
compiler.
fields
3
Next,
the
lba msb
appear to be reversed;
and
logical unit
however,
Small Computer Systems Interface,
an industry
7.1. STRUCTURES
149
8 bits
3 bits
control
8 bits
5 bits
transfer length
8 bits
8 bits
8 bits
lba lsb
lba mid
logical unit
lba msb
opcode
Figure 7.8: Mapping of SCSI read cmd fields

1
struct SCSI read cmd {
unsigned char opcode;
unsigned char lba msb
unsigned char logical unit
unsigned char lba mid;
unsigned char lba lsb;
unsigned char transfer length
unsigned char control;
10
:
5;
:
3;
/ middle bits /
#if defined( GNUC )
attribute
11
12
#endif
13
((packed))
Figure 7.9: Alternate SCSI Read Command
Format Structure
this is not the
case.
They have to be put in
this order. Figure 7.8 shows

mapped
are again
as a
how the fields
are
48-bit entity. (The byte boundaries
denoted by the double lines.) When this
is stored in memory inlittle endian order, the bits
are arranged
inthe desired format (Figure 7.6).
To complicate
for
the
correctly
matters
SCSI read cmd
for
Microsoft
more,
does
C.
the definition
not
If
quite
the
work
ex-pression
sizeof(SCSI read cmd)
is evalutated,
Microsoft C will return 8, not 6!This is because
the
uses
compiler
Microsoft
the
type of the
bitfield indetermining how to map the bits. Since
all the bit fields
are
defined
as
unsigned
types,
pads two bytes at the end of the
the compiler
structure to make it
words. This
can
an integral
be remedied
number of double
by making all the
fields unsigned short instead. Now, the Microsoft

does not need to add
compiler
since six bytes is

words.
any
pad bytes
an integral number of two-byte

com-pilers also work correctly
The other
with this change. Figure 7.9 shows yet another
definition that works for all three compilers. It
avoids
all but two of the bit
fields by using
unsigned char.
The reader should not be discouraged
if he
confusing.
It is
found
the previous
confusing!
The
discussion
author
often
it
finds
confusing to avoid bit fields altogether and

operations
4
Mixing
to examine
different
types
and
of
bit
modify
fields
use bit
the
leads
less
to
bits
very
confusing behavior! The reader is invited to experiment.
150
CHAPTER 7
STRUCTURES AND C++
manually.
Using structures inassembly
7.1.4
As discussed
assembly
is
above, accessing
very
much like
a structure in
an array
how one would
would zero out
accessing
a simple example, consider

write an assembly routine that
the y element of an S structure.
For
Assuming
the
prototype of the routine would be:

void
zero
y( S
s p );
the assembly routine would be:
%define
_zero_y:
y_offset
enter
0,0
mov
eax, [ebp + 8]
s_p (struct
mov
6
;
get
pointer) from stack
+ y_offset],
dword [eax
leave
ret
one to pass a structure
C allows
a function;
by value to
however, this is almost always
a bad
idea. When passed by value, the entire data in
structure
the
must be copied to the stack and
more
a pointer to a structure instead.
C also allows a structure type to be used as
the return value of a func-tion. Obviously
a
structure can not be returned in the EAX register.
then retrieved
by the routine. It is much
effi cient to pass
compilers
Different
handle
common
this
situation
com-pilers
use is to internally rewrite the function as one
that takes a structure
pointer as a parameter.
differently.
solution that
The pointer is used to put the return value into
a structure
Most
defined outside of the routine called.

assemblers
built-in support
assembly
(including
for defining
code. Consult
your
NASM)
structures
documentation
details.
7.2
in
Assembly and C++
have
your
for
The
programming
C++
extension of the C language.
language
is
an
Many of the basic
rules of interfacing C and assembly language also

apply to C++.
modified.
are
However,
Also,
some
easier to understand
assembly
language
knowledge of C++.
some
rules need to be
of the extensions of C++
with
This section
a knowledge of
assumes a basic
7.2. ASSEMBLY
AND C++
151
1
#include <stdio.h>
23
void f(int
4
printf (%d\n,
x)
{
x);
78
void f(double
9
printf (%g\n,
10
11
x)
{
x);
Figure 7.10: Two f()functions
7.2.1
Overloading and Name Mangling
C++
allows
different
functions
(and
class
same name to be
defined
When more than one function share
the same name, the functions are said to be
overloaded. If two functions are defined with the
same name in C, the linker will produce an
error because it will find two definitions for the
same symbol in the object files it is linking. For
member
functions)
with the
example, consider the code in Figure 7.10. The
assembly code would define two labels
equivalent
named f which will obviously be an error.
uses the same linking process as C, but

this error by per-forming name mangling or
C++
avoids
modifying the symbol used to label the function.
In a
way, C already uses name

adds an underscore to the name
when
creating
the
label
mangling, too. It
of the C function
for
the
function.
name of both
functions in Figure 7.10 the same way and
produce
an error. C++ uses a more
sophisticated mangling process that produces two
However,
C will
func-tion
first
assigned
the
labels for the functions.
different
the
mangle
in Figure
by DJGPP
For example,
7.10
would
be
the label f Fi and the
second function, f Fd. This avoids
any
linker
errors.
Unfortunately,
to manage
mangle
Borland
@f$qd
names
names
there is
no
standard for how
in C++ and different compilers

For
differently.
C++ would
use
the labels @f$qi and
for the two functions
However, the rules
example
are not
in Fig-ure 7.10
completely
arbitrary
name
The mangled
encodes the signature of the
function. The signature of

by the order
function is defined
and the type of its parameters.
Notice that the func-tion that takes
a single
int
has an i
at the end of its mangled
name (for both DJGPP and Borland) and that
the one that takes a double argument has a d at
the end of its mangled name. If there was a
argument
function named f with the prototype:
152
CHAPTER 7
STRUCTURES AND C++
void f(int
x,int y,double
DJGPP would mangle its
z);
name to
be f Fiid
and Borland to @f$qiid.

The return type of the function is not part of
functions
mangled
overloading
signature
name.
in
C++.
fact
explains
unique
Only
may
functions
rule
of
whose
one
can see, if two functions with the same name and
signature are defined in C++, they will produce
the same mangled name and will create a linker
error. By default, all C++ functions are name
signatures
are
is not encoded in its
and
This
be overloaded. As
even ones
mangled,
no way
are
that
When it is compiling
not
overloaded
file, the compiler has
a particular function
is overloaded or not, so it mangles all names. In
fact, it also mangles the names of global variables
by encoding the type of the variable in a similar
way as function signatures. Thus, if one defines a
global variable in one file as a certain type and
then tries to use it in another file as the wrong
type, a linker error will be produced. This
characteristic
of C++ is known as typesafe
linking. It also exposes another type of error,
inconsistent prototypes
This occurs when the
definition of a function in one module does not
agree with the prototype used by another module.
In C, this can be a very diffi cult problem to
debug. C does not catch this error. The program
of knowing whether
will compile
behavior
different
as
and link, but will have undefined

the calling
types
on
code
will be pushing
the stack than
expects. In C++, it will produce
the function
a linker error.
a function
When the C++ compiler is parsing

call, it looks for
the
types
of
a matching
the
function by looking at
arguments
passed
to the
If it finds
function
match, it then creates
CALL to the correct function using the compilers
name
mangling rules.
use
compilers
Since different
name
different
mangling rules, C++ code compiled by different
may not
compilers
This
be able to be linked together.
fact is important
when considering
using
precompiled C++ library! If one wishes to write

in assembly
function
that
a
a
will be used with
C++ code, she must know the
name
mangling
rules for the C++ compiler to be used (or use the

technique explained below).
The astute
student
may
question
whether
as
expected.
the code in Figure 7.10 will work

Since C++
printf
name
function
compiler
printf.
mangles all functions, then the

will
be
will not produce
This
is
prototype for printf
mangled
valid
was
and
CALL to the
concern!
If
the
label
the
simply placed at the
top of the file, this would happen. The prototype

is:
int printf (const char , ...);
DJGPP would mangle this to be printf FPCce.
(The Fis for function, P
The match does not have to be
compiler
will
arguments.
scope
consider
The
rules
an exact
matches
made
for this
process
of this book. Consult
a C++
match, the
by
casting
the
are
beyond
the
book for details.
7.2. ASSEMBLY
AND C++
153
for pointer, C for const,
ellipsis.)
This
would
for
the regular
for char and
not call
printf function! Of course, there must

a way for C++ code to call C code. This is
very important because there is a lot of useful old
C code around. In addition to allowing one to call
legacy C code, C++ also allows one to call
librarys
be
assembly
code using
the normal
C mangling
conventions.
C++ extends the extern keyword to allow it
to specify that the func-tion

modifies
uses
terminology,
or
global variable it
the normal C conventions. In C++

the function
or
global variable
uses
C linkage. For example, to declare printf to have

C linkage,
use the prototype:
extern C int printf (const char
...
);
use the C++

name mangling rules on this function, but instead
to use the C rules. However, by doing this, the
This instructs the compiler not to
may not be overloaded. This

provides the easiest way to interface C++ and
assembly, define the function to use C linkage and
then use the C calling convention.
printf
For
linkage
function
convenience,
of
block
C++
of
also
functions
allows
and
the
global
variables to be defined. The block is denoted by

the usual curly braces.
extern C {
/ C linkage global
prototypes
variables
and function
If one examines the ANSI C header files that
come with C/C++ com-pilers today, they will find

the following near the top of each header file:
#ifdef
cplusplus
extern C {
#endif
a similar construction near the bottom

a closing curly brace. C++ compilers
define the
cplusplus macro (with two
And
containing
leading under-scores). The snippet above encloses
the entire header file within
an extern
"C" block if
as C++, but does

nothing if compiled as C (since a C compiler
would give a syntax error for extern "C"). This
same technique can be used by any programmer
to create a header file for assembly routines that
can be used with either C or C++.
the header file is compiled
7.2.2
References
are another new feature of C++

one to pass parameters to functions
References
They allow
without explicitly using pointers.
consider
the
code
reference parameters
in Figure
are pretty
For example,
7.11.
Actually,
154
CHAPTER 7
STRUCTURES AND C++
void f(int & x )
{x++; }
// the & denotes
a reference parameter
34
int main()
5
y =5;
int
f(y);
no & here!
y); // prints out 6!
return 0;
10
// reference to y is passed, note
printf (%d\n,
Figure 7.11: Reference example
simple,
they
really
are
just
pointers.
The
programmer (j
implement
var
compiler just hides this from the
ust
as
Pascal
parameters
as
compilers
pointers)
When
the compiler
generates assembly for the function call on line 7,

it
passes
the address
function f in assembly,
6:
prototype
was
void f( int xp);
of
y. If one was writing

as if the
they would act
are
References
just
convenience
are
that
especially useful for opera-tor overloading. This is
another
define
feature
meanings
structure
use
or
is to
of C++
for
that
the
(+) operator
plus
concatenate string objects. Thus, if

strings,
the
operators
class types. For example,
define
one to
on
a common
allows
common
a and
to
b were
a + b would return the concatenation of

a and b. C++ would actually call a
strings
function to do this (in fact, these expression could
be rewritten in function notation

+(a,b)). For effi ciency,
the address
one
of the string
as operat or
pass
would like to
objects
in-stead
of
passing them by value. Without references, this
as operator +(&a,&b), but this

one to write in operator syntax as
&a + &b. This would be very awkward and
confusing. However, by using references, one can
write it as a +b,which looks very natural.
could be done
would require
7.2.3
Inline functions
Inline functions are yet another feature of

7.
Inline functions are meant to replace the
C++
error-prone,
preprocessor-based
macros
that take
pa-rameters. Recall from C, that writing a macro

that squares a number might look like:
6
Of
course,
they might want to declare the function
with C linkage to avoid
name
mangling
as discussed
in
Section 7.2.1
7
C compilers often support this feature
as an extension
7.2. ASSEMBLY AND C++
155
1
inline int
inline f (int
x)
{return xx; }
34
int f(int
5
x)
{return xx; }
67
int main()
8
int
y =inline
11
f (x);
return 0;
12
13
y,x = 5;
y =f(x);
10
Figure 7.12: Inlining example
#define SQR(x) ((x)(x))
Because the
C
and
parenthesis
preprocessor
does
are
simple
does not understand
sub-stitutions,
the
required to compute the correct
answer
in most
cases.
However,
answer
will not give the correct

Macros
overhead
are
As
demonstrated,
version
used because they eliminate
of making
function.
even this
for SQR(x++).
the
func-tion call for
several steps. For
on
chapter
performing
a very
the
simple
subprograms
function call involves
simple function, the time
it takes to make the function call
may
be
more
than the time to actually perform the operations
are a much more

that looks like a
does not CALL a
in the function! Inline functions

friendly
way to
write
normal
function,
but
common
block
functions
are
function.
code
that
of code. Instead,
calls to inline
replaced by code that performs the
C++ allows
function
to be
made
inline by placing the keyword inline in front of
the function defini-tion. For example, consider the

functions
declared
in Figure 7.12. The call
function f on line 10 does

(in assembly, assuming
y is at ebp-4):
push
dword [ebp-8]
a normal
x is at
to
function call
address ebp-8 and
3
4
call
_f
pop
mov
ecx
[ebp-4],
eax
However, the call to function inline f on line 11

would look like:
156
CHAPTER 7
STRUCTURES AND C++
mov
eax, [ebp-8]
imul
eax, eax
mov
[ebp-4],
In this
case,
there
eax
are two
advantages
to
inlining. First, the inline func-tion is faster. No
parameters
are
pushed
on
no stack
no branch is
func-tion call uses less
the stack,
frame is created and then destroyed,
made. Secondly, the inline
code! This last point is true for this example,

but does not hold true inall cases.
The main disadvantage
inline code is not linked and
of inlining
so
is that
the code of
an
inline function must be available to all files that
use it. The
previous example assembly code shows
this. The call of the non-inline
function
only
requires knowledge of the parameters, the return

value type, calling convention and the
the label for the function.
name
of
All this information
is available
from the prototype
However,
using
knowledge
of the all the code of the function.
This
means
the
that
inline
of the function.
function must be recompiled
non-inline
any part of an inline

all source files that use the
if
function is changed,
Recall that for
functions, if the prototype
change, often the files that

need not be recompiled.
use
does not
the function
For all these
the code for inline functions
header
requires
function
files. This practice
are usually
reasons,
placed in
is contrary
to the
normal hard and fast rule in C that executable
are never
code statements
7.2.4
placed inheader files.
Classes
A C++ class
describes
a type
of object. An
has both data mem-bers and function

8.
members
In other words, its a struct with
object
data and functions associated with it.Consider the
simple class defined in Figure 7.13. A variable of

Simple type would look just like
normal C
struct with a
Actually,
C++
uses
the
single int member. The
functions
are not
stored in
memory
the this keyword to access the
member
functions.
passed
structure
to
However,
functions
are different from other
pointer to the object acted
They are
a hidden parameter.
pointer to the
on from inside
This parameter is
the member
that the member function is acting

function.
the set data
method
object
on.
For
consider
assigned
example,
of the Simple
class of Fig-
ure
7.13.
like
If it
was
function
written in C, it would look
that
was
passed a
on as the code
-S switch on the
gcc and Borland
explicitly
pointer to the object being acted

in Fig-ure 7.14 shows. The
DJGPP
compilers
an
compiler
as well)
assembly
file
(and
the
tells the compiler to produce
containing
the
equivalent
assembly language for the code produced.

DJGPP and
gcc
the assembly
.s extension
8
Often called member functions
For
file ends in an
in C++
and unfortu-
or more
generally
methods.
7.2. ASSEMBLY
AND C++
157
1
class Simple {
public:
Simple();
// default constructor
Simple();
// destructor
int get data() const;
// member functions
void set data( int );
private:
int data;
// member data
};
10
11
12
Simple::Simple()
{data
= 0;}
13
14
15
Simple::Simple()
{/ null body / }
16
17
int Simple::get
18
{return data; }
data() const
19
20
21
void Simple:: set data( int

{data
x)
= x;}
Figure 7.13: A simple C++ class
nately
uses
AT&T
assembly
language
syntax
which is quite different from NASM and MASM

9
syntaxes
(Borland and MS compilers generate
a file
with a .asm extension using MASM syntax.)
7.15
Figure
converted
shows
very
DJGPP
the
purpose
of the statements. On
first line, note that the set data method
is assigned
name
of
to NASM syntax and with comments
added to clarify
the
output
the
mangled
label that encodes
the
of the method, the name of the class and the
parameters.
because
The
other
name
of the class is encoded
classes
might
have
method
named set data and the two methods must be
assigned
different
labels.
encoded
so
the
that
parameters
The
can
class
overload
set data method to take other parameters

normal C++ functions. However, just
compilers
different
are
the
just
as
as
before,
will encode this information
differently inthe mangled label.
Next
prologue
9
The
called
the
gcc
compiler
and 3, the familiar function

line
On
are
differences
outputs
several
5,
system includes
gas. The gas assembler uses
compiler
There
free
on lines 2
appears.
the code
pages on
its
in the
the
web
named
a2i
format
that
in INTEL and AT&T formats.
program
own
assembler
AT&T syntax and thus

for
discuss
gas.
the
There is also
(http://www.multimania.com/placr/a2i.html),
that
converts AT&T format to NASM format.
158
CHAPTER 7
STRUCTURES AND C++
void set data( Simple object
int
x)
{
object>data
= x;
Figure 7.14: C Version of Simple::set data()

1
_set_data__6Simplei:
mangled
name
push
ebp
mov
ebp, esp
mov
eax, [ebp + 8]
pointer to object (this)

edx, [ebp
parameter
;
data
+12]
is at offset 0
leave
10
;
edx
mov
ret
;
eax
mov
= integer
[eax], edx
Figure 7.15: Compiler output of Simple::set data(

int )
the first parameter

EAX. This is not the
hidden parameter
being acted
on the stack is stored into

x param-eter! Instead it is the
10
that points to the object
on. Line 6 stores
the
x parameter
into
EDX and line 7 stores EDX into the double word

that EAX points to. This is the data member of
the Simple
object being acted
on,
which being
the only data in the class, is stored at offset 0 in

the Simple structure.
Example
This section
a C++
uses
the ideas of the chapter to
an unsigned
can
be any size, it will be stored in an array
of unsigned integers (double words)
It can be
made any size by using dynamical allocation.
11
The double words are stored in reverse order
create
class that represents
integer of arbitrary
size. Since the integer
(i.e. the least significant double word is at index

0). Figure
Big int
7.16 shows the definition of the

12.
The size of a Big int is
class
measured by the size of the unsigned
array
that
is used to store
10
As usual, nothing is hidden inthe assembly code!
11
Why?
Because
start processing
addition
operations
at the beginning
of the
will then always
array
move
and
forward.
12
See the code example
source
for the complete
for this example. The text will only refer to

code.
159
some
code
of the
class Big int {

public:
Parameters:
explicit Big int (

size t
value of Big int
size
unsigned initial value
as a normal
unsigned int
=0);
Parameters:
13
14
15
16
18
initial
initial value
12
17
of
normal unsigned int s
10
11
as number
size of integer expressed
size
of
normal unsigned int s

initial value
initial
value of Big int
as a string
holding
hexadecimal representation of value.
/
Big int (size t
const char
19
as number
size of integer expressed
size
size
initial value );
20
21
22
Big int (const Big int & big int to copy );

Big int ();
23
24
25
// returns size of Big int (in terms of unsigned int s)

size t
size () const;
26
27
const Big int & operator
28
friend Big int operator
= (const
+(const
const Big int & op2 );
29
30
friend Big int operator (const Big int & op1,
const Big int & op2);
31
32
friend bool operator
== (const
35
36
37
Big int & op1,
const Big int & op2 );
33
34
Big int & big int to copy );
Big int & op1,
friend bool operator
< (const
Big int & op1,

friend ostream & operator
<< (ostream
&
os,
const Big int & op );
38
private:
size t
39
41
size
unsigned number
40
// size of unsigned
;
//
array
pointer to unsigned
array
holding value
};
Figure 7.16: Definition of Big int class
160
STRUCTURES AND C++
CHAPTER
// prototypes for assembly routines
extern C {
const Big int & op1,
res,
int sub big ints (Big int &
const Big int & op1,
res,
int add big ints (Big int &
10
11
12
inline Big int operator
+(const
13
Big int result (op1.size ());
14
int
= add big ints( result

== 1)
op1, op2);
throw Big int::Overflow();
16
if (res
17
== 2)
throw Big int::Size mismatch();
18
return result
19
20
res
if (res
15
Big int & op1, const Big int & op2)
21
22
23
inline Big int operator (const Big int & op1, const Big int & op2)
{
24
Big int result (op1.size ());
25
int
=sub big ints (result

== 1)
op1, op2);
throw Big int::Overflow();
27
if (res
28
== 2)
throw Big int::Size mismatch();
29
return result
30
31
res
if (res
26
Figure 7.17: Big int Class Arithmetic Code

7.2. ASSEMBLY
AND C++
161
its data. The size data member of the class is
assigned offset
zero
and the
number member is
assigned offset 4.
To
these
simplify
instances with the
same
example,
size
only
object
ar-rays can be added
to or subtracted from each other.

The class has three constructors:
(line 9) initializes
normal
unsigned
the class
integer;
initializes
the instance
contains
the second
by using
hexadecimal
the first
instance by using
value.
(line 18)
string
The
that
third
constructor (line 21) is the copy constructor.

This discussion focuses
and
subtraction
where
operators
the assembly
on
how the addition
work
language
since
this
is
is used. Figure
7.17 shows the relevant parts of the header file for

these operators. They show how the operators
set
up to
call
different compilers
rules
for
functions
the
use radically
operator
are
assembly
up
are
Since
different mangling
functions,
used to set
routines.
inline
operator
calls to C linkage
assembly routines. This makes it relatively
easy to
port to different compilers and is just

direct
calls. This technique
need to throw
an exception
as fast as
also eliminates
the
from assembly!
Why is assembly used at all here? Recall that
to perform
carry must
pre-cision arithmetic, the

moved from one dword to be
multiple
be
added to the next significant

C) do not allow the
CPUs
only
carry
flag. Performing
be done
recalculate the
by
carry
dword. C++ (and
programmer to access
having
the addition could
C++
to the next dword. It is much
can
be accessed
independently
flag and conditionally
write the code in assembly
the
more
where the
add it
effi cient to
carry
flag
and using the ADC instruction
which automatically
adds the
carry
flag in makes
a lot of sense.
For brevity, only the add big ints assembly
routine will be discussed
here. Below is the code
for this routine (from big math.asm):
big math.
1
segment .text
global
add_big_ints,
%define size_offset 0
%define number_offset 4
56
%define EXIT_OK 0
7
%define EXIT_OVERFLOW 1
%define EXIT_SIZE_MISMATCH 2
10
;
Parameters
11
%define
12
%define op1ebp+12
13
%define op2 ebp+16
for both add and
res ebp+8
14
sub_big_ints
sub routines
162
15
add_big_ints:
16
push
ebp
17
mov
ebp, esp
18
push
ebx
19
push
esi
20
push
edi
21
;
first
;
22
23
set
up esi to
edi to
CHAPTER 7.STRUCTURES AND C++
point to op1
point to op2
24
25
26
27
28
;
mov
mov
mov
;
ebx to point to res

esi, [op1]
edi, [op2]
ebx, [res]
30
;
make sure
;
31
mov
eax, [esi + size_offset]
32
cmp
eax, [edi + size_offset]
33
jne
sizes_not_equal
34
cmp
eax, [ebx + size_offset]
35
jne
sizes_not_equal
29
that all 3 Big_ints
have
36
37
38
39
40
mov
ecx, eax
;
;
now, set registers to point
;
esi = op1.number_
to their
= op2.number_
= res.number_
edi
ebx
43
;
;
44
mov
ebx, [ebx
45
mov
esi, [esi +number_offset]
46
mov
edi, [edi +number_offset]
41
42
+number_offset]
47
clc
48
xor
edx, edx
;
;
addition loop
49
50
51
52
add_loop:
53
mov
eax, [edi+4*edx]
54
adc
eax, [esi+4*edx]
55
mov
[ebx
56
inc
edx
the
same
+4*edx], eax
size
;
op1.size_
!= op2.size_
;
op1.size_
!= res.size_
;
ecx
=size of Big_ints
respective
arrays
;
clear carry
;
edx
=0
flag
;
does
not alter
carry
flag
57
loop
add_loop
jc
overflow
xor
eax, eax
jmp
done
58
59
60
ok_done:
61
62
63
overflow:
64
mov
eax, EXIT_OVERFLOW
65
jmp
done
66
sizes_not_equal:
mov
eax, EXIT_SIZE_MISMATCH
69
pop
edi
70
pop
esi
71
pop
ebx
72
leave
73
ret
67
68
done:
big math.asm
163
;
return
value
= EXIT_OK
most
Hopefully,
straightforward
of
this
code
to the reader by
to 27 store pointers
should
now.
be
Lines 25
to the Big int objects
passed to the function into registers. Remember
are just pointers. Lines 31

to 35 check to make sure that the sizes of the
three objectss arrays are the same. (Note that the
offset of size is added to the pointer to access the
that references
really
data member.) Lines 44 to 46 adjust the registers
to point to the
objects
instead
array
of
used by the respective

the
objects
themselves
(Again, the offset of the number member is added
to the object pointer.)

The loop in lines 52 to 57 adds the integers
stored in the
arrays
together
by adding the least
dword
significant
in this
first, then
the
next
least
dwords, etc. The addition must be done
significant
sequence
for extended preci-sion arithmetic
(see Section 2.1.5). Line 59 checks for overflow,
on overflow
the
carry
flag will be set by the last
addition of the most significant

dwords
array are
in the
dword. Since the
stored in little endian
order, the loop starts at the beginning of the

and
moves
array
forward toward the end.
Figure 7.18 shows
a short
example using the
Big int class. Note that Big int constants must

be declared
necessary
constructor
conversion
unsigned
as on line
reasons. First,
explicitly
for two
int
to
Big ints of the

makes conversion
that
will
a Big int.
same size can
problematic
16. This
is
there is
no
an
convert
Secondly,
since it would be
diffi cult to know what size to convert to. A

sophisticated
allow
any
author
implementation
only
be added. This
of the
more
class would
any other size. The

over complicate this
size to be added to
did not
wish to
example by implementing this here. (However, the

reader is encouraged to do this.)
164
CHAPTER
STRUCTURES AND C++
#include big int.hpp
#include <iostream>
using
namespace
std;
45
int main()
6
try {
Big int b(5,8000000000000a00b);
Big int a(5,80000000000010230);

Big int
10
cout
11
c =a + b;
<< a << + << b << = << c << endl;
for( int i=0; i

< 2; i++ ){
12
c =c + a;
13
cout
14
15
16
cout
<< c = << c << endl;
<< c1
= << c Big int(5,1)
18
cout
<< d = << d<< endl;
19
cout
<< c
20
cout
<< c > d << (c >d) << endl;
== d << (c == d) << endl;
21
22
catch( const char str ){
cerr << Caught:
23
<< str << endl;
24
25
catch( Big int ::Overflow
){
cerr << Overflow << endl;
26
27
28
catch( Big int ::Size mismatch ){
cerr << Size
29
mismatch
<< endl;
30
return 0;
31
32
<< endl;
Big int d(5, 12345678);
17
Figure 7.18: Simple Use of Big int
165
1
#include <cstddef>
#include <iostream>
using
namespace
std;
45
class A {
6
public:
cdecl m() {cout
void
int ad;
<< A::m() << endl; }
};
10
11
12
class B :
public A {
public:
cdecl m() {cout
13
void
14
int bd;
15
<< B::m() << endl; }
};
16
17
void f( A
18
= 5;
19
p>ad
20
p>m();
21
p)
22
23
int main()
24
25
A a;
26
Bb;
27
cout
28
29
cout
<< Size of a: << sizeof(a)

<< Offset of ad: << offsetof(A,ad) << endl;
<< Size of b: << sizeof(b)
30
<< Offset
of ad: << offsetof(B,ad)
31
<< Offset
of bd: << offsetof(B,bd)
32
f(&a);
33
f(&b);
<< endl;
return 0;
34
35
Figure 7.19: Simple Inheritance
166
_f__FP1A:
push
ebp
mov
ebp, esp
mov
eax, [ebp+8]
mov
dword [eax], 5
mov
eax, [ebp+8]
push
eax
call
_m__1A
add
esp, 4
10
leave
11
ret
;
mangled function name
;
eax points to object
;
using offset 0 for ad
;
passing address of object
to A::m()
;
mangled
method
name
for A::m()
Figure 7.20: Assembly Code for Simple

Inheritance
7.2.5
Inheritance and Polymorphism
Inheritance
data
and
consider
allows
methods
one
of
class
another.
to inherit
For
the
example,
the code in Figure 7.19. It shows two
classes, A and B, where class B inherits from A.

The output of the program is:
Size of
a:4 Offset
of ad: 0
Size of b:8 Offset of ad: 0 Offset of

bd: 4 A::m()
A::m()
Notice that the ad data members of both classes

(B inherits it from A) are at the
same offset. This

may be passed
or any object of a
is important since the f function
pointer to either
an A object
type derived (i.e. inherited

shows
the (edited)
asm
from) A. Figure 7.20
code
for the function
(generated
by gcc).
Note that inthe output that As
a and
one can see that
m method was
called for both the
b objects. From the
assembly,
the call to A::m() is
hard-coded
into
object-oriented
the
function.
programming,
should depend
on what type
the function.
This
For
true
the method
called
of object is passed to
is known
as
polymorphism.
C++ turns this feature off by default. One
uses
the virtual keyword to enable it. Figure 7.21

shows how the two classes
would be changed.
None of the other code needs to be changed.

Polymor- phism
Unfortunately,
transition
can
be implemented
gccs
at the time of this writing
becoming significantly
initial
cover
the
more
implementation.
simplifying
many ways.
implementation
complicated
In
this discussion,
the
is
in
and is
than its
interest
of
the author will only
the implementation of polymor-phism which
Windows
based
Microsoft
and
Borland
167
class A {
public:
virtual void
cdecl m() {cout
<< A::m() << endl; }
int ad;
};
67
class B :
public A {
8
public:
virtual void
11
cdecl m() {cout
<< B::m() << endl; }
int bd;
10
};
Figure 7.21: Polymorphic Inheritance
implementation has not changed in many
years
and probably will not change inthe foreseeable

future.
With these changes, the output of the
program
changes:
Size of
a:8 Offset
of ad: 4
Size of b:12 Offset of ad: 4 Offset of

bd: 8 A::m()
B::m()
Now the second call to f calls the B: :m()
a B object.
This is
not the only change however. The size of
an A is
method because it is passed
now
8 (and B is 12). Also, the offset of ad is 4,
answer to these
not 0.What is at offset 0? The
are
questions
related
to how polymorphism
is
implemented.
A C++ class that has
any
virtual methods is
given
an extra
array
of method pointers
called
the vtable. For the A and B classes this
pointer
hidden field that is a pointer to an

13.
This table is often
is stored
compilers
always
at offset
put
0. The
this
Windows
pointer
at
the
beginning of the class at the top of the inheritance
tree. Looking at the assembly code (Figure 7.22)

generated
for function f (from Figure 7.19) for
the virtual method version
can see
of the
that the call to method
program, one
m is not to a
label. Line 9 finds the address of the vtable from

the object. The address of the object is pushed
on the
stack in line 11. Line 12 calls the virtual
method by branching to the first address in the

14.
vtable
This call does not use a label, it branches
to the code address pointed to by EDX. This

type of call is an
13
For classes
always
without
virtual
make the class compatible
methods
with
C++
compilers
normal C struct
with the
14
It
Of
same
data members.
course,
was put
this value is already
there in line 8 and
and the next line changed
very
effi cient
because
optimizations turned
it
in the ECX register.
line 10 could be removed
to push ECX. The code is not
was
generated
without
compiler
on.
168
?f@@YAXPAVA@@@Z:
push
ebp
mov
ebp, esp
mov
eax, [ebp+8]
mov
dword [eax+4], 5
mov
mov
ecx, [ebp + 8]
10
mov
eax, [ebp + 8]
11
push
eax
12
call
dword [edx]
13
add
esp, 4
45
78
edx, [ecx]
14
15
pop
16
ret
ebp
;
p->ad
=5;
=p
;
edx = pointer
;
eax = p
;
ecx
;
push "this"
to vtable
pointer
;
call first function
;
clean up stack
invtable
Figure 7.22: Assembly Code for f() Function
example of late binding. Late binding delays the

decision of which method
running.
appropriate
This
allows
to call until the code is

the
code
to
call
the
method for the object. The normal
case
(Figure 7.20) hard-codes
call to
certain
method and is called early binding (since here the

method is bound early, at compile time).
The attentive reader will be wondering
why
the class methods in Fig-ure 7.21 are explicitly
declared
to
using the
use
uses a different
methods
passes
the C calling
convention
calling
conven-tion for
than the standard
C++ class
C convention.
It
on by
the
the pointer to the object acted
method in the ECX register instead

stack.
The
stack
explicit parameters
modifier
is still used
of using the
for the other
of the method. The
tells it to
convention.
use
Borland
the standard
uses
C++
lets
complicated
look
example
at
(Figure
cdecl
C calling
the C calling
convention by default.
Next
by
cdecl keyword. By default, Microsoft
slightly
7.23)
more
In it, the
classes A and B each have two methods: m1 and
m2. Remember
its
own
method.
that since class B does not define
m2 method,
Figure
7.24
it inherits
the
shows
the b object
how
A classs
appears in memory. Figure 7.25 shows the output

of the program. First, look at the address of the
vtable
for each
object.
The
two B objectss
are the same and thus, they share the

same vtable. A vtable is a property of the class
not an object (like a static data member). Next,
addresses
look
at the addresses
looking at assembly
in the
output,
169
vtables.
one can
From
determine
class A {
public:
virtual void
cdecl m1() {cout
virtual void
cdecl m2() {cout
int ad;
<< A::m1() << endl; }

<< A::m2() << endl; }
};
78
class B :
public A {
9
cdecl m1() {cout
virtual void
10
<< B::m1() << endl; }
int bd;
11
12
// Binherits As m2()
public:
};
13
/ prints the vtable of given object /
14
void print vtable (A
15
pa
psees pa as an array
16
//
17
unsigned
18
// vt
sees
of dwords
p = reinterpret
vtable
= reinterpret
19
void vt
20
cout
21
for( int i=0; i

< 2;i++ )
<< hex << vtable
cout
22
cast<unsigned
as an array
<< dword
>(pa);
of pointers
cast<void
address
>(p[0]);
= << vt << endl;
<< i
<< :
<< vt[i] << endl;
23
functions in EXTREMELY
24
// call
25
void (m1func
virtual
pointer)(A
26
m1func pointer
= reinterpret
27
m1func pointer(pa);
);
nonportable
way!
// function pointer variable
cast<void ()(A)>(vt[0]);
// call method m1via function pointer
28
29
void (m2func
pointer)(A
30
m2func pointer
= reinterpret
m2func pointer(pa);
31
32
);
// function pointer variable
cast<void ()(A)>(vt[1]);
// call method m2 via function pointer
33
34
int main()
35
36
A a;
Bb1;
37
cout
<< a:
Bb2;
<< endl;
print vtable (&a);
38
cout
<< b1:
<< endl; print vtable (&b);
39
cout
<< b2:
<< endl; print vtable (&b2);
40
return 0;
41
Figure 7.23: More complicated example
170
CHAPTER 7
STRUCTURES AND C++
048
vtablep
ad
bd
&B::m1()
&A::m2()
vtable
b1
Figure 7.24: Internal

representation of b1
a:
vtable address
=004120E8
dword 0:
00401320
dword 1:
00401350
A::m1()
A::m2()
b1:
vtable address
=004120F0
dword 0:
004013A0
dword 1:
00401350
B::m1()
A::m2()
b2:
vtable address
=004120F0
dword 0:
004013A0
dword 1:
00401350
B::m1()
A::m2()
Figure 7.25: Output of program inFigure 7.23
7.2. ASSEMBLY
AND C++
171
is at offset 0 (or dword 0) and m2 is at offset 4

(dword 1). The m2 method
pointers
for the A and B class vtables
are the same
because
class B
inherits the m2 method from the A class.
Lines 25 to 32 show how
one
could call
virtual function by reading its address out of the

15.
vtable for the object
The method address is
stored into
C-type
function
pointer
with
an
explicit this pointer. From the output in Figure
7.25,
one can see
that it does work. However,
please do not write code like this! This is only

used to illustrate
how the virtual methods
use
the vtable.
There
are some
practical lessons to learn from
this. One important fact is that
to be
very
one
careful when reading
class variables to a binary file. One
use a binary
to the file! This
and writing
can not just

or write on the entire object as
or write out the vtable pointer
is a pointer to where the vtable
read
this would read
would have
memory and will vary

from program to program. This same problem
can occur in C with structs, but in C, structs only
have pointers in them if the programmer explicitly
puts them in. There are no obvious pointers
defined ineither the A or B classes.
resides in the programs
Again, it is important to realize that different

compilers
implement
vir-tual methods differently
In Windows, COM (Component
use
class objects
16
interfaces
vtables
compilers
Only
virtual method
vtables
as
Object Model)
to implement
that
COM
implement
does
can
uses
one of
the
Microsoft
create COM classes. This is why Borland
same implementation as Microsoft and

reasons why gcc can not be used to create
the
COM
classes.
The
exactly
code
like
for
the
virtual
non-virtual
one.
method
that calls it is different. If the compiler

absolutely
called,
it
sure
can
of what virtual method
ignore
the vtable
Other C++ features
can be
will be
and call
method directly (e.g., use early binding).
7.2.6
looks
Only the code
the
The workings
RunTime
of other C++ features
Type Information,
and multiple inheritance)
exception
are
good
starting
Reference
point
is
The
handling
beyond the
of this text. If the reader wishes to
go
(e.g.,
scope
a
further,
Anno-tated
C++
Manual by Ellis and Stroustrup
and
The Design and Evolution of C++ by Stroustrup.
15
Remember this code only works with the MS and
Borland compilers, not

16
COM classes also
172
gcc.
use the
stdcall calling convention,
CHAPTER 7
STRUCTURES AND C++
Appendix A
80x86 Instructions
A.1
Non-floating Point
Instructions
This section lists and describes the actions
and formats of the non-floating point instructions

of the Intel 80x86 CPU family.
The formats
use the following
abbreviations:
general register
R8
8-bit register
R16
16-bit register
R32
32-bit register
SR
segment register
These
can
memory
M8
byte
M16
word
M32
double word
immediate value
be combined for the multiple operand
instructions.
means
For
example,
that the instruction
the
format
R, R
takes two register
Many of the two operand instructions
operands.
allow the
same
operands. The abbre-viation O2 is
used to represent these operands: R,R R,M R,I

M,R M,I. If
used for
a 8-bit
register
an operand,
or memory can
the abbreviation,
be
R/M8 is
used.
The table also shows how various bits of the
FLAGS register
are
affected
by each instruction
If the column is blank, the corresponding
bit is
not affected at all. If the bit is always changed to
particular
value,
1or 0 is
column. If the bit is changed
depends
placed
on the
in the
modified in
shown in the
to
value that
operands of the instruction,

column.
some
Finally,
undefined
way a
if the
a C is
bit
is
? appears in
the column. Because the
173
174
APPENDIX
A.
80X86 INSTRUCTIONS
only instructions that change the direction flag
are CLD and STD, it is not

FLAGS columns.
listed under the
Flags
Name
Description
ADC
Add with Carry
O2
ADD
Add Integers
O2
AND
Bitwise AND
O2
BSWAP
Byte Swap
R32
CALL
Call Routine
RMI
CBW
Convert Byte to Word
CDQ
Convert
Dword
Formats
to
Qword
CLC
Clear Carry
CLD
Clear Direction Flag
CMC
Complement
CMP
Compare Integers
CMPSB
Carry
C
C
Compare Bytes
CMPSW
Compare Words
CMPSD
Compare Dwords
CWD
Convert
Word
O2
to
Dword into DX:AX

CWDE
Convert
Word
to
Dword into EAX

DEC
Decrement Integer
RM
DIV
Unsigned Divide
RM
ENTER
Make stack frame
I,0
IDIV
Signed Divide
RM
IMUL
Signed Multiply
R16,R/M16
R32,R/M32
R16,I
R32,I
R16,R/M16,I
R32,R/M32,I
INC
Increment Integer
RM
INT
Generate Interrupt
JA
Jump Above
JAE
Jump Above
JB
Jump Below
JBE
Jump Below
JC
Jump Carry
or Equal
I
I
or Equal
I
I
A.1. NON-FLOATING POINT
175
INSTRUCTIONS
Flags
Name
Description
Formats
=0
JCXZ
Jump if CX
JE
Jump Equal
JG
Jump Greater
JGE
Jump
I
I
I
Greater
or
Equal
JL
Jump Less
JLE
Jump Less
JMP
Unconditional Jump
JNA
Jump Not Above
JNAE
Jump Not Above
or Equal
I
RMI
I
or
Equal
JNB
Jump Not Below
JNBE
Jump Not Below
or
Equal
JNC
Jump No Carry
JNE
Jump Not Equal
JNG
Jump Not Greater
JNGE
Jump Not Greater
or
Equal
JNL
Jump Not Less
JNLE
Jump
Not
Less
or
Equal
JNO
Jump No Overflow
JNS
Jump No Sign
JNZ
Jump Not Zero
JO
Jump Overflow
JPE
Jump Parity Even
JPO
Jump Parity Odd
JS
Jump Sign
JZ
Jump Zero
LAHF
Load FLAGS into AH
LEA
Load Effective Address
LEAVE
Leave Stack Frame
LODSB
Load Byte
LODSW
Load Word
LODSD
Load Dword
LOOP
Loop
LOOPE/LOOPZ
Loop If Equal
LOOPNE/LOOPNZ
Loop If Not Equal
176
80X86 INSTRUCTIONS
R32,M
APPENDIX
A.
Flags
Name
MOV
Description
Formats
Move Data
O2
SR,R/M16
R/M16,SR
MOVSB
Move Byte
MOVSW
Move Word
MOVSD
Move Dword
MOVSX
Move Signed
R16,R/M8
R32,R/M8
R32,R/M16
MOVZX
Move Unsigned
R16,R/M8
R32,R/M8
R32,R/M16
MUL
Unsigned Multiply
RM
NEG
Negate
RM
NOP
No Operation
NOT
1s Complement
RM
OR
Bitwise OR
O2
POP
Pop From Stack
R/M16
POPA
Pop All
POPF
Pop FLAGS
PUSH
Push to Stack
PUSHA
Push All
PUSHF
Push FLAGS
RCL
Rotate Left with Carry
R/M32
R/M16
R/M32 I
R/M,I
R/M,CL
RCR
Rotate
Right
with
Carry
REP
Repeat
REPE/REPZ
Repeat If Equal
REPNE/REPNZ
Repeat If Not Equal
RET
Return
ROL
Rotate Left
R/M,I
R/M,CL
R/M,I
R/M,CL
ROR
Rotate Right
R/M,I
R/M,CL
AH
Copies
SAHF
into
FLAGS
A.1. NON-FLOATING POINT
177
INSTRUCTIONS
Flags
Name
SAL
Description
Formats
Shifts to Left
R/M,I
R/M, CL
SBB
Subtract with Borrow
SCASB
Scan for Byte
SCASW
Scan for Word
SCASD
Scan for Dword
SETA
Set Above
SETAE
Set Above
SETB
Set Below
SETBE
Set Below
SETC
Set Carry
R/M8
SETE
Set Equal
R/M8
SETG
Set Greater
SETGE
Set Greater
SETL
Set Less
SETLE
Set Less
SETNA
Set Not Above
SETNAE
Set
O2
R/M8
or Equal
R/M8
R/M8
or Equal
R/M8
R/M8
or Equal
R/M8
R/M8
or Equal
Not
R/M8
R/M8
Above
or
R/M8
Equal
SETNB
Set Not Below
SETNBE
Set
Not
R/M8
Below
or
R/M8
Equal
SETNC
Set No Carry
R/M8
SETNE
Set Not Equal
R/M8
SETNG
Set Not Greater
SETNGE
Set
Not
Greater
R/M8
or
R/M8
Equal
SETNL
Set Not Less
SETNLE
Set Not LEss
R/M8
or Equal
R/M8
SETNO
Set No Overflow
R/M8
SETNS
Set No Sign
R/M8
SETNZ
Set Not Zero
R/M8
SETO
Set Overflow
R/M8
SETPE
Set Parity Even
R/M8
SETPO
Set Parity Odd
R/M8
SETS
Set Sign
R/M8
SETZ
Set Zero
SAR
Arithmetic
R/M8
Shift
to
Right
R/M,I
R/M, CL
178
APPENDIX
A.
80X86 INSTRUCTIONS
Flags
Name
SHR
Description
Logical Shift to Right
Formats
R/M,I
R/M, CL
SHL
Logical Shift to Left
R/M,I
R/M, CL
STC
Set Carry
STD
Set Direction Flag
STOSB
Store Btye
STOSW
Store Word
STOSD
Store Dword
SUB
Subtract
O2
TEST
Logical Compare
R/M,R
R/M,I
XCHG
Exchange
XOR
Bitwise XOR
R/M,R
R,R/M
O2
A.2. FLOATING POINT INSTRUCTIONS
179
A.2
Floating Point Instructions
In this section,
coprocessor
section
of
information
the
of the 80x86
are
instructions
description
operation
many
described.
about whether
the
The
describes
briefly
instruction.
math
the
save space,
instruction pops
To
the stack is not given inthe description.

The
format
operands
column
can be used
what
type
are used:
STn
Single precision number inmemory
Double precision number in memory
register
Extended precision number in memory

I16
Integer word in memory
I32
Integer double word inmemory
I64
Integer quad word inmemory
Instructions requiring
terisk(
of
with each instruction. The
following abbreviations
coprocessor
shows
).
a Pentium
Pro
or better are marked
with anas-
Description
ST0
FABS
FADDP dest[,ST0]
+= src
+= STO
dest += ST0
FCHS
ST0
FADD
src
FADD dest, ST0
FCOM
src
FCOMP
ST0
STn FD
dest
STn
STn
= ST0
Compare ST0 and
src
Format
= |ST0|
Compare ST0 and
src
src
STn FD
STn FD
src
Compares ST0 and ST1
src
Compares into FLAGS
STn
Compares into FLAGS
STn
ST0 /= src
STn FD
FDIV dest, ST0
dest /= STO
STn
FDIVP dest[,ST0]
dest /= ST0
STn
FCOMPP
FCOMI
FCOMIP
FDIV
src
src
FDIVRP dest[,ST0]
= src/ST0
dest = ST0/dest
dest = ST0/dest
FFREE dest
Marks
FDIVR
src
FDIVR dest, ST0
STn FD
ST0
STn
STn
as empty
STn
+= src
FIADD
src
ST0
FICOM
src
Compare ST0 and
I16 I32
Compare
I16 I32
FICOMP
FIDIV
src
src
FIDIVR
src
STO /= src
STO
= src/ST0
180
80X86 INSTRUCTIONS
I16 I32
src
ST0 and src
I16 I32
I16 I32
APPENDIX
A.
Instruction
FILD
src
FIMUL
src
Description
I16 I32 I64
ST0 *= src
I16 I32
FINIT
Initialize Coprocessor
FIST dest
Store ST0
FISTP dest
Store ST0
FISUB
src
FISUBR
src
-= src
ST0 =src -ST0
ST0
FLD src
Push src on Stack
FLD1
Push 1.0 on Stack
FLDCW
src
Format
Push src on Stack
Load Control Word Register
I16 I32
I16 I32 I64
I16 I32
I16 I32
STn FDE
I16
Push on Stack
FLDPI
Push 0.0 on Stack
FLDZ
ST0 *= src
STn FD
FMUL dest, ST0
dest *= STO
STn
FMULP dest[,ST0]
dest *= ST0
STn
FRNDINT
Round ST0
FMUL
src
= ST0
=
bST1c
FSCALE
ST0
FSQRT
ST0
FST dest
Store ST0
FSTP dest
Store ST0
FSTCW dest
Store Control Word Register
I16
FSTSW dest
Store Status Word Register
I16 AX
STO
FSUBP dest[,ST0]
-= src
dest -= STO
dest -= ST0
ST0 =src-ST0
dest = ST0-dest
dest = ST0-dest
FTST
Compare ST0 with 0.0
FXCH dest
Exchange ST0 and dest
FSUB
src
FSUB dest, ST0

FSUBP dest[,ST0]
FSUBR
src
FSUBR dest, ST0
ST0
STn FD
STn FDE
STn FD
STn
STn
STn FD
STn
STn
STn
Index
ADC, 37, 54
ADD, 13, 36
AND, 50
array1.asm, 99102
arrays,
95115
accessing, 96102
defining, 9596
local variable, 96
static, 95
multidimensional, 103106
parameters, 105106
two dimensional, 103104

assembler, 11
assembly language, 1112
binary, 12
addition, 2
bit operations
AND, 50
assembly, 5253
C,5657
NOT, 51
OR, 50
shifts, 4750
arithmetic shifts, 48
logical shifts, 4748
rotates, 49
XOR, 51
branch prediction, 53
bss segment, 21
BSWAP, 59
BYTE, 16
byte, 4
C driver, 19
181
C++, 150171
Big int example, 158163

classes, 156171
copy constructor,
161
early binding, 168
extern C, 153
inheritance, 166171
inline functions, 154156
late binding, 168

member functions,
see meth-
ods
name
mangling, 151153
polymorphism, 166171
references, 153154
typesafe linking, 152

virtual, 166
vtable, 167171
CALL, 6970
calling convention, 65, 7076, 83
84
cdecl, 84
stdcall, 84
C, 21, 71, 8084
labels, 82
parameters, 82
registers, 81
return values, 83
Pascal, 71
register, 84
standard call, 84
stdcall, 72, 84, 171
CBW, 31
CDQ, 31
CLC, 37
182
CLD, 106
clock, 5
CMP, 3738
CMPSB, 109
CMPSD, 109
CMPSW, 109
code segment, 21
COM, 171
comment, 12
compiler, 5,11
Borland, 22, 23
DJGPP, 22, 23
gcc, 22
attribute
149
84, 145, 148,
Microsoft, 22
pragma
pack, 146, 148, 149
Watcom, 83
conditional branch, 3941

counting bits, 6064
method
one, 6061
method three, 6264

method two, 6162
CPU, 57
80x86, 6
CWD, 31
CWDE, 31
data segment, 21
debugging, 1618
DEC, 13
decimal, 1
directive, 1315
%define, 14
DX, 14, 95
data, 1415
DD, 15
DQ, 15
equ, 13
extern, 77
global, 21, 78, 80
RESX, 14, 95
TIMES, 15, 95
INDEX
DIV, 34, 48
do while loop, 43
DWORD, 16
endianess, 2425, 5760

invert endian, 59
FABS, 130
FADD, 126
FADDP, 126
FCHS, 130
FCOM, 129
FCOMI, 130
FCOMIP, 130, 140
FCOMP, 129
FCOMPP, 129
FDIV, 128
FDIVP, 128
FDIVR, 128
FDIVRP, 128
FFREE, 126
FIADD, 126
FICOM, 129
FICOMP, 129
FIDIV, 128
FIDIVR, 128
FILD, 125
FIST, 126
FISUB, 127
FISUBR, 127
FLD, 125
FLD1, 125
FLDCW, 126
FLDZ, 125
floating point, 117139

arithmetic, 122124
representation, 117122
denormalized, 121
double precision, 122

hidden
one, 120
IEEE, 119122
single precision, 120121
floating point
INDEX
coprocessor,
124139
addition and subtraction, 126
127
comparisons, 128130
hardware, 124125
loading and storing data, 125
126
multiplication and division, 128
FMUL, 128
FMULP, 128
FSCALE, 130, 141
FSQRT, 130
FST, 126
FSTCW, 126
FSTP, 126
FSTSW, 129
FSUB, 127
FSUBP, 127
FSUBR, 127
FSUBRP, 127
FTST, 129
FXCH, 126
gas, 157
hexadecimal, 34
I/O, 1618
asm io library,
1618
dump math, 18
dump
dump
mem, 17
regs, 17
dump stack, 17
print char, 17
print int, 17
print nl, 17
print string, 17
read char, 17
read int, 17
IDIV, 34
if statment, 42
immediate, 12
IMUL, 3334
INC, 13
183
indirect addressing, 6566
arrays,
98102
integer, 2738
comparisons, 3738
division, 34
extended precision, 3637

multiplication, 3334
representation, 2733
ones complement, 28
signed magnitude, 27
twos complement, 2830
sign bit, 27, 30
sign extension, 3033

signed, 2730, 38
unsigned, 27, 38
interfacing with C, 8089
interrupt, 10
JC, 39
JE, 41
JG, 41
JGE, 41
JL, 41
JLE, 41
JMP, 3839
JNC, 39
JNE, 41
JNG, 41
JNGE, 41
JNL, 41
JNLE, 41
JNO, 39
JNP, 39
JNS, 39
JNZ, 39
JO, 39
JP, 39
JS, 39
JZ, 39
label, 1416
LAHF, 129
LEA, 83, 103
184
linking, 23
listing file, 2324

locality, 143
LODSB, 107
LODSD, 107
LODSW, 107
LOOP, 41
LOOPE, 41
LOOPNE, 41
LOOPNZ, 41
LOOPZ, 41
machine language, 5,11

MASM, 12
math.asm, 3536
memory, 45
pages, 10
segments, 9,10
virtual, 9,10
memory.asm,
111115
memory:segments, 9
methods, 156
mnemonic, 11
MOV, 12
MOVSB, 108
MOVSD, 108
MOVSW, 108
MOVSX, 31
MOVZX, 31
MUL, 3334, 48, 103
multi-module
NASM, 12
NEG, 34, 55
nibble, 4
NOT, 52
programs,
7780
opcode, 11
OR, 51
prime.asm, 4345
prime2.asm, 135139
protected mode
16-bit, 9
INDEX
32-bit, 10
quad.asm, 130133
QWORD, 16
RCL, 49
RCR, 49
read.asm, 133135
real mode, 89
recursion, 8990
register, 5,78
32-bit, 8
base pointer, 7,8
EDI, 107
EDX:EAX, 31, 34, 37, 83
EFLAGS, 8
EIP, 8
ESI, 107
FLAGS, 7,3738
CF, 38
DF, 106
OF, 38
PF, 39
SF, 38
ZF, 38
index, 7
IP, 7
segment, 7,8,107
stack pointer, 7,8
REP, 108109
REPE, 110
REPNE, 110
see REPNE
see REPE
REPNZ,
REPZ,
RET, 6970, 72
ROL, 49
ROR, 49
SAHF, 129
SAL, 48
SAR, 48
SBB, 37
SCASB, 109
INDEX
185
SCASD, 109
16 SCASW, 109
WORD,
word,
8 SCSI, 147149
SETxx, 54
60 SETG, 55
51 SHL, 47
SHR, 47
skeleton file, 25
speculative execution, 53
stack, 6876
local variables, 7576, 8283
parameters, 7073
startup code, 23
STD, 106
storage types
XCHG,
XOR,
automatic, 91
global, 91
register, 93
static, 91
volatile, 93
STOSB, 107
STOSD, 107
string instructions, 106115
structures, 143150
alignment, 145146
bit fields, 146150
offsetof(), 144
SUB, 13, 36
subprogram, 6693
calling, 6976
reentrant, 89
subroutine,
see subprogram
TASM, 12
TCP/IP, 59
TEST, 51
text segment,
see code segment
twos complement, 2830
arithmetic, 3337
TWORD, 16
UNICODE, 59
while loop, 43

Pcasm Book K2opt

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Pcasm Book K2opt

Uploaded by

Copyright:

Available Formats

PC Assembly Language

Copyright ? 2001, 2002, 2003, 2004 by Paul

entirety (including this

derivative works like trans-lations.

Note that this restriction is not intended to

copying the document.

Instructors are encouraged to use this document as

appreciate being notified inthis

The 80x86 family of CPUs

8086 16-bit Registers

80386 32-bit registers

16-bit Protected Mode

32-bit Protected Mode

Input and Output

Assembling the code

Compiling the C code

Linking the object files

Basic Assembly Language

Working with Integers

Twos complement arithmetic

Extended precision arithmetic

The loop instructions

Translating Standard Control Structures

Example: Finding Prime Numbers

Boolean Bitwise Operations

The AND operation

The XOR operation

The NOT operation

The TEST instruction

Uses of bit operations

Avoiding Conditional Branches

Manipulating bits inC

The bitwise operators of C

Using bitwise operators in C

Big and Little Endian Representations

Simple Subprogram Example

The CALL and RET Instructions

Interfacing Assembly with C

Calculating addresses of local variables

Other calling conventions

functions from assembly

Reentrant and Recursive Subprograms

Review of C variable storage types

More advanced indirect addressing

Reading and writing

The REP instruction prefix

Comparison string instructions

The REPx instruction prefixes

Floating Point Representation

Non-integral binary numbers

IEEE floating point representation

Multiplication and division

Ramifications for programming

The Numeric Coprocessor

Structures and C++

Using structures inassembly

Assembly and C++

Overloading and Name Mangling

Inheritance and Polymorphism

Other C++ features

A.1Non-floating Point Instructions

of this book is to give the reader

that the original PC