Introduction to 80x86 Assembler

What is assembly language?

The CPU (central processing unit) or microprocessor acts as an interpreter. It reads instructions
one at a time from memory and performs each action. These instructions are in what is called
machine language. The instructions are just bytes of binary data. Machine language
instructions, which may have data associated with them, may be one byte long; most are two or
three bytes long, and some are larger.

For example, there is an instruction which instructs the processor to clear the carry flag. (Don't
worry about what this means.) This particular instruction has a one-byte machine language code:
"11111000b" (F8 hex). Whenever the processor reads in the instruction F8 hex, it will clear the
carry flag.

Multi-byte instructions often include data. These data are usually called operands. They are
usually extra bytes added after an instruction code; the instruction codes themselves are called
opcodes. The extra bytes might hold a value representing a certain register, or an address, or
some arbitrary value (perhaps the ASCII code of a certain character).

To write a machine-language program, we would first need a big chart listing all of the different
instructions, and all of the corresponding opcodes, either in binary or hexadecimal. Then we
could write a program by writing down all of the necessary instructions in sequence.

As far as I know, this was the way programming was done in the very early years of computing
when stored-program computers were new (before that, most computers required "re-wiring" to
do different tasks; grab an encyclopedia and look at the photos of ENIAC and its buddies). There
are a few problems with this method though, which assembly language and higher-level
languages solve.

The first problem is that of unreadability: a program consisting of pages upon pages of hex digits
is not very readable, at least for humans. Imagine searching for a bug in a stack of pages of hex
digits.

The second problem involves the lack of variables, or at least variable names. You have to
manually set aside and specify addresses to chunks of memory to serve as variables.

The third problem, which is even more serious, involves addressing issues. If you want to insert
extra instructions into the middle of a program, you have to recalculate any addresses or relative
addresses (described later, don't worry) in your program to accommodate the changes in location
of the different sections of code. This makes modifying existing programs incredibly difficult.

Assembly language solves these problems. Assembly language allows you to refer to instructions
by longer names, called mnemonics. So, instead of remembering or looking up F8 hex, you can
use the mnemonic "CLC" to clear the carry flag. The majority of mnemonics are abbreviations
that are three letters in length, although they can be between two (e.g. "JE") and nine or so letters
(e.g. "CMPXCHG8B") in length. These mnemonics will seem confusing and hard to remember at
first, but they do actually become quite memorable. They are far better than long strings of hex
digits.

Assembly language also allows the use of labels and variables, which let you give names to
addresses in a program without specifying the actual "concrete" addresses.
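
To make this concrete, here is a minimal sketch in the same TASM Ideal-mode style as the sample program at the end of this section. The names "counter" and "again" are made up for illustration; the assembler, not the programmer, decides which addresses they stand for. If instructions are later inserted above the loop, the jump still lands in the right place, because the assembler recalculates the address.

```asm
        IDEAL
        MODEL   small
        STACK   100h

        DATASEG
counter DW      5               ; a named variable; the assembler picks its address

        CODESEG
start:
        mov     ax,@data
        mov     ds,ax           ; set DS to point to the data segment
again:                          ; a label: a name for the address of the next instruction
        dec     [counter]       ; subtract 1 from the variable
        jnz     again           ; jump back to "again" until the result is zero
        mov     ah,4Ch          ; DOS terminate program function
        int     21h
        END     start
```

Nowhere in this source does a concrete address appear; both the variable and the jump target are referred to by name only.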

An assembler is a program that takes an assembly language source file and (perhaps in conjunction
with a linker) translates it to machine language, so that it can be run (that is, interpreted by the
processor). It's really just like a compiler, except that there is a one-to-one relationship between
source instructions and machine instructions: a mnemonic, plus its operands (data), always
translates to a single machine language instruction, that is, one opcode plus the operands after it.
In a compiler, commands and structures normally expand into several (or many) machine language
instructions.
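
As a rough illustration of this one-to-one relationship, the short program below shows, in the comments, the bytes that each instruction assembles to. The encodings are the standard 8086 ones, and the last three also appear in the assembler listing in Sample 2. The program itself is only a sketch: it clears the carry flag, loads a value, and exits to DOS.

```asm
        IDEAL
        MODEL   small
        STACK   100h

        CODESEG
start:
        clc                     ; F8        one-byte opcode, no operand
        mov     bx,4            ; BB 04 00  opcode BB, then the 16-bit value 0004h, low byte first
        mov     ah,4Ch          ; B4 4C     opcode B4, then the one-byte value 4Ch
        int     21h             ; CD 21     opcode CD, then the interrupt number 21h
        END     start
```

Each source line becomes exactly one machine instruction; only the number of operand bytes varies.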

Why use assembly language?

Assembly language instructions perform "tiny little actions". Instructions exist to do such things
as increment, decrement, add, subtract, perform logical operations (AND, OR, XOR, etc.),
compare two values, move values from one register to another, and so on. You won't find any
assembly language instructions to write text to the screen or handle keyboard input.

This means assembler programs have to deal with a lot more detail. For writing to the screen, for
example, you can't just call a pre-written function like printf(). You have to write your own
special routines in assembler to write to the screen (which, in this case, can be made a bit easier
using interrupts).
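
As a sketch of what that looks like, the program below prints a line of text using DOS interrupt 21h, function 09h, which displays the '$'-terminated string found at DS:DX. It assumes TASM in Ideal mode and a DOS environment, and the name "message" is made up for this example.

```asm
        IDEAL
        MODEL   small
        STACK   100h

        DATASEG
message DB      'hello, world',13,10,'$'   ; function 09h stops at the '$'

        CODESEG
start:
        mov     ax,@data
        mov     ds,ax                   ; DS:DX must point to the string
        mov     ah,09h                  ; DOS display string function
        mov     dx,OFFSET message       ; offset of the string within the data segment
        int     21h                     ; write the text to the screen
        mov     ah,4Ch                  ; DOS terminate program function
        int     21h
        END     start
```

Even with DOS doing the actual output, the program still has to set up the segment register, the function number, and the string address by hand.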

So a major disadvantage is the micromanagement that you must handle. It almost always takes
longer to write a program in assembler than it does to write an equivalent program in a high-level
language. So why would anyone use assembly language?

The two biggest reasons are control and speed. Assembler allows you direct access to whatever
hardware devices and resources you want. You can do anything however you like, whereas in
high-level languages, if you don't like the way a built-in function works, there's not much you
can do about it. This added control lets you optimize for speed. Properly written assembler
programs can run much faster than compiled programs, for several reasons. Compilers often
generate redundant code or code that could execute faster if written a different way. (Modern
compilers are actually very good at producing optimized code, but there is still room for
improvement.) Also, high-level languages often perform extra error checking, such as bounds
checking. (C does almost none, which is one reason it's generally faster than languages that do.)
Assembler gives you total control over these matters, so you can decide how
efficiently a routine should run.

In terms of readability and maintainability, high-level languages are far superior to assembler.
The worst aspect of assembler is its complete lack of portability -- you can't easily convert your
program to other platforms that use different families of microprocessors. Other makes and
models of processors use different instruction sets and assembly languages.

Today, assembly language is mainly used for speeding up critical routines used in larger
programs that are written in high-level languages such as C or C++. In this and the following
chapters, we'll write both independent, stand-alone programs, and functions to be used in C and
C++ programs.

Which assembler should I use?

These tutorials are written with Borland's Turbo Assembler (TASM) in mind. It seems to be the
most widely-used assembler for the PC, so it's more or less the standard. I'm satisfied with it. The
latest version of TASM as of this writing is 5.0. It's impossible to find in stores and it's not
cheap, although you can get a slight discount if you're a student.

A good alternative is the shareware assembler A86, along with its debugger D86. These should
be reasonably easy to find on the internet -- do an FTP search or visit some software repository
sites and look for "A86". If you do decide to use A86, please register it; I believe the registration
fee is about $50 US.

Microsoft has or had an assembler, MASM -- I don't think they update or support it any longer.
There are also some other lesser-known commercial brands of assemblers, and there are some
old shareware ones such as CHASM.

The assembler code here has only been tested with TASM. For other assemblers, you may need
to do some basic conversions. The modern assemblers are more or less similar. The instructions
and opcodes must be the same across all assemblers for the PC; they mainly differ in the
formatting of the "overhead" assembler directives and structures, and they differ in terms of
fancy new features.

While you're choosing an assembler, you might want to go out and get a book on assembly
language. These tutorials will cover all of the important points, but it's always good to have a
second source of reference. More importantly, though, make sure that you get an assembly
language book that has a good instruction set listing at the back. If computer books are too
expensive (yes, they certainly are), you can use an on-line instruction set reference such as
http://www.qzx.com/pc-gpe/intel.doc.

SAMPLE CODE:

SAMPLE 1:

; program #1

% TITLE 'sending output to printer'

        IDEAL
        MODEL   small
        STACK   100h

        DATASEG
hellomessage    DB      'hello, world',13,10,12
HELLO_MESSAGE_LENGTH = $ - hellomessage

        CODESEG
START:
        mov     ax,@data
        mov     ds,ax                           ;set DS to point to the data segment
        mov     ah,40h                          ;DOS write to device function#
        mov     bx,4                            ;printer handle
        mov     cx,HELLO_MESSAGE_LENGTH         ;number of characters to print
        mov     dx,OFFSET hellomessage          ;string to print
        Int     21h                             ;print "hello,world"
        mov     ah,4ch                          ;dos terminate program function#
        Int     21h                             ;terminate the program
        END     START

SAMPLE 2:

Turbo Assembler  Version 4.1        08/25/01 16:12:35       Page 1
hand1.ASM

      1                              ; program #1
      2
      3                              % TITLE 'sending output to printer'
      4                              IDEAL
      5  0000                        MODEL small
      6  0000                        STACK 100h
      7
      8  0100                        DATASEG
      9  0000  68 65 6C 6C 6F 2C 20+ hellomessage DB 'hello, world',13,10,12
     10        77 6F 72 6C 64 0D 0A+
     11        0C
     12        =000F                 HELLO_MESSAGE_LENGTH = $ - hellomessage
     13
     14  000F                        CODESEG
     15  0000                        START:
     16  0000  B8 0000s              mov ax,@data
     17  0003  8E D8                 mov ds,ax                    ;set DS to point to the data segment
     18  0005  B4 40                 mov ah,40h                   ;DOS write to device function#
     19  0007  BB 0004               mov bx,4                     ;printer handle
     20  000A  B9 000F               mov cx,HELLO_MESSAGE_LENGTH  ;number of characters to print
     21  000D  BA 0000r              mov dx,OFFSET hellomessage   ;string to print
     22  0010  CD 21                 Int 21h                      ;print "hello,world"
     23  0012  B4 4C                 mov ah,4ch                   ;dos terminate program function#
     24  0014  CD 21                 Int 21h                      ;terminate the program
     25                              END START

Turbo Assembler  Version 4.1        08/25/01 16:12:35       Page 2
Symbol Table
'sending output to printer'

Symbol Name             Type    Value

??DATE                  Text    "08/25/01"
??FILENAME              Text    "hand1 "
??TIME                  Text    "16:12:35"
??VERSION               Number  040A
@32BIT                  Text    0
@CODE                   Text    _TEXT
@CODESIZE               Text    0
@CPU                    Text    0101H
@CURSEG                 Text    _TEXT
@DATA                   Text    DGROUP
@DATASIZE               Text    0
@FILENAME               Text    HAND1
@INTERFACE              Text    000H
@MODEL                  Text    2
@STACK                  Text    DGROUP
@WORDSIZE               Text    2
HELLOMESSAGE            Byte    DGROUP:0000
HELLO_MESSAGE_LENGTH    Number  000F
START                   Near    _TEXT:0000

Groups & Segments       Bit  Size  Align  Combine  Class

DGROUP                  Group
  STACK                 16   0100  Para   Stack    STACK
  _DATA                 16   000F  Word   Public   DATA
_TEXT                   16   0016  Word   Public   CODE
