
CHAPTER 3

ASSEMBLERS
Basic Assembler Functions
An assembler accepts an assembler language
program as input and produces its machine
language equivalent, along with information
for the loader.

[Figure: assembler language program → Assembler → machine language (to linker) and listing]
EXAMPLE (Refer to Prog.)
• The program has a main routine that calls subroutines RDREC and WRREC.
• The main routine reads records from an input device (code F1) and copies them to an output device (code 05).
• Subroutine RDREC reads a record into a buffer, and subroutine WRREC writes the record from the buffer to the output device.
• Only one character is transferred at a time.
• The end of each record is marked by a null character (hexadecimal 00).
EXAMPLE (CONT.)
• If a record is longer than the length of the buffer (4096 bytes), only the first 4096 bytes are copied.
• The end of a file is indicated by a zero-length record.
• When the end of file is detected, the program writes the characters EOF on the output device and terminates by executing an RSUB instruction.
Subroutine to read record into Buffer
Subroutine to write record from Buffer
The program is rewritten below with the
generated object code (machine code) for
each statement, assuming that the program
is loaded at address 1000.
A simple SIC Assembler
Translating the assembler program to object code requires the following steps:
1. Convert mnemonic operation codes to their machine language equivalents, e.g. translate STL to 14.
2. Convert symbolic operands to their equivalent machine addresses, e.g. translate RETADR to 1033.
3. Build the machine instructions in the correct format.
4. Convert the data constants into their internal machine representations, e.g. translate EOF to 454F46.
5. Write the object program and the assembly listing.
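
The first four steps can be sketched in a few lines of Python. This is only an illustration: the tables are filled in by hand with the values quoted above, the function names are hypothetical, and indexed addressing is ignored.

```python
# Hand-filled tables for illustration; a real assembler builds SYMTAB itself.
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00}
SYMTAB = {"RETADR": 0x1033}

def assemble_sic(mnemonic, operand):
    """Standard SIC instruction: 8-bit opcode followed by a 16-bit address field."""
    opcode = OPTAB[mnemonic]       # step 1: mnemonic -> machine opcode
    address = SYMTAB[operand]      # step 2: symbol -> machine address
    return f"{opcode:02X}{address:04X}"   # step 3: build the instruction

def char_constant(text):
    """Step 4: a character constant becomes the hexadecimal of its ASCII codes."""
    return "".join(f"{ord(ch):02X}" for ch in text)

print(assemble_sic("STL", "RETADR"))   # 141033
print(char_constant("EOF"))            # 454F46
```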
A simple SIC Assembler
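Translating symbolic operands into addresses
is complicated by forward references: a symbol
may be used as an operand before the statement
that defines it has been read, so the address to
be assigned to the symbol is not yet known.

Because of this, most assemblers make two
passes over the source program.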
ASSEMBLER PASSES
Pass 1 (Define symbols)
• Scans the source program for label definitions and assigns addresses to all statements in the program.
• Saves the addresses assigned to all labels for use in pass 2.
• Performs some processing of the assembler directives, e.g. determining the length of data areas defined by BYTE, RESW, etc.
ASSEMBLER PASSES
Pass 2 (Assemble instructions and generate the object program)
• Assembles instructions (translating operation codes and looking up addresses).
• Generates data values defined by BYTE, WORD, etc.
• Performs the processing of assembler directives not done during pass 1.
• Writes the object program and the assembly listing to an output device. The object program is later loaded into memory for execution.
ASSEMBLER TABLES AND LOGIC
• OPTAB (Operation Code Table) is used to look up mnemonic operation codes and translate them into their machine language equivalents.
• SYMTAB (Symbol Table) is used to store the addresses assigned to labels. It includes the name and address of each label in the source program.
• A location counter, LOCCTR, is used to assign addresses.
ASSEMBLER TABLES AND LOGIC
• In pass 1, OPTAB is used to look up and validate operation codes in the source program.
• In pass 2, OPTAB is used to translate the opcodes to machine language.
• In pass 1, labels are entered into SYMTAB as they are encountered in the source program, along with their assigned addresses.
• In pass 2, symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled instructions.
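
The interplay of LOCCTR, OPTAB and SYMTAB over the two passes can be sketched as follows. This is a simplified illustration, not the full algorithm: the OPTAB subset, the function names and the five-line program are made up for the example, and indexed addressing, START/END handling and error flags are left out.

```python
# Hypothetical subset of the SIC operation code table, for illustration only.
OPTAB = {"STL": 0x14, "LDA": 0x00, "JSUB": 0x48, "RSUB": 0x4C}

def constant_bytes(operand):
    """Bytes for a BYTE operand such as C'EOF' or X'05'."""
    kind, body = operand[0], operand[2:-1]
    return body.encode("ascii") if kind == "C" else bytes.fromhex(body)

def pass1(lines, start):
    """Assign an address to every statement and fill SYMTAB with label addresses."""
    symtab, located, locctr = {}, [], start
    for label, op, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        located.append((locctr, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3
        elif op == "RESW":
            locctr += 3 * int(operand)
        elif op == "RESB":
            locctr += int(operand)
        elif op == "BYTE":
            locctr += len(constant_bytes(operand))
        else:
            raise ValueError(f"invalid operation code {op}")
    return symtab, located

def pass2(located, symtab):
    """Translate opcodes, look up operand addresses and generate data values."""
    object_code = []
    for loc, op, operand in located:
        if op in OPTAB:
            address = symtab[operand] if operand else 0
            object_code.append((loc, f"{OPTAB[op]:02X}{address:04X}"))
        elif op == "WORD":
            object_code.append((loc, f"{int(operand):06X}"))
        elif op == "BYTE":
            object_code.append((loc, constant_bytes(operand).hex().upper()))
    return object_code

# A made-up program, not the COPY program from the slides.
program = [
    ("FIRST", "STL", "RETADR"),   # forward reference to RETADR
    (None, "LDA", "THREE"),       # forward reference to THREE
    (None, "RSUB", None),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
]
symtab, located = pass1(program, 0x1000)
for loc, code in pass2(located, symtab):
    print(f"{loc:04X} {code}")
```

Both forward references are resolved without difficulty because pass 2 starts only after pass 1 has seen every label.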
OBJECT PROGRAMS
A simple object program contains three types of records:
• The Header record contains the program name, the starting address of the program and the length of the whole object program.
• Each Text record contains translated instructions and data of the program, together with the address at which they are to be loaded.
• The End record marks the end of the object program and specifies the address in the program where execution is to begin.
OBJECT PROGRAMS
Header Record
Col 1      H
Col 2-7    Program name
Col 8-13   Starting address of the object program (hexadecimal)
Col 14-19  Length of the object program in bytes (hexadecimal)

Text Record
Col 1      T
Col 2-7    Starting address for the object code in this record (hexadecimal)
Col 8-9    Length of the object code in this record in bytes (hexadecimal)
Col 10-69  Object code in hexadecimal (two columns per byte)

End Record
Col 1      E
Col 2-7    Address of the first executable instruction in the object program (hexadecimal)
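
A minimal sketch of how these three record types might be produced. The function names are hypothetical, and the `^` separators are included only because the slides use them for readability; they are not part of the column layout.

```python
def header_record(name, start, length):
    """H record: name padded to 6 columns, then start address and length in hex."""
    return f"H^{name:<6}^{start:06X}^{length:06X}"

def text_record(start, codes):
    """T record: start address, byte count, then the object code itself."""
    nbytes = sum(len(code) for code in codes) // 2   # two hex digits per byte
    if nbytes > 30:                                  # columns 10-69 hold 60 hex digits
        raise ValueError("a text record holds at most 30 bytes")
    return f"T^{start:06X}^{nbytes:02X}^" + "^".join(codes)

def end_record(first_instruction):
    """E record: address of the first executable instruction."""
    return f"E^{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x2073, ["382064", "4C0000", "05"]))
print(end_record(0x1000))
```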
OBJECT PROGRAMS
H^COPY  ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^549039^2C205E^38203F
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^509039^DC2079^2C1036
T^002073^07^382064^4C0000^05
E^001000
Machine Dependent Features
These are assembler features that depend on
the architecture of the target machine.
The example program is rewritten here for
the SIC/XE machine.
Indirect addressing is indicated by adding
the prefix @ to the operand.
Immediate operands are denoted with the
prefix #.
Machine Dependent Features
Instruction Formats and addressing Modes
Register-to-register instructions are used wherever possible:
COMP ZERO is replaced by COMPR A,S
TIX MAXLEN is replaced by TIXR T

For these instructions the assembler simply converts the mnemonic operation code to machine language using OPTAB and replaces each register name by its number, e.g.
COMPR A,S = A004; TIXR T = B850
CLEAR X = B410; CLEAR A = B400
CLEAR S = B440; CLEAR T = B450
Machine Dependent Features
Instruction Formats and addressing Modes

Instructions that refer to memory are assembled using either program counter relative or base relative mode.

The instruction
0000 FIRST STL RETADR 17202D
is a typical example of program counter relative assembly.

During execution of this instruction, the PC contains the address of the next instruction (0003).

RETADR is at address 0030, so the displacement needed is 0030 - 0003 = 02D.


Machine Dependent Features
Instruction Formats and addressing Modes
• Bit p is set to 1 to indicate program counter relative addressing.
• Bits n and i are both set to 1, indicating neither indirect nor immediate addressing.
Machine Dependent Features
0017 J CLOOP 3F2FEC
• The operand address is 0006. During execution the PC will contain 001A. The displacement required is
0006 - 001A = -014, which is FEC as a 12-bit two's complement number.
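
The two program counter relative examples above can be reproduced with a short sketch. The function name is hypothetical; the opcode values 14 (STL) and 3C (J) are the ones implied by the object code shown on the slides.

```python
def format3_pc_relative(opcode, loc, target, n=1, i=1, x=0):
    """Assemble a 3-byte instruction located at `loc` using PC relative addressing."""
    pc = loc + 3                      # PC holds the address of the next instruction
    disp = target - pc
    if not -2048 <= disp <= 2047:     # must fit a signed 12-bit field
        raise ValueError("displacement does not fit in 12 bits")
    first_byte = opcode | (n << 1) | i        # opcode uses the upper 6 bits
    flags = (x << 3) | (0 << 2) | (1 << 1)    # x, b=0, p=1, e=0
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"

print(format3_pc_relative(0x14, 0x0000, 0x0030))   # STL RETADR -> 17202D
print(format3_pc_relative(0x3C, 0x0017, 0x0006))   # J CLOOP    -> 3F2FEC
```

Masking with `0xFFF` is what turns the negative displacement -014 into its 12-bit two's complement form FEC.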
Machine Dependent Features
The assembler directive BASE is used in conjunction with base relative addressing.

• The statement
BASE LENGTH
informs the assembler that the base register will contain the address of LENGTH. The register itself is loaded at run time by the instruction
LDB #LENGTH
Machine Dependent Features
The instruction
104E STCH BUFFER,X 57C003
is an example of base relative assembly.
Register B contains 0033 and the address of
BUFFER is 0036. The displacement is therefore
0036 - 0033 = 3. Bits x and b are set to 1 to
indicate indexed and base relative addressing.
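
A sketch of how an assembler might choose between the two relative modes for this instruction. Trying program counter relative first is an assumption of this example (the slides do not state the order), and both function names are hypothetical.

```python
def choose_displacement(target, pc, base=None):
    """Return (b, p, disp). PC relative is tried first, then base relative."""
    disp = target - pc
    if -2048 <= disp <= 2047:                 # signed 12-bit range
        return 0, 1, disp & 0xFFF
    if base is not None and 0 <= target - base <= 4095:   # unsigned 12-bit range
        return 1, 0, target - base
    raise ValueError("neither mode reaches the target; format 4 is needed")

def encode_format3(opcode, x, b, p, disp, n=1, i=1):
    """Pack opcode, the n/i/x/b/p/e bits (e = 0) and the displacement."""
    return f"{opcode | (n << 1) | i:02X}{(x << 3) | (b << 2) | (p << 1):X}{disp:03X}"

# STCH BUFFER,X at 104E: PC = 1051, BUFFER = 0036, register B = 0033.
b, p, disp = choose_displacement(0x0036, 0x1051, base=0x0033)
print(encode_format3(0x54, 1, b, p, disp))    # 57C003
```

The target is more than 2048 bytes behind the PC, so the PC relative attempt fails and the base relative displacement of 3 is used.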
Machine Dependent Features
Instruction Formats and addressing Modes
If the displacement is too large for the 12-bit field, Format 4 (extended format) is used and no displacement is calculated.

In the instruction
0006 CLOOP +JSUB RDREC 4B101036
the operand address is 1036. It is stored in full in the 20-bit address field of the instruction, with bit e set to 1 to indicate extended instruction format.
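
A minimal sketch of extended format assembly, assuming the usual SIC/XE layout of a 6-bit opcode, the six flag bits n, i, x, b, p, e and a 20-bit address. The function name is hypothetical.

```python
def encode_format4(opcode, address, n=1, i=1, x=0):
    """Assemble a 4-byte extended format instruction (b = p = 0, e = 1)."""
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("address does not fit in 20 bits")
    flags = (x << 3) | 1              # x, b=0, p=0, e=1
    return f"{opcode | (n << 1) | i:02X}{flags:X}{address:05X}"

print(encode_format4(0x48, 0x1036))        # +JSUB RDREC -> 4B101036
```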
Machine Dependent Features
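Immediate Addressing

0020 LDA #3 010003
The operand is stored within the instruction as 003. Bit i is set to 1 and bit n to 0 to indicate immediate addressing.

103C +LDT #4096 75101000
The operand (hexadecimal 1000) is too large to fit into the 12-bit displacement field, so extended format is used.

0003 LDB #LENGTH 69202D
Here program counter relative addressing is combined with immediate addressing. The immediate operand is the address of LENGTH (0033), and the displacement is 0033 - 0006 = 02D.

002A J @RETADR 3E2003
This instruction combines program counter relative and indirect addressing. The displacement is 0030 - 002D = 003.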
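
The way the operand prefix selects the n and i bits can be sketched as below. The function names are hypothetical; the opcode values 00 (LDA), 68 (LDB), 3C (J) and 14 (STL) are those implied by the object code on the slides.

```python
def addressing_bits(operand):
    """Return (n, i, operand without its prefix)."""
    if operand.startswith("#"):       # immediate: n=0, i=1
        return 0, 1, operand[1:]
    if operand.startswith("@"):       # indirect: n=1, i=0
        return 1, 0, operand[1:]
    return 1, 1, operand              # simple addressing: n=1, i=1

def first_byte(opcode, operand):
    """First byte of the instruction: 6-bit opcode followed by n and i."""
    n, i, _ = addressing_bits(operand)
    return opcode | (n << 1) | i

for opcode, operand in [(0x00, "#3"), (0x68, "#LENGTH"), (0x3C, "@RETADR"), (0x14, "RETADR")]:
    print(f"{operand:10} {first_byte(opcode, operand):02X}")

# LDA #3: the constant itself goes in the displacement field, with x=b=p=e=0.
print(f"{first_byte(0x00, '#3'):02X}0{3:03X}")   # 010003
```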
Program Relocation
• In most cases the actual starting address of a program is not known until load time.
• Example: in the simple SIC program, the instruction
101B LDA THREE 00102D
loads register A from memory address 102D. If an attempt is made to load the program at address 2000 instead of 1000, address 102D will not contain the expected value.
• The address portion of this instruction must be changed before the program can be executed at address 2000.
• An object program that contains the information necessary to perform this kind of modification is called a relocatable program.
Program Relocation
• In the SIC/XE program, the +JSUB instruction is at address 0006 when the program is loaded at address 0000. The address field of this instruction contains 01036.
• If the program is loaded at address 5000, the address of the instruction labeled RDREC becomes 6036. The +JSUB instruction must be modified to contain the new address.
• No matter where the program is loaded, RDREC will always be 1036 bytes past the starting address.
Program Relocation
Start of program   +JSUB RDREC       RDREC (B410)   End of program
0000               0006: 4B101036    1036           1076
5000               5006: 4B106036    6036           6076
7420               7426: 4B108456    8456           8496
Program Relocation
• The relocation problem is solved in the following way:
• When the assembler generates the object code for the JSUB instruction, it inserts the address of RDREC relative to the start of the program.
• It also produces a command for the loader, instructing it to add the beginning address of the program to the address field of the JSUB instruction at load time.
• This is accomplished with a modification record in the following format:
Col 1     M
Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9   Length of the address field to be modified, in half bytes
Program Relocation
• For the JSUB instruction the modification record would be M00000705.
• The beginning address of the program is to be added to a field that begins at address 000007 and is 5 half bytes in length.
• The same kind of modification is needed for the instructions at addresses 0013 and 0026.
• Instructions that do not access memory (e.g. CLEAR S or LDA #3) and instructions assembled using program counter relative or base relative addressing need no modification when the program is relocated.
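
What the loader does with such a record can be sketched as follows. This is an illustration only: the function name is hypothetical, the program image is held in a `bytearray` indexed from the start of the program, and the record is written without separators, as in M00000705.

```python
def apply_modification(memory, record, load_address):
    """Add load_address to the address field described by an M record."""
    start = int(record[1:7], 16)          # columns 2-7: start of the field
    half_bytes = int(record[7:9], 16)     # columns 8-9: length in half bytes
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1    # low-order half bytes form the field
    field = int.from_bytes(memory[start:start + nbytes], "big")
    # Keep any leading half byte (here the x, b, p, e flags) untouched.
    field = (field & ~mask) | (((field & mask) + load_address) & mask)
    memory[start:start + nbytes] = field.to_bytes(nbytes, "big")

memory = bytearray(16)
memory[6:10] = bytes.fromhex("4B101036")      # +JSUB RDREC at relative address 0006
apply_modification(memory, "M00000705", 0x5000)
print(memory[6:10].hex().upper())             # 4B106036
```

A 5 half-byte field starting at byte 7 covers the low half of that byte plus the next two bytes, which is exactly the 20-bit address field of the format 4 instruction.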
Machine Independent Assembler Features.
• These are features that are not related to machine structure.
• They are related to the programmer's convenience and the software environment.
Machine Independent Assembler Features.
1. Literals
• Literals are constant operands written as part of the instruction that uses them.
• 001A ENDFIL LDA =C'EOF' 032010
specifies a 3-byte operand whose value is the character string EOF.
• 1062 WLOOP TD =X'05'
specifies a 1-byte literal with the hexadecimal value 05.
Machine Independent Assembler Features.
• Literals are different from immediate operands.
• With immediate addressing, the operand value is assembled as part of the machine instruction.
• With a literal, the assembler generates the specified value as a constant at some other memory location, and the address of that constant is used as the target address of the instruction.
Machine Independent Assembler Features.
• Literal operands used in a program are gathered together into a literal pool, normally at the end of the program.
• A literal pool can be placed at some other location in the object code by using the directive LTORG.
• When the assembler encounters LTORG, it creates a literal pool that contains all the literal operands used since the previous LTORG or the beginning of the program.
Machine Independent Assembler Features.
• The assembler maintains a literal table, LITTAB, that contains the literal name, the operand value and the address assigned to the operand when it is placed in the literal pool.
• During pass 1 the assembler enters the literals in LITTAB.
• During pass 2 the data values specified by the literals are inserted at the appropriate places in the object program.
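
A minimal sketch of a literal table along these lines. The class and method names are hypothetical, duplicate literals are recognised simply by their text, and the pool address 002D is just an example value.

```python
def literal_bytes(literal):
    """Value of a literal such as =C'EOF' or =X'05'."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {literal}")

class LiteralTable:
    def __init__(self):
        self.entries = {}     # literal name -> {"value": bytes, "address": int or None}

    def add(self, literal):
        """Pass 1: record a literal operand; a repeated literal is stored once."""
        if literal not in self.entries:
            self.entries[literal] = {"value": literal_bytes(literal), "address": None}

    def place_pool(self, locctr):
        """At LTORG or at the end of the program: assign addresses to unplaced literals."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr         # updated location counter

littab = LiteralTable()
for operand in ["=C'EOF'", "=X'05'", "=C'EOF'"]:
    littab.add(operand)
next_free = littab.place_pool(0x002D)
for name, entry in littab.entries.items():
    print(name, entry["value"].hex().upper(), f"{entry['address']:04X}")
```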
Machine Independent Assembler Features.
2. Symbol Defining Statements
• The directive EQU (equate) allows the programmer to define symbols and specify their values:

Symbol EQU Value

• The symbol is entered into SYMTAB and assigned the specified value.
• Instead of writing +LDT #4096, we could include the statement
MAXLEN EQU 4096
and then write in the program
+LDT #MAXLEN
Machine Independent Assembler Features.
• EQU can also be used to define mnemonic names for registers. For example, the base and index registers may be defined as
BASE EQU R1
INDEX EQU R2
• Another directive, ORG (origin), assigns values to symbols indirectly. Its format is
ORG VALUE
where VALUE is a constant or an expression involving constants and previously defined symbols.
• When the assembler encounters this statement, it resets LOCCTR to the specified value, which affects the values of all labels defined until the next ORG.
Machine Independent Assembler Features.
3. Expressions
• Expressions can be used wherever a single operand is permitted.
• The current value of the location counter is designated by *. It represents the value of the next unassigned memory location.
• The statement
BUFEND EQU *
gives BUFEND the address of the next byte after the buffer area.
Machine Independent Assembler Features.
• An expression that contains only absolute terms is an absolute expression. An absolute expression may also contain relative terms, provided that the relative terms occur in pairs and the terms in each pair have opposite signs.
• A relative expression is one in which all the relative terms except one can be paired. The remaining unpaired relative term must have a positive sign.

MAXLEN EQU BUFEND - BUFFER

• BUFEND and BUFFER are both relative terms, but the expression represents an absolute value: the difference between the two addresses.
• Expressions such as BUFEND + BUFFER, 100 - BUFFER or 3 * BUFFER represent neither absolute values nor locations within the program.
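
The pairing rule amounts to counting signs: if the signs of the relative terms sum to 0 the expression is absolute, if they sum to +1 it is relative, and anything else is illegal. A small sketch, with a hypothetical function name and expressions represented as lists of signed terms (so 3 * BUFFER is modelled as BUFFER + BUFFER + BUFFER):

```python
def classify(terms):
    """terms: list of (sign, kind) pairs, sign is +1 or -1, kind is 'abs' or 'rel'."""
    net = sum(sign for sign, kind in terms if kind == "rel")
    if net == 0:
        return "absolute"      # every relative term has an opposite-signed partner
    if net == 1:
        return "relative"      # exactly one unpaired, positive relative term
    return "invalid"

print(classify([(+1, "rel"), (-1, "rel")]))   # BUFEND - BUFFER -> absolute
print(classify([(+1, "rel"), (+1, "abs")]))   # BUFFER + 100    -> relative
print(classify([(+1, "rel"), (+1, "rel")]))   # BUFEND + BUFFER -> invalid
print(classify([(+1, "abs"), (-1, "rel")]))   # 100 - BUFFER    -> invalid
print(classify([(+1, "rel")] * 3))            # 3 * BUFFER      -> invalid
```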
Machine Independent Assembler Features.
4. Program Blocks
• Program blocks are segments of code that are rearranged within a single object program unit.
• In the example below, the first (unnamed) block contains the executable instructions. The second block, CDATA, contains data areas that are a few words or less in length. The third block, CBLKS, contains data areas that consist of larger blocks of memory.
Machine Independent Assembler Features.
Program Blocks
• The assembler directive USE indicates which portions of the source program belong to which block.
• The assembler rearranges these segments to gather together the pieces of each block. The blocks are then assigned addresses in the object program, appearing in the same order in which they were first begun in the source program.
Machine Independent Assembler Features.
• During pass 1 a separate location counter is maintained for each block. It is initialized to 0 when the block is first begun.
• At the end of pass 1 the latest value of the location counter for each block indicates the length of that block.
• MAXLEN is shown without a block number because it is an absolute symbol whose value is not relative to the start of any block.
• At the end of pass 1 the assembler constructs a working table that contains the starting addresses and lengths of all blocks.
Machine Independent Assembler Features.

Block Name   Block Number   Address   Length
Default      0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000

For the instruction
0006 LDA LENGTH
the operand has relative address 0003 within the CDATA block. The starting address of CDATA is 0066. The target address for this instruction is therefore
0003 + 0066 = 0069.
Machine Independent Assembler Features.
• The address of the next instruction is 0009 within the default block. The required displacement is therefore 0069 - 0009 = 060.
• Because the large buffer area is moved to the end of the object program, there is no need to use extended format instructions. The base register is also no longer necessary.
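
The block table and the address calculation for LDA LENGTH can be reproduced with a short sketch. The function name is hypothetical; the block lengths are the ones shown in the table.

```python
def block_table(lengths):
    """lengths: block name -> length, in the order the blocks were first begun.
    Returns block name -> (block number, starting address, length)."""
    table, start = {}, 0
    for number, (name, length) in enumerate(lengths.items()):
        table[name] = (number, start, length)
        start += length               # the next block begins where this one ends
    return table

table = block_table({"Default": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000})
for name, (number, start, length) in table.items():
    print(f"{name:8} {number} {start:04X} {length:04X}")

target = table["CDATA"][1] + 0x0003    # LENGTH is at 0003 within CDATA
disp = target - 0x0009                 # PC = 0009 in the default block (start 0000)
print(f"target {target:04X}, displacement {disp:03X}")
```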
Machine Independent Assembler Features.
5. Control Sections and Program Linking
• A control section is a part of the program that maintains its identity after assembly; each section can be loaded and relocated independently of the others.
• External references are used to link control sections.
• Control sections differ from program blocks in that they are handled separately by the assembler.
Machine Independent Assembler Features.
• The directive CSECT signals the start of a new control section.
• The EXTDEF statement names symbols, called external symbols, that are defined in this control section and may be used by other sections.
• The EXTREF statement names symbols that are used in this control section but defined elsewhere.
Machine Independent Assembler Features.
For the instruction
0003 CLOOP +JSUB RDREC 4B100000
the operand RDREC is an external reference. The assembler inserts an address of zero and passes information to the loader so that the proper address is inserted at load time.

Relative addressing is not possible, so extended format must be used to provide room for the actual address.
Machine Independent Assembler Features.
• 0017 +STCH BUFFER,X 57900000
is also assembled using extended format with a zero address, but the x bit is set to 1 to indicate indexed addressing.
• 0028 MAXLEN WORD BUFEND-BUFFER 000000
Here both BUFEND and BUFFER are external references, so the word is assembled as zeros.
• 1000 MAXLEN EQU BUFEND-BUFFER
Here BUFEND and BUFFER are defined in the same control section as the statement, so the value of the expression can be calculated immediately by the assembler.
Machine Independent Assembler Features.
• The object program includes two new record types, Define and Refer.
• A Define record gives information about the external symbols named in EXTDEF, and a Refer record lists the symbols named in EXTREF.
Machine Independent Assembler Features.
The Define record:
Col 1      D
Col 2-7    Name of external symbol defined in this control section
Col 8-13   Relative address of the symbol within this control section
Col 14-73  Repeat of the information in Col 2-13 for other external symbols

The Refer record:
Col 1      R
Col 2-7    Name of external symbol referred to in this control section
Col 8-73   Names of other external reference symbols
Machine Independent Assembler Features.
• The other information needed for linking is added to the modification record:
Col 1      M
Col 2-7    Starting address of the field to be modified, relative to the beginning of the control section
Col 8-9    Length of the field to be modified, in half bytes
Col 10     Modification flag (+ or -)
Col 11-16  External symbol whose value is to be added to or subtracted from the indicated field
Machine Independent Assembler Features.
• The modification record M^000004^05^+RDREC specifies that the address of RDREC is to be added to this field in order to produce the correct machine instruction for execution.
• At address 0028 both BUFEND and BUFFER are defined in a different control section. The assembler generates an initial value of zero for this word.
• The last two modification records in RDREC direct that the address of BUFEND be added to this field and the address of BUFFER be subtracted from it.
• For an expression to be evaluated by the assembler, the terms in each pair of relative terms must be relative within the same control section. If the two terms are in different sections, their difference depends on where the sections are loaded and cannot be predicted at assembly time.
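
How a loader might apply these extended modification records can be sketched as below. The function name is hypothetical, and so are the load-time addresses given to RDREC, BUFEND and BUFFER: they are made-up values chosen only so that BUFEND - BUFFER comes out as 1000 (hexadecimal).

```python
def apply_external_modification(memory, record, symbol_addresses):
    """Apply a record such as 'M^000004^05^+RDREC' to one control section image."""
    _, start, length, reference = record.split("^")
    start, half_bytes = int(start, 16), int(length, 16)
    sign = 1 if reference[0] == "+" else -1
    value = symbol_addresses[reference[1:]]
    nbytes = (half_bytes + 1) // 2
    mask = (1 << (4 * half_bytes)) - 1
    field = int.from_bytes(memory[start:start + nbytes], "big")
    field = (field & ~mask) | (((field & mask) + sign * value) & mask)
    memory[start:start + nbytes] = field.to_bytes(nbytes, "big")

# Hypothetical load-time addresses of the external symbols.
addresses = {"RDREC": 0x4040, "BUFEND": 0x5033, "BUFFER": 0x4033}

section = bytearray(0x40)
section[3:7] = bytes.fromhex("4B100000")      # 0003 +JSUB RDREC, address field zero
apply_external_modification(section, "M^000004^05^+RDREC", addresses)
print(section[3:7].hex().upper())             # 4B104040

# 0028 MAXLEN WORD BUFEND-BUFFER, assembled as 000000
apply_external_modification(section, "M^000028^06^+BUFEND", addresses)
apply_external_modification(section, "M^000028^06^-BUFFER", addresses)
print(section[0x28:0x2B].hex().upper())       # 001000
```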
Two Pass Assembler with
Overlay Structure
• Most assemblers use two passes.
• Some tables and subroutines that are used during pass 1 are not needed after the pass is completed; others, like SYMTAB, are needed for both passes.
• Since pass 1 and pass 2 are not needed at the same time, they can occupy the same locations in memory during execution of the assembler.
[Figure: overlay structure. Root segment: driver, shared tables and routines. Overlay segments: pass 1 tables and routines; pass 2 tables and routines.]
The root segment contains a driver program that calls the other two segments. It also contains the tables and routines needed by both passes.

Initially the root and pass 1 segments are loaded into memory and the assembler makes the first pass.

At the end of the first pass, the pass 2 segment is loaded into memory, replacing the pass 1 segment.

The assembler thus needs less memory than it would if both passes were resident at the same time.
One Pass assemblers
The main problem in trying to assemble a program in one pass is forward references: an operand is often a symbol that has not yet been defined.
There are two types of one-pass assembler:
• One type produces object code directly in memory for immediate execution (load and go). No object program is written out.
• The other type produces the usual kind of object program for later execution.
• Object code in memory and symbol table entries for the program below, after scanning the instruction at address 2021. In the symbol table, * marks a symbol that is not yet defined; the address after the arrow is an operand field waiting for its value.

Address  Contents
1000     454F4600 00030000 00xxxxxx xxxxxxxx
1010     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
2000     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010     100948-- --00100C 28100630 ----48--
2020     --3C2012

Symbol   Value
LENGTH   100C
RDREC    *  → 2013
THREE    1003
ZERO     1006
WRREC    *  → 201F
EOF      1000
ENDFIL   *  → 201C
RETADR   1009
BUFFER   100F
CLOOP    2012
FIRST    200F
• Object code in memory and symbol table entries for the program below, after scanning the instruction at address 2052. In the symbol table, * marks a symbol that is not yet defined; the addresses after the arrow are operand fields waiting for its value.

Address  Contents
1000     454F4600 00030000 00xxxxxx xxxxxxxx
1010     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
2000     xxxxxxxx xxxxxxxx xxxxxxxx xxxxxx14
2010     10094820 3D00100C 28100630 202448--
2020     --3C2012 0010000C 100F0010 030C100C
2030     48----08 10094C00 00F10010 00041006
2040     001006E0 20393020 43D82039 28100630
2050     ----5490 0F

Symbol   Value
LENGTH   100C
RDREC    203D
THREE    1003
ZERO     1006
WRREC    *  → 201F, 2031
EOF      1000
ENDFIL   2024
RETADR   1009
BUFFER   100F
CLOOP    2012
FIRST    200F
MAXLEN   203A
INPUT    2039
EXIT     *  → 2050
RLOOP    2043
Load and Go Assembler
The assembler generates object code instructions as it scans the source program.

If the operand is a symbol that has not yet been defined, the operand address is omitted and the symbol is entered in the symbol table, flagged as undefined (*), together with the address of the operand field that refers to it.

When the definition of the symbol is encountered, the list of references stored with it in the symbol table is scanned and the proper address is inserted into each instruction previously generated.

Any SYMTAB entries that are still marked with * at the end of the program indicate undefined symbols, and they should be flagged by the assembler as errors.
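
This back-patching scheme can be sketched as follows. The class and method names are hypothetical, memory is modelled as a `bytearray`, and only the 2-byte address part of each operand field is patched, as in the memory dumps above.

```python
class OnePassSymtab:
    def __init__(self, memory):
        self.memory = memory
        self.defined = {}      # symbol -> address
        self.pending = {}      # undefined symbol -> operand field addresses to patch

    def reference(self, symbol, field_address):
        """Called when `symbol` is used as an operand stored at field_address."""
        if symbol in self.defined:
            self._store(field_address, self.defined[symbol])
        else:                  # forward reference: remember where to patch later
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, address):
        """Called when the label is met: patch every waiting operand field."""
        self.defined[symbol] = address
        for field_address in self.pending.pop(symbol, []):
            self._store(field_address, address)

    def undefined(self):
        """Symbols still marked * at the end of the program."""
        return sorted(self.pending)

    def _store(self, field_address, value):
        self.memory[field_address:field_address + 2] = value.to_bytes(2, "big")

memory = bytearray(0x3000)
symtab = OnePassSymtab(memory)
symtab.reference("RDREC", 0x2013)     # JSUB RDREC at 2012, operand field at 2013
symtab.reference("WRREC", 0x201F)
symtab.reference("WRREC", 0x2031)
symtab.define("RDREC", 0x203D)
print(memory[0x2013:0x2015].hex().upper())   # 203D
print(symtab.undefined())                    # ['WRREC']
```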
The second type of one-pass assembler produces an object program.

Forward references are entered into lists as before, but when the definition of the symbol is encountered, another text record with the correct operand address is generated. When the program is loaded, this address is inserted into the instruction by the action of the loader.

In the object program below, the second text record contains the object code generated from address 200F through 2021. The operand addresses for the instructions at addresses 2012, 201B and 201E have been generated as 0000.
H^COPY  ^001000^00107A
T^001000^09^454F46^000003^000000
T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012
T^00201C^02^2024
T^002024^19^001000^0C100F^001003^0C100C^480000^081009^4C0000^F1^001000
T^002013^02^203D
T^00203D^1E^041006^001006^E02039^302043^D82039^281006^300000^54900F^2C203A^382043

T^002050^02^205B
T^00205B^07^10100C^4C0000^05
T^00201F^02^2062
T^002031^02^2062
T^002062^18^041006^E02061^302065^50900F^DC2061^2C100C^382065^4C0000
E^00200F
When the definition of ENDFIL at address 2024 is encountered, the assembler generates a third text record. This record specifies that the value 2024 is to be loaded at location 201C.

When this program is loaded, the value 2024 will replace the 0000 that was previously loaded.
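
The effect of the third text record can be checked with a small sketch of a loader that simply stores each text record in turn, so that later records overwrite earlier ones. The function name is hypothetical and memory is modelled as a dictionary from address to byte.

```python
def load_text_records(records):
    """Store the bytes of each T record in order; later records overwrite earlier ones."""
    memory = {}
    for record in records:
        fields = record.split("^")
        start = int(fields[1], 16)
        data = bytes.fromhex("".join(fields[3:]))
        for offset, byte in enumerate(data):
            memory[start + offset] = byte
    return memory

memory = load_text_records([
    "T^00200F^15^141009^480000^00100C^281006^300000^480000^3C2012",
    "T^00201C^02^2024",
])
# The instruction at 201B was generated as 300000; after loading it reads 302024.
print(bytes(memory[a] for a in range(0x201B, 0x201E)).hex().upper())
```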
