369W17 Slideset04

Slide Set 4 for Lecture Section 01
for ENCM 369 Winter 2017
Steve Norman, PhD, PEng
Electrical & Computer Engineering

Schulich School of Engineering
University of Calgary
January 2017
Contents
Encoding of MIPS branch and jump instructions
Building and running executables starting from C source code
Linking and Libraries
Notes about some confusing words

Encoding of MIPS branch and jump instructions

(Textbook, Section 6.5.)
Previous lectures and Lab 2 showed machine code for
instructions like add, sub, addi, lw, sw.
How is machine code organized for beq, bne, j, jal, jr?
Encoding of beq and bne
A 6-bit opcode tells what kind of instruction it is.

Two 5-bit fields specify which registers to compare.
A 16-bit offset field indicates how many instructions to skip
forward or back.
Note: Unlike load and store offsets, a branch offset counts
words, not bytes.
The “Simplest Model for How a Computer Works” and MIPS branch instructions
The model is: Perform Step 1, Step 2, Step 1, Step 2, . . . ,
forever.
For MIPS, Step 1 is: Read the instruction the PC points to,
then do PC = PC + 4.
If the instruction turns out to be a branch, then Step 2 is:
    if (branch should be taken)
        PC = PC + 4 × branch offset
    else
        do nothing
Attention: By the time Step 2 starts, the PC = PC + 4
update in Step 1 has already taken place.
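The Step 2 rule for branches can be sketched in C. This is only an illustration under stated assumptions: the function name pc_after_branch and its parameters are made up here, and the function takes the address of the branch instruction itself so that the PC = PC + 4 update from Step 1 appears explicitly.

```c
#include <stdint.h>

/* Hypothetical helper: returns the PC value after a beq/bne has been
 * processed. branch_addr is the address of the branch instruction,
 * offset is the signed 16-bit offset field (counted in words), and
 * taken is nonzero if the branch should be taken. */
uint32_t pc_after_branch(uint32_t branch_addr, int16_t offset, int taken)
{
    uint32_t pc = branch_addr + 4;          /* Step 1: PC = PC + 4 */
    if (taken) {
        /* Step 2: the offset counts instructions, so scale by 4 bytes.
         * Unsigned arithmetic wraps modulo 2^32, which makes negative
         * offsets move the PC backward. */
        pc += (uint32_t)((int32_t)offset * 4);
    }
    return pc;
}
```

For a taken branch at the made-up address 0x0040_0000 with offset 4, the result is 0x0040_0014, which is five instructions past the branch, not four, because of the Step 1 update.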
beq encoding example
What is the MIPS machine code for each of the beq instructions?
L1:   beq   $t0, $t1, L2        # first branch
      lw    $t2, ($t0)
      add   $t3, $t3, $t2
      addi  $t0, $t0, 4
      beq   $zero, $zero, L1    # second branch
L2:   sw    $t3, ($s0)
Branch encoding and assemblers

Assemblers can compute the offset between a branch
instruction and its target instruction. (In the previous example,
the target of “first branch” is the sw instruction.)
So, humans can use labels as operands in beq and bne.
So, humans avoid the pain of writing assembly language (A.L.) for
“first branch” and “second branch” as
    beq $t0, $t1, 4
and
    beq $zero, $zero, -5
(Counting instructions would be painful. Updating the counts
correctly when you insert or delete instructions would be more
painful, and very hard to do perfectly.)
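The offset computation an assembler performs can be sketched in C. This is a minimal sketch, not how any particular assembler is written; the function names branch_offset and encode_beq are hypothetical. It relies on two facts: the beq opcode is 000100, and the offset is measured from the instruction after the branch.

```c
#include <stdint.h>

/* Offset field for a branch at branch_addr whose target is target_addr.
 * Both addresses are assumed to be multiples of 4, and the target is
 * assumed to be close enough for the result to fit in 16 bits. */
int16_t branch_offset(uint32_t branch_addr, uint32_t target_addr)
{
    int32_t byte_distance = (int32_t)(target_addr - (branch_addr + 4));
    return (int16_t)(byte_distance / 4);    /* words, not bytes */
}

/* Machine code for beq rs, rt, offset: opcode in bits 31..26,
 * register numbers in bits 25..21 and 20..16, offset in bits 15..0. */
uint32_t encode_beq(unsigned rs, unsigned rt, int16_t offset)
{
    return (0x04u << 26) | ((rs & 31u) << 21) | ((rt & 31u) << 16)
           | (uint16_t)offset;
}
```

If the six instructions of the example are placed at made-up addresses 0x0040_0000 through 0x0040_0014, branch_offset gives 4 for “first branch” and -5 for “second branch”, matching the hand-counted values above.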
Machine code for j and jal
Both instructions use this format:

bit number:  31 ... 26   25 ... 0
field:       opcode      address field

- 6-bit opcode (000010 for j, 000011 for jal);
- 26-bit address field giving part of the address of the
  target instruction.
PC updates in j and jal
A jump can be thought of as “updating the PC register with
something other than the usual PC+4.” The PC update for
MIPS j and jal is this:

bit number:  31 ... 28            27 ... 2                        1 0
OLD PC:      4 bits get copied    these 26 bits will change       00
NEW PC:      4 bits from OLD PC   copy of address field from      00
                                  j or jal instruction
Processing of j or jal: example
Suppose instruction 0x0c10_0020 is located at address 0x0040_0064.
What is the address of the next instruction to be run? Does
$ra get updated, and if so, with what?
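One way to check an answer to this kind of question is a short C sketch of the PC update. The function names below are hypothetical, and the sketch assumes the simple model from earlier slides: PC = PC + 4 happens first, there are no delay slots, and jal saves the address of the instruction that follows it.

```c
#include <stdint.h>

/* New PC after a j or jal located at instr_addr. */
uint32_t jump_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t pc_plus_4 = instr_addr + 4;              /* Step 1 update */
    uint32_t address_field = instr & 0x03FFFFFFu;     /* bits 25..0 */
    /* Keep bits 31..28 of the PC; bits 27..2 come from the address
     * field; bits 1..0 are 00 because of the shift. */
    return (pc_plus_4 & 0xF0000000u) | (address_field << 2);
}

/* Nonzero if the opcode (bits 31..26) is 000011, that is, jal. */
int is_jal(uint32_t instr)
{
    return (instr >> 26) == 0x03u;
}

/* Value that jal writes to $ra under the assumptions stated above. */
uint32_t jal_return_address(uint32_t instr_addr)
{
    return instr_addr + 4;
}
```

For the instruction 0x0c10_0020 at address 0x0040_0064, is_jal is true, jump_target gives 0x0040_0080, and jal_return_address gives 0x0040_0068.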
Encoding of j and jal, compared to beq or bne
Attention: To encode a j or jal instruction, the address of
the jump target must be known.
This is unlike encoding beq or bne, in which it is enough to
know how many instructions to skip forward or
back—the exact address of the branch target does not have
to be known.
Encoding of jr
bit number:  31 ... 26   25 ... 21    20 ... 16   15 ... 11   10 ... 6   5 ... 0
contents:    000000      GPR number   00000       00000       00000      001000
Bits 31–26 and 5–0 together indicate that this is a jr
instruction.
The number of the GPR used is encoded in bits 25–21.
So, what is the machine code for jr $ra ?
(By the way, bits 20–6 don’t have any particular use in a jr
instruction.)
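The jr layout is simple enough to express as a one-line C function. This is a sketch with a hypothetical function name, encode_jr; it fills bits 20–6 with zeros, as in the diagram.

```c
#include <stdint.h>

/* Machine code for jr with the given GPR number (0..31):
 * bits 31..26 are 000000, bits 25..21 hold the GPR number,
 * bits 20..6 are zero, and bits 5..0 are 001000. */
uint32_t encode_jr(unsigned gpr)
{
    return ((gpr & 31u) << 21) | 0x08u;
}
```

Since $ra is GPR 31, encode_jr(31) produces the machine code for jr $ra.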

Building and running executables starting from C source code
Here we consider real programming environments such as
Linux, Mac OS X, and Windows, not simulated environments
like MARS.
Examples are focused on C, but are quite applicable to C++ as
well. Java works quite differently—we won’t look at Java in
ENCM 369.
Textbook reference: Section 6.6. However, note this
warning: Section 6.6 oversimplifies things. Contrary to the
example on page 339, an assembler won’t be able to decide on
absolute addresses of procedures and global variables. (Don’t
worry if this warning doesn’t make sense the first time you see
it—it should make more sense once we get to slide 41 or so.)
Building and running C programs: a note about the word compiler
Definition One: A compiler is a program that translates
high-level language to assembly language.
Definition Two: A compiler is a package of programs and
other files that can be used to develop software. Example:
“Apple’s Xcode is the compiler used for development of
iPhone apps.” (Such a package includes a compiler, in the
Definition One sense of that term.)
In ENCM 369, Definition One applies when we use the word
compiler.
Building and running C programs: tools and the toolchain
Tools are programs that read input files and write output files.
Preprocessor: tool to convert C code into rearranged or edited
C code.
Compiler: tool to translate C code into assembly language.

Building and running C programs: tools and the toolchain, continued
Assembler: tool to create an object file using assembly
language input. (Wait, what does “object file” mean?)
Linker: tool to combine object files and information from
library files to make an executable file. (Wait, what are the
meanings of “library files” and “executable file”?)
Together, preprocessor, compiler, assembler, and linker make a
chain of tools.
Building and running C programs: many kinds of files
C source code files: .c and .h files written by a programmer as
part of a specific program.
Library header files: .h files provided to give information about
types and functions defined in the library. (Example file:
stdio.h. Example types and functions: FILE, fopen,
fprintf, printf.)

Building and running C programs: many kinds of files, continued
Translation unit: File that is output from the preprocessor,
which modifies C code according to directives like #include,
#define, #ifdef, etc.
When you use gcc on Linux, translation units (.i files) usually
don’t get saved permanently. The same is true with Cygwin64
on Windows and for translation units produced in most other
C and C++ development systems. So you’ve probably never
noticed them.
Preprocessor inputs: C source files

/* file foo.h */
int foo(int arg);
#define BAR 42

/* file foo.c */
#include "foo.h"
int quux = 77;
int foo(int arg)
{
    quux++;
    return arg + BAR;
}

Preprocessor output: a translation unit

int foo(int arg);
int quux = 77;
int foo(int arg)
{
    quux++;
    return arg + 42 ;
}

Note: Some translation unit contents omitted to keep the
slide simple.

Building and running C programs: many kinds of files, continued again
Assembly language file: File that is output from the compiler.
Remember, the compiler is the tool that translates C code into
assembly language.
With gcc on Linux or Cygwin, assembly language files
(.s files), like translation units, usually don’t get saved
permanently, so you’ve probably never noticed them either.
Compiler input: a translation unit

int foo(int arg);
int quux = 77;
int foo(int arg)
{
    quux++;
    return arg + 42 ;
}

Note: Some translation unit contents omitted to keep the
slide simple.

Compiler output: an assembly language file

        .data
        .globl  quux
quux:   .word   77
        .text
        .globl  foo
foo:
        la      $t0, quux
        lw      $t1, ($t0)
        addi    $t2, $t1, 1
        sw      $t2, ($t0)
        addi    $v0, $a0, 42
        jr      $ra

Building and running C programs: many kinds of files, continued further
Source code, library headers, translation units, and assembly
language files are all text files: sequences of ASCII codes,
organized in lines, editable with a text editor.
Files yet to be seen (object files, library machine code files,
executable files) are binary files, full mostly of machine code
for instructions and base two representations of numbers.

Building and running C programs: many kinds of files, on and on . . .
An object file is output from the assembler.
Let’s make a sketch of typical object file organization.

Assembler input: an assembly language file

        .data
        .globl  quux
quux:   .word   77
        .text
        .globl  foo
foo:
        la      $t0, quux
        lw      $t1, ($t0)
        addi    $t2, $t1, 1
        sw      $t2, ($t0)
        addi    $v0, $a0, 42
        jr      $ra

Assembler output: an object file

  header
  text segment: machine code for instructions of procedure foo
  data segment: 32-bit base two representation of 77
  relocation info and symbol table

Building and running C programs: many kinds of files, second last slide . . .
Library machine code files contain instructions and static data
belonging to library procedures such as printf, fopen,
fprintf, strcpy, and thousands more.
These files occupy many megabytes or gigabytes in the file
system of a modern OS.
Chunks of machine code and data can be copied out of these
files as needed when building executable files.

Building and running C programs: many kinds of files, finally done!
An executable file is created by the linker, the last tool in the
chain.
The linker combines
- one or more object files;
- information from the library.
Suppose an executable is built from object files alpha.o and
beta.o. Let’s sketch the organization of the executable file.
Linux, Cygwin64, and other platforms
These slides were originally written under the assumption that
the platform used to build and run programs would be Linux.
Building and running programs with Cygwin64 on top of
Microsoft Windows is very similar to doing the same things on
Linux. So I’ve edited all the slides in this slide set to refer to
Cygwin64 instead of Linux.
Please keep in mind that all of the ideas here apply not only to
Cygwin64 but also to important OSes such as Linux, FreeBSD
and OpenBSD.
(The ideas also apply to Mac OS X, for the most part, for now,
but might not work so well with future Mac OS X versions.)
Running an executable
Suppose you have an executable file called a.exe in the
current directory of your Cygwin64 terminal window.
Suppose you type and enter the command ./a.exe
Cygwin64 copies the text and data segments from a.exe into
main memory. (That’s an oversimplified model—a system
called demand paging reduces the amount of copying needed
to get a program started, but I don’t want to get into the
complex details of that here.)
Cygwin64 starts the program by making the PC point to the
start of the in-memory text segment.
Running an executable, not just in a terminal window
These days, users of desktops, laptops, tablets, and
smartphones usually launch programs via some kind of “Start”
menu, or by clicking or double-clicking or tapping a program
icon or file icon.
At the operating system level, that’s the same as entering a
command line: text and data segments get copied from an
executable file to memory, then the PC gets pointed to the
start of the in-memory text segment.
Review of the toolchain: running gcc
Example command: gcc aa.c bb.c

Students might say, informally, that this is “running the
compiler.”
Better-educated students would say that this is “running a
bunch of programs that are tools in the toolchain.”
What sequence of programs gets run?
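Conceptually, for gcc aa.c bb.c the preprocessor, compiler, and assembler each run once per source file, and then the linker runs once. The stages can be watched one at a time with the gcc options -E, -S, and -c. The sketch below reuses foo.c from the earlier slides (with foo.h folded in to keep it to one file); the output file names and the companion object file main.o are assumptions made for illustration.

```c
/* foo.c: one source file taken through the toolchain one tool at a time.
 *
 *   gcc -E foo.c -o foo.i      preprocessor: C code -> translation unit
 *   gcc -S foo.i -o foo.s      compiler: translation unit -> assembly language
 *   gcc -c foo.s -o foo.o      assembler: assembly language -> object file
 *   gcc foo.o main.o -o prog   linker: object files + library -> executable
 *
 * main.o is assumed to come from another source file that defines main;
 * foo.o alone cannot be linked into a runnable program.
 */
#define BAR 42

int quux = 77;

int foo(int arg)
{
    quux++;
    return arg + BAR;   /* the preprocessor replaces BAR with 42 */
}
```

Opening foo.i and foo.s in a text editor after the first two commands shows the translation unit and assembly language file that normally are not saved.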

Linking and Libraries
Topics in the upcoming slides:

- What the linker does.
- How symbol tables and relocation information in object
  files help the linker.
- Comparison of static linking with dynamic linking.
Review: object files and executable files
Contents of an object file: machine code (in text segment),
initial values for static data (in data segment), symbol table,
relocation info.
An object file is NOT a runnable program!
The linker makes an executable file by combining one or
more object files with instructions and data from library files.
An executable file IS a runnable program!
Symbol tables, relocation information, and linking
Symbol tables and relocation information are sections within
object files; the roles of these sections haven’t yet been
explained.
These two sections play key roles in helping the linker to
insert important pieces of addresses into instructions in
executable files.
The symbol table
The symbol table is a part of an object file that lists .globl
symbols from the A.L. file used by the assembler to generate
the object file.
In this context, symbol means the same thing as label.
For each symbol coming from a text segment, the table gives
the offset of that symbol relative to the start of the text
segment in the object file.
Similarly, for each symbol coming from a data segment, the
table gives the offset of that symbol relative to the start of the
data segment in the object file.
# A.L. example to help explain symbol tables and relocation info.
        .text
        .globl  foo
foo:
        [. . . some instructions omitted to save space on slide . . . ]
        jal     quux
        [. . . more instructions omitted to save space on slide . . . ]
        jal     bar
        jr      $ra
        .data
        .globl  my_array
my_array: .word 0x100, 0x200, 0x300
        .globl  my_int
my_int: .word   0x9999
        .text
        .globl  bar
bar:
        [. . . some instructions omitted to save space on slide . . . ]
        la      $t0, my_int     # destination register assumed to be $t0
        jr      $ra
Attention, regarding the previous slide
That is obviously not a complete A.L. program: procedures
main, quux and possibly other important things were not
defined there.
So to make an executable file, the linker will have to combine
the object file that comes from that A.L. file with one or more
other object files.
Symbol table example
For the A.L. file from two slides back, let’s assume . . .
- foo has 18 instructions in total;
- jal quux is the 5th instruction in foo;
- jal bar is the 11th instruction in foo;
- la my_int generates lui and ori instructions that will
  be the 5th and 6th instructions of bar.
Let’s sketch out the text segment, data segment, and symbol
table that will appear in the object file.
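One possible way to picture the resulting symbol table is as an array of C structs. This is only a sketch: the type and function names are hypothetical, real object file formats store more than this, and the offsets assume that bar is placed immediately after the 18 instructions of foo with 4 bytes per instruction.

```c
#include <stdint.h>
#include <string.h>

enum segment { SEG_TEXT, SEG_DATA };

/* One symbol table entry: a .globl label and its offset in bytes
 * from the start of its segment in the object file. */
struct symbol {
    const char *name;
    enum segment seg;
    uint32_t offset;
};

/* Entries for the A.L. example under the assumptions stated above. */
static const struct symbol symtab[] = {
    { "foo",      SEG_TEXT, 0      },
    { "bar",      SEG_TEXT, 18 * 4 },  /* right after foo's 18 instructions */
    { "my_array", SEG_DATA, 0      },
    { "my_int",   SEG_DATA, 3 * 4  },  /* after the three words of my_array */
};

/* Returns the entry for name, or NULL if the symbol is not in the table. */
const struct symbol *find_symbol(const char *name)
{
    for (size_t i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    return NULL;
}
```

Looking up quux returns NULL, because quux is not defined in this A.L. file.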
Problems solved by relocation information, Part 1
Consider jal quux in our A.L. example.

The function quux is not defined in the given A.L. file, so it
must be defined in some other file.
The assembler is supposed to encode the jal instruction in
the text segment of the object file, but can’t because the
assembler doesn’t know what address information to put into
bits 25–0 of the machine code for the instruction.
Problems solved by relocation information, Part 2
Now consider jal bar and la my_int in our A.L. example.

Does the assembler have enough information to completely
encode the jal instruction, and the lui and ori instructions
that will be needed for the la pseudoinstruction?
Solution to problems: role of assembler
The assembler partially encodes jal (and j) instructions, but
just puts zero bits in the address fields, and makes notes in the
relocation info about what must be fixed.
The assembler partially encodes lui and ori instructions, but
just puts zero bits in the 16-bit constant fields, and makes
notes in relocation info about what must be fixed.
Let’s sketch the relocation info for our A.L. example.
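The relocation info for the example could be modelled as a list of “fix this field later” records. The struct layout and the names below are invented for this sketch and do not come from any real object file format; the text offsets use the instruction positions assumed on the symbol table example slide, with bar placed right after the 18 instructions of foo.

```c
#include <stddef.h>
#include <stdint.h>

/* Which field of the instruction must be filled in by the linker. */
enum reloc_kind { RELOC_JUMP26, RELOC_HI16, RELOC_LO16 };

struct reloc {
    uint32_t text_offset;   /* byte offset of the incomplete instruction */
    enum reloc_kind kind;
    const char *symbol;     /* label whose address is needed */
};

static const struct reloc relocs[] = {
    { (5 - 1) * 4,           RELOC_JUMP26, "quux"   },  /* jal quux */
    { (11 - 1) * 4,          RELOC_JUMP26, "bar"    },  /* jal bar */
    { 18 * 4 + (5 - 1) * 4,  RELOC_HI16,   "my_int" },  /* lui from la */
    { 18 * 4 + (6 - 1) * 4,  RELOC_LO16,   "my_int" },  /* ori from la */
};

/* Number of relocation records in the sketch. */
size_t reloc_count(void)
{
    return sizeof relocs / sizeof relocs[0];
}
```

Note that jal bar needs a record even though bar is defined in the same file: the assembler knows the offset of bar within the text segment, but not the final address of that text segment.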
Solution to problems: role of linker

The linker must combine multiple text segments and multiple
data segments to build a single text segment and a single data
segment for the executable file.
The linker—unlike the assembler—knows what base
addresses (for example, 0x0040_0000 and 0x1001_0000) are
expected by the operating system for .text and .data
segments.
So the linker can compute the addresses for all instructions
and data items.
The linker uses symbol tables and relocation information to
insert correct pieces of addresses into instructions such as jal,
j, lui, and ori.
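The “insert correct pieces of addresses” step amounts to a few bit operations. The C functions below are a sketch with hypothetical names; each takes a partially encoded instruction (zero bits in the field to be fixed) and the final address chosen by the linker.

```c
#include <stdint.h>

/* Fill the 26-bit address field of a partially encoded j or jal
 * with bits 27..2 of the target address. */
uint32_t patch_jump(uint32_t instr, uint32_t target_addr)
{
    return (instr & 0xFC000000u) | ((target_addr >> 2) & 0x03FFFFFFu);
}

/* Fill the 16-bit constant of a lui with the upper half of an address. */
uint32_t patch_hi16(uint32_t instr, uint32_t addr)
{
    return (instr & 0xFFFF0000u) | (addr >> 16);
}

/* Fill the 16-bit constant of an ori with the lower half of an address. */
uint32_t patch_lo16(uint32_t instr, uint32_t addr)
{
    return (instr & 0xFFFF0000u) | (addr & 0xFFFFu);
}
```

For example, if a data label ends up at address 0x1001_000c, the lui gets the constant 0x1001 and the ori gets 0x000c. Patching a jal whose address field is all zeros with target 0x0040_0080 gives 0x0c10_0020, the instruction used in the earlier j and jal example.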
Example of a common error seen by gcc users
A file called joe.c . . .

void foo(void);
int main(void)
{
    foo();
    return 0;
}
Will the command gcc joe.c succeed? Why or why not?

Error message from command gcc joe.c run on a Cygwin64 system . . .
/tmp/ccYL2A05.o:joe.c:(.text+0xe): undefined reference to ‘foo’
/tmp/ccYL2A05.o:joe.c:(.text+0xe): relocation truncated to fit: R_X86_64_PC32 against undefined symbol ‘foo’
collect2: error: ld returned 1 exit status
Explanation of a common gcc error
What happened?
- Did the compiler have a problem?
- Did the assembler have a problem?
- Did the linker have a problem?

Static vs. Dynamic Linking
Slides so far have presented the process for creating a kind of
executable file called a statically-linked executable.
Most executable files on current Mac, Windows, or Linux
operating systems are a different kind, called
dynamically-linked. (But it’s still possible to create and run
statically-linked executables on these systems.)
Contents of a statically-linked executable file (review from earlier slides)
Executable file header (information about sizes and layouts of
other segments).
Text segment: start-up code; machine code from object files;
machine code for all necessary library procedures.
Data segment: initial values for static data from object files;
initial values for static data belonging to library procedures.
Contents of a dynamically-linked executable file
Executable file header (information about sizes and layouts of
other segments).
Text segment: start-up code; machine code from object files.
Data segment: initial values for static data from object files.
Information about where to find library instructions and library
data in the file system.
Running a dynamically-linked executable
Text and data segments get copied into memory from the
executable file (as is the case with a statically-linked
executable).
It is the operating system’s responsibility to make sure
library machine code and data are in memory when needed.
Often, library machine code is already in memory when a
program starts, because some other running programs also
need it—this is useful sharing of memory by multiple running
programs.
Running a dynamically-linked executable: example C code
Source code blastoff.c . . .
#include <stdio.h>
int main(void)
{
    int count;
    for (count = 10; count > 0; count--)
        printf("%d ... \n", count);
    printf("Blastoff!\n");
    return 0;
}
Command to build executable:
gcc blastoff.c -o blastoff
Running a dynamically-linked executable: example, continued
What is in the executable file called blastoff?
What important and relevant information is NOT in the
executable file called blastoff?
What happens when blastoff is run?

Static linking: Advantages
Compared to dynamic linking, static linking is easy to
understand and implement.
Installing software is relatively easy to manage: it may require
placing only one file, the executable, in the right place.
Static linking: Disadvantages
Executables are big. A collection of statically-linked
executables contains many copies of the same library machine
code and data. This wastes space in the file system.
If many programs are running on a multi-tasking operating
system, many copies of the same library machine code may be
in memory. This wastes memory and hurts performance.
Installed executables can’t take advantage of library upgrades
such as bug fixes and performance improvements.
Dynamic linking: Advantages
Essentially, all the main disadvantages of static linking are
overcome:
- executables are smaller,
- less total memory is needed to run multiple executables at
  the same time,
- executables can benefit from library upgrades.
Dynamic linking: Disadvantages
It’s harder to understand and implement than static linking.

Software installation can be complicated—are all the right
versions of library files for dynamic linking available to support
all the executables? (Failures in this area on older versions of
Microsoft Windows were called “DLL hell.”)
Quick Tour of C library, for Cygwin64 in Winter 2016
Cygwin64 is a complicated mess that mixes Linux-like stuff
with Microsoft Windows, so a “quick” tour is impossible.
(Previous years’ versions of these slides were able to say a few
useful things about where important library files were located
on Linux systems in ICT 320.)

Notes about some confusing words
Computer programming terminology and jargon have evolved
(or have just accumulated chaotically) over the past several
decades.
Some of the choices that were made were not very helpful to
students trying to learn about computers and programming!
Notes about some confusing words: text
The text segment, where the instructions go in an object file
or executable file, is not related to the concept of text as a
sequence of character codes in ASCII or some other character
set!
Notes about some confusing words: object
The term object file has nothing to do with the concept of
an object in object-oriented programming in languages such as
C++, Java, Objective-C, Python, etc.
Notes about some confusing words: link
The word link in the MIPS jump-and-link instruction means,
“leave a return address to allow return from a callee
procedure.”
The word link in the term linker means, “connect together
procedures and data from one or more object files to make an
executable program.”
So there are two different meanings for link.
