
Slide Set 4 for Lecture Section 01

for ENCM 369 Winter 2017

Steve Norman, PhD, PEng

Electrical & Computer Engineering


Schulich School of Engineering
University of Calgary

January 2017
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 2/63

Contents

Encoding of MIPS branch and jump instructions

Building and running executables starting from C source code

Linking and Libraries

Notes about some confusing words


ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 4/63

Encoding of MIPS branch and jump instructions

(Textbook, Section 6.5.)

Previous lectures and Lab 2 showed machine code for
instructions like add, sub, addi, lw, sw.
How is machine code organized for beq, bne, j, jal, jr?
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 5/63

Encoding of beq and bne

A 6-bit opcode tells what kind of instruction it is.


Two 5-bit fields specify which registers to compare.
A 16-bit offset field indicates how many instructions to skip
forward or back.
Note: Unlike load and store offsets, a branch offset counts
words, not bytes.
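
The field layout above can be sketched in C. This is only an illustration: the function and constant names are made up for this example, and the opcode values (000100 for beq, 000101 for bne) are the standard MIPS encodings rather than something stated on this slide.

```c
#include <assert.h>
#include <stdint.h>

/* Standard MIPS opcodes for the two branches (assumed, not from the slide). */
enum { OP_BEQ = 0x04, OP_BNE = 0x05 };

/* Pack a 6-bit opcode, two 5-bit register numbers, and a signed
   16-bit word offset into one 32-bit instruction word.
   encode_branch is a hypothetical helper name. */
uint32_t encode_branch(uint32_t opcode, uint32_t rs, uint32_t rt,
                       int16_t word_offset)
{
    assert(opcode < 64 && rs < 32 && rt < 32);
    return (opcode << 26)            /* bits 31-26: opcode            */
         | (rs << 21)                /* bits 25-21: first register    */
         | (rt << 16)                /* bits 20-16: second register   */
         | (uint16_t)word_offset;    /* bits 15-0: offset, in words   */
}
```

For instance, encode_branch(OP_BEQ, 8, 9, 4) packs a beq that compares GPRs 8 and 9 ($t0 and $t1) and skips 4 instructions forward.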
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 6/63

The “Simplest Model for How a Computer Works”
and MIPS branch instructions
The model is: Perform Step 1, Step 2, Step 1, Step 2, . . . ,
forever.
For MIPS, Step 1 is: Read the instruction the PC points to,
then do PC = PC + 4.
If the instruction turns out to be a branch, then Step 2 is:
    if (branch should be taken)
        PC = PC + 4 × branch offset
    else
        do nothing
Attention: By the time Step 2 starts, the PC = PC + 4
update in Step 1 has already taken place.
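
The two steps can be written as a small C sketch. The function names are hypothetical, and the addresses used when calling them are only sample values.

```c
#include <stdint.h>

/* Step 1: after the instruction is read, the PC moves to the
   next instruction. */
uint32_t step1_pc(uint32_t pc)
{
    return pc + 4;
}

/* Step 2 for a branch. pc_after_step1 already includes the +4
   from Step 1; word_offset is the signed 16-bit field from the
   instruction. Unsigned arithmetic wraps, so a negative offset
   moves the PC backward. */
uint32_t step2_branch_pc(uint32_t pc_after_step1, int16_t word_offset,
                         int taken)
{
    if (taken)
        return pc_after_step1 + 4u * (uint32_t)(int32_t)word_offset;
    return pc_after_step1;   /* do nothing */
}
```

With a branch at address 0x0040_0010 and an offset of -5, a taken branch gives step2_branch_pc(step1_pc(0x00400010), -5, 1), which is 0x0040_0000.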
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 7/63

beq encoding example

What is the MIPS machine code for each of the
beq instructions?

L1:  beq  $t0, $t1, L2         # first branch
     lw   $t2, ($t0)
     add  $t3, $t3, $t2
     addi $t0, $t0, 4
     beq  $zero, $zero, L1     # second branch
L2:  sw   $t3, ($s0)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 8/63

Branch encoding and assemblers


Assemblers can compute the offset between a branch
instruction and its target instruction. (In the previous example,
the target of “first branch” is the sw instruction.)
So, humans can use labels as operands in beq and bne.
So, humans avoid the pain of writing assembly language (A.L.)
for “first branch” and “second branch” as
    beq $t0, $t1, 4
and
    beq $zero, $zero, -5
(Counting instructions would be painful. Updating the counts
correctly when you insert or delete instructions would be more
painful, and very hard to do perfectly.)
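
Here is a rough sketch of the counting an assembler does for you. The function name is made up, and positions are byte offsets from the start of the code (instruction number times 4, counting from 0).

```c
#include <assert.h>
#include <stdint.h>

/* Word offset for a branch, given the byte positions of the
   branch and of its target. The offset is measured from the
   instruction AFTER the branch, because PC = PC + 4 has already
   happened when the offset is applied. */
int32_t branch_word_offset(uint32_t branch_pos, uint32_t target_pos)
{
    assert(branch_pos % 4 == 0 && target_pos % 4 == 0);
    return ((int32_t)target_pos - (int32_t)(branch_pos + 4)) / 4;
}
```

In the example, “first branch” is at position 0 with its target at position 20, and “second branch” is at position 16 with its target at position 0. That gives offsets of 4 and -5.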
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 9/63

Machine code for j and jal

Both instructions use this format:

bit number:   31 ... 26    25 ... 0
field:        opcode       address field

- 6-bit opcode (000010 for j, 000011 for jal);
- 26-bit address field giving part of the address of the
  target instruction.
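
A minimal C sketch of this format follows. The names are hypothetical. The sketch assumes the address field holds bits 27 to 2 of the target address, which is how MIPS j and jal work.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_J = 0x02, OP_JAL = 0x03 };   /* 000010 and 000011 */

/* Build the machine code for j or jal from the opcode and the
   address of the target instruction. encode_jump is a
   hypothetical helper name. */
uint32_t encode_jump(uint32_t opcode, uint32_t target_address)
{
    assert(opcode < 64 && target_address % 4 == 0);
    /* Keep bits 27-2 of the target; bits 1-0 are always 00. */
    uint32_t address_field = (target_address >> 2) & 0x03FFFFFFu;
    return (opcode << 26) | address_field;
}
```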
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 10/63

PC updates in j and jal

A jump can be thought of as “updating the PC register with
something other than the usual PC+4.” The PC update for
MIPS j and jal is this:

bit number:   31 ... 28           27 ... 2                     1 0
OLD PC:       (unchanged)         these 26 bits will change    00
NEW PC:       4 bits get copied   copy of address field        00
              from OLD PC         from j or jal instruction
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 11/63

Processing of j or jal: example

Suppose instruction 0x0c10_0020 is located at address
0x0040_0064.

What is the address of the next instruction to be run? Does
$ra get updated, and if so, with what?
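
One way to check an answer is a short C sketch of the PC update. The function names are made up. The sketch assumes the simple model from earlier slides, with no branch delay slot, so jal leaves the address of the following instruction in $ra.

```c
#include <stdint.h>

/* Nonzero if the opcode field (bits 31-26) is 000011, i.e. jal. */
int is_jal(uint32_t instruction)
{
    return (instruction >> 26) == 0x03;
}

/* New PC for j or jal: the top 4 bits come from the PC after its
   usual +4 update, the next 26 bits are the address field, and
   the bottom 2 bits are 00. */
uint32_t jump_target(uint32_t instruction_address, uint32_t instruction)
{
    uint32_t pc_plus_4 = instruction_address + 4;
    uint32_t address_field = instruction & 0x03FFFFFFu;
    return (pc_plus_4 & 0xF0000000u) | (address_field << 2);
}

/* Value that jal puts in $ra (assumption: no delay slot). */
uint32_t jal_return_address(uint32_t instruction_address)
{
    return instruction_address + 4;
}
```

Calling jump_target(0x00400064, 0x0c100020) gives the address of the next instruction to run. Because is_jal(0x0c100020) is true, $ra is updated with jal_return_address(0x00400064).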
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 12/63

Encoding of j and jal, compared to beq or bne

Attention: To encode a j or jal instruction, the address of
the jump target must be known.
This is unlike encoding beq or bne, in which it is enough to
know how many instructions to skip forward or
back—the exact address of the branch target does not have
to be known.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 13/63

Encoding of jr

bit number:   31-26    25-21    20-16   15-11   10-6    5-0
contents:     000000   (GPR)    00000   00000   00000   001000

Bits 31–26 and 5–0 together indicate that this is a jr
instruction.
The number of the GPR used is encoded in bits 25–21.
So, what is the machine code for jr $ra ?
(By the way, bits 20–6 don’t have any particular use in a jr
instruction.)
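
As a sketch, the jr layout can be packed in C like this. The helper name is hypothetical. Bits 20–6 are left as zeros, as in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Machine code for jr: bits 31-26 are 000000, bits 25-21 hold
   the GPR number, bits 5-0 are 001000. */
uint32_t encode_jr(uint32_t gpr_number)
{
    assert(gpr_number < 32);
    return (gpr_number << 21) | 0x08u;
}
```

Since $ra is GPR 31, encode_jr(31) gives the machine code for jr $ra.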
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 15/63

Building and running executables starting from C
source code
Here we consider real programming environments such as
Linux, Mac OS X, and Windows, not simulated environments
like MARS.
Examples are focused on C, but are quite applicable to C++ as
well. Java works quite differently—we won’t look at Java in
ENCM 369.
Textbook reference: Section 6.6. However, note this
warning: Section 6.6 oversimplifies things. Contrary to the
example on page 339, an assembler won’t be able to decide on
absolute addresses of procedures and global variables. (Don’t
worry if this warning doesn’t make sense the first time you see
it—it should make more sense once we get to slide 41 or so.)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 16/63

Building and running C programs: a note about
the word compiler

Definition One: A compiler is a program that translates
high-level language to assembly language.
Definition Two: A compiler is a package of programs and
other files that can be used to develop software. Example:
“Apple’s Xcode is the compiler used for development of
iPhone apps.” (Such a package includes a compiler, in the
Definition One sense of that term.)
In ENCM 369, Definition One applies when we use the word
compiler.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 17/63

Building and running C programs: tools and the
toolchain

Tools are programs that read input files and write output files.

Preprocessor: tool to convert C code into rearranged or edited
C code.

Compiler: tool to translate C code into assembly language.


ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 18/63

Building and running C programs: tools and the
toolchain, continued

Assembler: tool to create an object file using assembly
language input. (Wait, what does “object file” mean?)
Linker: tool to combine object files and information from
library files to make an executable file. (Wait, what are the
meanings of “library files” and “executable file”?)

Together, preprocessor, compiler, assembler, and linker make a
chain of tools.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 19/63

Building and running C programs: many kinds of
files

C source code files: .c and .h files written by a programmer as
part of a specific program.

Library header files: .h files provided to give information about
types and functions defined in the library. (Example file:
stdio.h. Example types and functions: FILE, fopen,
fprintf, printf.)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 20/63

Building and running C programs: many kinds of
files, continued

Translation unit: File that is output from the preprocessor,
which modifies C code according to directives like #include,
#define, #ifdef, etc.
When you use gcc on Linux, translation units (.i files) usually
don’t get saved permanently. The same is true with Cygwin64
on Windows and for translation units produced in most other
C and C++ development systems. So you’ve probably never
noticed them.
Preprocessor inputs: C source files

    /* file foo.h */
    int foo(int arg);
    #define BAR 42

    /* file foo.c */
    #include "foo.h"
    int quux = 77;
    int foo(int arg)
    {
      quux++;
      return arg + BAR;
    }

Preprocessor output: a translation unit

    int foo(int arg);

    int quux = 77;
    int foo(int arg)
    {
      quux++;
      return arg + 42 ;
    }

Note: Some translation unit contents omitted to keep the
slide simple.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 22/63

Building and running C programs: many kinds of
files, continued again

Assembly language file: File that is output from the compiler.
Remember, the compiler is the tool that translates C code into
assembly language.
With gcc on Linux or Cygwin, assembly language files
(.s files), like translation units, usually don’t get saved
permanently, so you’ve probably never noticed them either.
Compiler input: a translation unit

    int foo(int arg);

    int quux = 77;
    int foo(int arg)
    {
      quux++;
      return arg + 42 ;
    }

Note: Some translation unit contents omitted to keep the
slide simple.

Compiler output: an assembly language file

            .data
            .globl quux
    quux:   .word 77
            .text
            .globl foo
    foo:
            la   $t0, quux
            lw   $t1, ($t0)
            addi $t2, $t1, 1
            sw   $t2, ($t0)
            addi $v0, $a0, 42
            jr   $ra
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 24/63

Building and running C programs: many kinds of
files, continued further

Source code, library headers, translation units, and assembly
language files are all text files: sequences of ASCII codes,
organized in lines, editable with a text editor.

Files yet to be seen (object files, library machine code files,
executable files) are binary files, full mostly of machine code
for instructions and base two representations of numbers.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 25/63

Building and running C programs: many kinds of
files, on and on . . .

An object file is output from the assembler.

Let’s make a sketch of typical object file organization.


Assembler input: an assembly language file

            .data
            .globl quux
    quux:   .word 77
            .text
            .globl foo
    foo:
            la   $t0, quux
            lw   $t1, ($t0)
            addi $t2, $t1, 1
            sw   $t2, ($t0)
            addi $v0, $a0, 42
            jr   $ra

Assembler output: an object file, organized as

    header
    text segment: machine code for instructions of procedure foo
    data segment: 32-bit base two representation of 77
    relocation info and symbol table
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 27/63

Building and running C programs: many kinds of
files, second last slide . . .

Library machine code files contain instructions and static data
belonging to library procedures such as printf, fopen,
fprintf, strcpy, and thousands more.
These files occupy many megabytes or gigabytes in the file
system of a modern OS.
Chunks of machine code and data can be copied out of these
files as needed when building executable files.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 28/63

Building and running C programs: many kinds of
files, finally done!

An executable file is created by the linker, the last tool in the
chain.
The linker combines
- one or more object files;
- information from the library.

Suppose an executable is built from object files alpha.o and
beta.o. Let’s sketch the organization of the executable file.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 29/63

Linux, Cygwin64, and other platforms

These slides were originally written under the assumption that
the platform used to build and run programs would be Linux.
Building and running programs with Cygwin64 on top of
Microsoft Windows is very similar to doing the same things on
Linux. So I’ve edited all the slides in this slide set to refer to
Cygwin64 instead of Linux.
Please keep in mind that all of the ideas here apply not only to
Cygwin64 but also to important OSes such as Linux, FreeBSD
and OpenBSD.
(The ideas also apply to Mac OS X, for the most part, for now,
but might not work so well with future Mac OS X versions.)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 30/63

Running an executable

Suppose you have an executable file called a.exe in the
current directory of your Cygwin64 terminal window.
Suppose you type and enter the command ./a.exe
Cygwin64 copies the text and data segments from a.exe into
main memory. (That’s an oversimplified model—a system
called demand paging reduces the amount of copying needed
to get a program started, but I don’t want to get into the
complex details of that here.)
Cygwin64 starts the program by making the PC point to the
start of the in-memory text segment.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 31/63

Running an executable, not just in a terminal
window

These days, users of desktops, laptops, tablets, and
smartphones usually launch programs via some kind of “Start”
menu, or by clicking or double-clicking or tapping a program
icon or file icon.
At the operating system level, that’s the same as entering a
command line: text and data segments get copied from an
executable file to memory, then the PC gets pointed to the
start of the in-memory text segment.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 32/63

Review of the toolchain: running gcc

Example command: gcc aa.c bb.c


Students might say, informally, that this is “running the
compiler.”
Better-educated students would say that this is “running a
bunch of programs that are tools in the toolchain.”
What sequence of programs gets run?
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 34/63

Linking and Libraries

Topics in the upcoming slides:
- What the linker does.
- How symbol tables and relocation information in object
  files help the linker.
- Comparison of static linking with dynamic linking.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 35/63

Review: object files and executable files

Contents of an object file: machine code (in text segment),
initial values for static data (in data segment), symbol table,
relocation info.
An object file is NOT a runnable program!

The linker makes an executable file by combining one or
more object files with instructions and data from library files.
An executable file IS a runnable program!
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 36/63

Symbol tables, relocation information, and linking

Symbol tables and relocation information are sections within
object files; the roles of these sections haven’t yet been
explained.
These two sections play key roles in helping the linker to
insert important pieces of addresses into instructions in
executable files.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 37/63

The symbol table

The symbol table is a part of an object file that lists .globl
symbols from the A.L. file used by the assembler to generate
the object file.
In this context, symbol means the same thing as label.
For each symbol coming from a text segment, the table gives
the offset of that symbol relative to the start of the text
segment in the object file.
Similarly, for each symbol coming from a data segment, the
table gives the offset of that symbol relative to the start of the
data segment in the object file.
# A.L. example to help explain symbol tables and relocation info.
        .text
        .globl foo
foo:
        [. . . some instructions omitted to save space on slide . . . ]
        jal quux
        [. . . more instructions omitted to save space on slide . . . ]
        jal bar
        [. . . more instructions omitted to save space on slide . . . ]
        jr $ra
        .data
        .globl my_array
my_array: .word 0x100, 0x200, 0x300
        .globl my_int
my_int: .word 0x9999
        .text
        .globl bar
bar:
        [. . . some instructions omitted to save space on slide . . . ]
        la my_int
        [. . . more instructions omitted to save space on slide . . . ]
        jr $ra
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 39/63

Attention, regarding the previous slide

That is obviously not a complete A.L. program: procedures
main, quux and possibly other important things were not
defined there.

So to make an executable file, the linker will have to combine
the object file that comes from that A.L. file with one or more
other object files.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 40/63

Symbol table example

For the A.L. file from two slides back, let’s assume . . .
- foo has 18 instructions in total;
- jal quux is the 5th instruction in foo;
- jal bar is the 11th instruction in foo;
- la my_int generates lui and ori instructions that will
  be the 5th and 6th instructions of bar.
Let’s sketch out the text segment, data segment, and symbol
table that will appear in the object file.
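
One possible sketch of that symbol table, written as a C data structure. The struct, enum, and function names are invented for illustration and are not a real object file format. The sketch also assumes that foo starts at offset 0 of the text segment, that bar follows foo immediately, and that my_array starts at offset 0 of the data segment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum segment { SEG_TEXT, SEG_DATA };

/* One symbol table entry: a .globl label plus its offset, in
   bytes, from the start of its segment in this object file. */
struct symbol {
    const char *name;
    enum segment segment;
    uint32_t offset;
};

static const struct symbol example_symbols[] = {
    { "foo",      SEG_TEXT, 0      },
    { "bar",      SEG_TEXT, 18 * 4 },  /* after foo's 18 instructions */
    { "my_array", SEG_DATA, 0      },
    { "my_int",   SEG_DATA, 3 * 4  },  /* after the 3 words of my_array */
};

/* Look up a symbol by name. Returns NULL for symbols this object
   file does not define, such as quux. */
const struct symbol *find_symbol(const char *name)
{
    size_t n = sizeof example_symbols / sizeof example_symbols[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(example_symbols[i].name, name) == 0)
            return &example_symbols[i];
    return NULL;
}
```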
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 41/63

Problems solved by relocation information, Part 1

Consider jal quux in our A.L. example.


The function quux is not defined in the given A.L. file, so it
must be defined in some other file.
The assembler is supposed to encode the jal instruction in
the text segment of the object file, but can’t because the
assembler doesn’t know what address information to put into
bits 25–0 of the machine code for the instruction.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 42/63

Problems solved by relocation information, Part 2

Now consider jal bar and la my_int in our A.L. example.


Does the assembler have enough information to completely
encode the jal instruction, and the lui and ori instructions
that will be needed for the la pseudoinstruction?
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 43/63

Solution to problems: role of assembler

The assembler partially encodes jal (and j) instructions, but
just puts zero bits in the address fields, and makes notes in the
relocation info about what must be fixed.
The assembler partially encodes lui and ori instructions, but
just puts zero bits in the 16-bit constant fields, and makes
notes in relocation info about what must be fixed.
Let’s sketch the relocation info for our A.L. example.
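
A possible sketch of the relocation info, as a C table. The type and field names are hypothetical, not a real object file format. Offsets use the earlier assumptions (foo has 18 instructions, jal quux is 5th, jal bar is 11th, lui and ori are 5th and 6th in bar), and assume bar directly follows foo in the text segment.

```c
#include <stddef.h>
#include <stdint.h>

/* What kind of fix the linker must apply. */
enum reloc_kind {
    RELOC_JUMP_26,   /* fill the 26-bit address field of j or jal */
    RELOC_HI_16,     /* fill lui's constant with upper 16 bits    */
    RELOC_LO_16      /* fill ori's constant with lower 16 bits    */
};

/* One note left by the assembler for the linker. */
struct relocation {
    uint32_t text_offset;    /* where the instruction is, in bytes */
    enum reloc_kind kind;
    const char *symbol;      /* whose address is needed */
};

static const struct relocation example_relocs[] = {
    { 4 * 4,          RELOC_JUMP_26, "quux"   },  /* jal quux */
    { 10 * 4,         RELOC_JUMP_26, "bar"    },  /* jal bar  */
    { 18 * 4 + 4 * 4, RELOC_HI_16,   "my_int" },  /* lui      */
    { 18 * 4 + 5 * 4, RELOC_LO_16,   "my_int" },  /* ori      */
};

/* Find the note, if any, for the instruction at text_offset. */
const struct relocation *find_relocation(uint32_t text_offset)
{
    size_t n = sizeof example_relocs / sizeof example_relocs[0];
    for (size_t i = 0; i < n; i++)
        if (example_relocs[i].text_offset == text_offset)
            return &example_relocs[i];
    return NULL;
}
```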
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 44/63

Solution to problems: role of linker


The linker must combine multiple text segments and multiple
data segments to build a single text segment and a single data
segment for the executable file.
The linker—unlike the assembler—knows what base
addresses (for example, 0x0040_0000 and 0x1001_0000) are
expected by the operating system for .text and .data
segments.
So the linker can compute the addresses for all instructions
and data items.
The linker uses symbol tables and relocation information to
insert correct pieces of addresses into instructions such as jal,
j, lui, and ori.
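
The patching step might look roughly like this in C. All function names are hypothetical. The sketch assumes the partially encoded instruction has zero bits where the address pieces go.

```c
#include <assert.h>
#include <stdint.h>

/* Final address of a symbol: segment base chosen for the
   executable, plus where this object file's segment was placed
   within it, plus the symbol's offset from the symbol table. */
uint32_t final_address(uint32_t segment_base, uint32_t placement_offset,
                       uint32_t symbol_offset)
{
    return segment_base + placement_offset + symbol_offset;
}

/* Insert bits 27-2 of the target into a j or jal instruction. */
uint32_t patch_jump(uint32_t instruction, uint32_t target_address)
{
    assert(target_address % 4 == 0);
    return (instruction & 0xFC000000u)
         | ((target_address >> 2) & 0x03FFFFFFu);
}

/* Insert the upper 16 bits of an address into a lui. */
uint32_t patch_lui(uint32_t instruction, uint32_t address)
{
    return (instruction & 0xFFFF0000u) | (address >> 16);
}

/* Insert the lower 16 bits of an address into an ori. */
uint32_t patch_ori(uint32_t instruction, uint32_t address)
{
    return (instruction & 0xFFFF0000u) | (address & 0xFFFFu);
}
```

For example, if a symbol 72 bytes into the first text segment ends up at final_address(0x00400000, 0, 72), a jal with an empty address field (0x0c00_0000) can be completed with patch_jump.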
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 45/63

Example of a common error seen by gcc users

A file called joe.c . . .


void foo(void);
int main(void)
{
  foo();
  return 0;
}

Will the command gcc joe.c succeed? Why or why not?


ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 46/63

Error message from command gcc joe.c run on a
Cygwin64 system . . .
/tmp/ccYL2A05.o:joe.c:(.text+0xe): undefined
reference to ‘foo’
/tmp/ccYL2A05.o:joe.c:(.text+0xe): relocation
truncated to fit: R_X86_64_PC32 against undefined
symbol ‘foo’
collect2: error: ld returned 1 exit status
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 47/63

Explanation of a common gcc error

What happened?
- Did the compiler have a problem?
- Did the assembler have a problem?
- Did the linker have a problem?


ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 48/63

Static vs. Dynamic Linking

Slides so far have presented the process for creating a kind of
executable file called a statically-linked executable.
Most executable files on current Mac, Windows, or Linux
operating systems are a different kind, called
dynamically-linked. (But it’s still possible to create and run
statically-linked executables on these systems.)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 49/63

Contents of a statically-linked executable file
(review from earlier slides)

Executable file header (information about sizes and layouts of
other segments).
Text segment: start-up code; machine code from object files;
machine code for all necessary library procedures.
Data segment: initial values for static data from object files;
initial values for static data belonging to library procedures.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 50/63

Contents of a dynamically-linked executable file

Executable file header (information about sizes and layouts of
other segments).
Text segment: start-up code; machine code from object files.
Data segment: initial values for static data from object files.
Information about where to find library instructions and library
data in the file system.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 51/63

Running a dynamically-linked executable

Text and data segments get copied into memory from the
executable file (as is the case with a statically-linked
executable).
It is the operating system’s responsibility to make sure
library machine code and data are in memory when needed.
Often, library machine code is already in memory when a
program starts, because some other running programs also
need it—this is useful sharing of memory by multiple running
programs.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 52/63

Running a dynamically-linked executable:
example C code

Source code blastoff.c . . .
#include <stdio.h>
int main(void)
{
  int count;
  for (count = 10; count > 0; count--)
    printf("%d ... \n", count);
  printf("Blastoff!\n");
  return 0;
}
Command to build executable:
gcc blastoff.c -o blastoff
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 53/63

Running a dynamically-linked executable:
example, continued

What is in the executable file called blastoff?

What important and relevant information is NOT in the
executable file called blastoff?

What happens when blastoff is run?


ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 54/63

Static linking: Advantages

Compared to dynamic linking, static linking is easy to
understand and implement.

Installing software is relatively easy to manage: it may require
placing only one file, the executable, in the right place.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 55/63

Static linking: Disadvantages

Executables are big. A collection of statically-linked
executables contains many copies of the same library machine
code and data. This wastes space in the file system.
If many programs are running on a multi-tasking operating
system, many copies of the same library machine code may be
in memory. This wastes memory and hurts performance.
Installed executables can’t take advantage of library upgrades
such as bug fixes and performance improvements.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 56/63

Dynamic linking: Advantages

Essentially, all the main disadvantages of static linking are
overcome:
- Executables are smaller,
- less total memory is needed to run multiple executables at
  the same time,
- executables can benefit from library upgrades.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 57/63

Dynamic linking: Disadvantages

It’s harder to understand and implement than static linking.


Software installation can be complicated—are all the right
versions of library files for dynamic linking available to support
all the executables? (Failures in this area on older versions of
Microsoft Windows were called “DLL hell.”)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 58/63

Quick Tour of C library,
for Cygwin64 in Winter 2016

Cygwin64 is a complicated mess that mixes Linux-like stuff
with Microsoft Windows, so a “quick” tour is impossible.
(Previous years’ versions of these slides were able to say a few
useful things about where important library files were located
on Linux systems in ICT 320.)
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 60/63

Notes about some confusing words

Computer programming terminology and jargon have evolved
(or have just accumulated chaotically) over the past several
decades.
Some of the choices that were made were not very helpful to
students trying to learn about computers and programming!
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 61/63

Notes about some confusing words: text

The text segment, where the instructions go in an object file
or executable file, is not related to the concept of text as a
sequence of character codes in ASCII or some other character
set!
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 62/63

Notes about some confusing words: object

The term object file has nothing to do with the concept of
an object in object-oriented programming in languages such as
C++, Java, Objective-C, Python, etc.
ENCM 369 Winter 2017 Slide Set 4 for Lecture Section 01 slide 63/63

Notes about some confusing words: link

The word link in the MIPS jump-and-link instruction means,
“leave a return address to allow return from a callee
procedure.”

The word link in the term linker means, “connect together
procedures and data from one or more object files to make an
executable program.”

So there are two different meanings for link.
