Chapter 3 Lifting The Hood

RCA COSMAC ELF story.

Addresses and locations as embodied on a 'random access' memory chip.

Memory Access Time


Memory chips are graded by how long it takes data to appear on the data pins after you
apply an address (the address of the data you want) to the address pins. To become an
expert assembly language programmer, you need to learn how to shave off memory accesses
that your program performs. Abrash has published books on doing this in the context of
graphics programming; the essence of his advice is: stay out of memory whenever
you can.
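
A tiny sketch of that advice in NASM-style assembly (the notation used for examples in
these notes; total, a, and b are hypothetical double-word variables): touch memory once
on the way in and once on the way out, and do the work in a register.

    ; Slower: three separate round trips through memory for one running total
    mov eax, [total]
    add eax, [a]
    mov [total], eax
    mov eax, [total]
    add eax, [b]
    mov [total], eax

    ; Faster: load once, work in the register, store once
    mov eax, [total]
    add eax, [a]
    add eax, [b]
    mov [total], eax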

Bytes, Words, Double Words, Quad Words


Understanding how the computer gathers its memory chips together to form a coherent
memory *system* is critical when you wish to write efficient assembly programs.
There are many ways to hook memory chips together. The system described here is
the one used by standard Intel-based PCs.

Our memory system must store information. How we organize a memory system
depends on how we organize our information.

The answer begins with the 'byte'. From a functional perspective, memory is measured
in bytes.
A byte is 8 bits.
Two bytes side by side are a word (16 bits).
Two words side by side are a double word (32 bits).
Four words side by side are a quad word (64 bits).

In 2005, most computers process information one double word (32 bits) at a time;
some process quad words (64 bits) at a time.
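
In NASM-style notation, each of these sizes has its own data-definition directive;
a sketch with hypothetical labels:

    section .data
    abyte   db  0x41                    ; byte: 8 bits
    aword   dw  0x4142                  ; word: 16 bits (2 bytes)
    adword  dd  0x41424344              ; double word: 32 bits (4 bytes)
    aqword  dq  0x4142434445464748      ; quad word: 64 bits (8 bytes)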

Pretty Chips All In A Row

A single memory chip may contain half a billion storage locations, yet not a single
complete byte of information, something that puzzles beginners: the bits making up
each byte can be spread across several chips.

These are details about how addresses and data are realized at the hardware level. This
organization does not affect how your programs work: when a program accesses a byte
at a particular address, the computer takes care of fetching it, irrespective of
the actual hardware organization.

Computers can process 1, 2, 4, 8, 16, 32, or 64 bits at a time, but in reality
each byte has its own address. Analogy: each of the three volumes of The Lord Of
The Rings has its own ISBN, but they can be checked out together.

When a 32-bit computer is asked to retrieve the byte at a given address, it
actually reads 4 bytes starting at that address. You can use the 2nd, 3rd, and 4th
bytes, or ignore them.
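
A sketch of both facts in NASM notation (val is a hypothetical variable; note that
x86 stores the least significant byte at the lowest address):

    section .data
    val     dd  0x11223344      ; four bytes at four consecutive addresses

    section .text
    mov al,  [val]      ; fetch only the byte at val      (0x44: x86 is little-endian)
    mov ah,  [val+1]    ; fetch only the byte at val + 1  (0x33)
    mov eax, [val]      ; one 32-bit access fetches all four bytes: EAX = 0x11223344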

So far we have talked about reading from and writing to addresses, but ignored *who*
does all this. Almost always the answer is "the CPU".
Analogy: you are the CEO, the CPU is the foreman. It does a lot of the work itself, but
also farms out work to the peripherals.

The CPU's most important job is to communicate with memory, and like the memory chips
it has a number of pins devoted to memory addresses. When the CPU needs to read the
value at an address, it places the address (encoded in binary) on these pins, and a
few nanoseconds later the requested byte (or word, double word, quad word, etc.)
appears on the data pins of the memory chips, then on the data pins of the CPU
itself.

The reverse process: the CPU wants to write a value to a particular memory
location. It first places the memory address on its address pins. Some
nanoseconds later it places the value on its data pins. The memory system then
stores the value at the specified address.
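
From the program's point of view, that whole pin-level conversation collapses into
ordinary instructions; a sketch, with somevar a hypothetical double-word variable:

    mov eax, [somevar]  ; read: the CPU puts somevar's address on its address pins,
                        ; and the value arrives on the data pins a few ns later
    add eax, 1          ; work on the value inside the CPU
    mov [somevar], eax  ; write: address on the address pins, then the new value
                        ; on the data pins; the memory system stores it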

This give and take between the CPU and memory represents the bulk of what happens
inside your computer. But there are other information paths: your computer contains
peripherals that are sources of information, destinations for it, or both. E.g.: video
display boards, disk drives, USB ports. Most peripherals consist of one or two large
chips and some supporting chips, with address and data pins of their own. Some
peripherals, graphics boards in particular, have their own memory chips.

The electrical 'lines' across which these peripherals talk to the CPU and each
other are collectively known as the "data bus". An elaborate arbitration system
decides when, and in what order, devices talk to each other. A transaction happens as
above: an address is placed on the bus, followed by data. Special signals are put on
the bus to indicate whether the address refers to a memory location or to a peripheral
attached to the bus. The address of a peripheral is called an I/O address to
distinguish it from a memory address.
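
A sketch of what talking to an I/O address looks like from the CPU's side, using the
x86 IN and OUT instructions (3F8h is the classic I/O address of the COM1 serial port
base; on a modern protected-mode OS, user programs aren't normally allowed to do this
directly):

    mov dx, 3F8h    ; an I/O address, not a memory address
    in  al, dx      ; read a byte from the peripheral at that I/O address...
    out dx, al      ; ...or write one back; this traffic bypasses memory entirely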

The Foreman's Pockets


The CPU contains a very few data storage cubbyholes called (KEY) registers. These
are, by analogy, the foreman's pockets *and* the foreman's workbench. When the CPU
needs to store a value for a short time, it puts the value into a register. The CPU
*could* use memory to store this value, but that takes much more time than
using a register. Putting values into and retrieving values from registers is
*fast*.
(the foreman's workbench -->) To add two numbers, the fastest way is to put the
numbers into two registers and add the registers together. The sum can replace the
value in either register, go into a third register, be stored to memory, etc.

Registers are connected directly to one another, so there is little data movement when
using them. Unlike memory locations, which have numerical addresses, CPU registers have
distinct names, e.g. EDI, EAX. Some registers have special properties not shared by
other registers.
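
The workbench in action, as a sketch (sum is a hypothetical double-word variable in
memory):

    mov eax, 5          ; put the first number into a register (a pocket)
    mov ebx, 7          ; put the second number into another register
    add eax, ebx        ; add on the workbench: EAX becomes 12, EBX is unchanged
    mov [sum], eax      ; only if needed, pay the memory cost to store the result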

If the CPU is the assembly line foreman, the peripherals are the assembly line workers,
and the data bus is the assembly line itself. Unlike most assembly lines, the foreman
works much harder than the line workers!

Who tells the foreman and crew what to do? *You* do, by writing a program. The
program, like all data, is stored in memory. In other words, the program *is* data;
this is the beauty of programming.

The Box That Follows A Plan

The Essence of the Program: the nature of programs, and how they direct the CPU to
control the computer and get your work done.
We have seen how memory can be used to store bytes of information. These bytes are
all binary codes, patterns of 0s and 1s stored as minute electrical impulses. These
binary patterns can be interpreted as symbols, numbers, punctuation, etc.

Some binary patterns mean something to the CPU: these are "machine instructions",
which are instructions to the CPU. Each such binary number tells the CPU *what* to do;
the CPU already knows *how* to do these tasks. When the CPU is executing a program, it
picks a sequence of these numbers off the data bus, one at a time, and executes each in
turn. It continues doing so until something - a program instruction, a reset button -
tells it to stop.

E.g. (from IA-32): 40H == 0100 0000 tells the processor "add 1 to the value in
register EAX and put the result back in EAX".
(step) Most machine instructions take more than one byte; they can be 2, 3, 4, or more
bytes in length. For example, the two bytes B6H 73H say "load the value 73H into
register DH", etc.
The "what to do" can be very complicated for a specific instruction.

IA-32 has several hundred instructions:

- some perform arithmetic operations: add, subtract, multiply, divide, etc.
- some perform logical operations: AND, OR, NOT
- some move data around in memory
- some "steer" the path that program execution takes (the JMP family of instructions)
- some perform highly arcane functions that don't turn up outside operating systems

For now the important point is this: each instruction tells the CPU to perform one
limited task. (One instruction from each of the first four categories above is
sketched below.)
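
One-line sketches, one per category from the list above (dest and somewhere are
hypothetical labels):

    add eax, ebx        ; arithmetic: EAX = EAX + EBX
    and eax, 0FFh       ; logical: AND keeps only the low 8 bits of EAX
    mov [dest], eax     ; data movement: copy the value in EAX into memory
    jmp somewhere       ; steering: continue execution at the label somewhere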

(KEY) Many instructions handed to the CPU in sequence instruct it to perform more
complicated tasks. Writing that sequence of instructions is what assembly language
programming is.

Fetch And Execute


(KEY) A computer program is nothing more than a sequence of these machine-language
instructions stored in memory. (KEY) There is nothing special about this table of
numbers, or about where it is stored in memory. It can be stored almost anywhere, and
the bytes in the table are nothing more than binary numbers.

The binary numbers that comprise the program are special (KEY) only in the way that
the CPU treats them.

When a modern 32-bit CPU starts running, it fetches two words (so 32 bits, a double
word) from an agreed-upon address in memory. (How this starting address is agreed upon
doesn't matter for now.) This double word is treated as a machine instruction (or the
start of one), and the CPU performs the task it indicates. As soon as it finishes
executing that task, the CPU fetches the next double word from memory.
Inside the CPU there is a special register called the instruction pointer that
*literally* contains the address of the next instruction to be fetched from memory
and executed. Each time an instruction finishes executing, the instruction pointer
is updated to point to the next instruction in memory. (In practice there is some
special magic inside CPUs that "guesses" what will be fetched next and keeps it on a
side shelf - the cache - so that if that instruction is needed next, it can be fetched
much more quickly than from memory.)
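
A sketch of the instruction pointer at work; the addresses are hypothetical, but the
encodings and instruction lengths are real IA-32 ones:

    ; address    encoding        instruction    EIP after the fetch
    ; 08048060   B8 05 00 00 00  mov eax, 5     ; -> 08048065 (5-byte instruction)
    ; 08048065   40              inc eax        ; -> 08048066 (1-byte instruction)
    ; 08048066   01 D8           add eax, ebx   ; -> 08048068 (2-byte instruction)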

The ancient 8088-based machines, such as the original IBM PC, had only an 8-bit data
bus and fetched one byte at a time, rather than 4 bytes at once like a later 32-bit
Pentium, so the 8088 CPU had to return to memory to fetch a second (and then possibly
a third or fourth) byte to complete an instruction before executing it.
The computer has a subsystem called the "system clock", an oscillator that emits
square-wave pulses at very precise intervals. The immense number of microscopic
transistors inside the CPU coordinate their actions according to the pulses emitted
by the clock. In the past, it often took multiple clock cycles to execute each
instruction; as computers got faster, the majority of instructions came to complete
within one clock cycle. Modern CPUs go further and can execute multiple instructions
in parallel.

So the processor repetitively performs (KEY) fetch and execute, fetch and execute.
The CPU works its way through memory (where the instructions reside), led by the
instruction pointer, and does work by executing the instructions it fetches: moving
data around in memory, moving values around in registers, passing data to
peripherals, crunching data in arithmetic or logical operations.

The Foreman's Innards


Machine instructions are *binary* codes. To really understand the true nature of
the CPU, we have to step away from the view of machine instructions as *numbers*.
They are *not* numbers, but binary patterns designed to throw electrical
switches.

Inside the CPU are a very large number of transistors. (Some prose here, but the
basic idea ==) each binary pattern throws some number of 'switches' (8, 16, 32, etc.)
on or off, which possibly triggers other switches, and so on.

Changing Course

1st bit of magic == a string of binary numbers in memory tells the computer what to
do.
2nd bit of magic == the jewel in the crown == there are instructions that (KEY) change
the order in which machine instructions are fetched and executed (versus just
traversing linearly through instructions laid out in sequence in memory).

In other words, once the CPU has finished executing an instruction, the next
instruction might tell it to go back and "play it again" (by setting the instruction
pointer to an already-executed instruction's address, rather than letting it advance
to the following instruction by default), and again, as many times as the programmer
deems necessary. The CPU can keep track of how many times an instruction (or a
sequence of instructions) has executed and keep repeating it until a specific count
has been reached.
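
A sketch of exactly that: count down in a register and jump back until the count is
done (again is a hypothetical label):

    mov ecx, 10     ; the repetition count goes in a register
    again:
        ; ... the work to be repeated goes here ...
        dec ecx     ; count off one repetition
        jnz again   ; "play it again": jump back until ECX reaches zero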

What this means is that the list of instructions composing the program need not begin
at the top and run straight to the bottom. Execution can move through the instructions
(or jump to other instructions located elsewhere in memory) in any pattern: top to
bottom, bottom to top, loops, zigzags... The mechanism for this is manipulating the
'address of the next instruction to execute' value in the instruction pointer.

(step) In addition, there is a set of special one-bit registers called flags.
These can also be used (in addition to values in registers and memory) to make
decisions about which instruction to execute next.

In Chapter 1, a program was said to be a sequence of steps *and tests*. For the
tests, the test is always two-way, and the choice is always jump or don't jump.
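
A sketch of one such test: CMP performs the test by setting the flags, and a
conditional jump makes the two-way choice (values_match is a hypothetical label):

    cmp eax, ebx        ; the test: compare two values; the result lands in the flags
    je  values_match    ; the choice: jump if the zero flag says they were equal...
                        ; ...or don't jump, and fall through to the next instruction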

What vs How: Architecture vs Microarchitecture

A CPU 'implementation' is divided cleanly into two parts: what it does, and how it
does what it does.
The programmer's view (from the outside) consists of:
- the set of CPU registers
- the set of instructions the CPU understands
- special-purpose subsystems, such as fast math coprocessors, which can have
registers and special instructions of their own

All of this is documented by Intel. Together, all these definitions are called the
*architecture* of the chip. CPU architectures evolve over time, often adding
registers, ideally with backward compatibility, so that old instructions (and
programs based on them) keep working on the new processor.

1986: 16 bits --> 32 bits. Several instructions and operational modes were added, and
the width of the CPU doubled.
In 2003, the x86 architecture expanded again, again with new instructions, modes of
operation, and expanded registers. These chips still run 32-bit programs.

In addition to periodic additions to the instruction set, architectures occasionally
make quantum leaps, typically involving a change in the "width" of the CPU.

With minor glitches, the 64-bit x86 architecture includes the IA-32 architecture; the
latter is the one used in this book.

Because of backward compatibility issues, CPU designers don't add new registers,
instructions, etc. without very good reason. Better ways to improve a processor
family are to increase processor throughput (the number of instructions executed per
unit time) and to decrease power consumption.

A lot of arcane tricks are associated with increasing throughput, with names like:
- prefetching
- L1 and L2 caches
- branch prediction
- hyperpipelining
- macro-op fusion
and plenty of others. Some of these techniques reduce or eliminate bottlenecks within
the CPU so that the CPU and the memory system can remain maximally busy. Other
techniques stretch the ability of the processor to execute more than one instruction
at the same time. Taken together, all the electrical mechanisms governing *how* the
CPU does what its instructions tell it to do are called the microarchitecture of the
CPU.

Analogy: microarchitecture == the machinery in the basement that you can't see.

Analogy: you produce components for a Ford model car at two factories, one old,
one new. The components *produced* are identical, to very tight tolerances. Ford
doesn't care which factory the parts come from, but the newer factory incorporates
all the lessons you learned from operating the old one: it has a more logical layout,
better lighting, and modern automated tooling that requires fewer people and runs
longer without a reset. A day will come when you build a third factory incorporating
lessons from the second, and close the first down. The tooling, the assembly line
layouts, and the general structure of the plant can be considered the
microarchitecture of the plant.

Exotic code names - Conroe, Katmai, Yonah - usually indicate tweaks or minor changes
in microarchitecture. Major changes have code names too, like P6, NetBurst, Core. As
a programmer, you can ignore most of this: it is extremely rare that a change in
microarchitecture gives you an exploitable advantage in how you create programs.
For now, treat microarchitecture as a mystery.

Enter The Plant Manager (_ the operating system!)

Operating Systems - The Corner Office

An operating system is a *program that manages the operation of a computer system*.
It is like any other program in that it consists of steps and tests, a sequence of
instructions executed by the CPU, but this program has special powers not granted to
user programs like word processors and spreadsheets.

Continuing the metaphor of the CPU as the shop foreman, the operating system is the
plant manager, with the entire physical plant under its control. It brings material
into the plant, supervises all work done in the plant (including the work done by the
foreman), and packages up the results of the work for shipment to customers.

In the early days of computing, operating systems didn't do much. They 'spun the
disks', handling the storage of data on the disks and fetching data from the disks as
required.

The CP/M operating system was state of the art in 1979. You entered the name of a
program - WordStar, say - at the keyboard; CP/M loaded the program into memory and
handed over control of the whole machine to it. When WordStar ran, it overwrote CP/M
in memory; memory was so expensive that only one program could be resident at a time.
Afterwards, CP/M would be reloaded from the floppy disk and the computer would wait
for another command from the keyboard.

BIOS: Software, just not as soft.

So what brought CP/M back into memory when WordStar exited? WordStar *rebooted* the
computer. Every time a piece of software ran, CP/M went away; every time the user
software exited, it rebooted the machine and CP/M came back from the floppy disk.
CP/M was so small that this rebooting took less than two seconds.

Computers got faster, and memory got cheaper. DOS replaced CP/M. DOS didn't go away
when a user program was loaded on top of it. DOS wasn't much larger than CP/M but
could do a lot more. This was because IBM had taken the code that handled the
keyboard, the display, and the disk drives and burned it into a special chip called
read-only memory (ROM). Ordinary random access memory goes blank when the power is
turned off; ROM retains its data whether it has power or not. The software in the ROM
was called the Basic Input/Output System (BIOS). Those thousands of machine
instructions didn't need to be loaded from disk every time, because they were always
there in a chip soldered to the motherboard.

Somewhere along the way, software stored on ROM chips was named firmware. All modern
computers have a BIOS, though it does different things now than it did in 1981.

Multitasking Magic

In 1995, Windows 95 launched. It had a brand-new graphical user interface, but
something more radical in the basement: it required an 80386 processor and operated
in 32-bit "protected mode" (to be explained later).
For now, think of protected mode as what enables the operating system to be the
boss, no longer a peer of the word processors and spreadsheets. Win95 did not make
full use of protected mode because it still had DOS, and DOS applications, to deal
with.

<some text about the concept of multiprocessing >

Promotion to Kernel

In 1991, Linus Torvalds released Linux. Linux did not have a fancy graphical
interface, but it could handle multitasking, and it had a powerful internal structure
called the kernel, which took full advantage of 80386 protected mode. The kernel was
entirely separated from the user interface and was protected from damage due to
malfunctioning programs elsewhere in the system. System memory was tagged as either
kernel space or user space, and nothing running in user space could write to (or, in
general, read from) kernel space memory. Communication between kernel space and user
space was handled through strictly controlled system calls (more on this later in the
book).
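
A minimal sketch of such a controlled crossing on 32-bit Linux, assuming the classic
int 80h system call mechanism: the program puts the system call number and its
argument into registers, then asks the kernel to take over (here, to end the program).

    section .text
    global _start
    _start:
        mov eax, 1      ; system call number 1 = sys_exit
        mov ebx, 0      ; the exit status, passed to the kernel in EBX
        int 80h         ; the controlled gate from user space into kernel space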

Direct access to physical hardware, including memory and peripherals, is limited to
software running in kernel space. Programs wishing to make use of system peripherals
can get access only through kernel-mode device drivers.

Windows NT had an internal structure like that of Linux, with a kernel and device
drivers running in kernel space and everything else running in user space. This basic
design is still in use.

From 2000 onward, computers with multiple processors were released.

Summing up the most potent metaphor for computing: the computer is a box that
follows a plan. You write the plan. The computer follows the plan by executing it
instruction by instruction.
