
Subsections

 Introduction
 The Architecture of Mic-1
o Registers
o Internal Buses
o External Buses
o Shifter
o Memory Address Register (MAR) and Memory Buffer Register (MBR)
o A-Multiplexer (AMUX)
o Memory
 The Fetch-Execute Cycle
 Microinstructions
 Instruction Set Model
 Memory-mapped Input-output
o Introduction
o Input from standard-input device
o Output to the standard-output device
 Microprogram Control versus Hardware Control
 CISC versus RISC
 Microprogram
o Microinstructions
o Microinstruction Format
o Microinstruction Timing
o MicroInstruction Sequencing
o The Microprogram
 Exercises

The Central Processing Unit (CPU)


Introduction
This chapter describes the structure of a computer from the point of view of Level 1 - the
microprogramming level. First we will give an overview of how a processor and
memory function together to execute a single machine instruction, and then a sequence
of machine instructions - a program.

We will then build up a real Central Processing Unit (CPU) from subsystems - ALU,
Registers etc. We will consider a CPU to consist of three primary parts:

1. The internal registers, the ALU and the connecting buses - sometimes called
   the data path;
2. The input-output interface, which is the gateway through which data are sent
   to and received from main memory and input-output devices. Quite often this
   interface is shown as part of the data path. No harm, as long as you understand
   its interface role;
3. The control part, which directs the activities of the data path and input-output
   interface, e.g. opening and closing access to buses, selecting ALU function, etc.

A fourth part, main memory, is never far from the CPU but from a logical point of
view is best kept separate.

We will pay most attention to the data path part of the processor, and what must
happen in it to cause useful things to happen - to cause program instructions to be
executed.

Finally, we will briefly describe how the control part can be implemented
by microprogram, i.e. how the decoding of a machine instruction can lead to the
execution of a set of sequencing steps called a microprogram.

Note on terminology: the term `microprogram' came about before microprocessors
were ever dreamt of. It refers to a program in which each instruction is concerned with
a very small event - like the sending of the contents of a register to the ALU.

Before you continue, you should review the material on multiplexers, decoders,
ALUs, registers, buses, etc. in the previous chapter.

The Architecture of Mic-1


Figure 5.1 shows the data path part of our hypothetical CPU from [Tanenbaum, 1990],
page 170 onwards.
  
Figure 5.1: Mic-1 CPU, data path
Here, we briefly describe the components of Figure 5.1. Then we give a qualitative
discussion of how it executes program instructions. Finally we describe the execution
of instructions in some detail.

Registers
There are 16 identical 16-bit registers. They are not general purpose, however; each has a
special use:

PC, program counter
The PC points to the memory location that holds the next instruction to be
executed;
AC, accumulator
The accumulator is like the display register in a calculator; most operations use
it implicitly as an unmentioned input, and the result of any operation is placed
in it. Macro-level (level 2) programmers do all arithmetic and logical
operations through it; e.g. ADD, you must put operand 1 in AC, operand 2 must
come from a memory location, and the result is put in AC. Of course, micro-
level (level 1) programmers can treat all these registers in a uniform way;
SP, stack-pointer
Ignore for now;
IR, Instruction Register
Holds the instruction (the actual instruction data) currently being executed;
TIR, Temporary Instruction Register
Holds temporary versions of the instruction while it is being decoded;
0, +1, -1
Constants; it is handy to have copies of them close by - avoids wasting time
accessing main memory;
AMASK
Another constant; used for masking (anding) the address part of the instruction,
i.e. AMASK and IR yields the address;
SMASK
Ditto for stack (relative) addresses;
A, B, ... F
General purpose registers; but general purpose only for the microprogrammer,
i.e. the assembly language cannot address them.

Internal Buses
There are three internal buses: the A and B (source) buses and the C (destination) bus.

External Buses
The address bus and the data bus.

Latches

The A and B latches hold stable versions of the A and B buses. There would be problems if,
for example, AC was connected straight into the A input of the ALU and, meanwhile,
the output of the ALU was connected to AC: which version of AC should be used? The
answer would be continuously changing.

ALU

As described in Figure 4.2.2 above. Recall that the ALU may perform any of four
functions:

0
A + B; note `plus', rather than or; (F1, F0) = (0, 0);
1
A and B (bitwise); (F1, F0) = (0, 1);
2
A straight through, B ignored; (F1, F0) = (1, 0);
3
not A (bitwise complement); (F1, F0) = (1, 1).

Any other functions have to be programmed.

Shifter
The shifter is not a register - it passes the ALU output straight through: shifted left,
shifted right or not shifted.

Memory Address Register (MAR) and Memory Buffer Register (MBR)
The MAR is a register which is used as a gateway - a `buffer' - onto the address bus.
Likewise the MBR (it might be better to call this memory data register) for the data
bus.
A-Multiplexer (AMUX)
The ALU input A can be fed with either:

1.
The contents of the A latch, or
2.
The contents of MBR, i.e. what was originally the contents of a memory
location.

Memory
The memory is considered to be a collection of cells or locations, each of which can
be addressed individually, and thus written to or read from. Effectively, memory is
like an array in Java or any other high-level language. For brevity, we shall refer to
this memory `array' as M and the address of a general cell as x and so, the contents of
the cell at address x as M[x], or m[x].

To read from a memory cell, the controller must cause the following to happen:

1. Put an address, x, in MAR;
2. Request a read - by asserting a read control line;
3. At some time later, the contents of x, M[x], appear in MBR;
4. From MBR, the controller can cause the data to be transferred to the ALU or
   somewhere else.

To write to a memory cell, the controller must cause something similar to happen:

1. Put an address, x, in MAR;
2. Put the data in MBR;
3. Request a write - by asserting a write control line;
4. At some time later, the data arrive in memory cell x.

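The read and write sequences above can be sketched in a few lines of Python. This is only an illustration: the class name ToyMemoryInterface, the memory size of 4096 words and the 16-bit data mask are assumptions made for the sketch, and the `some time later' delay is collapsed into a single step.

```python
class ToyMemoryInterface:
    """Toy model of main memory reached only through MAR and MBR (hypothetical names)."""

    def __init__(self, size=4096):
        self.M = [0] * size   # the memory 'array' M
        self.MAR = 0          # memory address register
        self.MBR = 0          # memory buffer register

    def read(self, x):
        self.MAR = x                   # 1. put the address in MAR
        self.MBR = self.M[self.MAR]    # 2-3. assert RD; M[x] appears in MBR
        return self.MBR                # 4. pass the data on to the ALU or elsewhere

    def write(self, x, data):
        self.MAR = x                   # 1. put the address in MAR
        self.MBR = data & 0xFFFF       # 2. put the (16-bit) data in MBR
        self.M[self.MAR] = self.MBR    # 3-4. assert WR; the data arrive in cell x


bus = ToyMemoryInterface()
bus.write(0x543, 0x4321)
value = bus.read(0x543)
```

After the write, a read of the same cell returns the value that was stored, and MAR still holds the last address used.
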
It is a feature of all general purpose computers that
executable instructions and data occupy the same memory space. Often, programs
are organised so that there are blocks of instructions and blocks of data. But, there is
no fundamental reason, except tidiness and efficiency, why instructions and data
cannot be mixed up together.

Register Transfer Language

To describe the details of operation of the CPU, we use a simple language called
Register Transfer Language (RTL). The notation is as follows.

M[x] denotes the contents of location x; sometimes written m[x], or even just [x]. Think of an envelope
with £100 in it, and your address on it.

Reg denotes a register; Reg = PC, IR, AC, R1 or R2.

[M[x]] denotes the contents of the address contained in M[x]. Think of an envelope
containing another envelope.

We use <- to denote transfer: A <- B. Pronounce this as `A gets B'. In the case
of A <- M[x], we say `A gets contents of x'.

Note: There is a world of difference between an address, 100, say, and data value
contained in that address.
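To make the difference between an address and its contents concrete, here is a small sketch that treats memory as a Python list. The cell numbers and values are made up for the illustration.

```python
M = [0] * 16      # a tiny memory 'array'
M[3] = 7          # cell 3 holds 7, which we will also treat as an address
M[7] = 100        # cell 7 holds the data value 100

x = 3                  # an address
direct = M[x]          # M[x]: the contents of location x
indirect = M[M[x]]     # [M[x]]: the contents of the address contained in M[x]

AC = M[x]              # RTL: AC <- M[x], 'AC gets contents of x'
```

Here x is the address 3, M[x] is the data value 7, and [M[x]] is the value 100 found by using that 7 as a further address: the envelope inside the envelope.
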

The Fetch-Execute Cycle


How do the CPU and its controller execute a sequence of instructions?

Let us start by considering the execution of the instruction at location 100Hex; what
follows is an endless loop of the so-called fetch-execute cycle:

Fetch:
read the next instruction and put it in the Instruction Register. Point to the next
instruction, ready for the next Fetch.
Execute:
decode and obey that instruction; if it is a JUMP type instruction, then revise
the PC to point to the jumped-to instruction. Goto Fetch.

Start off with PC = 100H - PC is the Program Counter, and is used to address the
instruction (data) to be fetched and executed.
Fetch

i.e. get the program instruction from memory.

F1. Load the contents of the program counter into the memory address register,
    which is then put on the address bus, i.e. MAR <- PC; MAR now holds 100;
F2. Assert RD (read) from memory; this causes the data in cell 100 to be put on the
    data bus;
F3. Instruct the memory buffer register (MBR) to read the data bus;
F4. Transfer the value in MBR into the Instruction Register, IR <- MBR;
F5. Point to the next instruction: PC <- PC + 1.

Execute

or, more precisely, decode and execute. For example, ADD register +1 to AC,
i.e. AC <- AC + (+1).

E1. Transfer the contents of +1 to the A input of the ALU, via the A bus; transfer
    the contents of AC to the B input of the ALU, via the B bus;
E2. Set the ALU function (F1, F0) to ADD. Instruct it to perform the operation; at
    some time later the result of the ADD will appear on the output of the ALU and
    hence the C bus;
E3. Transfer the data on the C bus into AC.
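The loop can be sketched as a toy interpreter. This is a minimal illustration under stated assumptions, not the Mic-1 microprogram: the class ToyCPU and the helper instr are hypothetical, only three instructions (LODD, ADDD, STOD) are modelled, and their 4-bit opcode values are assumed to follow Tanenbaum's Mac-1 numbering, with the address in the low 12 bits of the instruction word.

```python
# Assumed 4-bit opcodes in the top four bits of a 16-bit instruction word.
LODD, STOD, ADDD = 0x0, 0x1, 0x2
ADDRESS_MASK = 0x0FFF  # the low 12 bits hold the address


def instr(opcode, address):
    """Pack an opcode and a 12-bit address into one instruction word."""
    return (opcode << 12) | (address & ADDRESS_MASK)


class ToyCPU:
    def __init__(self, memory, start):
        self.M = memory
        self.PC = start
        self.AC = self.IR = self.MAR = self.MBR = 0

    def fetch(self):
        self.MAR = self.PC             # F1: MAR <- PC
        self.MBR = self.M[self.MAR]    # F2, F3: assert RD; MBR reads the data bus
        self.IR = self.MBR             # F4: IR <- MBR
        self.PC += 1                   # F5: PC <- PC + 1

    def execute(self):
        opcode = self.IR >> 12                 # decode: top four bits
        address = self.IR & ADDRESS_MASK       # decode: low twelve bits
        if opcode == LODD:
            self.AC = self.M[address]                       # AC <- M[x]
        elif opcode == ADDD:
            self.AC = (self.AC + self.M[address]) & 0xFFFF  # AC <- AC + M[x]
        elif opcode == STOD:
            self.M[address] = self.AC                       # M[x] <- AC
        else:
            raise ValueError(f"opcode {opcode:#x} is not modelled in this sketch")

    def step(self):
        self.fetch()
        self.execute()


memory = [0] * 4096
memory[0x100] = instr(LODD, 0x201)   # AC <- M[201H]
memory[0x101] = instr(ADDD, 0x202)   # AC <- AC + M[202H]
memory[0x102] = instr(STOD, 0x200)   # M[200H] <- AC
memory[0x201] = 5
memory[0x202] = 7

cpu = ToyCPU(memory, start=0x100)
for _ in range(3):
    cpu.step()
```

After three passes round the loop, cell 200H holds 5 + 7 = 12 and PC points at 103H, ready for the next fetch.
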

Microinstructions
In previous versions of this course, I used to go into complete detail of the
microprogram which controls the CPU - the microprogram implements the Fetch-
(Decode)-Execute cycle. This time I'm going to leave it out. If interested, see
section 5.9 appended to this chapter, or [Tanenbaum, 1990], section 4.2.2 onwards.
Instruction Set Model
We now examine the instruction set, by which Level 2 programmers can program the
machine; if in doubt, we call these macroinstructions.

Incidentally, you could program in microcode - but life is too short, just like life is too
short to program in assembly (macro) code, if you can do it in Java, C, or C++.

We will call the level 2 machine Mac-1a (Mac-1a is a restricted version of
Tanenbaum's Mac-1). The main characteristics of Mac-1a are: data word length 16 bits;
address size 12 bits.

Exercise: What is the maximum number of words we can have in the main memory of
Mac-1a? (neglect memory mapped input-output). How many bytes?

There are two addressing modes: immediate and direct; we will neglect
Tanenbaum's local and indirect for the meanwhile.

It is accumulator based: everything is done through AC; thus, `Add' is done as
follows: put operand 1 in AC, add the contents of a memory location, and the result is
put in AC.

The Mac-1a programmer has no access to the PC or other CPU registers. Also, for
present purposes, assume that SP does not exist.

A limited version of the Mac-1 instruction set is shown in Figure 5.2.

The columns are as follows:

Binary code for instruction.
I.e. what the instruction looks like in computer memory: machine code.
Mnemonic.
The name given to the instruction. Used when coding in assembly code.
Long name.
Descriptive name for instruction.
Action.
What the instruction actually does, described formally in register transfer
language (RTL).

  
Figure 5.2: Mac-1a Instruction Set (limited
version of Mac-1)

Memory-mapped Input-output
 

Introduction
As stated earlier, there are no direct instructions for input-output; instead Mac-1a
uses memory-mapped input-output, whereby some memory cells are mapped to
input-output ports; for simplicity we assume that there are only two ports, one
connected to a standard-input device, the other connected to a standard-output device:

- Input, mapped to 4092 (0FFCHex); status 4093 (0FFDHex).
- Output, mapped to 4094 (0FFEHex); status 4095 (0FFFHex).

We assume that each device works with bytes (i.e. 8-bits).

Input from standard-input device


A read from address 0FFCHex yields a 16-bit word, with the actual data byte in the
low-order byte. There is no use in reading the input port until the connected device
has put the data there, so 0FFDHex is used to read the input status register; the top bit
(sign) of 0FFDHex is set when the input data is available (DAV).
Thus, a read routine should go into a tight loop, continuously reading 0FFDHex, until
it goes negative; then 0FFCHex can be read to get the data. Reading 0FFCHex clears
0FFDHex again.

Output to the standard-output device


Output, to 0FFEHex, runs along the same lines as input. A write to 0FFEHex will send the
low-order byte to the standard-output device. The sign bit of 0FFFHex signifies that
the device is in a ready to receive (RDY) state; again there is no use writing data to
the output port until the device is ready to read it.
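The polling loops for both ports can be sketched as follows. The four addresses and the use of the sign bit come from the description above; everything else (the class ToyIOBus, its `delay' of a few polls before a device becomes ready, and the helpers read_byte and write_byte) is invented for the illustration, written in Python rather than Mac-1a assembly.

```python
from collections import deque

INPUT_DATA, INPUT_STATUS = 0x0FFC, 0x0FFD
OUTPUT_DATA, OUTPUT_STATUS = 0x0FFE, 0x0FFF
SIGN_BIT = 0x8000  # top bit of a 16-bit word


class ToyIOBus:
    """Hypothetical stand-in for the two devices behind the four mapped cells."""

    def __init__(self, pending_input, delay=2):
        self.pending = deque(pending_input)  # bytes the input device will deliver
        self.output = bytearray()            # bytes the output device has received
        self.delay = delay                   # polls before a device becomes ready
        self.in_wait = delay
        self.out_wait = delay

    def read(self, address):
        if address == INPUT_STATUS:
            if self.in_wait > 0:
                self.in_wait -= 1
                return 0
            return SIGN_BIT if self.pending else 0   # DAV
        if address == INPUT_DATA:
            self.in_wait = self.delay                # reading the data clears DAV
            return self.pending.popleft()            # data byte in the low-order byte
        if address == OUTPUT_STATUS:
            if self.out_wait > 0:
                self.out_wait -= 1
                return 0
            return SIGN_BIT                          # RDY
        raise ValueError(f"address {address:#06x} is not a readable port")

    def write(self, address, word):
        if address != OUTPUT_DATA:
            raise ValueError(f"address {address:#06x} is not a writable port")
        self.output.append(word & 0xFF)              # only the low-order byte is sent
        self.out_wait = self.delay


def read_byte(bus):
    # Tight loop on the status word until it goes negative (sign bit set).
    # Note: this spins forever if the device never delivers any data.
    while not (bus.read(INPUT_STATUS) & SIGN_BIT):
        pass
    return bus.read(INPUT_DATA) & 0xFF


def write_byte(bus, value):
    # Wait until the output device reports ready, then send the byte.
    while not (bus.read(OUTPUT_STATUS) & SIGN_BIT):
        pass
    bus.write(OUTPUT_DATA, value)


bus = ToyIOBus(b"Hi")
first = read_byte(bus)
second = read_byte(bus)
write_byte(bus, first)
write_byte(bus, second)
```

The two bytes read from the input port are echoed, in order, to the output port.
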

Microprogram Control versus Hardware Control
As discussed, control of the CPU - fetch, decode, execute - is done by a
microcontroller which obeys a program of microinstructions.

We might think of the microcontroller as a `black-box', such as that shown in
Figure 5.3, which has a set of inputs and a set of outputs - just
like any other circuit, ALU, multiplexer, etc. Therefore, instead of being
microprogrammed, it can be made from logic hardware.

  
Figure 5.3: Controller Black-box, either
Microcontroller or Logic
To design the circuit, all you have to do is prepare a truth-table (6 input columns - the
op-code (4 bits) plus N and Z - and 22 output columns), and generate the logic.

There is no reason why this hardware circuit could not decode an instruction in ONE
clock period, i.e. a lot faster than the microcode solution.

The microprogrammed solution allows arbitrarily complex instructions to be built up.
It may also be more flexible; for example, there were many machines that users could
microprogram themselves; and, there were computers which differed only by their
microcode, perhaps one optimised for execution of C programs, another for COBOL
programs.

On the other hand, if implemented on a chip, control store takes up a lot of chip space.
And, as you can see by examining [Tanenbaum, 1990], microcode interpretation may
be relatively slow -- and gets slower, the more instructions there are.

CISC versus RISC


Machines with large sets of complex (and perhaps slow) instructions (implemented
with microcode) are called CISC - complex instruction set computer.
Those with small sets of relatively simple instructions, probably implemented in logic,
are called RISC - reduced instruction set computer.

Most early machines - before about 1965 - were RISC. Then the fashion switched to
CISC. Now the fashion is switching back to RISC, albeit with some special go-
faster features that were not present on early RISC.

CISC machines are easier to program in machine and assembly code (see next
chapter), because they have a richer set of instructions. But, nowadays, fewer and fewer
programmers use assembly code, and compilers are becoming better. It comes down
to a tradeoff: complexity of `silicon' (microcode and CISC) or complexity of software
(highly efficient optimising compilers and RISC).

Microprogram
 

Microinstructions
To control the data path in Figure 5.1 we need 61 different signals:

- 16 to control the loading of the A bus from the scratchpad
- 16 to control the loading of the B bus from the scratchpad
- 16 to control the loading of the scratchpad from C bus
- 2 to clock A and B latches
- 2 to control ALU
- 2 to control shifter
- 4 to control MAR and MBR
- 2 to indicate memory read and memory write
- 1 to control AMUX

If we have decoded a macro-instruction and know the values of the 61 signals then it
is possible to execute the macro-instruction, i.e. perform one cycle of the data path.
[From now on instruction = macro-instruction; micro-instruction will never be
shortened.]

We could always work with a 61-bit micro-instruction register - whose outputs are
connected to the right places. However, many savings are possible:

1. Only one scratchpad register may send data to the A bus (at any one time).
   Thus 16 signals can be encoded in 4 bits; the 4 bits can be decoded on the way
   out to the registers. 16 - 4 = 12 signals saved.
2. Ditto B bus. 12 saved.
3. The C bus is different - there may be many listeners; but, in practice, it is
   treated the same as A and B buses. 12 saved.
4. Latch signals L0 and L1 are ALWAYS needed, and always at the same time in
   every data path cycle, so you can connect one phase of the clock to them.
   2 saved.
5. One additional signal: ENC - ENable C, i.e. sometimes there is no need to put a
   result in the scratchpad, e.g. for a compare instruction, ENC=0, the output of
   ALU goes nowhere. 1 extra.
6. RD (memory read) and WR (memory write) can be used to control access from
   MBR to the system data bus. 2 saved.

Thus the size of micro-instruction register = 61 - (12+12+12+2+2-1) = 22 bits.
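The bookkeeping can be checked mechanically. The sketch below simply restates the signal counts and savings listed above; the dictionary keys are descriptive labels chosen for the illustration.

```python
# Raw control signals needed by the data path, as listed above.
raw_signals = {
    "A bus select": 16,
    "B bus select": 16,
    "C bus select": 16,
    "A and B latch clocks": 2,
    "ALU function": 2,
    "shifter": 2,
    "MAR and MBR control": 4,
    "memory read and write": 2,
    "AMUX": 1,
}

# Signals saved (positive) or added (negative) by each of the six points above.
savings = {
    "encode A bus select in 4 bits": 12,
    "encode B bus select in 4 bits": 12,
    "encode C bus select in 4 bits": 12,
    "latches driven by the clock": 2,
    "extra ENC signal": -1,
    "RD and WR double as MBR control": 2,
}

total_raw = sum(raw_signals.values())
total_saved = sum(savings.values())
mir_bits = total_raw - total_saved
```

This gives 61 raw signals, a net saving of 39, and hence 22 bits.
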

Microinstruction Format
Figure 5.4 shows microinstruction layout, with two new fields: ADDR and COND,
which are used by the microcontroller, to control itself.
  
Figure 5.4: Microinstruction format

Example. Here is the microinstruction which performs the execute part of a JUMP
instruction. 

Microinstruction Timing
A four phase clock is used. There are four phases or subcycles:

1. Load next microinstruction into MIR, the MicroInstruction Register.
2. Gate registers onto A and B buses. Latch bus data into latch A and B. Both
   done by clock phase 2.
3. Load MAR if necessary. Wait for ALU to do its work. (No need to tell it what
   to do - the appropriate MIR bits are connected to it, all the time.)
4. Shifter output is now stable, and on the C bus. Clock C bus data into scratchpad
   - if required. Load MBR from C bus - if required.

And that is it - one cycle of the data path operation; but note that it takes more than
one of these cycles to execute a macro-instruction.

Figure 5.5 shows the completed microarchitecture.

  
Figure 5.5: Microarchitecture block diagram
The components of the microcontrol part are now explained.

Control Store:
High-speed memory store for microinstructions; it is 32-bits wide. The address
uses 8-bits.
Control Store Interface:
Recall MAR and MBR for main memory.
MPC:
MicroProgram Counter. Equivalent to the MAR in the main CPU - the MPC
addresses the control store; so, it is also a bit like the main CPU PC register.
MIR:
MicroInstruction Register. Equivalent to MBR/IR. MIR is loaded only during clock
phase 1.
Decoders:
Decoders are needed for coded A, B, C bus gating signals.

MicroInstruction Sequencing

Often micro-instructions follow in numerical order, i.e. MPC <- MPC + 1 does
the sequencing.

But, as with conventional programs, we may need to break the normal sequence, i.e. jump.
The COND field states the conditions under which a jump may take place (including
never).

If a jump is required, MPC <- ADDR (via Mmux). The box Micro seq.
logic actually controls Mmux, but for clarity, this connection is not shown in
Figure 5.5. All this happens at clock phase 4, when ALU flags N, Z are stable.

Main memory read and write

We assume, as is usual, that it takes more than one micro-instruction cycle to read or
write to memory (assume 2 cycles). Thus if you start a read in cycle n by asserting
RD, you must ensure that RD is also asserted in cycle n+1; in fact the whole (n+1)th
microinstruction can be RD alone. But, that would be wasteful, and clever
microprogrammers can usually find something useful to fill the time.

The Microprogram
Example. We want to add AC to A and store the result in AC. This microinstruction
would do that:
ENC=1, C=1, B=1, A=10(decimal); fields not mentioned = 0

This is one way of writing it. You could also write out the 32 bits in full (as they
would appear in control store or in MIR). However, that would be tedious and error
prone. It is better to use a high-level language notation. Thus, we will use RTL,
supplemented with:

ALU functions:
alu <- REG
means run contents of REG through the ALU, e.g. to test if zero.
Shifter functions:
LSHIFT, RSHIFT.
Conditional statements:
IF cc THEN GOTO xx: where cc = ALU condition-code (N or Z) and xx =
microinstruction number to jump to.
RD and WR:
to indicate memory read, write.
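For comparison with the field notation used in the example (ENC=1, C=1, B=1, A=10), here is a sketch that packs named fields into a 32-bit word. The field order and widths are an assumption: Figure 5.4 is not reproduced in this text, so the layout below follows Tanenbaum's Mic-1 format (AMUX, COND, ALU, SH, MBR, MAR, RD, WR, ENC, C, B, A, ADDR, from the most significant bit down). The function name encode is hypothetical.

```python
# Assumed microinstruction layout, most significant field first: (name, width in bits).
FIELDS = [
    ("AMUX", 1), ("COND", 2), ("ALU", 2), ("SH", 2),
    ("MBR", 1), ("MAR", 1), ("RD", 1), ("WR", 1), ("ENC", 1),
    ("C", 4), ("B", 4), ("A", 4), ("ADDR", 8),
]


def encode(**values):
    """Pack the named fields into one 32-bit word; fields not mentioned are 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word = (word << width) | value
    return word


# The example: add AC (register 1) to A (register 10), result to AC.
word = encode(ENC=1, C=1, B=1, A=10)
as_bits = format(word, "032b")
```

Under the assumed layout this yields 00111A00 hex. Writing such words out by hand is exactly the tedium that a symbolic notation avoids.
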

Figure 5.6 illustrates the resultant MicroAssembler Language or MAL, and how it
translates into numeric microinstructions (we use decimal, instead of binary, to save
space).

  
Figure 5.6: Illustration of MAL instructions,
plus their numeric code
Figure 5.7 shows the full microprogram that runs on Mic-1 and interprets (fetches-
decodes-executes) Mac-1a instructions.

  
Figure 5.7: Illustration of MAL instructions,
plus their numeric code

Discussion

As described earlier, the microprogram continuously loops on:

- fetch;
- (decode and) execute.

Line 0 starts by fetching contents of PC; RD is asserted.

Line 1 just holds RD high (we need 2 cycles for memory access); it increments PC
while it is waiting.

Line 2 gets the instruction into IR; it also passes IR through ALU to test bit 15; if bit
15 is set then we jump to 0 (no instruction in Mac-1a has bit 15 set - check in Figure
5.4-4).

Note the subtraction algorithm (see Chapter 2 for discussion of twos complement
etc.). If x and y are binary: 

x - y = x + (-y) = x + (y' +1) = x + 1 + y'
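A quick check of this identity on 16-bit words, as a sketch: the helper names inv and sub16 are made up for the illustration, and y' is taken to be the bitwise complement of y.

```python
MASK = 0xFFFF  # keep results to 16 bits


def inv(y):
    """Bitwise complement of a 16-bit word (written y' above)."""
    return ~y & MASK


def sub16(x, y):
    """x - y computed as x + 1 + y', using only add and invert."""
    return (x + 1 + inv(y)) & MASK


small = sub16(7, 5)       # 2
negative = sub16(5, 7)    # -2 in 16-bit twos complement
```

The second result is FFFE hex, which is the 16-bit twos complement representation of -2.
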

Exercise.

Simulate, by tracing on paper, the execution of the STOD instruction: STOD is at
location 100, it is storing to location 543. Assume PC = 100, AC = 4321.

(a) Write down the contents of all registers at the start.

(b) Ditto for all relevant memory locations, i.e. 100 and 543.

(c) Step through the microcode, from line: 0; note each line visited; write down the
contents of any register changed by the microinstruction.

(d) When you get back to line: 0 write down the contents of all registers and all
relevant memory locations.

(e) Use (a), (b) and (d) to verify that STOD did what it was meant to do.

Answer.
(a) Write down the contents of all registers at the start.

PC = 100, AC = 4321, others don't care.

(b) Ditto for all relevant memory locations, ie. 100 and
543.

[100] = 1543; [543] = don't care.

(c) Step through the microcode, from line: 0; note each line
visited; write down the contents of any register changed by
the microinstruction.

0: mar<-pc;rd;

mar = 100; RD set;

1: pc<-pc+1;rd;

pc = 101; RD set;

2: ir<-mbr;if n then goto 0;

mbr = 1543; ir = 1543; n=FALSE (0);


1543 = 0001 0101 0100 0011

3: tir<-lshift(ir+ir); if n then goto 19;

alu output = 0010 1010 1000 0110


shf output = 0101 0101 0000 1100
tir = 0101 0101 0000 1100
n = FALSE (0)

4: tir<-lshift(tir); if n then goto 11;

alu output = 0101 0101 0000 1100 (NB NOT shifted yet)
shf output = 1010 1010 0001 1000
tir = 1010 1010 0001 1000
n = FALSE (0) (because bit 15 of alu is 0)

5: alu<-tir; if n then goto 9;


alu output = 1010 1010 0001 1000
Finally, n = TRUE (1),
so, we go to 9.

9: mar<-ir;mbr<-ac;wr;

mar = 0101 0100 0011 (543 - it only holds 12 bits)


mbr = 4321;
WR set.

10:wr; goto 0;
WR set
[543] = 4321 at the end of this microcycle.

(d) When you get back to line: 0 write down the contents of all registers
and all relevant memory locations.

pc = 101, ac = 4321, [543] = 4321,


remainder don't care.

(e) Use (a), (b) and (d) to verify that STOD did what it was meant
to do.

I think so!
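The bit patterns in steps 3 to 5 of the answer can be reproduced with a short script. This is a sketch of the arithmetic only, not of the microprogram: it assumes that the values in the trace (1543, 543, 4321) are hexadecimal, as the binary expansion of 1543 suggests, and that the address mask keeps the low 12 bits. The helper names are hypothetical.

```python
MASK = 0xFFFF


def lshift(word):
    """Shift a 16-bit word left by one place, dropping the bit shifted out."""
    return (word << 1) & MASK


def n_flag(alu_output):
    """N is taken from bit 15 of the ALU output, before the shifter."""
    return bool(alu_output & 0x8000)


def bits(word):
    """Format a 16-bit word as four groups of four bits."""
    text = format(word, "016b")
    return " ".join(text[i:i + 4] for i in range(0, 16, 4))


ir = 0x1543                      # STOD 543

alu3 = (ir + ir) & MASK          # line 3: tir <- lshift(ir + ir)
n3 = n_flag(alu3)
tir3 = lshift(alu3)

alu4 = tir3                      # line 4: tir <- lshift(tir)
n4 = n_flag(alu4)
tir4 = lshift(alu4)

alu5 = tir4                      # line 5: alu <- tir
n5 = n_flag(alu5)

address = ir & 0x0FFF            # line 9: mar <- ir (only 12 bits are held)
```

The flags come out as N = 0, 0, 1 on lines 3, 4 and 5, so the jump to line 9 is taken on line 5, and the address placed in MAR is 543.
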
Exercises
1.
Consider the Mac-1a assembly code to do the equivalent of: a0 = a1 + a2:
lodd a1
addd a2
stod a0

Taking into account the fetch-execute cycle, and that there is a controller which
also uses MAR and MBR, and assuming that the program starts at 100Hex
(lodd a1 is there), and that a0, a1, a2 are at 100Hex, 101Hex, and 102Hex,
respectively, describe precisely, and in order, all the data travel along the bus,
to and from memory. Distinguish addresses and data.

2.
Which Mac-1a instructions make use of the N, Z condition flags in their
execution (I say execution to distinguish it from decode).
3.
Some machines allow assembly programmers access to registers such as A, B,
C ... in Figure 5.1. Why might programs be speeded up by using these registers?
For example, let us assume that there are instructions such as LODDA address,
LODDB address, ... which cause registers A, B, etc. to be loaded instead of AC;
also, corresponding ADDDA, ADDDB, which cause registers A, B, etc. to be
added to AC.
4.
If access to main memory is a bottleneck (the `von Neumann bottleneck'), think
of ways to alleviate the problem; hint: find a definition of (a) cache memory,
(b) pipelining.
5.
Using assembly language, show how to clear the accumulator on Mac-1A.
6.
Many machines have a HALT instruction - which causes the machine to stop
dead at the current PC address. Using assembly language, show how to halt
Mac-1A, e.g. make it sit at PC = 100 effectively doing nothing.
7.
Many machines have a NOOP - no-operation, i.e. the instruction wastes a bit of
time and no other effect. Using assembly language, show how to program the
same effect as a NOOP.
8.
List and describe two major shortcomings of Mac-1A, i.e. the limited set of
instructions in Figure 5.2.
9.
Make a prioritized list of instructions you think would improve Mac-1A.
Explain and give a rationale in each case.
10.
Given that many machines have writable control store (i.e. you can write your
own microcode to interpret your own macro-code), why do programmers
hardly ever use this facility? List and describe advantages and disadvantages.
11.
Describe some of the strengths and weaknesses of microprogramming as a
general programming technique - compare it to assembly language, to machine
language, and to Java.
12.
Assuming that Mac-1a uses 16-bit twos complement to store signed integers,
why is it impossible to LOCO 'load constant' minus 1, or indeed, any negative
number. Suggest a method to circumvent this problem - which actually impacts
positive numbers as well.
13.
Put on your thinking cap. How, in qualitative terms, could you avoid the two
cycle wait while Mac-1a reads the next instruction to be executed during the
fetch-decode-execute cycle? - at least avoid most, if not all of them.
14.
See the Exercise on tracing the operation of STOD. Count the number of
microinstructions required to execute STOD.
15.
If Mac-1A is being clocked at 100 MHz (overall cycle - don't worry about
phases), how long does it take to execute a STOD?
16.
Do the following Mic-1 instructions perform the same function (i.e. yield the
same set of results):
a<-a+a; if n then goto 0;

a<-lshift(a); if n then goto 0;

17.
Translate the following binary microinstruction into microassembly language (MAL).
Figure 5.4 may be a help.
1100 1000 0001 0001 1001 0000 0000 1000

18.
Implement an AOAC ('add-one-to-accumulator') instruction in microcode. Use
Binary 1111000000000000 as its machine code.
19.
Implement a CLAC ('clear-accumulator') instruction in microcode.
Use 0111000000000000 as its binary code. What is significant about that code?
Analyse your implementation to see if it is any faster than the 'old' Mac-1A
method.
20.
Find the box marked Micro.seq. logic in Figure 5.5.
(a)
Compile a truth-table for the output bit; assume a 0 to the 'Mmux' causes the
'incremented MPC' to be selected; and, therefore, 1 causes ADDR to be
selected.
(b)
Derive a circuit for Micro. seq. logic.

       
jc 
2000-11-13
