The Central Processing Unit (CPU)
Subsections
Introduction
The Architecture of Mic-1
o Registers
o Internal Buses
o External Buses
o Shifter
o Memory Address Register (MAR) and Memory Buffer Register (MBR)
o A-Multiplexer (AMUX)
o Memory
The Fetch-Execute Cycle
Microinstructions
Instruction Set Model
Memory-mapped Input-output
o Introduction
o Input from standard-input device
o Output to the standard-output device
Microprogram Control versus Hardware Control
CISC versus RISC
Microprogram
o Microinstructions
o Microinstruction Format
o Microinstruction Timing
o MicroInstruction Sequencing
o The Microprogram
Exercises
We will then build up a real Central Processing Unit (CPU) from subsystems - ALU,
Registers etc. We will consider a CPU to consist of three primary parts:
1. The internal registers, the ALU and the connecting buses - sometimes called the data path;
2. The input-output interface, which is the gateway through which data are sent to and received from main memory and input-output devices. Quite often this interface is shown as part of the data path. No harm, as long as you understand its interface role;
3. The control part, which directs the activities of the data path and input-output interface, e.g. opening and closing access to buses, selecting ALU function, etc.
A fourth part, main memory, is never far from the CPU but from a logical point of
view is best kept separate.
We will pay most attention to the data path part of the processor, and what must
happen in it to cause useful things to happen - to cause program instructions to be
executed.
Finally, we will briefly describe how the control part can be implemented
by microprogram, i.e. how the decoding of a machine instruction can lead to the
execution of a set of sequencing steps called a microprogram.
Before you continue, you should review the material on multiplexers, decoders,
ALUs, registers, buses, etc. in the previous chapter.
Registers
There are 16 identical 16-bit registers. They are not general purpose, however; each has a special use.
Internal Buses
There are three internal buses: the A and B (source) buses and the C (destination) bus.
External Buses
The address bus and the data bus.
Latches
A and B latches hold stable versions of the A and B buses. There would be problems if, for example, AC were connected straight into the A input of the ALU while the output of the ALU was connected to AC: which version of AC should be used? The answer would be continuously changing.
ALU
As described in Figure 4.2.2 above. Recall that the ALU may perform any of four functions:
0: A + B; note `plus', rather than or; (F1, F0) = (0, 0);
1: A AND B; (F1, F0) = (0, 1);
2: A straight through, B ignored; (F1, F0) = (1, 0);
3: NOT A, i.e. A inverted; (F1, F0) = (1, 1).
Shifter
The shifter is not a register - it passes the ALU output straight through: shifted left,
shifted right or not shifted.
A-Multiplexer (AMUX)
The AMUX selects what is fed to the A input of the ALU, either:
1. The contents of the A latch, or
2. The contents of MBR, i.e. what was originally the contents of a memory location.
Memory
The memory is considered to be a collection of cells or locations, each of which can
be addressed individually, and thus written to or read from. Effectively, memory is
like an array in Java or any other high-level language. For brevity, we shall refer to
this memory `array' as M and the address of a general cell as x and so, the contents of
the cell at address x as M[x], or m[x].
To read from a memory cell, the controller must cause the following to happen:
1. Put an address, x, in MAR;
2. Request a read by asserting a read control line;
3. At some time later, the contents of x, M[x], appear in MBR;
4. From MBR, the controller can cause the data to be transferred to the ALU or somewhere else.
To write to a memory cell, the controller must cause something similar to happen:
1. Put an address, x, in MAR;
2. Put the data in MBR;
3. Request a write by asserting a write control line;
4. At some time later, the data arrive in memory cell x.
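The read and write sequences can be sketched in a few lines of Python. This is only a toy model: the class name, the 4096-word size and the instantaneous read and write are assumptions made for illustration, and the delay between a request and its completion is ignored.

```python
class MainMemory:
    """Toy model of main memory, reached only through MAR and MBR."""

    def __init__(self, size=4096):      # size is an assumption
        self.cells = [0] * size         # M, the memory 'array'
        self.mar = 0                    # Memory Address Register
        self.mbr = 0                    # Memory Buffer Register

    def read(self):
        """Read request: M[MAR] appears in MBR."""
        self.mbr = self.cells[self.mar]

    def write(self):
        """Write request: the contents of MBR arrive in M[MAR]."""
        self.cells[self.mar] = self.mbr & 0xFFFF


mem = MainMemory()

# Write: address into MAR, data into MBR, then request the write.
mem.mar, mem.mbr = 0x543, 0x4321
mem.write()

# Read: address into MAR, request the read, then collect the data from MBR.
mem.mbr = 0
mem.mar = 0x543
mem.read()
print(hex(mem.mbr))  # 0x4321
```

Note that the rest of the machine never touches `cells` directly; everything goes through MAR and MBR, which is the interface role described above.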
It is a feature of all general purpose computers that
executable instructions and data occupy the same memory space. Often, programs
are organised so that there are blocks of instructions and blocks of data. But, there is
no fundamental reason, except tidiness and efficiency, why instructions and data
cannot be mixed up together.
To describe the details of operation of the CPU, we use a simple language called Register Transfer Language (RTL). The notation is as follows. M[x] denotes the contents of location x; sometimes m[x], or even just [x], is used. Think of an envelope with £100 in it, and your address on it.
We use <- to denote transfer: A <- B. Pronounce this as `A gets B'.
Note: There is a world of difference between an address, 100, say, and the data value contained at that address.
Let us start by considering the execution of the instruction at location 100Hex; what follows is an endless loop of the so-called fetch-execute cycle:
Fetch:
read the next instruction and put it in the Instruction Register. Point to the next
instruction, ready for the next Fetch.
Execute:
decode and obey that instruction; if it is a JUMP type instruction, then revise the pointer so that it points to the jumped-to instruction. Goto Fetch.
Start off with PC = 100H - PC is the Program Counter, and is used to address the
instruction (data) to be fetched and executed.
Fetch
Execute
More precisely, decode and execute. For example, ADD register +1 to AC, i.e. AC <- AC + (+1).
E1. Transfer the contents of +1 to the A input of the ALU, via the A bus; transfer the contents of AC to the B input of the ALU, via the B bus;
E2. Set the ALU function (F1, F0) to ADD. Instruct it to perform the operation; at some time later the result of the ADD will appear on the output of the ALU and hence the C bus.
E3. Transfer the data on the C bus into AC.
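Steps E1 to E3 amount to one pass around the data path, which can be sketched as below. This is a minimal Python sketch, not the actual hardware: ALU functions 1 and 3 are taken to be A AND B and NOT A as in Tanenbaum's Mic-1, and the shifter codes (0 = no shift, 1 = right, 2 = left) and all function names are assumptions.

```python
MASK = 0xFFFF  # 16-bit data path


def alu(f, a, b):
    """The four ALU functions; f is (F1, F0) read as a 2-bit number."""
    if f == 0:
        return (a + b) & MASK   # A + B
    if f == 1:
        return a & b            # A AND B
    if f == 2:
        return a                # A straight through, B ignored
    return ~a & MASK            # NOT A


def shifter(s, value):
    """Pass the ALU output through unshifted, shifted right or shifted left."""
    if s == 1:
        return value >> 1
    if s == 2:
        return (value << 1) & MASK
    return value


def datapath_cycle(regs, a_sel, b_sel, c_sel, f, sh=0):
    a_latch = regs[a_sel]       # E1: registers gated onto the A and B buses
    b_latch = regs[b_sel]       #     and held stable in the latches
    c_bus = shifter(sh, alu(f, a_latch, b_latch))   # E2: ALU, then shifter
    regs[c_sel] = c_bus         # E3: C bus clocked into the destination
    return c_bus


regs = {"AC": 41, "+1": 1}
datapath_cycle(regs, a_sel="+1", b_sel="AC", c_sel="AC", f=0)
print(regs["AC"])  # 42
```

Because the ALU works from the latched copies, AC can be both a source and the destination in the same cycle without the result feeding back into itself.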
Microinstructions
In previous versions of this course, I used to go into complete detail of the
microprogram which controls the CPU - the microprogram implements the Fetch-
(Decode)-Execute cycle. This time I'm going to leave it out. If interested, see
section 5.9 appended to this chapter, or [Tanenbaum, 1990], section 4.2.2 onwards.
Instruction Set Model
We now examine the instruction set, by which Level 2 programmers can program the
machine; if in doubt, we call these macroinstructions.
Incidentally, you could program in microcode - but life is too short, just like life is too
short to program in assembly (macro) code, if you can do it in Java, C, or C++.
Exercise: What is the maximum number of words we can have in the main memory of
Mac-1a? (neglect memory mapped input-output). How many bytes?
The Mac-1a programmer has no access to the PC or other CPU registers. Also, for
present purposes, assume that SP does not exist.
Figure 5.2: Mac-1a Instruction Set (limited
version of Mac-1)
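To make the instruction set model concrete, here is a minimal Python sketch of a fetch-execute loop for a Mac-1a-like machine. The instruction word is assumed to be a 4-bit op-code followed by a 12-bit address, and the op-code numbers follow Tanenbaum's Mac-1 encoding; both are assumptions, since Figure 5.2 is not reproduced here. The function names and the `max_steps` limit are invented for the example.

```python
# Assumed op-code numbers (top 4 bits of the instruction word).
LODD, STOD, ADDD, SUBD, JPOS, JZER, JUMP, LOCO = range(8)


def word(opcode, operand):
    """Assemble one 16-bit instruction: 4-bit op-code, 12-bit operand."""
    return (opcode << 12) | (operand & 0x0FFF)


def run(memory, pc, max_steps):
    """Interpret up to max_steps instructions; return (AC, PC)."""
    ac = 0
    for _ in range(max_steps):
        ir = memory[pc]                      # fetch
        pc = (pc + 1) & 0x0FFF               # point to the next instruction
        opcode, x = ir >> 12, ir & 0x0FFF    # decode
        if opcode == LODD:                   # execute
            ac = memory[x]
        elif opcode == STOD:
            memory[x] = ac
        elif opcode == ADDD:
            ac = (ac + memory[x]) & 0xFFFF
        elif opcode == SUBD:
            ac = (ac - memory[x]) & 0xFFFF
        elif opcode == JPOS:
            if ac < 0x8000:                  # sign bit clear
                pc = x
        elif opcode == JZER:
            if ac == 0:
                pc = x
        elif opcode == JUMP:
            pc = x
        elif opcode == LOCO:
            ac = x
        else:
            raise ValueError(f"not a Mac-1a instruction: {ir:#06x}")
    return ac, pc


memory = [0] * 4096
memory[0x201], memory[0x202] = 5, 7
memory[0x100:0x103] = [word(LODD, 0x201), word(ADDD, 0x202), word(STOD, 0x200)]
ac, pc = run(memory, 0x100, max_steps=3)
print(memory[0x200], hex(pc))  # 12 0x103
```

Instructions and data sit in the same `memory` list, and the programmer never names PC or IR; they only appear inside the interpreter.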
Memory-mapped Input-output
Introduction
As stated earlier, there are no direct instructions for input-output; instead Mac-1a uses memory-mapped input-output, whereby some memory cells are mapped to input-output ports. For simplicity we assume that there are only two ports, one connected to a standard-input device, the other connected to a standard-output device.
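As a rough illustration of memory-mapped input-output, the Python sketch below routes reads and writes at two special addresses to devices instead of memory cells. The port addresses, the class name and the one-character-per-access behaviour are all hypothetical; they are not the addresses used by Mac-1a.

```python
import io

INPUT_PORT = 0xFFC    # hypothetical address of the standard-input port
OUTPUT_PORT = 0xFFE   # hypothetical address of the standard-output port


class MappedMemory:
    """Memory in which two addresses are wired to devices, not to cells."""

    def __init__(self, stdin, stdout, size=4096):
        self.cells = [0] * size
        self.stdin = stdin
        self.stdout = stdout

    def read(self, address):
        if address == INPUT_PORT:
            ch = self.stdin.read(1)        # next character from the device
            return ord(ch) if ch else 0
        return self.cells[address]

    def write(self, address, value):
        if address == OUTPUT_PORT:
            self.stdout.write(chr(value & 0xFF))   # character to the device
        else:
            self.cells[address] = value & 0xFFFF


mem = MappedMemory(io.StringIO("A"), io.StringIO())
ch = mem.read(INPUT_PORT)          # looks like an ordinary load
mem.write(OUTPUT_PORT, ch + 1)     # looks like an ordinary store
print(mem.stdout.getvalue())       # B
```

From the program's point of view these are ordinary loads and stores, which is why no dedicated input-output instructions are needed.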
Microprogram Control versus Hardware Control
You can think of the controller as a black box which has a set of inputs and a set of outputs, just like any other circuit (ALU, multiplexer, etc.). Therefore, instead of being microprogrammed, it can be made from logic hardware.
Figure 5.3: Controller Black-box, either
Microcontroller or Logic
To design the circuit, all you have to do is prepare a truth table (6 input columns: the 4-bit op-code plus N and Z; 22 output columns) and generate the logic from it.
There is no reason why this hardware circuit could not decode an instruction in ONE
clock period, i.e. a lot faster than the microcode solution.
On the other hand, if implemented on a chip, control store takes up a lot of chip space.
And, as you can see by examining [Tanenbaum, 1990], microcode interpretation may
be relatively slow -- and gets slower, the more instructions there are.
CISC versus RISC
Most early machines - before about 1965 - were RISC. Then the fashion switched to CISC. Now the fashion is switching back to RISC, albeit with some special go-faster features that were not present on early RISC.
CISC machines are easier to program in machine and assembly code (see next chapter), because they have a richer set of instructions. But, nowadays, fewer and fewer programmers use assembly code, and compilers are becoming better. It comes down to a tradeoff: complexity of `silicon' (microcode and CISC) or complexity of software (highly efficient optimising compilers and RISC).
Microprogram
Microinstructions
To control the data path in Figure 5.1 we need 61 different signals:
If we have decoded a macro-instruction and know the values of the 61 signals then it is possible to execute the macro-instruction, i.e. perform one cycle of the data path.
[From now on instruction = macro-instruction; micro-instruction will never be
shortened.]
We could always work with a 61-bit micro-instruction register - whose outputs are
connected to the right places. However, many savings are possible:
1. Only one scratchpad register may send data to the A bus at any one time. Thus 16 signals can be encoded in 4 bits; the 4 bits can be decoded on the way out to the registers. 16 - 4 = 12 signals saved.
2. Ditto for the B bus. 12 saved.
3. The C bus is different - there may be many listeners; but, in practice, it is treated the same as the A and B buses. 12 saved.
4. Latch signals L0 and L1 are ALWAYS needed, and always at the same time in every data path cycle, so you can connect one phase of the clock to them. 2 saved.
5. One additional signal: ENC - ENable C, i.e. sometimes there is no need to put a result in the scratchpad, e.g. for a compare instruction, ENC=0 and the output of the ALU goes nowhere. 1 extra.
6. RD (memory read) and WR (memory write) can be used to control access from MBR to the system data bus. 2 saved.
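Adding up the savings listed above gives the number of bits needed to drive the data path. The tally below is just that arithmetic written out in Python; the dictionary labels are paraphrases of the list items.

```python
raw_signals = 61

# Change in signal count for each item in the list above.
changes = {
    "A bus select encoded in 4 bits": -12,
    "B bus select encoded in 4 bits": -12,
    "C bus select encoded in 4 bits": -12,
    "L0 and L1 driven by a clock phase": -2,
    "ENC added": +1,
    "RD and WR reused for MBR bus access": -2,
}

encoded_bits = raw_signals + sum(changes.values())
print(encoded_bits)  # 22
```

The result, 22, agrees with the 22 output columns quoted for the controller truth table. If the ADDR field is as wide as the 8-bit control store address (an assumption), then 22 + 8 = 30 bits, which leaves 2 bits of a 32-bit microinstruction for COND.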
Microinstruction Format
Figure 5.4 shows microinstruction layout, with two new fields: ADDR and COND,
which are used by the microcontroller, to control itself.
Figure 5.4: Microinstruction format
Example. Here is the microinstruction which performs the execute part of a JUMP
instruction.
Microinstruction Timing
A four phase clock is used. There are four phases or subcycles:
1. Load next microinstruction into MIR, the MicroInstruction Register.
2. Gate registers onto A and B buses. Latch bus data into latch A and B. Both done by clock phase 2.
3. Load MAR if necessary. Wait for ALU to do its work. (No need to tell it what to do - the appropriate MIR bits are connected to it all the time.)
4. Shifter output is now stable, and on the C bus. Clock C bus data into scratchpad - if required. Load MBR from C bus - if required.
And that is it - one cycle of the data path operation; but note that it takes more than
one of these cycles to execute a macro-instruction.
Figure 5.5: Microarchitecture block diagram
The components of the microcontrol part are now explained.
Control Store:
High-speed memory store for microinstructions; it is 32 bits wide. The address uses 8 bits.
Control Store Interface:
Recall MAR and MBR for main memory.
MPC:
MicroProgram Counter. Equivalent to the MAR in the main CPU - the MPC
addresses the control store; so, it is also a bit like the main CPU PC register.
MIR:
MicroInstruction Register. Equivalent to MBR/IR. MIR is loaded only during clock subcycle 1.
Decoders:
Decoders are needed for coded A, B, C bus gating signals.
MicroInstruction Sequencing
We assume, as is usual, that it takes more than one microinstruction cycle to read or write to memory (assume 2 cycles). Thus if you start a read in cycle n by asserting RD, you must ensure that RD is also asserted in cycle n+1; in fact the whole of microinstruction n+1 can be RD alone. But that would be wasteful, and clever microprogrammers can usually find something useful to fill the time.
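The two-cycle rule can be illustrated with a small Python sketch. The class name and the behaviour when RD is dropped after one cycle (the read is simply abandoned) are assumptions made for the example; the instruction word stored at 100Hex is an arbitrary value.

```python
class TwoCycleMemory:
    """Memory whose read completes only if RD is asserted in two
    consecutive microinstruction cycles."""

    def __init__(self, cells):
        self.cells = cells
        self.mbr = None
        self._pending = None    # address of a read started in the last cycle

    def cycle(self, rd=False, mar=None):
        """Advance one microinstruction cycle."""
        if rd and self._pending is not None:
            self.mbr = self.cells[self._pending]   # cycle n+1: data arrive
            self._pending = None
        elif rd:
            self._pending = mar                    # cycle n: read starts
        else:
            self._pending = None                   # RD dropped: abandoned


mem = TwoCycleMemory({0x100: 0x1234})
pc = 0x100

mem.cycle(rd=True, mar=pc)    # mar<-pc; rd;
pc = pc + 1                   # useful work done while waiting ...
mem.cycle(rd=True)            # ... in the same cycle that holds RD: pc<-pc+1; rd;
ir = mem.mbr                  # the data are now in MBR
print(hex(ir), hex(pc))       # 0x1234 0x101
```

Incrementing PC in the second cycle is an example of finding something useful to fill the waiting time.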
The Microprogram
Example. We need to add AC to A and store the result in AC. This microinstruction would do that:
ENC=1, C=1, B=1, A=10(decimal); fields not mentioned = 0
This is one way of writing it. You could also write out the 32 bits in full (as they would appear in control store or in MIR). However, that would be tedious and error-prone. It is better to use a high-level language notation. Thus, we will use RTL, supplemented with:
ALU functions:
+, band(), inv(); alu<-REG means run the contents of REG through the ALU, e.g. to test if zero.
Shifter functions:
LSHIFT, RSHIFT.
Conditional statements:
IF cc THEN GOTO xx: where cc = ALU condition-code (N or Z) and xx =
microinstruction number to jump to.
RD and WR:
to indicate memory read, write.
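To see what writing out the 32 bits in full involves, the Python sketch below packs the example microinstruction (ENC=1, C=1, B=1, A=10) into a 32-bit word. The field order and widths are an assumption based on Tanenbaum's Mic-1 layout, since Figure 5.4 is not reproduced here; the function names are invented for the example.

```python
# Assumed field layout, most significant field first: (name, width in bits).
FIELDS = [
    ("AMUX", 1), ("COND", 2), ("ALU", 2), ("SH", 2),
    ("MBR", 1), ("MAR", 1), ("RD", 1), ("WR", 1), ("ENC", 1),
    ("C", 4), ("B", 4), ("A", 4), ("ADDR", 8),
]


def encode(**values):
    """Pack named fields into a 32-bit word; fields not mentioned are 0."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Unpack a 32-bit word into a dictionary of field values."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


word = encode(ENC=1, C=1, B=1, A=10)   # the example microinstruction
print(f"{word:08x}")                   # 00111a00
```

Even for this simple case the raw word says little to a human reader, which is the argument for a higher-level notation.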
Figure 5.6: Illustration of MAL instructions,
plus their numeric code
Figure 5.7 shows the full microprogram that runs on Mic-1 and interprets (fetches-decodes-executes) Mac-1A instructions.
Figure 5.7: Illustration of MAL instructions,
plus their numeric code
Discussion
fetch;
(decode and) execute.
Line 0 starts by fetching contents of PC; RD is asserted.
Line 1 just holds RD high (we need 2 cycles for memory access); it increments PC
while it is waiting.
Line 2 gets the instruction into IR; it also passes IR through the ALU to test bit 15; if bit 15 is set then we jump to 0 (no instruction in Mac-1a has bit 15 set - check in Figure 5.4-4).
Note the subtraction algorithm (see Chapter 2 for discussion of twos complement etc.). If x and y are binary, then x - y = x + 1 + (NOT y), where NOT y inverts every bit of y.
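A quick check of this twos complement rule, as a minimal Python sketch for 16-bit words. It uses only addition and bit inversion, the two operations the ALU provides; the function names are chosen for the example.

```python
MASK = 0xFFFF  # 16-bit words


def inv(value):
    """Invert every bit of a 16-bit word."""
    return ~value & MASK


def subtract(x, y):
    """x - y computed as x + 1 + inv(y), using only add and invert."""
    return (x + 1 + inv(y)) & MASK


print(subtract(7, 5))        # 2
print(hex(subtract(5, 7)))   # 0xfffe, i.e. -2 in 16-bit twos complement
```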
Exercise.
(a) Write down the contents of all registers at the start.
(b) Ditto for all relevant memory locations, i.e. 100 and 543.
(c) Step through the microcode, from line: 0; note each line visited; write down the
contents of any register changed by the microinstruction.
(d) When you get back to line: 0 write down the contents of all registers and all
relevant memory locations.
(e) Use (a), (b) and (d) to verify that STOD did what it was meant to do.
Answer.
(a) Write down the contents of all registers at the start.
(b) Ditto for all relevant memory locations, i.e. 100 and 543.
(c) Step through the microcode, from line: 0; note each line
visited; write down the contents of any register changed by
the microinstruction.
0: mar<-pc;rd;
1: pc<-pc+1;rd;
pc = 101; RD set;
alu output = 0101 0101 0000 1100 (NB NOT shifted yet)
shf output = 1010 1010 0001 1000
tir = 1010 1010 0001 1000
n = FALSE (0) (because bit 15 of alu is 0)
9: mar<-ir;mbr<-ac;wr;
10:wr; goto 0;
WR set
[543] = 4321 at the end of this microcycle.
(d) When you get back to line: 0 write down the contents of all registers
and all relevant memory locations.
(e) Use (a), (b) and (d) to verify that STOD did what it was meant
to do.
I think so!
Exercises
1. Consider the Mac-1a assembly code to do the equivalent of: a0 = a1 + a2:
lodd a1
addd a2
stod a0
Taking into account the fetch-execute cycle, and that there is a controller which also uses MAR and MBR, and assuming that the program starts at 100Hex (lodd a1 is there), and that a0, a1, a2 are at 100Hex, 101Hex, and 102Hex, respectively, describe precisely, and in order, all the data that travel along the bus, to and from memory. Distinguish addresses and data.
2. Which Mac-1a instructions make use of the N, Z condition flags in their execution (I say execution to distinguish it from decode)?
3. Some machines allow assembly programmers access to registers such as A, B, C ... in Figure 5.1. Why might programs be speeded up by using these registers? For example, let us assume that there are instructions such as LODDA address, LODDB address, ... which cause registers A, B, etc. to be loaded instead of AC; also, corresponding ADDDA, ADDDB, which cause registers A, B, etc. to be added to AC.
4. If access to main memory is a bottleneck (the `von Neumann bottleneck'), think of ways to alleviate the problem; hint: find a definition of (a) cache memory, (b) pipelining.
5. Using assembly language, show how to clear the accumulator on Mac-1A.
6. Many machines have a HALT instruction - which causes the machine to stop dead at the current PC address. Using assembly language, show how to halt Mac-1A, e.g. make it sit at PC = 100 effectively doing nothing.
7. Many machines have a NOOP - no-operation, i.e. the instruction wastes a bit of time and has no other effect. Using assembly language, show how to program the same effect as a NOOP.
8. List and describe two major shortcomings of Mac-1A, i.e. the limited set of instructions in Figure 5.2.
9. Make a prioritized list of instructions you think would improve Mac-1A. Explain and give a rationale in each case.
10. Given that many machines have writable control store (i.e. you can write your own microcode to interpret your own macro-code), why do programmers hardly ever use this facility? List and describe advantages and disadvantages.
11. Describe some of the strengths and weaknesses of microprogramming as a general programming technique - compare it to assembly language, to machine language, and to Java.
12. Assuming that Mac-1a uses 16-bit twos complement to store signed integers, why is it impossible to LOCO ('load constant') minus 1, or indeed any negative number? Suggest a method to circumvent this problem - which actually impacts positive numbers as well.
13. Put on your thinking cap. How, in qualitative terms, could you avoid the two cycle wait while Mac-1a reads the next instruction to be executed during the fetch-decode-execute cycle? - at least avoid most, if not all of them.
14. See the Exercise on tracing the operation of STOD. Count the number of microinstructions required to execute STOD.
15. If Mac-1A is being clocked at 100 MHz (overall cycle - don't worry about phases), how long does it take to execute a STOD?
16. Do the following Mic-1 instructions perform the same function (i.e. yield the same set of results):
a<-a+a; if n then goto 0;
17. Translate the following binary microinstruction into micro-assembly language (MAL). Figure 5.4 may be a help.
1100 1000 0001 0001 1001 0000 0000 1000
18. Implement an AOAC ('add-one-to-accumulator') instruction in microcode. Use binary 1111000000000000 as its machine code.
19. Implement a CLAC ('clear-accumulator') instruction in microcode. Use 0111000000000000 as its binary code. What is significant about that code? Analyse your implementation to see if it is any faster than the 'old' Mac-1A method.
20. Find the box marked Micro.seq. logic in Figure 5.5.
(a) Compile a truth table for the output bit; assume a 0 to the 'Mmux' causes the 'incremented MPC' to be selected; and, therefore, 1 causes ADDR to be selected.
(b) Derive a circuit for Micro.seq. logic.
jc
2000-11-13