Computer Organization - Hardwired V/s Micro-Programmed Control Unit
To execute an instruction, the control unit of the CPU must generate the required
control signals in the proper sequence. There are two approaches to generating these
signals: the hardwired control unit and the micro-programmed control unit.
Hardwired Control Unit –
The control hardware can be viewed as a state machine that changes from one state to
another in every clock cycle, depending on the contents of the instruction register, the
condition codes and the external inputs. The outputs of the state machine are the
control signals. The sequence of operations carried out by this machine is
determined by the wiring of the logic elements, hence the name "hardwired".
Fixed logic circuits that correspond directly to the Boolean expressions are used
to generate the control signals.
Hardwired control is faster than micro-programmed control, so a controller that uses
this approach can operate at high speed.
RISC architectures are based on hardwired control units.
Micro-programmed Control Unit –
The control signals associated with operations are stored as Control Words in special
memory units that are inaccessible to the programmer.
Control signals are generated by a program, similar to machine language programs.
A micro-programmed control unit is slower because of the time it takes to fetch
microinstructions from the control memory.
Some Important Terms –
1. Control Word : A control word is a word whose individual bits represent various
control signals.
2. Micro-routine : A sequence of control words corresponding to the control
sequence of a machine instruction constitutes the micro-routine for that instruction.
3. Micro-instruction : Individual control words in this micro-routine are referred to
as microinstructions.
4. Micro-program : A sequence of micro-instructions is called a micro-program,
which is stored in a ROM or RAM called a Control Memory (CM).
5. Control Store : The micro-routines for all instructions in the instruction set of a
computer are stored in a special memory called the Control Store.
Types of Micro-programmed Control Unit – based on the type of Control Word stored
in the Control Memory (CM), it is classified into two types:
1. Horizontal Micro-programmed Control Unit :
The control signals are represented in decoded (unencoded) binary format, i.e., 1 bit
per control signal (CS).
Example: if 53 control signals are present in the processor, then 53 bits are required.
More than one control signal can be enabled at a time.
It supports longer control words.
It is used in parallel processing applications.
It allows a higher degree of parallelism: if the degree is n, n control signals can be
enabled at a time.
It requires no additional hardware (decoders), which makes it faster than the vertical
micro-programmed approach.
It is less flexible than the vertical micro-programmed approach.
2. Vertical Micro-programmed control Unit :
The control signals are represented in encoded binary format: for N control signals,
⌈log2(N)⌉ bits are required.
It supports shorter control words.
It supports easy implementation of new control signals and is therefore more
flexible.
It allows a low degree of parallelism, i.e., the degree of parallelism is either 0 or 1.
It requires additional hardware (decoders) to generate the control signals, which
makes it slower than the horizontal micro-programmed approach.
It is more flexible than both the horizontal micro-programmed approach and a
hardwired control unit.
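The two encodings above can be sketched in a few lines. This is a minimal illustration, not any real processor's control-word layout: the signal names are invented for the demo.

```python
# Sketch: horizontal vs. vertical control-word encoding.
# The control signal names below are illustrative, not from a real CPU.
from math import ceil, log2

SIGNALS = ["PC_inc", "IR_load", "MAR_load", "MDR_load", "ALU_add",
           "REG_write", "MEM_read"]  # N = 7 control signals

# Horizontal: one bit per control signal; several may be active at once.
def horizontal_word(active):
    return [1 if s in active else 0 for s in SIGNALS]

# Vertical: an encoded index; a decoder activates exactly one signal.
def vertical_bits(n_signals):
    return ceil(log2(n_signals))

def decode_vertical(code):
    return SIGNALS[code]

print(horizontal_word({"PC_inc", "MEM_read"}))  # two bits set at once
print(vertical_bits(len(SIGNALS)))              # ceil(log2(7)) = 3 bits
print(decode_vertical(4))                       # decoder output: ALU_add
```

Note how the horizontal word needs 7 bits but can enable any combination of signals, while the vertical form needs only 3 bits but must pass through a decoder and enables one signal at a time.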
Instruction pipelining
is a technique used in the design of modern microprocessors, microcontrollers and CPUs to
increase their instruction throughput (the number of instructions that can be executed in a unit of
time).
The main idea is to divide ("split") the processing of a CPU instruction, as defined by
the instruction microcode, into a series of independent micro-operation steps (also
called "microinstructions", "micro-ops" or "µops"), with storage at the end of each step.
This allows the CPU's control logic to handle instructions at the processing rate of the
slowest step, which is much faster than the time needed to process the instruction as a
single step.
The term pipeline refers to the fact that each step carries a single microinstruction (like
a drop of water), and each step is linked to the next (analogous to water pipes).
Most modern CPUs are driven by a clock. The CPU consists internally of logic and
memory (flip-flops). When the clock signal arrives, the flip-flops store their new values,
and the logic then requires a period of time to decode them. Then the next clock pulse
arrives, the flip-flops store new values, and so on. By breaking the logic into smaller
pieces and inserting flip-flops between the pieces, the time required by the logic (to
decode values and generate valid outputs) is reduced. In this way the clock period can
be reduced.
For example, the RISC pipeline is broken into five stages with a set of flip flops between each
stage as follows:
1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back
Processors with pipelining consist internally of stages (modules) that can work
semi-independently on separate microinstructions. Each stage is linked by flip-flops to
the next (like a "chain"), so each stage's output feeds the next stage until the
instruction has been fully processed. Such an organization of the processor's internal
modules increases overall instruction throughput.
A non-pipeline architecture is not as efficient because some CPU modules are idle while another
module is active during the instruction cycle. Pipelining does not completely remove idle time in
a pipelined CPU, but making CPU modules work in parallel increases instruction throughput.
An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock
cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
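The cycle counts behind this claim can be sketched directly. This assumes an idealized fully pipelined design (one clock per stage, no wait cycles); the stage count of 5 matches the RISC pipeline above.

```python
# Sketch: total clock cycles for n instructions on a k-stage pipeline,
# assuming an ideal, fully pipelined design with no wait cycles.
def cycles_pipelined(n, k=5):
    return k + (n - 1)   # fill the pipeline once, then one finishes per cycle

def cycles_unpipelined(n, k=5):
    return k * n         # each instruction runs all k stages alone

print(cycles_pipelined(100))    # 104 cycles
print(cycles_unpipelined(100))  # 500 cycles
```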
Advantages of Pipelining:
1. The cycle time of the processor is reduced, increasing the instruction throughput.
Pipelining doesn't reduce the time it takes to complete an instruction; instead it
increases the number of instructions that can be processed simultaneously ("at
once") and reduces the delay between completed instructions (the 'throughput').
The more pipeline stages a processor has, the more instructions it can process "at
once" and the smaller the delay between completed instructions. Every predominant
general-purpose microprocessor manufactured today uses at least 2 pipeline stages,
and some use up to 30 or 40 stages.
2. If pipelining is used, the CPU's arithmetic logic unit can be designed to be faster,
but it will be more complex.
3. Pipelining in theory increases performance over an un-pipelined core by a factor of
the number of stages, assuming the clock frequency also increases by the same
factor and the code is ideal for pipelined execution.
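That theoretical factor can be checked with a short calculation. This is an idealized sketch (no stalls, one clock per stage): the speedup approaches the stage count k as the instruction count grows.

```python
# Sketch: ideal speedup of a k-stage pipeline over an un-pipelined core,
# assuming the clock speeds up by a factor of k and no stalls occur.
def speedup(k, n):
    # un-pipelined cycles / pipelined cycles
    return (k * n) / (k + (n - 1))

print(round(speedup(5, 1000), 2))  # close to the stage count, 5
```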
4. Pipelined CPUs generally work at a higher clock frequency than the RAM (as of
2008 technology, RAM works at low frequencies compared with CPUs), increasing
the computer's overall performance.
Disadvantages of Pipelining:
Pipelining has many disadvantages, though CPU and compiler designers use a number
of techniques to overcome most of them; a common drawback is the hazards that arise
when instructions depend on each other (see the examples below).
Examples
Generic pipeline
Figure: a generic 4-stage pipeline; the colored boxes represent instructions
independent of each other.
Consider a generic pipeline with four stages:
1. Fetch
2. Decode
3. Execute
4. Write-back
The top gray box is the list of instructions waiting to be executed; the bottom gray box is the
list of instructions that have been completed; and the middle white box is the pipeline.
Execution is as follows:
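The cycle-by-cycle execution can be sketched with a small generator that prints which instruction occupies each stage at every clock tick. This assumes an ideal pipeline with no stalls; the instruction names stand in for the colored boxes of the figure.

```python
# Sketch: which instruction occupies each stage at every clock tick,
# assuming an ideal 4-stage pipeline with no stalls.
STAGES = ["Fetch", "Decode", "Execute", "Write-back"]

def timeline(instructions):
    ticks = []
    for t in range(len(instructions) + len(STAGES) - 1):
        row = {}
        for s, stage in enumerate(STAGES):
            i = t - s  # index of the instruction in this stage, if any
            row[stage] = instructions[i] if 0 <= i < len(instructions) else "-"
        ticks.append(row)
    return ticks

for t, row in enumerate(timeline(["green", "purple", "blue", "red"]), 1):
    print(t, row)
```

With four instructions and four stages the run takes 4 + (4 - 1) = 7 clock ticks, the figure's total without a bubble.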
Figure: a bubble in cycle 3 delays execution.
When a "hiccup" (interruption) in execution occurs, a "bubble" is created in the pipeline in
which nothing useful happens. In cycle 2, the fetching of the purple instruction is delayed
and the decoding stage in cycle 3 now contains a bubble. Everything behind the purple
instruction is delayed as well but everything in front of the purple instruction continues with
execution.
Clearly, when compared to the execution above, the bubble yields a total execution time of 8
clock ticks instead of 7.
Bubbles are like stalls (delays), in which nothing useful happens in the fetch, decode,
execute and write-back stages. They are like a NOP ("No OPeration") instruction.
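The effect of a bubble on the total is simple arithmetic. This sketch assumes each bubble costs exactly one idle cycle, reproducing the 7-versus-8 tick comparison above.

```python
# Sketch: each bubble adds one idle cycle to an otherwise ideal pipeline.
def total_ticks(n_stages, n_instructions, bubbles=0):
    return n_stages + (n_instructions - 1) + bubbles

print(total_ticks(4, 4))             # 7 ticks without a stall
print(total_ticks(4, 4, bubbles=1))  # 8 ticks with one bubble
```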
Example 1
A typical instruction to add two numbers might be ADD A, B, C , which adds the values
found in memory locations A and B, and then puts the result in memory location C. In a
pipelined processor the pipeline controller would break this into a series of tasks similar to:
LOAD A, R1
LOAD B, R2
ADD R1, R2, R3
STORE R3, C
LOAD next instruction
The locations 'R1' and 'R2' are registers in the CPU. The values stored in memory locations
labeled 'A' and 'B' are loaded (copied) into these registers, then added, and the result is stored
in a memory location labeled 'C'.
In this example the pipeline is three stages long: load, execute, and store. Each of the
steps is called a pipeline stage.
On a non-pipelined processor, only one stage can be working at a time so the entire
instruction has to complete before the next instruction can begin. On a pipelined processor,
all of the stages can be working at once on different instructions. So when this instruction is
at the execute stage, a second instruction will be at the decode stage and a 3rd instruction
will be at the fetch stage.
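The micro-operation sequence from Example 1 can be sketched as a tiny interpreter. The values in A and B are invented for the demo; the point is only that the sequence of steps produces A + B in C.

```python
# Sketch interpreter for the micro-operation sequence of Example 1.
# The memory values 7 and 5 are made up for this demo.
memory = {"A": 7, "B": 5, "C": 0}
regs = {}

def LOAD(addr, reg):   regs[reg] = memory[addr]           # copy memory -> register
def ADD(r1, r2, r3):   regs[r3] = regs[r1] + regs[r2]     # add two registers
def STORE(reg, addr):  memory[addr] = regs[reg]           # copy register -> memory

LOAD("A", "R1")
LOAD("B", "R2")
ADD("R1", "R2", "R3")
STORE("R3", "C")
print(memory["C"])  # 12
```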
Example 2
To better understand the concept, we can look at a theoretical 3-stage pipeline:
Stage    Description
Load     Read the instruction from memory
Execute  Execute the instruction
Store    Store the result in a register or memory
Clock 1
Load: LOAD
Clock 2
Load: MOVE   Execute: LOAD
The LOAD instruction is executed, while the MOVE instruction is fetched from memory.
Clock 3
Load: ADD   Execute: MOVE   Store: LOAD
The LOAD instruction is in the Store stage, where its result (the number 40) will be stored in
the register A. In the meantime, the MOVE instruction is being executed. Since it must move
the contents of A into B, it must wait for the ending of the LOAD instruction.
Clock 4
Load: STORE   Execute: ADD   Store: MOVE
The STORE instruction is loaded, while the MOVE instruction is finishing off and the ADD
is calculating.
And so on. Note that sometimes an instruction will depend on the result of another
(as in our MOVE example). When more than one instruction references a particular
location for an operand, either reading it (as an input) or writing it (as an output),
executing those instructions in an order different from the original program order can
lead to hazards.
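The dependency in the MOVE example is a read-after-write (RAW) hazard, and the check a pipeline controller performs can be sketched in one function. This is a simplified illustration, not a real controller's logic.

```python
# Sketch: detecting a read-after-write (RAW) dependency between two
# instructions, as in the LOAD -> MOVE example above.
def raw_hazard(writer_dest, reader_srcs):
    # The later instruction reads a location the earlier one writes,
    # so it must stall (or be fed by forwarding) until the write completes.
    return writer_dest in reader_srcs

# LOAD writes A; MOVE reads A (to copy it into B): MOVE must wait.
print(raw_hazard("A", {"A"}))  # True  -> stall or forward
print(raw_hazard("A", {"B"}))  # False -> no dependency
```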