
AMBO UNIVERSITY WALISO CAMPUS

Department of Information Technology

GROUP NAME AND ID
NO  NAME OF STUDENT    ID
1   ABDINAH DEBA       AU/R 25866/11
2   MIKIRU NIGUSE      AU/R 25874/11
3   GEBAYEHU SEKETA    026/10
4   ABDINAOL TAMIRAT   AU/R 25864/11
5   KITESA TENA        AU/R 26093/11
6   MEGERSA LEMU       AU/R 25904/11
7   AMENUEL ITEFA      AU/R 25982/11

FUNCTION ORGANIZATION

COMPUTER ORGANIZATION AND ARCHITECTURE
Mr. Amanuel
01/29/2020

Implementation of a simple data path

• We will design a simplified MIPS processor

• The instructions supported are


– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j

• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers
Why? Memory-reference instructions use it to compute the effective address, arithmetic
instructions use it to perform the operation, and control flow instructions use it to compare
registers.
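The generic steps above can be sketched in software. The following is a hypothetical Python model, not real MIPS tooling; the instruction encoding and register numbering are made up for illustration. It also shows the point about the ALU: the add uses it for the operation itself, and beq uses it for the comparison.

```python
# Hypothetical sketch of the generic implementation: the PC supplies the
# instruction address, the instruction is fetched from memory, registers
# are read, and the instruction decides exactly what to do.

def step(pc, instr_mem, regs):
    """Execute one instruction of a tiny MIPS-like subset; return new PC."""
    op, *args = instr_mem[pc]          # fetch + decode
    if op == "add":                    # arithmetic: ALU adds two registers
        rd, rs, rt = args
        regs[rd] = regs[rs] + regs[rt]
    elif op == "beq":                  # control flow: ALU compares registers
        rs, rt, target = args
        if regs[rs] == regs[rt]:
            return target              # branch taken
    return pc + 1                      # default: next sequential instruction

program = [("add", 2, 0, 1), ("beq", 2, 3, 0)]
regs = {0: 5, 1: 7, 2: 0, 3: 99}
pc = step(0, program, regs)            # regs[2] becomes 12, pc becomes 1
```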

What blocks we need

• We need an ALU
– We have already designed that
• We need memory to store instructions and data
– Instruction memory takes an address and supplies the instruction
– Data memory takes an address and supplies data for lw
– Data memory takes an address and data and writes into memory for sw

• We need to manage a PC and its update mechanism

• We need a register file with 32 registers


– We read two operands and write a result back into the register file
• Sometimes part of an operand comes from the instruction
• We may add support for the immediate class of instructions
• We may add support for J, JR, JAL
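As a rough software analogy (not a hardware description), the register file described above can be sketched like this; the class and method names are illustrative:

```python
# Illustrative sketch of a 32-register file with two read ports and one
# write port, as described above.

class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32           # 32 general-purpose registers

    def read(self, rs, rt):
        """Two read ports: return both source operands at once."""
        return self.regs[rs], self.regs[rt]

    def write(self, rd, value):
        """One write port; register 0 is hardwired to zero in MIPS."""
        if rd != 0:
            self.regs[rd] = value

rf = RegisterFile()
rf.write(8, 42)
rf.write(0, 99)                        # ignored: $zero stays 0
a, b = rf.read(8, 0)                   # (42, 0)
```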


Computer Organization | Hardwired vs. Micro-programmed Control Unit

To execute an instruction, the control unit of the CPU must generate the required control signals
in the proper sequence. There are two approaches to generating the control signals in the
proper sequence: the hardwired control unit and the micro-programmed control unit.
Hardwired Control Unit –
The control hardware can be viewed as a state machine that changes from one state to another in
every clock cycle, depending on the contents of the instruction register, the condition codes and


the external inputs. The outputs of the state machine are the control signals. The sequence of
operations carried out by this machine is determined by the wiring of the logic elements,
hence the name "hardwired".
 Fixed logic circuits that correspond directly to Boolean expressions are used to
generate the control signals.
 Hardwired control is faster than micro-programmed control; a controller that uses this
approach can operate at high speed.
 RISC architectures are based on a hardwired control unit.
 The design is complicated and decoding is complex.
 Flexibility is low: it is difficult to alter the hardwired logic for a new instruction.
 Implementation uses sequential circuits built from logic gates.
 Complex instructions are difficult to handle.
 Little or no control memory is required.
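The hardwired idea can be illustrated with a small sketch: the control signals are a fixed (combinational) function of the opcode, not a stored program. The signal names below are assumptions for illustration, not a real MIPS control specification:

```python
# Sketch of hardwired control: fixed logic maps an opcode directly to its
# control signals, like a combinational circuit.

def control(opcode):
    """Fixed logic mapping an opcode to its control signals."""
    signals = {"reg_write": 0, "mem_read": 0, "mem_write": 0, "branch": 0}
    if opcode == "add":
        signals["reg_write"] = 1       # write ALU result to register file
    elif opcode == "lw":
        signals["reg_write"] = 1       # write loaded data to register file
        signals["mem_read"] = 1
    elif opcode == "sw":
        signals["mem_write"] = 1
    elif opcode == "beq":
        signals["branch"] = 1
    return signals
```

For example, `control("lw")` asserts both `reg_write` and `mem_read` in a single evaluation, with no microinstruction fetches involved.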

Micro-programmed Control Unit –


A language used to design a micro-programmed control unit is known as a microprogramming
language. Each line describes a set of micro-operations occurring at one time and is known as a
microinstruction.
A sequence of instructions is known as a microprogram, or firmware.
 The control signals associated with operations are stored as Control Words in special
memory units inaccessible to the programmer.
 Control signals are generated by a program, similar to machine language programs.
 A micro-programmed control unit is slower because of the time it takes to fetch
microinstructions from the control memory.
 The design is systematic and decoding is fairly easy.
 Flexibility is high: it is easy to alter the machine instructions for a new operation.
 Implementation uses microprogramming with microinstructions.
 More control memory and chip area are required.
 Complex instructions are easy to handle, as in CISC microprocessors.
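By contrast, a micro-programmed unit can be sketched as a table of control words fetched one per step. The control words below are illustrative, not taken from any real machine:

```python
# Sketch of micro-programmed control: the control signals are stored as
# control words in a control memory and fetched one per step.

CONTROL_MEMORY = {
    # each machine instruction maps to a microprogram: a list of control words
    "lw": [
        {"alu_op": "add_base_offset"},   # compute effective address
        {"mem_read": 1},                 # read data memory
        {"reg_write": 1},                # write result to register file
    ],
    "add": [
        {"alu_op": "add"},
        {"reg_write": 1},
    ],
}

def run_microprogram(opcode):
    """Fetch control words one by one, as a microprogram sequencer would."""
    return [word for word in CONTROL_MEMORY[opcode]]

# lw needs three microinstruction fetches, add only two; this repeated
# fetching from control memory is why micro-programmed control is slower.
```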

Instruction pipelining 
is a technique used in the design of modern microprocessors, microcontrollers and CPUs to
increase their instruction throughput (the number of instructions that can be executed in a unit of
time).
The main idea is to divide (termed "split") the processing of a CPU instruction, as defined by the
instruction microcode, into a series of independent steps of micro-operations (also
called "microinstructions", "micro-op" or "µop"), with storage at the end of each step.
This allows the CPU's control logic to handle instructions at the processing rate of the slowest
step, which is much faster than the time needed to process the instruction as a single step.


The term pipeline refers to the fact that each step is carrying a single microinstruction (like a
drop of water), and each step is linked to another step (analogy; similar to water pipes).
Most modern CPUs are driven by a clock. The CPU consists internally of logic and memory (flip-
flops). When the clock signal arrives, the flip-flops store their new values, and the logic then
requires a period of time to decode the flip-flops' new values. Then the next clock pulse arrives,
the flip-flops store other values, and so on. By breaking the logic into smaller pieces and
inserting flip-flops between the pieces, the time required by the logic (to decode values and
generate valid outputs) is reduced. For example, the RISC pipeline is broken
into five stages with a set of flip-flops between each stage as follows:

1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back.
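The benefit of these five overlapping stages can be quantified with the standard ideal-pipeline cycle counts (assuming no stalls); the formulas below are the textbook idealization, not a model of any specific CPU:

```python
# With S stages, N instructions finish in N + S - 1 cycles instead of N * S,
# because the stages overlap.

def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles for an ideal pipeline with no stalls."""
    return n_instructions + n_stages - 1

def unpipelined_cycles(n_instructions, n_stages=5):
    """Cycles when each instruction runs all stages before the next starts."""
    return n_instructions * n_stages

# 100 instructions: 104 cycles pipelined vs 500 cycles unpipelined.
```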

Processors with pipelining consist internally of stages (modules) which can semi-independently
work on separate microinstructions. Each stage is linked by flip flops to the next stage (like a
"chain") so that the stage's output is an input to another stage until the job of processing
instructions is done. Such organization of processor internal modules reduces the instruction's
overall processing time.
A non-pipeline architecture is not as efficient because some CPU modules are idle while another
module is active during the instruction cycle. Pipelining does not completely remove idle time in
a pipelined CPU, but making CPU modules work in parallel increases instruction throughput.
An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock
cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.


Advantages and Disadvantages of Pipelining


Advantages of Pipelining:

1. The cycle time of the processor is reduced, increasing the instruction throughput.
Pipelining doesn't reduce the time it takes to complete an instruction; instead it increases
the number of instructions that can be processed simultaneously ("at once") and reduces
the delay between completed instructions (called 'throughput').
The more pipeline stages a processor has, the more instructions it can process "at once"
and the less delay there is between completed instructions. Every predominant
general-purpose microprocessor manufactured today uses at least 2 pipeline stages, and
some use 30 or 40 stages.
2. If pipelining is used, the CPU's arithmetic logic unit can be designed to be faster, but it
will be more complex.
3. Pipelining in theory increases performance over an un-pipelined core by a factor of the
number of stages (assuming the clock frequency also increases by the same factor) and
the code is ideal for pipeline execution.
4. Pipelined CPUs generally work at a higher clock frequency than the RAM clock
frequency (as of 2008, RAM works at low frequencies compared to CPU
frequencies), increasing the computer's overall performance.
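Point 3 above can be checked numerically: under the stated assumptions, the ideal speedup approaches the number of stages as the instruction count grows.

```python
# Ideal pipeline speedup: ratio of unpipelined to pipelined cycle counts,
# assuming the clock frequency also scales with the stage count.

def speedup(n_instructions, n_stages):
    """Unpipelined time divided by ideal pipelined time (in cycles)."""
    return (n_instructions * n_stages) / (n_instructions + n_stages - 1)

# speedup(5, 5) = 25/9, about 2.78; for a million instructions the
# speedup is about 4.99998, approaching the stage count of 5.
```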
Disadvantages of Pipelining:
Pipelining has many disadvantages, though there are many techniques used by CPU
and compiler designers to overcome most of them; the following is a list of common
drawbacks:

1. The design of a non-pipelined processor is simpler and cheaper to manufacture; a
non-pipelined processor executes only a single instruction at a time. This prevents
branch delays (in pipelining, every branch is delayed) as well as problems that arise
when serial instructions are executed concurrently.
2. In a pipelined processor, the insertion of flip-flops between modules increases the
instruction latency compared to a non-pipelined processor.
3. A non-pipelined processor will have a defined instruction throughput. The
performance of a pipelined processor is much harder to predict and may vary widely
for different programs.
4. Hazards: When a programmer (or compiler) writes assembly code, they generally
assume that each instruction completes before the next instruction begins. When
pipelining invalidates this assumption, the program may behave incorrectly; this
situation is known as a hazard.
Various techniques exist for resolving hazards or working around them, such as
forwarding and delaying (inserting a stall, or wasted clock cycle).


Example 1


A typical instruction to add two numbers might be ADD A, B, C, which adds the values
found in memory locations A and B, and then puts the result in memory location C. In a
pipelined processor the pipeline controller would break this into a series of tasks similar to:

LOAD A, R1
LOAD B, R2
ADD R1, R2, R3
STORE R3, C
LOAD next instruction

The locations 'R1' and 'R2' are registers in the CPU. The values stored in memory locations
labeled 'A' and 'B' are loaded (copied) into these registers, then added, and the result is stored
in a memory location labeled 'C'.
In this example the pipeline is three stages long: load, execute, and store. Each of the
steps is called a pipeline stage.
On a non-pipelined processor, only one stage can be working at a time, so the entire
instruction has to complete before the next instruction can begin. On a pipelined processor,
all of the stages can work at once on different instructions: while one instruction is in the
execute stage, a second instruction is in the load stage and a third instruction is being
fetched.
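The overlap described above can be sketched as a small schedule for the three-stage example pipeline; the schedule is purely illustrative:

```python
# Sketch of which instruction is in which stage each cycle in the
# three-stage (load, execute, store) example pipeline.

instructions = ["LOAD A, R1", "LOAD B, R2", "ADD R1, R2, R3", "STORE R3, C"]
stages = ["load", "execute", "store"]

def schedule(instrs, n_stages=3):
    """Return {cycle: [(stage, instruction), ...]} for an ideal pipeline."""
    table = {}
    for i, instr in enumerate(instrs):
        for s in range(n_stages):
            table.setdefault(i + s, []).append((stages[s], instr))
    return table

timeline = schedule(instructions)
# In cycle 2, three instructions are in flight at once: ADD is in the load
# stage, LOAD B is executing, and LOAD A is storing.
```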
Introduction to instruction level parallelism.
Introduces the concept of pipelined architecture, using as a case study the design of a
five-stage basic processor. It also introduces the concept of hazards and classifies the
different kinds of hazards. Finally, it introduces the idea of multicycle operations.
Two strategies to support ILP:
• Dynamic Scheduling: depend on the hardware to locate parallelism
• The hardware dynamically reorders the instruction execution to reduce pipeline stalls
while maintaining data flow and exception behavior.
Main advantages (PROs):
• It enables handling some cases where dependences are unknown at compile time
• It simplifies the compiler complexity
• It allows compiled code to run efficiently on a different pipeline (code portability).
Those advantages are gained at a cost of (CONs):
• A significant increase in hardware complexity
• Increased power consumption
• Could generate imprecise exceptions
• Simple pipeline: hazards due to data dependences that cannot be hidden by forwarding
stall the pipeline: no new instructions are fetched or issued.
• Dynamic scheduling: hardware reorders instruction execution so as to reduce stalls,
maintaining data flow and exception behavior.
• Typical example: Superscalar Processor


Static Scheduling: Rely on the compiler to identify potential parallelism


• Hardware intensive approaches dominate desktop and server markets
• Static detection and resolution of dependences
 static scheduling: accomplished by the compiler
 dependences are avoided by code reordering; the compiler's output is reordered,
dependency-free code.
Typical example: VLIW (Very Long Instruction Word) processors expect dependency-free
code generated by the compiler.
• Compilers can use sophisticated algorithms for code scheduling to exploit ILP
(Instruction Level Parallelism).
• The size of a basic block – a straight-line code sequence with no branches in except at
the entry and no branches out except at the exit – is usually quite small, so the amount of
parallelism available within a basic block is limited.
• Example: for typical MIPS programs the average branch frequency is between 15% and
25%, so from 4 to 7 instructions execute between a pair of branches.
• Data dependence can further limit the amount of ILP we can exploit within a basic block to
much less than the average basic block size.
• To obtain substantial performance enhancements, we must exploit ILP across multiple
basic blocks (i.e. across branches such as in trace scheduling).
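The 4-to-7 figure quoted above follows directly from the branch frequency: on average, one over the branch frequency instructions execute between branches.

```python
# If 15% to 25% of instructions are branches, then on average
# 1/0.25 = 4 to 1/0.15 (about 6.7) instructions execute between branches.

def instructions_between_branches(branch_frequency):
    """Average run length between branches for a given branch frequency."""
    return 1 / branch_frequency

low = instructions_between_branches(0.25)    # 4.0
high = instructions_between_branches(0.15)   # about 6.67
```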

Introduction to instruction level parallelism has the following general structure:


1. Introduction to pipelining.
2. Hazards.
3. Multicycle operations.

Introduction to pipelining
Pipelining is an implementation technique based on breaking down each instruction's
execution into multiple stages. This technique does not affect (or only minimally affects)
each instruction's latency.
However, it increases throughput, since (ideally) an instruction finishes every cycle.
In the materials, details are given for the design of a five-stage simplified pipelined processor:
 Instruction Fetch (IF). Instruction read and program counter update.
 Instruction Decode (ID). Instruction decode, register read, sign extension of offsets, and
possible branch address computation.
 Execution (EX). ALU operation on registers and computation of the effective branch
address.
 Memory (M). Memory read or write.
 Write-back (WB). Result written to the register file.
Hazards
A hazard is a situation preventing the next instruction from starting in the expected clock cycle.


These situations reduce the performance of pipelined architectures.

Hazards may be classified into three different kinds: structural hazards, data hazards and
control hazards. The simplest (and least efficient) approach is to stall the instruction flow
until the hazard has been eliminated.
Structural hazards happen when hardware cannot support all possible instruction sequences. This
happens if two stages need to make use of the same hardware resource.
Most common reasons are the presence of functional units that are not fully pipelined or
functional units that are not fully duplicated.
In general, structural hazards can be avoided during design, but at an increased cost in the
resulting hardware.
A data hazard happens when pipelining modifies the read/write access order to operands. Data
hazards can be of three different kinds: RAW (Read After Write), WAR (Write After Read),
and WAW (Write After Write). Of those three, only RAW hazards may happen in a MIPS-type
five-stage architecture. RAW data hazards can sometimes be solved through the use of the
forwarding technique.
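A RAW hazard between adjacent instructions can be sketched as a simple check: the hazard exists when an instruction reads a register that the previous instruction writes. The tuple encoding below is an illustrative assumption, not a real instruction format:

```python
# Sketch of RAW hazard detection. Forwarding resolves such a hazard by
# routing the ALU result straight to the next instruction's input instead
# of waiting for the write-back stage.

def raw_hazard(producer, consumer):
    """producer/consumer are (dest_reg, (src_regs...)) tuples."""
    dest, _ = producer
    _, sources = consumer
    return dest in sources

i1 = ("r3", ("r1", "r2"))   # add r3, r1, r2
i2 = ("r5", ("r3", "r4"))   # sub r5, r3, r4 -- reads r3 right after write
# raw_hazard(i1, i2) is True: without forwarding the pipeline must stall.
```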
A control hazard happens in a branch instruction, when the value to be used to make the branch
decision is still unknown. Control hazards can be solved at compile time (static solutions) or at
runtime (dynamic solutions).
Static solutions for control hazards may range from pipeline freezing or fixed prediction
(always predict taken or always predict not taken) to using branches with delay slots.
Dynamic solutions may use a Branch History Table (BHT) and a state machine to perform
prediction. Thus, there is a state machine associated with each entry in the table.
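The per-entry state machine is commonly a 2-bit saturating counter; the sketch below assumes that choice (states 0-1 predict not taken, states 2-3 predict taken, and each outcome moves the counter one step):

```python
# Sketch of one BHT entry as a 2-bit saturating counter state machine.

class TwoBitPredictor:
    def __init__(self, state=2):       # start in "weakly taken"
        self.state = state

    def predict(self):
        return self.state >= 2         # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
p.update(False)                        # one not-taken: state 1, predicts not taken
p.update(True)                         # back to state 2: predicts taken again
```

A single mispredicted branch does not immediately flip a strongly-held prediction, which is the advantage over a 1-bit scheme.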

Multicycle operations
Allocating a single cycle for floating-point operations would require an extremely long clock
cycle or very complex floating-point logic (consuming substantial design resources).
Alternatively, the floating-point unit itself may be pipelined, so that instructions require
multiple cycles for each execution stage.

QUESTIONS
1. _____ is a situation preventing the next instruction from starting in the expected clock cycle.

A. Hazard  B. Pipeline  C. Multicycle operation  D. Hardwired control unit

2. _____ depends on the hardware to locate parallelism.

A. Static scheduling  B. ILP  C. Dynamic scheduling  D. All
