coa 4


21CS304: COMPUTER ORGANIZATION AND ARCHITECTURE
MODULE – 4
Arithmetic
Numbers
Arithmetic Operations and Characters
Addition and Subtraction of Signed Numbers
Addition/Subtraction Logic Unit
Design of Fast Adders
Carry-Lookahead Addition
Multiplication of Positive Numbers

Basic Processing Unit


Some Fundamental Concepts
Execution of a Complete Instruction
Multiple Bus Organization
Hard-wired Control
Microprogrammed Control


M4- ARITHMETIC
4.1 Introduction
4.1.1 Numbers

Addition and subtraction are basic operations performed by a digital computer. The Arithmetic and Logic Unit
(ALU) performs these operations along with logical operations such as AND, OR, NOT and XOR.

10001 -11001 20010 -21010


30011 -31011 40100 -41100
50101 -51101 60110 -61110
70111 -71111 00000 -01000

So, in the sign-and-magnitude representation, the only difference between a +ve number and the corresponding -ve
number is the MSB.

(a) 1’s Complement


 Positive numbers are straightforward:
0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111
 Negative numbers are the bitwise opposite of the +ve numbers, i.e. every 0 becomes 1 and every 1 becomes 0:
-0 = 1111, -1 = 1110, -2 = 1101, -3 = 1100, -4 = 1011, -5 = 1010, -6 = 1001, -7 = 1000
 In other words, the 1’s complement of an n-bit number is obtained by subtracting the number
from 2^n - 1 (for n = 4: 1111 - (+ve number) = -ve number).
Ex: 7 = 0111, hence -7 = 1111 - 0111 = 1000
5 = 0101, hence -5 = 1111 - 0101 = 1010
(b) 2’s Complement
2’s complement of a number is found as below:
Step 1: Calculate the 1’s complement of the number
Step 2: Add 1 to the result
Ex 1: Find the 2’s complement representation of -4
Solution: 4 is represented as 0100
Step 1: 1’s complement of 0100 is 1011 (each 0 becomes 1 and each 1 becomes 0)
Step 2: Add 1 to this:
  1011
+ 0001
------
  1100
Ex 2: Find the 2’s complement representation of -6
Solution: 6 is represented as 0110
Step 1: 1’s complement of 0110 is 1001
Step 2: Add 1 to this:
  1001
+ 0001
------
  1010
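
The two steps above (invert the bits, then add 1) can be sketched in Python as follows. This is only an illustrative sketch: the function name `negate_twos_complement` and the `bits` parameter are assumptions made for this example, not part of the notes.

```python
def negate_twos_complement(magnitude: int, bits: int = 4) -> str:
    """Return the bit pattern of -magnitude in `bits`-wide 2's complement.

    Hypothetical helper for illustration; assumes 0 <= magnitude < 2**bits.
    """
    mask = (1 << bits) - 1            # e.g. 1111 for 4 bits
    ones = magnitude ^ mask           # Step 1: 1's complement (flip every bit)
    twos = (ones + 1) & mask          # Step 2: add 1, keep only `bits` bits
    return format(twos, f"0{bits}b")


print(negate_twos_complement(4))  # 1100
print(negate_twos_complement(6))  # 1010
```

The two printed patterns match Ex 1 and Ex 2 above.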
4.1.2 : Addition of Positive numbers
Example (single-bit additions):
  0      0      1      1
+ 0    + 1    + 0    + 1
---    ---    ---    ---
  0      1      1     10

Carry bit
When we add two 1’s, a carry bit is generated, which is moved to the next higher bit position. In the
example below, the addition at the 3rd bit position (counting from the right) generates a carry, which is added into the 4th bit.
Ex:   0101
    + 0110
    ------
      1011

4.1.3 Addition and Subtraction of signed numbers


 Now let’s generalize and try to add and subtract two signed numbers (i.e. using 2’s complement).
 Adding two signed numbers (Rule 1)
 Represent both numbers in n-bit 2’s complement form
 Add the n-bit representations of the two numbers
 If a carry bit is generated out of the MSB position, simply ignore it

Ex: 2 + 3 = ?                 1 + 6 = ?

    2 → 0010                  1 → 0001
  + 3 → 0011                + 6 → 0110
    5 → 0101                  7 → 0111

Ex: (-5) + (-2) =? (in 2’s complement)


Step 1:
5 is written as 0101
-5 in 1’s complement is the bitwise opposite of 5, i.e. 1010
-5 in 2’s complement = (-5 in 1’s complement) + 1
  1010
+ 0001
------
  1011 → -5 (1)
Step 2:
2 is written as 0010
-2 in 1’s complement is the bitwise opposite of 2, i.e. 1101
-2 in 2’s complement = (-2 in 1’s complement) + 1
  1101
+ 0001
------
  1110 → -2 (2)
Step 3: Add (1) and (2)
   1011
 + 1110
 ------
  11001
Ignoring the carry out of the MSB gives 1001, which is -7 in 2’s complement.

Note: If you already have 2’s complement of numbers then you simply add them

Ex: Add 0111 and 1101


Solution:  0111
         + 1101
         ------
          10100

Ignore the carry out of the MSB (the leading 1); hence the solution is 0100.

Rule 2 Subtracting two signed numbers


If X and Y are two n-bit numbers, then X-Y is done as below:
(1) Find 2’s complement of Y
(2) Add the value to X

Example 1: What is the value of
  1101 → X
- 1001 → Y
------
     ?
Step 1: Find the 2’s complement of Y (1001). The 1’s complement of 1001 is 0110; adding 1 to it
gives the 2’s complement, 0111.
Step 2: Add this value (0111) to X (1101). The result is 10100. Ignoring the carry bit, we get
the answer 0100.

Example 2: What is the value of
  0110 → X
- 0011 → Y
------
     ?
Step 1: Find the 2’s complement of Y (0011). The 1’s complement of 0011 is 1100; adding 1 to it
gives the 2’s complement, 1101.
Step 2: Add this value (1101) to X (0110). The result is 10011. Ignoring the carry bit, we get the
answer 0011.

Example 3: In all the above examples the operands were already given as bit patterns. But how do we calculate (-7) - (-5)?
Solution: First obtain -7 in 2’s complement.
7 is written as 0111
-7 in 1’s complement is the bitwise opposite of 7, i.e. 1000
-7 in 2’s complement is obtained by adding 1 to this, which gives 1001 (1)

Now -(-5) is equal to +5, i.e. 0101 (2)

Adding (1) and (2), i.e. 1001 + 0101, we get 1110, which is -2.

Example 4: Let’s take another example: (-7) - 1 = ?
Solution: Step 1: Find the 2’s complement representations of -7 and -1 (subtracting 1 is the same as adding -1).
Step 2: Add these two values.
The 2’s complement representation of -7 is 1001.
The 2’s complement representation of -1 is 1111.
Adding both values gives 11000. Ignoring the carry out of the MSB gives the answer 1000, i.e. -8.
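
The addition and subtraction rules can be checked with a short Python sketch. It assumes a fixed word length of 4 bits; the names `BITS`, `encode`, `decode`, `add` and `sub` are hypothetical helpers introduced only for this illustration.

```python
BITS = 4                      # word length assumed for this sketch
MASK = (1 << BITS) - 1        # 1111


def encode(n: int) -> int:
    """2's-complement pattern of the signed value n, as an unsigned integer."""
    return n & MASK


def decode(pattern: int) -> int:
    """Signed value represented by a BITS-wide 2's-complement pattern."""
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern


def add(x: int, y: int) -> int:
    """Rule 1: add the patterns and ignore the carry out of the MSB."""
    return (x + y) & MASK


def sub(x: int, y: int) -> int:
    """Rule 2: X - Y = X + (2's complement of Y)."""
    return add(x, ((y ^ MASK) + 1) & MASK)


print(format(add(0b1011, 0b1110), "04b"))          # 1001  ((-5) + (-2) = -7)
print(format(sub(0b1101, 0b1001), "04b"))          # 0100  (Example 1)
print(format(sub(encode(-7), encode(-5)), "04b"))  # 1110  (Example 3, i.e. -2)
print(decode(sub(encode(-7), encode(1))))          # -8    (Example 4)
```

Masking with `MASK` plays the role of "ignore the carry bit" in the worked examples.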

4.1.4 Overflow in Integer Arithmetic

Case 1: Let’s take a simple example to add two numbers i.e. 7 + 1 = ? When the operation is in binary bits,
the following may be obtained:
0111 (i.e. 7) + 0001 (i.e. 1) = 1000. But 1000 is -8 !!!

So, 7 + 1 = -8. This is clearly wrong. Thus, an overflow occurs at MSB position.

Case 2: When we add -4 and -6 we get 1100 + 1010 = 1 0110 i.e. +6 !!!.
So, -4 + -6 results in +6! This is wrong. There is again an overflow.

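 When does an overflow occur?
Overflow can occur only when we add two numbers of the same sign (i.e. two positive numbers or two negative
numbers). Adding two numbers of opposite sign never overflows.
In case 2 two things can be observed:
(a) A carry bit is produced out of the MSB position
(b) The sign at the MSB changes
Only (b) signals an overflow. A carry out of the MSB alone does not: the (-5) + (-2) example above also produces a carry, yet its result is correct.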

 How to detect an overflow


If addition of two positive numbers results in a –ve number (case 1) or when adding two negative
numbers results in +ve number, that is change in sign occurs when adding 2 n-bit numbers, then
overflow occurs.

4.1.5 Characters
Computers don’t handle only numbers; they also handle non-numeric data such as letters, punctuation
marks etc. Hence, codes are used to represent all of these. One such code is ASCII. “Unicode” is
another code, which also covers characters of other languages such as Hindi. For example, hex value 0x0905 represents अ
and 0x0906 represents आ.
Summary:
Wheel representation of 2’s complement numbers.

Let us consider adding +7 to -3. To do this using the wheel option, first
locate 7 (i.e. 0111) then move 13 steps to the right (13 steps because 2's
complement for -3 is 1101 which is 13). This gives the answer 0100 (+4)

The table shown below depicts all the 3 forms of representation of numbers:


4.2 Addition and Subtraction of Signed Numbers


 Let X and Y be any two n-bit numbers, and let xi and yi be the ith bits of X and Y respectively.
Let Ci be the carry coming from the (i-1)th stage, i.e. from the addition of xi-1 and yi-1.
The various possibilities are shown below:

xi   yi   Carry-in Ci   Sum Si   Carry-out Ci+1
0    0    0             0        0
0    0    1             1        0
0    1    0             1        0
0    1    1             0        1
1    0    0             1        0
1    0    1             0        1
1    1    0             0        1
1    1    1             1        1

 Sum (Si) Logic
If you look carefully, the sum bit Si is 1 (i.e. ON) in 4 scenarios.
Scenario 1: xi = 0, yi = 0, Ci = 1, i.e. $\bar{x}_i \bar{y}_i c_i$
Scenario 2: xi = 0, yi = 1, Ci = 0, i.e. $\bar{x}_i y_i \bar{c}_i$
Scenario 3: xi = 1, yi = 0, Ci = 0, i.e. $x_i \bar{y}_i \bar{c}_i$
Scenario 4: xi = 1, yi = 1, Ci = 1, i.e. $x_i y_i c_i$
Hence, $s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i$ (Note: here + means logical OR, not arithmetic addition)
In short, $s_i = x_i \oplus y_i \oplus c_i$, i.e. xi XOR yi XOR Ci
 Carry (Ci+1) Logic
The carry-out is 1 in the following scenarios:
Scenario 1: xi = 0, yi = 1, Ci = 1, i.e. $\bar{x}_i y_i c_i$, which is covered by $y_i c_i$
Scenario 2: xi = 1, yi = 0, Ci = 1, i.e. $x_i \bar{y}_i c_i$, which is covered by $x_i c_i$
Scenario 3: xi = 1, yi = 1, Ci = 0, i.e. $x_i y_i \bar{c}_i$, which is covered by $x_i y_i$
Scenario 4: xi = 1, yi = 1, Ci = 1, i.e. $x_i y_i c_i$
Scenario 3 and Scenario 4 together are covered by the single term $x_i y_i$.
Hence, the carry-out is $c_{i+1} = y_i c_i + x_i c_i + x_i y_i$
 Circuits for Si and Ci+1

Both the above circuits can be put together and shown as a full adder (FA) as below:

Figure 6.2a: Logic for a single stage


The above figure represents the addition of two single bits. However, numbers are n bits long. Hence, a
cascade of full adders is required to add two n-bit numbers X and Y. Such a cascaded circuit, where the carry
bit ripples from one FA to the next, is called an “n-bit ripple-carry adder”, as shown below:


Figure 6.2b: An n-bit ripple-carry adder
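
The full adder equations and the ripple-carry cascade can be sketched in Python as below. This is a behavioural sketch only, not a gate-level model; the function names and the LSB-first list convention are assumptions made for this example.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One full-adder stage: returns (Si, Ci+1) for input bits xi, yi, Ci."""
    s = x ^ y ^ c                          # Si   = xi XOR yi XOR Ci
    c_out = (y & c) | (x & c) | (x & y)    # Ci+1 = yiCi + xiCi + xiyi
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists given LSB first; returns (sum_bits, Cn)."""
    carry = c0
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


# 0101 + 0110, written LSB first
print(ripple_carry_add([1, 0, 1, 0], [0, 1, 1, 0]))  # ([1, 1, 0, 1], 0), i.e. 1011
```

Each loop iteration has to wait for the carry of the previous one, which mirrors why the ripple-carry adder becomes slow for large n.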

An n-bit adder can itself be cascaded: k such n-bit adders form a kn-bit adder.

(Ex: If n = 8 and we want to add two 32-bit numbers, then we need k = 4 such ripple-carry adders.)
A cascade of k n-bit adders is shown as:

4.2.1 Addition/Subtraction Logic Unit

Addition:
Addition of two n-bit numbers is fairly straightforward:
Step 1: Represent X in n-bit 2’s complement form
Step 2: Represent Y in n-bit 2’s complement form
Step 3: Use Fig 6.2 (b) to add these numbers. C0 is set to 0 in this case.
Xn-1 and Yn-1 are the sign bits (MSB).

 Detecting overflow during Addition (Method 1)


Let’s recall when an overflow can occur:
  0111
+ 0001
------
  1000 → There is an overflow: both operands are positive but the MSB of the result is 1

  1000
+ 1111
------
  0111 → There is an overflow: both operands are negative but the MSB of the result is 0

The problem with overflow is that the sum has a different sign compared to the operands.
Note: Overflow is seen when
xn-1 = 0, yn-1 = 0 and Sn-1 = 1, i.e. $\bar{x}_{n-1} \bar{y}_{n-1} s_{n-1}$
or
xn-1 = 1, yn-1 = 1 and Sn-1 = 0, i.e. $x_{n-1} y_{n-1} \bar{s}_{n-1}$

A circuit to detect overflow implements:

$Overflow = \bar{x}_{n-1} \bar{y}_{n-1} s_{n-1} + x_{n-1} y_{n-1} \bar{s}_{n-1}$
 Detecting overflow during Addition (Method 2)
If the carry into the MSB stage (Cn-1) and the carry out of it (Cn) are different, then an overflow has occurred.

In other words, overflow is detected by Cn ⊕ Cn-1 (i.e. Cn XOR Cn-1). If it is 1,
an overflow occurred; otherwise no overflow occurred.
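
Both detection methods can be compared with a small Python sketch. The function name `add_with_overflow` and its return layout are assumptions for illustration; the carry into the MSB stage is obtained here by adding only the lower n-1 bits.

```python
def add_with_overflow(x: int, y: int, bits: int = 4):
    """Add two `bits`-wide patterns; return (sum, overflow_method1, overflow_method2)."""
    mask = (1 << bits) - 1
    msb = bits - 1
    x, y = x & mask, y & mask

    low = mask >> 1                                 # selects bits 0 .. n-2
    c_into_msb = ((x & low) + (y & low)) >> msb     # Cn-1
    total = x + y
    c_out = total >> bits                           # Cn
    s = total & mask

    xs, ys, ss = x >> msb, y >> msb, s >> msb       # sign bits of X, Y and the sum
    method1 = ((1 - xs) & (1 - ys) & ss) | (xs & ys & (1 - ss))
    method2 = c_into_msb ^ c_out
    return s, method1, method2


print(add_with_overflow(0b0111, 0b0001))  # (8, 1, 1): 7 + 1 overflows
print(add_with_overflow(0b1100, 0b1010))  # (6, 1, 1): (-4) + (-6) overflows
print(add_with_overflow(0b0010, 0b0011))  # (5, 0, 0): 2 + 3 does not overflow
```

For every pair of 4-bit operands the two methods give the same answer, which is what the equivalence of the two detection circuits predicts.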

 Binary Addition-Subtraction logical network:


The below logic diagram can be used to do both addition and subtraction.

Figure 6.3: Binary addition-subtraction logic network

This is an interesting circuit. Let’s see how we can add and subtract two n-bit numbers using this single
circuit. First, note that each bit of Y is XORed with the Add/Sub control line before being fed to the adder. The XOR
logic follows the table below:

yi   Add/Sub control   yi ⊕ Add/Sub control
0    0                 0
0    1                 1
1    0                 1
1    1                 0

Addition:
Note: If Add/Sub = 0, then yi ⊕ Add/Sub is the same as yi. So, to add X and Y we need to
do the following:
Step 1: Pass X to lines X0 … Xn-1
Step 2: Pass Y to lines Y0 … Yn-1
Step 3: Pass 0 to the Add/Sub line, which also serves as the carry-in C0.

Subtraction:
So, the circuit should do the following
(a) Calculate 2’s complement of Y (i.e. ex 1)
(b) Add it to X (ex 7)

So, to subtract 2 n bit numbers, following is done:


Step 1: Pass X to line X0…Xn-1
Step 2: Pass Y to line Y0---Yn-1
Step 3: Pass 1 to add/Sub control line
And the two numbers are subtracted!!

The XOR gates on the Y inputs convert Y to its 1’s complement. The Add/Sub control line also sets the
carry-in C0 to 1, which adds 1 to the sum and so turns the 1’s complement into the 2’s complement of Y.
Hence the circuit computes X + (2’s complement of Y), i.e. X - Y.
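
The behaviour of the addition/subtraction network can be sketched in Python as follows. The function name `add_sub_unit` and the integer-based modelling are assumptions for illustration; the sketch models the XOR gates on Y and the control line feeding C0, not the individual full adders.

```python
def add_sub_unit(x: int, y: int, control: int, bits: int = 4) -> int:
    """Model of the add/subtract network: control = 0 adds, control = 1 subtracts."""
    mask = (1 << bits) - 1
    y_in = (y ^ (mask if control else 0)) & mask   # every yi is XORed with the control line
    total = (x & mask) + y_in + control            # the control line is also the carry-in C0
    return total & mask                            # carry out of the MSB is discarded


print(format(add_sub_unit(0b0010, 0b0011, 0), "04b"))  # 0101  (2 + 3)
print(format(add_sub_unit(0b0110, 0b0011, 1), "04b"))  # 0011  (6 - 3)
print(format(add_sub_unit(0b1101, 0b1001, 1), "04b"))  # 0100  ((-3) - (-7) = 4)
```

A single control bit therefore selects between X + Y and X - Y without any extra adder hardware.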

4.3 Design of Fast Adders:


In the n-bit ripple-carry adder, a significant delay accumulates before the complete sum is available.
There are three sources of delay.
(a) In each full adder, the carry passes through two gate levels (an AND followed by an OR).
Hence, for n-bit addition the carry has to pass through 2n gates.
(b) Additionally, each y input passes through an XOR gate (Fig 6.3). This adds 1 to the gate count.
(c) Overflow detection (Cn ⊕ Cn-1) requires one more gate.
Hence, the number of gate delays before the result, including the overflow indication, is available is 2n+2.
There are two ways to reduce the delay.
(a) Implement the n-bit ripple-carry adder using the fastest available electronic technology.
(b) Use a larger gate network than the one in Fig 6.2b, so that the carries are produced faster.

4.3.1 Carry-Look Ahead Addition


We know that the sum and carry for the ith bits of x and y are
Si = xi ⊕ yi ⊕ Ci and Ci+1 = xiyi + xiCi + yiCi

 The carry equation can be written as Ci+1 = xiyi + (xi + yi)Ci

If we let Gi = xiyi and Pi = xi + yi, then Ci+1 = Gi + PiCi
Gi is called the Generate function and Pi is called the Propagate function.

 We can use Pi = xi ⊕ yi instead of Pi = xi + yi and the equation for Ci+1 still holds, for the following reason.

xi   yi   xi + yi   xi ⊕ yi
0    0    0         0
0    1    1         1
1    0    1         1
1    1    1         0
Note: + means logical OR and ⊕ means XOR
The only difference is the last row, where xi ⊕ yi gives 0 instead of 1. But in that row Gi = xiyi = 1·1 = 1,
so Ci+1 = Gi + PiCi = 1 + 0 = 1 regardless. Hence, we can write
Ci+1 = Gi + PiCi where Gi = xiyi and Pi = xi ⊕ yi, and we know Si = xi ⊕ yi ⊕ Ci = Pi ⊕ Ci
 So, we can write the basic cell for a single-bit adder as below.

We can recursively write Ci+1 as below

Ci+1 = Gi + PiCi
=Gi + Pi[Gi-1 + Pi-1Ci-1]
=Gi + PiGi-1 + PiPi-1Ci-1
= Gi + PiGi-1 + PiPi-1[Gi-2+ Pi-2Ci-2]
=Gi + PiGi-1 + PiPi-1Gi-2 + PiPi-1Pi-2Ci-2
Ci+1 = Gi + PiGi-1 + PiPi-1Gi-2+…..+PiPi-1….P1G0 + PiPi-1…P0C0

One important thing to notice: to calculate the (i+1)th carry you need only the Pi and Gi values and C0. You do not
need to wait for the carry to ripple through the chain.

As soon as X, Y and C0 are applied, every carry is obtained in 3 gate delays (instead of up to 2n gate
delays in the ripple-carry adder), as below:
o One gate delay to calculate ALL Pi and Gi
o One gate delay for the AND logic (ex: PiGi-1)
o One more gate delay for the OR logic (ex: Gi + PiGi-1 + …)
Hence, in 3 gate delays we get the carry.
o For the sum we need one final XOR, i.e. one more gate delay.
Hence, the sum is obtained in four gate delays.

 A 4-bit carry-lookahead adder circuit is given in the following figure:

The carry C1, C2, C3, C4 can be represented in terms of G and P as below:
C1 = G0 + P0C0
C2 = G1 + P1C1 = G1 + P1(G0+ P0C0) = G1+ P1G0 + P1P0C0
C3 = G2+ P2C2 = G2+ P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3C3 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
The above circuit is called a carry-lookahead adder.
The carry-lookahead circuit calculates C1, C2, C3, C4 using Pi and Gi. This circuit needs 3 gate delays for all
carries and 4 gate delays for all sum bits. In comparison, the 4-bit ripple-carry adder needs 7 gate delays for S3
and 8 gate delays for C4.
 Multiple 4-bit carry-lookahead adders can be cascaded to implement an n-bit adder.
 For example: Eight 4-bit adders can be connected together to form a 32-bit adder.
For comparison, in a 32-bit ripple-carry adder the sum S31 and carry C32 are available after 63 and 64 gate delays respectively.
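
The carry equations C1 to C4 can be checked with a short Python sketch of a 4-bit carry-lookahead adder. The helper names (`to_bits`, `from_bits`, `cla_4bit`) and the LSB-first list convention are assumptions for this example. Pi = xi XOR yi is used, as justified above.

```python
def to_bits(value: int, n: int = 4) -> list[int]:
    """Bits of value, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def cla_4bit(x_bits, y_bits, c0=0):
    """4-bit carry-lookahead addition; returns (sum_bits, C4)."""
    g = [x & y for x, y in zip(x_bits, y_bits)]   # Gi = xi yi
    p = [x ^ y for x, y in zip(x_bits, y_bits)]   # Pi = xi XOR yi
    # Every carry is written directly in terms of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = [p[i] ^ carries[i] for i in range(4)]     # Si = Pi XOR Ci
    return s, c4


s, c4 = cla_4bit(to_bits(0b0101), to_bits(0b0110))
print(format(from_bits(s), "04b"), c4)  # 1011 0
```

In hardware all four carry expressions are evaluated in parallel; the sequential Python statements only reproduce their values, not their timing.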

Higher – level Generate and Propagate Function

The figure below shows the 16-bit carry –look ahead adder built using 4-bit adders.

For the above figure 6.5, we have


P0I = P3P2P1P0
G0I = G3+ P3G2 + P3P2G1 + P3P2P1G0
The final carry generated is C16. This can be represented as:
C16 = G3I + P3IG2I + P3IP2IG1I + P3IP2IP1IG0I + P3IP2IP1IP0IC0
Now let’s see the delay
 The sum S15 is available after 8 gate delays. The C16 carry is available after 5 gate delays.
If a 32-bit adder is created by cascading two 16-bit adders, then in such a configuration
 The sum S31 is available in 10 gate delays.
 The carry C31 is available in 9 gate delays.
However, if the same 32-bit adder is created by cascading eight 4-bit lookahead adders, then the
gate delays will be:
 The sum S31 is available in 18 gate delays
 The carry C31 is available in 17 gate delays

4.4 Multiplication of Positive Numbers:


 When two “n” bits numbers are multiplied by hand it is done as below. For ex:

 The above method of multiplication can be implemented as shown:


The square box represents a single cell that implements partial product for one bit as shown:

o Each row i, where 0 ≤ i ≤ 3, adds the multiplicand to the incoming partial product PPi to generate
the outgoing partial product PP(i+1) if qi = 1. However, if qi = 0, PPi is passed vertically downwards
unchanged.
o Note: The worst-case signal propagation delay path is from the upper right corner of the array to
the high-order product bit at the bottom left corner of the array.
o The path has a total of 6(n-1)-1 gate delays, including the initial AND gate delay in all cells, for an n × n
array.

 The other method to perform multiplication is to use the adder circuitry in the ALU for a number of
sequential steps. The figure below shows the sequential circuit binary multiplier:

Now, let’s take the example as before:


M = 1101, A = 0000, Q = 1011, C = 0
As per above circuit diagram the following has to be done
Step 1: Add M and A (only if the LSB of Q, i.e. q0, is 1).
Step 2: Store the result in A and the carry bit (if any) in C.
Step 3: Shift the pattern “C A Q” right by 1 bit.

Round 1
Step 1: Add M and A (only if q0 = 1; since q0 = 1 here, we add): A + M = 0000 + 1101 = 1101, carry 0.

Step 2: Store the result in A and the carry in C: C = 0, A = 1101.

Step 3: Shift C A Q right by 1 bit. The new values in C, A and Q after the shift are
C = 0, A = 0110, Q = 1101

Round 2:
Step 1: Add M and A only if q0 = 1. This is true: A + M = 0110 + 1101 = 1 0011.

Step 2: Store the sum and carry generated in step 1 in A and C: C = 1, A = 0011.

Step 3: Shift C A Q right by 1 bit. The values of C, A and Q after the shift are
C = 0, A = 1001, Q = 1110

Round 3
Step 1: Add M and A if q0 = 1.
This condition fails (q0 = 0), so no addition is done.
Step 2: Since no addition was done, C and A are not updated; C, A and Q keep the values they had after
step 3 of round 2.

Step 3: Shift C A Q right by 1 bit:
C = 0, A = 0100, Q = 1111

Round 4:
Step 1: Add M and A if q0 = 1. This is true: A + M = 0100 + 1101 = 1 0001.

Step 2: The sum and carry generated in step 1 are copied to A and C respectively: C = 1, A = 0001.

Step 3: Shift C A Q right by 1 bit. The values in C, A and Q after the shift are
C = 0, A = 1000, Q = 1111

Final answer:
The values in A and Q are concatenated to form the product: 1000 1111 (i.e. 13 × 11 = 143).
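
The add-and-shift rounds can be sketched in Python as below. The function name `sequential_multiply` and the trace format are assumptions for this illustration; the register names C, A, Q and M follow the notes.

```python
def sequential_multiply(m: int, q: int, bits: int = 4):
    """Shift-and-add multiplication of two unsigned `bits`-wide numbers.

    Returns (product, trace), where trace holds (C, A, Q) after each round's shift.
    """
    mask = (1 << bits) - 1
    a, c = 0, 0
    trace = []
    for _ in range(bits):
        if q & 1:                                  # Step 1: add M to A only if q0 = 1
            total = a + m
            c, a = total >> bits, total & mask     # Step 2: carry into C, sum into A
        # Step 3: shift C, A, Q right by one bit as a single long register
        q = ((a & 1) << (bits - 1)) | (q >> 1)
        a = (c << (bits - 1)) | (a >> 1)
        c = 0
        trace.append((c, a, q))
    return (a << bits) | q, trace


product, trace = sequential_multiply(0b1101, 0b1011)
print(product)  # 143
for c, a, q in trace:
    print(c, format(a, "04b"), format(q, "04b"))
```

With M = 1101 and Q = 1011 the final A and Q hold 1000 and 1111, so the product is 10001111 (143).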


The above steps can be written in the following table to visualize in one shot.

Sequential circuit binary multiplication example:


Basic Processing Unit

5.1 Introduction
 Instruction Set Processor (ISP) or processor executes machine instructions and coordinates
the activities of other units.
 It is also termed Central Processing Unit (CPU). The term “Central” is less appropriate today
because many modern computer systems include several processing units.
 Organization of processors has evolved over the years, driven by developments in technology
and need to provide high performance.
 To achieve high performance, make various functional units operate in parallel.
 Such high performance processors have:
* Pipelined organization – execution of one instruction is started before the execution of
preceding instruction is completed.
* Superscalar operation – several instructions are fetched and executed at the same time.
 Here, we discuss the basic ideas that are common to all processors.

5.2 SOME FUNDAMENTAL CONCEPTS


 To execute a program, the processor fetches one instruction at a time and performs the
operations specified.
 To execute an instruction, the processor has to perform the following 3 steps:
1. Fetch the contents of the memory location pointed to by the PC. The content of this location
is the instruction to be executed. Hence, it is loaded into the IR.
o IR ← [[PC]]
2. Assuming the memory is byte addressable and each instruction is 4 bytes long, increment the contents of the PC by 4
o PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR.
Steps 1 and 2 form the fetch phase. Step 3 is the execution phase.

 Register MDR has 2 inputs and 2 outputs.


 Register MAR gets input from internal bus and gives output to the external bus.
 The control lines of memory bus are connected to the instruction decoder and control
logic block.
 This unit is responsible for issuing the signals that control the operation of all the units
inside the processor and for interacting with memory bus.
 R0 to R(n-1) are the general-purpose registers used by the programmer.
 Some registers may be dedicated as special-purpose registers, such as index registers or stack pointers.
 Y,Z, TEMP are registers that are never referenced by an instruction.
 They are used by the processor for temporary storage during execution of some
instructions.


 The multiplexer (MUX) selects either the output of register Y or the constant 4 (used to increment the contents of
the program counter).

 The MUX control input Select has 2 possible values: Select4 and SelectY.


 As instruction execution progresses, data are transferred from one register to another,
passing through ALU to perform arithmetic and logic operation.
 Instruction decoder and control logic unit is responsible for implementing the actions
specified by instruction loaded in the IR register.
 Decoder generates the control signals needed to select the registers involved and direct
the transfer of data.
 The registers, the ALU and the interconnecting bus are collectively referred to as the data path.
 An instruction can be executed by performing one or more of the following operations in
some specified sequence:
 Ri → Rj: Transfer a word of data from one processor register to another or to the ALU.
 Ar/L: Perform an arithmetic or logic operation and store the result in a processor
register.


 ML → R: Fetch the contents of a given memory location and load them into a processor
register.
 R → ML: Store a word of data from a processor register into a given memory location.
5.2.1 Register Transfers
 Instruction execution involves data transfers from one register to another.
 For each register, 2 control signals are used: Riin and Riout. This is represented symbolically in Figure
7.2.

 The input and output of register Ri are connected to the bus via switches controlled by the signals Riin and Riout.


 When Riin is set to 1, the data on the bus are loaded into Ri.
 When Riout is set to 1, the contents of register Ri are placed on the bus.


 Move R1,R4 is an instruction. This can be accomplished as follows:


o Enable the output of R1 by setting R1out to 1. This places the contents of R1 on the
processor bus.
o Enable the input of R4 by setting R4in to 1. This loads data from processor bus into R4.
 All operations and data transfers within the processor take place within time periods defined by the
processor clock.
 Example: At the start of clock cycle, R1out and R4in are set to 1.
 Registers are edge-triggered flip-flops.
 At next active edge of clock, data is loaded into R4. At the same time, R1out and R4in return to
0.
Alternate approach:
 Data transfers may use both rising and falling edges of the clock.
 When edge-triggered flip-flops are not used, 2 or more clock signals may be needed to
guarantee proper transfer of data. This is multiphase clocking.

 When Riin=1, mux selects data on the bus. Data is loaded into flip-flop at rising edge of the
clock.
 When Riin=0, mux feeds back the value currently stored in flip-flop.
 When Riout=0, gate’s output is in high-impedance (electrically disconnected) state. i.e. open-
state of switch.
 When Riout=1, gate drives the bus to 0 or 1, depending on the value of Q.

5.2.2 Performing an Arithmetic or logic operation


 The ALU is a combinational circuit that has no internal storage. Input to ALU is through mux
and from the processor bus.
 Output is stored temporarily in register Z.
 Sequence of operations to perform Add R1,R2,R3
1. R1out, Yin
2. R2out, SelectY,Add,Zin
3. Zout, R3in
 Only one register output can be connected to the bus during any clock cycle.
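
The three control steps for Add R1,R2,R3 can be traced with a toy Python model of the single-bus datapath. This is a rough sketch under simplifying assumptions: signal names are taken from the sequence above, the function `run_control_sequence` is hypothetical, a signal ending in "out" drives the bus, a signal ending in "in" loads a register at the end of the step, and word-length effects are ignored.

```python
ADD_R1_R2_R3 = [
    ["R1out", "Yin"],
    ["R2out", "SelectY", "Add", "Zin"],
    ["Zout", "R3in"],
]


def run_control_sequence(steps, regs):
    """Apply each control step to a copy of the register contents and return it."""
    regs = dict(regs)
    for signals in steps:
        outs = [s[:-3] for s in signals if s.endswith("out")]
        # Only one register output may be connected to the bus in any clock cycle.
        assert len(outs) <= 1, "bus conflict"
        bus = regs[outs[0]] if outs else None
        alu = None
        if "Add" in signals:
            a_input = regs["Y"] if "SelectY" in signals else 4   # MUX: SelectY or Select4
            alu = a_input + bus
        for s in signals:
            if s.endswith("in"):
                dest = s[:-2]
                regs[dest] = alu if dest == "Z" else bus         # Z is loaded from the ALU output
    return regs


result = run_control_sequence(ADD_R1_R2_R3, {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0})
print(result["R3"])  # 12
```

The assertion inside the loop reflects the single-bus restriction: two "out" signals in the same step would be rejected.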


5.2.3 Fetching a word from memory

 To fetch a word of information from memory, processor has to specify the address of the
memory location where this information is stored and request a Read operation.
 Information can be instruction or an operand
 Processor transfers the required address to MAR.
 Processor uses control lines to indicate that a Read operation is needed.
 When requested data are received from the memory , they are stored in register MDR

 The processor completes one internal data transfer in one clock cycle.
 The speed of operation of the addressed device varies with the device.
 Devices include cache memory, register in memory mapped I/O devices, main memory, etc.
 The cache responds to a read request in one clock cycle.
 When cache miss occurs, request is forwarded to main memory which introduces several clock
cycles delay.
 To accommodate variability in response time, the processor waits until it receives an indication
that requested Read operation has been completed.
 A control signal called Memory Function Completed(MFC) is used for this purpose.
 Addressed device sets this signal to 1 to indicate that the contents of the specified location
have been read and are available on the data lines of the memory bus.
 Consider the instruction Move (R1), R2. The actions needed to execute this instruction are:
o MAR ← [R1]
o Start a Read operation on the memory bus.
o Wait for the MFC response from the memory.
o Load MDR from the memory bus.
o R2 ← [MDR]


 Contents of MAR are always available on the address lines of memory bus.
 When a new address is loaded into MAR, it will appear on the memory bus at the beginning of
the next clock cycle as shown.
 A Read control signal is activated at the same time MAR is loaded.
 This signal will cause the bus interface circuit to send a read command, MR(Memory Read)
on the bus.
 MDRinE is active waiting for a response from the memory.
 Data received from memory are loaded into MDR at the end of the clock cycle in which MFC
signal is received.
 In the next clock cycle, MDRout is activated to transfer the data to register R2.
 Signals are activated as follows:
1. R1out, MARin, Read
2. MDRinE, WMFC (wait for arrival of MFC signal)
3. MDRout, R2in


5.2.4 Storing a word in memory

 Desired address is loaded into MAR


 Data is loaded into MDR.
 Write command is issued
 Move R2,(R1) requires the following sequence:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Note: WMFC:- Wait for arrival of MFC signal

5.3 EXECUTION OF A COMPLETE INSTRUCTION


Consider the instruction
Add (R3), R1
 Executing this instruction requires the following actions:
o Fetch the instruction
o Fetch the first operand (the contents of the memory location pointed to by R3).
o Perform the addition.
o Load the result into R1.
Control sequence for execution of the instruction Add (R3), R1
1. PCout, MARin, Read, Select4, Add, Zin
Fetch operation is initiated by loading the address in PC into MAR and sending a Read request to
the memory. Select signal is set to Select4, which causes the MUX to select constant value 4. This
value is added to the operand at input B (contents of PC) and the result is stored in register Z.
2. Zout, PCin, Yin, WMFC
The updated value in Z is moved to PC (to point to the next address) while waiting for the memory
to respond.
3. MDRout, IRin
Once MFC signal is received from memory, the fetched instruction will be moved into MDR and
then to IR.
These 3 steps constitute the instruction fetch phase.
4. R3out, MARin, Read
The instruction decoding circuit interprets the contents of IR and the processor starts the execution
phase. Contents of R3 (address of the operand) are loaded into MAR and a Read signal is issued.
5. R1out, Yin, WMFC
While waiting for the memory to respond, contents of R1 are transferred into Y register.
6. MDRout, SelectY, Add, Zin
The memory provides data on the bus, which is moved into MDR and onto the B input of ALU. The
contents of Y (R1 contents) are gated into input A of ALU using SelectY signal of MUX.
Add control signal is activated.
After addition, result is transferred to Z.
7. Zout, R1in, End.
Finally, the sum is moved out of register Z into R1. The End signal causes new instruction fetch
cycle to begin by returning to step 1.


Ex: Control sequence for the instruction Add (R3)+, R1


1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read, Select4, Add, Zin
5. Zout, R3in
6. R1out, Yin, WMFC
7. MDRout, SelectY, Add, Zin
8. Zout, R1in, End

5.3.1 Branch Instructions


 A branch instruction replaces the contents of the PC with the branch target address.
 This address is usually obtained by adding an offset X, which is given in the branch
instruction, to the updated value of PC.


 The offset X used in a branch instruction is the difference between the branch target address
and the address immediately following the branch instruction.
 Ex:- If branch instruction is at 2000, branch target address is 2050, then value of X must be 46.
(This is because PC would have incremented during fetch phase, so it would be pointing to
2004 already. Therefore, only 46 is the offset.)
 For a conditional branch, we need to check status of condition codes before loading a new
value into PC.
 For a Branch-on-negative (Branch<0) instruction, step 4 is replaced with: Offset-field-of-IRout, Add, Zin, If N=0 then
End. Thus, if N=0, the processor returns to step 1 immediately after step 4.
 If N=1, step 5 is performed to load a new value into PC, thus performing the branch operation.
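
The offset calculation and the conditional update of the PC can be illustrated with a few lines of Python. The function names and the default instruction size of 4 bytes are assumptions for this sketch; the condition follows the rule stated above that the branch is taken when N = 1.

```python
def branch_offset(branch_addr: int, target_addr: int, instr_size: int = 4) -> int:
    """Offset X = target address - address immediately following the branch."""
    return target_addr - (branch_addr + instr_size)


def next_pc(updated_pc: int, offset: int, n_flag: int) -> int:
    """PC after a conditional branch that is taken only when N = 1."""
    return updated_pc + offset if n_flag == 1 else updated_pc


x = branch_offset(2000, 2050)
print(x)                    # 46
print(next_pc(2004, x, 1))  # 2050 (branch taken)
print(next_pc(2004, x, 0))  # 2004 (fall through to the next instruction)
```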

5.4 Multiple-Bus Organization


 Single bus structure results in control sequences which are quite long because only one item
can be transferred over the bus in 1 clock cycle.
 To reduce the number of steps and enhance the CPU performance, modern processors
provide multiple internal paths that enable several transfers to take place in parallel.


 All general purpose registers are combined into a single block called the register file.
 The register file has 3 ports.
o 2 output ports allow the contents of two different registers to be accessed simultaneously and
placed on buses A and B.
o 1 input port allows the data on bus C to be loaded into a third register during the same clock cycle.
 Buses A and B are used to transfer the source operands to the A and B inputs of ALU.
 Output of ALU is transferred over bus C.
 If the ALU simply passes one of its two input operands unmodified to bus C, this is indicated by the control signal R=A or
R=B.
 An incrementer unit eliminates the need to add 4 to the PC using the main ALU and an Add operation.

Ex: - Control sequence for the instruction Add R4, R5, R6 for the 3-bus organization
1. PCout, R=B, MARin, Read, IncPC
Contents of PC are passed through ALU using R=B control signal and loaded into MAR to start a
memory read operation. PC is incremented by 4 to point to the next instruction in sequence.
2. WMFC
Processor waits for MFC signal from memory.
3. MDRoutB, R=B, IRin
The instruction code is received in MDR and transferred to IR. This completes the fetch phase.
4. R4outA, R5outB, SelectA, Add, R6in, End.
The instruction is decoded and add operation takes place.

5.5 Hardwired Control


 To execute instructions, processor must have some means of generating the control signals
needed in proper sequence.
 Two approaches:
o Hardwired control
o Microprogrammed control


 The decoder/encoder block is a combinational circuit that generates the required control
signals (outputs) depending on the states of all its inputs.

 The step decoder provides a separate signal line for each step, or time slot, in the control
sequence.
 Output of instruction decoder consists of a separate line for each machine instruction.
 For any instruction loaded in IR, one of the output lines INS1 through INSm is set to 1 and all
other lines are set to 0.
 Input signals to the encoder block are combined to generate the individual control signals like
Yin, PCout, Add, End, etc.


 This circuit implements the logic function: Zin = T1 + T4·Br + T6·Add + …

 This signal is asserted during time slot T1 for all instructions, during T4 for unconditional branch
instructions and during T6 for an Add instruction.
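
The encoder logic for Zin can be written as a Boolean function of the current time slot and the decoded instruction. The sketch below models only the three terms listed above (the elided terms are not modelled); the function name `z_in` and the instruction labels "Br" and "Add" are assumptions for this illustration.

```python
def z_in(step: int, instruction: str) -> bool:
    """Zin = T1 + T4.Br + T6.Add, restricted to the terms shown in the notes."""
    def t(k: int) -> bool:
        return step == k          # Tk is 1 only during time slot k

    return t(1) or (t(4) and instruction == "Br") or (t(6) and instruction == "Add")


print(z_in(1, "Add"))  # True  (T1: asserted for every instruction)
print(z_in(4, "Br"))   # True  (T4 of an unconditional branch)
print(z_in(6, "Add"))  # True  (T6 of an Add)
print(z_in(4, "Add"))  # False
```

In hardware the same function is a small AND-OR network fed by the step decoder and the instruction decoder outputs.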

RUN control signal:


 When set to 1, RUN causes the counter to be incremented by one at the end of every clock
cycle.
 When set to 0, the counter stops counting. This is needed when WMFC signal is issued, to
cause the processor to wait for reply from memory.

 Logic Function:
 End signal starts a new instruction fetch cycle by resetting the control step counter to its
starting value.
 The control hardware can be viewed as a state machine that changes from one state to
another in every clock cycle, depending on the contents of IR, condition codes and external
inputs.
 The outputs of the state machine are the control signals.
 Sequence of operations carried out by the machine is determined by wiring of the logic
elements, hence the name “hardwired”.
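
The behaviour of the control step counter under RUN and End can be sketched as follows. The sketch assumes RUN = NOT(WMFC AND NOT MFC), i.e. the counter is frozen only while a step that issues WMFC is still waiting for the memory's MFC reply; the 4-step sequence and the memory delay are hypothetical.

```python
# Control step counter with RUN and End.
# Assumption: RUN = NOT(WMFC AND NOT MFC).

def next_step(step, wmfc, mfc, end):
    run = not (wmfc and not mfc)
    if not run:
        return step            # RUN = 0: the counter holds its value
    if end:
        return 1               # End resets the counter for a new fetch cycle
    return step + 1

# Hypothetical sequence: step 2 issues WMFC, step 4 issues End.
# The memory asserts MFC on the third clock cycle spent in step 2.
trace, step, waited = [], 1, 0
for _ in range(7):
    trace.append(step)
    wmfc, end = step == 2, step == 4
    mfc = wmfc and waited == 2
    new_step = next_step(step, wmfc, mfc, end)
    waited = waited + 1 if new_step == step else 0
    step = new_step
print(trace)
```

The printed trace shows step 2 repeated while the processor waits, and the counter returning to step 1 after End.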

5.5.1 A Complete Processor


 Instruction unit fetches the instructions from an instruction cache or from the main memory
when the desired instructions are not already in cache.
 It has separate processing units to deal with integer data and floating-point data.


 Most of the processors today use separate caches for instructions and data.
 Processor is connected to the system bus through bus interface.
 To increase the potential for concurrent operations, a processor may include several integer and
floating-point units.

5.6 Microprogrammed Control


 Using hardwired control, the control signals required inside the processor can be generated
using a control step counter and a decoder/ encoder circuit.
 In microprogrammed control, the control signals are generated by a program similar to
machine language programs.
 A control word (CW) is a word whose individual bits represent the various control signals.
 A sequence of CWs corresponding to the control sequence of a machine instruction constitutes
the micro routine for that instruction.
 Individual control words in this micro routine are referred to as microinstructions.
 The micro routines for all instructions in the instruction set of a computer are stored in special
memory called the control store.


 The control unit can generate the control signals for any instruction by sequentially reading the
CWs of the micro routine from the control store.
 To accomplish this, the organization of CU can be:


 Micro Program Counter (µPC) is used to read the control words sequentially from the control
store.
 Every time a new instruction is loaded into the IR, the output of the block labelled “starting
address generator” is loaded into the µPC.
 µPC is automatically incremented by the clock, causing successive microinstructions to be
read from the control store.
 Therefore, the control signals are delivered to various parts of the processor in correct
sequence.
 This organization cannot handle a situation, wherein the CU has to check the status of
condition codes or external inputs.
 Hardwired control handles this situation by including an appropriate logic function in the
encoder circuit.
 In microprogrammed control, the alternative approach is to use conditional branch
microinstructions.
 In the micro-routine for the Branch<0 instruction: after Branch<0 is loaded into IR, a branch
microinstruction transfers control to the corresponding micro-routine, which is assumed to start
at location 25 in the control store.
 The microinstruction at location 25 tests the N bit of the condition codes.
o If it is 0, a branch takes place to location 0 to fetch a new machine instruction.
o Otherwise, the microinstruction at location 26 is executed, followed by the one at location 27.
 To support this microprogram branching, CU is as shown:


 In this CU, the µPC is incremented every time a new microinstruction is fetched from the
microprogram memory, except in the following situations:
o When a new instruction is loaded into IR, the µPC is loaded with starting address of
µroutine for that instruction.
o When a Branch µinstruction is encountered and the branch condition is satisfied, the
µPC is loaded with the branch address.
o When an End µinstruction is encountered, the µPC is loaded with the address of first
CW in the µ-routine for instruction fetch cycle.
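
The three µPC loading rules can be traced with a short sketch built around the Branch<0 example (micro-routine at locations 25 to 27, fetch starting at location 0). The dictionary layout and field names are hypothetical, the control signals themselves are omitted, and location 3 is assumed to be the point where the starting address is loaded.

```python
# µPC sequencing: increment by default, otherwise load a starting address,
# a branch address, or the address of the fetch micro-routine (0) on End.

CONTROL_STORE = {
    0: {}, 1: {}, 2: {},                       # instruction fetch (signals omitted)
    3: {"load_start_address": True},           # new instruction is now in IR
    25: {"branch_if_n_is": 0, "target": 0},    # conditional branch on the N bit
    26: {},
    27: {"end": True},
}
START_ADDRESS = {"Branch<0": 25}               # starting address generator

def trace_instruction(opcode, n_flag):
    """Return the control-store addresses visited for one machine instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        word = CONTROL_STORE[upc]
        if word.get("load_start_address"):
            upc = START_ADDRESS[opcode]
        elif word.get("branch_if_n_is") == n_flag:
            upc = word["target"]               # branch condition satisfied
        elif word.get("end"):
            upc = 0                            # End: back to the fetch routine
        else:
            upc += 1                           # default: increment
        if upc == 0:
            return visited

print(trace_instruction("Branch<0", n_flag=1))
print(trace_instruction("Branch<0", n_flag=0))
```

When N is 1 the whole micro-routine runs; when N is 0 the routine is cut short at location 25.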


5.6.1 Microinstructions
 A straightforward way to structure microinstructions is to assign one bit position to each control
signal.
 This scheme has a serious drawback – assigning individual bits to each control signal results in
long microinstructions because the number of required signals is large.
 Only few bits are set to 1, which means the available bit space is poorly used.
 Approaches to design a format for microinstructions:
1) Assume that a processor contains only 4 general-purpose registers: R0, R1, R2 and R3.
 Some of the connections in this processor are permanently enabled, such as the output of
the IR to the decoding circuits and both inputs to the ALU.
 Connections to the various registers require 20 gating signals.
 Control signals such as Read, Write, Select, WMFC and End also need bit positions.
 Assume the ALU can perform 16 functions, including Add, Subtract, AND and XOR.
 In total, 42 control signals are needed.
 Disadvantage of this approach: most signals are not needed simultaneously, and many
signals are mutually exclusive, so the microinstruction length can be reduced.
2) The signals can be grouped so that all mutually exclusive signals are placed in same group.
 A binary coding scheme is used to represent the signals within a group.

 Disadvantage of this approach: this format requires a little more hardware because
decoding circuits must be used to decode the bit patterns of each field into individual
control signals.
 Advantage: this format results in a smaller control store. Only 20 bits are needed to store
the patterns for 42 signals.
3) Enumerating the patterns of required signals in all possible microinstructions.
 Each meaningful combination of active control signals can be assigned a distinct code
that represents the microinstruction.
 Such full encoding reduces the length of the microinstruction words but increases the
complexity of the required decoder circuits.
 Such highly encoded schemes that use compact codes to specify only a small number
of control functions in each µinstruction are referred to as a “vertical organization”.
 “Horizontal organization” is a minimally encoded scheme in which many resources can be
controlled with a single microinstruction, as shown in Figure 7.15.
 This organization is useful when a higher operating speed is desired and when the
machine structure allows parallel use of resources
 The second approach is a horizontal organization.
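
The saving obtained by grouping mutually exclusive signals into binary-coded fields can be estimated with a short calculation. The group sizes below are illustrative only and are not the textbook's exact field layout; the sketch also assumes that every field reserves one code meaning "no signal active".

```python
# Width of a one-bit-per-signal microinstruction versus a field-encoded one.

def field_width(n_signals):
    """Bits needed to encode n mutually exclusive signals plus one
    'no signal active' code (an assumption of this sketch)."""
    return n_signals.bit_length()     # equals ceil(log2(n_signals + 1))

# Hypothetical groups of mutually exclusive signals.
groups = {"register out": 9, "register in": 7, "other in": 4, "ALU function": 16}

one_bit_per_signal = sum(groups.values())
encoded_bits = sum(field_width(n) for n in groups.values())
print(one_bit_per_signal, encoded_bits)
```

The price of the shorter microinstruction is one decoder per field, which is the hardware cost noted for the second approach.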

5.6.2 Microprogram Sequencing


 A microprogram using the format in Figure 7.15 is fairly simple to write and verify, but this
scheme has 2 disadvantages:
o Requirement of large control store, since each machine instruction has a separate micro
routine.
o If the machine instructions have several addressing modes, a separate micro routine for
each of these combinations may produce duplication of common parts of the program.
 To solve these problems, the microprogram should be organized so that micro routines share
the common parts.
o This requires many branch microinstructions to transfer control among the various parts.
o This leads to another problem: execution time becomes longer, since more time is
required to carry out the branch microinstructions.
 Consider an instruction “Add src,Rdst” to illustrate the complexity of sequencing the operations
 This instruction adds the source operand to the contents of register Rdst and places the sum in
Rdst, the destination register.
 Source operand can be in any of the following addressing modes: register, auto increment,
auto decrement, indexed, indirect.
 A microprogram is presented in flowchart form, for easier understanding.
 Each box in the chart represents a microinstruction that controls the transfers and operations
indicated within the box.
 The microinstruction (µinstruction) is located at the address indicated by the octal number
above the upper right-hand corner of the box.
 Each octal digit represents 3 bits.
 Techniques used:

Branch Address Modification using Bit-ORing
 From the flowchart, it can be seen that branches are made to different addresses
because some parts of the micro routines are shared among all the microprograms.
 At a point labelled α, a decision is to be made about branching:
o If direct mode is specified, instruction at location 170 is bypassed and control
goes to 171
o If indirect mode is specified, then the µinstruction at location 170 is executed to
fetch the operand from memory.
 This is performed using a technique called bit-ORing.

Bit-ORing
 Simplest way to transfer control directly to location 171 is to make the preceding branch
µinstruction specify the address 170 and then use an OR gate to change the LSB of this
address to 1 if direct addressing mode is specified. This is known as bit-ORing technique.
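
The single-bit case at point α can be written out directly. This is a minimal sketch; the function name `alpha_target` is hypothetical, and the addresses 170 and 171 are the octal values used above.

```python
# Bit-ORing at point α: the branch microinstruction always names octal 170;
# an OR gate forces the least significant bit to 1 for direct addressing.

def alpha_target(direct_mode):
    base = 0o170
    return base | (1 if direct_mode else 0)

print(oct(alpha_target(False)), oct(alpha_target(True)))
```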


5.6.3 Wide-Branch Addressing


 The flowchart includes a wide branch in the µinstruction at location 003.
 The instruction decoder, InstDec generates the starting address of the µroutine that
implements the instruction that has just been loaded into IR.
 In our example, the instruction “Add src, Rdst” is loaded into IR.
 The instruction decoder generates µinstruction address 101.
 However, this address cannot be loaded as it is into the µPC, because src operand can be in
any of the several addressing modes.
 The flowchart shows 5 possible branches starting from left to right: indexed, auto decrement,
auto increment, register direct and register indirect.
 Bit-ORing technique is used to modify the starting address generated by the instruction
decoder to reach the appropriate path.
 WMFC is used in a branch µinstruction because branch must not take place until the memory
transfer in progress is completed.
 A case of source operand being accessed in auto increment mode: Add (Rsrc)+,Rdst


                                            Octal   Binary
Address generated by instruction decoder    101     001 000 001
Indexed                                     161     001 110 001
Autodecrement                               141     001 100 001
Autoincrement                               121     001 010 001
Register direct                             101     001 000 001
Register indirect                           111     001 001 001

Mode Bits Assumptions:

10th Bit   9th Bit   Mode
1          1         Indexed
1          0         Autodecrement
0          1         Autoincrement
0          0         Register

8th Bit    Mode
0          Direct
1          Indirect
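
The wide-branch addresses in the table can be reproduced by ORing the mode bits into the middle octal digit of the address 101 produced by the instruction decoder. The sketch assumes that the indirect bit (bit 8) is ORed in only for register mode, so that exactly the five listed targets can be produced; the function name and argument order are hypothetical.

```python
# Wide branch by bit-ORing: IR bits 10, 9 and 8 modify the middle octal digit
# of the starting address generated by the instruction decoder.

def wide_branch_address(bit10, bit9, bit8, base=0o101):
    mode = (bit10 << 1) | bit9
    indirect = bit8 if mode == 0 else 0    # assumption: bit 8 counts only in register mode
    middle_digit = (mode << 1) | indirect
    return base | (middle_digit << 3)

cases = {
    "Indexed": (1, 1, 0),
    "Autodecrement": (1, 0, 0),
    "Autoincrement": (0, 1, 0),
    "Register direct": (0, 0, 0),
    "Register indirect": (0, 0, 1),
}
for name, bits in cases.items():
    print(name, oct(wide_branch_address(*bits)))
```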

 Processor has 16 registers being used for addressing, each specified using 4-bit code.
 There are 2 stages of decoding:
o The microinstruction field must be decoded to determine that an Rsrc or Rdst register is
involved.
o The decoded output is then used to gate the contents of the Rsrc or Rdst field in IR
into a second decoder, which produces the gating signals for the actual registers R0 to R15.
 The micro routine for Add (Rsrc)+, Rdst has two Bit-ORing examples:
1) Microinstruction at location 003:
 There are 5 starting addresses for the micro routine depending on the addressing mode.
 These addresses differ in the middle octal digit only.
 The 3 bits to be ORed with the middle digit are supplied by decoding circuitry connected
to the src address.

2) Microinstruction at location 123:


 It causes a branch to the microinstruction at location 170, which causes another fetch
from memory using indirect addressing mode.
 With direct addressing mode, this additional fetch is bypassed by ORing the
inverse of the indirect bit in the src address field (bit 8 in the IR) with the 0-bit position of
the µPC.


5.6.4 Microinstructions with next-address field


 The flowchart in Figure 7.20 contains several branch µinstructions which perform no useful
operation in the data path.
 These instructions are needed only to determine the address of the next µinstruction.
 A large number of such µinstructions reduces the speed of computation.
 Solution to this problem is:
o Include an address field as part of every µinstruction to indicate the location of next
µinstruction to be fetched.
 Advantage of this scheme is:
o Need for separate branch µinstruction is eliminated.
o No need of a counter to keep track of address. Therefore, µPC is replaced by
µAR(microinstruction Address Register). This register is loaded from next address field
of each µinstruction.
 New microprogramming control structure with µAR and bit-ORing capability can be designed
as:

 The decoding circuits generate the starting address of a given µroutine on the basis of opcode
in IR.
 The next address bits are fed through OR gates to µAR.


 The address can be modified depending on the data in the IR, condition codes and external
inputs.
 Reconsidering the instruction, “Add (Rsrc)+, Rdst”
o The µroutine is shown in Figure 7.21.
o If we use the control structure just described, we need to modify the µinstruction format
shown in Figure 7.19.
 Extra fields to be added along with the previous format are:
o Signal ORmode is used to indicate whether bit-ORing is used or not.
o Signal ORindsrc is used to indicate whether indirect addressing of source operand is
used for wide branching in the flowchart of Figure 7.20.
o One bit in the µinstruction is used to indicate when the output of the instruction decoder
is to be gated into the µAR.
o Each µinstruction contains an 8-bit field that holds the address of the next µinstruction.
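
Sequencing with a next-address field can be sketched as follows. The µAR is loaded from the field of the current µinstruction, optionally ORed with a bit derived from the IR. The class, the field names and the shortened three-entry control store are hypothetical; only the addresses 122, 170 and 171 (octal) are taken from the discussion of the flowchart.

```python
# Sequencing with a next-address field instead of a µPC.
from dataclasses import dataclass

@dataclass
class MicroInstruction:
    next_address: int          # 8-bit next-address field
    or_indsrc: bool = False    # request bit-ORing on the indirect-source condition
    end: bool = False

def next_uar(mi, direct_mode):
    """Value loaded into the µAR after executing `mi`."""
    address = mi.next_address
    if mi.or_indsrc and direct_mode:
        address |= 1           # direct mode: skip the extra memory fetch
    return address & 0xFF      # the field is 8 bits wide

# Hypothetical, shortened micro-routine fragment.
store = {
    0o122: MicroInstruction(next_address=0o170, or_indsrc=True),
    0o170: MicroInstruction(next_address=0o171),
    0o171: MicroInstruction(next_address=0o000, end=True),
}

def path(direct_mode, start=0o122):
    uar, visited = start, []
    while True:
        visited.append(uar)
        mi = store[uar]
        if mi.end:
            return visited
        uar = next_uar(mi, direct_mode)

print([oct(a) for a in path(True)])
print([oct(a) for a in path(False)])
```

No separate branch µinstruction appears in either path; the choice between 170 and 171 is made while the µAR is being loaded.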


 The branch µinstruction at location 123 is combined with the µinstruction at location 122 that
precedes it.


 When µinstruction sequencing is controlled by µPC, the End signal is used to reset the µPC to
point to the starting address of the µinstruction that fetches the next machine instruction to be
executed.
 In the organization using µAR, this starting address is 000 in octal.
 End signal is explicitly specified in FO field.
 Figures 7.25 and 7.26 of the textbook give a detailed picture of the control structure of
Figure 7.22 and the circuitry for bit-ORing.

5.6.5 Prefetching Microinstructions


 One drawback of µprogrammed control is that it leads to slower operating speed because of
the time it takes to fetch µinstructions from control store
 Faster operation is achieved if the next µinstruction is prefetched while the current one is being
executed.
 Execution time can be overlapped with fetch time.
 Prefetching has some problems:
o The status flags and results of currently executed µinstruction are needed to determine
the address of next µinstruction.
o Therefore, straightforward prefetching occasionally prefetches a wrong µinstruction.
o In such cases, the fetch must be repeated with the correct address, which requires more
complex hardware.
 These disadvantages are minor and prefetching technique is often used.
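
The benefit of overlapping fetch and execution can be estimated with a rough timing model. Everything in this sketch is an assumption rather than something stated in the text: fixed fetch and execution times, perfect overlap when the prefetch is correct, and one extra fetch for each wrong prefetch.

```python
# Rough timing model for µinstruction prefetching (all parameters hypothetical).

def time_without_prefetch(n, fetch, execute):
    # Every µinstruction is fetched, then executed, strictly in sequence.
    return n * (fetch + execute)

def time_with_prefetch(n, fetch, execute, wrong_prefetches=0):
    # The first fetch and the last execution cannot be overlapped; in between,
    # each stage advances at the pace of the slower of the two activities.
    overlapped = fetch + (n - 1) * max(fetch, execute) + execute
    return overlapped + wrong_prefetches * fetch

print(time_without_prefetch(10, 2, 3))
print(time_with_prefetch(10, 2, 3))
print(time_with_prefetch(10, 2, 3, wrong_prefetches=2))
```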

5.6.6 Emulation
 Given a computer with a certain instruction set, it is possible to define additional machine
instructions and implement them with extra µroutines using microprogrammed control.
 Suppose the instruction set of a different computer M2 is added to a given computer M1.
 Machine language programs of M2 can then be run on M1, i.e. M1 emulates M2.
 Emulation:
o Allows obsolete equipment to be replaced with up-to-date machines.
o Requires no software changes to run existing programs.
o Facilitates transitions to new computer systems with minimal disruption.
o Is easier when the machines involved have similar architectures.
o However, it can also be done between machines with different architectures.

Problem:
Write the control sequence for the execution of the instruction ADD (R3),R1. The processor is
driven by a continuously running clock such that each control step is 2ns in duration. How long
will the processor have to wait in steps 2 & 5, assuming that a memory read operation takes 16ns
to complete? Also compute the percentage of time for which the processor is idle during the
execution of this instruction.

Solution:
Control sequence:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin


4. R3out, MARin, Read


5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End.

There are 7 steps in total.
Steps 2 & 5 wait for the memory read and take 16ns each: 2 × 16 = 32ns.
The remaining 5 steps take 2ns each: 5 × 2 = 10ns.

Total execution time = (5 × 2) + (2 × 16) = 42ns

The processor is idle during the memory read operations, i.e. for a duration of 32ns out of 42ns.
Therefore, processor idle time = 32ns/42ns = 76.2% of the total time.
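
The arithmetic of this solution can be checked with a few lines of code. The sketch follows the solution's own assumption that each WMFC step lasts for a full memory read and counts entirely as idle time; the constant names are hypothetical.

```python
# Timing of ADD (R3),R1 under the assumptions of the solution above.
STEP_NS = 2
MEMORY_READ_NS = 16
TOTAL_STEPS = 7
WAIT_STEPS = (2, 5)          # steps that contain WMFC

normal_steps = TOTAL_STEPS - len(WAIT_STEPS)
idle_ns = len(WAIT_STEPS) * MEMORY_READ_NS
total_ns = normal_steps * STEP_NS + idle_ns
idle_percent = 100 * idle_ns / total_ns
print(total_ns, idle_ns, round(idle_percent, 1))
```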

5.7 Comparison between Hardwired & Microprogrammed Control

Attribute                         Hardwired Control                 Microprogrammed Control

Speed                             Fast                              Slow
Control function                  Implemented in hardware           Implemented in software
Flexibility                       Not flexible; redesign is         More flexible; can accommodate
                                  required to accommodate new       new system specifications or
                                  system specifications or new      new instructions
                                  instructions
Ability to handle large/complex   Difficult                         Easier
instruction sets
Ability to support OS &           Very difficult                    Easy
diagnostic features
Design process                    Somewhat complicated              Orderly & systematic
Applications                      Mostly RISC microprocessors       Mainframes, some microprocessors
Instruction set size              Usually under 100 instructions    Usually over 100 instructions
ROM size                          -                                 2K to 10K by 20-40 bit
                                                                    microinstructions
Chip area efficiency              Uses least area                   Uses more area
