Lab Manual

LAB MANUAL

Subject: PROCESSOR ARCHITECTURE LAB (PAL)


Subject Code: CSL403
Class: SE Computer Engineering
Semester: IV (CBGS)
Div: A & B

Prof. Vidhya Seeman Dr. Mahavir Devmane


Prof. Tulshidas Mane (H.O.D.)
(Subject In charge)
Academic Year 2017-2018

CLASS: SE DIV: A & B SEM: IV


SUBJECT: PROCESSOR ARCHITECTURE LAB (PAL)

EXPERIMENT LIST

SR. NO. NAME


1 Dismantling and assembling PC
2 Ripple Carry Adder
3 Carry-look-ahead adder
4 Registers and Counters
5 Booth's Multiplier
6 ALU Design
7 CPU Design
8 Memory Design
9 Case Study on multi-core Processors
10 Associative cache Design
11 Direct Mapped Cache Design
12 Case study on PCI

Prof. Vidhya Seeman Dr. Mahavir Devmane


Prof. Tulshidas Mane (H.O.D.)
(Subject In charge)
Experiment No. 1

AIM: To dismantle and assemble a PC

OBJECTIVE:
1. To troubleshoot problems yourself and save time.
2. To know about system internals and components.
3. To learn how to remove components.

OUTCOME: Students will be able to disassemble and assemble a PC

COMPONENTS: Phillips head screw driver

THEORY:
A computer is made up of a case (or chassis) which houses several important internal components
and provides places to connect the external components (peripherals).
Inside the case go the following internal parts:
• Power Supply/PSU – power supply unit, converts outlet power, which is alternating current (AC),
to direct current (DC) which is required by internal components, as well as providing appropriate
voltages and currents for these internal components.
• Motherboard/main board – As the name indicates, this is the electronic centerpiece of the
computer: everything else connects to the motherboard.
• Processor/CPU – central processing unit, the "brain" of the computer, most actual computation
takes place here.
• RAM – random access memory, the "short-term memory" of a computer, used by the CPU to
store program instructions and data upon which it is currently operating. Data in RAM is lost when
the computer is powered off, thus necessitating a hard drive.

PROCEDURE:
Disassembling the computer system
1. Detach the power cable:
Disassembly of the computer system starts with detaching the externally connected devices.
Make sure the computer system is turned off; if not, shut the system down properly and
then start detaching the external devices. First remove the power cable from the electrical
switchboard, then remove the cable from the SMPS (switch mode power supply) at the back of the
CPU cabinet. Do not start disassembly without detaching the power cable from the computer
system. Now remove the remaining external devices such as the keyboard, mouse, monitor, printer
or scanner from the back of the CPU cabinet.
2. Remove the Cover:
The standard way of removing tower cases is to undo the screws on the back of the case,
slide the cover back about an inch and lift it off. Screwdrivers matching the type of screw are
required to do the task.

3. Remove the adapter cards:


Check whether the card has any cables or wires attached and decide whether it would be easier
to remove them before or after you remove the card. Remove the screw, if any, that holds the card in
place. Grip the card by its edges, front and back, and gently rock it lengthwise to release it.

4. Remove the drives:


Removing drives is easier. There can be three types of drives present in your computer
system: hard disk drives, CD/DVD/Blu-ray drives and floppy disk drives (almost obsolete nowadays).
They usually have a power connector and a data cable attached from the device to a controller card
or a connector on the motherboard. A CD/DVD/Blu-ray drive may have an analog cable connected
to the sound card for direct audio output.
The power may be attached using one of two connectors, a Molex connector or a Berg
connector. The Molex connector may need to be wiggled slightly from side to side while
applying gentle outward pressure. The Berg connector may just pull out, or it may have a small
tab which has to be lifted with a screwdriver.
Now pull the data cables off the drive as well as the motherboard connector. Hard disk
and CD/DVD drives use two types of data cables: IDE and SATA. IDE cables need
more care while being removed, as careless removal may damage the drive connector pins. Gently wiggle
the cable sideways and remove it. SATA cables can be removed easily by pressing the tab and
pulling the connector straight back.
Now remove the screws and slide the drive out the back of the bay.

5. Remove the memory module:


Memory modules are mounted on the motherboard and, like any chips, can be damaged if
force is applied improperly. Be careful and handle the module only by its edges. SIMMs and DIMMs
are removed in different ways:

SIMM - gently push back the metal tabs while holding the SIMM chip in the socket. Tilt the
SIMM chip away from the tabs to about a 45° angle. It will now lift out of the socket. Put the SIMM in a
safe place.
DIMM - There are plastic tabs on the ends of the DIMM sockets. Press the tabs down and away from
the socket. The DIMM will lift slightly. Now grab it by the edges and place it safely. Do not let the
chips gather dust at all.
6. Remove the power supply:
The power supply is attached to the tower cabinet at the top back end of the tower. Make sure the
power connector is detached from the switchboard. Then remove the power connectors going to the
motherboard, the CPU fan, the cabinet fan, the front-panel power buttons of the cabinet
and all the remaining drives, if not detached yet.
Now remove the screws of the SMPS from the back of the cabinet and the SMPS can be detached
from the tower cabinet.

7. Remove the motherboard:


Before removing all the connectors from the motherboard, make sure you note where each connector
goes, as you will need to reconnect them in the same places when reassembling the computer.
Remove the screws from the back of the motherboard and you will be able to detach it from the
cabinet. Now remove the CPU fan from the motherboard. The heat sink will be visible now, which
can be removed by pulling the tab upward. Finally the processor is visible, which can be
removed by pulling back the plastic retaining tab while stretching it sideways.

Assembling the computer system


The assembling of the computer system is exactly the opposite of disassembling operation.
Before starting assembling the computer system, make sure you have the screws and a screwdriver
for those.
The first step in assembling the computer system is mounting the processor on the
processor socket of the motherboard. To mount the processor, you don't need to apply any force;
special ZIF (zero insertion force) sockets are usually used to prevent any damage to the processor
pins. Once the processor is mounted, the heat sink is attached on top of the processor, and the
CPU fan is attached on top of the heat sink.
Now the motherboard is to be fixed vertically in the tower case and the screws are fixed
from behind the motherboard.
Now line up the power supply at the top back end of the cabinet and screw it in. The power
connectors for the motherboard power supply and the CPU fan power supply are to be connected. If a
cabinet cooling fan is required then it is to be screwed to the back end grill of the cabinet and its
power connector is to be connected from the SMPS.
Install the CD/DVD drives at the top front end of the cabinet and screw it. Install the Hard
disk drive and floppy disk drive below CD/DVD drive and screw it. Make sure once screwed there
is no vibration in either of the CD/DVD, Hard disk or Floppy disk drives.
Now select the appropriate data cable and connect one end of the cable to its drive socket
and another end at its appropriate connector on the motherboard. For SATA hard disk drive or
CD/DVD drives use SATA cable and its power cable, else use IDE data cable. Do the proper
jumper settings as per the usage requirement.
It is time now to mount the memory modules on the motherboard by aligning the RAM with
its socket on the motherboard and pressing it downward. Make sure the side tabs snap into the
notches of the RAM module; if not, you may still have to press a bit.
Install the internal cards into their sockets and attach the cables or power cables to them. Selecting
the right socket or slot, as per the type of card, is required.
Cover the tower by placing the cover on it, pressing it towards the front side and screwing it in.
Connect the external devices to the CPU at their appropriate sockets: the mouse and
keyboard at the PS/2 or USB connectors, and the monitor at the video output socket. Connect the power cable
to the back of the tower at the SMPS. Plug the power cable into the electric board.
POWERING UP FOR THE FIRST TIME:
1. Ensure that no wires are touching the CPU heat sink fan.
2. Plug in your monitor, mouse and keyboard.
3. Plug in the power cord and switch on the power supply.
4. If everything is connected as it should be:
• All system fans should start spinning.
• You should hear a single beep after about 5-10 seconds.
• The amber light on the monitor should turn green.
• You will see the computer start to boot with a memory check.
• Now check the front LEDs to see if you plugged them in correctly.
• Check all other buttons.
• Power off and correct any wrong settings.

CONCLUSION:
The PC was dismantled and assembled successfully.
Experiment No. 2
AIM: To design and simulate a ripple carry adder

OBJECTIVE: To understand the operation of a ripple carry adder, specifically how the carry
ripples through the adder.
1. Examining the behaviour of the working module to understand how the carry ripples through the
adder stages.
2. To design a ripple carry adder using full adders to mimic the behaviour of the working module.
3. The adder will add two 4 bit numbers.

OUTCOME: Students will understand ripple carry adder

SOFTWARE REQUIRED: Simulator.jar

THEORY:
Half Adders can be used to add two one bit binary numbers. It is also possible to create a logical
circuit using multiple full adders to add N-bit binary numbers. Each full adder inputs a Cin, which
is the Cout of the previous adder. This kind of adder is a Ripple Carry Adder, since each carry bit
"ripples" to the next full adder. The first (and only the first) full adder may be replaced by a half
adder. The block diagram of 4-bit Ripple Carry Adder is shown here below -

The layout of ripple carry adder is simple, which allows for fast design time; however, the ripple
carry adder is relatively slow, since each full adder must wait for the carry bit to be calculated from
the previous full adder. The gate delay can easily be calculated by inspection of the full adder
circuit. Each full adder requires three levels of logic. In a 32-bit ripple carry adder there are 32
full adders, so the critical path (worst case) delay is 31 × 2 (for carry propagation) + 3 (for the final sum)
= 65 gate delays.
The corresponding boolean expressions are given here to construct a ripple carry adder. In the half
adder circuit the sum and carry bits are defined as

sum = A ⊕ B
carry = AB

In the full adder circuit the Sum and Carry outputs are defined by the inputs A, B and Carryin (C) as
follows (a prime denotes the complement, e.g. A' = NOT A)

Sum = A'B'C + A'BC' + AB'C' + ABC

Carry = A'BC + AB'C + ABC' + ABC

Having these we could design the circuit. But, we first check to see if there are any logically
equivalent statements that would lead to a more structured equivalent circuit.

With a little algebraic manipulation, one can see that

Sum = A'B'C + A'BC' + AB'C' + ABC

= (A'B' + AB) C + (A'B + AB') C'

= (A ⊕ B)' C + (A ⊕ B) C'

= A ⊕ B ⊕ C

Carry = A'BC + AB'C + ABC' + ABC

= AB + (A'B + AB') C

= AB + (A ⊕ B) C
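The following is a minimal Python sketch (independent of the Simulator.jar tool used in this lab) showing how the ripple carry structure chains full adders through the carry; the function names and the LSB-first bit ordering are illustrative choices, not part of the simulator.

# Full adder: Sum = A xor B xor Cin, Carry = AB + (A xor B)Cin
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

# Ripple carry adder: the carry out of each stage feeds the next stage's carry in.
def ripple_carry_add(a_bits, b_bits, cin=0):
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):       # bit lists are LSB first
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# Same inputs as in the procedure below: 0011 (3) + 0111 (7) = 1010 (10), carry 0.
s, c = ripple_carry_add([1, 1, 0, 0], [1, 1, 1, 0])
print(s[::-1], c)                          # [1, 0, 1, 0] 0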

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To design the circuit we need 3 full adders, 1 half adder, 8 Bit switches (to give the inputs), 3 Digital
displays (2 for seeing the inputs and 1 for seeing the output sum), 1 Bit display (to see the carry output), and wires.
3. The pin configuration of a component is shown whenever the mouse is hovered over any canned
component of the palette, or press the 'show pinconfig' button. Pin numbering starts from 1 at
the bottom left corner (indicated by the circle) and increases anticlockwise.
4. For the half adder the inputs are on pins 5 and 8, the sum output is on pin 4 and the carry is on pin 1; for the
full adder the inputs are on pins 5, 6 and 8, the sum output is on pin 4 and the carry is on pin 1.
5. Click on the half adder component (in the Adder drawer of the palette) and then click on the
position of the editor window where you want to add the component (no drag and drop, a simple click
will serve the purpose). Likewise add 3 full adders (from the Adder drawer of the palette), 8 Bit
switches, 3 Digital displays and 1 Bit display (from the Display and Input drawer of the palette; if they are not
visible, scroll down in the drawer).
6. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram: connect 4 bit switches to the 4 terminals of one digital display and another set of 4 bit
switches to the 4 terminals of another digital display. Connect pin 1 of the last full adder, which
gives the final carry output, to the bit display. Connect the sum (pin 4) of all the adders to the terminals of the third
digital display (according to the circuit diagram shown in the screenshot). After the connections are complete
click the selection tool in the palette.
7. To see the circuit working, click on the Selection tool in the palette, then give the inputs by double
clicking on the bit switches (let them be 0011 (3) and 0111 (7)); you will see the sum (10) on the output
digital display and 0 as the carry on the bit display.

CIRCUIT DIAGRAM:

CONCLUSION:
Thus ripple carry adder is designed.
Experiment No. 3
AIM: To design a carry look ahead adder.

OBJECTIVE: To understand how the carry look ahead adder computes the carries in parallel, thus greatly speeding up the computation.
1. Understanding behaviour of carry lookahead adder from module designed by the student as part
of the experiment.
2. Understanding the concept of reducing computation time with respect of ripple carry adder by
using carry generate and propagate functions.
3. The adder will add two 4 bit numbers.

OUTCOME: Student will understand carry look ahead adder.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
To reduce the computation time, there are faster ways to add two binary numbers by using carry
look ahead adders. They work by creating two signals P and G, known as the carry propagate and
carry generate signals. The carry propagate signal is passed on to the next level, whereas the carry
generate signal is used to generate the output carry regardless of the input carry. The block diagram of a 4-bit Carry
Look ahead Adder is shown here below -

The number of gate levels for the carry propagation can be found from the circuit of full adder. The
signal from input carry Cin to output carry Cout requires an AND gate and an OR gate, which
constitutes two gate levels. So if there are four full adders in the parallel adder, the output carry C5
would have 2 X 4 = 8 gate levels from C1 to C5. For an n-bit parallel adder, there are 2n gate levels
to propagate through.

Design Issues:
The corresponding boolean expressions are given here to construct a carry lookahead adder. In the
carry-lookahead circuit we need to generate the two signals carry propagator(P) and carry
generator(G),

Pi = Ai ⊕ Bi
Gi = Ai · Bi

The output sum and carry can be expressed as

Sumi = Pi ⊕ Ci
Ci+1 = Gi + ( Pi · Ci)

Having these we could design the circuit. We can now write the Boolean function for the carry
output of each stage and substitute for each Ci its value from the previous equations:

C1 = G0 + P0 · C0
C2 = G1 + P1 · C1 = G1 + P1 · G0 + P1 · P0 · C0
C3 = G2 + P2 · C2 = G2 + P2 · G1 + P2 · P1 · G0 + P2 · P1 · P0 · C0
C4 = G3 + P3 · C3 = G3 + P3 · G2 + P3 · P2 · G1 + P3 · P2 · P1 · G0 + P3 · P2 · P1 · P0 · C0
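Below is a minimal Python sketch (not part of the simulator) of the same idea. In hardware the expanded expressions for C1-C4 above are all evaluated in parallel from P, G and C0; the loop here simply unrolls the same recurrence Ci+1 = Gi + Pi·Ci for brevity.

# Carry lookahead addition: P and G are formed first, then all carries, then the sums.
def carry_lookahead_add(a_bits, b_bits, c0=0):
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Pi = Ai xor Bi (carry propagate)
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gi = Ai and Bi (carry generate)
    c = [c0]
    for i in range(len(a_bits)):
        c.append(g[i] | (p[i] & c[i]))            # Ci+1 = Gi + Pi*Ci
    sum_bits = [p[i] ^ c[i] for i in range(len(a_bits))]   # Sumi = Pi xor Ci
    return sum_bits, c[-1]

# 0011 (3) + 0111 (7), LSB first, gives 1010 (10) with carry out 0.
print(carry_lookahead_add([1, 1, 0, 0], [1, 1, 1, 0]))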

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To design the circuit we need 7 half adders, 3 OR gates, 1 V+ (to give 1 as an input), 3 Digital
displays (2 for seeing the inputs and 1 for seeing the output sum), 1 Bit display (to see the carry output), and wires.
3. The pin configuration of a component is shown whenever the mouse is hovered over any canned
component of the palette, or press the 'show pinconfig' button. Pin numbering starts from 1 at
the bottom left corner (indicated by the circle) and increases anticlockwise.
4. For the half adder the inputs are on pins 5 and 8, the sum output is on pin 4 and the carry is on pin 1.
5. Click on the half adder component (in the Adder drawer of the palette) and then click on the
position of the editor window where you want to add the component (no drag and drop, a simple click
will serve the purpose). Likewise add 6 more half adders (from the Adder drawer of the palette), 3 OR
gates (from the Logic Gates drawer of the palette), 1 V+, 3 Digital displays and 1 Bit display (from the
Display and Input drawer of the palette; if they are not visible, scroll down in the drawer).
6. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram: connect V+ to the upper input terminals of the 2 input digital displays according to your inputs,
connect the OR gates according to the diagram shown in the screenshot, connect pin 1 of the half
adder which gives the final carry output to the bit display, and connect the sum (pin 4) of the adders to the
terminals of the third digital display, which shows the output sum. After the connections are complete click the
selection tool in the palette.
7. See the output. In the screenshot diagram we have given the values 0011 (3) and 0111 (7), so we get 10
as the sum and 0 as the carry. You can also use bit switches instead of V+ to give the inputs; by double
clicking those bit switches you can give different values and check the result.

CIRCUIT DIAGRAM:

CONCLUSION: Thus carry look ahead adder is designed.


Experiment No. 4
AIM: To design registers and counters

OBJECTIVE:
Objective of designing registers:
1. To understand the shifting of data
2. To examine the behaviour of different modes of data input and data output (serial-in serial-out,
serial-in parallel-out, parallel-in serial out, parallel-in parallel-out)
3. To make use of shift register in data transfer
4. Developing skills in the designing and testing of sequential logic circuits
5. Developing skills in analysing timing signals
Objective of designing counters:
1. Understanding the concept of counting up to certain limiting value and returning back to the start
state from final state
2. Understanding the generation of timing sequences to control operations in a digital system
3. Developing skills in the design and testing of counters for given timing sequences
4. Developing skills in generating timing signals

OUTCOME: Student will be able to design register and counter.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
In a sequential circuit the present output is determined by both the present input and the past output.
In order to receive the past output some kind of memory element can be used. The memory
elements commonly used in the sequential circuits are time-delay devices. The block diagram of the
sequential circuit-

A circuit with flip-flops is considered a sequential circuit even in the absence of combinational
logic. Circuits that include flip-flops are usually classified by the function they perform. Two such
circuits are registers and counters:

Register is a group of flip-flops. Its basic function is to hold information within a digital system so
as to make it available to the logic units during the computing process.
Counter is essentially a register that goes through a predetermined sequence of states.
There are various different kind of Flip-Flops. Some of the common flip-flops are: R-S Flip-Flop, D
Flip-Flop, J-K Flip-Flop, T Flip-Flop. The block diagram of different flip-flops are shown here -

RS flipflop: If R is high then the reset state occurs, and when S = 1 the set state occurs. Both inputs cannot be high
simultaneously; this input combination is avoided.
JK flipflop: If J and K are both low then no change occurs. If J and K are both high at the clock
edge then the output will toggle from one state to the other.
D flipflop: The D flip-flop tracks the input, making transitions that match those of the input D. It is
used as a data store.
T flipflop: The T or "toggle" flip-flop changes its output on each clock edge.
Types of Registers:
4-bit Serial-in Serial-out
4-bit Serial-in Parallel-out
4-bit Parallel-in Serial-out
4-bit Parallel-in Parallel-out

Types of Counters:
4-bit Synchronous Binary Counter
4-bit Synchronous Ring Counter
4-bit Synchronous Johnson Counter

Design Issues :
The four different types of flip-flops are supplied here. One can easily build any register or counter
using those flip-flops and different logic gates. However, the clock input is under development, so it is not
possible at present to build every register or counter completely.
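As a reference for the behaviour expected in the procedure below, here is a minimal Python sketch (not part of the simulator) of a 4-bit serial-in shift register; the class name and the list representation of the flip-flop outputs are illustrative assumptions.

# 4-bit right-shift register: on every clock edge each flip-flop takes the
# previous stage's Q, and the leftmost stage takes the serial input.
class ShiftRegister4:
    def __init__(self):
        self.q = [0, 0, 0, 0]              # Q outputs of the four D flip-flops

    def clock(self, serial_in):
        serial_out = self.q[-1]            # bit falling out of the last stage
        self.q = [serial_in] + self.q[:-1]
        return serial_out

sr = ShiftRegister4()
for bit in [1, 0, 0, 0]:                   # shift a single 1 in, as in the procedure
    sr.clock(bit)
    print(sr.q)                            # the 1 moves one stage right per clock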

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To design a 4-bit shift register (right shift), we need 4 MSD flipflops, 1 free running clock, 1 Bit
switch (which will act as the input to the left most flipflop), 4 Bit displays (to see the outputs of the individual
flipflops so that the shifting can be seen with the clock input), and wires.
3. The MSD flipflop component is in the Sequential Circuit drawer of the palette. The pin
configuration is shown whenever the mouse is hovered over any canned component of the palette, or
press the 'show pinconfig' button. Pin numbering starts from 1 at the bottom left
corner (indicated by the circle) and increases anticlockwise.
4. For the MSD flipflop the input (D) is on pin 5, the output (Q) is on pin 4 and the clock is on pin 8.
5. Click on the MSD flipflop component in the palette and then click on the position of the editor
window where you want to add the component (no drag and drop, a simple click will serve the
purpose). Likewise add 4 MSD flipflops, 1 free running clock, 1 Bit switch and 4 Bit
displays (from the Display and Input drawer of the palette; if they are not visible, scroll down in the drawer).
6. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components: connect the clock to
pin 8 of all the MSD flipflops, connect the bit switch to pin 5 (D) of the left most MSD
flipflop, connect the 4 bit displays to pin 4 of the 4 MSD flipflops, and connect the Q output of each
flipflop to the D (pin 5) input of the next flipflop.
7. To see the circuit working, click on the Selection tool in the palette, then give the input by double
clicking on the bit switch connected to pin 5 of the left most flipflop (let it be 1). Start the clock, then check
the outputs and see how the 1 shifts from left to right.

SCREEN SHOT:

CONCLUSION: Thus registers and counters are designed.


Experiment No. 5
AIM: To design Booth’s Multiplier.

OBJECTIVE:
1. Understanding behaviour of Booth's multiplication algorithm from working module and the
module designed by the student as part of the experiment
2. Designing Booth's multiplier with a controller and a datapath. This will also help in the learning
of control unit design as a finite state machine
3. Understanding the advantages of Booth's multiplier
• It can handle signed integers in 2's complement notation
• It decreases the number of additions and subtractions required
• It requires less hardware than a combinational multiplier
• It is faster than a straightforward sequential multiplier

OUTCOME: Students will understand the design of Booth’s multiplier.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
Booth's multiplication algorithm is an algorithm which multiplies 2 signed integers in 2's
complement. The algorithm is depicted in the following figure with a brief description. This
approach uses fewer additions and subtractions than more straightforward algorithms.

The multiplicand and multiplier are placed in the M and Q registers respectively. A 1-bit register is
placed logically to the right of the LSB (least significant bit) Q0 of the Q register; this is denoted by Q-1.
A and Q-1 are initially set to 0. Control logic checks the two bits Q0 and Q-1. If the two bits are the
same (00 or 11) then all of the bits of A, Q, Q-1 are shifted 1 bit to the right. If they are not the
same and the combination is 10, then the multiplicand is subtracted from A; if the combination
is 01 then the multiplicand is added to A. In both cases the result is stored in A, and after the
addition or subtraction operation A, Q, Q-1 are right shifted. The shifting is the arithmetic right
shift operation, where the left most bit, namely An-1, is not only shifted into An-2 but also remains
in An-1. This is to preserve the sign of the number in A and Q. The result of the multiplication
appears in A and Q.
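The following is a minimal Python sketch of the algorithm just described (it models the registers as integers and is independent of the simulator's datapath and controller); the function name and the n = 4 default width are illustrative assumptions.

# Booth's multiplication of two n-bit two's-complement integers.
def booth_multiply(multiplicand, multiplier, n=4):
    mask = (1 << n) - 1
    A, Q, Q_1 = 0, multiplier & mask, 0
    M = multiplicand & mask
    for _ in range(n):
        q0 = Q & 1
        if (q0, Q_1) == (1, 0):
            A = (A - M) & mask               # 10: A <- A - M
        elif (q0, Q_1) == (0, 1):
            A = (A + M) & mask               # 01: A <- A + M
        # arithmetic right shift of the combined A, Q, Q-1 (sign bit of A is preserved)
        combined = (A << (n + 1)) | (Q << 1) | Q_1
        sign = A >> (n - 1)
        combined = (combined >> 1) | (sign << (2 * n))
        A = (combined >> (n + 1)) & mask
        Q = (combined >> 1) & mask
        Q_1 = combined & 1
    result = (A << n) | Q                    # 2n-bit product sits in A and Q
    return result - (1 << (2 * n)) if result >> (2 * n - 1) else result

print(booth_multiply(3, 7))    # 21
print(booth_multiply(3, -2))   # -6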

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To perform the experiment on the given modules, we need the datapath specified for Booth's
multiplication, a controller with the specified state chart, a clock input, Bit switches (to give the inputs, which
toggle their value with a double click), Bit displays (for seeing the outputs), and wires.
3. Instantiate the controller.
4. Instantiate the Booth's multiplier datapath from the sequential ckt drawer in the palette (by
clicking as mentioned previously).
5. The pin configuration of the component is shown whenever the mouse is hovered over any canned
component of the palette; alternatively, pressing the show pin configuration button on the toolbar will show it
constantly in the left pane. Pin numbering starts from 1 at the bottom left corner (indicated
by the circle) and increases anticlockwise.
6. To connect any two components select the Connection menu of Palette, and then click on the
Source terminal and click on the target terminal.
7. First initialize the multiplier by giving the inputs specified earlier; this will load the
multiplier and multiplicand. Then start the multiplication operation by giving the inputs
specified earlier.
CIRCUIT DIAGRAM:

CONCLUSION: Thus Booth’s Multiplier is designed.


Experiment No. 6
AIM: To Design ALU

OBJECTIVE:
1. Understanding behaviour of arithmetic logic unit from working module and the module designed
by the student as part of the experiment
2. Designing an arithmetic logic unit for given parameter

OUTCOME: Student will be able to design ALU.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
An ALU or Arithmetic Logic Unit is a digital circuit that performs arithmetic operations like addition,
subtraction, division and multiplication, and logical operations like AND, OR, XOR, NAND, NOR etc. A simple
block diagram of a 4-bit ALU for the operations AND, OR, XOR and Add is shown here:

The 4-bit ALU block is built by combining four 1-bit ALU blocks.

Design Issues:
The circuit functionality of a 1 bit ALU is shown here, depending upon the control signal S1 and S0
the circuit operates as follows:
for Control signal S1 = 0 , S0 = 0, the output is A And B,
for Control signal S1 = 0 , S0 = 1, the output is A Or B,
for Control signal S1 = 1 , S0 = 0, the output is A Xor B,
for Control signal S1 = 1 , S0 = 1, the output is A Add B.
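A minimal Python sketch of this behaviour (not the simulator's 1-bit ALU component itself; the function names and the LSB-first bit lists are illustrative assumptions) is given below.

# One ALU slice: S1 S0 = 00 AND, 01 OR, 10 XOR, 11 ADD. Returns (F, carry out).
def alu_1bit(a, b, cin, s1, s0):
    if (s1, s0) == (0, 0):
        return a & b, 0
    if (s1, s0) == (0, 1):
        return a | b, 0
    if (s1, s0) == (1, 0):
        return a ^ b, 0
    return a ^ b ^ cin, (a & b) | ((a ^ b) & cin)   # add

# Four slices chained through the carry, exactly like the 4-bit block diagram.
def alu_4bit(a_bits, b_bits, s1, s0, cin=0):
    f, carry = [], cin
    for a, b in zip(a_bits, b_bits):                # LSB first
        out, carry = alu_1bit(a, b, carry, s1, s0)
        f.append(out)
    return f, carry

# S1 S0 = 11 (add): 2 + 4 = 6, carry 0, as in the procedure's example.
print(alu_4bit([0, 1, 0, 0], [0, 0, 1, 0], 1, 1))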

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To design the circuit we need four 1-bit ALUs, 11 Bit switches (to give the inputs, which toggle their
value with a double click), 5 Bit displays (for seeing the outputs), and wires.
3. The pin configuration of a component is shown whenever the mouse is hovered over any canned
component of the palette. Pin numbering starts from 1 at the bottom left corner (indicated
by the circle) and increases anticlockwise.
4. For the 1-bit ALU, input A0 is on pin 9, B0 is on pin 10 and C0 is on pin 11 (this is the input carry); for
selection of the operation, S0 is on pin 12 and S1 is on pin 13; the output F is on pin 8 and the output carry is on pin 7.
5. Click on the 1-bit ALU component (in the Other Components drawer of the palette) and then click
on the position of the editor window where you want to add the component (no drag and drop, a
simple click will serve the purpose). Likewise add 3 more 1-bit ALUs (from the Other Components
drawer of the palette), 11 Bit switches and 5 Bit displays (from the Display and Input drawer of the
palette; if they are not visible, scroll down in the drawer).
6. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram: connect the Bit switches to the inputs and the Bit display components to the outputs.
After the connections are complete click the selection tool in the palette.
7. See the output. In the screenshot diagram we have given S1 S0 = 11, which performs the add
operation, and the two number inputs A0 A1 A2 A3 = 0010 and B0 B1 B2 B3 = 0100, so we get the
output F0 F1 F2 F3 = 0110 as the sum and 0 as the carry, which is indeed an add operation. You can also
try many other combinations of values and check the result.

CIRCUIT DIAGRAM:

CONCLUSION: Thus ALU was designed.


Experiment No. 7
AIM: To design CPU

OBJECTIVE: Main objective for this experiment is to show the basic top level functionality,
organization and architecture of a computer.

OUTCOME: Student will be able to design a CPU .

SOFTWARE REQUIRED: Simulator.jar

THEORY:
At the top level a computer consists of a CPU (central processing unit), memory, I/O components,
with one or more modules of each type. These modules are interconnected in a specific manner to
achieve the basic functionality of a computer i.e. executing programs. At the top level a computer
system can be described as follows:
• describing the external behaviour of each component, i.e. the data and the control signals that
it exchanges with other components
• describing the interconnection structure and the controls required
We are considering the von Neumann architecture. Some of the basic features of this architecture
are as follows:
• data and instructions are stored in a single read-write memory
• the contents of the memory are addressable by location
• execution occurs in a sequential manner (unless explicitly specified) from one instruction to
the next
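To make the execution model concrete, here is a minimal Python sketch of a von Neumann style fetch-decode-execute loop. The tiny instruction set (LOAD, ADD, STORE, HALT) and the word layout (an opcode packed above an 8-bit address) are illustrative assumptions for this sketch only; they are not the instruction set or the 4-bit-address, 12-bit-word format of the simulator's CPU and memory components.

# Instructions and data live in the same memory; the PC walks it sequentially.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3

def run(memory):
    pc, acc = 0, 0
    while True:
        word = memory[pc]                        # fetch
        opcode, addr = word >> 8, word & 0xFF    # decode
        pc += 1
        if opcode == LOAD:                       # execute
            acc = memory[addr]
        elif opcode == ADD:
            acc += memory[addr]
        elif opcode == STORE:
            memory[addr] = acc
        elif opcode == HALT:
            return memory

# Program: acc <- mem[10]; acc <- acc + mem[11]; mem[12] <- acc; halt.
mem = [LOAD << 8 | 10, ADD << 8 | 11, STORE << 8 | 12, HALT << 8] + [0] * 6 + [5, 7, 0]
print(run(mem)[12])    # 12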
PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. We need the CPU, the working memory with a program and data loaded, a clock input, Bit
switch, Bit displays, wires.
3. Load memory: click on the load memory button in the left pane. The memory provides a 4-bit
address space and a 12-bit data word.
4. Instantiating the memory: after loading the memory, click on the memory component from the
computer design drawer in the palette of the simulator then click on the position of the design editor
where you want to put the component.
5. Instantiate the CPU from the computer design drawer in the palette of the simulator then click on
the position of the design editor where you want to put the component.
6. Connect the memory outputs to the input terminals of the CPU, the specified datapath outputs to the
inputs of the controller, the clock input, the Bit switches to the inputs and the Bit display components
to the outputs. After the connections are complete click the selection tool in the palette.
7. Start the clock and observe the behaviour of the CPU. See the content of memory by clicking the show
memory button in the left pane. Observe how the program executes sequentially and modifies the
data content as per the program.

CIRCUIT DIAGRAM:

CONCLUSION: Thus CPU was designed.


Experiment No. 8
AIM: To design a memory

OBJECTIVE: To design memory units and understand how they operate during read and write
operations.
• understanding the behaviour of memory from the working module and the module designed by the
student as part of the experiment
• designing a memory for given parameters

OUTCOME: Students will be able to design a memory unit.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
A memory unit is a collection of storage cells together with the associated circuits needed to transfer
information into and out of the device. A memory whose cells can be accessed for information transfer
to or from any desired (random) location is called a random access memory (RAM). The block
diagram of a memory unit-

Internal Construction: The internal construction of a random-access memory of m words with n bits
per word consists of m*n binary storage cells and associated decoding circuits for selecting
individual words. The binary cell is the basic building block of a memory unit.
Design Issues:
A basic RAM cell has been provided here as a component which can be used to design larger
memory units. An IC memory consisting of 4 words each having 3 bits has been also provided.
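As a behavioural reference for the circuit built in the procedure below, here is a minimal Python sketch of a 4-word by 3-bit RAM with a decoder-with-enable style word select and a read/write line; the class and parameter names are illustrative assumptions, and this is not the simulator's binary RAM cell component.

# m words of n bits; one word line is selected by decoding the address when enabled.
class SmallRAM:
    def __init__(self, words=4, bits=3):
        self.cells = [[0] * bits for _ in range(words)]   # m*n binary storage cells

    def access(self, address, enable, write, data_in=None):
        # 2-to-4 decoder with enable: exactly one select line is 1 when enabled.
        select = [int(enable and i == address) for i in range(len(self.cells))]
        if enable and write:
            self.cells[address] = list(data_in)           # write the selected word
        # read path: each output bit is the OR over all words of (select AND cell bit),
        # which is what the OR gates in the circuit do.
        return [max(s & word[j] for s, word in zip(select, self.cells))
                for j in range(len(self.cells[0]))]

ram = SmallRAM()
ram.access(2, enable=1, write=1, data_in=[1, 0, 1])       # write 101 to word 2
print(ram.access(2, enable=1, write=0))                   # read back [1, 0, 1]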

PROCEDURE:
1. Start the simulator as directed. This simulator supports 5-valued logic.
2. To design the circuit we need 1 'decoder with enable', 16 binary RAM cells, 12 OR gates, bit switches, bit displays, and wires.
3. Click on the 'decoder with enable' component (in the Other Components drawer of the palette) and
then click on the position of the editor window where you want to add the component; likewise add
16 binary RAM cells (from the Other Components drawer of the palette) and 12 OR gates (from the Logic
Gates drawer of the palette).
4. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram: connect 2 bit switches to the inputs of the 'decoder with enable' (which act as the
address input), 1 bit switch to the enable pin of the 'decoder with enable' (which acts as the memory
enable input), a bit switch to the Read/Write (R/W') line, 3 bit switches to the data input
lines, 3 bit displays to the data output lines, and the OR gates according to the diagram shown in the
circuit diagram. After the connections are complete click the selection tool in the palette.
5. To see the circuit working, do some read or write operations by properly setting the R/W' and
memory enable lines, then give the inputs and check the outputs.

CIRCUIT DIAGRAM:

CONCLUSION: Thus memory is designed.


Experiment No. 9
AIM: Case Study on multi-core Processors.

OBJECTIVE: To understand the multi-core processors.

OUTCOME: Students will be able to describe multi – core processors.

THEORY:
1. Introduction
The microprocessor industry has continued to be of great importance in the course of technological
advancement ever since microprocessors came into existence in the 1970s. The growing market and the demand
for faster performance drove the industry to manufacture faster and smarter chips. One of the most
classic and proven techniques to improve performance is to clock the chip at a higher frequency,
which enables the processor to execute programs in less time, and the industry
followed this trend from 1983 to 2002. Additional techniques have also been devised to
improve performance, including parallel processing, data level parallelism and instruction level
parallelism, which have all proven to be very effective. One technique which provides a
significant performance boost is the multi-core processor. Multi-core processors have been in
existence for the past decade, but have gained more importance of late due to the
technology limitations single-core processors face today, such as the demand for high throughput and long
lasting battery life with high energy efficiency.

2. Evolution of Multi-core processor


Driven by a performance hungry market, microprocessors have always been designed
keeping performance and cost in mind. Gordon Moore, co-founder of Intel Corporation, predicted that
the number of transistors on a chip would double every 18 months to meet this ever growing
demand, which is popularly known as Moore's Law in the semiconductor industry. Advanced chip
fabrication technology, alongside integrated circuit processing technology, offers increasing
integration density, which has made it possible to integrate one billion transistors on a chip to
improve performance. However, the performance increase from micro-architecture, governed by
Pollack's rule, is roughly proportional to the square root of the increase in complexity. This means
that doubling the logic on a processor core only improves the performance by about 40%. With
advanced chip fabrication techniques comes another major bottleneck, the power dissipation
issue. Studies have shown that transistor leakage current increases as the chip size shrinks further
and further, which increases static power dissipation to large values, as shown in Figure 1.
One alternate means of improving performance is to increase the frequency of operation which
enables faster execution of programs. However the frequency is again limited to 4GHz currently as
any increase beyond this frequency increases power dissipation again. “Battery life and system cost
constraints drive the design team to consider power over performance in such a scenario”. Power
consumption has increased to such high levels that traditional air-cooled microprocessor server
boxes may require budgets for liquid-cooling or refrigeration hardware. Designers eventually hit
what is referred to as the power wall, the limit on the amount of power a microprocessor could
dissipate.
The semiconductor industry, once driven by performance as the major design objective, is
today driven by other important considerations such as chip fabrication costs, fault tolerance,
power efficiency and heat dissipation. This led to the development of multi-core processors, which
have been effective in addressing these challenges.

3. Multi-core processors
“A Multi-core processor is typically a single processor which contains several cores on a
chip”. The cores are functional units made up of computation units and caches. These multiple
cores on a single chip combine to replicate the performance of a single faster processor. The
individual cores on a multi-core processor don’t necessarily run as fast as the highest performing
single-core processors, but they improve overall performance by handling more tasks in parallel.
The performance boost can be seen by understanding the manner in which single-core and multi-
core processors execute programs. A single-core processor running multiple programs assigns a
time slice to work on one program and then assigns different time slices to the remaining programs.
If one of the processes takes a longer time to complete then all the rest of the processes start
lagging behind. In the case of multi-core processors, however, if you have multiple tasks that can be
run in parallel at the same time, each of them will be executed by a separate core in parallel, thus
boosting the performance as shown in figure 2.
Figure 2. Multicore chips perform better – based on Intel tests using the SPECint2000 and SPECfp2000
benchmarks – than single-core processors.

The multiple cores inside the chip are not clocked at a higher frequency; instead, their
capability to execute programs in parallel is what ultimately contributes to the overall performance,
making them more energy efficient, low power cores as shown in the figure below. Multi-core
processors are generally designed in a partitioned manner so that unused cores can be powered down or
powered up as and when needed by the application, contributing to overall power dissipation
savings.

Figure 3: Dual core processor at 20% reduced clock frequency effectively delivers 73% more performance
while approximately using the same power as a single-core processor at maximum frequency.

Multi-core processors could be implemented in many ways based on the application


requirement. It could be implemented either as a group of heterogeneous cores or as a group of
homogeneous cores, or a combination of both. In a homogeneous core architecture, all the cores in the
CPU are identical and they apply a divide and conquer approach to improve the overall processor
performance, by breaking up a highly computationally intensive application into less computationally
intensive tasks and executing them in parallel. Other major benefits of using a homogeneous
multi-core processor are reduced design complexity, reusability and reduced verification effort,
which make it easier to meet the time to market criterion. On the other hand, heterogeneous cores consist of
dedicated application-specific processor cores that target the issue of running the variety of
applications to be executed on a computer. An example could be a DSP core addressing multimedia
applications that require heavy mathematical calculations, a complex core addressing
computationally intensive applications and a remedial core which addresses less computationally
intensive applications.
Multi-core processors could also be implemented as a combination of both homogeneous
and heterogeneous cores to improve performance taking advantages of both implementations.
The CELL multi-core processor from IBM follows this approach: it contains a single general purpose
microprocessor and eight similar area- and power-efficient accelerators targeting specific
applications, and it has proven to be performance efficient. "Another major multi-core benefit comes from
individual applications optimized for multi-core processors. These applications when properly
programmed can split a task into multiple smaller tasks and run them in parallel". Due to the
multiple advantages that multi-core processors come along with, most processor
manufacturers started developing them. Intel announced that all its future processors will be
multi-core when it realized that this technology can get past the power wall to improve
performance. Other popular processor manufacturers, namely AMD, IBM and TENSILICA, have all
started developing multi-core processors. The number of cores in a processor is expected to increase,
and some even predict it to follow Moore's law. TENSILICA's CEO has expressed his view on this
technology, stating that "System on a Chip will become a sea of processors. You will have ten to
maybe a thousand processors on a chip".
However it was observed that there is no throughput improvement while executing
sequential programs/single threaded applications on multi-core chips due to under utilization of
cores. Compilers are being developed which automatically parallelize applications so that multiple
independent tasks can run simultaneously on different cores.
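The kind of task-level splitting described above can be illustrated with a short Python sketch (an illustration only; it is not taken from the case study and does not use the OpenMP tools mentioned later) in which one large computation is divided into independent sub-tasks that the operating system can schedule on separate cores.

from multiprocessing import Pool

# One independent sub-task; each worker process can run on its own core.
def partial_sum(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]        # split the work into four parts
    with Pool(processes=4) as pool:
        print(sum(pool.map(partial_sum, chunks)))  # combine the partial results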
4. Major challenges faced by multi-core processors.
In spite of the many advantages that multi-core processors come with, there are a few major
challenges the technology is facing. One main issue seen is with regard to software programs which
run slower on multicore processors when compared to single core processors. It has been correctly
pointed out that “Applications on multi-core systems don’t get faster automatically as cores are
increased”. Programmers must write applications that exploit the increasing number of processors
in a multi-core environment without stretching the time needed to develop software. The majority of
applications used today were written to run on only a single processor, failing to use the capability
of multi-core processors. Although software firms can develop software programs capable of
utilizing multi-core processors to the fullest, the grave challenge the industry faces is how to port
legacy software programs developed years ago to multi-core aware software programs. Redesigning
programs, although it sounds possible, is really not a technological decision in today's environment.
It is more of a business decision, wherein companies have to decide whether to go ahead and
redesign software programs keeping in mind key parameters such as time to market, customer
satisfaction and cost reduction.
The industry is addressing this problem by designing compilers which can port legacy single
core software programs to ‘multi-core aware’ programs which will be capable of utilizing the power
of multi-core processors. The compilers could perform “code reordering”, where in compilers will
generate code, reordering instructions such that instructions that can be executed in parallel are
close to each other. This would enable us to execute instructions in parallel improving performance.
Also compilers are being developed to generate parallel threads or processes automatically for a
given application so that these processes can be executed in parallel. Intel released major updates
for its C++ and Fortran tools aimed at helping programmers exploit parallelism in multi-core
processors. Alongside this, OpenMP (Open Multiprocessing), an application programming interface
which supports multiprocessing programming in C, C++ and Fortran, provides directives for
writing efficient multithreaded code. It has, however, been correctly pointed out that "The throughput,
energy efficiency and multitasking performance of multi-core processors will all be fully realized
when application code is multi-core ready".
Secondly, on-chip interconnects are becoming a critical bottleneck in meeting the performance targets
of multi-core chips. With an increasing number of cores come huge interconnect delays
(wire delays) when data has to be moved across the multi-core chip, from memories in particular.
The performance of the processor truly depends on how fast the CPU can fetch data, rather than how
fast it can operate on it, in order to avoid data starvation. Buffering and smarter integration of
memory and processors are a few classic techniques which have attempted to address this issue.
Networks on chip (NoCs) are IPs (Intellectual Property blocks) being developed and researched
which are capable of routing data on an SoC in a much more efficient manner, ensuring less
interconnect delay.
Design complexity also increases due to possible race conditions as the number of cores increases
in a multi-core environment. "Multiple threads accessing shared data simultaneously may lead to a
timing dependent error known as a data race condition". In a multi-core environment a data structure is
open to access by all other cores while one core is updating it. In the event of a secondary core
accessing the data even before the first core finishes updating the memory, the secondary core faults in
some manner. Race conditions are especially difficult to debug and cannot be detected by
inspecting the code, because they occur randomly. Special hardware implementing
mutual exclusion techniques has to be provided to avoid race conditions.
Another important feature which impacts multi-core performance is the interaction between
on chip components viz. cores, memory controllers and shared components viz. cache and
memories where bus contention and latency are the key areas of concern. Special crossbars or mesh
techniques have been implemented on hardware to address this issue.

CONCLUSION:
Power and frequency limitations observed in single-core implementations have paved the
way for multi-core technology, which will be the trend in the industry moving forward. However,
the complete performance throughput can be realized only when the challenges multi-core
processors face today are fully addressed. A lot of technological breakthroughs are expected in
this area of technology, including new multi-core programming languages and software to port legacy
software to "multi-core aware" software programs. Although it has been one of the most
challenging technologies to adopt, there is a considerable amount of research going on in the field
to utilize multi-core processors more efficiently.
Experiment No. 10
AIM: To design an associative cache memory.

OBJECTIVE:
1. Understanding behaviour of associative cache from working module
2. Designing an associative cache for given parameters

OUTCOME: Student will understand design of Associative Cache Memory

SOFTWARE REQUIRED: Simulator.jar

THEORY:
Cache memory is a small (in size) and very fast (zero wait state) memory which sits between the
CPU and main memory. The notion of cache memory relies on the correlation properties
observed in sequences of address references generated by the CPU while executing a
program (principle of locality). When a memory request is generated, the request is first presented
to the cache memory, and if the cache cannot respond, the request is then presented to main
memory.
• Hit: a cache access finds the data resident in the cache memory
• Miss: a cache access does not find the data resident, so it is forced to access the main memory.
The cache treats main memory as a set of blocks. As the cache size is much smaller than main memory,
the number of cache lines is much smaller than the number of main memory blocks, so a procedure
is needed for mapping main memory blocks into cache lines. The cache mapping scheme affects cost and
performance. There are three methods of block placement-
• Direct Mapped Cache
• Fully Associative Mapped Cache
• Set Associative Mapped Cache
Block diagram of an associative cache:
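A minimal Python sketch of the fully associative lookup (illustrative only; the class name, the line count and the backing 'main_memory' list are assumptions, and, like the simulator component, no replacement policy is modelled) is shown below.

# Fully associative cache: a block may sit in any line, so every line's tag is compared.
class AssociativeCache:
    def __init__(self, lines=4):
        self.lines = [{"valid": 0, "tag": None, "data": None} for _ in range(lines)]

    def read(self, block_addr, main_memory):
        for line in self.lines:                        # in hardware all tags are compared in parallel
            if line["valid"] and line["tag"] == block_addr:
                return line["data"], "hit"
        data = main_memory[block_addr]                 # miss: fetch from main memory
        for line in self.lines:                        # place in the first free line, if any
            if not line["valid"]:
                line.update(valid=1, tag=block_addr, data=data)
                break
        return data, "miss"

cache = AssociativeCache()
memory = list(range(100, 116))                         # 16 blocks of dummy data
print(cache.read(5, memory))                           # (105, 'miss')
print(cache.read(5, memory))                           # (105, 'hit')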
PROCEDURE:
1. Click on the 'Associative Cache' component (in the 'Other Components' drawer of the palette) and
then click on the position of the editor window where you want to add the component; likewise add
15 Bit switches and 3 Bit displays.
2. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram. After the connections are complete click the selection tool in the palette.
3. See the output. Bit switches are used to give the inputs, so you can toggle their values with a double
click and observe the outputs for different inputs.

CONCLUSION: Thus associative cache memory is designed.


Experiment No. 11
AIM: To design a direct mapped cache memory.

OBJECTIVE:
1. Understanding the behaviour of a direct mapped cache from the working module
2. Designing a direct mapped cache for given parameters

OUTCOME: Student will understand the design of direct mapped cache memory.

SOFTWARE REQUIRED: Simulator.jar

THEORY:
Cache memory is a small (in size) and very fast (zero wait state) memory which sits between the
CPU and main memory. The notion of cache memory relies on the correlation properties
observed in sequences of address references generated by the CPU while executing a program
(principle of locality). When a memory request is generated, the request is first presented to the
cache memory, and if the cache cannot respond, the request is then presented to main memory.
• Hit: a cache access finds the data resident in the cache memory
• Miss: a cache access does not find the data resident, so it is forced to access the main memory.
The cache treats main memory as a set of blocks. As the cache size is much smaller than main memory,
the number of cache lines is much smaller than the number of main memory blocks, so a procedure is
needed for mapping main memory blocks into cache lines. The cache mapping scheme affects cost and
performance. There are three methods of block placement-
• Direct Mapped Cache
• Fully Associative Mapped Cache
• Set Associative Mapped Cache
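The direct mapped lookup can be sketched in Python as follows (illustrative only; it mirrors the component described in the procedure below, with a 4-bit address split into 2 index bits and 2 tag bits and no replacement policy, but the class and variable names are assumptions).

# Direct mapped cache: the index bits pick exactly one line; only that line's tag is checked.
class DirectMappedCache:
    def __init__(self, lines=4):
        self.lines = [{"valid": 0, "tag": 0, "data": 0} for _ in range(lines)]

    def read(self, address, main_memory):
        index = address & 0b11            # low 2 bits select one of the 4 lines
        tag = (address >> 2) & 0b11       # high 2 bits must match the stored tag
        line = self.lines[index]
        if line["valid"] and line["tag"] == tag:
            return line["data"], "hit"
        data = main_memory[address]       # miss: the fetched word overwrites that line
        self.lines[index] = {"valid": 1, "tag": tag, "data": data}
        return data, "miss"

cache = DirectMappedCache()
memory = list(range(16))                  # 16 addressable words
print(cache.read(0b0110, memory))         # (6, 'miss')
print(cache.read(0b0110, memory))         # (6, 'hit')
print(cache.read(0b1110, memory))         # same index, different tag -> (14, 'miss')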

PROCEDURE:
1. Click on the 'Direct Mapped Cache' component (in the 'Other Components' drawer of the palette)
and then click on the position of the editor window where you want to add the component; likewise
add 15 Bit switches and 3 Bit displays.
2. The 'Direct Mapped Cache' component in the 'Other Components' drawer of the simulator supports
both writing into the cache and the cache mapping. No replacement policy has been implemented.
Initially the cache is empty and the user has to give the inputs. The component contains 4 sets; each set has 5
bits, of which the left most bit is the valid bit, the next 2 bits are the tag and the remaining bits are the data bits.
It also contains a one dimensional array of memory addressed by a 4-bit memory address, and the user has to
give this address input as well. The cache reads all the data bits at a time, so a block offset is not required.
3. The pin configuration of the component can be seen whenever the mouse is hovered over any
canned component of the palette, or press the 'show pinconfig' button.
4. To connect any two components select the Connection menu of the Palette, then click on the
source terminal and click on the target terminal. Connect all the components according to the circuit
diagram. After the connections are complete click the selection tool in the palette.
5. See the output. Bit switches are used to give the inputs, so you can toggle their values with a double
click and observe the outputs for different inputs.

SCREEN SHOT:

CONCLUSION: Thus Direct Mapped Cache memory is designed.


Experiment No. 12
AIM: Case study on PCI

OBJECTIVE: To understand about PCI

OUTCOME: Student will understand about PCI

THEORY:
PCI Fundamentals
The PCI bus is the de-facto standard bus for current-generation personal computers. The main
advantages for embedded applications like the STT are:
• direct implementation in FPGAs (no data buffers or glue chips)
• efficient protocol ("data burst" is the standard transfer)
• ready availability of development hardware (PC motherboards or VME carriers)
The PCI bus is a 32- or 64-bit wide bus with multiplexed address and data lines. The bus requires
about 47 lines for a complete (32-bit) implementation. The standard operating speed is 33MHz, and
data can be transferred continuously at this rate for large bursts.
The basic transfer mechanism is a burst, composed of an address phase and one or more data
phases.
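The burst structure can be pictured with a small Python sketch (purely illustrative; it is not part of the PCI specification or of the text above, and the function and signal names are assumptions). It lists the phases of a burst read in which some data phases are delayed by wait states, as in the read and write cycles described below.

# Model a PCI-style burst: one address phase, then consecutive data phases,
# optionally preceded by wait states inserted by the target or the master.
def burst_read(memory, start_addr, length, wait_states=None):
    wait_states = wait_states or [0] * length
    phases = [("address", start_addr)]
    for i in range(length):
        phases += [("wait", None)] * wait_states[i]        # delayed data phase
        phases.append(("data", memory[start_addr + i]))    # one word per data phase
    return phases

mem = {0x10: 0xAA, 0x11: 0xBB, 0x12: 0xCC}
print(burst_read(mem, 0x10, 3, wait_states=[1, 0, 1]))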
Block diagram of a PCI bus system:
Typical read and write transfers are illustrated below:

PCI Read Cycle. Note that the first data phase is delayed by the target, the second is not delayed
(full speed) and the third is delayed by the master.

PCI Write Cycle. The first two data phases run at full speed, while the third is delayed first by
the master, then by the target.

Required PCI Bus Signals


All the required PCI bus signals are shown in the table below with explanations.
CONCLUSION: Thus studied about PCI bus system.
