
4th Semester CSE Computer Architecture (CS403)


Assignment–02 (Individual) [Instruction Level Parallelism]
Last date of submission: 13/04/2019
NB: Write the answers in your own words and do not copy from others.

1. How long would the following instruction sequence take to execute on an in-order superscalar processor with two
execution units, where each execution unit can execute any operation? Load operations have a latency of
4 cycles, store operations have a latency of 3 cycles and all other operations have a latency of 2 cycles.
LD R4, (R5) : LD R7, (R8) : ADD R9, R4, R7 : ST (R20), R9 : MUL R12, R13, R14 : SUB R2, R3, R1 :
ST (R2), R15 : MUL R21, R4, R7 : ST (R22), R23 : ST (R24), R21

2. A 5-stage pipelined processor has instruction fetch (IF), instruction decode (ID), operand fetch (OF), perform
operation (PO) and write operand (WO) stages. The IF, ID, OF and WO stages take 2 clock cycles each for any
instruction. The PO stage takes 2 clock cycles for ADD and SUB instructions, 3 clock cycles for MUL and 4 clock
cycles for DIV. Operand forwarding is used in the pipeline. How many clock cycles are needed to execute the
following instruction sequence?
MUL R2, R0, R1 : ADD R5, R5, R2 : DIV R6, R5, R2 : SUB R5, R2, R6
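A small reservation-style schedule can check the count for Question 2. This is a sketch under stated assumptions, not the official solution method: each stage is a single non-pipelined resource; instructions move through the stages in order; there is a buffer between stages, so an instruction waiting for its PO stage does not block the stages behind it; and operand forwarding lets a dependent PO stage start in the cycle after the producer's PO stage ends. The function `pipeline_cycles` and its parameters are hypothetical names.

```python
STAGES = ("IF", "ID", "OF", "PO", "WO")


def pipeline_cycles(program, stage_cycles=2, po_cycles=None):
    """Return the cycle in which the last instruction finishes its WO stage."""
    if po_cycles is None:
        po_cycles = {"ADD": 2, "SUB": 2, "MUL": 3, "DIV": 4}
    stage_busy_until = [0] * len(STAGES)  # last cycle each stage is occupied
    forwarded_at = {}  # register -> last PO cycle of its most recent producer
    finish = 0
    for op, dest, srcs in program:
        prev_end = 0  # last cycle of this instruction's previous stage
        po_end = 0
        for s, name in enumerate(STAGES):
            duration = po_cycles[op] if name == "PO" else stage_cycles
            # a stage starts after this instruction's previous stage and after
            # the previous instruction has left this stage
            start = max(prev_end + 1, stage_busy_until[s] + 1)
            if name == "PO":
                # operand forwarding: a PO result feeds a dependent PO one cycle later
                for reg in srcs:
                    start = max(start, forwarded_at.get(reg, 0) + 1)
            prev_end = start + duration - 1
            stage_busy_until[s] = prev_end
            if name == "PO":
                po_end = prev_end
        forwarded_at[dest] = po_end  # recorded after the sources are read
        finish = prev_end
    return finish


PROGRAM = [
    ("MUL", "R2", ("R0", "R1")),
    ("ADD", "R5", ("R5", "R2")),
    ("DIV", "R6", ("R5", "R2")),
    ("SUB", "R5", ("R2", "R6")),
]
print(pipeline_cycles(PROGRAM))  # 19
```

Under these assumptions, the SUB instruction waits for DIV's result, and the sequence finishes in cycle 19. A different buffering assumption (for example, stages that block until the next stage is free) can change the count.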

3. How long would the following sequence of instructions take to execute on an in-order processor with two
execution units, each of which can execute any instruction? Load operations have a latency of two cycles,
and all other operations have a latency of one cycle. Assume that the pipeline depth is 5 stages.
LD r1, (r2) : ADD r3, r1, r4 : SUB r5, r6, r7 : MUL r8, r9, r10
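The in-order dual-issue timing in Questions 1 and 3 can be checked with a small simulator. It is a sketch under these assumptions: both execution units are fully pipelined, so latency only delays dependent instructions; a dependent instruction may issue in the cycle its producer's latency has elapsed; up to two instructions issue per cycle, strictly in program order, so a stalled instruction also stalls everything behind it; and the result is the cycle in which the last operation completes. Adding `PIPELINE_DEPTH - 1` cycles of pipeline fill for Question 3 is one assumed convention; textbooks count this differently. All names in the code are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (opcode, destination register or None for stores, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def simulate(program: Sequence[Instr], latency: Dict[str, int],
             default_latency: int, width: int = 2) -> Tuple[int, List[int]]:
    """Return (cycle in which the last operation completes, issue cycle of each instruction)."""
    ready: Dict[str, int] = {}  # register -> first cycle a consumer may issue
    issue_cycles: List[int] = []
    finish, cycle, i = 0, 1, 0
    while i < len(program):
        issued = 0
        while i < len(program) and issued < width:
            op, dest, srcs = program[i]
            if any(ready.get(r, 1) > cycle for r in srcs):
                break  # in-order issue: this stall blocks all later instructions
            lat = latency.get(op, default_latency)
            if dest is not None:
                ready[dest] = cycle + lat
            finish = max(finish, cycle + lat - 1)
            issue_cycles.append(cycle)
            issued += 1
            i += 1
        cycle += 1
    return finish, issue_cycles


Q1_PROGRAM: List[Instr] = [
    ("LD", "R4", ("R5",)), ("LD", "R7", ("R8",)),
    ("ADD", "R9", ("R4", "R7")), ("ST", None, ("R20", "R9")),
    ("MUL", "R12", ("R13", "R14")), ("SUB", "R2", ("R3", "R1")),
    ("ST", None, ("R2", "R15")), ("MUL", "R21", ("R4", "R7")),
    ("ST", None, ("R22", "R23")), ("ST", None, ("R24", "R21")),
]
Q3_PROGRAM: List[Instr] = [
    ("LD", "r1", ("r2",)), ("ADD", "r3", ("r1", "r4")),
    ("SUB", "r5", ("r6", "r7")), ("MUL", "r8", ("r9", "r10")),
]

if __name__ == "__main__":
    # Question 1: LD = 4 cycles, ST = 3 cycles, everything else 2 cycles
    print(simulate(Q1_PROGRAM, {"LD": 4, "ST": 3}, 2))  # (14, [1, 1, 5, 7, 7, 8, 10, 10, 11, 12])
    # Question 3: LD = 2 cycles, everything else 1 cycle
    last, issues = simulate(Q3_PROGRAM, {"LD": 2}, 1)
    PIPELINE_DEPTH = 5  # assumption: add depth - 1 cycles of pipeline fill
    print(last, issues, last + PIPELINE_DEPTH - 1)  # 4 [1, 3, 3, 4] 8
```

The issue cycles make the stalls visible: in Question 1, `ST (R2), R15` must wait for `SUB R2, R3, R1` because R2 is its address register, and the final store waits for `MUL R21, R4, R7`.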
4. What are superscalar and superpipelined processors? Measure their performance in terms of speedup.
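One common textbook model for the speedup part, stated here as an assumption: a base scalar pipeline of k stages completes N independent instructions in k + N − 1 base cycles; a superscalar processor of degree m issues m instructions per base cycle; a superpipelined processor of degree n has a cycle time equal to 1/n of the base cycle. Dependences, branches and resource conflicts are ignored.

```latex
% k = pipeline depth, N = number of independent instructions,
% m = superscalar issue width, n = degree of superpipelining, base cycle = 1 time unit
\begin{aligned}
T(1,1) &= k + N - 1 \\
T(m,1) &= k + \frac{N - m}{m}, \qquad
S(m,1) = \frac{T(1,1)}{T(m,1)} = \frac{m\,(N + k - 1)}{N + m\,(k - 1)} \;\to\; m \text{ as } N \to \infty \\
T(1,n) &= k + \frac{N - 1}{n}, \qquad
S(1,n) = \frac{T(1,1)}{T(1,n)} = \frac{n\,(N + k - 1)}{n k + N - 1} \;\to\; n \text{ as } N \to \infty
\end{aligned}
```

Both speedups approach their ideal values m and n only for long instruction streams; for short ones, the k-cycle pipeline fill time dominates.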
5. Describe the architecture of CISC, RISC and VLIW processors.
6. Unroll the following loop by a factor of 7 (7-level loop unrolling):
for (i = 0; i < 200; i++)
    a[i] = b[i] + c[i];
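As a hedged illustration, the sketch below mirrors the C loop in Python. Because 200 = 7 × 28 + 4, the main loop runs 28 times with seven independent statements per trip (covering i = 0 to 195), and a short clean-up loop handles the last four elements. The function name `add_unrolled_by_7` is hypothetical.

```python
def add_unrolled_by_7(b, c):
    """Compute a[i] = b[i] + c[i] with the loop body unrolled 7 times."""
    n = len(b)
    a = [0] * n
    limit = n - n % 7  # 196 when n == 200
    # Main loop: 7 independent statements per trip (28 trips when n == 200)
    for i in range(0, limit, 7):
        a[i] = b[i] + c[i]
        a[i + 1] = b[i + 1] + c[i + 1]
        a[i + 2] = b[i + 2] + c[i + 2]
        a[i + 3] = b[i + 3] + c[i + 3]
        a[i + 4] = b[i + 4] + c[i + 4]
        a[i + 5] = b[i + 5] + c[i + 5]
        a[i + 6] = b[i + 6] + c[i + 6]
    # Clean-up loop for the n % 7 leftover elements (4 when n == 200)
    for i in range(limit, n):
        a[i] = b[i] + c[i]
    return a


b = list(range(200))
c = [2 * x for x in range(200)]
a = add_unrolled_by_7(b, c)
```

Unrolling cuts the loop-control overhead (one compare and branch per seven additions instead of one per addition) and exposes seven independent additions that a scheduler can overlap.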
7. Convert each of the following loops into a sequence of scalar instructions and then into its vector form.
a. Do 10 I = 1, 100, 1
   10 (I) = 3*A(I+2) + B(I+3)
b. A(0) = X
   Do 20 I = 3, N
   20 (I) = A(I–3) * B(I) + C(I+2)
c. Do 20 I = 2, N, 2
   A(I) = B(I–2)
   20 B(I) = 2 * B(I+1)
d. Do 20 I = 8, 120, 2
   20 A(I) = A(I–3) * B(I) + C(I+2)
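For part d, I takes only even values, so A(I–3) always reads an odd element that the loop never writes. There is therefore no loop-carried dependence, and the loop can be written as a single array statement in Fortran 90 subscript-triplet notation: A(8:120:2) = A(5:117:2) * B(8:120:2) + C(10:122:2). The sketch below is a hypothetical check, with Python lists standing in for the arrays (index 0 unused, to mimic 1-based Fortran); it confirms that the scalar loop and the vector form agree.

```python
def scalar_form(A, B, C):
    """Element-by-element loop: Do 20 I = 8, 120, 2."""
    A = A[:]
    for i in range(8, 121, 2):
        A[i] = A[i - 3] * B[i] + C[i + 2]
    return A


def vector_form(A, B, C):
    """A(8:120:2) = A(5:117:2) * B(8:120:2) + C(10:122:2)."""
    A = A[:]
    # every right-hand-side element is read before any A element is written,
    # which is what a vector instruction does
    rhs = [a * b + c for a, b, c in zip(A[5:118:2], B[8:121:2], C[10:123:2])]
    A[8:121:2] = rhs  # 57 elements: I = 8, 10, ..., 120
    return A


A = list(range(123))
B = [i % 5 for i in range(123)]
C = [7] * 123
print(scalar_form(A, B, C) == vector_form(A, B, C))  # True
```

The same check fails for a loop such as A(I) = A(I–3) * ... with step 1, where iteration I reads a value written three iterations earlier; that recurrence cannot be turned into one full-length vector operation.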
8. Describe the architecture of register-to-register and memory-to-memory vector processors.
9. Calculate the speedup of a vector processor.
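One simple model, offered as an assumption rather than the course's required formula: a scalar loop spends t_s cycles per element; a register-to-register vector unit processes the data in strips of at most MVL elements, paying a start-up time T_0 once per strip plus t_v cycles per element. The speedup is then S = n·t_s / (⌈n / MVL⌉·T_0 + n·t_v). The numbers in the usage line (10 scalar cycles per element, a 16-cycle start-up, MVL = 64) are hypothetical.

```python
import math


def vector_speedup(n, scalar_cycles_per_element, startup_cycles,
                   vector_cycles_per_element=1, mvl=64):
    """Speedup of a strip-mined vector loop over the equivalent scalar loop."""
    scalar_time = n * scalar_cycles_per_element
    strips = math.ceil(n / mvl)  # each strip of at most MVL elements pays the start-up once
    vector_time = strips * startup_cycles + n * vector_cycles_per_element
    return scalar_time / vector_time


print(vector_speedup(64, 10, 16))  # 8.0
```

With short vectors the start-up time dominates, and for n = 1 the vector version is slower than the scalar one.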
10. Explain scatter and gather instructions with an example.
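A gather loads vector elements from the addresses given in an index vector, and a scatter stores them back through the same kind of index vector. The sketch below emulates both with Python lists on a sparse update (adding three nonzero values into selected elements of `c`); `gather` and `scatter` are hypothetical helper names, not a real instruction set API.

```python
def gather(memory, index):
    """Indexed vector load: V[i] = memory[index[i]]."""
    return [memory[j] for j in index]


def scatter(memory, index, values):
    """Indexed vector store: memory[index[i]] = V[i]."""
    for j, v in zip(index, values):
        memory[j] = v


# Sparse update: add the nonzero elements of a into c at positions idx
c = [0, 10, 20, 30, 40, 50, 60, 70]
idx = [1, 4, 6]
a_nonzero = [1, 2, 3]
v = gather(c, idx)                          # [10, 40, 60]
v = [x + y for x, y in zip(v, a_nonzero)]   # [11, 42, 63]
scatter(c, idx, v)                          # c == [0, 11, 20, 30, 42, 50, 63, 70]
```

Only the three indexed elements are touched, which is why gather/scatter lets a vector unit work on sparse data without processing the zero entries.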

11. Design four-way low-order interleaving with two memory banks. The memory address is five bits long,
including two bits for the word address.
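In low-order interleaving, the least significant address bits select the module, so consecutive addresses fall in different modules. The question's bit split is open to interpretation (four-way interleaving normally needs two module-select bits, while the question mentions two bits for the word address), so this sketch takes the number of module-select bits as a parameter and assumes 2 of them, leaving 3 bits for the word within a module. `low_order_decode` and the constants are hypothetical names.

```python
ADDRESS_BITS = 5
MODULE_BITS = 2  # assumption: four modules selected by the two low-order bits


def low_order_decode(address, module_bits=MODULE_BITS):
    """Split an address into (module number, word within module)."""
    module = address & ((1 << module_bits) - 1)  # low-order bits pick the module
    word = address >> module_bits                # remaining high-order bits pick the word
    return module, word


# Print which addresses live in each module
for m in range(1 << MODULE_BITS):
    addrs = [a for a in range(1 << ADDRESS_BITS) if low_order_decode(a)[0] == m]
    print(f"M{m}: {addrs}")  # M0 holds 0, 4, 8, ..., 28
```

Changing `MODULE_BITS` to 3 gives the other reading of the question (eight modules with two word bits each).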
12. Design high-order memory interleaving. The memory address is five bits long, including two bits for the
module address.
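In high-order interleaving, the most significant address bits select the module, so each module holds one contiguous block of addresses. With the question's figures (5 address bits, 2 of them for the module), there are 4 modules of 2^3 = 8 words each. The sketch below is a minimal decoder under that reading; `high_order_decode` is a hypothetical name.

```python
ADDRESS_BITS = 5
MODULE_BITS = 2  # from the question: the two high-order bits select the module


def high_order_decode(address, address_bits=ADDRESS_BITS, module_bits=MODULE_BITS):
    """Split an address into (module number, word within module)."""
    word_bits = address_bits - module_bits
    module = address >> word_bits            # high-order bits pick the module
    word = address & ((1 << word_bits) - 1)  # low-order bits pick the word
    return module, word


for m in range(1 << MODULE_BITS):
    addrs = [a for a in range(1 << ADDRESS_BITS) if high_order_decode(a)[0] == m]
    print(f"M{m}: {addrs}")  # M0: [0, 1, 2, 3, 4, 5, 6, 7]
```

Because consecutive addresses stay in the same module, high-order interleaving does not overlap sequential accesses, but a failed module affects only one contiguous address range.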
13. Draw and describe S-access memory interleaving and the timing diagram of its access pattern.
14. Explain pipeline chaining and draw the timing diagram for the following instructions:
V2 = V1 * S2, V3 = V2 + V0 and V4 = V3 – V1
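The timing diagram for Question 14 can be tabulated with a simple model, stated as an assumption: each vector operation occupies its functional unit for a start-up time plus one cycle per element; without chaining, a dependent operation starts only when its producer has finished; with chaining, it starts as soon as the producer's first result appears. The start-up values (7 cycles for multiply, 6 for add and subtract) and the vector length of 64 are hypothetical, and V3 and V4 are assumed to use separate add and subtract units. If they shared one adder, V4 would have to wait for that unit to become free.

```python
# (operation, hypothetical start-up latency in cycles)
OPS = [("V2 = V1 * S2", 7), ("V3 = V2 + V0", 6), ("V4 = V3 - V1", 6)]


def timeline(ops, n, chained):
    """Return (operation, start cycle, end cycle) rows for a chain of dependent vector ops."""
    rows, t = [], 0
    for name, startup in ops:
        start = t
        end = start + startup + n  # busy for start-up plus one cycle per element
        rows.append((name, start, end))
        # chaining: the next unit starts when this unit's first element emerges
        t = start + startup if chained else end
    return rows


for chained in (False, True):
    print("chained" if chained else "unchained", timeline(OPS, 64, chained))
```

Without chaining the sequence ends at cycle 7 + 64 + 6 + 64 + 6 + 64 = 211; with chaining it ends at 7 + 6 + 6 + 64 = 83, because the three units work on different elements of the same vectors at the same time.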
15. Apply the vector register optimization technique to the instruction A – B*C.

Course outcome: Ability to understand Instruction Level Parallelism and to become familiar with superscalar
and vector architectures that improve the performance of a system.
