2324sem 1-CS2100


CS2100

NATIONAL UNIVERSITY OF SINGAPORE

CS2100 – COMPUTER ORGANISATION


(Semester 1: AY2023/24)

Time Allowed: 2 Hours

INSTRUCTIONS TO STUDENTS

1. This assessment paper consists of TWENTY (20) questions in TWO (2) parts and
comprises FOURTEEN (14) printed pages including an annexe.
2. Answer ALL questions on Examplify. Only provide the answers stated.
3. This is an OPEN BOOK assessment.
4. Electronic, printed/written materials are allowed. Apart from calculators,
electronic devices are not allowed.
5. You are not allowed to run any compiler, spreadsheet software, or any other
programs other than PowerPoint, WORD and a PDF reader. You are only allowed
to use these to view your notes.
6. Page 14 contains the MIPS Data Reference sheet.
7. The maximum mark of this assessment is 100.
Question             Max. mark
Part A: Q1 – 15          30
Part B: Q16              15
Part B: Q17              13
Part B: Q18              12
Part B: Q19              10
Part B: Q20              20
Total                   100

——— END OF INSTRUCTIONS ———


Part A: Multiple-Choice Questions [Total: 15×2=30 marks]


Each multiple-choice question (MCQ) is worth TWO marks and has exactly one correct answer. Please
write your answers in CAPITAL LETTERS.

Consider the following MIPS program fragment, which accesses an 8-element integer array whose
starting address is in register $7. Of these 8 elements, three are below the value of 5. The number of
elements in the array (in this case 8) is in register $8. We assume that we are allowed to use $1.

      addi $1, $zero, 0
      addi $2, $zero, 0
Loop: beq  $2, $8, Exit
      sll  $3, $2, 2
      add  $3, $3, $7
      lw   $4, 0($3)
      slti $5, $4, 5
      bne  $5, $zero, Skip
      addi $1, $1, 1
Skip: addi $2, $2, 1
      j    Loop
Exit: ...

1. Assume that we run this program on a single-cycle non-pipelined MIPS datapath with a clock rate
of 500 MHz (500 × 10^6 Hz). What is the time taken to execute this program in nanoseconds (ns)?
(Note that 1 ns = 10^-9 second.)
A. 150 ns
B. 144 ns
C. 142 ns
D. 134 ns
E. None of the options (A), (B), (C), (D) are correct.
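
One way to sanity-check the instruction count for the fragment above is to mirror its control flow in a short script. This is only a sketch: the array contents are hypothetical (any 8 values with exactly three below 5 give the same count), the name `count_instructions` is an assumption, and the count stops at the `beq` that exits the loop.

```python
def count_instructions(array):
    """Count dynamic instructions executed by the MIPS fragment, up to Exit."""
    executed = 2                  # the two addi initialisations
    i = 0
    while True:
        executed += 1             # beq $2, $8, Exit
        if i == len(array):
            break
        executed += 5             # sll, add, lw, slti, bne
        if not array[i] < 5:      # bne falls through when the element is >= 5
            executed += 1         # addi $1, $1, 1
        executed += 2             # addi $2, $2, 1 and j Loop
        i += 1
    return executed


# Hypothetical data: exactly three of the eight elements are below 5.
sample = [1, 9, 7, 3, 8, 6, 4, 5]
instructions = count_instructions(sample)
cycle_ns = 1e9 / 500e6            # single-cycle datapath: one instruction per 2 ns cycle
total_ns = instructions * cycle_ns
print(instructions, total_ns)
```

On a single-cycle datapath every instruction takes exactly one clock period, so the execution time is simply the dynamic instruction count multiplied by the period.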

Now let’s suppose we execute the above program on a multi-cycle datapath with the following stage
timings in nanoseconds. Instructions will skip stages they do not use thus saving time.
Fetch    Decode   Execute  Memory   Writeback
0.1 ns   0.2 ns   0.2 ns   0.4 ns   0.2 ns

2. What is the clock rate of this system? Note that 1 GHz = 10^9 Hz.
A. 10 GHz
B. 5 GHz
C. 2.5 GHz
D. 1 GHz
E. None of the options (A), (B), (C), (D) are correct.


3. How much time does it take to execute the program above in ns on this new architecture? Assume
that the j instruction only uses the fetch and decode stages.
A. Approximately 26 ns
B. Approximately 52 ns
C. Approximately 105 ns
D. Approximately 210 ns
E. None of the options (A), (B), (C), (D) are correct.
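
The multi-cycle timing can be sketched as follows. The script assumes that each stage occupies one clock cycle, that the clock period equals the slowest stage, and that arithmetic instructions skip the memory stage while branches stop after the execute stage. These stage counts (other than the one given for `j`) and the helper name `dynamic_mix` are assumptions made for illustration.

```python
# Stage latencies from the table, in ns.
STAGE_NS = {"fetch": 0.1, "decode": 0.2, "execute": 0.2,
            "memory": 0.4, "writeback": 0.2}

# Assumed number of stages (clock cycles) used by each instruction class.
CYCLES = {"alu": 4, "lw": 5, "branch": 3, "j": 2}


def dynamic_mix(n_elements, n_below):
    """Dynamic instruction counts per class for the loop in the fragment."""
    increments = n_elements - n_below            # times addi $1, $1, 1 runs
    return {
        # 2 initialisations; sll, add, slti, addi $2 on every pass
        "alu": 2 + 4 * n_elements + increments,
        "lw": n_elements,
        # beq and bne on every pass, plus the beq that exits
        "branch": 2 * n_elements + 1,
        "j": n_elements,
    }


period_ns = max(STAGE_NS.values())               # slowest stage sets the clock
clock_ghz = 1 / period_ns
mix = dynamic_mix(8, 3)
total_cycles = sum(CYCLES[kind] * count for kind, count in mix.items())
total_ns = total_cycles * period_ns
print(clock_ghz, total_cycles, round(total_ns, 1))
```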

4. Given the same program above running on a 5-stage pipeline with the same stage timings in ns as
shown above. This is an “almost perfect” pipeline; we can ignore all dependencies between
instructions, but each stage has a 0.05 ns delay introduced by the stage’s pipeline register, on top
of the timings shown above.
How long does it take, in ns, to execute the program above?
A. Approximately 15 ns
B. Approximately 19 ns
C. Approximately 30 ns
D. Approximately 34 ns
E. None of the options (A), (B), (C), (D) are correct.
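
For an ideal pipeline without stalls, the usual estimate is (number of stages + number of instructions - 1) cycles, with a cycle time set by the slowest stage plus the pipeline register delay. The sketch below applies that estimate; it assumes the fragment executes 72 instructions (2 initialisations, 8 passes of 8 instructions, 5 extra increments and the exiting `beq`), and the function name `pipeline_time_ns` is hypothetical.

```python
def pipeline_time_ns(n_instructions, stage_ns, register_delay_ns):
    """Execution time of an ideal pipeline with no stalls or flushes."""
    n_stages = len(stage_ns)
    period = max(stage_ns) + register_delay_ns   # slowest stage plus register
    cycles = n_stages + n_instructions - 1       # fill the pipe, then one per cycle
    return cycles * period


stage_ns = [0.1, 0.2, 0.2, 0.4, 0.2]             # F, D, E, M, W from the table
total_ns = pipeline_time_ns(72, stage_ns, 0.05)  # 72 is an assumed dynamic count
print(round(total_ns, 1))
```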

5. Which ONE of the following statements about branching in a pipeline is INCORRECT?


A. In our MIPS pipeline with delay slots, we need to fill the slot with instructions that would be
executed regardless of the branch outcome, without breaking any data dependencies. If no
such instruction is found, we must fill the slots with NOP (no-op).
B. In our MIPS pipeline with a “predict-taken” branch prediction strategy, we will fetch from
the branch target without first knowing the branch outcome.
C. In our MIPS pipeline with early branching and no branch prediction or delay slots, a branch
depending on a lw instruction immediately before it will incur a stall of at least one clock
cycle.
D. If our MIPS pipeline completely does not handle branch hazards at all, then it is equivalent
to a pipeline with a “predict not taken” branch prediction strategy.
E. None of the options (A), (B), (C), (D) are INCORRECT.

6. A single 2-input logic gate can be used to find out whether an 8-bit binary value ABCDEFGH
representing an unsigned integer is divisible by four, that is, its output is 1 if the value is divisible
by four, or 0 otherwise. What gate is it?
A. NAND gate.
B. XOR gate.
C. XNOR gate.
D. NOR gate.
E. None of the options (A), (B), (C), (D) are correct.
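
A candidate gate can be checked exhaustively over all 256 values. The sketch below assumes A is the most significant bit and H the least significant bit, and tests a NOR of the two lowest bits G and H; the function names are illustrative only.

```python
def nor(a, b):
    """2-input NOR gate on single bits."""
    return int(not (a or b))


def divisible_by_four(value):
    """Apply the candidate gate to bits G and H of an 8-bit value ABCDEFGH."""
    g = (value >> 1) & 1      # second least significant bit
    h = value & 1             # least significant bit (assumed to be H)
    return nor(g, h)


# Compare the gate output against the arithmetic definition for every value.
mismatches = [v for v in range(256)
              if divisible_by_four(v) != int(v % 4 == 0)]
print(mismatches)
```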


7. Which of the following statements are true?


(i) An OR function can be created from only AND gates.
(ii) A NOR function can be created from only NAND gates.
(iii) Every product-of-sum expression is a product-of-maxterms expression.
(iv) A simplest sum-of-products expression can be obtained by including only the essential
prime implicants on the K-map but leaving out all the non-essential prime implicants.
A. Only (i).
B. Only (ii).
C. Only (i) and (iv).
D. Only (ii) and (iii).
E. None of the options (A), (B), (C), (D) are correct.

8. Given the logic diagram below:


A
B F

Assuming that complemented literals are not available and all gates (except inverters) are 2-input
gates, which of the following circuits is the smallest circuit that can be used to replace the above
circuit?
A. A circuit with 3 AND gates and 1 OR gate.
B. A circuit with 2 AND gates and 1 OR gate.
C. A circuit with 1 AND gate and 2 OR gates.
D. A circuit with 1 AND gate and 1 OR gate.
E. None of the options (A), (B), (C), (D) are correct.


9. Which of the following Boolean expressions can be implemented using a single 2×4 decoder with
one-enable and regular outputs, without any additional logic gates? Complemented literals are
not available.
(i) ABC
(ii) A'BC'
(iii) A'B'C'
A. Only (i) and (ii).
B. Only (i) and (iii).
C. Only (ii) and (iii).
D. All of (i), (ii) and (iii).
E. None of the options (A), (B), (C), (D) are correct.

10. Which of the following Boolean functions can be implemented using a single 4:1 multiplexer
without any additional logic gates? Complemented literals are not available.
(i) F1(A,B,C) = Σm(3,4,5,7)
(ii) F2(A,B,C) = Σm(0)
(iii) F3(A,B,C) = Σm(0,2,6)
A. Only (i) and (ii).
B. Only (i) and (iii).
C. Only (ii) and (iii).
D. All of (i), (ii) and (iii).
E. None of the options (A), (B), (C), (D) are correct.

11. The purpose of pipelining is to:


A. Increase the throughput of instruction processing.
B. Decrease the latency of instruction processing.
C. Increase the capabilities of certain instructions.
D. Decrease power consumption of the processor.
E. None of the options (A), (B), (C), (D) are correct.


12. Consider the following MIPS code:


L8:
    lw   $2, 0($5)     # I1
    slti $2, $2, 124   # I2
    bne  $2, $0, L3    # I3
    . . .
    sw   $3, 0($5)     # I4
L3:
    add  $3, $3, $4    # I5
    lw   $5, 4($5)     # I6
    bne  $5, $0, L8    # I7
    ? ? ?              # delay slot

Which instruction in the code can we move safely to the delay slot if the processor supports
delayed branching?
A. I1
B. I2
C. I4
D. I5
E. I6

13. Which of the following cannot happen in a fully associative cache?


A. Cache misses.
B. Cold misses.
C. Conflict misses.
D. Capacity misses.
E. All of the above.


14. Consider the following word (4 byte) memory access trace on a cache with a block size of 4 words, each
word being 4 bytes long:
(1) 0x804ab0 – cache miss
(2) 0x804ab4 – cache hit
(3) 0x804ab8 – cache hit
(4) 0x804ab0 – cache hit
Which of the following statements is true?
A. The miss in access (1) is due to spatial locality.
B. The hit in access (3) is due to spatial locality.
C. Both hits in access (2) and (3) are due to temporal locality.
D. The hit in access (4) is due to spatial locality.
E. None of the options (A), (B), (C), (D) are correct.
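
The trace can be examined by mapping each address to its block number. The sketch below labels a hit as temporal when the same address was accessed before, and as spatial when only a neighbouring word of the same block was accessed before. It assumes the cache is large enough that nothing is evicted during the trace; the function name `classify` is hypothetical.

```python
def classify(trace, block_bytes):
    """Label each access as a miss, a spatial hit or a temporal hit."""
    loaded_blocks = set()     # blocks currently in the cache (no evictions assumed)
    touched = set()           # exact addresses accessed so far
    labels = []
    for addr in trace:
        block = addr // block_bytes
        if block not in loaded_blocks:
            labels.append("miss")
            loaded_blocks.add(block)
        elif addr in touched:
            labels.append("hit (temporal)")   # same word accessed again
        else:
            labels.append("hit (spatial)")    # neighbour of an earlier access
        touched.add(addr)
    return labels


trace = [0x804AB0, 0x804AB4, 0x804AB8, 0x804AB0]
labels = classify(trace, block_bytes=4 * 4)   # 4 words of 4 bytes per block
print(labels)
```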

15. In a byte-addressed processor where addresses and words are 32 bits in length, which of the
following statements is correct if the processor has a 4-way set associative cache that can hold
2048 blocks in total where each block consists of 4 words? (Bit position starts with the least
significant bit being position 0.)
A. The tag of an address consists of bits 31 to 16, the set index consists of the bits 15 to 5, while
the offset are the bits 4 to 0.
B. The tag of an address consists of bits 31 to 16, the set index consists of the bits 15 to 6, while
the offset are the bits 5 to 0.
C. The tag of an address consists of bits 31 to 14, the set index consists of the bits 13 to 5, while
the offset are the bits 4 to 0.
D. The tag of an address consists of bits 31 to 13, the set index consists of the bits 12 to 4, while
the offset are the bits 3 to 0.
E. None of the options (A), (B), (C), (D) are correct.
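
The field boundaries follow from the cache geometry: the offset width is log2 of the block size in bytes, the index width is log2 of the number of sets, and the tag takes the remaining upper bits. A minimal sketch of that calculation, with hypothetical function names:

```python
def log2_exact(n):
    """Return log2(n) for a power of two, without floating point."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def address_fields(address_bits, total_blocks, ways, words_per_block, bytes_per_word):
    """Return (high bit, low bit) for the tag, set index and offset fields."""
    offset_bits = log2_exact(words_per_block * bytes_per_word)
    index_bits = log2_exact(total_blocks // ways)      # number of sets
    tag_low = offset_bits + index_bits
    return {
        "tag": (address_bits - 1, tag_low),
        "index": (tag_low - 1, offset_bits),
        "offset": (offset_bits - 1, 0),
    }


fields = address_fields(address_bits=32, total_blocks=2048, ways=4,
                        words_per_block=4, bytes_per_word=4)
print(fields)
```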


Part B: FITB questions [Total: 70 marks]


Q16. MIPS [15 marks]
Consider the following program running on a 5-stage MIPS pipeline:
      addi $2, $zero, 0      # i1
      addi $3, $zero, 0      # i2
Loop: add  $4, $2, $8        # i3
      lw   $5, 0($4)         # i4
      beq  $5, $zero, Skip   # i5
      addi $3, $3, 1         # i6
Skip: addi $2, $2, 4         # i7
      slti $5, $2, 8         # i8
      bne  $5, $zero, Loop   # i9
Exit:                        # i10
                             # i11
                             # i12

This program processes an array of non-zero elements. The comments shown are instruction IDs that
you may or may not want to use for working out your solution. Instructions i10 to i12 are outside of
our program but you may require them for your calculations.
(a) Which register is likeliest to hold the array’s base address? Fill in only the register number. For example, if
you think that the answer is $0, fill in 0. [1 mark]

For all your answers below, count only up to the last cycle of the final time that i9 is executed.
(b) Assuming that this program is run on a pipeline without forwarding but with early branching in
the ID stage and no branch prediction strategies or delay slots, how many cycles does it take to
run the program above to completion? [5 marks]
(c) Assuming that this program is run on a pipeline with forwarding and early branching in the decode
stage, and with no branch prediction strategies or delay slots, how many cycles does it take to run
the program above? [5 marks]
(d) Assuming that this program is run on a pipeline with forwarding and early branching and a predict-
not-taken prediction strategy, how many cycles does it take to run the program above?
[4 marks]


Q17. Combinational circuits [13 marks]


(a) Given the following Boolean function P(W,X,Y,Z), where d’s are don’t cares:
P(W,X,Y,Z) = m(1,8,13) + d(0,2,5,9,10,15).
(i) How many prime implicants are there in the K-map of P? [1 mark]
(ii) How many essential prime implicants are there in the K-map of P? [1 mark]

(b) Given the following Boolean function Q(W,X,Y,Z), where d’s are don’t cares:
Q(W,X,Y,Z) = M(5,7,8,10,13,15)  d(0,2,3,4,9,14).
(i) How many prime implicants are there in the K-map of Q? [1 mark]
(ii) How many essential prime implicants are there in the K-map of Q? [1 mark]

(c) A Boolean function Z(A,B) is implemented using a half adder, two inverters, and a 2-to-1 priority
encoder as shown below. The function tables of the half adder and priority encoder are also
shown below.

[Figure: half adder with inputs X, Y (from A, B) and outputs C, S; priority encoder with inputs F1, F0 and output G, which produces Z; two inverters.]

Half adder          Priority encoder
X  Y | C  S         F1 F0 | G
0  0 | 0  0         0  0  | X
0  1 | 0  1         0  1  | 0
1  0 | 0  1         1  X  | 1
1  1 | 1  0

The circuit above may be replaced by a single 2-input logic gate. What is the logic gate? Write the
name of the logic gate in upper-case letters (OR, AND, NOR, NAND, XOR, XNOR). [2 marks]

(d) A Boolean function R(A,B,C,D) is implemented with a 4-bit magnitude comparator, an OR gate
and an AND gate as shown below:

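[Figure: 4-bit magnitude comparator (COMP) with inputs X3 X2 X1 X0 = A B C D and inputs Y3 Y2 Y1 Y0; its outputs X<Y, X=Y and X>Y feed the OR gate and the AND gate that produce R(A,B,C,D).]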

What is R(A,B,C,D) in Σm notation? Fill in the list of minterm numbers in Σm(…), arranging the
numbers in increasing order and separating them with commas, for example, if the answer is
Σm(3,5,9), write “3,5,9” (without the double quotes). Do not write Σm, or add spaces or other
punctuation marks in your answer. [3 marks]


(e) A Boolean function S(A,B,C,D) is implemented with a 2×4 decoder with one-enable, two 2:1
multiplexers and an OR gate as shown below:

24
DEC 0 0
S0
1 1
A S1 D S
2 0
B S0
S0
EN 3 1
D
C

What is S(A,B,C,D) in Σm notation? Fill in the list of minterm numbers in Σm(…), arranging the
numbers in increasing order and separating them with commas, for example, if the answer is
Σm(3,5,9), write “3,5,9” (without the double quotes). Do not write Σm, or add spaces or other
punctuation marks in your answer. [4 marks]


Q18. Sequential circuits [12 marks]


(a) The logic diagram on the right shows a sequential circuit with a D flip-flop and a T flip-flop.
AB denotes the state in binary. However, we will refer to the state values in decimal (base 10).

[Figure: D flip-flop with outputs Q = A and Q'; T flip-flop with outputs Q = B and Q'; both driven by Clock.]

For parts (i) to (iv) below, write your answer as a single decimal value without any space or
punctuation.
(i) What is the next state after state 0? [1 mark]
(ii) What is the next state after state 1? [1 mark]
(iii) What is the next state after state 2? [1 mark]
(iv) What is the next state after state 3? [1 mark]

(b) The state diagram on the right shows a sequential circuit with 3 states: state 0 (AB=00₂)
through state 2 (AB=10₂) and an external input x. Design this sequential circuit using JK
flip-flops.

[Figure: state diagram with states 0, 1 and 2; transitions labelled with the input values 0, 1 and 0,1.]

For parts (i) to (iv) below, note that the NOT operator is the single quote ' (eg: A'), the OR operator
is the plus sign + (eg: A+B), and the AND operator is the full stop . (eg: A.B). You should not add a
full stop or other punctuation or any space at the end of your answer, for example, if your answer
is A+B, type “A+B” (without the double quotes) and not “A+B.” or “A + B.”.
(i) Write out the simplified SOP expression for the flip-flop input JA. [1 mark]
(ii) Write out the simplified SOP expression for the flip-flop input KA. [1 mark]
(iii) Write out the simplified SOP expression for the flip-flop input JB. [1 mark]
(iv) Write out the simplified SOP expression for the flip-flop input KB. [1 mark]

For parts (v) to (viii) below, assume that you have implemented this sequential circuit with the
correct simplified SOP expressions for the flip-flop input functions. Write your answer as a single
decimal value without any space or punctuation.
(v) What is the minimum number of gates required to implement this circuit? [1 mark]
(vi) If the circuit is in state 3, what is the next state on input x=0? [1 mark]
(vii) If the circuit is in state 3, what is the next state on input x=1? [1 mark]
(viii) Is the circuit self-correcting? Answer “Yes” or “No” (without the double quotes). [1 mark]


Q19. Pipelining [10 marks – 1 mark for each answer below]


Consider the following MIPS processor:

Now consider the following sequence of MIPS instructions:


I1: add $7, $8, $9
I2: lw $10, 4($7)
I3: add $10, $10, $10
I4: add $20, $10, $2

Assuming that instruction I1 has reached the MEM stage in the current cycle, fill in the value for the
labelled signal in the table below as binary strings of the correct length. If there is a signal that is
unknown or “don’t care”, fill in with “X” – but the number of X’s must be of the correct length. Note
that the correct length of the binary (or don’t care) string as well as the content of the string are part
of the correctness criteria for this question.
Signal label Value as a binary string of the correct length










Q20. Cache [20 marks – 20/24 mark for each answer below, with rounding]
Consider the following MIPS code running on a byte-addressed 32-bit MIPS machine (i.e., 32-bit word
size and 32-bit addresses):
# Assume $a0 contains the address of
# array A which is 0x1000.
#
# Assume $a1 contains the address of
# array B which is 0xa010.
addi $t1, $a0, 64 # PC = 0x100
L2:
lw $t0, 0($a0) # PC = 0x104
sw $t0, 0($a1) # PC = 0x108
lw $t0, 4($a0) # PC = 0x10c
sw $t0, 16($a1) # PC = 0x110
lw $t0, 8($a0) # PC = 0x114
sw $t0, 32($a1) # PC = 0x118
lw $t0, 12($a0) # PC = 0x11c
sw $t0, 48($a1) # PC = 0x120
addi $a0, $a0, 16 # PC = 0x124
addi $a1, $a1, 4 # PC = 0x128
bne $t1, $a0, L2 # PC = 0x12c
OUT:
... # PC = 0x130

(a) Assume a data cache that is direct-mapped with 4 blocks of 2 words each, all of which
were empty at the beginning of the loop, i.e., when PC = 0x100. Fill in the final state of the
cache at the exit of the loop, i.e., when PC = 0x130. Use the “M[address]” notation for the
contents of the “Word 0” and “Word 1” below (where “address” is a length 4 hexadecimal string
prefixed by “0x” and the tag value is a hexadecimal string of length 8 prefixed by “0x”). Note that
failure to use the correct format may lead to marks deduction since we are using auto-grading.
Index Tag value Word 0 Word 1
0
1
2
3
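
The data accesses of the loop can be replayed through a small direct-mapped cache model to see which block ends up at each index. This is a sketch under stated assumptions: stores allocate a block on a miss (write-allocate, which the question does not state), only data accesses are modelled, and the helper names `loop_trace`, `simulate_direct_mapped` and `show` are hypothetical.

```python
def loop_trace(a_base=0x1000, b_base=0xA010, iterations=4):
    """Byte addresses touched by the loop, in program order."""
    trace = []
    a, b = a_base, b_base
    for _ in range(iterations):          # $a0 runs from 0x1000 up to 0x1040
        for k in range(4):
            trace.append(a + 4 * k)      # lw $t0, 4k($a0)
            trace.append(b + 16 * k)     # sw $t0, 16k($a1)
        a += 16                          # addi $a0, $a0, 16
        b += 4                           # addi $a1, $a1, 4
    return trace


def simulate_direct_mapped(trace, num_blocks=4, block_bytes=8):
    """Return {index: (tag, block base address)} after replaying the trace."""
    cache = {}
    for addr in trace:
        block_number = addr // block_bytes
        index = block_number % num_blocks
        tag = block_number // num_blocks
        # A hit leaves the entry unchanged; a miss replaces it (write-allocate assumed).
        cache[index] = (tag, block_number * block_bytes)
    return cache


def show(cache):
    """Format each entry as 'index tag M[word 0] M[word 1]'."""
    return [f"{index} 0x{tag:08x} M[0x{base:04x}] M[0x{base + 4:04x}]"
            for index, (tag, base) in sorted(cache.items())]


final_state = show(simulate_direct_mapped(loop_trace()))
for line in final_state:
    print(line)
```

Because a direct-mapped cache has exactly one candidate block per index, the final state depends only on the last access that maps to each index, which is why the model can simply overwrite the entry on every access.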

(b) Suppose the cache is instead two-way set associative while having the same block size of 2 words.
Each way holds 1 block. What would be the final state at the exit of the loop assuming a LRU (Least
Recently Used) replacement policy? In order to account for possible ambiguity in the allocation
of free blocks, write your answer such that the blocks in Way 0 will have lower 32-bit addresses
(nearer 0) than those in Way 1. Do note that this shall be one of the criteria for correctness since
we use automated grading.
Set Index | Way 0                  | Way 1
          | Tag  Word 0  Word 1    | Tag  Word 0  Word 1
0
1
=== END OF PAPER ===
