
JANUARY 2021 Examination Period

FACULTY OF ENGINEERING

COMPUTER ARCHITECTURE

TIME ALLOWED:
2 hours

Answers to COMS10015: COMPUTER ARCHITECTURE


Part 1: weeks 1 to 5
Q1. Imagine that within a given C function, you declare a signed, 8-bit integer variable x
(i.e., whose type is int8_t), then assign it the (decimal) value −10(10). C represents
signed integers using two’s-complement: if the expression

~( ( x >> 2 ) ^ 0xF4 )

is evaluated, what (decimal) value results?


A. −10(10)
B. 10(10)
SE Wea)
D. 54(10)
E. 203(10)
[1 mark]

Solution: First, note that

x = −10(10)
  ↦ 11110110

Next, since x is signed (implying an arithmetic vs. logical right-shift), we can evaluate
the expression as follows

~( ( x >> 2 ) ^ 0xF4 ) ↦ ¬((11110110 ≫ 2) ⊕ 11110100)
                       = ¬(11111101 ⊕ 11110100)
                       = ¬(00001001)
                       = 11110110
                       ↦ −2^7 + 2^6 + 2^5 + 2^4 + 2^2 + 2^1
                       = −10(10)
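
The evaluation above can be checked with a short Python sketch. It assumes, as the working suggests, that the operator inside the expression is bitwise XOR, and that the final value is read back as an 8-bit two's-complement integer; the helper names to_signed8 and evaluate are illustrative and not part of the question.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def evaluate(x):
    """Evaluate ~((x >> 2) ^ 0xF4), truncating the result to 8 bits."""
    shifted = x >> 2           # Python's >> is an arithmetic shift for negative ints
    combined = shifted ^ 0xF4  # assumption: the operator is XOR, as in the working
    return to_signed8(~combined)


if __name__ == "__main__":
    print(format(-10 & 0xFF, "08b"))  # 8-bit pattern of x
    print(evaluate(-10))
```

In C itself, x would be promoted to int before the operators apply, so the 8-bit value only appears once the result is narrowed back to int8_t.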

Q2. An m-output, 1-bit demultiplexer connects a 1-bit input x to one of m separate 1-bit
outputs (say ri for 0 ≤ i < m). The output is selected using an l-bit control signal c (or,
equivalently, c is a collection of l separate 1-bit control signals). If m = 5, what value
of l is required?
A. 0
B. 1
C. 2
D. 3
E. 4
[1 mark]

Solution: Given an l-bit control signal c, the demultiplexer can select between at most
2^l outputs: we treat c as an unsigned, l-bit integer which will clearly range in value
between 0 and 2^l − 1. In general, we want an l st. 2^l ≥ m so each output can be
specified; typically m is a power-of-two, since this matches the maximum number of
outputs that can be specified. However, in this case we have m = 5.
Since 2^2 = 4 < 5 and 2^3 = 8 > 5 we know a 2-bit control signal is not enough (it
cannot select r4 since 0 ≤ c < 4), but a 3-bit control signal is (although it could cope
with up to m = 8, and since 0 ≤ c < 8 select r5, r6 and r7 if they existed). In summary
then, l = 3 is the correct answer.
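
As a quick check of the rule that l is the smallest value with 2^l ≥ m, the following sketch computes it directly. The function name control_bits is an illustrative choice.

```python
def control_bits(m):
    """Return the smallest l such that 2**l >= m, for m >= 1 outputs."""
    if m < 1:
        raise ValueError("a demultiplexer needs at least one output")
    # (m - 1).bit_length() is the number of bits needed to write m - 1,
    # i.e. the largest output index, as an unsigned integer.
    return (m - 1).bit_length()


if __name__ == "__main__":
    for m in (2, 4, 5, 8):
        print(m, control_bits(m))
```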

Q3. Consider a DRAM-based memory device with a capacity of 65536 addressable bytes. Of
the following options
A. 8 address pins, 65536 cells
B. 16 address pins, 65536 cells
C. 8 address pins, 524288 cells
D. 16 address pins, 524288 cells
E. None of the above
which offers the most likely description of said device?
[1 mark]

Solution: There are 2^16 addressable bytes, meaning a 16-bit address needs to be
supplied. However, in contrast to an SRAM memory, a DRAM memory will normally use a
2-step (or more, potentially) approach: half the address is supplied by each of the steps
(under control of row and column address strobe signals), which requires only half the
number of address pins.
The memory stores bytes, i.e., 8-bit elements, so we expect there to be 8 duplicated
arrays each consisting of 65536 cells. Overall, there will be 8·65536 = 524288 cells. So,
in summary, an answer of 8 address pins and 524288 cells is correct; the alternative
of 16 address pins and 524288 cells is not wrong per se, but certainly less likely in
practice.
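
The arithmetic in this solution can be sketched as follows. The function dram_geometry and its parameters are hypothetical, and the model assumes a power-of-two capacity and a simple 2-step (row then column) address multiplexing scheme.

```python
def dram_geometry(capacity, bits_per_element=8, multiplexed=True):
    """Return (address pins, cells) for a memory of `capacity` elements.

    Assumes the address is supplied in two equal halves when `multiplexed`
    is True (row then column), and all at once otherwise.
    """
    address_bits = (capacity - 1).bit_length()
    pins = (address_bits + 1) // 2 if multiplexed else address_bits
    cells = capacity * bits_per_element  # one cell per stored bit
    return pins, cells


if __name__ == "__main__":
    print(dram_geometry(65536))                     # DRAM-style addressing
    print(dram_geometry(65536, multiplexed=False))  # SRAM-style addressing
```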

Q4. Which of the following options would you not expect (or at least it would be uncommon)
to see in the specification of an ISA:

A. The word size
B. The set of accessible general-purpose registers
C. The way a given instruction is expressed as machine code
D. The execution latency of a given instruction
E. The set of accessible special-purpose registers
[1 mark]

Solution: The option that stands out as unlikely to be included in the ISA is execution
latency. This is essentially a performance metric, measuring how long, e.g., number
of cycles, an instruction takes to execute. As a result, it is a property of the
micro-architecture not the ISA. Put another way, since the ISA is an interface that allows
flexibility wrt. the micro-architectural implementation, it is unlikely the former would
include such a measure: doing so would constrain the latter, reducing said flexibility.

Q5. Which of the following statements best describes the semantics of a relative branch
instruction?
A. The program counter is reset to zero, so is independent from the current
program counter value
B. The branch target is at an offset from, and so is dependent on the current
program counter value
C. The branch target is relatively far from the current program counter value
D. The branch target is greater than the current program counter value
E. The program counter is only updated if a condition is true
[1 mark]

Solution: Some of the statements describe other ways to classify a branch; for example,
the last statement describes a conditional branch. Some of the statements are nonsense;
for example, the term “relatively far” is subjective so has no clear meaning in this
context.
The semantics of a relative branch instruction can be written as

PC ← PC + x,

st. the branch target (i.e., new program counter), is an offset from the current program
counter value. The question does not specify whether the offset x is an immediate, but,
either way, the branch target is clearly dependent on the current program counter value.

Q6. Consider the equivalence

(y ∧ ¬x) ∨ (x ∧ ¬y ) ≡ (x ∨ y ) ∧ ¬(x ∧ y ),

the left-hand side of which can be manipulated into the right-hand side by applying the
following sequence of Boolean axioms:

identity → inverse → distribution → commutativity → distribution → commutativity → X

The final axiom is missing, i.e., replaced with X: which of the following options for X
yields a valid derivation?
A. Absorption
B. Idempotency
C. Implication
D. Null
E. de Morgan
[2 marks]

Solution: The derivation is as follows


LHS = (y ∧ ¬x) ∨ (x ∧ ¬y )
= (y ∧ ¬x) ∨ 0 ∨ (x ∧ ¬y ) ∨ 0 (identity)
= (y ∧ ¬x) ∨ (y ∧ ¬y ) ∨ (x ∧ ¬y ) ∨ (x ∧ ¬x) (inverse)
= (y ∧ (¬x ∨ ¬y )) ∨ (x ∧ (¬y ∨ ¬x)) (distribution)
= ((¬x ∨ ¬y ) ∧ y ) ∨ ((¬x ∨ ¬y ) ∧ x) (commutativity)
= (¬x ∨ ¬y ) ∧ (x ∨ y ) (distribution)
= (x ∨ y ) ∧ (¬x ∨ ¬y ) (commutativity)
= (x ∨ y ) ∧ ¬(x ∧ y ) (de Morgan)
= RHS

suggesting that the correct option is the de Morgan axiom.
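
Since only two variables are involved, the equivalence (and the intermediate form reached just before the de Morgan step) can also be confirmed exhaustively. The sketch below does this with a truth table; the function names are illustrative.

```python
from itertools import product


def lhs(x, y):
    """(y AND NOT x) OR (x AND NOT y)."""
    return (y and not x) or (x and not y)


def before_de_morgan(x, y):
    """(x OR y) AND (NOT x OR NOT y)."""
    return (x or y) and (not x or not y)


def rhs(x, y):
    """(x OR y) AND NOT (x AND y)."""
    return (x or y) and not (x and y)


def all_equivalent():
    """Check that all three forms agree on every assignment of x and y."""
    return all(
        bool(lhs(x, y)) == bool(before_de_morgan(x, y)) == bool(rhs(x, y))
        for x, y in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print(all_equivalent())
```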

Question Q7 and Question Q8 both relate to Figure 1, which describes the implementation
of two components denoted C0 and C1 . Each component Ci produces one output ri given
two inputs x and y , and has been implemented using MOSFET transistors.

Q7. The truth table below includes 5 possibilities for outputs r0 and r1 (stemming from
instances of C0 and C1), given x and y. Recall that Vss and Vdd are used to represent 0
and 1 respectively: which option is correct?

        A.       B.       C.       D.       E.
x y   r0 r1    r0 r1    r0 r1    r0 r1    r0 r1
0 0    1  0     0  0     1  0     Z  0     1  Z
0 1    1  1     0  0     0  0     Z  Z     Z  Z
1 0    1  1     0  0     0  0     Z  Z     Z  Z
1 1    0  1     1  0     0  0     1  Z     Z  0

[2 marks]

Solution: Note that if a given ri is not connected to either Vdd or Vss , it is deemed to
have the high impedance value Z. This suggests the correct truth table is

x y r0 r1
0 0 1 Z
0 1 Z Z
1 0 Z Z
1 1 Z 0

The reason is that C0 is st. r0 connects to Vdd via two (pull-up) P-type MOSFETs;
since these MOSFETs only connect source to drain if the gate is Vss , we can say that
r0 = 1 if x = y = 0 and r0 = Z (i.e., disconnected) otherwise. Conversely, C1 is st.
r1 connects to Vss via two (pull-down) N-type MOSFETs; since these MOSFETs only
connect source to drain if the gate is Vdd , we can say that r1 = 0 if x = y = 1 and
r1 = Z (i.e., disconnected) otherwise.

Q8. The vendor of these components claims they can be used to implement any Boolean
function; their reasoning is based on the fact that a NAND gate can be implemented
using instances of C0 and C1 . Assuming you want to minimise the number of C0 and C1
instances, how many of each are required to implement such a NAND gate?

  A.       B.       C.       D.       E.
C0 C1    C0 C1    C0 C1    C0 C1    C0 C1
 1  1     5  3     3  5     3  3     5  5

[2 marks]

Solution: Note that the option using 1 instance of C0 and 1 instance of C1 sort of
makes sense: one can implement a NAND gate using 2 P-type and 2 N-type MOSFETs,
matching those that exist within instances of C0 and C1. However, the question explicitly
says we need to use instances of C0 and C1: we cannot, for example, “merge” their
internal implementation to make this option viable. So, as a first step, we implement a
NOT gate as follows:

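[Figure: a NOT gate with input x, built from one instance of C0 (output t0) and one instance of C1 (output t1), whose outputs are joined to form r.]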

This is useful because we can reuse it when implementing a NAND gate, but also
because it explains the design approach involved: the idea is basically that the output
is driven by one instance of C0 or C1 at a time, with all the others producing the high
impedance value (which is “overridden” by the driving value). The behaviour can be
described as follows:
x t0 t1 r
0 1 Z 1
1 Z 0 0
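Using the same design approach, we can now implement a NAND gate as follows:

[Figure: a NAND gate with inputs x and y, built from three instances of C0 (outputs t0, t1 and t2) and one instance of C1 (output t3), whose outputs are joined to form r.]
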
Applying the same argument wrt. behaviour, we find that

x y t0 t1 t2 t3 r
0 0 1 Z Z Z 1
0 1 Z 1 Z Z 1
1 0 Z Z 1 Z 1
1 1 Z Z Z 0 0

matches the truth table for NAND: remembering to count the components within each
NOT gate, we therefore use 5 instances of C0 and 3 instances of C1 .
As an aside, note that one can implement a NOR gate by swapping the component
types in the NAND implementation: we therefore implement the required behaviour
using 3 instances of C0 and 5 instances of C1.
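
The design approach can be simulated with a small model of C0 and C1 that uses a high impedance value. In the sketch below, the inputs wired to the instances producing t1 and t2 (namely x with ¬y, and ¬x with y) are an assumption inferred from the truth table rather than read from the figure, and the function names are illustrative.

```python
Z = "Z"  # high impedance: the output is connected to neither Vdd nor Vss


def c0(x, y):
    """Two pull-up P-type MOSFETs: drives 1 only when x = y = 0."""
    return 1 if (x, y) == (0, 0) else Z


def c1(x, y):
    """Two pull-down N-type MOSFETs: drives 0 only when x = y = 1."""
    return 0 if (x, y) == (1, 1) else Z


def join(*outputs):
    """Resolve outputs tied together; exactly one instance should drive."""
    driven = [value for value in outputs if value != Z]
    if len(driven) != 1:
        raise ValueError("expected exactly one driving instance")
    return driven[0]


def not_gate(x):
    return join(c0(x, x), c1(x, x))        # 1 x C0 and 1 x C1


def nand_gate(x, y):
    nx, ny = not_gate(x), not_gate(y)      # 2 x C0 and 2 x C1
    t0 = c0(x, y)                          # drives 1 when x = 0, y = 0
    t1 = c0(x, ny)                         # drives 1 when x = 0, y = 1
    t2 = c0(nx, y)                         # drives 1 when x = 1, y = 0
    t3 = c1(x, y)                          # drives 0 when x = 1, y = 1
    return join(t0, t1, t2, t3)            # 3 x C0 and 1 x C1


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, nand_gate(x, y))
```

Counting the instances in the comments gives 5 of C0 and 3 of C1, in line with the totals stated above.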

Q9. Consider the specification of an ISA, which includes a) a fixed-length, 32-bit instruction
encoding, and b) a byte addressable memory, with a 32-bit address space; instructions
are required to be aligned in memory. Imagine the ISA is implemented by some micro-
architecture, in which the program counter is a register comprised of n D-type latches:
what is the minimum n possible?
A. 0
B. 14
C. 16
D. 30
E. 32
[2 marks]

Solution: Clearly it must be possible for PC to point at any instruction in memory.
There would be 2^32 addressable bytes in the specified 32-bit address space, but 2^30
addressable instructions as a result of the requirement for instructions to be aligned.
As such, one could argue a 30-bit PC will suffice: since the address x of an instruction
is required to satisfy x ≡ 0 (mod 4), due to their fixed 32-bit (or 4-byte) length, the 2
LSBs of the full 32-bit PC will always be 0 and so need not be stored in a D-type latch.
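
The idea of leaving the 2 LSBs unstored can be illustrated with a sketch that packs an aligned 32-bit address into 30 bits and recovers it again. The names store_pc and load_pc are hypothetical.

```python
PC_BITS = 30  # number of stored bits (one D-type latch per bit)


def store_pc(address):
    """Drop the 2 LSBs of an aligned 32-bit instruction address."""
    if not 0 <= address < 2**32:
        raise ValueError("address outside the 32-bit address space")
    if address % 4 != 0:
        raise ValueError("instruction addresses must be 4-byte aligned")
    return address >> 2


def load_pc(stored):
    """Rebuild the full 32-bit address by appending two 0 bits."""
    return (stored & ((1 << PC_BITS) - 1)) << 2


if __name__ == "__main__":
    highest = 0xFFFFFFFC  # address of the last aligned instruction
    print(hex(load_pc(store_pc(highest))))
```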

Question Q10 to Question Q13 all relate to Figure 2 and Figure 3, which describe an FSM
implementation and an associated waveform. When read left-to-right, the waveform captures
how values of Φ1 and Φ2 (a 2-phase clock), and rst (a reset signal) change over time; the
other input s maintains the value A6(16) throughout. Note that the waveform is annotated
with some instants and periods in time (e.g., ρ, and each ti).

Q10. What is the value of r at time t0 ?
A. 0
B. 1
C. undefined
[1 mark]

Solution: Before t0, we can see a pulse on rst at the same time as Φ2 = 1; this
acts as a reset, storing s (as a result of the multiplexers) into the top register. Then,
at t0 we find that Φ1 = 1: during this period, the design stores into the bottom register
the value provided by the top register (which, at that point, is fixed since Φ2 = 0). As such,
at t0 we expect the bottom register to store s and hence r to be the MSB of s, i.e.,
r = s7 = 1.

Q11. What is the value of r at time t1 ?


A. 0
B. 1
C. undefined
[1 mark]

Solution: At t1 the design has performed one cycle relative to t0 : the value stored in
the bottom register at t0 is updated by the middle of the design, then stored in the top
register, and finally stored back in the bottom register (ready for the next cycle). The
middle of the design is fairly simple. Ignoring the less-significant end since this does not
impact r (yet), it basically just shifts the bits toward the more-significant end. At t1 ,
we therefore expect the bottom register to be st. r = s6 = 0.

Q12. What is the value of r at time t2 ?


A. 0
B. 1
C. undefined
[1 mark]

Solution: This design is a Linear Feedback Shift Register (LFSR); such a design might
be used to support a variety of use-cases, with a common example being the generation
of (pseudo-)random bits. As the name suggests, an LFSR is essentially an n-bit shift
register. After initialising (or seeding) the register state with s, successive updates are
performed; each such update a) shifts-out an output bit (wlog. the MSB), which forms
the LFSR output, and b) shifts-in an input bit (wlog. the LSB), which is computed
using a linear function of the state. A set T captures the tap bits, which specify the
function of x used to compute the input bit; given n, T is selected to maximise the
period of the LFSR, noting that x = 0 should be disallowed to avoid trivial behaviour.
Both Fibonacci- and Galois-form LFSR designs are possible; in this case, we have an
example of the former, with n = 8 and T = {3, 4, 5, 7}. Given a state x, the update
process, yielding an output bit r and a next state x′, can be formalised as

r  = x7
x′ = (x ≪ 1) ‖ (⊕_{i∈T} xi)
   = (x6 ‖ x5 ‖ · · · ‖ x0) ‖ (x3 ⊕ x4 ⊕ x5 ⊕ x7)

As such, we can use a table to trace the state and output as it is updated:

i    x        x′       r
              A6(16)        seed x with s
0    A6(16)   4C(16)   1    generate 0-th output bit
1    4C(16)   99(16)   0    generate 1-st output bit
2    99(16)   33(16)   1    generate 2-nd output bit
3    33(16)   66(16)   0    generate 3-rd output bit
4    66(16)   CD(16)   0    generate 4-th output bit
5    CD(16)   9A(16)   1    generate 5-th output bit
6    9A(16)   35(16)   1    generate 6-th output bit
7    35(16)   6A(16)   0    generate 7-th output bit
8    6A(16)   D4(16)   0    generate 8-th output bit
⋮    ⋮        ⋮        ⋮

Using this table, we can infer that at time t2 (where the 8-th output bit is generated,
which is the first bit which is computed from x vs. matching s), r = 0.
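
The table can be reproduced with a short sketch of the update rule, using n = 8 and T = {3, 4, 5, 7} as given in the solution. The function names step and trace are illustrative.

```python
N = 8
TAPS = (3, 4, 5, 7)


def step(x):
    """Perform one update: return (next state, output bit)."""
    r = (x >> (N - 1)) & 1                 # output bit is the MSB, x7
    feedback = 0
    for i in TAPS:                         # XOR of the tap bits
        feedback ^= (x >> i) & 1
    x_next = ((x << 1) & ((1 << N) - 1)) | feedback
    return x_next, r


def trace(seed, count):
    """Return rows (i, x, x_next, r) for `count` updates from `seed`."""
    rows, x = [], seed
    for i in range(count):
        x_next, r = step(x)
        rows.append((i, x, x_next, r))
        x = x_next
    return rows


if __name__ == "__main__":
    for i, x, x_next, r in trace(0xA6, 9):
        print(i, format(x, "02X"), format(x_next, "02X"), r)
```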

Q13. Consider the following NAND-based implementations

D-type latch               ↦ Figure 4
2-input XOR gate           ↦ Figure 5
2-input, 1-bit multiplexer ↦ Figure 6

relating to components used within Figure 2. The waveform is annotated with ρ, which
illustrates the clock period. If a 2-input NAND gate imposes a gate delay of Tnand =
10ns, which value most closely reflects the maximum possible clock frequency?
A. 1.0MHz
B. 1.2GHz
C. 3.8MHz
D. 5.9MHz
E. 6.6MHz
[3 marks]

Solution: Within the clock period (i.e., within the “time limit” which ρ dictates), two
steps must be completed; those steps are completed when Φ1 = 1 and Φ2 = 1
respectively, and can be described as 1) the top register must be updated with a value
computed by the middle of the design (i.e., the combinatorial logic) from the value in
the bottom register, then 2) the bottom register must be updated with the value in
the top register. So if Tlatch and Tlogic are the critical paths associated with a D-type
latch and said combinatorial logic respectively, then we can write

ρ ≥ (Tlogic + Tlatch ) + (Tlatch ).

Adding more detail, we could then reflect the critical path of components constituting
the combinatorial logic: writing

Tlogic = Txor + Txor + Tmux

then reflects the fact that the critical path includes two XOR gates and one multiplexer.
Overall then, we have
ρ ≥ (Txor + Txor + Tmux + Tlatch ) + (Tlatch )
≥ 2 · Tlatch + 2 · Txor + Tmux
Since we have the design of each component, we can, as a next step, be more concrete
about each term above: inspecting the NAND based designs, we can deduce
Tlatch = 4 · Tnand = 40ns
Txor = 3 · Tnand = 30ns
Tmux = 3 · Tnand = 30ns
and thus
ρ ≥ 2 · Tlatch + 2 · Txor + Tmux
≥ 2 · 40ns + 2 · 30ns + 30ns
≥ 80ns + 60ns + 30ns
≥ 170ns
Tlatch arguably represents the more tricky case, noting that the cross-coupled
right-hand side means the path is through 4 NAND gates. Finally, the maximum clock
frequency is the reciprocal of this critical path so we find
fmax = 1/ρ
= 1/170ns
≈ 5.9MHz
is correct.
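
The calculation can be repeated with a few lines of code, using the gate depths deduced above (4 for the latch, 3 for the XOR gate, and 3 for the multiplexer). The dictionary and function names are illustrative.

```python
T_NAND_NS = 10
GATE_DEPTH = {"latch": 4, "xor": 3, "mux": 3}  # NAND gates on each critical path


def min_period_ns(t_nand=T_NAND_NS):
    """Minimum clock period: (Tlogic + Tlatch) + Tlatch."""
    t = {name: depth * t_nand for name, depth in GATE_DEPTH.items()}
    t_logic = 2 * t["xor"] + t["mux"]
    return (t_logic + t["latch"]) + t["latch"]


def max_frequency_mhz(t_nand=T_NAND_NS):
    """Maximum clock frequency in MHz (1 / period, with the period in ns)."""
    return 1e3 / min_period_ns(t_nand)


if __name__ == "__main__":
    print(min_period_ns(), "ns")
    print(round(max_frequency_mhz(), 1), "MHz")
```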

Q14. Consider a counter machine with r = 4 registers which supports the instruction set
shown in Figure 7. After implementing the counter machine, the program, held in memory
as machine code, is fixed to

MEM = ⟨ 0A3(16), 060(16), 080(16), 097(16),
        050(16), 020(16), 083(16), 0C0(16) ⟩

Using the initial configuration

C0 = (l = 0, v0 = 0, v1 = 2, v2 = 1, v3 = 0)

and a subsequent trace of execution, decide which of the following options best describes
the purpose of this program.
A. Compare the values in R1 and R2 , setting R3 to reflect the result
B. Add the values in R1 and R2 , setting R3 to reflect the result
C. Swap the values in R1 and R2
D. Copy the value in R1 into R2 , retaining the value in R1
E. Copy the value in R1 into R2 , clearing the value in R1
[3 marks]

Solution: Producing a solution to this question requires two steps. First, we need to
decode the machine code program: using Figure 7, we find that

0A3(16) = 010100011(2) ↦ L0 : if R2 = 0 then goto L3 else goto L1
060(16) = 001100000(2) ↦ L1 : R2 ← R2 − 1 then goto L2
080(16) = 010000000(2) ↦ L2 : if R0 = 0 then goto L0 else goto L3
097(16) = 010010111(2) ↦ L3 : if R1 = 0 then goto L7 else goto L4
050(16) = 001010000(2) ↦ L4 : R1 ← R1 − 1 then goto L5
020(16) = 000100000(2) ↦ L5 : R2 ← R2 + 1 then goto L6
083(16) = 010000011(2) ↦ L6 : if R0 = 0 then goto L3 else goto L7
0C0(16) = 011000000(2) ↦ L7 : halt

Second, we produce a trace of execution for the program: starting with the initial
configuration given, we find that

C0 = (0, 0, 2, 1, 0)
L0 if R2 = 0 then goto L3 else goto L1
C1 = (1, 0, 2, 1, 0)
L1 R2 ← R2 − 1 then goto L2
C2 = (2, 0, 2, 0, 0)
L2 if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 2, 0, 0)
L0 if R2 = 0 then goto L3 else goto L1
C4 = (3, 0, 2, 0, 0)
L3 if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 2, 0, 0)
L4 R1 ← R1 − 1 then goto L5
C6 = (5, 0, 1, 0, 0)
L5 R2 ← R2 + 1 then goto L6
C7 = (6, 0, 1, 1, 0)
L6 if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 1, 1, 0)
L3 if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 1, 1, 0)
L4 R1 ← R1 − 1 then goto L5
C10 = (5, 0, 0, 1, 0)
L5 R2 ← R2 + 1 then goto L6
C11 = (6, 0, 0, 2, 0)
L6 if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 0, 2, 0)
L3 if R1 = 0 then goto L7 else goto L4
C13 = (7, 0, 0, 2, 0)
L7 halt

where the final configuration halts execution. As a result, stating that the program will
“copy the value in R1 into R2 , clearing the value in R1 ” is the best match.
Note that the program itself is in two parts: L0 to L2 clear (or zero) R2 , and L3 to L6
move R1 into R2 . Also note that it depends on having R0 = 0, allowing the construction
of unconditional branches in L2 and L6 .
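
Both steps (decoding and tracing) can be automated with a small interpreter. The sketch below follows the encoding in Figure 7, with the opcode in bits 8 to 6, addr in bits 5 to 4, and target in bits 3 to 0; the function names and the max_steps guard are illustrative additions.

```python
MEM = [0x0A3, 0x060, 0x080, 0x097, 0x050, 0x020, 0x083, 0x0C0]


def decode(word):
    """Split a 9-bit instruction into (opcode, addr, target)."""
    return (word >> 6) & 0b111, (word >> 4) & 0b11, word & 0b1111


def run(mem, registers, max_steps=1000):
    """Execute from L0; return (final label, final register values)."""
    regs = list(registers)
    label = 0
    for _ in range(max_steps):
        opcode, addr, target = decode(mem[label])
        if opcode == 0b000:        # increment, then goto next label
            regs[addr] += 1
            label += 1
        elif opcode == 0b001:      # decrement, then goto next label
            regs[addr] -= 1
            label += 1
        elif opcode == 0b010:      # branch to target if register is zero
            label = target if regs[addr] == 0 else label + 1
        elif opcode == 0b011:      # halt
            return label, regs
        else:
            raise ValueError("unknown opcode")
    raise RuntimeError("step limit reached without halting")


if __name__ == "__main__":
    print(run(MEM, [0, 2, 1, 0]))  # initial configuration C0
```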

Part 2: weeks 6 to 10
Q15. Consider the following instructions.

SUB r1, r2    r1 ← r1 − r2; set the carry flag (CF) if the result is negative
SBB r1, r2    r1 ← r1 − (r2 + CF)
AND r1, r2    r1 ← r1 & r2
ADD r1, r2    r1 ← r1 + r2

which are inspired by the x86 ISA, and operate on 4-bit registers. If we set r1 = 3, r2
= 9, and r3 = 0, then execute the following program

sub r1, r2
sbb r3, r3
and r3, r1
add r2, r3

what values do the three registers r1, r2, and r3 have afterwards?

A. r1 = 3, r2 = 9, r3 = 1
B. r1 = 9, r2 = 3, r3 = 1
C. r1 = -6, r2 = 3, r3 = -6
D. r1 = 6, r2 = 9, r3 = 6
E. r1 = 9, r2 = 3, r3 = 0
[2 marks]

Solution: The instruction sequence computes r2 ← min(r1, r2).
The SUB instruction computes 3 − 9 = −6, setting r1 = −6 and CF = 1.
The SBB instruction computes 0 − (0 + 1) = −1, setting r3 = −1.
The AND instruction computes −1 ∧ −6, which in two’s complement equals −6, setting
r3 = −6.
Finally, the ADD instruction computes 9 + (−6) = 3, setting r2 = 3.
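
The four steps can be traced with a sketch that keeps every register to 4 bits. It assumes the carry flag is set when the subtraction borrows (i.e., the first operand is smaller than the second when both are read as unsigned values), and that only SUB updates CF; the function names are illustrative.

```python
MASK = 0xF  # registers are 4 bits wide


def to_signed4(value):
    """Interpret the low 4 bits of value as a two's-complement integer."""
    value &= MASK
    return value - 16 if value & 0x8 else value


def run_program(r1, r2, r3):
    """Execute sub r1,r2; sbb r3,r3; and r3,r1; add r2,r3 on 4-bit values."""
    r1, r2, r3 = r1 & MASK, r2 & MASK, r3 & MASK
    cf = 1 if r1 < r2 else 0           # sub r1, r2 (assumed borrow rule)
    r1 = (r1 - r2) & MASK
    r3 = (r3 - (r3 + cf)) & MASK       # sbb r3, r3: all ones if CF was set
    r3 &= r1                           # and r3, r1
    r2 = (r2 + r3) & MASK              # add r2, r3
    return r1, r2, r3


if __name__ == "__main__":
    print([to_signed4(v) for v in run_program(3, 9, 0)])
```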

Q16. Which of the following is not usually present in a stack frame:


A. Stack pointer
B. Return address
C. Return value
D. Subroutine parameters
E. Local variables
[1 mark]

Solution: This requires an understanding of what the stack is doing to support subroutines,
but is mostly just a fact check.

Q17. A program has the following mix of instructions:

Instruction   Cycles   Frequency
Load          5        32%
Store         3        15%
Branch        8        18%
Arithmetic    1        35%

What is the average number of clock cycles per instruction (CPI) for this program?
A. 22.59
B. 0.17
C. 4.25
D. 3.84
E. 384.00
[2 marks]

Solution: The CPI is computed as the weighted average of the instruction cycles where
the weights are the frequencies.

CPI = (5 × 32 + 3 × 15 + 8 × 18 + 1 × 35) / 100 = 3.84
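
The weighted average can be computed directly from the table. In the sketch below, the dictionary MIX and the function average_cpi are illustrative names.

```python
# instruction class -> (cycles, frequency in percent)
MIX = {
    "Load": (5, 32),
    "Store": (3, 15),
    "Branch": (8, 18),
    "Arithmetic": (1, 35),
}


def average_cpi(mix):
    """Weighted average of cycles, using the frequencies as weights."""
    total_weight = sum(percent for _, percent in mix.values())
    weighted_cycles = sum(cycles * percent for cycles, percent in mix.values())
    return weighted_cycles / total_weight


if __name__ == "__main__":
    print(average_cpi(MIX))
```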

Q18. In a given computer system, accesses to main memory by the processor are supported
by a 16-way set-associative cache. Memory addresses are 16 bits, and each addressable
element has a word size of 1 byte. The cache has a capacity of 32 KiB (32,768 bytes),
cache blocks are of size 64 bytes, and cache sets are numbered starting at 0 (which
contains the lowest memory addresses).
Consider the memory address 0110100100110101, where here the highest (or most-
significant) bits are on the left-hand side. Which set is this address stored in, and what
tag is stored in the tag store?
A. Set 12, tag 01101001001
B. Set 53, tag 01101
C. Set 4, tag 01101
D. Set 9, tag 011010
E. Set 309, tag 01101
[3 marks]

Solution: There are 16 cache lines per set.
Cache lines are 64 bytes, so 6 bits are required to index bytes within a cache line.
With a capacity of 32 KiB, there are (32 ∗ 1024)/(16 ∗ 64) = 32 = 2^5 sets. Therefore
5 bits are required to index into the cache.
The memory address is 0110 1001 0011 0101. The lowest 6 bits are used to index
within the block: 11 0101. The next lowest 5 bits determine the set: 001 00 = 4 in
decimal. The highest 16 − 5 − 6 = 5 bits determine the tag: 0110 1.
Therefore, the memory address is placed in set 4 with tag 01101.
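
The same split of the address into tag, set index, and block offset can be sketched as follows. The function split_address and its parameter names are illustrative, and the sketch assumes power-of-two sizes throughout.

```python
def split_address(address, address_bits=16, capacity=32 * 1024,
                  ways=16, block_size=64):
    """Return (tag as a bit string, set index, block offset)."""
    offset_bits = (block_size - 1).bit_length()
    sets = capacity // (ways * block_size)
    index_bits = (sets - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits

    offset = address & (block_size - 1)
    index = (address >> offset_bits) & (sets - 1)
    tag = address >> (offset_bits + index_bits)
    return format(tag, f"0{tag_bits}b"), index, offset


if __name__ == "__main__":
    print(split_address(0b0110100100110101))
```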

Question Q19 and Question Q20 both refer to the two’s complement addition of the
8-bit numbers 00101111 (47 in decimal) and 01010001 (81 in decimal).

Q19. Compute the two’s complement addition of the two 8-bit numbers.
A. 10000000 (128 in decimal)
B. 10000000 (-128 in decimal)
C. 01111110 (126 in decimal)
D. 01111100 (124 in decimal)
[2 marks]

Solution:

  00101111   (47 in decimal)
+ 01010001   (81 in decimal)
  --------
  10000000   result
  01111111   carry out

Q20. Did the result overflow?


A. Yes
B. No
[1 mark]

Solution: The carry out of the most-significant bit (0) and the carry into it (1) differ,
therefore overflow has occurred.
An alternative explanation is that, in two’s complement, the addition of two positive
numbers has yielded a negative number.
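
The overflow check can be sketched for 8-bit operands by comparing the carry into the most-significant bit with the carry out of it. The function add8 is an illustrative name.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, carry out, overflow flag)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    carry_out = total >> 8                           # carry out of bit 7
    carry_in = (((a & 0x7F) + (b & 0x7F)) >> 7) & 1  # carry into bit 7
    overflow = carry_in != carry_out
    return result, carry_out, overflow


if __name__ == "__main__":
    result, carry_out, overflow = add8(0b00101111, 0b01010001)
    print(format(result, "08b"), carry_out, overflow)
```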

Q21. Why is it important to support different execution privileges at an architectural level?


A. To ensure user programs cannot access restricted portions of memory
B. To ensure kernel cannot access application memory
C. To allow relocation of programs in memory
D. To allow the kernel to schedule multiple programs on a single processor
[1 mark]

Solution: Synthesis of knowledge from the lectures.

Q22. The Hex 8 ISA has a word length of 8 bits. Each instruction is formed of a 4-bit
opcode and a 4-bit operand, which limits the number of instructions to 16.
Some of the instructions in the Hex 8 ISA do not require operands. Which of the following
schemes would allow the ISA to be expanded to contain 31 instructions, while retaining
4-bit opcodes which can be decoded in a single cycle?
A. Include an instruction to specify the operand is to be interpreted as an opcode
of an instruction which doesn’t require an operand. E.g. EXE ADD
B. Send an interrupt to indicate the instruction opcode is larger than 16. E.g. A
4-bit opcode refers to two instructions depending on an interrupt signal
C. Use a mechanism similar to the PFIX instruction to construct the opcode from
two instructions
D. Write a micro-program to implement the additional instructions
[3 marks]

Solution: Of the options listed, the only way to get 31 instructions is to use one
instruction of the original 16 to designate that the operand is the opcode. This yields
15 + 16 = 31 instructions.
The other methods yield 32 and are impractical, require two cycles, or are incorrect
(respectively).

Q23. When in the instruction execution cycle can interrupts occur?
A. Fetch
B. Decode
C. Execute
D. Writeback
E. At any time
[1 mark]

Solution: Knowledge test, leading to next question.

Q24. When might a processor handle interrupts?


A. Immediately
B. When the program has finished
C. After the current instruction has finished executing
D. At the end of the current subroutine
[1 mark]

Solution: Application of knowledge. It might be thought that the interrupt should be
handled immediately, but the processor must write back the current instruction before
dealing with the interrupt to ensure consistent state on return.

Q25. What type of data dependency hazard occurs in following instruction stream?

r1 ← r2 + r3
r4 ← r3 + r4
r5 ← r3 + r1

A. Read after Write (RAW)


B. Read after Read (RAR)
C. Write after Write (WAW)
D. Write after Read (WAR)
[2 marks]

Solution: The register r1 is read on line 3 after being written on line 1, therefore the
answer is Read after Write (RAW).
Line 2 is not a hazard because although r4 is read and written, it is within the same
instruction and therefore does not impose a data dependence.
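
A simple pairwise check over the instruction stream finds the same dependence. The sketch below is a simplification (it compares every earlier instruction with every later one and ignores intervening writes), and the names PROGRAM and find_hazards are illustrative.

```python
# Each instruction is (destination register, source registers).
PROGRAM = [
    ("r1", ("r2", "r3")),  # line 1: r1 <- r2 + r3
    ("r4", ("r3", "r4")),  # line 2: r4 <- r3 + r4
    ("r5", ("r3", "r1")),  # line 3: r5 <- r3 + r1
]


def find_hazards(program):
    """Return (kind, earlier line, later line, register) tuples."""
    hazards = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i in srcs_j:       # later read of an earlier write
                hazards.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j in srcs_i:       # later write of an earlier read
                hazards.append(("WAR", i + 1, j + 1, dest_j))
            if dest_i == dest_j:       # two writes to the same register
                hazards.append(("WAW", i + 1, j + 1, dest_i))
    return hazards


if __name__ == "__main__":
    print(find_hazards(PROGRAM))
```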

Q26. Figure 8 and Figure 9 show two different circuits for 3-bit multiplication, labelled A
and B; both are constructed using half adders (HA) and full adders (FA). Which circuit
was generated using the Wallace Tree procedure, and why?
A. Multiplication circuit A (i.e., Figure 8) is a Wallace Tree because it follows the
Wallace Tree procedure
B. Multiplication circuit A (i.e., Figure 8) is a Wallace Tree because it has two
layers
C. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it follows the
Wallace Tree procedure
D. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it has two
layers
E. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it passes
through the lowest partial sum unmodified
[2 marks]

Solution: Figure 8 applies the Wallace Tree procedure of adding a layer using half adders
to sum 2 inputs and full adders to sum 3 (or more) inputs. For 3-bit multiplication, the
procedure is only applied once, rendering a final layer which is a simple 6-bit ripple-carry
adder.
Figure 9 follows instead the shift-and-add multiplication circuit.
The question asks for the Wallace Tree approach which is Figure 8.

Q27. During the assembly of an assembly program, labels are resolved. Consider the following
Hex 8 assembly program:

Line 1 - label:
Line 2 - ldac 0
Line 3 - ldbc -5
Line 4 - loop: br loop
Line 5 - br next
Line 6 - next: ldbc loop

During label resolution, what constant offsets are the operands on Line 4 (loop) and
Line 6 (loop) replaced with?
A. Line 4 becomes br -1 and line 6 becomes ldbc -1
B. Line 4 becomes br 4 and line 6 becomes ldbc 4
C. Line 4 becomes br 0 and line 6 becomes ldbc 4
D. Line 4 becomes br 0 and line 6 becomes ldbc 0
E. Line 4 becomes br -1 and line 6 becomes ldbc 4
[2 marks]

Solution: After label resolution the program will be:

Line 1 - NOOP
Line 2 - ldac 0
Line 3 - ldbc -5
Line 4 - br -1
Line 5 - br 0
Line 6 - ldbc 4

The Hex 8 processor has an execution cycle of Fetch/Increment PC/Execute. The
BR instruction computes pc ← pc + oreg. Just before the BR on line 4 is executed, pc = 5.
Therefore to set pc = 4, the operand must be −1.
On line 6, the LDBC instruction sets breg ← oreg. With an operand of the label loop,
this instruction needs the address of the label. Therefore the operand must be 4.
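
The resolution rule can be sketched as a tiny two-pass resolver. It assumes, as the solution's values imply, that each line occupies one address equal to its line number, that BR takes an offset relative to the already-incremented pc, and that LDBC takes the absolute address; the data layout and names are hypothetical rather than taken from a real Hex 8 assembler.

```python
# Each entry is (label defined on the line, mnemonic, operand).
LINES = [
    ("label", None, None),     # Line 1: a label on its own
    (None, "ldac", 0),         # Line 2
    (None, "ldbc", -5),        # Line 3
    ("loop", "br", "loop"),    # Line 4
    (None, "br", "next"),      # Line 5
    ("next", "ldbc", "loop"),  # Line 6
]

RELATIVE = {"br"}  # mnemonics whose label operand becomes a pc-relative offset


def resolve(lines):
    """Replace label operands with constants; return (mnemonic, operand) pairs."""
    # Pass 1: record the address (assumed equal to the line number) of each label.
    addresses = {lbl: n for n, (lbl, _, _) in enumerate(lines, start=1) if lbl}
    # Pass 2: substitute each label operand.
    resolved = []
    for n, (_, mnemonic, operand) in enumerate(lines, start=1):
        if isinstance(operand, str):
            target = addresses[operand]
            # pc has already been incremented to n + 1 when the branch executes.
            operand = target - (n + 1) if mnemonic in RELATIVE else target
        resolved.append((mnemonic, operand))
    return resolved


if __name__ == "__main__":
    for number, entry in enumerate(resolve(LINES), start=1):
        print(number, entry)
```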

Additional figures and tables

(a) C0 (using P-type MOSFETs). (b) C1 (using N-type MOSFETs).

Figure 1: MOSFET-based implementations of C0 and C1 .

Figure 2: An FSM implementation, which has 4 inputs (1-bit Φ1, Φ2 and rst on the left-hand
side; 8-bit s spread within the design) and 1 output (1-bit r on the right-hand side).

Figure 3: A waveform describing the behaviour of Φ1, Φ2, and rst within Figure 2; it is annotated with t0, t1, ρ and t2.

Figure 4: A NAND-based implementation of a D-type latch.

Figure 5: A NAND-based implementation of a 2-input XOR gate.

Figure 6: A NAND-based implementation of a 2-input, 1-bit multiplexer.

                                                         bits 8-6   bits 5-4   bits 3-0
Li : Raddr ← Raddr + 1 then goto Li+1                 ↦  000        addr       0000
Li : Raddr ← Raddr − 1 then goto Li+1                 ↦  001        addr       0000
Li : if Raddr = 0 then goto Ltarget else goto Li+1    ↦  010        addr       target
Li : halt                                             ↦  011        00         0000

Figure 7: The instruction set for an example 4-register counter machine.

Figure 8: 3-bit multiplication circuit A.

Figure 9: 3-bit multiplication circuit B.

END OF PAPER