
CSE141 Midterm 1

PLEASE SHOW ALL WORK!

Name _________________________________________________________

Email _________________________________________________________

One hour total; equal points per problem; 25% of your grade.

Solution checking

Question Solved by Checked by

1 Yajie Yen-Yi

2 Tee Yajie

3 Jordan Tee

4 Paul Jordan

5 Yen-Yi Jordan

6 Jordan Paul

7 Jordan Zhizhen

8 Zhizhen Jordan
1​. Computer A uses one ISA and has a 2 GHz clock frequency. Computer B uses a different
ISA and has a 3 GHz clock frequency. On average, A's programs execute 1.5 times as many
instructions as B's. For program P1, computer A has a CPI of 2 and computer B has a CPI of 3.
Which computer has faster execution time? What is the speedup?

Computer Clk CPI # Instr

A 2 GHz 2 1.5×

B 3 GHz 3 1×

A. A is faster, speedup is 25%

B. A is faster, speedup is 1.5

C. B is faster, speedup is 2.25%

D. B is faster, speedup is 1.5

E. None of the Above _____

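$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{IC_B \times CPI_B \,/\, \text{Clock Rate}_B}{IC_A \times CPI_A \,/\, \text{Clock Rate}_A} = \frac{\text{Clock Rate}_A}{\text{Clock Rate}_B} \times \frac{IC_B}{IC_A} \times \frac{CPI_B}{CPI_A} = \frac{2}{3} \times \frac{1}{1.5} \times \frac{3}{2} = \frac{2}{3}$$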

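Because Performance_A / Performance_B = 2/3 is less than 1, we can infer that B is faster.

Speedup of B over A:

$$\frac{\text{Execution Time}_A}{\text{Execution Time}_B} = \frac{3}{2} = 1.5$$

Therefore, B is faster, speedup is 1.5 (answer D).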
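The same comparison can be checked numerically. The sketch below is only an illustration: the helper name `execution_time` is hypothetical, and the instruction counts are relative units (B = 1, A = 1.5), since the problem gives only their ratio.

```python
def execution_time(instr_count, cpi, clock_hz):
    """Execution time = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz

# Relative instruction counts: A executes 1.5x as many instructions as B.
time_a = execution_time(1.5, 2, 2e9)  # Computer A: CPI 2, 2 GHz
time_b = execution_time(1.0, 3, 3e9)  # Computer B: CPI 3, 3 GHz

faster = "A" if time_a < time_b else "B"
speedup_b_over_a = time_a / time_b

print(faster, round(speedup_b_over_a, 3))
```

Because only the ratio of the two times matters, any consistent unit for the instruction count gives the same speedup.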


2​. ​A program has the following instruction type distribution. Assume the processor
that will be running this program has the following instruction latencies.

Instruction Instr. Frequency Latency (Cycles)

load 30% 4

store 10% 4

add 50% 2

multiply 8% 16

divide 2% 50

If you could pick one type of instruction to make twice as fast (half the latency) in the
next-generation of this processor, which instruction type would you pick? Why?

A. load _____
B. store _____
C. add _____
D. multiply _____
E. divide _____

Assume the program has 100 instructions in total: 30 load, 10 store, 50 add, 8 multiply, 2 divide.

$$CPI_{base} = \frac{(30 \times 4) + (10 \times 4) + (50 \times 2) + (8 \times 16) + (2 \times 50)}{30 + 10 + 50 + 8 + 2} = 4.88$$

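Make loads faster

$$CPI_{load} = \frac{(30 \times 2) + (10 \times 4) + (50 \times 2) + (8 \times 16) + (2 \times 50)}{30 + 10 + 50 + 8 + 2} = 4.28$$

Make stores faster

$$CPI_{store} = \frac{(30 \times 4) + (10 \times 2) + (50 \times 2) + (8 \times 16) + (2 \times 50)}{30 + 10 + 50 + 8 + 2} = 4.68$$

Make adds faster

$$CPI_{add} = \frac{(30 \times 4) + (10 \times 4) + (50 \times 1) + (8 \times 16) + (2 \times 50)}{30 + 10 + 50 + 8 + 2} = 4.38$$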

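Make mults faster

$$CPI_{mult} = \frac{(30 \times 4) + (10 \times 4) + (50 \times 2) + (8 \times 8) + (2 \times 50)}{30 + 10 + 50 + 8 + 2} = 4.24$$

Make divs faster

$$CPI_{div} = \frac{(30 \times 4) + (10 \times 4) + (50 \times 2) + (8 \times 16) + (2 \times 25)}{30 + 10 + 50 + 8 + 2} = 4.38$$

Therefore, I would pick multiply because it results in the lowest CPI (4.24, versus 4.28 for load, the next best).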
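The per-instruction comparison is easy to get wrong by hand, so a short script can recompute it. This is a sketch under the same assumption as the solution (a 100-instruction program matching the stated frequencies); the names `mix` and `average_cpi` are made up for illustration.

```python
# (frequency in %, latency in cycles), copied from the table in the question
mix = {
    "load": (30, 4),
    "store": (10, 4),
    "add": (50, 2),
    "multiply": (8, 16),
    "divide": (2, 50),
}

def average_cpi(mix, halved=None):
    """Weighted-average CPI; optionally halve the latency of one instruction type."""
    total_cycles = 0.0
    for name, (freq, latency) in mix.items():
        if name == halved:
            latency = latency / 2
        total_cycles += freq * latency
    total_instructions = sum(freq for freq, _ in mix.values())
    return total_cycles / total_instructions

baseline = average_cpi(mix)
results = {name: average_cpi(mix, halved=name) for name in mix}
best = min(results, key=results.get)

for name, cpi in results.items():
    print(f"{name:9s} CPI = {cpi:.2f}")
print("best choice:", best)
```

The saving from halving one type is frequency times half its latency, so a rare but slow instruction (multiply) can beat a frequent but fast one (add).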


3​. ​Processor A has an average CPI of 5.0 for a specific program and a clock speed
of 2GHz. If we optimize 40% of instructions by a factor of 2, what is the speedup
from the optimization?

A. 1.0X _____

B. 1.20X _____

C. 1.25X​ _____

D. 1.5X _____

E. 1.6X _____

F. 2.0X _____

G. None of the Above _____

Solution

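By Amdahl's law, with x = 0.4 the fraction that is optimized and S = 2 the factor by which it is sped up:

$$S_t = \frac{1}{\frac{x}{S} + (1 - x)} = \frac{1}{\frac{0.4}{2} + 0.6} = \frac{1}{0.8} = 1.25$$

The speedup is 1.25X (answer C). The CPI and clock speed given in the question do not affect the result.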
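As a quick check, Amdahl's law can be written as a one-line function. This sketch assumes, as the solution does, that the optimized 40% of instructions accounts for 40% of the execution time; the function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the execution time is sped up by `factor`."""
    return 1.0 / (fraction / factor + (1.0 - fraction))

speedup = amdahl_speedup(0.4, 2)
print(f"{speedup:.2f}X")

# Even an unboundedly large factor cannot beat 1 / (1 - fraction).
upper_bound = 1.0 / (1.0 - 0.4)
```

The upper bound shows why none of the larger options (1.5X and above is close to or beyond the limit of about 1.67X) would be reachable with only a factor of 2.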


4​. Translate the C code below into the equivalent ARM Assembly code. Just perform a direct
translation – no optimization required. Map r0 = a and r1 = b and assume they already contain
values when your code starts.

(assume already declared and initialized integer a, b;)

if (a > b)
{
do
{
a = a - b;
} while (a >= 55);
--b;
}
else
{
b = b + 44;
}

Solution

/* assume a is in r0 and b is in r1 */

CMP r0, r1 // compare a and b (sets the condition flags)

BGT True // goto True when a>b

ADD r1, r1, #44 // else branch: b=b+44

B Done // skip the true branch

True: // start of the true branch

Loop: SUB r0, r0, r1 // a=a-b

CMP r0, #55 // compare a and 55

BGE Loop // goto Loop if a>=55

SUB r1, r1, #1 // --b

Done: // end of the if/else


/* Alternative: branch on the inverted condition (BLE); assume a is in r0 and b is in r1 */

CMP r0, r1

BLE else // goto else when a<=b

loop:

SUB r0, r0, r1 // a=a-b

CMP r0, #55

BGE loop // goto loop if a>=55

SUB r1, r1, #1 // --b

B done // goto done

else:

ADD r1, r1, #44 // b=b+44

done:
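When tracing an assembly translation by hand, it helps to have reference results for a few inputs. The sketch below mirrors the C code from the question in Python (the function name `run` is made up; Python has no do-while, so the loop body runs once before the condition is tested). It assumes b > 0 whenever the loop is entered with a >= 55, since otherwise the original loop never terminates.

```python
def run(a, b):
    """Return (a, b) after executing the if/else from the question."""
    if a > b:
        while True:           # do { ... } while (a >= 55);
            a = a - b
            if not (a >= 55):
                break
        b -= 1                # --b
    else:
        b = b + 44
    return a, b

print(run(100, 30))  # takes the true branch, loop body runs twice
print(run(5, 10))    # takes the else branch
```

Comparing register values r0 and r1 at the final label against these pairs exercises both branches and the loop back-edge.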
5​. What does each line of the following ARM assembly code (Euclid's Greatest Common Divisor
algorithm) do?

        MOV R0, #40       // R0 = a
        MOV R1, #25       // R1 = b
again:  CMP R0, R1        // Is a == b?
        BEQ halt          // Branch to halt sequence if a==b
        BLT isLess        // Branch to isLess if a<b
        SUB R0, R0, R1    // a = a - b
        B again           // Branch to again
isLess: SUB R1, R1, R0    // b = b - a
        B again           // Repeat again loop
halt:   swi 0             // Software interrupt

while(a != b) {
if(a>b) a -= b;
else b -= a;
}
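The loop above can be traced with the same starting values the assembly loads (a = 40, b = 25). This is a minimal sketch, assuming both inputs are positive; the function name `gcd_by_subtraction` and the step counter are additions for illustration.

```python
def gcd_by_subtraction(a, b):
    """Subtraction-based Euclid; returns (gcd, number of subtractions)."""
    steps = 0
    while a != b:
        if a > b:
            a -= b      # corresponds to SUB R0, R0, R1
        else:
            b -= a      # corresponds to SUB R1, R1, R0
        steps += 1
    return a, steps

result, steps = gcd_by_subtraction(40, 25)
print(result, steps)
```

Each pass through the Python loop corresponds to one trip through the `again` label that ends in a SUB, so the step count also tells how many times the assembly loops before reaching `halt`.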
6​. Represent each value in as many ways as possible as an ARM immediate:

0x600

Rotate Immediate

0xC 0x06

0xD 0x18

0xE 0x60

0xC0

Rotate Immediate

0x0 0xC0

0xF 0x30

0xE 0x0C

0xD 0x03

0x102

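Impossible (0x102 = 0x81 shifted left by 1, which would need an odd rotation; ARM immediates are rotated by an even amount only, and no even rotation of an 8-bit value covers both bit 1 and bit 8)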
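The answers above can be reproduced by brute force. The sketch below assumes the classic ARM data-processing immediate encoding: an 8-bit value rotated right by twice a 4-bit rotate field within a 32-bit word. The helper names `ror32` and `arm_immediate_encodings` are hypothetical.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def arm_immediate_encodings(target):
    """All (rotate, imm8) pairs such that imm8 ROR (2 * rotate) == target."""
    encodings = []
    for rotate in range(16):
        for imm8 in range(256):
            if ror32(imm8, 2 * rotate) == target:
                encodings.append((rotate, imm8))
    return encodings

for value in (0x600, 0xC0, 0x102):
    pairs = arm_immediate_encodings(value)
    shown = ", ".join(f"(0x{r:X}, 0x{i:02X})" for r, i in pairs) or "impossible"
    print(f"0x{value:X}: {shown}")
```

The search space is only 16 x 256 combinations, so exhaustive enumeration is simpler and less error-prone than reasoning about bit windows case by case.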
7​. For each assembly code snippet, list the values of the controls listed. Use a single X for don't
care. Note the width of each signal and be sure you ​enter a digit for each wire in the signal
(e.g. for signal[2:0] write 001 instead of just 1).

instruction           add r3, r1, r3   mvn r3, #8   str r3, [r2, #4]   ldr r6, [r3, #0]

mem_to_reg            0                0            X                  1

reg_read_addr1[2:0]   001              X            010                011

reg_read_addr2[2:0]   011              X            011                X

reg_write_en[0]       1                1            0                  1

reg_write_addr[2:0]   011              011          X                  110

immed_alu[0]          0                1            1                  1

alu_op[3:0]           4                F            4                  4

mem_write_en[0]       0                0            1                  0

mem_addr*             X                X            *r2+4              *r3+0

reg_write_en = 1 if writing to reg_file at end of current cycle, 0 if not
mem_write_en = 1 if writing to data memory at end of current cycle, 0 if not
immed_alu = 1 if routing the immediate bus to the ALU, 0 if routing reg_file output to the ALU
mem_to_reg = 1 if writing mem_read_data to reg_file, 0 if writing alu_result to reg_file
*mem_addr: write your answer in the form *rx+o (no spaces). Use *rx to denote the contents of register x; o is the offset from the base address.

for alu_op, use:


CODE   OP        CODE   OP        CODE   OP

0      A&B       4      A+B       C      A|B

1      A^B       5      A+B+Ci    D      0+B

2      A-B       6      A-B+Ci    E      A&(~B)

3      B-A       7      B-A+Ci    F      0+(~B)


8​. Which of the following computations overflow in an 8-bit two's complement number system?

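a. 0x40 + 0x40 OK _____ overflow __X___

Explanation: 0100_0000 + 0100_0000 = 0_1000_0000. Carry into the sign bit = 1, carry out of the sign bit = 0, and 1 XOR 0 = 1, so the addition overflows.

b. 0xC0 + 0xC0 OK __X___ overflow _____

Explanation: 1100_0000 + 1100_0000 = 1_1000_0000. Carry into the sign bit = 1, carry out of the sign bit = 1, and 1 XOR 1 = 0, so there is no overflow.

c. 0xC0 - 0x40 OK __X___ overflow _____

Explanation: subtracting 0x40 is the same as adding its two's complement, -0x40 = -(0100_0000) = 1100_0000.

1100_0000 + 1100_0000 = 1_1000_0000. Carry into the sign bit = 1, carry out of the sign bit = 1, and 1 XOR 1 = 0, so there is no overflow.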

d. 0 - 0x80 OK _____ overflow __X___

Explanation: -0x80 = -(1000_0000) = 1000_0000. The overflow happens here, in the "+1" step of the two's complement negation: inverting 1000_0000 gives 0111_1111, and adding 1 yields 1000_0000 again.

Alternative explanation: 0000_0000 + 1000_0000 = 1000_0000, which is a negative number. However, subtracting a negative number (0x80 = -128) from 0 should give a positive result, and +128 cannot be represented in 8 bits.
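The four cases can also be checked by comparing the mathematically exact result against the 8-bit signed range [-128, 127]. This is a sketch using a different but equivalent test from the carry-in XOR carry-out rule used in the explanations; the helper names are made up for illustration.

```python
def to_signed8(x):
    """Interpret the low 8 bits of x as a two's complement value."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x

def add8_overflows(x, y):
    """True if x + y does not fit in 8-bit two's complement."""
    exact = to_signed8(x) + to_signed8(y)
    return not (-128 <= exact <= 127)

def sub8_overflows(x, y):
    """True if x - y does not fit in 8-bit two's complement."""
    exact = to_signed8(x) - to_signed8(y)
    return not (-128 <= exact <= 127)

print("a:", add8_overflows(0x40, 0x40))
print("b:", add8_overflows(0xC0, 0xC0))
print("c:", sub8_overflows(0xC0, 0x40))
print("d:", sub8_overflows(0x00, 0x80))
```

Python integers are unbounded, so the exact sum or difference is always available and the range test needs no carry bookkeeping.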
