Midterm 1 Sol.pdf

CSE141 Midterm 1
PLEASE SHOW ALL WORK!
Name _________________________________________________________
Email _________________________________________________________
One hour total; equal points per problem; 25% of your grade.
Solution checking
Question   Solved by   Checked by
1          Yajie       Yen-Yi
2          Tee         Yajie
3          Jordan      Tee
4          Paul        Jordan
5          Yen-Yi      Jordan
6          Jordan      Paul
7          Jordan      Zhizhen
8          Zhizhen     Jordan
1. Computer A uses one ISA and has a 2 GHz clock frequency. Computer B uses a different
ISA and has a 3 GHz clock frequency. On average, A's programs execute 1.5 times as many
instructions as B's. For program P1, computer A has a CPI of 2 and computer B has a CPI of 3.
Which computer has faster execution time? What is the speedup?
Computer   Clock   CPI   # Instr
A          2 GHz   2     1.5×
B          3 GHz   3     1×
A. A is faster, speedup is 25%
B. A is faster, speedup is 1.5
C. B is faster, speedup is 2.25%
D. B is faster, speedup is 1.5
E. None of the Above _____
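$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{IC_B \times CPI_B / \text{Clock Rate}_B}{IC_A \times CPI_A / \text{Clock Rate}_A} = \frac{\text{Clock Rate}_A}{\text{Clock Rate}_B} \times \frac{IC_B}{IC_A} \times \frac{CPI_B}{CPI_A} = \frac{2}{3} \times \frac{1}{1.5} \times \frac{3}{2} = \frac{2}{3}$$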
We can infer that B is faster, since the performance ratio of A to B is less than 1.

Speedup of B over A:
$$\frac{\text{Execution Time}_A}{\text{Execution Time}_B} = \frac{3}{2} = 1.5$$
Therefore, B is faster, speedup is 1.5
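
The comparison can also be checked numerically. This is a minimal Python sketch, not part of the exam: the helper name `exec_time` is hypothetical, and the instruction counts are relative values (1.5 for A, 1.0 for B) taken from the problem statement.

```python
def exec_time(instr_count, cpi, clock_hz):
    """Execution time = instruction count x CPI / clock rate."""
    return instr_count * cpi / clock_hz

# Relative instruction counts: A executes 1.5x as many instructions as B.
time_a = exec_time(1.5, 2, 2e9)  # computer A: CPI 2, 2 GHz
time_b = exec_time(1.0, 3, 3e9)  # computer B: CPI 3, 3 GHz

# The faster machine has the smaller execution time.
faster = "A" if time_a < time_b else "B"
# Speedup is the slower time divided by the faster time.
speedup = max(time_a, time_b) / min(time_a, time_b)
print(faster, round(speedup, 2))
```

Only the ratio of the two times matters, so the absolute instruction count cancels out.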

2. A program has the following instruction type distribution. Assume the processor
that will be running this program has the following instruction latencies.
Instruction Instr. Frequency Latency (Cycles)
load 30% 4
store 10% 4
add 50% 2
multiply 8% 16
divide 2% 50
If you could pick one type of instruction to make twice as fast (half the latency) in the
next-generation of this processor, which instruction type would you pick? Why?
A. load _____
B. store _____
C. add _____
D. multiply _____
E. divide _____
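Assume a total of 100 instructions: 30 load, 10 store, 50 add, 8 multiply, 2 divide.
$$CPI_{base} = \frac{(30 \times 4)+(10 \times 4)+(50 \times 2)+(8 \times 16)+(2 \times 50)}{30+10+50+8+2} = \frac{488}{100} = 4.88$$
Make loads faster:
$$CPI_{load} = \frac{(30 \times 2)+(10 \times 4)+(50 \times 2)+(8 \times 16)+(2 \times 50)}{30+10+50+8+2} = \frac{428}{100} = 4.28$$
Make stores faster:
$$CPI_{store} = \frac{(30 \times 4)+(10 \times 2)+(50 \times 2)+(8 \times 16)+(2 \times 50)}{30+10+50+8+2} = \frac{468}{100} = 4.68$$
Make adds faster:
$$CPI_{add} = \frac{(30 \times 4)+(10 \times 4)+(50 \times 1)+(8 \times 16)+(2 \times 50)}{30+10+50+8+2} = \frac{438}{100} = 4.38$$
Make mults faster:
$$CPI_{mult} = \frac{(30 \times 4)+(10 \times 4)+(50 \times 2)+(8 \times 8)+(2 \times 50)}{30+10+50+8+2} = \frac{424}{100} = 4.24$$
Make divs faster:
$$CPI_{div} = \frac{(30 \times 4)+(10 \times 4)+(50 \times 2)+(8 \times 16)+(2 \times 25)}{30+10+50+8+2} = \frac{438}{100} = 4.38$$
Therefore, I would pick multiply because it results in the lowest CPI (4.24).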
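
The per-instruction comparison is easy to recompute. This is a minimal Python sketch, not part of the exam: the names `MIX` and `average_cpi` are hypothetical, and the frequencies and latencies are copied from the table in the problem.

```python
# Instruction mix from the problem: name -> (frequency, latency in cycles).
MIX = {
    "load": (0.30, 4),
    "store": (0.10, 4),
    "add": (0.50, 2),
    "multiply": (0.08, 16),
    "divide": (0.02, 50),
}

def average_cpi(mix, halved=None):
    """Frequency-weighted average CPI; optionally halve one type's latency."""
    total = 0.0
    for name, (freq, latency) in mix.items():
        if name == halved:
            latency = latency / 2
        total += freq * latency
    return total

# CPI obtained when each instruction type in turn is made twice as fast.
cpi_by_choice = {name: average_cpi(MIX, halved=name) for name in MIX}
best = min(cpi_by_choice, key=cpi_by_choice.get)

for name, cpi in cpi_by_choice.items():
    print(f"{name:8s} {cpi:.2f}")
print("best:", best)
```

The saving from halving a latency is frequency × latency / 2, so the type with the largest frequency × latency product (multiply, 0.08 × 16 = 1.28) is the best candidate.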

3. Processor A has an average CPI of 5.0 for a specific program and a clock speed
of 2GHz. If we optimize 40% of instructions by a factor of 2, what is the speedup
from the optimization?
A. 1.0X _____
B. 1.20X _____
C. 1.25X _____
D. 1.5X _____
E. 1.6X _____
F. 2.0X _____
G. None of the Above _____
Solution
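By Amdahl's law, with x = 0.4 the fraction that is optimized and S = 2 the improvement factor:
St = 1/(x/S + (1-x)) = 1/(0.4/2 + 0.6) = 1/0.8 = 1.25x
The CPI and clock speed do not affect the ratio, so the answer is C.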
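
The same formula as a minimal Python sketch. The function name `amdahl_speedup` is hypothetical, and the sketch assumes, as the solution does, that the optimized 40% of instructions accounts for 40% of the execution time.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / (fraction / factor + (1.0 - fraction))

# 40% of the work made 2x faster; the remaining 60% is unchanged.
print(round(amdahl_speedup(0.4, 2), 2))
```

As a sanity check, optimizing nothing (`fraction = 0`) gives a speedup of 1, and optimizing everything (`fraction = 1`) gives the full factor.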

4. Translate the C code below into the equivalent ARM Assembly code. Just perform a direct
translation – no optimization required. Map r0 = a and r1 = b and assume they already contain
values when your code starts.
(assume already declared and initialized integer a, b;)
if (a > b)
{
do
{
a = a - b;
} while (a >= 55);
--b;
}
else
{
b = b + 44;
}
Solution
/* assume a is in r0 and b is in r1 */
        CMP r0, r1        // compare a with b (sets the condition flags)
        BGT True          // goto True when a>b
        ADD r1, r1, #44   // else branch: b=b+44
        B Done            // skip the true branch
True:                     // start of the true branch
Loop:   SUB r0, r0, r1    // a=a-b
        CMP r0, #55       // compare a with 55
        BGE Loop          // goto Loop if a>=55
        SUB r1, r1, #1    // --b
Done:                     // end of the if/else

/* alternative: branch on the inverse condition; assume a is in r0 and b is in r1 */
        CMP r0, r1        // compare a with b (sets the condition flags)
        BLE else          // goto else when a<=b
loop:
        SUB r0, r0, r1    // a=a-b
        CMP r0, #55       // compare a with 55
        BGE loop          // goto loop if a>=55
        SUB r1, r1, #1    // --b
        B done            // goto done
else:
        ADD r1, r1, #44   // b=b+44
done:
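
A reference model helps when tracing the assembly by hand. This is a minimal Python sketch of the C fragment, not part of the exam: the function name `run` is hypothetical, and Python integers are unbounded, so the sketch ignores 32-bit wraparound.

```python
def run(a, b):
    """Mirror of the C fragment; returns the final (a, b).

    Assumes inputs for which the do-while loop terminates (for example b > 0).
    """
    if a > b:
        while True:            # do { ... } while (a >= 55);
            a = a - b          # loop body always runs at least once
            if not (a >= 55):
                break
        b -= 1                 # --b
    else:
        b = b + 44
    return a, b

print(run(100, 30))  # takes the if branch
print(run(3, 5))     # takes the else branch
```

For a = 100, b = 30 the loop runs twice (100 → 70 → 40) before b is decremented, which matches the SUB/CMP/BGE sequence in the assembly.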
5. What does each line of the following ARM assembly code (Euclid's Greatest Common Divisor
algorithm) do?
        MOV R0, #40       // R0 = a = 40
        MOV R1, #25       // R1 = b = 25
again:  CMP R0, R1        // Compare a with b (sets the condition flags)
        BEQ halt          // Branch to halt sequence if a == b
        BLT isLess        // Branch to isLess if a < b
        SUB R0, R0, R1    // a = a - b (reached only when a > b)
        B again           // Branch back to again
isLess: SUB R1, R1, R0    // b = b - a
        B again           // Branch back to again
halt:   swi 0             // Software interrupt (halt sequence)
Equivalent C code:
while(a != b) {
if(a>b) a -= b;
else b -= a;
}
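
The loop can be traced with a minimal Python sketch, not part of the exam. The function name `gcd_by_subtraction` is hypothetical, and the sketch assumes both inputs are positive, since the loop does not terminate if either is zero.

```python
def gcd_by_subtraction(a, b):
    """Subtraction-based Euclid, following the assembly's branch structure."""
    while a != b:      # CMP R0, R1 / BEQ halt
        if a < b:      # BLT isLess
            b -= a     # isLess: SUB R1, R1, R0
        else:
            a -= b     # SUB R0, R0, R1
    return a

# Same starting values as the assembly: a = 40, b = 25.
print(gcd_by_subtraction(40, 25))
```

With the values loaded by the two MOV instructions, the register pair goes (40, 25) → (15, 25) → (15, 10) → (5, 10) → (5, 5), at which point BEQ branches to halt.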
6. Represent each value in as many ways as possible as an ARM immediate:
0x600
Rotate Immediate
0xC 0x06
0xD 0x18
0xE 0x60
0xC0
Rotate Immediate
0x0 0xC0
0xF 0x30
0xE 0x0C
0xD 0x03
0x102
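Impossible: the set bits (bit 8 and bit 1) span 8 bits starting at bit 1, and the rotation is always by an even amount, so no 8-bit immediate can cover both.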
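
The tables can be reproduced by brute force. In the ARM data-processing encoding, the operand value is the 8-bit immediate rotated right by twice the 4-bit rotate field. The following minimal Python sketch, not part of the exam, tries all 16 rotate values; the function names are hypothetical.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by `amount` bits (0 <= amount < 32)."""
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF

def arm_immediate_encodings(value):
    """Return every (rotate, imm8) pair with imm8 ROR (2 * rotate) == value."""
    encodings = []
    for rotate in range(16):
        # Undo the rotate-right by rotating the target value left.
        imm8 = rotate_left32(value, 2 * rotate)
        if imm8 < 0x100:
            encodings.append((rotate, imm8))
    return encodings

for target in (0x600, 0xC0, 0x102):
    pairs = arm_immediate_encodings(target)
    print(hex(target), [(hex(r), hex(i)) for r, i in pairs] or "impossible")
```

An empty result means the value cannot be encoded as an immediate and has to be built another way, for example with more than one instruction.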
7. For each assembly code snippet, list the values of the controls listed. Use a single X for don't
care. Note the width of each signal and be sure you enter a digit for each wire in the signal
(e.g. for signal[2:0] write 001 instead of just 1).
instruction            add r3, r1, r3   mvn r3, #8   str r3, [r2, #4]   ldr r6, [r3, #0]
mem_to_reg             0                0            X                  1
reg_read_addr1[2:0]    001              X            010                011
reg_read_addr2[2:0]    011              X            011                X
reg_write_en[0]        1                1            0                  1
reg_write_addr[2:0]    011              011          X                  110
immed_alu[0]           0                1            1                  1
alu_op[3:0]            4                F            4                  4
mem_write_en[0]        0                0            1                  0
mem_addr*              X                X            *r2+4              *r3+0
reg_write_en = 1 if writing to reg_file at end of current cycle, 0 if not
mem_write_en = 1 if writing to data memory at end of current cycle, 0 if not
immed_alu = 1 if routing immed. bus to alu, 0 if routing reg_file output to alu
mem_to_reg = 1 if writing mem_read_data to reg_file, 0 if writing alu_result to reg_file
*mem_addr: write your answer in the form *rx+o (no spaces). Use *rx to denote the contents of
register x. o is the offset from the base address.
For alu_op, use:

CODE   OP        CODE   OP        CODE   OP
0      A&B       4      A+B       C      A|B
1      A^B       5      A+B+Ci    D      0+B
2      A-B       6      A-B+Ci    E      A&(~B)
3      B-A       7      B-A+Ci    F      0+(~B)
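
The alu_op table can be read as a small lookup. This is a minimal Python sketch, not part of the exam: the function name `alu` and the 32-bit result width are assumptions, and each operation is taken literally from the table.

```python
MASK = 0xFFFFFFFF  # assumption: results are truncated to 32 bits

def alu(code, a, b, ci=0):
    """Evaluate one alu_op code from the table on operands a and b."""
    ops = {
        0x0: a & b,
        0x1: a ^ b,
        0x2: a - b,
        0x3: b - a,
        0x4: a + b,
        0x5: a + b + ci,
        0x6: a - b + ci,
        0x7: b - a + ci,
        0xC: a | b,
        0xD: b,          # 0+B
        0xE: a & ~b,
        0xF: ~b,         # 0+(~B)
    }
    return ops[code] & MASK

# mvn r3, #8 uses code F: the result is the bitwise NOT of the immediate.
print(hex(alu(0xF, 0, 8)))
# str r3, [r2, #4] uses code 4 to form the address base + offset.
print(hex(alu(0x4, 0x1000, 4)))
```

The value 0x1000 for r2 is an arbitrary example; the exam leaves register contents unspecified.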

8. Which of the following computations overflow in an 8-bit two's complement number system?
a. 0x40 + 0x40 OK _____ overflow __X___
Explanation: 0100_0000+0100_0000=0_1000_0000, carry-in=1, carry-out=0, 1 XOR 0 = 1
b. 0xC0 + 0xC0 OK __X___ overflow _____
Explanation: 1100_0000+1100_0000=1_1000_0000, carry-in=1, carry-out=1, 1 XOR 1 = 0
c. 0xC0 - 0x40 OK __X___ overflow _____
Explanation: -0x40=-0100_0000=1100_0000
1100_0000+1100_0000=1_1000_0000, carry-in=1, carry-out=1, 1 XOR 1 = 0
d. 0 - 0x80 OK _____ overflow ___X__
Explanation: -0x80 = -(1000_0000) = 1000_0000. Overflow happens here, when performing the “+1” step of the two’s complement negation (0111_1111 + 1 = 1000_0000).
Alternative explanation: 0000_0000+1000_0000=1000_0000, which is a negative number.
However, subtracting a negative number from 0 should give a positive result.
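
The four answers can be cross-checked with a minimal Python sketch, not part of the exam. It uses a range check on the exact signed result rather than the carry-in XOR carry-out rule used in the explanations; the two tests agree. The function names are hypothetical.

```python
def to_signed8(x):
    """Interpret the low 8 bits of x as a two's complement value."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x

def add8_overflows(a, b):
    """True if a + b does not fit in 8-bit two's complement."""
    exact = to_signed8(a) + to_signed8(b)
    return not (-128 <= exact <= 127)

def sub8_overflows(a, b):
    """True if a - b does not fit in 8-bit two's complement."""
    exact = to_signed8(a) - to_signed8(b)
    return not (-128 <= exact <= 127)

print("a", add8_overflows(0x40, 0x40))
print("b", add8_overflows(0xC0, 0xC0))
print("c", sub8_overflows(0xC0, 0x40))
print("d", sub8_overflows(0x00, 0x80))
```

Case d overflows because 0x80 is -128, and 0 - (-128) = +128 is one more than the largest representable value, +127.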
