

ECE 452: Computer Organization and Design Spring 2010 Midterm

Colorado State University 23 Mar 2010

Name: Student ID number:

___________________________________ ___________________________________

Instructions:
Write all answers in the white space provided below each question. If you need more space, you can use the back of the sheet or extra sheets of paper. Only 1 sheet of handwritten notes (both sides) on regular-sized paper is allowed, along with a calculator. There are 9 regular questions for a total score of 100, and a 10th question for extra credit (10 points). You have 1 hour 30 minutes. Write legibly and give clear answers showing all your steps. Try to attempt all questions, as partial points will be given for a correct approach. Make reasonable assumptions if any question is ambiguous. Not all questions are of equal difficulty. Please review the entire set of questions first and then budget your time carefully.

Q1. [10 points] Computer A has an overall CPI of 1.3 and can be run at a clock rate of 600 MHz. Computer B has a CPI of 2.5 and can be run at a clock rate of 750 MHz. We have a particular program we wish to run. When compiled for Computer A, this program has exactly 100,000 instructions. How many instructions would the program need to have when compiled for Computer B, in order for the two computers to have exactly the same execution time for this program?
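As a study aid, the relation Q1 relies on (execution time = instruction count × CPI / clock rate) can be sketched in Python. The function names below are illustrative and not part of the exam; the sketch assumes the stated CPI is an average over the whole program on each machine.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: instructions * cycles-per-instruction / cycles-per-second."""
    return instruction_count * cpi / clock_hz


def matching_instruction_count(ic_a, cpi_a, clock_a_hz, cpi_b, clock_b_hz):
    """Instruction count on machine B that gives the same execution time as machine A.

    Hypothetical helper: solves cpu_time(ic_b, cpi_b, clock_b_hz) == time_a for ic_b.
    """
    time_a = cpu_time(ic_a, cpi_a, clock_a_hz)
    return time_a * clock_b_hz / cpi_b


if __name__ == "__main__":
    ic_b = matching_instruction_count(100_000, 1.3, 600e6, 2.5, 750e6)
    print(f"Computer B instruction count: {ic_b:.0f}")
```

Under these assumptions the two times match when Computer B executes about 65,000 instructions.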

Q2. [10 points] Suppose that we can improve the floating point instruction performance of a machine by a factor of 15 (the same floating point instructions run 15 times faster on this new machine). What percent of the instructions must be floating point to achieve an overall speedup of at least 4?
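Q2 is an application of Amdahl's law. The following Python sketch shows the formula and its inversion; the helper names are hypothetical. It assumes that the floating point fraction is measured as a share of the original execution time, which equals the share of instructions only if every instruction takes the same time.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the original time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def required_fraction(target_speedup, factor):
    """Fraction of original time that must be enhanced to reach `target_speedup`.

    Obtained by solving 1 / ((1 - f) + f / factor) = target_speedup for f.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / factor)


if __name__ == "__main__":
    f = required_fraction(4, 15)
    print(f"Required floating point fraction: {f:.4f}")
    print(f"Check, resulting speedup: {amdahl_speedup(f, 15):.4f}")
```

With a factor of 15 and a target speedup of 4, the required fraction comes out to roughly 0.80, that is, about 80% under the stated assumption.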

Q3. [5 points] Prior to the early 1980s, machines were built with more and more complex instruction sets. The MIPS is a RISC machine. Why has there been a move to RISC machines away from complex instruction machines?

Q4. [5 points] In the snippet of MIPS assembler code below, how many times is instruction memory accessed? How many times is data memory accessed? (Count only accesses to memory, not registers.)

    lw   $v1, 0($a0)
    addi $v0, $v0, 1
    sw   $v1, 0($a1)
    addi $a0, $a0, 1

Q5. [20 points] In MIPS assembly, write an assembly language version of the following C code segment:

    int A[100], B[100];
    for (i=1; i < 100; i++) {
        A[i] = A[i-1] + B[i];
    }

At the beginning of this code segment, the only values in registers are the base addresses of arrays A and B in registers $a0 and $a1. Avoid the use of multiplication instructions; they are unnecessary.

Q6. [10 points] Structural, data and control hazards typically require a processor pipeline to stall. Listed below are a series of optimization techniques implemented in a compiler or a processor pipeline designed to reduce or eliminate stalls due to these hazards. For each of the following optimization techniques, state which pipeline hazards it addresses and (briefly) how it addresses it. Some optimization techniques may address more than one hazard, so be sure to include explanations for all addressed hazards.

(a) Branch Prediction

(b) Instruction Scheduling

(c) Delay slots

(d) Increasing availability of functional units (ALUs, adders, etc.)

(e) Caches

Q7. [10 points] For the MIPS datapath shown below, several lines are marked with X.

(a) For each one, describe in words the negative consequence of cutting this line relative to the working unmodified processor.

(b) Provide a (minimum 2 instruction) snippet of MIPS assembly code that will fail.

(c) Provide a (minimum 2 instruction) snippet of MIPS assembly code that will still work.

Q8. [10 points] Consider a MIPS machine with a 5-stage pipeline with a cycle time of 10 ns. Assume that you are executing a program where a fraction, f, of all instructions immediately follow a load upon which they are dependent.

(a) With forwarding enabled, what is the total execution time for N instructions, in terms of f?

(b) Consider a scenario where the MEM stage, along with its pipeline registers, needs 12 ns. There are now two options: add another MEM stage so that there are MEM1 and MEM2 stages, or increase the cycle time to 12 ns so that the MEM stage fits within the new cycle time and the number of pipeline stages remains unaffected. For a program mix with the above characteristics, when is the first option better than the second? Your answer should be based on the value of f.
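One way to reason about Q8 is with a simple cycle-count model, sketched below in Python. Everything here is an assumption made for illustration, not the exam's official solution: a load-use dependence costs 1 stall cycle in the 5-stage pipeline with forwarding, it costs 2 stall cycles once MEM is split into MEM1 and MEM2 (load data is taken to be available only after MEM2), and no other hazards occur. The function names are hypothetical.

```python
def execution_time_ns(n, f, cycle_ns, stages=5, load_use_stalls=1):
    """Total time for n instructions under the simplified model.

    cycles = pipeline fill (stages - 1) + one cycle per instruction
             + stall cycles for the fraction f of load-dependent instructions.
    """
    fill_cycles = stages - 1
    stall_cycles = f * n * load_use_stalls
    return (fill_cycles + n + stall_cycles) * cycle_ns


def split_mem_time_ns(n, f):
    """Option 1 (assumed): 6 stages at 10 ns, 2 stall cycles per load-use dependence."""
    return execution_time_ns(n, f, cycle_ns=10, stages=6, load_use_stalls=2)


def slow_clock_time_ns(n, f):
    """Option 2 (assumed): 5 stages at 12 ns, 1 stall cycle per load-use dependence."""
    return execution_time_ns(n, f, cycle_ns=12, stages=5, load_use_stalls=1)


if __name__ == "__main__":
    n = 1_000_000
    for f in (0.1, 0.25, 0.4):
        print(f, split_mem_time_ns(n, f), slow_clock_time_ns(n, f))
```

For large N this model reduces to comparing 10(1 + 2f) against 12(1 + f) per instruction, so under these assumptions the split-MEM option wins for f below roughly 0.25.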

Q9. [20 points] Consider the following assembly language code:

    I0: ADD R4 = R1 + R0;
    I1: SUB R9 = R3 - R4;
    I2: ADD R4 = R5 + R6;
    I3: LDW R2 = MEM[R3 + 100];
    I4: LDW R2 = MEM[R2 + 0];
    I5: STW MEM[R4 + 100] = R2;
    I6: AND R2 = R2 & R1;
    I7: BEQ R9 == R1, Target;
    I8: AND R9 = R9 & R1;

Consider a pipeline with forwarding, hazard detection, and 1 delay slot for branches. The pipeline is the typical 5-stage IF, ID, EX, MEM, WB MIPS design. For the above code, complete the pipeline diagram below (instructions on the left, cycles on top). Insert the characters IF, ID, EX, MEM, WB for each instruction in the boxes. Assume that there are two levels of bypassing, that the second half of the decode stage performs a read of source registers, and that the first half of the write-back stage writes to the register file. Label all data stalls (draw an X in the box). Label all data forwards that the forwarding unit detects (arrow between the stages handing off the data and the stages receiving the data). What is the final execution time of the code?

Q10. [Extra credit question: 10 points] Prediction and Predication

(a) Consider the following piece of code:

    for(i=0; i<1000000; i++){
        a = random(100);
        if(a >= 50){
            ...
        }
    }

Assume that random(N) returns a random number uniformly distributed between 0 and N-1 inclusive. Consider the branch instruction associated with the if statement. Is there a type of branch predictor that predicts this branch well? Explain your answer.

(b) Predication can eliminate all forward conditional branches in a program. The backward branches in a program are typically associated with loops and hence are mostly taken. So it is possible to eliminate all forward branches and statically predict the backward branches as taken, which would get rid of the complex branch predictors in a machine. But the Itanium processor, which has hardware support for predication, still retains a complex two-level branch predictor. Explain why this is the case.
