COL216 Assignment 4
1 Problem Statement
Memory Request Ordering
Consider the following memory READ address sequence with a DRAM Row size = 1024 bytes: 1000, 2500, 1004,
2504. If we service the requests in the above order, the DRAM will change rows ON EVERY ACCESS, which results
in poor performance. Can we do better?
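The cost of the two orderings can be checked with a minimal sketch. It assumes that the row of an address is address / row size and that the row buffer starts empty; the function name countRowBufferUpdates is hypothetical and not part of the assignment.

```cpp
#include <cassert>
#include <vector>

// Counts how many times the row buffer has to load a different row when the
// addresses are serviced in the given order.
// Assumptions (not from the assignment text): row = address / rowSize, and
// the row buffer is empty before the first access.
int countRowBufferUpdates(const std::vector<int>& addresses, int rowSize) {
    assert(rowSize > 0);
    int updates = 0;
    int openRow = -1;  // -1 means no row is currently held in the buffer
    for (int address : addresses) {
        int row = address / rowSize;
        if (row != openRow) {  // row miss: a new row must be brought in
            openRow = row;
            ++updates;
        }
    }
    return updates;
}
```

With a row size of 1024, the order 1000, 2500, 1004, 2504 maps to rows 0, 2, 0, 2 and gives 4 row changes, whereas the order 1000, 1004, 2500, 2504 maps to rows 0, 0, 2, 2 and gives only 2.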
1. Sometimes there is an opportunity to change the order in which DRAM requests are serviced. When does this
opportunity arise? Assume that the order of instructions and address values of memory instructions cannot be
changed.
2. Design and implement a strategy for efficient ordering of DRAM requests at runtime. Remember that the
program’s semantics cannot be violated (its output cannot change).
Use the same DRAM size/rowsize/other architectural assumptions used in the Minor exam. Sample test cases are
provided. If you did not handle some of these instruction formats in earlier assignments, please do so now for this
assignment.
Input:
1. MIPS assembly language program (as text file, NOT machine instructions). Your interpreter should handle all
the instructions mentioned in Assignment 3: add, sub, mul, beq, bne, slt, j, lw, sw, addi.
2. DRAM timing values ROW_ACCESS_DELAY and COL_ACCESS_DELAY in cycles (as command line arguments).
Typical values could be 10 cycles and 2 cycles respectively.
Output:
1. At every clock cycle, print the clock cycle number and all activity in that cycle, such as:
(a) Address of completed instruction, if any
(b) Modified registers, if any (register number and new value)
(c) Modified memory locations, if any (memory location and new value)
(d) Activity on the DRAM, if any (memory location, row buffer updates)
2. After execution completes, print the relevant statistics such as:
(a) Total execution time in clock cycles
(b) Number of row buffer updates
2 Approach
First, we parse the entire input file and load it into two maps: map<int, IS> instructions, which holds every
instruction with its arguments, and map<string, int> labels, which holds every label. We then execute the program
instruction by instruction, printing the required output after each step.
At the end, we print details such as the number of clock cycles and the number of times each instruction was executed.
We have implemented forwarding using a multi-dimensional queue, the details of which are described in the following
section.
2.1 Forwarding
A data hazard arises in a pipelined CPU when an instruction depends on the result of an earlier instruction that has
not yet completed, so it cannot execute in the following clock cycle without risking an incorrect result. A pipeline
stall is a delay in the execution of an instruction in order to resolve a hazard. Operand forwarding (or data
forwarding) is an optimization in pipelined CPUs that limits the performance loss caused by pipeline stalls.
Checking Dependency:
1. Upper Commands -
(a) sw $x y($z) - no dependency
(b) lw $x y($z) - only dependency caused by $x
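One possible reading of the two rules above, covering register dependencies only, is sketched here. It assumes, as rule (a) implies, that a queued sw has already captured the value it will store. The struct PendingRequest and the function dependsOn are hypothetical names used for illustration, not the names used in our code.

```cpp
#include <vector>

// Hypothetical description of a memory request still waiting in the queue.
struct PendingRequest {
    bool isLoad;  // true for lw, false for sw
    int reg;      // register $x named in "lw $x y($z)" or "sw $x y($z)"
};

// Returns true if a later instruction that reads or writes the registers in
// `used` has to wait for `pending` to complete.
bool dependsOn(const PendingRequest& pending, const std::vector<int>& used) {
    if (!pending.isLoad) {
        return false;  // rule (a): a queued sw causes no dependency
    }
    for (int r : used) {
        if (r == pending.reg) {
            return true;  // rule (b): $x is not valid until the lw completes
        }
    }
    return false;
}
```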
We push the lw and sw instructions into the request queue using an algorithm illustrated by the following example.
Let us say that "I3 - R4" denotes an lw or sw instruction, I3, whose address lies in row 4 of the DRAM. Suppose
the lw/sw instructions appear in the following order in a given input file.
I1 - R4
I2 - R1
I3 - R4
I4 - R3
I5 - R1
I6 - R3
I7 - R2
I8 - R3
Then the corresponding 2D memory request queue will have the following form.
4: I1, I3
1: I2, I5
3: I4, I6, I8
2: I7
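The grouping in this example can be reproduced with a minimal sketch. The names Request, RowQueues, and groupByRow are hypothetical. Because std::map iterates in key order, the sketch keeps a separate vector recording the order in which rows first appear; that vector is an assumption of the sketch and is not part of the RowQueue structure described in Section 3.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified request: an instruction name plus its DRAM row.
struct Request {
    std::string name;
    int row;
};

struct RowQueues {
    std::vector<int> rowOrder;                      // rows in order of first appearance
    std::map<int, std::vector<std::string>> byRow;  // row -> requests in program order
};

// Appends each request to the queue of its row, creating the queue the first
// time a row is seen.
RowQueues groupByRow(const std::vector<Request>& requests) {
    RowQueues queues;
    for (const Request& request : requests) {
        auto it = queues.byRow.find(request.row);
        if (it == queues.byRow.end()) {
            queues.rowOrder.push_back(request.row);
            it = queues.byRow.emplace(request.row, std::vector<std::string>{}).first;
        }
        it->second.push_back(request.name);
    }
    return queues;
}
```

Two requests to the same address always fall in the same row, so keeping each row's queue in program order preserves the relative order of all accesses to any single address.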
3 Implementation
Here are the implementation details and design decisions:
3. Instruction format follows the MIPS convention, except that operands are space separated rather than comma
separated (to keep the implementation simple).
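Space-separated operands make tokenizing straightforward. A minimal sketch, in which the function name tokenize is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one instruction line on whitespace, e.g. "add $t0 $t1 $t2" becomes
// the tokens "add", "$t0", "$t1", "$t2". Repeated or leading spaces are skipped.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```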
1. Input
We take the input as a file that contains the instructions in MIPS format.
2. Data Structures
(a) int numClockCycles : integer that stores the number of clock cycles during execution.
(b) int32_t memoryUsed : integer that stores the memory used by the loaded instructions and data.
(c) structure IS : a custom-made structure that stores 2 integer values and 3 string values, used to store an
instruction in a meaningful way.
(d) structure converter : a custom-made structure that stores an integer value and a boolean, used for
simplifying code.
(e) structure mem : a custom-made structure that stores an integer value and a boolean, used for memory.
(f) structure memoryRequest : a custom-made structure for each memory request element in the queue.
(g) int[] array registers : an array that stores the value of each register
(h) map<int, IS> instructions : a map that stores address as key and instruction stored at that address as
value
(i) map<string, int> labels : a map that stores labels as key and address corresponding to those labels as
value
(j) int[][] 2D array DRAM : a 2D array that stores the data held at each memory address
(k) string[] Instructions : an array that stores instruction names
(l) int[] numInstructions : an array that stores number of times each instruction was called
(m) map<int, vector<memoryRequest>> RowQueue : a map that stores row number as key and queue of
memory requests of that row as value
3. Function details
(h) void final_result():
Function to print the final statistics: total number of cycles, memory used and number of calls of each
instruction.
(i) int run():
Function that runs each instruction in order of execution, simulating the entire execution process.
4. Instructions
(a) add
Takes 3 input arguments - $r1, $r2 and $r3 (add $r1 $r2 $r3)
The values stored in $r2 and $r3 are added and the result is stored in $r1.
(b) sub
Takes 3 input arguments - $r1, $r2 and $r3 (sub $r1 $r2 $r3)
The value stored in $r3 is subtracted from the value stored in $r2 and the result is stored in $r1.
(c) mul
Takes 3 input arguments - $r1, $r2 and $r3 (mul $r1 $r2 $r3)
The values stored in $r2 and $r3 are multiplied and the result is stored in $r1.
(d) beq
Takes 3 input arguments - $r1, $r2 and label (beq $r1 $r2 label)
The values stored in $r1 and $r2 are compared and, if they are equal, execution jumps to label. Uses the j()
function to jump based on the returned bool.
(e) bne
Takes 3 input arguments - $r1, $r2 and label (bne $r1 $r2 label)
The values stored in $r1 and $r2 are compared and, if they differ, execution jumps to label. Uses the j()
function to jump based on the returned bool.
(f) slt
Takes 3 input arguments - $r1, $r2 and $r3 (slt $r1 $r2 $r3)
The values stored in $r2 and $r3 are compared. If the value in $r2 is less than the value in $r3, 1 is stored
in $r1; otherwise 0 is stored in $r1.
(g) j
Takes 1 input argument - label (j label)
Execution jumps to label. Returns an integer value that tells the main iterator (explained in pseudo code)
where to jump to.
(h) lw
Takes 2 input arguments - $r1 and address (lw $r1 address)
Loads the value stored at address into $r1.
(i) sw
Takes 2 input arguments - $r1 and address (sw $r1 address)
Saves the value stored in $r1 at address.
(j) addi
Takes 3 input arguments - $r1, $r2 and c (a constant) (addi $r1 $r2 c)
The value stored in $r2 and the constant c are added and the result is stored in $r1.
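The arithmetic and branch semantics listed above can be summarized in a minimal sketch. The function names alu, branchTaken, and wrap are hypothetical and the sketch works on operand values only. It also assumes that results wrap around to 32 bits, which the descriptions above do not specify.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Reinterprets a 32-bit unsigned result as signed. Doing the arithmetic in
// unsigned form avoids signed-overflow undefined behaviour in C++.
static int32_t wrap(uint32_t value) { return static_cast<int32_t>(value); }

// Computes the value written to $r1 for add, addi, sub, mul, and slt, given
// the two operand values a and b (b is the constant c for addi).
int32_t alu(const std::string& op, int32_t a, int32_t b) {
    uint32_t ua = static_cast<uint32_t>(a);
    uint32_t ub = static_cast<uint32_t>(b);
    if (op == "add" || op == "addi") return wrap(ua + ub);
    if (op == "sub") return wrap(ua - ub);
    if (op == "mul") return wrap(ua * ub);
    if (op == "slt") return a < b ? 1 : 0;
    throw std::invalid_argument("unsupported operation: " + op);
}

// Decides whether beq or bne jumps to its label, given the two register values.
bool branchTaken(const std::string& op, int32_t a, int32_t b) {
    if (op == "beq") return a == b;
    if (op == "bne") return a != b;
    throw std::invalid_argument("not a branch: " + op);
}
```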
5. Output
The output is the 32 registers and their corresponding data, each on a new line, printed for every instruction
executed. After execution is complete, we print details such as the number of clock cycles and the number of
times each instruction was executed. Refer to testcases.pdf for the exact input/output specifications.
4 Testing Strategy
We have developed a test strategy that checks our implementation on each class of input. We have not tested on
large test sets, as they are not necessary to evaluate correctness. Instead, we have used small test cases that
demonstrate the robustness of our implementation.
We have tested each of these cases against the algorithm above and our code works correctly on them. The input
and output files are listed in testcases.pdf.