
COL216 Assignment 4

Aniket Mishra (2019CS50420)


Tanishq Dubey (2019CS51077)
March 2021

1 Problem Statement
Memory Request Ordering

Consider the following memory READ address sequence with a DRAM Row size = 1024 bytes: 1000, 2500, 1004,
2504. If we service the requests in the above order, the DRAM will change rows ON EVERY ACCESS, which results
in poor performance. Can we do better?
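
As a rough illustration of the question above, the following C++ sketch maps each address to its DRAM row (address / 1024) and counts how often a row has to be loaded into the row buffer. The helper names rowOf, countRowLoads and groupByRow are hypothetical and not part of the assignment, and the sketch assumes the reads are independent of each other so that they may be reordered freely.

```cpp
#include <algorithm>
#include <vector>

// Row size taken from the problem statement (bytes per DRAM row).
constexpr int ROW_SIZE = 1024;

// Row that a byte address falls in.
int rowOf(int address) { return address / ROW_SIZE; }

// Number of times a row must be loaded into the row buffer when the
// addresses are serviced in the given order. The buffer starts empty,
// so the first access always counts as one load.
int countRowLoads(const std::vector<int>& addresses) {
    int loads = 0;
    int openRow = -1;  // -1 means no row is in the buffer yet
    for (int addr : addresses) {
        if (rowOf(addr) != openRow) {
            openRow = rowOf(addr);
            ++loads;
        }
    }
    return loads;
}

// Groups requests by row while keeping the original relative order
// inside each row (a stable sort on the row number).
std::vector<int> groupByRow(std::vector<int> addresses) {
    std::stable_sort(addresses.begin(), addresses.end(),
                     [](int a, int b) { return rowOf(a) < rowOf(b); });
    return addresses;
}
```

For the sequence 1000, 2500, 1004, 2504 the rows are 0, 2, 0, 2, so servicing in program order loads a row 4 times, while the grouped order 1000, 1004, 2500, 2504 loads a row only 2 times.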

1. Sometimes there is an opportunity to change the order in which DRAM requests are serviced. When does this
opportunity arise? Assume that the order of instructions and address values of memory instructions cannot be
changed.
2. Design and implement a strategy for efficient ordering of DRAM requests at runtime. Remember that the
program’s semantics cannot be violated (its output cannot change).

Use the same DRAM size/rowsize/other architectural assumptions used in the Minor exam. Sample test cases are
provided. If you did not handle some of these instruction formats in earlier assignments, please do so now for this
assignment.

Input:

1. MIPS assembly language program (as text file, NOT machine instructions). Your interpreter should handle all
the instructions mentioned in Assignment 3: add, sub, mul, beq, bne, slt, j, lw, sw, addi.
2. DRAM timing values ROW_ACCESS_DELAY and COL_ACCESS_DELAY in cycles (as command line arguments).
Typical values could be 10 cycles and 2 cycles respectively.
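
To make the role of the two delays concrete, here is a minimal C++ sketch of a per-access cost function. The cost rules themselves are an assumption: the actual rules are those of the Minor exam, which this report does not reproduce. The sketch assumes that a row buffer hit costs one column access, that loading a row into an empty buffer costs one row access plus one column access, and that replacing an open row costs an extra row access for the write-back. The function name accessCycles and its parameters are hypothetical.

```cpp
// Hypothetical cost model (an assumption, see the text above).
// openRow is the row currently in the row buffer, or -1 if the buffer
// is empty; targetRow is the row of the requested address.
int accessCycles(int openRow, int targetRow, int rowDelay, int colDelay) {
    if (openRow == targetRow) return colDelay;    // row buffer hit
    if (openRow < 0) return rowDelay + colDelay;  // empty buffer: activate the row
    return 2 * rowDelay + colDelay;               // write back old row, activate new one
}
```

Under this assumed model, with the typical values of 10 and 2 cycles, a hit costs 2 cycles, the first access costs 12 cycles, and a row change costs 22 cycles, which is why grouping requests by row pays off.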

Output:
1. At every clock cycle, print the clock cycle number and all activity in that cycle, such as:
(a) Address of completed instruction, if any
(b) Modified registers, if any (register number and new value)
(c) Modified memory locations, if any (memory location and new value)
(d) Activity on the DRAM, if any (memory location, row buffer updates)
2. After execution completes, print the relevant statistics such as:
(a) Total execution time in clock cycles
(b) Number of row buffer updates

2 Approach
First, we parse the entire input file and load it into two maps, map<int, IS> instructions and map<string, int> labels,
which hold all instructions (with their arguments) and all labels respectively. We then execute the program
instruction by instruction, printing the required output after each step.
At the end, we print summary details such as the number of clock cycles and the number of times each instruction was executed.
We have implemented forwarding using a multi-dimensional (2D) queue, the details of which are described in the following
section.
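
The parsing step can be sketched roughly as follows. This is not the report's code: the fields of IS below are a simplified, hypothetical stand-in (the text only says that IS holds 2 integers and 3 strings), the Program wrapper and the load function are our own names, and the sketch assumes one instruction or one "label:" per line with space-separated operands.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the report's IS structure.
struct IS {
    std::string op;                 // instruction name, e.g. "add"
    std::vector<std::string> args;  // raw operand strings
};

struct Program {
    std::map<int, IS> instructions;     // byte address -> instruction
    std::map<std::string, int> labels;  // label name -> byte address
};

// Reads space-separated assembly, one instruction or "label:" per line.
Program load(std::istream& in) {
    Program p;
    int address = 0;  // instructions start at address 0, 4 bytes each
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first;
        if (!(tokens >> first)) continue;  // skip blank lines
        if (first.back() == ':') {
            first.pop_back();
            p.labels[first] = address;  // label refers to the next instruction
            continue;
        }
        IS inst{first, {}};
        for (std::string arg; tokens >> arg;) inst.args.push_back(arg);
        p.instructions[address] = inst;
        address += 4;
    }
    return p;
}
```

Keying instructions by byte address means a jump only has to look up the label's address and continue from that key.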

2.1 Forwarding
A data hazard arises in a pipelined CPU when an instruction needs a value that an earlier instruction has not yet
produced, so it cannot execute in the following clock cycle without risking an incorrect result. A pipeline stall is a
delay in the execution of an instruction in order to resolve a hazard. Operand forwarding (or data forwarding) is an
optimization in pipelined CPUs that limits the performance loss caused by pipeline stalls.

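Checking Dependency:
1. Upper Commands (the pending memory instruction, appearing earlier in the program) -
(a) sw $x y($z) - no dependency
(b) lw $x y($z) - the only dependency is through the destination register $x

Let T be the first argument ($x) of the pending lw, i.e. the register that can cause a dependency.


2. Lower Commands (the instructions that follow it) -
(a) Type 1 - add, mul, sub, addi, slt
add $x $y $z
If y or z is T, the instruction is dependent.
If x is T, the instruction is independent: it overwrites T, so the corresponding lw request is deleted.
(b) Type 2 - sw, lw
sw $x y($z)
If z is T, the instruction is dependent.
If x is T, the instruction is dependent.
lw $x y($z)
If z is T, the instruction is dependent.
If x is T, the later lw overwrites T, so the corresponding (earlier) lw request is deleted.
(c) Type 3 - j - no dependency
(d) Type 4 - bne, beq
bne $x $y label
If x or y is T, the instruction is dependent.
In all cases, stop scanning as soon as a dependency is found.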
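
The rules above can be sketched as a single classification function. This is an illustrative sketch only: the enum Dep, the helper baseRegister and the function classify are hypothetical names, operands are assumed to be passed as raw strings in program order, and source registers are checked before the destination so that an instruction that both reads and writes T counts as dependent.

```cpp
#include <string>
#include <vector>

// Outcome of checking a later instruction against the pending lw target T.
enum class Dep { None, Dependent, Overwrites };

// Extracts "$z" from an operand written as "y($z)"; returns "" if the
// operand has no base register (e.g. a plain integer address).
std::string baseRegister(const std::string& operand) {
    auto open = operand.find('(');
    auto close = operand.find(')');
    if (open == std::string::npos || close == std::string::npos || close < open) return "";
    return operand.substr(open + 1, close - open - 1);
}

// a holds the operands of the later instruction, T is the register
// written by the pending lw.
Dep classify(const std::string& op, const std::vector<std::string>& a, const std::string& T) {
    if (op == "j") return Dep::None;  // Type 3
    if (op == "beq" || op == "bne")   // Type 4: both registers are read
        return (a[0] == T || a[1] == T) ? Dep::Dependent : Dep::None;
    if (op == "sw")                   // Type 2: sw reads both $x and $z
        return (a[0] == T || baseRegister(a[1]) == T) ? Dep::Dependent : Dep::None;
    if (op == "lw") {                 // Type 2: lw reads $z and writes $x
        if (baseRegister(a[1]) == T) return Dep::Dependent;
        return a[0] == T ? Dep::Overwrites : Dep::None;
    }
    // Type 1: add, sub, mul, slt, addi read a[1] and a[2], write a[0]
    if (a[1] == T || a[2] == T) return Dep::Dependent;
    return a[0] == T ? Dep::Overwrites : Dep::None;
}
```

A caller would scan the following instructions in order, stop at the first Dep::Dependent result, and drop the pending lw request on Dep::Overwrites.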

Request ordering in 2D Queue:

We push the lw and sw instructions into the request queue using the algorithm illustrated by the following example.
Here a notation such as I3 - R4 denotes an lw or sw instruction I3 whose address lies in row 4 of the DRAM. Suppose
the lw/sw instructions appear in the following order in a given input file.
I1 - R4
I2 - R1
I3 - R4
I4 - R3
I5 - R1
I6 - R3
I7 - R2
I8 - R3

Then the corresponding 2D memory request queue will have the following form, with one line per DRAM row in order of first appearance.
4: I1, I3
1: I2, I5
3: I4, I6, I8
2: I7
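
One way to obtain this layout is sketched below. It is a sketch under assumptions, not the report's implementation: Request is a hypothetical, reduced stand-in for memoryRequest, and because a std::map on its own iterates rows in key order (1, 2, 3, 4), the sketch keeps a separate vector that records rows in first-seen order. It also assumes that all requests of one row are serviced together, which is only valid when the dependency checks allow it.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal request record.
struct Request {
    std::string id;  // e.g. "I3"
    int row;         // DRAM row the address maps to
};

struct RowQueue {
    std::vector<int> rowOrder;                    // rows in first-seen order
    std::map<int, std::vector<Request>> pending;  // row -> requests, oldest first

    void push(const Request& r) {
        if (pending.find(r.row) == pending.end()) rowOrder.push_back(r.row);
        pending[r.row].push_back(r);
    }

    // Flattened service order: drain each row completely before
    // moving on to the next row that was seen.
    std::vector<std::string> serviceOrder() const {
        std::vector<std::string> out;
        for (int row : rowOrder)
            for (const Request& r : pending.at(row)) out.push_back(r.id);
        return out;
    }
};
```

Pushing I1 to I8 from the example yields the row order 4, 1, 3, 2 and the service order I1, I3, I2, I5, I4, I6, I8, I7.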

3 Implementation
Here are the implementation details and design decisions:

1. A line by line explanation is given as part of the code.


2. Each instruction occupies 4 bytes and is executed in one clock cycle.

3. Instruction format follows the MIPS convention, except that operands are space separated rather than comma
separated (to keep the implementation simple).

4. Register file has 32 registers


5. Memory has 2^20 bytes. Instructions start at address 0. The space after the instructions can be occupied by
data.
6. lw and sw do not use the usual MIPS form but take the following form -
lw $register address (an integer)
7. As we do not have any .data declaration right now, memory contains only garbage values (which we have set
to 123).

1. Input
The input is a text file that contains the instructions in MIPS format.
2. Data Structures

(a) int numClockCycles : integer that stores the number of clock cycles during execution.
(b) int32_t memoryUsed : integer that stores the memory used by the loaded instructions and data.
(c) structure IS : a custom-made structure that stores 2 integer values and 3 string values, used to store an
instruction in a meaningful way.
(d) structure converter : a custom-made structure that stores an integer value and a boolean, used for
simplifying code.
(e) structure mem : a custom-made structure that stores an integer value and a boolean, used for memory.
(f) structure memoryRequest : a custom-made structure for each memory request element in the queue.
(g) int[] registers : an array that stores the data held in each register
(h) map<int, IS> instructions : a map that stores address as key and instruction stored at that address as
value
(i) map<string, int> labels : a map that stores labels as key and address corresponding to those labels as
value
(j) int[][] DRAM : a 2D array, indexed by row and column, that stores the data held at each memory
address
(k) string[] Instructions : an array that stores instruction names
(l) int[] numInstructions : an array that stores number of times each instruction was called
(m) map<int, vector<memoryRequest>> RowQueue : a map that stores row number as key and queue of
memory requests of that row as value

3. Function details

(a) void initialize():
Initializes the relevant global variables and maps described in the previous section
(b) bool label_validity(string a):
Checks if a given label is valid, i.e. its first character is a letter and the label is not a keyword; returns
true if valid
(c) present(string b):
Checks if b is a valid register
(d) int typeconvert(string s):
Converts an instruction name into a corresponding integer code, which simplifies the code
(e) converter convert(string s):
Converts string s into an integer and returns that integer together with a boolean indicating whether
the given string was a valid integer
(f) void show_registers():
Prints the names of all 32 registers and the data stored in each
(g) int process():
Processes the input file and converts it into the maps above for execution

(h) void final_result():
Prints the final statistics: total number of cycles, memory used and number of calls of each
instruction.
(i) int run():
Function that runs each instruction in order of execution, simulating the entire execution process.
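
As an illustration of the convert helper described in (e), here is one possible sketch. The field names value and valid are hypothetical (the report only says the structure holds an integer and a boolean), and the use of std::stoi is our choice rather than something the report states.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical field names for the report's converter structure.
struct converter {
    int value;   // parsed integer (0 when parsing fails)
    bool valid;  // true only if the whole string was an integer
};

converter convert(const std::string& s) {
    try {
        std::size_t used = 0;  // number of characters std::stoi consumed
        int v = std::stoi(s, &used);
        bool whole = (used == s.size());  // reject trailing junk like "12ab"
        return {whole ? v : 0, whole};
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return {0, false};
    }
}
```

Returning the flag alongside the value lets the caller tell a genuine 0 apart from a token that is not a number at all, for example a label or a register name.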

4. Instructions

(a) add
Takes 3 input arguments - $r1, $r2 and $r3 (add $r1 $r2 $r3)
The values stored in $r2 and $r3 are added and the result is stored in $r1.
(b) sub
Takes 3 input arguments - $r1, $r2 and $r3 (sub $r1 $r2 $r3)
The value stored in $r3 is subtracted from the value stored in $r2 and the result is stored in $r1.
(c) mul
Takes 3 input arguments - $r1, $r2 and $r3 (mul $r1 $r2 $r3)
The values stored in $r2 and $r3 are multiplied and the result is stored in $r1.
(d) beq
Takes 3 input arguments - $r1, $r2 and label (beq $r1 $r2 label)
The values stored in $r1 and $r2 are compared and, if they are equal, execution jumps to label. Uses the j()
function to jump based on the returned bool.
(e) bne
Takes 3 input arguments - $r1, $r2 and label (bne $r1 $r2 label)
The values stored in $r1 and $r2 are compared and, if they differ, execution jumps to label. Uses the j()
function to jump based on the returned bool.
(f) slt
Takes 3 input arguments - $r1, $r2 and $r3 (slt $r1 $r2 $r3)
The values stored in $r2 and $r3 are compared; if the value in $r2 is less than the value in $r3, then 1
is stored in $r1, else 0 is stored in $r1.
(g) j
Takes 1 input argument - label (j label)
Execution jumps to label. Returns an integer value that tells the main iterator (explained in pseudo code)
where to jump to.
(h) lw
Takes 2 input arguments - $r1 and address (lw $r1 address)
Loads the value stored at address into $r1.
(i) sw
Takes 2 input arguments - $r1 and address (sw $r1 address)
Saves the value stored in $r1 at address.
(j) addi
Takes 3 input arguments - $r1, $r2 and c (a constant) (addi $r1 $r2 c)
The value stored in $r2 and c are added and the result is stored in $r1.
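
The semantics of the arithmetic instructions listed above can be sketched as follows. This is a simplified illustration, not the report's run() function: the name execute, the Registers alias and the convention of passing register indices (and the constant for addi) as integers are all assumptions made for the example.

```cpp
#include <array>
#include <string>

using Registers = std::array<int, 32>;  // register file with 32 registers

// Applies one arithmetic instruction to the register file.
// rd and rs are register indices; rtOrImm is a register index for
// add/sub/mul/slt and the constant itself for addi.
// Returns false for an opcode this sketch does not handle.
bool execute(Registers& reg, const std::string& op, int rd, int rs, int rtOrImm) {
    if (op == "add")       reg[rd] = reg[rs] + reg[rtOrImm];
    else if (op == "sub")  reg[rd] = reg[rs] - reg[rtOrImm];
    else if (op == "mul")  reg[rd] = reg[rs] * reg[rtOrImm];
    else if (op == "slt")  reg[rd] = reg[rs] < reg[rtOrImm] ? 1 : 0;
    else if (op == "addi") reg[rd] = reg[rs] + rtOrImm;
    else return false;
    return true;
}
```

Branches, jumps and memory instructions are left out here because they also need the label map and the DRAM request queue.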

5. Output
The output lists all 32 registers and their contents, each on a new line, after every instruction executed.
After execution completes, we print summary details such as the number of clock cycles and the number of
times each instruction was executed. Refer to testcases.pdf for the exact input/output specifications.

4 Testing Strategy
Our test strategy checks the implementation on each class of input it is expected to handle. We have not
tested on large test sets, as they are not necessary to evaluate correctness. Instead, we have used small test
cases that demonstrate the robustness of our implementation.

We have tested every input case identified for the algorithm above, and our code works correctly on all of them.
The input and output files are listed in testcases.pdf.
