
Memory Hierarchy Assignment

Date: 11 Nov, 2021

SUBMITTED BY:-

NAME- INDRADEV KUMAR

REG NO- 19MCMC38

1. Exercises A.8, A.9 from Hennessy and Patterson 5th ed.

A.8) For the following we consider instruction encoding for instruction set architectures.

a. Consider the case of a processor with an instruction length of 12 bits and with 32 general-purpose registers, so the size of the address fields is 5 bits. Is it possible to have instruction encodings for the following?

■ 3 two-address instructions

■ 30 one-address instructions

■ 45 zero-address instructions

b. [10] <A.2, A.7> Assuming the same instruction length and address field sizes as above,
determine if it is possible to have

■ 3 two-address instructions

■ 31 one-address instructions

■ 35 zero-address instructions

Explain your answer.

Ans:-

a) The question can be interpreted in two ways; both readings are worked out below.

(Interpretation 1: all three sets of instructions encoded together)

All of the instructions, with their needed operands, can fit together if variable-length opcodes are used. We can verify this by enumeration, without constructing the encoding:

3 two-address inst.: 3 * 2^5 * 2^5 = 3072 possible encodings

30 one-address inst.: 30 * 2^5 = 960

45 zero-address inst.: 45

total = 3072 + 960 + 45 = 4077

2^12 = 4096, and 4077 < 4096, so it should be possible.

If we want to break down how to actually encode these instructions:

3 instructions need a 2-bit opcode (leaving one encoding spare):

2 bits + 2 * 5-bit addresses = 12 bits

So:

00 + two 5-bit addresses

01 "

10 "

The next set must use the fourth value (11) to distinguish itself from the first three.

30 instructions fit in 2^5 (32) opcodes, leaving 2 spare:

2 bits + 5-bit opcode + 5-bit address = 12 bits

11 + 00000 + 5 address bits

...

11 + 11101 + 5 address bits

Use the two remaining opcode encodings (in this example: 11+11110 and 11+11111) plus the 5 address bits to represent the zero-address instructions; together that gives 6 free bits.

45 inst. < 2^6 (64), so 6 unique bits suffice for the opcode:

11 + 1111 + 6 bits = 12 bits

So yes, all of these instructions can be encoded in 12 bits.

(Interpretation 2: each set of instructions considered separately)

 3 two-address instructions:-

3 inst = 2-bit opcode (e.g. 00, 01, 10)

2 addresses = 5 bits * 2 = 10 bits

total = 2 + 5*2 = 12 bits -> yes

 30 one-address instructions:-

opcode: 2^5 = 32 >= 30, so 5 bits needed (log2(30) = 4.9, rounded up)

total = 5 + 1*(5 bits) = 10 bits -> yes

 45 zero-address instructions:-

opcode: 2^6 = 64 >= 45, so 6 bits

total = 6 + 0*(5 bits) = 6 bits -> yes

b) Not possible. The 3 two-address instructions consume 3 * 2^10 = 3072 encodings and the 31 one-address instructions consume 31 * 2^5 = 992 encodings, leaving 4096 - 3072 - 992 = 32 encodings for the zero-address instructions. Since 32 < 35 (equivalently, 3072 + 992 + 35 = 4099 > 2^12 = 4096), the 35 zero-address instructions cannot all be encoded.

A.9 For the following assume that values A, B, C, D, E, and F reside in memory. Also assume that
instruction operation codes are represented in 8 bits, memory addresses are 64 bits, and register
addresses are 6 bits.

a. [10] <A.2> For each instruction set architecture shown in Figure A.2, how many addresses,
or names, appear in each instruction for the code to compute C = A + B, and what is the total
code size?

b. [15] <A.2> Some of the instruction set architectures in Figure A.2 destroy operands in the
course of computation. This loss of data values from processor internal storage has
performance consequences. For each architecture in Figure A.2, write the code sequence to
compute:

C=A+B

D=A–E

F=C+D

In your code, mark each operand that is destroyed during execution and mark each
“overhead” instruction that is included just to overcome this loss of data from processor
internal storage. What is the total code size, the number of bytes of instructions and data
moved to or from memory, the number of overhead instructions, and the number of overhead
data bytes for each of your code sequences?

Ans:-

a.

Arch. type              # Addresses            Code size
Stack                   3                      224 bits
Accumulator             3                      216 bits
Register (reg-mem)      7 (3 mem + 4 reg)      240 bits
Register (load-store)   9 (3 mem + 6 reg)      260 bits

b.

Notation: (X) marks an operand destroyed during execution; [overhead] marks an instruction (or memory read) included only to overcome the loss of data from processor internal storage. Data moved assumes 4-byte (32-bit) words.

Stack:
  Push A                 8 + 64
  Push B                 8 + 64
  Add (destroys A, B)    8
  Pop C                  8 + 64
  Push A [overhead]      8 + 64
  Push E                 8 + 64
  Sub (destroys A, E)    8
  Pop D                  8 + 64
  Push C [overhead]      8 + 64
  Push D [overhead]      8 + 64
  Add (destroys C, D)    8
  Pop F                  8 + 64
  Code size: 9 * 72 + 3 * 8 = 672 bits

Accumulator:
  Load A                        8 + 64
  Add B (destroys A in acc.)    8 + 64
  Store C                       8 + 64
  Load A [overhead]             8 + 64
  Sub E (destroys A in acc.)    8 + 64
  Store D                       8 + 64
  Add C (destroys D in acc.; rereads C [overhead] if counted)  8 + 64
  Store F                       8 + 64
  Code size: 8 * 72 = 576 bits

Register (reg-mem):
  Load R1, A       8 + 6 + 64
  Add R2, R1, B    8 + 6 + 6 + 64
  Store R2, C      8 + 6 + 64
  Sub R3, R1, E    8 + 6 + 6 + 64
  Store R3, D      8 + 6 + 64
  Add R4, R3, C    8 + 6 + 6 + 64  (rereads C from memory [overhead] if counted)
  Store R4, F      8 + 6 + 64
  Code size: 4 * 78 + 3 * 84 = 564 bits

Register (load-store):
  Load R1, A       8 + 6 + 64
  Load R2, B       8 + 6 + 64
  Add R3, R1, R2   8 + 6 + 6 + 6
  Store R3, C      8 + 6 + 64
  Load R4, E       8 + 6 + 64
  Sub R5, R1, R4   8 + 6 + 6 + 6
  Store R5, D      8 + 6 + 64
  Add R6, R3, R5   8 + 6 + 6 + 6
  Store R6, F      8 + 6 + 64
  Code size: 6 * 78 + 3 * 26 = 546 bits

Summary:
                       Stack             Accumulator                 Register (reg-mem)         Register (load-store)
Code size:             672 bits          576 bits                    564 bits                   546 bits
Data moved:            9*4 B = 36 B      8*4 B = 32 B                7*4 B = 28 B               6*4 B = 24 B
Overhead inst.:        3                 1 [2 if 'Add C' counted]    0 [1 if 'Add C' counted]   0
Overhead data bytes:   3*4 = 12 B        2*4 = 8 B                   1*4 = 4 B                  0 B

o For reg-mem, answers that assumed a 3-register version of Add existed were also accepted.

2. Exercises A.11, A.22 from Hennessy and Patterson 5th ed.

A.11 [5] <A.3> Consider a C struct that includes the following members:
struct foo {

char a;

bool b;

int c;

double d;

short e;

float f;

double g;

char * cptr;

float * fptr;

int x;

};

For a 32-bit machine, what is the size of the foo struct? What is the minimum size required for
this struct, assuming you may arrange the order of the struct members as you wish? What about for a
64-bit machine?

Ans:-

#include <stdio.h>
#include <stdbool.h>

typedef struct
{
    char a;
    bool b;
    int c;
    double d;
    short e;
    float f;
    double g;
    char *cptr;
    float *fptr;
    int x;
} foo;

int main(void)
{
    /* sizeof yields the structure's size including padding */
    printf("Size of structure = %zu\n", sizeof(foo));
    return 0;
}

For a 32-bit machine (4-byte pointers; assuming doubles are aligned to 8 bytes):

Size of structure = 48 bytes

For a 64-bit machine (8-byte pointers):

Size of structure = 56 bytes

Rearranging the members from largest to smallest alignment removes the internal padding, so the minimum size is 40 bytes on a 32-bit machine and 48 bytes on a 64-bit machine.

A.22 [15/15/10/10] <A.3> The value represented by the hexadecimal number 434F 4D50 5554 4552 is
to be stored in an aligned 64-bit double word.

a. [15] <A.3> Using the physical arrangement of the first row in Figure A.5, write the value to
be stored using Big Endian byte order. Next, interpret each byte as an ASCII character and
below each byte write the corresponding character, forming the character string as it would
be stored in Big Endian order.

b. [15] <A.3> Using the same physical arrangement as in part (a), write the value to be stored
using Little Endian byte order, and below each byte write the corresponding ASCII character.

c. [10] <A.3> What are the hexadecimal values of all misaligned 2-byte words that can be
read from the given 64-bit double word when stored in Big Endian byte order?

d. [10] <A.3> What are the hexadecimal values of all misaligned 4-byte words that can be read
from the given 64-bit double word when stored in Little Endian byte order?

3. A computer has a 32KB L1 cache with a hit rate of 90% and a 256KB L2 cache with a hit rate of 95%. The hit time of L1 is 1 cycle and the hit time of L2 is 15 cycles. A main memory access takes 120 cycles. Calculate the AMAT if no cache is used, if only L1 is used, if only L2 is used, and if both L1 and L2 are used in a hierarchy.

No cache: AMAT = 120 cycles

L1 only: AMAT = 1 + 0.10 * 120 = 13 cycles

L2 only: AMAT = 15 + 0.05 * 120 = 21 cycles

L1 and L2: AMAT = 1 + 0.10 * (15 + 0.05 * 120) = 1 + 0.10 * 21 = 3.1 cycles

4. Assume we currently have a 64KB 4-way set associative cache with a 90% hit rate and a hit time of 1.4 cycles. The latency to access main memory is 200 cycles. What is the AMAT using the cache? We would like to do way prediction on this, which has an accuracy of 80%. A direct-mapped cache of 16KB size has a hit rate of 75% and a hit time of 1 cycle. What is the AMAT of the way-predicted cache?

5. Question B.2 from Hennessy and Patterson 5th ed.


B.2 [15/15] <B.1> For the purpose of this exercise, we assume that we have 512-byte cache with 64-byte
blocks. We will also assume that the main memory is 2 KB large. We can regard the memory as an array
of 64-byte blocks: M0, M1, …, M31. Figure B.30 sketches the memory blocks that can reside in different
cache blocks if the cache was fully associative.

a. [15] <B.1> Show the contents of the table if the cache is organized as a direct-mapped cache.

b. [15] <B.1> Repeat part (a) with the cache organized as a four-way set associative cache.

6. Question 2.11 from Hennessy and Patterson 5th ed.

2.11 [12/15] <2.2> Consider the usage of critical word first and early restart on L2 cache misses.
Assume a 1 MB L2 cache with 64 byte blocks and a refill path that is 16 bytes wide. Assume that the L2
can be written with 16 bytes every 4 processor cycles, the time to receive the first 16 byte block from
the memory controller is 120 cycles, each additional 16 byte block from main memory requires 16
cycles, and data can be bypassed directly into the read port of the L2 cache. Ignore any cycles to
transfer the miss request to the L2 cache and the requested data to the L1 cache.

a. [12] <2.2> How many cycles would it take to service an L2 cache miss with and without
critical word first and early restart?

b. [15] <2.2> Do you think critical word first and early restart would be more important for L1
caches or L2 caches, and what factors would contribute to their relative importance?

ANS:-

a) Without critical word first and early restart:

L2 block size / refill path width = 64 / 16 = 4 transfers of 16 bytes
Total cycles = cycles for the first 16 bytes + remaining transfers = 120 + (3 x 16) = 168 cycles

With critical word first and early restart:
Total cycles = cycles for the first 16 bytes (the critical word arrives first and is bypassed directly to the L2 read port) = 120 cycles

b) The benefit of critical word first and early restart depends on the block size. Since L2 caches generally have larger blocks, the techniques matter more for L2, where the processor would otherwise wait for the whole block to fill. L1 blocks are typically smaller, so the improvement there is less significant. The relative importance also depends on the average memory access time and the percentage reduction in miss service time.

7. Question B.10 from Hennessy and Patterson 5th ed.

B.10 ) [10/10/15] <B.3> Consider a two-level memory hierarchy made of L1 and L2 data caches. Assume
that both caches use write-back policy on write hit and both have the same block size. List the actions
taken in response to the following events:

a. [10] <B.3> An L1 cache miss when the caches are organized in an inclusive hierarchy.

b. [10] <B.3> An L1 cache miss when the caches are organized in an exclusive hierarchy.

c. [15] <B.3> In both parts (a) and (b), consider the possibility that the evicted line might be clean or dirty.
