Memory Hierarchy Assignment: 1. Exercises A.8, A.9 From Hennessy and Patterson 5th Ed
SUBMITTED BY:-
A.8) For the following we consider instruction encoding for instruction set architectures.
a. Consider the case of a processor with an instruction length of 12 bits and with 32 general-purpose
registers, so the size of the address fields is 5 bits. Is it possible to have instruction encodings
for the following?
■ 3 two-address instructions
■ 30 one-address instructions
■ 45 zero-address instructions
b. [10] <A.2, A.7> Assuming the same instruction length and address field sizes as above,
determine if it is possible to have
■ 3 two-address instructions
■ 31 one-address instructions
■ 35 zero-address instructions
Ans:-
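a) Apologies for the ambiguity of the question. If you answered it correctly in either of the following ways,
you got the points.
All of the instructions can fit together with their needed operands if variable-length (expanding) opcodes are
used. We can discover this without coming up with the encoding by enumerating how many of the
2^12 = 4096 bit patterns each class of instruction consumes:
3 2-addr inst: 3 × 2^10 = 3072
30 1-addr inst: 30 × 2^5 = 960
45 0-addr inst: 45 × 2^0 = 45
Total: 4077 ≤ 4096
So one possible encoding is:
00 + 2 × 5-bit addr
01 + 2 × 5-bit addr
10 + 2 × 5-bit addr
The next set (the one-address instructions) will have to use the 4th value (11) to differentiate it from the
first 3, followed by a 5-bit opcode and a 5-bit address:
...
Use the two remaining encodings (in this example: 11+11110 and 11+11111) plus the remaining 5
bits to represent the zero-address instructions, i.e.
11 + 1111 + 6 bits (2^6 = 64 ≥ 45 encodings)
Alternatively, working class by class:
3 two-address instructions: 2-bit opcode + two 5-bit addresses
30 one-address instructions: prefix 11 + 5-bit opcode (2^5 = 32 ≥ 30; log2 30 ≈ 4.9, rounded up to 5) + one 5-bit address
45 zero-address instructions: prefix 11 + 1111 + 6-bit opcode
Possible: yes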
A.9 For the following, assume that values A, B, C, D, E, and F reside in memory. Also assume that
instruction operation codes are represented in 8 bits, memory addresses are 64 bits, and register
addresses are 6 bits.
a. [10] <A.2> For each instruction set architecture shown in Figure A.2, how many addresses,
or names, appear in each instruction for the code to compute C = A + B, and what is the total
code size?
b. [15] <A.2> Some of the instruction set architectures in Figure A.2 destroy operands in the
course of computation. This loss of data values from processor internal storage has
performance consequences. For each architecture in Figure A.2, write the code sequence to
compute:
C=A+B
D=A–E
F=C+D
In your code, mark each operand that is destroyed during execution and mark each
“overhead” instruction that is included just to overcome this loss of data from processor
internal storage. What is the total code size, the number of bytes of instructions and data
moved to or from memory, the number of overhead instructions, and the number of overhead
data bytes for each of your code sequences?
Ans:-
a.
Arch. type                    Code size
Stack                         224 bits
Accumulator                   216 bits
Register (register-memory)    240 bits
Register (load-store)         260 bits
b.
o Destroyed operands (shown in parentheses after the instruction that destroys them).
o Overhead instruction (or load).
o Data moved assumes 4-byte (32-bit) words.
Code:
Stack:
  Push A            8+64
  Push B            8+64
  Add (A,B)         8
  Pop C (C)         8+64
  Push A            8+64
  Push E            8+64
  Sub (A,E)         8
  Pop D (D)         8+64
  Push C            8+64
  Push D            8+64
  Add (C,D)         8
  Pop F (F)         8+64
  Code size: 672 bits
Accumulator:
  Load A            8+64
  Add B (A)         8+64
  Store C           8+64
  Load A (C)        8+64
  Sub E (A)         8+64
  Store D           8+64
  Add C             8+64
  Store F           8+64
  Code size: 576 bits
Register (register-memory):
  Load R1, A        8+6+64
  Add R2, R1, B     8+6+6+64
  Store R2, C       8+6+64
  Sub R3, R1, E     8+6+6+64
  Store R3, D       8+6+64
  Add R4, R3, C     8+6+6+64
  Store R4, F       8+6+64
  Code size: 564 bits
Register (load-store):
  Load R1, A        8+6+64
  Load R2, B        8+6+64
  Add R3, R1, R2    8+6+6+6
  Store R3, C       8+6+64
  Load R4, E        8+6+64
  Sub R5, R1, R4    8+6+6+6
  Store R5, D       8+6+64
  Add R6, R3, R5    8+6+6+6
  Store R6, F       8+6+64
  Code size: 546 bits
o For the register-memory architecture, answers that assumed a three-register version of Add existed were also accepted.
A.11 [5] <A.3> Consider a C struct that includes the following members:
struct foo {
char a;
bool b;
int c;
double d;
short e;
float f;
double g;
char * cptr;
float * fptr;
int x;
};
For a 32-bit machine, what is the size of the foo struct? What is the minimum size required for
this struct, assuming you may arrange the order of the struct members as you wish? What about for a
64-bit machine?
Ans:-
#include <stdio.h>
#include <stdbool.h>

typedef struct
{
    char a;
    bool b;
    int c;
    double d;
    short e;
    float f;
    double g;
    char * cptr;
    float * fptr;
    int x;
} foo;

int main(void)
{
    // sizeof reports the full size of the structure, including the padding
    // the compiler inserts to keep each member naturally aligned.
    printf("Size of structure = %zu\n\n", sizeof(foo));
    return 0;
}
Size of structure = 48   (32-bit machine)
Size of structure = 56   (64-bit machine)
A.22 [15/15/10/10] <A.3> The value represented by the hexadecimal number 434F 4D50 5554 4552 is
to be stored in an aligned 64-bit double word.
a. [15] <A.3> Using the physical arrangement of the first row in Figure A.5, write the value to
be stored using Big Endian byte order. Next, interpret each byte as an ASCII character and
below each byte write the corresponding character, forming the character string as it would
be stored in Big Endian order.
b. [15] <A.3> Using the same physical arrangement as in part (a), write the value to be stored
using Little Endian byte order, and below each byte write the corresponding ASCII character.
c. [10] <A.3> What are the hexadecimal values of all misaligned 2-byte words that can be
read from the given 64-bit double word when stored in Big Endian byte order?
d. [10] <A.3> What are the hexadecimal values of all misaligned 4-byte words that can be read
from the given 64-bit double word when stored in Little Endian byte order?
3. A computer has a 32KB L1 cache with a hit rate of 90% and an L2 cache of 256KB with a hit rate
of 95%. The hit time of L1 is 1 cycle and that of L2 is 15 cycles. The main memory access takes 120
cycles. Calculate the AMAT if no cache is used, only L1 is used, only L2 is used, and both L1 and L2 are used.
For L1
4. Assume we currently have a 64KB 4-way set-associative cache with a 90% hit rate and a hit time of
1.4 cycles. The latency to access main memory is 200 cycles. What is the AMAT using the cache?
We would like to use way prediction on this cache, with a prediction accuracy of 80%. A direct-mapped cache
of 16KB has a hit rate of 75% and a hit time of 1 cycle. What is the AMAT of the way-predicted
cache?
a. [15] <B.1> Show the contents of the table if the cache is organized as a direct-mapped cache.
b. [15] <B.1> Repeat part (a) with the cache organized as a four-way set-associative cache.
2.11 [12/15] <2.2> Consider the usage of critical word first and early restart on L2 cache misses.
Assume a 1 MB L2 cache with 64-byte blocks and a refill path that is 16 bytes wide. Assume that the L2
can be written with 16 bytes every 4 processor cycles, the time to receive the first 16-byte block from
the memory controller is 120 cycles, each additional 16-byte block from main memory requires 16
cycles, and data can be bypassed directly into the read port of the L2 cache. Ignore any cycles to
transfer the miss request to the L2 cache and the requested data to the L1 cache.
a. [12] <2.2> How many cycles would it take to service an L2 cache miss with and without
critical word first and early restart?
b. [15] <2.2> Do you think critical word first and early restart would be more important for L1
caches or L2 caches, and what factors would contribute to their relative importance?
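Ans:-
b) The benefits of critical word first and early restart depend on the block size: the more transfers the
refill path needs per block, the longer a plain miss waits for the rest of the block, and the more cycles
these techniques can save.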
B.10 [10/10/15] <B.3> Consider a two-level memory hierarchy made of L1 and L2 data caches. Assume
that both caches use write-back policy on write hit and both have the same block size. List the actions
taken in response to the following events:
a. [10] <B.3> An L1 cache miss when the caches are organized in an inclusive
hierarchy.
b. [10] <B.3> An L1 cache miss when the caches are organized in an exclusive
hierarchy.
c. [15] <B.3> In both parts (a) and (b), consider the possibility that the evicted