Cao 2021 HW2

CAO 2021
Homework 2
Do not write long explanations. Write at most a two-sentence explanation for any
question (marks will be deducted for long explanations).
2 bonus marks for writing your answers in LaTeX. 1 bonus mark for writing your
answers in Word or similar software (i.e., non-handwritten submissions).
Only PDF will be accepted. The file should be named RollNumber_Name_HW1.pdf. Do
not upload zipped files, Word files, etc.
Do not worry if MS Teams renames your submitted file.
1. Consider a processor with a base CPI of 2.
Case 1: The processor has only one level of cache. It has an I-cache miss rate of
0.75% and a D-cache miss rate of 2%. Find the CPI. [1 mark]
Case 2: The processor has two levels of cache. The L1 I-cache miss rate is 0.75% and
the L1 D-cache miss rate is 2%. The unified L2 cache has an access time of 8 ns.
Assume that 95% of the L1 I-cache misses hit in the L2 cache. For accesses to the L2
cache caused by misses in the L1 D-cache, the local miss rate of the L2 cache is 3%.
Find the CPI. [3 marks]
For both cases, the main memory access latency (i.e., miss penalty) is 80 ns. Loads
are 25% and stores are 15% of total instructions. The clock frequency is 2 GHz.
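As a reminder of the general method (not a solution to this question), memory stall cycles per instruction are added to the base CPI, and a latency in nanoseconds is converted to cycles using the clock frequency. The sketch below is a minimal illustration; the function names `ns_to_cycles` and `cpi_with_stalls` and every number in the example are assumptions chosen for illustration and are not taken from the question.

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles (cycles = ns * GHz)."""
    return latency_ns * clock_ghz


def cpi_with_stalls(base_cpi, streams):
    """Return base CPI plus memory stall cycles per instruction.

    `streams` is a list of (accesses_per_instruction, miss_rate,
    miss_penalty_cycles) tuples, one tuple per access stream
    (for example instruction fetches and data accesses).
    """
    stalls = sum(per_instr * miss_rate * penalty
                 for per_instr, miss_rate, penalty in streams)
    return base_cpi + stalls


# Hypothetical machine: 1 GHz clock, 50 ns miss penalty, base CPI of 1.
penalty = ns_to_cycles(50, 1.0)
example_cpi = cpi_with_stalls(
    1.0,
    [
        (1.0, 0.01, penalty),  # one instruction fetch per instruction, 1% misses
        (0.3, 0.05, penalty),  # 0.3 data accesses per instruction, 5% misses
    ],
)
print(example_cpi)  # approximately 2.25
```

For a two-level hierarchy, the same function can be reused by adding one stream for L1 misses that pay the L2 access time and another for the fraction of those that also miss in L2 and pay the memory latency.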
2. Consider a cache where the access time is 0.8ns. The processor frequency is
500MHz. Will the cache allow virtual multiporting? If so, how many accesses
can be simultaneously served? [1 mark for each answer]
3. We use the solution of "multiple cache copies" to improve parallelism.
We have the following access sequence coming to the cache. Here, the number after
the read/write command is the address that is accessed.
Wr 100
Rd 2000
Rd 3000
Wr 500
Rd 5400
Wr 550
How many actual reads and writes will happen to the entire cache (i.e., taking both
copies together)? [1 mark each for read and write]
4. A cache has 4 banks. A block has a size of 8 bits. The cache can store a total
of 12 blocks. Each bank can provide at most 1 bit in each cycle.
We use
(a) 2:1 interleaving
(b) 4:1 interleaving.
Find the latency of reading the entire block in each case. [1 mark for each part, no
need to draw figures]
5. Show the storage of the matrix below in
(a) row-major format
(b) column-major format
  4    6    2
 44   38  472
122  456  555
129  771  841
No explanation required. [1 mark for each format]
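For reference, the two layouts differ only in the order in which the elements are written out linearly. The sketch below is a minimal illustration on a small made-up 2x3 matrix (not the matrix of this question); the helper names `row_major` and `column_major` are assumptions.

```python
def row_major(matrix):
    """Flatten a list-of-rows matrix row by row."""
    return [value for row in matrix for value in row]


def column_major(matrix):
    """Flatten a list-of-rows matrix column by column."""
    n_cols = len(matrix[0])
    return [row[c] for c in range(n_cols) for row in matrix]


# Hypothetical 2x3 matrix used only for illustration.
m = [[1, 2, 3],
     [4, 5, 6]]
print(row_major(m))     # [1, 2, 3, 4, 5, 6]
print(column_major(m))  # [1, 4, 2, 5, 3, 6]
```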
6. A cache has 2 banks. Block size is 2B. Consider a block which has memory
addresses 16 to 31 (bit-level addresses). Which addresses will be stored in
bank 0 and which addresses will be stored in bank 1? No explanation
required. [1 mark]
7. Consider these two codes:
OriginalCode:
for i = 0 to N
    for j = 0 to M
        A[i][j] = C * A[i][j]
NewCode (after loop-collapsing):
for k = 0 to N*M
    NewA[k] = C * NewA[k]
Is NewCode equivalent to OriginalCode only for a row-major layout, only for a
column-major layout, or for both? [1 mark]
8. Consider an invalidating snooping protocol with 3 CPUs: C1, C2 and C3.
Consider a memory location 'L', where the value 15 is stored.
In the following table, fill the cells with the activity or the values stored. If a
particular cache or the memory does not cache/store the location L, leave the
cell blank. No explanation required, just fill the table.

Step              | C1's cache | C2's cache | C3's cache | Memory
C2 reads L        |            |            |            |
C3 reads L        |            |            |            |
C2 writes 15 to L |            |            |            |
C1 reads L        |            |            |            |
C2 writes 25 to L |            |            |            |
C3 reads L        |            |            |            |
C3 writes 50 to L |            |            |            |

[7 marks total. 0.25 marks for each correctly filled cell; there are 4 cells per step]
9. Assume that a system has 4 processors (P=4). Assume that a directory-based
coherence protocol is used. Show the state of the (P+1)-bit directory for a cache
block after each of these operations on that block.
P3 has read miss
P1 has write miss
P1 has read miss
P0 has write miss
P2 has read miss
P3 has write miss
[6 marks total, 1 mark for writing the state correctly after each step]
10. Consider the code.
float arr[N]; // All values are initialized to zero. That code is omitted
              // and is irrelevant for us.
for (int i = 0; i < N; i++)
    arr[i]++;
Find the working set size of this code if N=128. Write a 1-sentence explanation.
[1 mark]
11. Represent 7.6979 in single-precision and double-precision floating-point
notation (you may use an online tool for this). [total 1 mark] Also, find the
difference between the number (i.e., 7.6979) and what is stored, for both
single-precision and double-precision numbers. [total 1 mark]
Observe in which case the error (difference) is smaller.
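If you prefer a script to an online tool, the stored value and its error can be inspected along the following lines. This is a minimal sketch using Python's standard `struct` and `decimal` modules; the helper names `stored_value` and `abs_error` are assumptions, and the example uses 0.1 rather than the number in this question.

```python
import struct
from decimal import Decimal


def stored_value(text, fmt):
    """Return the exact value stored when `text` is rounded to the format.

    `fmt` is a struct format: "<f" for single precision, "<d" for double.
    Note: the text is first parsed as a Python float (a double) and then
    rounded to `fmt`, which is assumed to be adequate for illustration.
    """
    rounded = struct.unpack(fmt, struct.pack(fmt, float(text)))[0]
    return Decimal(rounded)  # Decimal(float) converts exactly, without rounding


def abs_error(text, fmt):
    """Absolute difference between the decimal literal and the stored value."""
    return abs(stored_value(text, fmt) - Decimal(text))


err32 = abs_error("0.1", "<f")
err64 = abs_error("0.1", "<d")
print("single:", stored_value("0.1", "<f"), "error:", err32)
print("double:", stored_value("0.1", "<d"), "error:", err64)
```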
12. Represent 2000.53 in IEEE-754 FP32 format. You have to show the
computation steps and the three parts (sign, exponent, mantissa) separately. [2 marks]
13. Which number is represented by 01000011111110100110000101001000? Show the
steps, starting from the three parts (sign, exponent, mantissa) separately. [2 marks]
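The decoding steps for a normal FP32 number can be sketched as follows. This is a minimal illustration under the assumption of a normal (not denormal, infinity or NaN) encoding; the function name `decode_fp32` is an assumption, and the bit pattern used is a made-up example (0x41200000), not the one from the question.

```python
import struct


def decode_fp32(bits):
    """Split a 32-character bit string into sign, exponent and fraction
    fields and compute the value of a normal IEEE-754 FP32 number."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 32 bits")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)   # biased exponent, bias = 127
    fraction = int(bits[9:], 2)    # 23 fraction bits
    if exponent in (0, 255):
        raise ValueError("only normal numbers are handled in this sketch")
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value


pattern = format(0x41200000, "032b")  # hypothetical example pattern
sign, exponent, fraction, value = decode_fp32(pattern)
print(sign, exponent, fraction, value)  # 0 130 2097152 10.0

# Cross-check against the machine's own interpretation of the same bits.
check = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
```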
14. Show the range and accuracy of FP16. Do the calculation to show the steps.
[2 marks for each answer]
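The quantities involved follow from the number of exponent and mantissa bits. The sketch below is a minimal illustration assuming IEEE-754-style conventions (all-zeros exponent for denormals, all-ones exponent for infinity/NaN); the function name `fp_limits` is an assumption, and it is shown on FP32 rather than FP16 so that the result can be compared with well-known constants.

```python
def fp_limits(exp_bits, man_bits):
    """Bias, normal range, largest denormal and spacing near 1.0 for an
    IEEE-754-style format with the given exponent and mantissa widths."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp_field = 2 ** exp_bits - 2  # all-ones field is reserved
    return {
        "bias": bias,
        "smallest_normal": 2.0 ** (1 - bias),
        "largest_normal": (2 - 2.0 ** -man_bits) * 2.0 ** (max_exp_field - bias),
        "largest_denormal": (1 - 2.0 ** -man_bits) * 2.0 ** (1 - bias),
        "epsilon": 2.0 ** -man_bits,   # gap between 1.0 and the next number
    }


fp32 = fp_limits(8, 23)
print(fp32["bias"])            # 127
print(fp32["largest_normal"])  # about 3.4e38
```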
15. Consider an FP number system with a total of 24 bits. It has no sign bit, 10
exponent bits, and the remaining bits for the mantissa. Show the bias (1 mark), the
smallest and largest normal numbers (total 2 marks) and the largest denormal number
(2 marks). How many representations of NaN will this number system have (1 mark)?
16. We have an application where all data values are positive. The highest value
we need to store is 2400. We want to use 16-bit storage and a fixed-point
number system. How many bits should we allocate for the integer and fraction
parts so that we have as high a precision as possible without causing overflow?
[2 marks]
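The general reasoning is to give the integer part just enough bits to hold the largest value and to leave the rest for the fraction. The sketch below is a minimal illustration assuming an unsigned format and an integer maximum value; the function name `fixed_point_split` is an assumption, and the example uses made-up values (maximum 100, 8-bit storage) rather than those of the question.

```python
def fixed_point_split(max_value, total_bits):
    """Return (integer_bits, fraction_bits, resolution) for an unsigned
    fixed-point format that can hold the integer `max_value`."""
    if not isinstance(max_value, int) or max_value < 0:
        raise ValueError("this sketch assumes a non-negative integer maximum")
    int_bits = max(1, max_value.bit_length())  # bits needed for max_value
    if int_bits > total_bits:
        raise ValueError("max_value does not fit in total_bits")
    frac_bits = total_bits - int_bits
    return int_bits, frac_bits, 2.0 ** -frac_bits


# Hypothetical example: values up to 100 in 8-bit storage.
print(fixed_point_split(100, 8))  # (7, 1, 0.5)
```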
