COMPUTATIONAL FOUNDATIONS OF CYBER PHYSICAL SYSTEMS (CS61063)

Tutorial 1
Pipelining

[Figure: task execution without a pipeline vs. with a pipeline]
• Consider a k-segment pipeline on which n tasks are to be executed
• Each task takes tn time units to complete in a non-pipelined architecture
• Each sub-task takes tp time units to complete in a pipelined architecture

• Time required to complete n tasks in non-pipelined architecture = n * tn


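• In a pipelined architecture:
1. The 1st task takes (k * tp) time units to complete
2. Each of the remaining (n - 1) tasks takes tp time units to complete, i.e. [(n - 1) * tp] in total
• Time required to complete n tasks in pipelined architecture = (k * tp) + [(n - 1) * tp] = (n + k - 1) * tp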
• Speed-up of pipeline processing
s = (n * tn)/[(n + k -1)*tp]
• Throughput = Number of instructions / Total time to complete the instructions
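The timing relations above can be sketched in a few lines of Python. This is a minimal illustration that assumes an ideal pipeline with no stalls. The function names and the sample values (k = 4, tn = 40, tp = 10) are hypothetical and are not taken from the tutorial.

```python
def nonpipelined_time(n, tn):
    """Total time for n tasks when each task takes tn time units."""
    return n * tn


def pipelined_time(n, k, tp):
    """Total time for n tasks on an ideal k-segment pipeline with clock period tp.

    The first task needs k cycles to pass through all segments; each of the
    remaining n - 1 tasks completes one cycle after its predecessor.
    """
    return (k + n - 1) * tp


def speedup(n, k, tn, tp):
    """Speed-up s = (n * tn) / ((n + k - 1) * tp)."""
    return nonpipelined_time(n, tn) / pipelined_time(n, k, tp)


def throughput(n, k, tp):
    """Tasks completed per unit time on the pipeline."""
    return n / pipelined_time(n, k, tp)


# Illustrative values only (assumed): k = 4 segments, tn = 40, tp = 10.
for n in (1, 10, 1000):
    print(n, round(speedup(n, 4, 40, 10), 3), round(throughput(n, 4, 10), 4))
```

As n grows, the speed-up approaches tn/tp, which equals k only when tn = k * tp.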
9/16/2020
Pipelining: Problem 1
A non-pipelined system takes 50 ns to process a task. The same task can
be processed in a 6-segment pipeline with a clock cycle of 10 ns.
Determine the speed-up ratio of the pipeline for 100 tasks.
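The tutorial states this problem without a solution. A worked sketch, assuming the speed-up formula s = (n * tn)/[(n + k - 1) * tp] from the Pipelining slide with tn = 50 ns, tp = 10 ns, k = 6 and n = 100:

```latex
\[
s = \frac{n\,t_n}{(n + k - 1)\,t_p}
  = \frac{100 \times 50\ \text{ns}}{(100 + 6 - 1) \times 10\ \text{ns}}
  = \frac{5000}{1050}
  \approx 4.76
\]
```

For a very large number of tasks the ratio approaches tn/tp = 5, which is below the segment count of 6 because 6 × 10 ns exceeds the 50 ns non-pipelined time.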

Pipelining: Problem 2
Consider an instruction pipeline with 4 stages with stage delays of 5 nsec, 6
nsec, 11 nsec, and 8 nsec respectively. The delay of each inter-stage register
of the pipeline is 1 nsec. What is the approximate speedup of the
pipeline in the steady state under ideal conditions as compared to the
corresponding non-pipelined implementation?

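Soln. Consider there are n tasks. The pipeline clock period is set by the slowest stage plus the register delay.

tn = (5 + 6 + 11 + 8) ns = 30 ns
tp = (max{5,6,11,8} + 1) ns = 12 ns
speedup = (30 * n)/[(4 + n - 1) * 12]
In the steady state n is large, so (4 + n - 1) ≈ n and speedup ≈ 30/12 = 2.5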
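The steady-state approximation can be checked numerically. The sketch below uses the stage and register delays from the problem statement. The variable names and the chosen values of n are assumptions made for illustration.

```python
stage_delays_ns = [5, 6, 11, 8]   # stage delays from the problem statement
register_delay_ns = 1             # inter-stage register delay
k = len(stage_delays_ns)          # number of pipeline stages

tn = sum(stage_delays_ns)                       # non-pipelined time per task
tp = max(stage_delays_ns) + register_delay_ns   # pipeline clock period


def speedup(n):
    """Speed-up for n tasks: (n * tn) / ((k + n - 1) * tp)."""
    return (n * tn) / ((k + n - 1) * tp)


steady_state = tn / tp  # limit of speedup(n) as n grows without bound

for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: speedup = {speedup(n):.3f}")
print(f"steady-state limit = {steady_state}")
```

For a single task the pipeline is slower than the non-pipelined design (speed-up below 1), because every stage is stretched to the 12 ns clock period.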
Amdahl’s Law
Amdahl's Law gives us a quick way to find the speedup obtained from
some enhancement. Speedup tells us how much faster a task will run on the
computer with the enhancement than on the original computer.
The calculation is based on two factors:

Fraction_enhanced (<1): The fraction of the computation time in the original
computer that can be converted to take advantage of the enhancement.

Speedup_enhanced (>1): The improvement gained by the enhanced execution mode,
that is, how much faster the task would run if the enhanced mode were used for
the entire program.
The Execution time_new with the enhanced mode is the time spent using the
unenhanced portion of the computer plus the time spent using the enhancement:

Execution time_new = Execution time_old * (1 - Fraction_enhanced) +
[Execution time_old * Fraction_enhanced / Speedup_enhanced]

Speedup_overall is then the ratio of the old execution time to the new one, i.e.
how much faster the task will run using the enhancement than on the
unenhanced/original computer:

Speedup_overall = Execution time_old / Execution time_new
= 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
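Amdahl's Law translates directly into a small function. This is a minimal sketch. The function name `amdahl_speedup` and the sample values (40% of the time enhanced) are hypothetical choices for illustration, not part of the tutorial.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S).

    fraction_enhanced: share of the original execution time that benefits (0..1).
    speedup_enhanced:  how much faster that share runs with the enhancement (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Illustrative values only (assumed): 40% of the execution time is enhanced.
fraction = 0.4
for s in (2, 10, 1000):
    print(s, round(amdahl_speedup(fraction, s), 4))

# However large the enhancement, the overall speedup stays below 1 / (1 - F).
print("upper bound:", round(1.0 / (1.0 - fraction), 4))
```

The bound printed at the end shows why the unenhanced fraction dominates: even an arbitrarily fast enhancement cannot push the overall speedup past 1 / (1 - Fraction_enhanced).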
Amdahl’s Law: Problem 1
A common transformation required in graphics processors is square root.
Suppose Floating Point square root (FPSQR) is responsible for 20% of the
execution time of a critical graphics benchmark.
Proposal 1 : Enhance the FPSQR hardware and speed up this operation by a
factor of 10.
Proposal 2 : Try to make all Floating Point (FP) instructions in the graphics
processor run faster by a factor of 1.7.
If FP instructions are responsible for half of the execution time for the
application, compare these two design alternatives.
Soln.

Proposal 1 [Fraction_enhanced = 0.2, Speedup_enhanced = 10]:
Speedup_FPSQR = 1/((1 - 0.2) + 0.2/10) = 1/0.82 ≈ 1.22

Proposal 2 [Fraction_enhanced = 0.5, Speedup_enhanced = 1.7]:
Speedup_FP = 1/((1 - 0.5) + 0.5/1.7) = 1/0.794 ≈ 1.26

Because FP instructions as a whole account for a larger share of the execution
time, improving the performance of all FP operations is the better strategy,
i.e. Proposal 2 promises the better speedup.
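The comparison of the two proposals can be reproduced with a short script. The fractions and speedup factors come from the problem statement. The helper name `amdahl_speedup` and the dictionary layout are assumptions made for this sketch.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup by Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# (Fraction_enhanced, Speedup_enhanced) for each proposal in the problem.
proposals = {
    "Proposal 1 (FPSQR, 10x)": (0.2, 10.0),
    "Proposal 2 (all FP, 1.7x)": (0.5, 1.7),
}

results = {name: amdahl_speedup(f, s) for name, (f, s) in proposals.items()}

for name, value in results.items():
    print(f"{name}: overall speedup = {value:.3f}")

best = max(results, key=results.get)
print("Better alternative:", best)
```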
Memory Hierarchy Problems
Suppose that a computer has a processor with two L1 caches, one for
instructions and one for data, and an L2 cache. Let the access time for the two
L1 caches be C1 = t. The miss penalties are approximately C2 = 15t for transferring
a block from L2 to L1, and M = 100t for transferring a block from the main
memory to L2. Assume that the hit rates are the same for instructions and data,
i.e. the hit rate in the L1 caches is h1 = 0.96 and in the L2 cache is h2 = 0.80.
Memory Hierarchy Problem 1
(a) What fraction of accesses miss in both the L1 and L2 caches, thus requiring
access to the main memory?
Soln. With L1 and L2 caches, the average memory access time (AMAT) is
tavg = h1C1+(1−h1)[h2C2+(1−h2)M]
The fraction of memory accesses that miss in both the L1 and L2 caches is
(1−h1)(1−h2) = (1−0.96)(1−0.80) = 0.04 × 0.20 = 0.008
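The split of accesses across the hierarchy can be checked with a few lines of Python. The hit rates are those given in the problem. The variable names are assumptions made for this sketch.

```python
h1 = 0.96  # L1 hit rate from the problem statement
h2 = 0.80  # L2 hit rate from the problem statement

served_by_l1 = h1                         # hit in L1
served_by_l2 = (1 - h1) * h2              # miss in L1, hit in L2
served_by_memory = (1 - h1) * (1 - h2)    # miss in both L1 and L2

print(f"L1:          {served_by_l1:.3f}")
print(f"L2:          {served_by_l2:.3f}")
print(f"main memory: {served_by_memory:.3f}")

# The three cases are exhaustive, so the fractions add up to 1.
total = served_by_l1 + served_by_l2 + served_by_memory
print(f"total:       {total:.3f}")
```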
Memory Hierarchy Problem 2,3
(b) What is the average access time as seen by the processor?
Soln. The average memory access time using two cache levels is
tavg (actual) = h1C1 + (1−h1)[h2C2 + (1−h2)M]
= 0.96t + 0.04(0.80 × 15t + 0.20 × 100t) = 0.96t + 0.04 × 32t = 2.24t
(c) Suppose that the L2 cache has an ideal hit rate of 1. By what factor would
this reduce the average memory access time as seen by the processor?
Soln. With no misses in the L2 cache (h2 = 1), the average memory access time is
tavg (ideal) = 0.96t + 0.04 × 15t = 1.56t
Therefore, tavg (actual)/tavg (ideal) = 2.24t/1.56t ≈ 1.44, i.e. the average
access time would be reduced by a factor of about 1.44.
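Parts (b) and (c) can be reproduced with one function for the two-level formula. This is a minimal sketch with all times expressed in units of t (so C1 = 1, C2 = 15, M = 100). The function name `amat_two_level` is hypothetical.

```python
def amat_two_level(h1, h2, c1, c2, m):
    """Average memory access time: h1*C1 + (1 - h1) * (h2*C2 + (1 - h2)*M)."""
    return h1 * c1 + (1 - h1) * (h2 * c2 + (1 - h2) * m)


# Times are in units of t, taken from the problem statement.
C1, C2, M = 1, 15, 100

actual = amat_two_level(h1=0.96, h2=0.80, c1=C1, c2=C2, m=M)  # part (b)
ideal = amat_two_level(h1=0.96, h2=1.00, c1=C1, c2=C2, m=M)   # part (c): L2 never misses

print(f"tavg (actual) = {actual:.2f} t")
print(f"tavg (ideal)  = {ideal:.2f} t")
print(f"reduction factor = {actual / ideal:.2f}")
```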
Memory Hierarchy Problem 4
(d) Consider the following change to the memory hierarchy. The L2 cache is
removed and the size of the L1 caches is increased so that their miss rate is cut in
half. What is the average memory access time as seen by the processor in this
case?
Soln. For a single-level cache the AMAT is tavg = h1C1 + (1−h1)M
Halving the L1 miss rate from 0.04 to 0.02 gives h1 = 0.98. So, with larger
L1 caches and the L2 cache removed, the AMAT is
tavg = 0.98t + 0.02 × 100t = 2.98t
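The single-level case can be sketched the same way, again in units of t. The function name `amat_single_level` is hypothetical. The 2.24t figure used for comparison is the two-level result from part (b).

```python
def amat_single_level(h1, c1, m):
    """Average memory access time with only an L1 cache: h1*C1 + (1 - h1)*M."""
    return h1 * c1 + (1 - h1) * m


C1, M = 1, 100                    # times in units of t, from the problem statement

old_miss_rate = 1 - 0.96          # original L1 miss rate
new_miss_rate = old_miss_rate / 2 # larger L1 caches halve the miss rate
h1_new = 1 - new_miss_rate

single_level = amat_single_level(h1_new, C1, M)
two_level = 2.24                  # result of part (b), in units of t

print(f"h1 (new) = {h1_new:.2f}")
print(f"tavg (single level) = {single_level:.2f} t")
print(f"slower than two-level hierarchy: {single_level > two_level}")
```

With these numbers, halving the L1 miss rate does not compensate for removing the L2 cache: 2.98t exceeds the 2.24t of the two-level hierarchy.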
