SLIDES01 - SE15 - AOA (Read-Only)
Day 1
Objectives of the course
ER/CORP/CRS/SE15/003
Copyright © 2004, Infosys Technologies Ltd
Version No: 2.0
The main objective of the course is to introduce "Analysis of Algorithms" and to compute the
performance parameters of an algorithm.
After studying this course, you will gain a better understanding of the importance of
designing good algorithms and efficient programs.
References
1. Donald E. Knuth (1997), The Art of Computer Programming, Volume 1: Fundamental
Algorithms, Third Edition, Addison-Wesley
2. Cormen, Leiserson, Rivest, Stein (2001), Introduction to Algorithms, Second
Edition, Prentice Hall
3. Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman (1998), The Design and Analysis of
Computer Algorithms, Addison-Wesley Publishing Company
4. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran (1998), Fundamentals of Computer
Algorithms, Galgotia Publications Private Limited, New Delhi
5. Weiss, M. A. (1993), Data Structures and Algorithm Analysis in C, Benjamin
Cummings, Addison-Wesley
6. Jon Bentley (2000), Programming Pearls, Second Edition, Pearson Education
7. McConnell, S. (1993), Code Complete, Microsoft Press
8. Press, et al. (2002), Numerical Recipes in C++, Cambridge University Press
Course Plan
Day 1
• Analyzing Algorithms
– Introduction to Space and Time complexities
– Basic Mathematical principles
– Order of magnitude
– Introduction to Asymptotic notations
• Best case
• Worst case
• Average case
Course Plan (cont...)
Day 2
Course Plan (cont…)
Day 3
• Code Tuning
• SQL Query Tuning
• Introduction to Numerical Analysis
• Intractable problems
– Deterministic vs. Non-Deterministic machines
– P vs. NP
– NP Complete
Analysis of Algorithms
Unit 1 - Introduction
Introduction to Algorithms
The etymology of the word Algorithm dates
back to the 8th Century AD.
The word Algorithm is derived from the
name of the Persian author
“Abu Jafar Mohammad ibn Musa al
Khowarizmi”
Abu Jafar Mohammad ibn Musa al Khowarizmi was a great mathematician who was born
around 780 AD in Baghdad. He worked on algebra, geometry, and astronomy. His treatise
on algebra, Hisab al-jabr w'al-muqabala, was the most famous and important of all of al-
Khwarizmi's works. It is the title of this text that gives us the word "algebra".
What is an Algorithm?
• A finite set of instructions to accomplish a task. The algorithm should be
correct
• The properties of an algorithm are as follows:
– Input
– Output
– Finiteness
– Definiteness
– Effectiveness
Algorithm
Practice:
Check whether the above algorithm (Euclid's Algorithm) to find the GCD of two given
numbers satisfies all the properties of an algorithm
The above algorithm satisfies all the properties except definiteness: consider what
happens if m = -2 and n = 3.45.
So change Step 1 to "Get two positive non-zero integers m and n".
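The slide's figure of Euclid's Algorithm is not reproduced in this text, so here is a minimal sketch in Python, assuming the classic remainder-based version, with Step 1 corrected as the notes suggest:

```python
def gcd(m, n):
    """Euclid's algorithm (a sketch of the version the slide refers to)."""
    # Step 1, as corrected in the notes: two positive non-zero integers.
    if not (isinstance(m, int) and isinstance(n, int)) or m <= 0 or n <= 0:
        raise ValueError("m and n must be positive non-zero integers")
    while n != 0:          # finiteness: n strictly decreases every iteration
        m, n = n, m % n    # definiteness: each step is unambiguous
    return m

print(gcd(1071, 462))  # → 21
```

With the input check in place, the algorithm satisfies all the listed properties, including definiteness.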
Algorithms span a vast space
• The definition
“Finite Set of Instructions to accomplish a task”
spans a very vast space. We will only discuss a few kinds of algorithms,
but will briefly indicate the larger picture through a simplified banking
application
The gamut of algorithms is very vast, spanning symbolic, numerical, power-efficient, fault-
tolerant algorithms, etc. We illustrate this wide variety through a simplified banking
example.
The different kinds of algorithms used in the banking application have different speeds,
memory requirements, real time response, numerical accuracy, fault tolerance, etc.
Banking Applications: Utilize Computers, Networks, and Storage
(figure: system diagram showing routing tables, link state information, and failover)
This banking application utilizes all kinds of algorithms from symbolic through real time through fault tolerant. The
figure also illustrates in a simplified form the architectural building blocks which comprise this banking system, and
algorithm classes written to execute on it.
We show a Finacle installation, with terminals at a branch connected to a set of clustered web servers for
authentication. The web servers are in turn connected to a set of application servers for implementing banking
rules and policies. The application servers access mirrored and/or replicated data storage. Redundancy is present
in the network also. Telephone banking using an Interactive Voice Response System is used as a backup if the
branch terminals break down.
The design of the authentication hardware and software requires fault tolerance – the users should not have to
relogin if one or more servers fail – some state should be stored in the form of cookies in non-volatile storage
somewhere. The banking calculations require very high accuracy (30+ digit accuracy). Various kinds of fault
tolerance schemes are used for storage. For example, two mirrored disks always keep identical data. A write to
one disk is not considered complete till the other is written also. The servers have to respond within seconds to
each user level request (deposit, withdrawal, etc) – the real time response of the system has to be evaluated using
queuing theory and similar techniques. For TTS, the response output speech samples have to be guaranteed to
be delivered at periodic time intervals, say every 125 microseconds.
Glossary:
DR: Disaster Recovery
TTS: Text to Speech
IVR: Interactive Voice Response
WAN: Wide Area Network
Pseudo Code
• An algorithm is independent of any language or machine whereas a program is
dependent on a language and machine
• To fill the gap between these two, we need pseudo-code
Pseudo-code is a way to represent the step-by-step method of finding the solution
to the given problem.
Example:
Algorithm arrayMax (A, n)
Input: array A of n integers
Output: maximum element of A
currentMax ← A[0]
for i ← 1 to n − 1 do
    if A[i] > currentMax then
        currentMax ← A[i]
return currentMax
Algorithms are developed during the design phase of software engineering. During the
design phase, we first look at the problem, try to write the pseudo-code, and then move
towards the programming (implementation) phase.
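The arrayMax pseudo-code above translates almost line for line into a runnable program; a sketch in Python for illustration:

```python
def array_max(a):
    """Runnable version of the arrayMax pseudo-code from the slide."""
    current_max = a[0]            # currentMax <- A[0]
    for i in range(1, len(a)):    # for i <- 1 to n-1 do
        if a[i] > current_max:    # if A[i] > currentMax then
            current_max = a[i]    #     currentMax <- A[i]
    return current_max

print(array_max([3, 9, 2, 7]))  # → 9
```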
Life Cycle of an Algorithm
• Design the Algorithm
• Write the Algorithm
• Test the Algorithm
• Analyze the Algorithm
The life cycle of an algorithm consists of four phases: Design, Write, Test and Analyze.
(i) Design:
The design techniques help in devising the algorithms. Some techniques are Divide & Conquer, Greedy
Technique, Dynamic Programming, etc.
The design techniques will be dealt with in Unit 3 (Day 2).
(ii) Write (implementation): Implementing the algorithm in pseudo-code, which will later be represented in an
appropriate programming language.
(iii) Test: Checking that the algorithm produces the correct results for representative inputs.
(iv) Analyze: Estimating the amount of time/space (which are considered to be the prime resources) required while
executing the algorithm.
Resources available in a computer
(figure: resources such as CPU time, memory, and power)
In this course we will focus on time (CPU utilization) and space (memory utilization).
When an algorithm is designed it should be analyzed for the amount of these resources it
consumes. While solving a problem, an algorithm consuming more resources than others
will not be considered in most of the cases.
Analysis of Algorithms
• An algorithm when implemented, uses the computer’s primary memory and
Central Processing Unit
In analysis we analyze the amount of resources needed for a particular solution of the
problem.
There are two types of analysis:
Priori Analysis:
This is the theoretical estimation of the resources required. Here the efficiency of the algorithm
is checked, and if possible the logic of the algorithm is improved for efficiency.
This is done before the implementation of the algorithm on a machine, and so it is done
independent of any machine/software.
Posteriori Analysis:
This analysis is done after implementing the algorithm on a target machine. It is aimed
at determining actual statistics about the algorithm's consumption of time and space
(primary memory) in the computer when it is being executed as a program.
Before implementing the algorithm in a programming language, priori analysis is used to
select the best of the candidate algorithms (in the slide's example of three algorithms,
Algo3 suits large n).
A high-level view of analysis of algorithms
(figure: three aspects of analysis — Correctness: accurate to within an error margin,
condition number; Asymptotics: O(N²), O(N log(N)), …; Beyond Asymptotics: power
analysis, mean, variance, physical modeling)
•Numerical algorithms have to be devised for adequate accuracy. Only after sufficient
accuracy is achieved can we look at speed.
•Speed has many dimensions: asymptotics, mean time, variance of the execution time, etc.
Memory, or in general resource usage, is a dual metric.
•Many algorithms, especially in banking and finance, are required to be fault tolerant,
especially of server failures. These systems are generally required to be geographically
distributed. The resulting communication overhead can often be the dominant contribution
to time.
Efficiency Measures
• Performance of a solution
• Performance Measures
• Time
• Quality
• Simplicity…
Why Performance?
Since most software problems do not have a unique solution, we are always
interested in finding the better solution. A better solution is judged based on its performance.
Some of the performance measures include the time taken by the solution, the quality of the
solution, the simplicity of the solution, etc.
For any solution to a problem we would always ask the following question:
"Is it feasible to use this solution?" → In other words, is it efficient enough to be used in
practice? The efficiency measures which we normally look for are time and space. How much
time does this solution take? How much space (memory) does this solution occupy?
Improving the performance of a solution can be done by improving the algorithm design,
database design, transaction design and by paying attention to the end-user psychology.
Also, continuous improvements in hardware and communication infrastructure aid in
improving the performance of a solution.
Efficiency Measures (Contd…)
• Space Time Tradeoff
Which implementation would provide faster access to an employee with a given employee
number?
The above-mentioned example tries to highlight the need for performance. Each of the three
questions asked is aimed at some performance measure.
The array data structure is a better data structure for each of these questions. However, if a
different company also plans to buy this product, then the size of the array must be very high
(which could as well lead to wastage of space). In this case a linked list data structure might
be a better option.
This example also highlights a universal problem called the space-time tradeoff, which we
will be discussing shortly.
Efficiency Measures (Contd …)
Example 2: Think of a GUI drop-down list box that displays a list of employees
whose names begin with a specified sequence of characters. If the employee
database is on a different machine, then there are two options:
Option a: fire a SQL and retrieve the relevant employee names each time the list
is dropped down.
Option b: keep the complete list of employees in memory and refer to it each time
the list is dropped down.
This example again does not have a unique solution. It depends on various parameters
which include:
• The number of employees
•The transmission time from the database server to the client machine
•The volume of data transmission each time
•The frequency of such requests.
•The network bandwidth
Neither of the solutions is the better one. The main point here is the tradeoff. Whenever we
need better performance in terms of time taken, we could opt for option b, which would
however lead to more memory requirements. The reverse is also true: when we want our
solution to occupy less memory (space), we need to compromise on the efficiency in terms
of time taken. This tradeoff is called the space-time tradeoff, which is a universal principle.
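Option (b) above can be sketched as a small in-memory cache. This is a hypothetical illustration: `EmployeePicker` and `fetch_employees` are invented names standing in for the SQL round trip, and the real tradeoff parameters (bandwidth, request frequency) are not modeled:

```python
class EmployeePicker:
    """Option (b): trade memory for lookup speed by caching the full list."""

    def __init__(self, fetch_employees):
        # One up-front fetch: more memory used, but no per-keystroke query.
        self._names = sorted(fetch_employees())

    def complete(self, prefix):
        # Served entirely from memory; no network round trip.
        return [n for n in self._names if n.startswith(prefix)]

picker = EmployeePicker(lambda: ["Anand", "Anita", "Ravi"])
print(picker.complete("An"))  # → ['Anand', 'Anita']
```

Option (a) would instead issue a fresh query inside `complete`, using little client memory but paying the transmission cost on every drop-down.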
Efficiency Measures (Contd …)
Example 3:
Which one of the following problems requires more space?
• Design a computer program that sorts (in ascending order) and outputs the
result for any input sequence a1, a2, …, an of numbers, where n is any natural
number
Summary of Unit - 1
• What is an Algorithm?
• Properties of an Algorithm
• Performance Measures
Analysis of Algorithms
Unit 2 - Analyzing Algorithms
Analysis of Algorithms
When a programmer builds an algorithm during the design phase of the software life cycle,
he/she might not be able to implement it immediately. This is because programming comes in
a later part of the software life cycle. But there is a need to analyze the algorithm at that
stage. This will help in forecasting how much time the algorithm takes or how much primary
memory it might occupy when it is implemented. So analysis of algorithms becomes very
important.
Space Complexity
The space needed by a program has the following components:
• Instruction space
• Data space
• Environment stack space
Instruction space:
Space needed to store the object code.
Data space:
Space needed to store constants & variables.
Environment stack space:
Space needed when functions are called. If the function fnA calls another
function fnB, then the return address and all the local variables and formal parameters are
to be stored.
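The environment-stack component can be observed directly: each recursive call pushes a frame, so deep recursion exhausts that space. A small Python sketch (the limit value chosen here is arbitrary):

```python
import sys

def depth(n):
    """Each recursive call pushes a frame (return address, locals,
    parameters) onto the environment stack, so space grows with n."""
    if n == 0:
        return 0
    return 1 + depth(n - 1)

# The interpreter caps the environment stack; deep recursion exhausts it.
sys.setrecursionlimit(100)
try:
    depth(1000)
except RecursionError:
    print("environment stack space exhausted")
```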
Time Complexity
Time complexity depends on the machine, the compiler and other real-time factors.
Total time = Σ opi(n) × ti (summed over the operations opi)
where opi(n) is the number of instances the operation opi occurs and ti is the time
taken for executing the operation.
This total time is a varying factor which depends on the current load of the system
and other real-time factors like communication.
Time complexity also depends on all the factors that the space complexity depends on.
Time complexity includes the compilation time and the execution time, but compilation is
done once whereas execution is done many times. So the compilation time is not
considered in most cases; only the execution time is.
Time Complexity (Cont…)
Operation count is one way to estimate the Time Complexity.
The success of this method (Operation count) depends on the identification of the exact
operation/s that contribute most to the time complexity.
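A minimal sketch of operation counting, taking the comparison as the dominant operation (choosing which operation to count is exactly the identification step the notes warn about):

```python
def max_with_count(a):
    """Find the maximum while counting the dominant operation:
    the comparison a[i] > m."""
    m, comparisons = a[0], 0
    for i in range(1, len(a)):
        comparisons += 1        # one comparison per loop iteration
        if a[i] > m:
            m = a[i]
    return m, comparisons

# For n elements the comparison executes exactly n - 1 times.
print(max_with_count([4, 1, 7, 3]))  # → (7, 3)
```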
Time Complexity (Cont…)
Step count is another way to estimate time complexity
Time Complexity (Cont…)
Recursive functions:
                                    Total steps
                                    ___________
fact(n)                                  0
{                                        0
1.1  if (n <= 1)                         n
         return 1;                       1
1.2  return ( n * fact(n-1) );           2n - 2
}                                        0
                                    ___________
                                       3n - 1
Time Complexity (Cont…)
Function calling:
Consider a function calling the function sum(array, n) (ref: the earlier step-count example)
                                                  Total Steps
                                                  ____________
Callsum(array1, array2, n)                             0
{                                                      0
1.1    for ( i = 0; i < n; i++ )                     2n + 2
1.1.1      array2[i] = sum(array1, i+1);   Σ (3i + 8), i = 0 to n-1, = n(3n + 13)/2
}                                                      0
                                                  ____________
Total number of steps: (3n² + 17n + 4)/2
Kinds of Analysis of Algorithms
Priori analysis is carried out before the program is written (based on the algorithm). The
calculation of order of magnitude in the examples we have seen above is the priori
analysis of the algorithm.
In case of priori analysis, we ignore the machine and platform dependent factors. Also, we
analyze the algorithm before we write the program. It is always better if we analyze the
algorithm at an earlier stage of the software life cycle.
Posteriori analysis
• External factors influence the execution of the algorithm
– Network delay
– Hardware failure etc.,
Posteriori Analysis (Cont…)
PROFILER
• What is a Profiler?
A tool to identify the performance bottlenecks of an application.
• Why a Profiler?
– To find the performance bottlenecks.
– Visualizing the run time of the code.
– Finding out the time consumed by the code for the given input.
• Limitations of a Profiler
– Most profilers report results specifically in terms of time duration.
– Results may vary depending on the load on the system.
Build a table which lists the total number of steps that each statement contributes. Add the
contributions of all statements to obtain the step count for the entire program. So we can
get the percentage of each statement. This approach in obtaining the step count (ref: time
complexity) is called profiling. The same approach is applicable to various functions
(subprograms) available in a program.
Priori Analysis
Priori analysis requires the knowledge of
– Mathematical equations
– Determination of the problem size
– Order of magnitude of any algorithm
Some Basic Mathematics
Arithmetic Progressions:
Σ i (i = 1 to n) = 1 + 2 + 3 + ... + (n - 1) + n = n(n + 1)/2
Geometric Progressions:
Σ x^i (i = 0 to n) = (x^(n+1) - 1)/(x - 1), if x ≠ 1
Σ x^i (i = 0 to ∞) = 1/(1 - x), if |x| < 1
Geometric Progressions:
There is a constant ratio between an element and its successor (it is the same as the
ratio between an element and its predecessor).
So the series will be:
a, ar, ar^2, ar^3, …
Some Basic Mathematics (Contd…)
Logarithms:
a = b^(log_b a)
log_b a = (log_c a)/(log_c b)
Factorials:
The number n! is defined as 1 * 2 * 3 * … * (n-1) * n
Some Basic Mathematics (Contd…)
A few mathematical formulae:
1² + 2² + … + n² = n(n + 1)(2n + 1)/6
1 + a + a² + … + aⁿ = (a^(n+1) - 1)/(a - 1)
Choice function:
nCr = n!/(r!(n - r)!)
•Applying the basic concepts we have seen so far, the above series can be
evaluated.
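The formulae above can be spot-checked numerically; a small sketch:

```python
from math import comb, factorial

# Quick numerical check of the formulae on the slide.
n, a, r = 10, 3, 4

# Sum of squares: 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6
assert sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6

# Geometric sum: 1 + a + ... + a^n = (a^(n+1) - 1)/(a - 1)
assert sum(a ** i for i in range(n + 1)) == (a ** (n + 1) - 1) // (a - 1)

# Choice function: nCr = n! / (r!(n-r)!)
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))

print("all formulae check out")
```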
Growth of functions
Algorithm complexity will be represented in terms of mathematical functions,
e.g. n log n, n²
(figure: growth curves of log(n), n, n log(n), n² and 2ⁿ)
•In the figure in the slide, the x axis represents the problem size and the y axis represents
the resources.
•As part of Basic Mathematical Principles we will introduce applicable mathematics as
required for this course.
•Growth of functions: The above figure shows the growth of a few mathematical
functions. The x-axis varies from 0 to 50 and the y-axis varies from 0 to 100. The
point to be observed here is that the growth rate of the function log(n) is smaller when
compared to the other functions, namely n, n log(n), n² and 2ⁿ. An exponential
function like 2ⁿ will ultimately overtake any polynomial function. The need to
understand the growth of these basic functions will be well appreciated in the later
chapters wherein we analyze algorithms.
•From the graph, we can find that the logarithmic functions grow more slowly
and the exponential functions grow much faster.
Functions which grow at the rate of n! are called factorial functions.
The growth rate of the factorial is so tremendous that it is much greater even than 2ⁿ.
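The growth rates discussed above can be tabulated for a few problem sizes; a small sketch:

```python
import math

# Tabulate the growth of the functions in the figure for a few sizes n.
print("n    log n    n    n log n    n^2    2^n")
for n in (10, 20, 40):
    print(n,
          round(math.log2(n), 1),   # log n: slowest growth
          n,                        # linear
          round(n * math.log2(n)),  # n log n
          n ** 2,                   # quadratic
          2 ** n)                   # exponential: eventually dominates all
```

Even at n = 40 the exponential column already dwarfs the polynomial ones, which is exactly the behaviour the figure illustrates.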
Some Basic Mathematics (Contd…)
How many times should we divide (into half) the number of elements 'n'
(discarding remainders if any) to reach 1 element?
m = floor(log2 n)
•As a corollary to the above-mentioned result, we can easily see that a
number n must be halved floor(log2 n) + 1 times to reach 0.
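A quick sketch verifying the halving count against floor(log2 n):

```python
import math

def halvings_to_one(n):
    """Count how many times n must be halved (discarding remainders)
    to reach 1; the result equals floor(log2(n))."""
    count = 0
    while n > 1:
        n //= 2          # integer halving discards the remainder
        count += 1
    return count

for n in (1, 7, 8, 1000):
    assert halvings_to_one(n) == math.floor(math.log2(n))
print(halvings_to_one(1000))  # → 9
```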
A high-level view of analysis of algorithms
(figure: three aspects of analysis — Correctness: accurate to within an error margin,
condition number; Asymptotics: O(N²), O(N log(N)), …; Beyond Asymptotics: power
analysis, mean, variance, physical modeling)
Given the wide variety of algorithms, they can be analyzed in many dimensions: speed,
accuracy, power consumption, and resiliency.
•Numerical algorithms have to be devised for adequate accuracy. Only after sufficient
accuracy is achieved can we look at speed.
•Speed has many dimensions: asymptotics, mean time, variance of the execution time, etc.
Instead of time, we can also look at memory or, in general, resource usage.
•Many algorithms, especially in banking and finance, are required to be fault tolerant,
especially of server failures. These systems are generally required to be geographically
distributed. The resulting communication overhead can often be the dominant contribution
to time.
Problem size
The problem size depends on the nature of the problem for which we are
developing the algorithms.
The complexity of an algorithm is expressed as a function of problem size
Examples:
• If we are searching an element in an array having ‘n’ elements, the problem
size is ____
same as the size of array ( = ‘n’).
• If we are merging 2 arrays of size ‘n’ and ‘m’, the problem size of the algorithm
is _____
sum of two array sizes ( = ‘n + m’)
Order of Magnitude of an algorithm
Calculate the running time and consider only the leading term of the formula which
gives the order of magnitude.
• Example 1
for ( i = 0; i < n; i++ )
{
    ...
    ...
}
Assume there are ‘c’ number of statements inside the loop
Each statement takes 1 unit of time
In calculating the order of magnitude, the lower-order terms are left out as they are
relatively insignificant.
The assumptions in the example are made because we do not know on which machine the
algorithm will be implemented, so we cannot say exactly how much time each statement will
take. The exact time depends on the machine on which the algorithm is run.
In the example the approximation is done because for higher values of 'n' the effect of 'c'
(a constant) will not be significant. Thus, constants can be ignored.
Order of Magnitude of an algorithm (Cont…)
• Example 2
for ( i = 0; i < n; i++ ) {
    for ( j = 0; j < m; j++ ) {
        ....
    }
}
Assume we have ‘c’ number of statements inside the innermost loop
Following the same assumptions as the earlier example
In the above example, the inner loop will be executed m times and the outer loop n times.
Analysis based on the nature of the problem
The analysis of the algorithm can be performed based on the nature of the
problem.
Thus we have:
• Worst case analysis
• Average case analysis
• Best case analysis
Worst case:
Under what condition(s) does the algorithm, when executed, consume the maximum amount
of resources? It is the maximum amount of resource the algorithm can consume for any value
of the problem size.
Best case:
Under what condition(s) does the algorithm, when executed, consume the minimum amount
of resources?
Average case:
This is between the worst case & best case. It is probabilistic in nature. Average-case running
times are calculated by first arriving at an understanding of the average nature of the input,
and then performing a running-time analysis of the algorithm for this configuration.
Average case analysis is done by considering every possibility to be equally likely to happen.
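Linear search illustrates all three cases concisely; a sketch (comparison counts, not running times, are reported):

```python
def linear_search(a, key):
    """Return the number of comparisons made while searching for key.
    Best case: 1 (key is first). Worst case: n (key last or absent).
    Average case, key present and positions equally likely: about (n+1)/2."""
    for count, value in enumerate(a, start=1):
        if value == key:
            return count
    return len(a)

a = [5, 8, 1, 9]
print(linear_search(a, 5))  # → 1  (best case)
print(linear_search(a, 9))  # → 4  (worst case, n = 4)
```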
Why Worst case analysis?
Even though the average case tends more towards the real situation, worst case
analysis is preferred due to the following reasons:
Worst case complexity is preferred because it is easier to compute than the average case
complexity of the algorithm, and the best case is of least use. Also, it is better to find the
maximum time of execution of an algorithm, to be on the safer side.
Asymptotic notations for determination of order of
magnitude of an algorithm
The limiting behavior of the complexity of a problem as problem size increases is
called asymptotic complexity
• ‘Omega’ notation:
It represents the lower bound of the resources required to solve a problem.
It is represented by Ω
The goodness of an algorithm is usually expressed in terms of its worst case running time.
The 'worst case running time' of an algorithm is the 'upper bound' for the time of execution of
that algorithm for different problem sizes.
An algorithm is said to have a worst-case running time of O(n^2) if its running time
(execution time) is always bounded within n^2, where n is the problem size.
Asymptotic analysis: What it does?
• Asymptotic analysis is necessary but not sufficient for many kinds of problems
(figure: if a problem of size N takes 100% of some unit time, what percentage does a
problem of size 2N take?)
The large body of literature on asymptotic (priori) analysis basically answers the question:
in relative terms, how much more time does a problem of twice (say) the size take? Say, if I
can sort 1000 numbers in unit time, how much time will it take to sort 10000 numbers?
The unit time is not specified (the analysis is relative), but could be, say, 10-100
microseconds on typical modern PCs.
It does not attempt to give exact estimates of runtime. In database and similar applications,
asymptotic analysis is very useful, as it yields insight into scalability to larger database
sizes. In real-time and transaction processing systems, scalability in terms of throughput
(increased answers/second for problems of the same size) requires the mean and variance
of the execution time to be controlled instead.
Big Oh notation
T(n) = O(f(n)) if there are constants c and n0 such that T(n) <= c·f(n) when
n >= n0. In this Big-Oh notation for worst case analysis, c and n0 are positive
constants. n0 represents the threshold problem size.
(figure: T(n) is bounded within c·f(n), the upper bound of the algorithm, for problem
sizes beyond the threshold n0)
While we compute the complexity of any algorithm, we take the threshold problem size, i.e.
n > n0, where n0 is the threshold problem size and n is the problem size. Accordingly we
determine the upper bound of computation.
In the above graph, the dotted line (parallel to the y-axis) passing through the intersection of
T(n) and c·f(n) represents the threshold problem size.
The threshold problem size is taken into account in priori analysis because the algorithm
might have some assignment operations which can't be neglected for a lower problem size
(i.e. for lower values of 'n').
Example:
T(n) = (n+1)²
f(n) = n²
Let n0 = 1 (threshold value) and c = (1+1)² = 4.
Then there exist n0 and c such that T(n) <= c·f(n), so T(n) is O(n²).
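The example can be checked numerically over a range of problem sizes; a small sketch:

```python
# Check the slide's example: T(n) = (n+1)^2 is O(n^2)
# with constant c = 4 and threshold n0 = 1.
c, n0 = 4, 1
T = lambda n: (n + 1) ** 2
f = lambda n: n ** 2

# T(n) <= c*f(n) must hold for every n >= n0.
assert all(T(n) <= c * f(n) for n in range(n0, 1000))
print("T(n) <= 4*f(n) holds for all n >= 1 tested")
```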
Theta & Omega notations
Theta notation:
If it can be proved that for some constants c1 & c2, T(n) lies between c1·f(n) and c2·f(n),
then T(n) can be expressed as Θ(f(n)).
Omega notation:
The function f(n) is the lower bound for T(n). This means that for any value of n (n ≥ n0), the
time of computation of the algorithm T(n) is always above the graph of f(n). So f(n) serves
as the lower bound for T(n).
Big ‘Oh’ Vs Omega notations
Case (i) : A Project manager requires maximum of 100 software engineers to
finish the project on time.
Case (ii) : The Project manager can start the project with minimum of 50 software
engineers but cannot assure the completion of project in time.
Case (i) is similar to Big Oh notation, specifying the upper bound of resources
needed to do a task.
Case (ii) is similar to Omega notation, specifying the lower bound of resources
needed to do a task.
‘Big Oh’ manipulations
While finding the worst case complexities of algorithms using Big Oh notation,
some/all of the following rules are used.
Rule I
The leading coefficient of the highest power of 'n', all lower powers of 'n', and the
constants are ignored in f(n)
Example:
T(n) = O(100n³ + 29n² + 19n)
T(n) = O(n³)
The constants and the slower growing terms are ignored as their growth rates are
insignificant compared to the growth rate of the highest power.
Big Oh Manipulations (contd.,)
Rule II :
The time of execution of a ‘for loop’ is the ‘running time’ of all statements
inside the ‘for loop’ multiplied by number of iterations of the ‘for loop’.
Example:
for ( i = 0 to n )
{
    x ← x + 1;
    y ← y + 1;
    x ← x + y
}
T(n) = O(3 * n) = O(n)
Big Oh Manipulations (contd.,)
Rule III :
If we have a 'nested for loop' in an algorithm, the analysis of that algorithm should
start from the inner loop and move outwards towards the outer loop.
Example:
for ( j = 0 to m ) {
    for ( i = 0 to n ) {
        x ← x + 1;
        y ← y + 1;
        z ← x + y;
    }
}
The worst case running time of the inner loop is O(3*n); the inner loop runs m times,
so the nested loop as a whole is O(3*n*m) = O(nm)
Big Oh Manipulations (contd.,)
Rule IV :
The execution time of an 'if-else statement' in an algorithm comprises:
• The execution time for testing the condition
• The maximum of the execution times of the 'if' and 'else' parts (whichever is larger)
Example:
if ( x > y ) {
    print( "x is larger than y" );
    print( "x is the value to be selected" );
    z ← x;
    x ← x + 1;
}
else print( "x is smaller than y" );
The execution time of the program is the execution time of testing (x > y) +
the execution time of the 'if' block, as the execution time of the 'if' block is
more than that of the 'else' block
O(constant) = O(1).
For example, O(100) = O(1).
Case study on analysis of algorithms
The following examples will help us to understand the concept of worst case and
average case complexities
The above given code inserts a value k into position l in an array a. The basic operation here
is the copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. Step 3 does 1 copy. So
the total number of copy operations is n-1+1 = n. Hence the worst case complexity of array
insertion is O(n).
Average Case Analysis: This is derived as follows: step 2 is equally likely to perform
1, 2, …, or n copies, each with probability 1/n. Hence the average number of copies that
step 2 performs is (1/n) + (2/n) + … + (n/n) = (n+1)/2. Also, step 3 performs 1 copy. So on
an average the array insertion performs ((n+1)/2) + 1 copies. Hence the average case
complexity of array insertion is O(n).
Best case Analysis:
O(1), as only one insertion is done with no movements.
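The insertion analysis can be mirrored by counting copies. A sketch using a Python list (here m is the number of elements already stored, so front insertion costs m shifts plus one write, matching the notes' (n-1) + 1 pattern):

```python
def insert_with_count(a, l, k):
    """Insert k at index l of list a, counting element copies as in the
    case study (step 2 shifts elements right, step 3 writes k)."""
    m = len(a)                     # number of existing elements
    a.append(None)                 # grow the list by one slot
    copies = 0
    for j in range(m, l, -1):      # step 2: shift a[l..m-1] one slot right
        a[j] = a[j - 1]
        copies += 1
    a[l] = k                       # step 3: one more copy
    return copies + 1

a = [10, 20, 30]
print(insert_with_count(a, 0, 5), a)  # → 4 [5, 10, 20, 30] (worst case)
```

Inserting at the end instead (`l = m`) does zero shifts plus one write, the best case the notes describe.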
Case study (Contd…)
(figure: deleting the element at index i of an n-element array — elements (i+1) to n are
shifted into positions i to (n-1))
The above given code deletes the value k at a given index i in an array a. The basic
operation here is the copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. So the total number of
copy operations is n-1. Hence the worst case complexity is O(n).
Average Case Analysis: This is derived as follows: step 2 is equally likely to perform
0, 1, …, or n-1 copies, each with probability 1/n. Hence the average number of copies that
step 2 performs is (0 + 1 + … + (n-1))/n = (n-1)/2. So on an average the array deletion
performs (n-1)/2 copies. Hence the average case complexity of array deletion is O(n).
Best case Analysis:
O(1), as only one deletion is done with no further movements.
Summary of Unit-2
• Analyzing Algorithms
– Introduction to Space and Time complexities
– Basic Mathematical principles
– Order of magnitude
– Introduction to Asymptotic notations
• Best case
• Worst case
• Average case
Thank You!