SLIDES01 - SE15 - AOA

Analysis of Algorithms

Day 1

1
Objectives of the course

• To introduce the concept of ‘Analysis of Algorithms’


• To learn the various factors that affect the performance of an algorithm
• To introduce algorithm design techniques
• To learn Code Tuning Techniques
• To introduce Numerical Analysis (Accuracy)
• To introduce Intractable problems

ER/CORP/CRS/SE15/003
Copyright © 2004, Infosys 2
Technologies Ltd Version No: 2.0

The main concerns of a software engineer are to ensure:

(i) Correctness of the solution
(ii) Decomposition of the software application into small, clean units which can be
maintained easily
(iii) Good performance of the software application

The main objective of the course is to introduce “Analysis of Algorithms” and to compute the
performance parameters of an algorithm.
After studying this course, you will get a better understanding of the importance of
designing good algorithms and efficient programs.

References
1. Donald E. Knuth (1997), The Art of Computer Programming, Volume 1: Fundamental
Algorithms, Third Edition, Addison-Wesley
2. Cormen, Leiserson, Rivest, Stein (2001), Introduction to Algorithms, Second
Edition, MIT Press
3. Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman (1998), The Design and Analysis of
Computer Algorithms, Addison-Wesley
4. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran (1998), Fundamentals of Computer
Algorithms, Galgotia Publications, New Delhi
5. Weiss, M. A. (1993), Data Structures and Algorithm Analysis in C, Benjamin/Cummings
6. Jon Bentley (2000), Programming Pearls, Second Edition, Pearson Education
7. McConnell, S. (1993), Code Complete, Microsoft Press
8. Press et al. (2002), Numerical Recipes in C++, Cambridge University Press

Course Plan
Day 1

• Introduction to Analysis of algorithms


– What is an Algorithm?
– Properties of an Algorithm
– Life cycle of an Algorithm

• Analyzing Algorithms
– Introduction to Space and Time complexities
– Basic Mathematical principles
– Order of magnitude
– Introduction to Asymptotic notations
• Best case
• Worst case
• Average case

Course Plan (cont...)
Day 2

• Algorithm design techniques


– Brute force
– Greedy
– Divide & Conquer
– Decrease & Conquer
– Dynamic Programming

Course plan (cont…)
Day 3

• Code Tuning
• SQL Query Tuning
• Introduction to Numerical Analysis
• Intractable problems
– Deterministic vs Non-Deterministic machines
– P vs NP
– NP Complete

Analysis of Algorithms
Unit 1 - Introduction

7
Introduction to Algorithms
The etymology of the word “Algorithm” dates back to the 8th century AD.
The word “Algorithm” is derived from the name of the Persian author
“Abu Jafar Mohammad ibn Musa al Khowarizmi”.

[Figure: Muhammad al-Khowarizmi, from a 1983 USSR commemorative stamp scanned by Donald Knuth. Reference: ACM Trans. on Algorithms]


Abu Jafar Mohammad ibn Musa al Khowarizmi was a great mathematician who was born
around 780 AD in Baghdad. He worked on algebra, geometry, and astronomy. His treatise
on algebra, Hisab al-jabr w'al-muqabala, was the most famous and important of all of al-
Khwarizmi's works. It is the title of this text that gives us the word "algebra".

What is an Algorithm?
• Finite set of instructions to accomplish a task. The algorithm should be
correct
• The properties of an algorithm are as follows:

[Figure: the five properties of an algorithm, Finiteness, Definiteness, Input, Output and Effectiveness, surrounding the Algorithm]


An Algorithm is defined as “Finite set of instructions to accomplish a task”.

An Algorithm has five properties as follows:


Finiteness: An algorithm should end in a finite number of steps.
Definiteness: Every step of an algorithm should be clear and unambiguously defined.
Input: The input of an algorithm can either be given interactively by the user or generated
internally.
Output: An algorithm should have at least one output.
Effectiveness: Every step in the algorithm should be easy to understand and prove using
paper and pencil.

Algorithm

Practice:

Write an algorithm to find the GCD of two numbers.

Step 1: Get two numbers m and n
Step 2: Divide m by n
Step 3: If the remainder is 0, then return n as the GCD
        else
            m ← n, n ← remainder
            Goto Step 2

Check whether the above algorithm (Euclid's Algorithm) for finding the GCD of two given
numbers satisfies all the properties of an algorithm.


The above algorithm satisfies all the properties except definiteness: consider what
happens if m = -2 and n = 3.45.
So change Step 1 to "Get two positive non-zero integers m and n".
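The steps above translate directly into C. A minimal sketch (the function name `gcd` and the use of the `%` operator for "divide and take the remainder" are our choices, not part of the slides):

```c
/* Euclid's algorithm, following Steps 1-3 of the slide.
 * m and n are assumed to be positive non-zero integers,
 * as required by the corrected Step 1. */
int gcd(int m, int n)
{
    int remainder = m % n;      /* Step 2: divide m by n            */
    while (remainder != 0) {    /* Step 3: repeat until remainder 0 */
        m = n;
        n = remainder;
        remainder = m % n;
    }
    return n;                   /* n is the GCD                     */
}
```

For example, gcd(48, 18) walks through the pairs (48, 18), (18, 12), (12, 6) and returns 6.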

10
Algorithms span a vast space
• The definition
“Finite Set of Instructions to accomplish a task”
spans a very vast space. We will only discuss a few kinds of algorithms,
but will briefly indicate the larger picture through a simplified banking
application


The gamut of algorithms is very vast, spanning symbolic, numerical, power-efficient, fault-
tolerant algorithms, etc. We illustrate this wide variety through a simplified banking
example.

The different kinds of algorithms used in the banking application have different speeds,
memory requirements, real time response, numerical accuracy, fault tolerance, etc.

Banking Applications: Utilize Computers, Networks, and Storage

[Figure: architecture of a banking application, annotated with the kinds of algorithms used:
• Routing tables, link state information
• Telephone banking (IVR), real-time TTS
• Authentication, encryption
• Fault-tolerant data structures; failover
• Communication protocols, real time / error recovery
• High-speed rule-based system, with/without state, 10K+ transactions/second
• Financially accurate calculation (Rs 1 in Rs 1,000,000 crores, one part in 10^13)
• WAN link to mirror; keep multiple copies in DR in sync
• Huge databases: 10's of terabytes; replication
• Disk layout, data compression, database optimization, encryption
• Bandwidth conservation, MP3
• Fault tolerance (detect potential loss)]

This banking application utilizes all kinds of algorithms, from symbolic through real time through fault
tolerant. The figure also illustrates in a simplified form the architectural building blocks which comprise this
banking system, and the algorithm classes written to execute on it.

We show a Finacle installation, with terminals at a branch connected to a set of clustered web servers for
authentication. The web servers are in turn connected to a set of application servers implementing banking
rules and policies. The application servers access mirrored and/or replicated data storage. Redundancy is present
in the network also. Telephone banking using an Interactive Voice Response system is used as a backup if the
branch terminals break down.

The design of the authentication hardware and software requires fault tolerance – the users should not have to
relogin if one or more servers fail – some state should be stored in the form of cookies in non-volatile storage
somewhere. The banking calculations require very high accuracy (30+ digit accuracy). Various kinds of fault
tolerance schemes are used for storage. For example, two mirrored disks always keep identical data. A write to
one disk is not considered complete till the other is written also. The servers have to respond within seconds to
each user level request (deposit, withdrawal, etc) – the real time response of the system has to be evaluated using
queuing theory and similar techniques. For TTS, the response output speech samples have to be guaranteed to
be delivered at periodic time intervals, say every 125 microseconds.

Glossary:
DR: Disaster Recovery
TTS: Text to Speech
IVR: Interactive Voice Response
WAN: Wide Area Network

Pseudo Code
• An algorithm is independent of any language or machine whereas a program is
dependent on a language and a machine
• To fill the gap between these two, we need pseudo-code
• Pseudo-code is a way to represent the step-by-step method of finding the solution
to the given problem.

Example:
Algorithm arrayMax (A, n)
    Input: array A of n integers
    Output: maximum element of A
    currentMax ← A[0]
    for i = 1 to n - 1 do
        if A[i] > currentMax then
            currentMax ← A[i]
    return currentMax


Algorithms are developed during the design phase of software engineering. During the
design phase, we first look at the problem, try to write the "pseudo-code" and then move
towards the programming (implementation) phase.

Pseudo-code is a high-level description of the algorithm:
• It is less detailed than the program
• It will not reveal the design issues of the program
• It uses an English-like language
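The arrayMax pseudo-code above translates almost line for line into C; a minimal sketch (the parameter types and the `const` qualifier are our additions):

```c
/* Returns the maximum element of an array A of n integers.
 * n is assumed to be at least 1, so A[0] is a valid start value. */
int arrayMax(const int A[], int n)
{
    int currentMax = A[0];          /* currentMax <- A[0]     */
    for (int i = 1; i < n; i++)     /* for i = 1 to n - 1 do  */
        if (A[i] > currentMax)      /* if A[i] > currentMax   */
            currentMax = A[i];      /*     currentMax <- A[i] */
    return currentMax;
}
```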

Life Cycle of an Algorithm
• Design the Algorithm

• Write (Implementation of the Algorithm)

• Test the Algorithm

• Analyze the Algorithm


The life cycle of an algorithm consists of the four phases: Design, Write, Test and Analyze.
(i) Design:
The design techniques help in devising algorithms. Some techniques are Divide & Conquer, the Greedy
technique, Dynamic Programming, etc.
The design techniques will be dealt with in Unit 3 (Day 2).

(ii) Write (implementation): Implementing the algorithm in pseudo code which will be later represented in an
appropriate programming language.

(iii) Test: Testing the algorithm for its correctness.

(iv) Analyze: Estimating the amount of time/space (which are considered to be prime resources) required while
executing the algorithm.

Resources available in a computer

[Figure: the primary resources available in a computer, including CPU, primary memory and POWER]

The Primary Resources available in a deterministic silicon computer are:


CPU &
Primary memory

In this course we will focus on time (CPU utilization) and space (memory utilization).

When an algorithm is designed it should be analyzed for the amount of these resources it
consumes. While solving a problem, an algorithm consuming more resources than others
will not be considered in most of the cases.

Analysis of Algorithms
• An algorithm when implemented, uses the computer’s primary memory and
Central Processing Unit

• Analyzing the amount of resources needed for a particular solution of the


problem

• The Analysis is done at two stages:


– Priori Analysis:
» Analysis done before implementation
– Posteriori Analysis:
» Analysis done after implementation


In Analysis we analyze the amount of resources needed for a particular solution of the
problem.
There are two types of Analysis:
Priori Analysis:
This is the theoretical estimation of resources required. Here the efficiency of the algorithm
is checked. If possible the logic of the algorithm can be improved for efficiency.
This is done before the implementation of the algorithm on a machine and so it is done
independent of any machine/software.
Posteriori Analysis:
This Analysis is done after implementing the algorithm on a target machine. It is aimed
at determination of actual statistics about algorithm’s consumption of time and space
requirements (primary memory) in the computer when it is being executed as a program.

Eg. Algorithm to check whether a number is prime or not.

Algo1: Divide the number n by each number from 2 to (n-1) and check the remainder
Algo2: Divide the number n by each number from 2 to n/2 and check the remainder
Algo3: Divide the number n by each number from 2 to sqrt(n) and check the remainder

Before implementing the algorithm in a programming language (Priori Analysis), the best
of the three algorithms will be selected (Algo3 will suit if n is large).

After implementing the algorithm in a programming language (Posteriori Analysis), the
performance is checked with the help of a profiler.
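Algo3 can be sketched in C as follows; testing `i * i <= n` is an equivalent way to stop at sqrt(n) without a floating-point `sqrt` call (this variant, and the function name, are our choices):

```c
#include <stdbool.h>

/* Algo3: trial division from 2 up to sqrt(n).
 * Returns true if n is prime; n is assumed to be >= 2. */
bool is_prime(int n)
{
    for (int i = 2; i * i <= n; i++)   /* i runs up to sqrt(n) */
        if (n % i == 0)                /* check the remainder  */
            return false;
    return true;
}
```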

A high-level view of analysis of algorithms

[Figure: dimensions along which algorithms can be analyzed:
• Correctness
• Accuracy: accurate to within an error margin; condition number
• Resource usage: time / memory / power / communication / I-O
• Resiliency analysis: mirroring, replication; distributed system analysis
• Asymptotics: O(N^2), O(N log(N)), ...
• Beyond asymptotics: mean, variance, power analysis, physical modeling]

Algorithms can be analyzed in many dimensions: speed, accuracy, power consumption,
and resiliency.

•Numerical algorithms have to be devised for adequate accuracy. Only after you get
sufficient accuracy can we look at speed.

•Speed has many dimensions, asymptotics, mean time, variance of the execution time, etc.
Memory or in general resource usage is a dual metric

•Embedded systems have to be power efficient, e.g. cell phones.

•Many algorithms, especially in banking and finance, are required to be fault tolerant,
especially against server failures. These systems are generally required to be geographically
distributed. The resulting communication overhead can often be the dominant contribution
to time.

Efficiency Measures
• Performance of a solution

• Most of the software problems do not have a single best solution

• Then how do we judge these solutions?

• The solutions are chosen based on performance measures

• Performance Measures

• Time

• Quality

• Simplicity…


Why Performance?

Since most software problems do not have a unique solution, we are always
interested in finding the better solution. A better solution is judged based on its performance.
Some of the performance measures include the time taken by the solution, the quality of the
solution, the simplicity of the solution, etc.

For any solution to a problem we would always ask the following questions:

"Is it feasible to use this solution?" In other words, is it efficient enough to be used in
practice? The efficiency measures which we normally look for are time and space. How much
time does this solution take? How much space (memory) does this solution occupy?

Improving the performance of a solution can be done by improving the algorithm design,
database design, transaction design and by paying attention to the end-user psychology.
Also continuous improvements in hardware and communication infrastructure aid in
improving the performance of a solution.

Efficiency Measures (Contd…)
• Space Time Tradeoff

Example 1: Consider a personnel management product that an organization can purchase


and use to maintain information about its employees. If employee details were to be stored in
an array, the array would have to be declared large enough to be able to hold the maximum
number of records the system was rated to handle. This would always take up a large
amount of memory. With a linked list implementation on the other hand, there would be
better utilization of memory.

Which implementation would provide faster access to an employee with a given employee
number?

Which implementation would be easier to code?

Which implementation would be easier to test?


The above example tries to highlight the need for performance. Each of the three
questions asked is aimed at some performance measure.

The array is the better data structure for each of these questions. However, if a
different company also plans to buy this product, then the size of the array must be very high
(which could as well lead to wastage of space). In this case a linked list data structure might
be a better option.

This example also highlights a universal problem called the space-time tradeoff, which we
will be discussing shortly.

Efficiency Measures (Contd …)
Example 2: Think of a GUI drop-down list box that displays a list of employees
whose names begin with a specified sequence of characters. If the employee
database is on a different machine, then there are two options:

Option a: fire a SQL and retrieve the relevant employee names each time the list
is dropped down.

Option b: keep the complete list of employees in memory and refer to it each time
the list is dropped down.

In your opinion which is the preferred option and why?


This example again does not have a unique solution. It depends on various parameters
which include:
• The number of employees
• The transmission time from the database server to the client machine
• The volume of data transmission each time
• The frequency of such requests
• The network bandwidth

Neither of the solutions is the better one. The main point here is the tradeoff. Whenever we
need better performance in terms of time taken, we could opt for option b, which would
however lead to more memory requirements. The vice versa is also true: when we want our
solution to occupy less memory (space), we need to strike a compromise on the efficiency
in terms of time taken. This tradeoff is called the space-time tradeoff, which is a universal
principle.

Efficiency Measures (Contd …)
Example 3:
Which one of the following problems requires more space?

• Design a computer program which produces an output 1 if the word is of
  length 3n (n = 0, 1, 2, …) and 0 otherwise
  Example:
  If the input is “aabcef” the output is 1
  If the input is “aabc” then the output is 0

• Design a computer program that sorts (in ascending order) and outputs the
  result for any input sequence a1, a2, …, an of numbers, where n is any natural
  number


Consider the RAM size required by both programs.
Program 1 always requires a constant amount of memory.
Program 2 requires memory that grows with the input length n.

Summary of Unit - 1
• What is an Algorithm?

• Properties of an Algorithm

• Life Cycle of an Algorithm

• Performance Measures

Analysis of Algorithms
Unit 2 - Analyzing Algorithms

Analysis of Algorithms

• Refers to predicting the resources required by the algorithm, based on the
size of the problem

• The primary resources required are Time and Space

• Analysis based on the time taken to execute the algorithm is called the
Time complexity of the algorithm

• Analysis based on the memory required to execute the algorithm is called
the Space complexity of the algorithm


When a programmer builds an algorithm during the design phase of the software life cycle,
he/she might not be able to implement it immediately, because programming comes in a later
part of the software life cycle. But there is a need to analyze the algorithm at that stage.
This will help in forecasting how much time the algorithm takes or how much primary
memory it might occupy when it is implemented. So analysis of algorithms becomes very
important.

Complexity of an algorithm represents the amount of resources required while executing


the algorithm.
There will always be a tradeoff between the time and space complexity.
Most of the problems which require more space will take less time to execute and vice
versa.

Space Complexity
The space needed by a program has the following components:
• Instruction space
• Data space
• Environment stack space


Instruction space:
Space needed to store the object code.
Data space:
Space needed to store constants & variables.
Environment stack space:
Space needed when functions are called. If a function fnA calls another
function fnB, then the return address and all the local variables and formal parameters are
to be stored.

Time Complexity
Time complexity depends on the machine, compilers and other real time factors.

Total time = Σ ( ti * opi(n) )

where opi(n) is the number of times the operation opi occurs and ti is the time
taken to execute the operation

This Total time is a varying factor which depends on the current load of the system
and other real time factors like communication


Time complexity also depends on all the factors that the space complexity depends on.

Time complexity includes the compilation time and the execution time, but compilation is
done once whereas execution is done many times. So the compilation time is not considered
in most cases, only the execution time.

Time Complexity (Cont…)
Operation count is one way to estimate the Time Complexity.

• Example 1: Searching an array for the presence of an element

  Here the time complexity is estimated based on the number of search operations.

• Example 2: Finding the roots of a quadratic equation ax^2 + bx + c = 0

  The roots are (-b + sqrt(b^2 - 4ac)) / (2a) and (-b - sqrt(b^2 - 4ac)) / (2a).

  Here the number of operations can be reduced by computing the common
  expression sqrt(b^2 - 4ac) only once.


The success of this method (Operation count) depends on the identification of the exact
operation/s that contribute most to the time complexity.

Time Complexity (Cont…)
Step count is another way to estimate time complexity

Consider the code below (total steps for each line shown at right):

sum(array, n)                              0
{                                          0
1.1    tsum = 0;                           1
1.2    for (i = 0; i < n; i++)             2n + 2
1.2.1      tsum = tsum + array[i];         n
1.3    return tsum;                        1
}                                          0

Total number of steps: 3n + 4
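The 3n + 4 count can be checked mechanically by instrumenting the function with a counter; each `steps++` below mirrors one row of the table (the global counter is our addition, not part of the original code):

```c
long steps = 0;   /* global step counter (our instrumentation) */

int sum(const int array[], int n)
{
    int tsum = 0;               steps++;   /* 1.1: 1 step             */
    steps++;                               /* 1.2: i = 0              */
    for (int i = 0; i < n; i++) {
        steps++;                           /* 1.2: test i < n (true)  */
        tsum = tsum + array[i]; steps++;   /* 1.2.1: n steps in total */
        steps++;                           /* 1.2: i++                */
    }
    steps++;                               /* 1.2: final test i < n   */
    steps++;                               /* 1.3: return             */
    return tsum;
}
```

Running it with n = 5 leaves `steps` at 3*5 + 4 = 19, matching the table.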

Time Complexity (Cont…)
Recursive functions:

fact(n)                                    0
{                                          0
1.1    if (n <= 1)                         n
           return 1;                       1
1.2    return ( n * fact(n-1) );           2n - 2
}                                          0

Total number of steps: 3n - 1


Step 1.1 is executed n times, and its return is executed 1 time.

Step 1.2 contains one multiplication and one function call. Each is done (n-1)
times, so 2n - 2.

Time Complexity (Cont…)
Function calling:
Consider a function calling the function sum(array, n) (ref: slide 28)

Callsum(array1, array2, n)                 0
{                                          0
1.1    for (i = 0; i < n; i++)             2n + 2
1.1.1      array2[i] = sum(array1, i+1);   3i + 8 each, Σ = n(3n+13)/2
}                                          0

Total number of steps: (3n^2 + 17n + 4)/2


In step 1.1.1 the function sum(array, n) is called.

The total number of steps for that function was already calculated as 3n + 4. sum is
called with n = i + 1, so substituting n = i + 1 gives 3(i + 1) + 4 = 3i + 7.
This value is incremented by 1 for the function call itself, so it becomes 3i + 8.
This 3i + 8 varies for i = 0 to n-1, which gives (3*0 + 8) + (3*1 + 8) + … + (3*(n-1) + 8)
= 3(0 + 1 + 2 + … + (n-1)) + 8n = 3(n-1)n/2 + 8n = n(3n+13)/2
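The closed form (3n^2 + 17n + 4)/2 can be verified the same way, by instrumenting both functions with one shared counter (the counter, and n = 4 in the usage note, are our additions):

```c
long steps = 0;   /* shared step counter (our instrumentation) */

int sum(const int array[], int n)          /* 3n + 4 steps (slide 28) */
{
    int tsum = 0;               steps++;
    steps++;                               /* i = 0                   */
    for (int i = 0; i < n; i++) {
        steps++;                           /* test i < n              */
        tsum = tsum + array[i]; steps++;
        steps++;                           /* i++                     */
    }
    steps++;                               /* final test              */
    steps++;                               /* return                  */
    return tsum;
}

void Callsum(const int array1[], int array2[], int n)
{
    steps++;                               /* 1.1: i = 0              */
    for (int i = 0; i < n; i++) {
        steps++;                           /* 1.1: test i < n         */
        array2[i] = sum(array1, i + 1);
        steps++;                           /* 1.1.1: the call itself  */
        steps++;                           /* 1.1: i++                */
    }
    steps++;                               /* 1.1: final test i < n   */
}
```

For n = 4 this leaves `steps` at (3*16 + 17*4 + 4)/2 = 60, and array2 holds the prefix sums {1, 3, 6, 10} of array1 = {1, 2, 3, 4}.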

Kinds of Analysis of Algorithms

• Posteriori Analysis is aimed at determination of actual statistics about an
algorithm's consumption of time and space requirements (primary memory) in
the computer when it is being executed as a program. The Profiler tool is
mainly used in finding the performance bottlenecks of a program

• Priori Analysis is aimed at analyzing the algorithm before it is implemented on
any computer. It will give the approximate amount of resources required to
solve the problem before execution


Posteriori analysis is done after implementing the algorithm in a Programming Language


and running it in a machine.

Priori Analysis is carried out before the program is written (based on the algorithm). The
calculation of order of magnitude in the examples we have seen is the priori
analysis of the algorithm.

In the case of priori analysis, we ignore the machine and platform dependent factors, and we
analyze the algorithm before we write the program. It is always better to analyze the
algorithm at an earlier stage of the software life cycle.

Posteriori analysis
• External factors influence the execution of the algorithm
– Network delay
– Hardware failure etc.,

• The same algorithms might behave differently on different systems


• The load on the machine can vary which affects the real performance
measure of the algorithm
• Profiler tool can be used for performing Posteriori analysis

Posteriori Analysis (Cont…)
PROFILER
• What is a Profiler?
A tool to identify the performance bottlenecks of an application.

• Why Profiler?
– To find the performance bottle necks.
– Visualizing the run time of the code.
– Finding out the time consumed by the code for the given input

• Limitations of a Profiler
  – Most profilers report results in terms of specific time durations
  – These may vary depending on the load on the system

• Queries can also be profiled (provided by database vendors)


– tkprof


Build a table which lists the total number of steps that each statement contributes. Add the
contributions of all statements to obtain the step count for the entire program, so that we can
get the percentage contribution of each statement. This approach to obtaining the step count
(ref: time complexity) is called profiling. The same approach is applicable to the various
functions (subprograms) in a program.

Refer Lab guide for VC++ profiler.

Priori Analysis
Priori analysis require the knowledge of
– Mathematical equations
– Determination of the problem size
– Order of magnitude of any algorithm

Each of these are discussed in the forthcoming sections

Some Basic Mathematics
Arithmetic Progressions:

   ∑ i  (i = 1 to n)  =  1 + 2 + 3 + ... + (n-1) + n  =  n(n+1)/2

Geometric Progressions:

   ∑ x^i  (i = 0 to n)  =  (x^(n+1) - 1) / (x - 1),  if x ≠ 1

   ∑ x^i  (i = 0 to ∞)  =  1 / (1 - x),  if |x| < 1

Mathematical knowledge is essential for performing priori analysis.

Arithmetic progressions:
In this series, the difference between an element and its successor is the same as the
difference between the element and its predecessor.
So the series will be:
a, a + d, a + 2d, a + 3d, …
Sum of n terms = (n/2) * (first term + last term)
Also, the sum of n terms = (n/2) * [2 * first term + (n-1) * common difference] = (n/2) * [2a + (n-1)d]

Geometric progressions:
There is a constant ratio between an element and its successor (it is the same as the
ratio between an element and its predecessor).
So the series will be:
a, ar, ar^2, ar^3, …

The sums to n terms are shown in the slide above.

Some Basic Mathematics (Contd…)

Logarithms:

   a = b^(log_b a)

   log_b a = (log_c a)(log_b c)

   log_b a = 1 / (log_a b)

   log_b a = (log_c a) / (log_c b)

The log functions grow slowly compared to linear functions.

•log_a(x) is a constant multiple of log_b(x) for fixed a, b.

Whenever the log is specified without a base, it is log base 2.

Factorials:
The number n! is 1 * 2 * 3 * … * (n-1) * n

Some Basic Mathematics (Contd…)
A few mathematical formulae.

• 1^2 + 2^2 + … + n^2 = n(n+1)(2n+1)/6

• 1 + a + a^2 + … + a^n = (a^(n+1) - 1) / (a - 1)

• Floor function f(x), or ⌊x⌋:
  For a real number x, f(x) is the largest integer not greater than x.

• Choice function: nCr = n! / ( r! (n-r)! )

•Applying the basic concepts seen so far, the above series can be evaluated.
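A quick brute-force check of the first two formulae for small n (the function names and test values are ours):

```c
/* Left-hand sides computed term by term, to compare against
 * the closed-form right-hand sides of the slide. */
long sum_of_squares(int n)            /* 1^2 + 2^2 + ... + n^2   */
{
    long s = 0;
    for (int i = 1; i <= n; i++)
        s += (long)i * i;
    return s;
}

long geometric_sum(long a, int n)     /* 1 + a + a^2 + ... + a^n */
{
    long s = 0, term = 1;
    for (int i = 0; i <= n; i++) {
        s += term;
        term *= a;
    }
    return s;
}
```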

Growth of functions
• Algorithm complexity will be represented in terms of mathematical functions,
  e.g. n log n, n^2

• Given the complexities log(n), n, n log(n), n^2 and 2^n, which will grow
  slowest?

[Figure: growth curves of log(n), n, n log(n), n^2 and 2^n, with problem size on the x-axis and resources on the y-axis]

•In the figure in the slide, the x axis represents the problem size and the y axis represents
the resources.
•As part of Basic Mathematical Principles we will introduce applicable mathematics as
required for this course.

•Growth of functions: The figure above shows the growth of a few mathematical
functions. The x-axis varies from 0 to 50 and the y-axis varies from 0 to 100. The
point to be observed here is that the growth rate of the function log(n) is smaller when
compared to the other functions, namely n, n log(n), n^2 and 2^n. An exponential
function like 2^n will ultimately overtake any polynomial function. The need to
understand the growth of these basic functions will be well appreciated in the later
chapters wherein we analyze algorithms.
•From the graph, we can see that the logarithmic functions grow more slowly
and the exponential functions grow much faster.

What are factorial functions? What is their growth rate?

Functions which grow at the rate of n! are called factorial functions.

The growth rate of the factorial is so tremendous that it is much greater than even 2^n.

Some Basic Mathematics (Contd…)
How many times should we divide (into half) the number of elements 'n'
(discarding remainders if any) to reach 1 element?

Since n is being divided by 2 consecutively, we need to consider two cases.

Case 1: n is a power of 2:
Say for example n = 8, in which case 8 must be halved 3 times to reach 1:
8 → 4 → 2 → 1. Similarly 16 must be halved 4 times to reach 1:
16 → 8 → 4 → 2 → 1
Case 2: n is not a power of 2:
Say for example n = 9, in which case 9 must be halved 3 times to reach 1:
9 → 4 → 2 → 1. Similarly 15 must be halved 3 times to reach 1:
15 → 7 → 3 → 1. So if 2^m < n < 2^(m+1), then n must be halved m times to reach 1.
In general, n must be halved m times, where m is given by:

m = floor(log2 n)


•The above result is necessary for analyzing most algorithms.

•As a corollary to the above result, we can easily see that a
number n must be halved floor(log2 n) + 1 times to reach 0.
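The halving count can be computed directly and compared with floor(log2 n) (the function name is ours):

```c
/* Counts how many times n must be halved (discarding remainders,
 * via integer division) before reaching 1. The result equals
 * floor(log2(n)) for any n >= 1. */
int halvings_to_one(int n)
{
    int m = 0;
    while (n > 1) {
        n = n / 2;     /* integer division discards the remainder */
        m++;
    }
    return m;
}
```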

A high-level view of analysis of algorithms

[Figure: dimensions along which algorithms can be analyzed:
• Correctness
• Accuracy: accurate to within an error margin; condition number
• Resource usage: time / memory / power / communication / I-O
• Resiliency analysis: mirroring, replication; distributed system analysis
• Asymptotics: O(N^2), O(N log(N)), ...
• Beyond asymptotics: mean, variance, power analysis, physical modeling]

Given the wide variety of algorithms, they can be analyzed in many dimensions, speed,
accuracy, power consumption, and resiliency.

•Numerical algorithms have to be devised for adequate accuracy. Only after you get
sufficient accuracy can we look at speed.

•Speed has many dimensions, asymptotics, mean time, variance of the execution time, etc.
Instead of time, we can look at memory or in general resource usage also.

•Embedded systems have to be power efficient, e.g. cell phones.

•Many algorithms, especially in banking and finance, are required to be fault tolerant,
especially against server failures. These systems are generally required to be geographically
distributed. The resulting communication overhead can often be the dominant contribution
to time.

•In this module, we shall primarily focus on ASYMPTOTICS

Problem size
The problem size depends on the nature of the problem for which we are
developing the algorithm.
The complexity of an algorithm is expressed as a function of the problem size.

Examples:
• If we are searching for an element in an array having ‘n’ elements, the problem
size is the size of the array ( = ‘n’).
• If we are merging 2 arrays of size ‘n’ and ‘m’, the problem size of the algorithm
is the sum of the two array sizes ( = ‘n + m’).
• If we are computing n factorial, the problem size is ‘n’.


The space required for storing n elements is n.

The space required for representing a number n in binary is floor(log2(n)) + 1 bits.

Order of Magnitude of an algorithm
Calculate the running time and consider only the leading term of the formula, which
gives the order of magnitude.
• Example 1
for( i = 0; i < n; i++ )
{
...
...
}
Assume there are ‘c’ statements inside the loop
Each statement takes 1 unit of time

Execution time for 1 iteration = c * 1 = c

Total execution time = n * c

Since ‘c’ is a constant, it is insignificant. So the order is ‘n’


In calculating the order of magnitude, the lower order terms are left out as they are
relatively insignificant.

The assumptions in the example are made because we will not know on which machine the
algorithm is to be implemented. So we can’t exactly say how much time each statement will
take. The exact time depends on the machine on which the algorithm is run.
In the example the approximation is done because for higher values of ‘n’, the effect of ‘c’
(constant) will not be significant. Thus, constants can be ignored.

Order of Magnitude of an algorithm (Cont…)
• Example 2
for( i=0;i<n; i ++) {
for(j=0;j<m;j++) {
…. ….
}
}
Assume we have ‘c’ number of statements inside the innermost loop
Following the same assumptions as the earlier example

Execution time for one iteration of the inner loop = c * 1

Execution time for the inner loop = m * c

Total execution time = n * (m * c)

Since c is a constant, the order is n * m


In the above example, the inner loop will be executed m times and the outer loop n times.

Analysis based on the nature of the problem
The analysis of the algorithm can be performed based on the nature of the
problem.

Thus we have:
• Worst case analysis
• Average case analysis
• Best case analysis


Worst case:
Under what conditions does the algorithm, when executed, consume the maximum amount of
resources? It is the maximum amount of resource the algorithm can consume for any given
problem size.

Best case:
Under what conditions does the algorithm, when executed, consume the minimum amount of
resources?

Average case:
This lies between the worst case and the best case. It is probabilistic in nature. Average-case
running times are calculated by first arriving at an understanding of the average nature of the
input, and then performing a running-time analysis of the algorithm for this configuration.
Average case analysis is usually done by considering every possible input to be equally likely.

Why Worst case analysis?
Even though the average case tends to reflect the real situation more closely, worst case
analysis is preferred for the following reasons:

• It is better to bound one’s pessimism – the time of execution can’t go beyond
T(n), as it is the upper bound

• Generally it is easier to compute the worst case than the best case or
average case of an algorithm


During a priori analysis, worst case complexity is preferred. Why?

The goodness of an algorithm is most often expressed in terms of its worst-case running
time. There are two reasons for this: the need for a bound on one’s pessimism, and the
ease of calculating (in most cases) worst-case times as compared to average-case
times.

We prefer worst case complexity because it is easier to compute than the average case
complexity, and because the best case is rarely useful. It is also safer to know the
maximum time of execution of an algorithm.

Asymptotic notations for determination of order of
magnitude of an algorithm
The limiting behavior of the complexity of a problem as problem size increases is
called asymptotic complexity

The most common asymptotic notations are:

• ‘Big Oh’ ( ‘O’ ) notation:
It represents the upper bound of the resources required to solve a problem.
It is represented by ‘O’

• ‘Omega’ notation:
It represents the lower bound of the resources required to solve a problem.
It is represented by Ω


The goodness of an algorithm is usually expressed in terms of its worst case running time.
The ‘worst case running time’ of an algorithm is the ‘upper bound’ on the time of execution
of that algorithm for different problem sizes.
An algorithm is said to have a worst-case running time of O(n^2) if its running time
(execution time) is always bounded by a constant multiple of n^2, where n is the problem size.

Goodness of an algorithm refers to its efficiency or capability.

Upper bound is also called the upper limit or the range of maximum values. E.g., when we
consider the marks of a student out of 100, 100 is the upper bound. A student can’t get marks
greater than 100.

Asymptotic analysis: What it does?
• Asymptotic analysis is necessary but not sufficient for many kinds of problems

[Figure: if a problem of size N takes 100% of some unit time, what percentage does a
problem of size 2N take?]


The large body of literature on asymptotic (a priori) analysis basically answers the question:
in relative terms, how much more time does a problem of, say, twice the size take? If I
can sort 1000 numbers in unit time, how much time will it take to sort 10000 numbers?
The unit time is not specified (the analysis is relative), but could be, say, 10-100 microseconds
on a typical modern PC.
It does not attempt to give exact estimates of runtime. In database and similar applications,
asymptotic analysis is very useful, as it yields insight into scalability to larger database
sizes. In real-time and transaction-processing systems, scalability in terms of throughput
(more answers/second for problems of the same size) requires the mean and variance
of the execution time to be controlled instead.

A large portion of this course will deal with asymptotic analysis.

Big Oh notation
T(n) = O(f(n)) if there are constants c and n0 such that T(n) <= cf(n) when
n >= n0. In this Big-Oh notation for worst case analysis, c and n0 are positive
integers. n0 represents the threshold problem size.

[Graph: T(n) and c·f(n) plotted against problem size. Beyond the threshold problem
size n0, T(n) stays below c·f(n), the upper bound of the algorithm.]


While we compute the complexity of any algorithm, we take the problem size beyond the
threshold, i.e. n > n0, where n0 is the threshold problem size and n is the problem size.
Accordingly we determine the upper bound of computation.
In the above graph, the dotted line (parallel to the y-axis) passing through the intersection of
T(n) and c·f(n) represents the threshold problem size.
The threshold problem size is taken into account in a priori analysis because the algorithm
might have some assignment operations which can’t be neglected for a lower problem size
(i.e. for lower values of ‘n’).

Example:
T(n) = (n+1)^2, which is O(n^2):
f(n) = n^2
Let n0 = 1 (threshold value)
c = (1+1)^2 = 4
So there exist n0 and c such that T(n) <= c·f(n).
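The constants in the example can also be verified numerically; this Python check (illustrative, not part of the slides) confirms T(n) ≤ c·f(n) for the chosen c = 4 and n0 = 1:

```python
def T(n):
    return (n + 1) ** 2   # the running time from the example

def f(n):
    return n ** 2         # the bounding function

c, n0 = 4, 1
# (n+1)^2 <= 4n^2 holds for every n >= n0, so T(n) = O(n^2)
assert all(T(n) <= c * f(n) for n in range(n0, 10_000))
```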

Theta & Omega notations

Theta notation (Θ):
T( n ) = Θ( f( n )) if there are positive constants c1, c2 and n0 such that
c2.f(n) ≤ T( n ) ≤ c1.f(n), for all n ≥ n0.

Omega Notation (Ω):
T( n ) = Ω( f( n )) if there are positive constants c and n0 such that
T( n ) ≥ c.f( n ) for all n ≥ n0.


Theta notation:
If it can be proved that for some constants c1 & c2, T(n) lies between c2.f(n) and c1.f(n),
then T(n) can be expressed as Θ( f( n )).

Omega notation:
The function f(n) is a lower bound for T(n). This means that for every value of n (n ≥ n0),
the time of computation of the algorithm, T(n), is always above the graph of c.f(n). So f(n)
serves as a lower bound for T(n).

Big ‘Oh’ Vs Omega notations
Case (i) : A Project manager requires maximum of 100 software engineers to
finish the project on time.
Case (ii) : The Project manager can start the project with a minimum of 50 software
engineers but cannot assure completion of the project on time.

Case (i) is similar to Big Oh notation, specifying the upper bound of resources
needed to do a task.
Case (ii) is similar to Omega notation, specifying the lower bound of resources
needed to do a task.


Which case is preferred?

Case (i) is preferred in most of the situations.

‘Big Oh’ manipulations

While finding the worst case complexities of algorithms using Big Oh notation,
some/all of the following rules are used.

Rule I
The leading coefficient of the highest power of ‘n’, all lower powers of ‘n’, and the
constants are ignored in f(n)

Example:
T(n) = O(100n^3 + 29n^2 + 19n)

Representing the same in big Oh notation:

T(n) = O(n^3)


The constants and the slower growing terms are ignored as their growth rates are
insignificant compared to the growth rate of the highest power.
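A small sketch (Python, illustrative only) showing why the lower-order terms in Rule I's example are safe to drop — their share of T(n) shrinks as n grows:

```python
def T(n):
    # T(n) = 100n^3 + 29n^2 + 19n, from Rule I's example
    return 100 * n**3 + 29 * n**2 + 19 * n

# Fraction of T(n) contributed by everything except the leading 100n^3 term
fractions = [(T(n) - 100 * n**3) / T(n) for n in (10, 100, 1000)]
assert fractions[0] > fractions[1] > fractions[2]  # shrinks towards 0
```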

Big Oh Manipulations (contd.,)
Rule II :
The time of execution of a ‘for loop’ is the ‘running time’ of all statements
inside the ‘for loop’ multiplied by number of iterations of the ‘for loop’.

Example:
for( i = 0 to n )
{
x ← x + 1;
y ← y + 1;
x ← x + y
}

The for loop is executed n times.

So, the worst case running time of the algorithm is

T ( n ) = O( 3 * n ) = O ( n )


Big Oh Manipulations (contd.,)
Rule III :
If we have a ‘nested for loop’, in an algorithm, the analysis of that algorithm should
start from the inner loop and move it outwards towards outer loop.
Example:
for( j = 0 to m ) {
for( i = 0 to n ) {
x ← x + 1;
y ← y + 1;
z ← x + y;
}
}
The worst case running time of the inner loop is O( 3*n )

The worst case running time of the outer loop is O( m*3*n )

The total running time = O ( m * n )
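Rules II and III can be sanity-checked by simply counting statement executions; a small Python sketch (not from the slides):

```python
def nested_loop_ops(n, m):
    """Count statement executions in the nested loop of Rule III."""
    ops = 0
    for j in range(m):        # outer loop: m iterations
        for i in range(n):    # inner loop: n iterations per outer iteration
            ops += 3          # the three assignments in the body
    return ops

# Total work is 3*m*n statements; the constant 3 is dropped, giving O(m*n)
assert nested_loop_ops(10, 5) == 3 * 5 * 10
```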


Big Oh Manipulations (contd.,)
Rule IV :
The execution time of an ‘if-else statement’ in an algorithm comprises:
• Execution time for testing the condition
• The maximum of the execution times of the ‘if’ and ‘else’ branches (whichever is larger)
Example:
if( x > y ) {
print( “x is larger than y” );
print( “x is the value to be selected” );
z ← x;
x ← x + 1;
}
else print( “x is smaller than y” );
The execution time of the program is the execution time of testing (x > y) +
the execution time of the ‘if’ branch, as the execution time of the ‘if’ branch is
more than that of the ‘else’ branch

O(constant) = O(1).
For example, O(100) = O(1).

Case study on analysis of algorithms

The following examples will help us to understand the concept of worst case and
average case complexities

Example – 1: Consider the following pseudocode.


To insert a given value, k at a particular index, l in an array, a[1…n]:
1. Begin
2. Copy a[l…n] to a[l+1…n+1] (Assuming space is available)
3. Copy k to a[l]
4. End

BEST CASE: O (1)

WORST CASE: O (n)

AVERAGE CASE: O (n)


The above code inserts a value k at position l in an array a. The basic operation here
is the copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. Step 3 does 1 copy. So
the total number of copy operations is n-1+1 = n. Hence the worst case complexity of array
insertion is O(n).
Average Case Analysis: Each insertion position is equally likely, so the probability that
step 2 performs k copies is 1/n, for k = 1, 2, …, n. Hence the average number of copies
that step 2 performs is (1 + 2 + … + n)/n = (n+1)/2. Also, step 3 performs 1 copy. So on
average the array insertion performs ((n+1)/2) + 1 copies. Hence the average case
complexity of array insertion is O(n).
Best case Analysis:
O(1), as only one copy is done with no movements.
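A runnable sketch of the insertion (Python, not from the slides; the exact copy counts may differ by ±1 from the slide's tally depending on which positions are allowed, but the growth is linear either way):

```python
def insert_count(a, l, k):
    """Insert k at 1-based index l of list a; return (new list, copies made).
    Step 2 shifts a[l..n] right by one (n-l+1 copies); step 3 places k (1 copy)."""
    n = len(a)
    copies = (n - l + 1) + 1
    return a[:l - 1] + [k] + a[l - 1:], copies

a = [10, 20, 30, 40]
_, worst = insert_count(a, 1, 5)            # inserting at the front
_, best = insert_count(a, len(a) + 1, 5)    # inserting just past the end
assert worst == len(a) + 1                  # O(n) in the worst case
assert best == 1                            # O(1) in the best case
```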

Case study (Contd…)

Example – 2: Consider the following pseudocode.

To delete the value, k, at a given index, i, in an array, a[1…n]:
1. Begin
2. Copy a[i+1…n] to a[i…n-1]
3. Clear a[n]
4. End


The above code deletes the value k at a given index i in an array a. The basic
operation here is the copy.
Worst Case Analysis: Step 2 does n-1 copies in the worst case. So the total number of
copy operations is n-1. Hence the worst case complexity is O(n).
Average Case Analysis: Each index is equally likely, so the probability that step 2
performs k copies is 1/n, for k = 0, 1, …, n-1. Hence the average number of copies that
step 2 performs is (0 + 1 + … + (n-1))/n = (n-1)/2. So on average the array deletion
performs (n-1)/2 copies. Hence the average case complexity of array deletion is O(n).
Best case Analysis:
O(1), as only the last element is cleared with no movements.
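The average-case figure (n-1)/2 for deletion can be checked by enumerating all n positions; a Python sketch (not from the slides):

```python
def delete_copies(n, i):
    """Copies made when deleting the element at 1-based index i from an
    array of size n: a[i+1..n] is shifted left by one, i.e. n-i copies."""
    return n - i

n = 100
counts = [delete_copies(n, i) for i in range(1, n + 1)]
# Every index equally likely: average = (0 + 1 + ... + (n-1)) / n = (n-1)/2
assert sum(counts) / n == (n - 1) / 2
assert max(counts) == n - 1   # worst case, matching the slide
```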

Summary of Unit-2

• Analyzing Algorithms
– Introduction to Space and Time complexities
– Basic Mathematical principles
– Order of magnitude
– Introduction to Asymptotic notations
• Best case
• Worst case
• Average case


Thank You!


