Design and Analysis of Algorithms - Unit I

Algorithm Definitions:
A mathematical relation between an observed quantity and a variable used in a step-by-step
mathematical process to calculate a quantity.

An algorithm is any well-defined computational procedure that takes some value, or set of
values, as input and produces some value, or set of values, as output.

A procedure for solving a mathematical problem in a finite number of steps that frequently
involves repetition of an operation; broadly, a step-by-step procedure for solving a problem or
accomplishing some end (Webster’s Dictionary).
An algorithm is a set of rules for carrying out calculations either by hand or on a machine.
An algorithm is a finite step-by-step procedure to achieve a required result.
An algorithm is a sequence of computational steps that transform the input into the output.
An algorithm is a sequence of operations performed on data that have to be organized in data
structures.
An algorithm is an abstraction of a program to be executed on a physical machine (model of
Computation).
What is an algorithm?
An algorithm is any well-defined computational procedure that takes some values as
input and produces some values as output. Like a cooking recipe, an algorithm provides a
step-by-step method for solving a computational problem. Unlike programs, algorithms are not
dependent on a particular programming language, machine, system, or compiler. They are
mathematical entities, which can be thought of as running on some sort of idealized computer
with an infinite random access memory and an unlimited word size. Algorithm design is all
about the mathematical theory behind the design of good programs.

Why study algorithm design?


Programming is a very complex task, and there are a number of aspects of programming
that make it so complex. The first is that most programming projects are very large, requiring the
coordinated efforts of many people. (This is the topic of a course like software engineering.) The
next is that many programming projects involve storing and accessing large quantities of data
efficiently. (This is the topic of courses on data structures and databases.)
The last is that many programming projects involve solving complex computational
problems, for which simplistic or naive solutions may not be efficient enough. These complex
problems may involve numerical data (the subject of courses on numerical analysis), but often
they involve discrete data. This is where the topic of algorithm design and analysis is important.
Although the algorithms discussed in this course will often represent only a tiny fraction
of the code that is generated in a large software system, this small fraction may be very important
for the success of the overall project.
An unfortunately common approach to this problem is to first design an inefficient
algorithm and data structure to solve the problem, and then take this poor design and attempt to
fine-tune its performance.
The problem is that if the underlying design is bad, then often no amount of fine-tuning is
going to make a substantial difference.
The focus of this course is on how to design good algorithms and how to analyze their
efficiency. This is among the most basic aspects of good programming.
Algorithm: Finite set of instructions that, if followed, accomplishes a particular task.
It can be described in natural language, pseudo-code, diagrams, etc.

Criteria to follow:

Input: Zero or more quantities (externally produced)


Output: One or more quantities
Definiteness: Clarity, precision of each instruction
Finiteness: The algorithm has to stop after a finite (possibly very large) number of steps
Effectiveness: Each instruction has to be basic enough and feasible

PSEUDOCODE CONVENTIONS

1. Comments begin with // and continue until the end of the line.
2. Blocks are indicated with matching braces {and}.
3. An identifier begins with a letter. Simple data types are assumed: integer, float, char,
boolean, and so on. Compound data types can be formed with records:
node = record
{
    datatype_1 data_1;

    datatype_n data_n;
    node *link;
}
In this example, link is a pointer to the record type node.
4. Assignment of values to variables is done using the assignment statement
<variable> := <expression>;
5. There are two Boolean values true and false. In order to produce these values, the logical
operators and, or, and not, and the relational operators <, ≤, ≠, =, ≥, > are provided.
6. Elements of multidimensional arrays are accessed using [and]. For example, if A is a
two-dimensional array, the (i,j)th element of the array is denoted A[i,j].

7. The following looping statements are provided: for, while, and repeat-until.
The while loop takes the following form:
while <condition> do
{
<statement 1>

<statement n>
}
As long as <condition> is true, the statements get executed. When <condition> becomes false,
the loop is exited.
The general form for a for loop is:
for variable := value1 to value2 step step do
{
<statement 1>

<statement n>
}
Here, value1, value2 and step are arithmetic expressions.
The for loop can be implemented as a while loop.
A repeat-until statement is constructed as follows:
repeat
{
<statement 1>

<statement n>
}
until <condition>
The statements are executed for as long as condition is false.

The instruction break; can be used within any looping statement to force an exit. In the case of
nested loops, break; results in the exit of the innermost loop containing it. A return statement
within any of the above loops also exits the loop; in addition, a return statement results in
the exit of the function itself.

8. A conditional statement has the following forms:


if <condition> then <statement>
if <condition> then <statement 1> else <statement 2>
Here, <condition> is a Boolean statement, and <statement>, <statement 1> and <statement 2>
are arbitrary statements.
case
{
:<condition 1>: <statement 1>

:<condition n>: <statement n>
:else: <statement n+1>
}
9. Input and output are done using the instructions read and write.

10. There is only one type of procedure: Algorithm.


An algorithm consists of a heading and a body. The heading takes the form:

Algorithm Name(<parameter list>)


where Name is the name of the procedure and <parameter list> is a listing of the
procedure parameters. The body has one or more statements enclosed within { and }.
EXAMPLE: SELECTION SORT
Goal (specification of the outputs):
We must devise an algorithm that sorts a collection of n≥1 elements of arbitrary type.
Solution: From those elements that are currently unsorted, find the smallest and place it next in
the sorted list.
Although the statement adequately describes the sorting problem, it is not an algorithm
because it leaves several questions unanswered. For example, it does not tell us where and how
the elements are initially stored or where we should place the result. We assume that the elements
are stored in an array a, such that the ith element is stored in the ith position a[i], 1 ≤ i ≤ n.
This is our first attempt:
1. for i := 1 to n do
2. {
3.     Examine a[i] to a[n] and suppose
4.     the smallest element is a[j];
5.     Interchange a[i] with a[j];
6. }

To turn the strategy into a pseudocode program, two subtasks remain: finding the smallest
element, and interchanging it with a[i].
The first task is solved by assuming that a[i] is the minimum, then comparing a[i] with
a[i+1], a[i+2], ..., and whenever a smaller element is found, regarding it as the new minimum.
Eventually a[n] is compared with the current minimum and we are done.

Defining “interchange”:
t := a[i]; a[i] := a[j]; a[j] := t;
1.  Algorithm SelectionSort(a, n)
2.  // Sort the array a[1:n] into nondecreasing order.
3.  {
4.      for i := 1 to n do
5.      {
6.          j := i;
7.          for k := i + 1 to n do
8.              if (a[k] < a[j]) then j := k;
9.          t := a[i]; a[i] := a[j]; a[j] := t;
10.     }
11. }

EXAMPLE: SELECTION SORT IN C++


template <class Type>
void SelectionSort(Type a[], int n)
// Sort the array a[1:n] into nondecreasing order.
{
    for (int i = 1; i <= n; i++) {
        int j = i;
        for (int k = i + 1; k <= n; k++)
            if (a[k] < a[j]) j = k;
        Type t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
DOES THE ALGORITHM WORK CORRECTLY?
1 How to state correctness?
2 What do we want to prove?
3 How?

PROVING CORRECTNESS
Theorem. Algorithm SelectionSort(a,n) correctly sorts a set of n ≥ 1 elements; the result remains
in a[1:n] such that a[1] ≤ a[2] ≤... ≤ a[n].

Proof (from the textbook):

We first note that for any i, say i = q, following the execution of lines 6 to 9 of the pseudocode, it
is the case that a[q] ≤ a[r] for q < r ≤ n. Also, once i becomes greater than q, a[1:q] is unchanged.
Hence, following the last execution of these lines (i = n), we have a[1] ≤ a[2] ≤ ... ≤ a[n].

RECURRENCE RELATIONS

A recurrence is an equation or inequality that describes a function in terms of its value on
smaller inputs. Special techniques are required to analyze the time and space requirements of
algorithms described by recurrences.
What is recursion?
Recursion is the concept of well-defined self-reference: something is defined, or computed, in
terms of smaller instances of itself.

Self-referential definitions can be dangerous if we’re not careful to avoid circularity. The
definition “A rose is a rose” just doesn’t cut it. This is why the definition of recursion above
includes the word well-defined.

Algorithm Fact(n)
// n is a positive integer
{
    if (n = 1) then
        return 1;
    else
        return n * Fact(n - 1);
}
// Example trace: Fact(5) = 5 * 4 * 3 * 2 * 1 = 120

● Very handy for defining functions and data types simply:
● Consider the nth Fibonacci number, Fn:
    ▪ Fn = 1, if n = 1 or n = 2
    ▪ Fn = Fn-1 + Fn-2, for all n > 2
    (a small C++ sketch of this function is given after this list)
● Very handy when a large problem can be broken into similar (but smaller) problems
    o We’ll look at the Towers of Hanoi in a moment
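The recursive Fibonacci definition above translates directly into code. A minimal C++ sketch
(the name fib and the small driver are illustrative, not taken from the notes):

#include <iostream>

// Recursive Fibonacci, following the definition above:
// F(1) = F(2) = 1, and F(n) = F(n-1) + F(n-2) for n > 2.
long long fib(int n)
{
    if (n == 1 || n == 2)              // base cases
        return 1;
    return fib(n - 1) + fib(n - 2);    // well-defined self-reference on smaller inputs
}

int main()
{
    std::cout << fib(10) << std::endl; // prints 55
    return 0;
}

Note that this direct version recomputes the same subproblems many times, so its running time
grows exponentially in n; it is meant only to illustrate the recursive definition.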

About the Towers of Hanoi

• There are three towers


• 64 gold disks, with decreasing sizes, placed on the first tower
• You need to move the stack of disks from one tower to another, one disk at a time
• Larger disks cannot be placed on top of smaller disks
• The third tower can be used to temporarily hold disks

(Figure: Steps 1-8 showing the sequence of positions that solves the three-disk puzzle.)
The Tower of Hanoi puzzle involves moving a pile of different-sized disks from one peg to
another using an intermediate peg. Only one disk at a time can be moved, a disk can only be
moved if it is the top disk on a pile, and a larger disk can never be placed on a smaller one.
The figure shows the initial and goal states of a three-disk problem.
The most straightforward axiomatization of this problem consists of an operator for
moving each disk between each pair of pegs. For the three-disk problem, this axiomatization
requires 18 operators. Table 1 shows the operator for moving disk C, the largest disk, from peg 1
to peg 3. The preconditions require that disk C is initially on peg 1 and that neither disk A nor B
is on peg 1 or 3. This representation is far from the most concise one, but it is used here to
simplify the exposition; the basic ideas apply to more compact representations as well.

Recursive Algorithm

Algorithm Hanoi(n, a, b, c)
// Move n disks from peg a to peg b, using peg c as the intermediate peg.
{
    if (n = 1) then       // base case
        Move(a, b);
    else
    {                     // recursion
        Hanoi(n-1, a, c, b);
        Move(a, b);
        Hanoi(n-1, c, b, a);
    }
}
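A direct C++ rendering of the recursive algorithm above is sketched below; the Move step is
simply printed, and the peg labels are illustrative. For n disks it performs 2^n - 1 moves.

#include <iostream>

// Move n disks from peg 'from' to peg 'to', using peg 'via' as the intermediate peg.
void hanoi(int n, char from, char to, char via)
{
    if (n == 1) {                                  // base case: a single disk
        std::cout << "Move disk from " << from << " to " << to << "\n";
        return;
    }
    hanoi(n - 1, from, via, to);                   // move the top n-1 disks out of the way
    std::cout << "Move disk from " << from << " to " << to << "\n";  // move the largest disk
    hanoi(n - 1, via, to, from);                   // move the n-1 disks back on top of it
}

int main()
{
    hanoi(3, '1', '3', '2');                       // the three-disk problem: 7 moves
    return 0;
}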

ASYMPTOTICS
Used to formalize that an algorithm has running time or storage requirements that are “never
more than”, “always greater than”, or “exactly” some amount
• Asymptotic Upper Bound
• For a given function g(n), we denote O(g(n)) as the set of functions:
O(g(n)) = { f(n) : there exist positive constants c and n0 such that
            0 ≤ f(n) ≤ c g(n) for all n ≥ n0 }

Asymptotic Algorithm Analysis


🞂 The asymptotic analysis of an algorithm determines the running time in big-Oh notation
🞂 To perform the asymptotic analysis
◦ We find the worst-case number of primitive operations executed as a function of
the input size
◦ We express this function with big-Oh notation
🞂 Example:
◦ We determine that algorithm arrayMax (a C++ sketch is given below) executes at
most 6n − 1 primitive operations
◦ We say that algorithm arrayMax “runs in O(n) time”
Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard
them when counting primitive operations.
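The arrayMax algorithm referred to above is not listed in these notes; the sketch below is one
plausible C++ version (the exact operation count 6n − 1 was derived for the original version,
not necessarily for this one), but any single-pass maximum scan has the same O(n) behaviour.

#include <iostream>

// Return the largest element of a[0:n-1].
// The loop body runs n - 1 times, so the number of primitive operations
// (comparisons, assignments, increments) grows linearly in n, i.e. O(n).
int arrayMax(const int a[], int n)
{
    int currentMax = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > currentMax)
            currentMax = a[i];
    return currentMax;
}

int main()
{
    int a[] = {3, 9, 2, 7};
    std::cout << arrayMax(a, 4) << std::endl;   // prints 9
    return 0;
}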

Big-Oh Notation

🞂 To simplify the running time estimation, for a function f(n), we ignore the constants
and lower order terms.
Example: 10n^3 + 4n^2 - 4n + 5 is O(n^3).

🞂 Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants
c and n0 such that
f(n) ≤ cg(n) for n ≥ n0

🞂 Example: 2n + 10 is O(n)
◦ 2n + 10 ≤ cn
◦ (c − 2) n ≥ 10
◦ n ≥ 10/(c − 2)
◦ Pick c = 3 and n0 = 10

Big-Oh Rules

🞂 If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,


1. Drop lower-order terms
2. Drop constant factors
🞂 Use the smallest possible class of functions
1. Say “2n is O(n)” instead of “2n is O(n^2)”
🞂 Use the simplest expression of the class
1. Say “3n + 5 is O(n)” instead of “3n + 5 is O(3n)”
Ο-Notation (Upper Bound)
This notation gives an upper bound for a function to within a constant factor. We write
f(n) = O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of
f(n) always lies on or below c g(n). In the set notation, we write as follows: For a given
function g(n), the set of functions
Ο(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n
≥ n0}
We say that the function g(n) is an asymptotic upper bound for the function f(n). We use
Ο-notation to give an upper bound on a function, to within a constant factor. Graphically, for
all values of n to the right of n0, the value of the function f(n) is on or below c g(n). We write
f(n) = O(g(n)) to indicate that a function f(n) is a member of the set Ο(g(n)), i.e.,
f(n) ∈ Ο(g(n))
Note that f(n) = Θ(g(n)) implies f(n) = Ο(g(n)), since
Θ-notation is a stronger notation than Ο-notation.
Example: 2n^2 = Ο(n^3), with c = 1 and n0 = 2.
Equivalently, we may also define f is of order g as follows:
If f(n) and g(n) are functions defined on the positive integers, then f(n) is Ο(g(n)) if and
only if there are constants c > 0 and n0 > 0 such that
| f(n) | ≤ c | g(n) | for all n ≥ n0

Growth Rate of Running Time


🞂 Consider a program with time complexity O(n). For the input of size n, it takes 5 seconds.
If the input size is doubled (2n), then it takes 10 seconds.
🞂 Consider a program with time complexity O(n2). For the input of size n, it takes 5
seconds.
If the input size is doubled (2n), then it takes 20 seconds.
🞂 Consider a program with time complexity O(n3). For the input of size n, it takes 5
seconds.
If the input size is doubled (2n), then it takes 40 seconds.

Growth Rate of Running Time


🞂 Changing the hardware/ software environment
◦ Affects T(n) by a constant factor, but
◦ Does not alter the growth rate of T(n)
The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax
The Growth Rate of Six Popular Functions

    n   log n       n   n log n        n^2            n^3              2^n
    4       2       4         8         16             64               16
    8       3       8        24         64            512              256
   16       4      16        64        256          4,096           65,536
   32       5      32       160      1,024         32,768    4,294,967,296
   64       6      64       384      4,096        262,144     1.84 * 10^19
  128       7     128       896     16,384      2,097,152     3.40 * 10^38
  256       8     256     2,048     65,536     16,777,216     1.15 * 10^77
  512       9     512     4,608    262,144    134,217,728    1.34 * 10^154
 1024      10   1,024    10,240  1,048,576  1,073,741,824    1.79 * 10^308

OTHER ASYMPTOTICS NOTATIONS


Θ-notation
• Asymptotic tight bound
• Θ (g(n)) represents a set of functions such that:
Θ(g(n)) = { f(n) : there exist positive constants c1, c2, and n0 such that
            0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0 }
Ω-notation
• Asymptotic lower bound
• Ω (g(n)) represents a set of functions such that:
Ω(g(n)) = { f(n) : there exist positive constants c and n0 such that
            0 ≤ c g(n) ≤ f(n) for all n ≥ n0 }
Asymptotic performance:
How does algorithm behave as the problem size gets very large?
o Running time
o Memory/storage requirements
An Example: Insertion Sort
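The insertion-sort code itself does not appear in these notes, but the analysis below refers to
per-line costs c1, ..., c7 and to t_i, the number of times the inner while-loop test runs for a given
i. The following C++ sketch is a standard version supplied here for reference; the assignment of
cost constants to lines is an assumption chosen to match the analysis, and it follows the 1-based
a[1:n] convention used earlier.

#include <iostream>

// Sort a[1:n] into nondecreasing order (1-based, as in the earlier examples).
void insertionSort(int a[], int n)
{
    for (int i = 2; i <= n; i++) {       // cost c1: this loop test runs n times in total
        int key = a[i];                  // cost c2: runs n-1 times
        int j = i - 1;                   // cost c3: runs n-1 times
        while (j >= 1 && a[j] > key) {   // cost c4: this test runs t_i times for a given i
            a[j + 1] = a[j];             // cost c5: runs t_i - 1 times for a given i
            j = j - 1;                   // cost c6: runs t_i - 1 times for a given i
        }
        a[j + 1] = key;                  // cost c7: runs n-1 times
    }
}

int main()
{
    int a[] = {0, 5, 2, 4, 6, 1};        // a[0] is unused; sort a[1:5]
    insertionSort(a, 5);
    for (int i = 1; i <= 5; i++) std::cout << a[i] << " ";   // prints 1 2 4 5 6
    std::cout << std::endl;
    return 0;
}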

Analyzing Insertion Sort


• T(n) = c1 n + c2(n-1) + c3(n-1) + c4 T + c5(T - (n-1)) + c6(T - (n-1)) + c7(n-1), which has the
form c8 T + c9 n + c10, where T denotes the sum of the t_i over i = 2, ..., n
• What can T be?
▪ Best case -- inner loop body never executed
⮚ t_i = 1 🡺 T(n) is a linear function
▪ Worst case -- inner loop body executed for all previous elements
⮚ t_i = i 🡺 T(n) is a quadratic function
▪ Average case
⮚ on average about half of the previous elements are examined, so T(n) is still quadratic
Practical Complexities:

⚫ A function f(n) is o(g(n)) if for every positive constant c there exists an n0 such that

f(n) < c g(n) ∀ n ≥ n0
⚫ A function f(n) is ω(g(n)) if for every positive constant c there exists an n0 such that
c g(n) < f(n) ∀ n ≥ n0
⚫ Intuitively,
⚫ o( ) is like <
⚫ O( ) is like ≤
⚫ ω( ) is like >
⚫ Ω( ) is like ≥
⚫ Θ( ) is like =
Randomized Algorithms:
Probability Theory
Probability:
The probability of an event E is defined to be |E| / |S|, where S is the sample space
(assuming all sample points are equally likely).
Example:
Tossing three coins
The probability of the event {H H T, H T T, T T T} is 3/8.
The probability of the event {H H H, T T T} is 2/8, and that of the event { } is zero.

Mutual Exclusion
Two events E1 and E2 are said to be mutually exclusive if they do not have any common
sample points, that is, if E1 ∩ E2 = Φ .

Example:

Tossing three coins


Let E1 be the event that there are two H’s and E2 be the event that there are at least two T’s.
These two events are mutually exclusive since there are no common sample points.
On the other hand, if E2’ is defined to be the event that there is at least one T, then E1
and E2’ will not be mutually exclusive, since they will have T H H, H T H, and H H T as
common sample points.

Random variable:
Let S be the sample space of an experiment. A random variable on S is a function that
maps the elements of S to the set of real numbers. For any sample point s ∈ S, X(s) denotes
the image of s under this mapping. If the range of X, that is, the set of values X can take, is
finite, we say X is discrete.
Let the range of a discrete random variable X be {r1, r2, ..., rm}. Then Prob.[X = ri], for any
i, is defined to be the number of sample points whose image is ri divided by the number of
sample points in S.

A randomized algorithm makes use of a randomizer (a random number generator).

The execution time of a randomized algorithm could also vary from run to run for the
same input.
Two types:
- Las Vegas algorithms
- Monte Carlo algorithms
Las Vegas algorithm

Always produces the same (correct) output for the same input.
The execution time depends on the output of the randomizer; in general, the execution time is
characterized as a random variable.
Monte Carlo algorithm
The output may differ from run to run for the same input; i.e., the algorithm may occasionally
produce an incorrect answer, but only with small probability.
Consider any problem for which there are only two possible answers, say, yes or no.
Las Vegas algorithm
Algorithm LasVegas()
{
while (true) do
{
i:=Random() mod 2;
if ( i ≥ 1) then return;
}
}
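In C++, the same toy routine might be written as below, using std::rand() as the randomizer;
returning the iteration count is an addition to the pseudocode, made only to make the varying
running time visible.

#include <cstdlib>
#include <ctime>
#include <iostream>

// Toy Las Vegas routine: keep flipping a random bit until it comes up 1.
// The answer is always the same (it eventually returns), but the number of
// iterations is a random variable.
int lasVegas()
{
    int tries = 0;
    while (true) {
        ++tries;
        int i = std::rand() % 2;       // randomizer: 0 or 1
        if (i >= 1) return tries;      // stop as soon as the bit is 1
    }
}

int main()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    std::cout << "iterations used: " << lasVegas() << std::endl;
    return 0;
}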

Repeated Element
Consider an array a[ ] of n numbers that has n/2 distinct elements and n/2 copies of
another element. The problem is to identify the repeated element.
Here, the sampling performed is with repetition; i.e., the first and the second elements are each
picked at random from the n elements. Thus there is a probability (equal to 1/n) that the
same array element is picked both times. If we just checked for the equality of the two values
picked, our answer might be incorrect. Therefore, it is essential to make sure that the two array
indices picked are different and that the two array cells contain the same value.
Repeated Elements algorithm
Algorithm RepeatedElement(a, n)
// Find the repeated element from a[1:n].
{
    while (true) do
    {
        i := Random() mod n + 1;
        j := Random() mod n + 1;
        // i and j are random numbers in the range [1, n]
        if ((i ≠ j) and (a[i] = a[j])) then return i;
    }
}
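A C++ sketch of the same algorithm, using std::rand() as the randomizer and 0-based indexing
(unlike the 1-based pseudocode):

#include <cstdlib>
#include <ctime>
#include <iostream>

// a[0:n-1] contains n/2 distinct values and n/2 copies of one other value.
// Repeatedly pick two random indices; report success when they are different
// indices holding equal values.
int repeatedElement(const int a[], int n)
{
    while (true) {
        int i = std::rand() % n;
        int j = std::rand() % n;
        if (i != j && a[i] == a[j])    // distinct cells, equal values
            return i;                  // index of one copy of the repeated element
    }
}

int main()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    int a[] = {7, 1, 7, 3, 7, 5, 7, 9};   // n = 8: the value 7 appears n/2 = 4 times
    std::cout << a[repeatedElement(a, 8)] << std::endl;   // prints 7
    return 0;
}

Each trial succeeds with probability roughly 1/4 for large n (both indices must land on copies of
the repeated element and be distinct), so the expected number of iterations is a small constant.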

Primality Testing
Any integer greater than one is said to be a prime if its only divisors are 1 and the integer itself.

By convention, we take 1 to be a nonprime. Then 2, 3, 5, 7, 11, and 13 are the first six
primes. Given an integer n, the problem of deciding whether n is a prime is known as
primality testing.
If a number n is composite (i.e., nonprime), it must have a divisor ≤ [√n].
Based on this, a simple primality test is the following:
Consider each number L in the interval [2, [√n]] and check whether L divides n.
If none of these numbers divides n, then n is prime; otherwise it is composite.
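A C++ sketch of this deterministic test; the comparison L * L <= n plays the role of L ≤ [√n].

#include <iostream>

// Deterministic primality test by trial division: n > 1 is composite
// exactly when it has a divisor L with 2 <= L <= floor(sqrt(n)).
bool isPrime(long long n)
{
    if (n < 2) return false;            // 1 (and anything smaller) is not prime by convention
    for (long long L = 2; L * L <= n; ++L)
        if (n % L == 0) return false;   // found a divisor, so n is composite
    return true;
}

int main()
{
    std::cout << std::boolalpha
              << isPrime(13) << " " << isPrime(15) << std::endl;   // prints: true false
    return 0;
}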
Primality Testing
Algorithm Prime0(n, α)
// Returns true if n is a prime and false otherwise.
// α is the probability parameter.
{
    q := n - 1;
    for i := 1 to large do // specify large
    {
        m := q; y := 1;
        a := Random() mod q + 1;
        // choose a random number in the range 1 to n-1
        z := a;
        // compute a^(n-1) mod n
        while (m > 0) do
        {
            while (m mod 2 = 0) do
            {
                z := z^2 mod n; m := [m/2];
            }
            m := m - 1; y := (y * z) mod n;
        }
        if (y ≠ 1) then return false;
        // If a^(n-1) mod n is not 1, n is not a prime.
    }
    return true;
}
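The heart of Prime0 (and of the Miller-Rabin version that follows) is computing a^(n-1) mod n
by repeated squaring. The C++ sketch below implements that step together with the resulting
Fermat-style check; it is only the basic Fermat test (it has no nontrivial-square-root check as in
Miller-Rabin), the function names are illustrative, and overflow for very large n is ignored.

#include <cstdlib>
#include <ctime>
#include <iostream>

// Compute (base^exp) mod m by repeated squaring.
// Note: (base * base) can overflow for m near 2^32 or larger; this sketch ignores that.
unsigned long long powMod(unsigned long long base, unsigned long long exp,
                          unsigned long long m)
{
    unsigned long long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp % 2 == 1)                   // odd exponent: fold the current base into the result
            result = (result * base) % m;
        base = (base * base) % m;           // square the base
        exp /= 2;                           // halve the exponent
    }
    return result;
}

// One round of the Fermat test: pick a random a in [1, n-1] and check a^(n-1) mod n = 1.
// If the check fails, n is certainly composite; if it passes, n is only probably prime.
bool fermatRound(unsigned long long n)
{
    if (n < 4) return n == 2 || n == 3;
    unsigned long long a = 1 + std::rand() % (n - 1);   // random a in [1, n-1]
    return powMod(a, n - 1, n) == 1;
}

int main()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    std::cout << std::boolalpha
              << fermatRound(101) << " "        // 101 is prime: always true
              << fermatRound(100) << std::endl; // 100 is composite: false with high probability
    return 0;
}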

Miller-Rabin’s Primality testing algorithm


Algorithm Prime(n, α)
// Returns true if n is a prime and false otherwise.
// α is the probability parameter.
{
    q := n - 1;
    for i := 1 to α * log(n) do
    {
        m := q; y := 1;
        a := Random() mod q + 1;
        // choose a random number in the range 1 to n-1
        z := a;
        // compute a^(n-1) mod n
        while (m > 0) do
        {
            while (m mod 2 = 0) do
            {
                x := z;
                z := z^2 mod n;
                // if x is a nontrivial square root of 1, n is not a prime
                if ((z = 1) and (x ≠ 1) and (x ≠ q)) then
                    return false;
                m := [m/2];
            }
            m := m - 1; y := (y * z) mod n;
        }
        if (y ≠ 1) then return false;
        // If a^(n-1) mod n is not 1, n is not a prime.
    }
    return true;
}

Algorithm Prime1(n)
{
    // Specify t;
    for i := 1 to t do
    {
        m := Power(n, 0.5);
        j := Random() mod m + 2;
        if ((n mod j) = 0) then return false;
        // if j divides n, n is not prime.
    }
    return true;
}
