
DATA STRUCTURE

AND
ALGORITHM

Dr. Irfana Memon


Department of CSE, QUEST

https://sites.google.com/a/quest.edu.pk/dr-irfana-memon/lecture-slides
1
NO TOPIC CLO
01 A General Overview CLO1
02 Introduction to Data Structures and Algorithm CLO1
03 String Processing CLO1
04 Abstract Data Types CLO1
05 Linked list CLO1
06 Stack and Queue CLO1
07 Recursion CLO1
08 Complexity Analysis CLO2
09 Sorting and Searching techniques CLO2
10 Trees CLO2
11 Graph CLO3
12 P & NP CLO3

2
3
o The same problem can frequently be solved with different algorithms that differ in efficiency.
o The differences between the algorithms may be immaterial for processing a small number of data items, but they grow with the amount of data.
o To compare the efficiency of algorithms, we use a measure of the degree of difficulty of an algorithm called its computational complexity.

4
o Computational complexity indicates how much effort is needed to apply an algorithm, or how costly it is.
o Two efficiency criteria: time and space.
o The factor of time is usually more important than that of space, so efficiency considerations focus on the amount of time elapsed when processing data. However, the most inefficient algorithm run on a Cray supercomputer can execute much faster than the most efficient algorithm run on a PC, so run time is always system dependent.
5
o To compare algorithms fairly, all of them would have to run on the same machine. Furthermore, the results of run-time tests depend on the language in which a given algorithm is written, even if the tests are performed on the same machine.

o To evaluate efficiency, real time units such as microseconds and nanoseconds should not be used. Rather, we should use logical units that express the relationship between the size n of a file or an array and the amount of time t required to process the data.

6
o If this relationship is linear (t = c·n), then n2 = 5·n1 implies t2 = 5·t1.
o If t1 = log2 n, then doubling n increases t by only one unit of time:
  t2 = log2(2n) = log2 n + 1 = t1 + 1.
o A function expressing the relationship between n and t is usually much more complex, and calculating such a function is important only in regard to large bodies of data.
Note: Any term which does not substantially change the magnitude of the function should be eliminated from the function. 7
o The same problem can frequently be solved with different algorithms that differ in efficiency.

o The differences between the algorithms may be immaterial for processing a small number of data items, but they grow with the amount of data.

o To compare the efficiency of algorithms, we use a measure of the degree of difficulty of an algorithm called its computational complexity.

8
o The resulting function gives only an approximate measure of the efficiency of the original function.
o This approximate measure of efficiency is called asymptotic complexity.

9
10
The running time of an algorithm depends on:
o how long it takes a computer to run the lines of code of the algorithm;
o that in turn depends on the speed of the computer, the programming language, and the compiler that translates the program from the programming language into code that runs directly on the computer, as well as other factors.

11
o Here we are more interested in running time in terms of input size:
o we think about the running time of the algorithm as a function of the size of its input.
o Second, we focus on how fast this function grows with the input size. We call that the rate of growth of the running time.
o So we simplify the function to distill the most important part and cast aside the less important parts.
12
o For example, suppose that an algorithm, running on an input of size n, takes 6n^2 + 100n + 300 machine instructions.
o The 6n^2 term becomes larger than the remaining terms, 100n + 300, once n becomes large enough (20 in this case).
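A quick check of that crossover (a sketch added here, not part of the original slide):

#include <stdio.h>

/* Find the first n where the 6n^2 term overtakes the lower-order terms. */
int main(void)
{
    for (long n = 1; n <= 100; ++n)
        if (6 * n * n > 100 * n + 300) {
            printf("6n^2 exceeds 100n + 300 from n = %ld onward\n", n);
            break;                      /* prints n = 20 */
        }
    return 0;
}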

13
14
o We would say that the running time of this algorithm grows as n^2.
o Why did we ignore the coefficient 6 and the remaining terms 100n + 300?
o As long as the running time is an^2 + bn + c for some numbers a > 0, b, and c, there will always be a value of n for which an^2 is greater than bn + c, and this difference increases as n increases.
Example: For values of 0.6n^2 and 1000n + 3000, we have:
15
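The comparison table itself is not reproduced in these slides; a small sketch (values computed here, not taken from the missing slide) that tabulates both expressions:

#include <stdio.h>

/* Tabulate both terms to see where 0.6n^2 overtakes 1000n + 3000. */
int main(void)
{
    long sizes[] = { 10, 100, 1000, 10000, 100000 };
    for (int i = 0; i < 5; ++i) {
        long n = sizes[i];
        printf("n = %6ld   0.6n^2 = %14.0f   1000n + 3000 = %12ld\n",
               n, 0.6 * n * n, 1000 * n + 3000);
    }
    return 0;   /* the quadratic term wins from roughly n = 1670 onward */
}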
16
By dropping the less significant terms and the constant coefficients, we can focus on the important part of an algorithm's running time: its rate of growth.
When we drop the constant coefficients and the less significant terms, we use asymptotic notation.
Three forms of it:
big-Θ notation,
big-O notation, and
big-Ω notation.
17
Big-O is the upper bound, while Omega is the lower bound. Theta requires both big-O and Omega.

For example, an algorithm taking Omega(n log n) takes at least n log n time, but has no upper limit.

An algorithm taking Theta(n log n) is far preferable, since it takes at least n log n time (Omega(n log n)) and no more than n log n time (big-O(n log n)).
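For reference, the three bounds can also be stated formally (a standard statement, not from the original slide); here c is a positive constant and n0 is the input size beyond which the inequality must hold:

T(n) is O(f(n))   if T(n) <= c · f(n)   for all n >= n0
T(n) is Ω(f(n))   if T(n) >= c · f(n)   for all n >= n0
T(n) is Θ(f(n))   if it is both O(f(n)) and Ω(f(n))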

18
o The maximum number of times that the for-loop can run is n, and this worst case occurs when the value being searched for is not present in the array.

o Each step in the for-loop takes a constant amount of time each time it executes.

o If the for-loop iterates n times, then the time for all n iterations is c1·n, where c1 is the sum of the times for the computations in one loop iteration.
19
o The value of c1 depends on the speed of the computer, the programming language used, the compiler or interpreter that translates the source program into runnable code, and other factors.
o If we also consider the other overhead time as c2, then the total time for linear search in the worst case is c1·n + c2.
o The constant factor c1 and the low-order term c2 don't tell us about the rate of growth of the running time.
o What's significant is that the worst-case running time of linear search grows like the array size n.
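The loop this analysis refers to is not reproduced on these slides; a minimal linear-search sketch of the kind being analyzed (the name and signature are illustrative):

int linear_search( const int a[ ], int n, int x )
{
    for( int i = 0; i < n; ++i )    /* runs at most n times            */
        if( a[ i ] == x )
            return i;               /* constant work per iteration     */
    return -1;                      /* worst case: x is not in a[ ]    */
}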

20
o The notation Θ(n) is called "big-Theta of n" or just "Theta of n."
o When we say that a particular running time is Θ(n), we're saying that once n gets large enough, the running time is at least k1·n and at most k2·n for some constants k1 and k2.
o For small values of n, we don't care how the running time compares with k1·n or k2·n.
o But once n gets large enough (on or to the right of the dashed line), the running time must be sandwiched between k1·n and k2·n.
o As long as these constants k1, k2 exist, we say that the running time is Θ(n).
21
o Once n gets large enough, the running time is between k1·f(n) and k2·f(n).
o We just drop constant factors and low-order terms.
o The advantage of using big-Θ notation is that we don't have to worry about which time units we're using.

22
o For example, suppose that you calculate that a running time is 6n^2 + 100n + 300 microseconds. Or maybe it's milliseconds.
o When you use big-Θ notation, you drop the factor 6 and the low-order terms 100n + 300, and you just say that the running time is Θ(n^2).
o When we use big-Θ notation, we're saying that we have an asymptotically tight bound on the running time.
o "Asymptotically" because it matters for only large values of n.
o "Tight bound" because we've nailed the running time to within a constant factor above and below.
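As a quick worked check (not on the original slide), the constants in the definition can be exhibited explicitly: for every n >= 1,

6n^2  <=  6n^2 + 100n + 300  <=  6n^2 + 100n^2 + 300n^2  =  406n^2,

so k1 = 6 and k2 = 406 witness that 6n^2 + 100n + 300 is Θ(n^2).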
23
o Sometimes, we only need an upper bound on the running time.
o For example, although the worst-case running time of binary search is Θ(lg n), it would be incorrect to say that binary search runs in Θ(lg n) time in all cases.
o What if we find the target value upon the first guess? Then it runs in Θ(1) time.
o The running time of binary search is never worse than Θ(lg n), but it's sometimes better.
o It would be convenient to have a form of asymptotic notation that means "the running time grows at most this much, but it could grow more slowly."
24
o If a running time is O(f(n)), then for large enough n, the running time is at most k·f(n) for some constant k.

25
o We use big-O notation for asymptotic upper bounds, since it bounds the growth of the running time from above for large enough input sizes.

o We can say that the running time of binary search is always O(lg n), even though the worst-case running time is Θ(lg n).
o But the strongest statement we can make is that binary search runs in O(lg n) time.

26
o Big-Θ notation looks a lot like big-O notation, except that big-Θ notation bounds the running time from both above and below, rather than just from above.

o If we say that a running time is Θ(f(n)) in a particular situation, then it's also O(f(n)).
o For example, we can say that because the worst-case running time of binary search is Θ(lg n), it's also O(lg n).
o The converse is not necessarily true: we can say that binary search always runs in O(lg n) time, but not that it always runs in Θ(lg n) time.

27
o It is correct to say that binary search runs in O(n) time
o That's because the running time grows no faster than a
constant times n
o In fact, it grows slower
o Think of it this way
o Suppose you have 10 dollars in your pocket. You go up
to your friend and say, "I have an amount of money in
my pocket, and I guarantee that it's no more than one
million dollars." Your statement is absolutely true,
though not terribly precise

28
Big-Ω notation
o If a running time is Ω(f(n)), then for large enough n, the running time is at least k·f(n) for some constant k.
o We use big-Ω notation for asymptotic lower bounds, since it bounds the growth of the running time from below for large enough input sizes.

o Just as Θ(f(n)) automatically implies O(f(n)), it also automatically implies Ω(f(n)).
o So we can say that the worst-case running time of binary search is Ω(lg n).
o For example, just as if you really do have a million dollars in your pocket, you can truthfully say "I have an amount of money in my pocket, and it's at least 10 dollars."

29
30
31
32
General Rules

Rule 1—FOR loops
The running time of a for loop is at most the running time of the statements inside the for loop (including tests) times the number of iterations.

Rule 2—Nested loops
Analyze these inside out. The total running time of a statement inside a group of nested loops is the running time of the statement multiplied by the product of the sizes of all the loops.
33
As an example, the following program fragment is O(N^2):

for( i = 0; i < n; ++i )
    for( j = 0; j < n; ++j )
        ++k;
34
Rule 3—Consecutive statements
These just add, which means that the maximum is the one that counts.
As an example, the following program fragment, which has O(N) work followed by O(N^2) work, is also O(N^2):

for( i = 0; i < n; ++i )
    a[ i ] = 0;
for( i = 0; i < n; ++i )
    for( j = 0; j < n; ++j )
        a[ i ] += a[ j ] + i + j;
35
Rule 4—If/Else
For the fragment

    if( condition )
        S1
    else
        S2

the running time of an if/else statement is never more than the running time of the test plus the larger of the running times of S1 and S2.
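A small illustrative fragment (not from the slides): the test costs O(1), S1 costs O(N), and S2 costs O(1), so by this rule the whole statement is O(N).

if( x > 0 )
    for( i = 0; i < n; ++i )    /* S1: O(N) */
        sum += a[ i ];
else
    sum = 0;                    /* S2: O(1) */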

36
For instance, the following function is really just a simple loop. What is the order of this function?

long factorial( int n )
{
    if( n <= 1 )
        return 1;
    else
        return n * factorial( n - 1 );
}
37
For instance, the following function is really just a simple loop: it makes one recursive call per value of n, so its order is O(N).

long factorial( int n )
{
    if( n <= 1 )
        return 1;
    else
        return n * factorial( n - 1 );
}

38
The most confusing aspect of analyzing algorithms probably centers around the logarithm.
We have already seen that some divide-and-conquer algorithms will run in O(N log N) time. Besides divide-and-conquer algorithms, the most frequent appearance of logarithms centers around the following general rule:
An algorithm is O(log N) if it takes constant (O(1)) time to cut the problem size by a fraction (which is usually 1/2). On the other hand, if constant time is required to merely reduce the problem by a constant amount (such as to make the problem smaller by 1), then the algorithm is O(N).

39
Binary Search
Given an integer X and integers A0, A1, ..., A(N-1), which are presorted and already in memory, find i such that Ai = X, or return i = -1 if X is not in the input.

The obvious solution consists of scanning through the list from left to right and runs in linear time. However, this algorithm does not take advantage of the fact that the list is sorted and is thus not likely to be best.

A better strategy is to check if X is the middle element. If so, the answer is at hand. If X is smaller than the middle element, we can apply the same strategy to the sorted subarray to the left of the middle element; likewise, if X is larger than the middle element, we look to the right half. A sketch of the loop follows.
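The binary-search routine analyzed on the next slide is not reproduced here; a minimal iterative sketch that matches the strategy above (the name and signature are illustrative):

int binary_search( const int a[ ], int n, int x )
{
    int low = 0, high = n - 1;
    while( low <= high )
    {
        int mid = ( low + high ) / 2;
        if( a[ mid ] < x )
            low = mid + 1;          /* look in the right half  */
        else if( a[ mid ] > x )
            high = mid - 1;         /* look in the left half   */
        else
            return mid;             /* found: a[ mid ] == x    */
    }
    return -1;                      /* x is not in the array   */
}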
40
Clearly, all the work done inside the loop takes O(1) per iteration, so the analysis requires determining the number of times around the loop. The loop starts with high - low = N - 1 and finishes with high - low ≥ -1. Every time through the loop, the value high - low must be at least halved from its previous value; thus, the number of times around the loop is at most ⌈log(N - 1)⌉ + 2.

As an example, if high - low = 128, then the maximum values of high - low after each iteration are 64, 32, 16, 8, 4, 2, 1, 0, -1. Thus, the running time is O(log N).
Equivalently, we could write a recursive formula for the running time, but this kind of brute-force approach is usually unnecessary when you understand what is really going on and why. 41
A second example is Euclid’s algorithm for computing the greatest
common divisor. The greatest common divisor (gcd) of two
integers is the largest integer that divides both. Thus, gcd(50,
15) = 5. The algorithm in Figure 2.10 computes gcd(M, N),
assuming M ≥ N. (If N > M, the first iteration of the loop swaps
them.)
The algorithm works by continually computing remainders until 0
is reached. The last nonzero remainder is the answer. Thus, if M
= 1,989 and N = 1,590, then the sequence of remainders is 399,
393, 6, 3, 0. Therefore, gcd(1989, 1590) = 3. As the example
shows, this is a fast algorithm.
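Figure 2.10 itself is not reproduced on these slides; a minimal sketch of the remainder loop it describes (assuming m, n >= 0, with the name chosen here for illustration):

long gcd( long m, long n )
{
    while( n != 0 )
    {
        long rem = m % n;   /* keep taking remainders until 0 */
        m = n;
        n = rem;
    }
    return m;               /* the last nonzero remainder     */
}

For example, gcd( 1989, 1590 ) produces the remainders 399, 393, 6, 3, 0 and returns 3.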

42
43
Our last example in this section deals with raising an integer to a power (which is also an integer). Numbers that result from exponentiation are generally quite large, so an analysis works only if we can assume that we have a machine that can store such large integers (or a compiler that can simulate this). We will count the number of multiplications as the measurement of running time.
The obvious algorithm to compute X^N uses N-1 multiplications. A recursive algorithm can do better. N ≤ 1 is the base case of the recursion. Otherwise, if N is even, we have X^N = X^(N/2) · X^(N/2), and if N is odd, X^N = X^((N-1)/2) · X^((N-1)/2) · X.

44
For instance, to compute X^62, the algorithm does the following calculations, which involve only nine multiplications:
X^3  = (X^2) · X
X^7  = (X^3)^2 · X
X^15 = (X^7)^2 · X
X^31 = (X^15)^2 · X
X^62 = (X^31)^2
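A recursive sketch of this scheme (not the book's exact figure; it squares the base first, which is equivalent to the identities above):

long pow_int( long x, long n )
{
    if( n == 0 )
        return 1;                           /* base case                      */
    if( n == 1 )
        return x;
    if( n % 2 == 0 )
        return pow_int( x * x, n / 2 );     /* even: X^N = (X^2)^(N/2)        */
    else
        return pow_int( x * x, n / 2 ) * x; /* odd:  X^N = (X^2)^((N-1)/2)·X  */
}

Each call halves n at the cost of at most two multiplications, so the number of multiplications is O(log N).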

45
46
47
Estimate the running time of the following program
fragments (a “big-O” analysis). In the fragments, the
variable n, an integer, is the input.

for (i = 0; i < n; i++) sum++;

48
Estimate the running time of the following program
fragments (a “big-O” analysis). In the fragments, the
variable n, an integer, is the input.

for (i = 0; i < n; i++) sum++;

Answer: the running time is O(n)

49
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) sum++;

50
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) sum++;

Answer: the running time is O(N^2)


51
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < n * n; j++) sum++;

52
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < n * n; j++) sum++;

Answer: the running time is O(N^3)


53
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < i; j++)
        sum++;
54
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < i; j++)
        sum++;

Answer: the running time is O(N^2)


55
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < i * i; j++)
        for (k = 0; k < j; k++) sum++;

56
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < i * i; j++)
        for (k = 0; k < j; k++) sum++;

Answer: the running time is O(N^5)


57
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 0; i < n; i++)
    for (j = 0; j < i * i; j++)
        if (j % i == 0)
            for (k = 0; k < j; k++) sum++;

58
Estimate the running time of the following program fragments (a “big-O” analysis). In the fragments, the variable n, an integer, is the input.

for (i = 0; i < n; i++)
    for (j = 0; j < i * i; j++)
        if (j % i == 0)
            for (k = 0; k < j; k++) sum++;

The innermost loop will run only when j is a multiple of i, which happens about N times; on the other occasions the if test terminates in constant time.

Answer: the running time is O(N^4)
59
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 1; i < n; i = i * 2) sum++;

60
Estimate the running time of the following
program fragments (a “big-O” analysis). In the
fragments, the variable n, an integer, is the
input.

for (i = 1; i < n; i = i * 2) sum++;

Answer: the running time is O(log N)

61
Wish
You
Good Luck

62
