Design and Analysis of Algorithm: Unit-V


RANDOMIZING QUICKSORT

 Randomly permute the elements of the input array before sorting
 OR modify the PARTITION procedure: at each step of the algorithm, exchange element A[p] with an element chosen at random from A[p…r]
 The pivot element x = A[p] is then equally likely to be any one of the r – p + 1 elements of the subarray
RANDOMIZED ALGORITHMS
 No particular input can elicit worst-case behavior
 The worst case occurs only if we get “unlucky” numbers from the random number generator
 The worst case becomes less likely
 Randomization can NOT eliminate the worst case, but it can make it less likely!
RANDOMIZED PARTITION

Alg.: RANDOMIZED-PARTITION(A, p, r)
  i ← RANDOM(p, r)
  exchange A[p] ↔ A[i]
  return PARTITION(A, p, r)
RANDOMIZED QUICKSORT
Alg.: RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    then q ← RANDOMIZED-PARTITION(A, p, r)
         RANDOMIZED-QUICKSORT(A, p, q)
         RANDOMIZED-QUICKSORT(A, q + 1, r)
FORMAL WORST-CASE
ANALYSIS OF QUICKSORT
 T(n) = worst-case running time
   T(n) = max (T(q) + T(n – q)) + Θ(n)
          1 ≤ q ≤ n-1
 Use the substitution method to show that the running time of Quicksort is O(n2)
 Guess T(n) = O(n2)
 Induction goal: T(n) ≤ cn2
 Induction hypothesis: T(k) ≤ ck2 for any k < n
WORST-CASE ANALYSIS OF QUICKSORT
 Proof of induction goal:
   T(n) ≤ max (cq2 + c(n – q)2) + Θ(n)
          1 ≤ q ≤ n-1
        = c · max (q2 + (n – q)2) + Θ(n)
            1 ≤ q ≤ n-1
 The expression q2 + (n – q)2 achieves its maximum over the range 1 ≤ q ≤ n – 1 at one of the endpoints:
   max (q2 + (n – q)2) = 12 + (n – 1)2 = n2 – 2(n – 1)
   1 ≤ q ≤ n-1
 Therefore:
   T(n) ≤ cn2 – 2c(n – 1) + Θ(n)
        ≤ cn2   (choosing c large enough that the 2c(n – 1) term dominates the Θ(n) term)
REVISIT PARTITIONING
 Hoare’s partition
 Select a pivot element x around which to partition
 Grows two regions from the two ends of the array:
   A[p…i] ≤ x   and   x ≤ A[j…r]
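The two-region scheme above can be sketched in Python. This is an illustrative sketch, not the book's exact pseudocode (0-based indices): it pivots on A[p] and returns an index j such that every element of A[p..j] is ≤ every element of A[j+1..r].

```python
def hoare_partition(A, p, r):
    """Hoare-style partition (sketch): pivot x = A[p]; grow a left region
    of elements <= x and a right region of elements >= x until they meet."""
    x = A[p]
    i = p - 1
    j = r + 1
    while True:
        j -= 1
        while A[j] > x:       # shrink the right region past elements > x
            j -= 1
        i += 1
        while A[i] < x:       # grow the left region past elements < x
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j          # A[p..j] <= x <= A[j+1..r]
```

A quicksort using this partition recurses on A[p..q] and A[q+1..r], matching the RANDOMIZED-QUICKSORT pseudocode above.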
ANOTHER WAY TO PARTITION
(LOMUTO’S PARTITION – PAGE 146)
 Given an array A, partition the array into the following subarrays:
 A pivot element x = A[q]
 Subarray A[p..q-1], such that each element of A[p..q-1] is smaller than or equal to x (the pivot)
 Subarray A[q+1..r], such that each element of A[q+1..r] is strictly greater than x (the pivot)
 The pivot element is not included in either of the two subarrays
EXAMPLE
(figure: step-by-step trace of Lomuto’s partition; at the end, the pivot is swapped into its final position)
ANOTHER WAY TO PARTITION
(CONT’D)
Alg.: PARTITION(A, p, r)
  x ← A[r]
  i ← p - 1
  for j ← p to r - 1
    do if A[j] ≤ x
      then i ← i + 1
           exchange A[i] ↔ A[j]
  exchange A[i + 1] ↔ A[r]
  return i + 1

 Chooses the last element of the array as the pivot
 Grows a subarray A[p..i] of elements ≤ x
 Grows a subarray A[i+1..j-1] of elements > x
 Running time: Θ(n), where n = r - p + 1
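The pseudocode above translates almost line for line into Python (0-based indices; a sketch for illustration):

```python
def partition(A, p, r):
    """Lomuto's partition: pivot x = A[r]; A[p..i] holds elements <= x
    and A[i+1..j-1] holds elements > x as j scans left to right."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]   # place the pivot between the regions
    return i + 1
```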
RANDOMIZED QUICKSORT
(USING LOMUTO’S PARTITION)
Alg.: RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    then q ← RANDOMIZED-PARTITION(A, p, r)
         RANDOMIZED-QUICKSORT(A, p, q - 1)
         RANDOMIZED-QUICKSORT(A, q + 1, r)

 The pivot is no longer included in either of the subarrays!
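Putting the two procedures together gives a runnable sketch. One deviation from the slide's RANDOMIZED-PARTITION, flagged in a comment: since Lomuto's partition pivots on the last element, the randomly chosen element is swapped into position r rather than p.

```python
import random

def randomized_partition(A, p, r):
    # Swap a randomly chosen element into position r (the slide swaps into
    # position p; we use r because Lomuto's partition pivots on A[r]).
    k = random.randint(p, r)
    A[k], A[r] = A[r], A[k]
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)   # pivot A[q] is excluded
        randomized_quicksort(A, q + 1, r)
```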
ANALYSIS OF RANDOMIZED
QUICKSORT
Alg.: RANDOMIZED-QUICKSORT(A, p, r)
  if p < r
    then q ← RANDOMIZED-PARTITION(A, p, r)
         RANDOMIZED-QUICKSORT(A, p, q - 1)
         RANDOMIZED-QUICKSORT(A, q + 1, r)

 The running time of Quicksort is dominated by PARTITION!
 PARTITION is called at most n times (at each call a pivot is selected and never again included in future calls)
PARTITION
Alg.: PARTITION(A, p, r)
  x ← A[r]                          // O(1) - constant
  i ← p - 1
  for j ← p to r - 1
    do if A[j] ≤ x                  // Xk = # of comparisons between the
      then i ← i + 1                //      pivot and the other elements
           exchange A[i] ↔ A[j]
  exchange A[i + 1] ↔ A[r]          // O(1) - constant
  return i + 1

 Amount of work at call k: c + Xk


AVERAGE-CASE ANALYSIS OF
QUICKSORT
 Let X = total number of comparisons performed in all calls to PARTITION:
   X = Σk Xk
 The total work done over the entire execution of Quicksort is
   O(nc + X) = O(n + X)
 Need to estimate E(X)
REVIEW OF PROBABILITIES
(discrete case)
RANDOM VARIABLES
Def.: A (discrete) random variable X is a function from a sample space S to the real numbers.
 It associates a real number X(j) with each possible outcome j of an experiment.
RANDOM VARIABLES
E.g.: Toss a coin three times and define X = “number of heads”
COMPUTING PROBABILITIES
USING RANDOM VARIABLES

EXPECTATION
 The expected value (expectation, mean) of a discrete random variable X is:
   E[X] = Σx x · Pr{X = x}
 “Average” over all possible values of the random variable X
EXAMPLES
Example: X = face of one fair die
  E[X] = 1·1/6 + 2·1/6 + 3·1/6 + 4·1/6 + 5·1/6 + 6·1/6 = 3.5
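As a quick check of the die example, the expectation can be computed exactly with Python's fractions module:

```python
from fractions import Fraction

# E[X] for one fair die: each face 1..6 occurs with probability 1/6.
faces = range(1, 7)
expected_value = sum(Fraction(1, 6) * x for x in faces)
# expected_value is exactly 7/2, i.e. 3.5
```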
INDICATOR RANDOM VARIABLES
 Given a sample space S and an event A, we define the indicator random variable I{A} associated with A:
   I{A} = 1 if A occurs
          0 if A does not occur
 The expected value of an indicator random variable XA = I{A} is:
   E[XA] = Pr{A}
 Proof:
   E[XA] = E[I{A}] = 1 · Pr{A} + 0 · Pr{Ā} = Pr{A}
AVERAGE-CASE ANALYSIS OF
QUICKSORT
 Let X = total number of comparisons performed in all calls to PARTITION:
   X = Σk Xk
 The total work done over the entire execution of Quicksort is
   O(n + X)
 Need to estimate E(X)
NOTATION
   z2 z9 z8 z3 z5 z4 z1 z6 z10 z7
    2  9  8  3  5  4  1  6 10  7

 Rename the elements of A as z1, z2, . . . , zn, with zi being the i-th smallest element
 Define the set Zij = {zi, zi+1, . . . , zj} as the set of elements between zi and zj, inclusive
TOTAL NUMBER OF COMPARISONS
IN PARTITION
• Define Xij = I {zi is compared to zj}
• Total number of comparisons X performed by the algorithm:
    X = Σi=1..n-1 Σj=i+1..n Xij
EXPECTED NUMBER OF TOTAL
COMPARISONS IN PARTITION
 Compute the expected value of X:
   E[X] = E[Σi=1..n-1 Σj=i+1..n Xij]
        = Σi=1..n-1 Σj=i+1..n E[Xij]                       (by linearity of expectation)
        = Σi=1..n-1 Σj=i+1..n Pr{zi is compared to zj}
 (Xij is an indicator random variable, so the expectation of Xij is equal to the probability of the event “zi is compared to zj”)
COMPARISONS IN PARTITION :
OBSERVATION 1
 Each pair of elements is compared at most once during the entire execution of the algorithm
 Elements are compared only to the pivot!
 The pivot is excluded from future calls to PARTITION
COMPARISONS IN PARTITION:
OBSERVATION 2
 Only the pivot is compared with elements in both partitions!

   z2 z9 z8 z3 z5 z4 z1 z6 z10 z7
    2  9  8  3  5  4  1  6 10  7
   Z1,6 = {1, 2, 3, 4, 5, 6}   {7} ← pivot   Z8,10 = {8, 9, 10}

 Elements in different partitions are never compared!
COMPARISONS IN PARTITION
   z2 z9 z8 z3 z5 z4 z1 z6 z10 z7
    2  9  8  3  5  4  1  6 10  7
   Z1,6 = {1, 2, 3, 4, 5, 6}   {7}   Z8,10 = {8, 9, 10}

 What is Pr{zi is compared to zj}?
 Case 1: the pivot x is chosen such that zi < x < zj
   zi and zj will never be compared
 Case 2: zi or zj is the pivot
   zi and zj will be compared only if one of them is chosen as pivot before any other element in the range zi to zj
SEE WHY 
   z2 z4 z1 z3 z5 z7 z9 z6
 z2 will never be compared with z6, since z5 (which belongs to Z2,6) was chosen as a pivot first!
PROBABILITY OF COMPARING zi WITH zj
 Pr{zi is compared to zj}
   = Pr{zi is the first pivot chosen from Zij} + Pr{zj is the first pivot chosen from Zij}
     (the two events are mutually exclusive, so the probabilities add)
   = 1/(j - i + 1) + 1/(j - i + 1) = 2/(j - i + 1)

 • There are j – i + 1 elements between zi and zj
   – The pivot is chosen randomly and independently
   – The probability that any particular element is the first one chosen is 1/(j - i + 1)
NUMBER OF COMPARISONS IN PARTITION
Expected number of comparisons in PARTITION:
  E[X] = Σi=1..n-1 Σj=i+1..n Pr{zi is compared to zj}
       = Σi=1..n-1 Σj=i+1..n 2/(j - i + 1)
       = Σi=1..n-1 Σk=1..n-i 2/(k + 1)        (set k = j - i)
       < Σi=1..n-1 Σk=1..n 2/k
       = Σi=1..n-1 O(lg n)                    (harmonic series)
       = O(n lg n)

 Expected running time of Quicksort using RANDOMIZED-PARTITION is O(nlgn)
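The expected O(n lg n) comparison count can be sanity-checked empirically. The sketch below (illustrative, not from the slides) instruments randomized quicksort with Lomuto's partition to count X, the pivot-vs-element comparisons across all PARTITION calls; for random pivots it should land near 2n ln n rather than the worst-case ~n2/2.

```python
import random

def quicksort_count(A):
    """Randomized quicksort (Lomuto) that returns X, the total number of
    pivot-vs-element comparisons performed across all partition calls."""
    count = 0
    def sort(p, r):
        nonlocal count
        if p < r:
            k = random.randint(p, r)
            A[k], A[r] = A[r], A[k]       # random pivot into position r
            x = A[r]
            i = p - 1
            for j in range(p, r):
                count += 1                # one comparison with the pivot
                if A[j] <= x:
                    i += 1
                    A[i], A[j] = A[j], A[i]
            A[i + 1], A[r] = A[r], A[i + 1]
            sort(p, i)                    # left of pivot
            sort(i + 2, r)                # right of pivot
    sort(0, len(A) - 1)
    return count
```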
ALTERNATIVE AVERAGE-CASE
ANALYSIS OF QUICKSORT
• See Problem 7-2, page 16

• Focus on the expected running time of each individual recursive call to Quicksort, rather than on the number of comparisons performed.

• Use Hoare partition in our analysis.
ALTERNATIVE AVERAGE-CASE
ANALYSIS OF QUICKSORT
 Assume each element has the same probability to be chosen as pivot
 On the resulting splits:
   - 1:n-1 splits have 2/n probability
   - all other splits have 1/n probability
 Recurrence for the average case: Q(n) = …
PROBLEM
 Consider the problem of determining whether an arbitrary sequence {x1, x2, ..., xn} of n numbers contains repeated occurrences of some number. Show that this can be done in Θ(nlgn) time.
 Sort the numbers: Θ(nlgn)
 Scan the sorted sequence from left to right, checking whether two successive elements are the same: Θ(n)
 Total: Θ(nlgn) + Θ(n) = Θ(nlgn)
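The two steps above can be sketched directly (the function name is illustrative):

```python
def has_repeats(xs):
    """Detect repeated values: sort, then scan adjacent pairs."""
    ys = sorted(xs)                       # Theta(n lg n)
    return any(ys[k] == ys[k + 1]         # Theta(n) scan of successive pairs
               for k in range(len(ys) - 1))
```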
EX 2.3-6 (PAGE 37)
 Can we use binary search to improve InsertionSort (i.e., find the correct location to insert A[j])?

Alg.: INSERTION-SORT(A)
  for j ← 2 to n
    do key ← A[j]
       // Insert A[j] into the sorted sequence A[1 . . j-1]
       i ← j - 1
       while i > 0 and A[i] > key
         do A[i + 1] ← A[i]
            i ← i - 1
       A[i + 1] ← key
EX 2.3-6 (PAGE 37)
 Can we use binary search to improve InsertionSort (i.e., find the correct location to insert A[j])?
 This idea can reduce the number of comparisons from O(n) to O(lgn)
 The number of shifts stays the same, i.e., O(n)
 Overall, the running time stays the same
 Worthwhile idea when comparisons are expensive (e.g., comparing strings)
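The binary-search variant can be sketched with Python's bisect module; as noted above, comparisons drop to O(lg n) per insert, but the element shifts remain O(n):

```python
import bisect

def binary_insertion_sort(A):
    """Insertion sort that locates each insert position by binary search
    (O(lg n) comparisons per element); shifting still costs O(n) moves."""
    for j in range(1, len(A)):
        key = A[j]
        pos = bisect.bisect_right(A, key, 0, j)  # binary search in A[0..j-1]
        A[pos + 1:j + 1] = A[pos:j]              # shift the tail right by one
        A[pos] = key
    return A
```

Using bisect_right keeps the sort stable, since equal keys are inserted after their existing copies.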
PROBLEM
 Analyze the complexity of the following function:

F(i)
  if i = 0
    then return 1
  return 2 * F(i - 1)

 Recurrence: T(n) = T(n-1) + c
 Use iteration to solve it: T(n) = Θ(n)
PROBLEM
 What is the running time of Quicksort when all the elements are the same?

 Using Hoare partition → best case
   Split in half every time
   T(n) = 2T(n/2) + n → T(n) = Θ(nlgn)

 Using Lomuto’s partition → worst case
   1:n-1 splits every time
   T(n) = Θ(n2)
NP-COMPLETENESS

OUTLINE
 P and NP
 Definition of P
 Definition of NP
 Alternate definition of NP

 NP-completeness
 Definition of NP-hard and NP-complete
 The Cook-Levin Theorem
 Some NP-complete problems
 Problem reduction
 SAT (and CNF-SAT and 3SAT)
 Vertex Cover
 Clique
 Hamiltonian Cycle

RUNNING TIME REVISITED
 Input size, n
 To be exact, let n denote the number of bits in a nonunary encoding of the input
 All the polynomial-time algorithms studied so far in this course run in polynomial time using this definition of input size.
 Exception: any pseudo-polynomial time algorithm

(figure: a weighted graph of flight distances between airports such as SFO, ORD, LGA, DFW, MIA)
DEALING WITH HARD PROBLEMS
 What to do when we find a problem that looks hard…

“I couldn’t find a polynomial-time algorithm; I guess I’m too dumb.”
(cartoon inspired by [Garey-Johnson, 79])
DEALING WITH HARD PROBLEMS
 Sometimes we can prove a strong lower bound… (but not usually)

“I couldn’t find a polynomial-time algorithm, because no such algorithm exists!”
(cartoon inspired by [Garey-Johnson, 79])
DEALING WITH HARD PROBLEMS
 NP-completeness lets us show collectively that a problem is hard.

“I couldn’t find a polynomial-time algorithm, but neither could all these other smart people.”
(cartoon inspired by [Garey-Johnson, 79])
POLYNOMIAL-TIME
DECISION PROBLEMS
 To simplify the notion of “hardness,” we will focus on the
following:
 Polynomial-time as the cut-off for efficiency
 Decision problems: output is 1 or 0 (“yes” or “no”)
 Examples:
 Does a given graph G have an Euler tour?
 Does a text T contain a pattern P?
 Does an instance of 0/1 Knapsack have a solution with benefit at least K?
 Does a graph G have an MST with weight at most K?

PROBLEMS AND LANGUAGES
 A language L is a set of strings defined over some alphabet Σ
 Every decision algorithm A defines a language L
 L is the set consisting of every string x such that A outputs “yes” on input x.
 We say “A accepts x’’ in this case
 Example:
   If A determines whether or not a given graph G has an Euler tour, then the language L for A is the set of all graphs with Euler tours.
THE COMPLEXITY CLASS P

 A complexity class is a collection of languages


 P is the complexity class consisting of all languages that are accepted by
polynomial-time algorithms
 For each language L in P there is a polynomial-time decision algorithm A for L.
 If n=|x|, for x in L, then A runs in p(n) time on input x.
 The function p(n) is some polynomial

THE COMPLEXITY CLASS NP
 We say that an algorithm is non-deterministic if it uses the following operation:
 Choose(b): chooses a bit b
 Can be used to choose an entire string y (with |y| choices)

 We say that a non-deterministic algorithm A accepts a string x if there exists


some sequence of choose operations that causes A to output “yes” on input x.
 NP is the complexity class consisting of all languages accepted by polynomial-
time non-deterministic algorithms.

NP EXAMPLE
 Problem: Decide if a graph has an MST of weight at most K

 Algorithm:
   1. Non-deterministically choose a set T of n-1 edges
   2. Test that T forms a spanning tree
   3. Test that T has weight at most K

 Analysis: Testing takes O(n+m) time, so this algorithm runs in polynomial time.
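Steps 2 and 3 are the polynomial-time part. A sketch of such a verifier (the function name and the (u, v, w) edge representation are illustrative, not from the slides), using union-find to detect cycles: a set of n-1 edges with no cycle is necessarily a spanning tree.

```python
def verify_spanning_tree(n, edges, T, K):
    """Polynomial-time verifier: does the guessed edge set T form a
    spanning tree of the n-vertex graph with total weight at most K?
    edges and T are lists of (u, v, w) triples over vertices 0..n-1."""
    if len(T) != n - 1 or any(e not in edges for e in T):
        return False
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        return u
    for u, v, _ in T:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                    # cycle => not a tree
        parent[ru] = rv
    return sum(w for _, _, w in T) <= K
```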
AN INTERESTING PROBLEM
A Boolean circuit is a circuit of AND, OR, and NOT gates; the CIRCUIT-SAT problem is to determine if there is an assignment of 0’s and 1’s to a circuit’s inputs so that the circuit outputs 1.
(figure: an example circuit with 0/1 values on its inputs, AND/OR/NOT gates, and its output)
CIRCUIT-SAT IS IN NP
Non-deterministically choose a set of inputs and the outcome of every gate, then test each gate’s I/O.
(figure: the same example circuit, with guessed 0/1 values on every wire)
NP-COMPLETENESS
 A problem (language) L is NP-hard if every problem in NP can be reduced to L in polynomial time.
 That is, for each language M in NP, we can take an input x for M and transform it in polynomial time to an input x’ for L such that x is in M if and only if x’ is in L.
 L is NP-complete if it is in NP and is NP-hard.
(figure: every language in NP reduces in polynomial time to L)
SOME THOUGHTS ABOUT P AND NP
(figure: P drawn inside NP; CIRCUIT-SAT and the other NP-complete problems shown in NP but outside P)

 Belief: P is a proper subset of NP.
 Implication: the NP-complete problems are the hardest in NP.
 Why: Because if we could solve an NP-complete problem in polynomial time, we could solve every problem in NP in polynomial time.
 That is, if an NP-complete problem is solvable in polynomial time, then P = NP.
 Since so many people have attempted without success to find polynomial-time solutions to NP-complete problems, showing your problem is NP-complete is equivalent to showing that a lot of smart people have worked on your problem and found no polynomial-time algorithm.
PROBLEM REDUCTION
 A language M is polynomial-time reducible to a language L if an instance x for M can be transformed in polynomial time to an instance x’ for L such that x is in M if and only if x’ is in L.
 Denote this by M ≤ L.
 A problem (language) L is NP-hard if every problem in NP is polynomial-time reducible to L.
 A problem (language) is NP-complete if it is in NP and it is NP-hard.
 CIRCUIT-SAT is NP-complete:
   CIRCUIT-SAT is in NP
   For every M in NP, M ≤ CIRCUIT-SAT.
(figure: the example circuit again)
TRANSITIVITY OF REDUCIBILITY
 If A ≤ B and B ≤ C, then A ≤ C.
 An input x for A can be converted to x’ for B, such that x is in A if and only if x’ is in B. Likewise, for B to C.
 Convert x’ into x’’ for C such that x’ is in B iff x’’ is in C.
 Hence, if x is in A, x’ is in B, and x’’ is in C.
 Likewise, if x’’ is in C, x’ is in B, and x is in A.
 Thus, A ≤ C, since polynomials are closed under composition.
 Types of reductions:
   Local replacement: Show A ≤ B by dividing an input to A into components and show how each component can be converted to a component for B.
   Component design: Show A ≤ B by building special components for an input of B that enforce properties needed for A, such as “choice” or “evaluate.”
SAT
 A Boolean formula is a formula where the variables and operations are Boolean (0/1):
   (a+b+¬d+e)(¬a+¬c)(¬b+c+d+e)(a+¬c+¬e)
   OR: +, AND: · (times), NOT: ¬
 SAT: Given a Boolean formula S, is S satisfiable, that is, can we assign 0’s and 1’s to the variables so that S is 1 (“true”)?
 Easy to see that CNF-SAT is in NP:
   Non-deterministically choose an assignment of 0’s and 1’s to the variables and then evaluate each clause. If they are all 1 (“true”), then the formula is satisfiable.
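A brute-force sketch that separates the exponential "choose an assignment" search from the polynomial "evaluate each clause" check (the representation is illustrative: a clause is a list of (variable, negated) literals):

```python
from itertools import product

def cnf_sat(formula, variables):
    """Brute-force CNF-SAT: try every 0/1 assignment (exponentially many),
    but note that evaluating any ONE assignment takes polynomial time.
    formula is a list of clauses; a clause is a list of (var, negated)."""
    for bits in product([0, 1], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        # a literal is true iff the variable's value differs from its
        # negation flag; a clause is true iff some literal is true
        if all(any(assign[v] != neg for v, neg in clause)
               for clause in formula):
            return assign          # a satisfying assignment
    return None                    # unsatisfiable
```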
SAT IS NP-COMPLETE
 Reduce CIRCUIT-SAT to SAT.
 Given a Boolean circuit, make a variable for every input and gate.
 Create a sub-formula for each gate, characterizing its effect. Form the formula as the output variable AND-ed with all these sub-formulas:
   Example: m·((a+b)↔e)(c↔¬f)(d↔¬g)(e↔¬h)(ef↔i)…
 The formula is satisfiable if and only if the Boolean circuit is satisfiable.
(figure: the example circuit with a variable on every input, every gate, and the output m)
3SAT *
 The SAT problem is still NP-complete even if the formula is a conjunction of disjuncts, that is, it is in conjunctive normal form (CNF).
 The SAT problem is still NP-complete even if it is in CNF and every clause has just 3 literals (a variable or its negation):
   (a+b+¬d)(¬a+¬c+e)(¬b+d+e)(a+¬c+¬e)
VERTEX COVER
 A vertex cover of a graph G=(V,E) is a subset W of V such that, for every edge (a,b) in E, a is in W or b is in W.
 VERTEX-COVER: Given a graph G and an integer K, does G have a vertex cover of size at most K?
 VERTEX-COVER is in NP: Non-deterministically choose a subset W of size K and check that every edge is covered by W.
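The covering check that puts VERTEX-COVER in NP is a one-line polynomial-time test (a sketch; W is the guessed subset, edges a list of pairs):

```python
def is_vertex_cover(edges, W):
    """Polynomial-time check: every edge (a, b) has a or b in W."""
    W = set(W)
    return all(a in W or b in W for a, b in edges)
```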
VERTEX-COVER IS NP-COMPLETE
Reduce 3SAT to VERTEX-COVER.
Let S be a Boolean formula in CNF with each clause having 3 literals.
For each variable x, create a node for x and a node for ¬x, and connect these two:
   x —— ¬x
For each clause (a+b+c), create a triangle on three nodes a, b, c and connect these three nodes.
VERTEX-COVER IS NP-COMPLETE
Completing the construction:
Connect each literal in a clause triangle to its copy in a variable pair.
E.g., for a clause (¬x+y+z), connect the triangle’s nodes ¬x, y, z to the nodes ¬x, y, z of the variable pairs:
   x ¬x   y ¬y   z ¬z

Let n = # of variables
Let m = # of clauses
Set K = n + 2m
VERTEX-COVER IS NP-COMPLETE
Example: (a+b+c)(¬a+b+¬c)(¬b+¬c+¬d)
The graph has a vertex cover of size K = 4 + 6 = 10 iff the formula is satisfiable.
(figure: variable pairs a/¬a, b/¬b, c/¬c, d/¬d on top, one triangle per clause below)
CLIQUE
 A clique of a graph G=(V,E) is a subgraph C that is fully connected (every pair of vertices in C is joined by an edge).
 CLIQUE: Given a graph G and an integer K, is there a clique in G of size at least K?
   (figure: a graph containing a clique of size 5)
 CLIQUE is in NP: non-deterministically choose a subset C of size K and check that every pair in C has an edge in G.
CLIQUE IS NP-COMPLETE
Reduction from VERTEX-COVER.
A graph G has a vertex cover of size K if and only if its complement G’ has a clique of size n - K.
(figure: a graph G and its complement G’)
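The complement relationship can be checked instance by instance with a short sketch (an illustrative helper, not part of the slides): for any guessed subset W, "W covers G" and "V − W is a clique in G’" should always agree.

```python
from itertools import combinations

def complement_clique_matches_cover(n, edges, W):
    """Check the reduction on one instance: W is a vertex cover of G
    iff V - W is a clique in the complement graph G'."""
    E = {frozenset(e) for e in edges}
    cover_ok = all(a in W or b in W for a, b in edges)
    C = set(range(n)) - set(W)
    # C is a clique in G' iff every pair of C is a NON-edge of G
    clique_ok = all(frozenset((u, v)) not in E
                    for u, v in combinations(C, 2))
    return cover_ok == clique_ok
```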
SOME OTHER NP-COMPLETE
PROBLEMS
 SET-COVER: Given a collection of m sets, are there K of these sets whose union is the same as the whole collection of m sets?
   NP-complete by reduction from VERTEX-COVER
 SUBSET-SUM: Given a set of integers and a distinguished integer K, is there a subset of the integers that sums to K?
   NP-complete by reduction from VERTEX-COVER
SOME OTHER NP-COMPLETE
PROBLEMS
 0/1 Knapsack: Given a collection of items with weights and benefits, is there a subset of weight at most W and benefit at least K?
   NP-complete by reduction from SUBSET-SUM
 Hamiltonian-Cycle: Given a graph G, is there a cycle in G that visits each vertex exactly once?
   NP-complete by reduction from VERTEX-COVER
 Traveling Salesperson Tour: Given a complete weighted graph G, is there a cycle that visits each vertex and has total cost at most K?
   NP-complete by reduction from Hamiltonian-Cycle.