Correctness of Kruskal's Algorithm: Operations

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Correctness of Kruskal’s algorithm

1. Throughout the execution of Kruskal, (V , F ) remains a spanning forest.


Proof: (V , F ) is a spanning subgraph because the vertex set is V . It
Algorithms and Data Structures: always remains a forest because edges with endpoints in different
connected components never induce a cycle.
Minimum Spanning Trees II
2. Eventually, (V , F ) will be connected and thus a spanning tree.
Proof: Suppose that after the complete execution of the loop, (V , F ) has
a connected component (V1 , F1 ) with V1 6= V . Since G is connected, there
is an edge e ∈ E with exactly one endpoint in V1 . This edge would have
been added to F when being processed in the loop, so this can never
5th Nov, 2010 happen.
3. Throughout the execution of Kruskal, (V , F ) is contained in some MST
of G.
Proof: Similar to the proof of the corresponding statement for Prim’s
algorithm.

ADS: lect 12 – slide 1 – 5th Nov, 2010 ADS: lect 12 – slide 3 – 5th Nov, 2010

Kruskal’s Algorithm Data Structures for Disjoint Sets


A different approach to computing MSTs. I A disjoint set data structure maintains a collection S = {S1 , . . . , Sk }
A forest is a graph whose connected components are trees. of disjoint sets.
I The sets are dynamic, i.e., they may change over time.
Idea
Starting from the spanning forest without any edges, I Each set Si is identified by some representative, which is some
repeatedly add edges of minimum weight until the forest member of that set.
becomes a tree. Operations:
I Make-Set(x): Creates new set whose only member is x. The
Algorithm Kruskal(G, W ) representative is x.
1. F ← ∅ I Union(x, y ): Unites set Sx containing x and set Sy containing y
2. for all e ∈ E in the order of increasing weight do into a new set S and removes Sx and Sy from the collection.
3. if the endpoints of e belong to differ- I Find-Set(x): Returns representative of the set holding x.
ent connected components of (V , F )
then
4. F ← F ∪ {e}
5. return tree with edge set F

ADS: lect 12 – slide 2 – 5th Nov, 2010 ADS: lect 12 – slide 4 – 5th Nov, 2010
Implementation of Kruskal’s Algorithm Amortized Analysis
Want to analyse . . .
Algorithm Kruskal(G, W )
1. F ← 0 Given a sequence of m “Disjoint Set” operations
2. for all vertices v of G do (Make-Set, Union, Find-Set), bound the total running
3. Make-Set(v ) time to perform a sequence of m operations.
4. sort edges of G into non-decreasing order by weight
I Usually parametrize in terms of n (no. of Make-Set operations),
5. for all edges (u, v ) of G in non-decreasing order by weight do
not just m.
6. if Find-Set(u) 6= Find-Set(v ) then
7. F ← F ∪ {(u, v )} I (Of course) the bound we get will depend on the particular data
8. Union(u, v ) structure/implementation we design for this problem.
9. return F

ADS: lect 12 – slide 5 – 5th Nov, 2010 ADS: lect 12 – slide 7 – 5th Nov, 2010

Analysis of Kruskal Linked List Implementation of Disjoint Sets


Linked List Implementation of Disjoint Sets
Let n be the number of vertices and m the number of edges of the input Each element represented by a pointer to a cell:
graph
Each element represented by a pointer to a cell:
I Line 1, line 9: Θ(1)
I Loop in Lines 2–3: Θ(n · TMake-Set (n))
I Line 4: Θ(m lg m)
  x
I Loop in Lines 5–8: Θ m · TFind-Set (n) + n · TUnion (n) .
Overall:
Use a linked list for each set.
  
Θ n TMake-Set (n) + TUnion (n) + m lg m + TFind-Set (n)
I Representative
Use of the
a linked list for set
eachis at the head of the list.
set.
With efficient implementation of disjoint set this amounts to I Each cell has a pointer
Representative of thedirect
set istoat
thethe
representative
head of the (head
list.of the list).
T (n, m) = Θ(m lg m). I Each cell has a pointer direct to the representative (head of the list).

A&DS Lecture 11 8 Mary Cryan

ADS: lect 12 – slide 6 – 5th Nov, 2010 ADS: lect 12 – slide 8 – 5th Nov, 2010
Example
Example Example (cont’d)
Example (cont’d)
Linked list representation of
Linked list representation of

{ a, f }, { b }, { g, c, e }, {d} :
a f
a f

b b

g c e g c e

d
d
The representatives are a, b, g and d.
The representatives are a, b, g and d. U NION (g, b)
A&DS Lecture 11 9 Mary Cryan
ADS: lect 12 – slide 9 – 5th Nov, 2010 ADS: lect 12 – slide 11 – 5th Nov, 2010
A&DS Lecture 11 11 Mary Cryan

Analysis of Linked List Implementation Conventions for Further Analysis


Make-Set: constant time. Express running time in terms of:
Find-Set: constant time. n : the number of Make-Set operations,
Union: Naive implementation of m : the number of Make-Set, Union and Find-Set
operations.
Union(x, y )
Note
appends x’s list onto end of y ’s list. 1. After n − 1 Union operations only one set remains.
Assumption: Representative y of each set has attribute 2. m ≥ n.
last[y ]: a pointer to last cell of y ’s list. (not shown)
Snag: have to update “representative pointer” in each cell
of x’s list to point to the representative (head) of y ’s list.
Cost is:
Θ(length of x’s list).

ADS: lect 12 – slide 10 – 5th Nov, 2010 ADS: lect 12 – slide 12 – 5th Nov, 2010
A nasty example The Forest Implementation of Disjoint-Sets
The Forest Implementation of Disjoint-Sets
Take: n = dm/2e + 1, q = m − n = bm/2c − 1. Each set represented by a rooted tree:
Elements: x1 , x2 , . . . , xn .
Each set represented by a rooted tree:
Operation Number of objects updated
Make-Set(x1 ) 1
Make-Set(x2 ) 1 i g b
. .
. .
Make-Set(xn ) 1
Union(x1 , x2 ) 1 f h e a
Union(x2 , x3 ) 2
Union(x3 , x4 ) 3
. .
. .
c d
Union(xq−1 , xq ) q−1
Total Θ(m2 )

ADS: lect 12 – slide 13 – 5th Nov, 2010 A&DS Lecture 11


ADS:
15
lect 12 – slide 15 – 5th Nov, 2010Mary Cryan

The Weighted-Union Heuristic Basic Operations

IDEA Record length of each list. To execute Make-Set: Constant time.


Find-Set: Follow pointers to root. (Path followed is called the find
Union(x, y ) path.) Cost proportional to height of tree.
append shorter list to longer one (breaking ties arbitrarily). Union: Naive strategy: root of tree of x made to point to that
of y . Cost proportional to height of x’s tree plus the height
Theorem 1 of y ’s tree.
Using the linked-list representation of disjoint sets and the weighted-union
heuristic, a sequence of m Make-Set, Union & Find-Set operations, n of Not faster than linked list implementation.
which are Make-Set operations, takes

O(m + n lg n)

time.
Crucial Proof Idea: Each element can appear at most lg n times in the shorter
list of a Union. (because if x is in the shortest list, the union operation doubles
the size of x’s list).

ADS: lect 12 – slide 14 – 5th Nov, 2010 ADS: lect 12 – slide 16 – 5th Nov, 2010
Improving the Running Time
f b g d
General Strategy
rank [f] = 0 rank [d] = 0
Keep trees low.
a e c
U NION (f, d) rank [g] = 1 rank [d] = 1
Two Heuristics
1. Union-by-Rank: Attach lower tree to the root of b g d
higher tree in Union
2. Path Compression: Update tree during Find-Set a e c f
operations. U NION (f, g) rank [g] = 2

b g

a d e c

f
A&DS Lecture 11 19 Mary Cryan
ADS: lect 12 – slide 17 – 5th Nov, 2010 ADS: lect 12 – slide 19 – 5th Nov, 2010

The Union-by-Rank Heuristic The Height of the Trees


At each node x maintain a variable rank [x], such that if x is the root of Lemma 2
its tree, rank [x] is the height of this tree. The height of a tree (in our datastructure) is at most

lg(size of the tree).


In executing
Union(x, y ) Proof By induction on the number of Union operations it is proved that a tree of
height h has size at least 2h .
make the tree of smaller rank point to the one with larger rank.
I As long as there are no Unions, all trees have height 0 and contain 1 = 20 node.
(Break ties arbitrarily.)
I Suppose Union(x, y ) is executed. Let rx and ry be the roots of the trees of x
and y , resp.
Case 1: rank (rx ) < rank (ry ).
Then rx becomes child of ry , and the height remains unchanged, but the number
of nodes increases.
Case 2: rank (rx ) > rank (ry ). Analogously.
Case 3: rank (rx ) = rank (ry ).
Then height increases by one and the size of the new tree is
size(rx ) + size(ry ) ≥ 2rank (rx ) + 2rank (ry ) = 2rank (rx )+1 ,
which is the height of the new tree.
ADS: lect 12 – slide 18 – 5th Nov, 2010 ADS: lect 12 – slide 20 – 5th Nov, 2010
Running Time with Union-by-Rank Only Example
Example
Make-Set: constant time.
Find-Set: Bounded by rank: O(log n). f

Union: Bounded by rank: O(log n). e

d f

Bottom line (m operations, of which n are Make-Set): c


a b c d e

O(m log n). b

NOT any better than Linked-Lists with Weighted union heuristic. a

F IND -S ET (a)

A&DS Lecture 11 Find-Set(a)


23 Mary Cryan

ADS: lect 12 – slide 21 – 5th Nov, 2010 ADS: lect 12 – slide 23 – 5th Nov, 2010

The Path-Compression Heuristics The Ackermann Function


Idea Ackerman’s Function
When performing Function A : N × N → N defined by
Find-Set(x)
A(1, j) = 2j for j ≥ 1
make each vertex on find path point to root. A(i, 1) = A(i − 1, 2) for i ≥ 2
A(i, j) = A(i − 1, A(i, j − 1)) for i, j ≥ 2
Implementation
Other variants exist—last line of definition is crucial
common point.
Algorithm Find-Set(x)
1. if x 6= π(x) then
2. π(x) ← Find-Set(π(x)) Inverse
3. return π(x) Not true mathematical inverse—grows as slowly as A(i, j)
grows fast:

α(m, n) = min{i ≥ 1 | A(i, bm/nc) > lg n}.

ADS: lect 12 – slide 22 – 5th Nov, 2010 ADS: lect 12 – slide 24 – 5th Nov, 2010
Small table of A(i, j) Reading
[CLRS] Chapter 21 or [CLR] Chapter 22
j =1 j =2 j =3 j =4 [CLRS] Chapter 23 This is Chapter 24 of [CLR].
i =1 21 22 23 24
2
2 22 22
i =2 22 22  22   22    Chapters and Exercises have same labels in both editions of [CLRS]
·2 ·2 ·2 ·2 ·2 ·2
·· ·· ·· ·· ·· ··
2 16 2 2 16 2 2 2 16
22
i =3 2 2 2 2

Historical importance of A(i, j): Showed that the class of primitive recursive
functions does not include all computable functions.
A(i, j) grows faster than any primitive recursive function.

ADS: lect 12 – slide 25 – 5th Nov, 2010 ADS: lect 12 – slide 27 – 5th Nov, 2010

Anaysis of Disjoint Forests with both Heuristics Problems


1. Exercise 23.2-1 of [CLRS] Ex. 24.2-1 of [CLR].
Theorem 3
2. Suppose that all edge weights in a graph G are integers in the
Using both the union-by-rank and path compression heuristic, the
range 1 to |V |. How fast can you make Kruskal’s algorithm run in
worst-case runtime for disjoint forests is
this case?
O(m α(m, n)). What if the edge weights are integers in the range from 1 to C , for
some constant C ?
This is Exercise 23.2-4 of [CLRS] Ex. 24.2-4 of [CLR].
Remark
3. Exercise 21.2-2 of [CLRS]. 22.2-2 of [CLR].
Ackermann’s Function grows really really fast, so the
inverse α grows really really slowly. 4. Exercise 21.3-1 of [CLRS]. 22.3-1 of [CLR].
For all practical purposes α(m, n) ≤ 4.

ADS: lect 12 – slide 26 – 5th Nov, 2010 ADS: lect 12 – slide 28 – 5th Nov, 2010

You might also like