
Algorithms and Problem Solving (15B11CI411)

EVEN 2022

Design Technique: Greedy

Jaypee Institute of Information Technology (JIIT)


A-10, Sector 62, Noida
Greedy Algorithms

• A problem exhibits optimal substructure if an optimal solution contains within it optimal solutions to sub-problems
• A problem has the greedy-choice property if a globally optimal solution can be arrived at by making a locally optimal (greedy) choice at each step
• A greedy algorithm always makes the choice that looks best at the moment
• When a greedy algorithm leads to an optimal solution, it is because a locally optimal choice
leads to a globally optimal solution.
• Usually simple and fast
• Implementation/running time analysis is typically straightforward
• Often implementation involves use of a sorting algorithm or a data structure to facilitate
identification of next greedy choice
Job Scheduling
• Job scheduling is the problem of selecting jobs from a set of N jobs to run on a single processor so as to maximize the total profit.
• Consider N jobs, each taking unit time for execution.
• Each job is having some profit and deadline associated with it.
• Profit earned only if the job is completed on or before its deadline.
• Otherwise, the profit is forfeited (treated as a penalty).
• Each job has deadline di ≥ 1 and profit pi ≥ 0. At a time, only one job can be active on the
processor.
• For N jobs, there exist 2^N possible schedules, so the brute-force approach runs in O(2^N) time.
• Greedy approach:
• Sort all jobs in decreasing order of profit.
• Start with the empty schedule, select one job at a time and if it is feasible then schedule it
in the latest possible slot.
Algorithm JOB_SCHEDULING(J, D, P)
// J: Array of N jobs
// D: Array of the deadline for each job
// P: Array of the profit associated with each job
Sort all jobs in J in decreasing order of profit
S ← Φ
SP ← 0
for i ← 1 to N do
    if job J[i] is feasible then
        Schedule the job in the latest possible free slot meeting its deadline
        S ← S ∪ J[i]
        SP ← SP + P[i]

Complexity Analysis
• On average, each of the N jobs searches N/2 slots, so this takes O(N²) time.
• However, with the use of a disjoint-set data structure (find and union), the algorithm runs in nearly O(N) time.
• n = 7, profits (p1, p2, p3, p4, p5, p6, p7) = (3, 5, 20, 18, 1, 6, 30) and deadlines (d1, d2, d3, d4, d5, d6,
d7) = (1, 3, 4, 3, 2, 1, 2). Schedule the jobs in such a way to get maximum profit.
Jobs j1 j2 j3 j4 j5 j6 j7
Profit 3 5 20 18 1 6 30
Deadline 1 3 4 3 2 1 2

• Sort all jobs in descending order of profit.


• So, P = (30, 20, 18, 6, 5, 3, 1), J = (J7, J3, J4, J6, J2, J1, J5) and D = (2, 4, 3, 1, 3, 1, 2).
• Iteration 1: Deadline for job J7 is 2. Slot 2 (t = 1 to t = 2) is free, so schedule it in slot 2. Solution set S = {J7}, and Profit SP = {30}
• Iteration 2: Deadline for job J3 is 4. Slot 4 (t = 3 to t = 4) is free, so schedule it in slot 4. Solution set S = {J7, J3}, and Profit SP = {30, 20}
• Iteration 3: Deadline for job J4 is 3. Slot 3 (t = 2 to t = 3) is free, so schedule it in slot 3. Solution set S = {J7, J3, J4}, and Profit SP = {30, 20, 18}
• Iteration 4: Deadline for job J6 is 1. Slot 1 (t = 0 to t = 1) is free, so schedule it in slot 1. Solution set S = {J7, J3, J4, J6}, and Profit SP = {30, 20, 18, 6}

• Now all four slots are occupied, so none of the remaining jobs can be scheduled.
• Thus, with the greedy approach, we are able to schedule four jobs {J7, J3, J4, J6}, which give a profit of (30 + 20 + 18 + 6) = 74 units.
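The greedy job-scheduling procedure above can be sketched in Python (the function name and tuple layout are my own; this is the simple O(N²) slot scan, not the disjoint-set variant). Run on the worked example, it reproduces the profit of 74:

```python
def job_scheduling(jobs):
    """jobs: list of (name, profit, deadline); each job takes unit time.
    Greedy: sort by decreasing profit, place each feasible job in the
    latest free slot on or before its deadline."""
    jobs = sorted(jobs, key=lambda j: j[1], reverse=True)
    max_deadline = max(d for _, _, d in jobs)
    slot = [None] * (max_deadline + 1)  # slot[t] holds the job finishing at time t
    total = 0
    for name, profit, deadline in jobs:
        for t in range(deadline, 0, -1):  # latest possible slot first
            if slot[t] is None:
                slot[t] = name
                total += profit
                break
    return [s for s in slot[1:] if s is not None], total

schedule, profit = job_scheduling([
    ("J1", 3, 1), ("J2", 5, 3), ("J3", 20, 4), ("J4", 18, 3),
    ("J5", 1, 2), ("J6", 6, 1), ("J7", 30, 2),
])
print(schedule, profit)  # ['J6', 'J7', 'J4', 'J3'] 74
```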
Knapsack Problem
• Given a set of items, each with a weight and a value/profit associated with it, the knapsack problem is to find a subset of items such that the total weight is less than or equal to a given limit (the knapsack capacity) and the total value/profit earned is as large as possible.
• Knapsack problem has two variants.
• Binary or 0/1 knapsack : Item cannot be broken down into parts.
• Fractional knapsack : Item can be divided into parts.
• Useful in solving resource allocation problems
• Let X = <x1, x2, x3, . . . , xn> be the set of n items. W = <w1, w2, w3, . . . , wn> and V = <v1, v2, v3, . . . , vn> are the weights and values associated with the items in X, respectively. The knapsack capacity is M.
• Select items one by one from X and fill the knapsack so as to maximize the total value.
Greedy algorithm for Binary Knapsack

Algorithm BINARY_KNAPSACK(W, V, M)
// Items are pre-sorted in decreasing order of pi = vi / wi ratio
S ← Φ // set of selected items
Weight ← 0 // weight of selected items
P ← 0 // earned profit
i ← 1
while i ≤ N do
    if (Weight + W[i]) ≤ M then
        S ← S ∪ I[i]
        Weight ← Weight + W[i]
        P ← P + V[i]
    i ← i + 1

Example: N = 8, P = {11, 21, 31, 33, 43, 53, 55, 65}, W = {1, 11, 21, 23, 33, 43, 45, 55}, M = 110

Item  Weight  Value  pi = vi / wi
I1    1       11     11.00
I2    11      21     1.91
I3    21      31     1.48
I4    23      33     1.44
I5    33      43     1.30
I6    43      53     1.23
I7    45      55     1.22
I8    55      65     1.18

Iteration 1: Weight = (Weight + w1) = 0 + 1 = 1. Weight ≤ M, so select I1. S = {I1}, Weight = 1, P = 0 + 11 = 11
Iteration 2: Weight = (Weight + w2) = 1 + 11 = 12. Weight ≤ M, so select I2. S = {I1, I2}, Weight = 12, P = 11 + 21 = 32
Iteration 3: Weight = (Weight + w3) = 12 + 21 = 33. Weight ≤ M, so select I3. S = {I1, I2, I3}, Weight = 33, P = 32 + 31 = 63
Iteration 4: Weight = (Weight + w4) = 33 + 23 = 56. Weight ≤ M, so select I4. S = {I1, I2, I3, I4}, Weight = 56, P = 63 + 33 = 96
Iteration 5: Weight = (Weight + w5) = 56 + 33 = 89. Weight ≤ M, so select I5. S = {I1, I2, I3, I4, I5}, Weight = 89, P = 96 + 43 = 139
Iteration 6: Weight = (Weight + w6) = 89 + 43 = 132. Weight > M, so reject I6. S = {I1, I2, I3, I4, I5}, Weight = 89, P = 139
Iteration 7: Weight = (Weight + w7) = 89 + 45 = 134. Weight > M, so reject I7. S = {I1, I2, I3, I4, I5}, Weight = 89, P = 139
Iteration 8: Weight = (Weight + w8) = 89 + 55 = 144. Weight > M, so reject I8. S = {I1, I2, I3, I4, I5}, Weight = 89, P = 139

All items have been considered, so the greedy solution is S = {I1, I2, I3, I4, I5} with total weight 89 and profit 139.
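A minimal Python sketch of this ratio-greedy for the 0/1 case (function name mine; note this greedy is a heuristic for 0/1 knapsack and is not guaranteed optimal in general, though it traces the example above exactly):

```python
def binary_knapsack(items, capacity):
    """items: list of (name, weight, value).
    Greedy heuristic: take whole items in decreasing value/weight order."""
    items = sorted(items, key=lambda it: it[2] / it[1], reverse=True)
    chosen, weight, profit = [], 0, 0
    for name, w, v in items:
        if weight + w <= capacity:   # item fits entirely
            chosen.append(name)
            weight += w
            profit += v
    return chosen, weight, profit

names = [f"I{i}" for i in range(1, 9)]
W = [1, 11, 21, 23, 33, 43, 45, 55]
V = [11, 21, 31, 33, 43, 53, 55, 65]
chosen, weight, profit = binary_knapsack(list(zip(names, W, V)), 110)
print(chosen, weight, profit)  # ['I1'..'I5'], weight 89, profit 139
```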
Fractional Knapsack

Algorithm FRACTIONAL_KNAPSACK(W, V, M)
// Items are pre-sorted in decreasing order of pi = vi / wi ratio
S ← Φ
SW ← 0 // weight of selected items
SP ← 0 // profit of selected items
i ← 1
while i ≤ n do
    if (SW + W[i]) ≤ M then
        S ← S ∪ X[i]
        SW ← SW + W[i]
        SP ← SP + V[i]
    else
        frac ← (M – SW) / W[i]
        S ← S ∪ X[i] * frac // Add fraction of item X[i]
        SP ← SP + V[i] * frac // Add fraction of profit
        SW ← SW + W[i] * frac // Add fraction of weight
    i ← i + 1

Example: N = 3, M = 20, V = (24, 25, 15) and W = (18, 15, 20)

Item (xi)  Value (vi)  Weight (wi)  pi = vi / wi
I2         25          15           1.67
I1         24          18           1.33
I3         15          20           0.75

Iteration 1:
SW = (SW + w2) = 0 + 15 = 15. SW ≤ M, so select I2. S = {I2}, SW = 15, SP = 0 + 25 = 25
Iteration 2:
SW + w1 > M, so break down item I1. The remaining capacity of the knapsack is 5 units, so select only 5 units (a fraction of 5/18) of item I1.
frac = (M – SW) / w1 = (20 – 15) / 18 = 5/18
S = {I2, I1 × 5/18}
SP = SP + v1 × frac = 25 + (24 × (5/18)) = 25 + 6.67 = 31.67
SW = SW + w1 × frac = 15 + (18 × (5/18)) = 15 + 5 = 20
The knapsack is full. The selected items {I2, I1 × 5/18} give a profit of 31.67 units.
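The fractional variant can be sketched as follows (function name mine); on the example it takes all of I2 and 5/18 of I1 for a profit of about 31.67:

```python
def fractional_knapsack(items, capacity):
    """items: list of (name, weight, value). Greedy by value/weight ratio;
    the last item may be taken fractionally. Returns (name, fraction) pairs."""
    items = sorted(items, key=lambda it: it[2] / it[1], reverse=True)
    taken, sw, sp = [], 0.0, 0.0
    for name, w, v in items:
        if sw + w <= capacity:          # whole item fits
            taken.append((name, 1.0))
            sw += w
            sp += v
        else:                           # take only the fitting fraction
            frac = (capacity - sw) / w
            if frac > 0:
                taken.append((name, frac))
                sp += v * frac
                sw += w * frac
            break                       # knapsack is full
    return taken, sw, sp

taken, sw, sp = fractional_knapsack(
    [("I1", 18, 24), ("I2", 15, 25), ("I3", 20, 15)], 20)
print(taken, sw, round(sp, 2))  # [('I2', 1.0), ('I1', 0.2777...)] 20.0 31.67
```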
Minimum Spanning Tree Problem
Prim’s Algorithm
1. Initialize the list of unvisited vertices UV = {V1, V2, V3, . . . , Vn}
2. Select any arbitrary vertex from input graph G and call that subgraph the partial MST. Remove the selected vertex from UV.
3. Form a set NE – the set of unvisited neighbour edges of all vertices present in the partial MST.
4. Select an edge e = (u, v) from NE with minimum weight and add it to the partial MST if it does not form a cycle and is not already added. If the addition of edge e forms a cycle, skip that edge and select the next minimum-weight edge from NE; continue until we get an edge e that does not form a cycle. Remove the corresponding added vertices u and v from UV.
5. Go to step 3 and repeat the procedure until UV is empty.
• Prim’s algorithm is preferred when the graph is dense, with E >> V; Kruskal’s performs better on sparse graphs with E ≈ V.
MST-Prim(G, w, r)
    Q = V[G]
    for each u ∈ Q
        key[u] = ∞
    key[r] = 0
    p[r] = NULL
    while (Q not empty)
        u = ExtractMin(Q)
        for each v ∈ Adj[u]
            if (v ∈ Q and w(u,v) < key[v])
                p[v] = u
                key[v] = w(u,v)

Complexity Analysis
• ExtractMin is called Θ(V) times; DecreaseKey is called Θ(E) times.
• Running time = Θ(V) T_ExtractMin + Θ(E) T_DecreaseKey
• Binary heap: Time = Θ(V lg V) + Θ(E lg V) = O(E lg V) (since for a connected graph |E| ≥ |V| – 1)
• Fibonacci heap: Time = O(V lg V + E)

Step 1: UV = {a, b, c, d, e, f}. Let us start with arbitrary vertex a as the initial partial solution.
    Neighbour edges NE with costs: <a, b> 3, <a, f> 5, <a, e> 6. Updated UV = {b, c, d, e, f}
Step 2: Edge <a, b> has minimum cost, so add it.
    NE: <a, f> 5, <a, e> 6, <b, c> 1, <b, f> 4. Updated UV = {c, d, e, f}
Step 3: Edge <b, c> has minimum cost, so add it.
    NE: <a, f> 5, <a, e> 6, <b, f> 4, <c, f> 4, <c, d> 6. Updated UV = {d, e, f}
Step 4: Edges <b, f> and <c, f> have the same minimum cost, so we can add either. Let us add <c, f>.
    NE: <a, f> 5, <a, e> 6, <b, f> 4, <c, d> 6, <f, d> 5, <f, e> 2. Updated UV = {d, e}
Step 5: Edge <f, e> has minimum cost, so add it.
    NE: <a, f> 5, <a, e> 6, <b, f> 4, <c, d> 6, <f, d> 5. Updated UV = {d}
Step 6: Edge <b, f> has minimum cost, but its inclusion in the above partial solution creates a cycle, so skip it and check the next minimum-cost edge, i.e. <a, f>. Inclusion of <a, f> also creates a cycle, so skip it and check the next minimum-cost edge, i.e. <f, d>. The inclusion of <f, d> does not create a cycle, so it is a feasible edge; add it.
    Now UV is empty, so the tree generated in step 6 is the minimum spanning tree of the given graph.
    Cost of solution: w(a, b) + w(b, c) + w(c, f) + w(f, d) + w(f, e) = 3 + 1 + 4 + 5 + 2 = 15
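A runnable heap-based sketch of Prim's algorithm on the example graph (function name and adjacency-list layout are mine; ties between equal-cost edges such as <b, f> and <c, f> may resolve differently than in the trace, but the total cost is still 15):

```python
import heapq

def prim(adj, start):
    """adj: {u: [(v, w), ...]} undirected. Returns (MST edge list, total cost)."""
    visited = {start}
    pq = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(pq)
    mst, cost = [], 0
    while pq and len(visited) < len(adj):
        w, u, v = heapq.heappop(pq)
        if v in visited:
            continue                      # edge would create a cycle
        visited.add(v)
        mst.append((u, v, w))
        cost += w
        for x, wx in adj[v]:              # new neighbour edges
            if x not in visited:
                heapq.heappush(pq, (wx, v, x))
    return mst, cost

edges = [("a","b",3), ("a","f",5), ("a","e",6), ("b","c",1),
         ("b","f",4), ("c","f",4), ("c","d",6), ("f","d",5), ("f","e",2)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))
mst, cost = prim(adj, "a")
print(mst, cost)  # 5 edges, total cost 15
```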
Minimum Spanning Tree Problem
Kruskal’s Algorithm
• The algorithm first sorts all the edges in nondecreasing order of their weight.
• The edge with minimum weight is selected and its feasibility is tested: if including the edge in the partial solution does not form a cycle, the edge is feasible and is added to the partial solution; if it is not feasible, skip it and check the next edge. The process is repeated until all edges are scanned.
1. Let the list of unvisited edges UE = {e1, e2, e3, . . . , em} be the edges sorted in increasing order of their weight.
2. Select the minimum-weight edge emin from input graph G which is not already added.
3. If the addition of edge emin to the partial solution does not form a cycle, add it. Otherwise look for the next minimum-weight edge until a feasible edge is found. Remove checked edges from UE.
4. Go to step 2 and repeat the procedure until all edges in UE are scanned.
Kruskal()
{
    T = ∅;
    for each v ∈ V
        MakeSet(v);
    Sort E by increasing edge weight w
    for each (u,v) ∈ E (in sorted order)
        if FindSet(u) ≠ FindSet(v)
            T = T ∪ {{u,v}};
            Union(FindSet(u), FindSet(v));
}

Complexity Analysis
• Sort edges: O(E lg E)
• O(V) MakeSet()’s, O(E) FindSet()’s, O(E) Union()’s
• The best disjoint-set union algorithm makes these three operations take O((V+E) α(V))
• Overall thus O(E lg V), since for a connected graph |E| ≥ |V| – 1 and α(V) = O(lg V)

Sorted edges are listed in the following table:
Edge  <1, 2>  <3, 5>  <1, 3>  <2, 3>  <3, 4>  <4, 5>  <2, 4>
Cost    1       2       3       3       4       5       6

Initially, the set of unvisited edges UE = {<1, 2>, <3, 5>, <1, 3>, <2, 3>, <3, 4>, <4, 5>, <2, 4>}

Step 1: The minimum-cost edge is <1, 2>, so add it to the MST and remove it from UE. UE = {<3, 5>, <1, 3>, <2, 3>, <3, 4>, <4, 5>, <2, 4>}
Step 2: The minimum-cost edge is <3, 5>, so add it to the MST and remove it from UE. UE = {<1, 3>, <2, 3>, <3, 4>, <4, 5>, <2, 4>}
Step 3: The minimum-cost edge is <1, 3>, so add it to the MST and remove it from UE. UE = {<2, 3>, <3, 4>, <4, 5>, <2, 4>}
Step 4: The minimum-cost edge is <2, 3>, but its inclusion creates a cycle, so remove it from UE. So, UE = {<3, 4>, <4, 5>, <2, 4>}. The next minimum-cost edge is <3, 4>, and its inclusion does not form a cycle, so add it to the MST and remove it from UE. UE = {<4, 5>, <2, 4>}
Step 5: The minimum-cost edge is <4, 5>, but its inclusion creates a cycle, so remove it from UE. So, UE = {<2, 4>}. The minimum-cost edge is now <2, 4>, but its inclusion also creates a cycle, so remove it from UE. So, UE = {}. UE is empty, so the tree generated in step 4 is the minimum spanning tree of the given graph.
Let w(u, v) represent the weight of edge (u, v).
Cost of solution: w(1, 2) + w(1, 3) + w(3, 4) + w(3, 5) = 1 + 3 + 4 + 2 = 10
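A compact Kruskal sketch with a union-find (function name and edge-tuple layout are mine; the find here uses path compression, a simpler stand-in for the best disjoint-set structure mentioned above). On the example graph it returns the same four edges with cost 10:

```python
def kruskal(n, edges):
    """edges: list of (w, u, v) with vertices numbered 1..n."""
    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x
    mst, cost = [], 0
    for w, u, v in sorted(edges):           # nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # feasible: no cycle
            parent[ru] = rv                 # union
            mst.append((u, v, w))
            cost += w
    return mst, cost

edges = [(1,1,2), (2,3,5), (3,1,3), (3,2,3), (4,3,4), (5,4,5), (6,2,4)]
mst, cost = kruskal(5, edges)
print(mst, cost)  # [(1, 2, 1), (3, 5, 2), (1, 3, 3), (3, 4, 4)] 10
```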
Single Source Shortest Path (SSSP) problem
Dijkstra’s Algorithm: For a given source vertex s, the algorithm finds the shortest path to every
other vertex v in the graph.
• Assumption : Weight of all edges is non-negative.
• Initialize the distance of the source vertex to zero and of all remaining vertices to infinity.
• Set the source node as the current node and put all remaining nodes in the unvisited vertex list. Compute the tentative distance of every immediate neighbour of the current node.
• If the newly computed value is smaller than the old value, then update it.
• For example, let C be the current node, whose distance from source S is dist(S, C) = 5. Consider N, a neighbour of C, where the weight of edge (C, N) is 3. The distance of N from the source via C would then be 8. If the previously computed distance of N from the source is greater than 8, relax edge (S, N) and update it to 8; otherwise don’t update it.

Case 1: d(S, N) = 11. Then d(S, C) + d(C, N) < d(S, N), so relax edge (S, N) and update d(S, N) = 8.
Case 2: d(S, N) = 7. Then d(S, C) + d(C, N) > d(S, N), so don’t update d(S, N).
• When all the neighbours of the current node are explored, mark it as visited and remove it from the unvisited vertex list. Then select the vertex in the unvisited list with minimum distance as the new current node and repeat the procedure.
• Stop when the destination node is reached or when the unvisited vertex list becomes empty.
DIJKSTRA_SHORTEST_PATH(G, s, t)
// s is source vertex, t is target vertex, π[u] stores the parent/previous node of u
dist[s] ← 0
π[s] ← NIL
for each vertex v ∈ V do
    if v ≠ s then
        dist[v] ← ∞
        π[v] ← undefined
    ENQUEUE(v, Q) // insert v into queue Q
while Q is not empty do
    u ← vertex in Q having minimum dist[u]
    if u == t then
        break
    DEQUEUE(u, Q) // Remove u from queue Q
    for each adjacent node v of u do
        val ← dist[u] + weight(u, v)
        if val < dist[v] then
            dist[v] ← val
            π[v] ← u

Complexity Analysis
• The first for loop does initialization in O(|V|) time.
• As there are |V| nodes in the graph, the size of queue Q would be |V|, and hence the while loop iterates |V| times in the worst case.
• The for loop inside the while loop runs at most |V| times, because a node can have at most |V| – 1 neighbours.
• The worst-case upper bound on the running time of this algorithm is therefore O(|V|²).

Note: Dijkstra’s algorithm cannot handle negative edge weights.
Source vertex is A.
Vertex u A B C D E F G H
dist[u] 0 ∞ ∞ ∞ ∞ ∞ ∞ ∞
π[u] NIL NIL NIL NIL NIL NIL NIL NIL

Iteration 1:
u = unprocessed vertex in Q having minimum dist[u] = A
Adjacent[A] = {B, E, F}
val[B] = dist[A] + weight(A, B) = 0 + 1 = 1. Here, val[B] < dist[B], so update dist[B] = 1 and π[B] = A
val[E] = dist[A] + weight(A, E) = 0 + 4 = 4. Here, val[E] < dist[E], so update dist[E] = 4 and π[E] = A
val[F] = dist[A] + weight(A, F) = 0 + 8 = 8. Here, val[F] < dist[F], so update dist[F] = 8 and π[F] = A

Vertex u  A    B    C    D    E    F    G    H
dist[u]   0    1    ∞    ∞    4    8    ∞    ∞
π[u]      NIL  A    NIL  NIL  A    A    NIL  NIL

Iteration 2:
u = unprocessed vertex in Q having minimum dist[u] = B
Adjacent[B] = {C, F, G}
val[C] = dist[B] + weight(B, C) = 1 + 2 = 3. Here, val[C] < dist[C], so update dist[C] = 3 and π[C] = B
val[F] = dist[B] + weight(B, F) = 1 + 6 = 7. Here, val[F] < dist[F], so update dist[F] = 7 and π[F] = B
val[G] = dist[B] + weight(B, G) = 1 + 6 = 7. Here, val[G] < dist[G], so update dist[G] = 7 and π[G] = B

Vertex u  A    B    C    D    E    F    G    H
dist[u]   0    1    3    ∞    4    7    7    ∞
π[u]      NIL  A    B    NIL  A    B    B    NIL
Iteration 3:
u = unprocessed vertex in Q having minimum dist[u] = C
Adjacent[C] = {D, G}
val[D] = dist[C] + weight(C, D) = 3 + 1 = 4. Here, val[D] < dist[D], so update dist[D] = 4 and π[D] = C
val[G] = dist[C] + weight(C, G) = 3 + 2 = 5. Here, val[G] < dist[G], so update dist[G] = 5 and π[G] = C

Vertex u  A    B    C    D    E    F    G    H
dist[u]   0    1    3    4    4    7    5    ∞
π[u]      NIL  A    B    C    A    B    C    NIL

Iteration 4:
u = unprocessed vertex in Q having minimum dist[u] = E
Adjacent[E] = {F}
val[F] = dist[E] + weight(E, F) = 4 + 5 = 9. Here, val[F] > dist[F], so no change in the table.

Iteration 5:
u = unprocessed vertex in Q having minimum dist[u] = D
Adjacent[D] = {G, H}
val[G] = dist[D] + weight(D, G) = 4 + 1 = 5. Here, val[G] = dist[G], so don’t update dist[G]
val[H] = dist[D] + weight(D, H) = 4 + 4 = 8. Here, val[H] < dist[H], so update dist[H] = 8 and π[H] = D

Vertex u  A    B    C    D    E    F    G    H
dist[u]   0    1    3    4    4    7    5    8
π[u]      NIL  A    B    C    A    B    C    D
Iteration 6:
u = unprocessed vertex in Q having minimum dist[u] = G
Adjacent[G] = {F, H}
val[F] = dist[G] + weight(G, F) = 5 + 1 = 6. Here, val[F] < dist[F], so update dist[F] = 6 and π[F] = G
val[H] = dist[G] + weight(G, H) = 5 + 1 = 6. Here, val[H] < dist[H], so update dist[H] = 6 and π[H] = G

Vertex u  A    B    C    D    E    F    G    H
dist[u]   0    1    3    4    4    6    5    6
π[u]      NIL  A    B    C    A    G    C    G

Iteration 7:
u = unprocessed vertex in Q having minimum dist[u] = F
Adjacent[F] = { } (all neighbours already processed), so no change in the table.

Iteration 8:
u = unprocessed vertex in Q having minimum dist[u] = H
Adjacent[H] = { }, so no change in the table.

In the table, π[u] indicates the parent node of vertex u. The shortest path tree is shown in the following figure.
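The trace above can be reproduced with a heap-based Dijkstra sketch (function name is mine; the edge list is reconstructed from the weights used in the iterations). It yields the same final distance table:

```python
import heapq

def dijkstra(adj, s):
    """adj: {u: [(v, w), ...]}. Returns (dist, parent) maps from source s."""
    dist = {u: float("inf") for u in adj}
    dist[s] = 0
    parent = {s: None}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:           # relax edge (u, v)
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(pq, (dist[v], v))
    return dist, parent

edges = [("A","B",1), ("A","E",4), ("A","F",8), ("B","C",2), ("B","F",6),
         ("B","G",6), ("C","D",1), ("C","G",2), ("D","G",1), ("D","H",4),
         ("E","F",5), ("G","F",1), ("G","H",1)]
adj = {v: [] for e in edges for v in e[:2]}
for u, v, w in edges:
    adj[u].append((v, w))
    adj[v].append((u, w))
dist, parent = dijkstra(adj, "A")
print(dist)  # A:0 B:1 C:3 D:4 E:4 F:6 G:5 H:6
```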
Strip Packing Problem
• The strip packing problem is a 2-dimensional geometric minimization
problem: given a set of axis-aligned rectangles and a strip of bounded width
and infinite height, determine an overlap-free packing of the rectangles
into the strip that minimizes its height.
• Next-Fit Decreasing-Height (NFDH)
• Sort the items by order of non-increasing height.
• Starting at position (0,0), the algorithm places the items next to each
other in the strip until the next item will overlap the right border of the
strip.
• At this point, the algorithm defines a new level at the top of the tallest
item in the current level and places the items next to each other in this
new level.
• First-Fit Decreasing-Height (FFDH)
• Works similar to the NFDH algorithm.
• However, when placing the next item, the algorithm scans the levels
from bottom to top and places the item in the first level on which it will
fit.
• A new level is only opened if the item does not fit in any previous ones.
• Best-Fit Decreasing Height (BFDH)
• BFDH packs the next item i (in non-increasing height) on the level, among those that can
accommodate, for which the residual horizontal space is the minimum.
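The NFDH level scheme described above can be sketched as follows (function name, tuple layout, and the sample rectangles are my own illustration; each rectangle is assumed to be no wider than the strip):

```python
def nfdh(rects, strip_width):
    """rects: list of (w, h) axis-aligned rectangles, each with w <= strip_width.
    Next-Fit Decreasing-Height: returns (placements, total strip height),
    where each placement is (x, y, w, h)."""
    rects = sorted(rects, key=lambda r: r[1], reverse=True)  # non-increasing height
    placements, x, level_y, level_h = [], 0, 0, 0
    for w, h in rects:
        if x + w > strip_width:      # item would overlap the right border
            level_y += level_h       # open a new level at the top of the tallest item
            x, level_h = 0, 0
        if level_h == 0:
            level_h = h              # first (tallest) item defines the level height
        placements.append((x, level_y, w, h))
        x += w
    return placements, level_y + level_h

placements, height = nfdh([(3, 4), (4, 5), (2, 2), (3, 3)], strip_width=6)
print(placements, height)  # three levels, total height 11
```

FFDH differs only in that each item is first tried on every existing level (bottom to top) before a new one is opened, which can only reduce the height.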
Bin Packing Problem
• In the 2-D bin packing problem, we are given an unlimited number of finite identical rectangular bins,
each having width W and height H, and a set of n rectangular items with width wj <= W and height hj,
for 1 <= j <= n.
• The problem is to pack, without overlap, all the items into the minimum number of bins.
• The items cannot be rotated.
• Most of the off-line algorithms in the literature are of the greedy type, and can be classified into two
families:
• One phase algorithms directly pack the items into the finite bins;
• Two phase algorithms start by packing the items into a single strip, i.e., a bin having width W and
infinite height. In the second phase, the strip solution is used to construct a packing into finite
bins.
Two phase algorithms
• Hybrid First-Fit (HFF)
• In the first phase, a strip packing is obtained by the FFDH algorithm.
• The second phase adopts the First-Fit Decreasing (FFD) algorithm, which packs an item to the
first bin that it fits or start a new bin otherwise.

• Similarly, Hybrid Next-Fit and Hybrid Best-Fit algorithms can be used for 2-D bin packing
One phase algorithms
• Finite First-Fit (FFF)
• An item is packed on the lowest level of the first bin where it fits; if no level can accommodate it,
a new level is created in the first bin having sufficient vertical space, otherwise, the new level is
created in a new bin.

• Similarly, Finite Next-Fit and Finite Best-Fit algorithms can be used for 2-D bin packing
Huffman Encoding
• Find prefix code for given characters occurring with certain frequency.
Prefix Code:
• A variable-length code should be such that decoding does not create any ambiguity.
• Take, for example, the codes A = 0, B = 01 and C = 001.
• If our message is ABC, then its encoding will be 001001.
• As A is a prefix of B and C, and AB is a prefix of C, it is difficult to decode the string: 001001 can be decoded in many ways, such as CC, ABAB, CAB, or ABC.
• To prevent such ambiguity, the code of any character must not be a prefix of any other code. Such codes are known as prefix codes.
• Prefix codes allow unambiguous decompression of text data; Huffman’s algorithm constructs a prefix code that is optimal for the given character frequencies.
• We can construct the encoded message simply by concatenating the prefix code of each character of the message.
• The decoding process traverses the Huffman tree from root to leaf repeatedly until the encoded string is exhausted.
HUFFMAN_CODE(PQ)
// PQ is the priority queue, in which the priority is the frequency of each character.
for i ← 1 to n – 1 do
    z ← CreateNode()
    x ← LeftChild[z] ← dequeue(PQ)
    y ← RightChild[z] ← dequeue(PQ)
    z.priority ← x.priority + y.priority
    enqueue(PQ, z)
end
return dequeue(PQ)

Complexity Analysis of Huffman Coding
• Building the priority queue of n characters according to their frequency can be achieved in O(n log2 n) time.
• In each of the n – 1 iterations, the two least-frequent elements are removed and their merged node is inserted back; with a min-heap, each such operation takes O(log2 n) time.
• So T(n) = O(n log2 n) + O(n log2 n) = O(n log2 n).
Example: Given that for character set S = <A, B, C, D, E> occurrence in text file is P = <35, 12, 8, 25,
20>. Find prefix code for each symbol.

• Step 1 : Arrange all characters in decreasing order of their frequency. S = <A, D, E, B, C> and
corresponding P = <35, 25, 20, 12, 8>

• Step 2 : Merge last two nodes and arrange it again in order

• Step 3 : Merge last two nodes and arrange it again in order,


• Step 4 : Merge last two nodes and arrange it again in order,

• Step 5 : Merge last two nodes and arrange it again in order.


Label all left arcs with 1 and all right arcs with 0.

• Visit each leaf node and read the edge values along its path from the root to find its prefix code.

Character Prefix Code


A 11
B 001
C 000
D 10
E 01
• A character is 1 byte long = 8 bits.
• S = <A, B, C, D, E>, occurrence in the text file is P = <35, 12, 8, 25, 20>, i.e. 100 characters in total.
This requires 8 × 100 = 800 bits.
• Since 0 and 1 take only 1 bit each, with Huffman encoding the string requires:
Total bits = (freq of ‘A’) × (length of code for ‘A’) + (freq of ‘B’) × (length of code for ‘B’) + . . .
Total bits = 35×2 + 12×3 + 8×3 + 25×2 + 20×2 = 220 bits only!
• Average code length = total no. of bits / total no. of symbols = 220/100 = 2.2,
i.e. the average code length is 2.2 bits per symbol as compared to 8 bits per
symbol before encoding.
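The construction can be sketched with Python's heapq (function name and node representation are mine). The exact 0/1 bit assignments depend on tie-breaking and may differ from the table above, but the code lengths, and therefore the 220-bit total, come out the same:

```python
import heapq

def huffman_codes(freqs):
    """freqs: {symbol: frequency}. Returns {symbol: prefix code}.
    Internal nodes are (left, right) tuples; a counter breaks frequency ties."""
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least-frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"      # single-symbol edge case
    assign(heap[0][2], "")
    return codes

freqs = {"A": 35, "B": 12, "C": 8, "D": 25, "E": 20}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)  # code lengths A:2 B:3 C:3 D:2 E:2, total 220 bits
```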
Shannon Fano algorithm
• An entropy coding technique used for lossless data compression.
• It uses the probabilities of occurrence of a character and assigns a unique variable-length code
to each of them.
• If c is a character, Probability(c) = Frequency(c) / sum of frequencies
1. We start by calculating the probability of occurrence of each character and sort the characters in
increasing order of frequency.
2. Divide the characters into two halves (a left half and a right half) such that the sums of frequencies
are as close as possible.
3. Repeat the above step for each half until single elements are left.
fanoShannon pseudocode:
1. Find the probability of occurrence of each character in the string
2. Sort the array of characters in the increasing order of their probabilities, say A
3. fanoShannon(A):
4.     If (size(A) == 1)
5.         return
6.     Divide A into left and right such that the difference between the sums of probabilities of the left half and the right half is minimum
7.     Append 0 to the codes of the left half
8.     Append 1 to the codes of the right half
9.     fanoShannon(left)
10.    fanoShannon(right)
Optimal Merge Pattern
• Merge n sorted sequences of different lengths into one sequence while minimizing reads.
• A two-way merge compares the elements of two sorted lists of sizes m1 and m2 and puts them in a new sorted list, needing m1 + m2 comparisons.
• Let S = {s1, s2, …, sn} be the set of sequences to be merged.
• The greedy approach selects the two minimum-length sequences si and sj from S.
• The new set S’ is defined as S’ = (S – {si, sj}) ∪ {si + sj}.
• This procedure is repeated until only one sequence is left.

Algorithm OPTIMAL_MERGE_PATTERNS(S)
Create min heap H from S
while H.length > 1 do
    min1 ← minDel(H) // minDel returns the minimum element from H and deletes it from H
    min2 ← minDel(H)
    NewNode.Data ← min1 + min2
    NewNode.LeftChild ← min1
    NewNode.RightChild ← min2
    Insert(NewNode, H) // Insert node NewNode in heap H

Complexity Analysis
Construction of the heap takes O(n) time.
T(n) = O(n) × max(O(findmin), O(insert))
Case 1: If the list is not sorted: O(findmin) = O(n), O(insert) = O(1). So T(n) = (n – 1) × n = O(n²)
Case 2: If the list is sorted:
    Case 2.1: List represented as an array: O(findmin) = O(1), O(insert) = O(n). So T(n) = (n – 1) × n = O(n²)
    Case 2.2: List represented as a min-heap: O(findmin) = O(1), O(insert) = O(log n). So T(n) = (n – 1) × log n = O(n log n)
• Consider the sequence {3, 5, 9, 11, 16, 18, 20}. Find the optimal merge pattern for this data.
• Step 1: Given sequence is {3, 5, 9, 11, 16, 18, 20}
• Step 2: Merge the two smallest sequences and sort in ascending order
• Step 3: Merge the two smallest sequences and sort in ascending order
• Step 4: Merge the two smallest sequences and sort in ascending order
• Step 5: Merge the two smallest sequences and sort in ascending order
• Step 6: Merge the two smallest sequences and sort in ascending order
• Step 7: Merge the two smallest sequences and sort in ascending order
• Total time = 8 + 17 + 27 + 35 + 47 + 82 = 216
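The merge steps above amount to repeatedly combining the two smallest remaining lengths, which a min-heap sketch captures in a few lines (function name mine; it returns only the total cost, not the merge tree):

```python
import heapq

def optimal_merge_cost(lengths):
    """Total read/comparison cost of repeatedly two-way merging the
    two shortest sequences until one remains."""
    heap = list(lengths)
    heapq.heapify(heap)              # O(n) heap construction
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # two smallest sequences
        b = heapq.heappop(heap)
        total += a + b               # cost of this two-way merge
        heapq.heappush(heap, a + b)  # merged sequence goes back in
    return total

print(optimal_merge_cost([3, 5, 9, 11, 16, 18, 20]))  # 216
```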
