Advanced Business Data Structures


P.O. Box 342-01000 Thika


Email: info@mku.ac.ke
Web: www.mku.ac.ke

COURSE CODE: BIT 4107


COURSE TITLE: ADVANCED BUSINESS DATA
STRUCTURES AND COMPUTER ALGORITHMS

Instructional Manual for BBIT – Distance Learning

Prepared by Paul M Kathale, mutindaz@yahoo.com

1|Page
TABLE OF CONTENTS
FOUNDATIONS TO DATA STRUCTURES ............................................................................................. 6
Basic Definitions ....................................................................................................................................... 6
Structural and Behavioral Definitions....................................................................................................... 7
Abstract Data Types (ADT) ...................................................................................................................... 8
Categories of data types ............................................................................................................................ 9
Structural Relationships ............................................................................................................................ 9
Why study Data structures ...................................................................................................................... 12
INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS ......................................................... 13
The Classic Multiplication Algorithm .................................................................................................... 13
Algorithm's Performance ........................................................................................................................ 14
Θ-Notation (Same order) ........................................................................................................................ 15
Ο-Notation (Upper Bound) ..................................................................................................................... 16
Ω-Notation (Lower Bound)..................................................................................................................... 17
Algorithm Analysis ................................................................................................................................. 17
Optimality ............................................................................................................................................... 18
Reduction ................................................................................................................................................ 18
MATHEMATICS FOR ALGORITHMICS ................................................................................................ 19
Sets .......................................................................................................................................................... 19
Union of Sets....................................................................................................................................... 20
Symmetric difference .......................................................................................................................... 22
Sequences ............................................................................................................................................ 22
Linear Inequalities and Linear Equations ............................................................................................... 24
Inequalities .......................................................................................................................................... 24
Fundamental Properties of Inequalities............................................................................................... 24
Solution of Inequality ......................................................................................................................... 24
Geometric Interpretation of Inequalities ............................................................................................. 25
One Unknown ..................................................................................................................................... 26
Two Unknowns ................................................................................................................................... 26
n Equations in n Unknowns ................................................................................................................ 28
Solution of a Triangular System ......................................................................................................... 30
Back Substitution Method................................................................................................................... 30
Gaussian Elimination .......................................................................................................................... 31

Second Part ......................................................................................................................................... 32
Determinants and systems of linear equations .................................................................................... 33
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Greedy algorithm ........................................ 34
Greedy Approach ................................................................................................................................ 34
Characteristics and Features of Problems solved by Greedy Algorithms ........................................... 35
Definitions of feasibility ..................................................................................................................... 36
1. An Activity Selection Problem ................................................................................................... 36
Problem Statement .............................................................................................................................. 36
Greedy Algorithm for Selection Problem ........................................................................................... 37
2. Minimum Spanning Tree ............................................................................................................ 40
3. Kruskal's Algorithm .................................................................................................................... 43
4. Prim's Algorithm ......................................................................................................................... 49
5. Dijkstra's Algorithm .................................................................................................................... 50
Analysis .............................................................................................................................................. 50
Example: Step by Step operation of Dijkstra algorithm. .................................................................... 50
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer Algorithm....................... 54
Binary Search (simplest application of divide-and-conquer).............................................................. 54
Sequential Search ................................................................................................................................ 54
Analysis .............................................................................................................................................. 55
Binary Search ...................................................................................................................................... 55
Analysis .............................................................................................................................................. 55
Iterative Version of Binary Search...................................................................................................... 55
Analysis .............................................................................................................................................. 56
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming Algorithm ............... 57
The Principle of Optimality ................................................................................................................ 59
1. Matrix-chain Multiplication Problem ........................................................................................ 62
2. 0-1 Knapsack Problem ................................................................................................................ 73
3. Knapsack Problem ...................................................................................................................... 74
ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Amortized Analysis ...................................... 77
1. Aggregate Method .............................................................................................................................. 77
Aggregate Method Characteristics ...................................................................................................... 77
2. Accounting Method ............................................................................................................................ 79

3. Potential Method ................................................................................................................................. 83
GRAPH ALGORITHMS ............................................................................................................................ 97
Graph Theory is an area of mathematics that deals with following types of problems ...................... 97
Introduction to Graphs ............................................................................................................................ 97
Definitions............................................................................................................................................... 97
Graphs, vertices and edges .................................................................................................................. 97
Undirected and directed graphs........................................................................................................... 97
Neighbours and adjacency .................................................................................................................. 97
An example ......................................................................................................................................... 97
Mathematical definition ...................................................................................................................... 98
Digraph ................................................................................................................................................... 98
1. Transpose .......................................................................................................................................... 100
2. Square ............................................................................................................................................... 101
3. Incidence Matrix ............................................................................................................................... 102
Types of Graph Algorithms .................................................................................................................. 103
1. Breadth-First Search Traversal Algorithm ................................................................................ 103
2. Depth-First Search .................................................................................................................... 110
3. Strongly Connected Components.............................................................................................. 118
4. Euler Tour ................................................................................................................................. 124
Running Time of Euler Tour ............................................................................................................. 126
AUTOMATA THEORY .......................................................................................................................... 127
What is Automata Theory? ................................................................................................................... 127
The Central Concepts of Automata Theory .......................................................................................... 128
Languages ............................................................................................................................................. 129
Structural expressions ........................................................................................................................... 130
Proofs .................................................................................................................................................... 131
Terminology ...................................................................................................................................... 131
Hints for Finding Proofs ................................................................................................................... 131
Proving techniques ................................................................................................................................ 133
By contradiction ................................................................................................................................ 133
By induction ...................................................................................................................................... 134
Proof by Induction: Example ............................................................................................................ 135
Proof by Construction ....................................................................................................................... 135

“If-and-Only-If” statements .................................................................................................................. 136
REFERENCES ......................................................................................................................................... 137

FOUNDATIONS TO DATA STRUCTURES

Basic Definitions
Data structures

This is the study of methods of representing objects, the design of algorithms to manipulate the
representations, the proper encapsulation of objects in a reusable form, and the evaluation of the
cost of the implementation, including the measurement of the complexity of the time and space
requirements.
Algorithms

 A finite step-by-step procedure to solve a given problem.


 A sequence of computational steps that transform input into output.
Abstraction

This is the separation of what a data structure represents and what an algorithm
accomplishes from the implementation details of how things are actually carried out, i.e., hiding
the unnecessary details.

Data Abstraction
Hiding of the representational details

Data Types

A data type consists of a domain (a set of values) and a set of operations; it determines the kind
of data a variable may “hold”.

Example 1:

Boolean or logical data type provided by most programming languages.

 Two values: true, false.


 Many operations, including AND, OR, NOT, etc.

Example 2:

The data type fraction. How can we specify the domain and operations that define fractions? It
seems straightforward to name the operations; fractions are numbers so all the normal arithmetic
operations apply, such as addition, multiplication, and comparison. In addition there might be
some fraction-specific operations such as normalizing a fraction by removing common terms
from its numerator and denominator - for example, if we normalized 6/9 we'd get 2/3.

But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?

Structural and Behavioral Definitions


There are two different approaches to specifying a domain: we can give a structural definition,
or we can give a behavioral definition. Let us see what these two are like.

Structural Definition of the domain for `Fraction'

The value of a fraction is made of three parts (or components):

 A sign, which is either + or -


 A numerator, which may be any non-negative integer
 A denominator, which may be any positive integer (not zero, not negative).
This is called a structural definition because it defines the values of type `fraction' by imposing
an internal structure on them (they have 3 parts...). The parts themselves have specific types, and
there may be further constraints. For example, we could have insisted that a fraction's numerator
and denominator have no common divisor (in that case we wouldn't need the normalize
operation - 6/9 would not be a fraction by this definition).
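A structural definition translates naturally into a record type. A minimal Python sketch (the class and field names are illustrative, not from the manual; normalization uses the standard gcd trick):

```python
from dataclasses import dataclass
from math import gcd

@dataclass
class Fraction:
    sign: int          # +1 or -1
    numerator: int     # any non-negative integer
    denominator: int   # any positive integer (not zero, not negative)

    def __post_init__(self):
        # Enforce the three structural constraints from the definition.
        if self.sign not in (+1, -1):
            raise ValueError("sign must be +1 or -1")
        if self.numerator < 0:
            raise ValueError("numerator must be non-negative")
        if self.denominator <= 0:
            raise ValueError("denominator must be positive")

    def normalized(self):
        # Remove common terms, e.g. 6/9 -> 2/3.
        g = gcd(self.numerator, self.denominator)
        return Fraction(self.sign, self.numerator // g, self.denominator // g)
```

Had we insisted that numerator and denominator share no common divisor, the check would simply move into `__post_init__` and `normalized` would become unnecessary.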

Behavioral Definition of the domain for `Fraction'

The alternative approach to defining the set of values for fractions does not impose any internal
structure on them. Instead it just adds an operation that creates fractions out of other things, such
as

CREATE_FRACTION(N,D)

Where N is any integer, D is any non-zero integer.

The values of type fraction are defined to be the values that are produced by this function for any
valid combination of inputs.

The parameter names were chosen to suggest its intended behavior:


CREATE_FRACTION(N,D) should return a value representing the fraction N/D (N for
numerator, D for denominator).

How do we guarantee that CREATE_FRACTION(N,D) actually returns the fraction N/D?

The answer is that we have to constrain the behavior of this function, by relating it to the other
operations on fractions. For example, one of the key properties of multiplication is that:

NORMALIZE ((N/D) * (D/N)) = 1/1

This turns into a constraint on CREATE_FRACTION:

NORMALIZE (CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) =


CREATE_FRACTION(1,1)

So you see CREATE_FRACTION cannot be just any function; its behavior is highly constrained,
because we can write down lots and lots of constraints like this.

And that's the reason we call this sort of definition behavioral, because the definition is strictly in
terms of a set of operations and constraints or axioms relating the behavior of the operations to
one another.
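The behavioral style can be sketched by picking some hidden representation and checking the axioms against it. In this Python sketch the pair representation and the lowercase helper names (`create_fraction`, `multiply`, `normalize`) are assumptions chosen to mirror the text:

```python
from math import gcd

def create_fraction(n, d):
    """Make a fraction n/d; the internal form is an implementation detail."""
    assert d != 0
    return (n, d)            # one possible hidden representation

def multiply(f, g):
    return (f[0] * g[0], f[1] * g[1])

def normalize(f):
    n, d = f
    g = gcd(n, d)
    n, d = n // g, d // g
    if d < 0:                # keep any sign in the numerator
        n, d = -n, -d
    return (n, d)

# The axiom from the text:
# NORMALIZE(CREATE_FRACTION(N, D) * CREATE_FRACTION(D, N)) = CREATE_FRACTION(1, 1)
assert normalize(multiply(create_fraction(3, 4), create_fraction(4, 3))) == create_fraction(1, 1)
```

Any other representation that satisfies the same axioms would serve equally well, which is exactly the point of a behavioral definition.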

Abstract Data Types (ADT)


 An Abstract Data Type (ADT) defines data together with the operations.
 ADT is specified independently of any particular implementation. ADT depicts the basic
nature or concept of the data structure rather than the implementation details of the data.
 Examples of ADTs: stack, queue, list, graphs, trees.
 ADTs are implemented using either an array or a linked list.
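For instance, a stack ADT fixes the operations while hiding the backing store. A brief Python sketch, using an array-backed implementation (one of the two options mentioned above); the class is illustrative, not from the manual:

```python
class Stack:
    """Stack ADT: users see only push/pop/peek/is_empty; the backing
    store (here a Python list used as a dynamic array) is hidden and
    could equally be a linked list without changing the interface."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items
```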

Categories of data types
 Atomic/Basic data types
 Structured data types
 Abstract Data types

Atomic/Simple Data Types

 These are data types that are defined without imposing any structure on their values.
 Example
o Boolean
o Integer
o Character
o Double
 They are used to implement structured data types.

Structured Data Types


 The opposite of atomic is structured. A structured data type has a definition that imposes
structure upon its values. As we saw above, fractions normally are a structured data type.
 In structured data types, there is an internal structural relationship, or organization,
that holds between the components.
Example,

Think of an array as a structured type, with each position in the array being a component, then
there is a structural relationship of `followed by': we say that component N is followed by
component N+1.

| N | N+1 | N+2 | N+3 | … | N+i |

Structural Relationships
 Many structured data types do have an internal structural relationship, and these can be
classified according to the properties of this relationship.

Linear Structure:

The most common organization for components is a linear structure. A structure is linear if it has
these 2 properties:

 Property P1: Each element is `followed by' at most one other element.
 Property P2: No two elements are `followed by' the same element.
An array is an example of a linearly structured data type. We generally write a linearly structured
data type like this: A->B->C->D (this is one value with 4 parts).
 Counter example 1 (violates P1): A points to B and C B<-A->C
 Counter example 2 (violates P2): A and B both point to C A->C<-B
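Properties P1 and P2 can be checked mechanically over a `followed by' relation given as a set of arcs. A small Python sketch (the helper name is hypothetical):

```python
def is_linear(arcs):
    """arcs is a set of (x, y) pairs meaning `x is followed by y'.
    P1: each element is followed by at most one other element
        (no source appears twice).
    P2: no two elements are followed by the same element
        (no target appears twice)."""
    sources = [a for a, _ in arcs]
    targets = [b for _, b in arcs]
    p1 = len(sources) == len(set(sources))
    p2 = len(targets) == len(set(targets))
    return p1 and p2

assert is_linear({("A", "B"), ("B", "C"), ("C", "D")})   # A->B->C->D
assert not is_linear({("A", "B"), ("A", "C")})           # violates P1
assert not is_linear({("A", "C"), ("B", "C")})           # violates P2
```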

Tree Structure

In a tree structure, an element can point to more than one other element, but no two elements can
point to the same element. That is,
Dropping constraint P1: if we drop the first constraint and keep the second, we get a tree
structure or hierarchy: no two elements are followed by the same element. This is a very
common structure too, and extremely useful.

Counter example 1 is a tree, but counter example 2 is not.

A is followed by B, C and D; B by E and F; C by G. We are not allowed to add any more arcs that
point to any of these nodes (except possibly A; see cyclic structures below).

Graph Structure

Graph: a non-linear structure in which a component may have more than one predecessor and
more than one successor.

Dropping both P1 and P2:

If we drop both constraints, we get a graph. In a graph, there are no constraints on the relations
we can define.

Cyclic Structures:
All the examples we've seen are acyclic. This means that there is no sequence of arrows that
leads back to where it started. Linear structures are usually acyclic, but cyclic ones are not
uncommon.

Example of a cyclic linear structure: A → B → C → D → A

Trees are virtually always acyclic.

Graphs are often cyclic, although the special properties of acyclic graphs make them an
important topic of study.

Example: Add an edge from G to D, and from E to A.

Why study Data structures


 Helps to understand how data is organized and stored. This is essential for creating
efficient algorithms.
 Gives designers a clear notion of relative advantages and disadvantages of each type of
data structure.
 Gives ability to make correct decisions regarding which data structure to use based on the
following issues:
o Run time – Number of operations to perform a given task
o Memory & secondary storage utilization
o Developmental cost of the program - total person-hour invested
i.e., it helps in making trade-offs among these three issues, since no single data structure is best
in all cases.

 The study of data structures exposes designers/students to a vast collection of tried and
proven methods used for designing efficient programs.

INTRODUCTION TO DESIGN AND ALGORITHM ANALYSIS

An algorithm, named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa al-
Khwarizmi, is defined as follows:
 An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
 An algorithm is a finite step-by-step procedure to achieve a required result.
 An algorithm is a sequence of computational steps that transform the input into the
output.
 An algorithm is a sequence of operations performed on data that have to be organized in
data structures.
 An algorithm is an abstraction of a program to be executed on a physical machine (model
of Computation).
The most famous algorithm in history dates well before the time of the ancient Greeks: Euclid's
algorithm for calculating the greatest common divisor of two integers. It appeared as the solution
to Proposition 2 in Book VII of Euclid's "Elements." Euclid's "Elements" consists of thirteen
books, which contain a total of 465 propositions.
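Euclid's algorithm is short enough to state directly; a Python sketch:

```python
def euclid_gcd(a, b):
    """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b)
    until the remainder is zero; the last non-zero value is the gcd."""
    while b:
        a, b = b, a % b
    return a

print(euclid_gcd(48, 18))   # greatest common divisor of 48 and 18
```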

The Classic Multiplication Algorithm


1. Multiplication, the American way:
Multiply the multiplicand one after another by each digit of the multiplier taken from right to
left.

2. Multiplication, the English way:


Multiply the multiplicand one after another by each digit of the multiplier taken from left to
right.

Algorithmics is a branch of computer science that consists of designing and analyzing computer
algorithms.
1. The “design” pertains to:
i. The description of the algorithm at an abstract level by means of a pseudo-language, and
ii. A proof of correctness, that is, that the algorithm solves the given problem in all cases.
2. The “analysis” deals with performance evaluation (complexity analysis).
2. The “analysis” deals with performance evaluation (complexity analysis).

We start by defining the model of computation, which is usually the Random Access Machine
(RAM) model, but other models of computation can be used, such as the PRAM. Once the model of
computation has been defined, an algorithm can be described using a simple language (or pseudo-
language) whose syntax is close to a programming language such as C or Java.

Algorithm's Performance
Two important ways to characterize the effectiveness of an algorithm are its space complexity
and time complexity. Time complexity of an algorithm concerns determining an expression of
the number of steps needed as a function of the problem size. Since the step count measure is
somewhat coarse, one does not aim at obtaining an exact step count. Instead, one attempts only
to get asymptotic bounds on the step count. Asymptotic analysis makes use of the O (Big Oh)
notation. Two other notational constructs used by computer scientists in the analysis of
algorithms are Θ (Big Theta) notation and Ω (Big Omega) notation.
The performance evaluation of an algorithm is obtained by totaling the number of occurrences of
each operation when running the algorithm. The performance of an algorithm is evaluated as a
function of the input size n and is to be considered modulo a multiplicative constant.

The following notations are commonly used in performance analysis to characterize the
complexity of an algorithm.

Θ-Notation (Same order)


This notation bounds a function to within constant factors. We say f(n) = Θ(g(n)) if there exist
positive constants n0, c1 and c2 such that to the right of n0 the value of f(n) always lies between c1
g(n) and c2 g(n) inclusive.

In the set notation, we write as follows:


Θ(g(n)) = {f(n) : there exist positive constants c1, c2, and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n)
for all n ≥ n0}
We say that g(n) is an asymptotically tight bound for f(n).

Graphically, for all values of n to the right of n0, the value of f(n) lies at or above c1 g(n) and at or
below c2 g(n). In other words, for all n ≥ n0, the function f(n) is equal to g(n) to within a constant
factor. We say that g(n) is an asymptotically tight bound for f(n).
In set terminology, f(n) is said to be a member of the set Θ(g(n)) of functions. In other words,
because Θ(g(n)) is a set, we could write
f(n) ∈ Θ(g(n))
to indicate that f(n) is a member of Θ(g(n)). Instead, we write
f(n) = Θ(g(n))
to express the same notion.
Historically, this notation is "f(n) = Θ(g(n))" although the idea that f(n) is equal to something
called Θ(g(n)) is misleading.

Example: n²/2 − 2n = Θ(n²), with c1 = 1/4, c2 = 1/2, and n0 = 8.
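These constants can be sanity-checked numerically over a finite range (a check, not a proof); a Python sketch:

```python
def f(n):
    return n * n / 2 - 2 * n

c1, c2, n0 = 0.25, 0.5, 8

# Check c1*g(n) <= f(n) <= c2*g(n) with g(n) = n^2, for n0 <= n < 10000.
assert all(c1 * n * n <= f(n) <= c2 * n * n for n in range(n0, 10_000))

# The lower bound first holds at n0 = 8: it still fails at n = 7.
assert not c1 * 7 * 7 <= f(7)
```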

Ο-Notation (Upper Bound)


This notation gives an upper bound for a function to within a constant factor. We write f(n) =
O(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n)
always lies on or below c g(n).
In the set notation, we write as follows: For a given function g(n), the set of functions
Ο(g(n)) = {f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c g(n) for all n ≥ n0}
We say that the function g(n) is an asymptotic upper bound for the function f(n). We use Ο-
notation to give an upper bound on a function, to within a constant factor.

Graphically, for all values of n to the right of n0, the value of the function f(n) is on or below
c g(n). We write f(n) = O(g(n)) to indicate that a function f(n) is a member of the set Ο(g(n)) i.e.
f(n) ∈ Ο(g(n))
Note that f(n) = Θ(g(n)) implies f(n) = Ο(g(n)), since Θ-notation is a stronger notation than Ο-
notation.
Example: 2n² = Ο(n³), with c = 1 and n0 = 2.
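The same kind of finite-range numeric check applies here; a Python sketch:

```python
# Check 2*n^2 <= c*n^3 with c = 1, for all n0 <= n < 10000 (n0 = 2).
c, n0 = 1, 2
assert all(2 * n * n <= c * n ** 3 for n in range(n0, 10_000))

# At n = 1 the bound fails (2 > 1), which is why n0 = 2 is needed.
assert not 2 * 1 * 1 <= c * 1 ** 3
```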

Equivalently, we may also define f is of order g as follows:


If f(n) and g(n) are functions defined on the positive integers, then f(n) is Ο(g(n)) if and only if
there is a c > 0 and an n0 > 0 such that
| f(n) | ≤ c | g(n) | for all n ≥ n0

Historical Note: The notation was introduced in 1892 by the German mathematician Paul
Bachmann.

Ω-Notation (Lower Bound)


This notation gives a lower bound for a function to within a constant factor. We write f(n) =
Ω(g(n)) if there are positive constants n0 and c such that to the right of n0, the value of f(n)
always lies on or above c g(n).
In the set notation, we write as follows: For a given function g(n), the set of functions
Ω(g(n)) = {f(n) : there exist positive constants c and n0 such that 0 ≤ c g(n) ≤ f(n) for all n ≥ n0}
We say that the function g(n) is an asymptotic lower bound for the function f(n).

The intuition behind Ω-notation is shown above.


Example: √n = Ω(lg n), with c = 1 and n0 = 16.
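Again a finite-range numeric check (not a proof); a Python sketch:

```python
from math import sqrt, log2

# Check sqrt(n) >= c * lg(n) with c = 1, for all n0 <= n < 10000 (n0 = 16).
c, n0 = 1, 16
assert all(sqrt(n) >= c * log2(n) for n in range(n0, 10_000))

# Below n0 the bound fails, e.g. at n = 9: sqrt(9) = 3 < lg(9) ≈ 3.17.
assert sqrt(9) < log2(9)
```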

Algorithm Analysis
The complexity of an algorithm is a function g(n) that gives an upper bound on the number of
operations (or running time) performed by the algorithm when the input size is n.
There are two interpretations of upper bound.
Worst-case Complexity
The running time for any given size input will be lower than the upper bound except
possibly for some values of the input where the maximum is reached.
Average-case Complexity
The running time for any given size input will be the average number of operations over
all problem instances for a given size.

Because it is quite difficult to estimate the statistical behavior of the input, most of the time we
content ourselves with the worst-case behavior. Most of the time, the complexity g(n) is
approximated by its family O(f(n)), where f(n) is one of the following functions: n (linear
complexity), log n (logarithmic complexity), n^a where a ≥ 2 (polynomial complexity), a^n
(exponential complexity).

Optimality
Once the complexity of an algorithm has been estimated, the question arises whether this
algorithm is optimal. An algorithm for a given problem is optimal if its complexity reaches the
lower bound over all the algorithms solving this problem. For example, any algorithm solving
“the intersection of n segments” problem will execute at least n² operations in the worst case,
even if it does nothing but print the output. This is abbreviated by saying that the problem has
Ω(n²) complexity. If one finds an O(n²) algorithm that solves this problem, it will be optimal and
of complexity Θ(n²).

Reduction
Another technique for estimating the complexity of a problem is the transformation of problems,
also called problem reduction. As an example, suppose we know a lower bound for a problem A,
and that we would like to estimate a lower bound for a problem B. If we can transform A into B
by a transformation step whose cost is less than that for solving A, then B has the same bound as
A.
The convex hull problem nicely illustrates the reduction technique. A lower bound for the convex
hull problem is established by reducing the sorting problem (complexity Θ(n log n)) to the
convex hull problem.

MATHEMATICS FOR ALGORITHMICS
Sets
A set is a collection of different things (distinguishable objects or distinct objects) represented as
a unit. The objects in a set are called its elements or members. If an object x is a member of a set
S, we write x S. On the the hand, if x is not a member of S, we write z S. A set cannot
contain the same object more than once, and its elements are not ordered.

For example, consider the set S = {7, 21, 57}. Then 7 ∈ {7, 21, 57} and 8 ∉ {7, 21, 57} or,
equivalently, 7 ∈ S and 8 ∉ S.

We can also describe a set containing elements according to some rule. We write
{n : rule about n}
Thus, {n : n = m^2 for some m ∈ N} denotes the set of perfect squares.

Set Cardinality
The number of elements in a set is called cardinality or size of the set, denoted |S| or sometimes
n(S). The two sets have same cardinality if their elements can be put into a one-to-one
correspondence. It is easy to see that the cardinality of an empty set is zero i.e., | |.

Multiset
If we do want to take the number of occurrences of members into account, we call the group a
multiset.
For example, {7} and {7, 7} are identical as sets but different as multisets.

Infinite Set
A set containing infinitely many elements. For example, the set of negative integers, the set of integers, etc.

Empty Set
A set containing no members, denoted ∅ or {}.

Subset
For two sets A and B, we say that A is a subset of B, written A ⊆ B, if every member of A is also
a member of B.
Formally, A ⊆ B if
x ∈ A implies x ∈ B,
written x ∈ A => x ∈ B.

Proper Subset
Set A is a proper subset of B, written A ⊂ B, if A is a subset of B and not equal to B. That is,
set A is a proper subset of B if A ⊆ B but A ≠ B.

Equal Sets
The sets A and B are equal, written A = B, if each is a subset of the other. Rephrasing the definition:
let A and B be sets. Then A = B if A ⊆ B and B ⊆ A.

Power Set
Let A be a set. The power set of A, written P(A) or 2^A, is the set of all subsets of A. That is, P(A)
= {B : B ⊆ A}.
For example, consider A = {0, 1}. The power set of A is {{}, {0}, {1}, {0, 1}}. (By contrast, the set
of all pairs (2-tuples) whose elements are 0 and 1 is the Cartesian product {(0, 0), (0, 1), (1, 0), (1, 1)}.)
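For instance, the power set can be generated in Python with the standard library's itertools (the helper name power_set is illustrative, not part of the manual's pseudocode):

```python
from itertools import chain, combinations

def power_set(s):
    """Return the power set of s as a list of frozensets."""
    items = list(s)
    # Subsets of every size r, from the empty set (r = 0) up to s itself.
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

# P({0, 1}) = {{}, {0}, {1}, {0, 1}}, so |P(A)| = 2^|A| = 4.
P = power_set({0, 1})
```

Note that |P(A)| = 2^|A| in general, which is why the power set is also written 2^A.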

Disjoint Sets
Let A and B be sets. A and B are disjoint if A ∩ B = ∅.

Union of Sets
The union of A and B, written A ∪ B, is the set we get by combining all elements in A and B into
a single set. That is,
A ∪ B = {x : x ∈ A or x ∈ B}.
For two finite sets A and B, we have the identity
|A ∪ B| = |A| + |B| - |A ∩ B|
We can conclude
|A ∪ B| ≤ |A| + |B|
That is,
if |A ∩ B| = 0 then |A ∪ B| = |A| + |B|, and if A ⊆ B then |A| ≤ |B|.

Intersection of Sets

The intersection of sets A and B, written A ∩ B, is the set of elements that are both in A and
in B. That is,

A ∩ B = {x : x ∈ A and x ∈ B}.

Partition of a Set
A collection S = {Si} of nonempty sets forms a partition of a set S if

i. The sets are pairwise disjoint, that is, Si, Sj ∈ S and i ≠ j imply Si ∩ Sj = ∅.
ii. Their union is S, that is, S = ∪i Si.

In other words, S forms a partition of S if each element of S appears in exactly one Si.

Difference of Sets
Let A and B be sets. The difference of A and B is
A - B = {x : x ∈ A and x ∉ B}.

For example, let A = {1, 2, 3} and B = {2, 4, 6, 8}. The set difference A - B = {1, 3} while B-A
= {4, 6, 8}.

Complement of a Set
All sets under consideration are subsets of some large set U called the universal set. Given a universal
set U, the complement of A, written A', is the set of all elements under consideration that are not
in A.
Formally, let A be a subset of the universal set U. The complement of A in U is
A' = U - A
OR
A' = {x : x ∈ U and x ∉ A}.
For any set A ⊆ U, we have the following laws
i. A'' = A
ii. A ∩ A' = ∅
iii. A ∪ A' = U

Symmetric Difference
Let A and B be sets. The symmetric difference of A and B is

A Δ B = {x : x ∈ A or x ∈ B but not both}

Therefore,

A Δ B = (A ∪ B) - (A ∩ B)

As an example, consider the following two sets A = {1, 2, 3} and B = {2, 4, 6, 8}. The
symmetric difference, A Δ B = {1, 3, 4, 6, 8}.
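Python's built-in set type implements these operations directly, so the identities above can be checked in a few lines (an illustrative sketch, not part of the manual):

```python
# The example sets used throughout this section.
A = {1, 2, 3}
B = {2, 4, 6, 8}

union = A | B             # A ∪ B = {1, 2, 3, 4, 6, 8}
intersection = A & B      # A ∩ B = {2}
difference = A - B        # A - B = {1, 3}
symmetric_diff = A ^ B    # A Δ B = {1, 3, 4, 6, 8}

# The identity |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(union) == len(A) + len(B) - len(intersection)
# The identity A Δ B = (A ∪ B) - (A ∩ B)
assert symmetric_diff == union - intersection
```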

Sequences
A sequence of objects is a list of objects in some order. For example, the sequence 7, 21, 57
would be written as (7, 21, 57). In a set the order does not matter, but in a sequence it does.

Hence, (7, 21, 57) ≠ (57, 7, 21), but {7, 21, 57} = {57, 7, 21}.
Repetition is not permitted in a set, but repetition is permitted in a sequence. So, (7, 7, 21, 57) is
different from (7, 21, 57), while {7, 7, 21, 57} is the same set as {7, 21, 57}.

Tuples
Finite sequences are often called tuples. For example,
(7, 21) 2-tuple or pair
(7, 21, 57) 3-tuple
(7, 21, ..., k) k-tuple

An ordered pair of two elements a and b is denoted (a, b) and can be defined as (a, b) = {a, {a,
b}}.

Cartesian Product or Cross Product

If A and B are two sets, the cross product of A and B, written A×B, is the set of all pairs wherein
the first element is a member of the set A and the second element is a member of the set B.
Formally,
A×B = {(a, b) : a ∈ A, b ∈ B}.

For example, let A = {1, 2} and B = {x, y, z}. Then A×B = {(1, x), (1, y), (1, z), (2, x), (2, y), (2,
z)}.

When A and B are finite sets, the cardinality of their product is

|A×B| = |A| . |B|

n-tuples
The Cartesian product of n sets A1, A2, ..., An is the set of n-tuples
A1 × A2 × ... × An = {(a1, ..., an) : ai ∈ Ai, i = 1, 2, ..., n}
whose cardinality is
| A1 × A2 × ... × An| = |A1| . |A2| ... |An|
if all the sets are finite. We denote an n-fold Cartesian product over a single set A by
A^n = A × A × ... × A
whose cardinality is
|A^n| = |A|^n if A is finite.
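In Python, itertools.product realizes the cross product, and the cardinality identities can be verified directly (an illustrative sketch):

```python
from itertools import product

A = {1, 2}
B = {'x', 'y', 'z'}

AxB = set(product(A, B))            # A × B as a set of pairs
assert len(AxB) == len(A) * len(B)  # |A × B| = |A| . |B| = 6

A3 = set(product(A, repeat=3))      # the n-fold product A^3
assert len(A3) == len(A) ** 3       # |A^n| = |A|^n = 8
```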

Linear Inequalities and Linear Equations

Inequalities
The term inequality is applied to any statement involving one of the symbols <, >, ≤, ≥.
Examples of inequalities are:
i. x ≤ 1
ii. x + y + 2z > 16
iii. p^2 + q^2 ≤ 1/2
iv. a^2 + ab > 1

Fundamental Properties of Inequalities

1. If a ≤ b and c is any real number, then a + c ≤ b + c.
For example, -3 ≤ -1 implies -3 + 4 ≤ -1 + 4.
2. If a ≤ b and c is positive, then ac ≤ bc.
For example, 2 ≤ 3 implies 2(4) ≤ 3(4).
3. If a ≤ b and c is negative, then ac ≥ bc.
For example, 3 ≤ 9 implies 3(-2) ≥ 9(-2).
4. If a ≤ b and b ≤ c, then a ≤ c.
For example, -1/2 ≤ 2 and 2 ≤ 8/3 imply -1/2 ≤ 8/3.

Solution of an Inequality
By a solution of the one-variable inequality 2x + 3 ≤ 7 we mean any number which, substituted for x,
yields a true statement.
For example, 1 is a solution of 2x + 3 ≤ 7 since 2(1) + 3 = 5 and 5 is less than or equal to 7.
By a solution of the two-variable inequality x - y ≤ 5 we mean any ordered pair of numbers which,
when substituted for x and y, respectively, yields a true statement.
For example, (2, 1) is a solution of x - y ≤ 5 because 2 - 1 = 1 and 1 ≤ 5.
By a solution of the three-variable inequality 2x - y + z ≥ 3 we mean an ordered triple of numbers
which, when substituted for x, y and z respectively, yields a true statement.

For example, (2, 0, 1) is a solution of 2x - y + z ≥ 3, because 2(2) - 0 + 1 = 5 and 5 ≥ 3.

A solution of an inequality is said to satisfy the inequality. For example, (2, 1) satisfies x - y ≤ 5.
Two or more inequalities, each with the same variables, considered as a unit, are said to form a
system of inequalities. For example,
x ≥ 0
y ≥ 0
2x + y ≤ 4
Note that the notion of a solution of a system of inequalities is analogous to that of a solution of a
system of equations.
Any solution common to all of the inequalities of a system of inequalities is said to be a solution
of that system of inequalities. A system of inequalities, each of whose members is linear, is said
to be a system of linear inequalities.

Geometric Interpretation of Inequalities


An inequality in two variables x and y describes a region in the x-y plane (called its graph),
namely, the set of all points whose coordinates satisfy the inequality.
The y-axis divides the xy-plane into two regions, called half-planes.
 Right half-plane
The region of points whose coordinates satisfy inequality x > 0.
 Left half-plane
The region of points whose coordinates satisfy inequality x < 0.
Similarly, the x-axis divides the xy-plane into two half-planes.
 Upper half-plane
In which inequality y > 0 is true.
 Lower half-plane
In which inequality y < 0 is true.
The x-axis and y-axis are simply lines, so the above arguments can be applied to any
line.
Every line ax + by = c divides the xy-plane into two regions called its half-planes.
 On one half-plane ax + by > c is true.
 On the other half-plane ax + by < c is true.

Linear Equations
One Unknown
A linear equation in one unknown can always be stated into the standard form
ax = b
where x is an unknown and a and b are constants. If a is not equal to zero, this equation has a
unique solution
x = b/a
Two Unknowns
A linear equation in two unknowns, x and y, can be put into the form
ax + by = c
where x and y are the two unknowns and a, b, c are real numbers. Also, we assume that a and b are
not zero.

Solution of Linear Equation


A solution of the equation consists of a pair of numbers, u = (k1, k2), which satisfies the equation
ax + by = c. Mathematically speaking, a solution consists of u = (k1, k2) such that ak1 + bk2 = c.
Solutions of the equation can be found by assigning arbitrary values to x and solving for y, OR
assigning arbitrary values to y and solving for x.
Geometrically, any solution u = (k1, k2) of the linear equation ax + by = c determines a point in
the Cartesian plane. Since a and b are not zero, the solutions u correspond precisely to the points
on a straight line.

Two Equations in the Two Unknowns


A system of two linear equations in the two unknowns x and y is
a1x + b1y = c1
a2x + b2y = c2

where a1, a2, b1, b2 are not zero. A pair of numbers which satisfies both equations is called a
simultaneous solution of the given equations, or a solution of the system of equations.

Geometrically, there are three cases of a simultaneous solution

1. If the system has exactly one solution, the graph of the linear equations intersect in one
point.
2. If the system has no solutions, the graphs of the linear equations are parallel.
3. If the system has an infinite number of solutions, the graphs of the linear equations
coincide.
The special cases (2) and (3) can only occur when the coefficients of x and y in the two linear
equations are proportional.

That is,
a1/a2 = b1/b2 => a1b2 - a2b1 = 0

The system has no solution when
a1/a2 = b1/b2 ≠ c1/c2

The solution to the system

a1x + b1y = c1
a2x + b2y = c2
can be obtained by the elimination process, whereby we reduce the system to a single equation in
only one unknown. This is accomplished by the following algorithm.

ALGORITHM

Step 1 Multiply the two equations by two numbers which are such that
the resulting coefficients of one of the unknowns are negatives of
each other.
Step 2 Add the equations obtained in Step 1.

The output of this algorithm is a linear equation in one unknown. This equation may be solved
for that unknown, and the solution may be substituted in one of the original equations yielding
the value of the other unknown.
As an example, consider the following system
3x + 2y = 8 ------------ (1)
2x - 5y = -1 ------------ (2)
Step 1: Multiply equation (1) by 2 and equation (2) by -3
6x + 4y = 16
-6x + 15y = 3
Step 2: Add equations, output of Step 1
19y = 19
Thus, we obtain an equation involving only the unknown y. We solve for y to obtain
y=1
Next, we substitute y =1 in equation (1) to get
x=2
Therefore, x = 2 and y = 1 is the unique solution to the system.
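The elimination algorithm can be sketched in Python for the general system a1x + b1y = c1, a2x + b2y = c2 (the function name is illustrative; we assume a1 ≠ 0 and a unique solution, i.e., a1b2 - a2b1 ≠ 0):

```python
def solve_by_elimination(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by elimination.

    Assumes a1 != 0 and a unique solution (a1*b2 - a2*b1 != 0).
    """
    # Step 1: multiply eq (1) by a2 and eq (2) by -a1 so the x-coefficients
    # are negatives of each other; Step 2: add the equations, leaving
    # (a2*b1 - a1*b2) * y = a2*c1 - a1*c2.
    y = (a2 * c1 - a1 * c2) / (a2 * b1 - a1 * b2)
    # Substitute y back into eq (1) to recover x.
    x = (c1 - b1 * y) / a1
    return x, y

# The worked example: 3x + 2y = 8 and 2x - 5y = -1 give x = 2, y = 1.
```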

n Equations in n Unknowns
Now, consider a system of n linear equations in n unknowns
a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
.........................
an1x1 + an2x2 + . . . + annxn = bn
Where the aij, bi are real numbers. The number aij is called the coefficient of xj in the ith equation,
and the number bi is called the constant of the ith equation. A list of values for the unknowns,
x1 = k1, x2 = k2, . . . , xn = kn
or equivalently, a list of n numbers
u = (k1, k2, . . . , kn)
is called a solution of the system if, with kj substituted for xj, the left hand side of each equation
in fact equals the right hand side.

The above system is equivalent to the matrix equation AX = B, where A = (aij) is the matrix of
coefficients, X = (xi) is the column of unknowns, and B = (bi) is the column of constants.

The matrix A is called the coefficient matrix of the system of n linear equations in n
unknowns.

The matrix [A | B], obtained by adjoining the column of constants B to A, is called the augmented
matrix of the system of n linear equations in n unknowns.

Note for algorithmic nerds: we store a system in the computer as its augmented matrix.
Specifically, the system is stored in the computer as an N × (N+1) matrix array A, the augmented
matrix of the system. Therefore, the constants b1, b2, . . . , bn are
respectively stored as A1,N+1, A2,N+1, . . . , AN,N+1.

Solution of a Triangular System
If aij = 0 for i > j, then the system of n linear equations in n unknowns assumes the triangular form.

a11x1 + a12x2 + . . . + a1,n-1xn-1 + a1nxn = b1
a22x2 + . . . + a2,n-1xn-1 + a2nxn = b2
............................
an-2,n-2xn-2 + an-2,n-1xn-1 + an-2,nxn = bn-2
an-1,n-1xn-1 + an-1,nxn = bn-1
annxn = bn

where |A| = a11a22 . . . ann. If none of the diagonal entries a11, a22, . . ., ann is zero, the system has
a unique solution.

Back Substitution Method

We obtain the solution of a triangular system by the technique of back substitution. Consider the
above general triangular system.

1. First, we solve the last equation for the last unknown, xn:
xn = bn/ann
2. Second, we substitute the value of xn in the next-to-last equation and solve it for the next-to-last
unknown, xn-1:
xn-1 = (bn-1 - an-1,n xn) / an-1,n-1
3. Third, we substitute these values for xn and xn-1 in the third-from-last equation and solve it for
the third-from-last unknown, xn-2:
xn-2 = (bn-2 - an-2,n-1 xn-1 - an-2,n xn) / an-2,n-2

In general, we determine xk by substituting the previously obtained values of xn, xn-1, . . . , xk+1 in
the kth equation:
xk = (bk - ak,k+1 xk+1 - . . . - ak,n xn) / akk
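The back substitution steps translate into a short Python sketch (illustrative, assuming an upper-triangular matrix with a nonzero diagonal):

```python
def back_substitute(U, b):
    """Solve U x = b where U is upper triangular with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for k in range(n - 1, -1, -1):          # from the last equation upward
        # Substitute the already-computed values x[k+1], ..., x[n-1].
        s = sum(U[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / U[k][k]
    return x
```

For example, the triangular system x - 3y - 2z = 6, 2y + 6z = 6, 6z = 12 (derived in the Gaussian elimination section below) yields x = 1, y = -3, z = 2.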

Gaussian Elimination

Gaussian elimination is a method for finding the solution of a system of linear equations.
The method consists of two parts.
1. The first part reduces the system, step by step, to a triangular system.
2. The second part solves the triangular system by back substitution.
Consider the system
x - 3y - 2z = 6 --- (1)
2x - 4y + 2z = 18 --- (2)
-3x + 8y + 9z = -9 --- (3)

First Part
Eliminate the first unknown, x, from equations (2) and (3).
(a) Multiply equation (1) by -2 and add it to equation (2). Equation (2) becomes

2y + 6z = 6

(b) Multiply equation (1) by 3 and add it to equation (3). Equation (3) becomes
-y + 3z = 9

And the original system is reduced to the system
x - 3y - 2z = 6
2y + 6z = 6
-y + 3z = 9

Now, we have to eliminate the second unknown, y, from the new equation (3), using only the new
equations (2) and (3) above.
(a) Multiply equation (2) by 1/2 and add it to equation (3). Equation (3) becomes 6z = 12.
Therefore, our given system of three linear equation of 3 unknown is reduced to the triangular
system
x - 3y - 2z = 6
2y + 6z = 6
6z = 12

Second Part
In the second part, we solve the equations by back substitution and get
x = 1, y = -3, z = 2
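Both parts of the method can be sketched in Python (an illustrative implementation; partial pivoting, i.e., choosing the largest available pivot in absolute value, is folded in for accuracy):

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination plus back substitution."""
    n = len(b)
    # Build the augmented matrix so each row carries its constant along.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    # Part 1: reduce to triangular form.
    for col in range(n):
        # Interchange rows so the pivot is as large as possible.
        pivot_row = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot_row] = M[pivot_row], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Part 2: back substitution on the triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x
```

Running it on the worked example above reproduces x = 1, y = -3, z = 2.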

In the first stage of the algorithm, the coefficient of x in the first equation is called the pivot, and
in the second stage of the algorithm, the coefficient of y in the second equation is the pivot.
Clearly, the algorithm cannot work if either pivot is zero. In such a case one must interchange
equations so that a pivot is not zero. In fact, if one would like to code this algorithm, then the
greatest accuracy is attained when the pivot is as large in absolute value as possible. For
example, we would like to interchange equation (1) and equation (2) in the original system in the
above example before eliminating x from the second and third equations.

That is, the first step of the algorithm transforms the system into

2x - 4y + 2z = 18
x - 3y - 2z = 6
-3x + 8y + 9z = -9

Determinants and systems of linear equations


Consider a system of n linear equations in n unknowns. That is, for the following system
a11x1 + a12x2 + . . . + a1nxn = b1
a21x1 + a22x2 + . . . + a2nxn = b2
.........................
an1x1 + an2x2 + . . . + annxn = bn

Let D denote the determinant of the matrix A = (aij) of coefficients; that is, let D = |A|. Also, let Ni
denote the determinant of the matrix obtained by replacing the ith column of A by the column of
constants.

Theorem. If D ≠ 0, the above system of linear equations has the unique solution
xi = Ni/D, for i = 1, 2, . . . , n.

This theorem is widely known as Cramer's rule. It is important to note that Gaussian elimination
is usually much more efficient for solving systems of linear equations than is the use of
determinants.
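For a 3 × 3 system, Cramer's rule can be sketched directly (det3 and cramer3 are illustrative names; as noted above, Gaussian elimination remains the more efficient choice):

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(A, b):
    """Solve a 3x3 system by Cramer's rule: x_i = N_i / D."""
    D = det3(A)
    if D == 0:
        return None  # no unique solution
    xs = []
    for i in range(3):
        Ni = [row[:] for row in A]
        for r in range(3):
            Ni[r][i] = b[r]  # replace the ith column with the constants
        xs.append(det3(Ni) / D)
    return xs
```

On the Gaussian elimination example (x - 3y - 2z = 6, 2x - 4y + 2z = 18, -3x + 8y + 9z = -9), this again yields x = 1, y = -3, z = 2.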

ALGORITHM DESIGN & ANALYSIS TECHNIQUES: Greedy Algorithms
Greedy algorithms are simple and straightforward. They are shortsighted in their approach in the
sense that they make decisions on the basis of information at hand without worrying about the
effect these decisions may have in the future. They are easy to invent, easy to implement and
most of the time quite efficient. However, many problems cannot be solved correctly by the greedy
approach. Greedy algorithms are used to solve optimization problems.

Greedy Approach

Greedy Algorithm works by making the decision that seems most promising at any moment; it
never reconsiders this decision, whatever situation may arise later.
As an example, consider the problem of "Making Change".
Coins available are:
 dollars (100 cents)
 quarters (25 cents)
 dimes (10 cents)
 nickels (5 cents)
 pennies (1 cent)

Problem Make a change of a given amount using the smallest possible number of coins.

Informal Algorithm
 Start with nothing.
 At every stage, without passing the given amount,
o add the largest remaining coin to the coins already chosen.

Formal Algorithm
Make change for n units using the least possible number of coins.
MAKE-CHANGE (n)
C ← {100, 25, 10, 5, 1} // constant: the available denominations
S ← {} // set that will hold the solution
sum ← 0 // sum of items in the solution set
WHILE sum ≠ n
x ← largest item in set C such that sum + x ≤ n
IF no such item THEN
RETURN "No Solution"
S ← S ∪ {one coin of value x}
sum ← sum + x
RETURN S

Example Make change for 2.89 (289 cents); here n = 289 and the solution contains 2 dollars,
3 quarters, 1 dime and 4 pennies. The algorithm is greedy because at every stage it chooses the
largest coin without worrying about the consequences. Moreover, it never changes its mind in the
sense that once a coin has been included in the solution set, it remains there.
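MAKE-CHANGE translates almost line for line into Python (the function name and the None return for failure are illustrative additions):

```python
def make_change(n, coins=(100, 25, 10, 5, 1)):
    """Greedy change-making: at every stage, take the largest coin
    that does not push the running sum past n."""
    solution = []
    total = 0
    for c in sorted(coins, reverse=True):
        while total + c <= n:
            solution.append(c)
            total += c
    if total != n:
        return None  # no solution with these denominations
    return solution

# 289 cents -> 2 dollars, 3 quarters, 1 dime, 4 pennies (10 coins).
```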

Characteristics and Features of Problems Solved by Greedy Algorithms

To construct the solution in an optimal way, the algorithm maintains two sets: one contains chosen
items and the other contains rejected items.
A greedy algorithm consists of four (4) functions.
1. A function that checks whether a chosen set of items provides a solution.
2. A function that checks the feasibility of a set.
3. A selection function, which tells which of the candidates is the most promising.
4. An objective function, which does not appear explicitly, but gives the value of a solution.

Structure of a Greedy Algorithm

 Initially the set of chosen items, i.e., the solution set, is empty.
 At each step
o an item is selected using the selection function.
o IF adding it would make the set no longer feasible
 reject the item under consideration (it is never considered again).
o ELSE IF the set is still feasible THEN
 add the current item.

Definitions of feasibility

A feasible set (of candidates) is promising if it can be extended to produce not merely a solution,
but an optimal solution to the problem. In particular, the empty set is always promising. Why?
Because an optimal solution always exists.
Unlike Dynamic Programming, which solves the sub problems bottom-up, a greedy strategy
usually progresses in a top-down fashion, making one greedy choice after another, reducing each
problem to a smaller one.
Greedy-Choice Property
The "greedy-choice property" and "optimal substructure" are the two ingredients in a problem that
lend it to a greedy strategy. The greedy-choice property says that a globally optimal solution can
be arrived at by making a locally optimal choice.

The greedy Algorithms techniques include:

 Activity Selection Problem


 Minimum Spanning Tree
 Kruskal's Algorithm
 Prim's Algorithm
 Dijkstra's Algorithm
 Huffman's Codes

1. An Activity Selection Problem

Activity selection is the problem of scheduling a resource among several competing activities.


Problem Statement

Given a set S of n activities, with start time si and finish time fi for the ith activity, find the
maximum-size set of mutually compatible activities.
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj)
do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi.

Greedy Algorithm for Selection Problem
I. Sort the input activities by increasing finishing time.
f1 ≤ f2 ≤ . . . ≤ fn
II. Call GREEDY-ACTIVITY-SELECTOR (s, f)
1. n = length [s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5. do if si ≥ fj
6. then A= AU{i}
7. j=i
8. return set A

Operation of the algorithm


Let 11 activities be given, S = {p, q, r, s, t, u, v, w, x, y, z}; the start and finish times for the proposed
activities are (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).
A = {p} Initialization at line 2
A = {p, s} line 6 - 1st iteration of FOR - loop
A = {p, s, w} line 6 - 2nd iteration of FOR - loop
A = {p, s, w, z} line 6 - 3rd iteration of FOR-loop
Out of the FOR-loop and Return A = {p, s, w, z}
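The trace above can be reproduced with a short Python sketch of GREEDY-ACTIVITY-SELECTOR (illustrative; Part I's sort by finish time is folded into the function):

```python
def greedy_activity_selector(activities):
    """activities: a list of (start, finish) pairs.
    Returns a maximum-size set of mutually compatible activities."""
    acts = sorted(activities, key=lambda a: a[1])  # Part I: sort by finish
    selected = [acts[0]]                           # line 2: A = {1}
    last_finish = acts[0][1]
    for start, finish in acts[1:]:
        if start >= last_finish:   # compatible with the last chosen one
            selected.append((start, finish))
            last_finish = finish
    return selected

# On the 11 activities above this selects the times for p, s, w and z.
```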

Analysis
Part I requires O(n lg n) time (use merge sort or heapsort).
Part II requires θ(n) time, assuming that activities were already sorted in Part I by their finish
time.

Correctness
Note that Greedy algorithm do not always produce optimal solutions but GREEDY-ACTIVITY-
SELECTOR does.

Theorem Algorithm GREEDY-ACTIVITY-SELECTOR produces a solution of maximum size for
the activity-selection problem.

Proof Idea Show that the activity-selection problem satisfies


I. Greedy choice property.
II. Optimal substructure property.

Proof
I. Let S = {1, 2, . . . , n} be the set of activities. Since activities are in order by finish time,
activity 1 has the earliest finish time.
Suppose A ⊆ S is an optimal solution and let the activities in A be ordered by finish time.
Suppose the first activity in A is k.
If k = 1, then A begins with the greedy choice and we are done (or to be very precise, there is
nothing to prove here).

If k ≠ 1, we want to show that there is another optimal solution B that begins with the greedy choice,
activity 1.
Let B = A - {k} ∪ {1}. Because f1 ≤ fk, the activities in B are disjoint, and since B has the
same number of activities as A, i.e., |A| = |B|, B is also optimal.
II. Once the greedy choice is made, the problem reduces to finding an optimal solution for
the subproblem. If A is an optimal solution to the original problem S, then A` = A - {1} is an
optimal solution to the activity-selection problem S` = {i ∈ S : si ≥ f1}.
Why? Because if we could find a solution B` to S` with more activities than A`, adding activity 1
to B` would yield a solution B to S with more activities than A, thereby contradicting the
optimality of A.

As an example, consider the following: given a set of activities to schedule among lecture halls,
schedule all the activities using minimal lecture halls.
In order to determine which activity should use which lecture hall, the algorithm uses the
GREEDY-ACTIVITY-SELECTOR to calculate the activities in the first lecture hall. If there are
some activities yet to be scheduled, a new lecture hall is selected and GREEDY-ACTIVITY-
SELECTOR is called again. This continues until all activities have been scheduled.

LECTURE-HALL-ASSIGNMENT (s, f)
n = length [s]
for i = 1 to n
do HALL [i] = NIL
k=1
while (Not empty (s))
do HALL [k] = GREEDY-ACTIVITY-SELECTOR (s, f, n)
k=k+1
return HALL

The following changes can be made in the GREEDY-ACTIVITY-SELECTOR (s, f) (see CLR): a
scheduled activity has its start time marked "-" so that it is skipped on later passes.

GREEDY-ACTIVITY-SELECTOR (s, f, n)
j = first (s)
A = {j}
for i = j + 1 to n
do if s[i] not = "-" then
if s[i] ≥ f[j]
then A = A ∪ {i}
s[i] = "-"
j=i
return A

Correctness

The algorithm can be shown to be correct and optimal. As a contradiction, assume the number of
lecture halls is not optimal, that is, the algorithm allocates more halls than necessary. Therefore,
there exists a set of activities B which have been wrongly allocated. An activity b belonging to B
which has been allocated to hall H[i] should have optimally been allocated to H[k]. This implies
that the activities for lecture hall H[k] have not been allocated optimally, as GREEDY-
ACTIVITY-SELECTOR produces the optimal set of activities for a particular lecture hall.

Analysis
In the worst case, the number of lecture halls required is n. GREEDY-ACTIVITY-SELECTOR runs
in θ(n). The running time of this algorithm is O(n^2).
Two important Observations
 Choosing the activity of least duration will not always produce an optimal solution. For
example, we have a set of activities {(3, 5), (6, 8), (1, 4), (4, 7), (7, 10)}. Here, either (3,
5) or (6, 8) will be picked first, which will prevent the optimal
solution of {(1, 4), (4, 7), (7, 10)} from being found.

 Choosing the activity with the least overlap will not always produce an optimal solution. For
example, we have a set of activities {(0, 4), (4, 6), (6, 10), (0, 1), (1, 5), (5, 9), (9, 10), (0,
3), (0, 2), (7, 10), (8, 10)}. Here the one with the least overlap with other activities is (4,
6), so it will be picked first. But that would prevent the optimal solution of {(0, 1), (1, 5),
(5, 9), (9, 10)} from being found.

2. Minimum Spanning Tree

Spanning Tree
A spanning tree of a graph is any tree that includes every vertex in the graph. A little more
formally, a spanning tree of a graph G is a subgraph of G that is a tree and contains all the
vertices of G. An edge of a spanning tree is called a branch; an edge in the graph that is not in the
spanning tree is called a chord. We construct spanning trees whenever we want to find a simple,
cheap and yet efficient way to connect a set of terminals (computers, cities, factories, etc.).
Spanning trees are important because of the following reasons.

 Spanning trees construct a sparse subgraph that tells a lot about the original graph.
 Spanning trees are very important in designing efficient routing algorithms.
 Some hard problems (e.g., Steiner tree problem and traveling salesman problem) can be
solved approximately by using spanning trees.
 Spanning trees have wide applications in many areas, such as network design, etc.

Greedy Spanning Tree Algorithm

One of the most elegant spanning tree algorithms that I know of is as follows:
 Examine the edges in graph in any arbitrary sequence.
 Decide whether each edge will be included in the spanning tree.
Note that each time a step of the algorithm is performed, one edge is examined. If there is only a
finite number of edges in the graph, the algorithm must halt after a finite number of steps. Thus,
the time complexity of this algorithm is clearly O(n), where n is the number of edges in the
graph.

Some important facts about spanning trees are as follows:

 Any two vertices in a tree are connected by a unique path.


 Let T be a spanning tree of a graph G, and let e be an edge of G not
in T. Then T + e contains a unique cycle.

Lemma The number of spanning trees in the complete graph Kn is n^(n-2).

Greediness It is easy to see that this algorithm has the property that each edge is examined at
most once. Algorithms, like this one, which examine each entity at most once and decide its fate
once and for all during that examination are called greedy algorithms. The obvious advantage of
greedy approach is that we do not have to spend time reexamining entities.

Consider the problem of finding a spanning tree with the smallest possible weight or the largest
possible weight, respectively called a minimum spanning tree and a maximum spanning tree. It is
easy to see that if a graph possesses a spanning tree, it must have a minimum spanning tree and
also a maximum spanning tree. These spanning trees can be constructed by performing the
spanning tree algorithm (e.g., above mentioned algorithm) with an appropriate ordering of the
edges.
Minimum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of
nondecreasing weight (smallest first, largest last). If two or more edges have the same weight, order
them arbitrarily.
Maximum Spanning Tree Algorithm
Perform the spanning tree algorithm (above) by examining the edges in order of
nonincreasing weight (largest first, smallest last). If two or more edges have the same weight, order
them arbitrarily.

Minimum Spanning Trees


A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edge
weights sum to the minimum. In other words, an MST is a tree formed from a subset of the edges in a
given undirected graph, with two properties:

 it spans the graph, i.e., it includes every vertex of the graph.


 it is a minimum, i.e., the total weight of all the edges is as low as possible.

Let G = (V, E) be a connected, undirected graph where V is a set of vertices (nodes) and E is the
set of edges. Each edge has a given nonnegative length.
Problem Find a subset T of the edges of G such that all the vertices remain connected when
only the edges in T are used, and the sum of the lengths of the edges in T is as small as possible.
Let G` = (V, T) be the partial graph formed by the vertices of G and the edges in T. [Note: A
connected graph with n vertices must have at least n-1 edges, AND more than n-1 edges implies at
least one cycle]. So n-1 is the minimum number of edges in T. Hence if G` is connected and
T has more than n-1 edges, we can remove at least one of these edges without disconnecting it
(choose an edge that is part of a cycle). This will decrease the total length of the edges in T.

In short: G` = (V, T) where T is a subset of E. A connected graph of n nodes must have at least n-1 edges,
and more than n-1 edges implies at least one cycle. Hence if G` is connected and T has more than n-1
edges, T contains at least one cycle; we can remove an edge that is part of the cycle from T without
disconnecting G`. This will decrease the total length of the edges in T, so the new solution is
preferable to the old one.
Thus, a T with n vertices and more than n-1 edges cannot be an optimal solution. It follows that T
must have n-1 edges, and since G` is connected it must be a tree. G` is then called a Minimum
Spanning Tree (MST).
3. Kruskal's Algorithm
This minimum spanning tree algorithm was first described by Kruskal in 1956 in the same paper
where he rediscovered Jarnik's algorithm. The algorithm was also rediscovered in 1957 by
Loberman and Weinberger, but somehow avoided being renamed after them. The basic idea of
Kruskal's algorithm is as follows: scan all edges in increasing weight order; if an edge is
safe, keep it (i.e., add it to the set A).

Overall Strategy
Kruskal's Algorithm, as described in CLRS, is directly based on the generic MST algorithm. It
builds the MST as a forest. Initially, each vertex is in its own tree in the forest. Then the algorithm
considers each edge in turn, in order of increasing weight. If an edge (u, v) connects two different
trees, then (u, v) is added to the set of edges of the MST, and the two trees connected by the edge (u,
v) are merged into a single tree. On the other hand, if an edge (u, v) connects two vertices in the
same tree, then edge (u, v) is discarded.
A little more formally, given a connected, undirected, weighted graph with a function w : E → R.
 Starts with each vertex being its own component.
 Repeatedly merges two components into one by choosing the light edge that connects
them (i.e., the light edge crossing the cut between them).
 Scans the set of edges in monotonically increasing order by weight.
 Uses a disjoint-set data structure to determine whether an edge connects vertices in
different components.

Data Structure

Before formalizing the above idea, let's quickly review the disjoint-set data structure from
Chapter 21.
 MAKE-SET(v): Create a new set whose only member is pointed to by v. Note that for
this operation v must not already be in another set.
 FIND-SET(v): Returns a pointer to the representative of the set containing v.
 UNION(u, v): Unites the dynamic sets that contain u and v into a new set that is the union
of these two sets.

Algorithm
Start with an empty set A, and select at every stage the shortest edge that has not been chosen or
rejected, regardless of where this edge is situated in the graph.
KRUSKAL(V, E, w)
A ← { } ▷ Set A will ultimately contain the edges of the MST
for each vertex v in V
do MAKE-SET(v)
sort E into nondecreasing order by weight w
for each (u, v) taken from the sorted list
do if FIND-SET(u) ≠ FIND-SET(v)
then A ← A ∪ {(u, v)}
UNION(u, v)
return A
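The pseudocode above can be turned into a short self-contained Python sketch. The edge representation, (weight, u, v) tuples, and the simplified inline FIND-SET/UNION are my own choices for illustration; a full implementation would use union by rank as reviewed above.

```python
def kruskal(vertices, edges):
    """Kruskal's MST: scan edges in nondecreasing weight order,
    keeping each edge that joins two different components.
    `edges` is an iterable of (weight, u, v) tuples."""
    parent = {v: v for v in vertices}          # MAKE-SET for each vertex

    def find_set(v):                           # FIND-SET with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []                                   # the set A of the pseudocode
    for w, u, v in sorted(edges):              # nondecreasing weight order
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                           # edge joins two components: safe
            mst.append((u, v, w))
            parent[ru] = rv                    # UNION
    return mst
```

On a connected graph with |V| vertices, the returned list always holds exactly |V| − 1 edges.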

Illustrative Examples
Let's run through the following graph quickly to see how Kruskal's algorithm works on it:

We get the shaded edges shown in the above figure.

Edge (c, f) : safe
Edge (g, i) : safe
Edge (e, f) : safe
Edge (c, e) : reject
Edge (d, h) : safe
Edge (f, h) : safe
Edge (e, d) : reject
Edge (b, d) : safe
Edge (d, g) : safe
Edge (b, c) : reject
Edge (g, h) : reject
Edge (a, b) : safe
At this point, we have only one component, so all other edges will be rejected. [We could add a
test to the main loop of KRUSKAL to stop once |V| − 1 edges have been added to A.]
Note Carefully: Suppose we had examined (c, e) before (e, f). Then we would have found (c, e)
safe and would have rejected (e, f).

Example (CLRS): Step-by-Step Operation of Kruskal's Algorithm.


Step 1. In the graph, edge (g, h) is the shortest. Either vertex g or vertex h could be the
representative; let's choose vertex g arbitrarily.

Step 2. The edge (c, i) creates the second tree. Choose vertex c as representative for second tree.

Step 3. Edge (f, g) is the next shortest edge. Add this edge and choose vertex g as representative.

Step 4. Edge (a, b) creates a third tree.

Step 5. Add edge (c, f) and merge two trees. Vertex c is chosen as the representative.

Step 6. Edge (g, i) is the next cheapest, but if we added this edge a cycle would be created;
vertex c is already the representative of both endpoints.

Step 7. Instead, add edge (c, d).

Step 8. Adding edge (h, i) would create a cycle.

Step 9. Instead of adding edge (h, i) add edge (a, h).

Step 10. Again, if we add edge (b, c), it would create a cycle. Add edge (d, e) instead to complete
the spanning tree. In this spanning tree, all trees are joined and vertex c is the sole representative.

Analysis
Initialize the set A: O(1)
First for loop: |V| MAKE-SETs
Sort E: O(E lg E)
Second for loop: O(E) FIND-SETs and UNIONs

 Assuming the implementation of the disjoint-set data structure from Chapter 21 that
uses union by rank and path compression: O((V + E) α(V)) + O(E lg E).
 Since G is connected, |E| ≥ |V| − 1 ⇒ O(E α(V)) + O(E lg E).
 α(|V|) = O(lg V) = O(lg E).
 Therefore, the total time is O(E lg E).
 |E| ≤ |V|² ⇒ lg |E| = O(2 lg V) = O(lg V).
 Therefore, O(E lg V) time. (If the edges are already sorted, O(E α(V)), which is almost
linear.)

II Kruskal's Algorithm Implemented with Priority Queue Data Structure

MST_KRUSKAL(G)
for each vertex v in V[G]
do define set S(v) ← {v}
Initialize priority queue Q that contains all edges of G, using the weights as keys
A ← { } ▷ A will ultimately contain the edges of the MST
while A contains fewer than n − 1 edges
do Extract the minimum-weight edge (u, v) from Q
Let S(v) be the set containing v and S(u) the set containing u
if S(v) ≠ S(u)
then Add edge (u, v) to A
Merge S(v) and S(u) into one set (i.e., UNION)
return A

Analysis
The edge weights can be compared in constant time. Initialization of the priority queue takes O(E lg
E) time by repeated insertion. At each iteration of the while-loop, the minimum edge can be removed in
O(lg E) time, which is O(lg V), since the graph is simple. The total running time is O((V + E) lg
V), which is O(E lg V) since the graph is simple and connected.

4. Prim's Algorithm

This algorithm was first proposed by Jarnik, but is typically attributed to Prim. It starts from an
arbitrary vertex (the root) and at each stage adds a new branch (edge) to the tree already constructed;
the algorithm halts when all the vertices in the graph have been reached. This strategy is greedy
in the sense that at each step the partial spanning tree is augmented with an edge that is the
smallest among all possible adjacent edges.

MST-PRIM

Input: A weighted, undirected graph G=(V, E, w)


Output: A minimum spanning tree T.

T = { }
Let r be an arbitrarily chosen vertex from V.
U = {r}
WHILE |U| < n
DO
Find u in U and v in V − U such that the edge (u, v) is a smallest edge between U and V − U.
T = T ∪ {(u, v)}
U = U ∪ {v}
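The pseudocode above can be sketched in Python with a binary heap holding candidate edges that leave the current tree U, which corresponds to the O(m log n) variant discussed in the analysis below. The adjacency-list representation ((weight, neighbor) pairs) is my own choice for illustration.

```python
import heapq

def prim(graph, root):
    """Prim's MST using a binary heap.
    `graph` maps each vertex to a list of (weight, neighbor) pairs."""
    in_tree = {root}                           # the set U of the pseudocode
    tree_edges = []                            # the set T of the pseudocode
    # Heap of candidate edges (weight, u, v) leaving the current tree.
    frontier = [(w, root, v) for w, v in graph[root]]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)      # smallest edge between U and V - U
        if v in in_tree:
            continue                           # both endpoints already in U: skip
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for w2, x in graph[v]:                 # new candidate edges leaving U
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return tree_edges
```

Stale heap entries (both endpoints already in U) are simply skipped when popped, which is simpler than implementing DECREASE_KEY on Python's `heapq`.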

Analysis
The algorithm spends most of its time finding the smallest edge. So, the running time of the
algorithm basically depends on how we search for this edge.
Straightforward method
Just find the smallest edge by searching the adjacency list of the vertices in V. In this case, each
iteration costs O(m) time, yielding a total running time of O(mn).

Binary heap
By using binary heaps, the algorithm runs in O(m log n).

Fibonacci heap
By using Fibonacci heaps, the algorithm runs in O(m + n log n) time.

5. Dijkstra's Algorithm

Dijkstra's algorithm solves the single-source shortest-path problem when all edges have non-
negative weights. It is a greedy algorithm, similar to Prim's algorithm. The algorithm starts at the
source vertex, s, and grows a tree, T, that ultimately spans all vertices reachable from s. Vertices
are added to T in order of distance: first s, then the vertex closest to s, then the next closest,
and so on. The following implementation assumes that graph G is represented by adjacency lists.

DIJKSTRA (G, w, s)
1. INITIALIZE SINGLE-SOURCE (G, s)
2. S ← { } // S will ultimately contain the vertices with final shortest-path weights from s
3. Initialize priority queue Q i.e., Q ← V[G]
4. while priority queue Q is not empty do
5. u ← EXTRACT_MIN(Q) // Pull out new vertex
6. S ← S ∪ {u}
// Perform relaxation for each vertex v adjacent to u
7. for each vertex v in Adj[u] do
8. Relax (u, v, w)
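The steps above can be sketched in Python with `heapq` standing in for the priority queue Q. The adjacency-list format ((neighbor, weight) pairs) is my own choice for illustration; instead of a DECREASE_KEY operation, the sketch pushes duplicate entries and skips stale ones on extraction.

```python
import heapq

def dijkstra(graph, s):
    """Dijkstra's single-source shortest paths (non-negative weights).
    `graph` maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of shortest-path distances from s."""
    dist = {s: 0}                      # INITIALIZE-SINGLE-SOURCE: d[s] = 0
    pq = [(0, s)]                      # priority queue keyed on d[v]
    done = set()                       # the set S of the pseudocode
    while pq:
        d, u = heapq.heappop(pq)       # EXTRACT_MIN: pull out closest vertex
        if u in done:
            continue                   # stale queue entry: skip
        done.add(u)
        for v, w in graph[u]:          # RELAX each edge (u, v)
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Vertices unreachable from s simply never appear in the returned dictionary.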

Analysis
Like Prim's algorithm, Dijkstra's algorithm runs in O(|E|lg|V|) time.

Example: Step by Step operation of Dijkstra algorithm.

Step 1. Given the initial graph G = (V, E). All nodes have infinite cost except the source node,
s, which has cost 0.

Step 2. First we choose the node, which is closest to the source node, s. We initialize d[s] to 0.
Add it to S. Relax all nodes adjacent to source, s. Update predecessor (see red arrow in diagram
below) for all nodes updated.

Step 3. Choose the closest node, x. Relax all nodes adjacent to node x. Update predecessors for
nodes u, v and y (again notice red arrows in diagram below).

Step 4. Now, node y is the closest node, so add it to S. Relax node v and adjust its predecessor
(red arrows remember!).

Step 5. Now we have node u that is closest. Choose this node and adjust its neighbor node v.

Step 6. Finally, add node v. The predecessor list now defines the shortest path from each node to
the source node, s.

Q as a linear array
EXTRACT_MIN takes O(V) time and there are |V| such operations. Therefore, the total time for
EXTRACT_MIN in the while-loop is O(V²). Since the total number of edges in all the adjacency
lists is |E|, the for-loop iterates |E| times, with each iteration taking O(1) time. Hence, the
running time of the algorithm with the array implementation is O(V² + E) = O(V²).

Q as a binary heap (if G is sparse)


In this case, each EXTRACT_MIN operation takes O(lg V) time, and there are |V| such operations.
The binary heap can be built in O(V) time.
The DECREASE_KEY operation (in RELAX) takes O(lg V) time, and there are at most |E| such
operations.
Hence, the running time of the algorithm with a binary heap, provided the given graph is sparse, is
O((V + E) lg V). Note that this time becomes O(E lg V) if all vertices in the graph are reachable
from the source vertex.

Q as a Fibonacci heap
In this case, the amortized cost of each of the |V| EXTRACT_MIN operations is O(lg V).
The DECREASE_KEY operation in the subroutine RELAX now takes only O(1) amortized time for
each of the |E| edges.
As mentioned above, Dijkstra's algorithm does not work on digraphs with negative-weight edges.
We now give a simple example to show that Dijkstra's algorithm produces incorrect results in
this situation. Consider the digraph consisting of V = {s, a, b} and E =
{(s, a), (s, b), (b, a)}, where w(s, a) = 1, w(s, b) = 2, and w(b, a) = −2.
Dijkstra's algorithm gives d[a] = 1, d[b] = 2. But due to the negative-weight edge (b, a), the true
shortest distance from vertex s to vertex a is 2 + (−2) = 0, via b.

ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Divide & Conquer
Algorithm

Divide-and-conquer is a top-down technique for designing algorithms that consists of dividing


the problem into smaller subproblems hoping that the solutions of the subproblems are easier to
find and then composing the partial solutions into the solution of the original problem.

A little more formally, the divide-and-conquer paradigm consists of the following major phases:

 Breaking the problem into several sub-problems that are similar to the original problem
but smaller in size,
 Solving the sub-problems recursively (successively and independently), and then
 Combining the solutions to the sub-problems to create a solution to the original problem.

Binary Search (simplest application of divide-and-conquer)


Binary Search is an extremely well-known instance of the divide-and-conquer paradigm. Given an
ordered array of n elements, the basic idea of binary search is that for a given element we
"probe" the middle element of the array. We continue in either the lower or upper segment of the
array, depending on the outcome of the probe, until we reach the required (given) element.

Problem: Let A[1 . . n] be an array sorted in non-decreasing order; that is, A[i] ≤ A[j]
whenever 1 ≤ i ≤ j ≤ n. Let q be the query point. The problem consists of finding q in the
array A. If q is not in A, then find the position where q might be inserted.
Formally, find the index i such that 1 ≤ i ≤ n + 1 and A[i − 1] < q ≤ A[i].

Sequential Search
Look sequentially at each element of A until either we reach the end of the array A or find an
item no smaller than q.
Sequential search for 'q' in array A

for i = 1 to n do
if A [i] ≥ q then
return index i
return n + 1
Analysis
This algorithm clearly takes Θ(r) time, where r is the index returned. This is Ω(n) in the worst case
and O(1) in the best case.
If the elements of array A are distinct and the query point q is indeed in the array, then the loop
executes (n + 1)/2 times on average. On average (as well as in the worst case), sequential
search takes Θ(n) time.

Binary Search
Look for q either in the first half or in the second half of the array A. Compare q to the element
in the middle, at position ⌊n/2⌋, of the array. Let k = ⌊n/2⌋. If q ≤ A[k], then search in A[1 . . k];
otherwise search A[k+1 . . n] for q. Binary search for q in subarray A[i . . j] with the promise
that
A[i − 1] < q ≤ A[j]
If i = j then
return i (index)
k = ⌊(i + j)/2⌋
if q ≤ A[k]
then return Binary Search [A[i . . k], q]
else return Binary Search [A[k+1 . . j], q]

Analysis
Binary search can be accomplished in logarithmic time in the worst case, i.e., T(n) = Θ(log n).
This version of binary search also takes logarithmic time in the best case.

Iterative Version of Binary Search


Iterative binary search for q in array A[1 . . n]

if q > A [n]
then return n + 1
i = 1;
j = n;
while i < j do
k = ⌊(i + j)/2⌋
if q ≤ A [k]
then j = k
else i = k + 1
return i (the index)
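The iterative version above translates almost line for line into Python. This is an illustrative sketch; it keeps the 1-based index convention of the text, so the 0-based array access inside is adjusted by one.

```python
def binary_search(A, q):
    """Return the smallest 1-based index i with q <= A[i],
    or len(A) + 1 if q is greater than every element.
    A must be sorted in non-decreasing order."""
    if not A or q > A[-1]:
        return len(A) + 1
    i, j = 1, len(A)                   # 1-based bounds, as in the text
    while i < j:
        k = (i + j) // 2               # floor of the midpoint
        if q <= A[k - 1]:              # A[k] in the text's 1-based indexing
            j = k                      # answer lies in A[i .. k]
        else:
            i = k + 1                  # answer lies in A[k+1 .. j]
    return i
```

Note that when q is absent, the returned index is exactly the position where q could be inserted, matching the problem statement above.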
Analysis
The analysis of the iterative algorithm is identical to that of its recursive counterpart.

ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Dynamic Programming
Algorithm
Dynamic programming is a fancy name for using the divide-and-conquer technique with a table. Compared
to divide-and-conquer, dynamic programming is a more powerful and subtle design
technique. It is not a specific algorithm, but a meta-technique (like divide-and-conquer).
This technique was developed back in the days when "programming" meant "tabular method"
(like linear programming); it does not really refer to computer programming. Here in our advanced
algorithms course, we'll also think of "programming" as a "tableau method" and certainly not as
writing code. Dynamic programming is a stage-wise search method suitable for optimization problems
whose solutions may be viewed as the result of a sequence of decisions. The most attractive property
of this strategy is that during the search for a solution it avoids full enumeration by pruning early
partial decision solutions that cannot possibly lead to an optimal solution. In many practical
situations, this strategy hits the optimal solution in a polynomial number of decision steps. However,
in the worst case, such a strategy may end up performing full enumeration.
Dynamic programming takes advantage of this duplication and arranges to solve each subproblem
only once, saving the solution (in a table or in a globally accessible place) for later use. The
underlying idea of dynamic programming is: avoid calculating the same thing twice, usually by
keeping a table of known results of subproblems. Unlike divide-and-conquer, which solves the
subproblems top-down, dynamic programming is a bottom-up technique. Dynamic
programming is related to divide-and-conquer in the sense that it breaks the problem
down into smaller subproblems and solves them recursively. However, because of the somewhat
different nature of dynamic programming problems, standard divide-and-conquer solutions are
not usually efficient.
Dynamic programming is among the most powerful techniques for designing algorithms for
optimization problems. This is true for two reasons. Firstly, dynamic programming solutions are
based on a few common elements. Secondly, dynamic programming problems are typical
optimization problems, i.e., find the minimum or maximum cost solution, subject to various
constraints.
In other words, this technique is used for optimization problems:
 Find a solution to the problem with the optimal value.

 Then perform minimization or maximization. (We'll see examples of both in CLRS.)

Dynamic programming is a paradigm of algorithm design in which an optimization problem
is solved by a combination of caching subproblem solutions and appealing to the "principle of
optimality."

There are three basic elements that characterize a dynamic programming algorithm:
1. Substructure
Decompose the given problem into smaller (and hopefully simpler) subproblems. Express the
solution of the original problem in terms of solutions for smaller problems. Note that unlike
divide-and-conquer problems, it is not usually sufficient to consider one decomposition, but
many different ones.
2. Table-Structure
After solving the subproblems, store the answers (results) to the subproblems in a table. This is
done because (typically) subproblem solutions are reused many times, and we do not want to
repeatedly solve the same problem over and over again.
3. Bottom-up Computation
Using the table, combine the solutions of smaller subproblems to solve larger
subproblems, and eventually arrive at a solution to the complete problem. The idea of bottom-up
computation is as follows:
Bottom-up means:
i. Start with the smallest subproblems.
ii. Combining their solutions, obtain the solutions to subproblems of increasing size.
iii. Continue until we arrive at the solution of the original problem.
Once we have decided to attack the given problem with the dynamic programming
technique, the most important step is the formulation of the problem. In other words, the most
important question in designing a dynamic programming solution to a problem is how to set up
the subproblem structure.

If we can't apply dynamic programming to every optimization problem, then the question is: what
should we look for in order to apply this technique? Well, the answer is that there are two important
elements that a problem must have in order for the dynamic programming technique to be applicable
(look for those!).
1. Optimal Substructure
Show that a solution to the problem consists of making a choice, which leaves one or more
sub-problems to solve. Now suppose that you are given this last choice of an optimal solution. [Students
often have trouble understanding the relationship between optimal substructure and determining which
choice is made in an optimal solution. One way to understand optimal substructure is to imagine
that "God" tells you what the last choice made in an optimal solution was.] Given this choice,
determine which subproblems arise and how to characterize the resulting space of subproblems.
Show that the solutions to the subproblems used within the optimal solution must themselves be
optimal (the optimality principle). You usually use a cut-and-paste argument:
 Suppose that one of the subproblem solutions is not optimal.
 Cut it out.
 Paste in an optimal solution.
 Get a better solution to the original problem. This contradicts the optimality of the
problem solution.
That was optimal substructure.
You need to ensure that you consider a wide enough range of choices and subproblems that you
get them all. ["God" is too busy to tell you what that last choice really was.] Try all the choices,
solve all the subproblems resulting from each choice, and pick the choice whose solution, along
with the subproblem solutions, is best.

We have used the "optimality principle" a couple of times. Now a word about this beast: the
optimal solution to the problem contains within it optimal solutions to subproblems. This is
sometimes called the principle of optimality.
The Principle of Optimality
Dynamic programming relies on the principle of optimality. This principle states that in an
optimal sequence of decisions or choices, each subsequence must also be optimal. For example,
in the matrix chain multiplication problem, not only is the value we are interested in optimal, but all
the other entries in the table also represent optimal solutions. The principle can be stated as follows:
the optimal solution to a problem is a combination of optimal solutions to some of its
subproblems. The difficulty in turning the principle of optimality into an algorithm is that it is not
usually obvious which subproblems are relevant to the problem under consideration.

Now the question is how to characterize the space of subproblems?


 Keep the space as simple as possible.
 Expand it as necessary.
As an example, consider assembly-line scheduling. In this problem, the space of subproblems
was the fastest way from the factory entry through stations S1, j and S2, j. Clearly, there is no need to
try a more general space of subproblems. On the other hand, consider optimal binary search trees.
Suppose we had tried to constrain the space of subproblems to subtrees with keys k1, k2, . . . , kj.
An optimal BST would have root kr, for some 1 ≤ r ≤ j, giving subproblems k1, . . . , kr − 1 and
kr + 1, . . . , kj. Unless we could guarantee that r = j, so that the subproblem with kr + 1, . . . , kj is
empty, the latter subproblem is not of the form k1, k2, . . . , kj. Thus, we needed to allow the
subproblems to vary at both ends, i.e., allow both i and j to vary.

Optimal substructure varies across problem domains:


1. How many subproblems are used in an optimal solution.
2. How many choices there are in determining which subproblem(s) to use.
In the Assembly-line Scheduling Problem: we have 1 subproblem and 2 choices (for Si, j use either
S1, j − 1 or S2, j − 1). In the Longest Common Subsequence Problem: we have 1 subproblem, but as far
as choices are concerned, we have either 1 choice (if xi = yj, the LCS of Xi − 1 and Yj − 1), or 2 choices
(if xi ≠ yj, the LCS of Xi − 1 and Y, and the LCS of X and Yj − 1). Finally, in the Optimal Binary
Search Tree Problem: we have 2 subproblems (ki, . . . , kr − 1 and kr + 1, . . . , kj) and j − i + 1
choices for kr in ki, . . . , kj. Once we determine optimal solutions to subproblems, we choose
from among the j − i + 1 candidates for kr.

Informally, the running time of a dynamic programming algorithm depends on the overall
number of subproblems times the number of choices. For example, in the assembly-line
scheduling problem, there are Θ(n) subproblems and 2 choices for each, implying a Θ(n) running
time. In the longest common subsequence problem, there are Θ(mn) subproblems and at most
2 choices for each, implying a Θ(mn) running time. Finally, in the optimal binary search tree
problem, we have Θ(n²) sub-problems and Θ(n) choices for each, implying a Θ(n³) running time.
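The Θ(mn) subproblems and the one-or-two-choice structure of the LCS problem can be seen concretely in a short bottom-up sketch (an illustration in Python, not CLRS's pseudocode):

```python
def lcs_length(X, Y):
    """Length of the longest common subsequence of X and Y,
    computed bottom-up over the Theta(mn) table of subproblems."""
    m, n = len(X), len(Y)
    # c[i][j] = LCS length of the prefixes X[:i] and Y[:j]
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                # 1 choice: extend the LCS of X[:i-1], Y[:j-1] by the match
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                # 2 choices: drop the last symbol of X or of Y
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]
```

Each of the mn table entries is filled in O(1) time from at most two previously computed entries, giving the Θ(mn) running time quoted above.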

Dynamic programming uses optimal substructure bottom up fashion:


 First find optimal solutions to subproblems.
 Then choose which to use in optimal solution to the problem.

When we look at greedy algorithms, we'll see that they work in top down fashion:
 First make a choice that looks best.
 Then solve the resulting subproblem.

Warning! It is not correct to think that optimal substructure applies to all optimization problems.
IT DOES NOT. Dynamic programming is not applicable to all optimization problems. Consider, for
example, the unweighted shortest-path and unweighted longest-simple-path problems. In both
problems, we are given an unweighted, directed graph G = (V, E), and our job is to find a
path (sequence of connected edges) from vertex u in V to vertex v in V.

Subproblem Dependencies
It is easy to see that the subproblems in our above examples are independent. For
example, in the assembly-line problem there is only 1 subproblem, so it is trivially independent.
Similarly, in the longest common subsequence problem we again have only 1 subproblem, so it
is automatically independent. On the other hand, in the optimal binary search tree problem, we
have two subproblems, ki, . . . , kr − 1 and kr + 1, . . . , kj, which are clearly independent.

2. Polynomially many (Overlapping) Subproblems


An important aspect of the efficiency of dynamic programming is that the total number of
distinct sub-problems to be solved should be at most polynomial. Overlapping
subproblems occur when a recursive algorithm revisits the same problem over and over. A good
divide-and-conquer algorithm, for example the merge-sort algorithm, usually generates a brand
new problem at each stage of recursion. Our textbook CLRS has a good example, matrix-
chain multiplication, to depict this idea. CLRS also discusses an alternative approach, so-
called memoization. It works as follows:

 Store, don't recompute
 Make a table indexed by subproblem.
 When solving a subproblem:
o Lookup in the table.
o If answer is there, use it.
o Otherwise, compute answer, then store it.
In dynamic programming, we go one step further. We determine in what order we would want to
access the table, and fill it in that way.
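The "store, don't recompute" bullets above can be illustrated with a toy memoized Fibonacci function (my own example, not from the matrix-chain discussion). The table is a dictionary indexed by the subproblem, here simply by n:

```python
def fib(n, memo=None):
    """Memoized Fibonacci: a table indexed by subproblem (here, by n).
    Without the table the recursion revisits the same subproblems
    exponentially often; with it, each subproblem is solved only once."""
    if memo is None:
        memo = {}
    if n in memo:               # lookup in the table
        return memo[n]          # answer is there: use it
    result = n if n < 2 else fib(n - 1, memo) + fib(n - 2, memo)
    memo[n] = result            # otherwise compute, then store it
    return result
```

The bottom-up (dynamic programming) version would instead fill the table in increasing order of n; memoization reaches the same entries top-down, on demand.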

Four-Step Method of CLRS


Our text suggests that the development of a dynamic programming algorithm can be broken
into a sequence of the following four steps.
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct an optimal solution from computed information.

Examples of Dynamic programming Algorithm:

 Matrix-chain Multiplication
 Knapsack Problem DP Solution
 Activity Selection Problem DP Solution

1. Matrix-chain Multiplication Problem

The chain matrix multiplication problem is perhaps the most popular example of dynamic
programming used in upper-level undergraduate courses (or to review basic issues of dynamic
programming in an advanced algorithms class).
The chain matrix multiplication problem involves determining the optimal
sequence for performing a series of operations. This general class of problem is important in
compiler design for code optimization and in databases for query optimization. We will study the
problem in a very restricted instance, where the dynamic programming issues are clear. Suppose
that our problem is to multiply a chain of n matrices A1 A2 ... An. Recall (from your discrete
structures course) that matrix multiplication is an associative but not a commutative operation. This
means that we are free to parenthesize the above multiplication however we like, but we are not
free to rearrange the order of the matrices. Also, recall that when two (non-square) matrices are
being multiplied, there are restrictions on their dimensions.
Suppose matrix A has p rows and q columns, i.e., the dimension of matrix A is p × q. You can
multiply a matrix A of dimensions p × q by a matrix B of dimensions q × r, and the result will
be a matrix C with dimensions p × r. That is, you can multiply two matrices if they are
compatible: the number of columns of A must equal the number of rows of B.
In particular, for 1 ≤ i ≤ p and 1 ≤ j ≤ r, we have
C[i, j] = ∑1 ≤ k ≤ q A[i, k] B[k, j].
There are p · r total entries in C and each takes O(q) time to compute; thus the total time to
multiply these two matrices is dominated by the number of scalar multiplications, which is
p · q · r.

Problem Formulation
Note that although any legal parenthesization will lead to a valid result, not all
parenthesizations involve the same number of operations. To understand this point,
consider the problem of a chain A1, A2, A3 of three matrices and suppose
A1 is of dimension 10 × 100
A2 is of dimension 100 × 5
A3 is of dimension 5 × 50
Then,
MultCost[((A1 A2) A3)] = (10 · 100 · 5) + (10 · 5 · 50) = 7,500 scalar multiplications.
MultCost[(A1 (A2 A3))] = (100 · 5 · 50) + (10 · 100 · 50) = 75,000 scalar multiplications.
It is easy to see that even for this small example, computing the product according to the first
parenthesization is 10 times faster.
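The two costs above can be checked with a few lines of Python (an illustration; the helper name is my own):

```python
def mult_cost(p, q, r):
    """Scalar multiplications to multiply a p x q matrix by a q x r matrix."""
    return p * q * r

# A1: 10 x 100, A2: 100 x 5, A3: 5 x 50
cost_left = mult_cost(10, 100, 5) + mult_cost(10, 5, 50)     # ((A1 A2) A3)
cost_right = mult_cost(100, 5, 50) + mult_cost(10, 100, 50)  # (A1 (A2 A3))
print(cost_left, cost_right)   # 7500 75000
```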

The Chain Matrix Multiplication Problem


Given a sequence of n matrices A1, A2, ... An, and their dimensions p0, p1, p2, ..., pn, where for i
= 1, 2, ..., n, matrix Ai has dimension pi − 1 × pi, determine the order of multiplication that
minimizes the number of scalar multiplications.

Equivalent formulation (perhaps easier to work with!)
Given n matrices, A1, A2, ... An, where for 1 ≤ i ≤ n, Ai is a pi − 1 × pi matrix, parenthesize the
product A1 A2 ... An so as to minimize the total cost, assuming that the cost of multiplying a
pi − 1 × pi matrix by a pi × pi + 1 matrix using the naive algorithm is pi − 1 × pi × pi + 1.

Note that this algorithm does not perform the multiplications; it just figures out the best order in
which to perform them.

Naive Algorithm
Well, let's start from the obvious! Suppose we are given a list of n matrices. Let's attack the
problem with brute force and try all possible parenthesizations. It is easy to see that the number
of ways of parenthesizing an expression is very large. For instance, if you have just one item in
the list, then there is only one way to parenthesize. Similarly, if you have n items in the list, then
there are n − 1 places where you could split the list with the outermost pair of parentheses,
namely just after the first item, just after the second item, and so on, and just after the (n − 1)th
item in the list.
On the other hand, when we split the given list just after the kth item, we create two sublists to be
parenthesized, one with k items and the other with n − k items. After splitting, we consider
all the ways of parenthesizing these sublists (brute force in action). If there are L ways to
parenthesize the left sublist and R ways to parenthesize the right sublist, and since these are
independent choices, the total is L times R. This suggests the following recurrence for P(n),
the number of different ways of parenthesizing n items:
P(n) = 1 if n = 1, and P(n) = ∑1 ≤ k ≤ n − 1 P(k) P(n − k) if n ≥ 2.
This recurrence is related to a famous function in combinatorics called the Catalan numbers,
which in turn are related to the number of different binary trees on n nodes. The solution to this
recurrence is the sequence of Catalan numbers. In particular, P(n) = C(n − 1), where C(n) is the
nth Catalan number. And, by applying Stirling's formula, we get a lower bound on the
sequence. That is,
P(n) = Ω(4^n / n^(3/2)).
Since 4^n is exponential and n^(3/2) is just a polynomial, the exponential dominates the
expression, implying that the function grows very fast. Thus, the number of solutions is exponential
in n, and the brute-force method of exhaustive search is a poor strategy for determining the
optimal parenthesization of a matrix chain. Therefore, the naive algorithm will not be practical
except for very small n.
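The recurrence for P(n) can be evaluated directly to see this explosive growth (a small Python sketch; the function name is my own, and P(n) equals the Catalan number C(n − 1)):

```python
def num_parenthesizations(n):
    """P(n): number of ways to fully parenthesize a product of n matrices,
    via the recurrence P(n) = sum over splits k of P(k) * P(n - k)."""
    P = [0] * (n + 1)
    P[1] = 1                                   # one matrix: one way
    for m in range(2, n + 1):
        # split after the k-th item: left has k items, right has m - k
        P[m] = sum(P[k] * P[m - k] for k in range(1, m))
    return P[n]
```

Already P(10) = 4862, and the count roughly quadruples with each additional matrix, consistent with the Ω(4^n / n^(3/2)) bound.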

Dynamic Programming Approach


The first step of the dynamic programming paradigm is to characterize the structure of an
optimal solution. For the chain matrix problem, as for other dynamic programming problems,
this involves determining the optimal substructure (in this case, a parenthesization). We would like to
break the problem into subproblems whose solutions can be combined to obtain a solution to the
global problem.
For convenience, let us adopt the notation Ai .. j, where i ≤ j, for the result of evaluating the
product Ai Ai + 1 ... Aj. That is,
Ai .. j ≡ Ai Ai + 1 ... Aj , where i ≤ j.
It is easy to see that Ai .. j is a matrix of dimensions pi − 1 × pj.
In parenthesizing the expression, we can consider the highest level of parenthesization. At this
level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,
A1..n = A1..k Ak+1..n .

Therefore, the problem of determining the optimal sequence of multiplications is broken up into
two questions:
Question 1: How do we decide where to split the chain? (What is k?)
Question 2: How do we parenthesize the subchains A1..k and Ak+1..n?

The subchain problems can be solved by recursively applying the same scheme. On the other
hand, to determine the best value of k, we will consider all possible values of k and pick the best
of them. Notice that this problem satisfies the principle of optimality, because once we decide to
break the sequence into the product A1..k Ak+1..n, we should compute each subsequence optimally. That is,
for the global problem to be solved optimally, the subproblems must be solved optimally as well.
The key observation is that the parenthesization of the "prefix" subchain A1..k within this optimal
parenthesization of A1..n must be an optimal parenthesization of A1..k.

Dynamic Programming Formulation


The second step of the dynamic programming paradigm is to define the value of an optimal
solution recursively in terms of the optimal solutions to subproblems. To help us keep track of
solutions to subproblems, we will use a table, and build the table in a bottom-up manner. For 1 ≤ i
≤ j ≤ n, let m[i, j] be the minimum number of scalar multiplications needed to compute Ai..j.
The optimum cost can be described by the following recursive formulation.
Basis: Observe that if i = j then the problem is trivial; the subchain contains only one matrix, and
so the cost is 0. (In other words, there is nothing to multiply.) Thus,
m[i, i] = 0 for i = 1, 2, ..., n.

Step: If i < j, then we are asking about the product of the subchain Ai..j, and we take advantage of
the structure of an optimal solution. We assume that the optimal parenthesization splits the
product Ai..j, for some value of k with i ≤ k < j, as Ai..k · Ak+1..j.
The optimum cost to compute Ai..k is m[i, k], and the optimum cost to compute Ak+1..j is m[k + 1, j]. We
may assume that these values have been computed previously and stored in our array. Since Ai..k
is a pi − 1 × pk matrix and Ak+1..j is a pk × pj matrix, the time to multiply them is pi − 1 · pk · pj. This
suggests the following recursive rule for computing m[i, j]:
m[i, j] = min over i ≤ k < j of { m[i, k] + m[k + 1, j] + pi − 1 · pk · pj } for i < j.
To keep track of optimal subsolutions, we store the value of k in a table s[i, j]. Recall, k is the
place at which we split the product Ai..j to get an optimal parenthesization. That is,

s[i, j] = k such that m[i, j] = m[i, k] + m[k + 1, j] + pi − 1 · pk · pj.
Implementing the Rule
The third step of the dynamic programming paradigm is to compute the value of an optimal
solution in a bottom-up fashion. It is pretty straightforward to translate the above recurrence into
a procedure. As we remarked in the introduction, dynamic programming is nothing
but a fancy name for divide-and-conquer with a table. But here in dynamic programming, as
opposed to divide-and-conquer, we solve the subproblems sequentially. This means the trick is to
solve them in the right order, so that whenever the solution to a subproblem is needed, it is
already available in the table.
Consequently, in our problem the only tricky part is arranging the order in which to compute the
values (so that it is readily available when we need it). In the process of computing m[i, j] we
will need to access values m[i, k] and m[k + 1, j] for each value of k lying between i and j. This
suggests that we should organize our computation according to the number of matrices in the
subchain. So, lets work on the subchain:
Let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1
(m[i, i]) are trivial. Then we build up by computing the subchains of length 2, 3, ..., n. The final
answer is m[1, n].
Now set up the loop: Observe that if a subchain of length L starts at position i, then j = i + L − 1.
Since, we would like to keep j in bounds, this means we want j ≤ n, this, in turn, means that we
want i + L − 1 ≤ n, actually what we are saying here is that we want i ≤ n − L +1. This gives us
the closed interval for i. So our loop for i runs from 1 to n − L + 1.

Matrix-Chain(array p[1 .. n], int n) {


Array m[1 .. n, 1 .. n];
Array s[1 .. n − 1, 2 .. n];
FOR i = 1 TO n DO m[i, i] = 0; // initialize
FOR L = 2 TO n DO { // L=length of subchain
FOR i = 1 TO n − L + 1 do {
j = i + L − 1;
m[i, j] = infinity;
FOR k = i TO j − 1 DO { // check all splits
q = m[i, k] + m[k + 1, j] + p[i − 1] p[k] p[j];

IF (q < m[i, j]) {
m[i, j] = q;
s[i, j] = k;
}
}
}
}
return m[1, n] (final cost) and s (splitting markers);
}
Example [on page 337 in CLRS]: The m-table computed by the Matrix-Chain procedure for n = 6
matrices A1, A2, A3, A4, A5, A6 with dimension sequence p = <30, 35, 15, 5, 10, 20, 25>.

Note that the m-table is rotated so that the main diagonal runs horizontally. Only the main
diagonal and upper triangle are used.

Complexity Analysis
Clearly, the space complexity of this procedure is Ο(n2), since the tables m and s each require Ο(n2)
space. As far as the time complexity is concerned, a simple inspection of the for-loop structure
gives the running time of the procedure: the three for-loops are nested three deep, and
each of them iterates at most n times (that is to say, the indices L, i, and k each take on at most n − 1
values). Therefore, the running time of this procedure is Ο(n3).
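As a concrete illustration, the procedure above translates directly into Java. The class and method names below are our own, and p follows the convention that matrix Ai has dimensions p[i-1] × p[i]:

```java
// A minimal Java sketch of the bottom-up Matrix-Chain procedure above.
public class MatrixChain {
    static int[][] m; // m[i][j] = min scalar multiplications for A_i..A_j
    static int[][] s; // s[i][j] = optimal split point k

    static int order(int[] p) {
        int n = p.length - 1;              // number of matrices
        m = new int[n + 1][n + 1];         // m[i][i] = 0 by default
        s = new int[n + 1][n + 1];
        for (int L = 2; L <= n; L++) {             // L = subchain length
            for (int i = 1; i <= n - L + 1; i++) {
                int j = i + L - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {      // check all splits
                    int q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                    if (q < m[i][j]) { m[i][j] = q; s[i][j] = k; }
                }
            }
        }
        return m[1][n];                    // final cost
    }

    public static void main(String[] args) {
        // the dimension sequence from the CLRS example above
        int[] p = {30, 35, 15, 5, 10, 20, 25};
        System.out.println(order(p)); // prints 15125
    }
}
```

For the CLRS dimension sequence this returns 15125, matching the value of m[1, 6] in the rotated m-table.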

Extracting Optimum Sequence


This is Step 4 of the dynamic programming paradigm in which we construct an optimal solution
from computed information. The array s[i, j] can be used to extract the actual sequence. The
basic idea is to keep a split marker in s[i, j] that indicates what is the best split, that is, what value
of k leads to the minimum value of m[i, j]. s[i, j] = k tells us that the best way to multiply the
subchain is to first multiply the subchain Ai..k and then multiply the subchain Ak+1..j, and finally
multiply these two subchains together. Intuitively, s[i, j] tells us what multiplication to perform
last. Note that we only need to store s[i, j] when we have at least two matrices, that is, if j > i.
The actual multiplication algorithm uses the s[i, j] value to determine how to split the current
sequence. Assume that the matrices are stored in an array of matrices A[1..n], and that s[i, j] is
global to this recursive procedure. The procedure returns a matrix.

Mult(i, j) {
if (i = = j) return A[i]; // Basis
else {
k = s[i, j];
X = Mult(i, k); // X=A[i]…A[k]
Y = Mult(k + 1, j); // Y=A[k+1]…A[j]
return XY; // multiply matrices X and Y
}
}
Again, we rotate the s-table so that the main diagonal runs horizontally but in this table we use
only upper triangle (and not the main diagonal).

In the example, the procedure computes the chain matrix product according to the
parenthesization ((A1(A2 A3))((A4 A5) A6)).
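The split table s can also be turned into a printed parenthesization with a short recursive Java sketch. The names are our own, and the fill loop simply repeats the bottom-up procedure above:

```java
// Builds s bottom-up, then prints the optimal parenthesization from it.
public class ChainParens {
    // s[i][j] tells us which multiplication to perform last.
    static String parens(int[][] s, int i, int j) {
        if (i == j) return "A" + i;                 // basis: a single matrix
        int k = s[i][j];                            // optimal last split
        return "(" + parens(s, i, k) + parens(s, k + 1, j) + ")";
    }

    static String optimalParens(int[] p) {
        int n = p.length - 1;
        int[][] m = new int[n + 1][n + 1];
        int[][] s = new int[n + 1][n + 1];
        for (int L = 2; L <= n; L++)                // same fill as Matrix-Chain
            for (int i = 1; i <= n - L + 1; i++) {
                int j = i + L - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {
                    int q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                    if (q < m[i][j]) { m[i][j] = q; s[i][j] = k; }
                }
            }
        return parens(s, 1, n);
    }
}
```

Called on the CLRS dimension sequence, this produces the parenthesization ((A1(A2A3))((A4A5)A6)) shown in the example.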

Recursive Implementation
Here we will implement the recurrence in the following recursive procedure that determines m[i,
j], the minimum number of scalar multiplications needed to compute the chain matrix product
Ai..j. The recursive formulation has been set up in a top-down manner. Now consider the
following recursive implementation of the chain-matrix multiplication algorithm. The call Rec-
Matrix-Chain(p, i, j) computes and returns the value of m[i, j]. The initial call is Rec-Matrix-
Chain(p, 1, n). We only consider the cost here.
Rec-Matrix-Chain(array p, int i, int j) {
if (i = = j) m[i, i] = 0; // basic case
else {
m[i, j] = infinity; // initialize
for k = i to j − 1 do { // try all possible splits
cost=Rec-Matrix-Chain(p, i, k) + Rec-Matrix-Chain(p, k + 1, j) + p[i −
1]*p[k]*p[j];
if (cost < m[i, j]) then // update if better
m[i, j] = cost;
}
}

return m[i,j]; // return final cost
}
This version, which is based directly on the recurrence (the recursive formulation that we gave
for chain matrix problem) seems much simpler. So, what is wrong with this? The answer is the
running time is much higher than the algorithm that we gave before. In fact, we will see that its
running time is exponential in n, which is unacceptably slow.
Let T(n) be the running time of this algorithm on a sequence of matrices of length n, where n = j
− i + 1.
If i = j, then we have a sequence of length 1, and the time is Θ(1). Otherwise, we do Θ(1) work
and then consider all possible ways of splitting the sequence of length n into two sequences, one
of length k and the other of length n − k, and invoke the procedure recursively on each one. So,
we get the following recurrence, defined for n ≥ 1:

T(n) = 1                                    if n = 1
T(n) = 1 + ∑1≤ k≤ n-1 (T(k) + T(n − k))     if n ≥ 2

Note that we have replaced the Θ(1)'s with the constant 1.


Claim: T(n) ≥ 2^(n−1).
Proof. We prove this by induction on n. It is trivially true for n = 1, since T(1) ≥ 1 = 2^0.
Our induction hypothesis is that T(m) ≥ 2^(m−1) for all m < n. Using this hypothesis, we have
T(n) = 1 + ∑1≤ k≤ n-1 (T(k) + T(n − k))
≥ 1 + ∑1≤ k≤ n-1 T(k) -- ignore the term T(n − k)
≥ 1 + ∑1≤ k≤ n-1 2^(k−1) -- by the induction hypothesis
= 1 + ∑0≤ k≤ n-2 2^k -- re-index the sum
= 1 + (2^(n−1) − 1) -- by the geometric series formula
= 2^(n−1).
Therefore, we have T(n) = Ω(2^n).
Now the question is: why is this so much less efficient than the bottom-up dynamic programming
algorithm? If you "unravel'' the recursive calls on a reasonably long example, you will see that
the procedure is called repeatedly with the same arguments. The bottom-up version evaluates
each entry exactly once.

Now from very practical viewpoint, we would like to have the nice top-down structure of
recursive algorithm with the efficiency of bottom-up dynamic programming algorithm. The
question is: is it possible? The answer is yes, using the technique called memoization.
The fact that our recursive algorithm runs in exponential time is simply due to the spectacular
redundancy in the number of times it issues recursive calls. Now our problem is: how could we
eliminate all this redundancy? We could store the value of "cost" in a globally accessible place
the first time we compute it and then simply use this precomputed value in place of all future
recursive calls. This technique of saving values that have already been computed is referred to as
memoization.
The idea is as follows. Let's reconsider the function Rec-Matrix-Chain() given above. Its job is to
compute m[i, j], and return its value. The main problem with the procedure is that it recomputes
the same entries over and over. So, we will fix this by allowing the procedure to compute each
entry exactly once. One way to do this is to initialize every entry to some special value (e.g.
UNDEFINED). Once an entry's value has been computed, it is never recomputed.
In essence, what we are doing here is maintaining a table with subproblem solutions (like the
dynamic programming algorithm), but filling in the table more like the recursive algorithm. In other
words, we would like to have the best of both worlds!
Mem-Matrix-Chain(array p, int i, int j) {
if (m[i, j] != UNDEFINED) then
return m[i, j]; // already defined
else if ( i = = j) then
m[i, i] = 0; // basic case
else {
m[i, j] = infinity; // initialize
for k = i to j − 1 do { // try all splits
cost = Mem-Matrix-Chain(p, i, k) + Mem-Matrix-Chain(p, k + 1, j) + p[i − 1]
p[k] p[j];
if (cost < m[i, j]) then // update if better
m[i, j] = cost;
}
}

return m[i, j]; // return final cost
}
Like the dynamic programming algorithm, this version runs in time Ο(n3). Intuitively, the reason
is this: when we see a subproblem for the first time, we compute its solution and store it in the
table. After that, whenever we see the subproblem again, we simply look it up in the table and
return the solution. So, we compute each of the Ο(n2) table entries once, and the work
needed to compute one table entry (most of it in the for-loop) is at most Ο(n). So, memoization
turns an Ω(2^n)-time algorithm into an Ο(n3)-time algorithm.
As a matter of fact, in general, memoization is slower than the bottom-up method, so it is not usually
used in practice. However, in some dynamic programming problems, many of the table entries
are simply not needed, and so bottom-up computation may compute entries that are never
needed. In these cases, memoization pays, since it computes only the entries that are actually needed. If you know that
most of the table will not be needed, here is a way to save space. Rather than storing the whole
table explicitly as an array, you can store the "defined" entries of the table in a hash table, using
the index pair (i, j) as the hash key.
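A Java sketch of the memoized version might look as follows, with UNDEFINED modeled as -1 (the class and method names are our own):

```java
import java.util.Arrays;

// A sketch of Mem-Matrix-Chain: each m[i][j] is computed exactly once.
public class MemMatrixChain {
    static int[][] m;   // -1 plays the role of UNDEFINED
    static int[] p;

    static int solve(int i, int j) {
        if (m[i][j] != -1) return m[i][j];   // already defined: just look it up
        if (i == j) return m[i][j] = 0;      // basis case
        int best = Integer.MAX_VALUE;
        for (int k = i; k < j; k++) {        // try all splits
            int cost = solve(i, k) + solve(k + 1, j) + p[i - 1] * p[k] * p[j];
            if (cost < best) best = cost;    // update if better
        }
        return m[i][j] = best;               // store before returning
    }

    static int memoized(int[] dims) {
        p = dims;
        int n = p.length - 1;
        m = new int[n + 1][n + 1];
        for (int[] row : m) Arrays.fill(row, -1); // every entry starts UNDEFINED
        return solve(1, n);
    }
}
```

It returns the same value as the bottom-up procedure, since both fill the same table.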

2. 0-1 Knapsack Problem


Problem Statement: A thief robbing a store can carry a maximum weight of W in their
knapsack. There are n items; the ith item weighs wi and is worth vi dollars. What items should the
thief take?

There are two versions of the problem


Fractional knapsack problem The setup is the same, but the thief can take fractions of items,
meaning that the items can be broken into smaller pieces, so the thief may decide to carry only a
fraction xi of item i, where 0 ≤ xi ≤ 1.
0-1 knapsack problem The setup is the same, but the items may not be broken into smaller
pieces, so the thief may decide either to take an item or to leave it (a binary choice), but may not take a
fraction of an item.

Fractional knapsack problem

 Exhibits the greedy-choice property.
⇒ A greedy algorithm exists.
 Exhibits the optimal-substructure property.
0-1 knapsack problem
 Does not exhibit the greedy-choice property.
⇒ No greedy algorithm exists.
 Exhibits the optimal-substructure property.
⇒ A dynamic programming algorithm exists.

3. 0-1 Knapsack Problem: Dynamic Programming Solution
Let i be the highest-numbered item in an optimal solution S for W pounds. Then S` = S - {i} is an
optimal solution for W - wi pounds, and the value of the solution S is vi plus the value of the
subproblem.
We can express this fact in the following formula: define c[i, w] to be the solution for items 1,2,
. . . , i and maximum weight w. Then

c[i, w] = 0                                    if i = 0 or w = 0
c[i, w] = c[i-1, w]                            if i > 0 and wi > w
c[i, w] = max{vi + c[i-1, w-wi], c[i-1, w]}    if i > 0 and wi ≤ w

This says that the value of the solution for i items either includes the ith item, in which case it is vi plus
a subproblem solution for (i - 1) items and the weight excluding wi, or does not include the ith item,
in which case it is a subproblem's solution for (i - 1) items and the same weight. That is, if the
thief picks item i, the thief takes vi value, and can then choose from items 1, 2, . . . , i - 1 up to the
weight limit w - wi, and get c[i - 1, w - wi] additional value. On the other hand, if the thief decides
not to take item i, the thief can choose from items 1, 2, . . . , i - 1 up to the weight limit w, and get
c[i - 1, w] value. The better of these two choices should be made.

Although this is the 0-1 knapsack problem, the above formula for c is similar to the LCS formula:
boundary values are 0, and other values are computed from the input and "earlier" values of c. So
the 0-1 knapsack algorithm is like the LCS-length algorithm given in CLR for finding a longest
common subsequence of two sequences.
The algorithm takes as input the maximum weight W, the number of items n, and the two
sequences v = <v1, v2, . . . , vn> and w = <w1, w2, . . . , wn>. It stores the c[i, j] values in a table,
that is, a two-dimensional array c[0 . . n, 0 . . W], whose entries are computed in row-major
order. That is, the first row of c is filled in from left to right, then the second row, and so on. At
the end of the computation, c[n, W] contains the maximum value that can be packed into the
knapsack.

Dynamic-0-1-Knapsack (v, w, n, W)
FOR w = 0 TO W
    DO c[0, w] = 0
FOR i = 1 TO n
    DO c[i, 0] = 0
    FOR w = 1 TO W
        DO IF wi ≤ w
            THEN IF vi + c[i-1, w-wi] > c[i-1, w]
                THEN c[i, w] = vi + c[i-1, w-wi]
                ELSE c[i, w] = c[i-1, w]
            ELSE c[i, w] = c[i-1, w]

The set of items to take can be deduced from the table, starting at c[n, W] and tracing backwards
where the optimal values came from. If c[i, w] = c[i-1, w], item i is not part of the solution, and
we continue tracing with c[i-1, w]. Otherwise item i is part of the solution, and we continue
tracing with c[i-1, w-wi].
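A hedged Java sketch of this table-filling and traceback (the class, method names, and the item data used for testing are our own) is:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Dynamic-0-1-Knapsack with the traceback described above.
public class Knapsack01 {
    // Returns the c-table; c[i][w] = best value using items 1..i, capacity w.
    static int[][] table(int[] v, int[] w, int W) {
        int n = v.length - 1;                 // items are 1-indexed
        int[][] c = new int[n + 1][W + 1];    // row 0 and column 0 stay 0
        for (int i = 1; i <= n; i++)
            for (int cap = 1; cap <= W; cap++)
                if (w[i] <= cap && v[i] + c[i - 1][cap - w[i]] > c[i - 1][cap])
                    c[i][cap] = v[i] + c[i - 1][cap - w[i]];  // take item i
                else
                    c[i][cap] = c[i - 1][cap];                // leave item i
        return c;
    }

    // Trace back from c[n][W] to recover which items were taken.
    static List<Integer> items(int[] v, int[] w, int W) {
        int[][] c = table(v, w, W);
        List<Integer> take = new ArrayList<>();
        int cap = W;
        for (int i = v.length - 1; i > 0; i--)
            if (c[i][cap] != c[i - 1][cap]) { // item i is part of the solution
                take.add(i);
                cap -= w[i];
            }
        return take;
    }
}
```

Index 0 of v and w is unused padding so that item i lives at position i, matching the 1-indexed formulation above.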

Analysis

This Dynamic-0-1-Knapsack algorithm takes θ(nW) time, broken up as follows: θ(nW) time to
fill the c-table, which has (n+1)·(W+1) entries, each requiring θ(1) time to compute, and O(n) time
to trace the solution, because the tracing process starts in row n of the table and moves up one row
at each step.

ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Amortized Analysis

In an amortized analysis, the time required to perform a sequence of data structure operations is
averaged over all the operations performed. Amortized analysis can be used to show that the average cost
of an operation is small, if one averages over a sequence of operations, even though a single
operation might be expensive. Unlike average-case analysis, which depends on a probability
distribution over inputs, amortized analysis guarantees the 'average' performance of each
operation in the worst case.
CLR covers the three most common techniques used in amortized analysis. The main difference
is the way the cost is assigned.
1. Aggregate Method
o Computes an upper bound T(n) on the total cost of a sequence of n operations.
2. Accounting Method
o Overcharge some operations early in the sequence. This 'overcharge' is used later
in the sequence to pay for operations that are charged less than they actually cost.
3. Potential Method
o Maintain the credit as the potential energy to pay for future operations.

1. Aggregate Method

Aggregate Method Characteristics


 It computes the worst case time T(n) for a sequence of n operations.
 The amortized cost is T(n)/n per operation.
 It gives the average performance of each operation in the worst case.
 This method is less precise than other methods, as all operations are assigned the same
cost.

Application 1: Stack operations


In the following pseudocode, the operation STACK-EMPTY returns TRUE if there are no objects
currently on the stack, and FALSE otherwise.
MULTIPOP(s, k)
while (NOT STACK-EMPTY(s) and k ≠ 0)
do POP(s)
k = k - 1

Analysis
i. The worst-case cost of a single MULTIPOP is O(n), so n successive calls to MULTIPOP
would seem to cost O(n2). This O(n2) bound is not tight, because each item can be popped
at most once for each time it is pushed.

ii. In a sequence of n mixed operations, the number of pops (including those performed inside
MULTIPOP) is at most the number of pushes, which is at most n. Since the cost of PUSH and
POP is O(1), the cost of n stack operations is O(n). Therefore, the amortized
cost of an operation is the average: O(n)/n = O(1).
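The aggregate argument can be observed directly by instrumenting a stack: in any sequence of operations the total number of pops, including those performed inside MULTIPOP, never exceeds the total number of pushes. This Java sketch (names are our own) counts both:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A stack with MULTIPOP that counts pushes and pops, to illustrate the
// aggregate bound: popCount can never exceed pushCount.
public class MultipopStack {
    private final Deque<Integer> s = new ArrayDeque<>();
    int pushCount = 0, popCount = 0;

    void push(int x) {
        s.push(x);
        pushCount++;
    }

    void multipop(int k) {
        while (!s.isEmpty() && k != 0) { // pop until empty or k items popped
            s.pop();
            popCount++;
            k--;
        }
    }
}
```

After any mix of operations, popCount ≤ pushCount ≤ n, which is exactly why the naive O(n2) bound is loose.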

Application 2: Binary Counter


We use an array A[0 . . k-1] of bits, where length[A] = k, as the counter. A binary number x that
is stored in the counter has its lowest-order bit in A[0] and its highest-order bit in A[k-1], so
that x = ∑0≤ i≤ k-1 A[i]·2^i. Initially, x = 0, and thus A[i] = 0 for i = 0, 1, . . . , k-1.
To add 1 (modulo 2^k) to the value in the counter, use the following pseudocode.
INCREMENT (A)
i=0
while i < length [A] and A[i] = 1
do A[i] = 0
i = i+1
if i < length [A]
then A[i] = 1

A single execution of INCREMENT takes O(k) time in the worst case, when array A contains all 1's.
Thus, a sequence of n INCREMENT operations on an initially zero counter takes O(nk) time in the
worst case. This bound is correct but not tight.

Amortized Analysis
We can tighten the analysis to get a worst-case cost for a sequence of n INCREMENTs by
observing that not all bits flip each time INCREMENT is called:
Bit A[0] flips n times (on every call),
Bit A[1] flips ⌊n/2^1⌋ times (on every other call),
Bit A[2] flips ⌊n/2^2⌋ times,
.
.
.
Bit A[i] flips ⌊n/2^i⌋ times.

In general, for i = 0, 1, . . . , ⌊lg n⌋, bit A[i] flips ⌊n/2^i⌋ times in a sequence of n
INCREMENT operations on an initially zero counter.
For i > ⌊lg n⌋, bit A[i] never flips at all. The total number of flips in the sequence is
thus

∑0≤ i≤ ⌊lg n⌋ ⌊n/2^i⌋ < n ∑0≤ i< ∞ 1/2^i = 2n

Therefore, the worst-case time for a sequence of n INCREMENT operations on an initially zero
counter is O(n), so the amortized cost of each operation is O(n)/n = O(1).
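The "fewer than 2n flips" bound can be checked empirically. The following Java sketch of INCREMENT (the class name and the flip counter are our own additions) counts every bit flip:

```java
// INCREMENT on a k-bit counter, instrumented to count bit flips so the
// aggregate bound (total flips < 2n for n increments) can be verified.
public class BinaryCounter {
    final int[] a;   // a[0] is the lowest-order bit
    int flips = 0;

    BinaryCounter(int k) { a = new int[k]; }

    void increment() {
        int i = 0;
        while (i < a.length && a[i] == 1) { // clear the trailing run of 1's
            a[i] = 0;
            flips++;
            i++;
        }
        if (i < a.length) {                 // set the first 0 bit, if any
            a[i] = 1;
            flips++;
        }
    }
}
```

Running n = 1000 increments from zero flips bit i exactly ⌊1000/2^i⌋ times, for 1994 flips in total, comfortably below 2n = 2000.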

2. Accounting Method
In this method, we assign charges to different operations, with some operations charged more or
less than they actually cost. In other words, we assign artificial charges to different operations.
 Any overcharge for an operation on an item is stored (in a bank account) reserved for
that item.
 Later, a different operation on that item can pay for its cost with the credit for that item.
 The balance in the (bank) account is not allowed to become negative.
 The sum of the amortized costs for any sequence of operations is an upper bound on the
actual total cost of those operations.
 The amortized cost of each operation must be chosen wisely in order to pay for each
operation at or before the cost is incurred.

Application 1: Stack Operation
Recall the actual costs of stack operations were:
PUSH (s, x) 1
POP (s) 1
MULTIPOP (s, k) min(k,s)

The amortized cost assignments are


PUSH 2
POP 0
MULTIPOP 0
Observe that the amortized cost of each operation is O(1). We must show that one can pay for
any sequence of stack operations by charging the amortized costs.
The two units of cost collected for each PUSH are used as follows:
 1 unit is used to pay the cost of the PUSH itself.
 1 unit is collected in advance to pay for a potential future POP.

Therefore, for any sequence of n PUSH, POP, and MULTIPOP operations, the total amortized cost is an
upper bound on the total actual cost. Since the total amortized cost is O(n), so is the total actual cost.
As an example, consider a sequence of n operations performed on a data structure. The ith
operation costs i if i is an exact power of 2, and 1 otherwise. The accounting method of
amortized analysis determines the amortized cost per operation as follows.
Let the amortized cost per operation be 3. Then the credit Ci after the ith operation is

Ci = 3i − ∑1≤ j≤ i cj(actual) = 3i − (2^(⌊lg i⌋+1) + i − ⌊lg i⌋ − 2)

If i = 2^k, where k ≥ 0, then
Ci = 3i − (2^(k+1) + i − k − 2)
= k + 2
If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then
Ci = 3i − (2^(k+1) + i − k − 2)
= 2j + k + 2
Since k ≥ 0 and j ≥ 1, the credit Ci is always greater than zero. Hence, the total amortized cost 3n,
that is O(n), is an upper bound on the total actual cost. Therefore, the amortized cost of each
operation is O(n)/n = O(1).
As another example, consider a sequence of stack operations on a stack whose size never exceeds k.
After every k operations, a copy of the entire stack is made. We must show that the cost of n
stack operations, including copying the stack, is O(n) by assigning suitable amortized costs to the
various stack operations.
There are, of course, many ways to assign amortized costs to stack operations. One way is:
PUSH 4,
POP 0,
MULTIPOP 0,
STACK-COPY 0.
Every time we PUSH, we pay 1 dollar (unit) to perform the actual operation and store 1 dollar
in the bank. That leaves us with 2 dollars, which are placed on the pushed element, x say. When we
POP element x off the stack, one of the two dollars is used to pay for the POP operation and the other
dollar is again put into the bank account. The money in the bank is used to pay for the STACK-
COPY operation. Since after k operations there are at least k dollars in the bank, and the stack size
never exceeds k, there are enough dollars (units) in the bank (storage) to pay for the STACK-COPY
operations. The cost of n stack operations, including copying the stack, is therefore O(n).

Application 2: Binary Counter


We observed in the aggregate method that the running time of the INCREMENT operation on a binary counter is
proportional to the number of bits flipped. We shall use this running time as our cost here.
For the amortized analysis, charge an amortized cost of 2 dollars to set a bit to 1.
When a bit is set, use 1 dollar of the 2 dollars already charged to pay for the actual setting of the bit,
and place the other dollar on the bit as credit, so that when we reset the bit to zero, we need not
charge anything.
The amortized cost of the pseudocode INCREMENT can now be evaluated:
INCREMENT (A)

1. i = 0
2. while i < length[A] and A[i] = 1
3. do A[i] = 0
4. i = i +1
5. if i < length [A]
6. then A[i] = 1

Within the while loop, the cost of resetting the bits is paid for by the dollars on the bits that are
reset. At most one bit is set, in line 6 above, and therefore the amortized cost of an INCREMENT
operation is at most 2 dollars (units). Thus, for n INCREMENT operations, the total amortized
cost is O(n), which bounds the total actual cost.

Consider a Variant
Let us implement a binary counter as a bit vector so that any sequence of n INCREMENT and
RESET operations takes O(n) time on an initially zero counter. The goal here is not only to
increment the counter but also to reset it to zero, that is, make all bits in the binary counter zero.
The new field, max[A], holds the index of the high-order 1 in A. Initially, set max[A] to -1. Now,
update max[A] appropriately when the counter is incremented (or reset). To bound the cost of
RESET, we limit it to an amount that can be covered by credit from earlier INCREMENTs.
INCREMENT (A)
1. i = 0
2. while i < length [A] and A[i] = 1
3. do A[i] = 0
4. i = i +1
5. if i < length [A]
6. then A[i] = 1
7. if i > max[A]
8. then max[A] = i
9. else max[A] = -1

Note that lines 7, 8 and 9 are additions to the CLR binary-counter algorithm.

RESET(A)
For i = 0 to max[A]
do A[i] = 0
max[A] = -1

For the counter in CLR we assume that it costs 1 dollar to flip a bit. In addition, we
assume that we need 1 dollar to update max[A]. Setting and resetting of bits work exactly as in the
binary counter in CLR: pay 1 dollar to set a bit to 1 and place another 1 dollar on the same bit as
credit, so that the credit on each bit will pay to reset the bit during incrementing.
In addition, use 1 dollar to update max[A], and if max[A] increases, place 1 dollar as credit on the
new high-order 1. (If max[A] does not increase, we just waste that dollar.) Since RESET only
manipulates bits at positions up to max[A], and every such bit was set by some earlier INCREMENT,
every bit seen by RESET has one dollar of credit on it. So, the zeroing of bits by RESET can be
completely paid for by the credit stored on the bits. We just need one dollar to pay for resetting max[A].
Thus, charging 4 dollars for each INCREMENT and 1 dollar for each RESET is sufficient, so the
sequence of n INCREMENT and RESET operations take O(n) amortized time.
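A Java sketch of this INCREMENT/RESET variant follows. The class name is our own, and the else-branch (resetting max[A] to -1 on overflow) is our reading of line 9 of the pseudocode:

```java
// Binary counter with the extra max field: RESET only touches bits up to
// max, so any mix of n INCREMENT and RESET operations stays O(n).
public class ResettableCounter {
    final int[] a;
    int max = -1;   // index of the high-order 1, or -1 if the counter is zero

    ResettableCounter(int k) { a = new int[k]; }

    void increment() {
        int i = 0;
        while (i < a.length && a[i] == 1) { // clear the trailing run of 1's
            a[i] = 0;
            i++;
        }
        if (i < a.length) {
            a[i] = 1;
            if (i > max) max = i;           // the high-order 1 moved up
        } else {
            max = -1;                       // counter overflowed back to zero
        }
    }

    void reset() {
        for (int i = 0; i <= max; i++) a[i] = 0; // only bits up to max
        max = -1;
    }
}
```

For example, after five increments the counter holds 101 in binary and max is 2; a RESET then clears exactly bits 0 through 2.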

3. Potential Method
This method stores pre-payments as potential or potential energy that can be released to pay for
future operations. The stored potential is associated with the entire data structure rather than
specific objects within the data structure.
Notation:
 D0 is the initial data structure (e.g., stack)
 Di is the data structure after the ith operation.
 ci is the actual cost of the ith operation.
 The potential function Ψ maps each Di to its potential value Ψ(Di).
The amortized cost ĉi of the ith operation with respect to the potential function Ψ is defined by
ĉi = ci + Ψ(Di) − Ψ(Di-1) ---------- (1)
The amortized cost of each operation is therefore
ĉi = [actual operation cost] + [change in potential].

By equation (1), the total amortized cost of the n operations is
∑1≤ i≤ n ĉi = ∑1≤ i≤ n (ci + Ψ(Di) − Ψ(Di-1))
= ∑1≤ i≤ n ci + {Ψ(D1) + Ψ(D2) + . . . + Ψ(Dn)} − {Ψ(D0) + Ψ(D1) + . . . + Ψ(Dn-1)}
= ∑1≤ i≤ n ci + Ψ(Dn) − Ψ(D0) ----------- (2)

If we define a potential function Ψ so that Ψ(Dn) ≥ Ψ(D0), then the total amortized cost
∑1≤ i≤ n ĉi is an upper bound on the total actual cost.
As an example, consider a sequence of n operations performed on a data structure. The ith
operation costs i if i is an exact power of 2, and 1 otherwise. The potential method of amortized
analysis determines the amortized cost per operation as follows.
Let Ψ(Di) = 2i − 2^(⌊lg i⌋+1) + 1 for i > 0, and Ψ(D0) = 0. Since 2^(⌊lg i⌋+1) ≤ 2i for i > 0,
Ψ(Di) ≥ 0 = Ψ(D0).

If i = 2^k, where k ≥ 0, then
2^(⌊lg i⌋+1) = 2^(k+1) = 2i and 2^(⌊lg (i-1)⌋+1) = 2^k = i, so
ĉi = ci + Ψ(Di) − Ψ(Di-1)
= i + (2i − 2i + 1) − (2(i − 1) − i + 1)
= 2

If i = 2^k + j, where k ≥ 0 and 1 ≤ j < 2^k, then ⌊lg i⌋ = ⌊lg (i-1)⌋ = k, so the potential
difference is Ψ(Di) − Ψ(Di-1) = 2 and
ĉi = ci + Ψ(Di) − Ψ(Di-1) = 1 + 2 = 3

Because ∑1≤ i≤ n ĉi = ∑1≤ i≤ n ci + Ψ(Dn) − Ψ(D0)

and Ψ(Dn) ≥ Ψ(D0), the total amortized cost of the n operations is an upper bound on the total
actual cost. Therefore, the total amortized cost of a sequence of n operations is O(n) and the
amortized cost per operation is O(n)/n = O(1).

Application 1- Stack Operations


Define the potential function Ψ on a stack to be the number of objects in the stack. For the empty
stack D0, we have Ψ(D0) = 0. Since the number of objects in the stack cannot be negative, the
stack Di after the ith operation has nonnegative potential, and thus

Ψ(Di) ≥ 0 = Ψ(D0).
Therefore, the total amortized cost of n operations w.r.t. function Ψ represents an upper bound on
the actual cost.
Amortized costs of stack operations are:
PUSH
If the ith operation on a stack containing s objects is a PUSH operation, then the potential
difference is
Ψ(Di) − Ψ(Di-1) = (s + 1) − s = 1
In simple words, after a PUSH the stack holds one more object than before. By equation (1),
the amortized cost of this PUSH operation is

ĉi = ci + Ψ(Di) − Ψ(Di-1) = 1 + 1 = 2
MULTIPOP
If the ith operation on the stack is MULTIPOP(S, k), then k` = min(k, s) objects are popped off the
stack.
The actual cost of the operation is k`, and the potential difference is
Ψ(Di) − Ψ(Di-1) = −k`
Why is this negative? Because we are taking objects off the stack. Thus, the amortized cost of
the MULTIPOP operation is
ĉi = ci + Ψ(Di) − Ψ(Di-1) = k` − k` = 0

POP
Similarly, the amortized cost of a POP operation is 0.

Analysis
Since the amortized cost of each of the three operations is O(1), the total amortized cost of
n operations is O(n), and this total amortized cost is an upper bound on the total
actual cost.

Lemma If the data structure is a binary heap: show that there is a potential function Ψ with Ψ(D) = Θ(n lg n)
such that the amortized cost of EXTRACT-MIN is constant.

Proof
Recall that the amortized cost ĉi of operation i is defined as

ĉi = ci + Ψ(Di) − Ψ(Di-1)
Consider the potential function Ψ(D) = lg(n!), where n is the number of items in D.
For an INSERT (which takes the heap from n − 1 items to n items), the change in potential is
lg(n!) − lg((n − 1)!) = lg n,
so the amortized cost is the Θ(lg n) actual cost plus lg n, which is still Θ(lg n).
For an EXTRACT-MIN (which takes the heap from n items to n − 1 items), the change in potential is
lg((n − 1)!) − lg(n!) = −lg(n),
which cancels the Θ(lg n) actual cost of the operation, leaving a constant amortized cost.
Remember that Stirling's formula tells us that lg(n!) = θ(n lg n), so
Ψ(D) = θ(n lg n)
And this completes the proof.

Application 2: Binary Counter

Define the potential of the counter after the ith INCREMENT operation to be bi, the number of 1's in
the counter after the ith operation.
Suppose the ith INCREMENT operation resets ti bits. Its actual cost is at most ti + 1.
Why? Because in addition to resetting ti bits it also sets at most one bit to 1.
Therefore, the number of 1's in the counter after the ith operation is bi ≤ bi-1 − ti + 1, and
the potential difference is
Ψ(Di) − Ψ(Di-1) ≤ (bi-1 − ti + 1) − bi-1 = 1 − ti
Putting this value into equation (1), we get
ĉi = ci + Ψ(Di) − Ψ(Di-1)
≤ (ti + 1) + (1 − ti)
= 2
If counter starts at zero, then Ψ(D0) = 0. Since Ψ(Di) ≥ 0 for all i, the total amortized cost of a
sequence of n INCREMENT operation is an upper bound on the total actual cost, and so the
worst-case cost of n INCREMENT operations is O(n).
If the counter does not start at zero, then the initial number of 1's is b0.
After n INCREMENT operations the number of 1's is bn, where 0 ≤ b0, bn ≤ k.
Rearranging equation (2),
∑1≤ i≤ n ci = ∑1≤ i≤ n ĉi − Ψ(Dn) + Ψ(D0)

We have ĉi ≤ 2 for all 1 ≤ i ≤ n. Since Ψ(D0) = b0 and Ψ(Dn) = bn, the total cost of n
INCREMENT operations is
∑1≤ i≤ n ci ≤ ∑1≤ i≤ n 2 − bn + b0
= 2n − bn + b0

Note that since b0 ≤ k, if we execute at least n = Ω(k) INCREMENT operations, the total actual
cost is O(n), no matter what the initial value of the counter is.

Consider an implementation of a queue with two stacks, such that the amortized cost of each ENQUEUE and
each DEQUEUE operation is O(1). ENQUEUE pushes an object onto the first stack.
DEQUEUE pops an object off the second stack if it is not empty. If the second stack is empty,
DEQUEUE transfers all objects from the first stack to the second stack and
then pops off the first object. The goal is to show that this implementation has an O(1) amortized
cost for each ENQUEUE and DEQUEUE operation. Suppose Di denotes the state of the stacks
after the ith operation. Define Ψ(Di) to be the number of elements in the first stack. Clearly, Ψ(D0) =
0 and Ψ(Di) ≥ Ψ(D0) for all i. If the ith operation is an ENQUEUE operation, then
Ψ(Di) − Ψ(Di-1) = 1.
Since the actual cost of an ENQUEUE operation is 1, the amortized cost of an ENQUEUE
operation is 2. If the ith operation is a DEQUEUE, then there are two cases to consider.

Case i: When the second stack is not empty.


In this case we have Ψ(Di) - Ψ(Di-1) = 0 and the actual cost of the DEQUEUE operation
is 1.

Case ii: When the second stack is empty.


In this case, we have Ψ(Di) − Ψ(Di-1) = −Ψ(Di-1) and the actual cost of the DEQUEUE
operation is Ψ(Di-1) + 1.
In either case, the amortized cost of the DEQUEUE operation is 1. It follows that each operation
has O(1) amortized cost.
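A Java sketch of this two-stack queue (the names are our own; ArrayDeque serves as the stack) is:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Queue built from two stacks: ENQUEUE pushes on the first stack; DEQUEUE
// pops from the second, refilling it from the first only when it is empty.
public class TwoStackQueue<T> {
    private final Deque<T> in = new ArrayDeque<>();
    private final Deque<T> out = new ArrayDeque<>();

    void enqueue(T x) {
        in.push(x);                 // actual cost 1, amortized cost 2
    }

    T dequeue() {
        if (out.isEmpty())          // case ii: transfer reverses the order
            while (!in.isEmpty()) out.push(in.pop());
        return out.pop();           // amortized cost 1 in either case
    }
}
```

Each element is pushed and popped at most twice in total (once per stack), which is the O(1) amortized bound seen through the aggregate lens.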

ALGORITHMS DESIGN & ANALYSIS TECHNIQUES: Backtracking

 Suppose you have to make a series of decisions, among various choices, where:
 you don't have enough information to know what to choose;
 each decision leads to a new set of choices;
 some sequence of choices (possibly more than one) may be a solution to your
problem.
 Backtracking is a methodical way of trying out various sequences of decisions, until you
find one that “works”

Solving a maze

 Given a maze, find a path from start to finish


 At each intersection, you have to decide between three or fewer choices:
 Go straight
 Go left
 Go right
 You don’t have enough information to choose correctly
 Each choice leads to another set of choices
 One or more sequences of choices may (or may not) lead to a solution
 Many types of maze problems can be solved with backtracking

Coloring a map

 You wish to color a map with not more than four colors:
 red, yellow, green, blue
 Adjacent countries must be in different colors
 You don’t have enough information to choose colors
 Each choice leads to another set of choices
 One or more sequences of choices may (or may not) lead to a solution
 Many coloring problems can be solved with backtracking

Solving a puzzle

 In this puzzle, all holes but one are filled with white pegs
 You can jump over one peg with another
 Jumped pegs are removed
 The object is to remove all but the last peg
 You don’t have enough information to jump correctly
 Each choice leads to another set of choices
 One or more sequences of choices may (or may not) lead to a solution
 Many kinds of puzzles can be solved with backtracking

Backtracking

Terminology

(The original presents the backtracking terminology, such as the tree of choices, its root and
leaf nodes, dead ends, and goal nodes, in two figures that are not reproduced here.)

Real and virtual trees

 There is a type of data structure called a tree


 But we are not using it here
 If we diagram the sequence of choices we make, the diagram looks like a tree
 In fact, we did just this a couple of slides ago
 Our backtracking algorithm “sweeps out a tree” in “problem space”

The backtracking algorithm

 Backtracking is really quite simple--we “explore” each node, as follows:


 To “explore” node N:
1. If N is a goal node, return “success”
2. If N is a leaf node, return “failure”
3. For each child C of N,
3.1. Explore C
3.1.1. If C was successful, return “success”
4. Return “failure”

Full example: Map coloring

 The Four Color Theorem states that any map on a plane can be colored with no more than
four colors, so that no two countries with a common border are the same color
 For most maps, finding a legal coloring is easy
 For some maps, it can be fairly difficult to find a legal coloring
 We will develop a complete Java program to solve this problem

Data structures

 We need a data structure that is easy to work with, and supports:


 Setting a color for each country
 For each country, finding all adjacent countries
 We can do this with two arrays
 An array of “colors”, where mapColors[i] is the color of the ith
country
 A ragged array of adjacent countries, where map[i][j] is the jth country
adjacent to country i
 Example: map[5][3]==8 means the 3rd country adjacent to
country 5 is country 8

Creating the map
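The figure showing createMap is not reproduced in this copy. A sketch of what it could look like for a seven-country map follows; the neighbour lists below are invented for illustration, since the original map diagram is missing:

```java
// Sketch of the missing createMap figure. The neighbour lists are
// invented for illustration; the original diagram is not reproduced.
class MapData {
    static int[][] map;

    static void createMap() {
        map = new int[][] {
            { 1, 4, 2, 5 },       // neighbours of country 0
            { 0, 4, 6, 5 },       // neighbours of country 1
            { 0, 4, 3, 6, 5 },    // neighbours of country 2
            { 2, 4, 6 },          // neighbours of country 3
            { 0, 1, 6, 3, 2 },    // neighbours of country 4
            { 2, 6, 1, 0 },       // neighbours of country 5
            { 2, 3, 4, 1, 5 }     // neighbours of country 6
        };
    }
}
```

Whatever data is used, the ragged array must be symmetric: if country j appears in map[i], then country i must appear in map[j].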

Setting the initial colors


static final int NONE = 0;
static final int RED = 1;
static final int YELLOW = 2;
static final int GREEN = 3;
static final int BLUE = 4;

int mapColors[] = { NONE, NONE, NONE, NONE, NONE, NONE, NONE };

The main program

(The name of the enclosing class is ColoredMap)


public static void main(String args[]) {
    ColoredMap m = new ColoredMap();
    m.createMap();
    boolean result = m.explore(0, RED);
    System.out.println(result);
    m.printMap();
}

The backtracking method

Checking if a color can be used

Printing the results
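The three methods above appear only as figures in the original. A plausible reconstruction of the whole program follows; the helper names (okToColor, printMap) and the adjacency data in createMap are assumptions, not the author's original code:

```java
// A sketch of the complete map-coloring program. The method bodies and
// the adjacency data are reconstructions; the original shows them only
// as figures.
class ColoredMap {
    static final int NONE = 0, RED = 1, YELLOW = 2, GREEN = 3, BLUE = 4;

    int[][] map;
    int[] mapColors = { NONE, NONE, NONE, NONE, NONE, NONE, NONE };

    void createMap() {   // invented seven-country adjacency data
        map = new int[][] {
            { 1, 4, 2, 5 }, { 0, 4, 6, 5 }, { 0, 4, 3, 6, 5 },
            { 2, 4, 6 }, { 0, 1, 6, 3, 2 }, { 2, 6, 1, 0 }, { 2, 3, 4, 1, 5 }
        };
    }

    // The backtracking method: try this color here, then explore onward;
    // on failure, un-color (backtrack) and report failure to the caller.
    boolean explore(int country, int color) {
        if (country >= map.length) return true;      // goal: all colored
        if (okToColor(country, color)) {
            mapColors[country] = color;
            for (int c = RED; c <= BLUE; c++) {      // each child choice
                if (explore(country + 1, c)) return true;
            }
            mapColors[country] = NONE;               // backtrack
        }
        return false;
    }

    // Checking if a color can be used: no neighbour already has it.
    boolean okToColor(int country, int color) {
        for (int neighbour : map[country]) {
            if (mapColors[neighbour] == color) return false;
        }
        return true;
    }

    void printMap() {
        for (int i = 0; i < mapColors.length; i++) {
            System.out.println("Country " + i + " has color " + mapColors[i]);
        }
    }

    public static void main(String args[]) {
        ColoredMap m = new ColoredMap();
        m.createMap();
        boolean result = m.explore(0, RED);
        System.out.println(result);
        m.printMap();
    }
}
```

Run on the sample data, explore(0, RED) returns true and every pair of adjacent countries ends up with different colors.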

GRAPH ALGORITHMS
Graph Theory is an area of mathematics that deals with the following types of problems:
 Connection problems
 Scheduling problems
 Transportation problems
 Network analysis
 Games and Puzzles.

The Graph Theory has important applications in Critical path analysis, Social psychology,
Matrix theory, Set theory, Topology, Group theory, Molecular chemistry, and Searching.


Introduction to Graphs
Definitions
Graphs, vertices and edges
A graph is a collection of nodes called vertices, and the connections between them, called edges.
Undirected and directed graphs
When the edges in a graph have a direction, the graph is called a directed graph or digraph, and
the edges are called directed edges or arcs. Here, I shall be exclusively concerned with directed
graphs, and so when I refer to an edge, I mean a directed edge. This is not a limitation, since an
undirected graph can easily be implemented as a directed graph by adding edges between
connected vertices in both directions.
A representation can often be simplified if it is only being used for undirected graphs, and I'll
mention in passing how this can be achieved.
Neighbours and adjacency
A vertex that is the end-point of an edge is called a neighbour of the vertex that is its starting-
point. The first vertex is said to be adjacent to the second.
An example

The following diagram shows a graph with 5 vertices and 7 edges. The edges between A and D
and B and C are pairs that make a bidirectional connection, represented here by a double-headed
arrow.

Mathematical definition
More formally, a graph is an ordered pair, G = <V, A>, where V is the set of vertices, and A, the
set of arcs, is itself a set of ordered pairs of vertices.
For example, the following expressions describe the graph shown above in set-theoretic
language:
V = {A, B, C, D, E}
A = {<A, B>, <A, D>, <B, C>, <C, B>, <D, A>, <D, C>, <D, E>}

Digraph
A directed graph, or digraph G consists of a finite nonempty set of vertices V, and a finite set of
edges E, where an edge is an ordered pair of vertices in V. Vertices are also commonly referred
to as nodes. Edges are sometimes referred to as arcs.

As an example, we could define a graph G=(V, E) as follows:


V = {1, 2, 3, 4}
E = {(1, 2), (2, 4), (4, 2), (4, 1)}
Here is a pictorial representation of this graph.
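Even without the picture, the graph is fully determined by V and E alone. One way to store it, as a sketch using adjacency lists (vertex 3 has no outgoing edges):

```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list representation of the digraph above:
// V = {1, 2, 3, 4}, E = {(1,2), (2,4), (4,2), (4,1)}.
class SmallDigraph {
    static List<List<Integer>> build() {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v <= 4; v++) adj.add(new ArrayList<>()); // index 0 unused
        adj.get(1).add(2);   // edge (1, 2)
        adj.get(2).add(4);   // edge (2, 4)
        adj.get(4).add(2);   // edge (4, 2)
        adj.get(4).add(1);   // edge (4, 1)
        return adj;
    }
}
```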

The definition of a graph implies that a graph can be drawn just from its vertex-set and its
edge-set. For example, an undirected graph with vertex set V = {1,2,3,4} can be described by the
edge set E = {(1,2),(2,4),(4,3),(3,1),(1,4),(2,1),(4,2),(3,4),(1,3),(4,1)}. Notice that each edge is
listed twice, once in each direction: the ten ordered pairs describe five undirected edges.
Another well-known example is the Petersen graph, which has 10 vertices (an outer 5-cycle, an
inner 5-cycle drawn as a pentagram, and the five spokes joining them) and 15 edges.

This section covers the following three important topics from an algorithmic perspective.
1. Transpose
2. Square
3. Incidence Matrix

1. Transpose
If graph G = (V, E) is a directed graph, its transpose, GT = (V, ET), is the same as graph G with all
arrows reversed. We define the transpose of an adjacency matrix A = (aij) to be the adjacency
matrix AT = (aTij) given by aTij = aji. In other words, the rows of matrix A become the columns of
matrix AT, and the columns of matrix A become the rows of matrix AT. Since in an undirected
graph (u, v) and (v, u) represent the same edge, the adjacency matrix A of an undirected graph is
its own transpose: A = AT.
Formally, the transpose of a directed graph G = (V, E) is the graph GT = (V, ET), where
ET = {(u, v) ∈ V×V : (v, u) ∈ E}. Thus, GT is G with all its edges reversed.
We can compute GT from G in the adjacency matrix representations and adjacency list
representations of graph G.
The algorithm for computing GT from G in the adjacency-matrix representation of graph G is:

ALGORITHM MATRIX-TRANSPOSE (G, GT)
    for i = 0 to |V[G]| − 1
        for j = 0 to |V[G]| − 1
            GT(j, i) = G(i, j)

To see why it works, notice that the algorithm sets GT(j, i) equal to G(i, j) for every pair of
indices, which is exactly the definition of the transpose. The time complexity is clearly O(V2).
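The matrix-transpose algorithm above translates directly into Java; a minimal sketch (0-indexed, with 1 marking an edge) is:

```java
// Transpose of a digraph in adjacency-matrix form: g[i][j] == 1
// iff there is an edge (i, j); the transpose just swaps the indices.
class MatrixTranspose {
    static int[][] transpose(int[][] g) {
        int n = g.length;
        int[][] gt = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                gt[j][i] = g[i][j];   // GT(j, i) = G(i, j)
        return gt;                    // two nested loops: O(V^2)
    }
}
```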

Algorithm for Computing GT from G in Adjacency-List Representation


In this representation, a new adjacency list must be constructed for transpose of G. Every list in
adjacency list is scanned. While scanning adjacency list of v (say), if we encounter u, we put v in
adjacency-list of u.

ALGORITHM LIST TRANSPOSE [G]


for u = 1 to V[G]
for each element vÎAdj[u]
Insert u into the front of Adj[v]

To see why it works, notice if an edge exists from u to v, i.e., v is in the adjacency list of u, then
u is present in the adjacency list of v in the transpose of G.
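A sketch of the adjacency-list version in Java (using ArrayList in place of linked lists, with add(0, u) standing in for "insert at the front"):

```java
import java.util.ArrayList;
import java.util.List;

// Adjacency-list transpose: scan each list; whenever u's list contains v,
// put u into the transposed list of v. O(V + E) list operations in total.
class ListTranspose {
    static List<List<Integer>> transpose(List<List<Integer>> adj) {
        List<List<Integer>> t = new ArrayList<>();
        for (int i = 0; i < adj.size(); i++) t.add(new ArrayList<>());
        for (int u = 0; u < adj.size(); u++)
            for (int v : adj.get(u))
                t.get(v).add(0, u);   // insert u at the front of Adj'[v]
        return t;
    }
}
```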

2. Square
The square of a directed graph G = (V, E) is the graph G2 = (V, E2) such that (a, b) ∈ E2 if and only
if for some vertex c ∈ V, both (a, c) ∈ E and (c, b) ∈ E. That is, G2 contains an edge between vertex a
and vertex b whenever G contains a path with exactly two edges between vertex a and vertex b.

Algorithms for Computing G2 from G in the Adjacency-List Representation of G

create a new array Adj', indexed by V[G]
for each v in V[G] do
    for each u in Adj[v] do
        // v has a path of length 2
        // to each of the neighbours of u
        make a copy of Adj[u] and append it to Adj'[v]
return Adj'
For each vertex, we must make a copy of at most |E| list elements. The total time is O(|V| * |E|).
Algorithm for computing G2 from G in the adjacency-matrix representation of G (using a
separate output matrix c2, so that the input matrix c is not overwritten while it is still being read):
for i = 1 to |V[G]|
    for j = 1 to |V[G]|
        c2[i, j] = 0
        for k = 1 to |V[G]|
            c2[i, j] = c2[i, j] + c[i, k] * c[k, j]

Because of three nested loops, the running time is O(V3).
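Following the definition above (an edge of G2 for every path of exactly two edges in G), a Java sketch of the adjacency-matrix computation is:

```java
// Square of a digraph in adjacency-matrix form: g2 has edge (i, j)
// iff some k gives edges (i, k) and (k, j) in g, i.e. a path of
// exactly two edges from i to j.
class GraphSquare {
    static int[][] square(int[][] g) {
        int n = g.length;
        int[][] g2 = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    if (g[i][k] == 1 && g[k][j] == 1)
                        g2[i][j] = 1;     // three nested loops: O(V^3)
        return g2;
    }
}
```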

3. Incidence Matrix
The incidence matrix of a directed graph G = (V, E) is a V×E matrix B = (bij) such that
bij = −1 if edge j leaves vertex i,
      1 if edge j enters vertex i,
      0 otherwise.
If B is the incidence matrix and BT is its transpose, the diagonal of the product matrix BBT
represents the degrees of all the nodes, i.e., if P is the product matrix BBT, then P[i, i]
is the degree of node i.
Specifically, we have
BBT(i, j) = ∑e∈E bie bTej = ∑e∈E bie bje
Now,
 If i = j, then bie bje = 1 whenever edge e enters or leaves vertex i, and 0 otherwise.
 If i ≠ j, then bie bje = −1 when e = (i, j) or e = (j, i), and 0 otherwise.

Therefore

BBT(i, j) = deg(i) = in_deg(i) + out_deg(i)           if i = j
          = −(number of edges connecting i and j)     if i ≠ j

Types of Graph Algorithms


1. Breadth First Search (BFS)
2. Depth First Search (DFS)
3. Topological Sort
4. Strongly Connected Components
5. Euler Tour
6. Generic Minimum Spanning Tree
7. Kruskal's Algorithm
8. Prim's Algorithm

1. Breadth-First Search Traversal Algorithm


Breadth-first search is a way to find all the vertices reachable from a given source vertex, s.
Like depth-first search, BFS traverses a connected component of a given graph and defines a
spanning tree. Intuitively, the basic idea of breadth-first search is this: send a wave out from
source s. The wave first hits all vertices 1 edge from s; from there, it hits all vertices 2 edges
from s; and so on. We use a FIFO queue Q to maintain the wavefront: v is in Q if and only if
the wave has hit v but has not yet come out of v.

Overall Strategy of BFS Algorithm


Breadth-first search starts at a given vertex s, which is at level 0. In the first stage, we visit all the
vertices that are at a distance of one edge away, painting them as "visited"; these vertices,
adjacent to the start vertex s, are placed into level 1. In the second stage, we visit all the new
vertices we can reach at a distance of two edges from the source vertex s. These new vertices,
which are adjacent to level 1 vertices and not previously assigned to a level, are placed into
level 2, and so on. The BFS traversal terminates when every vertex has been visited.
To keep track of progress, breadth-first-search colors each vertex. Each vertex of the graph is in
one of three states:

1. Undiscovered;
2. Discovered but not fully explored; and
3. Fully explored.
The state of a vertex, u, is stored in a color variable as follows:
1. color[u] = White - for the "undiscovered" state,
2. color [u] = Gray - for the "discovered but not fully explored" state, and
3. color [u] = Black - for the "fully explored" state.
The BFS(G, s) algorithm develops a breadth-first search tree with the source vertex, s, as its root.
The parent or predecessor of any other vertex in the tree is the vertex from which it was first
discovered. For each vertex, v, the parent of v is placed in the variable π[v]. Another variable,
d[v], computed by BFS contains the number of tree edges on the path from s to v. The breadth-
first search uses a FIFO queue, Q, to store gray vertices.

Algorithm: Breadth-First Search Traversal


BFS(V, E, s)
1. for each u in V − {s} ▷ for each vertex u in V[G] except s
2. do color[u] ← WHITE
3. d[u] ← infinity
4. π[u] ← NIL
5. color[s] ← GRAY ▷ source vertex discovered
6. d[s] ← 0 ▷ initialize
7. π[s] ← NIL ▷ initialize
8. Q ← {} ▷ clear queue Q
9. ENQUEUE(Q, s)
10. while Q is non-empty
11. do u ← DEQUEUE(Q) ▷ remove the vertex at the head of Q
12. for each v adjacent to u ▷ examine every edge out of u
13. do if color[v] = WHITE ▷ v has never been seen before
14. then color[v] ← GRAY
15. d[v] ← d[u] + 1
16. π[v] ← u
17. ENQUEUE(Q, v)
18. color[u] ← BLACK
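The pseudocode above can be sketched in Java. This version folds the color and d fields together by letting d[v] = -1 stand for WHITE, which is enough when we only need the distances:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// BFS over an adjacency-list graph; returns d[], the number of edges
// on a shortest path from s to each vertex (-1 means unreachable).
class BFSExample {
    static int[] bfs(int[][] adj, int s) {
        int[] d = new int[adj.length];
        Arrays.fill(d, -1);                // all vertices undiscovered (WHITE)
        d[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);                          // ENQUEUE(Q, s)
        while (!q.isEmpty()) {
            int u = q.remove();            // u <- DEQUEUE(Q)
            for (int v : adj[u]) {
                if (d[v] == -1) {          // v was WHITE: first discovery
                    d[v] = d[u] + 1;
                    q.add(v);              // v is now GRAY (on the wavefront)
                }
            }
        }
        return d;
    }
}
```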

Example: The following figure (from CLRS) illustrates the progress of breadth-first search on
the undirected sample graph.
a. After initialization (paint every vertex white, set d[u] to infinity for each vertex u, and set the
parent of every vertex to be NIL), the source vertex is discovered in line 5. Lines 8-9 initialize Q
to contain just the source vertex s.

b. The algorithm discovers all vertices 1 edge from s, i.e., all vertices (w and r) at
level 1.

d. The algorithm discovers all vertices 2 edges from s, i.e., all vertices (t, x, and v) at
level 2.

g. The algorithm discovers all vertices 3 edges from s, i.e., all vertices (u and y) at
level 3.

i. The algorithm terminates when every vertex has been fully explored.

(Panels c, e, f, and h of the original figure show intermediate snapshots and are not
reproduced here.)

Analysis
 The while-loop in breadth-first search is executed at most |V| times. The reason is that
every vertex is enqueued at most once. So, we have O(V).
 The for-loop inside the while-loop is executed at most |E| times if G is a directed graph, or
2|E| times if G is undirected. The reason is that every vertex is dequeued at most once and
we examine edge (u, v) only when u is dequeued. Therefore, every edge is examined at most
once if the graph is directed, and at most twice if it is undirected. So, we have O(E).

Therefore, the total running time for breadth-first search traversal is O(V + E).

Lemma 22.3 (CLRS) At any time during the execution of BFS suppose that Q contains the
vertices {v1, v2, ..., vr} with v1 at the head and vr at the tail. Then d[v1] ≤ d[v2] ≤ ... ≤ d[vr] ≤ d[v1]
+ 1.
Let v be any vertex in V[G]. If v is reachable from s then let δ(s, v) be the minimum number of
edges in E[G] that must be traversed to go from vertex s to vertex v. If v is not reachable from s
then let δ(s, v) = ∞.
Theorem 22.5 (CLRS) If BFS is run on graph G from a source vertex s in V[G] then for all v
in V[G], d[v] = δ(s, v) and if v ≠ s is reachable from s then one of the shortest paths from s to v is
a shortest path from s to π[v] followed by the edge from π[v] to v.
BFS builds a tree called a breadth-first tree containing all vertices reachable from s. The set of
edges in the tree (called tree edges) contains (π[v], v) for all v where π[v] ≠ NIL.
If v is reachable from s then there is a unique path of tree edges from s to v. Print-Path(G, s, v)
prints the vertices along that path in O(|V|) time.
Print-Path(G, s, v)
    if v = s
        then print s
    else if π[v] = NIL
        then print "no path exists from" s "to" v
    else Print-Path(G, s, π[v])
         print v

Algorithms based on BFS


Based upon the BFS, there are O(V + E)-time algorithms for the following problems:
 Testing whether graph is connected.
 Computing a spanning forest of graph.
 Computing, for every vertex in graph, a path with the minimum number of edges between
start vertex and current vertex or reporting that no such path exists.
 Computing a cycle in graph or reporting that no such cycle exists.

In our course, we will use BFS in the following:
 Prim's MST algorithm. (CLRS, Chapter 23.)
 Dijkstra's single source shortest path algorithm. (CLRS, Chapter 24.)

Some Applications of BFS


1. Bipartite Graph
We define a bipartite graph as follows: a bipartite graph is an undirected graph G = (V, E) in
which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u in V1 and v
in V2, or u in V2 and v in V1. That is, all edges go between the two sets V1 and V2.
In order to determine whether a graph G = (V, E) is bipartite, we perform a BFS on it with a small
modification: whenever the BFS is at a vertex u and encounters a vertex v that is already
'gray', the modified BFS checks whether the depths of u and v are both even or both odd.
If either of these conditions holds, which means d[u] and d[v] have the same parity,
then the graph is not bipartite. Note that this modification does not change the running time of
BFS, which remains O(V + E).
Formally, to check if the given graph is bipartite, the algorithm traverses the graph labeling the
vertices 0, 1, or 2, corresponding to unvisited, partition 1, and partition 2 nodes. If an edge is
detected between two vertices in the same partition, the algorithm returns 0 (failure).

ALGORITHM: BIPARTITE (G, s)

for each vertex u in V[G] − {s}
    do color[u] ← WHITE
       d[u] ← ∞
       partition[u] ← 0
color[s] ← GRAY
partition[s] ← 1
d[s] ← 0
Q ← [s]
while queue Q is non-empty
    do u ← head[Q]
       for each v in Adj[u] do
           if partition[u] = partition[v]
               then return 0
           else if color[v] = WHITE
               then color[v] ← GRAY
                    d[v] ← d[u] + 1
                    partition[v] ← 3 − partition[u]
                    ENQUEUE(Q, v)
       DEQUEUE(Q)
       color[u] ← BLACK
return 1
Correctness
As BIPARTITE(G, s) traverses the graph, it labels the vertices with partition numbers consistent
with the graph being bipartite. If at any vertex the algorithm detects an inconsistency, it signals
this with an invalid return value. The partition value of u is always valid, since u was enqueued at
some point and its partition was assigned at that point. The partition of v is left unchanged if it is
already set; otherwise it is set to the value opposite to that of vertex u.
Analysis
The lines added to BFS algorithm take constant time to execute and so the running time is the
same as that of BFS which is O(V + E).
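A Java sketch of the modified BFS follows. It uses the 0/1/2 labels described above and assumes, as the pseudocode does, that the graph is connected from s:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// BFS-based bipartiteness test on an undirected graph stored as
// adjacency lists; partition[v] is 0 (unvisited), 1, or 2.
class BipartiteTest {
    static boolean isBipartite(int[][] adj, int s) {
        int[] partition = new int[adj.length];   // all start at 0 = unvisited
        partition[s] = 1;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int u = q.remove();
            for (int v : adj[u]) {
                if (partition[v] == partition[u]) return false; // same side
                if (partition[v] == 0) {             // WHITE: not seen yet
                    partition[v] = 3 - partition[u]; // opposite side of u
                    q.add(v);
                }
            }
        }
        return true;   // assumes every vertex is reachable from s
    }
}
```

An even cycle is bipartite, while a triangle is not, which makes a quick sanity check.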

2. Diameter of Tree
The diameter of a tree T = (V, E) is the largest of all shortest-path distances in the tree, given
by max[dist(u, v)]. As we have mentioned, BFS can be used to compute, for every vertex in a
graph, a path with the minimum number of edges between the start vertex and that vertex. It is
therefore quite easy to compute the diameter of a tree: for each vertex in the tree, we run the BFS
algorithm to get the largest shortest-path distance from it, and we record the largest of all these
values in a variable maxlength.
ALGORITHM: TREE_DIAMETER (T)
    maxlength ← 0
    for s ← 0 to |V[T]| − 1
        do temp ← the largest distance d[v] computed by BFS(T, s)
           if maxlength < temp
               then maxlength ← temp
    return maxlength

Analysis
This clearly takes O(V(V + E)) time.
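The O(V(V + E)) method above, sketched in Java; eccentricity is my name for the largest BFS distance from a given vertex:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Tree diameter by running BFS from every vertex and keeping the
// largest shortest-path distance seen, as in TREE_DIAMETER above.
class TreeDiameter {
    // Largest BFS distance from s (the eccentricity of s).
    static int eccentricity(int[][] adj, int s) {
        int[] d = new int[adj.length];
        Arrays.fill(d, -1);
        d[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        int max = 0;
        while (!q.isEmpty()) {
            int u = q.remove();
            max = Math.max(max, d[u]);
            for (int v : adj[u])
                if (d[v] == -1) { d[v] = d[u] + 1; q.add(v); }
        }
        return max;
    }

    static int diameter(int[][] adj) {
        int maxlength = 0;
        for (int s = 0; s < adj.length; s++)
            maxlength = Math.max(maxlength, eccentricity(adj, s));
        return maxlength;
    }
}
```

For example, a path on four vertices has diameter 3, while a star on four vertices has diameter 2.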

2. Depth-First Search
Depth-first search is a systematic way to find all the vertices reachable from a source vertex, s.
Historically, depth-first search was first stated formally as a method for traversing mazes. Like
breadth-first search, DFS traverses a connected component of a given graph and defines a
spanning tree. The basic idea of depth-first search is this: it methodically explores every edge,
starting over from different vertices as necessary. As soon as it discovers a vertex, DFS starts
exploring from it (unlike BFS, which puts a vertex on a queue so that it explores from it later).

Overall Strategy of DFS Algorithm


Depth-first search selects a source vertex s in the graph and paints it as "visited". Now the vertex s
becomes our current vertex. Then, we traverse the graph by considering an arbitrary edge (u, v)
from the current vertex u. If the edge (u, v) takes us to an already painted vertex v, then we return
to the vertex u. On the other hand, if edge (u, v) takes us to an unpainted vertex, then we paint the
vertex v, make it our current vertex, and repeat the above computation. Sooner or later, we
will get to a "dead end", where all the edges from our current vertex u take us to painted
vertices. To get out of this, we back down along the edge that brought us here to vertex u,
return to a previously painted vertex v, again make the vertex v our current vertex, and start
repeating the above computation for any edge that we missed earlier. If all of v's edges take us
to painted vertices, then we again back down to the vertex we came from to get to vertex v, and
repeat the computation at that vertex. Thus, we continue to back down the path that we have
traced so far until we find a vertex that still has unexplored edges, at which point we take one
such edge and continue the traversal. When the depth-first search has backtracked all the
way back to the original source vertex, s, it has built a DFS tree of all vertices reachable from
that source. If there are still undiscovered vertices in the graph, it selects one of them as the
source for another DFS tree. The result is a forest of DFS-trees.
Note that the edges that lead to new vertices are called discovery or tree edges, and the edges
that lead to already visited (painted) vertices are called back edges.
Like BFS, to keep track of progress depth-first-search colors each vertex. Each vertex of the
graph is in one of three states:
1. Undiscovered;
2. Discovered but not finished (not done exploring from it); and
3. Finished (have found everything reachable from it) i.e. fully explored.
The state of a vertex, u, is stored in a color variable as follows:
1. color[u] = White - for the "undiscovered" state,
2. color[u] = Gray - for the "discovered but not finished" state, and
3. color[u] = Black - for the "finished" state.

Like BFS, depth-first search uses π[v] to record the parent of vertex v. We have π[v] = NIL if and
only if vertex v is the root of a depth-first tree.
DFS time-stamps each vertex when its color is changed.
1. When vertex v is changed from white to gray the time is recorded in d[v].
2. When vertex v is changed from gray to black the time is recorded in f[v].
The discovery and finish times are unique integers, and for each vertex the finish time is
always after the discovery time. That is, each time-stamp is a unique integer in the range 1 to
2|V|, and for each vertex v, d[v] < f[v]. In other words, the following inequalities hold:
1 ≤ d[v] < f[v] ≤ 2|V|

Algorithm Depth-First Search


The DFS forms a depth-first forest composed of one or more depth-first trees. Each tree is
made of edges (u, v) such that u is gray and v is white when edge (u, v) is explored. The
following pseudocode for DFS uses a global timestamp time.
DFS (V, E)

1. for each vertex u in V[G]
2. do color[u] ← WHITE
3. π[u] ← NIL
4. time ← 0
5. for each vertex u in V[G]
6. do if color[u] = WHITE
7. then DFS-Visit(u) ▷ build a new DFS-tree from u

DFS-Visit(u)
1. color[u] ← GRAY ▷ discover u
2. time ← time + 1
3. d[u] ← time
4. for each vertex v adjacent to u ▷ explore (u, v)
5. do if color[v] = WHITE
6. then π[v] ← u
7. DFS-Visit(v)
8. color[u] ← BLACK
9. time ← time + 1
10. f[u] ← time ▷ we are done with u
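A Java sketch of DFS with timestamps follows; a boolean visited array plays the role of color != WHITE, which is enough when we are not classifying edges:

```java
// Recursive DFS with discovery/finish timestamps, following the
// pseudocode above; d[u] and f[u] fall in the range 1..2|V|.
class DFSExample {
    int time = 0;
    int[] d, f;
    boolean[] visited;   // true once a vertex is GRAY or BLACK
    int[][] adj;

    void dfs(int[][] graph) {
        adj = graph;
        int n = adj.length;
        d = new int[n];
        f = new int[n];
        visited = new boolean[n];
        for (int u = 0; u < n; u++)
            if (!visited[u]) dfsVisit(u);   // build a new DFS-tree from u
    }

    void dfsVisit(int u) {
        visited[u] = true;   // color[u] <- GRAY
        d[u] = ++time;       // discovery time
        for (int v : adj[u])
            if (!visited[v]) dfsVisit(v);
        f[u] = ++time;       // finish time; color[u] <- BLACK
    }
}
```

On the chain 0 -> 1 -> 2, the timestamps nest like parentheses: d[0] < d[1] < d[2] < f[2] < f[1] < f[0].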

Example (CLRS): In the following figure, solid edges represent discovery (tree) edges and
dashed edges show back edges. Furthermore, each vertex has two time stamps: the first
records when the vertex is first discovered, and the second records when the search finishes
examining its adjacency list.

Analysis
The analysis is similar to that of BFS. DFS-Visit is called (from DFS or from itself)
exactly once for each vertex in V[G], since each vertex is changed from white to gray once. The
for-loop in DFS-Visit is executed a total of |E| times for a directed graph, or 2|E| times for an
undirected graph, since each edge is explored once. Moreover, initialization takes Θ(|V|) time.
Therefore, the running time of DFS is Θ(V + E).
Note that this is Θ, not just O, since DFS is guaranteed to examine every vertex and edge.

Consider vertices u and v in V[G] after a DFS, and suppose vertex v is a descendant of vertex u
in some DFS-tree. Then we have d[u] < d[v] < f[v] < f[u], because of the following reasons:
1. Vertex u was discovered before vertex v; and
2. Vertex v was fully explored before vertex u was fully explored.
Note that the converse also holds: if d[u] < d[v] < f[v] < f[u], then vertices u and v are in the
same DFS-tree and vertex v is a descendant of vertex u.

Suppose vertex u and vertex v are in different DFS-trees or suppose vertex u and vertex v are in
the same DFS-tree but neither vertex is the descendent of the other. Then one vertex was
discovered and fully explored before the other was discovered i.e., f[u] < d[v] or f[v] < d[u].
Parenthesis Theorem For all u, v, exactly one of the following holds:
1. d[u] < f[u] < d[v] < f[v] or d[v] < f[v] < d[u] < f[u] and neither of u and v is a descendant of the
other.
2. d[u] < d[v] < f[v] < f[u] and v is a descendant of u.
3. d[v] < d[u] < f[u] < f[v] and u is a descendant of v.
[Proof omitted.]
So, d[u] < d[v] < f[u] < f[v] cannot happen. Like parentheses: ( ) [], ( [ ] ), and [ ( ) ] are OK but
( [ ) ] and [ ( ] ) are not OK.

Corollary Vertex v is a proper descendant of u if and only if d[u] < d[v] < f[v] < f[u].

White-path Theorem Vertex v is a descendant of u if and only if at time d[u], there is a path u
to v consisting of only white vertices. (Except for u, which was just colored gray.)
[Proof omitted.]

Consider a directed graph G = (V, E). After a DFS of graph G we can put each edge into one of
four classes:
1. A tree edge is an edge in a DFS-tree.
2. A back edge connects a vertex to an ancestor in a DFS-tree. Note that a self-loop is a back
edge.
3. A forward edge is a non-tree edge that connects a vertex to a descendent in a DFS-tree.
4. A cross edge is any other edge in graph G. It connects vertices in two different DFS-tree or
two vertices in the same DFS-tree neither of which is the ancestor of the other.

Lemma 1 An Edge (u, v) is a back edge if and only if d[v] < d[u] < f[u] < f[v].
Proof

(=> direction) From the definition of a back edge, it connects vertex u to an ancestor vertex v in
a DFS-tree. Hence, vertex u is a descendent of vertex v. Corollary 22.8 in the CLRS (or see
above) states that vertex u is a proper descendent of vertex v if and only if d[v] < d[u] < f[u] <
f[v]. Hence proved forward direction.
(<= direction) Again by the Corollary 22.8 (CLRS), vertex u is a proper descendent of vertex v.
Hence if an edge (u, v) exists from u to v then it is an edge connecting a descendent vertex u to
its ancestor vertex v. Hence, it is a back edge. Hence proved backward direction.
Conclusion: Immediate from both directions.

Lemma 2 An edge (u, v) is a cross edge if and only if d[v] < f[v] < d[u] < f[u].
Proof
First take the => direction.
Observation 1 For an edge (u, v), d[u] < f[u] and d[v] < f[v], since any vertex has to be
discovered before we can finish exploring it.

Observation 2 From the definition of a cross edge it is an edge which is not a tree edge, forward
edge or a backward edge. This implies that none of the relationships for forward edge {d[u] <
d[v] < f[v] < f[u]} or back edge {d[v] < d[u] < f[u] < f[v]} can hold for a cross edge.
From the above two observations we conclude that the only two possibilities are:
1. d[u] < f[u] < d[v] < f[v]
2. d[v] < f[v] < d[u] < f[u]
When the cross edge (u, v) is discovered we must be at vertex u and vertex v must be black. The
reason is that if v was white then edge (u, v) would be a tree edge and if v was gray edge (u, v)
would be a back edge. Therefore, d[v] < d[u] and hence possibility (2) holds true.
Now take the <= direction.
We can prove this direction by eliminating the various kinds of edge that the given relation could
describe. If d[v] < f[v] < d[u] < f[u], then edge (u, v) cannot be a tree or a forward edge. Also, it
cannot be a back edge, by Lemma 1. Since edge (u, v) is not a tree, forward, or back edge, it must
be a cross edge (see the definition of a cross edge above).
Conclusion: Immediate from both directions.

DFS-Visit can be modified to classify the edges of a directed graph during the depth first search:
DFS-Visit(u) ▷ with edge classification; G must be a directed graph
1. color[u] ← GRAY
2. time ← time + 1
3. d[u] ← time
4. for each vertex v adjacent to u
5. do if color[v] = BLACK
6. then if d[u] < d[v]
7. then Classify (u, v) as a forward edge
8. else Classify (u, v) as a cross edge
9. if color[v] = GRAY
10. then Classify (u, v) as a back edge
11. if color[v] = WHITE
12. then π[v] ← u
13. Classify (u, v) as a tree edge
14. DFS-Visit(v)
15. color[u] ← BLACK
16. time ← time + 1
17. f[u] ← time

Suppose G is an undirected graph; then we have the following edge classification:

1. Tree edge: an edge connecting a vertex with its parent.
2. Back edge: a non-tree edge connecting a vertex with an ancestor.
3. Forward edge: there are no forward edges, because they become back edges when considered
in the opposite direction.
4. Cross edge: there cannot be any cross edges, because every edge of G must connect an
ancestor with a descendant.

Theorem In a depth-first search of an undirected graph G, every edge in E[G] is either a tree
edge or a back edge. No forward or cross edges.
[Proof omitted.]

Algorithms based on DFS
Based upon DFS, there are O(V + E)-time algorithms for the following problems:
 Testing whether graph is connected.
 Computing a spanning forest of G.
 Computing the connected components of G.
 Computing a path between two vertices of G or reporting that no such path exists.
 Computing a cycle in G or reporting that no such cycle exists.

Application
As an application of DFS, let us determine whether or not an undirected graph contains a cycle.
It is not difficult to see that the algorithm for this problem is very similar to DFS(G), except that
when an adjacent vertex is already GRAY, a cycle is detected. The algorithm also takes care not
to report a cycle when the GRAY vertex it encounters is simply the parent it just came from along
a tree edge.

ALGORITHM DFS_DETECT_CYCLES [G]

for each vertex u in V[G]
    do color[u] ← WHITE
       predecessor[u] ← NIL
time ← 0
for each vertex u in V[G]
    do if color[u] = WHITE
           DFS_visit(u)

The sub-algorithm DFS_visit(u) is as follows:
DFS_visit(u)
    color[u] ← GRAY
    d[u] ← time ← time + 1
    for each v adjacent to u
        do if color[v] = GRAY and predecessor[u] ≠ v
               return "cycle exists"
           if color[v] = WHITE
               do predecessor[v] ← u
                  recursively DFS_visit(v)
    color[u] ← BLACK
    f[u] ← time ← time + 1

Correctness
To see why this algorithm works, suppose the node v about to be visited is gray; then there are
two possibilities. The first possibility is that v is the parent of u, and we are looking back along
the tree edge which we traversed to reach u from v. In that case there is no cycle. The second
possibility is that v has already been encountered earlier during DFS_visit, so the edge we are
traversing now is a back edge, and hence a cycle is detected.
Time Complexity
The maximum number of edges the graph G can have without containing a cycle is |V| − 1. If G
has a cycle, then the number of edges exceeds this number. Hence, the algorithm will detect a
cycle by the time it has examined |V| edges, if not before. Therefore, the algorithm runs in O(V) time.
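A Java sketch of the cycle detector for an undirected graph stored with both directions of every edge; a visited (GRAY-or-later) neighbour that is not the vertex we just came from closes a cycle:

```java
import java.util.Arrays;

// DFS cycle detection for an undirected graph, as in
// DFS_DETECT_CYCLES above; pred[] plays the role of predecessor[].
class CycleDetect {
    static boolean hasCycle(int[][] adj) {
        int[] pred = new int[adj.length];
        boolean[] visited = new boolean[adj.length];
        Arrays.fill(pred, -1);                        // predecessor <- NIL
        for (int u = 0; u < adj.length; u++)
            if (!visited[u] && visit(adj, u, pred, visited)) return true;
        return false;
    }

    static boolean visit(int[][] adj, int u, int[] pred, boolean[] visited) {
        visited[u] = true;                            // color[u] <- GRAY
        for (int v : adj[u]) {
            if (visited[v] && pred[u] != v)
                return true;                          // back edge: cycle exists
            if (!visited[v]) {
                pred[v] = u;
                if (visit(adj, v, pred, visited)) return true;
            }
        }
        return false;
    }
}
```

A triangle is reported as cyclic; a simple path is not.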

3. Strongly Connected Components

Decomposing a directed graph into its strongly connected components is a classic application of
depth-first search. The problem of finding connected components is at the heart of many graph
applications. Generally speaking, the connected components of a graph correspond to different
classes of objects. The first linear-time algorithm for strongly connected components is due to
Tarjan (1972). The algorithm in CLRS, due to Sharir and Kosaraju, is perhaps the easiest to
program for finding strongly connected components.
Given a digraph or directed graph G = (V, E), a strongly connected component (SCC) of G is a
maximal set of vertices C ⊆ V such that for all u, v in C, both u ⇝ v and v ⇝ u; that is,
both u and v are reachable from each other. In other words, two vertices of a directed graph are in
the same component if and only if they are reachable from each other.

The above directed graph has 4 strongly connected components: C1, C2, C3 and C4. If G has an
edge from some vertex in Ci to some vertex in Cj where i ≠ j, then one can reach any vertex in Cj
from any vertex in Ci but not return. In the example, one can reach any vertex in C2 from any
vertex in C1 but cannot return to C1 from C2.

The algorithm in CLRS for finding the strongly connected components of G = (V, E) uses the
transpose of G, which is defined as:
 GT = (V, ET), where ET = {(u, v): (v, u) in E}.
 GT is G with all edges reversed.
From the given graph G, one can create GT in linear time (i.e., Θ(V + E)) if using adjacency lists.

Observation:
The graphs G and GT have the same SCC's. That is, vertices u and v are reachable from
each other in G if and only if they are reachable from each other in GT.

Component Graph
The idea behind the computation of SCC comes from a key property of the component graph,
which is defined as follows:
GSCC = (VSCC, ESCC), where VSCC has one vertex for each SCC in G and ESCC has an edge if
there's an edge between the corresponding SCC's in G.
For our example (above) the GSCC is:

The key property of GSCC is that the component graph is a dag, as the following lemma shows.
Lemma GSCC is a dag. More formally, let C and C' be distinct SCC's in G, let u, v be in C and
u', v' in C', and suppose there is a path u ⇝ u' in G. Then there cannot also be a path v' ⇝ v in G.
Proof Suppose there were a path v' ⇝ v in G. Then there are paths u ⇝ u' ⇝ v' and v' ⇝ v ⇝ u in G.
Therefore, u and v' are reachable from each other, so they are not in separate SCC's, a
contradiction.
This completes the proof.

ALGORITHM
A DFS(G) produces a forest of DFS-trees. Let C be any strongly connected component of G, let
v be the first vertex of C discovered by the DFS, and let T be the DFS-tree containing v. When
DFS-visit(v) is called, all vertices in C are reachable from v along paths of white (unvisited)
vertices; hence DFS-visit(v) will visit every vertex in C and add it to T as a descendant of v.
STRONGLY-CONNECTED-COMPONENTS (G)
1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute GT
3. Call DFS(GT), but in the main loop, consider vertices in order of decreasing f[u] (as
computed in first DFS)
4. Output the vertices in each tree of the depth-first forest formed in second DFS as a separate
SCC.

Time: The algorithm takes linear time i.e., θ(V + E), to compute SCC of a digraph G.
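The four numbered steps above can be sketched directly (a minimal Kosaraju-style Python sketch; the helper names are ours, not from CLRS):

```python
# Strongly connected components in O(V + E), following the four steps:
# DFS on G recording finish order, transpose, DFS on G^T in decreasing
# finish order, output each tree of the second forest as one SCC.
def strongly_connected_components(adj):
    # Step 1: DFS on G; 'finished' lists vertices by increasing finish time.
    finished, seen = [], set()
    def dfs1(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs1(v)
        finished.append(u)
    for u in adj:
        if u not in seen:
            dfs1(u)

    # Step 2: compute the transpose G^T.
    gt = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            gt[v].append(u)

    # Steps 3-4: DFS on G^T in order of decreasing finish time;
    # each tree of this depth-first forest is one SCC.
    sccs, seen = [], set()
    def dfs2(u, comp):
        seen.add(u)
        comp.append(u)
        for v in gt[u]:
            if v not in seen:
                dfs2(v, comp)
    for u in reversed(finished):
        if u not in seen:
            comp = []
            dfs2(u, comp)
            sccs.append(comp)
    return sccs

print(strongly_connected_components({'a': ['b'], 'b': ['a', 'c'], 'c': []}))
# [['a', 'b'], ['c']]
```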
From our Example (above):
1. Do DFS
2. GT
3. DFS (roots blackened)

Another Example (CLRS) Consider a graph G = (V, E).
1. Call DFS(G)

2. Compute GT

3. Call DFS(GT), but this time consider the vertices in order of decreasing finish time.

4. Output the vertices of each tree in the DFS-forest as a separate strongly connected
components.
{a, b, e}, {c, d}, {f, g}, and {h}

Now the question is how can this possibly work?

Idea By considering vertices in second DFS in decreasing order of finishing times from first
DFS, we are visiting vertices of the component graph in topological sort order.
To prove that it really works, first we deal with two notational issues:
 We will be discussing d[u] and f[u]. These always refer to the first DFS in the above
algorithm.
 We extend notation for d and f to sets of vertices U subset V:
o d(U) = min_{u in U} {d[u]} (earliest discovery time of any vertex in U)
o f(U) = max_{u in U} {f[u]} (latest finishing time of any vertex in U)

Lemma Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in E such
that u in C and v in C'. Then f(C) > f(C').

Proof There are two cases, depending on which SCC had the first discovered vertex during the
first DFS.
Case i. If d(C) < d(C'), let x be the first vertex discovered in C. At time d[x], all vertices in C and
C' are white. Thus, there exist paths of white vertices from x to all vertices in C and C'.
By the white-path theorem, all vertices in C and C' are descendants of x in the depth-first tree.
By the parenthesis theorem, we have f[x] = f(C) > f(C').
Case ii. If d(C) > d(C'), let y be the first vertex discovered in C'. At time d[y], all vertices in C'
are white and there is a white path from y to each vertex in C'. This implies that all vertices in C'
become descendants of y. Again, f[y] = f(C').
At time d[y], all vertices in C are white.
By earlier lemma, since there is an edge (u, v), we cannot have a path from C' to C. So, no vertex
in C is reachable from y. Therefore, at time f[y], all vertices in C are still white. Therefore, for all
w in C, f[w] > f[y], which implies that f(C) > f(C').
This completes the proof.
Corollary Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in ET
where u in C and v in C'. Then f(C) < f(C').
Proof Edge (u, v) in ET implies (v, u) in E. Since SCC's of G and GT are the same, f(C') > f(C).
This completes the proof.
Corollary Let C and C' be distinct SCC's in G = (V, E), and suppose that f(C) > f(C'). Then
there cannot be an edge from C to C' in GT.

Proof Idea It's the contrapositive of the previous corollary.

Now, we have the intuition to understand why the SCC procedure works.
When we do the second DFS, on GT, start with SCC C such that f(C) is maximum. The second
DFS starts from some x in C, and it visits all vertices in C. Corollary says that since f(C) > f(C')
for all C' ≠ C, there are no edges from C to C' in GT. Therefore, DFS will visit only vertices in C,
which means that the depth-first tree rooted at x contains exactly the vertices of C.
The next root chosen in the second DFS is in SCC C' such that f(C') is maximum over all SCC's
other than C. DFS visits all vertices in C', but the only edges out of C' go to C, which we've
already visited.
Therefore, the only tree edges will be to vertices in C'.
We can continue the process.
Each time we choose a root for the second DFS, it can reach only
 vertices in its SCC - we get tree edges to these,
 vertices in SCC's already visited in the second DFS - we get no tree edges to these.
We are visiting vertices of (GT)SCC in reverse of topologically sorted order. [CLRS has a formal
proof.]

Before leaving strongly connected components, let's prove that the component graph of G = (V,
E) is a directed acyclic graph.

Proof (by contradiction) Suppose the component graph of G = (V, E) were not a DAG, i.e., that
it contained a cycle consisting of vertices v1, v2, . . . , vn, where each vi corresponds to a strongly
connected component of G. If v1, v2, . . . , vn form a cycle, then every vertex of these components
is reachable from every other, so all of them should have been merged into a single SCC. But
each vi is, by assumption, a distinct SCC of G. Hence, we have a contradiction! Therefore, the
component graph of G is a directed acyclic graph.

4. Euler Tour
The motivation of this section is derived from the famous Königsberg bridge problem, solved by
Leonhard Euler in 1736. The 18th-century Prussian city of Königsberg was situated on the river
Pregel. Within a park built on the banks of the river, there were two islands joined to the banks
and to each other by seven bridges. The puzzle asks whether it is possible to take a tour through
the park, crossing each bridge exactly once.
An exhaustive search requires starting at every possible point and traversing all the possible
paths from that point - an O(n!) problem. However, Euler showed that an Eulerian path exists
if and only if
 it is possible to go from any vertex to any other by following the edges (the graph must
be connected), and
 every vertex has an even number of edges connected to it, with at most two
exceptions (which constitute the starting and ending points).
It is easy to see that these conditions are necessary: to complete the tour, one needs to enter and
leave every point except the start and end points. The proof that they are sufficient may be found
in the literature. Thus we now have an O(n) problem to determine whether a path exists.
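The degree condition can be checked in a single pass over the edges (a Python sketch for an undirected multigraph, assuming connectivity has already been verified; the names are illustrative):

```python
# Euler path test for an undirected multigraph, assuming the graph is
# already known to be connected: count vertices of odd degree.
def has_euler_path(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)   # 0 odd vertices: Euler circuit; 2: Euler path

# The Königsberg multigraph: 4 land masses, 7 bridges.
konigsberg = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
              ('A', 'D'), ('B', 'D'), ('C', 'D')]
print(has_euler_path(konigsberg))  # False: all four vertices have odd degree
```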

To obtain a solution, transform the map into a graph in which the nodes represent the "dry
land" points and the arcs represent the bridges.

We can now easily see that the Bridges of Königsberg problem has no solution, since all four
vertices have an odd number of edges. A quick inspection shows that the graph does, however,
have a Hamiltonian path.

Definition An Euler tour of a connected, directed graph G = (V, E) is a cycle that traverses each
edge of graph G exactly once, although it may visit a vertex more than once.
In the first part of this section we show that G has an Euler tour if and only if the in-degree of
every vertex equals its out-degree. In the second part, we describe an algorithm to find an Euler
tour of a graph if one exists.
Part 1 Show that G has an Euler tour if and only if in-degree(v) = out-degree(v) for each vertex
v ∈ V.
Proof
First we work in the => direction.
We call a cycle simple if it visits each vertex no more than once, and complex if it can visit a
vertex more than once. Each vertex in a simple cycle has in-degree and out-degree one, and any
complex cycle can be expressed as a union of simple cycles. This implies that any vertex in a
complex cycle (and in particular in an Euler tour) has in-degree equal to its out-degree.
Thus, if a graph has an Euler tour, then all of its vertices have equal in- and out-degrees.

Now look at the <= direction.
Suppose we have a connected graph for which the in-degree and out-degree of all vertices are
equal. Let C be the longest complex cycle within G. If C is not an Euler tour, then there is a
vertex v of G touched by C such that not all edges into and out of v are exhausted by C. We may
construct a cycle C′ in G − C starting and ending at v by performing a walk in G − C. (The reason
is that G − C also has the property that in-degrees and out-degrees are equal.) This means that
the complex cycle that starts at v, goes along the edges of C′ (returning to v), and then goes along
the edges of C is a longer complex cycle than C. This contradicts our choice of C as the longest
complex cycle, which means that C must have been an Euler tour.

Part 2 Find an Euler tour of given graph G if one exists.

ALGORITHM
Given a starting vertex v0, the algorithm will first find a cycle C starting and ending at v0 such
that C contains all edges going into and out of v0. This can be performed by a walk in the graph.
As we discover vertices in cycle C, we create a linked list which contains the vertices in order
and which begins and ends at vertex v0. We set a pointer "current" to the head of the
list. We now traverse the list by moving "current" to successive vertices until we find
a vertex which has an undiscovered outgoing edge. (If we reach the end of the
list, then we have already found the Euler tour.) Suppose we find such a vertex vi. We then
take a walk beginning and ending at vi such that all undiscovered edges containing vi are
contained in the walk. We insert the new linked list into the old linked list in place of vi and
move "current" to the first node of the inserted sublist. We continue this process until we
reach the final node of the linked list, at which point the list contains an Euler tour.

Running Time of Euler Tour


The algorithm traverses each edge at most twice: first in a walk, and second while traversing the
list to find vertices with outgoing edges. Therefore, the total running time of the algorithm is
O(|E|).
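The walk-and-splice procedure above is essentially Hierholzer's algorithm. A minimal Python sketch for a directed graph in which every vertex has in-degree equal to out-degree; using an explicit stack in place of the linked list is our simplification:

```python
# Euler tour by walking until stuck and splicing (Hierholzer's
# algorithm). Each edge is consumed once, so the total time is O(|E|).
def euler_tour(adj, start):
    adj = {u: list(vs) for u, vs in adj.items()}  # consumable copy
    stack, tour = [start], []
    while stack:
        u = stack[-1]
        if adj[u]:                  # undiscovered outgoing edge: extend walk
            stack.append(adj[u].pop())
        else:                       # dead end: splice vertex into the tour
            tour.append(stack.pop())
    return tour[::-1]               # reverse to read the tour in order

g = {'a': ['b'], 'b': ['c'], 'c': ['a']}
print(euler_tour(g, 'a'))  # ['a', 'b', 'c', 'a']
```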

AUTOMATA THEORY
What is Automata Theory?
 Study of abstract computing devices, or "machines"
 Automaton = an abstract computing device
 Note: a "device" need not even be physical hardware!
 A fundamental question in computer science:
 Find out what different models of machines can do and cannot do
 The theory of computation
 Computability vs. complexity

The Central Concepts of Automata Theory
Alphabet
 An alphabet is a finite, non-empty set of symbols
 We use the symbol Σ (sigma) to denote an alphabet
Examples:
o Binary: Σ = {0,1}
o All lower case letters: Σ = {a,b,c,..z}
o Alphanumeric: Σ = {a-z, A-Z, 0-9}
o DNA molecule letters: Σ = {a,c,g,t}
Strings
 A string or word is a finite sequence of symbols
 chosen from Σ
 Empty string is ε (pronounced "epsilon")
 Length of a string w, denoted by |w|, is equal to the number of (non-ε) characters in
the string
 E.g., x = 010100, |x| = 6
 x = 01ε0ε1ε00, |x| = ?
 xy = concatenation of two strings x and y

Powers of an alphabet
Let Σ be an alphabet.
 Σk = the set of all strings of length k
 Σ* = Σ0 U Σ1 U Σ2U …
 Σ+ = Σ1 U Σ2 U Σ3 U …

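The powers of an alphabet can be enumerated directly (a small Python illustration using the standard library; the variable names are ours):

```python
# Illustrating Sigma^k and a finite slice of Sigma* for Sigma = {0, 1}.
from itertools import product

sigma = ['0', '1']

# Sigma^2: all strings of length 2 over the alphabet.
sigma_2 = [''.join(t) for t in product(sigma, repeat=2)]
print(sigma_2)                 # ['00', '01', '10', '11']

# Sigma* is infinite; Sigma^0 U Sigma^1 U Sigma^2 has 1 + 2 + 4 = 7
# strings (Sigma^0 contains only the empty string epsilon).
sigma_star_upto_2 = [''.join(t) for k in range(3)
                     for t in product(sigma, repeat=k)]
print(len(sigma_star_upto_2))  # 7
```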
Languages
 A language L over an alphabet Σ is a set of strings chosen from Σ*, i.e., L ⊆ Σ*.
Finite Automata
Some Applications
 Software for designing and checking the behavior of digital circuits
 Lexical analyzer of a typical compiler
 Software for scanning large bodies of text (e.g., web pages) for pattern finding
 Software for verifying systems of all types that have a finite number of states (e.g., stock
market transaction, communication/network protocol)

Structural expressions
Grammars
Regular expressions
E.g., a Unix-style regular expression to capture city names such as "Palo Alto CA" might be
(an illustrative pattern): [A-Z][a-z]*( [A-Z][a-z]*)* [A-Z][A-Z]

Proofs

Terminology

 Definition: precise description of the objects and notions that we use


 Mathematical statement: precise statement about the objects and notions that we use,
typically that it has some property
o Is either true or false
 Proof: convincing argument that a statement is true
 Theorem: Statement that has been proved to be true
 Lemma: True statement used to prove a Theorem
 Corollary: True statement that follows easily from a Theorem

Hints for Finding Proofs

 No magic formula
 Understand statement:
o Write in own words
o Consider parts separately
 Recognize hidden parts:
o iff
o To prove that two sets are equal (i.e., A = B), prove that A ⊆ B and B ⊆ A
 Think about why the statement must be true
 Consider examples:
o Try examples that have the property
o Try to find examples that don't have the property
 Attempt something easier (e.g., a special case)
 Write up ideas clearly
 Come back to it; be patient

Deductive Proofs

From the given statement(s) to a conclusion statement (what we want to prove)
Logical progression by direct implications
Example of parsing a statement:
"If y ≥ 4, then 2^y ≥ y^2."
Here the hypothesis ("y ≥ 4") is given, and "2^y ≥ y^2" is the conclusion
(there are other ways of writing this).

Quantifiers
"For all" or "For every"
 Used in universal proofs
 Notation: ∀
"There exists"
 Used in existential proofs
 Notation: ∃
Implication is denoted by =>
 E.g., “IF A THEN B” can also be written as “A=>B”

Proving techniques

By contradiction

 To prove P, assume ~P and use it to derive a contradiction (i.e., another statement that is
obviously false). Then conclude that P must be true.

 Example from the text, revised: Use the "fact" that if it's raining, then everyone who
comes in from the outside has an umbrella to prove it's not raining. First, assume that it
is raining. Therefore, from the "fact" given above, everyone who comes in will have an
umbrella. Someone just came in from outside without an umbrella. This is a
contradiction. Thus we conclude that it is not raining.
o The method of this proof is valid, but the conclusion is only valid if the "fact"
really is a fact!
 Example from the net: Prove p: that there is no smallest positive rational number. First
assume p is false, that is, that there is a smallest positive rational. Call the smallest
rational number r...
 Example from the text: square root of 2 is irrational
 Careful: When using proof by contradiction, mistakes can lead to apparent
contradictions.
For homework: Prove there is no largest prime number. Please work on this yourself and don't
look up a solution

Another Example
o Start with the statement contradictory to the given statement
o E.g., To prove (A => B), we start with: (A and ~B) … and then show that
could never happen
o What if you want to prove that “(A and B => C or D)”?

By induction

 Inductive proofs of P(n) for all n ≥ 1 follow this format:


1. Basis: Prove that P(1) is true.
 Actually, we can prove P(i) for any small integer i (e.g., P(2)).
2. Induction step: assume P(k-1) is true and use this assumption to prove P(k) is true
(ie prove that the property jumps from one integer to the next).
 Alternatively, assume P(k) is true and use this assumption to prove
P(k+1).
 Alternatively, Assume that P(i) is true for 1 ≤ i ≤ k. Use this assumption to
prove that P(k+1) is true
3. From 1 and 2, conclude that P(n) is true for all n ≥ 1.
 Example 1: Prove P(n)that the descendents of George Washington (GW) at generation n
are colorblind (CB) for every n ≥ 1
1. Use this "fact": Colorblindness is hereditary: children of a colorblind parent are
themselves colorblind.
 In other words, if P(k) then P(k+1)
2. Basis: History books indicate that GW was CB so P(1) is true
3. Induction step: Assume, for arbitrary k ≥ 1, P(k): that generation k is colorblind.
Thus, from the "fact" given above, we have P(k+1): generation k+1 is colorblind
4. Hence all generations are colorblind.
 Example 2: Prove that the sum of 1 to n is n(n+1)/2 for all n ≥ 1

 See expanded proof below!


1. For any n ≥ 1, let P(n) represent the proposition that the sum of 1 to n is n(n+1)/2.

2. Basis: P(1) is true since the sum of 1 to 1 is 1(1+1)/2.


3. Induction step: Assume that P(k) is true, meaning that we assume that the sum of
1 to k is k(k+1)/2. Now consider the sum of 1 to (k+1) which equals ... Thus,
P(k+1) is true.
4. Thus we conclude that P(n) is true for all n ≥ 1.

Proof by Induction: Example

Let P(n) be the statement ∑_{i=1}^{n} i = n(n+1)/2.

Prove P(n) for all n ≥ 1, using proof by induction.
1. Basis: Prove P(1), that is, that ∑_{i=1}^{1} i = 1(1+1)/2. But ∑_{i=1}^{1} i = 1 and
1(1+1)/2 = 1, and this proves P(1).
2. Inductive Step: Assume P(k) for an arbitrary k ≥ 1. This is the Inductive Hypothesis. Now
we must show that P(k) implies P(k+1), that is, that ∑_{i=1}^{k} i = k(k+1)/2 implies that
∑_{i=1}^{k+1} i = (k+1)(k+1+1)/2. (In other words, P(k) ⇒ P(k+1).)
Now
∑_{i=1}^{k+1} i = ∑_{i=1}^{k} i + (k+1)    (by definition of summation)
= k(k+1)/2 + (k+1)    (by the Inductive Hypothesis)
= (k² + k + 2k + 2)/2
= (k² + 3k + 2)/2
= (k+1)(k+2)/2
= (k+1)(k+1+1)/2
Thus, we have proved that ∑_{i=1}^{k+1} i = (k+1)(k+1+1)/2, which is P(k+1). Thus, from P(k)
we have proved P(k+1).
3. Therefore, from 1 and 2 we conclude by induction that P(n) is true for all n ≥ 1.
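A quick numerical check of the statement P(n) (a check is not a proof, but it is a good sanity test of the algebra):

```python
# Verify P(n): the sum of 1..n equals n(n+1)/2, for n = 1..100.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("P(n) holds for n = 1..100")
```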
 How did we do this?
o Start from what we know.
o Be aware of where we're trying to go.
o Use definitions.

 By counter-example
o Show an example that disproves the claim
o Note: there is no such thing as a "proof by example"!
o So when asked to prove a claim, an example that satisfies the claim is not a proof

Proof by Construction

o To prove that an object exists, show how it can be constructed


o Example: a 3-regular graph with n vertices exists for every even number n greater than 2
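The standard construction for this example can be written out (a Python sketch under the assumption that n is even and at least 4; the function name is ours): place the n vertices on a cycle and join each vertex to the opposite one.

```python
# Construct a 3-regular graph on n vertices (n even, n >= 4):
# a cycle 0-1-...-(n-1)-0 plus a "diameter" edge i -- i + n/2.
# Every vertex then has degree 3 (two cycle neighbours + one diameter).
def three_regular(n):
    assert n >= 4 and n % 2 == 0
    edges = {frozenset((i, (i + 1) % n)) for i in range(n)}       # cycle
    edges |= {frozenset((i, i + n // 2)) for i in range(n // 2)}  # diameters
    return edges

g = three_regular(6)
degree = {v: sum(1 for e in g if v in e) for v in range(6)}
print(degree)  # every vertex has degree 3
```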

“If-and-Only-If” statements
o "A if and only if B" (A <==> B)
o (if part) if B then A ( <= )
o (only if part) A only if B ( => )
(same as "if A then B")
"If and only if" is abbreviated as "iff",
i.e., "A iff B"
Example:
Theorem: Let x be a real number. Then floor of x = ceiling of x if and only if x is an integer.
Proofs for iff statements have two parts:
o One for the "if part" and another for the "only if part"

REFERENCES
1. Algorithm Design - Foundations, Analysis & Internet Examples by Michael T.
Goodrich and Roberto Tamassia
2. Data Structures and Algorithms in Java by Michael T. Goodrich and Roberto
Tamassia
3. Data Structures and Algorithms in C++ by Michael T. Goodrich, Roberto
Tamassia and David M. Mount
4. Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson,
Ronald L. Rivest and Clifford Stein

