
Data Structures and Algorithms

(CS210/ESO207/ESO211)

Lecture 36
Sorting
beyond O(n log n) bound

Overview of today's lecture

The sorting algorithms you have studied till now

Integer sorting

Solving two problems from Practice sheet 6 and
one problem from Practice sheet 5.

Sorting algorithms studied till now

Algorithms for sorting n elements

Insertion sort: O(n²)
Selection sort: O(n²)
Bubble sort: O(n²)
Merge sort: O(n log n)
Quick sort: worst case O(n²), average case O(n log n)
Heap sort: O(n log n)

Question: What is common among these algorithms?

Answer: All of them are allowed to use only the comparison operation to perform sorting.

Question: Can we sort in O(n) time?

The answer depends upon

the model of computation.
the domain of input.

Theorem (to be proved in CS345): Every comparison based sorting algorithm must perform Ω(n log n) comparisons in the worst case.

Word RAM model of computation: Characteristics

Word is the basic storage unit of RAM. A word is a collection of a few bytes.

Each input item (number, name) is stored in binary format.

RAM can be viewed as a huge array of words. Any arbitrary location of RAM can be accessed in the same time, irrespective of the location.

Data as well as program reside fully in RAM.

Each arithmetic or logical operation (+, -, *, /, or, xor, ...) involving a constant number of words takes a constant number of steps by the CPU. Equivalently, operations on numbers of O(log n) bits take constant time, where n is the number of bits of the input instance.

Integer sorting

Counting sort: algorithm for sorting integers

Input: An array A storing n integers in the range [0..k-1].
Output: Sorted array A.
Running time: O(n + k) in the word RAM model of computation.
Extra space: O(n + k)

Counting sort: algorithm for sorting integers

[Figure: step-by-step example, over several slides, showing the array A, the Count array, and the Place array.]

Counting sort: algorithm for sorting integers

Algorithm CountingSort(A[0..n-1], k)
For i = 0 to k-1 do Count[i] <- 0;
For i = 0 to n-1 do Count[A[i]] <- Count[A[i]] + 1;
For i = 0 to k-1 do Place[i] <- Count[i];
For i = 1 to k-1 do Place[i] <- Place[i-1] + Count[i];
For i = n-1 down to 0 do
{
    B[Place[A[i]] - 1] <- A[i];
    Place[A[i]] <- Place[A[i]] - 1;
}
return B;
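The pseudocode above can be sketched in Python as follows. This is a minimal version, assuming all input values lie in the range [0..k-1]; the right-to-left scan mirrors the last loop of the pseudocode and keeps the sort stable.

```python
def counting_sort(a, k):
    """Stable sort of a list of integers from the range [0..k-1]."""
    n = len(a)
    count = [0] * k
    for x in a:                      # Count[v] = number of occurrences of v
        count[x] += 1
    place = [0] * k                  # Place[v] = number of elements <= v
    place[0] = count[0]
    for v in range(1, k):
        place[v] = place[v - 1] + count[v]
    b = [0] * n
    for x in reversed(a):            # scan right to left for stability
        b[place[x] - 1] = x          # last free slot for value x
        place[x] -= 1
    return b
```

For example, counting_sort([2, 0, 1, 2, 0], 3) returns [0, 0, 1, 2, 2].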

Counting sort: algorithm for sorting integers

Note: The algorithm performs arithmetic operations involving O(log n + log k) bits. In the word RAM model, each such operation takes O(1) time.

Theorem: An array storing n integers in the range [0..k-1] can be sorted in O(n + k) time and using total O(n + k) space in the word RAM model.

For k = O(n), we get an optimal algorithm for sorting. But what if k is large?
In the next class:
We shall discuss an algorithm for sorting n integers from a polynomially large range (of the form [0..n^c - 1] for a constant c) in O(n) time and using O(n) space in the word RAM model.

Practice sheet 6
We shall solve exercises 5 and 1 from
this sheet

Solution of Problem 5 of practice sheet 6

Description (in terms of intervals):
Given a set A of n intervals, compute the smallest set B of intervals so that for every interval I in A\B, there is some interval in B which overlaps/intersects with I.

[Figure: The set of green intervals is a solution but not an optimal solution.]

[Figure: interval I* and the intervals overlapping it.]

Let I* be the interval with the earliest finish time.
Let I′ be the interval with the maximum finish time among those overlapping I*.
Lemma1: There is an optimal solution for set A that contains I′.
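The greedy choice above (pick the earliest-finishing interval I*, then the latest-finishing interval overlapping it) can be sketched as follows. The (start, finish) tuple representation and the function names are assumptions of this sketch, not notation from the slides.

```python
def overlaps(a, b):
    # two closed intervals intersect iff each one starts before the other ends
    return a[0] <= b[1] and b[0] <= a[1]

def greedy_choice(intervals):
    """Return (I*, the interval with maximum finish time overlapping I*)."""
    i_star = min(intervals, key=lambda iv: iv[1])   # earliest finish time
    candidates = [iv for iv in intervals if overlaps(iv, i_star)]
    return i_star, max(candidates, key=lambda iv: iv[1])
```

Note that I* overlaps itself, so the candidate set is never empty; if nothing else overlaps I*, the second interval returned is I* itself.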

Question: How to obtain a smaller instance A′ using this greedy approach?

Naive approach (again inspired from the job scheduling problem): remove from A all intervals which overlap with I′. This is A′.
This approach does not work! Here is a counterexample.

[Figure: I*, I′, and an interval I′′ that overlaps I′ and extends beyond it.]

The problem is that some deleted interval (in this case I′′) could have been used for intersecting many intervals if it were not deleted. But deleting it from the instance disallows it from being selected in the solution.

Overview of the approach

In order to make sure we do not delete intervals (like I′′ in the previous slide) if they are essential to be selected to cover many other intervals, we make some observations and introduce a notion called uniquely covered interval. It turns out that we need to keep I′′ in the smaller instance if there is an interval which is uniquely covered by I′′. Otherwise, we may discard I′′.

An Observation
We can delete all intervals whose finish time is before the finish time of I′, because any interval overlapped by such intervals will anyway be overlapped by I′. Let us consider the intervals which overlap with I′ but have finish time greater than that of I′. In the example shown below, these intervals are the three intervals which cross the red line (drawn at the finish time of I′).

[Figure: I*, I′, and the intervals crossing the red line; I′′ is the one with maximum finish time.]

Observation1: Among the intervals crossing the red line, we need to keep only that interval which has the maximum finish time (I′′ in this picture).
Proof: Notice that each of these intervals is anyway intersected by I′. As far as using them to intersect other intervals is concerned, we may as well choose I′′ for this purpose.
So from now onwards, we shall assume that there is exactly one interval I′′ in A which overlaps I′ (intersects the red line) and has finish time larger than that of I′.

Uniquely covered interval

[Figure: interval I2 contained inside interval I1.]

I2 is said to be uniquely covered by I1 if

I2 is fully covered by I1, and

every interval overlapping I2 is also fully covered by I1.

Lemma2: There is an optimal solution containing I1.

Proof: Surely I2 or some other interval overlapping it must be present in the optimal solution. If we replace that interval by I1, we still get a solution of the same size and hence an optimal solution.
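The definition above can be written as a small predicate. This is a sketch: the (start, finish) tuple representation and the helper names are assumptions, not notation from the slides.

```python
def overlaps(a, b):
    # two closed intervals intersect iff each one starts before the other ends
    return a[0] <= b[1] and b[0] <= a[1]

def covered(inner, outer):
    # inner is fully covered by outer
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def uniquely_covered(i2, i1, intervals):
    """True iff i2 is uniquely covered by i1 within the given set of intervals."""
    if not covered(i2, i1):
        return False
    # every interval overlapping i2 must also be fully covered by i1
    return all(covered(j, i1) for j in intervals if j != i2 and overlaps(j, i2))
```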

We are now ready to give the description/construction of A′ from A. There will be two cases. We shall then prove that |Opt(A)| = |Opt(A′)| + 1 for each of these cases.
Important note:
The reader is advised to fully understand Lemma1, Lemma2, Observation1, and the notion of uniquely covered interval. Also fully internalize the notations I*, I′, and I′′. This will help the reader understand the rest of the solution.

Constructing A′ from A

[Figure: I*, I′, I′′, with the red line at the finish time of I′.]

We need to take care of the intervals whose starting point is to the right of the red line (the finish time of I′). We can partition these intervals into two sets:
D: those which overlap I′′.
E: those that start after the end of I′′ (the finish time of I′′) and hence do not overlap with I′′.
Now we shall describe the two cases for the construction of A′.
Case1: There is an interval J in D uniquely covered by I′′.

Constructing A′ from A

If there is an interval J in D uniquely covered by I′′, then we define A′ as follows. Remove all intervals from A which overlap with I′ (this was our usual way of defining A′ in our wrong solution). Now add I′′ to this set. This set is the smaller instance A′ for Case1.
We shall now define A′ for Case2.

Constructing A′ from A

[Figure: I*, I′, I′′, and the set D.]

Case2: There is no interval in D uniquely covered by I′′.

If there is no interval in D uniquely covered by I′′, then we define A′ as follows. Remove all intervals from A which overlap with I′ (this was our usual way of defining A′ in our wrong solution). This set is the smaller instance A′ for Case2.

Theorem: |Opt(A)| = |Opt(A′)| + 1

We shall prove this theorem for Case1 as well as Case2.

Case1: There is an interval J in D uniquely covered by I′′

Claim: |Opt(A)| <= |Opt(A′)| + 1

[Figure: I*, I′, I′′, J, and the set D.]

Using Lemma2, it follows that there is an optimal solution for A′ containing I′′. What do we need to add to this solution to get a solution for A? We just need to add I′ to get a solution for A, and we are done.

Case1: There is an interval J in D uniquely covered by I′′

Claim: |Opt(A′)| <= |Opt(A)| - 1

[Figure: I*, I′, I′′, J, and the set D.]

Using Lemma1 and Lemma2, it follows that there is an optimal solution for A containing both I′ and I′′. We just need to remove I′ from this optimal solution to get a solution for A′, and we are done.

This finishes the proof of the Theorem for Case1.

We shall now analyze Case2 and prove the Theorem for this case as well.

Case2: There is no interval uniquely covered by I′′

Claim: |Opt(A)| <= |Opt(A′)| + 1

[Figure: I*, I′, I′′, and the set D.]

Consider any optimal solution for A′. Note that this optimal solution takes care of the intervals of D and E. So we just need to take care of the intervals which intersect the red line. These are taken care of by adding I′ to this solution. We are done.

Case2: There is no interval uniquely covered by I′′

Claim: |Opt(A′)| <= |Opt(A)| - 1

[Figure: I*, I′, I′′, the set D, and the violet line at the finish time of I′′.]

Using Lemma1, it follows that there is an optimal solution for A containing I′. If I′′ is not in this optimal solution, we can see that removing I′ from the solution gives a valid solution for A′, and we are done. So the problem is the case when I′′ is present in the optimal solution: I′′ is not present in A′, so we need a substitute for I′′. Notice that I′′ can serve the purpose of overlapping intervals from D only, so we should search for the substitute for I′′ from D only. We replace I′′ by the interval of D which intersects the violet line and has the earliest start time. See the following slide for the justification.

Let J be the interval in D which intersects the violet vertical line (has finish time greater than that of I′′) and has the earliest start time. It suffices if we can show that every interval of D overlaps with J. We proceed as follows. Consider any interval K in D. There are two cases.
Finish time of K is less than that of I′′. In other words, K does not intersect the violet line. In this case, there must be some other interval in D that overlaps K and intersects the violet line (otherwise, K would be uniquely covered by I′′); since the start time of J is at most that of this interval, K is overlapped by J as well.
Finish time of K is more than that of I′′. In other words, K does intersect the violet line. Hence K overlaps with J as well, since the latter also intersects the violet line.
This completes the proof.

Concluding slide for exercise 5

We demonstrated a greedy strategy and proved its correctness by establishing the relationship between the optimal solution of the original instance and the optimal solution of the smaller instance defined by the greedy step. Each step of the greedy strategy can be executed in time polynomial in n.

Theorem:
There is a polynomial time algorithm for computing a smallest subset of intervals overlapping a given set of intervals.

Problem 1

Given an array A storing n elements, and a number k, compute the k nearest elements to the median. Time complexity should be O(n).
Hint: Use the following tools.
Divide and conquer strategy, as used in problem 2 of the same practice sheet.
Linear time median finding algorithm.
You need to reduce the problem to half its size in each step.

Firstly, we may prune our search domain from n to 2k elements as follows.
Find the median; let it be m.
Find the element with rank n/2 - k; let it be p. Remove all elements smaller than p (justify it).
Find the element with rank n/2 + k; let it be q. Remove all elements greater than q (justify it).
Time spent till now is O(n).
The k nearest elements of the median are surely among these remaining 2k elements.
Now find the element with rank k/2 among the remaining elements; let it be x. Find the element with rank 3k/2; let it be y. If x is closer to m than y (the other case is symmetric), then we can conclude the following:
1. All elements greater than x and less than m must be among the set of k nearest elements of m. These elements are eliminated from the input and added to our solution.
2. None of the elements greater than y can be among the set of k nearest elements of m, since the k elements between x and y (excluding m) are all closer to m. These elements are also removed from the input.
In this way, we have found k/2 of the nearest elements of m. Moreover, the input has reduced from 2k to k elements. Keep repeating this step. We get the k nearest elements of m in O(n) time.
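The pruning step can be sketched in Python as follows. This is a minimal sketch under two stated simplifications: rank selection is done by sorting (the lecture assumes a linear-time selection algorithm), and the repeated halving is replaced by a direct distance-based selection on the roughly 2k surviving candidates. Distinct elements are assumed, and all function names are mine.

```python
def select_rank(elems, r):
    # r-th smallest element, 1-based; the lecture uses linear-time
    # selection here, we sort only for brevity of the sketch
    return sorted(elems)[r - 1]

def k_nearest_to_median(a, k):
    n = len(a)
    mid = (n + 1) // 2
    m = select_rank(a, mid)                 # the median
    p = select_rank(a, max(1, mid - k))     # element of rank n/2 - k
    q = select_rank(a, min(n, mid + k))     # element of rank n/2 + k
    # the k nearest elements of m all lie in [p, q]
    candidates = [x for x in a if p <= x <= q and x != m]
    # finish by picking the k candidates closest to m
    candidates.sort(key=lambda x: abs(x - m))
    return sorted(candidates[:k])
```

For example, k_nearest_to_median([1, 2, 3, 4, 5, 6, 7, 8, 9], 2) returns [4, 6].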

Finding DFS tree from start and finish time

There was a problem in practice sheet 5 where, given the start time and finish time of a DFS traversal for all vertices, the aim was to compute the DFN number and the DFS tree.
A few students were facing the problem of determining the children of a node in the DFS tree. An easy way to achieve this goal is an indirect one:
In order to compute the children of a vertex in the DFS tree, it suffices if we can compute the parent of each vertex. We can do the latter task as follows. Among all neighbours of a vertex u, find all those vertices whose start time is smaller than that of u. All these vertices are ancestors of u. Who among them will be the parent of u? Surely, the vertex with the maximum start time.
So we can compute the parent of vertex u in O(deg(u)) time. Time spent over all vertices will be O(m + n). Hence we can compute the children of each vertex in the DFS tree, and hence the entire DFS tree structure, in O(m + n) time.
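The parent computation described above can be sketched as follows. The adjacency-list and dictionary representations, and the function name, are assumptions of this sketch.

```python
def dfs_parents(adj, start):
    """adj: adjacency list {u: [neighbours]}; start: {u: DFS start time}.
    Returns the parent of each vertex in the DFS tree (None for the root)."""
    parent = {}
    for u in adj:
        # neighbours discovered before u are exactly its ancestors
        earlier = [v for v in adj[u] if start[v] < start[u]]
        # among them, the parent is the one discovered last
        parent[u] = max(earlier, key=lambda v: start[v]) if earlier else None
    return parent
```

The children of each vertex, and hence the DFS tree, are then obtained by inverting this parent map in one O(n) pass.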
