
SORTING

(1) Sorting---Overview:-
I/p:- A sequence <a1, a2, ..., an>
O/p:- A permutation <a1', a2', ..., an'> of the i/p such that ai' <= a(i+1)', for 1 <= i < n,
where each ai belongs to a domain on which sorting is defined in one or the
other way. For example, sorting is defined on the set of real numbers, where
sorting is generally taken to mean sorting by magnitude of the numbers,
i.e. a larger number is considered to be higher in order than a smaller one, and
a more negative number is considered to be lower than a less negative one.
This does not mean that sorting cannot be done on a domain on which such a
natural concept like magnitude of an element is not defined. We can force or
define our own customized notion of magnitude on the domain in such cases,
depending of course on the requirement and on whether such a notion can really
be created.
For example, consider a 2-dimensional figure, a square of width 3, with its left
bottom corner located at the origin of the Cartesian 2-dimensional plane.
Consider 2-dimensional points (x,y) inside or on the border of this square where
x and y are non-negative integers. Call this set INT(square), meaning the
collection of all points inside or on the boundary of the square whose coordinates take only integral values. So, INT(square) = {(0,0), (1,0), (2,0),
(3,0), (0,1), (1,1), (2,1), (3,1), (0,2), (1,2), (2,2), (3,2), (0,3), (1,3), (2,3),
(3,3)}.
Now, there does not exist any immediate obvious concept like the magnitude of
an n-dimensional point on which we can sort the above set. Given (1,3) and
(2,2), we do not know which one is greater so that we can sort this pair. So,
we can define our own notion of magnitude. For example, take the
magnitude of a point (x,y) to be |(x,y)| = |x| + |y|, i.e. the sum of the
absolute values of its co-ordinates. Then given points (a,b) and (c,d) we can say
(a,b) >= (c,d) if and only if |(a,b)| >= |(c,d)|. What is happening here is that
points (a,b) and (c,d) are put into relation with each other by a certain
relation R which totally orders the point set. Our usual 2-dimensional
Euclidean distance from (0,0), defined as
|(x,y)| = √(x² + y²),
called the radial distance, induces another such valid total order on the 2-dimensional plane that we have been dealing with in Cartesian geometry.
Ok..so....given a domain, we may come up with many such total order
relations, using which we can sort the given input subset of that domain. All
points having the same value will be called equal w.r.t the defined relation. For
e.g. all 2-dimensional points having the same Euclidean distance 'd', i.e. all points
on the circumference of a circle of radius 'd', are considered to be equal in
order, if we use the Euclidean distance metric as a basis for sorting. The value of
a real number is another such metric over the real number domain which we
generally use for sorting real numbers. The point is that at least one such
definition should exist which our algorithm can use to decide the relative
order among elements of a given domain. Sorting algorithms may sometimes
explicitly use the definition of this total order relation as a part of the sorting
strategy. But, usually, the sorting algorithms you will learn are independent of
these considerations.
So, by this point, you should have acquired enough understanding of what
sorting is generally, and you should have realized by now that the first
thing a sorting algorithm requires is the existence of some comparison
concept (i.e. a total order relation) based on which comparisons of elements
of the given domain can be done. Well, even a partially ordered set would
do for sorting. In such a case, the sorted output would be a collection of chains,
each individually sorted, and each chain is obviously totally ordered.
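As a small illustration of the above, here is a hedged C sketch (my own example, not from the original notes) that sorts 2-dimensional integer points using the custom magnitude |(x,y)| = |x| + |y| as the total order; the struct name Point and the comparator name cmp_manhattan are illustrative choices.

#include <stdio.h>
#include <stdlib.h>

typedef struct { int x, y; } Point;

/* comparator induced by the custom magnitude |x| + |y| */
static int cmp_manhattan(const void *a, const void *b)
{
    const Point *p = (const Point *)a, *q = (const Point *)b;
    int mp = abs(p->x) + abs(p->y);
    int mq = abs(q->x) + abs(q->y);
    return (mp > mq) - (mp < mq);    /* -1, 0 or +1 */
}

int main(void)
{
    Point pts[] = { {1,3}, {2,2}, {0,1}, {3,0} };
    size_t n = sizeof pts / sizeof pts[0];
    qsort(pts, n, sizeof pts[0], cmp_manhattan);   /* sort by the defined order */
    for (size_t i = 0; i < n; i++)
        printf("(%d,%d) ", pts[i].x, pts[i].y);
    printf("\n");
    return 0;
}

Points with equal |x| + |y|, like (1,3) and (2,2), are order-equivalent under this relation, so either of them may come first in the output.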

(2) Space-Time Complexity Analysis---Issues:-
..suddenly analysis coming on the scene when you have not seen even a single sorting
algorithm!!! why ???
...so to speak, we don't need to see any such concrete algorithm to discuss the points
which are mentioned below... For example, to start learning Discrete Structures, you do
not need to see an example of a discrete structure before you start learning.. all you
need at first is to think about what should not be a discrete structure and what in the
world can be a sufficient picture of what a discrete structure really is... then you start
studying examples of discrete structures which conform to our concerns...
There are different methods of solving a problem.. many methods... but some of them are
adored a lot for their sheer beauty and elegance. Be patient enough to see the issues
involved in implementing one of them, namely the divide and conquer strategy.

(a) Space Complexity:-
Space is always required for the entire input to be stored, usually. This is
generally the case that you are dealing with, and maybe this is your only
view about input and output. This view is typically called the Off-line or Static
case (you will learn about the on-line case, maybe in higher semesters). Meaning, the entire input is
available to us at once and we work on it by reading it as and when required.
Similar is the case for output. Usually, we will produce the entire output at
once somewhere, which might be used in future computations. In such off-line
cases, it does not make sense to consider the space requirement for the input
and output as part of the space analysis of our algorithm, because somehow
we have to store the input somewhere so that our algorithm can read it as and
when needed. That is the least our algorithm should and will demand,
in off-line cases. So, that becomes a kind of necessity on the part of our
algorithm.
Algorithm-space:- So, the algorithm's space cost should be calculated in terms of
the space requirement of the other entities created because of the algorithm's strategy.
Place-holding variables, loop variables, certain copy-preserving variables, and
maybe sometimes something as big as a temporary array or a list, and many such
things called temporary or local entities. These may be required by an
algorithm for its strategy to be effective in solving the problem. Such extra
space is generally engaged by an algorithm as a working information set.
The algorithm creates such a space and then possibly keeps on updating this
space as a result of its functioning, and based on the state of this working
information, further actions of the algorithm are decided (just like rough work
on paper). We calculate such costs and attribute them to the algorithm's space
requirement. Typically we want the extra space requirement, i.e. the strategy space of
the algorithm, to be bounded above by some constant number, represented
asymptotically as a member function of the O(1) class of constantly
dominated functions. We, in fact, have a nice name for such algorithms:
In-Place algorithms.
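For a concrete picture of what O(1) strategy space means, here is a minimal C sketch (my own illustration, not part of the original notes): reversing an array in place, using only a constant number of scalar variables regardless of n.

#include <stdio.h>

void reverse_in_place(int a[], int n)
{
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int tmp = a[lo];     /* the only extra storage: a few scalars, i.e. O(1) */
        a[lo] = a[hi];
        a[hi] = tmp;
        lo++;
        hi--;
    }
}

int main(void)
{
    int a[] = {10, 6, 3, 4, 8};
    reverse_in_place(a, 5);
    for (int i = 0; i < 5; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}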
Implementation-space:- But the algorithm is going to be implemented on some
device through models of computation or technologies like programming
languages, assembly language or direct ROM burning of the circuitry
corresponding to the algorithm's steps. So, along with algorithm-space, there
may come the cost of space provided by certain features of that
implementation technology that your code is using directly as a service or
support feature. Don't panic if you don't get it right now.
For example, when implemented by means of higher level programming
languages like C, C++, recursive algorithms are usually implemented as
recursive function calls (if the language supports such a feature). Recursive
calls are usually handled by what is called an active function call stack.
This call stack occupies a certain portion of system memory to track function
calls, recursive or otherwise. This system stack grows and shrinks according
to the number of non-recursive calls or nested active recursive calls
present at any moment in the runtime. Each active call record provides
enough memory for each function call to keep its state. Local variables, the
return address and local memory references to dynamically allocated
memory are all recorded in this call activation record (how else do you think
such dynamic information could be stored, to record the order and relevance of function
calls??). The amount of system stack dedicated to an algorithm is called the
Stack Depth of that algorithm. We do not, generally, attribute Stack Depth
cost to the algorithm, because it is the dedicated implementation level
facility at use. It is a feature of that implementation technology, like C or
C++, serving our algorithm.

If we attribute even this facility or support cost to the algorithm, then we
are not appreciating the beauty or elegance of the algorithm's strategy. Solution
methods like divide and conquer are elegant methods for solving a certain
class of problems. As a famous quote says, "To iterate is human. To recurse,
divine."

But on the other hand, if we ignore implementation costs altogether, then we
might run the risk of overusing space, if our poor implementation of recursion
requires way too many active functions depending on each other simultaneously,
to levels which are not even needed. What if the only language currently in the
world doesn't support recursive calls? What if the implementing system shouts
back at you saying "Hello, no further recursion please.. I will have to give you my
place to support your activities.. So, Stop Now or achieve nothing from me!!!"?
I bet you have already encountered such situations (though not such screaming
from the system.. the system cries out in code words like "recursion depth exceeded"
or "segmentation fault" or similar). So we try converting such elegant divide and
conquer recursive algorithms to a non-recursive form, i.e. an iterative equivalent,
without losing the efficiency of the D&C approach.
This is done for at least 2 reasons... to help system memory not overload
itself with recursive records, and... it is in our nature to look for further
efficient solutions even when we already have one efficient solution at hand.

We are still discussing probable issues of the recursive method only. Why only the
recursive method in discussion here? Are there no other methods where such issues will
not arise, maybe?
Yes, there of course are other, so-called non-recursive, methods.. like decrease and
conquer, memoization (not memoRization), greedy, branch-and-bound, backtracking,
heuristic search and alike.. currently you do not fully understand what these words mean..
but, for the time being, anyway, so far, Divide and Conquer (D&C) is like one of the jewels among
methods for solving a problem... and especially a jewel for sorting, in the sense that it
avoids unnecessary comparisons to establish sorted order... in this sense, it really
deserves space for discussion of the method.

So..again... Another quick remedy to reduce the recursive load on the
implementing technology is to stop recursion early enough, i.e. to identify a
problem size threshold below which the problem is solved by other direct methods
rather than recursing further. This is preceded by a little algebraic and/or
arithmetic analysis of the time complexity equations, which tells us that below the
threshold value recursion would be more costly than direct problem solving.
This reduces the load a lot. Such an analysis is not mathematically difficult.
But it involves diligent, exact step counting of the algorithm, or it requires
fixing the constants hidden in your asymptotic analysis, i.e. in the O, o, Ω, ω and Θ
classes of functions. Only then can you compare 2 functions absolutely on
their magnitude for deciding the threshold or the barrier, because to decide
a barrier or a threshold, what we want here is an exact estimated
comparison of 2 functions.. and not just asymptotic behavior.
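A hedged sketch of this threshold idea in C (my own rendering, not a prescribed implementation): a divide-and-conquer merge sort that switches to a direct method, insertion sort, once the subproblem size falls below a threshold. The value THRESHOLD = 16 is only an assumed placeholder; the real value would come from the exact step-count comparison described above.

#include <stdlib.h>
#include <string.h>

#define THRESHOLD 16   /* assumed cutoff; fixed in practice by exact step counting */

static void insertion_sort(int a[], int lo, int hi)   /* direct method for small sizes */
{
    for (int i = lo + 1; i <= hi; i++) {
        int key = a[i], j = i - 1;
        while (j >= lo && a[j] > key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = key;
    }
}

static void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(int));
}

static void msort(int a[], int tmp[], int lo, int hi)
{
    if (hi - lo + 1 <= THRESHOLD) {        /* below threshold: stop recursing */
        insertion_sort(a, lo, hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    msort(a, tmp, lo, mid);                /* divide */
    msort(a, tmp, mid + 1, hi);
    merge(a, tmp, lo, mid, hi);            /* conquer */
}

void merge_sort_with_cutoff(int a[], int n)
{
    int *tmp = malloc((size_t)n * sizeof(int));
    if (!tmp) return;                      /* allocation failed: give up quietly */
    if (n > 1) msort(a, tmp, 0, n - 1);
    free(tmp);
}

Stopping the recursion at the threshold also shrinks the stack depth to roughly log2(n/THRESHOLD) levels instead of log2(n).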

(b) Time Complexity:-
It is assumed here that you have some idea about calculating time
complexity in terms of what is called input size. It is also assumed that you
know the asymptotic analysis notations like O, o, Ω, ω and Θ. So, we analyze the
time complexity of any algorithm, and especially a sorting algorithm, from 3
perspectives.
(i) Worst Case Analysis:-
As is obvious from the words, by looking at the strategy of the
algorithm, we identify input instances which make the algorithm
consume the highest possible number of operations. The purpose is to
bound from above the time requirement of the algorithm. There may
be multiple such instances. To get the upper bound, we can actually find
the run-time equations of each of these worst cases and then choose
the fastest growing function among them.
There is obviously at least 1 reason to support worst case analysis.
And that is that knowing an upper bound on the running time
guarantees that the algorithm can never behave worse than this bound.
Knowing the upper bound is surely required.
But there may be several opinions against putting worst case
analysis forward as the major time complexity analysis of an algorithm.
---How many worst cases can exist?
Let D = {d1, d2, ..., dn} be an input set to an algorithm, where di
belongs to some domain. Most of the time, any arrangement of this
set can become an input to our algorithm. This is known as an input
instance. For example, given a set of numbers {10,6,3,4,8} as an input set to a
sorting algorithm, we know that this same set can come as an input in any of
the 5! = 120 ways. So there are 120 input instances. We generally have
some idea about how the arrangements of D are distributed. Let P be
such an assumed probability distribution over these arrangements. (In case we
have no knowledge about the distribution of arrangements of the input set, then P is
considered to be the Uniform Distribution.)
Let us assume that X is the set of worst
case input instances identified for our algorithm. Now, according to P,
what is the probability of occurrence of at least one of these worst
cases? That means we are asking about the probability of the set X when P
is the distribution over the set of arrangements. Usually it is low, since usually the
number of worst cases is itself low.
As a short example, the worst case for Quick-Sort is identified by the
situation when, out of the 2 sub-arrays of the given array of size n, left or
right, exactly 1 is empty, so that every time the problem size reduces by
just 1. This leads to the time complexity equation Θ(n²). How many
cases force this to happen? Just 2, when the input is either increasingly
sorted or decreasingly sorted. So, D = {x1, x2, ..., xn} and
X = {increasing permutation, decreasing permutation}. Assuming the
probability distribution over the arrangements of D to be P = Uniform Distribution, the
probability of the set X is p(X) = 2/(n!), which is tremendously low even for
n = 10. Obviously there are other permutations too for which the time
complexity of Quick-Sort shoots to the Θ(n²) class. We will see that in a
section dedicated to Quick-Sort.

The above points should make you start thinking that we should be
careful when we say that the time complexity of an algorithm is its worst
case equation.. i.e. we should be careful when we summarize time
complexity in just one class, i.e. O (Big-Oh). Worst case analysis will
surely have an upper hand in the analysis when either the worst cases are
too many (in such a case, the algorithm is generally inefficient) or the probability of
occurrence of the worst case set is so high that it starts concerning us.
But this all depends on the distribution P that we choose, or that
exists in reality, on the input set.
---How really bad can any worst case be? Can we not reduce
the worstness?
The previous question tried dealing with the quantitative aspect of the worst
case. Here we discuss the qualitative aspect of a worst case. Instead of
talking about the number of worst cases, we talk about how really bad
any worst case can be. Bad in the structural sense, the way the input
is stored and the inherent structure of the input elements
themselves. Any worst case will surely lead to higher time
complexity. But can we not reshape the received input instance
so as to avoid feeding worst case input instances directly to our
algorithm? This is called pre-processing the input, without losing its
meaning or value. Can we not pre-process the input to avoid worst
input, somehow, without increasing the time complexity of the entire
algorithm?
If we can, then of course we should take a look at the input instance
and pre-process it if required, before feeding it to our algorithm.
For example, consider again Quick-Sort and its 2 worst case input
instances, increasing or decreasing input. Before letting Quick-Sort
run on the input, we can scan the array once to see if we have
received a worst case. It takes Θ(n) time to decide such a presence.
Then we can pre-process it by partially permuting it, so that now we
have transformed the worst case into some other input instance. Then
feed it to Quick-Sort. Total time complexity = Pre-processing-Time +
T(Q-Sort) = Θ(n) + Θ(n*log(n)) = Θ(n*log(n)). So we have not
disturbed the time complexity. (Actually in this case, we do not even need to run
Quick-Sort, because the worst case instance of increasing input is already sorted, and
we can directly sort the other worst case instance, i.e. decreasing input, in Θ(n)
time... Think how..)
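One possible shape of such pre-processing, sketched in C for this example (the helper names is_increasing, is_decreasing and sort_with_preprocessing are my own, not from any library): one Θ(n) scan detects the two worst-case instances; an increasing input is left as is, a decreasing input is reversed in Θ(n), and every other instance goes to a plain Quick-Sort.

#include <stdbool.h>

static bool is_increasing(const int a[], int n)
{
    for (int i = 0; i + 1 < n; i++)
        if (a[i] > a[i + 1]) return false;
    return true;
}

static bool is_decreasing(const int a[], int n)
{
    for (int i = 0; i + 1 < n; i++)
        if (a[i] < a[i + 1]) return false;
    return true;
}

static void quicksort(int a[], int lo, int hi)   /* plain Lomuto-partition Quick-Sort */
{
    if (lo >= hi) return;
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    quicksort(a, lo, i - 1);
    quicksort(a, i + 1, hi);
}

void sort_with_preprocessing(int a[], int n)
{
    if (is_increasing(a, n))                /* worst case 1: already sorted */
        return;
    if (is_decreasing(a, n)) {              /* worst case 2: reverse it in Θ(n) */
        for (int lo = 0, hi = n - 1; lo < hi; lo++, hi--) {
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
        return;
    }
    quicksort(a, 0, n - 1);                 /* all other instances */
}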

Now is the time to discuss the case when pre-processing is costly. You can try this
case yourself. As an exercise, think about some problem and its
algorithm and find out the worst cases for that problem and algo. Think of some
way of pre-processing the input. Find such a problem and algo for which pre-processing
the input itself is costlier than the algo's time complexity. A hint to start with
is that the input elements themselves should be so complex or so restrictive in
structure that any processing on them will probably take considerable time.
For example, how about the problem of sorting a singly linked list ???

From the above, it is clear that worst case analysis must be done to show
the upper bound, and maybe it is not that good to label the
algorithm's time complexity by it. But anyway, the worst case equation is
the upper bound on an algorithm.
(ii) Best Case Analysis:-
Best cases are those which consume the minimum of resources and
computational steps. The discussion for the best case goes symmetric to
that of the worst case analysis.
(iii) Average Case Analysis:-
Average case analysis is simply averaging the time complexity over
all input instances. Now, there are many kinds of
averages.. arithmetic average, mean squared average and so on. We
have to choose an averaging notion which fits the analysis case best.
Now, input instances are generally large in number. So
obviously it is not feasible to list down all instances and calculate the
time complexity of each. Here, using the notions O, o, Ω, ω, Θ of
function classes comes to help. Though there exist as many time
complexity equations as there are instances, these equations usually
lead to only a few function orders. For example, all functions of the form aX+b,
where a and b are real numbers, belong to the O(X) class. Though there exist uncountably infinitely many such functions, they all form just 1 class.

We have to remember that average case complexity considers all
input instances and their equations. When we talk about input
instances, we are also talking about a probability distribution P over
the input instance set. With this in mind, average complexity is the
mathematical expectation over the run time equations, if it exists. (There
are sets for which the mathematical expectation does not exist for a given probability
distribution, because the summation does not converge.)
If T is the set of run time equations of the input instances, then
Average complexity = E[T] = Σ (over all input instances I) p(I) * run-time(I)
where p(I) = probability of occurrence of input instance I.
Example:- Consider the problem of searching for an element x in a
given array A. Consider the linear search algorithm. The complexity of linear
search is the number of comparisons required for finding the search
key x. Assume that the elements of A are all distinct. Given an input set
of n elements, there are n! input instance arrays. Assume a uniform
probability distribution over the input instances. Input instances can be
grouped together according to the number of comparisons these
instances yield. So there can only be n+1 groups, n for successful
search and 1 for unsuccessful search. So here we can see that n!
instances are categorized in just n+1 groups. All we require is to find
the size of each group. Let C(i) be the size of the group requiring i
comparisons. Here T = {1, 2, 3, 4, ..., n+1}, i.e. our time equations are
nothing but simply taken to be the number of comparisons, without
any loss of generality. The average complexity of linear search is then the
mathematical expectation over T, given the uniform distribution:
E[T] = Σ (1<=i<=n+1) i * p(instance subset requiring i comparisons)
Now, since we have assumed the uniform distribution, the probability of each
instance is 1/(n!). So the above equation becomes
E[T] = Σ (1<=i<=n+1) i * C(i)/(n!)
Calculating C(i) is straightforward. i comparisons means the search key is
found at location i. There exist (n-1)! such instances. This is true for
all 1<=i<=n. So,
E[T] = Σ (1<=i<=n+1) i * (n-1)!/(n!) = Σ (1<=i<=n+1) i/n = (n+1)(n+2)/(2n)
= Θ(n)
So the average complexity of linear search is Θ(n). The actual count
says that, on average, we need to scan half the array to
determine presence or absence. This is intuitive also, but only in the case of the
uniform probability distribution.
Here we have used elementary combinatorics to count instances. We
can simplify the analysis by using other tools like Indicator Random
Variables, calculating the expectation of each random variable
and adding all these expectations.

Here the analysis was simplified because of the uniform probability
distribution.. calculations were simplified because each instance had the same
probability 1/n!. For any general distribution, the analysis is quite
involved and complex. The equations may become almost impossible to
represent in closed form. Tools required for such an analysis
may have to be taken from mathematical fields like algebra, calculus,
discrete mathematics and alike.
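The Θ(n) average derived above can be checked with a tiny C program (an illustration of mine, not part of the original analysis): under the uniform model the key is equally likely to sit at any of the n positions, so the measured average comes out near (n+1)/2.

#include <stdio.h>

static int linear_search_comparisons(const int a[], int n, int key)
{
    int comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (a[i] == key) break;
    }
    return comparisons;
}

int main(void)
{
    enum { N = 1000 };
    static int a[N];
    for (int i = 0; i < N; i++) a[i] = i;          /* n distinct elements */

    long total = 0;
    for (int pos = 0; pos < N; pos++)              /* key at every position, equally likely */
        total += linear_search_comparisons(a, N, a[pos]);

    printf("average comparisons = %.2f, predicted (n+1)/2 = %.2f\n",
           (double)total / N, (N + 1) / 2.0);
    return 0;
}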
---What is the difference between average time complexity
and the time complexity of an average input instance? Are they the
same?
The average complexity of an algorithm is what we discussed
above.. the expectation, if it exists, over run-time equations under the
considered probability distribution. An average input instance is
certainly a different thing, because here we are talking about an
instance and not about a complexity. Like the best case and worst case,
an average input is an instance which consumes an average
number of operations and resources of an algorithm. So there are 2
ways to figure this out. (i) Either every operation should be executed an
average number of times, or (ii) the algorithm should execute only an
average number of operations. In the first case, the algorithm executes
every operation, but an average number of times. In the second case, the
algorithm does not execute every operation, but executes only an
average number of operations.
Example.. Say an algorithm A consists of 5 steps.
Algorithm A:-
Step 1 group of statements, cost = c1, repeated at most n1 times
Step 2 group of statements, cost = c2, repeated at most n2 times
Step 3 group of statements, cost = c3, repeated at most n3 times
Step 4 group of statements, cost = c4, repeated at most n4 times
Step 5 group of statements, cost = c5, repeated at most n5 times
According to
case (i): average algo cost = Σ(i) average cost of Step(i) = Σ(i) ci * ni/2
case (ii): average algo cost = Σ(x ∈ X) cost of Step(x) * nx, where X is some
3-element subset of {1,2,3,4,5}.. because out of the 5 steps, on an average 3 are
executed fully. This second case suggests that some 3 steps are
executed and the other 2 are not. This can be the case when steps are
executed conditionally, i.e. under if-else or switch kind of condition
clauses, so that on an average that many of the conditions evaluate to
false.
As you can see, these 2 may evaluate to different cost functions. But
for algorithm A, the order of both these functions is still the same, i.e. they
belong to the same function class asymptotically. But this need not be
true in general. What if the cost functions of the different steps are of different
orders? Then the second case gives different time complexity functions
depending on which subset of steps actually got executed.
That is why the case (i) method of evaluation seems to be appropriate,
because in that case an algorithm is not rejecting certain steps just
because they might be executed conditionally. Rather than rejecting,
we should in fact consider the average number of times the
conditions turn out to be TRUE and so the steps will be executed.
So we should come up with an input instance which satisfies the case (i)
evaluation. Again, the notion of average here is always with respect to
the probability distribution over input instances.
Considering the linear search problem above, and considering the uniform
distribution once again, an average input instance will be any instance
where, on average, the algorithm fails n/2 times to locate the
search key. This corresponds to the case where the search key is at an
average location, i.e. at location n/2. There are (n-1)! instances where
the search key is at location n/2. So, there are (n-1)! average cases for
the linear search problem. All of these evaluate to time complexity
functions of the same order, and that is Θ(n). (Linear search is a problem which
has n! worst cases, i.e. when the search key is absent.. (n-1)! best cases, i.e. when the
search key is present at location 1.. and (n-1)! average cases, i.e. when the search key
is at location n/2.)

Let us put in some notation to momentarily summarize the above
discussion about average case and average complexity.
I = set of all input instances
AI = set of average instances
T(x) = time complexity of x, where x is a subset of I
As we saw before, average time complexity is the expectation, if it
exists, over all instances under the probability distribution P. So, Average
Time Complexity = E[T(I)].
And we also have the time complexity of the average case
inputs. That is T(AI).
Now read the question which started this subsection. That question is
now formally expressed as:
Are E[T(I)] and T(AI) of the same order always? i.e. do they
belong to the same asymptotic function class always?
For the linear search algorithm, the answer is YES. So, we can
summarize the time complexity analysis of the linear search problem on n
elements in an expression like
Ω(1) <= T(n) <= O(n), with average complexity Θ(n)
It is left to you to think about the answer in the general case.

(3) Desirable Algorithm Properties (if you can achieve them):-
Now, if you have not understood the need and timing of the topics
we have discussed till now, do not worry much. But also do not be too
relaxed. Be sure that you come back to read these previous points as and
when required. So, to talk about desirable properties of any
sorting algorithm, we want a sorting algorithm to be
(a) In-Place:-
Meaning, all the computations done by the algorithm should be done
within a space constraint belonging to the class O(1) of
functions. You will see (and probably know) that merge sort and radix
sort are at least 2 algorithms you know which are not in-place.
(b) Stable:-
Meaning, elements which are of the same order (relative to sorting)
appear in the sorted output sequence in the same order as they appeared
in the input sequence. Meaning, for example, if the integer 5 appears
in the sequence, say 3 times, then these 5s should appear in the output in the
same order as in the input. To clarify further, let us say, for the
sake of discussion, that the 3 instances of this 5 are labeled 5', 5'' and
5''' in the order of appearance in the i/p (it is the same 5, but 3 times). We
know that these 3 5's are equal in order. But we want these 3 5's to
appear in the o/p in the same order, i.e. 5', 5'', 5'''. If we can achieve this
for all elements which have multiple occurrences, or for elements
which are order-equivalent w.r.t sorting (remember the general custom
magnitude definition we talked about in the first section?? total order relation ?? order
equivalence??), then we have achieved stability of the order of
appearance.
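A minimal C sketch of stability (the record layout is purely illustrative, not from the original text): an insertion sort that shifts only strictly greater keys never lets equal keys overtake each other, so the three 5s come out as 5', 5'', 5'''.

#include <stdio.h>

typedef struct { int key; char tag; } Rec;   /* tag marks the occurrence: 'a' = 5', 'b' = 5'', ... */

static void stable_insertion_sort(Rec a[], int n)
{
    for (int i = 1; i < n; i++) {
        Rec r = a[i];
        int j = i - 1;
        /* shift only while strictly greater, so equal keys keep their input order */
        while (j >= 0 && a[j].key > r.key) { a[j + 1] = a[j]; j--; }
        a[j + 1] = r;
    }
}

int main(void)
{
    Rec a[] = { {5,'a'}, {3,'x'}, {5,'b'}, {1,'y'}, {5,'c'} };
    stable_insertion_sort(a, 5);
    for (int i = 0; i < 5; i++) printf("%d%c ", a[i].key, a[i].tag);
    printf("\n");    /* prints: 1y 3x 5a 5b 5c */
    return 0;
}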
Apart from these desirable properties, we of course want an
algorithm to be efficient on various frontiers like time, the memory
hierarchy (like cache, primary memory, secondary memory) and
other resources like network bandwidth, if we are dealing with a huge
amount of data over a computer interconnection network or any
communication network like radio communication and alike.
Now, after all these considerations, we are in a position to list
different strategies for sorting, i.e. list different algorithms for sorting.
We will discuss, in fair detail, every sorting strategy that turns up in the
usual B.E curriculum.

(3) An Upper Bound on Run Time of Sorting By Comparison:
Before discussing a variety of algorithms for sorting, we can derive a general
upper bound on the running time of any algorithm which does sorting by
comparison on a random set of elements (partially or totally ordered sets).
Consider the input set S = {a1, a2, ..., an} where ai belongs to some data
domain. We assume the absence of any further information about a possible
input set S, like the range of the data elements, frequency of occurrence or the
probability distribution on the universe from which the data is chosen.
We can decide the sorted ordering of S by comparing each element x with
every other element, checking how many elements are less than x, and
putting x directly into its proper position using this information. Given n
elements, there are nC2 = n(n-1)/2 such pairs. Each pair (a,b) checks whether a < b and
results in the value 0 or 1 accordingly. This information can be stored and
reused repeatedly to swap elements to their proper positions. At a time,
only 1 element needs its position fixed, so at a time only n-1 comparisons
are required. Given this, the run time complexity becomes O(n²) with space
complexity Θ(n). We can summarize this as a result.
Sufficient Upper Bound on Run Time and Space Requirement for
Sorting By Comparison:-
Given a set of input elements S of size n and the absence of further
information about the elements of S, an O(n²) run time procedure with Θ(n)
space complexity is sufficient for sorting by comparison.
Equipped with the above result, in later sections we will see algorithms
having time complexity belonging to the above stated O(n²) family of
functions and space complexity belonging to the O(n) family of functions, not
Θ(n), because the Θ(n) class would force an algorithm to require linear extra
space, which is not the case for many algorithms like Heap sort, Bubble sort,
Insertion sort and many others.
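A hedged C sketch of the counting procedure described above (my own rendering): for each element, count how many elements must precede it in sorted order and place it directly at that rank in a separate output array of size n. This version recomputes the comparisons rather than caching the nC2 answer bits, but it shows the O(n²)-comparison, Θ(n)-extra-space character of the method; ties are broken by original index so that equal elements get distinct positions.

#include <stdio.h>
#include <stdlib.h>

void rank_sort(const int in[], int out[], int n)
{
    for (int i = 0; i < n; i++) {
        int rank = 0;                            /* number of elements placed before in[i] */
        for (int j = 0; j < n; j++)
            if (in[j] < in[i] || (in[j] == in[i] && j < i))
                rank++;
        out[rank] = in[i];                       /* place directly at its sorted position */
    }
}

int main(void)
{
    int in[] = {10, 6, 3, 4, 8, 6};
    int n = 6;
    int *out = malloc((size_t)n * sizeof(int));  /* the Θ(n) extra space */
    if (!out) return 1;
    rank_sort(in, out, n);
    for (int i = 0; i < n; i++) printf("%d ", out[i]);
    printf("\n");
    free(out);
    return 0;
}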

(4) Algorithms for Sorting
Below is a list of well studied algorithms which find their way into the syllabus
almost everywhere. It is useful to think of an algorithm as a person having its
own mind and its own words to describe its thought. So every algorithm has a
strategy, which is expressed formally as the step-by-step computational
procedure which we see in text. Consider S to be the set of elements to be
sorted. A few points below are to be considered.
(i) An element x ∈ S is called the ith order statistic if x is the ith smallest element in
the sorted order over S.
(ii) Given 2 indices i, j ∈ N with i < j, we say a derangement has occurred if
S[i] > S[j]. This means a larger value occurs before a smaller one in the
sequence. The performance of a sorting algorithm, most of the time, depends
on the magnitude of derangement present in the input sequence. Neatly put, the
performance of any strategy is tightly linked to the magnitude of the
derangement in the i/p sequence and to how fast that strategy is in resolving
derangements from a certain pass to the next. So it makes sense to
obtain complexity bounds relative to the magnitude of derangement and the
distribution of a certain magnitude of it in the input sequence.

(4.1) SELECTION Sort:-
4.1.1 Strategy:-
Successively relocate the ith order statistic to location i. This is done by a
Decrease-and-Conquer approach. The Dec-Con approach follows here by
assuming that the first i-1 order statistics have already been relocated to their
respective proper positions. Given this, the ith order statistic is the 1st order
statistic of the remaining n-i+1 elements. Follow the above procedure for
increasing values of i. The separation between sorted and unsorted elements is
maintained within the array.
4.1.2 Math Model:-
Consider the i/p to be a sequence of elements S = <a1, a2, ..., an>
Let Perm = {x | x is a permutation sequence of S}

Swap : Perm × Zn × Zn → Perm
such that
Swap(p, i, j) = p' such that
    p'(k) = p(k) for 0 <= k <= (n-1), k ≠ i and k ≠ j
    p'(i) = p(j) and p'(j) = p(i)

min : Perm × Zn → Zn
such that
min(p, i) = the index k, i <= k <= (n-1), at which p(k) is minimum

SS = < Swap(p0, 0, min(p0,0)), ..., Swap(pn-2, n-2, min(pn-2, n-2)) >
such that
pi = Swap(pi-1, i-1, min(pi-1, i-1)) and
p0 = i/p and o/p = Swap(pn-2, n-2, min(pn-2, n-2))
4.1.3 Algorithm Extraction from Model:-
As seen, selection sort is modeled as an (n-1)-length homogeneous
sequence of the Swap function. Algorithmically, this can be easily translated into any
kind of iterative loop of (n-1) repetitions. The function min embedded in Swap is a
search function finding the minimum value. This function can be easily translated into a
linear search over contiguous space. Such direct translations lead to the following
procedure text.

For i from 1 to n-1                   (1)  f1 = n
Begin
    min = A[i]                        (2)  f2 = (n-1)
    For j from i+1 to n               (3)  f3 = (n-i+1)
    Begin
        If ( min > A[j] )             (4)  f4 = (n-i)
            min = A[j]                (5)  f5 <= (n-i)
    End
    swap min and A[i]                 (6)  f6 = (n-1)
End
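A direct C rendering of the procedure text above (a sketch: the array is 0-indexed here while the pseudocode is 1-indexed, and the index of the minimum is tracked instead of its value so that the final swap is straightforward).

void selection_sort(int A[], int n)
{
    for (int i = 0; i < n - 1; i++) {            /* pass i fixes the i-th order statistic */
        int min_idx = i;
        for (int j = i + 1; j < n; j++)          /* linear search for the minimum */
            if (A[j] < A[min_idx])
                min_idx = j;
        int t = A[i]; A[i] = A[min_idx]; A[min_idx] = t;   /* exactly one swap per pass */
    }
}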
4.1.4 Space/Time Complexity Analysis:-
4.1.4.1 Space:-
Space(n) = Variable space ( Θ(1) ) + StackDepth ( Θ(1) ) = Θ(1)

4.1.4.2 Time:-
T(n) = Σ (1<=i<=6) fi

4.1.4.2.1 Worst Case Analysis:- is when step (5) gets executed with its
frequency reaching the upper bound. This happens only when the input is
reverse sorted. So the worst case instance is
S = <a1, a2, ..., an> such that for all i, ai > ai+1
T(n) = Θ(n²)

4.1.4.2.2 Best Case Analysis:- is when step (5) gets executed not even
once. This happens only when the input is already sorted. So the best case
instance is
S = <a1, a2, ..., an> such that for all i, ai <= ai+1
T(n) = Θ(n²)

4.1.4.2.3 Average Case Analysis:- is when step (5) gets executed an
average number of times. This depends on the probability distribution over Perm.
But anyway, the average complexity is the mathematical expectation over the run
time equations w.r.t a probability distribution p.
Consider I ∈ Perm.
E[ T(Perm) ] = Σ (I ∈ Perm) p(I) * T(I)
As seen from the worst and best case analysis, Θ(n²) <= T(I) <= Θ(n²). So,
E[ T(Perm) ] = Σ (I ∈ Perm) p(I) * Θ(n²) = Θ(n²) * Σ (I ∈ Perm) p(I) = Θ(n²)

As a result of the above analysis, if T(n) is the time complexity of the sort then
Ω(n²) <= T(n) <= O(n²) implies T(n) = Θ(n²)

4.1.5 Discussion:-
#Swaps = n-1
Selection sort has complexity of the order of n² because the
strategy is insensitive to the received permutation. The strategy is just to seek
the minimum element from a set and relocate it to its proper position. The search
operation cannot be optimized, due to the lack of any specific information about
the input sequence. As a result, brute force linear search has to be used,
resulting in a stationary complexity range. Generally, we can think of a strategy
as being nearly input-insensitive if it results in a stationary complexity range.

(4.2) BUBBLE Sort:-
4.2.1 Strategy:-
Successively relocate the ith order statistic to location i, for decreasing
values of i. This is done by a Decrease-and-Conquer approach. The Dec-Con
approach follows here by assuming that the top n-i order statistics (the elements
belonging at positions i+1 through n) have already been relocated to their
respective proper positions. Given this, the ith order statistic is the maximum of
the remaining i elements. Follow the above procedure for decreasing values of i.
The ith element is relocated to its proper position by pair-wise derangement
checks and swaps (if required). This is very different from the Selection sort
strategy. In Selection sort, the ith element does not find its own way through the
sequence, because it is swapped exactly once to reach its proper location.
Here, the ith element finds its own way by moving progressively through a
sequence of swaps, whenever required. The separation between sorted and
unsorted elements is maintained within the array.

4.2.2 Math Model:-
I/p: A sequence <a1 a2 ... an>
O/p: A permutation sequence of the I/p such that for all 1 <= k < n, ak <= ak+1
Let
SEQ = {x | x is a permutation sequence of the I/p sequence}
N = set of all non-negative integers

(i) To find out-of-order pairs:-
F : SEQ × N → N
F(<a1 a2 ... an>, p) = min { i | ai > ai+1, 1 <= i <= p }, for 1 <= p < n
                     = 0, otherwise
Description:- Given the prefix subsequence <a1 a2 ... ap>, F selects the
minimum index at which adjacent elements are out of order.

(ii) To swap out-of-order pairs:-
fswap : SEQ × N → SEQ
fswap(<a1 a2 ... an>, k) = <a'1 a'2 ... a'n>, for 1 <= k < n,
such that
    a'i = ai , 1 <= i < k
    a'k = ak+1
    a'k+1 = ak
    a'i = ai , k+1 < i <= n
and
fswap(<a1 a2 ... an>, 0) = <a1 a2 ... an>

Description:- Given the sequence <a1 a2 ... an>, if the elements at indices k and
(k+1) are out of order, i.e. ak > ak+1, then ak and ak+1 are swapped and all
other elements remain the same. This transforms the original sequence <a1 a2 ... an>
into another sequence <a'1 a'2 ... a'n>.
If no elements are out of order, then this is indicated by the second argument being
0 (since F outputs 0). In such a case, the sequence remains unchanged.

(iii) A Pass through the array:-
fpass : SEQ × N → SEQ        (restricted over 1 <= k <= n)
fpass(<a1 a2 ... an>, k) = fswap^k (<a1 a2 ... an>, F(<a1 a2 ... an>, k)), for 1 <= k < n
                         = fswap^k ( xk , F(xk, k) )
such that
    xi = fswap^(i-1) ( xi-1 , F(xi-1, i-1) ), for all 1 < i <= k,
and
    x1 is either the I/p sequence or any other permutation of the I/p resulting from
previous operations of any function operating on sequences (as can be observed in
equation (4) ahead).
Description:-
(1) f^d means function f applied to itself d times (iterated function).
(2) k is the length of the subsequence. fpass with second argument k
means the current subsequence under consideration is <a1 a2 ... ak> (i.e. the first n-k
maximum elements have been settled into their proper positions) and bubble sort is
trying to fix the (n-k+1)st maximum element in its proper position (the current
proper position under consideration is k).

(iv) Sequence of Passes:- compute fpass for 1 <= k <= n
BS = < f1 f2 ... fn-1 >
such that
    fi( xi , n-i+1 ) = fpass(<a1 a2 ... an>, n-i+1), for all 1 <= i < n
and
    xj = fj-1( xj-1 , n-(j-1)+1 ), for 2 <= j <= n-1
and
    x1 = I/p
and
    O/p = fn-1( xn-1, 2 )
Description:-
(1) Bubble Sort is a series of passes. For the next pass, the size of the sequence under
consideration reduces by 1, since in each pass 1 element is put into its
proper position in the sequence.
(2) In pass i=1, function f1 operates on x1 = the initial sequence as its first
argument, with length n-1+1 = n. It fixes the 1st maximum element at position
n. This produces a modified sequence with the 1st maximum at its proper position.
This output sequence of f1 is given to f2 as x2. After a sequence of n-1
functions, the output is the desired O/p sequence.

So, Bubble Sort is a sequence BS (as above) of pass functions (modeled as
equation (3)), where each pass function fixes exactly 1 element in its proper
place in the sequence. Each of these pass functions fixes the proper element to its
proper position by swapping (if required). Finally, to swap only when required, we
need a function which can tell us if any adjacent pair is out of order. This function
is modeled as equation (1).

4.2.3 Algorithm Extraction from Model:-
In the above model, each function fi in the sequence BS depends on fpass, which in turn
depends on fswap, which finally depends on F. So, from a very rough overview, the BS
sequence forms the outer core of the procedure. BS is a sequence of the same function
operating over different lengths of the i/p sequence. Each function is a group of
operations. So, in BS, that same group of operations is executed one after the other
n-1 times. So, algorithmically we can express the sequence BS as a Repeat statement.

1.  i = 1
2.  Repeat (n-1 times)                        // because BS is of length n-1
3.      Evaluate function fi of BS (i.e. evaluate fpass on length n-i+1)
4.      increment i

Algo 1. Algorithmic analog of mathematical BS

Going ahead, each fi is an fpass function operating on a sequence Si of length n-i+1. fpass
is itself a repeated evaluation of the function fswap for n-i+1 times. So, going ahead, step
3 itself can be expressed algorithmically as another Repeat statement. This Repeat
is done for each execution of step 2. So, extending Algo 1,
1.  i = 1                                     // i tracks the pass number
2.  Repeat (n-1 times)                        // because BS is of length n-1
3.      k = 1                                 // k tracks the position inside a pass
4.      Repeat (n-i+1 times)
5.          Evaluate function fswap
6.          Increment k
7.      increment i

Algo 2.
Further, step 5 in Algo 2 is a swap function which will swap only after evaluating the
function F. If function F returns 0, then no action should be taken, because the sequence
is left unchanged at that point of execution. This conditional behavior can be
expressed algorithmically as an if-else structure. We don't require the else part since,
as said above, no action is to be taken if F returns 0. So, naturally, the evaluation of
function F becomes the if test condition. So, extending Algo 2,

1.  i = 1                                     // i tracks the pass number
2.  Repeat (n-1 times)                        // because BS is of length n-1
3.      k = 1                                 // k tracks the position inside a pass
4.      Repeat (n-i+1 times)
5.          If ( F evaluates to a non-zero value k ) then
6.              swap entries at indices k and k+1
7.          Increment k
8.      increment i

Algo 3.

Now, further, since every operation is on sequences, we have to introduce variables to
denote sequences. Initially the variable will represent the i/p sequence. Also, step 6 talks
about swapping. At this stage, this swap execution can be expressed in many
different ways, so there is no real need to expand step 6 further to specify its form. But F
in step 5 is actually a minimum index finding function. This is a complex evaluation.
Since we are looking for an algorithm on serial processing machines, minimum
finding is itself a search over all indices one-by-one, starting from the minimum index 1.
We can use the variable k introduced in step 3 of Algo 3. Extending Algo 3,
1.  s = i/p sequence, s' = i/p sequence
2.  i = 1                                     // i tracks the pass number
3.  Repeat (n-1 times)                        // because BS is of length n-1
4.      s = s'
5.      k = 1                                 // k tracks the position inside a pass
6.      Repeat (n-i+1 times)
7.          If ( k = F(s, k) ) then
8.              swap s(k) and s(k+1)
9.              s' = old s with the swap changes effected
10.         Increment k
11.     increment i
12. s' is the desired o/p sequence in increasingly sorted order

Algo 4.

The algorithm given below is the refined expression of the above Algo 4.

Bubble Sort Algorithm:- (Heavy elements push down approach)
For ( i = 1 to n-1 )                       ...(4)   f1 = n
    For ( k = 1 to n-i )                   ...(3)   f2 = n-i+1
        If ( ak > ak+1 )                   ...(1)   f3 = n-i
            Swap ak with ak+1              ...(2)   f4 <= n-i
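A direct C rendering of the refined algorithm above (a sketch, 0-indexed, so the pair checked is (a[k], a[k+1]) for k = 0 .. n-i-1).

void bubble_sort(int a[], int n)
{
    for (int i = 1; i <= n - 1; i++) {           /* pass i settles the i-th maximum */
        for (int k = 0; k < n - i; k++) {
            if (a[k] > a[k + 1]) {               /* adjacent derangement found */
                int t = a[k]; a[k] = a[k + 1]; a[k + 1] = t;
            }
        }
    }
}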

4.2.4 Space/Time Complexity Analysis:-
4.2.4.1 Space:-
Space(n) = Variable space ( Θ(1) ) + StackDepth ( Θ(1) ) = Θ(1)

4.2.4.2 Time:-
4.2.4.2.1 Worst Case Analysis:- is when the step labeled (2) executes up to
its upper bound.
T(n) = Σ (i=1 to n) Σ (j=1 to i) ( Θ(1) + Θ(1) ) = Θ( n(n+1)/2 ) = Θ(n²)
This is the case when the input sequence comes in reverse sorted order.

4.2.4.2.2 Best Case Analysis:- is when the input sequence comes in the
requisite sorted order, so that the step labeled (2) executes none of the times.
T(n) = Σ (i=1 to n) Σ (j=1 to i) ( Θ(1) ) = Θ( n(n+1)/2 ) = Θ(n²)

4.2.4.2.3 Average Case Analysis:- same as that of Selection sort, due to the
fact that the best and worst case complexities remain stationary.
As a result of the above analysis, if T(n) is the time complexity of the sort then
Ω(n²) <= T(n) <= O(n²) implies T(n) = Θ(n²)
4.2.5 Discussion:-
The discussion here goes along the same lines as that for Selection sort.
There are a few changes, however. Owing to the fact that the ith element
finds its way through the sequence by progressive swaps, the number of
swaps is a lot more than in Selection sort. As observed, the number of swaps
here obeys
0 <= #swaps <= n(n-1)/2 = O(n²)
The difference is not only in the number of swaps. The number of swaps here depends on
how bad a permutation has been received. In the best case, #swaps = 0.
Tracking #swaps from pass to pass can surely be utilized to decide whether
the next pass is even necessary. If in the ith pass #swaps = 0, then this directly
implies that the array is already sorted, since the highest i-1 statistics
have been repositioned properly and #swaps = 0 for the remaining n-i elements.
This provides a way to optimize the sort to respond to the best case
and to nearly-best-case instances.
Tracking #swaps for a pass requires Θ(1) extra space for the tracking variable
and adds only Θ(1) to the time complexity per pass to set the tracking variable if
required. The extra condition of checking the tracking variable for zero, before
initiating the next pass, requires Θ(1) time per pass.
All of the above analysis implies that the addition to the space and run time
complexity is only a constant, and the lower bound, i.e. the best case time
complexity, reduces to Θ(n). Thus, the modified bubble sort would have the
following run time analysis.
As a result of the above analysis, if T(n) is the time complexity of the sort then
Ω(n) <= T(n) <= O(n²)

The improvement was possible because the strategy of checking pair-wise
adjacent elements for derangement is exactly also the strategy for checking whether
the sequence is in sorted order. This sort can be said to be sensitive to the
sequence of elements. This has helped optimize the sort, which is not the case with
the Selection sort discussed earlier. Also, notice that, to optimize the algorithm,
not much change is done to the basic strategy of the sort.
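A C sketch of the modification discussed above (my own rendering, not a prescribed implementation): a Θ(1) flag records whether the current pass performed any swap; if it did not, the remaining passes are skipped, which is what gives the Ω(n) best case.

#include <stdbool.h>

void bubble_sort_early_exit(int a[], int n)
{
    for (int i = 1; i <= n - 1; i++) {
        bool swapped = false;                    /* the Θ(1) tracking variable */
        for (int k = 0; k < n - i; k++) {
            if (a[k] > a[k + 1]) {
                int t = a[k]; a[k] = a[k + 1]; a[k + 1] = t;
                swapped = true;
            }
        }
        if (!swapped)                            /* zero swaps in this pass: already sorted */
            break;
    }
}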
