SORTING

The Sorting Problem

• Input:
  – A sequence of n numbers a1, a2, . . . , an
• Output:
  – A permutation (reordering) a1', a2', . . . , an' of the input
    sequence such that a1' ≤ a2' ≤ · · · ≤ an'
Structure of data

Why Study Sorting Algorithms?
• We encounter a variety of situations in practice:
  – Do we have randomly ordered keys?
  – Are all keys distinct?
  – How large is the set of keys to be ordered?
  – Do we need guaranteed performance?

• Different algorithms are better suited to some of these
  situations than others.
Some Definitions
• Internal Sort
  – All the data to be sorted fits in the computer's main memory.
• External Sort
  – Some of the data to be sorted resides on a slower external device.
• In-Place Sort
  – The amount of extra space required to sort the data is constant,
    independent of the input size.
Stability
• A STABLE sort preserves the relative order of records with equal keys.
• Example (figure omitted): a file is first sorted on its first key,
  then sorted again on its second key. After the second sort, the
  records with second-key value 3 are no longer in order on the first
  key, so that second sort was not stable.
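To make stability concrete, here is a small sketch (not from the slides) using Python's built-in `sorted`, which is documented to be stable: records that compare equal on the sort key keep their original relative order. The record values are made up for illustration.

```python
# Records as (first key, second key) pairs; "b" and "c" tie on the
# second key with value 3.
records = [("b", 3), ("a", 1), ("c", 3), ("d", 2)]

# Sort on the second field only. Because Python's sort is stable,
# ("b", 3) stays ahead of ("c", 3), preserving their input order.
by_second = sorted(records, key=lambda r: r[1])
print(by_second)  # [('a', 1), ('d', 2), ('b', 3), ('c', 3)]
```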
INSERTION SORT
• Idea: like sorting a hand of playing cards
  – Start with an empty left hand and the cards face down on the table.
  – Remove one card at a time from the table and insert it into the
    correct position in the left hand.
    • Compare it with each of the cards already in the hand, from
      right to left.
  – The cards held in the left hand are always sorted.
    • These cards were originally the top cards of the pile on the table.
Insertion Sort
• Input array: 5 2 4 6 1 3
• At each iteration, the array is divided into two sub-arrays:
  a sorted left sub-array and an unsorted right sub-array.
INSERTION-SORT
INSERTION-SORT(A)
1. for j ← 2 to n
2.     do key ← A[j]
3.        ⊳ Insert A[j] into the sorted sequence A[1 . . j-1]
4.        i ← j - 1
5.        while i > 0 and A[i] > key
6.            do A[i+1] ← A[i]
7.               i ← i - 1
8.        A[i+1] ← key

• Insertion sort sorts the elements in place.
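The pseudocode above can be sketched directly in Python. This is a 0-indexed translation (the slides use 1-indexed arrays), offered as an illustration rather than the lecture's own code:

```python
def insertion_sort(a):
    """In-place insertion sort; 0-indexed version of the slides'
    1-indexed pseudocode."""
    for j in range(1, len(a)):        # for j <- 2 to n
        key = a[j]
        i = j - 1
        # Shift larger elements of the sorted prefix one slot right.
        while i >= 0 and a[i] > key:  # while i > 0 and A[i] > key
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key                # drop key into the gap
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```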
Loop Invariant for Insertion Sort
Alg.: INSERTION-SORT(A)
    for j ← 2 to n
        do key ← A[j]
           ⊳ Insert A[j] into the sorted sequence A[1 . . j-1]
           i ← j - 1
           while i > 0 and A[i] > key
               do A[i+1] ← A[i]
                  i ← i - 1
           A[i+1] ← key

Invariant: at the start of each iteration of the for loop, the elements
in A[1 . . j-1] are in sorted order.
Proving Loop Invariants
• Proving loop invariants works like induction
• Initialization (base case):
– It is true prior to the first iteration of the loop
• Maintenance (inductive step):
– If it is true before an iteration of the loop, it remains true before the
next iteration
• Termination:
– When the loop terminates, the invariant gives us a useful property
that helps show that the algorithm is correct
– Stop the induction when the loop terminates

Loop Invariant for Insertion Sort

• Initialization:
  – Just before the first iteration, j = 2: the subarray
    A[1 . . j-1] = A[1] (the single element originally in A[1])
    is trivially sorted.
Loop Invariant for Insertion Sort
• Maintenance:
  – The inner while loop moves A[j-1], A[j-2], A[j-3], and so on,
    one position to the right until the proper position for key
    (which holds the value that started out in A[j]) is found.
  – At that point, the value of key is placed into this position.
Loop Invariant for Insertion Sort
• Termination:
  – The outer for loop ends when j = n + 1, i.e., j - 1 = n.
  – Substituting n for j - 1 in the loop invariant:
    • the subarray A[1 . . n] consists of the elements originally in
      A[1 . . n], but in sorted order
  – The entire array is sorted!

Invariant: at the start of each iteration of the for loop, the elements
in A[1 . . j-1] are in sorted order.
Analysis of Insertion Sort
INSERTION-SORT(A)                                          cost  times
1. for j ← 2 to n                                          c1    n
2.     do key ← A[j]                                       c2    n-1
3.        ⊳ Insert A[j] into the sorted sequence A[1..j-1] 0     n-1
4.        i ← j - 1                                        c4    n-1
5.        while i > 0 and A[i] > key                       c5    Σ_{j=2}^{n} t_j
6.            do A[i+1] ← A[i]                             c6    Σ_{j=2}^{n} (t_j - 1)
7.               i ← i - 1                                 c7    Σ_{j=2}^{n} (t_j - 1)
8.        A[i+1] ← key                                     c8    n-1

t_j: number of times the while loop test is executed at iteration j

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·Σ_{j=2}^{n} t_j
       + c6·Σ_{j=2}^{n} (t_j - 1) + c7·Σ_{j=2}^{n} (t_j - 1) + c8(n-1)
Best Case Analysis
• The array is already sorted: in the test "while i > 0 and A[i] > key",
  A[i] ≤ key holds the first time the while loop test is run
  (when i = j - 1), so t_j = 1 for every j.

• T(n) = c1·n + c2(n-1) + c4(n-1) + c5(n-1) + c8(n-1)
       = (c1 + c2 + c4 + c5 + c8)·n - (c2 + c4 + c5 + c8)
       = an + b = Θ(n)
Worst Case Analysis
• The array is in reverse sorted order: in the test
  "while i > 0 and A[i] > key", A[i] > key always holds, so key must be
  compared with all j-1 elements to the left of the j-th position
  ⇒ t_j = j.

Using Σ_{j=1}^{n} j = n(n+1)/2, we have Σ_{j=2}^{n} j = n(n+1)/2 - 1
and Σ_{j=2}^{n} (j-1) = n(n-1)/2, so:

T(n) = c1·n + c2(n-1) + c4(n-1) + c5·(n(n+1)/2 - 1)
       + c6·n(n-1)/2 + c7·n(n-1)/2 + c8(n-1)
     = an² + bn + c, a quadratic function of n

• T(n) = Θ(n²): order of growth n²
Comparisons and Exchanges in Insertion Sort

INSERTION-SORT(A)                                          cost  times
for j ← 2 to n                                             c1    n
    do key ← A[j]                                          c2    n-1
       ⊳ Insert A[j] into the sorted sequence A[1..j-1]    0     n-1
       i ← j - 1                                           c4    n-1
       while i > 0 and A[i] > key   ≈ n²/2 comparisons     c5    Σ_{j=2}^{n} t_j
           do A[i+1] ← A[i]         ≈ n²/2 exchanges       c6    Σ_{j=2}^{n} (t_j - 1)
              i ← i - 1                                    c7    Σ_{j=2}^{n} (t_j - 1)
       A[i+1] ← key                                        c8    n-1
Insertion Sort - Summary
• Advantages
  – Good running time for "almost sorted" arrays: Θ(n)
• Disadvantages
  – Θ(n²) running time in the worst and average case
  – ≈ n²/2 comparisons and exchanges
BUBBLE SORT
• Idea:
  – Repeatedly pass through the array
  – Swap adjacent elements that are out of order

  Example array: 8 4 6 9 2 3 1 (i marks the current pass; j marks
  the position being compared)

• Easier to implement, but slower than insertion sort
Example
Pass i = 1 bubbles the smallest element (1) to the front, one adjacent
swap at a time as j sweeps from the right end down to i + 1:
  8 4 6 9 2 3 1
  8 4 6 9 2 1 3
  8 4 6 9 1 2 3
  8 4 6 1 9 2 3
  8 4 1 6 9 2 3
  8 1 4 6 9 2 3
  1 8 4 6 9 2 3

Array at the start of each subsequent pass:
  i = 2: 1 8 4 6 9 2 3
  i = 3: 1 2 8 4 6 9 3
  i = 4: 1 2 3 8 4 6 9
  i = 5: 1 2 3 4 8 6 9
  i = 6: 1 2 3 4 6 8 9
  i = 7: 1 2 3 4 6 8 9
BUBBLESORT(A)
1. for i ← 1 to length[A]
2.     do for j ← length[A] downto i + 1
3.         do if A[j] < A[j-1]
4.             then exchange A[j] ↔ A[j-1]
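A 0-indexed Python sketch of the pseudocode above (an illustration under the same pass structure, not the lecture's own code): on pass i, the inner loop sweeps from the right end down to position i + 1, bubbling the smallest remaining element into slot i.

```python
def bubble_sort(a):
    """In-place bubble sort matching the slides' pseudocode
    (0-indexed): inner loop runs right-to-left each pass."""
    n = len(a)
    for i in range(n):                 # for i <- 1 to length[A]
        for j in range(n - 1, i, -1):  # for j <- length[A] downto i+1
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]  # swap out-of-order pair
    return a

print(bubble_sort([8, 4, 6, 9, 2, 3, 1]))  # [1, 2, 3, 4, 6, 8, 9]
```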
Bubble-Sort Running Time
BUBBLESORT(A)                                   cost
for i ← 1 to length[A]                          c1
    do for j ← length[A] downto i + 1           c2
        do if A[j] < A[j-1]                     c3
            then exchange A[j] ↔ A[j-1]         c4

Comparisons: ≈ n²/2       Exchanges: ≈ n²/2

T(n) = c1(n+1) + c2·Σ_{i=1}^{n} (n-i+1) + c3·Σ_{i=1}^{n} (n-i)
       + c4·Σ_{i=1}^{n} (n-i)
     = Θ(n) + (c2 + c3 + c4)·Σ_{i=1}^{n} (n-i)

where Σ_{i=1}^{n} (n-i) = Σ_{i=1}^{n} n - Σ_{i=1}^{n} i
                        = n² - n(n+1)/2 = n²/2 - n/2

Thus, T(n) = Θ(n²)
SELECTION SORT
• Idea:
– Find the smallest element in the array
– Exchange it with the element in the first position
– Find the second smallest element and exchange it
with the element in the second position
– Continue until the array is sorted
• Disadvantage:
– Running time depends only slightly on the amount
of order in the file

Example
Array after each selection (smallest remaining element swapped into place):
  8 4 6 9 2 3 1
  1 4 6 9 2 3 8
  1 2 6 9 4 3 8
  1 2 3 9 4 6 8
  1 2 3 4 9 6 8
  1 2 3 4 6 9 8
  1 2 3 4 6 8 9
  1 2 3 4 6 8 9
SELECTION-SORT(A)
1. n ← length[A]
2. for j ← 1 to n - 1
3.     do smallest ← j
4.        for i ← j + 1 to n
5.            do if A[i] < A[smallest]
6.                then smallest ← i
7.        exchange A[j] ↔ A[smallest]
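The pseudocode above can be sketched in Python as follows (0-indexed illustration, not the lecture's own code). Note there is exactly one exchange per outer iteration, which is why selection sort makes only about n exchanges despite its n²/2 comparisons.

```python
def selection_sort(a):
    """In-place selection sort: find the minimum of the unsorted
    suffix and swap it into position j (0-indexed version of the
    slides' pseudocode)."""
    n = len(a)
    for j in range(n - 1):
        smallest = j
        for i in range(j + 1, n):       # scan the unsorted suffix
            if a[i] < a[smallest]:
                smallest = i
        a[j], a[smallest] = a[smallest], a[j]  # one exchange per pass
    return a

print(selection_sort([8, 4, 6, 9, 2, 3, 1]))  # [1, 2, 3, 4, 6, 8, 9]
```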
Analysis of Selection Sort
SELECTION-SORT(A)                              cost  times
n ← length[A]                                  c1    1
for j ← 1 to n - 1                             c2    n
    do smallest ← j                            c3    n-1
       for i ← j + 1 to n                      c4    Σ_{j=1}^{n-1} (n-j+1)
           do if A[i] < A[smallest]            c5    Σ_{j=1}^{n-1} (n-j)
               then smallest ← i               c6    Σ_{j=1}^{n-1} (n-j)
    exchange A[j] ↔ A[smallest]                c7    n-1

≈ n²/2 comparisons, n exchanges

T(n) = c1 + c2·n + c3(n-1) + c4·Σ_{j=1}^{n-1} (n-j+1)
       + c5·Σ_{j=1}^{n-1} (n-j) + c6·Σ_{j=1}^{n-1} (n-j) + c7(n-1)
     = Θ(n²)

• Worst-case time complexity (big-O): O(n²)
• Best-case time complexity (big-Ω): Ω(n²)
• Average time complexity (big-Θ): Θ(n²)
• Space complexity: O(1)
MERGE SORT
It uses the divide-and-conquer strategy.

First, divide the array into two equal halves.

Then, sort the two subarrays recursively and merge them to form the
solution to the whole problem.
MergeSort(A, p, r)
1. if p >= r
2.     return
3. q = (p+r)/2        // integer division
4. MergeSort(A, p, q)
5. MergeSort(A, q+1, r)
6. Merge(A, p, q, r)
Merge(A, p, q, r)
1.  create L ← A[p..q] and M ← A[q+1..r]
2.  n1 = q - p + 1; n2 = r - q
3.  i = 0; j = 0; k = p
4.  while (i < n1 and j < n2)
5.  {
6.      if (L[i] <= M[j])
7.      {
8.          A[k] = L[i]
9.          i = i + 1
10.     }
11.     else
12.     {
13.         A[k] = M[j]
14.         j = j + 1
15.     }
16.     k = k + 1
17. }
18. while (i < n1)
19. {
20.     A[k] = L[i]
21.     i = i + 1; k = k + 1
22. }
23. while (j < n2)
24. {
25.     A[k] = M[j]
26.     j = j + 1; k = k + 1
27. }
• Worst-case time complexity (big-O): O(n log n)
• Best-case time complexity (big-Ω): Ω(n log n)
• Average time complexity (big-Θ): Θ(n log n)
• Space complexity: O(n)
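The MergeSort/Merge pseudocode can be sketched in Python like this (a 0-indexed illustration with inclusive bounds, following the same structure as the pseudocode):

```python
def merge_sort(a, p=0, r=None):
    """Top-down merge sort on a[p..r] inclusive (0-indexed sketch
    of the MergeSort/Merge pseudocode)."""
    if r is None:
        r = len(a) - 1
    if p >= r:                       # zero or one element: sorted
        return a
    q = (p + r) // 2
    merge_sort(a, p, q)
    merge_sort(a, q + 1, r)
    # Merge the sorted halves A[p..q] and A[q+1..r].
    left, right = a[p:q + 1], a[q + 1:r + 1]
    i = j = 0
    k = p
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the merge stable
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1
        k += 1
    while i < len(left):             # copy any leftover from left half
        a[k] = left[i]; i += 1; k += 1
    while j < len(right):            # copy any leftover from right half
        a[k] = right[j]; j += 1; k += 1
    return a

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```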
QUICKSORT
/* low --> starting index, high --> ending index */
QuickSort(arr[], low, high)
1. if (low < high)
2. {
3.     // pi is the partitioning index; arr[pi] is now at its right place
4.     pi = Partition(arr, low, high)

5.     QuickSort(arr, low, pi - 1)   // before pi
6.     QuickSort(arr, pi + 1, high)  // after pi
7. }
Partition(arr[], low, high)
1.  // pivot (element to be placed at its right position)
2.  pivot = arr[high]
3.  i = (low - 1)  // index of smaller element
4.  for j = low to high - 1
5.  {
6.      // if current element is smaller than or equal to pivot
7.      if (arr[j] <= pivot)
8.      {
9.          i++  // increment index of smaller element
10.         swap arr[i] and arr[j]
11.     }
12. }
13. swap arr[i + 1] and arr[high]
14. return (i + 1)
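A runnable Python sketch of the QuickSort/Partition pseudocode above (an illustration of the same Lomuto-style partition, where the last element serves as the pivot):

```python
def partition(arr, low, high):
    """Lomuto partition: place arr[high] (the pivot) at its final
    position and return that index."""
    pivot = arr[high]
    i = low - 1                       # boundary of the <= pivot region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quicksort(arr, low=0, high=None):
    """Sort arr[low..high] in place, recursing on the two sides
    of the partitioning index."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        pi = partition(arr, low, high)
        quicksort(arr, low, pi - 1)   # before pi
        quicksort(arr, pi + 1, high)  # after pi
    return arr

print(quicksort([10, 80, 30, 90, 40, 50, 70]))  # [10, 30, 40, 50, 70, 80, 90]
```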
HEAPSORT
• Combines the better attributes of merge sort
and insertion sort.
– Like merge sort, but unlike insertion sort, running
time is O(n lg n).
– Like insertion sort, but unlike merge sort, sorts in
place.
• Introduces an algorithm design technique
– Create data structure (heap) to manage
information during the execution of an algorithm.

Comp 122
Binary Heap
• Array viewed as a nearly complete binary tree.
  – Physically: a linear array.
  – Logically: a binary tree, filled on all levels (except possibly
    the lowest, which is filled from the left).
• Map from array elements to tree nodes and vice versa:
  – Root: A[1]
  – Left(i): A[2i]
  – Right(i): A[2i+1]
  – Parent(i): A[⌊i/2⌋]
• length[A]: number of elements in array A.
• heap-size[A]: number of elements of the heap stored in A.
  – heap-size[A] ≤ length[A]
Heap Property (Max and Min)
• Max-Heap
  – For every node i other than the root, the value is at most that
    of its parent: A[parent[i]] ≥ A[i]
  • The largest element is stored at the root.
  • In any subtree, no value is larger than the value stored at the
    subtree's root.
• Min-Heap
  – For every node i other than the root, the value is at least that
    of its parent: A[parent[i]] ≤ A[i]
  • The smallest element is stored at the root.
  • In any subtree, no value is smaller than the value stored at the
    subtree's root.
Heaps – Example
Max-heap as an array:
  index: 1  2  3  4  5  6  7  8  9  10
  value: 26 24 20 18 17 19 13 12 14 11

Max-heap as a binary tree (last row filled from left to right):

            26
          /    \
        24      20
       /  \    /  \
     18    17 19   13
    /  \   /
  12   14 11
HeapSort(A)

1. BuildMaxHeap(A)
2. for i ← length[A] downto 2
3.     do exchange A[1] ↔ A[i]
4.        heap-size[A] ← heap-size[A] - 1
5.        MaxHeapify(A, 1)
Building a heap
• Use MaxHeapify to convert an array A into a max-heap.
• How?
• Call MaxHeapify on each element in a bottom-up manner.

BuildMaxHeap(A)
1. heap-size[A] ← length[A]
2. for i ← ⌊length[A]/2⌋ downto 1
3.     do MaxHeapify(A, i)
MaxHeapify(A, i)
1.  l ← Left(i)
2.  r ← Right(i)
3.  if l ≤ heap-size[A] and A[l] > A[i]
4.      then largest ← l
5.      else largest ← i
6.  if r ≤ heap-size[A] and A[r] > A[largest]
7.      then largest ← r
8.  if largest ≠ i
9.      then exchange A[i] ↔ A[largest]
10.          MaxHeapify(A, largest)

Assumption: the subtrees rooted at Left(i) and Right(i) are already
max-heaps.
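The three heap procedures translate to Python as follows (a 0-indexed sketch of the slides' pseudocode; heap-size is passed explicitly rather than stored in the array):

```python
def max_heapify(a, i, heap_size):
    """Sift a[i] down until the subtree rooted at i is a max-heap,
    assuming the subtrees at its children already are."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    # Leaves are trivial max-heaps; fix internal nodes bottom-up.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))

def heapsort(a):
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move current max to its final slot
        max_heapify(a, 0, end)        # re-heapify the shrunken heap
    return a

print(heapsort([26, 24, 20, 18, 17, 19, 13, 12, 14, 11]))
```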
Running Time of BuildMaxHeap
• Loose upper bound:
  – Cost of one MaxHeapify call × number of calls to MaxHeapify
  – O(lg n) × O(n) = O(n lg n)
• Tighter bound:
  – The cost of a call to MaxHeapify at a node depends on the height
    h of the node: O(h).
  – Most nodes have small height.
  – The height h of nodes ranges from 0 to ⌊lg n⌋.
  – The number of nodes of height h is at most ⌈n/2^(h+1)⌉.
Running Time of BuildMaxHeap
Tighter bound for T(BuildMaxHeap):

T(BuildMaxHeap) ≤ Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · O(h)
                = O( n · Σ_{h=0}^{⌊lg n⌋} h/2^h )
                = O( n · Σ_{h=0}^{∞} h/2^h )

Using Σ_{h=0}^{∞} h·x^h = x/(1-x)² with x = 1/2 (equation A.8):

  Σ_{h=0}^{∞} h/2^h = (1/2) / (1 - 1/2)² = 2

Therefore T(BuildMaxHeap) = O(n).

A heap can be built from an unordered array in linear time.
Running Time for MaxHeapify
MaxHeapify(A, i)
1.  l ← Left(i)
2.  r ← Right(i)
3.  if l ≤ heap-size[A] and A[l] > A[i]
4.      then largest ← l
5.      else largest ← i
6.  if r ≤ heap-size[A] and A[r] > A[largest]
7.      then largest ← r
8.  if largest ≠ i
9.      then exchange A[i] ↔ A[largest]
10.          MaxHeapify(A, largest)

Time to fix node i and its children = Θ(1),
PLUS time to fix the subtree rooted at one of i's children
= T(size of subtree at largest).
Running Time for MaxHeapify(A, n)
• T(n) = T(largest) + Θ(1)
• The size of the subtree at largest is ≤ 2n/3 (the worst case occurs
  when the last row of the tree is exactly half full).
• T(n) ≤ T(2n/3) + Θ(1) ⇒ T(n) = O(lg n)
• Alternately, MaxHeapify takes O(h), where h is the height of the
  node on which MaxHeapify is applied.
Heapsort – Example
(Figure: the max-heap 26 24 20 18 17 19 13 12 14 11, shown both as an
array with indices 1-10 and as a binary tree, is sorted by repeatedly
exchanging the root with the last heap element and re-heapifying.)
Algorithm Analysis
• In place

• Not stable

• BuildMaxHeap takes O(n), and each of the n-1 calls to MaxHeapify
  takes O(lg n).

• Therefore, T(n) = O(n lg n).
Heap Procedures for Sorting
• MaxHeapify: O(lg n)
• BuildMaxHeap: O(n)
• HeapSort: O(n lg n)
