Introduction to Algorithms
Sorting in Linear Time
CSE 680
Prof. Roger Crawfis

Comparison Sorting Review

- Insertion sort:
  - Pro's:
    - Easy to code
    - Fast on small inputs (less than ~50 elements)
    - Fast on nearly-sorted inputs
  - Con's:
    - O(n²) worst case
    - O(n²) average case
    - O(n²) reverse-sorted case
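A minimal Python sketch of insertion sort, for reference; the function name and list-based interface are illustrative, not from the slides.

  def insertion_sort(a):
      # In-place insertion sort: O(n^2) worst/average case, fast on small or nearly-sorted input.
      for i in range(1, len(a)):
          key = a[i]
          j = i - 1
          # Shift elements larger than key one slot to the right.
          while j >= 0 and a[j] > key:
              a[j + 1] = a[j]
              j -= 1
          a[j + 1] = key
      return a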

Comparison Sorting Review

- Merge sort:
  - Divide-and-conquer:
    - Split array in half
    - Recursively sort sub-arrays
    - Linear-time merge step
  - Pro's:
    - O(n lg n) worst case - asymptotically optimal for comparison sorts
  - Con's:
    - Doesn't sort in place

Comparison Sorting Review

- Heap sort:
  - Uses the very useful heap data structure
    - Complete binary tree
    - Heap property: parent key > children's keys
  - Pro's:
    - O(n lg n) worst case - asymptotically optimal for comparison sorts
    - Sorts in place
  - Con's:
    - Fair amount of shuffling memory around
Comparison Sorting Review

- Quick sort:
  - Divide-and-conquer:
    - Partition array into two sub-arrays, recursively sort
    - All of first sub-array < all of second sub-array
  - Pro's:
    - O(n lg n) average case
    - Sorts in place
    - Fast in practice (why?)
  - Con's:
    - O(n²) worst case
    - Naïve implementation: worst case on sorted input
    - Good partitioning makes this very unlikely.

Non-Comparison Based Sorting

- Many times we have restrictions on our keys
  - Deck of cards: Ace->King and four suits
  - Social Security Numbers
  - Employee IDs
- We will examine three algorithms which under certain conditions can run in O(n) time:
  - Counting sort
  - Radix sort
  - Bucket sort

Counting Sort

- Depends on an assumption about the numbers being sorted
- Assume numbers are in the range 1..k
- The algorithm:
  - Input: A[1..n], where A[j] ∈ {1, 2, 3, …, k}
  - Output: B[1..n], sorted (not sorted in place)
  - Also: Array C[1..k] for auxiliary storage

Counting Sort

  1  CountingSort(A, B, k)
  2    for i=1 to k
  3      C[i] = 0;                // the array C is called a histogram
  4    for j=1 to n
  5      C[A[j]] += 1;
  6    for i=2 to k
  7      C[i] = C[i] + C[i-1];
  8    for j=n downto 1
  9      B[C[A[j]]] = A[j];
  10     C[A[j]] -= 1;
Counting Sort Example

(worked example figure not reproduced here)

Counting Sort

  1  CountingSort(A, B, k)
  2    for i=1 to k              // takes time O(k)
  3      C[i] = 0;
  4    for j=1 to n              // takes time O(n)
  5      C[A[j]] += 1;
  6    for i=2 to k              // takes time O(k)
  7      C[i] = C[i] + C[i-1];
  8    for j=n downto 1          // takes time O(n)
  9      B[C[A[j]]] = A[j];
  10     C[A[j]] -= 1;

What is the running time?
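Before the answer on the next slide, here is a hedged Python rendering of the pseudocode above, using 0-based lists in place of the slides' 1-based arrays (the function name is illustrative):

  def counting_sort(A, k):
      # Stable sort of integer keys in 1..k: O(n + k) time, O(n + k) extra storage.
      n = len(A)
      C = [0] * (k + 1)                 # histogram; index 0 is unused
      for key in A:                     # O(n): count occurrences of each key
          C[key] += 1
      for i in range(2, k + 1):         # O(k): prefix sums = number of keys <= i
          C[i] += C[i - 1]
      B = [None] * n
      for j in range(n - 1, -1, -1):    # O(n): place elements back to front (keeps it stable)
          C[A[j]] -= 1                  # decrement first so C[A[j]] is a 0-based index
          B[C[A[j]]] = A[j]
      return B

For example, counting_sort([4, 1, 3, 4, 2], k=4) returns [1, 2, 3, 4, 4].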

Counting Sort

- Total time: O(n + k)
  - Works well if k = O(n) or k = O(1)
- This algorithm / implementation is stable.
  - A sorting algorithm is stable when numbers with the same values appear in the output array in the same order as they do in the input array.

Counting Sort

- Why don't we always use counting sort?
- Depends on the range k of the elements.
- Could we use counting sort to sort 32-bit integers? Why or why not?
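A quick illustration of the stability definition above; it uses Python's built-in sorted (which happens to be stable) purely to show what "equal keys keep their input order" means:

  records = [(3, 'first 3'), (1, 'only 1'), (3, 'second 3'), (2, 'only 2')]
  print(sorted(records, key=lambda r: r[0]))
  # -> [(1, 'only 1'), (2, 'only 2'), (3, 'first 3'), (3, 'second 3')]
  # The two records with key 3 keep their relative input order,
  # exactly as a stable sort (like CountingSort above) guarantees.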
Counting Sort Review

- Assumption: input taken from a small set of numbers of size k
- Basic idea:
  - Count the number of elements less than you for each element.
  - This gives the position of that number – similar to selection sort.
- Pro's:
  - Fast
  - Asymptotically fast - O(n+k)
  - Simple to code
- Con's:
  - Doesn't sort in place.
  - Elements must be integers (countable).
  - Requires O(n+k) extra storage.

Radix Sort

- How did IBM get rich originally?
- Answer: punched card readers for census tabulation in the early 1900's.
  - In particular, a card sorter that could sort cards into different bins
  - Each column can be punched in 12 places
  - Decimal digits use 10 places
- Problem: only one column can be sorted on at a time

Radix Sort

- Intuitively, you might sort on the most significant digit, then the second msd, etc.
- Problem: lots of intermediate piles of cards (read: scratch arrays) to keep track of
- Key idea: sort the least significant digit first

  RadixSort(A, d)
    for i=1 to d
      StableSort(A) on digit i

Radix Sort Example

(worked example figure not reproduced here)
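In place of the original example figure, here is a hedged Python sketch of RadixSort on decimal digits; Python's sorted() is stable, so it stands in for the StableSort in the pseudocode (digit extraction via integer division is an implementation choice):

  def radix_sort(A, d):
      # d stable passes, least significant digit first (digit 0 = ones place).
      for i in range(d):
          A = sorted(A, key=lambda x: (x // 10 ** i) % 10)
      return A

  print(radix_sort([329, 457, 657, 839, 436, 720, 355], d=3))
  # -> [329, 355, 436, 457, 657, 720, 839]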
Radix Sort Correctness

- Sketch of an inductive proof of correctness (induction on the number of passes):
  - Assume the lower-order digits {j: j < i} are sorted
  - Show that sorting on the next digit i leaves the array correctly sorted
    - If two digits at position i are different, ordering the numbers by that digit is correct (lower-order digits are irrelevant)
    - If they are the same, the numbers are already sorted on the lower-order digits. Since we use a stable sort, the numbers stay in the right order.

Radix Sort

- What sort is used to sort on digits?
- Counting sort is the obvious choice:
  - Sort n numbers on digits that range from 1..k
  - Time: O(n + k)
- Each pass over n numbers with d digits takes time O(n+k), so the total time is O(dn+dk)
  - When d is constant and k=O(n), this takes O(n) time

Radix Sort

- Problem: sort 1 million 64-bit numbers
  - Treat them as four-digit radix 2^16 numbers
  - Can sort in just four passes with radix sort! (see the sketch after the review below)
- Performs well compared to a typical O(n lg n) comparison sort
  - Approx lg(1,000,000) ≅ 20 comparisons per number being sorted

Radix Sort Review

- Assumption: input has d digits ranging from 0 to k
- Basic idea:
  - Sort elements by digit, starting with the least significant
  - Use a stable sort (like counting sort) for each stage
- Pro's:
  - Fast
  - Asymptotically fast (i.e., O(n) when d is constant and k=O(n))
  - Simple to code
  - A good choice
- Con's:
  - Doesn't sort in place
  - Not a good choice for floating point numbers or arbitrary strings.
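A hedged sketch of the 64-bit example above: treat each key as four 16-bit digits and run one stable counting-sort pass per digit (all names here are illustrative):

  def radix_sort_u64(A):
      # Sort unsigned 64-bit integers as four base-2**16 digits, least significant first.
      MASK, BITS = 0xFFFF, 16
      for p in range(4):                      # 64 bits / 16 bits per digit = 4 passes
          shift = p * BITS
          count = [0] * (MASK + 1)            # counting sort on the current digit: O(n + 2**16)
          for x in A:
              count[(x >> shift) & MASK] += 1
          for d in range(1, MASK + 1):        # prefix sums give final positions
              count[d] += count[d - 1]
          out = [0] * len(A)
          for x in reversed(A):               # back-to-front keeps each pass stable
              d = (x >> shift) & MASK
              count[d] -= 1
              out[count[d]] = x
          A = out
      return A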
Bucket Sort

Assumption: input elements are distributed uniformly over some known range, e.g., [0,1), so all elements in A are greater than or equal to 0 but less than 1. (Appendix C.2 has the definition of uniform distribution.)

  Bucket-Sort(A)
  1. n = length[A]
  2. for i = 1 to n
  3.    do insert A[i] into list B[floor(n*A[i])]
  4. for i = 0 to n-1
  5.    do sort list B[i] with Insertion-Sort
  6. Concatenate lists B[0], B[1], …, B[n-1]

Bucket Sort

  Bucket-Sort(A, x, y)
  1. divide interval [x, y) into n equal-sized subintervals (buckets)
  2. distribute the n input keys into the buckets
  3. sort the numbers in each bucket (e.g., with insertion sort)
  4. scan the (sorted) buckets in order and produce the output array

Running time of bucket sort: O(n) expected time
  Step 1: O(1) for each interval = O(n) time total.
  Step 2: O(n) time.
  Step 3: The expected number of elements in each bucket is O(1) (see the book for a formal argument, section 8.4), so the total is O(n).
  Step 4: O(n) time to scan the n buckets containing a total of n input elements.
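A hedged Python sketch of Bucket-Sort(A) for keys in [0, 1), following the pseudocode above; plain Python lists and the built-in list sort stand in for the slide's linked lists and Insertion-Sort:

  def bucket_sort(A):
      # Expected O(n) when the keys are uniformly distributed in [0, 1).
      n = len(A)
      B = [[] for _ in range(n)]         # n empty buckets
      for x in A:
          B[int(n * x)].append(x)        # bucket i holds keys in [i/n, (i+1)/n)
      out = []
      for bucket in B:
          bucket.sort()                  # each bucket holds O(1) elements in expectation
          out.extend(bucket)             # concatenate the sorted buckets in order
      return out

  print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
  # -> [0.12, 0.17, 0.21, 0.23, 0.26, 0.39, 0.68, 0.72, 0.78, 0.94]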

Bucket Sort Example

(worked example figure not reproduced here)

Bucket Sort Review

- Assumption: input is uniformly distributed across a range
- Basic idea:
  - Partition the range into a fixed number of buckets.
  - Toss each element into its appropriate bucket.
  - Sort each bucket.
- Pro's:
  - Fast
  - Asymptotically fast (i.e., O(n) when the distribution is uniform)
  - Simple to code
  - Good for a rough sort.
- Con's:
  - Doesn't sort in place
Summary of Linear Sorting

Non-Comparison Based Sorts

                 Running Time
                 worst-case     average-case   best-case      in place
  Counting Sort  O(n + k)       O(n + k)       O(n + k)       no
  Radix Sort     O(d(n + k'))   O(d(n + k'))   O(d(n + k'))   no
  Bucket Sort    --             O(n)           --             no

Counting sort assumes input elements are in the range [0,1,2,..,k] and uses array indexing to count the number of occurrences of each value.
Radix sort assumes each integer consists of d digits, and each digit is in the range [1,2,..,k'].
Bucket sort requires advance knowledge of the input distribution (it sorts n numbers uniformly distributed in a range in O(n) time).