Applications of Sorting
        n           n²/4      n lg n
       10             25          33
      100          2,500         664
    1,000        250,000       9,965
   10,000     25,000,000     132,877
  100,000  2,500,000,000   1,660,960
You might still get away with using a quadratic-time algorithm even if n =
10,000, but quadratic-time sorting is clearly ridiculous once n ≥ 100,000.
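A few lines of Python reproduce the rows of the table above (a sketch; the table truncates n lg n to an integer, and rounding accounts for any small difference in the last row):

```python
import math

# Compare quadratic-time work (n^2 / 4) against n lg n for growing n,
# one printed row per row of the table above.
for n in (10, 100, 1_000, 10_000, 100_000):
    quadratic = n * n // 4
    linearithmic = int(n * math.log2(n))
    print(f"{n:>8,} {quadratic:>15,} {linearithmic:>12,}")
```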
Many important problems can be reduced to sorting, so we can use our clever
O(n log n) algorithms to do work that might otherwise seem to require a quadratic
algorithm. An important algorithm design technique is to use sorting as a basic
building block, because many other problems become easy once a set of items is
sorted.
Consider the following applications:
• Closest pair – Given a set of n numbers, how do you find the pair of numbers
that have the smallest difference between them? Once the numbers are sorted,
the closest pair of numbers must lie next to each other somewhere in sorted
order. Thus, a linear-time scan through them completes the job, for a total
of O(n log n) time including the sorting.
Figure 4.1: The convex hull of a set of points (l), constructed by left-to-right insertion (r).
• Selection – What is the kth largest item in an array? If the keys are placed
in sorted order, the kth largest can be found in constant time by simply
looking at the kth position of the array. In particular, the median element
(see Section 14.3 (page 445)) appears in the (n/2)nd position in sorted order.
• Convex hulls – What is the polygon of smallest area that contains a given
set of n points in two dimensions? The convex hull is like a rubber band
stretched over the points in the plane and then released. It compresses to
just cover the points, as shown in Figure 4.1(l). The convex hull gives a nice
representation of the shape of the points and is an important building block
for more sophisticated geometric algorithms, as discussed in the catalog in
Section 17.2 (page 568).
But how can we use sorting to construct the convex hull? Once you have the
points sorted by x-coordinate, the points can be inserted from left to right
into the hull. Since the right-most point is always on the boundary, we know
that it will appear in the hull. Adding this new right-most point may cause
others to be deleted, but we can quickly identify these points because they lie
inside the polygon formed by adding the new point. See the example in Figure
4.1(r). These points will be neighbors of the previous point we inserted, so
they will be easy to find and delete. The total time is linear after the sorting
has been done.
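The first of these reductions, closest pair, is easy to sketch in code. The following Python function (my own sketch, not code from the text) sorts and then scans adjacent elements:

```python
def closest_pair(nums):
    """Return the pair of numbers with the smallest difference.

    After sorting (O(n log n)), the closest pair must be adjacent
    in sorted order, so a single linear scan finds it.
    """
    s = sorted(nums)
    best = min(range(len(s) - 1), key=lambda i: s[i + 1] - s[i])
    return s[best], s[best + 1]
```

For example, `closest_pair([10, 4, 7, 15])` returns `(4, 7)`, whose difference of 3 is the smallest over all pairs.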
While a few of these problems (namely median and selection) can be solved in
linear time using more sophisticated algorithms, sorting provides quick and easy
solutions to all of these problems. It is a rare application where the running
time of sorting proves to be the bottleneck.
Take-Home Lesson: Sorting lies at the heart of many algorithms. Sorting the
data is one of the first things any algorithm designer should try in the quest
for efficiency.
Problem: Give an efficient algorithm to determine whether two sets (of size m and
n, respectively) are disjoint. Analyze the worst-case complexity in terms of m and
n, considering the case where m is substantially smaller than n.
Solution: At least three algorithms come to mind, all of which are variants of
sorting and searching:
• First sort the big set – The big set can be sorted in O(n log n) time. We can
now do a binary search with each of the m elements of the small set, looking
to see if it exists in the big set. The total time will be O((n + m) log n).
• First sort the small set – The small set can be sorted in O(m log m) time. We
can now do a binary search with each of the n elements in the big set, looking
to see if it exists in the small one. The total time will be O((n + m) log m).
• Sort both sets – Observe that once the two sets are sorted, we no longer
have to do binary search to detect a common element. We can compare the
smallest elements of the two sorted sets, and discard the smaller one if they
are not identical. By repeating this idea recursively on the now smaller sets,
we can test for duplication in linear time after sorting. The total cost is
O(n log n + m log m + n + m).
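The third idea, repeatedly comparing and discarding the smallest remaining elements of the two sorted sets, can be sketched as follows (the function name and details are my own; an iterative two-pointer loop replaces the recursion described above, to the same effect):

```python
def disjoint_after_sorting(a, b):
    """Test whether two collections are disjoint by sorting both,
    then merging: compare the smallest remaining element of each
    and discard the smaller if they are not identical."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return False      # common element found
        if a[i] < b[j]:
            i += 1            # discard the smaller element
        else:
            j += 1
    return True               # one set exhausted with no match
```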
So, which of these is the fastest method? Clearly small-set sorting trumps
big-set sorting, since log m < log n when m < n. Similarly, (n + m) log m must be
asymptotically less than n log n, since n + m < 2n when m < n. Thus, sorting the
small set is the best of these options. Note that this is linear when m is constant
in size.
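The winning option, sorting the small set and binary searching with each element of the big one, might look like this in Python, using the standard library's bisect module (a sketch under my own naming):

```python
from bisect import bisect_left

def disjoint_small_sorted(small, big):
    """Sort the small set (O(m log m)), then binary search for each
    of the n big-set elements: O((n + m) log m) total time."""
    s = sorted(small)
    for x in big:
        i = bisect_left(s, x)     # leftmost position where x could sit
        if i < len(s) and s[i] == x:
            return False          # x appears in both sets
    return True
```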
Note that expected linear time can be achieved by hashing. Build a hash table
containing the elements of both sets, and verify that collisions in the same bucket
are in fact identical elements. In practice, this may be the best solution.
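With Python's built-in hash-based sets, this hashing approach is nearly a one-liner; the bucket-level equality checks mentioned above happen inside the set implementation (my own sketch):

```python
def disjoint_by_hashing(a, b):
    """Expected linear time: build a hash table from one collection
    and probe it with each element of the other."""
    table = set(a)                            # expected O(m) to build
    return not any(x in table for x in b)     # expected O(1) per probe
```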