Merge Sort: Divide and Conquer
1. INTRODUCTION
A simple solution is to initialize the skyline (the result) as empty and then add the
buildings to it one by one. A building is added by first finding the overlapping strip(s). If
there are no overlapping strips, the new building adds new strip(s); if an overlapping
strip is found, the height of the existing strip may increase. The time complexity of this
solution is O(n^2).
We can find the skyline in Θ(n log n) time using divide and conquer. The idea is similar
to merge sort: divide the given set of buildings into two subsets, recursively construct the
skyline for each half, and finally merge the two skylines.
2. Searching, Sorting, and Divide-and-Conquer
The greedy approach and dynamic programming are two main algorithm design
principles. They have in common that they extend solutions from smaller sub-instances
incrementally to larger sub-instances, up to the full instance. The third main design
principle still follows the pattern of reducing a given problem to smaller instances of
itself, but it makes jumps rather than incremental steps. Divide: a problem instance is
split into a few significantly smaller instances, which are solved independently. Conquer:
these partial solutions are combined appropriately to get a solution of the given
instance. Sub-instances are solved in the same way; thus we end up with a recursive
algorithm. A certain difficulty is the time analysis. While we can determine the time
complexity of greedy and dynamic programming algorithms basically by counting
operations in loops and summations, this is not so simple for divide-and-conquer
algorithms, due to their recursive structure. We will need a special technique for time
analysis: solving recurrences. Luckily, a special type of recurrence covers almost all
usual divide-and-conquer algorithms. We will solve these recurrences once and for all,
and then we can just apply the results. This way, no difficult mathematical analysis will
be needed for every new algorithm. Among the elementary algorithm design techniques,
dynamic programming is perhaps the most useful and versatile one. Divide-and-conquer
has, in general, far fewer applications, but it is of central importance for searching and
sorting problems.
3. Divide-and-Conquer
Binary search compares the key x with the central element of the sorted array, recurses
into the half that can still contain x, and returns the solution from this half, i.e., the
position of x or the information that x is not present. Although it was very easy to see the
time bound O(log n) directly, we also show how this algorithm would be analyzed in the
general context of divide-and-conquer algorithms. (Recall that binary search serves here
only as an introductory example.) Let us pretend that, in the beginning, we have no clue
what the time complexity could be. Then we may define a function T(n) as the time
complexity of our algorithm and try to figure out this function. What do we know about
T from the algorithm? We started from an instance of size n. Then we identified one
instance of half size, after O(1) operations (computing the index of the central element,
and one comparison). Hence our T fulfills this recurrence: T(n) = T(n/2) + O(1). Verify
that T(n) = O(log n) is in fact a solution. We will show later in more generality how to
solve such recurrences.
4. Solving the Skyline Problem
A more substantial example is the Skyline Problem. Since this problem is
formulated in a geometric language, we first have to think about the representation of
geometric data in the computer before we can discuss any algorithmic issues. In which
form should the input data be given, and how shall we describe the output? Our
rectangles can be specified by three real numbers: the coordinates of the left and right
ends, and the height. It is natural to represent the skyline as a list of heights, ordered
from left to right, also mentioning the coordinates where the heights change. A
straightforward algorithm would start with a single rectangle, insert the other rectangles
one by one into the picture, and update the skyline. Since the jth rectangle may obscure
up to j − 1 lines in the skyline formed by the first j − 1 rectangles, updating the list needs
O(j) time in the worst case. This results in O(n^2) time in total. The weakness of the
obvious algorithm is that it uses linearly many update operations to insert only one new
rectangle. This is quite wasteful. The key observation for a faster algorithm is that
merging two arbitrary skylines is not much more expensive than inserting a single new
rectangle (in the worst case). This suggests a divide-and-conquer approach: Divide the
instance arbitrarily into two sets of roughly n/2 rectangles. Compute the skylines for
both subsets independently. Finally, combine the two skylines by scanning them from
left to right and keeping the higher horizontal line at each position. The details of this
conquer phase are not difficult; we skip them here. The conquer phase runs in O(n) time.
Hence the time complexity of the entire algorithm satisfies the recurrence
T(n) = 2T(n/2) + O(n). For the moment, believe that this recurrence has the solution
T(n) = O(n log n). (We may prove this by an ad-hoc argument, but soon we will do it
more systematically.) This is significantly faster than O(n^2). Again, the intuitive reason
for the much better time complexity is that we made better use of the same number O(n)
of update operations. We can also see this from the recurrence for our previous
algorithm, which is T(n) = T(n − 1) + O(n), with solution T(n) = O(n^2). We mention an
alternative algorithm. Sort all 2n left and right ends
according to their order on the horizontal line, then scan this sorted sequence of points,
and for every such point determine the largest height immediately to the right of the
point. (Again, the details are simple.) This construction needs O(n) time, when
implemented with some care. However, note that the
m + 1, and a few steps of calculation yield T(n) = O(a^m · m) = O(n^k log n). If a < b^k,
then only the mth term in the sum determines the result in O-notation:
T(n) = O(a^m (b^k / a)^m) = O((b^m)^k) = O(n^k). These three formulae are often called
the master theorem for recurrences. So far we have only considered the very special
arguments n = b^m. However, the O-results remain valid for general n, for the following
reason: the results are polynomially bounded functions. If we multiply the argument by a
constant, the function value changes by at most a constant factor as well. Every argument
n is at most a factor b away from the next larger power b^m. Hence, if we generously
bound T(n) by T(b^m), we incur only another constant factor. Most divide-and-conquer
algorithms lead to a recurrence settled by the master theorem. In other cases we have to
solve other types of recurrences. The approaches are similar, but sometimes the
calculations and bounding arguments may be more tricky.
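For reference, the three cases can be stated compactly. This is the standard statement of the master theorem for recurrences of the form treated above, with constants a ≥ 1, b > 1, k ≥ 0:

```latex
T(n) = a\,T(n/b) + O(n^k)
\quad\Longrightarrow\quad
T(n) =
\begin{cases}
O(n^k) & \text{if } a < b^k,\\[2pt]
O(n^k \log n) & \text{if } a = b^k,\\[2pt]
O(n^{\log_b a}) & \text{if } a > b^k.
\end{cases}
```

For the skyline recurrence T(n) = 2T(n/2) + O(n) we have a = 2, b = 2, k = 1, so a = b^k and the second case gives T(n) = O(n log n), as claimed earlier.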
In statistical investigations, the median is often better suited as a "typical" value
than the average, because it is robust against outliers. For example, the average wealth in
a population can rise when only a few people become extremely rich. Then the average
gives a biased picture of the wealth of the majority. This does not happen if we look at
the median: changing the median requires substantial changes in the wealth of many
people. More generally speaking, median values, or values with another fixed rank, are
often used as thresholds for the discretization of numerical data, because the sets of
values above/below these thresholds have known sizes. We remark that, once we know
the kth smallest element, we can also find the k smallest elements in another O(n) steps,
just by n − 1 comparisons to the rank-k element.
6. SKYLINE IN ONE DIMENSION
We take S to be a set of n points on the real line R^1. Thus, a point p in S has d = 1 range
attribute, x_1(p), and t ≥ 1 features, a_1(p), ..., a_t(p). To simplify notation, throughout
this section we will view the range attribute x_1(p) as equivalent to p and will generally
write "p" instead of "x_1(p)". The query is an interval q = [u_1, v_1] in range space. As
we shall see, our solution takes advantage of the fact that there is a linear ordering of the
points in the range space and q ∩ S is a set of points that are contiguous in this ordering.
Therefore, our approach works in a more general setting, where the points of S lie on a
curve in R^d (d ≥ 1) and the query, q, is a subsegment of the curve; the curve is our range
space, the ordering of the points along the curve is the desired linear ordering, and q ∩ S
is a subset of contiguous points in this ordering. An instance of this setting, for d = 2, is
where many houses are situated along a freeway and the buyer is interested in houses
lying between two points on the freeway. However, for simplicity, we will describe our
solution assuming that the range space is R^1. Key ideas: Assume that the points of S are
given in increasing order along R^1 as p_1, p_2, ..., p_n, and that there are two dummy
points p_0 = −∞ and p_{n+1} = ∞ for which all feature coordinates are set to −∞. For
each point p_i ∈ S, we define two other points, p_i^− and p_i^+, as follows: p_i^− is the
point of S ∪ {p_0, p_{n+1}} with the largest index less than i that dominates p_i in the
feature space. Symmetrically, p_i^+ is the point with the smallest index greater than i
that dominates p_i in the feature space. Clearly, due to how we defined the dummy
points, the points p_i^− and p_i^+ always exist. The crucial observation is that
p_i ∈ RangeSky(S, q) iff the following conditions are met: (a) p_i ∈ q, (b) p_i^− ∉ q,
and (c) p_i^+ ∉ q. Condition (a) is clear from the definition of RangeSky(S, q).
Conditions (b) and (c) follow from the definition of p_i^− and p_i^+. Thus,
p_i ∈ RangeSky(S, q) iff u_1 ≤ p_i ≤ v_1 and p_i^− < u_1 and p_i^+ > v_1, i.e., iff
u_1 ∈ (p_i^−, p_i] and v_1 ∈ [p_i, p_i^+), i.e., iff (u_1, v_1) ∈ (p_i^−, p_i] ×
[p_i, p_i^+). Thus, in R^2, the point q′ = (u_1, v_1), defined by the endpoints of the
query interval q = [u_1, v_1], must lie in the axis-parallel rectangle
R_i = (p_i^−, p_i] × [p_i, p_i^+) defined by p_i, p_i^−, and p_i^+. Thus, our problem
reduces to determining all rectangles R_i, corresponding to points p_i ∈ S, that are
stabbed by the query point q′ = (u_1, v_1). This is a standard intersection searching
problem on n rectangles and can be solved by employing a data structure due to
Chazelle [2] that uses O(n) space and has a query time of O(k + log n), where k is the
number of rectangles stabbed (hence the number of points in RangeSky(S, q)). Complete
details will appear in the full paper. Theorem 2.1. Let S be a set of n points on the real
line R^1, where each point has t features. S can be preprocessed in O(n log^t n) time into
a data structure of size O(n) such that for any query interval q = [u_1, v_1], the skyline of
the points of q ∩ S, computed w.r.t. their features, can be reported in time O(k + log n),
where k is the size of the skyline. This is optimal, w.r.t. space and query time, in the
decision tree model. Furthermore, the same bounds hold even if the points of S lie on a
curve, C, in R^d (d ≥ 1) and the query is specified as a subsegment of C.
7. HARDNESS OF RANGE-SKYLINE COUNTING
We consider a problem that is closely related to the range-skyline query problem.
This problem, called range-skyline counting, is to count the number of points in
RangeSky(S, q). We provide evidence for the likely computational difficulty of this
problem by showing a linear-time reduction from the set intersection problem, which is
conjectured to be difficult. Set Intersection Problem: Given sets S_1, S_2, ..., S_m of
positive reals, where Σ_{i=1}^{m} |S_i| = n, decide if S_i and S_j are disjoint, for query
indices i and j, i < j. It is conjectured that this problem is "hard". Specifically, in the
so-called cell-probe model without the floor function and where the maximum
cardinality of the sets is polylogarithmic in m, any algorithm to answer set intersection
queries in Õ(α) time requires Ω̃((n/α)^2) space, for 1 ≤ α ≤ n [3]. (The "tilde" notation
is used to suppress polylogarithmic factors.) Idea behind the reduction: Given sets
S_1, S_2, ..., S_m, we map their elements to a point set, T, with d = 2 range attributes
and t = 3 features. This mapping has the property that S_i and S_j are disjoint iff
|RangeSky(T, q_ij)| = |S_i| + |S_j|, where q_ij is a certain query rectangle determined by
S_i and S_j. (We define q_ij below and establish the stated property in Lemma 4.1.)
Specifically, let L1: y = x + n and L2: y = x − n be two lines in the plane (the range
space). We map each element of each set S_i to two points, one on L1 and one on L2, as
follows: Let n_0 = 0 and n_i = n_{i−1} + |S_i|, for 1 ≤ i ≤ m. Let s_ik be the kth element
of S_i. (S_i is unordered and s_ik is simply the kth element in an arbitrary ordering of the
elements of S_i.) The mapping of s_ik to points on L1 and L2 is given as follows:
s_ik ↦ s′_ik = (−(k + n_{i−1}), −(k + n_{i−1}) + n) on L1, and
s_ik ↦ s″_ik = ((k + n_{i−1}), (k + n_{i−1}) − n) on L2. The resulting set, T, consists of
2n points, and the above coordinates are the range attributes of the points of T. Observe
that the elements of each S_i map to sets of contiguous points on L1 (in the second
quadrant) and on L2 (in
the fourth quadrant). Moreover, it is easy to see that for any query indices i and j, where
i < j, there is a rectangle q_ij that contains exactly all the points s′_ik that S_i maps to on
L1 and all the points s″_jk that S_j maps to on L2. Specifically, q_ij is defined uniquely
by the first and last mapped points of S_i on L1 and the first and last mapped points of
S_j on L2. By storing with each set the first and last mapped point of that set on L1, and
similarly on L2, the rectangle q_ij can be computed in O(1) time for any pair of query
indices i and j, i < j. Next, each point of T is assigned t = 3 features. Specifically, for
each set S_i, the points s′_ik and s″_ik are both assigned the feature vector
(i, s_ik, −s_ik). This constitutes the entire reduction, and it is clear that it takes O(n)
time. The following lemma, whose proof is omitted due to space limitations, captures the
main property of the reduction. Lemma 4.1. Sets S_i and S_j (i < j) are disjoint iff
|RangeSky(T, q_ij)| = |S_i| + |S_j|. Clearly, deciding if |RangeSky(T, q_ij)| = |S_i| + |S_j|
is no more difficult than range-skyline counting for d = 2 and t = 3. Furthermore, the
latter is no more difficult than range-skyline counting for d > 2 and t > 3. This, together
with Lemma 4.1, establishes the following theorem. Theorem 4.1. The range-skyline
counting problem (for d ≥ 2 and t ≥ 3) is at least as hard as the set intersection problem.
Specifically, answering a range-skyline counting query in Õ(α) time requires
Ω̃((n/α)^2) space, for 1 ≤ α ≤ n, in the cell-probe model without the floor function.
8. SOURCE CODE
#include <algorithm> // for max()
#include <iostream>
using namespace std;
// A structure for building
struct Building {
// x coordinate of left side
int left;
// height
int ht;
// x coordinate of right side
int right;
};
// A strip in skyline
class Strip {
// x coordinate of left side
int left;
// height
int ht;
public:
Strip(int l = 0, int h = 0)
{
left = l;
ht = h;
}
friend class SkyLine;
};
// Skyline: To represent Output(An array of strips)
class SkyLine {
// Array of strips
Strip* arr;
// Capacity of strip array
int capacity;
// Actual number of strips in array
int n;
public:
~SkyLine() { delete[] arr; }
int count() { return n; }
// A function to merge another skyline
// to this skyline
SkyLine* Merge(SkyLine* other);
// Constructor
SkyLine(int cap)
{
capacity = cap;
arr = new Strip[cap];
n = 0;
}
// Function to add a strip 'st' to array
void append(Strip* st)
{
// Check for a redundant strip: a strip is redundant
// if it has the same height or left x as the previous one
if (n > 0 && arr[n - 1].ht == st->ht)
return;
if (n > 0 && arr[n - 1].left == st->left) {
arr[n - 1].ht = max(arr[n - 1].ht, st->ht);
return;
}
arr[n] = *st;
n++;
}
// A utility function to print all strips of
// skyline
void print()
{
for (int i = 0; i < n; i++) {
cout << " (" << arr[i].left << ", "
<< arr[i].ht << "), ";
}
}
};
// This function returns skyline for a
// given array of buildings arr[l..h].
// This function is similar to mergeSort().
SkyLine* findSkyline(Building arr[], int l, int h)
{
if (l == h) {
SkyLine* res = new SkyLine(2);
// append() copies the strip into its array, so stack-allocated
// strips suffice here (avoids leaking heap-allocated Strips)
Strip s1(arr[l].left, arr[l].ht);
res->append(&s1);
Strip s2(arr[l].right, 0);
res->append(&s2);
return res;
}
int mid = (l + h) / 2;
// Recur for left and right halves
// and merge the two results
SkyLine* sl = findSkyline(arr, l, mid);
SkyLine* sr = findSkyline(arr, mid + 1, h);
SkyLine* res = sl->Merge(sr);
// To avoid memory leak
delete sl;
delete sr;
// Return merged skyline
return res;
}
// Similar to merge() in MergeSort
// This function merges another skyline
// 'other' to the skyline for which it is called.
// The function returns pointer to the
// resultant skyline
SkyLine* SkyLine::Merge(SkyLine* other)
{
// Create a resultant skyline with
// capacity as sum of two skylines
SkyLine* res = new SkyLine(
this->n + other->n);
// To store current heights of two skylines
int h1 = 0, h2 = 0;
// Indexes of strips in two skylines
int i = 0, j = 0;
while (i < this->n && j < other->n) {
// Compare x coordinates of left sides of two
// skylines and put the smaller one in result
if (this->arr[i].left < other->arr[j].left) {
int x1 = this->arr[i].left;
h1 = this->arr[i].ht;
// Choose height as max of the two current heights;
// append() copies the strip, so a stack object suffices
Strip st1(x1, max(h1, h2));
res->append(&st1);
i++;
}
else {
int x2 = other->arr[j].left;
h2 = other->arr[j].ht;
Strip st2(x2, max(h1, h2));
res->append(&st2);
j++;
}
}
// If there are strips left in this
// skyline or other skyline
while (i < this->n) {
res->append(&arr[i]);
i++;
}
while (j < other->n) {
res->append(&other->arr[j]);
j++;
}
return res;
}
// Driver Function
int main()
{
Building arr[] = {
{ 1, 11, 5 }, { 2, 6, 7 }, { 3, 13, 9 }, { 12, 7, 16 },
{ 14, 3, 25 }, { 19, 18, 22 }, { 23, 13, 29 }, { 24, 4, 28 }
};
int n = sizeof(arr) / sizeof(arr[0]);
// Find skyline for given buildings
// and print the skyline
SkyLine* ptr = findSkyline(arr, 0, n - 1);
cout << " Skyline for given buildings is \n";
ptr->print();
delete ptr;
return 0;
}
9. OUTPUT
Skyline for given buildings is
(1, 11), (3, 13), (9, 0), (12, 7), (16, 3), (19, 18), (22, 3), (23, 13), (29, 0),
REFERENCES
[1] G. S. Brodal and K. Tsakalidis. Dynamic planar range maxima queries. In
Proc. 38th Intl. Conf. on Automata, Languages, and Programming, pages 256–267, 2011.
[2] B. Chazelle. Filtering search: A new approach to query-answering. SIAM J.
Computing, 15(3):703–724, 1986.
[3] P. Davoodi, M. Smid, and F. van Walderveen. Two-dimensional range
diameter queries. In Proc. Latin American Theoretical Informatics Symposium, pages
219–230, 2012.
[4] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf.
Computational Geometry: Algorithms and applications. Springer-Verlag, 2nd edition,
2000.
[5] D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive skyline computation in
database systems. ACM Trans. on Database Systems, 30(1):41–82, 2005.