
Covering a Set of Points

with a Minimum Number of Lines

Magdalene Grantson and Christos Levcopoulos

Department of Computer Science


Lund University, Box 118, 221 00 Lund, Sweden
{magdalene,christos}@cs.lth.se

Abstract. We consider the minimum line covering problem: given a
set S of n points in the plane, we want to find the smallest number l
of straight lines needed to cover all n points in S. We show that this
problem can be solved in O(n log l) time if l ∈ O(log^{1−ε} n), and that
this is optimal in the algebraic computation tree model (we show that
the Ω(n log l) lower bound holds for all values of l up to O(√n)). Furthermore,
an O(log l)-factor approximation can be found within the same
O(n log l) time bound if l ∈ O(n^{1/4}). For the case when l ∈ Ω(log n) we
suggest how to improve the time complexity of the exact algorithm by a
factor exponential in l.

1 Introduction
We consider the minimum line covering problem: given a set S of n points in the
plane, we want to find the smallest number l of straight lines needed to cover all
n points in S. The corresponding decision problem is: given a set S of n points
in the plane and an integer k, we want to know whether it is possible to find k
(or fewer) straight lines that cover all n points in S.
Langerman and Morin [6] showed that the decision problem can be solved in
O(nk + k^{2(k+1)}) time. In this paper we show that the decision problem can be
solved in O(n log k + (k/2.2)^{2k}) time.
Kumar et al. [5] showed that the minimum line covering problem is APX-hard.
That is, unless P = NP, there does not exist a (1 + ε)-approximation
algorithm. In their paper they pointed out that the greedy algorithm proposed
by Johnson [4], which approximates the set covering problem within a factor
of O(log n), is the best known approximation for the minimum line covering
problem. In this paper we show that an O(log l)-factor approximation for the
minimum line covering problem can be obtained in time O(n log l + l^4 log l).
We also present an algorithm that solves the line covering problem exactly
in O(n log l + (l/2.2)^{2l}) time. This simplifies to O(n log l) if l ∈ O(log^{1−ε} n), and
we show that this is optimal in the algebraic computation tree model. That is,
we show that the Ω(n log l) lower bound holds for all values of l up to O(√n).
We also suggest further asymptotic improvements for our exact algorithms when
l ∈ Ω(log n).

2 Preliminaries

Lemma 1. Any set S of n points in the plane can be covered with at most ⌈n/2⌉
straight lines.

Proof. A simple way to show this upper bound is to pick two points at a time,
to construct a line through the pair, and then to remove the pair from the set.
For the special case when n is odd, we can draw an arbitrary line through the
last point. The time complexity of this algorithm is obviously O(n).
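The pairing argument in this proof translates directly into code. The following Python sketch (function name and representation are ours, not from the paper) returns a cover of at most ⌈n/2⌉ lines, each line represented by a pair of points it passes through:

```python
def cover_with_halving(points):
    """Cover a point set with at most ceil(n/2) lines (Lemma 1).

    Each line is reported as a pair of points it passes through;
    a lone leftover point is paired with itself, standing for an
    arbitrary line through that point.
    """
    pts = list(points)
    lines = []
    while len(pts) >= 2:
        p, q = pts.pop(), pts.pop()     # pick two points, take the line pq
        lines.append((p, q))
    if pts:                             # n odd: arbitrary line through last point
        lines.append((pts[0], pts[0]))
    return lines
```

The loop removes two points per iteration, so the whole procedure is the O(n) algorithm of the proof.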

Lemma 2. If a set S of n points can be covered with k lines (k minimal or not),
then: for any subset R ⊆ S of at least k + 1 collinear points (i.e., |R| ≥ k + 1
and ∀r1, r2, r3 ∈ R : r1 ≠ r2 ⇒ ∃α ∈ ℝ : r3 = α · (r2 − r1) + r1), the line
through them is in the set of k covering lines.

Proof. Suppose the line through the points in R was not among the k lines
covering S. Then the points in R must be covered with at least k + 1 lines, since
no two points in R can be covered with the same line. (The only line covering
more than one point in R is the one through all of them, which is ruled out.)
Hence we need at least k + 1 lines to cover the points in R. This contradicts the
assumption that S can be covered with k lines.
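The collinearity condition used in Lemma 2 can be tested without divisions via 2D cross products. A minimal Python sketch (helper names are ours, not the paper's):

```python
def collinear(p, q, r):
    """True iff points p, q, r lie on one line (2D cross-product test)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def all_collinear(pts):
    """True iff every point in pts lies on one common line."""
    pts = list(pts)
    if len(pts) <= 2:
        return True
    p, q = pts[0], pts[1]
    return all(collinear(p, q, r) for r in pts[2:])
```

With integer (or rational) coordinates the test is exact, which matters when certifying subsets R as in the lemma.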

Lemma 3. If a set S of n points can be covered with k lines (k minimal or not),
then: any subset of S containing at least k^2 + 1 points must contain at least k + 1
collinear points.

Proof. Suppose there is a subset R ⊆ S containing at least k^2 + 1 points, but
not containing k + 1 collinear points. Then each of the k covering lines must
contain at most k points in R. Hence with these at most k covering lines, each
containing at most k points, we can cover at most k^2 points. Thus we cannot
cover R (nor any superset of R, like S) with the k lines. This contradicts the
assumption that S can be covered with k lines.

Corollary 1. If in any subset of S containing at least k^2 + 1 points we do not
find k + 1 collinear points, we can conclude that S cannot be covered with k lines.

Lemma 4. If a set S of n points can be covered with l lines, but not with l − 1
lines (i.e., if l is the minimum number of lines needed to cover S) and k ≥ l,
then: if we generate all lines containing more than k points, the total number of
uncovered points will be at most l · k.

Proof. Let R be the set of uncovered points in S after all lines containing more
than k points have been generated. Since S can be covered with l lines and
R ⊆ S, R can be covered with l (or fewer) lines. None of the lines covering
points in R can cover more than k points in R (as all such lines have already
been generated). Hence there can be at most l · k points in R.

3 General Procedure
Given a set S of n points in the plane, we already know (because of Lemma 1)
that the minimum number l of lines needed to cover S is in {1, . . . , ⌈n/2⌉}.
In our algorithm, we first check whether l = 1, which can obviously be de-
cided in time linear in n. If the check fails (i.e., if the points in S are not all
collinear and thus l ≥ 2), we try to increase the lower bound for l by exploiting
Lemmas 2 and 3, which (sometimes) provide us with means of proving that the
set S cannot be covered with a certain number k of lines. In the first place, if for
a given value of k we find a subset R ⊆ S containing k^2 + 1 points, but not
containing k + 1 collinear points, we can conclude that more than k lines are needed
to cover S (because of Corollary 1). On the other hand, if we find k + 1 collinear
points (details of how this is done are given below), we record the line through
them (as it must be among the covering lines due to Lemma 2) and remove from
S all points covered by this line. This leads to a second possible argument: If by
repeatedly identifying lines in this way (always choosing the next subset R from
the remaining points), we record k lines while there are points left in S, we can
also conclude that more than k lines are needed to cover S.
We check different values of k in increasing order (the exact scheme is dis-
cussed below), until we reach a value k1 , for which we fail to prove (with the
means mentioned above) that S cannot be covered with k1 lines. On the other
hand, we can demonstrate (in one of the two ways outlined above) that S cannot
be covered with k0 lines, where k0 is the largest value smaller than k1 that was
tested in the procedure. At this point we know that l > k0 .
Suppose that when processing S with k = k1 , we identified m1 lines, m1 ≤ k1 .
We use a simple greedy algorithm to find m2 lines covering the remaining points.
(Note that m2 may or may not be the minimum number of lines needed to cover
the remaining points. Note also that m2 = 0 if there are no points left to be
covered.) As a consequence we know that S can be covered with m1 + m2 lines
(since we have found such lines) and thus that k0 < l ≤ m1 +m2 . We show below
that m1 + m2 ∈ O(l log l) and thus that with the m1 + m2 lines we selected we
obtained an O(log l) approximation of the optimum.
In a second step we may then go on and determine the exact value of l by
drawing on the found approximation (see below for details).

4 Algorithms
In this section we propose approximate and exact algorithms to solve the mini-
mum line covering problem. We use two already known algorithms as subroutines
in our algorithms:
1. An algorithm proposed by Guibas et al. [3], which finds all lines containing
at least k + 1 points in a set S of n points in time O((n^2/(k+1)) log(n/(k+1))).
2. An algorithm proposed by Langerman and Morin [6], which takes as input
a set S of n points and an integer k, and outputs whether S can be covered
with k lines in O(nk + k^{2(k+1)}) time.

4.1 Finding at most k Lines Covering More Than k Points


(In all pseudo-code var in the declaration of a function’s arguments is used to
indicate that an argument is passed by reference, not just by value.)
function Lines (var S: set of points, k: int) : int;
var m: int;                          (∗ number of lines found ∗)
    L1, . . . , Lk: set of points;   (∗ points on straight lines found ∗)
    R: set of points;                (∗ pool of points for line finding ∗)
begin
  m := 0; R := ∅;                    (∗ init. line counter and point pool ∗)
  while m < k do begin               (∗ try to find at most k lines ∗)
    while |R| < k^2 + 1 and S ≠ ∅ do begin
      choose p ∈ S;                  (∗ collect at most k^2 + 1 points ∗)
      S := S − {p};                  (∗ into the point pool R ∗)
      if ∃i, 1 ≤ i ≤ m : p is collinear with the points in Li
      then Li := Li ∪ {p};           (∗ point is on already found line ∗)
      else R := R ∪ {p};             (∗ point is on no found line ∗)
    end
    Lm+1 := FindLine(R, k);          (∗ Guibas et al.'s algorithm [3] ∗)
    if Lm+1 = ∅ then begin           (∗ if no line with k + 1 points found ∗)
      if |R| ≤ k^2                   (∗ return the number of lines found ∗)
      then begin S := S ∪ R; return m; end
      else return −1;                (∗ if there are more than k^2 points, ∗)
    end                              (∗ there should be such a line ∗)
    R := R − Lm+1; m := m + 1;       (∗ remove the covered points and ∗)
  end                                (∗ increment the line counter ∗)
  if |R| + |S| = 0 then return m;    (∗ return the number of lines found ∗)
  return −1;                         (∗ if there are uncovered points, ∗)
end                                  (∗ more than k lines are needed ∗)
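Function Lines treats FindLine as a black box. The subroutine of Guibas et al. [3] is nontrivial; for experimentation, the following naive Python stand-in (our own, not the paper's) finds a line covering at least k + 1 points of R in O(|R|^3) time, far slower than the bound the paper assumes:

```python
def collinear(p, q, r):
    """Exact 2D cross-product collinearity test."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def find_line(R, k):
    """Naive stand-in for FindLine: return the points of some line of R
    covering at least k + 1 points, or an empty set if none exists.

    Tries every pair of points and counts the points on their line,
    so it runs in O(|R|^3) time.
    """
    pts = list(R)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            on_line = {r for r in pts if collinear(p, q, r)}
            if len(on_line) >= k + 1:
                return on_line
    return set()
```

Returning the full set of points on the line matches how the pseudocode uses Lm+1 (both as an emptiness test and for removing covered points).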

4.2 Approximation for Minimum Line Covering


To greedily cover the at most k · l points (see Lemma 4) that may be left after
we removed lines with more than k points, we use the following function:
function GreedyCover (S: set of points, k: int) : int;
var m: int;                          (∗ number of lines found ∗)
    L: set of points;                (∗ buffer for points on found line ∗)
begin
  m := 0;                            (∗ initialize the line counter ∗)
  while S ≠ ∅ do begin               (∗ while not all points are covered ∗)
    L := FindLine(S, k);             (∗ Guibas et al.'s algorithm [3] ∗)
    if L = ∅ then k := k − 1;        (∗ reduce the number of points ∗)
    else begin m := m + 1; S := S − L; end;
  end;                               (∗ count the line found and remove the covered points ∗)
  return m;                          (∗ return the number of lines found ∗)
end
Covering a Set of Points with a Minimum Number of Lines 5

The approximation algorithm for the line covering problem then is:

function ApxLineCover (S: set of points) : int;
var k, m, n: int;                    (∗ numbers of lines/points ∗)
    R: set of points;                (∗ remaining uncovered points ∗)
begin
  if all points in S are collinear then return 1;
  k := 2; n := |S|;                  (∗ initialize variables ∗)
  while k ≤ n^{1/8} do begin
    R := S; m := Lines(R, k);        (∗ see Subsection 4.1 ∗)
    if m ≥ 0 then return m + GreedyCover(R, k − 1);
    k := k^2;
  end;
  R := S; m := Lines(R, n^{1/4});    (∗ see Subsection 4.1 ∗)
  if m ≥ 0 then return m + GreedyCover(R, n^{1/4} − 1);
  k := 2 · n^{1/4};
  while k < ⌈n/2⌉ do begin
    R := S; m := Lines(R, k);        (∗ see Subsection 4.1 ∗)
    if m ≥ 0 then return m + GreedyCover(R, k − 1);
    k := 2k;
  end;
  return ⌈n/2⌉;                      (∗ see Lemma 1 ∗)
end
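For intuition on small inputs, the output of ApxLineCover can be compared against the basic greedy of Johnson [4] mentioned in the introduction: repeatedly take a line covering the most remaining points. The sketch below is our own naive stand-in (roughly cubic time per line), not the paper's algorithm:

```python
def greedy_line_cover(points):
    """Plain greedy line cover: repeatedly remove the points of a line
    (through some pair of remaining points) that covers the most
    remaining points.  Returns the number of lines used.
    """
    S = set(points)
    m = 0
    while S:
        pts = list(S)
        if len(pts) == 1:            # one point left: any line through it
            return m + 1
        best = set()
        for i, p in enumerate(pts):  # try the line through every pair
            for q in pts[i + 1:]:
                on = {r for r in pts
                      if (q[0]-p[0])*(r[1]-p[1]) == (q[1]-p[1])*(r[0]-p[0])}
                if len(on) > len(best):
                    best = on
        S -= best                    # greedily commit to the best line
        m += 1
    return m
```

By the classic set-cover analysis this greedy is an O(log n)-factor approximation; the point of ApxLineCover is to reach the better O(log l) factor within O(n log l + l^4 log l) time.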

4.3 Analysis

Approximation ratio. Let k1 be the value of k in function ApxLineCover, for


which the call to function Lines succeeds, i.e., for which it returns a value ≥ 0.
Let m1 be the number of lines found by the function Lines in this case, nR = |R|
the total number of points that are left uncovered, and m2 the number of lines
found by the algorithm GreedyCover applied to these nR remaining points.
From Lemma 4 we know nR ≤ k1 · l. We also know that the maximum
value of k1 is (l − 1)^2, namely when we tried k0 = l − 1, could prove that we
cannot cover S with k0 lines, and then computed k1 = k0^2. Running the greedy
algorithm on the at most nR ≤ k1 · l ≤ (l − 1)^2 · l ∈ O(l^3) remaining points gives
an O(log l^3) = O(log l) approximation of the optimum number lR of lines with
which the nR remaining points can be covered (as shown by Johnson [4]). Since
certainly lR ≤ l, we have m2 ∈ O(l log l). The maximum value of m1 is clearly l,
namely when m2 = 0. Thus m1 + m2 ∈ O(l + l log l) = O(l log l).

Time complexity of function Lines. For each point p chosen from S, we
need to check whether it lies on an already found line or not. This is a fundamental
problem in computational geometry called the point location problem. A
planar subdivision with m lines is called an arrangement. Such an arrangement
can be preprocessed in time O(m^2 log m) into a linear size data structure so
that a query whether a given point p lies on any of the lines can be answered
in O(log m) time [2, 8]. In principle, we have to construct at most k such data
structures, namely each time we find a new line. Thus in the worst case it takes
O(k^3 log k) time to construct all such data structures. (This could be improved,
but we do not elaborate on it, because it does not affect the overall time
complexity.) Since we perform at most n queries—checking whether a chosen point
lies on an already constructed line—it takes O(n log k) time to perform all queries.
The function FindLine, which is the algorithm by Guibas et al. [3] for identifying
at least k + 1 collinear points, is called at most k times. Each call takes
time O(((k^2 + 1)^2/(k + 1)) log((k^2 + 1)/(k + 1))) = O(k^3 log k). Thus the total
time complexity of the function Lines is O(k^3 log k + n log k + k · k^3 log k) =
O(n log k + k^4 log k).


Time complexity of function GreedyCover. Note that in the function
GreedyCover it always holds that all lines covering more than k + 1 points have
already been found and the covered points removed. Hence we know, because of
Lemma 4, that there are always no more than l · (k + 1) points remaining in S.
Let k1 − 1 be the value of k with which GreedyCover is called. We group the
calls to the function FindLine (which is the algorithm by Guibas et al. [3] for
identifying at least k + 1 collinear points) according to the values of k: group j,
1 ≤ j ≤ ⌈log2 k1⌉, contains the values of k with (k1 − 1)2^{−j} < k ≤ (k1 − 1)2^{1−j}.
Since all points covered by lines covering more than k + 1 points have been
removed, we know that for any value of k in the j-th group there are at most
l · ((k1 − 1)2^{1−j} + 1) ≤ l k1 2^{1−j} points left to cover. Thus the time bound for a
call to the function FindLine with k in the j-th group is

  O(((l k1 2^{1−j})^2 / (k1 2^{1−j})) log((l k1 2^{1−j}) / (k1 2^{1−j}))) = O(2^{−j} l^2 k1 log l).

To estimate the number of calls to function FindLine in the j-th group we have
to distinguish two cases: either FindLine returns a line (productive call) or not
(unproductive call). Since for each unproductive call we decrement k, there can
be at most one unproductive call for each k in the group and thus there are
(k1 − 1)2^{1−j} − (k1 − 1)2^{−j} = (k1 − 1)2^{−j} unproductive calls in the j-th group.
Each productive call removes at least (k1 − 1)2^{−j} + 1 ≥ k1 2^{−j} points from S.
Thus, since there are at most l k1 2^{1−j} points left for any k in the j-th group, there
can be at most (l k1 2^{1−j}) / ((k1 − 1)2^{−j} + 1) ≤ 2l productive calls. Therefore function FindLine
is called at most 2l + (k1 − 1)2^{−j} times in the j-th group and hence a worst case
upper bound on the time required for all calls in the j-th group is

  O((2l + (k1 − 1)2^{−j}) · 2^{−j} l^2 k1 log l) = O((l^3 k1 2^{−j} + l^2 k1^2 2^{−2j}) log l).

We observe that this upper bound gets smaller by a constant factor, which is
smaller than 1/2, for successive groups (as j increases for successive groups). Hence
the total time complexity (sum over all groups) is bounded by twice the above
time complexity for j = 1. Therefore a worst case bound on the time complexity
of the function GreedyCover is O((l^3 k1 + l^2 k1^2) log l).

Time complexity of function ApxLineCover. ApxLineCover calls function


Lines each time the value of k is incremented. There are three cases we have
to consider, in each of which k is incremented differently. In any case, let k1 be
the value of k, for which the call to function Lines succeeds, i.e., for which Lines
returns a value ≥ 0 (just as above). In addition, let m1 be the number of lines
found by function Lines in this case.

Case 1: k ≤ n^{1/8}. In this case we always square k. Immediately before squaring
the value of k, we know that k < l. Each call to function Lines takes at most
O(n log k + k^4 log k) time (see above). We observe that this time bound increases
by a factor of at least 2 for successive values of k (since we start with k = 2
and keep squaring k), leading to a geometric progression of the time complexity.
Thus the total time is asymptotically dominated by the last term, which is
O(n log k1 + k1^4 log k1). Thus the overall time complexity for ApxLineCover is
O(n log k1 + k1^4 log k1 + (l^3 k1 + l^2 k1^2) log k1) = O((n + k1^4 + l^3 k1 + l^2 k1^2) log k1).
Since we squared the value of k at each step, we know k1 < l^2 (actually
k1 ≤ (l − 1)^2). Thus the total time complexity simplifies to O(n log l + l^8 log l).
This leads to the following lemma:

Lemma 5. We can approximate the minimum line covering problem within a
factor of O(log l) in O(n log l) time if l ≤ n^{1/8}.

However, if we did not succeed in finding a value of k ≤ n^{1/8} such that function
Lines succeeded, we proceed to the second case.

Case 2: k = n^{1/4}. Let k0 be the last value of k tested before k became greater
than n^{1/8}. We know that k0 ≤ n^{1/8} < l, trivially implying k = n^{1/4} < l^2. Suppose
the function Lines succeeds for k = n^{1/4}, yielding m1 ≥ 0 lines. Then the total
time invested is the time spent in Case 1, plus the time for the successful call to
function Lines, plus the time for calling function GreedyCover. For the first term
(Case 1), we computed above that the worst case time bound is O(n log k +
k^4 log k), which we have to apply for k = k0 ≤ l, which yields O(n log l +
l^4 log l). For the second term (function Lines) we have (see above) O(n log k +
k^4 log k) = O(n log l), since k^4 = n and log k = O(log l). Finally, calling function GreedyCover
takes O((l^3 k + l^2 k^2) log l) = O((l^5 + l^6) log l) = O(l^6 log l) time. Thus the overall
time complexity is O(n log l + l^4 log l + n log l + l^6 log l) = O(n log l + l^6 log l).

Lemma 6. We can approximate the minimum line covering problem within a
factor of O(log l) in O(n log l) time if l ≤ n^{1/4}.

However, if we do not succeed with k = n^{1/4}, we proceed to the third case:

Case 3: n^{1/4} < k < ⌈n/2⌉. In this case l > n^{1/4}. The time we spent up to this point
consists of the time spent in Case 1, that is, O(k0^4 log k0 + n log k0) = O(n log l +
l^4 log l) (see above), plus the time for the (unsuccessful) call to function Lines
for k = n^{1/4} < l, which took at most O(n log k + k^4 log k) = O(n log l + l^4 log l)
time. Together this yields O(n log l + l^4 log l) time. Afterwards we proceed by
doubling the value of k. For each k we call function Lines, spending O(k^4 log k +
n log k) time, which simplifies to O(k^4 log k) for k > n^{1/4}. We observe that this

calculated worst case upper time bound increases by a factor of at least 2^4 = 16
for successive calls, leading to a geometric progression of the time complexity.
Thus the total time is asymptotically dominated by the last term, which is
O(k1^4 log k1 + n log k1 + k1^2 l^2 log k1 + k1 l^3 log k1), which simplifies to O(l^4 log l),
since k1 ≤ 2l. Summarising all three cases we obtain the following theorem:

Theorem 1. We can approximate the minimum line covering problem within a
factor of O(log l) in time O(n log l + l^4 log l).

Corollary 2. We can approximate the minimum line covering problem within
a factor of O(log l) in O(n log l) time if l ∈ O(n^{1/4}).

4.4 Exact Minimum Line Covering


Drawing on the total number m1 + m2 of lines obtained by the approximation
algorithm, we can determine the exact value of the minimum number of lines l
needed to cover a given set S of n points. Let m = m1 + m2. Knowing m, we
use function Lines to repeatedly identify lines containing at least m + 1 collinear
points from subsets R ⊆ S containing m^2 + 1 points. Since m ≥ l, all lines found
in this way must be in the optimal solution (see Lemma 2). Hence we can find the
optimal solution by running any optimal line covering algorithm on the remaining
points, of which there can be no more than l · m (see Lemma 4). An example of
such an optimal algorithm is the algorithm proposed by Langerman and Morin
[6]. More formally, our algorithm works as follows: To find the minimum number
of lines needed to cover the at most nR ≤ m · l remaining points we use the following
function, which calls the algorithm proposed by Langerman and Morin [6].
function ExactCover (S: set of points) : int;
var n, k: int;                       (∗ current number of lines ∗)
begin
  n := |S|; k := 1;                  (∗ traverse possible numbers of lines ∗)
  while k < ⌈n/2⌉ do begin
    if IsCoverable(S, k)             (∗ Langerman and Morin's algorithm [6] ∗)
    then return k;                   (∗ if S can be covered with k lines, return the current k, ∗)
    k := k + 1;                      (∗ otherwise go to the next k ∗)
  end;
  return ⌈n/2⌉;                      (∗ S can always be covered with ⌈n/2⌉ lines ∗)
end
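Function ExactCover treats Langerman and Morin's decision procedure [6] as a black box. For small instances one can substitute a naive branching procedure; the Python sketch below is our own stand-in (exponential in k, not their algorithm). It relies on the fact that when |S| > 2k, any k-line cover can be rearranged so that the line covering an arbitrary fixed point p also passes through a second point of S:

```python
def collinear(p, q, r):
    """Exact 2D cross-product collinearity test."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def is_coverable(S, k):
    """Naive stand-in for the IsCoverable black box: can S be covered
    with k lines?  Branches on the lines through one fixed point."""
    S = set(S)
    if len(S) <= 2 * k:          # pair the points up (Lemma 1)
        return True
    if k == 0:
        return False
    p = next(iter(S))            # p must be covered by some line, and we may
    for q in S - {p}:            # assume that line hits a second point q
        rest = {r for r in S if not collinear(p, q, r)}
        if is_coverable(rest, k - 1):
            return True
    return False

def exact_cover(S):
    """Smallest k with is_coverable(S, k); mirrors function ExactCover."""
    k = 1
    while not is_coverable(S, k):
        k += 1
    return k
```

The rearrangement argument: a cover line through p alone can be replaced by the line through p and any other point without uncovering anything, so trying all pairs (p, q) is complete.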
The algorithm itself works as follows:
function ExactLineCover (S: set of points) : int;
var m: int;                          (∗ approximate solution ∗)
begin
  m := ApxLineCover(S);              (∗ find approximate solution ∗)
  m := Lines(S, m);                  (∗ find lines that must be in the solution ∗)
  return m + ExactCover(S);          (∗ cover the rest with an exact algorithm ∗)
end

Theorem 2. The minimum line covering problem can be solved exactly in
O(n log l + l^{2l+2}) time. In particular, if l ∈ O(log^{1−ε} n), the minimum line
covering problem can be solved in O(n log l) time.

Proof. We already argued above that the approximation algorithm takes
O(n log l + l^8 log l) time if l < n^{1/8}. It finds m lines that cover S. Finding all lines
containing at least m + 1 points takes O(n log m + m^4 log m) time as analyzed
above. Recall that the approximation algorithm yields at most m = O(l log l)
lines. Therefore the time complexity of finding the lines can also be written as

  O(n log(l log l) + (l log l)^4 log(l log l))
  = O(n log l + n log log l + l^4 log^5 l + l^4 log^4 l · log log l)
  = O(n log l + l^4 log^5 l).

Finally, we use the algorithm proposed by Langerman and Morin [6] to find the
minimum number of lines covering the at most O(m · l) = O(l^2 log l) points. Their
algorithm takes at most O(nk + k^{2k+2}) time to decide whether n points can be
covered with k lines. It has to be applied for all values k, 1 ≤ k ≤ l − m3, where
m3 is the number of lines identified by applying function Lines with k = m. In
the worst case, m3 = 0. The sum over the time complexities for the different
values of k is clearly dominated by the last term, that is, for k = l. Thus the time
complexity for this step is O(l^3 log l + l^{2l+2}). As a consequence, the overall
time complexity is O(n log l + l^4 log^5 l + l^3 log l + l^{2l+2}) = O(n log l + l^{2l+2}). This
simplifies to O(n log l) if l ≤ log n / (2 log log n).

Theorem 3. Given a set S of n points in the plane and an integer k, we can
answer whether it is possible to find k lines that cover all the points in the set
in O(n log k + k^{2k+2}) time.

Proof. We apply the approximation algorithm (function ApxLineCover) as
described above, but endowing it with the integer k as an additional argument.
The algorithm proceeds basically as before, but it terminates and returns −1
(meaning that the set cannot be covered with k lines) if it tries to build the
point location structure with more than k^2 lines. Otherwise, it proceeds to the
end, to find m lines as described above. Next we call function Lines to find m3
lines, each covering at least m + 1 points. We know that these m3 lines are in the
optimal solution. So our next step is to determine whether we can cover the
remaining points with k − m3 lines. This can be done by calling the algorithm by
Langerman and Morin [6], with k − m3 as the input. (Their algorithm answers
whether the remaining points can be covered with k − m3 lines.)

4.5 Producing the optimal set of lines

We also remark that after computing the optimal number of lines l, we can
also produce the actual lines covering the input point set within the same time
bounds. One way is to first use the algorithm proposed by Guibas et al. [3] to
produce lines covering at least l + 1 points. Let l′ denote the number of lines
covering at least l + 1 points and n′ the number of points left to be covered. We
observe that at least one of the remaining l − l′ lines covers at least n′/(l − l′) points. Let
a line be called a candidate line if it covers at least n′/(l − l′) points. Next, we repeat
the following step to produce the remaining l − l′ lines: If n′ ≤ 2(l − l′) then
any line covering at least two points can be included in the optimal solution.
Otherwise, for each candidate line we tentatively (temporarily) remove the points
covered by it and call Langerman and Morin's algorithm [6] to see whether the
remaining points can be covered by l − l′ − 1 lines. Clearly, the candidate can
be included in the optimal solution if and only if the answer is yes.
To calculate the time bound we show that there are at most (3/2) · (l − l′)^2
candidate lines. Any point can be covered by no more than (3/2) · (l − l′) candidate
lines. (The factor 3/2 comes from the extreme case when l − l′ = n′/3, so that
each candidate line covers only three points and the point is covered by (n′ − 1)/2
candidates.) Hence, if we sum for each point the number of candidates it is
covered by, we get an upper bound of n′ · (3/2)(l − l′). But we observe that this
sum equals the sum we obtain by adding for each candidate line the number of
points it covers. Since each candidate line covers at least n′/(l − l′) points, the number
of candidate lines cannot be larger than (n′ · (3/2)(l − l′)) / (n′/(l − l′)) and hence not larger
than (3/2)(l − l′)^2. Therefore we call Langerman and Morin's algorithm [6] at most
(3/2)(l − l′)^2 times before we produce one more optimal line. In subsequent calls to
their algorithm, the number of optimal lines to be produced gets smaller and
hence the time complexity gets smaller each time by at least a constant factor,
since it is exponential in the number of optimal lines. This results in a geometric
progression of the time complexity. Therefore the worst-case bound for the first
call asymptotically dominates all subsequent calls.

4.6 Improving the time bound when l ∈ Ω(log n)


The above idea with candidate lines can also be used to reduce the asymptotic
time complexity of the exact algorithm for larger values of l. The idea is that
we start with our exact algorithm as described in the beginning (Theorem 2),
with the difference that instead of calling Langerman and Morin's algorithm
[6] we proceed in another way. Recall that just before calling Langerman and
Morin's algorithm the remaining number of uncovered points is O(l^2 log l). Now
we start looping by setting a variable k first to 1, then 2, then 3, etc., until we
succeed in covering this remaining point set with k lines. For each such value of k,
the algorithm calls the recursive function FindOptimal(k′, S′) with arguments
k and S respectively. The parameter k′ stands for the number of remaining lines to
be produced, and S′ stands for the set of remaining points to be covered. The
function returns "true" if S′ can be covered with k′ lines, otherwise it returns
"false". It works as follows.
If |S′| ≤ 2 · k′ then return "true". If k′ = 0 and S′ is not empty, then return
"false". Otherwise, if there is any line covering at least k′ + 1 points in S′, then
return the value of FindOptimal(k′ − 1, S′′), where S′′ is the subset of points
of S′ not covered by that line. Otherwise, for each line covering at least |S′|/k′
points of S′ (we call this a candidate line for the set S′ and the parameter
k′) call FindOptimal(k′ − 1, S′′), where S′′ is the subset of points of S′ which are
not covered by the corresponding candidate line. Whenever such a recursive call
returns "true", the function returns "true". If none of these recursive
calls returns "true", the function returns "false".
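This description can be transcribed directly into Python. In the sketch below (all names are ours), a naive pairwise enumeration stands in for the line-finding subroutine of Guibas et al. [3]:

```python
def collinear(p, q, r):
    """Exact 2D cross-product collinearity test."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

def lines_covering_at_least(S, t):
    """All maximal collinear subsets of S with at least t points
    (naive O(n^3) enumeration over point pairs)."""
    pts = list(S)
    found = []
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            on = frozenset(r for r in pts if collinear(p, q, r))
            if len(on) >= t and on not in found:
                found.append(on)
    return found

def find_optimal(k, S):
    """Sketch of FindOptimal(k', S'): True iff S can be covered with
    k lines, branching only on candidate lines."""
    S = frozenset(S)
    if len(S) <= 2 * k:
        return True                  # pair the points up (Lemma 1)
    if k == 0:
        return False
    heavy = lines_covering_at_least(S, k + 1)
    if heavy:                        # forced line (Lemma 2): no branching
        return find_optimal(k - 1, S - heavy[0])
    t = -(-len(S) // k)              # ceil(|S|/k): candidate threshold
    for cand in lines_covering_at_least(S, t):
        if find_optimal(k - 1, S - cand):
            return True
    return False
```

The candidate threshold is sound by pigeonhole: in any k-line cover of S some line covers at least |S|/k points, so branching over candidate lines only loses nothing.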
It remains to analyse the time complexity of the above function. We look at
the case k = Ω(l), since smaller k can be handled even more efficiently.
As we calculated above, at any execution of the function FindOptimal(k′, S′)
(either on the top level or at any level of recursion) the number of candidate
lines is at most (3/2) · k′^2. Let us regard the recursion tree of an execution, having
the call FindOptimal(k, S) on the first, top level as the root of the recursion tree.
Hence, on the second level we will have at most (3/2) · k^2 nodes, on the third level
(3/2)^2 · k^2 · (k − 1)^2 nodes, and on the i-th level (3/2)^{i−1} · (k!/(k − i + 1)!)^2 nodes,
for i ≤ k. It follows that the time complexity is not larger than O((3/2)^k · (k!)^2).
By using Stirling's approximation, this is O((3/2)^k · (k/e)^{2k}), where e ≈ 2.718
is Euler's number. It holds that (3/2)^k · (k/e)^{2k} = ((√3 · k)/(√2 · e))^{2k} < (k/2.2194...)^{2k}.
Hence, the time complexity is O((k/2.2194...)^{2k}).
In the same way as we explained above, within the same asymptotic time
bound we can also produce a minimum set of lines covering the input point set.
We summarize the above results and discussion in the following theorem.

Theorem 4. For any input set of n points, it can be decided whether there is
a set of lines of cardinality at most k covering the n points in time O(n log k +
(k/2.2194...)^{2k}). Moreover, an optimal set of covering lines with minimum
cardinality l can be produced in time O(n log l + (l/2.2194...)^{2l}).

Further asymptotic improvements: As we described our recursive function
FindOptimal(k′, S′) above, it may try to solve the same subproblem several
times. For example, it may pick the same subset of lines in different orders.
A way to avoid this without using excessively memory-consuming tables is to
adjust FindOptimal to consider (subsets of) candidate lines in a systematic,
lexicographic order. Such refinements, together with a refined analysis of how
many candidate lines there can be depending on how many points each candidate
line has to cover, lead to provably better asymptotic time bounds. We leave
it as future work to elaborate on these ideas and to analyse this
asymptotic improvement more precisely.

5 Lower Bound

In this section we give a lower bound on the time complexity for solving the
minimum line cover problem. We make the assumption that the minimum number
of lines l needed to cover a set S of n points is at most O(√n). (For larger
values of l our lower bound may not be very interesting, since the best known
upper bounds on the time complexity of the minimum line cover problem are
exponential anyway.)
The main result we prove in this section is as follows:
Theorem 5. The time complexity of the minimum line cover problem is
Ω(n log l) in the algebraic decision tree model of computation.
We prove Theorem 5 with a reduction from a special variant of the general set
inclusion problem [1], which we call the k-Array Inclusion problem. Set inclusion
is the problem of checking whether a set of m items is a subset of a second set
of n items, n ≥ m. Ben-Or [1] showed a lower bound of Ω(n log n) for this
problem using the following important statement.

Statement 1. If YES instances of some problem Π have N distinct connected
components in ℝ^n, then the depth of the real computation tree for this problem
is Ω(log N − n).
Applying Statement 1 to the complement of Π, we get the same statement for
NO instances. We define the k-array inclusion problem as follows:
Definition 1. k-Array Inclusion Problem: Given two arrays A[1 . . . k] of
distinct real numbers and B[1 . . . m] of real numbers, k ≤ m, m + k = n,
determine whether or not each element in B[1 . . . m] belongs to A[1 . . . k].

Corollary 3. Any algebraic computation tree solving the k-array inclusion problem
must have a depth of Ω(n log k).

Proof. This lower bound can be shown in a way corresponding to the lower
bound for the set inclusion problem [1]. As already pointed out by Ben-Or, any
computation tree must correctly decide the case when A[1 . . . k] = (1, . . . , k). The
number of disjoint connected components, N, for YES instances of the k-array
inclusion problem is k^m. This is because, in order to create a YES instance, for
each element m_i in B[1 . . . m], there are k choices concerning which of the k
fixed elements in A[1 . . . k] the element m_i could be equal to. Since these choices are
independent for each m_i, the total number of YES instances becomes k^m. Applying
Statement 1, we get a lower bound of Ω(m log k), which is also Ω(n log k), since
m ≥ n/2.

To establish the lower bound in Theorem 5 for the minimum line cover problem,
we convert (in linear time) the input of the k-array inclusion problem into a
suitable input to the minimum line cover problem as follows: Each real number
a_i, 1 ≤ i ≤ k, in the array A[1 . . . k] becomes the k points (a_i, j), 1 ≤ j ≤ k, and each
real number b_j in the array B[1 . . . m], 1 ≤ j ≤ m, becomes the point (b_j, −j); all
points lie in the plane, and none of the constructed points coincide. If we use any
algorithm for the minimum line cover problem to solve the constructed instance,
the output will be a set of lines covering these points. To obtain an answer to
the k-array inclusion problem, we check whether the total number of lines,
denoted by l, obtained for the minimum line cover problem is greater than k.
If l = k then each element in B[1 . . . m] belongs to A[1 . . . k]; otherwise at least
one element in B[1 . . . m] does not belong to A[1 . . . k]. Since the k-array inclusion
problem requires Ω(n log k) time, it follows that the minimum line cover problem
requires Ω(n log k) time as well.
According to this construction, if it were possible to compute the number
l in o(n log l) time for some l = O(√n), then it would also be possible to solve
the k-array inclusion problem in time o(n log k) for the case k = l, which
would contradict our lower bound for the k-array inclusion problem.
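The reduction is easy to state in code. This Python sketch (names are ours) builds the point set from the two arrays:

```python
def reduction_points(A, B):
    """Section 5 reduction: each a_i in A[1..k] becomes the k points
    (a_i, 1), ..., (a_i, k); each b_j in B[1..m] becomes (b_j, -j)."""
    k = len(A)
    pts = [(a, j) for a in A for j in range(1, k + 1)]
    pts += [(b, -j) for j, b in enumerate(B, start=1)]
    return pts
```

If B ⊆ A, the k vertical lines x = a_i cover all constructed points; if some b_j is not in A, the point (b_j, −j) forces at least one extra line, which is what the check l = k detects.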
Acknowledgment: The authors wish to thank Dr. Christian Borgelt for his
detailed reading and comments on the paper.

References
1. M. Ben-Or. Lower Bounds for Algebraic Computation Trees. Proc. 15th Annual
ACM Symp. on Theory of Computing, 80–86. ACM Press, New York, NY, USA 1983
2. H. Edelsbrunner, L. Guibas, and J. Stolfi. Optimal Point Location in a Monotone
Subdivision. SIAM J. Comput. 15:317–340. Society for Industrial and Applied
Mathematics, Philadelphia, PA, USA 1986
3. L. Guibas, M. Overmars, and J. Robert. The Exact Fitting Problem in Higher
Dimensions. Computational Geometry: Theory and Applications, 6:215–230. 1996
4. D. Johnson. Approximation Algorithms for Combinatorial Problems. J. of Comp.
Syst. Sci. 9:256–278. 1974
5. V. Kumar, S. Arya, and H. Ramesh. Hardness of Set Cover With Intersection 1.
Proc. 27th Int. Coll. Automata, Languages and Programming, LNCS 1853:624–635.
Springer-Verlag, Heidelberg, Germany 2000
6. S. Langerman and P. Morin. Covering Things with Things. Proc. 10th Annual
Europ. Symp. on Algorithms (Rome, Italy), LNCS 2461:662–673. Springer-Verlag,
Heidelberg, Germany 2002
7. N. Megiddo and A. Tamir. On the Complexity of Locating Linear Facilities in the
Plane. Operations Research Letters 1:194–197. 1982
8. N. Sarnak and R. E. Tarjan. Planar Point Location Using Persistent Search Trees.
Comm. ACM 29:669–679. ACM Press, New York, NY, USA 1986
