Comparison-based Search in the Presence of Errors *

(Preliminary Version)

Ryan S. Borgstrom    S. Rao Kosaraju


Computer Science Department
Johns Hopkins University
Baltimore, MD 21218

* Supported by the National Science Foundation under Grant #CCR-9107293 and by NSF/DARPA under Grant #CCR-8908092.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
25th ACM STOC '93-5/93/CA, USA
© 1993 ACM 0-89791-591-7/93/0005/0130...$1.50

Abstract

The classic problem of searching an ordered list becomes more complicated when the answers to queries are not reliable. In the prefix-bounded error model, the number of incorrect responses at no point exceeds $r$ times the number of questions, where $r$ is a fixed positive fraction strictly less than $\frac{1}{2}$. We present efficient algorithms for searching in this error model, and we also establish nontrivial lower bounds for sorting, maxima finding, and related problems.

1 Introduction

One of the fundamental problems of computer science is 'searching': given a list of numbers, possibly constrained in some manner, determine the position/existence of a given number in the list. Optimal algorithms exist for the standard variations of this problem; in particular, when the initial list is sorted, binary search finds the element in question with an optimal number of comparisons.

We consider an extended version of this problem wherein the responses given to the comparison questions may be incorrect on occasion. Specifically, we are given a sorted list $L[1 \ldots n]$ of distinct numbers, a number $x$, and a positive fraction $r < \frac{1}{2}$. We are allowed to ask questions of the form: is $x < L[j]$? The responses are bounded by the following rule: at any point, when $k$ questions have been asked, no more than $\lfloor rk \rfloor$ of the answers have been incorrect.

We seek to design an efficient algorithm to find the $i$ such that $x$ is in the list iff $x = L[i]$.¹ In Section 2 we design an $O(\log n)$ step algorithm for any $r < \frac{1}{2}$. Previously [Pel89b, DGW92], the best bound for any $\frac{1}{3} \le r < \frac{1}{2}$ was $O(n^{\log_2 \frac{1}{1-r}})$. In Section 3 we remove the prefix constraint and assume that the overall number of incorrect responses is no more than $r$ times the total number of questions asked. For any $r < \frac{1}{2}$, we develop an $O(\log n)$ step algorithm that computes an $O(1)$ size set such that $x$ is in the list iff $x$ is in the computed set. In Section 4 we establish exponential lower bounds for sorting, identifying $n$ bits or finding the maximum.

¹ When the comparisons are limited to $x < L[j]$, the above information is the best we can achieve.

2 The 'Chip Game'

As in [SW], let us rewrite the problem as follows:

For each possible answer $x = L[i]$, we place a chip $i$ upon a shared one-dimensional board.
Position $p$ of the board will contain the set of chips that will be possible answers if exactly $p$ of the responses have been incorrect. We will place a gate at any time $t$ at location $rt$. The gate's distance from the origin measures the number of permissible errors up to that instant. Any chip to the right of the gate, then, corresponds to an answer that is no longer possible.

The above properties are maintained as follows:

● All chips are initially at location 0, since initially every element is a possible answer and no errors have occurred. The gate is also at position 0.

● Each question '$x < L[j]$?' will be answered with 'yes' or 'no'. If the answer is yes, we move chips $j \ldots n$ forward by a unit distance. If the answer is no, we move chips $1 \ldots j-1$ forward by a unit distance. In each step the gate moves forward by a distance $r$. (Note that the gate's location can be non-integral.)

Imagine, in the worst case, that there is an adversary who is playing this chip game against us, i.e. choosing the 'yes' or 'no' answer. This adversary, equivalently, chooses in each step which set of chips will be moved forward. If the algorithm in some step moves all but one chip to the right of the gate, then all the others are excluded as possible answers and hence we are done.

2.1 Measuring Progress

At any instant, let us define the weight of a chip $i$ as $c^{d_i}$, where $c$ is a constant (greater than 1) yet to be defined and $d_i$ is the distance from (the location of) chip $i$ to the gate. This distance will be considered to be positive if chip $i$ is to the left of the gate, and negative if chip $i$ is to the right of the gate. Let the (total) weight be the sum of the weights of all the chips. Let $E$ denote the initial weight, and let $W$ be the current weight. This definition of weight is different from previous definitions [SW, KMR+80, AD91]. It leads to simpler analysis and simpler algorithms.

Lemma 1 For every $r$, $0 < r < \frac{1}{2}$, any $c$ such that $1 < c \le \frac{1-r}{r}$ satisfies the condition $\frac{c^{r-1} + c^{r}}{2} < 1$.

Proof Let $f(c) = \frac{c^{r-1} + c^{r}}{2} - 1$. The only value of $c > 1$ at which $f'(c) = 0$ is $\frac{1-r}{r}$. Since $f'(1)$ is negative, $f'$ is continuous in $[1, \frac{1-r}{r}]$, and $f(1) = 0$, we have $f(c) < 0$ for any $1 < c \le \frac{1-r}{r}$.

Lemma 2 If $c$ satisfies $\frac{c^{r-1} + c^{r}}{2} < 1$, then every step which moves to the right chips accounting for at least half the weight reduces $W$ by at least the constant factor $d = \frac{c^{r-1} + c^{r}}{2}$ (that is, the new weight is at most $dW$).

Proof The worst case, by this measure, is when exactly half the weight moves forward. In this case, our new weight after this step will be $\frac{W}{2} c^{r-1} + \frac{W}{2} c^{r} = \frac{c^{r-1} + c^{r}}{2} W = dW$.

The following generalization of Lemma 1 plays a significant role in Stage 1 of the next section.

Lemma 3 For every $r$, $0 < r < \frac{1}{2}$, there exists a positive integer $H$ such that $r + \frac{1}{H} < \frac{1}{2}$, and any $c$, $1 < c \le \frac{1 - r - \frac{1}{H}}{r + \frac{1}{H}}$, satisfies the condition $\frac{c^{r-1} + c^{r}}{2} \, c^{1/H} < 1$.

Proof The proof of this lemma is similar to that of Lemma 1.

2.2 Prefix-Bounded Error Search

2.2.1 Stage 1: Reduce the number of chips on the board to O(1)

We translate our search into the chip game, with $n$ chips. Note that $E = n c^{0} = n$. Choose $H$ and $c$ so that Lemma 3 applies.

Because the weight function is the sum of chip weights, it has a midpoint that is either in
a chip or between two chips, for any ordering of them. Therefore, there is a $j$ such that the summed weights of $1 \ldots j-1$ and of $j+1 \ldots n$ are both less than or equal to half the weight of $1 \ldots n$.

We will compare $x$ to $L[j]$. Based on the response, in one step, either $1, \ldots, j-1$ will move forward, or $j, \ldots, n$ will. In the former case, we will also shift $j$ forward one square; call this a virtual advance. In either case, we have moved forward chips containing at least half the weight. By Lemma 2 (when $d c^{1/H} < 1$, surely $d < 1$), our new weight is no larger than $dE$, where $d = \frac{c^{r-1} + c^{r}}{2}$.

We apply this virtual advance technique for $m = \lceil \log n \rceil$ steps. At this stage, we have multiplied our weight by a factor of at most $d$, $m$ times. Our new weight is at most $E d^{m}$, and we will have made up to $m$ virtual advances. We charge each virtual advance to the corresponding chip moved. We will call a chip heavy if more than $\frac{m}{H}$ virtual advances are charged to it. Note that at most $H$ heavy chips can result since the total number of virtual advances is at most $m$. We remove the heavy chips and move back each of the remaining chips by the virtual advances charged to it. At this stage the chips on the board are in their correct positions.

Now let us bound the weight of the resulting board position. Since the heavy chips are removed, no chip moves back by more than $\frac{m}{H}$ squares. In the worst case, no heavy chips result and each chip gets moved back by $\frac{m}{H}$ squares, multiplying the total weight by $c^{m/H}$. Thus the new weight after $m$ steps will be no larger than $E d^{m} c^{m/H} = E (d c^{1/H})^{m}$. Now by Lemma 3, this weight is no larger than $E g^{m}$ for some $g < 1$.

We repeat the above application of $m$ steps for $\log_{1/g} 2$ rounds. Since the initial weight is $n$, the final weight will be $n (g^{m})^{\log_{1/g} 2}$, which is no more than 1. Thus no chip can end up to the left of the gate and at most one chip can end up on the gate. Since we have removed a constant number, $H$, of heavy chips in each round, and since there are only a constant number of rounds, the total number of heavy chips removed is a constant. We then return the heavy chips to the board, resulting in an overall constant number of chips. The total number of steps performed is $(\log_{1/g} 2) \lceil \log_2 n \rceil$, which is $O(\log n)$. As observed in [AD91, SW], since moving any chip left cannot make the problem easier, we assume that all the chips are at location 0, and the gate is $O(\log n)$ to the right.

2.2.2 Stage 2: Reduce the number of chips on the board to 1

As shown in [AD91], it is a simple matter to reduce the number of chips on the board to 1 with $O(\log n)$ steps, when there are only $O(1)$ chips and the gate is initially at most $O(\log n)$ away.

2.3 Algorithmic Implementation

Even though we have bounded the number of questions to be asked by $O(\log n)$, it is not clear whether an algorithm whose overall running time is $O(\log n)$ can be designed. In the following we design such an algorithm. We first present an $O(\log^2 n)$ algorithm. Then we observe that the algorithm can be easily improved to $O(\log n \log\log n)$ steps. Finally, we design an $O(\log n)$ step algorithm.

2.3.1 O(log^2 n) Step Implementation

Simulating Stage 1 in $O(\log^2 n)$ time  At any stage we maintain a list in which each entry is a triple consisting of an interval (of chips), the sum of the weights of the chips in the interval, and a count of the number of virtual advances charged to the last chip in that interval. In addition, the concatenation of the intervals in the list will result in the interval $[1 \ldots n]$. Initially, the list consists of one entry, $([1 \ldots n], n, 0)$. When the next comparison is performed, intervals that don't contain the index of comparison need not be split, and even the interval that contains the index need not be split into more than two sub-intervals. It is easy to show that the index of comparison can be chosen in
one scan of the list. After asking a question, the weights of the intervals will have changed, and it is necessary to update the list to reflect this. In addition, one of the entries can split into 2 entries. It is easy to observe that all these changes in the list can be incorporated in a single scan. In between cycles, in one scan we can identify the heavy chips.

Since each find-question/ask-question/update can increase the number of entries by at most 1, the length of the list is $O(\log n)$. Therefore, because there are only $O(\log n)$ questions in this stage, the time taken to complete this stage is bounded by $O(\log^2 n)$.

Simulating Stage 2 in $O(\log n)$ time  Since the list is of length $O(1)$, a procedure akin to that in Stage 1 will run in $O(\log n)$ time for performing $O(\log n)$ steps.

Therefore, one may perform a comparison-based search in the prefix-bounded error model in $O(\log^2 n)$ worst-case time.

2.3.2 Improvement to O(log n log log n)

The list organization of Stage 1 can be replaced by any balanced tree, e.g. a (2,3)-tree. Since the number of entries is $O(\log n)$, the depth of this tree is $O(\log\log n)$. It is not difficult to verify that each find-question/ask-question/update can be performed in $O(\log\log n)$ steps, resulting in an overall running time of $O(\log n \log\log n)$ steps. Each internal node stores the total weight contained in its subtree and a multiplicative scaling factor applicable for every weight in its subtree.

2.3.3 O(log n) Step Implementation

Now we design an $O(\log n)$ step implementation. This is achieved by making a significant improvement to the previous algorithm, and by employing the biased search trees of [BST85] in place of the above (2,3)-trees. In particular, the time to locate the chip to cut at, and the tree updating process, can be reduced to $O(1)$.

Our previous algorithm performs a "midpoint" chip search. That is, if the sequence of chip weights is $w_1, w_2, \ldots, w_n$, we chose an $i$ such that $w_1 + w_2 + \cdots + w_i \ge \frac{W}{2}$ and $w_i + w_{i+1} + \cdots + w_n \ge \frac{W}{2}$ as the next index of comparison. We show below that, for a suitable $s$, $0 < s < \frac{1}{2}$, it suffices if $i$ satisfies $w_1 + w_2 + \cdots + w_i \ge sW$ and $w_i + w_{i+1} + \cdots + w_n \ge sW$.

To establish the correctness of this approach, we need a strengthening of Lemma 3.

Lemma 4 For every $r$, $0 < r < \frac{1}{2}$, there exist a positive integer $H$ and an $s < \frac{1}{2}$ such that a suitable $c > 1$ satisfies the condition $(s c^{r-1} + (1 - s) c^{r}) \, c^{1/H} < 1$.

Proof Again, the proof of this lemma is similar to the proof of Lemma 1.

This lemma will allow a great deal of flexibility in choosing the index of comparison.

Additionally, we replace the above (2,3)-tree implementation by a biased search tree implementation of [BST85]. A biased search tree is balanced by weight. For any $s$, $0 < s < \frac{1}{2}$, there is some constant depth $d$ such that by searching $d$ levels deep into the tree, one can find a chip with at least $s$ of the weight to either side. Thus a suitable "midpoint" chip may be selected by going down the tree $O(1)$ levels, rather than the $O(\log\log n)$ levels required in the (2,3)-tree.

We will show in the complete version that each find-question/ask-question/update can be implemented by performing $O(1)$ steps on the biased tree. Identification of heavy chips can be performed in $O(\log n)$ steps. Consequently the overall time will be $O(\log n)$. Hence we have:

Theorem 1 For any $r < \frac{1}{2}$, we can construct an $O(\log n)$ step algorithm for the binary search problem when the incorrect responses are prefix-bounded by the factor $r$.
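As a numerical illustration of Lemmas 2 and 3 and of the midpoint rule with virtual advances used in Stage 1, the following Python sketch plays the chip game for a few steps and records the factor by which the total weight changes in each step. It is only a sketch under stated assumptions: the function names (`lemma3_constants`, `chip_weights`, `midpoint`, `ask`, `run_round`), the choice of the smallest admissible $H$ and the largest admissible $c$, the 0-based chip indices, and the coin-flip answers are all choices made for this illustration and do not come from the paper. Coin-flip answers are enough here because the bound of Lemma 2 holds whichever answer is given. The sketch scans all $n$ chips in every step, so it does not attempt the $O(\log n)$ running time of Section 2.3, and it omits heavy-chip removal and Stage 2.

```python
import random


def lemma3_constants(r):
    """Return (H, c, d, g) for an error fraction 0 < r < 1/2.

    H is the smallest positive integer with r + 1/H < 1/2 and c is the
    largest value Lemma 3 allows; both choices are assumptions of this sketch.
    """
    if not 0 < r < 0.5:
        raise ValueError("r must lie strictly between 0 and 1/2")
    H = 1
    while r + 1.0 / H >= 0.5:
        H += 1
    shifted = r + 1.0 / H
    c = (1 - shifted) / shifted
    d = (c ** (r - 1) + c ** r) / 2   # per-step factor from Lemma 2
    g = d * c ** (1.0 / H)            # the quantity Lemma 3 bounds below 1
    return H, c, d, g


def chip_weights(pos, gate, c):
    """Weight of each chip: c ** (distance to the gate), positive when left of it."""
    return [c ** (gate - p) for p in pos]


def midpoint(weights):
    """Smallest 0-based j whose prefix weights[0..j] reaches half the total."""
    half = sum(weights) / 2
    running = 0.0
    for j, w in enumerate(weights):
        running += w
        if running >= half:
            return j
    return len(weights) - 1


def ask(pos, gate, r, j, says_less):
    """Apply one answer to 'x < L[j]?' and advance the gate by r.

    'yes' moves chips j..n-1; 'no' moves chips 0..j-1 and, as a virtual
    advance, chip j as well.
    """
    moved = range(j, len(pos)) if says_less else range(0, j + 1)
    new_pos = list(pos)
    for i in moved:
        new_pos[i] += 1
    return new_pos, gate + r


def run_round(n, r, steps, seed=0):
    """Play `steps` questions with coin-flip answers; return (d, g, ratios)."""
    rng = random.Random(seed)
    _, c, d, g = lemma3_constants(r)
    pos, gate = [0] * n, 0.0
    ratios = []
    for _ in range(steps):
        before = sum(chip_weights(pos, gate, c))
        j = midpoint(chip_weights(pos, gate, c))
        pos, gate = ask(pos, gate, r, j, rng.random() < 0.5)
        ratios.append(sum(chip_weights(pos, gate, c)) / before)
    return d, g, ratios


if __name__ == "__main__":
    d, g, ratios = run_round(n=64, r=0.3, steps=40)
    print(f"d = {d:.4f}, g = {g:.4f}, worst step ratio = {max(ratios):.4f}")
```

Every recorded ratio should be at most $d$ (up to floating-point rounding), since each step moves chips carrying at least half the weight, and $g = d c^{1/H}$ should stay below 1, which is what lets the analysis of Stage 1 absorb the cost of undoing the virtual advances.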
3 A Variation

Consider the variation in which we remove the constraint that at any point when $k$ questions have been asked no more than $rk$ of the answers have been incorrect. Instead we simply require that the total number of incorrect responses be no more than $r$ times the total number of questions asked.

Observation For this variation, for any $r < \frac{1}{2}$, we can construct an $O(\log n)$ step algorithm which finds $O(1)$ indices $i_1, i_2, \ldots, i_p$ such that $x$ is in $L$ iff $x$ is in $\{L(i_1), L(i_2), \ldots, L(i_p)\}$.

Proof No chips are removed from the board until the very end of the execution of Stage 1 of our previous algorithm. Consequently, if we terminate this algorithm after Stage 1, we will end up with $O(1)$ indices. In addition, just the overall error bound, instead of the prefix bound, suffices.

This is especially interesting, because it was established in [SW, DGW92] that the existence of an algorithm to end up with one index requires $r < \frac{1}{3}$.

4 Related Problems

Unlike searching, related problems such as sorting have drastically different lower bounds in the prefix-bounded error model. We first establish such a lower bound for the simple problem of testing whether the input is in sorted order.

4.1 Determining Whether a List is Sorted

Consider a list of distinct elements $a_0 \ldots a_n$. Our goal is to determine whether this list is in sorted order or not. To simplify this, we are guaranteed that if the list is not sorted, there will be an $i$ such that switching $a_i$ and $a_{i+1}$ will produce a sorted list. We are allowed only comparison questions between two elements of the list to accomplish this determination.

We note that any comparison involving elements more than one step away from each other provides no information, as their relative position is known. Hence, the only questions whose answers provide information are those between two adjacent elements.

In the following we establish an exponential lower bound for this problem. We even assume that the adversary responds correctly to every query.

Given the string $S$ of questions produced by the given algorithm, we will make an improvement to it. Examine the last question in $S$. It compares two elements, say, $a_i$ and $a_{i+1}$. If the algorithm already knows $a_i < a_{i+1}$ then we remove the question and repeat the procedure. Otherwise, let us move all the comparisons of $a_i$ and $a_{i+1}$ to the end of our question string. Clearly, we will still have sufficient evidence to compute whether $a_i < a_{i+1}$. Furthermore, this can only help all of the other comparisons in the string; there are now fewer potential errors that can have been made at the time of their computation.

Repeat this process on the list composed of all the other questions. When we are done, we will have a list of questions, no longer than $S$, in which the first question is asked until the answer is verified, and then the second question is asked until the answer is verified, and so forth until the last question.

Take any such string of questions. Let $f(k)$ be the total number of questions asked by the time the first $k$ questions are resolved. Note that the number of repetitions of the $(k+1)$st question must be more than the number of errors permissible in the $f(k+1)$ questions. Hence $f(k+1) - f(k) > \lfloor r f(k+1) \rfloor$. In addition, $f(1) = 1$. Consequently our verification requires $\Omega((\frac{1}{1-r})^{n})$ comparisons.

Therefore, it requires an exponential number of questions to test whether a list is sorted in the prefix-bounded error model. Looking at this problem as a set of $n$ successive 2-chip chip
games, using the results of [AD91, DGW92] we can easily show that the verification can be performed in $O((\frac{1}{1-r})^{n})$ comparisons.

4.2 Multiple Bit Identification

Suppose we are given $m$ variables $x_1, x_2, \ldots, x_m$, each having a value '0' or '1'. In each step, we can test whether a chosen $x_i < 1$. The errors are prefix-bounded as before.

Then, as in the previous argument, we can establish a lower bound of $\Omega((\frac{1}{1-r})^{m})$, matching the upper bound given in [AD91, DGW92].

4.3 Maximum Element Finding

It is quite nontrivial to establish an exponential lower bound for the related problem of finding the maximum element of a list. Let the adversary assume that $a_0 < a_1 < \cdots < a_n$, and respond correctly for each query. Note that our algorithm must establish that each $a_i$, $i \ne n$, is less than some other $a_j$.

We consider the question string $S$, and remove all comparisons whose answers can be inferred by the algorithm at the time they were asked.

Consider the first time that it is established that some $a_i$ is less than at least one other element. We look at the subset of comparisons necessary to have established this.

Since this is the first time that such a fact is established, a comparison involving two other elements $a_j$ and $a_k$ is of no value. If we do not know with certainty that $a_i$ is less than one of those elements, we cannot conclude with certainty that $a_i$ is less than some other element from the results of such comparisons, regardless of how well established they are. Further, no comparison between $a_i$ and some element smaller than itself can aid in establishing that $a_i$ is not the maximum.

Therefore, all of the comparisons that originally establish that $a_i$ is not the maximum must be between $a_i$ and elements larger than $a_i$. However, at the instant when it is established that $a_i$ is less than some $a_j$, we might not be able to pick a particular $j$ such that $a_i < a_j$. It can turn out that the total number of times $a_i$ is found to be less than other elements might suffice to show that $a_i$ is not the largest element.

We pick the smallest of such elements, say $a_j$, and rewrite all of the comparisons necessary to establishing that $a_i$ is less than some other element as comparisons between $a_i$ and $a_j$. We now move all of these comparisons to the beginning of the question string and replace $a_i$ with $a_j$ in the rest of the question sequence. Where the original sequence established that some other element $a_k$ was not the largest element by answering $a_k < a_i$, our modified sequence will instead answer $a_k < a_j$.

We repeat this process on the remainder of the question string, which no longer contains questions relating to $a_i$. As with sorting, this problem may be shown to be exponential when rewritten into this format.

References

[AD91] J.A. Aslam and A. Dhagat. Searching in the presence of linearly bounded errors. In Proceedings of the 23rd ACM Symposium on the Theory of Computing, pages 486-493, 1991.

[Ber68] E.R. Berlekamp. Block Coding for the Binary Symmetric Channel with Noiseless, Delayless Feedback, pages 61-85. Wiley, 1968.

[BST85] S.W. Bent, D.D. Sleator, and R.E. Tarjan. Biased search trees. SIAM Journal on Computing, pages 545-568, 1985.

[DGW92] A. Dhagat, P. Gacs, and P. Winkler. On playing "Twenty Questions" with a liar. In Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 16-22, 1992.
[FPRU90] U. Feige, D. Peleg, P. Raghavan, and E. Upfal. Computing with unreliable information. In Proceedings of the 22nd ACM Symposium on the Theory of Computing, pages 128-137, 1990.

[KMR+80] D.J. Kleitman, A.R. Meyer, R.L. Rivest, J. Spencer, and K. Winklmann. Coping with errors in binary search procedures. Journal of Computer and System Sciences, pages 396-404, 1980.

[LRG91] K.B. Lakshmanan, B. Ravikumar, and K. Ganesan. Coping with erroneous information while sorting. IEEE Transactions on Computers, pages 1081-1084, 1991.

[Pel88] A. Pelc. Prefix search with a lie. Journal of Combinatorial Theory, Series A, pages 165-173, 1988.

[Pel89a] A. Pelc. Detecting errors in searching games. Journal of Combinatorial Theory, Series A, pages 43-54, 1989.

[Pel89b] A. Pelc. Searching with known error probability. Theoretical Computer Science, pages 185-202, 1989.

[Pip85] N. Pippenger. On networks of noisy gates. In Proceedings of the 26th Annual Symposium on the Foundations of Computer Science, pages 30-38, 1985.

[RGL87] B. Ravikumar, K. Ganesan, and K.B. Lakshmanan. On selecting the largest element in spite of erroneous information. Lecture Notes in Computer Science, 247:88-99, 1987.

[RL84] B. Ravikumar and K.B. Lakshmanan. Coping with known patterns of lies in a search game. Theoretical Computer Science, pages 85-94, 1984.

[SW] J. Spencer and P. Winkler. Three thresholds for a liar. Preprint, 1990.

[YY85] A.C. Yao and F.F. Yao. On fault-tolerant networks for sorting. SIAM Journal on Computing, pages 120-128, 1985.