Nearest Neighbor Queries: Instructor: Cyrus Shahabi

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 37

CSCI585

Nearest Neighbor Queries


Instructor: Cyrus Shahabi

CSCI585

Nearest Neighbor Search


convert the nearest neighbor search to range search. Guess a range around Q that contains at least one object say O
if the current guess does not include any answers, increase range size until an object found.

Retrieve the nearest neighbor of query point Q Simple Strategy:

Compute distance d between Q and O re-execute the range query with the distance d around Q. Compute distance of Q from each retrieved object. The object at minimum distance is the nearest neighbor!!!

CSCI585

Nave Approach
A B F E 4 5 6
Query Point Q
A: B C

1 2 3
B: E F C: G H

E 4 5 6

F 1 2 3

G 10 11 12 H 7 8

C 11 G 12 10 7 8 9 H

Issues: how to guess range? The retrieval may be sub-optimal if incorrect range guessed. Would be a problem in high dimensional spaces.

CSCI585

Given a query location q, find the nearest object.

a q

Depth First and Best-First Search using R-trees Goal: avoid visiting nodes that cannot contain results
4

CSCI585

Basic Pruning Metric: MINDIST


Minimum distance between q and an MBR.
p E1 mindist(E1,q) q

mindist(E1,q) is a lower bound of d(o,q) for every object o in E1. If we have found a candidate NN p, we can prune every MBR whose mindist > d(p, q).
5

CSCI585

MINDIST Property

MINDIST is a lower bound of any k-NN distance

(p1, p2) (t1, t2)

(p1, p2) (p1, p2) (s1, s2)

(p1, p2) (p1, p2)

Depth-First (DF) NN Algorithm Roussoulos et al., SIGMOD, 1995


CSCI585

y axis
10 g 8 6 4 2 e f h

E 2 E 6

E 7
l k j

E 5

i E E 4 1 query d E a 3 b c

Main idea Starting from the root visit nodes according to their mindist in depth-first manner

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a 5 7 E 3 b c E 4 E 3 5 E 4 9 d E 5 5 e E 5 f

E 2 2 E 6 2 g h E 6

Note: distances not actually stored inside nodes. Only for illustration E 7 13 i 2 j 10

E 2 k 13 E 7 l m

CSCI585

DF Search Visit E1
y axis
10 g 8 6 4 b 2 c e f h

E 2 E 6

E 7
l k j

E 5 E 1
a i

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a 5 8 E 3 b c E 4 E 3 5 E 4 9 d E 5 5 e E 5 f

E 2 2 E 6 2 g h E 6 E 7 13 i 2 j E 2 k E 7 l m

DF Search Find Candidate NN a


CSCI585

y axis
10 g 8 6 4 b 2 c e f h

E 2 E 6

E 7
l k j

E 5 E 1
a i

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a 5 9 E 3 b c E 4 E 3 5 E 4 9 d E 5 5 e E 5 f

E 2 2 E 6 2 g h E 6 E 7 13 i 2 j

First Candidate NN: a with distance 5

E 2 k E 7 l m

CSCI585

DF Search Backtrack to Root and Visit E2 y axis


10 g 8 6 4 b 2 c e f h

E 2 E 6

E 7
l k j

E 5 E 1
a i

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a 5 10 E 3 b c E 4 E 3 5 E 4 9 d E 5 5 e E 5 f

E 2 2 E 6 2 g h E 6 E 7 13 i 2 j

First Candidate NN: a with distance 5

E 2 k E 7 l m

DF Search Find Actual NN i


CSCI585

y axis
10 g 8 6 4 b 2 c e f h

E 2 E 6

E 7
l k j

E 5 E 1
a i

E 4
d

query
First Candidate NN: a with distance 5 x axis

E 3

10

Root E 1 1 E 1 a 5 11 E 3 b c E 4 E 3 5 E 4 9 d E 5 5 e E 5 f

Actual NN: i with distance 2

E 2 2 E 6 2 g h E 6 E 7 13 i 2 j E 2 k E 7 l m

Optimality
CSCI585

y axis
10 g 8 6 4 b 2 c e f h

E 2 E 6 E 5 E 1
a i

E 7
l k j

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Question: Which is the minimal set of nodes that must be visited by any NN algorithm? Answer: The set of nodes whose MINDIST is smaller than or equal to the distance between q and its NN (e.g., E1, E2, E6).
12

CSCI585

A Better Strategy for KNN search

A sorted priority queue based on MINDIST; Nodes traversed in order; Stops when there is an object at the top of the queue; (1-NN found) k-NN can be computed incrementally;

I/O optimal

13

CSCI585

Priority Queue
A:

A B F E 4 5 6

B C

1
B: E F F 4 5 6 1 2 3

C: G G 10

2 3
A B
Query Point Q

11 12 H 7 8 9

C C 5 5 5 F 6 G 8 4 6 G F 4 9 F 6 4 F

E C

C 1211 10 G 7 8 9 H
H 7

1NN

14

CSCI585

Best-First (BF) NN Algorithm (Optimal) Hjaltason and Samet, TODS, 1998

Keep a heap H of index entries and objects, ordered by MINDIST. Initially, H contains the root. While H
Extract the element with minimum MINDIST
If it is an index entry, insert its children into H. If it is an object, return it as NN.

End while
15

CSCI585

BF Search Visit root


y axis
g h

10 8 6 4 b 2 c e f

E 2 E 6

E 7
l k j

Action Visit Root

Heap
E 1 E 2 1 2

E 5 E 1
a i

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a
16

E 2 2 E 6 2 f g h E 6 E 7 13 i 2 j 10 E 2 k 13 E 7 l m

E 3 5 c E 4

E 4 9 d

E 5 5 e E 5

E 3

CSCI585

y axis
10 g 8 6 4 b 2 c e f h

BF Search Visit E1
E 2 E 6 E 5 E 1
a i j m

E 7
l k

Action Visit Root follow E1

Heap
E 1 E 2 1 2 E 2 E 5 E 5 E 9 3 5 4 2

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a
17

E 2 2 E 6 2 f g h E 6 E 7 13 i 2 j 10 E 2 k 13 E 7 l m

E 3 5 c E 4

E 4 9 d

E 5 5 e E 5

E 3

CSCI585

y axis
10 g 8 6 4 b 2 c e f h

BF Search Visit E2
E 2 E 6 E 5 E 1
a i j m

E 7
l k

Action Visit Root follow E1 follow E2

Heap
E 1 E 2 1 2 E 2 E 5 E 5 E 9 3 5 4 2 E 2 E 5 E 5 E 9 E 13 3 5 4 7 6

E 4
d

query

E 3

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a
18

E 2 2 E 6 2 f g h E 6 E 7 13 i 2 j 10 E 2 k 13 E 7 l m

E 3 5 c E 4

E 4 9 d

E 5 5 e E 5

E 3

CSCI585

y axis
10 g 8 6 4 b 2 c e f h

BF Search Visit E6
E 2 E 6 E 5 E 1
a i j m

E 7
l k

Action Visit Root follow E1 follow E2 follow E6

Heap
E 1 E 1 2 E 2 E 3 2 E 2 E 3 6 i 2 E3 2 5 E 5 E 9 5 4 5 E 5 E 9 E 13 5 4 7 j 10 E 13 k 5 E 5 E 9 7 5 4

E 4
d

query

E 3

13

x axis
0 2 4 6 8 10

Root E 1 1 E 1 a
19

E 2 2 E 6 2 f g h E 6 E 7 13 i 2 j 10 E 2 k 13 E 7 l m

E 3 5 c E 4

E 4 9 d

E 5 5 e E 5

E 3

CSCI585

DF Search Find Actual NN i


y axis
g h

10 8 6 4 b 2 c e f

E 2 E 6

E 7
l k j

Action Visit Root follow E1 follow E2 follow E6

Heap
E 1 E 1 2 E 2 E 3 2 E 2 E 3 6 i 2 E3 2 5 E 5 E 9 5 4 5 E 5 E 9 E 13 5 4 7 j 10 E 13 k 5 E 5 E 9 7 5 4

E 5 E 1
a i

E 4
d

query

E 3

13

Report i and terminate x axis


2 4 6 8 10

Root E 1 1 E 1 a
20

E 2 2 E 6 2 f g h E 6 E 7 13 i 2 j 10 E 2 k 13 E 7 l m

E 3 5 c E 4

E 4 9 d

E 5 5 e E 5

E 3

CSCI585

Generalizations

Both DF and BF can be easily adapted to (i) extended (instead of point) objects and (ii) retrieval of k (>1) NN. BF can be made incremental; i.e., it can report the NN in increasing order of distance without a given value of k.
Example: find the 10 closest cities to HK with population more than 1 million. We may have to retrieve many (>>10) cities around Hong Kong in order to answer the query.

21

CSCI585

Generalize to k-NN

Keep a sorted buffer of at most k current nearest neighbors Pruning is done according to the distance of the furthest nearest neighbor in this buffer Example:
R

MINDIST

P Actual_dist The k-th object in the buffer


22

CSCI585

Another filter idea based on MBR Face Property

MBR is an n-dimensional Minimal Bounding Rectangle used in R trees, which is the minimal bounding n-dimensional rectangle bounds its corresponding objects. MBR face property: Every face of any MBR contains at least one point of some object in the database.

23

CSCI585

MBR Face Property 2D

24

CSCI585

MBR Face Property 3D


Rectangle R

25

CSCI585

Improving the KNN Algorithm

While the MinDist based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.

26

CSCI585

Properties of MINMAXDIST

MINMAXDIST(P,R) is the minimum over all dimensions distances from P to the furthest point of the closest face of R. MINMAXDIST is the smallest possible upper bound of distances from the point P to the rectangle R. MINMAXDIST guarantees there is an object within the R at a distance to P less than or equal to minmaxdist.

MINMAXDIST is an upper bound of the 1-NN distance


27

CSCI585

MINDIST & MINMAXDIST


MINDIST(P,R) <= NN(P) <= MINMAXDIST(P,R)

28

CSCI585

MinDist & MinMaxDist 3D


Query Point Q MinMaxDist(Q,R) MinDist(Q,R)

Rectangle R

29

CSCI585

Pruning 1
If there exists another R such that MINDIST(P,R)> MINMAXDIST(P,R)
R R

Downward pruning: An MBR R is discarded

MINDIST

MINMAXDIST

30

CSCI585

Pruning 2
If there exists an R such that Actual_dist(P,O) > MINMAXDIST(P,R)

Downward pruning: An object O is discarded

Actual-dist

MINMAXDIST

31

CSCI585

Pruning 3
If an object O is found such that MINDIST(P,R) > Actual_dist(P,O)

Upward pruning: An MBR R is discarded

MINDIST

P Actual_dist O

32

CSCI585

MINDIST vs MINMAXDIST Ordering

MINDIST: optimistic (the box that is closer) MINMAXDIST: pessimistic (the box that has at least one object at that distance)

Example: MINDIST ordering finds the 1-NN first


33

CSCI585

MINDIST vs MINMAXDIST Ordering

Example: MINMAXDIST ordering finds the 1-NN first


34

CSCI585

NN-search Algorithm using the mentioned pruning rules (branch and bound)
1. 2.
Initialize the nearest distance as infinite distance Traverse the tree depth-first starting from the root. At each Index node, sort all MBRs using an ordering metric and put them in an Active Branch List (ABL). Apply pruning rules 1 and 2 to ABL Visit the MBRs from the ABL following the order until it is empty If Leaf node, compute actual distances, compare with the best NN so far, update if necessary. At the return from the recursion, use pruning rule 3 When the ABL is empty, the NN search returns.

3. 4. 5. 6. 7.

CSCI585

Best-First vs Branch and Bound


Best-First is the optimal algorithm in the sense that it visits all the necessary nodes and nothing more! But needs to store a large Priority Queue in main memory. If PQ becomes large, we have thrashing BB uses small Lists for each node. Also uses MINMAXDIST to prune some entries

CSCI585

References

1- N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD, pages 71-79, 1995 2- G. R. Hjaltason and H. Samet, Distance browsing in spatial databases, ACM Transactions on Database Systems 24, 2 (June 1999), 265-318 STDBM06 keynote slides by Dimitris Papadias Novel Forms of Nearest Neighbor Search in Spatial and Spatiotemporal Databases

You might also like