Incremental LOF Algorithm


Incremental Local Outlier Factor Algorithm

Dragoljub Pokrajac, AMRC, DSU
Longin Jan Latecki, Temple University
Aleksandar Lazarevic, UTC

University of Delaware, 10/23/2006
Static LOF Algorithm for Outlier Detection (Breunig et al., 2000)
• Main idea: estimate the local density of the data
• An outlier is a data point whose local density significantly decreases in comparison to its neighborhood
• We measure:
  – the local density by the local reachability density (lrd)
  – density variation by the LOF factor
Example: Two-Dimensional Data

[Figure: two-dimensional data set with points labeled "outlier" and "not outlier"]
Local Reachability Density (lrd)

[Figure: lrd illustration with k = MinPts = 4]
LOF Factor

[Figure: data points colored by LOF factor (scale from 0.87 to 6.05); the points with the highest LOF values are the identified outliers]
Reachability Distance

$$\mathrm{reach\text{-}dist}_k(O, p) = \begin{cases} d(O, p), & O \text{ is not among the } k \text{ nearest neighbors of } p \\ k\text{-distance}(p), & O \text{ is among the } k \text{ nearest neighbors of } p \end{cases} \qquad (1)$$

[Figure: k-neighborhood of p (k = 3), the corresponding k-distance(p), and the reachability distances from p to particular points]

Local Reachability Density (lrd)
• For each point O, the local reachability density is defined as the inverse of the average reachability distance from O to its k neighbors:

$$\mathrm{lrd}(O) = \frac{1}{\sum_{p \in kNN(O)} \mathrm{reach\text{-}dist}_k(O, p) \, / \, k} \qquad (2)$$

• For given coordinates of the neighbors:
  – the reachability distances will be the smallest possible if O is among the k nearest neighbors of each of its k neighbors
• Heuristically, this means the lrd is large if the neighborhood of O is "compact"
Local Outlier Factor (LOF)
• Heuristically, a point is an outlier if:
  – its neighboring points lie in more "compact" (dense) regions than the region of the point itself
  – we measure "compactness" by lrd
  – hence, we may detect outliers by comparing the average lrd in the neighborhood of a point with the lrd at the point itself
• To quantify outliers, we use the ratio LOF(O), defined as:

$$\mathrm{LOF}(O) = \frac{\frac{1}{k} \sum_{p \in kNN(O)} \mathrm{lrd}(p)}{\mathrm{lrd}(O)} \qquad (3)$$
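To make Eqs. (1)–(3) concrete, the following is a minimal brute-force sketch in Python (an illustration only, not the authors' implementation; the function and variable names are made up, and Eq. (1) is written in its equivalent max form):

```python
import numpy as np

def lof_scores(X, k):
    """Brute-force LOF per Eqs. (1)-(3); O(N^2), for illustration only."""
    N = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    knn = np.argsort(d, axis=1)[:, 1:k + 1]                      # k nearest neighbors (self excluded)
    k_distance = d[np.arange(N), knn[:, -1]]                     # distance to the k-th neighbor

    # Eq. (1): reach-dist_k(O, p) = max(k-distance(p), d(O, p))
    def reach_dist(o, p):
        return max(k_distance[p], d[o, p])

    # Eq. (2): lrd(O) = 1 / (average reach-dist from O to its k neighbors)
    lrd = np.array([1.0 / (sum(reach_dist(o, p) for p in knn[o]) / k) for o in range(N)])

    # Eq. (3): LOF(O) = (mean lrd of O's neighbors) / lrd(O)
    return np.array([lrd[knn[o]].mean() / lrd[o] for o in range(N)])

# Usage: the isolated point at (5, 5) receives by far the largest LOF score.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [5., 5.]])
print(lof_scores(X, k=2))
```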
Properties of the LOF Algorithm
• Computational complexity is O(N log N) with a properly chosen indexing structure
• The algorithm works statically, i.e., it does not accommodate incrementally arriving data
Incremental LOF Algorithm

• Goal: update the parameters of the LOF algorithm incrementally, with each new record arriving in the database
• Retain the computational complexity of the static LOF algorithm
Incremental LOF: Insertion (k = 2)

Flow of an insertion, using Eqs. (1)–(3):
1. Insert the new point into the dataset and compute the reach-dist from the new point to its k neighbors
2. Compute the LOF of the new point
3. Update k-distance for the points that have the new point in their k-neighborhood
4. Update reach-dist for the points in the k-neighborhoods of the points whose k-distance was updated
5. Update lrd for all points whose k-neighborhood changed and for the points whose reach-dist to one of their existing k neighbors changed
6. Update LOF for the points whose lrd was updated and for the points where the lrd of one of their k neighbors was updated
k-Nearest Neighbor Query

[Figure: four points — Sue, Tom, Beverly, Allen]

Sue and Beverly are the 2 nearest neighbors of Tom (k = 2)
Inverse k-Nearest Neighbor Query

Sue is among the 2 nearest neighbors of Tom
Sue is among the 2 nearest neighbors of Allen
⇒ Allen and Tom are inverse 2-nearest neighbors of Sue

[Figure: the same four points — Sue, Tom, Beverly, Allen]
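As a small illustration of the two queries, here is a brute-force Python sketch (the helper functions and the point coordinates are made up; they are chosen only to reproduce the relationships on the slide):

```python
import numpy as np

def knn(points, query_idx, k):
    """Indices of the k nearest neighbors of points[query_idx] (brute force)."""
    dist = np.linalg.norm(points - points[query_idx], axis=1)
    return [i for i in np.argsort(dist) if i != query_idx][:k]

def inverse_knn(points, query_idx, k):
    """Indices of points that have points[query_idx] among their k nearest neighbors."""
    return [i for i in range(len(points))
            if i != query_idx and query_idx in knn(points, i, k)]

# Toy layout mirroring the slide (coordinates invented for illustration).
names = ["Sue", "Tom", "Beverly", "Allen"]
pts = np.array([[0.0, 0.0], [2.0, -1.5], [4.0, 0.0], [1.0, 2.5]])

print([names[i] for i in knn(pts, names.index("Tom"), k=2)])          # ['Sue', 'Beverly']
print([names[i] for i in inverse_knn(pts, names.index("Sue"), k=2)])  # ['Tom', 'Allen']
```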
Incremental LOF_insertion(Dataset S)
• Given: set S = {p1, …, pN}, pi ∈ R^D, to be inserted into the database
• For each data point pc in data set S:
    insert(pc)
    compute kNN(pc)
    ∀ pj ∈ kNN(pc):
        compute reach-dist_k(pc, pj) using Eq. (1)
    // Update the neighbors of pc
    S_update_k_distance = inverse kNN(pc)
    ∀ pj ∈ S_update_k_distance:
        update k-distance(pj)
    S_update_lrd = S_update_k_distance
    ∀ pj ∈ S_update_k_distance, ∀ pi ∈ kNN(pj) \ {pc}:
        reach-dist_k(pi, pj) = k-distance(pj)
        if pj ∈ kNN(pi): S_update_lrd = S_update_lrd ∪ {pi}
    S_update_LOF = S_update_lrd
    ∀ pm ∈ S_update_lrd:
        update lrd(pm) using Eq. (2)
        S_update_LOF = S_update_LOF ∪ inverse kNN(pm)
    ∀ pl ∈ S_update_LOF:
        update LOF(pl) using Eq. (3)
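The pseudocode above can be mirrored in a short brute-force Python sketch (purely illustrative, not the authors' code: it recomputes neighborhoods exhaustively instead of using an indexing structure, and the class and method names are invented):

```python
import numpy as np

class IncrementalLOFSketch:
    """Illustrative brute-force sketch of incremental LOF insertion."""

    def __init__(self, k):
        self.k = k
        self.pts = []      # inserted points
        self.lrd = {}      # point index -> lrd, Eq. (2)
        self.lof = {}      # point index -> LOF, Eq. (3)

    def _knn(self, i):
        d = [np.linalg.norm(self.pts[i] - q) for q in self.pts]
        return [j for j in sorted(range(len(self.pts)), key=d.__getitem__) if j != i][: self.k]

    def _inverse_knn(self, i):
        return [j for j in range(len(self.pts)) if j != i and i in self._knn(j)]

    def _k_distance(self, i):
        return np.linalg.norm(self.pts[i] - self.pts[self._knn(i)[-1]])

    def _update_lrd(self, i):
        # Eq. (2), with Eq. (1) written in its equivalent max form
        rd = [max(self._k_distance(j), np.linalg.norm(self.pts[i] - self.pts[j]))
              for j in self._knn(i)]
        self.lrd[i] = 1.0 / (sum(rd) / self.k)

    def _update_lof(self, i):
        self.lof[i] = np.mean([self.lrd[j] for j in self._knn(i)]) / self.lrd[i]

    def insert(self, p):
        self.pts.append(np.asarray(p, dtype=float))
        c = len(self.pts) - 1
        if len(self.pts) <= self.k:                    # not enough points yet
            return None
        s_update_k_dist = self._inverse_knn(c)         # k-distance of these points changed
        s_update_lrd = set(s_update_k_dist)
        for j in s_update_k_dist:                      # reach-dist to j changed for ...
            for i in self._knn(j):                     # ... j's current neighbors
                if i != c and j in self._knn(i):
                    s_update_lrd.add(i)
        s_update_lof = set(s_update_lrd)
        for m in s_update_lrd:
            self._update_lrd(m)
            s_update_lof.update(self._inverse_knn(m))
        self._update_lrd(c)                            # lrd and LOF of the new point itself
        for l in s_update_lof | {c}:
            self._update_lof(l)
        return self.lof[c]
```

Repeated calls to insert() then return the LOF score of each arriving record once at least k + 1 points have been seen.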
IncrementalLOF_deletion(Dataset S, S_delete)
• Given: dataset S = {p1, …, pN}, pi ∈ R^D,
  set S_delete = {p1, …, pM}, pi ∈ R^D, to be deleted
• S_update_k_distance = ∅
• ∀ pc ∈ S_delete:
    S_update_k_distance = S_update_k_distance ∪ inverse kNN(pc)
    delete(pc)   // we can delete pc after finding all its inverse nearest neighbors
• S_update_k_distance = S_update_k_distance \ S_delete   // points from S_delete may still be
                                                          // present when computing inverse nearest neighbors
• ∀ pj ∈ S_update_k_distance:
    update k-distance(pj)
• S_update_lrd = S_update_k_distance
• ∀ pj ∈ S_update_k_distance, ∀ pi ∈ (k−1)NN(pj):
    reach-dist_k(pi, pj) = k-distance(pj)
    if pj ∈ kNN(pi) then S_update_lrd = S_update_lrd ∪ {pi}
• S_update_LOF = S_update_lrd
• ∀ pm ∈ S_update_lrd:
    update lrd(pm) using Eq. (2)
    S_update_LOF = S_update_LOF ∪ inverse kNN(pm)
• ∀ pl ∈ S_update_LOF:
    update LOF(pl) using Eq. (3)
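For the deletion side, the key bookkeeping is collecting the points whose k-distance is affected. Below is a minimal sketch of that first stage, under the same brute-force assumptions as the insertion sketch (function names are illustrative):

```python
import numpy as np

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force)."""
    d = np.linalg.norm(points - points[i], axis=1)
    return [j for j in np.argsort(d) if j != i][:k]

def k_distance_update_set(points, delete_idx, k):
    """First stage of IncrementalLOF_deletion: union of the inverse k-NN of every point
    to be deleted, with the deleted points themselves removed at the end (they may still
    be present while the inverse nearest neighbors are being computed)."""
    delete_idx = set(delete_idx)
    s_update_k_distance = set()
    for c in delete_idx:
        s_update_k_distance |= {i for i in range(len(points))
                                if i != c and c in knn(points, i, k)}
    return s_update_k_distance - delete_idx

# Usage: deleting the point at index 4 only affects its close companion at index 5.
pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6]], dtype=float)
print(k_distance_update_set(pts, delete_idx=[4], k=2))   # {5}
```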
Performance of the Incremental LOF Algorithm
• The number of points where reach-dist, lrd and LOF have to be updated is:
  – limited
  – independent of N
• E.g., for insertion:
  – |S_update_k_distance| ≤ F = O(k·2^D)
  – |S_update_lrd| ≤ k·|S_update_k_distance| = O(k²·2^D)
  – |S_update_LOF| ≤ (1 + F)·|S_update_lrd| = O(k³·2^{2D})
• The computational cost of kNN and inverse-kNN queries is O(log N) with a proper data structure
• Therefore: when inserting/deleting N points, incremental LOF performs in O(N log N)
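As a rough numerical illustration (treating the O(·) bounds above as exact counts, an assumption made here only for concreteness), the update-set sizes for, say, k = 5 and D = 2 are small constants that do not involve N:

```python
# Illustrative only: plug k = 5, D = 2 into the bounds above, treated as exact counts.
k, D = 5, 2
F = k * 2**D                      # |S_update_k_distance| <= F              -> 20
lrd_bound = k * F                 # |S_update_lrd| <= k * F                 -> 100
lof_bound = (1 + F) * lrd_bound   # |S_update_LOF| <= (1 + F) * lrd_bound   -> 2100
print(F, lrd_bound, lof_bound)    # none of these numbers depend on N
```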
Simulation Evaluation
• Synthetic data sets with N ∈ {100, 200, …, 5000} records generated
• Dimensionality varied: D ∈ {2, 3, 4, 5, 10}
• For each simulation experiment:
  • the neighborhood size was varied over k ∈ {5, 10, 15, 20}
  • the number of k-distance, reach-dist, lrd and LOF updates was measured
• Experiments repeated 100 times
• Average number of LOF updates reported
Simulation Evaluation

[Figure: number of LOF updates vs. number N of points in the database, k = 5, for dimensions D = 2, 3, 4, 5, 10]


Simulation Evaluation

[Figure: number of LOF updates vs. number N of points in the database, k = 20, for dimensions D = 2, 3, 4, 5, 10]


Simulation Evaluation

[Figure: number of LOF updates vs. neighborhood size k, for dimensions D = 2, 3, 4, 5, 10]


Simulation Evaluation

[Figure: number of LOF updates vs. neighborhood size k, for dimensions D = 2, 3, 4, 5, 10 (note that the abscissa scale is quadratic)]
Application of Incremental LOF

Identifying Distribution with Small Variance


Application of Incremental LOF

Identifying Outbreak of Novel Distribution


Application of Incremental LOF

Outliers in Trajectories
Trajectory Representation
Application of Incremental LOF
• Detection of unusual events in video sequences

[Figure: LOF over video frames for dynamic (incremental) LOF vs. static LOF, with identified transitions marked]
Work in Progress
• Incremental LOF algorithm with “fading
memory”
– Old examples have gradually less and less
influence on the outlier computation
• Incrementalization of other outlier detection
algorithms
• Indexing structures for efficient
implementation of kNN, inverse-kNN
• Computation of tighter bounds for F
Thanks a lot!
• David Mount, CS, Univ. of Maryland, College Park
• Janko Milutinovic, DSU
• Kam Kong, DSU
• Dragana Jankovic, DSU
• Brian Tjaden, Wellesley College, USA
• DOD/DOA for supporting this research
• Dejan Milic, University of Nis
• Nenad Milosevic, University of Nis
• Aleksandra Vidovic, University of Nis
• Zoran Peric, University of Nis
• Jelena Nikolic, University of Nis
• Marko Petrovic, University of Nis
• Aleksandra Jovanovic, University of Nis

For further questions, please contact me at:


dpokraja@desu.edu
dragoljub.pokrajac@comcast.net
