Improving LIDAR
1. Introduction
Association rule mining (Ceglar and Roddick, 2006) is an active research
topic. Its applications range from early warning in food supply networks
(Beulens et al., 2006) to spatial data analysis (Ding et al., 2006; Koperski
and Han, 1995). Traditional association rules are useful for discovering
potentially interesting patterns in databases, but in some cases there are
millions of rules to be analyzed, and the discovery of infrequent or hidden
patterns is difficult (if not impossible) because in most cases such patterns
are excluded by the traditional definition of the association rule (AR).
∗ Corresponding author
Email addresses: mabalderas@acm.org (Antonio Balderas Cepeda),
wsalinas@uat.edu.mx (Wilver-Enrique Salinas Castillo), mvogel@uat.edu.mx (Martín-Eleno
Vogel Vazquez), sjimenez@uat.edu.mx (Sergio-Bernardo Jímenez Hernández)
proposed techniques for the improvement of classification of LiDAR data points
on land and non-land areas. Although there are some proposals (Ding et al.,
2006; Koperski and Han, 1995; Bogorny et al., 2008; Mennis and Liu, 2005) to
discover knowledge from spatial datasets, in this paper we propose a new filter
for HARs (called IDENT) that, in combination with interestingness measures,
avoids the generation of millions of hidden association rules and provides
a useful ruleset.
This document presents ongoing research. We propose filtering techniques
to discover compact and useful hidden rulesets. Moreover, the proposed approach
to reducing the number of anomalous rules obtained can be parallelized
immediately. Furthermore, the rules obtained can be useful in
bio-surveillance, credit screening, marketing, and any application that requires
the identification of rare and significant patterns.
In the following sections we define our proposal and then discuss the
experimental results obtained from the analyzed LiDAR data.
2. Definitions
To define the filter called IDENT and the adaptation of the Certainty Factor
as a filter, we must recall the definitions of hidden anomalous rules. Let
$I = \{I_1, I_2, \ldots, I_m\}$ be a set of binary attributes called items. Let
$T$ be a database of transactions. Each transaction $t$ is represented as a
binary vector, with $t[k] = 1$ if $t$ contains the item $I_k$ and $t[k] = 0$
otherwise. The database contains one tuple for each transaction; a transaction
$t$ contains $X$ (a set of some items in $I$) if $t[k] = 1$ for every item
$I_k$ in $X$.
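The containment test above can be sketched in Java, the language of our implementation. This is a minimal illustration only: the class name `TransactionContains`, its helper methods, and the 0-based item indices are assumptions made for the example, not part of the original implementation.

```java
import java.util.BitSet;

final class TransactionContains {
    // Builds a binary vector with bit k set when item I_k is present.
    // Item indices are 0-based here (an assumption of this sketch).
    static BitSet transaction(int... items) {
        BitSet t = new BitSet();
        for (int k : items) {
            t.set(k);
        }
        return t;
    }

    // True when t[k] = 1 for every item k of the itemset x.
    static boolean contains(BitSet t, BitSet x) {
        BitSet missing = (BitSet) x.clone();
        missing.andNot(t); // drop the items of x that t also has
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet t = transaction(0, 2, 3);
        System.out.println(contains(t, transaction(0, 3))); // prints true
        System.out.println(contains(t, transaction(1, 3))); // prints false
    }
}
```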
A canonical anomalous association rule (CAAR) is an associative and
implicative pattern of the form $X \Rightarrow A_j \mid Y_l$, where $X$ is a
set of some items in $I$, and $A_j$ and $Y_l$ are single items in $I$ that are
not present in $X$. The confidence of the rule is the percentage of the
transactions in $T$ containing $X \neg Y_l$ that also contain $A_j$. The
confidence and support of a rule are therefore given by
$\mathrm{conf}(X \Rightarrow A_j \mid Y_l) = \Pr(A_j \mid X \neg Y_l)$ and
$\mathrm{supp}(X \Rightarrow A_j \mid Y_l) = \Pr(A_j X \neg Y_l)$. The support
of an itemset $X$ is the number of transactions in $T$ that contain $X$.
Support is a measure of statistical significance, while confidence measures
the strength of the implication.
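As a worked illustration of these two measures, the following Java sketch computes them on a toy database. The class `CaarMeasures`, its method names, and the five-transaction database are hypothetical and chosen only for the example; transactions are modeled as sets of item labels rather than binary vectors.

```java
import java.util.List;
import java.util.Set;

final class CaarMeasures {
    // Number of transactions that contain all of x and do not contain y.
    static long countXNotY(List<Set<String>> db, Set<String> x, String y) {
        return db.stream()
                .filter(t -> t.containsAll(x) && !t.contains(y))
                .count();
    }

    // Number of transactions that contain a and all of x, and do not contain y.
    static long countAXNotY(List<Set<String>> db, Set<String> x, String a, String y) {
        return db.stream()
                .filter(t -> t.contains(a) && t.containsAll(x) && !t.contains(y))
                .count();
    }

    // supp(X => A | Y) = Pr(A X notY)
    static double support(List<Set<String>> db, Set<String> x, String a, String y) {
        return (double) countAXNotY(db, x, a, y) / db.size();
    }

    // conf(X => A | Y) = Pr(A | X notY); defined as 0 when no transaction holds X notY.
    static double confidence(List<Set<String>> db, Set<String> x, String a, String y) {
        long base = countXNotY(db, x, y);
        return base == 0 ? 0.0 : (double) countAXNotY(db, x, a, y) / base;
    }

    // Toy database, invented for this example.
    static List<Set<String>> toyDatabase() {
        return List.of(
                Set.of("x", "a"), Set.of("x", "a"), Set.of("x", "y"),
                Set.of("x", "b"), Set.of("b"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = toyDatabase();
        // Three transactions hold x without y; two of them also hold a.
        System.out.println(support(db, Set.of("x"), "a", "y"));    // 2 of 5 transactions
        System.out.println(confidence(db, Set.of("x"), "a", "y")); // 2 of 3 transactions
    }
}
```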
Let $X$ be an itemset. $X + H$ is an extension of $X$ iff $X \cap H = \emptyset$,
and we write it $XH$.
A general filter over CAARs is defined to preserve general rules. Let a CAAR
be $X \Rightarrow A_j \mid Y_l$. Every rule with an extended antecedent, that
is, every rule $XH \Rightarrow A_j \mid Y_l$, is pruned. This filter will be
called AP.
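One possible reading of the AP filter is sketched below in Java: a rule is dropped whenever another rule with the same $A_j$ and $Y_l$ has an antecedent that is a proper subset of its own. The `Rule` record and the `prune` method are hypothetical names for this illustration, and the quadratic scan is chosen for brevity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class AntecedentPruning {
    // A CAAR X => A | Y, with the antecedent as a set of item labels.
    record Rule(Set<String> x, String a, String y) {}

    // Keeps only the rules whose antecedent is not an extension XH of the
    // antecedent X of another rule with the same A and Y.
    static List<Rule> prune(List<Rule> rules) {
        List<Rule> kept = new ArrayList<>();
        for (Rule r : rules) {
            boolean isExtension = rules.stream().anyMatch(g ->
                    g.a().equals(r.a())
                    && g.y().equals(r.y())
                    && r.x().size() > g.x().size()
                    && r.x().containsAll(g.x()));
            if (!isExtension) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule(Set.of("x"), "a", "y"),
                new Rule(Set.of("x", "h"), "a", "y"),  // extension of the first rule
                new Rule(Set.of("x", "h"), "b", "y")); // different A, so it is kept
        System.out.println(prune(rules).size()); // prints 2
    }
}
```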
A significance filter is defined using the conditional probability. Given a
rule $X \Rightarrow A_j \mid Y_l$, the rule's significance is given by the
percentage of transactions that contain $A_j$ and also contain $X \neg Y_l$.
This filter will be called SIG.
$X \Rightarrow A_j \mid Y_l$ and $V \Rightarrow A_j \mid Y_l$, if
$\mathrm{supp}(A_j V X) = \mathrm{supp}(A_j X Y_l)$ and
$\mathrm{supp}(A_j V X) = \mathrm{supp}(A_j X)$, and the support of both rules
be the same value. These rules relate to the same transactions; they are
correlated.
2.1. Constraints
To provide a concise anomalous ruleset, we use minimum thresholds as follows:
a minimum confidence threshold (to measure the strength of the association),
a minimum support threshold (to give statistical significance to the obtained
pattern), an absolute minimum support threshold (to avoid extracting very low
support patterns backed by only one or two transactions in $T$), a minimum
attribute domain threshold (in the case of relational databases, to reveal
hidden patterns and avoid obtaining a binary choice), and a minimum
significance threshold (to measure the strength of the anomalous pattern
obtained):
$\mathrm{conf}(X \Rightarrow A_j \mid Y_l) \geq \theta$,
$\mathrm{conf}(X Y_l \Rightarrow \neg A_j) \geq \theta$,
$\mathrm{minSupp}(X \Rightarrow A_j \mid Y_l) \geq \epsilon$,
$\mathrm{absMinSupp}(X \Rightarrow A_j \mid Y_l) \geq \omega$,
$\mathrm{minDom}(X \Rightarrow A_j \mid Y_l) \geq \beta$,
$\mathrm{minSig}(X \Rightarrow A_j \mid Y_l) \geq \gamma$.
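The conjunction of these constraints can be sketched as a single predicate. In this hypothetical Java illustration the `Thresholds` and `Measures` records and the `passes` method are assumed names; the rule measures are taken as already computed, and the threshold values in `main` are the ones used in Section 4.

```java
final class CaarThresholds {
    // Field names mirror the threshold symbols used in the text.
    record Thresholds(double theta, double epsilon, int omega, int beta, double gamma) {}

    // Measures of one candidate rule, assumed to be computed elsewhere.
    record Measures(double confAnomalous, double confDominant, double supp,
                    int absSupp, int domainSize, double sig) {}

    // A rule is kept only when every constraint holds.
    static boolean passes(Measures m, Thresholds t) {
        return m.confAnomalous() >= t.theta()   // conf(X => A | Y) >= theta
            && m.confDominant() >= t.theta()    // conf(XY => not A) >= theta
            && m.supp() >= t.epsilon()          // relative support
            && m.absSupp() >= t.omega()         // absolute support (transaction count)
            && m.domainSize() >= t.beta()       // attribute domain size
            && m.sig() >= t.gamma();            // significance
    }

    public static void main(String[] args) {
        // minConf = 75%, minSupp = 5%, absMinSupp = 3, minDom = 4, minSig = 1/4
        Thresholds t = new Thresholds(0.75, 0.05, 3, 4, 0.25);
        Measures m = new Measures(0.80, 0.90, 0.06, 5, 6, 0.30); // invented values
        System.out.println(passes(m, t)); // prints true
    }
}
```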
3. Algorithm
Figure 1: Tree representation of the data structure stored in memory. This
structure is generated by our algorithm (called IMAAM). The k column gives the
size of the k-itemsets at each level of the tree, while the spotted cells in
the tree correspond to infrequent k-itemsets.
this stage some filters (SIG, CF) can be applied. Lastly, we perform the filtering,
grouping and ranking.
The most computationally intensive stage is the first one. The algorithm
must obtain the support of every k-itemset; in general terms, this process is
as follows:
supportEx {
    tree.start(minSupp, absMinSupp, minDom)
    tree.generateRoot()            // generate the 1-itemsets
    while (k <= userLimit && LB != empty) {
        tree.reorderLB()           // LB: last branch
        tree.copyLB()
    }
}
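To make the idea of obtaining the support of every k-itemset level by level concrete, here is a generic Apriori-style sketch in Java. It is not the IMAAM tree structure: the class `LevelWiseSupport`, its `count` method, and the toy database are assumptions for illustration, and candidates are stored in a flat map rather than a tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LevelWiseSupport {
    // Returns the transaction count of every candidate itemset generated up to
    // size maxK. Candidates of size k+1 are unions of two itemsets of size k
    // that each reached minCount.
    static Map<Set<String>, Integer> count(List<Set<String>> db, int maxK, int minCount) {
        Map<Set<String>, Integer> counts = new HashMap<>();

        // Level 1: one candidate per distinct item.
        Set<String> items = new TreeSet<>();
        db.forEach(items::addAll);
        List<Set<String>> level = new ArrayList<>();
        for (String item : items) {
            level.add(new TreeSet<>(Set.of(item)));
        }

        for (int k = 1; k <= maxK && !level.isEmpty(); k++) {
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> candidate : level) {
                int n = (int) db.stream().filter(t -> t.containsAll(candidate)).count();
                counts.put(candidate, n); // infrequent candidates are recorded too
                if (n >= minCount) {
                    frequent.add(candidate);
                }
            }
            // Join step: unions of two frequent k-itemsets that have size k + 1.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : frequent) {
                for (Set<String> b : frequent) {
                    Set<String> union = new TreeSet<>(a);
                    union.addAll(b);
                    if (union.size() == k + 1) {
                        next.add(union);
                    }
                }
            }
            level = new ArrayList<>(next);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
                Set.of("a", "b", "c"), Set.of("a", "b"), Set.of("a", "c"), Set.of("b"));
        count(db, 3, 2).forEach((itemset, n) -> System.out.println(itemset + " -> " + n));
    }
}
```

The loop stops when the size limit is reached or when no candidate survives, which mirrors the two conditions of the `while` loop in the pseudocode above.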
We performed the analysis and application using a Turion II M600 laptop with
4 GB of RAM and a Sun Ultra 40 workstation with 16 GB of RAM and four
processors, and used the MARS 6 software (Merrick Co.) and AutoCAD Map3D
Figure 2: The terrain model obtained from the LiDAR dataset used in this proposal.
4. Application
Our pipeline can be described as follows: the continuous values were
discretized into 10 equi-width bins and then processed by our implementation
in Java. The resulting CAARs were modeled again in AutoCAD Map3D. We evaluated
our technique using the following minimum thresholds: minDom = 4,
absMinSupp = 3, minSig = 1/4, minSupp = 5% and 10%, and minConf = 75% and 90%.
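The equi-width discretization step can be sketched as follows. This is a minimal illustration, not our production code: the class `EquiWidthBinning`, the `bin` method, and the sample elevation range are assumptions made for the example.

```java
final class EquiWidthBinning {
    // Maps value to one of `bins` equal-width intervals over [min, max] and
    // returns the interval index in 0..bins-1.
    static int bin(double value, double min, double max, int bins) {
        if (max <= min) {
            return 0; // degenerate range: everything falls in a single bin
        }
        int index = (int) ((value - min) / (max - min) * bins);
        // Clamp so that value == max (or an out-of-range value) gets a valid bin.
        return Math.min(Math.max(index, 0), bins - 1);
    }

    public static void main(String[] args) {
        // Hypothetical elevations in metres over an assumed range [0, 100].
        double[] elevations = {0.0, 12.5, 55.0, 100.0};
        for (double z : elevations) {
            System.out.println(z + " -> bin " + bin(z, 0.0, 100.0, 10));
        }
    }
}
```

Each continuous attribute then becomes a categorical attribute with at most 10 values, which is the form the itemset generation stage expects.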
Figure 3 illustrates a CAAR that identified a point cloud corresponding to a
vegetation layer that had been wrongly classified as terrain; hence, CAARs can
be used to improve the classification of LiDAR data. Quantitatively, the
number of CAARs obtained was under 30 in every case, a set manageable by a
human (and by a program). Therefore, our filtering approach was useful, since
we were able to improve the LiDAR data analysis with a manageable set of
CAARs. Processing the five million records took about two seconds at the
filtering stage and seven minutes at the itemset and rule generation stages.
Figure 3: The red point cloud shows a hidden association rule. The background
layer is the terrain model shown in a previous figure.
5. Conclusions
Balderas, M.-A., 2010. Rare Association Rule Mining and Knowledge Discov-
ery: Technologies for Infrequent and Critical Event Detection. IGI-Global, Ch.
Mining Hidden Association Rules from Real-Life Data, pp. 168–184.
Beulens, A., Li, Y., Kramer, M., van der Vorst, J., 2006. Possibilities for applying
data mining for early warning in food supply networks. In: 20th Workshop on
Methodologies and Tools for Complex System Modeling and Integrated Policy
Assessment. International Institute for Applied Systems Analysis.
Bezerra, F., Wainer, J., Aalst, W. M. P., 2009. Anomaly Detection Using Process
Mining. Vol. 29 of Lecture Notes in Business Information Processing. Springer
Berlin Heidelberg, pp. 149–161.
Bogorny, V., Kuijpers, B., Alvares, L. O., January 2008. Reducing uninteresting
spatial association rules in geographic databases using background knowledge:
a summary of results. International Journal of Geographical Information
Science 22, 361–386.
Ceglar, A., Roddick, J. F., 2006. Association mining. ACM Computing Surveys
38 (2), 5.
Das, K., Schneider, J., 2007. Detecting anomalous records in categorical datasets.
In: Proceedings of the 13th ACM SIGKDD international conference on Knowl-
edge discovery and data mining. KDD ’07. ACM, New York, NY, USA, pp.
220–229.
Ding, W., Eick, C. F., Wang, J., Yuan, X., 2006. A framework for regional asso-
ciation rule mining in spatial datasets. IEEE International Conference on Data
Mining 0, 851–856.
Koh, Y., Rountree, N., 2005. Finding sporadic rules using apriori-inverse. In:
Proceedings of the Advances in Knowledge Discovery and Data Mining, 9th
Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer,
pp. 97–106.
Koperski, K., Han, J., 1995. Discovery of spatial association rules in geographic
information databases. In: Lecture Notes in Computer Science - Advances in
Spatial Databases. Springer-Verlag, pp. 47–66.
Mennis, J., Liu, J. W., 2005. Mining association rules in spatio-temporal data: An
analysis of urban socioeconomic and land cover change. Transactions in GIS
9 (1), 5–17.
Webb, A. G. I., 2006. Discovering significant rules. In: Proceedings of the 12th
ACM SIGKDD international conference on Knowledge discovery and data
mining. ACM Press, Philadelphia, PA, USA, pp. 434–443.