
ARSA - Advanced Research in Scientific Areas - Virtual Conference
December 2-6, 2013

Methodology for rare events prediction


Francisco Soler-Flores
Dep. de Ingeniería Civil: Transportes. ETSICCP.
Universidad Politécnica de Madrid
Madrid, Spain
f.soler@upm.es

José Ángel Olivas Varela
Departamento de Tecnologías y Sistemas de Información
Escuela Superior de Informática
Universidad de Castilla la Mancha
Ciudad Real, Spain
joseangel.olivas@uclm.es

Abstract: The theory and applications of low-probability events have become popular in recent years because of their practical importance in many fields, such as insurance, finance, engineering and environmental science. This work presents a methodology for predicting rare events based on Bayesian networks, which in turn allows alternative scenarios to be considered.

Keywords: Rare events, Bayesian networks, Poisson, Naive-Poisson

I. INTRODUCTION

The treatment of rare events, that is, events that occur with low probability, is a complex problem framed in the fields of uncertainty modeling and decision theory, where the study of risk is important. In decision-theoretic terms, risk is defined as the average loss, or the loss forecast when something bad happens. The 'Law of Rare Events', demonstrated by Poisson [1], gave the concept of rare occurrence its mathematical basis. Bortkiewicz also called this law the 'law of small numbers' [2].

In recent decades numerous techniques for data analysis and modeling have been developed in different areas of statistics ([3], [4]) and artificial intelligence [5]. Data mining [6] is a modern interdisciplinary area that includes techniques which operate automatically (requiring minimal human intervention) and are effective for working with the large amounts of information available in the databases of many practical problems. These techniques can extract useful knowledge (associations between variables, rules, patterns, etc.) from the stored raw data, thus enabling better analysis and understanding of the problem. In some cases this knowledge can also be post-processed so that conclusions are drawn automatically, and even decisions are made almost automatically, in specific practical situations (intelligent systems). The practical application of these disciplines extends to many commercial and research areas that involve problems of prediction, classification or diagnosis.

This paper proposes the use of Bayesian networks to estimate the occurrence of rare events. Its aim is to present a methodology that, starting from the raw data commonly available when studying any problem within the data analysis framework, systematizes the study of rare events and makes it possible to study the different situations that affect their probability of occurrence.

The work is divided into three main sections:

• Data Mining and Bayesian networks: the basic concepts that support the methodology are described.
• Proposed Methodology: the proposed method is described in detail.
• Conclusions.

II. DATA MINING AND BAYESIAN NETWORKS

Data mining ([7], [8]) is the set of techniques and tools applied to the nontrivial process of extracting and presenting implicit, previously unknown, potentially useful and humanly understandable knowledge from large data sets, in order to predict trends and behaviors in an automated way [9]. The term Intelligent Data Mining [8] refers specifically to the application of machine learning methods [10] to discover and enumerate patterns in the data. A large number of analysis methods based on statistics were developed for this purpose. As the amount of information stored in databases increased [11], these methods ran into problems of efficiency and scalability, and this is when the concept of data mining appeared. One difference between traditional data analysis and data mining is that the former assumes that the hypotheses are already built and are validated against the data, while the latter assumes that patterns and hypotheses are extracted automatically from the data. The tasks of data mining can be classified into two categories: descriptive data mining and predictive data mining [12]. Different works, such as [5] and [13], apply data mining to the treatment of rare events.

A. BAYESIAN NETWORKS

Among the various techniques available in data mining, Bayesian networks or probabilistic networks make it possible to model jointly all the relevant information for a given problem and then to use probabilistic inference mechanisms to draw conclusions based on the available evidence. Bayesian networks have been used to estimate the occurrence of rare events in some studies ([14], [15]), although without establishing a general estimation method.

Bayesian networks ([16], [17]) are a compact representation of a multivariate probability distribution. Formally, a Bayesian network is a directed acyclic graph where each node represents a random variable and the dependencies between variables are encoded in the structure of the graph according to the criterion of d-separation [18]. Associated with each node in the network

The 2nd year of Advanced Research in Scientific Areas SECTION


http://www.arsa-conf.com Informatics - 459 -

is a conditional probability distribution of that node given its parents, so that the joint distribution factorizes as the product of the conditional distributions associated with the nodes of the network. That is, for a network with n variables X_1, ..., X_n (equation 1):

$$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{pa}(x_i)) \qquad (1)$$

where pa(x_i) denotes the values taken by the parents of X_i in the graph.

Bayesian networks usually consider discrete or nominal variables, so variables that are not must be discretized before building the model. Although there are Bayesian network models with continuous variables, these are limited to Gaussian variables and linear relationships. Discretization methods are divided into two main types: unsupervised and supervised [19]. The concept of causality [20] in a Bayesian network leads to a particular case called a causal network [21]. Bayesian networks may have a causal interpretation, and although they are often used to represent causal relationships, the model does not have to represent them in this way. Naive Bayes (Section 2.2) is an example: its relationships are not causal.

Probabilistic networks automate the process of probabilistic modeling [22] using the expressiveness of graphs. The resulting models combine results from graph theory (to represent the relations of dependence and independence among the variables) and from probability theory (to quantify these relationships). This union allows both efficient machine learning of the model, through the calculation of parameters [23], which for binary variables is modeled by a Beta distribution and for multivalued variables by its extension, the Dirichlet distribution (Table I), and inference from the available evidence. The knowledge base of such systems is an estimate of the joint probability function of all the variables of the model, while the reasoning module is where the conditional probabilities are calculated. The study of this technique provides a good overall view of the problem of statistical learning and data mining.

TABLE I. ESTIMATORS

Estimator                             Expression
Maximum likelihood (Multinomial)
Bayesian estimation (Dirichlet)

Artificial Intelligence is concerned with algorithmic solutions at an acceptable computational cost, while statistics has been more concerned with the power of generalization of the results obtained, that is, with being able to extend the results to situations more general than those studied.

The Naive Bayes model is widely used because it has certain advantages:

• Generally, it is simple to build and understand.
• Induction is extremely fast, requiring only a single step.
• It is very robust to irrelevant attributes.
• It takes evidence from many attributes to make the final prediction.

In the sense that its predictive capacity is competitive with other existing classifiers, the classifier called Naive Bayes, described for example in [24] and [25], is one of the most effective classifiers. This classifier learns from a training set the conditional probability of each attribute X_i given the class C. Classification is then done by applying Bayes' rule to calculate the probability of C given the instances of X_1, X_2, ..., X_n, and taking as the prediction the class with the highest posterior probability. These calculations are based on a strong independence assumption: all attributes X_i are conditionally independent given the value of the class C.

As noted above, Naive Bayes classification assumes independence between the attributes given the class, and its structure is already given, so only the probabilities of the attribute values given the class have to be learned. Since the predictors are assumed to be conditionally independent given the variable C, we obtain (equation 2):

$$P(C \mid X_1, \dots, X_n) \propto P(C) \prod_{i=1}^{n} P(X_i \mid C) \qquad (2)$$

The Naive Bayes model combined with the Poisson distribution is used for text classification in [27], with good results. This paper proposes combining data mining using Bayesian networks with a known probability distribution for the study of rare events. With the distribution known and the values of the variables used as predictors, the different possible situations can be studied and it can be observed when the occurrence of a rare event is more likely.

III. METHODOLOGY

The methodology presented in this work is detailed below. Its purpose is to systematize the study of rare events and to provide a tool for studying, a priori, any problem from its raw data.

The phases of this methodology are:

• Preprocessing of data
  1. Selection of variables and collection of their values.
  2. Discretization of variables.
• Construction of the Naive Bayes network structure
• Application of the Naive-Poisson model

A. Preprocessing of data

Bayesian networks usually consider discrete or nominal variables, so variables that are not must be discretized before building the model. Although there are Bayesian network models with continuous variables, these are limited to Gaussian variables and linear relationships. Discretization methods are divided into two main types: unsupervised and supervised [19].
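As an illustration of the unsupervised family of discretization methods, the following minimal Python sketch shows two common schemes, equal-width and equal-frequency binning. It is only an example: the function names and the sample data are hypothetical, and the paper does not prescribe a particular discretization method.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to a bin index in 0..len(edges); a value equal to a cut point goes to the upper bin."""
    return [bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    # Hypothetical continuous predictor with one extreme observation.
    sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(discretize(sample, equal_width_edges(sample, 2)))
    print(discretize(sample, equal_frequency_edges(sample, 2)))
```

With skewed data such as the sample above, equal-width binning puts almost every observation into one bin, while equal-frequency binning balances the bins. This matters for rare events, where the values of interest are by definition sparsely populated.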


In this data preparation phase, the data are formatted so that they can be handled by the software tools. This phase also covers the selection of variables and the discretization of continuous variables.

This step also identifies the variable whose rare events are to be estimated.

B. Construction of the Bayesian network

Starting from the previous phase, the construction of the network consists of the following:

• Identification of the structure: identify causal relationships and analyze the variables for dependencies and independencies.
• Calculation of parameters (probabilities): quantify relationships and interactions.

In this case a specific model, Naive Bayes, is proposed for the classification process, and the variable for which the rare events are to be estimated is the parent variable [18].

The methodology is developed for the Naive Bayes model, which is used because it presents, among others, certain advantages:

• Generally, it is simple to build and understand.
• Induction is extremely fast, requiring only a single step.
• It is very robust to irrelevant attributes.
• It takes evidence from many attributes to make the final prediction.

C. Naive-Poisson model

The model or procedure by which, from the Bayesian network built with the Naive Bayes model, we obtain the probability distribution used to estimate the probability of occurrence of a rare event is called Naive-Poisson [28]. The process of assigning the probability distribution consists of the following steps:

• Poisson assumption
• Construction of the probability distribution

The distribution of the frequency of rare events is consistent with a Poisson distribution [1]. Thus, for model building it is assumed that the frequency of rare events follows a Poisson distribution. After the values of the actual distribution are obtained, the data are fitted to this type of distribution ([28], [29]). The constructed Bayesian network provides the probability estimates for each of the values of the variables produced by the discretization. The results obtained from the network are then fitted with a Poisson distribution, which, as we know, is determined by its mean λ (equation 4):

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots \qquad (4)$$

To calculate the parameter that determines the Poisson distribution, its mean, we take its maximum likelihood estimator, given by equation 5, where x_i are the discrete values of the accident variable and p(x_i) are the probability values provided by the network built with the Naive Bayes algorithm. The class variable C is the discrete variable under study, whose values we consider rare events, and the remaining variables are used as predictors.

Equation 5

For each stratum the actual values of the accident frequency are taken, except for the stratum for which a value given by the average number of accidents above the stratum is assigned (equation 3).

Equation 3

In this way we obtain the Poisson distribution (equation 4) associated with any of the situations to be studied, that is, with any set of values of the selected variables.

Thus, it is possible to determine in which situation low-probability events are more likely. These events are identified as the values of the variable under study to which the given probability distribution assigns the lowest probabilities.

Figures 3 and 4 show two different case studies for a variable. Figure 3 shows the case of a variable with five values: five options and their resulting probability distributions. Likewise, Figure 4 shows the two different distributions obtained for a variable with two values, that is, two strata.

Figure 3

Figure 4
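The Naive-Poisson procedure of Section III.C can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names, the toy data and the smoothing constant `alpha` are hypothetical. The Poisson mean is assumed to be the expected value of the class variable under the Naive Bayes posterior, that is, the sum of x_i p(x_i) over the strata, which is one plausible reading of equation 5.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, class_values, alpha=1.0):
    """rows: list of (predictor_tuple, class_value) with discrete values.

    Assumption: additive (Laplace/Dirichlet-style) smoothing with `alpha`;
    the paper does not state which estimator from Table I it uses.
    """
    class_counts = Counter(c for _, c in rows)
    feature_counts = defaultdict(Counter)  # (predictor index, class) -> value counts
    domains = defaultdict(set)             # predictor index -> observed values
    for predictors, c in rows:
        for j, v in enumerate(predictors):
            feature_counts[(j, c)][v] += 1
            domains[j].add(v)
    return class_counts, feature_counts, domains, list(class_values), alpha


def posterior(model, evidence):
    """P(C | evidence) under the Naive Bayes assumption; evidence maps predictor index -> value."""
    class_counts, feature_counts, domains, class_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c in class_values:
        score = (class_counts[c] + alpha) / (total + alpha * len(class_values))
        for j, v in evidence.items():
            score *= (feature_counts[(j, c)][v] + alpha) / (
                class_counts[c] + alpha * len(domains[j])
            )
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


def poisson_mean(post):
    """Assumed form of equation 5: lambda = sum_i x_i * p(x_i)."""
    return sum(x * p for x, p in post.items())


def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


if __name__ == "__main__":
    # Hypothetical data: one predictor (road surface), class = accident count stratum.
    data = [(("dry",), 0)] * 6 + [(("dry",), 1)] * 2 + [(("wet",), 1)] * 3 + [(("wet",), 2)] * 2
    model = fit_naive_bayes(data, class_values=[0, 1, 2])
    for surface in ("dry", "wet"):
        lam = poisson_mean(posterior(model, {0: surface}))
        print(surface, round(lam, 3), [round(poisson_pmf(k, lam), 3) for k in range(4)])
```

Comparing the mean obtained for different sets of evidence gives the kind of scenario comparison the methodology describes: each combination of predictor values yields its own Poisson distribution over the event count.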


IV. RESULTS AND CONCLUSIONS

Figure 5. Comparison

This paper presents a methodology with which low-probability or rare events can easily be analyzed from any data set once the variable under study has been selected. It also proposes comparing different situations in order to control the probability with which such events occur (Figure 5).

The methodology presented allows the systematic study of the occurrence of the events under study and the proposal of alternatives to control their occurrence. Open lines of future work include the development of software that allows intuitive use, the study of goodness-of-fit measures to complete its application, and a comparison with other artificial intelligence models.

ACKNOWLEDGMENT

The author is grateful for the support of the 'Departamento de Tecnologías y Sistemas de Información' (Universidad de Castilla la Mancha) in conducting the PhD thesis in the program "Tecnologías informáticas avanzadas".

REFERENCES

[1] Siméon Denis Poisson. Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Bachelier, 1837.
[2] Ladislaus Bortkiewicz. Das Gesetz der kleinen Zahlen (The law of small numbers). Leipzig, Germany: Teubner, 1898.
[3] Michael Tomz, Gary King, and Langche Zeng. ReLogit: Rare events logistic regression. Journal of Statistical Software, 8(i02), 2003.
[4] F. Soler-Flores, J. M. Pardillo Mayora, and R. Jurado Piña. Tratamiento de outliers en los modelos de predicción de accidentes de tráfico. VIII CIT, 02/07/2008-04/07/2008, La Coruña, España, 2008.
[5] Chris Seiffert, Taghi M. Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. Mining data with rare events: a case study. In Tools with Artificial Intelligence, 2007. ICTAI 2007. 19th IEEE International Conference on, volume 2, pages 132–139. IEEE, 2007.
[6] D. E. Holmes, J. Tweedale, and L. C. Jain. Data mining techniques in clustering, association and classification. Data Mining: Foundations and Intelligent Paradigms, pages 1–6, 2012.
[7] D. J. Hand. Data mining: statistics and more? The American Statistician, 52(2):112–118, 1998.
[8] E. Castillo, J. M. Gutiérrez, and A. S. Hadi. Expert systems and probabilistic network models. Springer Verlag, 1997.
[9] J. Bromley, N. A. Jackson, O. J. Clymer, A. M. Giacomello, and F. V. Jensen. The use of Hugin to develop Bayesian networks as an aid to integrated water resource planning. Environmental Modelling and Software, 20(2):231–242, 2005.
[10] K. J. Cios, W. Pedrycz, and R. W. Swiniarski. Data mining: A knowledge discovery approach. Springer Verlag, 2007.
[11] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17(3):37, 1996.
[12] S. Acid and L. de Campos. Approximations of causal networks by polytrees: an empirical study. Advances in Intelligent Computing - IPMU'94, pages 149–158, 1995.
[13] Gary M. Weiss. Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 6(1):7–19, 2004.
[14] Seong-Pyo Cheon, Sungshin Kim, So-Young Lee, and Chong-Bum Lee. Bayesian networks based rare event prediction with sensor data. Knowledge-Based Systems, 22(5):336–343, 2009.
[15] A. Ebrahimi and T. Daemi. Considering the rare events in construction of the Bayesian network associated with power systems. In Probabilistic Methods Applied to Power Systems (PMAPS), 2010 IEEE 11th International Conference on, pages 659–663. IEEE, 2010.
[16] J. H. Kim and J. Pearl. A computational model for causal and diagnostic reasoning in inference systems. In Proceedings of the 8th International Joint Conference on Artificial Intelligence, pages 190–193. Citeseer, 1983.
[17] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[18] E. Castillo, J. M. Gutiérrez, and A. S. Hadi. Expert systems and probabilistic network models. Springer Verlag, 1997.
[19] James Dougherty, Ron Kohavi, and Mehran Sahami. Supervised and unsupervised discretization of continuous features. In ICML, pages 194–202, 1995.
[20] Cristina Puente Agueda. Causality in science. Pensamiento Matemático, (1):12, 2011.
[21] Judea Pearl. Causality: models, reasoning and inference, volume 29. Cambridge Univ Press, 2000.
[22] Laura Uusitalo. Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling, 203(3):312–318, 2007.
[23] Luis Enrique Sucar. Redes bayesianas.
[24] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. Wiley, NY, 2000.
[25] P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In Proceedings of the National Conference on Artificial Intelligence, pages 223–223. John Wiley and Sons Ltd, 1992.
[26] José Antonio Gámez and José Miguel Puerta. Sistemas expertos probabilísticos, volume 20. Univ de Castilla La Mancha, 1998.
[27] Sang-Bum Kim, Hee-Cheol Seo, and Hae-Chang Rim. Poisson naive Bayes for text classification with feature weighting. In Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11, pages 33–40. Association for Computational Linguistics, 2003.
[28] F. Soler-Flores. Naive-Poisson, a mathematical model for road accidents frequency estimation. Conference of Informatics and Management Sciences, pages 384–391, 2013.
[29] F. Soler-Flores. Expert system for road accidents frequency estimation based in Naive-Poisson. Global Virtual Conference, page 646, 2013.
