

ISSN 2319 – 1953

International Journal of Scientific Research in Computer Science Applications and Management Studies

Naïve Bayesian based Algorithm for Incremental Distributed Classification of Wireless Sensor Data
G. Das#1, A. Das*2
#1 Department of Computer Science Engineering, Assam Don Bosco University, Guwahati, Assam, India
*2 Department of Computer Science, St. Anthony's College, Shillong, Meghalaya, India
1 ganapati_online@yahoo.com, 2 anjan_sh@rediffmail.com

Abstract— Wireless sensor networks (WSNs) generate a huge amount of data in the form of a stream. Classification of this stream is often used to extract knowledge from the data collected by the WSN. Most classification algorithms learn from training data at the beginning to develop a classification model and then use this model to classify future unclassified data. Most of these models are unable to incorporate new knowledge at a later stage, yet in many cases new knowledge must be incorporated as new concepts are discovered over time. Classification algorithms used on such data streams must therefore be able to learn incrementally as newly classified data is introduced, while retaining the knowledge learned earlier. In addition, the limited battery life of sensor nodes and their large-scale deployment demand a distributed strategy that overcomes the limitations of a sensor network in terms of both processing capability and network lifetime. To address this problem, a novel algorithm for distributed incremental learning of classification knowledge, based on the naïve Bayesian classifier, is introduced in this paper. The incremental learning capability of the algorithm is demonstrated through simulated experiments on real data.

Keywords— Data mining, Wireless Sensor Networks, Classification, Naïve Bayesian

I. INTRODUCTION
Classification is a form of data analysis that extracts models describing important data classes [1]. In the process of classification, a model or classifier is constructed to predict class labels. Data classification is a two-step process: the first phase is the learning step, in which the classification model is constructed, and the second phase is the classification step, in which the model is used to predict the class labels of new, unseen data. There are many approaches to developing a classification model, such as decision tree induction, Bayesian classification methods, classification by back-propagation, support vector machines, and nearest-neighbour classifiers [1].
Decision tree induction is one of the most common techniques for solving the classification problem [2]. Most classification methods are based on the assumption that the data conforms to a stationary distribution. However, real-world data is usually collected over certain periods of time, ranging from seconds to years, and ignoring possible changes in the underlying concept, also known as concept drift, may degrade the predictive performance of a classification model [3]. The stream nature of most real-world data, including wireless sensor data, poses new challenges for existing classical classification techniques such as CART [4], ID3 [5], C4.5 [6], and IFN [7]. Methods for extracting patterns from continuous streams of data are known as incremental (online) learning algorithms [3]. In most data streams, a complete training set is not available at the initial learning step and new training tuples are introduced to the system at a later stage. An incremental classification algorithm should be able to learn from such intermittent training data without reconstructing the classification model.
With recent advances in wireless sensor technologies, wireless sensor networks are deployed for various purposes, such as wildlife monitoring [8], hazardous environment exploration [10], and military target tracking and surveillance [9]. A major constraint of a wireless sensor network is its limited energy: the sensors run on batteries with limited power, and in most cases replacing these batteries is not possible because the nodes are deployed in large numbers, often in hostile environments. In most of these applications, a sensor node continuously senses physical phenomena such as temperature, humidity and light intensity and reports the readings to the sink or base station. A small amount of these data may be used as training data with class labels, and the resulting model can later be used to classify unseen data without class labels. Since the initial training data is negligible compared to the enormous volume of data generated by a wireless sensor network, and since the physical phenomena may change over time, new class labels may be added later or the concepts of existing classes may change. There is therefore often a need to introduce new training data to update the knowledge base without losing the previously learned knowledge. This process of generating and updating a set of classification rules, without regenerating previous results (rules) when new data objects become available, is known as incremental learning capability [11].
Classification techniques can also be used in a WSN to save sensor energy. Sensor nodes may transmit only the class label instead of the raw data, which can significantly reduce the energy consumed, depending on the number of parameters each sensor is monitoring.


For example, a wireless sensor network responsible for monitoring temperature, humidity and light needs to send three messages, one per parameter, each carrying its epoch id and sensor id. Sending only the class label reduces the number of messages by at least 60%, since the three data messages per epoch are replaced by a single class-label message. Moreover, no further classification of the sensor data is required at the base station.
In this paper, we present a distributed classification algorithm for WSNs in which each sensor generates a local classifier based on a set of training tuples and passes this classifier to its parent node; the parent uses the information from its children along with its own training data to produce an enhanced classifier. Ultimately the root node, or sink, generates a global classifier. The algorithm is also able to learn from new training data presented at a later stage.
II. NAÏVE BAYESIAN CLASSIFICATION
Naïve Bayesian classification is based on Bayes' theorem. In order to simplify computation, naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. The naïve Bayesian classifier has been found to be comparable to decision tree and selected neural network classifiers in terms of performance.
Let D be a set of training tuples, and let each tuple be represented by an n-dimensional attribute vector X = (x1, x2, x3, ..., xn), depicting n measurements made on the tuple from attributes A1, A2, A3, ..., An, respectively. Suppose there are m classes, C1, C2, C3, ..., Cm. Given a tuple X, the classifier will predict that X belongs to the class having the highest posterior probability conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to class Ci if and only if

P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i.

Thus we maximize P(Ci | X). The class Ci for which P(Ci | X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}

As P(X) is constant for all classes, only P(X | Ci) P(Ci) needs to be maximized. If the class prior probabilities are not known, the classes are commonly assumed to be equally likely, i.e. P(C1) = P(C2) = ... = P(Cm); otherwise the class prior probability can be estimated by

P(C_i) = |C_{i,D}| / |D|

where |Ci,D| is the number of training tuples of class Ci in D. In naïve Bayesian classification, class conditional independence is assumed, and thus

P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \dots \times P(x_n \mid C_i)

In order to compute P(X | Ci), we consider the following cases:
(a) If Ak is categorical, then P(xk | Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.
(b) If Ak is continuous-valued, the attribute is typically assumed to have a Gaussian distribution with mean μ and standard deviation σ, defined by

g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

so that

P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}).

For a tuple X, P(X | Ci) P(Ci) is evaluated for each class Ci. The classifier predicts the class label of X to be Ci if and only if

P(X \mid C_i)\, P(C_i) > P(X \mid C_j)\, P(C_j) for 1 ≤ j ≤ m, j ≠ i.

To avoid probability values of zero, the Laplacian correction (Laplace estimator) is used.
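To make the decision rule above concrete, the following short Python sketch (ours, not part of the paper; the class statistics and priors shown are hypothetical placeholders) evaluates the Gaussian likelihood of each attribute and selects the class that maximizes P(X | Ci) P(Ci):

import math

def gaussian(x, mu, sigma):
    # Gaussian density g(x, mu, sigma) used for continuous-valued attributes
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def predict(x, class_stats, priors):
    # class_stats maps class -> list of (mu, sigma), one pair per attribute;
    # priors maps class -> P(Ci); both are assumed to be estimated from training data
    best_class, best_score = None, -1.0
    for ci, stats in class_stats.items():
        score = priors[ci]
        for xk, (mu, sigma) in zip(x, stats):
            score *= gaussian(xk, mu, sigma)   # class conditional independence
        if score > best_score:
            best_class, best_score = ci, score
    return best_class

# Hypothetical per-class statistics for (temperature, humidity, light)
class_stats = {0: [(22.2, 2.2), (36.7, 2.3), (690.0, 650.0)],
               1: [(22.8, 2.6), (38.4, 4.1), (200.0, 140.0)]}
priors = {0: 0.5, 1: 0.5}
print(predict((20.09, 48.89, 549.9), class_stats, priors))

For categorical attributes, the same loop would multiply in relative frequency counts instead of the Gaussian density, with the Laplacian correction applied to avoid zero counts.
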
III. NAÏVE BAYESIAN CLASSIFICATION FOR INCREMENTAL LEARNING
The performance of naïve Bayesian classification has been found to be comparable with decision trees and neural networks in some domains. It exhibits high accuracy on large databases, and in theory it achieves the minimum error rate.
Naïve Bayesian classification can be used for incremental learning by updating the per-class counts of each attribute value in the case of categorical data, and by updating the sum of values and the sum of squared values (from which the mean and standard deviation can be calculated) in the case of continuous data. When new training data arrives, the classifier only has to increment the counts or recalculate the mean and standard deviation over the existing and newly arrived data. In this way there is no loss of the information learned earlier, and the classifier can easily cope with concept drift. Algorithm 1 and Algorithm 2 below describe, respectively, the incremental learning of the naïve Bayesian classifier for continuous-valued attributes and the classification process. To understand the working of incremental classification, the training data in Table I and Table II can be used. The tuples in Table I are the initial training tuples, while the tuples in Table II are the training tuples introduced at a later stage.
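The update step is small enough to sketch directly. The following Python fragment (an illustration under the assumption of continuous attributes only; the name IncrementalStats is ours, not the paper's) keeps the count, sum, and sum of squares per class and attribute, so the mean, standard deviation, and prior needed by the classifier can be recovered at any time without revisiting old tuples:

import math
from collections import defaultdict

class IncrementalStats:
    # Per-class running statistics for continuous attributes: count, sum, sum of squares
    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        self.count = defaultdict(int)                              # class -> number of tuples seen
        self.sums = defaultdict(lambda: [0.0] * n_attributes)      # class -> sum of each attribute
        self.sq_sums = defaultdict(lambda: [0.0] * n_attributes)   # class -> sum of squared values

    def update(self, x, label):
        # Absorb one training tuple; earlier knowledge is kept, only the totals change
        self.count[label] += 1
        for k, value in enumerate(x):
            self.sums[label][k] += value
            self.sq_sums[label][k] += value * value

    def mean_std(self, label, k):
        # Recover mean and standard deviation of attribute k for the given class
        n = self.count[label]
        mean = self.sums[label][k] / n
        var = self.sq_sums[label][k] / n - mean * mean
        return mean, math.sqrt(max(var, 0.0))

    def prior(self, label):
        return self.count[label] / sum(self.count.values())

Algorithm 1 formalizes this learning step in pseudocode, and Algorithm 2 applies the recovered means and standard deviations in the Gaussian decision rule of Section II.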


Algorithm 1: Learning Algorithm
Require: training data set Dt
Do (for each training tuple X in Dt)
  If X belongs to an existing class Ci
    Increment the count ctr_i for class Ci
    For each attribute a_i
      sum_ai = sum_ai + a_i
      sum_sq_ai = sum_sq_ai + a_i^2
    End For
  Else
    Create a new class Cj and set ctr_j = 1
    For each attribute a_i
      sum_ai = sum_ai + a_i
      sum_sq_ai = sum_sq_ai + a_i^2
    End For
  End If
While tuples remain in Dt
For each class Ci and each attribute a_i
  Calculate mean_ai = sum_ai / ctr_i
  Calculate stdev_ai = sqrt(sum_sq_ai / ctr_i - mean_ai^2)
  Calculate the prior probability P(Ci) = ctr_i / |Dt|

Algorithm 2: Naïve Bayesian Classification
Require: unclassified data D
Do (for each tuple X in D)
  For each class Ci
    For each attribute a_k, calculate P(a_k | Ci) = g(a_k, μ_Ci, σ_Ci)
  Class label of X = the Ci that maximizes P(X | Ci) P(Ci)
While tuples remain in D
TABLE I
INITIAL TRAINING DATA

RID  Temperature  Humidity  Light     Class: Suitability
1    19.9884      37.0933     45.08   0
2    19.3024      38.4629     45.08   0
3    27.015       40.8718    264.96   1
4    22.2816      44.3162    104.88   1
5    22.3796      33.3857    426.88   1
6    23.046       34.4317     93.84   1
7    22.9088      34.2577    772.8    0
8    19.4984      38.7357     97.52   1
9    23.5654      33.979    1847.36   0
10   24.9864      39.2123    684.48   0

TABLE II
INCREMENTAL TRAINING DATA

RID  Temperature  Humidity  Light     Class: Suitability
11   25.0158      39.1443    684.48   0
12   23.0558      43.7518    537.28   1
13   20.449       36.4741     86.48   1
14   19.9296      39.0082   1494.08   0
15   25.8194      37.0246    684.48   0
Table I is the initial training data from which the classification model is derived. Suppose a new tuple X arrives after the initial training:

X = (Temperature = 20.09, Humidity = 48.89, Light = 549.9).

The prior probability P(Ci) of each class can be computed from the training data:

P(Class = 0) = 5/10 = 0.5
P(Class = 1) = 5/10 = 0.5

The Gaussian likelihoods of the attribute values are:

P(Temperature = 20.09 | Class = 0) = g(20.09, μ_C0, σ_C0) = 0.172317845
P(Temperature = 20.09 | Class = 1) = g(20.09, μ_C1, σ_C1) = 0.133955904
P(Humidity = 48.89 | Class = 0) = g(48.89, μ_C0, σ_C0) = 1.87765E-08
P(Humidity = 48.89 | Class = 1) = g(48.89, μ_C1, σ_C1) = 0.006719924
P(Light = 549.9 | Class = 0) = g(549.9, μ_C0, σ_C0) = 0.015232308
P(Light = 549.9 | Class = 1) = g(549.9, μ_C1, σ_C1) = 0.000962359

Using these probabilities we find

P(X | Class = 0) = P(Temperature = 20.09 | Class = 0) × P(Humidity = 48.89 | Class = 0) × P(Light = 549.9 | Class = 0)
                 = 0.172317845 × 1.87765E-08 × 0.015232308 = 4.92845E-11

and similarly

P(X | Class = 1) = 0.133955904 × 0.006719924 × 0.000962359 = 8.6629E-07.

Finally, we compute P(X | Ci) P(Ci):

P(X | Class = 0) P(Class = 0) = 4.92845E-11 × 0.5 = 2.46422E-11
P(X | Class = 1) P(Class = 1) = 8.6629E-07 × 0.5 = 4.33145E-07

According to the initial training, tuple X therefore belongs to Class 1.

Once the new training tuples arrive (Table II), the new knowledge can easily be incorporated into the existing knowledge by updating the counts (for categorical data) or the mean and standard deviation (for continuous data). Using similar calculations, we can compute the class of tuple X after incremental learning; in this case the class of tuple X is again found to be 1:

P(X | Class = 0) P(Class = 0) = 1.3543E-11
P(X | Class = 1) P(Class = 1) = 3.22692E-06

Although in this example the incremental update did not change the class of tuple X, it shows that the algorithm will be able to deal with concept drift when new training data arrives at a later stage.
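For reference, the two products can be checked in a few lines of Python, taking the likelihood values reported above as given (an illustrative check, not part of the paper):

# Likelihoods for X = (20.09, 48.89, 549.9) from the initial model, as reported above
likelihoods = {0: [0.172317845, 1.87765e-08, 0.015232308],
               1: [0.133955904, 0.006719924, 0.000962359]}
priors = {0: 0.5, 1: 0.5}

for ci, values in likelihoods.items():
    p_x_given_c = 1.0
    for p in values:
        p_x_given_c *= p
    # prints roughly 2.46e-11 for class 0 and 4.33e-07 for class 1
    print(ci, p_x_given_c * priors[ci])
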
IV. HIERARCHICAL DISTRIBUTED CLASSIFICATION BASED ON NAÏVE BAYESIAN CLASSIFICATION
Let S = {s0, s1, s2, s3, ..., sn} be a set of sensors covering an area, with s0 being the base station. These sensors are responsible for collecting data from the environment being monitored. The collected data is transmitted to the base station using the routing protocol discussed below.

A. Spanning Tree
Spanning tree construction in our algorithm is similar to the one discussed in [15]: the base station broadcasts a tree construction message containing the source (the base station in this case) and the depth (0 in this case). Upon receiving the message, the sensor nodes within its communication range respond to the base station as candidate child nodes.
The base station randomly selects its children, up to the maximum number of child nodes set for it, and sends a message to each selected child to inform it of its selection. The selected child nodes then broadcast the tree construction message with a depth of 1, and the process continues iteratively until all sensor nodes are assigned to the tree as leaf or branch nodes.
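A minimal sketch of this construction, assuming the radio exchange is abstracted into a neighbour list per node and a per-node limit max_children (both names are illustrative assumptions, not taken from the paper):

import random
from collections import deque

def build_spanning_tree(neighbors, base_station, max_children=3, seed=0):
    # neighbors[v] lists the nodes within communication range of v; nodes that hear the
    # tree-construction message reply as candidate children, and the sender randomly
    # selects at most max_children of them, level by level, starting from the base station.
    random.seed(seed)
    parent = {base_station: None}
    depth = {base_station: 0}
    frontier = deque([base_station])
    while frontier:
        node = frontier.popleft()
        candidates = [v for v in neighbors[node] if v not in parent]
        for child in random.sample(candidates, min(max_children, len(candidates))):
            parent[child] = node
            depth[child] = depth[node] + 1
            frontier.append(child)
    # In the real protocol further rounds attach any node left out by the child limit;
    # this sketch assumes max_children is large enough to reach every node.
    return parent, depth

Limiting the number of children per node trades breadth for depth; Section V compares the resulting trees of depth 3, 4, and 5.
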
B. Hierarchical Classification
Once the spanning tree is constructed, each leaf node calculates the sums of its attribute values, the sums of the squared values, and the counts of its local training data and sends them to its parent. Parent nodes combine this information from their child nodes with their own values to create new sums and squared sums, and transmit them to their own parents. The process continues until the sink node receives the final values. The sink node then generates the global means and standard deviations and broadcasts them to all the nodes. Each node can now determine the class label locally and transmit only the class label instead of the raw data.
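Because counts, sums, and sums of squares are additive, a parent can merge the statistics received from its children with its own by element-wise addition and forward the result unchanged. A small sketch of this merge step, reusing the (count, sum, sum of squares) representation from Section III (the dictionary layout is an illustrative assumption):

from collections import defaultdict

def merge_stats(stats_list, n_attributes):
    # Each element of stats_list maps class -> (count, [sum per attribute], [sum of squares per attribute]),
    # i.e. the statistics reported by one child node or computed from the node's own training tuples.
    merged = defaultdict(lambda: (0, [0.0] * n_attributes, [0.0] * n_attributes))
    for stats in stats_list:
        for label, (count, sums, sq_sums) in stats.items():
            m_count, m_sums, m_sq = merged[label]
            merged[label] = (m_count + count,
                             [a + b for a, b in zip(m_sums, sums)],
                             [a + b for a, b in zip(m_sq, sq_sums)])
    return dict(merged)   # same shape as the inputs, so it can be forwarded to the parent

# At the sink, the global mean and standard deviation per class and attribute follow as in Section III:
#   mean = sum / count,  stdev = sqrt(sum_sq / count - mean**2)

Since only these aggregated statistics travel up the tree, the payload per hop does not grow with the number of local training tuples.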

V. EXPERIMENTAL RESULTS
In our simulated experiment we used a real dataset collected from 54 sensors deployed in the Intel Berkeley Research Lab between February 28th and April 5th, 2004 (http://db.csail.mit.edu/labdata/labdata.html). The data contains about 2.3 million readings collected over that period. We used the same locations for the sensors and assumed that the sink node is located at the center of the lab. As there is no estimation of values or creation of pseudo data, and all information sensed by any sensor is preserved, the approach offers the same level of accuracy as a centralized implementation of the classification algorithm. Using our spanning tree construction algorithm, we obtained spanning trees of depth 3, 4, and 5 (Fig 1 to Fig 3). We calculated the reduction in the number of messages sent to collect the above data at each depth. The experiments show that a deeper tree needs more messages than a broader tree. The numbers of messages required to gather the data with and without hierarchical distributed naïve Bayesian classification are presented in Fig 4. In all cases, the total number of messages transmitted was reduced by 39-41%.

Fig 1 Spanning tree of depth 3
Fig 2 Spanning tree of depth 4
Fig 3 Spanning tree of depth 5
Fig 4 Number of messages (in millions) at depths 3, 4, and 5, with and without the classifier

VI. CONCLUSION
The use of a distributed classification algorithm can preserve the energy of the sensors by reducing the length of the messages they transmit. In this paper we presented a novel hierarchical distributed classification algorithm based on naïve Bayesian classification. The naïve Bayesian classification technique has been found to be scalable and highly accurate, and hence it is suitable for wireless sensor networks, which generate huge streams of unbounded data. Moreover, incremental naïve Bayesian classification can be used to ensure that new knowledge is easily incorporated without losing earlier knowledge. Distributed hierarchical naïve Bayesian classification further increases the efficiency of the sensor network by increasing its network lifetime.
Since the algorithm classifies data at each sensor node and transmits only the class label instead of the raw data, there is no need to run a classification algorithm at the base station. Simulated experiments show that the number of messages transmitted is reduced significantly by using this algorithm. One drawback of the algorithm is that the naïve Bayesian classifier makes an assumption of class conditional independence, whereas in practical scenarios dependencies between variables may exist.

REFERENCES
[1] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2012, pp. 327-391.
[2] A. S. Al-Hegami, "Classical and Incremental Classification in Data Mining Process," International Journal of Computer Science and Network Security, vol. 7(12), pp. 179-187, Dec 2007.
[3] M. Last, "Online classification of nonstationary data streams," Journal of Intelligent Data Analysis, vol. 6(2), pp. 129-147, Apr 2002.
[4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, 1984.
[5] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1(1), pp. 81-106, 1986.
[6] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[7] O. Maimon and M. Last, Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology, Kluwer Academic Publishers, 2000.
[8] Garcia-Sanchez et al., "Wireless Sensor Network Deployment for Monitoring Wildlife Passages," Sensors, vol. 10(8), pp. 7236-7262, Aug 2010.
[9] T. Bokareva et al., "Wireless sensor networks for battlefield surveillance," in Proceedings of the Land Warfare Conference, Oct 2006.
[10] G. Werner-Allen et al., "Deploying a wireless sensor network on an active volcano," IEEE Internet Computing, vol. 10(2), pp. 18-25, Mar-Apr 2006.
[11] N. Shan and W. Ziarko, "Data-Based Acquisition and Incremental Modification of Classification Rules," Computational Intelligence, vol. 11(2), pp. 357-370, May 1995.
[12] Y. Wang, "An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity," Mathematical Problems in Engineering, Feb 2014.
[13] S. Pang, S. Ozawa, and N. Kasabov, "Incremental Linear Discriminant Analysis for Classification of Data Streams," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 35(5), pp. 905-914, Oct 2005.
[14] S. Ozawa, S. Pang, and N. Kasabov, "Incremental Learning of Chunk Data for Online Pattern Classification Systems," IEEE Transactions on Neural Networks, vol. 19(6), Jun 2008.
[15] X. Cheng, J. Xu, J. Pei, and J. Liu, "Hierarchical distributed data classification in wireless sensor networks," in IEEE 6th International Conference on Mobile Adhoc and Sensor Systems (MASS '09), Oct 2009.
[16] K. Sharma, M. Rajpoot, and L. K. Sharma, "Nearest Neighbour Classification for Wireless Sensor Network Data," International Journal of Computer Trends and Technology, vol. 2(2), 2011.
[17] T. Seidl, I. Assent, P. Kranen, R. Krieger, and J. Herrmann, "Indexing Density Models for Incremental Learning and Anytime Classification on Data Streams," in Proceedings of the 12th International Conference on Extending Database Technology (EDBT/ICDT 2009), pp. 311-322, Jan 2009.
[18] P. E. Utgoff, "Incremental Induction of Decision Trees," Machine Learning, vol. 4(2), pp. 161-186, Nov 1989.
[19] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, "On demand classification of data streams," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 503-508, Aug 2004.
[20] Intel Berkeley Research Lab sensor data, http://db.csail.mit.edu/labdata/labdata.html
