Clustering in Wireless Sensor Network Using K-Means and Map Reduce Algorithm
Master of Technology
by
Jyoti R. Patole
Roll No: 121022017
2012
Dedicated to
my mother
Smt. Laxmibai R. Patole
and
my father
Shri. Ramdas R. Patole
DEPARTMENT OF COMPUTER ENGINEERING AND
INFORMATION TECHNOLOGY,
COLLEGE OF ENGINEERING, PUNE
CERTIFICATE
By
Jyoti R. Patole
(121022017)
Master of Technology.
Date :
Abstract
A wireless sensor network (WSN) consists of a large number of small sensors with
limited energy. Prolonged network lifetime, scalability, node mobility and load
balancing are important requirements for many WSN applications. Clustering the
sensor nodes is an effective technique for achieving these goals, although
clustering algorithms differ in their objectives. We propose a new clustering
algorithm for the sensor nodes of a network, based on the MAP-REDUCE programming
model and the K-MEANS clustering algorithm. The network is divided into a number
of clusters, which we take as 5% of the total number of nodes in the network.
Each node is assigned to the cluster whose cluster head is at minimum distance
and has maximum energy. Distances are calculated using the Euclidean distance
formula. We also calculate the intra-cluster and inter-cluster distances for the
clusters, and measure the end-to-end delay of packet transmission and the energy
consumed by transmission.
Initial simulations check how much the energy consumption can be lowered through
the placement of the cluster heads over the grid. We consider two ways of placing
the cluster heads: placing them randomly, or keeping a minimum distance between
them. The results show that placing the cluster heads with a minimum separation
distance performs better than placing them randomly.
Acknowledgments
The satisfaction that accompanies the successful completion of a task would be
incomplete without mentioning the people who made it possible. I am grateful to
a number of individuals whose professional guidance and encouragement made this
project a very pleasant endeavour. I express my sincere gratitude to my guide,
Dr. Jibi Abraham, for her constant help, encouragement and inspiration
throughout the project work. Without her invaluable guidance, this work would
never have been a successful one. Her honest criticism of technical issues also
helped me concentrate on the clarity of the work. I would also like to thank
Prof. V. K. Pachghare and Dr. J. V. Aghav for their valuable suggestions and
helpful discussions. Last but not least, I would like to thank all my
classmates, my family and those who helped me directly or indirectly in the
completion of this project work.
Jyoti R. Patole
College of Engineering, Pune
June, 2012
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Wireless Sensor Network
1.1.1 Home Control
1.1.2 Medical Monitoring
1.2 Clustering in wireless sensor network
1.3 Motivation
1.4 Problem Statement
1.5 Thesis Objective and Scope
1.6 Thesis Outline
2 Literature Survey
2.1 LEACH
2.2 MAP REDUCE PROGRAMMING MODEL
2.3 NS2
2.3.1 Main NS2 Simulation Steps
2.3.2 Packet Tracing
2.3.3 Main Parameters used in wireless networks simulation
3 System Design
3.1 BACKGROUND
3.1.1 MAP-REDUCE Programming Model
3.1.2 K-MEANS Algorithm
3.2 OUR PROPOSED SYSTEM
3.2.1 ALGORITHM ASSUMPTIONS
3.2.2 CLUSTER SETUP PHASE
3.3 Channel Propagation Model
3.4 Radio Energy Dissipation
Bibliography
Introduction
• Sensing capability provides detailed data about electric, gas and water usage.
1.2 Clustering in wireless sensor network
Cluster Head: The cluster head (CH) is the leader of a specific cluster. It is
responsible for the activities carried out in the cluster, such as data
aggregation, data transmission to the base station, scheduling in the cluster,
etc.
Base Station: The base station is the main data collection node for the entire
sensor network. It is the bridge (via a communication link) between the sensor
network and the end user. Normally this node is assumed to have no power
constraints.
Cluster: It is the organizational unit of the network, created to simplify the
communication in the sensor network.
Advantages of Clustering
1.3 Motivation
The unique properties mentioned above make setting up a sensor network
challenging. The key challenge in setting up and operating a WSN is to increase
the lifetime of the network by minimizing energy consumption. Over the last few
years, a variety of changes have been proposed to limit the energy requirement
of WSNs, since most energy is dissipated in wireless transmission and reception
[15]. The main approaches proposed so far focus on changes at the MAC layer and
the network layer to minimize energy dissipation. Two further major challenges
are how to place the cluster heads over the grid and how many clusters the
network should have. If the cluster heads are properly placed over the grid and
a sufficient number of clusters is formed, energy dissipation is reduced and the
lifetime of the network increases. Clustering has been found to be an efficient
technique for tackling all of the above challenges [19] [20], and is widely
regarded as an effective method to enhance the lifetime of a WSN.
Chapter 2
Literature Survey
In recent years, interest in clustered WSNs has generated a significant body of
research. A CH may be elected by the sensors in a cluster or pre-assigned by the
network designer. A CH may be just one of the sensors or a node that is richer
in resources. The cluster membership of a node may be fixed or variable [25].
In this section, we review CH selection algorithms.
2.1 LEACH
Low Energy Adaptive Clustering Hierarchy (LEACH) by Heinzelman [15] [26] is the
best-known clustering protocol and has been the basis for many later clustering
protocols. The most important goal of LEACH is to use cluster heads to reduce
the energy cost of transmitting data from normal nodes to a distant Base
Station [3]. In LEACH, nodes organize themselves into local clusters with one
node acting as cluster head. All non-cluster-head nodes (normal nodes) transmit
their data to the cluster heads. Cluster head nodes perform data aggregation
and/or data fusion on the data to be transmitted to the Base Station. The
cluster heads change randomly over time to balance the energy dissipation of the
nodes.
The operation of LEACH is divided into two phases: the set-up phase and the
steady state phase. Each round begins with a set-up (clustering) phase in which
clusters are organized, followed by a steady-state (transmission) phase in which
data packets are transferred from normal nodes to cluster heads. After data
aggregation, the cluster heads transmit the messages to the Base Station.
Set up Phase: During this phase each node decides whether or not to become a
cluster head for the current round. The election of a cluster head is done with
a probability function: each node selects a random number between 0 and 1, and
if the number is less than T(n), the node is elected as a cluster head for the
current round:
\[
T(n) =
\begin{cases}
\dfrac{p}{1 - p \left( r \bmod \frac{1}{p} \right)} & \text{if } n \in G \\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
\]
where p is the cluster head probability, r is the number of the current round
and G is the set of nodes that have not been cluster heads in the last 1/p
rounds. After this CH election, each cluster head prepares a TDMA schedule and
transmits it to all the nodes in its cluster. This completes the set-up phase of
LEACH. Figure 2.1 [26] illustrates the set-up phase as a flowchart.
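The election rule of Equation 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LEACH implementation: the function names `leach_threshold` and `elect_cluster_heads` are hypothetical, 1/p is assumed to be an integer, and the bookkeeping that maintains the set G between rounds is left out.

```python
import random

def leach_threshold(p, r, in_G):
    """T(n) from Equation 2.1 for cluster head probability p and round r."""
    if not in_G:
        return 0.0
    period = round(1 / p)  # assumption: 1/p is an integer number of rounds
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, p, r, eligible, rng):
    """Return the nodes that elect themselves cluster head in round r.

    eligible is the set G (nodes that were not cluster head in the last 1/p rounds).
    """
    heads = []
    for n in node_ids:
        # Each node draws a number in [0, 1) and compares it with T(n).
        if rng.random() < leach_threshold(p, r, n in eligible):
            heads.append(n)
    return heads
```

With p = 0.05 the threshold starts at 0.05 in round 0 and rises to 1 in round 19, so every node still in G becomes a cluster head by the end of each 20-round period.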
Steady State Phase: In this phase each node sends its collected data to the CH
once per frame, in the slot allocated to it. This assumes that the node always
has data to transmit. To save energy, the node goes into sleep mode after this
transmission until its next allocated transmission slot. The CH must keep its
receiver on all the time to receive the data from the cluster nodes. After
receiving all the data, the CH aggregates it and transmits it to the base
station.
The strength of LEACH lies in its CH rotation mechanism and data aggregation.
One important problem with LEACH, however, is that it offers no guarantee about
the placement and/or number of cluster head nodes in each round [3] [15]. Using
a centralized clustering algorithm would therefore produce better results.
LEACH-Centralized (LEACH-C) is a cluster formation algorithm run at the Base
Station (BS). It uses the same steady state protocol as LEACH. During the set-up
phase, each node sends information about its current position and energy level
to the BS. The usual assumption is that each node has a GPS receiver. The BS has
to ensure an even distribution of the energy load among the nodes, so it
determines an energy threshold and selects the nodes with energy above this
threshold as possible cluster heads. Determining the optimal number of cluster
heads is an NP-hard problem, and LEACH-C uses the Simulated Annealing algorithm
(Murata and Ishibuchi, 1994) to address it. After determining the cluster heads
for the current round, the BS sends each node a message containing its cluster
head ID. If a node's cluster head ID matches its own ID, the node is a cluster
head; otherwise it is a normal node and can go to sleep until the data
transmission phase. LEACH-C is more efficient than LEACH (it delivers about 40%
more data per unit energy) because the BS has global knowledge of the location
and energy level of all nodes in the network [15]. LEACH-C also ensures that the
optimal number K of cluster heads exists in every set-up phase, which LEACH
cannot ensure [3] [14] [15].
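The candidate-selection step at the BS can be illustrated with a short Python sketch. The text only says that the BS determines an energy threshold, so taking the average node energy as that threshold is an assumption made here for illustration, as are the names `Node` and `candidate_cluster_heads`. The simulated annealing step that picks the final cluster heads from the candidates is not shown.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float  # residual energy reported to the BS

def candidate_cluster_heads(nodes):
    """Return the nodes whose energy is above the threshold.

    Assumption: the threshold is the average energy of all reporting nodes.
    """
    if not nodes:
        return []
    threshold = sum(n.energy for n in nodes) / len(nodes)
    return [n for n in nodes if n.energy > threshold]
```

For three nodes with energies 1.0, 2.0 and 3.0 the assumed threshold is 2.0, so only the third node remains a candidate.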
Disadvantages
2.3 NS2
NS2 is a discrete event simulator that is very useful for analysing the dynamic
nature of communication networks. Figure 2.2 shows the basic architecture of
NS2. The NS2 simulator is based on two languages: an object-oriented simulator
written in C++, and an OTcl (an object-oriented extension of Tcl) interpreter
used to execute the user's command scripts. These scripts can be used to define
network topologies, modules and their relationships, the protocols to be
implemented, the application to be simulated, and the form of output expected.
After the simulation, NS2 can generate output in the form of text or animation.
To interpret these results graphically and interactively, additional tools such
as NAM (Network AniMator) and XGraph are used.
Characteristics of NS2
NS2 implements the following features:
3. Multicasting
4. Routing
7. Network Topology
8. Packet flow
Step 1: Simulation Design. This is the first step of simulating the network, in
which the user must determine the purpose of the simulation, the network to be
simulated and its configuration, the assumptions to be made, the performance
measures and the expected type of output.
The first two steps are implemented using the C++ and OTcl languages, as
mentioned earlier. Step 3 evaluates the performance of the simulated network and
is also called Packet Tracing.
In packet tracing, details of the packets flowing through network checkpoints
(e.g., nodes and queues) are recorded. Figure 2.3 shows the format of each line
in a normal trace file, in which 12 columns form a line. The trace file alone is
not sufficient unless meaningful data is extracted from it. In the post-analysis
phase the user can extract the data of interest and analyse it further as
required. For example, the average throughput of a link can be found by
extracting the respective columns from the trace file. Another example is the
time each packet takes to reach its destination. The two most popular languages
for this are AWK and Perl.
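The throughput example can be sketched as follows. AWK and Perl are the usual tools; Python is used here only for illustration. The column order is an assumption, since Figure 2.3 is not reproduced in the text: the sketch takes the 12 columns to be event, time, from node, to node, packet type, packet size, flags, flow id, source address, destination address, sequence number and packet id. Wireless trace formats differ from this layout, and the function name `average_throughput` is hypothetical.

```python
def average_throughput(trace_lines, from_node, to_node):
    """Average throughput (bits per second) of packets received on one link.

    Only 'r' (receive) events on the link from_node -> to_node are counted.
    Simplification: the interval runs from the first to the last reception.
    """
    total_bytes = 0
    first_time = None
    last_time = None
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # skip lines that do not have the assumed 12 columns
        event, time = fields[0], float(fields[1])
        src, dst, size = fields[2], fields[3], int(fields[5])
        if event != "r" or src != str(from_node) or dst != str(to_node):
            continue
        total_bytes += size
        if first_time is None:
            first_time = time
        last_time = time
    if first_time is None or last_time == first_time:
        return 0.0
    return total_bytes * 8 / (last_time - first_time)
```

In practice the lines would come from the trace file, for example `average_throughput(open("out.tr"), 0, 1)`, where `out.tr` is a placeholder file name.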
Another type of output is animation-based output, which is created using a
Network AniMator (NAM) trace. The NAM trace records the simulation details in a
text file, which is then used to play back the simulation as an animation.
NAM
NAM provides a visual interpretation of the network topology created. Figure 2.4
displays the NAM application and its components. Its features are as follows.
• Controls include play, stop, fast forward (ff), rewind (rw), pause, a display
speed controller and a packet monitor facility.
This section describes the XGraph application, which is used to analyse the
trace files produced by a simulation.
XGraph Xgraph is an X-Windows application that includes:
To use XGraph in NS-2, the executable can be called from within a Tcl script.
This loads a graph that visually displays the information in the trace file
produced by the simulation (see Figure 2.5).
Medium Access Control (MAC): Medium access control for the sensor node. Since
this is a wireless sensor network simulation, IEEE 802.11 is the MAC available.
Link Layer: Link layer configuration. Uses the NS-2 LL default link layer
configuration.
Interface Queue (IFQ): Interface priority queue. Eight queue models are
provided, including DropTail (default), DropTail/XCP, RED, RED/Pushback,
RED/RIO, Vq and XCP.
IFQ Length: Number of messages buffered in the IFQ. The user should provide this
value (50 messages by default).
Scenario Size: Size (in meters) of the simulation scenario. The user should fill
in the lengths of the sides of the simulation rectangle (100 x 100 meters by
default).
Trace Options: Radio buttons are used to define the kind of information that
should be stored in the trace file: TRACE-MAC, TRACE-ROUTE and TRACE-AGENT. By
default all three options are set “on”.
NS has a rich library of network and protocol objects. Additionally, a large
amount of online support is available through mailing lists, message boards,
tutorials, online manuals, etc.
Chapter 3
System Design
3.1 BACKGROUND
2. Calculate the distance from each data point to each of the centers, and
assign each point to the closest center.
3. Calculate each new cluster center as the mean value of all data points in the
respective cluster.
4. With the new centers, repeat step 2. If the cluster assignment of any data
point changes, repeat step 3; otherwise stop the process.
The distance between the data points is calculated using the Euclidean distance.
The Euclidean distance between two points or tuples
$X_1 = (x_{11}, x_{12}, \ldots, x_{1n})$ and
$X_2 = (x_{21}, x_{22}, \ldots, x_{2n})$ is
\[
\mathrm{Dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}
\tag{3.3}
\]
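The steps above, together with the Euclidean distance of Equation 3.3, can be put together in a short Python sketch. It is a minimal illustration rather than the thesis implementation. Step 1 does not appear in the text, so choosing the initial centers at random from the data points is an assumption, and the names `euclidean` and `k_means` are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (Equation 3.3)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(points, k, rng, max_iter=100):
    """Cluster points into k groups; returns (centers, assignment)."""
    centers = rng.sample(points, k)  # assumed step 1: random initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        # Step 4: stop when no point changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment
```

For example, `k_means([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2, random.Random(1))` separates the two lower-left points from the two upper-right points.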
3.2 OUR PROPOSED SYSTEM
Value1 → List of all other nodes along with their location information and energy level.
Value2 → List of all other nodes with their cluster heads.
Key3 → List of new k′ ≤ k centroids.
Table 3.1 shows the process of a Mapper. The input to the mapper is the initial
set of k randomly selected centroids as key1, and the list of all other nodes
along with their location information and energy level as value1. Using the
mapper(key1, value1) protocol, the Map phase produces a new set of k centroids
as key2 and the list of all other nodes with their cluster heads as value2.
Table 3.2 shows the process of a Reducer. The intermediate results produced by
the Map protocol are given as input to the reducer, i.e. the new set of k
centroids as key2 and the list of all other nodes with their cluster heads as
value2. Using the reducer(key2, value2) protocol, the Reduce phase produces the
final clusters with their cluster heads and the other nodes in each cluster as
value3. The term "reduce" in the Reduce phase refers to optimizing the output,
not to reducing the size of the output.
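The data flow between the two phases can be sketched in Python as below. This is a single-process illustration of the key/value hand-off only, not a distributed MAP-REDUCE job and not the contents of Tables 3.1 and 3.2. The node representation (a dict with `id`, `pos` and `energy`), the choice of the highest-energy member as cluster head, and the tie-break by distance to the centroid are all assumptions made for this example.

```python
import math

def mapper(key1, value1):
    """key1: list of (x, y) centroids; value1: list of node dicts.

    Returns key2 (updated centroids) and value2 (list of (node, cluster index)).
    """
    value2 = []
    for node in value1:
        # Assign the node to the centroid at minimum Euclidean distance.
        nearest = min(range(len(key1)), key=lambda c: math.dist(node["pos"], key1[c]))
        value2.append((node, nearest))
    key2 = []
    for c, old in enumerate(key1):
        members = [n["pos"] for n, idx in value2 if idx == c]
        if members:
            key2.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            key2.append(old)  # keep a centroid that attracted no nodes
    return key2, value2

def reducer(key2, value2):
    """Returns value3: cluster head id -> sorted ids of the other member nodes."""
    clusters = {}
    for node, idx in value2:
        clusters.setdefault(idx, []).append(node)
    value3 = {}
    for c, members in clusters.items():
        # Assumed rule: highest energy wins, ties go to the node nearest the centroid.
        head = max(members, key=lambda n: (n["energy"], -math.dist(n["pos"], key2[c])))
        value3[head["id"]] = sorted(n["id"] for n in members if n["id"] != head["id"])
    return value3
```

Calling `reducer(*mapper(key1, value1))` chains the two phases in the order described above.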
a) If the energy of a member node falls below the threshold, it will start
searching for a better CH
b) Or, if the cluster head is running out of energy, a new CH will be assigned
to the node.
d) Until no change
4 Produce value3
The K-MEANS algorithm is called by the MAP and REDUCE protocols (see Table 3.3).
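Rules (a) and (b) can be illustrated with a small Python sketch. The text does not define what makes a CH "better", so this example assumes it means the nearest cluster head whose energy is at or above the threshold. The function name `choose_cluster_head` and the dict-based node representation are hypothetical.

```python
import math

def choose_cluster_head(node, current_head, heads, threshold):
    """Return the CH a node should use after applying rules (a) and (b)."""
    healthy = [h for h in heads if h["energy"] >= threshold]
    if not healthy:
        return current_head  # no better CH is available
    nearest = min(healthy, key=lambda h: math.dist(node["pos"], h["pos"]))
    if current_head["energy"] < threshold:
        return nearest  # rule (b): the CH is running out of energy
    if node["energy"] < threshold:
        return nearest  # rule (a): the member node searches for a better CH
    return current_head
```

A node with enough energy and a healthy CH keeps its current CH, which corresponds to the "until no change" stopping condition.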
Intra-Cluster Distance
This is the distance between the nodes of a cluster and its cluster centre, used
to determine whether the clusters are compact [30].
\[
\mathrm{intra} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert
\tag{3.4}
\]
where N is the number of nodes in the network, K is the number of clusters, and
$z_i$ is the cluster centre of cluster $C_i$.
Inter-Cluster Distance
This is the distance between clusters [30]. We calculate it as the distance
between cluster centres and take the minimum of this value, defined as
\[
\mathrm{inter} = \min_{i \neq j} \left( \lVert z_i - z_j \rVert^{2} \right)
\tag{3.5}
\]
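Both measures can be computed with a few lines of Python. This is a sketch under stated assumptions: the norm in Equation 3.4 is read here as unsquared and the one in Equation 3.5 as squared, clusters are given as lists of coordinate tuples, and the function names are hypothetical.

```python
import math

def intra_cluster_distance(clusters, centers):
    """Mean distance from every node to the centre of its own cluster (Eq. 3.4)."""
    n = sum(len(members) for members in clusters)  # N: total number of nodes
    total = sum(
        math.dist(x, z) for members, z in zip(clusters, centers) for x in members
    )
    return total / n

def inter_cluster_distance(centers):
    """Minimum squared distance between any two cluster centres (Eq. 3.5).

    Requires at least two centres.
    """
    return min(
        math.dist(zi, zj) ** 2
        for i, zi in enumerate(centers)
        for zj in centers[i + 1:]
    )
```

A small intra-cluster distance together with a large inter-cluster distance indicates compact, well-separated clusters.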
3.4 Radio Energy Dissipation
Chapter 4
This chapter gives the details of the implementation of the proposed algorithm
and the results obtained from it. The details are as follows:
4.2 Simulation Results
Figure 4.2 shows the energy consumption for data transmission in the network. It
shows that energy consumption is much lower when the cluster heads are separated
by a minimum distance.
Chapter 4. Implementation and Simulation
Chapter 5
Our proposed scheme does not require a homogeneous distribution of the nodes
over the grid. In the MAP phase we assign cluster heads to the sensor nodes. In
the REDUCE phase we try to optimize the clusters by checking two conditions.
First, we check the energy of the CH: if it is below some threshold, a new CH is
assigned to the sensor nodes. Second, if the energy of a common node falls below
some threshold, the node tries to find a new CH. Both checks help to minimize
the number of dropped nodes in the network. We have placed the CHs in the sensor
network such that a minimum distance is maintained among them. Because the
algorithm changes the cluster head of a node when the CH is running out of
energy, it also helps to minimize dropped packets. The proposed scheme also
gives better performance in terms of throughput. The scheme considers both the
energy and the position of a node, which helps to produce good clusters. From
this we conclude that the proposed algorithm performs well in terms of energy
required, network throughput and number of dropped packets.
Bibliography
[3] Neda Enami, Reza Askari Moghadam, "Energy Based Clustering Self Organizing
Map Protocol For extending Wireless Sensor Networks lifetime and coverage,"
Canadian Journal on Multimedia and Wireless Networks, Vol. 1, No. 4, August
2010.
[4] Shamneesh Sharma, Robin Prakash Mathur, Dinesh Kumar, "Enhanced Reliable
Distributed Energy Efficient Protocol for WSN," International Conference on
Communication Systems and Network Technologies, 2011.
[8] C. Nam, H. Jeong, and D. Shin, "The Adaptive Cluster Head Selection in
Wireless Sensor Networks," IEEE International Workshop on Semantic Computing
and Application, pp. 147-149, July 2008.
[9] C. Nam, Y. Ku, J. Yoon, and D. Shin, "Cluster Head Selection for Equal
Cluster Size in Wireless Sensor Networks," Proceedings New Trends in Information
and Service Science (NISS), pp. 618-623, July 2009.
[10] Alina Ene, Sungjin Im, Benjamin Moseley, "Fast Clustering using MapReduce,"
http://www.arxiv.org/abs/1109.1579v1, September 6, 2011.
[11] Jing Zhang, Gongqing Wu, Haiguang Li, Xuegang Hu, Xindong Wu, "A 2-Tier
Clustering Algorithm with Map-Reduce," The Fifth Annual ChinaGrid Conference,
China, 2010.
[17] Dehni L., Kief F., Bennani Y., "Power Control and Clustering in Wireless
Sensor Networks," Proceedings of Med-Hoc-Net Mediterranean Ad Hoc Networking
Workshop, France, 2005.
[18] F. L. Lewis, "Wireless Sensor Networks," in D. J. Cook and S. K. Das
(eds.), Smart Environments: Technologies, Protocols, and Applications, John
Wiley, New York, 2004.
[19] N. Vlajic and D. Xia, "Wireless Sensor Networks: To Cluster or Not To
Cluster?," WoWMoM'06, 2006.
[20] Vivek Katiyar, Narottam Chand, Surender Soni, "Clustering Algorithms for
Heterogeneous Wireless Sensor Network: A Survey," International Journal of
Applied Engineering Research, Dindigul, Volume 1, No. 2, 2010.
[22] Kazem Sohraby, Daniel Minoli, Taieb Znati, "Wireless Sensor Networks:
Technology, Protocols, and Applications," John Wiley, New York, 2007.
[24] The VINT Project, "The ns Manual (Formerly ns Notes and Documentation),"
A Collaboration between researchers at UC Berkeley, LBL, USC/ISI, and Xerox
PARC, DABT63-96-C-0105. http://www.isi.edu/nsnam/ns/ns-documentation.html
[25] Inbo Sim, KoungJin Choi, KoungJin Kwon and Jaiyong Lee, "Energy Efficient
Cluster header Selection Algorithm in WSN," International Conference on
Complex, Intelligent and Software Intensive Systems, 978-0-7695-3575-3/09.
[26] Rajesh Patel, Sunil Pariyani, Vijay Ukani, "Energy and Throughput Analysis
of Hierarchical Routing Protocol (LEACH) for Wireless Sensor Network," IJCA
(0975 8887), Volume 20, No. 4, April 2011.
[27] http://www.mannasim.dcc.ufmg.br/msg-basic-window.htm
[29] Charka Panditharathne and Soumya Jyoti Sen, "Energy Efficient Communication
Protocols for Wireless Sensor Networks," thesis for the degree of Bachelor of
Technology in Electronics and Instrumentation Engineering, National Institute
of Technology, Rourkela, Orissa, May 2009.
[30] Asif Khan, Israfil Tamim, Emdad Ahmed, Muhammad Abdul Awal, "Multiple
Parameter Based Clustering (MPC): Prospective Analysis for Effective Clustering
in Wireless Sensor Network (WSN) Using K-Means Algorithm," Wireless Sensor
Network, 4, 18-24, 2012.