Clustering in Wireless Sensor Network Using K-Means and Map Reduce Algorithm

Dissertation
submitted in partial fulfillment of the requirements

for the degree of
Master of Technology
by
Jyoti R. Patole
Roll No: 121022017
under the guidance of

Dr. Jibi Abraham
Department of Computer Engineering and Information Technology

College of Engineering, Pune
Pune - 411005.
2012
Dedicated to
my mother
Smt. Laxmibai R. Patole
and
my father
Shri. Ramdas R. Patole
DEPARTMENT OF COMPUTER ENGINEERING AND
INFORMATION TECHNOLOGY,
COLLEGE OF ENGINEERING, PUNE
CERTIFICATE
This is to certify that the dissertation titled
Clustering in Wireless Sensor Network using

K-MEANS and MAP REDUCE Algorithm
has been successfully completed
By
Jyoti R. Patole
(121022017)
and is approved for the degree of
Master of Technology.
Dr. Jibi Abraham,
Guide,
Department of Computer Engineering and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.

Dr. Jibi Abraham,
Head,
Department of Computer Engineering and Information Technology,
College of Engineering, Pune,
Shivaji Nagar, Pune-411005.
Date :
Abstract
A wireless sensor network (WSN) consists of a large number of small sensors with
limited energy. Prolonged network lifetime, scalability, node mobility and load
balancing are important requirements for many WSN applications. Clustering the
sensor nodes is an effective technique to achieve these goals, although different
clustering algorithms differ in their objectives. We propose a new clustering
algorithm for the sensor nodes of a network, based on the MAP-REDUCE
programming model and the K-MEANS clustering algorithm. The network is divided
into a number of clusters, which we have taken as 5% of the total number of nodes
in the network. Nodes are assigned to the cluster having the minimum distance to
a cluster head with maximum energy. The distance is calculated using the
Euclidean distance formula. We have also calculated the intra-cluster and
inter-cluster distances for the clusters, as well as the end-to-end delay of packet
transmission and the energy consumed by the transmission.
Initial simulations are performed to check how much the energy consumption can
be lowered by the placement of the cluster heads over the grid. We have considered
two ways of placing the cluster heads over the grid: placing them randomly, or
keeping some minimal distance among them. The results show that placing the
cluster heads with some minimal distance among them performs better than
placing them randomly.
Acknowledgments
The satisfaction that accompanies the successful completion of a task would be
incomplete without mentioning the people who made it possible. I am grateful to
a number of individuals whose professional guidance and encouragement have
made it a very pleasant endeavour to undertake this project. I express my
sincere gratitude towards my guide Dr. Jibi Abraham for her constant help,
encouragement and inspiration throughout the project work. Without her
invaluable guidance, this work would never have been a successful one. Her
honest criticism on technical issues also helped me to concentrate on the
transparency of my work. I would also like to thank Prof. V. K. Pachghare and
Dr. J. V. Aghav for their valuable suggestions and helpful discussions. Last, but
not the least, I would like to thank all my classmates, my family and those who
helped me directly or indirectly in many ways in the completion of this project work.
Jyoti R. Patole
College of Engineering, Pune
June, 2012
Contents
Abstract
Acknowledgments
List of Figures
1 Introduction
1.1 Wireless Sensor Network
1.1.1 Home Control
1.1.2 Medical Monitoring
1.2 Clustering in wireless sensor network
1.3 Motivation
1.4 Problem Statement
1.5 Thesis Objective and Scope
1.6 Thesis Outline
2 Literature Survey
2.1 LEACH
2.2 MAP REDUCE PROGRAMMING MODEL
2.3 NS2
2.3.1 Main NS2 Simulation Steps
2.3.2 Packet Tracing
2.3.3 Main Parameters used in wireless networks simulation
3 System Design
3.1 BACKGROUND
3.1.1 MAP-REDUCE Programming Model
3.1.2 K-MEANS Algorithm
3.2 OUR PROPOSED SYSTEM
3.2.1 ALGORITHM ASSUMPTIONS
3.2.2 CLUSTER SETUP PHASE
3.3 Channel Propagation Model
3.4 Radio Energy Dissipation
4 Implementation and Simulation
4.1 Simulation Set up
4.2 Simulation Results
5 Future Scope and Conclusion
Bibliography
List of Figures
1.1 Typical Sensor Network Arrangement
1.2 Home Control Application
1.3 Clustered Sensor Network
2.1 Flow Chart for Set UP Phase
2.2 Basic Architecture of NS 2
2.3 Format of each line in a trace file
2.4 NAM Tool Description
2.5 XGraph running comparing three trace files in a graph
3.1 Map Reduce Illustration
3.2 Original K-MEANS Algorithm
3.3 Radio Energy Dissipation Model [26]
4.1 End To End Delay
4.2 ENERGY REQUIRED
Chapter 1
Introduction
1.1 Wireless Sensor Network

Wireless sensor networks [18] are a popular area of research nowadays, due to the
vast potential usage of sensor networks in different areas. A sensor network
combines sensing, processing and communication abilities, which help to observe,
instrument and react to events and phenomena in a specified environment [22] [2].
This kind of network provides a link to the physical environment. By networking
tiny sensor nodes, it becomes easy to obtain data about physical phenomena that
was very difficult to obtain in conventional ways. A wireless sensor network
typically consists of tens to thousands of nodes. These nodes collect, process and
cooperatively pass the collected information to a central location. WSNs have
unique characteristics such as low duty cycle, power constraints and limited
battery life, redundant data acquisition, heterogeneity of sensor nodes, mobility of
nodes, and dynamic network topology [22]. Figure 1.1 [22] depicts a typical
WSN arrangement. Applications of WSNs exist in a variety of fields including
environmental applications, medical monitoring, home security, surveillance,
military applications, air traffic control, industrial and manufacturing automation,
process control, inventory management and distributed robotics [1][22]. Consider
the following applications for better understanding.
1.1.1 Home Control

Home control [22] is a good example to illustrate the application of wireless sensor
networks. It provides control as well as safety to the home, as follows (see figure 1.2)
[22]:
• Sensing capability provides detailed data about electricity, gas and water
usage.
• Sensing capability provides flexible management of temperature, cooling,
heating and lighting anywhere in the home with a single remote control.
• Sensing capability provides automatic notification upon detection of
unusual events in the home.
• Sensing capability enables easy installation, upgrading and networking of
the home control system without running any cable.
Figure 1.1: Typical Sensor Network Arrangement
Figure 1.2: Home Control Application
1.1.2 Medical Monitoring

A different application domain that can make use of wireless sensor network tech-
nology can be found in the area of medical monitoring. This field ranges from
monitoring patients in the hospital using wireless sensors to remove the constraints
of tethering patients to big, bulky, wired monitoring devices, to monitoring pa-
tients in mass casualty situations, to monitoring people in their everyday lives
to provide early detection and intervention for various types of disease. In these
scenarios, the sensors vary from miniature, body-worn sensors to external sensors
such as video cameras or positioning devices. This is a challenging environment
in which dependable, flexible, applications must be designed using sensor data as
input [29].
Consider a personal health monitor application running on a PDA that receives

and analyzes data from a number of sensors (e.g., ECG, EMG, blood pressure,
blood flow). The monitor reacts to potential health risks and records health infor-
mation in a local database. Considering that most sensors used by the personal
health monitor will be battery-operated and use wireless communication, it is clear
that this application requires networking protocols that are efficient, reliable, scal-
able and secure.
1.2 Clustering in wireless sensor network

In clustering, the sensor nodes are partitioned into different clusters. Each cluster
is managed by a node referred to as the cluster head (CH), and the other nodes are
referred to as cluster nodes. Cluster nodes do not communicate directly with the
sink node; they pass the collected data to the cluster head. The cluster head
aggregates the data received from the cluster nodes and transmits it to the base
station. This minimizes the energy consumption and the number of messages
communicated to the base station, and also reduces the number of nodes active in
communication. The ultimate result of clustering the sensor nodes is a prolonged
network lifetime.
Sensor Node: It is the core component of a wireless sensor network. It has the
capability of sensing, processing, routing, etc.
Cluster Head: The cluster head (CH) is considered the leader of its specific
cluster. It is responsible for the different activities carried out in the cluster,
such as data aggregation, data transmission to the base station, scheduling in
the cluster, etc.
Base Station: The base station is considered the main data collection node for
the entire sensor network. It is the bridge (via a communication link) between the
sensor network and the end user. Normally this node is considered to have
no power constraints.
Cluster: It is the organizational unit of the network, created to simplify
communication in the sensor network.
Figure 1.3: Clustered Sensor Network
Advantages of Clustering
• Transmit aggregated data to the data sink
• Reducing number of nodes taking part in transmission
• Useful Energy consumption
• Scalability for large number of nodes
• Reduces communication overhead
• Efficient use of resources in WSNs
1.3 Motivation
The unique properties mentioned above become challenges when setting up a sensor
network. The key challenge in setting up and properly operating a WSN is to increase
the lifetime of the network by minimizing the energy consumption. Over the last
few years a variety of changes have been made to limit the energy requirement of
WSNs, since energy is mainly dissipated in wireless transmission and reception
[15]. The main approaches proposed so far focus on making changes at the MAC
layer and the network layer to minimize the energy dissipation. Two more major
challenges are how to place the cluster heads over the grid and how many clusters
there should be in a network. If the cluster heads are properly placed over the
grid and sufficient clusters are formed, it will help to minimize the dissipation of
energy and to increase the lifetime of the network. To tackle all the
above-mentioned challenges, clustering has been found to be an efficient technique
[19] [20]. Clustering has always been referred to as an effective method to enhance
the lifetime of a WSN.
1.4 Problem Statement

The sole purpose of this project is to find a clustering method that is more energy
efficient. Wireless sensor networks are battery operated. Sensor nodes collect data
and pass it on to the network for further use. This passing and receiving of data
uses most of the energy of the network. So, for better operation and an increased
network lifetime, energy consumption must be the major factor of concern.
In this project a new method for clustering the sensor network is proposed, which is
divided into two phases: Mapping and Reducing. The MAP protocol performs the
mapping, or assigning, of sensor nodes to clusters, and the REDUCE protocol
optimizes this clustering by making some changes.
1.5 Thesis Objective and Scope

When this method was proposed, a few goals were set, as follows:
1. Minimize the energy dissipation of the network.
2. Increase the network life time.
3. Clusters must be better balanced.
4. Better distribution of cluster heads in the network.
1.6 Thesis Outline

The rest of the thesis is organized as follows. In Chapter 2 we give a brief description
of the important papers that we have studied or utilized as part of our literature
survey. In Chapter 3 we introduce our proposed system model for clustering the
sensor network. Chapter 4 shows the experimental results achieved so far, and
finally in Chapter 5 we present the conclusion and future work.
Chapter 2
Literature Survey
In recent years, the interest in clustered WSNs has generated a significant body of
research work. A CH may be elected by the sensors in a cluster or pre-assigned
by the network designer. A CH may also be just one of the sensors or a node
that is richer in resources. Also, the cluster membership of a node may be fixed or
variable [25]. In this chapter, we review CH selection algorithms.
2.1 LEACH
Low Energy Adaptive Clustering Hierarchy (LEACH) by Heinzelman [15] [26] is
the most famous clustering protocol and has been the basis for many further
clustering protocols. The most important goal of LEACH is to use Cluster Heads to
reduce the energy cost of transmitting data from normal nodes to a distant Base
Station [3]. In LEACH, nodes organize themselves into local clusters with one
node acting as cluster head. All non-cluster-head nodes (normal nodes) transmit
their data to the cluster heads. Cluster head nodes perform data aggregation
and/or data fusion on the data that is to be transmitted to the Base Station. The
cluster heads change randomly over time to balance the energy dissipation of the
nodes.
The operation of LEACH is divided into two phases: the set up phase and the steady
state phase. Each round begins with a set-up (clustering) phase in which clusters are
organized, followed by a steady-state (transmission) phase in which data packets
are transferred from normal nodes to cluster heads. After data aggregation, the
cluster heads transmit the messages to the Base Station.
Set up Phase: During this phase each node decides whether or not to become
a cluster head for the current round. The election of cluster heads is done with a
probability function: each node selects a random number between 0 and 1, and if
the number is less than T(n), the node is elected as a cluster head for the current
round:
\[
T(n) =
\begin{cases}
\dfrac{p}{1 - p \left( r \bmod \frac{1}{p} \right)} & \text{if } n \in G \\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
\]
where p is the cluster head probability, r is the number of the current round and G
is the set of nodes that have not been cluster heads in the last 1/p rounds. After this
CH election, each cluster head prepares a TDMA schedule and transmits it to all
the cluster nodes in its cluster. This completes the set up phase of
LEACH. Figure 2.1 [26] illustrates the set up phase through a flowchart.
Figure 2.1: Flow Chart for Set UP Phase
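The election rule of equation (2.1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the LEACH reference implementation: the function names `leach_threshold` and `elect_cluster_heads`, the `eligible` set standing for G, and the rounding of 1/p to an integer number of rounds are assumptions made here.

```python
import random


def leach_threshold(p, r, in_G):
    """Threshold T(n) of equation (2.1) for one node.

    p    -- cluster head probability
    r    -- number of the current round
    in_G -- True if the node has not been a cluster head in the last 1/p rounds
    """
    if not in_G:
        return 0.0
    period = round(1 / p)  # assumption: 1/p is treated as a whole number of rounds
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, eligible, p, r, rng=random):
    """Return the nodes whose random draw in [0, 1) falls below T(n)."""
    heads = []
    for n in node_ids:
        if rng.random() < leach_threshold(p, r, n in eligible):
            heads.append(n)
    return heads
```

With p = 0.05 the threshold starts at 0.05 in round 0 and grows towards 1 by round 19, so every node still in G is almost certain to be elected before the cycle of 1/p rounds ends.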
Steady State Phase: In this phase nodes send their collected data to the CH
once per frame, in the slot allocated to them. This assumes that a node always has
data to transmit. To save energy, the node goes to sleep mode after this
transmission until its next allocated transmission slot. The CH must keep its
receiver on all the time to receive the data from the cluster nodes. After receiving
all the data, the CH aggregates it and transmits it to the base station.
The strength of LEACH is in its CH rotation mechanism and data aggregation.
One important problem with LEACH, however, is that it offers no guarantee about
the placement and/or number of cluster head nodes in each round [3] [15]. Therefore
using a centralized clustering algorithm would produce better results.
LEACH-Centralized (LEACH-C) is a cluster formation algorithm run at the Base
Station. It uses the same steady state protocol as LEACH. During the set up phase,
each node sends information about its current position and energy level to the BS.
The usual assumption is that each node has a GPS receiver. The BS has to ensure an
even distribution of energy among the nodes, so it determines a threshold for the
energy level and selects the nodes with energy higher than this threshold as possible
cluster heads. Determining the optimal number of cluster heads
is an NP-hard problem. LEACH-C makes use of the Simulated Annealing algorithm
(Murata and Ishibuchi, 1994) to address this problem. After determining the
cluster heads of the current round, the BS sends a message containing the cluster
head ID for each node. If a node's cluster head ID matches its own ID, the node is a
cluster head; otherwise it is a normal node and can go to sleep until the data
transmission phase. LEACH-C is more efficient than LEACH (LEACH-C delivers about
40% more data per unit energy than LEACH) because the BS has global knowledge
of the location and energy level of all nodes in the network [15]. Also, LEACH-C
always ensures the existence of the optimal number K of cluster heads in every
set-up phase, while LEACH cannot ensure that [3] [14] [15].
Disadvantages
• It assumes a homogeneous distribution of sensor nodes in the given area.
• LEACH is not applicable in large regions.
• No uniform distribution of the CH nodes in the network.
2.2 MAP REDUCE PROGRAMMING MODEL
2.3 NS2
NS2 is a discrete event simulator that is very useful for analysing the dynamic
nature of communication networks. Figure 2.2 shows the basic architecture of NS2.
The NS2 simulator is based on two languages: an object-oriented simulator written in
C++, and an OTcl (an object-oriented extension of Tcl) interpreter. OTcl is
used to execute the user's command scripts. These scripts can be used to define
network topologies, modules and their relationships, the protocol that we wish to
implement, the application to be simulated, the form of output that is expected,
etc. After simulation, NS2 can generate output in
the form of text or animation. To interpret these results graphically as well as
interactively, additional tools such as NAM (Network AniMator) and XGraph are
used.
Figure 2.2: Basic Architecture of NS 2
Characteristics of NS2
NS2 implements the following features:
1. Router queue management techniques: DropTail, RED, CBQ
2. Traffic source behaviour: WWW, CBR, VBR
3. Multicasting
4. Routing
5. Simulation of wireless networks
• Developed by Sun Microsystems + UC Berkeley (Daedalus Project)

• Terrestrial (cellular, adhoc, GPRS, WLAN, BLUETOOTH), satellite
• IEEE 802.11 can be simulated, Mobile-IP, and adhoc protocols such as
6. Tracing Packets on all links/specific links
7. Network Topology
8. Packet flow
9. Applications- Telnet, FTP, Ping
2.3.1 Main NS2 Simulation Steps

Following are the three key steps in defining a simulation scenario in NS2:
Step 1: Simulation Design This is the first step of simulating the network, in
which the user must determine the purpose of the simulation, the network to be
simulated with its configuration, the assumptions to be made, the performance
measures and the expected type of output.
Step 2: Configuring and Running Simulation This step is the implementation
of the first step. It consists of two phases:
• Network configuration phase: This is the phase in which the actual
network components, such as protocols and models, are created and configured
according to the first step. Different events, such as data transfer and
simulation start and stop times, are also scheduled.
• Simulation Phase: This phase starts the simulation as per the
configuration made in the network configuration phase. It maintains the
simulation clock and executes all scheduled events until the threshold
value of the clock is reached.
Step 3: Post Simulation Processing Verifying the integrity of the program
and evaluating the performance of the simulated network are the main tasks
of this step.
The first two steps are implemented using the C++ and OTcl languages, as mentioned
earlier. Step 3, which evaluates the performance of the simulated network, is also
called packet tracing.
2.3.2 Packet Tracing

The main activity in packet tracing is recording the details of packet flow during
a simulation. It is classified into text-based packet tracing and NAM packet
tracing.
Text-Based Packet Tracing
In text-based packet tracing, the details of packets flowing through the network
check points (e.g., nodes and queues) are recorded. Figure 2.3 shows the format of
each line in a normal trace file, in which 12 columns form a line.
Figure 2.3: Format of each line in a trace file.
Having the trace file alone is not sufficient unless meaningful data is extracted
from it. In the post-analysis phase the user can extract the data of interest and
further analyse it as per the requirement. For example, the average throughput of a
link can be found by extracting the respective columns from the trace file. Another
example is finding the time taken by each packet to reach its destination. The two
most popular languages for this are AWK and Perl.
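The throughput example above can be sketched as follows. Python is used here instead of AWK or Perl purely for illustration. The column layout is an assumption, since Figure 2.3 is not reproduced in the text: the sketch assumes the usual 12-column NS2 wired trace line (event, time, from node, to node, packet type, packet size, flags, flow id, source address, destination address, sequence number, packet id). The function name `average_throughput_bps` is hypothetical.

```python
def average_throughput_bps(lines, to_node):
    """Average throughput, in bits per second, of packets received at to_node.

    Assumes 12 whitespace-separated columns per line, with the event type in
    column 1, the time in column 2, the receiving node in column 4 and the
    packet size in bytes in column 6.
    """
    total_bytes = 0
    first = last = None
    for line in lines:
        fields = line.split()
        if len(fields) != 12:
            continue  # skip lines that do not match the assumed format
        event, time, _from_node, dst, _ptype, size = fields[:6]
        if event != "r" or dst != to_node:
            continue  # keep only receive events at the node of interest
        t = float(time)
        if first is None:
            first = t
        last = t
        total_bytes += int(size)
    if first is None or last == first:
        return 0.0  # no packets, or no measurable interval
    return total_bytes * 8 / (last - first)
```

The function accepts any iterable of lines, so an open trace file can be passed directly. Dividing by the interval between the first and last receive event is one possible definition of average throughput; dividing by the total simulation time is an equally valid choice.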
Another type of output is animation based output which is created using Network
AniMation (NAM) Trace. This NAM trace records the simulation details in a text
file and uses this text file to playback the simulation in animation form.
NAM
NAM provides a visual interpretation of the network topology created. Its features
are as follows. Figure 2.4 displays the NAM application and its components.
• Provides a visual interpretation of the network created
• Provides a drag and drop interface for creating topologies.
• Can be executed directly from a Tcl script
• Presents information such as throughput, number packets on each link.
• Controls include play, stop, fast forward, rewind, pause, a display speed
controller and a packet monitor facility.
Trace Data Analyzer
This section describes XGraph application used to analyse trace files produced
from a simulation.
XGraph Xgraph is an X-Windows application that includes:
• Animation and derivatives
• Interactive plotting and graphing
To use XGraph in NS-2, the executable can be called within a Tcl script. This will
then load a graph that visually displays the information of the trace file
produced from the simulation (see figure 2.5).
2.3.3 Main Parameters used in wireless networks simulation
Let us see some of the parameters used in wireless networks simulation along with
their default and available values.
Figure 2.4: NAM Tool Description.
Figure 2.5: XGraph running comparing three trace files in a graph.
Transport Protocol Transport protocol used by the sensor node. Available

protocols are TCP and UDP (default).
Routing Protocol Routing protocol used by sensor node. Available protocols

are DSR, TORA, LEACH, Directed Diffusion, DSDV and AODV (default).
Medium Access Control (MAC) Medium access control for sensor node. Since
it's a wireless sensor network simulation, IEEE 802.11 is the MAC available.
Link Layer Link layer configuration. Uses NS-2 LL default link layer configu-
ration.
Physical Layer Network interface layer. Two configurations are provided:

one simulating Crossbow Mica2 sensor node (default) and other simulating
a 914MHz Lucent WaveLAN DSSS radio interface.
Antenna Sensor node antenna configuration. An omnidirectional antenna, cen-

tered in node position and 1.5 meters above the ground is provided.
Radio Propagation Radio propagation model used in simulation. Four models

are available: FreeSpace, Shadowing, ShadowingVis, TwoRayGround (de-
fault).
Interface Queue (IFQ) Interface priority queue. Eight queue models are pro-
vided: DropTail (default), DropTail/XCP, RED, RED/Pushback, RED/RIO,
Vq and XCP.
IFQ Length Number of messages buffered in the IFQ. The user should provide this
value (50 messages by default).
Scenario Size Size (in meters) of the simulation scenario. The user should fill
in the length for the sides of the simulation rectangle (100 x 100meters by
default).
Trace File Name Trace file name. Default value “trace.tr”.
Trace Options Trace options. Radio buttons are used to define the kind of
information that should be stored in the trace file: TRACE-MAC, TRACE-ROUTE
and TRACE-AGENT. By default all three options are set "on".
NS has a rich library of network and protocol objects. Additionally, large amount
of online support is available through mailing lists, message boards, tutorials, on-
line manuals, etc.
Chapter 3
System Design
3.1 BACKGROUND
3.1.1 MAP-REDUCE Programming Model

The MapReduce framework was originally developed at Google [12], but owing to
its wide adoption it has now become a de facto standard for data analysis in large
scale companies. The MAP-REDUCE programming model
is defined by Dean et al [12]. The MAP-REDUCE computing model consists of two
functions, Map and Reduce. The Map and Reduce functions are both defined on
data structured as (key1, value1) pairs [12] [13]. The Map function is applied to each
item in the input dataset according to the format of the (key1, value1) pairs; each
call produces a list of (key2, value2) pairs. All the pairs which have the same key in
the output lists are passed to the Reduce function, which generates one value3 or an
empty return. The results of all calls form a list, List(value3) [11]. This process of MAP
and REDUCE is illustrated in figure 3.1 [21]. The process of data input and output
is as follows [12] [13].
\[ \mathrm{Map}(key_1, value_1) \rightarrow \mathrm{List}(key_2, value_2) \tag{3.1} \]
\[ \mathrm{Reduce}(key_2, value_2) \rightarrow \mathrm{List}(value_3) \tag{3.2} \]
Consider the example of counting the number of occurrences of each word in

a large dataset. The map function generates each word along with its number of
occurrences in the data set. The Reduce function sums together all number of
occurrences generated for particular word by map function.
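The word count example can be sketched in plain Python as below. This is a single-process illustration of the Map and Reduce signatures in (3.1) and (3.2), not a distributed implementation. The function names and the `documents` dictionary are hypothetical, and it assumes the common variant in which Map emits a count of 1 for every occurrence of a word.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map(key1, value1) -> List(key2, value2): emit (word, 1) per occurrence."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_counts(word, counts):
    """Reduce: sum all counts emitted for one word, returning List(value3)."""
    return [sum(counts)]


def run_mapreduce(documents):
    """Apply Map to every document, group by key, then apply Reduce per key."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_words(doc_id, text):
            grouped[key].append(value)  # shuffle step: collect values by key
    return {word: reduce_counts(word, counts)[0]
            for word, counts in grouped.items()}
```

The grouping step in `run_mapreduce` stands in for the shuffle that a real MapReduce runtime performs between the two phases.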
Figure 3.1: Map Reduce Illustration.
3.1.2 K-MEANS Algorithm

K-MEANS [21] is the simplest unsupervised algorithm used for clustering.
This algorithm partitions the data set into k clusters using
the cluster mean value, so that the intra-cluster similarity of the resulting clusters
is high and the inter-cluster similarity is low. K-MEANS is iterative in nature.
Figure 3.2 [21] illustrates the original K-MEANS algorithm. It follows these steps:
1. Arbitrarily generate k points (cluster centers), k being the number of clusters
desired.
2. Calculate the distance between each of the data points and each of the centers,
and assign each point to the closest center.
3. Calculate the new cluster center by calculating the mean value of all data
points in the respective cluster.
4. With the new centers, repeat step 2. If the assignment of cluster for the
data points changes, repeat step 3; else stop the process.
The distance between the data points is calculated using the Euclidean distance.
The Euclidean distance between two points or tuples
X_1 = (x_{11}, x_{12}, ..., x_{1n}) and X_2 = (x_{21}, x_{22}, ..., x_{2n}) is
\[ \mathrm{Dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} \tag{3.3} \]
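The four steps and the distance of equation (3.3) can be put together in a short Python sketch. It is a minimal illustration under stated assumptions: points are tuples of numbers, the initial centers are sampled from the data points themselves, and `max_iter` is a safety bound that is not part of the description above.

```python
import math
import random


def euclidean(x1, x2):
    """Euclidean distance of equation (3.3) between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def kmeans(points, k, rng=random, max_iter=100):
    """Return (centers, assignment) for a list of points."""
    centers = rng.sample(points, k)  # step 1: arbitrary initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the index of its closest center.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # step 4: assignments unchanged, so stop
        assignment = new_assignment
        # Step 3: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assignment
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is useful because the result of K-MEANS depends on the initial centers.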
3.2 OUR PROPOSED SYSTEM
Figure 3.2: Original K-MEANS Algorithm.

Clustering has been proved to be an effective method to increase the lifetime
of WSN [19]. To use this effectiveness, we present a new technique for
clustering the sensor nodes. As mentioned in section 1.3, the main motivation
for proposing this system is to apply the strengths of K-MEANS along with
MAP-REDUCE. We have developed a clustering method that depends on the energy of
the nodes in order to extend the lifetime of the WSN.
3.2.1 ALGORITHM ASSUMPTIONS

In this section, we make some assumptions for the proper operation of our
algorithm. The main assumption is that the base station has no constraints on
its energy resources and that all nodes initially have the same energy available. The
operation of a sensor network starts with the cluster set up phase, in which clusters
of the sensor nodes are formed, followed by the data transmission phase, in which
cluster nodes transmit the collected data to the cluster head. Each cluster head
aggregates the data received from its cluster nodes and relays it to the base station.
The cluster set up phase is divided into two sub-clustering phases. In the first
sub-clustering phase, the base station clusters the sensor nodes and assigns the
proper roles to them. This operation is referred to as the MAP protocol. In the second
sub-clustering phase, if the energy of a cluster node is running low, the node tries to
find a better cluster head. This phase is referred to as the REDUCE protocol.
3.2.2 CLUSTER SETUP PHASE

As we are using the MAP-REDUCE [10] [11] algorithm, the types of key and value
for the proposed method are as follows.
key1 → List of the initial set of selected k centroids.
value1 → List of all other nodes along with their location information and energy level.
key2 → List of the new set of k centroids.
value2 → List of all other nodes with their cluster heads.
key3 → List of the new k' ≤ k centroids.
value3 → List of all other nodes with their new cluster heads.
Map / First-clustering Phase
Table 3.1 shows the process of a Mapper. The input to the mapper is the list of the
initial set of randomly selected k centroids as key1 and the list of all other nodes
along with their location information and energy level as value1. Using the
mapper(key1, value1) protocol, the Map phase produces the list of the new set of k
centroids as key2 and the list of all other nodes with their cluster heads known to
them as value2.
Table 3.1: MAP Protocol

1 BS → N sensor nodes: request each node's position and energy level
2 BS: KMEANS(key1, value1)
3 BS assigns a role (cluster head / member node) to each sensor node
4 Each cluster head sends a one-hop communication about the cluster to its member nodes
5 Generate output key2, value2
Reduce / second-clustering Phase
Table 3.2 shows the process of a Reducer. The intermediate results produced by
the Map protocol are given as input to the reducer, i.e. the list of the new set of k
centroids as key2 and the list of all other nodes with their cluster heads known to
them as value2. Using the reducer(key2, value2) protocol, the Reduce phase produces
the final clusters with their cluster heads and the other nodes in each cluster as
value3. The term reduce, as used in the Reduce phase, means optimizing the output
and not reducing the size of the output.
The original Map is parallel in nature. We use a centralized MAP algorithm at the
BS, but REDUCE is parallelized to optimize the final clusters. This parallelization
reduces the time needed to cluster the sensor network.
Table 3.2: REDUCE Protocol

1 Read(value2) /* Build second clustering */
2 Place k' (k' ≤ k) nodes as the initial cluster heads
3 Repeat
a) If a member node's energy falls below the threshold, it starts searching
for a better CH
b) If a cluster head is running out of energy, a new CH is assigned to its nodes
c) Update CH, i.e. calculate the mean value for each cluster
Until no change
4 Produce value3
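The cluster-head replacement in step 3 b) and c) of Table 3.2, run independently per cluster, might be sketched as follows. This is a hedged illustration only: the threshold value, the dictionary layout and the rule "pick the node above the threshold nearest to the cluster mean" are assumptions, and step 3 a) (a member node looking for another cluster) is not covered.

```python
import math
from concurrent.futures import ThreadPoolExecutor

ENERGY_THRESHOLD = 0.5  # joules; hypothetical value, not from the thesis

def reduce_cluster(cluster):
    """cluster: {"ch": node_id, "nodes": {node_id: (x, y, energy)}}.

    Returns the id of the cluster head after the energy check.
    """
    nodes = cluster["nodes"]
    if nodes[cluster["ch"]][2] >= ENERGY_THRESHOLD:
        return cluster["ch"]  # current CH still has enough energy
    # Step c): mean position of the cluster
    cx = sum(n[0] for n in nodes.values()) / len(nodes)
    cy = sum(n[1] for n in nodes.values()) / len(nodes)
    # Assumed rule: among nodes above the threshold, take the one nearest the mean.
    candidates = [i for i, n in nodes.items() if n[2] >= ENERGY_THRESHOLD]
    if not candidates:
        return cluster["ch"]  # no better node available; keep the old CH
    return min(candidates, key=lambda i: math.dist(nodes[i][:2], (cx, cy)))

def reduce_phase(clusters):
    # Clusters do not depend on each other, so they can be processed in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(reduce_cluster, clusters))
```

A thread pool is used here only to show that the per-cluster work is independent; in the network itself this work would be spread over the clusters rather than over threads.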
The K MEANS algorithm is called by both the MAP and the REDUCE protocol (see
Table 3.3).
Table 3.3: K Means Algorithm

1 BS arbitrarily chooses k nodes, having maximum energy and closer to the nodes,
as initial cluster heads
2 Repeat
3 (Re)assign each node to the cluster with the nearest CH.
4 Calculate the mean value of the cluster.
5 Until no change
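A minimal sketch of the loop in Table 3.3 is given below. It assumes one possible reading of step 1, namely that the k nodes with the highest energy are used as the initial cluster heads; the function name, the tuple layout and the `max_iter` safety limit are illustrative choices, not part of the thesis.

```python
import math

def kmeans(nodes, k, max_iter=100):
    """nodes: list of (x, y, energy). Returns (centroids, labels)."""
    # Step 1 (assumed reading): seed with the k highest-energy nodes.
    seeds = sorted(nodes, key=lambda n: n[2], reverse=True)[:k]
    centroids = [(x, y) for x, y, _ in seeds]
    labels = None
    for _ in range(max_iter):
        # Step 3: (re)assign each node to the nearest cluster centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist((x, y), centroids[c]))
            for x, y, _ in nodes
        ]
        if new_labels == labels:  # Step 5: stop when nothing changes
            break
        labels = new_labels
        # Step 4: recompute the mean position of every non-empty cluster.
        for c in range(k):
            pts = [(x, y) for (x, y, _), lab in zip(nodes, labels) if lab == c]
            if pts:
                centroids[c] = (sum(p[0] for p in pts) / len(pts),
                                sum(p[1] for p in pts) / len(pts))
    return centroids, labels
```

With the number of clusters taken as 5% of the nodes, a 100-node network would call this with k = 5.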
The effectiveness of the clusters is evaluated based on the uniformity of the
node distribution.
Intra-Cluster Distance
This is the distance between the nodes of a cluster and its cluster centre, used
to determine whether the clusters are compact [30].

$$ \mathrm{intra} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - z_i \rVert \tag{3.4} $$

where $N$ is the number of nodes in the network, $K$ is the number of clusters, and
$z_i$ is the cluster centre of cluster $C_i$.
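As a small worked illustration of Equation 3.4, the measure can be computed as below. The function name and argument layout are hypothetical, and the plain Euclidean norm is used, as the equation is printed.

```python
import math

def intra_cluster_distance(points, labels, centres):
    """Mean distance of every node to the centre of its own cluster.

    points  -- list of (x, y) node positions
    labels  -- labels[i] is the cluster index of points[i]
    centres -- list of (x, y) cluster centres
    """
    total = sum(math.dist(p, centres[lab]) for p, lab in zip(points, labels))
    return total / len(points)
```

For three nodes at (0, 0), (2, 0) and (10, 0), with the first two in a cluster centred at (1, 0) and the third at its own centre, the distances are 1, 1 and 0, so the measure is 2/3.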
Inter-Cluster Distance
This is the distance between clusters [30]. We calculate it as the squared distance
between cluster centres and take only the minimum of this value over all pairs of
clusters:

$$ \mathrm{inter} = \min_{i,j} \left( \lVert z_i - z_j \rVert^{2} \right), \quad i = 1, 2, \ldots, K-1, \; j = i+1, \ldots, K \tag{3.5} $$
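The pairwise minimum in Equation 3.5 can be sketched in a few lines. The function name is hypothetical; `itertools.combinations` enumerates exactly the pairs with i < j.

```python
import math
from itertools import combinations

def inter_cluster_distance(centres):
    """Smallest squared distance between any two cluster centres."""
    return min(math.dist(a, b) ** 2 for a, b in combinations(centres, 2))
```

A larger value means that even the two closest clusters are well separated.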
3.3 Channel Propagation Model

In the wireless channel, electromagnetic wave propagation can be modelled as
falling off as a power-law function of the distance between the transmitter and
the receiver. Two models were considered, depending on that distance: the free
space model, which assumes a direct line-of-sight path, and the two-ray ground
propagation model, which also accounts for the ground-reflected signal. If the
distance is greater than or equal to d_crossover, the two-ray ground propagation
model is used; otherwise the free space model is used. The crossover distance is
defined as follows.
$$ d_{crossover} = \frac{4 \pi \sqrt{L} \, h_r h_t}{\lambda} \tag{3.6} $$
where $L \geq 1$ is the system loss factor, $h_r$ is the height of the receiving
antenna, $h_t$ is the height of the transmitting antenna and $\lambda$ is the
wavelength of the carrier signal. The transmit power is attenuated according to
the following formula:

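$$ P_r(d) = \begin{cases} \dfrac{P_t G_t G_r \lambda^2}{(4 \pi d)^2} & \text{if } d < d_{crossover} \\[2ex] \dfrac{P_t G_t G_r h_t^2 h_r^2}{d^4} & \text{if } d \geq d_{crossover} \end{cases} \tag{3.7} $$

where $P_r(d)$ is the received power at distance $d$, $P_t$ is the transmitted power,
$G_t$ is the gain of the transmitting antenna and $G_r$ is the gain of the receiving
antenna.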
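The two formulas of this section can be combined into a short sketch. The function and parameter names are hypothetical, and the default gains and loss factor of 1 are illustrative defaults rather than values prescribed here.

```python
import math

def crossover_distance(h_t, h_r, wavelength, L=1.0):
    """Equation 3.6: distance at which the model switches to two-ray ground."""
    return 4 * math.pi * math.sqrt(L) * h_r * h_t / wavelength

def received_power(p_t, d, h_t, h_r, wavelength, g_t=1.0, g_r=1.0, L=1.0):
    """Equation 3.7: received power at distance d (same unit as p_t)."""
    if d < crossover_distance(h_t, h_r, wavelength, L):
        # Free space model: power falls off with d^2
        return p_t * g_t * g_r * wavelength ** 2 / (4 * math.pi * d) ** 2
    # Two-ray ground model: power falls off with d^4
    return p_t * g_t * g_r * h_t ** 2 * h_r ** 2 / d ** 4
```

As an example under an assumed 914 MHz carrier (a wavelength of about 0.328 m, which is not stated in the thesis) and antenna heights of 1.5 m, `crossover_distance(1.5, 1.5, 0.328)` comes out at roughly 86 m.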
3.4 Radio Energy Dissipation

We assumed a simple model for the radio hardware energy dissipation, in which the
transmitter dissipates energy to run the radio electronics and the power amplifier,
and the receiver dissipates energy to run the radio electronics, as shown in Figure
3.3. Using this radio model, to transmit a k-bit message over a distance d, the
radio expends:
$$ E_{Tx}(k, d) = E_{Tx\text{-}elec}(k) + E_{Tx\text{-}amp}(k, d) \tag{3.8} $$

$$ E_{Tx}(k, d) = E_{elec} \cdot k + E_{amp} \cdot k \cdot d^2 \tag{3.9} $$

and to receive this message, the radio expends:

$$ E_{Rx}(k, d) = E_{Rx\text{-}elec}(k) \tag{3.10} $$

$$ E_{Rx}(k, d) = E_{elec} \cdot k \tag{3.11} $$
Figure 3.3: Radio Energy Dissipation Model [26]
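Equations 3.9 and 3.11 translate directly into code. The sketch below uses hypothetical function names; the constants in the example that follows are illustrative values, not parameters taken from this thesis.

```python
def tx_energy(k_bits, d, e_elec, e_amp):
    """Equation 3.9: energy (J) to transmit k_bits over distance d (m)."""
    return e_elec * k_bits + e_amp * k_bits * d ** 2

def rx_energy(k_bits, e_elec):
    """Equation 3.11: energy (J) to receive k_bits."""
    return e_elec * k_bits
```

For instance, with an assumed E_elec of 50 nJ/bit and E_amp of 100 pJ/bit/m^2, sending a 2000-bit message over 100 m costs 0.1 mJ for the electronics plus 2 mJ for the amplifier, i.e. 2.1 mJ, while receiving it costs 0.1 mJ. The amplifier term dominates at this range, which is why shorter links to a nearby cluster head save energy.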
Chapter 4
Implementation and Simulation
In this chapter we give the details of the implementation of the proposed
algorithm and the results obtained from it. The details are as follows:
4.1 Simulation Set up

We simulated the proposed algorithm in NS 2.29 [16]. We obtained results both for
cluster heads placed with a minimum separation distance and for cluster heads
placed randomly over the grid. We also calculated the intra-cluster and
inter-cluster distances. We analysed the network in terms of packet delivery
ratio, energy consumption for transmission and dropped packets, and found that
the network works well.
For the simulation experiments, the following parameters were used:
Tx antenna gain Gt = Rx antenna gain Gr = 1
Antenna height (Ht) = 1.5 m
Base station location: (500, 200)
4.2 Simulation Results

Table 4.1: Simulation Parameters

No.  Item Description          Parameter
1    Simulation Area           1000 x 1000
2    No. of Nodes              100
3    Radio Propagation Model   Two ray ground
4    Channel Type              Channel/WirelessChannel
5    Antenna Model             Antenna/OmniAntenna
6    Interface Queue Type      Queue/DropTail/PriQueue
7    Link Layer Type           LL
8    Energy Model              Battery
9    Min Packets in ifq        30

As mentioned in [26], setting the number of clusters to 5% of the total number of
nodes gives better performance in the network. We have clustered the network into
the same number of clusters. We have computed the intra-cluster distance and the
inter-cluster distance of the clusters. As mentioned earlier, the cluster heads
can be placed randomly or separated by some minimum distance. Results show that
separating the cluster heads by some minimum distance gives better performance.
We have taken the minimum distance to be 50 meters. Figure 4.1 shows the
end-to-end delay for cluster heads placed with the minimum distance and for
cluster heads placed randomly. It shows that the end-to-end delay of the network
is much lower if the cluster heads are separated by the minimum distance.
Figure 4.1: End To End Delay.
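One simple way to obtain cluster heads that respect a minimum separation is a greedy filter, sketched below. This is an assumption made for illustration: the thesis does not state how the separated placement was produced, and the function name and scan order are hypothetical.

```python
import math

def pick_separated_heads(positions, k, min_dist=50.0):
    """Greedily pick up to k node indices that are pairwise >= min_dist apart.

    positions -- list of (x, y) candidate positions, scanned in the given order
    """
    heads = []
    for i, p in enumerate(positions):
        # Accept a candidate only if it is far enough from every chosen head.
        if all(math.dist(p, positions[h]) >= min_dist for h in heads):
            heads.append(i)
            if len(heads) == k:
                break
    return heads
```

If fewer than k candidates satisfy the separation, the function returns a shorter list, so a caller would need to relax `min_dist` or supply more candidates.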
Figure 4.2 shows the energy consumption for data transmission in the network.
It shows that the energy consumption is much lower if the cluster heads are
separated by the minimum distance.
Figure 4.2: ENERGY REQUIRED.
Chapter 5
Future Scope and Conclusion
We examined the need for clustering in wireless sensor networks. We introduced
the LEACH algorithm in the literature survey, but its major disadvantage is that
it assumes a homogeneous distribution of nodes in the network. As MAP REDUCE is a
programming model well suited to parallelizing tasks over large data sets, we
tried to use this functionality of MAP REDUCE. K MEANS is widely used for
clustering in data mining, but it is best suited to smaller data sets. The larger
data set of the sensor network becomes a set of smaller data sets for K MEANS,
for which K MEANS works best. So we tried to combine the best of these two
methods.
Our proposed scheme does not need a homogeneous distribution of the nodes over
the grid. In the MAP phase we assign cluster heads to the sensor nodes. In the
REDUCE phase we try to optimize the clusters by checking two conditions. First,
we check the energy of the CH; if it is below some threshold, a new CH is assigned
to the sensor nodes. This helps to minimize the dropped nodes in the network.
Second, if the energy of a common node falls below some threshold, the node tries
to find a new CH. This also helps to minimize the dropped nodes. We have placed
the CHs in the sensor network such that a minimum distance is maintained among
them. Our algorithm changes the cluster head of the nodes if the CH is running out
of energy, which helps to minimize the dropped packets. The proposed scheme also
gives better performance in terms of throughput. Our scheme considers the energy
of a node as well as its position, which helps to produce better clusters. From
this we can conclude that our proposed algorithm achieves better results in terms
of energy required, throughput of the network and number of dropped packets.
Bibliography
[1] K. Benkič, M. Malajner, A. Peulić, and Ž. Čučej, "Academic Education Wireless
Sensor Network: AeWSN," 50th International Symposium ELMAR-2008, Zadar,
Croatia, 10-12 September 2008.
[2] A. A. Abbasi and M. Younis, "A Survey on Clustering Algorithms for Wireless
Sensor Networks," Computer Communications 30, pp. 2826-2841, 21 June 2007.
[3] Neda Enami, Reza Askari Moghadam, "Energy Based Clustering Self Organizing
Map Protocol for Extending Wireless Sensor Networks Lifetime and Coverage,"
Canadian Journal on Multimedia and Wireless Networks, Vol. 1, No. 4,
August 2010.
[4] Shamneesh Sharma, Robin Prakash Mathur, Dinesh Kumar, "Enhanced Reliable
Distributed Energy Efficient Protocol for WSN," International Conference on
Communication Systems and Network Technologies, 2011.
[5] A. S. Raghuvanshi, S. Tiwari, R. Tripathi and N. Kishor, "Optimal Number of
Clusters in Wireless Sensor Networks: An FCM Approach," ICCCT 2010.
[6] Alper Bereketli, Ozgur B. Akan, "Event-to-Sink Directed Clustering in Wireless
Sensor Networks," WCNC 2009.
[7] Anirooth Thonklin, W. Suntiamorntut, "Load Balanced and Energy Efficient
Cluster Head Election in Wireless Sensor Networks," ECTI Association of
Thailand - Conference 2011.
[8] C. Nam, H. Jeong, and D. Shin, "The Adaptive Cluster Head Selection in
Wireless Sensor Networks," IEEE International Workshop on Semantic Computing
and Application, pp. 147-149, July 2008.
[9] C. Nam, Y. Ku, J. Yoon, and D. Shin, "Cluster Head Selection for Equal Cluster
Size in Wireless Sensor Networks," Proceedings New Trends in Information
and Service Science, 2009. NISS, pp. 618-623, July 2009.
[10] Alina Ene, Sungjin Im, Benjamin Moseley, "Fast Clustering using MapReduce,"
http://www.arxiv.org/abs/1109.1579v1, September 6, 2011.
[11] Jing Zhang, Gongqing Wu, Haiguang Li, Xuegang Hu, Xindong Wu, "A 2-Tier
Clustering Algorithm with Map-Reduce," The Fifth Annual ChinaGrid Conference,
China, 2010.
[12] J. Dean, S. Ghemawat, "MapReduce: Simplified Data Processing on Large
Clusters," Proceedings of the 6th Symposium on Operating System Design
and Implementation, San Francisco, California, USA, pp. 137-150, December
2004.
[13] J. Dean, S. Ghemawat, "MapReduce: A Flexible Data Processing Tool,"
Communications of the ACM, vol. 53, no. 1, pp. 72-77, January 2010.
[14] W. Heinzelman, A. Chandrakasan, H. Balakrishnan, "An Application-specific
Protocol Architecture for Wireless Microsensor Networks," IEEE Transactions
on Wireless Communications, pp. 660-670, 2002.
[15] W. Heinzelman, A. Chandrakasan, H. Balakrishnan, "Energy-Efficient
Communication Protocol for Wireless Microsensor Networks," Proc. 33rd HICSS,
2000.
[16] NS 2, Network Simulator, World Wide Web,
http://www.isi.edu./nsnam/ns/ns-build.html, 2004.
[17] L. Dehni, F. Kief, Y. Bennani, "Power Control and Clustering in Wireless
Sensor Networks," Proceedings of Med-Hoc-Net Mediterranean Ad Hoc Networking
Workshop, France, 2005.
[18] F. L. Lewis, D. J. Cook and S. K. Das, "Wireless Sensor Networks," Smart
Environments: Technologies, Protocols, and Applications, John Wiley, New
York, 2004.
[19] N. Vlajic and D. Xia, "Wireless Sensor Networks: To Cluster or Not To
Cluster?" WoWMoM'06, 2006.
[20] Vivek Katiyar, Narottam Chand, Surender Soni, "Clustering Algorithms for
Heterogeneous Wireless Sensor Network: A Survey," International Journal of
Applied Engineering Research, DINDIGUL, Volume 1, No 2, 2010.
[21] Xuan Wang, "Clustering in the Cloud: Clustering Algorithms to Hadoop
Map/Reduce Framework," http://ecommons.txstate.edu/cscitrep/19, Paper 19,
2010.
[22] Kazem Sohraby, Daniel Minoli, Taieb Znati, "Wireless Sensor Networks:
Technology, Protocols, and Applications," John Wiley, New York, 2007.
[23] S. McCanne, S. Floyd, and K. Fall, NS2 (Network Simulator 2),
http://www-nrg.ee.lbl.gov/ns/.
[24] The VINT Project, "The ns Manual (Formerly ns Notes and Documentation),"
A Collaboration between researchers at UC Berkeley, LBL, USC/ISI,
and Xerox PARC, DABT63-96-C-0105.
http://www.isi.edu/nsnam/ns/ns-documentation.html
[25] Inbo Sim, KoungJin Choi, KoungJin Kwon and Jaiyong Lee, "Energy Efficient
Cluster Header Selection Algorithm in WSN," International Conference on
Complex, Intelligent and Software Intensive Systems, 978-0-7695-3575-3/09.
[26] Rajesh Patel, Sunil Pariyani, Vijay Ukani, "Energy and Throughput Analysis
of Hierarchical Routing Protocol (LEACH) for Wireless Sensor Network," IJCA
(0975-8887), Volume 20, No. 4, April 2011.
[27] http://www.mannasim.dcc.ufmg.br/msg-basic-window.htm
[28] Teerawat Issariyakul, Ekram Hossain, "Introduction to Network Simulator NS2,"
Springer Science+Business Media, LLC, DOI: 10.1007/978-0-387-71760-9,
Library of Congress Control Number: 2008928147, 2009.
[29] Charka Panditharathne and Soumya Jyoti Sen, "Energy Efficient Communication
Protocols for Wireless Sensor Networks," thesis for the degree of Bachelor
of Technology in Electronics and Instrumentation Engineering, National
Institute of Technology, Rourkela, Orissa, May 2009.
[30] Asif Khan, Israfil Tamim, Emdad Ahmed, Muhammad Abdul Awal, "Multiple
Parameter Based Clustering (MPC): Prospective Analysis for Effective
Clustering in Wireless Sensor Network (WSN) Using K-Means Algorithm,"
Wireless Sensor Network, 4, 18-24, 2012.