Incremental and SQL-Based Data Grid Mining Algorithm For Mobility Prediction of Mobile Users

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

2009 International Conference on Computational Science and Its Applications

Incremental and SQL-Based Data Grid Mining


Algorithm for Mobility Prediction of Mobile Users

U.Sakthi R.S.Bhuvaneswaran
Research Scholar Assistant Professor
Computer Science Department Ramanujan Computing Center
Anna University, Chennai Anna University, Chennai
India India
sakthi.ulaganathan@gmail.com

Abstract
next location of mobile user in mobile environment.
In this paper, we propose a new SQL based By using the predicted location, the system effectively
incremental distributed algorithm for predicting the allocate resources to mobile users in the neighbor
next location of a mobile user in a mobile web location and it is possible to answer the queries that
environments. Parallel and Distributed data mining refer to the future position of mobile uses. Data grid is
algorithm is applied on moving logs stored in designed to allow large moving logs to be stored in
geographically distributed data grid to generate the repositories. In business area it is necessary to develop
mobility pattern, which provides various location environment for analysis, inference and discovery over
based services to the mobile users. One of the existing the data grid. Therefore, the evolution of the data grid
works for deriving mobility pattern is re-executing the is represented by knowledge grid offering high level
algorithm from scratch results in excessive services for distributed mining and extraction of
computation. In our work, we have designed new knowledge from data repositories available on data
incremental algorithm by maintaining infrequent grid.
mobility patterns, which avoids unnecessary scan of
full database. We built data grid system on a cluster of Location management or location tracking is the
workstation using open source Globus Toolkit (GT) mechanism of locating the particular mobile user at
and Message Passing Interface extended with Grid any given point of time. The two strategies for location
Services (MPICH-G2). The experiments were tracking are location updating and location prediction
conducted on original data sets with incremental [15]. Location updating is a passive method in which
addition of data and the computation time was the system periodically updates the current location in
recorded for each data sets. We analyzed our results a database. Location prediction is the dynamic strategy
with various sizes of data sets and it shows the time in which the system proactively estimates the mobile
taken to generate mobility pattern by incremental users location based on a user movement model.
mining algorithm is less than re-computing approach. Personal Communication System (PCS) allows mobile
users to move from one location to another location
1. Introduction since these systems are based on the notion of wireless
access [14]. In mobile system each mobile user is
Grid Computing has emerged as an important new associated with Home Location Register (HLR) which
field, distinguished from conventional distributed stores up-to-date location of the mobile users. These
system by focusing on large scale resource sharing, logs accumulate as large database, in which data
problem solving in dynamic and multi-institutional mining technique [13] is applied to find the frequently
virtual organizations [7]. The Knowledge Grid (KG) is followed location. Each Base Station (BS) in PCS is
a parallel and distributed architecture that integrates connected with Separate Home Location Register led
data mining techniques [8] and grid technologies. The to the geographically distributed data grid node. Grid
Knowledge Grid is exploited to perform distributed network was built with cluster of grid node contains
data mining on very large data sets available over grids moving logs of mobile users. Existing research work
to find hidden valuable information, process models to applied data mining technique on mobile data for path
make decisions and results to make business decisions. mining in a single database server. If the size of the
In our work knowledge grid is developed to predict the moving logs is very large, the overhead in integrating

978-0-7695-3701-6/09 $25.00 © 2009 IEEE 71


DOI 10.1109/ICCSA.2009.6
the data source will be too high. To overcome this grid network, mobile user movement history is
problem data mining algorithm is executed in geographically distributed in different data grid nodes.
distributed environment. The most prominent example Knowledge grid based mobility pattern mining
of distributed environment is grid, where a large algorithm is executed on computational grid over the
number of computing and storage units are data grid to generate the trajectories that are frequently
interconnected over a high speed network. Data mining used by mobile users. The frequently followed
is inductive, iterative process that extracts information trajectory is named as User Mobility Pattern (UMP)
or frequent patterns from large volume of data. Most of which is used to generate mobility rules. The
the applications have created large volume of data sets communication of candidate item sets between grid
which are constantly increasing and stored in nodes is achieved by MPICH-G2 and reduces the
geographically distributed locations. A parallel and computation overhead. Our proposed algorithm is
distributed algorithm is implemented on computational executed on knowledge grid to generate mobility
grid to mine data stored in data grids. The Knowledge patterns from the database distributed on the data grid
Grid is a parallel and distributed software architecture node. The execution time of our parallel and
that integrates grid technologies and data mining distributed algorithm is relatively small when
applications. compared to sequential algorithm.

The paper is organized as follows. In section 2, the Location management is purely database updating
process of incremental mining Mobility Pattern (MP) and querying procedure. Let UAP = 〈 l1, l2,…,ln 〉 be the
of mobile users in grid environment is described. set of locations. A database DB is a set of mobile user
Section 3 describes the related work in distributed data records, where each record contains set of locations.
mining and Section 4 describes the services of Let DB = {DB1, DB2,...,DBm} where m is number of
knowledge grid. Section 5 describes how the inter- data grid nodes in grid network and DBk is the
process communication is performed in MPICH-G2. database stored in kth data grid node. Local frequent
Section 6 explains the distributed algorithm for mining locations are mined and it is communicated to all other
mobility patterns in grid environment. Section 7 and grid nodes to find the global frequent locations called
Section 8 explains the process of generating mobility as mobility patterns. Mobility rules are generated from
rules and mobility prediction. Section 9 describes the mobility patterns which is in the form of A  B
incremental mining algorithm to generate mobility having global confidence c if c% of records in DB that
pattern when adding new moving logs to the original contain A followed by B. Mobility rule A B
database. The performance analysis of the proposed represents the mobile user’s current location is A then
method is described in Section 10. The conclusion is he/she will move to location B. The mobility rule A 
given in Section 11. B has global support s if s% of record contains A ∪ B.

2. Problem Definition In general, mobility prediction is performed in two


steps. 1. Find all mobility patterns: By definition, each
Most of the mobile users exhibit some regularity in of these location sets will occur at least as frequently as
their daily movement and they move from one location a predetermined global minimum support count,
to another location in a wireless PCS network. The min_sup. 2. Mobility rules are generated from the
coverage area of the network is divided into number of mobility patterns which satisfy global minimum
location areas. Each mobile device is linked with the support and minimum confidence. Mobility Rule is the
base station. Each base station contains home location important knowledge derived from database on a data
register which stores permanent details of the mobile grid. Moving logs are added incrementally to the data
users and Visiting Location Register (VLR) which base. It makes existing mobility rules invalidate. Re-
stores temporary details of the mobile users. These execution of mining algorithm is not an efficient
register includes attributes like user ID, user location, process, since it ignores previously discovered rules
call time, call duration. User ID acts as a key for the and repeats the work that has already been done.
mobile user records. The movement history of a mobile Incremental mining algorithm is a useful technique for
user is extracted from log files and it is stored in grid mining mobility rules when logs are added to the data
node for mining mobility patterns. The movement of base continuously.
mobile user is called as User Actual Path (UAP) which
have the form UAP = 〈 l1, l2,…,ln 〉 , where n is the
number of locations followed by the mobile user and lk
represents kth location in the movement path. In our

72
3. Related Works hostname, operating system, etc., and dynamic
information like CPU workload, memory status.
In mobile environment, moving log size is very 3. Globus Resource Allocation and Management
large. It will increase the overhead to integrate all the (GRAM): responsible for resource allocation, job
moving logs into one database server [10]. The creation, monitoring management, job control and
centralized mobility pattern algorithm was discussed in provides interface for job submission on remote
[11] which cannot be efficient for large data size. machine.
Many parallel and distributed variants of sequential 4. GridFTP: It is an extension of FTP for parallel data
apriori algorithm [9,12] have been discussed in other transfer, file transfer based on GSI authentication
resources. In [1,4], grid implementation of frequent mechanisms.
item sets in a grid environment deals with sales 5. Heart Beat Monitor (HBM): Responsible for
transaction of a company, these algorithms cannot be identifying globus process failures and application
used directly in our domain, because this algorithm process failures and immediately recovery action can
does not take into account the network topology while be taken.
generating the candidate patterns. The generation of 6. Global Access to Secondary Storage (GASS):
candidate pattern in [13] is not same as the candidate Exposes interfaces that help the clients to access the
pattern in mobile environment. In PCS, only the data in a uniform manner from heterogeneous data
sequence of neighboring location of the network can be sources. Data caching service is utilized to improve
considered as the mobility pattern. We propose data access performance. Metadata catalog and
incremental parallel and distributed knowledge grid services allows us to search for a multitude of services,
mining approach based on apriori algorithm for mining based upon the object metadata attributes.
mobility patterns in mobile environment. 7. Dynamically-Updated Resource Online Co-allocator
(DUROC): It acts as a coordinator between subjobs
running on different computational grid.
4. Knowledge Grid
4.2 Knowledge Grid Services
Knowledge Discovery in Database (KDD)
employs a variety of software, tools called as data
The knowledge grid contains Core K-Grid layer and
mining to find useful patterns and models. The
High level K-Grid layer [5]. The Core K-Grid layer of
Computational power of parallel and distributed
knowledge grid performs two main services. 1.
system is applied for analysis on large volume of data,
Knowledge Directory Service (KDS) manages
maintained over geographically distributed sites. This
metadata about data source, algorithm used for data
technique is called as Parallel and Distributed
mining, mining results and visualization tool. All
Knowledge Discovery (PDKD). Grid computing is the
metadata are represented in eXtensible Markup
most emerged area for high performance distributed
Language (XML) documents stored in Knowledge
applications like knowledge discovery process.
Metadata Repository (KMR). The discovered mobility
Knowledge gird is an integration of basic grid services,
patterns after the execution of distributed mining
data grid services and computational grid services for
process is stored in Knowledge Base Repository
distributed data mining and knowledge discovery [2].
(KBR) 2. The Resource Allocation and Execution
We built knowledge grid using the open-source Globus
Management Service (RAEMS) generates data mining
toolkit [3].
process execution plan and it will be stored in a
Knowledge Execution Plan Repository (KEPR). The
4.1 Globus Toolkit Services execution plan will generate resources request
expressed using the Resource Specification Language
The basic grid components provided by the globus (RSL) for GRAM. The high level K-Grid layer
toolkit are: supports the following services. 1. Data Access Service
1. Grid Security Infrastructure (GSI): Provides (DAS) is responsible for accessing data for data
authentication (identity of the users and services) based mining. 2. Tools and Algorithm Access Service
on certificate produced by certificate authority and (TAAS) are responsible for loading tools and
standard X.509.The mutual authentication is provided algorithm defined in KDS. 3. Execution Plan
by Secure Socket Layer (SSL) protocol. Management Service (EPMS) enables users to create
2. Monitoring and Data Service (MDS): MDS is an execution plan by assigning programs to data
information service component provides information resources. On multiple execution of program, different
about available grid resources and periodically collects execution plans are created. 4. Result Presentation
their status. It provides static information like Service (RPS) is responsible for presenting mobility

73
patterns to the users stored in knowledge base Generates resource
repository. specification using
MDS
mpirun command
5. Inter-Process Communication Using
MPICH-G2
Submission of multiplejobs
MPICH-G2 is a grid enabled open source library for GASS using globusrun command
implementing Message Passing Interface (MPI). It
supports parallel and distributed data mining MPI
application to run on cluster of machines of different
architecture. MPICH-G2 uses TCP for inter-machine DUROC
communication and vendor API for intra-machine Coordinator
communication. Our distributed KMPM algorithm is
executed on several computational grids using
MPICH-G2 component. MPICH-G2 uses RSL script
for sub job execution. The design of our parallel and
distributed mining of mobility pattern application on GRAM GRAM
knowledge grid is shown in Figure 1. Initially grid Authentication Authentication
security infrastructure generates certificates for the using GSI using GSI
user authentication to sign on other site. The user can
use monitoring and discovery service to select grid Job Manager Job Manager
node based on memory, CPU load and network
topology. The globus-run command is executed to
PBS job Script PBS job Script
submit job on multiple machines by creating MPI
computation. MPICH-G2 uses RSL to specify the URL
of the computational and grid resources. The RSL
script defines the job in the following way: Local Scheduler Local Scheduler

+
(&(resourceManagerContact=
“kalannia/jobmanager-pbs”) (count = 10) P1 P2 P3 P4 P5 P6
(label = “subjob 0”)
(environment=
(GLOBUS_DUROC_SUBJOB_INDEX 0) TCP communication
(directory = “/home/sakthi/gridmining”) Vendor API between clusters
(executable = “/home/sakthi/gridmining/KMPM”)
) Figure 1. Inter-process communication

The parameter “resourceManagerContact” specifies 6. SQL Based Parallel and Distributed


the URL of the cluster resource and the corresponding Algorithm
jobmanager-pbs. The parameter “count” specifies the
number of nodes required for computation, and the The data base contains mobile user actual path is
“label” represents name of the sub-job. The parameter distributed over n processing nodes. In our work,
“environment” specifies the directory in which the modified form of Count Distribution (CD) algorithm is
globus is installed. The parameter “directory” specifies used to mine user mobility pattern from user actual
the working directory, and the “executable” parameter path. Let us assume there are n processing nodes P1,
specifies the location of the executable. The mining job P2,…,Pn in our distributed system. The locations which
is started on the computational resource by using are frequently followed by mobile user are called
GRAM running on each server. The MPICH-G2 calls mobility patterns. Count distribution algorithm is
dynamically-updated request online coallocator library previously used in various domains. In our work, CD
file to start the mining on multiple computational algorithm is used with new method for calculating
resources specified by the RSL script. support count of subsequence in UAP. Let X.sup and
X.supi be the global and local support count of
subsequence X at a process Pi. X is globally large if

74
X.sup > S and locally large if X.supi > Si , where S is increment k by 1
global minimum support and Si is local minimum end while
support at Pi. Our KMPM algorithm is executed on n
processing nodes in parallel. Intermediate results are The above algorithm is an adaptation apriori
transferred to other nodes using send and receive algorithm in a distributed environment. Every process
commands of MPICH-G2. generates subsequence of length k called as Local
Subsequence (LSk) and then calculates local support
Generation of candidate sequence is implemented in count for each LSk. These subsequence local support
the form of SQL k-way join by removing infrequent count is exchanged with all other process to generate
subsequence [6]. Candidate subsequences are Global Subsequence (GSk) and then calculates global
generated by maintaining chronological order of support count for each GSk. The subsequences which
location. The previous work generates large number of have a support count greater than the threshold global
Local Candidate sequences of length k (LCk) from the support count are selected as Global Mobility Pattern
original UAP. In our work, we have generated of length k (GMPk). For instance, consider UAPs 〈 4,
candidate sequence from the previously generated
6, 8, 0, 5 〉 , 〈 2, 4, 8, 0, 6 〉 and 〈 1, 2, 4, 6 〉 where the
Local Mobility Pattern (LMPk-1) using SQL k-way join
operation. All sub sequences of new sequences must be number 4 represents location of mobile user. The
frequent. Our algorithm generates small number of support count of the subsequence 〈 4, 6 〉 can be
candidate sequences, since it involve less computation calculated as follows. s.count = s.count + suppInc and
time in determining global mobility patterns. The 1
suppInc= 1+ totdis , where totdis is the number of
pseudocode for Knowledge grid based Mobility Pattern
Mining (KMPM) algorithm is given below: location between 4 and 6. s.count value is 2 because it
appears in 1st and 3rd UAP. In 2nd UAP there are two
// pseudo code for local mining at a process Pi locations between 4 and 6. Therefore the support value
KMPM( ) for 4 and 6 is 〈4,6 〉 .count = 2+ 1+12 =2.33. It will
Input: moving path of mobile users increase the accuracy of the support counting. This
Output: local frequent mobility pattern algorithm will generate more accurate global mobility
LSk=null //k is the length of subsequence patterns.
for each UAP a ∈ Di
find the subsequence of UAP and put it in S
7. Mobility Rule Generation
for each subsequence s ∈ S
//calculate the support count and store it in
In our knowledge grid, after the execution of
LSk
parallel and distributed mining algorithm, mobility
s.count = s.count + s.suppInc
patterns are stored in knowledge base repository. It can
end for
be used to generate mobility rules. For example,
end for
mobility pattern is 〈4, 6, 8 〉 .
// pseudocode for global mining at kth pass Mobility rules are as follows:
globalmining() 〈4 〉  〈 6, 8 〉
Input: local frequent patterns 〈4, 6 〉  〈 8 〉
Output: global frequent patterns
From the UMPs, all possible mobility rules are
k=1
generated and their confidence value is calculated. In
while (GSk ≠ null)
for every from 1 to n increment by 1 general, mobility rule R is represented as 〈t1, t2,..,ti 〉
node Pi exchange and merge local support  〈ti+1,ti+2,…,tp 〉 . Confidence value for the rule R is
counts of LSk with all n nodes and find the calculated using the following formula:
global support count of all subsequence and
〈 t1 ,t2 ,...,ti 〉 .count
store it in GSk
Confidence (R) = 〈 ti +1 ,ti + 2 ,...t p 〉 .count
end for
for each subsequence in GSk Then the mobility rules which have a confidence
if the global support count of subsequence is higher than a predefined confidence threshold (confmin)
above the minimum global support count then are selected. These mobility rules can be used in next
put the subsequence in global mobility pattern phase for mobility prediction. The mined mobility rule
GMPk is compared with the current location of mobile user to
end for predict the next possible locations.

75
8. Mobility Prediction 9. Incremental Mobility Rule Mining
In mobile web environment, next location of mobile The parallel and distributed algorithm is executed
users is predicted using mobility rule and current on data grid to find the mobility pattern when moving
location of the mobile user. Mobility rule contains two logs are added to and removed from the database
parts namely, head - the part before the arrow and tail- without re-executing the algorithm. The algorithm uses
the part after the arrow. Our process generates set of the concept of negative border for data mining by
rules whose head matches with the current location of maintaining the infrequent mobility pattern. Infrequent
the mobile user. These rules are called as matching sequence represents the set of sequences, which did not
rules. The first location in the tail of the matching rule satisfy the minimum support. During each pass of the
and match value is stored in the resultant array. Match data mining algorithm, the set of sequences (GSk) are
value is calculated by summing up the support value of computed from the previous mobility patterns (GMPk-
the UMP and confidence of the rule. The matching 1). The negative border with infrequent sequence is
rules in the array are sorted in descending order with found by NMPk=GSk-GMPk. where NMPk represents
respect to match value. This process generates most the infrequent sequence in kth pass.
confident and frequent rules.
Table 1. Parameters used in experiment
The parameter m defines number of predictions
required. It selects only first m locations from the S.No Notation Meaning
resultant array. In Figure 2, there are three locations 4, 1 DB Transactions in
6 and 8. For example, currently the mobile user is in original database
location 4. Our algorithm generates matching rules
2 db Transactions that are
〈4 〉  〈6 〉 and 〈4 〉  〈 8 〉 . The match value is newly added
calculated for each predicted location and stored in 3 DB+ Transactions in the
resultant array. Resultant array contains two values [(6, updated database DB
78.56), (8, 68.56)]. If m=1, then the location 6 will be ∪ db
predicted as next location. If m=2, then the locations 6 4 GMP+,GMPDB,GMPdb Mobility pattern in the
and 8 are the predicted as next locations. The predicted respective database
location can be used by expert system to provide 5 NMP(GMP+), Negative border in the
location based service to the mobile user. NMP(GMPDB), respective database
NMP(GMPdb)
Location 6
4
The algorithm for updating the mobility pattern as
follows:

Mobile Client Function updatemobilitypattern (GMPDB,


Location 4 NMP(GMPDB), MLdb)

Server compute GMPdb


for each sequence s ∈ GMPDB ∪ NMP(GMPDB) do
Mobile Client tdb(s) = number of transaction in db containing s
Location 8 GMP+ = φ
Server
for each sequence s ∈ GMPDB do
if (tDB (s) + tdb(s)) > min_sup
Mobile Client then GMP+ = GMP+ ∪ s
for each sequence s ∈ GMPdb do
Server if s ∉ GMPDB and s ∈ NMPdb and
(tDB (s) + tdb(s))> min_sup then
GMPdb+ = GMPdb+ ∪ s
if GMPDB ≠ GMPdb+ then
Figure 2. Movement of mobile users in GSM NMP(GMP+) = negativebordergen(GMP+)
network else NMP(GMPDB+) = NMP(GMPDB)

76
if GMPDB ∪ NMP(GMPDB) ≠ NMP(GMPDB+) ∪ count 0.30%. From the Figure 4 the performance
NMP(GMPDB) then improvement was about 50% for 0.20% and it is
s = GMPDB+ increased to 55% for the support count 0.25%. The
repeat performance is increased 60% for the support count
compute s = s ∪ NMP (s) 0.30%.
until s does not grow
GMPDB+ = { x s | support(x) >= min_sup } 10000

C o m p u t a ti o n T im e in S e c o n d s
9000
Initially the original transactions are mined to 8000
generate the Gocal Mobility Pattern (GMPDB) for the 7000 100%
6000 90%
specified minimum support. While the Mobility
5000 80%
Patterns are generated, their negative border 4000 70%
GMP(GMPDB) is also generated and retained. The 3000 60%
negative border is used to avoid re-computation when 2000
1000
new transactions are added to the database. When
0
newly moving log transactions are added to the RC IM RC IM RC IM
database(db) the frequent mobility pattern for the new
0.2 0.25 0.3
transactions(GMPdb) are generated for the user Support Value in %
specified minimum support. There are three cases to
update the mobility pattern. 1. The support count is
calculated for each sequence in the (GMPDB) from the Figure 3. Dataset T5I2D1400K, data added
(GMPdb) if the minimum support is satisfied and increment of 100K from 1000K
GMPDB is updated. 2. The support count of sequence
that is common to both GMPdb and NMPdb are counted
and GMPDB is updated if support count is specified. 8000
C o m p u ta tio n T im e in S e c o n d s

Some set of sequences in NMP(GMPDB) may not 7000


satisfy the support count and it would remain in ∪ 6000 100%
NMP(GMPDB). 3. The sequences that are in GMPdb not 5000 90%
in (GMPDB) are counted. The sequences in GMPdb 4000 80%
would satisfy the support count is added to (GMPDB). 3000 70%
2000 60%
10. Experiments 1000
0
The experiments have been performed on Oracle10G RC IM RC IM RC IM
and PostgreSQL installed in Globus Toolkit4.0 0.2 0.25 0.3
middleware in Scientific Linux environment. Initially Support Value in %
the transactions in the database are considered as
original database and datasets are added incrementally
for three runs of algorithm to show the performance of Figure 4. Dataset T5I2D1400K, data added
Incremental Algorithm (IM) over Re-Computing increment of 200K from 600K
algorithm (RC). The data sets represented in the form
of T5I2D1000K, where 5 denotes the average number 10. 2 Impact of Prediction
of locations in the user moving path, 2 denotes support
count of locations in the dataset and 1000K denotes the In our experiment the parameter m represents the
total number of transactions in K. number of predictions made at each time the mobile
user moves from one location to another. If number of
10.1 Impact of Support Value predictions m increases, the number of prediction and
the number of correct predictions is increased and at
The experiment is conducted for 0.30%, 0.25% the same time the probability of getting incorrect
and 0.20% support count. From the Figure 3 it is noted locations are also higher. From the Figure 5, when m
that the performance improvement for increment size increases, the precision of predictions made by re-
of 100K was about 55% for 0.20% support count and it execution algorithm is decreases in higher rate. This
is increased to 60% for.25% support count. The value is reduced in incremental algorithm by updating
performance is increased about 65% for the support the mobility rules immediately when new logs are

77
added to the database. The precision is calculated by
the number of correctly predicted locations divided by [2] Mario Cannataro, Antonio Congiusta, Andrea Pugliese,
the total number of predictions made. Talia, and Paolo Trunfio, ”Distributed Data Mining on Grids:
Services, Tools and Applications”, IEEE Transaction on
Systems Man and Cybernetics, Dec. 2004, pp.2451-2465.
0.9
[3] Ian Foster, “Globus Toolkit Version 4: Software for
0.8 Service-Oriented System”, Journal of Computer Science and
0.7 Technology, Jul. 2006, pp. 513-520.
P re c i s i o n

0.6 RC [4]. Cristian Aflori and Mitica Craus, “Grid Implementation


0.5 IM of Apriori Algorithm”, Advances in Engineering Software,
pp. 295-300.
0.4

0.3 [5] M. Cannataro and D. Talia,“The Knowledge Grid”,


Communications of the ACM, 2003, pp. 89-93.
0.2
1 2 3 4 5 6 7 [6] Oracle, Java Stored Procedures Developers Guide,
Maximum Number of Predictions http://otn.oracle.com/doc/oracle8i_816/java/a81353/toc.htm

[7] Foster and C. Kesselman, “The Anatomy of the Grid:


Figure 5. Precision of prediction Enabling Scalable Virtual Organizations”, International
Journal of Super Computing Applications”, 2001.
11. Conclusion [8] Wu-Shan Jiang and Ji-Hui Yu, “Distributed Data Mining
on the Grid”, Proceedings of International Conference on
In this paper, we have proposed SQL based Machine Learning and Cybernetics, Aug. 2005, pp. 2010-
incremental parallel and distributed algorithm 2014.
implemented on Knowledge grid to predict the next
location of mobile user in a mobile web computing [9] R. Agrawal and J. Shafer, “Parallel Mining of Association
system. Incremental algorithm performs better when Rules”, IEEE Transaction on Knowledge and Data
compared to re-computation for larger datasets. In the Engineering, 1996, pp. 962-969.
first step of mining algorithm, mobility patterns are
[10] R. Agrawal, T. Imielinski and A. Swami, “Mining
mined from the user access path and mobility rules are Association Rules between Sets of Items in Large
generated using mobility patterns. Finally current Databases”, Proceedings of ACM SIGMOD International
location of mobile user is compared with the mobility Conference on the Management of Data, 1993, pp. 207-216.
rule to predict the location of mobile user. By using the
predicted movement, the system can effectively [11] Gokhan Yavas, Dimitrios Katsaros, Ozgur Ulssoy, and
allocate resources and provide location based services Yannis Manolopoulos, “A Data Mining Approach for
to the mobile users. Our knowledge grid based Location Prediction in Mobile Web Environments”, IEEE
mobility pattern mining algorithm for mobility Transaction on Data and Knowledge Engineering, Sce.
2005, pp.121-146.
prediction needs less computation time compared to
sequential mobility prediction algorithm and it [12] Assaf Schuster, Ran Wolff, and Dan Trock, “A High-
supports scalability. The proposed approach shows Performance Distributed Algorithm for Mining Association
how the knowledge grid system is used for distributed Rules”, Knowledge and Information Systems (KAIS) Journal,
data analysis. Also compared to other distributed
system, grid reduces the message communication [13] B. Thuraisingham, “A Primer for Understanding and
overhead using MPICH-G2 technology. The Applying Data Mining”, IEEE Educational Activities
subsequence exchange between processes is effectively Department, 2000, pp. 28-31.
achieved by using MPICH-G2. In future the work can
[14] Sedong Kwon, Hyunmin Park, and Kangsun Lee, “A
be extended by applying the SQL subquery to calculate
Novel Mobility Prediction Algorithm Based on User
the support count. Movement History in Wireless Networks AsiaSim 2004,
LNAI 3398”, 2005, pp. 419–428.
12. References
[15] Tong Liu, Paramvir Bahl, and Imrich Chlamtac,
[1] Luo, Anil L. Pereira, and Soon M. Chung, “Distributed “Mobility Modeling, Location Tracking, and Trajectory
Mining of Maximal Frequent Item sets on a Data Grid Prediction in Wireless ATM Networks”, IEEE Journal on
System”, The Journal of Super Computing, 2006, pp.71-90. Selected Areas in Communications, Aug. 1998, pp. 922-936.

78

You might also like