
2009 International Conference on Environmental Science and Information Application Technology

A Webgis Load-balancing Algorithm Based on Collaborative Task Clustering

Huang Ying, Guo Mingqiang, Luo Xiangang, Liu Yong


Faculty of Information & Engineering, China University of Geosciences
GIS software and application project research center of the educational department
Wuhan, China
e-mail: mayying_2003@163.com

Abstract—This paper analyzes the deficiencies of load-balancing technologies in current webgis and discusses the necessity and feasibility of solving webgis load balancing through server farms. A new distributed dynamic scheduling model based on server farms is given, and a webgis load-balancing algorithm based on collaborative task clustering is proposed. The new algorithm divides a collaborative computing task into n subtasks, generates collaborative task clusterings from them, and finally allocates the clusterings dynamically to the GIS servers in the model. It minimizes the execution time of collaborative spatial computing, improves the utilization and parallel processing capacity of webgis, and provides faster and better webgis service for users. A test bed is established, and a series of test results shows that the algorithm has good global load-balancing performance.

Keywords-webgis; load-balancing; collaborative task clustering

I. INTRODUCTION

Webgis is a computer system that can store, process, analyze, display and use geographic information on the Internet or an intranet [1]. With the development of the Internet, the spatial computing tasks in webgis have evolved from single tasks to sequential, parallel and collaborative ones [2]. A large collaborative spatial computing task is often divided into a number of subtasks, which poses new challenges to webgis load balancing.

II. DEFICIENCIES OF THE EXISTING LOAD-BALANCING IN CURRENT WEBGIS

Existing webgis models address load balancing from two sides: hardware and software. Software methods provide a cheaper and more effective load-balancing mechanism [3]. Related studies include the shortest queue algorithm (SQA) [4], the network-based polling algorithm (NPA), and the ratio coefficient algorithm [5]. The task scheduling mechanisms of these load-balancing algorithms are mostly designed for a single task and are sequential in nature. They describe only the calculating amount, communication traffic and priority of a task, and fail to reflect the data associations and priority constraints among tasks. Therefore, they can only serve homogeneous computing environments. Selecting resources for collaborative tasks requires considering the calculating amount, the dependencies among tasks, the load of the resources and the network status.

To address these deficiencies, a new distributed model based on a server farm is proposed. The load-balancing algorithm based on collaborative task clustering (CTCA) is studied as the core component of the model, and the practical implementation of the algorithm is discussed. The algorithm minimizes the disk I/O operations and the CPU instructions of the map server as far as possible. It also effectively improves parallel processing capacity and removes the single point of failure of the map server, providing users with faster and better webgis service.

III. DISTRIBUTED WEBGIS MODEL BASED ON SERVER FARM

Because of the complexity and particularity of webgis design, a large amount of data is processed in the map server. As the number of users and the network traffic grow geometrically, high demands are placed on the scalability of the map server. When a great deal of data is requested, a single map server cannot process all requests in time. This results in lagging responses, lost requests and similar problems, and a serious delay leads to the resending of GIS data packets. Therefore, a map server farm is built on cluster technology. Load-balancing technology is then needed to distribute large numbers of users and computing-intensive tasks to multiple map servers at the same time, which enhances the parallel processing ability.

Figure 1. Distributed dynamic scheduling model of server farm.

978-0-7695-3682-8/09 $25.00 © 2009 IEEE
DOI 10.1109/ESIAT.2009.326
Based on the map service scheduler, the distributed dynamic scheduling model is established, as shown in Fig. 1. After the message monitor [6] detects requests from the web server farm, it informs the master control module, which contains the task solving mechanism. Using the collaborative task clustering load-balancing algorithm of the map server farm, the task solving mechanism dispatches the map service requests to several distributed map server nodes to balance the load. After the tasks finish successfully, the results are returned to the users by the dynamic feedback mechanism. To avoid a single point of failure at the map service scheduler, the model provides a fault-tolerant mechanism.

IV. ALGORITHM SOLUTION

A. Initializing Map Service

The map service scheduler initializes the load weights list [7], the service node information list and the service status list of the map server farm.

The load weight of each map server is determined by three parts: the linear integrated performance parameter $H_i$ of the hardware and software of the map server, with weight $W(h)$; the network condition parameter $T_i$ between the map service scheduler and the map server, with weight $W(t)$; and the instant capacity $P_i$ of the map server, with weight $W(p)$. The weights satisfy $W(h) + W(t) + W(p) = 1$, with $H_i, P_i, T_i < 1$.

First, each weight is computed from the hardware and software configuration of the map server. The performance values and weight parameters of all hardware are defined in Table I.

TABLE I. DEFINITION OF CONFIGURATION AND WEIGHT

Hardware            Value    Weight
CPU                 C_i      W(c)
Memory              M_i      W(m)
Hard disk           D_i      W(d)
Graphic card        X_i      W(x)
Network card        N_i      W(n)
Operating system    O_i      W(o)

$C_i, M_i, D_i, X_i, N_i, O_i < 1$
$W(c) + W(m) + W(d) + W(x) + W(n) + W(o) = 1$

Based on the above parameters and weights, the linear integrated performance parameter is computed as:

$$H_i = \bigl(C_i W(c) + M_i W(m) + D_i W(d) + X_i W(x) + N_i W(n) + O_i W(o)\bigr)\, W(h) \qquad (1)$$

Secondly, after the linear integrated performance parameters of hardware and software have been calculated, the network parameter of each map server is calculated from its network delay:

$$T_i = \frac{W(t)}{t_i \sum_{j=1}^{n} 1/t_j} \qquad (2)$$

where $t_i$ is the network delay between map server $i$ and the map service scheduler, and $n$ is the number of map servers.

Then the computing tasks are submitted to the map servers, and the scheduler computes the capacity weight of each map server from the calculating time:

$$P_i = \frac{W(p)}{(p_i - t_i) \sum_{j=1}^{n} 1/(p_j - t_j)} \qquad (3)$$

where $t_i$ and $n$ are the same as in (2), and $p_i$ is the total time that the map service scheduler spends submitting the requests to map server $i$ and getting the results, so that $p_i - t_i$ is the calculating time.

From (1), (2) and (3), the load weight of each map server is:

$$L_i = H_i W(h) + T_i W(t) + P_i W(p) \qquad (4)$$

After calculating the map server load weights, the map service scheduler starts to monitor the service port and prepares to receive service requests from the web server. At the same time, the monitor thread of the map service scheduler is started to check the status of each map server regularly.

B. Generating spatial computing collaborative task clustering

When the map service scheduler receives a collaborative spatial computing task request from the web server, it obtains the directed acyclic graph of these tasks from the collaborative spatial computing task description, as shown in Fig. 2(a). According to the communication time and computing time, the algorithm groups the subtasks into collaborative task clusterings.

Figure 2. Collaborative computing task graph, panels (a) and (b).

The collaborative computing task graph can be described as G = (T, E, e, c):

T = {t1, t2, …, tn};                                  // n subtasks
E = {(ti, tj) | communication needed between ti, tj}
e(ti);                                                // the calculating time of ti
c(ti, tj);                                            // the communication time between ti and tj

To improve efficiency, the algorithm copies tasks. That is, a copy of the same task may exist on several service nodes at the same time. This reduces the communication time among subtasks and lets the entire spatial computing task finish ahead of schedule.

The algorithm calculates the earliest possible start time of each subtask of G. If the scheduling lets every subtask start at its earliest possible time, the scheduling scheme is optimal. The following algorithm is used to calculate the earliest start time s(v) of a particular task v, v ∈ V.

Greedy scheduling function: let C = {v} ∪ {μ1, μ2, …, μk}, where μ1, μ2, …, μk are all the parent nodes of v. The earliest start time of v cannot be earlier than the earliest completion time of its parents, f({μ1, μ2, …, μk}).

The maximum boundary value function:

maxg(C) = max{ g(u, w) | u ∈ V−C, w ∈ C, (u, w) ∈ E }

in which g(u, w) is the boundary value of (u, w): g(u, w) = s(u) + e(u) + c(u, w). That is, the boundary value of (u, w) equals the earliest start time of u, plus the time consumed by u, plus the communication time between u and w.

For every v ∈ V, s(v) ≥ max{ f({μ1, μ2, …, μk}), maxg(C) }. In the example of Fig. 2(b), g(u, w) = 13 and g(v, w) = 8, so s(w) = 13.

The algorithm is as follows.

C = {v};
// take the earliest possible start time of v
var-g = maxg(C);
var-s = var-g;
Repeat the following steps until the loop exits:
(1) Select w ∈ C and u ∈ V−C such that g(u, w) = var-g;
(2) C = C ∪ {u};
    var-f = f(C − {v});
    var-g = maxg(C);
(3) if var-s < max{var-f, var-g}, then C = C − {u} and exit the loop;
    else var-s = max{var-f, var-g};
return var-s;

The algorithm above computes both the collaborative task clustering C and the earliest start time of the given node v; the tasks in C will be assigned to one spatial computing service node for processing. We calculate C(v) for each vertex v step by step, starting from the root node of the collaborative task graph. Finally, only C(w) is kept if C(v) ⊆ C(w). The final collaborative task clustering generated from the task graph in Fig. 2 is shown in Fig. 3.

Once the collaborative task clusterings C1, C2, C3 and C4 are obtained, they are sorted by their calculating cost S(i) and saved to the collaborative task clustering list of the map service scheduler.

After the map service scheduler has received the requests from the web server, it checks whether all map servers are currently working normally. The check relies on the map server address information kept in the map server address list, and updates the corresponding server list information. According to the calculating cost of each clustering in the collaborative task clustering list and the current load of each spatial computing node, the algorithm allocates each collaborative task clustering. The greater the load weight of a server node, the greater the calculating cost it is assigned. In this way, the algorithm reduces the completion time of the collaborative spatial computing task and balances the load of the spatial computing nodes.

V. THE IMPLEMENTATION OF MODEL AND PERFORMANCE ANALYSIS

We used 4 PCs on a high-speed LAN to build the test bed: 2 PCs are map servers, 1 PC is a web server, and 1 PC is the map service scheduler. The configurations of the test machines are shown in Table II. We used LoadRunner to run a simulation test of the collaborative spatial task described in Fig. 2, with 160.1 MB of map data. The first step compares the performance of the shortest queue algorithm (SQA), the network-based polling algorithm (NPA) and the new CTCA for a designated number of connections; the results are shown in Fig. 4. The second step compares the average response time; the results are shown in Fig. 5.

The test results in Fig. 4 show that, under the same task and the same requests, the average response time and the task completion time of CTCA are shorter than those of SQA and NPA.

Fig. 5 compares the average response time of CTCA, SQA and NPA when a client adds five users at a time for parallel processing.

The above test results indicate that CTCA significantly shortens the response time of the system and improves the parallel processing capacity of the map server farm. It has good stability, concurrency and load-balancing capacity.

TABLE II. CONFIGURATIONS PARAMETER OF HARDWARE

Role                    CPU               Memory   Hard Disk            Network Card       Graphic Card
Map server              Intel PD2.66 ×2   2G       ST 160G/7200R        D-Link DFE-530TX   RADEON X550
Map server              Intel P42.0       1G       ST 80G/7200R         realtek 8139       GeForce4 MX 440
Web server              AMD 2500+         1G       WD 80G/7200R         realtek 8139       Integrated
Map service scheduler   Intel C1.6        512M     SAMSUNG 40G/7200R    realtek 8139       Integrated
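Equations (1) to (4) of Section IV.A and the allocation rule of Section IV ("the greater the load weight, the greater the calculating cost") can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `MapServer`, `load_weights` and `allocate` are hypothetical, the lowercase symbols t_i and p_i are read as measured delay and round-trip time, and the placement rule in `allocate` (most expensive clustering first, on the server with the lowest cost-to-weight ratio) is an assumption, since the paper does not state the exact rule.

```python
from dataclasses import dataclass


@dataclass
class MapServer:
    """Measured inputs for one map server (illustrative names)."""
    name: str
    hw_scores: dict     # component -> score below 1 (C_i, M_i, D_i, ...)
    delay: float        # t_i: network delay to the scheduler, in seconds
    round_trip: float   # p_i: submit-and-return time, in seconds


def load_weights(servers, hw_weights, w_h, w_t, w_p):
    """Return {server name: L_i} following equations (1) to (4)."""
    if abs(sum(hw_weights.values()) - 1.0) > 1e-9:
        raise ValueError("hardware weights must sum to 1")
    if abs(w_h + w_t + w_p - 1.0) > 1e-9:
        raise ValueError("W(h) + W(t) + W(p) must equal 1")
    if any(s.delay <= 0 or s.round_trip <= s.delay for s in servers):
        raise ValueError("need 0 < t_i < p_i for every server")

    inv_delay = sum(1.0 / s.delay for s in servers)
    inv_compute = sum(1.0 / (s.round_trip - s.delay) for s in servers)

    weights = {}
    for s in servers:
        # (1) linear integrated performance parameter H_i
        h = sum(s.hw_scores[k] * w for k, w in hw_weights.items()) * w_h
        # (2) network parameter T_i: a smaller delay gives a larger share
        t = w_t / (s.delay * inv_delay)
        # (3) instant capacity P_i, based on computing time p_i - t_i
        p = w_p / ((s.round_trip - s.delay) * inv_compute)
        # (4) load weight L_i; the weights are applied again here,
        #     exactly as equation (4) is written in the paper
        weights[s.name] = h * w_h + t * w_t + p * w_p
    return weights


def allocate(cluster_costs, weights):
    """Assumed rule: place each clustering, most expensive first, on the
    server whose cost-to-weight ratio stays lowest after the placement."""
    assigned = {name: [] for name in weights}
    total = {name: 0.0 for name in weights}
    for cid, cost in sorted(cluster_costs.items(), key=lambda kv: -kv[1]):
        best = min(weights, key=lambda n: (total[n] + cost) / weights[n])
        assigned[best].append(cid)
        total[best] += cost
    return assigned
```

With this rule, a server whose load weight is three times that of another ends up with roughly three times the calculating cost, which matches the proportional behaviour the paper describes.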

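The greedy earliest-start computation of Section IV.B can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The function names are hypothetical. The paper does not define how f(·) is evaluated, so the sketch assumes the copied tasks run back to back on one node in order of their earliest start times. It also assumes the loop exits when adding a parent no longer helps, or when no edge enters the cluster from outside.

```python
def earliest_start(v, parents, e, c, s):
    """Return (s(v), C(v)) given s[u] for every ancestor u of v.

    parents: task -> list of parent tasks
    e:       task -> calculating time
    c:       (u, w) -> communication time along edge (u, w)
    """
    def incoming(cluster):
        # (g(u, w), u) for every edge (u, w) entering the cluster
        return [(s[u] + e[u] + c[(u, w)], u)
                for w in cluster
                for u in parents.get(w, ())
                if u not in cluster]

    def maxg(cluster):
        edges = incoming(cluster)
        return max(edges)[0] if edges else 0.0

    def f(tasks):
        # Assumption: tasks run sequentially on one node, ordered by s(u)
        finish = 0.0
        for u in sorted(tasks, key=lambda x: s[x]):
            finish = max(finish, s[u]) + e[u]
        return finish

    cluster = {v}
    var_s = maxg(cluster)
    while True:
        edges = incoming(cluster)
        if not edges:                 # nothing left to pull into the cluster
            break
        _, u = max(edges)             # step (1): the edge with g(u, w) = var-g
        cluster.add(u)                # step (2)
        candidate = max(f(cluster - {v}), maxg(cluster))
        if var_s < candidate:         # step (3): adding u made things worse
            cluster.remove(u)
            break
        var_s = candidate
    return var_s, cluster


def cluster_tasks(order, parents, e, c):
    """Process tasks in topological order, then keep only the clusters
    that are not proper subsets of another cluster."""
    s, clusters = {}, {}
    for v in order:
        s[v], clusters[v] = earliest_start(v, parents, e, c, s)
    unique = {frozenset(cl) for cl in clusters.values()}
    kept = [cl for cl in unique if not any(cl < other for other in unique)]
    return s, kept
```

For a small join graph where tasks a and b both feed task c, copying the parent with the most expensive incoming edge into the cluster of c removes that communication delay, which is the effect the task-copy step aims for.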
Figure 3. Collaborative task clustering.

VI. CONCLUSION

This paper proposed a new webgis model based on a server farm. Focusing on the characteristics of collaborative spatial computing tasks, we discussed the implementation of CTCA in detail. Finally, a test bed was established to verify the performance of the algorithm. Through a series of test results, this paper shows the availability and efficiency of CTCA. The next step is to study the application of CTCA in server clusters. The important work is to study how to make full use of the server cluster's resources and improve their utilization.

ACKNOWLEDGMENT

The work is supported by the National "863" plan: Research and software development of three-dimensional spatial information service technique of the network (No.2009AA12Z211).

Figure 4. The comparison of performance among SQA, NPA and CTCA.

Figure 5. The comparison of performance among SQA, NPA and CTCA.

REFERENCES
[1] He, J., Liu, R.Y. and Liu, N., "The Design and Implementation based on COM+", Journal of Zhejiang University, June 2004, vol.31, pp.712.
[2] Jiang, F., Zhou, B.Q. and Wang, H.F., "An Effective Loading-Balancing Framework for Distributed WebGIS", Microcomputer Information, October 2006, vol.22, pp.215-217.
[3] Zhu, J., Song, G.F. and Zhong, E.S., "Research on a New Generation of WebGIS Based on Web Services and .NET Technologies", Geomatics World, February 2004, vol.2, pp.17-20.
[4] Zhang, S. and Kang, Z.W., "Design and Implementation of .NET-based WebGIS System", Computer Engineering, May 2006, vol.8, pp.106.
[5] Li, W.Z., Guo, Q. and Wang, L., "Study and Realization of Internet Server Load Balancing", Computer Engineering, June 2005, vol.3, pp.98.
[6] Dong, Y.G., Wang, S.R., Guo, Y.F. and Liu, Y., "Exploring Load Balancing of a Parallel Switch with Input Queues", Journal of Software, February 2007, vol.18, pp.229-235.
[7] Down, D.G. and Lewis, M.E., "Dynamic load balancing in parallel queueing systems: stability and optimal control", European Journal of Operational Research, 2006, vol.168, pp.509-519.
