A Webgis Load-Balancing Algorithm Based On Collaborative Task Clustering
Abstract—This paper analyzes the deficiencies of load-balancing technologies in current webgis and discusses the necessity and feasibility of solving load-balancing in webgis through server farms. A new distributed dynamic scheduling model based on server farms is given, and a webgis load-balancing algorithm based on collaborative task clustering is proposed. The new algorithm divides the collaborative computing task into n subtasks, generates collaborative task clusterings from them, and finally allocates the clusterings dynamically to the gis servers in the model. It minimizes the execution time of collaborative spatial computing, effectively improves the utilization and the parallel processing capacity of webgis, and provides faster and better webgis service for users. A test-bed is established. A series of test results shows that the algorithm has good global load-balancing performance.

Keywords-webgis; load-balancing; collaborative task clustering

I. INTRODUCTION
Webgis is a computer system which can store, process, analyze, display and use geographic information on the internet or an intranet [1]. With the development of the Internet, the spatial computing tasks in webgis have developed from single tasks to sequential, parallel, and collaborative ones [2]. A large collaborative spatial computing task is often divided into a number of subtasks. This poses new challenges to webgis load-balancing.

II. DEFICIENCIES OF THE EXISTING LOAD-BALANCING IN CURRENT WEBGIS
The existing webgis model addresses load-balancing mainly from two aspects, hardware and software. Software methods provide a cheaper and more effective load-balancing mechanism [3]. Related studies include the shortest queue algorithm (SQA) [4], the network-based polling algorithm (NPA), and the ratio coefficient algorithm [5]. The task scheduling mechanisms of these load-balancing algorithms are designed almost entirely for single tasks and are sequential in nature. They describe only the calculation amount, communication traffic and priority of a task, and fail to reflect the data associations and precedence constraints among tasks. Therefore, they can only serve in homogeneous computing environments. When selecting resources for collaborative tasks, the scheduler has to consider the calculation amount, the dependency relationships among tasks, the load of the resources and the network status.
To address the above deficiencies, a new distributed model based on server farm is proposed. The load-balancing algorithm based on collaborative task clustering (CTCA) is studied as the core component of the model. Practical implementation of the algorithm is discussed. The algorithm reduces the disk I/O operations and the CPU instructions of the map server as far as possible. It also effectively improves parallel processing capacity and removes the map server as a single point of failure, providing users with faster and better webgis service.

III. DISTRIBUTED WEBGIS MODEL BASED ON SERVER FARM
Because of the complexity and particularity of webgis design, a large amount of data is processed in the map server. As the number of users and the network traffic grow geometrically, high demands are placed on map server scalability. When a great deal of data is requested, a single map server cannot process all requests in time, which results in lagging responses, lost requests and so on. Serious delays lead to the resending of GIS data packets. Therefore, a map server farm is built on cluster technology. With load-balancing technology, large numbers of users and computing-intensive tasks must be distributed to multiple map servers at the same time to enhance parallel processing ability.

Figure 1. Distributed dynamic scheduling model of server farm.

H_i = (C_i*W(c) + M_i*W(m) + D_i*W(d) + X_i*W(x) + N_i*W(n) + O_i*W(o)) * W(h)    (1)
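As a minimal sketch, equation (1) can be evaluated as a weighted sum of six per-server indicators scaled by W(h). The surviving text does not define the indicators C, M, D, X, N and O or give weight values, so the function name `load_weight`, the dictionary keys and all numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Dict

# Keys mirror the subscripted symbols in equation (1); their physical meaning
# is not defined in the surviving text, so they are treated as opaque numbers.
INDICATORS = ("c", "m", "d", "x", "n", "o")


def load_weight(indicators: Dict[str, float],
                weights: Dict[str, float],
                w_h: float) -> float:
    """Evaluate H_i = (sum over k of indicator_k * W(k)) * W(h)."""
    missing = [k for k in INDICATORS if k not in indicators or k not in weights]
    if missing:
        raise KeyError(f"missing indicator or weight for: {missing}")
    return w_h * sum(indicators[k] * weights[k] for k in INDICATORS)


# Illustrative values only (assumed, not taken from the paper).
example_weights = {"c": 0.3, "m": 0.2, "d": 0.2, "x": 0.1, "n": 0.1, "o": 0.1}
server_a = {"c": 0.8, "m": 0.5, "d": 0.4, "x": 0.2, "n": 0.6, "o": 0.1}
h_a = load_weight(server_a, example_weights, w_h=1.0)
```

Keeping the weights in a separate mapping makes it possible to retune W(c) through W(o) without touching the per-server measurements.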
will reduce the communication time among subtasks and make the entire spatial computing task finish ahead of schedule.
The algorithm calculates the earliest possible start time of each subtask of G. If the scheduling lets each subtask start at its earliest possible time, the scheduling scheme is optimal. The following algorithm is used to calculate the earliest start time s(v) of a particular task v, v ∈ V.
Greedy scheduling function:
Set C = {v} ∪ {μ1, μ2, …, μk}, where μ1, μ2, …, μk are all parent nodes of v. Obviously, the earliest start time will not be earlier than the earliest completion time f({μ1, μ2, …, μk}).
The maximum boundary value function:
maxg(C) = max{ g(u, w) | u ∈ V−C, w ∈ C, (u, w) ∈ E }
in which g(u, w) is the boundary value of (u, w): g(u, w) = s(u) + e(u) + c(u, w). That is, the boundary value of (u, w) equals the earliest start time of u, plus the execution time of u, plus the communication time between u and w.
For ∀v ∈ V, s(v) ≥ max{f({μ1, μ2, …, μk}), maxg(C)}, as shown in Fig.2 (b): g(u, w) = 13 and g(v, w) = 8, so s(w) = 13.
The algorithm is as follows.
    C = {v};
    // take the earliest possible start time of v
    var-g = maxg(C);
    var-s = var-g;
    Repeat the following steps until the loop exits:
    (1) Select w ∈ C and u ∈ V−C such that g(u, w) = var-g;
    (2) C = C ∪ {u};
        var-f = f(C−{v});
        var-g = maxg(C);
    (3) if var-s < max{var-f, var-g}, then C = C−{u} and exit the loop;
        else var-s = max{var-f, var-g};
    return var-s;
The algorithm given above calculates the collaborative task clustering C and the earliest start time of the given node v at the same time; the tasks in C will be assigned to one spatial computing service node for processing. We calculate C(v) for each vertex v, step by step from the root node of the collaborative task graph. Finally, C(w) is retained if C(v) ⊆ C(w). The final collaborative task clustering generated from Fig.1 is shown in Fig.3.
As soon as we get the collaborative task clusterings C1, C2, C3 and C4, we sort them by their calculation cost S(i) and save them to the collaborative task clustering list for the map service scheduler.
After the map service scheduler has received the requests from the web server, it tests whether all map servers are currently working normally. The test relies on the map server address information preserved in the map server address list, and it updates the corresponding server list information. According to the calculation cost of each collaborative task clustering in the collaborative task clustering list and the current load of each spatial computing node, the algorithm allocates each collaborative task clustering. The greater the load weight of a server node, the greater the calculation cost it is assigned. In this way, the completion time of the collaborative spatial computing task is reduced and the load of the spatial computing nodes is balanced.

V. THE IMPLEMENTATION OF MODEL AND PERFORMANCE ANALYSIS
We used 4 PCs on a high-speed LAN to build the test bed: 2 PCs are map servers, 1 PC is a web server, and 1 PC is the map service scheduler. The configurations of the test machines are shown in Table II. We used LoadRunner to carry out a simulation test of the collaborative spatial task described in Figure 2, with 160.1 MB of map data. The first step compares the performance of the shortest queue algorithm (SQA), the network-based polling algorithm (NPA) and the new CTCA under a designated number of connections; the test results are shown in Fig.4. The second step compares the average response time; the results are shown in Fig.5.
The test results in Fig.4 show that, under the same task and the same requests, the average response time and the task completion time of CTCA are shorter than those of SQA and NPA.
Fig.5 compares the average response time of CTCA, SQA and NPA as the client adds five users at a time for parallel processing.
The above test results indicate that using CTCA significantly shortens the response time of the system and improves the map server farm's parallel processing capacity. It has good stability, concurrency and load-balancing capacity.

TABLE II. CONFIGURATION PARAMETERS OF HARDWARE
Role                   CPU              Memory  Hard Disk          Network Card      Graphic Card
Map server             Intel PD2.66 ×2  2G      ST 160G/7200R      D-Link DFE-530TX  RADEON X550
Map server             Intel P42.0      1G      ST 80G/7200R       Realtek 8139      GeForce4 MX 440
Web server             AMD 2500+        1G      WD 80G/7200R       Realtek 8139      Integrated
Map service scheduler  Intel C1.6       512M    SAMSUNG 40G/7200R  Realtek 8139      Integrated
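The earliest-start-time procedure and the subset rule for clusterings described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the paper does not say how f is computed for a set of tasks, so `sequential_finish` assumes the tasks run back to back on one node in order of their earliest start times. All function names and the three-task example graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def max_boundary(cluster: Set[str], s: Dict[str, float], e: Dict[str, float],
                 c: Dict[Edge, float]) -> float:
    """maxg(C): largest g(u, w) = s(u) + e(u) + c(u, w) over edges entering C."""
    return max((s[u] + e[u] + cost for (u, w), cost in c.items()
                if u not in cluster and w in cluster), default=0.0)


def sequential_finish(tasks: Set[str], s: Dict[str, float],
                      e: Dict[str, float]) -> float:
    """f(S), ASSUMED: run S back to back on one node, ordered by start time."""
    t = 0.0
    for u in sorted(tasks, key=lambda x: (s[x], x)):
        t = max(t, s[u]) + e[u]
    return t


def earliest_start(v: str, s: Dict[str, float], e: Dict[str, float],
                   c: Dict[Edge, float]) -> Tuple[float, Set[str]]:
    """Return (s(v), C(v)) by greedily pulling critical predecessors into C."""
    cluster = {v}
    best = max_boundary(cluster, s, e, c)        # var-s = var-g = maxg(C)
    while True:
        # (1) outside task u whose edge into C attains maxg(C)
        entering = [(s[u] + e[u] + cost, u) for (u, w), cost in c.items()
                    if u not in cluster and w in cluster]
        if not entering:
            break
        _, u = max(entering)
        # (2) tentatively add u and recompute the bound max{var-f, var-g}
        trial = cluster | {u}
        bound = max(sequential_finish(trial - {v}, s, e),
                    max_boundary(trial, s, e, c))
        # (3) adding u would delay v: leave u out and stop
        if best < bound:
            break
        cluster, best = trial, bound
    return best, cluster


def cluster_tasks(order: List[str], e: Dict[str, float],
                  c: Dict[Edge, float]) -> Tuple[Dict[str, float], List[Set[str]]]:
    """Process tasks roots first; keep only clusterings not contained in another."""
    s: Dict[str, float] = {}
    clusters: Dict[str, Set[str]] = {}
    for v in order:                              # order must be topological
        s[v], clusters[v] = earliest_start(v, s, e, c)
    kept: List[Set[str]] = []
    for cl in clusters.values():
        if cl not in kept and not any(cl < other for other in clusters.values()):
            kept.append(cl)
    return s, kept


# Hypothetical task graph: a -> c (communication 5), b -> c (communication 1).
exec_time = {"a": 2, "b": 3, "c": 1}
comm_time = {("a", "c"): 5, ("b", "c"): 1}
start, clusterings = cluster_tasks(["a", "b", "c"], exec_time, comm_time)
```

In this example, co-locating a with c removes the expensive a to c transfer, so c can start once b's result arrives; adding b as well would serialize a and b on one node and delay c, so the loop stops.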
Figure 3. Collaborative task clustering.

VI. CONCLUSION
This paper proposed a new webgis model based on server farm. Focusing on the characteristics of collaborative spatial computing tasks, we discussed the implementation of CTCA in detail. Finally, a test-bed was established to verify the performance of the algorithm. Through a series of test results, this paper shows the availability and efficiency of CTCA. The next step is to study the application of CTCA in server clusters. The important work is to study how to use the server cluster's resources fully and improve their utilization.

ACKNOWLEDGMENT
The work is supported by the National "863" plan: Research and software development of three-dimensional spatial information service technique of the network (No.2009AA12Z211).
Figure 4. The comparison of performance among SQA, NPA and CTCA.

REFERENCES
[1] He, J., Liu, R.Y. and Liu, N., "The Design and Implementation based on COM+", Journal of Zhejiang University, June 2004, vol.31, pp.712.
[2] Jiang, F., Zhou, B.Q. and Wang, H.F., "An Effective Loading-Balancing Framework for Distributed WebGIS", Microcomputer Information, October 2006, vol.22, pp.215-217.
[3] Zhu, J., Song, G.F. and Zhong, E.S., "Research on a New Generation of WebGIS Based on Web Services and .NET Technologies", Geomatics World, February 2004, vol.2, pp.17-20.
[4] Zhang, S. and Kang, Z.W., "Design and Implementation of .NET-based WebGIS System", Computer Engineering, May 2006, vol.8, pp.106.
[5] Li, W.Z., Guo, Q. and Wang, L., "Study and Realization of Internet Server Load Balancing", Computer Engineering, June 2005, vol.3, pp.98.
[6] Dong, Y.G., Wang, S.R., Guo, Y.F. and Liu, Y., "Exploring Load Balancing of a Parallel Switch with Input Queues", Journal of Software, February 2007, vol.18, pp.229-235.
[7] Down, D.G. and Lewis, M.E., "Dynamic load balancing in parallel queueing systems: stability and optimal control", European Journal of Operational Research, 2006, vol.168, pp.509-519.