
Parallel Processing Architecture of Remotely Sensed Image Processing System Based on Cluster

Hangye Liu (1,2), Yonghong Fan (1), Xueqing Deng (1), Song Ji (1,2)

(1) Dept. of Remote Sensing Information Engineering, Zhengzhou Institute of Mapping and Surveying, Zhengzhou, PR. China
(2) Jiangxi Province Key Lab for Digital Land, East China Institute of Technology, Fuzhou, PR. China
lhy2004lx@163.com, jisong_chxy@163.com

Abstract—As remote sensing platform and sensor technologies develop quickly, the resolution of remotely sensed images becomes much higher. However, data processing technologies have lagged behind, especially in computing speed and efficiency. Parallel computing technology provides an effective way to solve this problem. In this paper, based on an analysis of parallel computing systems, the architecture of a remote sensing parallel processing system based on cluster computing is studied, and a new hybrid cluster computing architecture for massive remote sensing data processing is proposed. The task scheduling of the remote sensing processing system and the parallel processing workflow are also discussed.

Keywords- Parallel Computing; Cluster Computing; Load Balancing; Geometric Correction; Task Scheduling

978-1-4244-4131-0/09/$25.00 ©2009 IEEE

I. INTRODUCTION

With the rapid development of remote sensing platform and sensor technologies, the resolution of acquired remotely sensed images has become much higher and the amount of data has become huge. A common satellite ground receiving station can receive more than 10 GB of image data. Compared with the fast development of data acquisition technology, current remote sensing data processing speed falls far behind, which means that abundant data cannot be translated into useful information in time. Parallel computing can greatly improve computing speed in massive data processing, which makes it an effective way to solve the problem of processing efficiency in remote sensing.

There are important differences between remote sensing processing in a parallel computing environment and in the traditional processing mode, including task assignment and load balancing. It is therefore of vital importance to introduce an architecture with a highly efficient parallel mechanism into remote sensing processing. Most current remote sensing parallel processing systems are based on specialized parallel computers, which are very expensive and hard to popularize. Meanwhile, cluster computers represent the development direction of parallel computing, and remote sensing processing based on cluster computing offers high efficiency and a good cost-benefit ratio. With a proper architecture design and effective algorithms, it can achieve high computing speed. In this paper, based on an analysis of the two main types of parallel computing systems, the architecture of a remote sensing parallel processing system is studied, and the task scheduling and workflow for a specific remote sensing processing application are discussed.

II. PARALLEL COMPUTER SYSTEMS

Current parallel computer systems fall into two basic types: shared memory multi-computers and message passing multi-computers [1].

A. Shared memory multi-computers

In shared memory multi-computers, memory is shared among the computers, which means that the computers share a uniformly addressed storage unit and data exchange is realized through addressing operations. Data communication in shared memory multi-computers is based on a bus, so it is convenient and fast to access data among computers. Because data exchange does not have to be controlled explicitly, it is easier for programmers to write parallel programs on this platform. The structure of shared memory multi-computers is depicted in Figure 1.

Figure 1. Structure of Shared Memory Multi-computers

However, the scalability of this system is relatively weak: when a large number of processors is added, fast data access cannot be guaranteed. This type of parallel computing system is suited to applications that do not need too many computing nodes but do need a lot of data communication.

B. Message passing multi-computers

A message passing multi-computer system uses a network to connect computers or processors, and each computer has its own storage unit, which cannot be accessed directly by other computers. Data exchange is implemented by passing messages over the network, and the message passing is under explicit control. The drawback of this platform is that programming is more complicated, but on the other hand, this
structure is more flexible and more extensible. The structure of message passing multi-computers is depicted in Figure 2.

Figure 2. Structure of Message Passing Multi-computers

A cluster belongs to this kind of parallel computer system. A cluster system can be built from low-cost PCs or workstations and can be extended easily by adding computers, disks or other resources. Therefore, with proper system design and highly efficient parallel algorithms, cluster computing can achieve very high computing speed at lower cost.

Some mature remote sensing parallel processing systems, such as Pixel Factory [6] and DPGrid [7], adopt specialized parallel computers as their computing platform. The cost of the whole system is enormous: the price can easily reach a million dollars, and the maintenance cost is also very high. Recently, as PC performance and network technology have developed rapidly, the computing power of clusters has improved sharply. Using a cluster computer system as the computing platform is therefore of great significance for popularizing remote sensing applications.

III. ARCHITECTURE OF PARALLEL REMOTE SENSING PROCESSING SYSTEM

The processing architecture is the core of a parallel remote sensing processing system, and its computing efficiency affects the processing speed of the whole system. To achieve the best performance, the computing power of every single PC in the cluster system should be fully utilized.

Remote sensing processing involves a huge amount of data and complex computing operations, which makes both calculation and data communication time-consuming. Considering these characteristics, an efficient processing system should distribute the computing load over as many processing nodes as possible to speed up processing, and should avoid as much unnecessary data communication as possible or use more efficient methods to improve data communication speed. With the development of multi-processor technology, a single PC now contains more than one processor, and the processors in one PC can work together efficiently to form a mini shared memory parallel system. Besides, as graphics processing unit (GPU) technology develops rapidly, the powerful computing capability of the GPU can improve computing speed further. Taking these two technologies into account, this paper proposes a new hybrid processing architecture for a remote sensing processing system based on cluster computing. The processing architecture is shown in Figure 3.

Figure 3. Architecture of remote sensing parallel processing system based on cluster computing

The whole processing system is divided into several mini clusters (from cluster 0 to cluster n), which are connected by gigabit Ethernet to guarantee fast data communication. Each mini cluster consists of several PCs, which may have multiple processors and multiple GPUs, so that each PC forms a mini shared memory system. The architecture of the mini shared memory system is shown in Figure 4.

Figure 4. Architecture of mini shared memory system

A traditional cluster system consists of independent computers, connected by a high-speed network, that act as computing nodes. This is adequate when each PC has only one processor, because the computing resource can then be fully utilized. However, as multiple processors are now widely used in common PCs and the floating-point computing advantage of the GPU becomes prominent, the potential computing power of common PCs is worth developing.

The architecture above is a hybrid parallel architecture that combines shared memory systems with a cluster system. The whole cluster is divided into mini clusters, so the workload of a single management server is reduced, which avoids a communication bottleneck, and each mini cluster is in charge of coordinating the PC nodes within it. Because the number of PCs in one mini cluster is relatively small, the coordination workload does not affect computing efficiency, and the fast network guarantees smooth data communication. Each PC in a mini cluster represents a shared memory system, which has better data communication capability. Due to the relatively weak scalability of shared memory systems and the restriction of memory, this type of parallel computing system is not suitable for massive parallel computing; but when the processing tasks are already subdivided and the amount of memory required is limited, its strong communication speed can achieve much better parallel computing efficiency.

As mentioned above, remote sensing processing is data-intensive, so massive data communication is inevitable. In this architecture, the massive image data and the corresponding processing tasks are divided among the mini clusters, and within each mini cluster the data and sub-tasks are subdivided further. Through this subdivision, the amount of data on a single processing node is reduced to a size that can be processed in a shared memory system, so parallel computing can be performed more efficiently. At the same time, all computation resources are fully utilized.

IV. TASK SCHEDULING AND LOAD BALANCING

For further optimization of parallel computing, the tasks performed on the parallel computing platform should be allocated and controlled according to certain principles. The aim of coordinating and controlling task allocation is to achieve the fastest output of final results. In task scheduling, an important criterion is whether the load (computing load, workload, etc.) on each computing node is balanced. If the loads allocated to the computing nodes are nearly equal, the computing nodes are likely to be fully utilized. Otherwise, the nodes that finish earlier have to wait for the other nodes before proceeding to the next processing step.

The first problem in task scheduling is which scheduling policy to adopt. Remotely sensed data is huge in amount, so the number of parallel computing nodes should be large enough to achieve good efficiency. However, under a centralized scheduling policy, too many nodes will inevitably cause a communication bottleneck at the central management node. Besides, a fully distributed scheduling policy is hard to implement, especially when there are many computing nodes. Therefore, a semi-distributed scheduling policy is more suitable for remote sensing processing. The policy is described in Figure 5.

Figure 5. Semi-distributed scheduling policy

In line with the hybrid cluster architecture, each mini cluster adopts a centralized scheduling policy. In each mini cluster, one of the PCs or a server is designated as the load information management center, which is in charge of collecting load information from each node. Because the number of nodes in each mini cluster is limited, communication at the load information management center does not become a bottleneck of overall performance. With much of the scheduling work done inside the mini clusters, task scheduling among mini clusters is less time-consuming and less complex, so it can be performed in a distributed manner or even through a single global centralized scheduling server.

After the scheduling policy is determined, the load information collection policy should be considered. Load information collection involves two aspects: the load index, which measures the load on a node, and when to collect the load information. In most situations, the same computation in remote sensing processing is repeated for every pixel, so the number of pixels or image blocks to be processed serves as the load index. In certain applications, however, the computational complexity depends on the information content, so image blocks of the same size can require different amounts of computation; in that case the computation load becomes the load index. Studies show that choosing a single parameter as the load index is more effective [5], so the load index differs according to the characteristics of each application. The information collection policies include periodic collection, on-demand collection and on-state-change collection; the on-state-change policy collects information only when the load state on a node changes. Remote sensing processing is computation-intensive and most computation resources should be used for computing, so the chance that scheduling disturbs computation should be as small as possible. The on-state-change policy not only reduces redundant communication but also avoids delaying the execution of scheduling, so it is more suitable for massive remote sensing processing.

The load migration policy should also be considered. A specific load migration involves two problems: what kind of load to migrate and how to migrate it. In remote sensing, scheduling should be as simple as possible so that computing is not disturbed too much. Data migration involves only the re-allocation of data, which has lower complexity and is easy to control, so using data migration to balance the workload is applicable to remote sensing parallel processing. For further optimization, a pre-migration policy can be applied. Because the complexity of migrating data is relatively low, computation and load migration can be performed at the same time to improve efficiency. Before load migration, a load forecast is performed. The load migration is thus overlapped with computing, which reduces the influence of data communication on overall computing efficiency.

V. DESIGN OF WORKFLOW

Since remote sensing data processing in this system works in parallel, which is very different from traditional remote sensing processing, the workflow should also be redesigned. The workflow changes with the processing application. Therefore, in this paper we assume a specific remote sensing application that runs from image geometric correction to the generation of a digital surface model. The workflow falls into two main modes. The first is based on the data to be processed, and its main
solution is to divide the data into several blocks that are independent of each other and allocate them to computing nodes, each of which processes its blocks through the whole workflow. This mode is easy to implement, but task scheduling is hard to perform because the workflow on each node is independent. To control the load balance on each computing node, another mode should be adopted. The second mode is based on the specific processing tasks: the whole workflow is divided into several parallel processing steps. These steps are executed on each computing node at the same time, and the necessary operations are performed to guarantee synchronization. The workflow is depicted in Figure 6.

Figure 6. Task-based parallel processing workflow

During each parallel processing step, task scheduling is performed to make sure the loads on the computing nodes are balanced. After each step is finished, an explicit synchronization is made to ensure that all computing nodes proceed to the next step at the same time. Thanks to the synchronization, the tasks executed on every computing node at any given time are the same, so task scheduling is easy to perform. It is easy to see, however, that this workflow needs more data communication and that its task scheduling is more complex. Combined with the architecture discussed above, task scheduling can be performed hierarchically. Task scheduling inside the mini clusters is executed more frequently to make sure the computing nodes have nearly the same workload. Task scheduling among mini clusters takes place only when the loads on the mini clusters are heavily unbalanced, so as to reduce data communication.

VI. SUMMARY

Compared with the fast development of data acquisition technology, current remote sensing data processing speed falls far behind, and parallel computing technology is an efficient way to solve this problem. Aiming at massive remote sensing processing, this paper analyzes the two main types of parallel computing platforms and, based on that analysis, proposes a hybrid parallel processing architecture for remote sensing processing. This architecture is flexible and highly efficient, meeting the requirements of computation-intensive and data-intensive applications. A suitable task scheduling policy and workflow are also discussed.

ACKNOWLEDGMENT

The work described here is funded by Jiangxi Province Key Lab for Digital Land (DLLJ200901).

REFERENCES

[1] Barry Wilkinson, Michael Allen, Parallel Programming, China Machine Press, Beijing, 2002, pp. 9-10.
[2] Li Jun, Li Deren, "Key Techniques for Distributed Processes of Remote Sensing Imagery," Geomatics and Information Science of Wuhan University, 1999, pp. 15-19.
[3] Deng Xueqing, Study on Architecture and Algorithms of Raster Space Data Service System, Surveying and Mapping Institute, 2003.
[4] Chen Guoliang, An Hong, et al., Practice on Parallel Algorithms, Higher Education Press, 2004.
[5] T. Kunz, "The influence of different workload description on a heuristic load balancing scheme," IEEE Trans. on Software Engineering, vol. 17, no. 7, 1991.
[6] F. Bignone, "Processing of Stereo Scanner: Stereo Plotter to Pixel Factory," Stuttgart, 2003, pp. 141-150.
[7] Zhang Zuxun, "From Digital Photogrammetry Workstation (DPW) to Digital Photogrammetry Grid (DPGrid)," Geomatics and Information Science of Wuhan University, 2007, pp. 565-571.
