Design of Water Quality Monitoring Based On SVM and Its Simulation Platform by Remote Sensing
Huibin Wang, Zhuoyuan Ren, Min Tang, Aiye Shi, Fengchen Huang
College of Computer and Information Engineering
Hohai University
Nanjing, China
Abstract-Monitoring water quality with remote sensing technology is a current research focus. The main challenge is to design an appropriate water quality inversion model and an effective simulation platform. To address the small sample size problem, this paper proposes a water quality inversion method based on SVM. The method uses ε-SVR with an RBF kernel to build the inversion model. In addition, we design a water quality monitoring simulation platform (WRS) based on the MVC pattern. WRS is developed with MFC, GDAL and LIBSVM, which provide the graphical interface, image reading and writing, and modeling and inversion, respectively. Furthermore, a divide and conquer algorithm is used to speed up the processing of huge-volume remote sensing images. Finally, we simulate the SVM method on WRS; the results show the feasibility of our method and the effectiveness of the simulation platform.

Keywords-water quality monitoring; SVM; simulation platform; remote sensing image processing

I. INTRODUCTION

Generally, remote sensing images are used as the input of a water quality inversion model, which is obtained by analyzing the relationship between the characteristics of water reflectance spectra and water quality indicators. This is the main way to monitor water quality with remote sensing technology. Hence, the inversion model is a vital part of water monitoring research. The analysis and calculation of water quality inversion is currently carried out on general remote sensing data processing platforms, such as ERDAS and ENVI. However, the large number of models and the complex processing programs in those platforms limit the potential for further research and generalization. Thus, it is very important to design an effective remote sensing data processing simulation platform for water quality monitoring research.

Many scholars have designed inversion models in recent years. Their methods include linear regression, multiple linear regression, cluster analysis, grey system theory, neural networks and so on [1]. Nevertheless, problems remain in those methods. The nonlinear relationship between water quality indicators and reflectance spectra makes the inversion model uncertain. Although a neural network can imitate the nonlinear relationship [2], its generalization performance degrades when the training set is too small to meet the training demand, and the ground monitoring sites that provide the training samples are rather limited.

In practice, SVM not only approximates complex nonlinear systems well, but also performs better in generalization, learning and classification [3]. In particular, it depends little on the sample size, which is constrained by the limited number of ground monitoring sites. Thus, this paper proposes an inversion model based on SVM and its application. In addition, we design a simulation platform (WRS) based on the MVC pattern to make the experiments more efficient.

II. THE MODELING OF WATER QUALITY INVERSION BASED ON ε-SVR

A. Model Construction

The inversion model represents the relationship between the characteristics of water reflectance spectra and water quality indicators. Because the information on water reflectance is stored in remote sensing images, we can obtain the inversion result from the model by using these images as the input. Modeling has three main parts. First, combine the ground monitoring data and the remote sensing image into a training set. Second, search for the best parameters by cross-validation. Third, train the ε-SVR and obtain the inversion model. The targets of the training set are supplied by the ground monitoring data, and the attributes are provided by the remote sensing image. Fig. 1 shows the modeling procedure. The attributes depend on the spectral characteristics of the target water quality indicator. For chlorophyll-α, there is an absorption peak near 440 nm, a reflection peak near 550 nm and a prominent fluorescence peak near 685 nm. These coincide with bands 1-3 of the TM image. Thus, the attributes are the DN values of bands 1-3 [4].

Figure 1. Water quality inversion model based on ε-SVR.
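The first modeling step of Section II-A (combining ground monitoring data with image pixels into a training set) can be illustrated with a minimal sketch. All names here (`Sample`, `SiteRecord`, `Image`, `buildTrainingSet`) are hypothetical and do not come from WRS. The sketch assumes that each monitoring site has already been mapped to a pixel row and column and that the three bands are held in memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One training sample: DN values of TM bands 1-3 as attributes,
// the measured concentration as the target.
struct Sample {
    std::array<double, 3> attributes;
    double target;
};

// Hypothetical ground monitoring record: pixel position plus measurement.
struct SiteRecord {
    std::size_t row;
    std::size_t col;
    double concentration;
};

// Hypothetical in-memory image: bands[b][row * width + col] is a DN value.
struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::array<std::vector<double>, 3> bands;
};

// Pairs each site inside the scene with the DN values of its pixel.
std::vector<Sample> buildTrainingSet(const Image& img,
                                     const std::vector<SiteRecord>& sites) {
    for (const std::vector<double>& band : img.bands) {
        assert(band.size() == img.width * img.height);
    }
    std::vector<Sample> trainingSet;
    for (const SiteRecord& site : sites) {
        // Skip sites that fall outside the image.
        if (site.row >= img.height || site.col >= img.width) continue;
        const std::size_t idx = site.row * img.width + site.col;
        Sample s{};
        for (std::size_t b = 0; b < img.bands.size(); ++b) {
            s.attributes[b] = img.bands[b][idx];
        }
        s.target = site.concentration;
        trainingSet.push_back(s);
    }
    return trainingSet;
}
```

The resulting samples would then be passed to parameter search and ε-SVR training, the second and third modeling steps.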
[...] analyzes the regression part of SVM, because the target values are the concentrations of the water quality indicator. It can be [...] which has the maximum margin of separation between any training point and the hyperplane. The wider the margin is, the smaller the total error will be.

To generalize SVM to SVR, the optimal hyperplane becomes a function that estimates target values. Suppose we are given training data $(x_1, y_1), \ldots, (x_l, y_l) \in R^d \times R$, where $x_i$, $i = 1, 2, \ldots, l$, denotes the attributes of an input sample and $y_i$, $i = 1, 2, \ldots, l$, denotes the target of that sample. Our goal is to find a function $f(x) = \omega \cdot x + b$, $\omega \in R^d$, $b \in R$. If the difference between $f(x_i)$ and $y_i$ is less than ε for all the $x_i$, $f(x)$ will [...]

$$\min \phi(x) = \frac{1}{2}\|\omega\|^2, \quad \text{subject to } |y_i - (\omega \cdot x_i + b)| \le \varepsilon. \tag{1}$$

In this description, ε is the maximum deviation from the actual targets, so the method is called ε-SVR. Sometimes this constraint cannot be satisfied for every sample. We can then relax (1) by introducing slack variables, denoted by $\xi$ and $\xi^*$, as follows:

$$\min \phi(x) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*), \quad \text{subject to } \begin{cases} y_i - \omega \cdot x_i - b \le \varepsilon + \xi_i \\ \omega \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0, \; i = 1, 2, \ldots, l, \; C > 0. \end{cases} \tag{2}$$

We can use Lagrange multipliers to cope with (2) by

$$L(\omega, b, \xi, \xi^*) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + \omega \cdot x_i + b) - \sum_{i=1}^{l} \alpha_i^* (y_i + \varepsilon + \xi_i^* - \omega \cdot x_i - b) - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*) \tag{3}$$

[...]

$$\frac{\partial L}{\partial \omega} = \omega - \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i = 0 \tag{4}$$

$$\max W(\alpha) = \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j), \tag{5}$$

subject to $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$.

Solving equations (4) and (5), the second equation of (4) can be rewritten as follows:

$$\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i \tag{6}$$

Substituting (6) into $f(x) = \omega \cdot x + b$ gives

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)(x_i \cdot x) + b \tag{7}$$

In the nonlinear situation, $x$ can be mapped into a high-dimensional feature space by $\varphi(x)$. The samples are linearly separable in this feature space. The inner product $(x_i \cdot x)$ becomes $\varphi(x_i) \cdot \varphi(x)$. Replacing $\varphi(x_i) \cdot \varphi(x)$ with the kernel function $K(x_i, x)$, the linear fit can be realized after the nonlinear transformation without any increase in computational complexity. Thus, the optimization problem can be transformed into

$$\max W(\alpha) = \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j). \tag{8}$$

Equation (6) is rewritten as follows,

$$\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \varphi(x_i) \tag{9}$$

and the regression function can be written as

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b \tag{10}$$

There are four commonly used kernel functions:

• Linear: $K(x_i, x_j) = x_i^T x_j$; (11)
• Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$; (12)
• RBF: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$; (13)
• Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$. (14)

In this paper, we choose the RBF as the kernel to build the water quality inversion model.

C. Values of C, ε and γ

We have to identify the best parameter group (C, ε, γ) via grid search before training the SVR. Grid search traverses the ranges of the parameters to pick the best parameter group. In detail, we use the existing training set and every parameter group picked by the grid search to conduct an n-fold cross-validation. The best group is the one with the highest accuracy over all the cross-validations. In n-fold cross-validation, we first divide the training set into n subsets of equal size. Then, each subset in turn is tested using the regression function trained on the remaining n-1 subsets. Thus, each sample of the training set is predicted once, and the accuracy of a cross-validation is the percentage of targets that are predicted within a deviation of ε.

III. MVC PATTERN AND SIMULATION PLATFORM DESIGN

We use the simulation platform (WRS) to support the analysis and evaluation of the water quality inversion model. Functionally, WRS mainly involves remote sensing image processing and water quality monitoring. For the former, WRS can read, write and browse huge-volume remote sensing images and process them quickly. For the latter, it provides ground monitoring data management, SVM modeling and water quality inversion. We designed the structure of WRS around the MVC pattern. Consequently, WRS has powerful operation and display capabilities and can process huge-volume images quickly.

WRS is developed with Visual C++ 2008. We use GDAL (Geospatial Data Abstraction Library) for remote sensing image I/O between memory and peripheral storage, and LIBSVM (an open source SVM library developed by Chih-Jen Lin and co-workers [7] [8]) for SVM modeling and inversion.

A. MVC Pattern

MVC denotes Model-View-Controller. The core idea of MVC is to separate the business logic, the representation and the control of an application into three independent parts. MVC was proposed by Trygve Reenskaug in the 1970s. The aim of MVC is a flexible program design that simplifies the modification and extension of the program and allows parts of the program to be reused. Moreover, the application structure is clear and easy to read with this pattern.

The Model encapsulates the business logic. The business logic of WRS is the image processing. The Model can access the data directly without the View and the Controller. The View shows the data that the user wants. Generally, business logic should not be included in the View. The role of the Controller is to coordinate the other modules. It controls the flow of the application: it processes events and then responds to them. MVC clearly separates the user interface from complex business logic, so it is often used to design distributed systems. Nevertheless, it can also be used for applications with complex data processing functions.

B. Framework Design using MVC

1) Improvement of Document/View Frame

WRS is developed with MFC (Microsoft Foundation Classes). MFC provides a framework named Document/View. In this framework, the class CView can be seen as the View of MVC; its task is to draw the windows and data. The task of the class CDocument is to access data and files, so it can be seen as the Model of MVC. MFC has no independent module for the Controller of MVC, so the Controller must be implemented in, and is therefore bound to, CView or CDocument. Hence, if we used the Document/View framework directly for WRS, the view and business logic modules would be strongly coupled. To solve this problem, we add a public class named CModel as a member of CDocument. CModel encapsulates all of the business logic. In this way, the business logic becomes partly independent of CDocument, and CDocument is reduced to the Controller. As a result, the framework of WRS is logically consistent with the MVC pattern (Fig. 2).

Figure 2. The improvement of Document/View.

2) The Hierarchy Design

With this improvement, the functions of image reading, writing and display, SVM modeling and water quality inversion, and the collection and management of monitoring data can all be realized in CModel. Moreover, following modular program design, these functions are encapsulated in independent modules that provide public interfaces to the Model. In particular, we build a file I/O interface, an SVM interface and a data management interface, which provide services for image reading and writing, modeling and inversion, and maintenance of monitoring data, respectively. Fig. 3 shows the hierarchy diagram of WRS. In the diagram, the part above the Model is the infrastructure of WRS, and the part below the Model is its basic functionality. Within the basic functionality, every layer invokes methods that belong to the next layer and provides services to the previous layer.
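The division of responsibilities described in Section III-B can be sketched in plain C++ without MFC. The classes `Model`, `Document` and `View` below are hypothetical stand-ins for CModel, CDocument and CView, and the "scale" operation is only a placeholder for real image processing. The sketch shows the structural idea alone: the Model is a member of the Document, the Document forwards commands, and the View only formats data.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Model: owns the data and all business logic (stand-in for CModel).
class Model {
public:
    void loadPixels(std::vector<int> px) { pixels_ = std::move(px); }
    // Placeholder processing step: scale every pixel value.
    void scale(int factor) {
        for (int& p : pixels_) p *= factor;
    }
    const std::vector<int>& pixels() const { return pixels_; }
private:
    std::vector<int> pixels_;
};

// Controller: receives user commands and forwards them to the Model
// (stand-in for CDocument after the change described in the text).
class Document {
public:
    void onOpen(std::vector<int> px) { model_.loadPixels(std::move(px)); }
    void onScaleCommand(int factor) {
        assert(factor > 0);
        model_.scale(factor);
    }
    const Model& model() const { return model_; }
private:
    Model model_;  // the Model is a member of the Document
};

// View: formats what the Model exposes and holds no business logic
// (stand-in for CView).
class View {
public:
    std::string render(const Model& m) const {
        std::string out;
        for (int p : m.pixels()) {
            if (!out.empty()) out += ' ';
            out += std::to_string(p);
        }
        return out;
    }
};
```

Because the View sees the Model only through a const reference, new processing functions can be added to the Model without touching the display code.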
Figure 3. System hierarchy diagram.

IV. THE DESIGN OF DATA PROCESSING

A. Division Algorithm of Remote Sensing Image Based on Divide and Conquer Algorithm

Remote sensing images must be processed pixel by pixel during pretreatment or inversion. Because of its volume, a remote sensing image often cannot be read into memory at once. Thus, we need an algorithm that divides a whole image into blocks small enough to be read one by one, so that a specific procedure, such as inversion or pretreatment, can work on these blocks one after another.

In this paper, a division algorithm based on the Divide and Conquer approach is designed to divide the image into blocks. Because the size of the input remote sensing image is not known in advance, the algorithm must adaptively divide an image into blocks of equal size. The basic idea is as follows. First, the original image is divided into 4 roughly equal-sized blocks by drawing a cross at its center. Then, these blocks are divided into sub-blocks in the same way. This process continues until the block size is no larger than a user-defined size. Finally, the blocks are read into memory and processed one by one (Fig. 4). The processing of the whole image is complete when all of these blocks have been processed. This is a typical Divide and Conquer problem. The division can be implemented as a recursion whose stopping criterion is "block size ≤ predefined size". The algorithm can be described as follows:

$$T(n) = \begin{cases} \Theta(n^2) & n \le S \\ 4T(n/4) + \Theta(1) & n > S. \end{cases} \tag{15}$$

Here, S represents the predefined size. The time complexity at the stopping criterion is $\Theta(n^2)$, because the final blocks, which are subject to remote sensing image processing algorithms, must be processed pixel by pixel.

In the program, a function called ImageCalc is defined to implement this algorithm. The specific image processing routine to be invoked after the division is passed to ImageCalc as a callback parameter. The function is declared as follows:

    ImageCalc( CRect rect, int *band, void *para,
               void (CModel::*calc)(CRect,int*,void*) )

• rect: The information of the current block.
• band: A pointer to an array of band IDs.
• para: A pointer to additional parameters.
• calc: A pointer to the callback function.

This algorithm divides the image according to image coordinates only. The callback function reads the final blocks through the file I/O interface by their coordinates, and then processes and saves them.

B. The Design of Concentration Inversion Procedure

As mentioned above, WRS needs to execute the concentration inversion procedure on the final blocks. In this procedure, we first train the ε-SVR model on the relevant bands of the remote sensing image and the actual concentrations acquired from ground monitoring sites. Then, the remote sensing image is fed into the ε-SVR model. Finally, we obtain the inverted concentration of the water quality indicator. The procedure is implemented as follows. First, the pixels are located from the positions of the ground monitoring sites and, after pretreatment, combined with the actual concentrations into a training set. Second, the ε-SVR is trained on the training set and an ε-SVR model is returned. Finally, the remote sensing image is imported and the predicted concentration distribution image is obtained. Fig. 5 is a flowchart of this procedure; it shows how the data flows among the modules.

In this paper, a data management interface is designed to manage the data of the ground monitoring sites. Moreover, LIBSVM is encapsulated in an SVM interface that provides SVM modeling and inversion of the water quality indicator concentration. Both interfaces can be invoked by CModel. The data structures and functions of LIBSVM are invisible to CModel.
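The recursive division described in Section IV-A can be sketched as follows. This is a minimal illustration rather than the ImageCalc implementation. `Rect` and `divideImage` are hypothetical names, and a `std::function` callback replaces the CModel member-function pointer. The stopping criterion is read here as "both sides of the block are at most `maxSide` pixels", which is an assumption about what "block size" means.

```cpp
#include <cassert>
#include <functional>

// Half-open pixel rectangle: [left, right) x [top, bottom).
struct Rect {
    int left, top, right, bottom;
    int width() const { return right - left; }
    int height() const { return bottom - top; }
};

// Recursively splits r into four quadrants by a cross at its center until
// every block is at most maxSide pixels on each side, then calls process
// on each final block.
void divideImage(const Rect& r, int maxSide,
                 const std::function<void(const Rect&)>& process) {
    assert(maxSide > 0);
    if (r.width() <= 0 || r.height() <= 0) return;  // skip empty quadrants
    if (r.width() <= maxSide && r.height() <= maxSide) {
        process(r);  // stopping criterion reached: handle this block
        return;
    }
    const int midX = r.left + r.width() / 2;
    const int midY = r.top + r.height() / 2;
    divideImage(Rect{r.left, r.top, midX, midY}, maxSide, process);
    divideImage(Rect{midX, r.top, r.right, midY}, maxSide, process);
    divideImage(Rect{r.left, midY, midX, r.bottom}, maxSide, process);
    divideImage(Rect{midX, midY, r.right, r.bottom}, maxSide, process);
}
```

In a real platform the callback would read the block from disk by its coordinates, process it and write the result back, so only one block needs to be in memory at a time.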
V. RESULTS

Figure 6. Inverse results of concentrations (Unit: mg/L). (a) Concentration distribution of chlorophyll-α. (b) Concentration distribution of suspended substance.

B. Performance of WRS

Fig. 7 shows the interface of WRS after the concentration inversion of chlorophyll-α. There is a thumbnail window at the upper left corner and a full resolution window on the right side.

We use WRS to process Landsat TM data with an image size of 4017×4132 pixels and 7 bands under two hardware configurations. The time consumed in predicting the concentration of chlorophyll-α via ε-SVR is shown in Table II.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (60774092 and 60901003) and the Research Fund for the Doctoral Program of Higher Education of China (20070294027).

REFERENCES

[1] Q. Wang, B. Zhang, Y. C. Wei, and X. W. Li, Remote sensing monitoring experiment of Taihu Lake water body and software realization. Beijing: Science Press, 2008.
[2] L. E. Keiner and X. H. Yan, "A neural network model for estimating sea surface chlorophyll and sediments from Thematic Mapper imagery," Remote Sensing of Environment, vol. 66(2), 1998, pp. 153-165.
[3] X. G. Zhang, "Introduction to statistical learning theory and support vector machines," Acta Automatica Sinica, vol. 26(1), 2000, pp. 32-42.
[4] L. Han, D. Rundquist, L. Liu, R. Fraser, and J. Schalles, "The spectral response of algal chlorophyll in water with varying levels of suspended sediment," International Journal of Remote Sensing, vol. 15(18), 1994, pp. 3707-3718.
[5] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, 1995, pp. 273-297.
[6] B. Schölkopf and A. J. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[7] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, "A practical guide to support vector classification," unpublished.
[8] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," unpublished.