
2010 3rd International Congress on Image and Signal Processing (CISP2010)

Design of Water Quality Monitoring Based on SVM and Its Simulation Platform by Remote Sensing

Huibin Wang, Zhuoyuan Ren, Min Tang, Aiye Shi, Fengchen Huang
College of Computer and Information Engineering
Hohai University
Nanjing, China

Abstract—Monitoring water quality with remote sensing technology is a current research focus. The main challenge is to design an appropriate water quality inversion model and an effective simulation platform. To address the small-sample-size problem, this paper proposes a water quality inversion method based on SVM. The method uses ε-SVR with an RBF kernel to build the inversion model. In addition, we design a water quality monitoring simulation platform (WRS) based on the MVC pattern. WRS is developed with MFC, GDAL and LIBSVM to provide the graphical interface, image reading and writing, modeling and inversion. Furthermore, a divide and conquer algorithm is used to speed up the processing of huge-volume remote sensing images. Finally, we simulate the SVM method on WRS; the results show the feasibility of our method and the effectiveness of the simulation platform.

Keywords-water quality monitoring; SVM; simulation platform; remote sensing image processing

I. INTRODUCTION
Generally, remote sensing images are used as the input of a water quality inversion model, which is obtained by analyzing the relationship between the characteristics of water reflectance spectra and water quality indicators. This is the main way to monitor water quality with remote sensing technology. Hence, the inversion model is a vital part of water monitoring research. The analysis and calculation of water quality inversion are currently carried out on general remote sensing data processing platforms such as ERDAS and ENVI. However, the large number of models and the complex processing programs in those platforms limit further research and generalization. Thus, it is very important to design an effective remote sensing data processing simulation platform for research on water quality monitoring.

Many scholars have designed inversion models in recent years. Their methods include linear regression, multiple linear regression, cluster analysis, grey system theory, neural networks and so on [1]. Nevertheless, problems remain in those methods. The nonlinear relationship between water quality indicators and reflectance spectra makes the inversion model uncertain. Although neural networks are able to imitate the nonlinear relationship [2], their generalization performance degrades when the training set is too small to meet the training demand, which is the usual case because the ground monitoring sites that provide the training samples are rather limited.

In practice, SVM not only approximates complex nonlinear systems well, but also performs better in generalization, learning and classification [3]. In particular, it depends little on the sample size, which is constrained by the limited number of ground monitoring sites. Thus, this paper proposes an inversion model based on SVM and its application. In addition, we design a simulation platform (WRS) based on the MVC pattern to make the experiments more efficient.

II. THE MODELING OF WATER QUALITY INVERSION BASED ON ε-SVR

A. Model Construction
The inversion model represents the relationship between the characteristics of water reflectance spectra and water quality indicators. Because the information about water reflectance is stored in remote sensing images, we can obtain the inversion result from the model by using these images as the input. The modeling has three main steps. First, combine the ground monitoring data and the remote sensing image into a training set. Second, search for the best parameters by cross-validation. Third, train the ε-SVR and obtain the inversion model. The targets of the training set are supplied by the ground monitoring data, and the attributes are provided by the remote sensing image. Fig. 1 shows the modeling procedure.

The attributes depend on the spectral characteristics of the target water quality indicator. For chlorophyll-α, there is an absorption peak near 440 nm, a reflection peak near 550 nm and a prominent fluorescence peak near 685 nm. These coincide with bands 1-3 of the TM image. Thus, the attributes are the DN values of bands 1-3 [4].

Figure 1. Water quality inversion model based on ε-SVR.
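The first modeling step of Section II.A, combining ground monitoring data with the image into a training set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the WRS code: the `Image`, `Site` and `Sample` types and the `buildTrainingSet` function are hypothetical names introduced here, and the image is assumed to be held in memory as one DN array per band.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory image: bands[b][row * width + col] is the DN value.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::vector<unsigned char>> bands;  // bands[0] = TM band 1, ...
};

// Hypothetical ground monitoring record: pixel location plus measured value.
struct Site {
    int row;
    int col;
    double concentration;  // target, e.g. a concentration in mg/L
};

// One training sample: attributes from the image, target from the site.
struct Sample {
    std::vector<double> attributes;  // DN values of the selected bands
    double target;
};

// Pair each site with the DN values of the first `bandCount` bands at its
// pixel. Sites that fall outside the image are skipped.
std::vector<Sample> buildTrainingSet(const Image& img,
                                     const std::vector<Site>& sites,
                                     std::size_t bandCount) {
    std::vector<Sample> samples;
    if (bandCount > img.bands.size()) return samples;  // not enough bands
    for (const Site& s : sites) {
        if (s.row < 0 || s.row >= img.height || s.col < 0 || s.col >= img.width)
            continue;
        const std::size_t idx = static_cast<std::size_t>(s.row) *
                                    static_cast<std::size_t>(img.width) +
                                static_cast<std::size_t>(s.col);
        Sample sample;
        sample.target = s.concentration;
        for (std::size_t b = 0; b < bandCount; ++b)
            sample.attributes.push_back(static_cast<double>(img.bands[b][idx]));
        samples.push_back(sample);
    }
    return samples;
}
```

With `bandCount` set to 3, each sample carries the DN values of bands 1-3 as attributes, which is the choice the text makes for chlorophyll-α.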

978-1-4244-6516-3/10/$26.00 ©2010 IEEE 2163


B. Support Vector Regression
The Support Vector Machine (SVM) was developed by Vapnik and co-workers [5]. It is a machine learning algorithm based on statistical learning theory. It can be used not only for classification, but also for regression estimation. This paper analyzes the regression part of SVM, because the target values are the concentrations of a water quality indicator. When SVM is used for regression estimation, it is called Support Vector Regression (SVR) [6].

The principle of SVM is as follows. The attributes (inputs) are regarded as vectors in a higher-dimensional space. In this space, we can find a unique hyperplane (the optimal hyperplane) that has the maximum margin of separation between any training point and the hyperplane. The wider the margin is, the smaller the total error will be.

To generalize SVM to SVR, the optimal hyperplane becomes a function that estimates target values. Suppose we are given training data $(x_1, y_1), \ldots, (x_l, y_l) \in R^d \times R$, where $x_i$, $i = 1, 2, \ldots, l$, denotes the attributes of an input sample and $y_i$, $i = 1, 2, \ldots, l$, denotes the target of that sample. Our goal is to find a function $f(x) = \omega \cdot x + b$, $\omega \in R^d$, $b \in R$. If the difference between $f(x_i)$ and $y_i$ is less than $\varepsilon$ for all $x_i$, $f(x)$ will be able to predict $y$ from $x$. We can write the problem as a convex optimization problem:

$$\min \; \phi(\omega) = \frac{1}{2}\|\omega\|^2, \quad \text{subject to } |y_i - (\omega \cdot x_i + b)| \le \varepsilon. \tag{1}$$

In this description, $\varepsilon$ is the maximum deviation from the actual targets, so the method is called ε-SVR. In practice, not every sample can be fitted within this deviation. We therefore transform (1) into a constrained optimization problem by introducing slack variables, denoted by $\xi$ and $\xi^*$, as follows:

$$\min \; \phi(\omega, \xi, \xi^*) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*),$$
$$\text{subject to } \begin{cases} y_i - \omega \cdot x_i - b \le \varepsilon + \xi_i \\ \omega \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0, \quad i = 1, 2, \ldots, l, \quad C > 0. \end{cases} \tag{2}$$

We can use Lagrange multipliers to cope with (2) by introducing $\alpha_i, \alpha_i^*, \eta_i, \eta_i^*$:

$$\begin{aligned} L(\omega, b, \xi, \xi^*) = {} & \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \\ & - \sum_{i=1}^{l} \alpha_i (\varepsilon + \xi_i - y_i + \omega \cdot x_i + b) \\ & - \sum_{i=1}^{l} \alpha_i^* (y_i + \varepsilon + \xi_i^* - \omega \cdot x_i - b) \\ & - \sum_{i=1}^{l} (\eta_i \xi_i + \eta_i^* \xi_i^*). \end{aligned} \tag{3}$$

The optimal solution satisfies the following conditions:

$$\begin{cases} \dfrac{\partial L}{\partial b} = \sum_{i=1}^{l} (\alpha_i^* - \alpha_i) = 0 \\[2mm] \dfrac{\partial L}{\partial \omega} = \omega - \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i = 0 \\[2mm] \dfrac{\partial L}{\partial \xi_i^{(*)}} = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0. \end{cases} \tag{4}$$

Substituting (4) into (3) yields the dual optimization problem

$$\max \; W(\alpha) = \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j), \tag{5}$$

subject to $\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$.

From the second equation of (4), $\omega$ can be written as

$$\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) x_i. \tag{6}$$

Thus, the regression function has the following explicit form:

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)(x_i \cdot x) + b. \tag{7}$$

In the nonlinear situation, $x$ can be mapped into a high-dimensional feature space by $\varphi(x)$, in which the samples can be fitted by a linear function. The inner product $(x_i \cdot x)$ then becomes $\varphi(x_i) \cdot \varphi(x)$. Replacing $\varphi(x_i) \cdot \varphi(x)$ with a kernel function $K(x_i, x)$, the linear fit can be realized after the nonlinear transformation without any increase in computational complexity. Thus, the optimization problem becomes

$$\max \; W(\alpha) = \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j). \tag{8}$$

Equation (6) is rewritten as

$$\omega = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \, \varphi(x_i), \tag{9}$$

and the regression function can be written as

$$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b. \tag{10}$$

There are four commonly used kernel functions:
• Linear: $K(x_i, x_j) = x_i^T x_j$; (11)
• Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$; (12)

• RBF: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$; (13)
• Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$. (14)

In this paper, we choose the RBF as the kernel to build the water quality inversion model.

C. Values of C, ε and γ
Before training the SVR, we have to identify the best parameter group (C, ε, γ) via grid search. Grid search traverses the ranges of the parameters and picks the best parameter group. In detail, for every parameter group picked by the grid search, we conduct an n-fold cross-validation on the existing training set. The best group is the one with the highest cross-validation accuracy. In n-fold cross-validation, we first divide the training set into n subsets of equal size. Then, each subset in turn is tested using the regression function trained on the remaining n-1 subsets. Thus, each sample of the training set is predicted exactly once, and the accuracy of a cross-validation is the percentage of targets that are predicted within a deviation of ε.

III. MVC PATTERN AND SIMULATION PLATFORM DESIGN
We use the simulation platform (WRS) to support the analysis and evaluation of the water quality inversion model. Functionally, WRS therefore mainly involves remote sensing image processing and water quality monitoring. For the former, WRS can read, write and browse huge-volume remote sensing images and process them quickly. For the latter, it provides ground monitoring data management, SVM modeling and water quality inversion. We designed the structure of WRS around the MVC pattern. Consequently, WRS has powerful operation and display capabilities and can process huge-volume images quickly.

WRS is developed with Visual C++ 2008. We use GDAL (Geospatial Data Abstraction Library) for remote sensing image I/O between memory and peripheral storage, and LIBSVM (an open source SVM library developed by Chih-Jen Lin and co-workers [7] [8]) for SVM modeling and inversion.

A. MVC Pattern
MVC denotes Model-View-Controller. The core idea of MVC is to separate the business logic, the representation and the control of an application into three independent parts. MVC was proposed by Trygve Reenskaug in the 1970s. Its aim is a dynamic style of program design that simplifies the modification and extension of a program and allows parts of it to be reused. Moreover, the application structure becomes clear and easy to read with this pattern.

The Model encapsulates the business logic. The business logic of WRS is the image processing. The Model can access the data directly, without the View and the Controller. The View shows the data that the user wants. Generally, business logic must not be included in the View. The Controller coordinates the other modules. It controls the flow of the application: it processes events and then responds to them. Introducing MVC clearly separates the user interface from complex business logic, so the pattern is often used to design distributed systems. Nevertheless, it is also suitable for applications with complex data processing functions.

B. Framework Design using MVC

1) Improvement of Document/View Frame
WRS is developed with MFC (Microsoft Foundation Classes). MFC provides a framework named Document/View. In this framework, the class CView can be seen as the View of MVC; its task is to draw the windows and the data. The task of the class CDocument is to access data and files, so it can be seen as the Model of MVC. MFC has no independent module that realizes the Controller of MVC, so the Controller has to be realized in CView or CDocument. Accordingly, the function of the Controller is bound to CView or CDocument. Hence, if we use the Document/View framework directly for WRS, the view and business logic modules become strongly coupled. To solve this problem, we add a public class named CModel, which is a member of CDocument. CModel encapsulates all of the business logic. In this way, the business logic becomes partly independent of CDocument, and CDocument degenerates into the Controller. The resulting framework of WRS is logically consistent with the MVC pattern (Fig. 2).

Figure 2. The improvement of Document/View.

2) The Hierarchy Design
With this improvement, the functions of image reading, writing and display, SVM modeling and water quality inversion, and the collection and management of monitoring data can all be realized in CModel. Moreover, following modular program design, the functions are encapsulated in independent modules, and the modules provide public interfaces to the Model. In particular, we build a separate file I/O interface, SVM interface and data management interface, which provide services for image reading and writing, for modeling and inversion, and for the maintenance of monitoring data, respectively. Fig. 3 shows the hierarchy diagram of WRS. In the diagram, the part above the Model is the infrastructure of WRS, and the part below the Model is its basic functionality. Within the basic functionality, every layer invokes methods that belong to the next layer and provides services to the previous layer.
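The layering described above, in which CModel holds the business logic and reaches the other modules only through public interfaces, might look roughly like the sketch below. Only the name CModel and the interface roles come from the text. The names `IFileIO`, `ISvm` and `CModelSketch`, their methods, and the two stub implementations are hypothetical. The data management interface is omitted for brevity, and MFC, GDAL and LIBSVM are deliberately left out so that the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical service interfaces; the paper names the roles, not these methods.
struct IFileIO {
    virtual ~IFileIO() = default;
    // Read the pixels of one band inside a block given by image coordinates.
    virtual std::vector<double> readBlock(int band, int x, int y, int w, int h) = 0;
};

struct ISvm {
    virtual ~ISvm() = default;
    // Predict a concentration from the attribute vector of one pixel.
    virtual double predict(const std::vector<double>& attributes) = 0;
};

// The business logic depends only on the interfaces, so whatever library
// sits behind ISvm stays invisible to the model.
class CModelSketch {
public:
    CModelSketch(IFileIO& io, ISvm& svm) : io_(io), svm_(svm) {}

    // Invert one block: read the requested bands, then predict pixel by pixel.
    std::vector<double> invertBlock(const std::vector<int>& bands,
                                    int x, int y, int w, int h) {
        std::vector<std::vector<double>> data;
        for (int b : bands) data.push_back(io_.readBlock(b, x, y, w, h));
        const std::size_t pixels =
            static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
        std::vector<double> result(pixels);
        std::vector<double> attributes(bands.size());
        for (std::size_t p = 0; p < pixels; ++p) {
            for (std::size_t k = 0; k < bands.size(); ++k)
                attributes[k] = data[k][p];
            result[p] = svm_.predict(attributes);
        }
        return result;
    }

private:
    IFileIO& io_;
    ISvm& svm_;
};

// Stub standing in for a real image reader: every pixel equals the band ID.
struct ConstantIO : IFileIO {
    std::vector<double> readBlock(int band, int, int, int w, int h) override {
        const std::size_t n =
            static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
        return std::vector<double>(n, static_cast<double>(band));
    }
};

// Stub standing in for a trained regression model: sums the attributes.
struct SumSvm : ISvm {
    double predict(const std::vector<double>& a) override {
        double s = 0.0;
        for (double v : a) s += v;
        return s;
    }
};
```

Because `CModelSketch` receives its collaborators by reference, a stub can be swapped for a library-backed implementation without touching the model, which is the decoupling the hierarchy design aims for.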

Figure 3. System hierarchy diagram.

IV. THE DESIGN OF DATA PROCESSING

A. Division Algorithm of Remote Sensing Image Based on Divide and Conquer Algorithm
Remote sensing images must be processed pixel by pixel during pretreatment or inversion. Because of its volume, a remote sensing image often cannot be read into memory at once. Thus, we need an algorithm that divides a whole image into blocks small enough to be read one by one, so that a specific procedure, such as inversion or pretreatment, can work on these blocks one after another.

In this paper, a division algorithm based on the Divide and Conquer strategy is designed to divide the image into blocks. Because the size of the input remote sensing image is not known in advance, the algorithm has to divide the image adaptively into blocks of equal size. The basic idea is as follows. First, the original image is divided into 4 roughly equal-sized blocks by drawing a cross at its center. Then, these blocks are divided into sub-blocks in the same way. This process continues until the size of the blocks is no larger than the size defined by the user. Finally, the blocks are read into memory and processed one by one (Fig. 4). The processing of the whole image is complete when all of these blocks have been processed. This is a typical Divide and Conquer problem. The division can be implemented as a recursion whose stopping criterion is "block size ≤ predefined size". The algorithm can be described as follows:

$$T(n) = \begin{cases} \Theta(n^2) & n \le S \\ 4T(n/4) + \Theta(1) & n > S. \end{cases} \tag{15}$$

Here, $S$ represents the predefined size. The time complexity at the stopping criterion is $\Theta(n^2)$, because the final blocks, which are subject to the remote sensing image processing algorithms, must be processed pixel by pixel.

Figure 4. Division algorithm diagram.

In the program, a function called ImageCalc implements this algorithm. The specific image processing routine to be invoked after the division is passed to ImageCalc as a callback parameter. The function is declared as follows:

    ImageCalc( CRect rect, int *band, void *para,
               void (CModel::*calc)(CRect,int*,void*) )

• rect: The information of the current block.
• band: A pointer to an array of band IDs.
• para: A pointer to additional parameters.
• calc: A pointer to the callback function.

The algorithm divides the image by its coordinates only. The callback function reads each final block from the file I/O interface via its coordinates, and then processes and saves it.

B. The Design of Concentration Inversion Procedure
As mentioned above, WRS needs to execute the concentration inversion procedure on the final blocks. In principle, the procedure first trains the ε-SVR model with the relevant bands of the remote sensing image and the actual concentrations acquired from the ground monitoring sites. Then, the remote sensing image is fed into the ε-SVR model. Finally, we obtain the inverted concentration of the water quality indicator.

The procedure is realized in software as follows. First, the pixels are located from the positions of the ground monitoring sites and, after pretreatment, combined with the actual concentrations into a training set. Second, the ε-SVR is trained on the training set, which returns an ε-SVR model. Finally, the remote sensing image is imported, and we obtain the predicted concentration distribution image. Fig. 5 is a flowchart of this procedure and shows how the data flows among the modules.

In this paper, a data management interface is designed to manage the data of the ground monitoring sites. Moreover, LIBSVM is encapsulated in an SVM interface that provides the services of SVM modeling and inversion of the water quality indicator concentration. Both interfaces can be invoked by CModel. The data structures and functions of LIBSVM are invisible to CModel.

Figure 5. SVM modeling and concentration inversion procedure.
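The recursive division of Section IV.A, together with the callback idea behind ImageCalc, can be sketched as below. This is a standalone illustration under assumptions, not the WRS implementation: `Block` is a hypothetical stand-in for MFC's CRect, `divideAndProcess` is a hypothetical name, and the callback is a `std::function` rather than the `CModel` member-function pointer used in the paper. The sketch also assumes that "block size ≤ predefined size" means both sides of the block are at most `maxSide`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical block rectangle in image coordinates (stand-in for CRect).
struct Block {
    int x;
    int y;
    int width;
    int height;
};

// Recursively split `block` into four quadrants until both sides are at
// most `maxSide`, then hand each final block to `process`.
void divideAndProcess(const Block& block, int maxSide,
                      const std::function<void(const Block&)>& process) {
    if (maxSide < 1) return;                            // guard against endless recursion
    if (block.width <= 0 || block.height <= 0) return;  // empty quadrant
    if (block.width <= maxSide && block.height <= maxSide) {
        process(block);  // stopping criterion reached: process this block
        return;
    }
    // Draw a cross at the centre; odd sizes give roughly equal quadrants.
    const int halfW = (block.width + 1) / 2;
    const int halfH = (block.height + 1) / 2;
    divideAndProcess({block.x, block.y, halfW, halfH}, maxSide, process);
    divideAndProcess({block.x + halfW, block.y, block.width - halfW, halfH},
                     maxSide, process);
    divideAndProcess({block.x, block.y + halfH, halfW, block.height - halfH},
                     maxSide, process);
    divideAndProcess({block.x + halfW, block.y + halfH,
                      block.width - halfW, block.height - halfH},
                     maxSide, process);
}
```

As in the text, the division works on coordinates only; reading the pixels of a final block, processing them and saving the result would all happen inside the callback.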

V. RESULTS

A. Inverse Results of SVM Model
A Landsat TM image of Taihu Lake acquired on 5/4/1997 is chosen for the concentration inversion of chlorophyll-α and suspended substance. The actual concentrations and the predicted values are shown in Table I.

TABLE I. CONCENTRATION COMPARISON OF ACTUAL AND PREDICTED VALUES

                     Concentrations (mg/L)
Sites    Chlorophyll-α            Suspended Substance
         Actual    Predicted      Actual    Predicted
1        0.039     0.036          46        45
2        0.022     0.020          107       106
3        0.016     0.018          14        13
4        0.017     0.018          18        18
5        0.013     0.013          41        40
6        0.016     0.017          35        35
7        0.017     0.019          15        15
8        0.008     0.015          15        15
9        0.039     0.041          22        22
10       0.047     0.045          16        16
11       0.016     0.014          41        41
12       0.104     0.102          25        24

Fig. 6 shows the inversion results. The concentrations of chlorophyll-α and suspended substance are very high in the north and east of Taihu Lake, especially at the river entrances and in the lake's bays. These regions fall under the jurisdiction of Wuxi and Suzhou, which are densely populated, highly prosperous in industry and agriculture, and heavily polluted cities.

Figure 6. Inverse results of concentrations (Unit: mg/L). (a) Concentration distribution of chlorophyll-α. (b) Concentration distribution of suspended substance.

B. Performance of WRS
Fig. 7 shows the interface of WRS after the concentration inversion of chlorophyll-α. There is a thumbnail window at the upper left corner and a full resolution window on the right side.

Figure 7. Inversion interface.

We use WRS to process Landsat TM data with a resolution of 4017×4132 and 7 bands on two hardware configurations. The time consumed in predicting the concentration of chlorophyll-α via ε-SVR is shown in Table II.

TABLE II. INVERSION PERFORMANCE OF WRS

CPU                               Memory (Byte)    Time (μs)
Pentium IV 3.0G                   1G               20,539
Pentium Dual-Core E5200 2.5G      2G               11,953

VI. CONCLUSIONS
In this paper, we design a water quality inversion method based on SVM. The accuracy of this method is better than that of traditional ones. At the same time, we design a simulation platform for water quality monitoring (WRS) using remote sensing technology, based on the MVC pattern. It provides reading, writing and processing of remote sensing images, water quality inversion and management of monitoring data. We use the division algorithm to divide the original image into blocks, which makes it possible to process huge-volume remote sensing images. The research and design presented in this paper can be generalized to applications of water quality monitoring.

ACKNOWLEDGMENT
This work is supported by the National Natural Science Foundation of China (60774092 and 60901003) and the Research Fund for the Doctoral Program of Higher Education of China (20070294027).

REFERENCES
[1] Q. Wang, B. Zhang, Y. C. Wei and X. W. Li, Remote sensing monitoring experiment of Taihu Lake water body and software realization. Beijing: Science Press, 2008.
[2] L. E. Keiner and X. H. Yan, “A neural network model for estimating sea surface chlorophyll and sediments from Thematic Mapper imagery,” Remote Sensing of Environment, vol. 66(2), 1998, pp. 153-165.
[3] X. G. Zhang, “Introduction to statistical learning theory and support vector machines,” Acta Automatica Sinica, vol. 26(1), 2000, pp. 32-42.
[4] L. Han, D. Rundquist, L. Liu, R. Fraser and J. Schalles, “The spectral response of algal chlorophyll in water with varying levels of suspended sediment,” International Journal of Remote Sensing, vol. 15(18), 1994, pp. 3707-3718.
[5] C. Cortes and V. Vapnik, “Support vector networks,” Machine Learning, vol. 20, 1995, pp. 273-297.
[6] B. Schölkopf and A. J. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2002.
[7] C.-W. Hsu, C.-C. Chang and C.-J. Lin, “A Practical Guide to Support Vector Classification,” unpublished.
[8] C.-C. Chang and C.-J. Lin, “LIBSVM: a Library for Support Vector Machines,” unpublished.
