
Annual Progress Report 2008-09

THEME-III
EXPERIMENT 2
MODELLING OF SPRING DISCHARGE

INTRODUCTION
In high-altitude areas, rivers flowing in deep valleys at the toe of slopes rarely serve any purpose as far as domestic water supply and irrigation are concerned. Thus, in the Indian state of Uttarakhand, natural springs are the major available source of water. About 90 per cent of the rural population of this region depends upon natural springs to meet its water demand, which is why villages in the hills are clustered around springs; there is hardly any settlement without a spring. These springs are called 'naula' (very shallow, 1-2 m deep, suitably lined wells that collect water from seepage) in the Kumaun region and 'dhara' in the Garhwal region of the Uttarakhand hills.

Information on the types of springs, and on spring discharge in relation to rainfall, geology and geomorphology, land use, vegetation, anthropogenic interference, etc., is scarce. Government-operated water supply schemes are fast becoming non-functional due to diminished spring discharge, mismanagement of water and the consequent apathy of the people. In most springs of the Himalayan region, flow has decreased by 50 per cent within the last 30 years; piped drinking water supplies in hilly areas are failing as springs dry up, and the supply to irrigation channels has also been adversely affected. Under these circumstances, people will move wherever the water moves. Studies indicate that deforestation, grazing and trampling by livestock, erosion of fertile topsoil, forest fires and developmental activities (e.g. road cutting, mining and building construction) are the causes of the reduction in spring flow.

The rainfall-spring discharge relationship is a complex hydrologic phenomenon. Temporal and spatial variability in watershed characteristics, uncertainty in rainfall patterns and the large number of input parameters responsible for transforming rainfall into spring discharge are the major sources of this complexity. Several process-based models that mathematically simulate spring discharge generation at the watershed scale are in vogue. However, the large number of parameters that must be determined to apply them restricts their wider adoption.

Natural Spring Discharge Modelling

Hydro-geologic models proposed by researchers are complex in nature. They require information on several parameters, such as the transmissivity and storativity of the aquifer, the length of the transmission zone, the width of the spring opening and the instantaneous recharge. However, the lack of regular monitoring of spring behaviour and of detailed studies of spring catchment areas makes it difficult to determine these parameters.

The parameters of spring discharge prediction models can also be estimated numerically using non-linear optimization techniques such as the Marquardt algorithm, gradient projection and the Gauss-Newton method. However, these techniques often converge to local optima. The Genetic Algorithm (GA) optimization technique, on the other hand, provides an efficient and effective tool that is more likely to converge to near-optimal or optimal solutions (McKinney, 1994; Morshed and Kaluarachchi, 2000; Reed et al., 2000). It is also reported to be cost-effective and less time-consuming.

Therefore, in the present study, the GA approach has been applied to predict the weekly spring discharge of a micro-watershed of the Henval river. The study aims to highlight the GA technique and to demonstrate its reliability and validity for predicting weekly spring discharge. The results are expected to be useful for the planning, design and management of spring water resources.

Genetic algorithms (GAs) are robust search methods that mathematically reproduce the mechanisms of natural selection and population genetics, according to the biological processes of survival and adaptation (Goldberg, 1989). GAs were introduced in the United States in the 1970s by John Holland at the University of Michigan. Continuing price/performance improvements in computing systems have made them attractive for various types of optimization. In particular, GAs work well on mixed (continuous and discrete) and combinatorial problems. Because they search from a population of points rather than from a single point, they are less susceptible to getting 'stuck' at local optima than gradient search or other traditional optimization methods.

The need to solve optimization problems arises in almost every field and is a dominant theme in water resource systems in particular. Consequently, many analytical and numerical optimization techniques have been developed. However, a great number of functions, such as discontinuous, non-differentiable and non-convex ones, are beyond analytical methods and present profound difficulties for numerical techniques. Moreover, traditional optimization techniques depend heavily on a deterministic relationship between the model parameters and model performance, and to date they have often been unable to optimize the performance of complex systems (McKinney and Lin, 1994; Reed et al., 2000). New, more robust optimization techniques capable of handling such problems are therefore needed. The search for efficacy and efficiency in numerical optimization has led researchers to reproduce system mechanisms that are naturally robust.

In the past decade, GAs have been applied successfully to problems such as optimizing simulation models, fitting non-linear curves to data, solving systems of non-linear equations and machine learning, and several salient studies have employed them to solve complex water resources problems. Savic and Walters (1995a, b) applied GAs within the framework of an evolution program, integrating them with the hydraulic analysis of water distribution networks to determine the optimal location of isolating valves. They demonstrated that the evolutionary process considerably accelerates the search for the optimal solution, reducing the number of hydraulic analyses of the distribution network. Furthermore, the approach allows infeasible solutions to remain in the population and help guide the search.

GENERAL DESCRIPTION OF THE STUDY AREA

The study area is located near Ranichauri, on the Rishikesh-Uttarkashi route, in Tehri Garhwal district of Uttarakhand, as shown in Fig. 2.1. The watershed drains into the Henval river and covers an area of 871 ha (8.71 km²). It lies between 78°22′28″ and 78°24′57″ E longitude and 30°17′19″ and 30°18′52″ N latitude. The elevation varies from 960 to 2000 m above mean sea level (msl). Slope, drainage and shade cast are important elements of topography, and the topography of the study area varied considerably. In conformity with the marked altitudinal and climatic differences, the region supported a variety of forest ecosystems, and pine forests predominated above 1500 m msl.

Fig. 2.1 Index map of study area

Soils

Soils of Tehri Garhwal district were formed from rocks containing biotite, schist and phyllitic material under a cool, moist climate. The soils of the Henval river micro-watershed were brown to greyish brown and dark grey in colour, non-calcareous, and neutral to slightly acidic in reaction. These soils were fairly deep and moderately permeable. Moderately to highly acidic soils were also found at higher elevations, where rainfall was high enough to leach bases from the soil minerals under temperate climatic conditions.

Climate

The climate of this region was humid temperate, but variations existed that depended largely on altitude and geological differences. The most common factors leading to the development of micro-climates were altitude, aspect, slope, drainage conditions and vegetation. The valleys were hot in summer and cold in winter. The average temperature in the area varied from 3 °C to 30 °C. The average rainfall varied from 1200 to 1400 mm, of which 70 to 80 per cent was received between June and September; rainfall tended to decrease with further increase in elevation. The relative humidity at 08:30 h varied from 60 to 70 per cent in the northern hills and 30 to 40 per cent in the south-western dry areas. Aspect played a very important role in the development of vegetation, particularly at higher altitudes. Southern aspects were exposed to more insolation.

GENETIC ALGORITHM (GA)


A GA involves simple mechanisms, namely copying strings and exchanging parts of strings. A simple GA capable of producing good practical results is commonly composed of three operators: reproduction, crossover and mutation. Reproduction is a process in which individual strings are copied into the next generation according to their objective function values, i.e. a survival-of-the-fittest selection process. In its simplest form, crossover is the partial exchange of corresponding segments between two "parent" strings to produce two "offspring" strings. Mutation is the occasional flipping of values, which allows new features to be introduced into the population pool. Many other variations of these operators are available in the literature on genetic algorithms (Goldberg, 1989; Reed et al., 2000; Hilton and Culver, 2000).

In a GA, a solution to a problem is represented as a genome (or chromosome). The algorithm creates a population of solutions and applies genetic operators, such as mutation and crossover, to evolve them in order to find the best solution(s). The three most important steps in setting up a GA are: (1) definition of the objective function; (2) definition and implementation of the genetic representation; and (3) definition and implementation of the genetic operators. Once these three have been defined, the generic genetic algorithm should work fairly well.
Genetic algorithms are original systems based on the supposed functioning of living beings. They differ from classical optimization algorithms in the following characteristics:
I. They use an encoding of the parameters, not the parameters themselves.
II. They work on a population of points rather than a single point.
III. They use only values of the function being optimized, not its derivatives or other auxiliary knowledge.
IV. They use probabilistic, not deterministic, transition rules.
Natural evolution takes place in chromosomes, the microscopic, thread-like parts of the cell nucleus that carry hereditary information in the form of genes. In a GA, an individual "chromosome", referred to as a string, represents a possible solution. Each string comprises a series of characters or features, equivalent to biological genes, that encode the set of decision variables. A population of chromosomes represents a set of possible solutions. While many coding schemes are possible, Goldberg (1989) suggested that a GA performs best when binary coding is used. However, many classes of problems require different codings, including real-number coding of the decision variables. In later work, Goldberg suggested that real coding may be convenient because it keeps a one-gene-one-variable correspondence, besides offering the possibility of finding the best point regardless of the initial population.

A GA improves upon an initial, randomly generated population of strings representing a set of possible solutions. Repeated application of the genetic operators searches for efficient solutions to the problem at hand. Solutions with higher values of the objective function (usually called the 'fitness function') are retained, whereas weaker ones are discarded. An advantage of GAs over traditional search methods is that they retain a population of well-adapted sample points, thereby increasing the chance of reaching the global optimum.

Representation schemes
In the present study, an initial population of parameter sets (i.e. a set of solutions), of size proportional to the total string length, is generated using a random number generator. Each member of the population is represented by a set of parameter values that describe the problem. The parameters can be represented as binary strings, integers, enumerated data types or real (continuous) values.

Fig. 2.2 General flow chart of Genetic Algorithm

Traditionally, GAs have used binary coding, in which a chromosome is represented by a string of binary bits (0s and 1s) that can encode integers, real numbers or anything else appropriate to the problem. Binary strings are easy to operate on, and within any gene the binary representation can be mapped to values in the feasible range of the variable represented (Goldberg, 1989). Each parameter (variable) is transformed into a binary string of specific length, called a sub-string. The sub-string length is determined by the desired solution accuracy, i.e. by the specified range of the parameter and the precision required for it; the longer the string, the more precise the result. The relationship between string length and precision is (Goldberg, 1989):

$$ (D_k - C_k)\times 10^{a} \le 2^{l} - 1 $$ …(2.1)

where a = required precision (number of decimal places), l = string length, Ck and Dk = lower and upper limits of the variable, respectively, and k = variable number.
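Eqn (2.1) can be inverted to give the shortest string length that meets a required precision. A minimal sketch, assuming (as an interpretation of the notation) that a is the number of decimal places required; the function name `required_bits` is hypothetical:

```python
import math


def required_bits(lower: float, upper: float, decimals: int) -> int:
    """Smallest l with (upper - lower) * 10**decimals <= 2**l - 1 (Eqn 2.1)."""
    n_steps = (upper - lower) * 10 ** decimals   # number of distinguishable steps needed
    return max(1, math.ceil(math.log2(n_steps + 1)))
```

For a parameter bounded by 0 and 10 and needed to two decimal places, `required_bits(0, 10, 2)` returns 10, since 2^10 − 1 = 1023 ≥ 1000 while 2^9 − 1 = 511 is too small.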

At each generation, every chromosome is evaluated by computing its fitness value. This requires decoding the binary strings into real parameter values; the value of each variable is decoded using the following equation:

$$ Z_k = C_k + \frac{D_k - C_k}{2^{l} - 1} \sum_{j=0}^{l-1} b_j\,2^{j} $$ …(2.2)

where Zk = decoded real value of variable k, b_j = value (0 or 1) of the j-th bit of the sub-string, and the remaining notation is the same as in Eqn (2.1).
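Eqn (2.2) is a linear map from the integer value of the sub-string onto the interval [Ck, Dk]. The sketch below is illustrative only; the bit ordering (bits[j] multiplies 2^j) and the name `decode_substring` are assumptions, not taken from the program used in the study:

```python
def decode_substring(bits, lower, upper):
    """Decode a binary sub-string into a real value in [lower, upper] (Eqn 2.2).

    bits[j] is taken as the coefficient of 2**j (least significant bit first);
    this ordering is an assumption made for illustration.
    """
    l = len(bits)
    integer = sum(b * 2 ** j for j, b in enumerate(bits))
    return lower + (upper - lower) / (2 ** l - 1) * integer
```

An all-zero sub-string decodes to the lower limit and an all-one sub-string to the upper limit, so the 2^l possible strings are spread evenly over the feasible range.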

An alternative approach is to use a representation appropriate to the components of the problem. In a real-value representation, the individual genes of a chromosome are initially assigned random values within the feasible limits of the variables they represent. With a sufficiently large population of chromosomes, adequate representation is achieved. This has the significant advantage of not spending computer time on decoding before objective function evaluation, although a more careful approach to mutation is required.
In the developed GA optimization program, the number of variables, population size and number of generations can be selected according to the specific problem.

Fitness function
Since GAs mimic nature's survival-of-the-fittest principle to drive the search, they are naturally suited to maximization problems. Minimization problems are usually converted into maximization problems by a suitable transformation. For maximization problems, the fitness function can be taken as the objective function itself, i.e. F(x) = f(x). For minimization problems, the fitness function is an equivalent maximization function chosen so that the optimum point remains unchanged. The following fitness function is used for minimization problems in the developed GA program:

$$ F(x) = \frac{1}{1 + f(x)} $$ …(2.3)

This transformation does not alter the location of the minimum; it converts the minimization problem into an equivalent maximization problem. For a non-negative objective such as RMSE, F(x) lies between 0 and 1 and decreases as f(x) increases. The developed GA program can solve both maximization and minimization problems.
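A short illustration of Eqn (2.3), assuming a non-negative objective such as RMSE; the guard against negative values is an addition for this sketch, since 1 + f(x) could otherwise reach zero:

```python
def fitness_from_cost(cost: float) -> float:
    """Map a non-negative cost f(x) to a fitness F(x) = 1 / (1 + f(x)) (Eqn 2.3)."""
    if cost < 0:
        raise ValueError("Eqn 2.3 is applied here only to non-negative costs")
    return 1.0 / (1.0 + cost)


# Lower cost gives higher fitness, so the ranking of candidates is preserved.
costs = [3.0, 0.0, 1.0]
fitness = [fitness_from_cost(c) for c in costs]
best = max(range(len(costs)), key=lambda i: fitness[i])  # index of the minimum-cost candidate
```

Here `fitness` is `[0.25, 1.0, 0.5]`, and the fittest candidate is the one with zero cost.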

Roulette wheel selection approach



Selection is the process by which chromosomes are chosen to participate in reproduction. Deterministic sampling, remainder stochastic sampling with and without replacement, stochastic sampling with and without replacement, fitness-proportionate selection and stochastic tournament (Wetzel ranking) are some commonly used selection schemes (Goldberg, 1989). In this study, fitness-proportionate selection is used, in which a string is selected for the mating pool with a probability proportional to its fitness. Thus, the probability p_i of an individual string i being selected is given by

$$ p_i = \frac{f_i}{\sum_{j=1}^{n} f_j} $$ …(2.4)

where f_i = fitness of individual i, and n = population size.

Since the population size is usually kept fixed, the probabilities of all strings being selected for the mating pool must sum to one. To implement this selection scheme, a roulette wheel whose circumference is marked for each string in proportion to the string's fitness is simulated. The wheel is spun n times, each time selecting the string indicated by the roulette wheel pointer. Because the circumference is marked according to fitness, this mechanism is expected to place f_i/f_mean copies of the i-th string in the mating pool, where the average fitness of the population, f_mean, is calculated as:

$$ f_{mean} = \frac{1}{n}\sum_{i=1}^{n} f_i $$ …(2.5)

Figure 3 shows a roulette wheel for four individuals with different fitness values. Since the second individual has a higher fitness value than any other, the roulette wheel is expected to select it more often than any other individual. In other words, a string with a higher fitness value occupies a larger range of cumulative probability values and therefore has a higher probability of being copied into the mating pool. This roulette wheel selection scheme is simulated by a computer program to implement the genetic algorithm.
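The roulette wheel of Eqns (2.4) and (2.5) can be simulated by building the cumulative probability ranges described above and locating a uniform random pointer within them. A minimal sketch; the function name is hypothetical, and Python's `random.choices` with `weights` would do the same job:

```python
import bisect
import itertools
import random


def roulette_select(population, fitness, n, rng=random):
    """Spin a simulated roulette wheel n times (fitness-proportionate selection, Eqn 2.4)."""
    total = sum(fitness)
    # Cumulative probability ranges: string i owns [cum[i-1], cum[i]).
    cumulative = list(itertools.accumulate(f / total for f in fitness))
    pool = []
    for _ in range(n):
        pointer = rng.random()                        # uniform in [0, 1)
        i = bisect.bisect_right(cumulative, pointer)  # first range whose upper end exceeds pointer
        pool.append(population[min(i, len(population) - 1)])  # guard against round-off
    return pool
```

With fitness values [1, 6, 2, 1], the second string occupies 60 per cent of the wheel; since f_mean = 2.5, Eqn (2.5) predicts 6/2.5 = 2.4 copies of it in a pool of four, i.e. about 60 per cent of the pool over many spins.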

Crossover
The general idea behind crossover is that, by exchanging important building blocks between two strings that perform well, the GA attempts to create new strings that preserve the best material from both parents. The number of strings that exchange material is controlled by the crossover probability, which is one of the input parameters. Crossover is a recombination operator that takes two individuals and cuts their chromosome strings at a randomly chosen position, producing two 'head' segments and two 'tail' segments. The tail segments are then swapped to produce two new full-length chromosomes. Two types of crossover are considered, single-point and multiple-point crossover, as illustrated in Figs. 2.3 and 2.4. In single-point crossover, one crossover point is selected at random along the chromosome length and the segments beyond it are swapped. In multiple-point crossover, genetic material is exchanged between two or more positions chosen at random along the length of the chromosomes. The GA code developed in this study lets the user choose the type of crossover.

Fig. 2.3 Single crossover


Fig. 2.4 Multiple crossover
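Single-point and multiple-point crossover (Figs. 2.3 and 2.4) differ only in the number of cut points, so both can be sketched with one function. The sketch assumes list-encoded binary chromosomes and distinct cut points; `k_point_crossover` is a hypothetical name, not a routine from the study's C program:

```python
import random


def k_point_crossover(parent1, parent2, k, rng=random):
    """Swap alternate segments between k distinct random cut points (k = 1: single-point)."""
    n = len(parent1)
    cuts = sorted(rng.sample(range(1, n), k))  # cut positions strictly inside the string
    child1, child2 = [], []
    prev, swap = 0, False
    for cut in cuts + [n]:
        seg1, seg2 = parent1[prev:cut], parent2[prev:cut]
        if swap:                               # every other segment comes from the other parent
            seg1, seg2 = seg2, seg1
        child1.extend(seg1)
        child2.extend(seg2)
        prev, swap = cut, not swap
    return child1, child2
```

With k = 1 this is the head/tail swap described above; k ≥ 2 gives multiple crossover. Whether a given pair is crossed at all would be decided separately, using the crossover probability.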

Once a string has been selected for reproduction, an exact replica of it is made and entered into the mating pool for further action by the genetic operators.

Mutation
Mutation is an important process that permits new genetic material to be introduced into a population. A mutation probability is specified that permits random mutations to be made to individual genes. Mutation provides a small amount of random search and helps to ensure that no point in the search space has a zero probability of being examined.
While running the GA program, users can select the crossover and mutation probabilities
within the recommended range.
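A minimal bit-flip mutation sketch for binary chromosomes. Treating the mutation probability as a per-bit probability is an assumption here, consistent with mutations being made to individual genes:

```python
import random


def bit_flip_mutation(chromosome, p_mut, rng=random):
    """Return a copy of a binary chromosome with each bit flipped with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chromosome]
```

With p_mut = 0.05 and a 20-bit chromosome (two 10-bit parameters), about one bit per chromosome is flipped on average.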

NATURAL SPRING DISCHARGE MODELLING FOR MICRO-WATERSHED OF HENVAL RIVER

In the present study, daily meteorological and spring discharge data were collected from the observatory of the College of Forestry & Hill Agriculture, Hill Campus of G. B. Pant University of Agriculture & Technology, Ranichauri, Tehri Garhwal, Uttarakhand. The data were arranged into 52 standard meteorological weeks; in each year, the 52nd meteorological week was counted as 8 days.
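The weekly arrangement can be sketched as follows, assuming day-of-year numbering in a 365-day year: days 1 to 357 fall into weeks 1 to 51 of seven days each, and the remaining 8 days (358 to 365) form week 52. How 29 February was handled in leap years is not stated, so this sketch does not treat leap years specially; the function names are hypothetical:

```python
from datetime import date


def standard_week(day: date) -> int:
    """Standard meteorological week (1-52); week 52 absorbs the last 8 days of a 365-day year."""
    doy = day.timetuple().tm_yday  # day of year, 1-based
    return min((doy - 1) // 7 + 1, 52)


def weekly_means(daily):
    """Average (date, value) records by standard week; returns {week: mean}."""
    sums, counts = {}, {}
    for day, value in daily:
        week = standard_week(day)
        sums[week] = sums.get(week, 0.0) + value
        counts[week] = counts.get(week, 0) + 1
    return {week: sums[week] / counts[week] for week in sums}
```

For example, 23 December falls in week 51, while 24 to 31 December form the 8-day week 52.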

Model development
Variation in spring discharge depends on inputs such as rainfall and other climatological parameters (evaporation, temperature, relative humidity and wind speed), apart from other variables such as groundwater recharge. In the present study, only five variables, namely rainfall, evaporation, temperature, relative humidity and wind speed on a weekly average basis, were incorporated to study their effect on natural spring discharge using the GA optimization technique. Models were developed separately at different weekly time lags using one input variable (rainfall), two input variables (rainfall and average evaporation), three input variables (rainfall, average evaporation and average temperature), four input variables (rainfall, average evaporation, average temperature and average relative humidity) and five input variables (rainfall, average evaporation, average temperature, average relative humidity and average wind speed). These input variables are uncertain and depend entirely on the climatic conditions of the area; such uncertainties may be handled by calibrating a suitable model with the GA optimization technique.
The model structure for the rainfall-spring discharge hydro-geological process may be represented as follows:

Input (rainfall, evaporation, temperature, relative humidity, wind speed) → Hydro-geological process → Output (spring discharge)

The above diagram may be expressed mathematically as

$$ Q_o = f(R, E, T, \mathrm{RH}, \mathrm{WS}) $$ …(2.6)

where Qo = observed spring discharge,
R = rainfall,
E = evaporation,
T = temperature,
RH = relative humidity, and
WS = wind speed.

The general form of the spring discharge model may be written as a simple linear equation:

$$ Q_o = a + bR + cE + dT + e\,\mathrm{RH} + f\,\mathrm{WS} $$ …(2.7)

where a, b, c, d, e and f are the model parameters to be optimized.
Determination of suitable model
Rainfall has a greater influence on spring discharge than the other input variables (evaporation, temperature, relative humidity and wind speed); in the input-output relationship, the output variable (discharge) varies mainly with the input variable rainfall. Therefore, in the present study, the relationship between rainfall and spring discharge was first studied with a simple linear regression equation:

$$ Y = aX + b $$ …(2.8)

where Y = dependent variable,
X = independent variable, and
a, b = constants.

Taking rainfall (R) as the independent variable and spring discharge (Qo) as the dependent variable, the equation becomes

$$ Q_o = aR + b $$ …(2.9)

Equation (2.9) was evaluated with rainfall lagged by one to ten weeks, and the root mean square error (RMSE) was calculated and compared for each time lag. Of these ten models, the one with the minimum RMSE was selected for further development: the model with a 4-week time lag of rainfall yielded the minimum RMSE and was therefore selected for the optimization process.
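The lag screening can be sketched as below. Because Eqn (2.9) is linear in a and b, the RMSE-minimizing parameters for each lag can be found in closed form by ordinary least squares; this is a swapped-in technique used only to keep the sketch short (the study used the GA for this step). Function names and the test data are hypothetical:

```python
import math


def fit_linear(x, y):
    """Closed-form least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def rmse(obs, pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def best_lag(rain, q, max_lag=10):
    """Fit Q[t] = a*R[t - lag] + b for lag = 1..max_lag; return (best lag, its RMSE, all RMSEs)."""
    scores = {}
    for lag in range(1, max_lag + 1):
        x, y = rain[:-lag], q[lag:]  # pairs (R[t - lag], Q[t])
        a, b = fit_linear(x, y)
        scores[lag] = rmse(y, [a * xi + b for xi in x])
    lag = min(scores, key=scores.get)
    return lag, scores[lag], scores
```

Each lag drops its first `lag` weeks, so the RMSE values are computed over slightly different numbers of weeks.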

The remaining input variables (average evaporation, average temperature, average relative humidity and average wind speed) were then added to this model one by one, and the correlation coefficients were calculated. Finally, with all input variables included, moving averages of rainfall were also added as input variables to improve accuracy. With all variables included and the linear combination expressed in exponential form, the final model selected was:

$$ Q_o = \exp\left(a + bR + cE + dT + e\,\mathrm{RH} + f\,\mathrm{WS} + g\,\mathrm{MV}_1 + h\,\mathrm{MV}_2 + \dots + n\,\mathrm{MV}_n\right) $$ …(2.10)

where MV1, MV2, …, MVn are moving averages of rainfall. The parameters of this model were then estimated using the GA optimization technique.
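Evaluating Eqn (2.10) for one week is a single exponential of a linear combination. In the sketch below, the trailing moving-average helper is hypothetical: the averaging windows used for MV1 … MVn are not stated, so the window is left to the caller:

```python
import math


def trailing_mean(series, t, window):
    """Hypothetical MV helper: mean of series[t - window + 1 .. t] (inclusive)."""
    start = t - window + 1
    if start < 0:
        raise ValueError("not enough history for this window")
    return sum(series[start:t + 1]) / window


def predict_discharge(params, inputs):
    """Eqn 2.10: Q = exp(a + b*x1 + c*x2 + ...), with params = [a, b, c, ...]."""
    if len(params) != len(inputs) + 1:
        raise ValueError("expected one intercept plus one coefficient per input")
    linear = params[0] + sum(p * x for p, x in zip(params[1:], inputs))
    return math.exp(linear)
```

Because the exponential is always positive, this form cannot predict a negative discharge, which the purely linear models of Table 2.1 can.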

Application of GA to estimation of spring discharge model parameters


Formulation of objective function
If Qo and Qp are the observed and predicted spring discharges, respectively, the residual error is

$$ E = Q_o - Q_p $$ …(2.11)

The root mean square error (RMSE) was calculated as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(Q_{o,i} - Q_{p,i}\right)^{2}} $$ …(2.12)

where n is the total number of weeks.

Eqn. (2.12) served as the objective function, since RMSE is to be minimized in this study.

Development and implementation of computer code for optimizing model parameter


Every hydro-geological model contains a number of parameters that cannot be measured directly because of their conceptual nature and are therefore estimated by calibration. Calibration of a hydro-geological model is an optimization process that adjusts the parameter values so that the simulated output fits the corresponding observed output as closely as possible. The deviation between simulated and observed outputs is the objective function, so the purpose of optimization is to find the model parameter values that minimize this deviation.

A computer code in the 'C' programming language developed by Prof. Kalyanmoy Deb (2001) was selected and modified to calculate the model parameters under different hydro-geological conditions by the GA optimization technique. For the spring discharge model, eqn. (2.10), the parameters to be determined are a, b, c, …, n. The GA optimizes these model parameters simultaneously through a systematic search. A suitable string length was selected based on the desired precision for each model parameter, and each parameter was coded in binary digits; in this study, a string length of 10 was used. The chromosome length is the sum of the individual string lengths. Suitable values of population size, number of generations, and crossover and mutation probabilities were then chosen. Based on recommended criteria (Deb, 2005), a population size of 200, a crossover probability of 0.9 and a mutation probability of 0.05 were finally selected.

The modified computer program was run under the Windows operating system (Pentium IV, 512 MB RAM, 80 GB HDD). The run giving the minimum root mean square error (RMSE) was taken as the final solution.
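To make the procedure concrete, the following is a compact binary-coded GA calibrating Eqn (2.9) with the settings reported above (10-bit sub-strings, population 200, crossover probability 0.9, mutation probability 0.05). It is a sketch, not the modified Deb code used in the study: the number of generations, the parameter bounds, the per-bit interpretation of the mutation probability, single-point crossover and the absence of elitism (the best-ever solution is simply recorded) are assumptions, and `seed` is a Python seed rather than the program's 0.123:

```python
import math
import random

L_BITS = 10       # sub-string length per parameter (as reported)
POP_SIZE = 200    # population size (as reported); must be even for pairing
P_CROSS = 0.9     # crossover probability (as reported)
P_MUT = 0.05      # mutation probability, assumed here to apply per bit
GENERATIONS = 50  # hypothetical; not stated in this section


def decode(chrom, bounds):
    """Split the chromosome into 10-bit sub-strings and decode each one (Eqn 2.2)."""
    params = []
    for k, (lo, hi) in enumerate(bounds):
        bits = chrom[k * L_BITS:(k + 1) * L_BITS]
        value = int("".join(str(b) for b in bits), 2)
        params.append(lo + (hi - lo) * value / (2 ** L_BITS - 1))
    return params


def rmse_linear(params, rain, q):
    """Objective: RMSE of the one-input model Q = a*R + b (Eqns 2.9 and 2.12)."""
    a, b = params
    sse = sum((qi - (a * ri + b)) ** 2 for ri, qi in zip(rain, q))
    return math.sqrt(sse / len(q))


def run_ga(rain, q, bounds, seed=123):
    """Return (best parameters, best RMSE) found over all evaluated generations."""
    rng = random.Random(seed)
    n_bits = L_BITS * len(bounds)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(POP_SIZE)]
    best_err, best_params = float("inf"), None
    for _ in range(GENERATIONS):
        errs = [rmse_linear(decode(c, bounds), rain, q) for c in pop]
        for c, e in zip(pop, errs):
            if e < best_err:                       # record the best-ever solution
                best_err, best_params = e, decode(c, bounds)
        fitness = [1.0 / (1.0 + e) for e in errs]  # Eqn 2.3
        # Fitness-proportionate (roulette-wheel) selection, Eqn 2.4.
        mating = rng.choices(pop, weights=fitness, k=POP_SIZE)
        children = []
        for i in range(0, POP_SIZE, 2):
            p1, p2 = mating[i][:], mating[i + 1][:]  # copies, so selected parents are untouched
            if rng.random() < P_CROSS:               # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            children.extend([p1, p2])
        for c in children:                           # bit-flip mutation
            for j in range(n_bits):
                if rng.random() < P_MUT:
                    c[j] = 1 - c[j]
        pop = children
    return best_params, best_err
```

On synthetic data generated as Q = 2R + 5, `run_ga(rain, q, [(0.0, 5.0), (0.0, 10.0)])` should return parameters near a = 2 and b = 5; how close depends on the seed and the number of generations. Repeating the run with several seeds and keeping the lowest RMSE mirrors the "best run" rule described above.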

Performance Indicators

To judge the predictive capability of the developed GA-based methodology, the following performance indicators were used.

Correlation coefficient (R)

$$ R = \frac{C_{pa}}{\sqrt{C_p\,C_a}} $$ …(2.13)

where

$$ C_{pa} = \frac{1}{n-1}\sum_{i=1}^{n} (a_i - a_{avg})(p_i - p_{avg}) $$ …(2.14)

$$ C_p = \frac{1}{n-1}\sum_{i=1}^{n} (p_i - p_{avg})^{2} $$ …(2.15)

$$ C_a = \frac{1}{n-1}\sum_{i=1}^{n} (a_i - a_{avg})^{2} $$ …(2.16)

where a_1, a_2, …, a_n are the actual values and a_avg is their average, and p_1, p_2, …, p_n are the estimated values and p_avg is their average.

The correlation coefficient measures the statistical correlation between predicted and actual values; a higher value indicates a better model. This measure is applicable only to numeric input and output.

Root mean square error (RMSE)

Root mean square error is the most commonly used measure of success of numeric prediction. It was evaluated with the following equation:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a_i - p_i)^{2}} $$ …(2.17)

where n = number of observations, and the other terms are as defined earlier.

Coefficient of Efficiency (CE)

The coefficient of efficiency is defined as the proportion of the initial variance accounted for by the model. It is determined by the following equation:

$$ \mathrm{CE} = \frac{\sum_{i=1}^{n} (a_i - a_{avg})^{2} - \sum_{i=1}^{n} (a_i - p_i)^{2}}{\sum_{i=1}^{n} (a_i - a_{avg})^{2}} $$ …(2.18)

Absolute prediction error (APE)

The absolute prediction error was determined by the following equation:

$$ \mathrm{APE} = \frac{\sum_{i=1}^{n} \lvert a_i - p_i \rvert}{\sum_{i=1}^{n} a_i} $$ …(2.19)

Coefficient of variation of the residual error (CVRE)

The coefficient of variation of the residual error was estimated by the following equation (Luchetta et al., 2003):

$$ \mathrm{CVRE} = \frac{1}{\bar{a}}\sqrt{\frac{\sum_{i=1}^{n} (a_i - p_i)^{2}}{n}} $$ …(2.20)

where ā is the average of the observed values.
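The five indicators of Eqns (2.13) to (2.20) can be computed together as sketched below. The correlation coefficient follows the form R = C_pa/√(C_p·C_a); APE is returned as a fraction (multiply by 100 for a percentage), and the function name is hypothetical:

```python
import math


def performance_indicators(actual, predicted):
    """R, RMSE, CE, APE and CVRE as in Eqns 2.13-2.20 (actual = a_i, predicted = p_i)."""
    n = len(actual)
    a_avg = sum(actual) / n
    p_avg = sum(predicted) / n
    c_pa = sum((a - a_avg) * (p - p_avg) for a, p in zip(actual, predicted)) / (n - 1)
    c_p = sum((p - p_avg) ** 2 for p in predicted) / (n - 1)
    c_a = sum((a - a_avg) ** 2 for a in actual) / (n - 1)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # sum of squared residuals
    sst = sum((a - a_avg) ** 2 for a in actual)                  # initial variance (times n)
    rmse = math.sqrt(sse / n)
    return {
        "R": c_pa / math.sqrt(c_p * c_a),
        "RMSE": rmse,
        "CE": (sst - sse) / sst,
        "APE": sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual),
        "CVRE": rmse / a_avg,
    }
```

For actual values [2, 4, 6, 8] and predictions [3, 4, 5, 8], the residual sum of squares is 2 and the total sum of squares is 20, so CE = 0.9 and APE = 2/20 = 0.1.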

Estimation of Water Demand

The water requirement was estimated on a weekly basis for the standard meteorological weeks. Reference evapotranspiration (ETo) was estimated by the FAO Penman-Monteith method. Crop evapotranspiration and effective rainfall were also determined before estimating the total water requirement.
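For reference, the standard FAO-56 form of the Penman-Monteith equation for daily reference evapotranspiration, together with the crop-coefficient relation commonly used to obtain crop evapotranspiration from it, is given below. The inputs and crop coefficients used in this study are not listed, so these are the textbook forms rather than a record of the study's computation:

```latex
ET_o = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}, \qquad ET_c = K_c\,ET_o
```

where ETo = reference evapotranspiration (mm day⁻¹), Rn = net radiation at the crop surface (MJ m⁻² day⁻¹), G = soil heat flux density (MJ m⁻² day⁻¹), T = mean daily air temperature at 2 m height (°C), u2 = wind speed at 2 m height (m s⁻¹), es − ea = saturation vapour pressure deficit (kPa), Δ = slope of the vapour pressure curve (kPa °C⁻¹), γ = psychrometric constant (kPa °C⁻¹), and Kc = crop coefficient.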

THE DEVELOPMENT OF MATHEMATICAL MODEL FOR THE PREDICTION OF SPRING DISCHARGE USING GA APPROACH

Since the water of the Hill Campus spring is used continuously throughout the year for domestic as well as irrigation purposes, the full 52 weeks of the year were considered for the development, calibration and validation of the model.

SPRING DISCHARGE PREDICTION MODEL

Development and calibration of the model

The spring discharge prediction model represented by eqn. (2.10) was developed for the Hill Campus spring of the Henval river micro-watershed using a multiple-input, single-output linear function approach. The model parameters were estimated with the Genetic Algorithm, a global optimization technique, following the procedure detailed above. Weekly average data of five input variables (rainfall, evaporation, temperature, relative humidity and wind speed) and one output variable (spring discharge) for 6 years (312 weeks) were used to develop and calibrate the model.

As discussed earlier, model development began by using the GA to estimate the parameters of eqn. (2.9) for rainfall time lags of one to ten weeks, with minimization of RMSE as the objective function. Rainfall with a four-week time lag yielded the minimum RMSE (7611), so rainfall with a four-week lag was selected for model development. The RMSE values for the ten time lags of rainfall are shown graphically in Fig. 2.5.

Fig. 2.5 RMSE values for the estimation of spring discharge with one input, i.e. rainfall only, at different time lags (x-axis: time lag in weeks; y-axis: RMSE)

The crossover probability, mutation probability, length of binary string, random seed and maximum number of runs were set to 0.9, 0.05, 10, 0.123 and 5, respectively. The maximum number of generations, population size, and number of binary variables with their lower and upper bounds were set according to the number of parameters in the models shown in Table 2.1. As presented in the table, various equations were used for model development, and the relationships between the input variables (rainfall, evaporation, temperature, relative humidity and wind speed) and the output variable (discharge) for the four-week time lag of rainfall were evaluated for their RMSE and correlation coefficient (R).

For the four-week time lag, the relationships between a single input variable and the single output variable were evaluated with three equations using the GA optimization technique; the results are shown in Table 2.2.

Table 2.1 Different mathematical models tried with the Genetic Algorithm optimization technique

Model No.  Model

1          Q = a + b*R
2          Q = a + b*R + c*R^2
3          Q = a + b*R + c*R^2 + d*R^3
4          Q = a + b*R + c*R^2 + d*R^3 + e*E
5          Q = a + b*R + c*R^2 + d*R^3 + e*E + f*T
6          Q = a + b*R + c*R^2 + d*R^3 + e*E + f*T + g*RH
7          Q = a + b*R + c*R^2 + d*R^3 + e*E + f*T + g*RH + h*WS
8          Q = a + b*R + c*R^2 + d*R^3 + e*E + f*T + g*RH + h*WS + i*E^2 + j*T^2 + k*RH^2 + l*WS^2
9          Q = a + b*R + c*MV1 + d*MV2 + e*MV3
10         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*E
11         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*E + g*T
12         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*E + g*T + h*RH
13         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*E + g*T + h*RH + i*WS
14         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*MV4 + g*MV5 + h*E + i*T + j*RH + k*WS
15         Q = a + b*R + c*MV1 + d*MV2 + e*MV3 + f*MV4 + g*MV5 + h*MV6 + i*MV7 + j*E + k*T
               + l*RH + m*WS
16         Q = exp(a + b*R + c*MV1 + d*MV2 + e*MV3 + f*MV4 + g*MV5 + h*MV6 + i*MV7 + j*E + k*T
               + l*RH + m*WS)
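Models 1 to 8 are all linear in their coefficients, and each contains every term of the model before it. With an exact least-squares fit, the training RMSE therefore cannot increase from one model to the next; the GA results in Table 2.2 follow the same downward trend from 7611 to 5630. The hypothetical sketch below builds the regressor columns for models 1 to 8, fits them by least squares instead of the GA, and checks that property on synthetic data. It gives each of the four squared weather terms of model 8 its own coefficient; the function names are illustrative.

```python
import numpy as np


def design_matrix(model_no, R, E, T, RH, WS):
    """Regressor columns for models 1-8 of Table 2.1 (all linear in a, b, c, ...)."""
    cols = [np.ones_like(R), R, R ** 2, R ** 3][: min(model_no, 3) + 1]
    weather = [E, T, RH, WS]
    if model_no >= 4:
        cols += weather[: model_no - 3]     # model 4 adds E, ..., model 7 adds WS
    if model_no == 8:
        cols += [v ** 2 for v in weather]   # squared weather terms
    return np.column_stack(cols)


def least_squares_rmse(X, q):
    coef, *_ = np.linalg.lstsq(X, q, rcond=None)
    return float(np.sqrt(np.mean((q - X @ coef) ** 2)))


# Synthetic weekly series (illustration only, not the report's data).
rng = np.random.default_rng(2)
n = 312
R = rng.gamma(2.0, 20.0, n)
E = rng.uniform(1.0, 6.0, n)
T = rng.uniform(5.0, 25.0, n)
RH = rng.uniform(40.0, 95.0, n)
WS = rng.uniform(0.5, 4.0, n)
q = 4000.0 + 50.0 * R - 300.0 * E + 20.0 * RH + rng.normal(0.0, 500.0, n)

rmses = [least_squares_rmse(design_matrix(m, R, E, T, RH, WS), q) for m in range(1, 9)]
```

A GA searching a bounded, discretised parameter space is not guaranteed to reach the least-squares optimum, so small departures from this monotone pattern would not by themselves indicate an error.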

Similarly, the relationships between two input variables (rainfall, evaporation), three input variables (rainfall, evaporation, temperature), four input variables (rainfall, evaporation, temperature, relative humidity) and five input variables (rainfall, evaporation, temperature, relative humidity, wind speed) and the single output variable (discharge) were also evaluated with various equations using the GA technique, and their results are shown in Table 2.2. From this table, the relationship between the five input variables and the single output variable with model 8 of Table 2.1 yielded the minimum RMSE (5630) among models 1 to 8, but its correlation coefficient was only 0.75, which is not within the acceptable range. Therefore, further relationships between the input and output variables that include moving averages of the rainfall variable were tried using the Genetic Algorithm optimization technique, and their results are given as models 9 to 16 in Table 2.2.
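The window lengths behind the moving-average terms MV1 to MV7 are not restated in this section, so the following Python sketch is purely illustrative. It assumes, hypothetically, that MVk is the trailing mean of weekly rainfall over k+1 weeks (windows of 2 to 8 weeks). The cumulative-sum approach computes every average with one pass over the series; weeks without enough history are left as NaN. The names `trailing_mean` and `moving_average_terms` are illustrative.

```python
import numpy as np


def trailing_mean(x, window):
    """Mean of the current and previous window-1 values; NaN until enough history."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[k] = sum of the first k values
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def moving_average_terms(rain, windows=(2, 3, 4, 5, 6, 7, 8)):
    """Hypothetical MV1..MV7: trailing means of weekly rainfall over assumed windows."""
    return {f"MV{i}": trailing_mean(rain, w) for i, w in enumerate(windows, start=1)}


# Nine weeks of illustrative rainfall (mm), not observed data.
rain = np.array([10.0, 0.0, 30.0, 20.0, 40.0, 0.0, 5.0, 15.0, 25.0])
mv = moving_average_terms(rain)
```

The leading NaN weeks would normally be dropped before fitting, which shortens the usable record by the longest window minus one.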

Initially, the relationship between the single input variable (rainfall), together with its moving averages, and the output variable (discharge) was evaluated with model 9, and the parameters were optimized using the Genetic Algorithm optimization technique. Similarly, the relationships of two input variables (rainfall and evaporation), three input variables (rainfall, evaporation and temperature), four input variables (rainfall, evaporation, temperature and relative humidity) and five input variables (rainfall, evaporation, temperature, relative humidity and wind speed), with moving averages of the rainfall variable, to the output variable (discharge) were studied, and their results are given in Table 2.2 (models 9 to 16). Comparing models 9 to 16 by RMSE and correlation coefficient, model 15, with five input variables, moving averages of the rainfall variable and a single output variable, yielded the minimum RMSE (5078) and the maximum correlation coefficient (R = 0.80) among the linear forms. Finally, taking its exponential form (model 16) reduced the RMSE further to 4806 and raised the correlation coefficient to 0.83.

Therefore, model 16 in Table 2.1 was selected as the final model for prediction of spring discharge because of its minimum RMSE and high correlation coefficient. The finally selected spring discharge prediction model thus has 13 parameters, and the Genetic Algorithm program (Appendix A) was set to estimate these 13 parameters by minimizing the objective function (eqn. 3.12). The crossover probability, mutation probability, population size and maximum number of runs were set to 0.9, 0.05, 200 and 5, respectively. Each generation of the Genetic Algorithm produces an objective function value and a set of 13 parameters. The procedure was repeated for 5 runs, of which run number 3 yielded the minimum RMSE (4806) at generation number 876. By substituting the parameter values obtained in run number 3 into model 16, the final spring discharge prediction model can be expressed as:

Q = exp(8.469568 - 0.000608*R + 0.000385*MV1 + 0.000020*MV2 + 0.005898*MV3
        + 0.006950*MV4 - 0.000500*MV5 + 0.001000*MV6 - 0.010000*MV7 - 0.250888*E
        + 0.054633*T + 0.012540*RH - 0.029229*WS)

where Q is the spring discharge (lpd), R is the rainfall (mm), E is the evaporation (mm), T is the temperature (°C), RH is the relative humidity (%), WS is the wind speed (m/s), and MV1, MV2, ..., MV7 are moving averages of the rainfall variable. All variables are weekly averages.
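As a worked check on the fitted equation, the following Python sketch evaluates model 16 with the coefficients reported for run 3. The function name `predict_discharge` and the input values in the example are hypothetical, and the seven moving averages are passed in directly because their construction is not restated here. Because the model is exponential, each coefficient acts multiplicatively: a one-unit increase in weekly evaporation E (1 mm), with all other inputs fixed, multiplies the predicted discharge by exp(-0.250888), which is about 0.778, a reduction of roughly 22%.

```python
import math

# Model 16 coefficients as reported for GA run 3 (generation 876).
COEF = {
    "const": 8.469568, "R": -0.000608,
    "MV1": 0.000385, "MV2": 0.000020, "MV3": 0.005898, "MV4": 0.006950,
    "MV5": -0.000500, "MV6": 0.001000, "MV7": -0.010000,
    "E": -0.250888, "T": 0.054633, "RH": 0.012540, "WS": -0.029229,
}


def predict_discharge(R, MV, E, T, RH, WS):
    """Weekly spring discharge Q (lpd) from model 16.

    R and E in mm, T in deg C, RH in %, WS in m/s; MV holds MV1..MV7
    (rainfall moving averages). All inputs are weekly averages.
    """
    if len(MV) != 7:
        raise ValueError("expected seven moving-average terms MV1..MV7")
    z = COEF["const"] + COEF["R"] * R
    z += sum(COEF[f"MV{i}"] * mv for i, mv in enumerate(MV, start=1))
    z += COEF["E"] * E + COEF["T"] * T + COEF["RH"] * RH + COEF["WS"] * WS
    return math.exp(z)


# Hypothetical week (illustrative inputs, not observed data).
q_week = predict_discharge(R=20.0, MV=[15.0] * 7, E=3.0, T=15.0, RH=70.0, WS=1.5)

# Multiplicative effect of one extra mm of weekly evaporation.
factor_E = math.exp(COEF["E"])  # about 0.778
```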

All the models presented in Table 2.1 were tested on the data for 6 years (1999 to 2005, excluding 2001). The predicted and observed values are shown graphically in Figs. 2.6 to 2.21. These figures show that the agreement between observed and predicted spring discharges was closest, with minimum deviation, for the finally developed model 16, which had the minimum RMSE. The qualitative performance indices of the model during training are given in Table 2.3; there was good correlation between observed and predicted natural spring discharge, with a maximum correlation coefficient (R = 0.83) during training. The other four indices (RMSE, APE, CVRE and CE) were also within the acceptable range.
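The formulas behind the five indices are not restated in this section, so the sketch below computes them under stated assumptions: CC is the Pearson correlation, APE is taken as the mean absolute error expressed as a percentage of the observed value, CVRE as RMSE divided by the mean observed discharge, and CE as the Nash-Sutcliffe coefficient of efficiency. Under the Nash-Sutcliffe form, CE can never exceed the square of the correlation coefficient, whereas Tables 2.3 and 2.4 report CE values above R squared; the report's own CE is therefore probably defined differently, and values from this sketch should not be compared directly with those tables. The function name is illustrative.

```python
import numpy as np


def performance_indices(obs, pred):
    """Goodness-of-fit indices for observed vs predicted discharge (assumed definitions)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = obs - pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    return {
        "CC": float(np.corrcoef(obs, pred)[0, 1]),  # Pearson correlation
        "RMSE": rmse,                                # same units as Q (lpd)
        # Assumed: mean absolute error as a percentage of the observed value.
        "APE": float(np.mean(np.abs(resid) / obs) * 100.0),
        # Assumed: RMSE divided by the mean observed discharge.
        "CVRE": rmse / float(np.mean(obs)),
        # Assumed: Nash-Sutcliffe coefficient of efficiency.
        "CE": 1.0 - float(np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }


indices = performance_indices([10, 20, 30, 40], [12, 18, 33, 37])
```

The APE form divides by each observed value, so it is only meaningful for strictly positive discharges, which holds for a perennial spring.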

Table 2.2 Values of parameters for different models.

Model  Min.   Generation  Generation at  Population  Correlation
No.    RMSE   limit       min. RMSE      size        coefficient

1      7611   500         41             50          0.44
2      7380   500         490            60          0.49
3      7330   500         496            100         0.50
4      7256   500         433            100         0.52
5      5912   500         590            100         0.73
6      5700   700         691            100         0.75
7      5665   1000        993            100         0.753
8      5630   2000        1900           200         0.757
9      6890   500         487            50          0.587
10     6644   500         480            60          0.625
11     5438   1000        990            100         0.779
12     5396   1000        992            100         0.782
13     5326   1000        989            100         0.787
14     5239   1000        980            150         0.794
15     5078   1000        987            200         0.808
16     4805   1000        876            200         0.83

Table 2.3 Qualitative evaluation indices for the finally developed spring discharge prediction model during the training period.

Indicator                                                 Training
Correlation Coefficient (CC)                              0.83
Root Mean Square Error (RMSE)                             4752
Absolute Prediction Error (APE)                           24.32
Coefficient of Variation of the Residuals Error (CVRE)    0.32
Coefficient of Efficiency (CE)                            0.91

Fig.2.6 Comparison of observed and predicted discharges with Model 1 (discharge in lpd plotted against weeks).

Fig.2.7 Comparison of observed and predicted discharges with Model 2 (discharge in lpd plotted against weeks).


Fig.2.8 Comparison of observed and predicted discharges with Model 3 (discharge in lpd plotted against weeks).

Fig.2.9 Comparison of observed and predicted discharges with Model 4 (discharge in lpd plotted against weeks).


Fig.2.10 Comparison of observed and predicted discharges with Model 5 (discharge in lpd plotted against weeks).

Fig.2.11 Comparison of observed and predicted discharges with Model 6 (discharge in lpd plotted against weeks).


Fig.2.12 Comparison of observed and predicted discharges with Model 7 (discharge in lpd plotted against weeks).

Fig.2.13 Comparison of observed and predicted discharges with Model 8 (discharge in lpd plotted against weeks).


Fig.2.14 Comparison of observed and predicted discharges with Model 9 (discharge in lpd plotted against weeks).

Fig.2.15 Comparison of observed and predicted discharges with Model 10 (discharge in lpd plotted against weeks).


Fig.2.16 Comparison of observed and predicted discharges with Model 11 (discharge in lpd plotted against weeks).

Fig.2.17 Comparison of observed and predicted discharges with Model 12 (discharge in lpd plotted against weeks).


Fig.2.18 Comparison of observed and predicted discharges with Model 13 (discharge in lpd plotted against weeks).

Fig.2.19 Comparison of observed and predicted discharges with Model 14 (discharge in lpd plotted against weeks).


Fig.2.20 Comparison of observed and predicted discharges with Model 15 (discharge in lpd plotted against weeks).

Fig.2.21 Comparison of observed and predicted discharges with Model 16 (discharge in lpd plotted against weeks).



Validation of the Model

Validity of the developed model was tested with two years of data (2006 and 2007) on a weekly average basis. The qualitative performance indices of the spring discharge prediction model for the validation period are given in Table 2.4. It can be observed from Table 2.4 that the model holds good for nearly all weeks of the two years, as the correlation coefficient and the coefficient of efficiency are 0.87 and 0.78 (78%), respectively. The other three qualitative indices, RMSE, APE and CVRE, are also well within permissible limits. The graphical comparison of measured and predicted spring discharge for the finally developed model is shown in Fig. 2.22, and visual comparison shows a close relationship between measured and predicted values. However, the model slightly over-predicts the spring discharge for weeks 81 to 92, which fall in August and September 2007. This may be partly attributed to the rainfall in these months being much higher than the rainfall in the data used for model development. The results showed that this global search optimization technique is a reliable and accurate approach that requires less time and money. The model was also found satisfactory on the basis of the performance evaluation indices, thus confirming its validity for the Hill Campus spring.

Fig.2.22 Variation of observed and predicted values of spring discharge from the prediction model during 2006 to 2007 (discharge in lpd plotted against weeks).

Table 2.4 Qualitative evaluation indices of the spring discharge prediction model during the validation period.

Indicator                                                 Validation
Correlation Coefficient (CC)                              0.87
Root Mean Square Error (RMSE)                             7556
Absolute Prediction Error (APE)                           24.96
Coefficient of Variation of the Residuals Error (CVRE)    0.54
Coefficient of Efficiency (CE)                            0.78

Table 2.5 Maximum available area for the crops

Season      Crop       Maximum available area (ha)
1. Rabi     Lahi       3.39
            Wheat      1.86
2. Kharif   Paddy      0.61
            Sorghum    17.14
            Maize      10.61

Table 2.6 Monthly irrigation water requirement of the crops and monthly water availability from spring

Month      Monthly irrigation water requirement (mm)                                 Spring water
           Lahi  Lentil  Wheat  Barley  Gram   Potato  Pea   Paddy  Sorghum  Maize   availability (ha-mm)
January    7.89  0       0      2.26    0      1.314   3.3   0      0        0       26.8
February   0     0       0      0       0      0       0     0      0        0       28.7
March      0     0       1.7    0       0      3.37    0     0      0        0       35.4
April      0     0       0      0       0      0       0     0      0        0       28.6
May        0     0       0      0       0      0       0     0      0        0       24.1
June       0     0       0      0       0      0       0     35.8   0        0       22.1
July       0     0       0      0       0      0       0     0      0        0       23.1
August     0     0       0      0       0      0       0     0      0        0       30.8
September  0     0       0      0       0      0       0     3.3    3.4      0.2     69.1
October    0     0       0      0       0      0       0.74  1.078  3.9      0       68.3
November   0     11.5    4.60   17.0    7.3    12.8    17    0      0        0       61.5
December   2.31  21.46   18.8   35.74   22.57  24.75   33.3  0      0        0       43.0



LITERATURE CITED
Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company Inc., New York, pp. 1-145.
Hilton, A.B.C., and Culver, T.B. 2000. Constraint handling for genetic algorithm in optimal remediation design. Journal of Water Resources Planning and Management, 126, 3: 128-137.
McKinney, D.C., and Lin, M.D. 1994. Genetic algorithm solution of groundwater management models. Water Resources Research, 30, 6: 1897-1906.
Morshed, J., and Kaluarachchi, J.J. 2000. Enhancement to the genetic algorithm for optimal groundwater management. Journal of Hydrologic Engineering, 5, 1: 67-73.
Reed, P., Minsker, B., and Goldberg, D.E. 2000. Designing a competent simple genetic algorithm for search and optimization. Water Resources Research, 36, 12: 3731-3741.
Reed, P., Minsker, B.S., and Valocchi, A.J. 2000. Cost-effective long-term groundwater monitoring design using a genetic algorithm and global mass interpolation. Water Resources Research, 36, 12: 3731-3741.
Savic, D.A., and Walters, G.A. 1995a. Integration of a model for hydraulic analysis of water distribution networks with an evolution program for pressure regulation. Microcomputers in Civil Engineering, 10, 3: 219-229.
Savic, D.A., and Walters, G.A. 1995b. An evolution program for optimal pressure regulation in water distribution network. Engineering Optimization, 24, 3: 197-219.
