
Int. J. Adaptive and Innovative Systems, Vol. 1, No. 1, 2009

Optimisation of pesticide crystal protein production from Bacillus thuringiensis employing artificial intelligence techniques
K. Pakshirajan* and C. Manda
Department of Biotechnology,
Indian Institute of Technology Guwahati,
Guwahati, 781039, India
Fax: +91-361-2690762
E-mail: pakshi@iitg.ernet.in
E-mail: c.manda@iitg.ernet.in
*Corresponding author
Abstract: Mixtures containing spores of the bacterium Bacillus thuringiensis
and its δ-endotoxins, commonly referred to as pesticide crystal protein (PCP),
are well known to be effective against several insects and pests. In this work,
the levels of three factors found to be highly significant for PCP production,
namely medium pH, inoculum size and sugar concentration from molasses, were
optimised using a combination of two artificial intelligence techniques:
artificial neural network (ANN) and genetic algorithm (GA). Earlier experimental
results, expressed in terms of the culture optical density at 600 nm (OD600),
were first modelled by an ANN trained with the back propagation algorithm,
which predicted the system with high accuracy, giving a coefficient of
determination (R²) greater than 0.99 in both training and validation of the
network. Optimum values of 3.65 for pH, 6.009% for inoculum size and
1.61 g/L for sugar concentration were then obtained using GA based on the
developed ANN model. At these optimised settings of the factors, a maximum
OD600 value of 0.5764 was predicted, which is 9.17% higher than the previously
obtained maximum experimental value.
Keywords: artificial neural networks; ANN; genetic algorithms; GA; pesticide
crystal protein; PCP; Bacillus thuringiensis; optimisation; back propagation;
artificial intelligence.
Reference to this paper should be made as follows: Pakshirajan, K. and
Manda, C. (2009) 'Optimisation of pesticide crystal protein production from
Bacillus thuringiensis employing artificial intelligence techniques', Int. J.
Adaptive and Innovative Systems, Vol. 1, No. 1, pp.77–87.
Biographical notes: Kannan Pakshirajan is an Associate Professor at the
Department of Biotechnology, IIT Guwahati, India. His research interests
include artificial intelligence, environmental bioremediation and
biodetoxification, bioprocess kinetics and optimisation, toxic metal removal
and recovery, and development of novel bioreactors for waste minimisation. He
has been actively involved in major consultancy and research projects
supported by reputed national agencies.
Chaithanya Manda is a graduate (BTech) in Biotechnology from the
Department of Biotechnology, IIT Guwahati, India. His research interests
include artificial intelligence, mathematical biology, biostatistics and
bioinformatics. He has also completed a few short-term research projects
during his undergraduate program.

Copyright 2009 Inderscience Enterprises Ltd.


1 Introduction

Bacillus thuringiensis is a rod-shaped, Gram-positive bacterium that produces
parasporal crystal inclusions during sporulation (Hofte and Whiteley, 1989). The
parasporal crystals comprise approximately two polypeptide chains, each with an
estimated molecular mass of 130 kDa; upon ingestion by insect larvae, these are
cleaved into a smaller, 68 kDa protein by the combined action of proteases and the
alkaline pH of the insect gut (Schnepf et al., 1998). The resulting toxic protein then
kills the larvae through ion leakage and ATP hydrolysis caused by the formation of pores
or lesions in the brush border membrane vesicles of the larvae (Sharma et al., 2000).
Maximising the production of this pesticide crystal protein (PCP) by B. thuringiensis on a
large scale requires optimising the levels of the various factors that influence the
process, especially the fermentation. Optimisation of fermentations is generally carried
out by statistically based response surface methodology (RSM), which relates the output
variable(s) to the input variables of the process through a simple quadratic regression
equation. Owing to the complexity of biological systems, however, the applicability of
such techniques is limited. Moreover, modelling biological processes, on which the
optimisation is based, is cumbersome because the processes are incompletely understood.
In such cases, optimisation based on a data-driven modelling approach, such as artificial
neural networks (ANNs), proves more useful.
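For reference, the quadratic (second-order) model underlying RSM, written here for three factors as in this study, has the standard form below. Here y is the response, x_1 to x_3 are the factor levels (usually in coded units), the β terms are coefficients fitted by least squares and ε is the error term. The symbols are generic and are not the fitted coefficients of the earlier study; the model has 10 coefficients, which a 20-run design can estimate.

$$
y = \beta_0 + \sum_{i=1}^{3} \beta_i x_i + \sum_{i=1}^{3} \beta_{ii} x_i^2 + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} x_i x_j + \varepsilon
$$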

1.1 Artificial neural networks


This alternative modelling procedure is a machine learning approach in which the
principles of artificial intelligence are applied through neural networks. Three-layer
back propagation networks, for example, have been shown to be universal approximators of
multi-dimensional functions (Hornik et al., 1989; Poggio and Girosi, 1990). ANNs have
already been applied to model, predict and optimise a variety of environmental and
biotechnological problems, such as the biodegradation of organic compounds (Schuurmann
and Muller, 1994), the identification of unknown air pollution sources (Reich et al.,
1999), the prediction of fed-batch fermentation kinetics (Valdez-Castro et al., 2003), the
control and management of a biological reactor for treating hydrogen sulphide (Elías et
al., 2006) and the enhancement of lipase production (Haider et al., 2008).
ANNs loosely mimic the learning process of the human brain. Neural networks take their
name from the simple processing units in the brain called neurons, which are
interconnected by a network that transmits signals between them. A neural network can be
thought of as a black-box device that accepts inputs and produces a desired output. A
multilayer perceptron (MLP) trained with the back propagation learning algorithm
(Rumelhart et al., 1986) is the most widely used neural network for forecasting and
prediction (Maier and Dandy, 1998, 2001). An MLP generally consists of three layers: an
input layer, a hidden layer and an output layer; a typical schematic is shown in Figure 1.
Each neuron in a layer is connected to the neurons in the adjacent layers through scalar
connection weights (Wij), which are adjusted as the network learns. Additionally, a bias
term (θ) is provided to introduce a threshold for the activation of each neuron. The input
data (Xi) are presented to the network through the input layer and passed to the hidden
layer via the weights. In each hidden neuron j, the weighted sum of inputs
$\sum_i X_i W_{ij}$ is passed through the neuron's activation function to give its output.
The outputs of the hidden neurons, weighted by the output-layer weights, in turn become
the inputs of the output-layer neuron, and the network output is obtained by passing this
weighted sum through the output neuron's activation function. The logistic sigmoid
function, the most commonly used activation function, is given by:

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

Figure 1 Schematic of three-layer ANN architecture
Notes: X1, X2 and X3: input vectors; Y: output vector
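The forward pass just described can be sketched in Python for a network with one hidden layer and the logistic activation of Eq. (1). This is a minimal illustration, not the NNMODEL implementation used in the study: the function names are hypothetical, the bias θ is assumed to be added to each weighted sum, and the input/output scaling that the study's pre-processing may have applied is not specified in the source and is left out.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid activation, Eq. (1): f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, w_ih, theta_h, w_ho, theta_o):
    """Forward pass of a three-layer MLP (input, one hidden layer, one output).

    inputs:  input values X_i (e.g. scaled pH, inoculum size, sugar conc.)
    w_ih:    w_ih[i][j] is the weight from input i to hidden neuron j
    theta_h: bias (threshold) of each hidden neuron (assumed additive)
    w_ho:    weight from each hidden neuron to the single output neuron
    theta_o: bias of the output neuron
    """
    hidden = []
    for j in range(len(theta_h)):
        # Weighted sum of the inputs plus bias, passed through the activation.
        s = sum(x * w_ih[i][j] for i, x in enumerate(inputs)) + theta_h[j]
        hidden.append(sigmoid(s))
    # Output neuron: weighted sum of hidden outputs plus bias, then activation.
    s_out = sum(h * w for h, w in zip(hidden, w_ho)) + theta_o
    return sigmoid(s_out)
```

With all weights and biases set to zero, every neuron outputs f(0) = 0.5, which is a convenient sanity check. Because a sigmoid output lies between 0 and 1, targets such as OD600 would be scaled into that range for training under this sketch's assumptions.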

The internal parameters of the network include the number of neurons in the hidden layer,
the activation function, the learning rate, the epoch size, the momentum term, the error
tolerance and the training count. The best values of these parameters, which are
necessary for a good network architecture, are usually found by trial and error. An ANN
model developed in this way can then be used as the objective function in an optimisation
study to find the combination of inputs that yields the maximum response. A major
disadvantage of conventional optimisation methods, however, is that convergence to the
global optimum depends on how close the initial guesses of the factor levels are to the
optimum, which necessitates more robust optimisation techniques. In such instances, the
ANN model equation can be optimised using genetic algorithms, which search the entire
solution space and are therefore much less likely to become trapped in a local maximum.


1.2 Genetic algorithms


Genetic algorithms (GA) are based on the evolutionary principle of 'survival of the
fittest' and are used here to search an objective function, such as the ANN model under
consideration, for its maximum. A GA searches for the global maximum by coding the
variables as strings, which divides the search space into discrete points, and by starting
from a randomly selected initial population of individuals, constituting the first
generation, so that all regions of the solution space are explored. The fitness of each
individual is evaluated using the ANN model equation. Reproduction (selection), crossover
and mutation operations are then performed to form the next generation from the fittest
individuals, and the process is repeated over a number of generations to find the optimal
solution (Deb, 1995).
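The generational loop described above can be sketched as a simple binary-coded GA. This is a minimal illustration, not the modified Kanpur Genetic Algorithms Laboratory code used in the study, whose exact operators are not stated: binary tournament selection, single-point crossover and per-bit flip mutation are assumed here, and the default settings mirror the values in Table 2 (population 100, 250 generations, crossover probability 0.9, mutation probability 0.05).

```python
import random


def tournament(pop, scores, rng):
    """Binary tournament selection: the fitter of two random individuals wins."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if scores[a] >= scores[b] else pop[b]


def simple_ga(fitness, n_bits, pop_size=100, generations=250,
              p_crossover=0.9, p_mutation=0.05, seed=None):
    """Maximise fitness(bits) over bit strings of length n_bits.

    p_mutation is assumed to apply independently to each bit. The best
    individual seen in any generation is tracked and returned.
    """
    rng = random.Random(seed)
    # Generation zero: a random initial population.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    best_fit = fitness(best)
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        children = []
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)  # single-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for k in range(n_bits):  # bit-flip mutation
                    if rng.random() < p_mutation:
                        child[k] ^= 1
                children.append(child)
        pop = children[:pop_size]
        for ind in pop:
            f = fitness(ind)
            if f > best_fit:
                best, best_fit = ind[:], f
    return best
```

For example, `simple_ga(sum, 20, seed=1)` maximises the number of 1-bits (the 'one-max' toy problem). In the ANN-GA setting, the fitness function would instead decode the string into factor levels and return the ANN-predicted OD600.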
The literature is replete with studies demonstrating the effectiveness of RSM, which is
essentially a collection of statistical and regression techniques for process
optimisation. Artificial intelligence techniques such as ANN and GA, on the other hand,
have been reported to outperform RSM in modelling and optimisation. In recent years,
however, only a limited number of researchers have investigated the use of these
non-conventional techniques in biological processes (Haider et al., 2008). Moreover,
optimisation of PCP production from a microbial source using such artificial intelligence
techniques, which can prove helpful even when statistically based optimisation fails, has
not been studied. In this paper, we report the combined application of two artificial
intelligence techniques for modelling and optimising PCP production from B. thuringiensis:
ANN was used to model the experimental data obtained in our previous work, and GA was
applied to the developed ANN model to optimise the levels of three factors known to
significantly affect PCP production by the bacterium in batch fermentation.

2 Methodology

2.1 Background about the previous work


In a previous study (Prabagaran et al., 2004), the initial pH of the medium, inoculum size
and sugar concentration from molasses were found to be the most significant factors
affecting PCP production by B. thuringiensis in batch fermentation. These three factors
were then optimised using the conventional statistical technique of RSM. PCP production,
expressed in terms of the bacterial cell density measured as optical density at 600 nm
(OD600) after 72 h of culture, was determined in a 2³ central composite design (CCD)
with six centre-point replicates.
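A 2³ CCD with six centre points gives 8 factorial, 6 axial and 6 centre runs, i.e. 20 runs in total. The sketch below generates these runs in coded units. It assumes a rotatable design with α = (2³)^(1/4) ≈ 1.682; the centre and step values in the usage note (pH 7 ± 2, inoculum size 3.5 ± 1.5%, sugar 10 ± 5 g/L) are inferred from the axial levels listed in Table 3 rather than stated in the source, and the function names are hypothetical.

```python
from itertools import product


def ccd_coded(k=3, n_centre=6):
    """Coded runs of a rotatable central composite design (CCD).

    Returns 2**k factorial points at -1/+1, 2*k axial points at +/-alpha
    with alpha = (2**k) ** 0.25, and n_centre centre points at 0.
    """
    alpha = (2 ** k) ** 0.25  # rotatability condition; about 1.682 for k = 3
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    centre = [[0.0] * k for _ in range(n_centre)]
    return factorial + axial + centre


def to_real(coded, centre, step):
    """Convert a coded level to a real factor level: centre + coded * step."""
    return centre + coded * step
```

With the assumed centres and steps, the coded axial points ±α map to about 3.64 and 10.36 for pH, 0.98 and 6.02% for inoculum size, and 1.59 and 18.41 g/L for sugar concentration, matching the GA bounds in Table 3.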

2.2 ANN modelling


Using the previously obtained experimental results, a neural network based model was
developed for predicting OD600. The initial pH, inoculum size and sugar concentration
were used as the model inputs and the OD600 of the culture as the model output. The
entire experimental data set of 20 observations was randomly divided into 75% (15
observations) for training and 25% (five observations) for validating the developed ANN
model. The internal network parameters were chosen by trial and error as those that gave
the best coefficient of determination (R²) values during both training and validation of
the network; these values are presented in Table 1.
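The random 75/25 split and the R² criterion described above can be sketched as follows. This is a minimal illustration rather than the NNMODEL procedure: the split helper and its seed are hypothetical, and R² is computed with the common definition 1 - SS_res/SS_tot, which the paper does not spell out.

```python
import random


def train_test_split(n_obs=20, train_fraction=0.75, seed=None):
    """Randomly split observation indices into training and validation sets."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_train = round(train_fraction * n_obs)  # 15 of 20 observations
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives R² = 1, while predicting the mean of the observations for every point gives R² = 0; trial-and-error tuning would keep the parameter set giving the highest R² on both subsets.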
Table 1 Optimal values of network parameters used in ANN modelling

Parameters: training matrix; test matrix; no. of hidden layer neurons; error tolerance; theta; learning rate
Values: 15; 0.001; 0.5

2.3 Optimisation by GA technique


A GA approach based on the ANN model developed for predicting OD600 was used to
optimise the levels of the three factors that significantly affect PCP production. The
results of this approach were compared with those of the conventional RSM optimisation
performed in the previous work.
The GA parameters used in this study are presented in Table 2, and Table 3 gives the
lower and upper bounds of the variables coded in each individual. Each individual
comprised the values of the input variables, viz., initial pH, inoculum size and sugar
concentration. An initial random population of 100 individuals was used in the starting
generation, and the fitness of the individuals over successive generations was evaluated
using the developed ANN model.
Table 2 GA parameters set for finding the optimum levels of the factors in the study

Population size | Total no. of generations | Crossover probability | Mutation probability | Total string length | No. of binary coded variables | Total no. of runs
100 | 250 | 0.9 | 0.05 | 45 | 3 | 5
Table 3 Lower and upper bounds of the individuals used in GA optimisation

Factors | Lower bound | Upper bound | String length
pH | 3.64 | 10.36 | 15
Inoculum size (%) | 0.98 | 6.02 | 15
Sugar conc. (g/L) | 1.59 | 18.41 | 15
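Each individual is a 45-bit string made of three 15-bit substrings, one per factor in Table 3. The sketch below assumes the usual linear decoding of binary-coded GA variables, lower + (upper - lower) × (integer value of substring) / (2^15 - 1), and a fixed variable order (pH, inoculum size, sugar concentration). The names are hypothetical, and the modified code used in the study may decode differently.

```python
# Bounds and substring length from Table 3: pH, inoculum size (%), sugar (g/L).
BOUNDS = [(3.64, 10.36), (0.98, 6.02), (1.59, 18.41)]
BITS_PER_VARIABLE = 15


def decode_substring(bits, lower, upper):
    """Linearly map a binary substring to a real value in [lower, upper]."""
    value = int("".join(str(b) for b in bits), 2)
    return lower + (upper - lower) * value / (2 ** len(bits) - 1)


def decode_individual(chromosome):
    """Split a 45-bit chromosome into three 15-bit substrings and decode each."""
    if len(chromosome) != BITS_PER_VARIABLE * len(BOUNDS):
        raise ValueError("chromosome must have 45 bits")
    values = []
    for k, (lower, upper) in enumerate(BOUNDS):
        start = k * BITS_PER_VARIABLE
        substring = chromosome[start:start + BITS_PER_VARIABLE]
        values.append(decode_substring(substring, lower, upper))
    return values
```

An all-zero string decodes to the lower bounds and an all-one string to the upper bounds. With 15 bits, the pH resolution is (10.36 - 3.64)/(2^15 - 1), or about 0.0002 pH units.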

2.4 Software used


ANN-based predictive modelling was carried out using the shareware version of the
neural network and multivariable statistical modelling software NNMODEL (Version
1.4, Neural Fusion, NY). A modified version of the C implementation of a simple
binary-coded GA developed at the Kanpur Genetic Algorithms Laboratory, Indian Institute
of Technology Kanpur, India, was used for GA optimisation.


3 Results and discussion

3.1 ANN based modelling


To model and predict PCP production by B. thuringiensis, neural network simulations were
carried out using the standard error back propagation network. As stated before, the
experimental data were divided into training and test sets, pre-processed and randomised
before the simulations were carried out. After training, the network was evaluated with
the test data, and the effect of the network parameters on the coefficient of
determination (R²) was used to select the best network architecture. With the network
parameter values given in Table 1, the developed ANN model gave an R² of 0.9982 in the
training phase and predicted the test data set with similarly high accuracy (R² = 0.9969).
The ANN model developed in this study is presented in Table 4, which lists the connection
weights of the neurons in each layer of the network. Residuals, depicted in Figure 2, were
calculated as the difference between the measured and the ANN-predicted values of OD600.
In such plots, the sign of a residual indicates whether the model over- or under-predicts
the measured value, and its magnitude indicates the accuracy of the prediction; a low
residual, whether positive or negative, indicates an accurate prediction. In the present
study, the low residual values (Figure 2) confirm that the ANN model predicted the
experimental values accurately. Similarly, ANN-based modelling has been shown to
accurately predict the performance of a trickling bed air biofilter treating benzene,
toluene and xylene vapours (Rene et al., 2006). Further, the efficacy of the ANN technique
relative to simple statistical regression in predicting complex biological systems was
tested by comparing the OD600 values predicted by each with those obtained from
experiments. This comparison is shown in Figure 3 and Table 5, which compare the ANN- and
regression-predicted OD600 values with the experimental values.
Table 4 Weights of the trained neural network for the prediction of PCP production by B. thuringiensis

Column headings: Wih (neurons; pH; inoculum size; sugar concentration; theta) and Who (neurons; OD600)
Weight values: 0.286, 0.219, 0.897, 0.171, 5.61, 0.0115, 0.587, 2.78, 2.53, 7.70, 7.98, 6.41, 10.5, 0.109, 9.03, 0.135, 7.14, 3.19, 0.0972, 1.89
Output neuron theta: 0.13
Notes: Wih: weights between input and hidden layer; Who: weights between hidden and output layer

Figure 2 Residual plot of measured and predicted OD600 based on the ANN model (y-axis: residuals; x-axis: observations)

Table 5 2³ CCD showing experimental, ANN and regression predicted OD600 of the culture

pH | Inoculum size (%) | Sugar concentration (g/L) | OD600 experimental | OD600 ANN predicted | OD600 regression predicted
5 |  |  | 0.242 | 0.221 | 0.281
 |  |  | 0.228 | 0.217 | 0.213
 |  |  | 0.428 | 0.431 | 0.384
 |  |  | 0.258 | 0.276 | 0.277
 |  | 15 | 0.354 | 0.352 | 0.337
 |  | 15 | 0.353 | 0.365 | 0.4
 |  | 15 | 0.45 | 0.463 | 0.408
9 |  | 15 | 0.528 | 0.468 | 0.492
3.64 | 3.5 | 10 | 0.356 | 0.339 | 0.359
10.36 | 3.5 | 10 | 0.331 | 0.333 | 0.322
7 | 0.98 | 10 | 0.301 | 0.229 | 0.269
7 | 6.02 | 10 | 0.407 | 0.395 | 0.433
7 | 3.5 | 1.59 | 0.258 | 0.241 | 0.259
7 | 3.5 | 18.41 | 0.494 | 0.485 | 0.487
7 | 3.5 | 10 | 0.319 | 0.325 | 0.323
7 | 3.5 | 10 | 0.329 | 0.325 | 0.323
7 | 3.5 | 10 | 0.314 | 0.325 | 0.323
7 | 3.5 | 10 | 0.339 | 0.325 | 0.323
7 | 3.5 | 10 | 0.324 | 0.325 | 0.323
7 | 3.5 | 10 | 0.316 | 0.325 | 0.323


Figure 3 Comparison of OD600 values obtained from experiments, ANN and regression based models (y-axis: OD600; x-axis: observations 1–20; series: experimental, ANN, regression)

3.2 Optimisation by GA technique


The ANN model developed for predicting PCP production by B. thuringiensis was used as
the objective function for optimising the values of the input factors by GA. A total of
five independent runs, using the parameters presented in Tables 2 and 3, were performed
to search for the optimum values of the input factors, and the results of each run are
given in Table 6. To visualise the GA optimisation process, the entire range of the output
(OD600) was divided into subunits of equal length, and the number of individuals falling
in each subunit was counted for selected generations. Whereas generation zero contains the
initial random population covering the entire range, the individuals in subsequent
generations converge to the OD600 subunit 0.5–0.6 (Figure 4). This population profile
across generations thus shows that the objective function converged to a maximum value
of 0.57643, obtained at a pH of 3.65, an inoculum size of 6.009% and a sugar concentration
of 1.61 g/L. The optimised levels were therefore at the low end of the range for pH and
sugar concentration and at the high end for inoculum size. This result was also found to
be consistent with the individual effects of the factors on PCP production by the
bacterium, as indicated in Table 5. Moreover, the convergence of the different factors to
a narrow range could be taken as an indication of non-interaction between them
(Weuster-Botz and Wandrey, 1995; Haider et al., 2008). The optimal OD600 value of 0.57643
predicted by this artificial intelligence technique was 9.17% higher than the maximum
value found in the experimental trials (0.528, Table 5). The GA-optimised value was also
compared with that obtained from the conventional, statistically based RSM technique and
was found to be slightly lower: the RSM-optimised levels of the input factors were 5.62
for pH, 4.19% for inoculum size and 11.9 g/L for sugar concentration, which yielded a
predicted OD600 value of 0.582. It should be noted, however, that the RSM-optimised levels
of the input factors agreed neither with the experimental data nor with the GA-optimised
levels, probably because of the low predictive accuracy (R² < 0.85) of the quadratic
regression model on which the RSM optimisation was based; hence, the RSM result may not
be valid as such. This further supports the superiority of the artificial intelligence
techniques in predicting and optimising PCP production by B. thuringiensis. Experimental
validation of the OD600 value predicted here by the ANN-GA method would be a worthwhile
next step.
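The 9.17% improvement is relative to the highest experimental OD600 in Table 5 (0.528):

$$
\frac{0.57643 - 0.528}{0.528} \times 100 = \frac{0.04843}{0.528} \times 100 \approx 9.17\%
$$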
Table 6 Summary of the optimised results of OD600 of the culture using the ANN-GA method

OD600 | Run 1 | Run 2 | Run 3 | Run 4 | Run 5
Maximum | 0.5282 | 0.57643 | 0.52773 | 0.52783 | 0.57613
Minimum | 0.35439 | 0.26608 | 0.34181 | 0.32586 | 0.25454
Average | 0.50488 | 0.54223 | 0.50574 | 0.50536 | 0.53007

Figure 4 Population profile at different generations showing convergence of GA for maximum OD600 (panels: generation nos. 0, 1, 25 and 250; y-axis: number of individuals; x-axis: OD600 subunits 0–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, 0.4–0.5 and 0.5–0.6)
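The population profile in Figure 4, i.e. the number of individuals whose predicted OD600 falls in each equal-width subunit for a given generation, can be sketched as a simple histogram. The subunit width of 0.1 over 0 to 0.6 follows Figure 4; the function name and the rule of assigning out-of-range values to the end subunits are assumptions, and a value lying exactly on a subunit boundary may be placed in the lower subunit because of floating-point rounding.

```python
def population_profile(od_values, lower=0.0, upper=0.6, width=0.1):
    """Count individuals whose predicted OD600 falls in each equal-width subunit."""
    n_bins = round((upper - lower) / width)
    counts = [0] * n_bins
    for y in od_values:
        k = int((y - lower) / width)
        k = min(max(k, 0), n_bins - 1)  # clamp values outside the range to the end subunits
        counts[k] += 1
    labels = [f"{lower + i * width:.1f} to {lower + (i + 1) * width:.1f}"
              for i in range(n_bins)]
    return dict(zip(labels, counts))
```

Applied to the ANN-predicted OD600 of every individual in a generation, the counts shift towards the "0.5 to 0.6" subunit as the GA converges.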

The results of the hybrid approach used in this study suggest that such combinations of
artificial intelligence techniques could give more insight into, and work well for, the
modelling and optimisation of other biological systems as well.

4 Conclusions

Previously obtained experimental data on PCP production by Bacillus thuringiensis were
modelled using ANN and optimised using GA. The data-driven ANN model of the bioprocess
performed very well, with a coefficient of determination greater than 0.995 during both
training and validation. Based on the developed ANN model, optimal values of the factors
affecting the process, viz., the pH of the medium, inoculum size and sugar concentration,
were obtained using GA. The maximum OD600 predicted by this ANN-GA approach at the
optimised settings of the factors was 9.17% higher than that obtained from experiments.

References
Deb, K. (1995) Optimization for Engineering Design: Algorithms and Examples, Prentice Hall of
India Private Limited.
Elías, A., Ibarra-Berastegi, G., Arias, R. and Barona, A. (2006) 'Neural networks as a tool for
control and management of a biological reactor for treating hydrogen sulphide', Bioprocess
and Biosystems Engineering, Vol. 29, pp.129–136.
Haider, M.A., Pakshirajan, K., Singh, A. and Chaudhry, S. (2008) 'Artificial neural network-genetic
algorithm approach to optimize media constituents for enhancing lipase production by a soil
microorganism', Applied Biochemistry and Biotechnology, Vol. 144, pp.225–235.
Hofte, H. and Whiteley, H.R. (1989) 'Insecticidal crystal proteins of Bacillus thuringiensis',
Microbiological Reviews, Vol. 53, pp.242–255.
Hornik, K., Stinchcombe, M. and White, H. (1989) 'Multilayer feedforward networks are universal
approximators', Neural Networks, Vol. 2, pp.359–366.
Maier, H.R. and Dandy, G.C. (1998) 'The effects of internal parameters and geometry on the
performance of back propagation neural networks: an empirical study', Environmental
Modelling and Software, Vol. 13, pp.193–209.
Maier, H.R. and Dandy, G.C. (2001) 'Neural network based modeling of environmental variables:
a systematic approach', Mathematical and Computer Modelling, Vol. 33, pp.669–682.
Poggio, T. and Girosi, F. (1990) 'Networks for approximation and learning', Proceedings of the IEEE,
Vol. 78, No. 9, pp.1481–1497.
Prabagaran, S.R., Pakshirajan, K., Swaminathan, T. and Jayachandran, S. (2004) 'Media
optimization of Bacillus thuringiensis PBT-372 using response surface methodology',
Chemical and Biochemical Engineering Quarterly, Vol. 18, No. 2, pp.183–187.
Reich, S.L., Gomez, D.R. and Dawidowski, L.E. (1999) 'Artificial neural network for the
identification of unknown air pollution sources', Atmospheric Environment, Vol. 33,
pp.3045–3052.
Rene, E.R., Maliyekkal, S.M., Philip, L. and Swaminathan, T. (2006) 'Back-propagation neural
network for performance prediction in trickling bed air biofilter', International Journal of
Environment and Pollution, Vol. 28, pp.382–401.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) 'Learning representations by
back-propagating errors', Nature, Vol. 323, pp.533–536.
Schnepf, E., Crickmore, N., Van Rie, J., Lereclus, D., Baum, J., Feitelson, J., Zeigler, D.R. and
Dean, D.H. (1998) 'Bacillus thuringiensis and its pesticidal crystal proteins', Microbiology and
Molecular Biology Reviews, Vol. 62, pp.775–806.
Schuurmann, G. and Muller, G. (1994) 'Back-propagation neural networks: recognition vs
prediction capability', Environmental Toxicology and Chemistry, Vol. 13, pp.743–747.
Sharma, H.C., Sharma, K.K., Seetharama, N. and Ortiz, R. (2000) 'Prospects for using transgenic
resistance to insects in crops improvement', Electronic Journal of Biotechnology, Vol. 3,
No. 2, pp.21–22.
Valdez-Castro, L., Baruch, I. and Barrera-Cortes, J. (2003) 'Neural networks applied to the
prediction of fed-batch fermentation kinetics of Bacillus thuringiensis', Bioprocess and
Biosystems Engineering, Vol. 25, pp.229–233.
Weuster-Botz, D. and Wandrey, C. (1995) 'Medium optimisation by genetic algorithm for
continuous production of formate dehydrogenase', Process Biochemistry, Vol. 30,
pp.563–571.
