Powder Technology
journal homepage: www.elsevier.com/locate/powtec
Comparison between two types of Artificial Neural Networks used for validation of
pharmaceutical processes
Sharareh Salar Behzadi a,*, Chakguy Prakasvudhisarn b, Johanna Klocker c,
Peter Wolschann c, Helmut Viernstein a
a Department of Pharmaceutical Technology and Biopharmaceutics, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria
b School of Technology, Shinawatra University, Shinawatra Tower III, Viphavadi-Rangsit Rd., Chatuchak, 10900 Bangkok, Thailand
c Department of Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Vienna, Austria
Article info
Article history:
Received 4 April 2008
Received in revised form 16 April 2009
Accepted 29 May 2009
Available online 8 June 2009
Keywords:
Bayesian neural network
Feed-forward back-propagation network
Leave-one-out cross validation
Granulation processes
Validation of pharmaceutical processes
Abstract
Two types of Artificial Neural Networks (ANNs), a Multi-Layer Perceptron (MLP) and a Generalized
Regression Neural Network (GRNN), have been used for the validation of a fluid bed granulation process. The
training capacity and the accuracy of these two types of networks were compared. The variations of the ratio
of binder solution to feed material, product bed temperature, atomizing air pressure, binder spray rate, air
velocity and batch size were taken as input variables for training the MLP and GRNN. The properties of size,
size distribution, flow rate, angle of repose and Hausner's ratio of the granules produced were measured and
used as output variables. Qualitatively, the two networks gave comparable results, as both pointed out the
importance of the binder spray rate and the atomizing air pressure to the granulation process. However, the
averaged absolute error of the MLP was higher than the averaged absolute error of the GRNN. Furthermore,
the correlation coefficients between the experimentally determined and the calculated output values, the
corresponding prediction accuracy for the different granule properties as well as the overall prediction
accuracy using GRNN were better than using MLP. In conclusion, the comparison of two different networks
(MLP, a so-called feed-forward back-propagation network and GRNN, a so-called Bayesian Neural Network)
showed the higher capacity of the latter for validation of such granulation processes.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
Process validation is defined by the FDA (Food and Drug
Administration) as establishing documented evidence which provides
a high degree of assurance that a specific process will consistently
produce a product meeting its predetermined specifications and
quality attributes. Consequently, process validation is an important
subject in the pharmaceutical industry.
In the case of wet-granulation methods, controlling the moisture
content of growing granules and the wet-massing time is important to
assure the manufacture of granules with the desired characteristics
[1–5]. On the other hand, the process of size enlargement of particles in
the fluid bed granulation technology is a complex and non-linear
interaction, which is affected by apparatus, process and product
parameters [6–11]. In the past decade, Artificial Neural Networks
(ANNs) have been increasingly applied for modeling the complex
relationships between these parameters and their influence on the
end product quality [12–22]. Many investigations have shown that the
use of ANNs rather than traditional statistical designs enables an
advanced predictability of the process and end product properties
* Corresponding author. Tel.: +43 1 4277 55417; fax: +43 1 4277 9554.
E-mail address: sharareh.salar-behzadi@univie.ac.at (S. Salar Behzadi).
0032-5910/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.powtec.2009.05.025
training and a verification set. While the training set is used to train the
network, the verification set is applied to check the network's error
performance. Finally, it is common practice to reserve a third set of
cases (test set) for external prediction, to ensure that the results on the
training and verification set are real and not artefacts of the training
process. The general architecture of feed-forward back-propagation
networks is shown in Fig. 1.
In such a feed-forward back-propagation network, the connection
weights are set to random values at the beginning of the training. The
descriptor values for all studied sets of parameters are passed through
the network (feed-forward) and the output responses are compared
to the target values of the input properties to give an error value. The
weights are then adjusted for the second pass of the data through the
network in order to reduce the obtained error value. Because the
adjustment of the weights begins with the correction of the last
layer and then continues backwards to the first layer, this procedure is called
back-propagation [31]. The entire procedure is repeated in an iterative
manner until the error value reaches a minimum. Finally, a regression
coefficient may be calculated between the observed product properties
and the network predicted values. Generally, feed-forward
back-propagation networks suffer from two potential problems: a) overfitting
of the data if there are too many adjustable weights; and b)
overtraining of the network if there are too many training cycles.
With regard to these two problems, the number of adjustable weights
plays a crucial role, which additionally influences the predictability of
the final network. Finding an optimal network topology to achieve a
balance between those two extreme situations is an important point
in the network training.
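The training cycle described above can be illustrated with a short sketch. It is not the authors' implementation: it uses plain gradient descent instead of the Levenberg-Marquardt algorithm reported in the Results, synthetic data instead of the granulation runs, and the names `train_mlp`, `n_hidden`, `lr` and `epochs` are hypothetical.

```python
import numpy as np

def train_mlp(X, Y, n_hidden=3, lr=0.05, epochs=500, seed=0):
    """One-hidden-layer network trained by back-propagation (plain gradient descent)."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    n_out = Y.shape[1]
    # Connection weights are set to random values at the start of training.
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    errors = []
    for _ in range(epochs):
        # Feed-forward pass: tanh hidden layer, linear output layer.
        H = np.tanh(X @ W1 + b1)
        P = H @ W2 + b2
        diff = P - Y
        # Error value: sum of squared differences per case, averaged over cases.
        errors.append(float(np.mean(np.sum(diff ** 2, axis=1))))
        # Backward pass: gradients for the last layer first, then the first layer.
        gW2 = (H.T @ diff) * (2.0 / n)
        gb2 = diff.sum(axis=0) * (2.0 / n)
        dH = (diff @ W2.T) * (1.0 - H ** 2)
        gW1 = (X.T @ dH) * (2.0 / n)
        gb1 = dH.sum(axis=0) * (2.0 / n)
        W2 -= lr * gW2
        b2 -= lr * gb2
        W1 -= lr * gW1
        b1 -= lr * gb1
    return (W1, b1, W2, b2), errors

# Synthetic demonstration data (assumption: 5 inputs, 2 targets, 40 cases).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(40, 5))
Y_demo = np.tanh(X_demo @ rng.normal(size=(5, 2)))
params, errors = train_mlp(X_demo, Y_demo)
print(f"error at first epoch: {errors[0]:.3f}, at last epoch: {errors[-1]:.3f}")
```

Increasing `n_hidden` adds adjustable weights (the overfitting risk named above), while increasing `epochs` adds training cycles (the overtraining risk), so both would normally be chosen against a verification set.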
Another type of network is the so-called Bayesian Neural
Network (BNN). Two types of BNNs have been developed: Probabilistic
Neural Networks (PNNs) distinguish between different categories
of patterns [32], while Generalized Regression Neural Networks
(GRNNs) estimate the most probable value for continuous dependent
values [33]. BNNs are feed-forward networks which do not use
back-propagation. The input layer of a BNN consists of a number of neurons
equal to the number of independent parameters the network is
trained on. The normalized input vector is copied onto the units in the
pattern layer, each representing a training case. Instead of the sigmoid
activation function commonly applied for back-propagation, BNNs
use exponential functions. The resulting activation level is forwarded
to the summation unit. The density estimated on each pattern is
Table 1
Run  Batch      Ratio of binder   Product    Atomizing     Spray     Air
     size (g)   solution to feed  air temp.  air pressure  rate      velocity
                material (w/w)    (°C)       (bar)         (g/min)   (m/s)
1    400 ± 10   1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
2    600 ± 10   1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
3    800 ± 10   1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
4    1000 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
5    1200 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
6    1600 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
7    1800 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
8    2000 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
9    2200 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
10   2400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
11   2600 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
12   2800 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
13   3000 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
14   1400 ± 10  1:14              60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
15   1400 ± 10  1:9               60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
16   1400 ± 10  1:5               60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
17   1400 ± 10  1:2.8             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
18   1400 ± 10  1:2.4             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
19   1400 ± 10  1:2.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
20   1400 ± 10  1:2               60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
21   1400 ± 10  1:3.2             30 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
22   1400 ± 10  1:3.2             40 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
23   1400 ± 10  1:3.2             50 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
24   1400 ± 10  1:3.2             70 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
25   1400 ± 10  1:3.2             80 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
26   1400 ± 10  1:3.2             90 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
27   1400 ± 10  1:3.2             60 ± 3     0.2 ± 0.05    32.0 ± 2  24 ± 2
28   1400 ± 10  1:3.2             60 ± 3     0.3 ± 0.05    32.0 ± 2  24 ± 2
29   1400 ± 10  1:3.2             60 ± 3     0.4 ± 0.05    32.0 ± 2  24 ± 2
30   1400 ± 10  1:3.2             60 ± 3     0.5 ± 0.05    32.0 ± 2  24 ± 2
31   1400 ± 10  1:3.2             60 ± 3     0.8 ± 0.05    32.0 ± 2  24 ± 2
32   1400 ± 10  1:3.2             60 ± 3     1.0 ± 0.05    32.0 ± 2  24 ± 2
33   1400 ± 10  1:3.2             60 ± 3     1.2 ± 0.05    32.0 ± 2  24 ± 2
34   1400 ± 10  1:3.2             60 ± 3     1.4 ± 0.05    32.0 ± 2  24 ± 2
35   1400 ± 10  1:3.2             60 ± 3     1.6 ± 0.05    32.0 ± 2  24 ± 2
36   1400 ± 10  1:3.2             60 ± 3     1.8 ± 0.05    32.0 ± 2  24 ± 2
37   1400 ± 10  1:3.2             60 ± 3     2.0 ± 0.05    32.0 ± 2  24 ± 2
38   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    6.0 ± 2   24 ± 2
39   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    11.5 ± 2  24 ± 2
40   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    16.5 ± 2  24 ± 2
41   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    21.5 ± 2  24 ± 2
42   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    26.5 ± 2  24 ± 2
43   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    36.5 ± 2  24 ± 2
44   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    42.0 ± 2  24 ± 2
45   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    46.5 ± 2  24 ± 2
46   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  16 ± 2
47   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  18 ± 2
48   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  22 ± 2
49   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  26 ± 2
50   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  30 ± 2
51   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  34 ± 2
52   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  38 ± 2
53   1400 ± 10  1:3.2             60 ± 3     0.6 ± 0.05    32.0 ± 2  24 ± 2
Table 2
The properties of granules produced by using input setting values described in Table 1
(obtained output parameters).
Run  Mean granule     Granule size   Flow      Angle of    Hausner's
     size (d50) (µm)  distrib. (µm)  rate (s)  repose (°)  ratio (ml)
1    2000.00          2.50           50.00     35.00       1.57
2    2000.00          2.50           50.00     35.00       1.57
3    874.40           1.52           28.00     30.54       1.23
4    909.00           1.52           27.02     33.02       1.19
5    790.00           1.55           22.06     31.79       1.20
6    603.40           1.38           20.05     26.56       1.21
7    626.20           1.43           21.08     30.96       1.19
8    552.20           1.42           19.03     30.54       1.21
9    571.40           1.38           20.00     22.78       1.20
10   805.00           1.51           19.04     28.81       1.16
11   662.00           1.50           16.05     29.25       1.18
12   548.34           1.48           16.06     29.25       1.17
13   575.35           1.48           16.10     29.30       1.17
14   349.00           2.23           6.40      24.70       1.30
15   305.00           1.99           6.10      27.02       1.26
16   392.00           1.88           13.87     25.64       1.15
17   696.50           1.53           23.00     28.37       1.19
18   800.00           1.80           21.90     27.47       1.15
19   790.00           1.74           26.00     31.38       1.10
20   897.50           1.74           28.60     31.38       1.10
21   1750.68          1.75           24.00     35.00       1.57
22   1240.00          1.60           26.00     26.56       1.10
23   650.75           1.27           18.60     29.25       1.16
24   592.70           1.43           17.50     28.81       1.17
25   535.13           1.65           15.00     26.56       1.14
26   599.28           1.52           15.10     27.92       1.16
27   2000.00          2.50           50.00     35.00       1.27
28   2000.00          2.23           50.00     25.64       1.16
29   1400.00          1.83           19.50     26.56       1.16
30   649.50           1.53           23.00     28.37       1.18
31   356.52           1.61           11.20     24.70       1.18
32   214.70           1.98           11.00     25.64       1.10
33   192.65           1.60           11.15     20.30       1.17
34   185.50           1.83           6.10      23.27       1.19
35   183.00           1.81           6.2       23.65       1.19
36   150.00           1.68           5.8       21.50       1.18
37   140.00           1.69           5.65      21.00       1.20
38   300.70           2.02           14.01     31.80       1.24
39   345.22           2.09           11.08     32.62       1.20
40   388.53           1.83           17.07     32.21       1.23
41   551.00           1.73           20.09     33.02       1.15
42   598.40           1.57           22.05     31.38       1.19
43   1650.28          1.69           23.68     30.96       1.16
44   2000.00          2.50           50.00     35.00       1.57
45   2000.00          2.50           50.00     35.00       1.57
46   1437.96          1.60           28.01     19.29       1.17
47   1242.24          1.61           26.05     26.10       1.18
48   908.15           1.64           25.04     29.68       1.18
49   840.61           1.44           24.05     29.68       1.20
50   593.26           1.36           20.07     30.96       1.23
51   589.57           1.36           21.07     31.80       1.24
52   548.71           1.52           22.00     32.62       1.23
53   696.50           1.67           23.00     27.47       1.17
The MLP as well as the GRNN were trained by using the mentioned
investigated parameters as inputs, and the mentioned investigated
granule properties as outputs.
Tables 1 and 2 list the setting values of the input and output
parameters for the training of MLP and GRNN. Due to the small number
of data-sets (53 trials), a leave-one-out cross validation process was
chosen for the training of both networks by combining the training and
validation sets. 38 data-sets were randomly chosen as the training set, 7
data-sets as the validation set and 8 data-sets as the test set.
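A minimal sketch of this partitioning and of a leave-one-out loop is given below. It assumes that the 38 training and 7 validation cases form one pool of 45 cases from which each case is held out once, and it uses a linear least-squares model purely as a placeholder for the networks; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def split_indices(n_total=53, n_test=8, seed=0):
    """Randomly reserve n_test cases for external testing; the rest form the pool."""
    perm = np.random.default_rng(seed).permutation(n_total)
    return perm[n_test:], perm[:n_test]

def fit_linear(X, Y):
    """Placeholder model (assumption): least squares with an intercept."""
    A = np.c_[X, np.ones(len(X))]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.c_[X, np.ones(len(X))] @ coef

def leave_one_out_error(X, Y, fit, predict):
    """Hold out each case once, refit on the others, average the squared error."""
    n = X.shape[0]
    sq_err = []
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], Y[keep])
        pred = predict(model, X[i:i + 1])
        sq_err.append(float(np.sum((pred - Y[i:i + 1]) ** 2)))
    return float(np.mean(sq_err))

# Synthetic stand-in for the 53 runs: 5 inputs, 2 exactly linear outputs.
rng = np.random.default_rng(0)
X_all = rng.uniform(size=(53, 5))
Y_all = X_all @ rng.normal(size=(5, 2)) + 1.0
pool, test = split_indices()
loo = leave_one_out_error(X_all[pool], Y_all[pool], fit_linear, predict_linear)
print(len(pool), len(test), loo)
```

Any model exposing a fit and a predict step could be passed in place of the linear placeholder.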
Table 3
The obtained network structures.
Network  Input  Hidden  Output  Epochs  Smoothing   Averaged
type     nodes  nodes   nodes           factor (σ)  total error
GRNN     5      45–6    5       -       0.1         0.085
MLP      5      3       5       30      -           0.15
3. Results
Fig. 3. a–e. Correlation between experimentally determined and predicted target values
for the different granule properties by MLP.
Fig. 4. a–e. Correlation between experimentally determined and predicted target values
for the different granule properties by GRNN.
The MLP was trained with the Levenberg-Marquardt optimization
algorithm using 30 iterations (epochs). A hyperbolic tangent sigmoid
transfer function (tansig) and a linear transfer function (purelin) were
used for nodes in the hidden and output layers, respectively. The obtained
MLP architecture consisted of five input nodes, one hidden layer with three
hidden nodes, and five output nodes.
The obtained GRNN architecture consisted of 5 input nodes, two
hidden layers (pattern and summation layers), 45 pattern nodes, 6
summation nodes and 5 output nodes. The smoothing factor (σ) was
selected as 0.1.
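The GRNN structure described above can be sketched as follows, using the standard kernel-regression form of a generalized regression network: one Gaussian pattern unit per training case, and a summation layer holding one numerator per output plus one shared denominator (which would give 5 + 1 = 6 summation nodes for 5 outputs). This is an illustration on synthetic, already normalized data, not the software used in the study; `grnn_predict` and its arguments are hypothetical names.

```python
import numpy as np

def grnn_predict(X_train, Y_train, X_query, sigma=0.1):
    """Kernel-weighted average of training targets (GRNN prediction)."""
    # Pattern layer: squared distance from each query to each training case.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum avoids underflow and cancels in the ratio.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    # Summation layer: one numerator per output and one shared denominator.
    numer = w @ Y_train
    denom = w.sum(axis=1, keepdims=True)
    # Output layer: ratio of the two sums.
    return numer / denom

# Synthetic data (assumption): 45 training cases, 5 inputs and 5 outputs in [0, 1].
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(45, 5))
Y_train = rng.uniform(size=(45, 5))
Y_hat = grnn_predict(X_train, Y_train, X_train[:3], sigma=0.1)
print(Y_hat.shape)
```

A smaller `sigma` makes each prediction follow the nearest training case more closely, while a larger value smooths over more cases; no iterative weight adjustment is involved.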
Table 4
Prediction accuracies (%) obtained by trained MLP and GRNN for the prediction of each
granule property and the overall prediction accuracies.
Network  Mean granule  Granule size       Flow      Angle of    Hausner's  Overall
         size (µm)     distribution (µm)  rate (s)  repose (°)  ratio
MLP      82.45         71.70              89.43     86.80       71.70      80.42
GRNN     96.22         84.90              94.33     92.45       88.67      91.31
Table 6
Correlations between the investigated parameters and the properties of obtained
granules.
Correlation coefficient
The air velocity was not used in the parameter set, as inclusion of this
parameter increased the error values of both networks significantly.
The network structures and the averaged total errors for the
prediction of the granule properties from the granulation parameters
are shown in Table 3. The errors have been dened as the sum of
squared differences between the predicted and actual output values
on each output unit.
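Written out, this definition corresponds to the expression below. The symbols (K output units, N cases, a hat for predicted values) are notation introduced here for illustration, and averaging over the cases is an assumption about how the averaged total error in Table 3 was formed.

```latex
E_n = \sum_{k=1}^{K} \left( \hat{y}_{n,k} - y_{n,k} \right)^2,
\qquad
\bar{E} = \frac{1}{N} \sum_{n=1}^{N} E_n
```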
In Figs. 3a–e and 4a–e, the correlations between the experimentally
determined target outputs and the outputs predicted by the MLP
and GRNN are presented, respectively.
The prediction accuracies of both trained networks were estimated
using two analytical methods:
On the one hand, the accuracy for the prediction of each granule
property has been estimated by considering 7.5% cut-offs of the
highest experimentally measured values. Table 4 shows the prediction
accuracies of the trained networks for each granule property. These
prediction accuracy values are related to the estimated correlation
coefficients (Figs. 3 and 4).
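One possible reading of the 7.5% cut-off is sketched below: a prediction counts as correct when it deviates from the measured value by no more than 7.5% of the highest measured value of that property. This interpretation is an assumption, since the text does not spell out the rule, and the predicted numbers in the example are invented for illustration.

```python
import numpy as np

def cutoff_accuracy(y_true, y_pred, cutoff=0.075):
    """Percentage of predictions within cutoff * max(measured) of the measurement."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Tolerance per property (column): a fraction of the highest measured value.
    tol = cutoff * y_true.max(axis=0)
    hits = np.abs(y_pred - y_true) <= tol
    per_property = 100.0 * hits.mean(axis=0)
    # Overall accuracy taken as the mean over the properties.
    return per_property, float(per_property.mean())

# Columns: mean granule size (µm), flow rate (s). Predictions are hypothetical.
measured = [[2000.0, 50.0], [874.4, 28.0], [790.0, 22.06], [603.4, 20.05]]
predicted = [[1900.0, 45.0], [900.0, 27.0], [1000.0, 23.0], [600.0, 20.0]]
per_property, overall = cutoff_accuracy(measured, predicted)
print(per_property, overall)  # [75. 75.] 75.0
```

Here the tolerances are 150 µm and 3.75 s, so one size prediction and one flow-rate prediction fall outside the cut-off.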
On the other hand, the involvement of each parameter in the
prediction of granule properties has been estimated. For this purpose,
the training data-set has been introduced into the networks
repeatedly, omitting one independent parameter each time. The
resulting network error was recorded each time. Any significant
increase of the error served as proof of the importance of the
parameter. The involvement of each independent parameter in the
prediction of granule properties using trained MLP and GRNN is
reported in Table 5. For this purpose the ratio of network error after
elimination of each parameter to the averaged total error was
calculated. The higher the ratio, the more important is the parameter.
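The omission procedure can be sketched as follows. A linear least-squares fit stands in for the trained networks, and the data and parameter names are synthetic; only the ratio itself (error without a parameter divided by the error with all parameters) follows the description above.

```python
import numpy as np

def fit_error(X, Y):
    """Placeholder model (assumption): least squares; returns the mean squared-error sum."""
    A = np.c_[X, np.ones(len(X))]
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return float(np.mean(np.sum((A @ coef - Y) ** 2, axis=1)))

def omission_ratios(X, Y, names):
    """Error after omitting each input column, relative to the error with all inputs."""
    base = fit_error(X, Y)
    return {name: fit_error(np.delete(X, j, axis=1), Y) / base
            for j, name in enumerate(names)}

# Synthetic data: the output depends weakly on column 0, strongly on column 1,
# and not at all on column 2; noise keeps the baseline error above zero.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(60, 3))
noise = rng.normal(scale=0.1, size=(60, 1))
Y_demo = 0.3 * X_demo[:, [0]] + 3.0 * X_demo[:, [1]] + noise
ratios = omission_ratios(X_demo, Y_demo, ["weak", "strong", "irrelevant"])
print(ratios)
```

A ratio close to 1 indicates that the omitted parameter contributes little, in line with the reading of Table 5 given in the text.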
The correlations between each parameter and the properties of
produced granules are presented in Table 6. The mean size and the
flow rate of obtained granules were affected by the variation of the
ratio of binder solution to feed material, the atomizing air pressure
and the spray rate.
0.94
0.97
0.97
0.73
0.74
0.87
0.55
0.54
0.93
0.82
0.70
0.82
0.66
0.82
0.74
0.54
0.99
0.74
0.62
0.68
0.51
0.59
0.59
4. Discussion
The comparison of the predictability of MLP and GRNN for
fluidized bed granulation processes showed the higher capacity of
GRNN, as discussed in the following:
The averaged absolute error for training the MLP was higher than the
averaged absolute error for training the GRNN (0.15 vs. 0.085, Table 3).
Table 5
The involvement of each parameter in the prediction of granule properties using trained
MLP or GRNN.
MLP   GRNN
1.01  1.7
2.63  4.76
3.50  7.9
3.97  8.9
3.18  7.2
Shown are the ratios of network error after elimination of each parameter to the
averaged total error.
Fig. 5. a–c. Effect of the variation of air velocity on the properties of obtained granules.
Fig. 6. Effect of the variation of atomizing air pressure on the mean size of obtained
granules.
properties, i.e. optimal size and flowability and narrow size distribution.
Investigation of the airflow's uniformity and the effect of air
velocity on the particle flow pattern in a fluid bed process requires
other mathematical tools than non-linear calculation methods
(ANNs). Probabilistic or numerical methods are the common
tools for such investigations [39–41]. Analysis of the obtained
granule properties confirmed the influence of the air velocity on the
granule properties (Fig. 5a–c). Increasing the air velocity decreased
the mean size and flow rate of the granules. The size
distribution was not affected by increasing this parameter from 16 to
22 m/s, whereas the granules obtained using air velocities above
22 and up to 34 m/s possessed a significantly narrower size distribution. It can
be assumed that, using sucrose as feed material, the so-called bubbling
phase with optimal mixing and heat transfer was achieved in the bed
at air velocities between 22 and 34 m/s. Increasing the air
velocity beyond 34 m/s resulted in a heterogeneous bed. Consequently,
granules with a wide size distribution were obtained.
Comparison of the correlation coefficients between the obtained
granule properties and predicted output values by training the MLP
and GRNN confirms the higher capacity of GRNN (Figs. 3 and 4). The
corresponding prediction accuracy for the different granule properties
as well as the overall prediction accuracy using GRNN were higher
than using MLP (91.31% vs. 80.42%, Table 4). As is seen in Table 4, the
accuracy of the prediction of mean size and the flow rate of granules
was higher than the prediction accuracy of other granule properties
using both trained MLP and GRNN. These results were confirmed by
the analysis of the correlation coefficients between the experimentally
measured and predicted granule properties on the one hand (Figs. 3
and 4), and the analysis of the correlations between each parameter
and the obtained granule properties (Table 6, Figs. 5–9) on the other.
However, even in the case of the prediction of mean size and flow rate,
the accuracy of GRNN was higher than that obtained with the MLP.
Fig. 7. Effect of the variation of spray rate on the mean size of obtained granules.
Both network types point out the high influence of the binder
spray rate and the atomizing air pressure on the granulation process.
However, the trained GRNN was more sensitive than the trained MLP,
as removing each input parameter resulted in a higher network error
(Table 5). Investigations of the obtained granule properties confirmed
the high influence of the binder spray rate and the atomizing air
pressure on the granule properties.
The measurements of droplet size for different combinations of
spray rate and atomizing air pressure confirmed a slight size
enlargement of the droplets when the spray rate was increased at constant
atomizing air pressure. Increasing the atomizing air pressure
decreased the droplet size. However, the droplets were not fully
developed as they left the spraying nozzle (data not shown).
Figs. 6 and 7 depict the effect of spray rate and atomizing air
pressure on the mean size of obtained granules. As can be observed,
decreasing the atomizing air pressure or increasing the spray rate
resulted in the increase of mean granule size.
The investigations of the obtained granule properties also showed
the importance of the amount of binder solution, which the trained
networks could not point out. Fig. 8a–c demonstrate the effect
Fig. 8. a–c. Effect of the variation of binder solution:feed material ratio on the properties
of obtained granules.
processes, the higher prediction accuracy of the latter was clearly
demonstrated for the presented experimental setup. Given that the
superiority of GRNN over MLP might not be applicable to all
granulation processes in general, the present report underlines the
importance of selecting the best suited ANN for each individual
application.
Acknowledgment
Cordial thanks to Dr. Stefan Toegel for his ongoing support during
the preparation of this report.
Fig. 9. SEM photographs of obtained granules by using the ratios of a) 1:14 and b) 1:2 of
binder solution to feed material [w/w] (runs nr. 12 and 16, respectively).
References
[33] D.F. Specht, A general regression neural network, IEEE Trans. Neural Netw. 2 (1991)
568–576.
[34] J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, Wiley-VCH,
Weinheim, New York, 1999.
[35] L. Simon, K.M. Nazmul, Probabilistic neural networks using Bayesian decision
strategies and a modified Gompertz model for growth phase classification in the
batch culture of Bacillus subtilis, Biochem. Eng. J. 7 (2001) 41–48.
[36] J.V. Hansen, R.D. Meservy, Learning experiments with genetic optimisation of a
generalized regression neural network, Decis. Support Syst. 18 (1996) 317–325.
[37] P. Bruneau, Search for predictive generic model of aqueous solubility using
Bayesian Neural Nets, J. Chem. Inf. Comput. Sci. 41 (2001) 1605–1616.
[38] A. Martin, J. Swarbrick, A. Cammarata, Physical Pharmacy, Lea & Febiger, Philadelphia,
1983.
[39] H. Nakamura, S. Watano, Numerical modeling of particle fluidization behavior in a
rotating fluidized bed, Powder Technol. 171 (2007) 106–117.
[40] W. Zhong, Y. Xiong, Z. Yuan, M. Zhang, DEM simulation of gas–solid flow
behaviours in spout-fluid bed, Chem. Eng. Sci. 61 (2006) 1571–1584.
[41] W. Zhong, M. Zhang, B. Jin, Z. Yuan, Flow behaviours of a large spout-fluid bed at
high pressure and temperature by 3D simulation with kinetic theory of granular
flow, Powder Technol. 175 (2007) 90–103.