Characterisation of Model Error For Charpy Impact Energy of Heat Treated Steels Using Probabilistic Reasoning and A Gaussian Mixture Model
Institute for Microstructural and Mechanical Process Engineering: The University of Sheffield
(IMMPETUS)
Department of Automatic Control and Systems Engineering
The University of Sheffield
Mappin Street, Sheffield S1 3JD, UK
Email: y.y.yang@sheffield.ac.uk, m.mahfouf@sheffield.ac.uk
Abstract: Data-driven modelling has gained much momentum recently, with modelling algorithms being
evolved into more complex structures capable of dealing with highly non-linear multi-dimensional systems.
However, it is widely accepted that data-driven models are typically obtained under the principle of error
minimisation, with the assumption of normal error distribution. The latter assumption is often not valid in
more complex modelling environments, leading to sub-optimal model predictions. In this paper, a new
modelling strategy aimed at exploiting the rich information contained in the model error data using a
Gaussian mixture model (GMM) is proposed. The GMM error model provides a probabilistic characterisation of the error distribution, which can then be used in a complementary fashion with the original data model. This combination often improves prediction performance, as will be illustrated in a case study on the hybrid modelling of the Charpy impact energy of heat-treated steels.
Keywords: Data-driven modelling, hybrid modelling, probabilistic reasoning, Gaussian mixture model, EM
algorithm, Charpy impact energy.
IFACMMM 2009. Viña del Mar, Chile, 14 -16 October 2009.
deviation σt in (5) is estimated by an evidence estimation algorithm (Nabney 2002).

The assumption of a normal error distribution during the BNN modelling is not valid here. It is well known that Charpy impact energy measurements exhibit different scatters depending on the impact testing temperature, due to the transition between brittle and ductile cleavage mechanisms (Moskovic and Flewitt 1997). The very sparse and clustered sample distribution of the Charpy data adds to the severity of this non-Gaussian behaviour.
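As a quick numerical check of such non-Gaussian behaviour, the sample skewness and excess kurtosis of the model residuals can be inspected; for normally distributed errors both are close to zero. This is an illustrative sketch, not part of the original method:

```python
import numpy as np

def skew_kurtosis(e):
    """Sample skewness and excess kurtosis of model residuals.

    For Gaussian errors both values are near zero; large magnitudes
    indicate that a normal-error assumption is questionable.
    """
    e = np.asarray(e, dtype=float)
    z = (e - e.mean()) / e.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0
```

For example, uniformly distributed residuals give a skewness near 0 but an excess kurtosis near -1.2, flagging the departure from normality.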
Apparently, the K-dimensional vector z_n has only one element with a value of 1, while all the other elements have the value of zero. The interpretation of this in the GMM is that the data point x(n) is linked to the kth component if z_nk = 1. If the latent variables z_n (n = 1, 2, …, N) were known for the associated error data X, then the GMM parameter set α could be obtained straightforwardly by calculating the location and spread parameters using only the data belonging to each GMM component, while the mixing coefficients could be calculated from the proportion of the data belonging to the kth component, as follows:

µ_k = ( Σ_{n=1}^{N} z_nk x_n ) / ( Σ_{n=1}^{N} z_nk );  σ_k² = ( Σ_{n=1}^{N} z_nk (x_n − µ_k)² ) / ( Σ_{n=1}^{N} z_nk );
π_k = ( Σ_{n=1}^{N} z_nk ) / N;  k = 1, 2, …, K          (11)

A difficulty arises in that, in a real GMM modelling scenario, z_n (n = 1, 2, …, N) are not known, and as a result (11) cannot be readily used. Also, more than one component may be responsible for producing the data point x(n), due to the probabilistic nature of the GMM. From (7) it is apparent that the likelihood function is given as follows:

P(X | π, µ, Σ) = ∏_{n=1}^{N} Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k)
L(π, µ, Σ | X) = Σ_{n=1}^{N} log{ Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k) }          (12)

∂L(π, µ, Σ | X)/∂µ_k = 0  ⇒
µ_k = ( Σ_{n=1}^{N} γ(z_nk) x_n ) / ( Σ_{n=1}^{N} γ(z_nk) ) = (1/N_k) Σ_{n=1}^{N} γ(z_nk) x_n,  k = 1, 2, …, K          (14)

Equations (13-15) are not closed analytical solutions, since γ(z_nk) depends on the GMM parameter set α; hence they cannot be used directly to find the ML solution for α. However, such a solution can be found via an iterative search algorithm by computing γ(z_nk) and α separately, starting from some initial values α(0). When computing γ(z_nk), α is assumed to be fixed (at its previously calculated values); the newly computed values of γ(z_nk) are then used to update α. This process is repeated until either the algorithm converges or the maximum iteration number t_Max is reached. The algorithm based on this principle is known as the Expectation-Maximisation (EM) algorithm (Bishop 2006), and it has been widely used to solve various PDF modelling problems involving latent variables. The flowchart of the EM algorithm used in this paper is shown in Fig. 2, where the initial GMM parameters α(0) are fixed using k-means clustering in order to speed up the convergence.

Fig. 2. Flowchart of the EM algorithm: Initialisation — k-means clustering (t = 0), α(0) = {α_i, i = 1, 2, …, K}, α_i = [π_i, µ_i, Σ_i]; E-Step — compute γ(z_nk) using (13) given α(t); M-Step — estimate α(t+1) by maximising the log likelihood L(π, µ, Σ | X) using (14-15).

4. APPLICATION OF GMM ERROR MODEL TO CHARPY IMPACT ENERGY MODELLING

The Charpy impact energy data provide a good case for GMM error data modelling, due to the fact that there are large scatters in the impact energy variation, various Charpy impact measurement errors, and highly sparse sampling data.
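As an illustrative sketch of the EM procedure of Fig. 2 (not the authors' implementation), a one-dimensional GMM fit might look as follows; the paper's 8-dimensional, K = 8 case differs only in bookkeeping, and a quantile initialisation stands in here for the k-means step:

```python
import numpy as np

def em_gmm(x, K=2, t_max=30):
    """Minimal EM fit of a one-dimensional K-component GMM.

    Illustrative sketch only: the paper fits an 8-dimensional GMM
    with K = 8 and k-means initialisation; here quantiles give a
    crude but deterministic starting point.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # crude initialisation
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(t_max):
        # E-step: responsibilities gamma(z_nk) under the current parameters
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2.0 * np.pi * var)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of mu_k, sigma_k^2 and pi_k
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N
    return pi, mu, var
```

On data drawn from two well-separated Gaussians the iteration recovers the component means, variances and mixing weights, mirroring the role of equations (13-15) in the text.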
The first step in GMM error data modelling is to form the error data X. The output error e_y can be obtained from the original modelling data and the data model:

e_y(n) = t_n − f_m(u_n, w_MP),  n = 1, 2, …, N          (16)

where f_m is the function of the BNN data model; the other symbols are defined as in Section 2. Apparently, this output error must be a constituent of the error data X from which a GMM is to be developed. The remaining constituents of the error data X are chosen from the original inputs U of the data model, assuming that no additional information is available. It is not advisable to use all the inputs in U, as most of the relationships between the inputs and outputs have already been elicited during the data modelling stage. The dimension of the error data should be kept to a minimum, since high-dimensional error data would require a very large number of error samples to fix α adequately. Hence, the heuristic rule proposed here is to select a minimal subset from U which exerts the most relevant influence on e_y.

Following this heuristic rule, 7 additional inputs, i.e., testing depth, specimen size, carbon composition, manganese composition, sulphur composition, hardening temperature, and testing temperature, are selected from the 16 inputs to form the error data X. The resulting error data X is an 8 × N matrix, given by:

X = [x(1), x(2), …, x(N)],  x(n) = [e_y(n), u_{e2}^n, …, u_{e8}^n]^T          (17)

where e_i, i = 2, 3, …, 8, are the indices corresponding to the positions in U of the ith error data constituent. Only data in the training and validation sets are used to form X, with N = 1442.

After some trial-and-error experiments, K = 8 was found to be appropriate for the GMM error data modelling. The EM algorithm shown in Fig. 2 is then initiated to obtain the optimal α, with t_Max = 30. The GMM parameters obtained are given in (18-20):

π = [0.29, 0.05, 0.10, 0.15, 0.06, 0.11, 0.15, 0.09]^T          (18)

µ = [ -0.577  0.525  0.438  -0.485  0.692  -0.208  0.980  -0.003
       0.317  0.695  0.341   0.542  0.115   1.031  1.093   0.081
       … ]          (19)

Σ = [ 0.000  0.814  0.030  0.134  0.407  0.196  0.001  0.473
      0.000  0.506  0.019  0.033  0.161  0.129  0.000  0.512
      1.515  0.091  0.006  0.004  0.170  0.019  0.034  1.577
      0.013  1.062  1.735  0.404  1.620  1.612  0.813  1.479
      0.094  0.033  0.389  2.492  0.568  0.117  0.407  1.347
      2.767  0.803  0.309  0.353  0.615  0.349  1.085  0.461
      0.150  0.918  0.015  0.012  0.052  0.027  0.002  1.230
      0.252  1.132  0.008  0.004  0.298  0.103  0.235  1.433 ]          (20)

The GMM parameter set α = {π, µ, Σ} completely specifies the joint PDF of the GMM via (7), and an example of the marginal PDF (obtained from the GMM) for the output error is shown in Fig. 3.

Fig. 3. GMM marginal PDF for e_y

The GMM error model (7) contains rich information about the model output error; hence it can be exploited to enhance the BNN data model. One way of using the GMM relies on finding the following conditional PDF under given inputs:

P(e_y | x_{e2}, …, x_{e8}) = P(e_y, x_{e2}, …, x_{e8}) / P(x_{e2}, …, x_{e8})
                           = P(e_y, X_e) / ∫_{Ω_e} P(e, X_e) de          (21)

where X_e = [x_{e2}, …, x_{e8}]^T. The analytical derivation of the conditional PDF (21) is challenging because of the high-dimensional integration involved, but numerical solutions can be found. From (21), the associated conditional mean µ_{e_y|X_e} and standard deviation σ_{e_y|X_e} can also be computed.
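Under the simplifying assumption of diagonal component covariances (so that each component's density factorises), the conditional PDF in (21) reduces to a re-weighted mixture and the conditional mean and standard deviation admit closed forms via the law of total variance. The following is a minimal sketch under that assumption; the function and argument names are illustrative, not from the paper:

```python
import numpy as np

def conditional_stats(pi, mu, var, x_e):
    """Conditional mean/std of the first dimension (e_y) of a GMM,
    given the remaining coordinates x_e, assuming diagonal covariances.

    pi: (K,) mixing weights; mu, var: (K, D) component means/variances;
    x_e: (D-1,) values of the conditioning variables.
    """
    pi, mu, var = map(np.asarray, (pi, mu, var))
    # density of each component's conditioning block evaluated at x_e
    d = np.prod(np.exp(-0.5 * (x_e - mu[:, 1:]) ** 2 / var[:, 1:])
                / np.sqrt(2.0 * np.pi * var[:, 1:]), axis=1)
    w = pi * d
    w = w / w.sum()                       # posterior component weights
    mean = (w * mu[:, 0]).sum()           # conditional mean of e_y
    # law of total variance across the re-weighted components
    second = (w * (var[:, 0] + mu[:, 0] ** 2)).sum()
    return mean, np.sqrt(second - mean ** 2)
```

A compensated prediction can then be formed as f_m(u, w_MP) + µ_{e_y|X_e}, with an approximate 95% band of ±1.96 σ_{e_y|X_e} around it, which is one plausible reading of how the bands in Fig. 4 are constructed.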
Fig. 4. Compensated predictions and 95% confidence bands

Fig. 4 shows that the confidence bands fit the actual measurements in the test data well. Compared with the results obtained via the BNN model (see Fig. 1(b)), this is a significant improvement. The effect of prediction compensation is less dramatic, which confirms that the BNN model is already good, in that it provides predictions with only a small bias.

5. CONCLUSIONS

A novel way of using a Gaussian mixture model to capture the complicated probabilistic behaviour of model error data has been developed in this paper. This method is particularly beneficial in cases where the modelling data are not well-distributed, have a large measurement scatter, and include multiple sources of random disturbances. Standard 'deterministic' data-driven modelling often fails to deliver satisfactory results in such cases, and the GMM error data model can provide valuable information about the stochastic behaviour of the errors. The GMM error data model can then be exploited to provide confidence bands, thus giving vital information about the reliability of the prediction, which is often critical in real applications of model predictions. The GMM error model can also provide prediction compensation to the original data model in order to reduce the error bias.

Detailed development of the algorithms and the probabilistic manipulation of GMM error data modelling are presented in this paper, together with the strategies and key concepts of the GMM implementation. The GMM error model paradigm is applied to the Charpy impact energy data modelling, which is known to possess most of the unwanted characteristics, such as large measurement scatter, sparse sample distribution, and high-dimensional multiple random noises (Panoutsos and Mahfouf 2005). The GMM error model developed here is generic in nature, and can be coupled with other types of data-driven models.

Preliminary results of GMM error data modelling are promising, though a few technical challenges remain. The formation of the error data matrix deserves more elaborate techniques, since the selection of the inputs to be included

ACKNOWLEDGEMENTS

Financial support from the UK-EPSRC under Grant EP/F023464/1 is acknowledged. The authors wish to thank CORUS-TATA Engineering Steel (UK) for providing the modelling data used in this research.

REFERENCES

Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Clarendon Press, Oxford.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Springer.
Chen, M. Y., and D. A. Linkens (2001). A systematic neural-fuzzy modelling framework with application to material properties of alloy steels, IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 31(5), 781-790.
Huang, Z. K., and K. W. Chau (2008). A new image thresholding method based on Gaussian mixture model, Applied Mathematics and Computation, 205, 899-907.
Kinnunen, T., J. Saastamoinen, V. Hautamäki, M. Vinni, and P. Fränti (2009). Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification, Pattern Recognition Letters, 30, 341-347.
Moskovic, R., and P. E. J. Flewitt (1997). An overview of the principles of modelling Charpy impact energy data using statistical analysis, Metallurgical & Materials Transactions A, 28A, 2609-2623.
Nabney, I. T. (2002). NETLAB – Algorithms for Pattern Recognition, Springer, London.
Panoutsos, G., and M. Mahfouf (2005). Granular computing and evolutionary fuzzy modelling for mechanical properties of alloy steels, Proceedings of the 16th IFAC World Congress, Prague, Czech Republic, 4-8 July.
Tenner, J. (1999). Optimisation of the Heat Treatment of Steel using Neural Networks, PhD Thesis, The University of Sheffield.
Yang, Y. Y., and D. A. Linkens (2001). Error bounds calculation for steel tensile strength prediction – comparison of Bayesian and ensemble model approaches, The IFAC Symposium on Automation in Mineral, Mining and Metal Processing, Tokyo, Japan, 4-6 September.
Yang, Y. Y., D. A. Linkens, M. Mahfouf, and A. J. Rose (2003). Grain growth modelling for continuous reheating process – a neural network-based approach, ISIJ International, 43(7), 1048-1056.