Principal Components Analysis For Turbulence-Chemistry Interaction Modeling
4.1
A primary goal when dealing with multivariate data is to reduce their dimensionality to the smallest number of meaningful dimensions, in order to help
data exploration and any further processing. Principal Components Analysis
(PCA) can be successfully exploited for this purpose. PCA was first introduced
by Pearson in the early 1900s [94]. A formal treatment of the method is due
to Hotelling [95] and Rao [96].
Suppose that X is a vector of p random variables, i.e. X = (x1, x2, ..., xp), with mean μ and covariance matrix Σ. The (i, j)th element of Σ represents the covariance between the ith and jth variables of X, if i ≠ j, or the variance of the jth element of X, if i = j. PCA is concerned with finding a few (q ≪ p) derived variables, called Principal Components (PCs), which nevertheless preserve most of the information present in the original data. The PCs are linear combinations of the original variables; moreover, they are uncorrelated (i.e. orthogonal) and derived so that the variance of the jth component is maximal.
The first PC of X is defined as the linear combination

z1 = X a1,   (4.1)

where the weight vector a1 = (a11, a21, ..., ap1)' is chosen so that the variance of z1,

var (z1) = a1' Σ a1,   (4.2)

is maximal, subject to the normalization constraint

a1' a1 = 1.   (4.3)

Introducing a Lagrange multiplier λ, the quantity to maximize is

a1' Σ a1 − λ (a1' a1 − 1),   (4.4)

and differentiation with respect to a1 yields

(Σ − λ Ip) a1 = 0,   (4.5)

where Ip is the (p × p) identity matrix. Hence λ must be an eigenvalue of Σ and a1 the corresponding eigenvector; since var (z1) = a1' Σ a1 = λ, the first PC is associated with the largest eigenvalue, λ1. The second PC, z2 = X a2, is obtained by maximizing var (z2) = a2' Σ a2 subject to the constraints

a2' a2 = 1,   (4.6)

a1' a2 = a2' a1 = 0,   (4.7)

the latter ensuring that z1 and z2 are uncorrelated. Introducing the Lagrange multipliers λ and φ, differentiation of a2' Σ a2 − λ (a2' a2 − 1) − φ a1' a2 with respect to a2 gives

Σ a2 − λ a2 − φ a1 = 0.   (4.8)

Premultiplication by a1' yields

a1' Σ a2 − λ a1' a2 − φ = 0,   (4.9)

which reduces to φ = 0, being

a1' Σ a2 = λ1 a1' a2 = 0   (4.10)

due to the constraint of z1 and z2 being uncorrelated. Then, Eq. (4.8) reduces to:

(Σ − λ Ip) a2 = 0.   (4.11)

Thus, a2 is also an eigenvector of Σ and, to maximize var (z2) while satisfying Eq. (4.7), it must be the one associated with the second largest eigenvalue, λ2. The same argument extends to the remaining PCs: the kth PC is given by the eigenvector of Σ corresponding to the kth largest eigenvalue.
4.2
Sample PCA
In Section 4.1, the definition and derivation of PCs have been discussed for an
infinite population of measures. In practice, a random sample of n observations
of the p variables is available, so that Xi = (xi1 , xi2 , . . . , xip ) represents the ith
observation from the data set. Thus, the data available for PCA is an (n × p) data matrix and an unbiased estimator of Σ, denoted S, is employed.
For a single observation Xi of X, zi1 is given by:

zi1 = Xi a1,   i = 1, 2, ..., n,   (4.12)

and the sample variance of the first PC is

(1 / (n − 1)) Σ_{i=1}^{n} (zi1 − z̄1)²,   (4.13)

z̄1 being the sample mean of the scores zi1. The derivation of Section 4.1 then carries over with the sample covariance matrix S in place of Σ: each sample PC is defined by an eigenvector of S,

(S − l Ip) a = 0.   (4.14)

In matrix form, the scores of all the PCs are collected as

Z = X A,   (4.15)

where A is the (p × p) matrix whose kth column, ak, is the kth eigenvector of S and, for centered data,

S = (1 / (n − 1)) X' X.   (4.16)

The matrix S represents the approximation of Σ for a finite population, i.e. the random sample consisting of n observations for p variables. The eigendecomposition of S can be written as

S = A L A',   (4.17)

where L is a (p × p) diagonal matrix containing the eigenvalues of S in descending order, l1 ≥ l2 ≥ ... ≥ lp.
The linear transformation given by Eq. (4.15) simply recasts the original variables into a set of new uncorrelated variables, whose coordinate axes are described by A. Then, the original variables can be stated as a function of the PCs as:

X = Z A'.   (4.18)

A reduced, rank-q representation is obtained by retaining only the first q columns of A, collected in the (p × q) matrix Aq:

Zq = X Aq,   (4.19)

Xq = Zq Aq'.   (4.20)
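For concreteness, the transformations of Eqs. (4.15)-(4.20) can be sketched in a few lines of NumPy; the synthetic data matrix below is an assumption used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an (n x p) data matrix X: a rank-2 signal plus noise.
n, p, q = 500, 4, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p)) + 0.01 * rng.normal(size=(n, p))

Xc = X - X.mean(axis=0)            # centered data
S = (Xc.T @ Xc) / (n - 1)          # sample covariance matrix, Eq. (4.16)

l, A = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
order = np.argsort(l)[::-1]
l, A = l[order], A[:, order]       # descending order: l1 >= l2 >= ... >= lp

Z = Xc @ A                         # PC scores, Eq. (4.15)
Aq = A[:, :q]                      # first q eigenvectors
Xq = (Xc @ Aq) @ Aq.T              # rank-q reconstruction, Eqs. (4.19)-(4.20)

# The mean squared reconstruction error equals the sum of the discarded eigenvalues.
err = np.sum((Xc - Xq) ** 2) / (n - 1)
```

With the rank-2 signal above, the two discarded eigenvalues are tiny, so the rank-q reconstruction is nearly exact.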
4.2.1
It can be shown [90] that PCA satisfies the following optimal properties:

Property 1 : For any integer q, 1 ≤ q ≤ p, consider the orthonormal transformation Z = XB, where B is a (p × q) matrix, and let Sz = B' S B be the variance-covariance matrix for Z. Then, the trace of Sz, tr (Sz), is maximized by taking B = Aq, where Aq contains the first q columns of A. Property 1 emphasizes that the PCs explain, successively, as much as possible of the total univariate variance in the original data; indeed, for q = p, tr (Sz) = tr (S).
Property 2 : Consider the orthonormal transformation Z = XB. Then, tr (Sz) is minimized by taking B = A*q, where A*q consists of the last q columns of A. The statistical implication of Property 2 is that the last few PCs are not simply unstructured leftovers after removing the important PCs. Since the variances of the last PCs are small, they can help to detect unsuspected near-constant linear dependencies among the elements of X.

Property 3 : The covariance matrix S admits the spectral decomposition

S = l1 a1 a1' + l2 a2 a2' + ... + lp ap ap'.   (4.21)

This result shows that the whole covariance matrix can be decomposed into decreasing contributions due to each PC.
Property 4. Consider the orthogonal transformation Z = XB. Then,
the determinant of Sz , det (Sz ), is maximized by taking B = Aq . The
statistical importance of this property follows because the determinant of
a covariance matrix, called generalized variance, can be used as a simple
measure of spread for a multivariate random variable.
Property 5 : Each element of X can be predicted by a linear function of Z, Z = XB. If σj² is the residual variance in predicting xj from Z, then Σ_{j=1}^{p} σj² is minimized by taking B = Aq. The statistical implication of Property 5 is that Eq. (4.20) is the best linear predictor of X in a q-dimensional subspace, in terms of squared prediction error.
4.2.2 Data preprocessing
As it was anticipated in Section 4.2, data are usually centered before PCA is
carried out. When the variable means are subtracted from the data sample, all
the observations are converted to fluctuations, thus leaving only the relevant
variation for analysis. Moreover, when working with centered variables, centered PCs are obtained. Centering is usually used with all the scaling criteria
described below.
Scaling is an essential operation when the elements of X are in different
units or when they have very different variances. These aspects have both
to be faced when analyzing the thermochemical state of a reacting system
since temperature and species concentrations have different units. Moreover,
temperature may range from ambient conditions to thousands of degrees while
species mass fractions vary between zero and one. Besides, even among species
mass fractions, there may be need for scaling. For example, radicals appear in
small concentrations and their mass fractions may range from zero to something far less than one (i.e. 10⁻³–10⁻⁶), while major species mass fractions range
from 0 to 1. Taking into account centering, it is possible to define a scaled variable, x̃j, as:

x̃j = (xj − x̄j) / dj,   (4.22)

where x̄j is the mean and dj the scaling factor of the jth variable. Collecting the scaling factors in the (p × p) diagonal matrix D, whose jth diagonal element is dj, the centered and scaled data matrix reads

X̃ = (X − X̄) D⁻¹,   (4.23)

X̄ denoting the matrix of variable means. The PCs of the preprocessed data are then computed as

Z = (X − X̄) D⁻¹ A,   Zq = (X − X̄) D⁻¹ Aq,

where the eigenvectors A are extracted from the covariance matrix of X̃,

S̃ = (1 / (n − 1)) D⁻¹ (X − X̄)' (X − X̄) D⁻¹.   (4.24)
The choice of the scaling parameters is very important, and has a potentially
strong impact on the resulting eigenvectors. The following choices are available:
1. Auto scaling, also called unit variance scaling. It is commonly applied and
uses the standard deviation, sj , as the scaling factor. After auto scaling,
all the elements of X have a standard deviation equal to 1 and therefore
the data is analyzed on the basis of correlations instead of covariances.
2. Vast scaling [97]. Vast is an acronym of variable stability scaling and
it is an extension of auto scaling. It focuses on stable variables, the
variables that do not show strong variation, using the standard deviation
and the so-called coefficient of variation as scaling factors. The use of the
coefficient of variation, defined as the ratio of the standard deviation and the mean, sj / x̄j, results in a higher importance for variables with a small relative standard deviation.
3. Range scaling. Range scaling adopts the difference between the minimal and the maximal value, (xj,max − xj,min), as scaling factor. A disadvantage of range scaling with respect to other scaling methods is that only
two values are used to estimate the range, while for the standard deviation all measurements are taken into account. This makes range scaling
more sensitive to outliers. To increase the robustness of range scaling,
the range could also be determined by using robust range estimators or
after the outliers have been removed.
4. Level scaling. The mean values of the variables, x̄j, are used as scaling factors. Level scaling converts deviations from the mean (the mean is always subtracted) into percentages of the mean values. As with range scaling, level scaling can also be affected by outliers. Then,
a more robust estimator of the mean, the median, could be used or the
mean could be determined after outlier removal. Level scaling can be
used when large relative changes are of specific interest. However, in
the case of the thermochemical state of a system, this could lead to an
overestimation of the role of chemical species which appear in very small
concentrations, i.e. radicals.
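The four scaling criteria above can be sketched as follows; the helper function is illustrative only, and the synthetic thermochemical columns (temperature plus a major and a minor species mass fraction) are assumptions:

```python
import numpy as np

def scaling_factors(X, method="auto"):
    """Scaling factors d_j of Eq. (4.22) for each column (variable) of X."""
    s = X.std(axis=0, ddof=1)                 # standard deviation s_j
    m = X.mean(axis=0)                        # mean value of x_j
    if method == "auto":                      # unit variance scaling
        return s
    if method == "vast":                      # s_j times the coefficient of variation
        return s * (s / m)
    if method == "range":
        return X.max(axis=0) - X.min(axis=0)
    if method == "level":
        return m
    raise ValueError(f"unknown scaling method: {method}")

rng = np.random.default_rng(1)
# Columns: temperature [K], a major species mass fraction, a radical mass fraction.
X = np.column_stack([rng.uniform(300.0, 2100.0, 1000),
                     rng.uniform(0.0, 0.25, 1000),
                     rng.uniform(0.0, 1e-4, 1000)])

d = scaling_factors(X, "auto")
X_tilde = (X - X.mean(axis=0)) / d            # centered and scaled data, Eq. (4.23)
```

After auto scaling every column of X̃ has unit standard deviation, so the three variables contribute comparably to the covariance structure despite their very different magnitudes.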
The distance of the ith observation from the centre of the data, measured in the space of the PCs, can be based on the first q components,

Σ_{k=1}^{q} z_ik² / lk,   (4.25)

which, extended to all p components, corresponds to the Mahalanobis distance DM of the observation:

z_i1² / l1 + z_i2² / l2 + ... + z_ip² / lp = DM.   (4.26)
The first few principal components have large variances and explain most of
the variation in X. Therefore, these major components are strongly affected
by variables with relatively large variances and covariances. Consequently, the
observations that are outliers with respect to the first few components usually
correspond to outliers on one or more of the original variables. On the other
hand, the last few principal components represent linear functions of the original variables with minimal variance. These components are sensitive to observations that are inconsistent with the correlation structure of the data, and which may not appear as outliers with respect to any of the original variables.
Figure 4.1: Principal components scores with (a) and without (b) outliers.
Figure 4.2: Eigenvalues size with (a) and without (b) outliers.
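A minimal sketch of outlier detection through Eq. (4.26); the data, the planted outliers and the percentile cutoff are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated variables
X[:5] += 15.0                                           # plant a few gross outliers

Xc = X - X.mean(axis=0)
l, A = np.linalg.eigh(np.cov(Xc, rowvar=False))
l, A = l[::-1], A[:, ::-1]                              # descending eigenvalues
Z = Xc @ A                                              # PC scores

# Mahalanobis distance of each observation, Eq. (4.26): sum_k z_ik^2 / l_k
d_m = (Z ** 2 / l).sum(axis=1)

# Flag observations beyond an assumed empirical cutoff (99th percentile).
outliers = np.where(d_m > np.quantile(d_m, 0.99))[0]
```

Because the distance weights each score by the inverse eigenvalue, both gross outliers (dominating the first PCs) and observations that break the correlation structure (visible on the last PCs) inflate d_m.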
4.2.3 Selection of the number of PCs
A first indication of the number of PCs to retain is provided by the total variance, tq, explained by the first q components:

tq = Σ_{k=1}^{q} lk / Σ_{k=1}^{p} lk.   (4.28)

Moreover, the squared reconstruction error associated with the rank-q approximation of Eq. (4.20) equals the sum of the discarded eigenvalues:

(1 / (n − 1)) Σ_{i=1}^{n} Σ_{j=1}^{p} (xq,ij − xij)² = Σ_{k=q+1}^{p} lk.   (4.29)

Besides the total variance, it is useful to monitor the individual variance, tq,j, accounted for each original variable xj by the first q PCs:

tq,j = Σ_{k=1}^{q} ( ajk √lk / sj )²,   (4.30)

where ajk is the weight of the jth variable on the kth eigenvector and sj is the standard deviation of variable xj.
4.2.3.2
l*_q = (1 / p) Σ_{k=q}^{p} 1 / k.   (4.31)

This method actually compares the eigenvalues from the observed sample with the eigenvalues expected from random data (the broken-stick distribution). Based on Eq. (4.31), the observed eigenvalues are considered interpretable if they exceed l*_q.
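The test of Eq. (4.31) can be sketched as follows; the eigenvalue spectrum below is hypothetical:

```python
import numpy as np

def broken_stick(p):
    """Broken-stick eigenvalue fractions l*_q of Eq. (4.31), q = 1, ..., p."""
    return np.array([np.sum(1.0 / np.arange(q, p + 1)) / p for q in range(1, p + 1)])

# Hypothetical eigenvalues of a 9-variable correlation matrix.
l = np.array([5.8, 2.1, 0.6, 0.3, 0.1, 0.05, 0.03, 0.01, 0.01])
observed = l / l.sum()                 # observed variance fractions
expected = broken_stick(len(l))

# Retain the leading components whose observed fraction exceeds the broken-stick value.
keep = observed > expected
q = int(np.argmin(keep)) if not keep.all() else len(l)
```

For this spectrum the first two observed fractions exceed the broken-stick values, so two components are retained; note that the expected fractions sum to one by construction.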
Scree plot
Another common method to determine the number of PCs is the Scree Plot
(Figure 4.2). This is a simple plot of the eigenvalues sorted in descending order
against their indexes. The number of eigenvalues to retain is based on the
observation of the index q at which the slopes of lines joining the plotted points
are steep to the left of q, and not steep to the right of it. Cattell [103] originally
proposed that the points to the left of the straight line, defined by the smaller
eigenvalues (three components for the data in Figure 4.2), should be considered
important. Afterwards, Cattell and Vogelmann [104] concluded that also the
first eigenvalue to the right of this point should be included (four components
for the data in Figure 4.2). Often the Scree Plot approach is complicated by either the lack of any obvious break or the possibility of multiple break points.
4.2.3.5

Matching under rotation and reflection is ensured by considering

M² = trace ( Zq' Zq + Z̃' Z̃ ) − 2 trace (Δ),   (4.33)

where Z̃ denotes the configuration to be matched against the retained scores Zq and Δ is the matrix of singular values from the SVD of Z̃' Zq,

Z̃' Zq = U Δ V',   (4.34)

the corresponding Procrustes rotation matrix being

Q = V U'.   (4.35)
The criteria proposed by McCabe [106] for the definition of the principal variables are:

MC1 : max |S11| ⇔ min |S22.1| = min Π_{k=1}^{m} θk
MC2 : min tr (S22.1) = min Σ_{k=1}^{m} θk
MC3 : min ||S22.1||² = min Σ_{k=1}^{m} θk²     (4.36)
MC4 : max Σ_{k=1}^{r} ρk², with r = min (m, p − m)

where θk are the eigenvalues of S22.1, the covariance matrix of the m discarded variables conditioned on the selected ones, and ρk are the canonical correlations between the selected and not selected variables. As McCabe [106] points out, after the selection of the PVs, S22.1 represents the information left in the remaining unselected variables and, then, it is quite plausible that three of the optimality criteria should be functions of this matrix. McCabe's [106] criteria are very appealing as they satisfy well-defined properties. For instance, criterion MC1 maximizes the variance of the data explained by the subset of variables, while MC2 and MC3 both minimize the reconstruction error. However, the criteria rapidly become computationally unfeasible for very large data sets.
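McCabe's criteria can be coded directly; the exhaustive search below illustrates criterion MC2 on a small synthetic covariance matrix (the data and subset size are assumptions), and makes the combinatorial cost for large p evident:

```python
import numpy as np
from itertools import combinations

def s22_1(S, selected):
    """Conditional covariance of the non-selected variables given the selected ones."""
    sel = list(selected)
    rest = [j for j in range(S.shape[0]) if j not in sel]
    S11 = S[np.ix_(sel, sel)]
    S12 = S[np.ix_(sel, rest)]
    S22 = S[np.ix_(rest, rest)]
    return S22 - S12.T @ np.linalg.solve(S11, S12)   # Schur complement

def principal_variables_mc2(S, n_keep):
    """Exhaustive search for criterion MC2: minimize tr(S22.1)."""
    candidates = combinations(range(S.shape[0]), n_keep)
    return list(min(candidates, key=lambda c: np.trace(s22_1(S, c))))

rng = np.random.default_rng(3)
S = np.cov(rng.normal(size=(200, 5)), rowvar=False)
pv = principal_variables_mc2(S, 2)
```

The search enumerates all C(p, n_keep) subsets, which is exactly why these criteria become unfeasible for very large data sets; greedy forward selection is a common workaround.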
4.2.4 PCs rotation

The interpretability of the retained eigenvectors can be improved by rotating them through an orthogonal (q × q) matrix T:

Bq = Aq T.   (4.37)

The rotation matrix is chosen to optimize a simplicity criterion; the most common choice is the VARIMAX criterion:

V_MAX (Aq) = Σ_{k=1}^{q} [ (1/p) Σ_{i=1}^{p} a_ik⁴ − ( (1/p) Σ_{i=1}^{p} a_ik² )² ].   (4.38)
Kaiser [110] refers to this as raw VARIMAX, but it is the version that has
become most popular. Verbally, this is simply the sum of the column-wise
variances of the squared elements of Aq . In other words, a criterion is defined
to maximize the amount of variance explained for any of the original variables
on single PCs. After VARIMAX rotation, Aq will generally have fewer large
loadings in its columns, thereby making the columns more easily interpretable.
A simple analytical solution for the maximization of the criterion in Eq. (4.38) exists for the two-dimensional case [110]. Indicating the columns of Aq with k and l, and defining u_i = a_ik² − a_il² and v_i = 2 a_ik a_il, the two-dimensional solution is the rotation angle φ given by tan (4φ) = t / b, with

t = 2 [ p Σ_{i=1}^{p} u_i v_i − ( Σ_{i=1}^{p} u_i ) ( Σ_{i=1}^{p} v_i ) ],   (4.39)

b = p Σ_{i=1}^{p} ( u_i² − v_i² ) − [ ( Σ_{i=1}^{p} u_i )² − ( Σ_{i=1}^{p} v_i )² ],   (4.40)

the pair of columns being rotated through the matrix

[ cos (φ)  sin (φ) ; −sin (φ)  cos (φ) ].   (4.41)
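In practice, Eq. (4.38) is usually maximized for arbitrary q with the standard SVD-based iteration rather than the pairwise formulas above; a minimal sketch follows, where the random orthonormal loading matrix is an assumption:

```python
import numpy as np

def varimax(Aq, tol=1e-8, max_iter=500):
    """Raw VARIMAX rotation of a (p x q) loading matrix via the standard SVD iteration."""
    p, q = Aq.shape
    T = np.eye(q)
    obj = 0.0
    for _ in range(max_iter):
        B = Aq @ T
        U, s, Vt = np.linalg.svd(
            Aq.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p))
        T = U @ Vt                     # best orthogonal update
        if s.sum() - obj < tol:
            break
        obj = s.sum()
    return Aq @ T, T

def vmax(A):
    """VARIMAX criterion of Eq. (4.38): sum of column-wise variances of squared loadings."""
    p = A.shape[0]
    return np.sum((A ** 4).sum(axis=0) / p - ((A ** 2).sum(axis=0) / p) ** 2)

rng = np.random.default_rng(4)
Aq, _ = np.linalg.qr(rng.normal(size=(9, 3)))   # assumed orthonormal loadings
Bq, T = varimax(Aq)
```

Since T is orthogonal, the rotated loadings Bq span exactly the same subspace as Aq; only the interpretability of the individual columns changes.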
4.3 Local Principal Components Analysis
The PCA transformation described in Section 4.2 can suffer from its reliance on
second order statistics. In fact, the PCs are uncorrelated, i.e. their second-order
product moment is zero, but they can still be highly statistically dependent.
This is particularly important when the relationships among the correlated
variables are non-linear, as is usually the case for a reacting system. In this
case, PCA fails to find the most compact description of the data and it usually
requires a larger number of components to model the low-dimensional hyper
plane embedded in the original space, with respect to a non-linear technique.
This simple realization has prompted the development of non-linear alternatives
to PCA. A considerable amount of work has been done in the context of neural
networks. Nevertheless, here we are more interested in a different approach,
introduced by Kambhatla and Leen [111] in the field of image processing and
known as Local Principal Components Analysis (LPCA).
LPCA employs a local linear approach to reduce the statistical dependency
between the variables of a sample and to achieve the desired optimal dimension
reduction. According to LPCA, a Vector Quantization (VQ) algorithm first partitions the data space into disjoint regions and then PCA is performed in each
cluster, relying on the observation that, if the local regions are small enough,
the data manifold will not curve much over the extent of the region and the
linear model will be a good fit. For the LPCA to be effective, the VQ algorithm
should not be independent of the PCA analysis. For example, a partitioning
based on the Euclidean distance is very intuitive and easy to implement but
the sample clustering is carried out without any connection with the following
projection onto the lower-dimensional subspace. For this reason, Kambhatla
and Leen [111] introduce a VQ algorithm based on a reconstruction error metric. Given an observation Xi from the sample X, a global reconstruction error can be defined for each observation as:

GRE ( Xi, X̄(k) ) = || Xi − Xi,q || = || Xi − ( X̄(k) + Zi,q Aq(k)' ) ||,   (4.42)

where X̄(k) is the kth cluster centroid, Xi,q is the rank-q approximation of Xi, Zi,q is the ith value of the truncated set of PCs, Zq, and Aq(k) is the matrix obtained by retaining only the first q eigenvectors of the covariance matrix, S(k),
associated to the kth cluster. In the context of reacting systems Eq. (4.42)
needs to be modified to take into account the differences in size and units of
the state variables. In fact, a clustering based on GRE would lead to an optimization with respect to temperature only. Therefore, the original LPCA
algorithm from Kambhatla and Leen [111] was modified [92] to include data
preprocessing (Section 4.2.2) in the quantization scheme. A very stable algorithm is obtained by using a global scaled reconstruction error metric, GSRE, defined as:

GSRE ( Xi, X̄(k), D ) = || X̃i − X̃i,q ||,   (4.43)

where X̃i is the ith observation of the sample scaled by D, the diagonal matrix whose jth diagonal element is the scaling factor dj associated to xj. The
proposed LPCA algorithm, briefly referred to as VQPCA, can be summarized as follows: the cluster centroids X̄(k) are initialized; each observation is assigned to the cluster whose centroid and local eigenvectors minimize the scaled reconstruction error of Eq. (4.43); centroids and local PCA bases are then recomputed from the updated partition, and the procedure is iterated until the assignments no longer change. The quality of the resulting partition can be monitored through the mean scaled reconstruction error, E (GSRE) (4.44), normalized by the mean scaled variance of the original variables, E [var (x̃j)] (4.45).
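A minimal sketch of the VQPCA iteration just described, on synthetic scaled data; the number of clusters k, the dimension q and the data themselves are assumptions, and only a trivial empty-cluster guard is included:

```python
import numpy as np

def local_bases(X, labels, k, q):
    """Centroid and first q eigenvectors (via SVD) for each cluster."""
    p = X.shape[1]
    bases = []
    for j in range(k):
        C = X[labels == j]
        if C.shape[0] == 0:                       # trivial guard for empty clusters
            bases.append((np.zeros(p), np.zeros((p, q))))
            continue
        mu = C.mean(axis=0)
        _, _, Vt = np.linalg.svd(C - mu, full_matrices=False)
        bases.append((mu, Vt[:q].T))              # A_q^(k), shape (p x q)
    return bases

def vqpca(X, k=2, q=1, iters=50, seed=0):
    """Alternate error-based assignment (Eq. 4.43) and local PCA until convergence."""
    labels = np.random.default_rng(seed).integers(0, k, size=X.shape[0])
    for _ in range(iters):
        bases = local_bases(X, labels, k, q)
        errs = np.stack(
            [np.sum((X - mu - (X - mu) @ Aq @ Aq.T) ** 2, axis=1)
             for mu, Aq in bases], axis=1)        # reconstruction error per cluster
        new = errs.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, bases

# Two disjoint one-dimensional manifolds embedded in 3-D (already centered and scaled).
rng = np.random.default_rng(5)
t = rng.uniform(-1.0, 1.0, size=(300, 1))
X = np.vstack([t @ np.array([[1.0, 0.0, 0.5]]),
               t @ np.array([[0.0, 1.0, -0.5]]) + 4.0])
labels, bases = vqpca(X, k=2, q=1)
```

Because assignment and basis extraction are coupled through the same reconstruction error, the partition adapts to the local flatness of the manifold, which is exactly the property that a purely Euclidean clustering lacks.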
Figure 4.6: Schematic illustration of the FPCA algorithm [92] for a CO/H2 flame [112].
coordinate system is identified in each cluster. With respect to the VQPCA
approach, FPCA allows a very fast clustering. However, it is not possible to
state a priori that the choice of the mixture fraction as conditioning variable
is the best available.
In the following, the local approaches will be compared to the classic approach consisting in the application of PCA to complete sets of data, i.e. taking k = 1, and denoted with Global PCA (GPCA).
4.4
4.4.1
Experimental data
High fidelity experimental data provided under the framework of the Workshop
on Measurement and Computation of Turbulent Non-premixed Flames (TNF
workshop) [80] have been used to assess the PCA methodology.
The first flame investigated in the present study is a turbulent non-premixed
CO/H2 /N2 (0.4/0.3/0.3 by vol.) jet flame [112], hereafter called simply jet
flame, selected as base case for the analysis due to its favorable properties.
4.4.2
Numerical data
In conjunction with high fidelity experimental data, numerical results from the
Direct Numerical Simulation (DNS) of CO/H2 oxidation with detailed chemistry [114] have also been considered. Details about the DNS simulations and
code can be found in Sutherland et al. [88]. Two DNS data sets have been considered: a spatially evolving and a temporally evolving jet, characterized by a
significant degree of extinction. For the first data set, indicated as DNS1, three
temporal slices, each consisting of approximately 1,500,000 scalar observations
(T, H2 , O2 , O, OH, H2 O, H, HO2 , H2 O2 , CO, CO2 , HCO), are available; the
second data set, DNS2, consists of twelve temporal slices, each one comprising
around 700,000 observations of the same variables.
The advantage of DNS data with respect to experimental data lies in the
large amount of data accessible. Moreover, DNS simulations give access to
many additional variables, beside scalar values, which are not provided by any
experimental campaign. In particular, the scalar source terms can be extracted
from DNS simulations, thus making it possible to assess the capability of the extracted PCs to parametrize not only the original variables, but also their source terms.
Of course, in the perspective of adopting PCA as a predictive model, the generation of data with DNS for PCs extraction does not represent a viable solution
and other approaches, such as One Dimensional Turbulence (ODT) [115], could
be pursued.
4.5
Results
The results of the PCA methodology applied to the experimental and numerical data sets described in Section 4.4 are here presented.
First, the capabilities of PCA for the identification of low-dimensional manifolds in turbulent reacting systems are investigated. In particular, the effect of
the preprocessing strategies and modeling approaches (i.e. GPCA vs. LPCA)
on the manifold dimensionality is thoroughly discussed, trying to provide also
a physical interpretation for the extracted PCs.
Then, the feasibility of a PCA based combustion model is discussed. The
PCA model is validated a priori using the DNS data sets and its performances
are compared to those of an ideal flamelet parametrization (Chapter 2).
4.5.1
The objective of the present Section is to provide a methodology i) to investigate the existence of low-dimensional manifolds in turbulent flames, ii)
to find the most compact representation for them and iii) to guide the selection of optimal reaction variables able to accurately reproduce the state
space of a reacting system. PCA has been previously applied to combustion.
Frouzakis et al. [116] applied PCA for data reduction of two-dimensional DNS
data of opposed jet flames. The analysis was aimed at identifying the number of components required to accurately approximate the original data. To
this purpose, the correlations among velocities, pressure and species concentrations at different times were taken into account, thus leading to eigenvectors
which are linear combinations of the temporal snapshots considered. Similarly,
Danby and Echekki [117] implemented PCA for the analysis of an unsteady
two-dimensional direct numerical simulation of auto ignition in homogeneous
hydrogen air mixtures, with the main purpose of determining the requirements
to reproduce passive and reactive scalars during the process of auto ignition.
The approach presented here is quite different from the ones described above.
The main purpose of the developed PCA methodology is to find correlations
among the state variables (temperature and species concentration) to allow
an optimal approximation of the system in a low-dimensional space. Such an
approach leads to the determination of eigenvectors which are linear combinations of the original variables in a way that allows reducing the dimension of the
system. A similar method was proposed by Maas and Thvenin [118] for the
analysis of DNS data. However, they only considered a very small sampling in
state space. The current study provides significantly more depth in its analysis,
and applies PCA to both experimental and numerical data sets.
4.5.1.1
Figure 4.7 shows the magnitude of the eigenvalues associated with the PCA
reduction of the jet flame data set, together with the contribution of the q
largest eigenvalues to the amount of variance explained by the new basis vectors.
The eigenvalue distribution reflects the covariance structure of the data set,
shown in Table 4.1, and obtained by applying the auto scaling criterion. It is clear that the first two eigenvalues alone account for more than 92% of the total variance in the data. On the other hand, the four smallest eigenvalues are very close to zero; therefore, they carry essentially no additional information and merely reflect near-linear dependencies among the original variables. Therefore, a strong
size reduction, from 9 to 2 or 3, can be accomplished by using PCA, through
the identification of the most active directions in the original data. The total,
tq , and individual variance, tq,j , accounted for the jet flame by the first two
or three eigenvalues are listed in the first two columns of Table 4.2. It can be
observed that, by choosing q = 2, it is possible to capture more than 90% of the
individual variances of all the main species and temperature, while the minor
species, OH and NO, require an additional component, q = 3, to reach levels
of approximation comparable to the other state variables.
This is confirmed by the analysis of the parity plots of temperature and
species mass fractions given by the PCA reconstruction for the cases q = 2
(Figure 4.8) and q = 3 (Figures 4.9). It can be observed that the addition of a
component has a small effect on temperature and main species, whose variation
is mainly explained by the first two components (Table 4.2).

Figure 4.7: Scree-graph and histograms of the q largest eigenvalues for the jet flame data set, preprocessed with auto scaling.

Moreover, the parity plots of temperature (Figures 4.8 and 4.9 (a)), H2 O mass fraction (Figures
4.8 and 4.9 (d)) and minor species such as OH and NO (Figures 4.8 and 4.9
(e, f)) point out the existence of non linear deviations in the recovered data,
which can be probably ascribed to non linear dependencies among the original
variables. This result suggests that the low-dimensional projection of the thermochemical state shows significant non linearities which cannot be taken into
account with a global linear approach. Therefore, specific algorithms performing PCA in locally linear regions of the data (Section 4.3) could be taken into
account, to improve the accuracy of the parametrization.
Figure 4.10 shows the eigenvalue size distribution and the contribution of
the q largest eigenvalues to the total explained variance, tq , for Flame D, F and
JHC data sets. The covariance matrices for the data sets are shown in Table
4.3-4.5. Similarly to the jet flame, a significant size reduction can be achieved
for D and F flames, although an additional component is required, q = 3 or
q = 4, due to the higher complexity of the piloted flames (Section 4.4.1). On
the other hand, the JHC data set shows a higher dimensionality and at least 4
components are needed to explain as much as 90% of the total variance in the
original data. The number of required PCs, q, increases to 5 if an individual
variance, tq,i , above 90% is desired for all the variables, as indicated in Table
4.6. Such a result is particularly interesting for the present Thesis, as it confirms the complexity in the numerical modeling of the flameless combustion regime [45, 78, 79], caused by the overlap between chemical and mixing scales and, thus, by the need of optimal progress variables for the description of the complex interactions which take place in such a regime.
Table 4.6 lists the values of tq and tq,j accounted for Flame D, F and for
JHC. It is interesting to observe the very strong similarities between Flame
D and F, confirmed by the analysis of their covariance structure (Tables 4.3
Table 4.1: Covariance matrix for the jet flame data set. Scaling criterion adopted: auto scaling.
            auto            range           max             vast            level
tq,j (%)    q=2     q=3     q=2     q=3     q=2     q=3     q=2     q=3     q=2     q=3
T           0.971   0.973   0.983   0.991   0.979   0.990   0.992   0.992   0.896   0.943
YO2         0.986   0.986   0.994   0.994   0.997   0.997   0.975   0.978   0.942   0.961
YN2         0.986   0.986   0.981   0.981   0.971   0.971   1.000   1.000   0.965   0.970
YH2         0.968   0.969   0.962   0.963   0.957   0.960   0.945   0.947   0.991   0.991
YH2O        0.930   0.936   0.945   0.945   0.944   0.944   0.940   0.978   0.870   0.884
YCO         0.994   0.994   0.995   0.997   0.990   0.994   0.979   0.980   0.987   0.987
YCO2        0.973   0.977   0.979   0.987   0.977   0.988   0.981   0.985   0.908   0.959
YOH         0.738   0.940   0.731   0.991   0.745   0.992   0.660   0.687   0.870   0.993
YNO         0.772   0.930   0.728   0.795   0.729   0.802   0.744   0.970   0.701   0.926
tq (%)      0.924   0.966   0.946   0.975   0.942   0.975   0.992   0.996   0.949   0.980

Table 4.2: Total, tq, and individual variance, tq,j, accounted for the jet flame data set, as a function of the number of retained PCs, q, and the preprocessing criterion.
Figure 4.8: Parity plots of temperature (a), H2 O (b), H2 (c), CO (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 2) reduction of the jet flame data set. Scaling criterion adopted: auto scaling.
Figure 4.9: Parity plots of temperature (a), H2 O (b), H2 (c), CO (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 3) reduction of the jet flame data set. Scaling criterion adopted: auto scaling.
and 4.4), thus indicating that the relations between the state variables are not strongly affected by the increase in Reynolds number from one flame to the other.

A closer look at the structure of the covariance matrices indicates that, with the exception of the JHC data set, there is always a strong correlation between temperature, oxidation products (CO2, H2 O), OH and NO (Table 4.1, Tables 4.3-4.4), as expected for a turbulent non-premixed flame. The covariance matrix for the JHC data set still shows a strong correlation between temperature and products mass fractions; however, the covariance between temperature and the minor species, i.e. OH and NO, is lower. Once again, this indicates the existence of a more complex flame structure, arising from a balance between turbulent mixing and chemical kinetics.
Figure 4.11 and Figure 4.12 show the GPCA reconstruction of Flame F,
with q = 3 and q = 4, respectively. Similarly to the jet flame, the addition
of a PC barely affects the accuracy in the prediction of the major species, as
it mainly acts on the prediction of the minor species, i.e. OH and NO (Table
4.6). Very similar results are observed for Flame D.
With regard to the JHC system, very large (non linear) deviations are observed for temperature (Figure 4.13 (a)), CO (Figure 4.13 (c)) and OH (Figure
4.13 (e)), for the case q = 4. The increase of the number of PCs to q = 5
strongly improves the prediction of CO (Figure 4.14 (c)) and OH (Figure 4.14
(e)), but not temperature (Figure 4.13 (a)) and other species, i.e. CO2 . It is
noteworthy that NO is very well captured, even with q = 4. This results suggests that one of the retained PCs is highly correlated with NO, thus leading
to the observed result.
PCs interpretation and rotation It is interesting to provide an interpretation of the results described above by looking at the structure of the eigenvectors matrices for the different experimental data sets. Tables 4.7-4.10 report the
weights of the original variables on the retained principal components, before
(a) and after applying VARIMAX rotation, for the jet flame, Flame D, Flame F
and JHC, respectively. As it was pointed out in Section 4.2.4, the PCs weights
are determined to maximize variance and not physical interpretability. However, PCs rotation can help overcome such difficulty, through the determination
of a simpler structure for the eigenvectors.
The analysis of the rotated eigenvectors matrices shows a common pattern
for the different systems, again with the exception of the JHC data set. It
can be observed how the first (rotated) PC is always an ensemble component,
consisting of temperature, oxidizer, product species and NO. This component
has the effect of capturing as much as possible of the original data variance in
the data, trying to explain the (non linear) relations among the state variables
with a single parameter. The other PCs differ from one data set to the other. For the jet flame, the second PC consists of reactants (CO, H2, Air), while the third is basically OH, which is determinant for capturing the reaction zone correctly.
Figure 4.10: Scree-graph and histograms of the q largest eigenvalues for Flame D (a), Flame F (b) and JHC (c). Scaling criterion adopted: auto scaling.
94
T
YO2
YN2
YH2
YH2 O
YCH4
YCO
YCO2
YOH
YN O
T
Y O2
YN2
YH2
YH2 O
YCH4
YCO
YCO2
YOH
YN O
1.000 0.960 0.134 0.418
0.979 0.295 0.535
0.984
0.681
0.912
1.000
0.323 0.589 0.977 0.093 0.688 0.932 0.645 0.859
1.000
0.548
0.194
0.919
0.320
0.056
0.240
1.000
0.102 0.312 0.221 0.329
1.000
0.442
0.213
0.372
1.000
0.708
0.933
1.000
0.688
1.000
Table 4.3: Covariance matrix for Flame D data set. Scaling criterion adopted: auto scaling.
4.5. Results
95
Table 4.4: Covariance matrix for Flame F data set. Scaling criterion adopted: auto scaling.
Table 4.5: Covariance matrix for JHC data set. Scaling criterion adopted: auto scaling.
Figure 4.11: Parity plots of temperature (a), H2 O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 3) reduction of Flame F. Scaling criterion adopted: auto scaling.
Figure 4.12: Parity plots of temperature (a), H2 O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 4) reduction of Flame F. Scaling criterion adopted: auto scaling.
Figure 4.13: Parity plots of temperature (a), H2 O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 4) reduction of JHC data set. Scaling criterion adopted: auto scaling.
Figure 4.14: Parity plots of temperature (a), H2 O (b), H2 (c), CO (d), OH (e) and NO (f) mass fractions illustrating the GPCA (q = 5) reduction of JHC data set. Scaling criterion adopted: auto scaling.
           Flame D         Flame F         JHC
tq,j (%)   q=3     q=4     q=3     q=4     q=4     q=5
T          0.971   0.985   0.967   0.971   0.932   0.948
YO2        0.982   0.986   0.978   0.979   0.961   0.974
YN2        0.979   0.981   0.979   0.980   0.991   0.991
YH2        0.959   0.966   0.969   0.970   0.998   0.998
YH2O       0.987   0.988   0.983   0.984   0.966   0.966
YCH4       0.984   0.986   0.984   0.984   0.999   0.999
YCO        0.940   0.965   0.961   0.969   0.757   0.998
YCO2       0.965   0.985   0.969   0.974   0.911   0.970
YOH        0.743   1.000   0.711   0.978   0.735   1.000
YNO        0.902   0.932   0.792   0.892   0.999   1.000
tq (%)     0.941   0.977   0.946   0.968   0.925   0.984

Table 4.6: Total, tq, and individual variance, tq,j, accounted for Flame D, Flame F and JHC data sets, as a function of the number of retained PCs, q.
correctly.
Moving on to Flame D and Flame F, the second PC observed for the jet flame is split into two components, one representative of a mixture fraction (both N2 and CH4 are very highly correlated with the mixture fraction) and one representative of the intermediate product species (CO, H2). The last component is again OH, the flame marker. The eigenvector structures of Flame D and Flame F are very similar. Only one significant difference can be highlighted, namely the NO weight on the fourth component. For Flame D, NO does not appear with a relevant weight on the last PC, whereas it is not negligible for the fourth component of Flame F, thus reflecting the lower correlation
Table 4.7: Retained (a) and rotated (b) eigenvectors for the jet flame data set.
(a) Retained eigenvectors
          a1      a2      a3
T        0.40    0.18   -0.07
YO2     -0.41    0.15    0.00
YN2     -0.33    0.38    0.01
YH2      0.14   -0.55   -0.05
YH2O     0.41    0.05    0.12
YCO      0.18   -0.53    0.02
YCO2     0.39    0.24   -0.11
YOH      0.31    0.27    0.74
YNO      0.31    0.29   -0.65

(b) Rotated eigenvectors
          a1,r    a2,r    a3,r
T        0.44    0.00    0.03
YO2     -0.30    0.31   -0.07
YN2     -0.13    0.48   -0.02
YH2     -0.10   -0.56   -0.07
YH2O     0.36   -0.13    0.20
YCO     -0.06   -0.56    0.01
YCO2     0.46    0.05    0.00
YOH      0.22    0.11    0.81
YNO      0.54    0.13   -0.54
Table 4.8: Retained (a) and rotated (b) eigenvectors for Flame D data set.
(a) Retained eigenvectors
          a1      a2      a3      a4
T        0.40   -0.08    0.07    0.11
YO2     -0.40   -0.06   -0.07   -0.04
YN2     -0.06   -0.59   -0.38   -0.05
YH2      0.23    0.39   -0.55    0.04
YH2O     0.41   -0.03    0.00    0.05
YCH4    -0.10    0.57    0.41    0.01
YCO      0.28    0.33   -0.49   -0.14
YCO2     0.39   -0.13    0.18    0.11
YOH      0.32   -0.11    0.25   -0.83
YNO      0.34   -0.15    0.22    0.51

(b) Rotated eigenvectors
          a1,r    a2,r    a3,r    a4,r
T        0.46    0.02    0.00    0.02
YO2     -0.39    0.08    0.13    0.03
YN2     -0.07    0.69    0.05    0.02
YH2      0.00   -0.01   -0.69    0.08
YH2O     0.38    0.04   -0.15   -0.05
YCH4    -0.07   -0.72    0.06    0.02
YCO      0.01    0.02   -0.67   -0.07
YCO2     0.48    0.00    0.09    0.01
YOH      0.00    0.00    0.01   -0.99
YNO      0.50    0.01    0.16    0.04
Table 4.9: Retained (a) and rotated (b) eigenvectors for Flame F data set.
(a) Retained eigenvectors
          a1      a2      a3      a4
T        0.40    0.10    0.04    0.20
YO2     -0.40    0.06   -0.03   -0.11
YN2     -0.10    0.56   -0.39   -0.08
YH2      0.23   -0.40   -0.49   -0.14
YH2O     0.41    0.03   -0.05    0.07
YCH4    -0.08   -0.55    0.45    0.08
YCO      0.28   -0.34   -0.44   -0.26
YCO2     0.39    0.14    0.13    0.24
YOH      0.29    0.19    0.40   -0.84
YNO      0.37    0.18    0.16    0.29

(b) Rotated eigenvectors
          a1,r    a2,r    a3,r    a4,r
T        0.43   -0.03   -0.02   -0.01
YO2     -0.39   -0.08    0.11    0.07
YN2     -0.07   -0.70    0.04    0.00
YH2      0.03    0.01   -0.70    0.12
YH2O     0.39   -0.04   -0.11   -0.05
YCH4    -0.08    0.70    0.04    0.00
YCO      0.04   -0.01   -0.66   -0.09
YCO2     0.45   -0.01    0.09   -0.03
YOH      0.13    0.00    0.06   -0.92
YNO      0.53    0.02    0.19    0.35
between the two variables (Table 4.4), probably determined by the higher physical complexity of the system.
Finally, regarding the eigenvectors of the JHC system, it can be observed that the first rotated component does not show a large influence of NO, unlike in all the other systems. This can be explained by taking into account that the first PC tries to explain as much as possible of the data variability. It is well known [3, 4, 2, 45] that NO formation in flameless combustion is more homogeneous than in traditional non-premixed combustion, due to the smoother temperature gradients; therefore, NO is characterized by less variability and disappears from the first PC. The second and third PCs are, again, representative of the reactants and of the intermediate combustion products (Table 4.5), reflecting a pattern similar to that observed for Flame F (and D). Differently from the piloted flames, the fourth component is exclusively NO, meaning that none of the previous components can take NO formation into account and a specific PC is needed. Then, the OH component, present in all the other
Table 4.10: Retained (a) and rotated (b) eigenvectors for the JHC data set.

(a) Retained eigenvectors
          a1      a2      a3      a4      a5
T        0.42   -0.16    0.08    0.12    0.16
YO2     -0.11    0.59    0.02   -0.11   -0.15
YN2      0.38    0.34   -0.10    0.07    0.03
YH2     -0.35   -0.39    0.08   -0.04    0.00
YH2O     0.40   -0.27   -0.10    0.06    0.03
YCH4    -0.35   -0.39    0.08   -0.04    0.00
YCO      0.16   -0.23   -0.67   -0.10   -0.64
YCO2     0.39   -0.26    0.08    0.11    0.31
YOH      0.19   -0.07    0.68    0.20   -0.67
YNO      0.22   -0.07    0.21   -0.95    0.00

(b) Rotated eigenvectors
          a1,r    a2,r    a3,r    a4,r    a5,r
T        0.47    0.14    0.06    0.00   -0.06
YO2     -0.49    0.37    0.08   -0.06   -0.04
YN2      0.09    0.51   -0.02    0.02    0.01
YH2     -0.02   -0.53    0.00    0.00    0.00
YH2O     0.46    0.06   -0.18   -0.03   -0.01
YCH4    -0.02   -0.53    0.01    0.00    0.00
YCO     -0.02    0.02   -0.97    0.00    0.00
YCO2     0.56    0.04    0.14   -0.01    0.05
YOH      0.01   -0.02    0.00    0.00   -1.00
YNO      0.01   -0.01    0.00   -1.00    0.00
Table 4.11: Principal variables for the jet flame data set, as provided by the
different methods described in Section 4.2.3.5.
Method   Principal Variables
B4       H2O, H2, OH
B2       CO2, H2, OH
M3       T, CO, OH
MC1      NO, CO, OH
MC2      CO2, CO, OH
MC3      CO2, CO, OH
PF       T, O2, H2
Table 4.12: Principal variables for Flame D, F and the JHC data set. PV
method: MC2 (Section 4.2.3.5).
Data set   Principal Variables
Flame D    CH4, CO, CO2, OH
Flame F    CH4, CO, CO2, OH
JHC        CH4, CO, CO2, NO, OH
than that of CO2 and H2O. Finally, the PF method provides a different solution, neglecting OH as a PV and replacing it with O2. However, this solution was considered unreliable, being very far from the pattern identified by all the other methods.
On the basis of the results obtained for the jet flame case, the MC2 method was adopted for the extraction of the PVs, as it provides results comparable to most of the other methods and satisfies a very appealing property of PCA, the minimization of the reconstruction error. Applying the MC2 method to the other data sets yields the results in Table 4.12. It is very interesting to observe that the same considerations derived from the analysis of the rotated PCs apply here, with a clearer physical interpretation. The PVs selected for Flame D and Flame F reflect the patterns of the PCs, as they include a mixture-fraction variable, an intermediate species, a product species and OH. Finally, for the JHC system, the same set of PVs obtained for Flame D (and F) is recovered, although augmented with NO, thus confirming the need to explicitly take into account the formation of such a pollutant species.
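The reconstruction-error idea behind these PV methods can be illustrated with a small sketch. The snippet below implements a B2-style backward elimination (repeatedly discarding the variable with the largest absolute weight on the smallest-eigenvalue PC); it is a generic illustration under the assumption of auto scaling, not the thesis implementation, and the function name is hypothetical:

```python
import numpy as np

def principal_variables_b2(X, n_keep, labels):
    """B2-style backward elimination: at each step, discard the variable
    with the largest absolute weight on the eigenvector associated with
    the smallest eigenvalue, until n_keep variables remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        sub = X[:, keep]
        Xs = (sub - sub.mean(0)) / sub.std(0)          # auto scaling (assumed)
        _, vecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
        # eigh returns eigenvalues in ascending order: column 0 is the
        # direction of smallest variance, i.e. the most redundant PC
        keep.pop(int(np.argmax(np.abs(vecs[:, 0]))))
    return [labels[i] for i in keep]
```

Variables that load heavily on the near-null directions of the covariance matrix carry little independent information, so eliminating them first preserves the reconstruction quality of the retained subset.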
Effect of preprocessing strategies on the manifold dimensionality In this paragraph, the effect of preprocessing strategies on the PCA reduction is presented, focusing on the jet flame data set. The performance of auto scaling has been compared to that of the other scaling criteria presented in Section 4.2.2. Figure 4.15 shows the eigenvalue size distribution and the contribution of the q largest eigenvalues to the total variance explained when applying the different scaling criteria.
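The scaling criteria compared here are standard choices from the chemometrics literature; a minimal sketch, assuming the usual definitions (auto: standard deviation; range: max minus min; vast: variance over mean; level: mean; max: maximum), which may differ in detail from those of Section 4.2.2:

```python
import numpy as np

def scale(X, criterion="auto"):
    """Center X and divide each variable by the scaling factor d_k
    prescribed by the chosen criterion. Illustrative definitions only."""
    d = {
        "auto":  X.std(axis=0),                    # standard deviation
        "range": X.max(axis=0) - X.min(axis=0),    # max - min
        "vast":  X.var(axis=0) / X.mean(axis=0),   # variance over mean
        "level": X.mean(axis=0),                   # mean value
        "max":   X.max(axis=0),                    # maximum value
    }[criterion]
    return (X - X.mean(axis=0)) / d
```

With auto scaling the covariance matrix of the scaled data is the correlation matrix, which is why all the state variables contribute comparably to the retained PCs.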
The GPCA analysis presented in Section 4.5.1.1 has shown the existence of severe non-linearities in the parity plots of observed and predicted state variables. Therefore, the determination of the manifold dimensionality with GPCA can be somewhat biased, as a globally linear approach is adopted to model complex non-linear interactions. In this context, LPCA (Section 4.3) can provide locally linear models, able to follow the non-linear development of the thermochemical manifold in low-dimensional space.
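The alternation at the heart of VQPCA, assigning each observation to the cluster whose local PCA basis reconstructs it best and then refitting the local bases, can be sketched as follows. This is a hypothetical minimal implementation (initialization, guards and stopping rule are assumptions), not the code used for the results:

```python
import numpy as np

def vqpca(X, k, q, n_iter=50, seed=0):
    """Sketch of VQPCA: partition X into k clusters, each represented by
    its own q-dimensional PCA basis, by minimizing the local squared
    reconstruction error."""
    rng = np.random.default_rng(seed)
    # initialize by nearest of k randomly chosen observations
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    for _ in range(n_iter):
        bases = []
        for j in range(k):
            C = X[labels == j]
            if C.shape[0] <= q:                 # guard empty/tiny clusters
                C = X
            mu = C.mean(0)
            _, _, Vt = np.linalg.svd(C - mu, full_matrices=False)
            bases.append((mu, Vt[:q].T))        # local mean and q PCs
        # squared reconstruction error of every point in every cluster
        err = np.stack(
            [(((X - mu) - (X - mu) @ A @ A.T) ** 2).sum(1) for mu, A in bases],
            axis=1)
        new = err.argmin(1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, bases
```

Because the assignment criterion is the same quantity that GSRE,n measures, the partition adapts to the curvature of the manifold instead of being imposed a priori, as in FPCA.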
Table 4.13 lists the values of the error metric GSRE,n (Section 4.3) given by GPCA, VQPCA and FPCA for the jet flame, Flame F and the JHC data set, as a function of the number of clusters, k, and retained PCs, q. It is interesting to observe (Table 4.13) that, when the reconstruction error is evaluated on a scaled basis, all the state variables become relevant and the goodness of reconstruction can be properly judged. It should be recalled
Figure 4.15: Scree-graph and histograms of the q largest eigenvalues for the jet flame data set, preprocessed with range (a), vast (b), level (c) and max (d) scaling.
Table 4.13: Values of GSRE,n associated with the GPCA, VQPCA and FPCA
reconstructions of the jet flame, flame F and JHC data set, as a function of the
number of clusters, k, and retained PCs, q.
                  Jet flame        Flame F          JHC
          k      q = 2   q = 3    q = 3   q = 4    q = 3   q = 4
GPCA      1      0.681   0.309    0.707   0.320    1.552   0.752
VQPCA     2      0.208   0.106    0.205   0.119    0.214   0.093
          4      0.112   0.056    0.131   0.076    0.099   0.050
          6      0.091   0.046    0.095   0.052    0.078   0.037
          8      0.079   0.034    0.090   0.040    0.058   0.028
FPCA      2      0.214   0.084    0.263   0.147    0.410   0.105
          4      0.121   0.066    0.158   0.087    0.123   0.059
          6      0.103   0.051    0.134   0.067    0.112   0.052
          8      0.092   0.045    0.122   0.063    0.093   0.044
Figure 4.16: Parity plots of temperature (a), H2O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the VQPCA (q = 2, k = 8) reduction of the jet flame data set. GSRE,n = 0.08.
Figure 4.17: Parity plots of temperature (a), H2O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the VQPCA (q = 3, k = 8) reduction of the Flame F data set. GSRE,n = 0.08.
Figure 4.18: Parity plots of temperature (a), H2O (b), CO (c), H2 (d), OH (e) and NO (f) mass fractions illustrating the VQPCA (q = 3, k = 6) reduction of the JHC data set. GSRE,n = 0.08.
Table 4.14: Values of GSRE,n associated with the GPCA, VQPCA and FPCA reconstructions of the DNS1 and DNS2 data sets, as a function of the number of clusters, k, and retained PCs, q.

                  DNS1             DNS2
          k      q = 2   q = 3    q = 3   q = 4
GPCA      1      3.130   1.830    1.800   1.130
VQPCA     2      0.816   0.176    0.734   0.369
          4      0.307   0.065    0.235   0.076
          6      0.116   0.025    0.141   0.046
          8      0.043   0.010    0.114   0.038
          10     0.036   0.009    0.096   0.033
FPCA      2      0.625   0.216    0.773   0.417
          4      0.243   0.052    0.263   0.081
          6      0.122   0.030    0.204   0.066
          8      0.062   0.020    0.167   0.054
          10     0.046   0.015    0.140   0.043
VQPCA has also been exploited for the analysis of the DNS data sets, DNS1 and DNS2, described in Section 4.4.2. Regarding the DNS2 data set, multiple time steps have been merged before analyzing the data, namely t = 1.5e-03 s, t = 2.0e-03 s, t = 2.5e-03 s and t = 3.0e-03 s. However, the resulting data set (3,800,000 data points) has been conditioned in mixture fraction space, between f = 0.1 and f = 0.8, to overcome memory issues (Figure 4.19).
Table 4.14 lists the values of GSRE,n given by GPCA, VQPCA and FPCA for the DNS data sets. Similarly to the JHC case, the first partition is characterized by a dramatic reduction of GSRE,n also for the DNS data sets. This indicates, once again, that a global approach would lead to a misleading estimation of the manifold dimensionality. Table 4.14 also shows that DNS2 requires an additional PC with respect to DNS1 to reach acceptable levels of accuracy. This is determined by the higher complexity of the DNS2 data set, characterized by a significant degree of extinction.
Figure 4.20 shows the contour plots of the original and recovered temperature and OH mass fraction distributions for the DNS1 data set. A VQPCA approach with q = 3 and k = 8 captures the flame features with great accuracy, resulting in a very small reconstruction error, GSRE,n = 0.01. This is a very appealing result, indicating that VQPCA could be effectively exploited for the compression of DNS data sets, characterized by very large storage requirements, for visualization and post-processing purposes. Very strong compression could be achieved, as shown here, by prescribing the desired accuracy of the recovered data. For a given manifold dimensionality, the dimensions of the reduced data sets are independent of the number of clusters; therefore, the parameters q and k can be varied to optimize the accuracy and
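The storage argument can be made concrete with a rough estimate: a VQPCA-compressed data set stores q scores and a cluster index per observation, plus the k local centroids and bases, instead of the p original variables. A sketch, ignoring precision and metadata overheads:

```python
def compression_ratio(n, p, q, k):
    """Approximate storage ratio of a VQPCA-compressed data set:
    n*(q + 1) values for the scores and cluster indices, plus
    k*p*(q + 1) values for the local bases (q eigenvectors and one
    centroid of length p per cluster), versus n*p original values."""
    compressed = n * (q + 1) + k * p * (q + 1)
    return compressed / (n * p)
```

For a DNS-sized set of 10^6 points with p = 10, q = 3 and k = 8, the ratio is about 0.40, i.e. a 2.5x reduction, and it shrinks further as p grows relative to q.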
Figure 4.19: Original (a) and conditioned (b) temperature field for the DNS2 data set at time step t = 1.5e-03 s.
Figure 4.20: Contour plots of original and recovered temperature (a, a') and OH mass fraction (b, b') distributions for DNS1. VQPCA reduction with q = 3 and k = 8. GSRE,n = 0.01.
Figure 4.21: Parity plots of original and recovered temperature (a) and OH mass fraction (b) for DNS1. VQPCA reduction with q = 3 and k = 8. GSRE,n = 0.04.
ignition. In the context of the Conditional Moment Closure [73], for example, it has been recognized [119] that conditioning on mixture fraction is not sufficient for Flame F and that a second conditioning variable should be used. Figure 4.27 shows the temperature as a function of mixture fraction for the two clusters selected by VQPCA with q = 3. It can be observed that, unlike for the jet flame, the VQPCA algorithm extracts features from the whole mixture fraction space in order to achieve the best q-dimensional representation of the thermochemical state of the system. It can then be concluded that, for Flame F, mixture fraction does not represent an optimal reaction variable. Therefore, VQPCA could provide an appealing alternative to guide the selection of the most compact subset of reaction variables needed to properly describe the thermochemical state of such a reacting system.
Figure 4.24 (c) shows the comparison between the reconstructions provided by VQPCA and FPCA for the JHC data set. Similarly to Flame F, the VQPCA algorithm provides GSRE,n values 10-50% lower than those obtained with FPCA. The mixture fraction partitioning does not optimally follow the curvature of the manifold in state space, indicating the complexity of turbulence-chemistry interactions for this system. This is further confirmed by Figure 4.28, which shows the partition of temperature into the two clusters selected by VQPCA with q = 3, in mixture fraction space. The algorithm selects a first cluster characterized by a lean branch and part of a rich region, with an aspect characteristic of a non-premixed flame. On the other hand, the second cluster shows important non-equilibrium phenomena, such as extinction, similarly to cluster 2 for Flame F (Figure 4.27 (b)).
To better understand the underlying mechanism of the VQ partitioning
algorithm, it is possible to analyze the structure of the rotated eigenvectors in
Figure 4.22: Contour plots of original (a, b) and recovered (a', b') temperature distributions for DNS2, at two different time steps, i.e. t = 1.5e-03 s (a, a') and t = 2.0e-03 s (b, b'). VQPCA reduction with q = 4 and k = 8. GSRE,n = 0.04.
Figure 4.23: Parity plots of temperature (a) and OH (b) mass fraction illustrating the VQPCA (q = 4, k = 8) reduction of the DNS2 data set. GSRE,n = 0.04.
Figure 4.24: Values of GSRE,n as a function of the number of clusters, k, and retained PCs, q, for the jet flame, Flame F and
JHC data sets.
Table 4.15: Rotated eigenvectors in the first (a) and second (b) cluster identified by VQPCA for Flame F. q = 3 and GSRE,n = 0.21.
(a) Cluster 1                (b) Cluster 2
          a1,r    a2,r                 a1,r    a2,r
T        -0.01    0.55       T        0.30   -0.01
YO2      -0.09   -0.44       YO2     -0.34   -0.09
YN2      -0.70   -0.07       YN2     -0.12   -0.12
YH2      -0.01   -0.04       YH2      0.05    0.64
YH2O     -0.03    0.44       YH2O     0.33   -0.09
YCH4      0.71   -0.10       YCH4    -0.01    0.02
YCO       0.00    0.07       YCO      0.12    0.70
YCO2      0.00    0.51       YCO2     0.35   -0.11
YOH       0.00    0.05       YOH      0.60   -0.17
YNO      -0.02    0.18       YNO      0.42   -0.13
the two clusters identified by VQPCA for Flame F (Table 4.15) and for the JHC data set (Table 4.16). For Flame F, the eigenvectors associated with the first cluster (Table 4.15 (a)) are a mixture fraction (footnote 3) and a linear combination of major species and temperature, respectively. This supports the graphical observation provided by Figure 4.27 (a), which shows the first cluster to be characterized by the lean and rich branches of the flame. On the other hand, the reaction region identified by Figure 4.27 (b) needs to be described by means of parameters with a strong contribution of intermediate and minor species, as shown in Table 4.15 (b).
With regard to the JHC data set, the structure of the rotated eigenvectors prompts very interesting considerations. In particular, the second cluster (Table 4.16 (b)) is parametrized by a first component with significant weights on the fuel species, the intermediate species and temperature, whereas the second PC reduces to OH. Thus, VQPCA is able to extract the subset of the data dominated by finite-rate chemistry effects by means of progress variables able to capture the ignition process. In the context of the numerical modeling of flameless combustion, this result confirms the need for combustion models suited to the description of turbulence-chemistry interactions in such a combustion regime.
As far as the numerical data are concerned, the VQPCA and FPCA reductions appear comparable for the DNS1 data set (Table 4.14), while VQPCA outperforms FPCA for DNS2 (Table 4.14). This confirms that mixture fraction is not optimal from the point of view of error minimization when the physics under investigation becomes too complex. This is somewhat expected, as mixture fraction is only a measure of the local system stoichiometry and, therefore, it can only cover the relatively fast scales.
The small discrepancies between FPCA and VQPCA for DNS1 (and for
3 The denomination mixture fraction is used here because the variables which define the first PC are highly correlated with f: cov(f, N2) = 0.97 and cov(f, CH4) = 0.90.
Table 4.16: Rotated eigenvectors in the first (a) and second (b) cluster identified by VQPCA for the JHC data set.

(a) Cluster 1                (b) Cluster 2
          a1,r    a2,r                 a1,r    a2,r
T         0.48    0.01       T       -0.25    0.21
YO2      -0.51    0.03       YO2     -0.13   -0.03
YN2       0.07   -0.02       YN2     -0.50    0.00
YH2       0.00    0.00       YH2      0.48   -0.06
YH2O      0.46    0.04       YH2O    -0.23    0.06
YCH4      0.00    0.00       YCH4     0.49    0.00
YCO       0.05   -0.03       YCO      0.15   -0.07
YCO2      0.53    0.04       YCO2    -0.32    0.05
YOH       0.09    0.01       YOH      0.10    0.96
YNO      -0.03    1.00       YNO     -0.05    0.14
the jet flame) suggest that VQPCA actually tends to FPCA when dealing with relatively simple systems, characterized by fast chemistry and a small degree of extinction. This is confirmed by Figure 4.29, showing the VQPCA (a-d) and FPCA (a'-d') partitions of the DNS1 data. Both approaches identify a rich and a lean region, together with a rich and a lean reacting layer.
Computational cost of the analysis In the above discussion, VQPCA has been shown to be generally superior to FPCA from the point of view of reconstruction-error minimization. However, it should be recalled that VQPCA is an iterative algorithm, whereas FPCA is based on the supervised partitioning of the data into bins of mixture fraction (Section 4.3). Therefore, the CPU time associated with VQPCA is certainly higher than that of FPCA; moreover, it increases with k, as shown in Figure 4.30 for an experimental and a numerical data set. The CPU time associated with VQPCA can reach values of the order of minutes for the experimental data sets (Figure 4.30 (a)) and hours for the numerical data sets (Figure 4.30 (b)), whereas the corresponding CPU time of FPCA is of the order of seconds and minutes, respectively. Therefore, FPCA certainly represents a valid solution for applications similar to the jet flame or DNS1, as it optimizes both CPU time and accuracy of predictions.
4.6
In the previous Section, a methodology based on Principal Components Analysis (PCA) has been proposed for the identification of low-dimensional manifolds in turbulent flames, the estimation of their dimensionality and the selection of optimal reaction variables. The reduced representation given by PCA has great potential, especially in its local formulations, i.e. VQPCA and FPCA.
Figure 4.29: VQPCA (a-d) and FPCA (a'-d') partitions of the DNS1 data set.
Figure 4.30: CPU time associated with the FPCA and VQPCA reductions as a function of the number of clusters, k, and retained PCs, q, for the experimental (a) and numerical (b) data sets.
4.6.1

The transport equation for the kth species mass fraction,

$$\frac{\partial \rho Y_k}{\partial t} + \frac{\partial \rho u_j Y_k}{\partial x_j} = \frac{\partial}{\partial x_j}\left(\rho D_k \frac{\partial Y_k}{\partial x_j}\right) + \omega_k ,$$

can be transformed into a transport equation for the PCs. Introducing the material derivative and the Lewis number, $Le_k = \lambda/(\rho c_p D_k)$, the species equation becomes:

$$\frac{D Y_k}{D t} = \frac{\lambda}{\rho c_p Le_k}\frac{\partial^2 Y_k}{\partial x_j^2} + \frac{\omega_k}{\rho} . \qquad (4.46)$$

Centering each mass fraction with its mean $\bar{Y}_k$ and scaling it with the factor $d_k$ gives:

$$\frac{D}{D t}\left(\frac{Y_k - \bar{Y}_k}{d_k}\right) = \frac{\lambda}{\rho c_p Le_k}\frac{\partial^2}{\partial x_j^2}\left(\frac{Y_k - \bar{Y}_k}{d_k}\right) + \frac{\omega_k}{\rho\, d_k} . \qquad (4.47)$$

Indicating with $a_{ki}$ the weight of the kth variable on the ith PC, the following equation is obtained:

$$\frac{D}{D t}\left(\frac{Y_k - \bar{Y}_k}{d_k}\, a_{ki}\right) = \frac{\lambda}{\rho c_p Le_k}\frac{\partial^2}{\partial x_j^2}\left(\frac{Y_k - \bar{Y}_k}{d_k}\, a_{ki}\right) + \frac{\omega_k\, a_{ki}}{\rho\, d_k} . \qquad (4.48)$$

Summing over the p variables:

$$\frac{D}{D t}\sum_{k=1}^{p}\frac{Y_k - \bar{Y}_k}{d_k}\, a_{ki} = \frac{\lambda}{\rho c_p}\frac{\partial^2}{\partial x_j^2}\left[\sum_{k=1}^{p}\frac{Y_k - \bar{Y}_k}{Le_k\, d_k}\, a_{ki}\right] + \frac{1}{\rho}\sum_{k=1}^{p}\frac{\omega_k\, a_{ki}}{d_k} . \qquad (4.49)$$

Recognizing the ith score, $z_i = \sum_{k=1}^{p} a_{ki}\,(Y_k - \bar{Y}_k)/d_k$, and assuming equal Lewis numbers, $Le_k = Le$, this reduces to:

$$\frac{D z_i}{D t} = \frac{\lambda}{\rho c_p Le}\frac{\partial^2 z_i}{\partial x_j^2} + \omega_{z_i} , \qquad (4.50)$$

where the source term of the ith PC is

$$\omega_{z_i} = \frac{1}{\rho}\sum_{k=1}^{p}\frac{\omega_k\, a_{ki}}{d_k} . \qquad (4.51)$$

When temperature is included in the state vector, the heat release rate $Q_r$ also contributes, through the temperature weight $a_{Ti}$ and scaling factor $d_T$:

$$\omega_{z_i} = \frac{a_{Ti}\, Q_r}{\rho c_p d_T} + \frac{1}{\rho}\sum_{k=1}^{p}\frac{\omega_k\, a_{ki}}{d_k} . \qquad (4.52)$$

In compact vector form, the transport equations for the scores read:

$$\frac{\partial \rho \mathbf{Z}}{\partial t} + \nabla\cdot\left(\rho \mathbf{u}\,\mathbf{Z}\right) = -\nabla\cdot \mathbf{j}_Z + \rho\,\boldsymbol{\omega}_Z , \qquad (4.53)$$

where $\mathbf{j}_Z$ is the mass diffusive flux of $\mathbf{Z}$. In Eq. (4.53), the source terms of temperature and all species contribute to the source term for each PC.
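Once the retained eigenvectors and scaling factors are known, the projection of Eq. (4.51) is a single matrix operation per sample; a sketch with assumed array shapes (names are illustrative):

```python
import numpy as np

def pc_source_terms(omega, A, d, rho):
    """Project species source terms onto the PCs, cf. Eq. (4.51):
    omega_z_i = (1/rho) * sum_k omega_k * a_ki / d_k.
    omega: (n, p) species source terms; A: (p, q) retained eigenvectors;
    d: (p,) scaling factors; rho: (n,) density."""
    return (omega / d) @ A / rho[:, None]
```

The same expression, evaluated on a high-fidelity data set, provides the source terms whose parametrization is studied in Section 4.6.4.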
4.6.2
A complete PCA modeling approach requires several ingredients. First, the PCs
must be identified using the procedure outlined in Section 4.1. This identification requires high-fidelity, fully-resolved data including source terms. Once the
PCs are selected, transport equations may be derived for each PC as described
in Section 4.6.1.
Second, the initial conditions (ICs) and boundary conditions (BCs) on the
PCs must be defined using the transformation matrix A. For Dirichlet BCs
on all the original variables, we obtain Dirichlet conditions on the PCs (ICs
are analogously defined). Likewise, Neumann conditions on X yield Neumann
conditions on Z. Mixed conditions on X yield Robin boundary conditions on
Z.
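The mapping of boundary values follows directly from the definition of the scores; a sketch (function and variable names are illustrative), assuming the same centering and scaling used to build the PCA:

```python
import numpy as np

def pc_boundary_values(x_bc, x_mean, d, A_q):
    """Map a Dirichlet boundary state for the original variables to
    Dirichlet values for the PCs: z = (x - x_mean) D^{-1} A_q, where
    D = diag(d) holds the scaling factors and A_q the retained
    eigenvectors."""
    return ((x_bc - x_mean) / d) @ A_q
```

A boundary state equal to the centering vector maps to zero scores, and Neumann or mixed conditions transform through the same linear operator, which is why they remain Neumann or Robin conditions on Z.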
Diffusion terms in the transport equations for Z require evaluation of the
diffusive fluxes for each component of X. In turbulent flow calculations, the
molecular diffusion term is typically augmented by a turbulent diffusion term
4.6.3
This section presents the results of PCA applied to two DNS data sets of non-premixed CO/H2 combustion. The DNS data sets (Case A and Case B) have been obtained using a code with 8th-order spatial and 4th-order temporal discretization. Detailed kinetics of CO/H2 oxidation have been used [114], along with mixture-averaged transport approximations. The fuel stream is 45% CO, 5% H2 and 50% N2 (by volume), giving a stoichiometric mixture fraction of fst = 0.4375, and both fuel and air streams are at 300 K.
Case A is a spatially-evolving jet with an initial χmax = 25 s⁻¹, while Case B is a temporally-evolving jet with an initial χmax = 125 s⁻¹. The primary difference between the two data sets is the initial scalar dissipation rate (χ) and turbulence intensity, which affects the degree of extinction observed; Case A exhibits virtually no extinction, while Case B exhibits moderate extinction. The existence of moderate extinction in Case B is shown qualitatively in Figure 4.31 (a), which shows T versus χ at fst (footnote 4). Additional details of the DNS code and simulation configuration may be found elsewhere [120, 88].
To quantify the error in representing the data in the low-dimensional space parametrized by Z, we calculate the R² value,

$$R_j^2 = 1 - \left[\sum_{i=1}^{n}\left(x_{ij} - \hat{x}_{ij}\right)^2\right]\left[\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2\right]^{-1} \qquad (4.54)$$

4 The results shown in this Section refer to data conditioned on mixture fraction, f, since this is a convenient variable to force as the first component.
Figure 4.31: Parametrization of temperature at fst by χ (a) and z1 (b) for Case B. Solid lines are the doubly-conditional mean temperature. R² is calculated from Eq. (4.54).
where xij is the ith observation of the jth variable, x̂ij is its parametrized approximation, and x̄j is the mean of xj. For the state variables, R² is equivalent to the parameter tq,j introduced in Section 4.2.3.1. For the source terms, however, such a parameter is not available and R² is calculated directly.
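Eq. (4.54) is the usual coefficient of determination, computed per variable; a one-function sketch:

```python
import numpy as np

def r_squared(x, x_hat):
    """R^2 of Eq. (4.54) for one variable: 1 - SS_res / SS_tot,
    where x holds the observations and x_hat their parametrized
    approximations."""
    return 1.0 - ((x - x_hat) ** 2).sum() / ((x - x.mean()) ** 2).sum()
```

A perfect parametrization gives R² = 1, while predicting the plain mean gives R² = 0; values near zero (or negative) therefore flag variables that the chosen parameters fail to capture.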
Figures 4.31 (a) and 4.31 (b) show the parametrization (at fst) of T by χ and by the first PC, z1, respectively, for Case B. Examining Figure 4.31 (b), we see that z1 acts as a progress variable, capturing the extinction process remarkably well. This has also been observed for other choices of progress variables, such as CO2 [88]. Comparing the two-parameter PCA approach with the (f, χ) parametrization is reasonable since both are two-parameter models, although the second parameter (χ versus z1) represents different physical phenomena (a gradient versus the chemical state). Figure 4.32 shows the parametrization of the OH mass fraction by the common (f, χ) and the proposed (f, z1) parametrizations. This demonstrates that the PCA approach can be used to represent a wide range of the state variables, not temperature alone.
Also shown in Figures 4.31 (a) and 4.31 (b) is the R² value calculated from Eq. (4.54). Table 4.17 lists the R² values for the reconstruction of the temperature and of all species mass fractions as a function of the number of parameters adopted, q. These values are a concise, quantitative representation of the information presented graphically in Figures 4.31 and 4.32. For example, for Case B with q = 1, we obtain R²(T) = 0.967, corresponding to Figure 4.31 (b). For comparison, Table 4.17 also lists the R² values given by the (f, χ) parametrization, R²(T) = 0.801 (Figure 4.31 (a)). Clearly, the two-parameter (f, z1) parametrization reconstructs the temperature and most other state variables with much higher accuracy than the (f, χ) parametrization. It should be noted that the results for the (f, χ) parametrization represent the best possible performance of a model based on (f, χ); the steady laminar flamelet model typically does not perform ideally [88].
Table 4.17: R² values defined by Eq. (4.54). Also shown are results for the (f, χ) parametrization. All results are at f = fst = 0.4375.

              T      H2     O2     O      OH     H2O    H      HO2    H2O2   CO     CO2    HCO
A  (f, χ)   0.789  0.344  0.811  0.718  0.165  0.085  0.695  0.839  0.816  0.803  0.827  0.828
   q = 1    0.983  0.259  0.976  0.930  0.240  0.178  0.823  0.986  0.916  0.978  0.956  0.980
   q = 2    0.983  0.936  0.968  0.958  0.963  0.924  0.964  0.980  0.985  0.969  0.976  0.980
B  (f, χ)   0.801  0.509  0.807  0.697  0.426  0.186  0.648  0.665  0.729  0.810  0.058  0.817
   q = 1    0.967  0.370  0.910  0.614  0.736  0.531  0.524  0.940  0.849  0.907  0.094  0.901
   q = 2    0.996  0.845  0.982  0.882  0.931  0.990  0.858  0.974  0.941  0.981  0.378  0.984
   q = 3    0.990  0.904  0.982  0.984  0.979  0.991  0.985  0.977  0.933  0.981  0.854  0.980
Table 4.17 also demonstrates that increasing the number of retained PCs
increases the accuracy with which the state variables are represented. This
indicates that one may select a desired error threshold and then determine the
minimum number of PCs required to achieve that accuracy. Conversely, one
may choose the number of PCs and estimate a priori the associated error.
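The threshold-driven selection of q suggested here can be sketched as a simple loop over candidate dimensionalities; a sketch assuming auto scaling and a single global PCA (the thesis procedure may differ in its error metric):

```python
import numpy as np

def choose_q(X, r2_min):
    """Return the smallest number of PCs whose reconstruction reaches
    R^2 >= r2_min for every (auto-scaled) state variable."""
    Xs = (X - X.mean(0)) / X.std(0)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    for q in range(1, X.shape[1] + 1):
        A = Vt[:q].T                      # q retained eigenvectors
        Xr = Xs @ A @ A.T                 # rank-q reconstruction
        r2 = 1.0 - ((Xs - Xr) ** 2).sum(0) / (Xs ** 2).sum(0)
        if (r2 >= r2_min).all():
            return q
    return X.shape[1]
```

Running the same loop with a fixed q instead of a fixed threshold gives the converse a priori error estimate mentioned above.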
4.6.4
The PCs are not conserved variables and their source terms must be parametrized by the PCs. In this section we explore the ability of PCA to parametrize source terms. Any function of X may be approximated by F(X) ≈ F̃(Z), with Z = XAq. However, it is more accurate to calculate F(X) directly from the data in p-dimensional space and then project it onto Z by calculating the conditional mean ⟨F(X)|Z⟩. Thus, source terms are calculated directly from the original observables, X, and their conditional means are projected onto Z. Figure 4.33 illustrates this for the two-dimensional (f, z1) parametrization of ωz1.
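The conditional-mean projection ⟨F(X)|Z⟩ can be approximated by binning the samples in the reduced variable; a one-dimensional sketch (the bin count and edge handling are arbitrary choices):

```python
import numpy as np

def conditional_mean(z, F, n_bins=50):
    """Approximate <F|z> by binning: returns the bin centers and the
    mean of F within each bin (NaN for empty bins)."""
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=F, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore"):
        return centers, sums / counts
```

Extending the binning to two parameters, e.g. (f, z1), gives the doubly-conditional means shown as solid lines in Figures 4.31 and 4.33.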
Table 4.18 summarizes the ability of a q-dimensional PCA to parametrize the source terms of the PCs. We first consider the columns describing the results at fst. For Case A, a two-dimensional parametrization (f, z1) captures ωz1 with R² = 0.978. For Case B, 3 PCs are required to parametrize ωz1 to a similar degree of accuracy. Comparing the dimensionality requirements for parametrizing ωZ with those for parametrizing the state variables (Table 4.17), we see that parametrizing the source terms does not require more PCs than the parametrization of the state variables themselves, an encouraging result.
Figure 4.33: Parametrization of ωz1 at fst by z1 for Case B. Solid line: doubly-conditional mean value of ωz1. R² is calculated from Eq. (4.54).
Table 4.18: R² values for the parametrization of the PC source terms ωz1, ωz2 and ωz3 of Cases A and B at f = 0.2, f = fst = 0.4375 and f = 0.6, as a function of the number of retained parameters, q.
4.6.5
The results presented thus far have been obtained locally at fst. One may consider whether a PCA performed at fst is applicable at other f. We term this a semi-local PCA. If the PCA is highly dependent on mixture fraction, then one of two options must be considered:

- Eliminate the mixture fraction as a parameter and seek a global PCA on the entire data set. This approach typically requires more PCs than a PCA obtained at fst (Section 4.5.1).

- Perform a local PCA (Section 4.5.1) in f-space and derive transport equations for Z|f. These equations would have exchange terms representing transport in mixture fraction space. This approach is further complicated by the fact that the definition of the PCs would vary with f.

If the PCA obtained at fst reasonably represents the data at other f, then the transport equations derived in Section 4.6.2 may be used directly at all f, eliminating the need for conditional equations in f-space.
Tables 4.19 and 4.20 provide the parametrization errors for the state variables at f = 0.2 and f = 0.6, respectively. Table 4.18 shows the parametrization errors for ωZ at f = 0.2 and f = 0.6. Interestingly, the parametrizations do not perform well at lean conditions (especially for Case B); the same is true for the (f, χ) parametrization. A posteriori testing is necessary to fully determine the parametrization accuracy required. However, these results show promise for the ability to use a PCA obtained at fst globally.
4.7
Summary
In the first part of the present Chapter, a novel methodology based on Principal Components Analysis (PCA) has been proposed for the identification of low-dimensional manifolds in turbulent flames, the estimation of their dimensionality and the selection of optimal reaction variables. To this purpose, high-fidelity experimental and numerical data sets have been investigated. Three different PCA approaches have been proposed. A global PCA analysis, GPCA, has been compared to two local PCA models, VQPCA and FPCA, based on the partitioning of the data into separate clusters where PCA is performed locally. The partitioning algorithm used by VQPCA is unsupervised and based on reconstruction-error minimization, while FPCA conditions the data a priori on the mixture fraction. Results show that the local PCA approaches (VQPCA and FPCA) outperform the global approach in all cases. Indeed, GPCA is unable to provide a compact representation of the data in a low-dimensional space due to the highly non-linear relationships existing among the state variables. Regarding the local approaches, the performances of VQPCA and FPCA are comparable for the simple jet flame, while FPCA proves unable to capture important features of systems characterized by complex non-equilibrium phenomena
Table 4.19: R² values at f = 0.2 using the PCA obtained at fst. Also shown are results for the (f, χ) parametrization.

              T      H2     O2     O      OH     H2O    H      HO2    H2O2   CO     CO2    HCO
A  (f, χ)   0.097  0.798  0.169  0.774  0.736  0.245  0.827  0.812  0.811  0.580  0.432  0.881
   q = 1    0.500  0.413  0.816  0.212  0.188  0.134  0.319  0.433  0.398  0.666  0.555  0.619
   q = 2    0.968  0.910  0.881  0.868  0.859  0.940  0.888  0.838  0.855  0.867  0.940  0.934
B  (f, χ)   0.497  0.542  0.390  0.303  0.329  0.269  0.558  0.537  0.390  0.417  0.206  0.689
   q = 1    0.979  0.741  0.866  0.337  0.219  0.749  0.127  0.805  0.858  0.859  0.513  0.403
   q = 2    0.996  0.877  0.945  0.819  0.822  0.994  0.806  0.970  0.960  0.958  0.737  0.860
   q = 3    0.990  0.963  0.958  0.989  0.977  0.994  0.978  0.984  0.982  0.968  0.808  0.955
Table 4.20: R² values at f = 0.6 using the PCA obtained at fst. Also shown are results for the (f, χ) parametrization.

              T      H2     O2     O      OH     H2O    H      HO2    H2O2   CO     CO2    HCO
A  (f, χ)   0.676  0.190  0.740  0.642  0.548  0.073  0.434  0.741  0.727  0.467  0.572  0.555
   q = 1    0.956  0.287  0.958  0.887  0.587  0.076  0.542  0.966  0.867  0.836  0.868  0.751
   q = 2    0.959  0.962  0.949  0.923  0.775  0.826  0.768  0.955  0.898  0.911  0.919  0.889
B  (f, χ)   0.628  0.081  0.593  0.662  0.112  0.246  0.365  0.508  0.616  0.521  0.268  0.570
   q = 1    0.964  0.134  0.904  0.804  0.197  0.755  0.721  0.938  0.650  0.844  0.442  0.896
   q = 2    0.984  0.612  0.928  0.836  0.373  0.986  0.822  0.960  0.791  0.873  0.542  0.930
   q = 3    0.986  0.769  0.948  0.888  0.543  0.991  0.913  0.967  0.839  0.909  0.841  0.941