Prediction of ITQ-21 Zeolite Phase Crystallinity: Parametric Versus Non-Parametric Strategies
DOI: 10.1002/qsar.200620064
Abstract
This work deals with data analysis techniques and high-throughput tools for synthesis and
characterization of solid materials. In previous studies, it was found that the final
properties of materials could be successfully modeled using learning systems. Machine
learning algorithms such as neural networks, support vector machines, and regression
trees are non-parametric strategies. They are compared to traditional parametric statistical
approaches. We review a wide range of statistical methodologies, and all the methods are
evaluated using experimental data derived from an exploration-optimization of the
material ITQ-21. The results are judged on the numerical prediction of the phase's crystallinity.
We discuss the theoretical aspects of such statistical techniques, which make them
an attractive method when compared to other learning strategies for modeling the
properties of the solids. Advantages and drawbacks are highlighted. We show that such
approaches, by offering broad solutions, can reach high-level performances while offering
ease of use, comprehensibility, and control. Finally, we shed light on both the interpretation
and stability of results, which remain the main drawbacks of the majority of
machine learning methodologies when trying to retrieve knowledge from the data
treatment.
QSAR Comb. Sci. 00, 0000, No. &, 1 – 18 © 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
These are not the final page numbers!
Full Papers Laurent A. Baumes et al.
equal to the MSPT amount to get neutral pH. The distribution of experiments comes from a full factorial design with Si/Ge, H2O, (MSPT & F)/(Si + Ge), Al/(Si + Ge), and the synthesis duration, noted t, at 4, 3, 2, 3, and 2 levels, respectively. Each experiment is therefore one combination among the 144 possible. All the experiments were carried out in a random order.

The reagents employed for gel syntheses were ammonium fluoride (98%, Aldrich), germanium oxide (99.998%, Aldrich), aluminum isopropoxide (98%, Aldrich), methyl-sparteine, LUDOX (AS 40 wt%, Aldrich), MilliQ water (Millipore) and N(16)-methyl-sparteinium hydroxide. Automated gel synthesis was done inside Teflon vials (3 mL), which were finally inserted in a multi-autoclave of 15 positions, sealed with a Teflon-lined stainless-steel tip, and subsequently allowed to crystallize at 175 °C. The samples were then washed and filtered in parallel and dried at 100 °C overnight. Finally, the samples were weighed and characterized by XRD using a multi-sample Philips X'Pert diffractometer employing Cu Kα radiation. The standard X-ray diffractogram for ITQ-21 is shown in Figure 1b. The occurrence and crystallinity were calculated by integrating the area of the characteristic peaks. For ITQ-21, the range for the angle 2θ lies between 25.7° and 26.5°.

2.2 Computational Methods

In regression problems, the objective is to estimate the value of a continuous output variable, which in our case is the crystallinity of a given phase, from input variables such as the synthesis parameters. All the techniques used in this study are briefly described except NNs, which have already received considerable attention; see [15 – 20] for recent applications in material science, and [25, 26] for more technical explanations. In order to provide a fair comparison between the techniques investigated, 28% of the data, chosen randomly among the whole available dataset of 144 distinct experiments, is kept unused for the evaluation of model generalization.

2.2.1 Multiple Linear Regression (MLR)

An MLR model specifies the relationship between one dependent variable y and a set of predictor variables X, so that $y = b_0 + \sum_{i=1}^{k} b_i x_i$, where the $b_i$ are the regression coefficients.

2.2.2 Generalized Linear Model (GLZ)

GLZ can be used to predict responses both for dependent variables with discrete distributions and for dependent variables which are non-linearly related to the predictors. GLZ differs from the linear model mainly in the following major aspects: (i) the distribution of the dependent variable can be explicitly non-normal, and (ii) the dependent variable values are predicted from a linear combination of predictor variables, which are connected to the dependent variable via a function called the link function. The relationship in GLZ is assumed to be $y = g(b_0 + b_1 x_1 + \dots + b_k x_k) + e$, where e stands for the error variability. The inverse function $g^{-1} = f$ is the link function, so that $f(\tilde{y}) = b_0 + \sum_{i=1}^{k} b_i x_i$, where $\tilde{y}$ stands for the expected value of y. For additional information about GLZ, see [27, 28].

2.2.3 Piecewise Linear Regression (PLR)

This model specifies a common intercept $b_0$ and a slope that is equal to either $b_1$ if $y \le 100$ or $b_2$ otherwise. For a problem with only two variables, the model is $y = b_0 + b_1 x\,(y \le 100) + b_2 x\,(y > 100)$. Stepwise model-building techniques for regression designs with a single dependent variable are described in numerous sources [29, 30].

2.2.4 SVMs as Regression Tool

A general introduction to SVMs was already presented in [20]. With ε-SV regression [31], the goal is to find a function f(x) that has at most ε deviation from the target $y_i$ for all the training data and that is, at the same time, as flat as possible. Formally, the problem is written as a convex optimization problem.

2.2.5 Regression Trees (RTs)

Regression trees may be considered as a variant of decision trees, designed to approximate real-valued functions instead of being used for classification tasks. An RT is built through a process known as binary recursive partitioning. This is an iterative process of splitting the data into partitions, and then splitting it up further on each of the branches. In our experiments the classical C&RT [32] tree is used.

3 Results of Parametric Statistics and Prediction of ITQ-21 Phase Crystallinity

3.1. Experimental Results

Figure 2 shows the effect of each synthesis variable on ITQ-21 crystallinity. ITQ-21 is favored by some combinations of synthesis variables. The highest values of crystallinity appear in concentrated gels [H2O/(Si + Ge) < 5] with low Si/Ge ratios. The presence of the zeolite is affected by the aluminum content: the more aluminum, the lower the crystallinity. Furthermore, high MSPT/(Si + Ge) and F/(Si + Ge) ratios play positive roles in the formation of ITQ-21.
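As a minimal sketch of the experimental design described above, the following Python snippet enumerates a full factorial design with five factors at 4, 3, 2, 3, and 2 levels and sets aside 28% of the runs for generalization tests. The level indices are placeholders (the actual level values are not listed in this section), and the function names and the random seed are illustrative assumptions.

```python
import itertools
import random

# Number of levels per factor, in the order given in the text.
# The level *values* are not listed here, so integer indices stand in for them.
LEVELS = {
    "Si/Ge": 4,
    "H2O": 3,
    "(MSPT & F)/(Si + Ge)": 2,
    "Al/(Si + Ge)": 3,
    "t": 2,
}


def full_factorial(levels):
    """Return every combination of level indices, one tuple per experiment."""
    ranges = [range(n) for n in levels.values()]
    return list(itertools.product(*ranges))


def holdout_split(design, test_fraction=0.28, seed=0):
    """Shuffle the runs and keep a fraction aside (assumed random hold-out)."""
    runs = list(design)
    random.Random(seed).shuffle(runs)
    n_test = round(test_fraction * len(runs))
    return runs[n_test:], runs[:n_test]


design = full_factorial(LEVELS)
train, test = holdout_split(design)
print(len(design), len(train), len(test))
```

With 144 runs, a 28% hold-out corresponds to 40 experiments, which leaves 104 runs for model building.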
Table 1. Standardized regression coefficients (b) and raw regression coefficients (B) for the MLR.

Variable    b          B         t(138)    p-Level    Partial    Semi-partial  Tolerance  R2
Intercept   –          65.152    9.7772    0.000000   –          –             –          –
t           0.094280   1.205     1.7771    0.077757   0.149574   0.094273      0.999844   0.000156
Si/Ge       0.403721   0.741     7.6099    0.000000   0.543687   0.403695      0.999875   0.000125
Al/c        0.231247   317.365   4.3587    0.000025   0.347867   0.231227      0.999829   0.000171
MSPT/c      0.111600   22.344    2.1036    0.037230   0.176265   0.111593      0.999870   0.000130
H2O/c       0.611522   4.725     11.5273   0.000000   0.700392   0.611513      0.999971   0.000029

The magnitude of b allows comparing the relative contribution of each independent variable to the prediction of the dependent variable. The squared semi-partial correlation indicates the percentage of total variance uniquely accounted for by the respective independent variable, while the squared partial correlation indicates the percentage of residual variance accounted for after adjusting the dependent variable for all other independent variables. Gray cells indicate that the estimates are significant (α = 5%).
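The partial correlations of Table 1 can be cross-checked from the t statistics with the standard identity r_partial = t / sqrt(t^2 + df), where df = 138 is the residual number of degrees of freedom. The sketch below is only an illustration and not the software used in the study; the dictionary and function names are assumptions, and all values are taken as magnitudes, as printed in the table.

```python
import math

DF = 138  # residual degrees of freedom, from the t(138) column of Table 1

# variable: (t statistic, partial correlation, semi-partial correlation)
TABLE1 = {
    "t": (1.7771, 0.149574, 0.094273),
    "Si/Ge": (7.6099, 0.543687, 0.403695),
    "Al/c": (4.3587, 0.347867, 0.231227),
    "MSPT/c": (2.1036, 0.176265, 0.111593),
    "H2O/c": (11.5273, 0.700392, 0.611513),
}


def partial_from_t(t_value, df=DF):
    """Partial correlation implied by a regression t statistic."""
    return t_value / math.sqrt(t_value ** 2 + df)


for name, (t_value, partial, _) in TABLE1.items():
    print(f"{name:7s} recomputed={partial_from_t(t_value):.4f} table={partial:.4f}")

# With nearly uncorrelated predictors, squared semi-partials add up to R2.
sum_sq_semi = sum(semi ** 2 for _, _, semi in TABLE1.values())
print(f"sum of squared semi-partial correlations = {sum_sq_semi:.4f}")
```

Because the tolerances in Table 1 are close to 1 (nearly uncorrelated predictors, as expected for a factorial design), the squared semi-partial correlations sum to about 0.612, in agreement with the R2 of 0.61164 reported for the MLR.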
Figure 3. Prediction of ITQ-21 phase crystallinity with synthesis variables as input, for GLZs using Gamma distribution with log
link function, normal distribution with identity link function, and normal distribution with log link function as modeling approaches.
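To make the role of the link function in the GLZ variants of Figure 3 concrete, here is a small sketch of the prediction step y = g(b0 + Σ b_i x_i) for the identity and log links. The coefficients and inputs are hypothetical placeholders, not estimates from this study, and the function names are illustrative.

```python
import math

# Inverse link functions g: map the linear predictor to the expected response.
INVERSE_LINKS = {
    "identity": lambda eta: eta,  # e.g., normal distribution, identity link
    "log": math.exp,              # e.g., normal or Gamma distribution, log link
}


def linear_predictor(intercept, coefs, x):
    """b0 + sum_i b_i * x_i"""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def glz_predict(intercept, coefs, x, link="identity"):
    """Expected response for one experiment under the chosen link."""
    return INVERSE_LINKS[link](linear_predictor(intercept, coefs, x))


# Hypothetical coefficients and one hypothetical experiment (illustration only).
b0, b = 1.0, [-0.5, 0.2]
x = [4.0, 1.0]

print(glz_predict(b0, b, x, "identity"))  # equals the linear predictor
print(glz_predict(b0, b, x, "log"))       # exponential of the linear predictor
```

With a log link the predicted value is always positive, whereas an identity link can return negative values when the linear predictor is negative.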
3.2. MLR and First Inspection of the Dataset

In Figure 3, the MLR is calculated with the synthesis variables as input. The real ITQ-21 phase crystallinity is indicated on the y-axis while the expected one is represented on the x-axis. The adjustment was R2 = 0.61164 [F(5,138) = 43.46; p = 0.00000]. According to this method, 61.16% of the original variability has been explained, and (1 − R2) is the residual variability. Regression coefficients are given in Table 1, where highlighted values (gray background color) are significant. As indicated by the b values, Si/Ge and H2O/(Si + Ge) (respectively, variables 3 and 6) are the most important predictors of ITQ-21 phase crystallinity, and all are statistically significant (p < 5%) except for the synthesis duration (variable 2). If the risk α is increased to 10%, variable 2 becomes significant. A non-significant p-value does not mean that the null hypothesis* is true. It simply means that this dataset is not strong enough to convince that the null hypothesis is not true. Concluding that a value is not statistically significant when the null hypothesis is false is called a Type II error. For more details on this aspect see [23].

* A significance test is performed to determine if an observed value of a statistic differs enough from a hypothesized value of a parameter to draw the inference that the hypothesized value of the parameter is not the true value. The hypothesized value of the parameter is called the "null hypothesis".

Another way of looking at the unique contributions of each independent variable is to compute the partial and semi-partial correlations. In Table 1, partial correlations are the correlations between the respective independent variables adjusted by all other variables, and the dependent variable adjusted by all other variables. The semi-partial correlation is the correlation of the respective independent variable adjusted by all other variables, with the raw dependent variable. The values in Table 1 for such partial and semi-partial correlations appear relatively similar and confirm the trends observed with the b values. In Table 2, the partial correlation measures the correlation between two variables that remains after partialling out one other variable (indicated with " "), while the correlation coefficient does not take such control into account. It can be observed that the correlations, and partial correlations, between each variable and ITQ-21 crystallinity are quite similar. However, one can note that without considering the effect of H2O (i.e., fifth column of partial correlations in bold) the correlation between Si/Ge and ITQ-21 crystallinity decreases by ten points. Actually, a similar jump is observed for all the correlations when H2O is partialled out; in the case of a positive response (MSPT or F) such effects are increased, while negative partial correlations are decreased. Surprisingly, it seems that an increase in H2O, which has a globally negative effect on ITQ-21 crystallinity, has a favorable effect on the negative relations and an unfavorable one on the only positive relation when combined with the other variables.

Moreover, it is shown that the three variables that have the greatest influence on the formation of ITQ-21 are H2O, Si/Ge, and Al content. For the levels chosen in the present work, the water content is the variable that has the largest influence on ITQ-21 crystallinity. This phase prefers concentrated gels with H2O/(Si + Ge) values less than 5. This can also indicate that a high concentration of F− has a positive effect on crystallization. The content of Ge in the framework of ITQ-21 is a critical factor. When the content of Ge decreases in the starting gel, the rate of crystallization of ITQ-21 decreases, and for high values of Si/Ge (> 30), only small amounts of ITQ-21 (low crystallinity) are achieved with the set of times and temperature reported here. Finally, the other factor that is statistically interesting is the Al content. The highest values of crystallinity have been obtained at low levels of Al. The reason is that the number of framework negative charges introduced by Al, which have to be neutralized by the Organic Structure Directing Agent (OSDA), is limited, because the OSDA also has to compensate the F− located within the double four-member rings [33], and the void volume of the ITQ-21 structure can fit only a limited number of MSPT cations. It has to be noted that H2O has the largest effect on crystallization only when considering the chosen ranges of variation.

The use of parametric procedures allows taking advantage of the whole theory behind the model. However, assumptions should always be verified first, otherwise the conclusions may not be accurate. For example, in MLR, it is assumed that the residuals are normally distributed. Many tests are robust with regard to violations of this assumption. The normal probability plot of residuals gives a quick indication of whether or not violations have occurred. If the observed residuals (plotted on the x-axis of Figure 4) are normally distributed, then all values should fall onto a straight line. If the residuals are not normally distributed, then they will deviate from the line. Figure 4 shows a particular lack of fit: the data seem to form an S-shape around the line. This pattern is characteristic of cases where the dependent variable may have to be transformed through a log-transformation to pull in the tails of the distribution.

Another important step when building models is the detection of outliers. If one experiment is clearly an outlier, then there is a tendency for the regression line to be pulled by this outlier. As mentioned before, one can say that such a deviation would be rather low compared to the consequences (overfitting) which might be observed using ML models. As a result, if the respective cases were excluded, different B coefficients would be found. Figure 5 shows the "deleted residual" statistic, which is the standardized residual for the respective case that one would obtain if the case was excluded from the analysis. Therefore, if the deleted residual is different from the standardized residual, the regression analysis may be biased by the given case. However, such a case does not belong to our experimental dataset and therefore the entire set is kept. Another inter-
Table 2. Partial correlations and correlation coefficients between all variables involved in the synthesis study.

             ITQ-21 partial correlation, variable partialled out:     ITQ-21
Variable     Time     Si/Ge    Al       MSPT or F   H2O               correlation
Time         –        0.10     0.09     0.09        0.12              0.09
Si/Ge        0.40     –        0.41     0.41        0.51              0.40
Al           0.23     0.26     –        0.23        0.29              0.23
MSPT or F    0.11     0.12     0.11     –           0.13              0.11
H2O          0.62     0.67     0.63     0.62        –                 0.61

The partial correlation measures the correlation between two variables that remains after controlling for (i.e., partialling out) one or more other variables. Gray cells contain significant values at p < 0.05.
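Two numerical cross-checks of the values reported for the MLR and in Table 2 are sketched below. The first recomputes the overall F statistic from R2. The second applies the first-order partial correlation formula under the assumption that the predictors are mutually uncorrelated (the tolerances in Table 1 are close to 1) and uses the standardized coefficients of Table 1 as magnitudes of the correlations with crystallinity. The function names are illustrative.

```python
import math


def f_statistic(r2, n, p):
    """Overall F statistic of a linear model with p predictors and n runs."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def partial_corr(r_xy, r_zy, r_xz=0.0):
    """Correlation of x with y after partialling out z (first order).

    r_xz defaults to 0, i.e., the predictors are assumed uncorrelated.
    """
    return (r_xy - r_xz * r_zy) / math.sqrt((1 - r_xz ** 2) * (1 - r_zy ** 2))


# Values reported for the MLR: R2 = 0.61164, 144 experiments, 5 predictors.
print(round(f_statistic(0.61164, n=144, p=5), 2))

# Magnitudes of the correlations with crystallinity (Table 1, column b).
r_sige, r_h2o = 0.403721, 0.611522
print(round(partial_corr(r_sige, r_h2o), 2))  # Si/Ge with H2O partialled out
print(round(partial_corr(r_h2o, r_sige), 2))  # H2O with Si/Ge partialled out
```

The recomputed F value is close to the reported F(5,138) = 43.46, and the two partial correlations match the 0.51 and 0.67 entries of Table 2 in magnitude.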
Figure 4. Normal probability plot of residuals for the ITQ-21 phase crystallinity linear model. This visualization procedure makes it possible to quickly examine whether the assumption of normally distributed residuals is respected. The tails show an S-shape pattern.

Figure 5. Residuals vs. deleted residuals plot. This technique makes it possible to separate outliers from the dataset when the latter are relatively far from the line.
Table 3. Estimates of GLZ models using different configurations of effects. Gray cells contain significant values at p < 0.05.

Columns: MR = multiple regression; FF = full factorial; P2 = polynomial of degree 2; QRS = quadratic response surface regression; FrF3 = fractional factorial to degree 3; FrF2 = fractional factorial to degree 2. A dash marks an effect that is not part of the model.

Effect                        MR         FF         P2         QRS        FrF3       FrF2
Intercept
  B0                          5.8659     5.446      6.9819     3.4273     2.535      5.6146
First order (main effects)
  t                           0.0552     0.324      0.0485     0.0346     0.004      0.0240
  Si/Ge                       0.0672     0.032      0.0255     0.0191     0.051      0.0256
  Al/c                        13.7012    61.500     11.3677    1.3192     143.024    10.6196
  MSPT/c                      0.7943     0.482      11.1976    3.2865     4.124      0.6261
  H2O/c                       0.3214     0.247      0.3193     0.7122     0.665      0.0037
Two-way interaction
  t.(Si/Ge)                   –          0.006      –          0.0016     0.002      0.0023
  t.(Al/c)                    –          6.832      –          0.6027     8.681      0.2978
  t.(MSPT/c)                  –          0.655      –          0.0004     0.501      0.0192
  t.(H2O/c)                   –          0.076      –          0.0190     0.060      0.0085
  (Si/Ge).(Al/c)              –          2.568      –          0.1918     3.816      0.1389
  (Si/Ge).(MSPT/c)            –          0.103      –          0.0327     0.129      0.0242
  (Si/Ge).(H2O/c)             –          0.003      –          33.8077    0.007      0.0171
  (Al/c).(MSPT/c)             –          162.920    –          0.0270     225.243    29.0218
  (Al/c).(H2O/c)              –          27.895     –          9.0908     49.030     6.5146
  (H2O/c).(MSPT/c)            –          0.583      –          0.4554     1.037      0.3316
Three-way interaction
  t.(Si/Ge).(Al/c)            –          1.203      –          –          0.379      –
  t.(Si/Ge).(MSPT/c)          –          0.028      –          –          0.011      –
  t.(Si/Ge).(H2O/c)           –          0.009      –          –          0.002      –
  t.(Al/c).(MSPT/c)           –          18.467     –          –          0.266      –
  t.(Al/c).(H2O/c)            –          5.623      –          –          1.135      –
  t.(MSPT/c).(H2O/c)          –          0.145      –          –          0.126      –
  (Si/Ge).(Al/c).(MSPT/c)     –          9.108      –          –          5.223      –
  (Si/Ge).(Al/c).(H2O/c)      –          0.486      –          –          0.095      –
  (Si/Ge).(MSPT/c).(H2O/c)    –          0.064      –          –          0.004      –
  (Al/c).(MSPT/c).(H2O/c)     –          100.063    –          –          80.883     –
{4...5}-way                   ... ...
Second order
  t^2                         –          –          0.0000     0.0000     –          –
  (Si/Ge)^2                   –          –          0.0002     0.0003     –          –
  (Al/c)^2                    –          –          33.0816    65.1404    –          –
  (MSPT/c)^2                  –          –          15.6960    5.3025     –          –
  (H2O/c)^2                   –          –          0.0935     0.0751     –          –
Scale                         10.2652    7.724      9.8454     7.5255     7.812      8.7050
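The effect configurations compared in Table 3 differ only in which terms enter the linear predictor. The following sketch builds the term lists for the five predictors and counts them. The configuration names follow the table headers, while the helper names and the reading of "full factorial" as all interactions up to order five are assumptions made for illustration.

```python
from itertools import combinations

PREDICTORS = ["t", "Si/Ge", "Al/c", "MSPT/c", "H2O/c"]


def interactions(names, order):
    """All products of `order` distinct predictors, written as 'a.b.c'."""
    return [".".join(combo) for combo in combinations(names, order)]


def squares(names):
    """Pure second-order terms, one per predictor."""
    return [f"({name})^2" for name in names]


def terms(config, names=PREDICTORS):
    """Terms of the linear predictor (intercept excluded) for a configuration."""
    main = list(names)
    if config == "multiple regression":
        return main
    if config == "polynomial degree 2":
        return main + squares(names)
    if config == "quadratic response surface":
        return main + interactions(names, 2) + squares(names)
    if config == "fractional factorial to degree 2":
        return main + interactions(names, 2)
    if config == "fractional factorial to degree 3":
        return main + interactions(names, 2) + interactions(names, 3)
    if config == "full factorial":
        out = []
        for order in range(1, len(names) + 1):
            out += interactions(names, order)
        return out
    raise ValueError(f"unknown configuration: {config}")


CONFIGS = [
    "multiple regression",
    "polynomial degree 2",
    "quadratic response surface",
    "fractional factorial to degree 2",
    "fractional factorial to degree 3",
    "full factorial",
]
for config in CONFIGS:
    print(f"{config}: {len(terms(config))} terms")
```

The ten two-way and ten three-way rows of Table 3 correspond to the combinations of two and three predictors among five.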
One has to note that the breakpoint is defined on the dependent variable; therefore, in order to assign a value to a new experiment, it should first be evaluated on which side of the breakpoint the dependent variable will be. For this purpose, a previous model can be used, or a classification algorithm with a two-class system defined by the threshold (i.e., the breakpoint). The final PLR efficiency therefore depends on such a previous estimation. A quick classification using the quadratic model misclassified only six experiments.

4 Results of Non-parametric Approaches and Prediction of ITQ-21 Phase Crystallinity

Having previously estimated the distribution of the data collected in the ITQ-21 study, the predictions of the previous parametric statistics are compared with NN, SVMs, and RTs. For each ML approach, the whole dataset of 144 data is divided into three different sets, namely "training," "selection," and "test," with 64, 40, and 40 individuals, respectively, in order to avoid overfitting. Thus, the test set represents 28% of the entire dataset, as mentioned before.

4.1 Comparison and Performance Assessments

As in the case of traditional MLR models, fitted GLZ can be summarized through statistics such as parameter estimates, their standard errors, and goodness-of-fit (GoF) statistics. Here, different statistics have been calculated: the correlation coefficient (i.e., the correlation coefficient between the predicted and observed output values), the coefficient of determination (R2, Eq. 3), the adjusted R2 (R2adj, Eq. 4), the standard deviation (Eq. 1) of the target output variable (sy), and the standard deviation of errors for the output variable (se). r (Eq. 2) represents the linear relationship between two variables. A perfect prediction will have a correlation coefficient of 1. A correlation of 1 does not necessarily indicate a perfect prediction (only a prediction which is perfectly linearly correlated with the actual outputs), although in practice the correlation coefficient is a good indicator of performance. It also provides a simple and familiar way to compare the performance of statistical and ML methods. In Eqs. 1 – 4, formulas are given for each statistic, with n the amount of data and p the number of predictors. Adding more independent variables to a model can only increase R2. Since the number of variables selected by the NN is different from the one used in the other approaches, R2adj has also been used.

$$ s = \sqrt{\frac{1}{N} \sum_{i} (x_i - \bar{x})^2} \qquad (1) $$

Figure 7. ITQ-21 phase crystallinity fitting using GLZs; MLR is given as a reference.

4.2 Performances of Neural Networks, Regression Trees, and SVMs

The most common NN architectures have outputs in a limited range (e.g., 0 – 1 for the logistic activation function we use). When the desired output is in such a range, this is of interest for classification problems, as has been investigated [15]. However, for regression problems there is clearly an issue to be resolved, and some of the consequences are quite subtle. A scaling algorithm can be applied to ensure that the network's output will be in a sensible range. The simplest scaling function finds the minimum and maximum values of a variable in the training data, and
Figure 8. Confidence intervals of predicted values for the best GLZ models. Three different α values are considered (i.e., 10, 5, and 1%).
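The statistics discussed in Section 4.1 can be sketched with the Python standard library. The definitions below are the usual textbook forms and are assumed here rather than copied from Eqs. 1 – 4; the toy numbers only illustrate the remark that a correlation of 1 does not guarantee a perfect prediction.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation with the 1/N convention of Eq. 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(observed, predicted):
    """Pearson correlation between observed and predicted outputs."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    return cov / (len(observed) * std(observed) * std(predicted))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_residual / SS_total."""
    mo = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [0.0, 10.0, 40.0, 100.0]         # toy crystallinity values
biased = [0.5 * y + 5.0 for y in observed]  # perfectly linear, yet wrong

print(correlation(observed, biased))  # equal to 1 up to rounding
print(r_squared(observed, biased))    # clearly below 1
```

In this toy case the predictions are perfectly correlated with the observations, yet the coefficient of determination stays well below 1 because the predicted values are systematically off.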
ple than the one obtained up to now. The 100% crystallized material may not belong to the training set (e.g., random selection of the training set), and the 100% crystallinity has been arbitrarily defined by the best zeolite found in our experimentation. Nevertheless, a new synthesis could achieve an even better crystallized sample.

In all the cases, NNs of the Multi-Layer Perceptron (MLP) type and SVMs using an RBF kernel have reached the best performances. In Table 4, the best NN model for the prediction of ITQ-21 crystallinity is shown. Two points have to be underlined considering the performance assessment of NNs. (i) The work required to obtain and select the best NN is by far more time-consuming than for the other non-parametric approaches. Considerable attention has been given to NN due to the high variability of the results we have obtained. Numerous architectures, activation functions, and other parameters have been tested. Several NN models have been discarded due to the great difference of performance between the training/selection and the test sets, indicating a clear overfitting phenomenon. (ii) Having combined a feature selection algorithm with the NN, some of the first selected "good" networks are composed of very few input variables. Considering the synthesis of zeolites, it can be shown that any of the variables we have used is without effect and can be eliminated from the synthesis steps. However, the selection of input variables makes it possible to eliminate variables from which the network did not find the right way to utilize the information brought after the exploitation of the others. Moreover, reducing the pool of input variables inherently minimizes the potential for extrapolation when using a broader range for the synthesis variables, since the role of the discarded variables could emerge. Both R2 and corrected R2 have been given, while the use of the latter can be questioned because of the above reasons. It has to be noted that such a feature selection mechanism could have been used for SVM or regression trees. Conversely, the stability of these methodologies is usually better, partially due to their very small number of parameters compared to the numerous ones contained in the NN architecture, as will be shown later.

Figure 9. ITQ-21 phase crystallinity modeling with best GLZ, PLR, SVM, NN, and RT.
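The scaling issue raised in Section 4.2 (a network output bounded by the range seen during training, while a new synthesis could exceed the current 100% reference) can be illustrated with a simple min-max scaler. This is a generic sketch: the class name and the toy values are assumptions, and the scaling actually applied in the study is not specified here.

```python
class MinMaxScaler:
    """Map values linearly so that the training range becomes [0, 1]."""

    def fit(self, values):
        # Only the training data define the range.
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, value):
        return (value - self.low) / (self.high - self.low)

    def inverse(self, scaled):
        return self.low + scaled * (self.high - self.low)


# Toy training targets: the best sample (100%) happens to be left out.
train_crystallinity = [0.0, 12.0, 35.0, 80.0]
scaler = MinMaxScaler().fit(train_crystallinity)

print(scaler.transform(100.0))  # above 1: outside a 0-1 output range
print(scaler.inverse(1.0))      # largest value a 0-1 output can represent
```

If the best sample is absent from the training set, a network whose output is confined to the 0 – 1 range cannot predict it, which is one reason why the choice of the training range matters for regression.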
Table 4. Description of all the selected models for the prediction of the ITQ-21 phase's crystallinity.

Model                                           Correlation       R2      R2 adjusted   Standard deviation
                                                coefficient (r)                         of errors
MLR                                             0.782             0.611   0.597         15.931
GLZ (normal distribution, log link function)    0.919             0.844   0.839         10.151
Full factorial                                  0.953             0.909   0.906         7.695
Polynomial of degree 2                          0.923             0.853   0.847         9.813
Quadratic response surface regression           0.955             0.913   0.910         7.514
Fractional factorial to degree 3                0.952             0.907   0.904         7.782
Fractional factorial to degree 2                0.941             0.885   0.881         8.664
Piecewise linear regression                     0.962             0.925   0.923         6.978
Neural network MLP 4:4-5-1:1                    0.918             0.843   0.838         10.139
SVM radial basis function                       0.921             0.849   0.844         9.956
Regression tree                                 0.916             0.840   0.835         10.216

Black cells are used for non-parametric approaches and gray ones are the selected models. Mean of the whole dataset: 17.458. Standard deviation of the whole dataset: 25.565.
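The entries of Table 4 are mutually consistent, which can be verified from the summary values alone. The sketch below assumes the standard relations R2 ≈ r^2 ≈ 1 − (se/sy)^2 and the usual adjusted-R2 formula with n = 144 and p = 5; these formulas are assumptions about how the table was computed, and only three models are shown.

```python
S_Y = 25.565   # standard deviation of the whole dataset (Table 4 footnote)
N_RUNS = 144   # number of experiments

# model: (r, R2, adjusted R2, standard deviation of errors), from Table 4
TABLE4 = {
    "MLR": (0.782, 0.611, 0.597, 15.931),
    "Piecewise linear regression": (0.962, 0.925, 0.923, 6.978),
    "SVM radial basis function": (0.921, 0.849, 0.844, 9.956),
}


def r2_from_errors(s_e, s_y=S_Y):
    """R2 implied by the error and target standard deviations."""
    return 1.0 - (s_e / s_y) ** 2


def adjusted_r2(r2, n=N_RUNS, p=5):
    """Usual adjusted R2 (assumed form, with p predictors)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


for model, (r, r2, r2_adj, s_e) in TABLE4.items():
    print(model, round(r ** 2, 3), round(r2_from_errors(s_e), 3),
          round(adjusted_r2(r2), 3))
```

Under these relations, a standard deviation of errors equal to that of the whole dataset (25.565) would correspond to R2 = 0, that is, to a model that does no better than predicting the mean.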
Figure 10. Final Regression tree for prediction of ITQ-21 phase crystallinity after additional user pruning.
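Binary recursive partitioning, the procedure behind the tree of Figure 10, repeatedly chooses the split that minimizes the squared error within the resulting nodes. The sketch below finds a single best split on one variable. It is a simplified illustration with toy data, not the C&RT implementation used in the study, and it omits pruning.

```python
def sse(values):
    """Sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total SSE) of the best binary split on x."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [yy for xx, yy in pairs if xx <= threshold]
        right = [yy for xx, yy in pairs if xx > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data (illustration only): crystallinity drops above a water threshold.
h2o = [2.0, 3.0, 3.0, 5.0, 7.0, 10.0]
crystallinity = [90.0, 85.0, 80.0, 10.0, 5.0, 0.0]

threshold, error = best_split(h2o, crystallinity)
print(threshold, error)
```

Applying the same search recursively to the left and right subsets, and to every input variable, yields the if-then structure of a regression tree; each leaf then predicts the mean of its samples.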
Figure 9 shows the predictions of ITQ-21 crystallinity for the given NN. The effect of the synthesis variable named "Time" being rather low (as indicated earlier), the NN has removed it from the input parameters. The number of false positives is much higher for NN-MLP and SVM-RBF (radial basis function) than for all other techniques. The SVM-RBF is the best among the non-parametric approaches considering the overall criteria given in Table 4, but on the other hand, numerous negative crystallinity values can be observed. A k-fold (k = 10) Cross-Validation (CV) has been utilized for optimizing capacity (C) and epsilon (ε) at the same time. For C = 10, gamma (γ) has been set at 0.2, and ε = 0.1. The Regression Tree (RT) produces accurate predictions based on a few logical if – then conditions. A ten-fold CV is used for pruning. The original version of the RT was composed of 13 non-terminal nodes and 14 (terminal) leaves. In Figure 10, some terminals have been pruned again (the leaves containing less than 20 individuals are removed), making the reading easier. It can be observed through the gray scale rectangles that the RT succeeds in isolating the different levels of ITQ-21 crystallinity. It is interesting to observe that the position of the rectangles gives an intuitive classification of the samples studied, allowing an easy visualization of the crystallinity and the synthesis factors. The leaves in the left branches present an increasing crystallization (dark rectangles) going down the splits, while in the right branches the crystallinity is descending (bright rectangles). For each leaf, the mean (μ, i.e., mu) of the samples is indicated. Figure 10 shows that the most crystalline samples are obtained for concentrated gels (H2O < 3.5) and Si/Ge < 36. These results are in agreement with the conclusions obtained in Section 2 and in the MLR (Section 3.2). Moreover, previous work [33] suggests that ITQ-21 could be obtained for a Si/Ge ratio of 25, but not for 50.

SVM-RBF obtains the best results without requiring heavy pre- and post-treatments. However, RT may be preferred because the RBF kernel makes the model interpretation more difficult than simpler ones. Such a kernel has been chosen to give SVM the same chance facing NN.
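Two ingredients of the SVM-RBF model discussed here can be written down compactly: the ε-insensitive loss, which ignores deviations smaller than ε, and the RBF kernel with parameter γ. The sketch uses the values quoted in the text (ε = 0.1, γ = 0.2); the function names and sample points are illustrative assumptions, and no SVM optimization is performed.

```python
import math

EPSILON = 0.1  # tube half-width quoted in the text
GAMMA = 0.2    # RBF kernel parameter quoted in the text


def eps_insensitive_loss(target, prediction, eps=EPSILON):
    """Zero inside the epsilon tube, linear outside."""
    return max(0.0, abs(target - prediction) - eps)


def rbf_kernel(x, z, gamma=GAMMA):
    """k(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


print(eps_insensitive_loss(0.50, 0.55))    # deviation inside the tube
print(eps_insensitive_loss(0.50, 0.80))    # deviation outside the tube
print(rbf_kernel([0.0, 1.0], [0.0, 1.0]))  # identical points
print(rbf_kernel([0.0, 1.0], [2.0, 1.0]))  # distant points
```

Errors smaller than ε do not contribute to the loss, which is what allows the fitted function to stay flat, while the kernel value decays with the squared distance between two synthesis points.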
However, simpler kernels such as polynomials of degrees 2 and 3 have also been tested. Results for a 30% test set and 10-CV are the following:

{Degree, C, ε, γ, coeff.} = {3, 10, 0.1, 0.3} gives r = 0.915 (training), r = 0.85 (test)
{Degree, C, ε, γ, coeff.} = {2, 10, 0.1, 0.3} gives r = 0.920 (training), r = 0.87 (test)

These results confirm what was concluded through the MLR and GLZ examination, i.e., the use of second degree effects is useful while the integration of higher effects is not. The difference in performance between RBF and such a simple kernel is very low, and once again it discards all forms of more complex models in this study. Finally, both RT and SVM with a polynomial kernel of degree 2 are selected. RT should be used for a quick overview of the system, while SVM could allow a contour plot to be drawn precisely.

5 Advanced Analysis of Methodologies and Interpretation

Not only to show the difficulties encountered using NN but also to better justify the selection of the SVM methodology over NN in this study, both techniques are compared based on the stability/variability of their performances depending on the amount of data available for the training step. We have chosen to assess performance generalization for only these two approaches since SVM has been qualified as a more stable technique compared to NN, and all other techniques used are far less likely to overfit the data, or a fast post-processing treatment can easily be combined with them, as for RT. Through this analysis it is also investigated whether decreasing the size of the test set in order to allocate more "resources" to the training part makes the variability of performance higher and thus increases the risk of false acceptance of the model.

The dataset is divided into two parts: training (Tr) and test (Te). Their respective sizes vary and the fitting capacity is assessed. The relative amount of data in the test subset is set to either 70 or 30% of the whole available dataset. Five different samplings for each distribution into training and test are presented for both NNs and SVM. The frequencies of responses have been checked in order to have a minimum number of each type of experiment in both training and test sets, i.e., low and high ITQ-21 crystallinity values. This makes it possible to assess the performance of the modeling on three different ranges of crystallinity: {0, ]0...50], ]50...100]}. Table 5 gives the mean and the

through the difference between training and test performances as done before. However, set #2 in Table 6 shows an obvious estimation failure. Only one input has been kept; consequently, the range of maximum interest (i.e., "> 50") is greatly under-evaluated while amorphous materials are overestimated. Obviously, such an NN has been trapped in a local optimum. In Tables 6 – 9, the gray cells indicate where a given failure has been encountered, while the black cells indicate the selected models. The different kinds of failure are underlined below. It has to be pointed out that the following criteria are not independent, and therefore only the most significant criteria of the failure are shown in gray. On the basis of the traditional statistics listed in Tables 8 and 9:

(1) High performance drop from calculation on training to test sets, such as the NN using the MLP technique and tested with set #1 (11.4% = 98.1 − 86.7, Table 8), which represents the greatest fall, but also set #5 for NN-MLP in Table 8.

(2) Relatively low performance compared to all other models of the same type. Therefore, set #4 for NN-RBF in Table 9 is discarded.

(3) Relatively high error standard deviation. One has to note that even if a prediction error mean extremely close to zero is expected, it is possible to get a zero prediction error mean simply by estimating the averaged training data value, without any recourse to the input variables or any advanced methodologies at all. Thus, the standard deviation of the error is of great interest in order not to use falsely good models such as NN-RBF tested on set #4 in Tables 8 and 9. NN-RBF with set #2 in Table 8 could have been discarded directly with the error mean. Note that if the standard deviation of the error is no better than the training data standard deviation, then the technique has performed no better than a simple mean estimator.

(4) A weak (i.e., non-robust) architecture. Not only the NN-MLP tested on set #2 in Table 8, but also the NN-MLP and NN-RBF tested on set #5 in Table 9 possess a very low number of inputs, indicating that the networks did not manage to use the information brought by all variables. In Tables 6 and 7, predictions are followed on separate ranges of crystallinity.

(5) Difference between observed and predicted mean of ITQ-21 crystallinity. This is generally observed for high values of crystallinity (sets #2, #3, #5 for NN-MLP, and sets #4 and #5 for NN-RBF in Table 6, and sets #1, #3, #4 with NN-RBF in Table 7). This is due to the relatively low number of experiments belonging to the range "> 50." On the other hand, in set #2 for both NN-RBF and NN-MLP
standard deviation of each sample taking into account the in Table 6, a very bad recognition of amorphous materials
ranges, while Tables 6 and 7 indicate the statistics for the is detected as well for set #1 for NN-MLP. The prediction
predicted values. The best solution using RBF and MLP for medium crystallized materials is overestimated in set
(three or four layers) is conserved for NN while SVM #1 for NN-MLP, making the margin between the medium
makes use of only RBF model form. Considering NNs, the and highly crystallized zeolites very narrow.
best network found is kept for each sampling after elimi- (6) Overfitting phenomenon is also detected through
nation of the networks that show a clear overfitting the high standard deviation of the predicted ITQ-21 crystal-
Table 5. Two different partitions of the whole available experimental dataset are described.

Real ITQ-21 crystallinity of the test sets: mean (SD)

Percentage (%)
(mean nominal value)  Set #1             Set #2             Set #3             Set #4             Set #5             Range
30 (44.6)             0 (0)              0 (0)              0 (0)              0 (0)              0 (0)              0
                      30.8362 (15.6366)  17.0754 (13.0485)  28.2170 (17.2106)  28.4415 (15.2259)  25.4530 (17.5253)  ]0...50]
                      68.1282 (13.1038)  69.9754 (12.9964)  72.1727 (9.4963)   63.9546 (3.7527)   71.8512 (15.1351)  ]50...100]
                      21.4306 (28.9117)  18.9280 (28.2568)  15.2557 (25.4654)  18.7385 (25.7839)  22.7955 (29.8045)  Total
70 (100.8)            0 (0)              0 (0)              0 (0)              0 (0)              0 (0)              0
                      25.692 (13.3596)   23.674 (14.5605)   25.772 (14.4003)   22.415 (15.2069)   24.740 (15.085)    ]0...50]
                      65.853 (10.683)    65.077 (10.527)    66.109 (8.068)     66.822 (11.615)    69.327 (12.645)    ]50...100]
                      17.0139 (24.3044)  17.5168 (25.4006)  16.7966 (24.8081)  17.3316 (26.3934)  17.4238 (26.5437)  Total

The first column gives the percentage and corresponding nominal mean of the test set. Then, for a given partition, the real mean and standard deviation of each test set are given for the different ranges of ITQ-21 phase crystallinity (null, ]0...50], ]50...100], and entire set) noted in the last column.
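The partitioning protocol described for Table 5 (test fraction of 70 or 30%, five samplings each, every crystallinity range represented on both sides of the split) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used by the authors: the function names, the random data, and the rule of keeping at least one sample per range on each side are all hypothetical.

```python
import numpy as np

def crystallinity_range(y):
    """Range index per sample: 0 for y == 0, 1 for ]0...50], 2 for ]50...100]."""
    y = np.asarray(y, dtype=float)
    return np.where(y == 0, 0, np.where(y <= 50, 1, 2))

def stratified_partition(y, test_fraction, rng):
    """Split sample indices into (train, test), stratified by crystallinity range."""
    labels = crystallinity_range(y)
    train, test = [], []
    for k in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == k))
        n_test = int(round(test_fraction * len(idx)))
        # Assumption: every range holds at least two samples, so that one
        # sample can be kept on each side of the split.
        n_test = min(max(n_test, 1), len(idx) - 1)
        test.extend(idx[:n_test].tolist())
        train.extend(idx[n_test:].tolist())
    return np.array(sorted(train)), np.array(sorted(test))

# Hypothetical crystallinity values (NOT the ITQ-21 data): 144 samples.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(80), rng.uniform(1, 50, 40), rng.uniform(51, 100, 24)])

# Five samplings for each test fraction, mirroring the protocol in the text.
partitions = {
    frac: [stratified_partition(y, frac, rng) for _ in range(5)]
    for frac in (0.7, 0.3)
}
for frac, splits in partitions.items():
    for s, (tr, te) in enumerate(splits, start=1):
        print(f"test={frac:.0%} set #{s}: n_train={len(tr)}, n_test={len(te)}")
```

Stratifying on the range rather than on the raw value is one simple way to guarantee that the scarce high-crystallinity samples appear in every test set.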
Table 6. Mean and standard deviation of the predicted ITQ-21 crystallinity for a test set of 70% of the entire dataset.

Predicted ITQ-21 crystallinity, test sets only

Neural networks, MLP
Statistic  Range       4:4-2-1:1     1:1-1-1-1:1   3:3-1-2-1:1   3:3-2-1:1     3:3-3-1:1
Mean       0           2.9502        5.1625        0.3841        2.4025        0.3139
           ]0...50]    22.0063       21.1096       24.6572       26.2958       24.6209
           ]50...100]  62.5251       27.1190       47.3451       60.8654       45.7861
SD         0           7.9992        6.3111        5.1928        8.8584        5.6070
           ]0...50]    13.5509       10.1026       16.3604       21.5982       17.2128
           ]50...100]  31.1547       6.7165        14.6670       17.3099       12.2742

Neural networks, RBF
Statistic  Range       3:3-9-1:1     3:3-10-1:1    3:3-9-1:1     4:4-9-1:1     4:4-10-1:1
Mean       0           0.5330        8.8558        0.8193        1.0616        1.7603
           ]0...50]    30.0371       33.4153       33.7114       27.1984       24.8732
           ]50...100]  58.6176       69.5698       60.7412       44.4586       40.9229
SD         0           7.6737        13.1794       7.3416        14.0595       11.2243
           ]0...50]    17.5384       19.2353       22.6005       12.2213       12.0779
           ]50...100]  24.2643       19.7210       23.1682       8.7685        11.6271

Support vector machines
Statistic  Range       RBF 1         RBF 2         RBF 3         RBF 4         RBF 5
Mean       0           1.3014        1.87664       1.3818        0.7985        2.2881
           ]0...50]    29.4619       30.8891       28.3283       29.4217       31.6706
           ]50...100]  54.8544       54.5931       60.4716       60.6477       51.7803
SD         0           11.3509       14.4216       8.1337        10.7433       12.1001
           ]0...50]    15.1640       14.9366       15.0950       12.1353       15.8295
           ]50...100]  20.7396       12.4630       14.3953       13.7363       16.8359

The two statistics are given depending on both the methodologies employed and the ranges of the real ITQ-21 crystallinity.
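The comparison between observed and predicted means on the highest crystallinity range can be reproduced from the tabulated values. The sketch below copies the ]50...100] means for the 70% test sets from Table 5 (real) and Table 6 (predicted by NN-MLP and NN-RBF). It assumes that the five columns of Table 6 correspond to sets #1 to #5, and the 15-point threshold is a hypothetical cut-off chosen for illustration only; the paper does not state one.

```python
# Means on the range ]50...100], 70% test sets, in column order
# (assumed here to be sets #1 to #5).
real_mean = [65.853, 65.077, 66.109, 66.822, 69.327]          # Table 5
predicted_mean = {                                            # Table 6
    "NN-MLP": [62.5251, 27.1190, 47.3451, 60.8654, 45.7861],
    "NN-RBF": [58.6176, 69.5698, 60.7412, 44.4586, 40.9229],
}

THRESHOLD = 15.0  # hypothetical cut-off, in crystallinity points

def flag_mean_gap(real, predicted, threshold):
    """Return the 1-based set numbers whose |real - predicted| mean gap exceeds threshold."""
    return [
        set_no
        for set_no, (r, p) in enumerate(zip(real, predicted), start=1)
        if abs(r - p) > threshold
    ]

flagged = {
    model: flag_mean_gap(real_mean, pred, THRESHOLD)
    for model, pred in predicted_mean.items()
}
for model, sets in flagged.items():
    print(model, "sets with a large mean gap:", sets)
```

With this particular threshold the flagged sets are #2, #3, #5 for NN-MLP and #4, #5 for NN-RBF, which agrees with the sets singled out in the text for Table 6.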
Table 7. Mean and standard deviation of the predicted ITQ-21 crystallinity depending on both the methodologies employed and the ranges of the real ITQ-21 crystallinity.

Predicted ITQ-21 crystallinity, test sets only

Neural networks, MLP
Statistic  Range       4:4-8-1:1     4:4-3-1:1     4:4-10-8-1:1  5:5-3-1:1     3:3-1-3-1:1
Mean       0           1.6462        0.4754        5.6347        0.3186        0.3159
           ]0...50]    39.6653       20.6251       24.4674       30.0032       31.9088
           ]50...100]  51.9028       60.1008       61.6987       56.7048       61.3224
SD         0           2.8551        4.4862        10.0612       5.3826        4.0088
           ]0...50]    21.1517       15.4822       17.3254       18.4007       24.8200
           ]50...100]  20.4111       13.2987       26.5842       23.2101       9.8563

Neural networks, RBF
Statistic  Range       5:5-20-1:1    4:4-15-1:1    4:4-19-1:1    5:5-6-1:1     2:2-9-1:1
Mean       0           1.3549        0.1903        3.5478        3.1014        1.5366
           ]0...50]    33.6393       23.2889       31.4614       29.4760       27.0582
           ]50...100]  43.0530       56.7472       50.5591       41.0717       53.1617
SD         0           7.5802        7.4352        9.1270        14.8800       5.3202
           ]0...50]    14.3402       14.2591       20.1331       8.8716        20.6843
           ]50...100]  14.2721       12.0603       14.6490       9.5832        13.1921

Support vector machines
Statistic  Range       RBF 1         RBF 2         RBF 3         RBF 4         RBF 5
Mean       0           0.3699        1.5211        2.0352        0.0455        0.1315
           ]0...50]    33.1845       24.0824       26.9391       29.1968       29.1049
           ]50...100]  55.7318       59.0732       55.0579       62.3767       58.3044
SD         0           6.2682        8.2075        8.8295        7.2102        6.0857
           ]0...50]    17.0246       14.4054       11.9040       12.9160       16.2556
           ]50...100]  23.4673       13.2500       15.2324       13.7425       15.9692

Tables 6 and 7 differ in the percentage of data used as the test set. Here the test set represents 30% of the whole available dataset.
Table 8. Different statistics are given for each type of model, parameters, and test set, such as the mean error, the error standard deviation, the ratio of the prediction error standard deviation to the original output data standard deviation noted “SD ratio,” as well as the Pearson correlation r for both training and test sets. Note that a lower SD ratio indicates a better prediction, and this is equivalent to 1 minus the explained variance of the model. The percentage of test set used is 70 as indicated in the first column.

70% of Test set

Methodology      Test set  Error mean  Error SD  SD ratio  r, training (+ selection)  r, test  Form  Parameters
Neural networks  Set #1    0.0430      8.3626    0.5254    0.98191                    0.86779  MLP   4:4-2-1:1
                 Set #2    4.2375      12.8548   0.7489    0.55621                    0.70001  MLP   1:1-1-1-1:1
                 Set #3    2.8432      6.5612    0.4112    0.91686                    0.91527  MLP   3:3-1-2-1:1
                 Set #4    1.3291      8.3130    0.4696    0.93298                    0.88991  MLP   3:3-2-1:1
                 Set #5    3.5427      7.8134    0.4612    0.96023                    0.89572  MLP   3:3-3-1:1
                 Set #1    0.6261      8.9759    0.5109    0.92194                    0.87771  RBF   3:3-9-1:1
                 Set #2    8.3668      12.8457   0.5488    0.90154                    0.86598  RBF   3:3-10-1:1
                 Set #3    1.8639      9.4848    0.5318    0.92553                    0.87900  RBF   3:3-9-1:1
                 Set #4    2.0565      13.1512   0.6105    0.82674                    0.79190  RBF   4:4-9-1:1
                 Set #5    3.4021      11.8073   0.5934    0.81881                    0.80956  RBF   4:4-10-1:1
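The training-to-test performance drop can be recomputed directly from the Pearson correlations of Table 8. The short sketch below copies those values and ranks the drops; the variable names are illustrative, and expressing the drop as a difference of correlations in percentage points is an assumption based on the "11.4%" figure quoted in the text.

```python
# Pearson r copied from Table 8 (70% test set), in set order #1 to #5.
r_train = {
    "NN-MLP": [0.98191, 0.55621, 0.91686, 0.93298, 0.96023],
    "NN-RBF": [0.92194, 0.90154, 0.92553, 0.82674, 0.81881],
}
r_test = {
    "NN-MLP": [0.86779, 0.70001, 0.91527, 0.88991, 0.89572],
    "NN-RBF": [0.87771, 0.86598, 0.87900, 0.79190, 0.80956],
}

# Drop in percentage points, largest first: (drop, model, set number).
drops = sorted(
    (
        (100.0 * (tr - te), model, set_no)
        for model in r_train
        for set_no, (tr, te) in enumerate(zip(r_train[model], r_test[model]), start=1)
    ),
    reverse=True,
)

for drop, model, set_no in drops[:3]:
    print(f"{model} set #{set_no}: drop of {drop:.1f} points")
```

A negative drop, as for NN-MLP on set #2, means the model correlates better on the test set than on the training set, which is itself a warning sign rather than a success.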
Table 9. Different statistics are given for each type of model, parameters, and test set, such as the mean error, the error standard deviation, the ratio of the prediction error standard deviation to the original output data standard deviation noted “SD ratio,” as well as the Pearson correlation r for both training and test sets. Note that a lower SD ratio indicates a better prediction, and this is equivalent to 1 minus the explained variance of the model. The percentage of test set used is 30 as indicated in the first column.

30% of Test set

Methodology      Test set  Error mean  Error SD  SD ratio  r, training (+ selection)  r, test  Form  Parameters
Neural networks  Set #1    1.7774      6.7509    0.3846    0.9309                     0.9246   MLP   4:4-8-1:1
                 Set #2    0.7065      6.6734    0.3610    0.9316                     0.9336   MLP   4:4-3-1:1
                 Set #3    1.4581      7.7908    0.5077    0.9281                     0.8625   MLP   4:4-10-8-1:1
                 Set #4    0.7467      7.2215    0.4313    0.9520                     0.9081   MLP   5:5-3-1:1
                 Set #5    0.2503      8.2122    0.3942    0.9086                     0.9197   MLP   3:3-1-3-1:1
                 Set #1    4.3222      6.9701    0.4833    0.8688                     0.8891   RBF   5:5-20-1:1
                 Set #2    0.7534      4.0173    0.4108    0.9018                     0.9133   RBF   4:4-15-1:1
                 Set #3    0.6127      8.7909    0.5222    0.8497                     0.8528   RBF   4:4-19-1:1
                 Set #4    2.1398      11.4482   0.6267    0.7989                     0.7793   RBF   5:5-6-1:1
                 Set #5    2.5788      6.9778    0.4557    0.8225                     0.8938   RBF   2:2-9-1:1
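The error statistics reported in Tables 8 and 9, and the reason why a near-zero error mean alone is not sufficient, can be illustrated with a small sketch. It uses hypothetical random data, not the ITQ-21 measurements, and assumes the error is defined as prediction minus observation with the sample standard deviation (ddof=1); neither convention is stated in the paper.

```python
import numpy as np

def error_statistics(y_true, y_pred):
    """Mean error, error SD, and SD ratio (error SD / output SD)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = y_pred - y_true  # assumed sign convention
    error_sd = error.std(ddof=1)
    return {
        "error_mean": error.mean(),
        "error_sd": error_sd,
        "sd_ratio": error_sd / y_true.std(ddof=1),
    }

# Hypothetical crystallinity-like outputs in [0, 100].
rng = np.random.default_rng(1)
y = np.clip(rng.normal(20, 25, 100), 0, 100)

# Mean estimator: ignores all inputs and always predicts the average output.
baseline = error_statistics(y, np.full_like(y, y.mean()))

# Hypothetical informative model: the truth plus moderate noise.
model = error_statistics(y, y + rng.normal(0, 8, 100))

print("mean estimator:", baseline)
print("noisy model:   ", model)
```

The mean estimator reaches an error mean of zero but an SD ratio of one, which is why a model whose SD ratio is close to one has done no better than this trivial baseline.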
of the simplest methodologies at the beginning, while more complicated but less informative techniques are kept for when complex underlying systems are detected. The combination of multiple approaches is of great appeal and should make it possible to reach higher and more stable performance, taking advantage of each method's contribution, while expected drawbacks might be eliminated or corrected through the complementarity of each technique's strengths. This study has pointed out the difficulty of selecting models when no a priori preference is given to a particular modeling technique. Of course, such work is problem-dependent, as the number of input variables, the available amount of data, and the complexity of the system investigated make a given approach more or less suitable. Thus, such a preliminary inspection of techniques appears to be mandatory in order to decrease the risk of deceptive results. Data mining technology is increasingly applied in production mode, which usually requires automatic analysis of data and related results in order to reach conclusions. But we have shown here that the selection of a given model remains a difficult task, making the automation of the whole combinatorial loop a problem which is too often underestimated. Unlike traditional data mining contexts, which deal with voluminous amounts of data, materials science is characterized by a scarcity of data, owing to the cost and time involved in conducting simulations or setting up experimental apparatus for data collection. In such domains, it is prudent to balance speed through automation against the utility of data. For these reasons, human interaction, verification, and guidance may lead to better quality output.

Acknowledgements

The EU Commission (TOPCOMBI Project) is gratefully acknowledged.

References

[1] H. Lee, S. I. Zones, M. E. Davis, Nature 2003, 425, 385–387.
[2] A. Corma, J. Catal. 2003, 216(1–2), 298–312.
[3] C. S. Cundy, P. A. Cox, Chem. Rev. 2003, 103, 663–702.
[4] S. I. Zones, S. J. Hwang, S. Elomari, I. Ogino, M. E. Davis, A. W. Burton, C. R. Chim. 2005, 8, 267–282.
[5] J. L. Paillaud, B. Harbuzaru, J. Patarin, N. Bats, Science 2004, 304(5673), 990–992.
[6] K. G. Strohmaier, D. E. Vaughan, J. Am. Chem. Soc. 2003, 125(51), 16035–16039.
[7] R. Millini, C. Perego, L. Carluccio, G. Bellussi, D. E. Cox, B. J. Campbell, A. K. Cheetham, Proceedings of the International Zeolite Conference 12th, Baltimore, July 5–10, 1998, Meeting Date 1998, 1999, 541–549.
[8] A. Corma, F. Rey, J. Rius, M. J. Sabatier, S. Valencia, Nature 2004, 431, 287–290.
[9] A. Corma, M. J. Díaz-Cabañas, F. Rey, S. Nicolopoulos, K. Boulahya, Chem. Commun. 2004, 12, 1356–1357.
[10] C. S. Cundy, P. A. Cox, Micropor. Mesopor. Mat. 2005, 82, 1–78.
[11] A. Corma, V. Fornes, U. Diaz, Chem. Commun. 2001, 24, 2642–2643.
[12] M. Moliner, J. M. Serra, A. Corma, E. Argente, S. Valero, V. Botti, Micropor. Mesopor. Mat. 2005, 78, 73–81.
[13] O. B. Vistad, D. E. Akporiaye, K. Mejland, R. Wendelbo, A. Karlsson, M. Plassen, K. P. Lillerud, Stud. Surf. Sci. Catal. 2004, 154, 731–738.
[14] A. Cantín, A. Corma, M. J. Diaz-Cabanas, J. L. Jordá, M. Moliner, J. Am. Chem. Soc. 2006, 128, 4216–4217.
[15] L. A. Baumes, D. Farrusseng, M. Lengliz, C. Mirodatos, QSAR Comb. Sci. 2004, 29, 767–778.
[16] A. Corma, J. M. Serra, E. Argente, S. Valero, V. Botti, Chem. Phys. Chem. 2002, 3, 939–945.
[17] M. Holena, M. Baerns, Catal. Today 2003, 81, 485–494.
[18] C. Klanner, D. Farrusseng, L. A. Baumes, C. Mirodatos, F. Schüth, Angew. Chem. Int. Ed. 2004, 43, 5347–5349.
[19] J. M. Serra, L. A. Baumes, M. Moliner, P. Serna, A. Corma, Comb. Chem. High Throughput Screen. 2006 (Submitted).
[20] L. A. Baumes, J. M. Serra, P. Serna, A. Corma, J. Comb. Chem. 2006, 8, 583–596.
[21] D. Nicolaides, QSAR Comb. Sci. 2005, 24, 15–21.
[22] M. M. Gardner, J. N. Cawse, in: J. M. Cawse (Ed.), Experimental Design for Combinatorial and High Throughput Materials Development, John Wiley & Sons, Hoboken, New Jersey, 2003, pp. 129–145.
[23] L. A. Baumes, J. Comb. Chem. 2006, 8, 304–313.
[24] A. Corma, M. J. Díaz-Cabañas, J. Martínez-Triguero, F. Rey, J. Rius, Nature 2002, 418, 514–517.
[25] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[26] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan Publishing, New York, 1994.
[27] P. J. Green, B. W. Silverman, Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach, Chapman & Hall, New York, 1994.
[28] A. J. Dobson, An Introduction to Generalized Linear Models, Chapman & Hall, New York, 1990.
[29] J. Stevens, Applied Multivariate Statistics for the Social Sciences, Erlbaum, Hillsdale, NJ, 1986.
[30] M. S. Younger, A First Course in Linear Regression, 2nd ed., Duxbury Press, Boston, 1985.
[31] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 1995.
[32] L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA, 1984.
[33] T. Blasco, A. Corma, M. J. Diaz-Cabanas, F. Rey, J. Rius, G. Sastre, J. A. Vidal-Moya, J. Am. Chem. Soc. 2004, 126, 13414–13423.