CHT 08 - No. 183
ABSTRACT Prediction of thermophysical properties required for heat transfer calculation and the
design and development of thermal systems is considered. Newly developed computational methods for
property prediction are described and their use is demonstrated for the prediction of various constant
properties, such as normal boiling and melting temperature, critical properties, heats of formation, etc.,
as well as for the temperature dependent properties: vapour pressure and viscosity of liquids. The
computational methods discussed include the Quantitative Structure-Structure-Property Relationship
(QS2PR), the short-cut QS2PR method (SC-QS2PR) and the targeted QSPR method (TQSPR). These
methods are based on the use of molecular descriptors (calculated from the molecular structure) for
predicting properties. However, unlike in the traditional property prediction methods, these new
methods are targeted to a particular compound, or a group of compounds, and rely on the identification
of a relatively small number of structurally similar compounds. Hence, they can provide accurate
predictions and estimates of the prediction error. In the examples presented it is demonstrated that a
proper combination of the proposed methods can provide property predictions within the experimental
error level.
INTRODUCTION
Heat transfer calculation and the design and development of thermal systems require data for thermophysical properties. The required data include constant properties, such as critical pressure and
temperature, normal boiling temperature, melting point, and temperature dependent properties, such as
vapour pressure, heat capacity, density, viscosity and heat conductivity. The scope of selection of
compounds for a particular application and heat transfer calculations is limited to those compounds for
which thermo-physical property data are available. Unfortunately, these are at present available only
for a small fraction of compounds. Therefore, methods for reliable prediction of property data are
needed.
Current methods used to predict temperature-dependent properties can be classified into "group
contribution" methods, methods based on the "corresponding-states principle" (for an extensive review
of these methods see Poling et al., 2001), and "asymptotic behaviour" correlations (see, for example,
Marano and Holder, 1997). However, reliable methods for predicting temperature-dependent
properties have not yet been established. Even for vapour pressure, which is probably the most
extensively investigated temperature-dependent property, prediction errors often reach several tens of
percent [Poling et al., 2001]. Furthermore, to predict properties of a target compound, many of these
methods require experimental data about the target compound (e.g., critical properties), and therefore
are inapplicable for property prediction of compounds for which no pertinent constant property data are
available.
We have recently developed several new computational methods for property prediction to complement
the traditional group contribution techniques, and Quantitative Structure-Property Relationships
(QSPR, Dearden [2003]). The newly developed methods include the Quantitative Structure-Structure-Property
Relationship (QS2PR, Shacham et al. [2004], Brauner et al. [2005]), the short-cut QS2PR
method (SC-QS2PR, Cholakov et al. [2007]) and the targeted QSPR method (TQSPR, Brauner et al.
[2006], Shacham et al. [2007], Kahrs et al. [2007]). The various methods that we have developed are
based on the use of molecular descriptors, which are calculated on the basis of the molecular structure
of the compound, for predicting its properties. However, unlike in the traditional QSPR methods, these
new methods are targeted to a particular compound, or a group of compounds, and rely on the
identification of a relatively small number of structurally similar compounds. Hence, they can provide
accurate predictions and estimates of the prediction error, while avoiding the need to model the highly
nonlinear relationships between molecular descriptors and properties that may require large amounts of
experimental data. The extension of these methods to predict temperature dependent thermo-physical
properties is not a trivial task. Those properties are represented by models which often include
empirically fitted parameters. The latter do not comply with the structural similarity relationships when
compared with their corresponding values in other similar compounds.
The new methods we have developed overcome the difficulties and obstacles of existing methods,
which attempt to predict parameters for empirical (often nonlinear) models. This is accomplished using
structural similarity between compounds for point by point prediction of temperature dependent
properties. The new predicted data can subsequently be used to fit unknown model parameters if
required. The successful application of this approach is demonstrated for prediction of the variation of
liquid vapour pressure and viscosity with temperature.
SOFTWARE AND DATABASES USED
Property prediction using QSPRs requires the use of a stepwise regression program and physical
property and molecular descriptor databases. Modified versions of the stepwise regression program
(SROV) of Shacham and Brauner [2003] were used in the various stages of this research project. In the
early stages of the research, the descriptor and property database of Cholakov et al. [1999] and Wakeham et
al. [2002] was used. This database included 259 hydrocarbons for which 99 molecular descriptors and
five physical properties were available.
Recently we have developed a new database with 1630 descriptors calculated with version 5.3 of the
Dragon program [Todeschini et al., 2006; DRAGON is copyrighted by TALETE srl,
http://www.talete.mi.it] for 324 hydrocarbons and oxygen-containing organic compounds (alcohols
and acids). Published property data from the DIPPR [Rowley et al., 2006] and NIST
[National Institute of Standards and Technology, 2005] databases were used in the studies.
THE QS2PR AND SC-QS2PR METHODS
The QS2PR technique and its shortcut version, SC-QS2PR, have been described in detail previously
[Shacham et al. 2004, Brauner et al. 2005, Cholakov et al., 2007]. Hereunder, we provide a brief outline
of their main features.
Let us assume that the vector of properties of the target compound $y_t$ (the dependent variable) is
potentially related to a set of m vectors of properties of predictive compounds (independent variables)
$x_1, x_2, \ldots, x_m$. The following partition of the $y_t$ and $x_i$ vectors into sub-vectors is used:

$$ y_t = \begin{bmatrix} y_{ct} \\ y_{pt} \end{bmatrix}; \qquad x_i = \begin{bmatrix} x_{ci} \\ x_{pi} \end{bmatrix} \qquad (1) $$

where $y_{ct}$ is an N vector of known properties and $y_{pt}$ is a K vector of unknown properties. Both the N vector
$x_{ci}$ and the K vector $x_{pi}$ contain known properties. Typically, the sub-vectors $y_{ct}$ and $x_{ci}$ contain
properties which are directly related to the molecular structure and can be calculated with high
accuracy (molecular descriptors), while the sub-vectors $y_{pt}$ and $x_{pi}$ contain measured properties with
various levels of experimental error. We wish to model the structure-structure relationship between $y_{ct}$
and the m independent variables $x_{c1}, x_{c2}, \ldots, x_{cm}$ by a linear regression model of the general form:

$$ y_{ct} = \beta_1 x_{c1} + \beta_2 x_{c2} + \ldots + \beta_m x_{cm} + \varepsilon \qquad (2) $$

where the weighting factors $\beta_1, \beta_2, \ldots, \beta_m$ are model parameters to be estimated, and $\varepsilon$ represents
independent normal errors with a constant variance.
The practical application of Eq. (2) requires preparation of a bank of potential predictive compounds as
a database. The same set of molecular descriptors must be defined for all compounds included in the
database, while the span of the molecular descriptors should reflect the difference between the
compounds in the database. Having the corresponding molecular descriptors for a target compound, $y_{ct}$,
defined as well, a stepwise regression procedure is applied in order to identify the most appropriate
predictive compounds to be included in the structure-structure regression model (Eq. 2) and estimate
the respective model parameters. The similarity between potential predictive compounds and the target
compound is measured by the partial correlation coefficient, $r_{ti}$, between the vector of the molecular
descriptors of the target compound, $y_{ct}$, and that of a potential predictive compound, $x_{ci}$. The partial
correlation coefficient is defined as $r_{ti} = y_{ct} x_{ci}^T$, where $y_{ct}$ and $x_{ci}$ are row vectors, centred (by
subtracting the mean) and normalized to unit length (by dividing by the Euclidean norm of the vector).
Absolute $r_{ti}$ values close to one ($|r_{ti}| \approx 1$) indicate high correlation between vectors $y_{ct}$ and $x_{ci}$, and thus
a high level of similarity between the molecular structures of the target compound and the predictive
compound i.
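The similarity measure described above can be sketched in a few lines. This is a minimal illustration, not the SROV implementation: the function name `similarity` and the helper `centre_and_normalise` are hypothetical, and the descriptor vectors in the usage note are made-up numbers chosen only to exercise the code.

```python
import math


def similarity(y_ct, x_ci):
    """Return r_ti: the dot product of two descriptor vectors after each
    has been centred (mean subtracted) and scaled to unit Euclidean length."""

    def centre_and_normalise(v):
        mean = sum(v) / len(v)
        centred = [a - mean for a in v]
        norm = math.sqrt(sum(a * a for a in centred))
        if norm == 0.0:
            raise ValueError("descriptor vector has zero variance")
        return [a / norm for a in centred]

    if len(y_ct) != len(x_ci):
        raise ValueError("descriptor vectors must have the same length")
    u = centre_and_normalise(y_ct)
    w = centre_and_normalise(x_ci)
    return sum(a * b for a, b in zip(u, w))
```

For instance, `similarity([1, 2, 3, 4], [2, 4, 6, 8])` evaluates to 1 (within rounding) because the two vectors are proportional, while reversing the second vector gives a value of -1. Candidate predictive compounds would then be ranked by the absolute value of this coefficient.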
Applying Eq. (2) with property values on its r.h.s. (instead of descriptor values) results in a property-property
relation. Accordingly, the following equation can then be used to predict the j-th property of the
target compound, $y_{ptj}$, on the basis of the data available for that property for the m predictive compounds:

$$ y_{ptj} = \beta_1 x_{p1j} + \beta_2 x_{p2j} + \ldots + \beta_m x_{pmj} \qquad (3) $$
If the predictive and the target compounds belong to the same homologous series, the prediction
procedure can be simplified significantly by the use of the SC-QS2PR method. In such a case the
minimum information required for deriving a structure-structure relation for a target compound in terms
of m predictive compounds is the availability of m-1 non-collinear molecular descriptors for both the
predictive and the target compounds [Cholakov et al., 2007]. For example, for m = 3 the coefficients of
the structure-structure relation, $y_{ct} = \sum_{i=1}^{3} \beta_i x_{ci}$, are obtained by the solution of the following system of
equations:

$$ \begin{aligned} \beta_1 + \beta_2 + \beta_3 &= 1 \\ \beta_1 n_{c1} + \beta_2 n_{c2} + \beta_3 n_{c3} &= n_{ct} \\ \beta_1 x_{c1} + \beta_2 x_{c2} + \beta_3 x_{c3} &= y_{ct} \end{aligned} \qquad (4) $$

where $n_{c1}$, $n_{c2}$ and $n_{c3}$ are the numbers of carbon atoms of the predictive compounds and $n_{ct}$ is the
number of carbon atoms of the target compound. The first equation in (4) reflects the influence of
the descriptors that have the same value for the target and predictive compounds (e.g. the number of
CH3 groups in the n-alkane series). The molecular descriptor included in the third equation must be
well correlated with the property to be predicted for the group of compounds to which the target and
predictive compounds belong (i.e. the homologous series). Alternatively, another property (e.g., normal
boiling temperature) of the target and predictive compounds, which is well-correlated with the target
property (to be predicted), can replace the molecular descriptor in the third equation.
$(x_t - x_i)(x_t - x_i)^T$
al. [2003]. $|r_{ti}| \approx 1$ or $d_{ti} \approx 0$ indicates a high similarity between the molecular structures of the target
compound and the i-th predictive compound. Different methods for adding the predictive compounds to
the similarity group, all related to cluster algorithms [Hastie et al., 2001], were compared in Kahrs et al.
[2007].
The similarity group is used to obtain a training set for the development of the QSPR for compounds
that have been identified as structurally similar to the target compound. The training set is established by
selecting the $n_p$ compounds with the highest $|r_{ti}|$ values for which experimental property values $y_{pi}$ are
available. The remaining compounds in the similarity group are used for validation.
For development of a TQSPR for a particular property of the target compound, a linear structure-property
relation is assumed of the form:

$$ y_p = \beta_0 + \beta_1 \zeta_1 + \beta_2 \zeta_2 + \ldots + \beta_m \zeta_m + \varepsilon \qquad (5) $$

where $y_p$ is an $n_p$-dimensional vector of the respective property values ($n_p$ is the number of compounds
included in the training set), $\zeta_1, \zeta_2, \ldots, \zeta_m$ are $n_p$-dimensional vectors of predictive molecular descriptors
(to be selected via a stepwise regression algorithm), $\beta_0, \beta_1, \beta_2, \ldots, \beta_m$ are the corresponding model
parameters to be estimated, and $\varepsilon$ is an $n_p$-dimensional vector of random measurement errors. A
stepwise regression program is used to determine which molecular descriptors should be included in the
QSPR to best represent the measured property data of the training set and to calculate the QSPR
parameter values. The TQSPR so-obtained can be subsequently employed for calculating estimated
property values for the target compound and for other compounds in the similarity group that do not
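have measured data by

$$ \tilde{y}_{pt} = \beta_0 + \beta_1 \zeta_{t1} + \beta_2 \zeta_{t2} + \ldots + \beta_m \zeta_{tm} \qquad (6) $$

where $\tilde{y}_{pt}$ is the estimated unknown property value of the respective compound and $\zeta_{t1}, \zeta_{t2}, \ldots, \zeta_{tm}$ are
its corresponding molecular descriptor values.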
The selection of a suitable set of predictive molecular descriptors for Eq (5) is a challenging problem,
since the number of candidates is of the order of $10^3$, which prohibits the determination of the best of
all possible sets of predictive molecular descriptors by a full search procedure. The stepwise regression
program SROV [Shacham and Brauner, 2003] is used, which selects in each step one molecular
descriptor that reduces the prediction error most strongly. Two criteria for measuring the signal-to-noise
ratio in the j-th candidate descriptor (TNRj) and in the partial correlation of the j-th candidate descriptor
with the prediction residual (CNRj) ensure that the selected descriptors contain valuable information. A
detailed description of the TNR and CNR criteria and further algorithmic details can be found in
Shacham and Brauner [2003]. Additionally, model refinement (i.e., addition of more descriptors to the
model) stops when the variance of the model prediction error for the training set,

$$ s^2 = \frac{1}{n_p - m - 1} \sum_{i=1}^{n_p} \left( y_{pi} - (\beta_0 + \beta_1 \zeta_{1i} + \beta_2 \zeta_{2i} + \ldots + \beta_m \zeta_{mi}) \right)^2 \qquad (7) $$

(where $\zeta_{ji}$ is the value of the j-th selected descriptor for the i-th training compound) falls below a
pre-specified threshold value $s_{goal}^2$. For an optimally refined model, the prediction error
should be approximately as large as the measurement error that is present in the property data. Thus, the
threshold value $s_{goal}^2$ can be estimated from the relative measurement error $\varepsilon_{ri}$ of the property data by

$$ s_{goal}^2 = \frac{1}{n_p} \sum_{i=1}^{n_p} (\varepsilon_{ri} \, y_{pi})^2 \qquad (8) $$
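The stopping rule of Eqs. (7) and (8) can be illustrated with a short sketch. The function names `model_variance`, `goal_variance` and `refinement_done` are hypothetical and are not taken from SROV; the sketch assumes that the model predictions for the training set have already been computed and that the relative measurement errors are given as fractions (0.01 for 1 %).

```python
def model_variance(y_p, y_model, m):
    """Eq. (7): variance of the prediction error for a model with m
    descriptors (plus intercept) fitted to n_p training compounds."""
    n_p = len(y_p)
    dof = n_p - m - 1
    if dof <= 0:
        raise ValueError("need more training compounds than parameters")
    return sum((y - f) ** 2 for y, f in zip(y_p, y_model)) / dof


def goal_variance(y_p, rel_err):
    """Eq. (8): threshold variance estimated from the relative
    measurement error of each property value."""
    n_p = len(y_p)
    return sum((e * y) ** 2 for e, y in zip(rel_err, y_p)) / n_p


def refinement_done(y_p, y_model, m, rel_err):
    """True when the model error has fallen below the measurement error,
    i.e. when adding further descriptors is no longer justified."""
    return model_variance(y_p, y_model, m) < goal_variance(y_p, rel_err)
```

With this formulation, a model that already reproduces the training data to within the reported reliability is not refined further, which guards against fitting the measurement noise.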
Table 1.
Structure-Structure Correlation Data for Ethyl-Cyclopentane.

| No. | Compound | C atoms | Wiener's index* |
|---|---|---|---|
| 1 | methylcyclopentane | 6 | 26 |
| 2 (target) | ethylcyclopentane | 7 | 43 |
| 3 | propylcyclopentane | 8 | 67 |
| 4 | butylcyclopentane | 9 | 99 |

* From the database of Cholakov et al. [1999]
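As an illustration of Eq. (4), the data of Table 1 can be inserted directly into the three-equation system, with the Wiener index as the descriptor in the third equation. The sketch below is an assumption-laden example rather than the authors' code: `solve3` is a hypothetical helper using Cramer's rule, and `predict_property` simply applies Eq. (3) to whatever property values of the three predictive compounds are supplied.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""

    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("descriptors are collinear; system is singular")
    solution = []
    for k in range(3):
        # Replace column k of the matrix by the right-hand side.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(det(a_k) / d)
    return solution


# Predictive compounds: methyl-, propyl- and butylcyclopentane (Table 1).
n_c = [6.0, 8.0, 9.0]
wiener = [26.0, 67.0, 99.0]
# Target compound: ethylcyclopentane (7 C atoms, Wiener index 43).
beta = solve3([[1.0, 1.0, 1.0], n_c, wiener], [1.0, 7.0, 43.0])


def predict_property(beta, x_p):
    """Eq. (3): weighted sum of the property values of the predictive compounds."""
    return sum(b * x for b, x in zip(beta, x_p))
```

Solving the system by hand gives the exact coefficients 8/23, 22/23 and -7/23 for methyl-, propyl- and butylcyclopentane, respectively. Any constant property of ethylcyclopentane would then be estimated by passing the corresponding values of the three predictive compounds to `predict_property`.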
Table 2.
Properties of Ethyl-Cyclopentane: Prediction Errors

| No. | Property | Units | Reported value* | Prediction error (%) |
|---|---|---|---|---|
| 1 | Critical Temperature | K | 569.5 | 1.10 |
| 2 | Critical Pressure | Pa | 3400000 | 0.57 |
| 3 | Critical Volume | m^3/kmol | 0.375 | 0.38 |
| 4 | Crit Compress Factor | unitless | 0.269 | 0.81 |
| 5 | Melting Point Temp. | K | 134.71 | 5.95 |
| 6 | Triple Pt Temperature | K | 134.71 | 5.95 |
| 7 | Normal Boiling Temp. | K | 376.62 | 0.73 |
| 8 | Liq Molar Volume | m^3/kmol | 0.128748 | 0.33 |
| 9 | IG Heat of Formation | J/kmol | -1.2700E+08 | 0.13 |
| 10 | IG Gibbs of Formation | J/kmol | 4.4800E+07 | 0.21 |
| 11 | IG Absolute Entropy | J/kmol*K | 3.7800E+05 | 0.06 |
| 12 | Std Heat of Formation | J/kmol | -1.6300E+08 | 0.25 |
| 13 | Std Gibbs of Formation | J/kmol | 3.7600E+07 | 0.58 |
| 14 | Std Absolute Entropy | J/kmol*K | 2.8000E+05 | 0.09 |
| 15 | Heat Fusion at Melt Pt | J/kmol | 6.8700E+06 | 22.63 |
| 16 | Std Net Heat of Comb | J/kmol | -4.2800E+09 | 0.00 |
| 17 | Acentric Factor | unitless | 0.270095 | 2.58 |
| 18 | Radius of Gyration | m | 3.73E-10 | 0.05 |
| 19 | Solubility Parameter | (J/m^3)^0.5 | 16300 | 0.46 |
| 20 | van der Waals Volume | m^3/kmol | 0.0704 | 0.02 |
| 21 | van der Waals Area | m^2 | 8.8700E+08 | 0.00 |
| 22 | Refractive Index | unitless | 1.4173 | 0.17 |
| 23 | Flash Point | K | 269 | 1.00 |
| 24 | Lower Flammability Limit | vol% in air | 1.1 | 2.66 |
| 25 | Upper Flammability Limit | vol% in air | 6.7 | 11.29 |
| 26 | Lower Flamm Limit Temp | K | 270 | 0.99 |
| 27 | Upper Flamm Limit Temp | K | 303 | 0.11 |
| 28 | Auto Ignition Temp | K | 533.15 | 7.65 |

* Data from the DIPPR database [Rowley et al. 2006]
By introducing experimental data for other properties of the predictive compounds, the respective
properties for the target compound can be similarly predicted. The predicted values are compared in
Table 2 with data reported in the DIPPR database [Rowley et al. 2006], for 28 constant properties. In
this table the difference between the DIPPR and the predicted values is shown as "prediction error".
In assessing the prediction error, the experimental (or other) error in the published data should be taken
into consideration. In the DIPPR database the errors in the data are reported in terms of "reliability".
Obviously, the predicted property value cannot be more accurate than the published values of the same
property for the predictive compounds. Thus, if the prediction error is smaller than the reliability
reported by DIPPR the predicted property value can be considered accurate up to the experimental error
level. The predictions of the properties shown in Table 2 for ethyl-cyclopentane are accurate up to
experimental error level, except for the melting point temperature, the triple point temperature and the
heat of fusion at the melting point (for property numbers 23 through 28 the reliabilities are unknown).
Improving the prediction of melting point and related properties (such as heat of fusion) with the
SC-QS2PR method requires the use of a descriptor which is collinear with the target property values of the potential
predictive compounds (instead of the Wiener index) from among the database of the molecular
descriptors [Cholakov et al. 2007].
Similar results were obtained in predicting constant properties for a wide range of compounds using the
QS2PR and SC-QS2PR methods [Shacham et al. 2004, Brauner et al. 2005, Cholakov et al. 2007].
It should be pointed out that in order to obtain predictions within experimental error level when using
these methods, the database must contain a small number of predictive compounds with a very high level
of similarity to the target compound (such as being members of the same homologous series). The
SC-QS2PR method has been evaluated so far only for cases when the predictive and the target compounds
belong to the same homologous series, while the QS2PR method has been tested with systems
containing a wider variety of (similar) compounds. However, our experience has shown that in cases of
a wider structural variety of available predictive compounds the TQSPR method is a more robust
method (see the forthcoming section on prediction of the melting point of alkyl-benzenes using the
TQSPR method).
PREDICTION OF VAPOUR PRESSURE OF 1-BUTANOL BY THE SC-QS2PR METHOD
Vapour pressure prediction can be based on the Antoine equation, for which the model parameters are
available for a large number of compounds [e.g., in the NIST database, 2005]. The Antoine equation reads:

$$ \log(P_v) = A - \frac{B}{C + T} \qquad (9) $$

where T is the temperature (K), $P_v$ is the vapour pressure (bar) and A, B and C are parameters that can
be obtained by regression of experimental data. An alternative form of the Antoine equation, which is
explicit in the saturation temperature ($T_s$) at pressure $P_v$, follows from solving Eq. (9) for T:

$$ T_s = \frac{B}{A - \log(P_v)} - C \qquad (10) $$
Prediction of the vapour pressure as a function of temperature (or of the saturation temperature as a function of
pressure) involves calculating Pv for all predictive compounds at a specified temperature Ti and
predicting Pvi of the target compound at the same temperature using the SC-QS2PR method, in the same
way as any other constant property was predicted in the previous section.
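Both forms of the Antoine equation are easy to check numerically. The sketch below assumes a base-10 logarithm (the convention of the NIST WebBook constants) with T in K and P in bar; the function names are hypothetical, and the ethanol constants are the low pressure range values listed in Table 3.

```python
import math


def antoine_pressure(t_kelvin, a, b, c):
    """Eq. (9): vapour pressure (bar) at temperature T (K)."""
    return 10.0 ** (a - b / (c + t_kelvin))


def antoine_temperature(p_bar, a, b, c):
    """Eq. (10): saturation temperature (K) at pressure P (bar),
    obtained by solving Eq. (9) for T."""
    return b / (a - math.log10(p_bar)) - c


# Ethanol, low pressure range (Table 3).
ETHANOL_LOW = (5.37229, 1670.409, -40.191)

ts_1bar = antoine_temperature(1.0, *ETHANOL_LOW)
p_back = antoine_pressure(ts_1bar, *ETHANOL_LOW)
```

At 1 bar the saturation temperature of ethanol comes out close to 351.1 K, consistent with the normal boiling temperature of 351.5 K in Table 3 (which refers to 1.01325 bar), and substituting that temperature back into Eq. (9) recovers 1 bar, confirming that the two forms are mutually consistent.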
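Table 3.
Data for Vapour Pressure Prediction: Ethanol, 1-Propanol, 1-Pentanol (Predictive
Compounds) and 1-Butanol (Target Compound).

| Range | Quantity | Ethanol (predictive) | 1-propanol (predictive) | 1-pentanol (predictive) | 1-butanol (target) |
|---|---|---|---|---|---|
| | No. of C Atoms | 2 | 3 | 5 | 4 |
| | Descriptor HTe | 5.188 | 7.524 | 11.026 | 9.324 |
| | Normal Boiling Temp. (K) | 351.5 | 370.3 | 411.1 | 390.6 |
| Low Pressure Range | Temp. Range - Min. (K) | 273 | 292.4 | 307 | 362.36 |
| Low Pressure Range | Temp. Range - Max. (K) | 351.7 | 370.4 | 411 | 398.84 |
| Low Pressure Range | Press. Range - Min. (bar) | 0.016 | 0.019 | 0.006 | 0.330 |
| Low Pressure Range | Press. Range - Max. (bar) | 1.023 | 1.015 | 1.022 | 1.334 |
| Low Pressure Range | Antoine Constant A | 5.37229 | 5.31384 | 4.68277 | 4.50393 |
| Low Pressure Range | Antoine Constant B | 1670.409 | 1690.864 | 1492.549 | 1313.878 |
| Low Pressure Range | Antoine Constant C | -40.191 | -51.804 | -91.621 | -98.789 |
| High Pressure Range | Temp. Range - Min. (K) | 364.8 | 405.46 | 437.79 | 419.34 |
| High Pressure Range | Temp. Range - Max. (K) | 513.91 | 536.71 | 513.79 | 562.98 |
| High Pressure Range | Press. Range - Min. (bar) | 1.575 | 3.336 | 2.118 | 2.568 |
| High Pressure Range | Press. Range - Max. (bar) | 57.101 | 51.398 | 11.402 | 43.909 |
| High Pressure Range | Antoine Constant A | 4.92531 | 4.59871 | 3.97383 | 4.42921 |
| High Pressure Range | Antoine Constant B | 1432.526 | 1300.491 | 1106.11 | 1305.001 |
| High Pressure Range | Antoine Constant C | -61.819 | -86.364 | -134.578 | -94.676 |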
The details of this method will be explained and demonstrated using ethanol, 1-propanol and 1-pentanol
as predictive compounds and 1-butanol as the target compound. The pertinent data for these compounds
are shown in Table 3. The data shown include the number of C atoms (nC), the DRAGON molecular
descriptor HTe, the normal boiling temperature (Tb), the Antoine constants and the temperature and
pressure ranges of validity of the constants (from the NIST database). The Antoine constants for both the
low pressure and the high pressure ranges are included. The descriptor HTe (a GETAWAY descriptor)
was selected because the SROV program identified it as highly correlated with the Tb values of the
1-alcohol series [see Brauner et al., 2007].
The temperature ranges of validity of the Antoine constants of the predictive compounds overlap very
little, whereas the pressure ranges of validity overlap almost completely. The difference between the
Antoine equation coverage levels of the two variables in the low pressure range is also shown in Figure 1.
Because of the much higher coverage level of the vapour pressure, the prediction of Ts as a function of Pv
is preferred. To this aim the SC-QS2PR method is used with the molecular descriptor HTe. Introducing the nC
and HTe values of the predictive and target compounds into Eq. (4) yields the following parameter
values: $\beta_0$ = -0.08376, $\beta_1$ = 0.62564 and $\beta_2$ = 0.45812 (for ethanol, 1-propanol and 1-pentanol,
respectively). To predict Ts of the target compound for a
specified pressure (say P), the P value (and the Antoine equation parameters of the predictive
compounds) are introduced into Eq. (10) to obtain the corresponding Ts values of the predictive
compounds. Then, Eq. (3) with the above values of $\beta_0$, $\beta_1$ and $\beta_2$ is used to predict the Ts value of the
target compound. Table 4 shows the so-predicted values of Ts of 1-butanol in comparison with the
values calculated directly by its Antoine equation (whose parameters are shown in Table 3). The error
in the predicted Ts of 1-butanol is < 0.2 % in the low pressure range and < 0.3 % in the high pressure range.
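The procedure just described can be reproduced from the data of Table 3 alone. The following is a minimal sketch under stated assumptions: base-10 logarithm in the Antoine equation, saturation temperature obtained by solving Eq. (9) for T, and hypothetical helper names (`solve3`, `t_sat`, `predict_ts`) that do not come from the original work.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Cramer's rule."""

    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("singular system")
    return [det([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]) / d
            for k in range(3)]


def t_sat(p_bar, a, b, c):
    """Saturation temperature (K) from the Antoine equation solved for T."""
    return b / (a - math.log10(p_bar)) - c


# Table 3 data, in the order ethanol, 1-propanol, 1-pentanol.
N_C = [2.0, 3.0, 5.0]
HTE = [5.188, 7.524, 11.026]
ANTOINE = {
    "low": [(5.37229, 1670.409, -40.191), (5.31384, 1690.864, -51.804),
            (4.68277, 1492.549, -91.621)],
    "high": [(4.92531, 1432.526, -61.819), (4.59871, 1300.491, -86.364),
             (3.97383, 1106.11, -134.578)],
}
BUTANOL = {"low": (4.50393, 1313.878, -98.789),
           "high": (4.42921, 1305.001, -94.676)}

# Eq. (4) with the target values nC = 4 and HTe = 9.324 of 1-butanol.
beta = solve3([[1.0, 1.0, 1.0], N_C, HTE], [1.0, 4.0, 9.324])


def predict_ts(p_bar, pressure_range):
    """Eq. (3) applied to the saturation temperatures of the predictive compounds."""
    ts = [t_sat(p_bar, *consts) for consts in ANTOINE[pressure_range]]
    return sum(b * t for b, t in zip(beta, ts))


def percent_error(p_bar, pressure_range):
    """Deviation of the prediction from the 1-butanol Antoine value, in percent."""
    reference = t_sat(p_bar, *BUTANOL[pressure_range])
    return 100.0 * (reference - predict_ts(p_bar, pressure_range)) / reference
```

The solved coefficients agree with the values quoted in the text to the digits given, and `predict_ts(3.5, "high")` returns approximately 431.5 K, the value that appears in Table 4. Note that the 1-butanol Antoine constants for the low pressure range are valid only from 0.330 bar upwards, so comparisons at lower pressures extrapolate the reference equation.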
Figure 1. Vapour pressure versus temperature of the predictive compounds (ethanol, 1-propanol and
1-pentanol) in the low pressure range.
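Table 4.
Comparison of Saturation Temperatures Obtained by the Antoine Equation for 1-Butanol
with those Predicted by SC-QS2PR Method.

| Pressure (bar) | Ts 1-propanol (K) | Ts 1-pentanol (K) | Ts 1-butanol, predicted (K) | % error |
|---|---|---|---|---|
| 0.02 | | | | 0.17 |
| 0.1 | | | | 0.03 |
| 0.5 | | | | 0.06 |
| 1 | | | | 0.15 |
| 3.5 | 407.1 | 457.1 | 431.5 | -0.22 |
| 5 | 419.8 | 472.3 | 445.5 | -0.22 |
| 7.5 | 435.6 | 491.5 | 462.9 | -0.23 |
| 10 | 447.7 | 506.5 | 476.4 | -0.25 |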
$$ \mu_L = \exp\left( A + \frac{B}{T} + C \ln(T) + D T^E \right) \qquad (11) $$

where $\mu_L$ is the viscosity (Pa*s) and A, B, C, D and E are parameters of the Riedel equation. The
pertinent data for the predictive and the target compounds are shown in Table 5. The data shown
includes nC , Tb, the temperature range, the associated viscosity range of the data available for these
compounds and the Riedel equation constants. In this case there is a complete overlap between the
temperature ranges of the predictive compounds (see Figure 2). Thus, no exchange between the
dependent and the independent variables (as was done in the case of the vapour pressure prediction) is
necessary.
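Eq. (11) can be evaluated directly from the constants listed in Table 5. The sketch below is illustrative only: the function name `riedel_viscosity` is hypothetical, and constants that are not listed for a compound (D and E for ethylbenzene) are assumed to be zero, so that the last term of the equation vanishes.

```python
import math


def riedel_viscosity(t_kelvin, a, b, c=0.0, d=0.0, e=0.0):
    """Eq. (11): liquid viscosity (Pa*s) at temperature T (K)."""
    return math.exp(a + b / t_kelvin + c * math.log(t_kelvin) + d * t_kelvin ** e)


# Riedel constants from Table 5.
TOLUENE = (-226.08, 6805.7, 37.542, -0.060853, 1.0)
ETHYLBENZENE = (-13.563, 1208.6, 0.377)  # D and E not listed; taken as zero

mu_toluene = riedel_viscosity(253.1, *TOLUENE)
mu_ethylbenzene = riedel_viscosity(178.2, *ETHYLBENZENE)
```

For toluene at 253.1 K this gives about 1.07E-03 Pa*s, the first Riedel equation entry of Table 6, and for ethylbenzene at 178.2 K about 8.01E-03 Pa*s, the value listed in Table 5 for the lower end of its temperature range. Because the temperature ranges of the predictive compounds overlap, such values can be computed for each predictive compound at a common temperature and combined by Eq. (3) point by point.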
Table 5.
Data and Riedel Equation Constants for Liquid Viscosity of Toluene (Target Compound) and
Ethylbenzene, Pentylbenzene and N-Heptylbenzene (Predictive Compounds).

| Quantity | ethylbenzene (predictive) | pentylbenzene (predictive) | n-heptylbenzene (predictive) | toluene (target) |
|---|---|---|---|---|
| No. of C Atoms | 8 | 11 | 13 | 7 |
| Normal Boiling Temp. (K) | 409.35 | 478.61 | 519.25 | 383.78 |
| Temp. Range - Min. (K) | 178.2 | 253.15 | 222.15 | 178.18 |
| Temp. Range - Max. (K) | 413.1 | 478.61 | 714 | 383.78 |
| Visc. Range - Min (Pa*s) | 8.01E-03 | 3.66E-03 | 2.5E-02 | 1.57E-03 |
| Visc. Range - Max (Pa*s) | 2.32E-04 | 1.69E-04 | 6.12E-05 | 2.43E-04 |
| Riedel Constant A | -13.563 | -407.33 | 82.588 | -226.08 |
| Riedel Constant B | 1208.6 | 13112 | -8568.7 | 6805.7 |
| Riedel Constant C | 0.377 | 67.647 | -12.521 | 37.542 |
| Riedel Constant D | - | -0.09652 | 910310 | -0.060853 |
| Riedel Constant E | | 1 | -1.9838 | 1 |

[Figure 2: Viscosity (Pa*s) of ethylbenzene, pentylbenzene and n-heptylbenzene, with DIPPR data.]
Table 6.
Comparison of Liquid Viscosity - Riedel Equation and Predicted Values

| Temp. (K) | Viscosity (Pa*s), Riedel Equation | Viscosity (Pa*s), Predicted | Error % |
|---|---|---|---|
| 253.10 | 1.07E-03 | 1.09E-03 | -1.58 |
| 263.10 | 9.01E-04 | 8.95E-04 | 0.71 |
| 273.10 | 7.71E-04 | 7.59E-04 | 1.50 |
| 283.10 | 6.68E-04 | 6.58E-04 | 1.46 |
| 293.10 | 5.85E-04 | 5.79E-04 | 1.07 |
| 303.10 | 5.19E-04 | 5.14E-04 | 0.87 |
| 313.10 | 4.64E-04 | 4.61E-04 | 0.58 |
| 323.10 | 4.18E-04 | 4.16E-04 | 0.38 |
| 333.10 | 3.79E-04 | 3.78E-04 | 0.35 |
| 343.10 | 3.45E-04 | 3.45E-04 | 0.11 |
| 353.10 | 3.16E-04 | 3.16E-04 | 0.07 |
| 363.10 | 2.90E-04 | 2.90E-04 | -0.17 |
| 373.10 | 2.68E-04 | 2.68E-04 | -0.04 |
| 383.10 | 2.48E-04 | 2.48E-04 | -0.09 |
[Figure 3: Tm (K) versus No. of C atoms.]
Figure 3 shows the published Tm values plotted versus nC. The first two compounds in the
homologous series (ethylbenzene and propylbenzene) exhibit an anomalous variation of Tm, which
decreases with nC rather than increasing as it does for the rest of the compounds in the series.
To identify a TQSPR applicable to the whole series, all the members of the series for which Tm values
are available (except n-nonylbenzene) were selected as the training set. N-nonylbenzene was left out in
order to validate the TQSPR. The SROV program was used to identify the molecular descriptor with
the highest correlation with Tm for the training set. The descriptor BIC1 (from the 2D information
indices category: "bond information content, neighbourhood symmetry of 1-order") was identified as
having the highest correlation coefficient (of the value -0.99975) with Tm. Plotting Tm versus BIC1
yields a straight line (see Figure 4), meaning that the one-descriptor TQSPR: Tm = 495.3913 - 628.1879
BIC1 provides an adequate representation of Tm for the alkyl benzene homologous series. The prediction
error is greater than the reliability only for ethylbenzene (1.5 %, above the < 1 % reliability). Adding
one more descriptor to the TQSPR (MPC08, a 2D descriptor from the walk and path counts category:
"molecular path count of order 08") yields the TQSPR: Tm = 560.5368 - 1.4111 MPC08 - 751.2091
BIC1. The prediction errors associated with the latter are below experimental error level (see Table 7).
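The two TQSPRs quoted above are simple enough to evaluate by hand or in a few lines of code. The sketch below uses the coefficients from the text and descriptor values from Table 7; the function names are hypothetical, and the descriptor values are the rounded figures printed in the table.

```python
def tm_one_descriptor(bic1):
    """One-descriptor TQSPR: Tm (K) as a linear function of BIC1."""
    return 495.3913 - 628.1879 * bic1


def tm_two_descriptors(mpc08, bic1):
    """Two-descriptor TQSPR: Tm (K) from MPC08 and BIC1."""
    return 560.5368 - 1.4111 * mpc08 - 751.2091 * bic1


def percent_error(reported, predicted):
    """Absolute deviation from the reported value, in percent."""
    return 100.0 * abs(reported - predicted) / reported


# Ethylbenzene: MPC08 = 0, BIC1 = 0.509, reported Tm = 178.2 K (Table 7).
err_one = percent_error(178.2, tm_one_descriptor(0.509))
err_two = percent_error(178.2, tm_two_descriptors(0, 0.509))

# n-Nonylbenzene (validation compound): MPC08 = 12, BIC1 = 0.393, Tm = 249 K.
tm_nonyl = tm_two_descriptors(12, 0.393)
```

For ethylbenzene the one-descriptor model deviates by roughly 1.4 %, above the 1 % reliability, while the two-descriptor model reduces the deviation to about 0.02 %. For the validation compound n-nonylbenzene the two-descriptor model gives about 248.4 K against the reported 249 K.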
Table 7.
Reference Data and Results for Predicting Melting Point Temperature of Alkyl Benzenes Using
the Model: Tm = 560.5368 - 1.4111 MPC08 - 751.2091 BIC1

| Component Name | Tm (K)* DIPPR | Reliability (%)* | Tm (K) Prediction | % error | Descriptor MPC08 | Descriptor BIC1 |
|---|---|---|---|---|---|---|
| ethylbenzene | 178.2 | 1 | 178.1714 | 0.016 | 0 | 0.509 |
| propylbenzene | 173.55 | 1 | 172.3443 | 0.695 | 2 | 0.513 |
| butylbenzene | 185.3 | | 184.5463 | 0.407 | 4 | 0.493 |
| pentylbenzene | 198.15 | 3 | 198.2506 | 0.051 | 6 | 0.471 |
| hexylbenzene | 212 | 1 | 211.955 | 0.021 | 8 | 0.449 |
| heptylbenzene | 225.15 | 1 | 224.9082 | 0.107 | 10 | 0.428 |
| octylbenzene | 237.15 | 3 | 237.77 | 0.261 | 11 | 0.409 |
| n-nonylbenzene | 249 | 1 | 248.3783 | 0.250 | 12 | 0.393 |
| n-decylbenzene | 258.77 | 1 | 258.9865 | 0.084 | 13 | 0.377 |
| n-undecylbenzene | 268 | 1 | 268.0923 | 0.034 | 14 | 0.363 |
| n-dodecylbenzene | 275.927 | 1 | 275.6957 | 0.084 | 15 | 0.351 |
| n-tridecylbenzene | 283.15 | 1 | 283.2991 | 0.053 | 16 | 0.339 |
| n-tetradecylbenzene | 289.15 | 3 | 289.4001 | 0.086 | 17 | 0.329 |
| n-pentadecylbenzene | 295.15 | 1 | 295.5011 | 0.119 | 18 | 0.319 |
| n-hexadecylbenzene | 300.15 | 1 | 300.0996 | 0.017 | 19 | 0.311 |
| n-heptadecylbenzene | 305.15 | 3 | 304.6982 | 0.148 | 20 | 0.303 |
| n-octadecylbenzene | 309 | 1 | 309.2968 | 0.096 | 21 | 0.295 |
| n-nonadecylbenzene | | | 313.1441 | - | 22 | 0.288 |
| n-eicosylbenzene | | | 316.2403 | - | 23 | 0.282 |
| n-heneicosylbenzene | | | 319.3364 | - | 24 | 0.276 |
| n-docosylbenzene | | | 322.4325 | - | 25 | 0.27 |
| n-tricosylbenzene | | | 324.7775 | - | 26 | 0.265 |
| n-tetracosylbenzene | | | 327.1224 | - | 27 | 0.26 |

* Data from the DIPPR database, Rowley et al. [2006].
[Figure 4: Tm (K) versus Descriptor BIC1, with the linear fit y = -628.19x + 495.39, R2 = 0.9995.]
CONCLUSIONS
The examples presented in this paper have demonstrated that the combination of the QS2PR,
SC-QS2PR and TQSPR methods can provide reliable prediction for constant properties, such as normal
boiling and melting temperature, critical properties, heats of formation, etc. as well as for temperature
dependent properties for a wide variety of compounds. In this paper, the prediction of temperature
dependent properties has been demonstrated for liquid vapour pressure and viscosity. However, similar
results are obtained in predicting other temperature dependent properties, which include liquid density,
heat of vaporization, liquid and ideal gas heat capacities, liquid and vapour thermal conductivity and
surface tension. The results concerning these properties will be reported elsewhere.
The analysis and property prediction for a new target compound starts by identifying its similarity
group using vectors of descriptors of the target and the predictive compounds. Depending on the level
of similarity with potential predictive compounds, the most appropriate method (in terms of precision,
complexity and amount of data required) for property prediction can be selected from among the
QS2PR, SC-QS2PR and TQSPR methods. The key to reliable prediction of a property is the existence
of a sufficient number of predictive compounds which are similar to the target compound and for which
experimental target property values are available. The proposed methods may detect groups of
compounds for which the application of the prediction methods is severely restricted by the availability
of experimental data and should be targeted by experimentalists.
The hybrid modeling framework we are developing allows for a systematic incorporation of
information obtained from molecular structure, physical understanding, and experimental data in order
to exploit the available data for the identification of unknown, complex structure-property
relationships.
REFERENCES
Basak, S.C., Gute, B.D., Mills, D., Hawkins, D.M. [2003], Quantitative molecular similarity methods in
the property/toxicity estimation of chemicals: a comparison of arbitrary versus tailored similarity
spaces. J. Mol. Struct. Theochem., Vol. 622, pp 127-145.
Brauner, N., Shacham, M., Cholakov, G.St. and Stateva, R.P. [2005], Property Prediction by Similarity
of Molecular Structures - Practical Application and Consistency Analysis, Chem. Eng. Sci., Vol. 60, pp
5458-5471.
Brauner, N., Stateva, R.P., Cholakov, G.St. and Shacham, M. [2006], Structurally Targeted
Quantitative Structure-Property Relationship Method for Property Prediction, Ind. Eng. Chem. Res., 45,
pp 8430-8437.
Brauner, N., Cholakov, G. St., Kahrs, O., Stateva, R.P. and Shacham, M. [2007], Linear QSPRs for
Predicting Pure Compound Properties in Homologous Series, AIChE J., Accepted for Publication.
Cholakov, G.St, Wakeham, W.A., Stateva, R.P. [1999], Estimation of normal boiling temperature of
industrially important hydrocarbons from descriptors of molecular structure. Fluid Phase Equilibria
Vol. 163, pp 21-42.
Cholakov, G.St., Stateva, R.P., Shacham M. and Brauner, N., [2007], Identifying Equations that
Represent Properties in Homologous Series using Structure-Structure Relations, AIChE Journal , Vol.
53, pp 150-159.
Dearden, J.C. [2003], Quantitative structure-property relationships for prediction of boiling point,
vapor pressure, and melting point. Environmental Toxicology and Chemistry, Vol. 22, pp 1696-1709.
Hastie, T., Tibshirani, R., Friedman, J. [2001], The Elements of Statistical Learning; Data Mining,
Inference, and Prediction, Springer-Verlag.
Kahrs, O., Brauner, N., Cholakov, G. St. and Stateva R. P., Marquardt, W. and Shacham, M., [2007],
Analysis and Refinement of the Targeted QSPR Method, Computers chem. Engng.,
doi:10.1016/j.compchemeng.2007.06.006.
Marano, J.J., Holder, G.D. [1997], General Equations for Correlating the Thermo-physical Properties of
n-Paraffins, n-Olefins and other Homologous Series. 2. Asymptotic Behaviour Correlations for PVT
Properties, Ind. Eng. Chem. Res., vol. 36, pp 1887-1894.
National Institute of Standards and Technology (NIST) [2005]. In: Linstrom PJ, Mallard WG, eds. Chemistry
WebBook, NIST Standard Reference Database Number 69. Gaithersburg, MD:NIST; June 2005
(http://webbook.nist.gov).
Poling, B.E., Prausnitz, J.M., O'Connell, J.P. [2001], Properties of Gases and Liquids, 5th Ed.,
McGraw-Hill, New York.
Rowley, R.L., Wilding, W.V., Oscarson, J.L., Yang, Y., Zundel, N.A. [2006], DIPPR Data
Compilation of Pure Chemical Properties, Design Institute for Physical Properties, http://dippr.byu.edu,
Brigham Young University, Provo, Utah.
Shacham, M., Brauner, N. [2003], The SROV program for data analysis and regression model
identification. Computers and Chemical Engineering, Vol. 27, pp. 701-714.
Shacham, M. , Brauner, N., Cholakov, G. St. and Stateva R. P. [2004], Property Prediction by
Correlations Based on Similarity of Molecular Structures, AIChE J., Vol. 50, No. 10, pp 2481-2492.
Shacham, M, Kahrs, O., Cholakov, G. St. and Stateva R. P. Marquardt, W. and Brauner, N. [2007], The
Role of the Dominant Descriptor in Targeted Quantitative Structure Property Relationships, Chem.
Eng. Sci., Vol. 62, No. 22, pp 6222-6233.
Todeschini, R., Consonni, V., Mauri, A., and Pavan, M., [2006], DRAGON user manual, Talete srl,
Milano, Italy.
Wakeham, W.A., Cholakov, G.St., Stateva, R.P. [2002], Liquid density and critical properties of
hydrocarbons estimated from molecular structure. Journal of Chemical and Engineering Data, Vol. 47,
pp 559-570.