Proceedings of CHT-08

ICHMT International Symposium on Advances in Computational Heat Transfer


May 11-16, 2008, Marrakech, Morocco
CHT-08-183

PREDICTION OF THERMOPHYSICAL PROPERTIES BY METHODS BASED ON SIMILARITY OF MOLECULAR STRUCTURES
Neima Brauner*, Georgi St. Cholakov**, Roumiana P. Stateva*** and Mordechai Shacham****
* School of Engineering, Tel-Aviv University, Tel-Aviv, Israel
** University of Chem. Technology and Metallurgy, Sofia, Bulgaria
*** Inst. Chem. Engng, Bulgarian Academy of Sciences, Sofia, Bulgaria
**** Chem. Eng. Dept., Ben-Gurion University, Beer-Sheva, Israel

Correspondence author. Fax: +972 3 6407334 Email: brauner@eng.tau.ac.il

ABSTRACT Prediction of thermophysical properties required for heat transfer calculation and the
design and development of thermal systems is considered. Newly developed computational methods for
property prediction are described and their use is demonstrated for the prediction of various constant
properties, such as normal boiling and melting temperature, critical properties, heats of formation, etc.,
as well as for the temperature dependent properties: vapour pressure and viscosity of liquids. The
computational methods discussed include the Quantitative Structure-Structure-Property Relationship
(QS2PR), the short-cut QS2PR method (SC-QS2PR) and the targeted QSPR method (TQSPR). These
methods are based on the use of molecular descriptors (calculated from the molecular structure) for
predicting properties. However, unlike in the traditional property prediction methods, these new
methods are targeted to a particular compound, or a group of compounds, and rely on the identification
of a relatively small number of structurally similar compounds. Hence, they can provide accurate
predictions and estimates of the prediction error. In the examples presented it is demonstrated that a
proper combination of the proposed methods can provide property prediction within the experimental
error level.
INTRODUCTION
Heat transfer calculation and the design and development of thermal systems require data for thermophysical properties. The required data include constant properties, such as critical pressure and
temperature, normal boiling temperature, melting point, and temperature dependent properties, such as
vapour pressure, heat capacity, density, viscosity and heat conductivity. The scope of selection of
compounds for a particular application and heat transfer calculations is limited to those compounds for
which thermo-physical property data are available. Unfortunately, these are at present available only
for a small fraction of compounds. Therefore, methods for reliable prediction of property data are
needed.
Current methods used to predict temperature-dependent properties can be classified into "group
contribution" methods, methods based on the "corresponding-states principle" (for an extensive review
of these methods see Poling et al., 2001), and "asymptotic behaviour" correlations (see, for example,
Marano and Holder, 1997). However, reliable methods for predicting temperature-dependent
properties have not yet been established. Even for vapour pressure, which is probably the most
extensively investigated temperature-dependent property, prediction errors often reach several tens of
percent [Poling et al., 2001]. Furthermore, to predict properties of a target compound, many of these
methods require experimental data about the target compound (e.g., critical properties), and therefore
are inapplicable for property prediction of compounds for which no pertinent constant property data are
available.
We have recently developed several new computational methods for property prediction to complement
the traditional group contribution techniques, and Quantitative Structure-Property Relationships
(QSPR, Dearden [2003]). The newly developed methods include the Quantitative Structure-Structure-Property Relationship (QS2PR, Shacham et al. [2004], Brauner et al. [2005]), the short-cut QS2PR
method (SC-QS2PR, Cholakov et al. [2007]) and the targeted QSPR method (TQSPR, Brauner et al.
[2006], Shacham et al. [2007], Kahrs et al. [2007]). The various methods that we have developed are
based on the use of molecular descriptors, which are calculated on the basis of the molecular structure
of the compound, for predicting its properties. However, unlike in the traditional QSPR methods, these
new methods are targeted to a particular compound, or a group of compounds, and rely on the
identification of a relatively small number of structurally similar compounds. Hence, they can provide
accurate predictions and estimates of the prediction error, while avoiding the need to model the highly
nonlinear relationships between molecular descriptors and properties that may require a large amount of
experimental data. The extension of these methods to predict temperature dependent thermo-physical
properties is not a trivial task. Those properties are represented by models which often include
empirically fitted parameters. The latter do not comply with the structural similarity relationships when
compared with their corresponding values in other similar compounds.
The new methods we have developed overcome the difficulties and obstacles of existing methods,
which attempt to predict parameters for empirical (often nonlinear) models. This is accomplished using
structural similarity between compounds for point by point prediction of temperature dependent
properties. The new predicted data can subsequently be used to fit unknown model parameters if
required. The successful application of this approach is demonstrated for prediction of the variation of
liquid vapour pressure and viscosity with temperature.
SOFTWARE AND DATABASES USED
Property prediction using QSPRs requires the use of a stepwise regression program and physical
property and molecular descriptor databases. Modified versions of the stepwise regression program
(SROV) of Shacham and Brauner [2003] were used in the various stages of this research project. In the
early stages of the research the descriptor and property database of Cholakov et al. [1999] and Wakeham et
al. [2002] was used. This database included 259 hydrocarbons for which 99 molecular descriptors and
five physical properties were available.
Recently we have developed a new database with 1630 descriptors calculated with version 5.3 of the
Dragon program [Todeschini et al., 2006; DRAGON is copyrighted by TALETE srl,
http://www.talete.mi.it] for 324 hydrocarbons and oxygen-containing organic compounds (alcohols
and acids). Published property data from the DIPPR [Rowley et al., 2006] and NIST databases
[National Institute of Standards, 2005], were used in the studies.
THE QS2PR AND SC-QS2PR METHODS
The QS2PR technique and its shortcut version, SC-QS2PR, have been described in detail previously
[Shacham et al. 2004, Brauner et al. 2005, Cholakov et al., 2007]. Hereunder, we provide a brief outline
of their main features.

Let us assume that the vector of properties of the target compound yt (the dependent variable) is
potentially related to a set of m vectors of properties of predictive compounds (independent variables)
x1, x2, …, xm. The following partition of the yt and xi vectors to sub-vectors is used:

$$ y_t = \begin{bmatrix} y_{ct} \\ y_{pt} \end{bmatrix}; \qquad x_i = \begin{bmatrix} x_{ci} \\ x_{pi} \end{bmatrix} \qquad (1) $$

where yct is an N vector of known properties, ypt is a K vector of unknown properties. Both the N vector
xci and the K vector xpi contain known properties. Typically, the sub-vectors yct and xci contain
properties, which are directly related to the molecular structure and can be calculated with high
accuracy (molecular descriptors), while the sub-vectors ypt and xpi contain measured properties with
various levels of experimental error. We wish to model the structure-structure relationship between yct
and m independent variables xc1, xc2, …, xcm by a linear regression model, with the general form:

$$ y_{ct} = \beta_1 x_{c1} + \beta_2 x_{c2} + \cdots + \beta_m x_{cm} + \varepsilon \qquad (2) $$

where the weighting factors β1, β2, …, βm are model parameters to be estimated, and ε represents
independent normal errors with a constant variance.
The practical application of Eq. (2) requires preparation of a bank of potential predictive compounds as
a database. The same set of molecular descriptors must be defined for all compounds included in the
database, while the span of the molecular descriptors should reflect the difference between the
compounds in the database. Having the corresponding molecular descriptors for a target compound, yct,
defined as well, a stepwise regression procedure is applied in order to identify the most appropriate
predictive compounds to be included in the structure-structure regression model (Eq. 2) and estimate
the respective model parameters. The similarity between potential predictive compounds and the target
compound is measured by the partial correlation coefficient, rti, between the vector of the molecular
descriptors of the target compound, yct, and that of a potential predictive compound xci. The partial
correlation coefficient is defined as $r_{ti} = \bar{y}_{ct}\,\bar{x}_{ci}^{T}$, where $\bar{y}_{ct}$ and $\bar{x}_{ci}$ are row vectors, centred (by
subtracting the mean) and normalized to unit length (by dividing by the Euclidean norm of the vector).
Absolute rti values close to one (|rti| ≈ 1) indicate high correlation between vectors yct and xci, and thus,
high level of similarity between the molecular structures of the target compound and the predictive
compound i.
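The similarity measure described above can be sketched in a few lines of Python. This is only an illustration of the centring, normalization and dot product steps; the function names and the descriptor values are hypothetical and are not taken from the authors' software or database.

```python
import math

def normalize(v):
    """Centre a descriptor vector and scale it to unit Euclidean length."""
    mean = sum(v) / len(v)
    centred = [x - mean for x in v]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        raise ValueError("descriptor vector has zero variance")
    return [c / norm for c in centred]

def similarity(y_ct, x_ci):
    """Correlation coefficient r_ti between two descriptor vectors."""
    a = normalize(y_ct)
    b = normalize(x_ci)
    return sum(p * q for p, q in zip(a, b))

# Hypothetical descriptor values, for illustration only
target = [7.0, 43.0, 1.2, 0.35]
candidate = [8.0, 67.0, 1.5, 0.31]
r_ti = similarity(target, candidate)
print(f"r_ti = {r_ti:.4f}")
```

Because both vectors are centred and scaled to unit length, the result always lies between -1 and 1, and a candidate whose descriptors are a positive linear function of the target's descriptors gives exactly 1.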
Applying Eq. (2) with property values on its r.h.s. (instead of descriptor values) results in a property-property relation. Accordingly, the following equation can then be used to predict the j-th property of the
target compound yptj on the basis of the data available for that property for the m predictive compounds:

$$ y_{ptj} = \beta_1 x_{p1j} + \beta_2 x_{p2j} + \cdots + \beta_m x_{pmj} \qquad (3) $$

If the predictive and the target compounds belong to the same homologous series, the prediction
procedure can be simplified significantly by the use of the SC-QS2PR method. In such a case the
minimum information required for deriving a structure-structure relation for a target compound in terms
of m predictive compounds is the availability of m-1 non-collinear molecular descriptors for both the
predictive and the target compounds [Cholakov et al., 2007]. For example, for m = 3 the coefficients of
the structure-structure relation, $y_{ct} = \sum_{i=1}^{3} \beta_i x_{ci}$, are obtained by the solution of the following system of
three linear equations:

$$ \begin{aligned} \beta_1 + \beta_2 + \beta_3 &= 1 \\ \beta_1 n_{c1} + \beta_2 n_{c2} + \beta_3 n_{c3} &= n_{ct} \\ \beta_1 x_{c1} + \beta_2 x_{c2} + \beta_3 x_{c3} &= y_{ct} \end{aligned} \qquad (4) $$

where nc1, nc2 and nc3 are the numbers of carbon atoms of the predictive compounds and nct is the
number of the carbon atoms of the target compound. The first equation in (4) reflects the influence of
the descriptors that have the same value for the target and predictive compounds (e.g. the number of
CH3 groups in the n-alkane series). The molecular descriptor included in the third equation must be
well correlated with the property to be predicted for the group of compounds to which the target and
predictive compounds belong (i.e. the homologous series). Alternatively, another property (e.g., normal
boiling temperature) of the target and predictive compounds, which is well-correlated with the target
property (to be predicted), can replace the molecular descriptor in the third equation.

THE TARGETED QSPR METHOD
The targeted QSPR (TQSPR) technique is described in detail by Brauner et al. [2006], Shacham et al.
[2007], Kahrs et al. [2007]. The basic principles of the method are briefly reviewed here.
Similarly to the QS2PR method, the first stage of the TQSPR method involves identification of a
similarity group structurally related to the compound for which properties have to be predicted (the
target compound). For identification of the similarity group, a database of molecular descriptors, xji, is
required, where i is the number of the compound and j is the number of the descriptor. The similarity
between potential predictive compounds and the target compound is measured by the partial correlation
coefficient, rti, between the vector of the molecular descriptors of the target compound, xt, and that of a
potential predictive compound xi. The similarity can be measured alternatively by the Euclidean
distance between the molecular descriptor vectors,

$$ d_{ti} = \sqrt{(x_t - x_i)(x_t - x_i)^T} $$

as suggested by Basak et al. [2003]. Values of |rti| ≈ 1 or dti ≈ 0 indicate a high similarity between the molecular structures of the target
compound and the ith predictive compound. Different methods for adding the predictive compounds to
the similarity group, all related to cluster algorithms [Hastie et al., 2001], were compared in Kahrs et al.
[2007].
The similarity group is used to obtain a training set for the development of the QSPR for compounds
that have been identified as structurally similar to the target compound. The training set is established by
selecting the np compounds with the highest |rti| value for which experimental property values ypi are
available. The remaining compounds in the similarity group are used for validation.
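A minimal sketch of the training-set selection step follows. It ranks candidates by the Euclidean distance dti (the alternative measure mentioned above) rather than by |rti|, and it assumes that validation compounds also need measured property values. The compound names, descriptor values and the function `select_training_set` are hypothetical.

```python
import math

def euclidean_distance(x_t, x_i):
    """Euclidean distance d_ti between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, x_i)))

def select_training_set(x_t, candidates, n_p):
    """Split candidates with measured data into training and validation sets.

    candidates maps a compound name to (descriptor vector, property value),
    where the property value is None when no measurement is available.
    """
    ranked = sorted(
        (euclidean_distance(x_t, vec), name)
        for name, (vec, y) in candidates.items()
        if y is not None
    )
    names = [name for _, name in ranked]
    return names[:n_p], names[n_p:]

# Hypothetical similarity group, for illustration only
group = {
    "cmpd_A": ([1.0, 2.0], 10.0),
    "cmpd_B": ([1.1, 2.1], None),   # no measured property value
    "cmpd_C": ([5.0, 5.0], 20.0),
    "cmpd_D": ([1.2, 1.9], 12.0),
}
training, validation = select_training_set([1.0, 2.0], group, n_p=2)
print(training, validation)
```

Swapping in |rti| only requires replacing the distance by the correlation coefficient and sorting in descending order.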
For development of a TQSPR for a particular property of the target compound, a linear structure-property relation is assumed of the form:

$$ y_p = \beta_0 + \beta_1 \zeta_1 + \beta_2 \zeta_2 + \cdots + \beta_m \zeta_m + \varepsilon \qquad (5) $$

where yp is an np-dimensional vector of the respective property values (np is the number of compounds
included in the training set), ζ1, ζ2, …, ζm are np-dimensional vectors of predictive molecular descriptors
(to be selected via a stepwise regression algorithm), β0, β1, β2, …, βm are the corresponding model
parameters to be estimated, and ε is an np-dimensional vector of random measurement errors. A
stepwise regression program is used to determine which molecular descriptors should be included in the
QSPR to best represent the measured property data of the training set and to calculate the QSPR
parameter values. The TQSPR so obtained can be subsequently employed for calculating estimated
property values for the target compound and for other compounds in the similarity group that do not
have measured data by

$$ \tilde{y}_{pt} = \beta_0 + \beta_1 \zeta_{t1} + \beta_2 \zeta_{t2} + \cdots + \beta_m \zeta_{tm} \qquad (6) $$

where $\tilde{y}_{pt}$ is the estimated unknown property value of the respective compound and ζt1, ζt2, …, ζtm are
its corresponding molecular descriptor values.
The selection of a suitable set of predictive molecular descriptors for Eq (5) is a challenging problem,
since the number of candidates is of the order of 10^3, which prohibits the determination of the best of
all possible sets of predictive molecular descriptors by a full search procedure. The stepwise regression
program SROV [Shacham and Brauner, 2003] is used, which selects in each step one molecular
descriptor that reduces the prediction error most strongly. Two criteria for measuring the signal-to-noise
ratio in the j-th candidate descriptor (TNRj) and in the partial correlation of the j-th candidate descriptor
with the prediction residual (CNRj) ensure that the selected descriptors contain valuable information. A
detailed description of the TNR and CNR criteria and further algorithmic details can be found in
Shacham and Brauner [2003]. Additionally, model refinement (i.e., addition of more descriptors to the
model) stops when the variance of the model prediction error for the training set,

$$ s^2 = \frac{1}{n_p - m - 1} \sum_{i=1}^{n_p} \left( y_{pi} - (\beta_0 + \beta_1 \zeta_{1i} + \beta_2 \zeta_{2i} + \cdots + \beta_m \zeta_{mi}) \right)^2 \qquad (7) $$

falls below a pre-specified threshold value $s_{goal}^2$. For an optimally refined model, the prediction error
should be approximately as large as the measurement error that is present in the property data. Thus, the
threshold value $s_{goal}^2$ can be estimated from the relative measurement error $\varepsilon_{ri}$ of the property data by

$$ s_{goal}^2 = \frac{1}{n_p} \sum_{i=1}^{n_p} \left( \varepsilon_{ri}\, y_{pi} \right)^2 \qquad (8) $$
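The stopping rule of Eqs. (7) and (8) can be illustrated with the following sketch. It covers only the two variance calculations and the comparison between them; the TNR and CNR criteria of SROV are not reproduced. All function names and numerical values are hypothetical.

```python
def residual_variance(y, zeta, beta):
    """Eq. (7): variance of the training-set prediction error.

    y    : list of n_p measured property values
    zeta : list of m descriptor lists, each of length n_p
    beta : list of m + 1 parameters, beta[0] being the intercept
    """
    n_p, m = len(y), len(zeta)
    dof = n_p - m - 1
    if dof <= 0:
        raise ValueError("need more training compounds than parameters")
    sse = 0.0
    for i in range(n_p):
        y_hat = beta[0] + sum(beta[j + 1] * zeta[j][i] for j in range(m))
        sse += (y[i] - y_hat) ** 2
    return sse / dof

def goal_variance(y, rel_err):
    """Eq. (8): threshold estimated from relative measurement errors."""
    return sum((e * yi) ** 2 for e, yi in zip(rel_err, y)) / len(y)

def refinement_done(y, zeta, beta, rel_err):
    """True when no further descriptors need to be added to the model."""
    return residual_variance(y, zeta, beta) < goal_variance(y, rel_err)

# Hypothetical one-descriptor model and data, for illustration only
y = [100.0, 121.0, 139.0, 160.0]
zeta = [[0.0, 1.0, 2.0, 3.0]]
beta = [100.0, 20.0]
print(residual_variance(y, zeta, beta), goal_variance(y, [0.01] * 4))
```

With a relative measurement error of 1 % on every data point, the sketch stops refining as soon as the residual variance drops below the mean squared absolute measurement error.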

PREDICTION OF CONSTANT PROPERTIES OF ETHYL-CYCLOPENTANE BY THE SC-QS2PR METHOD

Prediction of constant properties of ethyl-cyclopentane (target compound) is used to demonstrate the
application of the SC-QS2PR method. The data used for deriving the structure-structure correlation
for ethyl-cyclopentane are shown in Table 1. Three of the neighbouring compounds, methyl-, propyl- and butyl-cyclopentane, were employed as predictive compounds. From amongst the molecular
descriptors which are asymptotically dependent on the number of carbon atoms, the easily calculated
Wiener's index from the database of Cholakov et al. [1999] was arbitrarily selected as the additional
molecular descriptor for deriving the structure-structure relationship. Introducing the numerical values
of the number of carbon atoms and the Wiener's index into Eq. (4) yields the results: β1 = 0.58537;
β2 = 0.2439; and β3 = 0.17073. These coefficients can be used in the property-property correlation, Eq.
(3), to predict various properties of ethyl-cyclopentane. For example, introducing the boiling point data
from Table 1 and the coefficients into Eq. (3) yields an estimate for the boiling temperature of ethyl-cyclopentane:
0.58537*344.96 + 0.2439*404.11 + 0.17073*429.8 = 373.87 K
Comparison of the predicted value with the experimental data shown in Table 1 indicates a prediction
error of 0.73 %.
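The boiling-point estimate above is a direct application of Eq. (3) and can be reproduced with a short sketch. The coefficients and boiling temperatures are the values quoted in the text and in Table 1; the function name `predict_property` is hypothetical.

```python
def predict_property(beta, x_p):
    """Eq. (3): weighted sum of the predictive compounds' property values."""
    return sum(b * x for b, x in zip(beta, x_p))

# Coefficients quoted in the text for methyl-, propyl- and butyl-cyclopentane
beta = [0.58537, 0.2439, 0.17073]
# Normal boiling temperatures (K) of the predictive compounds, from Table 1
tb_predictive = [344.96, 404.11, 429.8]

tb_pred = predict_property(beta, tb_predictive)
tb_reported = 376.62  # ethyl-cyclopentane, Table 1
error_pct = abs(tb_reported - tb_pred) / tb_reported * 100.0
print(f"predicted Tb = {tb_pred:.2f} K, error = {error_pct:.2f} %")
```

The same coefficients are reused for every other constant property; only the list of property values of the predictive compounds changes.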

Table 1. Structure-Structure Correlation Data for Ethyl-Cyclopentane.

No.          Compound             C atoms   Wiener's index   Boiling Temp. (K)*
1            methylcyclopentane   6         26               344.96
2 (target)   ethylcyclopentane    7         43               376.62
3            propylcyclopentane   8         67               404.11
4            butylcyclopentane    9         99               429.8
* From the database of Cholakov et al. [1999]

Table 2. Properties of Ethyl-Cyclopentane: Prediction Errors

No.   Property                    Units          Reported value*   Prediction error (%)
1     Critical Temperature        K              569.5             1.10
2     Critical Pressure           Pa             3400000           0.57
3     Critical Volume             m^3/kmol       0.375             0.38
4     Crit Compress Factor        unitless       0.269             0.81
5     Melting Point Temp.         K              134.71            5.95
6     Triple Pt Temperature       K              134.71            5.95
7     Normal Boiling Temp.        K              376.62            0.73
8     Liq Molar Volume            m^3/kmol       0.128748          0.33
9     IG Heat of Formation        J/kmol         -1.2700E+08       0.13
10    IG Gibbs of Formation       J/kmol         4.4800E+07        0.21
11    IG Absolute Entropy         J/kmol*K       3.7800E+05        0.06
12    Std Heat of Formation       J/kmol         -1.6300E+08       0.25
13    Std Gibbs of Formation      J/kmol         3.7600E+07        0.58
14    Std Absolute Entropy        J/kmol*K       2.8000E+05        0.09
15    Heat Fusion at Melt Pt      J/kmol         6.8700E+06        22.63
16    Std Net Heat of Comb        J/kmol         -4.2800E+09       0.00
17    Acentric Factor             unitless       0.270095          2.58
18    Radius of Gyration          m              3.73E-10          0.05
19    Solubility Parameter        (J/m^3)^0.5    16300             0.46
20    van der Waals Volume        m^3/kmol       0.0704            0.02
21    van der Waals Area          m^2            8.8700E+08        0.00
22    Refractive Index            unitless       1.4173            0.17
23    Flash Point                 K              269               1.00
24    Lower Flammability Limit    vol% in air    1.1               2.66
25    Upper Flammability Limit    vol% in air    6.7               11.29
26    Lower Flamm Limit Temp      K              270               0.99
27    Upper Flamm Limit Temp      K              303               0.11
28    Auto Ignition Temp          K              533.15            7.65
* Data from the DIPPR database [Rowley et al. 2006]

By introducing experimental data for other properties of the predictive compounds, the respective
properties for the target compound can be similarly predicted. The predicted values are compared in
Table 2 with data reported in the DIPPR database [Rowley et al. 2006], for 28 constant properties. In
this table the difference between the DIPPR and the predicted values is shown as "prediction error".
In assessing the prediction error, the experimental (or other) error in the published data should be taken
into consideration. In the DIPPR database the errors in the data are reported in terms of "reliability".
Obviously, the predicted property value cannot be more accurate than the published values of the same
property for the predictive compounds. Thus, if the prediction error is smaller than the reliability
reported by DIPPR the predicted property value can be considered accurate up to the experimental error
level. The predictions of the properties shown in Table 2 for ethyl-cyclopentane are accurate up to
experimental error level, except for the melting point temperature, the triple point temperature and the
heat of fusion at the melting point (for property numbers 23 through 28 the reliabilities are unknown).
Improving the prediction of melting point and related properties (such as heat of fusion) with the SC-QS2PR method requires the use of a descriptor which is collinear with the target property values of the potential
predictive compounds (instead of the Wiener index) from among the database of the molecular
descriptors [Cholakov et al. 2007].
Similar results were obtained in predicting constant properties for a wide range of compounds using the
QS2PR and SC-QS2PR methods [Shacham et al. 2004, Brauner et al. 2005, Cholakov et al. 2007].
It should be pointed out that in order to obtain predictions within experimental error level when using
these methods, the database must contain a small number of predictive compounds with a very high level
of similarity to the target compound (such as being members of the same homologous series). The SC-QS2PR method has been evaluated so far only for cases when the predictive and the target compounds
belong to the same homologous series, while the QS2PR method has been tested with systems
containing a wider variety of (similar) compounds. However, our experience has shown that in cases of
a wider structural variety of available predictive compounds the TQSPR method is a more robust
method (see the forthcoming section on prediction of the melting point of alkyl-benzenes using the
TQSPR method).
PREDICTION OF VAPOUR PRESSURE OF 1-BUTANOL BY THE SC-QS2PR METHOD

Vapour pressure prediction can be based on the Antoine equation, for which the model parameters are
available for a large number of compounds [e.g., in the NIST database, 2005]. The Antoine equation reads:

$$ \log(P_v) = A - \frac{B}{C + T} \qquad (9) $$

where T is the temperature (K), Pv is the vapour pressure (bar) and A, B and C are parameters that can
be obtained by regression of experimental data. An alternative form of the Antoine equation, which is
explicit in the saturation temperature (Ts) at pressure Pv, is the following:

$$ T_s = \frac{B}{A - \log(P_v)} - C \qquad (10) $$

Prediction of the vapour pressure as function of temperature (or saturation temperature as function of
pressure) involves the calculation of Pv for all predictive compounds at a specified temperature Ti and
predicting Pvi of the target compound at the same temperature using the SC-QS2PR method, similarly
to the way any other constant property was predicted in the previous section.
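The two forms of the Antoine equation can be sketched as a pair of functions. The sketch assumes a base-10 logarithm and a saturation temperature obtained by inverting Eq. (9), which gives Ts = B/(A - log Pv) - C; with the low-pressure ethanol constants of Table 3 this reproduces the ethanol entry of Table 4 at 0.02 bar. The function names are hypothetical.

```python
import math

def antoine_pressure(t, a, b, c):
    """Eq. (9): vapour pressure (bar) at temperature t (K), base-10 log assumed."""
    return 10.0 ** (a - b / (c + t))

def antoine_saturation_temperature(p, a, b, c):
    """Saturation temperature (K) at pressure p (bar), the inverse of Eq. (9)."""
    return b / (a - math.log10(p)) - c

# Low-pressure-range Antoine constants of ethanol, from Table 3
ETHANOL_LOW = (5.37229, 1670.409, -40.191)

ts = antoine_saturation_temperature(0.02, *ETHANOL_LOW)
p_back = antoine_pressure(ts, *ETHANOL_LOW)
print(f"Ts = {ts:.1f} K, round-trip pressure = {p_back:.4f} bar")
```

The round trip through both functions returns the original pressure, which is a quick check that the two forms are mutually consistent.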

Table 3. Data for Vapour Pressure Prediction: Ethanol, 1-Propanol and 1-Pentanol (Predictive Compounds) and 1-Butanol (Target Compound).

                                        Predictive Compounds                   Target
                                        Ethanol     1-propanol   1-pentanol    1-butanol
No. of C Atoms                          2           3            5             4
Descriptor HTe                          5.188       7.524        11.026        9.324
Normal Boiling Temp. (K)                351.5       370.3        411.1         390.6
Low Pressure Range
  Temp. Range - Min. (K)                273         292.4        307           362.36
  Temp. Range - Max. (K)                351.7       370.4        411           398.84
  Press. Range - Min. (bar)             0.016       0.019        0.006         0.330
  Press. Range - Max. (bar)             1.023       1.015        1.022         1.334
  Antoine Constant A                    5.37229     5.31384      4.68277       4.50393
  Antoine Constant B                    1670.409    1690.864     1492.549      1313.878
  Antoine Constant C                    -40.191     -51.804      -91.621       -98.789
High Pressure Range
  Temp. Range - Min. (K)                364.8       405.46       437.79        419.34
  Temp. Range - Max. (K)                513.91      536.71       513.79        562.98
  Press. Range - Min. (bar)             1.575       3.336        2.118         2.568
  Press. Range - Max. (bar)             57.101      51.398       11.402        43.909
  Antoine Constant A                    4.92531     4.59871      3.97383       4.42921
  Antoine Constant B                    1432.526    1300.491     1106.11       1305.001
  Antoine Constant C                    -61.819     -86.364      -134.578      -94.676

The details of this method will be explained and demonstrated using ethanol, 1-propanol and 1-pentanol
as predictive compounds and 1-butanol as the target compound. The pertinent data for these compounds
are shown in Table 3. The data shown include the number of C atoms (nC), the DRAGON molecular
descriptor HTe, the normal boiling temperature (Tb), the Antoine constants and the temperature range and
pressure range of validity of the constants (from the NIST database). The Antoine constants for both the
low pressure and the high pressure ranges are included. The descriptor HTe (a GETAWAY descriptor)
has been selected as it was identified by the SROV program as highly correlated with the Tb values of the
1-alcohol series [see Brauner et al., 2007].
A comparison of the temperature ranges of validity of the predictive compounds shows that there is very little
overlap between the ranges of the various compounds. However, there is almost a complete overlap
between the pressure ranges of validity. The differences between the Antoine equation coverage levels
for the two properties for the low pressure range are also shown in Figure 1.
Because of the much higher coverage level of the vapour pressure, the prediction of Ts as a function of Pv
is preferred. To this aim the SC-QS2PR is used with the molecular descriptor HTe. Introducing the nC
and HTe values of the predictive and target compounds into Eq. (4) yields the following parameter
values: β1 = -0.08376, β2 = 0.62564 and β3 = 0.45812 (for ethanol, 1-propanol and 1-pentanol, respectively). To predict Ts of the target compound for a
specified pressure (say P), the P value (and the Antoine equation parameters of the predictive
compounds) are introduced into Eq. (10) to obtain the corresponding Ts values of the predictive
compounds. Then, Eq. (3) with the above values of β1, β2 and β3 is used to predict the Ts value of the
target compound. Table 4 shows the so-predicted values of Ts of 1-butanol in comparison with the
values calculated directly by its Antoine equation (whose parameters are shown in Table 3). The error
in predicted Ts of 1-butanol is <0.2 % in the low pressure range and <0.3 % in the high pressure range.
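The coefficient calculation for 1-butanol can be checked with a small sketch that solves the three linear equations of Eq. (4) using the nC and HTe values of Table 3, and then applies Eq. (3) to the saturation temperatures listed in Table 4 at 0.02 bar. The solver is a generic Gaussian elimination written for this illustration, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def sc_qs2pr_coefficients(n_c, n_ct, x_c, x_ct):
    """Eq. (4) for m = 3 predictive compounds."""
    a = [[1.0, 1.0, 1.0], n_c, x_c]
    b = [1.0, n_ct, x_ct]
    return solve_linear(a, b)

# nC and HTe of ethanol, 1-propanol, 1-pentanol; target values of 1-butanol (Table 3)
beta = sc_qs2pr_coefficients([2, 3, 5], 4, [5.188, 7.524, 11.026], 9.324)

# Saturation temperatures (K) of the predictive compounds at 0.02 bar (Table 4)
ts_predictive = [276.4, 292.9, 325.5]
ts_butanol = sum(b * t for b, t in zip(beta, ts_predictive))
print([round(b, 5) for b in beta], round(ts_butanol, 1))
```

The negative weight of ethanol reflects that the target lies between 1-propanol and 1-pentanol in the series, so ethanol enters as an extrapolation correction rather than as a neighbour on both sides.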

[Plot: vapour pressure (bar) versus temperature (K) for ethanol, 1-propanol and 1-pentanol]

Figure 1. Vapour pressure versus temperature of the predictive compounds in the low pressure range.

Table 4. Comparison of Saturation Temperatures Obtained by the Antoine Equation for 1-Butanol with those Predicted by the SC-QS2PR Method.

Pressure   Saturation Temperature (K)             Sat. Temp. (K), 1-butanol    % error
(bar)      Ethanol    1-propanol   1-pentanol     Antoine Eq.   SC-QS2PR
0.02       276.4      292.9        325.5          309.8         309.2           0.17
0.1        302.3      319.6        354.3          337.0         336.9           0.03
0.5        334.6      352.9        391.1          372.2         372.0           0.06
1          351.1      370.0        410.4          390.6         390.1           0.15
3.5        388.8      407.1        457.1          430.6         431.5          -0.22
5          400.8      419.8        472.3          444.5         445.5          -0.22
7.5        415.5      435.6        491.5          461.9         462.9          -0.23
10         426.8      447.7        506.5          475.2         476.4          -0.25

PREDICTION OF LIQUID VISCOSITY OF TOLUENE BY THE SC-QS2PR METHOD

Liquid viscosity data correlations for the predictive compounds ethylbenzene, pentylbenzene and n-heptylbenzene are used to predict the viscosity of liquid toluene. The following Riedel equation is used
by DIPPR to represent the variation of the liquid viscosity with the temperature:

$$ \mu_L = \exp\left( A + \frac{B}{T} + C \ln(T) + D\,T^{E} \right) \qquad (11) $$

where μL is the viscosity (Pa*s) and A, B, C, D and E are parameters of the Riedel equation. The
pertinent data for the predictive and the target compounds are shown in Table 5. The data shown
includes nC , Tb, the temperature range, the associated viscosity range of the data available for these
compounds and the Riedel equation constants. In this case there is a complete overlap between the
temperature ranges of the predictive compounds (see Figure 2). Thus, no exchange between the
dependent and the independent variables (as was done in the case of the vapour pressure prediction) is
necessary.
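Evaluating the Riedel equation, Eq. (11), for a given set of constants is straightforward, as the sketch below shows for toluene. It assumes the form exp(A + B/T + C ln T + D T^E) and uses the toluene constants listed in Table 5; the function name is hypothetical.

```python
import math

def riedel_viscosity(t, a, b, c, d, e):
    """Eq. (11): liquid viscosity (Pa*s) at temperature t (K)."""
    return math.exp(a + b / t + c * math.log(t) + d * t ** e)

# Riedel constants A, B, C, D, E of toluene, from Table 5
TOLUENE = (-226.08, 6805.7, 37.542, -0.060853, 1.0)

mu_253 = riedel_viscosity(253.10, *TOLUENE)
print(f"toluene viscosity at 253.10 K = {mu_253:.3e} Pa*s")
```

The same function applies to each predictive compound once its own constants are supplied; for a compound with no D and E terms, passing D = 0 removes the last term.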

Table 5. Data and Riedel Equation Constants for Liquid Viscosity of Toluene (Target Compound) and Ethylbenzene, Pentylbenzene and N-Heptylbenzene (Predictive Compounds).

                             Predictive Compounds                                DIPPR data
                             ethylbenzene   pentylbenzene   n-heptylbenzene     toluene
No. of C Atoms               8              11              13                  7
Normal Boiling Temp. (K)     409.35         478.61          519.25              383.78
Temp. Range - Min. (K)       178.2          253.15          222.15              178.18
Temp. Range - Max. (K)       413.1          478.61          714                 383.78
Visc. Range - Min (Pa*s)     8.01E-03       3.66E-03        2.5E-02             1.57E-03
Visc. Range - Max (Pa*s)     2.32E-04       1.69E-04        6.12E-05            2.43E-04
Riedel Constant A            -13.563        -407.33         82.588              -226.08
Riedel Constant B            1208.6         13112           -8568.7             6805.7
Riedel Constant C            0.377          67.647          -12.521             37.542
Riedel Constant D            -              -0.09652        910310              -0.060853
Riedel Constant E            -              1               -1.9838             1

[Plot: viscosity (Pa*s) versus temperature (K) for ethylbenzene, pentylbenzene and n-heptylbenzene]

Figure 2. Liquid viscosity versus temperature of the predictive compounds


To predict the liquid viscosity of toluene, the nC and Tb values of the predictive and target
compounds are introduced into Eq. (4), yielding the following parameter values: β1 = 1.63253, β2 = -1.081325 and β3 = 0.4488 (for ethylbenzene, pentylbenzene and n-heptylbenzene, respectively). The predicted value of the viscosity of the target at a specified temperature can
now be obtained by calculating the viscosity of the predictive compounds using the Riedel equation
(11) (with the parameters of Table 5) and introducing the viscosity and the β values into Eq. (3). The so-predicted viscosity values and the viscosity calculated by the Riedel equation are shown in Table 6. The
maximal prediction error is <1.6 %. Considering that the upper error bound in the published liquid
viscosity correlations for toluene is 3 % (on the basis of experimental data, see the DIPPR database
[Rowley et al., 2006]), we can conclude that in this case the prediction error is within the experimental error
level.

Table 6. Comparison of Liquid Viscosity: Riedel Equation and Predicted Values

Temp. (K)   Viscosity (Pa*s)                 Error (%)
            Riedel Equation   Predicted
253.10      1.07E-03          1.09E-03       -1.58
263.10      9.01E-04          8.95E-04        0.71
273.10      7.71E-04          7.59E-04        1.50
283.10      6.68E-04          6.58E-04        1.46
293.10      5.85E-04          5.79E-04        1.07
303.10      5.19E-04          5.14E-04        0.87
313.10      4.64E-04          4.61E-04        0.58
323.10      4.18E-04          4.16E-04        0.38
333.10      3.79E-04          3.78E-04        0.35
343.10      3.45E-04          3.45E-04        0.11
353.10      3.16E-04          3.16E-04        0.07
363.10      2.90E-04          2.90E-04       -0.17
373.10      2.68E-04          2.68E-04       -0.04
383.10      2.48E-04          2.48E-04       -0.09

PREDICTION OF MELTING POINT FOR THE ALKYL BENZENE HOMOLOGOUS SERIES BY THE TQSPR METHOD
The Tm data for the members of the alkyl-benzene series included in this study are shown in Table 7.
Published (experimental or smoothened) Tm data are available in the DIPPR database for compounds
containing 8 through 24 carbon atoms (ethylbenzene through n-octadecylbenzene). The reliability of
these Tm data varies between <1% and <3% as shown in Table 7.

[Plot: melting point temperature (K) versus number of C atoms]

Figure 3. Melting Temperatures of Alkyl-Benzenes Versus Number of Carbon Atoms

Figure 3 shows the published Tm values plotted versus nC. The first two compounds in the
homologous series (ethylbenzene and propylbenzene) exhibit anomalous variation of Tm, with a
decreasing trend with nC, rather than the increase shown by the rest of the compounds in the series.
To identify a TQSPR applicable to the whole series, all the members of the series for which Tm values
are available (except n-nonylbenzene) were selected as the training set. N-nonylbenzene was left out in
order to validate the TQSPR. The SROV program was used to identify the molecular descriptor with
the highest correlation with Tm for the training set. The descriptor BIC1 (from the 2D information
indices category: "bond information content, neighbourhood symmetry of 1-order") was identified as
having the highest correlation coefficient (of the value -0.99975) with Tm. Plotting Tm versus BIC1
yields a straight line (see Figure 4), meaning that the one-descriptor TQSPR, Tm = 495.3913 - 628.1879
BIC1, provides adequate representation of Tm for the alkyl benzene homologous series. The prediction
error is greater than the reliability only for ethylbenzene (1.5 %, above the <1 % reliability). Adding
one more descriptor to the TQSPR (MPC08, a 2D descriptor from the walk and path counts category:
"molecular path count of order 08") yields the TQSPR: Tm = 560.5368 - 1.4111 MPC08 - 751.2091
BIC1. The prediction errors associated with the latter are below experimental error level (see Table 7).
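The two TQSPRs quoted above are plain linear expressions and can be evaluated directly. The sketch below applies the two-descriptor model to n-nonylbenzene, the compound left out of the training set, using its MPC08 and BIC1 values and DIPPR melting point from Table 7; the function names are hypothetical.

```python
def tm_one_descriptor(bic1):
    """One-descriptor TQSPR for the melting point (K)."""
    return 495.3913 - 628.1879 * bic1

def tm_two_descriptor(mpc08, bic1):
    """Two-descriptor TQSPR for the melting point (K)."""
    return 560.5368 - 1.4111 * mpc08 - 751.2091 * bic1

# n-nonylbenzene (validation compound): descriptors and DIPPR Tm from Table 7
tm_nonyl = tm_two_descriptor(mpc08=12, bic1=0.393)
tm_reported = 249.0
error_pct = abs(tm_reported - tm_nonyl) / tm_reported * 100.0
print(f"predicted Tm = {tm_nonyl:.2f} K, error = {error_pct:.3f} %")
```

For ethylbenzene (MPC08 = 0, BIC1 = 0.509) the one-descriptor model gives about 175.6 K against a reported 178.2 K, which is the roughly 1.5 % deviation that motivates adding the second descriptor.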

Table 7. Reference Data and Results for Predicting Melting Point Temperature of Alkyl Benzenes Using the Model: Tm = 560.5368 - 1.4111 MPC08 - 751.2091 BIC1

Component Name         Tm (K)*    Reliability   Tm (K)       % error   Descriptor   Descriptor
                       DIPPR      (%)*          Prediction             MPC08        BIC1
ethylbenzene           178.2      1             178.1714     0.016     0            0.509
propylbenzene          173.55     1             172.3443     0.695     2            0.513
butylbenzene           185.3                    184.5463     0.407     4            0.493
pentylbenzene          198.15     3             198.2506     0.051     6            0.471
hexylbenzene           212        1             211.955      0.021     8            0.449
heptylbenzene          225.15     1             224.9082     0.107     10           0.428
octylbenzene           237.15     3             237.77       0.261     11           0.409
n-nonylbenzene         249        1             248.3783     0.250     12           0.393
n-decylbenzene         258.77     1             258.9865     0.084     13           0.377
n-undecylbenzene       268        1             268.0923     0.034     14           0.363
n-dodecylbenzene       275.927    1             275.6957     0.084     15           0.351
n-tridecylbenzene      283.15     1             283.2991     0.053     16           0.339
n-tetradecylbenzene    289.15     3             289.4001     0.086     17           0.329
n-pentadecylbenzene    295.15     1             295.5011     0.119     18           0.319
n-hexadecylbenzene     300.15     1             300.0996     0.017     19           0.311
n-heptadecylbenzene    305.15     3             304.6982     0.148     20           0.303
n-octadecylbenzene     309        1             309.2968     0.096     21           0.295
n-nonadecylbenzene     -          -             313.1441     -         22           0.288
n-eicosylbenzene       -          -             316.2403     -         23           0.282
n-heneicosylbenzene    -          -             319.3364     -         24           0.276
n-docosylbenzene       -          -             322.4325     -         25           0.27
n-tricosylbenzene      -          -             324.7775     -         26           0.265
n-tetracosylbenzene    -          -             327.1224     -         27           0.26
* Data from the DIPPR database, Rowley et al. [2006].

[Figure 4 plot: melting point temperature (K) versus descriptor BIC1, with linear trendline
y = -628.19x + 495.39, R2 = 0.9995]

Figure 4. Melting Temperatures of Alkyl-Benzenes Plotted Versus the Descriptor BIC1
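The straight-line relationship between Tm and BIC1 can be reproduced approximately from the values in Table 7. The sketch below fits an ordinary least-squares line to the training set (all compounds with DIPPR data except n-nonylbenzene) and then predicts Tm for the held-out compound. It is a plain-Python illustration, not the SROV program used by the authors, and the function name is an assumption. Because the tabulated BIC1 values are rounded to three decimals, the fitted coefficients will not match the published ones exactly.

```python
from math import sqrt

# (BIC1, DIPPR Tm in K) for the training set of Table 7.
# n-Nonylbenzene (BIC1 = 0.393, Tm = 249 K) is held out for validation.
training = [
    (0.509, 178.2), (0.513, 173.55), (0.493, 185.3), (0.471, 198.15),
    (0.449, 212.0), (0.428, 225.15), (0.409, 237.15), (0.377, 258.77),
    (0.363, 268.0), (0.351, 275.927), (0.339, 283.15), (0.329, 289.15),
    (0.319, 295.15), (0.311, 300.15), (0.303, 305.15), (0.295, 309.0),
]


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x, plus Pearson r."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(training)
tm_nonyl = intercept + slope * 0.393  # validation prediction for n-nonylbenzene

print(f"Tm = {intercept:.2f} + ({slope:.2f}) * BIC1, r = {r:.5f}")
print(f"n-nonylbenzene: predicted {tm_nonyl:.2f} K, DIPPR 249 K")
```

The fitted slope and intercept should come out close to the trendline reported in Figure 4, and the validation prediction for n-nonylbenzene should fall within the 1% reliability of the DIPPR value.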

CONCLUSIONS
The examples presented in this paper have demonstrated that the combination of the QS2PR, SC-QS2PR
and TQSPR methods can provide reliable predictions of constant properties, such as normal
boiling and melting temperatures, critical properties, heats of formation, etc., as well as of
temperature-dependent properties, for a wide variety of compounds. In this paper, the prediction of
temperature-dependent properties has been demonstrated for liquid vapour pressure and viscosity. However,
similar results are obtained in predicting other temperature-dependent properties, including liquid density,
heat of vaporization, liquid and ideal gas heat capacities, liquid and vapour thermal conductivity, and
surface tension. The results concerning these properties will be reported elsewhere.
The analysis and property prediction for a new target compound starts by identifying its similarity
group using vectors of descriptors of the target and the potential predictive compounds. Depending on
the level of similarity with the potential predictive compounds, the most appropriate method (in terms
of precision, complexity and amount of data required) for property prediction can be selected from among
the QS2PR, SC-QS2PR and TQSPR methods. The key to reliable prediction of a property is the existence
of a sufficient number of predictive compounds that are similar to the target compound and for which
experimental values of the target property are available. The proposed methods can also detect groups of
compounds for which the application of the prediction methods is severely restricted by the availability
of experimental data; such groups should be targeted by experimentalists.
The hybrid modeling framework we are developing allows for the systematic incorporation of
information obtained from molecular structure, physical understanding, and experimental data in order
to exploit the available data for the identification of unknown, complex structure-property
relationships.

REFERENCES
Basak, S.C., Gute, B.D., Mills, D., Hawkins, D.M. [2003], Quantitative molecular similarity methods in
the property/toxicity estimation of chemicals: a comparison of arbitrary versus tailored similarity
spaces. J. Mol. Struct. Theochem., Vol. 622, pp 127-145.
Brauner, N., Shacham, M., Cholakov, G.St. and Stateva, R.P. [2005], Property Prediction by Similarity
of Molecular Structures - Practical Application and Consistency Analysis, Chem. Eng. Sci., Vol. 60, pp
5458-5471.
Brauner, N., Stateva, R.P., Cholakov, G.St. and Shacham, M. [2006], Structurally Targeted
Quantitative Structure-Property Relationship Method for Property Prediction, Ind. Eng. Chem. Res., 45,
pp 8430-8437.
Brauner, N., Cholakov, G. St., Kahrs, O., Stateva, R.P. and Shacham, M. [2007], Linear QSPRs for
Predicting Pure Compound Properties in Homologous Series, AIChE J., Accepted for Publication.
Cholakov, G.St., Wakeham, W.A., Stateva, R.P. [1999], Estimation of normal boiling temperature of
industrially important hydrocarbons from descriptors of molecular structure. Fluid Phase Equilibria,
Vol. 163, pp 21-42.
Cholakov, G.St., Stateva, R.P., Shacham M. and Brauner, N., [2007], Identifying Equations that
Represent Properties in Homologous Series using Structure-Structure Relations, AIChE Journal , Vol.
53, pp 150-159.
Dearden, J.C. [2003], Quantitative structure-property relationships for prediction of boiling point,
vapor pressure, and melting point. Environmental Toxicology and Chemistry, Vol. 22, pp 1696-1709.
Hastie, T., Tibshirani, R., Friedman, J. [2001], The Elements of Statistical Learning; Data Mining,
Inference, and Prediction, Springer-Verlag.
Kahrs, O., Brauner, N., Cholakov, G.St., Stateva, R.P., Marquardt, W. and Shacham, M. [2007],
Analysis and Refinement of the Targeted QSPR Method, Computers chem. Engng.,
doi:10.1016/j.compchemeng.2007.06.006.
Marano, J.J., Holder, G.D. [1997], General Equations for Correlating the Thermo-physical Properties of
n-Paraffins, n-Olefins and other Homologous Series. 2. Asymptotic Behaviour Correlations for PVT
Properties, Ind. Eng. Chem. Res., vol. 36, pp 1887-1894.
National Institute of Standards and Technology (NIST). In: Linstrom PJ, Mallard WG, eds. Chemistry
WebBook, NIST Standard Reference Database Number 69. Gaithersburg, MD: NIST; June 2005
(http://webbook.nist.gov).
Poling, B.E., Prausnitz, J.M., O'Connell, J.P. [2001], Properties of Gases and Liquids, 5th Ed.,
McGraw-Hill, New York.
Rowley, R.L., Wilding, W.V., Oscarson, J.L., Yang, Y., Zundel, N.A. [2006], DIPPR Data
Compilation of Pure Chemical Properties, Design Institute for Physical Properties, http://dippr.byu.edu,
Brigham Young University, Provo, Utah.

Shacham, M., Brauner, N. [2003], The SROV program for data analysis and regression model
identification. Computers and Chemical Engineering, Vol. 27, pp 701-714.
Shacham, M., Brauner, N., Cholakov, G.St. and Stateva, R.P. [2004], Property Prediction by
Correlations Based on Similarity of Molecular Structures, AIChE J., Vol. 50, No. 10, pp 2481-2492.
Shacham, M., Kahrs, O., Cholakov, G.St., Stateva, R.P., Marquardt, W. and Brauner, N. [2007], The
Role of the Dominant Descriptor in Targeted Quantitative Structure Property Relationships, Chem.
Eng. Sci., Vol. 62, No. 22, pp 6222-6233.
Todeschini, R., Consonni, V., Mauri, A., and Pavan, M., [2006], DRAGON user manual, Talete srl,
Milano, Italy.
Wakeham, W.A., Cholakov, G.St., Stateva, R.P. [2002], Liquid density and critical properties of
hydrocarbons estimated from molecular structure. Journal of Chemical and Engineering Data, Vol. 47,
pp 559-570.
