Comparison of Probability Distribution Functions For Fitting Distillation Curves of Petroleum
The fitting capability of 25 probability distribution functions for distillation data of petroleum fractions was
analyzed in this work. Rankings of all the functions based on two different approaches were established after
a statistical analysis of the fit of the functions with a data set of 137 distillation curves. In general, distribution
functions with four parameters showed better fitting capability than those with three parameters. Two-parameter
functions were not effective in fitting distillation data. The Weibull extreme, Kumaraswamy, and Weibull
functions were found to be the best distribution functions for fitting distillation data considering their ranking
and the required CPU time. These distribution functions exhibited the lowest Akaike information criterion and
Bayesian information criterion average values, standard deviations lower than 1%, correlation coefficients
higher than 0.999, and residuals randomly distributed without any tendency. The fitting capability of the best
functions was validated with an independent set of distillation data, and the ranking was the same.
Introduction

Specialized characterization of petroleum has been the topic of a number of research papers, ranging from the well-known and widely used “assays” to sophisticated studies based on up-to-date laboratory techniques, for example, NMR, MS, XRD, and so forth. Since the early 1930s, several studies have discussed the implementation of more accurate characterization methods.1 While sophisticated approaches to characterize petroleum have enhanced our understanding of its structure, traditional characterization methods are still widely employed. Empirical correlations, which are mainly based on distillation curves and specific gravities, are also very popular for estimating “bulk” properties of petroleum fractions. These correlations are very useful in process engineering, particularly when the available experimental data are limited.2 In the case of distillation curves, sometimes only a limited number of distillation points is available, and it is necessary to interpolate or extrapolate to determine a required value. Several American Society for Testing and Materials (ASTM) methods are commonly used to obtain distillation data: ASTM D-5307, ASTM D-2892, ASTM D-1160, ASTM D-86, and so forth. All of them employ standardized devices and report boiling point temperatures of the sample versus distillation yields on a volumetric and/or a gravimetric basis. Since distillation curves are formed with a finite number of temperature-yield data, they can be fitted to different functions in order to generate a continuous representation. Polynomial regression, cubic spline interpolation, and Lagrange interpolation are all common mathematical tools which have been used for interpolating points along the distillation curve. Another approach, which offers more accurate adjustments, is the use of least-squares methods for fitting probability distribution functions to distillation data.

Probability distribution functions have been utilized in a wide variety of applications: in process simulation software during the creation of pseudocomponents, which are used together with quadrature techniques for determining the optimal number of pseudocomponents for simulation purposes, for example, characterization of petroleum fractions;3-5 for phase equilibrium calculations when continuous thermodynamics methods are applied;6-8 and to describe the extent of chemical transformations occurring during petroleum refining processes, which is a relatively recent application of the probability distribution functions.9,10 Other examples include the description of polymerization reaction products, which are mixtures of compounds with different molecular weights, and the use in environmental studies for representing the size distribution of particles of dust and aerosols, the rain precipitation per day, the level of rivers and lakes, and other meteorological phenomena.11

Distillation data and specific gravity are the most common properties used as inputs in empirical correlations to characterize petroleum fractions. This characterization is achieved by means of correlations that are useful for determining molecular weight, critical properties, and so forth. They can also be utilized for distinguishing reaction products as pseudocomponents or lumps (naphtha, middle distillates, etc.) of some typical refinery
* To whom correspondence should be addressed. Fax: (+52-55) 9175-8418. E-mail: jancheyt@imp.mx.
† Instituto Mexicano del Petróleo.
‡ ESIQIE-IPN.
§ University of Alberta.

(1) Watson, K. M.; Nelson, E. F. Ind. Eng. Chem. 1933, 25, 880.
(2) Technical data book - petroleum refining; American Petroleum Institute: Washington, DC, 1976.
(3) Whitson, C. H. SPE J. 1983, 275, 683.
(4) Whitson, C. H.; Anderson, T. F.; Soreide, I. NPRA, New Orleans, LA, March 6-10, 1988.
(5) Dhulesia, H. Hydrocarbon Process. 1984, 62, 179.
(6) Kehlen, H.; Ratzsch, M. T. Chem. Eng. Sci. 1987, 42, 221.
(7) Willman, B.; Teja, A. S. Ind. Eng. Chem. Res. 1986, 26, 948.
(8) Peng, D. Y.; Wu, J. P.; Batycky, J. P. AOSTRA J. Res. 1987, 3, 113.
(9) Bacaud, R.; Rouleau, L.; Bacaud, B. Energy Fuels 1996, 10, 915.
(10) Krishna, R.; Saxena, A. K. Chem. Eng. Sci. 1989, 44, 703.
(11) Kumaraswamy, P. J. Hydrol. 1980, 46, 79.
processes such as hydrocracking, catalytic cracking, and so forth.12 To have accurate and reliable representations of distillation data for further interpolation, a strict analysis of other approaches apart from the traditional interpolation techniques is mandatory.

The main objective of the present work is to contrast the fitting capability of several probability distribution functions to distillation data.

Brief Background of Probability Distribution Functions

Probability distribution functions were first developed to measure the possibility of the occurrence of a specific event. Depending on the number of possible events, they can be discrete functions, when the number of possible events is discrete, or they can be classified as continuous functions. In the present contribution, only continuous distribution functions are studied. Probability distribution functions are defined by a small number of parameters and have simple formulas for calculating mean, mode, variance, and so forth. They have two main forms: the probability density function (PDF) and the cumulative distribution function (CDF). The former is the most commonly used form of the probability distribution functions; a very well-known example of this type of function is the classical Gaussian bell.13 CDFs increase monotonically and generally describe the same behavior that is observed with distillation curves. Due to their simplicity, probability distribution functions are easily included in computer programs for modeling, optimization, and control purposes. It is not recommended to select the distribution function a priori, however, since finding an adequate distribution function that represents the experimental data with minimal error depends on the specific characteristics of each type of function.

As was mentioned in the Introduction, several distribution functions have been utilized for calculations related to the petroleum industry. Whitson et al.3,4 proposed a petroleum characterization method based on the three-parameter γ distribution function for the characterization of C7+. Dhulesia5 proposed the Weibull distribution function in its cumulative form to describe ASTM distillation curves of petroleum fractions. The Weibull equation was tested with distillation data of the feed and products of a fluid catalytic cracking unit, and the fitted curves showed good agreement with experimental data. The normal function has been employed in phase equilibrium methods when the continuous thermodynamics approach was used.6 The normal function and the error function were utilized for modeling reaction behavior of the hydrocracking process.9,10 Willman and Teja7 used the bivariate log-normal distribution function for characterizing the composition of mixtures involved during phase equilibrium calculations. The β distribution function has been employed for characterizing petroleum fractions in state equation calculations.8 Exponential and χ2 distribution functions, which are simplified cases of the γ distribution function, have been used to characterize the heavy end of reservoir fluids and to develop phase equilibrium computations.14,15 A modified form of the Weibull distribution was utilized by Riazi16 for establishing a method to predict complete property distributions for the molecular weight, boiling point, specific gravity, and refractive index of C7+ fractions. The extreme value distributions are a family of equations which have been used in applications involving natural phenomena such as rainfall, floods, air pollution, and corrosion. Gumbel, Fréchet, and Weibull functions may all be represented as members of a single family of generalized extreme value distribution functions.17 The Kumaraswamy distribution is comparable with the β distribution function in its versatility and double-bounded nature; however, the Kumaraswamy distribution has a simpler form than the β distribution for both the PDF and the CDF.11 Even though the Kumaraswamy distribution function was originally developed for hydraulic modeling, it has been applied to describe bounded physical variables encountered in civil engineering.

Among all distribution functions found in the literature, only 25 were chosen to be analyzed in this work. The selected distributions are summarized in Table 1. Not included in the list are a few well-known distributions, such as the Tukey-Lambda, Cauchy, and F distributions,18 because they are either seldom used to model empirical data or they lack a convenient analytical form for the CDF. Even though most of the distributions reported in Table 1 are not widely applied to engineering applications, they all have the potential to be very useful for describing real-world data sets. Definitions of the probability distribution functions in their two main forms (CDF and PDF) are presented in Table 2. The forms of the functions may vary slightly from those reported in the literature. It is also possible that a distribution function could be known by different names.

Table 1. Probability Distribution Functions Used in This Work (a)

                                 number of    other functions included
eq   function                    parameters   PDF    CDF                 ref
1    α (normalized)              2            Φ      Φ                   18
2    α                           4            Φ      Φ                   18
3    β                           4            Γ      I                   42
4    Bradford                    3            -      -                   42
5    Burr                        4            -      -                   42
6    χ                           3            Γ      Γ                   42
7    fatigue life                3            -      Φ                   18
8    Fisk                        3            -      -                   42
9    Fréchet                     3            -      -                   17
10   folded normal               2            -      Φ                   42
11   γ                           3            Γ      Γ                   42
12   generalized extreme value   3            -      -                   17
13   generalized logistic        3            -      -                   42
14   Gumbel                      2            -      -                   17
15   half normal                 2            -      -                   42
16   Johnson SB                  4            -      Φ                   43
17   Kumaraswamy                 4            -      -                   11
18   log-normal                  2            -      Φ                   42
19   Nakagami                    3            Γ      Γ                   42
20   normal                      2            -      Φ                   42
21   Riazi                       3            -      -                   16
22   Student's t                 3            Γ      I                   42
23   Wald                        2            -      Φ                   42
24   Weibull                     3            -      -                   42
25   Weibull extreme             4            -      -                   18

(a) Φ: normal CDF. Γ: gamma function. I: incomplete beta function.

Methodology
Data Source. The fitting capability of the 25 selected functions was evaluated using two distillation data sources: those previously

(12) Ancheyta, J.; Sánchez, S.; Rodríguez, M. A. Catal. Today 2005, 109, 76.
(13) Evans, M.; Hastings, N.; Peacock, B. Statistical distributions; John Wiley: New York, 1993.
(14) Behrens, R. A.; Sandler, S. I. SPE Res. Eng. 1998, 3, 1041.
(15) Luks, K. D.; Turek, E. A.; Kragas, T. R. Ind. Eng. Chem. Res. 1990, 29, 2101.
(16) Riazi, M. R. Ind. Eng. Chem. Res. 1989, 28, 1731.
(17) Kotz, S.; Nadarajah, S. Extreme value distributions: theory and applications; Imperial College Press: London, 2000.
(18) Heckert, N. A.; Filliben, J. NIST handbook 148; NIST: Gaithersburg, MD, 2003.
Comparison of Probability Distribution Functions Energy & Fuels, Vol. 21, No. 5, 2007 2957
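Table 2
eq, distribution, probability density function (PDF), cumulative distribution function (CDF)

1  α (normalized)
   PDF: $\frac{D}{y^{2}\,\Phi(C)\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(C-\frac{D}{y}\right)^{2}\right]$
   CDF: $\frac{\Phi\left(C-\frac{D}{y}\right)}{\Phi(C)}$

2  α (A, B, C, D)
   PDF: $\frac{D}{B\,t^{2}\,\Phi(C)\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(C-\frac{D}{t}\right)^{2}\right]$
   CDF: $\frac{\Phi\left(C-\frac{D}{t}\right)}{\Phi(C)}$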
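3  β (A, B, C, D)
   PDF: $\frac{\Gamma(C+D)}{\Gamma(C)\,\Gamma(D)}\,\frac{(y-A)^{C-1}}{(B-A)^{C+D-1}\,(B-y)^{1-D}}$
   CDF: $I\left(\frac{y-A}{B-A},\,C,\,D\right)$

4  Bradford (A, B, C)
   PDF: $\frac{C}{\left[C(y-A)+B-A\right]\ln(C+1)}$
   CDF: $\frac{\ln\left[1+\frac{C(y-A)}{B-A}\right]}{\ln(C+1)}$

5  Burr (A, B, C, D)
   PDF: $\frac{CD}{B}\,t^{-C-1}\left(1+t^{-C}\right)^{-D-1}$
   CDF: $\left(1+t^{-C}\right)^{-D}$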
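6  χ (A, B, C)
   PDF: $\frac{t^{C-1}\exp\left(-\frac{1}{2}t^{2}\right)}{2^{(C/2)-1}\,B\,\Gamma\left(\frac{C}{2}\right)}$
   CDF: $1-\Gamma\left(\frac{C}{2},\,\frac{1}{2}t^{2}\right)$

7  fatigue life (A, B, C)
   PDF: $\frac{y^{2}-A^{2}}{2\sqrt{2\pi}\,C^{2}B\,y^{2}}\left[\sqrt{\frac{y}{A}}-\sqrt{\frac{A}{y}}\right]^{-1}\exp\left[-\frac{1}{2C^{2}}\left(\frac{y}{A}+\frac{A}{y}-2\right)\right]$
   CDF: $\Phi\left\{\frac{1}{C}\left[\left(\frac{y}{A}\right)^{0.5}-\left(\frac{A}{y}\right)^{0.5}\right]\right\}$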
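8  Fisk (A, B, C)
   PDF: $\frac{C}{B}\,\frac{t^{C-1}}{\left(1+t^{C}\right)^{2}}$
   CDF: $\frac{1}{1+t^{-C}}$

9  Fréchet (A, B, C)
   PDF: $\frac{C}{B}\,t^{-C-1}\exp\left(-t^{-C}\right)$
   CDF: $\exp\left(-t^{-C}\right)$

11  γ (A, B, C)
   PDF: $\frac{1}{B\,\Gamma(C)}\,t^{C-1}\exp(-t)$
   CDF: $\Gamma(C,\,t)$

12  generalized extreme value (A, B, C)
   PDF: $\frac{1}{B}\,(1+Ct)^{-1-(1/C)}\exp\left[-(1+Ct)^{-1/C}\right]$
   CDF: $\exp\left[-(1+Ct)^{-1/C}\right]$

13  generalized logistic (A, B, C)
   PDF: $\frac{C}{B}\,\frac{\exp(-t)}{\left[1+\exp(-t)\right]^{C+1}}$
   CDF: $\frac{1}{\left[1+\exp(-t)\right]^{C}}$

14  Gumbel (A, B)
   PDF: $\frac{1}{B}\,\exp(-t)\exp\left[-\exp(-t)\right]$
   CDF: $\exp\left[-\exp(-t)\right]$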
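15  half normal (A, B)
   PDF: $\frac{1}{B}\sqrt{\frac{2}{\pi}}\exp\left(-\frac{1}{2}t^{2}\right)$
   CDF: $2\Phi(t)-1$

16  Johnson SB (A, B, C, D)
   PDF: $\frac{D(B-A)}{(y-A)(B-y)\sqrt{2\pi}}\exp\left\{-\frac{1}{2}\left[C+D\ln\left(\frac{y-A}{B-y}\right)\right]^{2}\right\}$
   CDF: $\Phi\left[C+D\ln\left(\frac{y-A}{B-y}\right)\right]$

17  Kumaraswamy (A, B, C, D)
   PDF: $CD\left(\frac{y-A}{B-A}\right)^{C-1}\left[1-\left(\frac{y-A}{B-A}\right)^{C}\right]^{D-1}$
   CDF: $1-\left[1-\left(\frac{y-A}{B-A}\right)^{C}\right]^{D}$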
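18  log-normal (A, B)
   PDF: $\frac{1}{B\,y\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln(y)-A}{B}\right)^{2}\right]$
   CDF: $\Phi\left(\frac{\ln(y)-A}{B}\right)$

19  Nakagami (A, B, C)
   PDF: $\frac{2C^{C}}{B\,\Gamma(C)}\,t^{2C-1}\exp\left(-Ct^{2}\right)$
   CDF: $\Gamma\left(C,\,Ct^{2}\right)$

20  normal (A, B)
   PDF: $\frac{1}{B\sqrt{2\pi}}\exp\left(-\frac{1}{2}t^{2}\right)$
   CDF: $\Phi(t)$

21  Riazi (A, B)
   PDF: $\frac{B^{2}}{A}\,y^{B-1}\exp\left(-\frac{B}{A}\,y^{B}\right)$
   CDF: $1-\exp\left(-\frac{B}{A}\,y^{B}\right)$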
2958 Energy & Fuels, Vol. 21, No. 5, 2007 Sánchez et al.
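Table 2 (Continued)
eq, distribution, probability density function (PDF), cumulative distribution function (CDF)

22  Student’s t (A, B, C)
   PDF: $\frac{\Gamma\left(\frac{C+1}{2}\right)}{B\sqrt{\pi C}\,\Gamma\left(\frac{C}{2}\right)}\left(1+\frac{t^{2}}{C}\right)^{-[(C+1)/2]}$
   CDF: $\frac{1}{2}\,I\left(\frac{C}{C+t^{2}},\,\frac{C}{2},\,\frac{1}{2}\right)$ for $t\le 0$; $1-\frac{1}{2}\,I\left(\frac{C}{C+t^{2}},\,\frac{C}{2},\,\frac{1}{2}\right)$ for $t>0$

23  Wald (A, B)
   PDF: $\sqrt{\frac{B}{2\pi y^{3}}}\exp\left[-\frac{B}{2y}\left(\frac{y-A}{A}\right)^{2}\right]$
   CDF: $\Phi\left(\sqrt{\frac{B}{y}}\,\frac{y-A}{A}\right)+\exp\left(\frac{2B}{A}\right)\Phi\left(\sqrt{\frac{B}{y}}\,\frac{-y-A}{A}\right)$

24  Weibull (A, B, C)
   PDF: $\frac{C}{B}\,t^{C-1}\exp\left(-t^{C}\right)$
   CDF: $1-\exp\left(-t^{C}\right)$

25  Weibull extreme (A, B, C, D)
   PDF: $\frac{CD\,t^{C-1}}{B}\exp\left(-t^{C}\right)\left[1-\exp\left(-t^{C}\right)\right]^{D-1}$
   CDF: $\left[1-\exp\left(-t^{C}\right)\right]^{D}$

(a) t = (y - A)/B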
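As an illustration of how the cumulative forms in Table 2 can be evaluated, the following minimal Python sketch implements the CDFs of the Weibull, Weibull extreme, and Kumaraswamy functions, using t = (y - A)/B for the first two and the ratio (y - A)/(B - A) for the third, as read from the table. The function names, the clamping of values outside the support, and the assumption that all shape parameters are positive are choices made for this sketch and are not taken from the paper.

```python
import math


def weibull_cdf(y, A, B, C):
    """Three-parameter Weibull CDF: 1 - exp(-t**C), with t = (y - A)/B."""
    t = (y - A) / B
    if t <= 0.0:
        return 0.0  # below the location parameter nothing has distilled
    return 1.0 - math.exp(-(t ** C))


def weibull_extreme_cdf(y, A, B, C, D):
    """Four-parameter Weibull extreme CDF: [1 - exp(-t**C)]**D (D > 0 assumed)."""
    return weibull_cdf(y, A, B, C) ** D


def kumaraswamy_cdf(y, A, B, C, D):
    """Four-parameter Kumaraswamy CDF on [A, B]: 1 - (1 - z**C)**D."""
    z = (y - A) / (B - A)
    z = min(max(z, 0.0), 1.0)  # clamp to the bounded support
    return 1.0 - (1.0 - z ** C) ** D


# Evaluate the three CDFs on a grid of dimensionless temperatures
# (parameter values are arbitrary, chosen only to show the call pattern).
grid = [i / 10.0 for i in range(11)]
curve_w = [weibull_cdf(x, 0.0, 0.4, 2.0) for x in grid]
curve_we = [weibull_extreme_cdf(x, 0.0, 0.4, 2.0, 1.5) for x in grid]
curve_k = [kumaraswamy_cdf(x, 0.0, 1.0, 2.0, 3.0) for x in grid]
```

All three curves increase monotonically from 0 toward 1, which is the property that makes cumulative forms natural candidates for distillation curves; none of them needs a special function such as the incomplete beta function.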
reported in the literature19-37 and data obtained in laboratories at the Mexican Institute of Petroleum and the University of Alberta. The selected samples include whole crude oils; vacuum gasoils; atmospheric and vacuum residua; atmospheric gasoils; light cycle oils (LCO); hydrotreated LCO; feeds of the fluid catalytic cracking (FCC) process; and feeds and products of mild thermal processing, vacuum residue hydrotreating, hydrotreating of bitumen-derived gasoils, and hydrotreating of middle distillates. The distillation data set comprised petroleum samples mainly from Kuwait, Saudi Arabia, Mexico, and Canada. A total of 137 distillation curves were considered in the analysis, each having at least six experimental points, with a total of 1627 temperature-versus-yield points. All experimental distillation data were obtained using standardized methods (physical distillation methods: ASTM D-2892 and ASTM D-1160; simulated distillation methods: ASTM D-5307, ASTM D-6352, and high-temperature simulated distillation) and were collected in a database. Distillation data were not reduced to a single basis; instead, they were treated on their original basis (ASTM D-2892, ASTM D-1160, ASTM D-5307, etc.). Different units of temperature and product recovery were found in the literature and were transformed to degrees Celsius and weight percent.

Example of Parameter Estimation. The comparison of fitting capability of all functions reported in Tables 1 and 2 was performed by statistical methods. The procedure for parameter estimation is described below, and the four-parameter β-distribution function (eq 3) is taken as an example using a single distillation data set, which corresponds to a simulated distillation curve of hydrocracked Maya crude oil:

Step 1. Temperature data were changed to a dimensionless form using the following equation:

$$\theta_i = \frac{T_i - T_0}{T_1 - T_0} \qquad (26)$$

where θi is the dimensionless temperature, Ti is the actual boiling point temperature, and T0 and T1 are reference temperatures, which are chosen to have θi values between 0 and 1 (T0 = 30 °C and T1 = 1000 °C in this work). The dimensionless distillation curve together with the original distillation data of a selected sample are shown in Figure 1. In the case of dimensionless distillation data, the values neither start at zero nor finish at one since the reference temperatures, T0 and T1, covered a wider range of values than those of the selected sample.

Step 2. An optimization method38 was applied for obtaining the optimal set of parameters of the probability distribution function. The optimization criterion was the minimization of the residual sum of squares (RSS) defined by eq 27.

$$RSS = \sum_i \left(y_{exp,i} - y_{cal,i}\right)^{2} \qquad (27)$$

where yexp,i and ycal,i are the experimental and calculated weight fractions, respectively. The optimal set of parameters using a β distribution function for the data given in Figure 1 was A = 0.08962, B = 1.05013, C = 2.49003, and D = 6.34186. To be sure about the precision of the estimated parameters and convergence to a global minimum, a sensitivity analysis was conducted using an approach reported elsewhere.39

Step 3. Predicted values of liquid recovery using the distribution function with the optimal values of parameters were obtained. The results from the model are also shown in Figure 1.

(19) Ali, F.; Ghaloum, N.; Hauser, A. Energy Fuels 2006, 20, 45.
(20) Anabtawi, J. A.; Ali, S. A. Ind. Eng. Chem. Res. 1991, 30, 2586.
(21) Aoyagi, K.; McCaffrey, W.; Gray, M. R. Pet. Sci. Technol. 2003, 21, 997.
(22) Barman, B. N. Energy Fuels 2005, 19, 1995.
(23) Bollas, G. M.; Vasalos, I. A.; Lappas, A. A.; Iatridis, D. K.; Tsioni, G. K. Ind. Eng. Chem. Res. 2004, 43, 3270.
(24) Chen, Y. W.; Tsai, M. C. Ind. Eng. Chem. Res. 1997, 36, 2521.
(25) Espinosa, M.; Figueroa, Y.; Jimenez, F. Energy Fuels 2004, 18, 1832.
(26) Laredo, G. C.; López, C. R.; Alvárez, R. E.; Castillo, J.; Cano, J. L. Energy Fuels 2004, 18, 1687.
(27) Lenoir, J. M.; Hipkin, H. G. J. Chem. Eng. Data 1973, 18, 195.
(28) Marafi, A.; Al-Bazzaz, H.; Al-Marri, M.; Maruyama, F.; Absi-Halabi, M.; Stanislaus, A. Energy Fuels 2003, 17, 1191.
(29) Marroquin, G.; Ancheyta, J.; Ramírez, A.; Farfan, E. Energy Fuels 2001, 15, 1213.
(30) Maw, S. C.; Heldman, J. L.; Hwang, S. C.; Tsonopoulos, C. Ind. Eng. Chem. Process Des. Dev. 1984, 23, 577.
(31) Michael, G.; Al-Siri, M.; Khan, Z. H.; Ali, F. A. Energy Fuels 2005, 19, 1598.
(32) Owusu-Boakye, A.; Dalai, A. K.; Ferdous, D.; Adjaye, J. Energy Fuels 2005, 19, 1763.
(33) Rousis, S. G.; Fitzgerald, W. P. Anal. Chem. 2000, 72, 1400.
(34) Sarma, A. K.; Konwer, D. Energy Fuels 2005, 19, 1755.
(35) Schwartz, H. E.; Brownlee, R. G.; Boduszinski, M. M.; Su, F. Anal. Chem. 1987, 59, 1393.
(36) Ukwuoma, O. Pet. Sci. Technol. 2002, 20, 525.
(37) Yui, S. M.; Ng, S. H. Energy Fuels 1995, 9, 665.

Figure 1. Experimental (○) and predicted (solid line) distillation values with β function (hydrocracked Maya crude oil).
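The three-step procedure (nondimensionalize the temperatures with eq 26, minimize the RSS of eq 27, then predict recoveries) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the paper fits the β function with Marquardt's method, whereas the sketch swaps in the three-parameter Weibull CDF (which has a closed form) and a simple derivative-free coordinate search so that it runs with the standard library alone. The temperature and yield values are hypothetical, and all function names are invented for the example.

```python
import math

T0, T1 = 30.0, 1000.0  # reference temperatures quoted in the text, deg C


def theta(T):
    """Eq 26: dimensionless temperature."""
    return (T - T0) / (T1 - T0)


def weibull_cdf(x, A, B, C):
    """Weibull CDF, 1 - exp(-t**C) with t = (x - A)/B; stand-in for the beta CDF."""
    t = (x - A) / B
    if t <= 0.0:
        return 0.0
    try:
        return 1.0 - math.exp(-(t ** C))
    except OverflowError:  # t**C too large for a float: the CDF is effectively 1
        return 1.0


def rss(params, xs, ys):
    """Eq 27: residual sum of squares of experimental minus calculated fractions."""
    A, B, C = params
    if B <= 0.0 or C <= 0.0:
        return float("inf")  # reject parameters outside the valid domain
    return sum((y - weibull_cdf(x, A, B, C)) ** 2 for x, y in zip(xs, ys))


def pattern_search(f, start, step=0.1, tol=1e-8, max_iter=10000):
    """Coordinate search: try +/- step on each parameter, halve the step when stuck."""
    best, fbest = list(start), f(start)
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                ftrial = f(trial)
                if ftrial < fbest:  # accept only strict improvements
                    best, fbest, improved = trial, ftrial, True
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return best, fbest


# Hypothetical boiling points (deg C) and cumulative weight fractions, illustration only.
temps = [120.0, 200.0, 280.0, 350.0, 420.0, 500.0, 560.0]
fracs = [0.05, 0.18, 0.38, 0.58, 0.76, 0.91, 0.97]

xs = [theta(T) for T in temps]                                        # Step 1
start = [0.0, 0.4, 2.0]                                               # initial guess (assumed)
params, rss_min = pattern_search(lambda p: rss(p, xs, fracs), start)  # Step 2
predicted = [weibull_cdf(x, *params) for x in xs]                     # Step 3
sd = math.sqrt(rss_min / (len(xs) - 2))                               # SD = sqrt(RSS/(n - 2))
```

A coordinate search of this kind can stop at a local minimum, which is one reason the paper complements the optimization with a sensitivity analysis of the estimated parameters; in practice a Levenberg-Marquardt routine from a numerical library would normally replace `pattern_search`.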
of the 137 distillation data sets.

As an example, comparisons of experimental and predicted values from the β-distribution function are shown in Figure 2, in which a visual analysis and predictive capability of the function can be established. Correlation coefficient (R²), slope, intercept, and standard deviation were obtained by statistical analysis of the parity plot of experimental versus calculated values of liquid recovery. A summary of the statistical parameters derived from a regression analysis of the parity plot is presented in Table 3. This table gives a more quantitative analysis of the predictive capability of the β function.

The predictive capability of the different functions was classified according to their statistical indicators. First, a methodology based on regression analyses was applied, in which SD as calculated by eq 28 was the main criterion for establishing the ranking; the correlation coefficient (R²), slope, and intercept were also considered.

$$SD = \sqrt{\frac{RSS}{n-2}} \qquad (28)$$

A second approach took into consideration both the Akaike and Bayesian information criteria (AIC and BIC, respectively). The AIC is an operational way of considering both the complexity of a model and how well it fits the data.40 The AIC methodology attempts to find the model that best explains the data with a minimum of free parameters. When residuals are randomly distributed, the AIC is calculated as

$$AIC = 2k + n\ln\left(\frac{RSS}{n}\right) \qquad (29)$$

where k is the number of parameters, n is the number of observations, and RSS is the residual sum of squares. AIC includes

$$\omega_i = \frac{\exp\left(-\frac{1}{2}\Delta_i\right)}{\sum_{r=1}^{R}\exp\left(-\frac{1}{2}\Delta_r\right)} \qquad (32)$$

$$\frac{\omega_1}{\omega_j} = \exp\left(\frac{1}{2}\Delta_j\right) \qquad (33)$$

(38) Marquardt, D. W. J. Soc. Ind. Appl. Math. 1963, 2, 431.
(39) Alcazar, L. A.; Ancheyta, J. Chem. Eng. J. 2007, 128, 85.
(40) Burnham, K. P.; Anderson, D. R. Model selection and multimodel inference, 2nd ed.; Springer-Verlag: New York, 1998.

Results and Discussion

To determine the best distribution function to describe distillation curves, the various functions were first fitted to the experimental data. The largest errors during data fitting were obtained with boiling points close to the end of the distillation curves, followed by those close to the beginning. This problem was particularly evident for the normal and Student’s t distribution functions, which are symmetrical. The difficulty in fitting distribution functions to IBP and FBP data is compounded by the larger experimental error associated with the endpoints of distillation curves. These errors are associated with the sensitivity of experimental devices when initializing or finalizing the tests and are observed regardless of whether the equipment is operated manually or automatically. The experimental error is variable, depending upon the standardized method that is employed; for instance, in the ASTM D-2892 method, errors up to 1.2 wt % for the volume recovery are tolerated, whereas in the ASTM D-1160 method, errors can range from 1.7 up to 5.7 wt % for the different points in the distillation curve.

Selecting the “best” distribution function is not a trivial task. A wide variety of statistical data can be used for this task, including standard deviations, R², Akaike and Bayesian information criteria, and even CPU time, which are all presented in Table 4. It is well-accepted that correlation coefficients are not very useful in discriminating between models. In this study, the correlation coefficients were very close to unity (0.986-0.999) for all of the functions. To highlight this point, only the normalized α distribution function (eq 1) exhibited a value of R² lower than
0.99. A parity plot is presented in Figure 2, and the slopes and intercepts of the parity plots for each model are included in Table 4. The slopes of the experimental versus predicted values plots are in all cases virtually unity (0.984-1.026), and intercepts range between -1.47 and +1.00. A more useful technique to eliminate potential models was to identify which distributions yielded nonrandom residuals. Nearly all of the two-parameter models had trends in their residuals. The two-parameter models that were eliminated due to trends in the residuals were normalized α (eq 1), Fréchet (eq 9), folded normal (eq 10), half normal (eq 15), log-normal (eq 18), normal (eq 20), Student’s t (eq 22), and Wald (eq 23). Additionally, the three-parameter models that were eliminated due to trends in their residuals were the fatigue life (eq 7) and the generalized extreme value (eq 12) models. Interestingly, the Gumbel distribution, eq 14, was the only two-parameter model to display random residuals. Not surprisingly, all of the four-parameter models were effective in describing the experimental data. For comparison purposes, Figure 3 presents the residuals analysis for the worst (two-parameter α function, eq 1) and best (four-parameter Weibull extreme function, eq 25) functions. The differences in the precision of the estimations are very clear; while residuals for eq 25 ranged between -5 and +5 and were randomly distributed, those for eq 1 varied from -15 to +10 with a very pronounced trend.

One method to rank the models is to compare the standard deviations. Since the standard deviation values for all functions ranged from 0.59 to 3.37 wt %, they were more useful than the R² values or the slopes and intercepts from the parity plots. From the results given in Table 4 (R², slope, intercept, and SD) and residual analysis, the following classification of accuracy of predictions was established:

Group 1: SD < 1.0; 0.999 < R² < 1; number of functions = 4 (eqs 3, 17, 24, and 25)
Group 2: 1.0 < SD < 1.992; 0.995 < R² < 0.998; number of functions = 11 (eqs 2, 4, 5, 6, 8, 11, 13, 14, 16, 19, and 21)
Group 3: trends in residuals; number of functions = 10 (eqs 1, 7, 9, 10, 12, 15, 18, 20, 22, and 23)

Functions of group 1 are the most accurate, and those of group 3 do not adequately describe the functionality of the distillation data.

Model selection should be based not solely on goodness of fit but also on the degree of confidence of the predicted parameters. It is well-known that increasing the number of free parameters to be estimated can improve the goodness of fit but can also decrease the confidence in the estimates of the model parameters. Therefore, ranking the models solely on the basis of SD data may not be satisfactory for comparing functions with different numbers of parameters and different sample sizes. In
Table 6. Statistical Parameters for Regression Analysis for Data Set Validation

                          Weibull extreme   Kumaraswamy     Weibull         β               α (normalized)
eq                        25                17              24              3               1
R²                        0.994             0.994           0.994           0.993           0.984
SD                        2.38              2.43            2.54            2.75            4.10
slope                     1.004 ± 0.016     1.009 ± 0.016   1.006 ± 0.017   1.008 ± 0.009   1.010 ± 0.028
average AIC               18.11             18.54           19.34           19.92           36.09
average BIC               20.18             20.61           20.89           21.99           37.13
∆i                        0                 0.43            1.23            1.81            17.99
evidence ratio            1                 1.24            1.85            2.47            8048
positive residuals        164               167             172             176             195
negative residuals        182               179             174             170             151
absolute difference (a)   18                12              2               6               44

(a) Absolute difference between positive and negative residuals.
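The ∆i and evidence ratio rows of Table 6 follow directly from the average AIC values, and the calculation can be sketched as below. The AIC expression is eq 29 and the evidence ratio is eq 33; the Akaike weights use the standard normalization of exp(-∆/2) over all models. The BIC expression is the common least-squares form, k ln(n) + n ln(RSS/n), which is an assumption here because the paper's own BIC equation is not reproduced in this text. Function names are hypothetical.

```python
import math


def aic(rss, n, k):
    """Eq 29: AIC = 2k + n ln(RSS/n)."""
    return 2 * k + n * math.log(rss / n)


def bic(rss, n, k):
    """Assumed least-squares form of BIC: k ln(n) + n ln(RSS/n)."""
    return k * math.log(n) + n * math.log(rss / n)


def aic_differences(aic_values):
    """Delta_i = AIC_i - AIC_min for every model."""
    best = min(aic_values)
    return [a - best for a in aic_values]


def akaike_weights(deltas):
    """Normalized relative likelihoods exp(-Delta_i/2) of the models."""
    terms = [math.exp(-0.5 * d) for d in deltas]
    total = sum(terms)
    return [term / total for term in terms]


def evidence_ratios(deltas):
    """Eq 33: weight of the best model divided by the weight of model j."""
    return [math.exp(0.5 * d) for d in deltas]


# Average AIC values from Table 6, in column order.
names = ["Weibull extreme", "Kumaraswamy", "Weibull", "beta", "alpha (normalized)"]
avg_aic = [18.11, 18.54, 19.34, 19.92, 36.09]

deltas = aic_differences(avg_aic)
weights = akaike_weights(deltas)
ratios = evidence_ratios(deltas)

for name, d, w, r in zip(names, deltas, weights, ratios):
    print(f"{name:20s} delta={d:6.2f}  weight={w:.3f}  evidence ratio={r:.2f}")
```

With these inputs the first four evidence ratios round to 1, 1.24, 1.85, and 2.47, in agreement with Table 6. The last ratio comes out close to, but not exactly at, the tabulated value because the averages in the table are rounded to two decimals and the exponential amplifies that rounding.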
(eq 17), and Weibull (eq 24) functions require similar computing time (0.248, 0.124, and 0.219, respectively), while the required CPU time for evaluating the β distribution function (eq 3, rank 2) is more than 10 times longer (3.705). This can be explained by the relative simplicity of the Weibull extreme, Weibull, and Kumaraswamy distribution functions, which do not include any special function as in the case of the β function (Table 1).

The following observations, based on the number of parameters (two, three, or four) of each distribution function, can be made:

• Four-parameter distribution functions offer the best fitting capability. Five of them are ranked among the top 10. The Weibull extreme, β, and Kumaraswamy distributions are in the best-ranked group.
• Some of the three-parameter distribution functions can fit distillation data with good accuracy: the Weibull and γ distributions have standard deviations of 0.86 and 1.07% and are ranked within the best five.
• Two-parameter distribution functions exhibited poor fitting capability. All but one of them are in group 3.
• It must be noticed that the γ and normal distribution functions, which are the most popular distribution functions used for fitting distillation data, were ranked 5 and 20, respectively.

For validation purposes, fitting capabilities of the four best functions and the worst function (eqs 25, 17, 24, 3, and 1) were determined using other data sets. A total of 30 samples, which are from three whole crude oils and their various boiling range fractions, with a total of 346 points were selected for this task. They cover a wide range of distillation temperatures (from 20 to 540 °C). The validation results are presented in Table 6. Residuals for the Weibull extreme and normalized α functions are shown in Figure 5. It can be seen in Table 6 that the standard deviations and residuals using the four best equations are lower than those obtained with the normalized α distribution function (SD of about 2.5 versus 4.1). The correlation coefficients and slopes of the parity plots are closer to unity and the intercepts closer to zero for the four best functions as compared to the worst function. Additionally, the absolute difference between the number of positive and negative residuals of the normalized α function is more than twice that of the other functions, which means that the former is overestimating the experimental values. An inspection of the AIC and BIC values, ∆i, and evidence ratios yielded the same order in the ranking from the validation set as from the testing data set. These validation results corroborate that Weibull extreme, Kumaraswamy, and Weibull are the best distribution functions to fit distillation data.

Table 7. Composition of Products from Hydrocracking at P = 10 MPa and LHSV = 0.5 h⁻¹ (El-Kady41)

Product                        1        2        3
reactor temperature, °C        410      430      450

Reported Yields
gases (C2-C5)                  5.90     10.42    17.23
light naphtha (IBP-80 °C)      3.51     6.92     19.03
gasoline (80-150 °C)           9.07     15.26    36.03
kerosene (150-250 °C)          21.23    22.84    11.15
gasoil (250-380 °C)            25.28    28.26    13.75
residue (380-538 °C)           35.01    16.30    2.81

Estimated Distillation Data of Liquid Product, °C
IBP                            36.0     36.0     36.0
5%                             86.4     60.9     46.3
10%                            125.6    89.7     56.5
30%                            228.1    174.4    87.8
50%                            321.9    247.6    122.4
70%                            401.9    325.0    174.9
90%                            479.9    428.2    307.9
95%                            505.7    468.9    394.4
FBP                            538.0    538.0    538.0

To illustrate one application of fitting distribution functions to experimental distillation data, a data set of hydrocracking products of vacuum gas oil (distillation range, 380-550 °C; molecular weight, 425 g/mol; density at 15 °C, 0.931 g/mL), obtained in a fixed-bed reactor at 10 MPa and 0.5 h⁻¹ liquid hourly space velocity (LHSV), was taken from the literature.41 The reported composition data of various products are detailed in Table 7. The complete distillation curves of the whole hydrocracking products were not reported; however, they can be reproduced from yields and temperature ranges of products by using distribution functions. The IBP of naphtha was assumed
to be that of n-C5 so that a total of six points was available. The procedure previously described was applied and a set of optimal parameters for the Weibull extreme distribution function was determined, and a complete distillation curve was generated from only partial data, which is also reported in Table 7. This procedure and data were successfully applied for kinetic modeling of the hydrocracking in which a complete distillation curve was needed.12 For comparison purposes, the results of the complete distillation curves obtained by using the Weibull extreme probability distribution function were plotted together with those determined by the common interpolation method (piecewise cubic Hermite interpolation), and the results are presented in Figure 6. The Hermite interpolation method was selected because it preserves monotonicity and the shape of the data. It can be clearly observed that the distillation curves obtained with cubic interpolation show “humps”, although passing through all of the experimental points; this feature is not typically observed in distillation curves. On the contrary, the Weibull extreme probability distribution function provides a very good fit to the shape of the distillation curve and experimental points. It is worth mentioning that the Weibull function was previously recommended by Dhulesia to describe distillation curves of feeds and products of the FCC process.5

Figure 6. Comparison of Weibull extreme distribution function (solid line) and Hermite interpolation method (dashed line) for representing experimental distillation data of products from hydrocracking at different temperatures: (●) 410 °C, (■) 430 °C, and (▲) 450 °C (data from El-Kady41).

Conclusion

The probability distribution functions in their cumulative forms are very useful in general for fitting distillation data. On the basis of statistical analyses of 25 functions and 1474 distillation data points, it was possible to establish a ranking of fitting capability of the functions according to two approaches: (1) with standard deviation, a correlation coefficient, and a residuals analysis and (2) with the AIC and BIC methodology. Even when SD introduces a bias due to the number of parameters, the rankings obtained are quite similar with both

Nomenclature

A, B, C, D = Distribution parameters
AIC = Akaike information criterion
AICi = AIC for model i in eq 31
AICmin = AIC for best model in eq 31
BIC = Bayesian information criterion
k = Number of free parameters
n = Number of observations
R = Number of models considered in the study in eq 32
R² = Correlation coefficient
RSS = Residual sum of squares
SD = Standard deviation
t = Independent variable in eqs 1-25
T = Actual boiling point temperature
T0, T1 = Reference temperatures
y = Independent variable in eqs 1-25, t = (y - A)/B
ycal = Calculated weight fraction
yexp = Experimental weight fraction

Acronyms

CDF = Cumulative distribution function
FBP = Final boiling point
IBP = Initial boiling point
MS = Mass spectroscopy
NMR = Nuclear magnetic resonance
PDF = Probability density function
XRD = X-ray diffraction

Greek symbols

∆i = AIC differences for model i with respect to best model
Ι = Incomplete beta function
Γ = Gamma function
Φ = Normal cumulative distribution function
θ = Dimensionless temperature
ωi = AIC weight for model i
ω1/ωj = Evidence ratio of model j with respect to model 1

Acknowledgment. The authors thank the Mexican Institute of Petroleum for economic support. Discussions with Fraser Forbes on the AIC and BIC rankings are also appreciated.

Supporting Information Available: Entire experimental data set used for determining the fitting capability of probability distribution functions. This information is available free of charge via the Internet at http://pubs.acs.org.

EF070003Y