
Journal of Applied Statistics

ISSN: 0266-4763 (Print) 1360-0532 (Online) Journal homepage: https://www.tandfonline.com/loi/cjas20

On beta regression residuals

Patrícia L. Espinheira , Silvia L.P. Ferrari & Francisco Cribari-Neto

To cite this article: Patrícia L. Espinheira , Silvia L.P. Ferrari & Francisco Cribari-Neto
(2008) On beta regression residuals, Journal of Applied Statistics, 35:4, 407-419, DOI:
10.1080/02664760701834931

To link to this article: https://doi.org/10.1080/02664760701834931

Published online: 25 Mar 2008.

Journal of Applied Statistics
Vol. 35, No. 4, April 2008, 407–419

On beta regression residuals

Patrícia L. Espinheira^a, Silvia L.P. Ferrari^a,* and Francisco Cribari-Neto^b

^a Departamento de Estatística/IME, Universidade de São Paulo, Brazil; ^b Departamento de Estatística,
CCEN, Universidade Federal de Pernambuco, Brazil

We propose two new residuals for the class of beta regression models, and numerically evaluate their
behaviour relative to the residuals proposed by Ferrari and Cribari-Neto. Monte Carlo simulation results
and empirical applications using real and simulated data are provided. The results favour one of the residuals
we propose.

Keywords: beta distribution; beta regression; maximum likelihood estimation; proportions; residuals

1. Introduction
Practitioners oftentimes desire to investigate how certain variables influence a continuous variable
that assumes values on the open interval (0, 1), such as percentages, proportions, rates and
fractions. For example, the fraction of household income spent on food is influenced by variables
such as family size, family total income, and so on. Linear regression models are not suitable
for modelling such data, since the response only takes values on a limited range, and thus the
mean of the response must be nonlinear in the regression parameters. Additionally, the variance
of the response is not constant across observations; it should approach zero as the mean response
approaches the limits of the standard unit interval. Kieschnick and McCullough [3] identify
different modelling strategies that are commonly used in such applications. One of the models
they recommend is based on the beta distribution, which is useful to model continuous random
variables that assume values on (0, 1). A class of beta regression models that is in many aspects
similar to that of generalized linear models was proposed by Ferrari and Cribari-Neto [1]. The
mean response is related to a linear predictor through a link function and the linear predictor
involves covariates and unknown regression parameters. The model is also indexed by a precision
parameter. The model allows for nonconstant response variance and asymmetry of the response
distribution; it does not require the response to be transformed to assume values on the real line,
thus allowing for parameter interpretation in terms of the response in the original scale, also
allowing the investigator to choose a link function.

∗ Corresponding author. Email: sferrari@ime.usp.br

© 2008 Taylor & Francis

Ferrari and Cribari-Neto [1] developed maximum likelihood inference in the class of beta
regression models and provided some guidelines for diagnostic analysis, including the use of two
different residuals (standardized and deviance). The chief goal of this paper is to propose two new
residuals that can be used when estimating beta regressions, and evaluate their merits relative to
those proposed by Ferrari and Cribari-Neto [1]. Monte Carlo simulation results are presented.
We also present real and simulated data applications. Overall, the results favour one of the residuals
we propose.
The paper unfolds as follows. Section 2 presents the beta regression model and the associated
residuals, including our proposal. Section 3 contains simulation results. Applications using real
and simulated data are presented in Section 4. Finally, concluding remarks are given in Section 5.

2. Beta regression residuals


Let $y_1, \ldots, y_n$ be independent random variables such that each $y_t$, $t = 1, \ldots, n$, is
beta-distributed, i.e. each $y_t$ has density

\[
f(y; \mu_t, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu_t\phi)\,\Gamma((1-\mu_t)\phi)}\,
y^{\mu_t\phi-1}(1-y)^{(1-\mu_t)\phi-1}, \quad 0 < y < 1, \tag{1}
\]

where $0 < \mu_t < 1$ and $\phi > 0$. Here, $E(y_t) = \mu_t$ and $\mathrm{var}(y_t) = V(\mu_t)/(1+\phi)$, where
$V(\mu_t) = \mu_t(1-\mu_t)$. This parameterization is useful for defining a beta regression model since
$\mu_t$ is the mean of $y_t$ and $\phi$ is a precision parameter in the sense that, for fixed $\mu_t$, the variance
of $y_t$ decreases as $\phi$ increases. In the beta regression model [1] the mean of $y_t$ is written as
$g(\mu_t) = \sum_{i=1}^{k} x_{ti}\beta_i = \eta_t$, where $\beta = (\beta_1, \ldots, \beta_k)^\top$ is a k-vector of unknown
parameters ($\beta \in \mathbb{R}^k$), $x_{t1}, \ldots, x_{tk}$ are fixed and known covariates ($k < n$), and $g(\cdot)$ is a
strictly monotonic and twice differentiable function known as the link function.
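As a quick numerical check of this parameterization, the sketch below maps (μ, φ) to the usual
beta shape parameters μφ and (1 − μ)φ and compares the resulting mean and variance with μ and
V(μ)/(1 + φ). It assumes SciPy is available; the helper name `beta_mean_precision` and the values
μ = 0.3, φ = 12 are ours, chosen only for illustration.

```python
from scipy import stats


def beta_mean_precision(mu, phi):
    """Frozen SciPy beta distribution with mean mu and precision phi."""
    # Shape parameters of the standard beta density: p = mu*phi, q = (1 - mu)*phi.
    return stats.beta(mu * phi, (1.0 - mu) * phi)


dist = beta_mean_precision(0.3, 12.0)
mean, var = dist.mean(), dist.var()
expected_var = 0.3 * (1.0 - 0.3) / (1.0 + 12.0)  # V(mu) / (1 + phi)
print(mean, var, expected_var)
```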
Residual analysis aims at identifying atypical observations and/or model misspecification.
It can be based on ordinary residuals or on standardized variants; it can also be based on deviance
residuals [5, pp. 37–39]. Residuals are measures of agreement between the data and the fitted
model. Most residuals are based on the differences between the observed responses and the
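fitted conditional mean. Ferrari and Cribari-Neto [1] defined the following standardized ordinary
residual:

\[
r_t = \frac{y_t - \hat\mu_t}{\sqrt{\widehat{\mathrm{var}}(y_t)}}, \tag{2}
\]

where $\widehat{\mathrm{var}}(y_t) = \hat\mu_t(1-\hat\mu_t)/(1+\hat\phi)$. Here, $\hat\mu_t = g^{-1}(x_t^\top\hat\beta)$, with
$\hat\beta$ and $\hat\phi$ denoting the maximum likelihood estimators of $\beta$ and $\phi$, respectively.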
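A minimal sketch of the standardized ordinary residual in (2) follows. The function name and the
toy inputs are ours; the fitted means and the precision estimate are assumed to come from a beta
regression fit obtained elsewhere.

```python
import numpy as np


def standardized_ordinary_residual(y, mu_hat, phi_hat):
    """Residual (2): (y - mu_hat) / sqrt(mu_hat * (1 - mu_hat) / (1 + phi_hat))."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    var_hat = mu_hat * (1.0 - mu_hat) / (1.0 + phi_hat)
    return (y - mu_hat) / np.sqrt(var_hat)


# Toy values only, not taken from the paper's data sets.
r = standardized_ordinary_residual([0.2, 0.5], [0.25, 0.5], 35.0)
print(r)
```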
The new residuals we propose are based on Fisher’s scoring iterative algorithm for estimating
$\beta$ when $\phi$ is fixed (see Appendix 1). From (14) it follows that the mth step of the scoring scheme is

\[
\beta^{(m+1)} = \beta^{(m)} + (X^\top W^{(m)} X)^{-1} X^\top T^{(m)} (y^* - \mu^{*(m)}), \tag{3}
\]

where the tth elements of the vectors $y^*$ and $\mu^*$ are given, respectively, by

\[
y_t^* = \log\left(\frac{y_t}{1-y_t}\right) \quad\text{and}\quad \mu_t^* = \psi(\mu_t\phi) - \psi((1-\mu_t)\phi),
\]

$\psi(\cdot)$ denoting the digamma function, i.e. $\psi(z) = d\log\Gamma(z)/dz$ for $z > 0$. The matrices $T$ and $W$
are given in (10) and (12), respectively, and $X$ is an $n \times k$ matrix whose tth row is $x_t^\top$. Note that
$\mu_t^* = E(y_t^*)$ (see (15)).
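The scoring step in (3) can be sketched in a few lines. The code below is an illustration under
stated assumptions rather than the authors' implementation: it fixes the logit link (so that
1/g'(μ) = μ(1 − μ)), treats φ as known, starts from a least-squares fit of logit(y) on X, and uses
simulated data with parameter values of our own choosing. The function name `fisher_scoring_beta`
is hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma, expit, logit


def fisher_scoring_beta(X, y, phi, beta0, tol=1e-10, max_iter=100):
    """Iterate (3) for a logit-link beta regression with phi held fixed."""
    beta = np.asarray(beta0, dtype=float)
    y_star = logit(y)
    for _ in range(max_iter):
        mu = expit(X @ beta)
        mu_star = digamma(mu * phi) - digamma((1.0 - mu) * phi)
        v = polygamma(1, mu * phi) + polygamma(1, (1.0 - mu) * phi)  # Equation (13)
        t = mu * (1.0 - mu)        # diagonal of T: 1 / g'(mu) for the logit link
        w = phi * v * t**2         # diagonal of W, Equation (12)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (t * (y_star - mu_star)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated data; the true values (0.5, -1.0) and phi = 50 are illustrative choices.
rng = np.random.default_rng(0)
n, phi = 200, 50.0
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
mu_true = expit(X @ np.array([0.5, -1.0]))
y = rng.beta(mu_true * phi, (1.0 - mu_true) * phi)
beta_start = np.linalg.lstsq(X, logit(y), rcond=None)[0]
beta_hat = fisher_scoring_beta(X, y, phi, beta_start)
print(beta_hat)
```

At convergence the score X'T(y* − μ*) should be numerically zero, which gives a simple way to
check a fit produced this way.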

It is possible to write the iterative scheme in (3) in terms of weighted least-squares regressions:
$\beta^{(m+1)} = (X^\top W^{(m)} X)^{-1} X^\top W^{(m)} z^{(m)}$, where
$z^{(m)} = \eta^{(m)} + W^{-1(m)} T^{(m)} (y^* - \mu^{*(m)})$, with $\eta = (\eta_1, \ldots, \eta_n)^\top = X\beta$.
Upon convergence,

\[
\hat\beta = (X^\top \hat W X)^{-1} X^\top \hat W z, \tag{4}
\]

where

\[
z = \hat\eta + \hat W^{-1} \hat T (y^* - \hat\mu^*). \tag{5}
\]

Here, $\hat W$ and $\hat T$ are the matrices $W$ and $T$, respectively, evaluated at the maximum likelihood
estimator. We note that $\hat\beta$ in (4) can be viewed as the least squares estimate of $\beta$ obtained by
regressing $z$ on $X$ with weighting matrix $\hat W$. The ordinary residual is
$r^* = \hat W^{1/2}(z - \hat\eta) = \hat W^{-1/2}\hat T(y^* - \hat\mu^*)$.
Hence, using the definitions of $T$ and $W$ given in (10) and (12), respectively, we propose a new
beta regression residual, which we shall refer to as the weighted residual:

\[
r_t^* = \frac{y_t^* - \hat\mu_t^*}{\sqrt{\phi v_t}}, \tag{6}
\]

where $v_t$ is given in (13). The weighted residual in (6) is based on the difference between $y_t^*$
and $\hat\mu_t^*$, i.e. on the difference between the logit of the tth response and the maximum likelihood
estimate of its expected value under the adopted model. Since $\mathrm{var}(y_t^*) = v_t$ (see (16)), we define
the following standardized weighted residual:

\[
r_t^w = \phi^{1/2} r_t^* = \frac{y_t^* - \hat\mu_t^*}{\sqrt{v_t}}. \tag{7}
\]

We shall refer to $r_t^w$ as ‘standardized weighted residual 1’.
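The standardized weighted residual 1 can be computed directly from the fitted means and the
precision estimate, since the link derivative cancels in the tth diagonal element of
W^{-1/2}T, which equals 1/sqrt(φ v_t). The sketch below assumes SciPy; the function name and the
input values are ours and serve only to illustrate the formula.

```python
import numpy as np
from scipy.special import digamma, polygamma, logit


def standardized_weighted_residual_1(y, mu_hat, phi_hat):
    """Residual (7): (logit(y) - mu_star_hat) / sqrt(v), with v as in (13)."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    a, b = mu_hat * phi_hat, (1.0 - mu_hat) * phi_hat
    mu_star = digamma(a) - digamma(b)            # estimated E(y*), see (15)
    v = polygamma(1, a) + polygamma(1, b)        # estimated var(y*), see (16)
    return (logit(y) - mu_star) / np.sqrt(v)


# Toy values only: a response equal to a fitted mean of 0.5 gives a zero residual,
# and mirrored inputs give residuals of opposite sign.
r = standardized_weighted_residual_1([0.5, 0.3, 0.7], [0.5, 0.4, 0.6], 35.0)
print(r)
```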


An alternative is to standardize the weighted residual using the variance of $z$. To that end,
we write (4) as $(X^\top \hat W X)\hat\beta = X^\top \hat W z$. Since $\mathrm{cov}(\hat\beta) \approx \phi^{-1}(X^\top \hat W X)^{-1}$, it
follows that $\mathrm{cov}(z) \approx \phi^{-1}\hat W^{-1}$. Then, using (5) we obtain $\mathrm{cov}(r^*) \approx \phi^{-1}(I_n - H)$, where
$H = \hat W^{1/2} X (X^\top \hat W X)^{-1} X^\top \hat W^{1/2}$ and $I_n$ is the $n \times n$ identity matrix. We can then
define the ‘standardized weighted residual 2’ as

\[
r_t^{ww} = \frac{r_t^*}{\sqrt{\phi^{-1}(1 - h_{tt})}} = \frac{r_t^w}{\sqrt{1 - h_{tt}}}
= \frac{y_t^* - \hat\mu_t^*}{\sqrt{v_t(1 - h_{tt})}}, \tag{8}
\]

where $h_{tt}$ denotes the tth diagonal element of $H$.
We note that we have considered $\phi$ fixed. In practice, one should replace $\phi$ by its maximum
likelihood estimate $\hat\phi$ when computing the two residuals proposed above.
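The following sketch computes the standardized weighted residual 2 together with the leverages
h_tt. It assumes the logit link, for which 1/g'(μ) = μ(1 − μ). The function name is ours, and
the inputs at the bottom are placeholders rather than actual maximum likelihood estimates; they
only exercise the algebra.

```python
import numpy as np
from scipy.special import digamma, polygamma, expit, logit


def standardized_weighted_residual_2(X, y, beta_hat, phi_hat):
    """Residual (8) for a logit-link beta regression; returns (r_ww, h)."""
    mu = expit(X @ beta_hat)                       # fitted means
    a, b = mu * phi_hat, (1.0 - mu) * phi_hat
    v = polygamma(1, a) + polygamma(1, b)          # Equation (13)
    w = phi_hat * v * (mu * (1.0 - mu)) ** 2       # diagonal of W, Equation (12)
    Xw = np.sqrt(w)[:, None] * X                   # W^{1/2} X
    H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)      # W^{1/2} X (X'WX)^{-1} X' W^{1/2}
    h = np.diag(H)
    r_w = (logit(y) - (digamma(a) - digamma(b))) / np.sqrt(v)  # residual (7)
    return r_w / np.sqrt(1.0 - h), h


# Placeholder inputs: not fitted values and not data from the paper.
x = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), x])
y = expit(0.4 - 1.1 * x + 0.1 * np.cos(7 * x))
r_ww, h = standardized_weighted_residual_2(X, y, np.array([0.4, -1.1]), 30.0)
print(r_ww, h)
```

Because H is a projection matrix, its diagonal elements sum to k, the number of regression
parameters, which is a convenient sanity check on the computed leverages.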
Ferrari and Cribari-Neto [1] also proposed a deviance residual, which is based on
$\mathrm{sign}(y_t - \hat\mu_t)\{2(\ell_t(\tilde\mu_t, \hat\phi) - \ell_t(\hat\mu_t, \hat\phi))\}^{1/2}$, where $\ell_t(\mu_t, \phi)$ is the
contribution of the tth observation to the log-likelihood function (see Appendix 1) and $\tilde\mu_t$ is the
maximum likelihood estimate of $\mu_t$ in the saturated model. The authors showed that, for fixed $\mu_t$
and large $\phi$, $\tilde\mu_t \approx y_t$, and suggested replacing $\ell_t(\tilde\mu_t, \hat\phi)$ by $\ell_t(y_t, \hat\phi)$, thus arriving at
the following deviance residual:

\[
r_t^d = \mathrm{sign}(y_t - \hat\mu_t)\{2(\ell_t(y_t, \hat\phi) - \ell_t(\hat\mu_t, \hat\phi))\}^{1/2}.
\]

We note, however, that even when $\phi$ is large (e.g. greater than 400) the approximation used
for obtaining such a residual may not be accurate, especially when $\mu_t$ is close to either 0 or 1.
Numerical results not presented here showed that even when $\hat\phi$ is large (around 400)
$\ell_t(y_t, \hat\phi) - \ell_t(\hat\mu_t, \hat\phi)$ may be negative, thus rendering impossible the computation of the
corresponding deviance residual. In what follows, we shall not consider the deviance residual.
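The difficulty can be reproduced with a single hypothetical observation. In the sketch below the
values y_t = 0.001, a fitted mean of 0.002 and a precision of 400 are our own choices, not
results from the paper; with them, the difference of log-likelihood contributions inside the
square root is negative, so the deviance residual is undefined.

```python
import numpy as np
from scipy.special import gammaln


def loglik_t(y, mu, phi):
    """Contribution of one observation to the beta log-likelihood."""
    return (gammaln(phi) - gammaln(mu * phi) - gammaln((1.0 - mu) * phi)
            + (mu * phi - 1.0) * np.log(y)
            + ((1.0 - mu) * phi - 1.0) * np.log(1.0 - y))


# Hypothetical values: response and fitted mean both close to zero, large precision.
y_t, mu_hat, phi_hat = 0.001, 0.002, 400.0
diff = loglik_t(y_t, y_t, phi_hat) - loglik_t(y_t, mu_hat, phi_hat)
print(diff)  # negative here, so sqrt(2 * diff) is not a real number
```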

3. Monte Carlo results


The Monte Carlo experiments were carried out using a beta regression model in which g(μt ) =
β1 + β2 xt , t = 1, . . . , n, where g(·) is the logit function. The covariate values were obtained
as random draws of the following distributions: U(0, 1) (standard uniform), t3 (Student t with
three degrees of freedom) and exp(2) (exponential with mean equal to 2). The covariate values
remained constant throughout the simulations. Also, φ = exp(δ), with δ = 2.5, 5.0, which yield
φ ≈ 12, 148, respectively. It is interesting to consider scenarios where the response values are close
to one, close to zero, and scattered on the standard unit interval. When xt ∼ U(0, 1), two situations
were considered. First, β1 = 4.0 and β2 = −0.8, which resulted in mean response values close
to one, μ ∈ (0.96, 0.98). Second, β1 = −2.5 and β2 = −1.2, which resulted in mean response
values close to zero, μ ∈ (0.024, 0.075). When the covariate values were obtained as random
draws of the t3 distribution, β1 = 1.21 and β2 = 1.25, which yielded μ ∈ (0.05, 0.93). Finally,
when the exponential distribution is used in the generation of the covariate values, β1 = −1.2
and β2 = −1.3, which yielded μ ∈ (0.04, 0.53). Three different residuals are considered: the
standardized ordinary residual given in (2), and the standardized weighted residuals 1 and 2
proposed in this paper (Equations (7) and (8), respectively). All results are based on 1000 Monte
Carlo replications and n = 20.
Figures 1 and 2 contain normal probability plots of the mean order statistics corresponding to
φ = 12 and φ = 148, respectively. They reveal that the distributions of both standardized weighted
residuals proposed here are well approximated by the standard normal distribution when φ = 148,
even when the mean responses are close to either 0 or 1; see the second and third columns of panels
in Figure 2. When φ = 12, the distributions of the two standardized weighted residuals display
some asymmetry: to the right when the mean responses are close to one and to the left when they
are close to zero; see panels (b), (c), (e) and (f) in Figure 1. Such asymmetry is considerably more
marked for the standardized ordinary residual. Even when φ = 148, the distribution of such a
residual still displays some asymmetry when the mean responses are close to one of the limits of
the standard unit interval; see panels (a) and (d) of Figure 2. Similar results not shown here were
obtained for n = 40, 60 and also using the probit link function.
The simulation results indicate that the distributions of the two residuals proposed in this paper
are better approximated by the standard normal distribution than that of the standardized ordinary
residual. In some situations, they are indeed much better approximated.
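A much simplified version of this comparison can be run in a few lines. The sketch below is not
the experiment described above: it uses a single mean value (μ = 0.97) with φ = 148, plugs in the
true parameter values instead of maximum likelihood estimates from samples of size n = 20, and
summarizes asymmetry by the sample skewness instead of normal probability plots. The seed and
the number of draws are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.special import digamma, polygamma, logit

rng = np.random.default_rng(2008)
mu, phi = 0.97, 148.0                      # mean response close to one
a, b = mu * phi, (1.0 - mu) * phi
y = rng.beta(a, b, size=200_000)

# Residual (2) and residual (7), both evaluated at the true parameter values.
r_ord = (y - mu) / np.sqrt(mu * (1.0 - mu) / (1.0 + phi))
v = polygamma(1, a) + polygamma(1, b)
r_w = (logit(y) - (digamma(a) - digamma(b))) / np.sqrt(v)

skew_ord, skew_w = stats.skew(r_ord), stats.skew(r_w)
print(skew_ord, skew_w)
```

In this simplified setting the logit-based residual is skewed to the right and the ordinary
residual to the left, the former being the less skewed of the two.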

4. Applications
In what follows we shall present three applications, two of them based on real data and one that
employs simulated data. The first application uses data on food expenditure, income, and number
of persons in each household from a random sample of 38 households in a large US city; the
source of the data is Griffiths et al. [2, Table 15.4]. We model the proportion of income spent on
food (y) as a function of the level of income (x2 ) and the number of persons in the household (x3 )
using the logit link function. The response values are contained in the interval (0.108, 0.562),
their median being 0.261. The maximum likelihood estimate of φ is approximately equal to 35;
see Table 2 in Ferrari and Cribari-Neto [1]. The plots of the residuals against the indices of the
observations are similar and suggest that the residuals are randomly scattered around zero; see
Figures 3(a)–(c). It is noteworthy that the residual plot that employs the standardized weighted
residual 2 singles out a larger number of atypical observations than the remaining residual plots;
these are observations 5, 11 and 20. We have individually removed observations 5, 11 and 20
from the data, and also these three observations together. The relative changes in the parameter
estimates are presented in Table 1.

Figure 1. Normal probability plots, φ = 12.

The figures in Table 1 show that the parameter estimates are considerably sensitive to the
observations identified as atypical by the standardized weighted residual 2, more so when taken
together. For instance, when the three observations are removed from the sample, the estimate of
φ increases by more than 70%, and the estimates of β1 and β3 , by more than 40%. We note, from
panels (d), (e) and (f) of Figure 3, that the three normal probability plots are similar, and that there
is a slight tendency for the residuals that assume values between −1 and 1 to be near or above the
upper envelope limit. However, there is no clear evidence of misspecification.

Figure 2. Normal probability plots, φ = 148.

Figure 3. Residual plots, data on food expenditure.

Table 1. Relative changes in estimates (%), data on food expenditure.

Obs. removed   β1     β2     β3     φ
5              13.8   7.5    3.7    6.8
11             22.6   7.9    24.2   9.9
20             6.9    8.0    2.4    17.4
5, 11, 20      42.7   21.8   42.2   71.0

The next application is based on the data analysed by Smithson and Verkuilen [7], obtained from
Pammer and Kevan [6]. The response variable (y) consists of scores on a test of reading accuracy
of 44 children, and the covariates are dyslexia versus non-dyslexia status (x2 ), nonverbal IQ
converted to z-scores (x3 ) and an interaction variable (x4 ). Participants (19 dyslexics and 25
controls) were recruited from primary schools in the Australian Capital Territory. The ages of the
children ranged from eight years five months to twelve years three months. The covariate x2
assumes value 1 when the child is dyslexic and −1 otherwise. The observed scores $y'$ were linearly
transformed from their original scale to the open unit interval (0, 1). Let a and b denote the
smallest and largest possible scores, and let $y'' = (y' - a)/(b - a)$. Now, in order to obtain a
variable that assumes values on the open interval (0, 1), compute $y = [y''(n - 1) + 0.5]/n$, where,
as before, n denotes the sample size. The mean accuracy score was 0.900 for non-dyslexic readers
and 0.606 for the dyslexic group. The scores ranged from 0.459 to 0.990, the overall mean score
being 0.773. The fit of the model indicated that the only covariate that is statistically significant
at the usual nominal levels was the dyslexia status; see Table 2.
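The two-step transformation of the raw scores can be written as a small helper. This is a sketch
with hypothetical names and toy inputs: raw scores on a scale from a to b are first rescaled to
[0, 1] and then shrunk towards 0.5 so that no value equals 0 or 1.

```python
import numpy as np


def to_open_unit_interval(raw_scores, a, b):
    """Map raw scores on [a, b] into the open interval (0, 1)."""
    raw = np.asarray(raw_scores, dtype=float)
    n = raw.size                                  # sample size
    scaled = (raw - a) / (b - a)                  # values in [0, 1]
    return (scaled * (n - 1) + 0.5) / n           # values in (0, 1)


# Toy scores on a 0-10 scale, including both endpoints.
y = to_open_unit_interval([0, 5, 10], a=0, b=10)
print(y)
```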
This result suggests that IQ makes little or no clear independent contribution. Smithson and
Verkuilen [7] showed, however, that this impression is misleading. They modelled the precision
parameter using IQ and dyslexia status as covariates and found that the effect of IQ on reading
accuracy becomes statistically significant, as expected. In particular, reading accuracy declines
for the dyslexic group and increases with IQ. They also found that the interaction effect was
statistically significant under nonconstant precision. In particular, the positive relationship between
IQ and reading accuracy holds for nondyslexic children but not for the dyslexic group. We
computed both standardized ordinary residuals and standardized weighted residuals 2 from the fitted
beta regression in order to check whether there is evidence of model misspecification due to the
constant precision assumption. We also aim at identifying influential observations.
The plot of standardized ordinary residuals against the indices of the observations (Figure 4(a))
singles out observations 17, 19 and 24 as atypical (observation 2 is close to the threshold).
We note that the simulation results obtained under a similar setting (small precision, mean
responses that are not concentrated near one of the standard unit interval limits – Figure 1(g))
indicated that there is some asymmetry in these residuals, and the thresholds −2 and 2 should be
used with care. The corresponding plot that uses standardized weighted residuals 2 (Figure 4(b))
identifies observation 8 as worthy of further investigation. We have reestimated the model after
removing observation 8 from the data, and likewise for observations 17, 19 and 24 together.
The relative changes in the parameter estimates and the p-values for the significance tests of the
associated parameters are given in Table 3. Note that the removal of observation 8 has a large
impact on the estimates of β3 (coefficient of IQ) and β4 (coefficient of the interaction between
IQ and dyslexia status); these estimates increased by approximately 65% and 50%, respectively.

Table 2. Parameter estimates, reading accuracy data.

Parameter   β1       β2       β3       β4       φ
Estimate    1.334    −0.974   0.161    −0.219   11.133
p-value     0.0000   0.0000   0.2317   0.1049

Figure 4. Residual plots, reading accuracy data.

It is noteworthy that the IQ effect becomes nearly significant at the 5% nominal level, and the
interaction effect, at the 1% nominal level. The joint removal of observations 17, 19 and 24, on
the other hand, has a large impact on the estimate of the precision parameter, but its impact on
the estimates of the parameters in the linear predictor is not nearly as noticeable. Once again, the
standardized weighted residual 2 was more successful than the standardized ordinary residual in
identifying observations that are influential to the estimation of the effect of the covariates on the
mean response.

Table 3. Relative changes in estimates and p-values, reading accuracy data.

Obs. removed      Quantity      β1       β2       β3       β4       φ
Obs. 8            Rel. change   6.0      8.4      65.6     48.5     7.7
                  p-value       0.0000   0.0000   0.0516   0.0178
Obs. 17, 19, 24   Rel. change   14.2     18.2     8.4      7.1      58.1
                  p-value       0.0000   0.0000   0.1546   0.0559

The normal probability plot in panel (c) of Figure 4 suggests lack of fit of the regression model.
Panels (e) through (h) of Figure 4 present plots of the residuals against the IQ and dyslexia status
covariates. They indicate that the dispersion of the residuals is not constant for all covariate values.
Indeed, there is more dispersion for the control group than for the dyslexic group; we also note
that there is more dispersion for the individuals whose IQ measures are between −1 and 1 than
for those in the two tails. Therefore, there is evidence in favour of nonconstant precision, and a
modelling strategy that accounts for that is required, as in Smithson and Verkuilen [7]. Finally,
note that the estimate of φ is small (φ ≈ 11) and, as expected based on the simulation results in
Section 3, the distribution of the standardized ordinary residual is clearly asymmetric.

Figure 5. Fitted lines with and without data error.

Table 4. Model fit, simulated data.

            Correct data         Data with error      Without obs. 30
Parameter   Estimate   p-value   Estimate   p-value   Estimate   p-value
β1          3.51       0.0000    1.63       0.0000    3.52       0.0000
β2          −3.41      0.0000    −0.20      0.4571    −3.44      0.0000
φ           330.17               7.72                 330.28               4177.9

Figure 6. Residual plots, using simulated data: incorrect value used.

The illustrations presented so far suggested that the standardized weighted residual 2 is the most
efficient in identifying observations that have large influence on the estimates of the parameters
that index the mean response. In what follows we shall use simulated data to further investigate
that. The beta regression used is such that log[μt /(1 − μt )] = β1 + β2 xt , t = 1, . . . , 30. The
covariate values were obtained as random draws from the standard uniform distribution U(0, 1)
except for the last value, which was set equal to 3, i.e. x30 = 3.0. (Note that x30 is a leverage point.)
Also, β1 = 3.5, β2 = −3.4 and φ = 403.43 (φ = exp(δ) and δ = 6.0). As expected, residual
analyses based on the three different residuals under study did not single out any observation
as atypical nor yielded evidence of lack of fit. Next, we mimicked a data recording/entry error:
the response value corresponding to x30 was replaced by max{y1 , . . . , y29 }. Observation 30 now
becomes an outlier in addition to being a leverage point; indeed, this point has a marked influence
on the model fit; see Figure 5 which plots the data points (with both values of y30 ), and the
fitted lines (solid: no data entry error; dashed: wrong value of y30 was used). It is noteworthy
that the fitted line is highly sensitive to the wrong value of y30 , which is evidence that such an
observation is highly influential. Table 4 presents the parameter estimates for: (i) the correct data;
(ii) the data with the wrong value of y30 ; and (iii) the data without observation 30. Note that
the estimate of the precision parameter is considerably reduced when the wrong value of y30
is used; the fit takes the outlier as indication of large dispersion. Note also that the estimates
of β1 and β2 are highly sensitive to the incorrect values of y30 . Finally, the model fit obtained
when observation 30 is removed from the data is very similar to that obtained with the complete
(correct) data.
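The construction of the simulated data set, including the mimicked data entry error, can be
sketched as follows. The parameter values are those stated above; the random seed and variable
names are ours, so the draws will not coincide with the data used for Table 4 and Figures 5
and 6.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(30)
n, beta1, beta2, phi = 30, 3.5, -3.4, np.exp(6.0)

x = rng.uniform(0.0, 1.0, size=n)
x[-1] = 3.0                                   # leverage point: x_30 = 3.0
mu = expit(beta1 + beta2 * x)                 # logit link
y = rng.beta(mu * phi, (1.0 - mu) * phi)      # correct responses

y_err = y.copy()
y_err[-1] = y[:-1].max()                      # wrong value recorded for y_30
print(y[-1], y_err[-1])
```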
We shall examine the residuals from the fitted regression model that uses the data with the
incorrect value of y30 ; see Figure 6. We note that the plots corresponding to the standardized
weighted residual 2 clearly single out observation 30 as worthy of further investigation. Note also
that the plots constructed from the standardized ordinary residual do not identify this observation
as atypical. As for the standardized weighted residual 1, the plot against the observations indices
singles out the 30th residual, but not as clearly and unmistakably as does the standardized weighted
residual 2.

5. Concluding remarks
We proposed two new residuals for the class of beta regression models. They were numerically
evaluated relative to the standardized ordinary residual proposed by Ferrari and Cribari-Neto [1].
The latter is based on the difference between the observed responses and the fitted means whereas
the new residuals are constructed using the difference between the logit of the responses and their
fitted means. The new residuals are obtained using Fisher’s scoring iterative scheme for the
estimation of the parameters that index the regression linear predictor. The results favour one of
the residuals we propose, more specifically the residual that accounts for the leverages of the
observations. Its finite-sample distribution is better approximated by the standard normal
distribution than that of the standardized ordinary residual. It also identifies influential
observations more clearly. In the
applications presented in Section 4 we noticed that the observations singled out as atypical by this
new residual were indeed quite influential on the estimates of the parameters in the linear predictor.
In the illustration based on simulated data, the proposed residual, unlike the standardized ordinary
residual, was able to identify an observation that was clearly atypical, being both a leverage point
and an outlier. This was possible because such a residual takes into account the different leverages
of the observations, unlike the other residuals. The three residuals behave similarly when used to
investigate model misspecification.

Acknowledgements
We gratefully acknowledge partial financial support from CNPq and FAPESP.

References
[1] S.L.P. Ferrari and F. Cribari-Neto, Beta regression for modelling rates and proportions, J. Appl. Stat. 31 (2004),
pp. 799–815.
[2] W.E. Griffiths, R.C. Hill, and G.G. Judge, Learning and Practicing Econometrics, Wiley, New York, 1993.
[3] R. Kieschnick and B.D. McCullough, Regression analysis of variates observed on (0,1): percentages, proportions
and fractions, Stat. Modelling 3 (2003), pp. 193–213.
[4] E.L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed., Springer-Verlag, New York, 1998.
[5] P. McCullagh and J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman and Hall, London, 1989.
[6] K. Pammer and A. Kevan, The contribution of visual sensitivity, phonological processing, and nonverbal IQ to
children’s reading, Sci. Stud. Read. 11 (2007), pp. 33–53.
[7] M. Smithson and J. Verkuilen, A better lemon-squeezer? Maximum likelihood regression with beta-distributed
dependent variables, Psycholog. Meth. 11 (2006), pp. 54–71.

Appendix 1
In what follows we shall present the score function and Fisher’s information for β in the class of beta regression models
[1], assuming that φ is known. We shall also present results that are useful to the derivation of the residuals proposed in
this paper.

The log-likelihood function is $\ell(\beta, \phi) = \sum_{t=1}^{n} \ell_t(\mu_t, \phi)$, where

\[
\ell_t(\mu_t, \phi) = \log\Gamma(\phi) - \log\Gamma(\mu_t\phi) - \log\Gamma((1-\mu_t)\phi)
+ (\mu_t\phi - 1)\log y_t + \{(1-\mu_t)\phi - 1\}\log(1 - y_t).
\]

The score function for $\beta$ is given by

\[
U_\beta(\beta, \phi) = \phi X^\top T (y^* - \mu^*), \tag{9}
\]

where

\[
T = \mathrm{diag}\left\{\frac{1}{g'(\mu_1)}, \ldots, \frac{1}{g'(\mu_n)}\right\}. \tag{10}
\]

Fisher’s information for $\beta$ is

\[
K_{\beta\beta} = \phi X^\top W X, \tag{11}
\]

where

\[
W = \mathrm{diag}\{w_1, \ldots, w_n\}, \tag{12}
\]

where $w_t = \phi v_t [1/\{g'(\mu_t)\}^2]$, with

\[
v_t = \psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi). \tag{13}
\]

Assuming that $\phi$ is known, Fisher’s scoring iterative scheme used for estimating $\beta$ can be written as

\[
\beta^{(m+1)} = \beta^{(m)} + (K_{\beta\beta}^{(m)})^{-1} U_\beta^{(m)}(\beta), \tag{14}
\]

where m = 0, 1, 2, . . . are the iterations that are performed until convergence, which occurs when the distance between
$\beta^{(m+1)}$ and $\beta^{(m)}$ becomes smaller than a given small constant. Plugging the score function and Fisher’s information for
$\beta$ given in (9) and (11), respectively, into the iterative scheme in (14), we arrive at (3).
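Written out, the substitution amounts to a cancellation of the factor φ that appears in both the
score function and Fisher's information (the superscript (m) denotes evaluation at the mth
iterate):

\[
\beta^{(m+1)} = \beta^{(m)} + \{\phi X^\top W^{(m)} X\}^{-1}\, \phi X^\top T^{(m)} (y^* - \mu^{*(m)})
= \beta^{(m)} + (X^\top W^{(m)} X)^{-1} X^\top T^{(m)} (y^* - \mu^{*(m)}).
\]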
It is important to note that the beta density (1) belongs to a canonical two-parameter exponential
family. Indeed, $f(y_t; \mu_t, \phi) = \exp\{\tau_1 T_1 + \tau_2 T_2 - A(\tau)\}\,[1/\{y_t(1 - y_t)\}]$, where
$\tau = (\tau_1, \tau_2) = (\mu_t\phi, \phi)$, $(T_1, T_2) = (\log\{y_t/(1 - y_t)\}, \log(1 - y_t))$ and
$A(\tau) = -\log\Gamma(\phi) + \log\Gamma(\mu_t\phi) + \log\Gamma((1 - \mu_t)\phi)$. Thus,

\[
E(T_1) = E(y_t^*) = \frac{\partial A(\tau)}{\partial \tau_1} = \psi(\mu_t\phi) - \psi((1-\mu_t)\phi) = \mu_t^* \tag{15}
\]

and

\[
\mathrm{var}(T_1) = \mathrm{var}(y_t^*) = \frac{\partial^2 A(\tau)}{\partial \tau_1^2} = \psi'(\mu_t\phi) + \psi'((1-\mu_t)\phi) = v_t; \tag{16}
\]

see [4, p. 27].
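Results (15) and (16) are easy to check by simulation. The sketch below draws from a beta
distribution with illustrative values μ = 0.3 and φ = 12 (our choice, as are the seed and the
number of draws) and compares the sample mean and variance of the logit-transformed draws with
the digamma and trigamma expressions.

```python
import numpy as np
from scipy.special import digamma, polygamma, logit

rng = np.random.default_rng(15)
mu, phi = 0.3, 12.0
a, b = mu * phi, (1.0 - mu) * phi

y_star = logit(rng.beta(a, b, size=200_000))     # draws of y* = log{y / (1 - y)}

mu_star = digamma(a) - digamma(b)                # right-hand side of (15)
v = polygamma(1, a) + polygamma(1, b)            # right-hand side of (16)
print(y_star.mean(), mu_star)
print(y_star.var(), v)
```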
