
J. R. Statist. Soc.

A (2007)
170, Part 4, pp. 975–1000

Small area estimates of labour force participation under a multinomial logit mixed model

Isabel Molina,
Universidad Carlos III de Madrid, Madrid, Spain

Ayoub Saei
University of Southampton, UK

and M. José Lombardía
Universidad de Santiago de Compostela, Spain

[Received December 2005. Final revision February 2007]

Summary. A new methodology is developed for estimating unemployment or employment characteristics
in small areas, based on the assumption that the sample totals of unemployed and
employed individuals follow a multinomial logit model with random area effects. The method is
illustrated with UK labour force data aggregated by sex–age groups. For these data, the accuracy
of direct estimates is poor in comparison with estimates that are derived from the multinomial
logit model. Furthermore, two different estimators of the mean-squared errors are given: an analytical
approximation obtained by Taylor linearization and an estimator based on bootstrapping.
A simulation study for comparison of the two estimators shows the good performance of the
bootstrap estimator.
Keywords: Bootstrap; Maximum likelihood; Multinomial logit mixed model; Penalized quasi-likelihood;
Small area estimation; Unemployment

1. Introduction
Unemployment is an indicator of socio-economic situation and is thus an issue of primary
interest for society in general, and in particular for local, regional and central governments
that need to allocate effectively the funds which are needed for conducting employment plans
or policies. The European Union provides structural funds to cofinance specific employment
programmes with the purpose of achieving a better equilibrium in the levels of development
of the different European regions. Of course, the effectiveness of these programmes depends
on rigorous knowledge of regional socio-economic activity via adequate and reliable statistical
information. Thus, regional studies and investigations are currently of great interest.
In particular, the European statistical office Eurostat demands from the statistical offices
of the members increasingly detailed statistical information on smaller geographical regions.
However, the national statistical offices face the problem that the sample sizes of current
national surveys are not planned to provide reliable direct estimates for such small areas, and
the increase in size that is necessary to cover all these areas adequately is not affordable.
For instance, concerning labour force statistics, the Office for National Statistics of the UK
considers an estimate to be publishable if its coefficient of variation is less than 20% (Office
for National Statistics (2004), volume 6, annex C). According to this rule, for the annual
labour force data from year 2000 that are at our disposal, direct estimates of unemployed totals
can be published only for 75 out of 406 unitary authorities and local authority districts.

Address for correspondence: Isabel Molina, Departamento de Estadística, Universidad Carlos III de Madrid,
28903 Getafe, Madrid, Spain.
E-mail: isabel.molina@uc3m.es

© 2007 Royal Statistical Society 0964–1998/07/170975

The European project EURAREA (EURAREA Consortium, 2004), which was funded by
Eurostat from 2001 to 2004, was designed to investigate, compare and supply procedures for
estimating quantities of general interest such as gross domestic product and rates of
unemployment in small areas. A small area is defined as a geographical region or a domain where
direct estimates (calculated just from the data that are sampled within the target area) lack
precision.
Most of the techniques that have been developed for small area estimation are included in
Rao (2003). For a recent review, see Jiang and Lahiri (2006). Generally, the way to overcome
the lack of observations in the target area is to increase the effective number of observations
that are used for estimation in that area. This is generally carried out through the use of implicit
or explicit models that relate the target variable to some auxiliary explanatory variables. The
resulting estimators benefit from the assumption of a constant dependence relationship across
areas to increase the effective information.
Model-assisted approaches (see for example Lehtonen and Veijanen (1998) and Estevao and
Särndal (1999, 2005)) are design based but assisted by models that enhance the accuracy of the
estimators. They are design unbiased, but when the sizes of the areas are too small they can
suffer from instability.
Model-dependent approaches are broadly accepted and have been widely used in recent
years. They suffer from design bias but their overall accuracy measures (mean-squared errors)
remain small unless the model is poorly specified or the auxiliary variables are not very
informative. Biased estimators with better accuracy than direct estimators have been accepted for a
long time. Fay and Herriot (1979) had already stated that
‘for smaller places, substituting biased estimates with negligible sampling error . . . for estimates with
large sampling error is preferable’.

The inclusion of area random effects in the model is a common practice in the current literature
on small area estimation. These effects model the variations over areas that are not explained
by auxiliary variables and additionally allow for correlations between the units within an area.
Such correlations are often observed in practice when the areas are geographical regions or
homogeneous domains.
The auxiliary information is typically taken from census or other administrative sources. If
there is relevant auxiliary information for each unit in the population, then the models are
usually formulated at the unit level. However, sometimes the information at the unit level is
not updated and other times there may be confidentiality reasons that prevent its use. In such
situations, it is usually possible to obtain data that are aggregated by areas, and the model
is then stated at the area level. The model that is assumed in the application of Section 2 is
in between the two approaches, since the available data are aggregated by sex–age categories
within areas. Then sex–age categories can be regarded as individual units within areas, but the
statistical advantages of aggregated data remain.
Linear mixed models are a common tool for small area estimation. Totals of unemployed
and employed individuals could be estimated via two separate models of this kind, relating the
direct estimates of the proportions of unemployed and employed to some area level auxiliary
variables. However, the estimated proportions derived might be inconsistent in the sense that
they might not be within the [0, 1] interval, and also the sum of both proportions might exceed
1, which in terms of totals means that the estimated number of unemployed plus employed
individuals could exceed the corresponding population total. Another disadvantage of these
models is that they do not take into account the typical strong dependence between the
proportions of unemployed, employed and inactive people. A bivariate linear mixed model could
provide estimates for two of these quantities, allowing them to be correlated, and the third
quantity could be calculated by subtraction from the population total. However, the previously
mentioned inconsistency problems remain.
The estimated proportions can be brought to the [0, 1] interval by using logistic models, which
relate the logit transformation of the proportions to the auxiliary variables. A univariate logistic
model with random area effects was proposed by the EURAREA Consortium (2004) (see project
reference volume D7.1.4, part 1, pages C5.6–C5.8) to model the proportion of unemployed
population. Moreover, the UK Office for National Statistics has recently released small area
estimates of unemployment rates that were obtained from a model of this type. The model
provides estimated totals of unemployed individuals, which are then combined with the direct
estimates of the totals of employed individuals to derive the rates of unemployment (Hastings
et al., 2003).
In this paper we propose to estimate certain unemployment or employment measures of interest,
namely totals, proportions and rates of unemployment, assuming a joint multinomial logit
model with random area effects for the proportions of unemployed and employed individuals.
This model adapts naturally to the characteristics of the problem, solving the inconveniences
of previous approaches, and allowing simultaneous model-based estimation of unemployment,
employment and inactivity totals. The model coefficients are interpretable as relative increments
of ratios of unemployed or employed over inactive totals. Rates of unemployment or
other quantities of interest such as rates of inactivity are easily derived.
In Section 2 we illustrate the proposed methodology with a data set from the Great Britain
Labour Force Survey from the year 2000 (see Office for National Statistics (2004), volume 6,
for details on the Labour Force Survey for local area data). Section 2.1 specifies the model and
Section 2.2 describes the results of the model fit. The estimated totals of unemployed and
employed are compared with the direct estimators in Section 2.4. We observe an increase
in accuracy for the new model-based estimators for all areas. This increase is remarkable
particularly for unemployment because of the small number of sampled unemployed individuals
within the areas.
The accuracy of small area estimates is indeed crucial, because the loss of unbiasedness will
be accepted only if there is a clear gain in accuracy. Thus, in Section 2.5 we describe two different
approaches for approximating the mean-squared error of the new small area estimators. The
first is an analytical approximation based on Taylor linearizations. The second is a bootstrap
estimator that is obtained by a parametric bootstrap procedure which was specially designed
for the data structure at hand. It avoids linearizations, is of simple practical application and
easily extends to other types of parameter and model. In the simulation study that is described
in Section 3 we show the good performance of the bootstrap estimator. Furthermore, in that
section we use an approach which is similar to that of Hastings et al. (2003) referred to above
for the simulated data, and we compare the results with those obtained from the multinomial
logit mixed model.
Models must be constructed ad hoc for each data set at hand, and this means that each data
set must be studied until an adequate model is found for these data. Although the main objective
of the application in Section 2 is illustrative, the results for the available real data show that the
model that is fitted in Section 2 provides reliable estimates of unemployment or employment
characteristics.
2. Illustration with labour force data of Great Britain
2.1. Model specification
The available data set (source: Office for National Statistics) contains labour force data for
small areas (unitary authorities and local authority districts) in Great Britain from the year
2000 aggregated by sex–age categories. There are 406 × 6 records corresponding to the 406
small areas and six sex–age groups for each area, and nine columns with the variables that are
described in Table 1. The variable CLUSTER is a socio-economic classification of areas that
was developed by the Office for National Statistics (Bailey et al., 2000). The variables GOR,
CLUSTER and REG.UNEMPLOYED are obtained from an administrative source, and the
rest of the variables come from the Labour Force Survey.
Consider the multinomial vector that counts the number of sampled unemployed, employed
and inactive individuals within each AREA–SEXAGE group. The aim of this work is to obtain
small area estimates of some usual labour force participation characteristics through a model for
the multinomial probabilities of unemployed and employed individuals. Thus, first a preliminary
analysis was performed to assess the potential predictive power of each auxiliary variable in the
data set.
Fig. 1 plots the mean proportions of employed and unemployed people over the GOR, SEXAGE
and CLUSTER categories. Observe that both mean proportions vary across the different
categories of each variable, but this variation is different for the two proportions since
the lines are not parallel. Indeed, analysis of variance confirmed that there are statistically
significant differences in each mean proportion between the different GOR, CLUSTER and
SEXAGE categories. These results suggest that the indicators of the categories of the three
variables are potentially helpful in predicting the probabilities of unemployed and employed
individuals.
Modelling of probabilities by real-valued explanatory variables requires transformation of
these probabilities into quantities that vary over the whole real line. The logit transformation
is commonly used for multinomial models owing to its simplicity. For a multinomial variable
with three categories and probabilities $p_1$, $p_2$ and $p_3$, considering the last category as base
reference, the logit of $p_j$ is defined as $\log(p_j/p_3)$, $j = 1, 2$. In our case, regarding the inactive
as the reference category, the sample logit for the proportion of unemployed is equal to the

Table 1. Description of the variables in the labour force 2000 data file

Variable name       Description

AREA                Unitary authority or local authority district: 1–406
SEXAGE              Sex–age categories: 1–6 (age, [16,25], (25,40] and > 40 years;
                    men, 1, 2, 3; women, 4, 5, 6)
GOR                 Government office region: 1–12
CLUSTER             Socio-economic classification: 1–7 (1, rural areas; 2, urban fringe;
                    3, coast and services; 4, prosperous England; 5, mining,
                    manufacturing and industry; 6, education centres and outer
                    London; 7, inner London)
SAMP.UNEMPLOYED     Total of sample unemployed individuals in AREA–SEXAGE group
SAMP.EMPLOYED       Total of sample employed individuals in AREA–SEXAGE group
SAMP.INACTIVE       Total of sample inactive individuals in AREA–SEXAGE group
REG.UNEMPLOYED      Total of individuals registered in an unemployment office in
                    AREA–SEXAGE group
TOTAL               Number of individuals in AREA–SEXAGE group

Fig. 1. Mean proportion of employed and unemployed over (a) the GOR, (b) SEXAGE and
(c) CLUSTER categories
[Axes: category index on the horizontal axis of each panel; mean proportion on the vertical axis]

logarithm of SAMP.UNEMPLOYED over SAMP.INACTIVE. For the employed the logit is
defined analogously.
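
The sample logits described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sample_logits` and the counts are hypothetical and are not taken from the Labour Force Survey file.

```python
import math


def sample_logits(unemployed, employed, inactive):
    """Sample logits of unemployed and employed, with inactive as reference.

    Each logit is the logarithm of a sample count over the inactive count,
    which equals the log-ratio of the corresponding sample proportions.
    """
    if min(unemployed, employed, inactive) <= 0:
        # log(0) or a division by zero would make the sample logit undefined.
        raise ValueError("all three counts must be positive")
    return math.log(unemployed / inactive), math.log(employed / inactive)


# Hypothetical counts for one AREA-SEXAGE cell (illustrative only).
logit_u, logit_e = sample_logits(unemployed=4, employed=60, inactive=20)
```

Cells with a zero count have no finite sample logit, which is one practical reason for working with model-based probabilities rather than raw logits in small cells.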
Concerning now the variable REG.UNEMPLOYED, observe that this variable refers to the
population, with the population totals being different in each AREA–SEXAGE combination.
Thus, it seems more reasonable to consider as a potential covariate the proportion of registered
unemployed in the small areas. In Fig. 2 we show scatterplots of the logits of the proportions
of unemployed (Fig. 2(a)) and employed (Fig. 2(b)) versus the logarithm of the proportion
of REG.UNEMPLOYED. Observe that the logits (or log-odds with respect to the inactive)
of the two proportions increase linearly with the log-proportion of people who are registered
in an unemployment office. Thus, irrespectively of the rest of the auxiliary variables, the log-
proportion of REG.UNEMPLOYED seems to be a powerful covariate for modelling both

Fig. 2. Logit of (a) unemployed and (b) employed against log(REG.UNEMPLOYED)
[Axes: log(prop.registered) on the horizontal axis; logit of the proportion on the vertical axis]

probabilities. If the logits of the proportions are plotted against the untransformed proportion
of REG.UNEMPLOYED, a clear pattern can barely be distinguished owing to the high dispersion
of the points. In any case, both models (with and without logarithmic scale) were fitted
and the differences in the predicted values were negligible.
Fig. 3 plots SAMP.EMPLOYED against SAMP.UNEMPLOYED. The whole scatterplot
is depicted in Fig. 3(a), and in Fig. 3(b) we have enlarged the scale to see the main cloud
of points more clearly. The integer nature and the frequent small values of SAMP.UNEMPLOYED
in the AREA–SEXAGE combinations produce the vertical lines that are observed in
the plot. Observe that the points are distributed along a band with positive slope, where large
numbers of sampled unemployed are mostly associated with large numbers of employed individuals.
Thus, this plot suggests that the numbers of unemployed and employed individuals are
linearly dependent. Furthermore, the concentration of points in the bottom left-hand corner

Fig. 3. SAMP.EMPLOYED against SAMP.UNEMPLOYED
[Panels (a) and (b); axes: samp.unemployed (horizontal) and samp.employed (vertical)]

Fig. 4. Unemployed/employed and inactive/active for each area

indicates that the joint distribution of these two variables is highly skewed. In view of this
plot, it seems appropriate to consider a bivariate model representing the observed dependence.
Fig. 4 plots the two rates, unemployed over employed (at the bottom) and inactive over active
(at the top), for each small area. Observe that the variation across areas of the rate
unemployed/employed is small compared with the variation of the inactivity/activity rate. Thus,
a large part of the variation across areas of the distribution of unemployed, employed and
inactive is due to the variation in activity/inactivity, and a smaller part due to the variation of
unemployed over employed. In accordance with this we assume that the small variations of the
rate unemployed/employed across areas can be explained sufficiently by the auxiliary variables.
Thus, we have included in the model random area effects that represent the variation of
activity/inactivity, which is the largest part of the across-area variation that is observed in the data.
However, these random effects are constant for the categories unemployed and employed; see
model (2). This assumption considerably simplifies the model and the fitting method, and makes
the subsequent estimation of mean-squared errors easier and more understandable. In Section
2.3 we propose a diagnostic method based on the residuals for assessing whether a model with
specific random area effects for unemployed and employed is worthwhile for the data at hand.
The results indicate that not much can be gained in our case, and consequently the simpler
model with common area effects is preferred here.
Thus we considered as explanatory variables the log-proportion of REG.UNEMPLOYED
and 22 dummy indicators for categories of GOR, CLUSTER and SEXAGE, taking the last
category of each as base reference. With an intercept the constructed incidence matrix X has
24 columns. We use index $i$ ($i = 1, \ldots, 6$) for the SEXAGE category and $d$ ($d = 1, \ldots, 406$)
for AREA. Thus, the rows of X are denoted by $x_{di}$; $y_{di1}$, $y_{di2}$ and $y_{di3}$ denote the number of
sampled unemployed, employed and inactive respectively, $m_{di} = y_{di1} + y_{di2} + y_{di3}$ the sample
size, and $p_{di1}$, $p_{di2}$, $p_{di3} = 1 - p_{di1} - p_{di2}$ the respective probabilities of unemployed, employed
and inactive individuals. Finally, $u_d$ denotes the random effect of area $d$. We assume that the
vectors $(y_{di1}, y_{di2}, y_{di3})$ given $u_d$ and $m_{di}$ are independent across $d$ and $i$ with multinomial
distribution, i.e. with probability mass function
$$f(y_{di1}, y_{di2} \mid u_d) = \frac{m_{di}!}{y_{di1}!\, y_{di2}!\, y_{di3}!}\; p_{di1}^{y_{di1}}\, p_{di2}^{y_{di2}}\, p_{di3}^{y_{di3}}. \tag{1}$$
Moreover, we assume that the probabilities $(p_{di1}, p_{di2})$ are related to the auxiliary variables and
the random area effects through the logit link as follows:
$$\log(p_{dij}/p_{di3}) = x_{di}\beta_j + u_d, \quad j = 1, 2,\; i = 1, \ldots, 6,\; d = 1, \ldots, 406, \qquad u_d \overset{\text{IID}}{\sim} N(0, \phi), \tag{2}$$
where $\beta_j = (\beta_{1j}, \ldots, \beta_{24j})^T$ contains the coefficients of the explanatory variables for the
multinomial category $j$, $j = 1, 2$. This model introduces a natural correlation structure among the
unemployed, employed and inactive, and among units within the same small area. The model fit
provides estimated probabilities of unemployed and employed that lie in the [0, 1] interval and,
together with the estimated probability of inactive, add up to 1. Estimates of totals or proportions
can be obtained even for areas without unemployed people in the sample, although at a price in
terms of sampling error. For areas with at
least a few sampled unemployed individuals, estimates with acceptable accuracy can be obtained.
Estimation of small area totals of unemployed, employed and inactive people requires the
prediction of the corresponding unsampled numbers of unemployed, employed and inactive
people in each AREA–SEXAGE group. Let us denote these quantities by $y^r_{di1}$, $y^r_{di2}$ and $y^r_{di3}$
respectively, and let $m^r_{di} = y^r_{di1} + y^r_{di2} + y^r_{di3}$ be the number of unsampled units. We assume that
model (1)–(2) holds also for $(y^r_{di1}, y^r_{di2})$, with $m_{di}$ replaced by $m^r_{di}$. Furthermore, we denote by
$M_{di} = m_{di} + m^r_{di}$ the number of population units in the $i$th SEXAGE group within AREA $d$. We
assume that the population size $M_{di}$ is known for each AREA and SEXAGE group, and that
there are some observations in each small area.
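
As a minimal sketch of how model (1)–(2) generates data for one AREA–SEXAGE cell, the following inverts the logit link and draws one multinomial vector. The function names, the fixed parts `xb1` and `xb2` and the cell size are hypothetical; only the variance 0.026 is taken from the footnote to Table 2.

```python
import math
import random


def category_probabilities(eta1, eta2):
    """Invert the logit link of model (2), where eta_j = log(p_j / p_3)."""
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom


def draw_counts(m, probs, rng):
    """Draw one multinomial vector of size m by classifying m individuals."""
    counts = [0, 0, 0]
    for _ in range(m):
        r = rng.random()
        if r < probs[0]:
            counts[0] += 1          # unemployed
        elif r < probs[0] + probs[1]:
            counts[1] += 1          # employed
        else:
            counts[2] += 1          # inactive
    return counts


rng = random.Random(0)
phi = 0.026                          # variance of the area effect (Table 2 footnote)
u_d = rng.gauss(0.0, math.sqrt(phi))  # common random effect of area d
xb1, xb2 = -2.0, 1.0                 # hypothetical values of x_di*beta_1, x_di*beta_2
probs = category_probabilities(xb1 + u_d, xb2 + u_d)
counts = draw_counts(50, probs, rng)
```

Because the same $u_d$ enters both linear predictors, the ratio of the unemployed to the employed probability does not depend on $u_d$; the area effect only shifts both categories relative to the inactive.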

2.2. Model fitting


The model fitting was carried out by using a combination of the penalized quasi-likelihood
(PQL) method that was introduced by Breslow and Clayton (1993) for the estimation of
$\beta = (\beta_1^T, \beta_2^T)^T$ and $u_d$, $d = 1, \ldots, 406$, with either maximum likelihood (ML) or restricted ML (RML)
for the estimation of the variance $\phi$ of random effects. The two methods for estimating $\phi$ rely
on a normal approximation of the marginal likelihood of the transformed data $\xi_{di} = (\xi_{di1}, \xi_{di2})'$,
where $\xi_{dij} = \log(y_{dij}/y_{di3})$, $j = 1, 2$. This combined algorithm was introduced by Schall (1991)
and was later used in the context of small area estimation by Saei and Chambers (2003) for
generalized linear mixed models. In Appendix A the fitting method is adapted to the multivariate
set-up of the multinomial logit mixed model that was introduced in Section 2.1, and the steps
of the algorithm are detailed. This algorithm was implemented by the authors in C++.
Using the multivariate normal approximation that was mentioned above, statistics for testing
the significance of the model parameters were derived. For the model coefficients $\beta_{kj}$, Z-type test
statistics were obtained by dividing each estimated coefficient by an estimate of its standard error
obtained from the Fisher information matrix. For the variance of the random effects $\phi$, the
likelihood ratio test was calculated. Under the null hypothesis $\phi = 0$, the likelihood ratio test statistic is
distributed as a mixture of two $\chi^2$-distributions with 0 and 1 degrees of freedom; more precisely,
$0.5\chi^2_0 + 0.5\chi^2_1$ (see Self and Liang (1987) or Claeskens (2004)). The estimated model parameters
and the resulting test statistics are listed in Table 2. The footnote to Table 2 gives the estimate of $\phi$,
the value of the likelihood ratio test statistic and the corresponding quantile at the significance
level $\alpha = 0.05$. For both responses, unemployed and employed, at least one category of each
variable has a significant coefficient, and the variance of the random effects is also significant.
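
The tail probability under the mixture null distribution can be evaluated with the standard library alone. The sketch below is an illustration, not the authors' C++ code; the function names are hypothetical. It uses the fact that a $\chi^2_0$ variable is a point mass at zero, so that for a positive statistic only the $\chi^2_1$ component contributes.

```python
import math


def chi2_1_sf(t):
    """P(X > t) for a chi-squared variable with 1 degree of freedom."""
    # X = Z**2 with Z standard normal, so P(X > t) = P(|Z| > sqrt(t)).
    return math.erfc(math.sqrt(t / 2.0))


def lrt_pvalue_boundary(t):
    """p-value of statistic t under the 0.5*chi2_0 + 0.5*chi2_1 mixture."""
    if t <= 0.0:
        return 1.0
    # The chi2_0 component is degenerate at 0 and contributes nothing for t > 0.
    return 0.5 * chi2_1_sf(t)


# Likelihood ratio statistic reported in the footnote to Table 2.
p_value = lrt_pvalue_boundary(591.45)
```

For a statistic as large as the reported one the p-value is vanishingly small, in line with the conclusion that the variance of the random effects is significant.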
Table 2. Model fitting results†

                    Results for the unemployed                       Results for the employed

Variable            Estimate  Standard   Z       p-value             Estimate  Standard   Z       p-value
                              deviation                                        deviation

Constant            −2.059    0.258      −7.99   0‡                  −1.599    0.149      −10.73  0‡
GOR = 1             0.172     0.081      2.12    0.034§              0.030     0.057      0.53    0.598
GOR = 2             −0.043    0.092      −0.47   0.638               −0.192    0.061      −3.12   0.002§§
GOR = 3             0.139     0.088      1.58    0.115               0.041     0.056      0.73    0.468
GOR = 4             0.129     0.086      1.50    0.133               0.058     0.055      1.06    0.287
GOR = 5             0.159     0.080      2.00    0.045§              −0.002    0.056      −0.043  0.966
GOR = 6             0.035     0.080      0.44    0.663               −0.039    0.054      −0.72   0.474
GOR = 7             0.114     0.089      1.28    0.200               −0.129    0.062      −2.06   0.039§
GOR = 8             −0.219    0.125      −1.74   0.080               −0.315    0.094      −3.36   0.001‡
GOR = 9             0.162     0.112      1.44    0.149               −0.059    0.076      −0.77   0.443
GOR = 10            0.005     0.088      0.06    0.951               −0.051    0.056      −0.91   0.361
GOR = 11            0.039     0.087      0.45    0.651               −0.009    0.056      −0.16   0.873
CLUSTER = 1         0.107     0.141      0.76    0.447               0.368     0.094      3.93    0‡
CLUSTER = 2         0.196     0.130      1.51    0.131               0.409     0.089      4.61    0‡
CLUSTER = 3         0.147     0.135      1.09    0.275               0.210     0.093      2.27    0.023§
CLUSTER = 4         0.346     0.146      2.38    0.017§              0.550     0.095      5.81    6.1×10⁻⁹‡
CLUSTER = 5         0.163     0.129      1.26    0.208               0.073     0.091      0.81    0.416
CLUSTER = 6         0.128     0.106      1.21    0.227               0.291     0.076      3.83    1.0×10⁻⁴‡
SEXAGE = 1          2.297     0.130      17.73   0‡                  2.029     0.064      31.70   0‡
SEXAGE = 2          2.153     0.102      21.06   0‡                  1.689     0.047      35.91   0‡
SEXAGE = 3          2.878     0.121      23.83   0‡                  3.658     0.059      61.78   0‡
SEXAGE = 4          2.055     0.075      27.35   0‡                  2.197     0.027      79.87   0‡
SEXAGE = 5          0.534     0.087      6.11    9.8×10⁻¹⁰‡          0.702     0.033      21.01   0‡
REG.UNEMPLOYED      1.147     0.105      10.92   0‡                  −0.162    0.0561     −2.90   0.004§§

†ϕ: estimate 0.026; likelihood ratio test statistic, 591.45; critical value, 7.68.
‡Significant at level 0.001.
§Significant at level 0.05.
§§Significant at level 0.01.

Taking the exponential in equation (2), we obtain that the marginal effect of an increment $\Delta X_k$
of an explanatory variable $X_k$ on the ratio $p_{dij}/p_{di3}$ is a multiplicative effect of $\exp(\beta_{kj}\Delta X_k)$,
$j = 1, 2$. In particular, when $X_k$ is a dummy indicator, the ratio of unemployed over inactive
for the category that is represented by $X_k$ is $\exp(\beta_{k1})$ times the value of the ratio for the base
category. In this way, the coefficients of the SEXAGE indicators that are displayed in Table 2
can be interpreted as follows. The ratio unemployed over inactive for SEXAGE = 1 (men aged
between 16 and 25 years) is about $10 \approx \exp(2.3)$ times the ratio for SEXAGE = 6 (women over 40
years). A similar effect is observed in the ratio of employed over inactive, although the increase is
somewhat smaller (about $7.4 \approx \exp(2)$ times). From this we conclude that there is a large increase
in the activity when we move from the group of women over 40 years old to men aged between
16 and 25 years, and this increase is bigger in the unemployed. Similar conclusions are obtained
for the group of men between 25 and 40 years (SEXAGE = 2) in comparison with women over
40 years old; only the increase in activity is slightly smaller and there is a bigger gap between the
unemployed and the employed. Comparing the group of men over 40 years old (SEXAGE = 3)
with the reference group we see that the activity grows with respect to the previous cases, the
increase being much higher for employed people (about 18 times for unemployed and 40 times
for employed). In the group of women aged between 16 and 25 years (SEXAGE = 4) there is also
more activity than in the reference group, and the increase is greater in the number of employed
people. Finally, for women aged between 25 and 40 years (SEXAGE = 5) there is a considerable
decrease in activity, but the number of unemployed women reduces more. The remaining model
coefficients can be interpreted similarly.
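
The multiplicative interpretation above amounts to exponentiating the coefficients. As a small worked example, the following computes the multipliers for the SEXAGE indicators from the estimates in Table 2; the dictionary layout and the function name are only illustrative choices.

```python
import math

# SEXAGE coefficients copied from Table 2: (unemployed, employed).
# The base category is SEXAGE = 6 (women over 40 years).
sexage_coef = {
    1: (2.297, 2.029),
    2: (2.153, 1.689),
    3: (2.878, 3.658),
    4: (2.055, 2.197),
    5: (0.534, 0.702),
}


def odds_multipliers(coefs):
    """Map each category to (exp(beta_unemployed), exp(beta_employed)).

    Each value is the factor by which the ratio over inactive is multiplied
    relative to the base category.
    """
    return {k: (math.exp(bu), math.exp(be)) for k, (bu, be) in coefs.items()}


multipliers = odds_multipliers(sexage_coef)
for category, (mult_u, mult_e) in sorted(multipliers.items()):
    print(f"SEXAGE = {category}: unemployed x{mult_u:.1f}, employed x{mult_e:.1f}")
```

Comparing the two multipliers within a category shows whether the increase relative to the base group is larger for the unemployed or for the employed.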

2.3. Model diagnostics


Concerning model diagnostics, Pearson residuals are defined as
$$r_{dij} = \frac{y_{dij} - m_{di}\hat{p}_{dij}}{\sqrt{m_{di}\hat{p}_{dij}(1 - \hat{p}_{dij})}}, \quad j = 1, 2,\; i = 1, \ldots, 6,\; d = 1, \ldots, 406.$$
For the category employed, Fig. 5 plots the Pearson residuals $r_{di2}$ against the predicted values
$\hat{y}_{di2} = m_{di}\hat{p}_{di2}$. The skewness of the predicted values reduces the visibility of the majority of the

Fig. 5. Residuals against predicted values for the employed
[Axes: Predicted Employed (horizontal, 0–100); Residuals Employed (vertical)]

Fig. 6. Residuals against predicted values for the unemployed
[Axes: Predicted Unemployed (horizontal, 0–20); Residuals Unemployed (vertical)]

observations in the plot. For this reason, we have reduced the scale of the x-axis to 0–100 to
show 93% of the observations more clearly. In the remaining 7% there are no large residuals.
In fact, as we can see in the plot, there are no high residuals in absolute value or any visible
pattern; only a slight decrease of the variability when the predicted values increase. This could
be an effect of the skewness of the predicted values in the graph because, in the absence of
overdispersion, in regions with fewer observations we should see less variability.
For the category unemployed, the analogous plot appears in Fig. 6. The x-range has also
been reduced to show clearly over 99% of the observations. In the x-range from 6 to 20,
there is no obvious pattern. However, between 0 and 6 we can see more variability and a
strange pattern in the form of decreasing parallel curves. The higher variability that is
observed could again be an effect of the skewness of the predicted values. Observe that the
quantities to predict (the number of unemployed individuals) are integer. Thus, each decreasing
curve is naturally formed by the residuals corresponding to the same integer value. In
any case the plot indicates some underprediction when the number of unemployed is very
small.
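
The origin of the decreasing parallel curves can be checked numerically. The sketch below computes Pearson residuals for a fixed observed count while the fitted probability grows; the sample size and the probabilities are hypothetical values chosen for illustration.

```python
import math


def pearson_residual(y, m, p_hat):
    """Pearson residual of an observed count y with fitted probability p_hat."""
    return (y - m * p_hat) / math.sqrt(m * p_hat * (1.0 - p_hat))


m = 50                                   # hypothetical cell sample size
fitted = (0.01, 0.02, 0.04, 0.08)        # hypothetical fitted probabilities
# One curve per observed integer count: the residual falls as m * p_hat grows.
curves = {y: [pearson_residual(y, m, p) for p in fitted] for y in (0, 1, 2)}
```

Each observed integer count traces its own decreasing curve against the predicted value, and neighbouring counts give neighbouring curves, which is the pattern described for small predicted numbers of unemployed.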
Further validation of the model includes checking whether a model with specific random
effects for the categories unemployed and employed would substantially improve the prediction.
A specific diagnostic method has been developed for this, based on the idea that, if the
true model has additional across-area variability in any of the categories that is not explained
by the fitted model, then this extra variability should be found in the residuals. Consider that
the true data-generating model has different random effects for the categories unemployed and
employed. Then, without loss of generality, the true model satisfies
$$\log(p_{dij}/p_{di3}) = \eta^0_{dij}, \quad j = 1, 2,$$
with linear predictors
$$\eta^0_{di1} = x_{di}\beta_1 + u_d + v_d,$$
$$\eta^0_{di2} = x_{di}\beta_2 + u_d,$$
where the new random area effects $v_d$ are independent of the common random effects $u_d$ and
satisfy $v_d \overset{\text{IID}}{\sim} N(0, \phi_v)$. However, in practice the true model is unknown. Then consider that
model (2) is fitted. Let us denote the linear predictors of the fitted model by
$$\eta_{di1} = x_{di}\beta_1 + u_d,$$
$$\eta_{di2} = x_{di}\beta_2 + u_d.$$

Let us define the vectors of linear predictors $\eta^0_{di} = (\eta^0_{di1}, \eta^0_{di2})^T$ and $\eta_{di} = (\eta_{di1}, \eta_{di2})^T$, and the
vector of true probabilities $p_{di} = (p_{di1}, p_{di2})^T$, where
$$p_{dij} = p_{dij}(\eta^0_{di}) = \frac{\exp(\eta^0_{dij})}{1 + \exp(\eta^0_{di1}) + \exp(\eta^0_{di2})}, \quad j = 1, 2.$$

Additionally, we define the vector $\boldsymbol{\varepsilon}_{di} = (\varepsilon_{di1}, \varepsilon_{di2})^T$ of random errors $\varepsilon_{dij} = y_{dij} - m_{di}p_{dij}$, and
the vector $e_{di} = (e_{di1}, e_{di2})^T$ of residuals $e_{dij} = y_{dij} - m_{di}\hat{p}_{dij}$, where the $\hat{p}_{dij}$ are the estimated
probabilities that are obtained by fitting model (2). It is easy to see that
$$e_{di} = m_{di}(p_{di} - \hat{p}_{di}) + \boldsymbol{\varepsilon}_{di}. \tag{3}$$
By a first-order Taylor series expansion of $p_{di}(\eta)$ about $\eta = \eta^0_{di}$, evaluated at $\eta = \hat{\eta}_{di}$, we obtain
$$p_{di} - \hat{p}_{di} = m_{di}^{-1}\Sigma_{di}(\eta^0_{di} - \hat{\eta}_{di}), \tag{4}$$
where $\Sigma_{di}$ is the variance–covariance matrix of $(y_{di1}, y_{di2})^T$. Observe that the derivatives of
$p_{di} = (p_{di1}, p_{di2})^T$ with respect to $\eta_{di} = (\eta_{di1}, \eta_{di2})^T$ are the elements of $m_{di}^{-1}\Sigma_{di}$. Substituting
equation (4) in equation (3), multiplying the resulting equation on the left by $\Sigma_{di}^{-1}$ and subtracting
the second component of the obtained equation from the first component, we obtain the
following univariate mixed linear model for the difference of scaled residuals:
$$(m_{di}p_{di1})^{-1}e_{di1} - (m_{di}p_{di2})^{-1}e_{di2} = x_{di}\alpha + v_d + \varepsilon_{di}, \tag{5}$$
where the obtained errors $\varepsilon_{di}$ are heteroscedastic, with variances
$$\operatorname{var}(\varepsilon_{di}) = m_{di}^{-1}(p_{di1}^{-1} + p_{di2}^{-1}).$$
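
The variance of the errors in model (5) can be checked numerically from the multinomial covariance structure. The sketch below is a verification under stated assumptions: it treats the error in (5) as the difference of the scaled random errors, $(m_{di}p_{di1})^{-1}\varepsilon_{di1} - (m_{di}p_{di2})^{-1}\varepsilon_{di2}$, and the function names and numerical inputs are hypothetical.

```python
def scaled_difference_variance(m, p1, p2):
    """Variance of eps1/(m*p1) - eps2/(m*p2) from the multinomial covariance."""
    var1 = m * p1 * (1.0 - p1)       # var(y1)
    var2 = m * p2 * (1.0 - p2)       # var(y2)
    cov12 = -m * p1 * p2             # cov(y1, y2)
    a = 1.0 / (m * p1)               # weight of the first error
    b = -1.0 / (m * p2)              # weight of the second error
    return a * a * var1 + b * b * var2 + 2.0 * a * b * cov12


def closed_form_variance(m, p1, p2):
    """Closed form stated in the text: (1/p1 + 1/p2) / m."""
    return (1.0 / p1 + 1.0 / p2) / m


direct = scaled_difference_variance(40, 0.05, 0.70)
closed = closed_form_variance(40, 0.05, 0.70)
```

The two values agree, and the closed form makes the heteroscedasticity explicit: cells with a small sample size or a small probability of unemployment have much noisier scaled residuals.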

If the fitted model (2) were the true model, we would obtain
$$(m_{di} p_{di1})^{-1} e_{di1} - (m_{di} p_{di2})^{-1} e_{di2} = x_{di}\alpha + \varepsilon_{di}. \tag{6}$$
The diagnostic method consists of fitting both models (5) and (6) to the residuals $(e_{di1}, e_{di2})$
of model (1)–(2). The true probabilities can be replaced by the fitted probabilities. Then the
two models can be compared with the usual measures for model comparison such as the
log-likelihood or the Bayes information criterion BIC. The resulting values of these measures are
listed in Table 3. We can see that, for the more complicated model including the extra random
effects $v_d$, the gain in log-likelihood is not large, and BIC is also very similar for both models.

Table 3. Results on the fit of models (5) and (6)

Model                     Log-likelihood    BIC
Linear model (6)          −3055.9           6306.8
Mixed linear model (5)    −3050.2           6303.1
Applying the principle of parsimony, we prefer to keep the simpler model without the extra
random effects vd .
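
To make the diagnostic concrete, the following Python sketch computes the response of models (5) and (6), i.e. the difference of scaled residuals, together with the heteroscedastic error variance, for a single area–group cell. The function name and the numbers are illustrative assumptions and are not taken from the Labour Force Survey data; fitted probabilities stand in for the true ones, as suggested above.

```python
def scaled_residual_difference(y1, y2, m, p1_hat, p2_hat):
    """Response of models (5)-(6) and its error variance for one cell.

    y1, y2: sampled unemployed and employed counts; m: cell sample size;
    p1_hat, p2_hat: fitted probabilities from model (1)-(2).
    """
    e1 = y1 - m * p1_hat          # residual for the unemployed
    e2 = y2 - m * p2_hat          # residual for the employed
    response = e1 / (m * p1_hat) - e2 / (m * p2_hat)
    # var(eps_di) with fitted probabilities plugged in
    variance = (1.0 / p1_hat + 1.0 / p2_hat) / m
    return response, variance

# Hypothetical cell: 60 sampled people, 4 unemployed and 40 employed.
resp, var = scaled_residual_difference(y1=4, y2=40, m=60, p1_hat=0.05, p2_hat=0.70)
```

One possible use, again as an assumption rather than the procedure that was followed here, is to take the reciprocal of `variance` as a weight when fitting (5) and (6).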

2.4. Small area estimation of employment characteristics


The fitting method (see Section 2.2) provides estimates of the model coefficients $\hat{\beta}_j = (\hat{\beta}_{1j}, \ldots, \hat{\beta}_{24j})^T$,
$j = 1, 2$, and predicted values of the random area effects $\hat{u}_d$, $d = 1, \ldots, 406$. From these,
predicted values of unsampled totals of unemployed and employed are obtained as
$$\hat{y}^r_{dij} = m^r_{di} \frac{\exp(x_{di}\hat{\beta}_j + \hat{u}_d)}{1 + \sum_{k=1}^{2} \exp(x_{di}\hat{\beta}_k + \hat{u}_d)}, \quad j = 1, 2, \; i = 1, \ldots, 6, \; d = 1, \ldots, 406.$$
Then estimates of the total number of unemployed and employed individuals, and of the rates
of unemployment in each area, are calculated as
$$\hat{\delta}_{dj} = \sum_{i=1}^{6} (y_{dij} + \hat{y}^r_{dij}), \quad j = 1, 2, \qquad \widehat{ur}_d = 100\, \frac{\hat{\delta}_{d1}}{\hat{\delta}_{d1} + \hat{\delta}_{d2}}, \quad d = 1, \ldots, 406. \tag{7}$$
Similarly, other usual labour statistics such as rates of employment, activity or inactivity can be
easily derived from the fit of model (1)–(2).
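
The calculation in equation (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the dictionary layout of a cell and all numerical values (coefficients, covariates, counts) are hypothetical, and a real application would use the 24 covariates and the fitted values from Section 2.2.

```python
import math

def predict_nonsample(x, beta1, beta2, u, m_r):
    """Predicted non-sampled counts (unemployed, employed) for one cell."""
    eta1 = sum(a * b for a, b in zip(x, beta1)) + u
    eta2 = sum(a * b for a, b in zip(x, beta2)) + u
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return m_r * math.exp(eta1) / denom, m_r * math.exp(eta2) / denom

def area_estimates(cells, beta1, beta2, u):
    """Totals and unemployment rate for one area, as in equation (7).

    cells: list of dicts with keys x, y1, y2 (sampled counts) and m_r
    (number of non-sampled individuals in the cell).
    """
    d1 = d2 = 0.0
    for c in cells:
        yr1, yr2 = predict_nonsample(c["x"], beta1, beta2, u, c["m_r"])
        d1 += c["y1"] + yr1
        d2 += c["y2"] + yr2
    return d1, d2, 100.0 * d1 / (d1 + d2)

# Toy area with two cells, an intercept and one covariate (made-up values).
cells = [
    {"x": [1.0, 0.04], "y1": 3, "y2": 45, "m_r": 5000},
    {"x": [1.0, 0.06], "y1": 5, "y2": 50, "m_r": 6000},
]
d1, d2, ur = area_estimates(cells, beta1=[-3.0, 5.0], beta2=[0.5, -1.0], u=0.1)
```

The same area effect `u` enters both linear predictors, which mirrors the common random effect of model (2).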
Direct estimates of small area characteristics are design based and are usually calculated by
using only the sample data belonging to the target area. Direct estimates of the totals of unem-
ployed and employed for each small area were provided by the Office for National Statistics
for the same data. In Fig. 7 we plotted the estimates that were derived from model (1)–(2) and
equation (7) against these direct estimates. We observe that the estimated totals of employed
people are almost equal for both methods. Direct estimates of employment totals are based on
sufficient observations to achieve an acceptable sampling error. Thus, the strong similarity with
the proposed model-based estimators gives reliability to the latter. However, for the unemployed
there are large differences between both methods. In what follows we illustrate the poor quality
of direct estimates of the unemployed totals reflected by their excessive sampling errors in areas
with few sampled unemployed.

Fig. 7. Model-based estimates of totals of (a) unemployed and (b) employed versus the corresponding
direct estimates for each area (horizontal axis, direct estimate; vertical axis, model estimate)

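Let $m_d$ and $M_d$ be the sample and population size of area $d$ respectively, and $\hat{p}^{\mathrm{dir}}_{dj}$, $j = 1, 2$,
be the direct estimators of the proportions of unemployed and employed individuals in area $d$.
Then the direct estimators of the corresponding totals and rates of unemployment are given by
$$\hat{\delta}^{\mathrm{dir}}_{dj} = M_d\, \hat{p}^{\mathrm{dir}}_{dj}, \quad j = 1, 2,$$
$$\widehat{ur}^{\mathrm{dir}}_d = 100\, \frac{\hat{\delta}^{\mathrm{dir}}_{d1}}{\hat{\delta}^{\mathrm{dir}}_{d1} + \hat{\delta}^{\mathrm{dir}}_{d2}}.$$
Assuming that the sample units are drawn by simple random sampling with replacement, so that
$\mathrm{var}(\hat{p}^{\mathrm{dir}}_{dj}) = p_{dj}(1 - p_{dj})/m_d$, the variances and covariances of the totals are given by
$$\mathrm{var}(\hat{\delta}^{\mathrm{dir}}_{dj}) = M_d^2\, \frac{p_{dj}(1 - p_{dj})}{m_d}, \quad j = 1, 2, \qquad \mathrm{cov}(\hat{\delta}^{\mathrm{dir}}_{d1}, \hat{\delta}^{\mathrm{dir}}_{d2}) = -M_d^2\, \frac{p_{d1} p_{d2}}{m_d}. \tag{8}$$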
Estimates of these quantities are obtained by replacing the true proportions pdj with direct esti-
mates. Mean-squared errors of direct rates of unemployment can be easily obtained by Taylor
linearization and using the formulae (8).
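
The following Python sketch illustrates how the variances of the direct totals and the linearized mean-squared error of the direct rate could be computed. It assumes simple random sampling with replacement, takes the variance of a total as $M_d^2$ times the variance of the corresponding proportion, and uses invented values for one area; the function names are hypothetical.

```python
import math

def direct_estimates(M, m, p1, p2):
    """Direct totals with their variances and covariance under SRS with replacement."""
    d1, d2 = M * p1, M * p2
    var1 = M**2 * p1 * (1 - p1) / m
    var2 = M**2 * p2 * (1 - p2) / m
    cov12 = -M**2 * p1 * p2 / m
    return d1, d2, var1, var2, cov12

def direct_rate_mse(M, m, p1, p2):
    """Taylor-linearized MSE of the direct unemployment rate (in percent)."""
    d1, d2, var1, var2, cov12 = direct_estimates(M, m, p1, p2)
    g1 = 100.0 * d2 / (d1 + d2) ** 2    # derivative of the rate w.r.t. delta_1
    g2 = -100.0 * d1 / (d1 + d2) ** 2   # derivative of the rate w.r.t. delta_2
    return g1**2 * var1 + g2**2 * var2 + 2 * g1 * g2 * cov12

# Hypothetical area: 50000 people, 300 sampled, direct proportions 3% and 60%.
M, m, p1, p2 = 50000, 300, 0.03, 0.60
d1, d2, var1, var2, cov12 = direct_estimates(M, m, p1, p2)
cv_unemployed = math.sqrt(var1) / d1   # coefficient of variation of the total
mse_rate = direct_rate_mse(M, m, p1, p2)
```

In this made-up example the coefficient of variation of the unemployed total is $\sqrt{0.97/9} \approx 0.33$, which shows how quickly a small proportion combined with a small sample pushes the direct estimate above a 20% limit.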
The accuracy of both estimates was compared by calculating the ratios between the coefficients
of variation of direct and model-based estimates of totals for both categories, unemployed
and employed. In this way, a ratio greater than 1 indicates a gain in accuracy of the proposed
estimator with respect to the direct estimator, and the larger the value of the ratio the greater
the gain. These coefficients of variation are the square roots of the mean-squared errors divided by
the corresponding estimates, where the mean-squared errors of the model-based estimators
were obtained by using the bootstrap procedure that is described in Section 2.5. To analyse the
relationship between gain and sample size, in Fig. 8 the ratios for the category unemployed are
plotted against the number of sampled unemployed individuals (Fig. 8(a)) and the analogous
plot is made for the category employed (Fig. 8(b)).

Fig. 8. Ratio of coefficients of variation of direct estimates over model-based estimates of totals of (a) unemployed
and (b) employed for each area (horizontal axes, sampled unemployed and sampled employed; vertical axes,
ratio of coefficients of variation)

We observe that all ratios are larger than 1, even for the category employed, where direct
estimators have acceptable sampling errors. Moreover, the gain in accuracy of the estimators
proposed increases when the sample size decreases. For the category unemployed the direct esti-
mates perform poorly in comparison with estimates that are based on the multinomial model.
In fact, the coefficients of variation of the former exceed the publishing limit of 20% for 331 out
of the 406 small areas, whereas for the latter this limit is not exceeded for any small area.

2.5. Mean-squared error


Publication of estimates must always be accompanied by a measure of accuracy, the most
common being the mean-squared error MSE. This section is devoted to the calculation of
mean-squared errors of the characteristics of interest, namely the totals of unemployed $\hat{\delta}_{d1}$ and
employed $\hat{\delta}_{d2}$, and the rates of unemployment $\widehat{ur}_d$. In the model-based approach that is followed
in this paper the observations $(y_{di1}, y_{di2}, y_{di3})$ are regarded as random variables (see model (1)–
(2)), in contrast with the design-based approach, where they are deemed as fixed values. Thus,
the target quantities in this paper are random. The mean-squared error of an estimate $\hat{\theta}$ of the
value of a (possibly random) parameter $\theta$ is defined as usual by $\mathrm{MSE}(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\}$.
There are some characteristics of the problem at hand that complicate the calculation of an
analytical expression for the mean-squared error of δ̂d1 and δ̂d2 . The first is the non-linearity
of the model, i.e. the non-linearity of the response mean in the linear predictor along with the
lack of normality. The second is the non-linearity of the estimators δ̂d1 and δ̂d2 in the fixed and
random-effects estimates β̂ and û = .û1 , . . . , ûD /T . An extra difficulty is the correlation between
δ̂d1 and δ̂d2 . Until now, analytical expressions for MSE for small area estimators are only avail-
able for linear mixed models where the estimators are linear functions of β̂ and û (Prasad and
Rao, 1990). However, remember that ML estimation of the variance of random effects relies on
an approximation of the model by a linear mixed model (see Appendix A). This approximation,
together with a Taylor series expansion for linearizing the estimates δ̂dj in β̂ and û, allows appli-
cation of Prasad and Rao’s results, adequately adapted to a multivariate set-up (for details see
Appendix B). Then approximations for the MSEs of both quantities of interest δ̂d1 and δ̂d2 can
be derived. This procedure was applied by Saei and Chambers (2003) under a general set-up of
(univariate) generalized linear mixed models. In Appendix B their procedure has been adapted
to the multinomial logit mixed model that is used here.
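Owing to the correlation between $\hat{\delta}_{d1}$ and $\hat{\delta}_{d2}$, a new error term appears, called the mean
crossed product error MCPE, and given by $\mathrm{MCPE}(\hat{\delta}_{d1}, \hat{\delta}_{d2}) = E\{(\hat{\delta}_{d1} - \delta_{d1})(\hat{\delta}_{d2} - \delta_{d2})\}$. If we
are interested in the mean-squared error of unemployment rates $\widehat{ur}_d$, which are non-linear functions
of the preceding totals, then the application of Taylor linearization requires the calculation of
MCPE. Concretely, this method leads to the formula
$$\mathrm{MSE}(\widehat{ur}_d) = (\hat{\delta}_{d1} + \hat{\delta}_{d2})^{-4} \{\hat{\delta}_{d2}^2\, \mathrm{MSE}(\hat{\delta}_{d1}) + \hat{\delta}_{d1}^2\, \mathrm{MSE}(\hat{\delta}_{d2}) - 2 \hat{\delta}_{d1} \hat{\delta}_{d2}\, \mathrm{MCPE}(\hat{\delta}_{d1}, \hat{\delta}_{d2})\}. \tag{9}$$
An estimator of $\mathrm{MSE}(\widehat{ur}_d)$ is obtained by replacing the unknown parameters that appear in the
formulae of $\mathrm{MSE}(\hat{\delta}_{dj})$, $j = 1, 2$, and $\mathrm{MCPE}(\hat{\delta}_{d1}, \hat{\delta}_{d2})$ by their estimated values. We denote this
estimator by $\mathrm{mse}^A(\widehat{ur}_d)$, where A stands for ‘analytical’.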
When explicit exact formulae of mean-squared errors cannot be calculated, an alternative
approach that avoids Taylor linearizations and further approximations is resampling. Several
resampling methods have been suggested in small area estimation. Jiang et al. (2002) proposed
a jackknife methodology for estimation under generalized linear mixed models. Pfeffermann
and Tiller (2005) proposed a parametric and a non-parametric bootstrap estimator of mean
prediction errors under state space models. Butar and Lahiri (2003) used a bootstrap for esti-
mation under linear mixed models. Hall and Maiti (2006) proposed a double-bootstrap approach
for bias correction, which is applicable for constructing bias-corrected estimators of the mean-
squared error and for computing prediction regions under general settings. Under logistic mixed
models, González-Manteiga et al. (2007) proposed a bootstrap for mean-squared error estima-
tion on finite populations. This method works by generating bootstrap populations from a
model with probabilistic properties that is similar to the original model but conditional on the
initial sample, and then extracting samples from these populations.
Here we generalize the proposal of González-Manteiga et al. (2007) to the multinomial model
and adapt it to the data structure at hand. The simulation study that is described in Section 3
shows its good performance with artificial data similar to those of the main
application of this paper. The proposed bootstrap works as follows.
(a) Model fitting: fit model (1)–(2) to the original data, obtaining parameter estimates $\hat{\beta}_j = (\hat{\beta}_{1j}, \ldots, \hat{\beta}_{24j})^T$, $j = 1, 2$, and $\hat{\varphi}$.
(b) Generation of random effects: generate a vector $w$ containing $D$ independent copies of a
standard normal variable. Construct the vector $u^* = \hat{\varphi}^{1/2} w = (u^*_1, \ldots, u^*_D)^T$ such that
$E(u^*) = 0_D$ and $\mathrm{var}(u^*) = \hat{\varphi} I_D$.
(c) Generation of a bootstrap population (sample and non-sample): for $d = 1, \ldots, D$, calculate
the probabilities
$$p^*_{di3} = \Big\{1 + \sum_{j=1}^{2} \exp(x_{di}\hat{\beta}_j + u^*_d)\Big\}^{-1},$$
$$p^*_{dij} = p^*_{di3} \exp(x_{di}\hat{\beta}_j + u^*_d), \quad j = 1, 2.$$
Generate the following sample and non-sample multinomial vectors:
$$(y^*_{di1}, y^*_{di2}) \sim \mathrm{Multin}(m_{di}, p^*_{di1}, p^*_{di2}),$$
$$(y^{*r}_{di1}, y^{*r}_{di2}) \sim \mathrm{Multin}(m^r_{di}, p^*_{di1}, p^*_{di2}).$$
Calculate true area totals and rates of unemployment
$$\delta^*_{dj} = \sum_{i=1}^{6} (y^*_{dij} + y^{*r}_{dij}), \quad j = 1, 2,$$
$$ur^*_d = 100\, \delta^*_{d1}/(\delta^*_{d1} + \delta^*_{d2}).$$

(d) Model fitting to the bootstrap sample and parameter estimation: fit model (1)–(2) to the
bootstrap sample data $(y^*_{di1}, y^*_{di2})$, $i = 1, \ldots, 6$, $d = 1, \ldots, D$, obtaining estimates $\hat{\beta}^*_j$ and
predicted values $\hat{u}^*_d$. From these, calculate individual predicted values
$$\hat{y}^{*r}_{dij} = m^r_{di} \frac{\exp(x_{di}\hat{\beta}^*_j + \hat{u}^*_d)}{1 + \sum_{k=1}^{2} \exp(x_{di}\hat{\beta}^*_k + \hat{u}^*_d)}, \quad j = 1, 2.$$
Then, calculate bootstrap estimates of totals and rates of unemployment, by
$$\hat{\delta}^*_{dj} = \sum_{i=1}^{6} (y^*_{dij} + \hat{y}^{*r}_{dij}), \quad j = 1, 2,$$
$$\widehat{ur}^*_d = 100\, \hat{\delta}^*_{d1}/(\hat{\delta}^*_{d1} + \hat{\delta}^*_{d2}).$$
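(e) Bootstrap replicates: repeat steps (c) and (d) $B$ times. Let $\delta^{*(b)}_{d1}$, $\delta^{*(b)}_{d2}$ and $ur^{*(b)}_d$ denote the
true values of the parameters and $\hat{\delta}^{*(b)}_{d1}$, $\hat{\delta}^{*(b)}_{d2}$ and $\widehat{ur}^{*(b)}_d$ the estimators that are obtained
in the $b$th repetition, $b = 1, \ldots, B$. The bootstrap estimators of $\mathrm{MSE}(\widehat{ur}_d)$, $\mathrm{MSE}(\hat{\delta}_{dj})$,
$j = 1, 2$, and $\mathrm{MCPE}(\hat{\delta}_{d1}, \hat{\delta}_{d2})$ are
$$\mathrm{mse}^B(\widehat{ur}_d) = B^{-1} \sum_{b=1}^{B} (\widehat{ur}^{*(b)}_d - ur^{*(b)}_d)^2,$$
$$\mathrm{mse}^B(\hat{\delta}_{dj}) = B^{-1} \sum_{b=1}^{B} (\hat{\delta}^{*(b)}_{dj} - \delta^{*(b)}_{dj})^2, \quad j = 1, 2,$$
$$\mathrm{mcpe}^B(\hat{\delta}_{d1}, \hat{\delta}_{d2}) = B^{-1} \sum_{b=1}^{B} (\hat{\delta}^{*(b)}_{d1} - \delta^{*(b)}_{d1})(\hat{\delta}^{*(b)}_{d2} - \delta^{*(b)}_{d2}).$$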
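
Steps (b)–(e) can be sketched for a single area as follows. This is only a skeleton under several assumptions: the model refit of step (d) is replaced by a user-supplied function `fit` (hypothetical, as are all other names), the toy call uses a placeholder that returns the generating coefficients instead of refitting model (1)–(2), and the random effect is redrawn in every replicate, whereas step (e) as stated repeats only steps (c) and (d).

```python
import math
import random

def probs(x, betas, u):
    """Multinomial probabilities (p1, p2, p3) for one cell."""
    e = [math.exp(sum(a * b for a, b in zip(x, bj)) + u) for bj in betas]
    p3 = 1.0 / (1.0 + sum(e))
    return e[0] * p3, e[1] * p3, p3

def draw_counts(rng, n, p1, p2):
    """Draw (y1, y2) from a three-category multinomial, one unit at a time."""
    y1 = y2 = 0
    for _ in range(n):
        r = rng.random()
        if r < p1:
            y1 += 1
        elif r < p1 + p2:
            y2 += 1
    return y1, y2

def bootstrap_mse(cells, betas, phi, fit, B, seed=0):
    """Bootstrap MSE of the unemployment rate for one area.

    cells: list of (x, m, m_r); fit: function taking the bootstrap sample
    [(x, m, m_r, y1, y2), ...] and returning (betas_star, u_star).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        u_star = math.sqrt(phi) * rng.gauss(0.0, 1.0)      # step (b)
        sample, true1, true2 = [], 0.0, 0.0
        for x, m, m_r in cells:                             # step (c)
            p1, p2, _ = probs(x, betas, u_star)
            y1, y2 = draw_counts(rng, m, p1, p2)
            yr1, yr2 = draw_counts(rng, m_r, p1, p2)
            sample.append((x, m, m_r, y1, y2))
            true1 += y1 + yr1
            true2 += y2 + yr2
        b_hat, u_hat = fit(sample)                          # step (d)
        est1 = est2 = 0.0
        for x, m, m_r, y1, y2 in sample:
            q1, q2, _ = probs(x, b_hat, u_hat)
            est1 += y1 + m_r * q1
            est2 += y2 + m_r * q2
        ur_true = 100.0 * true1 / (true1 + true2)
        ur_hat = 100.0 * est1 / (est1 + est2)
        total += (ur_hat - ur_true) ** 2                    # step (e)
    return total / B

# Toy call with made-up sizes and an intercept-only predictor.
cells = [([1.0], 40, 400), ([1.0], 50, 500)]
betas = [[-2.5], [0.4]]
oracle = lambda sample: (betas, 0.0)   # placeholder for a real model refit
mse_b = bootstrap_mse(cells, betas, phi=0.05, fit=oracle, B=50)
```

Passing the fitting routine as an argument keeps the resampling logic separate from the PQL–ML algorithm of Appendix A, which would take the place of `oracle` and would be applied to all areas jointly.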

Fig. 9 depicts the MSE estimates based on the analytical approximation $\mathrm{mse}^A(\widehat{ur}_d)$ and the
estimates based on bootstrap $\mathrm{mse}^B(\widehat{ur}_d)$ for the first 200 small areas of the Labour Force Survey
data file. We observe that the estimates behave similarly along small areas without big differ-
ences, with the analytical approximation often being somewhat below the bootstrap values. In
the simulation study of Section 3, where the true values of the MSEs are available, the analytical
approximation turns out to be clearly downward biased (see Fig. 11 in Section 3). However,
here the two types of MSE estimates are more similar than in the simulation experiment. Since
the parametric bootstrap relies on full knowledge of the data-generating process, we conjecture
that, when the model is correct as in the simulation study, the performance of the bootstrap-
based estimator is very good. However, in practice the correct model is rarely known. In the
application to the Labour Force Survey data the bootstrap works nicely because the model
fits the data reasonably well, although the differences from the analytical approximation are
smaller.

Fig. 9. Analytical estimates $\mathrm{mse}^A(\widehat{ur}_d)$ (∗) and bootstrap estimates $\mathrm{mse}^B(\widehat{ur}_d)$ for the first 200 areas

3. Simulation study
We performed a simulation experiment with two purposes. The first is to compare the estimated
rates of unemployment that are derived from the multinomial logit mixed model with those
obtained by a univariate logit mixed model for the unemployed as in Hastings et al. (2003). The
second is to compare the performance of the two different estimates of the mean-squared error
that were proposed in Section 2.5. The simulated data are similar to those of the application in
Section 2. They are generated from model (1)–(2), using the values of the fitted parameters for
the real data.
First model (1)–(2) was fitted to the same data as in Section 2, but taking as the explana-
tory variable only the proportion of REG.UNEMPLOYED. Then, with the estimates β̂ and ϕ̂
obtained, K = 1000 populations (sample plus non-sample) with D = 200 small areas were gen-
erated. The generation was carried out as described in steps (a)–(c) of the parametric bootstrap
procedure of Section 2.5. The sample and non-sample sizes $m_{di}$ and $m^r_{di}$ were chosen as in the
first 200 areas of the data set. In this way, the true rates of unemployment $ur^{(k)}_d$, $k = 1, \ldots, K$, of
the generated populations were available. Then two different models were fitted to each sample
$k$. The first model is the multinomial logit mixed model (1)–(2). The second is a binomial logit
mixed model for the unemployed, i.e.
$$y_{di1} \mid u_d \overset{\mathrm{ind}}{\sim} \mathrm{Bin}(m_{di}, p_{di1}),$$
where
$$\log\{p_{di1}/(1 - p_{di1})\} = x_{di}\beta + u_d.$$
From the multinomial logit mixed model, estimates of rates of unemployment $\widehat{ur}^{M(k)}_d$ were
derived as described in Section 2.4, where the superscript M stands for ‘multinomial’. From
the binomial logistic model, first model-based estimates of unemployment totals $\hat{\delta}^{L(k)}_{d1}$ were
obtained, where the superscript L stands for ‘logistic’. Using the direct estimates of the employment
totals $\hat{\delta}^{\mathrm{dir}(k)}_{d2}$, rates of unemployment were then obtained as
$$\widehat{ur}^{L(k)}_d = 100\, \frac{\hat{\delta}^{L(k)}_{d1}}{\hat{\delta}^{L(k)}_{d1} + \hat{\delta}^{\mathrm{dir}(k)}_{d2}}, \quad d = 1, \ldots, D.$$
The mean-squared errors of the estimators $\widehat{ur}^M_d$ and $\widehat{ur}^L_d$ that were obtained from the two models
were approximated empirically as
$$\mathrm{MSE}(\widehat{ur}^l_d) = K^{-1} \sum_{k=1}^{K} (\widehat{ur}^{l(k)}_d - ur^{(k)}_d)^2, \quad l \in \{M, L\}.$$

The resulting empirical MSEs of the estimates derived from the two models are plotted on a
logarithmic scale in Fig. 10. We can observe that the empirical mean-squared errors of the esti-
mates that are derived from the multinomial logit mixed model are much smaller. This happens
because the univariate logistic model does not take into account the dependence between the
number of unemployed, employed and inactive people in the estimation process.

Fig. 10. Empirical values of $\mathrm{MSE}(\widehat{ur}^L_d)$ (∗) and $\mathrm{MSE}(\widehat{ur}^M_d)$ for each small area $d$ on a logarithmic scale

Regarding the second purpose of the simulation study, so that the comparison of the two MSE
estimates that were developed in Section 2.5 with the true values is fair, these true
values were first calculated empirically with greater precision (K = 5000). After this preliminary
simulation for obtaining the empirical MSEs, the same simulation scheme was followed, i.e. K = 600
populations were generated with sample and non-sample sizes as before. From each sample $k$,
estimates of unemployment rates $\widehat{ur}^{(k)}_d$ were derived from the multinomial logit mixed model,
and analytical and bootstrap estimates of the mean-squared errors $\mathrm{mse}^A(\widehat{ur}^{(k)}_d)$ and $\mathrm{mse}^B(\widehat{ur}^{(k)}_d)$
were computed. The latter were obtained with B = 600 replications of the bootstrap procedure
that was described in Section 2.5. As a result, the following quantities were computed:
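$$\mathrm{mse}^A(\widehat{ur}_d) = K^{-1} \sum_{k=1}^{K} \mathrm{mse}^A(\widehat{ur}^{(k)}_d),$$
$$E^A_d = K^{-1} \sum_{k=1}^{K} \{\mathrm{mse}^A(\widehat{ur}^{(k)}_d) - \mathrm{MSE}(\widehat{ur}_d)\}^2,$$
$$\mathrm{mse}^B(\widehat{ur}_d) = K^{-1} \sum_{k=1}^{K} \mathrm{mse}^B(\widehat{ur}^{(k)}_d),$$
$$E^B_d = K^{-1} \sum_{k=1}^{K} \{\mathrm{mse}^B(\widehat{ur}^{(k)}_d) - \mathrm{MSE}(\widehat{ur}_d)\}^2.$$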

In Fig. 11 the true values $\mathrm{MSE}(\widehat{ur}_d)$ that were obtained in the preliminary simulation, the
analytical estimates $\mathrm{mse}^A(\widehat{ur}_d)$ and the bootstrap estimates $\mathrm{mse}^B(\widehat{ur}_d)$ are plotted for each area.
Observe that the bootstrap estimates are very close to the true values; in fact they are super-
posed for most of the areas. However, the analytical approximations underestimate the true
values for all areas. This bias seriously affects the overall accuracy of MSE estimates. Thus,
although both MSE estimates rely on the model, when small area rates of unemployment are
derived from a reliable model, we recommend estimating MSE by using the bootstrap proposed.
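
The summary quantities of this simulation can be sketched as follows for one area. The function names and the replicate values are invented for illustration (K = 4 only to keep the example short); they are not results from the study.

```python
def empirical_mse(estimates, truths):
    """K^{-1} * sum over k of (estimate_k - truth_k)^2 for one area."""
    K = len(estimates)
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / K

def mse_estimator_summary(mse_estimates, true_mse):
    """Mean of an MSE estimator over replicates and its error measure E_d."""
    K = len(mse_estimates)
    mean_est = sum(mse_estimates) / K
    e_d = sum((v - true_mse) ** 2 for v in mse_estimates) / K
    return mean_est, e_d

# Made-up replicate values for a single area.
true_mse = empirical_mse([5.1, 4.8, 5.4, 4.9], [5.0, 5.0, 5.0, 5.0])
mean_a, e_a = mse_estimator_summary([0.04, 0.05, 0.03, 0.04], true_mse)
mean_b, e_b = mse_estimator_summary([0.06, 0.05, 0.07, 0.06], true_mse)
```

Comparing `mean_a` and `mean_b` with `true_mse` shows the bias of each estimator, whereas `e_a` and `e_b` combine bias and variability in a single number.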

Fig. 11. True values $\mathrm{MSE}(\widehat{ur}_d)$, analytical estimates $\mathrm{mse}^A(\widehat{ur}_d)$ (∗) and bootstrap estimates $\mathrm{mse}^B(\widehat{ur}_d)$ for each area

4. Conclusions
A multinomial logit model with random area effects has been proposed for modelling employment
or unemployment data, and small area estimators have been derived from it. The estimates
obtained are consistent in the sense that they lie in the desired space, i.e. the estimated
totals of unemployed, employed and inactive individuals sum to the population total. In comparison
with direct estimators, they have reduced variance without a significant bias.
Furthermore, two different ways of estimating the mean-squared error of the small area esti-
mators proposed are given: an analytical expression and a bootstrap estimator. The analytical
approximation is based on Taylor linearizations that are specific for the model and the parameter
at hand, whereas the bootstrap procedure is designed for the multinomial logit model avoiding
any linearization and can be easily adapted to some variations in the model and to different tar-
get parameters. Furthermore, the bootstrap estimator has performed better than the analytical
estimator in the simulations, although the differences are smaller in the application with UK
unemployment data.
There are various straightforward extensions of the multinomial logit mixed model that was
proposed in this work. If auxiliary information is available for all units of the population, a
unit level model can be used, whereas, if there is only area level information, the model should
be stated at the area level. Moreover, the sampling design can be introduced in the estimation
procedure by taking as response variables the direct estimates of the totals of unemployed and
employed individuals, and assuming that these totals follow a multinomial model.

Acknowledgements
This work started during a research stay of the first author in the Department of Social Sta-
tistics of the University of Southampton in the summer of 2003 by invitation of Professor
Raymond L. Chambers. We thank him and Professor Domingo Morales for their continu-
ous support and advice during this work, Miguel Molina for his help in the enhancement of
the program code, Zsolt Sándor and Roland Fried for their help in the last stage of the work
and finally the referees for their careful reading and helpful comments. It has been supported
by grants MTM 2006-05693, SEJ2004-03303, MTM2005-00820, PGIDT03PXIC20702PN and
PGIDIT06PXIB207009PR.

Appendix A: Fitting a multinomial logit mixed model


Here we describe the technical details of the combined PQL–ML or PQL–RML algorithms that were
adapted to the multinomial logit mixed model that is proposed in this paper.
Let $y_{di} = (y_{di1}, y_{di2})^T$ be the $i$th observation inside area $d$ of the numbers out of the total $m_{di}$ sampled
in the first two of the three categories and $p_{di} = (p_{di1}, p_{di2})^T$ the corresponding vector of multinomial
probabilities. The mean and the covariance matrix of $y_{di}$ are given by
$$\mu_{di} = m_{di}\, p_{di},$$
$$\Sigma_{di} = m_{di}(P_{di} - p_{di} p_{di}^T),$$
where $P_{di} = \mathrm{diag}(p_{di1}, p_{di2})$. The natural parameter is $\theta_{di} = (\theta_{di1}, \theta_{di2})^T$, where $\theta_{dij} = \log(p_{dij}/p_{di3})$, $j = 1, 2$,
and where $p_{di3} = 1 - p_{di1} - p_{di2}$ is the multinomial probability for the third category. Let $u = (u_1, \ldots, u_D)^T$
be the vector of random effects that are associated with $D$ small areas. With this notation, the proposed
multinomial logit mixed model (2) can be written as
$$\theta_{di} = X_{di}\beta + Z_{di}u, \quad u \sim N_D(0_D, \varphi I_D), \quad i = 1, \ldots, 6, \; d = 1, \ldots, D.$$
Here,
$$X_{di} = \begin{pmatrix} x_{di} & 0_{1 \times 24} \\ 0_{1 \times 24} & x_{di} \end{pmatrix}, \qquad Z_{di} = (0_{2 \times (d-1)} \;\; 1_2 \;\; 0_{2 \times (D-d)})$$
are the $2 \times p$ and $2 \times D$ incidence matrices for observation $i$ within area $d$ with $p = 48$. We denote by $x_{dij}$
the $j$th row of matrix $X_{di}$, $j = 1, 2$. Additionally, let us denote by $y$, $X$ and $Z$ the matrices with the sample
elements $y_{di}$, $X_{di}$ and $Z_{di}$ stacked in columns. The conditional density of $y$ given $u$ is
$$f_1(y \mid u) = \prod_{d=1}^{D} \prod_{i=1}^{6} f(y_{di1}, y_{di2} \mid u_d),$$
and the marginal density of $u$ is
$$f_2(u) = (2\pi)^{-D/2} \varphi^{-D/2} \exp\Big(-\frac{1}{2\varphi} \sum_{d=1}^{D} u_d^2\Big).$$
Let $l_1(y \mid u) = \log\{f_1(y \mid u)\}$ and $l_2(u) = \log\{f_2(u)\}$. For $\varphi$ known, PQL estimators of $\beta$ and $u$ are obtained
by maximization of the joint log-likelihood $l(y, u) = l_1(y \mid u) + l_2(u)$. This method can be implemented by
using a Newton–Raphson algorithm.
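
As an illustration of the objective that is maximized in the PQL step, the joint log-likelihood $l(y, u)$ can be evaluated as in the Python sketch below. The data layout and the function names are assumptions made for this example, and the toy call uses an intercept-only predictor instead of the 24 covariates.

```python
import math

def log_multinomial(y1, y2, m, p1, p2):
    """Log-probability of (y1, y2, m - y1 - y2) under a three-category multinomial."""
    y3, p3 = m - y1 - y2, 1.0 - p1 - p2
    coef = math.lgamma(m + 1) - sum(math.lgamma(v + 1) for v in (y1, y2, y3))
    return coef + y1 * math.log(p1) + y2 * math.log(p2) + y3 * math.log(p3)

def joint_loglik(data, beta1, beta2, u, phi):
    """l(y, u) = l1(y | u) + l2(u), the PQL objective for known phi.

    data[d] is the list of cells (x, m, y1, y2) of area d; u[d] is its effect.
    """
    l1 = 0.0
    for d, cells in enumerate(data):
        for x, m, y1, y2 in cells:
            t1 = sum(a * b for a, b in zip(x, beta1)) + u[d]
            t2 = sum(a * b for a, b in zip(x, beta2)) + u[d]
            p3 = 1.0 / (1.0 + math.exp(t1) + math.exp(t2))
            l1 += log_multinomial(y1, y2, m, p3 * math.exp(t1), p3 * math.exp(t2))
    D = len(u)
    # log of the N(0, phi) density of the D independent area effects
    l2 = -0.5 * D * math.log(2 * math.pi * phi) - sum(v * v for v in u) / (2 * phi)
    return l1 + l2

# One area with one cell of three people, one in each category (toy values).
data = [[([1.0], 3, 1, 1)]]
value = joint_loglik(data, beta1=[0.0], beta2=[0.0], u=[0.0], phi=1.0)
```

The term `l2` acts as a penalty that shrinks the area effects towards 0, more strongly when `phi` is small.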
Now assume that $\beta$ is known and $u$ is fixed and known. Adapting the ideas of Schall (1991) to a bivariate
setting, a Taylor series expansion of the functions $g_j(y_{di}) = \log\{y_{dij}/(m_{di} - y_{di1} - y_{di2})\}$, $j = 1, 2$, about the
point $\mu_{di}$ leads to
$$g_j(y_{di}) \approx g_j(\mu_{di}) + \frac{\partial g_j}{\partial y_{di1}}\Big|_{\mu_{di}} (y_{di1} - \mu_{di1}) + \frac{\partial g_j}{\partial y_{di2}}\Big|_{\mu_{di}} (y_{di2} - \mu_{di2}), \quad j = 1, 2.$$
Let us denote $\xi_{di} = (g_1(y_{di}), g_2(y_{di}))^T$ and $e_{di} = \Sigma_{di}^{-1}(y_{di} - \mu_{di})$. Calculating the expressions of the derivatives
involved and using matrix notation, the above Taylor series expansion becomes
$$\xi_{di} = X_{di}\beta + Z_{di}u + e_{di}, \tag{10}$$
where $\mathrm{var}(e_{di}) = \Sigma_{di}^{-1}$. Let $\xi$ denote the vector that is constructed by stacking the vectors $\xi_{di}$ in one column
and $V = \mathrm{var}(\xi)$. Then $V = \varphi Z Z^T + \Sigma^{-1}$, where $\Sigma = \mathrm{diag}(\Sigma_{di}, i = 1, \ldots, 6, d = 1, \ldots, D)$. Assuming that the
marginal distribution of $\xi$ is approximately normal, and maximizing the log-likelihood of $\xi$ with respect
to $\varphi$, we obtain the approximate likelihood equation
$$\varphi = (n - r_1)^{-1} \sum_{d=1}^{D} u_d^2, \qquad r_1 = \frac{1}{\varphi} \sum_{d=1}^{D} v_d^{-1}, \tag{11}$$
where
$$v_d = \sum_{i=1}^{6} m_{di}\, p_{di3}(1 - p_{di3}) + \varphi^{-1}.$$
Thus, if $\beta$ and $u$ are known, plugging an initial value of $\varphi$ in $r_1$ and iterating via the formula of $\varphi$ in
equation (11), we obtain an approximate ML estimate of $\varphi$.
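
The fixed-point iteration for $\varphi$ can be sketched as follows. The sketch takes $v_d$ as the sum of $m_{di} p_{di3}(1 - p_{di3})$ over the cells of area $d$ plus $\varphi^{-1}$ and divides by $D - r$, as in step (c) of the algorithm in Section A.1; the function name, the data layout and the numbers are hypothetical.

```python
def update_phi(u, cells_p3, phi, n_iter=1):
    """Iterate the variance update for fixed beta and u.

    u: predicted area effects; cells_p3[d]: list of (m_di, p_di3) for area d;
    phi: starting value of the variance.
    """
    D = len(u)
    for _ in range(n_iter):
        v = [sum(m * p3 * (1 - p3) for m, p3 in cells) + 1.0 / phi
             for cells in cells_p3]
        r = sum(1.0 / vd for vd in v) / phi
        phi = sum(ud * ud for ud in u) / (D - r)
    return phi

# Four toy areas, each with a single cell of 50 sampled people.
u = [0.2, -0.1, 0.3, -0.4]
cells_p3 = [[(50, 0.3)] for _ in u]
phi_one = update_phi(u, cells_p3, phi=0.1)
phi_many = update_phi(u, cells_p3, phi=0.1, n_iter=200)
```

Each term of `r` equals $1/(1 + \varphi \omega_d)$, with $\omega_d$ the sum of $m_{di} p_{di3}(1 - p_{di3})$, and therefore lies strictly between 0 and 1, so the denominator `D - r` stays positive.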
Following Harville (1977), the approximated RML estimator of $\varphi$ is obtained by maximizing the
restricted likelihood
$$f(\varphi; \xi) = (2\pi\varphi)^{-(n-p)/2} |X^T X|^{1/2} |V|^{-1/2} |X^T V^{-1} X|^{-1/2} \exp\Big(-\frac{1}{2} \xi^T \Pi \xi\Big), \tag{12}$$
where
$$\Pi = V^{-1} - V^{-1} X P X^T V^{-1}, \qquad P = (X^T V^{-1} X)^{-1}.$$
Consider the matrices
$$T = (Z^T \Sigma Z + \varphi^{-1} I_D)^{-1},$$
$$R = T + T Z^T \Sigma X P X^T \Sigma Z T.$$
The final RML equation, which is obtained by equating the derivative of $f(\varphi; \xi)$ to 0, is given by
$$\varphi = (n - r_2)^{-1} \sum_{d=1}^{D} u_d^2, \qquad r_2 = \varphi^{-1} \mathrm{tr}(R). \tag{13}$$

Thus, starting with some initial values, estimates of β, u and ϕ can be obtained through a double-iter-
ation scheme. First update β and u by the Newton–Raphson equation to obtain PQL estimators, with ϕ
known, and then take the updated values of β and u as entries for one of the updating equations for ϕ,
either equation (11) or equation (13). The detailed PQL–ML fitting algorithm is described below.

A.1. Penalized quasi-likelihood–maximum likelihood algorithm for fitting the multinomial logit mixed model
(a) Set the desired precision $\varepsilon$ and $l = 1$, and take initial values $\beta_{(0)}$, $u_{(0)} = (u_{1(0)}, \ldots, u_{D(0)})^T$ and $\varphi_{(1)}$.
(b) Perform the following substeps.
(i) Set $k = 0$, $\beta^0 = \beta_{(l-1)}$ and $u^0 = (u^0_1, \ldots, u^0_D)^T = u_{(l-1)}$.
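(ii) Update the current values $\beta^k$ and $u^k$ in the following way. Calculate, for $i = 1, \ldots, 6$, $d = 1, \ldots, D$,
$$\theta^k_{dij} = x_{dij}\beta^k + u^k_d, \quad j = 1, 2,$$
$$p^k_{di3} = \Big\{1 + \sum_{j=1}^{2} \exp(\theta^k_{dij})\Big\}^{-1},$$
$$p^k_{dij} = p^k_{di3} \exp(\theta^k_{dij}), \quad j = 1, 2,$$
$$\mu^k_{di} = m_{di} \begin{pmatrix} p^k_{di1} \\ p^k_{di2} \end{pmatrix},$$
$$\Sigma^k_{di} = m_{di} \begin{pmatrix} p^k_{di1}(1 - p^k_{di1}) & -p^k_{di1} p^k_{di2} \\ -p^k_{di1} p^k_{di2} & p^k_{di2}(1 - p^k_{di2}) \end{pmatrix}.$$
Compute
$$A^k = \sum_{d=1}^{D} \sum_{i=1}^{6} X_{di}^T \Sigma^k_{di} X_{di},$$
$$B^k = \sum_{d=1}^{D} \sum_{i=1}^{6} X_{di}^T \Sigma^k_{di} Z_{di},$$
$$v^k_d = \sum_{i=1}^{6} m_{di}\, p^k_{di3}(1 - p^k_{di3}) + \varphi_{(l)}^{-1}, \quad d = 1, \ldots, D,$$
and
$$T^k = \mathrm{diag}\{(v^k_1)^{-1}, \ldots, (v^k_D)^{-1}\}.$$
From this, compute $W^k = \{A^k - B^k T^k (B^k)^T\}^{-1}$. The updating equation is
$$\begin{pmatrix} \beta^{k+1} \\ u^{k+1} \end{pmatrix} = \begin{pmatrix} \beta^k \\ u^k \end{pmatrix} + \begin{pmatrix} W^k & -W^k B^k T^k \\ -T^k (B^k)^T W^k & T^k + T^k (B^k)^T W^k B^k T^k \end{pmatrix} \begin{pmatrix} S^k_\beta \\ S^k_u \end{pmatrix},$$
where
$$S^k_\beta = \sum_{d=1}^{D} \sum_{i=1}^{6} X_{di}^T (y_{di} - \mu^k_{di}),$$
$$S^k_u = \sum_{d=1}^{D} \sum_{i=1}^{6} Z_{di}^T (y_{di} - \mu^k_{di}) - \varphi_{(l)}^{-1} u^k.$$
For $y_{di3} = m_{di} - \sum_{j=1}^{2} y_{dij}$, $d = 1, \ldots, D$, we have
$$S^k_u = \begin{pmatrix} \sum_{i=1}^{6} (m_{1i}\, p^k_{1i3} - y_{1i3}) - \varphi_{(l)}^{-1} u^k_1 \\ \vdots \\ \sum_{i=1}^{6} (m_{Di}\, p^k_{Di3} - y_{Di3}) - \varphi_{(l)}^{-1} u^k_D \end{pmatrix}.$$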

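(iii) If the condition below holds, denote the last estimates by $\beta_{(l)}$ and $u_{(l)}$. Otherwise increase $k$ by 1
unit and return to step (ii).
$$\max\Big\{ \Big|\frac{\beta^{k+1}_j - \beta^k_j}{\beta^k_j}\Big|, \; j = 1, \ldots, p, \quad \Big|\frac{u^{k+1}_d - u^k_d}{u^k_d}\Big|, \; d = 1, \ldots, D \Big\} < \varepsilon.$$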

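(c) Compute, for $i = 1, \ldots, 6$, $d = 1, \ldots, D$,
$$\theta_{dij(l)} = x_{dij}\beta_{(l)} + u_{d(l)}, \quad j = 1, 2,$$
$$p_{di3(l)} = \Big\{1 + \sum_{j=1}^{2} \exp(\theta_{dij(l)})\Big\}^{-1},$$
$$p_{dij(l)} = p_{di3(l)} \exp(\theta_{dij(l)}), \quad j = 1, 2,$$
$$\mu_{di(l)} = m_{di} \begin{pmatrix} p_{di1(l)} \\ p_{di2(l)} \end{pmatrix},$$
$$\Sigma_{di(l)} = m_{di} \begin{pmatrix} p_{di1(l)}(1 - p_{di1(l)}) & -p_{di1(l)} p_{di2(l)} \\ -p_{di1(l)} p_{di2(l)} & p_{di2(l)}(1 - p_{di2(l)}) \end{pmatrix}.$$
Calculate
$$r_{(l)} = \varphi_{(l)}^{-1} \sum_{d=1}^{D} v_{d(l)}^{-1}, \quad \text{for } v_{d(l)} = \sum_{i=1}^{6} m_{di}\, p_{di3(l)}(1 - p_{di3(l)}) + \varphi_{(l)}^{-1}.$$
Finally, update the estimates of $\varphi$ by the equation
$$\varphi_{(l+1)} = \sum_{d=1}^{D} u_{d(l)}^2 / (D - r_{(l)}).$$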

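(d) If the condition below holds, stop. Otherwise increase $l$ by 1 unit and return to step (b).
$$\max\Big\{ \Big|\frac{\beta_{j(l+1)} - \beta_{j(l)}}{\beta_{j(l)}}\Big|, \; j = 1, \ldots, p, \quad \Big|\frac{u_{d(l+1)} - u_{d(l)}}{u_{d(l)}}\Big|, \; d = 1, \ldots, D, \quad \Big|\frac{\varphi_{(l+1)} - \varphi_{(l)}}{\varphi_{(l)}}\Big| \Big\} < \varepsilon.$$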

Appendix B: Analytic approximation of the mean-squared error


As mentioned in Section 2.5, the mean-squared error of $\widehat{ur}_d$ can be obtained from the MSEs and the
MCPE for $\hat{\delta}_{dj}$, $j = 1, 2$, through formula (9). Under generalized linear mixed models, these quantities can
be approximated by using linear approximations of the model and the estimators $\hat{\delta}_{dj}$ (see Saei and Chambers
(2003)). In this section we describe this procedure adapted to the multinomial model that is treated
here.
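
Once the MSEs and the MCPE of the totals are available, formula (9) is a one-line calculation, sketched below in Python. The function name, the `scale` argument and the input values are assumptions made for this illustration. With `scale=1.0` the function reproduces formula (9) as printed; since the rate in equation (7) carries a factor of 100, a Taylor expansion of the percentage rate would carry an additional factor of $100^2$, which `scale=100.0` makes explicit.

```python
def rate_mse(d1, d2, mse1, mse2, mcpe, scale=1.0):
    """Linearized MSE of the rate delta_1 / (delta_1 + delta_2), formula (9).

    scale=1.0 gives the formula as printed; scale=100.0 is an assumed
    adjustment for a rate expressed in percent (it multiplies by 100**2).
    """
    bracket = d2**2 * mse1 + d1**2 * mse2 - 2.0 * d1 * d2 * mcpe
    return scale**2 * bracket / (d1 + d2) ** 4

# Made-up totals and error measures for one area.
value = rate_mse(d1=2000.0, d2=38000.0, mse1=40000.0, mse2=250000.0, mcpe=-20000.0)
value_pct = rate_mse(2000.0, 38000.0, 40000.0, 250000.0, -20000.0, scale=100.0)
```

A negative MCPE, as in this toy call, increases the MSE of the rate because the two totals enter the rate with derivatives of opposite sign.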
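Let $\delta_d = (\delta_{d1}, \delta_{d2})^T$ be the target parameter of the small area $d$. This parameter can be written as
$$\delta_d = \sum_{i=1}^{6} y_{di} + \sum_{i=1}^{6} y^r_{di} = \sum_{i=1}^{6} y_{di} + \sum_{i=1}^{6} \mu^r_{di} + \sum_{i=1}^{6} (y^r_{di} - \mu^r_{di}), \tag{14}$$
where $\mu^r_{di} = (\mu^r_{di1}, \mu^r_{di2})^T$, and
$$\mu^r_{dij} = m^r_{di} \frac{\exp(\theta_{dij})}{1 + \sum_{k=1}^{2} \exp(\theta_{dik})} = \mu^r_{dij}(\theta_{di}), \quad j = 1, 2.$$
The estimator of $\delta_d$ is
$$\hat{\delta}_d = \sum_{i=1}^{6} y_{di} + \sum_{i=1}^{6} \hat{\mu}^r_{di},$$
where $\hat{\mu}^r_{dij} = \mu^r_{dij}(\hat{\theta}_{di})$, $j = 1, 2$, and $\hat{\theta}_{di} = X_{di}\hat{\beta} + Z_{di}\hat{u}$. Let us consider the working parameter $\tau_d = \sum_{i=1}^{6} \mu^r_{di}$
and its estimator $\hat{\tau}_d = \sum_{i=1}^{6} \hat{\mu}^r_{di}$, and let us denote the unpredictable part of equation (14) by $\varepsilon^r_d = \sum_{i=1}^{6} (y^r_{di} - \mu^r_{di})$.
Then, the mean-squared error of $\hat{\delta}_d$ can be written in terms of the mean-squared error of $\hat{\tau}_d$ plus
additional terms as
$$\mathrm{MSE}(\hat{\delta}_d) = \mathrm{MSE}(\hat{\tau}_d) + E\{\varepsilon^r_d (\varepsilon^r_d)^T\} - E\{(\hat{\tau}_d - \tau_d)(\varepsilon^r_d)^T\} - E\{\varepsilon^r_d (\hat{\tau}_d - \tau_d)^T\}. \tag{15}$$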

The second term on the right-hand side of equation (15) can be approximated by the conditional expectation
$$E\{\varepsilon^r_d (\varepsilon^r_d)^T \mid u_d\} = \sum_{i=1}^{6} \sum_{k=1}^{6} E\{(y^r_{di} - \mu^r_{di})(y^r_{dk} - \mu^r_{dk})^T \mid u_d\} = \sum_{i=1}^{6} \Sigma^r_{di},$$
where $\Sigma^r_{di} = m^r_{di}(P_{di} - p_{di} p_{di}^T)$, since for $k \neq i$ the expectation inside the sum is 0 (the observations are
conditionally independent).
Concerning the first term in equation (15), observe that $\hat{\tau}_d = \sum_{i=1}^{6} \hat{\mu}^r_{di}$ is not linear in $\hat{\beta}$ and $\hat{u}$. However,
a Taylor series expansion of $\mu^r_{dij}(\hat{\theta}_{di})$ around $\theta_{di}$ yields
$$\hat{\mu}^r_{dij} \cong \mu^r_{dij} + \sum_{k=1}^{2} \frac{\partial \mu^r_{dij}}{\partial \theta_{dik}} (\hat{\theta}_{dik} - \theta_{dik}), \quad j = 1, 2.$$
Calculating the expressions of the derivatives, the Taylor series expansion written in matrix notation is
$$\hat{\mu}^r_{di} - \mu^r_{di} \cong \Sigma^r_{di} (\hat{\theta}_{di} - \theta_{di}).$$
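
The claim that the matrix of derivatives of $\mu^r_{di}$ with respect to $\theta_{di}$ equals $\Sigma^r_{di} = m^r_{di}(P_{di} - p_{di} p_{di}^T)$ can be checked numerically. The Python sketch below compares the analytic matrix with central finite differences at an arbitrary point; the function names and the chosen values of $\theta_{di}$ and $m^r_{di}$ are assumptions made for this illustration.

```python
import math

def mu_r(theta, m_r):
    """Expected non-sampled counts (mu_1, mu_2) given theta = (theta_1, theta_2)."""
    e1, e2 = math.exp(theta[0]), math.exp(theta[1])
    denom = 1.0 + e1 + e2
    return [m_r * e1 / denom, m_r * e2 / denom]

def sigma_r(theta, m_r):
    """m_r * (P - p p^T), the matrix that appears in the linearization."""
    p = [v / m_r for v in mu_r(theta, m_r)]
    return [[m_r * (p[j] * (j == k) - p[j] * p[k]) for k in range(2)]
            for j in range(2)]

def numerical_jacobian(theta, m_r, h=1e-6):
    """Central finite differences of mu_r with respect to theta."""
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        f_up, f_down = mu_r(up, m_r), mu_r(down, m_r)
        for j in range(2):
            jac[j][k] = (f_up[j] - f_down[j]) / (2 * h)
    return jac

theta, m_r = [-2.0, 0.5], 1000
analytic = sigma_r(theta, m_r)
numeric = numerical_jacobian(theta, m_r)
max_gap = max(abs(analytic[j][k] - numeric[j][k])
              for j in range(2) for k in range(2))
```

The value of `max_gap` should be negligible relative to the entries of `analytic`, which supports replacing the derivative matrix by $\Sigma^r_{di}$ in the linearization.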

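Now consider the following parameter and estimator:
$$\tau'_d = \sum_{i=1}^{6} \Sigma^r_{di}\, \theta_{di} = M_d \beta + K_d u,$$
$$\hat{\tau}'_d = \sum_{i=1}^{6} \Sigma^r_{di}\, \hat{\theta}_{di} = M_d \hat{\beta} + K_d \hat{u},$$
where $M_d = \sum_{i=1}^{6} \Sigma^r_{di} X_{di}$ and $K_d = \sum_{i=1}^{6} \Sigma^r_{di} Z_{di}$. Then it holds that $\mathrm{MSE}(\hat{\tau}_d) = \mathrm{MSE}(\hat{\tau}'_d)$, where now $\hat{\tau}'_d$ is
linear in $\hat{\beta}$ and $\hat{u}$.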
Under linear mixed models, Prasad and Rao (1990) obtained an analytical approximation of the MSE
of an estimator of the type λT β̂ + mT û, where β̂ and û are respectively the best linear unbiased estimator
of β and the best linear unbiased predictor of u. The multinomial mixed model can be approximated by
the linear mixed model (10) for the transformed data vector ξ. Moreover, PQL equations for β and u are
(see Breslow and Clayton (1993))
$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} \xi,$$
$$\hat{u} = \varphi Z^T V^{-1} (\xi - X\hat{\beta}).$$

If V were known, these formulae would be the best linear unbiased estimator of β and the best linear
unbiased predictor of u under the linear model (10). Thus, this fact justifies the use of Prasad and Rao’s
formula for approximating MSE.τ̂ ′d /. This formula was adapted to a multivariate mixed linear model and
a multidimensional parameter by Baíllo and Molina (2005). Let us denote

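$$\Lambda_d = K_d - M_d T Z^T \Sigma X,$$
$$\Gamma_d = \varphi V^{-1} Z M_d^T.$$
The approximation of $\mathrm{MSE}(\hat{\tau}_d) = \mathrm{MSE}(\hat{\tau}'_d)$ is
$$\mathrm{MSE}^{AP}(\hat{\tau}_d) = G_1(\varphi) + G_2(\varphi) + G_3(\varphi),$$
where
$$G_1(\varphi) = M_d T M_d^T,$$
$$G_2(\varphi) = \Lambda_d P \Lambda_d^T,$$
$$G_3(\varphi) = (\partial \Gamma_d / \partial \varphi)^T V (\partial \Gamma_d / \partial \varphi)\, I^{-1}.$$
Here, $I$ denotes the Fisher information of the parameter $\varphi$ obtained from the likelihood of $\xi$. If the ML
method is used for estimating $\varphi$, then the Fisher information is obtained from the (normal) likelihood of
$\xi$ and is equal to
$$I_1 = \frac{1}{2} \sum_{d=1}^{D} \frac{\omega_d^2}{(1 + \varphi \omega_d)^2}, \qquad \omega_d = \sum_{i=1}^{6} m_{di}\, p_{di3}(1 - p_{di3}), \quad d = 1, \ldots, D.$$
If the method that is used for estimating $\varphi$ is the RML, then the Fisher information that is obtained from
the restricted likelihood (12) becomes
$$I_2 = \frac{1}{2\varphi^2} \Big\{ n - \frac{2}{\varphi} \mathrm{tr}(R) + \frac{1}{\varphi^2} \mathrm{tr}(R^2) \Big\}.$$
Let us denote $G_4(\varphi) = \sum_{i=1}^{6} \Sigma^r_{di}$. Then, an approximation to the mean-squared error of the original target
parameter $\hat{\delta}_d$ is
$$\mathrm{MSE}^A(\hat{\delta}_d) = \sum_{k=1}^{4} G_k(\varphi).$$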

An estimator of $\mathrm{MSE}^A(\hat{\delta}_d)$ could be obtained by replacing $\varphi$ in each $G_k(\varphi)$ by its estimator, either the
ML or the RML estimator. However, it is known (see for example Prasad and Rao (1990)) that $G_1(\hat{\varphi})$
is asymptotically biased for $G_1(\varphi)$, with negative bias equal to $G_3(\varphi)$. Thus, an asymptotically unbiased
estimator of $G_1(\varphi)$ is $G_1(\hat{\varphi}) + G_3(\hat{\varphi})$. Therefore, we take the following estimator of $\mathrm{MSE}^A(\hat{\delta}_d)$:
$$\mathrm{mse}^A(\hat{\delta}_d) = G_1(\hat{\varphi}) + G_2(\hat{\varphi}) + 2 G_3(\hat{\varphi}) + G_4(\hat{\varphi}).$$

References
Bailey, S., Charlton, J., Dollamore, G. and Fitzpatrick, J. (2000) Families, Groups and Clusters of local and health
authorities: revised for authorities in 1999. Popln Trends, 99, 37–52.
Baíllo, A. and Molina, I. (2005) Mean squared errors of small area estimators under a unit-level multivariate
model. Working Paper 05-40 (07). Universidad Carlos III de Madrid, Madrid.
Breslow, N. E. and Clayton, D. G. (1993) Approximate inference in generalized linear mixed models. J. Am.
Statist. Ass., 88, 9–25.
Butar, F. B. and Lahiri, P. (2003) On measures of uncertainty of empirical Bayes small-area estimators. J. Statist.
Planng Inf., 112, 63–76.
Claeskens, G. (2004) Restricted likelihood ratio lack-of-fit tests using mixed spline models. J. R. Statist. Soc. B,
66, 909–926.
Estevao, V. M. and Särndal, C. E. (1999) The use of auxiliary information in design-based estimation for domains.
Surv. Methodol., 25, 213–221.
Estevao, V. M. and Särndal, C. E. (2005) Borrowing strength is not the best technique within a wide class of
design-consistent domain estimators. J. Off. Statist., 20, 1–25.
EURAREA Consortium (2004) EURAREA Project IST-2000-26290. (Available from
http://www.statistics.gov.uk/eurarea.)
Fay, R. E. and Herriot, R. A. (1979) Estimation of income from small places: an application of James-Stein
procedures to census data. J. Am. Statist. Ass., 74, 269–277.
González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D. and Santamaría, L. (2007) Estimation of
the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computnl
Statist. Data Anal., 51, 2720–2733.
Hall, P. and Maiti, T. (2006) On parametric bootstrap methods for small area prediction. J. R. Statist. Soc. B, 68,
221–238.
Harville, D. A. (1977) Maximum likelihood approaches to variance component estimation and related problems.
J. Am. Statist. Ass., 72, 322–340.
Hastings, D., Maine, N., Brown, G. and Crudas, M. (2003) Development of improved estimation methods for
local area unemployment levels and rates. In Technical Report, Labour Market Trends, pp. 37–43. London:
Office for National Statistics.
Jiang, J. and Lahiri, P. (2006) Mixed model prediction and small area estimation. Test, 15, 1–96.
Jiang, J., Lahiri, P. and Wan, S. (2002) A unified jackknife theory for empirical best prediction with M-estimation.
Ann. Statist., 30, 1782–1810.
Lehtonen, R. and Veijanen, A. (1998) Logistic generalized regression estimators. Surv. Methodol., 24, 51–55.
Office for National Statistics (2004) Labour Force Survey User Guide. London: Office for National Statistics.
(Available from http://www.statistics.gov.uk/downloads/theme-labour/Vol6.pdf.)
Pfeffermann, D. and Tiller, R. (2005) Bootstrap approximation to prediction MSE for state-space models with
estimated parameters. J. Time Ser. Anal., 26, 893–916.
Prasad, N. G. N. and Rao, J. N. K. (1990) The estimation of the mean squared error of small-area estimators.
J. Am. Statist. Ass., 85, 163–171.
Rao, J. N. K. (2003) Small Area Estimation. New York: Wiley.
Saei, A. and Chambers, R. (2003) Small area estimation under linear and generalized linear mixed models with
time and area effects. Working Paper M03/15. Southampton Statistical Sciences Research Institute, University
of Southampton, Southampton.
Schall, R. (1991) Estimation in generalized linear models with random effects. Biometrika, 78, 719–727.
Self, S. G. and Liang, K.-Y. (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio
tests under nonstandard conditions. J. Am. Statist. Ass., 82, 605–610.
