Small-Area Estimation: Theory and Practice: Michael Hidiroglou
Michael Hidiroglou
Statistical Innovation and Research Division, Statistics Canada, 16th Floor Section D, R.H. Coats Building, Tunney's Pasture, Ottawa, Ontario, K1A 0T6, Canada
Section on Survey Research Methods
p(s), and the probability of including the j-th element in the sample is $\pi_j$. The design weight for each selected unit $j \in s$ is defined as $w_j = 1/\pi_j$. Suppose $U_i$ denotes a domain (or subpopulation) of interest. Denote as $s_i = s \cap U_i$ the part of the sample $s$ that falls in domain $U_i$. The realized sample size of $s_i$ is a random variable $n_i$, where $0 \le n_i \le N_i$. Auxiliary data $x$ will either be known at the element level $x_j$ for $j \in s$ or for each small area $i$ as totals $X_i = \sum_{j \in U_i} x_j$ or […]

The problem is to estimate the domain total $Y_i = \sum_{j \in U_i} y_j$ or the domain mean $\bar{Y}_i = Y_i / N_i$, where $N_i$, the number of elements in $U_i$, may or may not be known. We define $y_{ij}$ to be $y_j$ if $j \in U_i$, and 0 otherwise. An indicator variable $a_{ij}$ is similarly defined: it is equal to one if $j \in U_i$ and 0 otherwise. Note that $Y_i$ can be written as $Y_i = \sum_{j \in U} y_{ij} = \sum_{j \in U} y_j a_{ij}$.

Small area estimation is categorized into two types of estimators: direct and indirect estimators. A direct estimator is one that uses values of the variable of interest, y, only from the sample units in the domain of interest. However, a major disadvantage of such estimators is that unacceptably large standard errors may result: this is especially true if the sample size within the domain is small or nil. An indirect estimator uses values of the variable of interest from a domain and/or time period other than the domain and time period of interest. Three types of indirect estimators can be identified. A domain indirect estimator uses values of the variable of interest from another domain but not from another time period. A time indirect estimator uses values of the variable of interest from another time period but not from another domain. An estimator that is both domain and time indirect uses values of the variable of interest from another domain and another time period.

An alternative is to use estimators that borrow strength across small areas, by modeling dependent on independent variables across a number of small areas: they are called indirect estimators. Indirect estimators will be quite good (i.e.: indirectly increase the effective sample size and thus decrease the standard error) if the models obtained across small areas still hold at the small area level. Departures from the model will result in unknown biases. There is a wide variety of indirect estimators available, and a good summary is provided in Rao (2003). We will confine ourselves to just a few of them that include the synthetic estimator, and the more well-known composite estimators.

3.2 Direct Estimation

Let $w_j$ be the design weight associated with $j \in s$. The Horvitz-Thompson estimator is the simplest direct estimator. If the small area total $Y_i$ is to be estimated for small area $U_i$, then the corresponding Horvitz-Thompson estimator is given by $\hat{Y}_{i,HT} = \sum_{j \in s_i} w_j y_j$ provided […]

Auxiliary information can be available either at the population level or at the domain level. If it is available at the population level, then we use the Generalized Regression Estimator (GREG) given by

$\hat{Y}_{i,GR} = X' \tilde{\beta}_{i,GREG} + \left( \hat{Y}_{i,HT} - \hat{X}'_{HT} \tilde{\beta}_{i,GREG} \right)$

where $X' = \sum_{i=1}^{m} \sum_{j \in U_i} x'_j$, $\hat{X}'_{HT} = \sum_{s} x'_k / \pi_k$, and $\tilde{\beta}_{i,GREG}$ is the set of regression coefficients obtained by regressing $y_{ij}$ on $x_j$. That is

$\tilde{\beta}_{i,GREG} = \left( \sum_{s} w_j x_j x'_j / c_j \right)^{-1} \sum_{s} w_j x_j y_{ij} / c_j$,

where $c_j$ is a specified constant ($c_j > 0$).

The straight GREG estimator is not efficient, and it is better to use regression estimators that use auxiliary data available as close as possible to the small areas of interest. One such estimator is the domain-specific GREG that uses auxiliary data at the domain level. It is given by

$\hat{Y}^{*}_{i,GR} = X'_i \hat{\beta}_{i,GREG} + \left( \hat{Y}_{i,HT} - \hat{X}'_{i,HT} \hat{\beta}_{i,GREG} \right)$

where $\hat{\beta}_{i,GREG} = \left( \sum_{s_i} w_j x_j x'_j / c_j \right)^{-1} \sum_{s_i} w_j x_j y_j / c_j$.

An estimator that is approximately p-unbiased as the overall sample size increases but uses y-values outside the domain is the modified direct estimator given by

$\hat{Y}_{i,SR} = X'_i \hat{\beta}_{GREG} + \left( \hat{Y}_{i,HT} - \hat{X}'_{i,HT} \hat{\beta}_{GREG} \right)$

where $\hat{\beta}_{GREG} = \left( \sum_{s} w_j x_j x'_j / c_j \right)^{-1} \sum_{s} w_j x_j y_j / c_j$. This estimator is also referred to in Woodruff (1966), and Battese, Harter, and Fuller (1988) as the "survey regression estimator".
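As a hedged illustration of the direct estimators in this section, the following Python sketch computes the Horvitz-Thompson domain total and the modified direct ("survey regression") estimator. It assumes a single scalar auxiliary variable and $c_j = 1$, so the matrix inverse reduces to a division. The `Unit` record, the function names, and the toy data are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    domain: str  # small area label i
    y: float     # variable of interest y_j
    x: float     # scalar auxiliary value x_j (assumption: one auxiliary variable)
    pi: float    # inclusion probability pi_j, so the design weight is 1/pi_j


def ht_total(sample, domain, value=lambda u: u.y):
    """Horvitz-Thompson total over s_i: sum of w_j * value_j with w_j = 1/pi_j."""
    return sum(value(u) / u.pi for u in sample if u.domain == domain)


def beta_greg(sample):
    """Scalar form of (sum_s w x x')^-1 sum_s w x y, fitted on the whole sample s (c_j = 1)."""
    num = sum(u.x * u.y / u.pi for u in sample)
    den = sum(u.x * u.x / u.pi for u in sample)
    return num / den


def survey_regression_total(sample, domain, X_i):
    """Modified direct estimator: X_i * beta + (Y_HT_i - X_HT_i * beta)."""
    beta = beta_greg(sample)
    y_ht = ht_total(sample, domain)
    x_ht = ht_total(sample, domain, value=lambda u: u.x)
    return X_i * beta + (y_ht - x_ht * beta)


# Toy sample with two domains (made-up numbers).
sample = [
    Unit("A", 10.0, 5.0, 0.5), Unit("A", 14.0, 7.0, 0.5),
    Unit("B", 9.0, 3.0, 0.25), Unit("B", 12.0, 4.0, 0.25),
]
print(ht_total(sample, "A"))
print(survey_regression_total(sample, "A", X_i=26.0))
```

When the known domain total `X_i` equals the Horvitz-Thompson estimate of the auxiliary total, the regression adjustment vanishes and the sketch returns the Horvitz-Thompson total itself.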
Hidiroglou and Patak (2004) compared a number of the direct estimators. One of their conclusions was that the direct estimators would be best if the domains of interest coincided as closely as possible with the design strata.

3.2 Indirect Estimation

Some of the most widely used indirect estimators have been the synthetic estimator, the regression-adjusted synthetic, the composite estimator, and the sample-dependent estimator.

The synthetic estimator uses reliable information of a direct estimator for a large area that spans several small areas, and this information is used to obtain an indirect estimator for a small area. It is assumed that the small areas have the same characteristics as the large area: Gonzalez (1978) provides a good account of how these estimators were obtained, and used to obtain unemployment statistics at levels lower than those planned in the survey design. The National Center for Health Statistics (1968) in the United States pioneered the use of synthetic estimation for developing state estimates of disability and other health characteristics from the National Health Interview Survey (NHIS). Sample sizes in most states were too small to provide […]

[…] resulting combined estimator is given by

$\hat{Y}_{i,COMB} = \phi_i \hat{Y}_{i,DIR} + (1 - \phi_i) \hat{Y}_{i,INDIR}$

where $0 \le \phi_i \le 1$. The optimal $\phi_i^{*}$ is determined by minimizing the MSE of $\hat{Y}_{i,COMB}$. The resulting composite estimator has a mean square error which is smaller than that of either component estimator. […] composite estimator is most insensitive when the mean square errors of the two component estimators do not differ greatly. Simple weighting factors for the composite estimators that depend on the realized domain size were given by Drew, Singh and Choudhry (1982), and by Hidiroglou and Särndal (1985).

Small area estimators are split into two main types, depending on how models are applied to the data within the small areas: these two types are known as area level and unit level. Small area estimators are based on area level computations if models link small area means of interest (y) to area-specific auxiliary variables (such as x sample means). They are based on unit level computations if the models link unit values of interest to unit-specific auxiliary variables. Area based small area estimators are computed if the unit level area data are not available. They can also be computed if the unit level data are available by summarizing them at the appropriate area level.

3.2.1 Area Model

One of the most widely used area level small area estimators was given by Fay and Herriot (1979) […] small. Population totals ($Y_i = \sum_{j \in U_i} y_j$) or means […] weight $\tilde{w}_j$ associated with the j-th unit can be the design weight $w_j$ (i.e. $\tilde{w}_j = w_j$) or a final weight that reflects any adjustment (i.e.: non-response, calibration, or a product thereof) made to the design weight.

The synthetic portion is estimated as the product of a given auxiliary population mean row-vector (say $\bar{Z}'_i = \sum_{j \in U_i} z'_j / N_i$) for the i-th small area of […] small area $U_i$. The regression vector $\hat{\beta}_{FH}$ is computed across a number of small areas in such a way that the model linking the variable of interest (the mean $\hat{\bar{Y}}_{i,DIR}$) […] auxiliary data also holds at the small area level. The Fay-Herriot estimator of a given population mean $\bar{Y}_i$ is estimated as:

$\tilde{\bar{Y}}_{i,FH} = \gamma_i \hat{\bar{Y}}_{i,DIR} + (1 - \gamma_i) \bar{Z}'_i \tilde{\beta}_{FH}$ (3.1)

The two components (direct estimator and synthetic estimator) of (3.1) are weighted $\gamma_i$ and $(1 - \gamma_i)$ where $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$. The regression vector $\tilde{\beta}_{FH}$ and $\gamma_i$ depend on the population variance $\psi_{i,DIR}$ of the direct estimator $\hat{\bar{Y}}_{i,DIR}$ and the model variance $\sigma_v^2$. Although […]

$\hat{\bar{Y}}_{i,REG} = \bar{Z}'_i \hat{\beta}_{REG} + \left( \hat{\bar{Y}}_{i,EXP} - \hat{\bar{Z}}'_i \hat{\beta}_{REG} \right)$ (3.5)

given in Cochran (1977), where the estimated regression vector is given by

$\hat{\beta}_{REG} = \left( \sum_{i=1}^{D} Z'_i Z_i / \hat{\psi}_{i,EXP} \right)^{-1} \left( \sum_{i=1}^{D} Z'_i \hat{\psi}_{iEXP} / \hat{\psi}_{i,EXP} \right)$

and $\hat{\psi}_{iEXP} = \sum_{j \in U_i} w_j y_j / \sum_{j \in U_i} w_j$ is the simple estimator of the mean involving the design weights $w_j$. The computations required to obtain the normal regression estimator do not involve estimating any variance components.

3.2.2 Unit Model

The unit model originates with Battese, Harter and Fuller (1988). They used the nested error regression model to estimate county crop areas using sample survey data in conjunction with satellite information. Their model is given by

$y_{ij} = x'_{ij} \beta + v_i + e_{ij}$ (3.5)

[…] where $\tilde{v}_{i.} = \gamma_i \, n_i^{-1} \sum_{j=1}^{n_i} \left( y_{ij} - x'_{ij} \tilde{\beta} \right)$ with

$\tilde{\beta}_{BHF} = \left( \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( x_{ij} x'_{ij} - \gamma_i \bar{x}_{i.} \bar{x}'_{i.} \right) \right)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( x_{ij} y_{ij} - \gamma_i \bar{x}_{i.} \bar{y}_{i.} \right)$ (3.8)

and $\gamma_i = \sigma_v^2 \left( \sigma_v^2 + n_i^{-1} \sigma_e^2 \right)^{-1}$.
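The combined estimator and the Fay-Herriot estimator (3.1) share the same composite form, which the following Python sketch tries to make concrete. All function names and numbers are illustrative. `optimal_phi` rests on an assumption the text does not state, namely that the covariance between the direct and indirect components is negligible, in which case minimizing the MSE of the composite gives the ratio shown.

```python
def composite(direct, indirect, phi):
    """phi * direct + (1 - phi) * indirect, with 0 <= phi <= 1."""
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    return phi * direct + (1.0 - phi) * indirect


def optimal_phi(mse_direct, mse_indirect):
    """MSE-minimizing weight on the direct part.

    ASSUMPTION: the covariance between the two components is ignored.
    """
    return mse_indirect / (mse_direct + mse_indirect)


def fh_gamma(sigma_v2, psi_i):
    """Fay-Herriot weight gamma_i = sigma_v^2 / (sigma_v^2 + psi_i)."""
    return sigma_v2 / (sigma_v2 + psi_i)


def fay_herriot_mean(direct_mean, synthetic_mean, sigma_v2, psi_i):
    """Equation (3.1): gamma_i * direct + (1 - gamma_i) * synthetic."""
    return composite(direct_mean, synthetic_mean, fh_gamma(sigma_v2, psi_i))


# A noisier direct estimate (larger psi_i) is pulled harder towards the synthetic part.
print(fay_herriot_mean(12.0, 10.0, sigma_v2=4.0, psi_i=4.0))
print(fay_herriot_mean(12.0, 10.0, sigma_v2=4.0, psi_i=12.0))
```

In this simplified setting, treating $\psi_i$ as the MSE of the direct part and $\sigma_v^2$ as the MSE of the synthetic part makes `optimal_phi` coincide with $\gamma_i$, which is one way to read the Fay-Herriot weights.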
[…] well-known method of fitting-of-constants to estimate them. The resulting estimator of the i-th area sample mean is known as the EBLUP estimator, because the variance components were estimated.

Prasad and Rao (1990) derived an approximation to $o(m^{-1})$ for the model based mean squared error of the Battese-Harter-Fuller estimator, and also obtained its estimator to $o(m^{-1})$ as well. Prasad and Rao (1999) were the first to include the survey weights in the unit level model: they labelled their estimator as a pseudo-EBLUP estimator of the small area mean $\bar{Y}_i$. The Prasad-Rao estimator of $\bar{Y}_i$ is given by

$\tilde{\bar{Y}}_{i,PR} = \bar{X}'_i \tilde{\beta}_{PR} + \gamma_{iw} \left( \bar{y}_{iw} - \bar{x}'_{iw} \tilde{\beta}_{PR} \right)$ (3.9)

where $\gamma_{iw} = \sigma_v^2 / \left( \sigma_v^2 + \sigma_e^2 \sum_{j \in s_i} \tilde{w}_{ij}^2 \right)$ with $\bar{y}_{iw} = \sum_{j \in s_i} \tilde{w}_{ij} y_j$; $\tilde{w}_{ij} = w^{*}_{ij} / \sum_{j \in s_i} w^{*}_{ij}$ and $w^{*}_{ij}$ are calibrated weights, and $\tilde{\beta}_{PR}$ is given by

$\tilde{\beta}_{PR} = \left( \sum_{i=1}^{m} \gamma_{iw} \bar{x}_{iw} \bar{x}'_{iw} \right)^{-1} \sum_{i=1}^{m} \gamma_{iw} \bar{x}_{iw} \bar{y}_{iw}$ (3.10)

Prasad and Rao (1999) also provided model based expressions for the MSE of their estimator when it included the estimated variance components $\sigma_v^2$ and $\sigma_e^2$.

The small area estimates do not necessarily add up to the corresponding direct estimator. You and Rao (2002) proposed an estimator of $\beta$ that ensures self-benchmarking of the small area estimates to the corresponding direct estimator. Their estimator is given by

$\hat{\bar{Y}}_{i,YR} = \bar{X}'_i \hat{\beta}_{YR} + \gamma_{iw} \left( \bar{y}_{iw} - \bar{x}'_{iw} \hat{\beta}_{YR} \right)$ (3.10)

where […]

[…] benchmarking property means that the sum of the estimated small area totals is equal to the direct estimator of the overall total Y. That is,

$\sum_{i=1}^{m} N_i \hat{\bar{Y}}_{i,PR} = \hat{Y}_w + \left( X - \hat{X}_w \right)' \hat{\beta}_w$

where $\hat{Y}_w = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \tilde{w}_{ij} y_{ij}$ and $\hat{X}_w$ is similarly defined.

4. Applications

4.1 Canadian Community Health Survey: Area model

The Canadian Community Health Survey (CCHS) is a cross-sectional health survey carried out by Statistics Canada since 2001. The survey operates on a two-year collection cycle. The first year of the survey cycle "x.1" is a large sample (130,000 persons), general population health survey, designed to provide reliable estimates at the health region (sub-provincial areas defined in terms of Census results), provincial and national levels. This portion of the survey collects information related to health status, health care utilization and health determinants for the Canadian population. The second year of the survey cycle "x.2" has a smaller sample (30,000 persons) and is designed to provide provincial and national level results on specific focused health topics.

The CCHS is based on a multiple frame (two frames) sampling design. The first one, used as the primary frame, is the area frame designed for the Canadian Labour Force Survey. This survey is basically a two-stage stratified design that uses probability proportional to size without replacement at each stage. Face to face interviews take place with individuals selected from that frame. The second frame uses a list frame of telephone numbers in some of the Health Regions for cost reasons. Individuals selected in that frame are interviewed by telephone.

The area frame uses the Labour Force Frame. This resulting sample is a two-stage stratified cluster.
[…] ($a = 1, \ldots, 10$). The direct estimator of the proportion of alcohol abuse is given by $\hat{p}^{DIR}_{r,a} = \hat{Y}^{DIR}_{r,a} / \hat{N}^{DIR}_{r,a}$ where $\hat{N}^{DIR}_{r,a} = \sum_{j \in s_{r,a}} \tilde{w}^{*}_j$. Given that, for domain $ra$, […]

[…] $\tilde{\psi}^{DIR}_{r,a}$ term given by $\tilde{\psi}^{DIR}_{r,a} = \mathit{deff}^{DIR} \, \hat{p}^{DIR}_{r,a} \left( 1 - \hat{p}^{DIR}_{r,a} \right) / n_{r,a}$ is obtained using the smoothed design effect $\mathit{deff}^{DIR} = \sum_{i} \mathit{deff}_i^{DIR} / I$ over the I = 200 domains. The $\hat{\sigma}_v^2$ term is obtained from the Fay-Herriot methodology: computational details for estimating $\hat{\sigma}_v^2$ can be found in Rao (2003, p. 118). The […]
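The smoothed sampling variance used for the CCHS proportions can be sketched in a few lines of Python. This is a minimal illustration of the two formulas as they survive in the text (the average design effect and the binomial-type variance scaled by it); the function names and the three design effects below are made up for the example.

```python
def smoothed_deff(deffs):
    """Average the domain-level design effects: sum_i deff_i / I."""
    return sum(deffs) / len(deffs)


def smoothed_sampling_variance(p_hat, n, deff):
    """psi = deff * p_hat * (1 - p_hat) / n for one domain."""
    if n <= 0:
        raise ValueError("domain sample size must be positive")
    return deff * p_hat * (1.0 - p_hat) / n


# Hypothetical design effects for three domains (the paper averages over I = 200).
deff_bar = smoothed_deff([1.2, 1.8, 1.5])
print(deff_bar)
print(smoothed_sampling_variance(p_hat=0.2, n=100, deff=deff_bar))
```

Using one pooled design effect rather than each domain's own estimate trades some domain-specific detail for stability, which seems to be the motivation for the smoothing.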
[…] primary domains of interest, as well as incorporate improvements on the use of the administrative data. The resulting sample, estimated to be between 11,00 to 20,00 establishments (depending on budget constraints), will be allocated to the newly defined strata, defined as cross-classifications of geography (provinces) and industry (NAICS3), so that the resulting GREG estimates for AWE satisfy coefficients of variation. The design strata are also referred to as model groups since the GREG estimators are computed at these levels as well. Estimates below this level can be obtained using domain estimation. As the sample associated will be relatively small (or non-existent), the reliability associated with the GREG […]

[…] $\frac{1}{I} \sum_{i=1}^{I} \left( \frac{1}{R \, Y_i} \sum_{r=1}^{R} \left( y^{(r)}_{i,EST} - Y_i \right) \right)^{2}$.

Estimators considered in the Rubin-Bleuer et al. (2007) simulation included the GREG, the Prasad-Rao (1999) pseudo-EBLUP unit level, and the You-Rao (2002) pseudo-EBLUP area level SAE estimators given in Section 3.0. The GREG estimator is given by

$y_{i,GREG} = \sum_{U_i} \tilde{E}_{ij} x'_{ij} \hat{\beta} + \sum_{s_i} w_{ij} \tilde{E}_{ij} \left( y_{ij} - x'_{ij} \hat{\beta} \right)$ (4.1)

with $x'_{ij} = (1, x_{ij})$. Here $x_{ij}$ is the average monthly earnings associated with the j-th sampled establishment within domain $U_i$, and $\hat{\beta}$ is the regression estimator resulting from the model $y_{ij} = x'_{ij} \beta + e_{ij}$, with $e_{ij} \overset{iid}{\sim} (0, \sigma_e^2 / E_{ij})$.

Figures 3 and 4 provide the ARB and ARMSE respectively for construction domains in Canada for 2005. The GREG estimator has the smallest ARB amongst the three estimators. The Prasad-Rao (1999) estimator is the best in terms of ARMSE. This is reasonable given that the You-Rao (2002) estimator loses efficiency on account of its benchmarking property.

The Census of Canada is conducted every five years. One objective is to provide the Population Estimates Program with accurate baseline counts of the number of persons by age and sex for specified geographic areas. However, not all persons are correctly enumerated. Two errors that occur are undercoverage (exclusion of eligible persons) and overcoverage (erroneous inclusion of persons). This undercoverage varies between 2 and 3%.

A special survey, known as the Reverse Record Check (RRC), with a sample size of 60,000 persons, estimates the net number of persons missed by the Census. This net number combines two types of coverage errors: the gross number of persons missed by the Census (Undercount) and the gross number of persons erroneously included in the final Census count (Overcount). The sample size of the RRC is designed to produce reliable direct estimates for the provinces (including the two Territories), and eight age-sex groups, with age categories less than 19, 20 to 29, 30 to 44, and 45 and over at the national level. The cross tabulation of these two marginal tabulations results in m = 96 (12 × 8) cells. These cells are considered as small areas because they have too few observations to sustain reliable direct estimates. The objective is to use small area techniques to improve the reliability of the cell estimates. Dick (1995) applied the Fay-Herriot methodology for this purpose.

[…] variables, and $v_i \sim (0, \sigma_v^2)$.

The resulting Fay-Herriot estimator is given as

$\hat{\theta}_{i,FH} = z'_i \hat{\beta}_{FH} + \hat{\gamma}_i \left( y_i - z'_i \hat{\beta}_{FH} \right)$

where $\hat{\gamma}_i = \hat{\sigma}_v^2 / (\hat{\sigma}_v^2 + \psi_i)$.

The sampling variances are not known, but can be estimated from $\hat{\psi}_i = v(y_i)$ given the sampling plan for the RRC. As these variances are for domains, they will tend to be variable. Dick (1995) smoothed them by using $\log \left( v(\hat{M}_i) \right) = \alpha + \beta \log(C_i) + \eta_i$ where it is assumed that $\eta_i \overset{iid}{\sim} N(0, \zeta^2)$.
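The variance smoothing step, a regression of log variance on log $C_i$, can be sketched with ordinary least squares in plain Python. This is only an illustration: the surviving text does not define $C_i$, so it is treated here as a positive size measure, and the fitted values are back-transformed naively with `exp`, ignoring any bias adjustment for $\zeta^2$. The function names and data are hypothetical.

```python
import math


def fit_loglog(variances, sizes):
    """OLS fit of log(v_i) = alpha + beta * log(C_i); returns (alpha, beta)."""
    lx = [math.log(c) for c in sizes]
    ly = [math.log(v) for v in variances]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def smoothed_variance(alpha, beta, size):
    """Naive back-transform of the fitted line (ASSUMPTION: no retransformation bias correction)."""
    return math.exp(alpha + beta * math.log(size))


# Made-up data lying exactly on v = 2 * C^1.5, so the fit should recover the line.
sizes = [10.0, 100.0, 1000.0]
variances = [2.0 * c ** 1.5 for c in sizes]
alpha, beta = fit_loglog(variances, sizes)
print(alpha, beta)
print(smoothed_variance(alpha, beta, 100.0))
```

The smoothed values would then stand in for the unstable direct variance estimates $\hat{\psi}_i$ when computing $\hat{\gamma}_i$.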
[…] rotation results in a significant level of overlap for the sampled households. This is reflected in the linking model given by $\theta_{it} = x'_{it} \beta + v_i + u_{it}$ where the error structure of the $u_{it}$'s is assumed to follow an AR(1) process, represented as $u_{it} = u_{i,t-1} + \varepsilon_{it}$; $\varepsilon_{it} \overset{iid}{\sim} (0, \sigma^2)$.

The error structure of the $e_{it}$'s is assumed known, and as this is not the case, the sample based estimates need to be smoothed. You, Rao and Gambino (2003) used the Hierarchical Bayes (HB) procedure to estimate the required parameters in the error and linking equations. They compared numerically three estimators of the unemployment rates in June 1999. These estimators were the direct estimator (Direct Est), a small area estimator based only on the current cross-sectional data (the Fay-Herriot), and one using both the cross-sectional and longitudinal data (Space-time).

Figure 4.7 displays these LFS estimates for the June 1999 unemployment rates for the 62 CAs across Canada. The 62 CAs appear in the order of population size with the smallest CA (Dawson Creek, BC, population is 10,107) on the left and the largest CA (Toronto, Ont., population is 3,746,123) on the right. The Fay-Herriot model tends to shrink the estimates towards the average of the unemployment rates. The space-time model leads to moderate smoothing of the direct LFS estimates. For the CAs with large population sizes and therefore large sample sizes, the direct estimates and the HB estimates are very close to each other; for smaller CAs, the direct and HB estimates differ substantially for some regions.

Figure 4.7: Comparison of unemployment rates using Direct, Fay-Herriot, and space-time for June 1999 (vertical axis: unemployment rate in %; series: Direct Est, Fay-Herriot, Space-Time)

Figure 4.8: Comparison of coefficients of variation of unemployment rates using Direct, Fay-Herriot, and space-time estimates for June 1999 (vertical axis: Est. cv)

Acknowledgements: The author would like to acknowledge Jon Rao, Peter Dick and Susana Rubin-Bleuer.

References

Australian Bureau of Statistics (2006). A Guide to Small Area Estimation - Version 1.1. Internal ABS document.
Battese, G.E., Harter, R.M., Fuller, W.A. (1988). An Error-Components Model for Prediction of Crop Areas Using Survey and Satellite Data, Journal of the American Statistical Association, 83, 28-36.
Brackstone, G. J. (1987). Small area data: policy issues and technical challenges. In R. Platek, J. N. K. Rao, C. E. Sarndal, and M. P. Singh, eds., Small Area Statistics, pp. 3-20. John Wiley & Sons, New York.
Béland, Y. (2002). Canadian Community Health Survey: Methodological overview. Health Reports, Statistics Canada, Catalogue no. 82-003-XPE (0030182-003-XIE.pdf), Vol. 13, No. 3, ISSN 0840-6529.
Dick, P. (1995). Modelling Net Undercoverage in the 1991 Canadian Census, Survey Methodology, 21, 45-54.
Drew, D., Singh, M.P., and Choudhry, G.H. (1982). Evaluation of Small Area Estimation Techniques for the Canadian Labour Force Survey, Survey Methodology, 8, 17-47.
Fay, R.E. and Herriot, R.A. (1979). Estimation of Income for Small Places: An Application of James-Stein Procedures to Census Data. Journal of the American Statistical Association, 74, 269-277.
Fuller, W.A. (1999). Environmental Surveys Over Time, Journal of Agricultural, Biological and Environmental Statistics, 4, 331-345.
Gambino, J.G., Singh, M.P., Dufour, J., Kennedy, B. and Lindeyer, J. (1998). Methodology of the Canadian Labour Force Survey, Statistics Canada, Catalogue No. 71-526.
Gonzalez, M.E., and Hoza, C. (1978). Small-Area Estimation with Application to Unemployment and Housing Estimates, Journal of the American Statistical Association, 73, 7-15.
Hidiroglou, M.A., Singh, A., and Hamel, M. (2007). Some thoughts on small area estimation for the Canadian Community Health Survey (CCHS). Internal Statistics Canada document.
Hidiroglou, M.A. and Särndal, C.E. (1985). Small Domain Estimation: A Conditional Analysis, Proceedings of the Social Statistics Section, American Statistical Association, 147-158.
Hidiroglou, M.A. and Patak, Z. (2004). Domain estimation using linear regression. Survey Methodology, 30, 67-78.
Levy, P.S. (1971). The Use of Mortality Data in Evaluating Synthetic Estimates, Proceedings of the Social Statistics Section, American Statistical Association, pp. 328-331.
Prasad, N.G.N., and Rao, J.N.K. (1990). The Estimation of the Mean Squared Error of Small-Area Estimators, Journal of the American Statistical Association, 85, 163-171.
Prasad, N.G.N. and Rao, J.N.K. (1999). On robust small area estimation using a simple random effects model. Survey Methodology, 25, 67-72.
Rao, J.N.K. (2003). Small Area Estimation. New York: Wiley.
Rao, J.N.K. and Choudhry, H. (1995). Small Area Estimation: Overview and Empirical study. Business Survey Methods, Edited by Cox, Binder, Chinnappa, Christianson, Colledge, Kott, Chapter 27.
Rubin-Bleuer, S., Godbout, S. and Morin, Y. (2007). Evaluation of small domain estimators for the Canadian Survey of Employment, Payrolls and Hours. Paper presented at the third International Conference of Establishment Surveys July 2007 Statistical of Society Meetings.
Schaible, W.A. (1978). Choosing Weights for Composite Estimators for Small Area Statistics, Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 741-746.
Singh, A.C. and Verret, F. (2006). Mixed Linear Nonlinear Aggregate level Models for Small Area Estimation for Binary count data from Surveys. Proceedings of the Statistics Canada Symposium.
Singh, A.C. (2006). Some problems and proposed solutions in developing a small area estimation product for clients. ASA Proc. Surv. Res. Meth. Sec.
Singh, M.P., Gambino, J., Mantel, H.J. (1994). Issues and Strategies for Small Area Data, Survey Methodology, 20, 3-22.
Woodruff, R.S. (1966). Use of a Regression Technique to Produce Area Breakdowns of the Monthly National Estimates of Retail Trade, Journal of the American Statistical Association, 61, 496-504.
You, Y., and Rao, J.N.K. (2002). A Pseudo-Empirical Best Linear Unbiased Prediction Approach to Small Area Estimation Using Survey Weights, Canadian Journal of Statistics, 30, 431-439.
You, Y., Rao, J.N.K. and Dick, J.P. (2002). Benchmarking hierarchical Bayes small area estimators with application in census undercoverage estimation. Proceedings of the Survey Methods Section 2002, Statistical Society of Canada, 81-86.
You, Y., Rao, J.N.K., and Gambino, J.G. (2003). Model-based unemployment rate estimation for the Canadian Labour Force Survey: A hierarchical Bayes approach, Survey Methodology, 29, 25-32.
You, Y. and Dick, P. (2004). Hierarchical Bayes Small Area Inference to the 2001 Census Undercoverage Estimation. Proceedings of the ASA Section on Government Statistics, 1836-1840.
Random
effects model
Solve $h(\sigma_v^2) = \sum_{i=1}^{m} \left( \hat{\theta}_i - z'_i \tilde{\beta}(\sigma_v^2) \right)^2 / (\psi_i + \sigma_v^2) = m - p$ for $\sigma_v^2$ via the iteration

$\sigma_v^{2(r+1)} = \sigma_v^{2(r)} + \left[ m - p - h(\sigma_v^{2(r)}) \right] / h_{*}'(\sigma_v^{2(r)})$, constraining to $\sigma_v^{2(r+1)} \ge 0$,

where $h_{*}'(\sigma_v^2) = - \sum_{i=1}^{m} \left( \hat{\theta}_i - z'_i \tilde{\beta} \right)^2 / (\psi_i + \sigma_v^2)^2$ is an approximation to the […]

$\hat{\beta} = \tilde{\beta}(\hat{\sigma}_v^2) = \left[ \sum_{i=1}^{m} z_i z'_i / (\psi_i + \hat{\sigma}_v^2) \right]^{-1} \left[ \sum_{i=1}^{m} z_i \hat{\theta}_i / (\psi_i + \hat{\sigma}_v^2) \right]$ where […]

9. Leading term of MSE of $\hat{\theta}_{i,FH}$: $MSE(\hat{\theta}_{i,FH}) = E(\hat{\theta}_{i,FH} - \theta_i)^2$, where the expectation is with respect to the Fay-Herriot model; see step 8; $g_{1i}(\sigma_v^2) = \gamma_i \psi_i$ shows the efficiency of $\hat{\theta}_{i,FH}$ over the direct estimator $\hat{\theta}_i$ is $\gamma_i^{-1}$ for a large number of areas m. If $\gamma_i = \sigma_v^2 / (\psi_i + \sigma_v^2) = 1/2$, then efficiency is 200% or gain in efficiency is 100%.
10. Scenarios for large efficiency gains: sampling variance $\psi_i$ large or model variance $\sigma_v^2$ small relative to $\psi_i$.
11. Nearly unbiased estimator of $MSE(\hat{\theta}_{i,FH})$: $mse(\hat{\theta}_{i,FH})$; see equation (7.1.26), p. 129, Rao (2003); easily programmable.
12. Estimation of small area mean $\bar{Y}_i$: $\hat{\bar{Y}}_{i,FH} = g^{-1}(\hat{\theta}_{i,FH}) = K(\hat{\theta}_{i,FH})$.
13. MSE estimator of $\hat{\bar{Y}}_{i,FH}$: $mse(\hat{\bar{Y}}_{i,FH}) = \left[ K'(\hat{\theta}_{i,FH}) \right]^2 mse(\hat{\theta}_{i,FH})$; may not be nearly unbiased. Empirical Bayes (EB) and hierarchical Bayes (HB) methods are better suited for handling non-linear cases $K(\hat{\theta}_{i,FH})$, see p. 133, Rao (2003).
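The iteration for $\sigma_v^2$ described in this appendix can be sketched as follows. To keep the example free of matrix algebra it assumes an intercept-only model ($z_i = 1$, so $p = 1$), in which case $\tilde{\beta}(\sigma_v^2)$ is a weighted mean. The function names, starting value, tolerance, and data are illustrative choices, not taken from the paper.

```python
def beta_wls(theta, psi, sigma_v2):
    """Intercept-only weighted least squares with weights 1 / (psi_i + sigma_v2)."""
    w = [1.0 / (p + sigma_v2) for p in psi]
    return sum(wi * t for wi, t in zip(w, theta)) / sum(w)


def h(theta, psi, sigma_v2):
    """h(sigma_v2) = sum_i (theta_i - beta)^2 / (psi_i + sigma_v2)."""
    b = beta_wls(theta, psi, sigma_v2)
    return sum((t - b) ** 2 / (p + sigma_v2) for t, p in zip(theta, psi))


def h_star_prime(theta, psi, sigma_v2):
    """-sum_i (theta_i - beta)^2 / (psi_i + sigma_v2)^2, used as the slope in the update."""
    b = beta_wls(theta, psi, sigma_v2)
    return -sum((t - b) ** 2 / (p + sigma_v2) ** 2 for t, p in zip(theta, psi))


def estimate_sigma_v2(theta, psi, p=1, tol=1e-10, max_iter=100):
    """Iterate sigma_v2 <- max(0, sigma_v2 + (m - p - h) / h_star_prime), starting from 0."""
    m = len(theta)
    s2 = 0.0  # ASSUMPTION: start at zero; the text does not give a starting value
    for _ in range(max_iter):
        step = (m - p - h(theta, psi, s2)) / h_star_prime(theta, psi, s2)
        new = max(0.0, s2 + step)  # enforce the non-negativity constraint
        if abs(new - s2) < tol:
            return new
        s2 = new
    return s2


# Made-up direct estimates with equal sampling variances psi_i = 2.
theta = [1.0, 3.0, 5.0, 7.0, 9.0]
psi = [2.0] * 5
s2 = estimate_sigma_v2(theta, psi)
gamma = [s2 / (p + s2) for p in psi]
print(s2, gamma[0], 1.0 / gamma[0])  # variance estimate, shrinkage weight, efficiency ratio
```

With equal sampling variances the moment equation has a closed-form root, the sum of squared deviations divided by $m - p$ minus $\psi$, which gives a simple check on the iteration. When the data are less variable than the sampling variances alone would imply, the constraint returns zero and the estimator collapses to the synthetic part.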