Estimation of A Rare Sensitive Attribute For Two Stage Randomized Response Model in Probability Proportional To Size Sampling Using Poisson
To cite this article: G. N. Singh, C. Singh & S. Suman (2019) Estimation of a rare sensitive
attribute for two-stage randomized response model in probability proportional to
size sampling using Poisson probability distribution, Statistics, 53:2, 387-394, DOI:
10.1080/02331888.2019.1566906
1. Introduction
In survey sampling, data collection is often cumbersome when information is sought on
sensitive issues, because the responses may be embarrassing to give. Respondents may
refuse to answer the questions asked by the interviewer or may give untruthful responses.
To overcome such difficulties in sample surveys, the randomized response technique (RRT)
originated by Warner [1] is an effective method for reducing non-response in survey data.
Greenberg et al. [2] modified Warner's [1] randomized response model (RRM) and suggested
an unrelated question RRM that further addresses the privacy problems. Further, Chaudhuri
and Mukerjee [3], Kim and Elam [4], and Singh and Tarray [5], among others, suggested
improved randomized response procedures for different practical situations. Singh et al. [6]
suggested a randomized response procedure using blank cards. Land et al. [7] suggested a
method for estimating the mean number of individuals with a rare sensitive attribute by
using the Poisson probability distribution based on survey sampling. Lee et al. [8] used a
probability proportional to size (pps) sampling scheme to estimate the parameter of a rare
sensitive attribute using the Poisson probability distribution.
Motivated by the above works and following Lee et al. [8], in this paper we attempt to
extend the unrelated RRM of Singh et al. [6] to a two-stage unrelated randomized response
procedure for estimating the mean number of individuals in the population who possess a
rare sensitive attribute, both when the parameter of a rare unrelated non-sensitive
attribute is known and when it is unknown.
where $\lambda_{i2}$ ($\lambda_{i2} = m_i \pi_{i2}$) is the mean number of individuals who have the rare unrelated
non-sensitive attribute in the $i$th cluster.
Under unequal probability sampling with $p_i = M_i / M_0$, the final estimator of $\lambda_1$, i.e., the
mean number of individuals in the population who possess a rare sensitive attribute, is
defined as
\[
\hat{\lambda}_{1\mathrm{ppswr}} = \frac{1}{n} \sum_{i=1}^{n} \hat{\lambda}_{i1},
\]
where $M_0 = \sum_{i=1}^{N} M_i$.
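Theorem 1: The estimator $\hat{\lambda}_{1\mathrm{ppswr}}$ is unbiased for the mean number of individuals $\lambda_1$,
with variance
\[
V(\hat{\lambda}_{1\mathrm{ppswr}}) = \frac{1}{n M_0} \left[ \sum_{i=1}^{N} M_i (\lambda_{i1} - \lambda_1)^2 + \sum_{i=1}^{N} \frac{M_i}{m_i} \phi_i \right] \tag{2}
\]
where
\[
\phi_i = \frac{\lambda_{i1}}{[P_{1i} + T_i (1 - P_{1i})]} + \frac{P_{2i} (1 - T_i) \lambda_{i2}}{[P_{1i} + T_i (1 - P_{1i})]^2}.
\]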
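To make the structure of the variance in (2) concrete, the following minimal Python sketch evaluates $\phi_i$ and $V(\hat{\lambda}_{1\mathrm{ppswr}})$ for user-supplied cluster values. The function names and the input values are hypothetical. The sketch also assumes that $\lambda_1$ is the size-weighted mean $\sum_i M_i \lambda_{i1} / M_0$, which is what unbiasedness under pps sampling with replacement would imply; the excerpt does not state this definition explicitly.

```python
from typing import Sequence


def phi(lam1: float, lam2: float, P1: float, P2: float, T: float) -> float:
    """phi_i of Theorem 1 for a single cluster."""
    d = P1 + T * (1.0 - P1)  # common denominator term P1i + Ti(1 - P1i)
    return lam1 / d + P2 * (1.0 - T) * lam2 / d ** 2


def variance_ppswr(
    n: int,
    M: Sequence[float],     # cluster sizes M_i
    m: Sequence[float],     # second-stage sample sizes m_i
    lam1: Sequence[float],  # lambda_i1 (sensitive attribute)
    lam2: Sequence[float],  # lambda_i2 (unrelated attribute, known)
    P1: Sequence[float],
    P2: Sequence[float],
    T: Sequence[float],
) -> float:
    """Variance (2): between-cluster part plus within-cluster part."""
    M0 = sum(M)
    # Assumption: lambda_1 is the size-weighted mean of the lambda_i1.
    lam1_pop = sum(Mi * l for Mi, l in zip(M, lam1)) / M0
    between = sum(Mi * (l - lam1_pop) ** 2 for Mi, l in zip(M, lam1))
    within = sum(
        Mi / mi * phi(l1, l2, p1, p2, t)
        for Mi, mi, l1, l2, p1, p2, t in zip(M, m, lam1, lam2, P1, P2, T)
    )
    return (between + within) / (n * M0)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the paper.
    v = variance_ppswr(
        n=2, M=[100, 300], m=[20, 40],
        lam1=[1.0, 2.0], lam2=[1.5, 1.5],
        P1=[0.5, 0.5], P2=[0.4, 0.4], T=[0.5, 0.5],
    )
    print(v)
```

Setting $T_i = 1$ (every respondent answers the sensitive question directly) reduces $\phi_i$ to $\lambda_{i1}$, which is a quick sanity check on the formula.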
\[
\theta_{i1} = T_{1i} \pi_{i1} + (1 - T_{1i}) \{ P_{1i} \pi_{i1} + P_{2i} \pi_{i2} \}
\quad \text{and} \quad
\theta_{i2} = T_{2i} \pi_{i1} + (1 - T_{2i}) \{ P_{4i} \pi_{i1} + P_{5i} \pi_{i2} \},
\]
where $\pi_{i1}$ and $\pi_{i2}$ are the population proportions of the rare sensitive and unrelated
non-sensitive attributes in the $i$th cluster, respectively. Since $A_1$ and $A_2$ are rare
attributes in the population for the $i$th cluster, $m_i \theta_{i1} = \lambda^{*}_{i1}$ and $m_i \theta_{i2} = \lambda^{*}_{i2}$ are finite for
large $m_i$ and small $\theta_{i1}$ and $\theta_{i2}$.
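As a sketch of the intermediate step, assume $\lambda_{i1} = m_i \pi_{i1}$ (by analogy with $\lambda_{i2} = m_i \pi_{i2}$). Multiplying the two response probabilities by $m_i$ and collecting terms then gives a pair of linear equations in $\lambda_{i1}$ and $\lambda_{i2}$:

```latex
\begin{align*}
\lambda^{*}_{i1} &= \{P_{1i} + T_{1i}(1 - P_{1i})\}\,\lambda_{i1} + P_{2i}(1 - T_{1i})\,\lambda_{i2}, \\
\lambda^{*}_{i2} &= \{P_{4i} + T_{2i}(1 - P_{4i})\}\,\lambda_{i1} + P_{5i}(1 - T_{2i})\,\lambda_{i2}.
\end{align*}
% Solving by Cramer's rule, with determinant
% D_i = \{P_{1i} + T_{1i}(1 - P_{1i})\} P_{5i}(1 - T_{2i})
%     - \{P_{4i} + T_{2i}(1 - P_{4i})\} P_{2i}(1 - T_{1i}):
\begin{align*}
\lambda_{i1} &= \frac{P_{5i}(1 - T_{2i})\,\lambda^{*}_{i1} - P_{2i}(1 - T_{1i})\,\lambda^{*}_{i2}}{D_i}, \\
\lambda_{i2} &= \frac{\{P_{4i} + T_{2i}(1 - P_{4i})\}\,\lambda^{*}_{i1} - \{P_{1i} + T_{1i}(1 - P_{1i})\}\,\lambda^{*}_{i2}}{-D_i}.
\end{align*}
```

Here $D_i$ is introduced only for this sketch; expanding it gives the quantity denoted $A_i$ in the text, and $-D_i$ gives $B_i$. Replacing $\lambda^{*}_{i1}$ and $\lambda^{*}_{i2}$ by the sample means of the two sets of responses (an assumption about the estimation step, which the excerpt does not spell out) would yield estimators of the form (3) and (4).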
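After the derivation, the estimators of $\lambda_{i1u}$ and $\lambda_{i2u}$ are obtained as
\[
\hat{\lambda}_{i1u} = \frac{1}{m_i A_i} \left[ P_{5i} (1 - T_{2i}) \sum_{j=1}^{m_i} y_{i1j} - P_{2i} (1 - T_{1i}) \sum_{j=1}^{m_i} y_{i2j} \right] \tag{3}
\]
\[
\hat{\lambda}_{i2u} = \frac{1}{m_i B_i} \left[ \{ P_{4i} + T_{2i} (1 - P_{4i}) \} \sum_{j=1}^{m_i} y_{i1j} - \{ P_{1i} + T_{1i} (1 - P_{1i}) \} \sum_{j=1}^{m_i} y_{i2j} \right] \tag{4}
\]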
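where
\[
A_i = (P_{1i} P_{5i} - P_{1i} P_{5i} T_{1i} + P_{5i} T_{1i})(1 - T_{2i}) - (P_{2i} P_{4i} - P_{2i} P_{4i} T_{2i} + P_{2i} T_{2i})(1 - T_{1i}),
\]
with $(P_{1i} P_{5i} - P_{1i} P_{5i} T_{1i} + P_{5i} T_{1i})(1 - T_{2i}) \neq (P_{2i} P_{4i} - P_{2i} P_{4i} T_{2i} + P_{2i} T_{2i})(1 - T_{1i})$,
and
\[
B_i = (P_{2i} P_{4i} - P_{2i} P_{4i} T_{2i} + P_{2i} T_{2i})(1 - T_{1i}) - (P_{1i} P_{5i} - P_{1i} P_{5i} T_{1i} + P_{5i} T_{1i})(1 - T_{2i}),
\]
with $(P_{2i} P_{4i} - P_{2i} P_{4i} T_{2i} + P_{2i} T_{2i})(1 - T_{1i}) \neq (P_{1i} P_{5i} - P_{1i} P_{5i} T_{1i} + P_{5i} T_{1i})(1 - T_{2i})$.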
Finally, the estimator of the rare sensitive attribute $\lambda_1$ is derived as
\[
\hat{\lambda}_{1\mathrm{ppswru}} = \frac{1}{n} \sum_{i=1}^{n} \hat{\lambda}_{i1u}.
\]
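The estimators (3) and (4) and the final average over sampled clusters can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the inputs `y1` and `y2` are assumed to be the two sets of $m_i$ responses observed in a cluster.

```python
from typing import List, Sequence, Tuple


def estimate_cluster(
    y1: Sequence[float], y2: Sequence[float],
    P1: float, P2: float, P4: float, P5: float, T1: float, T2: float,
) -> Tuple[float, float]:
    """Return (lambda_i1u_hat, lambda_i2u_hat) following (3) and (4)."""
    if len(y1) != len(y2):
        raise ValueError("both response sets must have m_i observations")
    m = len(y1)
    a1 = P1 + T1 * (1.0 - P1)  # coefficient of lambda_i1 in the first response
    b1 = P2 * (1.0 - T1)       # coefficient of lambda_i2 in the first response
    a2 = P4 + T2 * (1.0 - P4)  # coefficient of lambda_i1 in the second response
    b2 = P5 * (1.0 - T2)       # coefficient of lambda_i2 in the second response
    A = a1 * b2 - a2 * b1      # equals A_i in the text after expansion
    if A == 0.0:
        raise ValueError("A_i = 0: the design parameters do not identify lambda")
    B = -A                     # B_i is the negative of A_i
    s1, s2 = sum(y1), sum(y2)
    lam1_hat = (b2 * s1 - b1 * s2) / (m * A)
    lam2_hat = (a2 * s1 - a1 * s2) / (m * B)
    return lam1_hat, lam2_hat


def estimate_population(cluster_estimates: List[float]) -> float:
    """Average the n cluster-level estimates of lambda_i1u."""
    return sum(cluster_estimates) / len(cluster_estimates)


if __name__ == "__main__":
    # Hypothetical responses for one cluster, not data from the paper.
    lam1_hat, lam2_hat = estimate_cluster(
        y1=[2, 3, 1, 2], y2=[2, 2, 3, 2],
        P1=0.6, P2=0.3, P4=0.3, P5=0.6, T1=0.4, T2=0.5,
    )
    print(lam1_hat, lam2_hat)
```

Because the two estimates come from solving a linear system, individual cluster estimates can be negative in small samples; the sketch does not truncate them.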
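Theorem 2: The estimator $\hat{\lambda}_{1\mathrm{ppswru}}$ is unbiased for the mean number of individuals $\lambda_1$,
with variance
\[
V(\hat{\lambda}_{1\mathrm{ppswru}}) = \frac{1}{n M_0} \left[ \sum_{i=1}^{N} M_i (\lambda_{i1} - \lambda_1)^2 + \sum_{i=1}^{N} M_i \frac{\psi_i^{(12)}}{m_i \gamma_i^2} \right] \tag{5}
\]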
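where
\[
\begin{aligned}
\psi_i^{(12)} = {} & [\{ P_{1i} + T_{1i} (1 - P_{1i}) \} (1 - T_{2i})^2 P_{5i}^2 + \{ P_{4i} + T_{2i} (1 - P_{4i}) \} (1 - T_{1i})^2 P_{2i}^2] \\
& - 2 P_{2i} P_{5i} (1 - T_{1i})(1 - T_{2i}) \times \{ P_{1i} + T_{1i} (1 - P_{1i}) \} \{ P_{4i} + T_{2i} (1 - P_{4i}) \} \\
& + P_{2i} P_{5i} (1 - T_{1i})(1 - T_{2i}) \{ P_{2i} (1 - T_{1i}) + P_{5i} (1 - T_{2i}) \} \\
& - 2 P_{2i}^2 P_{5i}^2 (1 - T_{1i})^2 (1 - T_{2i})^2 .
\end{aligned}
\]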
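Theorem 3: The estimator $\hat{\lambda}_{1\mathrm{sppswr}}$ is unbiased for the mean number of individuals $\lambda_1$,
with variance
\[
V(\hat{\lambda}_{1\mathrm{sppswr}}) = \sum_{h=1}^{L} W_h^2 \frac{1}{n_h M_{h0}} \left[ \sum_{i=1}^{N_h} M_{hi} (\lambda_{hi1} - \lambda_{h1})^2 + \sum_{i=1}^{N_h} \frac{M_{hi}}{m_{hi}} \phi_{hi} \right] \tag{7}
\]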
where
\[
\phi_{hi} = \frac{\lambda_{hi1}}{[P_{1hi} + T_{hi} (1 - P_{1hi})]} + \frac{P_{2hi} (1 - T_{hi}) \lambda_{hi2}}{[P_{1hi} + T_{hi} (1 - P_{1hi})]^2}.
\]
where $T_{1hi}$, $P_{1hi}$, $P_{2hi}$ and $T_{2hi}$, $P_{4hi}$, $P_{5hi}$ are the probabilities of being asked the
questions, as described in Section 2.2, and $\pi_{hi1}$ and $\pi_{hi2}$ are the population proportions of
the rare sensitive attribute $A_{h1}$ and the rare unrelated non-sensitive attribute $A_{h2}$,
respectively. Since the two attributes are rare in the population, $m_{hi} \theta_{hi1} = \lambda^{*}_{hi1}$ and
$m_{hi} \theta_{hi2} = \lambda^{*}_{hi2}$ are finite for large $m_{hi}$ and small $\theta_{hi1}$ and $\theta_{hi2}$. After the derivation, the
estimators of $\lambda_{hi1u}$ and $\lambda_{hi2u}$ are obtained as
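\[
\hat{\lambda}_{hi1u} = \frac{1}{m_{hi} A_{hi}} \left[ P_{5hi} (1 - T_{2hi}) \sum_{j=1}^{m_{hi}} y_{hi1j} - P_{2hi} (1 - T_{1hi}) \sum_{j=1}^{m_{hi}} y_{hi2j} \right] \tag{8}
\]
\[
\hat{\lambda}_{hi2u} = \frac{1}{m_{hi} B_{hi}} \left[ \{ P_{4hi} + T_{2hi} (1 - P_{4hi}) \} \sum_{j=1}^{m_{hi}} y_{hi1j} - \{ P_{1hi} + T_{1hi} (1 - P_{1hi}) \} \sum_{j=1}^{m_{hi}} y_{hi2j} \right] \tag{9}
\]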
where
\[
A_{hi} = (P_{1hi} P_{5hi} - P_{1hi} P_{5hi} T_{1hi} + P_{5hi} T_{1hi})(1 - T_{2hi}) - (P_{2hi} P_{4hi} - P_{2hi} P_{4hi} T_{2hi} + P_{2hi} T_{2hi})(1 - T_{1hi}),
\]
with $(P_{1hi} P_{5hi} - P_{1hi} P_{5hi} T_{1hi} + P_{5hi} T_{1hi})(1 - T_{2hi}) \neq (P_{2hi} P_{4hi} - P_{2hi} P_{4hi} T_{2hi} + P_{2hi} T_{2hi})(1 - T_{1hi})$,
and
\[
B_{hi} = (P_{2hi} P_{4hi} - P_{2hi} P_{4hi} T_{2hi} + P_{2hi} T_{2hi})(1 - T_{1hi}) - (P_{1hi} P_{5hi} - P_{1hi} P_{5hi} T_{1hi} + P_{5hi} T_{1hi})(1 - T_{2hi}),
\]
with $(P_{2hi} P_{4hi} - P_{2hi} P_{4hi} T_{2hi} + P_{2hi} T_{2hi})(1 - T_{1hi}) \neq (P_{1hi} P_{5hi} - P_{1hi} P_{5hi} T_{1hi} + P_{5hi} T_{1hi})(1 - T_{2hi})$.
Finally, the proposed estimator of $\lambda_1$ is given as
\[
\hat{\lambda}_{1\mathrm{sppswru}} = \sum_{h=1}^{L} W_h \frac{1}{n_h} \sum_{i=1}^{n_h} \hat{\lambda}_{hi1u}.
\]
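The stratified estimator is a weighted sum of per-stratum averages, which the following short Python sketch illustrates. The function name and the numbers are hypothetical, and the stratum weights $W_h$ are taken as given inputs, since their definition is not part of this excerpt.

```python
from typing import Sequence


def stratified_estimate(
    W: Sequence[float],
    cluster_estimates: Sequence[Sequence[float]],
) -> float:
    """Sum over strata of W_h times the mean of the n_h cluster estimates."""
    if len(W) != len(cluster_estimates):
        raise ValueError("one weight is needed per stratum")
    total = 0.0
    for w_h, est_h in zip(W, cluster_estimates):
        total += w_h * sum(est_h) / len(est_h)  # W_h * (1/n_h) * sum_i lambda_hi1u_hat
    return total


if __name__ == "__main__":
    # Hypothetical weights and cluster-level estimates for L = 2 strata.
    print(stratified_estimate([0.4, 0.6], [[1.0, 3.0], [2.0, 2.0, 5.0]]))
```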
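where
\[
\begin{aligned}
\psi_{hi}^{(12)} = {} & [\{ P_{1hi} + T_{1hi} (1 - P_{1hi}) \} (1 - T_{2hi})^2 P_{5hi}^2 + \{ P_{4hi} + T_{2hi} (1 - P_{4hi}) \} (1 - T_{1hi})^2 P_{2hi}^2] \\
& - 2 P_{2hi} P_{5hi} (1 - T_{1hi})(1 - T_{2hi}) \times \{ P_{1hi} + T_{1hi} (1 - P_{1hi}) \} \{ P_{4hi} + T_{2hi} (1 - P_{4hi}) \} \\
& + P_{2hi} P_{5hi} (1 - T_{1hi})(1 - T_{2hi}) \times \{ P_{2hi} (1 - T_{1hi}) + P_{5hi} (1 - T_{2hi}) \} \\
& - 2 P_{2hi}^2 P_{5hi}^2 (1 - T_{1hi})^2 (1 - T_{2hi})^2 .
\end{aligned}
\]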
4. Efficiency comparison
To show the dominance of the proposed estimators, empirical comparisons are made with the
estimators of Lee et al. [8]. The per cent relative efficiencies (PRE) of the proposed
estimators are calculated with respect to the Lee et al. [8] estimators for the
corresponding situations using the formula
\[
\mathrm{PRE} = \frac{V(\text{Lee et al. estimator})}{V(\text{Proposed estimator})} \times 100.
\]
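As a small illustration of how the PRE is read, the sketch below computes it from two variances. The function name and the variance values are hypothetical and are not results from the paper; a value above 100 means the proposed estimator has the smaller variance.

```python
def percent_relative_efficiency(var_lee: float, var_proposed: float) -> float:
    """PRE of the proposed estimator with respect to the Lee et al. estimator."""
    if var_proposed <= 0.0:
        raise ValueError("the variance of the proposed estimator must be positive")
    return var_lee / var_proposed * 100.0


if __name__ == "__main__":
    # Hypothetical variances, chosen only to show the calculation.
    print(percent_relative_efficiency(0.5, 0.4))
```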
For the efficiency comparison, data similar to those given in Lee et al. [8] are used.
0.60), (P2 = 0.40, 0.30, 0.25, 0.20)] for non-stratified population and [(T = 0.40, 0.50,
0.60, 0.70), (P1 , P2 = 0.50, 0.25; 0.60, 0.20; 0.70, 0.15; 0.80, 0.10), (P4 , P5 = 0.20, 0.30;
0.30, 0.25; 0.40, 0.20; 0.50, 0.15)] for the stratified population. The per cent relative
efficiencies of the proposed estimators with respect to the Lee et al. [8] estimators were
found to be greater than 100 in all the cases considered.
Following these results, it may be concluded that the proposed estimators based on the
randomized response model, when the characteristics under study concern stigmatized
issues, are rewarding in terms of per cent relative efficiency and dominate the Lee et al.
[8] estimators. Thus, the estimators suggested in this work may be used effectively to
handle the problems of untruthful response or non-response that arise from the sensitive
nature of the characteristics, and may be recommended to survey practitioners for
practical use.
Acknowledgements
The authors are thankful to the Indian Institute of Technology (Indian School of Mines),
Dhanbad, for providing the necessary support to carry out the present research work. The
authors are also grateful to the honourable Reviewers, Editor and Associate Editor for
their valuable suggestions, which helped improve the quality of the paper.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
[1] Warner SL. Randomized response: a survey technique for eliminating evasive answer bias. J Am
Stat Assoc. 1965;60(309):63–69.
[2] Greenberg BG, Abul-Ela ALA, Simmons WR, et al. The unrelated question randomized
response model: theoretical framework. J Am Stat Assoc. 1969;64(326):520–539.
[3] Chaudhuri A, Mukerjee R. Randomized response: theory and techniques. New York (NY):
Marcel Dekker; 1988.
[4] Kim JM, Elam ME. A stratified unrelated question randomized response model. Stat Pap.
2007;48(2):215–233.
[5] Singh HP, Tarray TA. A stratified unknown repeated trials in randomized response sampling.
Commun Stat Appl Methods. 2012;19(6):751–759.
[6] Singh S, Horn S, Singh R, et al. On the use of modified randomization device for estimating the
prevalence of a sensitive attribute. Stat Transition. 2003;6(4):515–522.
[7] Land M, Singh S, Sedory SA. Estimation of a rare sensitive attribute using Poisson distribution.
Statistics (Ber). 2012;46(3):351–360.
[8] Lee GS, Uhm D, Kim JM. Estimation of a rare sensitive attribute in probability proportional to
size measures using Poisson distribution. Statistics (Ber). 2014;48(3):685–709.