
Statistics

A Journal of Theoretical and Applied Statistics

ISSN: 0233-1888 (Print) 1029-4910 (Online) Journal homepage: https://www.tandfonline.com/loi/gsta20

Estimation of a rare sensitive attribute for two-stage randomized response model in probability proportional to size sampling using Poisson probability distribution

G. N. Singh, C. Singh & S. Suman

To cite this article: G. N. Singh, C. Singh & S. Suman (2019) Estimation of a rare sensitive
attribute for two-stage randomized response model in probability proportional to
size sampling using Poisson probability distribution, Statistics, 53:2, 387-394, DOI:
10.1080/02331888.2019.1566906

To link to this article: https://doi.org/10.1080/02331888.2019.1566906

Published online: 21 Jan 2019.

STATISTICS
2019, VOL. 53, NO. 2, 387–394
https://doi.org/10.1080/02331888.2019.1566906

Estimation of a rare sensitive attribute for two-stage randomized response model in probability proportional to size sampling using Poisson probability distribution
G. N. Singh, C. Singh and S. Suman
Department of Applied Mathematics, Indian Institute of Technology, Indian School of Mines, Dhanbad, India

ABSTRACT
This paper addresses the estimation of the mean number of individuals in the population who possess a rare sensitive attribute, using the Poisson distribution, for (i) a clustered population and (ii) a stratified population with clusters as strata units. Properties of the proposed estimation procedures are discussed both when the proportion of a rare unrelated non-sensitive attribute is known and when it is unknown. Empirical studies are carried out to support the theoretical results, which show dominance over the estimation procedures of Lee et al. [Estimation of a rare sensitive attribute in probability proportional to size measures using Poisson distribution. Statistics (Ber). 2014;48(3):685–709].

ARTICLE HISTORY
Received 20 November 2017
Accepted 5 January 2019

KEYWORDS
Randomized response technique; rare sensitive attribute; rare unrelated non-sensitive attribute; Poisson probability distribution; probability proportional to size (pps) sampling

MATHEMATICS SUBJECT CLASSIFICATION
62D05

1. Introduction
In survey sampling, data collection is often cumbersome when information is sought on sensitive issues, because the questions invite potentially embarrassing responses. Respondents may refuse to answer the questions asked by the interviewer or may give untruthful responses. To overcome such difficulties in sample surveys, the randomized response technique (RRT) originated by Warner [1] is an effective method for reducing non-response in survey data. Greenberg et al. [2] modified the Warner [1] randomized response model (RRM) and suggested an unrelated question RRM, which further alleviates the privacy problem. Further, Chaudhuri and Mukerjee [3], Kim and Elam [4], and Singh and Tarray [5], among others, suggested improved randomized response procedures for different practical situations. Singh et al. [6] suggested a randomized response procedure using blank cards. Land et al. [7] suggested a method for estimating the mean number of individuals with a rare sensitive attribute by using the Poisson probability distribution based on survey sampling. Lee et al. [8] used a probability proportional to size (pps) sampling scheme for the estimation of the parameter of a rare sensitive attribute using the Poisson probability distribution.

CONTACT C. Singh chandraketu.lko@gmail.com Department of Applied Mathematics, Indian Institute of Technology, Indian School of Mines, Dhanbad 826004, India

© 2019 Informa UK Limited, trading as Taylor & Francis Group



Motivated by the above works and following Lee et al. [8], in this paper we extend the Singh et al. [6] unrelated RRM to a two-stage unrelated randomized response procedure for estimating the mean number of individuals in the population who possess a rare sensitive attribute, both when the parameter of a rare unrelated non-sensitive attribute is known and when it is unknown.

2. Estimation procedures under two-stage sampling design using RRM


Consider a population with $N$ clusters such that the $i$th ($i = 1, 2, \ldots, N$) cluster consists of $M_i$ elementary units. In the first stage, a sample of $n$ clusters is selected from the population using the PPSWR (probability proportional to size with replacement) sampling scheme. In the second stage, $m_i$ elementary units are selected using the SRSWR (simple random sampling with replacement) scheme from the $i$th cluster selected in the first stage, so that the final sample consists of $m = \sum_{i=1}^{n} m_i$ elementary units. Extending the Singh et al. [6] unrelated randomized response procedure using the Poisson probability distribution to the aforementioned two-stage sampling scheme, we propose estimation procedures for the mean number of individuals who possess a rare sensitive attribute. The proposed estimation procedures are discussed for the situations when the proportion of a rare unrelated non-sensitive attribute is known as well as unknown.
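The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_stage_sample`, the argument layout and the fixed seed are hypothetical and do not come from the paper; the only ingredients taken from the text are the PPSWR draw of clusters with probability proportional to $M_i$ and the SRSWR draw of $m_i$ units inside each selected cluster.

```python
import random


def two_stage_sample(cluster_sizes, n, m, seed=0):
    """Hypothetical helper: draw n clusters by PPSWR, then m[i] units by SRSWR.

    cluster_sizes -- list of cluster sizes M_i (length N)
    n             -- number of first-stage draws
    m             -- list of second-stage sample sizes m_i, indexed by cluster
    Returns a list of (cluster_index, [unit_indices]) pairs, one per draw.
    """
    rng = random.Random(seed)
    cluster_ids = list(range(len(cluster_sizes)))
    # First stage (PPSWR): cluster i is drawn with probability M_i / M_0,
    # independently on each of the n draws (so repeats are possible).
    drawn = rng.choices(cluster_ids, weights=cluster_sizes, k=n)
    sample = []
    for i in drawn:
        # Second stage (SRSWR): every unit of cluster i is equally likely
        # on each of the m_i draws, again with replacement.
        units = [rng.randrange(cluster_sizes[i]) for _ in range(m[i])]
        sample.append((i, units))
    return sample
```

Because both stages sample with replacement, the same cluster or unit may appear more than once in the returned list.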

2.1. Estimation procedure when the proportion of a rare unrelated non-sensitive attribute is known
Following the above two-stage sampling scheme and assuming that the proportion of a rare unrelated non-sensitive attribute is known, the responses from the elementary units in the second-stage samples are collected using the extended two-stage Singh et al. [6] unrelated randomization device. For the $i$th cluster, the first-stage device consists of the following statements: (a) Are you a member of a rare sensitive group $A_1$? (b) Go to randomization device $R_2$, with corresponding probabilities $T_i$ and $(1 - T_i)$, respectively. The second-stage randomization device $R_2$ uses the following three statements: (i) Are you a member of group $A_1$? (ii) Are you a member of a rare unrelated non-sensitive group $A_2$? and (iii) Blank Cards, with pre-assigned probabilities $P_{1i}$, $P_{2i}$ and $P_{3i}$ such that $\sum_{k=1}^{3} P_{ki} = 1$. Following the above randomized response procedure, the probability $\theta_{i0}$ of a ‘yes’ answer in the $i$th cluster is

$$\theta_{i0} = T_i \pi_{i1} + (1 - T_i)[P_{1i}\pi_{i1} + P_{2i}\pi_{i2}]$$

where $\pi_{i1}$ is the true proportion of the rare sensitive attribute $A_1$ and $\pi_{i2}$ is the true proportion of the rare unrelated non-sensitive attribute $A_2$ in the $i$th cluster, and $\pi_{i2}$ is assumed to be known. Since $A_1$ and $A_2$ are rare attributes, for large $m_i$ and small $\theta_{i0}$ the quantity $m_i\theta_{i0} = \lambda_{i0}$ is finite. Let $y_{i1}, y_{i2}, \ldots, y_{im_i}$ be a random sample drawn from the $i$th cluster which follows the Poisson probability distribution with mean $\lambda_{i0}$. The estimator of the mean number of individuals with the rare sensitive characteristic, $\lambda_{i1}$ ($\lambda_{i1} = m_i\pi_{i1}$), is then derived as

$$\hat{\lambda}_{i1} = \frac{1}{P_{1i} + T_i(1 - P_{1i})}\left[\frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij} - P_{2i}(1 - T_i)\lambda_{i2}\right] \tag{1}$$
where $\lambda_{i2}$ ($\lambda_{i2} = m_i\pi_{i2}$) is the mean number of individuals who have the rare unrelated non-sensitive attribute in the $i$th cluster.
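As a rough illustration of estimator (1), the sketch below computes the ‘yes’ probability $\theta_{i0}$ and the cluster-level estimate $\hat{\lambda}_{i1}$ from a list of responses. The function names `yes_probability` and `lambda_hat_known` are hypothetical, and the sketch assumes the responses for one cluster are supplied as a plain list of numbers; it is not the authors' code.

```python
def yes_probability(T, P1, P2, pi1, pi2):
    """theta_i0 = T*pi1 + (1 - T)*(P1*pi1 + P2*pi2) for one cluster."""
    return T * pi1 + (1 - T) * (P1 * pi1 + P2 * pi2)


def lambda_hat_known(y, T, P1, P2, lam2):
    """Estimator (1) of lambda_i1 when lambda_i2 = m_i * pi_i2 is known.

    y    -- list of the m_i responses y_i1, ..., y_im_i from one cluster
    T    -- probability of the direct sensitive question at the first stage
    P1   -- second-stage probability of the sensitive question
    P2   -- second-stage probability of the unrelated question
    lam2 -- known mean number lambda_i2 for the unrelated attribute
    """
    m = len(y)
    y_bar = sum(y) / m  # sample mean, which estimates lambda_i0
    # Remove the contribution of the unrelated question, then rescale by the
    # total probability that the sensitive question is the one answered.
    return (y_bar - P2 * (1 - T) * lam2) / (P1 + T * (1 - P1))
```

The rescaling factor follows from rewriting $\theta_{i0}$ as $[P_{1i} + T_i(1 - P_{1i})]\pi_{i1} + P_{2i}(1 - T_i)\pi_{i2}$, so that $\lambda_{i0} = [P_{1i} + T_i(1 - P_{1i})]\lambda_{i1} + P_{2i}(1 - T_i)\lambda_{i2}$.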
Under unequal probability sampling with $p_i = M_i/M_0$, the final estimator of $\lambda_1$, i.e., the mean number of individuals in the population who possess a rare sensitive attribute, is defined as

$$\hat{\lambda}_{1\mathrm{ppswr}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\lambda}_{i1}$$

where $M_0 = \sum_{i=1}^{N} M_i$.

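Theorem 1: The estimator $\hat{\lambda}_{1\mathrm{ppswr}}$ is unbiased for the mean number of individuals $\lambda_1$ with the variance given as

$$V(\hat{\lambda}_{1\mathrm{ppswr}}) = \frac{1}{nM_0}\left[\sum_{i=1}^{N} M_i(\lambda_{i1} - \lambda_1)^2 + \sum_{i=1}^{N}\frac{M_i}{m_i}\varphi_i\right] \tag{2}$$

The unbiased estimate of the variance of $\hat{\lambda}_{1\mathrm{ppswr}}$ is simplified as

$$\hat{V}(\hat{\lambda}_{1\mathrm{ppswr}}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{\lambda}_{i1} - \frac{\hat{\lambda}_{1\mathrm{ppswr}}}{M_0}\right)^2$$

where

$$\varphi_i = \frac{\lambda_{i1}}{[P_{1i} + T_i(1 - P_{1i})]} + \frac{P_{2i}(1 - T_i)\lambda_{i2}}{[P_{1i} + T_i(1 - P_{1i})]^2}$$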
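To make the structure of variance (2) concrete, the following sketch evaluates it for given population values. The names `phi` and `variance_ppswr` are hypothetical. One assumption is made explicit because this section does not spell it out: $\lambda_1$ is taken to be the size-weighted mean $\sum_i M_i\lambda_{i1}/M_0$, which is the quantity for which the PPSWR average of the $\hat{\lambda}_{i1}$ is unbiased.

```python
def phi(lam1, lam2, T, P1, P2):
    """Within-cluster term phi_i appearing in Theorem 1."""
    d = P1 + T * (1 - P1)
    return lam1 / d + P2 * (1 - T) * lam2 / d ** 2


def variance_ppswr(n, M, m, lam1, lam2, T, P1, P2):
    """Variance (2); every argument except n is a per-cluster list of length N."""
    M0 = sum(M)
    # Assumption (not stated explicitly in the text): lambda_1 is the
    # size-weighted mean of the cluster-level lambda_i1.
    lam_pop = sum(Mi * l for Mi, l in zip(M, lam1)) / M0
    # Between-cluster component: sum of M_i * (lambda_i1 - lambda_1)^2.
    between = sum(Mi * (l - lam_pop) ** 2 for Mi, l in zip(M, lam1))
    # Within-cluster component: sum of (M_i / m_i) * phi_i.
    within = sum(
        Mi / mi * phi(l1, l2, t, p1, p2)
        for Mi, mi, l1, l2, t, p1, p2 in zip(M, m, lam1, lam2, T, P1, P2)
    )
    return (between + within) / (n * M0)
```

When all clusters share the same $\lambda_{i1}$, the between-cluster component vanishes and only the randomized-response noise in the second term remains.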

2.2. Estimation procedure when the proportion of a rare unrelated non-sensitive attribute is unknown
In this section, we estimate the mean number of persons in the population who possess a rare sensitive attribute when the proportion of individuals who possess the rare unrelated non-sensitive attribute is unknown. The individuals selected in the sample are asked to answer ‘yes’ or ‘no’ using the following two-stage randomization devices.
The first randomization device $R_{11}$ in the $i$th cluster for the second-stage sample consists of the following statements: (a) Are you a member of a rare sensitive group $A_1$? (b) Go to randomization device $R_{12}$, with corresponding probabilities $T_{1i}$ and $(1 - T_{1i})$, respectively. The second-stage randomization device $R_{12}$ for the $i$th cluster consists of the following three statements: (i) Are you a member of group $A_1$? (ii) Are you a member of a rare non-sensitive unrelated group $A_2$? (iii) ‘Blank Cards’, with pre-assigned probabilities $P_{1i}$, $P_{2i}$ and $P_{3i}$ such that $\sum_{k=1}^{3} P_{ki} = 1$. The respondents are then requested to answer one of the same questions again using the second randomization device, which is as follows:
The first-stage randomization device $R_{21}$ consists of the following two statements: (a) Are you a member of a rare sensitive group $A_1$? (b) Go to randomization device $R_{22}$, with corresponding probabilities $T_{2i}$ and $(1 - T_{2i})$, respectively. The second-stage randomization device $R_{22}$ consists of the following three statements: (i) Are you a member of group $A_1$? (ii) Are you a member of a rare non-sensitive unrelated group $A_2$? (iii) ‘Blank Cards’, with pre-assigned probabilities $P_{4i}$, $P_{5i}$ and $P_{6i}$ such that $\sum_{k=4}^{6} P_{ki} = 1$. Based on the responses obtained through the two randomization devices, the probabilities of a ‘yes’ answer in the $i$th cluster are obtained as

$$\theta_{i1} = T_{1i}\pi_{i1} + (1 - T_{1i})\{P_{1i}\pi_{i1} + P_{2i}\pi_{i2}\} \quad\text{and}\quad \theta_{i2} = T_{2i}\pi_{i1} + (1 - T_{2i})\{P_{4i}\pi_{i1} + P_{5i}\pi_{i2}\}$$

where $\pi_{i1}$ and $\pi_{i2}$ are the population proportions for the rare sensitive and unrelated non-sensitive attributes in the $i$th cluster, respectively. Since $A_1$ and $A_2$ are rare attributes in the population for the $i$th cluster, $m_i\theta_{i1} = \lambda^{*}_{i1}$ and $m_i\theta_{i2} = \lambda^{*}_{i2}$ are finite for large $m_i$ and small $\theta_{i1}$ and $\theta_{i2}$.
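After derivation, the estimators of $\lambda_{i1u}$ and $\lambda_{i2u}$ are obtained as

$$\hat{\lambda}_{i1u} = \frac{1}{m_i A_i}\left[P_{5i}(1 - T_{2i})\sum_{j=1}^{m_i} y_{i1j} - P_{2i}(1 - T_{1i})\sum_{j=1}^{m_i} y_{i2j}\right] \tag{3}$$

$$\hat{\lambda}_{i2u} = \frac{1}{m_i B_i}\left[\{P_{4i} + T_{2i}(1 - P_{4i})\}\sum_{j=1}^{m_i} y_{i1j} - \{P_{1i} + T_{1i}(1 - P_{1i})\}\sum_{j=1}^{m_i} y_{i2j}\right] \tag{4}$$

where

$$A_i = (P_{1i}P_{5i} - P_{1i}P_{5i}T_{1i} + P_{5i}T_{1i})(1 - T_{2i}) - (P_{2i}P_{4i} - P_{2i}P_{4i}T_{2i} + P_{2i}T_{2i})(1 - T_{1i}),$$

$$(P_{1i}P_{5i} - P_{1i}P_{5i}T_{1i} + P_{5i}T_{1i})(1 - T_{2i}) \neq (P_{2i}P_{4i} - P_{2i}P_{4i}T_{2i} + P_{2i}T_{2i})(1 - T_{1i})$$

and

$$B_i = (P_{2i}P_{4i} - P_{2i}P_{4i}T_{2i} + P_{2i}T_{2i})(1 - T_{1i}) - (P_{1i}P_{5i} - P_{1i}P_{5i}T_{1i} + P_{5i}T_{1i})(1 - T_{2i}),$$

$$(P_{2i}P_{4i} - P_{2i}P_{4i}T_{2i} + P_{2i}T_{2i})(1 - T_{1i}) \neq (P_{1i}P_{5i} - P_{1i}P_{5i}T_{1i} + P_{5i}T_{1i})(1 - T_{2i}).$$

Finally, the estimator of the rare sensitive attribute $\lambda_1$ is derived as

$$\hat{\lambda}_{1\mathrm{ppswru}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\lambda}_{i1u}.$$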
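Estimators (3) and (4) amount to solving two linear equations in the two unknowns $\lambda_{i1}$ and $\lambda_{i2}$, one equation per randomization device. The sketch below illustrates this; the function name `estimate_unknown`, the shorthand coefficients `a1`, `b1`, `a2`, `b2` and the tolerance are hypothetical. It assumes `y1` and `y2` hold the responses of the same $m_i$ respondents to the first and second device, respectively.

```python
def estimate_unknown(y1, y2, T1, P1, P2, T2, P4, P5, tol=1e-12):
    """Sketch of estimators (3) and (4) for a single cluster.

    y1, y2 -- responses y_i1j and y_i2j (equal-length lists)
    Returns (lambda_i1u_hat, lambda_i2u_hat).
    """
    m = len(y1)
    if len(y2) != m:
        raise ValueError("y1 and y2 must have the same length")
    # E[mean(y1)] = a1*lam1 + b1*lam2 and E[mean(y2)] = a2*lam1 + b2*lam2.
    a1, b1 = P1 + T1 * (1 - P1), P2 * (1 - T1)
    a2, b2 = P4 + T2 * (1 - P4), P5 * (1 - T2)
    A = a1 * b2 - a2 * b1  # equals A_i after expanding the products
    B = a2 * b1 - a1 * b2  # equals B_i = -A_i
    if abs(A) < tol:
        # The two devices carry the same information: no unique solution.
        raise ValueError("device parameters make A_i (and B_i) zero")
    lam1_hat = (b2 * sum(y1) - b1 * sum(y2)) / (m * A)
    lam2_hat = (a2 * sum(y1) - a1 * sum(y2)) / (m * B)
    return lam1_hat, lam2_hat
```

The guard on `A` corresponds to the inequality conditions stated with $A_i$ and $B_i$: the two devices must use sufficiently different probabilities for both parameters to be identifiable.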

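Theorem 2: The estimator $\hat{\lambda}_{1\mathrm{ppswru}}$ is unbiased for the mean number of individuals $\lambda_1$ with the variance

$$V(\hat{\lambda}_{1\mathrm{ppswru}}) = \frac{1}{nM_0}\left[\sum_{i=1}^{N} M_i(\lambda_{i1} - \lambda_1)^2 + \sum_{i=1}^{N} M_i\frac{\psi_i^{(12)}}{m_i\gamma_i^2}\right] \tag{5}$$

The unbiased estimate of the variance of $\hat{\lambda}_{1\mathrm{ppswru}}$ is simplified as

$$\hat{V}(\hat{\lambda}_{1\mathrm{ppswru}}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\hat{\lambda}_{i1u} - \frac{\hat{\lambda}_{1\mathrm{ppswru}}}{M_0}\right)^2$$

where

$$\begin{aligned}
\psi_i^{(12)} ={}& [\{P_{1i} + T_{1i}(1 - P_{1i})\}(1 - T_{2i})^2 P_{5i}^2 + \{P_{4i} + T_{2i}(1 - P_{4i})\}(1 - T_{1i})^2 P_{2i}^2] \\
&- 2P_{2i}P_{5i}(1 - T_{1i})(1 - T_{2i}) \times \{P_{1i} + T_{1i}(1 - P_{1i})\}\{P_{4i} + T_{2i}(1 - P_{4i})\} \\
&+ P_{2i}P_{5i}(1 - T_{1i})(1 - T_{2i})\{P_{2i}(1 - T_{1i}) + P_{5i}(1 - T_{2i})\} \\
&- 2P_{2i}^2 P_{5i}^2 (1 - T_{1i})^2 (1 - T_{2i})^2.
\end{aligned}$$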

3. Estimation procedure under two-stage sampling design with stratification using RRM
To develop the estimation procedure in this section, the population is considered to be stratified into $L$ strata with $N_h$ clusters in the $h$th ($h = 1, 2, \ldots, L$) stratum. It is assumed that the size of the $i$th cluster in the $h$th stratum is $M_{hi}$ ($i = 1, 2, \ldots, N_h$). In the first stage, a sample of $n_h$ clusters is drawn from the $h$th stratum using PPSWR with selection probability $p_{hi}$. In the second stage, simple random samples of sizes $m_{hi}$ are selected with replacement from the $i$th ($i = 1, 2, \ldots, n_h$) cluster drawn from the $h$th stratum.

3.1. Estimation procedure when the proportion of a rare unrelated non-sensitive attribute is known for a stratified population
In this section, we extend the procedure discussed in Section 2.1 to a stratified population. The probability that respondents answer ‘yes’ in the $i$th sampled cluster of the $h$th stratum is defined as

$$\theta_{hi0} = T_{hi}\pi_{hi1} + (1 - T_{hi})[P_{1hi}\pi_{hi1} + P_{2hi}\pi_{hi2}]$$

where $T_{hi}$, $P_{1hi}$ and $P_{2hi}$ are the probabilities of being asked the questions, analogous to those described in Section 2.1, and $\pi_{hi1}$ and $\pi_{hi2}$ are the population proportions of the rare sensitive attribute $A_{h1}$ and the rare unrelated non-sensitive attribute $A_{h2}$, respectively. Hence, for large $m_{hi}$ and small $\theta_{hi0}$, $m_{hi}\theta_{hi0} = \lambda_{hi0}$ is finite. Let $y_{hi1}, y_{hi2}, \ldots, y_{him_{hi}}$ be a random sample of size $m_{hi}$ from the Poisson distribution with mean $\lambda_{hi0}$ from the $i$th cluster of the $h$th stratum. An estimator of the mean number of individuals who possess a rare sensitive attribute, $\lambda_{hi1}$, in the $i$th cluster of the $h$th stratum is then defined as follows:

$$\hat{\lambda}_{hi1} = \frac{1}{P_{1hi} + T_{hi}(1 - P_{1hi})}\left[\frac{1}{m_{hi}}\sum_{j=1}^{m_{hi}} y_{hij} - P_{2hi}(1 - T_{hi})\lambda_{hi2}\right] \tag{6}$$

Under the pps sampling scheme, the final estimator of $\lambda_1$ is defined as

$$\hat{\lambda}_{1\mathrm{sppswr}} = \sum_{h=1}^{L} W_h \frac{1}{n_h}\sum_{i=1}^{n_h}\hat{\lambda}_{hi1}.$$
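The stratified estimator is a weighted sum of within-stratum averages of the cluster-level estimates. The short sketch below shows that combination step only. The function name `stratified_estimate` is hypothetical, and the weights $W_h$ are treated as inputs supplied by the caller, since this section does not define them.

```python
def stratified_estimate(cluster_estimates, weights):
    """Combine cluster-level estimates into the stratified PPSWR estimator.

    cluster_estimates -- list over strata; entry h is the list of the n_h
                         cluster-level estimates (e.g. from estimator (6))
    weights           -- list of stratum weights W_h (assumed to be given)
    """
    if len(cluster_estimates) != len(weights):
        raise ValueError("need exactly one weight per stratum")
    total = 0.0
    for W, estimates in zip(weights, cluster_estimates):
        # (1 / n_h) * sum over the n_h sampled clusters of stratum h.
        total += W * sum(estimates) / len(estimates)
    return total
```

For example, with two strata holding estimates `[1.0, 3.0]` and `[4.0]` and weights `0.25` and `0.75`, the stratum means are 2.0 and 4.0 and the combined estimate is 3.5.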

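Theorem 3: The estimator $\hat{\lambda}_{1\mathrm{sppswr}}$ is unbiased for the mean number of individuals $\lambda_1$ with the variance

$$V(\hat{\lambda}_{1\mathrm{sppswr}}) = \sum_{h=1}^{L} W_h^2 \frac{1}{n_h M_{h0}}\left[\sum_{i=1}^{N_h} M_{hi}(\lambda_{hi1} - \lambda_{h1})^2 + \sum_{i=1}^{N_h}\frac{M_{hi}}{m_{hi}}\varphi_{hi}\right] \tag{7}$$

The unbiased estimate of the variance of $\hat{\lambda}_{1\mathrm{sppswr}}$ is simplified as

$$\hat{V}(\hat{\lambda}_{1\mathrm{sppswr}}) = \sum_{h=1}^{L} W_h^2 \frac{1}{n_h(n_h - 1)}\sum_{i=1}^{n_h}\left(\hat{\lambda}_{hi1} - \frac{\hat{\lambda}_{h1}}{M_0}\right)^2$$

where

$$\varphi_{hi} = \frac{\lambda_{hi1}}{[P_{1hi} + T_{hi}(1 - P_{1hi})]} + \frac{P_{2hi}(1 - T_{hi})\lambda_{hi2}}{[P_{1hi} + T_{hi}(1 - P_{1hi})]^2}$$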

3.2. Estimation procedure when the proportion of a rare unrelated non-sensitive attribute is unknown
In this section, it is assumed that $n_h$ clusters are selected from the $h$th stratum using the PPSWR scheme and that $\pi_{hi2}$ is unknown. Extending the two-stage Singh et al. [6] unrelated randomized response procedure, the probabilities that respondents answer ‘yes’ in the $i$th cluster of the $h$th stratum under the two two-stage randomization devices, applied in two repeated RR procedures to the same respondents, are given as

$$\theta_{hi1} = T_{1hi}\pi_{hi1} + (1 - T_{1hi})\{P_{1hi}\pi_{hi1} + P_{2hi}\pi_{hi2}\} \quad\text{and}\quad \theta_{hi2} = T_{2hi}\pi_{hi1} + (1 - T_{2hi})\{P_{4hi}\pi_{hi1} + P_{5hi}\pi_{hi2}\}$$

where $T_{1hi}$, $P_{1hi}$, $P_{2hi}$ and $T_{2hi}$, $P_{4hi}$, $P_{5hi}$ are the probabilities of being asked the questions, as described in Section 2.2, and $\pi_{hi1}$ and $\pi_{hi2}$ are the population proportions of the rare sensitive attribute $A_{h1}$ and the rare unrelated non-sensitive attribute $A_{h2}$, respectively. Since the two attributes are rare in the population, $m_{hi}\theta_{hi1} = \lambda^{*}_{hi1}$ and $m_{hi}\theta_{hi2} = \lambda^{*}_{hi2}$ are finite for large $m_{hi}$ and small $\theta_{hi1}$ and $\theta_{hi2}$. After derivation, the estimators of $\lambda_{hi1u}$ and $\lambda_{hi2u}$ are obtained as

$$\hat{\lambda}_{hi1u} = \frac{1}{m_{hi}A_{hi}}\left[P_{5hi}(1 - T_{2hi})\sum_{j=1}^{m_{hi}} y_{hi1j} - P_{2hi}(1 - T_{1hi})\sum_{j=1}^{m_{hi}} y_{hi2j}\right] \tag{8}$$

$$\hat{\lambda}_{hi2u} = \frac{1}{m_{hi}B_{hi}}\left[\{P_{4hi} + T_{2hi}(1 - P_{4hi})\}\sum_{j=1}^{m_{hi}} y_{hi1j} - \{P_{1hi} + T_{1hi}(1 - P_{1hi})\}\sum_{j=1}^{m_{hi}} y_{hi2j}\right] \tag{9}$$

where

$$A_{hi} = (P_{1hi}P_{5hi} - P_{1hi}P_{5hi}T_{1hi} + P_{5hi}T_{1hi})(1 - T_{2hi}) - (P_{2hi}P_{4hi} - P_{2hi}P_{4hi}T_{2hi} + P_{2hi}T_{2hi})(1 - T_{1hi}),$$

$$(P_{1hi}P_{5hi} - P_{1hi}P_{5hi}T_{1hi} + P_{5hi}T_{1hi})(1 - T_{2hi}) \neq (P_{2hi}P_{4hi} - P_{2hi}P_{4hi}T_{2hi} + P_{2hi}T_{2hi})(1 - T_{1hi})$$

and

$$B_{hi} = (P_{2hi}P_{4hi} - P_{2hi}P_{4hi}T_{2hi} + P_{2hi}T_{2hi})(1 - T_{1hi}) - (P_{1hi}P_{5hi} - P_{1hi}P_{5hi}T_{1hi} + P_{5hi}T_{1hi})(1 - T_{2hi}),$$

$$(P_{2hi}P_{4hi} - P_{2hi}P_{4hi}T_{2hi} + P_{2hi}T_{2hi})(1 - T_{1hi}) \neq (P_{1hi}P_{5hi} - P_{1hi}P_{5hi}T_{1hi} + P_{5hi}T_{1hi})(1 - T_{2hi}).$$

Finally, the proposed estimator of $\lambda_1$ is given as

$$\hat{\lambda}_{1\mathrm{sppswru}} = \sum_{h=1}^{L} W_h \frac{1}{n_h}\sum_{i=1}^{n_h}\hat{\lambda}_{hi1u}.$$

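Theorem 4: The estimator $\hat{\lambda}_{1\mathrm{sppswru}}$ is unbiased for $\lambda_1$ with the variance

$$V(\hat{\lambda}_{1\mathrm{sppswru}}) = \sum_{h=1}^{L} W_h^2 \frac{1}{n_h M_{h0}}\left[\sum_{i=1}^{N_h} M_{hi}(\lambda_{hi1} - \lambda_{h1})^2 + \sum_{i=1}^{N_h} M_{hi}\frac{\psi_{hi}^{(12)}}{m_{hi}\gamma_{hi}^2}\right] \tag{10}$$

The unbiased estimate of the variance of $\hat{\lambda}_{1\mathrm{sppswru}}$ is simplified as

$$\hat{V}(\hat{\lambda}_{1\mathrm{sppswru}}) = \sum_{h=1}^{L} W_h^2 \frac{1}{n_h(n_h - 1)}\sum_{i=1}^{n_h}\left(\hat{\lambda}_{hi1u} - \frac{\hat{\lambda}_{hiu}}{M_{h0}}\right)^2$$

where

$$\begin{aligned}
\psi_{hi}^{(12)} ={}& [\{P_{1hi} + T_{1hi}(1 - P_{1hi})\}(1 - T_{2hi})^2 P_{5hi}^2 + \{P_{4hi} + T_{2hi}(1 - P_{4hi})\}(1 - T_{1hi})^2 P_{2hi}^2] \\
&- 2P_{2hi}P_{5hi}(1 - T_{1hi})(1 - T_{2hi}) \times \{P_{1hi} + T_{1hi}(1 - P_{1hi})\}\{P_{4hi} + T_{2hi}(1 - P_{4hi})\} \\
&+ P_{2hi}P_{5hi}(1 - T_{1hi})(1 - T_{2hi}) \times \{P_{2hi}(1 - T_{1hi}) + P_{5hi}(1 - T_{2hi})\} \\
&- 2P_{2hi}^2 P_{5hi}^2 (1 - T_{1hi})^2 (1 - T_{2hi})^2
\end{aligned}$$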

4. Efficiency comparison
To show the dominance of the proposed estimators, empirical comparisons are made with the Lee et al. [8] estimators. The per cent relative efficiencies (PRE) of the proposed estimators are calculated with respect to the Lee et al. [8] estimators for the corresponding situations using the formula

$$\mathrm{PRE} = \frac{V(\text{Lee et al. estimator})}{V(\text{Proposed estimator})} \times 100$$

For the efficiency comparison, data similar to those given in the Lee et al. [8] paper are used.
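The PRE formula is a simple ratio, sketched here with a hypothetical helper name; the variances passed in the usage note are made-up illustrative numbers, not results from the paper or from Lee et al. [8].

```python
def percent_relative_efficiency(var_reference, var_proposed):
    """PRE = V(reference estimator) / V(proposed estimator) * 100."""
    if var_proposed <= 0:
        raise ValueError("the variance of the proposed estimator must be positive")
    return var_reference / var_proposed * 100.0
```

A value above 100 means the proposed estimator has the smaller variance; for instance, `percent_relative_efficiency(2.0, 1.0)` gives 200.0 for these illustrative inputs.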

5. Discussion of results and conclusions


The results are calculated for different choices of the parameters [($\lambda_{11}, \lambda_{12}, \lambda_{13}, \lambda_{14}, \lambda_{15}$), ($\lambda_{111}, \lambda_{121}, \lambda_{211}, \lambda_{221}, \lambda_{231}$)] for the known and unknown cases, respectively. We have considered different choices of probabilities [($T$ = 0.50, 0.60, 0.70, 0.80), ($P_1$ = 0.30, 0.40, 0.50, 0.60), ($P_2$ = 0.40, 0.30, 0.25, 0.20)] for the non-stratified population and [($T$ = 0.40, 0.50, 0.60, 0.70), ($P_1, P_2$ = 0.50, 0.25; 0.60, 0.20; 0.70, 0.15; 0.80, 0.10), ($P_4, P_5$ = 0.20, 0.30; 0.30, 0.25; 0.40, 0.20; 0.50, 0.15)] for the stratified population. The per cent relative efficiencies of the proposed estimators with respect to the Lee et al. [8] estimators are found to be greater than 100 in all the cases considered.
Following these results, it may be concluded that the proposed estimators based on the randomized response model, when the characteristic under study concerns stigmatized issues, are rewarding in terms of per cent relative efficiency and dominate the Lee et al. [8] estimators. Thus, the estimators suggested in this work may be used effectively to handle the problems of untruthful response or non-response that arise due to the sensitive nature of the characteristics, and may be recommended to survey practitioners for practical use.

Acknowledgements
The authors are thankful to the Indian Institute of Technology (Indian School of Mines), Dhanbad, for providing the necessary support to carry out the present research work. The authors are also grateful to the honourable Reviewers, Editor and Associate Editor for their valuable suggestions, which helped us in improving the quality of the paper.

Disclosure statement
No potential conflict of interest was reported by the authors.

References
[1] Warner SL. Randomized response: a survey technique for eliminating evasive answer bias. J Am
Stat Assoc. 1965;60(309):63–69.
[2] Greenberg BG, Abul-Ela ALA, Simmons WR, et al. The unrelated question randomized
response model: theoretical framework. J Am Stat Assoc. 1969;64(326):520–539.
[3] Chaudhuri A, Mukerjee R. Randomized response: theory and techniques. New York (NY):
Marcel Dekker; 1988.
[4] Kim JM, Elam ME. A stratified unrelated question randomized response model. Stat Pap.
2007;48(2):215–233.
[5] Singh HP, Tarray TA. A stratified unknown repeated trials in randomized response sampling.
Commun Stat Appl Methods. 2012;19(6):751–759.
[6] Singh S, Horn S, Singh R, et al. On the use of modified randomization device for estimating the
prevalence of a sensitive attribute. Stat Transition. 2003;6(4):515–522.
[7] Land M, Singh S, Sedory SA. Estimation of a rare sensitive attribute using Poisson distribution.
Statistics (Ber). 2012;46(3):351–360.
[8] Lee GS, Uhm D, Kim JM. Estimation of a rare sensitive attribute in probability proportional to
size measures using Poisson distribution. Statistics (Ber). 2014;48(3):685–709.
