Improved Estimator For The Estimation of Sensitive Variable Using ORRT Models
To cite this article: Sunil Kumar & Chanda Rani (2024) Improved estimator for the
estimation of sensitive variable using ORRT models, Research in Statistics, 2:1, 2318383, DOI:
10.1080/27684520.2024.2318383
Improved estimator for the estimation of sensitive variable using ORRT models
Sunil Kumar and Chanda Rani
Department of Statistics, University of Jammu, Jammu Tawi, J&K, India
CONTACT Chanda Rani chandaheer65@gmail.com Department of Statistics, University of Jammu, Jammu Tawi, J&K, India.
© 2024 The Author(s). Published with license by Taylor & Francis Group, LLC.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by
the author(s) or with their consent.
and Kour (2022). In this technique, the respondent gives a direct answer in the first phase, and an ORRT model is then used to obtain answers from a sub-group of non-respondents in the second phase. Therefore, the ORRT model in the second phase is

$$Z = \begin{cases} Y & \text{with probability } (1 - W) \\ TY + S & \text{with probability } W \end{cases} \tag{1}$$

with mean $E(Z) = E(Y)$ and variance $\mathrm{Var}(Z) = \sigma_y^2 + \sigma_S^2 W + \sigma_T^2(\sigma_y^2 + \mu_y^2)W$. The RRT model is

$$Z = (TY + S)J + Y(1 - J) \tag{2}$$

where $J \sim \mathrm{Bernoulli}(W)$ with $E(J) = W$ and $\mathrm{Var}(J) = W(1 - W)$. When $W = 1$, the randomized response becomes non-optional. Under the randomization mechanism, the mean and variance of $Z$ are

$$E_R(Z) = (\mu_T W + 1 - W)Y + \mu_S W \tag{3}$$

$$V_R(Z) = (Y^2\sigma_T^2 + \sigma_S^2)W \tag{4}$$

Let $\hat{y}_i$ be a transformation of the randomized response whose expectation under the randomization mechanism is the true response $y_i$, given as

$$\hat{y}_i = \frac{Z_i - \mu_S W}{\mu_T W + 1 - W} \tag{5}$$

with $E(\hat{y}_i) = y_i$ and $\mathrm{Var}(\hat{y}_i) = \dfrac{(y_i^2\sigma_T^2 + \sigma_S^2)W}{(\mu_T W + 1 - W)^2}$.

Based on the above discussion, we assume that only $n_1$ units respond on the first call and the remaining $n_2 = (n - n_1)$ units do not respond. A sub-sample of $n_s = n_2/f$ $(f > 1)$ units is then taken from the $n_2$ non-responding units. A modified version of the Hansen and Hurwitz estimator is given by

$$\bar{\hat{y}} = w_1\bar{y}_1 + w_2\bar{\hat{y}}_2, \tag{6}$$

where $\bar{y}_1$ is the mean of the respondents in the first phase and $\bar{\hat{y}}_2 = \frac{1}{n_s}\sum_{i=1}^{n_s}\hat{y}_i$ is the mean of the sub-sampled units in the second phase. Also, $w_1 = n_1/n$ and $w_2 = n_2/n$.

The mean and variance of $\bar{\hat{y}}$ are

$$E(\bar{\hat{y}}) = \bar{Y} \quad \text{and} \quad \mathrm{Var}(\bar{\hat{y}}) = \theta\sigma_y^2 + \lambda\sigma_{y(2)}^2 + G,$$

where $\theta = \dfrac{N - n}{Nn}$, $\lambda = \dfrac{W_2(f - 1)}{n}$,

$$G = \frac{W_2 f}{n}\left[\frac{\left\{\left(\sigma_{y(2)}^2 + \mu_{y(2)}^2\right)\sigma_T^2 + \sigma_S^2\right\}W}{(\mu_T W + 1 - W)^2}\right], \quad \text{and} \quad W_2 = \frac{N_2}{N}.$$

Moreover, let $(x_i, y_i, z_i)$ be the observed values and $(X_i, Y_i, Z_i)$ be the true values of the variables $X$, $Y$ and $Z$, respectively. Let $u$ be the measurement error (ME) on $Y$, $v$ the measurement error on $X$ and $p$ the measurement error on $Z$. The MEs on the $i$th observed unit are $u_i = y_i - Y_i$, $v_i = x_i - X_i$ and $p_i = z_i - Z_i$; they are assumed to be uncorrelated, with mean zero and variances $\sigma_u^2$, $\sigma_v^2$ and $\sigma_p^2$, respectively.

In the presence of non-response and ME, the variance of $\bar{\hat{y}}^*$ is given by

$$\mathrm{Var}(\bar{\hat{y}}^*) = \theta\left(\sigma_y^2 + \sigma_u^2\right) + \lambda\left(\sigma_{y(2)}^2 + \sigma_p^2\right) + G.$$

3. Existing mean estimator

Using the terminology of Section 2, suppose that the population mean and variance of the auxiliary variable $X$ are known and denoted by $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2$, respectively.

Let the population mean and variance of the respondent group of size $N_1$ be $\mu_{x(1)} = \frac{1}{N_1}\sum_{i=1}^{N_1} x_i$ and $\sigma_{x(1)}^2 = \frac{1}{N_1 - 1}\sum_{i=1}^{N_1}\left(x_i - \mu_{x(1)}\right)^2$, respectively, and let the population mean and variance of the non-respondent group of size $N_2$ be $\mu_{x(2)} = \frac{1}{N_2}\sum_{i=1}^{N_2} x_i$ and $\sigma_{x(2)}^2 = \frac{1}{N_2 - 1}\sum_{i=1}^{N_2}\left(x_i - \mu_{x(2)}\right)^2$, respectively. Further, let $\rho_{xy} = \dfrac{\sigma_{xy}}{\sigma_x\sigma_y}$ be the correlation coefficient between the auxiliary variable $X$ and the sensitive variable $Y$. Similarly, let $\rho_{xy(1)} = \dfrac{\sigma_{xy(1)}}{\sigma_{x(1)}\sigma_{y(1)}}$ and $\rho_{xy(2)} = \dfrac{\sigma_{xy(2)}}{\sigma_{x(2)}\sigma_{y(2)}}$ be the correlation coefficients between the auxiliary variable $X$ and the sensitive study variable $Y$ for the respondent group and the non-respondent group, respectively.

Assume that the population mean $\mu_x$ of $X$ is known and that non-response occurs on both $Y$ and $X$. Some of the existing mean estimators under the ORRT model are listed below:

1. A typical mean estimator for a sensitive variable in a finite population, based on the modified Hansen and Hurwitz (HH) estimator in the presence of measurement error, is

$$\hat{\mu}_{HH} = \bar{\hat{y}}^* = w_1\bar{y}_1 + w_2\bar{y}_2^*, \tag{7}$$

where $\bar{y}_2^* = \frac{1}{n_s}\sum_{i=1}^{n_s} z_i$.

In the presence of measurement error, the MSE of $\hat{\mu}_{HH}$ is given by

$$\mathrm{MSE}(\hat{\mu}_{HH}) = \theta\left(\sigma_y^2 + \sigma_u^2\right) + \lambda\left(\sigma_{y(2)}^2 + \sigma_p^2\right) + G \tag{8}$$

2. A ratio estimator corresponding to the Gupta et al. (2014) estimator, under the modified HH procedure in the presence of measurement error, is given by

$$\hat{\mu}_R = \frac{\bar{\hat{y}}^*}{\bar{x}^*}\mu_x = \hat{R}_W^*\mu_x, \tag{9}$$

where $\bar{\hat{y}}^*$ and $\bar{x}^*$ are the ordinary mean estimators under the original HH procedure.

The MSE of $\hat{\mu}_R$ is given by

$$\begin{aligned}\mathrm{MSE}(\hat{\mu}_R) ={}& \theta\left(\sigma_y^2 + R^2\sigma_x^2 - 2R\rho_{yx}\sigma_y\sigma_x\right) + \lambda\left(\sigma_{y(2)}^2 + R^2\sigma_{x(2)}^2 - 2R\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right) \\ &+ \theta\left(\sigma_u^2 + R^2\sigma_v^2\right) + \lambda\left(\sigma_p^2 + R^2\sigma_v^2\right) + G\end{aligned} \tag{10}$$

where $R = \dfrac{\mu_y}{\mu_x}$ and $\rho_{zx(2)} = \dfrac{\rho_{yx(2)}}{\sqrt{1 + \dfrac{\left\{\sigma_S^2 + \sigma_T^2\left(\sigma_{y(2)}^2 + \mu_{y(2)}^2\right)\right\}W}{\sigma_{y(2)}^2}}}$.
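The ORRT mechanism in (2), the transformation in (5), the modified HH estimator in (6) and the MSE expression in (8) can be illustrated with a short Python sketch. It is a minimal illustration, not the authors' code: the function names and the seed are hypothetical, the scrambling variables are assumed to be normal (as in the simulation study), and the numerator of the transformation is taken as $Z_i - \mu_S W$, the form implied by the expectation in (3). With $\mu_S = 0$, as used in the simulation study, that choice makes no difference.

```python
import random


def orrt_response(y, W, mu_T, var_T, mu_S, var_S, rng):
    """Model (2): report T*y + S with probability W, otherwise the true y."""
    if rng.random() < W:                      # J = 1: respondent scrambles
        T = rng.gauss(mu_T, var_T ** 0.5)     # ASSUMPTION: normal scrambling variables
        S = rng.gauss(mu_S, var_S ** 0.5)
        return T * y + S
    return y                                  # J = 0: direct answer


def unscramble(z, W, mu_T, mu_S):
    """Transformation (5); ASSUMES the numerator z - mu_S*W implied by (3)."""
    return (z - mu_S * W) / (mu_T * W + 1.0 - W)


def modified_hh(y1, y_hat_sub, n):
    """Estimator (6): w1 * mean(y1) + w2 * mean(y_hat_sub), with w2 = (n - n1) / n."""
    n1 = len(y1)
    w1, w2 = n1 / n, (n - n1) / n
    return w1 * sum(y1) / n1 + w2 * sum(y_hat_sub) / len(y_hat_sub)


def hh_mse(N, n, N2, f, W, var_y, var_y2, mu_y2, var_T, var_S,
           mu_T=1.0, var_u=0.0, var_p=0.0):
    """Expression (8): theta*(var_y + var_u) + lambda*(var_y2 + var_p) + G."""
    W2 = N2 / N
    theta = (N - n) / (N * n)
    lam = W2 * (f - 1) / n
    G = (W2 * f / n) * ((var_y2 + mu_y2 ** 2) * var_T + var_S) * W \
        / (mu_T * W + 1.0 - W) ** 2
    return theta * (var_y + var_u) + lam * (var_y2 + var_p) + G


if __name__ == "__main__":
    rng = random.Random(2024)                 # arbitrary seed, for reproducibility only
    y, W = 10.0, 0.6
    draws = [unscramble(orrt_response(y, W, 1.2, 0.5, 0.5, 0.5, rng), W, 1.2, 0.5)
             for _ in range(100_000)]
    # The average of the unscrambled reports should be close to the true value y.
    print("average unscrambled response:", sum(draws) / len(draws))
```

Setting `var_u` and `var_p` to zero in `hh_mse` gives the variance of the modified HH estimator without measurement error.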
The expression for the minimized MSE of the proposed estimator without ME may be obtained by putting $\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 0$ in the above expression, which gives

$$\begin{aligned}\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) ={}& \theta\left[(D^* + J^*)^2\sigma_x^2 - 2\rho_{yx}\sigma_y\sigma_x(D^* + J^*) + \sigma_y^2\right] \\ &+ \lambda\left[J^{*2}\sigma_{x(2)}^2 - 2J^*\rho_{zx(2)}\sigma_z\sigma_{x(2)} + \sigma_{y(2)}^2\right] + G\end{aligned} \tag{18}$$

where

$$D^* = \frac{1}{\lambda\sigma_{x(2)}^2}\left[\rho_{yx}\frac{\sigma_y}{\sigma_x}\left(\theta\sigma_x^2 + \lambda\sigma_{x(2)}^2\right) - \theta\rho_{yx}\sigma_y\sigma_x + R\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right]$$

and

$$J^* = \frac{R\lambda\rho_{zx(2)}\sigma_z\sigma_{x(2)}}{\lambda\sigma_{x(2)}^2}.$$

5. Efficiencies comparison

In this section, we compare the MSE of the proposed estimator with the MSEs of the existing estimators given in (8), (10), (12), and (17). The resulting conditions are:

1. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_{HH})$ if

$$\theta\left[(D + J)^2\sigma_x^2 + J^2\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J)\right] + \lambda\left[J^2\left(\sigma_{x(2)}^2 + \sigma_v^2\right) - 2J\rho_{zx(2)}\sigma_z\sigma_{x(2)}\right] < 0 \tag{19}$$

2. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_R)$ if

$$\begin{aligned}&\theta\left[\left\{(D + J)^2 - R^2\right\}\sigma_x^2 + \left(J^2 - R^2\right)\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J - R)\right] \\ &\quad + \lambda\left[\left(J^2 - R^2\right)\left(\sigma_{x(2)}^2 + \sigma_v^2\right) - 2\rho_{zx(2)}\sigma_z\sigma_{x(2)}(J - R)\right] < 0.\end{aligned} \tag{20}$$

3. $\mathrm{MSE}_{\min}(\hat{\tau}_{cs}) < \mathrm{MSE}(\hat{\mu}_{pw})$ if

$$\begin{aligned}&\theta\left[\left\{(D + J)^2 - P^2\right\}\sigma_x^2 + \left(J^2 - P^2\right)\sigma_v^2 - 2\rho_{yx}\sigma_y\sigma_x(D + J - P)\right] \\ &\quad + \lambda\left[\left(J^2 - P^2\right)\left(\sigma_{x(2)}^2 + \sigma_v^2\right) - 2\rho_{zx(2)}\sigma_z\sigma_{x(2)}(J - P)\right] < 0.\end{aligned} \tag{21}$$

If conditions (19)–(21) hold, the proposed estimator is more efficient than the other estimators considered.

6. Simulation study

In this section, we use a simulation study to compare the performance of the proposed estimator under SRSWOR with the usual unbiased estimator and the other two estimators considered.

For the simulation study, a data set consisting of a sensitive study variable $Y$ and an auxiliary variable $X$ is generated from a normal distribution using the model

$$Y = aX + \mathrm{rnorm}\left(N, \mu_y, \sigma_y^2\right), \quad \text{where } X = \mathrm{rnorm}\left(N, \mu_x, \sigma_x^2\right),$$

with $(\mu_y, \mu_x) = (0, 0)$, $a = 0.25$, and $\sigma_y^2$, $\sigma_x^2$ allowed to vary.

An artificial population of size $N = 5000$ is generated from the normal distribution, and a sample of size $n = 850$ is drawn under SRSWOR. It is assumed that only $n_1 = 450$ units respond and $n_2 = 400$ units do not respond in the first phase. In the second phase, we take a sub-sample of size $n_s = n_2/f$ $(f > 1)$ from the $n_2$ non-respondent units, using $f = 2, 3, 4, 5$. The results of this simulation study are given in Tables 1 and 2.

The scrambling variables $T$ and $S$ are taken to be normal with means 1 and 0, respectively, and with different variances. Table 1 reports results for $\sigma_T^2 = \sigma_S^2 = 0.5$ and Table 2 for $\sigma_T^2 = \sigma_S^2 = 1$.

Further, another artificial population, the one considered by Zhang et al. (2021), is used for comparison and to assess the performance of the proposed estimator relative to the other estimators considered. This population of size 5000 is generated from a bivariate normal distribution of $(Y, X)$ with mean vector and covariance matrix

$$\mu = \begin{pmatrix} 10 \\ 6 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 16 & 9.051 \\ 9.051 & 8 \end{pmatrix}, \quad \rho_{yx} = 0.8.$$

The generated population has

$$\mu_x = 6.0228, \quad \sigma_x^2 = 8.1830, \quad \mu_y = 9.9864, \quad \sigma_y^2 = 16.1215, \quad \rho_{yx} = 0.8024.$$

A sample of size $n = 500$ is drawn using SRSWOR; in the first phase, $n_1 = 200$ units respond and $n_2 = 300$ do not. In the second phase, we take a sub-sample of size $n_s = n_2/f$ $(f > 1)$ from the non-respondents, using $f = 2, 3, 4, 5$. The results of the simulation study based on Zhang et al. (2021) are given in Table 3.

Coding for the simulation was done in R. The Percent Relative Efficiency (PRE) of the proposed estimator $\hat{\tau}_{cs}$ with respect to the usual unbiased estimator $\hat{\mu}_{HH}$ and the two other estimators considered, $\hat{\mu}_R$ and $\hat{\mu}_{pw}$, is defined as

$$\mathrm{PRE} = \frac{\mathrm{MSE}^*(\hat{\mu}_{HH})}{\mathrm{MSE}^*(\hat{\mu}_i)} \times 100,$$

where $\hat{\mu}_i = \hat{\mu}_{HH}, \hat{\mu}_R, \hat{\mu}_{pw}$ and $\hat{\tau}_{cs}$.

From Tables 1 and 2, we compare the performance of the proposed estimator with the usual unbiased estimator and the two other estimators considered. In Table 1, when $\sigma_T^2 = \sigma_S^2 = 0.5$, we see that the PRE of our proposed estimator decreases with the increase

Table 1. PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.

W     f    $\hat{\mu}_{HH}$   $\hat{\mu}_R$   $\hat{\mu}_{pw}$   $\hat{\tau}_{cs}$
0.2   2    100.000   76.41382   100.0061   100.0119
0.2   3    100.000   75.75772   100.0056   100.0091
0.2   4    100.000   75.37299   100.0054   100.0074
0.2   5    100.000   76.57487   100.0048   100.0062
0.4   2    100.000   84.91888   100.0051   100.0101
0.4   3    100.000   84.50677   100.0047   100.0076
0.4   4    100.000   84.19802   100.0045   100.0062
0.4   5    100.000   85.03562   100.004    100.0051
0.6   2    100.000   92.21496   100.0043   100.0083
0.6   3    100.000   92.04403   100.0039   100.0061
0.6   4    100.000   91.85606   100.0036   100.0048
0.6   5    100.000   92.34802   100.0031   100.0038
0.8   2    100.000   97.32849   100.0034   100.0064
0.8   3    100.000   97.29573   100.0031   100.0042
0.8   4    100.000   97.2229    100.0028   100.003
0.8   5    100.000   97.42204   100.0023   100.0018
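The simulation design described in this section can be sketched in Python as follows. The authors' simulation was coded in R and is not reproduced in the text, so everything here is an illustrative assumption: the function names, the seed, the number of replications, and in particular the rule that the first $n_1$ sampled units are the respondents (non-response completely at random). The sketch uses the bivariate normal population parameters quoted from Zhang et al. (2021), ignores measurement error, and computes the PRE only for $\hat{\mu}_{HH}$ and $\hat{\mu}_R$, because the definitions of $\hat{\mu}_{pw}$ and $\hat{\tau}_{cs}$ are not given in this part of the paper.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def make_population(N, rng):
    """(Y, X) pairs from a bivariate normal: means (10, 6), variances (16, 8), covariance 9.051."""
    slope = 9.051 / 8.0                           # Cov(Y, X) / Var(X)
    resid_sd = (16.0 - 9.051 ** 2 / 8.0) ** 0.5   # standard deviation of Y given X
    pop = []
    for _ in range(N):
        x = rng.gauss(6.0, 8.0 ** 0.5)
        y = 10.0 + slope * (x - 6.0) + rng.gauss(0.0, resid_sd)
        pop.append((y, x))
    return pop


def scramble(y, W, var_T, var_S, rng):
    """ORRT report with T ~ N(1, var_T) and S ~ N(0, var_S)."""
    if rng.random() < W:
        return rng.gauss(1.0, var_T ** 0.5) * y + rng.gauss(0.0, var_S ** 0.5)
    return y


def one_replicate(pop, mu_x, n, n1, f, W, var_T, var_S, rng):
    """Return (mu_HH, mu_R) computed from one SRSWOR sample."""
    sample = rng.sample(pop, n)                   # simple random sample without replacement
    resp, nonresp = sample[:n1], sample[n1:]      # ASSUMPTION: non-response is completely random
    sub = rng.sample(nonresp, len(nonresp) // f)  # second-phase sub-sample, n_s = n2 / f
    w1, w2 = n1 / n, (n - n1) / n
    # With mu_T = 1 and mu_S = 0 the unscrambling divisor is 1, so y_hat = z.
    y_bar = w1 * mean(y for y, _ in resp) + w2 * mean(
        scramble(y, W, var_T, var_S, rng) for y, _ in sub)
    x_bar = w1 * mean(x for _, x in resp) + w2 * mean(x for _, x in sub)
    return y_bar, y_bar / x_bar * mu_x


def pre(mse_ref, mse_est):
    """Percent relative efficiency of an estimator against a reference MSE."""
    return 100.0 * mse_ref / mse_est


def simulate(reps=200, N=5000, n=500, n1=200, f=2, W=0.8,
             var_T=0.5, var_S=0.5, seed=1):
    rng = random.Random(seed)                     # arbitrary seed, for reproducibility only
    pop = make_population(N, rng)
    mu_y, mu_x = mean(y for y, _ in pop), mean(x for _, x in pop)
    sq_err_hh, sq_err_r = [], []
    for _ in range(reps):
        hh, ratio = one_replicate(pop, mu_x, n, n1, f, W, var_T, var_S, rng)
        sq_err_hh.append((hh - mu_y) ** 2)
        sq_err_r.append((ratio - mu_y) ** 2)
    mse_hh, mse_r = mean(sq_err_hh), mean(sq_err_r)
    return {"HH": pre(mse_hh, mse_hh), "R": pre(mse_hh, mse_r)}


if __name__ == "__main__":
    for f in (2, 3, 4, 5):
        print(f, simulate(f=f))
```

Because the non-response rule, the number of replications and the random number generator all differ from the authors' setup, the values printed by this sketch are not expected to match the published tables.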
Table 2. PRE of the proposed estimators with respect to existing estimators for different values of f and W using ORRT models.

W     f    $\hat{\mu}_{HH}$   $\hat{\mu}_R$   $\hat{\mu}_{pw}$   $\hat{\tau}_{cs}$
0.2   2    100.000   85.42602   100.0049   100.0097
0.2   3    100.000   85.01259   100.0046   100.0073
0.2   4    100.000   84.72413   100.0043   100.0059
0.2   5    100.000   85.60064   100.0038   100.0048
0.4   2    100.000   97.52038   100.0032   100.0059
0.4   3    100.000   97.48531   100.0028   100.0039
0.4   4    100.000   97.41648   100.0026   100.0028
0.4   5    100.000   97.62159   100.0021   100.0016
0.6   2    100.000   99.69804   100.0018   100.0007
0.6   3    100.000   99.70426   100.0015   99.99723
0.6   4    100.000   99.69732   100.0013   99.99538
0.6   5    100.000   99.72319   100.0007   99.99238
0.8   2    100.000   95.12491   100.0009   100.0022
0.8   3    100.000   95.3436    100.0006   100.0015
0.8   4    100.000   95.20721   100.0005   100.0012
0.8   5    100.000   95.77384   99.9998    100.0009

Table 3. PRE of the proposed estimators with respect to existing estimators for different values of f and W = 0.8 using ORRT models. Also ($\sigma_v^2 = \sigma_u^2 = \sigma_p^2 = 1, 5, 10$).

proposed estimator is more efficient than the existing estimators. The simulation study also supports the theoretical results, except when the probability of the question being sensitive is moderately high (i.e., W = 0.6); in this situation the Zhang et al. (2021) estimator ($\hat{\mu}_{pw}$) is more efficient. Based on the results obtained, we recommend the suggested estimator to researchers and practitioners for future use.

Acknowledgments

The authors express their sincere gratitude to the reviewers for their constructive suggestions, which helped improve the presentation of the paper.

Data availability statement

No real data is used in the paper.

Disclosure statement

The authors declare that they have no conflicts of interest.
Kumar S, Kour SP. 2022. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J Stat Comput Simul. 92:3583–3604.
Kumar S, Singh HP, Bhougal S, Gupta R. 2011. A class of ratio-cum-product type estimators under double sampling in the presence of non-response. J Math Stat. 40:589–599.
Kumar S, Trehan M, Joorel JPS. 2018. A simulation study: estimation of population mean using two auxiliary variables in stratified random sampling. J Stat Comput Simul. 88:3694–3707.
Kumar S. 2016. Improved estimation of population mean in presence of nonresponse and measurement error. J Stat Theory Pract. 10:707–720.
Makhdum M, Sanaullah A, Hanif M. 2020. A modified regression-cum-ratio estimator of population mean of a sensitive variable in the presence of non-response in simple random sampling. J Stat Manag Syst. 23:495–510.
Singh N, Vishwakarma GK, Kim JM. 2019. Computing the effect of measurement errors on efficient variant of the product and ratio estimators of mean using auxiliary information. Commun Stat Simul Comput. 51:1–22.
Singh N, Vishwakarma GK. 2019. A generalized class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey. Rev Investig Oper. 40:275–285.
Singh SR, Sharma P. 2015. Method of estimation in the presence of non-response and measurement errors simultaneously. J Mod App Stat Meth. 14:107–121.
Zhang Q, Khalil S, Gupta S. 2021. Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. J Stat Theory Pract. 15:1–15.