
Lecture 11: Sensitive Questions and the Technique of Randomised Response

 Motivation
 The Warner Model
 The Unrelated Question Model
 Other Approaches

References:
 See within slides...
Motivation

The survey taker wishes for the sampling units (responders)
to give accurate information in response to the questions.
 If not, the data will have potentially serious respondent errors,
resulting in BIAS.

Despite assurances of confidentiality, the likelihood of a misleading
response increases when questions are considered
sensitive by the responder:
 BUT (as an example) accurate estimates of the number of
people with AIDS are necessary for planning expenditure on
treatment clinics, hospitals and hospices.
Other Examples:
 tax evasion, illegal earnings, illegal drug use,...
Motivation...

...internal crime comprises a significant amount of all crime against
business, which means that the offenders are not mainstream offenders
but are ordinary people like us...The seminal work in this area by Clark
and Hollinger (1983) showed the considerable extent of employee
involvement in what they describe as property deviance (which includes
theft) and production deviance (which includes misconduct such as not
working all the time, being late for work, taking long lunches and so on). ...
Self report surveys are often used to gauge the level of internal crime but
even they may understate its true extent.

Dennis Challinger, Coles Myer Ltd, The Realities of Crime Against
Business, paper presented at the conference Crime Against Business,
convened by the Australian Institute of Criminology, June 1998.
Motivation...

In 1965 Warner introduced the technique of randomised response to attempt
to reduce or ideally eliminate the bias from non-response and false responses
that occurs when you directly question respondents about sensitive issues.
Warner, S.L. (1965). Randomised response: a survey technique for eliminating evasive
answer bias. J. Amer. Statist. Assoc., 60, 63-69.

The Warner Model


A randomisation device is used to determine whether the
respondent replies to the question ‘Do you belong to A?’ (where
A is the set of people who have the sensitive attribute) or to the
question ‘Do you belong to AN?’ (where AN is the set of people
who don't have the sensitive attribute). The interviewer does not
know which of the two questions the respondent has answered,
and so the respondent has complete confidentiality.
The Warner Model – mathematical structure

Let A represent our sensitive group in the population


 AN (not A) represents those not in the sensitive group

We want to estimate πA
 the proportion of the population in group A.

Randomise with device (spinner, dice, cards,...):

Probability   Question                P(Yes)    P(No)
p             Do you belong to A?     πA        1 − πA
1 − p         Do you belong to AN?    1 − πA    πA

Let µ be the proportion that says yes under this structure:
\[ \mu = p\,\pi_A + (1-p)(1-\pi_A) \]
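
As a rough check on this structure, the sketch below simulates truthful respondents under Warner's design and compares the observed share of Yes answers with µ = p πA + (1 − p)(1 − πA). The function names, the seed and the parameter values are illustrative assumptions, not part of the original slides.

```python
import random


def expected_yes_rate(pi_a, p):
    """Theoretical proportion answering Yes under Warner's design."""
    return p * pi_a + (1 - p) * (1 - pi_a)


def simulate_warner_yes_rate(pi_a, p, n, seed=0):
    """Simulate n truthful respondents and return the share answering Yes."""
    rng = random.Random(seed)
    yes = 0
    for _ in range(n):
        in_a = rng.random() < pi_a         # hidden true group membership
        asked_about_a = rng.random() < p   # outcome of the randomising device
        # Yes if asked about A and in A, or asked about AN and not in A.
        if asked_about_a == in_a:
            yes += 1
    return yes / n


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(expected_yes_rate(0.2, 0.7))
    print(simulate_warner_yes_rate(0.2, 0.7, 100_000, seed=1))
```

The interviewer only ever sees the Yes/No answer, never `asked_about_a`, which is what protects the respondent.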


The Warner Model – mathematical structure...

After applying this structure within a sample we get $\hat{\mu} = n_Y/n$, an
unbiased estimate of µ
 nY is the number of respondents that answer yes
 n is the total sample size

\[ \hat{\pi}_{AW} = \frac{\hat{\mu} - (1-p)}{2p-1}, \qquad p \neq 0.5 \]
where $\hat{\pi}_{AW}$ is our estimate of the proportion with attribute A using
the Warner model...
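
A minimal sketch of the Warner point estimate, obtained by solving µ = p πA + (1 − p)(1 − πA) for πA and plugging in the sample proportion of Yes answers. The function name and the figures in the usage lines are hypothetical.

```python
def warner_estimate(n_yes, n, p):
    """Warner estimate of pi_A from n_yes Yes answers out of n respondents.

    p is the probability that the device selects 'Do you belong to A?'.
    """
    if p == 0.5:
        raise ValueError("estimate is undefined for p = 0.5 (division by 2p - 1)")
    mu_hat = n_yes / n                      # observed proportion answering Yes
    return (mu_hat - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    # Hypothetical survey: 40 Yes answers from 100 respondents with p = 0.7.
    print(warner_estimate(40, 100, 0.7))
```

Because sampling noise can push the sample proportion outside the range the model allows, the raw estimate can fall below 0 or above 1 and may need truncating in practice.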
The Warner Model – mathematical structure...

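We can see that the Warner estimate $\hat{\pi}_{AW}$, which divides by
$2p-1$, is undefined for p = 0.5, so we cannot just use a coin toss...
 IF p = 1 then $\hat{\pi}_{AW} = \hat{\mu} = \hat{\pi}_A$, where $\hat{\pi}_A$ is the estimate we
should get by just directly asking ‘do you belong to A’?
 $\mathrm{Var}(\hat{\pi}_A) = \pi_A(1-\pi_A)/n$, the binomial variance assuming truthful
answers to the direct question
 $\mathrm{Var}(\hat{\pi}_{AW}) = \frac{\pi_A(1-\pi_A)}{n} + \frac{p(1-p)}{n(2p-1)^2}$, where the second part is the
increased uncertainty (penalty) due to the randomisation
 $\widehat{\mathrm{Var}}(\hat{\pi}_{AW}) = \frac{\hat{\mu}(1-\hat{\mu})}{n(2p-1)^2}$, where $\hat{\mu} = n_Y/n$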
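
One way to see where the penalty term comes from is sketched below. It assumes simple random sampling with truthful answers to whichever question the device selects, so that $n_Y$ is binomial with success probability $\mu$; the subscript in $\hat{\pi}_{AW}$ is notation chosen here for the Warner estimate.

```latex
\begin{align*}
\mu &= p\,\pi_A + (1-p)(1-\pi_A) = (1-p) + (2p-1)\,\pi_A \\
\hat{\pi}_{AW} &= \frac{\hat{\mu} - (1-p)}{2p-1}
  \quad\Longrightarrow\quad
  \mathrm{Var}(\hat{\pi}_{AW}) = \frac{\mathrm{Var}(\hat{\mu})}{(2p-1)^2}
  = \frac{\mu(1-\mu)}{n\,(2p-1)^2} \\
\mu(1-\mu) &= \bigl[(1-p) + (2p-1)\pi_A\bigr]\bigl[p - (2p-1)\pi_A\bigr]
  = (2p-1)^2\,\pi_A(1-\pi_A) + p(1-p) \\
\mathrm{Var}(\hat{\pi}_{AW}) &= \frac{\pi_A(1-\pi_A)}{n} + \frac{p(1-p)}{n\,(2p-1)^2}
\end{align*}
```

The first term is the direct-question binomial variance and the second is the extra variance introduced by the randomising device; it vanishes when p = 1 (or p = 0).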
Example

A study is designed to estimate the proportion of people in a
certain district who give false information on income tax returns.
The researcher constructs a deck of cards in which 25% of the
cards are marked F, denoting a false return, and 75% are marked
C, denoting a correct return. A simple random sample of 100
people is selected from the large population of taxpayers in the
district. In separate interviews each sampled taxpayer is asked to
draw a card from the deck (not showing it to the interviewer) and
to respond Yes if the letter agrees with the group to which he or
she belongs. The experiment results in 72 Yes responses.
 Estimate the proportion of taxpayers in the district who have
falsified returns
 Find the 95% confidence interval for this estimate
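Example...

 n = 100, nY = 72, A = false return, πA = proportion with false
return, p = 0.25 (proportion facing ‘do you make a false return?’)
 $\hat{\mu} = n_Y/n = 72/100 = 0.72$ gives the observed proportion that say yes
 $\hat{\pi}_{AW} = \frac{\hat{\mu} - (1-p)}{2p-1} = \frac{0.72 - 0.75}{0.5 - 1} = 0.06$
 $\widehat{\mathrm{se}}(\hat{\pi}_{AW}) = \sqrt{\frac{\hat{\mu}(1-\hat{\mu})}{n(2p-1)^2}} = \sqrt{\frac{0.72 \times 0.28}{100 \times 0.25}} \approx 0.09$
 100(1 − α)% Confidence Interval given by $\hat{\pi}_{AW} \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\pi}_{AW})$
 In our case α = 0.05 so $z_{\alpha/2}$ = 1.96
o 95% CI = 0.06 ± 1.96×0.09 = (‘-0.1164’, 0.2364)
o A proportion cannot be negative, so the lower limit is truncated at zero:
95% CI (0, 0.24)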
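
The arithmetic of this example can be reproduced with the short sketch below. It assumes the plug-in variance estimate µ̂(1 − µ̂) / (n(2p − 1)²) with divisor n (the divisor n − 1 gives the same standard error to two decimals here) and truncates the interval to [0, 1]; the variable names are illustrative.

```python
import math

n, n_yes, p = 100, 72, 0.25        # sample size, Yes answers, share of F cards
z = 1.96                           # normal quantile for a 95% interval

mu_hat = n_yes / n                                     # observed Yes proportion
pi_hat = (mu_hat - (1 - p)) / (2 * p - 1)              # Warner estimate
se = math.sqrt(mu_hat * (1 - mu_hat) / (n * (2 * p - 1) ** 2))

lower, upper = pi_hat - z * se, pi_hat + z * se
ci = (max(0.0, lower), min(1.0, upper))                # a proportion lies in [0, 1]

print(f"estimate = {pi_hat:.2f}, se = {se:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Using the unrounded standard error, the untruncated limits can differ from the hand calculation in the third or fourth decimal place, because the hand calculation rounds the standard error to 0.09 first.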
Efficiency Issues

Warner noted that there is a conflict between efficiency and
confidentiality.
 If p is close to one (or zero) then the second term in the
variance is small
o BUT the respondent has little protection (privacy), so the chance of
a truthful response may decline...
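
To make the trade-off concrete, the sketch below tabulates the two parts of the Warner variance, πA(1 − πA)/n and the randomisation penalty p(1 − p) / (n(2p − 1)²), for several values of p. The function name and the values πA = 0.1 and n = 100 are assumptions chosen only for illustration.

```python
def warner_variance_parts(pi_a, p, n):
    """Return (sampling variance, randomisation penalty) for Warner's estimator."""
    if p == 0.5:
        raise ValueError("variance is unbounded at p = 0.5")
    sampling = pi_a * (1 - pi_a) / n
    penalty = p * (1 - p) / (n * (2 * p - 1) ** 2)
    return sampling, penalty


# Hypothetical setting: true proportion 0.1, sample of 100 respondents.
for p in (0.55, 0.6, 0.7, 0.8, 0.9, 1.0):
    sampling, penalty = warner_variance_parts(0.1, p, 100)
    print(f"p = {p:.2f}  sampling = {sampling:.5f}  penalty = {penalty:.5f}")
```

The penalty is symmetric in p and 1 − p, grows without bound as p approaches 0.5, and is zero at p = 1, where the design reduces to a direct question.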
Alternative Approaches

There are many extensions of the randomised response model.


The Unrelated Question Model randomises the respondent to
answer either the sensitive question or an unrelated question. A
trial of this method took place in North Carolina in 1965. The two
questions were:
 A (so called stigmatising attribute): There was a baby born in
this household after January 1, 1965, to an unmarried woman
who was living here.
 Y (unrelated and innocuous attribute): I was born in North
Carolina.
The Unrelated Question Model

We want to estimate πA
 the proportion of the population in group A.

Randomise with device (spinner, dice, cards,...):

Probability   Question                P(Yes)    P(No)
p             Do you belong to A?     πA        1 − πA
1 − p         Do you belong to Y?     πY        1 − πY

Let µ be the proportion that says yes under this structure:
\[ \mu = p\,\pi_A + (1-p)\,\pi_Y \]


The Unrelated Question Model...

In this case πY = proportion born in NC


 Assume this proportion is known...

After applying this structure within a sample we get $\hat{\mu} = n_Y/n$, an
unbiased estimate of µ
 nY is the number of respondents that answer yes
 n is the total sample size

\[ \hat{\pi}_{AU} = \frac{\hat{\mu} - (1-p)\,\pi_Y}{p} \]
where $\hat{\pi}_{AU}$ is our estimate of the proportion with attribute A using
the Unrelated Question model...
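
A minimal sketch of the Unrelated Question estimate, assuming πY is known and solving µ = p πA + (1 − p) πY for πA. The function name and the survey figures in the usage lines are hypothetical, not results from the North Carolina trial.

```python
def unrelated_question_estimate(n_yes, n, p, pi_y):
    """Estimate pi_A when the alternative question concerns an innocuous attribute Y.

    p    : probability the device selects the sensitive question
    pi_y : known population proportion with the innocuous attribute
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    mu_hat = n_yes / n                     # observed proportion answering Yes
    return (mu_hat - (1 - p) * pi_y) / p


if __name__ == "__main__":
    # Hypothetical figures: 70 Yes answers from 200 respondents,
    # p = 0.7 and a known innocuous proportion pi_Y = 0.6.
    print(unrelated_question_estimate(70, 200, 0.7, 0.6))
```

Unlike the Warner estimate, this one divides by p rather than 2p − 1, so p = 0.5 causes no difficulty.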

Applications

Some applications of randomised response techniques:


 study of organised crime in Illinois 1975
 study to estimate annual incidence of at-fault automobile
accidents among licensed drivers who drink alcoholic
beverages
 study of deliberate concealment of deaths (in household
surveys of vital events in the Philippines) 1976
 measuring drug use among Swedish adolescents 1987
 estimation of smuggled liquor in Norway 1994
Applications...

Approach prominent in social and behavioural sciences


 The most effective means of reducing misreporting seem to
be self-administration and the randomised response
technique. With self-administration, the interviewer is not
aware of the respondent's answer; with the randomised
response procedure, the interviewer is unaware of the
question. Either way, the threat of the interviewer's
disapproval is eliminated, and that appears to be a key
consideration that motivates respondents to report erroneous
information deliberately.
Roger Tourangeau
Applications...

However, there have been some reservations about the
technique. The Commonwealth Department of Health
commissioned questions on legal and illegal drug usage in the
September 1978 round of the Canberra Population Survey.
 ½ the sample got direct questions
 ½ the sample got direct questions, except that the last question,
on marijuana use, was asked by randomised response (RR)

RESULTS: RR got lower estimates across all population
sub-groups... WHY?
 Embarrassment, other???
o RR broke flow of questionnaire and wasn’t taken seriously
o Highlighted the question was potentially sensitive...
Unmatched Count Technique

Dalton, D.R., Wimbush, J.C. and Daily, C.M. (1994). Using the
unmatched count technique (UCT) to estimate base rates for
sensitive behaviour. Personnel Psychology, 47, 817-828.
 Half of the sample is presented with a set of innocuous
statements and each respondent has to record how many of
these statements are true. The other half of the sample
receives a set of statements composed of the original
innocuous statements together with one sensitive question.
Again each respondent has to record how many of these
statements are true.
o Given random assignment to groups, the estimate of the
base rate for the sensitive attribute is the difference
between means for the two halves of the sample.
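
The difference-in-means estimate described above can be sketched as follows. The function name and the reported counts are made up for illustration and are not data from Dalton, Wimbush and Daily (1994).

```python
from statistics import mean


def uct_estimate(control_counts, treatment_counts):
    """Unmatched count estimate of the base rate for the sensitive item.

    control_counts   : number of true statements reported by respondents who
                       saw only the innocuous statements
    treatment_counts : the same count from respondents whose list also
                       contained the sensitive statement
    """
    return mean(treatment_counts) - mean(control_counts)


if __name__ == "__main__":
    # Hypothetical reported counts for two small half-samples.
    control = [1, 2, 3, 2, 2]
    treatment = [2, 3, 3, 2, 3]
    print(uct_estimate(control, treatment))
```

No individual count reveals whether the sensitive statement was among those a respondent endorsed, unless the respondent reports that all (or none) of the statements are true.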
Examples of the sets of questions:

Set 1
 I have been to Spain
 I would consider myself to be a sports fan
 I have a brother
 I have more than one sister
 I have read the book The Pelican Brief
 I have engaged in self dealing in the auctioneer business
Set 2
 I could tell you what time 22:00 is
 I have used a phantom bid in the auctioneer business
 I could tell you the name of the head football coach of the Dallas Cowboys
 I do not normally eat breakfast
 I could tell you what a canonical correlation is
 If I were to get a pet (or another pet) I would prefer to get a cat or a dog
Set 3
 I usually take a vitamin supplement almost each day
 I have engaged in conspiracy nondisclosure in the auctioneer
business
 I have read the book The Prince
 There is a shotgun in the house/apartment where I currently live
 I have lived in three or more states
 I have lived in a country other than the US

A recent research study involved samples of people who had worked in high
theft exposure businesses being given either a conventional self-report survey
instrument or one based on one of two psychological techniques: randomised
response technique (RRT) or the unmatched count technique (UCT). That
produced overall theft admission of:
 28.2% with the self report instrument
 59.2% (RRT) and 57.9% (UCT) with the other instruments.
Dennis Challinger, Coles Myer Ltd, The Realities of Crime Against Business,
paper presented at the conference Crime Against Business, convened by the
Australian Institute of Criminology, June 1998.
Block Total Response Procedure

Smith, L.L., Federer, W.T. and Raghavarao, D. (1974). A
comparison of three techniques for eliciting truthful responses to
sensitive questions. In Proceedings of the Social Statistics
Section, pp. 447-452. American Statistical Association.
 There are v questions in total.
 Each answer to each question has a score attached to it.
 A randomising device is used to give each respondent one of b
sets of k questions to answer.
 The respondent reports only the total score for their set of
questions.
Example

1. Are you under 21 years of age? Yes (0) No (1)
2. Did you cheat in any way on the Stat 200 prelim last week? Yes (2) No (3)
3. Are you happy with your decision to come to Cornell? Yes (1) No (0)
4. While at Cornell, have you ever stolen money (or any article worth over
$5) from a room-mate, friend, employer or anyone else? Yes (3) No (2)
5. Does your parent earn more than $25000 a year? Yes (0) No (1)
6. Have you smoked any marijuana during the past two weeks? Yes (2) No (3)
7. Are you enrolled in the College of Agricultural and Life Sciences? Yes (1) No (0)

7 sets of questions: 1 2 4; 2 3 5; 3 4 6; 4 5 7; 5 6 1; 6 7 2; 7 1 3.
 Research undertaken by Smith and Street (at UTS) determined the best balanced
incomplete block design to use to estimate the base rates for 3, 4, 5 and 6
sensitive attributes respectively, given a maximum total number of 13 questions.
The estimates obtained have smaller variance than estimates obtained using the
unmatched count technique.
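
The seven sets of questions listed above (v = 7, b = 7, k = 3) can be checked for balance with the sketch below, which counts how often each question and each pair of questions appears. It also computes the total a respondent would report for a set. The function names and the answer patterns are illustrative assumptions; the scores are copied from the example questions.

```python
from collections import Counter
from itertools import combinations

# The seven sets of three questions from the example.
BLOCKS = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2), (7, 1, 3)]

# Score attached to each answer: question -> (score for Yes, score for No).
SCORES = {1: (0, 1), 2: (2, 3), 3: (1, 0), 4: (3, 2), 5: (0, 1), 6: (2, 3), 7: (1, 0)}


def design_summary(blocks):
    """Count appearances of each question and of each pair of questions."""
    replication = Counter(q for block in blocks for q in block)
    pair_counts = Counter(
        frozenset(pair) for block in blocks for pair in combinations(block, 2)
    )
    return replication, pair_counts


def block_total(block, answers):
    """Total score reported for one set; answers maps question -> True for Yes."""
    return sum(SCORES[q][0] if answers[q] else SCORES[q][1] for q in block)


if __name__ == "__main__":
    replication, pair_counts = design_summary(BLOCKS)
    print(dict(replication))
    print(set(pair_counts.values()))
    # Two different hypothetical answer patterns for set (1, 2, 4).
    print(block_total((1, 2, 4), {1: True, 2: False, 4: False}))
    print(block_total((1, 2, 4), {1: False, 2: True, 4: False}))
```

Different answer patterns can give the same total, which is what hides an individual's answers to the sensitive questions from the interviewer.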
Just to Finish...

These approaches can be a useful way to ask sensitive questions


 Work well when the survey is just focusing on that aspect
BUT
 With a mainstream interviewer survey, often just offering a
self-completion mode within the survey interview works
better...

There is always a trade-off between efficiency and privacy


 IF p close to 0.5 then lots of privacy BUT low efficiency
 IF p close to one (or zero) then maximum efficiency but less
privacy
