Computers in Human Behavior 19 (2003) 1–24

www.elsevier.com/locate/comphumbeh

Humanizing self-administered surveys: experiments on social presence in web and IVR surveys§
Roger Tourangeau a,*, Mick P. Couper a, Darby M. Steiger b

a Survey Research Center, University of Michigan, Joint Program in Survey Methodology, University of Maryland, 1218 LeFrak Hall, College Park, MD 20742, USA
b The Gallup Organization, 200 Public Square, Suite 2020, Cleveland, OH 44114, USA

Abstract
Social interface theory has had widespread influence within the field of human–computer
interaction. The basic thesis is that humanizing cues in a computer interface can engender
responses from users similar to those produced by interactions between humans. These
humanizing cues often confer human characteristics on the interface (such as gender) or sug-
gest that the interface is an agent actively interacting with the respondent. In contrast, the
survey interviewing literature suggests that computer administration of surveys on highly
sensitive topics reduces or eliminates social desirability effects, even when such humanizing
features as recorded human voices are used. In attempting to reconcile these apparently con-
tradictory findings, we varied features of the interface in two Web surveys and a telephone
survey. In the first Web experiment, we presented an image of (1) a male researcher, (2) a
female researcher, or (3) the study logo at several points throughout the questionnaire. This
experiment also varied the extent of personal feedback provided to the respondent. The sec-
ond Web study compared three versions of the survey: (1) One that included a photograph of
a female researcher and text messages from her; (2) another version that included only the text
messages; and (3) a final version that included neither the picture nor the personalizing mes-
sages. Finally, we carried out a telephone study using a method—interactive voice response
(IVR)—in which the computer plays a recording of the questions over the telephone and
respondents indicate their answers by pressing keys on the telephone handset. The IVR study
varied the voice that administered the questions. All three surveys used questionnaires that
included sensitive questions about sexual behavior and illicit drug use and questions on gen-
der-related attitudes.

We find limited support for the social interface hypothesis. There are consistent, though
small, effects of the "gender" of the interface on reported gender attitudes, but few effects
on socially desirable responding. We propose several possible reasons for the contradictory
evidence on social interfaces.
© 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Social interfaces; Web surveys; IVR surveys; Social desirability

§ An earlier version of this paper was originally presented at the CHI '01 Conference on Human Factors in Computing Systems in Seattle, Washington in April, 2001.
* Corresponding author.
E-mail address: rtourangeau@survey.umd.edu (R. Tourangeau).

0747-5632/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.
PII: S0747-5632(02)00032-8

1. Introduction

Two distinct research traditions have examined how features of the interfaces of
computerized systems can alter the character of the interaction and thereby affect
the responses people give to the system. One tradition—social interface theory
(Dryer, 1999; Kiesler & Sproull, 1997; Reeves & Nass, 1997)—is attracting con-
siderable interest within the field of human–computer interaction. Much of the
support for social interface theory comes from laboratory-based studies. The other
tradition involves the systematic comparison of different modes of survey data col-
lection. Recent mode comparisons have focused on two of the newest modes of data
collection—Web surveys and a method of telephone data collection variously refer-
red to as telephone audio computer-assisted self-interviewing (T-ACASI), touchtone
data entry (TDE), and interactive voice response (IVR). This second tradition relies
on field studies in which the same questions are administered under the different
methods of data collection. Unfortunately, the two traditions seem to come to dif-
ferent conclusions about the impact of incidental features of interfaces.

1.1. Social interface theory

A growing body of laboratory research suggests that relatively subtle cues (such as
‘‘gendered’’ text or simple inanimate line drawings of a face) in a computer interface
can evoke reactions similar to those produced by a human, including social desir-
ability effects. Nass, Moon, and Green (1997), for example, conclude that the ten-
dency to stereotype by gender can be triggered by such minimal cues as the voice on
a computer. Based on the results of a series of experiments that varied a number of
cues in computer tutoring and other tasks, Nass and his colleagues (Fogg & Nass,
1997; Nass, Fogg, & Moon, 1996; Nass, Moon, & Carney, 1999; Nass et al., 1997)
argue that people treat computer interfaces as though they were people—that is, as
social actors rather than inanimate tools (see also Couper, 1998).
Additional support for the hypothesis that a computer interface can function as a
virtual human presence comes from a study by Walker, Sproull, and Subramani
(1994). They administered questionnaires to people using either a text display or one
of two talking-face displays to ask the questions. Those interacting with a talking-
face display spent more time, made fewer mistakes, and wrote more comments than
did people interacting with the text display. However, people who interacted with an
expressive face liked the face and the experience less than those who interacted with
an inexpressive face. In a subsequent experiment, Sproull, Subramani, Kiesler,
Walker, and Waters (1996) varied the expression of a talking face on a computer-
administered career counseling interview; one face was stern, the other pleasant. The
faces were computer-generated images with animated mouths. They found that:

People respond to a talking-face display differently than to a text display. They
attribute some personality attributes to the faces differently than to a text display.
They report themselves to be more aroused (less relaxed, less confident). They
present themselves in a more positive light to the talking-face displays (p. 116).

Social interface researchers have used a range of humanizing cues to foster a sense
of virtual presence. Some studies use cues that confer human-like qualities on the
interface, such as gender or specific personality traits (e.g. pleasantness). Other
studies have tried to create the impression that the interface is actively interacting
with the user. Our studies examined both types of humanizing cues, varying both the
gender associated with the interfaces and the degree of apparent agency.

1.2. Survey studies of mode effects

If the social interface theory is correct, it has important implications for survey
practice. Within the survey industry, there is an increasing trend toward the use of
computer-assisted interviewing, and especially the use of the World Wide Web, for
administration of surveys (Couper, 2001; Couper & Nicholls, 1998). In addition,
more and more surveys include sensitive questions (on sexual behavior, drug use,
etc.), raising concerns about social desirability effects and interviewer influences on
the answers. Partly in response to these developments, more surveys have come to
rely on computer-assisted self-interviewing (CASI) methods, in which the respon-
dent interacts directly with the computer to answer questions. The most recent
manifestation of this trend is the development of audio-CASI, in which the respon-
dent listens to a recording of an interviewer reading the questions and enters the
responses into the computer. A number of studies have compared CASI and audio-
CASI to alternative methods of data collection in field-based experiments. The gen-
eral finding in the survey literature is that self-administration increases reports about
sensitive behaviors and that computerized forms of self-administration—that is,
CASI methods (including audio-CASI)—maintain or increase these advantages over
interviewer-administered methods (Tourangeau & Smith, 1998). Some have gone so
far as to argue that voice does not matter when asking questions about sexual
behavior (e.g. Turner, Forsyth, O’Reilly, Cooley, Smith, Rogers, & Miller, 1998;
Turner, Ku, Rogers, Lindberg, Pleck, & Sonenstein, 1998), although these claims
have yet to be tested empirically.
The survey results appear to contradict the findings of the social interface
researchers. If subtle humanizing cues do indeed influence the behavior of computer
users, we would fully expect the gender of the voice to affect the answers given to
survey questions on topics such as gender attitudes and sexual behavior. Further,
the increasing use of multimedia tools on the Web makes it possible to add a variety
of humanizing visual and aural cues to Web surveys; these cues could reduce or even
completely eliminate the gains from self-administration for items eliciting sensitive
information. It is thus important to explore the apparent contradiction between the
social interface and survey methods work and to try to bring these two strands of
research together.
There are numerous differences between the two research traditions that could
account for the discrepant results. For one, virtually all of the social interface
research has been conducted with students as volunteer subjects. In contrast, the
survey-based findings are from probability samples of broader populations (e.g.
teenage males, women between 15 and 44 years old, the adult US population). In the
former, the number of subjects is typically measured in tens or scores while, in
the latter, sample sizes go up to the thousands.
The measurement settings also differ sharply. The social interface work is typically
done in a laboratory setting, free from distractions and from threats to privacy. Most
of the CASI surveys are conducted in the respondent’s home with an interviewer
present—and sometimes with other family members present as well. The more con-
trolled environment of the laboratory may well focus subjects’ attention on the
experimental manipulation more than would be true in an uncontrolled real-world
setting where the respondents are not aware they are in an experiment and where
outside distractions may draw much of their attention. Furthermore, the measure-
ment devices differ considerably between the two approaches. The social interface
experiments often use subjects’ performance on a computer task as the dependent
measure. When questionnaire measures are used in social interface studies, they are
typically measures designed to assess social desirability or impression management.
The findings from the survey world are based on measures of highly sensitive beha-
viors (e.g. abortions, number of sex partners, engagement in high risk sexual beha-
viors, illicit drug use, etc.); the items are often drawn from ongoing national surveys.
We are engaged in a program of research to explore the issue of the effect of
interface design and social interface features on survey responses. Here we report on
the results of two Web experiments and an experiment using IVR to investigate the
effect of interface features on social desirability effects and other response problems
in surveys. In all three of the studies we describe here, some versions of the survey
questionnaire included gender cues (conveyed by photographs or recorded voices),
allowing us to compare "male" and "female" versions of the questionnaire or gen-
dered with neutral versions. In addition, our first and third studies included cues
designed to create a sense of agency. In the first study, we varied whether or not the
interface gave tailored feedback to the respondents, based on their answers to earlier
questions; in the final study, we varied whether the recorded voices administering the
questions used first or third person language.

2. First Web experiment

Our first study was designed to explore the effect of characteristics of the interface
on the responses obtained to survey questions administered via the World
Wide Web. We varied two dimensions in an attempt to increase the sense of social
presence in a Web survey interface—we personalized the interface by making attributes
of the researcher salient and we made the automated interviews interactive by
addressing the respondents by name and echoing back their answers.

2.1. Methods

We created six versions of the Web survey instrument that differed on the dimen-
sions of personalization and interaction. At several points in the questionnaire,
the personalized versions of the questionnaire displayed pictures of one of the
researchers (either Tourangeau or Steiger). A third version of the questionnaire
presented the logo for the study ("Study of Attitudes and Lifestyles") instead of
either investigator’s picture. Along with the pictures, the program displayed relevant
statements from the investigator (Hi! My name is Roger Tourangeau. I’m one of the
investigators on this project. Thanks for taking part in my study.) There were, thus,
three levels of the personalization variable, involving the presentation of pictures
and messages from the male investigator, the female investigator, or only the pre-
sentation of the study logo.
We also varied the level of interaction in the Web survey. The high interaction
versions of the questionnaire addressed the respondent by name and used the first
person in introductions and transitional phrases (e.g. Thanks, [NAME]. Now I’d
like to ask you a few questions about the roles of men and women) and occasionally
echoed back to the respondents their earlier answers (According to your responses,
you exercise once daily). The low interaction versions used more impersonal lan-
guage (The next series of questions is about the roles of men and women) and gave less
tailored feedback (Thank you for this information). Examples of screens from each
condition are shown in Figs. 1–3.
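
To illustrate how the two kinds of acknowledgement might be assembled, here is a minimal sketch; the function names, answer codes, and wording templates are our own assumptions for illustration, not the instrument's actual code.

```python
# Illustrative sketch only: one way to assemble the tailored feedback used in
# the high-interaction condition versus the generic acknowledgement used in
# the low-interaction condition. Names and answer categories are assumptions.
EXERCISE_LABELS = {0: "less than once a week", 1: "about once a week", 2: "once daily"}

def high_interaction_feedback(name: str, exercise_code: int) -> str:
    """Address the respondent by name and echo an earlier answer back."""
    return (f"Thanks, {name}. According to your responses, "
            f"you exercise {EXERCISE_LABELS[exercise_code]}.")

def low_interaction_feedback() -> str:
    """Impersonal acknowledgement with no tailoring."""
    return "Thank you for this information."

print(high_interaction_feedback("Pat", 2))
print(low_interaction_feedback())
```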
We crossed the two experimental variables, resulting in a 3 × 2 experiment. We
randomly assigned respondents to one of the six cells in the design, with final sample
sizes shown in Table 1.
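
A minimal sketch of the assignment step appears below; independent, equal-probability assignment to the six cells is an assumption, since the paper does not describe the randomization mechanism.

```python
# Sketch of assignment to the six cells of the 3 x 2 design (personalization
# crossed with level of interaction). Equal-probability, independent
# assignment is an assumption made for illustration.
import itertools
import random

PERSONALIZATION = ("male picture", "female picture", "logo")
INTERACTION = ("low", "high")
CELLS = tuple(itertools.product(PERSONALIZATION, INTERACTION))  # six cells

def assign_cell(rng: random.Random) -> tuple:
    return rng.choice(CELLS)

rng = random.Random(42)
print([assign_cell(rng) for _ in range(5)])
```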

2.1.1. Questionnaire
The survey questionnaire contained a variety of items, most of them from existing
instruments:

Gender attitudes: eight items from Kane and Macauley’s (1993) study regarding
the roles of men and women (e.g. Thinking about men as a group, do you think men
have too much influence, about the right amount of influence, or too little influence in
society?);
Socially undesirable behaviors: three items on drinking and illicit drug use and
three on diet and exercise;
Socially desirable behaviors: two items on voting and church attendance;
Self-reported social desirability: sixteen items from the Marlowe–Crowne Social
Desirability Scale (Crowne & Marlowe, 1964) and the 20-item Impression Man-
agement scale from the Balanced Inventory of Desirable Responding (BIDR;
Paulhus, 1984), all of them using a true/false answer format;
Fig. 1.

Trust: three items on trust from the General Social Survey and National Election
Studies (e.g. Most people can be trusted);
Debriefing questions: nine items to assess social presence and evaluate the inter-
view experience (e.g. How much was this interview like an ordinary conversation?
How much was it like dealing with a machine?);
Demographic questions: eight items eliciting the respondent’s age, race, sex and so on.

Many of the items are known to be susceptible to various sorts of response effects.
For example, Kane and Macauley showed that responses to the gender items partly
reflect the sex of the interviewer who administers them; on the average, women
interviewers elicit more pro-feminist responses than men interviewers do. We inclu-
ded the gender attitude items to see whether our attempt to personalize the interface
produced similar effects—that is, more pro-feminist responses with the "female"
than with the "male" interface. We included the items on diet, exercise, drinking,
drug use, voting, and attendance at church to test the hypothesis that humanizing
the interface (both by personalizing it and by making it more interactive) would
increase the number of socially desirable responses and decrease the number of
socially undesirable responses. The items we chose are typical of sensitive items that
appear in surveys, and they show systematic reporting problems.
The social desirability and impression management items have been used for
similar purposes to measure socially desirable responding in several studies on social
presence (e.g. Moon, 1998; Sproull et al., 1996), and we included them in our studies
for the sake of comparability. We included the trust items to see whether the
experimental variables had the most impact on those who are low on trust (as
Aquilino & LoSciuto, 1990, found). We included the demographic items to assess
subgroup differences. In addition, we solicited the respondent's first name at the
outset. The high interaction treatment versions of the questionnaire addressed the
respondent by name at several points in the survey. On average, respondents took
about 15 min to complete the questionnaire.

Fig. 2.

According to social interface theory, increasing the social nature of the Web survey
interaction, whether by personalizing the survey or making it interactive, would yield:
(1) higher scores on the standardized measures of social desirability and impression
management; (2) lower reports of socially undesirable behaviors (drug use, drinking,
fat consumption); and (3) higher reports for socially desirable behaviors (church
attendance, voting, exercise). The social interface viewpoint also suggests that the
‘‘male’’ interface will elicit less positive attitudes toward women and the ‘‘female’’
interface more positive attitudes, with the neutral logo occupying a middle position.

2.1.2. Sample design and implementation


The sample for this study was selected from a frame developed by Survey Sam-
pling Inc. (SSI). The SSI list consists of more than seven million e-mail addresses of
Web users, compiled from various sources; in each case, visitors to specific Web sites
agreed to receive messages on a topic of interest. For our study, SSI selected a
sample of 15,000 e-mail addresses and sent out an initial e-mail invitation to take
part in ‘‘a study of attitudes and lifestyles’’. The e-mail invitation included the URL
of the Gallup Web site with our questionnaire and a unique ID number (which
prevented respondents from completing the survey more than once). After 10 days,
SSI sent a second reminder e-mail to sample persons who had not yet completed the
survey. At no time did we have access to the e-mail addresses or any other identify-
ing information of sampled persons or respondents.
Fig. 3.

A total of 3047 sample members completed the questionnaire, for a response rate
of approximately 20%. Less than 1% of the initial e-mail invitations bounced back
as invalid addresses. Another 434 persons (3% of the sample) began the survey but
broke off without finishing it. We focus here on the respondents who completed the
survey. The number of completed cases per cell is shown in Table 1.
The number of cases we obtained far exceeds that for most of the experimental
studies on social interfaces (which typically include 10–20 subjects per cell). Thus,
the study should have ample statistical power to detect the effects of the manipu-
lations. Furthermore, the respondents to our survey represent a much more diverse
group than is typically found in laboratory-based experiments. However, we
acknowledge that neither the initial invitees nor the final respondents are repre-
sentative of any larger population. Our focus is on detecting differences across
experimental manipulations rather than on making inferences about some broader
population.

2.2. Results

2.2.1. Rates of breaking off


One possible advantage of humanizing the interfaces of Web surveys is that it may
increase respondent motivation and interest, thereby improving data quality (com-
pleteness of responses) and reducing breakoffs or incompletes. The literature on mail
surveys has many examples of the positive effect of personalization on response rates
(Dillman, 2000). Dommeyer and Ruggiero (1996), for example, increased the
response rate to a mail survey by adding a picture of an attractive young woman to
the cover letter. This may be the reason that Web surveys sometimes use similar
features (e.g. http://us.lightspeedpanel.com). However, we find no evidence that
adding humanizing features improved response in our survey. Among those who
started the survey and thus were exposed to the experimental manipulations, there
are no significant differences in the proportion who finished it by experimental
treatment. The proportion of participants who completed the questions once they'd
started them ranged from a low of 85.8% to a high of 89.2% across the six experimental groups.

Table 1
Number of subjects per cell

                      Interaction
Personalizing cues    Low      High     Total

Logo                  492      502      994
Male picture          529      529      1058
Female picture        501      492      993
Total                 1522     1523     3047

2.2.2. Response scales


We created a number of scales that served as the key outcome measures in our
study. The social desirability scale ranged from 0 to 16, with the scores representing
the number of socially desirable answers to the Marlowe–Crowne items. We used
the same strategy for the BIDR impression management scale, creating a summary
score ranging from 0 to 20, again with a higher score indicating greater impression
management. For the gender attitude items, we created a scale that combined
responses across the eight items; scores on this scale ranged from 8 to 24, with higher
scores indicating more pro-feminist attitudes.
We created a similar index that combined answers to eight of the sensitive ques-
tions; the scores (which ranged from zero to eight) on this index represented the
number of embarrassing answers given in response to those questions. Respondents
got a point each if they reported they consumed more dietary fat than the average
person, were 20 pounds or more over their ideal weight, exercised once a week or
less, drank alcohol almost every day (or more often), had smoked marijuana, had
used cocaine, did not vote in the last election, and did not attend church in the last
week. These items were not designed to tap a single trait or attitude as the other
scales were. Rather we simply wanted to create a summary measure to protect us
from overinterpreting scattered findings involving the eight individual items. In fact,
the eight items did not show a very high degree of internal consistency; Cronbach’s
alpha—which measures scale reliability based on the average intercorrelation among
a set of items—was only 0.19 for these items. As a result, we report the results both
for the scale and for the eight individual items. The significance levels for the indi-
vidual items are clearly inflated because of the large number of significance tests
involved.
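
To make the scoring concrete, here is a small sketch (ours, not the authors' code) of the sensitive-admissions index and of Cronbach's alpha as described above; the column names are hypothetical placeholders.

```python
# Sketch of the scoring described above: one point per embarrassing answer
# (a 0-8 index) and Cronbach's alpha computed from the item-level indicators.
# Column names are hypothetical, not the study's actual variable names.
import pandas as pd

def sensitive_admissions(df: pd.DataFrame) -> pd.Series:
    """Return the 0-8 count of embarrassing answers for each respondent."""
    points = pd.DataFrame({
        "fat": df["fat_vs_average"] == "more",        # more dietary fat than average
        "weight": df["lbs_over_ideal"] >= 20,         # 20+ pounds over ideal weight
        "exercise": df["exercise_per_week"] <= 1,     # exercises once a week or less
        "drink": df["drinks_daily"],                  # drinks (almost) every day
        "marijuana": df["ever_smoked_marijuana"],
        "cocaine": df["ever_used_cocaine"],
        "no_vote": ~df["voted_last_election"],
        "no_church": ~df["attended_church_last_week"],
    })
    return points.sum(axis=1)

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)
```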

2.2.3. Socially desirable responses


The results for each of these scales by each of the two experimental variables are
presented in Table 2. None of the effects reach statistical significance (P > 0.10) with
the exception of the effect of personalization on gender attitudes, to which we return
later. To perform a stronger test of the social interface hypothesis, we combined the
two experimental conditions, and contrasted the high social interface group (high
interaction and male/female picture) with the low social interface group (low inter-
action, logo). The differences in means again do not approach statistical significance.
We tried a variety of alternative specifications, including control variables and
interaction terms, but the findings essentially remain the same.
There were a few scattered findings for some of the individual sensitive items. We
include a few examples of both socially undesirable and socially desirable behaviors
in Table 3. For reports about voting, the personalization variable had a significant
impact (χ2=6.35, df=2, P<0.05). Contrary to expectation, the respondents who
got the female picture were less likely to say they had voted in the most recent election
than those who saw the logo, while those who got the male picture were most
likely to say they had voted (cf. Hutchinson & Wegge, 1991, who found no effects of
the gender of a live interviewer in a telephone survey on electoral participation). In
general, though, neither the level of personalization nor the level of interaction had
much effect on reports about sensitive topics.

Table 2
Scale means, by condition (standard errors in parentheses)

                      Social         Impression     Gender         Sensitive
                      desirability   management     attitudes      admissions

Interaction           n.s.           n.s.           n.s.           n.s.
  Low interaction     7.87 (0.14)    8.84 (0.19)    18.25 (0.16)   3.27 (0.07)
  High interaction    7.83 (0.10)    8.91 (0.13)    17.98 (0.11)   3.30 (0.05)

Personalization       n.s.           n.s.           P<0.05         n.s.
  Logo                7.95 (0.10)    8.87 (0.13)    18.09 (0.12)   3.27 (0.05)
  Male picture        7.77 (0.09)    8.73 (0.13)    17.77 (0.11)   3.21 (0.05)
  Female picture      7.85 (0.09)    8.84 (0.13)    18.19 (0.11)   3.31 (0.05)

Table 3
Percentages reporting selected behaviors, by condition

                      % Used cocaine   % Smoked marijuana   % Drink daily      % Attended church   % Voted in
                      in lifetime      in last year         or almost daily    last week           last election

Interaction           n.s.             n.s.                 n.s.               n.s.                n.s.
  Low interaction     14.2             10.7                 7.8                23.3                53.2
  High interaction    15.3             10.2                 7.7                25.7                52.2

Personalization       n.s.             n.s.                 n.s.               n.s.                P<0.05
  Logo                15.4             10.8                 7.4                23.2                52.8
  Male picture        14.7             9.9                  8.0                24.3                55.3
  Female picture      14.2             10.5                 7.7                26.1                49.7

2.2.4. Gender attitudes


The only hypothesized effect that we found support for in our data involved the
gender attitudes scale (see Table 2). We expected respondents of both sexes to report
the most pro-feminist attitudes when the program displayed pictures and messages
from the female investigator and the least pro-feminist attitudes when the program
displayed the pictures and messages from the male investigator. We expected the
group who got the survey logo to fall in between the other two. This pattern was
apparent in the scale means by condition and reached statistical significance (F(1,
3028)=5.52, P < 0.05). The effect is consistent with that found for live interviewers
(Kane & Macauley, 1993), but much smaller in the Web case.
It is not clear whether this difference in reported gender attitudes reflects social
presence or something else. Research on race-of-interviewer effects (Devine, 1989)
has found that racial stereotypes can be "primed" simply by presenting an image of
the target group. The account is referred to as the "mere presence" hypothesis, and
it represents an alternative to the "racial deference" hypothesis accepted by many
survey researchers (Athey, Coleman, Reitman, & Tang, 1960; Schuman & Converse,
1971). According to the deference account, people avoid expressing negative stereo-
types in the presence of a member of the target group—the interviewer—for fear of
giving offense. Our finding that the female picture elicits more pro-feminist attitudes
and the male picture less pro-feminist attitudes (with the logo occupying a middle
position) is consistent with both interpretations. That is, our respondents may have
wanted to avoid offending the researchers whose pictures were displayed or they
may have had certain attitudes primed by those pictures. In any case, the effects of
the interface gender seem modest at best.

2.2.5. Models of the interview


The questionnaire also contained a set of six debriefing items, two designed to
measure how much the interview was like a conversation, two designed to assess
how much it was like filling in a form, and two designed to measure how much the
interview was like dealing with a machine. These were combined into three scales.
Neither of the experimental variables had much effect on the responses to these
scales, with the one exception—the level of personalization had a significant impact
on whether the survey was rated as like interacting with a machine, F(2, 3016)=3.45,
P < 0.05. But, contrary to expectation, the male picture version was seen as most like
interacting with a machine and the female version was rated least machine-like.

3. Second Web experiment

Our second Web study was designed as a partial replication of the first. Given
the general lack of effects in the first study, we were concerned that the feedback
in the low interaction version may have been too mechanical; for instance,
when the respondents’ answers were fed back to them, the answers were in par-
entheses. In addition, the BIDR items were administered as dichotomous true/
false items (to be consistent with the Marlowe–Crowne items) rather than as
seven-point scales scored as dichotomies after the fact (following the procedure
outlined by Paulhus, 1984). We refined the design to correct these potential prob-
lems and administered the revised experimental treatments to a new sample of
respondents.

3.1. Method

We collapsed the study into a one-way design with three versions of the Web
questionnaire: (1) One version displayed the female researcher’s picture and pro-
vided feedback to the respondent; (2) the second version provided the same feedback
as the first version but displayed the study logo; and (3) the final version displayed
the logo but provided no feedback. In addition, we changed the format of the feed-
back text so it would appear more natural. We also experimentally compared the
two-point version of the BIDR items with the seven-point version. This resulted in a
3 × 2 experiment with three levels of the social presence manipulation crossed with
two versions of the BIDR items.

3.1.1. Sample design and data collection


Subjects were recruited in a similar fashion to the first study, with one exception.
A change in Survey Sampling Inc.’s policy regarding the use of the e-mail list pre-
cluded the use of reminders to nonrespondents. We thus increased the initial sample
size to 25,000, anticipating a lower response rate than the first study. In all other
respects, the process of inviting subjects to the survey Web site and completing the
survey was the same.
Of the 25,000 members of the SSI list invited to participate in the study of "attitudes
and lifestyles," 2987 completed the survey for a 12% response rate. In addi-
tion there were 255 partials (breakoffs). As with the first Web study, there was no
effect of the experimental manipulations on the completion rate.

3.1.2. Questionnaire
Aside from the change in the experimental variables noted above, the ques-
tionnaire we used was very similar to the one used in the first Web study. The
two-point version of the impression management scale was presented as a series of
true/false items, with one point given for each response indicating an attempt to
create a positive impression. The items in the seven-point version were presented
as scales ranging from 1 (not true) to 7 (very true), with the items dichotomized
after the fact. One point was added for every 6 or 7 response, after reverse scoring
negatively-scaled items (following Paulhus, 1991). Thus, both versions of the
impression management scale have a minimum of 0 and a maximum of 20. To
shorten the questionnaire, we dropped the Marlowe-Crowne items from the sec-
ond study.
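
As a concrete illustration of that scoring rule, here is a brief sketch; the set of reverse-keyed items shown is a placeholder, not the actual BIDR key.

```python
# Sketch of the 7-point BIDR impression-management scoring described above:
# reverse-score negatively keyed items, then award one point for each extreme
# (6 or 7) response. The REVERSED set is a placeholder, not the real BIDR key.
import pandas as pd

REVERSED = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}  # hypothetical keying

def score_bidr_7point(responses: pd.DataFrame) -> pd.Series:
    """responses: columns item1..item20 with values 1-7; returns scores 0-20."""
    scored = responses.copy()
    for i in REVERSED:
        scored[f"item{i}"] = 8 - scored[f"item{i}"]  # 1<->7, 2<->6, 3<->5
    return (scored >= 6).sum(axis=1)
```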

3.2. Results

3.2.1. Impression management


We first examined the effect of personalization on the BIDR impression manage-
ment items. The manipulation had a significant effect (F(2, 1505)=6.77) on the two-
point version but not on the seven-point version (F(2, 1453) < 1); see Table 4. The
trend is as expected for both versions of the scale, with the greatest personalization
(researcher’s picture and feedback) producing the highest levels of impression
management.

3.2.2. Gender attitudes


The overall effect of personalization on gender attitudes was only marginally sig-
nificant, F(2, 2966)=2.66 (P< 0.10), but when we contrast the version which dis-
played the female investigator’s picture with the other two versions of the survey, we
find a significant difference—F(2, 2966)=4.62 (P< 0.05). The version with the
female picture elicited the most pro-feminist responses, again suggesting a possible
priming or deference effect.

3.2.3. Sensitive questions


Our main expectation, based on social interface theory, was that increasing the
degree of personalization of the Web interface would decrease the number of
socially sensitive disclosures. Again, the results in Table 4 provide little empirical
support for this position. While the mean number of sensitive admissions is slightly
lower in the most personalized version, this effect fails to reach statistical sig-
nificance, F(2, 2975)< 1. Furthermore, an examination of the individual items mak-
ing up the embarrassing admissions scale reveals a few scattered significant effects,
but no consistent pattern across the three experimental conditions. Examples are
shown in Table 5.
The cocaine and marijuana items produce statistically significant differences
across conditions. However, while the picture-plus-feedback condition produces the
lowest levels of reporting (as expected), the logo-plus-feedback version produces
the highest levels of sensitive admissions. Two other examples—alcohol consumption
and church attendance—are presented in Table 5. These, along with the other individual
behavior items, show neither significant effects nor trends in the expected direction.

Table 4
Scale means, by condition (standard errors in parentheses)

                         Impression     Impression     Gender         Sensitive
                         management     management     attitudes      admissions
                         (2-point)      (7-point)

Personalization          P<0.01         n.s.           n.s.           n.s.
  Logo, no feedback      8.76 (0.18)    8.44 (0.18)    17.61 (0.13)   3.29 (0.05)
  Logo with feedback     9.27 (0.17)    8.46 (0.18)    17.54 (0.12)   3.29 (0.05)
  Picture with feedback  9.64 (0.16)    8.62 (0.19)    17.81 (0.12)   3.26 (0.05)
Table 5
Percentages reporting selected behaviors, by condition

                         % Used cocaine   % Smoked marijuana   % Drink daily      % Attended church
                         in lifetime      in last year         or almost daily    last week

Personalization          P<0.05           P<0.05               n.s.               n.s.
  Logo, no feedback      15.1             10.7                 5.9                25.3
  Logo with feedback     17.4             11.4                 8.2                25.4
  Picture with feedback  13.3             8.0                  7.1                26.8

3.2.4. Models of the interview


As a manipulation check, we retained four of the debriefing items from the first
Web study, two designed to measure how much the Web survey was like a con-
versation and two assessing how much it was like dealing with a machine. A low score
on the scales means that the survey was more like a conversation (or more like
interacting with a machine); high scores mean the opposite. The experimental
manipulation produced significant effects in the expected direction on both scales
(see Table 6). The version with both a picture and feedback was viewed by respon-
dents as most like a conversation and least like interacting with a machine. These
results provide some evidence that the experimental manipulations worked, despite
the lack of effect on the main outcome variables of interest.

4. IVR experiment

A potentially powerful humanizing cue is a human voice (Nass et al., 1997).


Recordings of human interviewers reading survey questions have sometimes been
added to computerized self-administered surveys partly to make the surveys more
accessible to respondents with limited reading ability. Audio-CASI (in which the
computer plays a digitized recording to the respondent of an interviewer reading the
questions as well as presenting the text of the question on screen) and its telephone
counterpart, interactive voice response (IVR), are widely used in surveys.1 Recorded
human voices almost inevitably convey the gender of the speaker and may, there-
fore, have an impact on responses to sensitive questions (such as questions about
sexual behavior) and on responses to questions about gender roles. Social interface
theory suggests that the addition of a recorded human voice to a self-administered
questionnaire could affect answers to both types of item. In contrast, the survey
studies to date suggest that audio-CASI may yield significantly greater reporting of
sensitive information than self-administered methods that dispense with the sound
(Tourangeau & Smith, 1998; Turner, Ku et al., 1998).

1
Turner and his colleagues refer to this method as telephone audio computer assisted self-interviewing,
or T-ACASI (see Cooley, Miller, Gribble, & Turner, 2000; Gribble, Miller, Cooley, Catania, Pollack, &
Turner, 2000).

Table 6
Mean ratings of Web survey, by condition (ns in parentheses)

                         Interview like conversation    Interview like machine

Personalization          P<0.01                         P<0.01
  Logo, no feedback      2.73 (959)                     2.04 (958)
  Logo with feedback     2.67 (1003)                    2.10 (999)
  Picture with feedback  2.59 (1007)                    2.18 (1003)

Given the widespread adoption of audio-CASI and IVR, it is a matter of con-


siderable practical importance to survey researchers whether the voice used to
record the questions has any impact on the answers. We carried out our next
experiment to find out.

4.1. Methods

We compared six versions of an IVR interview. The different versions used dif-
ferent voices to record the questionnaires (a male voice, a female voice, or the male
voice reading the questions themselves and the female voice reading the answer
options) and also varied whether the questionnaires used personalizing language
(e.g. the first person). The respondents were selected via random digit dialing (RDD)
and were contacted by a live telephone interviewer, who switched those agreeing to
take part into the automated IVR interview. The IVR questionnaire included a
subset of the items from the Web studies, including the gender attitude items and
questions on a number of sensitive behaviors.

4.1.1. Sample design and data collection


The sample for our IVR study was selected using a list-assisted RDD design. A
list-assisted RDD sample begins with the selection of a sample of 100-banks from
among all active banks that include some minimal number of residential listings. (A
bank is a group of 100 consecutive potential numbers that share the same area code,
exchange, and two-digit prefix. For example, the numbers from 301 314–7900 to 301
314–7999 make up a single bank.) In this study, we restricted the sample to banks
with at least one residential listing. (This doubles the rate of working residential
numbers in the sample, with limited loss of coverage). Once the banks are selected,
randomly generated two-digit suffixes are appended to complete the telephone
number.
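
The sketch below illustrates that generation step; the example bank is taken from the paper's 301 314-79xx illustration, and the function itself is our own simplification of the design.

```python
# Sketch of list-assisted RDD number generation as described above: each
# selected 100-bank is an 8-digit prefix (area code + exchange + first two
# digits of the suffix); a random two-digit suffix completes the number.
import random

def generate_rdd_numbers(banks, n, rng):
    """banks: 8-digit strings such as '30131479'; returns n 10-digit numbers."""
    numbers = []
    for _ in range(n):
        bank = rng.choice(banks)
        suffix = rng.randrange(100)            # 00-99, the last two digits
        numbers.append(f"{bank}{suffix:02d}")
    return numbers

rng = random.Random(0)
print(generate_rdd_numbers(["30131479"], 3, rng))  # numbers in the 301 314-79xx bank
```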
Interviewers at the Gallup Organization attempted to contact 6715 randomly
generated numbers. Approximately 61% of these were working residential numbers
(the rest being businesses or unassigned numbers). A total of 1598 cases—an esti-
mated 40.9% of those eligible—answered a few preliminary questions and were then
switched to IVR. Of the cases switched to IVR, 1022 (or 64.0%) completed
the questions; the remaining 576 broke off at some point prior to completing the
IVR questionnaire. Many of the 576 breakoffs hung up during the switch to IVR
(153 cases) or immediately after entering the system (72) and thus provided no data.
The IVR portion of the interview lasted an average of about 8 min for those who
made it to the end of the questionnaire.

4.1.2. Experimental variables


The experimental design varied three factors. The first was whether the live inter-
viewer began the interview with a series of three innocuous questions (asking whe-
ther the respondent had ever been in an automated survey before and his or her
educational attainment and age). For approximately half the respondents, the tele-
phone interviewer asked these questions just before the switch to IVR; for the
remainder, these questions were the first three in the IVR interview. The second
factor in the design was the voice used to record the questions. One version of the
IVR questionnaire used the voice of a female, who read both the question text and
the associated answer options. The second version of the questionnaire used a male
voice for both the questions and the response options. The final version used the
same male voice as the second version to read the text of the question (What is your
gender?) and the same female voice as the first version to read the answer options. (If
you are male, press "1". If you are female, press "2".)
The final factor in the design was the level of personalization in the wording of the
IVR interview. The personalized version used first person in introducing questions
(Now, I will read you a few statements about the roles of men and women); the
impersonal version stuck to third person wording (Now, please listen to a few state-
ments about the roles of men and women). There were personal and impersonal ver-
sions of three different transitions, all of them coming prior to the section of the
interview that included the sensitive questions.

4.1.3. Questionnaire
The questionnaire was a streamlined version of the ones used in our two Web
surveys. For approximately half the respondents, the IVR questionnaire included
the innocuous opening items (asking the respondent's age and educational attain-
ment). Those items were then followed by 12 items on gender roles, seven items on
sensitive behaviors (covering exercise, diet, illicit drug use, voting, and church
attendance), four debriefing questions, and three demographic items.2 The remain-
ing questionnaires included the same questions, except that they omitted the intro-
ductory questions (which the live interviewer had already administered before the
switch to IVR).

4.2. Results

The most important findings from the experiment involve responses to the sensi-
tive items and to the gender role questions; our presentation focuses on these. We
also briefly discuss our findings regarding the breakoffs and responses to the
debriefing items.

2
The gender role items used in the Web studies were expanded and modified to suit the IVR mode.
The cocaine item was dropped from the IVR study because of concerns about the sensitivity of this item.

4.2.1. Sensitive questions


A key question was whether personalization of the questionnaires or the gender
of the voice reading the recorded questions would make a difference in responses
to the sensitive items. We constructed our usual index—the number of embarras-
sing admissions each respondent made (ranging, in this study, from 0 to 7)—and
carried out a three-way analysis of variance, examining this variable as a function
of the three experimental factors. The voice gender variable had no impact on this
variable, but the level of personalization did. As can be seen in Table 7, the
impersonal versions of the questionnaires elicited significantly more embarrassing
admissions than the personalized versions (2.86 vs. 2.65); F(1, 966)=4.02, P< 0.05.
The analysis also revealed a complex interaction between the gender of the recorded
voice administering the IVR questions and whether any questions were administered
prior to the switch to IVR: when the innocuous questions came before the switch,
the recorded male voice elicited the highest average number of embarrassing
admissions (2.92); when the innocuous questions came after the switch, the male
voice elicited the lowest average number of embarrassing admissions (2.48). The
pattern was the opposite when a recorded female voice administered the questions—
there were more embarrassing admissions, on the average, when the innocuous
questions came before the switch (2.86) than when they came afterwards (2.64).
Overall, the interaction is significant—F(2, 966)=3.19, P < 0.05—though not readily
interpretable.
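
A sketch of that analysis using a generic formula interface is shown below; the data frame and column names are hypothetical, and the choice of Type II sums of squares is our assumption, since the paper does not specify one.

```python
# Sketch of the three-way ANOVA on the sensitive-admissions index, with voice,
# personalization, and question placement as factors. Column names are
# hypothetical; Type II sums of squares are an assumption.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def admissions_anova(df: pd.DataFrame) -> pd.DataFrame:
    """df columns: admissions (0-7), voice, personalized, questions_before."""
    model = smf.ols(
        "admissions ~ C(voice) * C(personalized) * C(questions_before)",
        data=df,
    ).fit()
    return sm.stats.anova_lm(model, typ=2)
```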

4.2.2. Gender attitudes


We carried out similar analyses on the gender attitude items, examining responses
by the three experimental factors. (For some of our analyses of these items, we also
included respondent sex as an additional factor in the analysis.) As in the previous
Web studies, we constructed an overall gender attitude scale by averaging responses
to the individual gender items. The items were measured on a five-point scale in the
IVR version, and the scale score is the average of the answers to the 12 items, with a
minimum of 1 (least pro-feminist) and maximum of 5 (most pro-feminist). As is
apparent from Table 7, the means on the gender attitude scale did not vary mark-
edly as a function of the experimental variables. In fact, none of the experimental
factors had a significant impact on gender attitude scores, either alone or in inter-
action with each other or with respondent sex. By contrast, respondent sex had a
very large impact on responses to the gender attitude items. The average scale score
for female respondents was 3.54 vs. 3.18 for the males, a highly significant differ-
ence—F(1, 990)=93.3, P < 0.001.

Table 7
Mean sensitive admissions by condition (ns in parentheses)

                                            Sensitive admissions    Gender attitudes

Personalization                             P<0.05                  n.s.
  Personalized                              2.65 (487)              3.37 (506)
  Not personalized                          2.86 (490)              3.40 (515)

Voice                                       n.s.                    n.s.
  Male                                      2.67 (398)              3.39 (411)
  Female                                    2.75 (264)              3.39 (279)
  Male for questions/female for answers     2.87 (316)              3.38 (331)

Questions before switch by voice            P<0.05                  n.s.
  Questions before switch (total)           2.71 (519)              3.37 (540)
    Male                                    2.48 (228)              3.36 (231)
    Female                                  2.86 (129)              3.41 (137)
    Male for questions/female for answers   2.88 (162)              3.34 (172)
  No questions before switch (total)        2.82 (459)              3.40 (481)
    Male                                    2.92 (170)              3.42 (180)
    Female                                  2.64 (135)              3.38 (142)
    Male for questions/female for answers   2.86 (154)              3.41 (159)

4.2.3. Breakoffs

The results presented so far exclude data from the cases who broke off without
completing the IVR questionnaire (though the results do not change appreciably if
we include the responses from those who quit part way through the interview).
Here we focus on the effect of the experimental variables on the breakoff rates.
Because a substantial proportion of the breakoffs occurred before the respondent
had answered a single question (225 out of 576 cases), we analyzed three related
rates—the proportion of respondents who completed the questionnaire given that
they were switched to IVR, the proportion who started the IVR interview (providing
at least one answer) given that they were switched, and the proportion who completed
the interview once they started it. We used logit analyses that examined each
of these rates as a function of the experimental variables.
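
A minimal sketch of one such logit model appears below; the data frame and variable names are hypothetical.

```python
# Sketch of the logit analyses described above: model each completion
# indicator (0/1) as a function of the three experimental factors.
# Variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def breakoff_logit(df: pd.DataFrame, outcome: str):
    """outcome: e.g. 'completed_ivr', 'started_ivr', or 'completed_once_started'."""
    formula = f"{outcome} ~ C(voice) + C(personalized) + C(questions_before)"
    return smf.logit(formula, data=df).fit()

# Usage (hypothetical data frame `ivr`):
# print(breakoff_logit(ivr, "completed_ivr").summary())
```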
In each case, the one variable that made a difference was whether the respondent
got the three innocuous questions just prior to the switch to IVR; Table 8 summarizes
the main results. Respondents who got these questions just before the switch
were significantly more likely to complete the IVR questions (68.5% vs. 59.6%;
χ2=11.0, df=1, P<0.001), significantly more likely to start them (89.2% vs.
82.8%; χ2=11.6, df=1, P<0.001), and marginally more likely to finish the IVR
questions once they'd started them (76.7% vs. 72.0%; χ2=3.17, df=1, P<0.08).
Neither the gender of the recorded voice that administered the questions nor the
level of personalization had any effect on the respondent's likelihood of completing
the questionnaire.
Aside from never starting the questionnaire at all, respondents gave up most fre-
quently at two other points in the questionnaire—at the beginning of the gender
items, where a lengthy introduction explained the response scale, and about midway
through the questionnaire, where a transitional statement informed respondents that
they were ‘‘now about halfway done with the interview.’’ These findings are con-
sistent with earlier results that breakoffs are likely when respondents infer that there
are still many items to come (Tourangeau & Steiger, 1999).

Table 8
Proportion completing IVR questionnaire, by condition (ns in parentheses)

                              Completed IVR    Began IVR         Completed questionnaire
                                               questionnaire     once started

Questions before switch       P<0.001          P<0.001           n.s.
  Questions before switch     68.5% (790)      89.2% (790)       76.7% (705)
  No questions before switch  59.6% (807)      82.8% (807)       72.0% (668)

4.2.4. Models of the interview


The questionnaire for this study retained four of the debriefing items from the
earlier Web studies, two of them designed to measure how much the IVR interview
was like a conversation and two designed to measure how much the interview was
like dealing with a machine. The gender of the recorded voice had a significant
impact on both scales derived from these items. Table 9 shows the relevant means,
with a high score indicating least like a conversation or least like interacting with a
machine. The combination of a male voice for the question text and a female voice
for the answer categories seemed least like an everyday conversation and most like
dealing with a machine. For the conversation scale, F(2, 1006)=3.92, P < 0.05; for
the machine scale, F(2, 1005)=6.43, P < 0.01.

5. Discussion and conclusions

Across our three studies, our attempts to humanize the interfaces produced much
weaker effects than those reported by Nass, Sproull, or their colleagues, though our
efforts to give our questionnaires a human feel seem much more blatant than theirs
and our studies feature much larger samples. Table 10 summarizes the main features
of our studies and the major findings.

5.1. Socially desirable responding

For survey researchers, an important practical implication of the social interface


theory is that efforts to make electronic questionnaires more appealing and user-
friendly may reintroduce the very problems—interviewer effects and social desir-
ability bias—that electronic questionnaires were supposed to cure. At least for

reports about sensitive behaviors, we find little reason for concern. In three tries
across two types of interface, we find only one significant effect of the interface, for
one version of the BIDR. Though significant, the effect is quite small, accounting
for less than 1% of the variance in the BIDR scores from that version; the gender of
the respondent accounts for approximately the same proportion of the variation in
BIDR scores in that study.

Table 9
Mean ratings of IVR interview, by condition (ns in parentheses)

                                            Like conversation    Like machine

Voice                                       P<0.05               P<0.01
  Male                                      2.84 (411)           2.03 (411)
  Female                                    2.77 (278)           1.85 (277)
  Male for questions/female for answers     2.98 (329)           1.85 (329)

Table 10
Summary of findings, by study

                           First Web study                    Second Web study                          IVR study

Experimental Variables     Picture (male, female, logo);      Version (picture plus personalizing       Voice (male, female, both);
                           level of interaction (high, low)   text, text only, neither); response       level of personalization (high, low)
                                                              format for BIDR (7-point scale,
                                                              2-point scale)

Sample Size                3045 completes; 436 partials       2987 completes; 268 partials              1022 completes; 351 partials

Social Desirability Scales
  Marlowe–Crowne           No effects                         –                                         –
  BIDR                     No effects                         Effect of version (2-point scale only)    –

Sensitive Admissions
  Scale                    No effects                         No effects                                No effects
  Individual items         Effect of picture on               Effects of version: use of marijuana      No effects
                           reported voting                    in past year, lifetime cocaine use

Gender Attitudes           Effect of picture                  Effect of version                         No effects

Ratings of Interview
  Chat                     No effect                          Effect of version                         Effects of voice, voice × personalization
  Form                     No effect                          –                                         –
  Machine                  Effect of picture                  Effect of version                         Effect of voice

In the IVR study, an additional 225 cases were switched to the IVR system but hung up without answering a single question.
We find even less evidence for interface effects with scales constructed from sensi-
tive questions taken from real surveys. None of our three studies found an overall
effect for any of our experimental variables. Looking at five individual items in the
three studies, we find a total of three significant effects (out of a total of 35 possible
main effects and interactions) and none of these replicates across studies.

5.2. Gender attitudes

The results for reports about gender role attitudes are more consistent with the
social interface theory. We used the same items shown to be affected by the gender
of human interviewers (Kane & Macauley, 1993) and found that our attempt to
manipulate the gender of the interface had analogous results—when the researcher
displayed to the respondent was a woman, respondents gave more pro-feminist
answers. Only our final study, which varied the gender of the interviewer who
recorded the questions, failed to show the expected differences. Again, the size of
these gender-of-interface effects is quite small. In our first Web study, for example,
the correlation between scores on the gender attitude scale and the gender of the
researcher displayed was 0.050; again, this is far smaller than the impact of the sex of
the respondent (r=0.338).

5.3. Impressions of the survey

It seems clear that respondents noticed our variations of the interface. In all three
studies, the experimental variables had a significant impact on the debriefing items.
Our efforts to humanize the interface did succeed in making the experience of com-
pleting the surveys seem more like talking to a friend and less like interacting with a
machine (see the bottom panel of Table 10). But this change in the feel of the inter-
view did not seem to have much impact on respondents' answers to the substantive
questions. Nor did the experimental variables have any impact on the likelihood
that respondents would complete the questionnaire once they started it. The only
variable that affected breakoff rates involved giving respondents a foretaste of the
questions before they began the automated questionnaire.

5.4. Accounting for the discrepancy

In short, we find at best limited support for the social interface theory. We are puz-
zled by the discrepancy between our results and those of the social interface research-
ers. We included some of the same measures used in the past work (e.g. the BIDR),
used strong humanizing cues, and fielded much larger samples than the earlier studies.

Several things could account for the discrepant findings. First, our experimental
manipulations may not have been blatant enough to produce the hypothesized
effects. We believe our manipulations to be at least as obvious as many of those in
the social interface research studies; these studies often use very subtle cues, such as
a label on a computer monitor (Moon, 1998) or the shape of a mouth on a compu-
ter-displayed face (Sproull et al., 1996; see also Kiesler & Sproull, 1997; Parise,
Kiesler, Sproull, & Waters, 1999). A related possibility is that our manipulations did
not foster the sense of the computer as a social agent. In the Web studies, the inves-
tigators were clearly third parties, who were not actively participating in the inter-
action between the respondents and Web questionnaire. In the social interface
studies, by contrast, the computer itself is presented as the active agent and the
language the computer uses and the line drawings and facial expressions it presents
are used to convey the characteristics of that agent. Our manipulations may thus be
examples of computer-mediated communication (CMC) rather than human–
computer interaction (HCI), and the consequences of the two may differ. We are
currently conducting a follow-up study to examine this possibility. However, Nass,
Isbister, and Lee (2001) argue that the same effects should occur under both CMC
and HCI conditions.
Another explanation of the discrepant results could be the differences in the samples. The social interface studies mostly used college students; ours encompassed a broader range of the population. In our Web studies, we had samples large enough to examine several variables that might interact with the experimental variables (whether the respondent was currently a student, his or her age, prior survey experience, and level of trust). For example, we tested the hypotheses that students are more sensitive to the characteristics of the interface and that respondents with prior experience with Web surveys are less sensitive to them. None of these hypotheses received much support; we did not find any significant interactions between these individual-difference variables and the experimental variables on reports of sensitive information or gender attitudes.
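As an illustration of the kind of moderation test involved, the sketch below is hypothetical (the data frame, column names, and simulated effects are ours, not the study's): it fits an ordinary least squares model in which the experimental factor interacts with an individual-difference variable, and a non-significant interaction coefficient corresponds to the null results just described.

```python
# Minimal sketch of a moderation test: does a respondent characteristic
# (here, student status) change the effect of the interface manipulation?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1200

# Hypothetical data: experimental condition, respondent characteristic,
# and a simulated socially-desirable-responding score with no true interaction.
df = pd.DataFrame({
    "humanized": rng.integers(0, 2, size=n),  # 1 = humanized interface condition
    "student": rng.integers(0, 2, size=n),    # 1 = currently a student
})
df["bidr"] = 0.05 * df["humanized"] + rng.normal(0, 1, size=n)

# The humanized:student coefficient estimates whether students respond
# differently to the interface manipulation than non-students do.
model = smf.ols("bidr ~ humanized * student", data=df).fit()
print(model.params)
print("interaction p-value:", model.pvalues["humanized:student"])
```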
A final possible explanation, which we could not test, is that the demand characteristics of laboratory-based experiments yield results that are not replicated in our field studies. In the lab-based studies, the subjects are typically students, and they
are aware of being in an experiment. As a result, they may be especially sensitive to
small variations in interfaces. The same variations may pass unnoticed by most
survey respondents, who are typically unaware that they are in an experiment
(believing instead that the purpose of the study is to elicit their views on particular
topics) and who may be distracted as they answer. Any of these differences
may account for the failure of the social interface theory to replicate beyond the
laboratory.
Given the influence of the social interface perspective in human–computer interaction research and interface design, it is important to understand whether and
how the findings from this work translate to the real-world experiences of those
who interact with computers. In two such applications—Web surveys and an
automated telephone interview—we find little support for the social interface
hypothesis.

Acknowledgements

This research was supported by the National Science Foundation, Award No.
SES-9910882. We thank Robert Tortora and Margrethe Montgomery of the Gallup
Organization for their assistance in planning and carrying out this project.

References

Aquilino, W., & LoSciuto, L. (1990). Effect of interview mode on self-reported drug use. Public Opinion
Quarterly, 54, 362–395.
Athey, K. R., Coleman, J. E., Reitman, A. P., & Tang, J. (1960). Two experiments showing the effects of
the interviewer’s racial background on responses to questionnaires concerning racial issues. Journal of
Applied Psychology, 44, 562–566.
Cooley, P. C., Miller, H. G., Gribble, J. N., & Turner, C. F. (2000). Automating telephone surveys: Using
T-ACASI to obtain data on sensitive topics. Computers in Human Behavior, 16, 1–11.
Couper, M. P. (1998). Review of Byron Reeves and Clifford Nass, ‘‘The media equation: how people
treat computers, television, and new media like real people and places’’. Journal of Official Statistics, 13,
441–443.
Couper, M. P. (2001). Web surveys: a review of issues and approaches. Public Opinion Quarterly, 64,
464–494.
Couper, M. P., & Nicholls, W. L. II. (1998). The history and development of computer assisted survey
information collection. In M. P. Couper, R. P. Baker, J. Bethlehem, C. Z. F. Clark, J. Martin, et al.
(Eds.), Computer assisted survey information collection. New York: John Wiley.
Crowne, D., & Marlowe, D. (1964). The approval motive. New York: John Wiley.
Devine, P. G. (1989). Stereotypes and prejudice: their automatic and controlled components. Journal of
Personality and Social Psychology, 56, 5–18.
Dillman, D. (2000). Mail and Internet surveys: the tailored design method (2nd ed.). New York: John Wiley.
Dommeyer, C. J., & Ruggiero, L. A. (1996). The effects of a photograph on mail survey response. Marketing Bulletin, 7, 51–57.
Dryer, D. C. (1999). Getting personal with computers: how to design personalities for agents. Applied
Artificial Intelligence, 13, 273–295.
Fogg, B. J., & Nass, C. (1997). Silicon sycophants: the effects of computers that flatter. International
Journal of Human–Computer Studies, 46, 551–561.
Gribble, J. N., Miller, H. G., Cooley, P. C., Catania, J. A., Pollack, L., & Turner, C. F. (2000). The
impact of T-ACASI interviewing on reported drug use among men who have sex with men. Substance
Use & Misuse, 35(6–8), 869–890.
Hutchinson, K. L., & Wegge, D. G. (1991). The effects of interviewer gender upon response in telephone
survey research. Journal of Social Behavior and Personality, 6, 573–584.
Kane, E. W., & Macauley, L. J. (1993). Interviewer gender and gender attitudes. Public Opinion Quarterly,
57, 1–28.
Kiesler, S., & Sproull, L. (1997). ‘‘Social’’ human–computer interaction. In B. Friedman (Ed.), Human
values and the design of computer technology. Stanford, CA: CSLI Press.
Moon, Y. (1998). Impression management in computer-based interviews: the effects of input modality,
output modality, and distance. Public Opinion Quarterly, 62, 610–622.
Moon, Y. (2000). Intimate exchanges: using computers to elicit self-disclosure from consumers. Journal of
Consumer Research, 26, 323–339.
Nass, C., Fogg, B. J., & Moon, Y. (1996). Can computers be teammates?. International Journal of
Human–Computer Studies, 45, 669–678.
Nass, C., Isbister, K., & Lee, E.-J. (2001). Truth is beauty: researching embodied conversational agents. In
J. Cassell, J. Sullivan, S. Prevost, & E. Churchill (Eds.), Embodied conversational agents (pp. 374–402).
Cambridge, MA: MIT Press.
Nass, C., Moon, Y., & Carney, P. (1999). Are people polite to computers? Responses to computer-based
interviewing systems. Journal of Applied Social Psychology, 29, 1093–1110.
Nass, C., Moon, Y., & Green, N. (1997). Are machines gender neutral? Gender-stereotypic responses to
computers with voices. Journal of Applied Social Psychology, 27, 864–876.
Parise, S., Kiesler, S., Sproull, L., & Waters, K. (1999). Cooperating with life-like interface agents. Computers in Human Behavior, 15, 123–142.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and
Social Psychology, 46, 598–609.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, &
L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). San
Diego: Academic Press.
Reeves, B., & Nass, C. (1997). The media equation: how people treat computers, television, and new media
like real people and places. Cambridge: CSLI and Cambridge University Press.
Schuman, H., & Converse, J. (1971). The effects of black and white interviewers on black responses in
1968. Public Opinion Quarterly, 35, 44–68.
Sproull, L., Subramani, M., Kiesler, S., Walker, J. H., & Waters, K. (1996). When the interface is a face.
Human–Computer Interaction, 11, 97–124.
Tourangeau, R., & Smith, T. (1998). Collecting sensitive information with different modes of data collection. In M. P. Couper, R. P. Baker, J. Bethlehem, C. Z. F. Clark, J. Martin, et al. (Eds.), Computer
assisted survey information collection. New York: John Wiley.
Tourangeau, R., & Steiger, D. M. (1999). The impact of mode of data collection on response rates. Paper
presented at the International Conference on Nonresponse, Portland, Oregon, 28 October.
Turner, C. F., Forsyth, B. H., O’Reilly, J. M., Cooley, P. C., Smith, T. K., Rogers, S. M., & Miller, H. G.
(1998). Automated self-interviewing and the survey measurement of sensitive behaviors. In M. P. Couper
(Ed.), Computer assisted survey information collection. New York: John Wiley.
Turner, C. F., Ku, L., Rogers, S. M., Lindberg, L. D., Pleck, J. H., & Sonenstein, F. L. (1998). Adolescent
sexual behavior, drug use and violence: Increased reporting with computer survey technology. Science,
280, 867–873.
Walker, J., Sproull, L., & Subramani, R. (1994). Using a human face in an interface. In Proceedings of the
Conference on Human Factors in Computers (pp. 85–99). Boston: ACM Press.
