
Journal of Business Research 67 (2014) 598–607


Detecting gender item bias and differential manifest response behavior: A Rasch-based solution☆

Thomas Salzberger a,⁎, Fiona J. Newton b,1, Michael T. Ewing b,⁎⁎

a WU Wien, Institute for Marketing Management, Augasse 2-6, 1090 Vienna, Austria
b Department of Marketing, Faculty of Business & Economics, Monash University, PO Box 197, Caulfield East, VIC, 3145 Australia

Article info

Article history:
Received 1 March 2011
Received in revised form 1 November 2011
Accepted 1 March 2012
Available online 21 March 2013

Keywords:
Gender item bias
Measurement bias
Item-response theory
Differential item functioning
Rasch modeling

Abstract

Although gender is a salient variable in consumer research, researchers largely overlook whether, and how, it influences consumer response to indicators measuring latent variables. The authors therefore extend the framework of measurement equivalence assessment to the largely overlooked issue of differential item response behavior between men and women. This paper demonstrates the efficacy of using item response theory to investigate the presence of gender item bias. This methodological approach affords researchers the means of objectively disentangling actual gender differences and gender bias. Ignoring the possibility of gender item bias has the potential to bias means and thereby compromise any substantive gender-based mean comparisons. The authors conclude with solutions to address gender item bias both pre and post survey construction.

© 2013 Elsevier Inc. All rights reserved.

☆ The authors acknowledge and are grateful for the comments by Harmen Oppewal, Joshua Newton and Carla Taines (Monash University) to an earlier draft of this manuscript. The authors alone are responsible for all limitations and errors that may relate to the study and the manuscript.
⁎ Corresponding author. Tel.: +43 1313364609; fax: +43 131336732.
⁎⁎ Corresponding author. Tel.: +61 39903 2563; fax: +61 39903 2974.
E-mail addresses: Thomas.Salzberger@wu.ac.at (T. Salzberger), Fiona.Newton@monash.edu (F.J. Newton), Michael.Ewing@monash.edu (M.T. Ewing).
1 Tel.: +61 39903 2563; fax: +61 39903 2974.

0148-2963/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jbusres.2013.02.045

1. Introduction

The comparability of measures of latent variables requires psychometric evidence of measurement equivalence (Steenkamp & Baumgartner, 1998). Although widely acknowledged among cross-cultural researchers, awareness of other criteria possibly modifying the response process and altering the relationship between a manifest indicator and the latent variable is either limited or nonexistent. Gender is a case in point. Despite perennial interest in gender differences in consumer behavior (Dahl, Sengupta, & Vohs, 2009; Melnyk, van Osselaer, & Bijmolt, 2009; Moss, 2009; Wolin, 2003), the extant marketing literature typically fails to examine whether gender influences participant responses to survey instruments. This is a major theoretical gap, given the rich body of research indicating gender differences in terms of information processing and social behavior (Eagly, 1987, 1993; Halpern, 1989; Putrevu, 2001; Wood & Rhodes, 1992). In practical terms, there is obvious potential for gender-item bias in survey research (Dean & Edwardson, 1996; Fleishman, Spector, & Altman, 2002; Gelin, Carleton, Smith, & Zumbo, 2004; Orhede & Kreiner, 2000) which plays a crucial role in marketing research (Steenkamp, De Jong, & Baumgartner, 2010). This omission is likely to emanate from a lack of awareness of the problem itself. Authors typically proceed on the assumption of items functioning in the same way for females and males; and reviewers rarely, if ever, ask for evidence to support this supposition. It is feasible that lack of familiarity with appropriate psychometric methods also contributes to researchers overlooking this issue in marketing research outside the field of cross-national or cross-cultural research. In fact, even in international research, limited awareness of the problems and ignorance of proper analytical approaches are still commonplace as evidenced by He, Merz, and Alden (2008).

The present study therefore examines gender as a potential contributor to differential item response behavior between men and women. The study first establishes a theoretical foundation of differential item functioning (DIF; otherwise termed item bias) and discusses the different types of DIF resulting from respondent characteristics. The suggested method to test for DIF not only allows for the identification of DIF but also helps establish a common metric of measures across males and females should DIF occur. Empirical examples illustrate the different ways in which gender can influence participants' responses to surveys and the possible consequences of ignoring the issue.

2. Differential item functioning (DIF)

Although invariance of measurement is crucial, researchers typically assume that the items mean roughly the same thing to all respondents within a particular population. Yet a variety of factors

may lead respondents to interpret and consequently respond to the meaning of the items in different ways, thereby threatening measurement equivalence and comparability across groups. The scientific community readily acknowledges this issue in cross-cultural applications by requiring appropriate empirical analyses to assess measurement equivalence rather than relying on fortunate coincidence. However, the degree of gender invariance in participant response patterns remains unknown despite extensive evidence indicating gender-based differences in social behavior and personality (Eagly, 1987, 1993; Eagly & Wood, 1991, 1999; Hall, 1984, 1987; Wood & Rhodes, 1992), cognition (Halpern, 1989, 1992), learning (Severiens & Ten Dam, 1998) and information processing (Putrevu, 2001). Explanations for gender differences include evolutionary (Buss, 1995a, 1995b; Luxen, 2007), biological (Alexander, 2003; Williams & Meck, 1991) and sociological (Bem, 1981) influences. There are also potentially methodological explanations for gender differences. This paper focuses specifically on the latter. Evidence suggests that implicit cues in response formats can influence response patterns. In a study of jealousy in relation to infidelity, for instance, DeSteno and colleagues (DeSteno, Bartlett, Braverman, & Salovey, 2002; DeSteno & Salovey, 1996) argue that men and women interpret stimuli differently. These gender differences, however, are not present in response formats designed to remove implicit cues (DeSteno et al., 2002). Moreover, the power of implicit cues to influence participant responses links to respondents' inherent desire to understand the true purpose of the experiment and then conform to what they think researchers expect of them (Orne & Whitehouse, 2000).

When assessing differences between men and women in comparative research, there is a need for researchers to disentangle true and artificial effects by properly investigating gender-related invariance of measurement. This involves a careful examination of gender effects, which is a collective term for any impact caused by gender as an independent variable on a dependent variable of interest. A gender effect may consist of two components. First, gender may have a true impact on the measured variable (i.e., an actual substantial difference between males and females exists). Second, a gender effect can result from differential response behavior when measuring the dependent variable (i.e., a methodological artifact requiring identification and correction). In practice, gender effects may also result from a combination of true substantial effect and measurement bias. In this latter situation, the effects can work in opposite directions, with the possibility that DIF will mask a true effect. Alternatively, an observed mean difference can appear to be larger than it actually is due to DIF.

Despite this, researchers often ignore the issue of gender measurement invariance. For example, Melnyk et al. (2009) analyze differences between men and women regarding customer loyalty across three separate studies. Although their third study uses four attitudinal and behavioral intention indicators to measure loyalty, an assessment of measurement invariance in terms of gender is not undertaken. Similarly, Seock and Sauls (2008) investigate gender effects on shopping orientation (operationalized by 18 items covering six dimensions) and evaluation of apparel retail stores (measured by 15 items loading on three different factors). The authors, however, do not subject the data set to an invariance assessment. Thus, the higher importance of all three store evaluation factors for females (identified on raw score comparisons) could be at least partly due to DIF. Likewise, de Gregorio and Sung (2010) in their exploration of gender effects on attitudes toward product placement do not test for measurement artifacts. Although Richard, Chebat, Yang, and Putrevu's (2010) study, published in the Journal of Business Research, constitutes a commendable exception, in that they do assess measurement equivalence across genders using multi-group structural equation modeling, the power to identify DIF is limited since seven out of the ten latent variables are measured using only two or three items.

Finally, Snipes and Thomson (2006) use the term gender bias when actually testing for gender differences on the service quality assessment and service fairness in the higher education service industry. Specifically, the authors analyze the effect of gender using hierarchical regression analyses with gender among the independent variables and service quality and fairness acting as the respective dependent variables. These authors use items based on SERVQUAL to measure service quality and a single item to assess service fairness and find no effect for service quality. However, they report gender impacts on service fairness. Since the study does not test measurement invariance, the lack of effect in the case of service quality and the effect in service fairness could result from any combination of a true impact and measurement bias. The Snipes and Thomson (2006) study also highlights the potential limitations of relying on single-item measures. Specifically, the use of a single item to measure service fairness not only rules out the possibility of testing for invariance but also increases the likelihood that bias will at least partially account for the alleged gender difference.

These examples of failure to test for gender-based DIF are by no means exhaustive, but rather serve to illustrate how consumer researchers often disregard the potential for gender to impact on the measurement models used to assess latent variables. The current paper addresses this oversight by providing proof of concept that Rasch measurement theory (Rasch, 1960) is an effective way to investigate whether gender-based measurement invariance is present. The paper also provides insights into how researchers can properly account for gender bias, thus facilitating their ability to objectively distinguish between actual gender differences and gender bias.

3. Methodological approaches to examining item DIF

As DIF is a manifestation of non-invariance of the measurement model, the same methods utilized to test for invariance in cross-cultural research (Cheung & Rensvold, 1999; De Jong, Steenkamp, & Fox, 2007; Schaffer & Riordan, 2003; Singh, 1995; Steenkamp & Baumgartner, 1998) lend themselves as approaches to investigate gender-based DIF. Although both confirmatory factor analysis (CFA) and item response theory (IRT) offer means of assessing measurement equivalence or DIF (Embretson & Reise, 2000; Ewing, Salzberger, & Sinkovics, 2005; Schaffer & Riordan, 2003), the current paper utilizes the latter, namely the Rasch model (Andrich, 1978a, 1978b, 1988; Rasch, 1960; Salzberger, 2009).

The defining characteristic of a Rasch measurement model is the requirement of invariance, which implies the independence of item parameters from respondent characteristics, for example gender. This property makes it a powerful method for identifying DIF. Unlike general IRT models (Singh, 2004), the Rasch model uses only location parameters (item location) to model item characteristics but no discrimination parameters. The location parameters specify the position of the item and the item's response categories on the continuum of the latent variable. Thus, item parameters and person measures are directly comparable as they are on the same metric. Furthermore, the Rasch model (Rasch, 1960) is without a number of the limiting assumptions associated with CFA. First, unlike CFA, the Rasch model does not assume that manifest item scores are interval-scaled. Second, the relationship between the latent variable and the manifest item score is non-linear. This means that the model accounts for any floor and ceiling effects, since the probability of choosing an extreme response category approaches 0 or 1 only as the person location tends toward infinity. Third, the Rasch model does not depend on any assumptions as to the distribution of the latent measures of respondents. This is particularly helpful as gender-based DIF can be investigated using samples that are not representative or which deviate from being normally distributed.

Since marketing research typically utilizes multi-categorical response scales, the polytomous Rasch model for ordered categories (e.g., the rating scale model, Andrich, 1978a, 1978b; the partial credit model, Andrich, 1988; Masters, 1982) is of more interest than the dichotomous model (Rasch, 1960). Multi-categorical items require

m − 1 threshold location parameters where m equals the number of response categories. A threshold parameter (τij) indicates the transition point between two adjacent response categories j and j + 1 within item i, where two adjacent categories are equally likely. The average of the threshold parameters represents the overall location (δi) of the item. Eq. (1) depicts the probability of choosing a category scored x (with x going from 0 to m) depending on the person parameter βv, the item overall location δi and the threshold locations τij, which are stated as deviations from their overall mean (Andrich, 1988, p. 366). The denominator γ is the sum of the numerators across all response categories. The numerator of scoring in the lowermost category (0) is 1. Expression of parameters is in logits, or log-odds of agreeing with an item.

$$
P\left(a_{vi} = x \mid \beta_v, \tau_{ij},\ j = 1 \ldots m,\ 0 < x \le m\right) = \frac{e^{\sum_{j=1}^{x} (-\tau_{ij}) + x \cdot (\beta_v - \delta_i)}}{\gamma} \qquad (1)
$$

where:

$$
\gamma = 1 + \sum_{k=1}^{m} e^{\sum_{j=1}^{k} (-\tau_{ij}) + k \cdot (\beta_v - \delta_i)}.
$$

The s-shaped item characteristic curve (ICC; see Fig. 1a) graphically depicts each item. In the dichotomous model, the curve represents the probability of a positive response depending on the person location. Polytomous items require a set of curves (the category characteristic curves), with one curve for each response category (see Fig. 1b). In the polytomous case (see Fig. 1c), the ICC refers to the expected item score. The juxtaposition of the theoretical ICC and the empirical ICC showing the actual item mean scores from groups of respondents with similar location estimates (see Fig. 1d) exposes whether the item fits or not. Statistical tests of fit allow for determining the significance of the size of the residuals.

The current paper draws upon a straightforward comparison of expected and observed responses to check whether the Rasch model accurately explains the data. This process involves grouping respondents into m homogeneous classes along the continuum. For each class, the standardized difference between the expected item score and the observed mean score is computed; summed over the classes, the squared differences provide an approximation to a chi-square distributed test statistic, where df = m − 1 (see Andrich, Sheridan, & Luo, 2004, p. 21; Hagquist & Andrich, 2004, p. 961). A non-significant chi-square supports item fit. Summing the individual item chi-square values yields an overall fit statistic for the entire instrument. Data analysis has to address all requirements embodied in the Rasch model: unidimensionality (Christensen, Bjorner, Kreiner, & Petersen, 2002; Smith, 2002); local independence (Andersen, 1982); parameter invariance (DIF); and person fit (Smith, 1986). Furthermore, a comprehensive Rasch analysis comprises the investigation of the threshold order (Andrich, 1995a, 1995b, 2010; Andrich, de Jong, & Sheridan, 1997). In a five-category item, a proper ordering of the threshold parameters τij exists when τi1 < τi2 < τi3 < τi4. However, if for example the estimate of τi2 is greater than that of τi3, then the thresholds become disordered, signaling that the rating scale is not working as intended. Collapsing categories and scoring them equally is a post hoc remedy to account for disordered thresholds. User-friendly software like RUMM 2030 (Andrich, Sheridan, & Luo, 2010) not only estimates model parameters but also provides a range of diagnostic opportunities capturing deviations of the actual data from the theoretical ideal. A meaningful interpretation of the item locations and the person measures depends upon demonstration of adequate fit of the data to the model.

3.1. DIF as a type of misfit

Differential item functioning represents a type of misfit because an item affected by DIF lacks invariance with regard to the latent variable. For example, an item affected by gender-based DIF lacks invariance with regard to measures for men and for women. There are two types of DIF: uniform and non-uniform. Uniform DIF by gender implies that the item location differs for males and females. If the analysis fails to account for DIF, it provides one common item location estimate that is incorrect for both genders. Consequently, the person measures inferred from manifest responses are biased. Fig. 2a

[Fig. 1 (image not reproduced). Panels: (a) item characteristic curve (ICC) for a dichotomous item; (b) category characteristic curves (CCC) for a polytomous item; (c) ICC for a polytomous item; (d) theoretical and empirical ICC for a polytomous item. Horizontal axes run from −3 to 3 (βv, δi in panel a; βv, τij in panel b; βv, δi, τij in panels c and d). Vertical axes: P(avi = 1) from 0 to 1 in panels (a) and (b); expected value of avi from 0 to 5 in panels (c) and (d).]

Fig. 1. Item characteristic and category characteristic curves in the Rasch model.
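
As a minimal sketch of Eq. (1), assuming the thresholds are supplied as deviations from the overall item location and that the maximum score equals the number of thresholds, the category probabilities (Fig. 1b) and the expected item score (Fig. 1c) can be computed as follows. The function names and the six-category example item are hypothetical; they are not taken from the paper or from RUMM 2030.

```python
import math


def category_probabilities(beta, delta, taus):
    """Return P(a_vi = x) for x = 0..m under the polytomous Rasch model, Eq. (1).

    beta  -- person location (logits)
    delta -- overall item location (logits)
    taus  -- thresholds tau_i1..tau_im, stated as deviations from delta
    """
    numerators = [1.0]          # numerator for the lowermost category (x = 0)
    cumulative = 0.0            # running sum of -tau_ij for j = 1..x
    for x, tau in enumerate(taus, start=1):
        cumulative -= tau
        numerators.append(math.exp(cumulative + x * (beta - delta)))
    gamma = sum(numerators)     # denominator: sum of all numerators
    return [n / gamma for n in numerators]


def expected_score(beta, delta, taus):
    """Expected item score, i.e. the polytomous ICC of Fig. 1c."""
    probs = category_probabilities(beta, delta, taus)
    return sum(x * p for x, p in enumerate(probs))


# Hypothetical six-category item (scores 0..5) with ordered thresholds.
taus = [-2.0, -1.0, 0.0, 1.0, 2.0]
for beta in (-3.0, 0.0, 3.0):
    print(beta, round(expected_score(beta, 0.0, taus), 3))
```

At a person location equal to δi + τij, the two categories adjacent to threshold j are equally likely, which matches the definition of a threshold given in the text.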

[Fig. 2 (image not reproduced). Panels: (a) no presence of DIF; (b) uniform DIF; (c) non-uniform DIF; (d) uniform and non-uniform DIF. Horizontal axes: βv, δi from −3 to 3; vertical axes: P(avi = 1) from 0 to 1. Panel (a) shows a common ICC for males and females; panels (b) to (d) show the expected common ICC, the male ICC and the female ICC.]

Fig. 2. Item operating equally for males and females (a); item affected by uniform DIF (b), by non-uniform DIF (c), and by uniform as well as non-uniform DIF (d), respectively.
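
The uniform DIF pattern of Fig. 2b can be illustrated numerically. The sketch below assumes a dichotomous item and hypothetical item locations for males and females; taking their midpoint as the common location is an assumption made purely for illustration, since an actual common estimate depends on the sample.

```python
import math


def p_agree(beta, delta):
    """Dichotomous Rasch model: probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


# Hypothetical item locations (logits): the item favors males, so the
# male location is smaller than the female location.
DELTA_MALE, DELTA_FEMALE = -0.3, 0.3
DELTA_COMMON = 0.0  # assumed single estimate when DIF is ignored

print("beta   male  female  common  resid_m  resid_f")
for beta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    male = p_agree(beta, DELTA_MALE)
    female = p_agree(beta, DELTA_FEMALE)
    common = p_agree(beta, DELTA_COMMON)
    # Residual = group-specific probability minus the common expectation.
    print(f"{beta:5.1f}  {male:.3f}  {female:.3f}  {common:.3f}  "
          f"{male - common:+.3f}  {female - common:+.3f}")
```

At every person location the male residual is positive and the female residual is negative. This is the pattern that the gender factor in a two-way analysis of variance of the residuals is meant to pick up.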

illustrates the situation where the item displays no DIF, while Fig. 2b shows an item exhibiting uniform DIF. When DIF is present, a proper representation of the item requires two separate ICCs: one for males and one for females. In the example in Fig. 2b, the item favors males, who score higher than females truly located at the same position on the latent dimension. If only one common ICC is estimated, the estimated measures for males and females will therefore differ even though they have the same true measures. In the case of uniform DIF, a quantitative difference in the functioning of the item exists between genders. Splitting the item in the data analysis permits accounting for this difference. If an item favors males, the item location for males will be smaller than the location for females, since males are more likely to agree with the item than females located at the same point on the latent continuum. By contrast, non-uniform DIF is a type of model violation that offers fewer possibilities of compensation. Non-uniform DIF means that the slope of the ICC is different for males and females (see Fig. 2c), whereas the location can be the same. In this example, the item favors males below the item location, whereas females score higher than males above it. This type of DIF represents a qualitative difference and suggests that the item fails to be a proper indicator of the latent variable for at least one gender, possibly for both. Finally, uniform and non-uniform DIF may occur simultaneously (see Fig. 2d). The consequences parallel those of non-uniform DIF.

DIF requires special attention since ordinary tests of fit do not necessarily reveal this type of model violation, unless DIF is extreme or involves a strong non-uniform component. One way to explicitly test for the presence of DIF is to undertake a two-way analysis of variance of the item residuals (Ewing et al., 2005). As per the assessment of item fit, the analysis involves grouping the respondents into several classes according to their location. Hence, the class interval is the first factor, which essentially captures general fit. The second source of variation represents the DIF factor, which in the current context comprises the categories of male and female. Under the null hypothesis, all differences between the expected scores and the actual scores are zero and, so, all ANOVA effects should be non-significant, including the interaction term. Uniform DIF implies that residuals are mostly positive for one group but predominantly negative for the other. In this context, the DIF factor is significant. The Rasch model easily accounts for this type of DIF by estimating one item location for males and another for females. Although, at a technical level, this requires splitting the item into a male version and a female version during the analysis, it does not add to the proliferation of scales in marketing since the separation only occurs during the analysis. The administration of the scale itself typically remains unaltered. The same is true when researchers retain non-uniform DIF items for one gender but discard these items for the other gender during the analysis.

4. The role of DIF by gender

A key issue is deciding when to test for gender-based DIF. If the theoretical underpinnings of a study give reason to anticipate gender-based DIF, a proper investigation of DIF is (or rather should be) a matter of course. The situation parallels the assumption and the assessment of unidimensionality. Even though thorough qualitative investigations and extensive pretests precede a scale development project, proper assessment of unidimensionality is still an integral part of scale analysis. Furthermore, the fact that unidimensionality has been established once does not exempt future researchers from rigorously reestablishing unidimensionality.

Neglecting the issue of gender-based DIF implies the possibility of distorted, biased measures. Specifically, measures for females and males may lack comparability, resulting in biased p-values for mean comparisons. Disregarding non-uniform DIF can result in invalid items contributing to the estimation of measures for males or females. Although the biased DIF items may balance out if some items favor males while others advantage females to exactly the same amount (Andrich & Hagquist, 2012), this line of argument comes with serious qualifications. First, it requires a proper investigation of DIF in the first place since the cancelation is confined to a specific scale and very likely to a particular context. Transferring the instrument from one context to another might completely invalidate DIF cancelation. Second, cancelation cannot account for non-uniform DIF. Third, cancelation of DIF

can be deceptive in situations where there is incomplete data or very specific patterns of missing values. Fourth, the non-reporting of DIF can result in other researchers pre-emptively assuming an absence of DIF. This assumption may be incorrect if the context is different or the researchers use a subset of items or a shortened version of a scale.

Thus, the potential problem of gender-DIF requires a multi-stage approach. First, researchers should test for gender-based DIF, ideally in any empirical study but specifically in studies aiming at the assessment of gender differences. Contemporary measurement models based on Rasch measurement theory or confirmatory factor analysis allow for straightforward tests for DIF. Flagging items exhibiting DIF properly informs fellow researchers who want to utilize the instrument in the future. Moreover, revealing gender-based DIF, if any, informs theory development and thereby provides the basis for advancing our understanding of the role of gender in substantive theories. In the longer term, evidence of comprehensive DIF-analyses will provide insight to the scientific community into how serious the threat actually is. Second, a subsequent analysis should account for DIF either by discarding items showing DIF or by splitting such items in the analysis. Third, researchers may wish to compare the original analysis and the analysis considering DIF to ascertain the extent to which gender-based DIF matters in their given context.

5. Case analyses

This paper uses three cases to illustrate the different ways in which gender can influence participant response to questionnaire items. The aim is to examine the potential impact of gender-based DIF on research outcomes. All analyses utilize RUMM 2030 (Andrich et al., 2010).

5.1. Case One: The fear of death scale

The first case pertains to the Collett–Lester Fear of Death scale (Lester & Abdel-Khalek, 2003), which is an established instrument assessing fear of death and dying. Measures of fear of death have particular utility across a range of marketing contexts. In social marketing settings, the relevance of fear of death measures stems from their proven ability to predict intentions to perform health related behaviors as varied as communicating wishes about posthumous organ donation with family members (Newton, Burney, Hay, & Ewing, 2010) and performing physical exercise (Arndt, Schimel, & Goldenberg, 2003). In more general marketing contexts, fear of death is intrinsically linked with terror management theory (Routledge & Juhl, 2010), a theoretical account that explains the ‘urge to splurge’ that underpins much of modern consumer behavior (Arndt, Solomon, Kasser, & Sheldon, 2004). The scale consists of 28 items in total assessing four different dimensions: one's own death, one's own dying, the death of others, and the dying of others. Each dimension comprises seven items.

The sample comprises 404 respondents. Two subscales (one's own death and one's own dying) show no signs of gender-based DIF. However, the dimensions capturing the death and the dying of others exhibit DIF, such that one item in each subscale favors males. The lack of gender neutrality of these two items threatens the validity of the assessment of gender differences (see Table 1).

Table 1
Item location estimates for items from two subscales of the Fear of Death Scale.

| Item / statistic | Column 1: Location (ignore DIF) | Column 2: Fit (p) | Column 3: Location (account for DIF) |
|---|---|---|---|
| Subscale: The death of others | | | |
| 1: Losing someone close to you | −0.84 | 0.09 | −1.01 |
| 2: Having to see the person's dead body | Misfit | 0.002a | |
| 3: Never being able to communicate with the person again | −0.46 | 0.08 | −0.63 |
| 4: Regret over not being nicer to the person when he or she was alive | 1.20 | 0.01 | 0.62(M), 1.24(F) |
| 5: Growing old alone without the person | 0.09 | 0.05 | −0.07 |
| 6: Feeling guilty that you are relieved that the person is dead | Misfit | 0.002a | |
| 7: Feeling lonely without the person | 0.01 | 0.11 | −0.15 |
| Items favoring females | 0 | | 0 |
| Items favoring males | 1 | | 0 |
| Overall fit (p) | 0.0006 | | 0.009 |
| PSI (Reliability) | 0.83 | | 0.83 |
| Mean difference female–male (absolute) | 0.80 | | 0.94 |
| Effect size mean difference | 0.53 | | 0.62 |
| Effect size mean difference female–male (relative) | 100% | | 117% |
| p-Value | 0.00002 | | 0.000006 |
| Subscale: The dying of others | | | |
| 1: Having to be with someone who is dying | 0.80 | 0.82 | 0.53(M), 0.84(F) |
| 2: Having the person want to talk about death with you | 1.49 | 0.33 | 1.38 |
| 3: Watching the person suffer from pain | −1.72 | 0.17 | −1.817 |
| 4: Seeing the physical degeneration of the person's body | −0.48 | 0.02 | −0.57 |
| 5: Not knowing what to do about your grief at losing the person when you are with him or her | 0.13 | 0.57 | 0.04 |
| 6: Watching the deterioration of the person's mental abilities | −1.04 | 0.33 | −1.13 |
| 7: Being reminded that you are going to go through the experience also one day | 0.82 | 0.03 | 0.72 |
| Items favoring females | 0 | | 0 |
| Items favoring males | 1 | | 0 |
| Overall fit (p) | 0.04 | | 0.12 |
| PSI (Reliability) | 0.88 | | 0.88 |
| Mean difference female–male (absolute) | 0.299 | | 0.361 |
| Effect size mean difference | 0.19 | | 0.23 |
| Effect size mean difference female–male (relative) | 100% | | 121% |
| p-Value | 0.10 | | 0.04 |

M = Location for male version of the item when accounting for DIF during analysis.
F = Location for female version of the item when accounting for DIF during analysis.
a Refers to an earlier analysis including this item.

With respect to the 7-item death of others subscale, females are more reluctant than males to admit that they experience regret over not being nicer to the person when he or she was alive. Column 1 in Table 1 refers to an analysis that ignores DIF. Since DIF does not adversely affect item fit (see column 2 in Table 1), it is likely to go unnoticed unless the analysis explicitly tests for gender-based DIF. Despite Item 4 favoring males, overall, females show a much higher level of fear of death of others with the effect size mean difference (Cohen's d, Cohen, 1988) amounting to 0.53. Accordingly, the p-value is extremely small (p = 0.00002). After properly accounting for DIF (by splitting the item into a male and a female version during the estimation), the effect size of the mean difference increases from 0.53 to 0.62 (see columns 1 and 3 in Table 1). In essence, Item 4 partially masks the effect; accounting for its DIF increases the effect size by 17%. In the current context, the mean difference between females and males is sufficiently large that this item does not mask the overall finding that females score higher on this subscale than males.

The second 7-item subscale, the dying of others, also shows evidence of gender-item DIF which, when unaccounted for, will lead to erroneously accepting the null hypothesis. Specifically, Item 1 (Having to be with someone who is dying) implies higher scores for males compared to females given the latent variable. Again, DIF does not become manifest in the item fit statistics as the p-value is non-significant (see the p-value of 0.82 in column 2 in Table 1), but requires specific analyses focusing on invariance. In contrast to the prior subscale (death of others), the mean difference between genders is smaller. When not accounting for DIF, the effect size is 0.19 (see column 1 in Table 1) and the p-value of 0.10 militates in favor of the null hypothesis. However, the analysis properly adjusting for DIF proves this conclusion wrong (see column 3

in Table 1). When considering DIF, the effect size goes from 0.19 to 0.23, an increase of 21%, which is enough to bring the p-value down to 0.04. Consequently, a seemingly negligible amount of gender-based DIF reverses the result of hypothesis testing.

5.2. Case Two: The self-report altruism scale

The second example utilizes the 20-item self-report altruism scale by Rushton, Chrisjohn, and Fekken (1981) and a convenience sample of 409 respondents. Some items embody behavior that requires physical strength, for example helping a stranger push a car, which potentially favors males. Other items implicate a safety hazard that may disadvantage females, for example giving a stranger a lift in one's car. Furthermore, the content of some items seemingly aligns to traditional female roles, for example looking after a neighbor's pets or children. In total, eight items are plausible candidates for gender-based DIF. In fact, a Rasch analysis reveals DIF in six of these

As in the previous example, the ordinary item fit statistics do not indicate DIF. Only the two-way ANOVA is sensitive to the violation of invariance. Since the number of items favoring females and males, respectively, is balanced, it is unlikely that DIF will dramatically distort the means of females or males. Nevertheless, DIF does not completely cancel out and therefore p-values of mean comparisons are somewhat biased whenever DIF is present. For example, when ignoring DIF, the mean difference between females and males is small, with males scoring slightly higher (−0.016 on the logit scale, which translates to an effect size of −0.02, p-value of 0.82; see column 1 in Table 2). When accounting for DIF in all six DIF-affected items (see column 2 in Table 2), the means change direction, suggesting that females are slightly more altruistic than males (0.018 on the logit scale, effect size of 0.02, p-value of 0.64). Two interrelated factors contribute to DIF becoming irrelevant in this case. First, the mean difference between females and males is extremely small. Therefore, a switch from the null hypothesis of no mean difference to the alternative hypothesis requires a great dis-
eight items, while one of these items fits weakly (see Table 2). No tortion of the measures. Second, the DIF cancels out almost completely.
item shows DIF without a justifiable reason. However, the latter is only true provided the measures derive from the

complete set of all 20 items or, at least, from a subset of items safeguarding a balance of items favoring females and males. This highlights the need to exert caution when selecting subsets of items to shorten the length of a survey, as often occurs in both academic and commercial survey research. In situations such as these, and where researchers are exploring gender differences, a sensitivity analysis provides valuable insight in terms of how much undetected (and unaccounted) DIF may distort the measures. Two additional analyses are therefore undertaken to illustrate the potential impact of ignoring gender DIF when selecting items for a shortened scale.

The first analysis omits the three items that favor males but retains the remaining 17 items from the altruism scale. As expected, the mean difference shifts in favor of female respondents (0.20 on the logit scale, effect size of 0.20, p-value of 0.08; see column 4 in Table 2) and approaches significance. By contrast, in the alternative analysis excluding the three DIF items that favor females (see column 5 in Table 2), the reverse occurs (−0.19 on the logit scale, effect size of −0.20, p-value of 0.09). Thus, when analyses do not properly account for DIF, there is at least a tenfold inflation of the effect size in either direction. Importantly, this sensitivity analysis is still conservative. Subsets of, say, six items with half of them showing unbalanced DIF would certainly result in even more extreme differences between females and males.

Table 2
Item location estimates of items of the self-report altruism scale.

| Altruism scale items | Column 1: Location, DIF ignored (a) | Column 2: Fit (p) | Column 3: Location, DIF accounted for (b) | Column 4: Location, 3 DIF items omitted (c) | Column 5: Location, 3 DIF items omitted (d) |
| --- | --- | --- | --- | --- | --- |
| 1: I have helped push a stranger's car that had broken down. | 1.59 | 0.54 | 2.13 (F); 0.51 (M) | Omitted | 1.53 |
| 2: I have given directions to a stranger. | −3.10 | 0.14 | −3.27 | −2.87 | −3.22 |
| 3: I have made change for a stranger. | 0.24 | 0.02 | 0.08 | 0.51 | 0.15 |
| 4: I have given money to a charity. | −2.27 | 0.42 | −2.44 | −2.03 | −2.32 |
| 5: I have given money to a stranger who needed it (or asked me for it). | 0.97 | 0.84 | 0.82 | 1.25 | 0.89 |
| 6: I have donated goods or clothes to a charity. | −2.06 | 0.71 | −2.11 (F); −1.28 (M) | −1.85 | Omitted |
| 7: I have done volunteer work for a charity. | 0.30 | 0.01 | 0.14 | 0.56 | 0.22 |
| 8: I have donated blood. | 1.48 | 0.12 | 1.32 | 1.75 | 1.40 |
| 9: I have helped carry a stranger's belongings (books, parcels etc.). | 0.75 | 0.18 | 0.59 | 1.03 | 0.67 |
| 10: I have delayed an elevator and held the door open for a stranger. | −1.78 | 0.99 | −1.96 | −1.55 | −1.87 |
| 11: I have allowed someone to go ahead of me in a line-up (at a photocopy machine, in the supermarket). | −1.53 | 0.01 | −1.70 | −1.29 | −1.61 |
| 12: I have given a stranger a lift in my car. | 2.23 | 0.19 | 2.41 (F); 1.54 (M) | Omitted | 2.16 |
| 13: I have pointed out a clerk's error (in a bank, at the supermarket) in undercharging me for an item. | 0.58 | 0.60 | 0.42 | 0.85 | 0.50 |
| 14: I have let a neighbor whom I didn't know too well borrow an item of some value to me. | 1.45 | 0.12 | 1.31 | 1.73 | 1.35 |
| 15: I have bought 'charity' Christmas or greeting cards deliberately because I knew it was a good cause. | 0.27 | 0.39 | −0.02 (F); 0.62 (M) | 0.53 | Omitted |
| 16: I have helped a classmate who I did not know that well with a homework assignment when my knowledge was greater than his or hers. | 0.03 | 0.90 | −0.13 | 0.28 | −0.06 |
| 17: I have, before being asked, voluntarily looked after a neighbor's pets or children without being paid for it. | 0.39 | 0.61 | 0.08 (F); 0.90 (M) | 0.65 | Omitted |
| 18: I have offered to help a handicapped or elderly stranger across a street. | 0.75 | 0.01 | 0.61 | 1.04 | 0.66 |
| 19: I have offered my seat on a bus or train to a stranger who was standing. | −0.84 | 0.49 | −1.00 | −0.58 | −0.93 |
| 20: I have helped an acquaintance move households. | 0.55 | 0.57 | 0.71 (F); −0.27 (M) | Omitted | 0.48 |
| Items favoring females | 3 | | 0 | 3 | 0 |
| Items favoring males | 3 | | 0 | 0 | 3 |
| Overall fit (p) | 0.007 | | 0.003 | 0.002 | <0.001 |
| PSI (Reliability) | 0.82 | | 0.83 | 0.80 | 0.79 |
| Mean difference female–male (absolute) | −0.016 | | 0.018 | 0.200 | −0.19 |
| Effect size mean difference | −0.02 | | 0.02 | 0.20 | −0.20 |
| Effect size mean difference female–male (relative) | 100% | | −111% | −1185% | 1178% |
| p-Value | 0.82 | | 0.64 | 0.08 | 0.09 |

M = Location for male version of the item when accounting for DIF during analysis.
F = Location for female version of the item when accounting for DIF during analysis.
(a) Location parameters when DIF not taken into account for the 6 items that displayed DIF.
(b) Location parameters when DIF is taken into account for the 6 items that displayed DIF.
(c) Location parameters when omitting 3 items that favor males.
(d) Location parameters when omitting 3 items that favor females.

5.3. Case Three: The NEP scale across three countries

The third example uses Dunlap and Van Liere's (1978) twelve-item New Environmental Paradigm scale (NEP). The construct has attracted attention in terms of gender differences (e.g., Zelezny, Chua, & Aldrich, 2000). The data set comprises respondents from the United States (N = 501), the United Kingdom (N = 500) and Australia (N = 502). In all three countries, the reverse coded items show strong misfit (see column 2 in Table 3). Therefore, subsequent analyses use only the positively worded items. In addition, Item 11 misfits in Australia but functions properly in the US and the UK.

In both the UK and US samples, none of the NEP items displays gender-based DIF and the mean differences between females and males are insignificant (p-value of 0.38 in the UK and 0.74 in the US). Moreover, the effect sizes are small (−0.08 in the UK with males scoring higher on the NEP than females; 0.03 in the US with females scoring slightly higher than males). By contrast, in Australia one item (Item 9) lacks invariance in terms of gender. Males more readily agree with the statement that the earth is like a spaceship with only limited room and resources. As a result, the analysis ignoring DIF discloses males as being more environmentally concerned than females, with a p-value of 0.06 (see column 1 in Table 3). In fact, reliance on a traditional exploratory factor analysis (principal axis method) leads to significantly different factor scores (p-value of 0.047, see column 4 in Table 3). In the analysis that properly accounts for DIF, the p-value increases to 0.17, which supports the null hypothesis. Thus, DIF in just one item out of seven accounts for 27% of the effect size inferred from the first analysis ignoring DIF. While DIF partly masks a true effect in the first case using the fear of death scale, the third case illustrates that DIF may also enhance the mean difference. Ignoring DIF in the case of the NEP scale suggests that a gender difference seems to occur in Australia while no such difference prevails in the USA and the UK. This finding might invoke the notion of cultural differences between the countries investigated. However, this explanation is without substance, since a methodological artifact accounts for the identified difference.

Table 3
Item location estimates of the NEP scale.

| NEP items | Column 1: AUS Location | Column 2: AUS Fit (p) | Column 3: AUS Location | Column 4: AUS EFA (PA) (a) loading | Column 5: UK Location | Column 6: USA Location |
| --- | --- | --- | --- | --- | --- | --- |
| 1. We are approaching the limit of the number of people that the earth can support. | 0.51 | 0.03 | 0.54 | 0.54 | 0.19 | 0.46 |
| 2. The balance of the earth is very delicate and easily upset. | −0.18 | 0.63 | −0.15 | 0.63 | 0.12 | 0.36 |
| 3. Humans have the right to modify the natural environment to suit their needs (reversed). | | Misfit (b) | | | | |
| 4. Mankind was created to rule over the rest of nature (reversed). | | Misfit (b) | | | | |
| 5. When humans interfere with nature it often produces disastrous consequences. | 0.38 | 0.24 | 0.41 | 0.56 | 0.49 | −0.10 |
| 6. Plants and animals exist primarily to be used by humans (reversed). | | Misfit (b) | | | | |
| 7. To maintain a healthy economy we will have to develop a 'steady-state' economy where industrial growth is controlled. | 0.15 | 0.13 | 0.18 | 0.58 | 0.05 | 0.02 |
| 8. Humans must live in harmony with nature in order to survive. | −0.60 | 0.15 | −0.58 | 0.65 | −0.70 | −0.35 |
| 9. The earth is like a spaceship with only limited room and resources. | −0.15 | 0.00 | 0.08 (F); −0.39 (M) | 0.74 | 0.04 | −0.11 |
| 10. Humans need not adapt to the natural environment because they can remake it to suit their needs (rev.). | | Misfit (b) | | | | |
| 11. There are limits to growth beyond which our industrialized society cannot expand. | | Misfit (c) | | | 0.24 | 0.21 |
| 12. Mankind is severely abusing the environment. | −0.11 | 0.01 | −0.08 | 0.73 | −0.43 | −0.49 |
| Items favoring females | 0 | | 0 | 0 | 0 | 0 |
| Items favoring males | 1 | | 0 | 1 | 0 | 0 |
| Overall fit (p) | 0.00006 | | 0.0003 | | 0.02 | 0.002 |
| PSI (Reliability) | 0.75 | | 0.75 | 0.82 (d) | 0.76 | 0.81 |
| Mean difference female–male (absolute) | −0.248 | | −0.182 | −0.16 (e) | −0.109 | 0.046 |
| Effect size mean difference | −0.17 | | −0.12 | | −0.08 | 0.03 |
| Effect size mean difference female–male (relative) | 100% | | 73% | | | |
| p-Value | 0.06 | | 0.17 | 0.047 | 0.38 | 0.74 |

AUS = Australia; UK = United Kingdom; USA = United States of America.
M = Location for male version of the item when accounting for DIF during analysis.
F = Location for female version of the item when accounting for DIF during analysis.
(a) Exploratory factor analysis (Principal axis method).
(b) Reverse coded items showing strong misfit across all countries and therefore excluded from further analysis.
(c) Positively worded item that functions properly for USA and UK samples but not for AUS sample.
(d) Cronbach's α.
(e) Mean difference of factor scores.

6. Discussion

This study explores the potential for males and females to respond differentially to the same survey items. The paper presents three cases to establish proof of the concept that gender-based DIF can
exist within survey data. As such, it extends current knowledge and best practice with respect to item response theory in cross-cultural contexts to the under-researched issue of gender.

The first case presents gender-related differences in the responses to items measuring fear of death and the dying of others. The findings support the contention that the wording and the content of survey items can mean different things to men and women. The example illustrates that DIF in just one item in a scale comprising seven items has the potential to reverse the decision when testing the hypothesis that females and males differ with respect to their scores on the dying of others subscale. By comparison, under the more favorable conditions of the dimension fear of death of others, the conclusion proves robust. In both cases, the item displaying DIF partly masks a true effect.

The second example focuses on a commonly used measure of altruism. This case underscores the importance of evaluating gender-based DIF on a case-by-case basis in that, although six out of twenty items display gender-based DIF, the conclusions from a mean comparison remain relatively stable. This robustness emanates from a situation whereby the three items favoring females balance the distortion arising from the three items advantaging males. However, this outcome crucially depends on the usage of the entire set of twenty items. A sensitivity analysis demonstrates that, even under relatively conservative conditions of fourteen gender-invariant items, three items exhibiting DIF in the same direction strongly increase the effect size, altering a virtually non-existent mean difference into a near-significant test statistic. The second case therefore demonstrates how DIF can produce mean differences that are substantively meaningless but statistically significant.

A similar situation occurs in the third example. Gender-based DIF in just one item in a set of seven positively worded NEP items shifts the mean difference between female and male respondents in Australia from a value firmly in line with the null hypothesis to a near-significant difference when ignoring DIF. Since no mean differences occur in the samples from the US and the UK, where all items turn out to be invariant regarding gender, the purported effect in Australia appears to reflect a cross-cultural difference, while in fact it is solely due to a measurement artifact.

6.1. Implications of gender item bias

The examples provide evidence that gender-based DIF can mask true differences as well as inflate actual mean differences. Under adverse conditions that are by no means extraordinary, DIF can actually reverse the decision in favor of the null or the alternative hypothesis of a mean difference between women and men. Such spuriously significant results may give rise to substantive explanations that are unfounded. There is a codicil to this argument. If there is a relatively large number of items and distortions operate in opposite directions such that they cancel each other out, the mean differences and, consequently, substantive conclusions may prove relatively robust against DIF. However, an a priori reliance on the presence of such a serendipitous cancelation of DIF is contrary to fundamental scientific principles. Bearing in mind that marketing research often uses relatively short scales, relying on effects balancing out is risky. Even under favorable conditions, accounting for DIF is advisable, since effect sizes and p-values are more accurate.

Furthermore, ignoring DIF because of an apparent lack of consequences in a particular study may have implications with respect to future research endeavors. For example, although a difference between a p-value of, say, 0.20 and 0.80 is perhaps meaningless in the context of a single study, in a meta-study this difference in the p-value (and, as a consequence, in the effect size) matters. Thus, accurate parameter estimates that adjust for DIF are desirable. Moreover, flagging items showing gender-based DIF in a particular context provides invaluable feedback in terms of advancing a scale and determining its range of applicability. This is critical when relying on subsets of items or selecting items from multiple scales to form an instrument considered suitable for use in a given setting.

Finally, DIF may also add to the theoretical understanding of a construct and its embeddedness in a nomological network. In principle, gender-based DIF implies a qualitative difference in the very meaning of a construct between women and men. Notwithstanding the fact that in many cases it is possible to account for this difference quantitatively, the substantive theory of the construct should take the role of gender into consideration. The latter is particularly true whenever non-uniform DIF is present, where the item is a suitable indicator for one gender but not for the other. In turn, the presence of DIF should also provoke theoretical considerations as to why DIF occurs. Although we limit our discussion to the technical aspects of DIF, there is a need for substance-driven studies to extend beyond statistical procedures and shed light on the mechanism(s) responsible for gender-based DIF.

6.2. Addressing gender-based DIF in research design

Since a lack of invariance is a serious type of item misfit, a comprehensive validation of a scale should include proper assessment of DIF. Ideally, measurement instruments should be free of gender-based DIF. During the development of new scales, the substantive theory of the construct governing the process of item generation should consider whether gender influences the relationship of the latent variable and the item. As noted earlier, the body of research regarding gender differences represents a rich source of theoretical input. Moreover, gender should always be a matter of concern when screening and selecting items during the purification phase of scale development (see Schulz, 1990). Even small pilot studies are generally capable of revealing response anomalies caused by gender (see Schulz, 1990). With large item pools, the option exists to eliminate items affected by DIF. However, more often than not, the number of appropriate indicators is relatively small, thus raising the possibility of changing the wording of DIF items in an attempt to avoid DIF. In this situation, the resulting items would then have to be re-assessed.

Given that marketing researchers are often more concerned with applications of measurement instruments than scale development or advancement, the question arises as to what extent possible gender-based DIF requires attention in these contexts. First, the investigation of DIF is dispensable whenever the theory of the construct is devoid of any reference to gender and the researchers have no reason to assume that the interpretation of the items differs between genders. However, when this is not the case, researchers should undertake a DIF analysis to rule out the presence of biased (differential) item responses between males and females. Gender-specific information processing is an example where DIF considerations are worthy of investigation. Second, when a study investigates and tests for gender differences, proper assessment of DIF is essential regardless of the construct under scrutiny. Finally, a DIF analysis is appropriate for pooled data analyses (i.e., analyses based on pooled male and female responses), when there is reason to assume that the construct differs structurally between genders. A different structure means that the order of the item locations is different or that the distance between the item locations is not the same for males and females. Such structural differences indicate that DIF is present.

In summary, an informed appraisal of the extent to which measurement instruments actually suffer from a gender bias requires the utilization of appropriate methods to assess gender-based DIF. Extending scale analysis by testing for DIF is advisable in any application of a scale whenever gender-based DIF appears likely. Where there is evidence of DIF, splitting the item lends itself as a straightforward way to account for the distortion of measures.
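
At the data level, splitting an item can be pictured with the following minimal Python sketch. It replaces one DIF-affected item with two gender-specific pseudo-items, recording the version that does not apply to a respondent as structurally missing, and it computes Cohen's d with a pooled standard deviation. The function names (`split_item`, `cohens_d`), the item labels, the "F"/"M" coding, and the toy responses are hypothetical illustrations and are not taken from the study; estimating Rasch item locations for the split items would still require dedicated software.

```python
import math
from statistics import mean, variance


def split_item(rows, genders, item):
    """Replace `item` with two gender-specific pseudo-items.

    rows    -- list of dicts mapping item label -> response score
    genders -- list of "F"/"M" codes, one per row (hypothetical coding)
    Each respondent keeps a score on the version matching their gender;
    the other version is recorded as None (structurally missing).
    """
    split_rows = []
    for row, gender in zip(rows, genders):
        new_row = {label: score for label, score in row.items() if label != item}
        new_row[f"{item}_F"] = row[item] if gender == "F" else None
        new_row[f"{item}_M"] = row[item] if gender == "M" else None
        split_rows.append(new_row)
    return split_rows


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy data (assumed): three items, with "i4" taken to show gender-based DIF.
responses = [
    {"i1": 3, "i2": 2, "i4": 1},
    {"i1": 4, "i2": 3, "i4": 2},
    {"i1": 2, "i2": 2, "i4": 3},
]
genders = ["F", "M", "F"]

resolved = split_item(responses, genders, "i4")
print(resolved[0])
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

In this sketch the split data set has one more item column than the original, and person measures estimated from it are no longer distorted by forcing a single location onto an item that functions differently for females and males.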

6.3. Limitations and future research

The focus of this study is on providing proof of concept that gender-based DIF can be present in survey items and that it has the potential to impact data interpretation in different ways depending on whether the bias is uniform, non-uniform, balanced or unbalanced. Accordingly, the study design does not attempt to explore the underlying reasons for the identified differences in item response. Nor does it examine the substantive reasons for a potential interaction between culture and gender in response patterns. These issues may benefit from further exploration, especially given that gender roles are contingent on culture.

The aforementioned arguments highlight the need for empirical studies to routinely test for measurement invariance in terms of gender. Currently, gender-related research in marketing lags behind cross-cultural research in this respect. When seeking theoretical explanations of DIF, researchers should note that cross-cultural research has only recently shifted from merely identifying cross-national differences (with nationality used as a surrogate for culture) to theoretically explaining substantive differences based on theories of culture. As such, the dissemination of testing for gender DIF should follow the model of cross-cultural research.

References

Alexander, G. M. (2003). An evolutionary perspective of sex-typed toy preferences: Pink, blue, and the brain. Archives of Sexual Behavior, 32(1), 7–14.
Andersen, E. B. (1982). Latent trait models and ability parameter estimation. Applied Psychological Measurement, 6, 445–461.
Andrich, D. (1978a). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Andrich, D. (1978b). Application of a psychometric rating model to ordered categories which are scored with successive integers. Applied Psychological Measurement, 2, 581–594.
Andrich, D. (1988). A general form of Rasch's extended logistic model for partial credit scoring. Applied Measurement in Education, 1(4), 363–378.
Andrich, D. (1995a). Models for measurement, precision and the non-dichotomization of graded responses. Psychometrika, 60(1), 7–26.
Andrich, D. (1995b). Further remarks on non-dichotomization of graded responses. Psychometrika, 60(1), 37–46.
Andrich, D. (2010). Understanding the response structure and process in the polytomous Rasch model. In M. L. Nering, & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 123–152). New York, NY and Hove, East Sussex, United Kingdom: Routledge.
Andrich, D., de Jong, J. H. A. L., & Sheridan, B. E. (1997). Diagnostic opportunities with the Rasch model for ordered response categories. In J. Rost, & R. Langeheine (Eds.), Applications of latent trait and latent class models in the social sciences (pp. 59–70). Münster: Waxmann.
Andrich, D., & Hagquist, C. (2012). Real and artificial differential item functioning. Journal of Educational and Behavioral Statistics, 37(3), 387–416.
Andrich, D., Sheridan, B. S., & Luo, G. (2004). Displaying the Rumm 2020 analysis. Working Paper. Perth, Western Australia: RUMM Laboratory.
Andrich, D., Sheridan, B. S., & Luo, G. (2010). Rumm 2030: Rasch unidimensional measurement models [computer software]. Perth, Western Australia: RUMM Laboratory.
Arndt, J., Schimel, J., & Goldenberg, J. L. (2003). Death can be good for your health: Fitness intentions as a proximal and distal defense against mortality salience. Journal of Applied Social Psychology, 33(8), 1726–1746.
Arndt, J., Solomon, S., Kasser, T., & Sheldon, K. M. (2004). The urge to splurge: A terror management account of materialism and consumer behavior. Journal of Consumer Psychology, 14(3), 198–212.
Bem, S. L. (1981). Gender schema theory: A cognitive account of sex typing. Psychological Review, 88, 354–364.
Buss, D. M. (1995a). Evolutionary psychology: A new paradigm for psychological science. Psychological Inquiry, 6, 1–30.
Buss, D. M. (1995b). Psychological sex differences: Origins through sexual selection. American Psychologist, 50, 164–168.
Cheung, G., & Rensvold, R. B. (1999). Testing factorial invariance across groups: A reconceptualization and proposed new method. Journal of Management, 25(1), 1–27.
Christensen, K. B., Bjorner, J. B., Kreiner, S., & Petersen, J. H. (2002). Testing unidimensionality in polytomous Rasch models. Psychometrika, 67, 563–574.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dahl, D. W., Sengupta, J., & Vohs, K. D. (2009). Sex in advertising: Gender differences and the role of relationship commitment. Journal of Consumer Research, 36, 215–231.
de Gregorio, F., & Sung, Y. (2010). Understanding attitudes toward and behaviours in response to product placement. Journal of Advertising, 39(1), 83–96.
De Jong, M. G., Steenkamp, J.-B. E. M., & Fox, J.-P. (2007). Relaxing measurement invariance in cross-national consumer research using a hierarchical IRT model. Journal of Consumer Research, 34, 260–279.
Dean, K., & Edwardson, S. (1996). Additive scoring of reported symptoms: Validity and item bias problems in morbidity scales. European Journal of Public Health, 6(4), 275–280.
DeSteno, D. A., Bartlett, M. Y., Braverman, J., & Salovey, P. (2002). Sex differences in jealousy: Evolutionary mechanism or artifact of measurement? Journal of Personality and Social Psychology, 83(5), 1103–1116.
DeSteno, D. A., & Salovey, P. (1996). Evolutionary origins of sex differences in jealousy? Questioning the "fitness" of the model. Psychological Science, 7, 367–372.
Dunlap, R., & Van Liere, K. (1978). The new environmental paradigm. The Journal of Environmental Education, 9, 10–19.
Eagly, A. H. (1987). Sex differences in social behavior: A social-role interpretation. Hillsdale, NJ: Erlbaum.
Eagly, A. H. (1993). Sex differences in human social behavior: Meta-analytic studies of social psychological research. In M. Haug, R. E. Whalen, C. Aron, & K. L. Olsen (Eds.), The development of sex differences and similarities in behaviour (pp. 421–436). London, England: Kluwer Academic.
Eagly, A. H., & Wood, W. (1991). Explaining sex differences in social behavior: A meta-analytic perspective. Personality and Social Psychology Bulletin, 17, 306–315.
Eagly, A. H., & Wood, W. (1999). The origins of sex differences in human behavior: Evolved dispositions versus social roles. American Psychologist, 54, 408–423.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Ewing, M. T., Salzberger, T., & Sinkovics, R. (2005). An alternate approach to assessing cross-cultural measurement equivalence in advertising research. Journal of Advertising, 34, 17–36.
Fleishman, J. A., Spector, W. D., & Altman, B. M. (2002). Impact of differential item functioning on age and gender differences in functional disability. Journal of Gerontology: Social Sciences, 57B(5), S275–S284.
Gelin, M. N., Carleton, B. C., Smith, M. A., & Zumbo, B. D. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MiniAQLQ). Social Indicators Research, 68(1), 91–105.
Hagquist, C., & Andrich, D. (2004). Is the sense of coherence-instrument applicable on adolescents? A latent trait analysis using Rasch-modelling. Personality and Individual Differences, 36, 955–968.
Hall, J. A. (1984). Nonverbal sex differences: Communication accuracy and expressive style. Baltimore: Johns Hopkins University Press.
Hall, J. A. (1987). On explaining gender differences: The case of nonverbal communication. In P. Shaver, & C. Hendrick (Eds.), Sex and gender: Review of personality and social psychology, Vol. 7 (pp. 177–200). Newbury Park, CA: Sage.
Halpern, D. F. (1989). The disappearance of cognitive gender differences: What you see depends on where you look. American Psychologist, 44, 1156–1158.
Halpern, D. F. (1992). Sex differences in cognitive abilities (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
He, Y., Merz, M. A., & Alden, D. L. (2008). Diffusion of measurement invariance assessment in cross-national empirical marketing research: Perspectives from the literature and a survey of researchers. Journal of International Marketing, 16(2), 64–83.
Lester, D., & Abdel-Khalek, A. (2003). The Collett–Lester fear of death scale: A correction. Death Studies, 27(1), 81–85.
Luxen, M. F. (2007). Sex differences, evolutionary psychology and biosocial theory: Biosocial theory is no alternative. Theory and Psychology, 17(3), 383–394.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.
Melnyk, V., van Osselaer, S., & Bijmolt, T. (2009). Are women more loyal customers than men? Gender differences in loyalty to firms and individual service providers. Journal of Marketing, 73, 82–96.
Moss, G. (2009). Gender, design and marketing. How gender drives our perception of design and marketing. Surrey, England: Gower Publishing Limited.
Newton, J. D., Burney, S., Hay, M., & Ewing, M. T. (2010). A profile of Australian adults who have discussed their posthumous organ donation wishes with family members. Journal of Health Communication, 15(5), 470–486.
Orhede, E., & Kreiner, S. (2000). Item bias in indices measuring psychosocial work environment and health. Scandinavian Journal of Work, Environment & Health, 26(3), 263–272.
Orne, M. T., & Whitehouse, W. G. (2000). Demand characteristics. In A. E. Kazdin (Ed.), Encyclopedia of psychology (pp. 469–470). Washington, DC: American Psychological Association and Oxford University Press.
Putrevu, S. (2001). Exploring the origins and information processing differences between men and women: Implications for advertisers. Academy of Marketing Science Review, 10, 1–14.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: MESA (Reprint 1980, Danish Institute for Educational Research).
Richard, M.-O., Chebat, J.-C., Yang, Z., & Putrevu, S. (2010). A proposed model of online consumer behavior: Assessing the role of gender. Journal of Business Research, 63(9–10), 926–934.
Routledge, C., & Juhl, J. (2010). When death thought leads to death fears: Mortality salience increases death anxiety for individuals who lack meaning in life. Cognition and Emotion, 24(5), 848–854.
Rushton, J. P., Chrisjohn, R. D., & Fekken, G. C. (1981). The altruistic personality and the self-report altruism scale. Personality and Individual Differences, 2(4), 293–302.
Salzberger, T. (2009). Measurement in marketing research: An alternative framework. Cheltenham, UK: Edward Elgar.
Schaffer, B. S., & Riordan, C. M. (2003). A review of cross-cultural methodologies for organizational research: A best-practices approach. Organizational Research Methods, 6(2), 169–215.
Schulz, E. M. (1990). DIF detection: Rasch versus Mantel–Haenszel. Rasch Measurement Transactions, 4(2), 107.
Seock, Y.-K., & Sauls, N. (2008). Hispanic consumers' shopping orientation and apparel retail store evaluation criteria. An analysis of age and gender differences. Journal of Fashion Marketing and Management, 12(4), 469–486.
Severiens, S., & Ten Dam, G. (1998). Gender and learning: Comparing two theories. Higher Education, 35(3), 329–350.
Singh, J. (1995). Measurement issues in cross-national research. Journal of International Business Studies, 26(3), 597–619.
Singh, J. (2004). Tackling measurement problems with item response theory: Principles, characteristics, and assessment, with an illustrative example. Journal of Business Research, 57(2), 184–208.
Smith, R. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46(2), 359–372.
Smith, E. V. (2002). Understanding Rasch measurement: Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205–230.
Snipes, R. L., & Thomson, N. F. (2006). Gender bias in customer evaluations of service quality: An empirical investigation. Journal of Services Marketing, 20(4), 274–284.
Steenkamp, J.-B. E. M., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer research. Journal of Consumer Research, 25, 78–90.
Steenkamp, J.-B. E. M., de Jong, M. G., & Baumgartner, H. (2010). Socially desirable response tendencies in survey research. Journal of Marketing Research, 47(2), 199–214.
Williams, C. L., & Meck, W. H. (1991). The organizational effects of gonadal steroids on sexually dimorphic spatial ability. Psychoneuroendocrinology, 16(1–3), 155–176.
Wolin, L. D. (2003). Gender issues in advertising: An oversight synthesis of research: 1970–2002. Journal of Advertising Research, 43, 111–129.
Wood, W., & Rhodes, N. (1992). Sex differences in interaction style in task groups. In C. L. Ridgeway (Ed.), Gender, interaction, and inequality (pp. 97–121). New York: Springer-Verlag.
Zelezny, L. C., Chua, P.-P., & Aldrich, C. (2000). Elaborating on gender differences in environmentalism. Journal of Social Issues, 56(3), 443–457.
