
Reliability & Validity

International Business Research


Elke Schrover
The seven-step deductive research process
Theory → Data

1. Define the business problem
2. Formulate the problem statement
3. Develop a theoretical framework
4. Choose a research design
5. Collect data
6. Analyze data
7. Write-up


2
Agenda
1. Measurement reliability & validity
2. Internal & external validity
3. Equivalence in cross-national research

3
1. Measurement reliability and validity

4
From conceptualization to operationalization

Conceptualization

• Defining a variable = explaining its precise meaning

Operationalization

• Turning concepts into measures

5
From conceptualization to operationalization
• Concrete variable • Abstract variable

Definition Definition

Straightforward Challenging

Measure Measure
6
Operationalization of concrete variables
• Concrete variable • Example: Age

Definition: “The length of time that a person has lived”

Straightforward: Single facet

Measure: What is your age?


7
Operationalization of abstract variables
• Abstract variable • Example: Affective commitment

Definition: “The belief that an ongoing relationship with another is very important”

Challenging: Underlying facets?

Measure: Measure..?
8
Operationalization of abstract variables

Affective commitment to a company (X):
How affectively committed are you to “company _____”?

Affective commitment to a company:
1. I feel like part of the family at _____
2. _____ has a great deal of personal meaning for me
3. I feel emotionally attached to _____
4. I feel a strong sense of belonging to _____

The scale is used by Auh, Bell, McLeod, and Shih (2007)

9
Evaluating the quality of a measure

• Measurement reliability

• Measurement validity

10
Measurement reliability
• Degree to which a measure produces similar results
under similar circumstances

11
Assessing measurement reliability

• Test-retest reliability

• Inter-rater reliability

• Internal consistency

12
Test-retest reliability
• Degree of agreement between the observations
when the same measure is repeated sometime later

• Example: concept = IQ
• Same people, same test on t1 and t2 (not too far apart)
• Degree of agreement
= correlation between observations on t1 and t2
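
The correlation between the two measurement occasions can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the IQ scores are made up, and the helper name `pearson_r` is an assumption introduced here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical IQ scores of the same five people at t1 and t2.
iq_t1 = [98, 105, 112, 120, 131]
iq_t2 = [101, 103, 115, 118, 133]

test_retest_r = pearson_r(iq_t1, iq_t2)
print(round(test_retest_r, 3))
```

A value close to 1 indicates that people keep roughly the same relative position on both occasions, which is what test-retest reliability asks for.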

13
Inter-rater reliability
• Degree of agreement between the observations when (at
least) two people ("raters") administer the same measure

• Example: concept = user-friendliness of online shop


• Two people rate user-friendliness via the proposed
measure
• Degree of agreement
= correlation between these two people’s ratings
14
Internal consistency
• Degree of agreement between the different indicators/items
of a single measurement instrument.

• Example: concept = trust in online banking


• Measured via three Likert questions
• Degree of agreement
= extent of inter-correlation between three questions
= demonstrated by calculating Cronbach’s alpha.

15
Internal consistency: Cronbach’s alpha

\[
\text{Cronbach's } \alpha = \frac{k}{k-1} \cdot \frac{\text{sum of covariances}}{\text{sum of variances and covariances}}
\]
k = number of items

Value range 0-1, closer to 1 is better

Rules of thumb
≥ .80 Good
≥ .70 Acceptable
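
The alpha formula can be sketched directly from item covariances. This is an illustrative sketch under stated assumptions: the Likert answers are invented, the function names `covariance` and `cronbach_alpha` are introduced here, and "sum of covariances" is read as the sum over all off-diagonal entries of the item covariance matrix (each pair counted twice).

```python
def covariance(x, y):
    """Sample covariance (n - 1 denominator) of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score lists (one list per item)."""
    k = len(items)
    # Diagonal of the covariance matrix: the item variances.
    sum_variances = sum(covariance(item, item) for item in items)
    # Off-diagonal entries, counting both (i, j) and (j, i).
    sum_covariances = sum(
        covariance(items[i], items[j])
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return (k / (k - 1)) * sum_covariances / (sum_variances + sum_covariances)

# Hypothetical answers of five respondents to three 7-point Likert items.
trust_items = [
    [6, 5, 7, 3, 4],
    [6, 4, 7, 2, 5],
    [5, 5, 6, 3, 4],
]

alpha = cronbach_alpha(trust_items)
print(round(alpha, 2))
```

With these made-up answers the three items move closely together, so alpha lands above the .80 rule of thumb. Items that hardly covary would push the numerator, and with it alpha, toward 0.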

16
Measurement validity
• The degree to which the observations made through a
measure represent the variable they are intended to represent

17
Obtaining measurement validity

• Provide precedence, but …

• Provide sound logic to support that considerable overlap exists between measure and construct (e.g. BMI)

• Be aware: single-item measures for abstract constructs = always low validity

18
Measurement reliability and validity

19
Drawing valid conclusions from research
• Measurement reliability and validity

- necessary conditions to draw valid conclusions from research

- but not sufficient!

• Also needed:
• internal validity
• external validity

20
2. Internal and external validity

21
Internal validity
• Extent to which you can be confident that a relationship in
your study cannot be explained by other factors.
• Studies with fewer confounding variables are higher in
internal validity
Confounding variables = external variables not included in your
study that might be alternative explanations for your findings

• Studies high on internal validity control for potential confounding variables
• E.g., hold them constant in an experiment
• E.g., include them as control variables
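
One way to see what "controlling for" a confounding variable does is a partial correlation. This is a stand-in chosen for brevity, not the regression with control variables the slide refers to. The data and variable names (`firm_size`, `training_hours`, `export_sales`) are hypothetical and constructed so that both variables depend on the confounder.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: the confounder drives both other variables.
firm_size = [1, 2, 3, 4, 5, 6, 7, 8]
training_hours = [2, 1, 4, 3, 6, 5, 8, 7]
export_sales = [2, 3, 2, 3, 6, 7, 6, 7]

raw_r = pearson_r(training_hours, export_sales)
controlled_r = partial_r(training_hours, export_sales, firm_size)
print(round(raw_r, 2), round(controlled_r, 2))
```

In this constructed example the raw correlation is strong, but it largely disappears once firm size is held constant. That is the pattern a confounding variable produces.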

22
What is internal validity?

23
External validity
Extent to which study findings are generalizable to other
settings

Do the findings apply to …


• other subjects?
• other ‘environments’?

Important for external validity:


• Representative and large sample
• Natural environment
24
Balancing internal vs. external validity
The optimal research design is high in terms of internal
and external validity. However, increasing one without
decreasing the other is not always possible.

Internal validity benefits from a controlled environment
External validity benefits from a natural environment

25
3. Equivalence in cross-national research

26
International business

• Increased complexity

• Most blunders stem from inadequate research

27
Equivalence in cross-national research

Equivalence
• Construct: Are we studying the same phenomena in countries X, Y and Z?
• Measurement: Are the phenomena in countries X, Y and Z measured in the same way?
• Sampling: Are the samples used in countries X, Y and Z equivalent?

28
The importance of establishing equivalence

Without … Misleading results given …

• construct equivalence • differences in underlying constructs

• measurement equivalence • differences in underlying measures

• sampling equivalence • differences in underlying samples

29
1. Construct equivalence

30
Construct equivalence
• Are we studying the same phenomena/concepts in
different countries?

• Examples
• Universally understood construct:
• Construct with different meaning across countries:

31
Examples (1)

When Procter & Gamble introduced diapers in Japan, it used the same
ad that did well in the U.S.: a stork delivering Pampers to a happy
home. Contrary to Western folklore, storks in Japan are not supposed to
deliver babies (although they might very well steal one).
32
Examples (2)

McDonald’s Standard Cup Sizes – U.S.A. versus Japan

33
Construct equivalence in secondary data

• Secondary data from different countries may not be readily comparable because national differences in definitions make comparing difficult

E.g., definition of “what is a supermarket” differs across countries

34
How to ensure construct equivalence?

• Draw not only from the domestic literature but also from the country-specific literature when developing conceptualizations

• Conduct qualitative research (interviews, focus groups, …) to identify cultural differences in the meaning of a construct

35
2. Measurement equivalence

36
Measurement equivalence

• Are the phenomena/concepts that we study measured in the same way in terms of
• wording → translation equivalence
• scaling / scoring → metric equivalence
in different countries?

37
Translation equivalence

Are questionnaire items translated appropriately so that items tap into the same constructs in different countries?

38
Examples (1)

Syrup vs.
blandsaft

39
Examples (2)

Original: “Got milk?”

Translation in Spanish (Mexico): “Are you lactating?”

40
Examples (3)

Original (U.K.): “Nothing sucks like an Electrolux”


In the U.S., the word “sucks” had become a derogatory word
41
Examples (4)

Original: “Finger lickin’ good!”


Translation in Chinese: “We’ll eat your fingers off!”
42
How to obtain translation equivalence?

Back-translation

43
An example: From Dutch to English

Gezellig Cozy

Knus Pleasant
Change source language to: “aangenaam” (decentring)
44
Metric equivalence

Do the scores given by respondents have the same meaning in different countries?

Example:

In USA: Strongly disagree 1 2 3 4 5 6 7 Strongly agree
=?
In France: Strongly disagree 1 2 3 4 5 6 7 Strongly agree
45
Metric equivalence: Threats to reliability
• Some languages have fewer terms to express gradation
in evaluation than others
• Example: Korean (fewer) vs. French (more)

• Some countries lack familiarity with certain scaling / scoring formats

46
Metric equivalence: Threats to validity
• Response style bias
• Extreme responding
• Socially desirable responding

Do responses to items differ due to actual national differences or due to response-style bias?
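
A simple way to get a first impression of extreme responding is to compare how often each country sample uses the scale endpoints. The index below (share of endpoint answers) is an assumption introduced for illustration, not a measure prescribed by the slides. The country labels and answers are hypothetical.

```python
def extreme_response_share(responses, low=1, high=7):
    """Share of answers that fall on either endpoint of the scale."""
    extremes = sum(1 for r in responses if r in (low, high))
    return extremes / len(responses)

# Hypothetical 7-point answers from two country samples.
country_a = [1, 7, 7, 1, 4, 7, 2, 1]
country_b = [3, 4, 5, 4, 3, 5, 4, 7]

share_a = extreme_response_share(country_a)
share_b = extreme_response_share(country_b)
print(share_a, share_b)
```

A large gap between the two shares does not prove response-style bias, since it could also reflect real differences. It does flag that item scores should not be compared across the two samples at face value.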

47
Socially desirable responding

• Two types of SDR:

• Egoistic response tendencies (ERT)
Self-deceptive, exaggerated but honestly held positive self-view (“Super Hero”)

• Moralistic response tendencies (MRT)
Deliberate attempt to project a favorable self-image (“Saint”)

Steenkamp, De Jong & Baumgartner (2010)

48
How to obtain metric equivalence?
• Pre-data collection (reliability):
• Pictorial response scales work well
(especially with less educated population)
• Semantic differentials

• Post-data collection (validity):


• Standardize responses to each variable within each country sample (“deculture” the data)

• Perform analyses on the standardized variables (z)
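
Within-country standardization can be sketched as follows. This is a minimal sketch with made-up ratings. The names `standardize`, `raw_by_country` and `z_by_country` are assumptions introduced here, and the sample standard deviation (n - 1 denominator) is one possible choice.

```python
from math import sqrt

def standardize(scores):
    """Convert raw scores to z-scores using the sample's own mean and SD."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    return [(s - mean) / sd for s in scores]

# Hypothetical 7-point ratings of one variable, grouped by country sample.
raw_by_country = {
    "USA": [6, 7, 5, 7, 6],
    "France": [3, 4, 2, 4, 3],
}

# Standardize each country sample separately, not the pooled data.
z_by_country = {
    country: standardize(scores) for country, scores in raw_by_country.items()
}

for country, z_scores in z_by_country.items():
    print(country, [round(z, 2) for z in z_scores])
```

In this example the French ratings are the American ratings shifted down by three points, so both samples end up with identical z-scores. The flip side is that mean differences between countries are removed by construction, so the standardized variables suit analyses of relationships within and across samples rather than comparisons of country means.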

49
Measurement equivalence for secondary data
Be careful:
• Categories may differ across countries
• e.g., age brackets, income brackets, professions
• Calibration systems may differ across countries
• E.g. monetary units, measures of weight, distance and volume

Reliability threat: Data may be old or scarce in certain countries
Validity threat: Governments may paint too rosy pictures
50
3. Sampling equivalence

51
Sampling equivalence

• Achieve representative AND comparable samples

The Netherlands Poland

Representative sample Representative sample

52
Ensuring sampling equivalence
Equivalence = comparability ≠ keeping everything the same

• Timing
Minimize lapses of time between data collection in different countries

• Sample frame
• Data collection procedure
May need to be different in different countries to ensure sampling equivalence

53
Sampling equivalence
• Use comparable sampling frames across countries
(e.g., electoral lists, telephone directories, …)

unless they provide inadequate coverage in some countries

E.g., women were not allowed to vote in Saudi Arabia until 2015

E.g., subscription list to the Economist provides better coverage of the business population in English-speaking countries than in French-speaking countries

54
Sampling equivalence
• Use comparable data collection procedures across countries
(e.g., personal interviews, telephone interviews, mail surveys, internet surveys)

unless a procedure leads to different biases in different countries

E.g., mail surveys not effective for consumers in countries
- with high illiteracy levels
- with an unreliable mail service

E.g., personal interviews not effective in Eastern Europe

55
56
Next:
• Practice quiz

57
