Module 5 Experimental Designs and Significance Testing

THMT 3312: Research Methods for Tourism & Hospitality Management


After completing this module you should:
1. Describe what is meant by causation and association
2. Know when experimentation is the best research approach to use
3. Understand how experimentation outputs differ from other types of data collection
4. Know what factors impact validity
5. Understand how quasi-experimental and experimental research designs differ
6. Be capable of applying research designs
7. Describe what is meant by validity and reliability
8. Know when and why significance testing should be used
9. Understand what is meant by Type I and Type II errors
10. Define what is meant by research and null hypotheses
11. Know the basic steps in hypothesis testing and applying significance tests
12. Understand which significance tests to use with the four levels of data measurement

What is Significance Testing?

As was indicated in Lecture 6 (Sampling), whenever samples are used to generate research results, the sample statistics will not exactly equal the real-world results that one could obtain if a census were conducted. We can only reliably use significance testing with empirical samples.

• We need to determine whether or not there are relationships (association or causation) between the variables that we measure.
• Significance testing allows researchers to know if there are relationships between variables in the real world, based on the analysis of sample data.

Substantive and Statistical Significance
Becky’s Bakery
• Becky sells cookies
• Her research tells her that overall, customers will pay an average of $0.75/cookie, but that men will pay up to $0.77/cookie.
• While there is a statistically significant difference between men and women in how much they will pay, she chooses to ignore it, because the price difference is minor, and she doesn’t want to segment her market based on gender.
• In this case, the results are statistically significant, but not substantively significant

Executive Auto
• Executive Auto’s research uncovers a statistically significant difference based on age of customer
• Policy owners over the age of 55 tend to buy more expensive insurance packages, and for luxury autos
• Therefore, Executive Auto decides to use age as a market segmentation tool
• In this case, the results are both statistically significant and substantive

Blue Lagoon Cafe
• Blue Lagoon Cafe has customer service forms on each table
• Results have shown that customer dissatisfaction has increased by a small percentage overall
• The difference is not great enough to be statistically significant, so management chose to ignore it.
• In this case, the results are not statistically significant, so they also lack substantive significance

Burt’s Fitness Centre
• Burt did a survey to find out if his new centre layout was enough of a draw to charge a higher price for membership
• He was hoping the new format would allow him to raise his prices without losing members; however, the results were not statistically significant. This means that he could not expect customers to see added value in the new format and it could be risky to raise prices.
• In this case, the results would have been substantively significant to the company, but they lacked statistical significance

Hypothesis Testing


About Hypotheses:
▪ Hypotheses are “educated guesses” about an anticipated association (or causation in the case of
experimental research) between variables

▪ It is important to state the hypotheses before attempting to test for whether or not there are
statistically significant differences in the data that would support the hypothesis

▪ The hypothesis is stated by looking at two possible outcomes:

1) that there is no statistically significant finding (the Null), which is the default
2) that there is a statistically significant finding (the Research or Alternate)

▪ Analyzing data from the sample enables researchers to calculate statistical measures to either
accept (strictly speaking, fail to reject) or reject the Null hypothesis. In essence, the statistical measure used provides evidence
which may be strong enough to reject the Null (similar to a finding of guilty in a court of law)

▪ If the data is not strong enough to show a statistically significant difference it means that the
differences observed could be due simply to chance, or to sampling error, and we cannot
conclude that a relationship exists (the default position of “innocent”)!

Stating Hypotheses

▪ Laura wants to know if there is a difference in how men and women perceive her radio ad. There is a lot of relaxing music in the background of her ad, so she believes it will appeal more to women, which is her goal as her primary target market is the female head of household
▪ Stating hypotheses is important because it clarifies what is being tested, which aids in choosing the right statistical method for the significance test

The Null Hypothesis (H0):
• There is no statistically significant difference between men and women for the likeability of Laura’s radio ad

The Alternate Hypothesis (HA):
• Women will find Laura’s radio ad more likeable than men will.

Notation: H0 = Null and HA = Alternate

Steps in Testing Hypotheses
Always do the following steps:

1) State the Null and Alternate hypotheses.
2) Determine the level of statistical significance desired for the test. The norm in business research ranges from .01 (99% level of confidence) to .10 (90% level of confidence), with .05 (95% level of confidence) being a “happy medium”. The level is expressed as a p-value (probability).
3) Choose the appropriate test statistic for the level of your data: nominal, ordinal, interval, or ratio.
4) Calculate your test statistic either manually or by using software.
5) Interpret the level of significance of your calculated value (your calculated value must be equal to/greater than a critical value to be statistically significant). SPSS will automatically provide the p-value. If calculating it manually you must refer to a statistical table.
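The five steps above can be sketched in a few lines of Python. This is only an illustration of the manual route (compare a calculated value with a critical value from a statistical table), not the SPSS procedure. The ratings are invented, the helper name `pooled_t` is hypothetical, and the choice of a pooled-variance independent samples t-test for Laura’s radio ad is an assumption.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical likeability ratings (1-10) of a radio ad; invented for illustration.
women = [8, 7, 9, 6, 8, 7, 9, 8, 7, 8]
men = [6, 5, 7, 6, 5, 7, 6, 4, 6, 5]

# Step 1: H0 = no difference in mean likeability; HA = women rate the ad higher.
# Step 2: desired significance level of .05 (95% level of confidence).
alpha = 0.05

# Step 3: categorical IV with two levels (gender) and a continuous DV (rating),
# so an independent samples t-test is assumed here.
def pooled_t(a, b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2

# Step 4: calculate the test statistic.
t_stat, df = pooled_t(women, men)

# Step 5: compare with the critical value from a t-table
# (one-tailed test, alpha = .05, df = 18 gives 1.734).
critical_t = 1.734
reject_null = t_stat >= critical_t
print(f"t = {t_stat:.2f}, df = {df}, reject Null: {reject_null}")
```

With these made-up ratings the calculated value exceeds the critical value, so the Null would be rejected at the .05 level.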
Type I and Type II Error

▪ Hypothesis testing requires inference to a population where you assume that there is a situation involving variables of interest; we must prove this to be true using significance testing
▪ There is the chance that our results will be wrong even using due diligence; we may commit a Type I or Type II error
▪ We always test the Null in order to accept it (no statistically significant change/difference in sample data) or to reject it (statistically significant change/difference in sample data)

If we are not correct in our interpretation of the significance test we can make Type I or Type II errors.

Decision Made Based on Significance Test

               | Accept Null                                              | Reject Null
Null is True   | Correct Decision                                         | Type I Error: Rejected a true Null (a false positive)
Null is False  | Type II Error: Accepted a false Null (a false negative)  | Correct Decision
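A small simulation can make the Type I error concrete. The sketch below assumes a simple two-sample z-test with a known standard deviation (a simplification not taken from the slides) and draws both groups from the same population, so the Null is true by construction. The function names, population values, and seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def z_test_p_value(a, b, sigma):
    """Two-tailed p-value for a difference in means when sigma is known."""
    standard_error = sigma * sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of experiments that reject a Null that is actually true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same population, so the Null is true.
        a = [rng.gauss(50, 10) for _ in range(n)]
        b = [rng.gauss(50, 10) for _ in range(n)]
        if z_test_p_value(a, b, sigma=10) <= alpha:
            false_positives += 1  # rejected a true Null: a Type I error
    return false_positives / trials

rate = type_i_error_rate()
print(f"Share of true Nulls rejected: {rate:.3f}")
```

The printed share should land close to the chosen significance level, which is the point: at the .05 level, roughly one test in twenty will reject a true Null purely by chance.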

The Decision to Accept or Reject is Based on Test Results

In business research the p-values used to determine statistical significance typically range between .01 and .05, but they can go as high as .10. When analyzing data using statistical software, the reported significance level can go beyond .01 to .000, which corresponds to even higher levels of confidence.

P-Value in SPSS (p) | Level of Statistical Significance in Business Research
.01                 | .99
.05                 | .95
.10                 | .90

The Driver’s Ed Example
▪ The insurance agent is asserting that there is a relationship between gender and driving ability
▪ First, we should consider if this seems to be reasonable. What is it about males that would make them worse drivers than females? Generally, if the relationship sounds illogical, it is usually not a valid one.
▪ We can map out the relationship between driver’s gender and number of traffic accidents
▪ We should consider all of the other things that could contribute to the number of accidents, but that are not gender
▪ These other factors can confound our analysis of the relationship between gender and number of accidents

Other Factors: A = Number of km/year; B = Purchase of high performance sports car
Driver’s gender (X) → Number of car accidents (Y)
We can’t ensure that X causes Y unless we are controlling for other factors that could impact Y, like A & B shown above.


Experimental Design


Why do we need an experimental design?

If you haven’t used an experimental design, you cannot claim X causes Y.

Survey research only measures association
• Surveys can show whether or not A, B, X or Y vary together, but they cannot show causation. What causes Y?

Experimental research designs can prove causation
• By allowing researchers to measure the impact of X on Y, while controlling for A and B, experiments can measure causation.

Experimental Designs
Experimental designs provide a way for researchers to determine how much one factor influences another. Only an experimental design can measure causation. Consider two examples that require proof of causation.

Advertising
• You have just invested $250,000 in a promotional campaign
• Sales seem to have increased as a result of the campaign
• But how do you know that the campaign is the reason why sales have increased? Could it be due to something else?

Driver’s Ed
• Jane goes to buy auto insurance and her premium is half the cost of her brother Jack’s, even though they are the same age and both are new drivers
• Jack asks his insurance agent for an explanation and he is not satisfied with the answer he is given: that young males have more car accidents and therefore they must pay more for insurance
• Does this sound like a reasonable explanation?

The Advertising Example
▪ We can map out the relationship between advertising and sales.
▪ We should consider all of the other things that could contribute to sales that have no bearing on our advertising at all
▪ Are there other things that could cause our sales to increase that have nothing at all to do with our advertising?
▪ These other factors can confound our analysis of the relationship between advertising and sales
▪ How do we know that advertising caused sales to increase and not one of these other factors?

Other Factors: A = Competitors’ stock outs; B = Rise in competitors’ prices
Ads (X) → Sales (Y)
We can’t be sure that X causes Y unless we are controlling for other factors that could impact Y, like A & B shown above.

Elements of an Experimental Design
An experiment is defined as manipulating an independent variable to see how it affects a dependent variable while also controlling the effects of additional (or extraneous) variables.

Independent Variable (IV or X): The variable whose impact you are trying to measure (eg: Radio Ad)
Dependent Variable (DV or Y): The effect that you are trying to measure (eg: Sales)
Extraneous Variables: Influences you are trying to “weed out” that could confound your analysis
Experimental Group: The group that gets the IV (eg: hears the radio ad)
Control Group: If used, the group that doesn’t get the IV
Pre-Test: If used, a measure of the DV before the IV is administered (eg: pretest for sales)
Post-Test: A measure of the DV after the IV is administered (eg: sales measure after ad shown).

Two Common Experimental Designs
After-Only Design
- Technically not a real experiment because it doesn’t manipulate anything. Often called a quasi-experimental design.
- DV is measured after the IV has been administered
- Any difference in the DV is attributed to the IV. Sometimes, extraneous factors are controlled using statistical procedures at the point of data analysis.
- This is a loose design without any controls: no before measure (how do we know that the DV actually changed?)
- This is what happens when survey data are used to try to measure causation. It can’t be done reliably. All that you see is association, not causation.

Before-After with Control
- Often called the Classic Experimental Design.
- Two interchangeable samples are drawn from the population of interest.
- The DV is measured to get a baseline reading in both groups (Pre-Test or Before measure).
- The IV is administered to one group only (the Experimental Group). The remaining group (the Control Group) does not experience the IV.
- The DV is measured again in both groups (Post-Test or After measure).
- Any difference between the Before/After measures in the Experimental Group is then adjusted based on the differences observed in the Control Group (because those could not be due to the IV). Any remaining difference should be due to the IV.
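The adjustment described for the Before-After with Control design is simple arithmetic, sketched below. The function name and the sales figures are hypothetical and only illustrate the calculation.

```python
def before_after_with_control(exp_pre, exp_post, ctl_pre, ctl_post):
    """Return the change in the DV that remains after removing the control group's change."""
    experimental_change = exp_post - exp_pre  # IV plus extraneous influences
    control_change = ctl_post - ctl_pre       # extraneous influences only
    return experimental_change - control_change

# Hypothetical weekly sales ($) before and after a radio ad runs
# in the experimental market only.
effect = before_after_with_control(
    exp_pre=10_000, exp_post=12_500,
    ctl_pre=10_200, ctl_post=11_000,
)
print(effect)  # 1700
```

Here sales rose by $2,500 in the experimental group and by $800 in the control group, so $1,700 of the increase would be attributed to the ad.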


Validity and Reliability

Validity in Data Collection
Does the question/method measure what it was designed to measure?

Construct Validity:
• The result confirms what it was designed to measure
• The measure/scale must be sensitive enough to accurately measure subtle variances
• Example: Studies showing purchase intention should be able to discriminate between future purchasers and non-purchasers

Criterion Validity:
• The ability of one measure to correlate with another either concurrently, or predictively, when testing a research model
• Concurrent: Those who slept got a better mark on the exam than those who didn’t
• Predictive: Those who wrote the admissions test and did well, also did well in their degree

Face Validity:
• The questions look like they measure the underlying construct effectively
• A weak way to determine validity

Reliability in Data Collection
Does it produce consistent results time after time?

Internally Consistent:
• The test items/scale items consistently measure concepts
• Can do a split-half test to determine internal reliability if you have a battery of questions that you can split up and then check how consistently both test halves compare
• This is not possible for surveys without batteries of scaled questions

Overall stability:
• Ensures that results are consistent across those taking the survey
• May require test/retest to see if same individuals give consistent results (if possible and if the first test doesn’t bias the follow-up test)
• Equivalent/parallel forms tests are useful if one has two tests that provide similar measures. Can test with one, then the next, then see if the results are consistent across both tests.
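One common way to run the split-half check mentioned above is to correlate each respondent’s total on one half of the items with their total on the other half. The sketch below assumes an odd/even item split and a plain Pearson correlation; both choices, the function names, and the responses are illustrative assumptions.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return covariation / spread

def split_half_r(responses):
    """Correlate totals on items 1, 3, 5... with totals on items 2, 4, 6..."""
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    return pearson_r(first_half, second_half)

# Hypothetical answers from six respondents to a six-item, 5-point scale.
responses = [
    [5, 4, 5, 5, 4, 5],
    [2, 2, 1, 2, 2, 1],
    [4, 4, 3, 4, 4, 4],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 2],
    [4, 5, 4, 4, 5, 4],
]
r = split_half_r(responses)
print(f"Split-half correlation: {r:.2f}")
```

A correlation close to 1 suggests the two halves rank respondents in much the same way, which is what internal consistency means here.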

Validity in Experimental Design

Internal Validity
• The extent to which the change in the DV is actually due to the IV and not due to other confounding factors
• Lab-based experiments are best for internal validity
• Eg: Did the sales increase come from the radio ad, or something else?

External Validity
• The extent to which the relationship observed between the IV and DV during the experiment can be generalized back to the real world
• Field experiments (real world-based) are best for external validity
• Eg: Can what we observed in the experiment actually happen/be used in real-world business?

Things that Impact Validity in Experiments
History: External influences that happen between pre/post measures that affect the DV. Affects internal & external validity.
Maturation: Changes in human subjects over time. Affects internal & external validity.
Testing: The effects of the pre-test on subsequent measures. Affects internal validity.
Instrumentation: Change in interviewers or methodology between pre/post DV measures. Affects internal validity.
Selection: Differences in the samples selected for two-group studies (eg: Control vs. Experimental groups). Affects internal validity.
Mortality: Changes in research subjects due to attrition (drop outs) or deaths during the research project. Affects internal & external validity.

Methods to Control for Extraneous Influences in Experimental Design

Consistency
• Applying research design and methods consistently across all research subjects and groups

Randomization
• Randomizing treatment groups (counterbalancing)
• Employing methods to prevent order bias (hard to do)

Blind Method
• Using a double blind approach, where neither the researcher nor the subject knows what is being measured, or a disguised approach (unknown to respondent only)

Debriefing
• A discussion/process at the end of the research that provides full disclosure to subjects and the opportunity to discuss the research experience
• May be required for ethical reasons

Using Statistics Tests

Significance Testing


How many variables are you testing or describing? One or two?

Univariate Analysis: One Variable
- Univariate analysis involves analyzing data one variable at a time
- What percent of tourists visited Peggy’s Cove?
- What is the average household expenditure on electricity?
- This kind of analysis involves producing counts (frequencies), percentages, and means for the questions asked in a survey and examining the data
- This kind of analysis also involves creating confidence intervals around the sample statistics to determine what the population parameter would be as part of statistical inference

Bivariate Analysis: Two Variables
- Bivariate analysis involves analyzing two variables (IV and DV) at a time to better understand the differences between them, or to examine association (survey data), or causation (experimental data)
- Who prefers a particular brand of shoes: men or women? (Shoes = DV, Gender = IV)
- Are there significant differences in expenditure on vitamins for adults over 55 and adults aged 55 or younger? (Vitamin expenditure = DV, Age = IV)
- This kind of analysis involves testing hypotheses using test statistics suitable for the level of data measurement
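For the univariate case, a confidence interval around a sample mean can be sketched as follows. The example assumes a large-sample (z-based) interval, which is a simplification, and the expenditure figures are generated placeholders rather than real survey data. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(values) / sqrt(len(values))
    return mean(values) - margin, mean(values) + margin

# Placeholder monthly electricity expenditure ($) for 40 households:
# the values 100..139 in shuffled order, invented for illustration.
spend = [100 + (i * 7) % 40 for i in range(40)]

low, high = mean_confidence_interval(spend)
print(f"Sample mean: {mean(spend):.2f}")
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

The interval is read as the range in which the population mean is expected to fall at the chosen level of confidence; a .90 level would give a narrower range and a .99 level a wider one.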

Applying the Level of Data Measurement and Statistical Significance

Significance tests differ depending on the level of data measurement. Data must be classified as categorical or continuous. The classification, which is based on the four levels of data measurement, appears below.

Categorical Data: Nominal measures, Ordinal measures
Continuous Data: Interval measures, Ratio measures

Significance Tests for Doing Bivariate Analysis
What does your hypothesis measure?

Does one variable influence another? (association/causation) (IV & DV)
• If the IV and the DV are both measured categorically use
• If both the IV and DV are continuous use Regression
• If the IV is categorical and the DV is continuous:
  - Use Independent Samples t-test if the IV has only two levels to compare (eg: Male/Female)
  - Use Analysis of Variance (ANOVA) if the IV has more than two levels to compare (eg: Income)

How variables are related, or how they differ, without implying association/causation? (no IV/DV)
• If both variables are continuous and data comes from the same respondent (eg: comparing two different ratings from the same person), use Paired Samples t-test
• If both variables are measured continuously and you want to investigate their association with each other, use Correlation
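The decision chart above can be expressed as a small lookup function. This is a sketch only: the function and parameter names are hypothetical, and because the chart’s recommendation for two categorical variables is cut off in the source, the code assumes the conventional choice, a chi-square test of independence.

```python
CATEGORICAL = {"nominal", "ordinal"}
CONTINUOUS = {"interval", "ratio"}

def kind(level):
    """Classify one of the four levels of measurement as categorical or continuous."""
    if level in CATEGORICAL:
        return "categorical"
    if level in CONTINUOUS:
        return "continuous"
    raise ValueError(f"Unknown level of measurement: {level}")

def choose_bivariate_test(first_level, second_level, *, iv_dv=True,
                          iv_levels=2, same_respondent=False):
    """Suggest a test; with iv_dv=True the first variable is the IV, the second the DV."""
    first, second = kind(first_level), kind(second_level)
    if iv_dv:
        if first == "categorical" and second == "categorical":
            # Assumption: not legible in the source chart.
            return "chi-square test of independence"
        if first == "continuous" and second == "continuous":
            return "regression"
        if first == "categorical" and second == "continuous":
            return "independent samples t-test" if iv_levels == 2 else "ANOVA"
        return None  # continuous IV with categorical DV is not covered by the chart
    if first == "continuous" and second == "continuous":
        return "paired samples t-test" if same_respondent else "correlation"
    return None  # not covered by the chart

# Vitamin expenditure (ratio DV) by age group (nominal IV, two levels).
print(choose_bivariate_test("nominal", "ratio", iv_levels=2))
```

Returning `None` for combinations the chart does not mention keeps the sketch from recommending a test the module never discusses.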

Module Review: Discussion
1. What is the difference between association and causation?
2. Why is significance testing necessary?
3. The Null Hypothesis is the “default” in hypothesis testing. What does this mean?
4. What are the strengths of having a before measure? What are the weaknesses of having a before measure?
5. What is meant by the following kinds of influences: maturation, mortality, selection?
6. When is a research finding statistically significant, but not managerially significant?
7. What is the p-level for a .90 level of significance?
8. Is a nominal measurement also continuous?


