Chapter 5 - Hypothesis Testing Part 1

“What Does it Mean if 2 Companies
Report 95% Efficacy Rates?”

- New York Times, November 20, 2020

“In the case of Pfizer, for example, the
company recruited 43,661 volunteers and
waited for 170 people to come down with
symptoms of COVID-19 and then get a
positive test. Out of these 170, 162 had
received a placebo shot, and just eight had
received the real vaccine.”

- Carl Zimmer (2020)

On Nov. 30, 2020, Moderna (2020) issued a
press release reporting a primary analysis
based on 196 cases of COVID-19, of which 185
were observed in the placebo group versus 11
in the mRNA-1273 group. This gives a point
estimate of vaccine efficacy of 94.1%
(in the US trial of 30,000 participants).
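Both headline figures can be reproduced from the case counts. Vaccine efficacy is one minus the ratio of the attack rates (cases per participant) in the vaccine and placebo arms. The sketch below assumes the two arms are the same size, so efficacy reduces to one minus the ratio of case counts. The function name `vaccine_efficacy` is hypothetical.

```python
def vaccine_efficacy(cases_vaccine, cases_placebo, n_vaccine=1, n_placebo=1):
    """Efficacy = 1 - (attack rate in vaccine arm / attack rate in placebo arm).

    Assumption: with the defaults (equal-sized arms), this reduces to
    1 - cases_vaccine / cases_placebo.
    """
    attack_vaccine = cases_vaccine / n_vaccine
    attack_placebo = cases_placebo / n_placebo
    return 1 - attack_vaccine / attack_placebo

pfizer = vaccine_efficacy(cases_vaccine=8, cases_placebo=162)
moderna = vaccine_efficacy(cases_vaccine=11, cases_placebo=185)
print(f"Pfizer:  {pfizer:.1%}")   # Pfizer:  95.1%
print(f"Moderna: {moderna:.1%}")  # Moderna: 94.1%
```

If the arms differ in size, pass the actual arm sizes as `n_vaccine` and `n_placebo` so that rates, not raw counts, are compared.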
• Hypothesis Testing Intuition
• Null and Alternative Hypotheses
• Type I and Type II Errors
• Significance Level
• Test Statistics
• P-value
• Choosing the Appropriate Statistical Test
Hypothesis Testing:
“The Idea”
Suppose, as a statistics student, you would like
to verify Pfizer's claim that the efficacy rate
of its COVID-19 vaccine is really 95%.

Or, perhaps as a Pfizer researcher, you would
like to find out whether your vaccine is really
effective in protecting a vaccinated individual
from acquiring COVID-19.
Let’s recall the steps of a research process
(the scientific method):
•Define a problem (based on an observation)
•Gather data
•Generate a theory and formulate a testable
hypothesis (an educated guess)
•Test the hypothesis (experiment)
•Make a conclusion
What is a statistical hypothesis?

In statistical inference, a statistical hypothesis is
an assumption or prediction about one or more
populations, expressed in terms of parameters
(e.g., population means, population proportions
or correlations).
Examples of statistical hypotheses
•Pfizer's vaccine has an efficacy rate of 95%.
•Males have a higher IQ level than females.
•Intervention X is better than intervention Y.
•Movie ratings are related to the number
of viewers.
A procedure for deciding, on the basis of sample
data, whether or not to reject a hypothesis is
called a statistical hypothesis test or significance
test. It uses data to evaluate a hypothesis by
comparing sample point estimates of
parameters with the values predicted by the
hypothesis.
How does the scientific method compare with a statistical
test of hypothesis?
• Define a problem
• Gather data
• Formulate a hypothesis
• Test the hypothesis
• Make a conclusion
When formulating the hypothesis in a statistical test, we
create two hypotheses: the hypothesis to be tested, called
the NULL HYPOTHESIS, and a backup one, called the
ALTERNATIVE HYPOTHESIS.
Null vs Alternative
“Why two guesses?”
Types of Hypotheses
• Null Hypothesis (Ho) - always states that there is
no effect in the underlying population.
-by effect we might mean a relationship between
two or more variables, a difference between two
or more populations, or a difference in the
responses of one population under two or more
different conditions.
-is usually formulated for the purpose of being rejected.
•Alternative Hypothesis (Research Hypothesis)
(Ha or H1) - the operational statement of the
researcher's hypothesis.
-to be accepted if the null hypothesis is rejected.
-is a prediction of how two variables might be
related to each other.
- or, it might be our prediction of how specified
groups of participants might differ from
each other, or how one group of participants might
perform differently under two or more conditions.
Types of Hypotheses (Remarks)
•The null hypothesis and the alternative
hypothesis are mutually exclusive.
•The alternative (research) hypothesis can be
directional or non-directional.
•Directional research hypothesis
• specifies the direction of the difference or the
direction of the relationship.
•Non-directional research hypothesis
• does not specify the direction of the difference or
the direction of the relationship.
• Who is smarter, males or females?
• Is IQ related to academic performance?
• Does in-house training make employees more
productive than external training, as indicated by
their work performance?
• Is there a significant relationship between the level of
morale of employees and their work performance?
We can restate these questions in testable form:
•“Who is smarter, males or females?” becomes
“Is there a significant difference between
the IQ levels of males and females?”
•“Is IQ related to academic performance?” becomes
“Is there a significant relationship between
IQ and academic achievement?”
1. Research Question: Is there a significant difference
between the IQ levels of males and females?
Null Hypothesis: There is no significant difference between
the IQ levels of males and females.
Non-directional alternative hypothesis:
• There is a significant difference between the IQ levels of
males and females.
Directional alternative hypothesis:
• The IQ level of males is higher than that of females.
2. Research Question: Is there a significant relationship
between IQ and academic achievement?
Null Hypothesis: There is no significant relationship between
IQ and academic achievement.
Non-directional alternative hypothesis:
• There is a significant relationship (or correlation) between
IQ and academic achievement.
Directional alternative hypothesis:
• The higher the IQ of a student, the better his or her academic
achievement.
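A statistical hypothesis is stated in terms of population parameters, so the two research questions above can also be written symbolically. In this sketch the symbols are assumptions introduced here: 𝜇_M and 𝜇_F stand for the population mean IQ of males and of females, and 𝜌 stands for the population correlation between IQ and academic achievement.

```latex
\begin{aligned}
&\text{1. } H_0:\ \mu_M = \mu_F
  &&\text{vs. } H_a:\ \mu_M \neq \mu_F \ \text{(non-directional)}
  \quad\text{or}\quad H_a:\ \mu_M > \mu_F \ \text{(directional)}\\
&\text{2. } H_0:\ \rho = 0
  &&\text{vs. } H_a:\ \rho \neq 0 \ \text{(non-directional)}
  \quad\text{or}\quad H_a:\ \rho > 0 \ \text{(directional)}
\end{aligned}
```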
Type I and Type II Errors
“The good, the bad, and
the ugly”
•The decision maker (statistician) relies mainly on the
observed data to decide whether to REJECT the null
hypothesis or ACCEPT (fail to reject) it.
•However, the true state of nature, which is beyond
his control, determines whether his decision
is good or bad.
•Let’s illustrate the possible outcomes of each
decision against the true state of nature:
                     Nature of the hypothesis
Decision             Null H is true    Null H is false
Reject the null H    Error             Good
Accept the null H    Good              Error

Consider the following analogies to
illustrate this scenario.

•Courtroom Trial
•COVID-19 Testing
Courtroom Trial Analogy
•Null Hypothesis: The defendant is not guilty
•Alternative Hypothesis: The defendant is guilty.
•The judge may reject the null and convict the defendant.
•Or accept the null and acquit him.
Ho: The defendant is not guilty.
                     Nature of the hypothesis
Decision             Null H is true    Null H is false
                     (Innocent)        (Guilty)
Reject (Convict)     Error             Good
Accept (Acquit)      Good              Error
COVID-19 Test Analogy
•Null Hypothesis: The patient is COVID-19 negative.
•Alternative Hypothesis: The patient is COVID-19 positive.
•The test may reject the null and report the patient as
COVID-19 positive.
•Or accept the null and report him as negative.
Ho: The patient is COVID-19 negative.
                     Nature of the hypothesis
Decision             Null H is true    Null H is false
                     (Without Virus)   (With Virus)
Reject (Positive)    Error             Good
Accept (Negative)    Good              Error
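Types of Statistical Errors
• Type I Error (producer’s error) is committed when we
reject the null hypothesis when it is in fact true.
• Type II Error (consumer’s error) is committed when
we accept the null hypothesis when it is in fact false.
• In the analogies above, convicting an innocent defendant
or reporting a virus-free patient as positive is a Type I
error (a false positive). Acquitting a guilty defendant or
reporting an infected patient as negative is a Type II
error (a false negative).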
Chance of Committing Such Errors
• 𝛼 (alpha) is the probability of committing a Type I
error.
• 𝛽 (beta) is the probability of committing a Type II
error.
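A one-sided z-test makes the trade-off between 𝛼 and 𝛽 concrete. The sketch below assumes the following: H0: 𝜇 = 𝜇0 is tested against Ha: 𝜇 > 𝜇0, the population standard deviation is known, and 𝛽 is evaluated at a hypothetical true mean `mu1`. The function name and all numbers are illustrative only. It uses only Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def error_rates(mu0, mu1, sigma, n, alpha=0.05):
    """Type I and Type II error probabilities of a one-sided z-test.

    H0: mu = mu0 versus Ha: mu > mu0, with sigma assumed known.
    mu1 is a hypothetical true mean at which beta is evaluated.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)      # reject H0 if z > z_crit
    type1 = 1 - STD_NORMAL.cdf(z_crit)          # P(reject H0 | H0 true), equals alpha
    shift = (mu1 - mu0) * n ** 0.5 / sigma      # mean of z when the true mean is mu1
    type2 = STD_NORMAL.cdf(z_crit - shift)      # P(accept H0 | true mean is mu1)
    return type1, type2

# Hypothetical setting: IQ-like scale, sigma = 15, n = 36, true mean 105
alpha, beta = error_rates(mu0=100, mu1=105, sigma=15, n=36)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

In this setting 𝛽 is roughly 0.36. Lowering 𝛼 to 0.01 with the same sample size pushes the critical value further out and increases 𝛽. This is why 𝛼 cannot simply be made as small as possible.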
Alpha (𝛼):
“A criterion for
statistical significance”
Alpha (𝛼)
•Known as the significance level, 𝛼 is the
maximum probability of a Type I error that we
are willing to allow in decision making.
•To be useful, the level of significance of a test
must be small.
•By tradition, the most common values of 𝛼 are
0.05 and 0.01.
•It is the probability level we use as a cut-off:
if, assuming the null hypothesis is true, the
probability of our pattern of results falls below
this level, we regard the results as so unlikely
that our research (alternative) hypothesis becomes
more plausible than the null hypothesis.
•Assuming the null hypothesis is true, if the
probability of obtaining an effect at least as large
as the observed one purely through sampling error
is less than 5% (𝛼 = 0.05), the findings are said
to be ‘significant’.
•If this probability is 5% or greater, the
findings are said to be ‘non-significant’.
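A short simulation illustrates what 𝛼 = 0.05 means in practice. Even when the null hypothesis is true, about 5% of experiments will still come out ‘significant’ purely through sampling error. The sketch makes these assumptions: a two-tailed z-test, samples of n = 30 drawn from a normal population with true mean 0 and known σ = 1, and a fixed random seed. The function name and every setting are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_false_positive_rate(n_experiments=20_000, n=30, alpha=0.05, seed=1):
    """Fraction of experiments declared significant when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    significant = 0
    for _ in range(n_experiments):
        # H0 is true by construction: population mean 0, sigma 1
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) * sqrt(n)             # sample mean / (sigma / sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))    # two-tailed p-value
        if p < alpha:
            significant += 1                    # a Type I error
    return significant / n_experiments

rate = simulate_false_positive_rate()
print(f"share of 'significant' results under H0: {rate:.3f}")
```

The printed share should land close to 0.05. That is the long-run Type I error rate we accept when we choose 𝛼 = 0.05.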
Test Statistic:
“The decision maker”
Test Statistic
•A test statistic is a value computed from the sample
data by a formula (sometimes called the decision maker)
and is used to test the null hypothesis.
•It is used to determine how close a specific
sample result falls to one of the hypotheses
being tested.
•Examples of test statistics: z, 𝜒², t, F
• When we convert our data into a score from a
probability distribution, the score we calculate is
called the test statistic.
• For example, if we were interested in looking for a
difference between two groups, we could convert our
data into a t-value (from the t-distribution). This
t-value is our test statistic.
• We then calculate the probability of obtaining a value
at least this extreme by chance factors alone, and this
probability is our p-value.
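Here is one way to turn two groups of scores into a t-value: a minimal sketch of Student's two-sample t statistic with pooled variance. This version assumes equal population variances. The scores and the function name `pooled_t` are hypothetical. Turning the t-value into a p-value requires the t-distribution. The standard library does not provide it, but SciPy's `scipy.stats.ttest_ind` returns both the statistic and the p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    n1, n2 = len(group1), len(group2)
    # Pooled estimate of the common population variance
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the mean difference
    t = (mean(group1) - mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical scores for two independent groups
group_a = [12, 15, 14, 10, 13]
group_b = [9, 11, 10, 8, 12]
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```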
Remark: The values of the test statistic can be
classified into two sets:
• 1. The critical region (or rejection region) of a test is
the set of values of the test statistic that leads to the
rejection of the null hypothesis.
• 2. The acceptance region is the set of values of the test
statistic that leads to the acceptance of the null
hypothesis.
•The critical value of the test statistic is the value
that separates the critical region from the
acceptance region.
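For a z-test, the critical values come directly from the standard normal distribution. The sketch below assumes a z statistic. For `tails=1` it treats the test as upper-tailed (Ha: 𝜇 > 𝜇0). Both function names are hypothetical.

```python
from statistics import NormalDist

def z_critical_value(alpha=0.05, tails=2):
    """Boundary between the acceptance and critical regions of a z-test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

def in_critical_region(z_stat, alpha=0.05, tails=2):
    """True if z_stat falls in the rejection region (tails=1 means upper tail only)."""
    crit = z_critical_value(alpha, tails)
    return abs(z_stat) > crit if tails == 2 else z_stat > crit

print(f"one-tailed: {z_critical_value(tails=1):.3f}")  # one-tailed: 1.645
print(f"two-tailed: {z_critical_value(tails=2):.3f}")  # two-tailed: 1.960
print(in_critical_region(1.8), in_critical_region(1.8, tails=1))  # False True
```

A z of 1.8 falls in the acceptance region of the two-tailed test but in the critical region of the one-tailed test. The test must therefore be chosen before looking at the data.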
P-values or attained level of significance
•The p-value is the probability of obtaining a
pattern of results at least as extreme as the one we
found in our study if there were no relationship in
the population between the variables we are
interested in (that is, if the null hypothesis were true).
•The smaller the p-value, the stronger the evidence
against the null hypothesis.
•Whether a one-tailed or a two-tailed test is used
depends on the nature of the research hypothesis.
•In general, if the research hypothesis is
directional, a one-tailed test is used;
•if the research hypothesis is non-directional, a
two-tailed test is used.
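The difference between one-tailed and two-tailed p-values shows up clearly for a z statistic. This sketch assumes a standard normal null distribution and an upper-tailed directional hypothesis (Ha: 𝜇 > 𝜇0) when `tails=1`. The function name `z_p_value` is hypothetical.

```python
from statistics import NormalDist

def z_p_value(z_stat, tails=2):
    """p-value of an observed z statistic under H0 (standard normal)."""
    std_normal = NormalDist()
    if tails == 2:
        # Non-directional Ha: extreme values in either tail count
        return 2 * (1 - std_normal.cdf(abs(z_stat)))
    if tails == 1:
        # Directional Ha (mu > mu0): only the upper tail counts
        return 1 - std_normal.cdf(z_stat)
    raise ValueError("tails must be 1 or 2")

one = z_p_value(1.8, tails=1)
two = z_p_value(1.8, tails=2)
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

For the same z = 1.8, the one-tailed p-value (about 0.036) is below 0.05 while the two-tailed p-value (about 0.072) is not. The two-tailed value is exactly twice the one-tailed value whenever z is positive.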
Logic of Null Hypothesis Testing
• 1. State the null and the alternative hypothesis
• 2. Set the level of significance (𝛼) to be used.
• 3. Identify and compute the appropriate test statistic to be
used (e.g., t-statistic).
• 4. Determine the probability value (p-value).
• 5. Make the decision:
• Decision Rule: Reject the null hypothesis if and only if the
p-value is less than the level of significance (𝛼).
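The five steps can be walked through on the Pfizer figures quoted at the start of the chapter. The sketch assumes the vaccine and placebo arms are the same size. Under H0 ("the vaccine has no effect"), each of the 170 cases is then equally likely to come from either arm, so the number of vaccine-arm cases follows Binomial(170, 0.5). This exact binomial test on the split of cases is an illustration chosen here, not the trial's own analysis method.

```python
from math import comb

# Step 1: hypotheses (assumption: equal-sized vaccine and placebo arms)
#   H0: the vaccine has no effect -> vaccine-arm cases ~ Binomial(170, 0.5)
#   Ha (directional): the vaccine lowers risk -> fewer cases in the vaccine arm
total_cases, vaccine_cases = 170, 8

# Step 2: level of significance
alpha = 0.05

# Step 3: test statistic = number of cases observed in the vaccine arm
x = vaccine_cases

# Step 4: one-tailed p-value, P(X <= 8 | H0), computed exactly
p_value = sum(comb(total_cases, k) for k in range(x + 1)) / 2 ** total_cases

# Step 5: decision rule: reject H0 if and only if p-value < alpha
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.2e}; reject H0: {reject_h0}")
```

The p-value is on the order of 10⁻³⁸. Eight or fewer vaccine-arm cases out of 170 would be essentially impossible if the vaccine had no effect, so H0 is rejected.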
Let’s take a BREAK!!!
Choosing the Appropriate
Test Statistics
(Without asking a
Assumptions underlying the use of statistical tests
- Many statistical tests that we use require that our
data have certain characteristics. These
characteristics are called assumptions.
- Many statistical tests are based upon the
estimation of certain parameters relating to the
underlying populations in which we are interested.
These sorts of tests are called parametric tests.
Parametric Tests
- These tests assume that our data are drawn from populations
that follow particular probability distributions, such as
the normal distribution
- the scale of the dependent variable should be at least interval
- samples must be drawn from normally distributed populations
- the variances of the populations must be approximately equal
(homogeneity of variances)
- there should be no extreme scores
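Two of these assumptions can be screened quickly before running a parametric test. The sketch below uses rough heuristics, not formal tests: a variance ratio for homogeneity of variances, and a 3-standard-deviation cutoff for extreme scores. The cutoff is a common convention assumed here. The function names and data are hypothetical. Formal checks such as Levene's test (`scipy.stats.levene`) or the Shapiro-Wilk normality test (`scipy.stats.shapiro`) are available in SciPy and in SPSS.

```python
from statistics import mean, stdev, variance

def variance_ratio(*groups):
    """Largest sample variance divided by the smallest.

    Heuristic screen for homogeneity of variances: values near 1 are
    reassuring. This is not a formal test.
    """
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

def extreme_scores(data, cutoff=3.0):
    """Scores lying more than `cutoff` sample SDs from the mean (assumed convention)."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) / s > cutoff]

# Hypothetical data for illustration only
group_a = [12, 15, 14, 10, 13]
group_b = [9, 11, 10, 8, 12]
scores = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
          49, 52, 48, 50, 51, 49, 50, 52, 48, 95]

print(f"variance ratio = {variance_ratio(group_a, group_b):.2f}")  # 3.7 / 2.5
print("extreme scores:", extreme_scores(scores))
```

With very small samples, a z-score cutoff of 3 can never be exceeded, so screens like this are most informative for moderately large samples.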
SPSS: Statistics Coach
One of the biggest factors in
determining which statistical
tests you can use to analyse
your data is the way you have
designed your study.
The two most common types of tests are
difference tests and
correlational tests.
Overview of the main features of the various research design
Research Designs
Another important feature of research designs
is whether you get each participant to take
part in more than one condition.

Between-participants or
Within-participants designs
Between-participants designs are those
where different participants are
allocated to each condition of the
independent variable (IV).

The resulting groups are called independent
or uncorrelated samples.
Between-participants/Independent Samples
•samples that are randomly selected from distinct
populations.
•the sample sizes may or may not be equal.
Examples of Independent Samples
•sample of male students and sample of
female students
•sample of smokers and sample of non-smokers
•sample of parents, sample of teachers, and
sample of pupils
Limitations of Between-participants design
•Different people bring different
characteristics to the experimental
setting, which may increase the influence
of extraneous variables.
Within-participants designs, on the other
hand, are those where each participant is
measured under all conditions of the IV.

The resulting sets of scores are called dependent
or correlated samples.
Within-Participants/Dependent Samples
•Dependent or correlated samples usually arise in
experimental designs whose objective is to ensure
that the subjects being compared are comparable in
terms of relevant variables.
•These designs include repeated measures designs
(e.g., the pretest-posttest design) and the
matched groups design.
•The sample sizes of the groups are always
equal.
A. Before & After or Pretest-Posttest Design
(Repeated Measures Designs)

• The two sets of data are said to be
correlated because they are taken or
measured from the same set of
participants.

B. Matched Groups Design
• We take a sample of matched pairs of
individuals and randomly assign one
member of each pair to each of the
two groups.
• The resulting samples are dependent
or correlated.
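Dependent samples are analysed through the differences within each pair, not as two separate groups. Below is a minimal sketch of the paired-samples t statistic, which is a one-sample t statistic computed on the differences. The pretest and posttest scores and the function name `paired_t` are hypothetical. SciPy's `scipy.stats.ttest_rel` performs the full test, including the p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t statistic (one-sample t on the differences) and its df."""
    if len(before) != len(after):
        # Dependent samples always have equal sizes: each score has a partner
        raise ValueError("dependent samples must have equal sizes")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical pretest-posttest scores for the same six participants
before = [60, 72, 55, 68, 80, 64]
after = [65, 75, 59, 70, 86, 66]
t, df = paired_t(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Pairing removes the large differences between participants from the error term. As a result, a consistent gain of a few points gives a large t here, even though the two sets of scores overlap heavily.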
For instance…
•In an experimental study on the effectiveness of two
methods of teaching, we have to make sure that
the groups of students are comparable in terms of
ability and other relevant characteristics.
•If matching is not done, the groups may not be
comparable, and the internal validity of the study
will be questionable.
Matched Groups Design
•It is rarely used by educational researchers
because of the difficulty in matching individuals.
•The more matching variables considered, the more
difficult it is to form a good number of
matched or paired individuals.
Limitations of Within-participants design
•presence of order effects (can be addressed by
counterbalancing)
•demand effects / halo effects
•cannot be used in many quasi-experimental designs
