Lecture 12 - Hypothesis Testing

MSc Computer Science with Emerging Technologies
Lecture 12
Hypothesis Testing
Research Methodologies
by Dr Vinaye Armoogum
Professor
VA 1
Learning Topics
This lecture will address the following:
1. Definition of hypothesis and formation
2. Testing hypothesis using three methods
3. Examples using various methods
Research Process
I. Define the research problem
II. Review the literature
– Review concepts and theories
– Review previous research findings
III. Formulate hypotheses
IV. Design research (including sample design)
V. Build design (e.g. model) and collect data (execution)
VI. Analyse data (test hypotheses)
VII. Interpret and report
Development of
Working Hypothesis
• After an extensive literature survey, the researcher should state the working hypothesis in clear terms.
• For a researcher, a hypothesis is a formal question that he or she intends to resolve.
• A hypothesis is a proposed explanation for an observable phenomenon that is capable of being tested by scientific methods.
• For example, consider the statements:
“Drug A is as efficacious as drug B.” (medical field)
or
“The novel ML-based IDS X is as efficient as the existing Cisco SNORT IDS.” (computer science field)
These are hypotheses capable of being objectively verified and tested.
The formation of the hypothesis and the technique by which testing will be conducted must be explained in the Research Methodology section/chapter, and the testing itself is computed and analyzed in the Results Analysis section/chapter.
Hypothesis Definition
Hypothesis
in statistics, is a claim or statement about a property of a population
Hypothesis Testing
is to test the claim or statement
Example: A conjecture is made that “the average starting salary for computer
science graduates is Rs 30,000 per month”.
Question:
How can we justify/test this conjecture?
A. What do we need to know to justify this conjecture?
B. Based on what we know, how should we justify
this conjecture?
Answer to A:
Randomly select, say, 100 computer science graduates and find out their monthly salaries.
– We need some sample observations, i.e., a sample set!
Answer to B:
Make conclusions based on the sample observations.
– How to do this is the question explored in this lecture.
Statistical Reasoning
Analyze the sample set in an attempt to distinguish
between results that can easily occur and results that are
highly unlikely.
Central Limit Theorem: Distribution of Sample Means
[Figure: distribution of sample means, assuming the conjecture is true, centred at µx̄ = 30k. Likely sample means lie between z = -1.96 (x̄ = 20.2k) and z = 1.96 (x̄ = 39.8k). The sample data lie outside this range at z = 2.62 (x̄ = 43.1k).]
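The numbers in the figure can be reproduced with a short calculation. This is only a sketch: the standard error of 5k is not stated on the slide and is inferred here from the figure's endpoints, (39.8 - 30) / 1.96 = 5, and the variable names are illustrative.

```python
from statistics import NormalDist

mu0 = 30.0    # hypothesized mean salary, in thousands of rupees
se = 5.0      # ASSUMED standard error, inferred from (39.8 - 30) / 1.96
x_bar = 43.1  # sample mean shown in the figure

# Standardize the sample mean under the assumption that the conjecture is true.
z = (x_bar - mu0) / se

# Range of "likely" sample means (central 95% of the sampling distribution).
lower = mu0 - 1.96 * se
upper = mu0 + 1.96 * se

# Probability of a sample mean at least this far from mu0, in either direction.
tail = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}")
print(f"likely sample means: {lower:.1f}k to {upper:.1f}k")
print(f"two-sided tail probability = {tail:.4f}")
```

A sample mean of 43.1k falls outside the likely range, which is the kind of "highly unlikely" result that leads us to doubt the conjecture.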
Development of
Working Hypothesis
Characteristics of a hypothesis: a hypothesis must possess the following characteristics:
❖ It should be clear and precise. If the hypothesis is not clear and precise, the inferences drawn on its basis cannot be taken as reliable.
❖ It should be capable of being tested.
❖ It should be limited in scope and must be specific.
❖ It should be stated, as far as possible, in the simplest terms so that it is easily understood by all concerned.
❖ It should be amenable to testing within a reasonable time. Even an excellent hypothesis should not be used if it cannot be tested in a reasonable time, for one cannot spend a lifetime collecting data to test it.
❖ It must actually explain what it claims to explain.
Purpose and Importance of
Hypotheses in an Empirical Research
The key advantages of hypothesis formation are given below:
• It provides the researcher with a relational statement that can
be directly tested in a research study.
• It helps in formulation of conclusions of the research.
• It helps in forming a tentative or educated guess about any
phenomenon in a research study.
• It provides direction to the collection of data for validation of
hypothesis and thus helps in carrying the research forward.
• Even if the hypothesis is proven to be false, it leads to a specific
conclusion.
How to Form a
Hypothesis?
The null hypothesis, also known as the hypothesis of no difference, is denoted H0.
• The null hypothesis is the proposition that there is no statistically significant relationship within a given set of parameters.
• It denotes the reverse of what the researcher would actually expect or predict in the experiment.
The alternative hypothesis is denoted H1 or Ha.
• The alternative hypothesis states that a statistically significant relationship does exist within a given set of parameters. It is also known as the research hypothesis. For example, Newton stated the law of gravity, and Einstein later gave a more explicit explanation of it by way of an alternative (research) hypothesis.
• It is the opposite of the null hypothesis and is only reached if H0 is rejected.
Components of a Formal Hypothesis Test
❖ Null Hypothesis (denoted H0):
is the statement being tested in a test of hypothesis.
❖ Alternative Hypothesis (denoted H1):
is what is believed to be true if the null hypothesis is false.
Null Hypothesis: H0
❖ Must contain a condition of equality
❖ =, ≤, or ≥
❖ Test the null hypothesis directly
❖ Reject H0 or fail to reject H0
Alternative Hypothesis: H1
❖ Must be true if H0 is false
❖ ≠, <, >
❖ ‘Opposite’ of the null hypothesis
• Example:
• H0: µ = 30 versus H1: µ > 30
Forming (Stating) Your Own
Hypothesis
• If you wish to support your claim, the claim must be
stated so that it becomes the alternative hypothesis.
• That is, if you are conducting a study and want to use a
hypothesis test to support your claim, the claim must be
worded so that it becomes the alternative hypothesis.
Forming (Stating) Your Own Hypothesis
A legal trial compared with a hypothesis test:
H0
– Legal trial: the defendant is not guilty.
– Hypothesis test: a claim about a population parameter.
HA
– Legal trial: the defendant is guilty.
– Hypothesis test: the opposing claim about a population parameter.
Result
– Legal trial: the evidence convinces the jury to reject the assumption of innocence. The verdict is guilty.
– Hypothesis test: the statistic indicates a rejection of H0, and the alternative hypothesis is accepted.
Important Notes:
❖ H0 must always contain equality; however, some claims are not stated using equality. Therefore, the claim and H0 will sometimes not be the same.
❖ Ideally, all claims should be stated so that they are the null hypothesis, so that the most serious error would be a Type I error.
Example 1
❖ Level of confidence (C): how confident are we in our decision?
❖ Level of significance: α = 1 - C
❖ A company has stated that its straw machine makes straws that are 5 mm in diameter. An employee believes that the machine no longer makes straws of this size and samples 100 straws to perform a hypothesis test with 99% confidence. Write H0 and H1.
H0: µ = 5 mm
H1: µ ≠ 5 mm
C = 0.99, hence α = 0.01
More Examples
❖ Doctors believe that teens sleep on average no longer than 10 hours a day. A researcher believes that teens on average sleep longer. Write H0 and H1.
H0: µ ≤ 10 hours
H1: µ > 10 hours
❖ The School Board claims that at least 60% of the students bring smartphones to the university. A lecturer believes that this number is too high and randomly samples 25 students to test at a level of significance of 0.02. Write H0 and H1.
Note that in this example the claim is not about an average; the number is a proportion. Hence we replace the symbol µ with the symbol P.
H0: P ≥ 0.6
H1: P < 0.6
α = 0.02, C = 0.98
Type I Error
❖ The mistake of rejecting the null hypothesis when it is true.
❖ The probability of doing this is called the significance level, denoted by α (alpha).
❖ Common choices for α: 0.05 and 0.01
❖ Examples:
– Rejecting a perfectly good parachute and refusing to jump
– Rejecting (blocking) a genuine voice packet from entering a VoIP system to communicate with the IP-PBX server
Type II Error
❖ The mistake of failing to reject the null hypothesis when it is false.
❖ Its probability is denoted by β (beta)
❖ Examples:
– Failing to reject a defective parachute and jumping out of a plane with it
– Failing to reject an illegitimate voice message, which then penetrates a VoIP system
Type I and Type II Errors
Columns give the true state of nature (reality); rows give the decision (measured or perceived).

Decision                               | The null hypothesis is true                      | The null hypothesis is false
We decide to reject the null hypothesis | Type I error (rejecting a true null hypothesis)  | Correct decision
We fail to reject the null hypothesis   | Correct decision                                 | Type II error (failing to reject a false null hypothesis)
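The meaning of α as the probability of a Type I error can be checked by simulation. The sketch below is illustrative only: it assumes a two-tailed z test with a known population standard deviation, draws many samples from a population in which H0 really is true, and counts how often H0 is (wrongly) rejected. The function name, the sample size, the number of trials and the seed are all hypothetical choices.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0  # H0 is true: samples really come from N(mu0, sigma)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:  # test statistic falls in the critical region
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, since every rejection in this simulation is by construction a Type I error.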
Definition
Test Statistic:
is a sample statistic, or a value based on sample data.
Example:
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$$
Critical Region
Set of all values of the test statistic that would cause a rejection of the null hypothesis.
[Figure: distribution curve with the critical regions marked]
Critical Value
Value(s) that separate the critical region from the values that would not lead to a rejection of H0.
[Figure: the critical value (z score) marks the boundary between the "Reject H0" and "Fail to reject H0" regions]
Controlling Type I and Type II Errors
❖ α, β, and n are related
❖ When two of the three are chosen, the third is determined
❖ α and n are usually chosen
❖ Try to use the largest α you can tolerate
❖ If a Type I error is serious, select a smaller α value and a larger n value
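The relationship between α, β and n can be made concrete for one simple case. The sketch below is not from the slides: it assumes a right-tailed z test with known σ and a specific true mean that exceeds the hypothesized mean by delta, for which β = Φ(z_α - delta·√n / σ). The function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

def beta_right_tailed(alpha, n, sigma, delta):
    """Probability of a Type II error for a right-tailed z test.

    Assumes sigma is known and the true mean is mu0 + delta (delta > 0).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the chosen alpha
    # H0 is not rejected when the test statistic stays below z_alpha.
    return NormalDist().cdf(z_alpha - delta * n ** 0.5 / sigma)

# Fixing alpha and n determines beta; a larger n gives a smaller beta.
for n in (25, 50, 100):
    print(n, round(beta_right_tailed(0.05, n, sigma=2.0, delta=1.0), 4))

# A smaller alpha with the same n gives a larger beta.
print(round(beta_right_tailed(0.01, 25, sigma=2.0, delta=1.0), 4))
```

This illustrates why a smaller α is usually paired with a larger n: shrinking α alone raises β.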
Conclusions in Hypothesis Testing
❖ Always test the null hypothesis; the outcome is one of:
1. Fail to reject H0
2. Reject H0
❖ The final conclusion must be worded correctly
See the flowchart on the next slide
Wording of Conclusions in Hypothesis Tests
Original claim is H0. Do you reject H0?
– Yes (reject H0): “There is sufficient evidence to warrant rejection of the claim that . . . (original claim).” (This is the only case in which the original claim is rejected.)
– No (fail to reject H0): “There is not sufficient evidence to warrant rejection of the claim that . . . (original claim).”
Original claim is H1. Do you reject H0?
– Yes (reject H0): “The sample data supports the claim that . . . (original claim).” (This is the only case in which the original claim is supported.)
– No (fail to reject H0): “There is not sufficient evidence to support the claim that . . . (original claim).”
Accept versus Fail to Reject
❖ Some texts use “accept the null hypothesis”
❖ The term ‘accept’ is somewhat misleading
❖ We are not proving the null hypothesis
❖ Sample evidence is not strong enough to warrant rejection (such as not enough evidence to convict a suspect)
Left-tailed Test
H0: µ ≥ 200
H1: µ < 200 (the inequality points left)
[Figure: "Reject H0" region in the left tail, containing values that differ significantly from 200; "Fail to reject H0" region to its right, including 200]
Right-tailed Test
H0: µ ≤ 200
H1: µ > 200 (the inequality points right)
[Figure: "Fail to reject H0" region including 200; "Reject H0" region in the right tail, containing values that differ significantly from 200]
Two-tailed Test
H0: µ = 200
H1: µ ≠ 200 (means less than or greater than 200)
α is divided equally between the two tails of the critical region.
[Figure: "Reject H0" regions in both tails, containing values that differ significantly from 200; "Fail to reject H0" region in the centre, around 200]
Assumptions
For testing claims about population means
1) The sample is a simple random sample.
2) The sample is large (n > 30).
a) Central limit theorem applies
b) Can use normal distribution
3) If σ is unknown, we can use the sample standard deviation s as an estimate for σ.
Three Methods
(tests)
1) Traditional method
2) P-value method
3) Confidence intervals
Note: These three methods are equivalent, i.e., they will provide the same
conclusions.
Other Tests
2. ANOVA for equality of means
3. Paired t-test
4. Chi-square test for goodness-of-fit
5. Friedman test
6. Mann–Whitney test
7. Kruskal–Wallis test
8. Wilcoxon signed-rank test
Traditional (or Classical) Method of Testing Hypotheses
Goal
Identify a sample result that is significantly different from the claimed value.
The traditional (or classical) method of hypothesis testing converts the relevant sample statistic into a test statistic, which we compare to the critical value.
Hypothesis Testing: Five-Step Process
1. State the hypotheses.
2. Decide on a model.
3. Determine the endpoints of the rejection region and state the decision rule.
4. Compute the test statistic.
5. State the conclusion.
Test Statistics
Test statistic for claims about µ when n > 30:
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$$
Test statistic for claims about µ when n < 30:
$$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$$
Decision Criterion
• Reject the null hypothesis if the test statistic is in the critical region
• Fail to reject the null hypothesis if the test statistic is not in the critical region
Wording of Final Conclusion
Start: does the original claim contain the condition of equality?
Yes (the original claim contains equality and becomes H0). Do you reject H0?
– Yes (reject H0): “There is sufficient evidence to warrant rejection of the claim that . . . (original claim).” (This is the only case in which the original claim is rejected.)
– No (fail to reject H0): “There is not sufficient evidence to warrant rejection of the claim that . . . (original claim).”
No (the original claim does not contain equality and becomes Ha). Do you reject H0?
– Yes (reject H0): “The sample data supports the claim that . . . (original claim).” (This is the only case in which the original claim is supported.)
– No (fail to reject H0): “There is not sufficient evidence to support the claim that . . . (original claim).”
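The wording flowchart is a two-question decision tree, so it can be written as a small function. This is a minimal sketch; the function and parameter names are illustrative and not part of the lecture.

```python
def final_conclusion(claim: str, claim_has_equality: bool, reject_h0: bool) -> str:
    """Return the wording of the final conclusion for a hypothesis test.

    claim_has_equality: True if the original claim contains equality
                        (the claim becomes H0), False if it becomes Ha.
    reject_h0:          the outcome of the test.
    """
    if claim_has_equality:
        # The original claim is H0.
        if reject_h0:
            return f"There is sufficient evidence to warrant rejection of the claim that {claim}."
        return f"There is not sufficient evidence to warrant rejection of the claim that {claim}."
    # The original claim is Ha.
    if reject_h0:
        return f"The sample data supports the claim that {claim}."
    return f"There is not sufficient evidence to support the claim that {claim}."

# Usage: a claim containing equality (so it is H0) that the test rejects.
print(final_conclusion("the mean is equal to 200", claim_has_equality=True, reject_h0=True))
```

Note that none of the four outcomes says the null hypothesis is "accepted" or "proved".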
Example: Statistically Significant
Given a dataset of 106 healthy body temperatures with a mean of 98.2° and s = 0.62°, test at the 0.05 significance level the claim that the mean body temperature of all healthy adults is equal to 98.6°. That is, where do we draw the line to make a decision?
Steps:
1) State the hypotheses
H0: µ = 98.6°
Ha: µ ≠ 98.6°
2) Determine the model
Two-tailed z test, n > 30, so use the z-table
Example
3) Determine the rejection region
α = 0.05
α/2 = 0.025 (two-tailed test)
[Figure: standard normal curve with an area of 0.4750 on each side of the centre and 0.025 in each tail; critical values z = -1.96 and z = 1.96]
Example
4) Compute the test statistic
$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{98.2 - 98.6}{0.62 / \sqrt{106}} = -6.64$$
Example
5) State the conclusion
[Figure: "Reject H0: µ = 98.6" regions beyond z = -1.96 and z = 1.96; "Fail to reject H0: µ = 98.6" region between them, centred at µ = 98.6 (z = 0). The sample data, x̄ = 98.2° or z = -6.64, lie in the left rejection region; more extreme values are in the rejection region.]
REJECT H0
There is sufficient evidence to warrant rejection of the claim that the mean body temperature of healthy adults is equal to 98.6°.
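The five steps of the body temperature example can be checked numerically. The sketch below uses only the figures given in the example and Python's standard library; the variable names are illustrative, and the critical value is computed rather than read from a z-table.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example.
n, x_bar, s = 106, 98.2, 0.62
mu0, alpha = 98.6, 0.05

# Step 3: critical value for a two-tailed test (alpha / 2 in each tail).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: test statistic (s is used as an estimate of sigma because n > 30).
z = (x_bar - mu0) / (s / sqrt(n))

# Step 5: decision rule.
reject_h0 = abs(z) > z_crit

print(f"critical values: ±{z_crit:.2f}")
print(f"test statistic:  z = {z:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```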
Assumptions
For testing claims about population means
1) The sample is a simple random sample.
2) The sample is small (n ≤ 30).
3) The value of the population standard deviation σ is unknown.
4) The sample values come from a population with a distribution that is approximately normal.
Test Statistic for a Student t-distribution
$$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$$
Critical Values
❖ Found in the t-table
❖ Degrees of freedom (df) = n - 1
❖ Critical t values to the left of the mean are negative
Example
A company manufacturing rockets claims to use an average of 5500 lbs of rocket fuel for the first 15 seconds of operation. A sample of 6 engines is fired, and the mean fuel consumption is 5690 lbs with a sample standard deviation of 250 lbs. Is the claim justified at the 5% level of significance?
1. H0: µ = 5500, HA: µ ≠ 5500
2. Two-tailed t test, n < 30, unknown population standard deviation
3. The critical t value at the 5% level for a two-tailed test with (n - 1) = 5 d.f. is 2.571
4. Test statistic:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{5690 - 5500}{250 / \sqrt{6}} = 1.862$$
[Figure: t distribution with critical values -2.571 and 2.571; the test statistic 1.862 lies between them]
5. Fail to reject H0; there is no evidence at the 0.05 level that the average fuel consumption is different from µ = 5500 lbs
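The rocket fuel calculation can be verified in a few lines. In this sketch the critical value 2.571 is taken from the example (a t-table lookup for 5 degrees of freedom) rather than computed, because the Python standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt

# Data from the example.
n, x_bar, s, mu0 = 6, 5690.0, 250.0, 5500.0

# Critical value from the t-table: two-tailed, 5% level, n - 1 = 5 d.f.
t_crit = 2.571

# Test statistic for a small sample with unknown population standard deviation.
t = (x_bar - mu0) / (s / sqrt(n))

reject_h0 = abs(t) > t_crit

print(f"t = {t:.3f}, critical values = ±{t_crit}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```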
P-Value Method
When working from tables:
❖ Tables cover only selected values of α
❖ Specific P-values usually cannot be found
❖ Use the table to identify limits that contain the P-value
❖ Some calculators and computer programs will find exact P-values
P-Value Method
❖ Very similar to the traditional method
❖ The key difference is the way in which we decide to reject the null hypothesis
❖ The approach finds the probability (P-value) of getting a result at least as extreme as the one observed, and rejects the null hypothesis if that probability is very low
P-Value Method
Definition
❖ P-value (or probability value)
– the probability of getting a test statistic at least as far from the hypothesized value as the one observed, if the null hypothesis is true
The attained significance level of a hypothesis test is the P-value of its test statistic.
P-Value Method
Guidelines for rejecting H0 based on the P-value:
If P < α, then reject H0 and accept HA
If P > α, then reserve judgement about H0
P-value Interpretation
Small P-values (such as 0.05 or lower): unusual sample results. Significant difference from the null hypothesis.
Large P-values (such as above 0.05): sample results are not unusual. Not a significant difference from the null hypothesis.
Finding P-Values
Start: what type of test?
– Left-tailed: P-value = area to the left of the test statistic
– Right-tailed: P-value = area to the right of the test statistic
– Two-tailed: is the test statistic to the right or left of centre?
    Left: P-value = twice the area to the left of the test statistic
    Right: P-value = twice the area to the right of the test statistic
[Figure: four distribution curves centred at µ, each showing the test statistic and the shaded area that gives the P-value (or half of it, for the two-tailed cases)]
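The flowchart for finding P-values translates directly into code when the test statistic is a z score. A minimal sketch follows; the function name and the "left" / "right" / "two" labels are illustrative choices, and the areas come from the standard normal distribution in Python's standard library rather than from a table.

```python
from statistics import NormalDist

def p_value(z: float, tail: str) -> float:
    """P-value of a z test statistic for a left-, right- or two-tailed test."""
    cdf = NormalDist().cdf  # area to the left of a z score
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        # Twice the area beyond the test statistic, on whichever side it falls.
        return 2 * (cdf(z) if z < 0 else 1 - cdf(z))
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(round(p_value(-1.96, "left"), 3))   # area to the left of z = -1.96
print(round(p_value(1.96, "two"), 3))     # twice the area to the right of z = 1.96
```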
Example: Hypothesis Testing Using P-values
The National Institute of Diabetes and Digestive and Kidney Diseases reports that the average cost of bariatric (weight loss) surgery is $22,500. A researcher thinks this information is incorrect, randomly selects 30 bariatric surgery patients, and finds that the average cost for their surgeries is $21,545 with a standard deviation of $3015. Is there enough evidence to support the researcher’s claim at α = 0.05? Use a P-value.
(Adapted from National Institute of Diabetes and Digestive and Kidney Diseases) (courtesy: Pearson Education, 2012)
Example: Hypothesis Testing Using P-values
• H0: µ = $22,500
• Ha: µ ≠ $22,500 (claim)
• α = 0.05
• Test statistic:
$$z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{21{,}545 - 22{,}500}{3015 / \sqrt{30}} \approx -1.73$$
• P-value: the area in one tail is 0.5000 - 0.4582 = 0.0418 (for z = -1.73). The test is two-tailed, so P = 2 × 0.0418 = 0.0836.
• Decision: P = 0.0836 > 0.05, so fail to reject H0.
At the 5% level of significance, there is not sufficient evidence to support the claim that the mean cost of bariatric surgery is different from $22,500.
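The bariatric surgery example can be reproduced without a z-table. This sketch uses the figures from the example; the variable names are illustrative. Because it uses the unrounded z score, the P-value it prints is slightly below the table-based 0.0836, but the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example.
n, x_bar, s = 30, 21_545.0, 3015.0
mu0, alpha = 22_500.0, 0.05

# Test statistic.
z = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed P-value: twice the area beyond the test statistic.
p = 2 * NormalDist().cdf(-abs(z))

reject_h0 = p < alpha

print(f"z = {z:.2f}, P-value = {p:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```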
Connection to Confidence Intervals
• Does an average box of cereal contain 368 grams of cereal?
• A random sample of 25 boxes showed x̄ = 372.5. The company has specified σ to be 15 grams.
• Test at the α = 0.05 level.
H0: µ = 368
H1: µ ≠ 368
For x̄ = 372.5 g, a standard deviation of 15 g and n = 25, the 95% confidence interval is:
372.5 - (1.96)(15/5) to 372.5 + (1.96)(15/5)
or
366.62 ≤ µ ≤ 378.38
If this interval contains the hypothesized mean (368), we do not reject the null hypothesis.
It does.
Verdict: do not reject.
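The confidence interval version of the cereal test takes only a few lines. The sketch below uses the numbers from the example, including the tabulated value 1.96 for a 95% interval; the variable names are illustrative.

```python
from math import sqrt

# Data from the example.
n, x_bar, sigma = 25, 372.5, 15.0
mu0 = 368.0
z_crit = 1.96  # two-tailed critical value for 95% confidence, as in the example

# 95% confidence interval for the population mean.
margin = z_crit * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# H0 is not rejected when the interval contains the hypothesized mean.
reject_h0 = not (lower <= mu0 <= upper)

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The same decision follows from the traditional method here: z = (372.5 - 368) / 3 = 1.5, which lies inside ±1.96.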
Z Test for Proportion
• Problem:
The marketing department of a software company claims that it receives a 4% response rate from its mailing.
• Approach:
To test this claim, a random sample of 500 was surveyed, with 25 responses.
• Solution:
Test at the α = 0.05 significance level.
Z Test for Proportion: Solution
H0: p = 0.04
H1: p ≠ 0.04
• α = 0.05
• n = 500
Test statistic, with sample proportion ps = 25/500 = 0.05:
$$Z \cong \frac{p_s - p}{\sqrt{p(1 - p)/n}} = \frac{0.05 - 0.04}{\sqrt{0.04(1 - 0.04)/500}} = 1.14$$
Critical values: ±1.96 (a rejection region of area 0.025 in each tail)
Decision: do not reject H0 at α = 0.05
Conclusion: we do not have sufficient evidence to reject the company’s claim of a 4% response rate.
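The proportion test can be checked the same way. This sketch uses the figures from the problem; the variable names are illustrative, and the critical value is computed from the standard normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Data from the problem.
p0 = 0.04               # claimed response rate (H0)
n, responses = 500, 25
alpha = 0.05

p_s = responses / n     # sample proportion, 0.05

# Test statistic: the standard error uses the hypothesized proportion p0.
z = (p_s - p0) / sqrt(p0 * (1 - p0) / n)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```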
Summary
We have considered
• Formation of hypotheses
• Testing Hypothesis using various methods.