
Essentials of Data Science With R

Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 65
Basics of Tests of Hypothesis and Decision Rules
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing of Hypothesis:
Notice that different samples give different values of estimates.

The value of an estimate may or may not be the same as the unknown value of the parameter in the population.

The difference in the estimated value may be due to random variation, or there may be some assignable cause/reason.

The value which we want to know is unknown. So we want to check whether the estimated value differs from some other reasonable value or not.
Testing of Hypothesis:
In simple words, we would like to test the closeness of the estimated value to some hypothetical value.

Find the difference between the estimated value and the hypothetical value.

• If the difference is small, we can expect to accept it and would say that there is no significant difference between the two values.

• If the difference is large, we can expect to reject it and would say that there is a significant difference between the two values.

How do we decide scientifically whether the difference between the values is significant or not?
Testing of Hypothesis: Examples
Suppose we want to compare the yield of a crop with a new variety of seeds with that of an earlier used variety of seeds. The new variety may replace the earlier one if found better.

Suppose we want to determine whether a new method of sealing LED light bulbs will increase the life of the LED bulbs, or whether one method of preserving foods is better than another so as to retain more vitamins, and so on.

All such statements can be converted into a statement or a hypothesis.
Statistical Hypothesis:
A researcher may have a research question for which the truth about the population of interest is unknown.

Suppose data can be obtained using a survey, observation, or an experiment.

If, given a prespecified uncertainty level, a statistical test based on the data supports the hypothesis about the population, we say that this hypothesis is statistically proven.
Hypothesis:
A statistical hypothesis is usually a statement about a set of parameters of a population distribution.

It is called a hypothesis because it is not known whether or not it is true.

A primary problem is to develop a procedure for determining whether or not the values of a random sample from this population are consistent with the hypothesis.
Research Hypothesis:
A research hypothesis is a statement of what the researcher believes will be the outcome of an experiment or a study.

For example,

– Older workers are more loyal to a company.

– Companies with more than Rs. 1 billion of assets spend a higher percentage of their annual budget on advertising than companies with less than Rs. 1 billion of assets.

– The price of scrap metal is a good indicator of the industrial production index six months later.
Statistical Hypothesis:
A statistical hypothesis is a formal structure used to statistically (based on a sample) test the research hypothesis.

This is the statement which we want to test.

It is a claim (assumption) about a population parameter.

Examples:

– The average monthly cell phone bill of the people in a city is Rs. 1000.00.

– The proportion of adults in this city with cell phones is more than 0.80.
Statistical Hypothesis: Simple and Composite Hypothesis
A statistical hypothesis is an assertion or conjecture about the distribution of one or more random variables.

If the statistical hypothesis completely specifies the distribution, then it is called simple; otherwise, it is called a composite hypothesis.
Statistical Hypothesis: Simple and Composite Hypothesis
The statement "μ is equal to 17" is a simple hypothesis since it completely specifies the distribution.

Everything is known: μ = 17.

The statement "μ is less than 17" is a composite hypothesis as it does not completely specify the distribution.

It is known only that μ < 17, but what value μ actually takes is unknown.
Test of a Statistical Hypothesis:
A test of a statistical hypothesis is a rule or procedure for deciding whether to reject the hypothesis.

For example, consider a particular normally distributed population having an unknown mean value μ and known variance 1.

The statement "μ is less than 17" is a statistical hypothesis that we could try to test by observing a random sample from this population.

If the random sample is deemed to be consistent with the hypothesis under consideration, we say that the hypothesis has been "accepted"; otherwise we say that it has been "rejected".
Randomized and Nonrandomized Test:
A test can be either randomized or nonrandomized.

For example, consider a test: toss a coin, and reject the hypothesis if and only if a head appears. This is an example of a randomized test.

We will consider only nonrandomized tests.
Testing of Hypothesis: One Sample and Two Sample Test
When we are working with only one sample to conclude about a value, the problem is termed a one sample problem.

When we are working with two samples to make a comparison and conclusion about a value, the problem is termed a two sample problem.

The two samples can be independent or dependent, based on which we have a "two independent samples problem" or a "two dependent samples problem" ("paired data problem").
Testing of Hypothesis: One Sample and Two Sample Test
We judge whether the marks obtained by the students are, say, 80% or not. We draw a sample of students and collect their marks. This is a one sample test.

We want to judge whether the marks of boys and girls in mathematics are the same or not. So we draw two random samples of male and female students and obtain their marks in mathematics for the comparison. This is a two sample test where the samples are independent – a "two independent samples problem".
Testing of Hypothesis: One Sample and Two Sample Test
Consider an experiment in which a group of students receives extra mathematical tuition.

Their ability to solve mathematical problems is evaluated before and after the extra tuition.

We are interested in knowing whether the ability to solve mathematical problems increases after the tuition or not.

Since the same group of students is used in a pre–post experiment, this is called a "two dependent samples problem" or a "paired data problem".
Construction of Decision Rule:
Let X1, X2, ..., Xn be a random sample of n observations from a probability function f(x; θ), θ ∈ Θ, where Θ is the parameter space.

Partition Θ into two disjoint sets Θ0 and Θ1.

Now θ comes from either Θ0 or Θ1.

Translate the events as θ ∈ Θ0 or θ ∈ Θ1.
Construction of Decision Rule:
Hypothesis: θ ∈ Θ0, or

Hypothesis: θ ∈ Θ1.

We want to know which one is correct based on the observed sample of data x1, x2, ..., xn.

We want to devise a rule that can tell us the decision to accept or reject the hypothesis once the experimental values x1, x2, ..., xn are observed.

Such a rule is called a test of hypothesis.
Construction of Decision Rule:
Different rules for tests of hypothesis can be constructed.

We construct the test around the following notion:

Partition the sample space of X1, X2, ..., Xn into two disjoint subsets, say C and C*.

• If x1, x2, ..., xn are such that the point (x1, x2, ..., xn) ∈ C, we shall reject H0.

• If x1, x2, ..., xn are such that the point (x1, x2, ..., xn) ∈ C*, we shall accept H0.
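As a concrete illustration (an assumed Z-test setting, not from the slides): for testing H0: μ = μ0 with σ known, the critical region C is often taken to be the set of samples whose standardized mean is large in absolute value. A minimal sketch in R, with illustrative values of μ0, σ and α:

```r
# Critical region C = { (x1,...,xn) : |(xbar - mu0) / (sigma/sqrt(n))| > z_{alpha/2} }
# (mu0, sigma and alpha below are illustrative assumptions)
in_critical_region <- function(x, mu0, sigma, alpha = 0.05) {
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  abs(z) > qnorm(1 - alpha / 2)
}

in_critical_region(rep(0, 25), mu0 = 0, sigma = 1)  # xbar equals mu0: accept H0
in_critical_region(rep(2, 25), mu0 = 0, sigma = 1)  # xbar far from mu0: reject H0
```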
Construction of Decision Rule:
There is no common region between C and C*, to avoid any ambiguity in the decision.

• C is called the critical region or region of rejection.

• C* is called the acceptance region or region of acceptance.

The next question is how to partition the sample space of X1, X2, ..., Xn.

Consider the possible errors in decision making.
Decisions in Tests of Hypothesis:
Tests are used to check the correctness of a statement in a statistical sense.

                          Accept statement         Reject statement
Statement is correct      Correct decision         Incorrect decision:
                                                   TYPE ONE ERROR
Statement is incorrect    Incorrect decision:      Correct decision
                          TYPE TWO ERROR
Procedure for Test:
A good procedure is one which minimizes both errors: Type I and Type II.
Testing of Hypothesis: Example
Suppose a new type of blood test is introduced.

A patient has some disease and wants to get it tested with the blood test.

We have the following four possibilities.

                     Test detects          Test does not detect
Disease present      Correct decision      Incorrect decision
Disease absent       Incorrect decision    Correct decision
Testing of Hypothesis: Example
Error 1: The person has the disease and the test does not detect it.

Error 2: The person does not have the disease and the test detects it.

Which one is more serious?

Error 1 has more serious consequences than Error 2.

Whichever error is more serious, designate it as the Type 1 error and the other as the Type 2 error.
Testing of Hypothesis: Example
Construct the hypothesis which you want to test such that the Type 1 error is more serious than the Type 2 error.

Type 1 error: Reject the hypothesis when it is correct.

Here the Type 1 error should be Error 1: the person has the disease and the test does not detect it.

Hypothesis: The person has the disease.

This is called the null hypothesis.

Rejecting the hypothesis means concluding that "the person does not have the disease", so rejecting it wrongly is the more serious error.


Testing of Hypothesis: Example
How do we compare with the hypothetical value?

We need another value for making a comparison.

So we create an alternative hypothetical value.

The value which we want to test is the null hypothesis, and the value against which we want to test it is the alternative hypothesis.
Statistical Hypothesis:
Hypothesis: a statement about the parameter.

A statistical hypothesis has two parts:

– Null Hypothesis

– Alternative Hypothesis
Null Hypothesis:
How do we construct the null hypothesis?

The null hypothesis is constructed such that the Type 1 error is more serious than the Type 2 error.

The tests are derived using the Neyman-Pearson lemma, which is based on this assumption.

• Denoted H0; it is the statement to be tested.

• "Nothing new is happening": usually a hypothesis of no difference.

• Example: The average number of TV sets in Indian homes is equal to three, H0: μ = 3.

• Begin with the assumption that H0 is true and test it for rejection or acceptance.
Alternative Hypothesis:
• Denoted H1 or Ha.

• "Something new is happening."

• It is the opposite of the null hypothesis.

• It is the statement against which the null hypothesis is tested.

Example: The average number of TV sets in Indian homes is not equal to 3 (H1: μ ≠ 3).

• It is generally the hypothesis that the researcher is trying to prove.

• The null and alternative hypotheses are mutually exclusive.

• Only one of them can be true.
Errors in Decision Making:

Possible Hypothesis Test Outcomes

                    Actual Situation
Decision            H0 True              H0 False
Accept H0           No Error             Type II Error
                    Probability 1 - α    Probability β
Reject H0           Type I Error         No Error
                    Probability α        Probability 1 - β
Two Types of Errors:
Type I Error: Reject H0 when it is true.

Type II Error: Accept H0 when it is wrong.

Size of Type I Error = P(Type I Error)
= P(Reject H0 when it is true)
= P(Reject H0 | H0)
= P((x1, x2, ..., xn) ∈ C | H0)
= α, called the level of significance.

α is set by the researcher in advance.
Two Types of Errors:
Size of Type II Error = P(Type II Error)
= P(Accept H0 when it is wrong)
= P(Accept H0 | H1)
= P((x1, x2, ..., xn) ∈ C* | H1)
= β.

Power of the test = 1 - β
= 1 - P((x1, x2, ..., xn) ∈ C* | H1)
= P((x1, x2, ..., xn) ∈ C | H1)
= probability of rejecting H0 when it is wrong.

This is a good criterion: the probability of rejecting a false hypothesis.
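The two error probabilities can be made concrete by simulation. The sketch below (sample size, means and α are assumptions chosen for illustration) estimates α and β for a right-tailed Z-test of H0: μ = 0 against H1: μ = 1, with σ = 1 and n = 25:

```r
set.seed(1)
n <- 25; sigma <- 1
crit <- qnorm(0.95)   # right-tailed cutoff for alpha = 0.05

# Simulated test statistics under a given true mean mu
zstat <- function(mu) replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - 0) / (sigma / sqrt(n))
})

alpha_hat <- mean(zstat(0) > crit)    # estimated size, close to 0.05
beta_hat  <- mean(zstat(1) <= crit)   # estimated Type II error, close to 0
power_hat <- 1 - beta_hat
c(alpha = alpha_hat, beta = beta_hat, power = power_hat)
```

With these assumed values the alternative is far from the null, so the estimated power is very close to 1.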
Two Types of Errors:
P(Reject H0) = P((x1, x2, ..., xn) ∈ C)
= K(θ), θ ∈ Θ

K(θ) = α(θ) if θ ∈ Θ0
K(θ) = 1 - β(θ) if θ ∈ Θ1

Power function: K(θ)
= 1 - P((x1, x2, ..., xn) ∈ C* | H1)
= P((x1, x2, ..., xn) ∈ C | H1)
= probability of rejecting H0 when it is wrong.

This is a good criterion: the probability of rejecting a false hypothesis.
Procedure for Test:
A good procedure is one which minimizes both errors: Type I and Type II.

Simultaneous minimization of both errors is not possible.

No procedure exists where α = 0 and β = 0.

If one error decreases, then the other error increases.
Procedure for Test:
Extreme cases:

If C = sample space of X1, X2, ..., Xn (so that C* = φ, the null set), then α = 1 and β = 0, i.e., we always reject H0.

If C = φ (so that C* = the sample space), then α = 0 and β = 1, i.e., we always accept H0.

In both cases, one error is minimized and the other error is maximized.

No procedure exists that can minimize both α and β simultaneously.

So fix one error and minimize the other.

As a convention, as per the Neyman-Pearson lemma, fix α and minimize β.

The null hypothesis is constructed as the hypothesis which we would like to reject, i.e., the hypothesis tested for possible rejection.
Neyman Pearson Lemma:
Let X1, X2, ..., Xn be a random sample of n observations from a probability function f(x; θ), θ ∈ Θ, where Θ is the parameter space.

Suppose Θ = {θ0, θ1}.

Then the best critical region of size α is obtained by the following rule:

Reject H0 if and only if

    L(θ0; x1, x2, ..., xn) / L(θ1; x1, x2, ..., xn) ≤ k,

i.e., C = { (x1, x2, ..., xn) : L(θ0; x1, ..., xn) / L(θ1; x1, ..., xn) ≤ k }, where k > 0 is a constant to be determined such that P_θ0((x1, x2, ..., xn) ∈ C) = α.
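As a hedged illustration (not from the slides): for a sample from N(θ, 1) with θ0 = 0 and θ1 = 1, the likelihood ratio L(θ0)/L(θ1) = exp(n/2 - n·x̄) is decreasing in x̄, so the Neyman-Pearson rule "reject when the ratio is ≤ k" reduces to rejecting when x̄ exceeds a cutoff chosen to give size α:

```r
# Neyman-Pearson test of H0: theta = 0 vs H1: theta = 1 for N(theta, 1) data
# (the theta values and alpha are illustrative assumptions)
np_test <- function(x, alpha = 0.05) {
  n <- length(x)
  # L(0)/L(1) <= k  is equivalent to  mean(x) >= cutoff;
  # under H0, mean(x) ~ N(0, 1/n), so the size-alpha cutoff is:
  cutoff <- qnorm(1 - alpha, mean = 0, sd = 1 / sqrt(n))
  list(xbar = mean(x), cutoff = cutoff, reject = mean(x) >= cutoff)
}

set.seed(42)
np_test(rnorm(25, mean = 1))   # data drawn under H1; typically rejected
```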
Likelihood Ratio Test:
The likelihood ratio test is another approach to finding a good test.
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 66
Test Procedures for One Sample Test for Mean with Known Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Critical (Rejection) and Acceptance Region:

The critical value divides the whole area under the probability curve into two regions:

• Critical (rejection) region

– When the statistical outcome falls into this region, H0 is rejected.

– The size of this region is α.

• Acceptance region

– When the statistical outcome falls into this region, H0 is accepted.

– The size of this region is (1 - α).
One‐ and Two‐Sided Tests:
We distinguish between one-sided and two-sided hypotheses and tests.

For an unknown population parameter θ (e.g. μ) and a fixed value θ0:

Case   Null hypothesis   Alternative hypothesis
(a)    θ = θ0            θ ≠ θ0       Two-sided test
(b)    θ ≥ θ0            θ < θ0       One-sided test
(c)    θ ≤ θ0            θ > θ0       One-sided test
One‐ and Two‐Sided Tests:

Two tail test: H0: θ = θ0 vs H1: θ ≠ θ0

[Figure: distribution of the test statistic, with the acceptance region in the centre, a region of rejection of area α/2 in each tail, and the two critical values separating them.]
Critical Values:
One tail tests (distribution of the test statistic):

Upper-tail test or right tail test: H0: θ ≤ θ0 vs H1: θ > θ0
[Figure: region of rejection of area α in the right tail.]

Lower-tail test or left tail test: H0: θ ≥ θ0 vs H1: θ < θ0
[Figure: region of rejection of area α in the left tail.]
Steps for Testing of hypothesis:
In order to test the null hypothesis, a "good" statistical test is developed by fixing the Type I error and minimizing the Type II error.

1. Define distributional assumptions for the random variables of interest, and specify them in terms of population parameters.

2. Formulate the null hypothesis and the alternative hypothesis.

3. Fix a significance value (Type I error) α.

4. The test involves developing a test statistic.
Steps for testing of hypothesis:
5. Construct a test statistic T(X) = T(X1, X2, ..., Xn) on the basis of the observed data X1, X2, ..., Xn.

6. Calculate the value of the test statistic on the basis of the given data.

7. Construct a critical region K for the statistic T, i.e. a region such that if T falls in this region, H0 is rejected.

8. Calculate t(x) = T(x1, x2, ..., xn) based on the realized sample values X1 = x1, X2 = x2, ..., Xn = xn.

9. The test statistic follows a probability distribution.
Steps for testing of hypothesis:
10. Find the corresponding critical value from the corresponding probability distribution.

11. Compare the two values and accept or reject the null hypothesis.

12. Decision rule:

• If t(x) falls into the critical region K, then H0 is rejected. The alternative hypothesis is then accepted.

• If t(x) falls outside the critical region, H0 is not rejected.
p ‐ Value:
Statistical software usually does not follow all the steps of hypothesis testing.

Instead of calculating and reporting the critical values, the test statistic is printed together with a p-value.

It is possible to use the p-value instead of critical regions for making test decisions.

p-value: the probability, computed under H0, of observing a test statistic at least as extreme as the one actually observed.
p ‐ Value:
The p-value of the test statistic T(X) is defined as follows:

For the two-sided case: p-value = P_H0(|T| > |t(x)|)

For the one-sided cases: p-value = P_H0(T ≥ t(x)) or p-value = P_H0(T ≤ t(x))

Note: P_H0 means the probability computed when H0 is true.

If the p-value < significance level (α), then reject the null hypothesis.

Significance level: the probability of Type I error (α).
Test Decisions Using Confidence Intervals:
There is an interesting and useful relationship between confidence intervals and hypothesis tests.

If H0 is rejected at significance level α, then there exists a 100(1 - α)% confidence interval which yields the same conclusion as the test.

Suppose H0: θ = θ0. If the confidence interval does not contain the value θ0, then H0 is rejected.
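A small sketch of this duality in R (the data here are simulated, an assumption for illustration): the two-sided t.test() decision at α = 0.05 agrees with checking whether the 95% confidence interval covers θ0:

```r
set.seed(7)
x <- rnorm(30, mean = 5, sd = 2)   # illustrative sample

res <- t.test(x, mu = 4, conf.level = 0.95)   # H0: mu = 4 vs H1: mu != 4

reject_by_p  <- res$p.value < 0.05
reject_by_ci <- res$conf.int[1] > 4 || res$conf.int[2] < 4
c(reject_by_p, reject_by_ci)   # the two rules always agree
```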
Hypothesis Testing for μ (σ known): Example
• A survey, done 10 years ago, of CAs (Chartered Accountants) found that their average annual salary was Rs. 74,914.
• An accounting researcher would like to test whether, over the years,
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
Hypothesis Testing for μ (σ known): Example
• Assumptions:
– σ is known.
– The population is normal, or
– the sample size is large (n ≥ 30).

• Test statistic: compute the value of the test statistic using the following formula:

    Zc = (x̄ - μ) / (σ/√n)

• Level of significance: fix the value of α, say 0.05 or 0.10.

• Critical values:
– The distribution of the test statistic is N(0,1) when H0 is true.
– Critical values are obtained using N(0,1).
Hypothesis Testing for μ (σ known): Example
• For a two tail test (H0: μ = μ0, H1: μ ≠ μ0), the test statistic is compared with the N(0,1) distribution:

• P(-zα/2 < Z < zα/2) = 1 - α
• P(-1.96 < Z < 1.96) = 0.95
• P(Z > zα/2) = α/2
• P(Z > 1.96) = 0.025
• P(Z < -zα/2) = α/2
• P(Z < -1.96) = 0.025
Hypothesis Testing for μ (σ known): Example
• For a right tail test (H0: μ = μ0, H1: μ > μ0), the critical value is zα:
• P(Z < zα) = 1 - α
• P(Z < 1.645) = 0.95
• P(Z > zα) = α
• P(Z > 1.645) = 0.05

• For a left tail test (H0: μ = μ0, H1: μ < μ0), the critical value is -zα:
• P(Z > -zα) = 1 - α
• P(Z > -1.645) = 0.95
• P(Z < -zα) = α
• P(Z < -1.645) = 0.05
Hypothesis Testing for μ (σ known): Example
Decision Making

We reject H0 in favor of H1 at the α × 100% level

• if |Zc| > zα/2 (for a two tailed test)
• if Zc > zα (for a right tailed test)
• if Zc < -zα (for a left tailed test)

Accepting H0 means that

• the difference between the sample mean and the hypothetical population mean is not significant;
• this difference is due to sampling fluctuation only.
Hypothesis Testing for μ (σ known): Example
• State H0 and H1.
• Compute the value of the test statistic Zc.
• Obtain the critical value for fixed α and according to H1 (right / left / two tailed test).
• Compare the computed value of Zc with the critical value.
• Make the decision accordingly.

• Some useful critical values of the N(0,1) distribution:

Test          Level of Significance
              1%       5%       10%
Two Tailed    2.58     1.96     1.645
Right Tailed  2.33     1.645    1.28
Left Tailed   -2.33    -1.645   -1.28
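The tabulated critical values can be reproduced with base R's qnorm() (also used later in these slides); the table's 2.58, 2.33 and 1.28 are rounded versions of the values below:

```r
# Two-tailed critical values z_{alpha/2} for alpha = 1%, 5%, 10%
round(qnorm(c(0.01, 0.05, 0.10) / 2, lower.tail = FALSE), 3)
# 2.576 1.960 1.645

# One-tailed critical values z_alpha for alpha = 1%, 5%, 10%
round(qnorm(c(0.01, 0.05, 0.10), lower.tail = FALSE), 3)
# 2.326 1.645 1.282
```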
Hypothesis Testing for μ (σ known): Example
• A survey, done 10 years ago, of CAs (Chartered Accountants) found that their average annual salary was Rs. 74,914.
• An accounting researcher would like to test whether, over the years,
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs increased?
• Right or upper tailed test (H0: μ = 74914, H1: μ > 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a right tailed test is z(0.05) = 1.645.

• Since the computed value > the critical value at the 5% level of significance,

• we reject H0 at the 5% level of significance in favor of H1 and conclude that the average salary of CAs has increased.
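The slide's arithmetic can be verified directly in base R:

```r
xbar <- 78695; mu0 <- 74914; sigma <- 14530; n <- 112

Zc   <- (xbar - mu0) / (sigma / sqrt(n))   # about 2.754
crit <- qnorm(0.95)                         # 1.645 for a 5% right-tailed test

c(Zc = Zc, critical = crit, reject = Zc > crit)
```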
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs decreased?
• Left or lower tailed test (H0: μ = 74914, H1: μ < 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a left tailed test is z(0.05) = -1.645.

• Since the computed value > the critical value at the 5% level of significance, we accept H0 at the 5% level of significance against H1 and conclude that the average salary of CAs has not decreased.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs changed?
• Two tailed test (H0: μ = 74914, H1: μ ≠ 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a two tailed test is z(0.025) = 1.96.

• Since |computed value| > critical value at the 5% level of significance, we reject H0 at the 5% level of significance in favor of H1 and conclude that the average salary of CAs has changed.
p – Value Approach:
• Let Zc be the computed value of the test statistic.
• Let Z ~ N(0,1).
• Then the p-value is given by the following probability:
– For two tailed tests: 2P(Z > |Zc|)
– For right tailed tests: P(Z > Zc)
– For left tailed tests: P(Z < Zc)
• Decision: H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
• The p-value is the smallest level of significance at which H0 would be rejected.
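For the CA salary example (Zc ≈ 2.7539), the three p-values follow from pnorm():

```r
Zc <- 2.7539   # computed test statistic from the example above

p_two   <- 2 * pnorm(abs(Zc), lower.tail = FALSE)   # two-tailed, about 0.006
p_right <- pnorm(Zc, lower.tail = FALSE)            # right-tailed, about 0.003
p_left  <- pnorm(Zc)                                # left-tailed, about 0.997

round(c(two = p_two, right = p_right, left = p_left), 4)
```

As expected, the right-tailed and two-tailed p-values are below α = 0.05 (reject), while the left-tailed p-value is not.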
Classical Gauss Test
Gauss.test() is available in the library "compositions" (Compositional Data Analysis), so the package needs to be installed first:

install.packages("compositions")
library(compositions)

Description

One and two sample Gauss test for equal mean of normal random variates with known variance.

Usage

Gauss.test(x, y=NULL, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))
Classical Gauss Test
Arguments
x : a numeric vector providing the first dataset
y : optional second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test
Hypothesis Testing for μ (σ known): Example in R

Suppose a random sample of size n = 20 of the day temperature in a particular city is drawn. Let us assume that the temperature in the population follows a normal distribution N(μ, σ²) with σ² = 36. The sample provides the following values of temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

Note that μ̂ = x̄ = 41.97 (to two decimal places).
Hypothesis Testing for μ (σ known): Example in R

install.packages("compositions")
library(compositions)

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2)
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ ≠ 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "two.sided")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1e-06
alternative hypothesis: two.sided
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ < 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "less")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1
alternative hypothesis: less
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ > 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "greater")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 5e-07
alternative hypothesis: greater
Critical Values in t and F distribution:
Quantile q with P[t > q] = 0.25 based on the t distribution:
> qt(p=0.25, df=7, lower.tail=F)
[1] 0.7111418

Two-tailed probability P[t < -0.7111 or t > 0.7111] based on the t distribution:
> pt(q=0.7111418, df=7, lower.tail=F)*2
[1] 0.5

Quantile q with P[F > q] = 0.025 based on the F distribution:
> qf(p=0.025, df1=5, df2=10, lower.tail=F)
[1] 4.236086
Critical Values in Normal Distribution:
Quantile q with P[z > q] = 0.025 based on the normal distribution:
> qnorm(0.025, lower.tail=F)
[1] 1.959964

Two-tailed probability P[z < -1.96 or z > 1.96] based on the normal distribution:
> pnorm(1.96, lower.tail=F)*2
[1] 0.04999579

Two-tailed probability P[z < -3 or z > 3] based on the normal distribution:
> (1 - pnorm(3))*2
[1] 0.002699796
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 67
One Sample Test for Mean with Unknown Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Let X1, X2, ..., Xn be a random sample from N(μ, σ²) where σ² is unknown.

• Assumptions:
– σ is unknown.
– The population is normal, or
– the sample size is small (n ≤ 30).

• Test statistic: compute the value of the test statistic using the following formula:

    Tc = (x̄ - μ) / (s/√n),

where s² = (1/(n-1)) Σ_{i=1}^{n} (xi - x̄)².
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Level of significance: fix the value of α, say 0.05 or 0.10.

• The test statistic Tc has a t(n-1) distribution when H0 is true.

• Critical values of the t(n-1) distribution can be obtained from the t table for the given d.f. and significance level.

• For large samples, the t distribution can be approximated by N(0,1).
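The last point can be checked numerically with base R's qt() and qnorm(): as the degrees of freedom grow, the two-tailed 5% t critical value approaches the N(0,1) value 1.96.

```r
round(qt(0.975, df = c(5, 30, 100, 1000)), 4)   # t critical values shrink toward...
round(qnorm(0.975), 4)                          # ...the normal value, 1.96
```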
Hypothesis Testing for μ (σ unknown): t ‐ Test
• For a two tail test (H0: μ = μ0, H1: μ ≠ μ0), T ~ t(n-1):

[Figure: acceptance region of probability (1 - α) in the centre, rejection regions of probability α/2 in each tail, with critical values -tα/2 and tα/2.]

• P(-tα/2 < T < tα/2) = 1 - α
• P(T > tα/2) = α/2
• P(T < -tα/2) = α/2
Hypothesis Testing for μ (σ unknown): t ‐ Test
We reject H0 in favor of H1 at the α × 100% level

• if |Tc| > tα/2 (for a two tailed test)
• if Tc > tα (for a right tailed test)
• if Tc < -tα (for a left tailed test)
Hypothesis Testing for μ (σ unknown): t - Test

• For a right tail test (H0: μ = μ0, H1: μ > μ0), T ~ t(n-1):
• P(T < tα) = 1 - α
• P(T > tα) = α
[Figure: rejection region of probability α in the right tail, critical value tα.]

• For a left tail test (H0: μ = μ0, H1: μ < μ0), T ~ t(n-1):
• P(T > -tα) = 1 - α
• P(T < -tα) = α
[Figure: rejection region of probability α in the left tail, critical value -tα.]
Hypothesis Testing for μ (σ unknown): p - Value Approach

Let tc be the computed value of the test statistic, and let T ~ t(n-1).

Then the p-value is given by the following probability:

• For two tailed tests: 2P(T > |tc|)
• For right tailed tests: P(T > tc)
• For left tailed tests: P(T < tc)

Decision:

H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
t.test()
Performs one and two sample t-tests on vectors of data.
Usage
t.test(x, ...)
t.test(x, y = NULL, alternative =
c("two.sided", "less", "greater"), mu = 0,
paired = FALSE, var.equal = FALSE, conf.level
= 0.95, ...)
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
x a (non-empty) numeric vector of data values.
y an optional (non-empty) numeric vector of data values.
alternative a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
mu a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
paired a logical indicating whether you want a paired t-test.
var.equal a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level confidence level of the interval.
Hypothesis Testing for μ (σ unknown): Example in R

Suppose a random sample of size n = 20 of the day temperature in a particular city is drawn. Let us assume that the temperature in the population follows a normal distribution N(μ, σ²) where σ² is unknown. The sample provides the following values of temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2)
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ ≠ 40
t.test(temp, y = NULL, alternative =
"two.sided", mu = 40, paired = FALSE,
var.equal = FALSE, conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.164
alternative hypothesis: true mean is not equal to 40
95 percent confidence interval:
39.12384 44.80616
sample estimates:
mean of x
41.965
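The reported t and p-value can be reproduced by hand from Tc = (x̄ - μ)/(s/√n) and the t(19) distribution:

```r
temp <- c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3,
          34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2)

n  <- length(temp)
Tc <- (mean(temp) - 40) / (sd(temp) / sqrt(n))         # about 1.4476
p  <- 2 * pt(abs(Tc), df = n - 1, lower.tail = FALSE)  # about 0.164

c(Tc = Tc, p.value = p)
```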
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ < 40
t.test(temp, y = NULL, alternative = "less",
mu = 40, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.918
alternative hypothesis: true mean is less than 40
95 percent confidence interval:
-Inf 44.3122
sample estimates:
mean of x
41.965
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ > 40
t.test(temp, y = NULL, alternative =
"greater", mu = 40, paired = FALSE, var.equal
= FALSE, conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.08202
alternative hypothesis: true mean is greater than 40
95 percent confidence interval:
39.6178 Inf
sample estimates:
mean of x
41.965
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 68
Two Sample Test for Mean with Known and Unknown Variances
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
We want to conduct statistical hypothesis testing for a difference of means.

• In a survey of buying habits, 400 women shoppers are chosen at random in a supermarket 'A'. Their average weekly food expenditure is Rs. 250 with a s.d. of Rs. 40.

• For 400 women shoppers chosen at random in some other supermarket 'B', the average weekly food expenditure is Rs. 220 with a s.d. of Rs. 55.
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Do these two populations have similar shopping habits?

• Are the average weekly food expenditures of the two populations of shoppers equal?
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Suppose we have two samples.

• The first sample is drawn from a population with mean μ1 and
variance σ1².

• x̄1 is the mean and n1 is the size of the 1st sample.

• The second sample is drawn from a population with mean μ2 and
variance σ2².

• x̄2 is the mean and n2 is the size of the 2nd sample.

• We wish to examine if the two population means μ1 and μ2 are
different.

4
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Since the population means μ1 and μ2 are unknown, we use the
statistic x̄1 − x̄2 to draw conclusions about μ1 − μ2.

• Assumptions:
– σ1 and σ2 both are known.
– Both populations are normal.
– Or, sample sizes n1 and n2 are large.

5
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• We know x̄1 ~ N(μ1, σ1²/n1) and x̄2 ~ N(μ2, σ2²/n2).

• Test Statistic:

E(x̄1 − x̄2) = μ1 − μ2 and Var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2

Thus Zc = (x̄1 − x̄2 − 0) / √(σ1²/n1 + σ2²/n2)
= (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) when H0 is true.

• Under our assumptions, Zc ~ N(0, 1).

• We use the N(0, 1) distribution to get critical values and p‑values.

6
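The Zc statistic above needs only summary statistics. A minimal sketch in R, plugging in the supermarket survey figures from the earlier slide (n1 = n2 = 400, means 250 and 220, with the s.d.s 40 and 55 treated as the known σ1 and σ2); the variable names are illustrative:

```r
# Two-sample Z test from summary statistics (supermarket survey example)
n1 <- 400; n2 <- 400
xbar1 <- 250; xbar2 <- 220
sigma1 <- 40; sigma2 <- 55   # treated as known population s.d.s

zc <- (xbar1 - xbar2) / sqrt(sigma1^2/n1 + sigma2^2/n2)
p_two_sided <- 2 * pnorm(-abs(zc))

zc            # about 8.82
p_two_sided   # essentially 0, so H0: mu1 = mu2 is rejected
```

With Zc far beyond any usual critical value (e.g., 1.96 at α = 5%), the two supermarkets' mean expenditures clearly differ.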
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Classical Gauss Test

Gauss.test is available in the library "compositions"
(Compositional Data Analysis). So it needs to be installed first.

install.packages("compositions")
library(compositions)

Usage
Gauss.test(x, y, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))

7
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Arguments

x : a numeric vector providing the first dataset
y : a numeric vector providing the second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test

8
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
Following are the gains in weight (in grams) of fishes fed on two
diets A and B.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

Suppose the standard deviation is known to be 2.

We want to test if the two diets differ significantly as regards their
effect on increase in weight.

H0: μ1 = μ2 H1: μ1 ≠ μ2

9
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb =
c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

Gauss.test(xa, xb, mean=0, sd=2, alternative =
c("two.sided"))

one sample Gauss-test
data: xa
T = -2, mean = 0, sd = 2, p-value = 0.009823
alternative hypothesis: two.sided
10
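The printed p-value can be reproduced by hand from the usual two-sample Z statistic with the assumed common known sd of 2; a sketch (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

# two-sample Z statistic with common known sd = 2
zc <- (mean(xa) - mean(xb)) / (2 * sqrt(1/length(xa) + 1/length(xb)))
p  <- 2 * pnorm(-abs(zc))

zc  # about -2.582
p   # about 0.0098, matching the Gauss.test p-value
```

Since p < 0.05, H0 is rejected: the two diets differ significantly.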
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
11
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• When σ1 and σ2 both are known, we use the N(0, 1) distribution.

• When σ1 and σ2 are unknown, we use the t distribution.

• Assumptions:
– Both populations are normal.
– The unknown σ1 and σ2 are equal.

• For large sample sizes, we can still use the N(0, 1) distribution.

12
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Let
 x1, x2, …, xm ~ N(μ1, σ1²) and
 y1, y2, …, yn ~ N(μ2, σ2²)

• The population means μ1 and μ2 are unknown; we use the statistic
x̄ − ȳ to draw conclusions about μ1 − μ2.

• σ1 and σ2 both are unknown.

13
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances)
• We know x̄ ~ N(μ1, σ1²/m) and ȳ ~ N(μ2, σ2²/n).

• Test Statistic:

E(x̄ − ȳ) = μ1 − μ2 and

S² = [1 / (m + n − 2)] [ Σi=1..m (xi − x̄)² + Σj=1..n (yj − ȳ)² ]

Thus Tc = (x̄ − ȳ − 0) / (S √(1/m + 1/n)) = (x̄ − ȳ) / (S √(1/m + 1/n))
when H0 is true.

• Under our assumptions, Tc ~ t(m + n − 2).

• We use the t(m + n − 2) distribution to get critical values and
p‑values.

14
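The pooled statistic can be checked by hand; a sketch in R on the diet data used in this lecture's examples (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
m <- length(xa); n <- length(xb)

# pooled variance S^2 on m + n - 2 degrees of freedom
S2 <- (sum((xa - mean(xa))^2) + sum((xb - mean(xb))^2)) / (m + n - 2)
Tc <- (mean(xa) - mean(xb)) / (sqrt(S2) * sqrt(1/m + 1/n))
p  <- 2 * pt(-abs(Tc), df = m + n - 2)

Tc  # about -0.610
p   # about 0.547
```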
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
Following are the gains in weight (in grams) of fishes fed on two
diets A and B.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

Suppose the standard deviation is unknown.
(Note that now the sd is unknown; earlier it was known.)

We want to test if the two diets differ significantly as regards their
effect on increase in weight.

H0: μ1 = μ2 H1: μ1 ≠ μ2

15
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

t.test(xa, xb, alternative = c("two.sided"), mu = 0,
paired = FALSE, var.equal = TRUE, conf.level = 0.95)

Two Sample t-test
data: xa and xb
t = -0.61028, df = 25, p-value = 0.5472
alternative hypothesis: true difference in means is not
equal to 0
95 percent confidence interval:
-8.749507 4.749507
sample estimates:
mean of x mean of y
28 30
16
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
17
Testing Hypothesis for Difference of Means (Dependent
Samples):
Assume a company sends its salespeople to a "customer service"
training workshop. They want to know whether the training has made
a difference in the number of complaints.

The following data are collected:

Salesperson Number of Complaints
Before After
C.B. 6 4
T.F. 20 6
M.H. 3 2
R.K. 0 0
M.O 4 0
18
Testing Hypothesis for Difference of Means (Dependent
Samples):
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Let
 x1, x2, …, xn ~ N(μ1, σ1²) and
 y1, y2, …, yn ~ N(μ2, σ2²)

• Both samples are dependent samples.

19
Testing Hypothesis for Difference of Means (Dependent
Samples):
• The two samples are not independent.

• But the sample observations are paired together.

• This means the pair of observations xi and yi corresponds to the
same i‑th individual.

• Clearly, the sizes of both samples should be the same.

• The population means μ1 and μ2 are unknown; we use the statistic
x̄ − ȳ to draw conclusions about μ1 − μ2.

• We obtain the differences di = xi – yi for each pair of observations.

• Assumption: Both populations are normal.

20
Testing Hypothesis for Difference of Means (Dependent
Samples):
• Since we use the differences di = xi – yi in our analysis, we need
the population variance of the di's.

• In practice this variance is not available.

• Even if σ1 and σ2 are known, we still cannot obtain the
population variance of the di's (the samples are not independent).

• So, this test can be considered as
– a test of a single mean (of the di's)
– with hypothetical mean zero
– with population variance unknown.

21
Testing Hypothesis for Difference of Means (Dependent
Samples):
Thus the test statistic is

Tc = d̄ / (sd / √n) when H0 is true,

where d̄ = (1/n) Σi=1..n di and s²d = [1/(n − 1)] Σi=1..n (di − d̄)².

• Under our assumptions, Tc ~ t(n − 1).

22
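The paired statistic is easy to compute by hand; a sketch in R on the complaints data from the earlier slide (variable names are illustrative):

```r
before <- c(6, 20, 3, 0, 4)
after  <- c(4, 6, 2, 0, 0)
d <- after - before          # differences d_i = after - before

dbar <- mean(d)              # -4.2
sd_d <- sd(d)                # about 5.67
Tc   <- dbar / (sd_d / sqrt(length(d)))
p    <- 2 * pt(-abs(Tc), df = length(d) - 1)

Tc  # about -1.655
p   # about 0.173
```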
Testing Hypothesis for Difference of Means (Dependent
Samples):
Usage

t.test(x, y, paired = TRUE)

t.test(x, y, alternative = c("two.sided",
"less", "greater"), mu = 0, paired = TRUE,
var.equal = TRUE, conf.level = 0.95)

23
23
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
x a (non‐empty) numeric vector of data values.
y a (non‐empty) numeric vector of data values.

L
alternative a character string specifying the alternative

TE
hypothesis, must be one of "two.sided" (default), "greater" or
"less". NP
mu a number indicating the true value of the mean (or difference in

means if you are performing a two sample test).

paired a logical indicating whether you want a paired t‐test.

24
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
var.equal a logical variable indicating whether to treat the two
variances as being equal. If TRUE then the pooled variance is used
to estimate the variance otherwise the Welch (or Satterthwaite)
approximation to the degrees of freedom is used.

conf.level confidence level of the interval.

25
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Assume a company sends its salespeople to a "customer service"
training workshop. They want to know whether the training has made
a difference in the number of complaints.

The following data are collected:

Salesperson Number of Complaints Difference, di
Before (1) After (2) (2‐1)
C.B. 6 4 ‐2
T.F. 20 6 ‐14
M.H. 3 2 ‐1
R.K. 0 0 0
M.O 4 0 ‐4
26
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
• To test H0: μ1 = μ2 H1: μ1 ≠ μ2

• At the 1% level and 4 d.f., Critical Value = 4.604

Tc = d̄ / (sd / √n) = −4.2 / (5.67 / √5) = −1.66

• Decision: Do not reject H0, since −4.604 < −1.66 < 4.604 (the two
rejection regions of size α/2 each lie beyond ±4.604).

• Conclusion: There is no evidence of a significant change in the
number of complaints.

27
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
ya = c(6,20,3,0,4)
yb = c(4,6,2,0,0)

t.test(ya, yb, alternative = c("two.sided"), mu
= 0, paired = T)

28
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Paired t-test

data: ya and yb
t = 1.655, df = 4, p-value = 0.1733
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-2.845828 11.245828
sample estimates:
mean of the differences
4.2

29
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
30
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 69
Test of Hypothesis for Variance in One and Two
Samples
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
1
Hypothesis Testing for σ2 :
• A pharmaceutical company is considering the purchase of new
bottling machines to increase efficiency.

• The factory currently makes use of machines that fill cough
syrup bottles whose volume of medicine has a standard
deviation of 1.6 mL.

• The new machine they are considering was tested on 30 bottles,
producing a batch with a standard deviation of 1.2 mL.

• Does this machine produce a standard deviation of less than
1.6 mL at the 0.05 significance level?

2
Hypothesis Testing for σ2 :
• Assumptions:
– Population is normal.

• Test Statistic:

χ²c = Σi=1..n (xi − x̄)² / σ² = (n − 1)s² / σ²

• The distribution of the above test statistic is Chi Square with
(n − 1) degrees of freedom.

• Critical values are obtained from the Chi Square table for a given
level of significance and d.f.

3
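For the bottling-machine example above, the statistic and the lower-tail critical value can be computed directly; a sketch (variable names are illustrative):

```r
n <- 30; s <- 1.2; sigma0 <- 1.6

chi_c <- (n - 1) * s^2 / sigma0^2   # 16.3125
crit  <- qchisq(0.05, df = n - 1)   # lower 5% point, about 17.71

chi_c < crit   # TRUE: reject H0, the new machine's variability is smaller
```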
Hypothesis Testing for σ2 :
• For two tail test (H0: σ = σ0, H1: σ ≠ σ0):

[Sketch: the acceptance region of probability (1−α) lies between
χ²(1−α/2) and χ²(α/2); the two critical regions of area α/2 each lie in
the tails.]

• This distribution is not symmetric.
• So, we have different notations for critical values.
• The area after the point is mentioned in the subscript.

• We reject H0 in favor of H1 at the α×100% level if χ²c > χ²(α/2) or
χ²c < χ²(1−α/2).
4
Hypothesis Testing for σ2 :
• For right tail test (H0: σ = σ0, H1: σ > σ0):

[Sketch: the acceptance region of probability (1−α) lies to the left of
χ²(α); the critical region of area α lies in the right tail.]

• We reject H0 in favor of H1 at the α×100% level if χ²c > χ²(α).

5
Hypothesis Testing for σ2 :
• For left tail test (H0: σ = σ0, H1: σ < σ0):

[Sketch: the critical region of area α lies in the left tail, below
χ²(1−α); the acceptance region of probability (1−α) lies to its right.]

• We reject H0 in favor of H1 at the α×100% level if χ²c < χ²(1−α).

6
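The table look-ups can be reproduced with qchisq; a sketch of the critical values for all three tests, illustrated with n = 20 (df = 19) and α = 0.05, using the convention that the subscript gives the area to the right of the point:

```r
df <- 19; alpha <- 0.05

qchisq(1 - alpha/2, df)  # chi-sq_{alpha/2}, upper two-tail point, about 32.85
qchisq(alpha/2, df)      # chi-sq_{1-alpha/2}, lower two-tail point, about 8.91
qchisq(1 - alpha, df)    # right-tail test point, about 30.14
qchisq(alpha, df)        # left-tail test point, about 10.12
```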
One‐Sample Chi‐Squared Test on Variance :
Estimate the variance, test the null hypothesis using the chi‐
squared test that the variance is equal to a user‐specified value,
and create a confidence interval for the variance.

install.packages("EnvStats")
library(EnvStats)

Usage
varTest(x, alternative = "two.sided",
conf.level = 0.95, sigma.squared = 1)

7
One‐Sample Chi‐Squared Test on Variance:
Arguments
x numeric vector of observations.

alternative character string indicating the kind of alternative
hypothesis. The possible values are "two.sided" (the default),
"greater", and "less".

conf.level numeric scalar between 0 and 1 indicating the
confidence level associated with the confidence interval for the
population variance. The default value is conf.level=0.95.

sigma.squared a numeric scalar indicating the hypothesized
value of the variance. The default value is sigma.squared=1.
8
Hypothesis Testing for σ2 : Example in R
Suppose a random sample of size n = 20 of the day temperature in a
particular city is drawn. Let us assume that the temperature in the
population follows a normal distribution N(μ, σ2) where σ2 is
unknown. The sample provides the following values of
temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6,
50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2 )
9
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 ≠ 36, two sided test, α = 5%
varTest(temp, alternative = "two.sided", conf.level
= 0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.8566
alternative hypothesis: true variance is not equal
to 36
95 percent confidence interval:
21.31373 78.61721
sample estimates:
variance
36.85292

10
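The printed statistic and p-value can be verified by hand from the formula χ²c = (n − 1)s²/σ²; a sketch (variable names are illustrative):

```r
temp <- c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3,
          34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2)

n <- length(temp)
chi_c <- (n - 1) * var(temp) / 36            # about 19.45

p_upper <- pchisq(chi_c, df = n - 1, lower.tail = FALSE)
p_two   <- 2 * min(p_upper, 1 - p_upper)     # about 0.857, matching varTest
```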
Hypothesis Testing for σ2 : Example in R
11
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 > 36, right tail test, α = 5%
varTest(temp, alternative = "greater", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.4283
alternative hypothesis: true variance is greater
than 36
95 percent confidence interval:
23.22905 Inf
sample estimates:
variance
36.85292

12
Hypothesis Testing for σ2 : Example in R
13
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 < 36, left tail test, α = 5%
varTest(temp, alternative = "less", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.5717
alternative hypothesis: true variance is less than
36
95 percent confidence interval:
0.00000 69.21069
sample estimates:
variance
36.85292

14
Hypothesis Testing for σ2 : Example in R
15
Testing the Hypothesis for Difference of Variances:
We have two populations with variances σ1² and σ2².

We want to examine the difference between these two variances.

These variances are unknown.

So we take a sample from each population.

And use the sample variances to study the population variances.

We wish to test:
H0: σ1² = σ2² H1: σ1² ≠ σ2²
H0: σ1² = σ2² H1: σ1² > σ2²
H0: σ1² = σ2² H1: σ1² < σ2²

16
Testing the Hypothesis for Difference of Variances:
Assumptions: both populations are normal.

Let x1, x2, …, xm ~ N(μ1, σ1²) and
y1, y2, …, yn ~ N(μ2, σ2²)

Test Statistic:

Fc = [ (1/(m − 1)) Σi=1..m (xi − x̄)² ] / [ (1/(n − 1)) Σi=1..n (yi − ȳ)² ]

Fc ~ F(m-1, n-1) when the null hypothesis is true.

Critical values are obtained using the F table for given d.f. and level
of significance.
17
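Fc is just the ratio of the two sample variances; a sketch in R on the diet data used throughout these lectures (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

Fc  <- var(xa) / var(xb)                 # about 0.343
df1 <- length(xa) - 1
df2 <- length(xb) - 1

p_lower <- pf(Fc, df1, df2)
p_two   <- 2 * min(p_lower, 1 - p_lower) # about 0.0814
```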
Testing the Hypothesis for Difference of Variances:
For two tail test (H0: σ1² = σ2², H1: σ1² ≠ σ2²):

[Sketch: the acceptance region of probability (1−α) lies between
F(1−α/2) and F(α/2); the two critical regions of area α/2 each lie in
the tails; H0 is rejected outside the acceptance region.]

This distribution is not symmetric.

Reject H0 in favor of H1 at the α×100% level if Fc > F(α/2) or
Fc < F(1−α/2), and critical values are obtained using the F table for
given d.f. and level of significance.
18
Testing the Hypothesis for Difference of Variances:
For right tail test (H0: σ1² = σ2², H1: σ1² > σ2²):

[Sketch: the acceptance region of probability (1−α) lies to the left of
F(α); the critical region of area α lies in the right tail.]

Reject H0 in favor of H1 at the α×100% level if Fc > F(α).

19
Testing the Hypothesis for Difference of Variances:
For left tail test (H0: σ1² = σ2², H1: σ1² < σ2²):

[Sketch: the critical region of area α lies in the left tail, below
F(1−α); the acceptance region lies to its right.]

Reject H0 in favor of H1 at the α×100% level if Fc < F(1−α).

20
Testing the Hypothesis for Difference of Variances In R:
Usage

var.test(x, ...)

var.test(x, y, ratio = 1, alternative =
c("two.sided", "less", "greater"),
conf.level = 0.95)

21
Testing the Hypothesis for Difference of Variances In R:
Arguments

x, y numeric vectors of data values.

ratio the hypothesized ratio of the population variances of x
and y.

alternative a character string specifying the alternative
hypothesis, must be one of "two.sided" (default), "greater" or
"less".

conf.level confidence level for the returned confidence
interval.

22
Testing the Hypothesis for Difference of Variances: Example in R

Following are the gains in weight (in grams) of fishes fed on two diets
A and B. Assume normal populations.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

We want to test if the variances of the two diets differ significantly as
regards their effect on increase in weight.

H0: σ1² = σ2² H1: σ1² ≠ σ2²

xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
23
Testing the Hypothesis for Difference of Variances: Example in R
var.test(xa, xb, ratio = 1, alternative =
"two.sided", conf.level = 0.95)

F test to compare two variances

data: xa and xb
F = 0.343, num df = 11, denom df = 14, p-value =
0.08144
alternative hypothesis: true ratio of variances is
not equal to 1
95 percent confidence interval:
0.1108401 1.1520871
sample estimates:
ratio of variances
0.3430045

24
Testing the Hypothesis for Difference of Variances: Example in R
25