
Essentials of Data Science With R

Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 65
Basics of Tests of Hypothesis and Decision Rules
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing of Hypothesis:
Notice that different samples give different values of estimates.

The value of an estimate may or may not be the same as the unknown value of the parameter in the population.

The difference in the estimated value may be due to random variation, or there may be some assignable cause/reason.

The value which we want to know is unknown. So we want to check whether the estimated value differs from some other reasonable value or not.
Testing of Hypothesis:
In simple words, we would like to test the closeness of the estimated value to some hypothetical value.

Find the difference between the estimated value and the hypothetical value.

• If the difference is small, we can expect to accept it and would say that there is no significant difference between the two values.

• If the difference is large, we can expect to reject it and would say that there is a significant difference between the two values.

How do we decide scientifically whether the difference between the values is significant or not?
Testing of Hypothesis: Examples
Suppose we want to compare the yield of a crop with a new variety of seeds with that of an earlier used variety of seeds. The new variety may replace the earlier one if found better.

Suppose we want to determine whether a new method of sealing LED light bulbs will increase the life of the LED bulbs, or whether one method of preserving foods is better than another so as to retain more vitamins, and so on.

All such statements can be converted into a statement or a hypothesis.
Statistical Hypothesis:
A researcher may have a research question for which the truth about the population of interest is unknown.

Suppose data can be obtained using a survey, observation, or an experiment.

If, given a prespecified uncertainty level, a statistical test based on the data supports the hypothesis about the population, we say that this hypothesis is statistically proven.
Hypothesis:
A statistical hypothesis is usually a statement about a set of parameters of a population distribution.

It is called a hypothesis because it is not known whether or not it is true.

A primary problem is to develop a procedure for determining whether or not the values of a random sample from this population are consistent with the hypothesis.
Research Hypothesis:
A research hypothesis is a statement of what the researcher believes will be the outcome of an experiment or a study.

For example,

– Older workers are more loyal to a company.

– Companies with more than Rs. 1 billion of assets spend a higher percentage of their annual budget on advertising than companies with less than Rs. 1 billion of assets.

– The price of scrap metal is a good indicator of the industrial production index six months later.
Statistical Hypothesis:
A statistical hypothesis is a formal structure used to statistically (based on a sample) test the research hypothesis.

This is the statement which we want to test.

It is a claim (assumption) about a population parameter.

Examples:

– The average monthly cell phone bill of the people in a city is Rs. 1000.00.

– The proportion of adults in this city with cell phones is more than 0.80.
Statistical Hypothesis: Simple and Composite Hypothesis
A statistical hypothesis is an assertion or conjecture about the distribution of one or more random variables.

If the statistical hypothesis completely specifies the distribution, then it is called simple; otherwise, it is called a composite hypothesis.
Statistical Hypothesis: Simple and Composite Hypothesis
The statement "μ is equal to 17" is a simple hypothesis since it completely specifies the distribution.

Everything is known: μ = 17.

The statement "μ is less than 17" is a composite hypothesis as it does not completely specify the distribution.

It is known only that μ < 17, but what value μ actually takes is unknown.
Test of a Statistical Hypothesis:
A test of a statistical hypothesis is a rule or procedure for deciding whether to reject the hypothesis.

For example, consider a particular normally distributed population having an unknown mean value μ and known variance 1.

The statement "μ is less than 17" is a statistical hypothesis that we could try to test by observing a random sample from this population.

If the random sample is deemed to be consistent with the hypothesis under consideration, we say that the hypothesis has been "accepted"; otherwise we say that it has been "rejected".
Randomized and Nonrandomized Test:
A test can be either randomized or nonrandomized.

For example, consider a test: toss a coin, and reject the hypothesis if and only if a head appears. This is an example of a randomized test.

We will consider only nonrandomized tests.
Testing of Hypothesis: One Sample and Two Sample Test
When we are working with only one sample to conclude about a value, the problem is termed a one sample problem.

When we are working with two samples to make a comparison and conclusion about a value, the problem is termed a two sample problem.

The two samples can be independent or dependent, based on which we have a "two independent samples problem" or a "two dependent samples problem" ("paired data problem").
Testing of Hypothesis: One Sample and Two Sample Test
We judge whether the marks obtained by the students are, say, 80% or not. We draw a sample of students and collect their marks. This is a one sample test.

We want to judge whether the marks of boys and girls in mathematics are the same or not. So we draw two random samples of male and female students and obtain their marks in mathematics for the comparison. This is a two sample test where the samples are independent – a "two independent samples problem".
Testing of Hypothesis: One Sample and Two Sample Test
Consider an experiment in which a group of students receives extra mathematical tuition.

Their ability to solve mathematical problems is evaluated before and after the extra tuition.

We are interested in knowing whether the ability to solve mathematical problems increases after the tuition or not.

Since the same group of students is used in a pre–post experiment, this is called a "two dependent samples problem" or a "paired data problem".
Construction of Decision Rule:
Let X1, X2, ..., Xn be a random sample of n observations from a probability function f(x; θ), θ ∈ Θ, where Θ is the parameter space.

Partition Θ into two disjoint sets Θ0 and Θ1.

Now θ comes from either Θ0 or Θ1.

Translate the events as θ ∈ Θ0 or θ ∈ Θ1.
Construction of Decision Rule:
Hypothesis: θ ∈ Θ0, or

Hypothesis: θ ∈ Θ1.

We want to know which one is correct based on the observed sample of data x1, x2, ..., xn.

We want to devise a rule that can tell us the decision to accept or reject the hypothesis once the experimental values x1, x2, ..., xn are observed.

Such a rule is called a test of hypothesis.
Construction of Decision Rule:
Different rules for tests of hypothesis can be constructed.

We construct the test around the following notion:

Partition the sample space of X1, X2, ..., Xn into two disjoint subsets, say C and C*.

• If x1, x2, ..., xn are such that the point (x1, x2, ..., xn) ∈ C, we shall reject H0.

• If x1, x2, ..., xn are such that the point (x1, x2, ..., xn) ∈ C*, we shall accept H0.
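As a concrete illustration (an assumed Z-test setting, not from the slides): for testing H0: μ = μ0 with σ known, the critical region C is often taken to be the set of samples whose standardized mean is large in absolute value. A minimal sketch in R, with illustrative values of μ0, σ and α:

```r
# Critical region C = { (x1,...,xn) : |(xbar - mu0) / (sigma/sqrt(n))| > z_{alpha/2} }
# (mu0, sigma and alpha below are illustrative assumptions)
in_critical_region <- function(x, mu0, sigma, alpha = 0.05) {
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  abs(z) > qnorm(1 - alpha / 2)
}

in_critical_region(rep(0, 25), mu0 = 0, sigma = 1)  # xbar equals mu0: accept H0
in_critical_region(rep(2, 25), mu0 = 0, sigma = 1)  # xbar far from mu0: reject H0
```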
Construction of Decision Rule:
There is no common region between C and C*, to avoid any ambiguity in the decision.

• C is called the critical region or region of rejection.

• C* is called the acceptance region or region of acceptance.

The next question is how to partition the sample space of X1, X2, ..., Xn.

Consider the possible errors in decision making.
Decisions in Tests of Hypothesis:
Tests are used to check the correctness of a statement in a statistical sense.

                          Accept statement         Reject statement
Statement is correct      Correct decision         Incorrect decision:
                                                   TYPE ONE ERROR
Statement is incorrect    Incorrect decision:      Correct decision
                          TYPE TWO ERROR
Procedure for Test:
A good procedure is one which minimizes both errors: Type I and Type II.
Testing of Hypothesis: Example
Suppose a new type of blood test is introduced.

A patient has some disease and wants to get it tested with the blood test.

We have the following four possibilities.

                     Test detects          Test does not detect
Disease present      Correct decision      Incorrect decision
Disease absent       Incorrect decision    Correct decision
Testing of Hypothesis: Example
Error 1: The person has the disease and the test does not detect it.

Error 2: The person does not have the disease and the test detects it.

Which one is more serious?

Error 1 has more serious consequences than Error 2.

Whichever error is more serious, designate it as the Type 1 error and the other as the Type 2 error.
Testing of Hypothesis: Example
Construct the hypothesis which you want to test such that the Type 1 error is more serious than the Type 2 error.

Type 1 error: Reject the hypothesis when it is correct.

Here the Type 1 error should be Error 1: the person has the disease and the test does not detect it.

Hypothesis: The person has the disease.

This is called the null hypothesis.

Rejecting the hypothesis means concluding that "the person does not have the disease", so rejecting it wrongly is the more serious error.


Testing of Hypothesis: Example
How do we compare with the hypothetical value?

We need another value for making a comparison.

So we create an alternative hypothetical value.

The value which we want to test is the null hypothesis, and the value against which we want to test it is the alternative hypothesis.
Statistical Hypothesis:
Hypothesis: a statement about the parameter.

A statistical hypothesis has two parts:

– Null Hypothesis

– Alternative Hypothesis
Null Hypothesis:
How do we construct the null hypothesis?

The null hypothesis is constructed such that the Type 1 error is more serious than the Type 2 error.

The tests are derived using the Neyman-Pearson lemma, which is based on this assumption.

• Denoted H0; it is the statement to be tested.

• "Nothing new is happening": usually a hypothesis of no difference.

• Example: The average number of TV sets in Indian homes is equal to three, H0: μ = 3.

• Begin with the assumption that H0 is true and test it for rejection or acceptance.
Alternative Hypothesis:
• Denoted H1 or Ha.

• "Something new is happening."

• It is the opposite of the null hypothesis.

• It is the statement against which the null hypothesis is tested.

Example: The average number of TV sets in Indian homes is not equal to 3 (H1: μ ≠ 3).

• It is generally the hypothesis that the researcher is trying to prove.

• The null and alternative hypotheses are mutually exclusive.

• Only one of them can be true.
Errors in Decision Making:

Possible Hypothesis Test Outcomes

                    Actual Situation
Decision            H0 True              H0 False
Accept H0           No Error             Type II Error
                    Probability 1 - α    Probability β
Reject H0           Type I Error         No Error
                    Probability α        Probability 1 - β
Two Types of Errors:
Type I Error: Reject H0 when it is true.

Type II Error: Accept H0 when it is wrong.

Size of Type I Error = P(Type I Error)
= P(Reject H0 when it is true)
= P(Reject H0 | H0)
= P((x1, x2, ..., xn) ∈ C | H0)
= α, called the level of significance.

α is set by the researcher in advance.
Two Types of Errors:
Size of Type II Error = P(Type II Error)
= P(Accept H0 when it is wrong)
= P(Accept H0 | H1)
= P((x1, x2, ..., xn) ∈ C* | H1)
= β.

Power of the test = 1 - β
= 1 - P((x1, x2, ..., xn) ∈ C* | H1)
= P((x1, x2, ..., xn) ∈ C | H1)
= probability of rejecting H0 when it is wrong.

This is a good criterion: the probability of rejecting a false hypothesis.
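The two error probabilities can be made concrete by simulation. The sketch below (sample size, means and α are assumptions chosen for illustration) estimates α and β for a right-tailed Z-test of H0: μ = 0 against H1: μ = 1, with σ = 1 and n = 25:

```r
set.seed(1)
n <- 25; sigma <- 1
crit <- qnorm(0.95)   # right-tailed cutoff for alpha = 0.05

# Simulated test statistics under a given true mean mu
zstat <- function(mu) replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - 0) / (sigma / sqrt(n))
})

alpha_hat <- mean(zstat(0) > crit)    # estimated size, close to 0.05
beta_hat  <- mean(zstat(1) <= crit)   # estimated Type II error, close to 0
power_hat <- 1 - beta_hat
c(alpha = alpha_hat, beta = beta_hat, power = power_hat)
```

With these assumed values the alternative is far from the null, so the estimated power is very close to 1.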
Two Types of Errors:
P(Reject H0) = P((x1, x2, ..., xn) ∈ C)
= K(θ), θ ∈ Θ

K(θ) = α(θ) if θ ∈ Θ0
K(θ) = 1 - β(θ) if θ ∈ Θ1

Power function: K(θ)
= 1 - P((x1, x2, ..., xn) ∈ C* | H1)
= P((x1, x2, ..., xn) ∈ C | H1)
= probability of rejecting H0 when it is wrong.

This is a good criterion: the probability of rejecting a false hypothesis.
Procedure for Test:
A good procedure is one which minimizes both errors: Type I and Type II.

Simultaneous minimization of both errors is not possible.

No procedure exists where α = 0 and β = 0.

If one error decreases, then the other error increases.
Procedure for Test:
Extreme cases:

If C = sample space of X1, X2, ..., Xn (so that C* = φ, the null set), then α = 1 and β = 0, i.e., we always reject H0.

If C = φ (so that C* = the sample space), then α = 0 and β = 1, i.e., we always accept H0.

In both cases, one error is minimized and the other error is maximized.

No procedure exists that can minimize both α and β simultaneously.

So fix one error and minimize the other.

As a convention, as per the Neyman-Pearson lemma, fix α and minimize β.

The null hypothesis is constructed as the hypothesis which we would like to reject, i.e., the hypothesis tested for possible rejection.
Neyman Pearson Lemma:
Let X1, X2, ..., Xn be a random sample of n observations from a probability function f(x; θ), θ ∈ Θ, where Θ is the parameter space.

Suppose Θ = {θ0, θ1}.

Then the best critical region of size α is obtained by the following rule:

Reject H0 if and only if

    L(θ0; x1, x2, ..., xn) / L(θ1; x1, x2, ..., xn) ≤ k,

i.e., C = { (x1, x2, ..., xn) : L(θ0; x1, ..., xn) / L(θ1; x1, ..., xn) ≤ k }, where k > 0 is a constant to be determined such that P_θ0((x1, x2, ..., xn) ∈ C) = α.
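As a hedged illustration (not from the slides): for a sample from N(θ, 1) with θ0 = 0 and θ1 = 1, the likelihood ratio L(θ0)/L(θ1) = exp(n/2 - n·x̄) is decreasing in x̄, so the Neyman-Pearson rule "reject when the ratio is ≤ k" reduces to rejecting when x̄ exceeds a cutoff chosen to give size α:

```r
# Neyman-Pearson test of H0: theta = 0 vs H1: theta = 1 for N(theta, 1) data
# (the theta values and alpha are illustrative assumptions)
np_test <- function(x, alpha = 0.05) {
  n <- length(x)
  # L(0)/L(1) <= k  is equivalent to  mean(x) >= cutoff;
  # under H0, mean(x) ~ N(0, 1/n), so the size-alpha cutoff is:
  cutoff <- qnorm(1 - alpha, mean = 0, sd = 1 / sqrt(n))
  list(xbar = mean(x), cutoff = cutoff, reject = mean(x) >= cutoff)
}

set.seed(42)
np_test(rnorm(25, mean = 1))   # data drawn under H1; typically rejected
```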
Likelihood Ratio Test:
The likelihood ratio test is another approach to finding a good test.
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 66
Test Procedures for One Sample Test for Mean with Known Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Critical (Rejection) and Acceptance Region:

The critical value divides the whole area under the probability curve into two regions:

• Critical (rejection) region

– When the statistical outcome falls into this region, H0 is rejected.

– The size of this region is α.

• Acceptance region

– When the statistical outcome falls into this region, H0 is accepted.

– The size of this region is (1 - α).
One‐ and Two‐Sided Tests:
We distinguish between one-sided and two-sided hypotheses and tests.

For an unknown population parameter θ (e.g. μ) and a fixed value θ0:

Case   Null hypothesis   Alternative hypothesis
(a)    θ = θ0            θ ≠ θ0       Two-sided test
(b)    θ ≥ θ0            θ < θ0       One-sided test
(c)    θ ≤ θ0            θ > θ0       One-sided test
One‐ and Two‐Sided Tests:

Two tail test: H0: θ = θ0 vs H1: θ ≠ θ0

[Figure: distribution of the test statistic, with the acceptance region in the centre, a region of rejection of area α/2 in each tail, and the two critical values separating them.]
Critical Values:
One tail tests (distribution of the test statistic):

Upper-tail test or right tail test: H0: θ ≤ θ0 vs H1: θ > θ0
[Figure: region of rejection of area α in the right tail.]

Lower-tail test or left tail test: H0: θ ≥ θ0 vs H1: θ < θ0
[Figure: region of rejection of area α in the left tail.]
Steps for Testing of hypothesis:
In order to test the null hypothesis, a "good" statistical test is developed by fixing the Type I error and minimizing the Type II error.

1. Define distributional assumptions for the random variables of interest, and specify them in terms of population parameters.

2. Formulate the null hypothesis and the alternative hypothesis.

3. Fix a significance value (Type I error) α.

4. The test involves developing a test statistic.
Steps for testing of hypothesis:
5. Construct a test statistic T(X) = T(X1, X2, ..., Xn) on the basis of the observed data X1, X2, ..., Xn.

6. Calculate the value of the test statistic on the basis of the given data.

7. Construct a critical region K for the statistic T, i.e. a region such that if T falls in this region, H0 is rejected.

8. Calculate t(x) = T(x1, x2, ..., xn) based on the realized sample values X1 = x1, X2 = x2, ..., Xn = xn.

9. The test statistic follows a probability distribution.
Steps for testing of hypothesis:
10. Find the corresponding critical value from the corresponding probability distribution.

11. Compare the two values and accept or reject the null hypothesis.

12. Decision rule:

• If t(x) falls into the critical region K, then H0 is rejected. The alternative hypothesis is then accepted.

• If t(x) falls outside the critical region, H0 is not rejected.
p ‐ Value:
Statistical software usually does not follow all the steps of hypothesis testing.

Instead of calculating and reporting the critical values, the test statistic is printed together with a p-value.

It is possible to use the p-value instead of critical regions for making test decisions.

p-value: the probability, computed under H0, of observing a test statistic at least as extreme as the one actually observed.
p ‐ Value:
The p-value of the test statistic T(X) is defined as follows:

For the two-sided case: p-value = P_H0(|T| > |t(x)|)

For the one-sided cases: p-value = P_H0(T ≥ t(x)) or p-value = P_H0(T ≤ t(x))

Note: P_H0 means the probability computed when H0 is true.

If the p-value < significance level (α), then reject the null hypothesis.

Significance level: the probability of Type I error (α).
Test Decisions Using Confidence Intervals:
There is an interesting and useful relationship between confidence intervals and hypothesis tests.

If H0 is rejected at significance level α, then there exists a 100(1 - α)% confidence interval which yields the same conclusion as the test.

Suppose H0: θ = θ0. If the confidence interval does not contain the value θ0, then H0 is rejected.
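A small sketch of this duality in R (the data here are simulated, an assumption for illustration): the two-sided t.test() decision at α = 0.05 agrees with checking whether the 95% confidence interval covers θ0:

```r
set.seed(7)
x <- rnorm(30, mean = 5, sd = 2)   # illustrative sample

res <- t.test(x, mu = 4, conf.level = 0.95)   # H0: mu = 4 vs H1: mu != 4

reject_by_p  <- res$p.value < 0.05
reject_by_ci <- res$conf.int[1] > 4 || res$conf.int[2] < 4
c(reject_by_p, reject_by_ci)   # the two rules always agree
```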
Hypothesis Testing for μ (σ known): Example
• A survey, done 10 years ago, of CAs (Chartered Accountants) found that their average annual salary was Rs. 74,914.
• An accounting researcher would like to test whether, over the years,
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
Hypothesis Testing for μ (σ known): Example
• Assumptions:
– σ is known.
– The population is normal, or
– the sample size is large (n ≥ 30).

• Test statistic: compute the value of the test statistic using the following formula:

    Zc = (x̄ - μ) / (σ/√n)

• Level of significance: fix the value of α, say 0.05 or 0.10.

• Critical values:
– The distribution of the test statistic is N(0,1) when H0 is true.
– Critical values are obtained using N(0,1).
Hypothesis Testing for μ (σ known): Example
• For a two tail test (H0: μ = μ0, H1: μ ≠ μ0), the test statistic is compared with the N(0,1) distribution:

• P(-zα/2 < Z < zα/2) = 1 - α
• P(-1.96 < Z < 1.96) = 0.95
• P(Z > zα/2) = α/2
• P(Z > 1.96) = 0.025
• P(Z < -zα/2) = α/2
• P(Z < -1.96) = 0.025
Hypothesis Testing for μ (σ known): Example
• For a right tail test (H0: μ = μ0, H1: μ > μ0), the critical value is zα:
• P(Z < zα) = 1 - α
• P(Z < 1.645) = 0.95
• P(Z > zα) = α
• P(Z > 1.645) = 0.05

• For a left tail test (H0: μ = μ0, H1: μ < μ0), the critical value is -zα:
• P(Z > -zα) = 1 - α
• P(Z > -1.645) = 0.95
• P(Z < -zα) = α
• P(Z < -1.645) = 0.05
Hypothesis Testing for μ (σ known): Example
Decision Making

We reject H0 in favor of H1 at the α × 100% level

• if |Zc| > zα/2 (for a two tailed test)
• if Zc > zα (for a right tailed test)
• if Zc < -zα (for a left tailed test)

Accepting H0 means that

• the difference between the sample mean and the hypothetical population mean is not significant;
• this difference is due to sampling fluctuation only.
Hypothesis Testing for μ (σ known): Example
• State H0 and H1.
• Compute the value of the test statistic Zc.
• Obtain the critical value for fixed α and according to H1 (right / left / two tailed test).
• Compare the computed value of Zc with the critical value.
• Make the decision accordingly.

• Some useful critical values of the N(0,1) distribution:

Test          Level of Significance
              1%       5%       10%
Two Tailed    2.58     1.96     1.645
Right Tailed  2.33     1.645    1.28
Left Tailed   -2.33    -1.645   -1.28
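The tabulated critical values can be reproduced with base R's qnorm() (also used later in these slides); the table's 2.58, 2.33 and 1.28 are rounded versions of the values below:

```r
# Two-tailed critical values z_{alpha/2} for alpha = 1%, 5%, 10%
round(qnorm(c(0.01, 0.05, 0.10) / 2, lower.tail = FALSE), 3)
# 2.576 1.960 1.645

# One-tailed critical values z_alpha for alpha = 1%, 5%, 10%
round(qnorm(c(0.01, 0.05, 0.10), lower.tail = FALSE), 3)
# 2.326 1.645 1.282
```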
Hypothesis Testing for μ (σ known): Example
• A survey, done 10 years ago, of CAs (Chartered Accountants) found that their average annual salary was Rs. 74,914.
• An accounting researcher would like to test whether, over the years,
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs increased?
• Right or upper tailed test (H0: μ = 74914, H1: μ > 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a right tailed test is z(0.05) = 1.645.

• Since the computed value > the critical value at the 5% level of significance,

• we reject H0 at the 5% level of significance in favor of H1 and conclude that the average salary of CAs has increased.
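The slide's arithmetic can be verified directly in base R:

```r
xbar <- 78695; mu0 <- 74914; sigma <- 14530; n <- 112

Zc   <- (xbar - mu0) / (sigma / sqrt(n))   # about 2.754
crit <- qnorm(0.95)                         # 1.645 for a 5% right-tailed test

c(Zc = Zc, critical = crit, reject = Zc > crit)
```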
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs decreased?
• Left or lower tailed test (H0: μ = 74914, H1: μ < 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a left tailed test is z(0.05) = -1.645.

• Since the computed value > the critical value at the 5% level of significance, we accept H0 at the 5% level of significance against H1 and conclude that the average salary of CAs has not decreased.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs changed?
• Two tailed test (H0: μ = 74914, H1: μ ≠ 74914)

• Test statistic:

    Zc = (x̄ - μ)/(σ/√n) = (78695 - 74914)/(14530/√112) = 2.7539

• At the 5% level of significance, the critical value for a two tailed test is z(0.025) = 1.96.

• Since |computed value| > critical value at the 5% level of significance, we reject H0 at the 5% level of significance in favor of H1 and conclude that the average salary of CAs has changed.
p – Value Approach:
• Let Zc be the computed value of the test statistic.
• Let Z ~ N(0,1).
• Then the p-value is given by the following probability:
– For two tailed tests: 2P(Z > |Zc|)
– For right tailed tests: P(Z > Zc)
– For left tailed tests: P(Z < Zc)
• Decision: H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
• The p-value is the smallest level of significance at which H0 would be rejected.
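For the CA salary example (Zc ≈ 2.7539), the three p-values follow from pnorm():

```r
Zc <- 2.7539   # computed test statistic from the example above

p_two   <- 2 * pnorm(abs(Zc), lower.tail = FALSE)   # two-tailed, about 0.006
p_right <- pnorm(Zc, lower.tail = FALSE)            # right-tailed, about 0.003
p_left  <- pnorm(Zc)                                # left-tailed, about 0.997

round(c(two = p_two, right = p_right, left = p_left), 4)
```

As expected, the right-tailed and two-tailed p-values are below α = 0.05 (reject), while the left-tailed p-value is not.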
Classical Gauss Test
Gauss.test() is available in the library "compositions" (Compositional Data Analysis), so the package needs to be installed first:

install.packages("compositions")
library(compositions)

Description

One and two sample Gauss test for equal mean of normal random variates with known variance.

Usage

Gauss.test(x, y=NULL, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))
Classical Gauss Test
Arguments
x : a numeric vector providing the first dataset
y : optional second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test
Hypothesis Testing for μ (σ known): Example in R

Suppose a random sample of size n = 20 of the day temperature in a particular city is drawn. Let us assume that the temperature in the population follows a normal distribution N(μ, σ²) with σ² = 36. The sample provides the following values of temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

Note that μ̂ = x̄ = 41.97 (to two decimal places).
Hypothesis Testing for μ (σ known): Example in R

install.packages("compositions")
library(compositions)

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2)
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ ≠ 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "two.sided")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1e-06
alternative hypothesis: two.sided
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ < 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "less")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1
alternative hypothesis: less
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30, H1: μ > 30

Gauss.test(x=temp, y=NULL, mean=30, sd=6,
alternative = "greater")

one sample Gauss-test

data: temp
T = 41.965, mean = 30, sd = 6, p-value = 5e-07
alternative hypothesis: greater
Critical Values in t and F distribution:
Quantile q with P[t > q] = 0.25 based on the t distribution:
> qt(p=0.25, df=7, lower.tail=F)
[1] 0.7111418

Two-tailed probability P[t < -0.7111 or t > 0.7111] based on the t distribution:
> pt(q=0.7111418, df=7, lower.tail=F)*2
[1] 0.5

Quantile q with P[F > q] = 0.025 based on the F distribution:
> qf(p=0.025, df1=5, df2=10, lower.tail=F)
[1] 4.236086
Critical Values in Normal Distribution:
Quantile q with P[z > q] = 0.025 based on the normal distribution:
> qnorm(0.025, lower.tail=F)
[1] 1.959964

Two-tailed probability P[z < -1.96 or z > 1.96] based on the normal distribution:
> pnorm(1.96, lower.tail=F)*2
[1] 0.04999579

Two-tailed probability P[z < -3 or z > 3] based on the normal distribution:
> (1 - pnorm(3))*2
[1] 0.002699796
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 67
One Sample Test for Mean with Unknown Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Let X1, X2, ..., Xn be a random sample from N(μ, σ²) where σ² is unknown.

• Assumptions:
– σ is unknown.
– The population is normal, or
– the sample size is small (n ≤ 30).

• Test statistic: compute the value of the test statistic using the following formula:

    Tc = (x̄ - μ) / (s/√n),

where s² = (1/(n-1)) Σ_{i=1}^{n} (xi - x̄)².
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Level of significance: fix the value of α, say 0.05 or 0.10.

• The test statistic Tc has a t(n-1) distribution when H0 is true.

• Critical values of the t(n-1) distribution can be obtained from the t table for the given d.f. and significance level.

• For large samples, the t distribution can be approximated by N(0,1).
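The last point can be checked numerically with base R's qt() and qnorm(): as the degrees of freedom grow, the two-tailed 5% t critical value approaches the N(0,1) value 1.96.

```r
round(qt(0.975, df = c(5, 30, 100, 1000)), 4)   # t critical values shrink toward...
round(qnorm(0.975), 4)                          # ...the normal value, 1.96
```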
Hypothesis Testing for μ (σ unknown): t ‐ Test
• For a two tail test (H0: μ = μ0, H1: μ ≠ μ0), T ~ t(n-1):

[Figure: acceptance region of probability (1 - α) in the centre, rejection regions of probability α/2 in each tail, with critical values -tα/2 and tα/2.]

• P(-tα/2 < T < tα/2) = 1 - α
• P(T > tα/2) = α/2
• P(T < -tα/2) = α/2
Hypothesis Testing for μ (σ unknown): t ‐ Test
We reject H0 in favor of H1 at the α × 100% level

• if |Tc| > tα/2 (for a two tailed test)
• if Tc > tα (for a right tailed test)
• if Tc < -tα (for a left tailed test)
Hypothesis Testing for μ (σ unknown): t - Test

• For a right tail test (H0: μ = μ0, H1: μ > μ0), T ~ t(n-1):
• P(T < tα) = 1 - α
• P(T > tα) = α
[Figure: rejection region of probability α in the right tail, critical value tα.]

• For a left tail test (H0: μ = μ0, H1: μ < μ0), T ~ t(n-1):
• P(T > -tα) = 1 - α
• P(T < -tα) = α
[Figure: rejection region of probability α in the left tail, critical value -tα.]
Hypothesis Testing for μ (σ unknown): p - Value Approach

Let tc be the computed value of the test statistic, and let T ~ t(n-1).

Then the p-value is given by the following probability:

• For two tailed tests: 2P(T > |tc|)
• For right tailed tests: P(T > tc)
• For left tailed tests: P(T < tc)

Decision:

H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
t.test()
Performs one and two sample t-tests on vectors of data.
Usage
t.test(x, ...)
t.test(x, y = NULL, alternative =
c("two.sided", "less", "greater"), mu = 0,
paired = FALSE, var.equal = FALSE, conf.level
= 0.95, ...)
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
x a (non-empty) numeric vector of data values.
y an optional (non-empty) numeric vector of data values.
alternative a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
mu a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
paired a logical indicating whether you want a paired t-test.
var.equal a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level confidence level of the interval.
Hypothesis Testing for μ (σ unknown): Example in R

Suppose a random sample of size n = 20 of the day temperature in a particular city is drawn. Let us assume that the temperature in the population follows a normal distribution N(μ, σ²) where σ² is unknown. The sample provides the following values of temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2)
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ ≠ 40
t.test(temp, y = NULL, alternative =
"two.sided", mu = 40, paired = FALSE,
var.equal = FALSE, conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.164
alternative hypothesis: true mean is not equal to 40
95 percent confidence interval:
39.12384 44.80616
sample estimates:
mean of x
41.965
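The reported t and p-value can be reproduced by hand from Tc = (x̄ - μ)/(s/√n) and the t(19) distribution:

```r
temp <- c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3,
          34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2)

n  <- length(temp)
Tc <- (mean(temp) - 40) / (sd(temp) / sqrt(n))         # about 1.4476
p  <- 2 * pt(abs(Tc), df = n - 1, lower.tail = FALSE)  # about 0.164

c(Tc = Tc, p.value = p)
```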
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ < 40
t.test(temp, y = NULL, alternative = "less",
mu = 40, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.918
alternative hypothesis: true mean is less than 40
95 percent confidence interval:
-Inf 44.3122
sample estimates:
mean of x
41.965
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40, H1: μ > 40
t.test(temp, y = NULL, alternative =
"greater", mu = 40, paired = FALSE, var.equal
= FALSE, conf.level = 0.95)

One Sample t-test

data: temp
t = 1.4476, df = 19, p-value = 0.08202
alternative hypothesis: true mean is greater than 40
95 percent confidence interval:
39.6178 Inf
sample estimates:
mean of x
41.965
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 68
Two Sample Test for Mean with Known and Unknown Variances
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
We want to conduct statistical hypothesis testing for a difference of means.

• In a survey of buying habits, 400 women shoppers are chosen at random in a supermarket 'A'. Their average weekly food expenditure is Rs. 250 with a s.d. of Rs. 40.

• For 400 women shoppers chosen at random in some other supermarket 'B', the average weekly food expenditure is Rs. 220 with a s.d. of Rs. 55.
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Do these two populations have similar shopping habits?

• Are the average weekly food expenditures of the two populations of shoppers equal?
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Suppose we have two samples.

• The first sample is drawn from a population with mean μ1 and
variance σ1².

• x̄1 is the mean and n1 is the size of the 1st sample.

• The second sample is drawn from a population with mean μ2 and
variance σ2².

• x̄2 is the mean and n2 is the size of the 2nd sample.

• We wish to examine if the two population means μ1 and μ2 are
different.

4
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Since the population means μ1 and μ2 are unknown, we use the
statistic x̄1 − x̄2 to draw conclusions about μ1 − μ2.

• Assumptions:
– σ1 and σ2 both are known.
– Both populations are normal.
– Or, sample sizes n1 and n2 are large.

5
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• We know x̄1 ~ N(μ1, σ1²/n1) and x̄2 ~ N(μ2, σ2²/n2).

• Test Statistic:

E(x̄1 − x̄2) = μ1 − μ2 and Var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2

Thus Zc = (x̄1 − x̄2 − 0) / √(σ1²/n1 + σ2²/n2)
= (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) when H0 is true.

• Under our assumptions, Zc ~ N(0, 1).

• We use the N(0, 1) distribution to get critical values and p‑values.

6
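The Zc statistic above needs only summary statistics. A minimal sketch in R, plugging in the supermarket survey figures from the earlier slide (n1 = n2 = 400, means 250 and 220, with the s.d.s 40 and 55 treated as the known σ1 and σ2); the variable names are illustrative:

```r
# Two-sample Z test from summary statistics (supermarket survey example)
n1 <- 400; n2 <- 400
xbar1 <- 250; xbar2 <- 220
sigma1 <- 40; sigma2 <- 55   # treated as known population s.d.s

zc <- (xbar1 - xbar2) / sqrt(sigma1^2/n1 + sigma2^2/n2)
p_two_sided <- 2 * pnorm(-abs(zc))

zc            # about 8.82
p_two_sided   # essentially 0, so H0: mu1 = mu2 is rejected
```

With Zc far beyond any usual critical value (e.g., 1.96 at α = 5%), the two supermarkets' mean expenditures clearly differ.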
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Classical Gauss Test

Gauss.test is available in the library "compositions"
(Compositional Data Analysis). So it needs to be installed first.

install.packages("compositions")
library(compositions)

Usage
Gauss.test(x, y, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))

7
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Arguments

x : a numeric vector providing the first dataset
y : a numeric vector providing the second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test

8
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
Following are the gains in weight (in grams) of fishes fed on two
diets A and B.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

Suppose the standard deviation is known to be 2.

We want to test if the two diets differ significantly as regards their
effect on increase in weight.

H0: μ1 = μ2 H1: μ1 ≠ μ2

9
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb =
c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

Gauss.test(xa, xb, mean=0, sd=2, alternative =
c("two.sided"))

one sample Gauss-test
data: xa
T = -2, mean = 0, sd = 2, p-value = 0.009823
alternative hypothesis: two.sided
10
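The printed p-value can be reproduced by hand from the usual two-sample Z statistic with the assumed common known sd of 2; a sketch (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

# two-sample Z statistic with common known sd = 2
zc <- (mean(xa) - mean(xb)) / (2 * sqrt(1/length(xa) + 1/length(xb)))
p  <- 2 * pnorm(-abs(zc))

zc  # about -2.582
p   # about 0.0098, matching the Gauss.test p-value
```

Since p < 0.05, H0 is rejected: the two diets differ significantly.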
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
11
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• When σ1 and σ2 both are known, we use the N(0, 1) distribution.

• When σ1 and σ2 are unknown, we use the t distribution.

• Assumptions:
– Both populations are normal.
– The unknown σ1 and σ2 are equal.

• For large sample sizes, we can still use the N(0, 1) distribution.

12
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Let
 x1, x2, …, xm ~ N(μ1, σ1²) and
 y1, y2, …, yn ~ N(μ2, σ2²)

• The population means μ1 and μ2 are unknown; we use the statistic
x̄ − ȳ to draw conclusions about μ1 − μ2.

• σ1 and σ2 both are unknown.

13
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances)
• We know x̄ ~ N(μ1, σ1²/m) and ȳ ~ N(μ2, σ2²/n).

• Test Statistic:

E(x̄ − ȳ) = μ1 − μ2 and

S² = [1 / (m + n − 2)] [ Σi=1..m (xi − x̄)² + Σj=1..n (yj − ȳ)² ]

Thus Tc = (x̄ − ȳ − 0) / (S √(1/m + 1/n)) = (x̄ − ȳ) / (S √(1/m + 1/n))
when H0 is true.

• Under our assumptions, Tc ~ t(m + n − 2).

• We use the t(m + n − 2) distribution to get critical values and
p‑values.

14
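The pooled statistic can be checked by hand; a sketch in R on the diet data used in this lecture's examples (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
m <- length(xa); n <- length(xb)

# pooled variance S^2 on m + n - 2 degrees of freedom
S2 <- (sum((xa - mean(xa))^2) + sum((xb - mean(xb))^2)) / (m + n - 2)
Tc <- (mean(xa) - mean(xb)) / (sqrt(S2) * sqrt(1/m + 1/n))
p  <- 2 * pt(-abs(Tc), df = m + n - 2)

Tc  # about -0.610
p   # about 0.547
```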
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
Following are the gains in weight (in grams) of fishes fed on two
diets A and B.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

Suppose the standard deviation is unknown.
(Note that now the sd is unknown; earlier it was known.)

We want to test if the two diets differ significantly as regards their
effect on increase in weight.

H0: μ1 = μ2 H1: μ1 ≠ μ2

15
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

t.test(xa, xb, alternative = c("two.sided"), mu = 0,
paired = FALSE, var.equal = TRUE, conf.level = 0.95)

Two Sample t-test
data: xa and xb
t = -0.61028, df = 25, p-value = 0.5472
alternative hypothesis: true difference in means is not
equal to 0
95 percent confidence interval:
-8.749507 4.749507
sample estimates:
mean of x mean of y
28 30
16
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
17
Testing Hypothesis for Difference of Means (Dependent
Samples):
Assume a company sends its salespeople to a "customer service"
training workshop. They want to know whether the training has made
a difference in the number of complaints.

The following data are collected:

Salesperson Number of Complaints
Before After
C.B. 6 4
T.F. 20 6
M.H. 3 2
R.K. 0 0
M.O 4 0
18
Testing Hypothesis for Difference of Means (Dependent
Samples):
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2 H1: μ1 > μ2
– H0: μ1 = μ2 H1: μ1 < μ2

• Let
 x1, x2, …, xn ~ N(μ1, σ1²) and
 y1, y2, …, yn ~ N(μ2, σ2²)

• Both samples are dependent samples.

19
Testing Hypothesis for Difference of Means (Dependent
Samples):
• The two samples are not independent.

• But the sample observations are paired together.

• This means the pair of observations xi and yi corresponds to the
same i‑th individual.

• Clearly, the sizes of both samples should be the same.

• The population means μ1 and μ2 are unknown; we use the statistic
x̄ − ȳ to draw conclusions about μ1 − μ2.

• We obtain the differences di = xi – yi for each pair of observations.

• Assumption: Both populations are normal.

20
Testing Hypothesis for Difference of Means (Dependent
Samples):
• Since we use the differences di = xi – yi in our analysis, we need
the population variance of the di's.

• In practice this variance is not available.

• Even if σ1 and σ2 are known, we still cannot obtain the
population variance of the di's (the samples are not independent).

• So, this test can be considered as
– a test of a single mean (of the di's)
– with hypothetical mean zero
– with population variance unknown.

21
Testing Hypothesis for Difference of Means (Dependent
Samples):
Thus the test statistic is

Tc = d̄ / (sd / √n) when H0 is true,

where d̄ = (1/n) Σi=1..n di and s²d = [1/(n − 1)] Σi=1..n (di − d̄)².

• Under our assumptions, Tc ~ t(n − 1).

22
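The paired statistic is easy to compute by hand; a sketch in R on the complaints data from the earlier slide (variable names are illustrative):

```r
before <- c(6, 20, 3, 0, 4)
after  <- c(4, 6, 2, 0, 0)
d <- after - before          # differences d_i = after - before

dbar <- mean(d)              # -4.2
sd_d <- sd(d)                # about 5.67
Tc   <- dbar / (sd_d / sqrt(length(d)))
p    <- 2 * pt(-abs(Tc), df = length(d) - 1)

Tc  # about -1.655
p   # about 0.173
```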
Testing Hypothesis for Difference of Means (Dependent
Samples):
Usage

t.test(x, y, paired = TRUE)

t.test(x, y, alternative = c("two.sided",
"less", "greater"), mu = 0, paired = TRUE,
var.equal = TRUE, conf.level = 0.95)

23
23
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
x a (non‐empty) numeric vector of data values.
y a (non‐empty) numeric vector of data values.

L
alternative a character string specifying the alternative

TE
hypothesis, must be one of "two.sided" (default), "greater" or
"less". NP
mu a number indicating the true value of the mean (or difference in

means if you are performing a two sample test).

paired a logical indicating whether you want a paired t‐test.

24
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
var.equal a logical variable indicating whether to treat the two
variances as being equal. If TRUE then the pooled variance is used
to estimate the variance otherwise the Welch (or Satterthwaite)
approximation to the degrees of freedom is used.

conf.level confidence level of the interval.

25
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Assume a company sends its salespeople to a "customer service"
training workshop. They want to know whether the training has made
a difference in the number of complaints.

The following data are collected:

Salesperson Number of Complaints Difference, di
Before (1) After (2) (2‐1)
C.B. 6 4 ‐2
T.F. 20 6 ‐14
M.H. 3 2 ‐1
R.K. 0 0 0
M.O 4 0 ‐4
26
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
• To test H0: μ1 = μ2 H1: μ1 ≠ μ2

• At the 1% level and 4 d.f., Critical Value = 4.604

Tc = d̄ / (sd / √n) = −4.2 / (5.67 / √5) = −1.66

• Decision: Do not reject H0, since −4.604 < −1.66 < 4.604 (the two
rejection regions of size α/2 each lie beyond ±4.604).

• Conclusion: There is no evidence of a significant change in the
number of complaints.

27
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
ya = c(6,20,3,0,4)
yb = c(4,6,2,0,0)

t.test(ya, yb, alternative = c("two.sided"), mu
= 0, paired = T)

28
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Paired t-test

data: ya and yb
t = 1.655, df = 4, p-value = 0.1733
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-2.845828 11.245828
sample estimates:
mean of the differences
4.2

29
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
30
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference

Testing of Hypothesis

Lecture 69
Test of Hypothesis for Variance in One and Two
Samples
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
1
Hypothesis Testing for σ2 :
• A pharmaceutical company is considering the purchase of new
bottling machines to increase efficiency.

• The factory currently makes use of machines that fill cough
syrup bottles whose volume of medicine has a standard
deviation of 1.6 mL.

• The new machine they are considering was tested on 30 bottles,
producing a batch with a standard deviation of 1.2 mL.

• Does this machine produce a standard deviation of less than
1.6 mL at the 0.05 significance level?

2
Hypothesis Testing for σ2 :
• Assumptions:
– Population is normal.

• Test Statistic:

χ²c = Σi=1..n (xi − x̄)² / σ² = (n − 1)s² / σ²

• The distribution of the above test statistic is Chi Square with
(n − 1) degrees of freedom.

• Critical values are obtained from the Chi Square table for a given
level of significance and d.f.

3
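For the bottling-machine example above, the statistic and the lower-tail critical value can be computed directly; a sketch (variable names are illustrative):

```r
n <- 30; s <- 1.2; sigma0 <- 1.6

chi_c <- (n - 1) * s^2 / sigma0^2   # 16.3125
crit  <- qchisq(0.05, df = n - 1)   # lower 5% point, about 17.71

chi_c < crit   # TRUE: reject H0, the new machine's variability is smaller
```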
Hypothesis Testing for σ2 :
• For two tail test (H0: σ = σ0, H1: σ ≠ σ0):

[Sketch: the acceptance region of probability (1−α) lies between
χ²(1−α/2) and χ²(α/2); the two critical regions of area α/2 each lie in
the tails.]

• This distribution is not symmetric.
• So, we have different notations for critical values.
• The area after the point is mentioned in the subscript.

• We reject H0 in favor of H1 at the α×100% level if χ²c > χ²(α/2) or
χ²c < χ²(1−α/2).
4
Hypothesis Testing for σ2 :
• For right tail test (H0: σ = σ0, H1: σ > σ0):

[Sketch: the acceptance region of probability (1−α) lies to the left of
χ²(α); the critical region of area α lies in the right tail.]

• We reject H0 in favor of H1 at the α×100% level if χ²c > χ²(α).

5
Hypothesis Testing for σ2 :
• For left tail test (H0: σ = σ0, H1: σ < σ0):

[Sketch: the critical region of area α lies in the left tail, below
χ²(1−α); the acceptance region of probability (1−α) lies to its right.]

• We reject H0 in favor of H1 at the α×100% level if χ²c < χ²(1−α).

6
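The table look-ups can be reproduced with qchisq; a sketch of the critical values for all three tests, illustrated with n = 20 (df = 19) and α = 0.05, using the convention that the subscript gives the area to the right of the point:

```r
df <- 19; alpha <- 0.05

qchisq(1 - alpha/2, df)  # chi-sq_{alpha/2}, upper two-tail point, about 32.85
qchisq(alpha/2, df)      # chi-sq_{1-alpha/2}, lower two-tail point, about 8.91
qchisq(1 - alpha, df)    # right-tail test point, about 30.14
qchisq(alpha, df)        # left-tail test point, about 10.12
```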
One‐Sample Chi‐Squared Test on Variance :
Estimate the variance, test the null hypothesis using the chi‐
squared test that the variance is equal to a user‐specified value,
and create a confidence interval for the variance.

install.packages("EnvStats")
library(EnvStats)

Usage
varTest(x, alternative = "two.sided",
conf.level = 0.95, sigma.squared = 1)

7
One‐Sample Chi‐Squared Test on Variance:
Arguments
x numeric vector of observations.

alternative character string indicating the kind of alternative
hypothesis. The possible values are "two.sided" (the default),
"greater", and "less".

conf.level numeric scalar between 0 and 1 indicating the
confidence level associated with the confidence interval for the
population variance. The default value is conf.level=0.95.

sigma.squared a numeric scalar indicating the hypothesized
value of the variance. The default value is sigma.squared=1.
8
Hypothesis Testing for σ2 : Example in R
Suppose a random sample of size n = 20 of the day temperature in a
particular city is drawn. Let us assume that the temperature in the
population follows a normal distribution N(μ, σ2) where σ2 is
unknown. The sample provides the following values of
temperature (in degrees Celsius):

40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6,
50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2

temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2 )
9
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 ≠ 36, two sided test, α = 5%
varTest(temp, alternative = "two.sided", conf.level
= 0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.8566
alternative hypothesis: true variance is not equal
to 36
95 percent confidence interval:
21.31373 78.61721
sample estimates:
variance
36.85292

10
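The printed statistic and p-value can be verified by hand from the formula χ²c = (n − 1)s²/σ²; a sketch (variable names are illustrative):

```r
temp <- c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3,
          34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2)

n <- length(temp)
chi_c <- (n - 1) * var(temp) / 36            # about 19.45

p_upper <- pchisq(chi_c, df = n - 1, lower.tail = FALSE)
p_two   <- 2 * min(p_upper, 1 - p_upper)     # about 0.857, matching varTest
```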
Hypothesis Testing for σ2 : Example in R
11
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 > 36, right tail test, α = 5%
varTest(temp, alternative = "greater", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.4283
alternative hypothesis: true variance is greater
than 36
95 percent confidence interval:
23.22905 Inf
sample estimates:
variance
36.85292

12
Hypothesis Testing for σ2 : Example in R
13
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 < 36, left tail test, α = 5%
varTest(temp, alternative = "less", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.5717
alternative hypothesis: true variance is less than
36
95 percent confidence interval:
0.00000 69.21069
sample estimates:
variance
36.85292

14
Hypothesis Testing for σ2 : Example in R
15
Testing the Hypothesis for Difference of Variances:
We have two populations with variances σ1² and σ2².

We want to examine the difference between these two variances.

These variances are unknown.

So we take a sample from each population.

And use the sample variances to study the population variances.

We wish to test:
H0: σ1² = σ2² H1: σ1² ≠ σ2²
H0: σ1² = σ2² H1: σ1² > σ2²
H0: σ1² = σ2² H1: σ1² < σ2²

16
Testing the Hypothesis for Difference of Variances:
Assumptions: both populations are normal.

Let x1, x2, …, xm ~ N(μ1, σ1²) and
y1, y2, …, yn ~ N(μ2, σ2²)

Test Statistic:

Fc = [ (1/(m − 1)) Σi=1..m (xi − x̄)² ] / [ (1/(n − 1)) Σi=1..n (yi − ȳ)² ]

Fc ~ F(m-1, n-1) when the null hypothesis is true.

Critical values are obtained using the F table for given d.f. and level
of significance.
17
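Fc is just the ratio of the two sample variances; a sketch in R on the diet data used throughout these lectures (variable names are illustrative):

```r
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)

Fc  <- var(xa) / var(xb)                 # about 0.343
df1 <- length(xa) - 1
df2 <- length(xb) - 1

p_lower <- pf(Fc, df1, df2)
p_two   <- 2 * min(p_lower, 1 - p_lower) # about 0.0814
```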
Testing the Hypothesis for Difference of Variances:
For two tail test (H0: σ1² = σ2², H1: σ1² ≠ σ2²):

[Sketch: the acceptance region of probability (1−α) lies between
F(1−α/2) and F(α/2); the two critical regions of area α/2 each lie in
the tails; H0 is rejected outside the acceptance region.]

This distribution is not symmetric.

Reject H0 in favor of H1 at the α×100% level if Fc > F(α/2) or
Fc < F(1−α/2), and critical values are obtained using the F table for
given d.f. and level of significance.
18
Testing the Hypothesis for Difference of Variances:
For right tail test (H0: σ1² = σ2², H1: σ1² > σ2²):

[Sketch: the acceptance region of probability (1−α) lies to the left of
F(α); the critical region of area α lies in the right tail.]

Reject H0 in favor of H1 at the α×100% level if Fc > F(α).

19
Testing the Hypothesis for Difference of Variances:
For left tail test (H0: σ1² = σ2², H1: σ1² < σ2²):

[Sketch: the critical region of area α lies in the left tail, below
F(1−α); the acceptance region lies to its right.]

Reject H0 in favor of H1 at the α×100% level if Fc < F(1−α).

20
Testing the Hypothesis for Difference of Variances In R:
Usage

var.test(x, ...)

var.test(x, y, ratio = 1, alternative =
c("two.sided", "less", "greater"),
conf.level = 0.95)

21
Testing the Hypothesis for Difference of Variances In R:
Arguments

x, y numeric vectors of data values.

ratio the hypothesized ratio of the population variances of x
and y.

alternative a character string specifying the alternative
hypothesis, must be one of "two.sided" (default), "greater" or
"less".

conf.level confidence level for the returned confidence
interval.

22
Testing the Hypothesis for Difference of Variances: Example in R

Following are the gains in weight (in grams) of fishes fed on two diets
A and B. Assume normal populations.

A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

We want to test if the variances of the two diets differ significantly as
regards their effect on increase in weight.

H0: σ1² = σ2² H1: σ1² ≠ σ2²

xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
23
Testing the Hypothesis for Difference of Variances: Example in R
var.test(xa, xb, ratio = 1, alternative =
"two.sided", conf.level = 0.95)

F test to compare two variances

data: xa and xb
F = 0.343, num df = 11, denom df = 14, p-value =
0.08144
alternative hypothesis: true ratio of variances is
not equal to 1
95 percent confidence interval:
0.1108401 1.1520871
sample estimates:
ratio of variances
0.3430045

24
Testing the Hypothesis for Difference of Variances: Example in R
25