Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference
Testing of Hypothesis
Lecture 65
Basics of Tests of Hypothesis and Decision Rules
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing of Hypothesis:
Notice that different samples give different values of the estimate.
The difference in the estimated values may be due to random variation, or there may be some assignable cause/reason.
The true value which we want to know is unknown, so we check whether the estimated value differs from some other reasonable value or not.
Testing of Hypothesis:
In simple words, we would like to test the closeness of the estimated value to some hypothetical value.
• If the difference is small, we can expect to accept it and would say that there is no significant difference between the two values.
• If the difference is large, we can expect to reject it and would say that there is a significant difference between the two values.
Suppose we want to determine whether a new method of sealing LED light bulbs will increase the life of the bulbs, or whether one method of preserving foods is better than another at retaining vitamins, and so on.
Data can be obtained using a survey, an observational study, or an experiment.
If, given a prespecified uncertainty level, a statistical test based on the data supports the hypothesis about the population, we say that this hypothesis is statistically proven.
Hypothesis:
A statistical hypothesis is usually a statement about a set of
parameters of a population distribution.
It is called a hypothesis because it is not known whether or not it is true.
A primary problem is to develop a procedure for determining
whether or not the values of a random sample from this population
are consistent with the hypothesis.
Research Hypothesis:
A research hypothesis is a statement of what the researcher believes will be the outcome of an experiment or a study.
For example:
– Companies with more than Rs. 1 billion of assets spend a higher percentage of their annual budget on advertising than companies with less than Rs. 1 billion of assets.
Statistical Hypothesis:
Statistical Hypothesis is a formal structure used to statistically (based
on a sample) test the research hypothesis.
It is a claim (assumption) about a population parameter.
Example:
– The average monthly cell phone bill of the people in a city is Rs. 1000.00
If the statistical hypothesis completely specifies the distribution, it is called a simple hypothesis; otherwise, it is called a composite hypothesis.
Statistical Hypothesis: Simple and Composite Hypothesis
The statement "μ is equal to 17" is a simple hypothesis, since it completely specifies the distribution.
The statement "μ is less than 17" is a composite hypothesis, as it does not completely specify the distribution.
Test of a Statistical Hypothesis:
A test of a statistical hypothesis is a rule or procedure for deciding
whether to reject the hypothesis.
For example, consider a population having an unknown mean value μ and known variance 1.
The statement "μ is less than 17" is a statistical hypothesis that we could try to test by observing a random sample from this population.
For example, consider the test: toss a coin, and reject the hypothesis if a head turns up. Such a rule ignores the data.
We will consider only the nonrandomized tests.
Testing of Hypothesis: One Sample and Two Sample Test
When we are working only with one sample to conclude about a
value, the problem is termed as one sample problem.
When we are working with two samples to draw a conclusion about a value, the problem is termed as the two sample problem.
The two samples can be independent or dependent based on which
we have “two independent samples problem” or “two dependent
samples problem” (“paired data problem”).
Testing of Hypothesis: One Sample and Two Sample Test
Suppose we judge whether the marks obtained by the students are, say, 80% or not. We draw a sample of students and collect their marks. This is a one sample test.
Suppose the same group of students is observed before and after receiving mathematical tuition.
We are interested in knowing whether the ability to solve mathematical problems increases after the tuition or not.
Since the same group of students is used in a pre–post experiment,
this is called a “two‐dependent‐samples problem” or a “paired data
problem”.
Construction of Decision Rule:
Let X1, X2, …, Xn be a random sample of n observations from a probability density function f(x; θ), θ ∈ Θ, where Θ is the parametric space.
Now θ comes from either Θ0 or Θ1.
Translate the events as θ ∈ Θ0 or θ ∈ Θ1.
Construction of Decision Rule:
Hypothesis: θ ∈ Θ0 or
Hypothesis: θ ∈ Θ1.
Suppose an experiment yields a sample of data x1, x2, …, xn.
We want to devise a rule that can tell us the decision to accept or reject the hypothesis once the experimental values x1, x2, …, xn are observed.
Construction of Decision Rule:
Different rules of tests of hypothesis can be constructed.
Partition the sample space of the observations into two disjoint regions C (the critical region) and C*.
• If x1, x2, …, xn are such that the point (x1, x2, …, xn) ∈ C, we shall reject H0.
• If x1, x2, …, xn are such that the point (x1, x2, …, xn) ∈ C*, we shall accept H0.
Construction of Decision Rule:
There is no common region between C and C* to avoid any ambiguity
in decision.
The next question is how to partition the sample space of X1, X2, …, Xn.
Consider the possible errors in decision making.
Decisions in Tests of Hypothesis:
Used to check the correctness of a statement in statistical sense.
Statement is correct — Accept: correct decision; Reject: incorrect decision (Type I error).
Statement is incorrect — Accept: incorrect decision (Type II error); Reject: correct decision.
Procedure for Test:
A good procedure is the one which minimizes both the errors ‐ Type I
and Type II.
Testing of Hypothesis: Example
Suppose a new type of blood test is introduced.
A patient has some disease and he wants to get it tested with the
blood test.
Disease present — test detects: correct decision; test does not detect: incorrect decision.
Disease absent — test detects: incorrect decision; test does not detect: correct decision.
Testing of Hypothesis: Example
Error 1: The person has the disease and the test does not detect it.
Error 2: The person does not have the disease and the test detects it.
Error 1 has more serious consequences than error 2.
Whichever error is more serious, designate it as the Type 1 error and the other as the Type 2 error.
Testing of Hypothesis: Example
Construct the hypothesis which you want to test such that the Type 1 error is more serious than the Type 2 error.
Type 1 error: The person has the disease and the test does not detect it.
Hypothesis: The person does not have the disease.
So create an alternative hypothetical value.
The value which we want to test is the null hypothesis, and the value against which we want to test is the alternative hypothesis.
Statistical Hypothesis:
Hypothesis: Statement about the parameter
– Null Hypothesis
– Alternative Hypothesis
Null Hypothesis:
How do we construct the null hypothesis?
The null hypothesis is constructed such that the Type 1 error is more serious than the Type 2 error.
The tests are derived using the Neyman–Pearson lemma, which is based on this assumption.
• Denoted as H0, it is the statement to be tested.
• Nothing new is happening; it is usually a hypothesis of no difference.
• Begin with the assumption that H0 is true and test it for rejection or acceptance.
Alternative Hypothesis:
• Denoted as H1 or Ha
• Statement against which the null hypothesis is tested
Example: The average number of TV sets in Indian homes is not equal
to 3 ( H1: μ ≠ 3 )
• It is generally the hypothesis that the researcher is trying to prove.
Actual Situation
Decision | H0 True | H0 False
Accept H0 | No error (probability 1 − α) | Type II error (probability β)
Reject H0 | Type I error (probability α) | No error (probability 1 − β)
Two Types of Errors:
Type I Error: Reject H0, when it is true
Size of Type I Error = P(Reject H0 when it is true)
= P(Reject H0 | H0)
= P((x1, x2, …, xn) ∈ C | H0)
= α, called the level of significance.
Two Types of Errors:
Size of Type II Error = P(Type II Error)
= P((x1, x2, …, xn) ∈ C* | H1)
= β.

Power Function K(θ), θ ∈ Θ:
K(θ) = α(θ) if θ ∈ Θ0
K(θ) = 1 − β(θ) if θ ∈ Θ1

Power = K(θ)
= 1 − P((x1, x2, …, xn) ∈ C* | H1)
= P((x1, x2, …, xn) ∈ C | H1)
Procedure for Test:
A good procedure is the one which minimizes both the errors ‐ Type I
and Type II.
If one error decreases, then the other error increases.
Procedure for Test:
Extreme cases:
• Always reject H0 (the critical region C is the whole sample space).
• Always accept H0 (the critical region C is empty).
Both cases: one error is minimized and the other error is maximized.
Suppose Θ = {θ0, θ1}.
Then the best critical region of size α is obtained by the following rule:
Reject H0 if and only if
L(θ0; x1, x2, …, xn) / L(θ1; x1, x2, …, xn) ≤ k.
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference
Testing of Hypothesis
Lecture 66
Test Procedures for One Sample Test for Mean
with Known Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Critical (Rejection) and Acceptance Region:
Critical value divides the whole area under probability curve into two
regions:
• Critical (Rejection) Region
– When the statistical outcome falls into this region, H0 is rejected.
– Size of this region is α.
• Acceptance Region
– When the statistical outcome falls into this region, H0 is accepted.
One‐ and Two‐Sided Tests:
We distinguish between one‐sided and two‐sided hypotheses and
tests.
L
Case Null hypothesis Alternative hypothesis
TE
(a)
𝜽 𝜽𝟎 𝜽 𝜽𝟎 Two‐sided test
(b)
𝜽 𝜽𝟎
NP 𝜽 𝜽𝟎 One‐sided test
(c)
𝜽 𝜽𝟎 𝜽 𝜽𝟎 One‐sided test
3
One‐ and Two‐Sided Tests:
[Figure: acceptance region in the middle; regions of rejection in both tails, separated by the critical values.]
Critical Values:
Distribution of the test statistic
One Tail Tests
Upper-tail test or right tail test:
H0: θ = θ0 vs H1: θ > θ0
Lower-tail test or left tail test:
H0: θ = θ0 vs H1: θ < θ0
Steps for Testing of hypothesis:
In order to test the null hypothesis, a “good“ statistical test is
developed by fixing the type one error and minimizing the second
type error.
1. Define the hypotheses of interest, and specify them in terms of population parameters.
Steps for testing of hypothesis:
5. Construct a test statistic T(X) = T(X1,X2,...,Xn) on the basis of
observed data X1,X2,...,Xn
6. Determine the critical region such that, if T falls in this region, H0 is rejected.
Steps for testing of hypothesis:
10. Find the critical value from the corresponding probability distribution.
11. Compare the two values and accept or reject the null hypothesis.
If t(x) falls into the critical region K, then H0 is rejected; the alternative hypothesis is then accepted.
If t(x) falls outside the critical region, H0 is not rejected.
p ‐ Value:
Statistical software usually does not follow all the steps of
hypothesis testing.
Usually the value of the test statistic is printed together with the p-value.
It is possible to use the p-value instead of critical regions for making test decisions.
P_H0(T ≤ t(x)) = p-value
Note: P_H0 means the probability computed when H0 is true.
If p-value < significance level (α), then reject the null hypothesis.
Test Decisons Using Confidence Intervals:
There is an interesting and useful relationship between confidence intervals and hypothesis tests.
One can construct a 100(1 − α)% confidence interval which yields the same conclusion as the test.
Suppose H0: θ = θ0.
If the confidence interval does not contain the value θ0, then H0 is rejected.
Hypothesis Testing for μ (σ known): Example
• A survey, done 10 years ago, of CA’s (Chartered Accountant)
found that their average annual salary was Rs. 74,914.
• An accounting researcher would like to test whether over the
years
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
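As a quick check, the z statistic for this salary example can be computed directly in R. This is an illustrative sketch, not part of the original slides.

```r
# One-sample z statistic for the CA salary example (right tail: H1: mu > 74914)
xbar  <- 78695   # sample mean
mu0   <- 74914   # hypothesized mean
sigma <- 14530   # known population s.d.
n     <- 112     # sample size

zc <- (xbar - mu0) / (sigma / sqrt(n))
zc                # about 2.7539
zc > qnorm(0.95)  # TRUE: exceeds the 5% right-tail critical value 1.645
```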
Hypothesis Testing for μ (σ known): Example
• Assumptions:
– σ is known.
– Population is normal or
– sample size is large (n ≥ 30).
• Test Statistic: Compute the value of the test statistic using the following formula:
Zc = (x̄ − μ) / (σ/√n)
• Level of Significance: Fix the value of α, say 0.05 or 0.10.
• Critical Values:
• For a two tail test (H0: μ = μ0, H1: μ ≠ μ0): ±zα/2
• P(−zα/2 < Z < zα/2) = 1 − α
• P(−1.96 < Z < 1.96) = 0.95
• For a left tail test (H0: μ = μ0, H1: μ < μ0): −zα
• P(Z > −zα) = 1 − α, where Z ~ N(0, 1)
We reject H0 in favor of H1 at the α × 100% level:
• if Zc > zα (for a right tailed test)
• if Zc < −zα (for a left tailed test)
Hypothesis Testing for μ (σ known): Example
• State H0 and H1
• Find the critical value (right/left/two tailed test)
• Compare the computed value of Zc with the critical value
– this average has increased?
• Right or upper tail test (H0: μ = 74914, H1: μ > 74914)
– this average has decreased?
• Left or lower tail test (H0: μ = 74914, H1: μ < 74914)
– this average has changed?
• Two tail test (H0: μ = 74914, H1: μ ≠ 74914)
• A sample of 112 CAs gave a mean annual salary of Rs. 78,695.
• Assume that σ = Rs. 14,530.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs increased?
• Right or Upper Tailed Test (H0: μ = 74914, H1: μ > 74914)
• Test Statistic
Zc = (x̄ − μ) / (σ/√n) = (78695 − 74914) / (14530/√112) = 2.7539
• At the 5% level of significance, the critical value for a right tailed test is z(0.05) = 1.645.
• Since the computed value > critical value at the 5% level of significance, we reject H0 at the 5% level in favor of H1.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs decreased?
• Left or Lower Tailed Test (H0: μ = 74914, H1: μ < 74914)
• Test Statistic
Zc = (x̄ − μ) / (σ/√n) = (78695 − 74914) / (14530/√112) = 2.7539
• At the 5% level of significance, the critical value for a left tailed test is z(0.05) = −1.645.
• Since the computed value > critical value at the 5% level of significance, we do not reject H0.
Hypothesis Testing for μ (σ known): Example
Has the average salary of CAs changed?
• Two tailed test (H0: μ = 74914, H1: μ ≠ 74914)
• Test Statistic
Zc = (x̄ − μ) / (σ/√n) = (78695 − 74914) / (14530/√112) = 2.7539
• At the 5% level of significance, the critical value for a two tailed test is z(0.025) = 1.96.
• Since |computed value| > critical value at the 5% level of significance, we reject H0 at the 5% level in favor of H1.
p – Value Approach:
• Let Zc be the computed value of test statistic
• Let Z ~ N(0,1)
• Then p – value is given by the following probability
– For two tailed tests:
• 2P(Z> |Zc|)
– For right tailed tests:
• P(Z > Zc)
– For left tailed tests:
• P(Z < Zc)
• Decision: H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
• The p-value is the smallest level of significance at which H0 would be rejected.
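The three p-values above can be computed with `pnorm()`. The sketch below reuses the Zc value from the CA salary example for illustration.

```r
# p-values for a computed z statistic, assuming Z ~ N(0, 1)
zc <- 2.7539                                        # from the salary example
p_right <- pnorm(zc, lower.tail = FALSE)            # right tailed: P(Z > Zc)
p_left  <- pnorm(zc)                                # left tailed:  P(Z < Zc)
p_two   <- 2 * pnorm(abs(zc), lower.tail = FALSE)   # two tailed:   2P(Z > |Zc|)
p_right < 0.05   # TRUE: reject H0 in the right tailed test at 5%
```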
Classical Gauss Test
Gauss.test is available in the library "compositions" (Compositional Data Analysis), so it needs to be installed first.
install.packages("compositions")
library(compositions)
Description
One and two sample Gauss test for equal means of normal random variates with known variance.
Usage
Gauss.test(x, y=NULL, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))
Classical Gauss Test
Arguments
x : a numeric vector providing the first dataset
y : optional second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test
Hypothesis Testing for μ (σ known): Example in R
Suppose the temperature in a city is assumed to follow a normal distribution with known standard deviation σ = 6. The sample provides the following values of temperature (in degrees Celsius):
40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6,
50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2
Hypothesis Testing for μ (σ known): Example in R
install.packages("compositions")
library(compositions)
temp=c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6,
38.4, 45.5, 44.4, 40.3, 34.6, 55.6, 50.9, 38.9,
37.8, 46.8, 43.6, 39.5, 49.9, 34.2 )
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30 H1: μ ≠ 30
Gauss.test(temp, mean=30, sd=6, alternative = "two.sided")
data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1e-06
alternative hypothesis: two.sided
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30 H1: μ < 30
Gauss.test(temp, mean=30, sd=6, alternative = "less")
data: temp
T = 41.965, mean = 30, sd = 6, p-value = 1
alternative hypothesis: less
Hypothesis Testing for μ (σ known): Example in R
H0: μ = 30 H1: μ > 30
Gauss.test(temp, mean=30, sd=6, alternative = "greater")
data: temp
T = 41.965, mean = 30, sd = 6, p-value = 5e-07
alternative hypothesis: greater
Critical Values in t and F distribution:
P[t > q] = 0.25 based on the t distribution
> qt(p=0.25, df=7, ncp=0, lower.tail=F)
[1] 0.7111418
P[t < −q or t > q] (2-tailed) based on the t distribution
> qt(p=0.25, df=7, ncp=0, lower.tail=F)*2
[1] 1.422284
P[F > q] = 0.025 based on the F distribution
> qf(p=0.025, df1=5, df2=10, ncp=0, lower.tail
= F)
[1] 4.236
Critical Values in Normal Distribution:
P[z > q]=0.025 based on Normal distribution
> qnorm(0.025, lower.tail=F)
[1] 1.959964
P[z < −1.96 or z > 1.96] (2-tailed) based on the normal distribution
> pnorm(1.96, lower.tail=F)*2
[1] 0.04999579
P[z < −3 or z > 3] based on the normal distribution
> (1 - pnorm(3))*2
[1] 0.002699796
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference
Testing of Hypothesis
Lecture 67
One Sample Test for Mean with Unknown Variance
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Let X1, X2, …, Xn be a random sample from N(μ, σ²), where σ² is unknown.
• Assumptions:
– σ is unknown.
– Population is normal, or
– sample size is small (n ≤ 30).
• Test Statistic: Compute the value of the test statistic using the following formula:
Tc = (x̄ − μ) / (s/√n)
where s² = (1/(n − 1)) Σᵢ₌₁ⁿ (xᵢ − x̄)².
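The Tc formula can be verified by hand in R and compared against `t.test()`. This sketch reuses the temperature data that appears in the worked example later in this lecture.

```r
# One-sample t statistic computed by hand for the temperature data (mu0 = 40)
temp <- c(40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3,
          34.6, 55.6, 50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2)
n  <- length(temp)
tc <- (mean(temp) - 40) / (sd(temp) / sqrt(n))   # sd() uses the 1/(n-1) divisor
tc                                               # about 1.4476
t.test(temp, mu = 40)$statistic                  # same value from t.test()
```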
Hypothesis Testing for μ (σ unknown): t ‐ Test
• Level of Significance: Fix the value of α, say 0.05 or 0.10
• Critical Values: obtained from the t table for the given d.f. and significance level.
• For large samples, the t distribution can be approximated by N(0, 1).
Hypothesis Testing for μ (σ unknown): t ‐ Test
• For two tail test (H0: μ = μ0, H1: μ ≠ μ0):
T~t(n-1)
[Figure: T ~ t(n−1); acceptance region (1 − α) between −tα/2 and tα/2, rejection regions (α/2) in each tail.]
Hypothesis Testing for μ (σ unknown): t ‐ Test
We reject H0 in favor of H1 at the α × 100% level:
• If |Tc| > tα/2 (for a two tailed test)
• If Tc > tα (for a right tailed test)
• If Tc < −tα (for a left tailed test)
Hypothesis Testing for μ (σ unknown): t – Test
• For a right tail test (H0: μ = μ0, H1: μ > μ0): T ~ t(n−1)
• P(T < tα) = 1 − α
• P(T > tα) = α
• For a left tail test (H0: μ = μ0, H1: μ < μ0): T ~ t(n−1)
• P(T > −tα) = 1 − α
Hypothesis Testing for μ (σ unknown): p – value
Approach
Let tc be the computed value of test statistic
Let T ~ t(n‐1)
• For two tailed tests: 2P(T > |tc|)
• For right tailed tests: P(T > tc)
• For left tailed tests: P(T < tc)
Decision: reject H0 in favor of H1 if p-value < α.
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
t.test()
Performs one and two sample t‐tests on vectors of data.
Usage
t.test(x, ...)
t.test(x, y = NULL, alternative =
c("two.sided", "less", "greater"), mu = 0,
paired = FALSE, var.equal = FALSE, conf.level
= 0.95, ...)
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
x a (non‐empty) numeric vector of data values.
y an optional (non‐empty) numeric vector of data values.
alternative a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
Hypothesis Testing for μ (σ unknown): t ‐ Test in R
Arguments
paired a logical indicating whether you want a paired t‐test.
var.equal a logical variable indicating whether to treat the two variances as being equal. If TRUE, then the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
Hypothesis Testing for μ (σ unknown): Example in R
Suppose the temperature in a city follows a normal distribution whose standard deviation is unknown. The sample provides the following values of temperature (in degrees Celsius):
40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6,
50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2
H0: μ = 40  H1: μ ≠ 40
t.test(temp, y = NULL, alternative = "two.sided",
mu = 40, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)
One Sample t-test
data: temp
t = 1.4476, df = 19, p-value = 0.164
alternative hypothesis: true mean is not equal
to 40
95 percent confidence interval:
39.12384 44.80616
sample estimates:
mean of x
41.965
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40 H1: μ < 40
t.test(temp, y = NULL, alternative = "less",
mu = 40, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)
One Sample t-test
data: temp
t = 1.4476, df = 19, p-value = 0.918
alternative hypothesis: true mean is less than
40
95 percent confidence interval:
-Inf 44.3122
sample estimates:
mean of x
41.965
Hypothesis Testing for μ (σ unknown): Example in R
H0: μ = 40 H1: μ > 40
t.test(temp, y = NULL, alternative =
"greater", mu = 40, paired = FALSE, var.equal
= FALSE, conf.level = 0.95)
One Sample t-test
data: temp
t = 1.4476, df = 19, p-value = 0.08202
alternative hypothesis: true mean is greater
than 40
95 percent confidence interval:
39.6178 Inf
sample estimates:
mean of x
41.965
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference
Testing of Hypothesis
Lecture 68
Two Sample Test for Mean with Known and
Unknown Variances
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
We want to conduct a statistical hypothesis test of the difference of means.
• For women shoppers chosen at random in a super market 'A', the average weekly food expenditure is Rs. 250 with a s.d. of Rs. 40.
• For 400 women shoppers chosen at random in some other super market 'B', the average weekly food expenditure is Rs. 220 with a s.d. of Rs. 55.
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Do these two populations have similar shopping habits?
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• Suppose we have two samples
• The first sample is drawn from a population with mean μ1 and variance σ1².
• x̄1 is the mean and n1 is the size of the 1st sample.
• The second sample is drawn from a population with mean μ2 and variance σ2².
• x̄2 is the mean and n2 is the size of the 2nd sample.
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances)
• To test:
– H0: μ1 = μ2  H1: μ1 ≠ μ2
– H0: μ1 = μ2  H1: μ1 > μ2
– H0: μ1 = μ2  H1: μ1 < μ2
• Since the population means μ1 and μ2 are unknown, we use the statistic x̄1 − x̄2 to draw conclusions about μ1 − μ2.
• Assumptions: both populations are normal with known variances σ1² and σ2².
• Test Statistic:
E(x̄1 − x̄2) = μ1 − μ2 and Var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2
Thus Zc = (x̄1 − x̄2 − 0) / √(σ1²/n1 + σ2²/n2) = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2) when H0 is true.
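The Zc formula above translates directly into R. The slide does not state the sample size for super market 'A', so the value `n1 = 500` below is a hypothetical number used only to illustrate the computation.

```r
# Two-sample z statistic with known population variances
two_sample_z <- function(xbar1, xbar2, sd1, sd2, n1, n2) {
  (xbar1 - xbar2) / sqrt(sd1^2 / n1 + sd2^2 / n2)
}

# Supermarket example; n1 = 500 is hypothetical (not given on the slide)
zc <- two_sample_z(250, 220, 40, 55, n1 = 500, n2 = 400)
abs(zc) > qnorm(0.975)   # TRUE for these illustrative numbers: reject H0
```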
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Classical Gauss Test
install.packages("compositions")
library(compositions)
Usage
Gauss.test(x, y, mean=0, sd=1, alternative =
c("two.sided", "less", "greater"))
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) in R
Arguments
x : a numeric vector providing the first dataset
y : the second dataset
mean : the mean to compare with
sd : the known standard deviation
alternative : the alternative to be used in the test
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
Following are the gain in weights (in grams) of fishes fed on two
diets A and B.
A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22
Suppose the standard deviation is known to be 2.
We want to test if the two diets differ significantly as regards their effect on increase in weight.
H0: μ1 = μ2  H1: μ1 ≠ μ2
Testing Hypothesis for Difference of Means (Independent Samples:
Known Population Variances) : Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb =
c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
Gauss.test(xa, xb, mean=0, sd=2, alternative =
c("two.sided"))
one sample Gauss-test
data: xa
T = -2, mean = 0, sd = 2, p-value = 0.009823
alternative hypothesis: two.sided
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• When σ1 and σ2 both are known, we use N(0, 1) distribution.
• Assumptions:
– Both populations are normal.
– Unknown σ1 and σ2 are equal.
• For large sample sizes, we can still use the N(0, 1) distribution.
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances) :
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2  H1: μ1 > μ2
– H0: μ1 = μ2  H1: μ1 < μ2
• Let
x1, x2, …, xm ~ N(μ1, σ1²) and
y1, y2, …, yn ~ N(μ2, σ2²)
• Test Statistic:
E(x̄ − ȳ) = μ1 − μ2 and
S² = (1/(m + n − 2)) [ Σᵢ₌₁ᵐ (xᵢ − x̄)² + Σⱼ₌₁ⁿ (yⱼ − ȳ)² ]
Thus Tc = (x̄ − ȳ − 0) / (S √(1/m + 1/n)) = (x̄ − ȳ) / (S √(1/m + 1/n)) when H0 is true.
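The pooled statistic above can be computed by hand for the fish-diet data used in this lecture and checked against `t.test(..., var.equal = TRUE)`; this is an illustrative sketch.

```r
# Pooled two-sample t statistic for the fish-diet data, computed by hand
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
m <- length(xa); n <- length(xb)
s2 <- (sum((xa - mean(xa))^2) + sum((xb - mean(xb))^2)) / (m + n - 2)  # pooled variance
tc <- (mean(xa) - mean(xb)) / (sqrt(s2) * sqrt(1/m + 1/n))
tc    # about -0.61028, matching t.test(xa, xb, var.equal = TRUE)
```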
A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22
Suppose the standard deviation is unknown.
(Note that now the sd is unknown; earlier it was known.)
H0: μ1 = μ2  H1: μ1 ≠ μ2
Testing Hypothesis for Difference of Means (Independent Samples:
Unknown Population Variances): Example in R
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
t.test(xa, xb, alternative = "two.sided", mu = 0,
paired = FALSE, var.equal = TRUE, conf.level = 0.95)
Two Sample t-test
data: xa and xb
t = -0.61028, df = 25, p-value = 0.5472
alternative hypothesis: true difference in means is not
equal to 0
95 percent confidence interval:
-8.749507 4.749507
sample estimates:
mean of x mean of y
       28        30
Testing Hypothesis for Difference of Means (Dependent
Samples):
Assume a company sends its salespeople to a "customer service" training workshop. They want to know whether the training has made a difference in the number of complaints.
The following data are collected:
Salesperson | Complaints before | Complaints after
C.B. | 6 | 4
T.F. | 20 | 6
M.H. | 3 | 2
R.K. | 0 | 0
M.O. | 4 | 0
Testing Hypothesis for Difference of Means (Dependent
Samples):
• To test,
– H0: μ1 = μ2 H1: μ1 ≠ μ2
– H0: μ1 = μ2  H1: μ1 > μ2
– H0: μ1 = μ2  H1: μ1 < μ2
• Let
x1, x2, …, xn ~ N(μ1, σ1²) and
y1, y2, …, yn ~ N(μ2, σ2²)
Testing Hypothesis for Difference of Means (Dependent
Samples):
• The two samples are not independent.
• Observations xi and yi are taken on the same i-th individual.
• Clearly, the sizes of both samples should be the same.
• The population means μ1 and μ2 are unknown; we use the statistic x̄ − ȳ to draw conclusions about μ1 − μ2.
• Even if σ1 and σ2 are known, we still cannot obtain the population variance of the di's (the samples are not independent).
• So, this test can be considered as a one sample t-test based on the differences di:
Tc = d̄ / (s_d/√n)
where d̄ = (1/n) Σᵢ₌₁ⁿ dᵢ and s_d² = (1/(n − 1)) Σᵢ₌₁ⁿ (dᵢ − d̄)².
• Under our assumptions, Tc ~ t(n−1).
Testing Hypothesis for Difference of Means (Dependent
Samples):
Usage
t.test(x, y, alternative = c("two.sided",
"less", "greater"), mu = 0, paired = TRUE,
var.equal = TRUE, conf.level = 0.95)
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
x a (non‐empty) numeric vector of data values.
y a (non‐empty) numeric vector of data values.
alternative a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less".
mu a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
Testing Hypothesis for Difference of Means (Dependent
Samples):
Arguments
var.equal a logical variable indicating whether to treat the two
variances as being equal. If TRUE then the pooled variance is used
to estimate the variance, otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level confidence level of the interval.
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Assume a company sends its salespeople to a "customer service" training workshop. They want to know whether the training has made a difference in the number of complaints.
The following data are collected:
Salesperson | Before (1) | After (2) | Difference di (2 − 1)
C.B. | 6 | 4 | −2
T.F. | 20 | 6 | −14
M.H. | 3 | 2 | −1
R.K. | 0 | 0 | 0
M.O. | 4 | 0 | −4
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
• To test H0: μ1 = μ2  H1: μ1 ≠ μ2
Tc = d̄ / (s_d/√n) = 4.2 / (5.67/√5) = 1.66
• The critical values are ±4.604; since −4.604 < 1.66 < 4.604, the computed value falls in the acceptance region.
• Decision: Do not reject H0
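The d̄, s_d, and Tc values above can be verified by hand in R before calling `t.test()`; this is an illustrative sketch using the complaints data.

```r
# Paired t statistic computed from the differences by hand
ya <- c(6, 20, 3, 0, 4)   # complaints before
yb <- c(4, 6, 2, 0, 0)    # complaints after
d  <- ya - yb             # differences
tc <- mean(d) / (sd(d) / sqrt(length(d)))
tc    # about 1.655, matching t.test(ya, yb, paired = TRUE)
```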
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
ya = c(6,20,3,0,4)
yb = c(4,6,2,0,0)
t.test(ya, yb, alternative = c("two.sided"), mu
= 0, paired = T)
Testing Hypothesis for Difference of Means (Dependent
Samples): Example in R
Paired t-test
data: ya and yb
t = 1.655, df = 4, p-value = 0.1733
alternative hypothesis: true difference in
means is not equal to 0
95 percent confidence interval:
-2.845828 11.245828
sample estimates:
mean of the differences
4.2
Essentials of Data Science With R
Software ‐ 1
Probability Theory and Statistical Inference
Testing of Hypothesis
Lecture 69
Test of Hypothesis for Variance in One and Two
Samples
Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
Hypothesis Testing for σ2 :
• Suppose a company's existing machine fills syrup bottles whose volume of medicine has a standard deviation of 1.6 mL.
• The new machine they are considering was tested on 30 bottles, producing a batch with a standard deviation of 1.2 mL.
Hypothesis Testing for σ2 :
• Assumptions:
– Population is normal.
• Test Statistic:
χ²c = Σᵢ₌₁ⁿ (xᵢ − x̄)² / σ² = (n − 1)s² / σ²
• The distribution of the above test statistic is chi square with (n − 1) degrees of freedom.
• Critical values are obtained from the chi square table for the given level of significance and d.f.
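For the syrup-bottle example above, the χ²c statistic can be computed directly; this sketch is illustrative and frames the test as left tailed (checking whether the new machine has a smaller variance).

```r
# Chi-squared statistic for the syrup-bottle example
n      <- 30    # bottles tested on the new machine
s      <- 1.2   # sample s.d. of the new machine (mL)
sigma0 <- 1.6   # hypothesized s.d. from the old machine (mL)

chi2c <- (n - 1) * s^2 / sigma0^2
chi2c                         # 16.3125
chi2c < qchisq(0.05, n - 1)   # TRUE: falls in the left critical region at 5%
```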
Hypothesis Testing for σ2 :
• For two tail test (H0: σ = σ0, H1: σ ≠ σ0):
[Figure: acceptance region (1 − α) between the critical values χ²(1−α/2) and χ²(α/2); critical regions (α/2) in each tail.]
• This distribution is not symmetric.
• So, we have different notations for the critical values.
• The area to the right of the point is mentioned in the subscript, e.g. χ²(1−α/2).
Hypothesis Testing for σ2 :
• For right tail test (H0: σ = σ0, H1: σ > σ0):
[Figure: acceptance region (1 − α) to the left of χ²(α); critical region (α) in the right tail.]
• We reject H0 in favor of H1 at the α × 100% level if χ²c > χ²(α).
Hypothesis Testing for σ2 :
• For left tail test (H0: σ = σ0, H1: σ < σ0):
[Figure: acceptance region (1 − α) to the right of χ²(1−α); critical region (α) in the left tail.]
• We reject H0 in favor of H1 at the α × 100% level if χ²c < χ²(1−α).
One‐Sample Chi‐Squared Test on Variance :
Estimate the variance, test the null hypothesis using the chi‐
squared test that the variance is equal to a user‐specified value,
and create a confidence interval for the variance.
install.packages("EnvStats")
library(EnvStats)
Usage
varTest(x, alternative = "two.sided",
conf.level = 0.95, sigma.squared = 1)
One‐Sample Chi‐Squared Test on Variance:
Arguments
x numeric vector of observations.
alternative character string specifying the alternative hypothesis; possible values are "two.sided" (the default), "greater", and "less".
conf.level numeric scalar between 0 and 1 indicating the
NP
confidence level associated with the confidence interval for the
population variance. The default value is conf.level=0.95.
Hypothesis Testing for σ2 : Example in R
Suppose the temperature in a city follows a normal distribution whose variance is unknown. The sample provides the following values of temperature (in degrees Celsius):
40.2, 32.8, 38.2, 43.5, 47.6, 36.6, 38.4, 45.5, 44.4, 40.3, 34.6, 55.6,
50.9, 38.9, 37.8, 46.8, 43.6, 39.5, 49.9, 34.2
• We want to test: H0: σ2 = 36, H1: σ2 ≠ 36, two tail test, α = 5%
varTest(temp, alternative = "two.sided", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.8566
alternative hypothesis: true variance is not equal
to 36
95 percent confidence interval:
21.31373 78.61721
sample estimates:
variance
36.85292
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 > 36, right tail test, α = 5%
varTest(temp, alternative = "greater", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.4283
alternative hypothesis: true variance is greater
than 36
95 percent confidence interval:
23.22905 Inf
sample estimates:
variance
36.85292
Hypothesis Testing for σ2 : Example in R
• We want to test: H0: σ2 = 36, H1: σ2 < 36, left tail test, α = 5%
varTest(temp, alternative = "less", conf.level =
0.95, sigma.squared = 36)
Chi-Squared Test on Variance
data: temp
Chi-Squared = 19.45, df = 19, p-value = 0.5717
alternative hypothesis: true variance is less than
36
95 percent confidence interval:
0.00000 69.21069
sample estimates:
variance
36.85292
Testing the Hypothesis for Difference of Variances:
We have two populations with variances 𝝈𝟐𝟏 and 𝝈𝟐𝟐
The variances σ1² and σ2² are unknown, so we take a sample from each population and use the sample variances to study the population variances.
We wish to test:
H0: σ1² = σ2²  H1: σ1² ≠ σ2²
Let x1, x2, …, xm ~ N(μ1, σ1²) and
y1, y2, …, yn ~ N(μ2, σ2²).
Test Statistic:
Fc = [ (1/(m − 1)) Σᵢ₌₁ᵐ (xᵢ − x̄)² ] / [ (1/(n − 1)) Σⱼ₌₁ⁿ (yⱼ − ȳ)² ]
Fc ~ F(m−1, n−1) when the null hypothesis is true.
Critical values are obtained using the F table for the given d.f. and level of significance.
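Fc is simply the ratio of the two sample variances, which can be checked by hand against `var.test()` for the fish-diet data used later in this lecture; this is an illustrative sketch.

```r
# F statistic for comparing the two diet variances, computed by hand
xa <- c(25,32,30,34,24,14,32,24,30,31,35,25)
xb <- c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
fc <- var(xa) / var(xb)   # var() uses the 1/(n-1) divisor
fc                        # about 0.343, matching var.test(xa, xb)
```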
Testing the Hypothesis for Difference of Variances:
For a two tail test (H0: σ1² = σ2²  H1: σ1² ≠ σ2²):
[Figure: acceptance region (1 − α) between F(1−α/2) and F(α/2); critical regions (α/2) in each tail.]
Reject H0 if Fc > F(α/2) or Fc < F(1−α/2); the critical values are obtained using the F table for the given d.f. and level of significance.
Testing the Hypothesis for Difference of Variances:
For a right tail test (H0: σ1² = σ2²  H1: σ1² > σ2²):
[Figure: acceptance region (1 − α) to the left of F(α); critical region (α) in the right tail. Reject H0 if Fc > F(α).]
Testing the Hypothesis for Difference of Variances:
For a left tail test (H0: σ1² = σ2²  H1: σ1² < σ2²):
[Figure: acceptance region (1 − α) to the right of F(1−α); critical region (α) in the left tail. Reject H0 if Fc < F(1−α).]
Testing the Hypothesis for Difference of Variances In R:
Usage
var.test(x, ...)
var.test(x, y, ratio = 1, alternative =
c("two.sided", "less", "greater"),
conf.level = 0.95)
Testing the Hypothesis for Difference of Variances In R:
Arguments
x, y numeric vectors of data values.
ratio the hypothesized ratio of the population variances of x and y.
alternative a character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "less".
Testing the Hypothesis for Difference of Variances: Example in R
Following are the gain in weights (in grams) of fishes fed on two diets
A and B. Assume normal populations.
A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22
We want to test if the variances of the two diets differ significantly as regards their effect on increase in weight.
xa = c(25,32,30,34,24,14,32,24,30,31,35,25)
xb = c(44,34,22,10,47,31,40,30,32,35,18,21,35,29,22)
Testing the Hypothesis for Difference of Variances: Example in R
var.test(xa, xb, ratio = 1, alternative =
"two.sided", conf.level = 0.95)
F test to compare two variances
data: xa and xb
F = 0.343, num df = 11, denom df = 14, p-value =
0.08144
alternative hypothesis: true ratio of variances is
not equal to 1
95 percent confidence interval:
0.1108401 1.1520871
sample estimates:
ratio of variances
0.3430045