Professional Documents
Culture Documents
Stat Deck4
Stat Deck4
harishram@hotmail.com
KDUE73YTIL
Introduction to Statistics
● Small Sample
harishram@hotmail.com Test
KDUE73YTIL
○ One Sample
○ Two Sample
Testing of
Hypothesis
harishram@hotmail.com
KDUE73YTIL
Large
Sample Test
harishram@hotmail.com
KDUE73YTIL
2 3
4 6
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests
● The two sample tests are used to compare the equality of means of two populations
● Used to
harishram@hotmail.com
KDUE73YTIL
compare:
● Let there be two different populations such that they follow normal distribution
● The two
harishram@hotmail.com
KDUE73YTIL
samples are independent of each other
● The samples must have equal variance (it can be tested statistically)
H0: The variances are equal against H1: The variances are not equal
harishram@hotmail.com
KDUE73YTIL
● Failing to reject H0, implies that the two populations have equal variance
● Let there be two samples of sizes n1 and n2 drawn from normal population N(µ1, σ12)
and N(µ2, σ22) respectively
harishram@hotmail.com
● The hypothesis
KDUE73YTIL to test whether the population means are equal
H0 : µ1 = µ2 against H1 : µ1 ≠ µ1
● It implies
against H1: The two population means are not equal µ0 (i.e µ1 ≠ µ2)
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test - hypothesis
● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively
harishram@hotmail.com
● The test
KDUE73YTIL statistic is Z given by
where
Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL
A study was carried out to understand amount of hemoglobin in blood for males and
harishram@hotmail.com
females. A random
KDUE73YTIL sample of 160 males and 180 females have means of 13 g/dl and 15
g/dl. The two samples have standard deviation of 4.1 g/dl for male donors and 3.5 g/dl for
female donor . Can it be said the population means of hemoglobin are the same for men
and women? Use α = 0.01.
harishram@hotmail.com
KDUE73YTIL
i.e. If test statistic is less than -2.58 or greater than 2.58 then we reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Python solution: Calculate test statistic
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
● Under the null hypothesis the test statistic follow exact distribution - t distribution, F
harishram@hotmail.com
KDUE73YTIL
distribution or χ2 distribution
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
● If σ2 is
harishram@hotmail.com
KDUE73YTIL
unknown, use the t-test
● The t distribution
harishram@hotmail.com
KDUE73YTIL
is a symmetric distribution
about the mean
● It is leptokurtic
harishram@hotmail.com
KDUE73YTIL
Degrees of freedom (d.f.) is the number of independent variables, i.e. they are free to vary.
Example:
harishram@hotmail.com
KDUE73YTIL
Consider a system of equations:
● Y=8+5Z
● Z = X + 10
Say if the value of Z is known, it is possible to find the value of other variables. Thus the
d.f. is 1.
● The hypothesis to test whether the population mean is equal to a specified value
when σ is unknown
harishram@hotmail.com H0 : µ = µ0 against H1 : µ ≠ µ0
KDUE73YTIL
● It implies
H0 : µ ≤ µ0 against H1 : µ > µ0
harishram@hotmail.com
KDUE73YTIL
Or
H0 : µ ≥ µ0 against H1 : µ < µ0
harishram@hotmail.com
KDUE73YTIL
or
● Under H0, the test statistic t follows t distribution with n-1 degrees of freedom
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a t test for population mean is
harishram@hotmail.com
KDUE73YTIL
scipy.stats.ttest_1samp(Sample, popmean)
Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL
The researcher collected 10 water samples of unit volume and found the mean
bacteria count to be 94.8 with a sample variance of 72.66. Does the data indicate
that the bacteria count is within the safety level? Test at the α = .05 level.
Assume that the measurements constitute a sample from a normal population.
The mean bacteria count of 100 per unit volume of water is within the safety level.
harishram@hotmail.com
i.e. µ = 100
KDUE73YTIL
The researcher collected 10 water samples of unit volume and found the mean bacteria
count to be 94.8 with a sample variance of 72.66.
i.e. n = 10, = 94.8 and s2 = 72.66
harishram@hotmail.com
KDUE73YTIL
As t-distribution is symmetric, for a left-tailed test if test statistic is less than -1.83, then we
reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Python solution: Calculate test statistic
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
● Like the two sample Z test, the two sample t test are used to compare the equality of
means of two populations for unpaired data
harishram@hotmail.com
KDUE73YTIL
● For unpaired data, test relative performance two machineries, investment portfolios
● Let there
harishram@hotmail.com
KDUE73YTIL
be two different populations such that they follow normal distribution
● Two samples of sizes n1 and n2 drawn from normal population N(µ1, σ12) and N(µ2,
σ22) respectively
H0 : µ1 = µ2 against H1 : µ1 ≠ µ1
harishram@hotmail.com
●
KDUE73YTIL
It implies
against H1: The two population means are not equal µ0 (i.e µ1 ≠ µ2)
H0 : µ1 ≤ µ2 against H1 : µ1 > µ2 Or H0 : µ1 ≥ µ2
This file is meant for personal use by harishram@hotmail.com only.
against H1 : µ1 < µ2 Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic
● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic
● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively
● The pooled
harishram@hotmail.com
KDUE73YTIL
sample variance (s2) is used
● Failing to reject H0 implies that the two population means are equal (i.e µ1 = µ2)
scipy.stats.ttest_ind(Sample_1, Sample_2)
An experiment was conducted to compare the pain relieving hours of two new medicines.
harishram@hotmail.com
Two groupsof 14 and 15 patients were selected and were given comparable doses. Group
KDUE73YTIL
1 was given medicine 1 and group 2 was given medicine 2. Following data is obtained from
the two samples. Test whether the two populations give the same mean hours of relief.
Assume the data comes from normal distribution has equal variance. [Use α = 0.01]
Medicine 1 Medicine 2
harishram@hotmail.com
KDUE73YTIL
If test statistic is less than -2.77 or greater than 2.77, then we reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Python solution: Calculate test statistic
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
● The observations are recorded on the same individual/item twice resulting in pairs of
harishram@hotmail.com
KDUE73YTIL observations
Example:
A energy drink manufacturing company wants to test if sales increase after they advertise
the drink with a life sized picture of a well-known athlete.
H0 : µd ≤ µ0 against H1 : µd > µ0
H0 : µd ≥ µ0 against H1 : µd < µ0
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for paired data
● Under H0, the test statistic follows t-distribution with n-1 degrees of freedom
scipy.stats.ttest_rel(Sample_1, Sample_2)
An energy drink distributor claims that a new advertisement poster, featuring a life-size
harishram@hotmail.com
picture of a
well-known athlete. For a random sample of 10 outlets, the following data was
KDUE73YTIL
collected. Test that the null hypothesis that there the advertisement was effective in
increasing the sales
Before 33 32 38 45 37 47 48 41 45
After 42 35 31 41 37 36 49 49 48
To test,
harishram@hotmail.com
H0: The advertisement was not effective against H1: The advertisement was
KDUE73YTIL
effective ( µd ≤ 0)
(µd > 0)
Compute di
Before 33 32 38 45 37 47 48 41 45
After 42 35 31 41 37 36 49 49 48
To test,
harishram@hotmail.com
H0: The advertisement was not effective against H1: The advertisement was
KDUE73YTIL
effective ( µd ≤ 0)
(µd > 0)
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
The test statistic (=0.1) < critical value (=1.86), also the p-value > 0.05, thus we fail to reject
H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Test for Population Proportion
harishram@hotmail.com
KDUE73YTIL
Test for
proportion
harishram@hotmail.com
KDUE73YTIL
H0 : P = P0 against H1 : P ≠ P0
harishram@hotmail.com
●
KDUE73YTIL
It implies
Sample size
Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL
From a sample 361 business owners had gone into bankruptcy due to recession. On taking
harishram@hotmail.com
a survey, it
was found that 105 of them had not consulted any professional for managing
KDUE73YTIL
their finance before opening the business. Test the null hypothesis that at most 25% of all
businesses had not consulted before opening the business.
From a sample 361 business owners had gone into bankruptcy due to recession.
harishram@hotmail.com
i.e. n = 361
KDUE73YTIL
On taking a survey, it was found that 105 of them had not consulted any professional for
managing their finance before opening the business.
To test: The null hypothesis that at most 25% of all businesses had not consulted before
harishram@hotmail.com
opening the
business
KDUE73YTIL
Here P0 = 0.25
We may conclude that at least 25% of all businesses had not consulted before starting the
business.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Python solution: Calculate test statistic
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
Test for
proportion
harishram@hotmail.com
KDUE73YTIL
● Let there be two samples sizes n1 and n2 from different populations of such that x1 and
x2 are the number of specific items in each of them respectively
harishram@hotmail.com
KDUE73YTIL
● Suppose these samples have proportions of specific items p1 and p2 respectively
● To test the equality of population proportion from which these samples are chosen
H0 : P1 = P2 against H1 : P1 ≠ P2
harishram@hotmail.com
●
KDUE73YTIL
It implies
against H1: The two population proportions are not equal (P1 ≠ P2)
● Failing to reject H0 implies that the two population proportions are equal
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Test for proportions
Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL
Steve owns a kiosk where sells two magazines - A and B in a month. He buys 100 copies
harishram@hotmail.com
of magazineA out of which 78 were sold and 70 copies of magazine B out of which 65
KDUE73YTIL
were sold. Is there enough evidence to say that magazine is B is more popular?
Out of 100 copies of magazine A 78 are sold Out of 70 copies of magazine B 65 are sold
Here, x1 = 78 and n1 = 100 Here, x2 = 65 and n1 = 70
Let p1 be the proportion of sell of magazine A Let p2 be the proportion of sell of magazine B
p1= x1/n1 = 78/100 = 0.78 p2 = x2/n2 = 65/70 = 0.928
-2.59
harishram@hotmail.com
KDUE73YTIL
harishram@hotmail.com
KDUE73YTIL
● The times when these assumptions are not satisfied, use the non-parametric tests