Download as pdf or txt
Download as pdf or txt
You are on page 1of 93

Parametric Tests

harishram@hotmail.com
KDUE73YTIL

Introduction to Statistics

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Agenda

● Large Sample Test


○ Two Sample

● Small Sample
harishram@hotmail.com Test
KDUE73YTIL
○ One Sample
○ Two Sample

● Test for Population Proportion


○ One Sample
○ Two Sample

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
In this session…

Testing of
Hypothesis
harishram@hotmail.com
KDUE73YTIL

Large sample tests Small sample tests Tests for proportion

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Large Sample Tests
harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Large sample tests

Large
Sample Test
harishram@hotmail.com
KDUE73YTIL

One Sample Two Sample

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Large sample tests

Number of Defective Number of Defective


● One sample Items from Items from
Production Line A Production Line B
harishram@hotmail.com
1 5
KDUE73YTIL Can we say that
3 8 Production Line B
● Two sample
produces less defectives
5 2
than Production Line A?
6 5

2 3

4 6
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests

● The two sample tests are used to compare the equality of means of two populations

● Used to
harishram@hotmail.com
KDUE73YTIL
compare:

○ Performance of two machineries

○ Performance of two portfolios

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for population mean

● Let there be two different populations such that they follow normal distribution

● The two
harishram@hotmail.com
KDUE73YTIL
samples are independent of each other

● The samples must have equal variance (it can be tested statistically)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Test for equality of variances
● The hypothesis to test whether two populations have equal variances

H0: The variances are equal against H1: The variances are not equal
harishram@hotmail.com
KDUE73YTIL

● Failing to reject H0, implies that the two populations have equal variance

● The Levene's test used

Note: The python code for levene’s test scipy.stats.levene()


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test - hypothesis

● Let there be two samples of sizes n1 and n2 drawn from normal population N(µ1, σ12)
and N(µ2, σ22) respectively

harishram@hotmail.com
● The hypothesis
KDUE73YTIL to test whether the population means are equal

H0 : µ1 = µ2 against H1 : µ1 ≠ µ1

● It implies

H0: The two population means are equal (i.e µ1 = µ2)

against H1: The two population means are not equal µ0 (i.e µ1 ≠ µ2)
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test - hypothesis

● Like the one sample tests, it is possible to test for

H0 : µ1 ≤ µ2 against H1: µ1 > µ2


harishram@hotmail.com
KDUE73YTIL
Or

H0 : µ1 ≥ µ2 against H1: µ1 < µ2

● Failing to reject H0 implies that the null hypothesis is true

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic

● The test statistic is Z given by

Sample mean Specified mean


harishram@hotmail.com
KDUE73YTIL

● Under H0, the test statistic follows normal distribution

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic

● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively

harishram@hotmail.com
● The test
KDUE73YTIL statistic is Z given by

Sample mean Specified mean

where

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a Z test for two population means is
harishram@hotmail.com
KDUE73YTIL
statsmodels.stats.weightstats.ztest(Sample_1, Sample_2, value,
alternative)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - decision rule

Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL

For two tailed test µ1 ≠ µ2 Reject H0 if |Z|≥ Zα/2


Reject H0 if p-value
Reject H0 if µ1 - µ2
is less than or
For left tailed test µ1 < µ2 Reject H0 if Z ≤ Zα does not lie in the
equal to the level
confidence interval
of significance
For right tailed test µ1 > µ2 Reject H0 if Z ≥ Zα

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Question:

A study was carried out to understand amount of hemoglobin in blood for males and
harishram@hotmail.com
females. A random
KDUE73YTIL sample of 160 males and 180 females have means of 13 g/dl and 15
g/dl. The two samples have standard deviation of 4.1 g/dl for male donors and 3.5 g/dl for
female donor . Can it be said the population means of hemoglobin are the same for men
and women? Use α = 0.01.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Solution:

X: Amount of hemoglobin in blood for males


harishram@hotmail.com
Y: Amount of
hemoglobin in blood for females
KDUE73YTIL

Here n1 = 160, s1 = 4.1, = 13


n2 = 180, s2 = 3.5, = 15

To test H0: µ1 = µ2 against H1: µ1 ≠ µ2

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Solution:

The test statistic


harishram@hotmail.com
KDUE73YTIL

Decision Rule: Reject |Zcalc| ≥ Zα/2

Here Zα/2 = 2.58

Since 4.807 > 2.58, reject H0. -2.58 2.58

We may conclude that both males and females have


This file is meant for personal use by harishram@hotmail.com only.
different hemoglobin averages
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Python solution: Calculate critical z-value

harishram@hotmail.com
KDUE73YTIL

i.e. If test statistic is less than -2.58 or greater than 2.58 then we reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Python solution: Calculate test statistic

harishram@hotmail.com
KDUE73YTIL

As test statistic (=-4.8068) < critical value (=-2.58), we reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for population mean (σ unknown)
Python solution: Calculate p-value

harishram@hotmail.com
KDUE73YTIL

As p-value < 0.01, we reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Small Sample Tests
harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Small sample tests

● For tests n < 30 are considered to be small sample tests

● Under the null hypothesis the test statistic follow exact distribution - t distribution, F
harishram@hotmail.com
KDUE73YTIL
distribution or χ2 distribution

● Small sample tests are regarded as exact tests

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Small sample tests

harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Small sample tests

harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample tests

● To test for population mean, if σ2 is known the Z test is used

● If σ2 is
harishram@hotmail.com
KDUE73YTIL
unknown, use the t-test

● The t-tests are based on the t-distribution

● The unknown σ is replaced by sample standard deviation (s)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The t-distribution

The t distribution for different values of df


● It is based on the degrees of freedom

● The t distribution
harishram@hotmail.com
KDUE73YTIL
is a symmetric distribution
about the mean

● Let n denote the degrees of freedom

● Its mean is 0 and variance is (for n>2)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The t-distribution

● It is leptokurtic
harishram@hotmail.com
KDUE73YTIL

● For large n it is mesokurtic

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Degrees of freedom

Degrees of freedom (d.f.) is the number of independent variables, i.e. they are free to vary.

Example:
harishram@hotmail.com
KDUE73YTIL
Consider a system of equations:
● Y=8+5Z
● Z = X + 10

Say if the value of Z is known, it is possible to find the value of other variables. Thus the
d.f. is 1.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test - hypothesis

● The hypothesis to test whether the population mean is equal to a specified value
when σ is unknown

harishram@hotmail.com H0 : µ = µ0 against H1 : µ ≠ µ0
KDUE73YTIL

● It implies

H0: The population mean is equal to µ0 (i.e µ = µ0)

against H1: The population mean is not equal to µ0 (i.e µ ≠ µ0)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test - hypothesis

● Like the one sample t- tests, it is possible to test for

H0 : µ ≤ µ0 against H1 : µ > µ0
harishram@hotmail.com
KDUE73YTIL
Or

H0 : µ ≥ µ0 against H1 : µ < µ0

● Failing to reject H0 implies that the null hypothesis is true

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test - test statistic

● The test statistic t given by

harishram@hotmail.com
KDUE73YTIL

s2 is the sample estimate of population variance and is given by

or

● Under H0, the test statistic t follows t distribution with n-1 degrees of freedom
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a t test for population mean is
harishram@hotmail.com
KDUE73YTIL
scipy.stats.ttest_1samp(Sample, popmean)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Decision rule

Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL

For two tailed test µ ≠ µ0 Reject H0 if |t| ≥ tdf,α/2


Reject H0 if p-value
Reject H0 if µ0
is less than or
For left tailed test µ < µ0 Reject H0 if t ≤ -tdf,α does not lie in the
equal to the level
confidence interval
of significance
For right tailed test µ > µ0 Reject H0 if t ≥ tdf,α

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Question:

A researcher is studying the growth of bacteria in waters of Lake Beach. The


harishram@hotmail.com
mean bacteria count of 100 per unit volume of water is within the safety level.
KDUE73YTIL

The researcher collected 10 water samples of unit volume and found the mean
bacteria count to be 94.8 with a sample variance of 72.66. Does the data indicate
that the bacteria count is within the safety level? Test at the α = .05 level.
Assume that the measurements constitute a sample from a normal population.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Solution:

The mean bacteria count of 100 per unit volume of water is within the safety level.
harishram@hotmail.com
i.e. µ = 100
KDUE73YTIL

Let X: bacteria count per unit volume

The researcher collected 10 water samples of unit volume and found the mean bacteria
count to be 94.8 with a sample variance of 72.66.
i.e. n = 10, = 94.8 and s2 = 72.66

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Solution:

To test: The bacteria count is within the safety level


harishram@hotmail.com
KDUE73YTILH0 : µ ≥
i.e. 100 against H1 : µ < 100

The test statistic is

Here t follows t9,0.05


-1.833
From the table we have t9,0.05 = -1.833
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Solution:

tcritical (-1.833) > tcalc (-1.929)


harishram@hotmail.com
KDUE73YTIL
Since the critical value is less than the calculated
statistic value.
-1.833
We reject the H0, and can conclude that the
average bacteria per unit volume (true mean) is
within the safety levels.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Python solution: Calculate critical t-value

harishram@hotmail.com
KDUE73YTIL

As t-distribution is symmetric, for a left-tailed test if test statistic is less than -1.83, then we
reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Python solution: Calculate test statistic

harishram@hotmail.com
KDUE73YTIL

As test statistic (=-1.929) < critical value (=-1.83), we reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample t-test
Python solution: Calculate p-value

harishram@hotmail.com
KDUE73YTIL

As p-value < 0.05, we reject H0.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Small sample tests

harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests

● Like the two sample Z test, the two sample t test are used to compare the equality of
means of two populations for unpaired data

harishram@hotmail.com
KDUE73YTIL
● For unpaired data, test relative performance two machineries, investment portfolios

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for unpaired data

● To test population means of two populations

● Let there
harishram@hotmail.com
KDUE73YTIL
be two different populations such that they follow normal distribution

● Two samples of sizes n1 and n2 drawn from normal population N(µ1, σ12) and N(µ2,
σ22) respectively

● The two samples are independent of each other

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test - hypothesis

● The hypothesis to test whether the population means are equal

H0 : µ1 = µ2 against H1 : µ1 ≠ µ1
harishram@hotmail.com

KDUE73YTIL
It implies

H0: The two population means are equal (i.e µ1 = µ2)

against H1: The two population means are not equal µ0 (i.e µ1 ≠ µ2)

● The one sided hypothesis are

H0 : µ1 ≤ µ2 against H1 : µ1 > µ2 Or H0 : µ1 ≥ µ2
This file is meant for personal use by harishram@hotmail.com only.
against H1 : µ1 < µ2 Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic

● The test statistic is Z given by


σ12 and σ22 are
known
harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic

● The test statistic is Z given by


σ12 and σ22 are σ12 and σ22 are
known unknown
harishram@hotmail.com
KDUE73YTIL

Where s2 is the pooled sample variance given by

● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests - test statistic

● If σ12 and σ22 are not known then they are replaced the sample estimates s12 and s22
respectively

● The pooled
harishram@hotmail.com
KDUE73YTIL
sample variance (s2) is used

● Under H0, the test statistic follows t distribution with n1 + n2 - 2 df

● Failing to reject H0 implies that the two population means are equal (i.e µ1 = µ2)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a t test for two population means which are not
harishram@hotmail.com
KDUE73YTIL paired is

scipy.stats.ttest_ind(Sample_1, Sample_2)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Question:

An experiment was conducted to compare the pain relieving hours of two new medicines.
harishram@hotmail.com
Two groupsof 14 and 15 patients were selected and were given comparable doses. Group
KDUE73YTIL
1 was given medicine 1 and group 2 was given medicine 2. Following data is obtained from
the two samples. Test whether the two populations give the same mean hours of relief.
Assume the data comes from normal distribution has equal variance. [Use α = 0.01]

Medicine 1 Medicine 2

Mean of hours of relief 6.4 7.3

S.D of hours of relief 1.4 1.5


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Solution:

Let X: patients receiving medicine 1


harishram@hotmail.com
KDUE73YTILY: patients
receiving medicine 2

To test H0: µ1 = µ2 against H1: µ1 ≠ µ2

The pooled variance is

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Solution:

The test statistic is


harishram@hotmail.com
KDUE73YTIL

Decision rule: Reject H0 if |tcal| ≥ tn1+n2-2, α/2

tn1+n2-2, α/2 = t27,0.005 = 2.771


-2.771 2.771
Since 2.771>|-1.667|, we fail to reject H0.

The two medicines haveThis


thefilesame
is meant mean hours
for personal use by of sleep.
harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Python solution: Calculate critical t-value

harishram@hotmail.com
KDUE73YTIL

If test statistic is less than -2.77 or greater than 2.77, then we reject H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Python solution: Calculate test statistic

harishram@hotmail.com
KDUE73YTIL

As test statistic (=-1.667)This


> file
critical
is meantvalue (=-2.77),
for personal we fail to reject
use by harishram@hotmail.com only.H0.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for unpaired data
Python solution: Calculate p-value

harishram@hotmail.com
KDUE73YTIL

As p-value > 0.01, we fail to reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
● The test is carried out under the assumption that the samples are drawn
from independent normal population
harishram@hotmail.com
KDUE73YTIL
● If the populations from which the samples are drawn do not have equal
variance then the t-test can not be use. In this case the Welch test is used

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for paired data

● For paired data effectiveness of some training/treatment is measured

● The observations are recorded on the same individual/item twice resulting in pairs of
harishram@hotmail.com
KDUE73YTIL observations

Example:

A energy drink manufacturing company wants to test if sales increase after they advertise
the drink with a life sized picture of a well-known athlete.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for paired data

● Let {(Xi, Yi), i = 1, 2, 3, … ,n} where X and Y are paired data


● Let µX, µY be the mean of data from X and Y respectively
● Define
harishram@hotmail.com
KDUE73YTIL
di = yi - xi
● Let µd = µy - µx
● In paired t-test, we test for
H0 : µd = µ0 against H1 : µd ≠ µ0

H0 : µd ≤ µ0 against H1 : µd > µ0

H0 : µd ≥ µ0 against H1 : µd < µ0
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for paired data

● Suppose mean of di = and variance of di =

● The test statistic is


harishram@hotmail.com
KDUE73YTIL

● Under H0, the test statistic follows t-distribution with n-1 degrees of freedom

● Failing to reject H0, implies that the null hypothesis is true

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a t test for two population means which are
harishram@hotmail.com
KDUE73YTIL paired is

scipy.stats.ttest_rel(Sample_1, Sample_2)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Question:

An energy drink distributor claims that a new advertisement poster, featuring a life-size
harishram@hotmail.com
picture of a
well-known athlete. For a random sample of 10 outlets, the following data was
KDUE73YTIL
collected. Test that the null hypothesis that there the advertisement was effective in
increasing the sales

Before 33 32 38 45 37 47 48 41 45

After 42 35 31 41 37 36 49 49 48

Test the hypothesis using critical region technique. [Use α = 0.05].


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Solution:

To test,
harishram@hotmail.com
H0: The advertisement was not effective against H1: The advertisement was
KDUE73YTIL
effective ( µd ≤ 0)
(µd > 0)

Compute di
Before 33 32 38 45 37 47 48 41 45

After 42 35 31 41 37 36 49 49 48

di=yi-xi 9 This3file is meant-7


for personal-4 0
use by harishram@hotmail.com-11 only. 1 8 3
Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Solution:

To test,
harishram@hotmail.com
H0: The advertisement was not effective against H1: The advertisement was
KDUE73YTIL
effective ( µd ≤ 0)
(µd > 0)

We have mean and its variance

The test statistic is

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Solution:

Decision Rule: Reject H0, if tcalc ≥ tdf,α


harishram@hotmail.com
KDUE73YTIL
Here tdf,α = tn-1,α= t8,0.05 = 1.86

Since 1.86 > 0.0998, fail to reject H0 1.86

Thus, there is no effect of the advertisement.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Python solution: Calculate critical t-value

harishram@hotmail.com
KDUE73YTIL

If test statistic is greater than 1.86, then we reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample t-test for paired data
Python solution: Calculate test statistic and p-value

harishram@hotmail.com
KDUE73YTIL

The test statistic (=0.1) < critical value (=1.86), also the p-value > 0.05, thus we fail to reject
H0.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Test for Population Proportion
harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Test for proportion

● For qualitative data the proportion of a desired characteristic is obtained

● Test for proportion:


harishram@hotmail.com
KDUE73YTIL
○ One sample: Testing population proportion (P) is equal to a specified value (P0)

○ Two sample: Testing equality of Two population proportions (P1 = P2)

● Similar to the tests of population mean

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Test for proportion

Test for
proportion
harishram@hotmail.com
KDUE73YTIL

One Sample Two Sample

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test - hypothesis

● The hypothesis to test the population proportion is equal to a specified value

H0 : P = P0 against H1 : P ≠ P0
harishram@hotmail.com

KDUE73YTIL
It implies

H0: The population proportion is equal to P0

against H1: The population proportion is not equal to P0

● Failing to reject H0 implies that the population proportion is equal to P0


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Test for proportions

● The test statistic is given by


Specified
Sample
proportion
proportion
harishram@hotmail.com
KDUE73YTIL

Sample size

● Under H0, the test statistic follows standard normal distribution

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion - decision rule

Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL

For two tailed test P ≠ P0 Reject H0 if |Z|≥ Zα/2


Reject H0 if p-value
Reject H0 if P0
is less than or
For left tailed test P < P0 Reject H0 if Z ≤ -Zα does not lie in the
equal to the level
confidence interval
of significance
For right tailed test P > P0 Reject H0 if Z ≥ Zα

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Question:

From a sample 361 business owners had gone into bankruptcy due to recession. On taking
harishram@hotmail.com
a survey, it
was found that 105 of them had not consulted any professional for managing
KDUE73YTIL
their finance before opening the business. Test the null hypothesis that at most 25% of all
businesses had not consulted before opening the business.

Test the claim using p-value technique. [Use α = 0.05].

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Solution:

From a sample 361 business owners had gone into bankruptcy due to recession.
harishram@hotmail.com
i.e. n = 361
KDUE73YTIL

On taking a survey, it was found that 105 of them had not consulted any professional for
managing their finance before opening the business.

Let X: business which did not consult before


x = 105

The sample proportion (p) = x/n = 105/361 = 0.2909


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Solution:

To test: The null hypothesis that at most 25% of all businesses had not consulted before
harishram@hotmail.com
opening the
business
KDUE73YTIL

Here P0 = 0.25

To test, H0: P ≤ 0.25 against H1: P > 0.25

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Solution:

The test statistic


harishram@hotmail.com
KDUE73YTIL

The p-value = P(Z > 1.79) = 0.0367 1.79

As p-value < 0.05, reject H0.

We may conclude that at least 25% of all businesses had not consulted before starting the
business.
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Python solution: Calculate test statistic

harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
One sample test for proportion
Python solution: Calculate p-value

harishram@hotmail.com
KDUE73YTIL

As the p-value < 0.05, we reject H0.

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Test for proportion

Test for
proportion
harishram@hotmail.com
KDUE73YTIL

One Sample Two Sample

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample tests for population proportion

● Let there be two samples sizes n1 and n2 from different populations of such that x1 and
x2 are the number of specific items in each of them respectively

harishram@hotmail.com
KDUE73YTIL
● Suppose these samples have proportions of specific items p1 and p2 respectively

● To test the equality of population proportion from which these samples are chosen

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test - hypothesis

● The hypothesis to test the population proportion

H0 : P1 = P2 against H1 : P1 ≠ P2
harishram@hotmail.com

KDUE73YTIL
It implies

H0: The two population proportions are equal (P1 = P2)

against H1: The two population proportions are not equal (P1 ≠ P2)

● Failing to reject H0 implies that the two population proportions are equal
This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Test for proportions

● The test statistic is given by

where is the proportion of


harishram@hotmail.com pooled sample such that
KDUE73YTIL

● Under H0, the test statistic follows standard normal distribution

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
The python code to conduct a Z test for two population proportions is
harishram@hotmail.com
KDUE73YTIL
statsmodels.api.stats.proportions_ztest(Sample_1, Sample_2)

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion - decision rule

Based on
H1 Based on critical region Based on p-value
confidence interval
harishram@hotmail.com
KDUE73YTIL

For two tailed test P1 ≠ P2 Reject H0 if |Z| ≥ Zα/2


Reject H0 if p-value
Reject H0 if P1 - P2
is less than or
For left tailed test P1 < P2 Reject H0 if Z ≤ -Zα does not lie in the
equal to the level
confidence interval
of significance
For right tailed test P1 > P2 Reject H0 if Z ≥ Zα

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Question:

Steve owns a kiosk where sells two magazines - A and B in a month. He buys 100 copies
harishram@hotmail.com
of magazineA out of which 78 were sold and 70 copies of magazine B out of which 65
KDUE73YTIL
were sold. Is there enough evidence to say that magazine is B is more popular?

Test the claim using p-value technique. [Use α = 0.05].

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Solution:

Steve owns a kiosk where sells two magazines - A and B in a month.


harishram@hotmail.com
KDUE73YTIL
Let X: the number of magazines sold

Out of 100 copies of magazine A 78 are sold Out of 70 copies of magazine B 65 are sold
Here, x1 = 78 and n1 = 100 Here, x2 = 65 and n1 = 70
Let p1 be the proportion of sell of magazine A Let p2 be the proportion of sell of magazine B
p1= x1/n1 = 78/100 = 0.78 p2 = x2/n2 = 65/70 = 0.928

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Solution:

To test, whether magazine B is more popular, i.e


harishram@hotmail.com
KDUE73YTIL
H0: P1 ≥ P2 against H1: P1 < P2

Where P1: denotes population proportion of magazine A sold


P2: denotes population proportion of magazine B sold

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Solution:

The pooled proportion is


harishram@hotmail.com
KDUE73YTIL

The test statistic is

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Solution:

The test statistic Z = -2.5905


harishram@hotmail.com
KDUE73YTIL
The p-value = P(Z < Zcalc, under H0) = P(Z < -2.5905, µ = 13) = 0.0048

Since p-value < 0.05, we reject H0.

-2.59

Thus there is enough evidence to conclude that magazine is B is more popular.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Two sample test for proportion
Python solution: Calculate test statistic and p-value

harishram@hotmail.com
KDUE73YTIL

As the p-value < 0.05, we reject H0.


This file is meant for personal use by harishram@hotmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Summary

harishram@hotmail.com
KDUE73YTIL

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
Parametric tests

● The tests considered so far have two features:

○ The probability distribution of the samples was assumed to be known


harishram@hotmail.com
KDUE73YTIL
○ The hypothesis test was about the parameter of the probability distribution

● These tests are known as the parametric tests

● The times when these assumptions are not satisfied, use the non-parametric tests

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.
harishram@hotmail.com
KDUE73YTIL Thank You

This file is meant for personal use by harishram@hotmail.com only.


Sharing or publishing the contents in part or full is liable for legal action.

You might also like