Inferential Statistics
1. Null hypothesis ($H_0$)
• A supposition about a population parameter that is put to the test is
known as the null hypothesis.
• The null hypothesis states that there is no difference (zero difference)
between the true population parameter and the statistic computed from
the selected sample data. A hypothesis is an assumption, so the
null hypothesis is the prior assumption of no difference.
06/17/2024 RAM KRISHNA TAMANG 16
• According to Prof. R.A. Fisher, “null hypothesis is the
hypothesis which is tested for the possible rejection under
the assumption that it is true”. It is denoted by $H_0$ and set up
as follows (e.g. for a z-test): $H_0: \mu = \mu_0$, where $\mu_0$ is the hypothesized value of the population mean.
• So, the p-value is the probability, computed under the assumption that the null hypothesis is true,
that the test statistic is at least as extreme as the value calculated from the observed set of data.
[Figure: normal curve showing the acceptance region with area 1 - α]
• If the alternative hypothesis departs from the null hypothesis in only one direction, the test is
known as a one tailed test, because the critical region lies on only one tail of the normal curve,
i.e. either the left tail or the right tail.
P-value approach
• If p-value < level of significance ($\alpha$), then we reject the null hypothesis ($H_0$), and
• If p-value ≥ level of significance ($\alpha$), then we accept (i.e. do not reject) the null hypothesis ($H_0$).
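The p-value rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `z_p_value` and `decide` are hypothetical, the test statistic is assumed to follow the standard normal distribution, and z = 2.5 is a made-up value rather than a result from these slides.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """Return the p-value of a z statistic under the standard normal curve."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right' or 'left'")


def decide(p_value, alpha=0.05):
    """Apply the p-value approach: reject H0 when p-value < alpha."""
    return "reject H0" if p_value < alpha else "accept H0"


# z = 2.5 is a made-up value used only for illustration.
p = z_p_value(2.5, tail="two")
print(round(p, 4), decide(p, alpha=0.05))
```

The same `decide` helper works for any test, because the rule only compares the p-value with the chosen level of significance.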
Parametric tests
• A parametric statistical test is a test whose model
specifies certain conditions about the parameters of
the population from which the samples are drawn.
• Sample statistics are used to test hypotheses
made about certain parameters of the population
(universe). The form of the population distribution from
which the sample is drawn is assumed to be known.
• The Z-test, t-test, F-test and ANOVA are examples of
parametric tests.
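Test statistic
Under the null hypothesis, the test statistic is
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if the population standard deviation is known and samples are drawn with
replacement
$Z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ if the population standard deviation is unknown and samples are drawn with
replacement
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean, $\sigma$ is the population standard deviation, s is the sample standard deviation and n is the sample size.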
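Critical value
The critical or tabulated value is obtained from the Z table
based on the level of significance and the alternative hypothesis,
i.e. $Z_{tabulated} = Z_{\alpha/2}$ for a two tailed test, or $Z_{\alpha}$ for a one tailed test.
Decision
• If $|Z| > Z_{tabulated}$, then we reject the null hypothesis.
• If $|Z| \le Z_{tabulated}$, then we accept the null hypothesis.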
Example 1
The researchers are interested in the mean age of a certain population. A random
sample of 10 individuals drawn from the population of interest has a mean of 27 years.
Assuming that the population is approximately normally distributed with variance 20;
can you conclude that the mean is different from 30 years? (Use α=0.05).
Solution:
Given,
Sample size (n) = 10,
Sample mean age ($\bar{x}$) = 27 years
Population variance of age ($\sigma^2$) = 20, so $\sigma = \sqrt{20} \approx 4.47$ years.
Hypothesized population mean age ($\mu_0$) = 30 years
Level of significance ($\alpha$) = 0.05
Null hypothesis: $H_0: \mu = 30$ years i.e. there is no significant difference between the sample mean
age and the population mean age, or the mean of the population is not significantly different
from 30 years.
Alternative hypothesis: $H_1: \mu \neq 30$ years (two tailed test) i.e. there is a significant difference between
the sample mean age and the population mean age, or the mean age of the population is
significantly different from 30 years.
Test statistic
Under the null hypothesis, the test statistic is
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
$|Z| = 2.12$
Level of significance
Let α = 0.05 be the level of significance.
Critical value
The critical or tabulated value at the α = 0.05 level of significance for a two tailed test is
$Z_{tabulated} = Z_{\alpha/2} = Z_{0.025} = 1.96$
Decision
Since $|Z| > Z_{tabulated}$ (i.e. 2.12 > 1.96), we reject the null hypothesis.
Hence we can conclude that the mean age of the population is significantly different from 30 years at the
α = 0.05 level of significance.
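As a check on the arithmetic of Example 1, the following is a minimal Python sketch, not part of the original solution. The function name `one_sample_z` is hypothetical, and the critical value is taken from Python's standard normal distribution instead of a printed Z table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for a single mean when the population s.d. is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Example 1: n = 10, sample mean = 27, population variance = 20, mu0 = 30.
z = one_sample_z(sample_mean=27, mu0=30, sigma=sqrt(20), n=10)

alpha = 0.05
z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # two tailed critical value

reject = abs(z) > z_tab
print(round(z, 2), round(z_tab, 2), reject)
```

For a one tailed test the critical value would use `1 - alpha` in place of `1 - alpha / 2`.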
Example 2
Among 150 men in Bhaktapur, the mean systolic blood pressure was 146
mm Hg with a standard deviation of 27 mm Hg. On the basis of these data, can
you conclude that the mean systolic blood pressure for the population of
Bhaktapur is greater than 140 mm Hg? Use α = 0.01.
Solution:
Given,
Sample size (n) = 150,
Sample mean systolic blood pressure ($\bar{x}$) = 146 mm Hg
Sample standard deviation of systolic blood pressure (s) = 27 mm Hg.
Hypothesized population mean systolic blood pressure ($\mu_0$) = 140 mm Hg.
Level of significance ($\alpha$) = 0.01
Null hypothesis: $H_0: \mu \le 140$ mm Hg i.e. the population mean systolic blood
pressure is less than or equal to 140 mm Hg.
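Alternative hypothesis: $H_1: \mu > 140$ mm Hg (right tailed test) i.e. the population mean
systolic blood pressure is significantly greater than 140 mm Hg.
Test statistic
Under the null hypothesis, the test statistic is
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{146 - 140}{27/\sqrt{150}}$ [$\hat{\sigma} = s$, since the sample is large]
= 2.722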
Level of significance
Let α = 0.01 be the level of significance.
Critical value
The critical or tabulated value at the α = 0.01 level of significance for a right tailed test is
$Z_{tabulated} = Z_{\alpha} = Z_{0.01} = 2.33$
Decision
Since $Z > Z_{tabulated}$ (i.e. 2.722 > 2.33), we reject the null hypothesis.
Hence we can conclude that the population mean systolic blood pressure is significantly
greater than 140 mm Hg at the α = 0.01 level of significance.
2. Test of significance of the difference between two sample means
Let us consider two independent samples of sizes $n_1$ and $n_2$ drawn from two
independent normally distributed populations having means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$ respectively. Also let $\bar{x}_1$ and $\bar{x}_2$ be the means of the two independent samples of sizes $n_1$ and $n_2$.
The following steps are involved in testing the significance of the difference between two independent sample
means.
Null hypothesis: $H_0: \mu_1 = \mu_2$ i.e. there is no significant difference between the means of the two
populations, or the two independent samples are drawn from normally distributed
populations having the same mean.
Alternative hypothesis: $H_1: \mu_1 \neq \mu_2$ (for two tailed test) i.e. there is a significant difference
between the means of the two populations, or the two independent samples are drawn
from normally distributed populations having different means.
Or,
$H_1: \mu_1 > \mu_2$ (for right tailed test) i.e. the mean of the first population is significantly greater
than the mean of the second population.
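Critical value
The critical or tabulated value is obtained from the Z table
based on the level of significance and the alternative hypothesis,
i.e. $Z_{tabulated} = Z_{\alpha/2}$ for a two tailed test, or $Z_{\alpha}$ for a one tailed test.
Decision
• If $|Z| > Z_{tabulated}$, then we reject the null hypothesis.
• If $|Z| \le Z_{tabulated}$, then we accept the null hypothesis.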
Example 3
The mean and standard deviation of BMI of 57 males were found to be
23.1 kg/m² and 3.48 kg/m², and the mean and standard deviation of BMI
of 49 females were found to be 20.74 kg/m² and 2.63 kg/m² respectively. Is
the mean BMI of males and females significantly different?
Solution:
Given,
For males,
Number of males ($n_1$) = 57, mean of BMI ($\bar{x}_1$) = 23.1 kg/m² and standard
deviation of BMI ($s_1$) = 3.48 kg/m².
For females,
Number of females ($n_2$) = 49, mean of BMI ($\bar{x}_2$) = 20.74 kg/m² and standard
deviation of BMI ($s_2$) = 2.63 kg/m².
Null hypothesis: $H_0: \mu_1 = \mu_2$ i.e. there is no significant difference between the mean BMI of males
and females.
Alternative hypothesis: $H_1: \mu_1 \neq \mu_2$ i.e. there is a significant difference between the mean BMI of
males and females.
Test statistic
Under the null hypothesis, the test statistic is
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{23.1 - 20.74}{\sqrt{\dfrac{3.48^2}{57} + \dfrac{2.63^2}{49}}} = 3.969$
Level of significance
Let α = 5% be the level of significance.
Critical value
The critical value obtained from the Z table at the 5% level of significance for a two
tailed test is
$Z_{\alpha/2} = Z_{0.025} = 1.96$
Decision
Since $|Z| > Z_{tabulated}$ (i.e. 3.969 > 1.96), we reject the null hypothesis ($H_0$) at the 5% level of significance.
Hence we conclude that there is a significant difference between the mean BMI of
males and females.
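The test statistic of Example 3 can be reproduced with a short Python sketch. The function name `two_sample_z` is hypothetical, and the sketch assumes, as the example does, that both samples are large enough for the sample standard deviations to stand in for the population values.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Example 3: BMI of 57 males and 49 females.
z = two_sample_z(mean1=23.1, sd1=3.48, n1=57, mean2=20.74, sd2=2.63, n2=49)

z_tab = NormalDist().inv_cdf(1 - 0.05 / 2)  # two tailed, alpha = 5%
reject = abs(z) > z_tab
print(round(z, 3), round(z_tab, 2), reject)
```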
Level of significance
Let α = 5% be the level of significance.
Critical value
The critical value obtained from the Z table at the 5% level of significance for a left
tailed test is
$Z_{tabulated} = -Z_{\alpha} = -Z_{0.05} = -1.645$
$|Z_{tabulated}| = 1.645$
Decision
Since $|Z| > |Z_{tabulated}|$ (i.e. 1.723 > 1.645), we reject the null hypothesis ($H_0$) at the 5% level of
significance.
Hence, we conclude that the average height of people from Dharan is significantly greater
than the average height of people from Kathmandu.
3. Test of significance of a single sample proportion
Let P be the population proportion of units possessing a certain characteristic in the
population. Let a random sample of size n be drawn from the population and let
x be the number of units possessing the characteristic in the sample; then the sample
proportion is $p = \dfrac{x}{n}$. For a large sample size, the binomial distribution can be
approximated by the normal distribution.
The following steps are involved in testing the significance of a single sample proportion.
Null hypothesis: $H_0: P = P_0$ i.e. there is no significant difference between the sample proportion and the
population proportion, or the sample is selected from a population having
proportion $P_0$.
Alternative hypothesis: $H_1: P \neq P_0$ (for two tailed test) i.e. there is a significant difference
between the sample proportion and the population proportion, or the sample is
selected from a population whose proportion is not equal to $P_0$.
Solution: Given,
Population proportion of adults aged 18 to 24 years who drink alcohol (P) = 60% = 0.6
And Q = 1 – P = 1 – 0.6 = 0.4
Sample size (n) = 450,
Sample proportion of adults aged 18 to 24 years who drink alcohol (p) = 66% = 0.66
Null hypothesis: $H_0: P = 60\%$ i.e. the proportion of college students from California who currently
drink alcohol is not significantly different from the proportion nationwide.
Alternative hypothesis: $H_1: P \neq 60\%$ i.e. the proportion of college students from California who
currently drink alcohol is significantly different from the proportion nationwide.
Test statistic
Under the null hypothesis, the test statistic is
$Z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.66 - 0.6}{\sqrt{\dfrac{0.6 \times 0.4}{450}}} = \dfrac{0.06}{0.0231} = 2.598$
Level of significance
Let α = 0.05 be the level of significance.
Critical value
The critical value obtained from the Z table at the 5% level of significance for a two
tailed test is
$Z_{\alpha/2} = Z_{0.025} = 1.96$
Decision
Since $|Z| > Z_{tabulated}$ (i.e. 2.598 > 1.96), we reject the null hypothesis ($H_0$) at the 5% level of
significance.
Hence we conclude that the proportion of college students from California who currently
drink alcohol is significantly different from the proportion nationwide.
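The calculation for the single proportion example above can be sketched in Python as follows. The function name `one_proportion_z` is hypothetical, and the sketch relies on the large-sample normal approximation to the binomial distribution that this section assumes.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(p_sample, p0, n):
    """Z statistic for a single proportion under H0: P = p0."""
    q0 = 1 - p0
    return (p_sample - p0) / sqrt(p0 * q0 / n)


# Values from the example: P = 0.6, p = 0.66, n = 450.
z = one_proportion_z(p_sample=0.66, p0=0.6, n=450)

z_tab = NormalDist().inv_cdf(1 - 0.05 / 2)  # two tailed, alpha = 5%
reject = abs(z) > z_tab
print(round(z, 3), round(z_tab, 2), reject)
```

Note that the standard error uses the hypothesized proportion P, not the sample proportion p, because the statistic is computed under the null hypothesis.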
Example 6 (homework):
In a random survey of 1000 households in the United States, it is found that 29 percent
of the households have at least one member with a college degree. Does this finding
refute the statement that the proportion of all such United States households is at least
35 percent? Test at the α = 0.05 significance level.
4. Test of significance of the difference between two sample proportions
Let $P_1$ and $P_2$ be the two population proportions possessing a certain characteristic. Let two
independent samples of sizes $n_1$ and $n_2$ be drawn from the two normal populations. Also let $p_1$ and
$p_2$ be the proportions of units possessing the characteristic in the two independent
samples.
The following steps are involved in testing the significance of the difference between two sample proportions.
Null hypothesis: $H_0: P_1 = P_2$ i.e. there is no significant difference between the proportions of the two
populations, or the two independent samples are drawn from normally distributed populations
having the same proportion.
Alternative hypothesis: $H_1: P_1 \neq P_2$ (for two tailed test) i.e. there is a significant difference between
the proportions of the two populations, or the two independent samples are drawn from normally
distributed populations having different proportions.
Or,
$H_1: P_1 > P_2$ (for right tailed test) i.e. the proportion of the first population is significantly greater than
the proportion of the second population.
Or,
$H_1: P_1 < P_2$ (for left tailed test) i.e. the proportion of the first population is significantly less than
the proportion of the second population.
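A possible Python sketch of this test is given below. The slides do not show the test statistic at this point, so the pooled-proportion form of the Z statistic used here is an assumption (it is the common textbook choice under $H_0: P_1 = P_2$). The function name `two_proportion_z` and the counts 60 out of 200 and 45 out of 200 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical data: 60 of 200 units in sample 1 and 45 of 200 in sample 2.
z = two_proportion_z(x1=60, n1=200, x2=45, n2=200)

z_tab = NormalDist().inv_cdf(1 - 0.05 / 2)  # two tailed, alpha = 5%
reject = abs(z) > z_tab
print(round(z, 3), round(z_tab, 2), reject)
```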
When the sample size is less than or equal to 30 (i.e. $n \le 30$) and the population standard deviation is unknown, the sampling
distribution of the sample mean follows Student’s t distribution. The t
distribution is symmetric and bell-shaped like the normal
distribution, but a little flatter. Student’s t statistic is defined as
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$
or, equivalently,
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}}$
Where,
$\bar{x} = \dfrac{\sum x}{n}$ = sample mean
$S = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$ = the unbiased estimate of population standard deviation.
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ = sample standard deviation or biased estimator of population standard
deviation.
n – 1 = degrees of freedom
It is based upon the assumptions that the samples are selected from a normal
population with unknown variance and that the sample observations are independent.
The following steps are involved in testing the significance of a single sample mean.
Decision
If $|t| > t_{tabulated}$, then we reject the null hypothesis ($H_0$) at $\alpha$ level of
significance.
If $|t| \le t_{tabulated}$, then we accept the null hypothesis ($H_0$) at $\alpha$ level of
significance.
Example:
A random sample of 10 bags is drawn and their contents
are found to weigh, in kg, as follows: 50, 49, 52, 44, 45, 48,
46, 45, 49, 45.
Test the significance of the sample mean if the average
packing can be taken to be 50 kg.
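The bag-weight example can be worked through with the short Python sketch below. Two assumptions are made that the slide does not state: the level of significance is taken as 5% for a two tailed test, and the critical value 2.262 is the usual t-table entry for 9 degrees of freedom, typed in by hand because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

weights = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]  # kg, from the example
mu0 = 50                                            # hypothesized mean (kg)

n = len(weights)
x_bar = mean(weights)
S = stdev(weights)  # divides by n - 1, i.e. the unbiased estimate S

t = (x_bar - mu0) / (S / sqrt(n))
df = n - 1

# Assumed: alpha = 5%, two tailed; 2.262 is the t-table value for 9 d.f.
t_tab = 2.262
reject = abs(t) > t_tab
print(round(x_bar, 1), round(S, 3), round(t, 3), df, reject)
```

Under these assumptions $|t|$ exceeds the tabulated value, so the null hypothesis of a 50 kg average would be rejected.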
Critical value
The critical or tabulated value is obtained from the chi-square table based
on the α level of significance, (r – 1)(c – 1) degrees of freedom and the
alternative hypothesis, i.e.
$\chi^2_{tabulated} = \chi^2_{\alpha,\,(r-1)(c-1)}$
Decision
• If $\chi^2 > \chi^2_{tabulated}$, then we reject the null hypothesis ($H_0$).
• If $\chi^2 \le \chi^2_{tabulated}$, then we accept the null hypothesis ($H_0$).
When the two attributes A and B are each
classified into 2 subgroups
The 2 × 2 contingency table of two attributes A and B is
               B              Total
A          a        b         a + b
           c        d         c + d
Total    a + c    b + d       N = a + b + c + d
Level of significance
Let α be the level of significance
Degree of freedom
The d.f. is 1
Critical value
The critical or tabulated value is obtained from the chi-square table based
on the level of significance, the degree of freedom and the alternative hypothesis,
i.e. $\chi^2_{tabulated} = \chi^2_{\alpha,\,1}$
Decision
• If $\chi^2 > \chi^2_{tabulated}$, then we reject the null hypothesis at α level of significance.
• If $\chi^2 \le \chi^2_{tabulated}$, then we accept the null hypothesis at α level of significance.
Example:
In an experiment to study the dependence of hypertension on smoking habit, the
following data were taken on 186 individuals.
                   No smoker   Moderate smoker   Heavy smoker
Hypertension          21            36               36
No hypertension       48            26               19
Test the hypothesis that presence or absence of hypertension is independent of smoking
habit.
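Observed frequencies with row and column totals
                   No smoker   Moderate smoker   Heavy smoker   Total
Hypertension          21            36               36           93
No hypertension       48            26               19           93
Total                 69            62               55         N = 186
Expected frequencies, with $E = \dfrac{RT \times CT}{N}$ (RT = row total, CT = column total)
Cell                                 Observed number of patients (O)   E                        $(O - E)^2$   $(O - E)^2 / E$
Hypertension, no smoker              21                                93 × 69 / 186 = 34.5     182.25        5.282609
Hypertension, moderate smoker        36                                93 × 62 / 186 = 31       25            0.806452
Hypertension, heavy smoker           36                                93 × 55 / 186 = 27.5     72.25         2.627273
No hypertension, no smoker           48                                93 × 69 / 186 = 34.5     182.25        5.282609
No hypertension, moderate smoker     26                                93 × 62 / 186 = 31       25            0.806452
No hypertension, heavy smoker        19                                93 × 55 / 186 = 27.5     72.25         2.627273
Test statistic
Under the null hypothesis, the test statistic is
$\chi^2 = \sum \dfrac{(O - E)^2}{E} = 17.43$
Level of significance
Let α = 5% be the level of significance.
Degree of freedom
The d.f. is (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2
Critical value
$\chi^2_{tabulated} = \chi^2_{0.05,\,2} = 5.991$
Decision
Since $\chi^2 > \chi^2_{tabulated}$ i.e. 17.43 > 5.991, we reject
the null hypothesis ($H_0$) at the 5% level of
significance.
Hence hypertension is not independent
of smoking habit at the 5% level of significance.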
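The expected frequencies and the chi-square statistic of the hypertension example can be recomputed with the following Python sketch. The function name `chi_square_independence` is hypothetical, and the critical value 5.991 is the chi-square table entry for 2 degrees of freedom at the 5% level, entered by hand.

```python
def chi_square_independence(observed):
    """Chi-square statistic and d.f. for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # E = RT * CT / N
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: hypertension, no hypertension. Columns: no, moderate, heavy smoker.
observed = [[21, 36, 36],
            [48, 26, 19]]

chi2, df = chi_square_independence(observed)
chi2_tab = 5.991  # table value for 2 d.f. at the 5% level
reject = chi2 > chi2_tab
print(round(chi2, 2), df, reject)
```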
Example:
A tobacco company claims that there is no relationship
between smoking habit and lung ailment. To investigate
the claim, a random sample of 300 males in the age group of
40 to 50 is given a medical test. The observed sample results
are tabulated below:
               Lung ailment   No lung ailment   Total
Smoker              75              105          180
Non-smoker          25               95          120
Total              100              200          300
Null hypothesis ($H_0$): there is no
significant relationship between smoking
habit and lung ailment.
Alternative hypothesis ($H_1$): there
is a significant relationship between smoking
habit and lung ailment.
Test statistic
Under the null hypothesis, the test
statistic is
$\chi^2 = \dfrac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$
$= \dfrac{300 \times (75 \times 95 - 105 \times 25)^2}{180 \times 120 \times 100 \times 200} = 14.06$
Level of significance
Let α = 5% be the level of significance.
Critical value
$\chi^2_{tabulated} = \chi^2_{0.05,\,1} = 3.841$
Decision
Since $\chi^2 > \chi^2_{tabulated}$ i.e. 14.06 > 3.841, we
reject the null hypothesis ($H_0$) at the 5% level
of significance.
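The 2 × 2 shortcut formula can be evaluated on the tabulated counts with a few lines of Python. This is a sketch only: the function name `chi_square_2x2` is hypothetical, and 3.841 is the chi-square table value for 1 degree of freedom at the 5% level, typed in by hand.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2 x 2 table with cells a, b / c, d."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Smoker: 75 with lung ailment, 105 without; non-smoker: 25 with, 95 without.
chi2 = chi_square_2x2(a=75, b=105, c=25, d=95)

chi2_tab = 3.841  # table value for 1 d.f. at the 5% level
reject = chi2 > chi2_tab
print(round(chi2, 2), reject)
```

With these counts the statistic works out to 14.06, which is larger than 3.841, so the null hypothesis of no relationship is rejected at the 5% level.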
Correlation
• Correlation is a statistical device designed to measure
the degree of association between two or more variables,
e.g. to study the relationship between height and
weight of children, blood pressure and age of patients,
fever and weight of children, income and expenditure, etc.
• The statistical tool used to measure the degree of association
between such variables is known as
correlation, and its summary value is known
as the correlation coefficient.
• It is generally denoted by r and is independent of the original
units of measurement of the variables under study.
Types of correlation
• Positive correlation
• Negative correlation
• Linear correlation
• Non-linear correlation
• Simple correlation
• Partial correlation
• Multiple correlation
Methods of studying correlation
• Graphical method or scatter diagram method.
• Karl Pearson’s correlation coefficient
$r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$
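Karl Pearson’s correlation coefficient can be computed from the sums of X, Y, X², Y² and XY, as in the Python sketch below. The function name `pearson_r` is hypothetical, and the five (X, Y) pairs are made-up numbers for illustration, not the height and weight data of the example that follows.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Made-up data, used only to show the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4))
```

A value of r close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.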
Weight (Y)   10 18 24 30 25 35
Calculate Karl Pearson’s correlation
coefficient and interpret the result.
7 24 49 576 168
9 25 81 625 225