
Inferential Statistics:

Tests of significance

Dr.Tarek Tawfik

12/07/21 Dr Tarek Amin 1


What are the tests of significance?
They are tests used to examine hypotheses about the underlying research question.
They are:
Metric (parametric) tests of significance.
Nonparametric tests of significance.


How to choose a statistical test of significance?
I. Look at the given descriptive statistics.
II. Type of distribution.
III. Then follow the Road Map.


Types of descriptive statistics

Tables, Graphs, Correlation, Numeric measures

Numeric measures:
• Central Tendency: Mean, Median, Mode
• Dispersion: Maximum, Minimum, Range, IQR, Standard Deviation, Variance, Coefficient of variation
• Others: Rate, Ratio, Percentage, Centiles

Descriptive statistics
• Cross tabulation (for frequency data):
  - 2X2 tables: Chi-square "for independence / dependence"
  - One variable with multiple categories: "Goodness of fit"
• Proportions (percentage), binomial with 2 outcomes: Binomial test of significance
• Mean ± S.D or Variance: How many samples?
  - One sample: One sample t test (check whether the population variance is known)
  - Two independent samples: t-test of independence
  - Before and after, 2 samples: t-paired test

The best answer!
For any test, do the following:
1- State the Null Hypothesis.
2- Write down the formula correctly.
3- Calculate accurately.
4- Determine the degree of freedom.
5- Determine the critical score using the tables.
6- Interpret your results meaningfully.


Example of metric test of significance: "Student t test"


Conditions for Parametric Tests of Significance
Used for parametric data ("normally distributed data").
Data are on a numeric scale.
The distribution of the underlying population is normal.
Observations are independent.
The samples are randomly drawn from the population.

Steps of Student t test
• Hypothesis and null hypothesis
• Probability and level of error
• Calculated t value
• Level of confidence
• Degree of freedom and tabulated t value
• Comparison between calculated t value and tabulated t value


What is the Student t test?
• So called after W.S. Gosset, who first described its properties in 1908 under the pseudonym "Student".
• As an employee of the Guinness brewing company, he was not permitted to publish under his own name.


Base of Student t test
Based on the t distribution, which reflects more variation due to chance than the normal distribution.
Used to analyze small samples.
The t distribution curve is a continuous, symmetrical, unimodal distribution.


Types of t test
1- One sample t test:
• The mean and standard deviation of a sample are calculated and a value is postulated for the mean of the population.
How significantly does the sample mean differ from the postulated population mean?
2- Two independent samples:
• The means and standard deviations of two samples are calculated.
Could both samples have been taken from the same population?
3- Paired samples:
• Paired observations are made on two samples (or in succession on one sample).
What is the significance of the difference between the means of the two related sets of observations?

The One Sample t-test for a mean
• Indicated when we are confronted with a single sample and we need to know whether or not this sample is different from the whole population from which it was taken.
• The variable is quantitative, measured at the interval/ratio level, so the mean can be calculated.
• The test statistic is the difference between the sample mean and the population mean, divided by the standard error of the sample mean (SE):
\[ t = \frac{\bar{X} - \mu}{SE}, \qquad SE = \frac{SD}{\sqrt{n}} \]
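The one-sample formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_sample_t` and the numbers in the usage lines are made up for the example and do not come from the slides.

```python
import math


def one_sample_t(sample_mean, population_mean, sd, n):
    """Return (t, df) for a one-sample t test computed from summary statistics.

    Hypothetical helper: SE = SD / sqrt(n), t = (sample mean - mu) / SE.
    """
    se = sd / math.sqrt(n)                      # standard error of the sample mean
    t = (sample_mean - population_mean) / se    # calculated t value
    return t, n - 1                             # degrees of freedom = n - 1


# Illustrative (made-up) numbers: mean 52, postulated mean 50, SD 8, n = 16.
t, df = one_sample_t(52.0, 50.0, 8.0, 16)
print(f"t = {t:.2f}, df = {df}")  # t = 1.00, df = 15
```

Working from summary statistics mirrors how the slides present the test; with raw observations the mean and SD would be computed first.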


Steps to perform one sample t test
• Assume that the Health Department is interested in whether the average age of the population in a certain region is over 40 years, in order to decide how much money it should allocate to the local hospital.
• Unable to survey the whole area, it took a random sample of 51 subjects from this population; the average age was 43 years with an S.D of 10 years.
How can we conclude that this result was not due to random variation (chance)?

Steps
• State the null and alternative hypotheses:
H0: µ = 40 years and
Ha: the population in this region is on average older than 40 years, µ > 40 years (notice the inequality in the alternative hypothesis).
• Choose the test of significance: the data are interval/ratio, the sample mean is provided, and the S.D for the population is missing.
\[ t = \frac{\bar{X} - \mu}{SD / \sqrt{n}} \]
1- Calculate the difference between the two means by subtracting the postulated population mean from the sample mean:
43 - 40 = 3
2- Calculate the standard error of the mean by dividing the SD of the sample by the square root of the sample size:
10/√51 = 1.4
3- Then divide the difference by the standard error of the mean:
t = (43 - 40)/(10/√51) = 2.1
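The three calculation steps can be checked with a short Python sketch. The variable names are chosen here for illustration; the numbers are the ones given for the Health Department example (sample mean 43, postulated mean 40, S.D 10, n = 51).

```python
import math

sample_mean = 43.0       # average age in the sample (years)
population_mean = 40.0   # value postulated under H0 (years)
sd = 10.0                # sample standard deviation (years)
n = 51                   # sample size

difference = sample_mean - population_mean   # step 1: difference between means
standard_error = sd / math.sqrt(n)           # step 2: SE = SD / sqrt(n)
t = difference / standard_error              # step 3: calculated t value

print(f"difference = {difference:.1f}")      # difference = 3.0
print(f"SE = {standard_error:.2f}")          # SE = 1.40
print(f"t = {t:.2f}")                        # t = 2.14
```

Rounded to one decimal place this gives the t value of 2.1 used in the following steps.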
4- Establish the critical score and critical regions: we need to
a) decide between a one-tail and a two-tail test,
b) select the level of significance (alpha level).
• What is the degree of freedom? The t-test uses the known sample S.D as an estimate of the unknown population S.D. The number of degrees of freedom affects the value of the critical t-score.
5- Make a decision.


Degree of freedom
Definition
It is the number of values in a series or distribution that can be freely assigned when the sum of the values is fixed.


Degree of freedom
df = n - 1
[Figure: four values, 12, 16, 7 and 15, with their total fixed at 50; three of them can be assigned freely and the remaining one is the restricted value.]
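A small Python sketch may make the idea of a restricted value concrete. It assumes the four values on the slide (12, 16, 7, 15) with their total fixed at 50; the function name `restricted_value` is hypothetical.

```python
def restricted_value(fixed_sum, free_values):
    """Return the single value that is forced once the other values are chosen."""
    return fixed_sum - sum(free_values)


values = [12, 16, 7, 15]
fixed_sum = sum(values)                  # the total is fixed at 50
free = values[1:]                        # 16, 7 and 15 are assigned freely
last = restricted_value(fixed_sum, free)
df = len(values) - 1                     # df = n - 1

print(last, df)  # 12 3
```

With n values and a fixed total, only n - 1 of them are free, which is why the one-sample t test uses df = n - 1.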


Step 4: establish the critical score and critical region
• Decide between a one-tail and a two-tail test.
• Select the level of significance (alpha level):
α = 1 - confidence level
α = 1 - 0.95 = 0.05
• Then refer to the t-table after determining the degree of freedom (df).


df
Level of significance for one-tail test
0.10 0.05 0.025 0.01 0.005
Level of significance for two-tail test
0.20 0.10 0.05 0.02 0.01

1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
50 1.299 1.676 2.009 2.403 2.678
∞ 1.282 1.645 1.960 2.326 2.576

Critical Score
The number of degrees of freedom affects the value of the critical score.
The larger the sample size (and therefore the degrees of freedom), the more likely that any difference between the sample mean and the test value will prove to be significant.
In our example (df = 50, one-tail test, alpha = 0.05) the critical score is +1.676. The sample score (2.1) is larger, so the P value is < 0.05 and we reject the null hypothesis because our value (2.1) falls in the region of rejection.
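The comparison between the calculated and the tabulated t value can be sketched as a simple lookup. The dictionary below copies only three one-tail, alpha = 0.05 entries from the t-table (df = 10, 20 and 50); the names `ONE_TAIL_05` and `reject_null_one_tail` are hypothetical and exist only for this illustration.

```python
# Critical t values copied from the t-table (one-tail test, alpha = 0.05).
ONE_TAIL_05 = {10: 1.812, 20: 1.725, 50: 1.676}


def reject_null_one_tail(t_sample, df, table=ONE_TAIL_05):
    """Return (reject, critical) for an upper-tail test such as Ha: mu > 40."""
    critical = table[df]                 # tabulated t value for this df
    return t_sample > critical, critical


# Health Department example: calculated t of about 2.14 with df = 51 - 1 = 50.
reject, critical = reject_null_one_tail(2.14, 50)
print(critical, reject)  # 1.676 True
```

A fuller version would cover every row of the table or compute the critical value numerically; the sketch keeps to the values already shown on the slides.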
Degrees of freedom    t value at 0.05 probability    t value at 0.01 probability
8                     2.31                           3.36
10                    2.23                           3.17
60                    2.00                           2.66

Notice that as the sample size increases, the test statistic required for significance decreases.


[Figure: acceptance region (null hypothesis) and rejection region (alternative hypothesis).]


Step 5: Make a decision
The P value is < 0.05, which means that if the population mean age really were 40 years, we would encounter a result like ours in fewer than five of every hundred samples; the difference is therefore unlikely to be due to chance or random variation.
The population in this area is, on average, over 40 years of age.


Exercise
• The following data are ages, in years, at death for a sample of people who were all born in the same year:
34, 60, 72, 55, 68, 12, 48, 69, 78, 42, 60, 81, 72, 58, 70, 54, 85, 68, 74, 59, 67, 76, 55, 87, 70.
• Calculate the mean and S.D.
• What is the probability of randomly obtaining this sample from a population with an average life expectancy of 70 years?
\[ t = \frac{\bar{X} - \mu}{SD / \sqrt{n}} \]


The Two-samples t-test for the equality of means
Here we apply a modified procedure for finding the difference between two means and testing the size of this difference, with the following assumptions:
• Data are quantitative and normally distributed.
• The two samples come from distributions that may differ in their mean value, but not in the S.D.
• Observations are independent of each other (most important).

The two-samples t-test
[Figure: Population X and Population Y are to be compared; inference about each population is drawn from Sample X and Sample Y.]


Dependent and Independent Variables
A dependent variable is affected by an independent variable.
1- One way direction.
2- Mutually dependent.


Dependent and Independent Variables
1- One way direction:
• Lung cancer (dependent) ← smoking (independent).
• The model of direct relationship (the relation between income and place of residence).
2- Mutually dependent relationship:
place of residence (independent) → income (dependent) or place of residence (dependent) ← income (independent).

The Sampling Distribution of the Difference Between Two Means
Steps of doing Two-samples t-test
Step 1- As with other hypothesis tests, we begin by assuming that the null hypothesis of no difference is correct.
Step 2- On this assumption we build up the sampling distribution of the difference between two sample means.
Step 3- Use this distribution to determine the probability of getting an observed difference between the two sample means from populations with no difference.
Step 4- Finally we compare this probability to a critical alpha level.
Step 5- Then decide whether the null hypothesis should be rejected or not.


The Two Samples t-test: how to begin
1- Assume that the average amount of TV watched by children is the same in both Australia and Britain: H0: µ1 = µ2 or H0: µ1 - µ2 = 0.
2- If this is true, repeated sampling should show it, so we expect that the most common result will be that the difference is small, if not zero.
3- Since we are assuming no difference between the two populations, we expect the sample means to be equal as well:
mean1 = mean2, therefore mean1 - mean2 = 0.


The Two Samples t-test
4- But this will not always be the result. We might draw a sample from Australia that has a lower than average amount of TV watching coupled with a sample from Britain that has a higher than average amount of TV watching (the effect of random variation):
mean1 < mean2, therefore mean1 - mean2 < 0.
5- Taking a large number of these repeated random samples and calculating the difference between each pair of sample means, we will end up with the sampling distribution of the difference between two sample means.


Sampling distribution of the difference between two sample means: properties
1- It will be a t-distribution.
2- The mean of the difference between sample means will be zero:
\[ \mu_{\bar{X}_1 - \bar{X}_2} = 0 \]
3- The spread of scores around this mean of zero will be defined by the pooled variance estimate. This estimate assumes that the populations have equal variance.
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \]

Equation of t test
It will be a t-distribution:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} \]
The mean of the difference between sample means will be zero:
\[ \mu_{\bar{X}_1 - \bar{X}_2} = 0 \]
The spread of scores around this mean of zero will be defined by the formula:
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \]
(Pooled variance estimate, populations have equal variance.)

Two Sample t-test
• A survey consists of 20 Australian children and 20 British children, and we want to assess whether TV viewing time is affected by the country of residence.
• State the null and alternative hypotheses:
H0: µ1 = µ2 or µ1 - µ2 = 0 and
Ha: µ1 ≠ µ2 or µ1 - µ2 ≠ 0.
• Choose the test of significance: two samples, compared and measured at the interval/ratio scale.
• Calculate the sample score:
mean1 (Australian) = 166, s1 = 29, n1 = 20 and for British mean2 = 187, s2 = 30, and n2 = 20.
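The pooled formula for the spread of the difference between two sample means can be written as a pair of small Python functions. This is a minimal sketch assuming equal population variances, as the slides do; the function names and the numbers in the usage lines are invented for illustration.

```python
import math


def pooled_standard_error(s1, n1, s2, n2):
    """Standard error of (mean1 - mean2) from the pooled variance estimate."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance) * math.sqrt((n1 + n2) / (n1 * n2))


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) for the equal-variance two-sample t test."""
    se = pooled_standard_error(s1, n1, s2, n2)
    return (mean1 - mean2) / se, n1 + n2 - 2


# Illustrative (made-up) numbers: two samples of 50 with S.D 10 each.
t, df = two_sample_t(54.0, 10.0, 50, 50.0, 10.0, 50)
print(f"t = {t:.2f}, df = {df}")  # t = 2.00, df = 98
```

When the two standard deviations are equal the pooled variance is simply their common variance, which makes the made-up case easy to check by hand.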


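Two samples t-test
• t = (mean1 - mean2) divided by the standard error of the difference, where
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \]
So,
\[ \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(20 - 1)29^2 + (20 - 1)30^2}{20 + 20 - 2}} \sqrt{\frac{20 + 20}{20 \times 20}} = 9.3 \]
and t sample = (166 - 187)/9.3 = -2.258
• Establish the critical score and critical region:
degree of freedom = n1 + n2 - 2 = 38.
• Make a decision.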
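The TV-viewing example can be reproduced in Python as a check on the arithmetic and as a sketch of the decision step. One assumption is labelled in the code: the t-table has no row for df = 38, so the two-tail 0.05 critical value for df = 35 (2.030) is used as the nearest smaller row. Without rounding the standard error to 9.3, t comes out as -2.25 rather than -2.258.

```python
import math

mean1, s1, n1 = 166.0, 29.0, 20   # Australian sample
mean2, s2, n2 = 187.0, 30.0, 20   # British sample

# Pooled variance estimate (assumes equal population variances).
pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_variance) * math.sqrt((n1 + n2) / (n1 * n2))

t = (mean1 - mean2) / standard_error
df = n1 + n2 - 2

# Assumption: nearest smaller table row (df = 35), two-tail alpha = 0.05.
critical = 2.030
reject_null = abs(t) > critical

print(f"SE = {standard_error:.2f}, t = {t:.2f}, df = {df}")  # SE = 9.33, t = -2.25, df = 38
print(f"reject H0: {reject_null}")                            # reject H0: True
```

Because the test is two-tailed, the absolute value of t is compared with the critical score.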


df
Level of significance for one-tail test
0.10 0.05 0.025 0.01 0.005
Level of significance for two-tail test
0.20 0.10 0.05 0.02 0.01

1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
35 1.306 1.690 2.030 2.438 2.724
50 1.299 1.676 2.009 2.403 2.678
∞ 1.282 1.645 1.960 2.326 2.576



For the following sets of results, test for a significant difference using a two-tailed test and alpha = 0.05 (assuming equal population variance):

Sample 1:
• Mean = 72
• S.D = 14.2
• Sample size = 35
Sample 2:
• Mean = 76.1
• S.D = 11
• Sample size = 50


Two Samples t-test: exercise
A researcher is interested in the effect that place of residency has on the age at which people begin to smoke.
The researcher divides a random sample of people into 91 rural and 107 urban residents and finds that rural dwellers started smoking at an average age of 15.75 years with an S.D of 2.3 years, whereas the urban dwellers began at a mean age of 14.63 years with an S.D of 4.1 years.
Is there a significant difference?


t-test for paired, dependent, mean difference
Independent samples are those where the criteria for selecting the cases that make up one sample do not affect the criteria for selecting the cases that make up the other sample(s).
Example: average height among different populations.


Dependent samples are those where the criteria for selecting the cases that make up one sample affect the criteria for selecting the cases that make up the other sample(s).
1- When the same subject is observed under two different conditions (pre-test/post-test design; drugs, measurements, etc.).
2- Matched pairs (when subjects in different samples are linked for some special reason), e.g. children and their parents.


The dependent t-test
A survey of 10 families is conducted, and a parent and a child from each household are asked to keep a diary of the amount of TV watched, in minutes, for a certain time period.
For each parent-child pair the amount of TV watched is recorded.
Is there a difference between parents and their children in TV watching?


Steps for calculations
1- Calculate the difference for each pair of cases (D).
2- Calculate the mean of the differences (X̄D).
3- Proceed with the other steps of hypothesis testing:
H0: µD = 0
or Ha: µD ≠ 0


Household    TV, child (minutes)    TV, parent (minutes)    Difference in minutes (D)
1            45                     23                      22 = 45 - 23
2            56                     25                      31 = 56 - 25
3            73                     43                      30 = 73 - 43
4            53                     26                      27 = 53 - 26
5            27                     21                      6 = 27 - 21
6            34                     29                      5 = 34 - 29
7            76                     32                      44 = 76 - 32
8            21                     23                      -2 = 21 - 23
9            54                     25                      29 = 54 - 25
10           43                     21                      22 = 43 - 21

Mean         48.2                   26.8                    21.4

Sum of squared differences: ΣD² = 6400 (22×22 + 31×31 + …)
Square of the sum of differences: (ΣD)² = 214 × 214 = 45,796


Tips
• An independent samples t-test looks at the difference between the means.
• A dependent t-test looks at the mean of the differences.
• The mean difference is equal to the difference between means.
• So why go through this alternative procedure for calculating the difference between two means?
• Mean difference = the difference between means, but the variances will not be the same.

Calculate the paired t-test
\[ t = \frac{\bar{X}_D}{s_D / \sqrt{n}} \]
where
\[ s_D = \sqrt{\frac{\sum D^2 - (\sum D)^2 / n}{n - 1}} \]


sD = √((6400 - 45,796/10)/(10 - 1))
sD = 14.2
t paired = 21.4/(14.2/√10)
t paired = 4.8
With 9 degrees of freedom.
Look for the critical score and critical region.
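The paired calculation can be verified with a short Python sketch that starts from the ten household observations in the table. The variable names are chosen for this illustration only.

```python
import math

# TV watched in minutes, by household, from the table of 10 families.
child = [45, 56, 73, 53, 27, 34, 76, 21, 54, 43]
parent = [23, 25, 43, 26, 21, 29, 32, 23, 25, 21]

d = [c - p for c, p in zip(child, parent)]   # difference for each pair (D)
n = len(d)
sum_d = sum(d)                               # sum of differences: 214
sum_d_squared = sum(x * x for x in d)        # sum of squared differences: 6400

mean_d = sum_d / n
s_d = math.sqrt((sum_d_squared - sum_d**2 / n) / (n - 1))
t = mean_d / (s_d / math.sqrt(n))
df = n - 1

print(f"mean D = {mean_d:.1f}, sD = {s_d:.1f}, t = {t:.2f}, df = {df}")
# mean D = 21.4, sD = 14.2, t = 4.76, df = 9
```

The unrounded t of 4.76 agrees with the value of 4.8 obtained by hand.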


df
Level of significance for one-tail test
0.10 0.05 0.025 0.01 0.005
Level of significance for two-tail test
0.20 0.10 0.05 0.02 0.01

1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
35 1.306 1.690 2.030 2.438 2.724
50 1.299 1.676 2.009 2.403 2.678
∞ 1.282 1.645 1.960 2.326 2.576



Finally make a decision
There is a statistically significant difference in the amount of TV watching between parents and their children.


Paired t-test
• One hundred and forty patients are given a new treatment for lowering blood pressure.
• The mean difference between systolic blood pressure for these patients before and after the treatment is -9, with an S.D of 8.
• The treatment will only be adopted if the result is significant at the 0.01 level. Should it be adopted?


Paired t-test: what is the mean difference for the following 10 pairs of observations? Conduct a dependent-samples t-test on these data, with a 0.05 level of significance.

Observation 1    Observation 2
12               15
10               13
8                13
14               14
12               18
15               13
14               18
9                9
18               11
13               14

Thank you