
Research Methodology and Biostatistics – Part II
Statistical tests of significance, types of significance tests, parametric tests (Student's t-test, ANOVA, correlation coefficient, regression)
Inferential Statistics
• Webster's New Collegiate Dictionary:
– Infer: to derive as a conclusion from facts or premises.
– Inference: the act of passing from statistical sample data to generalizations.
– Statistics: a branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data.
• Statistical inference is the process of using data obtained from a small group of elements (the sample) to make estimates and test hypotheses about the characteristics of a larger group of elements (the population).
• EXAMPLE 1: Testing the efficacy of one or more drugs.
• EXAMPLE 2: The time required by a robot to do a repetitive task is determined from a few sample observations.
Biostatistical Inference
Involves
– Estimation
– Hypothesis testing
Purpose
– Draw conclusions or inferences about
population characteristics
Sampling
• Population (universe)
– The set of all items of interest.
– The word population does not necessarily refer to a group of people.
• Sample
– A set of data drawn (or observed) from the population.
Parameter: Population characteristics, or summary measures of the population, are called parameters, and they are always constant. Parameters are calculated from the population data or estimated from the sample statistics.
Statistic: Sample characteristics, or summary measures of the sample, are called statistics, and they vary from sample to sample. Statistics are used to estimate the corresponding population parameters. The size of the sample is denoted by "n".
Sampling distribution: The frequency distribution formed by the various values of a statistic computed from different samples of the same size drawn from the same population.
Biostatistical Inference
• Biostatistical inference is a technique by which valid inferences about the population parameter are drawn from sample data.
• Two aspects: Estimation and Testing of Hypothesis
Estimation: The method by which population parameters are estimated from the sample information.
Two types: (i) Point Estimate (ii) Interval Estimate
Point Estimate: A single value of a statistic used to approximate the parameter of an unknown population is called a point estimate (point estimator) of the parameter.
Interval Estimate: An estimate of the population parameter given by two numbers between which the parameter is considered to lie. The two values are computed in such a way that the interval between them contains the parameter with a stated level of confidence. This is called an interval estimate or confidence interval.
Properties of a Good Estimator
A good estimator is one which is very close to the value of the parameter.
• Unbiasedness
• Consistency
• Efficiency
• Sufficiency
Testing Hypothesis
Any Statement about a biostatistical population
or the values of its parameter is called
“Biostatistical Hypothesis”
• Two types of statistical Hypothesis
(i) Simple Hypothesis (ii)Composite Hypothesis
• Null Hypothesis
• Alternative Hypothesis
• Critical Region: A critical region, also known as
the rejection region, is a set of values for the
test statistic for which the null hypothesis is
rejected. i.e. if the observed test statistic is in the
critical region then we reject the null hypothesis and
accept the alternative hypothesis.
Types of errors
• Type I error: rejecting H0 when it is true.
• Type II error: accepting H0 when it is false, i.e., accepting H0 when H1 is true.

Decision from sample:
              Reject H0                        Accept H0
H0 true       Wrong decision (type I error)    Correct decision
H0 false      Correct decision                 Wrong decision (type II error)

• P(Reject H0 when it is true) = P(Reject H0 | H0) = α
• P(Accept H0 when it is false) = P(Accept H0 | H1) = β
• α and β are called the sizes of the type I error and the type II error, respectively.
• In practice, a type I error amounts to rejecting a lot when it is good, and a type II error may be regarded as accepting a lot when it is bad.
• Thus P(Reject a lot when it is good) = α and P(Accept a lot when it is bad) = β, where α is called the producer's risk and β is called the consumer's risk.
• Level of significance: α, the probability of a type I error, is called the level of significance of the test.
• Power of the test: 1 − β, the probability of rejecting H0 when it is false, where β is the probability of a type II error.
• One-tailed and two-tailed tests
Small Sample Test – Student's t-Test
1. Test of significance for a single mean
2. Test of significance for the difference between two means
3. Test of significance for the difference of two means (paired t-test for difference of means)
4. Correlation coefficient
Correlation
• Correlation analysis
• Correlation coefficient
• Positive and negative correlation
• The value of the correlation coefficient always lies between +1 and -1.
• Types of correlation:
(i) Positive and negative correlation
(ii) Simple and multiple correlation
(iii) Partial and total correlation
(iv) Linear and non-linear correlation
Methods of Studying Correlation
1. Graphic methods:
(a) Scatter diagram (scattergram or dotogram) (b) Simple graph or correlogram
2. Mathematical methods:
(a) Karl Pearson's coefficient of correlation
(b) Spearman's coefficient of correlation
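The two mathematical methods can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the data values are hypothetical, the helper names (`pearson_r`, `ranks`, `spearman_rho`) are made up for the example, and the Spearman shortcut formula used here assumes there are no tied values.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient: covariance over the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Both coefficients fall between -1 and +1, as stated above; a positive value indicates that the two variables tend to increase together.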
Regression
• Regression is the measure of the average relationship between two or more variables in terms of the original units.
• Two variables: the dependent variable (regressand, or explained variable) and the independent variable (regressor, or predictor).
• Graphical representation: the regression line
• Two regression lines: the regression line of X on Y and the regression line of Y on X
• Types of regression:
a. Simple and multiple b. Linear and non-linear c. Total and partial
• Methods:
– Graphic method (scatter diagram)
– Algebraic method
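The algebraic method for the two regression lines can be sketched as follows. This is an illustrative least-squares example rather than anything taken from the slides: the data are hypothetical and the function name `regression_lines` is an assumption made for the sketch.

```python
def regression_lines(x, y):
    """Return (intercept, slope) for Y on X and for X on Y by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    b_yx = sxy / sxx   # slope of Y on X: change in Y per unit of X
    b_xy = sxy / syy   # slope of X on Y: change in X per unit of Y
    return (my - b_yx * mx, b_yx), (mx - b_xy * my, b_xy)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

(a_yx, b_yx), (a_xy, b_xy) = regression_lines(x, y)
print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.2f} X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.2f} Y")
print(f"b_yx * b_xy = {b_yx * b_xy:.2f}  (equals r squared)")
```

The product of the two regression coefficients equals the square of Pearson's correlation coefficient, which links this slide to the correlation slide.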
Test of Significance for Single Mean
• To test the significance of the mean of a random sample (test of significance for a single mean):
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
where $\bar{X}$ = sample mean, $\mu$ = population mean, $s$ = standard deviation, $n$ = sample size (number of observations), and
$s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
• Degrees of freedom: $\nu = n - 1$
• Fiducial limits of the population mean: assuming that the sample is a random sample from a normal population of unknown mean, the 95% fiducial limits of the population mean ($\mu$) are
$\bar{X} \pm \dfrac{s}{\sqrt{n}}\, t_{0.05}$
and the 99% fiducial limits are
$\bar{X} \pm \dfrac{s}{\sqrt{n}}\, t_{0.01}$
Problem
The lifetime of electric bulbs for a random sample of 10 from a large consignment gave the following data:

Item                 1    2    3    4    5    6    7    8    9    10
Life ('000 hours)    4.2  4.6  3.9  4.1  5.2  3.8  3.9  4.3  4.4  5.6

Can we accept the hypothesis that the average lifetime of the bulbs is 4000 hours?
Problem solution
• H0 (Null Hypothesis): There is no significant difference between the sample mean and the hypothetical mean, i.e., the sample comes from a population having an average lifetime of 4000 hours.
• H1 (Alternative Hypothesis): There is a significant difference between the sample mean and the population mean, i.e., the sample does not come from a population having an average lifetime of 4000 hours.
Solution (continued)
• Lifetimes are in thousands of hours, so $\mu = 4$.
• $\bar{X} = \dfrac{\sum x}{n} = \dfrac{44}{10} = 4.4$

Sl. No.   $x$    $(x - \bar{x})$   $(x - \bar{x})^2$
1         4.2    -0.2              0.04
2         4.6    +0.2              0.04
3         3.9    -0.5              0.25
4         4.1    -0.3              0.09
5         5.2    +0.8              0.64
6         3.8    -0.6              0.36
7         3.9    -0.5              0.25
8         4.3    -0.1              0.01
9         4.4     0                0
10        5.6    +1.2              1.44
n = 10    $\sum x = 44$            $\sum (x - \bar{x})^2 = 3.12$

• $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{3.12}{10 - 1}} = \sqrt{\dfrac{3.12}{9}} = 0.589$
• $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}} = \dfrac{4.4 - 4}{0.589 / \sqrt{10}} = 2.148$
• Degrees of freedom: $\nu = n - 1 = 10 - 1 = 9$
• The calculated value, 2.148, is less than the table value, 2.26 (5% level, 9 degrees of freedom), so we accept the null hypothesis: the average lifetime of the bulbs could be 4000 hours.
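The bulb calculation can be checked with a short Python sketch using only the standard library. The data and the table value 2.26 are taken from the slides; the variable names are illustrative assumptions, and the table value is passed in as a constant rather than looked up.

```python
import math
import statistics

# Lifetimes in thousands of hours, copied from the problem table.
lifetimes = [4.2, 4.6, 3.9, 4.1, 5.2, 3.8, 3.9, 4.3, 4.4, 5.6]
mu0 = 4.0        # hypothesised mean: 4000 hours = 4.0 thousand hours
t_table = 2.26   # two-tailed 5% value for 9 df, as quoted in the slides

n = len(lifetimes)
mean = statistics.mean(lifetimes)
s = statistics.stdev(lifetimes)            # sample SD, divides by n - 1
t = (mean - mu0) / (s / math.sqrt(n))      # single-mean t statistic

print(f"mean = {mean:.2f}, s = {s:.3f}, t = {t:.3f}, df = {n - 1}")
print("reject H0" if abs(t) > t_table else "accept H0")
```

Comparing `abs(t)` with the table value corresponds to the two-tailed test used in the worked solution.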
Problem
• A random sample of size 16 has a mean of 53. The sum of the squares of the deviations from the mean is 135. Can this sample be regarded as taken from a population with a mean of 56? Obtain the 95% and 99% confidence limits of the mean of the population.
• Solution:
• H0 (Null Hypothesis): There is no significant difference between the sample
mean and the population mean i.e., the sample comes from the population
having a mean of 56.
• H1 (Alternative Hypothesis): There is a significant difference between the
sample mean and the population mean i.e., the sample does not come from
the population having a mean of 56.
Problem – Solution (continued)
• Given: $\bar{X} = 53$, $\mu = 56$, $n = 16$, $\sum (X - \bar{X})^2 = 135$
• $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\dfrac{135}{16 - 1}} = \sqrt{\dfrac{135}{15}} = 3$
• $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}} = \dfrac{53 - 56}{3 / \sqrt{16}} = -4$, so $|t| = 4$
• Degrees of freedom: $\nu = n - 1 = 16 - 1 = 15$; table value $t_{0.05} = 2.13$.
• The calculated value, 4, is greater than the table value $t_{0.05} = 2.13$, so we reject the null hypothesis: there is a significant difference between the sample mean and the population mean, i.e., the sample does not come from a population having a mean of 56.
• 95% and 99% confidence limits:
– 95% fiducial limits: $\bar{X} \pm \dfrac{s}{\sqrt{n}}\, t_{0.05} = 53 \pm \dfrac{3}{\sqrt{16}} \times 2.13 = 53 \pm 1.6$, i.e., 51.4 to 54.6
– 99% fiducial limits: $\bar{X} \pm \dfrac{s}{\sqrt{n}}\, t_{0.01} = 53 \pm \dfrac{3}{\sqrt{16}} \times 2.95 = 53 \pm 2.212$, i.e., 50.788 to 55.212
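When only summary figures are available, as in this problem, the same steps can be sketched from the sum of squared deviations. The function names below are hypothetical, and the critical values 2.13 and 2.95 are the table values quoted in the slides for 15 degrees of freedom.

```python
import math

def t_from_summary(mean, mu0, sum_sq_dev, n):
    """Single-mean t statistic from the mean and sum of squared deviations."""
    s = math.sqrt(sum_sq_dev / (n - 1))   # sample standard deviation
    se = s / math.sqrt(n)                 # standard error of the mean
    return (mean - mu0) / se, s, se

def confidence_limits(mean, se, t_crit):
    """Limits mean -/+ t_crit * standard error."""
    half_width = t_crit * se
    return mean - half_width, mean + half_width

t, s, se = t_from_summary(mean=53, mu0=56, sum_sq_dev=135, n=16)
lo95, hi95 = confidence_limits(53, se, t_crit=2.13)   # table value, 5% level
lo99, hi99 = confidence_limits(53, se, t_crit=2.95)   # table value, 1% level

print(f"s = {s:.2f}, t = {t:.2f}")
print(f"95% limits: {lo95:.2f} to {hi95:.2f}")
print(f"99% limits: {lo99:.2f} to {hi99:.2f}")
```

The statistic is negative because the sample mean lies below the hypothesised mean; its absolute value is what gets compared with the two-tailed table value.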
Problem for practice
• Nakamura et al. studied subjects with medial collateral ligament (MCL) and anterior cruciate ligament (ACL)
tears. Between February 1995 and December 1997, 17 consecutive patients with combined acute ACL and grade
III MCL injuries were treated by the same physician at the research center. One of the variables of interest was
the length of time in days between the occurrence of the injury and the first magnetic resonance imaging (MRI).
The data are shown in Table. We wish to know if we can conclude that the mean number of days between injury
and initial MRI is not 15 days in a population presumed to be represented by these sample data.
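Test of difference between means of two samples
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$
• $S = \sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}}$
• Degrees of freedom: $\nu = n_1 + n_2 - 2$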
Problems for practice
• Two types of drugs were used on 5 and 7 patients for reducing their weight. Drug A was imported, and Drug B was indigenous. The decrease in weight after using the drugs for six months was as follows:

Drug A    10   12   13   11   14
Drug B     8    9   12   14   15   10   9

• Is there a significant difference in the efficacy of the two drugs?
Solution
• H0 (Null Hypothesis): There is no significant difference between the efficacy of Drug A and Drug B.
• H1 (Alternative Hypothesis): There is a significant difference between the efficacy of Drug A and Drug B.
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$

$X_1$   $X_1 - \bar{X}_1$   $(X_1 - \bar{X}_1)^2$   $X_2$   $X_2 - \bar{X}_2$   $(X_2 - \bar{X}_2)^2$
10      -2                  4                       8       -3                  9
12       0                  0                       9       -2                  4
13      +1                  1                       12      +1                  1
11      -1                  1                       14      +3                  9
14      +2                  4                       15      +4                  16
                                                    10      -1                  1
                                                    9       -2                  4
$\sum X_1 = 60$   $\sum (X_1 - \bar{X}_1)^2 = 10$   $\sum X_2 = 77$   $\sum (X_2 - \bar{X}_2)^2 = 44$

• $\bar{X}_1 = \dfrac{\sum X_1}{n_1} = \dfrac{60}{5} = 12$, $\bar{X}_2 = \dfrac{\sum X_2}{n_2} = \dfrac{77}{7} = 11$
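Solution (continued)
• $S = \sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{10 + 44}{5 + 7 - 2}} = \sqrt{\dfrac{54}{10}} = 2.324$
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}} = \dfrac{12 - 11}{2.324} \sqrt{\dfrac{5 \times 7}{5 + 7}} = \dfrac{1.708}{2.324} = 0.735$
• Degrees of freedom: $\nu = n_1 + n_2 - 2 = 5 + 7 - 2 = 10$
• The calculated value, 0.735, is less than the table value, 2.23, at the 5% level of significance.
• Since the calculated value is less than the table value, we accept the null hypothesis: there is no significant difference in the efficacy of the two drugs.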
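The pooled two-sample calculation can be reproduced with the sketch below. The Drug A values are the ones used in the solution table (10, 12, 13, 11, 14, which sum to 60); the function name `pooled_t` and the variable names are assumptions for illustration, and 2.23 is the table value quoted in the slides.

```python
import math

def pooled_t(x1, x2):
    """Two-sample t statistic with a pooled standard deviation S."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)   # squared deviations, sample 1
    ss2 = sum((v - m2) ** 2 for v in x2)   # squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s, n1 + n2 - 2

drug_a = [10, 12, 13, 11, 14]          # weight decrease, Drug A
drug_b = [8, 9, 12, 14, 15, 10, 9]     # weight decrease, Drug B
t_table = 2.23                         # 5% level, 10 df, from the slides

t, s, df = pooled_t(drug_a, drug_b)
print(f"S = {s:.3f}, t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > t_table else "accept H0")
```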
Test of difference between means of two samples – with two separate standard deviations
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
• $\bar{X}_1$ = mean of sample (group) 1
• $\bar{X}_2$ = mean of sample (group) 2
• $s_1$ = standard deviation of sample (group) 1
• $s_2$ = standard deviation of sample (group) 2
• $s_1^2$ = variance of sample (group) 1
• $s_2^2$ = variance of sample (group) 2
• $n_1$ = number of observations (participants) in group 1 (sample 1)
• $n_2$ = number of observations (participants) in group 2 (sample 2)
• Degrees of freedom: $\nu = n_1 + n_2 - 2$
Problem to Solve
• Samples of sizes 10 and 14 were taken from two normal populations with standard deviations (SD) 3.5 and 5.2. The sample means were found to be 20.3 and 18.6. Test whether the means of the two populations are the same at the 5% level.
• Solution:
– H0: The means of the two populations are the same.
– H1: The means of the two populations are not the same.
• $\bar{X}_1 = 20.3$, $\bar{X}_2 = 18.6$
• $s_1 = 3.5$, $s_2 = 5.2$
• $n_1 = 10$, $n_2 = 14$
• Degrees of freedom: $\nu = 10 + 14 - 2 = 22$
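Solution (continued)
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{20.3 - 18.6}{\sqrt{\dfrac{3.5^2}{10} + \dfrac{5.2^2}{14}}} = \dfrac{1.70}{\sqrt{\dfrac{12.25}{10} + \dfrac{27.04}{14}}} = \dfrac{1.70}{\sqrt{1.23 + 1.93}} = \dfrac{1.70}{\sqrt{3.16}} = \dfrac{1.70}{1.78} = 0.96$
• Degrees of freedom: $\nu = 22$
• The calculated value, 0.96, is less than the table value, 2.074, at the 5% level of significance. We accept the null hypothesis: there is no significant difference between the two means.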
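The separate-standard-deviation form needs only the summary figures, so it can be sketched as a small function. The function name `t_separate_sd` is hypothetical; the inputs and the table value 2.074 come from the problem above.

```python
import math

def t_separate_sd(m1, s1, n1, m2, s2, n2):
    """t statistic using each sample's own variance: s1^2/n1 + s2^2/n2."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
    return (m1 - m2) / se, se

t_table = 2.074   # 5% level, 22 df, as quoted in the slides
t, se = t_separate_sd(m1=20.3, s1=3.5, n1=10, m2=18.6, s2=5.2, n2=14)

print(f"standard error = {se:.3f}, t = {t:.3f}")
print("reject H0" if abs(t) > t_table else "accept H0")
```

Carrying full precision gives a statistic of roughly 0.957, which agrees with the 0.96 obtained on the slide after rounding the intermediate values.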
To solve
• The purpose of a study by Tam et al. was to investigate
wheelchair maneuvering in individuals with lower-
level spinal cord injury (SCI) and healthy controls (C).
Subjects used a modified wheelchair to incorporate a
rigid seat surface to facilitate the specified
experimental measurements. Interface pressure
measurement was recorded by using a high-resolution
pressure-sensitive mat with a spatial resolution of four
sensors per square centimeter taped on the rigid seat
support. During static sitting conditions, average
pressures were recorded under the ischial tuberosities
(the bottom part of the pelvic bones). The data for
measurements of the left ischial tuberosity (in mm Hg)
for the SCI and control groups are shown in Table. We
wish to know if we may conclude, on the basis of
these data, that, in general, healthy subjects exhibit
lower pressure than SCI subjects.
To Solve
• Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode echocardiography and blood pressure measured by a sphygmomanometer. Generally, physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean aortic stiffness index.
Testing difference between means of two samples (dependent samples or matched paired observations) - Paired t-test
• $t = \dfrac{\bar{d}\,\sqrt{n}}{s}$
• $\bar{d} = \dfrac{\sum d}{n}$ = mean of the differences
• $s = \sqrt{\dfrac{\sum d^2 - n\,\bar{d}^2}{n - 1}}$ = standard deviation of the differences
• Degrees of freedom: $\nu = n - 1$
Problem to solve
• To verify whether a course in accounting improved performance, a similar test was given to 12 participants before and after the course. The original marks, recorded in alphabetical order of the participants, were 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53 and 72. After the course, the marks in the same order were 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60 and 78. Was the course useful?
• Solution:
• Null Hypothesis: There is no significant difference in the marks obtained before and after the course. i.e., the
course has not been useful.
• Alternative Hypothesis: There is a significant difference in the marks obtained before and after the course.
i.e., the course has been useful.
Solution (continued)

Participant   Before (1st test)   After (2nd test)   d = (2nd test - 1st test)   $d^2$
A             44                  53                 +9                          81
B             40                  38                 -2                          4
C             61                  69                 +8                          64
D             52                  57                 +5                          25
E             32                  46                 +14                         196
F             44                  39                 -5                          25
G             70                  73                 +3                          9
H             41                  48                 +7                          49
I             67                  73                 +6                          36
J             72                  74                 +2                          4
K             53                  60                 +7                          49
L             72                  78                 +6                          36
Total                                                $\sum d = 60$               $\sum d^2 = 578$
Solution (continued)
• $\sum d = 60$, $\sum d^2 = 578$, $n = 12$
• $\bar{d} = \dfrac{\sum d}{n} = \dfrac{60}{12} = 5$
• $s = \sqrt{\dfrac{\sum d^2 - n\,\bar{d}^2}{n - 1}} = \sqrt{\dfrac{578 - 12 \times 5^2}{12 - 1}} = \sqrt{\dfrac{278}{11}} = 5.03$
• $t = \dfrac{\bar{d}\,\sqrt{n}}{s} = \dfrac{5 \sqrt{12}}{5.03} = \dfrac{5 \times 3.464}{5.03} = 3.443$
• Degrees of freedom = 12 - 1 = 11. Table value = 2.201 (5% level); calculated value = 3.443.
• Since the calculated value is greater than the table value, we reject the null hypothesis: there is a significant difference in the marks obtained before and after the course. Hence the course has been useful.
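The paired test on the accounting marks can be sketched directly from the two lists of marks. The variable names are illustrative assumptions; the marks and the table value 2.201 are taken from the slides.

```python
import math
import statistics

# Marks in alphabetical order of the 12 participants, from the problem.
before = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]
after = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
t_table = 2.201   # 5% level, 11 df, as quoted in the slides

d = [a - b for a, b in zip(after, before)]   # 2nd test minus 1st test
n = len(d)
d_bar = statistics.mean(d)                   # mean of the differences
s = statistics.stdev(d)                      # SD of the differences (n - 1)
t = d_bar * math.sqrt(n) / s                 # paired t statistic

print(f"sum d = {sum(d)}, sum d^2 = {sum(v * v for v in d)}")
print(f"d_bar = {d_bar:.2f}, s = {s:.3f}, t = {t:.3f}, df = {n - 1}")
print("reject H0" if abs(t) > t_table else "accept H0")
```

Without rounding s to 5.03, the statistic comes out at about 3.445 rather than 3.443; the conclusion is unchanged.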
Problem for practice
• A certain stimulus administered to each of 12 patients resulted in the following increases in blood pressure: 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4 and 6. Can it be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure?
• Solution:
• H0: There is no significant difference in the blood pressure readings of the patients before and after the stimulus.
• H1: The stimulus results in an increase in blood pressure (right-tailed test).
• $t = \dfrac{\bar{d}\,\sqrt{n}}{s}$
Solution (continued)

Patient No.   1    2    3    4    5    6    7    8    9    10   11   12   Total
d             5    2    8    -1   3    0    -2   1    5    0    4    6    31
$d^2$         25   4    64   1    9    0    4    1    25   0    16   36   185

• $\bar{d} = \dfrac{\sum d}{n} = \dfrac{31}{12} = 2.58$
• $s = \sqrt{\dfrac{\sum d^2 - n\,\bar{d}^2}{n - 1}} = \sqrt{\dfrac{185 - 12 \times 2.58^2}{12 - 1}} = 3.09$
• $t = \dfrac{\bar{d}\,\sqrt{n}}{s} = \dfrac{2.58 \sqrt{12}}{3.09} = 2.89$
• Degrees of freedom = 12 - 1 = 11. Table value: 1.80 (right-tailed test).
• Inference: Since the calculated value is greater than the table value, we reject the null hypothesis at the 5% level of significance. Hence it is concluded that the stimulus will, in general, be accompanied by an increase in blood pressure.
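The shortcut formula based on the totals of d and d squared, together with the one-sided comparison, can be sketched as follows. The variable names are assumptions for illustration; the differences and the right-tailed table value 1.80 come from the slides.

```python
import math

# Increases in blood pressure for the 12 patients, from the problem.
d = [5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6]
t_table_right = 1.80   # right-tailed 5% value for 11 df, from the slides

n = len(d)
sum_d = sum(d)                     # total of the differences
sum_d2 = sum(v * v for v in d)     # total of the squared differences
d_bar = sum_d / n
s = math.sqrt((sum_d2 - n * d_bar ** 2) / (n - 1))   # shortcut SD formula
t = d_bar * math.sqrt(n) / s

# Right-tailed test: only a large positive t counts against H0.
reject = t > t_table_right
print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"d_bar = {d_bar:.2f}, s = {s:.2f}, t = {t:.2f}")
print("reject H0" if reject else "accept H0")
```

Unlike the two-tailed problems, the comparison here uses the signed statistic rather than its absolute value, because the alternative hypothesis specifies a direction.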
To solve
• Albino rats were administered an Ayurvedic medicine at the rate of 10mg/10kg day for 7 days. The initial and final body weights of the rats were recorded as shown in the following table. Determine whether the drug has any significant effect on the gain or loss of body weight of the rats.

Rat No.               1     2     3     4     5     6     7     8     9     10
Initial body weight   110   115   102   98    112   110   97    120   102   110
Final body weight     109   116   100   95    108   112   98    115   98    111