Research Methodology and Biostatistics Part II 2
Research Methodology and Biostatistics – Part II
Statistical tests of significance, types of significance tests, parametric tests (Student's t-test, ANOVA, correlation coefficient, regression)
Inferential Statistics
• Webster's New Collegiate Dictionary
• Infer – to derive as a conclusion from facts or premises.
• Inference – the act of passing from statistical sample data to generalizations.
• Statistics – a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.
• Statistical inference is the process of using data obtained from a small group of elements (the sample) to make estimates and test hypotheses about the characteristics of a larger group of elements (the population).
• EXAMPLE 1: To test the efficacy of drug/s
• EXAMPLE 2: The time required by a robot to do a repetitive task is determined from a few sample observations.
Biostatistical
Inference
Involves
– Estimation
– Hypothesis testing
Purpose
– Draw conclusions or inferences about
population characteristics
Sampling
Population (universe)
– The set of all items of interest
– The word population does not necessarily refer to a group of people.
Sample
– A set of data drawn (or observed) from the population.
Parameter: Population characteristics or summary measures of the population
are called parameters and they are always constant. Parameters are calculated
from the population data or they are estimated from the sample statistics.
Statistic: Sample characteristics or summary measures of the sample are called
statistics, and they vary from sample to sample. Statistics are used to estimate
the corresponding population parameters. The size of the sample is denoted by “n”.
Sampling distribution: The frequency distribution formed from the
various values of a statistic computed from different samples of the same size
drawn from the same population.
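The idea of a sampling distribution can be illustrated with a small simulation. The following is a minimal sketch in Python (standard library only); the population (normal, mean 50, SD 10), the sample size 25, and the function name `sample_means` are assumptions chosen for illustration and do not come from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values drawn from a normal distribution.
population = [random.gauss(50, 10) for _ in range(10_000)]

def sample_means(population, n, num_samples):
    """Draw num_samples random samples of size n and return the mean of each."""
    return [statistics.mean(random.sample(population, n))
            for _ in range(num_samples)]

# The list of sample means approximates the sampling distribution of the mean.
means = sample_means(population, n=25, num_samples=1000)

print("population mean (parameter):", round(statistics.mean(population), 2))
print("mean of the sample means:   ", round(statistics.mean(means), 2))
print("SD of the sample means:     ", round(statistics.stdev(means), 2))
```

The parameter stays fixed, while the statistic (the sample mean) varies from sample to sample; the spread of `means` is much narrower than the spread of the population itself.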
Biostatistical Inference
• Biostatistical inference is a technique by which valid inferences about the population parameter are drawn.
• Two aspects: Estimation and Testing of Hypothesis
Estimation: Method by which population parameters are estimated from the sample information.
Two types: (i) Point Estimate (ii) Interval Estimate
Point Estimate: An estimate given by a single value of a statistic, used to approximate the parameter of an unknown population; this is the point estimate / estimator of the parameter.
Interval Estimate: An estimate of the population parameter given by two numbers between which the parameter is considered to lie. The two values are computed in such a way that the interval between them contains the parameter; this is the interval estimate / confidence interval.
A good estimator is one which is very close to the value of the parameter.
Properties: Efficiency, Sufficiency
Testing Hypothesis
Any Statement about a biostatistical population
or the values of its parameter is called
“Biostatistical Hypothesis”
• Two types of statistical Hypothesis
(i) Simple Hypothesis (ii)Composite Hypothesis
• Null Hypothesis
• Alternative Hypothesis
• Critical Region: A critical region, also known as
the rejection region, is a set of values for the
test statistic for which the null hypothesis is
rejected. i.e. if the observed test statistic is in the
critical region then we reject the null hypothesis and
accept the alternative hypothesis.
Types of errors
• Two types of errors:
• Type I error: Reject H0 when it is true.
• Type II error: Accept H0 when it is wrong, i.e., accept H0 when H1 is true.

Decision from sample
            Reject H0                         Accept H0
H0 True     Wrong decision (type I error)     Correct
H0 False    Correct                           Wrong decision (type II error)

P(Reject H0 when it is true) = P(Reject H0 | H0) = α
P(Accept H0 when it is false) = P(Accept H0 | H1) = β
α and β are called the sizes of the type I error and the type II error, respectively.
In practice, a type I error amounts to rejecting a lot when it is good, and a type II error may be regarded as accepting the lot when it is bad.
Thus
P(Reject a lot when it is good) = α
P(Accept a lot when it is bad) = β
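The meaning of α can be checked by simulation: when H0 is true, a test run at the 5% level should reject in roughly 5% of repeated samples. The sketch below is illustrative only and rests on assumptions not in the slides: it uses a two-sided z-test with known σ = 1 (critical value 1.96) rather than a t-test, and the function name `type_i_error_rate` is hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def type_i_error_rate(n=20, trials=2000, z_crit=1.96):
    """Estimate P(reject H0 | H0 true) for a two-sided z-test of mean = 0."""
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data really come from N(0, 1).
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        z = statistics.mean(sample) / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a wrong decision: type I error
    return rejections / trials

alpha_hat = type_i_error_rate()
print("estimated type I error rate:", alpha_hat)
```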
The lifetime of electric bulbs for a random sample of 10 from a large consignment gave the following data:

Item                   1    2    3    4    5    6    7    8    9    10
Life (in '000 hours)   4.2  4.6  3.9  4.1  5.2  3.8  3.9  4.3  4.4  5.6

Can we accept the hypothesis that the average lifetime of the bulbs is 4000 hours?
Problem solution
Solution:
• H0 (Null Hypothesis): There is no significant difference between the sample mean and the hypothetical mean, i.e., the sample comes from the population having an average lifetime of 4000 hours.
• H1 (Alternative Hypothesis): There is a significant difference between the sample mean and the population mean, i.e., the sample does not come from the population having an average lifetime of 4000 hours.
Solution conti……
• t = (X̄ − μ) / (s / √n)
• X̄ = Σx / n = 44 / 10 = 4.4 (lifetimes are in '000 hours, so μ = 4)

Sl.No.   x      (x − X̄)   (x − X̄)²
1        4.2    -0.2      0.04
2        4.6    0.2       0.04
3        3.9    -0.5      0.25
4        4.1    -0.3      0.09
5        5.2    0.8       0.64
6        3.8    -0.6      0.36
7        3.9    -0.5      0.25
8        4.3    -0.1      0.01
9        4.4    0         0
10       5.6    1.2       1.44
n = 10   Σx = 44          Σ(x − X̄)² = 3.12

• s = √(Σ(x − X̄)² / (n − 1)) = √(3.12 / (10 − 1)) = √(3.12 / 9) = 0.589
• t = (X̄ − μ) / (s / √n) = (4.4 − 4) / (0.589 / √10) = 2.148
• Degrees of freedom (ν) = n − 1 = 10 − 1 = 9
• The calculated value 2.148 is less than the table value 2.26, so accept the null hypothesis: the average lifetime of the bulbs could be 4000 hours.
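The arithmetic of the bulb-lifetime example can be re-checked with a short script. This is a minimal sketch in Python using only the standard library; the variable names are illustrative, and the critical value 2.26 is copied from the slide rather than computed.

```python
import math
import statistics

# Bulb lifetimes in '000 hours, from the problem statement.
life = [4.2, 4.6, 3.9, 4.1, 5.2, 3.8, 3.9, 4.3, 4.4, 5.6]
mu0 = 4.0  # hypothesised mean: 4000 hours, expressed in '000 hours

n = len(life)
mean = statistics.mean(life)
s = statistics.stdev(life)             # sample SD, divisor n - 1
t = (mean - mu0) / (s / math.sqrt(n))  # one-sample t statistic
df = n - 1

t_table = 2.26  # two-tailed 5% table value for 9 d.f., as quoted in the text
decision = "reject H0" if abs(t) > t_table else "do not reject H0"

print(f"mean = {mean:.2f}, s = {s:.3f}, t = {t:.3f}, df = {df}")
print(decision)
```

`statistics.stdev` uses the n − 1 divisor, which matches the formula for s used in the solution.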
Problem
• A random sample of size 16 has a mean of 53. The sum of the squares of the
deviations from the mean is 135. Can this sample be regarded as taken from
a population with 56 as the mean? Obtain the 95% and 99% confidence limits of
the mean of the population.
• Solution:
• H0 (Null Hypothesis): There is no significant difference between the sample
mean and the population mean i.e., the sample comes from the population
having a mean of 56.
• H1 (Alternative Hypothesis): There is a significant difference between the
sample mean and the population mean i.e., the sample does not come from
the population having a mean of 56.
Problem – Solution ……
• t = (X̄ − μ) / (s / √n)
X̄ = 53
μ = 56
n = 16
Σ(X − X̄)² = 135
s = √(Σ(X − X̄)² / (n − 1)) = √(135 / (16 − 1)) = √(135 / 15) = 3
|t| = |53 − 56| / (3 / √16) = 3 / 0.75 = 4
• Degrees of freedom (ν) = n − 1 = 16 − 1 = 15; table value t0.05 = 2.13.
• The calculated value 4 is greater than the table value t0.05 = 2.13, so reject the null hypothesis: there is a significant difference between the sample mean and the population mean, i.e., the sample does not come from the population having a mean of 56.
Problem – Solution conti….
• 95% and 99% confidence limits
• The 95% fiducial limits: X̄ ± t0.05 × (s / √n)
= 53 ± (3 / √16) × 2.13
= 53 ± 1.6
= 51.4 to 54.6
• The 99% fiducial limits: X̄ ± t0.01 × (s / √n)
= 53 ± (3 / √16) × 2.95 = 53 ± 2.212
= 50.788 to 55.212
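The test and the confidence limits for this problem use only summary statistics, so they can be reproduced in a few lines. The following is a sketch in Python (standard library only); the helper `confidence_limits` is a hypothetical name, and the table values 2.13 and 2.95 are taken from the slide rather than computed.

```python
import math

# Summary statistics given in the problem.
n = 16
mean = 53.0
mu0 = 56.0
sum_sq_dev = 135.0  # sum of squared deviations from the sample mean

s = math.sqrt(sum_sq_dev / (n - 1))  # sample standard deviation
se = s / math.sqrt(n)                # standard error of the mean
t = (mean - mu0) / se                # negative: sample mean is below mu0

# Table values for 15 d.f. as quoted in the text.
t_05, t_01 = 2.13, 2.95

def confidence_limits(mean, se, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * standard error."""
    half_width = t_crit * se
    return mean - half_width, mean + half_width

ci95 = confidence_limits(mean, se, t_05)
ci99 = confidence_limits(mean, se, t_01)

print(f"s = {s:.3f}, |t| = {abs(t):.3f}")
print("95% limits:", ci95)
print("99% limits:", ci99)
```

Neither interval contains 56, which agrees with the decision to reject H0.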
Problem for practice
• Nakamura et al. studied subjects with medial collateral ligament (MCL) and anterior cruciate ligament (ACL)
tears. Between February 1995 and December 1997, 17 consecutive patients with combined acute ACL and grade
III MCL injuries were treated by the same physician at the research center. One of the variables of interest was
the length of time in days between the occurrence of the injury and the first magnetic resonance imaging (MRI).
The data are shown in Table. We wish to know if we can conclude that the mean number of days between injury
and initial MRI is not 15 days in a population presumed to be represented by these sample data.
Test of difference between means of two samples
• t = ((X̄1 − X̄2) / S) × √(n1 n2 / (n1 + n2))
• S = √([Σ(X1 − X̄1)² + Σ(X2 − X̄2)²] / (n1 + n2 − 2))
• Degrees of freedom ν = n1 + n2 − 2
Problems for practice
• Two types of drugs were used on 5 and 7 patients for reducing their weight. Drug A was imported, and Drug B was indigenous. The decrease in weight after using the drugs for six months was as follows:

Drug A   10  12  13  11  14
Drug B   8   9   12  14  15  10  9

• Is there a significant difference in the efficacy of the two drugs?
Solution
• H0 (Null Hypothesis): There is no significant difference between the efficacy of Drug A and Drug B.
• H1 (Alternative Hypothesis): There is a significant difference between the efficacy of Drug A and Drug B.
• t = ((X̄1 − X̄2) / S) × √(n1 n2 / (n1 + n2))
Solution
X1    X1 − X̄1   (X1 − X̄1)²   X2    X2 − X̄2   (X2 − X̄2)²
10    -2        4            8     -3        9
12    0         0            9     -2        4
13    +1        1            12    +1        1
11    -1        1            14    +3        9
14    +2        4            15    +4        16
                             10    -1        1
                             9     -2        4
ΣX1 = 60   Σ(X1 − X̄1)² = 10   ΣX2 = 77   Σ(X2 − X̄2)² = 44
X̄1 = ΣX1 / n1 = 60 / 5 = 12
X̄2 = ΣX2 / n2 = 77 / 7 = 11
Solution conti…..
• S = √([Σ(X1 − X̄1)² + Σ(X2 − X̄2)²] / (n1 + n2 − 2))
= √((10 + 44) / (5 + 7 − 2)) = √(54 / 10) = 2.324
• t = ((X̄1 − X̄2) / S) × √(n1 n2 / (n1 + n2))
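The slides stop at the formula for t, so the remaining arithmetic is sketched here in Python (standard library only). The Drug A values are those listed in the solution table (10, 12, 13, 11, 14). The function name `pooled_t` is illustrative, and the critical value 2.228 (two-tailed, 5% level, 10 degrees of freedom) is a standard t-table value supplied as an assumption, since it does not appear in the source.

```python
import math
import statistics

# Weight decrease per patient, as listed in the solution table.
drug_a = [10, 12, 13, 11, 14]
drug_b = [8, 9, 12, 14, 15, 10, 9]

def pooled_t(x1, x2):
    """Two-sample t statistic with a pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)  # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)  # sum of squared deviations, sample 2
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, s, n1 + n2 - 2

t, s, df = pooled_t(drug_a, drug_b)

t_table = 2.228  # assumption: standard two-tailed 5% value for 10 d.f.
decision = "reject H0" if abs(t) > t_table else "do not reject H0"

print(f"S = {s:.3f}, t = {t:.3f}, df = {df}")
print(decision)
```

Under these assumptions the calculated t is well below the table value, so the data give no evidence of a difference in efficacy between the two drugs.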
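Test of difference between means of two samples (Dependent sample or …)
d̄ = Σd / n, where d̄ = mean of the differences
s = standard deviation of the differences = √((Σd² − n d̄²) / (n − 1))

d     4    6    …   Σd = 31
d²    16   36   …   Σd² = 185

d̄ = Σd / n = 31 / 12 = 2.58
s = √((Σd² − n d̄²) / (n − 1)) = √((185 − 12 × 2.58²) / (12 − 1)) = 3.09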
Solution conti…..
• t = d̄ √n / s = 2.58 × √12 / 3.09 = 2.89
• Degrees of freedom = 12 − 1 = 11; table value: 1.80 (right-tailed test)
• Inference: Since the calculated value is greater than the table value, reject the null hypothesis at the 5% level of significance. Hence it is concluded that the stimulus will, in general, be accompanied by an increase in blood pressure.
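Because only the totals Σd = 31 and Σd² = 185 for n = 12 pairs are needed, the paired t statistic can be recomputed from summary values alone. The following is a minimal Python sketch (standard library only) with illustrative variable names; the table value 1.80 is copied from the slide. Using the unrounded mean gives t of about 2.90 instead of 2.89, a difference that comes only from rounding d̄ to 2.58.

```python
import math

# Summary values from the worked example.
n = 12
sum_d = 31.0    # sum of the paired differences
sum_d2 = 185.0  # sum of the squared differences

d_bar = sum_d / n                                     # mean difference
s = math.sqrt((sum_d2 - n * d_bar ** 2) / (n - 1))    # SD of the differences
t = d_bar * math.sqrt(n) / s                          # paired t statistic
df = n - 1

t_table = 1.80  # right-tailed 5% table value for 11 d.f., as quoted in the text
decision = "reject H0" if t > t_table else "do not reject H0"

print(f"d_bar = {d_bar:.3f}, s = {s:.3f}, t = {t:.3f}, df = {df}")
print(decision)
```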
To solve
• Albino rats were administered an ayurvedic medicine at the rate of
10mg/10kg day for 7 days. The initial and final body weights of the rats were
recorded as shown in the following table. Determine whether the drug has
any significant effect on the gain or loss of body weight of the rats.
Rat No. 1 2 3 4 5 6 7 8 9 10
Initial Body 110 115 102 98 112 110 97 120 102 110
Weight