Basics of Biostatistics PDF
------ Lovitt
© 2006
8/10/2015
Testing of hypothesis
POPULATION
A population is the group from which a sample is drawn.
Example:
All the students in a school
All the patients in a hospital
In research, it is not practical to include all members of a population.
Thus, a sample (a subset of a population) is taken.

SAMPLES
A sample is a subset which should be representative of a population.
A sample should be representative if selected randomly.
In some cases, the sample may be stratified but then randomized within the strata.
3- Surveys
The source may be a survey, if the data needed is about answering certain questions.
For example:
Collecting information about patient’s lifestyle, dietary habits, etc.

4- Experiments
Frequently the data needed to answer a question are available only as the results of an experiment.
A VARIABLE (DATA)
It is a characteristic that takes on different values in different persons, places, or

TYPES OF VARIABLES
Independent variables
Alive 38 29
Total 45 46
FIGURES/CHARTS
Bar
Pie
BAR CHART
Distribution of Religion in a school

CUMULATIVE FREQUENCY DISTRIBUTION
Patients undergoing treatment in a cancer hospital
Dispersion
Each value is a class in its own.
Frequency Histogram
A bar graph that represents the frequency distribution.

Number of Peas in a Pod
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3

[Histogram: x-axis "Number of Peas" (data values 1 to 7), y-axis frequency]
Relative frequency
relative frequency = class frequency / sample size = f / n

Peas per pod    Freq, f    Relative frequency (%)
1               1          2
2               2          4
3               5          10
4               9          18
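The relative-frequency formula can be checked with a short sketch. It uses the peas-per-pod frequencies given in this handout; the function name `relative_frequencies` is only illustrative.

```python
# Peas-per-pod frequency table from this handout: class value -> frequency f
peas = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

def relative_frequencies(table):
    """Return {class: f / n}, where n is the sum of all class frequencies."""
    n = sum(table.values())
    return {cls: f / n for cls, f in table.items()}

rel = relative_frequencies(peas)
for cls, r in rel.items():
    # Print each relative frequency as a proportion and as a percentage
    print(f"{cls} peas: {r:.2f} ({r * 100:.0f}%)")
```

The relative frequencies always add up to 1 (100%), which is a quick check on the arithmetic.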
o Class midpoints: the value halfway between LCL and UCL
  midpoint = [(Lower class limit) + (Upper class limit)] / 2
o Class boundaries: the value halfway between an UCL and the next LCL
  boundary = [(Upper class limit) + (next Lower class limit)] / 2

1. Determine the range of the data.
   Range = highest data value – lowest data value
   May round up to the next convenient number
2. Decide on the number of classes.
   Usually between 5 and 20
3. Find the class width.
   class width = range / number of classes
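A minimal sketch of these class calculations, using the 50-54 and 55-59 kg weight classes from this handout. Rounding the class width up to the next whole number is an assumption about what "next convenient number" means, and the function names are illustrative.

```python
import math

def class_width(lowest, highest, n_classes):
    """Range / number of classes, rounded up to a whole number (assumed convention)."""
    return math.ceil((highest - lowest) / n_classes)

def midpoint(lcl, ucl):
    """Value halfway between the lower and upper class limits."""
    return (lcl + ucl) / 2

def boundary(ucl, next_lcl):
    """Value halfway between an upper class limit and the next lower class limit."""
    return (ucl + next_lcl) / 2

print(midpoint(50, 54))         # class mark of the 50-54 class
print(boundary(54, 55))         # boundary between 50-54 and 55-59
print(class_width(50, 94, 9))   # weights spanning 50 to 94 kg in 9 classes
```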
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
where
x = data values of ungrouped data OR mid-points of the groups in grouped data
f = frequency of each group
FIND MEAN OF THE FOLLOWING DATA

Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3

FIND MEAN OF THE FOLLOWING DATA

Weight in kg.   No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6

Ans.: 70.4
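For grouped data the mean is usually taken as Σfx / Σf, with x the class mid-point. The sketch below applies that to the weight table and reproduces the stated answer of 70.4; the names are illustrative and the formula is the standard one, not copied from the slide.

```python
# Weight classes from the table: ((lower limit, upper limit), frequency)
weights = [((50, 54), 6), ((55, 59), 18), ((60, 64), 78), ((65, 69), 80),
           ((70, 74), 100), ((75, 79), 72), ((80, 84), 30), ((85, 89), 10),
           ((90, 94), 6)]

def grouped_mean(classes):
    """Mean of grouped data: sum(f * mid-point) / sum(f)."""
    total_f = sum(f for _, f in classes)
    total_fx = sum((lo + hi) / 2 * f for (lo, hi), f in classes)
    return total_fx / total_f

print(round(grouped_mean(weights), 1))  # 70.4
```

For the peas table each value is its own class, so the same function works with `lo == hi`.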
PROBLEM 1
Pulse rate of 50 persons is given. Calculate the mean, median and mode of the data.

Pulse rate    No. of persons
67            4
68            5
69            3
70            2
71            7
72            10
73            11
74            3
75            2
76            3

Mean = 71.5
Median = 72
Mode = 73

PROBLEM 2
Calculate the mean, median and mode of the given data.

%Hb        No. of persons
11.1-12    5
12.1-13    10
13.1-14    15
14.1-15    4
15.1-16    2
16.1-17    1

Mean = 13.25
Median = 13.23
Mode = 13.31
PROBLEM 3
The table shows the daily expenditure of 100 college students. Calculate the mean, median and mode of the given data.

Expenditure (Rs.)    No. of students
0 to 10              14
10 to 20             23
20 to 30             26
30 to 40             22
40 to 50             15

Mean = 25.1
Median = 25
Mode = 24.29

PROBLEM 4
Calculate the mean, median and mode of the given data.

Particle size (µ)    No. of particles
100-300              3
301-600              9
601-900              48
901-1200             21
1201-1500            19

Mean = 883
Median = 838
Mode = 777.8
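The answers to Problem 3 can be reproduced with the usual interpolation formulas for grouped data. These formulas are assumed here, since the slides that state them did not survive extraction: median = L + ((n/2 − cf) / f) × h and mode = L + ((f1 − f0) / (2f1 − f0 − f2)) × h, where L is the lower limit of the median or modal class and h its width.

```python
# Problem 3 classes: (lower limit, upper limit, frequency)
expenditure = [(0, 10, 14), (10, 20, 23), (20, 30, 26), (30, 40, 22), (40, 50, 15)]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

def grouped_median(classes):
    """Interpolate inside the class that contains the (n/2)-th observation."""
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f

def grouped_mode(classes):
    """Interpolate inside the class with the highest frequency."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])
    lo, hi, f1 = classes[i]
    f0 = classes[i - 1][2] if i > 0 else 0                  # frequency before modal class
    f2 = classes[i + 1][2] if i < len(classes) - 1 else 0   # frequency after modal class
    return lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

print(round(grouped_mean(expenditure), 2))    # 25.1
print(round(grouped_median(expenditure), 2))  # 25.0
print(round(grouped_mode(expenditure), 2))    # 24.29
```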
Standard deviation,
Variance of sample,
Variance of population,
Weight in kg.   No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6
PROBLEM 1
Pulse rate of 50 persons is given. Calculate the standard deviation of the data.

Pulse rate    No. of persons
67            4
68            5
69            3
70            2
71            7
72            10
73            11
74            3
75            2
76            3
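One way to work Problem 1 is sketched below. It assumes the standard definitions, with variance = Σf(x − mean)² divided by n − 1 for a sample or by N for a population; the handout's own formula slide did not survive extraction, so treat the choice of divisor as an assumption.

```python
import math

# Pulse-rate table from Problem 1: pulse rate -> number of persons
pulse = {67: 4, 68: 5, 69: 3, 70: 2, 71: 7, 72: 10, 73: 11, 74: 3, 75: 2, 76: 3}

def freq_variance(table, sample=True):
    """Variance of a frequency table; divisor n - 1 (sample) or n (population)."""
    n = sum(table.values())
    mean = sum(x * f for x, f in table.items()) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in table.items())
    return sum_sq / (n - 1 if sample else n)

def freq_std(table, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(freq_variance(table, sample))

print(f"sample SD     = {freq_std(pulse):.3f}")
print(f"population SD = {freq_std(pulse, sample=False):.3f}")
```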
PROBLEM 2
Calculate the standard deviation of the given data.

%Hb        No. of persons
11.1-12    5
12.1-13    10
13.1-14    15
14.1-15    4
15.1-16    2
16.1-17    1

PROBLEM 3
The table shows the daily expenditure of 100 college students. Calculate the standard deviation of the given data.

Expenditure (Rs.)    No. of students
0 to 10              14
10 to 20             23
20 to 30             26
30 to 40             22
40 to 50             15
Ordinal: Median/Mode
Quantitative:
Symmetrical – Mean
Skewed – Median
SKEWNESS
Measures asymmetry of data
Positive or right skewed: Longer right tail
Negative or left skewed: Longer left tail

Coefficient of Skewness = $\frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}}$

A normal distribution has a skewness of 0; right-skewed data have a skewness greater than 0 and left-skewed data have a skewness less than 0.

SKEWNESS
If skewness = 0, the data are perfectly symmetrical.
But a skewness of exactly zero is quite unlikely for real-world data.
If skewness is less than −1 or greater than +1, the distribution is highly skewed.
If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
If skewness is between −½ and +½, the distribution is approximately symmetric.
EXAMPLE
College Men’s Heights
Here are grouped data for heights of 100 randomly selected male students.

Table columns: Height (inches) | Class Mark, x | Frequency, f | xf | x − mean | (x − mean)²·f | (x − mean)³·f
Coefficient of Skewness = $\frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}}$
= $\frac{\sqrt{100}\,(-269.33)}{(852.75)^{3/2}}$
= $\frac{-2693.3}{24901.91}$
= −0.108
Conclusion: The skewness lies between −½ and +½, so the data are approximately symmetric, consistent with a normal distribution.

KURTOSIS
Measures peakedness of the distribution of data.
The height and sharpness of the peak relative to the rest of the data are measured by a number called kurtosis.
Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak.
The kurtosis (excess kurtosis) of a normal distribution is 0.
KURTOSIS
There are three types of peakedness.
Leptokurtic - very peaked
Platykurtic - relatively flat
Mesokurtic - in between

KURTOSIS
Mesokurtic has a kurtosis = 0
Leptokurtic has a kurtosis that is +
Platykurtic has a kurtosis that is -

Let $x_1, x_2, \ldots, x_n$ be n observations. Then,

Kurtosis = $\frac{n\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{2}} - 3$
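The skewness and kurtosis formulas in this handout translate directly into code for ungrouped observations. The sketch below is illustrative only: the five-value data set is made up, and the last line simply re-evaluates the sums quoted in the skewness example (n = 100, Σ(x − mean)³ = −269.33, Σ(x − mean)² = 852.75).

```python
import math

def skewness(xs):
    """sqrt(n) * sum((x - mean)^3) / (sum((x - mean)^2))^(3/2)"""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    return math.sqrt(n) * s3 / s2 ** 1.5

def kurtosis(xs):
    """n * sum((x - mean)^4) / (sum((x - mean)^2))^2 - 3 (0 for a normal curve)"""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    return n * s4 / s2 ** 2 - 3

sample = [1, 2, 3, 4, 5]          # hypothetical, perfectly symmetric data
print(skewness(sample))           # symmetric data give a skewness of 0
print(kurtosis(sample))           # negative value: flatter than normal (platykurtic)

# Re-check of the worked skewness example from the sums quoted in the handout
print(round(math.sqrt(100) * -269.33 / 852.75 ** 1.5, 3))  # -0.108
```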
KURTOSIS EXAMPLE
College Men’s Heights
Here are grouped data
for heights of 100
randomly selected male
students.
Calculate the kurtosis
of the data and give
your interpretation.
SAMPLING
The sampling procedure is an essential
ingredient of a good experiment.
An otherwise excellent experiment or
Sampling techniques may be roughly divided into
Probability sampling (Random Sampling)
Non-probability sampling (Authoritative sampling)

A probability sample is one in which each element of the population has a known probability of being included in the sample, and the elements are chosen by some random device.
SAMPLING TECHNIQUES
PROBABILITY SAMPLING TECHNIQUES
SIMPLE RANDOM SAMPLING
STRATIFIED SAMPLING
SYSTEMATIC SAMPLING
CLUSTER SAMPLING

SIMPLE RANDOM SAMPLING
Most commonly used method
Each individual (object) in the population to be sampled has an equal chance of being selected.
Simple random sampling is most effective when the variability is relatively small and uniform over the population
E.g. playing cards, names drawn out of a bowl, lottery
RANDOM SAMPLING
RANDOM NUMBER TABLES
BOOK - A MILLION RANDOM NUMBERS
e.g.
(1) Select a sample of 10 bottles from a batch of 800 bottles.
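A pseudo-random number generator can stand in for a printed random number table. The sketch below assumes the bottles are numbered 1 to 800; the function name and the seed are arbitrary choices for illustration.

```python
import random

def pick_bottles(batch_size=800, sample_size=10, seed=None):
    """Simple random sample without replacement: every bottle is equally likely."""
    rng = random.Random(seed)  # fixing the seed makes the draw repeatable
    return sorted(rng.sample(range(1, batch_size + 1), sample_size))

chosen = pick_bottles(seed=2015)
print(chosen)  # ten distinct bottle numbers between 1 and 800
```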
DEMERITS OF RANDOM SAMPLING
Complete list of all items is required.
When units are spread over large area, this method can’t be used.

STRATIFIED SAMPLING
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling.
Stratified sampling is a recommended way of sampling when the strata are very different from each other, but objects within each stratum are alike.
The strata should be mutually exclusive: every element in the population must be assigned to only one stratum.
The strata should also be collectively exhaustive: no population element can be excluded.
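A minimal sketch of stratified sampling with simple random sampling inside each stratum. Proportional allocation (the same sampling fraction in every stratum) is an assumption, as are the ward labels and the function name; the handout does not specify an allocation rule.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then sample randomly within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        # Each item gets exactly one key, so strata are mutually exclusive
        # and together cover the whole population (collectively exhaustive).
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 patients in ward A and 40 in ward B
patients = [("ward A", i) for i in range(60)] + [("ward B", i) for i in range(40)]
sample = stratified_sample(patients, stratum_of=lambda p: p[0], fraction=0.1, seed=1)
print(sample)
```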
Doesn’t allow estimation of sampling error.
Should be restricted to small population.
Desired precision.
The accuracy and precision desired will be a function of
the sampling procedure and sample size.
CONFIDENCE INTERVALS
Quantifies how far the true mean (µ) lies from the measured mean, $\bar{x}$. Uses the mean and standard deviation of the sample.

$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$

where t is from the t-table and n = number of measurements.
Degrees of freedom (df) = n - 1 for the CI.

EXAMPLE OF CALCULATING A CONFIDENCE INTERVAL
Consider measurement of dissolved Ti in a standard seawater (NASS-3):
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48 nM
DF = n – 1 = 7 – 1 = 6
$\bar{x}$ = 1.34 nM or 1.3 nM
s = 0.17 or 0.2 nM

95% confidence interval
t(df=6,95%) = 2.447
CI95 = 1.3 ± 0.16 or 1.3 ± 0.2 nM

50% confidence interval
t(df=6,50%) = 0.718
CI50 = 1.3 ± 0.05 nM
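The worked example can be re-computed in a few lines. The t values are typed in from the example rather than looked up, and the function name is illustrative.

```python
import math
import statistics

# Dissolved Ti measurements from the example (nM)
ti = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]

def confidence_interval(values, t):
    """Return (mean, half-width), where half-width = t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return mean, t * s / math.sqrt(n)

mean, half95 = confidence_interval(ti, t=2.447)  # t for df = 6, 95%
_, half50 = confidence_interval(ti, t=0.718)     # t for df = 6, 50%
print(f"95% CI: {mean:.2f} ± {half95:.2f} nM")
print(f"50% CI: {mean:.2f} ± {half50:.2f} nM")
```

A higher confidence level uses a larger t and so gives a wider interval for the same data.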
s1 = 0.07₃ mg/g    n1 = 4
s2 = 0.12 mg/g     n2 = 4

If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.

Since |tcalc| (5.056) ≥ ttable (2.447), results from the two methods are significantly different at the 95% CL.
Wait a minute! There is an important assumption associated with this t-test:
It is assumed that the standard deviations (i.e., the precision) of the two sets of data being compared are not significantly different.
•How do you test to see if the two std. devs. are different?
•How do you compare two sets of data whose std. devs. are significantly different?

The F-test is used to determine if std. devs. are significantly different before application of t-test to compare replicate measurements or compare means of two sets of data
Also used as a simple general test to compare the precision (as measured by the std. devs.) of two sets of data
Uses F distribution

$F_{calc} = \frac{s_1^2}{s_2^2}$ where $s_1 > s_2$ (the larger standard deviation goes in the numerator)
$F_{calc} = \frac{s_1^2}{s_2^2} = \frac{(0.12)^2}{(0.07_3)^2} = 2.70$

Note: Keep 2 or 3 decimal places to compare with Ftable.
Compare Fcalc to Ftable at df = (n1 -1, n2 -1) = 3,3 and 95% CL.
If Fcalc < Ftable, std. devs. are not significantly different at 95% CL.
If Fcalc ≥ Ftable, std. devs. are significantly different at 95% CL.
Since Fcalc (2.70) < Ftable (9.28), std. devs. of the two sets of data are not significantly different at the 95% CL. (Precisions are similar.)

The use of the t-test for comparing means was justified for the previous example because we showed that standard deviations of the two sets of data were not significantly different.
If the F-test shows that std. devs. of two sets of data are significantly different and you need to compare the means, use a different version of the t-test
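The F-test decision above can be sketched as follows. The table value 9.28 is copied from the example (df = 3,3 at 95% CL) rather than computed, and the function name is illustrative.

```python
def f_calc(s_a, s_b):
    """Ratio of variances with the larger standard deviation in the numerator."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return (larger / smaller) ** 2

F_TABLE = 9.28                   # from the example: df = (3, 3), 95% CL
f = f_calc(0.073, 0.12)          # standard deviations of the two methods, mg/g

print(f"Fcalc = {f:.2f}")
if f < F_TABLE:
    print("Std. devs. are not significantly different at the 95% CL.")
else:
    print("Std. devs. are significantly different at the 95% CL.")
```

Because the larger value always goes in the numerator, Fcalc is never below 1 and the order of the arguments does not matter.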
n1 + 1 n2 + 1
test (the beastly version) (see previous, fully worked-out
example)
Can use to answer a question such as: Do method one and method two provide similar precisions for the analysis of the same analyte?

Calculate Qcalc and compare to Qtable
Qcalc = gap/range
Measures of association between 2 variables

Z-SCORES
Standardization
The process of converting raw scores to z-scores
The resulting distribution of z-scores will always have a mean of zero, a SD of one, and an area under the curve equal to one
The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table

Refer to a z-table to find proportion under the curve
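Standardization and the z-table lookup can be sketched with Python's standard library, assuming the usual definition z = (x − mean) / SD and a normal distribution. The exam score, mean and SD below are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Convert a raw score to a z-score: number of SDs above or below the mean."""
    return (x - mean) / sd

def proportion_below(z):
    """Area under the standard normal curve to the left of z (the z-table value)."""
    return NormalDist().cdf(z)

z = z_score(85, mean=70, sd=10)            # hypothetical raw score
print(z)                                   # 1.5
print(round(proportion_below(z), 4))       # 0.9332, proportion scoring lower
print(round(1 - proportion_below(z), 4))   # 0.0668, proportion scoring higher
```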
Each entry corresponds to the area under the curve in black.

Z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441