Basics of Biostatistics

8/10/2015
BIOSTATISTICS
Niranjan Kanaki
K. B. Institute of Pharmaceutical Education and Research, Gandhinagar
© 2006

"Statistics is the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena."
------ Lovitt
BIOSTATISTICS
Statistics arising out of biological sciences, such as medicine, public health, plant sciences, agriculture, etc.
The methods used in dealing with statistics in the fields of medicine, biology and public health for planning, conducting and analyzing data which arise in investigations of these branches.

NEED FOR BIOSTATISTICS
Variation is an inherent characteristic of experimental observations.
Reasons for variation:
- The instrument used for the analysis
- The analyst performing the assay
- The particular sample chosen
- Unidentified, uncontrollable background error ("noise")
FUNCTIONS OF STATISTICS
Statistics is a field of study concerned with:
1- collection, organization, summarization and analysis of data
   - DESCRIPTIVE STATISTICS
2- drawing of inferences about a body of data when only a part of the data is observed
   - INFERENTIAL STATISTICS

MAIN BRANCHES OF BIOSTATISTICS
Descriptive Biostatistics
Methods of producing quantitative summaries of information in biological sciences
- Tabulation and graphical presentations
- Measures of central tendency
- Measures of dispersion
BRANCHES OF BIOSTATISTICS
Inferential Biostatistics
Methods of making generalizations about a larger group based on information about a subset (sample) of that group in biological sciences
- Estimation
- Testing of hypothesis

POPULATIONS AND SAMPLES
Samples are usually a relatively small number of observations taken from a relatively large population.
POPULATION
A population is the group from which a sample is drawn.
Example:
- All the students in a school
- All the patients in a hospital
In research, it is not practical to include all members of a population.
Thus, a sample (a subset of a population) is taken.

SAMPLES
A sample is a subset which should be representative of a population.
A sample is more likely to be representative if it is selected randomly.
In some cases, the sample may be stratified but then randomized within the strata.
EXAMPLE
We want a sample that will reflect a population's gender and age:
1. Stratify the data by gender.
2. Within each stratum, further stratify by age.
3. Select randomly within each gender/age stratum so that the number selected will be proportional to that of the population.

EXAMPLES OF POPULATION AND SAMPLE
Population                                    Sample
Tablet batch                                  20 tablets taken for content uniformity
Normal males between 18-65 years              24 subjects selected for a phase I clinical study
Sprague-Dawley rats                           100 rats selected to study toxicity of a new drug
Analysts working for company X                3 analysts selected to test a new assay method
Persons with diastolic BP between             120 such patients selected for a clinical study
  105 and 120 mmHg                              to compare two antihypertensive drugs
Serum cholesterol levels of one patient       Blood samples drawn once a week for 3 months from the patient
PARAMETER AND STATISTIC
Parameter: summary value or characteristic of a population or universe
Statistic: summary value or characteristic of a sample, used for making inferences about the parameter

DATA
The raw material of statistics is data.
We may define data as figures. Figures result from the process of counting or from taking a measurement.
For example:
- When a hospital administrator counts the number of patients (counting).
- When a nurse weighs a patient (measurement).
SOURCES OF DATA
1- Routinely kept records
For example:
- Hospital medical records contain immense amounts of information on patients.
- Records of students' attendance, marks, etc. in a school/college

2- External sources
The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature, i.e. someone else has already asked the same question.
For example:
- studying patients' data from various hospitals and correlating it with a disease condition

3- Surveys
The source may be a survey if the data needed are answers to certain questions.
For example:
- collecting information about patients' lifestyle, dietary habits, etc.

4- Experiments
Frequently the data needed to answer a question are available only as the results of an experiment.
A VARIABLE (DATA)
It is a characteristic that takes on different values in different persons, places, or things.
For example:
- heart rate
- the heights of adult males
- the serum cholesterol levels of a person
- the weight of tablets from a batch

TYPES OF VARIABLES
Independent variables
- Precede dependent variables in time
- Are often manipulated by the researcher
- The treatment or intervention that is used in a study
Dependent variables
- What is measured as an outcome in a study
- Values depend on the independent variable
TYPES OF VARIABLES
Qualitative variables: nominal variables, ordinal variables
Quantitative variables: discrete variables, continuous variables

QUALITATIVE DATA OR VARIABLE (CATEGORICAL DATA)
A variable or characteristic which cannot be measured in quantitative form; it can only be identified by name or category.
For example:
- place of birth
- religion
- stages of breast cancer (I, II, III, or IV)
QUALITATIVE NOMINAL DATA
Data that represent categories or names.
There is no implied order to the categories of nominal data.
In these types of data, individuals are simply placed in the proper category or group, and the number in each category is counted.
Each item must fit into exactly one category.

NOMINAL SCALE DATA EXAMPLE
Survival status of propranolol-treated and control patients with myocardial infarction

Status 28 days after     Propranolol-treated    Control
hospital admission       patients               patients
Dead                     7                      17
Alive                    38                     29
Total                    45                     46
Survival rate            84%                    63%
SOME OTHER EXAMPLES OF NOMINAL DATA
- Sex (M, F)
- Exam result (P, F)
- Blood group (A, B, O or AB)
- Color of eyes (blue, green, brown, black)
- Anemias (microcytic, macrocytic)
- Religion (Christianity, Islam, Hinduism, etc.)

QUALITATIVE ORDINAL DATA
It is similar to nominal data because the measurements involve categories; however, the categories are ordered by rank.
Examples:
- Pain level (mild, moderate, severe)
- Tumors (Stage 0, ..., IV)
- Arthritis (Class 1, ..., 4)
- Military rank (Lt., Capt., Maj., Col., General)
- Response to treatment (poor, fair, good)
- Severity of disease (mild, moderate, severe)
- Income status (low, middle, high)
QUANTITATIVE DATA OR VARIABLE (NUMERICAL DATA)
A quantitative variable is one that can be measured and expressed numerically. It can be of two types (discrete or continuous).

QUANTITATIVE DISCRETE VARIABLES
Discrete variables have a set of possible values that is finite or countably infinite.
They are often whole numbers (integers) and are characterized by gaps or interruptions in the values that they can assume.
Examples:
- The number of daily admissions to a general hospital
- Attendance in a class
QUANTITATIVE CONTINUOUS VARIABLES
Can assume any value within a specified relevant interval of values assumed by the variable.
Examples:
- Height
- Weight
- Duration of seizure
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.

Discrete data: gaps between possible values (e.g. number of children)
Continuous data: theoretically, no gaps between possible values (e.g. Hb)
HOW TO DESCRIBE A CATEGORICAL VARIABLE (QUALITATIVE DATA)?
STATISTICS
- Frequency distribution
- Relative frequency distribution
- Cumulative frequency distribution
FIGURES/CHARTS
- Bar
- Pie

FREQUENCY DISTRIBUTION TABLE
A table that organizes data values into classes or intervals along with the number of values that fall in each class (frequency, f).
FREQUENCY DISTRIBUTION TABLE
Distribution of religion in a school
Religion     Frequency    % Frequency
Hindu        478          79.7
Muslim       65           10.8
Christian    51           8.5
Others       6            1.0
Total        600          100.0

PIE CHART
[Figure: pie chart]
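As a quick check on the % frequency column, here is a minimal Python sketch (standard library only) that recomputes each category's percentage from the counts in the religion table. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Counts from the "Distribution of religion in a school" table
counts = {"Hindu": 478, "Muslim": 65, "Christian": 51, "Others": 6}

n = sum(counts.values())  # total number of students

# Percentage (relative) frequency of each category, rounded to 1 decimal place
percent = {religion: round(100 * f / n, 1) for religion, f in counts.items()}

for religion, f in counts.items():
    print(f"{religion:<10}{f:>5}{percent[religion]:>7}")
print(f"{'Total':<10}{n:>5}{100.0:>7}")
```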
BAR CHART
[Figure: bar chart]

CUMULATIVE FREQUENCY DISTRIBUTION
Patients undergoing treatment in a cancer hospital
Stage of cancer    No. of patients    Cumulative frequency
I                  52                 52
II                 24                 (52+24=) 76
III                69                 (52+24+69=) 145
IV                 20                 (52+24+69+20=) 165
Total              165
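The running totals in the cumulative frequency column can be reproduced with `itertools.accumulate`. This is a minimal sketch using the four stage counts from the table; the variable names are illustrative.

```python
from itertools import accumulate

stages = ["I", "II", "III", "IV"]
patients = [52, 24, 69, 20]  # no. of patients per stage

cumulative = list(accumulate(patients))  # running total of the frequencies
for stage, f, cf in zip(stages, patients, cumulative):
    print(stage, f, cf)
```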
HOW TO DESCRIBE A NUMERICAL VARIABLE (QUANTITATIVE DATA)?
STATISTICS
- Frequency distribution
- Central tendency
- Dispersion
FIGURES/CHARTS
- Histogram
- Frequency polygon

FREQUENCY DISTRIBUTIONS
1. Ungrouped frequency distribution: for data sets with few different values. Each value is a class on its own.
2. Grouped frequency distribution: for data sets with many different values, which are grouped together in classes.
GROUPED AND UNGROUPED FREQUENCY DISTRIBUTIONS
Ungrouped                           Grouped
Age of child    Frequency, f        Age of voters    Frequency, f
1               25                  18-30            202
2               38                  31-42            508
3               217                 43-54            620
4               1462                55-66            413
5               932                 67-78            158
6               15                  79-90            32

UNGROUPED FREQUENCY DISTRIBUTIONS
Number of peas in a pod (sample size: 50)
5 5 4 6 4
3 7 6 3 5
6 5 4 5 5
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5

Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
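To show how the ungrouped frequency table is obtained from the raw observations, here is a minimal sketch that tallies the 50 pea counts with `collections.Counter`. The values are transcribed from the slide; the row arrangement in the list is only for readability.

```python
from collections import Counter

# The 50 pod counts from the "Number of peas in a pod" slide
peas = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

freq = Counter(peas)  # value -> number of pods with that many peas
for value in sorted(freq):
    print(value, freq[value])
```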
GRAPHS OF FREQUENCY DISTRIBUTIONS: FREQUENCY HISTOGRAMS
Frequency histogram
A bar graph that represents the frequency distribution.
- The horizontal scale is quantitative and measures the data values.
- The vertical scale measures the frequencies of the classes.
- Consecutive bars must touch.

FREQUENCY HISTOGRAM
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
[Figure: frequency histogram "Number of Peas in a Pod"; x-axis: number of peas (1-7); y-axis: frequency, f]
RELATIVE FREQUENCY DISTRIBUTIONS AND RELATIVE FREQUENCY HISTOGRAMS
Relative frequency distribution
Shows the portion or percentage of the data that falls in a particular class.
$$\text{relative frequency} = \frac{\text{class frequency}}{\text{sample size}} = \frac{f}{n}$$
Relative frequency histogram
Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies.

RELATIVE FREQUENCY DISTRIBUTIONS AND RELATIVE FREQUENCY HISTOGRAMS
Peas per pod    Freq, f    Rel. freq. (%)
1               1          2
2               2          4
3               5          10
4               9          18
5               18         36
6               12         24
7               3          6
Total           50
[Figure: relative frequency histogram "No. of peas in a pod"; x-axis: no. of peas per pod (1-7); y-axis: relative frequency (%)]
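A minimal sketch of the relative frequency formula f/n applied to the peas-per-pod table; it expresses each relative frequency as a percentage, matching the Rel. freq. column. Names are illustrative.

```python
freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}  # peas per pod -> f
n = sum(freq.values())  # sample size

# relative frequency = f / n, expressed here as a percentage
rel_freq = {x: 100 * f / n for x, f in freq.items()}
for x, f in freq.items():
    print(x, f, rel_freq[x])
```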
GROUPED FREQUENCY DISTRIBUTIONS
Grouped frequency distribution
For data sets with many different values.
Groups data into 5-20 classes of equal width.
Exam scores    Freq, f
30-39          1
40-49          0
50-59          4
60-69          9
70-79          13
80-89          10
90-99          3

GROUPED FREQUENCY DISTRIBUTION TERMS
- Class limits: the smallest value of a class is the lower class limit (LCL) and the highest value of a class is its upper class limit (UCL).
- Class width: the difference between two consecutive lower class limits.
LABELING GROUPED FREQUENCY DISTRIBUTIONS
- Class midpoint: the value halfway between the LCL and UCL
$$\text{Class midpoint} = \frac{\text{(Lower class limit)} + \text{(Upper class limit)}}{2}$$
- Class boundary: the value halfway between a UCL and the next LCL
$$\text{Class boundary} = \frac{\text{(Upper class limit)} + \text{(next Lower class limit)}}{2}$$

CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION
1. Determine the range of the data.
   Range = highest data value – lowest data value
2. Decide on the number of classes (usually between 5 and 20).
3. Find the class width.
$$\text{class width} = \frac{\text{range}}{\text{number of classes}}$$
   Round up to the next convenient number.
CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION
4. Find the class limits.
   - Choose the first LCL: use the minimum data entry or something smaller that is convenient.
   - Find the remaining LCLs: add the class width to the lower limit of the preceding class.
   - Find the UCLs: remember that classes must cover all data values and cannot overlap.
5. Find the frequencies for each class.
(Larson/Farber 4th ed.)

SERUM CHOLESTEROL CHANGES (MG%) FOR 156 PATIENTS AFTER ADMINISTRATION OF AN ANTI-HYPERCHOLESTEROLEMIC DRUG
[Table of values not reproduced]
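The five construction steps can be sketched as a small Python function. This is an illustration under stated assumptions, not a prescribed algorithm: it assumes integer-valued data, starts the first class at the minimum value, and widens the classes by one when range/classes divides exactly, so that the highest value is still covered. The `scores` list and the helper name `grouped_frequency` are hypothetical, used only to exercise the steps.

```python
import math


def grouped_frequency(data, num_classes):
    """Return (LCL, UCL, f) rows for integer data, following steps 1-5.

    Assumes integer-valued data, so each class runs LCL .. LCL + width - 1.
    """
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                    # step 1: range
    width = math.ceil(data_range / num_classes)      # step 3: round up
    if width * num_classes <= data_range:            # ensure the highest value is covered
        width += 1
    table = []
    lcl = lowest                                     # step 4: first LCL = minimum entry
    for _ in range(num_classes):
        ucl = lcl + width - 1                        # classes do not overlap
        f = sum(lcl <= x <= ucl for x in data)       # step 5: class frequency
        table.append((lcl, ucl, f))
        lcl += width                                 # next LCL = previous LCL + width
    return table


# Hypothetical exam scores, used only to exercise the function
scores = [38, 52, 55, 57, 61, 63, 64, 66, 68, 69, 70, 72,
          73, 75, 77, 78, 79, 81, 84, 86, 88, 91, 95]
for lcl, ucl, f in grouped_frequency(scores, 7):
    print(f"{lcl}-{ucl}\t{f}")
```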
FREQUENCY DISTRIBUTION
[Figure: frequency distribution of serum cholesterol changes]

HISTOGRAM / FREQUENCY POLYGON
[Figure: age distribution of people watching a particular TV show]

FREQUENCY POLYGON
[Figure: age distribution of people watching a particular TV show]

MEASURES OF CENTRAL TENDENCY
- Mean
- Median
- Mode
MEASURE OF CENTRAL TENDENCY: MEAN
Mean: the sum of all the values of a data series divided by the total number of values.
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$

MEAN
Mean of ungrouped and grouped frequency distributions:
$$\bar{x} = \frac{\sum f x}{\sum f}$$
where
x = data values of ungrouped data OR mid-points of the groups in grouped data
f = frequency of each group
FIND MEAN OF THE FOLLOWING DATA
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3

FIND MEAN OF THE FOLLOWING DATA
Weight in kg    No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6
Ans.: 70.4
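The mean formula x̄ = Σfx/Σf can be checked against both exercises with a short sketch. For the grouped weight data it uses the class midpoints (LCL + UCL)/2 as x; the helper name `grouped_mean` is hypothetical.

```python
def grouped_mean(values, freqs):
    """x̄ = Σfx / Σf, where x is a data value (ungrouped) or a class midpoint (grouped)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Ungrouped: peas per pod
peas_mean = grouped_mean([1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 9, 18, 12, 3])

# Grouped: weight classes 50-54, 55-59, ..., 90-94, represented by their midpoints
classes = [(50, 54), (55, 59), (60, 64), (65, 69), (70, 74),
           (75, 79), (80, 84), (85, 89), (90, 94)]
persons = [6, 18, 78, 80, 100, 72, 30, 10, 6]
midpoints = [(lcl + ucl) / 2 for lcl, ucl in classes]
weight_mean = grouped_mean(midpoints, persons)

print(round(peas_mean, 2), round(weight_mean, 1))  # 4.78 70.4
```

The weight result reproduces the stated answer of 70.4 kg; the peas-per-pod mean comes out to 239/50 = 4.78.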
MEASURES OF CENTRAL TENDENCY: MEDIAN
Median
The value that divides a series of values in half when they are all listed in order.
- When there is an odd number of values, the median is the middle value.
- When there is an even number of values, the median is the mean of the two middle values.

MEDIAN OF GROUPED DATA
$$\text{Median} = L + \frac{\frac{n}{2} - cf_b}{f_m} \times c$$
where:
L is the lower class boundary of the class containing the median
n is the total number of data
cf_b is the cumulative frequency of the class before the median class
f_m is the frequency of the median class
c is the class width
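A minimal sketch of the grouped-median formula applied to the weight data. It assumes class boundaries are obtained by extending each class limit by 0.5 (so 50-54 becomes 49.5-54.5), matching the definition of L as a lower class boundary; `grouped_median` is a hypothetical helper name.

```python
def grouped_median(boundaries, freqs):
    """Median = L + ((n/2 - cf_b) / f_m) * c.

    boundaries: list of (lower, upper) class boundaries; freqs: class frequencies.
    """
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cum + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty distribution")


# Weight data: limits 50-54, 55-59, ... become boundaries 49.5-54.5, 54.5-59.5, ...
boundaries = [(49.5 + 5 * i, 54.5 + 5 * i) for i in range(9)]
persons = [6, 18, 78, 80, 100, 72, 30, 10, 6]
print(round(grouped_median(boundaries, persons), 1))  # 70.4
```

Here the median class is 70-74 (L = 69.5, cf_b = 182, f_m = 100, c = 5), giving 69.5 + (200 − 182)/100 × 5 = 70.4.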
FIND MEDIAN OF THE FOLLOWING DATA
Weight in kg    No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6
Ans.: 70.4

MEASURE OF CENTRAL TENDENCY: MODE
Mode
The data value that occurs with the greatest frequency.
- If no value is repeated, the data set has no mode.
- If two values occur with the same greatest frequency, each value is a mode (bimodal).
Examples:
a) 5.40 1.10 0.42 0.73 0.48 1.10: mode is 1.10
b) 27 27 27 55 55 55 88 88 99: bimodal, 27 and 55
c) 1 2 3 6 7 8 9 10: no mode
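The three mode examples can be reproduced with a small helper. The standard library's `statistics.multimode` returns every value when none repeats, whereas the slide's convention is that such a data set has no mode, so this sketch (hypothetical helper `modes`) returns an empty list in that case.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the greatest frequency.

    Following the slide's convention, a data set with no repeated value has no mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))   # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # [27, 55]  (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # []        (no mode)
```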
MEASURES OF CENTRAL TENDENCY
Mode
The modal value is the highest bar in a histogram.
[Figure: histogram "Number of Peas in a Pod" with the mode labeled; x-axis: number of peas (1-7); y-axis: frequency, f]

MODE OF GROUPED DATA
$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$
where
L is the lower class boundary of the modal class
f_1 is the frequency of the modal class
f_0 is the frequency of the class before the modal class in the frequency table
f_2 is the frequency of the class after the modal class in the frequency table
c is the class width of the modal class
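A minimal sketch of the grouped-mode formula for the weight data, again taking class boundaries by extending each limit by 0.5. Treating a missing neighbouring class (before the first or after the last class) as frequency 0 is an assumption of this sketch, not something the slide specifies; `grouped_mode` is a hypothetical name.

```python
def grouped_mode(boundaries, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * c, using the class with the highest frequency."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    lower, upper = boundaries[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # class before (assumed 0 if none)
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # class after (assumed 0 if none)
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


boundaries = [(49.5 + 5 * i, 54.5 + 5 * i) for i in range(9)]
persons = [6, 18, 78, 80, 100, 72, 30, 10, 6]
print(round(grouped_mode(boundaries, persons), 2))  # 71.58
```

The modal class is 70-74 with f_1 = 100, f_0 = 80 and f_2 = 72, so Mode = 69.5 + 20/48 × 5 ≈ 71.58, in line with the stated answer.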
FIND MODE OF THE FOLLOWING DATA
Weight in kg    No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6
Ans.: 71.58

COMPARING THE MEAN, MEDIAN, AND MODE
All three measures describe an "average". Choose the one that best represents a "typical" value in the set.
Mean:
- The most familiar average.
- A reliable measure because it takes into account every entry of a data set.
- May be greatly affected by outliers or skew.
Median:
- A common average.
- Not as affected by skew or outliers.
Mode:
- May be used if there is an overwhelming repeat.
PROBLEM 1
The pulse rates of 50 persons are given. Calculate the mean, median and mode of the data.
Pulse rate    No. of persons
67            4
68            5
69            3
70            2
71            7
72            10
73            11
74            3
75            2
76            3
Mean = 71.5    Median = 72    Mode = 73

PROBLEM 2
Calculate the mean, median and mode of the given data.
%Hb         No. of persons
11.1-12     5
12.1-13     10
13.1-14     15
14.1-15     4
15.1-16     2
16.1-17     1
Mean = 13.25    Median = 13.23    Mode = 13.31
PROBLEM 3
The table shows the daily expenditure of 100 college students. Calculate the mean, median and mode of the given data.
Expenditure (Rs.)    No. of students
0 to 10              14
10 to 20             23
20 to 30             26
30 to 40             22
40 to 50             15
Mean = 25.1    Median = 25    Mode = 24.29

PROBLEM 4
Calculate the mean, median and mode of the given data.
Particle size (µ)    No. of particles
100-300              3
301-600              9
601-900              48
901-1200             21
1201-1500            19
Mean = 883    Median = 838    Mode = 777.8
MEASURES OF DISPERSION
- Range
- Standard deviation (SD)
- Variance
- Interquartile range (IQR)

MEASURES OF DISPERSION
Range
The difference between the maximum and minimum values
Range = maximum value – minimum value
STANDARD DEVIATION (SD)
SD is a measure of the variability of a set of data.
The mean represents the average of a group of values, with some of the values being above the mean and some below.
In effect, SD is the typical (average) amount by which the values in a distribution are spread around the mean.
SD of sample: S
SD of population: σ
Variance (S²) is another measure of spread.
DISTANCES BY WHICH AGES DEVIATE ABOVE AND BELOW THE MEAN
[Figure: deviations of ages above and below the mean]
Adding the deviations always gives zero.

CALCULATING S²
Since the total of the differences from the mean always equals zero, the values must first be squared, which cancels the negative signs.
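FORMULA TO CALCULATE THE SD
Standard deviation: $S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Variance of sample: $S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Variance of population: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Calculate the SD of the following values:
101.8, 103.2, 104.0, 102.5, 103.5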
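The SD formulas can be applied to the five values with Python's `statistics` module, where `variance` and `stdev` use the n − 1 divisor and `pvariance` uses N. A minimal sketch:

```python
import statistics

values = [101.8, 103.2, 104.0, 102.5, 103.5]

mean = statistics.mean(values)
s2 = statistics.variance(values)        # sample variance, divisor n - 1
s = statistics.stdev(values)            # sample SD
sigma2 = statistics.pvariance(values)   # population variance, divisor N
print(mean, round(s2, 3), round(s, 4), round(sigma2, 3))
```

The mean is 103.0 and the squared deviations sum to 2.98, so S² = 2.98/4 = 0.745 and S ≈ 0.863; the population variance is 2.98/5 = 0.596.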
CALCULATION OF STANDARD DEVIATION
[Figure: calculation of standard deviation]

FIND STANDARD DEVIATION OF THE FOLLOWING DATA
Weight in kg    No. of persons
50-54           6
55-59           18
60-64           78
65-69           80
70-74           100
75-79           72
80-84           30
85-89           10
90-94           6
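For the grouped weight data, the same idea applies with class midpoints and frequencies: Σf(x − x̄)² replaces Σ(x − x̄)². The slide does not say which divisor it expects, so this sketch reports both the sample version (n − 1) and the population-style version (n); the names are illustrative.

```python
import math

persons = [6, 18, 78, 80, 100, 72, 30, 10, 6]
midpoints = [52 + 5 * i for i in range(9)]   # 52, 57, ..., 92 for classes 50-54 ... 90-94

n = sum(persons)
mean = sum(f * x for x, f in zip(midpoints, persons)) / n             # x̄
ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, persons))    # Σf(x - x̄)²

s = math.sqrt(ss / (n - 1))   # sample SD (divisor n - 1)
sigma = math.sqrt(ss / n)     # population-style SD (divisor n)
print(round(s, 2), round(sigma, 2))
```

With x̄ = 70.4 and Σf(x − x̄)² = 24676, this gives S ≈ 7.86 kg with the n − 1 divisor and about 7.85 kg with n.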
PROBLEM 1
The pulse rates of 50 persons are given. Calculate the standard deviation of the data.
Pulse rate    No. of persons
67            4
68            5
69            3
70            2
71            7
72            10
73            11
74            3
75            2
76            3
PROBLEM 2
Calculate the standard deviation of the given data.
%Hb         No. of persons
11.1-12     5
12.1-13     10
13.1-14     15
14.1-15     4
15.1-16     2
16.1-17     1

PROBLEM 3
The table shows the daily expenditure of 100 college students. Calculate the standard deviation of the given data.
Expenditure (Rs.)    No. of students
0 to 10              14
10 to 20             23
20 to 30             26
30 to 40             22
40 to 50             15
PROBLEM 4
Calculate the standard deviation of the given data.
Life of bulb (hrs)    No. of bulbs
40-55                 10
55-70                 12
70-85                 15
85-100                13
100-115               10

THE SHAPE OF DATA
Histograms of frequency distributions have different shapes.
Distributions are often symmetrical, with most scores falling in the middle and fewer toward the extremes.
Many biological measurements are approximately symmetrically distributed and form a normal curve (a.k.a. bell-shaped curve).
THE SHAPE OF DATA (CONT.)
[Figure: histogram with a line depicting the shape of the data]

THE NORMAL DISTRIBUTION
Data whose frequency curve follows a normal curve have a normal distribution (a.k.a. Gaussian distribution).
Properties of a normal distribution:
- It is symmetric about its mean.
- The highest point is at its mean.
- The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching, zero.
THE NORMAL DISTRIBUTION (CONT.)
[Figure: normal curve annotated as follows: Mean = Median = Mode; the highest point of the overlying normal curve is at the mean; as one moves away from the mean in either direction, the height of the curve decreases, approaching, but never reaching, zero; a normal distribution is symmetric about its mean]
SKEWED DISTRIBUTIONS
The data are not distributed symmetrically in skewed distributions.
- Consequently, the mean, median, and mode are not equal and are in different positions.
- Scores are clustered at one end of the distribution.
- A small number of extreme values are located towards one end.

SKEWED DISTRIBUTIONS (CONT.)
Skew is always toward the direction of the longer tail.
- Positive if skewed to the right
- Negative if skewed to the left
[Figure: skewed distributions; the mean is shifted the most]
SKEWED DISTRIBUTIONS (CONT.)
Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions.
The median is a better estimate of the center of skewed distributions.
- It will be the central point of any distribution.
- 50% of the values are above and 50% below the median.

COMPARING THE MEAN, MEDIAN, AND MODE
For a normal distribution: Mean = Median = Mode
For a moderately skewed distribution, the empirical relation Mode ≈ 3(Median) – 2(Mean) applies.
CENTRAL TENDENCY FOR DIFFERENT TYPES OF DATA
Type of data                 Best measure of central tendency
Nominal                      Mode
Ordinal                      Median/Mode
Quantitative, symmetrical    Mean
Quantitative, skewed         Median

MORE PROPERTIES OF NORMAL CURVES
- About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean.
- About 95.5% is within two SDs.
- About 99.7% is within three SDs.
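The 68.3%, 95.5% and 99.7% figures can be checked with `statistics.NormalDist` (Python 3.8+) by taking the area of the standard normal curve between −k and +k SDs:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area under the curve within ±k SDs of the mean
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {100 * area:.2f}%")
```

This prints 68.27%, 95.45% and 99.73%, which the slide rounds to about 68.3%, 95.5% and 99.7%.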
MORE PROPERTIES OF NORMAL CURVES (CONT.)
[Figure: normal curve]

WIDE SPREAD RESULTS IN HIGHER SDS; NARROW SPREAD IN LOWER SDS
[Figure: distributions with wide and narrow spread]
SPREAD IS IMPORTANT WHEN COMPARING 2 OR MORE GROUP MEANS
[Figure: two pairs of group distributions with the same means but different spread]
It is more difficult to see a clear distinction between groups in the upper example because the spread is wider, even though the means are the same.

COEFFICIENT OF VARIATION (CV)
Also known as relative standard deviation (RSD)
$$\%CV = \%RSD = \frac{S}{\bar{X}} \times 100$$
%CV = 10 means that the SD is 10% of the mean.
Allows comparison of variability in different kinds of measurements.
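A minimal sketch of %CV = (S/X̄) × 100 using the sample SD; the five values from the SD exercise are reused purely as illustrative input, and `percent_cv` is a hypothetical helper name.

```python
import statistics


def percent_cv(values):
    """%CV (RSD) = S / mean * 100, using the sample SD (divisor n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv = percent_cv([101.8, 103.2, 104.0, 102.5, 103.5])
print(round(cv, 2))
```

For these values %CV ≈ 0.84%, i.e. the SD is under 1% of the mean.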
HOW MUCH SPREAD IS ACCEPTABLE?
This varies with the experiment.
- Repeatability in normal HPLC analysis: %CV should not be >2%
- Repeatability in bioanalysis by HPLC: %CV should not be >20%
- In biological experiments, CV may be as high as 20-50%

STANDARD DEVIATION OF THE MEAN (STANDARD ERROR OF THE MEAN, SEM)
It is a measure of the variability of the mean.
[Figure: means of potencies of five sets of 100 tablets selected from a production batch]
STANDARD ERROR OF THE MEAN, SEM
Instead of drawing several samples, the SEM is estimated from a single sample by the formula
$$\text{SEM} = \frac{S}{\sqrt{n}}$$
where S is the sample SD and n is the sample size.

PRECISION AND ACCURACY
Precision
- Refers to the extent of variability of a group of measurements observed under similar experimental conditions
- Measure of reproducibility
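The standard error of the mean is estimated from a single sample as SEM = S/√n. A minimal sketch, again reusing the five values from the SD exercise as illustrative input:

```python
import math
import statistics

values = [101.8, 103.2, 104.0, 102.5, 103.5]
sem = statistics.stdev(values) / math.sqrt(len(values))   # SEM = S / √n
print(round(sem, 3))
```

With S ≈ 0.863 and n = 5, SEM ≈ 0.386.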
PRECISION AND ACCURACY
Accuracy
The accuracy of a measurement is determined by how close a measured value is to its "true" value.
For example, a sample known to weigh 3.182 g is weighed five times by a student, with the results 3.200 g, 3.180 g, 3.152 g, 3.168 g and 3.189 g.
The most accurate measurement is 3.180 g, because it is closest to the true weight of the sample.

PRECISION AND ACCURACY
[Figure: difference between accuracy and precision]
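To make the distinction concrete, here is a short sketch using the student's five weighings: accuracy is judged by closeness to the known 3.182 g, precision by the spread (sample SD) of the repeated results. Summarising accuracy by the error of the mean is an added illustration, not something the slide computes.

```python
import statistics

true_weight = 3.182                                 # known weight of the sample (g)
weighings = [3.200, 3.180, 3.152, 3.168, 3.189]     # the student's five results (g)

# Accuracy: closeness of a measured value to the true value
most_accurate = min(weighings, key=lambda w: abs(w - true_weight))
mean_error = statistics.mean(weighings) - true_weight   # error of the average result

# Precision: spread of the repeated measurements (sample SD)
spread = statistics.stdev(weighings)
print(most_accurate, round(mean_error, 4), round(spread, 4))
```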
SHAPE OF DATA
Shape of data is measured by
- Skewness
- Kurtosis

SKEWNESS, KURTOSIS
Skewness (Sk), Pearsonian coefficient, is a measure of asymmetry of a distribution around its mean.
Kurtosis characterizes the relative peakedness or flatness of a distribution compared with the normal distribution.
SKEWNESS
Measures asymmetry of data
- Positive or right skewed: longer right tail
- Negative or left skewed: longer left tail
$$\text{Coefficient of Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$

SKEWNESS
- If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data.
- If skewness is less than −1 or greater than +1, the distribution is highly skewed.
- If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
- If skewness is between −½ and +½, the distribution is approximately symmetric.
A normal distribution has a skewness of 0; a right-skewed distribution has skewness greater than 0 and a left-skewed distribution has skewness less than 0.
EXAMPLE
College men's heights
Here are grouped data for heights of 100 randomly selected male students. Calculate the skewness coefficient and comment on the skewness of the data.

EXAMPLE
Height (inches)   Class mark, x   Frequency, f   xf     x − mean   (x − mean)²·f   (x − mean)³·f
59.5–62.5         61              5              305    −6.45      208.01          −1341.68
62.5–65.5         64              18             1152   −3.45      214.25          −739.15
65.5–68.5         67              42             2814   −0.45      8.51            −3.83
68.5–71.5         70              27             1890   2.55       175.57          447.70
71.5–74.5         73              8              584    5.55       246.42          1367.63
Total                             100            6745              852.75          −269.33
Mean = 6745/100 = 67.45
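The hand calculation of the skewness coefficient for this table can be reproduced with a short sketch. It weights each squared and cubed deviation by the class frequency, as the table's (x − mean)²·f and (x − mean)³·f columns do; the variable names are illustrative.

```python
import math

# College men's heights: class marks and frequencies from the example table
marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

n = sum(freqs)
mean = sum(f * x for x, f in zip(marks, freqs)) / n            # 67.45
m2 = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs))    # Σf(x - x̄)², about 852.75
m3 = sum(f * (x - mean) ** 3 for x, f in zip(marks, freqs))    # Σf(x - x̄)³, about -269.33

skewness = math.sqrt(n) * m3 / m2 ** 1.5
print(round(skewness, 3))
```

It gives about −0.108, matching the worked answer.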
$$\text{Coefficient of Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$
For grouped data, each term is weighted by its class frequency f:
$$= \frac{\sqrt{100}\,(-269.33)}{(852.75)^{3/2}} = \frac{-2693.3}{24901.91} = -0.108$$
Conclusion: the skewness lies between −½ and +½, so the distribution is approximately symmetric.

KURTOSIS
Measures peakedness of the distribution of data.
The height and sharpness of the peak relative to the rest of the data are measured by a number called kurtosis.
Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak.
The kurtosis of a normal distribution is 0 (this is excess kurtosis, i.e. kurtosis minus 3).
KURTOSIS
There are three types of peakedness.
- Leptokurtic: very peaked
- Platykurtic: relatively flat
- Mesokurtic: in between
Let x_1, x_2, ..., x_n be n observations. Then,
$$\text{Kurtosis} = \frac{n\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{2}} - 3$$

KURTOSIS
- Mesokurtic has a kurtosis = 0
- Leptokurtic has a kurtosis that is positive (+)
- Platykurtic has a kurtosis that is negative (−)
KURTOSIS EXAMPLE
College men's heights
Here are grouped data for heights of 100 randomly selected male students. Calculate the kurtosis of the data and give your interpretation.

Height (inches)   Class mark, x   Frequency, f   xf     x − mean   (x − mean)²·f   (x − mean)⁴·f
59.5–62.5         61              5              305    −6.45      208.01          8653.84
62.5–65.5         64              18             1152   −3.45      214.25          2550.05
65.5–68.5         67              42             2814   −0.45      8.51            1.72
68.5–71.5         70              27             1890   2.55       175.57          1141.63
71.5–74.5         73              8              584    5.55       246.42          7590.35
Total                             100            6745              852.75          19937.59
Mean = 67.45

$$\text{Kurtosis} = \frac{n\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{2}} - 3 = \frac{100 \times 19937.59}{(852.75)^2} - 3 = \frac{1993759}{727182.56} - 3 = -0.258$$
Interpretation: the kurtosis is negative, so the distribution is slightly platykurtic (a little flatter than the normal curve).
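The kurtosis calculation can be reproduced the same way. The sketch computes the frequency-weighted fourth- and second-power sums and subtracts 3, so a normal distribution would give 0; the variable names are illustrative.

```python
marks = [61, 64, 67, 70, 73]   # class marks (inches)
freqs = [5, 18, 42, 27, 8]     # frequencies

n = sum(freqs)
mean = sum(f * x for x, f in zip(marks, freqs)) / n
m2 = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs))   # about 852.75
m4 = sum(f * (x - mean) ** 4 for x, f in zip(marks, freqs))   # about 19937.59

kurtosis = n * m4 / m2 ** 2 - 3   # excess kurtosis: 0 for a normal distribution
print(round(kurtosis, 3))
```

It gives about −0.258, matching the worked answer.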
SAMPLING TECHNIQUES

SAMPLING
The sampling procedure is an essential ingredient of a good experiment.
An otherwise excellent experiment or investigation can be invalidated if proper attention is not given to choosing samples in a manner consistent with the experimental design or objectives.
Statistical treatment of data and the inference based on experimental results depend on the sampling procedure.
SAMPLING TECHNIQUES
Sampling techniques may be roughly divided into
- Probability sampling (random sampling)
- Non-probability sampling (authoritative sampling)

PROBABILITY SAMPLING TECHNIQUES
A probability sample is one in which each element of the population has a known probability of being included in the sample, and the elements are chosen by some random device.
SAMPLING TECHNIQUES
Probability sampling techniques:
- SIMPLE RANDOM SAMPLING
- STRATIFIED SAMPLING
- SYSTEMATIC SAMPLING
- CLUSTER SAMPLING

SIMPLE RANDOM SAMPLING
- Most commonly used method
- Each individual (object) in the population to be sampled has an equal chance of being selected.
- Simple random sampling is most effective when the variability is relatively small and uniform over the population.
- E.g. playing cards, names drawn out of a bowl, lottery
RANDOM SAMPLING
Random number tables
Book: A Million Random Numbers

RANDOM SAMPLING
e.g.
(1) Select a sample of 10 bottles from a batch of 800 bottles.
(2) Allocation of treatment to 20 patients in a BA (bioavailability) study.
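Instead of a printed random number table, a pseudo-random generator can do the selection. A minimal sketch for the two examples, assuming bottles are numbered 1 to 800, patients 1 to 20, and an equal 10/10 split between two hypothetical treatments "A" and "B"; the seed is arbitrary and only makes the output reproducible.

```python
import random

rng = random.Random(2015)   # arbitrary fixed seed, for reproducibility only

# (1) Simple random sample of 10 bottles from a batch numbered 1..800
batch = range(1, 801)
sample = sorted(rng.sample(batch, 10))   # every bottle equally likely, no repeats
print(sample)

# (2) Random allocation of two treatments to 20 patients (10 each)
patients = list(range(1, 21))
rng.shuffle(patients)
allocation = {"A": sorted(patients[:10]), "B": sorted(patients[10:])}
print(allocation)
```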
MERITS OF RANDOM SAMPLING
- More scientific
- Theory of probability is applicable.
- Economical
- Good for homogeneous population
DEMERITS OF RANDOM SAMPLING
- A complete list of all items is required.
- When units are spread over a large area, this method is difficult to use.

STRATIFIED SAMPLING
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling.
Stratified sampling is a recommended way of sampling when the strata are very different from each other, but objects within each stratum are alike.
- The strata should be mutually exclusive: every element in the population must be assigned to only one stratum.
- The strata should also be collectively exhaustive: no population element can be excluded.
STRATIFIED SAMPLING
This method improves the representativeness of the sample by reducing sampling error.
Example:
- In a clinical study on asthmatics, the stratification could be accomplished by dividing the asthmatic patients into subsets (strata) depending on age, duration of illness, or severity of illness.
- In quality control procedures, items are frequently selected for inspection at random within specified time intervals (strata) rather than in a completely random fashion (simple random sampling).

TYPES OF STRATIFIED SAMPLING
Proportionate sampling uses a sampling fraction in each of the strata that is proportional to that of the total population.
For instance, if the population consists of 60% in the male stratum and 40% in the female stratum, then the relative size of the two samples (three males, two females) should reflect this proportion.
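Proportionate allocation can be sketched as splitting the total sample size in proportion to stratum sizes. How to round fractional allocations is not specified on the slide; this sketch assumes a largest-remainder rule, and `proportionate_allocation` is a hypothetical helper name.

```python
def proportionate_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Rounds each share down, then gives leftover units to the strata with the
    largest fractional remainders (an assumed rounding rule).
    """
    total = sum(strata_sizes.values())
    exact = {s: n * size / total for s, size in strata_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}
    leftover = n - sum(alloc.values())
    for s in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


print(proportionate_allocation({"male": 60, "female": 40}, 5))   # {'male': 3, 'female': 2}
```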
TYPES OF STRATIFIED SAMPLING
Disproportionate sampling (optimum allocation)
- Based on the standard deviation of the variable in each stratum
- Larger samples are taken in the strata with the greatest variability to generate the least possible sampling variance.

MERITS OF STRATIFIED SAMPLING
- More representative of the population
- Ensures greater accuracy
- Good for non-homogeneous population
DEMERITS OF STRATIFIED SAMPLING
- Stratified sampling is not useful when the population cannot be exhaustively partitioned into disjoint subgroups.
- If multiple criteria exist for the formation of strata, the sampling plan becomes more difficult.

SYSTEMATIC SAMPLING
- Every kth item is selected; sampling is done at regular intervals.
- Sampling interval, k = N/n
  where N = population size, n = desired sample size
- The initial sample (j) is selected randomly and then every kth sample is selected, e.g. j+k, j+2k, etc.
SYSTEMATIC SAMPLING
For example, N = 64 and n = 8, so k = 64/8 = 8.
The randomly selected initial sample (j) is 3, so units 3, 11, 19, 27, 35, 43, 51 and 59 are selected.

SYSTEMATIC SAMPLING
Care should be taken that the process does not show a cyclic or periodic behavior, because systematic sampling will then not be representative of the process.
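The systematic selection rule j, j + k, j + 2k, ... can be written as a short function. It assumes units are numbered from 1 and uses integer division when N/n is not a whole number; `systematic_sample` is a hypothetical name.

```python
import random


def systematic_sample(N, n, start=None):
    """Select every kth unit after a random start j (1 <= j <= k), with k = N // n."""
    k = N // n                                           # sampling interval k = N/n
    j = start if start is not None else random.randint(1, k)
    return [j + i * k for i in range(n)]                 # j, j+k, j+2k, ...


print(systematic_sample(64, 8, start=3))   # [3, 11, 19, 27, 35, 43, 51, 59]
```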
SYSTEMATIC SAMPLING
Merits:
- Simple and convenient to adopt
- If the population is sufficiently large and homogeneous, and each unit is numbered, it can yield accurate results.
Demerits:
- Periodicities in the list

CLUSTER SAMPLING
In this technique, the total population is divided into groups (or clusters) and a simple random sample of the groups is selected.
The required information is then collected either from every element in the selected groups (single-stage cluster sampling) or from a simple random subsample of elements within each selected group (two-stage cluster sampling).
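A minimal sketch of two-stage cluster sampling: a simple random sample of clusters, then a simple random subsample of elements within each chosen cluster. The five "schools", their student IDs and the helper name `two_stage_cluster_sample` are hypothetical.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Stage 1: simple random sample of clusters.
    Stage 2: simple random subsample of elements within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c]))) for c in chosen}


# Hypothetical clusters: 5 schools with 30 numbered students each
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(1, 31)] for i in range(1, 6)}
sample = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=4, rng=random.Random(1))
print(sample)
```

Taking every element of each chosen cluster (single-stage cluster sampling) corresponds to setting `n_per_cluster` to the cluster size.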
CLUSTER SAMPLING
- The population within a cluster should ideally be as heterogeneous as possible, but there should be homogeneity between the clusters formed.
- Each cluster should be a small-scale representation of the total population.
- The clusters should be mutually exclusive and collectively exhaustive.

CLUSTER SAMPLING
Merits
- Flexibility is high; useful for large populations
Demerits
- Least accurate amongst all probability sampling methods
STRATIFICATION AND CLUSTERING
[Figure: stratification vs clustering]

Stratified sampling                                 Cluster sampling
Divide population into groups different from       Divide population into comparable groups,
  each other, e.g. sex, age, religion, etc.           e.g. cities, schools, etc.
Sample randomly from each group                     Randomly sample some of the groups
Less error compared to simple random sampling       More error compared to simple random sampling
More expensive                                      Less expensive
SAMPLING TECHNIQUES
[Figure: overview of sampling techniques]

NON-PROBABILITY SAMPLING
- Here, some elements have no chance of getting selected.
- Selection is done based on ease of access to subjects (convenience sample).
- Selection is done based on the judgement of a person (judgement sample).
- A pre-planned number of subjects may be selected (quota sample), e.g. 100 men, 100 women.
- Sample size may be small.
- Doesn't allow estimation of sampling error.
- Should be restricted to small populations.
MERITS OF NON-PROBABILITY SAMPLING
- Simple; a more representative sample can be obtained where a random sample fails
- Widely used in solving business problems and making public policy decisions

DEMERITS
- Individual bias
- Inaccurate
- Can't be compared with other studies
FACTORS AFFECTING SELECTION OF SAMPLING PROCEDURE
- The nature of the population.
  For example, can we enumerate the individual units, such as packaged bottles of a product, or is the population less easily defined, as in the case of hypertensive patients?
- The cost of sampling in terms of both time and money.
- Convenience.
- Desired precision.
  The accuracy and precision desired will be a function of the sampling procedure and sample size.
SAMPLING ERROR
The discrepancy between a sample statistic and its population parameter is called sampling error.
Defining and measuring sampling error is a large part of inferential statistics.
We can't make the sample a perfect miniature of the population; hence errors do occur.
Sampling error occurs when the statistical characteristics of a population are estimated from a subset, or sample, of that population.
SAMPLING ERRORS
Types of errors:
Biased:
Due to non-probability sampling.
Doesn't decrease even if the sample size is increased.
Unbiased:
Random sampling errors.
Due to chance differences between members of the population included in the sample and members excluded from the sample.
HOW TO REDUCE SAMPLING ERRORS?

Increase the sample size.
Choose the correct sampling method.
Choose the correct method for data interpretation.
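A small simulation can illustrate the effect of sample size on random sampling error. This sketch assumes a hypothetical population of 10,000 normally distributed values (mean 100, SD 15) and reports the average absolute difference between the sample mean and the population mean over repeated simple random samples. It models only random (unbiased) sampling error; biased error from non-probability sampling would not shrink this way.

```python
import random
import statistics


def average_sampling_error(population, n, repeats, rng):
    """Mean of |sample mean - population mean| over `repeats` simple random samples of size n."""
    mu = statistics.fmean(population)
    errors = [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(repeats)]
    return statistics.fmean(errors)


rng = random.Random(1)  # fixed seed for reproducibility
# Hypothetical population (assumption): 10,000 values drawn from N(100, 15^2)
population = [rng.gauss(100, 15) for _ in range(10_000)]

errors_by_n = {n: average_sampling_error(population, n, 200, rng) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(f"n = {n:4d}: average sampling error = {err:.3f}")
```

The average error falls roughly in proportion to 1/√n, so each tenfold increase in n cuts it by about a factor of three.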
CONFIDENCE INTERVALS
A confidence interval (CI) quantifies how far the true mean (µ) may lie from the measured mean, $\bar{x}$: it gives a range around $\bar{x}$ that is expected to contain µ at a stated confidence level. It uses the mean and standard deviation of the sample:
$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$
where t is from the t-table, s is the sample standard deviation, and n = number of measurements.
Degrees of freedom (df) = n − 1 for the CI.

EXAMPLE OF CALCULATING A CONFIDENCE INTERVAL
Consider measurement of dissolved Ti in a standard seawater (NASS-3):
Data: 1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48 nM
df = n − 1 = 7 − 1 = 6
$\bar{x}$ = 1.34 nM or 1.3 nM
s = 0.17 or 0.2 nM
95% confidence interval
t(df=6, 95%) = 2.447
$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$, so CI95 = 1.3 ± 0.16 or 1.3 ± 0.2 nM
50% confidence interval
t(df=6, 50%) = 0.718
CI50 = 1.3 ± 0.05 nM
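The confidence-interval calculation can be reproduced with Python's standard `statistics` module. This sketch takes the tabulated t value as an input (here the document's t(df=6, 95%) = 2.447 and t(df=6, 50%) = 0.718) rather than computing it; `confidence_interval` is a hypothetical helper name.

```python
import math
import statistics


def confidence_interval(data, t_value):
    """Return (mean, half_width) for mu = xbar ± t*s/sqrt(n).

    t_value must come from a t-table for df = n - 1 at the chosen confidence level.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD (divisor n - 1)
    return xbar, t_value * s / math.sqrt(n)


ti = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]  # dissolved Ti in NASS-3, nM
xbar, hw95 = confidence_interval(ti, 2.447)       # t(df = 6, 95%)
_, hw50 = confidence_interval(ti, 0.718)          # t(df = 6, 50%)
print(f"CI95 = {xbar:.2f} ± {hw95:.2f} nM")  # CI95 = 1.34 ± 0.16 nM
print(f"CI50 = {xbar:.2f} ± {hw50:.2f} nM")  # CI50 = 1.34 ± 0.05 nM
```

Rounded to the precision justified by s, the mean becomes 1.3 nM, as in the worked example.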
INTERPRETING THE CONFIDENCE INTERVAL
For a 95% CI, we can be 95% confident that the true mean (µ) lies in the range 1.3 ± 0.2 nM, or between 1.1 and 1.5 nM. (Strictly, 95% of intervals constructed in this way would contain µ.)
For a 50% CI, we can be 50% confident that the true mean lies in the range 1.3 ± 0.05 nM, or between 1.25 and 1.35 nM.
Note that the width of the CI decreases as n is increased.

COMPARING A MEASURED RESULT WITH A “KNOWN” VALUE
The “known” value would typically be a certified value from a standard reference material (SRM).
Another application of the t statistic:
$t_{calc} = \frac{\text{known value} - \bar{x}}{s}\sqrt{n}$
Will compare tcalc to the tabulated value of t at the appropriate df and CL.
df = n − 1 for this test.
Useful for characterizing data that are regularly obtained, e.g., quality assurance, quality control.
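A minimal sketch of the comparison with a known value, using the dissolved Fe figures from the worked example (certified 5.85 nM; measured 5.76 ± 0.17 nM, n = 10). The critical value t(df = 9, 95% CL) = 2.262 is taken from the t-table rather than computed; `t_vs_known` is a hypothetical helper name.

```python
import math


def t_vs_known(known, mean, s, n):
    """t_calc = (known value - mean) * sqrt(n) / s; compare |t_calc| with t(df = n - 1)."""
    return (known - mean) * math.sqrt(n) / s


t_calc = t_vs_known(known=5.85, mean=5.76, s=0.17, n=10)  # dissolved Fe, NASS-3 SRM
t_table = 2.262  # t(df = 9, 95% CL) from the t-table
significant = abs(t_calc) >= t_table
print(f"t_calc = {t_calc:.3f}, significantly different: {significant}")
# t_calc = 1.674, significantly different: False
```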
COMPARING A MEASURED RESULT WITH A “KNOWN” VALUE--EXAMPLE
Dissolved Fe analysis verified using NASS-3 seawater SRM
Certified value = 5.85 nM
Experimental results: 5.76 ± 0.17 nM (n = 10)
$t_{calc} = \frac{\text{known value} - \bar{x}}{s}\sqrt{n} = \frac{5.85 - 5.76}{0.17}\sqrt{10} = 1.674$
(Keep 3 decimal places for comparison to the table.)
Compare to ttable; df = 10 − 1 = 9, 95% CL
ttable(df=9, 95% CL) = 2.262
If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.
For this example, |tcalc| < ttable, so the experimental results are not significantly different from the certified value at the 95% CL.

COMPARING REPLICATE MEASUREMENTS OR COMPARING MEANS OF TWO SETS OF DATA
Yet another application of the t statistic.
Example: Given the same sample analyzed by two different methods, do the two methods give the “same” result?
$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$
$s_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$
Will compare tcalc to the tabulated value of t at the appropriate df and CL.
df = n1 + n2 − 2 for this test.
COMPARING REPLICATE MEASUREMENTS OR COMPARING MEANS OF TWO SETS OF DATA--EXAMPLE
Determination of nickel in sewage sludge using two different methods
Method 1: Atomic absorption spectroscopy
Data: 3.91, 4.02, 3.86, 3.99 mg/g
$\bar{x}_1$ = 3.945 mg/g, s1 = 0.073 mg/g, n1 = 4
Method 2: Spectrophotometry
Data: 3.52, 3.77, 3.49, 3.59 mg/g
$\bar{x}_2$ = 3.59 mg/g, s2 = 0.12 mg/g, n2 = 4
$s_{pooled} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}} = \sqrt{\frac{(0.073)^2(4 - 1) + (0.12)^2(4 - 1)}{4 + 4 - 2}} = 0.0993$
$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} = \frac{3.945 - 3.59}{0.0993}\sqrt{\frac{(4)(4)}{4 + 4}} = 5.056$
Note: Keep 3 decimal places to compare to ttable.
Compare to ttable at df = 4 + 4 − 2 = 6 and 95% CL.
ttable(df=6, 95% CL) = 2.447
If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.
Since |tcalc| (5.056) ≥ ttable (2.447), results from the two methods are significantly different at the 95% CL.
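The pooled two-sample t-test can be sketched directly from the raw nickel data. This is a minimal illustration, assuming the critical value t(df = 6, 95% CL) = 2.447 from the table; `pooled_t_test` is a hypothetical helper name. Working from unrounded means and SDs gives s_pooled ≈ 0.1028 and tcalc ≈ 4.85 instead of 0.0993 and 5.056, because the worked example starts from rounded summary values (e.g. s2 = 0.12, $\bar{x}_2$ = 3.59); the conclusion is the same.

```python
import math
import statistics


def pooled_t_test(x1, x2):
    """Two-sample t-test assuming similar SDs. Returns (t_calc, s_pooled, df)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)  # sample SDs (divisor n - 1)
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    diff = statistics.mean(x1) - statistics.mean(x2)
    t_calc = diff / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_calc, s_pooled, n1 + n2 - 2


aas = [3.91, 4.02, 3.86, 3.99]   # Method 1: atomic absorption, mg/g
spec = [3.52, 3.77, 3.49, 3.59]  # Method 2: spectrophotometry, mg/g
t_calc, s_pooled, df = pooled_t_test(aas, spec)
t_table = 2.447                  # t(df = 6, 95% CL)
print(f"s_pooled = {s_pooled:.4f}, t_calc = {t_calc:.2f}, df = {df}, "
      f"significant: {abs(t_calc) >= t_table}")
# s_pooled = 0.1028, t_calc = 4.85, df = 6, significant: True
```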
COMPARING REPLICATE MEASUREMENTS OR COMPARING MEANS OF TWO SETS OF DATA
Wait a minute! There is an important assumption associated with this t-test:
It is assumed that the standard deviations (i.e., the precision) of the two sets of data being compared are not significantly different.
• How do you test to see if the two std. devs. are different?
• How do you compare two sets of data whose std. devs. are significantly different?

F-TEST TO COMPARE STANDARD DEVIATIONS
Used to determine if std. devs. are significantly different before application of the t-test to compare replicate measurements or compare means of two sets of data.
Also used as a simple general test to compare the precision (as measured by the std. devs.) of two sets of data.
Uses the F distribution.
F-TEST TO COMPARE STANDARD DEVIATIONS
Will compute Fcalc and compare to Ftable.
$F_{calc} = \frac{s_1^2}{s_2^2}$, where $s_1 > s_2$
df = n1 − 1 and n2 − 1 for this test.
Choose a confidence level (95% is a typical CL).
From D.C. Harris (2003) Quantitative Chemical Analysis, 6th Ed.
F-TEST TO COMPARE STANDARD DEVIATIONS
From the previous example:
Let s1 = 0.12 and s2 = 0.073
$F_{calc} = \frac{s_1^2}{s_2^2} = \frac{(0.12)^2}{(0.073)^2} = 2.70$
Note: Keep 2 or 3 decimal places to compare with Ftable.
Compare Fcalc to Ftable at df = (n1 − 1, n2 − 1) = 3,3 and 95% CL.
If Fcalc < Ftable, std. devs. are not significantly different at the 95% CL.
If Fcalc ≥ Ftable, std. devs. are significantly different at the 95% CL.
Ftable(df = 3,3; 95% CL) = 9.28
Since Fcalc (2.70) < Ftable (9.28), the std. devs. of the two sets of data are not significantly different at the 95% CL. (Precisions are similar.)

COMPARING REPLICATE MEASUREMENTS OR COMPARING MEANS OF TWO SETS OF DATA--REVISITED
The use of the t-test for comparing means was justified for the previous example because we showed that the standard deviations of the two sets of data were not significantly different.
If the F-test shows that the std. devs. of two sets of data are significantly different and you need to compare the means, use a different version of the t-test.
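A minimal sketch of the F-test on the raw nickel data, putting the larger variance in the numerator as the formula requires. The critical value Ftable(3,3; 95% CL) = 9.28 is the one quoted above; `f_test` is a hypothetical helper name. With unrounded SDs, Fcalc ≈ 2.94 rather than the 2.70 obtained from the rounded values 0.12 and 0.073; the conclusion does not change.

```python
import statistics


def f_test(x1, x2):
    """F_calc = larger variance / smaller variance, so F_calc >= 1.

    Returns (F_calc, df_numerator, df_denominator); each df is n - 1 of the
    data set whose variance sits in that position.
    """
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    if v1 >= v2:
        return v1 / v2, len(x1) - 1, len(x2) - 1
    return v2 / v1, len(x2) - 1, len(x1) - 1


aas = [3.91, 4.02, 3.86, 3.99]
spec = [3.52, 3.77, 3.49, 3.59]
f_calc, df_num, df_den = f_test(aas, spec)
f_table = 9.28  # F(df = 3,3; 95% CL)
print(f"F_calc = {f_calc:.2f}, df = ({df_num}, {df_den}), significant: {f_calc >= f_table}")
# F_calc = 2.94, df = (3, 3), significant: False
```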
COMPARING REPLICATE MEASUREMENTS OR COMPARING MEANS FROM TWO SETS OF DATA WHEN STD. DEVS. ARE SIGNIFICANTLY DIFFERENT
$t_{calc} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 + 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 + 1}} - 2$

FLOWCHART FOR COMPARING MEANS OF TWO SETS OF DATA OR REPLICATE MEASUREMENTS
Use the F-test to see if the std. devs. of the 2 sets of data are significantly different or not.
Std. devs. are significantly different → use the 2nd version of the t-test (the beastly version).
Std. devs. are not significantly different → use the 1st version of the t-test (see the previous, fully worked-out example).
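A minimal sketch of the second ("beastly") version of the t-test, computing the degrees of freedom exactly as in the formula above. For comparison it also returns the Welch-Satterthwaite approximation, $(s_1^2/n_1 + s_2^2/n_2)^2 / [(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)]$, which many statistics packages use instead; the two formulas can give noticeably different df. The nickel data are reused only to show the arithmetic (the F-test found their SDs similar), and because n1 = n2 here, tcalc equals the pooled value. `unequal_sd_t_test` is a hypothetical helper name.

```python
import math
import statistics


def unequal_sd_t_test(x1, x2):
    """t-test for two data sets whose SDs differ significantly.

    Returns (t_calc, df_slide, df_welch_satterthwaite).
    """
    n1, n2 = len(x1), len(x2)
    a = statistics.variance(x1) / n1  # s1^2 / n1
    b = statistics.variance(x2) / n2  # s2^2 / n2
    t_calc = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(a + b)
    # Degrees of freedom as in the formula above (n + 1 in the denominators, minus 2)
    df_slide = (a + b) ** 2 / (a**2 / (n1 + 1) + b**2 / (n2 + 1)) - 2
    # Welch-Satterthwaite approximation (n - 1 in the denominators, no correction)
    df_ws = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_calc, df_slide, df_ws


aas = [3.91, 4.02, 3.86, 3.99]
spec = [3.52, 3.77, 3.49, 3.59]
t_calc, df_slide, df_ws = unequal_sd_t_test(aas, spec)
print(f"t_calc = {t_calc:.2f}, df (formula above) = {df_slide:.2f}, "
      f"df (Welch-Satterthwaite) = {df_ws:.2f}")
# t_calc = 4.85, df (formula above) = 6.05, df (Welch-Satterthwaite) = 4.83
```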
ONE LAST COMMENT ON THE F-TEST
Note that the F-test can be used simply to test whether or not two sets of data have statistically similar precisions.
Can be used to answer a question such as: Do method one and method two provide similar precisions for the analysis of the same analyte?

EVALUATING QUESTIONABLE DATA POINTS USING THE Q-TEST
Need a way to test questionable data points (outliers) in an unbiased way.
The Q-test is a common method to do this.
Requires 4 or more data points to apply.
Calculate Qcalc and compare to Qtable:
$Q_{calc} = \text{gap}/\text{range}$
Gap = (difference between the questionable data point and its nearest neighbor)
Range = (largest data point − smallest data point)
EVALUATING QUESTIONABLE DATA POINTS USING THE Q-TEST--EXAMPLE
Consider a set of data; Cu values in a sewage sample:
9.52, 10.7, 13.1, 9.71, 10.3, 9.99 mg/L
Arrange the data in increasing or decreasing order:
9.52, 9.71, 9.99, 10.3, 10.7, 13.1
The questionable data point (outlier) is 13.1.
Calculate:
$Q_{calc} = \frac{\text{gap}}{\text{range}} = \frac{13.1 - 10.7}{13.1 - 9.52} = 0.670$
Compare Qcalc to Qtable for n observations and the desired CL (90% or 95% is typical). It is desirable to keep 2-3 decimal places in Qcalc so that a judgment from the table can be made.
Qtable (n = 6, 90% CL) = 0.56
From G.D. Christian (1994) Analytical Chemistry, 5th Ed.
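A minimal sketch of the Q-test on the Cu data, applying the decision rule stated for this test (reject if Qcalc ≥ Qtable) with Qtable(n = 6, 90% CL) = 0.56. As an assumption, the questionable point is taken to be whichever end value is farther from its nearest neighbour; `q_test` is a hypothetical helper name.

```python
import statistics


def q_test(data, q_table):
    """Q-test on the most extreme value of a data set.

    Q_calc = gap / range, where gap is the distance from the suspect value to its
    nearest neighbour. Returns (suspect, q_calc, reject).
    """
    xs = sorted(data)
    if len(xs) < 4:
        raise ValueError("the Q-test as described here needs 4 or more data points")
    low_gap, high_gap = xs[1] - xs[0], xs[-1] - xs[-2]
    # Assumption: the questionable point is the end value with the larger gap
    suspect, gap = (xs[-1], high_gap) if high_gap >= low_gap else (xs[0], low_gap)
    q_calc = gap / (xs[-1] - xs[0])
    return suspect, q_calc, q_calc >= q_table


cu = [9.52, 10.7, 13.1, 9.71, 10.3, 9.99]          # Cu in sewage, mg/L
suspect, q_calc, reject = q_test(cu, q_table=0.56)  # Q_table(n = 6, 90% CL)
print(f"suspect = {suspect}, Q_calc = {q_calc:.3f}, reject: {reject}")
# suspect = 13.1, Q_calc = 0.670, reject: True

if reject:
    kept = list(cu)
    kept.remove(suspect)  # drop one occurrence of the rejected point
    print(f"{statistics.mean(kept):.2f} ± {statistics.stdev(kept):.2f} mg/L")
    # 10.04 ± 0.47 mg/L
```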
EVALUATING QUESTIONABLE DATA POINTS USING THE Q-TEST--EXAMPLE
If Qcalc < Qtable, do not reject the questionable data point at the stated CL.
If Qcalc ≥ Qtable, reject the questionable data point at the stated CL.
From the previous example,
Qcalc (0.670) > Qtable (0.56), so reject the data point at the 90% CL.
Subsequent calculations (e.g., mean and standard deviation) should then exclude the rejected point.
Mean and std. dev. of remaining data: 10.04 ± 0.47 mg/L

Design / Data summary: Statistics & tests
2 independent groups
  Proportions: Chi-square, Fisher exact
  Rank ordered: Mann-Whitney U
  Mean: Unpaired t-test
  Survival: Mantel-Haenszel, Log rank
2 related groups
  Proportions: McNemar chi-square
  Rank ordered: Sign test, Wilcoxon signed rank
  Mean: Paired t-test
More than 2 independent groups
  Proportions: Chi-square
  Rank ordered: Kruskal-Wallis
  Mean: ANOVA
  Survival: Log rank
More than 2 related groups
  Proportions: Cochran Q
  Rank ordered: Friedman
  Mean: Repeated-measures ANOVA
Study of causation; one independent variable (univariate)
  Proportion: Relative risk, Odds ratio
  Mean: Correlation coefficient
Study of causation; more than one independent variable (multivariate)
  Proportion: Discriminant analysis, Multiple logistic regression, Log-linear model
  Mean: Regression analysis, Multiple classification analysis
Choosing a test for comparing the averages of 2 or more samples of scores of experiments with one treatment factor
Data: Between subjects (independent samples) | Within subjects (related samples)
2 samples
  Interval: Independent t-test | Paired t-test
  Ordinal: Wilcoxon-Mann-Whitney test | Wilcoxon signed ranks test, Sign test
  Nominal: Chi-square test | McNemar test
> 2 samples
  Interval: One-way ANOVA | Repeated-measures ANOVA
  Ordinal: Kruskal-Wallis test | Friedman test
  Nominal: Chi-square test | Cochran’s Q test (dichotomous data only)

Scheme for choosing a one-sample test
Nominal
  2 categories: Binomial test
  > 2 categories: Chi-square test
Ordinal
  Randomness: Runs test
  Distribution: Kolmogorov-Smirnov test
Interval
  Mean: t-test
  Distribution: Kolmogorov-Smirnov test
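The between/within-subjects table for comparing 2 or more samples is a simple lookup, so it can be encoded as a dictionary. This is an illustrative sketch only: `TESTS` and `choose_test` are hypothetical names, and the key labels ("interval", "between", ...) are assumptions chosen to mirror the table's rows and columns.

```python
# Lookup built from the table: (level of measurement, number of samples, design) -> test
TESTS = {
    ("interval", "2", "between"): "Independent t-test",
    ("interval", "2", "within"): "Paired t-test",
    ("ordinal", "2", "between"): "Wilcoxon-Mann-Whitney test",
    ("ordinal", "2", "within"): "Wilcoxon signed ranks test, Sign test",
    ("nominal", "2", "between"): "Chi-square test",
    ("nominal", "2", "within"): "McNemar test",
    ("interval", ">2", "between"): "One-way ANOVA",
    ("interval", ">2", "within"): "Repeated-measures ANOVA",
    ("ordinal", ">2", "between"): "Kruskal-Wallis test",
    ("ordinal", ">2", "within"): "Friedman test",
    ("nominal", ">2", "between"): "Chi-square test",
    ("nominal", ">2", "within"): "Cochran's Q test (dichotomous data only)",
}


def choose_test(level, n_samples, design):
    """level: 'interval' | 'ordinal' | 'nominal'; design: 'between' (independent) or 'within' (related)."""
    if n_samples < 2:
        raise ValueError("use the one-sample scheme for a single sample")
    key = (level.lower(), "2" if n_samples == 2 else ">2", design.lower())
    return TESTS[key]


print(choose_test("ordinal", 3, "within"))    # Friedman test
print(choose_test("interval", 2, "between"))  # Independent t-test
```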
Measures of association between 2 variables
Data: Statistic
Interval: Pearson correlation (r)
Ordinal: Spearman’s rho; Kendall’s tau-a, tau-b, tau-c
Nominal: Phi, Cramér’s V

Z-SCORES
The number of SDs that a specific score is above or below the mean in a distribution.
Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD:
$z = \frac{X - \mu}{\sigma}$
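Z-scores also give a compact way to compute Pearson's r for interval data: with z-scores based on the sample mean and sample SD, r is the sum of the products $z_x z_y$ divided by n − 1. Spearman's rho is Pearson's r applied to the ranks. A minimal sketch with hypothetical helper names; the ranking helper assumes there are no tied values.

```python
import statistics


def z_scores(xs):
    """Convert raw scores to z-scores using the sample mean and sample SD."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]


def pearson_r(xs, ys):
    """Pearson correlation: sum of z_x * z_y divided by n - 1."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (len(xs) - 1)


def ranks(xs):
    """Rank values 1..n (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r of the ranks (no ties assumed)."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print(f"r = {pearson_r(x, y):.3f}, rho = {spearman_rho(x, y):.3f}")
# r = 0.981, rho = 1.000
```

Because y rises with x at every step, the ranks agree perfectly (rho = 1) even though the relationship is not a straight line (r < 1).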
Z-SCORES (CONT.)
Standardization
The process of converting raw scores to z-scores.
The resulting distribution of z-scores will always have a mean of zero, an SD of one, and an area under the curve equal to one.
The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table (this assumes the scores are normally distributed).

Z-SCORES (CONT.)
Refer to a z-table to find the proportion under the curve.
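A quick numerical check of the standardization property, using a hypothetical set of raw scores and taking µ and σ as the mean and population SD of those scores:

```python
import statistics

scores = [52, 61, 70, 48, 69]            # hypothetical raw scores
mu = statistics.fmean(scores)            # mean of these scores
sigma = statistics.pstdev(scores)        # population SD, matching z = (X - mu) / sigma
z = [(x - mu) / sigma for x in scores]   # standardization

print([round(v, 2) for v in z])
print(f"mean of z = {statistics.fmean(z):.6f}, SD of z = {statistics.pstdev(z):.6f}")
```

Standardization only shifts and rescales the scores; it does not change the shape of the distribution, which is why reading proportions from a z-table assumes the scores are normally distributed.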
Z-SCORES (CONT.)
Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z. Each value is the area under the curve to the left of z.

Z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
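The table entries are values of the standard normal cumulative distribution, Φ(z) = ½[1 + erf(z/√2)], so they can be regenerated with `math.erf` from Python's standard library. A minimal sketch; `phi` is a hypothetical helper name.

```python
import math


def phi(z):
    """Standard normal CDF: area under the curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Regenerate the partial table: rows z = 0.0 .. 1.5, columns +0.00 .. +0.09
for i in range(16):
    z0 = i / 10
    print(f"{z0:.1f} " + " ".join(f"{phi(z0 + j / 100):.4f}" for j in range(10)))

# Proportion of scores above z = 1.00 (upper tail), assuming normality
print(f"{1 - phi(1.0):.4f}")  # 0.1587
```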

Basics of Biostatistics PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Basics of Biostatistics PDF

Uploaded by

Copyright:

Available Formats
