Biostatistics in Orthodontics
BY DR K.SHALMA
Contents:
• Introduction
• Basic terminology
• Scales of measurement
• Data
• Presentation of data
• Measures of Dispersion
• Sampling
• Null Hypothesis
• Test of significance
• Conclusion
• References
What does statistics mean?
Introduction
• Statistics is a field of mathematical sciences
that deals with data
• Biostatistics is a branch of statistics that
emphasizes the statistical applications in the
biomedical and health sciences
• John Graunt is regarded as the father of health statistics.
• Statistics is an absolutely indispensable tool,
providing the techniques that allow
researchers to draw objective scientific
conclusions.
Why do we need statistics?
• "When you can measure what you are speaking
about and express it in numbers, you know
something about it; but when you cannot
measure it, when you cannot express it in
numbers, your knowledge is of a meagre and
unsatisfactory kind."
… Lord Kelvin
• It helps in assessing treatment effects.
How Much Mathematics Do we Need?
• No more than high school or college level algebra is
required.
Basic Terminology:
• In most cases, biomedical and health sciences data
consist of observations of certain characteristics of
individual subjects, experimental animals, chemical,
microbiological, or physical phenomena in laboratories, or
observations of patients' responses to treatment.
• Some data are numeric, such as height (5'6") and systolic
B.P. (112 mm Hg), and some are non-numeric, such as sex
(female, male) and the patient's level of pain (no pain,
moderate pain, severe pain).
Population:
• The collection of all elements of interest having one or
more common characteristics is called a population.
Variable:
• A variable is any characteristic of an object that can be
measured or categorized.
E.g.:
Age
Sex
Waiting time in clinic
Diabetic levels
Types of variables
• Quantitative / Numerical: Continuous or Discrete
• Qualitative / Categorical
Qualitative Variable:
It is a characteristic of people or objects that cannot be
naturally expressed in a numeric value.
E.g.:
Sex – male, female
Facial type – Brachyfacial, Dolichofacial, Mesiofacial
Level of oral hygiene – poor, fair, good
Quantitative Variable:
It is a characteristic of people or objects that can be
naturally expressed in a numeric value.
E.g.:
Age
Height
Bond strength
Discrete Variable:
It is a random variable that can take on a finite number
of values or a countably infinite number (as many as
there are whole numbers) of values.
E.g.:
• The size of a family
• The number of DMFT teeth, which can be any one of the 33
values 0, 1, 2, 3, …, 32.
Continuous Variable:
It is a random variable that can take on a range of values
on a continuum, i.e., its range is uncountably infinite.
E.g.:
Treatment time
Temperature
Torque value on tightening an implant abutment
Confounding Variable:
The statistical results are said to be confounded when the
results can have more than one explanation.
E.g.:
In a study, smoking is identified as the most important
etiological factor in the development of oral squamous cell
carcinoma. It has been suggested that alcohol is also one of
the major causes of squamous cell carcinoma, and alcohol
consumption is known to be closely related to smoking.
Therefore, in this study, alcohol is a confounding variable.
Scales Of Measurement:
Nominal Measurement Scale:
It is the simplest type of data, in which the values are in
unordered categories.
E.g.:
• Sex (F, M)
• Blood type (A, B, AB and O)
Ordinal Measurement Scale:
• The categories can be ordered or ranked.
• The amount of the difference between any two
categories, though they can be ordered, is not quantified.
E.g.:
Pain after separator placement
0 - no pain
1 - mild pain
2 - moderate pain
3 - severe pain
4 - extremely severe pain
The numbers are assigned only for statistical convenience.
Interval Measurement Scale:
• Observations can be ordered, and precise differences
between units of measure exist. However, there is no
meaningful absolute zero.
E.g.:
• IQ score representing the level of intelligence.
An IQ score of 0 is not indicative of no intelligence.
Ratio Measurement Scale:
It is the same as the interval scale in every aspect except
that measurement begins at a true or absolute zero.
E.g.:
• Weight in pounds.
• Height in meters.
Observations:
• The description of observations:
It includes collecting, summarizing and presenting.
It is also known as Descriptive statistics.
Data:
Data are a set of values of one or more variables recorded on
one or more individuals.
Types of Data
Primary data:
It is the data obtained directly from an individual.
Advantages
1. Precise information
2. Reliable
Disadvantages
1. Time consuming
Secondary data:
It is obtained from outside sources.
Quantitative data:
Something is measured with a number.
E.g.: the amount of crowding, overjet, incisor
inclination, and maxillomandibular skeletal discrepancy.
Qualitative data:
Data are collected on the basis of attributes or qualities.
E.g.: the sex of the patient, severity of mandibular plane
angle (high, normal, low), likelihood of compliance with
headgear or elastics (yes/no).
Uses of Data:
Methods of collection of data:
Presentation of Data:
Methods of presentation of data:
• Tabulation
• Diagrams/graphs
Tabulation:
Types of tables:
• Simple table
• Master table
• Frequency distribution table
Graphs And Diagrams:
Bar Charts:
A diagram of columns or bars; the height of each bar
represents the value of the particular data in question.
Pie Charts:
These are so called because the entire graph looks like a pie
and its components represent slices cut from a pie.
Line Graph:
When the quantity is a continuous variable, e.g., time or
temperature, the data are plotted as a continuous line.
Histograms:
• A histogram is a special sort of bar chart.
• The successive groups of data are linked in a definite
numerical order
Frequency Polygons:
• A frequency distribution may also be represented
diagrammatically by the frequency polygon.
Pictograms:
Data are presented pictorially or diagrammatically, with
each quantity represented by a pictorial symbol.
USA 50
SINGAPORE 1100
INDIA 3700
BANGLADESH 9700
Measures of Central Tendency:
• Statistics/parameters such as the
Mean (the arithmetic average)
Median (the middle datum)
Mode (the most frequent score).
Objectives
• To condense the entire mass of data.
Mean:
• This measure is the arithmetic average, or arithmetic
mean.
• It is obtained by summing up all the observations and
dividing the total by the number of observations.
E.g. The following gives the fasting blood glucose levels of
a sample of 10 children.
Child | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Level | 56 | 62 | 63 | 65 | 65 | 65 | 65 | 68 | 70 | 71
Mean = 650 / 10 = 65
Advantages:
Easy to calculate
Easily understood
Utilizes entire data
Affords good comparison
Disadvantages:
The mean is affected by extreme values; in such cases it can
lead to misleading interpretation.
Median:
• To find the median, the data are arranged in ascending or
descending order of magnitude and the value of the middle
observation is located.
Advantages:
1. It can be more representative than the mean when the
data are skewed.
2. It does not depend on every observation.
3. It is not affected by extreme values.
Mode:
Mode is that value which occurs with the greatest
frequency.
A distribution may have more than one mode.
Advantages:
1. It eliminates extreme variation.
2. Easy to understand
Disadvantages:
1. In a small number of cases there may be no mode at
all because no value may be repeated; therefore it is
not used in medical or biological statistics.
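The three measures can be checked on the fasting blood glucose example given under the mean. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Fasting blood glucose levels of the 10 children from the mean example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

# Mean: sum of all observations divided by the number of observations.
mean_value = statistics.mean(glucose)

# Median: middle value after sorting (average of the two middle values
# when the number of observations is even, as it is here).
median_value = statistics.median(glucose)

# Mode(s): every value tied for the highest frequency; a list is returned
# because a distribution may have more than one mode.
mode_values = statistics.multimode(glucose)

print("mean:", mean_value)
print("median:", median_value)
print("mode(s):", mode_values)
```

For this sample all three measures fall on the same value, which is what one expects from a roughly symmetrical distribution.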
Dispersion:
Dispersion is the degree of spread or variation of the
variable about a central value. The measures of dispersion
help us to study the spread of the values about the central
value.
Measures of dispersion:
• The range
• Mean deviation
• Standard deviation
The Range:
The range is defined as the difference between the highest
and lowest figures in a given sample.
• It is by far the simplest measure of dispersion.
Advantage:
• Easy to calculate
Disadvantages:
• Unstable
• It is affected by one extremely high or low score.
The Mean Deviation:
• It is the average of deviations from the arithmetic mean.
• It is given by
$M.D. = \frac{\sum |X_i - \bar{X}|}{n}$
where $\bar{X}$ is the arithmetic mean and n is the number of observations.
Standard Deviation:
• The standard deviation is the most frequently used
measure of dispersion.
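As a rough illustration, the three measures of dispersion can be computed for the fasting blood glucose sample used in the mean example. This is a sketch only: the mean deviation is taken as the average absolute deviation from the mean, and both the sample (n − 1) and population (n) forms of the standard deviation are shown because the slides do not say which divisor is intended.

```python
import statistics

# Fasting blood glucose levels of the 10 children from the mean example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
n = len(glucose)
mean_value = sum(glucose) / n

# Range: highest value minus lowest value.
data_range = max(glucose) - min(glucose)

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean_value) for x in glucose) / n

# Standard deviation: square root of the average squared deviation.
sample_sd = statistics.stdev(glucose)       # divides by n - 1
population_sd = statistics.pstdev(glucose)  # divides by n

print("range:", data_range)
print("mean deviation:", round(mean_deviation, 2))
print("sample SD:", round(sample_sd, 2))
print("population SD:", round(population_sd, 2))
```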
The Normal Curve / Normal Distribution / Gaussian Distribution:
When data are collected from a very large number of
people and a frequency distribution is made with
narrow class intervals, the resulting curve is smooth
and symmetrical, and it is called the normal curve.
Standard Normal Curve:
• It is bell shaped.
• The curve is perfectly symmetrical, based on an
infinitely large number of observations.
• The total area under the curve is one, its mean is zero
and its standard deviation is one.
• All three measures of central tendency, the mean,
median and mode, coincide.
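The properties listed for the standard normal curve can be explored numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an illustrative choice, and the areas within 1, 2 and 3 standard deviations are a standard property of the curve rather than something stated on the slide.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Symmetry: the curve has the same height at equal distances from the mean.
print(standard_normal.pdf(1.5) == standard_normal.pdf(-1.5))

# Proportion of observations within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```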
Probability:
• Probability is defined as possible or probable chances of
occurrence of an event or happening. Probability is a
proportion.
If the probability is more than 0.05, the difference is
called not significant, and if it is less than or equal to
0.05, the difference is called significant. This value of
P is obtained by calculating various tests of significance.
Sampling:
• Sampling can be defined as the investigation of part
of a population, in order to provide information, which can
then be generalized to cover the whole population.
Advantages:
• It reduces the cost of investigation, time required and
number of personnel involved.
Simple random sampling:
Provides the greatest number of possible samples
Systematic random sampling:
Each unit in the sampling frame would have the same
chance of being selected, but the number of possible
samples is greatly reduced.
Stratified random sampling:
The population is divided into strata by a certain
characteristic, and a random sample is drawn from each
stratum.
Strata:
Mutually exclusive segments of a population based on
a specific characteristic.
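The three sampling schemes described so far can be sketched in a few lines of Python. Everything in this example is hypothetical: the sampling frame of 100 patient IDs, the rule that assigns a stratum, and the function names are assumptions made for illustration only.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every step-th unit from the frame."""
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into mutually exclusive strata, then sample each one."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

rng = random.Random(42)          # fixed seed so the sketch is repeatable
patients = list(range(1, 101))   # hypothetical sampling frame of patient IDs

def sex_of(patient_id):
    # Hypothetical stratifying characteristic, for illustration only.
    return "female" if patient_id % 2 == 0 else "male"

srs = simple_random_sample(patients, 10, rng)
systematic = systematic_sample(patients, 10, rng)
stratified = stratified_sample(patients, sex_of, 5, rng)
print(srs, systematic, stratified, sep="\n")
```

In the systematic sample only the starting point is random, which is why the number of possible samples is much smaller than under simple random sampling.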
Cluster sampling:
Errors
Coverage error
Observational error
Processing error
Sampling error:
occurs due to sampling process and could arise because
of faulty sample design or due to the small size of the
sample.
Standard error:
If we take a random sample from the population, and
similar samples over and over again we will find that every
sample will have a different mean.
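The spread of those sample means is what the standard error describes. From a single sample it is commonly estimated as SD/√n; that formula is standard but is not spelled out on the slide, so the sketch below should be read under that assumption. It reuses the fasting blood glucose sample from the mean example.

```python
import math
import statistics

# Fasting blood glucose levels of the 10 children from the mean example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

n = len(glucose)
sample_sd = statistics.stdev(glucose)   # sample standard deviation (n - 1)

# Standard error of the mean: estimated spread of sample means around
# the population mean if samples of this size were drawn repeatedly.
standard_error = sample_sd / math.sqrt(n)

print("standard error of the mean:", round(standard_error, 3))
```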
Testing of statistical hypothesis:
• Hypothesis– Tentative prediction or explanation of
relationship between two or more variables.
• Null hypothesis (H0) - nullifies the claim that the
experimental result is different from or better than the
one observed already.
• Alternative hypothesis (H1) - sample result is different
Tests of significance:
It is a test used to compare or estimate significant
differences between two or more samples, and it also
verifies whether the results/findings are due to real
variation or to chance.
The process of significance testing involves three basic
steps.
Parametric Tests and Non Parametric Tests:
• Statistical tests that assume a distribution and use
parameters are called parametric tests.
Difference between parametric and non parametric tests:
Parametric test | Non parametric test
Information about the population is completely known. | No information about the population is available.
Scientific assumptions are made regarding the population. | No assumptions are made.
Null hypothesis is made on parameters of the population distribution. | Null hypothesis is free from parameters.
Test statistic is based on the distribution. | Test statistic is arbitrary.
Applicable only for variables. | Applied to both variables and attributes.
No parametric test exists for nominal scale data. | Tests do exist for nominal and ordinal scale data.
Powerful, if it exists. | Not as powerful as a parametric test.
Various test of significance:
Parametric tests Non parametric tests
Z test (normal test):
Sample size > 30
Used for
1. Comparison of sample mean and population mean
2. Difference between two sample proportions
t test:
• It was first described by W.S. Gosset, whose pen
name was "Student".
Steps In Hypothesis Testing
1. State the null hypothesis and the alternative hypothesis.
• If the calculated value > table value, reject the null
hypothesis; the result is statistically significant.
Critical ratio:
$\text{Critical ratio} = \frac{\text{parameter}}{\text{standard error of that parameter}}$
• For t test
$\text{Critical ratio} = t = \frac{\text{difference between two means}}{\text{SE of the difference between two means}}$
• For Z test
$\text{Critical ratio} = z = \frac{\text{difference between two proportions}}{\text{SE of the difference between two proportions}}$
Degrees of freedom:
• The term “degrees of freedom” refers to the number
of observations that are free to vary.
• Before putting on a pair of gloves, a person has the
freedom to decide whether to begin with left or the
right glove. However, once the person puts on the first
glove, he or she loses the freedom to decide which glove
to put on last.
Types of t-tests
One sample t test:
It is used to compare the mean of a single group of
observations with a specified value.
$t = \frac{\bar{X} - \mu}{SD / \sqrt{n}}$
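The one sample t statistic can be worked through on the fasting blood glucose sample from the mean example. The specified value of 62 used here is purely hypothetical, chosen only to give the formula something to compare against; the resulting t would then be compared with the t-table value for n − 1 degrees of freedom.

```python
import math
import statistics

# Fasting blood glucose levels of the 10 children from the mean example.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
specified_value = 62   # hypothetical value of mu, for illustration only

n = len(glucose)
sample_mean = statistics.mean(glucose)
sample_sd = statistics.stdev(glucose)   # sample SD (n - 1 in the divisor)

# t = (sample mean - specified value) / (SD / sqrt(n))
t_statistic = (sample_mean - specified_value) / (sample_sd / math.sqrt(n))
degrees_of_freedom = n - 1

print("t =", round(t_statistic, 3), "with df =", degrees_of_freedom)
```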
Unpaired t test:
Types of data required:
Independent variable:
One nominal variable with two levels (dichotomous
unpaired)
Ex: boy/girl students.
non smoking/heavy smoking mothers.
Dependent variable: Continuous variable
Ex: birth weight of children.
Assumptions:
Test statistic is given by
$t = \frac{\text{Mean}_1 - \text{Mean}_2}{SE(\text{Mean}_1 - \text{Mean}_2)}$
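A worked sketch of the unpaired t statistic follows. It borrows the systolic blood pressure readings of the Officers and Clerks groups from the ANOVA example later in this presentation, and it assumes equal variances in the two groups so that a pooled variance can be used for the standard error; that assumption is not stated on the slide.

```python
import math
import statistics

# Systolic blood pressure readings taken from the ANOVA example.
officers = [125, 130, 135, 120, 115, 120, 130, 135, 140, 135]
clerks = [120, 122, 115, 110, 125, 122, 120, 120, 126, 120]

n1, n2 = len(officers), len(clerks)
mean1, mean2 = statistics.mean(officers), statistics.mean(clerks)
var1, var2 = statistics.variance(officers), statistics.variance(clerks)

# Pooled variance (assumes the two groups share a common variance).
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
se_difference = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_statistic = (mean1 - mean2) / se_difference
degrees_of_freedom = n1 + n2 - 2

print("t =", round(t_statistic, 3), "with df =", degrees_of_freedom)
```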
Paired t test:
• It is applied to paired data of independent observations
from one sample, when each individual gives pair of
observations.
Types of data:
Test statistic is given by
$t = \frac{\bar{d}}{SD / \sqrt{n}}$
Where,
$\bar{d}$ = mean difference between the before and after values.
SD = standard deviation of the differences.
n = sample size
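The paired t statistic can be sketched as below. The before and after scores are hypothetical numbers invented for this illustration (the slides give no paired data), and SD is taken to be the standard deviation of the individual differences.

```python
import math
import statistics

# Hypothetical before/after scores for the same five subjects.
before = [10, 12, 9, 11, 13]
after = [8, 11, 7, 10, 10]

# One difference per subject, since each individual gives a pair of observations.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)   # SD of the differences

# t = mean difference / (SD of differences / sqrt(n))
t_statistic = mean_difference / (sd_difference / math.sqrt(n))
degrees_of_freedom = n - 1

print("t =", round(t_statistic, 3), "with df =", degrees_of_freedom)
```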
Unpaired t test: to compare the means between two
independent groups.
One way ANOVA test:
Null hypothesis:
H0: μ1 = μ2 = μ3 = … = μk
Alternative hypothesis:
H1: at least one of the means μ1, μ2, …, μk differs from the others
Types of data:
Independent variable:
Assumptions:
Procedural steps in ANOVA
Under one way ANOVA, only one factor is considered.
4. Calculate the total sum of squares, i.e.
$TSS = \sum X^2 - \frac{(\sum X)^2}{N}$
where $\sum X = \sum X_1 + \sum X_2 + \sum X_3 + \dots + \sum X_k$,
$N = n_1 + n_2 + \dots + n_k$, and df = N − 1
ANOVA table
Source of variation | Degrees of freedom (1) | Sum of squares (2) | Mean sum of squares (3 = 2/1) | F-ratio
Compare the calculated F-ratio with the F-table value for
df = (K−1, N−K)
Systolic blood pressure values (X) of 4 occupational groups
are given. Determine whether there is a significant difference
in the mean blood pressure of the 4 groups, in order to assess
the role of occupation in the causation of BP.
Officers | 125 130 135 120 115 120 130 135 140 135
Clerks | 120 122 115 110 125 122 120 120 126 120
Lab Technicians | 120 115 115 130 120 125 122 115 126 118
Attendants | 118 120 118 120 120 115 125 125 120 115
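Occupation | Officers | Clerks | Lab Technicians | Attendants
Total | 1285 | 1200 | 1206 | 1196
$\sum X = 1285 + 1200 + 1206 + 1196 = 4887$
$\sum X^2 = 165725 + 144194 + 145684 + 143148 = 598751$
$TSS = \sum X^2 - \frac{(\sum X)^2}{N} = 598751 - \frac{(4887)^2}{40} = 1681.78$
Between-groups sum of squares $= \sum \frac{T_i^2}{n_i} - \frac{(\sum X)^2}{N} = 538.48$, where $T_i$ is the total of group i.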
ANOVA Table :
The F-table value with df = (3, 36) = 2.86
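The one way ANOVA arithmetic for the blood pressure example can be retraced from the raw readings. This is a sketch of the usual sum-of-squares decomposition (total = between groups + within groups); the variable names are illustrative, and the table value 2.86 is the one quoted above for df = (3, 36).

```python
# Systolic blood pressure readings of the four occupational groups.
groups = {
    "Officers": [125, 130, 135, 120, 115, 120, 130, 135, 140, 135],
    "Clerks": [120, 122, 115, 110, 125, 122, 120, 120, 126, 120],
    "Lab Technicians": [120, 115, 115, 130, 120, 125, 122, 115, 126, 118],
    "Attendants": [118, 120, 118, 120, 120, 115, 125, 125, 120, 115],
}

all_values = [x for readings in groups.values() for x in readings]
N = len(all_values)   # total number of observations
k = len(groups)       # number of groups

# Correction factor (sum of X)^2 / N, shared by both sums of squares.
correction = sum(all_values) ** 2 / N

total_ss = sum(x * x for x in all_values) - correction
between_ss = sum(sum(r) ** 2 / len(r) for r in groups.values()) - correction
within_ss = total_ss - between_ss

# Mean sum of squares = sum of squares / degrees of freedom.
between_ms = between_ss / (k - 1)
within_ms = within_ss / (N - k)
f_ratio = between_ms / within_ms

f_table_value = 2.86  # quoted in the text for df = (3, 36)
print("TSS =", round(total_ss, 2))
print("between-groups SS =", round(between_ss, 2))
print("within-groups SS =", round(within_ss, 2))
print("F =", round(f_ratio, 2), "significant:", f_ratio > f_table_value)
```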
Two Way Analysis Of Variance:
It is used when the data are classified on the basis of two
factors.
Ex:
Effectiveness of a drug:
• Different treatments
• Different periods of time
• Different groups
Chi square test:
• It is used to determine whether there is any association
between categorical data from two or more groups.
• It is a test of proportions.
Formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
O = The observed value
E = The expected value
Degree of freedom:
d.f. = (columns − 1) × (rows − 1)
Chi square table:
Ex:
A cancer screening test was carried out by a team of
oncologists, and a total of 300 people were screened for
oral cancer. The findings were:
• Oral cancer was present in 100 people, of whom 20 gave
a history of chewing tobacco.
• 200 people were without oral cancer, of whom 110
gave a history of chewing tobacco.
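The chi square statistic for this screening example can be computed directly from the observed counts. In the sketch below the expected count of each cell is taken as (row total × column total) / grand total, which is the usual rule for a contingency table; the critical value 3.841 is the standard table value for 1 degree of freedom at P = 0.05 and is supplied here as an assumption, since the slide's table is not reproduced.

```python
# Observed counts: rows = oral cancer present / absent,
# columns = history of chewing tobacco yes / no.
observed = [
    [20, 80],    # oral cancer present (100 people)
    [110, 90],   # oral cancer absent (200 people)
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if cancer status and tobacco history were unrelated.
        e = row_totals[i] * column_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

degrees_of_freedom = (len(observed[0]) - 1) * (len(observed) - 1)
critical_value = 3.841  # standard table value for df = 1 at P = 0.05

print("chi square =", round(chi_square, 2), "df =", degrees_of_freedom)
print("significant:", chi_square > critical_value)
```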
Regression analysis and correlation
Regression:
It is used to predict the value of one continuous variable
from the other, if the two variables are associated.
Ex:
1. A study was done to describe the relationship
between the height and weight of dentists. If one dentist
is 5 feet 10 inches tall, how much is he expected to
weigh?
Dependent variable (outcome variable): weight of the
dentist, family medical and dental expenditure.
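The height and weight question can be illustrated with a simple least-squares line. The heights and weights below are hypothetical numbers made up for this sketch (the slides give no data), and the function name `predict_weight` is likewise illustrative. Height is the independent variable and weight the dependent (outcome) variable.

```python
# Hypothetical data: heights in inches and weights in pounds of five dentists.
heights = [64, 66, 68, 70, 72]
weights = [132, 141, 149, 162, 171]

n = len(heights)
mean_height = sum(heights) / n
mean_weight = sum(weights) / n

# Least-squares slope and intercept of the line: weight = intercept + slope * height.
sum_xy = sum((x - mean_height) * (y - mean_weight) for x, y in zip(heights, weights))
sum_xx = sum((x - mean_height) ** 2 for x in heights)
slope = sum_xy / sum_xx
intercept = mean_weight - slope * mean_height

def predict_weight(height_in_inches):
    """Expected weight for a given height under the fitted line."""
    return intercept + slope * height_in_inches

# 5 feet 10 inches is 70 inches.
print("expected weight at 70 inches:", round(predict_weight(70), 1))
```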
Correlation:
It is used to quantify the strength and direction of the
relationship between two continuous variables.
It is denoted by r.
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Given data
Pearson correlation coefficient, r = 0.92
Interpretation: The two variables are highly positively correlated.
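The Pearson correlation coefficient can be computed from its definition as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The data in this sketch are hypothetical and do not reproduce the r = 0.92 result above, whose underlying data are not given.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [64, 66, 68, 70, 72]
weights = [132, 141, 149, 162, 171]

r = pearson_r(heights, weights)
print("r =", round(r, 3))
```

The sign of r gives the direction of the relationship and its magnitude, between 0 and 1, gives the strength.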
Mann - Whitney U test:
• A common nonparametric test for comparison of two
unpaired samples is the Mann-Whitney U test, also known
as the Wilcoxon rank sum test (not to be confused with
the Wilcoxon signed rank test).
• The median of each group is found by ranking the data
in each group from lowest to highest and identifying the
middle most value (median).
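The ranking idea behind the Mann-Whitney U test can be sketched as follows. The two groups are pooled and ranked together (tied values share the average rank), and U is derived from the rank sum of one group. The scores are hypothetical, the function names are illustrative, and the sketch stops at the U statistic without looking up a table value.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # ranks start at 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Smaller of the two U statistics for two unpaired samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical plaque scores for two unpaired groups.
test_group = [1.2, 1.5, 1.1, 1.8, 1.4]
control_group = [2.1, 1.9, 2.4, 1.6, 2.2]
print("U =", mann_whitney_u(test_group, control_group))
```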
Wilcoxon signed rank test
• To test that 2 treatments are the same, or the
hypothesis is that 2 population distributions are identical.
Kruskal-Wallis test:
• It is used if there are more than two groups to compare
and it is not appropriate to use parametric procedures such
as ANOVA.
A study was conducted to evaluate the efficacy of a
hyaluronan containing mouthwash in comparison with
0.2% chlorhexidine and a water based mouthwash. 45
volunteers were recruited in the present study. They
were randomly divided into three groups.
Group A (positive control) was the chlorhexidine group.
Group B (test group) was the hyaluronan group and
Group C (negative control) was the water based group.
Plaque index scores were recorded at baseline and
at the end of the study. Mean plaque index scores were
recorded for all the three groups.
• A Kruskal-Wallis non parametric test was performed
to compare all the three groups.
A study was conducted to evaluate the antiplaque
efficacy of two dentifrices. 60 subjects were recruited
in the study. They were randomly allocated to two
groups, a test group and a control group. Plaque scores were
measured by the Turesky et al. modification of the Quigley-Hein
Plaque Index at baseline and at the end of the study.
Mean plaque scores were calculated for both the groups.
Comparison of mean plaque scores between the groups
was done by the Mann-Whitney U test, and comparison within
each group by the Wilcoxon signed rank test.
First variable Second variable Test of choice
Parametric test | Non parametric test | Use
Two sample t test | Mann-Whitney U test (Wilcoxon rank sum test) | To compare two independent samples for equality of means/medians
- | χ² analysis | To compare nominal data: to compare two or more samples for equivalence in proportion
SPSS Statistics:
• It is a software package used for statistical analysis.
Conclusion:
• Advancing technology has enabled us to collect and safeguard
a wide variety of data with minimal effort, from patients'
demographic information to treatment regimens.
References:
•Biostatistics for oral healthcare – Jay S. Kim,
Ronald J. Dailey