Bio Statistics

BIOSTATISTICS IN ORTHODONTICS
BY DR K. SHALMA
Contents:
• Introduction
• Basic terminology
• Scales of measurement
• Data
• Presentation of data
• Measures of Dispersion
• Sampling
• Null Hypothesis
• Test of significance
• Conclusion
• References
What do statistics mean?
Introduction
• Statistics is a field of mathematical sciences
that deals with data
• Biostatistics is a branch of statistics that
emphasizes the statistical applications in the
biomedical and health sciences
• John Graunt is regarded as the father of health statistics
FATHER OF BIOSTATISTICS
• Statistics is an indispensable tool, providing the techniques that allow researchers to draw objective scientific conclusions
Why do we need statistics?
• "When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind."
• … LORD KELVIN
Why Do we Need Statistics ?
• A sufficient understanding of statistics is required to read the literature critically, to assess the adequacy of the research, and to interpret the results and conclusions correctly, so that new discoveries in diagnosis and treatment can be properly implemented.
• It helps in assessing treatment effects, comparing different options, and understanding how treatments interact.
• It helps in understanding the association between cause and effect in oral diseases.
How Much Mathematics Do we Need?
• No more than high school or college level algebra is required.
• However, it is fair to say that with greater knowledge of mathematics, we can obtain much deeper insight into and understanding of statistics.
Basic Terminology:
• In most cases, biomedical and health sciences data consist of observations of certain characteristics of individual subjects, experimental animals, chemical, microbiological, or physical phenomena in laboratories, or observations of patients' responses to treatment.
• Whenever an experiment or a clinical trial is conducted, measurements are taken and observations are made.
• Some data are numeric, such as height (5'6") and systolic B.P. (112 mm Hg), and some are non-numeric, such as sex (female, male) and the patient's level of pain (no pain, moderate pain, severe pain).
• To discuss and describe such data adequately, a few terms that will be used repeatedly are defined.
Population:
• The collection of all elements of interest having one or
more common characteristics is called a population.
• The elements can be individual subjects, objects, or events.
• A population that contains an infinite number of elements is called an infinite population.
• A population that contains a finite number of elements is called a finite population.
Variable:
• A variable is any characteristic of an object that can be
measured or categorized.
• It is denoted by an upper-case letter of the alphabet, e.g. X, Y, or Z.
E.g.:
Age
Sex
Waiting time in clinic
Diabetic levels
Types of variables
• Quantitative / numerical: continuous or discrete
• Qualitative / categorical
Qualitative Variable:
It is a characteristic of people or objects that cannot be
naturally expressed in a numeric value.
E.g.:
Sex – male, female
Facial type – Brachyfacial, Dolichofacial, Mesofacial
Level of oral hygiene – poor, fair, good
Quantitative Variable:
It is a characteristic of people or objects that can be
naturally expressed in a numeric value.
E.g.:
Age
Height
Bond strength
Discrete Variable:
It is a random variable that can take on a finite number of values or a countably infinite number (as many as there are whole numbers) of values.
E.g.:
• The size of a family
• The number of DMFT (decayed, missing and filled) teeth. It can be any one of the 33 numbers 0, 1, 2, 3, …, 32.
Continuous Variable:
It is a random variable that can take on a range of values
on a continuum, i.e., its range is uncountably infinite.
E.g.:
Treatment time
Temperature
Torque value on tightening an implant abutment
Confounding Variable:
The statistical results are said to be confounded when the
results can have more than one explanation.
E.g.:
Suppose a study finds that smoking is the most important etiological factor in the development of oral squamous cell carcinoma. It has been suggested that alcohol is also one of the major causes of squamous cell carcinoma, and alcohol consumption is known to be closely related to smoking. Therefore, in this study, alcohol is a confounding variable.
Scales Of Measurement:
Nominal Measurement Scale:
It is the simplest type of data, in which the values are in
unordered categories.
E.g.:
• Sex (F, M)
• Blood type (A, B, AB and O)
The categories in a nominal measurement scale have no quantitative relationship to each other.
Ordinal Measurement Scale:
• The categories can be ordered or ranked.
• The amount of the difference between any two categories, though they can be ordered, is not quantified.
E.g.:
Pain after separator placement
0 - no pain
1 - mild pain
2 - moderate pain
3 - severe pain
4 - extremely severe pain
The numbers are assigned only for statistical convenience.
Interval Measurement Scale:
• Observations can be ordered, and precise differences between units of measure exist. However, there is no meaningful absolute zero.
E.g.:
• IQ score representing the level of intelligence. An IQ score of 0 is not indicative of no intelligence.
• Statistics knowledge represented by a statistics test score. A test score of zero does not necessarily mean that the individual has zero knowledge of statistics.
Ratio Measurement Scale:
It is the same as the interval scale in every aspect except that measurement begins at a true or absolute zero.
E.g.:
• Weight in pounds.
• Height in meters.
There cannot be negative measurements.
Observations:
• The description of observations:
It includes collecting, summarizing and presenting.
It is also known as Descriptive statistics.
• The inference of observations:
It includes analyzing and interpreting.
It is known as inferential statistics.
Data:
Data are a set of values of one or more variables recorded on
one or more individuals.
Types of data:
• Primary data
• Secondary data
• Qualitative data
• Quantitative data
Primary data:
It is the data obtained directly from an individual.
Advantages
1. Precise information
2. Reliable
Disadvantages
1. Time consuming
Secondary data:
It is obtained from outside sources,
e.g. hospital records, school registers.
Quantitative data:
Data that measure something with a number.
E.g.: the amount of crowding, overjet, incisor inclination, and maxillomandibular skeletal discrepancy.
Qualitative data:
Data collected on the basis of attributes or qualities.
E.g.: the sex of the patient, severity of mandibular plane angle (high, normal, low), likelihood of compliance with headgear or elastics (yes/no).
Uses Of Data:
• In designing a health care programme.
• In evaluating the effectiveness of an ongoing programme.
• In determining the needs of a specific population.
• In evaluating the scientific accuracy of a journal article.
Methods of collection of data:
• Questionnaires
• Surveys
• Records
• Interviews
Presentation of Data:
Methods of presentation of data:
• Tabulation
• Diagrams/graphs
Tabulation:
Types of tables:
• Simple table
• Master table
• Frequency distribution table
Graphs And Diagrams:
• Impact on imagination
• Better retained in memory
• Easy comparisons
Bar Charts:
A diagram of columns or bars; the height of each bar represents the value of the particular data in question.
• Simple bar graph
• Multiple bar graph
• Component bar graph
Pie Charts:
These are so called because the entire graph looks like a pie and its components represent slices cut from a pie.
[Pie chart: Distribution of malocclusions in school children; categories Class 1, Class 2A, Class 2B and Class 3; slices of 59%, 23%, 10% and 9%]
Line Graph:
When the quantity is a continuous variable, e.g. time or temperature, the data are plotted as a continuous line.
Histograms:
• A histogram is a special sort of bar chart.
• The successive groups of data are linked in a definite numerical order.
[Histogram: Haemoglobin levels of students in a class]
Frequency Polygons:
• A frequency distribution may also be represented
diagrammatically by the frequency polygon.
• It is obtained by joining the midpoints of the histogram blocks.
Pictograms:
Data are represented by pictorial symbols.
Population per physician:
USA        | 50
Singapore  | 1100
India      | 3700
Bangladesh | 9700
Central Tendency / Statistical Averages:
• Central tendency refers to the center of the distribution
of data points.
• The statistics/parameters used are:
Mean (the arithmetic average)
Median (the middle datum)
Mode (the most frequent score)
Objectives:
• To condense the entire mass of data.
• To facilitate comparison.
Mean:
• This measure is the arithmetic average or arithmetic mean.
• It is obtained by summing up all the observations and dividing the total by the number of observations.
E.g. The following are the fasting blood glucose levels of a sample of 10 children.
Child   | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
Glucose | 56 | 62 | 63 | 65 | 65 | 65 | 65 | 68 | 70 | 71
Total = 650; Mean = 650 / 10 = 65
The mean is denoted by $\bar{X}$ (X bar).
Advantages:
Easy to calculate
Easily understood
Utilizes entire data
Affords good comparison
Disadvantages:
The mean is affected by extreme values; in such cases it can lead to a misleading interpretation.
Median:
• For the median, the data are arranged in ascending or descending order of magnitude and the value of the middle observation is located.
E.g. Arrange the observations in ascending or descending order:
71, 75, 75, 77, 79, 81, 83, 84, 90, 95
Median = (79 + 81) / 2 = 80
If there are only the first 9 observations, then median = 79.
Advantages:
1. It is more representative than the mean when the data contain extreme values.
2. It does not depend on every observation.
3. It is not affected by extreme values.
Mode:
Mode is that value which occurs with the greatest
frequency.
A distribution may have more than one mode.
E.g. Diastolic blood pressure of 10 individuals.

85,75,81,79,71,80,75,78,72,73
Here mode = 75 i.e. the distribution is uni-modal
85,75,81,79,80,71,80,78,75,73
Here mode =75 and 80 i.e. the distribution is bi-modal.
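The three averages above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module on the figures quoted in the mean, median and mode examples; the variable names are illustrative only.

```python
import statistics

# Fasting blood glucose levels of the 10 children (mean example)
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
mean_glucose = statistics.mean(glucose)        # sum of observations / number of observations

# Ten ordered observations (median example)
ordered = [71, 75, 75, 77, 79, 81, 83, 84, 90, 95]
median_even = statistics.median(ordered)       # average of the two middle values
median_odd = statistics.median(ordered[:9])    # middle value of the first nine observations

# Diastolic blood pressure of 10 individuals (mode example)
dbp_first = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]
dbp_second = [85, 75, 81, 79, 80, 71, 80, 78, 75, 73]
modes_first = statistics.multimode(dbp_first)             # every value with the top frequency
modes_second = sorted(statistics.multimode(dbp_second))

print("mean:", mean_glucose)
print("median (10 obs):", median_even, "| median (9 obs):", median_odd)
print("modes:", modes_first, modes_second)
```

`multimode` is used rather than `mode` so that a bi-modal series reports both values.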
Advantages:
1. It eliminates extreme variation.
2. Easy to understand.
Disadvantages:
1. With a small number of cases there may be no mode at all because no value may be repeated; therefore it is seldom used in medical or biological statistics.
Dispersion:
Dispersion is the degree of spread or variation of the variable about a central value. The measures of dispersion help us to study the spread of the values about the central value.
Purpose of Measures of Dispersion
1. To study the variability of data.
2. To determine the reliability of an average.
3. To compare two or more series in relation to their variability.
Measures of dispersion:
• The range
• Mean deviation
• Standard deviation
The Range:
The range is defined as the difference between the highest
and lowest figures in a given sample.
• It is by far the simplest measure of dispersion.
Advantage:
• Easy to calculate
Disadvantages:
• Unstable
• It is affected by one extremely high or low score.
The Mean Deviation:
• It is the average of the absolute deviations from the arithmetic mean.
• It is given by
$\mathrm{M.D.} = \frac{\sum |X_i - \bar{X}|}{n}$
where $\bar{X}$ is the mean of the $n$ observations $X_i$.
Standard Deviation:
• The standard deviation is the most frequently used measure of dispersion.
• In simple terms it is defined as the root mean square deviation, because it is the square root of the variance (the average of the squared differences from the mean).
• It is denoted by the Greek letter σ or by the initials S.D.
$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}}$
• The greater the S.D., the greater the magnitude of dispersion from the mean.
• A small S.D. means a higher degree of uniformity of the observations.
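As a rough illustration of the three measures of dispersion, the sketch below applies them to the fasting blood glucose values from the mean example. It divides by n, following the S.D. formula as written in these slides; dividing by n - 1 for a sample is a common alternative and is not shown here.

```python
import math

# Fasting blood glucose levels of the 10 children (from the mean example)
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
n = len(glucose)
mean = sum(glucose) / n

# Range: highest minus lowest observation
value_range = max(glucose) - min(glucose)

# Mean deviation: average absolute deviation from the mean
mean_deviation = sum(abs(x - mean) for x in glucose) / n

# Standard deviation: square root of the average squared deviation
variance = sum((x - mean) ** 2 for x in glucose) / n
standard_deviation = math.sqrt(variance)

print(f"range = {value_range}")
print(f"mean deviation = {mean_deviation:.2f}")
print(f"standard deviation = {standard_deviation:.2f}")
```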
The Normal Curve / Normal
Distribution/ Gaussian Distribution:
When data are collected from a very large number of people and a frequency distribution is made with narrow class intervals, the resulting curve is smooth and symmetrical and is called the normal curve.
Standard Normal Curve:
• It is bell shaped.
• The curve is perfectly symmetrical and is based on an infinitely large number of observations.
• The total area under the curve is one, its mean is zero and its standard deviation is one.
• All three measures of central tendency (the mean, median and mode) coincide.
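The properties listed for the standard normal curve can be explored with `statistics.NormalDist` from the Python standard library. This is only a sketch: the 1 SD and 2 SD intervals are illustration choices and are not taken from the slides.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0.0, sigma=1.0)

# Mean, median and mode coincide for a normal curve
measures_coincide = z.mean == z.median == z.mode

# Symmetry: half of the total area lies below the mean
area_below_mean = z.cdf(0.0)

# Share of the area within 1 and within 2 standard deviations of the mean
within_1_sd = z.cdf(1.0) - z.cdf(-1.0)
within_2_sd = z.cdf(2.0) - z.cdf(-2.0)

print("mean = median = mode:", measures_coincide)
print(f"area below the mean: {area_below_mean:.2f}")
print(f"area within 1 SD: {within_1_sd:.4f}")
print(f"area within 2 SD: {within_2_sd:.4f}")
```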
Probability:
• Probability is defined as the chance of occurrence of an event. Probability is a proportion.
• In tossing a coin, the only possible outcomes are a head or a tail. The probability of a head is 0.5 and of a tail is 0.5, and the sum is 1.
If the probability (P) is more than 0.05, the difference is called not significant, and if it is less than or equal to 0.05 the difference is called significant. The value of P is obtained by calculating the appropriate test of significance.
P < 0.001 Very highly significant
P < 0.01 Highly significant
P < 0.05 Significant
P > 0.05 Not significant
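The P-value bands above can be written as a small lookup. This is a sketch only; the function name `significance_label` is hypothetical, and P = 0.05 itself is treated as significant, as stated in the text.

```python
def significance_label(p: float) -> str:
    """Map a P value to the verbal label used in the bands above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p <= 0.05:
        return "significant"
    return "not significant"


for p in (0.0004, 0.008, 0.03, 0.2):
    print(p, "->", significance_label(p))
```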
Sampling:
• Sampling can be defined as the investigation of part of a population in order to provide information which can then be generalized to cover the whole population.
Advantages:
• It reduces the cost of investigation, the time required and the number of personnel involved.
• It allows thorough investigation of the units of observation.
• It helps to provide adequate and in-depth coverage of the sample units.
Simple random sampling:
Every unit in the population has an equal chance of being selected. It provides the greatest number of possible samples.
Systematic random sampling:
Each unit in the sampling frame would have the same
chance of being selected, but the number of possible
samples is greatly reduced.
Stratified random sampling:
The population is analysed by a certain characteristic: it is divided into strata and a random sample is drawn from each stratum.
Strata:
Mutually exclusive segments of a population based on a specific characteristic.
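The three random sampling schemes described above can be sketched with Python's `random` module. Everything here is hypothetical illustration: the sampling frame of 100 patient IDs, the 40/60 split into two strata, the sample size of 10 and the fixed seed.

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is reproducible
frame = list(range(1, 101))        # hypothetical sampling frame: patient IDs 1..100
sample_size = 10

# Simple random sampling: every unit has the same chance of selection
simple_sample = rng.sample(frame, sample_size)

# Systematic random sampling: random start, then every k-th unit
k = len(frame) // sample_size
start = rng.randrange(k)
systematic_sample = frame[start::k]

# Stratified random sampling: sample each stratum in proportion to its size
strata = {"stratum_a": frame[:40], "stratum_b": frame[40:]}
stratified_sample = {
    name: rng.sample(units, len(units) * sample_size // len(frame))
    for name, units in strata.items()
}

print("simple:", sorted(simple_sample))
print("systematic:", systematic_sample)
print("stratified:", {name: sorted(units) for name, units in stratified_sample.items()})
```

In the systematic scheme only k different samples are possible (one per starting point), which is why the number of possible samples is greatly reduced.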
Cluster sampling:
Errors:
• Sampling errors
• Non-sampling errors: coverage error, observational error, processing error
Sampling error:
It occurs due to the sampling process and could arise because of faulty sample design or the small size of the sample.
Standard error:
If we take a random sample from the population, and similar samples over and over again, we will find that every sample has a different mean.
S.E. is a measure which enables us to judge whether the mean of a given sample is within the set confidence limits or not.
Testing of statistical hypothesis:
• Hypothesis: a tentative prediction or explanation of the relationship between two or more variables.
• Null hypothesis (H0): it nullifies the claim that the experimental result is different from or better than the one observed already.
• Alternative hypothesis (H1): the sample result is different.
Always remember: H1 is accepted when H0 is rejected.
Inference           | Accept it         | Reject it
Hypothesis is true  | Correct decision  | Type I error (α)
Hypothesis is false | Type II error (β) | Correct decision
Tests of significance:
It is a test used to compare or estimate significant differences between two or more samples; it also verifies whether the results/findings are due to real variation or to chance.
The process of significance testing involves three basic steps:
1. Asserting the null hypothesis
2. Establishing the alpha level
3. Rejecting or failing to reject the null hypothesis
Parametric Tests and Non Parametric
Tests:
• Statistical tests that assume a distribution and use
parameters are called parametric tests.
• Statistical tests that do not assume a distribution or use parameters are called non-parametric tests.
• Means and standard deviations are called parameters.
Difference between parametric and non-parametric tests:
Parametric test | Non-parametric test
Information about the population is completely known. | No information about the population is available.
Scientific assumptions are made regarding the population. | No assumptions are made.
Null hypothesis is made on parameters of the population distribution. | Null hypothesis is free from parameters.
Test statistic is based on the distribution. | Test statistic is arbitrary.
Applicable only to variables. | Applied to both variables and attributes.
No parametric test exists for nominal scale data. | Tests exist for nominal and ordinal scale data.
It is powerful, if it exists. | Not as powerful as a parametric test.
Various tests of significance:
Parametric tests | Non-parametric tests
Z test           | Mann-Whitney U test
Student's t test | Wilcoxon rank test
One-way ANOVA    | Kruskal-Wallis test
Two-way ANOVA    | Friedman test
Correlation      | Spearman rank correlation
Regression       | Chi-square test
Z test (normal test):
Sample size > 30
Used for:
1. Comparison of sample mean and population mean
2. Difference between two sample proportions
Prerequisites to apply Z test:
When the Z test is applied to sampling variability, the difference between the sample estimate and that of the population is expressed in terms of SE instead of SD.
t test:
• It was first described by W. S. Gosset, whose pen name was "Student".
• So this test is also known as Student's t test.
• It was later shortened to t test.
• The t test compares differences between means, while the z test compares differences between proportions.
• The t test is used for small samples (generally sample size < 30); the z test is used for large samples.
Steps In Hypothesis Testing
1. State the null hypothesis and the alternative hypothesis.
2. Choose the appropriate statistical test.
3. Set the significance level.
4. Decide whether the test is two sided or one sided.
5. Calculate the critical ratio and degrees of freedom.
6. Compare the obtained critical ratio with the values in the appropriate statistical table for the given degrees of freedom.
• If calculated value > table value, reject the null hypothesis; the test is statistically significant.
• If calculated value < table value, do not reject the null hypothesis; the test is not statistically significant.
Critical ratio:
$\text{Critical ratio} = \frac{\text{parameter}}{\text{standard error of that parameter}}$
• For t test
$t = \frac{\text{difference between two means}}{\text{SE of the difference between two means}}$
• For Z test
$z = \frac{\text{difference between two proportions}}{\text{SE of the difference between two proportions}}$
Degrees of freedom:
• The term “degrees of freedom” refers to the number
of observations that are free to vary.
• One degree of freedom is lost every time a mean is calculated.
Why should this be?
• Before putting on a pair of gloves, a person has the freedom to decide whether to begin with the left or the right glove. However, once the person puts on the first glove, he or she loses the freedom to decide which glove to put on last.
• Once N - 1 observations (each of which was free to vary) have been added up, the last observation is not free to vary, because the values of the N observations must add up to the sum of Xi.
Types of t-tests:
• One sample t test
• Unpaired t test
• Paired t test
One sample t test:
It is used to compare the mean of a single group of
observations with a specified value.
$t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error of sample mean}}$
$t = \frac{\bar{X} - \mu}{SD / \sqrt{n}}$
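A minimal sketch of the one sample t statistic, reusing the fasting blood glucose values from the mean example. The hypothesized mean of 60 is an assumption made only for illustration, and the sample SD is computed with n - 1 in the denominator (`statistics.stdev`), which is the usual choice for this test.

```python
import math
import statistics

# Fasting blood glucose levels of the 10 children (from the mean example)
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
hypothesized_mean = 60             # hypothetical reference value, not from the slides

n = len(glucose)
sample_mean = statistics.mean(glucose)
sample_sd = statistics.stdev(glucose)          # sample SD (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)      # SE of the sample mean

t_value = (sample_mean - hypothesized_mean) / standard_error
degrees_of_freedom = n - 1

print(f"t = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
```

The calculated t is then compared with the value in a t table for the same degrees of freedom.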
Unpaired t test:
• It is used to compare two independent groups of observations.
• The samples need not be equal in size.
Types of data required:
Independent variable:
One nominal variable with two levels (dichotomous, unpaired)
Ex: boy/girl students; non-smoking/heavy-smoking mothers.
Dependent variable: continuous variable
Ex: birth weight of children.
A study was conducted to compare the birth weights of children born to 15 non-smoking mothers with those of children born to 14 heavy-smoking mothers.
Assumptions:
• The samples are random and independent of each other.
• The independent variable is categorical and contains only two levels.
• The distribution of the dependent variable is normal.
• If the distribution is seriously skewed, the t-test may be invalid, and a non-parametric test should be used instead.
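The test statistic is given by
$t = \frac{\text{Mean}_1 - \text{Mean}_2}{SE(\text{Mean}_1 - \text{Mean}_2)}$
$SE(\text{Mean}_1 - \text{Mean}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
where $S_1$ and $S_2$ are the SDs of the first and second groups, and $n_1$ and $n_2$ are the group sizes.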
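The unpaired t statistic can be sketched as below, with the SE of the difference taken as the square root of S1²/n1 + S2²/n2. The birth weights are hypothetical values invented for illustration (the slides give no raw data), and the function name `unpaired_t` is likewise an assumption.

```python
import math
import statistics


def unpaired_t(group1, group2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), using sample variances."""
    mean_difference = statistics.mean(group1) - statistics.mean(group2)
    se_difference = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return mean_difference / se_difference


# Hypothetical birth weights in kg; the groups need not be equal in size
non_smoking = [3.6, 3.4, 3.8, 3.5, 3.7]
heavy_smoking = [3.1, 3.3, 2.9, 3.2]

t_value = unpaired_t(non_smoking, heavy_smoking)
print(f"t = {t_value:.2f}")
```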
Paired t test:
• It is applied to paired data of independent observations from one sample, when each individual gives a pair of observations.
• E.g. measurements made on the same people before and after an intervention.
Types of data:
Outcome variable: continuous
Second variable: dichotomous, paired (before vs. after treatment)
Ex: A study was carried out to evaluate the effect of a new diet on weight loss. The study population consisted of 12 people who used the diet for 2 months; their weights before and after the diet were recorded.
The research question is whether the diet makes a difference.
The test statistic is given by
$t = \frac{\bar{d}}{SD / \sqrt{n}}$
where
$\bar{d}$ = mean difference between the before and after values
SD = standard deviation of the differences
n = sample size (number of pairs)
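A sketch of the paired t statistic for a before/after design like the diet study. The twelve pairs of weights below are hypothetical, since the slides do not list the actual measurements.

```python
import math
import statistics

# Hypothetical weights (kg) of 12 people before and after 2 months on the diet
before = [82, 90, 76, 88, 95, 70, 85, 79, 91, 84, 77, 89]
after = [79, 86, 75, 84, 90, 70, 82, 77, 87, 81, 76, 85]

# One difference per person (before minus after)
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

mean_difference = statistics.mean(differences)
sd_difference = statistics.stdev(differences)      # SD of the differences
t_value = mean_difference / (sd_difference / math.sqrt(n))
degrees_of_freedom = n - 1

print(f"mean difference = {mean_difference:.2f} kg")
print(f"t = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
```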
Unpaired t test: to compare the means between two independent groups.
Paired t test: to compare the means between pre and post measures of the same group.
How do we compare if there are more than two groups?
One way ANOVA test:
It is used to compare the means of more than two groups.
If there are K groups,
Null hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_K$
Alternative hypothesis:
$H_1$: at least one group mean differs from the others
Types of data:
Independent variable:
One nominal variable with more than two levels
Ex: Socio-economic status (low/medium/high)
Dependent variable: continuous variable
Ex: Hb level
A study is conducted to assess the Hb levels of women of low, medium and high socio-economic status.
Assumptions:
• The samples are random and independent of each other.
• The independent variable is categorical and contains more than two levels.
• The distribution of the dependent variable is normal. If the distribution is seriously skewed, the ANOVA may be invalid, and a non-parametric test should be used instead.
Procedural steps in ANOVA
Under one way ANOVA, only one factor is considered.
1. State the null hypothesis and the alternative hypothesis.
2. Calculate the sum of squares between the groups (SSB):
$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_K)^2}{n_K} - \frac{(\sum X)^2}{N}$
3. With K groups, d.f. = K - 1 for SSB.
4. Calculate the total sum of squares (TSS):
$TSS = \sum X^2 - \frac{(\sum X)^2}{N}$
where $\sum X = \sum X_1 + \sum X_2 + \dots + \sum X_K$, $N = n_1 + n_2 + \dots + n_K$, and d.f. = N - 1.
5. Calculate the sum of squares within the groups / error sum of squares (SSE):
SSE = TSS - SSB, d.f. = (N - 1) - (K - 1) = N - K
ANOVA table
Source of variation | Degrees of freedom (1) | Sum of squares (2) | Mean sum of squares (3) = (2)/(1) | F-ratio
Between groups      | K-1 | SSB | MSSB = SSB/(K-1) | F = MSSB/MSSE
Within groups       | N-K | SSE | MSSE = SSE/(N-K) |
Total               | N-1 | TSS |                  |
Compare the calculated F-ratio with the F-table value for d.f. = (K-1, N-K).
If F calculated > F-table value, the null hypothesis is rejected, P < 0.05 or 0.01.
If F calculated < F-table value, the null hypothesis is not rejected, P > 0.05 or 0.01.
Systolic blood pressure values (X) of 4 occupational groups are given. Determine whether there is a significant difference in the mean blood pressure of the 4 groups, in order to assess the role of occupation in the causation of raised BP.
Officers        | 125 | 130 | 135 | 120 | 115 | 120 | 130 | 135 | 140 | 135
Clerks          | 120 | 122 | 115 | 110 | 125 | 122 | 120 | 120 | 126 | 120
Lab technicians | 120 | 115 | 115 | 130 | 120 | 125 | 122 | 115 | 126 | 118
Attendants      | 118 | 120 | 118 | 120 | 120 | 115 | 125 | 125 | 120 | 115
Occupation   | Officers | Clerks | Lab technicians | Attendants
Total        | 1285     | 1200   | 1206            | 1196
Mean         | 128.5    | 120.0  | 120.6           | 119.6
Square total | 165725   | 144194 | 145684          | 143148
$\sum X$ = 1285 + 1200 + 1206 + 1196 = 4887
$\sum X^2$ = 165725 + 144194 + 145684 + 143148 = 598751
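$TSS = \sum X^2 - \frac{(\sum X)^2}{N}$
= 598751 - [(4887)^2 / 40]
= 1681.78
$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_K)^2}{n_K} - \frac{(\sum X)^2}{N}$
= [(1285)^2 + (1200)^2 + (1206)^2 + (1196)^2] / 10 - [(4887)^2 / 40]
= 538.48
SSE = TSS - SSB
= 1681.78 - 538.48
= 1143.30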
ANOVA Table:
Source of variation | Degrees of freedom (1) | Sum of squares (2) | Mean sum of squares (3) = (2)/(1) | F-ratio
Between groups      | 4-1 = 3     | 538.48  | 179.49 | F = 179.49 / 31.76 = 5.65
Within groups       | 40-1-3 = 36 | 1143.30 | 31.76  |
Total               | 40-1 = 39   | 1681.78 |        |
The F-table value with d.f. = (3, 36) at P = 0.05 is 2.87.
F calculated > F-table value
5.65 > 2.87
The null hypothesis is rejected, P < 0.05. A significant difference is seen between the four occupational groups in relation to mean BP value.
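The one way ANOVA arithmetic for the blood pressure example can be reproduced in a few lines. This sketch follows the correction-term formulas from the procedural steps (TSS, SSB, SSE) using the four occupational series given above; the variable names are illustrative.

```python
# Systolic blood pressure values of the four occupational groups
groups = {
    "Officers": [125, 130, 135, 120, 115, 120, 130, 135, 140, 135],
    "Clerks": [120, 122, 115, 110, 125, 122, 120, 120, 126, 120],
    "Lab technicians": [120, 115, 115, 130, 120, 125, 122, 115, 126, 118],
    "Attendants": [118, 120, 118, 120, 120, 115, 125, 125, 120, 115],
}

all_values = [x for values in groups.values() for x in values]
N = len(all_values)                # total number of observations
K = len(groups)                    # number of groups

# Correction term (sum of all X)^2 / N, shared by TSS and SSB
correction = sum(all_values) ** 2 / N

tss = sum(x * x for x in all_values) - correction                          # total sum of squares
ssb = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction      # between groups
sse = tss - ssb                                                            # within groups (error)

mssb = ssb / (K - 1)               # mean sum of squares between groups
msse = sse / (N - K)               # mean sum of squares within groups
f_ratio = mssb / msse

print(f"TSS = {tss:.2f}, SSB = {ssb:.2f}, SSE = {sse:.2f}")
print(f"F = {f_ratio:.2f} with d.f. = ({K - 1}, {N - K})")
```

The printed F-ratio is then compared with the F-table value for the same degrees of freedom.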
Two Way Analysis Of Variance:
It is used when the data are classified on the basis of two factors.
Ex:
Recovery of patients from disease:
• Different doctors
• Different treatments
Effectiveness of a drug:
• Different periods of time
• Different groups
Chi square test:
• To determine whether there is any association between categorical data from two or more groups.
• It is a test of proportions.
Ex: A study was done to determine whether the mother's education was associated with the oral health status of the child.
Mother's education | Good oral health status | Poor oral health status | Total
> Grade 5          | 78  | 42 | 120
≤ Grade 5          | 105 | 45 | 150
Total              | 183 | 87 | 270
Formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
O = the observed value
E = the expected value
$\sum \frac{(O - E)^2}{E}$ = each value of (O - E) squared and divided by E, then all added together
$\text{Expected frequency} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
Degrees of freedom:
d.f. = (columns - 1) × (rows - 1)
Chi-square table:
d.f. | 0.05  | 0.01
1    | 3.84  | 6.63
2    | 5.99  | 9.21
3    | 7.81  | 11.34
4    | 9.49  | 13.28
5    | 11.07 | 15.09
6    | 12.59 | 16.81
7    | 14.07 | 18.48
Ex:
A cancer screening test was carried out by a team of oncologists and a total of 300 people were screened for oral cancer. Oral cancer was present in 100 people, of whom 20 gave a history of chewing tobacco; 200 people were without oral cancer, of whom 110 gave a history of chewing tobacco. Expected frequencies are shown in brackets.
               | Chewing tobacco | Not chewing tobacco | Total
Oral cancer    | 20 (43.3)       | 80 (56.7)           | 100
No oral cancer | 110 (86.7)      | 90 (113.3)          | 200
Total          | 130             | 170                 | 300

$\chi^2$ = 33.26, d.f. = 1
If calculated chi-square value > table chi-square value (at 0.05), it is significant.
If calculated chi-square value < table chi-square value, it is not significant.
Here 33.26 > 3.84, so the association is significant.
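The chi-square calculation for the oral cancer screening example can be sketched as follows. The code derives each expected frequency as row total × column total / grand total and then sums (O - E)² / E over the four cells; the variable names are illustrative.

```python
# Observed frequencies: rows = oral cancer / no oral cancer,
# columns = chewing tobacco / not chewing tobacco
observed = [
    [20, 80],
    [110, 90],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total
expected = [
    [r * c / grand_total for c in column_totals]
    for r in row_totals
]

# Chi-square = sum over all cells of (O - E)^2 / E
chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
degrees_of_freedom = (len(observed[0]) - 1) * (len(observed) - 1)

print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {chi_square:.2f}, d.f. = {degrees_of_freedom}")
```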
Regression analysis and correlation
Regression:
It is used to predict the value of one continuous variable
from the other, if the two variables are associated.
Ex:
1. A study was done to describe the relationship between the height and weight of dentists. If a dentist is 5 feet 10 inches tall, how much is he expected to weigh?
2. A study was conducted to predict a family's dental and medical expenditure in terms of household income.
Dependent variable (outcome variable): weight of the dentist; family medical and dental expenditure.
Independent variable (predictor variable): height of the dentist; household income.
Correlation:
It is used to quantify the strength and direction of the
relationship between two continuous variables.
It is denoted by r.
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
If r = 1 there is perfect positive correlation between X and Y.
If r = -1 there is perfect negative correlation between X and Y.
If r = 0 there is no linear correlation.
Given data
Subject | Variable x (height) | Variable y (weight)
1       | 182.9 cm            | 78.5 kg
2       | 172.7 cm            | 60.8 kg
3       | 175.3 cm            | 68.0 kg
4       | 172.7 cm            | 65.8 kg
5       | 160.0 cm            | 52.2 kg
6       | 165.1 cm            | 54.4 kg
7       | 172.7 cm            | 60.3 kg
Pearson correlation coefficient for the seven subjects listed: r ≈ 0.95
Interpretation: The two variables are highly correlated. The association is strong, with about 91% of the variation in weight explained by variation in height (r² ≈ 0.91).
Slope = 1.16
Interpretation: There is a 1.16 kg increase in weight for each 1 cm increase in height.
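The correlation coefficient and the regression slope for the seven height and weight pairs listed above can be computed directly from the definition of r. This is a minimal sketch in plain Python; the names `s_xy`, `s_xx` and `s_yy` are illustrative shorthand for the sums of products and squares of deviations.

```python
import math

# Height (cm) and weight (kg) of the seven subjects
height = [182.9, 172.7, 175.3, 172.7, 160.0, 165.1, 172.7]
weight = [78.5, 60.8, 68.0, 65.8, 52.2, 54.4, 60.3]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(weight) / n

# Sums of products and squares of the deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight))
s_xx = sum((x - mean_x) ** 2 for x in height)
s_yy = sum((y - mean_y) ** 2 for y in weight)

r = s_xy / math.sqrt(s_xx * s_yy)      # Pearson correlation coefficient
r_squared = r ** 2                     # share of variation in weight explained by height
slope = s_xy / s_xx                    # kg of weight per cm of height
intercept = mean_y - slope * mean_x    # regression line: weight = intercept + slope * height

print(f"r = {r:.2f}, r squared = {r_squared:.2f}")
print(f"slope = {slope:.2f} kg per cm, intercept = {intercept:.1f} kg")
```

The proportion of variation explained is r squared rather than r itself, which is why the two figures are printed separately.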
Mann - Whitney U test:
• A common nonparametric test for comparison of two
unpaired samples is the Mann-Whitney U test, also known
as the Wilcoxon rank sum test (not to be confused with
the Wilcoxon signed rank test).
E.g.: Comparison of the effectiveness of two types of toothbrushes on the oral hygiene of patients undergoing orthodontic treatment with fixed appliances. The plaque index and sulcus bleeding depth index were used.
• The median of each group is found by ranking the data in each group from lowest to highest and identifying the middle value (median).
• If the P value is not significant (P > .05), it is concluded that there is no evidence of a difference between the two groups.
• If the test is statistically significant (P < .05), it is concluded that the two groups are different.
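To make the rank-based idea concrete, here is a minimal sketch of the Mann-Whitney U statistic. The plaque index scores for the two toothbrush groups are hypothetical, and the helper name `mann_whitney_u` is an assumption; tied values receive the average of the ranks they occupy.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U statistics for two unpaired samples."""
    combined = sorted(sample_a + sample_b)

    # Rank of each distinct value; ties share the average of their positions
    rank_of = {}
    for value in set(combined):
        positions = [i + 1 for i, v in enumerate(combined) if v == value]
        rank_of[value] = sum(positions) / len(positions)

    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(rank_of[v] for v in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Hypothetical plaque index scores for two toothbrush types
brush_a = [1.2, 1.5, 1.1, 1.8, 1.4]
brush_b = [2.0, 1.9, 2.3, 1.6, 2.1]

u_value = mann_whitney_u(brush_a, brush_b)
print("U =", u_value)
```

The resulting U is compared with the critical value from a Mann-Whitney table for the two sample sizes; a U at or below the tabulated value is significant.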
Wilcoxon signed rank test
• It tests the hypothesis that 2 treatments are the same, i.e. that 2 population distributions are identical.
• This test can be used in place of the t-test for dependent samples, without the assumption of the usual normal distributions.
Kruskal- Wallis test:
• It is used when there are more than two groups to compare and it is not appropriate to use parametric procedures such as ANOVA.
• The nonparametric test comparable to the ANOVA is the Kruskal-Wallis procedure.
• The ANOVA uses means and variances for its computations.
• The Kruskal-Wallis test, like the previous nonparametric procedures described, examines intergroup differences based on ranks.
A study was conducted to evaluate the efficacy of a hyaluronan-containing mouthwash in comparison with 0.2% chlorhexidine and a water-based mouthwash. 45 volunteers were recruited and randomly divided into three groups.
Group A (positive control) was the chlorhexidine group.
Group B (test group) was the hyaluronan group.
Group C (negative control) was the water-based group.
Plaque index scores were recorded at baseline and at the end of the study. Mean plaque index scores were recorded for all three groups.
• A Kruskal-Wallis nonparametric test was performed to compare all three groups.
• Differences between the individual rinse solutions and the water-based solution were determined via the Mann-Whitney test.
A study was conducted to evaluate the antiplaque efficacy of two dentifrices. 60 subjects were recruited and randomly allocated to two groups, a test group and a control group. Plaque scores were measured by the Turesky et al. modification of the Quigley-Hein Plaque Index at baseline and at the end of the study. Mean plaque scores were calculated for both groups.
Comparison of mean plaque scores between the groups was done by the Mann-Whitney U test, and within each group by the Wilcoxon signed rank test.
First variable | Second variable       | Test of choice
Continuous     | Continuous            | Pearson correlation coefficient, linear regression
Continuous     | Ordinal               | Spearman correlation coefficient
Continuous     | Dichotomous, unpaired | Unpaired t-test
Continuous     | Dichotomous, paired   | Paired t-test
Ordinal        | Ordinal               | Spearman correlation coefficient
Ordinal        | Dichotomous, unpaired | Mann-Whitney U test
Ordinal        | Dichotomous, paired   | Wilcoxon signed rank test
Dichotomous    | Dichotomous, unpaired | Chi-square test
Dichotomous    | Dichotomous, paired   | McNemar chi-square test
Dichotomous    | Nominal               | Chi-square test
Parametric test      | Non-parametric test                          | Use
Paired t test        | Wilcoxon signed rank test                    | To compare two paired samples for equality of means/medians
Two sample t test    | Mann-Whitney U test (Wilcoxon rank sum test) | To compare two independent samples for equality of means/medians
Analysis of variance | Kruskal-Wallis                               | To compare more than two samples for equality of means/medians
-                    | χ² analysis                                  | To compare nominal data: to compare two or more samples for equivalence in proportion
SPSS Statistics:
• It is a software package used for statistical analysis.
• It is now officially named "IBM SPSS Statistics".
• SPSS (originally Statistical Package for the Social Sciences, later modified to read Statistical Product and Service Solutions) was released in its first version in 1968 after being developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull.
• SPSS is among the most widely used programs for statistical analysis in social science.
Conclusion:
• Advancing technology has enabled us to collect and safeguard a wide variety of data with minimal effort, from patients' demographic information to treatment regimens.
• It is difficult to make sense of this confusing and chaotic array of raw data by visual inspection alone. The data must be processed in meaningful and systematic ways to uncover the hidden clues.
• The methods of statistical analysis are powerful tools for drawing the conclusions that are eventually applied to diagnosis, prognosis and treatment plans for patients.
References:
• Biostatistics for Oral Healthcare – Jay S. Kim, Ronald J. Dailey
• Essentials of Public Health Dentistry – Soben Peter
• Park text book of Community Medicine
• Orthodontics: Current Principles and Techniques – Graber, Vanarsdall, Vig