
BIOSTATISTICS IN ORTHODONTICS
BY DR K.SHALMA
Contents:
• Introduction
• Basic terminology
• Scales of measurement
• Data
• Presentation of data
• Measures of Dispersion
• Sampling
• Null Hypothesis
• Test of significance
• Conclusion
• References

2
what do statistics mean?

3
introduction
• Statistics is a field of mathematical sciences
that deals with data
• Biostatistics is a branch of statistics that
emphasizes the statistical applications in the
biomedical and health sciences
• John Graunt is regarded as the father of health statistics (father of biostatistics).

5
• Statistics is an absolutely indispensable tool, providing the techniques that allow
researchers to draw objective scientific conclusions.
Why do we need statistics?
• "When you can measure what you are speaking about, and express it in
numbers, you know something about it; but when you cannot measure it, when
you cannot express it in numbers, your knowledge is of a meagre and
unsatisfactory kind."
… LORD KELVIN

Why Do We Need Statistics?

• To read the literature critically, assessing the adequacy of the research
and interpreting the results and conclusions correctly, so that new
discoveries in diagnosis and treatment can be implemented properly, a
sufficient understanding of statistics is required.

8
• It helps in assessing treatment effects, comparing different options, and
understanding how treatments interact.

• It helps in understanding the association between cause and effect in oral
diseases.

9
How Much Mathematics Do we Need?
• No more than high school or college level algebra is
required.

• However, it is fair to say that with greater


knowledge of mathematics, we can obtain much
deeper insights into and understanding of statistics.

10
Basic Terminology:
• In most cases, biomedical and health sciences data consist of observations
of certain characteristics of individual subjects, experimental animals,
chemical, microbiological, or physical phenomena in laboratories, or
observations of patients' responses to treatment.

• Whenever an experiment or a clinical trial is conducted,


measurements are taken and observations are made.

11
• Some data are numeric, such as height (5’6”), systolic
B.P. (112mm Hg), and some are non-numeric, such as sex
(female, male) and the patient’s level of pain (no pain,
moderate pain, severe pain).

• To adequately discuss and describe such data, a few terms that will be used
repeatedly are defined.

12
Population:
• The collection of all elements of interest having one or
more common characteristics is called a population.

• The elements can be individual subjects, objects, or


events.

• A population that contains an infinite number of elements is called an
infinite population.

• A population that contains a finite number of elements is called a finite
population.

13
Variable:
• A variable is any characteristic of an object that can be
measured or categorized.

• It is denoted by an upper-case letter of the alphabet, such as X, Y, or Z.

E.g.:
Age
Sex
Waiting time in clinic
Diabetic levels

14
Types of variables

• Quantitative / Numerical
   - Continuous
   - Discrete
• Qualitative / Categorical

15
Qualitative Variable:
It is a characteristic of people or objects that cannot be
naturally expressed in a numeric value.

E.g.:
Sex – male, female
Facial type – Brachyfacial, Dolichofacial, Mesiofacial
Level of oral hygiene – poor, fair, good

16
Quantitative Variable:
It is a characteristic of people or objects that can be
naturally expressed in a numeric value.

E.g.:
Age
Height
Bond strength

17
Discrete Variable:
It is a random variable that can take on a finite number
of values or a countable infinite number (as many as
there are whole numbers) of values.

E.g.:
• The size of a family
• The number of decayed, missing, and filled teeth (DMFT). It can be any one
of the 33 numbers 0, 1, 2, 3, …, 32.

18
Continuous Variable:
It is a random variable that can take on a range of values
on a continuum, i.e., its range is uncountably infinite.

E.g.:
Treatment time
Temperature
Torque value on tightening an implant abutment

19
Confounding Variable:
The statistical results are said to be confounded when the results can have
more than one explanation.
E.g.:
A study finds that smoking is the most important etiological factor in the
development of oral squamous cell carcinoma. However, alcohol has also been
suggested as one of the major causes of squamous cell carcinoma, and alcohol
consumption is known to be closely related to smoking. Therefore, in this
study, alcohol is a confounding variable.

20
Scales Of Measurement:
Nominal Measurement Scale:
It is the simplest type of data, in which the values are in
unordered categories.

E.g.:
• Sex (F, M)
• Blood type (A, B, AB and O)

The categories in a nominal measurement scale have no


quantitative relationship to each other.

21
Ordinal Measurement Scale:
• The categories can be ordered or ranked.
• The amount of the difference between any two categories, though they can
be ordered, is not quantified.

E.g.:
Pain after separator placement
0 - no pain
1 - mild pain
2 - moderate pain
3 - severe pain
4 - extremely severe pain
The numbers are assigned only for statistical convenience.

22
Interval Measurement Scale:
• Observations can be ordered, and precise differences between units of
measure exist. However, there is no meaningful absolute zero.

E.g.:
• IQ score representing the level of intelligence.
IQ score 0 is not indicative of no intelligence.

• Statistics knowledge represented by a statistics test


score.
The test score zero does not necessarily mean that the
individual has zero knowledge in statistics.

23
Ratio Measurement Scale:
It is the same as the interval scale in every aspect except that
measurement begins at a true or absolute zero.

E.g.:
• Weight in pounds.
• Height in meters.

There cannot be negative measurements.

24
Observations:
• The description of observations:
It includes collecting, summarizing and presenting.
It is also known as Descriptive statistics.

• The inference of observations:


It includes analyzing and interpreting.
It is known as Inferential statistics.

25
Data:
Data are a set of values of one or more variables recorded on
one or more individuals.

Types of Data

• Primary data
• Secondary data
• Qualitative data
• Quantitative data

26
Primary data:
It is the data obtained directly from an individual.

Advantages
1. Precise information
2. Reliable
Disadvantages
1. Time consuming

Secondary data:
It is obtained from outside sources,

e.g. hospital records, school register.

27
Quantitative data:
Measure something with a number.
E.g: the amount of crowding, overjet, incisor
inclination, and maxillomandibular skeletal discrepancy.

Qualitative data:
Data are collected on the basis of attributes or qualities.
E.g: The sex of the patient, severity of mandibular plane
angle (high, normal, low), likelihood of compliance with
headgear or elastics (yes/no).

28
Uses Of Data:

• In designing a health care programme.

• In evaluating the effectiveness of an ongoing programme.

• In determining the needs of a specific population.

• In evaluating the scientific accuracy of a journal article.

29
Methods of collection of data:
• Questionnaires
• Surveys
• Records
• Interviews

30
Presentation of Data:

Methods of presentation of data:
• Tabulation
• Diagrams/graphs

31
Tabulation:

Types of tables:
• Simple table
• Master table
• Frequency distribution table

32
Graphs And Diagrams:

Advantages:
• Impact on imagination
• Better retained in memory
• Easy comparisons

33
Bar Charts:
A diagram of columns or bars; the height of each bar represents the value of
the particular data in question.

Types:
• Simple bar graph
• Multiple bar graph
• Component bar graph

34
Pie Charts:
These are so called because the entire graph looks like a pie
and its components represent slices cut from a pie.

[Pie chart: Distribution of malocclusions in school children. Categories:
Class 1, Class 2A, Class 2B, Class 3. Slice labels: 59%, 23%, 10%, 9%.]

35
Line Graph:
When the quantity is a continuous variable, e.g. time or temperature, the
data are plotted as a continuous line.

36
Histograms:
• A histogram is a special sort of bar chart.
• The successive groups of data are linked in a definite
numerical order

Haemoglobin levels of Students in a class

37
Frequency Polygons:
• A frequency distribution may also be represented
diagrammatically by the frequency polygon.

• It is obtained by joining the mid points of the


histogram blocks.

38
Pictograms:
Pictorial or diagrammatical data represented by a
pictorial symbol.
USA 50

SINGAPORE 1100

INDIA 3700

BANGLADESH 9700

Population per Physician


39
Central Tendency / Statistical Averages:
• Central tendency refers to the center of the distribution
of data points.

• It is summarised by statistics/parameters such as the
Mean (the arithmetic average)
Median (the middle datum)
Mode (the most frequent score).

Objectives
•To condense the entire mass of data.

•To facilitate comparison.

40
Mean:
• This measure implies the arithmetic average or arithmetic
mean.
• It is obtained by summing up all the observations and
dividing the total by number of observations.

E.g. The following gives the fasting blood glucose levels of a sample of
10 children.
Child:   1   2   3   4   5   6   7   8   9   10
Level:  56  62  63  65  65  65  65  68  70   71

Total = 650; Mean = 650 / 10 = 65

Mean is denoted by the sign $\bar{X}$ (X bar).

41
Advantages:
Easy to calculate
Easily understood
Utilizes entire data
Affords good comparison

Disadvantages:
The mean is affected by extreme values; in such cases it can lead to a
misleading interpretation.

42
Median:
• For the median, the data are arranged in ascending or descending order of
magnitude and the value of the middle observation is located.

E.g. arranged in ascending order:
71, 75, 75, 77, 79, 81, 83, 84, 90, 95
Median = (79 + 81) / 2 = 80
If there are only the first 9 observations, then median = 79.

Advantages:
1. It is more representative than the mean when the data contain extreme values.
2. It does not depend on every observation.
3. It is not affected by extreme values.

43
Mode:
Mode is that value which occurs with the greatest
frequency.
A distribution may have more than one mode.

E.g. Diastolic blood pressure of 10 individuals.


85,75,81,79,71,80,75,78,72,73
Here mode = 75 i.e. the distribution is uni-modal
85,75,81,79,80,71,80,78,75,73
Here mode =75 and 80 i.e. the distribution is bi-modal.
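
The three averages above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module and the example values from these slides; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# Example data taken from the slides above.
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]      # fasting blood glucose
ordered = [71, 75, 75, 77, 79, 81, 83, 84, 90, 95]      # median example
bp_unimodal = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]  # diastolic BP, one mode
bp_bimodal = [85, 75, 81, 79, 80, 71, 80, 78, 75, 73]   # diastolic BP, two modes

glucose_mean = mean(glucose)           # sum of observations / number of observations
median_even = median(ordered)          # even n: average of the two middle values
median_odd = median(ordered[:9])       # odd n: the single middle value
modes_one = multimode(bp_unimodal)     # list of the most frequent value(s)
modes_two = multimode(bp_bimodal)

print(glucose_mean, median_even, median_odd, modes_one, modes_two)
```

`multimode` returns every value tied for the highest frequency, so it covers both the uni-modal and the bi-modal case.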

44
Advantages:
1. It eliminates extreme variation.
2. Easy to understand

Disadvantages:
1. In a small number of cases there may be no mode at all because no values
may be repeated; therefore it is seldom used in medical or biological
statistics.

45
Dispersion:
Dispersion is the degree of spread or variation of the variable about a
central value. The measures of dispersion help us to study the spread of the
values about the central value.

Purpose of Measures of Dispersion


1. To study the variability of data.
2. To determine the reliability of an average.
3. Compare two or more series in relation to their
variability.

46
Methods of dispersion:
• The range
• Mean deviation
• Standard deviation

47
The Range:
The range is defined as the difference between the highest
and lowest figures in a given sample.
• It is by far the simplest measure of dispersion.

Advantage:
• Easy to calculate

Disadvantages:
• Unstable
• It is affected by one extremely high or low score.

48
The Mean Deviation:
• It is the average of the (absolute) deviations from the arithmetic mean.

• It is given by,

$$\mathrm{M.D.} = \frac{\sum \lvert \bar{X} - X_i \rvert}{n}$$

49
Standard Deviation:
• The standard deviation is the most frequently used measure of dispersion.

• In simple terms it is defined as the Root Mean Square deviation, because it
is the square root of the variance (the average of the squared differences
from the mean).

• It is denoted by the Greek letter σ or by the initials S.D.

$$\sigma = \sqrt{\frac{\sum (\bar{X} - X_i)^2}{n}}$$
• Greater the S.D. greater will be the magnitude of
dispersion from mean.

• A small S.D. means a higher degree of uniformity of


observations.
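
As an illustration, the range, mean deviation and standard deviation can be computed directly from their definitions. This is a minimal sketch; the function name `dispersion` is hypothetical, the data are the fasting blood glucose values used earlier for the mean, and the standard deviation divides by n as in the definition above (many textbooks divide by n - 1 for a sample).

```python
from math import sqrt

def dispersion(values):
    """Return (range, mean deviation, standard deviation) of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    value_range = max(values) - min(values)             # highest minus lowest
    mean_dev = sum(abs(x - m) for x in values) / n      # average absolute deviation
    std_dev = sqrt(sum((x - m) ** 2 for x in values) / n)  # root mean square deviation
    return value_range, mean_dev, std_dev

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
value_range, mean_dev, std_dev = dispersion(glucose)
print(value_range, round(mean_dev, 2), round(std_dev, 2))
```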

50
The Normal Curve / Normal
Distribution/ Gaussian Distribution:
When data are collected from a very large number of people and a frequency
distribution is made with narrow class intervals, the resulting curve is
smooth and symmetrical, and it is called the normal curve.

51
Standard Normal Curve:
• It is bell shaped .
• The curve is perfectly symmetrical and is based on an infinitely large
number of observations.
• The total area under the curve is one, its mean is zero and its standard
deviation is one.
• All the three measures of central tendency , the mean,
median and mode coincide
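
These properties can be explored with `statistics.NormalDist` from the Python standard library. This is a small sketch; the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(area_within(1.0), 4))   # area within 1 S.D. of the mean
print(round(area_within(1.96), 4))  # area within 1.96 S.D. of the mean
```

About 68% of the area lies within one standard deviation of the mean and about 95% within 1.96 standard deviations, which is where the conventional 0.05 level comes from.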

52
Probability:
• Probability is defined as the chance of occurrence of an event. Probability
is a proportion.

• In tossing a coin, the only possible outcomes are a head or a tail. The
probability of a head is 0.5 and of a tail is 0.5, and the sum is 1.

53
If the probability (P) is more than 0.05, the difference is called
insignificant; if it is less than or equal to 0.05, the difference is called
significant. This value of P is obtained by calculating various tests of
significance.

P < 0.001   Very highly significant
P < 0.01    Highly significant
P < 0.05    Significant
P > 0.05    Not significant
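
The convention above can be written as a small lookup. This is only a sketch of the labelling rule on this slide; the function name `describe_p` is hypothetical.

```python
def describe_p(p):
    """Label a P value using the conventional cut-offs listed above."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p <= 0.05:
        return "significant"
    return "not significant"

for p in (0.0004, 0.004, 0.03, 0.2):
    print(p, describe_p(p))
```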

54
Sampling:
• Sampling can be defined as the investigation of part
of a population, in order to provide information, which can
then be generalized to cover the whole population.

Advantages:
• It reduces the cost of investigation, time required and
number of personnel involved.

• It allows thorough investigation of the units of


observation.

• It helps to provide adequate and indepth coverage of


sample units.

55
Simple random sampling:
Every unit has the same chance of being selected, and the method provides the
greatest number of possible samples.

56
Systematic random sampling:
Each unit in the sampling frame would have the same
chance of being selected, but the number of possible
samples is greatly reduced.

57
Stratified random sampling:
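The population is first divided into strata by a certain characteristic and a
random sample is drawn from each stratum, so that the data can be analysed by
that characteristic.

Strata:
Are mutually exclusive segments of a population based on a specific
characteristic.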
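
The three schemes described so far can be sketched with Python's `random` module. This is an illustrative sketch only: the function names, the `seed` parameter and the patient list are assumptions, and the systematic version simply takes every k-th unit after a random start.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units so that every possible sample of size n is equally likely."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the sampling frame."""
    k = len(frame) // n                      # sampling interval
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into mutually exclusive strata and sample within each."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

patients = list(range(1, 101))               # hypothetical patient numbers 1..100
print(simple_random_sample(patients, 10, seed=1))
print(systematic_sample(patients, 10, seed=1))
print(stratified_sample(patients, lambda p: "odd" if p % 2 else "even", 5, seed=1))
```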

58
Cluster sampling:

59
Errors
• Sampling errors
• Non-sampling errors
   - Coverage error
   - Observational error
   - Processing error

Sampling error:
It occurs due to the sampling process and could arise because of faulty
sample design or due to the small size of the sample.

60
Standard error:
If we take a random sample from the population, and
similar samples over and over again we will find that every
sample will have a different mean.

S.E. is a measure which enables us to judge whether the mean of a given
sample is within the set confidence limits or not.

61
Testing of statistical hypothesis:
• Hypothesis: a tentative prediction or explanation of the relationship
between two or more variables.
• Null hypothesis (H0): states that there is no real difference, i.e. it
nullifies the claim that the experimental result is different from or better
than the one observed already.
• Alternative hypothesis (H1): states that the sample result is different.

Always remember: H1 is accepted when H0 is rejected.

Inference              Accept it            Reject it
Hypothesis is true     Correct decision     Type I error (α)
Hypothesis is false    Type II error (β)    Correct decision

62
Tests of significance:
It is a test used to compare or estimate significant differences between two
or more samples; it also verifies whether the results/findings are real or
due to chance variation.

63
The process of significance testing involves three basic steps:
1. Asserting the null hypothesis
2. Establishing the alpha level
3. Rejecting or failing to reject the null hypothesis

64
Parametric Tests and Non Parametric
Tests:
• Statistical tests that assume a distribution and use parameters are called
parametric tests.

• Statistical tests that do not assume a distribution or use parameters are
called non-parametric tests.

• Population means and standard deviations are examples of parameters.

65
Difference between parametric and non-parametric tests:

Parametric test
• Information about the population is completely known.
• Assumptions are made regarding the population.
• Null hypothesis is made on parameters of the population distribution.
• Test statistic is based on the distribution.
• Parametric tests are applicable only for variables.
• No parametric test exists for nominal scale data.
• It is powerful, if it exists.

Non-parametric test
• No information about the population is available.
• No assumptions are made regarding the population distribution.
• Null hypothesis is free from parameters.
• Test statistic is arbitrary.
• It is applied to both variables and attributes.
• Tests do exist for nominal and ordinal scale data.
• Not as powerful as a parametric test.

66
Various tests of significance:

Parametric tests       Non-parametric tests
Z test                 Mann-Whitney U test
Student's t test       Wilcoxon rank test
One-way ANOVA          Kruskal-Wallis test
Two-way ANOVA          Friedman test
Correlation            Spearman rank correlation
Regression             Chi-square test

67
Z test (normal test):
Sample size > 30

Used for
1. Comparison of a sample mean with the population mean
2. Difference between two sample proportions

Prerequisites to apply Z test

When the Z test is applied to sampling variability, the difference between
the sample estimate and that of the population is expressed in terms of SE
instead of SD.

68
t test:
• It was first described by W. S. Gosset, whose pen name was "Student".

• So this test is also known as Student's t test.

• It was later shortened to t test.

• The t test compares differences between means, while the z test is used
here to compare differences between proportions (with large samples it can
also compare means).

• The t test is used for small samples (generally sample size < 30); the z
test is used for large samples.

69
Steps In Hypothesis Testing
1. State null hypothesis and alternative hypothesis.

2. Choose appropriate statistical test.

3. Set the significance level.

4. Two sided or one sided test?

5. Calculate critical ratio and degrees of freedom.

6. Compare the obtained critical ratio with the values in


the appropriate statistical table for different degrees
of freedom.

70
• If calculated value > table value, reject the null hypothesis; the result
is statistically significant.

• If calculated value < table value, do not reject (accept) the null
hypothesis; the result is not statistically significant.

71
Critical ratio:
$$\text{Critical ratio} = \frac{\text{parameter}}{\text{standard error of that parameter}}$$

• For t test
$$\text{Critical ratio} = t = \frac{\text{difference between two means}}{\text{SE of the difference between two means}}$$

• For Z test
$$\text{Critical ratio} = z = \frac{\text{difference between two proportions}}{\text{SE of the difference between two proportions}}$$

72
Degrees of freedom:
• The term “degrees of freedom” refers to the number
of observations that are free to vary.

• One degree of freedom is lost every time a mean is


calculated.

Why should this be?

73
• Before putting on a pair of gloves, a person has the
freedom to decide whether to begin with left or the
right glove. However, once the person puts on the first
glove, he or she loses the freedom to decide which glove
to put on last.

• Once N – 1 observations (each of which was free to vary) have been added
up, the last observation is not free to vary, because the N observations must
add up to the fixed total, the sum of Xi.

74
Types of t-tests

• One sample t-test
• Unpaired t test
• Paired t test

75
One sample t test:
It is used to compare the mean of a single group of
observations with a specified value.

$$t = \frac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error of sample mean}}$$

$$t = \frac{\bar{X} - \mu}{SD/\sqrt{n}}$$
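
A minimal sketch of the one sample t statistic follows. The function name `one_sample_t` is hypothetical, the data are the fasting blood glucose values from the earlier slide on the mean, and the hypothesized mean of 60 is an assumed value chosen only for illustration. `statistics.stdev` is the sample standard deviation (divisor n - 1).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, hypothesized_mean):
    """Return (t, degrees of freedom) for a one sample t test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # SD / sqrt(n)
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1

glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
t, df = one_sample_t(glucose, hypothesized_mean=60)  # 60 is an assumed value
print(round(t, 2), df)
```

The calculated t is then compared with the tabulated t value for n - 1 degrees of freedom.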

76
Unpaired t test:

• It is used to compare two independent groups of


observation.

• The sample need not be equal in size.

77
Types of data required:
Independent variable:
One nominal variable with two levels (dichotomous, unpaired)
Ex: boy/girl students,
non-smoking/heavy-smoking mothers.
Dependent variable: Continuous variable
Ex: birth weight of children.

A study was conducted to compare the birth weights of children born to 15
non-smoking mothers with those of children born to 14 heavy-smoking mothers.

78
Assumptions:

• The samples are random and independent of each


other.

• The independent variable is categorical and contains


only two levels.

• The distribution of dependent variable is normal.

• If the distribution is seriously skewed, the t-test may be invalid and a
non-parametric test should be used instead.

79
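Test statistic is given by

$$t = \frac{\text{Mean}_1 - \text{Mean}_2}{SE(\text{Mean}_1 - \text{Mean}_2)}$$

$$SE(\text{Mean}_1 - \text{Mean}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

where S1 and S2 are the SDs, and n1 and n2 the sizes, of the first and second
group respectively.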
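
The unpaired t statistic can be sketched as below, with the standard error of the difference taken as the square root of S1²/n1 + S2²/n2. The function name `unpaired_t` and the birth weights are hypothetical; they are not data from the study mentioned above. Some textbooks use a pooled variance instead, which gives the same result when the two groups have equal size and equal variance.

```python
from math import sqrt
from statistics import mean, variance

def unpaired_t(group1, group2):
    """t statistic for two independent groups (groups need not be equal in size)."""
    se_diff = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se_diff

# Hypothetical birth weights in kg, for illustration only.
non_smoking = [3.4, 3.6, 3.5, 3.7, 3.3]
heavy_smoking = [3.0, 3.1, 2.9, 3.2, 2.8]
t = unpaired_t(non_smoking, heavy_smoking)
print(round(t, 2))
```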

80
Paired t test:
• It is applied to paired data from one sample, when each individual gives a
pair of observations.

• E.g. measurements made on the same people before and after an
intervention.

81
Types of data:

Outcome variable: continuous


Second variable: Dichotomous paired (before vs. after treatment)
Ex: A study was carried out to evaluate the effect of a new diet on weight
loss. The study population consisted of 12 people who used the diet for 2
months; their weights before and after the diet are given.

The research question is whether the diet makes a difference.

82
Test statistic is given by

$$t = \frac{\bar{d}}{SD/\sqrt{n}}$$

Where,
$\bar{d}$ = mean difference between the before and after values

SD = standard deviation of the differences

n = sample size (number of pairs)
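
A minimal sketch of the paired t statistic, computed on the before minus after differences. The function name `paired_t` and the weights are hypothetical and are not the data of the diet study described above.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    diffs = [b - a for b, a in zip(before, after)]   # one difference per individual
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))       # mean difference / its SE
    return t, n - 1

# Hypothetical weights in kg before and after a diet, for illustration only.
before = [80, 85, 78, 90, 88]
after = [78, 82, 77, 86, 87]
t, df = paired_t(before, after)
print(round(t, 2), df)
```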

83
Unpaired t test: to compare the means between two
independent groups.

Paired t test: to compare the means between pre and post


measures of the same group.

How do we compare if there are more


than two groups?

84
One way ANOVA test:

It is used to compare the means of more than two groups.

If there are K groups,

Null hypothesis:
H0: μ1 = μ2 = μ3 = … = μk
Alternative hypothesis:
H1: at least one group mean differs from the others (not all μ are equal)

85
Types of data:

Independent variable:

One nominal variable with more than two levels


Ex: Socio economic status (low/medium/high)

Dependent variable: Continuous variable


Ex: Hb level
A study is conducted to assess the Hb levels of women
in low, medium and high socio economic status.

86
Assumptions:

• The samples are random and independent of each


other

• The independent variable is categorical and contains


more than two levels

• The distribution of the dependent variable is normal. If the distribution
is seriously skewed, the ANOVA may be invalid and a non-parametric test
should be used instead.

87
Procedural steps in ANOVA
Under one-way ANOVA, only one factor is considered.

1. State the null hypothesis and the alternative hypothesis.

2. Calculate the sum of squares between the groups, i.e.

$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \dots + \frac{(\sum X_k)^2}{n_k} - \frac{(\sum X)^2}{N}$$

3. Assuming k groups, d.f. = K - 1 for SSB.

88
4. Calculate the total sum of squares, i.e.

$$TSS = \sum X^2 - \frac{(\sum X)^2}{N}$$

where $\sum X = \sum X_1 + \sum X_2 + \sum X_3 + \dots + \sum X_k$,
$N = n_1 + n_2 + \dots + n_k$, and df = N - 1.

5. Calculate the sum of squares within the groups / error sum of squares (SSE),
i.e. SSE = TSS - SSB, df = (N - 1) - (K - 1) = N - K

89
ANOVA table

Source of        Degrees of     Sum of         Mean sum of            F-ratio
variation        freedom (1)    squares (2)    squares (3 = 2/1)
Between groups   K-1            SS1 = SSB      MSSB = SSB/(K-1)       F = MSSB/MSSE
Within groups    N-K            SS2 = SSE      MSSE = SSE/(N-K)
Total            N-1            TSS

90
Compare the calculated F-ratio with the F-table value at df = (K-1, N-K).

F calculated > F-table value:
Null hypothesis is rejected, P < 0.05 or 0.01

F calculated < F-table value:
Null hypothesis is accepted, P > 0.05 or 0.01

91
Systolic blood pressure values (X) of 4 occupations are given. Determine if
there is a significant difference in the mean blood pressure of the 4 groups,
in order to assess the role of occupation in the causation of BP.

Officers          125 130 135 120 115 120 130 135 140 135
Clerks            120 122 115 110 125 122 120 120 126 120
Lab technicians   120 115 115 130 120 125 122 115 126 118
Attendants        118 120 118 120 120 115 125 125 120 115

92
Occupation      Officers   Clerks    Lab technicians   Attendants
Total           1285       1200      1206              1196
Mean            128.5      120.0     120.6             119.6
Square total    165725     144194    145684            143148

ΣX = 1285 + 1200 + 1206 + 1196 = 4887

ΣX² = 165725 + 144194 + 145684 + 143148 = 598751

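$$TSS = \sum X^2 - \frac{(\sum X)^2}{N} = 598751 - \frac{(4887)^2}{40} = 1681.78$$

$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \frac{(\sum X_4)^2}{n_4} - \frac{(\sum X)^2}{N}$$

$$SSB = \frac{1285^2 + 1200^2 + 1206^2 + 1196^2}{10} - \frac{(4887)^2}{40} = 538.48$$

SSE = TSS - SSB
= 1681.78 - 538.48
= 1143.30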

94
ANOVA Table:

Source of        Degrees of      Sum of         Mean sum of           F-ratio
variation        freedom (1)     squares (2)    squares (3 = 2/1)
Between groups   4-1 = 3         538.48         179.49                F = 179.49 / 31.76 = 5.65
Within groups    40-1-3 = 36     1143.30        31.76
Total            40-1 = 39       1681.78

95
The F-table value with df = (3, 36) = 2.86

F calculated > F-table value
5.65 > 2.86

Null hypothesis is rejected,
P < 0.05. A significant difference is seen between the four occupational
groups in relation to mean BP value.
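
The worked example can be reproduced step by step. This is a sketch that follows the procedural steps of the slides (correction term, TSS, SSB, SSE, mean squares, F); the function name `one_way_anova` and the dictionary layout are illustrative choices.

```python
def one_way_anova(groups):
    """One-way ANOVA from raw data; `groups` maps a group name to its values."""
    all_values = [x for values in groups.values() for x in values]
    n_total, k = len(all_values), len(groups)
    correction = sum(all_values) ** 2 / n_total            # (sum X)^2 / N
    tss = sum(x * x for x in all_values) - correction      # total sum of squares
    ssb = sum(sum(v) ** 2 / len(v) for v in groups.values()) - correction
    sse = tss - ssb                                        # within-group (error) SS
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ssb / df_between) / (sse / df_within)
    return {"TSS": tss, "SSB": ssb, "SSE": sse,
            "df": (df_between, df_within), "F": f_ratio}

bp = {
    "Officers":        [125, 130, 135, 120, 115, 120, 130, 135, 140, 135],
    "Clerks":          [120, 122, 115, 110, 125, 122, 120, 120, 126, 120],
    "Lab technicians": [120, 115, 115, 130, 120, 125, 122, 115, 126, 118],
    "Attendants":      [118, 120, 118, 120, 120, 115, 125, 125, 120, 115],
}
result = one_way_anova(bp)
print({name: round(value, 2) if isinstance(value, float) else value
       for name, value in result.items()})
```

The calculated F is then compared with the tabulated F value for the returned degrees of freedom.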

96
Two Way Analysis Of Variance:
It is used when the data classified on the basis of two
factors.
Ex:

Recovery of patient from disease:


Different Doctors

Different treatments

Effectiveness of drug:
Different period of time

Different groups

97
Chi square test:
• To determine if there is any association between categorical data from two
or more groups.
• It is a test of proportions.

Ex: A study was done to determine if mother's education was associated with
the oral health status of the child.

Mother's education   Good oral health status   Poor oral health status   Total
> Grade 5            78                        42                        120
≤ Grade 5            105                       45                        150
Total                183                       87                        270

98
Formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O = The observed value
E = The expected value

∑ (O – E)² / E = each value of (O – E) squared, divided by E, then added
together

$$\text{Expected frequency} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

99
Degrees of freedom:

d.f. = (columns - 1) (rows - 1)

Chi square table:

d.f.   0.05     0.01
1      3.84     6.63
2      5.99     9.21
3      7.81     11.34
4      9.49     13.28
5      11.07    15.09
6      12.59    16.81
7      14.07    18.48

100
Ex:
A cancer screening test was carried out by a team of oncologists and a total
of 300 people were screened for oral cancer. The findings were: oral cancer
was present in 100 people, of whom 20 gave a history of chewing tobacco;
200 people were without oral cancer, of whom 110 gave a history of chewing
tobacco.

(Expected frequencies are given in parentheses.)

                 Chewing tobacco   Not chewing tobacco   Total
Oral cancer      20 (43.3)         80 (56.7)             100
No oral cancer   110 (86.7)        90 (113.3)            200
Total            130               170                   300


χ² = 33.26, d.f. = 1

Calculated chi square value > table chi square value (0.05) → it is
significant

Calculated chi square value < table chi square value → it is not significant

Here 33.26 > 3.84 (table value at 0.05 for 1 d.f.), so the association is
significant.
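
The chi square value for the oral cancer example can be recomputed from the observed counts. This is a minimal sketch; the function name `chi_square` is hypothetical, and expected frequencies are obtained as row total × column total / grand total.

```python
def chi_square(observed):
    """Return (chi square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi2, df

# Rows: oral cancer, no oral cancer. Columns: chewing tobacco, not chewing.
observed = [[20, 80],
            [110, 90]]
chi2, df = chi_square(observed)
print(round(chi2, 2), df)
```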

102
Regression analysis and correlation

Regression:
It is used to predict the value of one continuous variable
from the other, if the two variables are associated.

Ex:
1. A study was done to describe the relationship
between height and weight of Dentists; If one dentist is
5 feet 10 inches tall, how much is he expected to
weigh?

2. A study was conducted to predict family’s dental and


medical expenditures in terms of household income

103
Dependent variable (outcome variable): weight of the
dentist, family medical and dental expenditure.

Independent variable (predictor variable): height of the


dentist, household income

104
Correlation:
It is used to quantify the strength and direction of the relationship between
two continuous variables.
It is denoted by r.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

If r = 1 then there is perfect positive correlation between X and Y.
If r = -1 then there is perfect negative correlation between X and Y.
If r = 0 then there is no correlation.

105
Given data

Subject   Variable x (height)   Variable y (weight)
1         182.9 cm              78.5 kg
2         172.7 cm              60.8 kg
3         175.3 cm              68.0 kg
4         172.7 cm              65.8 kg
5         160.0 cm              52.2 kg
6         165.1 cm              54.4 kg
7         172.7 cm              60.3 kg

106
Pearson correlation coefficient computed from the data above, r ≈ 0.95
Interpretation: The two variables are highly correlated.

The association is strong. The proportion of variation in weight explained
by variation in height is r² ≈ 0.91, i.e. about 91%.
Slope = 1.16

Interpretation: There is a 1.16 kg increase in weight for each 1 cm increase
in height.
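
The correlation coefficient and the regression slope for the height and weight data can be computed from the sums of squares and cross products. This is a sketch; the function name `pearson_and_slope` is hypothetical, and the slope and intercept use the usual least squares formulas, which the slides do not spell out.

```python
from math import sqrt

def pearson_and_slope(x, y):
    """Return (r, slope, intercept) for the regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx                      # change in y per unit change in x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

height_cm = [182.9, 172.7, 175.3, 172.7, 160.0, 165.1, 172.7]
weight_kg = [78.5, 60.8, 68.0, 65.8, 52.2, 54.4, 60.3]
r, slope, intercept = pearson_and_slope(height_cm, weight_kg)
print(round(r, 2), round(r ** 2, 2), round(slope, 2))
```

Note that the proportion of variation explained is r squared, not r itself.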

107
Mann - Whitney U test:
• A common nonparametric test for comparison of two
unpaired samples is the Mann-Whitney U test, also known
as the Wilcoxon rank sum test (not to be confused with
the Wilcoxon signed rank test).

E.g.: Comparison of the effectiveness of two types of toothbrushes on the
oral hygiene of patients undergoing orthodontic treatment with fixed
appliances. Plaque index and sulcus bleeding depth index were used.

108
• The median of each group is found by ranking the data
in each group from lowest to highest and identifying the
middle most value (median).

• If the P value is not significant (P > .05), it is concluded that there is
no difference between the two groups.

• But if the test is statistically significant (P < .05), it is concluded
that the two groups are actually different.
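
The U statistic itself is easy to compute by comparing every observation in one group with every observation in the other. This is a minimal sketch with hypothetical plaque scores; the function name `mann_whitney_u` is an assumption, and obtaining the P value still requires a U table or a statistical package.

```python
def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U values for unpaired samples."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0        # a ranks above b
            elif a == b:
                u_a += 0.5        # ties count as half
    u_b = len(group_a) * len(group_b) - u_a
    return min(u_a, u_b)

# Hypothetical plaque index scores for two toothbrush groups.
brush_1 = [1.2, 1.5, 1.1]
brush_2 = [1.8, 2.0, 1.6]
print(mann_whitney_u(brush_1, brush_2))
```

A small U means the two groups overlap very little in rank order.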

109
Wilcoxon signed rank test
• It tests the hypothesis that 2 treatments are the same, i.e. that the 2
population distributions are identical.

• This test can be used in place of the t-test for dependent samples,
without the assumption of the usual normal distributions.

110
Kruskal- Wallis test:
• If there are more than two groups to compare and it is not appropriate to
use parametric procedures such as ANOVA, the non-parametric test comparable
to the ANOVA is the Kruskal-Wallis procedure.

• The ANOVA uses means and variances for its computations.

• The Kruskal-Wallis test, like the previous non-parametric procedures
described, examines inter-group differences based on ranks.

111
A study was conducted to evaluate the efficacy of a hyaluronan-containing
mouthwash in comparison with 0.2% chlorhexidine and a water-based mouthwash.
45 volunteers were recruited in the study. They were randomly divided into
three groups.
Group A (positive control) was the chlorhexidine group.
Group B (test group) was the hyaluronan group and
Group C (negative control) was the water-based group.
Plaque index scores were recorded at baseline and at the end of the study.
Mean plaque index scores were recorded for all three groups.

112
• A Kruskal-Wallis non-parametric test was performed to compare all three
groups.

• Differences between the individual rinse solutions and the water-based
solution were determined via the Mann-Whitney test.

113
A study was conducted to evaluate the antiplaque efficacy of two dentifrices.
60 subjects were recruited in the study. They were randomly allocated to two
groups, a test group and a control group. Plaque scores were measured by the
Turesky et al. modification of the Quigley-Hein Plaque Index at baseline and
at the end of the study. Mean plaque scores were calculated for both groups.

Comparison of mean plaque scores between the groups is done by the
Mann-Whitney U test, and within each group by the Wilcoxon signed rank test.

115
First variable   Second variable         Test of choice
Continuous       Continuous              Pearson correlation coefficient, linear regression
Continuous       Ordinal data            Spearman correlation coefficient
Continuous       Dichotomous unpaired    Unpaired t-test
Continuous       Dichotomous paired      Paired t-test
Ordinal          Ordinal                 Spearman correlation coefficient
Ordinal          Dichotomous unpaired    Mann-Whitney U test
Ordinal          Dichotomous paired      Wilcoxon signed rank test
Dichotomous      Dichotomous unpaired    Chi-square test
Dichotomous      Dichotomous paired      McNemar chi-square test
Dichotomous      Nominal                 Chi-square test

118
Parametric test        Non-parametric test          Use
Paired t test          Wilcoxon signed rank test    To compare two paired samples for equality of means/medians
Two sample t test      Mann-Whitney U test          To compare two independent samples for equality of means/medians
                       (Wilcoxon rank sum test)
Analysis of variance   Kruskal-Wallis               To compare more than two samples for equality of means/medians
-                      χ² analysis                  To compare nominal data: to compare two or more samples for equivalence in proportion
SPSS Statistics:
• It is a software package used for statistical analysis.

• It is now officially named "IBM SPSS Statistics".

• SPSS (originally Statistical Package for the Social Sciences, later
modified to read Statistical Product and Service Solutions) was released in
its first version in 1968 after being developed by Norman H. Nie, Dale H.
Bent and C. Hadlai Hull.

• SPSS is among the most widely used programs


for statistical analysis in social science.

120
Conclusion:
• Advancing technology has enabled us to collect and safeguard a wide variety
of data with minimal effort, from patients' demographic information to
treatment regimens.

• It is difficult to make sense of this confusing and chaotic array


of raw data by visual inspections alone. The data must be
processed in meaningful and systematic ways to uncover the
hidden clues.

• The methods of statistical analysis are powerful tools for


drawing the conclusions that are eventually applied to diagnosis,
prognosis and treatment plans for patients.

121
References:
• Biostatistics for Oral Healthcare - Jay S. Kim, Ronald J. Dailey

• Essentials of Public Health Dentistry - Soben Peter

• Park's Textbook of Community Medicine

• Orthodontics: Current Principles and Techniques - Graber, Vanarsdall, Vig

122
