
BIOSTATISTICS

Department of Orthodontics
Pankaj Lakhanpan
MDS 1st Year
INTRODUCTION
• From the Italian word “statista”, meaning “statesman”
• and the German word “statistik”, meaning “political state”.

• STATISTICS: the science of compiling, classifying and tabulating numerical data and expressing the results in mathematical or graphical form.

• BIOSTATISTICS: the branch of statistics concerned with mathematical facts and data related to biological events.
FATHER OF BIOSTATISTICS

John Graunt (1620-1674)

3
USES OF BIOSTATISTICS

1. To test whether the difference between two populations, regarding a particular event, is real or a chance occurrence.
2. To study the correlation or association between two or more
attributes in the same population.
3. To evaluate the efficacy of vaccines, sera, etc. by control
studies.
4. To locate, define and measure the extent of morbidity and
mortality in the community.
5. To evaluate the achievements of public health programs.
6. To fix priorities in public health programs.
USES OF BIOSTATISTICS IN DENTISTRY
1. To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.

2. To indicate the basic factors underlying the state of oral health, by diagnosing problems and suggesting solutions to them.

3. To determine the success or failure of specific oral health care programs, or to evaluate program action.

4. To promote health legislation and to create administrative standards for oral health.
BASIC TERMINOLOGY
• In most cases, biomedical and health sciences data consist of observations of certain characteristics of individual subjects, experimental animals, or chemical, microbiological, or physical phenomena in laboratories, or observations of patients' responses to treatment.

• Whenever an experiment or a clinical trial is conducted, measurements are taken and observations are made.

6
• Some data are numeric, such as height (5’6”), systolic
B.P. (112mm Hg), and some are non-numeric, such as sex
(female, male) and the patient’s level of pain (no pain,
moderate pain, severe pain).

• To adequately discuss and describe these data, a few terms that will be used repeatedly are defined below.

7
POPULATION
• The collection of all elements of interest having one or
more common characteristics is called a population.

• The elements can be individual subjects, objects, or events.

• A population that contains an infinite number of elements is called an infinite population.

• A population that contains a finite number of elements is called a finite population.

8
VARIABLE

• A variable is any characteristic of an object that can be measured or categorized.

• It is denoted by an upper-case letter of the alphabet, such as X, Y, or Z.

E.g.:
Age
Sex
Waiting time in clinic
Diabetic levels

9
Types of variables

• Quantitative / Numerical
  - Continuous
  - Discrete

• Qualitative / Categorical

10
QUALITATIVE VARIABLE
It is a characteristic of people or objects that cannot be
naturally expressed in a numeric value.

E.g.:
Sex – male, female
Facial type – Brachyfacial, Dolichofacial, Mesiofacial
Level of oral hygiene – poor, fair, good

11
QUANTITATIVE VARIABLE

It is a characteristic of people or objects that can be naturally expressed in a numeric value.

E.g.:
Age
Height
Bond strength

12
DISCRETE VARIABLE
It is a random variable that can take on a finite number of values or a countably infinite number (as many as there are whole numbers) of values.

E.g.:
• The size of a family
• The number of DMFT teeth: T can be any one of the 33 numbers 0, 1, 2, 3, …, 32.

13
CONTINUOUS VARIABLE
It is a random variable that can take on a range of values
on a continuum, i.e., its range is uncountably infinite.

E.g.:
Treatment time
Temperature
Torque value on tightening an implant abutment

14
CONFOUNDING VARIABLE
The statistical results are said to be confounded when the
results can have more than one explanation.
E.g.:
In a study, smoking is the most important etiological factor
in the development of oral squamous cell carcinoma. It has
been suggested that alcohol is one of the major causes of
squamous cell carcinoma, and alcohol consumption is also
known to be closely related to smoking. Therefore, in this
study, alcohol is a confounding variable.

15
SCALES OF MEASUREMENT

Nominal Measurement Scale:

It is the simplest type of data, in which the values are in unordered categories.

E.g.:
• Sex (F, M)
• Blood type (A, B, AB and O)

The categories in a nominal measurement scale have no quantitative relationship to each other.

16
Ordinal Measurement Scale:

• The categories can be ordered or ranked.

• The amount of the difference between any two categories, though they can be ordered, is not quantified.

E.g.:
Pain after separator placement
0 - no pain
1 - mild pain
2 - moderate pain
3 - severe pain
4 - extremely severe pain
The numbers are assigned only for statistical convenience.

17
Interval Measurement Scale:
• Observations can be ordered, and precise differences between units of measure exist. However, there is no meaningful absolute zero.

E.g.:
• IQ score representing the level of intelligence.
IQ score 0 is not indicative of no intelligence.

• Statistics knowledge represented by a statistics test score: a test score of zero does not necessarily mean that the individual has zero knowledge of statistics.

18
Ratio Measurement Scale:
It is the same as the interval scale in every respect, except that measurement begins at a true or absolute zero.

E.g.:
• Weight in pounds.
• Height in meters.

There cannot be negative measurements.

19
OBSERVATIONS

• The description of observations:
It includes collecting, summarizing and presenting data.
This is also known as descriptive statistics.

• The inference of observations:
It includes analyzing and interpreting data.
This is known as inferential statistics.

20
DATA
Data are a set of values of one or more variables recorded on
one or more individuals.

Types of Data

• Primary data
• Secondary data
• Qualitative data
• Quantitative data

21
Primary data:
It is the data obtained directly from an individual.

Advantages
1. Precise information
2. Reliable

Disadvantages
1. Time consuming

Secondary data:
It is obtained from outside sources, e.g. hospital records, school registers.

22
Quantitative data:
Measure something with a number.
E.g: the amount of crowding, overjet, incisor
inclination, and maxillomandibular skeletal discrepancy.

Qualitative data:
Data is collected on the basis of attribute or qualities.
E.g: The sex of the patient, severity of mandibular plane
angle (high, normal, low), likelihood of compliance with
headgear or elastics (yes/no).

23
Uses of Data:

• In designing a health care programme.

• In evaluating the effectiveness of an ongoing program.

• In determining the needs of a specific population.

• In evaluating the scientific accuracy of a journal article.

24
Methods of collection of data

• Questionnaires
• Surveys
• Records
• Interviews

25
Presentation of Data:

Methods of presentation of data:
• Tabulation
• Diagrams/graphs

26
Types of Tables

• Simple table
• Master table
• Frequency distribution table

1. SIMPLE TABLE: in this, the characteristic under observation is fixed and the number or frequency of events is small.

2. MASTER TABLE: in this table, all initial readings as per the designed proforma are serially recorded. When the number of observations is large and several attributes have to be studied, the master table is a must.

3. FREQUENCY DISTRIBUTION TABLE: this is the most important table in statistical work. In this, unsorted data are presented in a small, manageable number of groups. It records how frequently a characteristic occurs in persons of the same group. In qualitative data, the characteristic remains the same and the frequency varies, while in quantitative data, both are variable. Frequency tables can be prepared using either type of data.
27
QUALITATIVE DATA
• Bar diagram
• Pie chart
• Pictogram
• Spot map

QUANTITATIVE DATA
• Histogram
• Frequency polygon
• Scatter diagram
• Line diagram
Graphs and Diagrams:

• Impact on imagination
• Better retained in memory
• Easy comparisons

29
BAR DIAGRAMS
1) Simple Bar
• Used to represent and compare the frequency distribution of discrete variables.
• All the bars must have equal width; only the length varies according to the frequency in each category.

30
2) Multiple Bar
• Used to compare qualitative data with respect to a single variable.
• Each category of the variable has a set of bars of the same width corresponding to the different sections, without any gap in between; the length corresponds to the frequency.

31
3) Proportional Bar Diagram
• When it is desired to compare only the proportions of subgroups between different major groups of observations, bars are drawn for each group with the same length.
• These are then divided according to the subgroup proportions in each major group.

32
Pie chart

• These are popularly used to show percentage breakdowns for qualitative data.

• The degree of the angle denotes the frequency.

33
Pictogram
• A diagram that uses pictures to represent amount
or numbers of a particular thing

34
Spot Map

• Shows the geographical distribution of frequencies of a characteristic.

35
Histogram

• Quantitative data of continuous type.
• A bar diagram without gaps between the bars.
• Represents a frequency distribution.

36
Line diagram

• Useful to study changes of values in a variable over time.

37
Frequency polygon

• Compares two or more frequency distributions.
• A point is marked over the mid-point of each class interval, corresponding to the frequency.
• The points are connected by straight lines.

38
Scatter diagram

• A graph in which the values of two variables are plotted along two axes, the pattern of the resulting points revealing any correlation.

39
CENTRAL TENDENCY / STATISTICAL AVERAGES:
• Central tendency refers to the center of the distribution of data
points.

• Described by statistics/parameters such as the:
Mean (the arithmetic average)
Median (the middle datum)
Mode (the most frequent score)
Objectives
•To condense the entire mass of data.

•To facilitate comparison.

40
Ideal properties of central tendency

• Should be easy to understand and compute.

• Should be based on each and every item in the series.

• Should not be affected by extreme observations.

• Should be capable of further statistical computations.

• It should have sampling stability.

41
Mean:
• This measure implies the arithmetic average or arithmetic mean.
• It is obtained by summing up all the observations and dividing the total by the number of observations.

E.g. The following are the fasting blood glucose levels of a sample of 10 children:

Child:   1  2  3  4  5  6  7  8  9  10
Glucose: 56 62 63 65 65 65 65 68 70 71

Mean = 650 / 10 = 65

The mean is denoted by the sign X̄ (X bar).

42
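The mean calculation on this slide can be checked with Python's standard `statistics` module; a minimal sketch using the slide's own glucose values:

```python
from statistics import mean

# Fasting blood glucose levels of the 10 children from the example
glucose = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

x_bar = mean(glucose)  # sum = 650, n = 10
print(x_bar)  # 65
```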
Advantages:
Easy to calculate
Easily understood
Utilizes entire data
Affords good comparison

Disadvantages:
The mean is affected by extreme values; in such cases it can lead to misinterpretation.

43
Median:
• For the median, the data are arranged in ascending or descending order of magnitude and the value of the middle observation is located.

E.g. arranged in ascending order: 71, 75, 75, 77, 79, 81, 83, 84, 90, 95

Median = (79 + 81) / 2 = 80
If there were only 9 observations, the median = 79.

Advantages:
1. It is more representative than the mean.
2. It does not depend on every observation.
3. It is not affected by extreme values.

44
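The even- and odd-sample cases above can be reproduced with `statistics.median`, again using the slide's own values:

```python
from statistics import median

# The slide's values, already in ascending order
scores = [71, 75, 75, 77, 79, 81, 83, 84, 90, 95]

# Even number of observations: median is the mean of the two middle values
print(median(scores))      # 80.0 -> (79 + 81) / 2

# With only the first 9 observations, the median is the single middle value
print(median(scores[:9]))  # 79
```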
Mode:
Mode is that value which occurs with the greatest
frequency.
A distribution may have more than one mode.

E.g. Diastolic blood pressure of 10 individuals:

85, 75, 81, 79, 71, 80, 75, 78, 72, 73
Here mode = 75, i.e. the distribution is uni-modal.

85, 75, 81, 79, 80, 71, 80, 78, 75, 73
Here modes = 75 and 80, i.e. the distribution is bi-modal.

• In case of an ill-defined mode:
Mode = 3 Median − 2 Mean


45
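Both blood-pressure examples above can be checked in Python; `statistics.mode` handles the uni-modal case, and `statistics.multimode` returns every mode of a bi-modal distribution:

```python
from statistics import mode, multimode

# Diastolic blood pressures from the two examples on the slide
bp_unimodal = [85, 75, 81, 79, 71, 80, 75, 78, 72, 73]
bp_bimodal = [85, 75, 81, 79, 80, 71, 80, 78, 75, 73]

print(mode(bp_unimodal))      # 75
print(multimode(bp_bimodal))  # [75, 80]
```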
Advantages:
1. It eliminates extreme variation.
2. Easy to understand.

Disadvantages:
1. In a small number of cases there may be no mode at all, because no value is repeated; therefore it is not often used in medical or biological statistics.

46
DISPERSION:
Dispersion is the degree of spread or variation of a variable about a central value. The measures of dispersion help us to study the spread of the values about the central value.

Purpose of Measures of Dispersion

1. To study the variability of data.
2. To determine the reliability of an average.
3. To compare two or more series in relation to their variability.

47
Commonly used measures of variation are:

• Range
• Standard deviation
• Standard error
• Coefficient of variation
• Z score

48
The Range:

The range is defined as the difference between the highest and lowest values in a given sample.

• It is by far the simplest measure of dispersion.

Advantage:
• Easy to calculate

Disadvantages:
• Unstable
• It is affected by one extremely high or low score.

49
Standard deviation

• The most important and widely used measure of variation.
• Also known as the root-mean-square deviation from the mean.
• The greater the standard deviation → the greater the dispersion from the mean.
• The smaller the standard deviation → the higher the degree of uniformity.

50
• Calculation of standard deviation:

S.D. = √[ Σ(xᵢ − x̄)² / n ]

xᵢ = individual value
x̄ = mean value
n = total number of observations

51
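The n-denominator formula on this slide is the *population* standard deviation; a minimal sketch, reusing the glucose values from the mean example as hypothetical data, and cross-checking against the library's `pstdev`:

```python
from math import sqrt
from statistics import pstdev

# Hypothetical data (the glucose values used in the mean example)
data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
n = len(data)
x_bar = sum(data) / n

# S.D. = sqrt( sum((x_i - x_bar)^2) / n )
sd = sqrt(sum((x - x_bar) ** 2 for x in data) / n)

# The n-denominator form matches the library's population S.D.
assert abs(sd - pstdev(data)) < 1e-12
print(round(sd, 2))  # 4.05
```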
Uses of standard deviation

• Summarizes the deviations of a large distribution


• Indicates whether the variation from mean is by
chance or real
• Helps in finding standard error
• Helps in finding the suitable size of sample

52
Standard Error

• It is not an error.
• It measures the variation between the sample estimate and the population value.

Standard error = S.D. / √n

53
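The SE = S.D./√n formula can be sketched as follows; the sample values are hypothetical, and `statistics.stdev` (the n−1 sample form, a common choice when estimating SE from a sample) is assumed here:

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample; stdev() uses the n-1 (sample) denominator
sample = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

se = stdev(sample) / sqrt(len(sample))
print(round(se, 2))  # 1.35
```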
Coefficient of variation

• Used to compare relative variability.

• C.V. = (S.D. / Mean) × 100

• The higher the C.V., the greater the variation in the series of data.

54
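The C.V. formula translates directly; a sketch with hypothetical data (population S.D. is assumed here; the sample S.D. would give a slightly larger value):

```python
from statistics import mean, pstdev

# Hypothetical series
data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]

# C.V. = (S.D. / Mean) x 100
cv = pstdev(data) / mean(data) * 100
print(round(cv, 2))  # 6.23
```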
Z Score
• Also called the standard score.
• The signed number of standard deviations by which the value of an observation lies above or below the mean of what is being observed or measured.

• Z score = (Individual value − Mean) / S.D.

55
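The Z-score formula is a one-liner; the observation, mean, and S.D. below are hypothetical illustration values:

```python
def z_score(value, mean, sd):
    """Signed number of standard deviations from the mean."""
    return (value - mean) / sd

# Hypothetical: observation 70 in a distribution with mean 65, S.D. 4.05
print(round(z_score(70, 65, 4.05), 2))  # 1.23
```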
Standard Curve

• Gaussian curve or normal curve

• It was first described by De Moivre in 1733 and subsequently by the German mathematician C. F. Gauss (1777-1855).

56
Properties:

• The curve is bell-shaped.

• The curve is symmetrical and it never touches the base line.

• The height of the curve is greatest in the middle, which coincides with the mean, median and mode.

57
Properties of normal distribution

a) Mean ± 1 S.D. covers 68.3% of the observations
b) Mean ± 2 S.D. covers 95.4% of the observations
c) Mean ± 3 S.D. covers 99.7% of the observations

• This relationship is used for fixing confidence intervals.
• The normal distribution law forms the basis for various tests of significance.

58
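The 68.3 / 95.4 / 99.7 percentages above can be verified from the standard normal CDF using `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, S.D. 1

def pct_within(k):
    """Percentage of observations within mean +/- k standard deviations."""
    return 100 * (std_normal.cdf(k) - std_normal.cdf(-k))

print(round(pct_within(1), 1))  # 68.3
print(round(pct_within(2), 1))  # 95.4
print(round(pct_within(3), 1))  # 99.7
```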
PROBABILITY

• The relative frequency or chance of occurrence of an event.

• An element of uncertainty is associated with every conclusion; this uncertainty is numerically expressed as probability.

• Probability is expressed by the symbol ‘p’ and ranges from 0 to 1.

59
Laws of probability

1) Law of addition
• If A and B are mutually exclusive events, then P(A or B) = P(A) + P(B).

2) Law of multiplication
• If A and B are independent events, then P(A and B) = P(A) × P(B).

60
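Both laws can be illustrated with exact fractions; the die and coin events below are hypothetical textbook examples, not from the slide:

```python
from fractions import Fraction

# Law of addition (mutually exclusive): P(rolling a 1 OR a 2 on one die)
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Law of multiplication (independent): P(heads on BOTH of two coin tosses)
p_both_heads = Fraction(1, 2) * Fraction(1, 2)

print(p_one_or_two)  # 1/3
print(p_both_heads)  # 1/4
```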
TESTS OF SIGNIFICANCE

• Once sample data have been gathered through an observational study or experiment, statistical inference allows analysts to assess evidence in favor of some claim about the population from which the sample has been drawn.

• The methods of inference used to support or reject claims based on sample data are known as tests of significance.

61
Null hypothesis
• Hypothesis: a tentative prediction or explanation of the relationship between two or more variables.

• Every test of significance begins with a null hypothesis, H0.

• It asserts that there is no real difference between the sample and the population in the particular matter under consideration.

• Any difference found is accidental and arises out of sampling variations.

62
• For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better on average than the current drug.

• H0: there is no difference between the two drugs on average.

63
Alternative hypothesis

• In the event of rejection of the null hypothesis, we need another hypothesis.

• For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect on average compared to that of the current drug.

• Ha: the two drugs have different effects on average.

64
• The final conclusion once the test has been carried out is
always given in terms of the null hypothesis.

• We either "reject H0 in favor of Ha" or "do not reject H0".

• We never conclude "reject Ha" or "accept Ha".

65
• If we conclude "do not reject H0", this does not mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of Ha.

• Rejecting the null hypothesis suggests that the alternative hypothesis may be true.

66
LEVEL OF SIGNIFICANCE

The level of significance is the probability of rejecting a true null hypothesis.

• If the P value is small, then the probability of attributing the difference between sample estimates to sampling fluctuations is small, and the null hypothesis is rejected.

67
CONFIDENCE LIMITS
• We set up certain limits on both sides of the population mean, on the basis of the fact that the means of samples are normally distributed around the population mean.
• These limits are called confidence limits, and the range between the two is called the confidence interval.

68
Types of errors
• TYPE I ERROR: when a true null hypothesis is rejected, it causes a Type I error.

• TYPE II ERROR: when a false null hypothesis is not rejected, it causes a Type II error.

• A Type I error is considered to be more serious than a Type II error.

Inference            Accept H0            Reject H0
Hypothesis is true   Correct decision     Type I error (α)
Hypothesis is false  Type II error (β)    Correct decision


69
DEGREE OF FREEDOM

• Degrees of freedom: the number of independent members in the sample.
• E.g. if the mean of 3 observations X, Y, Z is 5:
  mean = (X + Y + Z) / 3 = 5
• Out of these three values we can choose only 2 freely; the choice of the third depends on the fact that the total of the three values must be 15.
• So there are only 2 independent members in this sample of three.

70
Pearson coefficient of correlation

• It was developed by Karl Pearson.

• It is a measure of the linear correlation between two variables X and Y.

• It gives a value between +1 and −1:
  +1 : total positive correlation
   0 : no correlation
  −1 : total negative correlation

71
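The +1 / 0 / −1 interpretation above can be demonstrated with a small sketch of Pearson's r (covariance over the product of the deviation norms); the data points are hypothetical:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly linear relationships give r = +1 and r = -1
print(pearson_r([1, 2, 3], [2, 4, 6]))  # ~ +1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # ~ -1.0
```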
Steps involved in testing of a hypothesis:

1) State an appropriate null hypothesis for the problem.

2) Calculate the suitable test statistic using the standard error.

3) Determine the degrees of freedom for the statistic.

4) Find the probability level p corresponding to the test statistic using the relevant tables.

5) The null hypothesis is rejected if p is less than 0.05; otherwise it is not rejected.
72
The objective of using tests of significance is to compare:

• a sample mean with a population mean
• the means of two samples
• a sample proportion with a population proportion
• the proportions of two samples
• the association between two attributes

73
Parametric Tests and Non Parametric Tests:

• Statistical tests that assume a distribution and use parameters are called parametric tests.

• Statistical tests that do not assume a distribution or use parameters are called non-parametric tests.

• Means and standard deviations are called parameters.

74
Difference between parametric and non-parametric tests:

Parametric test:
• Information about the population is completely known.
• Specific assumptions are made regarding the population.
• The null hypothesis is made on parameters of the population distribution.
• The test statistic is based on the distribution.
• Applicable only to variables.
• No parametric test exists for nominal scale data.
• Powerful, if it exists.

Non-parametric test:
• No information about the population is available.
• No assumptions are made.
• The null hypothesis is free from parameters.
• The test statistic is arbitrary.
• Applied to both variables and attributes.
• Tests exist for nominal and ordinal scale data.
• Not as powerful as parametric tests.

75
Various tests of significance:

Parametric tests:
• Z test
• Student's t test
• One-way ANOVA
• Two-way ANOVA

Non-parametric tests:
• Mann-Whitney U test
• Wilcoxon rank test
• Kruskal-Wallis test
• Friedman test
• Spearman rank correlation
• Chi-square test
76
Z test (normal test):

• Used for large samples (n > 30).

Used for:
1. Comparison of a sample mean with the population mean
2. Difference between two sample proportions

• When the Z test is applied to sampling variability, the difference between the sample estimate and the population value is expressed in terms of SE instead of SD.
77
t test:
• It was first described by W. S. Gossett, whose pen name was "Student"; this test is therefore also known as Student's t test, later shortened to the t test.

• The t test compares differences between means, while the Z test compares differences between proportions.

• The t test is used for small samples (generally sample size < 30); the Z test is used for large samples.

78
CRITICAL RATIO:

Critical ratio = parameter / standard error of that parameter

• For the t test:
t = (difference between two means) / (SE of the difference between two means)

• For the Z test:
z = (difference between two proportions) / (SE of the difference between two proportions)
79
Types of t-tests

• One sample t test
• Paired t test
• Unpaired t test

80
One sample t test:
It is used to compare the mean of a single group of
observations with a specified value.

t = (sample mean − hypothesized mean) / (standard error of the sample mean)

t = (X̄ − μ) / (SD / √n)

81
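The t = (X̄ − μ)/(SD/√n) formula can be sketched as follows; the sample and the hypothesized mean μ = 60 are hypothetical illustration values:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample; H0: population mean mu = 60
sample = [56, 62, 63, 65, 65, 65, 65, 68, 70, 71]
mu = 60

t = (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))
print(round(t, 2))  # 3.7
```
The resulting t would then be compared against the critical value for n − 1 = 9 degrees of freedom.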
Paired t test

• This test is used to compare two small sets of quantitative data when the data in each sample set are related.

• Used when measurements are taken from the same subject before and after some manipulation, such as injection of a drug.

82
Unpaired t test

• Used to compare the means of unpaired data from independent observations made on two different groups or samples.

t = (difference between the means) / (standard error of the difference)

• If t < critical value, there is no significant difference between the two sets of data.

• If t > critical value, there is a significant difference between the two sets of data.

83
Analysis of Variance(ANOVA)

• Indication: when two or more groups are studied in terms of one or more factors.

• ANOVA uses the means and variances of the groups.

84
• ANOVA tests all the groups simultaneously. For example, if groups I, II, and III are compared, the possible pairwise differences are I vs II, I vs III, and II vs III.

• A significant P value indicates that a difference exists somewhere between at least two of the groups, but ANOVA itself does not identify which groups differ; post-hoc pairwise tests are needed for that.

85
• One-way ANOVA: used when the various experimental groups differ in terms of only one factor at a time.
e.g. testing the statistical significance of differences in heights of school children among three socioeconomic groups.

• Two-way ANOVA: indicated when the various experimental groups differ in terms of two factors at a time.

86
Non Parametric Tests

Spearman's rank correlation

• It measures the strength of association between two variables.

• It is the non-parametric version of the Pearson coefficient of correlation.

87
Chi square test

• It was developed by Karl Pearson in 1900.

• It is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in two independent groups.

88
Applications:

1. Test for goodness of fit
2. Test of association (independence)
3. Test of homogeneity or population variance

89
1. Test for goodness of fit

This test is used to determine how well the assumed or expected distribution fits the observed data.

• χ² = Σ (observed f − expected f)² / expected f

• If χ² (calculated) > χ² (tabulated), the null hypothesis is rejected; otherwise it is not rejected.

90
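The χ² = Σ(O − E)²/E formula can be sketched with a hypothetical fair-die example (the observed counts are invented for illustration):

```python
# Hypothetical goodness-of-fit example: a die rolled 60 times;
# expected frequency is 10 per face if the die is fair
observed = [8, 9, 13, 7, 12, 11]
expected = [10] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 1))  # 2.8

# df = 6 - 1 = 5; the tabulated value at p = 0.05 is 11.07,
# so here the null hypothesis (a fair die) is not rejected
assert chi_sq < 11.07
```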
2. Test of association (independence)

• This tests the association between two events in binomial or multinomial samples.

• Binomial examples:
Smoking and lung cancer.
Vaccination and immunity.
Weight and diabetes.

• Multinomial example:
Association between the number of cigarettes per day (up to 10, 10-20, or more than 20) and the incidence of lung cancer.
91
3. Test of homogeneity or population variance

• This test can also be used to test whether the occurrence of events follows uniformity or not.

e.g. whether the admission of patients to a hospital is uniform over all days of the week can be tested with the chi-square test.

• If χ² (calculated) < χ² (tabulated), the null hypothesis is accepted, and it can be concluded that there is uniformity in the occurrence of the events.

92
Mann- Whitney U test

• It is used to test whether two independent samples of observations are drawn from the same or identical populations.

• It is considered the non-parametric alternative to the t test.

93
 Kruskal–Wallis test

• It is used to determine if there are significant differences between two or more groups of an independent variable.

• It is considered the non-parametric alternative to the one-way ANOVA, and an extension of the Mann-Whitney U test that allows the comparison of more than two independent groups.

94
• McNemar's test: a variant of the chi-square test, used when the data are paired.

• Wilcoxon signed-rank test: compares the medians of two related (paired) samples; the non-parametric equivalent of the paired t test.

95
Parametric test / Non-parametric test / Use:

• Paired t test / Wilcoxon signed-rank test: to compare two paired samples for equality of means/medians.

• Two-sample t test / Mann-Whitney U test (Wilcoxon rank-sum test): to compare two independent samples for equality of means/medians.

• Analysis of variance / Kruskal-Wallis test: to compare more than two samples for equality of means/medians.

• — / χ² analysis: to compare nominal data, i.e. two or more samples for equivalence in proportion.
96
SPSS Statistics:
• It is a software package used for statistical analysis.

• It is now officially named "IBM SPSS Statistics".

• SPSS (originally Statistical Package for the Social Sciences, later modified to read Statistical Product and Service Solutions) was released in its first version in 1968, after being developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull.

• SPSS is among the most widely used programs for statistical analysis in social science.

97
Conclusion:
• Advancing technology has enabled us to collect and safeguard a wide variety of data with minimal effort, from patients' demographic information to treatment regimens.

• It is difficult to make sense of this confusing and chaotic array of raw data by visual inspection alone. The data must be processed in meaningful and systematic ways to uncover the hidden clues.

• The methods of statistical analysis are powerful tools for drawing the conclusions that are eventually applied to diagnosis, prognosis and treatment planning for patients.

98
References:
• Biostatistics for Oral Healthcare – Jay S. Kim, Ronald J. Dailey

• Essentials of Public Health Dentistry – Soben Peter

• Park's Textbook of Preventive and Social Medicine

99
