
13/3/2023

Workshop on
Introduction to
Statistics in Dentistry
using SPSS

Dr. Muhammad Khan Asif


BDS (PAK), CHPE (PAK), MDSc
(Malaysia), PhD (Malaysia).
Chairman Shifa College of Dentistry
Research Board, Assistant Professor,
Head of Research & Development and
Forensic Odontology Department,
Shifa College of Dentistry, Islamabad.


What is Statistics?

• Statistics is the science of learning from data.

• As Dr. Diego Kuonen (from Statoo Consulting, Switzerland) once remarked, “Statistics is concerned with one of the most basic human needs: the need to know about the world and how it operates in the face of variation and uncertainty.” We can say the whole field of statistics revolves around one concept: variation.

Population and Sample

• Population: A set of things or objects in which we have an interest at a particular time.
Examples: Workers at a factory, students in a college, in-patients at a hospital.
• Sample: A subset of the population.
Examples: A group of workers at the factory, a selection of students from the college.

Sampling Methods

• Types:
• 1. Probability Sampling: Members are selected at random, so every member of the population has a known, non-zero chance of being selected. Probability sampling is the preferred choice when the results must be representative of the whole population.

• 2. Non-Probability Sampling: Individuals are selected based on non-random criteria, and not every individual has a chance of being included in the study. This type of sample is easier and cheaper to obtain, but it carries a higher risk of sampling bias.
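
As a rough illustration of probability sampling outside SPSS, the sketch below draws a simple random sample in Python. The sampling frame (200 student ID numbers), the sample size and the helper name `simple_random_sample` are assumptions made for this example only.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical sampling frame: ID numbers of 200 students in a college.
students = list(range(1, 201))
sample = simple_random_sample(students, 20, seed=1)
print(sorted(sample))
```

A convenience sample (for example, the first 20 students who walk into the clinic) would be a non-probability sample, because students who never visit the clinic have no chance of selection.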


Variable
A variable is a characteristic of things or objects that takes different values in the different items tested.
There are two types of variables:
• Quantitative variable: A measurable characteristic, such as height, weight, age or exam marks, or a count, such as the number of passes, number of students or number of accidents.
• Qualitative variable: A characteristic that cannot be measured or counted but only categorized, such as race, sex, colour, exam grades or blood group.

Data
• Raw material of statistics
Types of data
• Quantitative data can be classified into discrete data and continuous data.
1. Discrete data are numerical characteristics that are countable (whole numbers).
Examples: Number of males and number of females, number of patients waiting for surgery, number of students sitting for an exam.
2. Continuous data are numerical characteristics that are measurable.
Examples: Marks obtained by students in an exam, body mass index (BMI) of patients, time taken by athletes to complete a road race.

Data
• Qualitative data can be classified further into nominal data and ordinal data.
1. Nominal data are categorical characteristics that can be named but not ranked.
Examples: Gender: male or female, based on physical traits. Blood group: A, B, AB or O, based on allele types. Group A is not better than group B; these are just names given based on particular characteristics.
2. Ordinal data are categorical characteristics that can be named and ranked as well.
Examples: Socio-economic status: low, middle or high. Exam grades: A, B, C, D or E, based on level of achievement. Here, grade A is better than grade B, and so on.

DESCRIPTIVE STATISTICS

Distribution
• Skewness: Measures the asymmetry of a distribution.
• As a rule of thumb, if the skewness value is within ±1, the distribution can be treated as approximately symmetrical.
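
The ±1 rule of thumb can be tried out with a short Python sketch. It uses the adjusted (sample) Fisher-Pearson skewness coefficient, which is assumed here to correspond to the value SPSS reports; the two small data sets and the function names are made up for illustration.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 values are needed")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)


def roughly_symmetric(values, limit=1.0):
    """Apply the rule of thumb: skewness within +/- limit."""
    return abs(sample_skewness(values)) <= limit


symmetric = [1, 2, 3, 4, 5]          # hypothetical, perfectly symmetric
right_skewed = [1, 1, 2, 2, 3, 15]   # hypothetical, one large value pulls the tail right

print(sample_skewness(symmetric), roughly_symmetric(symmetric))
print(sample_skewness(right_skewed), roughly_symmetric(right_skewed))
```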

Test of normality
• The Shapiro-Wilk test is usually used when the sample size is small, generally less than 50. The Kolmogorov-Smirnov test can be used when the sample size is large.

• In both tests, if the p-value is more than 0.05, normality can be assumed. In this example, since the sample size is 36, the Shapiro-Wilk test is used.
• The p-value of the test is more than 0.05. Hence, the data can be assumed to be normally distributed. The normality assumption is the foundation of many statistical tests.
• A number of tests require the raw data to be normally distributed. Other tests require a derived variable (for example, the difference scores in a paired test) to be normally distributed. When the normality assumption is not met, the researcher has to turn to the alternative, which is to use nonparametric methods.

Example 1: Body Mass Index (Dataset 1)


DESCRIPTIVE STATISTICS: Example 1 BMI

• Interpretation:
The mean BMI for the 36 subjects is 28.4 with a standard deviation of 5.3 (usually written as 28.4 ± 5.3). The maximum and minimum BMI values are 37.8 and 17.2, so the range is 20.6. The median BMI is 28.0, which indicates that at least 50% of the respondents have a BMI of 28.0 or more. The skewness value is -0.131, which is within ±1. Hence, the data can be assumed to be symmetrical. The p-value for the Shapiro-Wilk test is greater than 0.05, so the data can be assumed to be normally distributed and the assumption is met.


Example 2: To examine Triglyceride (TG)
• Interpretation:
The mean TG is 2.78 ± 1.77. The maximum and minimum values are 9.00 and 1.00, so the range is 8.00. The median value is 2.19, indicating that at least 50% of the respondents have a TG of 2.19 or more. The skewness value is 1.971, which is more than 1. Hence, the data is not symmetrical. The p-value for the Shapiro-Wilk test is less than 0.05, so the data cannot be assumed to be normally distributed and the assumption is not met.
There is one extreme value (*), case number 34, with a value of approximately 9. There are two outliers (o), cases 9 and 2.

To Obtain Descriptive Statistics for Qualitative Data (for Physical Activity, PA)

Additional Exercise (Dataset 1): Investigate and interpret the descriptive statistics (Explore) of diastolic blood pressure
• 94, 102, 106, 97, 101, 98, 95, 105, 112, 99, 97, 104, 109, 100, 111, 93, 100, 93, 116, 83, 91, 84, 105, 111, 110, 90, 122, 86, 111, 93, 112, 88, 88, 95, 124, 95
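
For participants who want to cross-check their SPSS Explore output, the following is a minimal Python sketch of the same descriptive statistics for the diastolic blood pressure values listed above. It covers only the basic summary measures, not the normality tests or plots, and the variable names are chosen for this example.

```python
import statistics

# Diastolic blood pressure values from the exercise above.
dbp = [94, 102, 106, 97, 101, 98, 95, 105, 112, 99, 97, 104, 109, 100, 111,
       93, 100, 93, 116, 83, 91, 84, 105, 111, 110, 90, 122, 86, 111, 93,
       112, 88, 88, 95, 124, 95]

summary = {
    "n": len(dbp),
    "mean": statistics.fmean(dbp),
    "sd": statistics.stdev(dbp),        # sample standard deviation
    "median": statistics.median(dbp),
    "minimum": min(dbp),
    "maximum": max(dbp),
    "range": max(dbp) - min(dbp),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```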


ONE SAMPLE T-TEST

The objective is to test if the population mean is equal to a hypothesized value.

• Example 1: To test if mean BMI in the population is 26 kg/m²

Assumptions
• The main assumptions are that the test variable is normally distributed in the population and that the cases in the sample represent a random sample from the population. In most circumstances, with a large sample size of more than 30, the test yields relatively valid results even if the population is substantially non-normal.

ONE SAMPLE T-TEST Example 1 (Dataset 1): To test if mean BMI in the population is 26 kg/m²

Findings
1. The mean difference is 2.4.
2. The p-value of the test is 0.010, which is less than 0.05.
3. The 95% CI for mean difference is [0.6, 4.2], which does not include zero.
4. The 95% CI for population mean = [(0.6+26), (4.2+26)] = [26.6, 30.2].
Conclusion: The p-value of the test is less than 0.05. Thus, the mean BMI in the population is significantly different from 26 kg/m². We are 95% confident that the mean BMI in the population is between 26.6 and 30.2 kg/m².
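
The findings above can be reproduced approximately from the summary figures reported earlier for BMI (mean 28.4, standard deviation 5.3, n = 36). The sketch below is an illustration only: the function name is hypothetical, and the critical value 2.030 is taken from a t-table (two-sided, 5%, 35 degrees of freedom) rather than computed.

```python
import math


def one_sample_t_summary(mean, sd, n, test_value, t_crit):
    """Return the t statistic, the CI for the mean difference and the CI for the mean."""
    se = sd / math.sqrt(n)                  # standard error of the mean
    diff = mean - test_value                # mean difference
    t_stat = diff / se
    ci_diff = (diff - t_crit * se, diff + t_crit * se)
    # Adding the test value back gives the CI for the population mean.
    ci_mean = (ci_diff[0] + test_value, ci_diff[1] + test_value)
    return t_stat, ci_diff, ci_mean


T_CRIT_DF35 = 2.030  # assumption: tabulated t value for df = 35, two-sided 5%

t_stat, ci_diff, ci_mean = one_sample_t_summary(28.4, 5.3, 36, 26, T_CRIT_DF35)
print(f"t = {t_stat:.3f}")
print(f"95% CI for mean difference: [{ci_diff[0]:.1f}, {ci_diff[1]:.1f}]")
print(f"95% CI for population mean: [{ci_mean[0]:.1f}, {ci_mean[1]:.1f}]")
```

Because the inputs are rounded summary values, small differences from the SPSS output in the last decimal place are to be expected.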


ONE SAMPLE T-TEST Example 2 (Dataset 1): To test if mean SBP in the population is 150 mmHg

Findings
1. The mean difference is -1.3.
2. The p-value of the test is 0.621, which is more than 0.05.
3. The 95% CI for mean difference is [-6.8, 4.1], which includes zero.
4. The 95% CI for population mean = [(-6.8+150), (4.1+150)] = [143.2, 154.1].
Conclusion: The p-value of the test is more than 0.05. Thus, there is no significant evidence that the mean SBP in the population differs from 150 mmHg. We are 95% confident that the mean SBP in the population is between 143 and 154 mmHg.

Additional Exercise: Enter the following data set in SPSS and perform the test.
• 94, 102, 106, 97, 101, 98, 95, 105, 112, 99, 97, 104, 109, 100, 111, 93, 100, 93, 116, 83, 91, 84, 105, 111, 110, 90, 122, 86, 111, 93, 112, 88, 88, 95, 124, 95

• Test if the mean diastolic blood pressure is equal to 100. Run the test and interpret it.

PAIRED SAMPLE TESTS

The Paired Samples T-Test can be used to test if there is a difference in a measured characteristic between two time points.

Assumptions
The difference scores must be normally distributed. If this assumption is not met, a nonparametric test must be used.


Assumption test (Data set 2): The difference scores must be normally distributed.

The p-value of the normality test is 0.235, which is greater than 0.05. Hence, the assumption that the difference scores are normally distributed is met, and the parametric procedure (the Paired Samples T-Test) can be used.


Paired Sample Tests Data Set 2 (Example 1): To test if the drug is effective for the management of hypertension.

Findings
1. The mean difference is 5.1 ± 2.7.
2. The p-value of the test is less than 0.001.
3. The 95% CI for mean difference is [4.0, 6.2], which does not include zero.
Conclusion: The p-value of the test is less than 0.05. Thus, there is a significant change in mean SBP. Since the mean after is less than the mean before, the drug is effective. We are 95% confident that the reduction in mean SBP is between 4.0 mmHg and 6.2 mmHg.


Paired Sample Tests Data Set 2 (Example 2): To test if the drug is effective for the management of hypertension (diastolic BP).
• Test of Normality:

The p-value of the test is 0.007, which is less than 0.05. Hence, the assumption that the difference scores are normally distributed is not met, and the parametric procedure is not valid. A nonparametric test based on ranking must then be used to test if there is a difference.


Paired Sample Tests Data Set 2 (Example 2): Nonparametric Wilcoxon Signed-Rank test. To Obtain a Nonparametric Paired-Sample Test

Out of the 24 subjects, 19 recorded lower DBP and 5 recorded higher DBP.

The p-value of the test is 0.004, which is less than 0.05. Overall, there is a change in DBP.

Conclusion: There is a significant difference in DBP. Since most subjects (19 of 24) recorded lower DBP, the drug is effective.
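
To show what "based on ranking" means, here is a minimal Python sketch of the rank sums behind the Wilcoxon signed-rank test. The before/after DBP readings are hypothetical (they are not Data Set 2), the function name is made up, and the sketch stops at the rank sums; it does not compute the p-value that SPSS reports.

```python
def signed_rank_sums(before, after):
    """Return (W+, W-): rank sums of the positive and negative differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences are dropped
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Tied absolute differences share the average of ranks i+1 .. j.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


# Hypothetical DBP readings (mmHg) for 8 subjects.
before = [90, 88, 95, 100, 92, 85, 98, 96]
after = [85, 85, 97, 92, 91, 89, 92, 89]

n_lower = sum(a < b for b, a in zip(before, after))
n_higher = sum(a > b for b, a in zip(before, after))
w_plus, w_minus = signed_rank_sums(before, after)
print(f"lower: {n_lower}, higher: {n_higher}, W+ = {w_plus}, W- = {w_minus}")
```

A large imbalance between the two rank sums is what leads to a small p-value in the test.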


Paired samples test


Additional Exercise 1: The following are the times (in min) taken to complete a task before and after a new training program.

• Before the training program: 12, 15, 16, 15, 13, 14, 15, 12, 18, 19
• After the training program: 11, 14, 12, 14, 10, 12, 13, 11, 16, 17

• Test if the training program is effective.
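
As a cross-check for the SPSS output, the paired-samples calculation for this exercise can be sketched in Python as follows. The normality of the difference scores should still be checked first, as described earlier; the critical value 2.262 is an assumption taken from a t-table (two-sided, 5%, 9 degrees of freedom).

```python
import math
import statistics

before = [12, 15, 16, 15, 13, 14, 15, 12, 18, 19]
after = [11, 14, 12, 14, 10, 12, 13, 11, 16, 17]

# Difference scores: positive values mean the task took less time after training.
diffs = [b - a for b, a in zip(before, after)]

mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)
se = sd_diff / math.sqrt(len(diffs))   # standard error of the mean difference
t_stat = mean_diff / se
df = len(diffs) - 1

T_CRIT_DF9 = 2.262  # assumption: tabulated t value for df = 9, two-sided 5%
ci = (mean_diff - T_CRIT_DF9 * se, mean_diff + T_CRIT_DF9 * se)

print(f"mean difference = {mean_diff:.2f} ± {sd_diff:.2f}")
print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI for mean difference: [{ci[0]:.2f}, {ci[1]:.2f}]")
```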


Independent Samples T-Test

The Independent Samples T-Test can be used to test if there is a difference in a measured characteristic between two groups of cases.
The objective is to test if there is a difference in means between the 2 populations.

Assumptions: The variances in the two groups must be similar, a condition known as homogeneity of variance. In SPSS, Levene’s test is used to test if this assumption is met.

Independent Samples T-Test Data set 1 (Example 1): To test if there is a difference in mean BMI between males and females

Findings
1. The p-value for Levene’s test for equality of variance is 0.883. Since the p-value is more than 0.05, equality of variances is assumed.
2. The mean difference is 7.9.
3. The p-value of the test is less than 0.001.
4. The 95% CI for mean difference is [5.6, 10.3], which does not include zero.
Conclusion: The p-value of the test is less than 0.05. Thus, there is a significant difference in mean BMI between the males and females. The mean BMI among the females is higher compared to the males. We are 95% confident that the difference is between 5.6 and 10.3 kg/m².


Independent Samples T-Test (Nonparametric test Example)

Independent Samples T-Test (Task 1)

Age of Facebook users:
33, 31, 44, 35, 45, 42, 40, 50, 46, 42, 38, 33, 43, 49, 46, 50, 40, 41, 37, 52
Age of non-Facebook users:
46, 49, 39, 42, 43, 42, 43, 37, 50, 36, 40, 32, 40, 42, 36, 42

Test if there is a difference in mean age between Facebook users and non-users.
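
The equal-variances form of the independent-samples t statistic can be sketched in Python for this task. This is an illustration of the arithmetic only, with a hypothetical function name: it assumes homogeneity of variance (which Levene’s test in SPSS should confirm) and leaves the p-value to SPSS or a t-table.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Independent-samples t statistic with a pooled variance (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return (m1 - m2) / se, n1 + n2 - 2


facebook = [33, 31, 44, 35, 45, 42, 40, 50, 46, 42, 38, 33, 43, 49, 46,
            50, 40, 41, 37, 52]
non_facebook = [46, 49, 39, 42, 43, 42, 43, 37, 50, 36, 40, 32, 40, 42, 36, 42]

t_stat, df = pooled_t(facebook, non_facebook)
print(f"mean difference = {statistics.fmean(facebook) - statistics.fmean(non_facebook):.4f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t value is then compared with the tabulated critical value for the given degrees of freedom, or read together with the p-value in the SPSS output.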


ONE-WAY ANOVA
The one-way ANOVA can be used to test if there is a difference in a measured characteristic between more than two groups of cases.
The objective is to test if there is a difference in means between more than 2 population groups.
Example:

Group 1: Doctors
Group 2: Attendants
Group 3: Nurses

Assumptions: The variances within the levels must be similar, a condition known as homogeneity of variance. Also, the distribution of the data must be at least fairly normal. In SPSS, Levene’s test is used to test if the homogeneity assumption is met.
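
The F statistic that the one-way ANOVA reports is the ratio of the between-group to the within-group mean square. The Python sketch below illustrates that calculation; the BMI values for the three job categories are hypothetical and are not taken from Dataset 1, and the function name is chosen for this example.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = statistics.fmean(all_values)
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical BMI values (kg/m²) for three job categories.
doctors = [24.1, 25.3, 26.0, 23.8]
attendants = [25.0, 26.2, 24.7, 25.9]
nurses = [29.5, 30.1, 28.7, 31.0]

f_stat, df_between, df_within = one_way_anova_f([doctors, attendants, nurses])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```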


ONE-WAY ANOVA Example 1 (Dataset 1): To test if mean BMI differs between three job categories.

Findings
1. The p-value for Levene’s test for equality of variance is 0.384, which is more than 0.05. Thus, the equality of variances assumption is met.
2. The p-value of the test is 0.028, which is less than 0.05.
Conclusion: At least one pair of means differs significantly.
When there is a difference, there is a need to identify the pair(s) that differ significantly. This is done using post hoc tests.

In this example, since equality of variances can be assumed, the Tukey procedure is chosen. When the variances are not similar, Tamhane’s T2, Dunnett’s T3, Dunnett’s C or the Games-Howell procedure is used instead.

• The p-values for the mean differences are given in the column ‘Sig.’

Conclusion: The mean BMI among nurses is significantly higher than that of attendants and doctors. However, there is no significant difference in mean BMI between doctors and attendants.

ONE-WAY ANOVA (Dataset 1): To test if mean DBP differs between three job categories

The p-value for Levene’s test for equality of variance is 0.035, which is less than 0.05. Thus, the equality of variances assumption is not met, and a nonparametric test should be used.


ONE-WAY ANOVA Example 2: Nonparametric test:
The p-value of this test is 0.020, which is less than 0.05. Thus, DBP differs between at least one pair of categories. Based on the mean ranks, DBP among nurses is higher compared to attendants and doctors. In this example, further investigation can be made by performing a post hoc test under the condition of equality of variances not assumed (for example, the Dunnett T3 procedure).
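
The slide text does not name the nonparametric test; the sketch below assumes it is the Kruskal-Wallis test, the usual rank-based alternative to one-way ANOVA and one that reports mean ranks. The DBP readings are hypothetical, the function name is made up, and the H statistic is computed without the correction for ties, so it may differ slightly from SPSS output when tied values are present.

```python
def kruskal_wallis_h(groups):
    """Return (H, mean_ranks) using average ranks for ties (no tie correction of H)."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    h = (12 / (n_total * (n_total + 1))
         * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n_total + 1))
    mean_ranks = [r / len(g) for r, g in zip(rank_sums, groups)]
    return h, mean_ranks


# Hypothetical DBP readings (mmHg) for three job categories.
doctors = [78, 80, 82, 85, 79]
attendants = [76, 81, 83, 84, 80]
nurses = [88, 90, 86, 92, 95]

h_stat, mean_ranks = kruskal_wallis_h([doctors, attendants, nurses])
print(f"H = {h_stat:.2f}, mean ranks = {mean_ranks}")
```

The group with the highest mean rank tends to have the highest values, which is how the mean ranks are read in the interpretation above.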


ONE-WAY ANOVA Example 2: Nonparametric test (post hoc analysis):

Conclusion: There is a significant difference in DBP between attendants and nurses.

ONE-WAY ANOVA (Additional Exercise 1)

Test if mean BP differs between doctors, nurses and teachers.

One-Way ANOVA: Flow Diagram


ONE-WAY ANOVA (Additional Exercise 2)

Test if the marks obtained differ between the 3 faculties.

Exercise: Choosing the Right Statistical Test
1. Test if the mean fasting glucose level is 5.6 mmol/l.

2. Test if there is a significant difference in the mean weight change between dentists and students.

3. Test if there is a difference in pH levels between morning and afternoon among 3rd year BDS students of the SCD.

4. Test if the distance travelled (in km) on 50 liters of petrol, over the same course, differs between four different makes of “gasoline-saving” cars.
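
The choice of test covered in this workshop can be summarised as a small lookup, sketched below in Python. The design labels and the function name are invented for this example. The nonparametric alternatives for one sample and for two independent groups (one-sample Wilcoxon signed-rank and Mann-Whitney U) and the name Kruskal-Wallis are the usual choices but are not named in the slide text, so they should be read as assumptions.

```python
PARAMETRIC = {
    "one_sample": "One Sample T-Test",
    "paired": "Paired Samples T-Test",
    "two_independent": "Independent Samples T-Test",
    "three_or_more_independent": "One-Way ANOVA",
}

# Assumption: usual rank-based alternatives; only the Wilcoxon signed-rank
# test is named explicitly in the workshop slides.
NONPARAMETRIC = {
    "one_sample": "One-sample Wilcoxon signed-rank test",
    "paired": "Wilcoxon Signed-Rank test",
    "two_independent": "Mann-Whitney U test",
    "three_or_more_independent": "Kruskal-Wallis test",
}


def choose_test(design, assumptions_met=True):
    """Pick a test from the study design and whether the test's assumptions hold."""
    table = PARAMETRIC if assumptions_met else NONPARAMETRIC
    return table[design]


print(choose_test("one_sample"))
print(choose_test("paired", assumptions_met=False))
```

One reading of the four scenarios above is: one group against a fixed value, two independent groups, the same subjects measured twice, and four independent groups.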


Thank you

Contact details: WhatsApp 03079314642


Email: muhammad.khan.scd@stmu.edu.pk /
dr.muhammad.khan@hotmail.com
