
Inferential Statistics Overview

Jose Jurel M. Nuevo, RMT, MA CHEM, MSMT, FRIMTECH, PhD, DrPH


INFERENTIAL STATISTICS
Inferential statistics can be used to test theories, determine
associations between variables, and determine whether findings
are significant and whether we can generalize from our sample
to the entire population

The types of inferential statistics we will go over:

• Correlation
• T-tests/ANOVA
• Chi-square
• Logistic Regression
Type of Data & Analysis
• Analysis of Continuous Data
– Correlation
– T-tests

• Analysis of Categorical/Nominal Data
– Chi-square
– Logistic Regression
Measures of Association
• Parametric Measures of Association – These
answer the question, “within a given population,
is there a relationship between one variable and
another variable?” A measure of association can
exist only if data can be logically paired. It can be
tested for significance.
– Correlation – answers the question, “What is the
degree of relationship between x and y?” – use the
Pearson Product Moment Correlation (Pearson r) – see
next slide
Correlation
• When to use it?
– When you want to know about the association or relationship
between two continuous variables
• Ex) food intake and weight; drug dosage and blood pressure; air temperature and
metabolic rate, etc.

• What does it tell you?
– If a linear relationship exists between two variables, and how strong that
relationship is

• What do the results look like?
– The correlation coefficient = Pearson’s r
– Ranges from -1 to +1
– See next slide for examples of correlation results
Correlation
Guide for interpreting strength of correlations:
 0 – 0.25 = Little or no relationship
 0.25 – 0.50 = Fair degree of relationship
 0.50 – 0.75 = Moderate degree of relationship
 0.75 – 1.0 = Strong relationship
 1.0 = Perfect correlation

Correlation
• How do you interpret it?
– If r is positive, high values of one variable are associated with high values of the other
variable (both go in SAME direction - ↑↑ OR ↓↓)
• Ex) Diastolic blood pressure tends to rise with age, thus the two variables are positively correlated

– If r is negative, low values of one variable are associated with high values of the other
variable (opposite direction - ↑↓ OR ↓ ↑)
• Ex) Heart rate tends to be lower in persons who exercise frequently, thus the two variables
are negatively correlated

– Correlation of 0 indicates NO linear relationship

• How do you report it?
– “Diastolic blood pressure was positively correlated with age (r = .75, p < .05).”

Tip: Correlation does NOT equal causation!!! Just because two variables are highly correlated, this does NOT mean that
one CAUSES the other!!!
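
To make the computation and reporting concrete, here is a minimal sketch in Python using SciPy’s pearsonr; the age and diastolic-blood-pressure values are invented purely for illustration:

```python
# A minimal sketch with SciPy; the age and DBP values are hypothetical.
from scipy import stats

age = [45, 52, 38, 61, 49, 57, 43, 66]   # years (hypothetical)
dbp = [78, 85, 72, 91, 80, 88, 75, 95]   # diastolic BP, mmHg (hypothetical)

r, p = stats.pearsonr(age, dbp)
print(f"r = {r:.2f}, p = {p:.4f}")  # report as: "DBP was positively correlated with age (r = ..., p ...)"
```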
Measures of Association
• Non-parametric tests for association
– Correlation
• The Spearman Rank Order Correlation (rs) – “To what
extent and how strongly are two variables related?”
• Phi coefficient – can be used with nominal
(dichotomous) data
• Kendall’s Q – can be used with nominal data
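
A minimal sketch of the Spearman rank-order correlation with SciPy; the ordinal scores below are hypothetical:

```python
# A minimal sketch with SciPy; the ordinal ratings are hypothetical.
from scipy import stats

pain_score   = [2, 3, 1, 5, 4, 7, 6, 8]   # ordinal ratings (hypothetical)
satisfaction = [7, 6, 8, 3, 5, 2, 4, 1]   # ordinal ratings (hypothetical)

rs, p = stats.spearmanr(pain_score, satisfaction)
print(f"rs = {rs:.2f}, p = {p:.4f}")
```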
Tests of Significance
• Non-parametric tests of significance – small
numbers, can’t assume a normal distribution, or
measurement not interval
– Chi-square – requires only nominal data – allows the
researcher to determine whether frequencies that have
been obtained in research differ from those that would
have been expected – uses the χ² sampling distribution
Chi-square
• When to use it?
– When you want to know if there is an association between two
categorical (nominal) variables (i.e., between an exposure and
outcome)
• Ex) Smoking (yes/no) and lung cancer (yes/no)
• Ex) Obesity (yes/no) and diabetes (yes/no)

• What does a chi-square test tell you?
– If the observed frequencies of occurrence in each group are
significantly different from expected frequencies (i.e., a
difference of proportions)
Chi-square
• What do the results look like?
– Chi-square test statistic = χ²

• How do you interpret it?
– Usually, the higher the chi-square statistic, the
greater the likelihood that the finding is significant, but you
must look at the corresponding p-value to
determine significance
Tip: Chi square requires that there be 5 or more in each cell of a 2x2 table and 5 or more in 80% of
cells in larger tables. No cells can have a zero count.
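
A minimal sketch with SciPy’s chi2_contingency; the 2x2 smoking/lung-cancer counts are hypothetical. Note that the function also returns the expected counts, which you can use to check the 5-per-cell rule in the tip above:

```python
# A minimal sketch with SciPy; the 2x2 counts are hypothetical.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 70],    # smokers:     cancer / no cancer
                  [10, 90]])   # non-smokers: cancer / no cancer

chi2, p, dof, expected = chi2_contingency(table)
print(f"X2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
print("Expected counts:\n", expected)   # check the 5-per-cell rule from the tip
```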
How do you report chi-square?
“248 (56.4%) of women and 52
(16.6%) of men had abdominal
obesity (Fig-2). The Chi square
test shows that these differences
are statistically significant
(p<0.001).”

“Distribution of obesity by gender showed
that 171 (38.9%) and 75 (17%) of women
were overweight and obese (Type I & II),
respectively, whilst 118 (37.3%) and 12 (3.8%)
of men were overweight and obese (Type I &
II), respectively (Table-II). The Chi square
test shows that these differences are
statistically significant (p<0.001).”
Tests of Significance
– Mann-Whitney U – an alternative to the
independent t-test – must have at least ordinal
data. It compares the ranks of scores in
two samples (from highest to lowest). The null
hypothesis is that the two samples are randomly
distributed. Use U sampling distribution tables for
small sample sizes (1-8) and medium sample sizes
(9-20), and the Z test for large samples
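
A minimal sketch of the Mann-Whitney U test with SciPy; the two independent groups of ordinal scores are hypothetical:

```python
# A minimal sketch with SciPy; the two independent groups are hypothetical.
from scipy.stats import mannwhitneyu

group_a = [11, 12, 9, 8, 11, 14, 10, 7]
group_b = [14, 13, 12, 16, 14, 15, 19, 20]

u, p = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```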
Tests of Significance
– Wilcoxon Matched Pairs (signed rank test) – an
alternative to the paired t-test. It is used for
repeated measures on the same individual. It
requires measurement between the ordinal and
interval scales – the scores must have some real
meaning. Use a T table. If the calculated T is less than or
equal to the T in the table, the null hypothesis is
rejected.
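
A minimal sketch of the Wilcoxon signed-rank test with SciPy; the paired before/after measurements on the same subjects are hypothetical:

```python
# A minimal sketch with SciPy; the paired before/after scores are hypothetical.
from scipy.stats import wilcoxon

before = [140, 132, 128, 150, 145, 138, 142, 136]
after  = [135, 130, 126, 143, 140, 135, 139, 131]

t_stat, p = wilcoxon(before, after)
print(f"T = {t_stat:.1f}, p = {p:.4f}")
```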
Prediction
• Parametric Prediction – using a correlation, if you
know score “x”, you can predict score “y” for one
person – Use regression analysis
– Simple linear regression – allows the prediction from one
variable to another – you must have at least interval level
data
– Multiple linear regression – this allows the prediction of
one variable from several other variables. The dependent
variable must be on the interval scale
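
A minimal sketch of simple linear regression with SciPy’s linregress, predicting score y from score x; the dose/response numbers are hypothetical:

```python
# A minimal sketch with SciPy; the dose/response values are hypothetical.
from scipy.stats import linregress

dose    = [10, 20, 30, 40, 50, 60]   # x (hypothetical)
bp_drop = [4, 9, 13, 18, 24, 27]     # y (hypothetical)

fit = linregress(dose, bp_drop)
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x, r^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.4f}")
print("Predicted y at x = 35:", fit.intercept + fit.slope * 35)
```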
Logistic Regression
• When to use it?
– When you want to measure the strength and direction of the
association between two variables, where the dependent or
outcome variable is categorical (e.g., yes/no)
– When you want to predict the likelihood of an outcome while
controlling for confounders
• Ex) examine the relationship between health behavior (smoking, exercise,
low-fat diet) and arthritis (arthritis vs. no arthritis)
• Ex) Predict the probability of stroke in relation to gender while controlling for
age or hypertension

• What does it tell you?
– The odds of an event occurring: the probability of the outcome
event occurring divided by the probability of it not occurring
Logistic Regression
• What do the results look like?
– Odds Ratios (OR) & 95% Confidence Intervals (CI)

• How do you interpret the results?
– Significance can be inferred by looking at the confidence intervals:
• If the confidence interval does not cross 1 (e.g., 0.04 – 0.08 or 1.50 – 3.49), then the result is
significant

– If OR > 1 → The outcome is that many times MORE likely to occur
• The independent variable may be a RISK FACTOR
• 1.50 = 50% more likely to experience the event, or 50% more at risk
• 2.0 = twice as likely
• 1.33 = 33% more likely

– If OR < 1 → The outcome is that many times LESS likely to occur
• The independent variable may be a PROTECTIVE FACTOR
• 0.50 = 50% less likely to experience the event
• 0.75 = 25% less likely
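
A minimal sketch with statsmodels showing how the ORs and 95% CIs above fall out of a logistic regression; the smoker/age/stroke data are simulated under an assumed model, purely for illustration:

```python
# A minimal sketch with statsmodels; the data are simulated from a hypothetical model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
smoker = rng.integers(0, 2, n)                      # exposure (hypothetical)
age = rng.normal(55, 10, n)                         # confounder (hypothetical)
logit = -6 + 0.8 * smoker + 0.08 * age              # assumed true model
stroke = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # outcome

X = sm.add_constant(np.column_stack([smoker, age]))
fit = sm.Logit(stroke, X).fit(disp=0)

print("OR:", np.exp(fit.params))             # odds ratios for const, smoker, age
print("95% CI:\n", np.exp(fit.conf_int()))   # significant if the interval excludes 1
```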
Prediction
– Non-parametric Prediction – measures the extent to which
you can reduce the error in predicting the dependent
variable as a consequence of having some knowledge of
the independent variable, such as predicting income [DV]
by education [IV]
• Kendall’s Tau – used with ordinal data and ranking - is better than
the Gamma because it takes ties into account
• Gamma - used with ordinal data to predict the rank of one
variable by knowing rank on another variable
• Lambda – can be used with nominal data – knowledge of the IV
allows one to make a better prediction of the DV than if you had
no knowledge at all
Parametric Multiple Comparisons
• The analysis of variance (ANOVA) is probably the
most commonly encountered multiple comparison
test.
• It compares observed values with expected values
in trying to discover whether the means of several
populations are equal. It compares two estimates of
the population variance. One estimate is based on
variance within each sample – within groups.
• The other is based on variation across samples –
between groups. The between group variance is
the explained variance (due to the treatment) and
the variation within each group is the unexplained
variance (the error variance).
Parametric Multiple Comparisons
– ANOVA cont. The ratio of the explained scores to
the unexplained scores gives the F statistic. If the
variance between the groups is larger, giving an F
ratio greater than 1, it may be significant
depending upon the degrees of freedom. If the F
ratio is approximately 1, it means that the null
hypothesis is supported and there was no
significant difference between the groups.
Parametric Multiple Comparisons
– ANOVA cont. If the null hypothesis is rejected,
then one would be interested in determining
which groups showed a significant difference.
– The best way to check this is to conduct a post hoc
test such as the Tukey, Bonferroni, or Scheffe.
(SPSS will do this for you if you click on Post-hoc
and check the test desired. Check Descriptives
while you are still in ANOVA, and the program will also
give you the mean for each group.)
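
A minimal sketch mirroring this workflow in Python: a one-way ANOVA with SciPy, then a Tukey HSD post hoc with statsmodels; the three groups of scores are hypothetical:

```python
# A minimal sketch: one-way ANOVA (SciPy), then Tukey HSD post hoc (statsmodels).
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

g1 = [23, 25, 21, 24, 26]   # hypothetical group scores
g2 = [30, 32, 29, 31, 33]
g3 = [22, 24, 23, 21, 25]

F, p = f_oneway(g1, g2, g3)
print(f"F = {F:.2f}, p = {p:.4f}")

# If the null is rejected, ask which groups differ:
scores = np.concatenate([g1, g2, g3])
groups = ["g1"] * 5 + ["g2"] * 5 + ["g3"] * 5
print(pairwise_tukeyhsd(scores, groups))
```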
Testing Hypothesis
• Type I error -- rejecting Ho when it was true (it
should have been accepted)
– equal to alpha (α)
– if α = .05, then there’s a 5% chance of Type I
error
• Type II error -- accepting Ho when it should have
been rejected
– If you increase α, you will decrease the chance of
Type II error
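
One way to see that the Type I error rate equals alpha is a small simulation: draw both samples from the same population (so Ho is true) and count how often a t-test rejects at α = .05. A minimal sketch in Python:

```python
# A minimal sketch: under a true null, the t-test's rejection rate approximates alpha.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
alpha, trials, rejections = 0.05, 5000, 0
for _ in range(trials):
    a = rng.normal(0, 1, 30)   # both samples from the SAME population, so Ho is true
    b = rng.normal(0, 1, 30)
    if ttest_ind(a, b).pvalue < alpha:
        rejections += 1
print("Observed Type I error rate:", rejections / trials)   # ~0.05
```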
POST HOC

• BONFERRONI – Most commonly used post hoc
test; highly flexible, very simple to compute,
and can be used with any type of statistical
test. Variants include the Sidak-Bonferroni
correction and Hochberg’s sequential method.
POST HOC
• SCHEFFE TEST. Used when comparing two groups from the
larger ANOVA; suited for complex comparisons.
• FISHER LSD (LEAST SIGNIFICANT DIFFERENCE) TEST.
• DUNNETT TEST. Used when a set of comparisons is being made
against one particular group (several treatments with one
control group).
• TUKEY HSD (HONEST SIGNIFICANT DIFFERENCE).
Evaluates whether differences between any two pairs
of means are significant.
POST HOC
• GAMES-HOWELL. Used when variances
are unequal; also takes into account
unequal group sizes.
Parametric Multiple Comparisons
• Two-Way Analysis of Variance
– Classifies participants in two ways
– Results answer three questions:
• Two main effects
• An interaction effect
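
A minimal sketch of a two-way ANOVA with statsmodels, producing the two main effects and the interaction effect just described; the factor levels and scores are hypothetical:

```python
# A minimal sketch with statsmodels; factor levels and scores are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "score": [12, 14, 11, 13, 18, 20, 19, 21, 10, 12, 11, 9, 15, 17, 16, 14],
    "drug":  ["A"] * 8 + ["B"] * 8,
    "sex":   (["M"] * 4 + ["F"] * 4) * 2,
})

model = ols("score ~ C(drug) * C(sex)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # rows: drug, sex, drug:sex interaction, residual
```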
Non-parametric Multiple Comparison
• Kruskal-Wallis Test – an alternative to the one-way
ANOVA. The scores are ranked and the analyses
compare the mean rank in each group. It determines
if there is a difference between groups.
• McNemar Test – an adaptation of the chi-square
that is used with repeated measures at the nominal
level.
• Friedman Test – an alternative to the repeated-measures
ANOVA. Two or more measurements are taken from
the same subjects. It answers the question of
whether the measurement changes over time.
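
Minimal sketches of all three tests with SciPy and statsmodels; all data are hypothetical:

```python
# Minimal sketches with SciPy and statsmodels; all data are hypothetical.
from scipy.stats import kruskal, friedmanchisquare
from statsmodels.stats.contingency_tables import mcnemar

# Kruskal-Wallis: three independent groups of scores
h, p = kruskal([7, 8, 5, 6], [9, 11, 10, 12], [4, 5, 6, 5])
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

# Friedman: the same subjects measured at three time points
chi2, p = friedmanchisquare([5, 6, 7, 5], [6, 7, 8, 7], [8, 9, 9, 8])
print(f"Friedman X2 = {chi2:.2f}, p = {p:.4f}")

# McNemar: paired yes/no responses before vs. after on the same subjects (2x2 table)
table = [[20, 5],
         [12, 18]]
print(mcnemar(table, exact=True))   # prints the statistic and p-value
```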
1. Which test would you use?

Participant | New Mathematics | Participant | No Scheme
number      | Help Scheme     | number      | implemented
1           | 11              | 11          | 14
2           | 12              | 12          | 13
3           | 9               | 13          | 12
4           | 8               | 14          | 16
5           | 11              | 15          | 14
6           | 14              | 16          | 15
7           | 10              | 17          | 19
8           | 7               | 18          | 20
9           | 15              | 19          | 16
10          | 9               |             |

IV = Whether or not given the Maths Help Scheme
DV = Maths test score
2. Which test would you use?

Participant | Study time (mins) | Test score (%)
1           | 100               | 60
2           | 25                | 47
3           | 95                | 70
4           | 60                | 53
5           | 10                | 41
6           | 75                | 51
7           | 80                | 61
8           | 45                | 44
9           | 55                | 45
10          | 85                | 61
3. Which test would you use?

Participant | Neutral | Emotionally
number      | words   | threatening words
1           | 14      | 15
2           | 18      | 16
3           | 19      | 19
4           | 20      | 14
5           | 16      | 19
6           | 15      | 10
7           | 20      | 16
8           | 18      | 15
9           | 19      | 18
10          | 20      | 18

IV = Whether the word presented is neutral or emotionally threatening
DV = Recall of words
4. Which test would you use?

             | Art Students | Science Students | Row Total
Extroverted  | 19           | 10               | 29
Introverted  | 11           | 15               | 26
Column Total | 30           | 25               | Total: 55

IV = Whether Art Students or Science Students (Natural Experiment)
DV = Extroversion-Introversion score
Summary of Statistical Tests

Statistic Test      | Type of Data Needed                                             | Test Statistic                                   | Example
Correlation         | Two continuous variables                                        | Pearson’s r                                      | Are blood pressure and weight correlated?
T-tests/ANOVA       | Means from a continuous variable taken from two or more groups | Student’s t                                      | Do normal weight (group 1) patients have lower blood pressure than obese patients (group 2)?
Chi-square          | Two categorical variables                                       | Chi-square (χ²)                                  | Are obese individuals (obese vs. not obese) significantly more likely to have a stroke (stroke vs. no stroke)?
Logistic Regression | A dichotomous variable as the outcome                           | Odds Ratios (OR) & 95% Confidence Intervals (CI) | Does obesity predict stroke (stroke vs. no stroke) when controlling for other variables?
Summary
• Descriptive statistics can be used with nominal, ordinal, interval
and ratio data
• Frequencies and percentages describe categorical data, and
means and standard deviations describe continuous variables
• Inferential statistics can be used to determine associations
between variables and predict the likelihood of outcomes or
events
• Inferential statistics tell us if our findings are significant and if we
can infer from our sample to the larger population
Next Steps
• Think about the data that you have collected
or will collect as part of your research project
– What is your research question?
– What are you trying to get your data to “say”?
– Which statistical tests will best help you answer
your research question?
– Contact the research coordinator to discuss how
to analyze your data!