Analysis of Continuous and Categorical Variables: January 28, 2020
ANALYSIS OF CONTINUOUS AND CATEGORICAL VARIABLES
Lecture 4
January 28, 2020
Descriptive vs.
Inferential Statistics
■ Descriptive statistics: describing the
central tendency and dispersion in
data through numerical calculations,
tables, and graphs
■ Inferential statistics: use sample data to
draw conclusions about the population
that the sample is meant to represent
(sampling will naturally involve error)
– Estimate parameters and test
hypotheses to make inferences
about the population
– Compare means and evaluate
Analysis of Continuous Data (comparison of means)
Samples                 Parametric test      Non-parametric test
2 related samples       Paired t-test        Wilcoxon test
2 independent samples   Independent t-test   Mann-Whitney U test
Student’s t-test
■ Used to compare means between two
groups
– Related groups: Paired t-test (e.g.
pre- and post- study measures on
the same participants)
– Independent groups:
Unpaired/Independent t-test on two
different groups
– Used more often in research
■ Null hypothesis: the population means of
the two groups are equal (the groups are
not statistically different)
Degrees of freedom
(df)
■ Degrees of freedom (df): amount of
information provided by the data that
can be used to estimate population
parameters and variability of the
estimates
– df = n – # of estimated parameters
– As df increases, the t-distribution more
closely
resembles a normal distribution
■ E.g. One-sample t-test to
estimate the
population mean
– Estimates the standard deviation
about the mean
– Uses a t-distribution with df = n – 1
– df = n – 1 for paired t-test as well
■ E.g. Two sample independent t-test to
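The two t-test variants and their degrees of freedom can be sketched with the Python standard library. This is a minimal illustration, not the software used in the lecture: the function names and the sample values are hypothetical, and the independent version assumes equal variances (pooled), so its df is n1 + n2 – 2.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic and df for two related samples of equal length."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

def pooled_t(group1, group2):
    """Independent t statistic and df, assuming equal variances (pooled)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, df

# Hypothetical pre- and post-study measures on the same five participants
t_paired, df_paired = paired_t([70, 72, 68, 75, 71], [68, 71, 66, 74, 70])

# Hypothetical measures from two different groups
t_indep, df_indep = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The p-value would then come from a t-distribution with the returned df, which a statistics package provides.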
T-test
assumptions
■ Samples are independent
■ Variable is normally distributed
■ Variance homogeneity: variance within
each group is equal
– Levene’s test for equality of variances
– Informs you whether to use results for
pooled or unpooled variance
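As a rough sketch of what Levene's test measures, the statistic can be computed from absolute deviations of each value from its group mean. The function name and the example groups are hypothetical, and the p-value (from an F distribution with k – 1 and N – k degrees of freedom) is left to a statistics package.

```python
import statistics

def levene_w(*groups):
    """Levene's W statistic based on absolute deviations from group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |y_ij - mean of group i|
    z = [[abs(y - statistics.mean(g)) for y in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: same spread, then clearly different spread
w_equal = levene_w([1, 2, 3], [11, 12, 13])
w_unequal = levene_w([1, 2, 3], [0, 10, 20])
```

A W close to zero (non-significant) supports using the pooled-variance results; a large W points to the unpooled results.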
Independent Samples Test: Body mass index
Levene's Test for Equality of Variances: F = .747, Sig. = .389
t-test for Equality of Means (95% Confidence Interval of the Difference)
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower    Upper
Equal variances assumed       1.284   116      .202              1.1047            .8603                   -.5992   2.8086
Equal variances not assumed   1.264   44.009   .213              1.1047            .8739                   -.6565   2.8659
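The numbers in this output are linked by simple arithmetic, which the following sketch checks using only values read from the table. The variable names are illustrative.

```python
# Values read from the Independent Samples Test output
mean_diff = 1.1047
se_pooled = 0.8603      # equal variances assumed
se_unpooled = 0.8739    # equal variances not assumed

# t = mean difference / standard error of the difference
t_pooled = mean_diff / se_pooled        # about 1.284
t_unpooled = mean_diff / se_unpooled    # about 1.264

# The 95% CI is mean_diff +/- t_critical * SE, so the half-width
# divided by the SE recovers the critical t value used (df = 116).
half_width = (2.8086 - (-0.5992)) / 2
t_critical = half_width / se_pooled     # about 1.98
```

Because the interval from -.5992 to 2.8086 contains 0, it agrees with the non-significant p-value of .202.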
Confidence interval
(CI)
■ Degree of uncertainty: a range around the
sample statistic within which the
corresponding population parameter is
likely to lie
■ The larger the sample, the narrower the
CI
– If the CI is narrow: the sample statistic
(sample mean) is a more precise estimate
of the population
parameter (population mean)
– If the CI for the difference between means
contains 0 (null value) then the
means are not statistically different
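Calculating
CI
■ CI = $\bar{x} \pm z_{1-\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
– 99% CI: $\bar{x} \pm 2.58\left(\frac{s}{\sqrt{n}}\right)$
– 95% CI: $\bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right)$
– 90% CI: $\bar{x} \pm 1.64\left(\frac{s}{\sqrt{n}}\right)$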
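A minimal sketch of this calculation using the Python standard library follows. The function name is hypothetical, and the sample size of 100 in the usage line is an assumed value for illustration only; the mean and standard deviation are borrowed from the reporting example in these slides.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation CI: mean +/- z_(1 - alpha/2) * sd / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Critical z values for the three common confidence levels
z_values = {level: NormalDist().inv_cdf(1 - (1 - level) / 2)
            for level in (0.99, 0.95, 0.90)}

# Hypothetical usage: mean 24.0, SD 4.1, assumed n = 100
low, high = z_confidence_interval(24.0, 4.1, 100)
```

The computed critical values are approximately 2.576, 1.960, and 1.645, which the slide rounds to 2.58, 1.96, and 1.64. For small samples, a t critical value with n – 1 degrees of freedom would replace z.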
Reporting independent
t-test results
■ Report the means and standard deviations
for both
groups, t-value, degrees of freedom, and
p-value.
– E.g. Males (mean ± SD): 24.0 ± 4.1,
Females (mean ± SD): 22.9 ± 4.0;
t = 1.3, df = 116, p = 0.20
http://blog.minitab.com/blog/adventures-in-statistics-2/understanding-analysis-of-variance-anova-and-the-f-test
F-value that is
derived
ANOVA
assumptions
■ Variable is normally distributed
■ The errors are normally distributed
– We will discuss this more in
regression lecture
■ The cases are independent from each
other
■ Variance homogeneity
Test of Homogeneity of Variances
Caloric intake (kcal/day)
Levene Statistic   df1   df2   Sig.
.115               3     114   .951
ANOVA
Caloric intake (kcal/day)
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   3752358.815      3     1250786.272   3.300   .023
Within Groups    43206696.270     114   379006.108
Total            46959055.090     117
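The ANOVA table can be checked by hand: each mean square is a sum of squares divided by its df, and F is the ratio of the two mean squares. The sketch below first reproduces F from the table values, then shows the same steps on raw data. The function name and the small raw-data example are hypothetical.

```python
import statistics

# Values read from the ANOVA table
ms_between = 3752358.815 / 3       # about 1250786.272
ms_within = 43206696.270 / 114     # about 379006.108
f_value = ms_between / ms_within   # about 3.300

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of raw values."""
    all_values = [y for g in groups for y in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical raw data for three groups
f_demo, df1_demo, df2_demo = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The p-value comes from an F distribution with the between-groups and within-groups df, here 3 and 114.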
ANOVA means
plot
Reporting ANOVA
results
■ Report means and standard deviations for
all groups, as well as the F value, degrees
of freedom, and p-value.
■ Text for this example:
– Analysis of variance indicated that the
different physical activity groups
report different levels of caloric intake:
■ Sedentary: 1640.1 ± 516.7
■ Light: 2030.8 ± 570.9
■ Moderate: 1999.4 ± 670.7
■ High: 2005.1 ± 633.5
■ F(3, 114) = 3.3, p = 0.02
Post-hoc tests
(ANOVA)
■ ANOVA tells you if there is a difference
between means, but not specifically
which means differ
– Conduct multiple comparison post-
hoc test to determine which means
differ
■ Many post-hoc tests to choose from (see
this link for list and description:
http://www.statisticshowto.com/post-hoc/)
■ Commonly applied to ANOVA:
– Tukey’s Test
Multiple Comparisons
Dependent Variable: Caloric intake (kcal/day)
LSD
Chi-Square Tests
                          Value    df   Asymptotic Significance (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square        3.421a   1    .064
Continuity Correction b   2.440    1    .118
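For a 2 x 2 table (df = 1), both statistics and their asymptotic p-values can be sketched with the standard library. The function names and the cell counts in the usage lines are hypothetical, since the underlying counts are not shown in this output; the p-value check uses the statistics reported in the table.

```python
import math

def chi2_p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, applies the continuity correction.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# p-values for the statistics reported in the table
p_pearson = chi2_p_value_df1(3.421)      # about .064
p_corrected = chi2_p_value_df1(2.440)    # about .118

# Hypothetical cell counts, for illustration only
chi2_demo = chi2_2x2(10, 20, 20, 10)
chi2_demo_yates = chi2_2x2(10, 20, 20, 10, yates=True)
```

The continuity correction always lowers the statistic, which is why the corrected p-value is larger than the uncorrected one.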