
Assignment #2

Suprita Anand (93856235)

Jenna Multani (10143741)

The University of British Columbia

KIN 206: Introduction to Statistics in Kinesiology

Dr. Carolyn McEwen

March 20th, 2022


Hypothesis 1:

“Students who have a running/walking path (excluding a sidewalk) within 200m of where
they live will engage in more hours of moderate-vigorous physical activity during the middle
of the term (PA last week) compared to students who do not have a running/walking path
within 200m of where they live.”

1a) Select and state the appropriate statistical analysis given the research hypothesis (1
mark).

The t-test for two independent means (unequal sample sizes) will be used.

1b) Using JASP, produce appropriate frequency distribution tables of the data (2
marks).

Frequencies for Hours of PA during the last week

Path within  Hours of PA during  Frequency  Percent  Valid    Cumulative
200m?        the last week                           Percent  Percent
No           0                   2          13.33    13.33    13.33
             2                   1          6.67     6.67     20.00
             3                   1          6.67     6.67     26.67
             4                   1          6.67     6.67     33.33
             5                   1          6.67     6.67     40.00
             6                   2          13.33    13.33    53.33
             7                   2          13.33    13.33    66.67
             9                   2          13.33    13.33    80.00
             11                  2          13.33    13.33    93.33
             14                  1          6.67     6.67     100.00
             Missing             0          0.00
             Total               15         100.00
Yes          0                   6          13.33    13.33    13.33
             1                   1          2.22     2.22     15.56
             2                   1          2.22     2.22     17.78
             3                   5          11.11    11.11    28.89
             4                   10         22.22    22.22    51.11
             5                   1          2.22     2.22     53.33
             6                   2          4.44     4.44     57.78
             7                   3          6.67     6.67     64.44
             8                   2          4.44     4.44     68.89
             9                   1          2.22     2.22     71.11
             10                  5          11.11    11.11    82.22
             11                  2          4.44     4.44     86.67
             12                  3          6.67     6.67     93.33
             14                  3          6.67     6.67     100.00
             Missing             0          0.00
             Total               45         100.00
1c) Is there problematic data? Explain your answer (1 mark).

The distribution does not contain any problematic data, since all entered values are
appropriate to the questions that were asked. Firstly, every participant answered the question
‘Path within 200m?’ with either ‘No’ or ‘Yes’, and neither group has missing values. Secondly,
there are no values such as -3 hours or 170 hours that fall outside the 0 to 168 hours available
in a week.
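The two checks described above can be automated. The following is a minimal Python sketch, assuming the counts were transcribed correctly from the JASP frequency table; the dictionary and function names are illustrative and not part of JASP. It rebuilds the percent and cumulative percent columns and lists any value outside the 0 to 168 hours in a week.

```python
# Hours of PA during the last week -> frequency, transcribed from the table above.
NO_PATH = {0: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 9: 2, 11: 2, 14: 1}
YES_PATH = {0: 6, 1: 1, 2: 1, 3: 5, 4: 10, 5: 1, 6: 2, 7: 3,
            8: 2, 9: 1, 10: 5, 11: 2, 12: 3, 14: 3}

HOURS_IN_WEEK = 168


def frequency_table(counts):
    """Rows of (value, frequency, percent, cumulative percent), rounded to 2 dp."""
    total = sum(counts.values())
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     round(100 * counts[value] / total, 2),
                     round(100 * running / total, 2)))
    return rows


def out_of_range(counts, low=0, high=HOURS_IN_WEEK):
    """Values that cannot be valid weekly hours (e.g. -3 or 170)."""
    return [value for value in counts if not low <= value <= high]


for label, counts in (("No", NO_PATH), ("Yes", YES_PATH)):
    print(label, "N =", sum(counts.values()), "out of range:", out_of_range(counts))
    for row in frequency_table(counts):
        print("   ", row)
```

An empty "out of range" list for both groups corresponds to the conclusion that there is no problematic data.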

2a) State the null and alternative hypothesis (assume two-tailed, alpha = .05) (2 marks).

 μ1: Hours of moderate-vigorous physical activity engaged during the last week by students
who do not have a running/walking path (excluding a sidewalk) within 200m of where they
live.

 μ2: Hours of moderate-vigorous physical activity engaged during the last week by students
who have a running/walking path (excluding a sidewalk) within 200m of where they live.

H0 : μ1 =  μ2

H1 : μ1 ≠ μ2

2b) State the decision rule for the analysis if you were to conduct the statistical analysis
by hand (2 marks).

Degrees of freedom = (N1 - 1) + (N2 - 1)

Degrees of freedom = (15 - 1) + (45 - 1) = 58

For α = .05 (two-tailed) and df = 58, critical value = ±2.009 (the t-table value for df = 50, a
conservative choice for df = 58; the exact critical value for df = 58 is ±2.002)

Rejection rule:

If t < -2.009 or t > 2.009, reject H0; otherwise do not reject H0.
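The rejection rule relies on a critical value read from a t table. As a hedged, standard-library-only sketch (in practice a call such as `scipy.stats.t.ppf(0.975, df)` would be used), the two-tailed critical value can be found by numerically integrating the t density with Simpson's rule and bisecting for the 97.5th percentile. The helper names `t_density`, `t_cdf` and `t_critical` are illustrative only. This reproduces the tabled values 2.145 (df = 14), 2.021 (df = 40) and 2.009 (df = 50), and gives 2.002 for df = 58, so reading the df = 50 row is slightly conservative.

```python
import math


def t_density(df):
    """Return the probability density function of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    c = math.exp(log_c)
    return lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density over [0, x]."""
    pdf = t_density(df)
    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value: find t with P(T <= t) = 1 - alpha/2 by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (14, 40, 50, 58):
    print(df, round(t_critical(0.05, df), 3))
```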

2c) Using JASP, conduct the appropriate statistical analysis given the research
hypothesis (two-tailed, alpha = .05). Include the JASP output of the t-table in your
assignment write up.
Independent Samples T-Test

Hours of PA during the last week: t = 0.19, df = 58, p = 0.85
Mean Difference = 0.24, SE Difference = 1.26, 95% CI for Mean Difference: Lower = -2.27, Upper = 2.76
Cohen's d = 0.06, 95% CI for Cohen's d: Lower = -0.53, Upper = 0.64
Note. Student's t-test.
-2.009 < t = 0.19 < 2.009; therefore, do not reject H0 (p > .05).
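As a cross-check on the JASP output, here is a minimal standard-library sketch of the pooled-variance (Student's) independent-samples t-test, run on the individual scores expanded from the frequency table in 1b. The function names are illustrative. It reproduces t(58) = 0.19, SE Difference = 1.26 and Cohen's d = 0.06, and applies the tabled critical value 2.009 from the rejection rule.

```python
import math
import statistics

# Hours of PA during the last week -> frequency, from the frequency table in 1b.
NO_PATH = {0: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 9: 2, 11: 2, 14: 1}
YES_PATH = {0: 6, 1: 1, 2: 1, 3: 5, 4: 10, 5: 1, 6: 2, 7: 3,
            8: 2, 9: 1, 10: 5, 11: 2, 12: 3, 14: 3}


def expand(counts):
    """Turn {value: frequency} into the list of individual scores."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def pooled_t_test(a, b):
    """Student's (pooled-variance) t-test; returns t, df, SE of the difference, Cohen's d."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample (n - 1) variances
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se_diff
    d = (m1 - m2) / math.sqrt(pooled_var)  # Cohen's d with the pooled SD
    return t, df, se_diff, d


T_CRIT = 2.009  # two-tailed table value from the rejection rule
t, df, se, d = pooled_t_test(expand(NO_PATH), expand(YES_PATH))
print(f"t({df}) = {t:.2f}, SE = {se:.2f}, d = {d:.2f}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```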

2d) Check and discuss the assumptions of the analysis you conducted (eg: what were the
assumptions based on the analysis? Were the assumptions met?) Please include the
JASP output for any assumption checks you did using JASP.

There are 5 assumptions associated with the analysis that was conducted.

1. Level of measurement of the dependent variable:


This assumption states that the dependent variable must be measured on an interval or
ratio scale. In this study, the dependent variable is the hours of moderate-vigorous physical
activity engaged in by KIN 206 students. Because this variable has a true zero point, it is
measured on a ratio scale. Therefore, this assumption is met.

2. Independence:
This assumption states that scores on the dependent variable must be independent,
meaning that they must come from different participants. In this study, data were
collected using a between-groups design: each participant was placed in one of two
groups based on their response to whether or not they had a running/walking path
(excluding sidewalk) within 200m of where they lived. Therefore, this assumption is met,
since each participant's hours of PA appear in only one group, not both.

3. Levels of the independent variable:


This assumption states that there must be two levels of the independent variable. In
this study, the two levels of the independent variable are as follows: KIN 206 students
who have a running/walking path (excluding sidewalk) within 200m of where they
live and KIN 206 students who do not have a running/walking path (excluding
sidewalk) within 200m of where they live. Therefore, this assumption is met.

4. Homogeneity of variance:
Homogeneity of variance assumes that the two groups have equal variances, which
was tested with Levene's test. Since the p-value of 0.54 for the F-statistic is greater
than 0.05, the variances of the two groups do not differ statistically significantly from
each other. Therefore, the assumption of equality of variances is met.
Test of Equality of Variances (Levene's)
                                  F     df   p
Hours of PA during the last week  0.39  1    0.54
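Levene's test is a one-way ANOVA carried out on each score's absolute distance from its own group mean. A minimal standard-library sketch of this mean-centred version, using the counts from the frequency table in 1b (helper names illustrative), gives F(1, 58) ≈ 0.39, matching the JASP output. Some packages default to a median-centred (Brown-Forsythe) variant, which can give a different F. The p-value is not computed here, since that would require the F distribution.

```python
import statistics

# Hours of PA during the last week -> frequency, from the frequency table in 1b.
NO_PATH = {0: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 9: 2, 11: 2, 14: 1}
YES_PATH = {0: 6, 1: 1, 2: 1, 3: 5, 4: 10, 5: 1, 6: 2, 7: 3,
            8: 2, 9: 1, 10: 5, 11: 2, 12: 3, 14: 3}


def expand(counts):
    """Turn {value: frequency} into the list of individual scores."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def levene_f(*groups):
    """Levene's test, mean-centred: a one-way ANOVA F on |score - group mean|."""
    deviations = []
    for group in groups:
        centre = statistics.mean(group)
        deviations.append([abs(x - centre) for x in group])
    n_total = sum(len(z) for z in deviations)
    grand_mean = sum(sum(z) for z in deviations) / n_total
    group_means = [statistics.mean(z) for z in deviations]
    ss_between = sum(len(z) * (m - grand_mean) ** 2
                     for z, m in zip(deviations, group_means))
    ss_within = sum((v - m) ** 2
                    for z, m in zip(deviations, group_means) for v in z)
    df1, df2 = len(groups) - 1, n_total - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2


f_stat, df1, df2 = levene_f(expand(NO_PATH), expand(YES_PATH))
print(f"Levene's F({df1}, {df2}) = {f_stat:.2f}")
```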

5. Normality:
This assumption states that the dependent variable should be normally distributed within
each group. It can be assessed with the Shapiro-Wilk test, a statistical test of whether the
data deviate from a normal distribution.

Test of Normality (Shapiro-Wilk)
                                        W     p
Hours of PA during the last week  No    0.97  0.89
                                  Yes   0.93  9.89e-3
Note. Significant results suggest a deviation from normality.

Since the p-value of 0.89 for the ‘No’ group is greater than 0.05, the distribution of
the hours of physical activity engaged by KIN 206 students who do not have a
running/walking path (excluding a sidewalk) within 200m of where they live does not
statistically significantly differ from a normal distribution. Thus, the assumption of normality
is met for this group.

However, the p-value of 0.00989 for the ‘Yes’ group is less than 0.05, which indicates
that the distribution of the hours of physical activity engaged by KIN 206 students who have
a running/walking path (excluding a sidewalk) within 200m of where they live statistically
significantly differs from a normal distribution. Although this initial analysis suggests that the
assumption of normality for this group is not met, it is important to evaluate the assumption
of normality further.

[Figure: JASP histogram of hours of PA during the last week for the ‘Yes’ group]

Descriptive Statistics
Hours of PA during the last week
                          No      Yes
Valid                     15      45
Missing                   0       0
Median                    6.00    4.00
Mean                      6.27    6.02
Std. Error of Mean        1.06    0.63
Std. Deviation            4.10    4.25
Variance                  16.78   18.02
Skewness                  0.12    0.33
Std. Error of Skewness    0.58    0.35
Kurtosis                  -0.55   -0.97
Std. Error of Kurtosis    1.12    0.69
Minimum                   0.00    0.00
Maximum                   14.00   14.00
The mean hours of physical activity (6.02 hours) for KIN 206 students who have a
running/walking path (excluding sidewalk) within 200m of where they live is greater than
the median (4.00 hours), which indicates a positively skewed distribution. The histogram
shows that the highest frequencies occur toward the left of the distribution and the lowest
frequencies occur in the tail on the right. However, the skewness statistic of 0.33 is only
slightly positive, so the data still approximate a normal distribution: by the rule of thumb used
here, only a skewness statistic greater than 2 or less than -2 raises concerns about the degree
to which the distribution is asymmetrical.

Moreover, the negative kurtosis statistic of -0.97 indicates that the distribution is flatter,
with lighter tails, than a normal distribution, which is mesokurtic (neither peaked nor flat).
The histogram likewise has a flatter (platykurtic) shape. Given that only a kurtosis statistic
greater than 2 or less than -2 raises concerns about the degree to which a distribution
deviates from normality, the data still approximate a normal distribution.

Therefore, based on the analysis above, it can be concluded that the assumption of
normality for the ‘Yes’ group is met.
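The skewness and kurtosis values in the Descriptive Statistics table are bias-adjusted sample estimates (often written G1 and G2). A minimal standard-library sketch, using the ‘Yes’ counts from the frequency table in 1b (function names illustrative), reproduces JASP's 0.33 and -0.97 and the standard error of skewness, then applies the ±2 rule of thumb used in this report.

```python
import math

# 'Yes' group: hours of PA during the last week -> frequency, from the table in 1b.
YES_PATH = {0: 6, 1: 1, 2: 1, 3: 5, 4: 10, 5: 1, 6: 2, 7: 3,
            8: 2, 9: 1, 10: 5, 11: 2, 12: 3, 14: 3}


def expand(counts):
    """Turn {value: frequency} into the list of individual scores."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def skewness_kurtosis(data):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt


def se_skewness(n):
    """Standard error of the skewness statistic for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


skew, kurt = skewness_kurtosis(expand(YES_PATH))
print(f"skewness = {skew:.2f} (SE {se_skewness(45):.2f}), kurtosis = {kurt:.2f}")
print("within rule of thumb" if abs(skew) < 2 and abs(kurt) < 2 else "check normality")
```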

2e) Write a concluding statement based on your analysis (6 marks).

The mean hours of moderate-vigorous physical activity last week reported by 15 KIN 206
students who do not have a running/walking path (excluding a sidewalk) within 200m of where
they live (M = 6.27, SD = 4.10) did not differ statistically significantly from the mean hours
reported by 45 KIN 206 students who do have such a path (M = 6.02, SD = 4.25),
t(58) = 0.19, p = .85, with analyses suggesting an extremely small effect size (d = 0.06).

2f) Relate the results of the analysis to the research hypothesis (1 mark).

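The results do not support the research hypothesis. KIN 206 students with a running/walking
path (excluding a sidewalk) within 200m of where they live did not engage in significantly
more hours of moderate-vigorous physical activity during the middle of the term than students
without such a path; their mean was in fact slightly lower (6.02 vs 6.27 hours).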

3a) Calculate and report the 95% confidence intervals for each sample mean in the
analysis you conducted in step 2. Show the formula that you used to calculate the 95%
confidence interval and the main values within the formula (eg: the mean, standard
error, t) (4 marks).

Formula (both groups):

$$95\%\ \text{C.I} = \bar{x} \pm t\,(s_{\bar{x}}), \qquad s_{\bar{x}} = \frac{s}{\sqrt{N}}, \qquad df = N - 1$$

‘No’ group (x̄ = 6.27, s = 4.10, N = 15)

$$s_{\bar{x}} = \frac{4.10}{\sqrt{15}} = 1.0586, \qquad df = 15 - 1 = 14$$

A 95% C.I implies α = 0.05 (two-tailed); with df = 14, t = ±2.145.

$$95\%\ \text{C.I} = 6.27 \pm 2.145(1.0586) = 6.27 \pm 2.27$$

Lower limit = 4.00 (2dp), upper limit = 8.54 (2dp); 95% C.I = 4.00, 8.54

We can be 95% confident that the interval of 4.00 hours to 8.54 hours contains the population
mean hours of moderate-vigorous physical activity during the middle of the term (PA last week)
for KIN 206 students who do not have a running/walking path within 200m of where they live.

‘Yes’ group (x̄ = 6.02, s = 4.25, N = 45)

$$s_{\bar{x}} = \frac{4.25}{\sqrt{45}} = 0.6336, \qquad df = 45 - 1 = 44$$

A 95% C.I implies α = 0.05 (two-tailed); t = ±2.021 (the t-table value for df = 40, used
conservatively for df = 44).

$$95\%\ \text{C.I} = 6.02 \pm 2.021(0.6336) = 6.02 \pm 1.2805$$

Lower limit = 4.74 (2dp), upper limit = 7.30 (2dp); 95% C.I = 4.74, 7.30

We can be 95% confident that the interval of 4.74 hours to 7.30 hours contains the population
mean hours of moderate-vigorous physical activity during the middle of the term (PA last week)
for KIN 206 students who have a running/walking path within 200m of where they live.
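The interval arithmetic above can be reproduced with a short helper. This is a minimal sketch, assuming the rounded summary statistics from the descriptive table and the tabled t values used above (2.145 for df = 14, 2.021 from the df = 40 row); the function name `mean_ci` is illustrative.

```python
import math


def mean_ci(mean, sd, n, t_crit):
    """95% CI = x̄ ± t * s/√N; returns (lower, upper) to 2 dp and the unrounded standard error."""
    se = sd / math.sqrt(n)
    margin = t_crit * se
    return round(mean - margin, 2), round(mean + margin, 2), se


print(mean_ci(6.27, 4.10, 15, 2.145))  # 'No' group, df = 14
print(mean_ci(6.02, 4.25, 45, 2.021))  # 'Yes' group, tabled t for df = 40
```

Passing the exact critical value for each df instead of a tabled one would shift the limits only in the second decimal place at most.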

3b) Using JASP, create a descriptive plot with 95% confidence intervals and include it
in your assignment write up (1 mark).
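[Figure: JASP descriptive plot of mean hours of PA during the last week for the ‘No’ and ‘Yes’ groups, with 95% confidence intervals]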

3c) Based on your work in steps 3a) and 3b), do the intervals between the samples
overlap (2 marks)? What do you think this overlap or lack of overlap means (2 marks)?

From our calculations and the descriptive plot above, the confidence intervals for the two
samples overlap: the ‘No’ interval (4.00 to 8.54 hours) and the ‘Yes’ interval (4.74 to 7.30
hours) share most of their range, and each sample mean lies inside the other group's interval.
The analysis asked whether having a running/walking path (excluding sidewalk) within 200m of
a KIN 206 student's residence is associated with the hours of moderate-vigorous physical
activity they engage in during the middle of the term. This degree of overlap is consistent with
the difference between the two groups not being statistically significant, which reinforces the
conclusion of our independent samples t-test, which did not reject the null hypothesis.

Hypothesis 2:

“Students will participate in more hours of moderate-vigorous physical activity during the
first week of term compared to during the middle of term.”

1a) Select and state the appropriate statistical analysis given the research hypothesis (1
mark).

The t-test for paired sample means will be used.

1b) Using JASP, produce appropriate frequency distribution tables of the data (2
marks).

Frequencies for Hours of PA during the first week

Hours of PA during  Frequency  Percent  Valid    Cumulative
the first week                          Percent  Percent
0                   9          15.00    15.00    15.00
1                   1          1.67     1.67     16.67
2                   4          6.67     6.67     23.33
3                   4          6.67     6.67     30.00
4                   6          10.00    10.00    40.00
5                   3          5.00     5.00     45.00
6                   9          15.00    15.00    60.00
7                   4          6.67     6.67     66.67
8                   5          8.33     8.33     75.00
9                   3          5.00     5.00     80.00
10                  3          5.00     5.00     85.00
11                  2          3.33     3.33     88.33
12                  4          6.67     6.67     95.00
13                  2          3.33     3.33     98.33
14                  1          1.67     1.67     100.00
Missing             0          0.00
Total               60         100.00

Frequencies for Hours of PA during the last week

Hours of PA during  Frequency  Percent  Valid    Cumulative
the last week                           Percent  Percent
0                   8          13.33    13.33    13.33
1                   1          1.67     1.67     15.00
2                   2          3.33     3.33     18.33
3                   6          10.00    10.00    28.33
4                   11         18.33    18.33    46.67
5                   2          3.33     3.33     50.00
6                   4          6.67     6.67     56.67
7                   5          8.33     8.33     65.00
8                   2          3.33     3.33     68.33
9                   3          5.00     5.00     73.33
10                  5          8.33     8.33     81.67
11                  4          6.67     6.67     88.33
12                  3          5.00     5.00     93.33
14                  4          6.67     6.67     100.00
Missing             0          0.00
Total               60         100.00
Frequencies for Difference scores

Difference          Frequency  Percent  Valid    Cumulative
scores                                  Percent  Percent
-9                  1          1.67     1.67     1.67
-8                  1          1.67     1.67     3.33
-5                  1          1.67     1.67     5.00
-4                  5          8.33     8.33     13.33
-3                  3          5.00     5.00     18.33
-2                  3          5.00     5.00     23.33
-1                  4          6.67     6.67     30.00
0                   26         43.33    43.33    73.33
1                   1          1.67     1.67     75.00
2                   7          11.67    11.67    86.67
3                   5          8.33     8.33     95.00
4                   1          1.67     1.67     96.67
5                   1          1.67     1.67     98.33
6                   1          1.67     1.67     100.00
Missing             0          0.00
Total               60         100.00

1c) Is there problematic data? Explain your answer (1 mark).

The distribution does not contain any problematic data, since all entered values are
appropriate to the question that was asked. All 60 participants provided values for both weeks
(there are no missing values), and there are no values such as -3 hours or 170 hours that fall
outside the 0 to 168 hours available in a week.

2a) State the null and alternative hypothesis (assume two-tailed, alpha = .05) (2 marks).

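μD: population mean of the difference scores (hours of PA during the first week minus hours
of PA during the last week).

H0 : μD = 0

H1 : μD ≠ 0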

2b) State the decision rule for the analysis if you were to conduct the statistical analysis
by hand (2 marks).

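Degrees of freedom = ND - 1, where ND is the number of difference scores

Degrees of freedom = 60 - 1 = 59

For α = .05 (two-tailed) and df = 59, critical value = ±2.009 (the t-table value for df = 50, a
conservative choice for df = 59; the exact critical value for df = 59 is ±2.001)

Rejection rule:

If t < -2.009 or t > 2.009, reject H0; otherwise do not reject H0.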


2c) Using JASP, conduct the appropriate statistical analysis given the research
hypothesis (two-tailed, alpha = .05). Include the JASP output of the t-table in your
assignment write up.

-2.009 < t = -0.76 < 2.009; therefore, do not reject H0 (p > .05).
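As a cross-check of t = -0.76, a minimal standard-library sketch recomputes the paired t statistic from the frequency table of difference scores (first week minus last week). The function names are illustrative; the effect size computed here is d_z (mean difference divided by the SD of the differences), which matches the magnitude d = 0.10 reported in 2e.

```python
import math
import statistics

# Difference scores (first week - last week) -> frequency, from the table in 1b.
DIFFS = {-9: 1, -8: 1, -5: 1, -4: 5, -3: 3, -2: 3, -1: 4,
         0: 26, 1: 1, 2: 7, 3: 5, 4: 1, 5: 1, 6: 1}


def expand(counts):
    """Turn {value: frequency} into the list of individual scores."""
    return [value for value, n in sorted(counts.items()) for _ in range(n)]


def paired_t(diffs):
    """Paired-samples t-test on difference scores; returns t, df, Cohen's d_z."""
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1)
    se_d = sd_d / math.sqrt(n)
    return mean_d / se_d, n - 1, mean_d / sd_d


t, df, d = paired_t(expand(DIFFS))
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
print("reject H0" if abs(t) > 2.009 else "do not reject H0")
```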

2d) Check and discuss the assumptions of the analysis you conducted (eg: what were the
assumptions based on the analysis? Were the assumptions met?) Please include the
JASP output for any assumption checks you did using JASP.

There are 2 assumptions associated with the analysis that was conducted.

1. Level of measurement of the dependent variable:


This assumption states that the dependent variable must be measured on an interval or
ratio scale. In this study, the dependent variable is the hours of moderate-vigorous physical
activity engaged in by KIN 206 students during the start and middle of the term. Because this
variable has a true zero point, it is measured on a ratio scale. Therefore, this assumption is
met.

2. Normality:
This assumption states that the difference scores (hours of PA during the first week minus
hours of PA during the last week) should be normally distributed. It can be assessed with the
Shapiro-Wilk test, a statistical test of whether the data deviate from a normal distribution.

Test of Normality (Shapiro-Wilk)
                                                                      W     p
Hours of PA during the first week - Hours of PA during the last week  0.92  < .001
Note. Significant results suggest a deviation from normality.

At first glance, since the p-value for the distribution of difference scores is less than .001
(and therefore less than .05), it can be inferred that the distribution differs statistically
significantly from a normal distribution. However, this assumption should be explored further
by examining the modality, symmetry, and kurtosis of the distribution of difference scores.
Descriptive Statistics
                          Hours of PA during   Hours of PA during   Difference
                          the first week       the last week        scores
Valid                     60                   60                   60
Missing                   0                    0                    0
Median                    6.00                 5.50                 0.00
Mean                      5.82                 6.08                 -0.27
Std. Deviation            3.97                 4.18                 2.73
Skewness                  0.18                 0.27                 -0.73
Std. Error of Skewness    0.31                 0.31                 0.31
Kurtosis                  -0.85                -0.93                1.73
Std. Error of Kurtosis    0.61                 0.61                 0.61
Minimum                   0.00                 0.00                 -9.00
Maximum                   14.00                14.00                6.00

[Figure: JASP distribution plot (histogram) of the difference scores]

The mean of the difference scores (-0.27) falls to the left of the median (0.00), which
indicates that the distribution is slightly negatively skewed. This is reflected in the histogram,
where the longer tail is on the left and more values lie on the right. Since the skewness
statistic of -0.73 falls within the range of -2 to 2, the degree of asymmetry is not a concern,
and the data still approximate a normal distribution.

Furthermore, the histogram shows a leptokurtic shape, consistent with the kurtosis statistic
of 1.73: the distribution is more sharply peaked, with heavier tails, than a normal distribution.
The peak largely reflects the 26 participants (43.33%) whose difference score was exactly 0.
This statistic is fairly close to 2 (where concerns begin about a distribution's degree of
deviation from normality); however, because it is less than 2, the distribution can still be
regarded as approximating a normal distribution.
Therefore, based on the analysis above, it can be concluded that the assumption of normality
for the distribution of difference scores has been met.

2e) Write a concluding statement based on your analysis (6 marks).

The hours of moderate-vigorous physical activity engaged in by 60 KIN 206 students during the
first week of the term (M = 5.82, SD = 3.97) did not differ statistically significantly from the
hours they engaged in during the middle of the term (PA last week) (M = 6.08, SD = 4.18),
t(59) = -0.76, p > .05, with analyses suggesting a very small effect size (d = 0.10).

2f) Relate the results of the analysis to the research hypothesis (1 mark).

The results do not support the research hypothesis. KIN 206 students did not report
significantly more hours of moderate-vigorous physical activity during the first week of the
term than during the middle of the term; their mean was in fact slightly lower in the first week
(5.82 vs 6.08 hours).

3a) Calculate and report the 95% confidence intervals for each sample mean in the
analysis you conducted in step 2. Show the formula that you used to calculate the 95%
confidence interval and the main values within the formula (eg: the mean, standard
error, t) (4 marks).

Formula (both time points):

$$95\%\ \text{C.I} = \bar{x} \pm t\,(s_{\bar{x}}), \qquad s_{\bar{x}} = \frac{s}{\sqrt{N}}, \qquad df = N - 1$$

Hours of PA during the first week (x̄ = 5.82, s = 3.97, N = 60)

$$s_{\bar{x}} = \frac{3.97}{\sqrt{60}} = 0.5125, \qquad df = 60 - 1 = 59$$

A 95% C.I implies α = 0.05 (two-tailed); t = ±2.009 (the t-table value for df = 50, used
conservatively for df = 59).

$$95\%\ \text{C.I} = 5.82 \pm 2.009(0.5125) = 5.82 \pm 1.0296$$

Lower limit = 4.79 (2dp), upper limit = 6.85 (2dp); 95% C.I = 4.79, 6.85

We can be 95% confident that the interval of 4.79 hours to 6.85 hours contains the population
mean hours of moderate-vigorous physical activity engaged in by KIN 206 students during the
first week of term.

Hours of PA during the last week (x̄ = 6.08, s = 4.18, N = 60)

$$s_{\bar{x}} = \frac{4.18}{\sqrt{60}} = 0.5396, \qquad df = 60 - 1 = 59$$

A 95% C.I implies α = 0.05 (two-tailed); t = ±2.009 (the t-table value for df = 50, used
conservatively for df = 59).

$$95\%\ \text{C.I} = 6.08 \pm 2.009(0.5396) = 6.08 \pm 1.0840$$

Lower limit = 5.00 (2dp), upper limit = 7.16 (2dp); 95% C.I = 5.00, 7.16

We can be 95% confident that the interval of 5.00 hours to 7.16 hours contains the population
mean hours of moderate-vigorous physical activity engaged in by KIN 206 students during the
middle of the term (PA last week).

3b) Using JASP, create a descriptive plot with 95% confidence intervals and include it
in your assignment write up (1 mark).

[Figure: JASP descriptive plot of mean hours of PA during the first week and during the last week, with 95% confidence intervals]

3c) Based on your work in steps 3a) and 3b), do the intervals between the samples
overlap (2 marks)? What do you think this overlap or lack of overlap means (2 marks)?

From our calculations and the descriptive plot above, the confidence intervals for the two time
points overlap: the first-week interval (4.79 to 6.85 hours) and the middle-of-term interval
(5.00 to 7.16 hours) share most of their range, and each sample mean lies inside the other
interval. The analysis asked whether the time of the term affects the hours of moderate-vigorous
physical activity engaged in by KIN 206 students, and this overlap is consistent with the
difference between the two time periods not being statistically significant. This reinforces the
conclusion of our dependent samples t-test, which did not reject the null hypothesis. Because
the same students were measured twice, the paired test is based on the difference scores, so
comparing the two separate intervals is only a rough visual guide.
