
KIN 206 – ASSIGNMENT #1

Suprita Anand (93856235) and Jenna Multani (10143741)

1a) Produce a frequency distribution table of the data using JASP. Make sure the table
and axes are appropriately labelled (2 marks).

Frequencies for Number of hours spent sitting

Number of hours spent sitting   Frequency   Percent   Valid Percent   Cumulative Percent
3                               1           1.27      1.27            1.27
5                               12          15.19     15.19           16.46
6                               13          16.46     16.46           32.91
7                               9           11.39     11.39           44.30
8                               13          16.46     16.46           60.76
9                               9           11.39     11.39           72.15
10                              10          12.66     12.66           84.81
11                              2           2.53      2.53            87.34
12                              3           3.80      3.80            91.14
13                              2           2.53      2.53            93.67
14                              3           3.80      3.80            97.47
15                              2           2.53      2.53            100.00
Missing                         0           0.00
Total                           79          100.00
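
As a cross-check outside JASP, the percent and cumulative percent columns can be recomputed with a minimal Python sketch. The raw scores are rebuilt from the frequencies in the table above, so this only verifies the percentage arithmetic; the names `counts`, `scores`, and `frequency_table` are illustrative and not part of the JASP output.

```python
from collections import Counter

# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]


def frequency_table(data):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    tally = Counter(data)
    total = len(data)
    rows, running = [], 0
    for value in sorted(tally):
        running += tally[value]
        rows.append((value,
                     tally[value],
                     round(100 * tally[value] / total, 2),
                     round(100 * running / total, 2)))
    return rows


table = frequency_table(scores)
for row in table:
    print(row)
```

The cumulative percent is taken from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.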

1b) Is there problematic data? Explain your answer (1 mark).


The distribution does not contain any data that is ‘problematic’ since the values inputted are
appropriate to the question that was asked. For example, there are no entered values like -3
hours or 29 hours that fall outside the limits of a 0-24-hour period.
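
The same check can be written as a small Python sketch, assuming the only plausibility rule is that daily sitting time must lie between 0 and 24 hours. The helper name `find_impossible` is hypothetical, and the scores are rebuilt from the frequency table.

```python
# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]

HOURS_IN_DAY = 24


def find_impossible(values, low=0, high=HOURS_IN_DAY):
    """Return the values that fall outside the plausible range [low, high]."""
    return [v for v in values if not low <= v <= high]


flagged = find_impossible(scores)
print(flagged)                         # values in the data set outside 0-24
print(find_impossible([-3, 8, 29]))    # the two impossible examples from the text
```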

Distribution plot (histogram): Number of hours spent sitting

2a) How would you describe the shape of the variable’s distribution? In your answer, be
sure to provide evidence from and interpretation of the relationship between the mean
and median, skewness and kurtosis statistics, and the histogram. Be sure to discuss the
modality, symmetry, and variability (6 marks).

 Firstly, in terms of modality, this distribution is classified as unimodal with 6 and 8 hours
spent sitting being the most frequently occurring scores. Despite both scores having the same
frequency of 13, this distribution is not referred to as bimodal because there is no pronounced
valley between the two peaks, which is a distinguishing feature of bimodal distributions. The
histogram reflects this: the frequency for 7 hours spent sitting is 9, only 4 fewer than the
frequency of 13 for 6 and 8 hours, which is not a large enough drop to separate two distinct
peaks. Secondly, with regard to symmetry, the distribution has a skewness value of 0.732,
indicating that it is slightly positively skewed. The histogram shows this, as the long tail of
low-frequency values lies on the right of the distribution, and the most frequently occurring
values (between 4 and 10 hours spent sitting) are clustered on the left. Since the distribution
is positively skewed, the mean is expected to fall to the right of the median. The mean of
8.165 is indeed greater than the median of 8.0, which corroborates this. Lastly, in relation to
variability, the distribution has a positive kurtosis statistic of 0.085. Because this value is
only just above zero, the distribution is very slightly leptokurtic, with marginally less
variability around the peak than a normal distribution. The slightly taller peak in the
histogram for 4-10 hours spent sitting (compared to a normal curve) visually demonstrates this.
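
The statistics quoted above can be recomputed with a minimal Python sketch. It assumes the reported skewness and kurtosis are the bias-adjusted sample skewness and the bias-adjusted excess kurtosis, which is an assumption about the software's formulas rather than something stated in the output. The function names are illustrative, and the scores are rebuilt from the frequency table.

```python
import math
import statistics

# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]


def central_moment(data, k):
    """k-th central moment using the n denominator."""
    mean = statistics.fmean(data)
    return sum((x - mean) ** k for x in data) / len(data)


def adjusted_skewness(data):
    """Bias-adjusted sample skewness (assumed to match the reported value)."""
    n = len(data)
    g1 = central_moment(data, 3) / central_moment(data, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def adjusted_excess_kurtosis(data):
    """Bias-adjusted excess kurtosis (0 for a normal distribution)."""
    n = len(data)
    g2 = central_moment(data, 4) / central_moment(data, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


mean_hours = statistics.fmean(scores)
median_hours = statistics.median(scores)
skew = adjusted_skewness(scores)
kurt = adjusted_excess_kurtosis(scores)
print(round(mean_hours, 3), median_hours, round(skew, 3), round(kurt, 3))
```

A mean above the median together with a positive skewness value points in the same direction as the long right tail in the histogram.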

2b) Based on the shape of the variable’s distribution, which measure of central tendency
most accurately represents the distribution? Why? (2 marks)

Based on the positively skewed, asymmetric shape of the distribution, the median provides a
better estimate of central tendency because, unlike the mean, it is not pulled toward the
extreme scores in the right tail.

Boxplot: Number of hours spent sitting

3) Create a box-plot of the variable using JASP. Interpret the box-plot (4 marks).
Box plots are a useful tool for interpreting the variability within a data set. Firstly, the
line in the middle of the box shows that the median of the distribution is 8 hours spent
sitting, which is consistent with the earlier findings. Secondly, the short box, which
represents the middle 50% of the data, shows that half of the scores are concentrated between
6 and 10 hours spent sitting (an interquartile range of 4 hours). This is in line with the high
peaks for these scores on the histogram. Thirdly, the longer top whisker, which represents the
top 25% of the data, indicates a large range between the upper quartile (10 hours spent
sitting) and the largest value of the distribution (15 hours spent sitting). In contrast, the
shorter bottom whisker, which represents the bottom 25% of the data, indicates a smaller range
between the lower quartile (6 hours spent sitting) and the smallest value of the distribution
(3 hours spent sitting). This suggests that there is more variability among the scores in the
top 25% of the distribution than among those in the bottom 25%. Moreover, the range of the
distribution (15 - 3 = 12 hours spent sitting) can be calculated from the largest and smallest
values. Lastly, the fact that the top whisker is longer than the bottom whisker highlights that
the data is positively skewed (scores are densely packed at the lower end and more spread out
at the upper end). To provide context, if both whiskers were the same length and the median sat
in the middle of the box, the bottom 25% and top 25% would be equally spread and the
distribution would be approximately symmetrical.
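
The values read off the box plot can be reproduced with a minimal Python sketch. Quartile conventions differ between packages; this one uses the default method of `statistics.quantiles`, which is an assumption and is not necessarily the rule JASP applies. The scores are rebuilt from the frequency table.

```python
import statistics

# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]

# Quartiles with the default ("exclusive") method of statistics.quantiles.
q1, median, q3 = statistics.quantiles(scores, n=4)
smallest, largest = min(scores), max(scores)

iqr = q3 - q1                      # length of the box
top_whisker = largest - q3         # spread of the top 25%
bottom_whisker = q1 - smallest     # spread of the bottom 25%
data_range = largest - smallest

# Conventional 1.5 * IQR fences: scores beyond them would be drawn as outliers.
outliers = [x for x in scores if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(smallest, q1, median, q3, largest)
print(iqr, top_whisker, bottom_whisker, data_range, outliers)
```

A top whisker that is longer than the bottom whisker is the numerical counterpart of the positive skew described above.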

4) Convert the data to standardized scores (Hint: use Excel to assist with your
calculations) (3 marks). Please show the standardized scores, the formula, and one
sample calculation for this step but no need to include all of your work for the data
conversion. Did you use the population or sample formula? Justify your answer (1
mark).
Standardized scores:
Number of hours spent sitting   z-score (SD)
7 -0.43
8 -0.06
7 -0.43
8 -0.06
7 -0.43
7 -0.43
8 -0.06
9 0.31
5 -1.18
10 0.68
14 2.17
8 -0.06
11 1.06
10 0.68
14 2.17
8 -0.06
6 -0.81
9 0.31
8 -0.06
6 -0.81
15 2.54
6 -0.81
5 -1.18
10 0.68
6 -0.81
10 0.68
10 0.68
5 -1.18
6 -0.81
6 -0.81
8 -0.06
6 -0.81
12 1.43
8 -0.06
9 0.31
7 -0.43
14 2.17
5 -1.18
5 -1.18
8 -0.06
7 -0.43
12 1.43
12 1.43
6 -0.81
5 -1.18
8 -0.06
13 1.80
9 0.31
7 -0.43
5 -1.18
6 -0.81
8 -0.06
10 0.68
5 -1.18
9 0.31
10 0.68
5 -1.18
15 2.54
11 1.06
5 -1.18
6 -0.81
10 0.68
7 -0.43
9 0.31
9 0.31
3 -1.92
6 -0.81
6 -0.81
9 0.31
10 0.68
9 0.31
13 1.80
7 -0.43
8 -0.06
8 -0.06
5 -1.18
5 -1.18
10 0.68
6 -0.81

Formula to transform raw score into z-score:

$$z = \frac{X - \bar{X}}{s}$$

Sample calculation:

$$z = \frac{7 - 8.16}{2.69} = \frac{-1.16}{2.69} = -0.43 \text{ SD (2 d.p.)}$$

The sample formula was used to standardize the scores because the population parameters were
unknown. If data had been gathered from the entire population (the whole KIN 206 class), the
population formula could have been used. In this instance, since data were gathered from only
79 members of the class, the sample mean and sample standard deviation were used.
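
The conversion can be sketched in Python as an alternative to the Excel calculation. This is a minimal illustration that assumes the raw scores can be rebuilt from the frequency table; `statistics.stdev` uses the sample (n - 1) denominator and `statistics.pstdev` the population (n) denominator, so the two formulas can be compared directly.

```python
import statistics

# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]

mean_hours = statistics.fmean(scores)
sample_sd = statistics.stdev(scores)        # n - 1 denominator (used here)
population_sd = statistics.pstdev(scores)   # n denominator (for comparison)


def z_score(x, mean, sd):
    """Standardize a raw score: distance from the mean in SD units."""
    return (x - mean) / sd


for hours in sorted(counts):
    print(hours, round(z_score(hours, mean_hours, sample_sd), 2))
```

With 79 scores the two standard deviations differ only in the second decimal place, so the choice of formula changes the z-scores very little.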
5) Does the conversion to a standard score change the distribution of the data?
Explain. (2 marks).
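Converting the data to standard scores does not change the shape of the distribution because
it is a linear transformation: each score has the mean subtracted from it and is then divided
by the standard deviation. Since the conversion simply expresses each score as a number of
standard deviations away from the mean, the relative position of the scores within the
distribution does not change. Only the scale changes (the mean becomes 0 and the standard
deviation becomes 1). Hence, the shape of the distribution remains the same, whilst the
standardized scores allow for comparisons between variables.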
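
A minimal Python sketch can illustrate this, assuming the raw scores are rebuilt from the frequency table. It standardizes the scores and compares a simple moment-based skewness before and after; the helper name `moment_skewness` is illustrative.

```python
import math
import statistics

# Frequencies copied from the JASP table; raw scores are rebuilt from them.
counts = {3: 1, 5: 12, 6: 13, 7: 9, 8: 13, 9: 9, 10: 10,
          11: 2, 12: 3, 13: 2, 14: 3, 15: 2}
scores = [hours for hours, n in counts.items() for _ in range(n)]


def moment_skewness(data):
    """Simple (unadjusted) skewness: third central moment / SD cubed."""
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / len(data)
    m3 = sum((x - mean) ** 3 for x in data) / len(data)
    return m3 / m2 ** 1.5


mean_hours = statistics.fmean(scores)
sample_sd = statistics.stdev(scores)
z_scores = [(x - mean_hours) / sample_sd for x in scores]

print(round(statistics.fmean(z_scores), 6), round(statistics.stdev(z_scores), 6))
print(round(moment_skewness(scores), 3), round(moment_skewness(z_scores), 3))
```

The mean and standard deviation move to 0 and 1, while the skewness is the same before and after, which is what an unchanged shape means in numerical terms.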
6) How many hours sitting within a day corresponds to z scores of -2, -1, 0, 1 & 2. Show
your work (5 marks).
z-score   Calculation                                                 Hours spent sitting (answer)
-2        $-2 = \frac{X - 8.16}{2.69}$, so $-5.38 = X - 8.16$         $X = 2.78$ hours
-1        $-1 = \frac{X - 8.16}{2.69}$, so $-2.69 = X - 8.16$         $X = 5.47$ hours
0         $0 = \frac{X - 8.16}{2.69}$, so $0 = X - 8.16$              $X = 8.16$ hours
1         $1 = \frac{X - 8.16}{2.69}$, so $2.69 = X - 8.16$           $X = 10.85$ hours
2         $2 = \frac{X - 8.16}{2.69}$, so $5.38 = X - 8.16$           $X = 13.54$ hours
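
The rearrangement used in each row, X = mean + z × s, can be checked with a short Python sketch. It uses the rounded mean (8.16) and standard deviation (2.69) from the calculations above; the function name `raw_score` is illustrative.

```python
MEAN_HOURS = 8.16   # sample mean, rounded to 2 d.p. as in the calculations
SD_HOURS = 2.69     # sample standard deviation, rounded to 2 d.p.


def raw_score(z, mean=MEAN_HOURS, sd=SD_HOURS):
    """Invert z = (X - mean) / sd to recover the raw score X."""
    return mean + z * sd


hours_by_z = {z: round(raw_score(z), 2) for z in (-2, -1, 0, 1, 2)}
for z, hours in hours_by_z.items():
    print(z, hours)
```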

7) What percentage of the population engages in sitting between 9 and 11 hours per
day (3 marks)? Show your work.
Hours spent sitting   Calculation                                          z-score (SD)   Area between mean and z
9                     $z = \frac{9 - 8.16}{2.69} = \frac{0.84}{2.69}$      0.31           12.17%
11                    $z = \frac{11 - 8.16}{2.69} = \frac{2.84}{2.69}$     1.06           35.54%

Therefore, the percentage of the population that engages in sitting between 9 and 11 hours per
day is:
35.54% - 12.17% = 23.37%
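
The table lookup can be cross-checked with a minimal Python sketch using the standard library's `statistics.NormalDist`. Like the z-table, it assumes that hours spent sitting are normally distributed in the population, and it uses the rounded mean (8.16) and standard deviation (2.69) from the calculations above.

```python
from statistics import NormalDist

MEAN_HOURS = 8.16
SD_HOURS = 2.69
std_normal = NormalDist()  # standard normal: mean 0, SD 1

# z-scores rounded to 2 d.p., as they would be looked up in a z-table.
z_low = round((9 - MEAN_HOURS) / SD_HOURS, 2)
z_high = round((11 - MEAN_HOURS) / SD_HOURS, 2)

# Area between the mean (z = 0) and each z-score.
area_low = std_normal.cdf(z_low) - 0.5
area_high = std_normal.cdf(z_high) - 0.5

# Both scores lie above the mean, so the areas are subtracted.
between = area_high - area_low
print(z_low, z_high, round(100 * between, 2))
```

The areas are subtracted rather than added because 9 and 11 hours both fall on the same side of the mean.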
