
Section 1: Central Tendency, Spread, and Graphing

Dependent variable: Global obesity rates
Independent variable #1: Physical activity
Independent variable #2: Sugar intake
Obesity in Canada and the top 10 most populous countries (2016)
Country          % of the population that were obese
China            6.2
India            3.9
United States    36.2
Indonesia        6.9
Pakistan         8.6
Brazil           22.1
Nigeria          8.9
Bangladesh       3.6
Russia           23.1
Mexico           28.9
Canada           29.4

Adults that had insufficient physical activity levels in Canada and the top 10 most populous countries (2016)
Country          % of adults that had insufficient physical activity levels
China            15
Russia           15
Nigeria          25
Bangladesh       25
Mexico           30
Canada           30
India            35
Indonesia        35
Pakistan         35
United States    40
Brazil           50

Average sugar consumed daily in Canada and the top 10 most populous countries (2016)
Country          Average sugar consumed daily (g)
India            5.1
Indonesia        15.2
Bangladesh       15.3
China            15.7
Russia           20
Nigeria          21.9
Brazil           47.6
Pakistan         68.5
Canada           89.1
Mexico           92.5
United States    126.4

Central Tendency:
Measure               Obesity (%)    Insufficient activity (%)    Sugar consumed daily (g)
Mean                  16.16363636    30.45454545                  47.02727273
Median                8.9            30                           21.9
Mode                  none           35                           none

Spread:
Measure               Obesity (%)    Insufficient activity (%)    Sugar consumed daily (g)
Range                 32.6           35                           121.3
Q1                    6.55           25                           15.5
Q3                    26             35                           78.8
IQR                   19.45          10                           63.3
Variance              129.5932231    97.52066116                  1520.00562
Standard deviation    11.38390193    9.875254992                  38.98724945

Obesity: This data is skewed right, with a mean of 16.16 that is larger than the median of 8.9. The mean is the average of all of the data points. Since the mean is very high compared to the median, it can be concluded that there are a few outlying large numbers at the upper end of the scale. The median is relatively low, which implies that most of the data is at the lower end of the scale. The histogram and box plot show that this data is skewed right because there is much more data in the first column of the histogram than in the other columns. In the box plot the median is far toward the bottom of the box and the high-end whisker is longer than the low-end whisker. The interquartile range is 19.45, which shows that the data in the middle is spread out. The standard deviation (11.38) and variance show that most points are relatively close to the mean, but some are very far away, which makes these numbers larger.

Physical activity: This data is slightly skewed right because the mean is slightly more than the median. Since the mean and median are so close, it can be implied that there are not any extreme outliers pulling the mean in any direction. The mean (30.45) is very close to the median (30) but slightly higher, which means that slightly more of the data points are low than high. The histogram shows that the data is almost normally distributed, but more data is in the lower two bins than in the higher two, showing a slight skew. The box and whisker plot also shows this because the top whisker is slightly longer than the bottom whisker, indicating a right skew. The interquartile range is 10, implying that the middle points of the data are relatively close together. The standard deviation is about 9.9, which is small compared to the range of the data points. This likely means the data is not very spread out through the entire set.

Sugar intake: The mean of this data is much higher than the median. This is likely caused by a large outlier increasing the mean. The median is 21.9, which is relatively low compared to the whole set of data. This is because there are many lower numbers and only a few very high numbers. The histogram and box plot clearly show the data is right skewed. In the histogram, the first bin has the most points in it, indicating skewed data, and the box and whisker plot shows the mean is higher than the median and the top whisker is longer than the bottom whisker, indicating a right skew. The interquartile range of this data is 63.3, which is relatively large. This implies that the data is relatively spread out in the middle of the set. The variance and standard deviation are also high, which implies that the data is very spread out through the entire set.
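The summary measures above can be rechecked with a short script. This is a minimal sketch rather than the method actually used for the report: it assumes the quartiles came from an inclusive, spreadsheet-style interpolation and that the variance and standard deviation are population values (divide by n), since those choices reproduce the reported obesity figures. The function name `summarize` is hypothetical.

```python
import statistics

# Obesity rates (%) copied from the first table above.
obesity = [6.2, 3.9, 36.2, 6.9, 8.6, 22.1, 8.9, 3.6, 23.1, 28.9, 29.4]


def summarize(values):
    """Return the central tendency and spread measures used in this section."""
    # Assumption: method="inclusive" interpolates quartiles the way
    # spreadsheet QUARTILE functions do.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    modes = statistics.multimode(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # If every distinct value ties for most common, report no mode.
        "mode": modes if len(modes) < len(set(values)) else None,
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        # Assumption: population (divide by n) variance and standard deviation.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


summary = summarize(obesity)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Passing the activity or sugar column to `summarize` gives the other two columns of the summary table. A mean well above the median, as in the obesity and sugar data, is the numeric sign of the right skew described above.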
Section 2: Two Variable Measures with Linear Models and Correlations
Table 2: Adults with insufficient activity levels and the percentage of each population that are obese in Canada and the top 10 most populous countries (2016)
Country          % of adults that had insufficient physical activity levels    % of the population that were obese
China            15                                                            6.2
India            35                                                            3.9
United States    40                                                            36.2
Indonesia        35                                                            6.9
Pakistan         35                                                            8.6
Brazil           50                                                            22.1
Nigeria          25                                                            8.9
Bangladesh       25                                                            3.6
Russia           15                                                            23.1
Mexico           30                                                            28.9
Canada           30                                                            29.4

Table 3: The average daily sugar consumption of individuals and the percentage of each population that are obese in Canada and the top 10 most populous countries (2016)
Country          Average sugar consumed daily (g)    % of the population that were obese
China            15.7                                6.2
India            5.1                                 3.9
United States    126.4                               36.2
Indonesia        15.2                                6.9
Pakistan         68.5                                8.6
Brazil           47.6                                22.1
Nigeria          21.9                                8.9
Bangladesh       15.3                                3.6
Russia           20                                  23.1
Mexico           92.5                                28.9
Canada           89.1                                29.4

[Figure: Activity levels and Obesity (2016). Scatter plot of % that were obese against % with insufficient activity levels, with line of best fit y = 0.2966x + 7.1305, R² = 0.0662.]

[Figure: Daily Sugar Consumption and Obesity (2016). Scatter plot of % that were obese against average grams of sugar consumed daily, with line of best fit y = 0.2441x + 4.682, R² = 0.6992.]

Activity levels and obesity:
Line of best fit: y = 0.2966x + 7.1305
Correlation Coefficient (R): 0.2573

The correlation coefficient of this data is 0.2573, which classifies the relationship as weak and positive. Therefore, insufficient activity levels and obesity are not highly correlated. The slope of the line of best fit is 0.2966. This means that for every 1 percentage point increase in adults with insufficient activity, the predicted obesity rate increases by 0.2966 percentage points. The y-intercept of this line is 7.1305. This means that if the percent of adults with insufficient activity levels were 0, the predicted percent of obesity would be 7.1305.

Daily sugar consumption and obesity:
Line of best fit: y = 0.2441x + 4.682
Correlation Coefficient (R): 0.8362

The correlation coefficient of this data is 0.8362, which classifies the relationship as strong and positive. Therefore, daily sugar intake and obesity could be highly correlated. The slope of the line of best fit is 0.2441. This means that for every 1 gram increase in average daily sugar intake, the predicted obesity rate increases by 0.2441 percentage points. The y-intercept of the line is 4.682. This means that if, theoretically, the sugar intake were 0 grams, the predicted obesity percentage would be 4.682.
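The two lines of best fit and their correlation coefficients can be reproduced from Tables 2 and 3 with an ordinary least-squares calculation. The sketch below is illustrative only: it assumes the chart trendlines were ordinary least-squares fits, and the function name `least_squares` is hypothetical.

```python
import math

# (x, y) pairs copied from Table 2: % insufficient activity, % obese.
activity = [(15, 6.2), (35, 3.9), (40, 36.2), (35, 6.9), (35, 8.6), (50, 22.1),
            (25, 8.9), (25, 3.6), (15, 23.1), (30, 28.9), (30, 29.4)]
# (x, y) pairs copied from Table 3: grams of sugar per day, % obese.
sugar = [(15.7, 6.2), (5.1, 3.9), (126.4, 36.2), (15.2, 6.9), (68.5, 8.6),
         (47.6, 22.1), (21.9, 8.9), (15.3, 3.6), (20, 23.1), (92.5, 28.9),
         (89.1, 29.4)]


def least_squares(pairs):
    """Return (slope, intercept, r) for an ordinary least-squares line."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross products about the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


for label, pairs in (("activity", activity), ("sugar", sugar)):
    slope, intercept, r = least_squares(pairs)
    print(f"{label}: y = {slope:.4f}x + {intercept:.4f}, R = {r:.4f}")
```

The slope is the predicted change in the obesity percentage for a one-unit change in x (one percentage point of inactivity, or one gram of sugar), which is how the slopes above are read.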
Section 3: Relevant Points
Physical Activity: Adults with insufficient activity levels and the percentage of each population that are obese in Canada and the top 10 most populous countries (2016)
Country          % of adults that had insufficient physical activity levels    % of the population that were obese    Model      Residuals
China            15                                                            6.2                                    11.5795    -5.3795
India            35                                                            3.9                                    17.5115    -13.6115
United States    40                                                            36.2                                   18.9945    17.2055
Indonesia        35                                                            6.9                                    17.5115    -10.6115
Pakistan         35                                                            8.6                                    17.5115    -8.9115
Brazil           50                                                            22.1                                   21.9605    0.1395
Nigeria          25                                                            8.9                                    14.5455    -5.6455
Bangladesh       25                                                            3.6                                    14.5455    -10.9455
Russia           15                                                            23.1                                   11.5795    11.5205
Mexico           30                                                            28.9                                   16.0285    12.8715
Canada           30                                                            29.4                                   16.0285    13.3715

Sugar Intake: The average daily sugar consumption of individuals and the percentage of each population that are obese in Canada and the top 10 most populous countries (2016)
Country          Average sugar consumed daily (g)    % of the population that were obese    Model       Residuals
China            15.7                                6.2                                    8.51437     -2.31437
India            5.1                                 3.9                                    5.92691     -2.02691
United States    126.4                               36.2                                   35.53624    0.66376
Indonesia        15.2                                6.9                                    8.39232     -1.49232
Pakistan         68.5                                8.6                                    21.40285    -12.80285
Brazil           47.6                                22.1                                   16.30116    5.79884
Nigeria          21.9                                8.9                                    10.02779    -1.12779
Bangladesh       15.3                                3.6                                    8.41673     -4.81673
Russia           20                                  23.1                                   9.564       13.536
Mexico           92.5                                28.9                                   27.26125    1.63875
Canada           89.1                                29.4                                   26.43131    2.96869
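The Model and Residuals columns follow directly from the lines of best fit: the model value is the line evaluated at each x, and the residual is the observed obesity rate minus that model value. A minimal sketch, assuming the rounded slopes and intercepts reported earlier were the ones used (they reproduce the tabulated values); the names `residual_table` and `extremes` are hypothetical.

```python
# Rounded slope and intercept reported for each line of best fit.
ACTIVITY_LINE = (0.2966, 7.1305)
SUGAR_LINE = (0.2441, 4.682)

# country -> (x, % obese), copied from the tables above.
activity = {
    "China": (15, 6.2), "India": (35, 3.9), "United States": (40, 36.2),
    "Indonesia": (35, 6.9), "Pakistan": (35, 8.6), "Brazil": (50, 22.1),
    "Nigeria": (25, 8.9), "Bangladesh": (25, 3.6), "Russia": (15, 23.1),
    "Mexico": (30, 28.9), "Canada": (30, 29.4),
}
sugar = {
    "China": (15.7, 6.2), "India": (5.1, 3.9), "United States": (126.4, 36.2),
    "Indonesia": (15.2, 6.9), "Pakistan": (68.5, 8.6), "Brazil": (47.6, 22.1),
    "Nigeria": (21.9, 8.9), "Bangladesh": (15.3, 3.6), "Russia": (20, 23.1),
    "Mexico": (92.5, 28.9), "Canada": (89.1, 29.4),
}


def residual_table(data, line):
    """Map each country to (model value, residual = observed - model)."""
    slope, intercept = line
    table = {}
    for country, (x, observed) in data.items():
        model = slope * x + intercept
        table[country] = (model, observed - model)
    return table


def extremes(table):
    """Return the countries farthest below and farthest above the line."""
    lowest = min(table, key=lambda c: table[c][1])
    highest = max(table, key=lambda c: table[c][1])
    return lowest, highest


activity_residuals = residual_table(activity, ACTIVITY_LINE)
sugar_residuals = residual_table(sugar, SUGAR_LINE)

for name, table in (("activity", activity_residuals), ("sugar", sugar_residuals)):
    lowest, highest = extremes(table)
    print(name, "farthest below:", lowest, round(table[lowest][1], 5))
    print(name, "farthest above:", highest, round(table[highest][1], 5))
```

Listing the largest negative and largest positive residual separately makes it easy to see which points sit farthest from the line on each side before choosing one to treat as an outlier.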

Physical Activity:
[Figure: Residuals. Residual plot of distance from line of best fit against number value on line of best fit.]

As seen on this residual plot, point (18.9945, 17.2055) is the farthest from the line of best fit. On the linear model this is point (40, 36.2), which is the United States. This is the most outlying point in the data set. It affects the model because it lies far from where the other data in the set predict it should be, so it changes the line of best fit.

Mean                  30.45
Standard Deviation    9.875
x-value               40
Z-Score               0.97
Percentile            83rd

In a normal distribution, this x-value (40) would have a z-score of 0.97 and be in the 83rd percentile.

Sugar Intake:
[Figure: Residuals. Residual plot of distance from line of best fit against number value on line of best fit.]

This residual plot shows that point (21.40285, -12.80285) is the farthest below the line of best fit. On the linear model, this is point (68.5, 8.6), which is Pakistan. It is treated as the outlier in this data set. Because it is very far from where it is expected to be, it can change the results of the data.

Mean                  47.03
Standard Deviation    38.99
x-value               68.5
Z-Score               0.55
Percentile            71st

In a normal distribution, this x-value (68.5) would have a z-score of 0.55 and be in the 71st percentile.

Physical Activity (outlier removed):
Country          % of adults that had insufficient physical activity levels    % of the population that were obese
China            15                                                            6.2
India            35                                                            3.9
Indonesia        35                                                            6.9
Pakistan         35                                                            8.6
Brazil           50                                                            22.1
Nigeria          25                                                            8.9
Bangladesh       25                                                            3.6
Russia           15                                                            23.1
Mexico           30                                                            28.9
Canada           30                                                            29.4

The United States data has been removed.

Sugar Intake (outlier removed):
Country          Average sugar consumed daily (g)    % of the population that were obese
China            15.7                                6.2
India            5.1                                 3.9
United States    126.4                               36.2
Indonesia        15.2                                6.9
Brazil           47.6                                22.1
Nigeria          21.9                                8.9
Bangladesh       15.3                                3.6
Russia           20                                  23.1
Mexico           92.5                                28.9
Canada           89.1                                29.4

The Pakistan data has been removed.
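The z-scores and percentiles for the two x-values discussed in this section can be rechecked from the stated means and standard deviations. This is a sketch under the same assumption the report makes, namely that the x-values follow a normal distribution; the helper name `z_and_percentile` is hypothetical.

```python
from statistics import NormalDist


def z_and_percentile(x, mean, std_dev):
    """Return the z-score of x and its percentile under a normal model."""
    z = (x - mean) / std_dev
    # The standard normal CDF gives the share of values below z.
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Inputs are the rounded mean, standard deviation and x-value from the report.
activity_z, activity_pct = z_and_percentile(40, 30.45, 9.875)
sugar_z, sugar_pct = z_and_percentile(68.5, 47.03, 38.99)

print(f"activity: z = {activity_z:.2f}, percentile = {activity_pct:.0f}")
print(f"sugar:    z = {sugar_z:.2f}, percentile = {sugar_pct:.0f}")
```

Both z-scores are below 1, so neither x-value is unusual on its own axis. The points stand out because of their residuals, not because of their x-values.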

Linear model with outlier:

[Figure: Activity levels and Obesity (2016). % that were obese against % with insufficient activity levels; y = 0.2966x + 7.1305, R² = 0.0662.]

[Figure: Daily Sugar Consumption and Obesity (2016). % that were obese against average grams of sugar consumed daily; y = 0.2441x + 4.682, R² = 0.6992.]

Linear model without outlier:

[Figure: Activity levels and Obesity (2016). % that were obese against % with insufficient activity levels; y = 0.1108x + 10.89, R² = 0.0121.]

[Figure: Daily Sugar Consumption and Obesity (2016). % that were obese against grams of sugar daily; y = 0.2628x + 5.1253, R² = 0.8218.]

The linear model for activity levels and obesity changed slightly when the outlier was removed. The correlation actually decreased, from 0.2573 to about 0.11 (R² fell from 0.0662 to 0.0121). This could be because the outlier was pulling the line of best fit toward the higher numbers, balancing it between the higher and lower values. When it was removed, the line of best fit was pulled away from the higher points, decreasing the correlation to an even weaker positive relationship.

The linear model for daily sugar consumption and obesity also changed when the outlier was removed. The correlation increased from 0.8362 to 0.9065, so it is still a strong positive relationship and is now stronger. This likely happened because the outlier was much lower than it was expected to be, and it was pulling the line of best fit toward itself and away from the values that correlate more closely with each other. When it was removed, the line of best fit moved toward the more correlated values.
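The effect of removing each outlier can be checked by fitting the line twice, once on all eleven countries and once with the chosen country dropped. The sketch below assumes ordinary least-squares fits; the names `fit` and `compare` are hypothetical.

```python
import math

# country -> (x, % obese), copied from Tables 2 and 3.
activity = {
    "China": (15, 6.2), "India": (35, 3.9), "United States": (40, 36.2),
    "Indonesia": (35, 6.9), "Pakistan": (35, 8.6), "Brazil": (50, 22.1),
    "Nigeria": (25, 8.9), "Bangladesh": (25, 3.6), "Russia": (15, 23.1),
    "Mexico": (30, 28.9), "Canada": (30, 29.4),
}
sugar = {
    "China": (15.7, 6.2), "India": (5.1, 3.9), "United States": (126.4, 36.2),
    "Indonesia": (15.2, 6.9), "Pakistan": (68.5, 8.6), "Brazil": (47.6, 22.1),
    "Nigeria": (21.9, 8.9), "Bangladesh": (15.3, 3.6), "Russia": (20, 23.1),
    "Mexico": (92.5, 28.9), "Canada": (89.1, 29.4),
}


def fit(pairs):
    """Return (slope, intercept, r) for an ordinary least-squares line."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)


def compare(data, outlier):
    """Fit with every country, then again with `outlier` left out."""
    with_outlier = fit(list(data.values()))
    without_outlier = fit([p for c, p in data.items() if c != outlier])
    return with_outlier, without_outlier


for label, data, outlier in (("activity", activity, "United States"),
                             ("sugar", sugar, "Pakistan")):
    before, after = compare(data, outlier)
    print(f"{label}: R {before[2]:.4f} -> {after[2]:.4f} without {outlier}")
```

Removing a point that lies far above the line at a high x-value (the United States) flattens the slope and weakens the correlation, while removing a point far below the line (Pakistan) tightens the remaining points around it.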
Section 4: Classification of Correlation Type
The Correlation Between Activity Levels and Obesity
[Figure: Activity levels and Obesity (2016). % that were obese against % with insufficient activity levels; y = 0.1108x + 10.89, R² = 0.0121.]

This relationship between activity levels and obesity could be classified as reverse cause and effect: the more overweight someone is, the less activity they are likely to do. The linear model shows a slight increase in obesity as more adults have insufficient activity levels. Since the correlation is weak and positive, the correlation type can't be certain, but this is likely the case.

The Correlation Between Daily Sugar Consumption and Obesity
[Figure: Daily Sugar Consumption and Obesity (2016). % that were obese against grams of sugar daily; y = 0.2628x + 5.1253, R² = 0.8218.]

The relationship between daily sugar consumption and obesity can be classified as a cause and effect relationship. When adults consume excessive amounts of sugar, their chances of being obese increase. The linear model clearly shows that the more sugar is consumed in a country, the larger the percentage of that population that is obese. This relationship has a strong positive correlation, so it is very likely that this correlation type is correct.
