
12 IBCP-2

Internal Assessment

Darren Boesono

12 IBCP-2

Subject:

Mathematics (Standard Level)

Topic:

A bivariate analysis of the correlation between GDP per capita and the death rate attributed to
smoking in different countries, using the Pearson correlation coefficient.


1. INTRODUCTION

The consumption of cigarettes is a very controversial topic in today’s world.

Tobacco use causes more than 7 million deaths around the world every year, which is roughly
19,000 deaths every day. In the United States, smoking accounts for about one in five deaths, or
approximately 1,300 deaths every day. Smoking causes stroke and coronary heart disease, which
are often fatal. According to researchers, smokers die on average about 10 years earlier than
non-smokers. Indonesia is often described as one of the countries with the highest cigarette
consumption in the world. This may be caused by numerous factors, such as a lack of health
education among the people of Indonesia or the high affordability of cigarettes there. However,
many other countries also have high cigarette consumption, some even higher than Indonesia.
Because tobacco use in Indonesia is so high, I wanted to know why cigarette consumption there is
relatively high. I therefore went to a local mini-market to check the prices of various cigarette
packs, and found that cigarette packs in Indonesia are highly affordable: they are cheap enough that
even low-income earners and people living in poverty can buy them. In my opinion, this is related to
Indonesia’s economic development. Because Indonesia’s economy is not yet fully developed, the
country relies on revenue from the consumption of cigarettes. Since cigarettes are addictive, most
smokers use tobacco regularly because they are addicted to nicotine, even in the face of negative
health consequences. As a result of this high consumption, smoking is one of the main causes of
death in many countries.

One common indicator of a country’s development is its GDP per capita. GDP per capita is a
measure of a country’s economic output divided by its population. It is a simple measure of a
country’s standard of living and prosperity, and therefore indicates how developed a country is.
I believe that the level of economic development of a country affects tobacco use in that country,
which in turn contributes to its death rate. In this investigation, I will focus on the portion of the
death rate within a country that is caused by smoking.

In summary, the aim of this investigation is to examine the correlation between the GDP per
capita of different countries and their death rates attributed to smoking. By the end of this
investigation, I will draw a conclusion about the correlation between GDP per capita and the
death rate caused by cigarette consumption within a country, using data obtained from published
research and the mathematics I have learned at school so far.


2. DATA COLLECTION

This investigation uses secondary data. To ensure the reliability of the data, all of it was collected
from reliable sources, such as official websites and reports from organizations and governments
that publish statistics on both variables in this investigation.

➢ The GDP per capita of different countries from 1990 to 2017 was collected from the online
database “Our World In Data”.
➢ The death rate attributed to smoking was obtained from the same source as the GDP per
capita (“Our World In Data”).

The data are recorded annually for almost every country from 1990 to 2017, and the values of
both variables change from year to year. To keep the comparison consistent, only data from the
year 2017 are used for both variables, so the year is held constant throughout this investigation.

3. RAW DATA

The table below displays the GDP per capita of 30 different countries in 2017 and the annual
number of deaths attributed to smoking per 100,000 people in each country.

Country | GDP per capita in 2017, constant 2011 international $ (x) | Annual number of deaths attributed to smoking per 100,000 people in 2017 (y)
Afghanistan 1803.9 89.89
Ireland 67335.3 76.08
Poland 27216.4 108.88
Malaysia 26808.1 99.55
Nigeria 5338.4 17.72
United States 54225.4 75.61
Turkey 25129.3 89.08
South Korea 35938.4 57.68
France 38605.7 57.4
Denmark 46682.5 97.12
Germany 45229.2 73.1
New Zealand 36085.8 53.66
Mexico 17336.5 47.73
Kenya 2993 50.34
Laos 6397.4 141.19
Philippines 7599.2 140.13
Saudi Arabia 49045.4 34.81
Brazil 14103.5 74.85
China 15308.71 120.22
Australia 44648.7 50.83
Bahamas 27717.9 42.9
Pakistan 5034.7 128.66
Thailand 16277.7 58.4
Cambodia 3645.1 128.84
Indonesia 11188.7 116.48
Russia 24765.1 130.21
Iraq 15663.1 54.05
India 6426.7 91.58
Singapore 85535.4 35.63
Japan 39002.2 52.16
Total Σx Σy
Value 803087.41 2394.78
Table 1. Raw data of variables
https://ourworldindata.org/smoking
https://ourworldindata.org/grapher/gdp-per-capita-worldbank

➢ Data in Table 1: GDP per capita is expressed in constant 2011 international dollars ($), and the
deaths attributed to smoking are expressed per 100,000 people.
➢ Variable X: the GDP per capita of the 30 countries ($)
➢ Variable Y: the annual number of deaths attributed to smoking per 100,000 people


Graph 1. Scatter diagram of variable X and Y

As Table 1 shows, the country with the highest GDP per capita is Singapore (85535.4 international
dollars) and the country with the lowest is Afghanistan (1803.9 international dollars). The country
with the highest annual number of deaths attributed to smoking is Laos (141.19 per 100,000
people), while the lowest is Nigeria (17.72 per 100,000 people). The table itself shows no trend,
because the countries are listed in no particular order; both variables are recorded for the year 2017.
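To double-check the totals and the extremes described above, Table 1 can be typed into a short Python script. This is a minimal sketch, not part of the original working: the dictionary `DATA` and the other variable names are illustrative assumptions, and the values are copied from Table 1.

```python
# Values copied from Table 1 (year 2017):
# country -> (x = GDP per capita in constant 2011 international $,
#             y = deaths attributed to smoking per 100,000 people)
DATA = {
    "Afghanistan": (1803.9, 89.89),
    "Ireland": (67335.3, 76.08),
    "Poland": (27216.4, 108.88),
    "Malaysia": (26808.1, 99.55),
    "Nigeria": (5338.4, 17.72),
    "United States": (54225.4, 75.61),
    "Turkey": (25129.3, 89.08),
    "South Korea": (35938.4, 57.68),
    "France": (38605.7, 57.4),
    "Denmark": (46682.5, 97.12),
    "Germany": (45229.2, 73.1),
    "New Zealand": (36085.8, 53.66),
    "Mexico": (17336.5, 47.73),
    "Kenya": (2993.0, 50.34),
    "Laos": (6397.4, 141.19),
    "Philippines": (7599.2, 140.13),
    "Saudi Arabia": (49045.4, 34.81),
    "Brazil": (14103.5, 74.85),
    "China": (15308.71, 120.22),
    "Australia": (44648.7, 50.83),
    "Bahamas": (27717.9, 42.9),
    "Pakistan": (5034.7, 128.66),
    "Thailand": (16277.7, 58.4),
    "Cambodia": (3645.1, 128.84),
    "Indonesia": (11188.7, 116.48),
    "Russia": (24765.1, 130.21),
    "Iraq": (15663.1, 54.05),
    "India": (6426.7, 91.58),
    "Singapore": (85535.4, 35.63),
    "Japan": (39002.2, 52.16),
}

n = len(DATA)
sum_x = sum(x for x, _ in DATA.values())
sum_y = sum(y for _, y in DATA.values())

# Countries with the highest and lowest value of each variable
richest = max(DATA, key=lambda c: DATA[c][0])
poorest = min(DATA, key=lambda c: DATA[c][0])
most_deaths = max(DATA, key=lambda c: DATA[c][1])
fewest_deaths = min(DATA, key=lambda c: DATA[c][1])

print(f"n = {n}, sum of x = {sum_x:.2f}, sum of y = {sum_y:.2f}")
print(f"mean of x = {sum_x / n:.5f}, mean of y = {sum_y / n:.3f}")
print(f"highest GDP per capita: {richest}, lowest: {poorest}")
print(f"most smoking deaths: {most_deaths}, fewest: {fewest_deaths}")
```

The sums should agree with the Total row of Table 1 (803087.41 and 2394.78).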

4. PROCESSED DATA

➢ Simple Linear Regression

Linear regression is the process of finding the line that best fits a set of recorded data points. The
line can be used to predict output values for inputs that are not present in the recorded data set, on
the assumption that those outputs will lie on (or close to) the line of best fit. Simple linear
regression uses only one explanatory variable, while multiple linear regression uses more than one.
Simple linear regression is suitable for this investigation because it deals with one independent
variable (x) and one dependent variable (y).


➢ Simple Linear Regression Formula

The simple linear regression line has the form y = mx + b, where y is the dependent variable (the
variable plotted on the y-axis) and x is the independent variable (the variable plotted on the
x-axis). In this equation, m is the slope of the line and b is the y-intercept.

The formula for finding the gradient, or slope (m), of a regression line is:

$$ m = \frac{S_{xy}}{(S_x)^2} $$

$$ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$

$$ (S_x)^2 = \sum x^2 - \frac{(\sum x)^2}{n} $$

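The y-intercept (b) of a regression line is found from the point-slope form of the line, which
passes through the mean point (x̄, ȳ):

$$ y - \bar{y} = m(x - \bar{x}) $$

Setting x = 0 gives the y-intercept:

$$ b = \bar{y} - m\bar{x} $$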
Where:
➢ m : The slope of the regression line
➢ Sxy : The sum of the products of the deviations of x and y from their means, computed with the formula above
➢ (Sx)² : The sum of the squared deviations of x from its mean, computed with the formula above
➢ Σx : The sum of all of the x values
➢ Σy : The sum of all of the y values
➢ Σx² : The sum of the squares of all of the x values
➢ Σxy : The sum of each x value multiplied by its corresponding y value
➢ ȳ : The mean of all of the y values
➢ x̄ : The mean of all of the x values
➢ n : The total number of data points in the investigation, or 30.
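The summation formulas above can be checked on a small example before applying them to the real data. The Python sketch below is only an illustration: `fit_line` is a hypothetical helper, and the five data points are made up (they are not from this investigation) so that the arithmetic is easy to follow by hand.

```python
def fit_line(xs, ys):
    """Least-squares line y = m*x + b using the summation formulas above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    s_xy = sum_xy - sum_x * sum_y / n      # Sxy
    s_xx = sum_x2 - sum_x ** 2 / n         # (Sx)^2
    m = s_xy / s_xx                        # slope
    x_bar, y_bar = sum_x / n, sum_y / n    # mean point
    b = y_bar - m * x_bar                  # intercept, from y - y_bar = m(x - x_bar)
    return m, b

# Hypothetical toy data, chosen only to make the arithmetic easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = fit_line(xs, ys)
print(f"y = {m:.1f}x + {b:.1f}")                    # y = 0.6x + 2.2
print(f"prediction at x = 6: {m * 6 + b:.1f}")      # prediction at x = 6: 5.8
```

Here Sxy = 66 − (15)(20)/5 = 6 and (Sx)² = 55 − 15²/5 = 10, so m = 0.6; the line passes through the mean point (3, 4), giving b = 4 − 0.6 × 3 = 2.2. The value at x = 6 is a prediction for an input that is not in the data set.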

Row / Column 1 (x) 2 (y) 3 (x²) 4 (y²) 5 (xy)
1 1803.9 89.89 3254055.21 8080.2121 162152.571
2 67335.3 76.08 4534042626 5788.1664 5122869.624
3 27216.4 108.88 740732429 11854.8544 2963321.632
4 26808.1 99.55 718674225.6 9910.2025 2668746.355
5 5338.4 17.72 28498514.56 313.9984 94596.448
6 54225.4 75.61 2940394005 5716.8721 4099982.494
7 25129.3 89.08 631481718.5 7935.2464 2238518.044
8 35938.4 57.68 1291568595 3326.9824 2072926.912
9 38605.7 57.4 1490400072 3294.76 2215967.18
10 46682.5 97.12 2179255806 9432.2944 4533804.4
11 45229.2 73.1 2045680533 5343.61 3306254.52
12 36085.8 53.66 1302184962 2879.3956 1936364.028
13 17336.5 47.73 300554232.3 2278.1529 827471.145
14 2993 50.34 8958049 2534.1156 150667.62
15 6397.4 141.19 40926726.76 19934.6161 903248.906
16 7599.2 140.13 57747840.64 19636.4169 1064875.896
17 49045.4 34.81 2405451261 1211.7361 1707270.374
18 14103.5 74.85 198908712.3 5602.5225 1055646.975
19 15308.71 120.22 234356601.9 14452.8484 1840413.116
20 44648.7 50.83 1993506412 2583.6889 2269493.421
21 27717.9 42.9 768281980.4 1840.41 1189097.91
22 5034.7 128.66 25348204.09 16553.3956 647764.502
23 16277.7 58.4 264963517.3 3410.56 950617.68
24 3645.1 128.84 13286754.01 16599.7456 469634.684
25 11188.7 116.48 125187007.7 13567.5904 1303259.776
26 24765.1 130.21 613310178 16954.6441 3224663.671
27 15663.1 54.05 245332701.6 2921.4025 846590.555
28 6426.7 91.58 41302472.89 8386.8964 588557.186
29 85535.4 35.63 7316304653 1269.4969 3047626.302
30 39002.2 52.16 1521171605 2720.6656 2034354.752
Total (row 31) Σx Σy Σx² Σy² Σxy
Value 803087.41 2394.78 34081066451 226335.4992 55536758.68
Mean values 26769.58033 79.826 1136035548 7544.51664 1851225.289
Table 2. Processed data
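The column totals in Table 2 can be reproduced directly from the raw data. The Python sketch below assumes the values are entered in the same row order as Table 2; the names `X`, `Y` and `totals` are illustrative only. Because several entries in Table 2 are rounded, the recomputed totals may differ from the table in the last digits.

```python
# Table 2 inputs in row order 1-30 (year 2017)
# x = GDP per capita, y = deaths attributed to smoking per 100,000 people
X = [1803.9, 67335.3, 27216.4, 26808.1, 5338.4, 54225.4, 25129.3, 35938.4,
     38605.7, 46682.5, 45229.2, 36085.8, 17336.5, 2993, 6397.4, 7599.2,
     49045.4, 14103.5, 15308.71, 44648.7, 27717.9, 5034.7, 16277.7, 3645.1,
     11188.7, 24765.1, 15663.1, 6426.7, 85535.4, 39002.2]
Y = [89.89, 76.08, 108.88, 99.55, 17.72, 75.61, 89.08, 57.68,
     57.4, 97.12, 73.1, 53.66, 47.73, 50.34, 141.19, 140.13,
     34.81, 74.85, 120.22, 50.83, 42.9, 128.66, 58.4, 128.84,
     116.48, 130.21, 54.05, 91.58, 35.63, 52.16]
assert len(X) == len(Y) == 30

# Column totals of Table 2 (the Total row)
totals = {
    "x": sum(X),
    "y": sum(Y),
    "x^2": sum(x * x for x in X),
    "y^2": sum(y * y for y in Y),
    "xy": sum(x * y for x, y in zip(X, Y)),
}

n = len(X)
for column, total in totals.items():
    # Table 2 prints rounded values, so the last digits may differ slightly
    print(f"sum of {column:3}: {total:,.4f}   mean: {total / n:,.4f}")
```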


5. DATA CALCULATION

➢ Finding the slope or gradient of the regression line (m)

$$ m = \frac{S_{xy}}{(S_x)^2} $$

$$ S_{xy} = 55536758.68 - \frac{(803087.41)(2394.78)}{30} $$
$$ S_{xy} = 55536758.68 - 64107255.59 $$
$$ S_{xy} = -8570496.91 $$

$$ (S_x)^2 = 34081066451 - \frac{(803087.41)^2}{30} $$
$$ (S_x)^2 = 34081066451 - 21498312900 $$
$$ (S_x)^2 \approx 12582800000 $$

$$ m = \frac{-8570496.91}{12582800000} $$
$$ m \approx -0.000681 $$

➢ Finding the y-intercept of the regression line (b)

$$ y - \bar{y} = m(x - \bar{x}) $$

$$ y - 79.826 = -0.000681\,(x - 26769.58033) $$

$$ y - 79.826 = -0.000681x + 18.2301 $$

$$ y = -0.000681x + 18.2301 + 79.826 $$

$$ y = -0.000681x + 98.0561 $$

b = 98.0561

➢ Regression line equation

$$ y = mx + b $$

m = −0.000681
b = 98.0561

$$ y = -0.000681x + 98.0561 $$

Σxy is recorded in (column 5, row 31), Σx in (column 1, row 31), Σy in (column 2, row 31), Σx² in
(column 3, row 31) and Σy² in (column 4, row 31). Applying these values to the formula stated gives
a slope of m = −0.000681. The slope is then used to find the y-intercept from the point-slope form
y − ȳ = m(x − x̄), where ȳ and x̄ are the mean values of the two variables. The regression line passes
through the mean point (x̄, ȳ) = (26769.58033, 79.826), which gives a y-intercept of b = 98.0561.
The final equation of the line is y = −0.000681x + 98.0561. This regression line serves to predict the
annual number of deaths attributed to smoking in different countries from the independent variable
(GDP per capita). The y-intercept is the value the model predicts when x = 0, namely about 98.06
deaths per 100,000 people; since no country has a GDP per capita of 0, this value is an extrapolation
beyond the data rather than an observed quantity. As the value of x increases, the value of y
decreases, so the equation shows that the relationship between the dependent variable (y) and the
independent variable (x) is negative. The regression line can be drawn on the scatter diagram of the
data as the line of best fit, shown below:


Graph 2. The best fit line of the linear regression equation
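The slope and intercept can be reproduced from the column totals of Table 2. The Python sketch below is an illustration rather than the original working: it assumes the rounded totals printed in Table 2, keeps the slope unrounded, and uses a hypothetical helper `predict`. Indonesia’s 2017 values from Table 1 serve as a test point.

```python
# Column totals from Table 2 (n = 30 countries, year 2017)
n = 30
sum_x = 803087.41
sum_y = 2394.78
sum_x2 = 34081066451
sum_xy = 55536758.68

s_xy = sum_xy - sum_x * sum_y / n      # Sxy, about -8570496.91
s_xx = sum_x2 - sum_x ** 2 / n         # (Sx)^2, about 1.25828e10
m = s_xy / s_xx                        # slope of the regression line
x_bar, y_bar = sum_x / n, sum_y / n    # mean point (x_bar, y_bar)
b = y_bar - m * x_bar                  # intercept from y - y_bar = m(x - x_bar)

def predict(gdp_per_capita):
    """Deaths attributed to smoking per 100,000 people predicted by the line."""
    return m * gdp_per_capita + b

print(f"m = {m:.6f}, b = {b:.4f}")
# Indonesia (Table 1): x = 11188.7, observed y = 116.48
print(f"Indonesia: predicted {predict(11188.7):.2f}, observed 116.48")
```

With the unrounded slope the intercept comes out at about 98.06. The prediction for Indonesia (about 90 deaths per 100,000 people) is well below the observed 116.48, which illustrates that individual countries can lie far from the line.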

➢ The formula for finding Pearson’s correlation coefficient

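$$ r = \frac{S_{xy}}{S_x S_y} $$

$$ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$

$$ S_x = \sqrt{\sum x^2 - \frac{(\sum x)^2}{n}} $$

$$ S_y = \sqrt{\sum y^2 - \frac{(\sum y)^2}{n}} $$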


r-value Correlation
0 < |r| ≤ 0.25 Very weak
0.25 < |r| ≤ 0.5 Weak
0.5 < |r| ≤ 0.75 Moderate
0.75 < |r| ≤ 1 Strong

Table 3. Interpretation of the r-value

➢ Finding the Pearson’s correlation coefficient

$$ r = \frac{S_{xy}}{S_x S_y} $$

$$ S_{xy} = 55536758.68 - \frac{(803087.41)(2394.78)}{30} $$
$$ S_{xy} = 55536758.68 - 64107255.59 $$
$$ S_{xy} = -8570496.91 $$

$$ S_x = \sqrt{34081066451 - \frac{(803087.41)^2}{30}} $$
$$ S_x = \sqrt{34081066451 - 21498312900} $$
$$ S_x \approx \sqrt{12582800000} $$
$$ S_x \approx 112173 $$

$$ S_y = \sqrt{\sum y^2 - \frac{(\sum y)^2}{n}} $$
$$ S_y = \sqrt{226335.4992 - \frac{(2394.78)^2}{30}} $$
$$ S_y = \sqrt{226335.4992 - 191165.7083} $$
$$ S_y = \sqrt{35169.7909} $$
$$ S_y \approx 187.5361056 $$

$$ r = \frac{-8570496.91}{(112173)(187.5361056)} $$
$$ r \approx \frac{-8570496.91}{21036500} $$
$$ r \approx -0.407411 $$
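The same coefficient can be recomputed from the Table 2 totals as a check on the hand calculation. This Python sketch assumes the rounded totals printed in Table 2 and uses only the standard `math` module; the variable names are illustrative.

```python
import math

# Column totals from Table 2 (n = 30 countries, year 2017)
n = 30
sum_x, sum_y = 803087.41, 2394.78
sum_x2, sum_y2 = 34081066451, 226335.4992
sum_xy = 55536758.68

s_xy = sum_xy - sum_x * sum_y / n              # Sxy
s_x = math.sqrt(sum_x2 - sum_x ** 2 / n)       # Sx
s_y = math.sqrt(sum_y2 - sum_y ** 2 / n)       # Sy
r = s_xy / (s_x * s_y)                         # Pearson correlation coefficient

print(f"Sx = {s_x:.1f}, Sy = {s_y:.4f}, r = {r:.6f}")
```

The result agrees with r ≈ −0.4074 to four decimal places; the tiny differences in Sx come from rounding (S_x)² to 12582800000 in the hand calculation.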

➢ Interpretation of r - value

From the calculations above, the formula for Pearson’s correlation coefficient,
$r = \frac{S_{xy}}{S_x S_y}$, is used. The Pearson product-moment correlation coefficient, denoted by r,
is a measure of the linear correlation between two variables x and y, and takes a value between −1
and +1 inclusive. Table 3 shows how the r-value is interpreted. The Pearson correlation coefficient
between variables x and y is r = −0.407411. The negative sign shows that the correlation between
the two variables is negative, meaning that they tend to move in opposite directions: when one
variable increases, the other tends to decrease. As can be seen from Graph 2, as GDP per capita
increases, the number of deaths attributed to smoking tends to decrease, which reflects a negative
correlation between the x and y variables. From Table 3, since |r| = 0.407411 lies in the range
0.25 < |r| ≤ 0.5, the correlation between x and y is categorized as weak.
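The bands in Table 3 translate directly into a small lookup function. In this hypothetical Python helper, `describe_correlation` follows Table 3’s boundaries and also reports the sign of r; because Table 3 starts at 0 < |r|, the sketch rejects r = 0 rather than guessing a label for it.

```python
def describe_correlation(r):
    """Return the strength and direction of a correlation using Table 3's bands."""
    a = abs(r)
    if not 0 < a <= 1:
        # Table 3 only covers 0 < |r| <= 1
        raise ValueError("Table 3 covers only 0 < |r| <= 1")
    if a <= 0.25:
        strength = "Very weak"
    elif a <= 0.5:
        strength = "Weak"
    elif a <= 0.75:
        strength = "Moderate"
    else:
        strength = "Strong"
    direction = "negative" if r < 0 else "positive"
    return f"{strength} {direction}"

print(describe_correlation(-0.407411))   # Weak negative
```

Each band includes its upper boundary, so |r| = 0.5 counts as weak and |r| = 0.75 as moderate, exactly as in Table 3.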

6. ANALYSIS AND INTERPRETATIONS OF RESULTS

Graph 2 shows the line of best fit, plotted from the regression equation obtained in the calculations
above. The regression line is based on the actual statistical reports of GDP per capita and the
number of deaths attributed to smoking in 30 countries. As the graph shows, the line slopes
downward, so the two variables move in opposite directions: as the x variable (GDP per capita)
increases, the y variable (the number of deaths attributed to smoking) tends to decrease. Although
the line does not pass exactly through each data point, it summarizes the overall tendency of the
data and gives a linear equation for estimating the number of deaths caused by smoking from the
GDP per capita of a country. How reliable such estimates are depends on the strength of the
correlation, which is examined below.
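One common way to quantify how much of the variation a regression line accounts for is the coefficient of determination, r², which for simple linear regression is the square of the Pearson coefficient. This measure is not used elsewhere in this investigation; the Python sketch below is an added illustration that assumes the value r = −0.407411 calculated in section 5.

```python
r = -0.407411          # Pearson correlation coefficient from section 5
r_squared = r ** 2     # coefficient of determination

print(f"r^2 = {r_squared:.3f}")
print(f"share of the variation in y explained by the line: {r_squared:.1%}")
```

An r² of about 0.166 means the line accounts for only around 17% of the variation in deaths attributed to smoking, which is consistent with the weak correlation found in Table 3.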


While the linear regression formula produces an equation that models the expected number of
deaths caused by smoking in different countries, the Pearson product-moment correlation
coefficient is used to check how strongly the two variables (GDP per capita and the number of
deaths caused by smoking) are actually related. As calculated, the Pearson correlation coefficient,
denoted by r, indicates a weak correlation according to Table 3. The r-value therefore shows that
GDP per capita and the number of deaths caused by smoking in different countries are only weakly
related. The scatter diagram shows no strong trend linking the two variables.

This means that a higher GDP per capita does not necessarily lower the number of deaths
attributed to smoking. Several arguments support this observation. First, cigarettes are addictive:
smokers who develop an addiction tend to ignore the negative side-effects, such as lung disease and
damage to their cardiovascular system. This is true at least in the short run, because an addiction to
cigarettes is very hard to overcome in a short period of time. Even though governments attempt to
lower cigarette consumption by imposing a variety of regulations, consumption ultimately depends
on the smoker: it is the smoker’s choice to find alternatives that end their addiction. Second, some
countries with a low GDP per capita have high death rates from factors other than smoking. For
instance, countries in extreme poverty have a low GDP per capita but may have relatively few deaths
attributed to smoking and many more deaths from other causes. Nigeria, for example, combines a
low GDP per capita (5338.4) with the lowest death rate attributed to smoking in Table 1 (17.72 per
100,000 people). A possible explanation is that deaths resulting from smoking make up only a small
proportion of deaths in such countries compared with deaths from hunger, disease and other causes.

Despite the arguments above that GDP per capita and the number of deaths attributed to smoking
are not correlated, there are reasonable arguments supporting a correlation between the two
variables. Deaths from smoking result from the consumption of cigarettes, and governments can
reduce cigarette consumption by taxing cigarettes. Heavy taxes on cigarettes are imposed mostly by
developed countries (with a high GDP per capita), because they do not depend on the revenue that
comes from cigarette sales. Most less developed countries, however, do not impose such heavy
taxes on cigarettes, because they rely on the revenue from cigarette sales to fund merit goods and
the public expenditure needed to develop their economies.


7. CONCLUSION

As stated at the start, the aim of this investigation was to determine the correlation between the
two variables. A simple linear regression equation was calculated first, which produced a line of
best fit for the two variables. The correlation between the two variables was then measured using
the Pearson correlation coefficient, denoted by r. The result (r = −0.407411) shows a weak negative
correlation between the two variables, which suggests that other factors also affect the number of
deaths attributed to smoking. Because of this weak correlation, the equation cannot reliably be used
to predict the number of deaths attributed to smoking from the GDP per capita of a country.

8. FURTHER IMPROVEMENTS

The data in this investigation are limited to the year 2017 for both variables, so they do not reflect
more recent conditions. The accuracy of the results could be improved by using data on cigarette
consumption in each country. However, such data could not be found on the statistical websites
used, so the annual number of deaths attributed to smoking was used instead of cigarette
consumption.


9. BIBLIOGRAPHY

➢ Buchanan, L., Fensom, J., Kemp, E., Rondie, P. L., & Stevens, J. (2012).
Mathematics: standard level. Oxford: Oxford University Press.
➢ Ritchie, H., & Roser, M. (2013, May 23). Smoking. Retrieved April 13, 2020, from
https://ourworldindata.org/smoking
➢ GDP per capita. (n.d.). Retrieved from
https://ourworldindata.org/grapher/gdp-per-capita-worldbank
