Psyc417 Final World Happiness

WORLD HAPPINESS 1

World Happiness: A Computational Analysis of The Happiest Countries In The World

Ameya M. Sriram
University of Maryland
PSYC 417: Data Science for Psychology Majors
Dr. Solway
May 13th, 2022

World Happiness: A Computational Analysis of The Happiest Countries In The World

What does it mean to be happy? The answers to this question vary immensely, and are
impossible to fully quantify. However, there are some indicators that can be used to find who has
the highest level of happiness, and why. In addition to measuring a country’s prosperity in terms
of GDP per capita, life expectancy, or other indicators, one can use the metric of the population’s
overall happiness. Using the Gallup Poll survey on subjective well-being, which computes the
national average response to questions regarding life satisfaction or happiness, the “happiness”
of countries can be measured (Helliwell et al., 2022). In addition to this metric, the World
Happiness Report has compiled the measurements of several other factors that could influence a
country’s happiness, including GDP per capita, social support, and trust in government
(Helliwell et al., 2022).
The report, which started in 2015 and has recently published 2022’s measurements, acts
as a meaningful indicator of which countries are happier, and can serve as an important starting
point in examining the most prevalent factors of happiness. This includes taking into
account the differences between countries, as well as overall trends in happiness that
indicate which systems, regions, and modes of life have resulted in the highest levels of
life satisfaction. Out of the many variables measured for each country (Economy, Social Support,
Health, Freedom, Trust, and Generosity), it is crucial to see which has the biggest impact on the
happiness of the country, and whether there is a correlation between previous scores and the current
happiness score of each country. In seeking to characterize the most important factors of
happiness, this study aims to examine the possible independent variables that could predict
happiness scores. We hypothesize that these variables altogether will have a high positive
correlation with happiness scores, although some variables may carry more weight than others
in determining the score. We also hypothesize that 2015 happiness scores will be significantly
correlated with 2022 scores. Finally, we hypothesize that a t-test will show that the scores of the top
countries are significantly different from those of the bottom countries, implying a significant
gap across the world in terms of happiness.
Methods
In the original study, there were a total of 146 participating countries (N=146). Data from all
participating countries were used for the current study. All statistical analyses were performed in
RStudio. First, the entire dataset for 2022 was loaded into the program. The record for each country
included its happiness score, its ranking, the independent variables of interest, and other
supplementary measurements. The data was then cleaned, and the region each country belonged to
and the country’s happiness score in 2015 were added from the 2015 dataset
(Appendix A). The columns of the dataset were also renamed to more accurately describe the
information. Countries which did not participate in the survey in 2015 were included in the
overall dataset for analysis of 2022 scores, but were not used in any analyses concerning the
2015 scores. The cleaned data was then used to analyze the effect of the independent variables on
the dependent variable, happiness score, as well as the relationship between 2015 and 2022
scores. It was also used to identify the top ten and bottom ten countries and compare them
against each other graphically.
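The merging step described above can be sketched as a left join. This is a minimal Python illustration, not the RStudio code from Appendix A; the field names (`country`, `region`, `score_2015`, `score_2022`), the function name, and all of the rows are invented for the example.

```python
# Illustrative rows only: the country names and values are invented.
scores_2022 = [
    {"country": "Country A", "score_2022": 7.5},
    {"country": "Country B", "score_2022": 5.1},
    {"country": "Country C", "score_2022": 3.2},
]
scores_2015 = [
    {"country": "Country A", "region": "Region X", "score_2015": 7.4},
    {"country": "Country C", "region": "Region Y", "score_2015": 3.6},
]


def add_2015_columns(rows_2022, rows_2015):
    """Left join: keep every 2022 row and add 2015 fields where available."""
    lookup = {row["country"]: row for row in rows_2015}
    merged = []
    for row in rows_2022:
        match = lookup.get(row["country"], {})
        merged.append({
            **row,
            "region": match.get("region"),          # None if absent in 2015
            "score_2015": match.get("score_2015"),  # None if absent in 2015
        })
    return merged


merged = add_2015_columns(scores_2022, scores_2015)

# Only rows with a 2015 score are usable for the 2015-versus-2022 analysis.
both_years = [row for row in merged if row["score_2015"] is not None]
```

Keeping unmatched countries in `merged` and filtering them only for the year-to-year comparison mirrors the rule stated above: countries absent in 2015 still count toward the 2022 analyses.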
Each independent variable was determined using a different measure. Economy was
measured by per capita GDP, or GDP for each person in the country. Health was measured by
using healthy life expectancy in each country, extrapolated from WHO data. Social support was
measured by average response to a binary question asking if people felt they would be supported
by friends or family. Freedom and generosity were both measured by the average response to a
binary question, the former question regarding satisfaction with their freedom to choose their
life, and the latter regarding charitable giving. Trust was measured by the level of corruption
perceived by the people, in response to questions regarding corruption in the government and in
businesses.
A linear regression was run to estimate the beta regression
weights, test whether each weight differed significantly from zero, and assess the overall fit of a
linear model to the data. Linear regression was an appropriate method for finding these weights
because it assesses the impact of each independent variable while holding all other variables
constant, as well as the combined ability of the independent variables to predict the dependent
variable. The coefficient of each independent variable represents the expected change in the
happiness score for each one-unit change in that variable, with the other variables held constant.
To show the relationship between each independent variable and the happiness score, a scatterplot
was created for each independent variable. All analysis was done in RStudio (Appendix B).
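For reference, a multiple regression of this kind can be written in the standard form below. The symbols are assumptions introduced here rather than notation taken from the report: $Y_i$ is the 2022 happiness score of country $i$, the $\beta$ terms are the regression weights, $\varepsilon_i$ is the residual, $N$ is the number of countries, and $k$ is the number of predictors.

```latex
\[
Y_i = \beta_0 + \beta_1\,\mathrm{Economy}_i + \beta_2\,\mathrm{Health}_i
    + \beta_3\,\mathrm{SocialSupport}_i + \beta_4\,\mathrm{Freedom}_i
    + \beta_5\,\mathrm{Trust}_i + \beta_6\,\mathrm{Generosity}_i + \varepsilon_i
\]
\[
\mathit{df}_{\mathrm{residual}} = N - k - 1 = 146 - 6 - 1 = 139
\]
```

With 146 countries and six predictors, the residual degrees of freedom come to 139, which agrees with the value given in the note to Table 1.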
A t-test was then conducted using the happiness scores of the ten highest-ranking
countries and the ten lowest-ranking countries. The t-test compared the mean scores of the two
groups, and the difference between the sample means was used to find the t-value
and p-value. The hypothesis was tested using a two-tailed t-test, so that it could be
examined whether or not the happiness scores of the top countries were significantly different
from the scores of the bottom countries. Two scatterplots were made, one depicting the happiness
scores and names of the top ten countries, and one for the scores and names of
the bottom ten countries. All of the analysis was done using RStudio code (Appendix C).
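The comparison of two group means can be sketched as follows. This is a Python illustration under stated assumptions, not the code from Appendix C: it assumes an unequal-variance (Welch) test, which is the default behavior of R's `t.test` and is consistent with the 12 degrees of freedom reported for two groups of ten, although the text does not name the variant. The function name and the scores are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n)
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)

    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Invented scores for illustration; these are not the values from the report.
top_ten = [7.8, 7.6, 7.6, 7.5, 7.4, 7.4, 7.4, 7.3, 7.2, 7.2]
bottom_ten = [3.8, 3.7, 3.6, 3.5, 3.3, 3.3, 3.0, 2.9, 2.8, 2.4]

t_value, df = welch_t(top_ten, bottom_ten)
```

For two groups of ten, the Welch degrees of freedom always fall between 9 and 18, moving toward 9 as the group variances become more unequal. The two-tailed p-value would then be read from a t distribution with `df` degrees of freedom, a step this sketch leaves out.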
Finally, the relationship between happiness scores in 2015 and happiness scores in 2022
was explored. A linear regression was performed using 2015 scores as the independent
variable and 2022 scores as the dependent variable. Any countries without 2015 scores were
omitted from this analysis. A scatterplot with a fitted regression line was also
created to visualize the data. All of this analysis was coded in
RStudio (Appendix D).
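A one-predictor regression of this kind can be sketched in a few lines. This is an illustrative Python version rather than the code from Appendix D; here `x` holds the 2015 scores and `y` the 2022 scores, matching the predictor and outcome named later in this section. The function name and the values are invented.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares slope, intercept, and Pearson r for paired values."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Invented paired scores; countries lacking a 2015 score are assumed to have
# been dropped before this step.
scores_2015 = [7.4, 6.9, 5.8, 5.0, 4.3, 3.6]
scores_2022 = [7.5, 7.0, 5.5, 5.3, 4.6, 3.2]

slope, intercept, r = simple_regression(scores_2015, scores_2022)
r_squared = r ** 2  # share of variance in 2022 scores explained by 2015 scores
```

With a single predictor, the squared Pearson correlation equals the regression's R², so one calculation yields both of the summary statistics reported for this analysis.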
A linear regression was primarily used in this study. This design enabled the analysis of a
correlational hypothesis with a continuous outcome and predictor variables, while also
estimating regression weights for each predictor variable, all of which suggested that the method
was appropriate for the purpose of this research. The predictor variables of one regression were
economy, health, social support, freedom, generosity, and trust of each country, and the outcome
variable was the happiness score of the country. Another regression used the 2015 happiness
scores as the predictor variable and the 2022 scores as the outcome variable. A t-test was also
used to determine whether there was a significant difference between the mean
scores of the top ten ranking countries and the bottom ten countries. This was an appropriate
analysis because it tests for a difference between the means of two groups on a
continuous outcome.
Results
All of the data analysis was conducted between May 2nd and 13th, 2022. Data from
all 146 participating countries were used. The only data source was the study mentioned
above (Helliwell et al., 2022).
Given the operational definitions of the variables in this research study, a linear
regression with multiple predictor variables was the optimal statistic to utilize to test the
hypothesis that the independent variables of economy, health, social support, freedom,
generosity, and trust would be positively correlated with happiness scores. This is because the
data consisted of several continuous predictor variables whose relationship with the outcome
needed to be analyzed. In addition, the relative effect each independent variable has when all
others are held constant can be examined using the regression coefficients. The results showed
that there is a statistically significant correlation between the independent variables and the
happiness scores. The p-value, p<0.001, is far below the alpha value, α=0.05, indicating
that the relationship between the variables is statistically significant. The multiple correlation
coefficient, r=0.87, indicates a strong, positive correlation, and the
predictor variables as a whole explain 76% of the variance in happiness scores, as R² =
0.7639. This suggests that factors such as GDP per capita, life expectancy, and trust in
government are significantly related to the happiness of a country. Scatterplots for each
independent variable and its individual relationship with the happiness scores were made (Figure 1). The
individual coefficients for each independent variable were also noteworthy. Economy (β = 0.55),
Health (β = 1.27), Social Support (β = 1.41), and Freedom (β = 1.60) all had
regression weights that were significantly different from 0, all with p-values less than 0.01.
These weights suggest that freedom has the largest impact on happiness scores, although this is
not a definitive finding. The results of the analysis were calculated in RStudio (Table 1).
To examine the significance of the difference between the happiness scores of the top ten
and the bottom ten countries, a two-tailed t-test was utilized. The results showed that there was a
statistically significant difference between the scores of the top ten countries and the scores of the
bottom ten. The t-value was t(12) = 27.7, with p < .001, indicating that a difference this large
would be very unlikely if the two groups had the same mean. These values were calculated using RStudio (Table
2). The null hypothesis that the mean scores of the bottom and the top countries do not
differ is therefore rejected. This means that happiness differs significantly across the world, so
there is a level of inequality in happiness at least somewhat related to the country of residence.
Two scatterplots, one for the happiness scores of each of the top ten countries, and one for the
scores of each of the bottom ten countries, were created in RStudio to reflect the trend of the
scores. Information regarding the region to which these countries belong was also included
(Figure 2, Figure 3).
A linear regression was also conducted to determine correlation between 2015 and 2022
scores in happiness. There was a significant relationship between the 2015 scores and the 2022
scores for a country, since the p-value, p<0.001, was smaller than the alpha value α=0.05. It is
also a strong positive linear relationship, as indicated by a correlation coefficient of r=0.81. The
2015 Happiness scores explain 65% of the variance in 2022 scores, since R² =0.6544. A
scatterplot with a model regression line was made in RStudio to reflect this data (Figure 4).
These findings indicate that past happiness scores are a reliable predictor of
later happiness scores, suggesting stability in the factors that influence happiness. In addition to
the other analyses, a simple graphical representation of the happiness scores across each
region of the world was made to visually compare the differences between regions (Figure 5).
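The correlation coefficients reported for the two regressions can be checked against their R² values, since the magnitude of the correlation is the square root of R². The worked arithmetic below uses only the R² values stated in this section.

```latex
\[
r_{\text{six predictors}} = \sqrt{R^2} = \sqrt{0.7639} \approx 0.874
\]
\[
r_{\text{2015 vs. 2022}} = \sqrt{R^2} = \sqrt{0.6544} \approx 0.809
\]
```

Rounded to two decimal places, these give 0.87 and 0.81, in agreement with the values reported in the text.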
Discussion
It was predicted that the predictor variables of economy, social support, health, trust,
freedom, and generosity would have a strong correlation with the happiness scores of countries
in 2022. Utilizing a series of computational analyses that involved loading, cleaning, and
manipulating the data, this hypothesis was tested. There was a significant correlation between
those predictor variables and the scores. Specifically, it was a strong, positive linear correlation.
It was also predicted that there would be a significant difference between the scores of the top ten
and bottom ten countries ranked in terms of happiness, and a two-tailed t-test supported this
hypothesis. Lastly, it was predicted that the correlation between the 2015 and 2022 happiness scores
of countries would be strong and positive, which was supported by the regression analysis
performed. In all cases, the null hypothesis was rejected.
This study’s limitations include the lack of further regression analyses to determine if
there are fixed and random effects, as well as possible confounding variables involved. There are
so many predictor variables that further analysis to determine presence of confounding factors
would be necessary, but was not possible at this time partially due to lack of data that could
include such confounds. Additionally, some of the countries did not have scores for both 2015
and 2022, or had data extrapolated from other years in order to fulfill some of the 2022 data.
Those examples were omitted from the dataset where necessary, but further investigation should
be done as to the true impact of these countries on overall results. There are also assumptions of
the t-test and the linear regression that may not have been met. For example, it is
assumed that the residuals are homoscedastic, that the predictor variables are not linearly related
to one another (no multicollinearity), and that each individual outcome is probabilistic.
There are several possible avenues for future research based on this study. Further
analysis using the several predictive variables could be performed to see whether or not there are
confounding effects on the results, even between variables. More information regarding other
aspects of what can cause happiness could be acquired for all countries, so that they may be used
in future studies to increase external validity of the study. The survey could also use
more than one question for some of the variables, which may further increase external validity.
The question of what causes happiness has broad potential, and even studying how
specific social and economic factors relate to a country’s overall happiness
opens the door to many further areas of study.

References
Helliwell, J. F., Layard, R., Sachs, J. D., De Neve, J.-E., Aknin, L. B., & Wang, S. (Eds.). (2022).
World Happiness Report 2022. New York: Sustainable Development Solutions Network.

Appendix A
RStudio Code for Loading and Cleaning Data

Appendix B
RStudio Code for Linear Regression of Independent Variables on 2022 Happiness Scores

Appendix C
T-Test of Top And Bottom 10 Ranking Countries’ Happiness Scores

Appendix D
Linear Regression of 2015 Scores on 2022 Scores

Table 1
Linear Regression Analysis of Economy, Health, Social Support, Freedom, Trust, and Generosity
on 2022 Happiness Scores

Note: The above analysis shows that, with 139 degrees of freedom, the predictor variables
have a strong positive correlation (r=0.87) with the happiness scores. This result is statistically
significant as the p-value, p<0.001, is smaller than the alpha value 0.05.
Table 2
T-test analysis of Happiness Scores of top Ten Countries versus Bottom Ten

Note: The above t-test shows that, with 12 degrees of freedom, the happiness of the top
ten countries is significantly different from the happiness of the bottom ten. This result is
statistically significant as the p-value, p<0.001, is smaller than the alpha value 0.05.

Figure 1
Series of scatterplots of various predictor variables on 2022 Happiness Scores

Figure 2
Scatter Plot of Happiness Scores in 2022 of Top Ten Countries

Figure 3
Scatter Plot of Happiness Scores in 2022 of Bottom Ten Countries

Figure 4
Relationship between 2015 Happiness Scores and 2022 Happiness Scores

Note: The scatterplot above shows the correlative relationship between happiness scores in 2015
and 2022. Since p<0.001, which is far below the alpha value α=0.05, this relationship is
statistically significant. The r-value of r=0.81 indicates a strong, positive linear relationship.

Figure 5
Bar Chart depicting Happiness Scores as Differentiated by World Region
