Econometrics

Cross-Sectional Data Analysis

How do basic sanitation conditions and health expenditures affect life expectancy?

Leonor Conde Ferreira


University of Zagreb, Faculty of Economics and Business
0067655493@net.efzg.hr

Econometrics

Winter Semester 2021/2022

1. Introduction:
In this paper, I analyze the impact of basic sanitation conditions and health care
expenditures on life expectancy. To do so, I selected 40 countries and analyzed
cross-sectional data for the year 2018.

For background, the key variables are defined as follows. Health care expenditures are
“…the economic resources dedicated to health functions, excluding capital investment.
Health care expenditure concerns itself primarily with health care goods and services
that are consumed by resident units, irrespective of where that consumption takes place
or who is paying for it. As such, exports of health care goods and services are excluded,
whereas imports of health care goods and services for final use are included”. Access to
basic sanitation services means having access to “… improved sanitation facilities that
are not shared with other households”. Life expectancy is “…the number of years a newborn
infant would live if prevailing patterns of mortality at the time of its birth were to stay
the same throughout its life.”

We can expect that increasing access to basic sanitation services will substantially
improve overall health and life expectancy. Similarly, countries with higher health care
expenditures can provide their populations with more and better treatment, which should
lead to higher life expectancy.

Given these assumptions, we arrive at the main hypothesis of this paper: access to basic
sanitation services and health care expenditures positively affect life expectancy.

2. Data and model:


To study how better living conditions and better health care affect life expectancy,
I chose 40 countries: Afghanistan, Angola, Australia, Austria, Belgium, Benin, Bosnia and
Herzegovina, Brazil, Canada, China, Cameroon, Costa Rica, Germany, Denmark, Spain,
Ethiopia, France, Gabon, Georgia, Gambia, Croatia, Luxembourg, Madagascar,
Montenegro, Mongolia, Malaysia, Namibia, Nigeria, Netherlands, Norway, Nepal,
Poland, Portugal, Romania, Russian Federation, South Sudan, Sweden, Venezuela, South
Africa and Zimbabwe. I then analyzed their data for the year 2018.

The dependent variable is life expectancy at birth for both sexes, measured in years.
The independent variables are the share of people using at least basic sanitation services
(% of the total population) and current health expenditure per capita (in current US
dollars). All data come from the World Bank database.

The hypothesis tested in this paper is therefore:

Access to basic sanitation services and health care expenditures positively affect life
expectancy.

First, I installed the R packages required for the analysis: "wbstats", "stargazer",
"lmtest" and "tseries".

Throughout this paper, I use the OLS (Ordinary Least Squares) method, which finds
“… the regression line that gives the best fit to data points such that the sum of
squared residuals is [as] small as possible”.

$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad \forall i$$

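where $\beta_0$ is the constant term, $\beta_1$ is the slope coefficient and $u_i$ is the
error term. OLS produces the estimates $\hat\beta_0$ and $\hat\beta_1$; the resulting
residuals $\hat u_i = y_i - \hat y_i$ are the differences between the actual and the
fitted values.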
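To make the OLS criterion concrete, here is a minimal Python sketch. It is an illustrative re-implementation, not the R code used in this paper (which relied on `lm`), and the function name `ols_simple` is hypothetical. For a single regressor, minimizing the sum of squared residuals has the closed-form solution $\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.

```python
def ols_simple(x, y):
    """Fit y = b0 + b1*x by ordinary least squares (one regressor).

    Illustrative sketch only; returns (b0, b1, residuals).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, slope is not identified")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # slope that minimizes the SSR
    b0 = y_bar - b1 * x_bar          # line passes through (x_bar, y_bar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals
```

For example, `ols_simple([1, 2, 3, 4], [5, 8, 11, 14])` recovers an intercept of 2 and a slope of 3 with zero residuals, because those points lie exactly on the line $y = 2 + 3x$.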

In the first model, I will use a linear model of life expectancy as a function of
expenditures on health care.

$$L_i = \beta_0 + \beta_1 E_i + u_i, \quad i = 1, \dots, 40$$

where $L_i$ is life expectancy in country $i$ (in years) and $E_i$ is current health
expenditure per capita in country $i$ (in current US dollars).

In the second model, I will use a linear model of life expectancy as a function of access
to basic sanitation services.

$$L_i = \beta_0 + \beta_1 S_i + u_i, \quad i = 1, \dots, 40$$

where $L_i$ is life expectancy in country $i$ and $S_i$ is the percentage of people using
at least basic sanitation services in country $i$.

The third model is the first model in log-log form, with both variables in logarithms:

$$\log(L_i) = \beta_0 + \beta_1 \log(E_i) + u_i, \quad i = 1, \dots, 40$$

where $L_i$ and $E_i$ are defined as in the first model.

The fourth model takes logs of the second model's dependent variable only (a log-level
model), because the independent variable is already expressed as a percentage of the
population:

$$\log(L_i) = \beta_0 + \beta_1 S_i + u_i, \quad i = 1, \dots, 40$$

where $L_i$ and $S_i$ are defined as in the second model.

Finally, the fifth model is a multiple linear regression obtained by adding the percentage
of people using at least basic sanitation services to the third model as an additional
independent variable:

$$\log(L_i) = \beta_0 + \beta_1 \log(E_i) + \beta_2 S_i + u_i, \quad i = 1, \dots, 40$$

3. Empirical results:
First, I imported the data directly from the World Bank database using the “wb_data”
function from the wbstats package.

Then I estimated the first model, in which life expectancy is regressed on health care
expenditures.

The output of the summary command shows that the health expenditure coefficient is
statistically significant: it is marked with three asterisks, and its p-value (8.75e-08)
is smaller than 0.001. The null hypothesis $H_0\colon \beta_1 = 0$ is therefore rejected,
which means that health care expenditures affect life expectancy.

The R-squared measures the proportion of the variance of the dependent variable (life
expectancy) that is explained by the independent variable (health care expenditures). Its
value is 0.5337, so the model explains about 53% of the cross-country variation in life
expectancy and leaves a substantial share unexplained. The estimated health expenditure
coefficient implies that a 1 US dollar increase in per capita health expenditure is
associated with an increase in life expectancy of 0.0026 years.
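Both quantities can be written out explicitly. The first line applies the standard definition of R-squared to the first model. The second line scales the slope to an illustrative increase of 100 US dollars in per capita health expenditure; this is an example value chosen here and was not analyzed in the paper.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{40} \hat{u}_i^2}{\sum_{i=1}^{40} (L_i - \bar{L})^2} = 0.5337,
\qquad 1 - R^2 = 0.4663

\Delta \hat{L} = \hat{\beta}_1 \, \Delta E = 0.0026 \times 100 = 0.26 \text{ years}
```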

For the second model, I estimated life expectancy as a function of access to basic
sanitation services.

This estimate is also statistically significant, with a p-value of 1.802e-13, which is
smaller than 0.001. The null hypothesis is therefore rejected, so sanitation conditions
affect life expectancy. Since access to sanitation is measured as a percentage of the
population, the coefficient implies that a 1 percentage point increase in access to basic
sanitation services is associated with an increase in life expectancy of 0.2523 years.
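Because $S_i$ is already a percentage, a one-unit change in $S_i$ is one percentage point, and the estimated effect scales linearly. The calculation below uses an illustrative 10 percentage point increase in access; this is an example value, not one from the paper.

```latex
\Delta \hat{L} = \hat{\beta}_1 \, \Delta S = 0.2523 \times 10 = 2.523 \text{ years}
```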

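Since both models have the same dependent variable, their R-squared values are directly
comparable. This model fits the data better than the first one because its R-squared is
higher, that is, closer to 1.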

For the third model, I used the same variables as in the first model, but in logarithms.

As with the previous models, this estimate is statistically significant, with a p-value of
4.556e-13, which is smaller than 0.001. The R-squared also rises from 0.5337 in the first
model to 0.7521, which suggests a better fit. This is consistent with the log
transformation reducing the influence of outliers in our data and making the relationship
closer to linear. Strictly speaking, however, R-squared values are only directly comparable
between models with the same dependent variable, so this comparison between $L_i$ and
$\log(L_i)$ is indicative rather than exact.

Because both variables are in logs, the coefficient is an elasticity: a 1% increase in
health care expenditures is associated with an increase in life expectancy of about 0.0565%.
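In the log-log model, reading $\hat\beta_1$ as "percent per percent" is a first-order approximation. The exact implied change follows from $L_2 / L_1 = (E_2 / E_1)^{\hat\beta_1}$. The calculation below uses an illustrative 10% increase in health care expenditures, which is an example value and not taken from the paper.

```latex
\%\Delta L \approx \hat{\beta}_1 \times \%\Delta E = 0.0565 \times 10\% = 0.565\%

\%\Delta L_{\text{exact}} = 100\left(1.10^{0.0565} - 1\right) \approx 0.540\%
```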

In the fourth model, I used the same variables as in the second model, but took the
logarithm of the dependent variable.

These results show a p-value of 2.385e-13, so this coefficient estimate is statistically
significant as well. The null hypothesis is therefore rejected, which means that sanitation
conditions affect life expectancy. Because only the dependent variable is in logs, the
coefficient (about 0.0036) is a semi-elasticity. A 1 percentage point increase in access to
basic sanitation services is therefore associated with an increase in life expectancy of
approximately 100 × 0.0036 = 0.36%.
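In a log-level model, $100\,\hat\beta_1$ gives the approximate percentage change in life expectancy for a one-unit (one percentage point) change in $S_i$. The calculation below reads the reported 0.0036 as the estimated coefficient and also shows the exact value.

```latex
\%\Delta L \approx 100 \, \hat{\beta}_1 \, \Delta S = 100 \times 0.0036 \times 1 = 0.36\%

\%\Delta L_{\text{exact}} = 100\left(e^{0.0036} - 1\right) \approx 0.3606\%
```

This can be roughly cross-checked against the second model. Assume a life expectancy of 70 years (an illustrative value, not a sample mean reported in the paper). A 0.36% change then corresponds to about 0.25 years, in line with the second model's slope of 0.2523 years per percentage point.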

Taking logs of the dependent variable slightly decreased the R-squared, so this model fits
the data slightly worse than the second model. This comparison should be read with caution,
since the two dependent variables differ.

The last model that I estimated was the multiple regression model.

When the sanitation variable is added to the third model, the health care expenditure
coefficient becomes less significant (p-value = 0.002), although it remains significant at
the 1% level. The sanitation coefficient is also statistically significant, but less so than
in the second and fourth models.

Although the coefficients are individually less significant, the R-squared is higher, so
adding the sanitation variable improves the fit.
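R-squared can never fall when a regressor is added. The adjusted R-squared penalizes additional regressors, so it gives a fairer comparison between the fifth model ($k = 2$) and the single-regressor models ($k = 1$). With $n = 40$, the adjusted values for the two models whose R-squared is reported in the text are computed below. The fifth model would use the factor $39/37$ with its own R-squared.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

% First model: k = 1, R^2 = 0.5337
\bar{R}^2 = 1 - 0.4663 \times \tfrac{39}{38} \approx 0.5214

% Third model: k = 1, R^2 = 0.7521
\bar{R}^2 = 1 - 0.2479 \times \tfrac{39}{38} \approx 0.7456
```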

To compare all five models side by side, I used the “stargazer” function.

Adding the variable that measures access to basic sanitation to the third model improved it.
The estimates are significant and the R-squared is the highest of all five models, so the
fifth model fits the data best.

Finally, I carried out diagnostic checks on all five models.

Heteroscedasticity testing

In this check, I use the Breusch–Pagan test (“bptest”) to see whether the error terms have
constant variance. As long as the variance of the error term is the same for every
observation, there is no heteroscedasticity problem. The null hypothesis is that the error
terms are homoscedastic.
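The Breusch–Pagan statistic can be sketched in a few lines of Python. This is an illustration and not the R code used in the paper. It assumes a single regressor and uses Koenker's studentized form: n·R² from regressing the squared OLS residuals on the regressor. We assume, but have not verified here, that this matches the default of lmtest's `bptest`. The helper names `_fit` and `breusch_pagan` are hypothetical.

```python
import math

def _fit(x, y):
    """Closed-form OLS with one regressor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def breusch_pagan(x, y):
    """Studentized (Koenker) Breusch-Pagan test for y = b0 + b1*x + u.

    Illustrative sketch, not the paper's R code. Returns (statistic, p_value).
    """
    n = len(x)
    b0, b1 = _fit(x, y)
    # Squared OLS residuals from the original model.
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of the squared residuals on x.
    a0, a1 = _fit(x, e2)
    mean_e2 = sum(e2) / n
    sst = sum((v - mean_e2) ** 2 for v in e2)
    ssr = sum((v - a0 - a1 * xi) ** 2 for xi, v in zip(x, e2))
    r2 = 1 - ssr / sst if sst > 0 else 0.0
    # Clamp tiny negative values caused by floating-point rounding.
    stat = max(n * r2, 0.0)
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With one regressor the statistic is compared with a chi-square distribution with one degree of freedom. The fifth model has two regressors, so the auxiliary regression would include both and the reference distribution would have two degrees of freedom.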

The first model (p-value = 0.01617) and the third model (p-value = 0.01671) both have
p-values between 0.01 and 0.05. At the 1% significance level the null hypothesis is
therefore not rejected, so there is no evidence of heteroscedasticity in these models at
that level. At the 5% level, however, homoscedasticity would be rejected.

Applying the same reasoning to the other models, the second, fourth and fifth models have
p-values smaller than 0.01, so the null hypothesis is rejected. These models have a
heteroscedasticity problem, since the variance of the error terms is not constant across
observations.

Higher order autocorrelation testing

For the second diagnostic check, I use the Breusch–Godfrey test to see whether the error
terms are independent of one another, which requires all covariances between them to be
zero. The null hypothesis is that there is no autocorrelation; if it is rejected, the model
has an autocorrelation problem.

All five models have p-values above the usual significance levels, so the null hypothesis
is not rejected. There is no evidence of dependence between the error terms, and hence no
autocorrelation problem.

Normality testing

In the last diagnostic check, I use the Jarque–Bera test to see whether the error terms are
normally distributed. The null hypothesis is that the error terms are normally distributed.
If it is rejected, the assumed distribution is inappropriate, which means that the t- and
F-tests based on the model are not valid.
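The Jarque–Bera statistic combines the sample skewness and kurtosis of the residuals. A normal distribution has skewness 0 and kurtosis 3. Below is a minimal Python sketch; it is illustrative only and the function name is hypothetical. The formula, with moments computed using divisor n, is intended to match the one used by tseries' `jarque.bera.test`, but this is an assumption rather than a verified equivalence.

```python
import math

def jarque_bera(residuals):
    """Jarque-Bera normality test (illustrative sketch).

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB ~ chi-square(2), whose survival
    function is exp(-JB / 2). Returns (JB, p_value).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5            # 0 for a symmetric sample
    kurt = m4 / m2 ** 2              # 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)
```

Consider a sample of twelve values alternating between -1 and 1. It has skewness 0 and kurtosis 1, so JB = 12/6 × (0 + 4/4) = 2 and the p-value is exp(-1) ≈ 0.37. Normality would therefore not be rejected.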

The first, second and fourth models have large p-values, so the null hypothesis is not
rejected and these models pass the test. On the other hand, the fifth model has a p-value of
0.0005742, which is smaller than the usual significance levels. The null hypothesis is
therefore rejected and the fifth model fails the normality test.

Finally, the third model has a p-value between 0.01 and 0.05, so the null hypothesis is not
rejected at the 1% level of significance but is rejected at the 5% level.

After running the diagnostic checks on all models, we see that the fifth model, despite
fitting the data best, is not appropriate for this research. It fails both the
Breusch–Pagan test (“bptest”) and the Jarque–Bera test. The only models that pass all
diagnostic tests are the first and third models, and only at the 1% significance level.

4. Conclusion:
The goal of this paper was to understand how investment in basic sanitation and in health
care affects life expectancy, using data on 40 countries for the year 2018. All estimations
and tests of the five models were carried out in RStudio Cloud.

Although all model estimates are statistically significant, we cannot conclude that life
expectancy depends positively on basic sanitation conditions. The second, fourth and fifth
models have heteroscedasticity problems, and the residuals of the fifth model are also not
normally distributed. These heteroscedasticity problems may be explained by the large
disparity between the largest and smallest values in the data.

We can conclude that life expectancy depends positively on health care expenditures at the
1% level of significance. The estimates of both the first and the third model are
statistically significant, as shown above. At that level there is also no evidence of
heteroscedasticity or autocorrelation, and normality of the error terms is not rejected.

Based on this research, we conclude that investment in health care should be increased in
order to improve life expectancy.

5. References:
Internet Sources:

The World Bank. Current health expenditure per capita (current US$) [online]. Available:
https://data.worldbank.org/indicator/SH.XPD.CHEX.PC.CD?view=chart [Accessed: 22.12.2021]

The World Bank. Life expectancy at birth, total (years) [online]. Available:
https://data.worldbank.org/indicator/SP.DYN.LE00.IN?view=chart [Accessed: 22.12.2021]

The World Bank. People using at least basic sanitation services (% of population) [online].
Available: https://data.worldbank.org/indicator/SH.STA.BASS.ZS?view=chart [Accessed: 22.12.2021]
