

ECON1313 Assignment 2 s3879951 Fuoc An Doanh

Basic Econometrics (Royal Melbourne Institute of Technology University Vietnam)



Downloaded by than loc (tloc02032010@gmail.com)

Assignment 2: Empirical Project Individual Report (40%)
ASSIGNMENT COVER PAGE

Course code ECON1313

Course name Basic Econometrics

Assignment 2 - Empirical Project Individual Report

Lecturer Bob Baulch

Student Fuoc An Doanh

sID S3879951

Word count 2713

Part 1. Overview and Data Collection
1. Overview of topic
2. Data Importation and Descriptive Statistics
a) Missing values
b) Natural logarithm
Part 2. Descriptive Statistics and initial estimation
1. Descriptive Statistics
2. Model 1
3. OLS model of Model 1
Part 3. Interpretation
1. Interpret Goodness-of-Fit - R²
2. F-test for Model 1
3. t-test for Model 1
4. Discussion
5. Testing multicollinearity
Part 4. Further Estimation
1. High DEPRATIO
2. Hypothesis t-test for HDP
3. Generate interaction term
4. Hypothesis t-test for interaction term
5. Model 4
6. Choosing the best Model
Part 5. Conclusion
1. Summarizing the findings
2. Policy recommendations
3. Limitations and suggestions
References
Appendices


Part 1. Overview and Data Collection

1. Overview of topic
Access to healthcare is a prominent issue in many countries, as health is a fundamental factor
in the development of both individuals and the country as a whole. Improving health conditions
is a national objective for many countries and one of the prime goals of the Sustainable
Development Goals. Thus, it is crucial to identify the factors which determine healthcare
expenditures.

A recent study examined the determinants of healthcare expenditure in developing and
transitional countries (Awais et al. 2021). The results showed that, among the examined
factors, Foreign Direct Investment (FDI), personal remittances (PR), urbanization, life
expectancy (LE), population aged 65 and above (POP65) and unemployment are noteworthy
determinants of healthcare spending in developing and transitional countries. In
particular, FDI and PR have negative relationships with healthcare expenditure in developing
countries, while PR has a significant positive effect among transitional countries.
Healthcare spending in transitional countries is also positively affected by unemployment.
Urbanization, on the other hand, has a considerably negative effect in both developing and
transitional countries. Meanwhile, LE and POP65 are positively associated with healthcare
expenditure across the income-classified country groups (Awais et al. 2021).

FDI typically has a major impact on healthcare expenditure because it raises awareness of
health-related goods and services in low-income countries, thus affecting the stocks of
those goods and services. PR, meanwhile, is found to improve the healthcare knowledge of
the populace and reduce poverty. Moreover, trade liberalization is strongly related to LE
and child mortality rates. LE is found to be one of the main determinants of healthcare
expenditure and the child mortality rate, and to have a major effect on FDI (Akca et al.
2017). Furthermore, urbanization gives people more access to healthcare. Evidence on
unemployment is mixed: some studies suggest that the unemployment rate has a negative
effect on healthcare expenditure (Abbas & Hiemenz 2011), while others find the opposite
(Braendle & Colombier 2016). As for the age structure, the share of the population aged 65
and above is observed to have a positive association with healthcare expenditure (Awais
et al. 2021).


2. Data Importation and Descriptive Statistics


The dataset provided covers countries in 2012 and is extracted from the World Bank. The
variables used to model healthcare expenditure are listed in Appendix 1.

a) Missing values
In the dataset given, there are missing values for Niger and South Sudan in 2012. The missing
value for Niger is replaced by the average of its values in the two nearest years, 2011
and 2013. For South Sudan, many values are missing and no neighbouring values are
available, so the country is removed from the dataset.
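
The imputation rule described above can be sketched as follows. This is a minimal Python illustration rather than the procedure actually run for the report; the function name `impute_from_neighbours` and all numeric values are hypothetical.

```python
from statistics import mean

def impute_from_neighbours(series, year):
    """Fill a missing year with the mean of the previous and the next year."""
    before, after = series.get(year - 1), series.get(year + 1)
    if before is None or after is None:
        return None  # no usable neighbours: the observation stays missing
    return mean([before, after])

# Hypothetical values for one variable of one country (not from the dataset).
niger_like = {2011: 40.0, 2012: None, 2013: 44.0}
no_neighbours = {2011: None, 2012: None, 2013: None}

niger_like[2012] = impute_from_neighbours(niger_like, 2012)
print(niger_like[2012])                              # 42.0
print(impute_from_neighbours(no_neighbours, 2012))   # None, so the country is dropped
```

A country for which the function returns None corresponds to the South Sudan case, where removal is the only remaining option.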

b) Natural logarithm
In the suggested model, the natural logarithms of current healthcare expenditure per capita
(PCHE), crude birth rate (per 1000 people) (CBR) and gross domestic product per capita (current
US$) (GDPPC) are used, while net official development assistance received (current US$)
(NETODA) enters in levels. The main reason is that the natural logarithm is useful in multiple
linear regression models because it makes right-skewed data less skewed and closer to normally
distributed. In addition, outliers are found when deriving the descriptive statistics of the
variables; the natural logarithm reduces their influence on the estimates. Another benefit is
that the natural logarithm can reduce heteroscedasticity, under which OLS ceases to be the
minimum-variance estimator and the usual F-test and t-test are no longer valid. Thus, it is in
the best interest of the analysis to take the natural logarithm of these variables.
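
To illustrate the point about skewness, the short Python sketch below compares the skewness of a right-skewed sample before and after taking logs. The numbers are hypothetical and chosen only to resemble a variable with one very large value; they are not the report's data.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: the mean of the cubed standardised deviations."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

# Hypothetical right-skewed values with one large outlier (not the report's data).
raw = [250, 400, 900, 1500, 2600, 4000, 21700]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2))     # strongly positive
print(round(skewness(logged), 2))  # much closer to zero
```

The log compresses the upper tail, which is why the outlier pulls the mean far less after the transformation.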

Part 2. Descriptive Statistics and initial estimation

1. Descriptive Statistics
The descriptive statistics of the variables are shown in Table 1.

Statistic                 PCHE     CBR   URBAN  DEPRATIO        GDPPC      NETODA
Mean                    192.63   29.00   48.12     31.85      3845.64      708.84
Standard Error           29.47    1.71    3.16      3.23       680.15      191.32
Median                  120.01   28.08   51.41     32.02      2590.14      420.24
Standard Deviation      176.81   10.23   18.94     19.41      4080.89     1147.93
Sample Variance       31260.77  104.68  358.88    376.73  16653665.36  1317733.16
Minimum                  19.81   11.17   11.19      1.86       252.36     -181.95
Maximum                 604.21   48.93   88.19     63.35     21711.15     6666.32
IQR                    -247.46  -16.53  -28.01    -35.34     -5034.45     -690.40
CoV                       0.92    0.35    0.39      0.61         1.06        1.62

Table 1. Descriptive Statistics of the variables (Note: for the convenience of display, NETODA has
been divided by 1 million compared to original data. CoV is the ratio of the standard deviation
to the mean. The IQR row is shown with the sign as reported; the interquartile range is its
magnitude.)
- PCHE: In general, the values of the variable are fairly spread out; the sample variance
is relatively high, but the CoV of the variable is below 1 (0.92), so the dispersion
relative to the mean is moderate.
- CBR: Both the SD and the CoV are considerably low, which indicates that the values are
concentrated around the mean; hence, the data are stable.
- URBAN: As observed in Table 1, with an SD of 18.94 and a CoV of 0.39, the values of
URBAN are fairly concentrated around the mean, which indicates that this variable is
consistent across countries.
- DEPRATIO: The descriptive statistics suggest that the data are rather concentrated;
the CoV is less than 1 (0.61), which means that the data are not volatile.
- GDPPC: There are clear signs of volatility in this variable. In particular, the sample
variance is remarkably high and the CoV exceeds 1 (1.06), so the raw values should be
used with caution.
- NETODA: Similar to GDPPC, the data points are fairly distant from the mean (CoV of
1.62); the minimum and maximum values differ immensely, causing the data to be
dispersed.
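
The CoV comparison used in the list above can be reproduced from the means and standard deviations in Table 1. The Python sketch below is illustrative only; the cut-off of 1 is the informal threshold used in the text, and the helper name is hypothetical.

```python
# Mean and standard deviation as reported in Table 1.
table1 = {
    "PCHE": (192.63, 176.81),
    "CBR": (29.00, 10.23),
    "URBAN": (48.12, 18.94),
    "DEPRATIO": (31.85, 19.41),
    "GDPPC": (3845.64, 4080.89),
    "NETODA": (708.84, 1147.93),
}

def coefficient_of_variation(mean, sd):
    """CoV = standard deviation divided by the mean (a unit-free ratio)."""
    return sd / mean

cov = {name: round(coefficient_of_variation(m, s), 2) for name, (m, s) in table1.items()}
volatile = [name for name, value in cov.items() if value > 1]  # threshold used in the text

print(cov)
print(volatile)   # ['GDPPC', 'NETODA']
```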

2. Model 1
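The population multiple linear regression for Model 1 is:

$$\log(PCHE) = \beta_0 + \beta_1 \log(CBR) + \beta_2\, URBAN + \beta_3\, DEPRATIO + \beta_4 \log(GDPPC) + \beta_5\, NETODA + u$$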

3. OLS model of Model 1


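For the Ordinary Least Squares (OLS) estimates of Model 1 to be unbiased, the first four of
the six classical linear model assumptions (MLR.1 to MLR.4) need to be satisfied; MLR.5 is
also checked because the usual standard errors depend on it.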

MLR.1: The relationship between dependent and independent variables is linear. This
assumption is satisfied.


MLR.2: Random sampling is met. The dataset given consists of 36 countries from different
continents and income classes.

MLR.3: No perfect collinearity. None of the independent variables in Model 1 is an exact
linear combination of the others, so this assumption holds for Model 1.

MLR.4: Zero conditional mean. The error term should have an expected value of zero given any
values of the explanatory variables, i.e. it should contain no information about them. The
mean of the residuals is approximately zero (as it must be for OLS with an intercept), so
MLR.4 is taken to be satisfied.

MLR.5: Homoscedasticity. This assumption is tested using the Breusch-Pagan test, whose null
hypothesis is homoscedasticity. The p-value (0.666) is larger than 0.05. Hence, we fail to
reject the null hypothesis: there is no evidence of heteroscedasticity in the data, and MLR.5
is met.

Since the assumptions above are satisfied, OLS can be applied to Model 1, with the results
shown in Table 2.

Variable      Coefficient  Standard Error  t value   P-value
Intercept       9.491e-01       2.242e+00    0.423  0.675074
log(CBR)       -5.379e-01       5.539e-01   -0.971  0.339248
URBAN           3.709e-03       7.835e-03    0.473  0.639391
DEPRATIO        4.989e-03       1.217e-02    0.410  0.684826
log(GDPPC)      6.739e-01       1.569e-01    4.295  0.000169
NETODA          4.818e-05       7.707e-05    0.625  0.536621

Table 2. Regression of Model 1 using the OLS method
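
As a quick consistency check on Table 2, each t value is the coefficient divided by its standard error. The Python sketch below recomputes them from the reported figures; it is only an illustration, since the report's own estimates came from R.

```python
# Coefficients and standard errors as reported in Table 2.
table2 = {
    "Intercept":  (9.491e-01, 2.242e+00),
    "log(CBR)":   (-5.379e-01, 5.539e-01),
    "URBAN":      (3.709e-03, 7.835e-03),
    "DEPRATIO":   (4.989e-03, 1.217e-02),
    "log(GDPPC)": (6.739e-01, 1.569e-01),
    "NETODA":     (4.818e-05, 7.707e-05),
}

def t_statistic(coefficient, standard_error):
    """t statistic for H0: coefficient = 0."""
    return coefficient / standard_error

for name, (coef, se) in table2.items():
    print(f"{name:<11} t = {t_statistic(coef, se):6.3f}")
```

Only log(GDPPC) has a t value well above 2 in absolute terms, which matches the p-values in the table.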

Part 3. Interpretation

1. Interpret Goodness-of-Fit - R²
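Using the results in Table 2, the estimated Model 1 is written in standard regression format as:

$$\widehat{\log(PCHE)} = 0.949 - 0.538 \log(CBR) + 0.0037\, URBAN + 0.0050\, DEPRATIO + 0.674 \log(GDPPC) + 0.000048\, NETODA$$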

The adjusted R-squared is 0.81, which means that approximately 81% of the variation in
log(PCHE) is explained by the variation in log(CBR), URBAN, DEPRATIO, log(GDPPC), and
NETODA, after adjusting for the number of regressors. The remaining 19% is explained by other
factors not covered in this paper.

Because adding more variables to the Model never lowers the R-squared value, regardless of
whether the variable has any relationship with the dependent variable, the adjusted R-squared
penalizes the addition of new variables. A large gap between the multiple and adjusted
R-squared indicates overfitting. The multiple R-squared of Model 1 is 0.839, so the difference
between the multiple and adjusted R-squared is approximately 0.029; this small gap suggests
that the Model is not overfitted.
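
The relationship between the two measures can be checked with the standard adjusted R-squared formula. The Python sketch below uses the reported multiple R-squared, the 36 countries mentioned in Part 2 and the 5 regressors of Model 1; it is an illustration rather than the report's R output.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.839   # multiple R-squared reported for Model 1
n = 36       # countries in the sample (Part 2)
k = 5        # regressors in Model 1

adj = adjusted_r_squared(r2, n, k)
print(round(adj, 3))        # 0.812
print(round(r2 - adj, 3))   # 0.027
```

The gap of about 0.027 differs slightly from the 0.029 quoted in the text only because the text uses the adjusted R-squared rounded to 0.81.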

2. F-test for Model 1


The F-test relies on the OLS assumptions holding. As discussed in Part 2, this condition is
fulfilled.

The hypotheses of the F-test state:

Null hypothesis: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$, all slope coefficients
are equal to 0 (the variables are not at all useful in determining healthcare expenditure).

Alternative: $H_1$: at least one $\beta_j \neq 0$ (at least one variable has an impact on healthcare
expenditure).

Using R, we derive an F-statistic of 31.204 with a p-value of 5.002e-11. At the significance
level of $\alpha = 0.05$, we reject the null hypothesis as the p-value is far smaller than the
significance level.

Thus, as the result of the F-test, at least one variable in the data has an impact on healthcare
expenditure.
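
The overall F-statistic can also be recovered from the R-squared. The Python sketch below applies the textbook formula with the rounded R-squared of 0.839, n = 36 and k = 5, so it only approximates the value reported by R.

```python
def overall_f_statistic(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = overall_f_statistic(r2=0.839, n=36, k=5)
print(round(f_stat, 1))   # about 31.3; the reported 31.204 uses the unrounded R-squared
```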

3. t-test for Model 1


The t-test is applicable as the OLS assumptions are met, as demonstrated in Part 2. For each
coefficient $\beta_j$, the hypotheses of the t-test state:

Null hypothesis: $H_0: \beta_j = 0$, the explanatory variable has no effect on healthcare expenditure.

Alternative: $H_1: \beta_j \neq 0$, the explanatory variable is a determinant of healthcare
expenditure.


With $\alpha = 0.05$, we derive the results below:

Variables    p-value  >,<,= α  Reject null hypothesis?
log(CBR)      0.3392     >     No
URBAN         0.6394     >     No
DEPRATIO      0.6848     >     No
log(GDPPC)    0.0002     <     Yes
NETODA        0.5366     >     No

Table 3. T-test for variables
Thus, according to Table 3, log(CBR), URBAN, DEPRATIO, and NETODA are not
statistically significant. log(GDPPC) is the only statistically significant variable.

Interpretation:

The coefficient of log(GDPPC), 0.674, indicates that a 1% increase in GDP per capita is
associated with an increase of approximately 0.674% in healthcare expenditure per capita,
holding other factors constant.
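
Because both variables are in logs, the coefficient is an elasticity, and the "1% gives 0.674%" reading is an approximation that holds for small changes. The Python sketch below computes the exact implied change; the function name is hypothetical.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y implied by a log-log coefficient."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

beta_gdppc = 0.6739   # coefficient on log(GDPPC) in Table 2

print(round(pct_change_in_y(beta_gdppc, 1), 3))    # 1% rise in GDP per capita
print(round(pct_change_in_y(beta_gdppc, 10), 2))   # 10% rise in GDP per capita
```

For a 1% change the exact and approximate figures almost coincide; for larger changes the exact effect is somewhat below the coefficient multiplied by the percentage change.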

4. Discussion
From the analysis above and the research article in Part 1, the findings are only partly as
expected. Awais et al. (2021) find that urbanization is negatively related to healthcare
expenditure in developing and transitional countries, whereas URBAN is not statistically
significant in Model 1. They also report negative relationships between GDP and healthcare
expenditure in developing and transitional countries, with coefficients of -0.09 and -0.02
(Awais et al. 2021), in contrast to the positive and significant coefficient on log(GDPPC)
found here. However, since those coefficients are small in magnitude, the negative
relationships are weak.

5. Testing multicollinearity
Using the variance inflation factor (VIF) method, multicollinearity is tested between the
variables.

Variable         VIF
log(CBR)    7.351083
URBAN       3.446012
DEPRATIO    8.733128
log(GDPPC)  4.548462
NETODA      1.224260

Table 4. Variance inflation factor of variables
As can be seen in Table 4, the VIFs of log(CBR) and DEPRATIO are relatively high. However,
all values are below the common threshold of 10, so the degree of correlation among the
regressors is acceptable and no serious multicollinearity is detected in the Model.
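
A VIF is defined as 1 / (1 - R_j²), where R_j² comes from regressing variable j on the other regressors. The Python sketch below inverts that definition for the values in Table 4 to show how much of each regressor is explained by the others; it does not re-estimate anything.

```python
# VIF values as reported in Table 4.
vif = {
    "log(CBR)": 7.351083,
    "URBAN": 3.446012,
    "DEPRATIO": 8.733128,
    "log(GDPPC)": 4.548462,
    "NETODA": 1.224260,
}

def auxiliary_r_squared(vif_value):
    """Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2."""
    return 1 - 1 / vif_value

for name, value in vif.items():
    flag = "above 10" if value > 10 else "below 10"
    print(f"{name:<11} VIF = {value:5.2f}  R_j^2 = {auxiliary_r_squared(value):.3f}  ({flag})")
```

For DEPRATIO, for example, a VIF of about 8.7 means that roughly 89% of its variation is explained by the other regressors, which is high but still under the threshold used here.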

Part 4. Further Estimation

1. High DEPRATIO
The median of DEPRATIO is 32.0195. A new dummy variable, High DEPRATIO (HDP), is generated;
it takes the value 1 if DEPRATIO is higher than its median and 0 otherwise. Model 2 is
created by replacing the DEPRATIO variable with the HDP variable. The population multiple
linear regression of Model 2 is:

$$\log(PCHE) = \beta_0 + \beta_1 \log(CBR) + \beta_2\, URBAN + \beta_3\, HDP + \beta_4 \log(GDPPC) + \beta_5\, NETODA + u$$

Variable     Coefficients  Standard Error  t value   P-value
Intercept       4.383e-01       1.795e+00    0.244     0.809
lCBR           -4.090e-01       3.781e-01   -1.082     0.288
URBAN           3.448e-03       7.503e-03    0.460     0.649
HDP             3.224e-01       2.705e-01    1.192     0.243
lGDPPC          6.864e-01       1.433e-01    4.790  4.21e-05
NETODA          4.741e-05       7.340e-05    0.646     0.523

Table 5. Regression of Model 2 using the OLS method
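
The construction of the HDP dummy used in Model 2 can be sketched in Python as follows. The DEPRATIO values here are hypothetical and serve only to show the median split; they are not the report's data.

```python
from statistics import median

# Hypothetical DEPRATIO values for six countries (not the report's data).
depratio = [12.4, 25.0, 31.0, 33.5, 47.2, 60.1]

cutoff = median(depratio)                               # median of the toy sample
hdp = [1 if value > cutoff else 0 for value in depratio]

print(cutoff)   # 32.25
print(hdp)      # [0, 0, 0, 1, 1, 1]
```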
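With the estimates in Table 5, the sample multiple linear regression of Model 2 is:

$$\widehat{\log(PCHE)} = 0.438 - 0.409 \log(CBR) + 0.0034\, URBAN + 0.322\, HDP + 0.686 \log(GDPPC) + 0.000047\, NETODA$$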


2. Hypothesis t-test for HDP


Null hypothesis: the coefficient on HDP is zero (healthcare expenditure is not impacted by HDP).

Alternative: the coefficient on HDP is not zero (HDP has an influence on healthcare expenditure).

The p-value of HDP is higher than the level of significance (0.243 > 0.05). As a result, we
fail to reject the null hypothesis: there is no statistically significant evidence that HDP
affects healthcare expenditure per capita.

3. Generate interaction term


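An interaction term between log(GDPPC) and HDP is generated to evaluate whether the effect of
GDP per capita on healthcare expenditure differs between countries with high and low
DEPRATIO. Adding the interaction term to Model 2 gives the new Model 3:

$$\log(PCHE) = \beta_0 + \beta_1 \log(CBR) + \beta_2\, URBAN + \beta_3\, HDP + \beta_4 \log(GDPPC) + \beta_5\, NETODA + \beta_6 [\log(GDPPC) \times HDP] + u$$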

Variable     Coefficients  Standard Error  t value   P-value
Intercept       4.473e-01       1.847e+00    0.242     0.810
lCBR           -4.138e-01       4.125e-01   -1.003     0.324
URBAN           3.456e-03       7.635e-03    0.453     0.654
HDP             3.768e-01       1.709e+00    0.220     0.827
lGDPPC          6.875e-01       1.499e-01    4.585  8.02e-05
NETODA          4.739e-05       7.465e-05    0.635     0.531
GDPPC×HDP      -7.072e-03       2.194e-01   -0.032     0.975

Table 6. Regression of Model 3 using OLS method
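Using the estimates in Table 6, the sample multiple linear regression of Model 3 is:

$$\widehat{\log(PCHE)} = 0.447 - 0.414 \log(CBR) + 0.0035\, URBAN + 0.377\, HDP + 0.688 \log(GDPPC) + 0.000047\, NETODA - 0.0071 [\log(GDPPC) \times HDP]$$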

4. Hypothesis t-test for interaction term


Null hypothesis: the coefficient on the interaction term is zero (the interaction term does
not affect healthcare expenditure).

Alternative: the coefficient on the interaction term is not zero (healthcare expenditure is
influenced by the interaction term).

The p-value of the interaction term is 0.975, which is far above the significance level of
0.05. Thus, we fail to reject the null hypothesis, which indicates that there is no evidence
that the effect of GDPPC on healthcare expenditure differs with HDP.
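
The size of the interaction effect can be read directly from Table 6: the elasticity of healthcare expenditure with respect to GDP per capita is the log(GDPPC) coefficient when HDP = 0 and that coefficient plus the interaction coefficient when HDP = 1. The Python sketch below does this arithmetic; the function name is hypothetical.

```python
# Coefficients as reported in Table 6 (Model 3).
beta_lgdppc = 6.875e-01        # log(GDPPC)
beta_interaction = -7.072e-03  # log(GDPPC) x HDP

def gdppc_elasticity(hdp):
    """Elasticity of PCHE with respect to GDPPC for HDP = 0 or HDP = 1."""
    return beta_lgdppc + beta_interaction * hdp

print(round(gdppc_elasticity(0), 4))   # 0.6875
print(round(gdppc_elasticity(1), 4))   # 0.6804
```

The two elasticities differ by less than 0.01, which is in line with the very high p-value of the interaction term.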

5. Model 4
While re-working the Model, we found that NETODA is extremely volatile due to a huge
outlier. To address this problem and help secure homoscedasticity, Model 4 would replace
NETODA with its natural logarithm, log(NETODA), in the estimating equation.

However, the dataset contains two negative values of NETODA, for which the natural logarithm
is undefined. Because of that, the natural logarithm cannot be applied to NETODA for the full
sample. Hence, Model 4 is not a suitable choice.
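
The obstacle can be shown in a few lines of Python: the natural logarithm is only defined for positive numbers, so any non-positive observation has no logged value. The NETODA figures below are hypothetical, and `safe_log` is an illustrative helper rather than part of the report's workflow.

```python
import math

def safe_log(value):
    """Natural log, or None where it is undefined (value <= 0)."""
    return math.log(value) if value > 0 else None

# Hypothetical NETODA values in millions of US$; two of them are negative.
netoda = [420.2, 1150.0, -181.9, 6666.3, -12.5]

logged = [safe_log(v) for v in netoda]
undefined = sum(1 for v in logged if v is None)

print(undefined)   # 2
```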

6. Choosing the best Model


After considering the 4 models above, Model 1 is the preferred choice. In Models 2 and
3, the p-values of the two added variables, HDP and GDPPC×HDP, are higher than the
significance level α=0.05 (and remain higher even at a significance level of 0.1),
indicating that these two variables are statistically insignificant. As for Model 4, the
attempt to use the natural logarithm of NETODA failed, so that Model is best avoided. The
assumptions and the interpretation of the coefficients of Model 1 have been discussed
earlier in this paper.

Part 5. Conclusion

1. Summarizing the findings


To sum up my findings, among the variables, GDP per capita and NETODA were the two most
volatile to begin with. GDP per capita is the only factor found to have a statistically
significant impact on healthcare expenditure in this research, even though the reference
paper reports otherwise. The other factors, including those added later, are not
statistically significant determinants of healthcare expenditure. No serious
multicollinearity is detected in Model 1. The best of the 4 models considered is Model 1.


2. Policy recommendations
Based on my analysis, the policy recommendation for the governments of my assigned group
of countries would be to focus on factors which induce economic growth. A more stable
economy will attract FDI into healthcare-related systems. In addition, GDP per capita
remarkably affects healthcare expenditure, which can be explained by the fact that as income
increases, people are more willing to access healthcare, which might be expensive in some
places.

3. Limitations and suggestions


One limitation of this report is that one of the countries in the dataset, South Sudan, is
eliminated because of insufficient data, causing the dataset to shrink. Another limitation is
that NETODA would be best used in natural logarithm form, but the presence of negative values
has prevented NETODA from being expressed in its most suitable functional form. To improve
the estimations in my models, considering additional explanatory variables is suggested.

References
Abbas F and Hiemenz U (2011) 'Determinants of Public Health expenditures in Pakistan',
ZEF-Discussion Papers on Development Policy No. 158, 1 November 2011, accessed 6
December 2022, <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1967070>.

Akca N, Sönmez S and Yılmaz A (2017) 'Determinants of health expenditure in OECD
countries: A decision tree model', Pak J Med Sci, 2017, 33(6) : 1490-1494, doi:
https://doi.org/10.12669/pjms.336.13300, accessed 9 December 2022,
<https://pjms.com.pk/index.php/pjms/article/view/13300>.

Awais M, Khan A and Ahmad SM (2021) 'Determinants of health expenditure from global
perspective: A panel data analysis', Liberal Arts & Social Sciences International Journal, 29
June 2021, 5(1) : 481-496, doi: https://doi.org/10.47264/idea.lassij/5.1.31, accessed 9
December 2022, <https://ideapublishers.org/index.php/lassij/article/view/306/177>.

Braendle T and Colombier C (2016) 'What drives public health care expenditure growth?
Evidence from Swiss cantons, 1970–2012', Health Policy, September 2016, 120(9) : 1051-1060,
doi: https://doi.org/10.1016/j.healthpol.2016.07.009, accessed 10 December 2022,
<https://www.sciencedirect.com/science/article/abs/pii/S0168851016301816?via%3Dihub>.

Appendices
Variables                                  Unit                          Abbreviation  Dependent/Independent Variables
Current health expenditure per capita      Current US$                   PCHE          Dependent variable
Birth rate, crude                          per 1000 people               CBR           Independent variable
Urbanization                               urban population, % of the    URBAN         Independent variable
                                           total population
Share of the population that is under 15   percentage of the population  DEPRATIO      Independent variable
years of age or above 65 years of age
GDP per capita                             current US$                   GDPPC         Independent variable
Net official development assistance        current US$                   NETODA        Independent variable
received

Appendix 1. Variables used in estimating multiple regression of healthcare expenditure
