
FOREIGN TRADE UNIVERSITY

FACULTY OF INTERNATIONAL ECONOMICS

---------***--------

ECONOMETRICS
MID-TERM REPORT

FACTORS AFFECTING GROSS REGIONAL DOMESTIC
PRODUCT PER CAPITA OF VIET NAM IN 2018

Instructor: Ph.D Đinh Thị Thanh Bình


Class ID: KTEE 309.2
Group number: 10
Group members:
04 – Đỗ Quang Anh – 1911150007
23 – Nguyễn Khắc Đức – 1911150019
16 – Nguyễn Tuấn Cường – 1911150015
24 – Tạ Văn Đức – 1911150020

Ha Noi – December, 2020


TABLE OF CONTENTS

INTRODUCTION
I. LITERATURE REVIEW
1. Question of interest
2. Procedure and program
II. DATA COLLECTION
1. Data type
2. Data collection
III. STATISTICAL DESCRIPTION OF VARIABLES
1. Running DES function
2. Running SUM function
3. Running TAB function
IV. QUANTITATIVE ANALYSIS
1. OLS method and assumptions
2. Regression and correlation
V. PROBLEM TESTING
1. Omitted variable test
2. Multicollinearity testing
3. Heteroskedasticity testing
4. Normality (of u) testing
VI. STATISTICAL HYPOTHESIS TESTING
1. Critical value method
2. Confidence interval method
VII. CONCLUSIONS AND POLICY IMPLICATION
1. Conclusions
2. Policy implication
VIII. REFERENCE
IX. APPENDIX

INTRODUCTION

Economics is a science that bears directly on social development and national growth. Within
economic research, econometrics is an important subject that helps people study economic issues
and find ways to develop the economy. Econometrics is based on the development of statistical
methods for estimating economic relationships, testing economic theories, and evaluating
government and business policies. It is a useful and indispensable tool for economists to measure
economic relationships. We therefore recognize the importance of understanding econometrics and
applying this knowledge to analyze statistical problems logically. Thanks to econometrics, people
can gain a clearer view of economic policies, theories and phenomena.

  Similar to GDP per capita, GRDP per capita is an index that reflects the development of the
economy, but for smaller regions such as cities, towns or provinces. Many people wonder which
factors affect this index and how large their impact is. In this report, we try to clarify the
“Factors affecting gross regional domestic product per capita of Viet Nam” by using the
methodology of econometrics and the Stata program.

  We sincerely thank our econometrics instructor, Ph.D. Dinh Thi Thanh Binh, for helping us to
complete this report. Mistakes are inevitable in our work, and we hope that you will comment on it
and give us advice so that we can improve.

I. LITERATURE REVIEW

1. Question of interest

  Gross domestic product (GDP) is a statistic that measures the size of an economy. Real GDP per
capita is useful in capturing the growth of real output per person, since inflationary effects have
been removed. It is, therefore, the most widely used measure of real income. However, we believe
that the income of people differs considerably across the regions of a country, so we chose
GRDP per capita (gross regional domestic product per capita) as the main object of our research.
GRDP per capita is one of the most important indexes for rating the economic growth of a
region. Therefore, our group raised the question: “What are the factors affecting GRDP per capita,
and what is their impact?”

  Even though many factors affect GRDP per capita, we focus on four of them: population density,
high school graduation rate, labor participation rate and FDI. We examine the direction and the
statistical significance of their impact on the GRDP per capita of Viet Nam.

Each of these factors affects economic growth in its own way, which can be seen in key indexes
such as GDP and CPI. That is why we expect them to affect GRDP, and GRDP per capita, as well.

Based on Anna Ek's study in 2007, the theoretical framework shows that FDI has a positive
impact on economic growth because it serves as a channel through which new technology is
transferred from one country to another, and thereby it increases output and GDP/GRDP in the
recipient country. 

Regarding density, a population density that is too high decreases the natural endowment per
capita but eases the development of infrastructure, which implies that there is an optimal
population density for economic growth (Yegorov, 2009).

The Alliance for Excellent Education (2015) released data outlining the economic benefits of a
high school diploma. The “Graduation Effect” data shows how increasing the high school
graduation rate to 90 percent creates new jobs, increases consumer spending, boosts tax revenue,
and increases the GDP/GRDP.

According to research published by the Federal Reserve Bank of Philadelphia in 2017, a falling
labor participation rate can slow the growth of GRDP in a region, since fewer people are contributing
to the region’s output of goods and services. Additionally, a lower participation rate can lead to
higher tax rates, since the government has a narrower tax base from which to draw revenue, the
authors noted.

  In the following parts, models and data are going to be utilized in order to run the regression
model and the result will be analyzed in order to answer the question of interest.

2. Procedure and program

   Econometrics refers to a branch of business analytics, modeling, and forecasting techniques
for modeling the behavior of, or forecasting, certain business, financial, economic, physical
science, and other variables. In this report, the Stata program is used to analyze the data and run
the regression model.

   A basic tool for econometrics is the multiple linear regression model. Econometric theory
uses statistical theory and mathematical statistics to evaluate and develop econometric methods.
Econometricians try to find estimators that have desirable statistical properties including
unbiasedness, efficiency, and consistency.

  There are 8 steps to conduct an empirical analysis:

- Step 1: Question of interest based on economic theories.

  In principle, econometric methods can be used to answer a wide range of questions, such as
testing some aspects of an economic theory or the effects of a government policy. When we
need to test an economic theory, a formal economic model is constructed. An economic model
consists of mathematical equations that describe various relationships. For example, individual
consumption decisions, subject to a budget constraint, are described by mathematical methods.

- Step 2: Set up the mathematical model.

  The mathematical model reflects the exact relationship between variables.

- Step 3: Set up the econometric model.

  An econometric model can be derived from a mathematical model by allowing for uncertainty.

  The error term (disturbance) in an econometric model represents factors that are not included in
the model but can affect the dependent variable.

- Step 4: Data collection.

  Data can be divided into 2 types: primary and secondary data.

  The structure of economic data: cross-sectional data, time-series data and pooled data. Pooled
data can be further categorized into pooled cross-sectional data and panel data.

- Step 5: Estimate the parameters of the model.

  Parameter estimates (also called coefficients) are the change in the response associated with a
one-unit change of the predictor, all other predictors being held constant. The unknown model
parameters are estimated using least-squares estimation.

- Step 6: Test for violations of the model assumptions.

  The assumptions of the model can be violated when there is high multicollinearity,
heteroskedasticity or autocorrelation.

- Step 7: Test hypotheses.

  The Fisher (F), Durbin-Watson, Lagrange multiplier and Hausman tests can be used to test the
appropriateness of the model and the estimated parameters.

- Step 8: Analyze the estimated results and draw forecasts/policy implications.

II. DATA COLLECTION

1. Data type

- The model is estimated with cross-sectional data.

- A cross-sectional data set consists of a sample of individuals, households, firms, cities, etc.
taken at a given point in time. Differences in timing within the sample are ignored.
Analysis of cross-sectional data usually consists of comparing the differences among
selected subjects. The data in this report were collected for each province/city of Vietnam.

2. Data collection

- Data in this report is secondary data, as it is collected from an existing source.
- Collected in 2018, from 62 provinces of Vietnam.
- Source of data: General Statistics Office of Vietnam (link: gso.gov.vn)
- The meaning of each variable:
  - GRDP: GRDP per capita (mil VND/capita/year)
  - Grad: High school graduation rate (%)
  - Inv: Foreign direct investment (mil USD)
  - Dens: Population density (people/km2)
  - Rate: Labor participation rate (%)

III. STATISTICAL DESCRIPTION OF VARIABLES
1. Running DES function
The most important information after using the DES function is the variables’ label.

. des grdp grad inv dens rate

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------
grdp            double  %10.0g                Gross regional domestic product
                                                per capita
grad            double  %10.0g                High school graduation rate
inv             double  %10.0g                Foreign direct investment
dens            int     %8.0g                 Population density
rate            double  %10.0g                Labor participation rate

The DES function provides the meaning and the measurement of the 5 variables below:

- Grdp: stands for Gross regional domestic product per capita (unit: mil VND/capita/year).
  Grdp is a quantitative variable.
- Grad: stands for High school graduation rate (unit: percent). Grad is a quantitative variable.
- Inv: stands for Foreign direct investment (unit: mil USD). Inv is a quantitative variable.
- Dens: stands for Population density (unit: people/km2). Dens is a quantitative variable.
- Rate: stands for Labor participation rate (unit: percent). Rate is a quantitative variable.

2. Running SUM function

The SUM function reports the number of observations, mean, standard deviation, minimum and
maximum value of each variable.

. sum grdp grad inv dens rate

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        grdp |         62    55.39306    27.99608       20.7     154.84
        grad |         62    94.59387      3.3962      85.36       99.4
         inv |         62    678.3661    1571.956         .1     8669.7
        dens |         62    516.0645    667.7978         51       4363
        rate |         62    58.16613    3.807755       50.4       68.8

Where:
- Obs is the number of observations
- Mean is the average value of the variable
- Std. Dev. is the standard deviation of the variable
- Min/Max is the minimum/maximum value of the variable


By using the SUM function, we have:
- Grdp: With 62 observations, the mean value is 55.393 and the Std. Dev. is 27.996. The minimum
  value is 20.7, the maximum value is 154.84
- Grad: With 62 observations, the mean value is 94.594 and the Std. Dev. is 3.396. The minimum
  value is 85.36, the maximum value is 99.4
- Inv: With 62 observations, the mean value is 678.366 and the Std. Dev. is 1571.956. The minimum
  value is 0.1, the maximum value is 8669.7
- Dens: With 62 observations, the mean value is 516.065 and the Std. Dev. is 667.798. The
  minimum value is 51, the maximum value is 4363
- Rate: With 62 observations, the mean value is 58.166 and the Std. Dev. is 3.808. The minimum
  value is 50.4, the maximum value is 68.8
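
The figures above are in different units, so their spreads are hard to compare directly. As an illustrative aside that is not part of the original analysis, the sketch below computes the coefficient of variation (standard deviation divided by mean) from the numbers in the SUM output. The dictionary name `summary` is our own choice.

```python
# Summary statistics copied from the SUM output: variable -> (mean, std. dev.)
summary = {
    "grdp": (55.39306, 27.99608),
    "grad": (94.59387, 3.3962),
    "inv": (678.3661, 1571.956),
    "dens": (516.0645, 667.7978),
    "rate": (58.16613, 3.807755),
}

# Coefficient of variation = standard deviation / mean (unit-free).
cv = {name: sd / mean for name, (mean, sd) in summary.items()}

# Print the variables from most to least dispersed relative to their mean.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:5s} CV = {value:.3f}")
```

Inv and Dens have standard deviations larger than their means, which suggests strongly right-skewed distributions, while Grad and Rate vary little around their means.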
3. Running TAB function
Running the TAB function on each variable in turn gives the frequency, percent and cumulative
percent of each value of that variable.

Tab Grdp

. tab grdp

Gross |
regional |
domestic |
product | Freq. Percent Cum.
------------+-----------------------------------
20.7 | 1 1.61 1.61
26.7 | 1 1.61 3.23
27.31 | 1 1.61 4.84
30 | 1 1.61 6.45
33 | 2 3.23 9.68
33.6 | 1 1.61 11.29
34.33 | 1 1.61 12.90
36 | 1 1.61 14.52
36.64 | 1 1.61 16.13
37.49 | 1 1.61 17.74
37.5 | 2 3.23 20.97
……… … …… ……
80.5 | 1 1.61 85.48
83.16 | 1 1.61 87.10
86.5 | 1 1.61 88.71
93.94 | 1 1.61 90.32
97.1 | 1 1.61 91.94
97.3 | 1 1.61 93.55
117.66 | 1 1.61 95.16
130.8 | 1 1.61 96.77
150.1 | 1 1.61 98.39
154.84 | 1 1.61 100.00
------------+-----------------------------------
Total | 62 100.00

Analyzing information from the table above:


- GRDP per capita ranges from 20.7 to 154.84 (mil VND/capita/year)
- 93.55% of the observations (58 of 62) have a GRDP per capita of less than 100
  mil VND/capita/year

Tab Grad

. tab grad

Graduation |
from high |
school rate | Freq. Percent Cum.
------------+-----------------------------------
85.36 | 1 1.61 1.61
86.01 | 1 1.61 3.23
86.74 | 1 1.61 4.84
87.07 | 1 1.61 6.45
89.81 | 1 1.61 8.06
90.45 | 1 1.61 9.68
90.77 | 1 1.61 11.29
90.86 | 1 1.61 12.90
91.1 | 1 1.61 14.52
91.45 | 1 1.61 16.13
91.51 | 1 1.61 17.74
……… … …… ……
97.83 | 1 1.61 82.26
97.92 | 1 1.61 83.87
97.97 | 1 1.61 85.48
98.22 | 1 1.61 87.10
98.29 | 1 1.61 88.71
98.43 | 1 1.61 90.32
98.87 | 1 1.61 91.94
99 | 1 1.61 93.55
99.22 | 1 1.61 95.16
99.24 | 1 1.61 96.77
99.39 | 1 1.61 98.39
99.4 | 1 1.61 100.00
------------+-----------------------------------
Total | 62 100.00

Analyzing information from the table above:

- High school graduation rate ranges from 85.36% to 99.4%

Tab Inv

. tab inv

Foreign |
direct |
investment | Freq. Percent Cum.
------------+-----------------------------------
.1 | 3 4.84 4.84
.2 | 1 1.61 6.45
.4 | 1 1.61 8.06
.5 | 1 1.61 9.68
.8 | 1 1.61 11.29
.9 | 1 1.61 12.90
1.2 | 1 1.61 14.52
1.4 | 1 1.61 16.13
4.4 | 1 1.61 17.74
7.3 | 1 1.61 19.35
…… … …… ……
1163.3 | 1 1.61 87.10
1263.5 | 1 1.61 88.71
1374 | 1 1.61 90.32
1695.2 | 1 1.61 91.94
1809 | 1 1.61 93.55
2178.8 | 1 1.61 95.16
3508.6 | 1 1.61 96.77
8338.2 | 1 1.61 98.39
8669.7 | 1 1.61 100.00
------------+-----------------------------------
Total | 62 100.00
Analyzing information from the table above:
- Foreign direct investment ranges from 0.1 to 8669.7 mil USD
- About 87.10% of the observations (54 of 62) have foreign direct investment above 1 mil USD

Tab Dens

. tab dens

Population |
density | Freq. Percent Cum.
------------+-----------------------------------
51 | 1 1.61 1.61
56 | 1 1.61 3.23
63 | 1 1.61 4.84
65 | 1 1.61 6.45
79 | 1 1.61 8.06
88 | 1 1.61 9.68
94 | 1 1.61 11.29
96 | 1 1.61 12.90
… … …… ……
1022 | 1 1.61 88.71
1067 | 1 1.61 90.32
1176 | 1 1.61 91.94
1185 | 1 1.61 93.55
1347 | 1 1.61 95.16
1664 | 1 1.61 96.77
2398 | 1 1.61 98.39
4363 | 1 1.61 100.00
------------+-----------------------------------
Total | 62 100.00

Analyzing information from the table above:

- Population density ranges from 51 people/km2 to 4363 people/km2

Tab Rate

. tab rate

Participation|
labor rate | Freq. Percent Cum.
------------+-----------------------------------
50.4 | 1 1.61 1.61
51.6 | 1 1.61 3.23
51.7 | 1 1.61 4.84
52.4 | 1 1.61 6.45
53.2 | 1 1.61 8.06
53.5 | 2 3.23 11.29
53.7 | 2 3.23 14.52
54.1 | 1 1.61 16.13
54.6 | 2 3.23 19.35
54.7 | 1 1.61 20.97
… … …… ……
62 | 1 1.61 85.48
62.7 | 1 1.61 87.10
63.1 | 1 1.61 88.71
63.2 | 1 1.61 90.32
63.6 | 1 1.61 91.94
64.2 | 1 1.61 93.55
64.7 | 1 1.61 95.16
65 | 1 1.61 96.77
65.9 | 1 1.61 98.39
68.8 | 1 1.61 100.00
------------+-----------------------------------
Total | 62 100.00

Analyzing information from the table above:

- Labor participation rate ranges from 50.4% to 68.8%

IV. QUANTITATIVE ANALYSIS
1. OLS method and assumption
a. OLS method
Ordinary least squares (OLS) regression is a statistical method that estimates the
relationship between one or more independent variables and a dependent variable. It does so by
minimizing the sum of squared differences between the observed values of the dependent variable
and the values predicted by the fitted linear function.

b. Assumptions

There are seven assumptions in the OLS method:

- Assumption 1 - Linear in parameters: In the population regression function (PRF), the
  dependent variable Y is related to the independent variable X and the error term u as

  $Y = \beta_0 + \beta_1 X + u$

- Assumption 2 - Random sampling: We have a random sample of size n.

- Assumption 3 - Sample variation in the explanatory variable: The sample outcomes on X,
  namely $\{X_i, i = 1, \dots, n\}$, are not all the same value.

- Assumption 4 - No perfect collinearity: In the sample, there are no exact linear relationships
  among the independent variables.

- Assumption 5 - Zero conditional mean: The error term has an expected value of zero given any
  value of the explanatory variable. In other words, $E(u \mid X) = 0$.

  This assumption says that the factors not explicitly included in the model, and therefore
  subsumed in $u_i$, do not systematically affect the mean value of Y; the positive $u_i$ values
  cancel out the negative $u_i$ values so that their average effect on Y is zero.

- Assumption 6 - Homoskedasticity: The error term $u_i$ has the same variance given any value
  of the independent variable. In other words,

  $\mathrm{Var}(u_i \mid X_i) = E\{[u_i - E(u_i \mid X_i)]^2 \mid X_i\} = E(u_i^2 \mid X_i) = \sigma^2$

  Var(u|X) measures the spread of Y around its conditional mean E(Y|X). This assumption means
  that the Y values corresponding to various X values have the same variance: the variance
  around the regression line is the same across the X values and neither increases nor
  decreases as X varies.

- Assumption 7 - Normality: The population error u is independent of the explanatory variables X
  and normally distributed:

  $u \sim N(0, \sigma^2)$

If these assumptions hold true, the OLS procedure creates the best possible estimates. In
statistics, estimators that produce unbiased estimates that have the smallest variance are
referred to as being “efficient.” Efficiency is a statistical concept that compares the
quality of the estimates calculated by different procedures while holding the sample size
constant. When the assumptions hold true, OLS has the smallest variance among all linear
unbiased estimators. Another benefit of satisfying these assumptions is that as the sample
size increases to infinity, the coefficient estimates converge on the actual population
parameters.
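
To make the "minimizing the sum of squares" idea concrete, here is a minimal sketch of OLS for the one-regressor case of Assumption 1. It uses the standard closed-form formulas for the slope and intercept. The function name `ols_simple` and the five data points are hypothetical and are not taken from the report's data set.

```python
def ols_simple(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariation of x and y, and sample variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Assumption 3 fails: all x values are identical, slope is undefined.
        raise ValueError("x has no sample variation")
    b1 = sxy / sxx             # slope estimate
    b0 = mean_y - b1 * mean_x  # intercept estimate
    return b0, b1


# Hypothetical illustration data (not from the report).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = ols_simple(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

The explicit check on `sxx` shows why Assumption 3 is needed: without variation in X the slope formula divides by zero.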

2. Regression and correlation


a. Set up model
The relationship between the dependent variable (Y) and independent variables (X)
is illustrated by regression in the following form:
$\widehat{GRDP} = \hat{\beta}_0 + \hat{\beta}_1 \times Grad + \hat{\beta}_2 \times Inv + \hat{\beta}_3 \times Dens + \hat{\beta}_4 \times Rate$
Where:
- GRDP (dependent variable): Gross regional domestic product per capita
- Grad (independent variable): High school graduation rate
- Inv (independent variable): Foreign direct investment
- Dens (independent variable): Population density
- Rate (independent variable): Labor participation rate
b. Analyzing the correlation between variables
Running function: corr grdp grad inv dens rate
We have the following result:
. corr grdp grad inv dens rate
(obs=62)

             |     grdp     grad      inv     dens     rate
-------------+---------------------------------------------
        grdp |   1.0000
        grad |   0.0890   1.0000
         inv |   0.6036   0.0607   1.0000
        dens |   0.6576   0.2289   0.8097   1.0000
        rate |  -0.5164   0.1153  -0.3580  -0.3843   1.0000

As can be seen from the above result of running CORR, all 4 independent variables are
correlated with the dependent variable GRDP to some degree. Dens and Inv have the strongest
correlations with GRDP (0.6576 and 0.6036), Grad has only a weak association with GRDP (0.0890),
and Rate has a negative relationship with GRDP (-0.5164). Among the independent variables, Inv
and Dens are strongly correlated with each other (0.8097).
- Grad & GRDP: The higher the high school graduation rate is, the higher the gross
  regional domestic product per capita tends to be
- Inv & GRDP: The higher the foreign direct investment is, the higher the gross regional
  domestic product per capita tends to be
- Dens & GRDP: The higher the population density is, the higher the gross regional domestic
  product per capita tends to be
- Rate & GRDP: The lower the labor participation rate is, the higher the gross regional
  domestic product per capita tends to be
c. Running regression function
Using function: reg grdp grad inv dens rate
We have the following result:
. reg grdp grad inv dens rate

      Source |       SS           df       MS      Number of obs   =        62
-------------+----------------------------------   F(4, 57)        =     15.68
       Model |  25049.3279         4  6262.33198   Prob > F        =    0.0000
    Residual |  22761.2936        57   399.32094   R-squared       =    0.5239
-------------+----------------------------------   Adj R-squared   =    0.4905
       Total |  47810.6215        61  783.780681   Root MSE        =    19.983

------------------------------------------------------------------------------
        grdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        grad |   .1927351   .8116529     0.24   0.813    -1.432572    1.818042
         inv |   .0030894   .0028439     1.09   0.282    -.0026054    .0087841
        dens |   .0165378   .0070495     2.35   0.022     .0024214    .0306541
        rate |  -2.245402   .7477628    -3.00   0.004    -3.742771   -.7480329
       _cons |   157.1376   79.52838     1.98   0.053    -2.115253    316.3904
------------------------------------------------------------------------------

It can be inferred from the above result:

Variable     Coefficient   Estimate    t-value   p-value
(constant)   β0            157.1376     1.98     0.053
Grad         β1              0.193      0.24     0.813
Inv          β2              0.003      1.09     0.282
Dens         β3              0.017      2.35     0.022
Rate         β4             -2.245     -3.00     0.004

We obtain the following estimated regression model:

$\widehat{GRDP} = 157.1376 + 0.193 \times Grad + 0.003 \times Inv + 0.017 \times Dens - 2.245 \times Rate$
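
As a cross-check on how the summary figures in the regression output relate to each other, the sketch below recomputes R-squared, adjusted R-squared, the F statistic and Root MSE from the sums of squares in the ANOVA table, using the standard textbook formulas. It also evaluates the fitted equation at the sample means reported by SUM, since an OLS regression with an intercept passes through the point of means. All numbers are copied from the Stata output in this report, and the variable names are our own.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_model, df_model = 25049.3279, 4
ss_resid, df_resid = 22761.2936, 57
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

r2 = ss_model / ss_total
adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / df_total)
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
root_mse = math.sqrt(ss_resid / df_resid)

# Unrounded coefficients from the regression table and means from SUM.
coef = {"grad": 0.1927351, "inv": 0.0030894, "dens": 0.0165378, "rate": -2.245402}
intercept = 157.1376
means = {"grad": 94.59387, "inv": 678.3661, "dens": 516.0645, "rate": 58.16613}

# Fitted GRDP per capita at the sample means of the regressors.
fitted_at_means = intercept + sum(coef[k] * means[k] for k in coef)

print(f"R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, F = {f_stat:.2f}, Root MSE = {root_mse:.3f}")
print(f"fitted value at the means = {fitted_at_means:.2f}")
```

The fitted value at the means reproduces the sample mean of grdp (55.39) up to rounding of the reported coefficients.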

V. PROBLEM TESTING

1. Omitted variable test

We run Ramsey’s RESET test to check the functional form of the model.

Apply ovtest
. reg grdp grad inv dens rate

      Source |       SS           df       MS      Number of obs   =        62


-------------+----------------------------------   F(4, 57)        =     15.68
       Model |  25049.3279         4  6262.33198   Prob > F        =    0.0000
    Residual |  22761.2936        57   399.32094   R-squared       =    0.5239
-------------+----------------------------------   Adj R-squared   =    0.4905
       Total |  47810.6215        61  783.780681   Root MSE        =    19.983

------------------------------------------------------------------------------
        grdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        grad |   .1927351   .8116529     0.24   0.813    -1.432572    1.818042
         inv |   .0030894   .0028439     1.09   0.282    -.0026054    .0087841
        dens |   .0165378   .0070495     2.35   0.022     .0024214    .0306541
        rate |  -2.245402   .7477628    -3.00   0.004    -3.742771   -.7480329
       _cons |   157.1376   79.52838     1.98   0.053    -2.115253    316.3904
------------------------------------------------------------------------------

. ovtest

Ramsey RESET test using powers of the fitted values of grdp


       Ho:  model has no omitted variables
                  F(3, 54) =      3.60
                  Prob > F =      0.0190

From the result, we can see that (Prob > F) = 0.0190 < 0.05, so we reject H0.

=> The RESET test indicates that the functional form of the model is misspecified (Stata
reports this as omitted variables, here powers of the fitted values of grdp).

We therefore change the functional form from a lin-lin to a lin-log model in Inv, replacing
the variable “Inv” with its natural logarithm “log(Inv)” (linv):

- Apply gen linv = log(inv)
- Apply reg grdp grad linv dens rate
- Apply ovtest
. reg grdp grad linv dens rate

      Source |       SS           df       MS      Number of obs   =        62


-------------+----------------------------------   F(4, 57)        =     16.41
       Model |  25586.7408         4  6396.68521   Prob > F        =    0.0000
    Residual |  22223.8807        57  389.892643   R-squared       =    0.5352
-------------+----------------------------------   Adj R-squared   =    0.5025
       Total |  47810.6215        61  783.780681   Root MSE        =    19.746

------------------------------------------------------------------------------
        grdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        grad |   .0867559   .7864118     0.11   0.913    -1.488007    1.661518
        linv |    1.77309    1.10238     1.61   0.113    -.4343888    3.980569
        dens |   .0197201    .004646     4.24   0.000     .0104167    .0290235
        rate |  -1.810615    .793251    -2.28   0.026    -3.399073   -.2221578
       _cons |   134.4399   80.91612     1.66   0.102    -27.59186    296.4716
------------------------------------------------------------------------------
. ovtest

Ramsey RESET test using powers of the fitted values of grdp


       Ho:  model has no omitted variables
                  F(3, 54) =      2.14
                  Prob > F =      0.1054

As (Prob > F) = 0.1054 > 0.05, we fail to reject H0: the test no longer finds evidence that
the functional form is misspecified.
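
In a lin-log specification the coefficient on linv is read differently from the other coefficients: it relates a proportional change in Inv to an absolute change in GRDP per capita. The sketch below illustrates this with the estimated coefficient 1.77309, holding the other regressors fixed. It assumes `log()` in Stata is the natural logarithm, and the helper name `grdp_change` is our own.

```python
import math

# Coefficient on linv = log(inv) from the re-estimated model.
beta_linv = 1.77309


def grdp_change(pct_increase_inv):
    """Predicted change in GRDP per capita (mil VND) when Inv rises by the
    given percentage, holding grad, dens and rate constant."""
    return beta_linv * math.log(1 + pct_increase_inv / 100)


effect_1pct = grdp_change(1.0)        # roughly beta / 100
effect_doubling = grdp_change(100.0)  # beta * ln(2)

print(f"+1% Inv   -> {effect_1pct:.4f} mil VND/capita/year")
print(f"+100% Inv -> {effect_doubling:.3f} mil VND/capita/year")
```

A 1% increase in FDI is thus associated with a predicted increase of about 0.018 mil VND in GRDP per capita, and a doubling of FDI with about 1.23 mil VND.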

2. Multicollinearity testing

Multicollinearity is a high degree of correlation among the explanatory variables, which may
make it difficult to separate the effects of the individual regressors: standard errors are
inflated and t-values are depressed. Multicollinearity can be detected by examining the
correlation matrix of the regressors and by carrying out auxiliary regressions among them.

In Stata, the vif command is used to test for multicollinearity. The VIF (Variance Inflation
Factor) is a measure of the amount of multicollinearity in a set of regression variables. The
presence of multicollinearity within the set of independent variables makes it harder to assess
the significance of individual independent variables in the regression model. When severe
multicollinearity exists, the variance inflation factor of the variables involved exceeds the
conventional threshold of 10. The VIF of regressor j is given by:

$VIF_j = \frac{1}{1 - R_j^2}$

where $R_j^2$ is the R-squared from regressing regressor j on the other regressors.

If at least one of the variables has a VIF greater than 10, we conclude that the model suffers
from multicollinearity.
. vif

    Variable |       VIF       1/VIF  


-------------+----------------------
        linv |      1.56    0.639765
        dens |      1.51    0.664010
        rate |      1.43    0.700577
        grad |      1.12    0.896042
-------------+----------------------
    Mean VIF |      1.40

  
As a result, the VIF of each of the 4 variables and the mean VIF are all smaller than 10
(mean VIF = 1.40 < 10), so we conclude that there is no serious multicollinearity.
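
The VIF table can be unpacked with the formula VIF = 1/(1 - Rj^2): the 1/VIF column equals 1 - Rj^2, so the R-squared of each auxiliary regression can be recovered from it. The sketch below does this with the numbers copied from the vif output. The variable names are our own.

```python
# 1/VIF column copied from the vif output.
tolerance = {
    "linv": 0.639765,
    "dens": 0.664010,
    "rate": 0.700577,
    "grad": 0.896042,
}

# Since VIF = 1 / (1 - Rj^2), the 1/VIF column equals 1 - Rj^2.
aux_r2 = {name: 1 - tol for name, tol in tolerance.items()}
vif = {name: 1 / tol for name, tol in tolerance.items()}
mean_vif = sum(vif.values()) / len(vif)

for name in tolerance:
    print(f"{name:5s} auxiliary R2 = {aux_r2[name]:.3f}, VIF = {vif[name]:.2f}")
print(f"mean VIF = {mean_vif:.2f}")
```

Even for linv, the most collinear regressor, the other regressors explain only about 36% of its variation, far from the 90% that a VIF of 10 would imply.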

3. Heteroskedasticity testing   

Heteroskedasticity means that the variance of the error term is not constant. The least squares
estimates are then no longer efficient, and the usual t test and F test results may be misleading.
Heteroskedasticity can be detected informally by plotting the residuals against the fitted
values or each of the regressors, and formally with tests, most popularly White’s test. It can be
remedied by respecifying the model (for example, looking for missing variables) or by using
robust standard errors. In Stata, White’s test is run with the command imtest, white, where
imtest stands for information matrix test.

  H0: Homoskedasticity
  H1: Heteroskedasticity

If the p-value is smaller than 0.05, we reject H0 in favor of H1. We apply the White test and
the Breusch-Pagan test to the model.

- Firstly, we use the command rvfplot, yline(0) to check visually whether the model has
heteroskedasticity.

[Figure: residuals versus fitted values (rvfplot) with a reference line at 0]

As the spread of the residuals does not look uniform across the fitted values, we suspect that
the model has heteroskedasticity.

=> We use the White test and the Breusch-Pagan / Cook-Weisberg test to reach a firmer
conclusion.
Apply command imtest, white:
. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(14)     =     23.64
         Prob > chi2  =    0.0506

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      23.64     14    0.0506
            Skewness |      14.92      4    0.0049
            Kurtosis |       3.96      1    0.0467
---------------------+-----------------------------
               Total |      42.51     19    0.0015
---------------------------------------------------

As we can see, (Prob > chi2) = 0.0506 > 0.05, so we do not have enough evidence to reject
the null hypothesis at the 5% significance level, although the result is borderline.

We check again with the Breusch-Pagan / Cook-Weisberg test for heteroskedasticity:
. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of grdp

         chi2(1)      =     8.16
         Prob > chi2  =   0.0043

As we can see, Prob > chi2 = 0.0043 < 0.05, so we reject the null hypothesis of constant
variance.

=> In conclusion, since the Breusch-Pagan test rejects homoskedasticity and the White test is
borderline, we treat the model as having heteroskedasticity
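
The p-values reported by both tests are upper-tail probabilities of a chi-square distribution. As a hedged illustration, the sketch below reproduces them with only the Python standard library, using the closed-form survival function that exists for 1 degree of freedom and for even degrees of freedom. The helper `chi2_sf` is our own and deliberately does not cover other cases.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df == 1 and even df are supported, which is enough for the
    two statistics in this section.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("odd df > 1 is not covered by this sketch")


p_white = chi2_sf(23.64, 14)  # White's test: chi2(14) = 23.64
p_bp = chi2_sf(8.16, 1)       # Breusch-Pagan: chi2(1) = 8.16

print(f"White p-value         = {p_white:.4f}")
print(f"Breusch-Pagan p-value = {p_bp:.4f}")
```

The two values agree with the Prob > chi2 lines in the Stata output (0.0506 and 0.0043).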

To address the problem, we run the model again with robust standard errors, which remain valid
in the presence of heteroskedasticity.
. reg grdp grad linv dens rate, robust

Linear regression                               Number of obs     =         62
                                                F(4, 57)          =      23.81
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5352
                                                Root MSE          =     19.746

------------------------------------------------------------------------------
             |               Robust
        grdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        grad |   .0867559     .74026     0.12   0.907    -1.395589    1.569101
        linv |    1.77309   .7925958     2.24   0.029     .1859443    3.360236
        dens |   .0197201   .0043407     4.54   0.000      .011028    .0284122
        rate |  -1.810615   .7342411    -2.47   0.017    -3.280908   -.3403229
       _cons |   134.4399   55.52095     2.42   0.019      23.2611    245.6187
------------------------------------------------------------------------------

The coefficient estimates are unchanged. Only the standard errors, and therefore the t
statistics, p-values and confidence intervals, have changed, and they are now valid under
heteroskedasticity.

4. Normality (of u) testing

H0: u is normally distributed

Using the commands:

. predict u, residuals
. histogram u, normal

We have the following result:

[Figure: histogram of the residuals u with an overlaid normal density curve]

As can be seen from the graph, u does not appear to be normally distributed. A skewness/kurtosis test for normality, similar in spirit to the Jarque-Bera test, is then run with the command:

. sktest u

                    Skewness/Kurtosis tests for Normality


                                                          ------ joint ------
    Variable |        Obs  Pr(Skewness)  Pr(Kurtosis) adj chi2(2)   Prob>chi2
-------------+---------------------------------------------------------------
           u |         62     0.0000        0.0041       18.95         0.0001

As the table shows, Prob > chi2 = 0.0001 < 0.05, so we reject the null hypothesis: u is not normally distributed. However, for the hypothesis tests in this model we still proceed under the assumption that u is normally distributed.
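For reference, the Jarque-Bera statistic mentioned above combines sample skewness S and kurtosis K as JB = n/6 * (S^2 + (K - 3)^2 / 4) and is compared with a chi-square distribution with 2 degrees of freedom. The Python sketch below is illustrative only: the residuals are hypothetical toy values, the function name is an assumption, and the statistic is not identical to the adjusted chi2 that sktest reports.

```python
import math

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample.

    Hypothetical helper: uses population moments and the chi-square(2)
    tail, which has the closed form exp(-JB / 2).
    """
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((e - mean) ** 2 for e in resid) / n
    m3 = sum((e - mean) ** 3 for e in resid) / n
    m4 = sum((e - mean) ** 4 for e in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return skew, kurt, jb, p_value

# Hypothetical symmetric toy residuals (not the report's residuals).
toy_resid = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt, jb, p_value = jarque_bera(toy_resid)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}, JB = {jb:.4f}, p = {p_value:.4f}")
```

A large JB value (small p-value) points away from normality, which is the same decision rule applied to Prob>chi2 in the sktest table.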

VI. STATISTICAL HYPOTHESIS TESTING

. reg grdp grad linv dens rate, robust

Linear regression                               Number of obs     =         62
                                                F(4, 57)          =      23.81
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5352
                                                Root MSE          =     19.746

------------------------------------------------------------------------------
             |               Robust
        grdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        grad |   .0867559     .74026     0.12   0.907    -1.395589    1.569101
        linv |    1.77309   .7925958     2.24   0.029     .1859443    3.360236
        dens |   .0197201   .0043407     4.54   0.000      .011028    .0284122
        rate |  -1.810615   .7342411    -2.47   0.017    -3.280908   -.3403229
       _cons |   134.4399   55.52095     2.42   0.019      23.2611    245.6187
------------------------------------------------------------------------------

1. Critical value method

As we can see, grad has an absolute t value of 0.12, smaller than the critical value of 2.00 (5% significance level, 57 degrees of freedom). Consequently, we do not have enough evidence to reject the null hypothesis, meaning that the high school graduation rate does not have a statistically significant effect on GRDP.

The absolute t values of linv, dens and rate (2.24, 4.54 and 2.47, respectively, as reported in the table above) are higher than the critical value of 2.00. As a result, we reject the null hypothesis, meaning that foreign direct investment (FDI), population density and the labour participation rate each have a statistically significant effect on GRDP.
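The t statistics used in the critical value method are simply each coefficient divided by its robust standard error. The short Python sketch below recomputes them from the figures in the regression table; the critical value of 2.00 is the rounded value used in the text, and the variable names are illustrative.

```python
# Robust coefficient estimates and standard errors copied from the table above.
estimates = {
    "grad": (0.0867559, 0.74026),
    "linv": (1.77309, 0.7925958),
    "dens": (0.0197201, 0.0043407),
    "rate": (-1.810615, 0.7342411),
}
T_CRIT = 2.00  # rounded 5% two-sided critical value used in the text

# t statistic for H0: beta_j = 0 is coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    verdict = "reject H0" if abs(t) > T_CRIT else "fail to reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

The recomputed values match the t column of the Stata output after rounding to two decimals.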

2. Confidence interval method

According to the regression output, β_j = 0 lies inside the 95% confidence interval of grad (-1.395589, 1.569101), so we fail to reject the null hypothesis and conclude that the high school graduation rate does not have a statistically significant effect on GRDP.

By contrast, β_j = 0 does not fall within the confidence intervals of linv (0.1859443, 3.360236), dens (0.011028, 0.0284122) and rate (-3.280908, -0.3403229). As a result, we reject the null hypothesis, which means that FDI, population density and the labour participation rate each have a statistically significant effect on GRDP.
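The confidence intervals in the table can be rebuilt as coefficient ± critical value × robust standard error. The Python sketch below does this with an assumed critical value of about 2.0025 (the approximate 97.5% quantile of the t distribution with 57 degrees of freedom; this number is not printed in the Stata output), so the endpoints agree with the table only up to rounding.

```python
# Robust coefficient estimates and standard errors copied from the table above.
estimates = {
    "grad": (0.0867559, 0.74026),
    "linv": (1.77309, 0.7925958),
    "dens": (0.0197201, 0.0043407),
    "rate": (-1.810615, 0.7342411),
}
T_CRIT = 2.0025  # assumption: approximate 97.5% t quantile with 57 df

# 95% interval: coefficient plus or minus T_CRIT standard errors.
intervals = {
    name: (coef - T_CRIT * se, coef + T_CRIT * se)
    for name, (coef, se) in estimates.items()
}
# H0: beta_j = 0 is rejected at 5% exactly when zero lies outside the interval.
contains_zero = {name: lo <= 0.0 <= hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    verdict = "fail to reject H0" if contains_zero[name] else "reject H0"
    print(f"{name}: [{lo:.4f}, {hi:.4f}] -> {verdict}")
```

This gives the same decisions as the critical value method, as expected, since the two methods are equivalent for a two-sided test at the same significance level.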

VII. CONCLUSIONS AND POLICY IMPLICATION

1. Conclusions

After the whole analysis and testing process, we have obtained an overview of how different determinants affect the gross regional domestic product per capita in Vietnam in 2018.
As mentioned above, we studied how four factors, Grad (high school graduation rate), Inv (foreign direct investment), Dens (population density) and Rate (labour participation rate), are related to gross regional domestic product per capita in Vietnam. The model shows that these four independent variables explain 53.52% of the total variation in the dependent variable; the remaining 46.48% depends on other variables that are not included in our research. In addition, all independent variables except Rate (labour participation rate) have positive estimated coefficients.
The data analysis, regression model and hypothesis testing have shown that foreign direct investment (FDI), population density and the labour participation rate have statistically significant effects on GRDP per capita, while the high school graduation rate does not have a statistically significant effect on GRDP per capita.
2. Policy implications
Regarding policy implications, the analysis shows that the labour participation rate (of people aged 15 and over) has a significant but negative association with GRDP per capita. We believe that, because of the low education level and harsh conditions in rural areas, people there have to start working at an early age for low incomes. As a consequence, GRDP per capita in these areas is lower while the labour participation rate is higher than in urban areas. Therefore, in order to increase GRDP per capita in Vietnam, the government should aim to lower the labour participation rate in each province, which can be done indirectly by improving the education system, especially in rural areas. The government should also provide more subsidies for students in difficult circumstances, to prevent them from dropping out to work at a young age and to encourage them to pursue higher education.
On the other hand, foreign direct investment (FDI) has a significant, positive effect on GRDP per capita. Hence, the government needs to take action to raise foreign investors' awareness of potential investments in Vietnam, especially in tourism. This can be done by running promotion campaigns for famous tourist attractions and cultural heritage. In addition, the government should create more policies and subsidies that encourage the development of tourism in Vietnam, especially as tourism is suffering badly from the negative impact of Covid-19.


IX. APPENDIX

Variables and units:
GRDP: gross regional domestic product per capita (mil. VND/capita/year)
Grad: high school graduation rate (%)
Inv: foreign direct investment (mil. USD)
Dens: population density (people/km2)
Rate: labour participation rate (%)

Province GRDP Grad Inv Dens Rate

Hà Nội 93.94 95.83 8669.70 2398.00 50.40
Vĩnh Phúc 86.50 99.39 586.20 934.00 54.60
Bắc Ninh 150.10 99.40 1695.20 1664.00 55.20
Quảng Ninh 117.66 97.34 242.10 214.00 54.60
Hải Dương 56.30 99.00 691.40 1022.00 55.50
Hải Phòng 97.10 91.88 1374.00 1176.00 54.70
Hưng Yên 55.30 97.76 488.20 1347.00 57.10
Thái Bình 38.00 98.29 67.50 1185.00 59.90
Hà Nam 55.20 96.70 864.20 991.00 56.60
Nam Định 52.00 99.24 267.70 1067.00 57.90
Ninh Bình 48.50 98.22 149.50 708.00 59.50
Hà Giang 20.70 91.10 0.50 108.00 62.70
Cao Bằng 26.70 95.24 0.20 79.00 65.90
Bắc Kạn 30.00 96.65 4.40 65.00 68.80
Tuyên Quang 36.00 98.87 20.00 134.00 61.00
Lào Cai 61.84 96.27 0.90 115.00 61.00
Yên Bái 33.60 96.07 7.30 119.00 63.60
Thái Nguyên 77.70 92.98 616.00 364.00 59.70
Lạng Sơn 38.40 91.51 1.40 94.00 62.00
Bắc Giang 52.10 99.22 1163.30 468.00 60.80
Phú Thọ 38.50 97.97 348.40 414.00 57.50
Điện Biên 27.31 94.53 1.20 63.00 57.40
Lai Châu 33.00 97.92 0.10 51.00 60.30
Sơn La 38.00 98.43 0.40 88.00 61.30
Hòa Bình 48.30 97.36 0.10 186.00 64.70
Thanh Hóa 41.10 97.34 350.40 328.00 61.60
Nghệ An 36.64 97.83 315.10 202.00 57.50
Hà Tĩnh 49.50 93.63 32.60 215.00 54.10
Quảng Bình 37.50 93.37 0.80 111.00 57.60
Quảng Trị 43.60 90.86 20.00 133.00 53.50
Thừa Thiên - Huế 40.76 95.17 324.50 224.00 53.70
Đà Nẵng 83.16 85.36 515.20 883.00 51.60
Quảng Nam 61.07 86.01 184.20 141.00 57.70
Quảng Ngãi 57.80 92.62 136.60 240.00 59.40
Bình Định 46.89 94.99 96.60 245.00 59.40
Phú Yên 39.97 87.07 216.60 191.00 59.50
Khánh Hòa 62.13 89.81 202.30 240.00 55.50
Ninh Thuận 39.70 91.94 133.70 176.00 55.70

Bình Thuận 50.31 90.45 1809.00 158.00 57.60
Kon Tum 37.49 96.72 507.00 56.00 57.20
Gia Lai 45.36 90.77 0.10 98.00 59.20
Đắk Lắk 41.00 86.74 206.00 143.00 57.80
Đắk Nông 45.24 91.45 708.00 96.00 59.60
Bình Phước 56.85 92.14 465.90 145.00 58.10
Tây Ninh 58.30 92.72 1263.50 289.00 57.50
Bình Dương 62.79 94.41 3508.60 900.00 65.00
Đồng Nai 130.80 95.34 2178.80 524.00 53.20
Bà Rịa - Vũng Tàu 97.30 96.73 1085.40 580.00 52.40
TP. Hồ Chí Minh 154.84 95.34 8338.20 4363.00 51.70
Long An 68.62 92.52 931.90 376.00 58.80
Tiền Giang 46.90 95.56 396.40 703.00 63.10
Bến Tre 33.00 96.07 64.80 538.00 63.20
Trà Vinh 44.00 95.83 110.30 428.00 57.00
Vĩnh Long 44.80 96.38 150.50 693.00 58.80
Đồng Tháp 40.00 92.36 13.00 473.00 64.20
An Giang 34.33 95.20 65.40 540.00 54.80
Kiên Giang 48.21 94.17 20.70 272.00 53.50
Cần Thơ 80.50 96.57 69.10 858.00 58.50
Hậu Giang 38.32 93.35 71.00 452.00 60.30
Sóc Trăng 37.50 95.58 112.30 362.00 53.70
Bạc Liêu 42.05 93.36 114.10 340.00 55.30
Cà Mau 43.29 91.89 80.20 226.00 56.30
