
FOREIGN TRADE UNIVERSITY

FACULTY OF INTERNATIONAL ECONOMICS


-----------------o0o-----------------

ECONOMETRICS REPORT

2015 CENSUS OF POPULATION AND ITS DOMINANT COMPONENTS

Class : KTEE 309.2

Student Name – ID : Đào Nguyễn Minh Khuê - 1817150088 (33,33%)

Trịnh Thị Quỳnh Trang – 1811150135 (33,33%)

Hoàng Huyền Trang – 1811150130 (33,33%)

Instructor : Dr. Đinh Thị Thanh Bình

Hà Nội, 12/2019
Table of Contents
ABSTRACT
INTRODUCTION
I, LITERATURE REVIEW
  1, Question of interest
  2. Procedure and program used
  b. Hypothesis
II. ECONOMIC MODEL
  1. Specifying the object for modeling
  2. Defining the target for modeling by the choice of the variables to analyze, denoted as $\{x_i\}$
  3. Embedding that target in a general unrestricted model (GUM)
III, ECONOMETRICS MODEL
IV, DATA COLLECTION
  1. Data overview
  2. Data description
V, ESTIMATION OF ECONOMETRIC MODEL
  1, Checking the correlation among variables
  2. Regression run
VI, DIAGNOSING MULTICOLLINEARITY AND HETEROSKEDASTICITY
  1. Multicollinearity
  2. Heteroskedasticity
VII, HYPOTHESIS POSTULATED
VIII, RESULT ANALYSIS & POLICY IMPLICATION
RESEARCH LIMITATIONS
CONCLUSION
REFERENCES

ABSTRACT
   Nowadays, unstable changes in the world's population are gradually becoming
an issue for every country. At present, two major problems persist. The first is
that rapid population growth is causing a population explosion in many countries.
The second is the decline in population in some developed countries. These
phenomena concern economists because they directly affect the economic,
environmental, and social aspects of our world. Given these problems, conducting
a population census and determining the dominant components of population are
highly necessary for each country.
   We carry out a statistical analysis that explores the worldwide population
census and its importance for each country, so that we can find reasonable
solutions from an economic perspective by statistically testing the hypothesis
that the dominant constituents affect population change.

INTRODUCTION

Population means the total number of people who inhabit an area or make up an
entire region. A population census is the complete process of gathering, compiling,
evaluating, analysing and publishing, or otherwise disseminating, demographic,
economic and social information pertaining, at a specified time, to all persons in a
country or in a well-defined part of a country. The most important asset of any
society is its human capital. The basic element of socio-economic growth and
development is the human resource, which is always associated with population
change in both quantity and quality. Ultimately, the aim of development is to
improve the quality of life and meet people's rising needs.
So in any country, conducting a population census is very important and
necessary. Firstly, knowing the size of the population helps the government
determine the number of people who can pay taxes, so that it can estimate the
revenue obtainable from that source. Secondly, it helps to forecast the country's
economic needs, such as electricity, housing and food. Thirdly, it guides the
provision of social amenities: census data give an idea of what kind of social
amenities, for example hospitals, housing, water and electricity, should be
provided to particular families and areas. Another important role is assisting the
government and international agencies in allocating aid. As we can see, a
population census is more than just a count of an area's population at a given
time. Where areas differ dramatically from each other, it can help a government
develop industry and balance the country's economy.
Just as population theory is a meaningful science that informs social
development in general and national growth in particular, econometrics is the use of
statistical techniques to understand those issues and test theories. Without evidence,
population theories are abstract and might have no bearing on reality (even if they are
completely rigorous). Econometrics is a set of tools we can use to confront theory
with real-world data.

Given the data set, our group, which includes three members (Dao Nguyen Minh
Khue, Hoang Huyen Trang and Trinh Thi Quynh Trang), follows an econometric
methodology comprising eight steps to analyze the data. Note that because of the
lack of information on the data set, all interpretations of abbreviations and other
details are based on assumptions and our own research. We hope to have shown our
logic and reasoning clearly.

Given our limited purpose and resources, there are still deficiencies in this
report, but we look forward to providing readers with a decent overview of the data
set and of the knowledge that we have gained through Dr. Thanh Binh's
Econometrics course.

I, LITERATURE REVIEW

1, Question of interest
After being assigned the data sheets and asked to find a subject, our team
chose the topic title "2015 Census of population and its dominant components"
for the following reasons:
First of all, the topic title must highlight what we analyze and point out in
this report. The data we were provided cover issues such as birth rate, death rate,
international migrant stock, and so on.
Secondly, the population census is always a matter of concern all over the world.
We cannot deny its importance for the economic and social aspects of our world.
Thirdly, through the analysis of specific data, we also want to offer solutions to
the problems related to population change in each country.
Other authors have applied both quantitative and qualitative tools in their research.
Most used OLS regression models to analyze the dependence and relationships
among the variables.

2. Procedure and program used


a. Procedure

Step 1: Questions of interest
Step 2: Economic model
Step 3: Econometric model
Step 4: Data collection
Step 5: Estimation of econometric model
Step 6: Check multicollinearity and heteroskedasticity
Step 7: Hypothesis postulated
Step 8: Result analysis & Policy implication

· The Stata program is primarily used to analyze the data and run the regressions.

b.  Hypothesis:
We assume that the total population (ppt) is affected by the following
variables: the refugee population by country or territory of asylum, the birth rate,
the death rate, the mortality rate of infants, life expectancy at birth and the total
international migrant stock.
· Refugee population by country or territory of asylum (rp): political issues
  cause people to move to another country temporarily or permanently.
· Birth rate (br): the number of children born increases each nation's population.
· Death rate (dr): the number of deaths decreases each nation's population.
· Mortality rate of infants (mri): the number of infant deaths decreases each
  nation's population and slows the speed of population growth.
· Life expectancy at birth (leb): greater longevity allows people to have more
  children and to help develop their countries, and so increases each nation's
  population.
· Total international migrant stock (ims): for economic and political reasons,
  people immigrate to more developed countries in search of a better life,
  which increases the population of the receiving nations.

II. ECONOMIC MODEL
As data are provided up front, the economic model used in this report is an
empirical one. Note that the fundamental model is mathematical; with an empirical
model, however, data are gathered for the variables and, using accepted statistical
techniques, are used to provide estimates of the model's values.

Empirical model discovery and theory evaluation are suggested to involve five
key steps, but for the limitation of purpose and resources, this part of the report only
follows three of them: (1) specifying the object for modeling, (2) defining the target for
modeling, (3) embedding that target in a general unrestricted model.

1. Specifying the object for modeling


ppt = f(x)
This report seeks the dominant factors that affect total population, which is
the object for modeling, and examines each of the related factors: birth rate,
death rate, life expectancy at birth, mortality rate of infants, refugee population by
country or territory of asylum, and international migrant stock.

2. Defining the target for modeling by the choice of the variables to analyze, denoted as $\{x_i\}$

As mentioned above, three main categories are expected to affect total
population: the birth and death rates, longevity, and the number of immigrants and
migrants. Hence, the variables $x_i$ are chosen to represent these categories.
After thorough research, the factors have been narrowed down to six significant
ones: refugee population, birth rate, death rate, infant mortality rate, life
expectancy and number of immigrants.

3. Embedding that target in a general unrestricted model (GUM)


In its simplest acceptable representation (which will later be specified in the
econometric model), the GUM is determined to be:

ppt = f(rp, br, dr, mri, leb, ims)

Variable  Definition                                            Role                  Type
ppt       Population, total                                     Dependent variable    Quantitative
rp        Refugee population by country or territory of asylum  Independent variable  Quantitative
br        Birth rate, crude (per 1,000 people)                  Independent variable  Quantitative
dr        Death rate, crude (per 1,000 people)                  Independent variable  Quantitative
mri       Mortality rate of infants (per 1,000 live births)     Independent variable  Quantitative
leb       Life expectancy at birth (years)                      Independent variable  Quantitative
ims       International migrant stock                           Independent variable  Quantitative

III, ECONOMETRICS MODEL


To demonstrate the relationship between population and other factors, the
regression function can be constructed as follows:

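· (PRF): $ppt = \beta_0 + \beta_1 \cdot rp + \beta_2 \cdot br + \beta_3 \cdot dr + \beta_4 \cdot mri + \beta_5 \cdot leb + \beta_6 \cdot ims + u_i$

· (SRF): $ppt = \hat{\beta}_0 + \hat{\beta}_1 \cdot rp + \hat{\beta}_2 \cdot br + \hat{\beta}_3 \cdot dr + \hat{\beta}_4 \cdot mri + \hat{\beta}_5 \cdot leb + \hat{\beta}_6 \cdot ims + \hat{u}_i$

where:
$\beta_0$ is the intercept of the regression model

$\beta_i$ is the slope coefficient of the independent variable $x_i$

$u_i$ is the disturbance of the regression model

$\hat{\beta}_0$ is the estimator of $\beta_0$

$\hat{\beta}_i$ is the estimator of $\beta_i$

$\hat{u}_i$ is the residual (the estimator of $u_i$)
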

From this model, this report is interested in explaining ppt in terms of each of the
six independent variables 
IV, DATA COLLECTION
1. Data overview
· This data set is secondary data, as it was collected from a given source.
· Data source: Regression Diagnostics: Identifying Influential Data and Sources of
  Collinearity, by D.A. Belsley, E. Kuh, and R. Welsch, 1990. New York: Wiley

· Structure of the economic data: cross-sectional data

2. Data description
To obtain summary statistics of the variables, Stata's summarize command is
used. In its output:

           Obs is the number of observations

           Mean is the mean of the variable

           Std. Dev. is the standard deviation of the variable

           Min is the minimum value of the variable

           Max is the maximum value of the variable
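
As a rough illustration of what these summary statistics contain, the sketch below computes them in Python for a single variable. The values in `sample` are made up for the example and are not taken from the report's data set, and `describe` is a hypothetical helper name. The sample standard deviation (n - 1 denominator) is assumed here.

```python
import statistics

def describe(values):
    """Return Obs, Mean, Std. Dev., Min and Max for one variable."""
    return {
        "Obs": len(values),
        "Mean": statistics.mean(values),
        "Std. Dev.": statistics.stdev(values),  # sample std. dev. (n - 1)
        "Min": min(values),
        "Max": max(values),
    }

# Hypothetical crude birth rates (per 1,000 people), for illustration only.
sample = [10.0, 20.0, 30.0, 40.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:10s} {value}")
```

Applying the same helper to each of the seven variables would give one row per variable, in the spirit of the Stata table.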


V, ESTIMATION OF ECONOMETRIC MODEL
1, Checking the correlation among variables
First, we analyze the correlation between the variables by determining the
correlation coefficients, and then consider whether there is multicollinearity
among the variables in the model. Use the corr command in Stata:

corr ppt rp br dr mri leb ims

The correlation coefficient between ppt and rp is: 0.8805

The correlation coefficient between ppt and br is: -0.0449

The correlation coefficient between ppt and dr is: -0.0488

The correlation coefficient between ppt and mri is: 0.1028

The correlation coefficient between ppt and leb is: -0.0186

The correlation coefficient between ppt and ims is: 0.7934

From the results above, every coefficient is below 1 in absolute value, so no
variable is perfectly correlated with ppt. Only rp (0.8805) and ims (0.7934) are
strongly correlated with ppt; the other four variables are only weakly correlated
with it. Whether the regressors are too highly correlated with one another is
examined in Section VI.
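
To make the quantity behind each entry of the correlation table concrete, here is a minimal Python sketch of the Pearson correlation coefficient. The three short lists are hypothetical toy values, not the report's data, and `pearson` is an illustrative helper name. The example deliberately correlates regressors with each other, since that is the comparison that matters for multicollinearity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values, not the report's data.
br_toy = [12.0, 18.0, 25.0, 31.0, 40.0]   # birth rate
mri_toy = [5.0, 9.0, 20.0, 30.0, 52.0]    # infant mortality rate
leb_toy = [81.0, 77.0, 70.0, 66.0, 58.0]  # life expectancy at birth

r_br_mri = pearson(br_toy, mri_toy)
r_br_leb = pearson(br_toy, leb_toy)
print(round(r_br_mri, 4), round(r_br_leb, 4))
```

A coefficient close to +1 or -1 between two regressors is the warning sign; a value merely below 1 does not by itself rule out a strong correlation.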

2. Regression run
The regression is run with the reg command in Stata.

From the resulting table, we obtain the sample regression model.

With the residual included, the estimated equation can be expressed as:

$ppt = 4.28 \times 10^{9} + 216.6547 \cdot rp - 3.72 \times 10^{7} \cdot br - 3.80 \times 10^{7} \cdot dr + 3427557 \cdot mri - 4.55 \times 10^{7} \cdot leb + 10.94527 \cdot ims + \hat{u}_i$

The sample regression function (SRF) for the fitted values can be expressed as:

$\widehat{ppt} = 4.28 \times 10^{9} + 216.6547 \cdot rp - 3.72 \times 10^{7} \cdot br - 3.80 \times 10^{7} \cdot dr + 3427557 \cdot mri - 4.55 \times 10^{7} \cdot leb + 10.94527 \cdot ims$

       Economic significance of the regression coefficients:

- $\hat{\beta}_0 = 4.28 \times 10^{9}$: when all the independent variables are zero, the
expected value of population is $4.28 \times 10^{9}$.

- $\hat{\beta}_1 = 216.65$: when the refugee population by country or territory of
asylum increases by one person, other determinants held constant, the expected
value of population increases by 216.65 people.

- $\hat{\beta}_2 = -3.72 \times 10^{7}$: when the crude birth rate increases by one birth per
1,000 people, other determinants held constant, the expected value of population
decreases by $3.72 \times 10^{7}$ people.

- $\hat{\beta}_3 = -3.80 \times 10^{7}$: when the crude death rate increases by one death per
1,000 people, other determinants held constant, the expected value of population
decreases by $3.80 \times 10^{7}$ people.

- $\hat{\beta}_4 = 3427557$: when the mortality rate of infants increases by one death
per 1,000 live births, other determinants held constant, the expected value of
population increases by 3,427,557 people.

- $\hat{\beta}_5 = -4.55 \times 10^{7}$: when life expectancy at birth increases by one year,
other determinants held constant, the expected value of population decreases by
$4.55 \times 10^{7}$ people.

- $\hat{\beta}_6 = 10.94$: when the international migrant stock increases by one person,
other determinants held constant, the expected value of population increases by
10.94 people.
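
The "other determinants held constant" reading of a coefficient can be checked numerically. The sketch below plugs the coefficient estimates reported above into a small Python function and compares two fitted values that differ only in the birth rate. The country profile in `base` is hypothetical and chosen only for illustration; `fitted_ppt` is an illustrative name, not something from the report.

```python
# Coefficient estimates as reported in the regression output above.
COEF = {
    "const": 4.28e09,
    "rp": 216.6547,
    "br": -3.72e07,
    "dr": -3.80e07,
    "mri": 3427557.0,
    "leb": -4.55e07,
    "ims": 10.94527,
}

def fitted_ppt(rp, br, dr, mri, leb, ims):
    """Fitted population from the sample regression function."""
    return (COEF["const"] + COEF["rp"] * rp + COEF["br"] * br
            + COEF["dr"] * dr + COEF["mri"] * mri
            + COEF["leb"] * leb + COEF["ims"] * ims)

# Hypothetical country profile, for illustration only.
base = dict(rp=10_000, br=20.0, dr=8.0, mri=25.0, leb=70.0, ims=500_000)
bumped = dict(base, br=base["br"] + 1.0)  # one more birth per 1,000 people

effect = fitted_ppt(**bumped) - fitted_ppt(**base)
print(f"Change in fitted ppt for +1 in br: {effect:.3e}")
```

Because the model is linear, the difference equals the br coefficient regardless of the values chosen for the other variables.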

The coefficient of determination R-squared = 0.8229: all independent
variables (br, dr, mri, rp, ims, leb) jointly explain 82.29% of the variation in the
dependent variable (ppt); other factors that are not included explain the
remaining 17.71% of the variation in ppt.

Other indicators:

- Adjusted coefficient of determination adj R-squared = 0.8171

- Total Sum of Squares TSS = 2.1916e+20

- Explained Sum of Squares ESS = 1.8035e+20

- Residual Sum of Squares RSS = 3.8813e+19

- The degree of freedom of Model Df_m= 6

- The degree of freedom of residual Df_r = 183
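
The indicators above are linked by simple arithmetic, which the following Python sketch reproduces from the reported sums of squares and degrees of freedom. It assumes the total degrees of freedom equal Df_m + Df_r = 189; the overall F statistic is computed as well, although the report does not quote it.

```python
# Values reported in the regression output above.
tss = 2.1916e20   # total sum of squares
ess = 1.8035e20   # explained (model) sum of squares
rss = 3.8813e19   # residual sum of squares
df_m = 6          # model degrees of freedom
df_r = 183        # residual degrees of freedom

r_squared = ess / tss
# Adjusted R-squared penalizes the number of regressors.
adj_r_squared = 1 - (rss / df_r) / (tss / (df_m + df_r))
# Overall F statistic for joint significance of the six slopes.
f_stat = (ess / df_m) / (rss / df_r)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F({df_m}, {df_r})     = {f_stat:.2f}")
```

The first two results agree, to four decimals, with the reported R-squared of 0.8229 and adjusted R-squared of 0.8171.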

VI, DIAGNOSING MULTICOLLINEARITY AND HETEROSKEDASTICITY
1. Multicollinearity
Multicollinearity is a high degree of correlation among the explanatory
variables, which may make it difficult to separate out the effects of the individual
regressors: standard errors are inflated and t-values are depressed. Multicollinearity
can be detected by examining the correlation matrix of the regressors and by
carrying out auxiliary regressions among them. In Stata, the vif command is used,
which stands for variance inflation factor.
The VIF values here are lower than 10, indicating that multicollinearity is not
too worrisome a problem for this set of data.
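
For reference, the variance inflation factor of regressor j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from an auxiliary regression of that regressor on all the others. A minimal Python sketch follows; the auxiliary R-squared values in `aux_r2` are hypothetical, since the report's VIF table is not reproduced here, and the cutoff of 10 is the rule of thumb used in the text.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R-squared of the auxiliary
    regression of regressor j on all the other regressors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("auxiliary R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

# Hypothetical auxiliary R-squared values, for illustration only.
aux_r2 = {"rp": 0.60, "br": 0.85, "dr": 0.50}

for name, r2 in aux_r2.items():
    value = vif(r2)
    flag = "possible problem" if value >= 10 else "acceptable"
    print(f"{name}: VIF = {value:.2f} ({flag})")
```

Under this formula, a VIF of 10 corresponds to an auxiliary R-squared of 0.9.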

2. Heteroskedasticity
Heteroskedasticity indicates that the variance of the error term is not constant,
which makes the least squares estimator no longer efficient, and the results of t tests
and F tests may be misleading. Heteroskedasticity can be detected informally by
plotting the residuals against each of the regressors, or formally by a test, most
popularly White's test. It can be remedied by respecifying the model, for example
by looking for missing variables. In Stata, the imtest, white command is used;
imtest stands for information matrix test.
At the 5% significance level, there is enough evidence to reject the null
hypothesis of homoskedasticity and conclude that this data set suffers from
heteroskedasticity.

Another way to check for heteroskedasticity is the residual-versus-fitted plot,
which can be generated with the rvfplot, yline(0) command in Stata.

In a well-fitted model, there should be no pattern to the residuals plotted
against the fitted values, something not true of our model. Ignoring the outliers at
the top center of the graph, we see curvature in the pattern of the residuals,
suggesting a violation of the assumption that ppt is linear in our independent
variables. We might also have seen increasing or decreasing variation in the
residuals, which would indicate heteroskedasticity.
To address the problem, robust standard errors are used, which relax the
assumption that the errors are independent and identically distributed. In Stata,
the regression is rerun with the robust option.
Comparing the results with the earlier regression, none of the coefficient
estimates changes, but the standard errors and hence the t values are different,
which gives more reliable p values.
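
To show why the coefficients stay the same while the standard errors change, here is a minimal Python sketch for a regression with one regressor. The data are hypothetical and built so that the error spread grows with x. The robust formula uses an n / (n - k) small-sample scaling (often called HC1); that this matches the exact variant Stata applies is an assumption here, not something taken from the report.

```python
import math

def simple_ols(x, y):
    """OLS slope with classical and heteroskedasticity-robust std. errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance: one common error variance for every observation.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # Robust variance: each squared residual is weighted by the squared
    # deviation of its own x value, then scaled by n / (n - 2).
    meat = sum(((xi - mean_x) ** 2) * e ** 2 for xi, e in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, se_classical, se_robust

# Hypothetical data: the error term is proportional to x.
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + ((-1) ** i) * 0.1 * xi for i, xi in enumerate(x, start=1)]

slope, se_classical, se_robust = simple_ols(x, y)
print(f"slope = {slope:.4f}")
print(f"classical s.e. = {se_classical:.4f}, robust s.e. = {se_robust:.4f}")
```

The slope is computed once and is shared by both versions; only the estimate of its sampling variability differs.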

VII, HYPOTHESIS POSTULATED

We test each coefficient to determine whether it is meaningful to the model; in
other words, we test the significance of each independent variable for the dependent
one (ppt).

The two hypotheses for each test are:

$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$

If the p-value of an independent variable is smaller than the significance level, we
reject $H_0$ and accept $H_1$. This means that the variable has a significant effect on ppt.

Test for the individual significance of $\beta_1$:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Prob($\beta_1$) = 0.000 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_1$ is statistically significant at 5%.

Test for the individual significance of $\beta_2$:

$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$

Prob($\beta_2$) = 0.000 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_2$ is statistically significant at 5%.

Test for the individual significance of $\beta_3$:

$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$

Prob($\beta_3$) = 0.019 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_3$ is statistically significant at 5%.

Test for the individual significance of $\beta_4$:

$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$

Prob($\beta_4$) = 0.036 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_4$ is statistically significant at 5%.

Test for the individual significance of $\beta_5$:

$H_0: \beta_5 = 0$
$H_1: \beta_5 \neq 0$

Prob($\beta_5$) = 0.000 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_5$ is statistically significant at 5%.

Test for the individual significance of $\beta_6$:

$H_0: \beta_6 = 0$
$H_1: \beta_6 \neq 0$

Prob($\beta_6$) = 0.000 < 0.05, so we reject $H_0$ at the significance level α = 5%.
Therefore, $\beta_6$ is statistically significant at 5%.

In conclusion, all six factors have a statistically significant effect on ppt at the 5% level.
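
The decision rule used in the six tests can be written compactly. The Python sketch below applies it to the p-values quoted in this section, matched to the variables in the order of the regression function (rp, br, dr, mri, leb, ims). The function name `decide` is illustrative, and a displayed p-value of 0.000 is taken to mean a value that rounds to zero at three decimals.

```python
ALPHA = 0.05  # significance level used throughout this section

# p-values as quoted in the tests above.
p_values = {
    "rp": 0.000,
    "br": 0.000,
    "dr": 0.019,
    "mri": 0.036,
    "leb": 0.000,
    "ims": 0.000,
}

def decide(p_value, alpha=ALPHA):
    """Reject H0: beta_i = 0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]:.3f} -> {decision}")
```

At a stricter level such as 1%, the same rule would no longer reject for dr and mri, so the conclusion depends on the chosen α.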

VIII, RESULT ANALYSIS & POLICY IMPLICATION


From the data analysis in the preceding sections, we have gained an overall
view of the data set in terms of the statistical evidence on the relationship between
total population and each of the following factors: birth and death rates,
longevity, and the number of immigrants and migrants. As mentioned at the
beginning of this report, we aim to learn how these features are associated with
population. In other words, we are concerned with how much each of these
components contributes to population.

Following the analysis of the data, the regression run and the hypothesis
testing, it can be concluded that birth and death rates, longevity and the number of
immigrants and migrants do affect total population, or at least do so in a
statistical sense. Therefore both the Government and the people should take all of
these components into account when managing the size of the population.

RESEARCH LIMITATIONS
Apart from the factors we have studied, there are in fact many other
demographic, economic and political factors, among others, affecting worldwide
population that we were not able to examine. In addition, we collected data for 2015
only, so our research is limited to a single point in time. As a result, the
observations collected did not fully meet our expectations.

· Finding suitable variables for the model was somewhat difficult. Although some
  variables could be added to improve the fit, this might make the model more
  complex, possibly causing problems in testing.
· Given the team members' limited capabilities, we met some difficulties during
  the testing process.

CONCLUSION
This report was completed through the dedicated contribution of each member and
the knowledge from our study of Econometrics. It also gave us a good
opportunity to practice what we have learned and to gain a deeper understanding of
data analysis and the relevant tests. From this application, we hope that our work
can shed some light on the relationship between total population and the
refugee population by country or territory of asylum, the crude birth rate, the
crude death rate, the mortality rate of infants, life expectancy at birth, and the
international migrant stock.

Last but not least, due to our limited understanding and resources, our
report may contain misinterpretations. We hope that Dr. Dinh Thi Thanh Binh and
readers can give us constructive comments on the report so that we can improve
ourselves and do better in the future.

 
Sincerely,
Your students.

REFERENCES
1. Regression Diagnostics: Identifying Influential Data and Sources of
Collinearity, by D.A. Belsley, E. Kuh, and R. Welsch, 1990. New York: Wiley.
2. The lectures of Dr. Dinh Thi Thanh Binh
3. DataWorldbank.org
