Econometrics

VIETNAM NATIONAL UNIVERSITY,
HANOI INTERNATIONAL SCHOOL
------------------------------
SUBJECT: ECONOMETRICS
Project 4: Wages of individuals who work as wage earners
in SOEs (state-owned enterprises) and private enterprises.
Class: INS304902
Lecturer: Tran Quang Tuyen

Le Van Dao
Group: Group 4
Member: Nguyen Bich Diep – 21070891 – 33%

Do Thanh Van – 21070790 – 38.5%
Dong Anh Tuan – 21070550 – 28.5%
Ha Noi, June 8th 2023

Determinants of the wages of individuals who work as wage earners in SOEs
(state-owned enterprises) and private enterprises
I. Introduction
The income of people everywhere is of interest to researchers, whether in rich countries, poor countries or small localities, because personal income is an important economic indicator for assessing a country's level of development and the standard of living in a geographical area. Income can vary across the regions of a country, and large disparities create income inequality between regions. The term "income inequality" describes an unequal distribution of income among people or households in the economy. Income inequality has negative effects on social cohesion, economic growth, quality of life, poverty and crime. Increasing personal income, improving living standards and reducing social inequality are issues of concern to governments around the world.
II. Literature review
According to the livelihood framework proposed by Tran et al. (2014), a number of individual characteristics (such as marital status, age and gender) and socio-economic characteristics affect livelihood outcomes (https://www.researchgate.net/profile/Tran-Tuyen-4/publication/262580514_Farmland_loss_and_livelihood_outcomes_a_microeconometric_analysis_of_household_surveys_in_Vietnam/links/567a226908aeaa48fa4aeeca/Farmland-loss-and-livelihood-outcomes-a-microeconometric-analysis-of-household-surveys-in-Vietnam.pdf).
Combining these factors, we obtain an economic model of wages. Education is a crucial determinant, as higher levels of education are associated with better job opportunities and higher income. Gender may influence wages through differences in labor force participation and earnings. Marital status may affect income through shared resources and economies of scale. Finally, province and urban/rural location can influence income levels because of variations in economic opportunities and infrastructure. The year variable captures the impact of changing economic conditions over time.
III. Data and research design
1. Data:
In our research, we use a dataset of 56,584 individuals who work as wage earners in SOEs (state-owned enterprises) and private enterprises. The dataset contains useful information on a variety of personal characteristics of workers, such as age, gender, marital status and experience. Regional factors are also included. For descriptive and inferential statistics, we use the logarithm of wages instead of the wage level: log_wage = ln(wage + 1).
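A minimal NumPy sketch of this transformation follows. The wage values are made up for illustration (the report's own computations were done in Stata); `np.log1p` computes ln(1 + x), so an observation with a zero wage maps to 0 instead of the undefined ln(0).

```python
import numpy as np

# Hypothetical wages (illustrative only, not taken from the dataset);
# a zero wage is included to show why 1 is added before taking the log
wage = np.array([0.0, 1500.0, 3200.0, 8000.0])

# log_wage = ln(wage + 1); np.log1p(x) evaluates ln(1 + x)
log_wage = np.log1p(wage)
print(np.round(log_wage, 4))
```

For wages in the thousands, ln(wage + 1) and ln(wage) are nearly identical, so the coefficients keep their usual approximate percentage interpretation.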
2. Research design:
Wages are the result of many factors, both subjective and objective, ranging from each person's abilities to the macroeconomic context:
• Economic model of wage
Y=f(age, edu, gender, married, SOE, experience, training, urban, region)
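• Econometric model
We have log_wage = ln(wage + 1) and
$$\text{log\_wage} = \beta_0 + \beta_1\,age + \beta_2\,edu + \beta_3\,exper + \delta_1\,gender + \delta_2\,married + \delta_3\,SOE + \delta_4\,training + \delta_5\,urban + \delta_6\,region + \delta_7\,gender \times training + \delta_8\,edu \times SOE + u$$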
This model is a multiple linear regression model used to predict the (conditional mean) log wage of individuals based on various factors. The explanatory variables appear to have been chosen to capture the demographic, socioeconomic and geographic characteristics of individuals who work as wage earners in SOEs (state-owned enterprises) and private enterprises. These variables can help explain the variation in individuals' wages. Specific explanations for the variables:
- Age, edu, gender and married are chosen because they can affect an individual's earning ability.
- SOE and experience were selected to capture differences in the factors that affect wages across types of enterprise.
- Urban and region were chosen to represent geographic working and living conditions, which can affect earning opportunities and the cost of living.
- Year is selected to control for changes in earnings over time.
- Interaction variables such as gender*training and edu*SOE were chosen to explore more complex relationships between these variables and wages.
Expected sign:
- Age, edu, gender, married, experience, region and training can have a positive effect on wages (positive coefficient): trained employees are expected to earn more than untrained employees, and employees with more experience and education are expected to earn higher salaries.
- Married (if coded 1 = married) can have a positive effect in the labor market.
- Gender (if coded 1 = male) can have a positive effect because of gender inequality in the labor market.
- SOE can have a negative effect on wages (negative coefficient) because workers in state-owned enterprises are subject to more stringent regulations than those in private enterprises.
- The relationship between the interaction variables and wages can be more complex and depends on the specific context.
IV. Result and Discussion
1. Discuss five assumptions to obtain unbiased estimators:
To obtain unbiased coefficient estimates from the OLS estimator for this model, four assumptions (Assumptions 1 to 4 below) need to be satisfied; we also discuss the normality of the errors (Assumption 6), which is needed for exact inference. In the following, we discuss each of them in the context of this model.
a. Assumption 1: Linear in parameters
The MLR model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
Linear in parameters means that each parameter appears with a power of 1 only and is not multiplied or divided by any other parameter. The dependent and independent variables can be arbitrary functions of the underlying variables of interest, such as natural logarithms and squares. In this model, y is the logarithm of (wage + 1), and some x's are products of two underlying variables, such as edu*SOE. While the functional forms of x and y can be flexible, the model must be linear in the parameters, which is clearly the case here.
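To illustrate that a model can be nonlinear in the variables yet linear in the parameters, the sketch below simulates data with an edu*SOE interaction (all coefficient values are hypothetical) and recovers the parameters with ordinary least squares via `np.linalg.lstsq`. The interaction simply becomes one more column of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic regressors (hypothetical, for illustration only)
edu = rng.integers(0, 17, size=n).astype(float)   # years of schooling, 0-16
soe = rng.integers(0, 2, size=n).astype(float)    # 1 = works in an SOE

# True model: nonlinear in the variables (edu*SOE) but linear in the betas
beta_true = np.array([3.0, 0.05, -0.20, 0.02])    # const, edu, SOE, edu*SOE
X = np.column_stack([np.ones(n), edu, soe, edu * soe])
log_wage = X @ beta_true + rng.normal(0.0, 0.3, size=n)

# Because the model is linear in the parameters, OLS applies directly
beta_hat, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
print(np.round(beta_hat, 3))
```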
b. Assumption 2: Random Sampling
The sample used for the linear regression model must be drawn randomly from the population. For this regression model, the sample consists of 56,584 individuals who work as wage earners in SOEs (state-owned enterprises) and private enterprises. Random sampling means that every worker in the population has an equal chance of being selected for the sample. The number of observations in the sample should also be greater than the number of parameters to be estimated. This makes sense mathematically: if the number of parameters to be estimated (unknowns) exceeds the number of observations, estimation is not possible.
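A minimal sketch of simple random sampling and of the observations-versus-parameters condition, assuming a hypothetical population of one million workers. The parameter count is also an assumption: it treats region as five dummies (one per non-base region listed in Section IV.3), giving 15 slope coefficients plus an intercept.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of worker IDs (size chosen only for illustration)
population_ids = np.arange(1_000_000)
n = 56_584          # sample size used in this report
k = 15              # slope coefficients, assuming five regional dummies

# Simple random sampling without replacement: each worker has the same
# inclusion probability n / N
sample_ids = rng.choice(population_ids, size=n, replace=False)

# OLS needs more observations than parameters (k slopes + 1 intercept)
if n <= k + 1:
    raise ValueError("too few observations to estimate the model")
print(len(sample_ids), len(np.unique(sample_ids)))
```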
c. Assumption 3: No perfect multicollinearity
The independent variables should not be perfectly correlated with each other. Perfect multicollinearity rarely occurs. This assumption allows the independent variables to be correlated, but not perfectly. A near-linear relationship among explanatory variables is called multicollinearity, and it does not violate Assumption 3. However, multicollinearity makes it harder to interpret the separate effect of each variable in the OLS model. In this model, it means that factors such as age, education, marital status and gender should not be strongly correlated; if they are, it will be difficult to separate the impact of each factor on wages.
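Perfect multicollinearity shows up as a rank-deficient design matrix. The sketch below uses a hypothetical "dummy variable trap": adding a `female` dummy next to `gender` and an intercept makes the intercept column an exact sum of the two dummies, so OLS cannot separate their effects. All data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
gender = rng.integers(0, 2, size=n).astype(float)   # 1 = male (hypothetical coding)
female = 1.0 - gender                                # exact linear function of gender
edu = rng.integers(0, 17, size=n).astype(float)

const = np.ones(n)
X_ok = np.column_stack([const, edu, gender])
X_perfect = np.column_stack([const, edu, gender, female])  # const = gender + female

# Assumption 3 requires full column rank (rank equal to the number of columns)
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])            # 3 3
print(np.linalg.matrix_rank(X_perfect), X_perfect.shape[1])  # 3 4
```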
d. Assumption 4: Zero conditional mean
Strong and key assumption:
$$E(u \mid x_{i1}, x_{i2}, \ldots, x_{ik}) = 0$$
This assumption means that the error u has an expected value of zero given any values of the independent variables. It implies that the unobservable factors are, on average, unrelated to the explanatory variables, with no linear or nonlinear relationship between them. This is a strict assumption that is difficult to satisfy, but when it holds (together with Assumptions 1 to 3), OLS produces unbiased and consistent estimates. Assumption 4 also implies a weaker assumption: the unobservable factors have zero mean and are uncorrelated with (have no linear relation to) each independent variable. When only this weaker assumption holds, the OLS estimates may be biased but are still consistent.
In the context of the model:
$$E(u \mid age_i, edu_i, gender_i, \ldots) = 0$$
According to this assumption, the error term u must have an expected value of zero given any values of the independent variables, such as age, edu and gender, for us to obtain unbiased coefficient estimates in the model for the logarithm of wages.
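A small simulation can show what goes wrong when this assumption fails. In the sketch below, a hypothetical unobserved "ability" variable raises both education and wages; when it is left in the error term, E(u | edu) ≠ 0 and the OLS coefficient on education is biased upward (omitted variable bias). All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Hypothetical unobserved ability that raises both education and wages
ability = rng.normal(0.0, 1.0, size=n)
edu = 12.0 + 2.0 * ability + rng.normal(0.0, 2.0, size=n)
log_wage = 3.0 + 0.05 * edu + 0.10 * ability + rng.normal(0.0, 0.3, size=n)

def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
# Ability included: the error is mean-independent of the regressors, estimate near 0.05
b_full = ols(np.column_stack([const, edu, ability]), log_wage)[1]
# Ability omitted: it moves into u, so E(u | edu) != 0;
# expected value is about 0.05 + 0.10 * Cov(edu, ability) / Var(edu) = 0.05 + 0.10 * 2 / 8
b_short = ols(np.column_stack([const, edu]), log_wage)[1]
print(round(b_full, 3), round(b_short, 3))
```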
e. Assumption 6: Normality of the errors
$$y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2)$$
Parametric statistical procedures rely on the assumption that the population from which the sample is drawn is normally distributed. Many test statistics, such as the t-test and F-test, are said to be robust to moderate departures from normality and equal variance. However, many authors have pointed out the danger of erroneous statistical inference when only the mean and standard deviation of a distribution are reported and its skewness and kurtosis are ignored, especially when n is small, α is very small or the data are skewed.
Testing for outliers and normality should therefore be an important preliminary step for many inferential statistical procedures. The simplest way to check for departures from normality is to plot the distribution of the sample points (for example, a histogram of the residuals). This makes it possible to identify outliers and the general shape of the distribution, including whether it is skewed and, if so, whether positively or negatively.
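A common numerical complement to plotting (not used in the report itself) is the Jarque-Bera test, which combines sample skewness and kurtosis. The sketch below applies it to simulated residuals; in practice it would be applied to the OLS residuals. With 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-JB/2), so no statistics library is needed.

```python
import math
import numpy as np

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, p-value) for a residual vector."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    d = e - e.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2        # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)           # chi-square(2) upper-tail probability
    return skew, kurt, jb, p_value

rng = np.random.default_rng(3)
normal_resid = rng.normal(size=5_000)
skewed_resid = rng.exponential(size=5_000) - 1.0   # positively skewed, mean zero

res_normal = jarque_bera(normal_resid)
res_skewed = jarque_bera(skewed_resid)
print(res_normal)
print(res_skewed)
```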
2. Test multicollinearity and homoskedasticity. How to deal with non-constant error variance.
a. Multicollinearity test
Multicollinearity occurs when there is a high correlation between two or more independent variables in a regression model. In other words, one independent variable can be used to predict another. For example, when independent variable A increases, independent variable B also increases, and when A decreases, B also decreases. This creates redundant information and can distort the results of the multiple regression model. Perfect multicollinearity violates the assumption that no independent variable is an exact linear function of the others; high but imperfect multicollinearity does not violate it, but it is still an important issue because it makes it hard to separate the effects of each explanatory variable and to estimate the regression coefficients precisely.
VIF (Variance Inflation Factor) is a useful tool for detecting multicollinearity. As a rule of thumb, if the VIF of a variable is greater than 5 (or, by some conventions, 10), that variable may suffer from multicollinearity.
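The VIF of regressor j is 1 / (1 − R²_j), where R²_j comes from regressing x_j on all the other regressors. A minimal NumPy sketch on simulated data follows (all variables are hypothetical; the report's own VIFs were computed in Stata). It also illustrates that an interaction such as edu*SOE is correlated with its components by construction, and that centering edu before forming the interaction, a common remedy not used in the report, lowers those VIFs.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept. X must not contain the constant."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(5)
n = 10_000
edu = rng.integers(0, 17, size=n).astype(float)
soe = rng.integers(0, 2, size=n).astype(float)
age = rng.integers(18, 61, size=n).astype(float)

# Columns: edu, SOE, edu*SOE, age
vif_raw = vif(np.column_stack([edu, soe, edu * soe, age]))
edu_c = edu - edu.mean()                      # centered education
vif_centered = vif(np.column_stack([edu_c, soe, edu_c * soe, age]))
print(np.round(vif_raw, 2))
print(np.round(vif_centered, 2))
```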
The interaction between 'edu' and 'SOE' ('edu_SOE') and the 'SOE' variable have the highest VIFs, 13.65 and 13.10 respectively. This indicates strong multicollinearity between these variables and others in the model. In particular, there may be a strong correlation between education level ('edu') and the interaction between education level and SOE ('edu_SOE'). In addition, the 'gender', 'training', 'edu' and 'gender##training' variables also have high VIFs, although not as high as 'SOE' and 'edu_SOE', which likewise indicates correlation between these variables and others in the model.
b. Homoscedasticity test
The results of the Breusch-Pagan/Cook-Weisberg test show strong evidence of heteroskedasticity in the regression model. The null hypothesis H0 of this test is homoskedasticity (constant error variance); since the p-value is very small (0.0000), we reject the null hypothesis, meaning that we have substantial evidence of heteroskedasticity.
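The logic of the test can be sketched in NumPy. As an assumption for illustration, this sketch implements Koenker's studentized n·R² version using the fitted values; Stata's default `estat hettest` also uses the fitted values but computes a statistic that assumes normally distributed errors, so the numbers will not match Stata exactly. The data are simulated, with an error standard deviation that grows with education.

```python
import math
import numpy as np

def bp_test_fitted(y, X):
    """Koenker (studentized) Breusch-Pagan test on the OLS fitted values.

    X must include a column of ones. Under H0 (homoskedasticity),
    LM = n * R^2 from regressing squared residuals on the fitted values
    is approximately chi-square with 1 degree of freedom.
    """
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    e2 = (y - fitted) ** 2
    Z = np.column_stack([np.ones_like(fitted), fitted])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    ssr = np.sum((e2 - Z @ g) ** 2)
    sst = np.sum((e2 - e2.mean()) ** 2)
    lm = max(y.size * (1.0 - ssr / sst), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))   # chi-square(1) upper tail
    return lm, p_value

rng = np.random.default_rng(11)
n = 5_000
edu = rng.uniform(0.0, 16.0, size=n)
X = np.column_stack([np.ones(n), edu])
y_homo = 3.0 + 0.05 * edu + rng.normal(0.0, 0.3, size=n)
y_het = 3.0 + 0.05 * edu + rng.normal(0.0, 0.05 + 0.04 * edu)   # sd rises with edu

lm_homo, p_homo = bp_test_fitted(y_homo, X)
lm_het, p_het = bp_test_fitted(y_het, X)
print(round(lm_homo, 2), round(p_homo, 4))
print(round(lm_het, 2), p_het)
```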
c. How can we deal with non-constant error variance?
Heteroskedasticity is the phenomenon in which the variance of the error term u is not constant but changes with the values of the explanatory variables, so the residuals after the regression do not have equal variances. This violates the assumption of the linear regression model that the error variance is the same for all observations (homoskedasticity).
There are several ways to deal with heteroskedasticity, including a logarithmic transformation of the variables, using heteroskedasticity-robust standard errors (the robust option in Stata), and using other linear regression estimators such as WLS (Weighted Least Squares), which is quite similar to OLS but requires additional tests to choose the weights. In this part, we use the robust option in Stata.
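What the robust option changes can be sketched directly: the coefficients stay the same and only the standard errors change. To our understanding, Stata's `regress ..., robust` reports HC1 standard errors, i.e., the White sandwich estimator scaled by n/(n − k); the sketch below implements that formula on simulated heteroskedastic data (all values are hypothetical).

```python
import numpy as np

def ols_with_hc1(X, y):
    """Return OLS coefficients, conventional SEs and HC1 robust SEs."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    # Conventional SEs assume a constant error variance
    sigma2 = e @ e / (n - k)
    se_conv = np.sqrt(np.diag(sigma2 * xtx_inv))
    # Sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, scaled by n / (n - k)
    meat = (X * (e ** 2)[:, None]).T @ X
    cov_hc1 = n / (n - k) * xtx_inv @ meat @ xtx_inv
    return beta, se_conv, np.sqrt(np.diag(cov_hc1))

rng = np.random.default_rng(2)
n = 10_000
edu = rng.uniform(0.0, 16.0, size=n)
X = np.column_stack([np.ones(n), edu])
# Error spread grows with distance from average schooling (hypothetical)
y = 3.0 + 0.05 * edu + rng.normal(0.0, 0.1 + 0.06 * np.abs(edu - 8.0))

beta, se_conv, se_hc1 = ols_with_hc1(X, y)
print(np.round(beta, 4), np.round(se_conv, 5), np.round(se_hc1, 5))
```

A robust F (Wald) test uses this robust covariance matrix in place of the conventional one, which is why the F-statistic changes between the two regressions.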
The F-statistic from the regression with robust standard errors differs from its non-robust counterpart, which is consistent with the presence of heteroskedasticity.
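3. Holding all other factors in the model fixed, the differences in log wages between the following four groups are as follows. Given which coefficients enter each difference, group 1 is the base group (gender = 0 and training = 0), group 2 has training = 1 only, group 3 has gender = 1 only, and group 4 has both gender = 1 and training = 1.
- The difference in wages between group 4 and group 1 = δ1 + δ4 + δ7 = 0.10864 + 0.0954369 + 0.0684133 = 0.2724902
- The difference in wages between group 2 and group 1 = δ4 = 0.0954369
- The difference in wages between group 3 and group 1 = δ1 = 0.10864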
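The arithmetic above can be checked in a few lines. Because the dependent variable is a log, these are differences in log wages; the sketch also converts them with the exact formula 100·(exp(d) − 1), which becomes noticeably larger than the 100·d approximation once d is around 0.25 or more.

```python
import math

# Coefficients reported in this section
d1 = 0.10864      # gender
d4 = 0.0954369    # training
d7 = 0.0684133    # gender*training

diffs = {
    "group 2 vs group 1": d4,
    "group 3 vs group 1": d1,
    "group 4 vs group 1": d1 + d4 + d7,
}
for label, d in diffs.items():
    approx_pct = 100 * d                  # usual approximation
    exact_pct = 100 * (math.exp(d) - 1)   # exact percentage difference
    print(f"{label}: {d:.7f} -> approx {approx_pct:.2f}%, exact {exact_pct:.2f}%")
```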
The proportional differences in wages between the base region (Northern Midlands & Mountains) and the other regions:
- Controlling for other factors, wages are about 17.85% higher in the Red River Delta than in the base region (Northern Midlands & Mountains).
- Controlling for other factors, wages are about 0.97% higher in the Central Coast Region than in the base region (Northern Midlands & Mountains).
- Controlling for other factors, wages are about 2.84% higher in the Central Highlands than in the base region (Northern Midlands & Mountains).
- Controlling for other factors, wages are about 31.74% higher in the Southeast Region than in the base region (Northern Midlands & Mountains).
- Controlling for other factors, wages are about 11.62% higher in the Mekong Delta Region than in the base region (Northern Midlands & Mountains).
4. Test the null hypothesis that the effect of education on the logarithm of wages
is the same for workers in SOEs and workers in private firms.
In this test, we focus on two variables: education and working in an SOE. Education is a continuous variable, while SOE is a dummy variable. Therefore, we create a new variable, the interaction between education and SOE, with coefficient δ8 on edu*SOE.
Null hypothesis: the effect of education on the logarithm of wages is the same for
workers in SOEs and workers in private firms.
Alternative hypothesis: the effect of education on the logarithm of wages differs between workers in SOEs and workers in private firms.
$$H_0: \delta_8 = 0$$
$$H_1: \delta_8 \neq 0$$
Under the alternative hypothesis, edu*SOE has a ceteris paribus effect on log_wage, without specifying whether the effect is positive or negative (a two-sided test).
Decision rule: reject H0 if p-value < α, considering two cases: α = 5% and α = 10%.
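With a sample of 56,584 workers, the t statistic for δ8 is essentially standard normal under H0, so the decision rule can be sketched as below. The coefficient and standard error in the usage lines are hypothetical placeholders, not the report's estimates.

```python
import math

def two_sided_test(coef, se, alpha):
    """Test H0: coefficient = 0 against H1: coefficient != 0.

    Uses the standard normal tail, a close approximation to the t
    distribution when there are tens of thousands of observations.
    """
    t = coef / se
    p_value = math.erfc(abs(t) / math.sqrt(2.0))   # 2 * P(Z > |t|)
    return t, p_value, p_value < alpha

# Hypothetical estimate and standard error for delta_8
for alpha in (0.05, 0.10):
    print(alpha, two_sided_test(coef=0.02, se=0.004, alpha=alpha))
```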

Here is the result of the test in Stata:
From the table, we can conclude that:
Case 1: At the 5% significance level:
p-value = 0.000 < 0.05
There is enough evidence to infer that the effect of education on wages differs between workers in SOEs and workers in private firms at the 5% significance level.
Case 2: At the 10% significance level:
p-value = 0.000 < 0.1
There is enough evidence to infer that the effect of education on wages differs between workers in SOEs and workers in private firms at the 10% significance level.
The quadratic relationship between experience and the logarithm of wages
Standard errors in parentheses: p<0.05 and p<0.1
The significance test for the coefficient on exper2 (exper2 = exper*exper), estimated at −0.0003957, gives a p-value below 0.05. Therefore, the exper2 coefficient is statistically significant at the 5% level, meaning that there is a relationship between squared experience and the logarithm of wages. In other words, the logarithm of wages depends on the square of experience, not only on its level.
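$$\widehat{\text{log\_wage}} = 3.575094 + 0.0124987\,exper - 0.0003957\,exper^2$$
$$\Delta\widehat{\text{log\_wage}} \approx (0.0124987 - 2 \times 0.0003957\,exper)\,\Delta exper = (0.0124987 - 0.0007914\,exper)\,\Delta exper$$
With $\hat\beta_1 > 0$ and $\hat\beta_2 < 0$, the turning point is
$$exper^* = \left|\frac{\hat\beta_1}{2\hat\beta_2}\right| = \frac{0.0124987}{2 \times 0.0003957} = 15.7931514 \approx 16 \text{ (years)}$$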
At about 16 years of experience, the marginal effect is zero and predicted log wage reaches its maximum. Below 16 years the marginal effect is positive (+); above 16 years it is negative (-).
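The turning point and the sign change of the marginal effect can be verified numerically from the estimated coefficients above:

```python
b1 = 0.0124987     # estimated coefficient on exper
b2 = -0.0003957    # estimated coefficient on exper^2

def marginal_effect(exper):
    """Approximate change in log_wage from one more year of experience."""
    return b1 + 2 * b2 * exper

turning_point = -b1 / (2 * b2)    # solves b1 + 2 * b2 * exper = 0
print(round(turning_point, 2))    # 15.79
for x in (5, 10, 16, 20):
    print(x, round(marginal_effect(x), 6))
```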
5. The different effects of education on wages (different slopes) between workers in SOEs and workers in private firms.
In the graph, the intercept for workers in SOEs is below that for workers in private firms, but the slope on education is larger for workers in SOEs. This means that workers in SOEs earn less than workers in private firms at low levels of education, but the gap narrows as education increases. At some point, a worker in an SOE earns more than a worker in a private firm with the same level of education (this point is easily found from the estimated equation).
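The crossing point follows from the econometric model in Section III: holding all other regressors fixed, the predicted SOE-minus-private gap in log wages at education level edu is δ̂3 + δ̂8·edu, so setting it to zero gives the level of education at which the two lines intersect. With δ̂3 < 0 (lower SOE intercept) and δ̂8 > 0 (steeper SOE slope), as the graph describes, edu* is positive; the report's estimated coefficients would be plugged in.

```latex
\begin{aligned}
\widehat{\text{log\_wage}}_{\text{SOE}} - \widehat{\text{log\_wage}}_{\text{private}}
  &= \hat\delta_3 + \hat\delta_8\, edu \\
\hat\delta_3 + \hat\delta_8\, edu^* = 0
  \quad &\Longrightarrow \quad edu^* = -\frac{\hat\delta_3}{\hat\delta_8}
\end{aligned}
```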
6. Summarize the main findings and propose policy recommendations based on the findings.
The findings suggest that age, marital status, education level, occupation and work experience are the main determinants of the wage gap between the two sectors (SOEs and private enterprises) for both male and female workers. As individuals grow older, their income tends to increase, possibly due to accumulated work experience and career advancement. Marital status also affects income, with married individuals often benefiting from dual incomes or shared financial resources.
From the research results, we propose some recommendations for reducing the wage gap between state-owned and private enterprises by gender in Vietnam.
Firstly, raising employees' education levels, especially to a university degree or higher, is one of the core ways to reduce the wage gap between state-owned and private enterprises. Therefore, policies are needed to improve workers' education. Better qualifications will help employees raise their productivity and increase profits for businesses, companies, and production and business households; this will ensure that both sides (employers and employees) benefit.
Second, the research results show that the wage gap between state-owned and private enterprises is larger for women than for men. Therefore, programs and policies are needed to give women in the private sector more opportunities to take up higher-paying jobs, thereby stimulating the thinking and creativity of the female workforce and contributing to economic development.
In many countries around the world, minimum wages are set both per hour and per month. Currently, the National Wage Council in Vietnam only regulates the minimum monthly wage. Therefore, it is necessary to calculate and regulate a minimum hourly wage in Vietnam to cover all workers (ILO Vietnam Office, 2018).
Reference
1. Jeffrey M. Wooldridge. (2019). Introductory Econometrics: A Modern Approach (5th ed.). America: Thomson South-Western.
2. Trần Thị Tuấn Anh. (2019). Phân tích chênh lệch thu nhập theo giới tính ở TP. Hồ Chí Minh bằng hồi quy phân vị [Analysis of the gender income gap in Ho Chi Minh City using quantile regression]. Tạp chí Phát triển Kinh tế [Journal of Economic Development], 21-37.
3. ResearchGate. Farmland loss and livelihood outcomes: A microeconometric analysis of household surveys in Vietnam. https://www.researchgate.net/publication/262580514_Farmland_loss_and_livelihood_outcomes_a_microeconometric_analysis_of_household_surveys_in_Vietnam. Accessed 30/05/2023.
4. Vương Đình Huệ. (2018). Cải cách chính sách tiền lương để nâng cao đời sống cho CBCCVC, LLVT và người lao động trong doanh nghiệp [Reforming wage policy to improve the living standards of officials, civil servants and public employees (CBCCVC), the armed forces (LLVT) and workers in enterprises]. baochinhphu.vn.