SUBJECT: ECONOMETRICS
Project 4: Wages of individuals who work as wage earners
in SOEs (state-owned enterprises) and private enterprises.
Class: INS304902
Group: Group 4
III. Data and research design
1. Data:
In our research, we use a dataset that contains 56,584 individuals who work as wage
earners in SOEs (state-owned enterprises) and private enterprises. The dataset provides
information on a variety of personal characteristics of each worker, such as age, gender,
marital status and experience. In addition, it includes region-related factors.
For the descriptive and inferential statistics, we use the logarithm of income
instead of the income level: log(wage) = ln(wage + 1).
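The transformation above can be sketched in Python as follows. This is a minimal illustration that uses only the standard library; the function name `log_wage` and the sample wage values are assumptions made for the example and are not taken from the dataset.

```python
import math

def log_wage(wage: float) -> float:
    """Return ln(wage + 1), the transformed income used as the dependent variable."""
    if wage < 0:
        raise ValueError("wage must be non-negative")
    # math.log1p(x) computes ln(1 + x) accurately, including for small x
    return math.log1p(wage)

# Hypothetical wages (illustrative values only, not from the dataset)
sample_wages = [0.0, 1500.0, 4200.0]
transformed = [log_wage(w) for w in sample_wages]
```

Adding 1 before taking the logarithm keeps the transformation defined for a wage of zero, which maps to a log wage of zero.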
2. Research design:
Wages are determined by many factors, both subjective and objective, ranging from the
capacity of each person to the macro context of the economy.
• Econometric model
This model is a multivariable linear regression model, used to explain individual wages
(in logarithms) based on various factors. The explanatory variables were chosen to capture
the demographic, socioeconomic and geographic characteristics of the individuals who work
as wage earners in SOEs (state-owned enterprises) and private enterprises. These variables
can help explain the variation in individuals' income. Specific explanations for the
variables:
- Age, edu, gender and married describe the worker and are chosen because they
can affect earning ability.
- SOE and experience were selected to show how the factors affecting wages
differ between types of enterprise.
- Urban and region were chosen to represent geographic working and living
conditions, which can affect earning opportunities and cost of living.
- Year is selected to control for changes in earnings over time.
- Interaction variables such as gender*training and edu*SOE were chosen to
explore more complex relationships between these variables and income.
Expected sign:
- Age, edu, experience, region and training are expected to have a positive
effect on income (positive coefficient): trained employees should earn
higher wages than untrained employees, and employees with more experience
and education should earn higher salaries.
- Married (1 = married) is expected to have a positive impact in the labor market.
- Gender (1 = male) is expected to have a positive impact due to gender
inequality in the labor market.
- SOE is expected to have a negative impact on income (negative coefficient)
because workers in state-owned enterprises are subject to more stringent
regulations than workers in private enterprises.
- The relationship of the interaction variables with wage can be more complex
and depends on the specific context.
IV. Result and Discussion
To obtain unbiased coefficient estimates from the OLS estimator, four assumptions
need to be satisfied. Below, we discuss each of them in more detail in the context of
this model.
The first assumption, linearity in parameters, means that each parameter appears with a
power of 1 only and is not multiplied or divided by any other parameter. The dependent and
the independent variables can be arbitrary functions of the underlying variables of
interest, such as natural logarithms and squares. In this model, y takes the functional
form ln(wage + 1), and some regressors are products of two underlying variables, such as
edu*SOE. Although the functional forms of x and y are flexible, the model must remain
linear in the parameters, which is clearly the case here.
The second assumption is random sampling: the sample used for the linear regression model
must be drawn randomly from the population. Our sample consists of 56,584 individuals who
work as wage earners in SOEs (state-owned enterprises) and private enterprises. Random
sampling means that every worker in the population has an equal chance of being selected
for the sample. The number of observations in the sample must also be greater than the
number of parameters to be estimated. This makes sense mathematically: if there are more
unknown parameters than observations, estimation is not possible.
Assumption 3 requires that the independent variables are not perfectly correlated with
each other. Perfect multicollinearity rarely occurs. The assumption allows the independent
variables to be correlated, as long as the correlation is not perfect. A near-linear
relationship among explanatory variables is called multicollinearity. Multicollinearity
does not violate Assumption 3, but it makes the separate effect of each variable harder
to interpret. In this model, it means that factors such as age, education, marital status
and gender should not be strongly correlated. If they are closely correlated, it will be
difficult to discern the impact of each factor on wage.
Assumption 4 (zero conditional mean) states that the error u has an expected value of zero
given any values of the independent variables. It implies that the unobservable factors
are, on average, unrelated to the explanatory variables, which rules out both linear and
nonlinear relationships. This is a strict assumption that is difficult to satisfy, but
when it holds, the OLS estimator is unbiased and consistent. Assumption 4 also implies a
weaker condition: there is no linear relationship (zero correlation) between the
unobservable factors and the independent variables. If only this weaker condition holds,
the OLS estimator may be biased but is still consistent.
In this model, the error term u must therefore have an expected value of zero given any
values of the independent variables, such as age, edu and gender, in order to obtain
unbiased coefficient estimates for the logarithm of wage.
e. Assumption 6: The normality of errors:
Parametric statistical procedures are based on the assumption that the population from
which a sample is drawn is normally distributed. Many univariate test statistics, such as
the t-test and F-test, are said to be robust to moderate departures from the assumptions
of normality and homogeneity of variance. However, many authors have pointed out the
danger of erroneous statistical inferences when only the mean and standard deviation of
the distributions are reported and the skewness and kurtosis are ignored, especially when
n is small or alpha is extremely small and the data are skewed.
Testing for outliers and normality should therefore be an important preliminary step
for many inferential statistical procedures. The simplest way to check for departures
from normality is to plot the distribution of the sample points. The plot makes it
possible to identify outliers and the general shape of the distribution, including
whether it is skewed and, if so, whether the skew is positive or negative.
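Alongside a plot, skewness and kurtosis can be computed directly. The sketch below is one possible way to do this with moment-based formulas in plain Python; the function name `skewness_kurtosis` and the two small samples are assumptions made for illustration and are not the residuals of the wage model.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("values have zero variance")
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical samples (illustrative only)
symmetric_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed_sample = [1.0, 1.0, 1.0, 1.0, 10.0]
skew_symmetric, _ = skewness_kurtosis(symmetric_sample)
skew_right, _ = skewness_kurtosis(right_skewed_sample)
```

A skewness close to zero suggests a symmetric distribution, a positive value indicates a long right tail, and a negative value indicates a long left tail.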
a. Multicollinearity test
The interaction between the 'edu' and 'SOE' variables ('edu_SOE') and the 'SOE' variable
itself have the highest VIFs, 13.65 and 13.10 respectively. This shows that there is
strong multicollinearity between these and other variables in the model. In particular,
there may be a strong correlation between education level ('edu') and the interaction
variable between education level and SOE ('edu_SOE'). In addition, the 'gender',
'training', 'edu' and 'gender##training' variables also have high VIFs, although not as
high as 'SOE' and 'edu_SOE'. This also indicates correlation between these variables and
other variables in the model.
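To read the VIF values, it helps to recall the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on all other regressors. The sketch below applies this definition to the two VIFs reported above; the helper names `vif_from_r2` and `r2_from_vif` are assumptions introduced for the example.

```python
def vif_from_r2(r_squared: float) -> float:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other regressors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def r2_from_vif(vif: float) -> float:
    """Invert the VIF formula to recover the implied auxiliary R_j^2."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif

# VIFs reported in the text
reported_vifs = {"edu_SOE": 13.65, "SOE": 13.10}
implied_r2 = {name: r2_from_vif(v) for name, v in reported_vifs.items()}
```

Under this definition, both reported VIFs imply that roughly 92 to 93 percent of the variation in each variable is explained by the other regressors. Some of this is expected by construction, because 'edu_SOE' is built from 'SOE' and 'edu'.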
b. Homoscedasticity test
The results of the Breusch-Pagan/Cook-Weisberg test show strong evidence of
heteroskedasticity in the regression model. The null hypothesis H0 of this test is
"homoscedasticity". Because the p-value is very small (0.0000), we reject the null
hypothesis, meaning that we have substantial evidence of heteroskedasticity.
Heteroskedasticity is the phenomenon where the variance of the errors (e) of the model
is not constant across observations. This violates the assumption of the linear
regression model that the variance of the errors should be the same for all
observations.
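The idea behind the test can be sketched in a few lines of Python. The example below uses the n * R^2 (Koenker) form of the Breusch-Pagan statistic with a single regressor and synthetic data; it is an illustration only, and the statistic reported above may have been computed with a different variant of the test. The function names and the data are assumptions made for the example.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by OLS and return (a, b, residuals, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    resid_ss = sum(e ** 2 for e in residuals)
    return a, b, residuals, 1.0 - resid_ss / total_ss

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing the squared OLS residuals on x."""
    _, _, residuals, _ = ols_simple(x, y)
    squared = [e ** 2 for e in residuals]
    _, _, _, r2_aux = ols_simple(x, squared)
    return len(x) * r2_aux

# Synthetic data (not the wage data): the error size grows with x
x = [float(i) for i in range(1, 41)]
y = [2.0 + 3.0 * xi + (xi if i % 2 == 0 else -xi) for i, xi in enumerate(x)]

lm_statistic = breusch_pagan_lm(x, y)
CHI2_CRIT_5PCT_DF1 = 3.841  # 5% critical value, chi-square with 1 degree of freedom
reject_homoskedasticity = lm_statistic > CHI2_CRIT_5PCT_DF1
```

If the squared residuals can be predicted from the regressors, the auxiliary R^2 is large and the null hypothesis of constant error variance is rejected.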
The difference in the F-statistic between the regression with robust standard errors and
its non-robust counterpart also indicates that heteroscedasticity is present.
3. Holding all other factors in the model fixed, the difference in income levels
between the following four groups:
- The difference in wages between group 4 and group 1 = 𝛿1 + 𝛿4 + 𝛿7 =
0.10864 + 0.0954369 + 0.0684133 = 0.2724902
- The difference in wages between group 2 and group 1 = 𝛿4 = 0.0954369
- The difference in wages between group 3 and group 1 = 𝛿1 = 0.10864
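The sums above are differences in log wages. As a hedged sketch, the snippet below reproduces the arithmetic for group 4 versus group 1 and converts the result into an exact percentage difference using 100 * (exp(d) - 1). The function name `exact_percent_difference` is an assumption made for the example; because the dependent variable is ln(wage + 1), the percentage strictly applies to wage + 1.

```python
import math

# Coefficients as reported in the text
delta_1 = 0.10864
delta_4 = 0.0954369
delta_7 = 0.0684133

def exact_percent_difference(log_diff: float) -> float:
    """Convert a difference in log wages into an exact percentage difference."""
    return 100.0 * (math.exp(log_diff) - 1.0)

# Group 4 versus group 1: both main effects plus the interaction term
diff_group4_vs_group1 = delta_1 + delta_4 + delta_7
pct_group4_vs_group1 = exact_percent_difference(diff_group4_vs_group1)
```

For small coefficients, 100 times the coefficient is a close approximation. For a log difference of about 0.27 the exact figure is roughly 31 percent, noticeably larger than the approximate 27 percent.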
The proportional differential in wages between the base region (Northern Midlands
& Mountains) and other regions:
- Controlling for other factors, the wage is about 17.85% higher for the Red
River Delta than for the base group (Northern Midlands & Mountains).
- Controlling for other factors, the wage is about 0.97% higher for the Central
Coast Region than for the base group (Northern Midlands & Mountains).
- Controlling for other factors, the wage is about 2.84% higher for the Central
Highlands than for the base group (Northern Midlands & Mountains).
- Controlling for other factors, the wage is about 31.74% higher for the
Southeast Region than for the base group (Northern Midlands & Mountains).
- Controlling for other factors, the wage is about 11.62% higher for the
Mekong Delta Region than for the base group (Northern Midlands &
Mountains).
4. Test the null hypothesis that the effect of education on the logarithm of wages
is the same for workers in SOEs and workers in private firms.
In this test, we focus on two variables: education and employment in an SOE.
Education is a continuous variable and SOE is a dummy variable.
Therefore, we created a new variable, the interaction between education and
SOE: 𝛿8 edu*SOE
Null hypothesis: the effect of education on the logarithm of wages is the same for
workers in SOEs and workers in private firms.
𝐻0 : 𝛿8 = 0
𝐻1 : 𝛿8 ≠ 0
Under the alternative hypothesis, edu*SOE has a ceteris paribus effect on log_wage,
without specifying whether the effect is positive or negative (a two-sided alternative).
There is enough evidence to infer that the effect of education on wages differs
between workers in SOEs and workers in private firms at both the 5% and 10%
significance levels.
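The decision rule for this two-sided test can be sketched as follows. The coefficient and standard error in the snippet are hypothetical placeholders and are not the estimates from this report. The normal approximation to the t distribution is also an assumption, which is reasonable here only because the sample is large.

```python
import math

def two_sided_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    t_stat = coef / std_err
    # For a standard normal Z, P(|Z| > |t|) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

# Hypothetical values for illustration only (not taken from the regression output)
delta_8_hat = 0.02
se_delta_8 = 0.005

p_value = two_sided_p_value(delta_8_hat, se_delta_8)
reject_at_5pct = p_value < 0.05
reject_at_10pct = p_value < 0.10
```

Rejecting H0 at the 5% level always implies rejecting it at the 10% level, since the same p-value is compared with a larger threshold.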
Standard errors in parentheses: p<0.05 and p<0.1
The statistical significance test for the coefficient on Exper2 (Exper2 = Exper*Exper)
yields a p-value of 0.0003957, which is less than the significance level of 0.05.
Therefore, we can conclude that the Exper2 coefficient is statistically significant. This
means that there is a relationship between the squared experience variable and the
logarithm of wages. In other words, the logarithm of wages depends on the square of
experience.
$$\widehat{\text{log\_wage}} = 3.575094 + 0.0124987\,\text{exper} - 0.0003957\,\text{exper}^2$$
$$\Delta\widehat{\text{log\_wage}} = \left(0.0124987 - 2(0.0003957)\,\text{exper}\right)\Delta\text{exper}$$
At about 16 years of experience, the marginal effect is zero and the function reaches its
maximum value. Below 16 years, the marginal effect is positive (+); above 16 years, the
marginal effect is negative (-).
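The turning point follows from setting the marginal effect to zero, which gives exper* = -b1 / (2 * b2). The short sketch below checks this with the coefficients reported above; the function names `marginal_effect` and `turning_point` are assumptions introduced for the example.

```python
# Coefficients from the estimated equation in the text
beta_exper = 0.0124987
beta_exper2 = -0.0003957

def marginal_effect(exper: float) -> float:
    """Approximate change in log wage for one more year of experience."""
    return beta_exper + 2.0 * beta_exper2 * exper

def turning_point(b1: float, b2: float) -> float:
    """Experience level where the quadratic reaches its extremum: -b1 / (2 * b2)."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2.0 * b2)

peak_experience = turning_point(beta_exper, beta_exper2)
```

With these coefficients the peak lies slightly below 16 years, which rounds to the value stated above.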
In the graph, the intercept for workers in SOEs is below that for workers in private
firms, but the slope on education is larger for workers in SOEs. This means that workers
in SOEs earn less than workers in private firms at low levels of education, but the gap
narrows as education increases. At some point, a worker in an SOE earns more than a
worker in a private firm with the same level of education (this point is easily found
from the estimated equation).
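As a sketch of how that point can be found: holding other factors fixed, the SOE-private gap in predicted log wage is the SOE coefficient plus the edu*SOE coefficient times edu, and the crossover is the education level at which this gap equals zero. The coefficient values below are hypothetical placeholders and are not the estimates from this report; the function name `crossover_education` is likewise an assumption.

```python
def crossover_education(soe_coef: float, edu_soe_coef: float) -> float:
    """Education level at which predicted log wages in SOEs and private firms are equal.

    The sector gap is soe_coef + edu_soe_coef * edu; setting it to zero and
    solving for edu gives -soe_coef / edu_soe_coef.
    """
    if edu_soe_coef == 0:
        raise ValueError("the gap is constant when the interaction coefficient is zero")
    return -soe_coef / edu_soe_coef

# Hypothetical coefficients for illustration only
soe_coef_example = -0.3       # lower intercept for SOE workers
edu_soe_coef_example = 0.02   # steeper education slope for SOE workers
edu_star = crossover_education(soe_coef_example, edu_soe_coef_example)
```

With a negative SOE coefficient and a positive interaction coefficient, SOE workers earn less below this education level and more above it.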
The findings suggest that age, marital status, education level, occupation and work
experience are the main determinants of the wage gap between the two regions for both male
and female workers. As individuals grow older, their income tends to increase, possibly due
to accumulated work experience and career advancements. Marital status also affects income,
with married individuals often benefiting from dual incomes or shared financial resources.
From the research results, we propose some recommendations for reducing the wage gap
between state-owned enterprises and private enterprises by gender in Vietnam.
First, improving the education level of employees, especially to university degree or
higher, is one of the core factors in reducing the wage gap between state-owned
enterprises and private enterprises. Therefore, policies are needed to improve the
education level of workers. Better qualifications will help employees improve
productivity and increase profits for businesses, companies, and production and business
households, which will be the basis for ensuring that both sides (employers and
employees) benefit.
Second, the research results show that the wage gap between state-owned enterprises
and private enterprises is larger for women than for men. Therefore, there is a need for
programs and policies that give women in the private sector more opportunities to
participate in higher-paying jobs, thereby stimulating the thinking and creativity of the
female workforce and contributing to economic development.
In many countries around the world, minimum wages are expressed by the hour and
by the month. Currently, the National Wage Council in Vietnam only regulates the minimum
monthly salary. Therefore, it is necessary to calculate and regulate the minimum hourly wage
in Vietnam to cover all workers (ILO Vietnam Office, 2018).
References
4. Vương Đình Huệ (2018). Cải cách chính sách tiền lương để nâng cao đời sống cho
CBCCVC, LLVT và người lao động trong doanh nghiệp [Reforming wage policy to improve
living standards for cadres, civil servants, public employees, the armed forces and
enterprise workers]. baochinhphu.vn