Assignment Case Analysis-1
Case Analysis
Case 1
THE DETERMINANTS OF HOURLY WAGES
The Current Population Survey (CPS), undertaken by the U.S. Census Bureau, periodically
conducts surveys on a variety of topics. We look at a cross-section of 1289 persons
interviewed in March 2020 to study the factors that determine hourly wage (in dollars) in this
sample. Keep in mind that these 1289 observations are a sample from a much bigger population.
The variables used in the analysis are defined as follows:
Wage: Hourly wage in dollars, which is the dependent variable.
The Explanatory variables
Female: Gender, coded 1 for female, 0 for male
Nonwhite: Race, coded 1 for nonwhite workers, 0 for white workers
Union: Union status, coded 1 if in a union job, 0 otherwise
Education: Education (in years)
Exper: Potential work experience (in years), defined as age minus years of schooling minus
6 (it is assumed that schooling starts at age 6).
Equation
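\[ \mathit{wage} = \beta_0 + \beta_1\,\mathit{female} + \beta_2\,\mathit{nonwhite} + \beta_3\,\mathit{union} + \beta_4\,\mathit{edu} + \beta_5\,\mathit{exper} + \varepsilon \]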
===============================================
                        Dependent variable:
                    ---------------------------
                               WAGE
-----------------------------------------------
FEMALE                       -3.075***
                              (0.365)
NONWHITE                     -1.565***
                              (0.509)
UNION                         1.096**
                              (0.506)
EDUCATION                     1.370***
                              (0.066)
EXPER                         0.166***
                              (0.016)
Constant                     -7.183***
                              (1.016)
-----------------------------------------------
Observations                   1289
R2                             0.323
Adjusted R2                    0.321
Residual Std. Error            6.508
F Statistic                  122.615***
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
b. What is the overall significance of the regression? Which test do you use?
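The overall significance of the regression is tested using the F-test, whose null hypothesis is
that all slope coefficients are jointly equal to zero. In this case, the F-statistic is 122.615
with a p-value less than 0.01, so we reject the null and conclude that the regression model is
statistically significant overall.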
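As a rough cross-check, the F-statistic can be recovered from the R2, the sample size, and the number of slope coefficients reported in the table. The sketch below assumes the standard formula F = (R2 / k) / ((1 - R2) / (n - k - 1)); the variable names are illustrative, and the small gap from the reported 122.615 is what one would expect from R2 being rounded to three decimals.

```python
# Cross-check of the overall F-statistic from the reported R2.
# n, k and r_squared are taken from the regression table; the names are illustrative.
n = 1289           # observations
k = 5              # slope coefficients (FEMALE, NONWHITE, UNION, EDUCATION, EXPER)
r_squared = 0.323  # rounded to three decimals in the table

f_stat = (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))
print(f"F({k}, {n - k - 1}) = {f_stat:.2f}")
```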
An estimated coefficient is individually significant if the p-value of its t-test is below the
chosen significance level (e.g., 0.05). In the table, FEMALE, NONWHITE, EDUCATION, and EXPER are
significant at the 1 percent level and UNION at the 5 percent level, so all five coefficients are
individually statistically significant at the 5 percent level.
data: reg01
BP = 16.0011, df = 5, p-value = 0.0251
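FEMALE (-3.075): Holding the other variables constant, being female is associated with an hourly
wage that is, on average, $3.075 lower than for males.
NONWHITE (-1.565): Holding the other variables constant, being nonwhite is associated with an
hourly wage that is, on average, $1.565 lower than for white workers.
UNION (1.096): Holding the other variables constant, holding a union job is associated with an
hourly wage that is, on average, $1.096 higher than for non-union workers.
EDUCATION (1.370): For each additional year of education, there is, on average, a $1.370
increase in hourly wage, holding the other variables constant.
EXPER (0.166): For each additional year of potential work experience, there is, on average, a
$0.166 increase in hourly wage, holding the other variables constant.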
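To make these interpretations concrete, the reported coefficients can be plugged into the fitted equation. This is a minimal sketch: the `predict_wage` helper and the example worker (12 years of education, 10 years of experience) are hypothetical and chosen only for illustration.

```python
# Fitted wage equation using the coefficients reported in the table.
COEF = {
    "const": -7.183,
    "female": -3.075,
    "nonwhite": -1.565,
    "union": 1.096,
    "education": 1.370,
    "exper": 0.166,
}

def predict_wage(female, nonwhite, union, education, exper):
    """Predicted hourly wage in dollars (hypothetical helper)."""
    return (COEF["const"]
            + COEF["female"] * female
            + COEF["nonwhite"] * nonwhite
            + COEF["union"] * union
            + COEF["education"] * education
            + COEF["exper"] * exper)

# Hypothetical white, non-union workers who differ only by gender.
male_wage = predict_wage(female=0, nonwhite=0, union=0, education=12, exper=10)
female_wage = predict_wage(female=1, nonwhite=0, union=0, education=12, exper=10)
print(f"male: {male_wage:.3f}, female: {female_wage:.3f}, gap: {female_wage - male_wage:.3f}")
```

The gap between the two predictions equals the FEMALE coefficient, which is exactly what "holding the other variables constant" means.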
data: Model
LM test = 2.0188, df = 3, p-value = 0.5685
The studentized Breusch-Pagan test has the null hypothesis of homoscedasticity. Its p-value of
0.0251 is less than the typical significance level of 0.05, so we reject the null hypothesis and
conclude that heteroscedasticity is present.
The Breusch-Godfrey test has the null hypothesis of no serial correlation (here up to order 3).
Its p-value of 0.5685 is greater than 0.05, so there is no strong evidence of serial correlation.
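The Breusch-Godfrey p-value can be checked by hand, since the LM statistic is compared against a chi-square distribution with 3 degrees of freedom. The sketch below uses the closed-form upper-tail probability that holds for 3 degrees of freedom only; `chi2_sf_df3` is a hypothetical helper name, and only the standard library is used.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

lm_stat = 2.0188  # Breusch-Godfrey LM statistic reported above
p_value = chi2_sf_df3(lm_stat)
print(f"p-value = {p_value:.4f}")

# Decision at the 5 percent level: reject the null of no serial correlation only if p < 0.05.
reject_no_serial_correlation = p_value < 0.05
```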
Case 2
Define fixed effects and random effects. Discuss the hypotheses of the Hausman test.
[All variables (dependent and explanatory) are converted into natural logarithms]
The above equation is estimated using annual country level data from 1980 – 2010 for 14
Latin American countries. The data for all variables are extracted from the World
Development Indicators (World Bank).
Variable descriptions
Pollution: CO2 emission, used as the indicator of pollution emission,
FDI: Foreign Direct Investment inflow
GDP: GDP per capita (constant 2005 US$)
energy: Energy use per capita in kt of oil equivalent
capital: Gross fixed capital formation as a proxy for capital stock
unemp: Unemployment rate, which may affect pollution in either a positive or a negative
direction.
hc: Human capital
============================================
                  Dependent variable:
             -------------------------------
                       pollution
               (Fixed)            (Random)
--------------------------------------------
FDI            0.025**            -0.036***
               (2.21)             (3.28)
hc             0.07               0.142**
               (0.92)             (2.05)
--------------------------------------------
Observations   434                434
R2             0.616              0.749
Hausman test FE vs RE: chisq = 11.93, p-value = 0.154
F Statistic    5.457*** (df = 5; 92)   27.143***
============================================
Note: *p<0.1; **p<0.05; ***p<0.01; the values in parentheses are t-statistics
The fixed effects estimator (also known as the within estimator) is an estimator for the
coefficients in panel data analysis. It uses within variation by working with time-demeaned
variables: the time-demeaned model no longer contains the individual-specific effect and can be
estimated by OLS. If we assume fixed effects, we impose a time-invariant effect for each
individual.
Such models help control for unobserved heterogeneity when this heterogeneity is constant over
time: characteristics such as ethnicity or the year and location of birth do not change over time,
so a fixed effects model can control for them. This constant heterogeneity is the fixed effect for
the individual. It can be removed from the data, for example by subtracting each individual's
mean from each of that individual's observations before estimating the model.
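The demeaning step can be illustrated on a tiny made-up panel. In the sketch below, all data are hypothetical: each unit has its own intercept, the intercepts are deliberately correlated with x, and the true slope is 2.0 with no noise. Pooled OLS, which ignores the individual effects, overstates the slope, while OLS on the demeaned data recovers it.

```python
from statistics import mean

# Hypothetical panel: y = alpha_unit + 2.0 * x, with alpha_unit correlated with x.
alpha = {"A": 1.0, "B": 5.0, "C": 9.0}
x_vals = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
panel = [(u, x, alpha[u] + 2.0 * x) for u in alpha for x in x_vals[u]]

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Pooled OLS ignores the individual effects.
pooled_slope = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])

# Within transformation: subtract each unit's own mean from its observations.
mean_x = {u: mean(x for uu, x, _ in panel if uu == u) for u in alpha}
mean_y = {u: mean(y for uu, _, y in panel if uu == u) for u in alpha}
x_dm = [x - mean_x[u] for u, x, _ in panel]
y_dm = [y - mean_y[u] for u, _, y in panel]
within_slope = ols_slope(x_dm, y_dm)

print(f"pooled slope: {pooled_slope:.2f}, within (FE) slope: {within_slope:.2f}")
```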
A random effects estimator makes the additional assumption that the individual effects are
randomly distributed and uncorrelated with the explanatory variables. It is thus not the opposite
of a fixed effects model, but a special case. If the random effects assumption holds, the random
effects estimator is more efficient than the fixed effects estimator. However, if this additional
assumption does not hold (i.e., if the Hausman test rejects its null hypothesis), the random
effects estimator is not consistent.
The Hausman test is used to decide whether to use the fixed effects (FE) or random effects (RE)
estimator.
H0: There is no correlation between the individual-specific effects and the independent
variables; the FE and RE coefficients are not significantly different from each other.
H1: There is correlation between the individual-specific effects and the independent variables;
the FE and RE coefficients are significantly different from each other.
If the Hausman test statistic w is not significantly different from zero, then both the FE and RE
estimators are consistent, and the RE estimator should be used because it is more efficient.
If the Hausman test statistic w is significantly different from zero, then only the FE estimator
is consistent and should be used.
In other words, the Hausman test evaluates the consistency of the RE estimator against a less
efficient FE estimator that is known to be consistent.
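For reference, the statistic w is usually written as a quadratic form in the difference between the two sets of estimates. The notation below is an assumption of this sketch rather than the assignment's own: \hat\beta_{FE} and \hat\beta_{RE} are the coefficient vectors on the regressors common to both models, \widehat{V} denotes an estimated covariance matrix, and k is the number of coefficients being compared.

```latex
w = (\hat\beta_{FE} - \hat\beta_{RE})'
    \left[ \widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE}) \right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\sim\; \chi^2_k \quad \text{under } H_0
```

A large w means the two estimators disagree by more than sampling error can explain, which is evidence against the RE assumption.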
In practice, the individual-specific effects are often correlated with the independent
variables, in which case the FE estimator is the more appropriate choice.
a. Interpret the coefficients on FDI, energy, and GDP [based on the Hausman test] and
comment on their significance.
Because all variables are in natural logarithms, the coefficients are elasticities.
FDI: A 1% increase in FDI is associated with a 0.036% decrease in pollution, holding the other
variables constant. It is statistically significant at the 0.01 level.
Energy: A 1% increase in energy use is associated with a 0.766% increase in pollution, holding
the other variables constant. It is statistically significant at the 0.01 level.
GDP: A 1% increase in GDP per capita is associated with a 3.352% increase in pollution, holding
the other variables constant. It is statistically significant at the 0.01 level.
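b. Interpret the coefficients on capital, unemp, and hc [based on the Hausman test] and
comment on their significance.
Because all variables are in natural logarithms, the coefficients are elasticities.
Capital: A 1% increase in capital is associated with a 0.026% increase in pollution, holding the
other variables constant. It is not statistically significant.
Unemp: A 1% increase in the unemployment rate is associated with a 0.117% decrease in pollution,
holding the other variables constant. It is statistically significant at the 0.01 level.
Hc: A 1% increase in human capital is associated with a 0.142% increase in pollution, holding
the other variables constant. It is statistically significant at the 0.05 level.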
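Since every variable enters in natural logarithms, each coefficient acts as an elasticity, and the "1% leads to b%" reading is a small-change approximation. The sketch below, with a hypothetical helper `pct_change_in_y`, computes the exact implied percentage change for a given percentage change in a regressor, using the FDI and energy coefficients quoted in the answers.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percentage change in y implied by a log-log coefficient (hypothetical helper)."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0

# Coefficients as quoted in the answers (random effects model).
fdi_elasticity = -0.036
energy_elasticity = 0.766

fdi_effect_10pct = pct_change_in_y(fdi_elasticity, 10.0)      # 10% more FDI
energy_effect_1pct = pct_change_in_y(energy_elasticity, 1.0)  # 1% more energy use

print(f"10% rise in FDI    -> {fdi_effect_10pct:.3f}% change in pollution")
print(f"1% rise in energy  -> {energy_effect_1pct:.3f}% change in pollution")
```

For a 1% change the exact figure is nearly identical to the coefficient itself, which is why the elasticity reading is used.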
c. Based on the Hausman test, which model would you opt for: fixed effects or random
effects? Please provide your comments.
The p-value for the Hausman test is 0.154, which is greater than the commonly used
significance level of 0.05. We therefore fail to reject the null hypothesis that the
individual-specific effects are uncorrelated with the explanatory variables, so the random
effects model is preferred. Accordingly, we interpret the coefficients based on the
random effects model.
Case 3
What are the logit model and the probit model? Why are the logit and probit models considered
the best models for a binary dependent variable?
The dependent variable, smoker, is a nominal variable that takes a value of 1 (for
smoker) and 0 (for nonsmoker). Suppose we routinely apply the logit model to determine
smoking behavior in relation to age (AGE), education (EDUC), family income (INCOME), and
price of cigarettes (PCIGS79).
Model:
\[ y_i = \beta_0 + \beta_1\,\mathit{Age}_i + \beta_2\,\mathit{Educ}_i + \beta_3\,\mathit{Income}_i + \beta_4\,\mathit{Pcigs}_i + \varepsilon_i \]
=============================================
                      Dependent variable:
                  ---------------------------
                            SMOKER
---------------------------------------------
AGE                        -0.021***
                            (0.004)
EDUC                       -0.091***
                            (0.021)
INCOME                      4.72E-06
                           (7.27E-06)
PCIGS79                    -0.022*
                            (0.012)
Constant                    2.745***
                            (0.822)
---------------------------------------------
Observations                 1196
Log Likelihood             -770.841
Akaike Inf. Crit.            1.297
=============================================
Note:             *p<0.1; **p<0.05; ***p<0.01
a. Are the coefficients on age, education, income, and PCIGS79 statistically significant?
b. Interpret the coefficients of the results.
The logit and probit models are statistical models used for modeling binary dependent variables,
particularly in the context of binary choice situations. These models estimate the probability of
an event occurring (e.g., smoking) given a set of explanatory variables.
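Both logit and probit models transform a linear combination of explanatory variables into a
probability through a nonlinear function. The logit model uses the logistic function, and the
probit model uses the cumulative distribution function of the standard normal distribution.
Because both functions map any real value into the interval (0, 1), the predicted probabilities
always lie between 0 and 1, which a linear probability model cannot guarantee. This is the main
reason they are preferred for a binary dependent variable.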
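The two link functions are easy to write down directly. The sketch below uses only the standard library; `logistic_cdf` and `normal_cdf` are illustrative names, and the index values fed in are arbitrary.

```python
import math

def logistic_cdf(z):
    """Logit link: maps a linear index z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Probit link: standard normal CDF evaluated at the linear index z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Arbitrary index values, only to show that both outputs stay between 0 and 1.
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z = {z:+.1f}  logit: {logistic_cdf(z):.4f}  probit: {normal_cdf(z):.4f}")
```

Both functions are S-shaped and equal 0.5 at z = 0; the logistic curve has somewhat heavier tails, which is why logit and probit coefficients are on different scales even when the fitted probabilities are similar.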
a. Statistical Significance:
AGE: The coefficient is statistically significant at the 0.01 level.
EDUC: The coefficient is statistically significant at the 0.01 level.
INCOME: The coefficient is not statistically significant; its standard error (7.27E-06) is
larger than the coefficient itself (4.72E-06), and it carries no significance stars in the table.
PCIGS79: The coefficient is statistically significant only at the 0.1 level.
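The significance stars in the table can be cross-checked by dividing each coefficient by its standard error and comparing the absolute z-statistic with the usual two-sided normal critical values (1.645, 1.960, 2.576). This is a minimal sketch; the `significance_stars` helper is hypothetical and mirrors the note under the table.

```python
# (coefficient, standard error) pairs as reported in the logit table.
results = {
    "AGE": (-0.021, 0.004),
    "EDUC": (-0.091, 0.021),
    "INCOME": (4.72e-06, 7.27e-06),
    "PCIGS79": (-0.022, 0.012),
    "Constant": (2.745, 0.822),
}

def significance_stars(z):
    """Stars for a two-sided test using standard normal critical values."""
    a = abs(z)
    if a >= 2.576:
        return "***"  # p < 0.01
    if a >= 1.960:
        return "**"   # p < 0.05
    if a >= 1.645:
        return "*"    # p < 0.1
    return ""         # not significant at the 10 percent level

z_stats = {name: coef / se for name, (coef, se) in results.items()}
for name, z in z_stats.items():
    print(f"{name:<9} z = {z:7.3f} {significance_stars(z)}")
```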
b. Interpretation of Coefficients:
AGE: For a one-year increase in age, the log-odds of being a smoker decrease by 0.021, holding
the other variables constant. This effect is statistically significant at the 0.01 level.
EDUC: For a one-year increase in education, the log-odds of being a smoker decrease by 0.091,
holding the other variables constant. This effect is statistically significant at the 0.01 level.
INCOME: For a one-unit increase in family income, the log-odds of being a smoker increase by
4.72E-06, holding the other variables constant. This effect is not statistically significant.
PCIGS79: For a one-unit increase in the price of cigarettes, the log-odds of being a smoker
decrease by 0.022, holding the other variables constant. This effect is statistically
significant only at the 0.1 level.
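Log-odds are hard to read directly, so logit coefficients are often exponentiated into odds ratios. The sketch below applies exp(b) to the reported coefficients; the dictionary and variable names are illustrative.

```python
import math

# Logit coefficients as reported in the table.
coefficients = {
    "AGE": -0.021,
    "EDUC": -0.091,
    "INCOME": 4.72e-06,
    "PCIGS79": -0.022,
}

# exp(b) is the multiplicative change in the odds of smoking for a one-unit increase.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    pct = (ratio - 1.0) * 100.0
    print(f"{name:<8} odds ratio = {ratio:.4f}  ({pct:+.2f}% change in odds per unit)")
```

An odds ratio below 1 means the odds of being a smoker fall as the variable rises; for EDUC, each additional year of education lowers the odds by roughly 9 percent.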
Case 4
By taking logs of the variables, how does the interpretation of the coefficients change?
Here is a study of the relationship between the research and development (R&D) expenditure
of a firm, its size (often measured by annual sales), and its profit margin, for a sample of 32
firms in the chemical industry. rd = research and development (R&D) expenditure and
sales are transformed into logarithmic form; profmarg = profit margin as a percentage of
sales. Standard errors appear in parentheses below the
estimated coefficients.
profmarg                      0.022*
                              (0.013)
Constant                     -4.378***
                              (0.468)
-----------------------------------------------
Observations                    32
R2                             0.918
Adjusted R2                    0.912
Residual Std. Error       0.514 (df = 29)
F Statistic           162.231*** (df = 2; 29)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
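When the dependent variable is in logs and a regressor is in levels, the coefficient is read as an approximate percentage change rather than a change in units. The sketch below assumes, as the description above suggests, that the dependent variable is log(rd) and that profmarg enters in percentage points; the variable names are illustrative.

```python
import math

# Coefficient on profmarg from the table; the dependent variable is assumed to be log(rd).
b_profmarg = 0.022

# Log-level reading: a one-point rise in profit margin changes rd by about 100 * b percent.
approx_pct_change = 100.0 * b_profmarg

# Exact percentage change implied by the same coefficient.
exact_pct_change = 100.0 * (math.exp(b_profmarg) - 1.0)

print(f"approximate: {approx_pct_change:.2f}%  exact: {exact_pct_change:.4f}%")
```

For a coefficient this small the two figures nearly coincide; the gap widens as the coefficient grows. A regressor that is itself in logs would instead have its coefficient read as an elasticity.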