
Assignment

Case Analysis
Case 1
THE DETERMINANTS OF HOURLY WAGES
The Current Population Survey (CPS), undertaken by the U.S. Census Bureau, periodically
conducts surveys on a variety of topics. We look at a cross-section of 1289 persons
interviewed in March 2020 to study the factors that determine hourly wage (in dollars) in this
sample. Keep in mind that these 1289 observations are a sample from a much bigger population.
The variables used in the analysis are defined as follows:
Wage: Hourly wage in dollars, which is the dependent variable.
The Explanatory variables
Female: Gender, coded 1 for female, 0 for male
Nonwhite: Race, coded 1 for nonwhite workers, 0 for white workers
Union: Union status, coded 1 if in a union job, 0 otherwise
Education: Education (in years)
Exper: Potential work experience (in years), defined as age minus years of schooling minus
6 (it is assumed that schooling starts at age 6)
Equation
$wage = \beta_0 + \beta_1\,female + \beta_2\,nonwhite + \beta_3\,union + \beta_4\,edu + \beta_5\,exper + \varepsilon$
===============================================
Dependent variable:
---------------------------
WAGE
-----------------------------------------------
FEMALE -3.075***
(0.365)

NONWHITE -1.565***
(0.509)

UNION 1.096**
(0.506)

EDUCATION 1.370***
(0.066)
EXPER 0.166***
(0.016)
Constant -7.183***
(1.016)
-----------------------------------------------
Observations 1289
R2 0.323
Adjusted R2 0.321
Residual Std. Error 6.508
F Statistic 122.615***
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01

a. What proportion of the total variation in wage is explained by the regression?


The R-squared value (0.323) indicates that approximately 32.3% of the total variation in wage is
explained by the regression model.

b. What is the overall significance of the regression? Which test do you use?

The overall significance of the regression is tested using the F-test, whose null hypothesis is
that all slope coefficients are jointly zero. In this case, the F-statistic is 122.615 with a
p-value less than 0.01, so we reject the null hypothesis: the regression model is statistically
significant overall.
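
As a rough cross-check, the F-statistic can be recovered from R2, the sample size, and the number of regressors. The sketch below uses the rounded R2 of 0.323 from the table, so it should land near the reported 122.615 (about 122.4) rather than match it exactly. The function name is illustrative only.

```python
def f_from_r2(r2: float, n_obs: int, n_regressors: int) -> float:
    """Overall F-statistic implied by R2 for a regression with an intercept."""
    explained_per_regressor = r2 / n_regressors
    unexplained_per_df = (1 - r2) / (n_obs - n_regressors - 1)
    return explained_per_regressor / unexplained_per_df

# Values taken from the Case 1 output (R2 is rounded to three decimals).
f_stat = f_from_r2(r2=0.323, n_obs=1289, n_regressors=5)
print(round(f_stat, 2))
```

The small gap from the reported value is what one would expect from rounding R2 before plugging it in.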

c. Are the estimated coefficients individually significant? Which of the coefficients are
individually statistically significant at the 5 percent level? [calculate t-value]

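A coefficient is individually significant at the 5 percent level if its p-value is less than
0.05 or, equivalently, if the absolute value of its t-value (coefficient divided by standard
error) exceeds the critical value of about 1.96. Using the rounded figures in the table, the
t-values are approximately: FEMALE -8.42, NONWHITE -3.07, UNION 2.17, EDUCATION 20.76,
EXPER 10.38, and Constant -7.07. All coefficients (FEMALE, NONWHITE, UNION, EDUCATION,
EXPER) are therefore individually statistically significant at the 5 percent level.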
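
The t-values can be reproduced from the table by dividing each coefficient by its standard error. This is a minimal sketch: the inputs are the rounded values printed in the output, and the 1.96 cutoff assumes the large-sample normal approximation, which is reasonable with 1289 observations.

```python
# (coefficient, standard error) pairs copied from the Case 1 regression output.
estimates = {
    "FEMALE": (-3.075, 0.365),
    "NONWHITE": (-1.565, 0.509),
    "UNION": (1.096, 0.506),
    "EDUCATION": (1.370, 0.066),
    "EXPER": (0.166, 0.016),
}

CRITICAL_5PCT = 1.96  # assumption: two-sided large-sample critical value

t_values = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_5PCT for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name:10s} t = {t:7.2f}  significant at 5%: {significant[name]}")
```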

d. Interpret the regression results.

FEMALE (-3.075): Holding the other variables constant, women earn on average $3.075 less per
hour than men.
NONWHITE (-1.565): Holding the other variables constant, nonwhite workers earn on average
$1.565 less per hour than white workers.
UNION (1.096): Holding the other variables constant, workers in union jobs earn on average
$1.096 more per hour than workers in non-union jobs.
EDUCATION (1.370): Each additional year of education is associated with, on average, a $1.370
increase in hourly wage.
EXPER (0.166): Each additional year of potential work experience is associated with, on
average, a $0.166 increase in hourly wage.

e. Is heteroscedasticity present or not? Comment

studentized Breusch-Pagan test

data: reg01
BP = 16.0011, df = 5, p-value = 0.0251

The null hypothesis of the studentized Breusch-Pagan test is homoscedasticity (constant error
variance). The p-value of 0.0251 is less than the typical significance level of 0.05, so we
reject the null hypothesis: the test suggests that heteroscedasticity is present.

f. Is there strong evidence of serial correlation? Comment.

Breusch-Godfrey test for serial correlation of order up to 3

data: Model
LM test = 2.0188, df = 3, p-value = 0.5685

The null hypothesis of the Breusch-Godfrey test is no serial correlation up to order 3. Since
the p-value (0.5685) is greater than 0.05, we fail to reject the null hypothesis: there is no
strong evidence of serial correlation.
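
The Breusch-Godfrey p-value can be checked by hand, since the LM statistic is compared with a chi-square distribution with 3 degrees of freedom. The sketch below uses a closed-form upper-tail formula that is valid only for 3 degrees of freedom. The helper name is illustrative, not part of any library.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    # Closed form for df = 3 only; other degrees of freedom need a different formula.
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

lm_stat = 2.0188            # LM statistic from the Breusch-Godfrey output
p_value = chi2_sf_df3(lm_stat)
reject_at_5pct = p_value < 0.05

print(round(p_value, 4), reject_at_5pct)
```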

Case 2

Define the fixed effects and random effects models. Discuss the hypotheses of the Hausman test.

$pollution_{it} = \beta_0 + \beta_1 FDI_{it} + \beta_2 GDP_{it} + \beta_3 energy_{it} + \beta_4 capital_{it} + \beta_5 unemp_{it} + \beta_6 hc_{it} + \eta_i + \gamma_t + \varepsilon_{it}$

[All variables (dependent and explanatory) are converted into natural logarithms]

The above equation is estimated using annual country level data from 1980 – 2010 for 14
Latin American countries. The data for all variables are extracted from the World
Development Indicators (World Bank).

Variable descriptions
Pollution: CO2 emissions, used as the indicator of pollution
FDI: Foreign Direct Investment inflow
GDP: GDP per capita (constant 2005 US$)
energy: Energy use per capita in kt of oil equivalent
capital: Gross fixed capital formation as a proxy for capital stock
unemp: Unemployment rate, which may affect pollution in either a positive or a negative
direction
hc: Human capital
============================================
Dependent variable:
-------------------------------
pollution
(FIXED) (Random)
--------------------------------------------
FDI 0.025** -0.036***
(2.21) (3.28)

GDP 3.254*** 3.352***
(2.28) (6.68)

energy 0.753*** 0.766***
(13.39) (13.83)

capital 0.065** 0.026
(1.91) (1.16)

unemp -0.112*** -0.117***
(-5.15) (-5.73)

hc 0.07 0.142**
(0.92) (2.05)

Constant -20.241*** -20.102***
(-8.77) (9.30)

--------------------------------------------
Observations 434 434
R2 0.616 0.749
Hausman test FE Vs RE chisq = 11.93, p-value = 0.154
F Statistic 5.457*** (df = 5; 92) 27.143***
============================================
Note: *p<0.1; **p<0.05; ***p<0.01; values in parentheses are t-statistics

The fixed effects estimator (also known as the within estimator) is an estimator for the
coefficients in panel data analysis. It uses only within-individual variation by working with
time-demeaned variables: each individual's mean over time is subtracted from each of that
individual's observations. The time-demeaned model no longer contains the individual-specific
effect and can be estimated by OLS.

If we assume fixed effects, we impose a time-invariant effect for each individual. Such models
control for unobserved heterogeneity when this heterogeneity is constant over time, typically
characteristics such as ethnicity or the year and place of birth. This constant heterogeneity
is the fixed effect for the individual.
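
The demeaning step can be illustrated on a tiny made-up panel. The data below are hypothetical (not from the assignment) and are built so that y equals an individual effect plus 2 times x. A minimal sketch with a single regressor:

```python
from statistics import mean

# Hypothetical toy panel: (unit, x, y) with y = unit effect + 2 * x.
panel = [
    ("A", 1.0, 12.0), ("A", 2.0, 14.0), ("A", 3.0, 16.0),  # unit effect 10
    ("B", 2.0, -1.0), ("B", 4.0, 3.0), ("B", 6.0, 7.0),    # unit effect -5
]

def ols_slope(xs, ys):
    """Slope of a one-regressor OLS fit with an intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_slope(rows):
    """Fixed effects (within) slope: demean x and y by unit, then run OLS."""
    units = {u for u, _, _ in rows}
    x_bar = {u: mean(x for v, x, _ in rows if v == u) for u in units}
    y_bar = {u: mean(y for v, _, y in rows if v == u) for u in units}
    x_dm = [x - x_bar[u] for u, x, _ in rows]
    y_dm = [y - y_bar[u] for u, _, y in rows]
    return ols_slope(x_dm, y_dm)

pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
within = within_slope(panel)
print(pooled, within)
```

In this toy example the unit effects are correlated with x, so pooled OLS gets even the sign of the slope wrong, while the within estimator recovers the slope used to build the data.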
A random effects estimator makes the additional assumption that the individual effects are
random draws that are uncorrelated with the explanatory variables. It is thus not the opposite
of a fixed effects model, but a special case. If the random effects assumption holds, the
random effects estimator is more efficient than the fixed effects estimator. However, if this
additional assumption does not hold (i.e., if the Hausman test rejects the null hypothesis),
the random effects estimator is not consistent.
The Hausman test is used to decide whether to use the fixed effects (FE) or the random effects
(RE) estimator.
H0: no correlation between the individual-specific effects and the independent variables; the
FE and RE coefficients are not significantly different from each other.
H1: correlation between the individual-specific effects and the independent variables; the FE
and RE coefficients are significantly different from each other.
If the Hausman test statistic w is not significantly different from zero, then both the FE and
RE estimators are consistent. The RE estimator should be used because it is more efficient.

If the Hausman test statistic w is significantly different from zero, then only the FE
estimator is consistent and should be used.

The Hausman test evaluates the consistency of the RE estimator against the less efficient FE
estimator, which is known to be consistent.

In many applications the individual-specific effects are correlated with the independent
variables, in which case the FE estimator is more appropriate.
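
For a single coefficient, the Hausman statistic reduces to the squared difference between the FE and RE estimates divided by the difference of their variances, compared with a chi-square distribution with 1 degree of freedom. The sketch below uses hypothetical numbers, not the estimates from this case. The full test uses the whole coefficient vector and covariance matrices.

```python
import math

def hausman_scalar(b_fe: float, se_fe: float, b_re: float, se_re: float):
    """One-coefficient Hausman statistic and its chi-square(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("FE variance must exceed RE variance for this formula")
    h = (b_fe - b_re) ** 2 / var_diff
    p = math.erfc(math.sqrt(h / 2))  # upper tail of chi-square with 1 df
    return h, p

# Hypothetical inputs chosen for illustration only.
h_stat, p_val = hausman_scalar(b_fe=0.50, se_fe=0.05, b_re=0.45, se_re=0.04)
preferred = "FE" if p_val < 0.05 else "RE"
print(round(h_stat, 3), round(p_val, 3), preferred)
```

With these inputs the p-value is above 0.05, so the null is not rejected and RE would be retained, which mirrors the decision rule stated above.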

a. Interpret the coefficient on FDI, energy, and GDP [based on Hausman test] and
comment its significance.

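Because all variables are in natural logs, the coefficients are elasticities. The random
effects estimates are used, as suggested by the Hausman test.

FDI: A 1% increase in FDI is associated with a 0.036% decrease in pollution, holding the other
variables constant. It is statistically significant at the 0.01 level.

Energy: A 1% increase in energy use is associated with a 0.766% increase in pollution, holding
the other variables constant. It is statistically significant at the 0.01 level.

GDP: A 1% increase in GDP per capita is associated with a 3.352% increase in pollution,
holding the other variables constant. It is statistically significant at the 0.01 level.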

b. Interpret the coefficient on capital, unemp, and hc [based on Hausman test] and
comment its significance.

Capital: In the log-log random effects model, a 1% increase in capital is associated with a
0.026% increase in pollution. It is not statistically significant.
Unemp: A 1% increase in the unemployment rate is associated with a 0.117% decrease in
pollution. It is statistically significant at the 0.01 level.
Hc: A 1% increase in human capital is associated with a 0.142% increase in pollution. It is
statistically significant at the 0.05 level.
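
The "1% change" reading of a log-log coefficient is an approximation that holds for small changes. For larger changes, the implied percentage change in pollution can be computed exactly. The sketch below uses the random effects coefficients on energy and FDI. The 10% change in the regressor is an arbitrary illustration.

```python
def pct_change_log_log(elasticity: float, pct_change_x: float) -> float:
    """Exact % change in y implied by a log-log coefficient for a given % change in x."""
    return ((1 + pct_change_x / 100) ** elasticity - 1) * 100

# Random effects coefficients from the Case 2 table; the 10% change is hypothetical.
energy_effect = pct_change_log_log(elasticity=0.766, pct_change_x=10)
fdi_effect = pct_change_log_log(elasticity=-0.036, pct_change_x=10)

print(round(energy_effect, 2), round(fdi_effect, 2))
```

The approximation (elasticity times the percentage change) would give 7.66% and -0.36%, and the exact figures are slightly smaller in magnitude.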

c. Based on the Hausman test, which model would you opt for: fixed effects or random
effects? Please provide your comments.
The p-value for the Hausman test is 0.154, which is greater than the commonly used
significance level of 0.05. We therefore fail to reject the null hypothesis that the
individual-specific effects are uncorrelated with the independent variables, so the random
effects model is preferred. Therefore, we interpret the coefficients based on the
random effects model.

Case 3
What are the logit model and the probit model? Why are the logit and probit models considered
the best models for a binary dependent variable?

The dependent variable, smoker, is a binary variable: it takes a value of 1 (for smoker) and
0 (for nonsmoker). Suppose we apply the logit model to explain smoking behavior in relation
to age (AGE), education (EDUC), family income (INCOME), and price of cigarettes (PCIGS79).
Model:
$y_i = \beta_0 + \beta_1 Age_i + \beta_2 Educ_i + \beta_3 Income_i + \beta_4 Pcigs_i + \varepsilon$

============================================
Dependent variable:
---------------------------
SMOKER
---------------------------------------------
AGE -0.021***
(0.004)

EDUC -0.091***
(0.021)

INCOME 4.72E-06
(7.27E-06)

PCIGS79 -0.022*
(0.012)

Constant 2.745***
(0.822)
---------------------------------------------
Observations 1196
Log Likelihood -770.841
Akaike Inf. Crit. 1.297
Note: *p<0.1; **p<0.05; ***p<0.01
a. Are the coefficients on age, education, income, and PCIGS79 statistically significant?
b. Interpret the coefficients of the results.
The logit and probit models are statistical models used for modeling binary dependent variables,
particularly in the context of binary choice situations. These models estimate the probability of
an event occurring (e.g., smoking) given a set of explanatory variables.
Both logit and probit models transform a linear combination of explanatory variables into a
probability through a nonlinear function. The logit model uses the logistic function, and the
probit model uses the cumulative distribution function of the standard normal distribution.
Because both functions map any real number into the interval between 0 and 1, the predicted
probabilities always lie between 0 and 1, which is not guaranteed when a linear model is fitted
to a binary dependent variable.
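
The two link functions can be written in a few lines. This is a minimal sketch using only the standard library. The index values fed in are arbitrary and serve only to show that both functions return numbers between 0 and 1.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logit link: probability implied by the linear index z."""
    return 1 / (1 + math.exp(-z))

def normal_cdf(z: float) -> float:
    """Probit link: standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-3.0, 0.0, 3.0):  # arbitrary index values for illustration
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
```

Both curves equal 0.5 at an index of zero and flatten out toward 0 and 1 in the tails. The normal curve approaches its limits faster.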
a. Statistical Significance:
AGE: The coefficient is statistically significant at the 0.01 level.
EDUC: The coefficient is statistically significant at the 0.01 level.
INCOME: The coefficient is not statistically significant (it carries no significance stars,
and its standard error, 7.27E-06, is larger than the coefficient itself, 4.72E-06).
PCIGS79: The coefficient is statistically significant at the 0.1 level only.
b. Interpretation of Coefficients:
AGE: Holding the other variables constant, a one-year increase in age decreases the log-odds
of being a smoker by 0.021. This effect is statistically significant at the 0.01 level.
EDUC: Holding the other variables constant, a one-unit increase in education decreases the
log-odds of being a smoker by 0.091. This effect is statistically significant at the 0.01 level.
INCOME: A one-unit increase in family income increases the log-odds of being a smoker by
4.72E-06. This effect is not statistically significant.
PCIGS79: A one-unit increase in the price of cigarettes decreases the log-odds of being a
smoker by 0.022. This effect is statistically significant at the 0.1 level.
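
Log-odds are often easier to read as odds ratios, obtained by exponentiating each coefficient. The sketch below does this and also forms the z-ratios (coefficient divided by standard error) from the rounded values in the table. The 1.645 and 2.576 cutoffs assume the usual large-sample normal approximation.

```python
import math

# (coefficient, standard error) pairs copied from the Case 3 logit output.
logit_estimates = {
    "AGE": (-0.021, 0.004),
    "EDUC": (-0.091, 0.021),
    "INCOME": (4.72e-06, 7.27e-06),
    "PCIGS79": (-0.022, 0.012),
}

odds_ratios = {name: math.exp(coef) for name, (coef, _) in logit_estimates.items()}
z_values = {name: coef / se for name, (coef, se) in logit_estimates.items()}

for name in logit_estimates:
    print(f"{name:8s} odds ratio = {odds_ratios[name]:.4f}  z = {z_values[name]:6.2f}")
```

An odds ratio below 1 means the odds of smoking fall as the variable rises. For EDUC, for example, each additional unit of education multiplies the odds by roughly 0.91.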
Case 4

By taking logs of the variable, how does the interpretation of coefficients change?

Consider a study of the relationship between the research and development (R&D) expenditure
of a firm, its size (often measured by annual sales), and its profit margin, for a sample of
32 firms in the chemical industry. Here rd = research and development (R&D) expenditure and
sales = annual sales, both transformed into logarithmic form, and profmarg = profit margin
as a percentage of sales. Standard errors appear in parentheses below the estimated
coefficients.

$\log(rd) = \beta_0 + \beta_1 \log(sales) + \beta_2\,profmarg + \varepsilon$ (1)


===============================================
Dependent variable:
---------------------------
log(rd)
-----------------------------------------------
log(sales) 1.084***
(0.060)

profmarg 0.022*
(0.013)

Constant -4.378***
(0.468)

-----------------------------------------------
Observations 32
R2 0.918
Adjusted R2 0.912
Residual Std. Error 0.514 (df = 29)
F Statistic 162.231*** (df = 2; 29)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01

(a) How can you interpret the R2 and adj R2 values?


(b) Interpret the regression results. Which of the coefficients are individually statistically
significant at the 10 percent level?
(c) What is the overall significance of the regression? Which test do you use?

a. Interpretation of R-squared (R2) and Adjusted R-squared (Adj R2):


R2: The R-squared value of 0.918 indicates that approximately 91.8% of the variation in the
dependent variable (log(rd)) is explained by the independent variables in the model.
Adj R2: The adjusted R-squared value of 0.912 considers the number of predictors in the model,
providing a more reliable measure of the model's goodness of fit when compared to R2.
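
The adjustment can be verified directly from R2, the number of observations, and the number of regressors. A minimal sketch, using the rounded R2 from the table. The function name is illustrative.

```python
def adjusted_r2(r2: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R-squared for a regression that includes an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Case 4: R2 = 0.918, 32 firms, 2 regressors (log(sales) and profmarg).
adj_r2 = adjusted_r2(r2=0.918, n_obs=32, n_regressors=2)
print(round(adj_r2, 3))
```

The penalty grows with the number of regressors relative to the sample size, which is why the adjusted value sits slightly below R2 here.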
b. Interpretation of Coefficients:
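log(sales): Both variables are in logs, so the coefficient is an elasticity: a 1% increase in
sales is associated with an increase of about 1.084% in R&D expenditure, holding the profit
margin constant. This effect is statistically significant at the 0.01 level.
profmarg: Because profmarg enters in levels while the dependent variable is in logs, a one
percentage point increase in the profit margin is associated with an increase of about 2.2%
(100 × 0.022) in R&D expenditure, holding sales constant. This effect is statistically
significant at the 0.1 level, but not at the 0.05 level.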
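
When the dependent variable is in logs and the regressor is in levels, multiplying the coefficient by 100 gives an approximate percentage effect. The exact figure uses the exponential. A minimal sketch for the profmarg coefficient. The function name is illustrative.

```python
import math

def pct_change_log_level(coef: float, delta_x: float = 1.0) -> float:
    """Exact % change in y for a change of delta_x in a level regressor when y is logged."""
    return (math.exp(coef * delta_x) - 1) * 100

profmarg_coef = 0.022                      # from the Case 4 output
approx_pct = 100 * profmarg_coef           # usual approximation
exact_pct = pct_change_log_level(profmarg_coef)

print(round(approx_pct, 3), round(exact_pct, 3))
```

For a coefficient this small the two figures are nearly identical. The gap widens as the coefficient or the change in the regressor grows.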

c. Overall Significance of the Regression:


The overall significance is assessed using the F-statistic. In this case, the F-statistic is 162.231
with a p-value less than 0.01, indicating that the regression model is statistically significant.
This means that at least one of the independent variables (log(sales) or profmarg) is significantly
related to the dependent variable (log(rd)).
The degrees of freedom for the F-statistic are 2 and 29, representing the number of restrictions
(coefficients being tested) and the residual degrees of freedom, respectively.
