# RT1 Project 1&2 Assignment - Prashant Kudrya[EA20035]

# Project 1

library(skimr)

library(ggplot2)

library(dplyr)

library(broom)

library(stargazer)

library(wooldridge)

data <- wooldridge::gpa1

model <- lm(colGPA ~ age + soph + junior + senior + male + campus + business +
              engineer + hsGPA + ACT + job20 + drive + bike + voluntr + PC +
              greek + car + siblings + bgfriend + clubs + skipped + alcohol +
              gradMI + fathcoll + mothcoll, data = data)

stargazer(model, type = 'text')

#1. From the model we see that only four factors are significant in affecting college GPA

model1 <- lm(colGPA ~ hsGPA + PC + gradMI + skipped, data = data)

stargazer(model1, type = 'text')

#2. Will owning a computer increase college GPA?

#Yes, owning a computer will increase college GPA. This can be concluded from the
#coefficient on PC, which is significant at the 5% level of significance. Owning a
#computer increases college GPA by about .135 points, holding the other factors fixed

#3. Is it statistically significant? (Hint: control as many variables as you can)

model3 <- lm(colGPA ~ PC, data = data)

stargazer(model3, type = 'text')

#Yes, owning a PC is statistically significant even at the 1% level of significance

#4. Argue whether including mother's and father's college education has any bearing on
#college GPA.

#Adding mothcoll and fathcoll to the regression

#Unrestricted Model

model4 <- lm(colGPA ~ hsGPA + PC + gradMI + skipped + mothcoll + fathcoll, data = data)

summary(model4)

#Restricted Model

model1 <- lm(colGPA ~ hsGPA + PC + gradMI + skipped, data = data)

summary(model1)

#hsGPA coeff changes from 0.458 to 0.457

#PC coeff changes from 0.124 to 0.117

#gradMI coeff changes from 0.172 to 0.185

#skipped coeff is unchanged

#The constant term changes from 1.372 to 1.332

#All four remain significant, while fathcoll and mothcoll are individually insignificant

#Therefore, adding these variables changes the colGPA estimates very little

#Also verifying the same using an F-test of joint significance

#H0: beta5 = beta6 = 0

#H1: at least one of beta5, beta6 is nonzero

library(car)

nullhyp <- c("fathcoll", "mothcoll")

linearHypothesis(model4, nullhyp)

#The F-value is 0.629

#The 10% critical value for F(2, 134) is about 2.33

#Since 0.629 < 2.33, we cannot reject the null hypothesis

#mothcoll and fathcoll are jointly insignificant in the model
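The critical value can be computed directly in R with `qf()` rather than read from an F-table. A small check, assuming the unrestricted model has 134 residual degrees of freedom (gpa1's 141 observations minus 7 estimated parameters):

```r
# 10% critical value for an F test with 2 numerator and 134 denominator df
crit_10 <- qf(0.90, df1 = 2, df2 = 134)
crit_10  # roughly 2.3; the observed F of 0.629 is well below it

# The conclusion above: fail to reject H0 at the 10% level
0.629 < crit_10  # TRUE
```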

#5. Add hsGPA2 to the model that you constructed in (1) and decide whether this
#generalization is needed.

data$gpa2 <- data$hsGPA^2

model5 <- lm(colGPA ~ hsGPA + PC + gradMI + skipped + gpa2, data = data)

summary(model5)

#When hsGPA2 is added to the regression, its coefficient is about .334 and its
#t statistic is about 1.67 (the coefficient on hsGPA is about -1.78). This is a
#borderline case. The quadratic in hsGPA is U-shaped, and it only turns up at about
#hsGPA* = 2.68, which is hard to interpret. The coefficient of main interest, on PC,
#falls to about .116 but is still significant.

#Adding hsGPA2 is a simple robustness check of the main finding.
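The turning point can be reproduced from the reported coefficients. A quick sketch, using the rounded values quoted above rather than re-estimating:

```r
# Turning point of b1*hsGPA + b2*hsGPA^2 is at hsGPA* = -b1 / (2*b2)
b1 <- -1.78   # coefficient on hsGPA (rounded, from the summary above)
b2 <- 0.334   # coefficient on hsGPA^2
turning <- -b1 / (2 * b2)
turning  # about 2.66 with these rounded values, close to the ~2.68 quoted
```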

###----------------------------------------------------------------------------------------------------------------------------

# Project 2

#1. Estimate the model

# Log(wage) = B0 + B1 educ + B2 exper + B3 tenure + B4 married + B5 black +

# B6 south + B7 urban + u

# and report the results in tabular form. Holding other factors fixed, what is

# the approximate difference in monthly salary between blacks and nonblacks?

# Is this difference statistically significant?

data2 <- wooldridge::wage2

summary(data2)


reg1 <- lm(log(wage)~educ+exper+tenure+married+black+south+urban, data2)

stargazer(reg1, type = 'text')

#The coefficient on black implies that, at given levels of the other explanatory variables, black

#men earn about 18.8% less than nonblack men. The t statistic is about –5, and so it is very

#statistically significant.
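Because the dependent variable is log(wage), the coefficient of about -.188 is only an approximate percentage difference; the exact one follows from exponentiating the coefficient. A sketch, with -0.188 copied from the output above:

```r
b_black <- -0.188                       # coefficient on black, from the table above
approx_pct <- 100 * b_black             # approximate difference: -18.8%
exact_pct  <- 100 * (exp(b_black) - 1)  # exact difference: about -17.1%
```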

#2. Add the variables exper2 and tenure2 to the equation and show that they are
#jointly insignificant even at the 20% level.

data2$exper2 <- data2$exper^2

data2$tenure2 <- data2$tenure^2

reg2 <- lm(log(wage)~educ+exper+tenure+married+black+south+urban+exper2+tenure2, data2)

summary(reg2)

#Checking f-statistic for joint significance

#H0: beta8 = beta9 = 0

#H1: at least one of beta8, beta9 is nonzero

nullhypo <- c("exper2", "tenure2")

linearHypothesis(reg2,nullhypo)

# The F statistic for joint significance of exper2 and tenure2,
# with 2 and 925 df, is about 1.49 with p-value ≈ .226. Because the p-value is
# above .20, these quadratics are jointly insignificant at the 20% level
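The p-value can be reproduced from the F statistic with `pf()`, using the values quoted above:

```r
# p-value of the joint F test: F = 1.49 with 2 and 925 df
p_val <- 1 - pf(1.49, df1 = 2, df2 = 925)
p_val  # about 0.226, above 0.20, so exper2 and tenure2 are jointly insignificant
```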

#3. Extend the original model to allow the return to education to depend on race and
#test whether the return to education does depend on race.

#H0: beta(black*educ) = 0 (the return to education does not depend on race)

#H1: beta(black*educ) != 0 (the return to education does depend on race)

reg_int <- lm(log(wage)~educ+exper+tenure+married+black+south+urban+black*educ, data2)

summary(reg_int)

# The coefficient on the interaction is about -.0226. This implies that the return to
# another year of education is about 2.3 percentage points lower for black men than
# for nonblack men. However, its t statistic is only about -1.12, so we fail to reject
# the null hypothesis: there is no statistically significant evidence that the return
# to education depends on race
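The conclusion can be double-checked by converting the t statistic into a two-sided p-value. A sketch, assuming roughly 926 residual degrees of freedom (wage2's 935 observations minus 9 estimated parameters):

```r
# Two-sided p-value for t = -1.12 on the black:educ interaction
p_val <- 2 * pt(-1.12, df = 926)
p_val  # about 0.26, far above conventional significance levels
```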

#4. Again, start with the original model, but now allow wages to differ across four
#groups of people: married and black, married and nonblack, single and black, and
#single and nonblack. What is the estimated wage differential between married blacks
#and married nonblacks?

#Creating dummy variables for the marital-status/race groups

data2$marriedblack<- ifelse(data2$married==1 & data2$black==1, 1,0)

data2$marriednonblack<- ifelse(data2$married==1 & data2$black==0, 1,0)

data2$unmarriedblack<- ifelse(data2$married==0 & data2$black==1, 1,0)

#We do not add the fourth category, single nonblack, as adding all four would lead to
#perfect multicollinearity; single nonblack becomes the base group

#Adding these to the regression equation and removing the columns married and black
#to avoid multicollinearity
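The multicollinearity point can be seen directly: the four group dummies always sum to one, i.e. to the intercept column, so at most three of them can enter a regression with an intercept. A toy illustration with one observation per group:

```r
# (married, black) takes each of the four combinations once
married <- c(1, 1, 0, 0)
black   <- c(1, 0, 1, 0)

marriedblack    <- married * black
marriednonblack <- married * (1 - black)
singleblack     <- (1 - married) * black
singlenonblack  <- (1 - married) * (1 - black)

# The four dummies sum to the constant 1 for every observation,
# which is exactly the intercept column: perfect collinearity
stopifnot(all(marriedblack + marriednonblack + singleblack + singlenonblack == 1))
```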

reg3 <- lm(log(wage)~educ+exper+tenure+south+urban+unmarriedblack+marriedblack+
             marriednonblack, data2)

summary(reg3)
# The coefficient on marriedblack is +0.0094, which means that a married black man
# earns about 0.9% more than a single nonblack man (the base group).

# The coefficient on marriednonblack is +0.189, which means that a married nonblack
# man earns about 18.9% more than a single nonblack man.

#The differential between married blacks and married nonblacks is given by
#the difference of their coefficients: .0094 - .189 = -.1796. That is, a
#married black man earns about 18% less than a married nonblack man
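The 18% figure is the approximate (log) differential; as before, the exact percentage follows from exponentiating. Coefficients are copied from the output quoted above:

```r
b_marriedblack    <- 0.0094  # from the summary above
b_marriednonblack <- 0.189

diff_log  <- b_marriedblack - b_marriednonblack  # -0.1796 in logs
exact_pct <- 100 * (exp(diff_log) - 1)           # about -16.4% exactly
```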
