Homework 1
29-03-2023
Instructions
The homework includes five questions about the linear regression model. Most of the answers need statistical
analyses conducted using R, although some answers can be complemented by theory. The data for all
questions are contained in the package “AER” and can be loaded in R using the commands:
library(AER)
data("CollegeDistance")
Please submit your work no later than the 15th of April 2023 to ensure consideration. To do so, upload the R
script and a short document with the answers and the regression outputs to Moodle or, if this does not work
for you, send them to me at marco.martinez@santannapisa.it. Results will be posted on Moodle no later than
the 22nd of April 2023. The grade ranges from 0 to 17: 15 is the maximum regular grade, and up to 2
additional points can be earned. This grade will be added to your grade for the second homework once you
complete it, for a total of 30 points plus additional points. To pass this homework, you need to score at least 9 points.
• tuition: average University tuition (in 1,000 USD)
• education: number of years of education.
• income: is the family income above USD 25,000 per year? (dummy)
• region: factor indicating U.S. region (dummy: West or other).
Take your time to explore the dataset.
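A first look at the data could be sketched as follows. The column names used here (score, distance, tuition, education, income, region) are assumed to match those shipped with the CollegeDistance data in AER; check them with names(CollegeDistance) before relying on them.

```r
library(AER)
data("CollegeDistance")

# Structure of the data frame: variable types and the first few values
str(CollegeDistance)

# Summary statistics for the numeric variables used in the questions
summary(CollegeDistance[, c("score", "distance", "tuition", "education")])

# Cross-tabulation of the two dummy (factor) variables
table(CollegeDistance$income, CollegeDistance$region)
```

Since income and region are stored as factors rather than 0/1 columns, R will create the corresponding dummy variables automatically when they enter a regression formula.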
Rouse (1995) computed years of education by assigning 12 years to all members of the senior class. Each
additional year of secondary education counted as one year. Students with vocational degrees were assigned
13 years, AA degrees were assigned 14 years, BA degrees were assigned 16 years, those with some graduate
education were assigned 17 years, and those with a graduate degree were assigned 18 years. (Rouse, C.E.
(1995). Democratization or Diversion? The Effect of Community Colleges on Educational Attainment.
Journal of Business & Economic Statistics, 12, 217–224).
ScoreHS_i = δ_0 + δ_1 Distance_i + u_i
where ScoreHS_i is the final-year test score of high-school seniors (think of it as the equivalent of the Italian Maturità,
German Abitur, or English A-levels) and Distance_i is the distance from the closest university (in units of 10 miles).
Run a regression to estimate the coefficients δ_0 and δ_1 and report the results.
• Is the coefficient δ_1 statistically different from zero?
• How can you interpret it?
• Can δ_1 be affected by omitted variable bias? Which omitted variables can you think of and why (please provide
at least two examples)? In which direction do you expect the bias to go?
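The regression above could be estimated along these lines. This is only a sketch: it assumes that the test score and the distance are stored in the columns score and distance of CollegeDistance, and the object name fit_distance is illustrative. The heteroskedasticity-robust test is an optional addition, not something the question requires.

```r
library(AER)  # also attaches lmtest (coeftest) and sandwich (vcovHC)
data("CollegeDistance")

# OLS regression of the test score on distance (measured in units of 10 miles)
fit_distance <- lm(score ~ distance, data = CollegeDistance)

# Coefficients, standard errors, t statistics and p-values
summary(fit_distance)

# Same t test with heteroskedasticity-robust (HC1) standard errors
coeftest(fit_distance, vcov. = vcovHC(fit_distance, type = "HC1"))

# 95% confidence interval for the slope on distance
confint(fit_distance, "distance", level = 0.95)
```

The slope is the estimated change in the test score associated with one additional unit of distance, that is, 10 more miles from the closest university.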
ScoreHS_i = γ_0 + γ_1 Income_i + u_i
Income_i is a dummy variable taking value 1 if the family income is above 25,000 USD and 0 otherwise.
Run a regression to estimate the coefficients γ_0 and γ_1 and report the results.
• Would you expect family income to directly affect high school final test scores? [No need for R]
• Is the coefficient γ_1 statistically different from zero?
• How can you interpret γ_1?
• Given this regression, can you conclude that the previously estimated δ_1 suffers from omitted variable
bias? Why?
• Can you model the relationship between ScoreHS_i and Income_i differently than in the previous
regression, considering that Income_i is a dummy variable? How? Would you prefer to include the
intercept or to include the two categories of income and not the intercept? [No need for R]
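The last question asks for reasoning only, but the two parameterizations can be compared numerically with a short sketch. It assumes that income is stored as a two-level factor in CollegeDistance; the object names are illustrative.

```r
library(AER)
data("CollegeDistance")

# Parameterization 1: intercept plus one dummy.
# The intercept is the mean score of the reference income category,
# the slope is the difference in mean scores between the two categories.
fit_intercept <- lm(score ~ income, data = CollegeDistance)

# Parameterization 2: no intercept, one dummy per income category.
# Each coefficient is the mean score of the corresponding category.
fit_cellmeans <- lm(score ~ 0 + income, data = CollegeDistance)

# Group means computed directly, for comparison with the coefficients
group_means <- tapply(CollegeDistance$score, CollegeDistance$income, mean)

coef(fit_intercept)
coef(fit_cellmeans)
group_means
```

Both models produce the same fitted values. The version with the intercept reports the difference between the groups, and its t test, directly. Note also that summary() computes the R-squared differently when the intercept is dropped, so the two R-squared values are not comparable.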
• How can you justify a variation of Distance_i in this model compared to the variation of Distance_i in
the preceding univariate regression?
• How do you interpret the β_1 coefficient?