
UNIVERSITY OF MICHIGAN | DEPARTMENT OF ECONOMICS

Econ452: Problem Set 2


This problem set is concerned with some of the algebraic properties of the simple regression
model. You can work in groups of up to four people and hand in the problem set as a group. It
is due on Canvas by midnight on Tuesday, February 9.

1. Suppose a researcher is interested in estimating the linear regression model
$y_i = \beta_0 + \beta_1 x_i + u_i$ and in a sample of $n = 48$ points, the following descriptive statistics
are generated:

$$\bar{x} = 14, \quad \bar{y} = 6, \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 360, \quad \sum_{i=1}^{n} (y_i - \bar{y})^2 = 13,$$

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 18, \quad SSE = 12$$

What are the OLS estimates of $\beta_0$ and $\beta_1$? What is the $R^2$ for this model?
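
The closed-form expressions behind this kind of question can be checked numerically. The sketch below uses simulated data (the seed, sample, and coefficient values are assumptions chosen for illustration, not the numbers from the problem) and compares $\hat{\beta}_1 = S_{xy}/S_{xx}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, and $R^2$ computed from sums of squares with what `lm` reports.

```r
# Simulated data; every number here is an illustrative assumption.
set.seed(1)
n <- 48
x <- rnorm(n, mean = 14, sd = 3)
y <- 5 + 0.1 * x + rnorm(n)

# Sums of squares and cross-products around the sample means.
sxx <- sum((x - mean(x))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))
sst <- sum((y - mean(y))^2)

# Closed-form OLS estimates.
b1 <- sxy / sxx
b0 <- mean(y) - b1 * mean(x)

# R-squared as one minus the residual share of total variation.
res_ss <- sum((y - b0 - b1 * x)^2)
r2 <- 1 - res_ss / sst

# Compare with lm().
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1, r2 = r2))
print(coef(fit))
print(summary(fit)$r.squared)
```

The hand-computed values and the `lm` output should agree up to floating-point error.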

2. Suppose one obtains OLS estimates for the linear regression model $y_i = \beta_0 + \beta_1 x_i + u_i$.
Show that $\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{\hat{y}}) = 0$.
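
A numerical check is not a proof, but it can confirm that the algebra is heading in the right direction. The sketch below fits a simple regression on simulated data (the data-generating values are assumptions) and evaluates the sum of residuals times demeaned fitted values.

```r
# Simulated data; the coefficients and sample size are illustrative assumptions.
set.seed(2)
x <- runif(200, 0, 10)
y <- 1 + 2 * x + rnorm(200)

fit <- lm(y ~ x)
uhat <- resid(fit)    # OLS residuals
yhat <- fitted(fit)   # OLS fitted values

# Sum of residuals times demeaned fitted values; zero up to rounding error.
orth <- sum(uhat * (yhat - mean(yhat)))
print(signif(orth, 3))
```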

3. Using data from the 2004 baseball season, a researcher collects data on the number
of wins a team had during the year and its payroll in millions of dollars. The researcher
wants to estimate a model to examine whether the size of the payroll alters wins, so they
want to consider an OLS model of the form $wins_i = \beta_0 + \beta_1 payroll_i + u_i$. The researcher gets
as far as getting the sample means, standard deviations (sd), and correlation coefficient
between wins and payroll (presented below), then their computer crashes. Using the
data below, calculate the estimates of $\beta_0$ and $\beta_1$. Interpret the result for $\beta_1$. According
to the model estimates, by how much will wins increase if a team spends \$15 million
more on salary?

> mean(wins)
[1] 66.43333
> sd(wins)
[1] 10.22399
> mean(payroll)
[1] 69.43402
> sd(payroll)
[1] 26.15059
> cor(wins, payroll)
[1] 0.5019958
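
When only means, standard deviations, and a correlation are available, the simple-regression slope can be written as $\hat{\beta}_1 = r_{xy} \, s_y / s_x$ and the intercept as $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below checks these identities on simulated data; `wins_sim` and `payroll_sim` are hypothetical variables generated for illustration, not the 2004 data.

```r
# Simulated stand-ins for the two variables; all values are assumptions.
set.seed(3)
payroll_sim <- runif(30, 30, 180)
wins_sim <- 60 + 0.1 * payroll_sim + rnorm(30, sd = 8)

# Slope and intercept from summary statistics only.
b1 <- cor(wins_sim, payroll_sim) * sd(wins_sim) / sd(payroll_sim)
b0 <- mean(wins_sim) - b1 * mean(payroll_sim)

# The same estimates from lm().
fit <- lm(wins_sim ~ payroll_sim)
print(rbind(from_moments = c(b0, b1), from_lm = coef(fit)))

# Predicted change in the outcome for a 15-unit increase in the regressor.
delta <- 15 * b1
print(delta)
```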

4. Suppose a researcher is interested in estimating the impact of gasoline taxes ($x_i$) on per
capita gallons of gasoline consumed per year ($y_i$). Assume the tax is measured in cents per
gallon. The researcher has data from 51 states for a 10-year period for a total of 510
observations. The researcher estimates the linear model $y_i = \beta_0 + \beta_1 x_i + u_i$ and calculates
$\hat{\beta}_1 = -0.90$. Suppose instead of measuring taxes in cents per gallon, the researcher
measures taxes in dollars per gallon where the new model is $y_i = \gamma_0 + \gamma_1 x_i^* + u_i$ and
$x_i^* = x_i / 100$. What will be the estimate of $\gamma_1$? Suppose taxes are measured in cents
as in the first case, but consumption is measured as gallons consumed per month, i.e.,
$y_i^* = y_i / 12$. The model now is of the form $y_i^* = \alpha_0 + \alpha_1 x_i + u_i$. What will be the estimate
of $\alpha_1$?
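
The effect of changing units can be explored by refitting the same regression after rescaling one variable at a time. The sketch below does this on simulated data; the variable names and data-generating values are assumptions, and only the ratios between the slopes matter.

```r
# Simulated data; names and numbers are illustrative assumptions.
set.seed(4)
tax_cents <- runif(510, 10, 60)
gallons_year <- 500 - 0.9 * tax_cents + rnorm(510, sd = 20)

# Baseline slope: tax in cents, consumption per year.
b_base <- unname(coef(lm(gallons_year ~ tax_cents))[2])

# Rescale the regressor: tax in dollars.
tax_dollars <- tax_cents / 100
b_x <- unname(coef(lm(gallons_year ~ tax_dollars))[2])

# Rescale the outcome: consumption per month.
gallons_month <- gallons_year / 12
b_y <- unname(coef(lm(gallons_month ~ tax_cents))[2])

# Ratios of the rescaled slopes to the baseline slope.
print(c(ratio_x = b_x / b_base, ratio_y = b_y / b_base))
```

Dividing the regressor by a constant multiplies the slope by that constant, while dividing the outcome by a constant divides the slope by it.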

5. Given an OLS model of the form $y_i = \beta_0 + \beta_1 x_i + u_i$, show that $\bar{y} = \bar{\hat{y}}$.
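
As a sanity check on the algebra, the sketch below compares the mean of the outcome with the mean of the fitted values on simulated data (all values are assumptions). It also fits a model without an intercept to show that the result depends on the intercept being included.

```r
# Simulated data; the coefficients are illustrative assumptions.
set.seed(5)
x <- rnorm(100)
y <- 3 - 1.5 * x + rnorm(100)

# With an intercept, the two means coincide up to rounding error.
fit <- lm(y ~ x)
gap <- mean(y) - mean(fitted(fit))
print(signif(gap, 3))

# Without an intercept, the two means generally differ.
fit0 <- lm(y ~ 0 + x)
gap0 <- mean(y) - mean(fitted(fit0))
print(signif(gap0, 3))
```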

6. The house price data we have been using as an example in class is located at

http://www-personal.umich.edu/~hagem/data/hprice1.raw

Load the data set into R with

url <- "http://www-personal.umich.edu/~hagem/data/hprice1.raw"
d <- read.table(url, header=TRUE)
attach(d)

(If you do everything correctly, there won’t be any output or response from R.) Construct
two new variables: The natural log of house price log(price) and the natural log of house
size log(sqrft) with

lnprice <- log(price)
lnsqrft <- log(sqrft)

Next, run a regression of lnprice on lnsqrft and then summarize with

f <- lm(lnprice ~ lnsqrft)
summary(f)

(Now R should produce results.) What are the estimates for $\beta_0$ and $\beta_1$? What is the
$R^2$ for the model? Next, interpret the estimate for $\beta_1$. Be precise: explain the units of
measure of the variables and give a numeric example. Next, use

r <- f$residuals
mean(r)
cor(r, lnsqrft)

and report the average of the regression residuals and the correlation coefficient of the
residuals and log(sqrft). The round function can be useful here to make things more
readable, for example,

round(mean(r), 4)

rounds to 4 digits after the decimal point.
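
For readers who prefer not to use `attach`, the same workflow can be written with the `data` argument of `lm`. The sketch below runs on a simulated data frame `sim` whose columns `price` and `sqrft` mimic the names in the text; the data and the assumed elasticity of 0.8 are hypothetical and unrelated to the actual house price file.

```r
# Simulated data frame; column names mirror the text, values are assumptions.
set.seed(6)
n <- 5000
sim <- data.frame(sqrft = exp(rnorm(n, mean = 7.5, sd = 0.3)))
sim$price <- exp(1 + 0.8 * log(sim$sqrft) + rnorm(n, sd = 0.2))

# Log-log regression using the data argument instead of attach().
fit <- lm(log(price) ~ log(sqrft), data = sim)
b1 <- unname(coef(fit)[2])

# In a log-log model the slope is an elasticity: a 1% larger house is
# associated with roughly b1 percent higher price.
print(round(b1, 3))

# Residual mean and correlation with the regressor, rounded for readability.
r <- resid(fit)
print(round(mean(r), 4))
print(round(cor(r, log(sim$sqrft)), 4))
```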

7. A researcher is interested in examining the impact of illegal music downloads on
commercial music sales. The author collects data on commercial sales of the top 500 singles
from 2017 ($y$) and the number of downloads from a web site that allows file sharing
($x$). The author estimates a regression model of the form $y_i = \beta_0 + \beta_1 x_i + u_i$ and gets a
large positive estimated parameter $\hat{\beta}_1$. The author concludes these results demonstrate
that downloads actually spur music sales. Is this a good estimate of the impact of
illegal music downloads on sales? Why or why not? Do you expect the estimate to overstate or
understate the true relationship between $y$ and $x$?
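
One way to think about this question is through a simulation in which the true effect is known. The sketch below is purely hypothetical: it assumes an unobserved `popularity` variable that raises both downloads and sales, and an assumed true effect of downloads on sales of -0.5. It makes no claim about real music data and only illustrates how such a mechanism could move the estimated slope.

```r
# Hypothetical data-generating process; every value is an assumption.
set.seed(7)
n <- 5000
popularity <- rnorm(n)                      # unobserved driver of both variables
downloads <- 2 * popularity + rnorm(n)
true_effect <- -0.5                         # assumed causal effect of downloads
sales <- 3 * popularity + true_effect * downloads + rnorm(n)

# Simple regression that leaves popularity in the error term.
b1_hat <- unname(coef(lm(sales ~ downloads))[2])
print(c(true_effect = true_effect, estimated = round(b1_hat, 3)))
```

Under these assumptions the estimated slope is positive even though the assumed effect is negative, because the error term is correlated with the regressor.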

8. Below are R results for the sample mean, standard deviation, covariance, and regression
(lm) results of y on x. The sample has 100 observations.

> mean(x)
[1] 0.03524317
> sd(x)
[1] 1.120828
> sd(y)
[1] 5.241621
> cov(x, y)
[1] 1.636617
> f <- lm(y ~ x)
> summary(f)

Call:
lm(formula = y ~ x)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0358     0.5062   9.948  < 2e-16 ***
x             ?.????     0.4537   2.871  0.00501 **

Residual standard error: 5.06 on 98 degrees of freedom
Multiple R-squared: ?.????, Adjusted R-squared:
F-statistic: 8.245 on 1 and 98 DF, p-value: 0.005008

Some results have been deleted. What are the values for (a) $\hat{\beta}_1$, (b) $\bar{y}$, and (c) $R^2$ if
$SSR = 2509.153$?
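
The identities that link summary statistics to regression output can be verified on simulated data. The sketch below reads SSR as the residual sum of squares, which is consistent with the residual standard error in the output above (2509.153 / 98 is about 5.06 squared). The data and coefficient values are assumptions, not the sample from the problem.

```r
# Simulated data; all values are illustrative assumptions.
set.seed(8)
n <- 100
x <- rnorm(n)
y <- 5 + 1.5 * x + rnorm(n, sd = 5)
fit <- lm(y ~ x)

# (a) Slope from the sample covariance and the variance of x.
b1 <- cov(x, y) / sd(x)^2

# (b) The regression line passes through the point of sample means.
ybar <- unname(coef(fit)[1]) + b1 * mean(x)

# (c) R-squared from the residual and total sums of squares.
ssr <- sum(resid(fit)^2)     # residual sum of squares
sst <- (n - 1) * sd(y)^2     # total sum of squares
r2 <- 1 - ssr / sst

print(c(b1 = b1, ybar = ybar, r2 = r2))
```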

9. (Bonus problem: hard) Suppose one estimates the linear regression model $y = \beta_0 + \beta_1 x + u$
by OLS. Show that the square of the correlation coefficient between $y$ and $\hat{y}$ is
equal to the $R^2$.

Hint: Write the definition of the squared correlation coefficient and there should be
$\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})$. Write this as $\sum_{i=1}^{n} y_i (\hat{y}_i - \bar{\hat{y}})$ and remember that $y_i = \hat{y}_i + \hat{u}_i$.
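
Before attempting the proof, it can help to confirm the claim numerically. The sketch below uses simulated data (all values are assumptions) to compare the squared correlation between $y$ and $\hat{y}$ with the $R^2$ reported by `summary`.

```r
# Simulated data; the coefficients are illustrative assumptions.
set.seed(9)
x <- rnorm(150)
y <- 2 + 0.7 * x + rnorm(150)
fit <- lm(y ~ x)

# Squared correlation between the outcome and the fitted values.
r2_cor <- cor(y, fitted(fit))^2

# R-squared as reported by summary().
r2_lm <- summary(fit)$r.squared

print(c(r2_cor = r2_cor, r2_lm = r2_lm))
```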
