ECON3334 Midterm Fall2022 Solution

ECON3334 Introduction to Econometrics

Solution to Midterm Exam, Fall 2022

There are three parts and 12 questions in total. Let us know immediately if any questions are missing from
your exam paper. The exam is worth 60 points in total.

Part I. (16pt) Multiple choice problems.

1. A type II error:
A) is typically smaller than the type I error.
B) is the error you make when choosing type II or type I.
C) is the error you make when not rejecting the null hypothesis when it is false.
D) cannot be calculated when the alternative hypothesis contains an "=".

Ans: C

2. The standard error of $\bar{Y}$ is given by the following formula:

A) $s_Y$
B) $s_Y^2 / n$
C) $s_Y / \sqrt{n}$
D) $s_Y / n$
where $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

Ans: C
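
As a small numerical illustration of answer C, the sketch below computes $s_Y / \sqrt{n}$ for a made-up sample. The data values and the function name `standard_error_of_mean` are assumptions chosen for illustration; they are not part of the exam.

```python
import math

def standard_error_of_mean(ys):
    """Return s_Y / sqrt(n), where s_Y^2 uses the n - 1 divisor."""
    n = len(ys)
    y_bar = sum(ys) / n
    s2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)  # sample variance s_Y^2
    return math.sqrt(s2) / math.sqrt(n)

sample = [2.0, 4.0, 4.0, 6.0]  # hypothetical data
se = standard_error_of_mean(sample)
print(f"standard error of the sample mean: {se:.4f}")
```

Note that the sample standard deviation is divided by $\sqrt{n}$, not by $n$, which is what separates option C from option D.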

3. To derive the OLS estimator of a linear regression model, you find the values of $b_0$ and $b_1$ that minimize:
A) $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$
B) $\sum_{i=1}^{n} |Y_i - b_0 - b_1 X_i|$
C) $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)$
D) $\sum_{i=1}^{n} Y_i (Y_i - b_0 - b_1 X_i)$

Ans: A

4. The reason why estimators have a sampling distribution is that:


A) economics is not a precise science.
B) individuals respond differently to incentives.
C) in real life you typically get to sample many times.
D) the values of the explanatory variable and the error term differ across samples.

Ans: D

5. To obtain the slope estimator using the least squares principle, you divide the:
A) sample variance of X by the sample variance of Y.
B) sample covariance of X and Y by the sample variance of Y.
C) sample covariance of X and Y by the sample variance of X.
D) sample variance of X by the sample covariance of X and Y.

Ans: C

6. The OLS estimator for the slope for the simple linear regression model is:
A) $s_{XY} / s_X^2$
B) $s_{XY} / s_X$
C) $s_{XY}^2 / s_X^2$
D) $s_{XY}^2 / s_X$
where $s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$ and $s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.

Ans: A
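
Questions 3, 5, and 6 describe the same estimator from different angles. The sketch below computes the slope as the sample covariance divided by the sample variance of X, then checks that nudging either coefficient raises the sum of squared residuals. The data and the function names `ols_fit` and `ssr` are hypothetical. The intercept formula $b_0 = \bar{Y} - b_1 \bar{X}$ is the standard OLS result and is not stated in the questions above.

```python
def ols_fit(xs, ys):
    """Simple-regression OLS: slope = s_XY / s_X^2, intercept = y_bar - slope * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def ssr(b0, b1, xs, ys):
    """Sum of squared residuals, the objective in question 3, option A."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical regressor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical outcomes
b0, b1 = ols_fit(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print("SSR at OLS fit:      ", ssr(b0, b1, xs, ys))
print("SSR with slope + 0.1:", ssr(b0, b1 + 0.1, xs, ys))
```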

7. In general, the t-statistic has the following form:


A) (estimator – hypothesized value) / (standard error of the estimator)
B) (estimator) / (standard error of the estimator)
C) (estimator – hypothesized value) / [√𝑛(standard error of the estimator)]
D) √𝑛 (estimator – hypothesized value) / (standard error of the estimator)

Ans: A

8. The 95% confidence interval for 𝛽1 is the interval:


A) [𝛽1 − 1.96 ⋅ 𝑆𝐸(𝛽1 ), 𝛽1 + 1.96 ⋅ 𝑆𝐸(𝛽1 )]
B) [𝛽̂1 − 1.65 ⋅ 𝑆𝐸(𝛽̂1 ), 𝛽̂1 + 1.65 ⋅ 𝑆𝐸(𝛽̂1 )]
C) [𝛽̂1 − 1.96 ⋅ 𝑆𝐸(𝛽̂1 ), 𝛽̂1 + 1.96 ⋅ 𝑆𝐸(𝛽̂1 )]
D) [𝛽̂1 − 1.96, 𝛽̂1 + 1.96]

Ans: C
Part II. (12pt) Discuss the following research plans.

9. Yana wants to study whether a city's public job training program for the unemployed increases the chance
of getting employed. An unemployed person can register for the program by visiting the government's
website. For this study, she obtained the list of people who applied for unemployment insurance in the city
in a given month and matched it with the job training participation records. The government provided
information on only a randomly selected 10% of the people who applied for unemployment insurance. The
resulting data set contains 6,341 unemployed people who applied for unemployment insurance in the city in
that month. It includes each subject's social security ID, the date of training (NA if the subject did not
participate in the training), and whether the subject was employed 3 months after the date of the
unemployment insurance application.

a) (1pt) Yana is going to use a simple linear regression model to estimate the effect of job training on the
chance of getting employed. What should be the dependent and explanatory variables?

The dependent variable is a binary variable indicating whether the person is employed 3 months after
the unemployment insurance application.
The explanatory variable is a binary variable indicating whether the unemployed person has
participated in the job training program.

b) (1pt) Suppose that the population you consider is the unemployed who applied for unemployment
insurance in the city in the month. Is this a random sampling? Answer yes or no and give a reason.

Yes, because the data set is a randomly selected 10% of the people who applied for unemployment
insurance in the city in the month.

c) (2pt) Suppose that the population you consider is all unemployed people in the city in the month. Is this
a random sampling? Answer yes or no and give a reason.

No, because not all unemployed people apply for unemployment insurance, so those who do not apply
have no chance of being selected.

d) (2pt) Yana found that the OLS estimate of the slope was positive and significant at a 1% level. Can
she say that the unemployed should participate in the job training program if they want to increase the
chance of getting employed? Answer yes or no and give a reason.

No. The unemployed people who participated in the job training program may be more strongly
motivated to earn money and more diligent, and these factors affect the chance of getting employed
regardless of whether they actually received the job training.
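
The selection problem in part d) can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the probabilities below are hypothetical, and training is given no effect on employment at all. Motivated people are more likely both to participate and to find a job, so the employment gap between participants and non-participants is still positive. With a binary regressor, that gap equals the OLS slope.

```python
import random

def simulated_employment_gap(n_people=50000, seed=0):
    """Employment-rate gap (trained minus untrained) when training has NO true effect."""
    rng = random.Random(seed)
    trained, untrained = [], []
    for _ in range(n_people):
        motivated = rng.random() < 0.5                          # hypothetical share
        participates = rng.random() < (0.7 if motivated else 0.2)
        # Employment depends only on motivation, never on participation.
        employed = rng.random() < (0.6 if motivated else 0.3)
        (trained if participates else untrained).append(1 if employed else 0)
    return sum(trained) / len(trained) - sum(untrained) / len(untrained)

gap = simulated_employment_gap()
print(f"difference in employment rates (equals the OLS slope): {gap:.3f}")
```

Under these assumed probabilities the population gap is about 0.15 even though the true effect of training is zero, which is why a positive, significant slope alone does not justify the recommendation.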

10. Kelly wants to know whether she can increase clean fuel adoption in rural areas by raising awareness of
the adverse health effects of cooking with solid fuels. To study this, Kelly randomly selected 100 villages and
randomly divided them into two groups. For one group of villages, Kelly did nothing. For the other group,
Kelly asked the local health authority to visit each village and give a seminar on the topic. Kelly
conducted pre- and post-surveys and measured the number of LPG refills (clean fuel) used in each household
just before and after the intervention period. Kelly aggregated LPG use at the village level, i.e., calculated
the total number of LPG refills used in the village.

a) (1pt) Kelly is going to use a simple linear regression model to estimate the effect of raising awareness
on clean fuel adoption. What should be the dependent and explanatory variables?

The dependent variable is the total number of LPG refills used in the village. The explanatory
variable is a binary variable indicating whether the village received a seminar on the adverse
health effects of cooking with solid fuels.

Kelly found that the OLS estimate of the slope was positive and significant at the 1% level. Kelly insists that
this could be interpreted as the average treatment effect of raising awareness. Let 𝑓𝑖 (0) be the number of LPG
refills (clean fuel) used in village 𝑖 when there is no awareness campaign and 𝑓𝑖 (1) be the number used in the
village when there is an awareness campaign.

b) (1pt) How is the average treatment effect defined?

𝐸[𝑓𝑖 (1)] − 𝐸[𝑓𝑖 (0) ]

Suppose that half of the villages are of type 𝑓𝑖 (𝑥) = 100 + 𝑥 and half of the villages are of type 𝑓𝑖 (𝑥) = 98 + 𝑥.

c) (1pt) What is the value of the average treatment effect?

[0.5(100 + 1) + 0.5(98 + 1)] − [0.5(100) + 0.5(98)] = 1

d) (3pt) Discuss why the OLS estimator of the slope could be interpreted as the estimate of the average
treatment effect using the model.

Because the explanatory variable 𝑋𝑖 is randomly assigned, the distribution of village types in each group
is the same as in the population. Therefore, the conditional expectation of the dependent variable 𝑌𝑖
satisfies 𝐸[𝑌𝑖 |𝑋𝑖 = 1] = 𝐸[𝑓𝑖 (1)] and 𝐸[𝑌𝑖 |𝑋𝑖 = 0] = 𝐸[𝑓𝑖 (0)] (1pt). Thus, 𝐴𝑇𝐸 = 𝐸[𝑌𝑖 |𝑋𝑖 = 1] −
𝐸[𝑌𝑖 |𝑋𝑖 = 0] = 𝛽1 (1pt). Because the data are a random sample, as long as there are no large outliers, the
OLS estimator of the slope consistently estimates 𝛽1 (1pt).
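
The argument in parts c) and d) can be checked with a short simulation. This is a sketch, assuming 100 villages split evenly between the two types from the question and a random half assigned to treatment. The names `potential_outcome` and `difference_in_means` are hypothetical. With a binary regressor, the OLS slope equals the difference in group means, so that is what is computed.

```python
import random

def potential_outcome(base, x):
    """f_i(x) = base + x, with base equal to 100 or 98 as in the question."""
    return base + x

def difference_in_means(rng, n_villages=100):
    """Randomly treat half the villages and return mean(Y | X=1) - mean(Y | X=0)."""
    bases = [100] * (n_villages // 2) + [98] * (n_villages // 2)
    treated = set(rng.sample(range(n_villages), n_villages // 2))
    y_treated = [potential_outcome(b, 1) for i, b in enumerate(bases) if i in treated]
    y_control = [potential_outcome(b, 0) for i, b in enumerate(bases) if i not in treated]
    return sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)

# Average treatment effect from part c).
true_ate = (0.5 * (100 + 1) + 0.5 * (98 + 1)) - (0.5 * 100 + 0.5 * 98)

rng = random.Random(0)
estimates = [difference_in_means(rng) for _ in range(2000)]
avg_estimate = sum(estimates) / len(estimates)
print(f"true ATE = {true_ate}, average estimate over 2000 randomizations = {avg_estimate:.3f}")
```

A single randomization can give an estimate above or below 1 because the two village types are not perfectly balanced across groups, but the estimates center on the average treatment effect.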
Part III. (32pt) Calculation and Analytics

11. (14pt) We have data on 546 properties in Canada in 1987. The relation between the property price (1000 CAD)
and the lot size (1000 sqft) looks like this:

[Figure: property price (1000 CAD) against lot size (1000 sqft)]

By estimating a simple linear regression model, we obtained the following estimation results.

                                               𝛽̂0                             𝛽̂1
Estimates                                      33.836 or 34.434 or 34.135     6.60
Standard Error                                 2.65                           0.45
t-statistic (null hypothesis: coefficient = 0) 12.768 or 12.994 or 12.881     14.80
95% Confidence Interval                        [29.24, 39.03]                 [5.73, 7.47]

a) (8pt) Fill in the blanks based on the given information and provide the derivation. Use Φ(−1.96) = 0.025 and
Φ(−2.58) = 0.005. Round your answers to two decimal places.

• $\hat{\beta}_0$. From the confidence interval, $\hat{\beta}_0 + 1.96 \times 2.65 = 39.03 \Rightarrow \hat{\beta}_0 = 33.84$ (2pt).
So, with $H_0: \beta_0 = 0$, $|t| = |33.84 / 2.65| = 12.77$ (2pt).
For this problem, you can also calculate $\hat{\beta}_0$ as $29.24 + 1.96 \times 2.65$ or $(29.24 + 39.03)/2$, and the
t-statistic for each of them.
• $\hat{\beta}_1$. The t-statistic is 14.80, so $14.80 = (6.60 - 0) / SE(\hat{\beta}_1)$. Therefore, $SE(\hat{\beta}_1) = 6.6 / 14.8 = 0.446$ (2pt).
The 95% confidence interval is $[6.6 - 1.96 \times 0.446,\ 6.6 + 1.96 \times 0.446] = [5.73, 7.47]$ (2pt).
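
The arithmetic in part a) can be reproduced with a few lines of Python. This is only a check of the numbers given in the question; the variable names are chosen for illustration.

```python
Z = 1.96  # critical value for a 95% interval, from Phi(-1.96) = 0.025

# Intercept: recover the estimate from the confidence interval and the standard error.
se_b0 = 2.65
ci_b0 = (29.24, 39.03)
b0_from_upper = ci_b0[1] - Z * se_b0        # upper bound minus the half-width
b0_from_lower = ci_b0[0] + Z * se_b0        # lower bound plus the half-width
b0_from_mid = (ci_b0[0] + ci_b0[1]) / 2     # midpoint of the interval
t_b0 = [b / se_b0 for b in (b0_from_upper, b0_from_lower, b0_from_mid)]

# Slope: recover the standard error from the estimate and its t-statistic.
b1, t_b1 = 6.60, 14.80
se_b1 = b1 / t_b1
ci_b1 = (b1 - Z * se_b1, b1 + Z * se_b1)

print("intercept candidates:", [round(b, 3) for b in (b0_from_upper, b0_from_lower, b0_from_mid)])
print("matching t-statistics:", [round(t, 3) for t in t_b0])
print("SE of slope:", round(se_b1, 3), " 95% CI:", [round(c, 2) for c in ci_b1])
```

The three intercept candidates differ slightly because the reported interval bounds are rounded, which is why the solution accepts any of them.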

b) (3pt) Can you reject 𝐻0 : 𝛽0 = 30 at 5% level? Why?

No, we cannot reject 𝛽0 = 30 at 5% level because 30 is inside the 95% confidence interval.

c) (3pt) Can you reject 𝐻0 : 𝛽1 = 5 at 5% level? Why?


Yes, we can reject 𝛽1 = 5 at 5% level because 5 is outside the 95% confidence interval.
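
Parts b) and c) use the confidence interval, but the same conclusions follow from the t-statistic form in question 7. The sketch below checks this with the estimates and standard errors from the table; the function names are hypothetical.

```python
def t_statistic(estimate, hypothesized, standard_error):
    """(estimator - hypothesized value) / (standard error of the estimator)."""
    return (estimate - hypothesized) / standard_error

def rejects_at_5_percent(estimate, hypothesized, standard_error, critical=1.96):
    """Two-sided test: reject when |t| exceeds the 5% critical value."""
    return abs(t_statistic(estimate, hypothesized, standard_error)) > critical

reject_b0 = rejects_at_5_percent(33.84, 30.0, 2.65)   # H0: beta_0 = 30
reject_b1 = rejects_at_5_percent(6.60, 5.0, 0.45)     # H0: beta_1 = 5
print("reject beta_0 = 30:", reject_b0)
print("reject beta_1 = 5: ", reject_b1)
```

Rejecting at the 5% level when |t| > 1.96 is equivalent to the hypothesized value lying outside the 95% confidence interval, so both approaches agree.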

12. (18pt) Let $\hat{\beta}_1$ be the OLS estimator of $\beta_1$ in the linear regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$. Suppose all
three OLS assumptions hold (unconfoundedness, i.i.d., and no large outliers). Consider the following three
estimators: $\hat{\beta}_1 + \frac{2}{n}$, $2\sqrt{n}\,\hat{\beta}_1$, and $\frac{n+2}{n}\hat{\beta}_1$, where $n$ is the sample size. Suppose $\beta_1 = 1$.

a) (6pt) Calculate the bias for each estimator. (𝑛 may show up in your answer and you can leave it there.)

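By unbiasedness of $\hat{\beta}_1$, we have $E(\hat{\beta}_1) = \beta_1 = 1$. Therefore,

• $E(\hat{\beta}_1 + \frac{2}{n}) - \beta_1 = \frac{2}{n}$.
• $E(2\sqrt{n}\,\hat{\beta}_1) - \beta_1 = 2\sqrt{n}\,E(\hat{\beta}_1) - \beta_1 = (2\sqrt{n} - 1)\beta_1 = 2\sqrt{n} - 1$.
• $E(\frac{n+2}{n}\hat{\beta}_1) - \beta_1 = \frac{n+2}{n} E(\hat{\beta}_1) - 1 = \frac{n+2}{n} - 1 = \frac{2}{n}$.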

b) (3pt) Is the third estimator, i.e., $\frac{n+2}{n}\hat{\beta}_1$, consistent? Why or why not?

Yes. As n tends to infinity, $(n+2)/n$ converges to 1. If the three OLS assumptions hold, $\hat{\beta}_1$ is consistent,
i.e., it converges in probability to $\beta_1$ as n tends to infinity. So $\frac{n+2}{n}\hat{\beta}_1$ converges in probability to $\beta_1$
as n tends to infinity.

c) (6pt) Suppose the variance of 𝛽̂1 is 1. Calculate the variances of these three estimators. (𝑛 may show up in
your answer and you can leave it there.)

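Since $n$ is nonrandom, we have

• $var(\hat{\beta}_1 + \frac{2}{n}) = var(\hat{\beta}_1) + var(\frac{2}{n}) + 2\,cov(\hat{\beta}_1, \frac{2}{n}) = var(\hat{\beta}_1) = 1$.
• $var(2\sqrt{n}\,\hat{\beta}_1) = 4n \times var(\hat{\beta}_1) = 4n$.
• $var(\frac{n+2}{n}\hat{\beta}_1) = (\frac{n+2}{n})^2 var(\hat{\beta}_1) = (\frac{n+2}{n})^2$.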

d) (3pt) Based on a), b), and c), if you are asked to recommend one estimator from these three, which one
would you recommend?

The second estimator is not consistent (its bias, 2√n − 1, grows with n), so it is ruled out directly.

The first estimator is the best. It has the same bias as the third estimator, with smaller variance.
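
The comparison in part d) can be tabulated from the bias and variance formulas in parts a) and c). The sketch below evaluates them for a few sample sizes, assuming $\beta_1 = 1$ and $var(\hat{\beta}_1) = 1$ as in the question. The mean squared error, bias squared plus variance, is a standard summary that is not part of the original solution and is added here only as an illustration.

```python
import math

def bias_and_variance(n, beta1=1.0, var_beta1_hat=1.0):
    """Return {estimator name: (bias, variance)} using the formulas from parts a) and c)."""
    scale = (n + 2) / n
    return {
        "beta1_hat + 2/n": (2 / n, var_beta1_hat),
        "2*sqrt(n)*beta1_hat": ((2 * math.sqrt(n) - 1) * beta1, 4 * n * var_beta1_hat),
        "((n+2)/n)*beta1_hat": ((scale - 1) * beta1, scale ** 2 * var_beta1_hat),
    }

def mse(bias, variance):
    """Mean squared error = bias^2 + variance."""
    return bias ** 2 + variance

for n in (10, 100, 1000):
    print(f"n = {n}")
    for name, (bias, variance) in bias_and_variance(n).items():
        print(f"  {name:22s} bias = {bias:9.4f}  var = {variance:10.4f}  mse = {mse(bias, variance):12.4f}")
```

For every sample size the first estimator has the smallest variance and mean squared error, while the second estimator's bias and variance both grow with n.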
