Faculty of Economics and Business Administration

Exam: Advanced Methods for Applied Economic Research
Code: E EC AMAER (60422070)
Coordinator: Jonneke Bolhaar
Date: 26 October 2010
Time: 8.45 am
Duration: 2 hours and 45 minutes
Calculator allowed: yes
Graphical calculator allowed: yes
Number of questions: 7
Answer in: English
Credit score: For each question you can earn a maximum number of points as indicated in the individual questions. In total you can earn 100 points.
Grade: The grades will be made public no later than Tuesday 9 November.
Inspection: You can see your solution on Friday 12 November between 2 pm and 3 pm in room 2A-37.
Number of pages: 6 pages (including this page).

Please answer questions 1-4 and 5-7 on separate sheets of paper!

Question 1. [19 points total] Consider the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$. You may suppose that the assumptions for OLS to give consistent and unbiased estimates are satisfied. Now suppose that $Y_i$ is measured with error, and what you observe in the data is $\tilde{Y}_i$. We can write $\tilde{Y}_i = Y_i + w_i$, where $w_i$ is the measurement error in $Y_i$, which is i.i.d. and independent of $Y_i$ and $X_i$. Hence, the regression model that is estimated using the (mismeasured) data is $\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$.
(i). [5 points] Show that $v_i = u_i + w_i$.
(ii). [7 points] Show that the regression $\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$ satisfies the assumptions for OLS to be a consistent and unbiased estimator. (You may assume here that $w_i$ is independent of $Y_j$ and $X_j$ for all values of $i$ and $j$ and has finite fourth moments.)
(iii). [7 points] Can the confidence intervals be constructed in the usual way?
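A small Monte Carlo sketch can make the setting of Question 1 concrete. It assumes values that the question does not give (intercept 1, slope 2, standard normal regressor and regression error, and normal measurement error in the outcome with variance 4) and compares the OLS slope estimated with the true outcome against the slope estimated with the mismeasured outcome. The helper `ols_slope` is hypothetical, written only for this illustration.

```python
import random


def ols_slope(y, x):
    """Slope and residual variance from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid_var = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1, resid_var


rng = random.Random(42)
n = 20_000
beta0, beta1 = 1.0, 2.0                          # assumed true coefficients
x = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]          # regression error, variance 1
w = [rng.gauss(0, 2) for _ in range(n)]          # measurement error in Y, variance 4
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
y_tilde = [yi + wi for yi, wi in zip(y, w)]      # observed (mismeasured) outcome

b_true, s2_true = ols_slope(y, x)
b_obs, s2_obs = ols_slope(y_tilde, x)
print(f"slope with true Y:        {b_true:.3f}, residual variance {s2_true:.2f}")
print(f"slope with mismeasured Y: {b_obs:.3f}, residual variance {s2_obs:.2f}")
```

Under these assumptions both slopes should land close to 2, while the residual variance should rise from roughly Var(u) = 1 to roughly Var(u) + Var(w) = 5, which widens the standard errors of the slope.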

Question 2. [7 points] $(Y_i, X_{1,i}, X_{2,i})$ satisfies the 4 assumptions for multivariate OLS. Now assume that $X_1$ and $X_2$ are uncorrelated. $\beta_1$ is the parameter that measures the causal effect of $X_1$ on $Y$. If you estimate $\beta_1$ by running the regression $Y_i = \beta_0 + \beta_1 X_{1,i} + u_i$, does the obtained estimator suffer from omitted variable bias?
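A hypothetical simulation can illustrate the short regression in Question 2. It assumes $Y = 0.5 + 1.0\,X_1 + 3.0\,X_2 + u$ with standard normal variables, regresses $Y$ on $X_1$ alone, and compares a case where $X_2$ is uncorrelated with $X_1$ against a contrasting case where $\text{Cov}(X_1, X_2) = 0.5$. All coefficient values and the helper names are assumptions made for this sketch.

```python
import random


def ols_slope(y, x):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(1)
n = 20_000
beta0, beta1, beta2 = 0.5, 1.0, 3.0              # assumed true coefficients
x1 = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]


def short_regression_slope(x2):
    """Generate Y from X1 and X2, then regress Y on X1 only (X2 omitted)."""
    y = [beta0 + beta1 * a + beta2 * b + e for a, b, e in zip(x1, x2, u)]
    return ols_slope(y, x1)


# Uncorrelated case: X2 is independent of X1
slope_uncorr = short_regression_slope(noise)
# Contrast: X2 = 0.5 * X1 + noise, so the slope should approach
# beta1 + beta2 * Cov(X1, X2) / Var(X1) = 1.0 + 3.0 * 0.5 = 2.5
slope_corr = short_regression_slope([0.5 * a + b for a, b in zip(x1, noise)])
print(f"X1, X2 uncorrelated: slope = {slope_uncorr:.3f}")
print(f"X1, X2 correlated:   slope = {slope_corr:.3f}")
```

The contrasting case is included only to show what the omitted variable bias formula predicts when the correlation is not zero.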

Question 3. [7 points] Labor economists studying the determinants of women's earnings discovered a puzzling empirical result. Using randomly selected employed women, they regressed earnings on the women's number of children and a set of control variables (age, education, occupation, etc.). They found that women with more children earn higher wages, controlling for these other factors. Explain how sample selection might be the cause of this result.
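One possible selection mechanism can be sketched with hypothetical simulated data. The sketch assumes, purely for illustration, that children have no causal effect on wage offers, that the reservation wage rises with the number of children, and that only women whose wage offer exceeds their reservation wage are employed (and hence sampled); the control variables from the question are left out. Under these assumptions the slope on children among employed women should come out positive even though it is close to zero among all women.

```python
import random


def ols_slope(y, x):
    """OLS slope from a regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(7)
children, offers, employed = [], [], []
for _ in range(50_000):
    k = rng.randint(0, 3)                 # number of children (0 to 3)
    offer = 2.0 + rng.gauss(0, 1)         # log wage offer: children have NO causal effect
    reservation = 1.5 + 0.5 * k           # hypothetical: reservation wage rises with children
    children.append(k)
    offers.append(offer)
    employed.append(offer > reservation)  # only these women work and enter the sample

slope_all = ols_slope(offers, children)
emp_children = [k for k, e in zip(children, employed) if e]
emp_offers = [w for w, e in zip(offers, employed) if e]
slope_employed = ols_slope(emp_offers, emp_children)
print(f"all women:      slope on children = {slope_all:.3f}")
print(f"employed women: slope on children = {slope_employed:.3f}")
```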

Question 4. [17 points total] Stefan estimated an instrumental variable regression model with one regressor, $X_i$, and two instruments, $Z_{1,i}$ and $Z_{2,i}$. He also obtained a J-statistic with value 18.2. Under the null hypothesis, the J-statistic has a $\chi^2_1$ distribution, of which the 1% critical value is 6.63.
(i). [4 points] Which of the two conditions for an instrument to be valid is tested here? What is the name of this test?
(ii). [5 points] Describe in words (!) how this test works.
(iii). [4 points] Does this suggest that $E[u_i \mid Z_{1,i}, Z_{2,i}] = 0$? Explain.
(iv). [4 points] Does this suggest that $E[u_i \mid Z_{1,i}] = 0$? Explain.
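The mechanics of a J-statistic like Stefan's can be sketched on simulated data. This is a minimal sketch assuming homoskedastic errors and the form $J = mF$, where $F$ is the homoskedasticity-only F-statistic testing that both instrument coefficients are zero in a regression of the TSLS residuals on the instruments, and $m = 2$ is the number of instruments. The data-generating values and all function names (`ols`, `j_statistic`, `simulate`) are hypothetical; a nonzero `direct_effect` lets $Z_2$ affect $Y$ directly, so that $Z_2$ is no longer exogenous.

```python
import math
import random


def ols(y, columns):
    """OLS of y on a constant plus the given columns; returns (coefficients, residuals)."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    c = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Solve the normal equations A b = c by Gaussian elimination with partial pivoting
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for j in range(p, k):
                A[r][j] -= f * A[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(A[p][j] * b[j] for j in range(p + 1, k))) / A[p][p]
    resid = [y[r] - sum(b[j] * X[r][j] for j in range(k)) for r in range(n)]
    return b, resid


def j_statistic(y, x, z1, z2):
    """J = m * F for one endogenous regressor and two instruments (m = 2)."""
    _, v_hat = ols(x, [z1, z2])                        # first stage
    x_hat = [xi - vi for xi, vi in zip(x, v_hat)]      # fitted values of X
    (b0, b1), _ = ols(y, [x_hat])                      # second stage: TSLS coefficients
    u_hat = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]  # TSLS residuals use the actual X
    _, e = ols(u_hat, [z1, z2])                        # residuals on the instruments
    n = len(y)
    mean_u = sum(u_hat) / n
    ssr_restricted = sum((ui - mean_u) ** 2 for ui in u_hat)
    ssr_unrestricted = sum(ei ** 2 for ei in e)
    q = 2                                              # restrictions = number of instruments
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - 3))
    return max(0.0, q * f_stat)                        # guard against tiny rounding negatives


def simulate(n, direct_effect, seed=0):
    """Hypothetical data: X endogenous, Z1 valid, Z2 valid only if direct_effect == 0."""
    rng = random.Random(seed)
    y, x, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e = 0.5 * v + rng.gauss(0, 1)                  # error correlated with v: X is endogenous
        xi = a + b + v
        ui = e + direct_effect * b                     # direct_effect != 0 violates exogeneity of Z2
        y.append(1.0 + 2.0 * xi + ui)
        x.append(xi)
        z1.append(a)
        z2.append(b)
    return y, x, z1, z2


for effect in (0.0, 0.8):
    J = j_statistic(*simulate(5000, effect))
    p_value = math.erfc(math.sqrt(J / 2.0))            # P(chi-squared with 1 df > J)
    print(f"direct effect of Z2 on Y = {effect}: J = {J:.2f}, p-value = {p_value:.4f}")
```

With the assumed invalid instrument the J-statistic should be far above the 1% critical value of 6.63, while with valid instruments it behaves like a draw from a $\chi^2_1$ distribution.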

Question 5. [12 points total] Consider a study to evaluate the effect of having access to the Internet in dorm rooms on college students' grades. In a large dorm, half the rooms are randomly wired for high-speed Internet connections (the treatment group), and final course grades are collected for all residents. Which of the following situations pose a threat to the internal and/or external validity of the experiment, and why?
(i). [3 points] Midway through the year, all the male athletes move into a fraternity and drop out of the study (their final grades are not observed).
(ii). [3 points] Engineering students assigned to the control group put together a local area network so that they can share a private wireless Internet connection that they pay for jointly.
(iii). [3 points] The art majors in the treatment group never learn how to access their Internet accounts.
(iv). [3 points] The economics majors in the treatment group provide access to their Internet connection to those in the control group, for a fee.

Question 6. [18 points total] This question is about binary dependent variables.
(i). [4 points] Provide at least two arguments for using a probit model when the outcome variable is a 0/1 variable.
(ii). [7 points] Let us assume that we are interested in the effect of race and the payment-to-income ratio on the probability of being denied a bank loan for buying a house. Using the probit model, we estimate the following equation:
$P(Y = 1 \mid \text{P/I ratio}, \text{Black}) = \Phi(-2.25 + 2.74\,\text{P/I ratio} + 0.71\,\text{Black})$,
where P/I ratio is the fraction of income that will go to mortgage payments and Black is a dummy for being Black (as opposed to white). What do the estimated coefficients of these variables, as shown above, tell you? Explain how you would obtain the marginal effects of P/I ratio and Black.
(iii). [7 points] The likelihoods for the two possible outcomes in the bivariate probit model can be written as:

$P(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X_1)$
$P(Y = 0 \mid X) = 1 - \Phi(\beta_0 + \beta_1 X_1)$

Using these likelihoods, write down the likelihood function and explain how you would go about obtaining the maximum likelihood estimates of $\beta_0$ and $\beta_1$.
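The computations behind parts (ii) and (iii) can be sketched as follows. For part (ii) the sketch uses the reported coefficients, assuming the intercept is -2.25, and evaluates the effect of P/I ratio as the derivative $\phi(\text{index})\cdot 2.74$ and the effect of Black as a difference in predicted probabilities at a chosen P/I ratio (0.3 is an arbitrary illustrative value). For part (iii) it writes the log-likelihood $\sum_i [Y_i \ln \Phi(\beta_0 + \beta_1 X_{1,i}) + (1 - Y_i)\ln(1 - \Phi(\beta_0 + \beta_1 X_{1,i}))]$ and maximizes it by Fisher scoring on hypothetical simulated data; Fisher scoring is one common choice of optimizer, not necessarily the one intended in the course. All function names are hypothetical.

```python
import math
import random


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Part (ii): reported coefficients, ASSUMING the intercept is -2.25
B0, B_PI, B_BLACK = -2.25, 2.74, 0.71


def prob_denied(pi_ratio, black):
    """Predicted probability of denial, Phi(index)."""
    return Phi(B0 + B_PI * pi_ratio + B_BLACK * black)


def marginal_effect_pi(pi_ratio, black):
    """Continuous regressor: derivative phi(index) * beta, at the chosen values."""
    return phi(B0 + B_PI * pi_ratio + B_BLACK * black) * B_PI


def effect_black(pi_ratio):
    """Binary regressor: change in predicted probability from Black = 0 to Black = 1."""
    return prob_denied(pi_ratio, 1) - prob_denied(pi_ratio, 0)


# Part (iii): log-likelihood of the probit model with one regressor
def log_likelihood(b0, b1, y, x):
    total = 0.0
    for yi, xi in zip(y, x):
        z = b0 + b1 * xi
        # 1 - Phi(z) equals Phi(-z); the latter is more accurate in the tail
        total += math.log(Phi(z)) if yi == 1 else math.log(Phi(-z))
    return total


def probit_mle(y, x, max_iter=100, tol=1e-10):
    """Maximize log_likelihood by Fisher scoring, starting from (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # expected information matrix
        for yi, xi in zip(y, x):
            z = b0 + b1 * xi
            p, q, d = Phi(z), Phi(-z), phi(z)
            s = d * (yi - p) / (p * q)
            w = d * d / (p * q)
            g0 += s
            g1 += s * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det    # step = (information)^-1 * score
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data with true values b0 = -0.5, b1 = 1.0
rng = random.Random(3)
x_sim = [rng.gauss(0, 1) for _ in range(5000)]
y_sim = [1 if -0.5 + 1.0 * xi + rng.gauss(0, 1) > 0 else 0 for xi in x_sim]
b0_hat, b1_hat = probit_mle(y_sim, x_sim)
print(f"P(deny | P/I = 0.3, Black = 0) = {prob_denied(0.3, 0):.3f}")
print(f"effect of Black at P/I = 0.3:    {effect_black(0.3):.3f}")
print(f"ML estimates: b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

Because the probit log-likelihood is concave in the coefficients, Newton-type iterations such as this one typically converge from a starting value of zero.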

Question 7. [20 points total] Assume that you have a data set that recorded all marriages in the Netherlands that started in a specific month in 2009. The data follow these marriages until October 2010, when the data collection stops.
(i). [5 points] In the data, marriage durations can be observed. During the observation period, some marriages ended, while others were still ongoing in October 2010. Explain, using both a formula and words, how you would estimate the empirical hazard rate in this case, and explain what it measures.
(ii). [5 points] Assume that the empirical hazard rate of marriage durations decreases monotonically over time. In the case of marriage durations, give at least two examples that could explain such a decreasing hazard rate, where one explanation should involve selection.
(iii). [5 points] Assume now that you would like to use maximum likelihood estimation to estimate a hazard rate model for the duration of marriages. The hazard rate is specified as:
$\lambda(t; X) = \lambda_0(t)\exp(\beta_0 + \beta_1\,\text{gender} + \beta_2\,\text{age} + \beta_3\,\text{education})$
where gender is an indicator for being female, age indicates age in years, and education measures years of schooling. You now have to choose between different parametric specifications of the duration dependence in $\lambda_0(t)$. Motivate your choice of duration dependence and explain how the chosen duration dependence enters the formula above.
(iv). [5 points] In the model above, you needed to specify the duration dependence parametrically. Explain what the major problem is with having to choose a parametric specification, and propose a solution to the problem. Explain both in words and using a formula.
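The empirical hazard in part (i) can be computed as $\hat{h}(t) = d_t / n_t$, where $d_t$ is the number of marriages observed to end in period $t$ and $n_t$ is the number still at risk at the start of period $t$. A minimal sketch follows, assuming durations are measured in whole months and that a marriage still ongoing when data collection stopped (right-censored) stays in the risk set up to and including its last observed month. The function name and the six-marriage toy data set are hypothetical.

```python
from collections import Counter


def empirical_hazard(durations, ended):
    """Discrete-time empirical hazard h(t) = d_t / n_t for t = 1, ..., max duration.

    durations: months from the wedding until divorce or until observation stopped
    ended:     True if the marriage was observed to end, False if right-censored
    d_t: marriages observed to end in month t
    n_t: marriages at risk in month t (duration >= t, censored ones included)
    """
    exits = Counter(d for d, e in zip(durations, ended) if e)
    hazard = {}
    for t in range(1, max(durations) + 1):
        at_risk = sum(1 for d in durations if d >= t)  # at least one, since t <= max
        hazard[t] = exits[t] / at_risk
    return hazard


# Hypothetical toy data: six marriages, two observed to end, four censored
durations = [2, 3, 3, 5, 5, 5]
ended = [True, True, False, False, False, False]
for t, h in empirical_hazard(durations, ended).items():
    print(f"month {t}: h = {h:.3f}")
```

Keeping censored marriages in the denominator until they drop out of observation is what distinguishes this estimator from simply dividing the number of divorces by the number of marriages.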

