
HW 1. Due 1/25. As always, write professionally. The grader will deduct points for
non-professional-looking documents, even if the answers are otherwise essentially correct. Follow
the same guidelines I gave in ISQS 5347. Include R code neatly for all problems so that the code
can be copied and pasted into R.

1. Your health club charges you $100 for initial membership, and $5 per visit thereafter.
After X = x visits, how much money (Y) have you spent? Explain why the model is
deterministic.

2. Give p(y | X = 3) for problem 1. (Note: it is a probability distribution that puts 100%
probability on a single number; this is called a degenerate probability distribution.)

3. How could you change the problem statement in problem 1 so that the amount of
money you have spent after X = 3 visits has a non-degenerate probability distribution?
Keep the problem real: about the health club and the money Y. Do not discuss simulation or
other fake data.

4. Suppose the classical regression model holds, with β0 = 10, β1 = 1 and σ = 3. Using R,
graph the distribution of Y when X = 5, and when X = 10 on the same axes, with different
plot line types. Include labels.

5. Using the model in problem 4, find the probability that Y | X = 5 is between 10 and 20. Use R.

6. Simulate data from the model in 4. as follows: (a) generate n=10 X values from the
Poisson distribution with λ = 12. (b) using the X values in (a), generate Y values
corresponding to the model.

7. Is the model in 6. a fixed-x or a random-X model? Explain, using the definitions in the
book.

8. Using the simulated data in 6, draw the scatterplot, and overlay (i) the true regression
function (ii) the least squares estimate of the true regression function, and (iii) the
LOESS estimate of the true regression function. Which of these three functions is
random? Which is fixed? How do you know for sure?

9. Which of the two estimated functions in 8. gives an estimate of E(Y | X = 5) that is
closer to the true value of E(Y | X = 5)? (To answer, you must first get the actual value of
E(Y | X = 5) and the two estimated values, then simply check which is closer.)

10. Repeat problem 9 with n = 100.

11. Repeat problem 9 with n = 1000.

12. Problems 10 and 11 have larger sample sizes. Based on your answers, what benefit do you
see in having a larger sample size?

13. Use the data set
toluca = read.table("http://westfall.ba.ttu.edu/isqs5349/Rdata/toluca.txt")
Let Y = workhours and X = lotsize. Draw the scatterplot, and overlay (i) the least squares
estimate of the true regression function, and (ii) the LOWESS (or LOESS) estimate of the
true regression function.

14. (i) For the data in problem 13, define the “true regression function” using the definition
that involves conditional distributions. (ii) Why did you not overlay this true regression
function in problem 13? (iii) Which estimate do you think is closer to the true regression function,
the least squares line or the LOWESS (LOESS) fit? (There is no clearly correct answer
here. See the book’s discussion, and use your judgment.)

15. Use the data set
cp = read.table("http://westfall.ba.ttu.edu/isqs5349/Rdata/complex.txt")
Let Y = Pref and X = Complex. Draw the scatterplot, and overlay (i) the least squares
estimate of the true regression function, and (ii) the LOWESS (or LOESS) estimate of the
true regression function.

16. (i) For the data in problem 15, define the “true regression function” using the definition
that involves conditional distributions. (ii) Why did you not overlay this true regression
function in problem 15? (iii) Which estimate do you think is closer to the true regression function,
the least squares line or the LOWESS (LOESS) fit? (There is a clearly correct answer here.
See the book’s discussion.)
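
For orientation only, here is a minimal R sketch of the setup used in problems 4 through 8. It assumes the classical model means Y | X = x is normal with mean β0 + β1x and standard deviation σ. The helper names (cond_mean, prob_between), the random seed, and the plotting grid are illustrative choices, not something the assignment prescribes, and LOWESS is used here as one possible smoother.

```r
# Parameters stated in problem 4
beta0 <- 10
beta1 <- 1
sigma <- 3

# True regression function E(Y | X = x) under the assumed model (hypothetical helper)
cond_mean <- function(x) beta0 + beta1 * x

# P(lower < Y < upper | X = x), computed from the normal CDF (hypothetical helper)
prob_between <- function(x, lower, upper) {
  pnorm(upper, mean = cond_mean(x), sd = sigma) -
    pnorm(lower, mean = cond_mean(x), sd = sigma)
}

# Densities of Y | X = 5 and Y | X = 10 on the same axes, with different line types
y_grid <- seq(0, 35, length.out = 400)  # grid chosen for illustration
plot(y_grid, dnorm(y_grid, mean = cond_mean(5), sd = sigma),
     type = "l", lty = 1, xlab = "y", ylab = "p(y | X = x)")
lines(y_grid, dnorm(y_grid, mean = cond_mean(10), sd = sigma), lty = 2)
legend("topright", legend = c("X = 5", "X = 10"), lty = c(1, 2))

# Probability that Y | X = 5 lies between 10 and 20
p_5 <- prob_between(5, 10, 20)

# Simulated data: X from Poisson(12), then Y from the model given X
set.seed(12345)  # arbitrary seed, only for reproducibility
n <- 10
x_sim <- rpois(n, lambda = 12)
y_sim <- cond_mean(x_sim) + rnorm(n, mean = 0, sd = sigma)

# Scatterplot with the true line, the least squares line, and a LOWESS curve
plot(x_sim, y_sim, xlab = "X", ylab = "Y")
abline(a = beta0, b = beta1, lty = 1)     # true regression function
ls_fit <- lm(y_sim ~ x_sim)
abline(ls_fit, lty = 2)                   # least squares estimate
lines(lowess(x_sim, y_sim), lty = 3)      # LOWESS estimate
legend("topleft", legend = c("True", "Least squares", "LOWESS"), lty = 1:3)
```

Changing n to 100 or 1000 and rerunning the simulation lines gives the larger samples asked for in problems 10 and 11.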
