Regression 1
Q1 Write the simple linear regression model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

in matrix notation, stating clearly the contents of your design matrix $X$ and of the vectors $y$, $\beta$ and $\varepsilon$.
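Form the information matrix $X^T X$, its inverse $(X^T X)^{-1}$, and $X^T y$. Hence show that

$$
\hat{\beta} = (X^T X)^{-1} X^T y
= \frac{1}{n \sum x_i^2 - \left(\sum x_i\right)^2}
\begin{pmatrix}
\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \\
-\sum x_i \sum y_i + n \sum x_i y_i
\end{pmatrix}.
$$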
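The following quick numerical check, in Python, confirms that the matrix route $(X^T X)^{-1} X^T y$ and the written-out formula give the same estimates. The toy data and function names are illustrative assumptions and are not part of the exercise. $X$ is taken to have rows $(1, x_i)$.

```python
# Check that (X^T X)^{-1} X^T y matches the written-out closed form
# for y_i = b0 + b1*x_i + e_i. Data below are illustrative only.

def normal_equation_fit(x, y):
    """Return (b0, b1) via the 2x2 inverse of X^T X, X having rows [1, x_i]."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    # Inverse of a 2x2 matrix: (1/det) * [[sxx, -sx], [-sx, n]]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    return b0, b1

def closed_form_fit(x, y):
    """Same estimates, written exactly as in the displayed formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sxx - sx ** 2
    return (sxx * sy - sx * sxy) / denom, (-sx * sy + n * sxy) / denom

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]
print(normal_equation_fit(x, y))
print(closed_form_fit(x, y))
```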
Express $\hat\beta_1$ and $\hat\beta_0$ in the form given below. To obtain the expression for $\hat\beta_0$ in that form, it helps to add and subtract $n^2 \bar{x}^2 \bar{y}$.
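Assume the intended forms are the usual centred ones. Define the shorthand $S_{xx} = \sum (x_i - \bar{x})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$. Under that assumption, the add-and-subtract step can be sketched as follows:

$$
\begin{aligned}
n\sum x_i^2 - \Big(\sum x_i\Big)^2 &= n\Big(\sum x_i^2 - n\bar{x}^2\Big) = n S_{xx},\\
\hat\beta_1 &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n S_{xx}} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{S_{xx}} = \frac{S_{xy}}{S_{xx}},\\
\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i &= n\bar{y} \sum x_i^2 - n^2\bar{x}^2\bar{y} + n^2\bar{x}^2\bar{y} - n\bar{x}\sum x_i y_i\\
&= n\bar{y}\, S_{xx} - n\bar{x}\, S_{xy},\\
\hat\beta_0 &= \frac{n\bar{y}\, S_{xx} - n\bar{x}\, S_{xy}}{n S_{xx}} = \bar{y} - \hat\beta_1 \bar{x}.
\end{aligned}
$$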
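The variance-covariance matrix of $\hat\beta = (\hat\beta_0, \hat\beta_1)^T$ is $\operatorname{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$, where $\sigma^2 = \operatorname{Var}(\varepsilon_i)$. What does this tell you about $\hat\beta_0$ and $\hat\beta_1$ here?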
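The following small sketch illustrates the standard result $\operatorname{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1}$ for the simple model. It assumes $\sigma^2 = \operatorname{Var}(\varepsilon_i)$ is known, and the function name is hypothetical. Writing the $2 \times 2$ inverse out entry by entry exposes the covariance term directly.

```python
# Var(beta_hat) = sigma^2 (X^T X)^{-1} for the simple linear model,
# with X having rows [1, x_i]. sigma2 is assumed known for illustration.

def beta_covariance(x, sigma2=1.0):
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx           # equals n * S_xx
    return [[sigma2 * sxx / det, -sigma2 * sx / det],
            [-sigma2 * sx / det, sigma2 * n / det]]

cov = beta_covariance([1.0, 2.0, 3.0])
print(cov)
```

The off-diagonal entry equals $-\sigma^2 \bar{x} / S_{xx}$, with $S_{xx} = \sum (x_i - \bar{x})^2$. Hence $\hat\beta_0$ and $\hat\beta_1$ are correlated unless $\bar{x} = 0$.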
Q3 In the usual multiple linear regression model $y = X\beta + \varepsilon$, the estimated or fitted values are $\hat{y} = X\hat\beta$ and the residuals are $e = y - \hat{y} = (I - H)y$, where $H = X(X^T X)^{-1} X^T$. Show that $H = H^T$, $H^2 = H$ and $(I - H)^2 = I - H$.
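The following numerical sanity check (not a proof) tests the three identities on a small illustrative design with an intercept and one predictor, taking $H = X(X^T X)^{-1} X^T$. Plain Python lists keep the sketch dependency-free. The helper names and data are hypothetical.

```python
# Numerical check (not a proof) that H = X (X^T X)^{-1} X^T is symmetric
# and idempotent, and that I - H is idempotent, for an illustrative design.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inverse_2x2(m):
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def hat_matrix(x):
    X = [[1.0, xi] for xi in x]       # intercept column plus one predictor
    Xt = transpose(X)
    return matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)

def close(a, b, tol=1e-10):
    return all(abs(u - v) < tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))

x = [0.0, 1.0, 2.0, 5.0]
H = hat_matrix(x)
n = len(x)
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
IminusH = [[I[i][j] - H[i][j] for j in range(n)] for i in range(n)]

print(close(H, transpose(H)))                    # H = H^T
print(close(matmul(H, H), H))                    # H^2 = H
print(close(matmul(IminusH, IminusH), IminusH))  # (I - H)^2 = I - H
```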
Q4 The following data relate biomass production of soyabeans to cumulative intercepted solar radiation over an 8-week period following emergence. Biomass production is the mean dry weight, in grams, of independent samples of four plants.
Use the following R commands to carry out a linear regression analysis of plant biomass on solar radiation. Evaluate 95% confidence intervals for the regression coefficients $\beta_0$ and $\beta_1$. Comment on your results.
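The following minimal sketch shows the interval calculation in Python rather than R. It assumes the usual $t$-based intervals $\hat\beta_j \pm t_{n-2,\,0.975}\,\mathrm{se}(\hat\beta_j)$. The function name `coef_confints` and the toy data are hypothetical; they are not the soyabean measurements. The critical value is passed in because the Python standard library has no $t$ quantile function. In R it is `qt(0.975, n - 2)`.

```python
import math

# 95% confidence intervals for b0 and b1 in simple linear regression.
# t_crit is the 0.975 quantile of t on n - 2 degrees of freedom,
# supplied by the caller (e.g. from R's qt(0.975, n - 2)).

def coef_confints(x, y, t_crit):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                            # residual mean square
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return {
        "b0": (b0, b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "b1": (b1, b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }

# Illustrative toy data (not the soyabean data): n = 4, so 2 degrees of freedom.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]
print(coef_confints(x, y, t_crit=4.303))
```

In R, `confint()` applied to a fitted `lm` object should return intervals of this same form.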
Q5 A hotel experienced an outbreak of Pseudomonas dermatitis among its guests. Physicians suspected the source of infection to be the hotel whirlpool spa. The data in the table give the number of female guests and the number infected, by categories of time (minutes) spent in the whirlpool.
(a) State the basic assumptions of least squares regression and comment, if you can, on
whether each is satisfied by these data.
(b) Using R, perform a linear regression analysis of the incidence of infection (number infected / number exposed) on time spent in the whirlpool, using the midpoint of each time interval as the independent variable. Estimate the intercept and the slope, and plot the regression line together with the data. Comment on your results.
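The following sketch covers the data preparation and fit for part (b) in Python. It assumes the table gives time intervals with lower and upper limits, together with the numbers exposed and infected. The helper names and every number below are hypothetical placeholders, not the whirlpool data.

```python
# Part (b) preparation: proportions infected, interval midpoints as x,
# and an ordinary least-squares fit. All inputs below are hypothetical.

def midpoints(intervals):
    return [(lo + hi) / 2.0 for lo, hi in intervals]

def proportions(infected, exposed):
    return [i / m for i, m in zip(infected, exposed)]

def ls_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1       # (intercept, slope)

intervals = [(0, 10), (10, 20), (20, 30)]   # hypothetical time bands (minutes)
infected = [1, 2, 3]                         # hypothetical counts
exposed = [10, 10, 10]                       # hypothetical counts
x = midpoints(intervals)
p = proportions(infected, exposed)
b0, b1 = ls_fit(x, p)
print(b0, b1)
```

If guests are infected independently, each proportion has binomial variance $p(1-p)/m$ for $m$ guests exposed. That variance generally differs between groups, which bears on the constant-variance assumption in part (a).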
Q6 Show that, for any multiple regression model with an intercept (a constant term $\beta_0$ in the model), the sum of the residuals equals zero.
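The following numerical illustration (not a proof) covers the one-predictor case, using arbitrary illustrative data and hypothetical function names. With an intercept, the residuals sum to zero up to rounding. A fit through the origin generally does not. In matrix terms this reflects the normal equations $X^T e = 0$, whose first row reads $\sum e_i = 0$ when the first column of $X$ is all ones.

```python
# Residuals from least squares with and without an intercept.
# Data are arbitrary illustrative values.

def residuals_with_intercept(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def residuals_through_origin(x, y):
    # No intercept: b = sum(x*y) / sum(x^2); residuals need not sum to zero.
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return [yi - b * xi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 4.0, 7.0]
y = [2.5, 2.9, 5.1, 6.8]
print(sum(residuals_with_intercept(x, y)))   # zero up to rounding error
print(sum(residuals_through_origin(x, y)))   # generally nonzero
```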