330 Lecture5 2015
29.07.2015
Housekeeping
- Contact details
  Name            Office    @auckland.ac.nz  Office hours
  Steffen Klaere  303.219   s.klaere         9:30-10:30, Thu+Fri
  Arden Miller    303.229C  a.miller         Wed 9-10, Thu 12-13
- Class representatives
  Name              Course  @aucklanduni.ac.nz
  Jessica Courtney  330     jcou608
  Monica Hill       330     mhil084
  Ben Wilson        762     bwil003
Tutor Office Hours
- Blake Seers
  - Tuesday, 9-11
  - Thursday, 10-11
  - Friday, 14-15
- Hongbin Guo
  - Monday, 9-11
  - Tuesday, 15-16
  - Wednesday, 14-15
- Stage 3 Assistance Room 303S.294
Assignments
http://www.rstudio.com/
http://r-project.org/
Mosaicplot: The Titanic
> data(Titanic)
> mosaicplot(Titanic,main="Survival on the Titanic",
+ col=c("tomato","steelblue"))
[Figure: mosaic plot "Survival on the Titanic". Columns: Class (1st, 2nd, 3rd, Crew), subdivided by Age (Child, Adult). Rows: Sex (Male, Female), subdivided by Survived (No, Yes).]
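The counts behind the mosaic plot can also be inspected numerically. The following is a minimal sketch, not part of the original slides, that collapses the built-in `Titanic` table to Class by Survived and converts the counts to within-class proportions. The object names `class_surv` and `surv_rate` are illustrative.

```r
# Built-in 4-way table with dimensions Class, Sex, Age, Survived
data(Titanic)

# Collapse over Sex and Age: counts for Class (dim 1) by Survived (dim 4)
class_surv <- margin.table(Titanic, c(1, 4))

# Row proportions: share of "No"/"Yes" within each class
surv_rate <- prop.table(class_surv, 1)
round(surv_rate, 3)
```

Each row of `surv_rate` sums to one, so the "Yes" column can be read as the survival proportion within a class.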
Today's lecture: The multiple regression model
$$\mu = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
- The relationship
$$\mu = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
$\beta_0, \beta_1, \ldots, \beta_k$.
[Figure: 3D plot of the regression plane; axes Y, X1 and X2.]
Non-deterministic form of model
$$Y = \mu + \varepsilon = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$
where $\varepsilon$ is normally distributed with mean zero and variance $\sigma^2$.
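One way to get a feel for this form of the model is to simulate from it. The sketch below assumes k = 2 and uses made-up values for the coefficients and the error standard deviation. None of these numbers, nor the seed, come from the lecture.

```r
set.seed(330)                 # arbitrary seed, for reproducibility only
n  <- 200
x1 <- runif(n, -1, 1)         # hypothetical explanatory variables
x2 <- runif(n, 0, 1)

beta  <- c(1, 0.5, -0.8)      # hypothetical beta_0, beta_1, beta_2
sigma <- 0.3                  # hypothetical error standard deviation

mu  <- beta[1] + beta[2] * x1 + beta[3] * x2   # deterministic part: the plane
eps <- rnorm(n, mean = 0, sd = sigma)          # normal errors, mean 0, variance sigma^2
y   <- mu + eps                                # observed response scatters about the plane
```

Plotting `y` against `x1` and `x2` would show points scattered around the plane `mu`, with the spread governed by `sigma`.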
[Figure: 3D plot with axes Y, X1 and X2.]
Checking the data
[Figure: conditioning plot (coplot) with panels given x2 and given x3; axis label x1.]
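A conditioning plot with "Given :" panels can be produced with `coplot()`. The sketch below uses simulated data, because the data behind the slide are not available here. The data frame `sim.df`, its variable ranges and the coefficients are assumptions made for illustration only.

```r
set.seed(1)                               # arbitrary seed
# Hypothetical data: three explanatory variables and one response
sim.df <- data.frame(x1 = runif(100, 0, 5),
                     x2 = runif(100, 0, 100),
                     x3 = runif(100, 0, 12))
sim.df$y <- with(sim.df, 1 + x1 + 0.05 * x2 + 0.2 * x3 + rnorm(100))

# Scatter plots of y against x1, one panel per combination of
# overlapping intervals of x2 and x3 (3 intervals for each)
coplot(y ~ x1 | x2 * x3, data = sim.df, number = 3)
```

If the relationship between `y` and `x1` looks similar in every panel, that is informal evidence that the effect of `x1` does not depend on the values of the conditioning variables.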
Interpretation of coefficients
$$y = 5 - 2x_1 + 4x_2 + \varepsilon.$$
[Figure: two panels of y plotted against x1.]
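The meaning of each coefficient can be checked with a little arithmetic on the mean of the example model, 5 - 2 x1 + 4 x2. In the sketch below, the helper `mean_y` and the chosen input values are illustrative, not from the slides.

```r
# Mean response of the example model (error term omitted)
mean_y <- function(x1, x2) 5 - 2 * x1 + 4 * x2

# Increase x1 by one unit while holding x2 fixed at 0.5
d_x1 <- mean_y(1, 0.5) - mean_y(0, 0.5)
d_x1   # -2, the coefficient of x1

# Increase x2 by one unit while holding x1 fixed at 0
d_x2 <- mean_y(0, 1.5) - mean_y(0, 0.5)
d_x2   # 4, the coefficient of x2
```

The change does not depend on the value at which the other variable is held, which is what makes the surface a plane.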
Points to note
$$X^{\top} X b = X^{\top} y.$$
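These normal equations can be solved directly and compared with `lm()`. The sketch below simulates data from the example model y = 5 - 2 x1 + 4 x2 + error. The sample size, seed and error standard deviation are assumptions.

```r
set.seed(2)                                # arbitrary seed
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 - 2 * x1 + 4 * x2 + rnorm(n, sd = 0.3)   # sd = 0.3 is an assumption

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Solve (X'X) b = X'y for b
b <- solve(crossprod(X), crossprod(X, y))

# lm() gives the same estimates (it uses a QR decomposition internally)
fit <- lm(y ~ x1 + x2)
cbind(normal_eq = drop(b), lm = coef(fit))
```

Solving the normal equations explicitly is fine for illustration, but it can be numerically less stable than the approach `lm()` takes when the columns of X are nearly collinear.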
Calculation of best fitting plane
> summary(lm(y~x1+x2))
Call:
lm(formula = y ~ x1 + x2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.9863     0.0304  164.01   <2e-16 ***
x1           -2.0098     0.1015  -19.80   <2e-16 ***
x2            3.9800     0.1015   39.22   <2e-16 ***
---
- Fitted plane is
$$y = b_0 + b_1 x_1 + b_2 x_2.$$
The residual is the difference between the response and the fitted value:
$$r_i = y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2}) = y_i - \hat{y}_i.$$
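Fitted values and residuals can be computed by hand from the estimated coefficients and checked against R's extractor functions. This is a sketch on simulated data. The seed, sample size and error standard deviation are assumptions.

```r
set.seed(3)                                # arbitrary seed
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 - 2 * x1 + 4 * x2 + rnorm(n, sd = 0.3)   # simulated response

fit <- lm(y ~ x1 + x2)
b   <- unname(coef(fit))                   # b0, b1, b2

y_hat <- b[1] + b[2] * x1 + b[3] * x2      # fitted values on the plane
r     <- y - y_hat                         # residuals: response minus fitted value

# Agrees with the built-in extractors
all.equal(r, unname(residuals(fit)))
all.equal(y_hat, unname(fitted(fit)))
```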
How well does the plane fit?
$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
is an overall measure of the size of the residuals.
Measuring goodness of fit
$$\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
is an overall measure of the variability of the data.
Math stuff: ANOVA identity
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\mathrm{TSS} = \mathrm{RegSS} + \mathrm{RSS}$$
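The identity can be verified numerically for any least-squares fit that includes an intercept. The sketch below does so on simulated data. The seed, sample size and error standard deviation are assumptions.

```r
set.seed(4)                                # arbitrary seed
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 - 2 * x1 + 4 * x2 + rnorm(n, sd = 0.3)   # simulated response

fit   <- lm(y ~ x1 + x2)
y_hat <- fitted(fit)

TSS   <- sum((y - mean(y))^2)       # total sum of squares
RegSS <- sum((y_hat - mean(y))^2)   # regression sum of squares
RSS   <- sum((y - y_hat)^2)         # residual sum of squares

all.equal(TSS, RegSS + RSS)         # the ANOVA identity
```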
- $R^2$ is defined as
$$R^2 = 1 - \mathrm{RSS}/\mathrm{TSS} = \mathrm{RegSS}/\mathrm{TSS}$$
- $\sigma^2$ is estimated by
$$s^2 = \frac{\mathrm{RSS}}{n - k - 1}.$$
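Both quantities are reported by `summary()` on an `lm` fit, so the formulas can be checked by hand. This is a sketch on simulated data with k = 2 explanatory variables. The seed, sample size and error standard deviation are assumptions.

```r
set.seed(5)                                # arbitrary seed
n  <- 100
k  <- 2                                    # number of explanatory variables
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 - 2 * x1 + 4 * x2 + rnorm(n, sd = 0.3)   # simulated response

fit <- lm(y ~ x1 + x2)
RSS <- sum(residuals(fit)^2)
TSS <- sum((y - mean(y))^2)

R2 <- 1 - RSS / TSS                        # coefficient of determination
s2 <- RSS / (n - k - 1)                    # estimate of sigma^2

# Compare with the values reported by summary()
all.equal(R2, summary(fit)$r.squared)
all.equal(sqrt(s2), summary(fit)$sigma)    # "Residual standard error"
```

The square root of `s2` is what the `summary()` printout labels the residual standard error, on n - k - 1 degrees of freedom.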
Call:
lm(formula = volume ~ diameter + height, data = cherry.df)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
diameter      4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *
---
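The data frame `cherry.df` is not included with these notes. The sketch below assumes it corresponds to R's built-in `trees` data (black cherry trees, with `Girth` playing the role of diameter) and rebuilds it under that assumption. The renaming and the object name `cherry.lm` are illustrative.

```r
# Assumption: cherry.df holds the same measurements as the built-in trees data
cherry.df <- with(trees, data.frame(diameter = Girth,
                                    height   = Height,
                                    volume   = Volume))

# Regress volume on diameter and height
cherry.lm <- lm(volume ~ diameter + height, data = cherry.df)
summary(cherry.lm)
```

If the assumption holds, the coefficient table printed by `summary(cherry.lm)` should agree with the output shown above.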