Chapter 8: Multiple and Logistic Regression
OpenIntro Statistics, 3rd Edition
Weights of books

[Scatterplot: weight (g) vs. volume (cm³) for a sample of books, with a fitted regression line.]

The scatterplot shows the weights and volumes of books, and the output below gives the corresponding regression results. Which of the below is correct?
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 107.67931   88.37758   1.218    0.245
volume        0.70864    0.09746   7.271 6.26e-06
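From this output, the estimated regression line is

$$\widehat{\text{weight}} = 107.68 + 0.71 \times \text{volume}$$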
Weights of hardcover and paperback books

[Scatterplot: weight (g) vs. volume (cm³), with books distinguished by cover type (hardcover vs. paperback).]
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  197.96284   59.19274   3.344 0.005841 **
volume         0.71795    0.06153  11.669  6.6e-08 ***
cover:pb    -184.04727   40.49420  -4.545 0.000672 ***
Determining the reference level

Based on the regression output above, which cover type is the reference level?

(a) paperback
(b) hardcover
Linear model

[Scatterplot of weight (g) vs. volume (cm³), with separate fitted lines for hardcover and paperback books.]
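From the regression output above, the fitted model is

$$\widehat{\text{weight}} = 197.96 + 0.72 \times \text{volume} - 184.05 \times \text{cover:pb}$$

where cover:pb = 1 for paperback books and 0 for hardcover books.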
Interpretation of the regression coefficients

• Slope of volume: All else held constant, books that are 1 cubic centimeter larger in volume tend to weigh about 0.72 grams more.
• Slope of cover: All else held constant, the model predicts that paperback books weigh about 184 grams less than hardcover books.
• Intercept: Hardcover books with no volume are expected, on average, to weigh 198 grams.
• Obviously, the intercept does not make sense in context. It only serves to adjust the height of the line.
Prediction
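As a worked example (the 600 cm³ volume is a hypothetical value chosen for illustration), the predicted weight of a paperback book with a volume of 600 cm³ is

$$\widehat{\text{weight}} = 197.96 + 0.72 \times 600 - 184.05 \approx 446 \text{ grams}$$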
Another example: Modeling kids' test scores

Gelman & Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. (2007) Cambridge University Press.
Interpreting the slope

All else held constant, kids with mothers whose IQs are one point higher tend to score on average 0.56 points higher.
Interpreting the intercept

Kids whose moms haven't gone to high school, did not work during the first three years of the kid's life, have an IQ of 0, and are 0 years old are expected on average to score 19.59. Obviously, the intercept does not make any sense in context.
Interpreting the slope

All else being equal, kids whose moms worked during the first three years of the kid's life ...
[Pairwise scatterplot matrix of poverty, metro_res, white, hs_grad, and female_house, with pairwise correlations; notably, poverty vs. female_house: R = 0.53 and poverty vs. hs_grad: R = −0.75.]
Predicting poverty using % female householder

[Scatterplot: % in poverty vs. % female householder, with a fitted regression line.]

R = 0.53

R² = 0.53² = 0.28
Another look at R²

$$R^2 = \frac{\text{explained variability in } y}{\text{total variability in } y}$$
Sum of squares

Sum of squares of y: $SS_{Total} = \sum (y - \bar{y})^2 = 480.25$ → total variability

Sum of squares of residuals: $SS_{Error} = \sum e_i^2 = 347.68$ → unexplained variability

Model sum of squares: $SS_{Model} = SS_{Total} - SS_{Error} = 480.25 - 347.68 = 132.57$ → explained variability
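Dividing the explained variability by the total variability recovers the R² reported earlier:

$$R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{132.57}{480.25} \approx 0.28$$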
Why bother?
Predicting poverty using % female hh + % white

[Same pairwise scatterplot matrix as above: poverty, metro_res, white, hs_grad, and female_house.]
Collinearity between explanatory variables

Collinearity between explanatory variables (cont.)
R² vs. adjusted R²

                             R²     Adjusted R²
Model 1 (Single-predictor)   0.28   0.26
Model 2 (Multiple)           0.29   0.26
Adjusted R²

$$R^2_{adj} = 1 - \left( \frac{SS_{Error}}{SS_{Total}} \times \frac{n-1}{n-p-1} \right)$$

where n is the number of cases and p is the number of predictors (explanatory variables) in the model.
Calculate adjusted R²

$$\begin{aligned}
R^2_{adj} &= 1 - \left( \frac{SS_{Error}}{SS_{Total}} \times \frac{n-1}{n-p-1} \right) \\
          &= 1 - \left( \frac{339.47}{480.25} \times \frac{51-1}{51-2-1} \right) \\
          &= 1 - \left( \frac{339.47}{480.25} \times \frac{50}{48} \right) \\
          &= 1 - 0.74 \\
          &= 0.26
\end{aligned}$$
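The same calculation as a minimal R sketch (the SS values come from the slides above; the helper function name is ours):

# Adjusted R^2 from sums of squares
adj_r_squared <- function(ss_error, ss_total, n, p) {
  1 - (ss_error / ss_total) * (n - 1) / (n - p - 1)
}
adj_r_squared(ss_error = 339.47, ss_total = 480.25, n = 51, p = 2)
# roughly 0.2637, i.e. 0.26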
Model selection

Beauty in the classroom

Hamermesh & Parker. (2004) "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity."
Professor rating vs. beauty

[Scatterplot: professor evaluation vs. beauty score.]
Which of the below is correct based on the model output?
Exploratory analysis

[Scatterplot: professor evaluation vs. beauty score, with points distinguished by gender (male, female).]

Any interesting features? For a given beauty score, are male professors evaluated higher, lower, or about the same as female professors?
Professor rating vs. beauty + gender

(a) higher
(b) lower
(c) about the same
Full model
Assessing significance: numerical variables

(a) Since the p-value is positive, the higher the professor's age, the higher we would expect them to be rated.
(b) If we keep all other variables in the model, there is strong evidence that professor's age is associated with their rating.
(c) The probability that the true slope parameter for age is 0 is 0.01.
(d) There is about a 1% chance that the true slope parameter for age is -0.0089.
Assessing significance: categorical variables
Model selection strategies

Based on what we've learned so far, what are some ways you can think of to determine which variables to keep in the model and which to leave out?
Backward-elimination

1. R²adj approach:
• Start with the full model
• Drop one variable at a time and record the R²adj of each smaller model
• Pick the model with the highest increase in R²adj
• Repeat until none of the models yield an increase in R²adj

2. p-value approach:
• Start with the full model
• Drop the variable with the highest p-value and refit a smaller model
• Repeat until all variables left in the model are significant
Backward-elimination: R²adj approach

Call:
lm(formula = profevaluation ~ beauty + gender + age + formal +
    native + tenure, data = d)

Coefficients:
       (Intercept)             beauty         gendermale
          4.628435           0.105546           0.208079
               age          formalyes  nativenon english
         -0.008844           0.132422          -0.243003
tenuretenure track      tenuretenured
         -0.206784          -0.175967
step function in R

Call:
lm(formula = profevaluation ~ beauty + gender + age + formal +
    native + tenure, data = d)

Coefficients:
       (Intercept)             beauty         gendermale
          4.628435           0.105546           0.208079
               age          formalyes  nativenon english
         -0.008844           0.132422          -0.243003
tenuretenure track      tenuretenured
         -0.206784          -0.175967
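A minimal sketch of how such a model could be produced with R's built-in step() function (note that step() selects by AIC rather than by R²adj or p-values; the data frame d and the formula follow the output above):

# Fit the full model, then repeatedly drop the predictor whose removal
# most improves AIC, stopping when no removal helps.
full <- lm(profevaluation ~ beauty + gender + age + formal +
             native + tenure, data = d)
final <- step(full, direction = "backward")
summary(final)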
Backward-elimination: p-value approach
Forward-selection

1. R²adj approach:
• Start with regressions of the response vs. each explanatory variable
• Pick the model with the highest R²adj
• Add the remaining variables one at a time to the existing model, and once again pick the model with the highest R²adj
• Repeat until the addition of any of the remaining variables does not result in a higher R²adj

2. p-value approach:
• Start with regressions of the response vs. each explanatory variable
• Pick the variable with the lowest significant p-value
• Add the remaining variables one at a time to the existing model, and pick the variable with the lowest significant p-value
• Repeat until none of the remaining variables has a significant p-value
Selected model
Checking model conditions using
graphs
Modeling conditions
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
(1) nearly normal residuals

[Plots: normal probability plot of residuals, and histogram of residuals.]
(2) constant variability in residuals

[Plots: residuals vs. fitted values, and absolute residuals vs. fitted values.]
Checking constant variance - recap
(3) independent residuals

[Plot: residuals in the order of data collection.]
More on the condition of independent residuals
(4) linear relationships

[Plots: residuals vs. beauty, and residuals vs. age.]

Note: We use residuals instead of the predictors on the y-axis so that we can still check for linearity without worrying about other possible violations like collinearity between the predictors.
Logistic regression

Regression so far ...

• Multiple regression
  • Relationship between a numerical response and multiple numerical and/or categorical predictors
Odds

If p is the probability of an event, the odds of the event are p / (1 − p).
Example - Donner Party

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)
Example - Donner Party - Data
Example - Donner Party - EDA

           Male   Female
Died         20        5
Survived     10       10

[Barplot of counts by survival status (Died, Survived).]
Example - Donner Party

One way to think about the problem - we can treat Survived and Died as successes and failures arising from a binomial distribution where the probability of a success is given by a transformation of a linear model of the predictors.
Generalized linear models

It turns out that this is a very general way of addressing this type of problem in regression, and the resulting models are called generalized linear models (GLMs). Logistic regression is just one example of this type of model.
Logistic Regression

Logit function:

$$\text{logit}(p) = \log\left(\frac{p}{1-p}\right), \quad \text{for } 0 < p < 1$$
Properties of the Logit

The inverse of the logit is the logistic function:

$$g^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{1 + \exp(-x)}$$

The logistic regression model:

$$y_i \sim \text{Binom}(p_i), \qquad \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n, \qquad \text{logit}(p) = \eta$$
## Call:
## glm(formula = Status ~ Age, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.81852    0.99937   1.820   0.0688 .
## Age         -0.06647    0.03222  -2.063   0.0391 *
##
##     Null deviance: 61.827  on 44  degrees of freedom
## Residual deviance: 56.291  on 43  degrees of freedom
## AIC: 60.291
Example - Donner Party - Prediction

Model:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$
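As a worked example (the age of 25 is a hypothetical value chosen for illustration), the predicted log odds and survival probability for a 25-year-old party member are

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times 25 = 0.156, \qquad p = \frac{\exp(0.156)}{1 + \exp(0.156)} \approx 0.54$$

In R, the inverse logit is available as plogis(), so plogis(1.8185 - 0.0665 * 25) gives the same probability.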
Example - Donner Party - Prediction (cont.)

Model:

$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$

[Plot: observed survival status (0/1) and the fitted logistic curve vs. age, with male and female party members marked.]
Example - Donner Party - Interpretation

Intercept: The log odds of survival for a party member with an age of 0. From this we can calculate the odds or probability, but additional calculations are necessary.

Slope: For a unit increase in age (being 1 year older), how much the log odds will change; this is not particularly intuitive. More often than not we care only about the sign and relative magnitude.
Example - Donner Party - Interpretation - Slope

$$\log\left(\frac{p_1}{1-p_1}\right) = 1.8185 - 0.0665(x+1) = 1.8185 - 0.0665x - 0.0665$$

$$\log\left(\frac{p_2}{1-p_2}\right) = 1.8185 - 0.0665x$$

$$\log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_2}{1-p_2}\right) = -0.0665$$

$$\log\left(\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2}\right) = -0.0665$$

$$\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2} = \exp(-0.0665) = 0.94$$
Example - Donner Party - Age and Gender

## Call:
## glm(formula = Status ~ Age + Sex, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.63312    1.11018   1.471   0.1413
## Age         -0.07820    0.03728  -2.097   0.0359 *
## SexFemale    1.59729    0.75547   2.114   0.0345 *
## ---
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 61.827  on 44  degrees of freedom
## Residual deviance: 51.256  on 42  degrees of freedom
## AIC: 57.256
##
## Number of Fisher Scoring iterations: 4
Gender slope: When the other predictors are held constant, this is the difference in log odds of survival between female and male party members.
Example - Donner Party - Gender Models

Just like MLR, we can plug in gender to arrive at two status vs. age models, for men and women respectively.

General model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times \text{Sex}$$

Male model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 0 = 1.63312 - 0.07820 \times \text{Age}$$

Female model:

$$\log\left(\frac{p}{1-p}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 1 = 3.23041 - 0.07820 \times \text{Age}$$
Example - Donner Party - Gender Models (cont.)

[Plot: fitted survival probability vs. age, with separate curves for males and females; the female curve lies above the male curve.]
Hypothesis test for the whole model

## Call:
## glm(formula = Status ~ Age + Sex, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.63312    1.11018   1.471   0.1413
## Age         -0.07820    0.03728  -2.097   0.0359 *
## SexFemale    1.59729    0.75547   2.114   0.0345 *
## ---
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 61.827  on 44  degrees of freedom
## Residual deviance: 51.256  on 42  degrees of freedom
## AIC: 57.256
##
## Number of Fisher Scoring iterations: 4

Note: The model output does not include any F statistic; as a general rule, there are no single-model hypothesis tests for GLMs.
Hypothesis tests for a coefficient
Note: The only tricky bit, which is way beyond the scope of this course, is how the
standard error is calculated.
Testing for the slope of Age

$$H_0: \beta_{age} = 0 \qquad H_A: \beta_{age} \neq 0$$

$$Z = \frac{\hat{\beta}_{age} - \beta_{age}}{SE_{age}} = \frac{-0.0782 - 0}{0.0373} = -2.10$$

$$\text{p-value} = P(|Z| > 2.10) = P(Z > 2.10) + P(Z < -2.10) = 2 \times 0.0178 = 0.0359$$
Confidence interval for age slope coefficient

$$CI = \hat{\beta}_{age} \pm 1.96 \times SE_{age} = -0.0782 \pm 1.96 \times 0.0373 = (-0.1513,\ -0.0051)$$

Odds ratio:

$$\exp(CI) = (\exp(-0.1513),\ \exp(-0.0051)) = (0.8596,\ 0.9949)$$
Example - Birdkeeping and Lung Cancer
From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)
Example - Birdkeeping and Lung Cancer - Data

      LC          FM      SS    BK      AG     YR     CD
1     LungCancer  Male    Low   Bird    37.00  19.00  12.00
2     LungCancer  Male    Low   Bird    41.00  22.00  15.00
3     LungCancer  Male    High  NoBird  43.00  19.00  15.00
...
147   NoCancer    Female  Low   NoBird  65.00   7.00   2.00

[Scatterplot: YR vs. AG, with plotting symbols distinguishing bird keepers from non-bird keepers and lung cancer cases from controls.]
Example - Birdkeeping and Lung Cancer - Model

summary(glm(LC ~ FM + SS + BK + AG + YR + CD, data = bird, family = binomial))

## Call:
## glm(formula = LC ~ FM + SS + BK + AG + YR + CD, family = binomial,
##     data = bird)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.93736    1.80425  -1.074 0.282924
## FMFemale     0.56127    0.53116   1.057 0.290653
## SSHigh       0.10545    0.46885   0.225 0.822050
## BKBird       1.36259    0.41128   3.313 0.000923 ***
## AG          -0.03976    0.03548  -1.120 0.262503
## YR           0.07287    0.02649   2.751 0.005940 **
## CD           0.02602    0.02552   1.019 0.308055
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 187.14  on 146  degrees of freedom
## Residual deviance: 154.20  on 140  degrees of freedom
## AIC: 168.2
Example - Birdkeeping and Lung Cancer - Interpretation
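A worked reading of the BKBird coefficient from the output above: holding the other predictors constant, the odds ratio of lung cancer for bird keepers relative to non-bird keepers is

$$\exp(1.36259) \approx 3.91$$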
What do the numbers not mean ...

Bird keepers are not 4x more likely to develop lung cancer than non-bird keepers.

$$RR = \frac{P(\text{disease} \mid \text{exposed})}{P(\text{disease} \mid \text{unexposed})}$$

$$OR = \frac{P(\text{disease} \mid \text{exposed}) / [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) / [1 - P(\text{disease} \mid \text{unexposed})]}$$
Back to the birds

If the probability of lung cancer among non-bird keepers is 0.05, the model's odds ratio implies

$$P(\text{lung cancer} \mid \text{birds}) = \frac{3.91 \times \frac{0.05}{0.95}}{1 + 3.91 \times \frac{0.05}{0.95}} = 0.171$$

[Curve: P(lung cancer | birds) as a function of the baseline probability P(lung cancer | no birds).]
Bird OR Curve

[The same curve, with the point corresponding to a baseline probability of 0.05 marked.]
OR Curves
(An old) Example - House

If you've ever watched the TV show House on Fox, you know that Dr. House regularly states, "It's never lupus."

The test for lupus is very accurate if the person actually has lupus, but is very inaccurate if the person does not. More specifically, the test is 98% accurate if a person actually has the disease, and 74% accurate if a person does not have the disease.

Is Dr. House correct even if someone tests positive for lupus?
(An old) Example - House

Lupus?          Result
yes (0.02)  →   positive (0.98):  0.02 × 0.98 = 0.0196
            →   negative (0.02):  0.02 × 0.02 = 0.0004
no (0.98)   →   positive (0.26):  0.98 × 0.26 = 0.2548
            →   negative (0.74):  0.98 × 0.74 = 0.7252

$$P(\text{Lupus} \mid +) = \frac{P(+, \text{Lupus})}{P(+, \text{Lupus}) + P(+, \text{No Lupus})} = \frac{0.0196}{0.0196 + 0.2548} = 0.0714$$
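The same calculation as a minimal R sketch (variable names are ours):

# Prevalence, accuracy for diseased, and false positive rate from the tree above
p_lupus <- 0.02
p_pos_given_lupus <- 0.98
p_pos_given_no_lupus <- 0.26   # 1 - 0.74
# Bayes' rule: P(lupus | positive test)
(p_lupus * p_pos_given_lupus) /
  (p_lupus * p_pos_given_lupus + (1 - p_lupus) * p_pos_given_no_lupus)
# roughly 0.0714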
Testing for lupus
Sensitivity and Specificity

Sensitivity: P(positive test | condition present) — the true positive rate.
Specificity: P(negative test | condition absent) — the true negative rate.
Sensitivity and Specificity (cont.)

                 Condition Positive     Condition Negative
Test Positive    True Positive (TP)     False Positive (FP)
Test Negative    False Negative (FN)    True Negative (TN)

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

Additionally, our brief foray into power analysis before the first midterm should also give you an idea about the trade-offs that are inherent in minimizing false positive and false negative rates (increasing power required either increasing α or n).
Back to Spam

[Plot: predicted probability of spam for each email, separated by true class: 1 (spam) vs. 0 (not spam).]
Picking a threshold

[Same plot, with a candidate probability threshold marked; emails above the threshold are classified as spam.]
Consequences of picking a threshold

For our data set, picking a threshold of 0.75 gives us the following results:

FN = 340    TP = 27
TN = 3545   FP = 9

What are the sensitivity and specificity for this particular decision rule?
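Using the definitions above, the answer works out to

Sensitivity = TP / (TP + FN) = 27 / (27 + 340) ≈ 0.074
Specificity = TN / (TN + FP) = 3545 / (3545 + 9) ≈ 0.997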
Trying other thresholds

[Same plot of predicted probabilities, with several alternative thresholds marked.]
Relationship between Sensitivity and Specificity

[Scatterplot: sensitivity vs. specificity, one point per threshold, showing the trade-off between the two as the threshold changes.]

[The same points replotted as sensitivity vs. 1 − specificity.]
Receiver operating characteristic (ROC) curve

[ROC curve: true positive rate vs. false positive rate across all thresholds.]
Receiver operating characteristic (ROC) curve (cont.)

• Shows the trade-off in sensitivity and specificity for all possible thresholds.
• Straightforward to compare performance vs. chance.
• Can use the area under the curve (AUC) as an assessment of the predictive ability of a model.
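A minimal base-R sketch of how the points on such a curve can be computed (the vectors pred and truth are hypothetical stand-ins for the model's predicted probabilities and the true 0/1 labels):

# Sweep thresholds and record (FPR, TPR) at each one
roc_points <- function(pred, truth, thresholds = seq(0, 1, by = 0.01)) {
  t(sapply(thresholds, function(th) {
    yhat <- as.integer(pred >= th)                          # classify at this threshold
    tpr  <- sum(yhat == 1 & truth == 1) / sum(truth == 1)   # sensitivity
    fpr  <- sum(yhat == 1 & truth == 0) / sum(truth == 0)   # 1 - specificity
    c(fpr = fpr, tpr = tpr)
  }))
}
# plot(roc_points(pred, truth), type = "l",
#      xlab = "False positive rate", ylab = "True positive rate")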
Refining the Spam model

[ROC curve for the refined spam model.]
Utility Functions
Utility function for our spam filter

Outcome           Utility
True Positive     1
True Negative     1
False Positive
False Negative    -5

For the email data set, picking a threshold of 0.75 gives us the following results:

FN = 340    TP = 27
TN = 3545   FP = 9
Utility for the 0.75 threshold

For the email data set, picking a threshold of 0.75 gives us the following results:

FN = 340    TP = 27
TN = 3545   FP = 9
Utility curve

[Plot: total utility U as a function of the classification threshold.]
Utility curve (zoom)

[Plot: utility U for thresholds p > 0.6, zoomed in around the maximum.]
Maximum Utility

[Plot: predicted probabilities for spam (1) and not spam (0), with the utility-maximizing threshold marked.]