Chapter 4: Drawing Conclusions
Department of Statistics
The Chinese University of Hong Kong
2020/21 Term 1
Chapter Outline
• Ch 2 & Ch 3: OLS Estimates, MLE, Properties, Confidence
Intervals, Hypothesis Testing
• Ch 4: Interpret the results from the OLS estimates, including
Section 4.1: Understanding Parameter Estimates
- How we should interpret the parameters
Section 4.2: Experimental vs. Observational Explanatory Variables
- How the data were collected
Section 4.3: Notes on R2
- Situations when R2 is useful/not useful
Section 4.1
Understanding Parameter Estimates
Interpretation of Beta
Multiple Linear Regression: E(Y | X) = β0 + β1x1 + … + βpxp
Consider the fitted regression line for the Fuel data:
E(Fuel | X) = 154.19 − 4.23 Tax + 0.47 Dlic − 6.14 Income + 18.54 log2(Miles)
• βi is the rate of change of y with respect to xi, after adjusting for the other
variables. I.e. unit of βi = unit of y / unit of xi.
E.g. Fuel decreases by 6.14 gallons when Income increases by $1000;
Fuel increases by 18.54 gallons when Miles is doubled (since log2(2x) = 1 + log2(x))
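The "doubling" reading of a log2 term can be checked numerically. Below is a minimal sketch with synthetic data; the variable names and coefficients are made up for illustration and are not the actual Fuel data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
income = rng.uniform(25, 45, n)        # synthetic "Income" (in $1000s)
miles = rng.uniform(1e3, 3e5, n)       # synthetic "Miles"
log2_miles = np.log2(miles)

# Assumed model (for illustration only): Fuel = 100 - 6*Income + 18*log2(Miles) + noise
fuel = 100 - 6.0 * income + 18.0 * log2_miles + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), income, log2_miles])
b0, b_inc, b_log2 = np.linalg.lstsq(X, fuel, rcond=None)[0]

# Doubling Miles adds exactly b_log2 gallons, since log2(2m) = 1 + log2(m)
pred = lambda inc, mi: b0 + b_inc * inc + b_log2 * np.log2(mi)
m = 50_000.0
diff = pred(35, 2 * m) - pred(35, m)
print(abs(diff - b_log2) < 1e-8)  # True
```

The same check works for any base: with natural logs, doubling Miles adds β·ln(2) instead of β itself, which is why base 2 makes the interpretation cleaner.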
Issues with the above interpretation:
• The sign/magnitude of an estimate may not be consistent
with your intuition, because the interpretation assumes the other terms
stay unchanged even though correlations exist between terms.
• The value of a parameter estimate may change if the other terms
are replaced by a linear combination of the terms in the model.
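The first issue can be seen in a small simulation: when two terms are strongly correlated, the slope from a simple regression of y on x1 alone can even have the opposite sign of the partial slope of x1 in the multiple regression. A sketch with synthetic data (all coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)       # x2 strongly correlated with x1
y = 1.0 * x1 - 2.0 * x2 + rng.normal(0, 0.5, n)

# Simple regression of y on x1 alone: the slope absorbs x2's effect
# through the correlation (here roughly 1 - 2*0.9 = -0.8)
b_simple = np.polyfit(x1, y, 1)[0]

# Multiple regression: partial slope of x1, adjusting for x2 (near +1)
X = np.column_stack([np.ones(n), x1, x2])
b_multi = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(b_simple < 0, b_multi > 0)  # True True: opposite signs
```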
Example - Berkeley Guidance Study
Berkeley Guidance Study: Want to relate the growth of weights
with somatotype (body type) of girls (n=70).
Responses: Somatotype (Y) at age 18, a scale of 1 to 7 to quantify
the body shape of a person based on photos
[Scale: 1 = Very thin, 4 = Average, 7 = Obese]
Explanatory Variables:
1. WT2 = Weight (in kg) at Age 2
2. WT9 = Weight (in kg) at Age 9
3. WT18 = Weight (in kg) at Age 18
Correlation Matrices:
OLS Estimates – Linear Transformation
Finding #2: An invertible linear transformation of the terms in a multiple linear
regression does not alter the least-squares fitted values (although the
individual coefficient estimates change accordingly):
Model 2
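This invariance can be verified numerically: replacing the terms by an invertible linear combination changes the individual coefficients but leaves the fitted values (and hence residuals and R2) unchanged. A sketch with synthetic data (the model and numbers are illustrative, not the slide's Model 2):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 + x1 + 3 * x2 + rng.normal(size=n)

# Model A: terms (x1, x2)
XA = np.column_stack([np.ones(n), x1, x2])
# Model B: terms replaced by an invertible linear combination (x1, x1 + x2)
XB = np.column_stack([np.ones(n), x1, x1 + x2])

bA = np.linalg.lstsq(XA, y, rcond=None)[0]
bB = np.linalg.lstsq(XB, y, rcond=None)[0]

print(np.allclose(bA, bB))            # False: coefficient estimates differ
print(np.allclose(XA @ bA, XB @ bB))  # True: fitted values are identical
```

The two design matrices span the same column space, so the least-squares projection of y is the same; only the coordinates of that projection (the coefficients) change.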
Section 4.2
Experimental vs. Observational Explanatory
Variables
Experimental vs. Observational Explanatory Variables
Types of Explanatory Variables (EVs) in regression analysis:
• Experimental EVs:
• Values are under the control of the experimenter
• Values are assigned based on randomization scheme
• Observational EVs: Values are observed (not set by the
experimenter)
• Values are observed via sampling, beyond the control of the
experimenter
Example: Investigate factors affecting the crop yield
• Experimental EVs: Amount of fertilizer, water, spacing of
plants, etc.
• Observational EVs: Soil fertility, temperature, weather
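The value of randomization can be illustrated with a small simulation of this crop example: when a lurking variable (here, soil fertility) drives both fertilizer use and yield, the observational slope is biased, while randomized assignment recovers a slope near the causal effect. All coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
soil = rng.normal(size=n)  # lurking variable: soil fertility

# Observational: farmers with better soil also happen to apply more fertilizer
fert_obs = 1.0 * soil + rng.normal(size=n)
yield_obs = 2.0 * fert_obs + 3.0 * soil + rng.normal(size=n)

# Experimental: fertilizer assigned at random, independent of soil
fert_exp = rng.normal(size=n)
yield_exp = 2.0 * fert_exp + 3.0 * soil + rng.normal(size=n)

slope = lambda x, y: np.polyfit(x, y, 1)[0]
b_obs = slope(fert_obs, yield_obs)   # biased upward: confounded by soil
b_exp = slope(fert_exp, yield_exp)   # close to the assumed causal effect, 2
print(abs(b_exp - 2.0) < abs(b_obs - 2.0))  # True
```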
Experimentation vs. Observation
Primary difference between the two types of EVs:
*** Different inferences can be made ***
Experimentation vs. Observation
Example: Does more mobile phone usage decrease brain activity?
Experimental Study
1. Find a group of people and randomly assign them into groups
(Randomization helps to average out the lurking variables)
2. For each group, have them use a mobile phone for a different
amount of time (X) [could be unethical]
3. Measure their brain activities (Y)
4. Regress brain activities (Y) on time (X)
Possible Conclusion: More phone usage causes lower brain
activity
Experimentation vs. Observation
Example: Does more mobile phone usage decrease brain activity?
Observational Study
1. Find a group of people via sampling (e.g. on the street
randomly?)
2. Measure their brain activities (Y) and habit of using mobile
phone (e.g. hours/week, average time to sleep, … etc)
3. Regress brain activities (Y) on time (X)
Possible Conclusion: More phone usage is associated with lower
brain activities
Example – Cake (Ch6)
• Problem: To study the palatability
score of cake (i.e. acceptability in
terms of taste) by baking temperature
and baking time
• Variables:
Y = Score for cake (The higher the better)
X1 = baking time (in minutes)
X2 = baking temperature (in °F)
• n=14 observations: 6 cakes with x1= 35 minutes and x2 = 350F, and
the other 8 cakes with x1 and x2 scattered around (x1, x2)=(35, 350).
Experimental study
1) Explanatory variables are controlled by the experimenter
2) Lurking variables (if any) are averaged out
3) Strong inference: baking time and temperature at (x1*, x2*) will give
the best cake.
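The analysis behind such a strong inference can be sketched as fitting a second-order (quadratic) model in (x1, x2) and solving for the stationary point (x1*, x2*). The surface, noise level, and peak location below are invented for illustration; this is not the Ch 6 analysis itself:

```python
import numpy as np

rng = np.random.default_rng(4)
# Design: 6 center points at (35, 350) plus 8 points scattered around it
center = np.array([35.0, 350.0])
design = np.vstack([np.tile(center, (6, 1)),
                    center + rng.normal(0, [3.0, 15.0], size=(8, 2))])
t, temp = design[:, 0], design[:, 1]

# Assumed true surface (illustrative): palatability peaks near (37, 358)
score = 5 - 0.05 * (t - 37) ** 2 - 0.002 * (temp - 358) ** 2 \
        + rng.normal(0, 0.02, 14)

# Fit a full second-order model in (t, temp)
X = np.column_stack([np.ones(14), t, temp, t ** 2, temp ** 2, t * temp])
b = np.linalg.lstsq(X, score, rcond=None)[0]

# Stationary point of the fitted quadratic: set the gradient to zero
H = np.array([[2 * b[3], b[5]], [b[5], 2 * b[4]]])
g = -np.array([b[1], b[2]])
t_star, temp_star = np.linalg.solve(H, g)
print(t_star, temp_star)  # should land near the assumed peak (37, 358)
```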
Section 4.3
Notes on R2
Notes on R2
• R2 tends to be large if the X are dispersed
• R2 tends to be small if the X are concentrated
=> Need to be careful about sampling!
[Scatterplot panels: R2 = 0.241, R2 = 0.372, R2 = 0.027]
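This dependence of R2 on the spread of X can be demonstrated directly: the same true line with the same error variance gives very different R2 values depending on how dispersed the sampled x values are. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

n = 1000
x_wide = rng.uniform(0, 10, n)         # dispersed X
x_narrow = rng.uniform(4.5, 5.5, n)    # concentrated X
model = lambda x: 2 + 0.5 * x + rng.normal(0, 1, n)  # same line, same noise

r2_wide = r_squared(x_wide, model(x_wide))
r2_narrow = r_squared(x_narrow, model(x_narrow))
print(r2_wide > r2_narrow)  # True: dispersion in X alone inflates R^2
```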
Notes on R2
Simple linear regression: R2 is a useful measure of goodness-of-fit
if and only if the scatterplot looks like a sample from a bivariate
normal distribution (elliptical bivariate pdf)
R2 a useful goodness-of-fit measure: elliptical scatterplot
R2 NOT a useful goodness-of-fit measure:
• Leverage point
• Non-linear mean function
• Lurking variable
Notes on R2
• Multiple linear regression: R2 is a useful goodness-of-fit
measure if the variables (terms and responses) follow a
multivariate normal distribution (i.e. ellipsoid joint pdf)
• Very hard to justify from the data, especially when (1) the
number of variables is large but (2) the data are sparse
• A residual plot is extremely helpful: check for
possible non-null plot behavior in the residuals.
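For example, fitting a straight line to data whose mean function is non-linear leaves a systematic (non-null) pattern in the residuals, which a residual plot reveals even when R2 looks respectable. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.uniform(-3, 3, n)
y = 1 + x ** 2 + rng.normal(0, 0.3, n)  # mean function is non-linear in x

# Fit a straight line anyway
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# Non-null behavior: residuals are negative near the middle and positive
# at the extremes -- the U-shape a residual plot would show
mid = resid[np.abs(x) < 1].mean()
ends = resid[np.abs(x) > 2].mean()
print(mid < 0 < ends)  # True
```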