Linear Regression - 13
November 8, 2019
Iván Sisa, MD, MPH, MS
isisa@usfq.edu.ec
Class topics
Fundamentals of linear regression
Interpretation
Diagnostic tests
Class objectives
By the end of this class, the student is expected to
be able to:
• ... the association between age and BPF using a
correlation coefficient
• Can we fit a line to this data?
[Figure: scatter plot of BPF (0.75 to 0.95) against age (20 to 60)]
Quick math review
... increase in y ...
• ... relationship) in the data?
• Could we fit a line to this data?
How do we find the best line?
• What is a way to
determine the best line
to use?
What is linear regression?
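• Linear regression tries to find the best line
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1$$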
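The least-squares estimates of the slope and intercept can be sketched in a few lines of Python. This is only an illustration: the function name `fit_line` and the age/BPF values are made up for the example and are not the class dataset.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator and denominator of the slope estimate
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # The fitted line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical illustration data (NOT the class dataset)
age = [25, 30, 35, 40, 45, 50, 55, 60]
bpf = [0.90, 0.86, 0.87, 0.83, 0.85, 0.80, 0.81, 0.78]

b0, b1 = fit_line(age, bpf)
print(f"intercept = {b0:.4f}, slope = {b1:.5f}")
```

With these made-up values the slope comes out negative, in the same direction as the age/BPF relationship discussed in class.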
Example
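• Here is a regression equation for the comparison of age and BPF
$$E(\mathrm{BPF} \mid \mathrm{age}) = \beta_0 + \beta_1\,\mathrm{age}$$
• Observed data points plus the residuals
$$\mathrm{BPF}_i = \beta_0 + \beta_1\,\mathrm{age}_i + \varepsilon_i$$
[Figure: scatter plot of BPF (0.75 to 0.95) against age (20 to 60)]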
Results
• The estimated regression equation
$$\widehat{\mathrm{BPF}} = 0.957 - 0.0029 \times \mathrm{age}$$
[Figure: BPF and fitted values (predval) plotted against age (20 to 60)]
. regress bpf age
Estimated intercept
Interpretation of regression
coefficients
• The final regression equation is BPF-hat = 0.957 - 0.0029 × age
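• Standard errors of the estimated coefficients
$$\widehat{se}(\hat{\beta}_0) = s_{y|x} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
$$\widehat{se}(\hat{\beta}_1) = \frac{s_{y|x}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$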
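A minimal sketch of the two standard-error formulas. The slides do not define s_{y|x}, so this assumes the usual residual standard deviation with n - 2 degrees of freedom. The function name and the small dataset are hypothetical.

```python
from math import sqrt
from statistics import mean

def coefficient_standard_errors(x, y):
    """Return (se_b0, se_b1) for a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Assumption: s_{y|x} is the residual standard deviation on n - 2 df
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = sqrt(sse / (n - 2))
    se_b0 = s_yx * sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = s_yx / sqrt(sxx)
    return se_b0, se_b1

# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
x = [0, 1, 2, 3]
y = [0, 1, 1, 2]
se_b0, se_b1 = coefficient_standard_errors(x, y)
print(f"se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
```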
Comparison to correlation
• In this example, we found a relationship between
age and BPF. We also investigated this relationship
using correlation
• We get the same p-value!!
• Our conclusion is exactly the same!!
Method             p-value
Correlation        0.001
Linear regression  0.001
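One way to see why the two p-values agree: with a single predictor, the t statistic for the slope equals the t statistic of the usual test for Pearson's r, so both are compared against the same t distribution with n - 2 degrees of freedom. The sketch below assumes the correlation p-value comes from that t test. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def slope_and_correlation_t(x, y):
    """Return (t for the slope, t for Pearson's r) on the same data."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Regression side: slope divided by its standard error
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = sqrt(sse / (n - 2))
    t_slope = b1 / (s_yx / sqrt(sxx))
    # Correlation side: t = r * sqrt(n - 2) / sqrt(1 - r^2)
    r = sxy / sqrt(sxx * syy)
    t_corr = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return t_slope, t_corr

# Hypothetical toy data
t_slope, t_corr = slope_and_correlation_t([0, 1, 2, 3], [0, 1, 1, 2])
print(t_slope, t_corr)
```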
Assumptions of linear regression
• Independence
– All of the data points are independent
– Correlated data points can be taken into account using
multivariate and longitudinal data methods
• Linearity
– Linear relationship between outcome and predictors
• Homoscedasticity of the residuals
– The residuals, ε_i, have the same variance
• Normality of the residuals
– The residuals, ε_i, are normally distributed
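A crude numerical look at the residuals, sketched under stated assumptions. It computes the residuals from a least-squares fit and compares their variance in the lower and upper halves of the x range. This is not a formal diagnostic test and it is no substitute for residual plots. The data are made up.

```python
from statistics import mean, variance

def residuals(x, y):
    """Residuals y_i - (b0 + b1 * x_i) from the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical illustration data (NOT the class dataset)
age = [25, 30, 35, 40, 45, 50, 55, 60]
bpf = [0.90, 0.86, 0.87, 0.83, 0.85, 0.80, 0.81, 0.78]

res = residuals(age, bpf)
# Split residuals by age to compare spread (rough homoscedasticity check)
pairs = sorted(zip(age, res))
half = len(pairs) // 2
low = [r for _, r in pairs[:half]]
high = [r for _, r in pairs[half:]]
print("sum of residuals:", sum(res))
print("residual variance, younger half:", variance(low))
print("residual variance, older half:", variance(high))
```

Very different variances in the two halves would hint that the equal-variance assumption deserves a closer look with a residual plot.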
Linearity assumption
• One of the assumptions of linear regression is that
the relationship between the predictors and the
outcomes is linear
[Figure annotations: estimated intercept; 95% confidence interval for intercept]
R²
• Although we have found a relationship between age
and BPF, linear regression also allows us to assess
how well our model fits the data
• R² = coefficient of determination = proportion of
variance in the outcome explained by the model
– When we have only one predictor, it is the proportion of
the variance in y explained by x
r vs. R²
• R² = (Pearson's correlation coefficient)² = r²
• Since r is between -1 and 1, R² is never larger than |r|
– r = 0.1, R² = 0.01
– r = 0.5, R² = 0.25
Method  Estimate
r       -0.577
R²      0.333
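The identity R² = r² for a single predictor can be checked numerically. In this sketch R² is computed as 1 - SSE/SST from the fitted line, and r is computed directly from the data. The function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def r_squared(x, y):
    """Proportion of variance in y explained by the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical toy data
x = [0, 1, 2, 3]
y = [0, 1, 1, 2]
print(pearson_r(x, y) ** 2, r_squared(x, y))

# The estimates reported on the slide are consistent with each other
print(round((-0.577) ** 2, 3))
```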
Prediction
• Beyond determining if there is a significant
association, linear regression can also be used to
make predictions
[Figure: BPF and fitted values (predval) plotted against age (20 to 60)]
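As a worked illustration of prediction, the estimated equation from the Results slide can be evaluated at a chosen age. The age of 40 is picked here only as an example within the observed range of 20 to 60. It is not a value taken from the slides.

```latex
\widehat{\mathrm{BPF}}(\mathrm{age} = 40)
  = 0.957 - 0.0029 \times 40
  = 0.957 - 0.116
  = 0.841
```

So the predicted mean BPF for a 40-year-old is about 0.841 under this fitted line.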
Confidence interval for mean value
• We can place a confidence interval around our
predicted mean value
• This corresponds to the plausible values for the mean
BPF at a specific age
• To calculate a confidence interval for the predicted
mean value, we need an estimate of variability in the
predicted mean
wider for more uncertainty
$$\widehat{se}(\hat{y}) = s_{y|x} \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
Confidence interval
• Note that the standard error of the predicted mean depends
on the x value. In particular, it is smallest when x equals
the mean of x, so the interval is narrowest there
[Figure: confidence interval plotted against age (20 to 60)]
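The dependence of the standard error on x can be sketched as follows. The function evaluates the standard error of the predicted mean at a given x, again assuming s_{y|x} is the residual standard deviation on n - 2 degrees of freedom. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def se_predicted_mean(x, y, x_new):
    """Standard error of the predicted mean of y at x_new."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Assumption: s_{y|x} is the residual standard deviation on n - 2 df
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = sqrt(sse / (n - 2))
    return s_yx * sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)

# Hypothetical illustration data (NOT the class dataset)
age = [25, 30, 35, 40, 45, 50, 55, 60]
bpf = [0.90, 0.86, 0.87, 0.83, 0.85, 0.80, 0.81, 0.78]

for a in (25, mean(age), 60):
    print(a, se_predicted_mean(age, bpf, a))
```

The value at the mean age is the smallest of the three. A confidence interval for the mean would then be the predicted value plus or minus a t quantile (n - 2 degrees of freedom) times this standard error.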
What we have learned
Assessing the relationship between two continuous variables
Linear regression