Data analysis
Regression
Summary
1 Normal distribution
2 Correlation
3 Linear regression
Linear model
Example ADCP
4 Tasks
Multiple regression Lab.
Bibliography
Normal Correlation Linear regression Tasks
$f_X(x) = \frac{1}{(2\pi)^{n/2} (\det\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x-\mu)' \Sigma^{-1} (x-\mu) \right)$
Mean: E[X] = µ
Covariance: $\mathrm{Cov}[X] = E[(X-\mu)(X-\mu)'] = \Sigma$, with $\det\Sigma \neq 0$
Symmetric with respect to µ
Sums, marginals and conditionals of normal variables are normal
Sums of many independent non-normal variables are approximately normal (central limit theorem)
Simulate N(µ, Σ)
Factorise (Cholesky) $\Sigma = T T'$
Simulate k independent N(0,1) → u
Compute z = T u
Add the mean: x = z + µ
Covariance of z: $\mathrm{Cov}[z] = T\,\mathrm{Cov}[u]\,T' = T T' = \Sigma$
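The simulation recipe above can be sketched in Python with NumPy. This is a minimal illustration, not the course's own code: the function name `simulate_mvn` and the values of `mu` and `sigma` are assumptions chosen for the example.

```python
import numpy as np

def simulate_mvn(mu, sigma, n_samples, rng):
    """Draw n_samples vectors from N(mu, sigma) using the Cholesky factor of sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    T = np.linalg.cholesky(sigma)                   # lower triangular, sigma = T @ T.T
    u = rng.standard_normal((len(mu), n_samples))   # independent N(0, 1) draws
    z = T @ u                                       # Cov[z] = T @ T.T = sigma
    return (z + mu[:, None]).T                      # add the mean; one sample per row

# Illustrative values (assumed, not taken from the slides)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
rng = np.random.default_rng(0)
x = simulate_mvn(mu, sigma, 100_000, rng)

sample_mean = x.mean(axis=0)            # should be close to mu
sample_cov = np.cov(x, rowvar=False)    # should be close to sigma
```

With a large sample, the sample mean and sample covariance should approach µ and Σ, which is a quick sanity check of the factorisation step.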
PP and QQ-plots
[Figure: QQ-plot, Q predicted against Q observed]
[Figure: PP-plot, P predicted against P observed]
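One common way to obtain the coordinates of a QQ-plot and a PP-plot against a fitted normal distribution is sketched here. The plotting positions (i − 0.5)/n are one convention among several, and the function name `qq_pp_points` is hypothetical.

```python
from statistics import NormalDist
import numpy as np

def qq_pp_points(sample):
    """Return (q_obs, q_pred, p_obs, p_pred) for normal QQ- and PP-plots."""
    q_obs = np.sort(np.asarray(sample, dtype=float))          # observed quantiles
    n = len(q_obs)
    fitted = NormalDist(mu=q_obs.mean(), sigma=q_obs.std(ddof=1))
    p_obs = (np.arange(1, n + 1) - 0.5) / n                   # observed probabilities (assumed convention)
    q_pred = np.array([fitted.inv_cdf(p) for p in p_obs])     # quantiles predicted by the fitted normal
    p_pred = np.array([fitted.cdf(q) for q in q_obs])         # probabilities predicted by the fitted normal
    return q_obs, q_pred, p_obs, p_pred

# Synthetic normal sample, for illustration only
rng = np.random.default_rng(5)
q_obs, q_pred, p_obs, p_pred = qq_pp_points(rng.normal(loc=-1.0, scale=1.2, size=200))
```

For data that are close to normal, both point clouds lie near the diagonal; the QQ-plot is more sensitive in the tails and the PP-plot in the centre.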
Covariance
Properties of covariance
Variance of a linear combination:
$Y = a'X = \sum_i a_i X_i$
$0 \le \mathrm{Var}[Y] = a' \Sigma a$
Non-negative definiteness
For all a, $a' \Sigma a \ge 0$
All eigenvalues are non-negative
If $\det\Sigma = 0$, then $a'(X - \mu) = 0$ (with probability 1) for some non-null a
If $\Sigma^{-1}$ exists, the equation $x' \Sigma^{-1} x = k$ (k > 0) defines an ellipsoid
Covariance is an inner product of random variables
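These properties can be checked numerically. The sketch below uses an assumed 3×3 covariance matrix and an assumed coefficient vector `a`; neither comes from the slides.

```python
import numpy as np

# Illustrative covariance matrix and coefficients (assumed values)
sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])
a = np.array([1.0, -2.0, 0.5])

var_y = a @ sigma @ a                     # Var[a'X] = a' Sigma a
eigenvalues = np.linalg.eigvalsh(sigma)   # all non-negative for a covariance matrix

# Empirical check with simulated data
rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(3), sigma, size=200_000)
var_y_sample = np.var(x @ a, ddof=1)

# Singular case: det = 0, and a non-null vector gives a zero-variance combination
sigma_singular = np.array([[1.0, 1.0],
                           [1.0, 1.0]])
a_null = np.array([1.0, -1.0])
var_null = a_null @ sigma_singular @ a_null
```

In the singular example the two variables are perfectly dependent, so their difference has zero variance.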
Correlation
$\rho_{ij} = \mathrm{Cov}[X_i, X_j] / \sqrt{\mathrm{Var}[X_i]\,\mathrm{Var}[X_j]}$
It ranges $-1 \le \rho_{ij} \le 1$
(Cauchy-Schwarz inequality)
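As a small illustration, a correlation matrix can be obtained from a covariance matrix by dividing each entry by the product of the two standard deviations. The helper name and the example matrix are assumptions.

```python
import numpy as np

def correlation_from_covariance(sigma):
    """Scale a covariance matrix so that its diagonal becomes 1."""
    sigma = np.asarray(sigma, dtype=float)
    sd = np.sqrt(np.diag(sigma))          # standard deviations
    return sigma / np.outer(sd, sd)       # rho_ij = sigma_ij / (sd_i * sd_j)

# Illustrative covariance matrix (assumed values)
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
rho = correlation_from_covariance(sigma)  # off-diagonal entry: 1.2 / (2 * 1)
```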
$\rho_{XY} = \pm 1$ exactly when $Y - \mu_Y = \lambda \cdot (X - \mu_X)$ for some $\lambda \neq 0$ (an exact linear relation)
Y = b0 + b1 X + R , E[R] = 0 , Cov[X , R] = 0
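The two conditions on R determine the coefficients: taking the covariance with X gives Cov[X, Y] = b1 Var[X], and taking expectations gives b0 = E[Y] − b1 E[X]. A minimal sketch with sample moments follows; the data are synthetic and the generating values 1.5 and 0.7 are assumptions for illustration.

```python
import numpy as np

# Synthetic data (assumed values, for illustration only)
rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.5 + 0.7 * x + rng.normal(scale=0.3, size=500)

c = np.cov(x, y)                 # 2x2 sample covariance matrix of (x, y)
b1 = c[0, 1] / c[0, 0]           # b1 = Cov[X, Y] / Var[X]
b0 = y.mean() - b1 * x.mean()    # b0 = E[Y] - b1 * E[X]
r = y - (b0 + b1 * x)            # residual

resid_mean = r.mean()            # zero up to rounding error
resid_cov = np.cov(x, r)[0, 1]   # zero up to rounding error
```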
Linear model
Response data: y1 , y2 , . . . , yn
Covariables, predictors: xi1 , xi2 , . . . , xik
Residuals, errors: e1 , e2 , . . . , en
Model: for i = 1, 2, . . . , n
$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i$
Matrix notation
y = Xb + e
with xi0 = 1, i = 1, 2, . . . , n
Linear model
Model: y = Xb + e
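Find the β's that satisfy the
Criterion: $\|e\|^2$ minimum
Taking derivatives and equating to 0:
$\hat{b} = (X^t X)^{-1} X^t y$
Dimensions: $\hat{b}$ (k+1,1), $X^t$ (k+1,n), $X$ (n,k+1), $X^t$ (k+1,n), $y$ (n,1)
Pseudo-inverse $X^\dagger = (X^t X)^{-1} X^t$: $X^\dagger X X^\dagger = X^\dagger$, $X X^\dagger X = X$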
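The least-squares solution and the two pseudo-inverse identities can be verified numerically. This is a sketch on synthetic data; the sizes `n`, `k` and the vector `b_true` are assumptions for the example.

```python
import numpy as np

# Synthetic design matrix with an intercept column x_i0 = 1 (assumed sizes and values)
rng = np.random.default_rng(3)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # shape (n, k+1)
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + rng.normal(scale=0.1, size=n)

# Normal equations: (X^t X) b = X^t y
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate through the pseudo-inverse and through a least-squares solver
X_dagger = np.linalg.pinv(X)                                  # shape (k+1, n)
b_pinv = X_dagger @ y
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

residual = y - X @ b_hat                                      # orthogonal to the columns of X
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; for ill-conditioned X, a solver such as `np.linalg.lstsq` is usually the safer choice.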
Linear model
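Sums of squares: $SST = \sum_i (y_i - \bar{y})^2$, $SSR = \sum_i (\hat{y}_i - \bar{y})^2$, $SSE = \sum_i (y_i - \hat{y}_i)^2$, with $SST = SSR + SSE$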
Linear model
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
$R^2 = 0$ ⇒ Regression model is useless
$R^2 = 1$ ⇒ No errors at all
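A short sketch of how the sums of squares and R² can be computed, assuming the usual definitions (SST total, SSR explained by the regression, SSE residual) and a design matrix that contains an intercept column. The function name and the data are illustrative assumptions.

```python
import numpy as np

def sums_of_squares(X, y):
    """Return (SST, SSR, SSE) for the least-squares fit of y on X (X includes the intercept)."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b_hat
    sst = np.sum((y - y.mean()) ** 2)        # total
    ssr = np.sum((y_hat - y.mean()) ** 2)    # explained by the regression
    sse = np.sum((y - y_hat) ** 2)           # residual
    return sst, ssr, sse

# Synthetic data (assumed values)
rng = np.random.default_rng(6)
x1 = rng.normal(size=80)
X = np.column_stack([np.ones(80), x1])
y = 0.5 + 2.0 * x1 + rng.normal(scale=0.5, size=80)

sst, ssr, sse = sums_of_squares(X, y)
r2 = ssr / sst                               # equals 1 - sse / sst when an intercept is included
```

The decomposition SST = SSR + SSE relies on the intercept column; without it the two expressions for R² generally differ.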
Linear model
$H_0 : \beta_1 = \beta_2 = \dots = \beta_k = 0 \Leftrightarrow SSR = 0 \Leftrightarrow R^2 = 0$
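This hypothesis is usually tested with the F statistic of the ANOVA table, which compares the regression mean square with the residual mean square. The sketch below assumes a model with intercept and k predictors fitted to n observations; the function name and the numbers in the example are hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) for a model with intercept and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    mean_square_regression = ssr / k
    mean_square_residual = sse / (n - k - 1)
    return mean_square_regression / mean_square_residual

# Hypothetical sums of squares, for illustration only
f_value = f_statistic(ssr=8.0, sse=2.0, n=12, k=1)
```

Under H0 and normal errors, this statistic follows an F distribution with k and n − k − 1 degrees of freedom; large values lead to rejecting H0.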
Example ADCP
v = β0 + β1 · ln h + β2 · h , h = d − d0
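The model is linear in the β's even though it is non-linear in h, so it fits the matrix form y = Xb + e with columns 1, ln h and h. A minimal sketch follows, using synthetic values for `d`, `d0` and `v`; the ADCP data themselves are not reproduced in the slides, and the function name is hypothetical.

```python
import numpy as np

def adcp_design_matrix(d, d0):
    """Columns [1, ln h, h] with h = d - d0, for v = b0 + b1 * ln h + b2 * h."""
    h = np.asarray(d, dtype=float) - d0
    if np.any(h <= 0):
        raise ValueError("ln h requires h = d - d0 > 0")
    return np.column_stack([np.ones_like(h), np.log(h), h])

# Synthetic profile (assumed values, not the measured data)
rng = np.random.default_rng(4)
d = np.linspace(0.5, 14.0, 60)
d0 = 0.0
v = 0.6 + 0.09 * np.log(d - d0) + rng.normal(scale=0.04, size=d.size)

X = adcp_design_matrix(d, d0)
beta_hat, *_ = np.linalg.lstsq(X, v, rcond=None)   # estimates of (b0, b1, b2)
residual = v - X @ beta_hat
```

Because ln h and h are strongly correlated, the two slopes can be poorly determined individually, which is one reason to consider a stepwise selection between them.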
Example ADCP
[Figure: distance to bottom (m) against vel (cm/s)]
Example ADCP
Correlation matrix
Example ADCP
Linear regression
[Figure: vel (cm/s) against log distance to bottom (m)]
Example ADCP
Regression coefficient and ANOVA
Variables entered (model 1): logdfondo. Method: stepwise (criterion: probability of F to enter <= .050, probability of F to remove >= .100).
a. Dependent variable: vel
ANOVA(b)
Model          Sum of squares   df   Mean square   F         Sig.
1 Regression   .663              1   .663          317.312   .000(a)
  Residual     .121             58   .002
  Total        .784             59
a. Predictors: (Constant), logdfondo
b. Dependent variable: vel
Example ADCP
Coefficients(a)
                 Unstandardized coefficients   Standardized coefficients
Model            B      Std. error             Beta                        t        Sig.
1 (Constant)     .614   .010                                               59.083   .000
  logdfondo      .093   .005                   .919                        17.813   .000
a. Dependent variable: vel
Excluded variables(b)
Model      Beta in    t        Sig.   Partial correlation   Collinearity statistics: Tolerance
1 dfondo   -.110(a)   -1.155   .253   -.151                 .291
a. Predictors in the model: (Constant), logdfondo
b. Dependent variable: vel
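The entries of the ANOVA and coefficient tables can be cross-checked against each other. The sketch below uses only the rounded values printed in the tables, so the agreement is approximate; the variable names are illustrative.

```python
# Rounded values copied from the ANOVA and coefficient tables
ssr, sse, sst = 0.663, 0.121, 0.784     # regression, residual and total sums of squares
df_reg, df_res = 1, 58                  # degrees of freedom
f_reported = 317.312
t_logdfondo = 17.813                    # t statistic of the logdfondo coefficient
beta_std = 0.919                        # standardized coefficient of logdfondo

mse = sse / df_res                      # residual mean square, printed as .002
f_from_sums = (ssr / df_reg) / mse      # close to the reported F, up to rounding
r2 = ssr / sst                          # coefficient of determination
t_squared = t_logdfondo ** 2            # with a single predictor, t^2 equals F
beta_squared = beta_std ** 2            # with a single predictor, Beta^2 equals R^2
```

With these rounded inputs, R² comes out at roughly 0.85, so the log-distance to the bottom alone accounts for most of the variability of the velocity, and adding dfondo does not pass the entry criterion (Sig. = .253).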
Example ADCP
Residuals
[Figure: resid vel (cm/s) against log distance to bottom (m)]
[Figure: normal QQ-plot of the residuals, normal resid. quantile against resid. quantile (m/s)]
Bibliography