ES031 MultipleCorrelationMLR
MULTIPLE CORRELATION & MULTIPLE LINEAR REGRESSION
LEARNING OBJECTIVES
LO1: Appraise multiple regression techniques to build
empirical models from engineering and scientific data
CORRELATION w/ multiple
independent variables
MULTIPLE CORRELATION
COEFFICIENT
The multiple correlation coefficient for two independent variables is computed as:

R = sqrt[ (r_yx1² + r_yx2² − 2 · r_yx1 · r_yx2 · r_x1x2) / (1 − r_x1x2²) ]

where:
r_yx1, r_yx2 = correlation coefficients between the dependent variable and each independent variable
r_x1x2 = correlation coefficient between the two independent variables
CORRELATION
MODULE 4
S_yy = Σ y_i² − (Σ y_i)² / n,   summing over i = 1 … n
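The shortcut form of S_yy agrees with the definitional sum of squared deviations from the mean; a minimal sketch in Python (the sample values are illustrative only):

```python
# Hypothetical sample of y values, for illustration only
ys = [3.0, 5.0, 7.0, 9.0]
n = len(ys)

# Shortcut (computational) form: sum of y^2 minus (sum of y)^2 / n
syy_shortcut = sum(y**2 for y in ys) - sum(ys)**2 / n

# Definitional form: sum of squared deviations from the mean
mean = sum(ys) / n
syy_deviation = sum((y - mean)**2 for y in ys)
```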
MULTIPLE LINEAR REGRESSION
MODULE 4
ILLUSTRATIVE EXAMPLE
The Correlation function under Excel's Data Analysis tool displays a
correlation matrix in which each entry is the correlation coefficient
of the two intersecting variables. With multiple independent variables,
the correlation matrix gives a compact view of all pairwise correlation
coefficients.
Variable                  | Loss Factor (y) | Density (x1) | Dielectric Constant (x2)
Loss Factor (y)           | 1               |              |
Density (x1)              | 0.9977          | 1            |
Dielectric Constant (x2)  | 0.9987          | 0.9987       | 1
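Plugging the pairwise correlations from the matrix into the multiple correlation formula gives R directly; a minimal sketch in Python:

```python
import math

# Pairwise correlations read off the correlation matrix
r_y1 = 0.9977   # loss factor (y) vs. density (x1)
r_y2 = 0.9987   # loss factor (y) vs. dielectric constant (x2)
r_12 = 0.9987   # density (x1) vs. dielectric constant (x2)

R = math.sqrt((r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12)
              / (1 - r_12**2))
# R comes out to approximately 0.9987 for this data
```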
ILLUSTRATIVE EXAMPLE
Multiple Correlation Coefficient R
Y = f(X1, X2, …, Xk)
One dependent variable (Y); many independent variables (X1 … Xk)
NEW CONSIDERATIONS
Adding more independent variables to a multiple regression
procedure does not mean the regression will be “better” or offer
better predictions; in fact it can make things worse. This is called
OVERFITTING.
The addition of more independent variables also creates more
relationships among them: the independent variables are not only
potentially related to the dependent variable, they are also
potentially related to each other. When this happens, it is called
MULTICOLLINEARITY.
The ideal case is for all independent variables to be correlated with
the dependent variable but NOT with each other.
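A common diagnostic for multicollinearity (not covered on the slide itself) is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing X_j on the remaining independent variables; values well above 1 (often > 10 as a rule of thumb) flag collinear predictors. A minimal sketch with NumPy, where the `vif` helper is illustrative:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k columns).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
    column j on the remaining columns (with an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return out
```

Two nearly collinear columns produce very large VIFs, while independent columns give VIFs near 1.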
ASSUMPTIONS
Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + εi
Where:
β0 = Y intercept
β1 = slope of Y with variable X1, holding X2, X3, X4, . . . , Xk constant
β2 = slope of Y with variable X2, holding X1, X3, X4, . . . , Xk constant
β3 = slope of Y with variable X3, holding X1, X2, X4,. . . , Xk constant
…
βk = slope of Y with variable Xk, holding X1, X2, X3,. . . , Xk-1 constant
εi = random error in Y for observation i
Yi = β0 + β1X1i + β2X2i + εi
Where:
β0 = Y intercept
β1 = slope of Y with variable X1, holding X2 constant
β2 = slope of Y with variable X2, holding X1 constant
εi = random error in Y for observation i
Ŷi = b0 + b1X1i + b2X2i + ⋯ + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the
estimated intercept, and b1 … bk are the estimated slope coefficients.
INTERPRETING COEFFICIENTS
ŷ = 27 + 9x1 + 12x2

where:
x1 = capital investment ($1000s)
x2 = marketing expenditures ($1000s)
ŷ = predicted sales ($1000s)
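Reading the coefficients: holding marketing fixed, each extra $1,000 of capital investment raises predicted sales by $9,000 (b1 = 9); holding capital fixed, each extra $1,000 of marketing raises predicted sales by $12,000 (b2 = 12). A minimal sketch:

```python
def predicted_sales(capital, marketing):
    # Fitted model from the slide; all quantities are in $1000s
    return 27 + 9 * capital + 12 * marketing

# One extra unit of capital (marketing held fixed) adds b1 = 9 to
# predicted sales; one extra unit of marketing adds b2 = 12.
```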
ILLUSTRATIVE EXAMPLE
Week | Pie Sales | Price ($) | Advertising ($100s)
  1  |    350    |   5.50    |   3.3
  2  |    460    |   7.50    |   3.3
  3  |    350    |   8.00    |   3.0
  4  |    430    |   8.00    |   4.5
  5  |    350    |   6.80    |   3.0
  6  |    380    |   7.50    |   4.0
  7  |    430    |   4.50    |   3.0
  8  |    470    |   6.40    |   3.7
  9  |    450    |   7.00    |   3.5
 10  |    490    |   5.00    |   4.0
 11  |    340    |   7.20    |   3.5
 12  |    300    |   7.90    |   3.2
 13  |    440    |   5.90    |   4.0
 14  |    450    |   5.00    |   3.5
 15  |    300    |   7.00    |   2.7

Multiple regression equation: Sales = b0 + b1(Price) + b2(Advertising)
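The coefficients Excel reports can be reproduced by ordinary least squares; a minimal sketch with NumPy on the pie sales data:

```python
import numpy as np

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470,
                  450, 490, 340, 300, 440, 450, 300], float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
                  7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                  3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

# Design matrix with an intercept column, then least-squares fit
X = np.column_stack([np.ones(len(sales)), price, adv])
b, *_ = np.linalg.lstsq(X, sales, rcond=None)
# b ≈ [306.526, -24.975, 74.131], matching the Excel output
```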
Regression Statistics
R Square           0.52148
Adjusted R Square  0.44172
Standard Error     47.46341
Observations       15

ANOVA        df   SS         MS         F        Significance F
Regression    2   29460.027  14730.013  6.53861  0.01201
Residual     12   27033.306  2252.776
Total        14   56493.333

Fitted equation: Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
COEFFICIENT OF MULTIPLE
DETERMINATION, R²
• The coefficient of determination for the multiple regression model, called the
coefficient of multiple determination, is denoted by 𝑹𝟐
• It tells us how good the multiple regression model is and how well the
independent variables included in the model explain the dependent variable.
MULTIPLE COEFFICIENT OF DETERMINATION IN EXCEL

Regression Statistics
Multiple R         0.72213
R Square           0.52148
Adjusted R Square  0.44172
Standard Error     47.46341
Observations       15

r² = SSR / SST = 29,460.027 / 56,493.333 = 0.52148

52.1% of the variation in pie sales is explained by the variation in
price and advertising.

ANOVA        df   SS         MS         F        Significance F
Regression    2   29460.027  14730.013  6.53861  0.01201
Residual     12   27033.306  2252.776
Total        14   56493.333
COEFFICIENT OF MULTIPLE
DETERMINATION
Consideration of R²:
Adding independent variables always increases R², but a regression
equation with a higher R² does not necessarily do a better job of
predicting the dependent variable. Taken alone, such a value can be
misleading.
ADJUSTED COEFFICIENT OF
MULTIPLE DETERMINATION
Shows the proportion of variation in Y explained by all X
variables adjusted for the number of X variables used:
r²_adj = 1 − [ (1 − r²) × (n − 1) / (n − k − 1) ]

(where n = sample size, k = number of independent variables)

For the pie sales model:
r²_adj = 1 − (1 − 0.52148) × (15 − 1) / (15 − 2 − 1) = 0.44172
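Carrying out the adjusted-r² arithmetic for the pie sales model in Python:

```python
r2, n, k = 0.52148, 15, 2

# Penalize r-squared for the number of predictors used
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
# r2_adj ≈ 0.44172, matching the Adjusted R Square in the Excel output
```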
HYPOTHESES:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
• Test statistic:

F_STAT = MSR / MSE = (SSR / k) / (SSE / (n − k − 1))

where F_STAT has numerator d.f. = k and
denominator d.f. = (n − k − 1)
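With the pie sales ANOVA values (SSR = 29,460.027, SSE = 27,033.306, n = 15, k = 2), the statistic works out as below; the p-value line assumes SciPy is available and uses its F survival function:

```python
from scipy import stats

SSR, SSE = 29460.027, 27033.306
n, k = 15, 2

MSR = SSR / k              # regression mean square: 14730.013
MSE = SSE / (n - k - 1)    # residual mean square:   2252.776
F = MSR / MSE              # ≈ 6.5386

# Upper-tail probability of F with (k, n-k-1) degrees of freedom
p_value = stats.f.sf(F, k, n - k - 1)   # ≈ 0.0120 (the Significance F)
```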
If the test results say that the model fits better without the
current set of independent variables, the next action should be one
of the following: omit the insignificant variables, or find other
independent variables (or increase the sample size) until the results
become significant.
HYPOTHESES:
• H0: βj = 0 (no linear relationship)
• H1: βj ≠ 0 (linear relationship does exist between Xj and Y)
Test Statistic:

t_STAT = (b_j − 0) / S_bj      (d.f. = n − k − 1)
d.f. = 15 − 2 − 1 = 12;  α = .05;  t_α/2 = 2.1788
For Advertising: t_STAT = 2.855, with p-value = .0145
The test statistic for each variable falls in the
rejection region (p-values < 0.05).
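The critical value and the decision for Advertising can be checked with SciPy's t distribution (assuming SciPy is available):

```python
from scipy import stats

n, k = 15, 2
df = n - k - 1                           # 12
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value ≈ 2.1788

t_stat = 2.855                           # Advertising, from the Excel output
p_value = 2 * stats.t.sf(t_stat, df)     # two-tailed p-value ≈ .0145
reject_h0 = abs(t_stat) > t_crit         # True: Advertising is significant
```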
This section of the Regression Data Analysis output shows which
variable(s) do not belong in the regression model. If a variable's
p-value > α, omit that variable: it is not significantly correlated
with the dependent variable. If all x variables are insignificant,
find other x variables or increase the sample size until the results
become significant.
SUMMARY
MULTIPLE CORRELATION ANALYSIS
1. Check for a linear relationship between variables using a scatter diagram
2. Compute the correlation coefficients between variables, presented in a
correlation matrix
3. Compute the multiple correlation coefficient, R