
2019, Study Session # 3, Reading # 8

“MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS”


MSR = Mean Regression Sum of Squares
MSE = Mean Squared Error
RSS = Regression Sum of Squares
SSE = Sum of Squared Errors/Residuals
α = Level of Significance
ML = Machine Learning
Fc = Critical F taken from F-Distribution Table
H0 = Null Hypothesis
Ha = Alternative Hypothesis
X = Independent Variable
Y = Dependent Variable
F = F-Statistic (calculated)

1. INTRODUCTION

– Multiple linear regression models are more sophisticated.
– They incorporate more than one independent variable.

2. MULTIPLE LINEAR REGRESSIONS

– Allows determining the effects of more than one independent variable on a particular dependent variable.
– Yi = b0 + b1X1i + b2X2i + … + bkXki + εi
– Slope coefficient bj tells the impact on Y of changing Xj by 1 unit, keeping the other independent variables the same.
– Individual slope coefficients (e.g. b1) in multiple regression are known as partial regression/slope coefficients.
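As an illustration (not from the notes), a minimal numpy sketch of estimating such a model by ordinary least squares on simulated data; all coefficient values are made up for the example:

```python
import numpy as np

# Simulated data: Y = 2 + 3*X1 - 1.5*X2 + noise (coefficients are illustrative)
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2 + 3 * X1 - 1.5 * X2 + rng.normal(scale=0.5, size=n)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), X1, X2])

# Least-squares estimates of b0, b1, b2
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b_hat)  # close to [2, 3, -1.5]
```

Each estimated slope is a partial coefficient: it measures the effect of its X holding the other column fixed.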

2.1 Assumptions of the Multiple Linear Regression Model

– Relationship b/w Y and X1, X2, …, Xk is linear.
– Independent variables are not random, and no exact linear relationship exists b/w 2 or more independent variables.
– Expected value of error term is 0.
– Variance of error term is same for all observations.
– Error term is uncorrelated across observations.
– Error term is normally distributed.

2.2 Predicting the Dependent Variable in a Multiple Regression Model

– Obtain estimates b̂0, b̂1, …, b̂k of the regression parameters b0, b1, …, bk.
– Determine assumed values X̂1i, X̂2i, …, X̂ki.
– Compute predicted value of Ŷi using Ŷi = b̂0 + b̂1X̂1i + b̂2X̂2i + … + b̂kX̂ki.
– To predict the dependent variable:
  – Be confident that assumptions of the regression are met.
  – Predictions regarding X must be within the reliable range of data used to estimate the model.

2.3 Testing Whether All Population Regression Coefficients Equal Zero

– H0 ⇒ all slope coefficients are simultaneously = 0, i.e. none of the X variables helps explain Y.
– To test H0, an F-test is used; a t-test cannot be used.
– F = MSR/MSE = (RSS/k) / (SSE/(n − (k + 1)))
  where RSS = Σ(Ŷi − Ȳ)² and SSE = Σ(Yi − Ŷi)², each summed over i = 1 … n.
– Decision rule ⇒ reject H0 if F > Fc (for given α).
– It is a one-tailed test.
– df numerator = k.
– df denominator = n − (k + 1).
– For k and n, the test statistic for H0 (all slope coefficients equal 0) is F with k and n − (k + 1) df.
– In the F-distribution table, k represents the column and n − (k + 1) represents the row.
– "Significance of F" in the ANOVA table represents the p-value.
– ↑ F-statistic ⇒ ↓ chance of Type I error.

2.4 Adjusted R²

– R² ↑ with addition of independent variables (X) in the regression.
– Adjusted R² = 1 − [(n − 1)/(n − k − 1)](1 − R²)
– When k ≥ 1 ⇒ R² > adjusted R².
– Adjusted R² can be −ve but R² is always +ve.
– If adjusted R² is used for comparing regression models:
  – Sample size must be the same.
  – Dependent variable must be defined in the same way.
– ↑ Adjusted R² does not necessarily indicate the regression is well specified.
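The F-statistic and adjusted R² above can be computed directly from the sums of squares; a sketch on simulated data (numbers illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 120, 2
X1, X2 = rng.normal(size=(2, n))
Y = 1.0 + 2.0 * X1 + 0.5 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b_hat

RSS = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares
SSE = np.sum((Y - Y_hat) ** 2)          # sum of squared errors
MSR = RSS / k
MSE = SSE / (n - (k + 1))
F = MSR / MSE                            # compare with critical F(k, n-(k+1))

R2 = RSS / (RSS + SSE)
adj_R2 = 1 - (n - 1) / (n - k - 1) * (1 - R2)
print(F, R2, adj_R2)
```

Because k ≥ 1, adj_R2 comes out below R2, matching the relation stated above.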

Copyright © FinQuiz.com. All rights reserved.

3. USING DUMMY VARIABLES IN REGRESSION

– Dummy variable ⇒ takes 1 if a particular condition is true & 0 when it is false.
– Diligence is required in choosing the no. of dummy variables.
– Usually n − 1 dummy variables are used, where n = no. of categories.
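A small sketch of the n − 1 rule, using a hypothetical quarterly-seasonality example (the quarter data is made up):

```python
import numpy as np

# Hypothetical example: quarterly dummies for a seasonality regression.
# With n = 4 categories (quarters), use n - 1 = 3 dummies; Q4 is the
# omitted base category absorbed by the intercept.
quarters = np.array([1, 2, 3, 4, 1, 2, 3, 4])

d1 = (quarters == 1).astype(float)  # 1 if Q1, else 0
d2 = (quarters == 2).astype(float)
d3 = (quarters == 3).astype(float)

X = np.column_stack([np.ones(len(quarters)), d1, d2, d3])
# Including a Q4 dummy too would make the dummy columns sum to the intercept
# column -- an exact linear relationship that violates the assumptions.
print(X)
```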

4. VIOLATIONS OF REGRESSION ASSUMPTIONS

4.1 Heteroskedasticity

– Variance of errors differs across observations ⇒ heteroskedastic.
– Variance of errors is similar across observations ⇒ homoskedastic.
– Usually no systematic relationship exists b/w X & regression residuals.
– If a systematic relationship is present ⇒ heteroskedasticity can exist.

4.2 Serial Correlation

– Regression errors correlated across observations.
– Usually arises in time-series regression.

4.3 Multicollinearity

– Occurs when two or more independent variables (X) are highly correlated with each other.
– Regression can be estimated but results become problematic.
– Serious practical concern due to commonly found approximate linear relation among financial variables.

4.1.1 The Consequences of Heteroskedasticity

– It can lead to mistakes in inference; it does not affect consistency.
– F-test becomes unreliable.
– Due to biased estimators of standard errors, the t-test also becomes unreliable.
– Heteroskedasticity effects may include:
  – underestimation of estimated standard errors
  – inflated t-statistics
– Ignoring heteroskedasticity can suggest a significant relationship that does not actually exist.
– It becomes more serious when developing an investment strategy using regression analysis.
– Unconditional heteroskedasticity ⇒ heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression.
  – Creates no major problems for statistical inference.
– Conditional heteroskedasticity ⇒ heteroskedasticity of the error variance is correlated with the independent variables.
  – It causes the most problems.
  – Can be tested & corrected easily through many statistical software packages.

4.1.2 Testing for Heteroskedasticity

– Breusch–Pagan test is widely used.
– Regress squared residuals of the regression on the independent variables.
– If independent variables explain much of the variation of the squared errors ⇒ conditional heteroskedasticity exists.
– H0 = no conditional heteroskedasticity exists.
– Ha = conditional heteroskedasticity exists.
– Breusch–Pagan test statistic = nR²
  – R²: from regression of squared residuals on X.
– Critical value ⇒ calculated from χ² distribution.
  – df = no. of independent variables.
– Reject H0 if test statistic > critical value.

4.1.3 Correcting for Heteroskedasticity

– Robust standard errors:
  – Corrects standard errors of estimated coefficients.
  – Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
– Generalized least squares:
  – Modify the original equation.
  – Requires economic expertise to implement correctly on financial data.
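The Breusch–Pagan procedure above can be sketched in a few lines of numpy; the data-generating process (error scale growing with x) is invented for the demonstration:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes an intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, size=n)
# Error standard deviation grows with x -> conditional heteroskedasticity
y = 1 + 2 * x + rng.normal(scale=x, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Breusch-Pagan: regress squared residuals on X; statistic = n * R^2,
# compared with a chi-square critical value, df = no. of X variables (1 here)
bp_stat = n * r_squared(X, resid ** 2)
# the chi-square(1) 5% critical value is about 3.84
print(bp_stat)
```

With this construction the statistic comfortably exceeds the critical value, so H0 (no conditional heteroskedasticity) is rejected.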


4.2 Serial Correlation

4.2.1 The Consequences of Serial Correlation

– Incorrect estimates of regression coefficient standard errors.
– Parameter estimates become inconsistent & invalid when lagged Y is included among the X variables under serial correlation.
– Positive serial correlation ⇒ positive (negative) errors ↑ chance of positive (negative) errors.
– Negative serial correlation ⇒ positive (negative) errors ↑ chance of negative (positive) errors.
– It leads to wrong inferences.
– If positive serial correlation:
  – Standard errors underestimated.
  – t-statistics & F-statistics inflated.
  – Type I error ↑.
– If negative serial correlation:
  – Standard errors overestimated.
  – t-statistics & F-statistics understated.
  – Type II error ↑.

4.2.3 Correcting for Serial Correlation

– Adjust the coefficient standard errors:
  – Recommended method.
  – Hansen's method ⇒ most prevalent one.
– Modify the regression equation:
  – Extreme care is required.
  – May lead to inconsistent parameter estimates.

4.2.2 Testing for Serial Correlation

– Variety of tests; most common → Durbin–Watson test.
– DW = Σt=2..T (Êt − Êt−1)² / Σt=1..T Êt²
  where Êt = regression residual for period t.
– For large sample sizes, the DW statistic is approximately ≈ 2(1 − r)
  where r = sample correlation b/w regression residuals of t and t−1.
– Values of DW can range from 0 to 4.
  – DW = 2 ⇒ r = 0 ⇒ no serial correlation.
  – DW = 0 ⇒ r = 1 ⇒ perfectly positively serially correlated.
  – DW = 4 ⇒ r = −1 ⇒ perfectly negatively serially correlated.
– For positive serial correlation:
  – H0 ⇒ no positive serial correlation.
  – Ha ⇒ positive serial correlation.
  – DW < dl ⇒ reject H0.
  – DW > du ⇒ do not reject H0.
  – dl ≤ DW ≤ du ⇒ inconclusive.
– For negative serial correlation:
  – H0 ⇒ no negative serial correlation.
  – Ha ⇒ negative serial correlation.
  – DW > 4 − dl ⇒ reject H0.
  – DW < 4 − du ⇒ do not reject H0.
  – 4 − du ≤ DW ≤ 4 − dl ⇒ inconclusive.
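The DW formula and its 2(1 − r) approximation can be checked numerically; the AR(1) residual series below is simulated purely for illustration:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no serial correlation."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
# Independent residuals -> DW near 2
e_indep = rng.normal(size=2000)
# AR(1) residuals with r = 0.8 -> DW near 2*(1 - 0.8) = 0.4
e_pos = np.empty(2000)
e_pos[0] = rng.normal()
for t in range(1, 2000):
    e_pos[t] = 0.8 * e_pos[t - 1] + rng.normal()

print(durbin_watson(e_indep))  # roughly 2
print(durbin_watson(e_pos))    # roughly 0.4
```

A DW value this far below 2 would fall under dl for any usual sample size, so positive serial correlation would be detected.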


4.3 Multicollinearity

4.3.1 The Consequences of Multicollinearity

– Difficulty in detecting significant relationships.
– Estimates become extremely imprecise & unreliable, though consistency is unaffected.
– F-statistic is unaffected.
– Standard errors of regression coefficients can ↑:
  – Causing insignificant t-tests.
  – Wide confidence intervals.
  – Type II error ↑.

4.3.2 Detecting Multicollinearity

– Multicollinearity is a matter of degree rather than presence/absence.
– ↑ pairwise correlation does not necessarily indicate presence of multicollinearity.
– ↓ pairwise correlation does not necessarily indicate absence of multicollinearity.
– With 2 independent variables ⇒ correlation is a useful indicator.
– ↑ R², significant F-statistic, insignificant t-statistics on slope coefficients ⇒ classic symptom of multicollinearity.

4.3.3 Correcting Multicollinearity

– Exclude one or more regression variables.
– In many cases, experimentation is done to determine the variable causing multicollinearity.
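A quick sketch of the two-variable pairwise-correlation check; the near-duplicate regressors are simulated, and the condition-number line is an extra diagnostic added for illustration (it is not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
y = 1 + x1 + x2 + rng.normal(size=n)

# With two independent variables, pairwise correlation is a useful indicator
corr = np.corrcoef(x1, x2)[0, 1]
print(corr)  # close to 1 -> multicollinearity likely

# Estimates become imprecise because X'X is nearly singular when
# columns are close to collinear; its condition number blows up
X = np.column_stack([np.ones(n), x1, x2])
cond = np.linalg.cond(X.T @ X)
print(cond)  # very large
```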

4.4 Summarizing the Issues

Problem                          | How to detect                                            | Consequences                                | Possible corrections
(Conditional) heteroskedasticity | Plot residuals or use Breusch–Pagan test                 | Wrong inferences; incorrect standard errors | Use robust standard errors or GLS
Serial correlation               | Durbin–Watson test                                       | Wrong inferences; incorrect standard errors | Use robust standard errors (Hansen's method) or modify the equation
Multicollinearity                | High R² and significant F-statistic but low t-statistics | Wrong inferences                            | Omit a variable


5. MODEL SPECIFICATION AND ERRORS IN SPECIFICATION

– Model specification ⇒ set of variables included in the regression.
– Incorrect specification leads to biased & inconsistent parameters.

5.1 Principles of Model Specification

– Model grounded in economic reasoning.
– Functional form of variables compatible with nature of variables.
– Parsimonious ⇒ each included variable should play an essential role.
– Model is examined for violation of regression assumptions.
– Model is tested for validity & usefulness on out-of-sample data.

5.2 Misspecified Functional Form

– One or more variables are omitted. If an omitted variable is correlated with the remaining variables, the error term will also be correlated with the latter, and:
  – results can be biased & inconsistent.
  – estimated standard errors of the coefficients will be inconsistent.
– One or more variables may require transformation.
– Pooling of data from different samples that should not be pooled:
  – Can lead to spurious results.

5.3 Time-Series Misspecification (Independent Variables Correlated with Errors)

– Including lagged (dependent) variables as independent with serial correlation.
– Including a function of the dependent variable as an independent variable.
– Independent variables measured with error.

5.4 Other Types of Time-Series Misspecification

– Non-stationarity: variable properties, e.g. mean, are not constant through time.
– In practice, non-stationarity is a serious problem.

6. MODELS WITH QUALITATIVE DEPENDENT VARIABLES

– Qualitative dependent variables ⇒ dummy variables used as dependent instead of independent variables.
– Probit model ⇒ based on the normal distribution; estimates the probability:
  – of a discrete outcome, given values of the independent variables used to explain that outcome.
  – that Y = 1, implying a condition is met.
– Logit model:
  – Analogous to the probit model.
  – Based on the logistic distribution.
– Both logit and probit models must be estimated using maximum likelihood methods.
– Discriminant analysis ⇒ can be used to create an overall score that is used for classification.
– Qualitative dependent variable models can be used for portfolio management and business management.
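Maximum likelihood estimation of a logit model can be sketched with plain gradient ascent on the log-likelihood; the true coefficients (−0.5, 2) are invented for the simulation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
# True model: P(Y = 1) = logistic(-0.5 + 2x)
p = sigmoid(-0.5 + 2 * x)
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
# Maximum likelihood by gradient ascent on the logit log-likelihood;
# X'(y - p) is the score vector of that likelihood
for _ in range(5000):
    b += 0.001 * (X.T @ (y - sigmoid(X @ b)))
print(b)  # near [-0.5, 2]
```

A probit model would replace the logistic CDF with the normal CDF but be fitted the same way, by maximum likelihood.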


7. MACHINE LEARNING

7.1 Major Focuses of Data Analytics

Six focuses of data analytics:
i. Measuring correlations
ii. Making predictions
iii. Making causal inferences
iv. Classifying data
v. Sorting data into clusters
vi. Reducing the dimension of data

7.2 What is Machine Learning?

ML uses statistical techniques that give computer systems the ability to act by learning from data without being explicitly programmed.

7.3 Types of Machine Learning

Two broad categories of ML are:
1. Supervised learning: uses labeled training data and processes that information to find the output. Supervised learning follows the logic of 'X leads to Y'.
2. Unsupervised learning: does not make use of labeled training data and does not follow the logic of 'X leads to Y'. There are no outcomes to match to; however, the input data is analyzed, and the program discovers structures within the data itself.

7.5 Supervised Machine Learning Training

Some simple steps to train ML models are:
1. Define the ML algorithm.
2. Specify the hyperparameters used in the ML technique.
3. Divide the dataset into two major groups:
   • Training sample
   • Validation sample
4. Evaluate model fit through the validation sample and tune the model's hyperparameters.
5. Repeat the training cycles a given number of times or until the required performance level is achieved.
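The training steps above can be sketched end-to-end with a simple algorithm; ridge regression and the candidate hyperparameter values are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = rng.normal(size=(n, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Step 3: divide the dataset into training and validation samples
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def fit_ridge(X, y, lam):
    # Step 1: the chosen algorithm -- ridge regression in closed form
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Steps 2, 4, 5: loop over hyperparameter values and keep the one with
# the best validation fit
best_lam, best_mse = None, np.inf
for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    b = fit_ridge(X_tr, y_tr, lam)
    mse = np.mean((y_va - X_va @ b) ** 2)
    if mse < best_mse:
        best_lam, best_mse = lam, mse
print(best_lam, best_mse)
```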

7.4 Machine Learning Algorithms

7.4.1 Supervised Learning

7.4.1.1 Penalized Regression

• Computationally efficient method to solve prediction problems.
• Imposes a penalty on the size of regression coefficients to address the overfitting issue.
• Improves prediction in large datasets by shrinking the no. of independent variables and handling model complexity.

7.4.1.2 Classification & Regression Trees (CART)

• A CART model is represented by a binary tree.
• Computationally efficient.
• Adaptable to complex datasets.
• Useful for interpreting how observations are classified.
• Works on pre-classified training data.
• A classification tree is formed by splitting each node into two distinct subsets, and the process of splitting the derived subsets is repeated in a recursive manner. The process ends when further splitting is not possible.

7.4.1.3 Random Forests

• Ensembles multiple decision trees together, based on random selection of features, to produce accurate and stable predictions.
• Splitting a node in random forests is based on the best features from a random subset of n features; therefore, each tree varies marginally from the other trees.

7.4.1.4 Neural Networks

• Also known as artificial neural networks, or ANNs.
• Appropriate for nonlinear statistical data and for data with complex connections among variables.
• Contain nodes that are linked by arrows.
• ANNs have three types of interconnected layers:
  a) an input layer
  b) hidden layers
  c) an output layer
• Deep learning nets (DLNs) are neural networks with many hidden layers (often > 20).
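The node-splitting step of a classification tree can be illustrated with a single split (a "stump"); the Gini impurity criterion shown is one common CART scoring rule, and the toy data is made up:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on one feature that best separates two classes,
    scoring candidate splits by weighted Gini impurity (as in CART)."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = labels.mean()
        return 2 * p * (1 - p)
    best_t, best_score = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Pre-classified training data: class 1 whenever x > 3 (illustrative)
x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t, score = best_split(x, y)
print(t, score)  # splits at 3.0 with impurity 0
```

A full tree applies this search recursively to each derived subset until further splitting is not possible.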

7.4.2 Unsupervised Learning

7.4.2.1 Clustering Algorithms

Clustering algorithms discover the inherent groupings in the data without any predefined class labels.
Two clustering approaches are:
• Bottom-up clustering: each observation starts in its own cluster, and then assembles with other clusters progressively, based on some criterion, in a non-overlapping manner.
• Top-down clustering: all observations begin as one cluster, and then split into smaller and smaller clusters gradually.
The k-means algorithm is a clustering algorithm where data is divided into k clusters based on two geometric ideas: 'centroid' (average position of points in the cluster) and 'Euclidean distance' (straight-line distance b/w two points).

7.4.2.2 Dimension Reduction

• Reduces the no. of random variables for complex datasets.
• Principal component analysis (PCA) is an established method for dimension reduction. PCA reduces highly correlated data variables into fewer, necessary, uncorrelated composite variables.
• Dimension reduction techniques are applicable to numerical, textual, or visual data.
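The centroid/Euclidean-distance idea behind k-means can be sketched in a few lines; the two well-separated "blobs" of points are simulated for the demonstration:

```python
import numpy as np

def k_means(points, k, n_iter=50, seed=0):
    """Plain k-means: assign each point to the nearest centroid by Euclidean
    distance, then move each centroid to the average position of its points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(7)
# Two well-separated groups of 2-D points
a = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
b = rng.normal(loc=[5, 5], scale=0.3, size=(50, 2))
points = np.vstack([a, b])

labels, centroids = k_means(points, k=2)
print(centroids)  # one centroid near each group
```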

