
2019, Study Session # 3, Reading # 8

“MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS”


MSR = Mean Regression Sum of Squares
MSE = Mean Squared Error
RSS = Regression Sum of Squares
SSE = Sum of Squared Errors/Residuals
α = Level of Significance
ML = Machine Learning
Fc = Critical F taken from F-Distribution Table
H0 = Null Hypothesis
Ha = Alternative Hypothesis
X = Independent Variable
Y = Dependent Variable
F = F-Statistic (calculated)

1. INTRODUCTION

– Multiple linear regression models are more sophisticated.
– They incorporate more than one independent variable.

2. MULTIPLE LINEAR REGRESSIONS

– Allows determining the effects of more than one independent variable on a particular dependent variable.
– Yi = b0 + b1X1i + b2X2i + … + bkXki + εi
– Slope coefficient bj tells the impact on Y of changing Xj by 1 unit, keeping the other independent variables the same.
– Individual slope coefficients (e.g. b1) in multiple regression are known as partial regression/slope coefficients.
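As an illustration (not from the notes), a minimal numpy sketch of estimating such a model by ordinary least squares on simulated data; all coefficient values are made up for the example:

```python
import numpy as np

# Simulated data: Y = 2 + 3*X1 - 1.5*X2 + noise (coefficients are illustrative)
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2 + 3 * X1 - 1.5 * X2 + rng.normal(scale=0.5, size=n)

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones(n), X1, X2])

# Least-squares estimates of b0, b1, b2
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(b_hat)  # close to [2, 3, -1.5]
```

Each estimated slope is a partial coefficient: it measures the effect of its X holding the other column fixed.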

2.1 Assumptions of the Multiple Linear Regression Model

– Relationship b/w Y and X1, X2, …, Xk is linear.
– Independent variables are not random, and no exact linear relationship exists b/w 2 or more independent variables.
– Expected value of error term is 0.
– Variance of error term is same for all observations.
– Error term is uncorrelated across observations.
– Error term is normally distributed.

2.2 Predicting the Dependent Variable in a Multiple Regression Model

– Obtain estimates b̂0, b̂1, …, b̂k of the regression parameters b0, b1, …, bk.
– Determine assumed values X̂1i, X̂2i, …, X̂ki.
– Compute predicted value of Ŷi using Ŷi = b̂0 + b̂1X̂1i + b̂2X̂2i + … + b̂kX̂ki.
– To predict the dependent variable:
  – Be confident that assumptions of the regression are met.
  – Predictions regarding X must be within the reliable range of data used to estimate the model.

2.3 Testing Whether All Population Regression Coefficients Equal Zero

– H0 ⇒ all slope coefficients are simultaneously = 0, i.e. none of the X variables helps explain Y.
– To test H0, an F-test is used; a t-test cannot be used.
– F = MSR/MSE = (RSS/k) / (SSE/(n − (k + 1)))
  where RSS = Σ(Ŷi − Ȳ)² and SSE = Σ(Yi − Ŷi)², each summed over i = 1 … n.
– Decision rule ⇒ reject H0 if F > Fc (for given α).
– It is a one-tailed test.
– df numerator = k.
– df denominator = n − (k + 1).
– For k and n, the test statistic for H0 (all slope coefficients equal 0) is F with k and n − (k + 1) df.
– In the F-distribution table, k represents the column and n − (k + 1) represents the row.
– "Significance of F" in the ANOVA table represents the p-value.
– ↑ F-statistic ⇒ ↓ chance of Type I error.

2.4 Adjusted R²

– R² ↑ with addition of independent variables (X) in the regression.
– Adjusted R² = 1 − [(n − 1)/(n − k − 1)](1 − R²)
– When k ≥ 1 ⇒ R² > adjusted R².
– Adjusted R² can be −ve but R² is always +ve.
– If adjusted R² is used for comparing regression models:
  – Sample size must be the same.
  – Dependent variable must be defined in the same way.
– ↑ Adjusted R² does not necessarily indicate the regression is well specified.
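The F-statistic and adjusted R² above can be computed directly from the sums of squares; a sketch on simulated data (numbers illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 120, 2
X1, X2 = rng.normal(size=(2, n))
Y = 1.0 + 2.0 * X1 + 0.5 * X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b_hat

RSS = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares
SSE = np.sum((Y - Y_hat) ** 2)          # sum of squared errors
MSR = RSS / k
MSE = SSE / (n - (k + 1))
F = MSR / MSE                            # compare with critical F(k, n-(k+1))

R2 = RSS / (RSS + SSE)
adj_R2 = 1 - (n - 1) / (n - k - 1) * (1 - R2)
print(F, R2, adj_R2)
```

Because k ≥ 1, adj_R2 comes out below R2, matching the relation stated above.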

Copyright © FinQuiz.com. All rights reserved.

3. USING DUMMY VARIABLES IN REGRESSION

– Dummy variable ⇒ takes 1 if a particular condition is true & 0 when it is false.
– Diligence is required in choosing the no. of dummy variables.
– Usually n − 1 dummy variables are used, where n = no. of categories.
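A small sketch of the n − 1 rule, using a hypothetical quarterly-seasonality example (the quarter data is made up):

```python
import numpy as np

# Hypothetical example: quarterly dummies for a seasonality regression.
# With n = 4 categories (quarters), use n - 1 = 3 dummies; Q4 is the
# omitted base category absorbed by the intercept.
quarters = np.array([1, 2, 3, 4, 1, 2, 3, 4])

d1 = (quarters == 1).astype(float)  # 1 if Q1, else 0
d2 = (quarters == 2).astype(float)
d3 = (quarters == 3).astype(float)

X = np.column_stack([np.ones(len(quarters)), d1, d2, d3])
# Including a Q4 dummy too would make the dummy columns sum to the intercept
# column -- an exact linear relationship that violates the assumptions.
print(X)
```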

4. VIOLATIONS OF REGRESSION ASSUMPTIONS

4.1 Heteroskedasticity

– Variance of errors differs across observations ⇒ heteroskedastic.
– Variance of errors is similar across observations ⇒ homoskedastic.
– Usually no systematic relationship exists b/w X & regression residuals.
– If a systematic relationship is present ⇒ heteroskedasticity can exist.

4.2 Serial Correlation

– Regression errors correlated across observations.
– Usually arises in time-series regression.

4.3 Multicollinearity

– Occurs when two or more independent variables (X) are highly correlated with each other.
– Regression can be estimated but results become problematic.
– Serious practical concern due to commonly found approximate linear relation among financial variables.

4.1.1 The Consequences of Heteroskedasticity

– It can lead to mistakes in inference; it does not affect consistency.
– F-test becomes unreliable.
– Due to biased estimators of standard errors, the t-test also becomes unreliable.
– Heteroskedasticity effects may include:
  – underestimation of estimated standard errors
  – inflated t-statistics
– Ignoring heteroskedasticity can suggest a significant relationship that does not actually exist.
– It becomes more serious when developing an investment strategy using regression analysis.
– Unconditional heteroskedasticity ⇒ heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression.
  – Creates no major problems for statistical inference.
– Conditional heteroskedasticity ⇒ heteroskedasticity of the error variance is correlated with the independent variables.
  – It causes the most problems.
  – Can be tested & corrected easily through many statistical software packages.

4.1.2 Testing for Heteroskedasticity

– Breusch–Pagan test is widely used.
– Regress squared residuals of the regression on the independent variables.
– If independent variables explain much of the variation of the squared errors ⇒ conditional heteroskedasticity exists.
– H0 = no conditional heteroskedasticity exists.
– Ha = conditional heteroskedasticity exists.
– Breusch–Pagan test statistic = nR²
  – R²: from regression of squared residuals on X.
– Critical value ⇒ calculated from χ² distribution.
  – df = no. of independent variables.
– Reject H0 if test statistic > critical value.

4.1.3 Correcting for Heteroskedasticity

– Robust standard errors:
  – Corrects standard errors of estimated coefficients.
  – Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
– Generalized least squares:
  – Modify the original equation.
  – Requires economic expertise to implement correctly on financial data.
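The Breusch–Pagan procedure above can be sketched in a few lines of numpy; the data-generating process (error scale growing with x) is invented for the demonstration:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes an intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, size=n)
# Error standard deviation grows with x -> conditional heteroskedasticity
y = 1 + 2 * x + rng.normal(scale=x, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Breusch-Pagan: regress squared residuals on X; statistic = n * R^2,
# compared with a chi-square critical value, df = no. of X variables (1 here)
bp_stat = n * r_squared(X, resid ** 2)
# the chi-square(1) 5% critical value is about 3.84
print(bp_stat)
```

With this construction the statistic comfortably exceeds the critical value, so H0 (no conditional heteroskedasticity) is rejected.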


4.2 Serial Correlation

4.2.1 The Consequences of Serial Correlation

– Incorrect estimates of regression coefficient standard errors.
– Parameter estimates become inconsistent & invalid when lagged Y is included among the X variables under serial correlation.
– Positive serial correlation ⇒ positive (negative) errors ↑ chance of positive (negative) errors.
– Negative serial correlation ⇒ positive (negative) errors ↑ chance of negative (positive) errors.
– It leads to wrong inferences.
– If positive serial correlation:
  – Standard errors underestimated.
  – t-statistics & F-statistics inflated.
  – Type I error ↑.
– If negative serial correlation:
  – Standard errors overestimated.
  – t-statistics & F-statistics understated.
  – Type II error ↑.

4.2.3 Correcting for Serial Correlation

– Adjust the coefficient standard errors:
  – Recommended method.
  – Hansen's method ⇒ most prevalent one.
– Modify the regression equation:
  – Extreme care is required.
  – May lead to inconsistent parameter estimates.

4.2.2 Testing for Serial Correlation

– Variety of tests; most common → Durbin–Watson test.
– DW = Σt=2..T (Êt − Êt−1)² / Σt=1..T Êt²
  where Êt = regression residual for period t.
– For large sample sizes, the DW statistic is approximately ≈ 2(1 − r)
  where r = sample correlation b/w regression residuals of t and t−1.
– Values of DW can range from 0 to 4.
  – DW = 2 ⇒ r = 0 ⇒ no serial correlation.
  – DW = 0 ⇒ r = 1 ⇒ perfectly positively serially correlated.
  – DW = 4 ⇒ r = −1 ⇒ perfectly negatively serially correlated.
– For positive serial correlation:
  – H0 ⇒ no positive serial correlation.
  – Ha ⇒ positive serial correlation.
  – DW < dl ⇒ reject H0.
  – DW > du ⇒ do not reject H0.
  – dl ≤ DW ≤ du ⇒ inconclusive.
– For negative serial correlation:
  – H0 ⇒ no negative serial correlation.
  – Ha ⇒ negative serial correlation.
  – DW > 4 − dl ⇒ reject H0.
  – DW < 4 − du ⇒ do not reject H0.
  – 4 − du ≤ DW ≤ 4 − dl ⇒ inconclusive.
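The DW formula and its 2(1 − r) approximation can be checked numerically; the AR(1) residual series below is simulated purely for illustration:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no serial correlation."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
# Independent residuals -> DW near 2
e_indep = rng.normal(size=2000)
# AR(1) residuals with r = 0.8 -> DW near 2*(1 - 0.8) = 0.4
e_pos = np.empty(2000)
e_pos[0] = rng.normal()
for t in range(1, 2000):
    e_pos[t] = 0.8 * e_pos[t - 1] + rng.normal()

print(durbin_watson(e_indep))  # roughly 2
print(durbin_watson(e_pos))    # roughly 0.4
```

A DW value this far below 2 would fall under dl for any usual sample size, so positive serial correlation would be detected.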


4.3 Multicollinearity

4.3.1 The Consequences of Multicollinearity

– Difficulty in detecting significant relationships.
– Estimates become extremely imprecise & unreliable, though consistency is unaffected.
– F-statistic is unaffected.
– Standard errors of regression coefficients can ↑:
  – Causing insignificant t-tests.
  – Wide confidence intervals.
  – Type II error ↑.

4.3.2 Detecting Multicollinearity

– Multicollinearity is a matter of degree rather than presence/absence.
– ↑ pairwise correlation does not necessarily indicate presence of multicollinearity.
– ↓ pairwise correlation does not necessarily indicate absence of multicollinearity.
– With 2 independent variables ⇒ correlation is a useful indicator.
– ↑ R², significant F-statistic, insignificant t-statistics on slope coefficients ⇒ classic symptom of multicollinearity.

4.3.3 Correcting Multicollinearity

– Exclude one or more regression variables.
– In many cases, experimentation is done to determine the variable causing multicollinearity.
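A quick sketch of the two-variable pairwise-correlation check; the near-duplicate regressors are simulated, and the condition-number line is an extra diagnostic added for illustration (it is not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
y = 1 + x1 + x2 + rng.normal(size=n)

# With two independent variables, pairwise correlation is a useful indicator
corr = np.corrcoef(x1, x2)[0, 1]
print(corr)  # close to 1 -> multicollinearity likely

# Estimates become imprecise because X'X is nearly singular when
# columns are close to collinear; its condition number blows up
X = np.column_stack([np.ones(n), x1, x2])
cond = np.linalg.cond(X.T @ X)
print(cond)  # very large
```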

4.4 Summarizing the Issues

Problem                          | How to detect                                            | Consequences                                | Possible corrections
(Conditional) heteroskedasticity | Plot residuals or use Breusch–Pagan test                 | Wrong inferences; incorrect standard errors | Use robust standard errors or GLS
Serial correlation               | Durbin–Watson test                                       | Wrong inferences; incorrect standard errors | Use robust standard errors (Hansen's method) or modify the equation
Multicollinearity                | High R² and significant F-statistic but low t-statistics | Wrong inferences                            | Omit a variable


5. MODEL SPECIFICATION AND ERRORS IN SPECIFICATION

– Model specification ⇒ set of variables included in the regression.
– Incorrect specification leads to biased & inconsistent parameters.

5.1 Principles of Model Specification

– Model grounded in economic reasoning.
– Functional form of variables compatible with nature of variables.
– Parsimonious ⇒ each included variable should play an essential role.
– Model is examined for violation of regression assumptions.
– Model is tested for validity & usefulness on out-of-sample data.

5.2 Misspecified Functional Form

– One or more variables are omitted. If an omitted variable is correlated with the remaining variables, the error term will also be correlated with the latter, and:
  – results can be biased & inconsistent.
  – estimated standard errors of the coefficients will be inconsistent.
– One or more variables may require transformation.
– Pooling of data from different samples that should not be pooled:
  – Can lead to spurious results.

5.3 Time-Series Misspecification (Independent Variables Correlated with Errors)

– Including lagged (dependent) variables as independent with serial correlation.
– Including a function of the dependent variable as an independent variable.
– Independent variables measured with error.

5.4 Other Types of Time-Series Misspecification

– Non-stationarity: variable properties, e.g. mean, are not constant through time.
– In practice, non-stationarity is a serious problem.

6. MODELS WITH QUALITATIVE DEPENDENT VARIABLES

– Qualitative dependent variables ⇒ dummy variables used as dependent instead of independent variables.
– Probit model ⇒ based on the normal distribution; estimates the probability:
  – of a discrete outcome, given values of the independent variables used to explain that outcome.
  – that Y = 1, implying a condition is met.
– Logit model:
  – Analogous to the probit model.
  – Based on the logistic distribution.
– Both logit and probit models must be estimated using maximum likelihood methods.
– Discriminant analysis ⇒ can be used to create an overall score that is used for classification.
– Qualitative dependent variable models can be used for portfolio management and business management.
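Maximum likelihood estimation of a logit model can be sketched with plain gradient ascent on the log-likelihood; the true coefficients (−0.5, 2) are invented for the simulation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
# True model: P(Y = 1) = logistic(-0.5 + 2x)
p = sigmoid(-0.5 + 2 * x)
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
# Maximum likelihood by gradient ascent on the logit log-likelihood;
# X'(y - p) is the score vector of that likelihood
for _ in range(5000):
    b += 0.001 * (X.T @ (y - sigmoid(X @ b)))
print(b)  # near [-0.5, 2]
```

A probit model would replace the logistic CDF with the normal CDF but be fitted the same way, by maximum likelihood.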


7. MACHINE LEARNING

7.1 Major Focuses of Data Analytics

Six focuses of data analytics:
i. Measuring correlations
ii. Making predictions
iii. Making causal inferences
iv. Classifying data
v. Sorting data into clusters
vi. Reducing the dimension of data

7.2 What is Machine Learning?

ML uses statistical techniques that give computer systems the ability to act by learning from data without being explicitly programmed.

7.3 Types of Machine Learning

Two broad categories of ML are:
1. Supervised learning: uses labeled training data and processes that information to find the output. Supervised learning follows the logic of 'X leads to Y'.
2. Unsupervised learning: does not make use of labeled training data and does not follow the logic of 'X leads to Y'. There are no outcomes to match to; however, the input data is analyzed, and the program discovers structures within the data itself.

7.5 Supervised Machine Learning Training

Some simple steps to train ML models are:
1. Define the ML algorithm.
2. Specify the hyperparameters used in the ML technique.
3. Divide the dataset into two major groups:
   • Training sample
   • Validation sample
4. Evaluate model fit through the validation sample and tune the model's hyperparameters.
5. Repeat the training cycles a given number of times or until the required performance level is achieved.
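The training steps above can be sketched end-to-end with a simple algorithm; ridge regression and the candidate hyperparameter values are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = rng.normal(size=(n, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# Step 3: divide the dataset into training and validation samples
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def fit_ridge(X, y, lam):
    # Step 1: the chosen algorithm -- ridge regression in closed form
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Steps 2, 4, 5: loop over hyperparameter values and keep the one with
# the best validation fit
best_lam, best_mse = None, np.inf
for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    b = fit_ridge(X_tr, y_tr, lam)
    mse = np.mean((y_va - X_va @ b) ** 2)
    if mse < best_mse:
        best_lam, best_mse = lam, mse
print(best_lam, best_mse)
```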

7.4 Machine Learning Algorithms

7.4.1 Supervised Learning

7.4.1.1 Penalized Regression

• Computationally efficient method to solve prediction problems.
• Imposes a penalty on the size of regression coefficients to address the overfitting issue.
• Improves prediction in large datasets by shrinking the no. of independent variables and handling model complexity.

7.4.1.2 Classification & Regression Trees (CART)

• A CART model is represented by a binary tree.
• Computationally efficient.
• Adaptable to complex datasets.
• Useful for interpreting how observations are classified.
• Works on pre-classified training data.
• A classification tree is formed by splitting each node into two distinct subsets, and the process of splitting the derived subsets is repeated in a recursive manner. The process ends when further splitting is not possible.

7.4.1.3 Random Forests

• Ensembles multiple decision trees together, based on random selection of features, to produce accurate and stable predictions.
• Splitting a node in random forests is based on the best features from a random subset of n features; therefore, each tree varies marginally from the other trees.

7.4.1.4 Neural Networks

• Also known as artificial neural networks, or ANNs.
• Appropriate for nonlinear statistical data and for data with complex connections among variables.
• Contain nodes that are linked by arrows.
• ANNs have three types of interconnected layers:
  a) an input layer
  b) hidden layers
  c) an output layer
• Deep learning nets (DLNs) are neural networks with many hidden layers (often > 20).
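The node-splitting step of a classification tree can be illustrated with a single split (a "stump"); the Gini impurity criterion shown is one common CART scoring rule, and the toy data is made up:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on one feature that best separates two classes,
    scoring candidate splits by weighted Gini impurity (as in CART)."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        p = labels.mean()
        return 2 * p * (1 - p)
    best_t, best_score = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Pre-classified training data: class 1 whenever x > 3 (illustrative)
x = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t, score = best_split(x, y)
print(t, score)  # splits at 3.0 with impurity 0
```

A full tree applies this search recursively to each derived subset until further splitting is not possible.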

7.4.2 Unsupervised Learning

7.4.2.1 Clustering Algorithms

Clustering algorithms discover the inherent groupings in the data without any predefined class labels.
Two clustering approaches are:
• Bottom-up clustering: each observation starts in its own cluster, and then assembles with other clusters progressively, based on some criterion, in a non-overlapping manner.
• Top-down clustering: all observations begin as one cluster, and then split into smaller and smaller clusters gradually.
The k-means algorithm is a clustering algorithm where data is divided into k clusters based on two geometric ideas: 'centroid' (average position of points in the cluster) and 'Euclidean distance' (straight-line distance b/w two points).

7.4.2.2 Dimension Reduction

• Reduces the no. of random variables for complex datasets.
• Principal component analysis (PCA) is an established method for dimension reduction. PCA reduces highly correlated data variables into fewer, necessary, uncorrelated composite variables.
• Dimension reduction techniques are applicable to numerical, textual, or visual data.
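The centroid/Euclidean-distance idea behind k-means can be sketched in a few lines; the two well-separated "blobs" of points are simulated for the demonstration:

```python
import numpy as np

def k_means(points, k, n_iter=50, seed=0):
    """Plain k-means: assign each point to the nearest centroid by Euclidean
    distance, then move each centroid to the average position of its points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every centroid
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(7)
# Two well-separated groups of 2-D points
a = rng.normal(loc=[0, 0], scale=0.3, size=(50, 2))
b = rng.normal(loc=[5, 5], scale=0.3, size=(50, 2))
points = np.vstack([a, b])

labels, centroids = k_means(points, k=2)
print(centroids)  # one centroid near each group
```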

