"Multiple Regression and Issues in Regression Analysis": D Numerator K D Denominator N - (k+1)
"Multiple Regression and Issues in Regression Analysis": D Numerator K D Denominator N - (k+1)
Multiple regression allows determining the effects of more than one independent variable on a particular dependent variable:

Yi = b0 + b1 X1i + b2 X2i + ... + bk Xki + εi

Each slope coefficient tells the impact on Y of changing the corresponding X by 1 unit while keeping the other independent variables the same. Individual slope coefficients (e.g. b1) in a multiple regression are known as partial regression (or partial slope) coefficients.
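The model above can be sketched as an ordinary least squares fit in NumPy. The data, the true coefficients (b0 = 2.0, b1 = 1.5, b2 = -0.5) and the noise level are made-up illustrations, not from the source:

```python
# Minimal OLS sketch for Yi = b0 + b1*X1i + b2*X2i + error (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
# Hypothetical true coefficients: b0 = 2.0, b1 = 1.5, b2 = -0.5.
Y = 2.0 + 1.5 * X1 - 0.5 * X2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones(n), X1, X2])
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # estimates [b0, b1, b2]
print(b_hat)
```

Each entry of `b_hat` is a partial slope coefficient: the estimated change in Y for a one-unit change in that X, holding the other X's constant.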
2.1 Assumptions of the Multiple Linear Regression Model
• The relationship b/w Y and X1, X2, X3, ..., Xk is linear.
• The independent variables are not random, and no exact linear relationship exists b/w 2 or more independent variables.
• The expected value of the error term is 0.
• The variance of the error term is the same for all observations.
• The error term is uncorrelated across observations.
• The error term is normally distributed.

2.2 Predicting the Dependent Variable in a Multiple Regression Model
• Obtain estimates of the regression parameters: estimates = b̂0, b̂1, b̂2, ..., b̂k of the regression parameters b0, b1, b2, ..., bk.
• Determine the assumed values X̂1i, X̂2i, ..., X̂ki.
• Compute the predicted value of Y using Ŷi = b̂0 + b̂1 X̂1i + b̂2 X̂2i + ... + b̂k X̂ki.
To predict the dependent variable:
• Be confident that the assumptions of the regression are met.
• Predictions regarding X must be within the reliable range of the data used to estimate the model.

2.3 Testing Whether All Population Regression Coefficients Equal Zero
• H0 ⇒ all slope coefficients are simultaneously = 0, i.e. no X variable helps explain Y.
• To test H0, the F-test is used; a t-test cannot be used.
• For k and n, the test statistic for H0 (all slope coefficients are equal to 0) is F with k and n-(k+1) degrees of freedom:

  F = (RSS / k) / (SSE / (n - (k + 1)))

  where RSS = Σ(Ŷi - Ȳ)² and SSE = Σ(Yi - Ŷi)².
• Decision rule ⇒ reject H0 if F > Fc (for given α). It is a one-tailed test.
• df numerator = k; df denominator = n - (k + 1).
• In the F-distribution table, k represents the column and n-(k+1) represents the row.
• "Significance F" in an ANOVA table represents the p-value.
• ↑ F-statistic ⇒ ↓ chances of Type I error.

2.4 Adjusted R²
• R² ↑ with the addition of independent variables (X) in the regression.
• Adjusted R² (R̄²) = 1 - [(n - 1) / (n - k - 1)] (1 - R²).
• When k ≥ 1 ⇒ R² > R̄². R̄² can be -ve but R² is always +ve.
• If R̄² is used for comparing regression models, the sample size must be the same and the dependent variable must be defined in the same way.
• ↑ R̄² does not necessarily indicate that the regression is well specified.
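The F-statistic and adjusted R² formulas above can be worked through numerically. The sums of squares, n and k below are hypothetical numbers chosen only for illustration:

```python
# Worked example of F = (RSS/k) / (SSE/(n-(k+1))) and
# adjusted R^2 = 1 - [(n-1)/(n-k-1)](1 - R^2), with made-up inputs.
RSS = 80.0    # regression (explained) sum of squares
SSE = 20.0    # sum of squared errors (residuals)
n, k = 50, 3  # observations and independent variables

F = (RSS / k) / (SSE / (n - (k + 1)))          # df: k and n-(k+1)
R2 = RSS / (RSS + SSE)                         # coefficient of determination
adj_R2 = 1 - ((n - 1) / (n - k - 1)) * (1 - R2)

print(F, R2, adj_R2)
```

Note that adj_R2 comes out below R2, consistent with the rule that R² > R̄² whenever k ≥ 1.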
Copyright © FinQuiz.com. All rights reserved.
2019, Study Session # 3, Reading # 8
Heteroskedasticity

Effects of heteroskedasticity:
• It can lead to mistakes in inference. It does not affect consistency.
• The F-test becomes unreliable.
• Due to biased estimators of standard errors, the t-test also becomes unreliable.
• Effects may include underestimation of estimated standard errors and inflated t-statistics.
• Ignoring heteroskedasticity leads to finding a significant relationship that does not actually exist.
• It becomes more serious when developing an investment strategy using regression analysis.

Unconditional heteroskedasticity ⇒ the heteroskedasticity of the error variance is not correlated with the independent variables in the multiple regression. It creates no major problems for statistical inference.

Conditional heteroskedasticity ⇒ the heteroskedasticity of the error variance is correlated with the independent variables. It causes the most problems, but can be tested & corrected easily through many statistical software packages.

Testing: the Breusch-Pagan test is widely used.
• Regress the squared residuals of the regression on the independent variables.
• If the independent variables explain much of the variation of the squared errors ⇒ conditional heteroskedasticity exists.
• H0 = no conditional heteroskedasticity exists; Ha = conditional heteroskedasticity exists.
• Breusch-Pagan test statistic = nR², where R² is from the regression of squared residuals on X.
• Critical value ⇒ calculated from the χ² distribution with df = no. of independent variables.
• Reject H0 if test statistic > critical value.

Correcting heteroskedasticity:
• Robust standard errors: correct the standard errors of the estimated coefficients. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
• Generalized least squares: modifies the original equation. Requires econometric expertise to implement correctly on financial data.
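The Breusch-Pagan steps above can be sketched directly in NumPy: fit the regression, regress the squared residuals on X, and compare nR² to a χ² critical value. The simulated data and parameter values are illustrative assumptions:

```python
# Breusch-Pagan sketch on simulated data with conditional heteroskedasticity.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 3.0, size=n)
# Error spread grows with x, i.e. conditional heteroskedasticity by construction.
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.2 + x)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Auxiliary regression: squared residuals on the independent variable.
u2 = resid ** 2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
ss_res = np.sum((u2 - X @ g) ** 2)
ss_tot = np.sum((u2 - u2.mean()) ** 2)
r2_aux = 1 - ss_res / ss_tot

bp_stat = n * r2_aux   # Breusch-Pagan test statistic = n * R^2
crit = 3.841           # chi-square 5% critical value, df = 1 independent variable
print(bp_stat > crit)  # True => reject H0: conditional heteroskedasticity detected
```

In practice a library routine (e.g. a statsmodels heteroskedasticity test) would be used; the manual version just makes the nR² construction explicit.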
4.3 Multicollinearity

4.4 Summarizing the Issues
7. MACHINE LEARNING

7.1 Major Focuses of Data Analytics
Six focuses of data analytics:
i. Measuring correlations
ii. Making predictions
iii. Making causal inferences
iv. Classifying data
v. Sorting data into clusters
vi. Reducing the dimension of data

7.2 What is Machine Learning?
ML uses statistical techniques that give computer systems the ability to act by learning from data without being explicitly programmed.

7.3 Types of Machine Learning
Two broad categories of ML are:
1. Supervised learning: uses labeled training data and processes that info. to find the output. Supervised learning follows the logic of 'X leads to Y'.
2. Unsupervised learning: does not make use of labeled training data and does not follow the logic of 'X leads to Y'. There are no outcomes to match to; however, the input data is analyzed, and the program discovers structures within the data itself.

7.4 Machine Learning Algorithms

7.5 Supervised Machine Learning Training
Some simple steps to train ML models are:
1. Define the ML algorithm.
2. Specify the hyperparameters used in the ML technique.
3. Divide the dataset into two major groups:
   • Training sample
   • Validation sample
4. Evaluate model fit through the validation sample and tune the model's hyperparameters.
5. Repeat the training cycles for some given number of times or until the required performance level is achieved.
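The training steps above can be sketched with a simple ridge-style linear model in NumPy. The dataset, the hyperparameter grid and the 75/25 split are illustrative assumptions, not from the source:

```python
# Sketch of supervised ML training: split data, fit per hyperparameter,
# evaluate on the validation sample, keep the best setting.
import numpy as np

rng = np.random.default_rng(2)
n = 120
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Step 3: divide the dataset into training and validation samples.
split = int(0.75 * n)
X_tr, y_tr = X[:split], y[:split]
X_va, y_va = X[split:], y[split:]

best_lam, best_mse = None, np.inf
# Steps 2, 4, 5: for each hyperparameter value (ridge penalty lam), fit on
# the training sample and evaluate model fit on the validation sample.
for lam in [0.01, 0.1, 1.0, 10.0]:
    b = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(3), X_tr.T @ y_tr)
    mse = np.mean((y_va - X_va @ b) ** 2)
    if mse < best_mse:
        best_lam, best_mse = lam, mse

print(best_lam, best_mse)
```

The loop stops after exhausting the grid; in practice one would also cap iterations or stop once a required performance level is reached, as in step 5.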
7.4.1.1 Penalized Regression
7.4.1.2 Classification & Regression Trees
7.4.1.3 Random Forests
7.4.1.4 Neural Networks

Clustering
Clustering algorithms discover the inherent groupings in the data without any predefined class labels. Two clustering approaches are:
• Bottom-up clustering: each observation starts in its own cluster, and then assembles with other clusters progressively, based on some criteria, in a non-overlapping manner.
• Top-down clustering: all observations begin as one cluster, and then split into smaller and smaller clusters gradually.

K-means algorithm: a clustering algorithm where data is divided into k clusters based on two geometric ideas: the 'centroid' (average position of the points in the cluster) and 'Euclidean' distance (straight-line distance b/w two points).

Dimension reduction
• Reduces the no. of random variables for complex datasets.
• Principal component analysis (PCA) is an established method for dimension reduction. PCA reduces highly correlated data variables into fewer, necessary, uncorrelated composite variables.
• Dimension reduction techniques are applicable to numerical, textual or visual data.
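The two geometric ideas behind k-means (centroid and Euclidean distance) can be sketched in a few lines of NumPy. The two-group data, k = 2 and the deterministic starting centroids are illustrative assumptions:

```python
# K-means sketch: assign points to the nearest centroid by Euclidean
# distance, then recompute each centroid as the average position of its points.
import numpy as np

rng = np.random.default_rng(3)
# Two well-separated groups of 2-D points, centered near (0,0) and (5,5).
pts = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                 rng.normal(5.0, 0.5, (50, 2))])

k = 2
centroids = pts[[0, 50]].copy()  # deterministic start: one point from each group
for _ in range(10):
    # Euclidean (straight-line) distance from every point to each centroid.
    d = np.linalg.norm(pts[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # New centroid = average position of the points assigned to that cluster.
    centroids = np.array([pts[labels == j].mean(axis=0) for j in range(k)])

print(np.round(centroids, 2))
```

After a few iterations the two centroids settle near the true group centers, illustrating how the centroid/distance ideas drive the clustering.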