Linear Algebra in Data Analysis
Basic matrix operations:
• Addition
• Scalar Multiplication
• Transposition
• Matrix Multiplication
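As a minimal illustration (the toy matrices below are made up, not from the slides), these four operations look like this in NumPy:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A + B)    # addition: element-wise sum of same-shaped matrices
print(3 * A)    # scalar multiplication: every entry scaled by 3
print(A.T)      # transposition: rows become columns
print(A @ B)    # matrix multiplication: row-by-column dot products
```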
Linear Regression
A linear regression model takes the form Y = Xβ + ε, where:
• Y — the dependent variable (the target value to predict)
• X — the independent variable(s)
• β — the regression coefficients, which quantify the relationship between each x variable and the dependent variable y
• ε — the error term: the residual produced by a statistical or mathematical model when it does not fully capture the actual relationship between the independent and dependent variables. Because the relationship is incomplete, the error term is the amount by which the equation may differ during empirical analysis.
The job of a linear regression model is essentially to find a linear relationship
between the input (X) and output (y).
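As a minimal sketch of that job, assuming nothing beyond NumPy and synthetic data (the true coefficients 2 and 3 are chosen only for illustration), ordinary least squares can be fit as follows:

```python
import numpy as np

# Synthetic data: y = 2 + 3x + noise (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 1, size=50)   # the noise plays the role of ε

# Design matrix X with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of the coefficients β
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                    # approximately [2, 3]
residuals = y - X @ beta       # the per-observation error term ε
```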
Case study
• The data set describes several variables for different baseball teams, originally used to predict whether a team makes the playoffs or not. For now, to make it a regression problem, suppose we are interested in predicting OOBP from the rest of the variables; 'OOBP' is therefore our target variable. To solve this problem using linear regression, we have to find the parameter vector θ.
To find the final parameter vector θ, assume the model fθ(X) = θᵀX, where θ is the parameter vector to be estimated and X is the column vector of features (independent variables). The least-squares solution requires the inverse of (XᵀX):

θ = (XᵀX)⁻¹ Xᵀy
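A sketch of this computation, assuming X and y have already been assembled from the data set (the variable names below are hypothetical):

```python
import numpy as np

def normal_equation(X, y):
    """Normal equation: theta = (X^T X)^{-1} X^T y.
    np.linalg.solve is used instead of an explicit inverse
    for better numerical stability."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical usage: rows of X are teams, columns are the feature
# variables (plus a leading column of ones); y is the OOBP column.
# theta = normal_equation(X, y)
```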
The main attributes that we need to be concerned with are:
• RS — Runs Scored
• RA — Runs Allowed
• W — Wins
• OBP — On-Base Percentage
• SLG — Slugging Percentage
• BA — Batting Average
• Playoffs — Whether a team made it to the playoffs or not
• OOBP — Opponents' On-Base Percentage
• OSLG — Opponents' Slugging Percentage
• IRIS — https://towardsdatascience.com/eigenvalues-and-eigenvectors-378e851bf372
POLYNOMIAL REGRESSION
• The fᵢ's are polynomial terms of the input variables, e.g., f₁(x) = x₁, f₂(x) = x₁², etc.
• Example: 2nd-order polynomial regression of two predictor variables:
y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε
Here xᵢ is a linear term, xᵢ² is a quadratic term, and x₁x₂ is called the interaction term between x₁ and x₂ (a sketch of the corresponding feature expansion follows below).
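As a sketch (plain NumPy; the helper name below is made up), the second-order expansion of two predictors can be built by hand and then fit with ordinary least squares exactly as before:

```python
import numpy as np

def second_order_features(x1, x2):
    """Expand two predictors into 2nd-order polynomial terms:
    intercept, linear (x1, x2), quadratic (x1^2, x2^2),
    and the interaction term x1*x2."""
    return np.column_stack([
        np.ones_like(x1),   # intercept
        x1, x2,             # linear terms
        x1 ** 2, x2 ** 2,   # quadratic terms
        x1 * x2,            # interaction term
    ])

# Fit with least squares on the expanded design matrix:
# beta, *_ = np.linalg.lstsq(second_order_features(x1, x2), y, rcond=None)
```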
WEIGHT LOSS – Gym, Yoga, Walking, Jogging, Diet, Surgery, Genetics, etc.
HOSPITAL – Gender, Illness, Length of stay, Contagiousness, Seriousness, etc.
PROs AND CONs
• Multiple regression — a powerful technique for predicting the unknown values of one variable from several available variables. The known variables are classified as predictors.
• Ridge regression — shrinks the coefficients towards zero, which introduces some bias, but it can reduce the variance enough to yield a better mean-squared error. The amount of shrinkage is controlled by λ, which multiplies the ridge penalty; since a larger λ means more shrinkage, different values of λ give different coefficient estimates.
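A minimal sketch of the ridge estimate in closed form (NumPy only; note that in practice the intercept column is usually left unpenalised, which this sketch ignores for brevity):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate: beta = (X^T X + lam * I)^{-1} X^T y.
    A larger lam shrinks the coefficients harder towards zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Sweeping lam shows the shrinkage directly:
# for lam in (0.0, 1.0, 10.0, 100.0):
#     print(lam, ridge(X, y, lam))
```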
Singular Value Decomposition
• SVD is a generalisation of eigenvalue decomposition.
• SVD is used to remove redundant features from a data set.
• Running an algorithm on the original data set (of, say, 1000 features) will be time-inefficient and will require a lot of memory.
• Convert a colour image to a black-and-white image based on pixel intensity.
• Different rank-k approximations of the image correspond to different resolutions: the higher the rank, the more information about pixel intensity is retained.
• Remarkably, a reasonably good image is possible with rank 20 or 30 instead of 100 or 200, because the discarded components carry highly redundant data (a sketch follows below).
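A sketch of the rank-k reconstruction described above, assuming a greyscale image stored as a 2-D NumPy array (the array name is hypothetical):

```python
import numpy as np

def rank_k_approximation(image, k):
    """Keep only the k largest singular values of a greyscale image
    (a 2-D array of pixel intensities) and reconstruct it."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# A rank-30 reconstruction often looks close to the original while
# storing far fewer numbers than the full pixel matrix:
# compressed = rank_k_approximation(grey_image, k=30)
```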
• The presence of redundant features causes multicollinearity in linear regression.
• Also, some features are not significant for the model. Omitting these features gives a better fit, better time efficiency, and less disk space. Singular value decomposition is used to get rid of the redundant features present in the data.
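A sketch of that feature-reduction step via truncated SVD (plain NumPy; the function name is made up for illustration):

```python
import numpy as np

def truncated_svd_features(X, k):
    """Project the centred data matrix X onto its top-k right singular
    vectors, replacing many redundant/correlated features with k
    uncorrelated components."""
    Xc = X - X.mean(axis=0)                     # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                        # shape: (n_samples, k)

# Fitting the regression on the reduced matrix sidesteps the
# multicollinearity caused by redundant features and saves memory.
```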