Regression Analysis
The relationship between two variables x and y can be represented by a simple equation called the regression equation.
In this context "regression" simply means that the average value of y is a "function" of x,
that is, it changes with x.
The regression equation representing how much y changes with any given change of x
can be used to construct a regression line on a scatter diagram, and in the simplest case
this is assumed to be a straight line. The direction in which the line slopes depends on
whether the correlation is positive or negative. When the two sets of observations
increase or decrease together (positive) the line slopes upwards from left to right; when
one set decreases as the other increases the line slopes downwards from left to right. As
the line must be straight, it will probably pass through few, if any, of the dots. Given that
the association is well described by a straight line we have to define two features of the
line if we are to place it correctly on the diagram. The first of these is its distance above
the baseline; the second is its slope.
The basic relationship between X and Y is given by
Y = aX + b.
Here Y denotes the estimated value of Y for a given value of X. This equation is known as the
regression equation of Y on X (it also represents the regression line of Y on X when drawn
on a graph). Each unit change in X produces a change of a in Y, which
is positive for direct and negative for inverse relationships.
In this equation a is the slope and b is the y-intercept, that is, the value of Y when X = 0.
A linear regression equation can therefore be written as Yp = aX + b, where Yp is the predicted value
of the dependent variable, a is the slope of the regression line, and b is the Y-intercept of
the regression line.
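As a minimal sketch of how a and b in Yp = aX + b can be estimated, the following assumes the usual least-squares criterion (minimizing squared error). The function names `fit_line` and `predict` and the data values are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx            # slope: change in y per unit change in x
    b = mean_y - a * mean_x  # intercept: value of y when x = 0
    return a, b


def predict(a, b, x):
    """Predicted value Yp for a given x."""
    return a * x + b


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print("slope a =", a, "intercept b =", b)
print("prediction at x = 5:", predict(a, b, 5.0))
```

A positive a corresponds to a line sloping upwards from left to right, a negative a to a line sloping downwards.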
Statement of the linear regression model
Robust regression
▪ A useful alternative to linear regression is robust regression in which mean absolute
error is minimized instead of mean squared error as in linear regression.
▪ Robust regression is computationally much more intensive than linear regression
and is somewhat more difficult to implement as well.
▪ The term robust regression is also used for linear regression with robust (Huber-White)
standard errors, which relaxes the assumption of homoskedasticity.
▪ An equivalent formulation, which explicitly shows linear regression as a model
of conditional expectation, is E(y | x) = ax + b, with the conditional distribution of y given x
essentially the same as the distribution of the error term.
▪ A linear regression model need not be affine, let alone linear, in the independent
variables x; it only needs to be linear in its parameters. For example, a model that contains
an x² term is still a linear regression model.
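The difference between minimizing squared error and absolute error can be illustrated in the simplest possible case, a model that predicts a single constant. This is only a sketch of the idea, not a full robust regression implementation: for a constant prediction the mean minimizes mean squared error and the median minimizes mean absolute error. The data values below are hypothetical.

```python
import statistics


def mean_squared_error(values, c):
    """Mean squared error of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values) / len(values)


def mean_absolute_error(values, c):
    """Mean absolute error of predicting the constant c for every value."""
    return sum(abs(v - c) for v in values) / len(values)


# Hypothetical observations with one gross outlier (100.0).
values = [10.0, 11.0, 12.0, 13.0, 100.0]

mse_fit = statistics.mean(values)    # constant that minimizes squared error
mae_fit = statistics.median(values)  # constant that minimizes absolute error

print("least-squares fit:", mse_fit)
print("least-absolute-error fit:", mae_fit)
```

The outlier pulls the least-squares fit far away from the bulk of the data, while the absolute-error fit stays with the majority of the observations.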
Advantages / Limitations of Linear Regression Model :
• Linear regression gives its best results when the relationships between
the independent variables and the dependent variable are almost linear.
• Linear regression is often inappropriately used to model non-linear relationships.
• Linear regression is limited to predicting numeric output.
• A lack of explanation about what has been learned can be a problem.
LOGISTIC REGRESSION:
o Despite its name, logistic regression is mainly used for Classification problems
rather than for Regression problems.
o Logistic regression is used to predict a categorical dependent variable with the
help of independent variables.
o The output of logistic regression is a probability, so it always lies between 0 and 1.
o Logistic regression can be used where the probability of each of two classes is
required, such as whether it will rain today or not (0 or 1, true or false, etc.).
o Logistic regression is based on the concept of Maximum Likelihood estimation:
the parameters are chosen so that the observed data are most probable under the model.
o In logistic regression, we pass the weighted sum of inputs through an activation
function that maps any value to the range between 0 and 1. This activation function is known
as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve.
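The sigmoid mapping described in the list above can be sketched as follows. The weights, bias, feature values, and the 0.5 decision threshold are hypothetical choices for illustration, and the helper name `predict_probability` is not taken from any library.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(weights, bias, features):
    """Pass the weighted sum of the inputs through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical model parameters and one hypothetical observation.
weights = [0.8, -0.5]
bias = 0.1
p = predict_probability(weights, bias, [2.0, 1.0])

# Turn the probability into a class label using an assumed 0.5 threshold.
label = 1 if p >= 0.5 else 0
print("probability:", p, "label:", label)
```

Plotting `sigmoid(z)` against z gives the S-curve: values far below zero map close to 0, values far above zero map close to 1, and z = 0 maps to 0.5.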
Linear Regression: the output must be a continuous value, such as price, age, etc.
Logistic Regression: the output must be a categorical value, such as 0 or 1, Yes or No, etc.
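To make the Maximum Likelihood idea and the categorical output concrete, here is a small sketch that fits a one-variable logistic regression by gradient ascent on the log-likelihood. Gradient ascent, the learning rate, the number of steps, and the data are all assumptions made for this illustration; practical libraries typically use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(w, b, xs, ys):
    """Log-probability of the observed 0/1 labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, steps=2000):
    """Maximize the log-likelihood by plain gradient ascent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood with respect to w and b.
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Hypothetical data: x is a single input, y is the observed class (0 or 1).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
for x in (0.5, 4.0):
    p = sigmoid(w * x + b)
    print("x =", x, "probability =", round(p, 3), "class =", 1 if p >= 0.5 else 0)
```

The fitted model returns a probability for each input, and a threshold (0.5 here, by assumption) converts it into the categorical answer, whereas a linear regression would return an unbounded continuous value.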