
REGRESSION

• Regression analysis is a powerful statistical method that allows us to examine the
relationship between two or more variables of interest.
• In regression analysis, the hypothesis is that there is a functional relationship that
allows prediction of a value of the dependent variable [Y] corresponding to a value
of the independent variable [X].
▪ Dependent Variable: This is the main factor that you want to predict. The
dependent variable is also called the response or outcome variable.
▪ Independent Variables: These are the factors that you hypothesize
have an impact on your dependent variable. Independent variables
include the risk factors and confounders, and are also called the predictors
or explanatory variables.

• Statistically, a regression equation is developed that indicates that the
dependent variable is a function of the independent variable.
Regression analysis is primarily used for two conceptually distinct purposes:
▪ Prediction and Forecasting: Regression analysis is widely used for prediction and
forecasting.
▪ Causal relationships: In some situations regression analysis can be used to
infer causal relationships between the independent and dependent variables.
Importantly, regressions by themselves only reveal relationships between a
dependent variable and a collection of independent variables in a fixed dataset.

Types of regression analysis:


• Linear
• Logistic
LINEAR REGRESSION:
o Linear regression is a statistical procedure for predicting the value of a dependent
variable from an independent variable when the relationship between the variables
can be described with a linear model.
o The goal of linear regression is to find the best-fit line that can accurately
predict the output for a continuous dependent variable. Linear regression
quantifies goodness of fit with r2.
o If a single independent variable is used for prediction it is called Simple Linear
Regression, and if there is more than one independent variable such
regression is called Multiple Linear Regression.
o By finding the best-fit line, the algorithm establishes the relationship between the
dependent variable and the independent variable, and the relationship should be of a
linear nature.
o The output of linear regression should be a continuous value such as price,
age, salary, etc. The relationship between the dependent variable and the independent
variable can be shown on a scatter plot (see the sketch below).

In such a plot the dependent variable is on the Y-axis and the independent variable
(experience) is on the X-axis.
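The idea can be sketched in a few lines of Python. The experience/salary numbers below are made up for illustration, and scikit-learn is assumed to be available; the fit gives the best-fit line and r2 as a goodness-of-fit measure.

```python
# A minimal sketch of simple linear regression on made-up data:
# predict salary (continuous dependent variable) from years of experience.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])      # experience (years)
y = np.array([35000, 40000, 47000, 52000, 60000, 64000])      # salary

model = LinearRegression().fit(X, y)

print("slope a    :", model.coef_[0])       # change in salary per extra year
print("intercept b:", model.intercept_)     # predicted salary at 0 years
print("r^2        :", r2_score(y, model.predict(X)))  # goodness of fit

# Predict the continuous output for a new value of the independent variable.
print("prediction for 7 years:", model.predict([[7.0]])[0])
```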

The regression equation

Correlation describes the strength of an association between two variables, and is
completely symmetrical: the correlation between A and B is the same as the correlation
between B and A. However, if the two variables are related it means that when one
changes by a certain amount the other changes on average by a certain amount. For
instance, in the children described earlier, greater height is associated, on average, with
greater anatomical dead space. If y represents the dependent variable and x the
independent variable, this relationship is described as the regression of y on x.

The relationship can be represented by a simple equation called the regression equation.
In this context "regression" simply means that the average value of y is a "function" of x,
that is, it changes with x.
The regression equation representing how much y changes with any given change of x
can be used to construct a regression line on a scatter diagram, and in the simplest case
this is assumed to be a straight line. The direction in which the line slopes depends on
whether the correlation is positive or negative. When the two sets of observations
increase or decrease together (positive) the line slopes upwards from left to right; when
one set decreases as the other increases the line slopes downwards from left to right. As
the line must be straight, it will probably pass through few, if any, of the dots. Given that
the association is well described by a straight line we have to define two features of the
line if we are to place it correctly on the diagram. The first of these is its distance above
the baseline; the second is its slope.
The basic relationship between X and Y is given by
Y = aX + b
where Y denotes the estimated value of the dependent variable for a given value of X,
a is the slope, and b is the Y-intercept. This equation is known as the regression
equation of Y on X (it also represents the regression line of Y on X when drawn on a
graph). It means that each unit change in X produces a change of a in Y, which is
positive for direct and negative for inverse relationships.

A linear regression equation can therefore be written as Yp = aX + b, where Yp is the
predicted value of the dependent variable, a is the slope of the regression line, and b is
the Y-intercept of the regression line.
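As a worked illustration of the equation above, here is a minimal sketch (with made-up x and y values) that estimates a and b using the ordinary least-squares formulas.

```python
# A minimal sketch: estimate the slope a and intercept b of Yp = aX + b
# by ordinary least squares on made-up (x, y) data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# a = cov(x, y) / var(x),  b = mean(y) - a * mean(x)
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

print(f"regression equation: Yp = {a:.3f} * X + {b:.3f}")
print("predicted Y at X = 6:", a * 6 + b)
```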
Statement of the linear regression model

• A linear regression model is typically stated in the form y = α + βx + ε


• The right hand side may take other forms, but generally comprises a linear
combination of the parameters, here denoted α and β.
• The term ε represents the unpredicted or unexplained variation in the dependent
variable; it is conventionally called the "error" whether it is really a measurement
error or not.
• The error term is conventionally assumed to have expected value equal to zero, as a
nonzero expected value could be absorbed into α (see the simulation sketch after this list).
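A minimal simulation sketch of this model, assuming NumPy is available: data are generated with chosen values of α and β and a zero-mean error term, and a least-squares fit recovers the parameters.

```python
# A minimal sketch of the model y = alpha + beta * x + epsilon:
# simulate data with known parameters and a zero-mean error term,
# then recover alpha and beta with a degree-1 least-squares fit.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.5, 0.8                      # illustrative "true" parameters

x = rng.uniform(0, 10, size=200)
epsilon = rng.normal(0, 1, size=200)        # "error" term, expected value zero
y = alpha + beta * x + epsilon

beta_hat, alpha_hat = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
print("estimated alpha:", alpha_hat)
print("estimated beta :", beta_hat)
```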

Robust regression
▪ A useful alternative to linear regression is robust regression, in which the mean
absolute error is minimized instead of the mean squared error used in linear regression
(see the sketch after this list).
▪ Robust regression is computationally much more intensive than linear regression
and is somewhat more difficult to implement as well.
▪ Robust regression usually means linear regression with robust (Huber-White)
standard errors (e.g. relaxing the assumption of homoskedasticity).
▪ An equivalent formulation, which explicitly shows the linear regression as a model
of conditional expectation, is E(y | x) = α + βx, with the conditional distribution of y
given x essentially the same as the distribution of the error term.
▪ A linear regression model need not be affine, let alone linear, in the independent
variables x.
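The first point in the list above (minimizing absolute rather than squared error) can be sketched as follows; the made-up data, the injected outliers, and the use of scipy.optimize are illustrative assumptions, not a prescribed implementation.

```python
# A minimal sketch of least-absolute-deviations ("robust") fitting:
# choose the slope and intercept that minimize the mean ABSOLUTE error,
# instead of the mean squared error used by ordinary linear regression.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, size=100)
y[:5] += 20                                  # a few gross outliers

def mean_abs_error(params):
    a, b = params
    return np.mean(np.abs(y - (a * x + b)))

res = minimize(mean_abs_error, x0=[0.0, 0.0], method="Nelder-Mead")
print("robust fit (slope, intercept)       :", res.x)

# For comparison, the ordinary least-squares fit is pulled toward the outliers.
print("least-squares fit (slope, intercept):", np.polyfit(x, y, 1))
```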
Advantages / Limitations of Linear Regression Model :
• Linear regression implements a statistical model that gives optimal results when the
relationships between the independent variables and the dependent variable are almost
linear.
• Linear regression is often inappropriately used to model non-linear relationships.
• Linear regression is limited to predicting numeric output.
• A lack of explanation about what has been learned can be a problem.

LOGISTIC REGRESSION:
o It can be used for classification as well as regression problems, but it is mainly used
for classification problems.
o Logistic regression is used to predict a categorical dependent variable with the
help of independent variables.
o The output of a logistic regression problem can only be between 0 and 1.
o Logistic regression can be used where the probability of one of two classes is
required, such as whether it will rain today or not: either 0 or 1, true or false, etc.
o Logistic regression is based on the concept of maximum likelihood estimation.
According to this estimation, the observed data should be most probable.
o In logistic regression, we pass the weighted sum of inputs through an activation
function that maps values to between 0 and 1. This activation function is known
as the sigmoid function, and the curve obtained is called the sigmoid curve or
S-curve (see the sketch below the equation).

o The equation for logistic regression is:
  log[ y / (1 - y) ] = b0 + b1x1 + b2x2 + ... + bnxn
  where y is the predicted probability; equivalently, y is obtained by applying the
  sigmoid function 1 / (1 + e^-z) to the weighted sum z = b0 + b1x1 + ... + bnxn.
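Here is a minimal sketch of logistic regression with scikit-learn on made-up binary data (a "rain today?" outcome coded 0/1 from a single humidity feature); the predicted probability lies between 0 and 1 and matches the sigmoid of the weighted sum.

```python
# A minimal sketch of logistic regression on made-up binary data:
# predict rain (1) / no rain (0) from a single humidity feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[20], [35], [45], [55], [65], [75], [85], [95]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba returns probabilities for class 0 and class 1 (between 0 and 1).
print("P(rain) at humidity 60:", clf.predict_proba([[60.0]])[0, 1])
print("predicted class at 60 :", clf.predict([[60.0]])[0])

# The same probability via the sigmoid of the weighted sum b0 + b1 * x.
z = clf.intercept_[0] + clf.coef_[0, 0] * 60.0
print("sigmoid check          :", 1.0 / (1.0 + np.exp(-z)))
```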

Difference between Linear Regression and Logistic Regression (a short code sketch
follows this comparison):

• Linear regression is used to predict a continuous dependent variable using a given set
of independent variables; logistic regression is used to predict a categorical dependent
variable using a given set of independent variables.
• Linear regression is used for solving regression problems; logistic regression is used
for solving classification problems.
• In linear regression, we predict the value of a continuous variable; in logistic
regression, we predict the value of a categorical variable.
• In linear regression, we find the best-fit line, by which we can easily predict the
output; in logistic regression, we find the S-curve, by which we can classify the samples.
• Linear regression coefficients are estimated by the least squares method; logistic
regression coefficients are estimated by the maximum likelihood method.
• The output of linear regression must be a continuous value, such as price, age, etc.;
the output of logistic regression must be a categorical value such as 0 or 1, Yes or No, etc.
• In linear regression, the relationship between the dependent variable and the
independent variables must be linear; in logistic regression, a linear relationship
between the dependent and independent variables is not required.
• In linear regression, there may be collinearity between the independent variables;
in logistic regression, there should not be collinearity between the independent variables.
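To make the contrast concrete, here is a minimal sketch (with made-up data) fitting both models on the same independent variable: linear regression returns a continuous value, while logistic regression returns a class and a probability.

```python
# A minimal sketch contrasting the two models on the same made-up feature.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y_continuous = np.array([12.0, 15.5, 19.0, 24.5, 28.0, 33.5, 37.0, 42.5])
y_categorical = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(X, y_continuous)
log = LogisticRegression().fit(X, y_categorical)

x_new = [[4.5]]
print("linear regression output  :", lin.predict(x_new)[0])           # continuous value
print("logistic regression output:", log.predict(x_new)[0])           # class 0 or 1
print("logistic probability      :", log.predict_proba(x_new)[0, 1])  # between 0 and 1
```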
