Assignment on Regression

(Business Statistics & Research Methodology)


MBA 1st Year (2nd Sem)

Submitted To:
Dr. Amanpreet Singh
Submitted By:
Sanjana Chhabra
Roll No. 19421198
Section: D

School of Management Studies


What is Regression Analysis?
Regression analysis is a statistical tool for quantifying the
relationship between variables based on past experience (observations).
In its simplest form, it relates one independent variable (hence
"simple") to one dependent variable. For
example, simple linear regression analysis can be used to express how a
company's electricity cost (the dependent variable) changes as the
company's production machine hours (the independent variable) change.
Fortunately, there is software to compute the best-fitting straight
line (hence "linear") that expresses the past relationship between the
dependent and independent variables. Continuing our example, you will
enter 1) the amounts of the past monthly electricity bills, and 2) the
number of machine hours occurring during the period of each of the
bills. Next, the software will likely use the least-squares method to
produce the formula for the best-fitting line. The line will appear in the
form y = a + bx. In addition, the software will provide statistics
regarding the correlation, confidence, dispersion around the line, and
more.
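As a rough sketch of what such software does (the machine-hour and billing figures below are invented for illustration), the least-squares line y = a + bx can be computed with an off-the-shelf tool such as NumPy:

```python
import numpy as np

# Hypothetical monthly observations (invented, not real billing data)
machine_hours = np.array([4000, 4500, 5000, 5500, 6000, 6500], dtype=float)
electricity_cost = np.array([920, 1010, 1100, 1180, 1290, 1360], dtype=float)

# Least-squares fit of the straight line y = a + b*x
b, a = np.polyfit(machine_hours, electricity_cost, 1)  # slope first, then intercept

print(f"y = {a:.2f} + {b:.4f}x")
# Predicted bill for a month with 5200 machine hours
print(a + b * 5200)
```

Statistical packages report the same fitted line together with the correlation, confidence, and dispersion statistics mentioned above.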
(In all likelihood there are many independent variables causing a change
in the amount of the dependent variable. Therefore, you should not
expect that only one independent variable will explain a high percentage
of the change in the dependent variable. To increase the percentage, you
should think of the many independent variables that could cause a
change in the dependent variable. Next you should test the effect of the
combination of these independent variables or drivers by using multiple
regression analysis software.)
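As a minimal multiple-regression sketch (again with invented figures, and a hypothetical second driver such as the monthly shift count), the combined effect of several independent variables can be estimated with ordinary least squares:

```python
import numpy as np

# Hypothetical drivers (invented figures): machine hours and shift count
x1 = np.array([4000, 4500, 5000, 5500, 6000, 6500], dtype=float)
x2 = np.array([40, 42, 45, 47, 50, 52], dtype=float)
y = np.array([920, 1010, 1100, 1180, 1290, 1360], dtype=float)  # monthly bills

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of y = b0 + b1*x1 + b2*x2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
```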
Prior to using simple linear regression analysis it is important to follow
these preliminary steps:

 seek an independent variable that is likely to cause or drive the
change in the dependent variable
 make certain that the past amounts for the independent variable
occur in the exact same period as the amounts of the dependent variable
 plot the past observations on a graph using the y-axis for the cost
(monthly electricity bill) and the x-axis for the activity (machine hours
used during the exact period of the electricity bill)
 review the plotted observations for a linear pattern and for
any outliers
 keep in mind that there can be correlation without cause and effect
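The plotting and outlier-review steps above can be partly automated. The sketch below uses synthetic data with one deliberately distorted bill, and flags observations that sit far from the fitted line (the two-standard-deviation cutoff is an illustrative choice, not a rule):

```python
import numpy as np

# Synthetic paired observations; both series cover the same monthly periods
hours = np.arange(1000, 10001, 1000, dtype=float)  # machine hours
bills = 200 + 0.18 * hours                          # bills on a clean linear trend
bills[7] += 1500                                    # one deliberately distorted bill

slope, intercept = np.polyfit(hours, bills, 1)
residuals = bills - (intercept + slope * hours)

# Flag points lying more than two standard deviations from the fitted line
z = residuals / residuals.std()
suspect_hours = hours[np.abs(z) > 2]
print(suspect_hours)  # the distorted observation at 8000 hours
```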

What are the uses of a regression model?


1. Regression Analysis Basics
In its most rudimentary form, regression analysis is the estimation of
the relationship between two variables. Say you want to estimate the growth
in meat sales (MS Growth), based on economic growth (GDP
Growth). If past data indicates that the growth in meat sales is around
one and a half times the growth in the economy, the regression would
look as follows:
MS Growth = 1.5 × (GDP Growth).
The relationship between many variables also involves a constant. If
meat sales are trending up, growing one percent even in a stagnant
economy, the equation would be: MS Growth = 1.5 × (GDP Growth) + 1.
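Plugging numbers into this illustrative equation makes the roles of the slope and the constant concrete:

```python
# Illustrative model from the text: MS Growth = 1.5 * (GDP Growth) + 1, in percent
def meat_sales_growth(gdp_growth):
    return 1.5 * gdp_growth + 1.0

print(meat_sales_growth(2.0))  # 1.5 * 2 + 1 = 4.0 percent
print(meat_sales_growth(0.0))  # stagnant economy: the constant alone, 1.0 percent
```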
2. Multiple and Non-Linear Regression
The variable you are trying to estimate is referred to as dependent, while
the variable you use in the model to predict the dependent variable is
called independent. A regression can only have one dependent variable.
However, the number of potential independent variables is unlimited and
the model is referred to as multiple regression if it involves several
independent variables. Regression models also can pinpoint more
complex relationships between variables.
Sometimes, a model uses the square, square-root or any other power of
one or more independent variables to predict the dependent one, which
makes it a non-linear regression. For example: MS Growth = 1/2 × (square
root of GDP Growth).
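A quick evaluation of this square-root example (the input value is purely illustrative):

```python
import math

# Non-linear example from the text: MS Growth = 1/2 * sqrt(GDP Growth)
def meat_sales_growth(gdp_growth):
    return 0.5 * math.sqrt(gdp_growth)

print(meat_sales_growth(4.0))  # 0.5 * sqrt(4) = 1.0
```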
3. Predicting the Future
The most common use of regression in business is to predict events that
have yet to occur. Demand analysis, for example, predicts how many
units consumers will purchase. Many other key parameters other than
demand are dependent variables in regression models, however.
Predicting the number of shoppers who will pass in front of a particular
billboard or the number of viewers who will watch the Super Bowl may
help management assess what to pay for an advertisement. Insurance
companies heavily rely on regression analysis to estimate how many
policy holders will be involved in accidents or be victims of burglaries,
for example.
4. Optimization of Business Processes
Another key use of regression models is the optimization of business
processes. A factory manager might, for example, build a model to
understand the relationship between oven temperature and the shelf life
of the cookies baked in those ovens. A company operating a call center
may wish to know the relationship between wait times of callers and
number of complaints.
A fundamental driver of enhanced productivity in business and rapid
economic advancement around the globe during the 20th century was
the frequent use of statistical tools in manufacturing as well as service
industries. Today, managers consider regression an indispensable tool.
What is the interpretation of Regression coefficients?
Linear regression is one of the most popular statistical techniques.

Despite its popularity, interpretation of the regression coefficients of any
but the simplest models is sometimes, well… difficult.
So let’s interpret the coefficients of a continuous and a categorical
variable.  Although the example here is a linear regression model, the
approach works for interpreting coefficients from any regression model
without interactions, including logistic and proportional hazards models.

A linear regression model with two predictor variables can be expressed
with the following equation:

Y = B0 + B1*X1 + B2*X2 + e.


The variables in the model are:

 Y, the response variable;
 X1, the first predictor variable;
 X2, the second predictor variable; and
 e, the residual error, which is an unmeasured variable.
The parameters in the model are:

 B0, the Y-intercept;
 B1, the first regression coefficient; and
 B2, the second regression coefficient.
One example would be a model of the height of a shrub (Y) based on the
amount of bacteria in the soil (X1) and whether the plant is located in
partial or full sun (X2).
Height is measured in cm, bacteria is measured in thousand per ml of
soil, and type of sun = 0 if the plant is in partial sun and type of sun = 1
if the plant is in full sun.

Let’s say it turned out that the regression equation was estimated as
follows:

Y = 42 + 2.3*X1 + 11*X2
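The estimated equation can be turned directly into a prediction function, with units as defined above (bacteria in thousands per ml, sun coded 0/1):

```python
def shrub_height(bacteria, full_sun):
    """Estimated equation from the text: Y = 42 + 2.3*X1 + 11*X2 (height in cm)."""
    return 42 + 2.3 * bacteria + 11 * full_sun

print(shrub_height(5, 0))  # partial sun, 5000 bacteria/ml: 42 + 2.3*5 = 53.5 cm
print(shrub_height(5, 1))  # full sun, same bacteria level: 11 cm taller, 64.5 cm
```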
Interpreting the Intercept
B0, the Y-intercept, can be interpreted as the value you would predict for
Y if both X1 = 0 and X2 = 0.
We would expect an average height of 42 cm for shrubs in partial sun
with no bacteria in the soil. However, this is only a meaningful
interpretation if it is reasonable that both X1 and X2 can be 0, and if the
data set actually included values for X1 and X2 that were near 0.
If neither of these conditions are true, then B0 really has no meaningful
interpretation. It just anchors the regression line in the right place. In our
case, it is easy to see that X2 sometimes is 0, but if X1, our bacteria level,
never comes close to 0, then our intercept has no real interpretation.

Interpreting Coefficients of Continuous Predictor Variables


Since X1 is a continuous variable, B1 represents the difference in the
predicted value of Y for each one-unit difference in X1, if X2 remains
constant.
This means that if X1 differs by one unit (and X2 does not differ), Y will
differ by B1 units, on average.
In our example, shrubs with a 5000/ml bacteria count would, on average, be
2.3 cm taller than those with a 4000/ml bacteria count, which likewise
would be about 2.3 cm taller than those with a 3000/ml bacteria count, as
long as they were in the same type of sun.

(Don’t forget that since the bacteria count was measured in 1000 per ml
of soil, 1000 bacteria represent one unit of X1).

Interpreting Coefficients of Categorical Predictor Variables


Similarly, B2 is interpreted as the difference in the predicted value of Y
for each one-unit difference in X2, if X1 remains constant. However,
since X2 is a categorical variable coded as 0 or 1, a one-unit difference
represents switching from one category to the other.
B2 is then the average difference in Y between the category for which
X2 = 0 (the reference group) and the category for which X2 = 1 (the
comparison group).
So compared to shrubs that were in partial sun, we would expect shrubs
in full sun to be 11 cm taller, on average, at the same level of soil
bacteria.
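Both interpretations can be checked numerically against the estimated equation from the example (a toy check, not a new analysis):

```python
def shrub_height(bacteria, full_sun):
    # Estimated equation from the text: Y = 42 + 2.3*X1 + 11*X2
    return 42 + 2.3 * bacteria + 11 * full_sun

# B1: a one-unit difference in X1 (1000 bacteria/ml), sun held constant
b1 = shrub_height(5, 0) - shrub_height(4, 0)   # 2.3 cm

# B2: switching from partial sun (X2 = 0) to full sun (X2 = 1), bacteria held constant
b2 = shrub_height(4, 1) - shrub_height(4, 0)   # 11 cm
```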

Interpreting Coefficients when Predictor Variables are Correlated


Don’t forget that each coefficient is influenced by the other variables in
a regression model. Because predictor variables are nearly always
associated, two or more variables may explain some of the same
variation in Y.

Therefore, each coefficient does not measure the total effect on Y of its
corresponding variable, as it would if it were the only variable in the
model.

Rather, each coefficient represents the additional effect of adding that
variable to the model, if the effects of all other variables in the model
are already accounted for. (These are called Type 3 regression
coefficients, and they are the usual way to calculate them. However, not
all software uses Type 3 coefficients, so make sure you check your
software manual so you know what you're getting.)
This means that each coefficient will change when other variables are
added to or deleted from the model.

What is Regression Line?

Definition: The Regression Line is the line that best fits the data, such
that the overall distance from the line to the points (variable values)
plotted on a graph is the smallest. In other words, a line used to
minimize the squared deviations of predictions is called the regression
line.
There are as many regression lines as there are variables.
Suppose we take two variables, say X and Y, then there will be
two regression lines:

 Regression line of Y on X: This gives the most probable values of
Y from the given values of X.
 Regression line of X on Y: This gives the most probable values of
X from the given values of Y.
The algebraic expressions of these regression lines are called
Regression Equations. There will be two regression equations
for the two regression lines.

The degree of correlation between the variables depends on the distance
between these two regression lines: the nearer the regression lines are
to each other, the higher the degree of correlation; the farther apart
they are, the lesser the degree of correlation.

The correlation is said to be either perfect positive or perfect
negative when the two regression lines coincide, i.e. only one line
exists. If the variables are independent, the correlation will be
zero, and the lines of regression will be at right angles, i.e.
parallel to the X-axis and Y-axis.

Note: The regression lines cut each other at the point of the averages
of X and Y. That is, if a perpendicular is dropped from the point where
the lines intersect onto the X-axis, we get the mean value of X.
Similarly, if a horizontal line is drawn from that point to the Y-axis,
we get the mean value of Y.
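This property is easy to verify numerically: both least-squares lines pass through the point of means. A small sketch with invented data:

```python
import numpy as np

# Invented paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Regression line of Y on X: y = a_yx + b_yx * x
b_yx, a_yx = np.polyfit(x, y, 1)
# Regression line of X on Y: x = a_xy + b_xy * y
b_xy, a_xy = np.polyfit(y, x, 1)

# Both lines pass through the point of means (mean of X, mean of Y)
print(np.isclose(a_yx + b_yx * x.mean(), y.mean()))  # True
print(np.isclose(a_xy + b_xy * y.mean(), x.mean()))  # True
```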

What are the applications of a Regression Model?


1. Predictive Analytics:
Predictive analytics i.e. forecasting future opportunities and risks is the
most prominent application of regression analysis in business. Demand
analysis, for instance, predicts the number of items which a consumer
will probably purchase. However, demand is not the only dependent
variable when it comes to business. Regression analysis can go far
beyond forecasting impact on direct revenue. For example, we can
forecast the number of shoppers who will pass in front of a particular
billboard and use that data to estimate the maximum amount to bid for an
advertisement. Insurance companies rely heavily on regression analysis
to estimate the credit standing of policyholders and a possible number of
claims in a given time period.
2. Operation Efficiency:
Regression models can also be used to optimize business processes. A
factory manager, for example, can create a statistical model to
understand the impact of oven temperature on the shelf life of the
cookies baked in those ovens. In a call center, we can analyze the
relationship between wait times of callers and number of complaints.
Data-driven decision making eliminates guesswork, hypotheses, and
corporate politics from decision making. This improves business
performance by highlighting the areas that have the maximum impact on
the operational efficiency and revenues.
 
3. Supporting Decisions:
Businesses today are overloaded with data on finances, operations and
customer purchases. Increasingly, executives are leaning on data
analytics to make informed business decisions, thus reducing reliance on
intuition and gut feel. Regression analysis can bring a scientific angle
to the management of any business. By reducing the tremendous amount
of raw data into actionable information, regression analysis leads the
way to smarter and more accurate decisions. This does not mean that
regression analysis is an end to managers' creative thinking. This
technique acts as a perfect tool to test a hypothesis before diving into
execution.
 
4. Correcting Errors:
Regression is not only great for lending empirical support to
management decisions but also for identifying errors in judgment. For
example, a retail store manager may believe that extending shopping
hours will greatly increase sales. Regression analysis, however, may
indicate that the increase in revenue might not be sufficient to support
the rise in operating expenses due to longer working hours (such as
additional employee labor charges). Hence, regression analysis can
provide quantitative support for decisions and prevent mistakes caused
by managers' faulty intuition.
 
5. New Insights:
Over time businesses have gathered a large volume of unorganized data
that has the potential to yield valuable insights. However, this data is
useless without proper analysis. Regression analysis techniques can find
a relationship between different variables by uncovering patterns that
were previously unnoticed. For example, analysis of data from point of
sales systems and purchase accounts may highlight market patterns like
increase in demand on certain days of the week or at certain times of the
year. By acting on these insights, you can maintain optimal stock and
personnel levels before a spike in demand arises.

What is the interpretation of R-Squared?

R-squared evaluates the scatter of the data points around the fitted
regression line. It is also called the coefficient of determination, or the
coefficient of multiple determination for multiple regression. For the
same data set, higher R-squared values represent smaller differences
between the observed data and the fitted values.

R-squared is the percentage of the dependent-variable variation that a
linear model explains. R-squared is always between 0% and 100%:

o 0% represents a model that does not explain any of the variation in
the response variable around its mean. The mean of the dependent
variable predicts the dependent variable as well as the regression
model.
o 100% represents a model that explains all of the variation in the
response variable around its mean.

Usually, the larger the R-squared, the better the regression model fits
your observations.
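R-squared can be computed by hand from the residual and total sums of squares; a sketch with invented, nearly linear data:

```python
import numpy as np

# Invented, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1 for this nearly linear data
```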
