Download as pdf or txt
Download as pdf or txt
You are on page 1of 45

NOTE

 Slides marked as Extra study are not as a part of syllabus.


 Those are provided for add-on knowledge.

1
Disclaimer
 This presentation is purely for academic purpose and does not carry
any commercial value.
 All non-academic images used in this presentation are property of
respective image holder(s). Images are used only for indicative
purpose and does not carry any other meaning.

2
Please follow this…

3
Regression Analysis
Data Analysis with Excel

www.pibm.in

4
TEXT BOOK
 Hector Guerrero, “Excel Data Analysis -Modeling and Simulation”,
Springer-Verlog
Camm, Cochran, Fry, Ohlmann, Anderson, Sweeney, Willians,
“Essentials of Business Analytics”, CenageLearning

REFERENCE LINK FOR EXCEL


https://www.excel-easy.com/
https://support.office.com/en-us/article/excel-for-windows-
training-9bc05390-e94c-46af-a5b3-
d7c22f6990bb?wt.mc_id=otc_home&ui=en-US&rs=en-US&ad=US
5
Table of Content
1 Linear regression Toy store : Case Study

2 Modeling

3 Estimation

4 Inference

5 Prediction

6 Example 2
6
Data Set
 Toy-Sale2.xlsx
 QuantitySales.xlsx

7
Linear Regression

What is Linear Regression?


Linear regression attempts to fit a linear relation between a variable of
interest and a set of variables that may be related to the variable of
interest.

8
Linear Regression for Business Statistics

Example
There is a Sales manager of a toys retail company which sells
various kinds of toys in the local market. This sales manager
needs to make some kind of projections about the number of
monthly units that the retail company will be able to sell of
this particular toy in the coming half year. In the past she has
been making such projections based on her gut feeling and
now wishes to be a little more scientific about the whole
process.

9
Linear Regression

Overview of Regression

1.Modeling - Developing a regression model


2.Estimation - Using software to estimate the model
3.Inference - Interpreting the estimated regression model
4.Prediction - Making predictions about the variable of interest

10
1. Modeling
Developing a regression model
To develop regression model implies representing your knowledge
and beliefs about process and equation
Example (continued…) Based on her experience, the manager figures
out that the monthly unit sales depend on three important variables,
the price at which the toy is sold, the monthly amount that the
company spends on advertising the toy and the monthly amount
spent on promotions for the toy.

11
1. Modeling
Question that comes to our mind is, are these only factor on which sales
depends?
Certainly not.

 The level of excitement amongst consumers about this toy,


 The month of year for which sales are being considered,
 Perhaps some micro factors such as countries, GDP and so
on. …….

There are lots of factors that might impact sales.


12
1. Modeling
Sales is variable of interest or Y variable or dependent variable or L.H.S.
variable.
The R.H.S. Variables will be used to explain sales. These are Price of toy in
particular month, Advertise Expenditure & promotional Expenditure.
Β0, β1, β2, β3 are coefficients of equation
Sales depends on Advertise Expenditure & promotional Expenditure

The Regression Line is

13
1. Modeling

14
Toy-sales2.xlsx
Month Unit Sales Price ($) Adexp ('000$) Promexp ('000$)

1 73959 8.75 50.04 61.13


2 71544 8.99 50.74 60.19
3 78587 7.50 50.14 59.16
4 80364 7.25 50.27 60.38
5 78771 7.40 51.25 59.71
6 71986 8.50 50.65 59.88
7 74885 8.40 50.87 60.14
8 73345 7.90 50.15 60.08
9 76659 7.25 48.24 59.90
10 71880 8.70 50.19 59.68
11 73598 8.40 51.11 59.83
12 74893 8.10 51.49 59.77
15
Data Analysis Toolpak - Excel
1. On the Data tab, in the Analysis group, click
Data Analysis
2. Select Regression and click OK.
3. Select the Y Range (A1:A25). This is the
predictor variable (also called dependent
variable).
4. Select the X Range(B1:E25). These are the
explanatory variables (also called independent
variables). These columns must be adjacent to
each other.
5. Check Labels.
6. Click in the Output Range box and select cell
H1.
7. Check Residuals.
8. Click OK.

16
Data Analysis Toolpak - Excel

SUMMARY OUTPUT Coefficients Standard Error t Stat P-value

Regression Statistics Intercept -25096.8329 24859.61131 -1.0095425 0.324773


Multiple R 0.926739
R Square 0.858845 Price ($) -5055.26987 526.3995537 -9.6034843 6.22E-09
Adjusted R Adexp
Square 0.837671 ('000$) 648.6121403 209.0048787 3.10333493 0.005602
Standard
Error 1274.936 Promexp
Observations 24 ('000$) 1802.610956 392.8485427 4.5885647 0.000178

17
2. Estimation
Using software to estimate the model

18
2. Estimation
β0 = -25096.83
β1 = -5055.27
β2 = 648.61
β3 = 1802.61

β1 represents extent of impact of price has on sales


β2 represents extent of impact of Advertisement Expenditure has on sales
β3 represents extent of impact of promotional Expenditure has on sales

19
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.926738719
R Square 0.858844653
Adjusted R
Square 0.83767135
Standard Error 1274.93553
Observations 24

ANOVA
df SS MS F Significance F
Regression 3 197798832.8 65932944.3 40.56262 1.085E-08
Residual 20 32509212.11 1625460.61
Total 23 230308045

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -25096.8329 24859.61131 -1.0095425 0.324773 -76953.073 26759.4074 -76953.0733 26759.40743
Price ($) -5055.26987 526.3995537 -9.6034843 6.22E-09 -6153.3201 -3957.2196 -6153.32009 -3957.21964
Adexp ('000$) 648.6121403 209.0048787 3.10333493 0.005602 212.6356 1084.58868 212.6356042 1084.588676

20
2. Estimation
R2 Statistic is measure of goodness of
fit
SUMMARY OUTPUT
0 ≤ R2 ≤ 1
Regression Statistics That is, The Value of R2 always lie between 0 and
Multiple R 0.926738719 1
R Square 0.858844653 R2 is proportion of variation in Y variable
Adjusted R Square 0.83767135 explained by regression model
Standard Error 1274.93553
Observations 24
R2 = 0.86 %
This model is able to explain about 0.86 %
variation of changes in unit sales of toy.

The R2 closer to 1, the better the regression line fits the data
21
Significance F and P-values
To check if your results are reliable (statistically significant), look at Significance F .
F = 1.085E-08 = 0.0000 < 0.05
1) If this value F < 0.05, then model is OK.
If Significance F > 0.05, 1.it's probably better to stop using this set of independent variables.
2. Delete a variable with a high P-value (greater than 0.05) and rerun the regression until Significance
F drops below 0.05.
2) Most or all P-values should be bellow 0.05 ( P-value for intercept > 0.05, P-values for independent are 0.000,
0.005 )
ANOVA
df SS MS F Significance F
Regression 3 197798832.8 65932944.3 40.56262 1.085E-08
Residual 20 32509212.11 1625460.61
Total 23 230308045

Coefficients Standard Error t Stat P-value Lower 95%


Intercept -25096.8329 24859.61131 -1.0095425 0.324773 -76953.073
Price ($) -5055.26987 526.3995537 -9.6034843 6.22E-09 -6153.3201
Adexp ('000$) 648.6121403 209.0048787 3.10333493 0.005602 212.6356

22
3. Inference
Interpreting the estimated regression model

When the variable X1 increases by one unit, the Y variable increases by β1


units all other variables in Model are being kept at the same level.
Similarly When the variable X2 increases by one unit, the Y variable
increases by β2 units all other variables in Model are being kept at the same
level. ……..

23
3. Inference
Interpreting the estimated regression model

24
4. Prediction

Making predictions about the variable of interest


For the coming six months, company management is considering three
alternative scenarios for selling this particular toy.

Scenario 1: Price = 9.10$, AdExp = 52,000$, PromExp = 61,000$


Scenario 2: Price = 7.10$, AdExp = 48,000$, PromExp = 57,000$
Scenario 3: Price = 8.10$, AdExp = 50,000$, PromExp = 60,000$

Which scenario to implement to maximize unit sales?

25
H I J K L M N
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
17 Intercept -25096.8329 24859.61131 -1.0095425 0.324773 -76953.073 26759.4074
18 Price ($) -5055.26987 526.3995537 -9.6034843 6.22E-09 -6153.3201 -3957.2196

19 Adexp 648.6121403 209.0048787 3.10333493 0.005602 212.6356 1084.58868


('000$)

20 Promexp 1802.610956 392.8485427 4.5885647 0.000178 983.14326 2622.07865


('000$)

21

22 Price AdExp PromExp predicted


=I$17+I$18*J23+I$19*K23+I$20
23 scenario 1 9.1 52 61 72587 *L23

=I$17+I$18*J24+I$19*K24+I$20
24 2 7.1 48 57 72893 *L24

=I$17+I$18*J25+I$19*K25+I$20
25 3 8.1 50 60 74543 *L25
26
4. Predictions

Price AdExp PromExp predicted

scenario 1 9.1 52 61 72587


2 7.1 48 57 72893
3 8.1 50 60 74543

27
4. Prediction
Making predictions about the variable of interest

Scenario 1: Price = 9.10$, AdExp = 52,000$, PromExp = 61,000$


Predicted Sales = 72587.31
Scenario 2: Price = 7.10$, AdExp = 48,000$, PromExp = 57,000$
Predicted Sales = 72892.96
Scenario 3: Price = 8.10$, AdExp = 50,000$, PromExp = 60,000$
Predicted Sales = 74542.75

28
Residuals
 The residuals show you how far away the actual data points are from the predicted data points
(using the equation).
 For example, the first data point equals 73959
 Using the equation, the predicted for first month is
 -25096.83 -5055.27 * 8.75 648.61* 50.04 + 1802.61 *61.13 = 73319.71
 residual = 73959 - 73319.71 = 639.28.

Month Unit Sales Price ($) Adexp ('000$) Promexp ('000$) prediction residuals
1 73959 8.75 50.04 61.13 73319.71 639.2850022
2 71544 8.99 50.74 60.19 70866.02 677.9755706
3 78587 7.50 50.14 59.16 76152.52 2434.480039
4 80364 7.25 50.27 60.38 79699.84 664.1576281
5 78771 7.40 51.25 59.71 78369.44 401.5575512
29
Ex 2.

We have data from Company ELCAB


produces 7 items. The data on Quantity
Sold, price of item, Advertising cost for A B C D
Item is given 1 Quantity Sold Price Advertising
The question is: is there a relation between 2 8500 $2 $2,800
Quantity Sold (Output) and Price and 3 4700 $5 $200
Advertising cost (Input). 4 5800 $3 $400
In other words: can we predict Quantity 5 7400 $2 $500
Sold if we know Price and Advertising? 6 6200 $5 $3,200
7 7300 $3 $1,800
Here Price, Advertising cost are Input
variables. Quantity Sold is dependent 8 5600 $4 $900
variable. QuantitySales.XLSX

30
1. Modeling
The regression line is:
Quantity Sold = β0 + β1 * Price + β2 * Advertising cost

Quantity Sold is dependent variable


Price & Advertising cost are Independent Variables
β 0, β1, β2 are coefficients of equation

31
Data Analysis Toolpak - Excel
1. On the Data tab, in the Analysis group, click
Data Analysis
2. Select Regression and click OK.
3. Select the Y Range (A1:A8). This is the predictor
variable (also called dependent variable).
4. Select the X Range(B1:C8). These are the
explanatory variables (also called independent
variables). These columns must be adjacent to
each other.
5. Check Labels.
6. Click in the Output Range box and select cell E1.
7. Check Residuals.
8. Click OK.

32
2. Estimation

Quantity Sold = β0 + β1 * Price + β2 * Advertising cost


Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising cost

β0 = 8536.214
β1 = -835.722
β2 = 0.592

33
2. Estimation
R2 = 0.962, which is a very good
SUMMARY OUTPUT fit.

Regression Statistics 96% of the variation in Quantity


Multiple R 0.980681 Sold is explained by the
independent variables Price and
R Square 0.961736 Advertising.
Adjusted R Square 0.942604
.
Standard Error 310.5239
Observations 7

The R2 closer to 1, the better the regression line fits the data
34
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.980681
R Square 0.961736
Adjusted R
Square 0.942604
Standard Error 310.5239
Observations 7

ANOVA
df SS MS F Significance F
Regression 2 9694300 4847150 50.26854 0.001464
Residual 4 385700.4 96425.11
Total 6 10080000

Upper
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% 95.0%

Intercept 8536.214 386.9117 22.06243 2.5E-05 7461.975 9610.453 7461.975 9610.453


Price -835.722 99.65304 -8.38632 0.001106 -1112.4 -559.041 -1112.4 -559.041

Advertising 0.592228 0.104347 5.675579 0.004755 0.302515 0.881942 0.302515 0.881942


35
Significance F and P-values
 To check the results are reliable or not (statistically significant),
 F = 0.001. Since F is less than 0.05, Model is OK.
 Most or all P-values should be below 0.05. In this example it is (0.000,
0.001 and 0.005).

Note : If F > 0.05, Donot use this set of independent variables.


1. Delete a variable with a high P-value (greater than 0.05)
2. Rerun the regression until Significance F drops below 0.05.

36
3. Inference
 When the variable Price increases by one unit, the Quantity Sold decreases
by - 835.722 units all other variables in Model are being kept at the same
level.
 Similarly When the Advertising cost increases by one unit, the Quantity
Sold increases by 0.592 units all other variables in Model are being kept at
the same level. ……..

Quantity Sold = β0 + β1 * Price + β2 * Advertising cost


Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising cost

37
4. Predictions
 if price equals $4 and Advertising cost equals $3000, you might be want to
predict a Quantity Sold
 if price equals $3 and Advertising cost equals $2000, you might be want to
predict a Quantity Sold
 if price equals $2 and Advertising cost equals $1500, you might be want to
predict a Quantity Sold

Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising cost

38
4. Predictions

B C D E F G H I

15 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0%
16 Intercept 8536.214 386.9117 22.06243 2.5E-05 7461.974654 9610.453111 7461.975
17 Price -835.722 99.65304 -8.38632 0.001106 -1112.40356 -559.041143 -1112.4
18 Advertising 0.592228 0.104347 5.675579 0.004755 0.302515325 0.881941666 0.302515
19
Advertising Predicted
Price
cost Quantity
$4 $3,000 6970.01 =C17+C18*F22+C19*G22
$3 $2,000 7213.50 =C17+C18*F23+C19*G23
$2 $1,500 7753.11 =C17+C18*F24+C19*G24

39
4. Predictions

Predicted
Price Advertising cost
Quantity Sold
1 $4 $3,000 6970
2 $3 $2,000 7214
3 $2 $1,500 7753

40
Residuals

Quantity Advertising predicted Quantity


Price Residual
Sold Cost Sold
8500 $2 $2,800 $8,523.01 ($23.01)
4700 $5 $200 $4,476.05 $223.95
5800 $3 $400 $6,265.94 ($465.94)
7400 $2 $500 $7,160.88 $239.12
6200 $5 $3,200 $6,252.73 ($52.73)
7300 $3 $1,800 $7,095.06 $204.94
5600 $4 $900 $5,726.33 ($126.33)

41
HOMEWORK PROBLEMS

42
Home Work
Month Temperature Jackets Sold
Nov 15 100
Dec 10 150
Jan 8 250
Feb 12 100
Mar 16 50
Apr 22 10

Data of winter season jacket sold data with temperature in each month.
Temperature is the independent variable because one cannot control the
temperature.
Jackets Sold is the dependent variable because based on the temperature
increase and decreases jacket sale

43
Home Work
 Fit Regression line using Data Analysis Toolpak, Write equation of
Regression line with proper notations.
 Estimate the parameters.
 Find R2, F, P-Values.
 Find prediction for Jackets Sold when temperature of month June, Sep, Oct
when is 25, 13, 14
 Find predictions and residual for month Nov, Dec, Jan, Feb, Mar, Apr

44
PGDM
 Happy Analyzing !!!!!! ….!!!

45

You might also like