
Ch 4.

Prediction and Regression


1

Chapter 4:
Predict and forecast
Part I: Linear Regression
Ch 4. Prediction and Regression
2
1 Econometrics
1.1 Why “regression” and “econometrics”
• You are able to demonstrate that a mean is equal to a target, you know how to compare two groups, and you can check whether two
variables are related. That is important, but it is far from sufficient.
• As a manager you will need to predict values. But real products are complicated: a car is a combination of multiple
interacting factors. As a marketer you must know which ones matter to customers and what relative value those
customers give to each feature. Even more important: you need to create a model to predict how desirable a new car
would be.

Source: Gartner (blog)


Ch 4. Prediction and Regression
3
1 Econometrics
1.1 Why “regression” and “econometrics”
• Some examples:
 Our company is working on a new laptop. We know the price of each competitor on the market and the features of each computer. Given
the characteristics of our computer (screen size, chipset, 3D card, number of HDMI ports…), what is the optimum market price for this new
laptop so as to be attractive?
 Given the past financial performance of all companies in the car industry, can we predict the performance of Ford?
 Given the characteristics of houses in Lille, can we estimate whether the house we want to purchase is overvalued? Which factors have a real
effect on the price of houses? What is the effect of one more m² on the price? What is the effect of location? How can we define
“location”? Is color important? Is it more important than a basement?
 Given knowledge about contaminations for a given disease, can we predict the number of cases next month? (Yes, mathematical
tools from epidemiology and econometrics are closely related.)
…

• Then we want to measure the quality of our models and check when they are usable.
 A model that predicts the price of a laptop with a margin of error close to 45K€ is completely useless.
 A model that predicts only how much “male students in Lille who go to the cinema twice a month” are willing to
pay for this computer is completely useless, as they are only a tiny part of the potential market.
Ch 4. Prediction and Regression
4
1 Econometrics
1.1 Why “regression” and “econometrics”
• Historically, methods to “predict” come from a field of science called Econometrics. Originally its purpose was to
test economic theories with real data.
• It so happens that tools developed for econometrics are incredibly interesting from a business point of view. They are
efficient, scientifically validated and well known by a large community of professionals. Some might even say “90% of
market finance is econometrics”.
• Methods created for Econometrics are the most basic foundations of Big Data and Artificial Intelligence. Basically, the
question was to develop even more efficient tools able to manipulate mountains of unstructured data.
• In this lecture we cover the first important method, called linear regression. More accurately, the multivariate regression,
whose objective is to predict a quantitative value thanks to one or many predictors.

… and there are many other ones!

Econometrics is the application of statistics to real data to extract simple relationships.


Linear regression is one method among others.
Ch 4. Prediction and Regression
5
1 Econometrics
1.2 Introductory example

You work for Tata Group. The Tata group comprises over 90 operating companies in seven business sectors, with
operations in more than 80 countries across six continents. Today, Tata Group wants to advertise and promote its image in
France but does not know the French advertising market. The company especially wants to avoid mistakes such as wasting
money on low-impact media. You need to find the most cost-effective media.
Specifically, the company is interested in the effectiveness of advertising to sell cars, foodstuff or IT products. A sample of 98
other companies has been selected and studied over a period of one month. You have been given the following data:

 Activity of the company: Car manufacturer, IT Product or Foodstuff.


 Change in brand name fame: brand name fame after the campaign minus brand name fame before the campaign. This is the value to predict.
 YouTube: amount spent on YouTube advertising by the company (K€)
 Social Media: amount spent on Facebook, Instagram… advertising (K€)
 TV: amount spent in television advertising (K€)
 Strength: advertising quality, from 0 to 10, according to an independent analyst company. (Example: 10 for a very effective message, 0 for
a boring ad).
 Timing: effective campaign timing, from 0 to 10. (Example: 0= advertising cold beer in winter; 10= advertising cold beer in summertime,
during a Football match break).
 Actor: “YES” if a famous actor appears in the ad, “NO” otherwise
 Innovative: “very”, “usual” or “conservative” ad, according to an independent analyst company. A very innovative ad promotes, for
example, the use of Facebook.
Ch 4. Prediction and Regression
6
1 Econometrics (Book: chapter 13, part 1)
1.2 Introductory example

• We defined a problem:
 “How can we create the best marketing campaign?”
• We want to check assumptions:
 “A well-known actor is going to change the effectiveness of an ad on TV.”
 “YouTube is more efficient for cars.”
• We want a model:
 “Change in Brand Name Fame = 10 + 8 × (Investment on TV) + 5 × (Famous Actor)”
• We must check the accuracy of this model:
 “If we spend €1 on TV, can we say that fame is going to increase by 8 units? Is this value accurate? Is it exactly 8, or
anywhere from 2 to 14?”
 “Can we demonstrate that this model allows us to forecast the effect of a new marketing campaign?”
Ch 4. Prediction and Regression
7
1 Econometrics
1.3 Definitions

• We use regression analysis to predict the value of a dependent variable based on one or more independent variables.
 The dependent variable is called Y. It’s a continuous quantitative variable (a price, a temperature…). It’s also called
the explained variable.
 The independent variable is called X. It might be a quantitative or a qualitative variable. It’s also called the
explicative or explanatory variable. When we face more than one independent variable, they are called X1, X2, …, Xk.

Ex: we want to predict the price of a house (Y) using its surface (X1) and its age (X2).

• The link between variables is a model. It’s just an estimate of the true relationship between those variables, and it
probably neglects the effect of many other variables.
 We want to obtain a correct specification of the model, i.e., we selected all the right variables to explain Y.
 We want to obtain a correct shape for the model, i.e., we created the right function to link X and Y.

Ex: Y is a function of surface and age. We assume that other variables have no effect. This specification is probably not
correct, as many other variables influence the price. We assume that the shape or “link between variables” is linear: no
exponential shape and so on. It would be close to a function like Y = a + bX1 + cX2.
Ch 4. Prediction and Regression
8
1 Econometrics
1.3 Definitions
• A scatter plot typically displays an explicative variable on the x-axis and the predicted variable on the y-axis.
Ex: we want to predict the price of a house (K€) using its surface (m²). For clarity, we neglect “age”. The scatter plot
shows, as expected, that a larger house tends to be more expensive.

[Scatter plot: surface in m² (x-axis, 50–350) versus price in K€ (y-axis, 0–900)]
Ch 4. Prediction and Regression
9
1 Econometrics
1.3 Definitions
• The relationship is Y = f(X) + ε, where f is the function that links X to Y and ε is the error, as a model is never perfect.

[Scatter plot with fitted line (the model): for house i, Yi = observed value, Ŷi = predicted value on the line, and the vertical gap between them is the error; x-axis: surface of house i]
• ε is needed: some observed values for X or Y in the database may be faulty (error in coding…), Y may be explained by
other variables, the function may underestimate the effect of X on Y…
• Obviously, we would like this error to be as small as possible!
Ch 4. Prediction and Regression
10
1 Econometrics
1.3 Definitions
• The equation of the model is the function that links X and Y.
• When we work on the whole population, the equation of this model is Y = β₀ + β₁X + ε.

[Scatter plot with fitted line (the model); the Y-axis intercept and the slope are marked on the plot]
• The Y-axis intercept is β₀. The slope is β₁. Let’s assume that the equation of the line is Ŷ = 10 + 2X (price in K€, surface in m²).
 When X, the surface, increases by 1 m², the price of the house increases by 2,000€.
 When X is equal to 0, the price of the house is 10,000€ (?). Well… there is no such house, or it might be taxes or
the land itself?
Ch 4. Prediction and Regression
11
1 Econometrics
1.3 Definitions

Yi = β₀ + β₁Xi + εi

 Yi: dependent variable
 β₀: intercept in the population
 β₁: slope coefficient in the population
 Xi: independent variable
 β₀ + β₁Xi: linear component
 εi: random error term

• Don’t forget that Yi is the REAL value; it’s not “on” the line.
• The distance between the line (the predicted value) and the real value is the error.
Ch 4. Prediction and Regression
12
1 Econometrics
1.3 Definitions
• The error or statistical error ε is the amount by which a real value differs from the predicted value when we work on the
whole population. It should be as small as possible. This ε may be explained by:
 a specification error: a faulty selection of explicative variables, or we forgot important ones.
Ex: predicting the price of a house without taking the location into consideration.
 a mis-specified function: we created a linear relationship while the real shape is non-linear.
Ex: modeling COVID with a linear shape when it’s exponential.
 faulty data or outliers in the database.
Ex: the surface of a house is coded 100,000 m². Probably an error in the dataset!
• Practically, the error is NEVER observed, as it assumes that the model has been created on the whole population… while
you probably only collected a sample. What is observed is the residual, called e. It’s an observable estimate of ε created
thanks to a sample.
Ex: the mean age in the population is 20. We select a sample from this population. The average age in this sample is
17. If a man is 21, then the residual is 4 (= 21 − 17) while the error is 1 (= 21 − 20).

The error ε is obtained when the model is created from the population
The residual e is obtained when the model is created from a sample.
Ch 4. Prediction and Regression
13
1 Econometrics
1.3 Definitions
• As the sample might not behave exactly like the population, errors and residuals are not the same.
• We would like them to be the same, as in that case the sample is a good representation of the population.
[Scatter plot with two fitted lines: the model on the population and the model on a sample; for house i, the residual is the gap to the sample line and the error is the gap to the population line]
Ch 4. Prediction and Regression
14
1 Econometrics
1.3 Definitions
• In the population the model is (Greek letters for the population): Y = β₀ + β₁X + ε

• In a sample the model is (Latin letters for the sample): Y = b₀ + b₁X + e

• We say that b₀ and b₁ are the estimated values of β₀ and β₁. They are also called the estimate of the intercept and the estimate of the
slope. If we’re lucky (if the sample behaves exactly like the population), both values are equal.
• Y is the REAL value.
• All econometrics can be summarized in this simple sentence: how confident can we be that the estimated coefficients are
good estimations of the real but unknown ones?
Ch 4. Prediction and Regression
15
1 Econometrics
1.3 Definitions

Estimated Slope Independent


Estimated coefficient in variable
Intercept in the sample
the sample

Predicted or
estimated Y
value for
observation i

No error term

• By definition, the predicted value is on the line. The error is the gap between the line and the real value… but we are not
discussing the real value here, so we don’t write any “error”.
• Now let’s move to some practice. At the beginning, for pedagogical purposes, we assume that we face a single explicative
variable X and that the relationship between the variables is indeed linear (we remove this limitation in the last part).
Ch 4. Prediction and Regression
16
2 Your first model (one explicative variable) (Book: chapter 13, part 2)
2.1 Global method

2.1.1 Steps
What should be done to create a “good” model?

1) Draw a scatter plot between X and Y to get a first look.
2) Transform your data if you detect a non-linear relationship.
3) Estimate the coefficients b₀ and b₁.
4) Estimate the global quality of the model. Do we explain 10% of the variance? 80%?
5) Check that this model is valid. For instance, we implicitly assume that the relationship is linear. But is it true?
6) Check if the model is globally significant, i.e., at least one explicative variable really influences Y.
7) Check the individual significance, i.e., check variable per variable that all of them really influence Y.
8) Estimate confidence intervals for the estimated parameters. It’s kind of like a “margin of error”. For instance, does 1 m² change
the price by 1,000€ +/− 100 (quite accurate) or +/− 800 (not accurate)?
Ch 4. Prediction and Regression
17
2 Your first model (one explicative variable)
2.1 Global method
2.1.2 Example
• This same example (the same as in the textbook, chapter 13 part 1) is studied over the whole document.
• A real estate agent wishes to examine the relationship between the selling price of a home (thousands of $) and its size (measured in square feet).
• A random sample of 10 houses is selected. Notice that it’s far too small for practical models.
• Open SPSS and load the file Regression1.sav

House Price (Y)   Square Feet (X)
245               1400
312               1600
279               1700
308               1875
199               1100
219               1550
405               2350
324               2450
319               1425
255               1700
Ch 4. Prediction and Regression
18
2 Your first model
2.1 Global method
2.1.2 Example
Draw a scatter plot with the Regression plot or any other method.

The link between variables is (more or less) linear. We can add a fit line (activate the plot with a double click and select fit line / linear).

SPSS displays the equation of this line… but where does it come from?
Ch 4. Prediction and Regression
19
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Logic
• How can we estimate b₀ and b₁ accurately? It’s easy to say that we select the “line the most in the middle of the cloud”, but
it’s not satisfying.
• For a given point we want to minimize the error eᵢ.
• For all points, we want to minimize the sum of all errors for the n points, but some errors are positive and others are
negative. They cancel each other out, and we might think that the model is a good one even if the distance to the line is large.
• So the strategy is to minimize the sum of squared errors: Min Σ eᵢ²
• As the error is the distance between the observed value and the predicted value, we want to minimize Σ (Yᵢ − Ŷᵢ)²
• which is equal to Σ (Yᵢ − b₀ − b₁Xᵢ)²
• We want to minimize the sum of squared errors, hence the name “OLS”, for ordinary least squares (or “MCO”, for
moindres carrés ordinaires in French).
Ch 4. Prediction and Regression
20
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Logic
We look for b₀ and b₁ that minimize this equation:

Min Σᵢ₌₁..ₙ eᵢ² = Min Σᵢ₌₁..ₙ (yᵢ − b₀ − b₁xᵢ)²   (equation 1)

Setting the derivatives to zero gives b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄, and we can replace them in
equation 1… but you will never have to derive them by hand.
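The OLS estimates are simple enough to compute by hand. Below is a minimal Python sketch (the course itself uses SPSS, so this is only an illustration) applying the two closed-form formulas above to the 10 houses of section 2.1.2:

import numpy as np

# Data from the 10-house sample (section 2.1.2)
x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])  # square feet
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])            # price ($1000s)

# Closed-form OLS solution of equation 1
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
b0 = y.mean() - b1 * x.mean()                                               # intercept

print(b0, b1)  # ~98.2 and ~0.110: the values SPSS reports on the next slides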
Ch 4. Prediction and Regression
21
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Example
We come back to SPSS and our dataset.
We select Analyze/Regression/Linear Regression

We obtain different tables and, among them, the regression coefficients:
 the estimated intercept value, called b₀;
 the estimated value for the explicative variable, called b₁.

The regression equation is: House Price = 98.24 + 0.110 × (Square Feet)
Ch 4. Prediction and Regression
22
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Example
• For the intercept:

[Scatter plot: Square Feet (x-axis, 1000–2600) vs House Price in $1000s (y-axis); the fitted line has intercept b₀ = 98.24 and slope b₁ = 0.110]

The regression equation is: House Price = 98.24 + 0.110 × (Square Feet)

• When the surface is equal to 0, the estimated average price of the house Y is $98,240.
• There is no house with such a surface. Practically it’s meaningless. Maybe this amount might be related to the land or
taxes (?)
Ch 4. Prediction and Regression
23
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Example
• For the slope:

[Same scatter plot: fitted line with intercept b₀ = 98.24 and slope b₁ = 0.110]

The regression equation is: House Price = 98.24 + 0.110 × (Square Feet)

• When the surface increases by one unit, the estimated average price of the house Y increases by 0.110 units.
• We would say: “if the surface increases by one square foot, then the estimated price of the house Y increases by 0.110
thousand dollars, i.e. $110”, everything else remaining constant.
• We obtained something incredibly important for a real estate agent: the link between surface and price!
Ch 4. Prediction and Regression
24
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Example
• For the prediction:

[Same scatter plot: fitted line with intercept b₀ = 98.24 and slope b₁ = 0.110]

The regression equation is: House Price = 98.24 + 0.110 × (Square Feet)

• For a house whose surface is 2,000 square feet, the predicted price is 98.24 + 0.110 × 2,000 = 318.24, i.e. about $318,240.
Ch 4. Prediction and Regression
25
2 Your first model (one explicative variable)
2.2 Estimation of coefficients
2.2.1 Example

[Scatter plot: the fitted line is labeled “Relevant model” inside the data range (1000–2500 sq. feet) and “Irrelevant model” outside it]

• The first major trap lies here. Can we predict the price of a house whose surface is 500 sq. feet?
 We can use this model to predict the price of houses between 1000 and 2500 sq. feet.
 We can NOT use this model to predict the price of tinier or larger houses, as we don’t even have data about them.
• A model is meaningful only in the data interval. More accurately, the further you move from it, the larger the risk of a large
error. The model seems rather linear in the interval [1000; 2500], but that does not mean that your equation, or the
linear assumption itself, remains valid outside this interval.
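To make the “data interval” trap concrete, here is a small hedged Python sketch: a helper (our own illustration, not an SPSS feature) that predicts with the fitted line but warns when asked to extrapolate outside the observed range:

import warnings

def predict_price(sqft, b0=98.24, b1=0.110, x_min=1100, x_max=2450):
    """Predicted house price ($1000s); warns outside the observed data range."""
    if not (x_min <= sqft <= x_max):
        warnings.warn(f"{sqft} sq. feet is outside the observed range "
                      f"[{x_min}; {x_max}]; the prediction may be meaningless.")
    return b0 + b1 * sqft

print(predict_price(2000))  # ~318.2, i.e. about $318,200
print(predict_price(500))   # same formula, but the function warns: extrapolation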
Ch 4. Prediction and Regression
26
2 Your first model (one explicative variable)
2.3 Practice

• Problem 13.9 has been solved on video (with all details). You can watch it at home and practice on the same problem.
• You can practice on
 Problem 13.4 (file “cars”). Don’t forget that answers are available at the end of the book
 Problem 13.6 (file “FTMBA”)
• The most important part is to be able to interpret the meaning of coefficients.
• You can, of course, try now to predict any quantitative variable thanks to another quantitative variable.

Ch 4. Prediction and Regression
27
3 Quality of a model (one explicative variable) (Book: chapter 13, part 3)
3.1 Concept
• A model without an easy-to-read tool to measure its quality is completely worthless.
• Quality can be defined as “on average, a close proximity between the predicted value and the observed value”.

[Four scatter plots: points tightly clustered around the line (good model, strong relationship) vs points widely dispersed around the line (low quality model, weak relationship)]
Ch 4. Prediction and Regression
28
3 Quality of a model (one explicative variable)
3.1 Concept
• When there is no relationship between X and Y, we can observe two typical shapes:
 a random cluster of dots without any specific orientation;
 for any value of X, Y remains the same.

[Two scatter plots illustrating no relationship]
Ch 4. Prediction and Regression
29
3 Quality of a model
3.1 Concept

[Diagram: for house i at Xi, three vertical distances between the real price Yi, the predicted price Ŷi (on the line) and the average price Ȳ]

• A good model has to explain why the price of your house is not equal to the price of the average house in the city.
 The distance A = Yi − Ȳ is the quantity to explain.
 The distance B = Yi − Ŷi is the distance between the observed value and the predicted value. Its other name is the residual e.
It’s what the model fails to explain.
 The distance C = Ŷi − Ȳ is the distance between the predicted value and the average. It’s what the model can actually
explain.
Ch 4. Prediction and Regression
30
3 Quality of a model
3.1 Concept

[Same diagram: real price Yi, predicted price Ŷi, average price Ȳ at Xi]

• Let’s define quality as the ratio C/A = (Ŷi − Ȳ) / (Yi − Ȳ).
 When the model is really good, the error B is close to 0, so the ratio is close to 1.
 When the model is really bad, the error B is large and close to A, so the ratio is close to 0.
• But we worked with only one house, and the definition of quality has to integrate all houses.
Ch 4. Prediction and Regression
31
3 Quality of a model
3.1 Concept
The ratio remains the same:
Quality = (Ŷi − Ȳ) / (Yi − Ȳ)
But we take into consideration all houses:
Quality = Σ (Ŷi − Ȳ) / Σ (Yi − Ȳ)
Unfortunately, some real values might be above or below the predicted values (the line), so we might have issues with the
sign depending on the relative location of each dot. An idea is to remove this problem with squared values:
Quality = Σ (Ŷi − Ȳ)² / Σ (Yi − Ȳ)²
The quality is a ratio between “the sum of squares of all the quantities that the model explains” and “the sum of squares of all the
quantities to be explained”. Let’s call this ratio “r²” or “coefficient of determination”.

Well, this explanation just gives you the intuition. If we want to be accurate…
Ch 4. Prediction and Regression
32
3 Quality of a model
3.2 Definition

[Diagram: real price Yi, predicted price Ŷi, average price Ȳ at Xi]

SST = Σ (Yi − Ȳ)²   SSE = Σ (Yi − Ŷi)²   SSR = Σ (Ŷi − Ȳ)²

• We can demonstrate that:

SST = SSE + SSR
Total sum of squares = Error sum of squares + Regression sum of squares
Quantity to be explained = what the model can’t explain + what the model can explain
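For the curious, a sketch of why this decomposition holds (for an OLS fit with an intercept; this step is usually omitted in the book):

SST = Σ (Yi − Ȳ)² = Σ [(Yi − Ŷi) + (Ŷi − Ȳ)]² = SSE + SSR + 2 Σ (Yi − Ŷi)(Ŷi − Ȳ)

The cross term is zero because the OLS normal equations force the residuals to sum to zero and to be uncorrelated with the fitted values, which leaves SST = SSE + SSR.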
Ch 4. Prediction and Regression
33
3 Quality of a model (one explicative variable)
3.2 Definition
• The exact names of the three parts are:
 SST = total sum of squares, or total variation: the variation of prices around the mean price.
 SSR = regression sum of squares, or explained variation.
 SSE = error sum of squares, or unexplained variation (for lack of some explicative variables…).

• Quality, called the coefficient of determination, is the proportion of the total variation (variance) in the dependent variable
that is explained by the variation (variance) in the independent variable.
• The coefficient of determination is called r-squared and is denoted r²: r² = SSR / SST

• Important property: 0 ≤ r² ≤ 1
Ch 4. Prediction and Regression
34
3 Quality of a model (one explicative variable)
3.2 Definition

Y Y
Y

X X
X
Y Y r2 = 0

X X
r2 = 1 0 < r2 < 1
Ch 4. Prediction and Regression
35
3 Quality of a model (one explicative variable)
3.2 Definition
• There is no exact consensus on what a “good” r² is. The larger it is, the better the explanation of the variance of the
explained variable.
• An r² close to 0.97 in physics is probably the mark of a faulty theory. An r² close to 0.6 in sociology might be excellent
news! Some (subjective) values might be:
 Low quality
 Limited
 Medium
 Good
 0.9: Very good
 Excellent model… or danger

• The danger comes from a “too good” model. Have we really created a model, or estimated something obvious and
useless? Have we cheated with the data?
• R² just measures the strength of the link in the data and does not care about causality, plausibility or relevance.
Ch 4. Prediction and Regression
36
3 Quality of a model (one explicative variable)
3.3 Example
We come back to SPSS and our dataset.
We select Analyze/Regression/Linear Regression.

r² = SSR / SST = 18,934 / 32,600 = 0.581

• In our model, variations in surface explain 58% of the variations in prices.
• The quality of this model is limited. It does not mean that surface has no effect on price (when we think about it, a single
variable with such a large effect means that it’s an interesting one), but we probably forgot to take into consideration
many other variables such as age, location…
Ch 4. Prediction and Regression
37
3 Quality of a model (one explicative variable)
3.4 Standard error of the estimate

• A second tool to estimate quality is the standard error of the estimate, called SYX in the book.
• It measures the average dispersion around the regression line.
• The formula (that you don’t need to know) is:

SYX = √( SSE / (n − k − 1) )

 n is the sample size and k the number of predictors. In this part, as we have a single explicative variable, k = 1, so the
denominator is (n − 2).
 In the numerator, SSE is the sum of squared errors and is measured in $². The square root brings us back to $.
 The denominator is the number of degrees of freedom (we will come back to that). In this context, it’s the “quantity
of usable information to increase the accuracy of the model, over the minimum quantity of information that we need
to estimate the unknown coefficients”. The sample size is n while two unknown values b₀ and b₁ are estimated. The
quantity of usable information to increase accuracy is thus (n − 2), as we “need” two values to estimate two unknowns.

• In the example, the estimated value is 41.33: the average distance between the true price of a house and its estimated
value is $41,330 … and the model is not that accurate!
• If SSE = 0, obviously the model is perfect, and the distance to the line is equal to 0!
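Continuing the Python sketch from section 2.2 (again just an illustration; SPSS reports all of these numbers directly), we can verify the decomposition, the r² and SYX on the 10-house sample:

import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])  # square feet
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])            # price ($1000s)

# OLS fit (same formulas as in section 2.2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                          # predicted values

sst = np.sum((y - y.mean()) ** 2)            # total variation (~32,600 per the slide)
sse = np.sum((y - y_hat) ** 2)               # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)        # explained variation (~18,934)

r2 = ssr / sst                               # ~0.581
syx = np.sqrt(sse / (len(y) - 2))            # standard error of the estimate, ~41.33
print(r2, syx)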
Ch 4. Prediction and Regression
38
3 Quality of a model (one explicative variable)
3.4 Standard error of the estimate
[Two scatter plots: a small standard error of the estimate (points close to the regression line) vs a large standard error of the estimate (points far from the line)]
• The magnitude of SYX should always be judged relative to the size of the Y values in the sample data.
• The lower the value, the more interesting the model is.
Ch 4. Prediction and Regression
39
3 Quality of a model (one explicative variable)
3.4 Standard error of the estimate

• Problem 13.21 has been solved on video (with all details). You can watch it at home and practice on the same
problem.

• You can practice on


 Problem 13.16 (file “cars”). Don’t forget that answers are available at the end of the book
 Problem 13.18 (file “FTMBA”)
• The most important part is to be able to interpret the quality of a model thanks to the R²
Ch 4. Prediction and Regression
40
4 Multiple Linear Regression (Book: chapter 14, part 1)
4.1 The logic
• Good news: multiple linear regression (with more than one explicative variable) follows exactly the same logic as
simple linear regression (with one explicative variable). Some specific aspects will change, though.
• Basics remain identical: one explained quantitative variable Y is explained by k explicative variables called X1, X2, …, Xk.
Notice that n denotes the sample size and k the number of explicative variables.
• The model to predict the value for case i is:

Yi = β₀ + β₁X1i + β₂X2i + … + βkXki + εi
(Yi: explained variable; β₀, …, βk: coefficients; X1i, …, Xki: explicative variables; εi: error)

• There are k explicative variables but (k + 1) coefficients to estimate. They are β₀, β₁, …, βk. It’s a common trap.
Ch 4. Prediction and Regression
41
4 Multiple Linear Regression
4.1 The logic
• As with the simple linear regression, coefficients are estimated thanks to a sample. They are called b 0, b1, …, bK;

Predicted
value for Explicative
Estimated
individual I variables
Coefficients

• The dataset in SPSS is exactly like the ones that you’re used to manipulate: one row per case and as many columns as
variables.
• There are k explicative variables but (k+1) estimated coefficients.
Ch 4. Prediction and Regression
42
4 Multiple Linear Regression
4.1 The logic
• With one explicative variable, we estimate a line. With two, we estimate a surface, and so on.

Ŷ = b₀ + b₁X1 + b₂X2

[3D plot: the fitted plane over the axes X1 and X2, with a slope for variable X1 and a slope for variable X2]
Ch 4. Prediction and Regression
43
4 Multiple Linear Regression
4.2 Practical example
• During your internship, you work for Baker&Pies, which produces and sells pies
(how surprising), but shelf life is short (typically no more than one day).
• You want to evaluate factors thought to influence demand, as losses might be
significant if supply does not match demand. According to experts, the two most
important variables to predict sales (units) are advertising (hundreds of dollars)
and price (dollars).
• Data were collected for 15 weeks.
• The dataset regression2.sav is available on IESEG-Online. Open it.
Ch 4. Prediction and Regression
44
4 Multiple Linear Regression
4.2 Practical example
• Go to Analyze/Regression/Linear Regression and select the two explicative variables.

The model is assumed to be: Sales = β₀ + β₁(Price) + β₂(Advertising) + ε
Ch 4. Prediction and Regression
45
4 Multiple Linear Regression
4.2 Practical example
• Results are:

• Predicted sales are: Ŷ = b₀ − 24.975 × (Price) + 74.131 × (Advertising)

b₁ = −24.975: if the price increases by one unit (one dollar), then sales decrease by 24.975 units, everything else
remaining the same.
b₂ = 74.131: if advertising increases by one unit ($100), then sales increase by 74.131 units, everything else remaining
the same.

• As a manager, and given this model, should you invest in ads? We will check…
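A hedged Python equivalent of this SPSS run, assuming the columns of regression2.sav are named Sales, Price and Advertising (adjust to the real names in the file); pandas.read_spss needs the pyreadstat package:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("regression2.sav")                # 15 weekly observations
model = smf.ols("Sales ~ Price + Advertising", data=df).fit()

print(model.params)                                 # b0, b1 (Price), b2 (Advertising)
print(model.rsquared, model.rsquared_adj)           # R² and Adjusted R² (next slides)

# Predicted sales for a price of $5.5 and $350 of advertising (i.e. 3.5 hundreds of $)
new = pd.DataFrame({"Price": [5.5], "Advertising": [3.5]})
print(model.predict(new))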
Ch 4. Prediction and Regression
46
4 Multiple Linear Regression
4.2 Practical example
• Results are:

• If we plan the price to be $5.5 and to spend $350 on advertising, the predicted sales are:

Ŷ = b₀ − 24.975 × 5.5 + 74.131 × 3.5

Don’t forget the unit! Advertising is measured in hundreds of dollars, so $350 = 3.5 units.
Ch 4. Prediction and Regression
47
4 Multiple Linear Regression
4.2 Practical example
• Results are:

• 52.1% of the variation in sales is explained by the variation in prices and advertising.
• The quality of this model is limited.

Until now, nothing new… but the R² is problematic in a multiple linear regression.

With a multiple linear regression, the coefficient of determination r² is not recommended.
It has to be replaced by the Adjusted R².
Ch 4. Prediction and Regression
48
4 Multiple Linear Regression
4.2 Practical example
• The coefficient of determination R² is a limited tool. Practically, it takes into account neither the number of
explicative variables nor the sample size.

• Problem n°1: it is not able to distinguish between models.
 Let’s assume that your company created a model with 2 variables and one with 200 variables, but the R² are the same.
Which model is the best? R² does not tell you.
• Problem n°2: the R² always increases if a new explicative variable is added to the model. It can’t decrease.
 If we create a model with many variables, R² increases little by little… so we might end up with an excellent R² while
each explicative variable has a negligible effect on the explained variable.
 We would like a “tool” whose value increases if a variable brings important information and decreases if this
variable does not provide anything new.
 Even worse: the more explicative variables we select, the higher the risk of a fortuitous but disastrous correlation.
Example: you want to explain income and R² is 0.7 (explained by education, age, gender and experience). You randomly
select 54 new variables (shoe size, car color…) and R² jumps to 0.97. The model is meaningless, as R² increases even if
the variables have no real effect on income.
• Problem n°3: when the number of explicative variables increases by one, we “decrease” the number of degrees of
freedom by one (remember that it’s more or less the sample size minus the number of estimated coefficients). More DF is
desirable as it decreases the margin of error of each confidence interval.
Ch 4. Prediction and Regression
49
4 Multiple Linear Regression
4.3 Adjusted R²
• The adjusted coefficient of determination or Adj R² is the proportion of variation in Y explained by the variation of all
explicative variables, adjusted for the number of variables and the sample size:

Adj R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)

where n is the sample size and k the number of explicative variables.

Example: which model is preferable for you? Model A: R² = 80%, n = 100, k = 2.
Model B: R² = 81%, n = 100, k = 90.

Technically, the R² of the second model should make you select it. But Model B requires 90 explicative variables while the
first one only needs 2: 88 more explicative variables only increase quality by 1%. Many of them are probably useless.

The Adj R² helps us: Adj R² model A = 1 − (1 − 0.80)(99)/(97) ≈ 79%
Adj R² model B = 1 − (1 − 0.81)(99)/(9) ≈ −109%

The quality of the model, once we take into consideration the sample size and the number of variables, is 79% for the first
model and negative for the second one. The first model is much better from a practical point of view.
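A quick way to reproduce this comparison is a tiny Python helper mirroring the formula above (our own sketch, not a library function):

def adj_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R²: penalizes each additional explicative variable."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adj_r2(0.80, 100, 2))   # ~0.79  -> Model A
print(adj_r2(0.81, 100, 90))  # ~-1.09 -> Model B: negative!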

Ch 4. Prediction and Regression
50
4 Multiple Linear Regression
4.3 Adjusted R²

• The Adjusted R²:
 penalizes excessive use of unimportant independent variables;
 is smaller than R²;
 can be negative (but it’s not desirable!);
 tends to be close to R² when all explicative variables have a significant effect on Y. A low difference between R² and
Adjusted R² is desirable.

• In the example, 44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into
account the sample size and the number of independent variables.

Ch 4. Prediction and Regression
51
4 Multiple Linear Regression
4.3 Practice

• Problems 14.4 and 14.14 have been solved on video (with all details). You can watch them at home and practice on the same
problems.
 For 14.4, only questions a, b, c (you cannot solve d or e yet).
 For 14.14, only questions c and d.

• You can practice on:
 Problem 14.6, questions a, b, c, d (file “Bestcompanies”). Don’t forget that answers are available at the end of the
book.
 Problem 14.8, questions a, b, c, d (file “Restaurant”).
 Problem 14.17, questions c and d.

• The most important part is to be able to interpret the coefficients, the r² and the adjusted r².

