
BUSINESS APPLICATIONS OF LINEAR REGRESSION

BASIC CONCEPTS AND INTERPRETATION


OUTLINE

Types of business data

Three common regression applications: continuous variable, binary variable and time series

Linear regression as a three-in-one tool

Why correlation is not causation


DATA IN BUSINESS

 Companies routinely collect and accumulate huge amounts of data:


❖ sales by customers, goods and services
❖ HR statistics on employees
❖ interest rates, stock prices and other financial time series
❖ profitability of individual business units and so on

 Regression methods let you extract information from these data to support sound management decisions

 Even a simple regression analysis in MS Excel can give a competitive advantage because you can better understand your
clients, your employees, your competitors and your products
QUANTITATIVE AND QUALITATIVE VARIABLES

All business data can be divided into qualitative and quantitative

Quantitative data take the form of numeric variables – sales in units, revenue in dollars,
count of employees and so on

Qualitative data, on the other hand, take the form of categorical variables – type of
product (smartphone, watch, notebook etc), customer satisfaction on a scale or a binary
response (did you buy our product?)

Linear regression can be used to analyze both quantitative and qualitative data (after
certain modifications)
CROSS-SECTIONAL VERSUS TIME SERIES DATA

Business data can be cross-sectional (measured simultaneously for many units: employee salaries,
revenue by product etc) or time series (monthly interest rate, stock price etc)

Panel data are a combination of those two types, in which a group of units is observed over
several periods (sales of different brands per month)

Linear regression can be used to analyze cross-sectional, time series and panel data
EXAMPLE 1: PREDICTING PRICE OF AN APARTMENT

 What determines the price of an apartment? There are potentially many factors, such as size, years since last
renovation, number of bedrooms etc

 A realtor may collect information on 100 apartments including apartment price (the key variable of interest) and
potential factors affecting the price

#     Apartment Price   Number of Bedrooms   Years Since Last Renovation
1     87,500            0                    7
2     120,500           2                    10
…     …                 …                    …
100   215,000           3                    2
EXAMPLE 1: SIMPLE MODEL FOR PRICE

A simple linear regression (including just Number of Bedrooms and Years Since Last Renovation) may take the
form:*

Apartment Price = 65,000 + 15,000*Number of Bedrooms – 1,800*Years Since Last Renovation

Hence, a newly renovated (Years Since Last Renovation = 0) studio apartment (Number of Bedrooms = 0) would cost
65,000 dollars

We call this first coefficient “intercept” or “constant” which shows the price when all factors are equal to 0

*Later we will also need to add t-statistics or p-values for statistical significance
EXAMPLE 1: OTHER COEFFICIENTS

Let’s continue with our simple model:

Apartment Price = 65,000 + 15,000*Number of Bedrooms – 1,800*Years Since Last Renovation

Here an additional bedroom is associated with a price increase of 15,000 dollars, holding years since
renovation fixed, because it indicates a larger apartment

Similarly, an apartment which was renovated two years ago would cost 3,600 dollars less ( = -1,800*2) than a
newly renovated one with the same number of bedrooms
EXAMPLE 1: PREDICTION

Now we can easily predict the price of an apartment if we know the number of bedrooms and years since the last
renovation

For example, an apartment with two bedrooms which was renovated 5 years ago would cost 86,000 dollars:

Apartment Price = 65,000 + 15,000*2 – 1,800*5 = 86,000

Notice that we have not yet discussed the notion of statistical significance – despite the substantial effect
of an additional bedroom (based on the coefficient of 15,000), the number of bedrooms may not be associated with
price in a statistical sense
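
The prediction step can be sketched in a few lines of Python. This is only an illustration: the coefficients are copied from the example model on the slide rather than estimated from data, and the function name `predict_price` is a hypothetical helper introduced here.

```python
# Coefficients copied from the illustrative model on the slide (not estimated here).
INTERCEPT = 65_000                   # price of a newly renovated studio
PER_BEDROOM = 15_000                 # change in price per additional bedroom
PER_YEAR_SINCE_RENOVATION = -1_800   # change in price per year since renovation


def predict_price(bedrooms, years_since_renovation):
    """Predicted apartment price in dollars under the illustrative model."""
    return (INTERCEPT
            + PER_BEDROOM * bedrooms
            + PER_YEAR_SINCE_RENOVATION * years_since_renovation)


print(predict_price(2, 5))  # 86000, matching the worked example
print(predict_price(0, 0))  # 65000, the intercept
```

The same arithmetic can be done in a single Excel cell; the function form simply makes it easy to score many apartments at once.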
WHY IS LINEAR REGRESSION SO POWERFUL?

 Regression is a great three-in-one tool allowing you to:

1) Determine which factors affect price and which don’t (using statistical significance)

2) Estimate the magnitude of this effect (e.g. $15,000 extra for an additional bedroom)

3) Predict the price for a new apartment not in your sample (most important for business!)

In this course we will discuss how to apply each tool using Analysis ToolPak in MS Excel
EXAMPLE 2: QUALITATIVE DATA

Price of an apartment is an example of a quantitative variable (similarly to sales, wages etc)

Many business problems involve qualitative variables – a client’s decision to buy our product (yes/no),
satisfaction with the service (very satisfied, satisfied, neutral, dissatisfied, very dissatisfied)

Binary variables represent an important example of qualitative variables: will the client buy? Will the
employee churn?
EXAMPLE 2: WILL CUSTOMER CHURN?

 It is easy to apply linear regression to answer this question by coding “Yes” as 1 and “No” as 0

 A simple linear probability model (LPM)* can take the form


Customer Churns = 0.07 - 0.01*Months Using Service + 0.20*Number of Complaints

 Here a new client (Months Using Service = 0) with no complaints (Number of Complaints = 0) is predicted to churn
next month with a probability of 7% (intercept of 0.07)

*Logit and probit are two common alternatives to LPM but their interpretation is not as easy
EXAMPLE 2: OTHER COVARIATES

 We can continue with our simple model


Customer Churns = 0.07 - 0.01*Months Using Service + 0.20*Number of Complaints

 Each additional month decreases the predicted probability of churn by 1 percentage point – a client who has been
with us for 1 year is 12 percentage points less likely to leave ( = -0.01*12)

 Finally, a client with two complaints is 40 percentage points more likely to leave ( = 0.20*2)

 We can evaluate all our customers on the probability of churn and make special offers to those who are most
likely to leave
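
Scoring customers with the linear probability model can be sketched as follows. The coefficients come from the example model above; the customer records and the helper name `churn_probability` are hypothetical. One assumption is added here that is not on the slide: because an LPM can produce values below 0 or above 1, the sketch clips predictions to the [0, 1] range.

```python
def churn_probability(months_using_service, number_of_complaints):
    """Predicted churn probability from the illustrative LPM."""
    raw = 0.07 - 0.01 * months_using_service + 0.20 * number_of_complaints
    # Assumption: clip to [0, 1], since a linear model is not bounded like a probability.
    return min(1.0, max(0.0, raw))


# Hypothetical customers: name -> (months using service, number of complaints)
customers = {
    "A": (0, 0),   # new client, no complaints
    "B": (12, 0),  # one year with us, no complaints
    "C": (3, 2),   # three months with us, two complaints
}

# Rank customers from most to least likely to churn.
ranking = sorted(customers, key=lambda name: churn_probability(*customers[name]), reverse=True)

for name in ranking:
    print(name, f"{churn_probability(*customers[name]):.2f}")
```

Customer B illustrates why the clipping matters: the raw prediction is 0.07 - 0.12 = -0.05, which is not a valid probability.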
EXAMPLE 3: TIME SERIES

 Many business variables represent a time series – stock price, interest rate, weekly sales etc

 Imagine that Ice-cream Sales (in thousands of dollars) in a given week depend on outside Temperature and
whether it is a week in Summer

Ice-cream Sales = 345 + 15*Temperature + 120*Summer

 As usual, the intercept of 345 shows Sales when all other variables are equal to 0

 Sales increase by 15K dollars when Temperature increases by 1 degree


EXAMPLE 3: INDICATOR OR DUMMY VARIABLE

 We can continue with our simple model for the time series of Ice-cream Sales
Ice-cream Sales = 345 + 15*Temperature + 120*Summer

 Variable Summer is called an indicator or dummy variable – it is equal to 1 if week is in Summer and 0 otherwise

 Observe that our model predicts that Ice-cream Sales are higher by 120K dollars during summer weeks

 As before, we can use the model for prediction – Sales in a May week (not Summer, so Summer = 0) when
Temperature = 50 will be equal to 1,095K dollars ( = 345 + 15*50 + 120*0)
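
A short Python sketch shows how the dummy variable enters the prediction. The coefficients are those of the example model; the function name `predict_sales` and the boolean argument are hypothetical conveniences.

```python
def predict_sales(temperature, is_summer):
    """Predicted weekly ice-cream sales in thousands of dollars."""
    summer = 1 if is_summer else 0  # dummy variable: 1 in summer weeks, 0 otherwise
    return 345 + 15 * temperature + 120 * summer


print(predict_sales(50, is_summer=False))  # 1095, the May example
print(predict_sales(50, is_summer=True))   # 1215, same temperature in a summer week
```

The difference between the two predictions is exactly the coefficient on Summer (120K dollars), because the temperature is held the same in both calls.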
CORRELATION IS NOT CAUSATION!

 One of the common mistakes is to draw causal conclusions based on the linear regression model

 For example, Sales of ice-cream may positively depend on the Temperature outside
Sales = 450 + 20*Temperature

 However, if we flip the equation and estimate the dependence of Temperature on Sales we will also
find a positive effect!

 Of course, we cannot conclude that selling more ice-cream will increase outside temperature!
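
The "flipped equation" point can be checked numerically. The sketch below uses a small made-up data set (five hypothetical weeks, roughly following Sales = 450 + 20*Temperature) and a hand-written helper `ols_slope` that computes the slope of a one-variable regression as covariance divided by variance.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from regressing y on x (simple OLS with an intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    return covariance / variance


# Hypothetical weekly observations, invented for illustration only.
temperature = [50, 60, 70, 80, 90]
sales = [1460, 1640, 1860, 2040, 2250]

slope_sales_on_temperature = ols_slope(temperature, sales)
slope_temperature_on_sales = ols_slope(sales, temperature)

print(f"{slope_sales_on_temperature:.3f}")  # about 19.8
print(f"{slope_temperature_on_sales:.3f}")  # about 0.050
```

Both slopes share the sign of the covariance, so they are always positive together or negative together. The regression itself cannot tell us which variable drives the other.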
GREAT EXAMPLE OF SPURIOUS CORRELATION FROM TYLER VIGEN

“Spurious correlations” from http://tylervigen.com/spurious-correlations is licensed under CC BY 4.0


THANK YOU!

Front and back photo by Scott Graham on Unsplash
