
Simple Linear Regression

Correlation
Correlation analyzes the LINEAR ASSOCIATION between two
variables. The CORRELATION COEFFICIENT (r) gives an
indication of the STRENGTH and DIRECTION of association
between the two variables.

Correlation does not differentiate between the independent and the dependent variable.
E.g.: height and weight; height and IQ.
Regression
• Regression refers to the statistical technique of modeling the
relationship between variables.
• In simple linear regression, we model the relationship
between two variables.
• One of the variables, denoted by Y, is called the dependent
variable and the other, denoted by X, is called the
independent variable.
• The model we will use to depict the relationship between X
and Y will be a straight-line relationship (linear)
• A graphical sketch of the pairs (X, Y) is called a scatter plot.
[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); Advertising on the x-axis (0 to 50), Sales on the y-axis (0 to 140)]

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

• The scatter of points tends to be distributed around a positively sloped straight line.
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The line represents the nature of the relationship on average.
Examples of Other Scatterplots
[Figure: several small scatterplots illustrating different possible X–Y relationship patterns]
Regression Analysis
In regression analysis we use the independent variable (X) to
estimate the dependent variable (Y).
• The relationship between the variables is linear.
• Both variables must be at least interval scale.
• The least squares criterion is used to determine the
equation.

Method of least squares: Example

X    Y (Observed)
1    3
2    6
3    6
4    7
5    8
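
The fitted line for this data can be computed directly from the least squares formulas. Below is a minimal sketch; NumPy is my choice of tool and is not part of the original slides:

    # Minimal least squares fit for the example data above.
    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)

    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2);  b0 = y_bar - b1 * x_bar
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 2.70 + 1.10 x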
Linear Regression Model
Assumptions
• The true relationship form is linear (Y is a linear function of X,
plus random error)
• The error terms, εi, are independent of the x values
• The error terms are random variables with mean 0 and constant variance, σ²
• The random error terms, εi, are not correlated with one another
• No multicollinearity (correlation among independent variables; this applies when there is more than one predictor)
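
These assumptions are usually checked on the residuals of a fitted model. A rough sketch, reusing the toy fit above (the diagnostics shown are my choice, not from the slides):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)

    print("residual mean (should be near 0):", residuals.mean())
    # Crude check of the no-autocorrelation assumption: the correlation
    # between consecutive residuals should be near 0.
    print("lag-1 residual correlation:",
          np.corrcoef(residuals[:-1], residuals[1:])[0, 1])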
Simple Linear Regression Model

Yi = β0 + β1Xi + εi

where, for observation i:
• Yi = observed value of Y for Xi
• β0 = intercept
• β1 = slope
• εi = random error for this Xi value (the gap between the observed and the predicted value of Y)
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

ŷi = b0 + b1xi

where ŷi is the estimated (or predicted) y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of x for observation i.

The individual random error terms ei have a mean of zero:

ei = (yi - ŷi) = yi - (b0 + b1xi)
Interpretation of the Slope and the Intercept

• b0 (intercept) is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
• b1 (slope) is the estimated change in the average value of y as a result of a one-unit change in x
Measures of Variation

• Total variation is made up of two parts:

SST = SSR + SSE

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(yi - ȳ)²    SSR = Σ(ŷi - ȳ)²    SSE = Σ(yi - ŷi)²

where:
ȳ = average value of the dependent variable
yi = observed values of the dependent variable
ŷi = predicted value of y for the given xi value
Measures of Variation (continued)

• SST = total sum of squares
– Measures the variation of the yi values around their mean, ȳ
• SSR = regression sum of squares
– Explained variation attributable to the linear relationship between x and y
• SSE = error sum of squares
– Variation attributable to factors other than the linear relationship between x and y
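
Continuing the toy example, a short sketch verifying the decomposition numerically (NumPy, my choice of tool):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total variation around the mean
    ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
    sse = np.sum((y - y_hat) ** 2)         # unexplained (error) variation

    print(sst, ssr, sse)                   # 14.0, 12.1, 1.9
    print(np.isclose(sst, ssr + sse))      # True: SST = SSR + SSE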
Regression Analysis – Interpretation of Results

1. Explanatory power: R-squared and adjusted R-squared give the ‘explanatory power’ of the set of independent variables used in the model. Each ranges from zero to one; higher values indicate greater explanatory power.

2. Goodness-of-fit: given by the significance of the F-value. Only if the F-statistic is significant is the regression model a good fit; otherwise you need to revisit the specification of the variables in the model.

3. Regression coefficients: the standardized regression coefficients give the extent and direction of influence of a particular independent variable on the dependent variable. The statistical significance of each coefficient is given by its corresponding t-value.
Coefficient of Determination, R²

• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted as R²

R² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ R² ≤ 1
Adjusted Coefficient of Determination, R²

• Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares

Adjusted R² = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]

(where n = sample size, K = number of independent variables)

– Adjusted R² provides a better comparison between multiple regression models with different numbers of independent variables
– It penalizes excessive use of unimportant independent variables
– It is smaller than R²
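
For the toy data used earlier, R² and the adjusted version can be computed directly from the sums of squares (a sketch; NumPy is my choice):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    ssr = sst - sse

    n, K = len(x), 1                      # sample size, number of predictors
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
    print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")  # 0.864, 0.819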
Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (Y) = house price in $1000s
– Independent variable (X) = square feet
Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Output

Regression Statistics
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df    SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
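
The output above can be reproduced in code; the following sketch uses SciPy, which is my own library choice rather than the tool used on the slides:

    import numpy as np
    from scipy import stats

    sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
    price = np.array([ 245,  312,  279,  308,  199,  219,  405,  324,  319,  255])

    res = stats.linregress(sqft, price)
    print(f"intercept = {res.intercept:.5f}")    # ≈ 98.24833
    print(f"slope     = {res.slope:.5f}")        # ≈ 0.10977
    print(f"R^2       = {res.rvalue ** 2:.5f}")  # ≈ 0.58082
    print(f"p-value   = {res.pvalue:.5f}")       # ≈ 0.01039 (t-test on the slope)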
Prediction

• The regression equation can be used to predict a value for y, given a particular x
• For a specified value, x, the predicted value is:

ŷ = b0 + b1x
Predictions Using Regression Analysis

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
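
As a sketch, the same arithmetic in code, using the rounded coefficients shown on the slide (the full-precision coefficients give roughly $317,788):

    b0, b1 = 98.25, 0.1098              # rounded coefficients from the output
    predicted = b0 + b1 * 2000          # predicted house price in $1000s
    print(f"${predicted * 1000:,.0f}")  # $317,850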
Multiple Regression

What if there are several factors affecting the dependent variable?

As an example, think of the price of a home as a dependent variable. Several factors contribute to the price of a home; among them are square footage, the number of bedrooms, the number of bathrooms, the age of the home, whether or not it has a garage or a swimming pool, whether it has both central heat and air conditioning, how many fireplaces it has, and, of course, location.
Regression with dummy variables

• Used when one or more of the independent variables is non-metric in nature
• The qualitative variable is quantified by coding; dummy variables usually take values of 0 and 1
• The researcher is interested in explaining or predicting a metric dependent variable from a set of metric independent variables (although dummy variables may also be used), as in the sketch below
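
A hypothetical sketch of dummy coding: the data, the "garage" variable, and its coding are invented for illustration and are not taken from the slides; the fit uses NumPy's least squares solver:

    import numpy as np

    sqft   = np.array([1400, 1600, 1700, 1875, 1100, 1550], dtype=float)
    garage = np.array([   0,    1,    0,    1,    0,    1], dtype=float)  # dummy: 1 = has garage (invented)
    price  = np.array([ 245,  312,  279,  308,  199,  219], dtype=float)  # $1000s

    # Design matrix: intercept column, metric predictor, dummy predictor
    X = np.column_stack([np.ones_like(sqft), sqft, garage])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    b0, b_sqft, b_garage = coef

    # b_garage estimates the average price difference (in $1000s) between
    # homes with and without a garage, holding square feet fixed.
    print(b0, b_sqft, b_garage)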

Regression provides information on:

1. The statistical significance of each independent variable
2. The strength of association between one or more of the predictors and the criterion
3. A predictive equation for future use
