
Simple Linear Regression

Correlation
Correlation analyzes the LINEAR ASSOCIATION between two
variables. The CORRELATION COEFFICIENT (r) gives an
indication of the STRENGTH and DIRECTION of association
between the two variables.

Correlation does not differentiate between the independent and the dependent variable.
E.g.: height and weight; height and IQ.
Regression
• Regression refers to the statistical technique of modeling the
relationship between variables.
• In simple linear regression, we model the relationship
between two variables.
• One of the variables, denoted by Y, is called the dependent
variable and the other, denoted by X, is called the
independent variable.
• The model we will use to depict the relationship between X
and Y will be a straight-line relationship (linear)
• A graphical sketch of the pairs (X, Y) is called a scatter plot.
[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y); Advertising on the x-axis (0 to 50), Sales on the y-axis (0 to 140)]

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

• The scatter of points tends to be distributed around a positively sloped straight line.
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The line represents the nature of the relationship on average.
Examples of Other Scatterplots
[Figure: several small scatterplots illustrating different possible X–Y relationship patterns]
Regression Analysis
In regression analysis we use the independent variable (X) to
estimate the dependent variable (Y).
• The relationship between the variables is linear.
• Both variables must be at least interval scale.
• The least squares criterion is used to determine the
equation.

Method of least squares: Example

X    Y (Observed)
1    3
2    6
3    6
4    7
5    8
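
The fitted line for this data can be computed directly from the least squares formulas. Below is a minimal sketch; NumPy is my choice of tool and is not part of the original slides:

    # Minimal least squares fit for the example data above.
    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)

    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2);  b0 = y_bar - b1 * x_bar
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    print(f"fitted line: y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 2.70 + 1.10 x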
Linear Regression Model
Assumptions
• The true relationship form is linear (Y is a linear function of X,
plus random error)
• The error terms, εi, are independent of the x values
• The error terms are random variables with mean 0 and constant variance, σ²
• The random error terms, εi, are not correlated with one another
• No multicollinearity (correlation among independent variables; this applies when there is more than one predictor)
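
These assumptions are usually checked on the residuals of a fitted model. A rough sketch, reusing the toy fit above (the diagnostics shown are my choice, not from the slides):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)

    print("residual mean (should be near 0):", residuals.mean())
    # Crude check of the no-autocorrelation assumption: the correlation
    # between consecutive residuals should be near 0.
    print("lag-1 residual correlation:",
          np.corrcoef(residuals[:-1], residuals[1:])[0, 1])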
Simple Linear Regression Model

Yi = β0 + β1Xi + εi

where, for observation i:
• Yi = observed value of Y for Xi
• β0 = intercept
• β1 = slope
• εi = random error for this Xi value (the gap between the observed and the predicted value of Y)
Simple Linear Regression Equation

The simple linear regression equation provides an estimate of the population regression line:

ŷi = b0 + b1xi

where ŷi is the estimated (or predicted) y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and xi is the value of x for observation i.

The individual random error terms ei have a mean of zero:

ei = (yi - ŷi) = yi - (b0 + b1xi)
Interpretation of the Slope and the Intercept

• b0 (intercept) is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
• b1 (slope) is the estimated change in the average value of y as a result of a one-unit change in x
Measures of Variation

• Total variation is made up of two parts:

SST = SSR + SSE

(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(yi - ȳ)²    SSR = Σ(ŷi - ȳ)²    SSE = Σ(yi - ŷi)²

where:
ȳ = average value of the dependent variable
yi = observed values of the dependent variable
ŷi = predicted value of y for the given xi value
Measures of Variation (continued)

• SST = total sum of squares
– Measures the variation of the yi values around their mean, ȳ
• SSR = regression sum of squares
– Explained variation attributable to the linear relationship between x and y
• SSE = error sum of squares
– Variation attributable to factors other than the linear relationship between x and y
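
Continuing the toy example, a short sketch verifying the decomposition numerically (NumPy, my choice of tool):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total variation around the mean
    ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
    sse = np.sum((y - y_hat) ** 2)         # unexplained (error) variation

    print(sst, ssr, sse)                   # 14.0, 12.1, 1.9
    print(np.isclose(sst, ssr + sse))      # True: SST = SSR + SSE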
Regression Analysis – Interpretation of Results

1. Explanatory power: R-squared and adjusted R-squared give the ‘explanatory power’ of the set of independent variables used in the model. Each ranges from zero to one; higher values indicate greater explanatory power.

2. Goodness-of-fit: given by the significance of the F-value. Only if the F-statistic is significant is the regression model a good fit; otherwise you need to revisit the specification of the variables in the model.

3. Regression coefficients: the standardized regression coefficients give the extent and direction of influence of a particular independent variable on the dependent variable. The statistical significance of each coefficient is given by its corresponding t-value.
Coefficient of Determination, R²

• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called R-squared and is denoted as R²

R² = SSR / SST = regression sum of squares / total sum of squares

note: 0 ≤ R² ≤ 1
Adjusted Coefficient of Determination, R²

• Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares

Adjusted R² = 1 - [SSE / (n - K - 1)] / [SST / (n - 1)]

(where n = sample size, K = number of independent variables)

– Adjusted R² provides a better comparison between multiple regression models with different numbers of independent variables
– It penalizes excessive use of unimportant independent variables
– It is smaller than R²
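
For the toy data used earlier, R² and the adjusted version can be computed directly from the sums of squares (a sketch; NumPy is my choice):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([3, 6, 6, 7, 8], dtype=float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    ssr = sst - sse

    n, K = len(x), 1                      # sample size, number of predictors
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - K - 1)) / (sst / (n - 1))
    print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")  # 0.864, 0.819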
Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (Y) = house price in $1000s
– Independent variable (X) = square feet
Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
Output

Regression Statistics
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df    SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
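
The output above can be reproduced in code; the following sketch uses SciPy, which is my own library choice rather than the tool used on the slides:

    import numpy as np
    from scipy import stats

    sqft  = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
    price = np.array([ 245,  312,  279,  308,  199,  219,  405,  324,  319,  255])

    res = stats.linregress(sqft, price)
    print(f"intercept = {res.intercept:.5f}")    # ≈ 98.24833
    print(f"slope     = {res.slope:.5f}")        # ≈ 0.10977
    print(f"R^2       = {res.rvalue ** 2:.5f}")  # ≈ 0.58082
    print(f"p-value   = {res.pvalue:.5f}")       # ≈ 0.01039 (t-test on the slope)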
Prediction

• The regression equation can be used to predict a value for y, given a particular x
• For a specified value, x, the predicted value is:

ŷ = b0 + b1x
Predictions Using Regression Analysis

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
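
As a sketch, the same arithmetic in code, using the rounded coefficients shown on the slide (the full-precision coefficients give roughly $317,788):

    b0, b1 = 98.25, 0.1098              # rounded coefficients from the output
    predicted = b0 + b1 * 2000          # predicted house price in $1000s
    print(f"${predicted * 1000:,.0f}")  # $317,850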
Multiple Regression

What if there are several factors affecting the dependent variable?

As an example, think of the price of a home as a dependent variable. Several factors contribute to the price of a home; among them are square footage, the number of bedrooms, the number of bathrooms, the age of the home, whether or not it has a garage or a swimming pool, whether it has both central heat and air conditioning, how many fireplaces it has, and, of course, location.
Regression with dummy variables

• Used when one or more of the independent variables is non-metric in nature
• The qualitative variable is quantified by coding; dummy variables usually take values of 0 and 1
• The researcher is interested in explaining or predicting a metric dependent variable from a set of metric independent variables (although dummy variables may also be used), as in the sketch below
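
A hypothetical sketch of dummy coding: the data, the "garage" variable, and its coding are invented for illustration and are not taken from the slides; the fit uses NumPy's least squares solver:

    import numpy as np

    sqft   = np.array([1400, 1600, 1700, 1875, 1100, 1550], dtype=float)
    garage = np.array([   0,    1,    0,    1,    0,    1], dtype=float)  # dummy: 1 = has garage (invented)
    price  = np.array([ 245,  312,  279,  308,  199,  219], dtype=float)  # $1000s

    # Design matrix: intercept column, metric predictor, dummy predictor
    X = np.column_stack([np.ones_like(sqft), sqft, garage])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    b0, b_sqft, b_garage = coef

    # b_garage estimates the average price difference (in $1000s) between
    # homes with and without a garage, holding square feet fixed.
    print(b0, b_sqft, b_garage)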

Regression provides information on:

1. The statistical significance of each independent variable
2. The strength of association between one or more of the predictors and the criterion
3. A predictive equation for future use
