Regression-SIMPLE LINEAR
Correlation
Correlation analyzes the LINEAR ASSOCIATION between two
variables. The CORRELATION COEFFICIENT (r) gives an
indication of the STRENGTH and DIRECTION of association
between the two variables.
[Scatter plot: Sales (y-axis, 0 to 100) versus Advertising (x-axis, 0 to 50)]
The scatter of points tends to be distributed around a positively sloped straight line.
The pairs of values of advertising expenditures and sales are not located exactly on a
straight line.
The scatter plot reveals a more or less strong tendency rather than a precise linear
relationship.
The line represents the nature of the relationship on average.
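The strength of this tendency can be quantified with the correlation coefficient r. A minimal sketch, using made-up advertising/sales pairs (the numbers are illustrative, not taken from the scatter plot above):

```python
# Sketch: computing the correlation coefficient r for hypothetical
# advertising/sales pairs (illustrative numbers only).
import numpy as np

advertising = np.array([10, 20, 25, 30, 40, 45], dtype=float)
sales = np.array([30, 45, 50, 60, 80, 85], dtype=float)

# r = cov(X, Y) / (sd(X) * sd(Y)); np.corrcoef returns the full matrix,
# so take the off-diagonal entry.
r = np.corrcoef(advertising, sales)[0, 1]
print(round(r, 4))
```

A value of r near +1, as here, indicates a strong positive linear association; values near -1 indicate a strong negative one.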
Examples of Other Scatterplots
[Six small scatterplots of Y versus X illustrating different possible forms of association.]
Regression Analysis
In regression analysis we use the independent variable (X) to
estimate the dependent variable (Y).
• The relationship between the variables is linear.
• Both variables must be at least interval scale.
• The least squares criterion is used to determine the
equation.
Method of least squares-Example
X Y (Observed)
1 3
2 6
3 6
4 7
5 8
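The least squares criterion chooses the slope and intercept that minimize the sum of squared residuals. For the data above, the standard closed-form formulas give the fitted line directly:

```python
# Least-squares fit for the example data above (X = 1..5; Y = 3, 6, 6, 7, 8).
# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  b0 = y_bar - b1 * x_bar
x = [1, 2, 3, 4, 5]
y = [3, 6, 6, 7, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)

b1 = sxy / sxx           # slope
b0 = y_bar - b1 * x_bar  # intercept
print(round(b0, 4), round(b1, 4))  # → 2.7 1.1, i.e. y-hat = 2.7 + 1.1x
```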
Linear Regression Model
Assumptions
• The true relationship form is linear (Y is a linear function of X,
plus random error)
• The error terms, εi, are independent of the x values
• The error terms are random variables with mean 0 and
constant variance, σ²
• The random error terms, εi, are not correlated with one
another
• No multicollinearity (no strong correlation among the
independent variables; relevant when there is more than one predictor)
Simple Linear Regression Model
Yi = β0 + β1Xi + εi

[Diagram: the observed value of Y at Xi lies a distance εi from the population regression line, which has intercept β0 and slope β1.]
Simple Linear Regression Equation
The simple linear regression equation provides an estimate of the
population regression line
ŷi = b0 + b1xi
ei = yi − ŷi = yi − (b0 + b1xi)    (residual for observation i)
Interpretation of the
Slope and the Intercept
note: 0 ≤ R² ≤ 1
Adjusted Coefficient of Determination, R̄²
(continued)
• Used to correct for the fact that adding non-relevant independent
variables will still reduce the error sum of squares
R̄² = 1 − [SSE / (n − K − 1)] / [SST / (n − 1)]
(where n = sample size, K = number of independent variables)
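The formula can be checked numerically. Using the SSE and SST values from the house-price regression output in these slides (n = 10 observations, K = 1 predictor):

```python
# Verifying R-squared and adjusted R-squared from the ANOVA sums of squares
# reported in the house-price regression output (n = 10, K = 1).
sse = 13665.5652   # residual (error) sum of squares
sst = 32600.5000   # total sum of squares
n, k = 10, 1

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(round(r2, 5), round(r2_adj, 5))  # → 0.58082 0.52842
```

Both values match the R Square and Adjusted R Square figures in the output.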
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652   1708.1957
Total         9   32600.5000
house price = 98.25 + 0.1098(2000) = 317.85
The predicted price for a house with 2000
square feet is 317.85($1,000s) = $317,850
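The prediction above is a direct substitution into the fitted equation. A one-line sketch, using the rounded coefficients exactly as the slide does:

```python
# Predicting the price of a 2,000-square-foot house from the fitted
# equation (coefficients rounded as on the slide; price is in $1,000s).
b0, b1 = 98.25, 0.1098
square_feet = 2000

price = b0 + b1 * square_feet
print(round(price, 2))  # → 317.85, i.e. $317,850
```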
Multiple Regression
What if there are several factors affecting the
dependent variable?
As an example, think of the price of a home as a
dependent variable. Several factors contribute to the
price of a home… among them are square footage, the
number of bedrooms, the number of bathrooms, the
age of the home, whether or not it has a garage or a
swimming pool, if it has both central heat and air
conditioning, how many fireplaces it has, and, of
course, location.
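With several predictors, the least squares fit generalizes by stacking the predictors into a design matrix. A minimal sketch with a small made-up housing dataset (square feet and number of bedrooms are assumed predictors; the numbers are illustrative only):

```python
# Sketch of multiple regression via least squares on made-up housing data.
import numpy as np

sqft = np.array([1500, 1800, 2000, 2200, 2600], dtype=float)
bedrooms = np.array([3, 3, 4, 4, 5], dtype=float)
price = np.array([200, 230, 260, 275, 320], dtype=float)  # $1,000s

# Design matrix: a column of ones (intercept) plus one column per predictor.
X = np.column_stack([np.ones_like(sqft), sqft, bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

pred = X @ coef                    # fitted prices
sse = float(np.sum((price - pred) ** 2))
sst = float(np.sum((price - price.mean()) ** 2))
print(np.round(coef, 4))           # intercept, sqft slope, bedrooms slope
```

Each slope is interpreted as the estimated change in price for a one-unit change in that predictor, holding the others constant.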
Regression with dummy variables
When one or more of the independent variables
is non-metric in nature
Quantify the qualitative variable by coding
Dummy variables usually take values of 0 and 1
The researcher is interested in explaining or predicting a
metric dependent variable from a set of metric
independent variables (although dummy
variables may also be used).
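A qualitative predictor such as "has a garage" can be coded as a 0/1 dummy and included alongside metric predictors. A minimal sketch with made-up data:

```python
# Sketch: coding a non-metric predictor (garage: yes/no) as a 0/1 dummy
# variable and including it in the design matrix. Data is made up.
import numpy as np

sqft = np.array([1500, 1800, 2000, 2200], dtype=float)
garage = ["yes", "no", "yes", "no"]  # qualitative (non-metric) variable
garage_dummy = np.array([1.0 if g == "yes" else 0.0 for g in garage])
price = np.array([210, 215, 265, 255], dtype=float)  # $1,000s

X = np.column_stack([np.ones_like(sqft), sqft, garage_dummy])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print(np.round(coef, 4))  # intercept, sqft slope, garage effect
```

The coefficient on the dummy estimates the average price difference between houses with and without a garage, holding square footage constant.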