Unit II D - Regression Modeling
Introduction
• Regression is a statistical technique to determine the linear relationship
between two or more variables.
• Regression is primarily used for prediction and causal inference.
• In its simplest (bivariate) form, regression shows the relationship between one
independent variable (X) and a dependent variable (Y), as in the formula below:
$$Y = \beta_0 + \beta_1 X + u$$
– The magnitude and direction of that relation are given by the slope parameter (β1)
– The status of the dependent variable when the independent variable is absent is
given by the intercept parameter (β0).
– An error term (u) captures the amount of variation not predicted by the slope and
intercept terms.
– The coefficient of determination (R²) shows how well the fitted line fits the data.
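The roles of the intercept, slope, and error term can be illustrated with a small simulation. This is only a sketch: the function name `simulate_y`, the normally distributed error, and the parameter values (β0 = 10, β1 = 3) are assumptions chosen for illustration and do not come from the slides.

```python
import random

def simulate_y(beta0, beta1, xs, error_sd, seed=0):
    """Generate Y = beta0 + beta1*X + u for each X in xs.

    Assumption: the error term u is drawn from a normal distribution
    with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, error_sd) for x in xs]

xs = [0, 1, 2, 3]
# With no error, every point lies exactly on the line Y = 10 + 3X.
exact = simulate_y(10.0, 3.0, xs, error_sd=0.0)
# With error, the points scatter around the same line.
noisy = simulate_y(10.0, 3.0, xs, error_sd=1.0)
```

Comparing `exact` with `noisy` shows what the error term contributes: the systematic part of Y is the same in both lists, and only the unexplained variation differs.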
Regression vs Correlation
• Regression thus shows us how variation in one variable co-occurs
with variation in another.
• What regression alone cannot show is causation.
– For example, a regression with shoe size as the independent variable and foot size as
the dependent variable would show a very high coefficient of determination (R²) and highly
significant parameter estimates, but we should not conclude that a larger shoe size
causes a larger foot size.
• Correlation measures the strength of the relationship between
variables, while regression attempts to describe that relationship
in more detail.
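The difference can be seen numerically. The sketch below uses made-up data (not from the slides) and two hypothetical helper functions, `pearson_r` and `slope`: the correlation is a unit-free measure of strength, while the regression slope says how much Y changes per unit of X and therefore depends on the units of Y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: strength and direction, unit-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope of Y on X: change in Y per unit of X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustrative data (an assumption, not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ys_scaled = [10 * y for y in ys]  # same data, Y expressed in other units

r, r_scaled = pearson_r(xs, ys), pearson_r(xs, ys_scaled)
b, b_scaled = slope(xs, ys), slope(xs, ys_scaled)
```

Rescaling Y leaves `r` unchanged but multiplies the slope by ten, which is the sense in which regression describes the relationship and correlation only grades its strength.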
REGRESSION MODEL
Types of Regression Models
• Simple (1 explanatory variable)
– Linear
– Non-Linear
• Multiple (2+ explanatory variables)
– Linear
– Non-Linear
LINEAR REGRESSION MODEL
The Linear Regression Model (LRM)
• The simple (or bivariate) LRM is designed to
study the relationship between a pair of variables that
appear in a data set.
• The multiple LRM is designed to study the relationship
between one variable and several other variables.
– In both cases, the sample is considered a random sample from some
population.
The Linear Regression Model
The output of an LRM is a function that predicts the
dependent variable from the values of the
independent variables.
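As a minimal sketch of such a prediction function, the code below covers both the simple and the multiple case. The name `predict` and all coefficient values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(beta0, betas, x_values):
    """Return the predicted Y for one observation.

    beta0    -- intercept
    betas    -- one slope per independent variable
    x_values -- the observation's values of those variables
    """
    if len(betas) != len(x_values):
        raise ValueError("need exactly one x value per slope")
    return beta0 + sum(b * x for b, x in zip(betas, x_values))

# Simple LRM (one independent variable): 1 + 3*2
simple = predict(1.0, [3.0], [2.0])
# Multiple LRM (two independent variables): 1 + 3*2 + 0.5*4
multiple = predict(1.0, [3.0, 0.5], [2.0, 4.0])
```

The simple LRM is just the special case with a single slope, so one function serves both model types.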
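If the two variables, X and Y, are two measured outcomes, the simple LRM takes the form:
$$y' = \beta_0 + \beta_1 X \pm \varepsilon$$
[Figure: regression line plot; vertical axis: dependent variable (y); ε marks the error term]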
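Xi      Yi      Xi²     Yi²     XiYi
X1      Y1      X1²     Y1²     X1Y1
X2      Y2      X2²     Y2²     X2Y2
:       :       :       :       :
Xn      Yn      Xn²     Yn²     XnYn
ΣXi     ΣYi     ΣXi²    ΣYi²    ΣXiYi

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$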
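The column totals of the table are all that the estimates need. The following sketch turns the two formulas into a function; the name `fit_simple_lrm` and the test points are assumptions made for illustration.

```python
def fit_simple_lrm(xs, ys):
    """Estimate (beta0, beta1) from the column totals of the table.

    beta1 = (sum(XY) - sum(X)*sum(Y)/n) / (sum(X^2) - sum(X)^2/n)
    beta0 = mean(Y) - beta1 * mean(X)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = sum_xx - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all X values are identical; slope is undefined")
    beta1 = (sum_xy - sum_x * sum_y / n) / denom
    beta0 = sum_y / n - beta1 * sum_x / n
    return beta0, beta1

# Points that lie exactly on Y = 1 + 3X recover those parameters.
b0, b1 = fit_simple_lrm([0, 1, 2, 3], [1, 4, 7, 10])
```

Note that the Yi² column is not used here; it is needed for measures of fit such as the correlation, not for the slope and intercept.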
Interpretation of Coefficients
• Slope (β1)
– Y changes by β1 for each 1-unit
increase in X
• If β1 = 2, then Y is expected to
increase by 2 for each 1-unit
increase in X
• Y-Intercept (β0)
– Average value of Y when X = 0
• If β0 = 4, then average Y is expected
to be 4 when X is 0
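Both readings can be checked by evaluating the fitted line at a few points. The worked lines below use the slide's illustrative values β0 = 4 and β1 = 2; the choice of X = 3 and X = 4 is arbitrary.

```latex
\begin{align*}
\hat{Y}(X) &= 4 + 2X \\
\hat{Y}(0) &= 4 + 2(0) = 4 = \beta_0 \\
\hat{Y}(3) &= 4 + 2(3) = 10 \\
\hat{Y}(4) &= 4 + 2(4) = 12 \\
\hat{Y}(4) - \hat{Y}(3) &= 12 - 10 = 2 = \beta_1
\end{align*}
```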
Example
• Consider the following data about food
intake by cows and milk yield collected
from a cattle farm:
Food (kg)    Milk yield (ltr)
4            3.0
6            5.5
10           6.5
12           9.0
• What is the relationship between cows’
food intake and milk yield?
Scattergram
Parameter Estimation
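Xi      Yi      Xi²     Yi²      XiYi
4       3.0     16      9.00     12
6       5.5     36      30.25    33
10      6.5     100     42.25    65
12      9.0     144     81.00    108
32      24.0    296     162.50   218     (column totals)

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \dfrac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \dfrac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{(32)^2}{4}} = 0.65$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$$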
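The arithmetic can be reproduced in a few lines. This sketch recomputes the estimates from the column totals and, as a cross-check, from the equivalent deviation-from-the-mean form of the slope; the variable names are illustrative and the cross-check is an addition, not part of the slides.

```python
food = [4, 6, 10, 12]          # X: food intake (kg)
milk = [3.0, 5.5, 6.5, 9.0]    # Y: milk yield (ltr)
n = len(food)

# Column totals, as in the table.
sum_x = sum(food)                                  # 32
sum_y = sum(milk)                                  # 24.0
sum_xx = sum(x * x for x in food)                  # 296
sum_xy = sum(x * y for x, y in zip(food, milk))    # 218.0

# Computational (totals) form of the estimates.
beta1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
beta0 = sum_y / n - beta1 * sum_x / n

# Equivalent deviation-from-the-mean form of the slope.
mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(food, milk))
s_xx = sum((x - mean_x) ** 2 for x in food)
beta1_dev = s_xy / s_xx

print(round(beta1, 2), round(beta0, 2))  # 0.65 0.8
```

Both forms of the slope reduce to 26 / 40 for this data, which is where the value 0.65 comes from.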
Coefficient Interpretation
• Slope (β1)
– Milk yield (Y) is expected to
increase by 0.65 ltr for each 1 kg
increase in food intake (X)
• Y-Intercept (β0)
– Average milk yield (Y) is expected to
be 0.8 ltr when food intake (X) is 0
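With the estimates in hand, the fitted line can be used for prediction and checked against the data. The sketch below is one possible follow-up, not part of the original example: the function name `predict_milk`, the query point of 8 kg, and the R² calculation are assumptions added for illustration.

```python
food = [4, 6, 10, 12]          # X: food intake (kg)
milk = [3.0, 5.5, 6.5, 9.0]    # Y: milk yield (ltr)
beta0, beta1 = 0.80, 0.65      # estimates from the example

def predict_milk(food_kg):
    """Predicted milk yield (ltr) for a given food intake (kg)."""
    return beta0 + beta1 * food_kg

fitted = [predict_milk(x) for x in food]
residuals = [y - f for y, f in zip(milk, fitted)]

# Coefficient of determination: share of the variation in Y
# that the fitted line accounts for.
mean_y = sum(milk) / len(milk)
ss_total = sum((y - mean_y) ** 2 for y in milk)
ss_resid = sum(e ** 2 for e in residuals)
r_squared = 1 - ss_resid / ss_total

print(round(predict_milk(8), 2))   # 6.0
print(round(r_squared, 3))         # 0.914
```

The residuals sum to zero, as they do for any least-squares line with an intercept. The observed food intakes run from 4 kg to 12 kg, so the intercept of 0.8 ltr at X = 0 is an extrapolation beyond the data and should be read with caution.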