
REGRESSION ANALYSIS

Introduction
• Regression is a statistical technique for modelling the linear relationship
between two or more variables.
• Regression is primarily used for prediction and for quantifying how one
variable changes with another.
• In its simplest (bivariate) form, regression shows the relationship between one
independent variable (X) and a dependent variable (Y), as in the formula below:

Y = β0 + β1X + u

– The magnitude and direction of that relationship are given by the slope parameter (β1).
– The value of the dependent variable when the independent variable is zero is
given by the intercept parameter (β0).
– An error term (u) captures the amount of variation not predicted by the slope and
intercept terms.
– The coefficient of determination (R²) shows how well the fitted values match the data.
Regression vs Correlation
• Regression thus shows us how variation in one variable co-occurs
with variation in another.
• What regression cannot show is causation.
– For example, a regression with shoe size as the independent variable and foot size as
the dependent variable would show a very high R² and highly
significant parameter estimates, but we should not conclude that larger shoe sizes
cause larger feet.
• Correlation measures the strength of the relationship between two
variables, while regression describes that relationship in more detail.
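This distinction can be illustrated numerically. Below is a minimal Python sketch, using made-up data, that computes both the correlation coefficient (strength of association) and the regression slope and intercept (description of the relationship) for the same pair of variables:

```python
import numpy as np

# Made-up illustrative data (not from the lecture example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Correlation: a unitless measure of association in [-1, 1]
r = np.corrcoef(x, y)[0, 1]

# Regression: slope (units of y per unit of x) and intercept
slope, intercept = np.polyfit(x, y, 1)

print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Note that correlation is symmetric in x and y, whereas the regression of y on x generally differs from the regression of x on y.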
REGRESSION MODEL
Types of Regression Models
• 1 explanatory variable → Simple regression (linear or non-linear)
• 2+ explanatory variables → Multiple regression (linear or non-linear)
LINEAR REGRESSION MODEL
The Linear Regression Model (LRM)
• The simple (or bivariate) LRM is designed to
study the relationship between a pair of variables that
appear in a data set.
• The multiple LRM is designed to study the relationship
between one variable and several other variables.
– In both cases, the sample is considered a random sample from some
population.
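As a sketch of the multiple case, the following Python snippet estimates a multiple LRM by ordinary least squares using a design matrix with an intercept column. The data are simulated and the coefficient values (1.5, 2.0, −0.5) are arbitrary choices for illustration:

```python
import numpy as np

# Simulated data: y depends on two explanatory variables x1 and x2
# (true coefficients 1.5, 2.0, -0.5 are arbitrary illustrative choices)
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 5, 50)
y = 1.5 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.1, 50)

# Design matrix: a column of ones for the intercept, then x1 and x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least-squares solution for [beta0, beta1, beta2]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates should be close to [1.5, 2.0, -0.5]
```

Because the noise added to y is small, the recovered coefficients land close to the true values used in the simulation.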
The Linear Regression Model
The output of an LRM is a function that predicts the
dependent variable based upon values of the
independent variables.

If X and Y are two measured outcomes for each observation in the data set, the fitted line is

y′ = β0 + β1X ± є

where the slope β1 = ∆y/∆x, β0 is the y-intercept, and є is the error term. With the independent variable (x) on the horizontal axis and the dependent variable (y) on the vertical axis, simple linear regression fits a straight line to the data.

Computation Table and Parameter Estimation

Xi     Yi     Xi²     Yi²     XiYi
X1     Y1     X1²     Y1²     X1Y1
X2     Y2     X2²     Y2²     X2Y2
:      :      :       :       :
Xn     Yn     Xn²     Yn²     XnYn
ΣXi    ΣYi    ΣXi²    ΣYi²    ΣXiYi

β̂1 = [ΣXiYi − (ΣXi)(ΣYi)/n] / [ΣXi² − (ΣXi)²/n]

β̂0 = Ȳ − β̂1X̄

(all sums run from i = 1 to n)
Interpretation of Coefficients
• Slope (β1)
– Y changes by β1 for each 1-unit
increase in X
• If β1 = 2, then Y is expected to
increase by 2 for each 1-unit
increase in X
• Y-Intercept (β0)
– Average value of Y when X = 0
• If β0 = 4, then average Y is expected
to be 4 when X is 0
Example
• Consider the following data about food
intake by cows and milk yield collected
from a cattle farm:

Food (kg)    Milk yield (liters)
4            3.0
6            5.5
10           6.5
12           9.0

• What is the relationship between cows’
food intake and milk yield?
Scattergram
[Scatter plot of food intake (kg) against milk yield (liters)]
Parameter Estimation

Xi     Yi     Xi²     Yi²      XiYi
4      3.0    16      9.00     12
6      5.5    36      30.25    33
10     6.5    100     42.25    65
12     9.0    144     81.00    108
Σ: 32  24.0   296     162.50   218

β̂1 = [ΣXiYi − (ΣXi)(ΣYi)/n] / [ΣXi² − (ΣXi)²/n]
   = [218 − (32)(24)/4] / [296 − (32)²/4]
   = (218 − 192) / (296 − 256)
   = 26/40 = 0.65

β̂0 = Ȳ − β̂1X̄ = 6 − 0.65(8) = 0.80
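The hand computation can be cross-checked in Python. This sketch uses NumPy's degree-1 polynomial fit, which is one of several ways to fit the least-squares line:

```python
import numpy as np

food = np.array([4.0, 6.0, 10.0, 12.0])   # X: food intake (kg)
milk = np.array([3.0, 5.5, 6.5, 9.0])     # Y: milk yield (liters)

# Degree-1 polynomial fit = simple linear regression
beta1, beta0 = np.polyfit(food, milk, 1)
print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}")  # 0.65 and 0.80

# The fitted line can then be used for prediction, e.g. 8 kg of food:
yhat = beta0 + beta1 * 8
print(f"predicted yield at 8 kg: {yhat:.2f} liters")
```

The fitted coefficients match the values obtained from the summation formulas above.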
Coefficient Interpretation
• Slope (β1)
– Milk yield (Y) is expected to
increase by 0.65 liters for each 1 kg
increase in food intake (X)
• Y-Intercept (β0)
– Average milk yield (Y) is expected to
be 0.80 liters when food intake (X) is 0
