
Regression

KNOWLEDGE FOR THE BENEFIT OF HUMANITY

Presented by Ayesha Khan

Topic Learning Outcomes
At the end of this lecture, students should be able to:
• identify types of regression analysis and their use.
• explain assumptions to be met when using Simple Linear
Regression.
• perform Simple Linear Regression analysis using SPSS.
• explain how to interpret the SPSS outputs from Simple
Linear Regression analysis.

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN
Introduction
• Linear regression is the next step up after
correlation.
• It is used when we want to predict the value of a
variable based on the value of another variable.
• The variable we want to predict is called the
dependent variable (or sometimes, the outcome
variable).
• The variable we are using to predict the other
variable's value is called the independent
variable (or sometimes, the predictor variable).

12/07/16 DR ATHAR KHAN - LCMD


Introduction
• For example, exam performance can be predicted
based on revision time; cigarette consumption can be
predicted based on smoking duration; and so forth.
• If you have two or more independent variables,
rather than just one, you need to use multiple
regression.



Assumptions

• Assumption #1: Your two variables should be
measured at the continuous level (i.e., they are
either interval or ratio variables).



Assumptions
• Assumption #2: There needs to be a linear
relationship between the two variables.
• Create a scatter plot using SPSS Statistics and
then visually inspect it to check for linearity.
• If the relationship displayed in your scatter plot is
not linear, you will have to either run a non-linear
regression analysis, perform a polynomial
regression or "transform" your data.

Assumptions
• Assumption #3: There should be no
significant outliers.
• An outlier is an observed data point that has a
dependent variable value that is very different from
the value predicted by the regression equation.
• As such, an outlier will be a point on a
scatterplot that is (vertically) far away from the
regression line, indicating that it has a large
residual. A residual is the difference between an
observed value and the value predicted by the
regression line.
Residual
In regression analysis, the difference between the
observed value of the dependent variable (y) and the predicted value (ŷ) is called
the residual (e). Each data point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
For a least-squares line with an intercept, both the sum and the mean of the
residuals are equal to zero. That is, Σe = 0 and ē = 0.
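The residual definition above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the `hours` and `scores` values and the helper `fit_line` are hypothetical, not taken from the lecture data). It fits a least-squares line, computes e = y - ŷ for each point, and confirms that the residuals sum to zero.

```python
# Hypothetical data: hours of sleep (x) and exam scores (y).
hours = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 57.0, 61.0, 62.0, 68.0]


def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(hours, scores)

# Residual = observed value - predicted value, one per data point.
predicted = [a + b * xi for xi in hours]
residuals = [yi - yhat for yi, yhat in zip(scores, predicted)]

print([round(e, 2) for e in residuals])
print(round(sum(residuals), 10))  # zero, up to floating-point rounding
```

The point with the largest absolute residual is the one that lies vertically farthest from the fitted line.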

Assumptions
• Assumption #4: You should have independence
of observations, which you can check
using the Durbin-Watson statistic.
• If observations are made over time, it is likely
that successive observations are related.
• If there is no autocorrelation (autocorrelation is
where subsequent observations are related), the
Durbin-Watson statistic should be between 1.5 and 2.5.
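SPSS reports the Durbin-Watson statistic directly, but the calculation itself is short. The sketch below uses the standard definition (the sum of squared differences between successive residuals, divided by the sum of squared residuals) and applies the 1.5 to 2.5 rule of thumb from this slide. The residual values are hypothetical and are assumed to be listed in the order the observations were collected.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order.
residuals = [1.0, 0.5, -1.0, -0.5, 1.0, -0.5, -0.5]

dw = durbin_watson(residuals)
no_autocorrelation_concern = 1.5 <= dw <= 2.5

print(dw)                          # 1.8125
print(no_autocorrelation_concern)  # True
```

The statistic ranges from 0 to 4, and values near 2 indicate little autocorrelation.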
Assumptions
• Assumption #5: Your data needs to
show homoscedasticity, which is where the
variances along the line of best fit remain similar
as you move along the line.

Assumptions
• Assumption #6: Finally, the residuals (errors) of
the regression line should be approximately normally
distributed.
• Two common methods to check this assumption
include using either a histogram (with a
superimposed normal curve) or a Normal P-P
Plot.

Regression
• Regression analysis is the estimation of the linear
relationship between a dependent variable and one or
more independent variables or covariates.
• Regression is used to predict the value of the dependent
variable when the value of the independent variable(s) is known.
• It does not imply causality.
• Regression analysis requires interval- or ratio-level
data.

Scatter Plot
• To see if your data fits
the models of regression,
it is wise to conduct a
scatter plot analysis.
• The reason?
– Regression analysis
assumes a linear
relationship. If you have
a curvilinear relationship
or no relationship,
regression analysis is of
little use.

Regression line
• The best straight line
description of the plotted
points
• Regression line is used to
describe the association
between the variables.

Beta (β) regression coefficient
• Predicts the change in the dependent variable for a
one-unit change in the explanatory (independent)
variable.
Y = a + βx
• β: regression coefficient (change in Y when X increases by 1)
• a: intercept (value of Y when X = 0)
[Figure: regression line of exam scores (Y axis) against sleeping hours (X axis), with the intercept a marked on the Y axis]
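The two readings of Y = a + βx (intercept as the value of Y at X = 0, slope as the change in Y per unit of X) can be illustrated with a short sketch. The data and the names `fit_line` and `predict` are hypothetical; the fit uses the ordinary least-squares formulas.

```python
# Hypothetical data: hours of sleep and exam scores.
hours = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 57.0, 61.0, 62.0, 68.0]


def fit_line(x, y):
    """Least-squares estimates: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, beta = fit_line(hours, scores)


def predict(x):
    """Predicted Y for a given X using the fitted line."""
    return a + beta * x


print(round(predict(0), 3))               # equals the intercept a
print(round(predict(7) - predict(6), 3))  # equals the slope beta
```

Note that X = 0 may lie outside the observed range of the data, in which case the intercept is a property of the line and not a realistic prediction.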
1. Unstandardized and standardized coefficients:
a 1-unit change in the IV predicts a change of
----- (value) in the DV. For the standardized
coefficient, the units are standard deviations.
2. Unstandardized: used when both variables
are in the same unit (e.g., personality traits and
resilience).
3. Standardized: used when your variables are
in different units (e.g., age and time), when
you are giving meaning to the raw data.
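As a rough illustration of how the two coefficients relate, the sketch below rescales an unstandardized slope b by the ratio of the standard deviations, b × (SD of x / SD of y). In simple linear regression this standardized slope equals the Pearson correlation r. The data and variable names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical data: hours of sleep and exam scores.
hours = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 57.0, 61.0, 62.0, 68.0]

mean_x, mean_y = mean(hours), mean(scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in scores)

# Unstandardized slope: change in score per one hour of sleep.
b = sxy / sxx

# Standardized slope: change in score (in SDs) per one SD of sleep.
beta_std = b * stdev(hours) / stdev(scores)

# Pearson correlation, computed independently for comparison.
r = sxy / (sxx * syy) ** 0.5

print(round(b, 3), round(beta_std, 3), round(r, 3))
```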
Coefficient of determination, R²
• R² represents the proportion of the variation
of the dependent variable explained by the
independent variable.
– R² = 1 indicates that the regression line
perfectly fits the data.
– R² = 0 indicates that the line does not fit
the data at all.
• Example: R² = 0.75 means that only 75% of the
changes in Y are explained by X.
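One common way to compute R² is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares of Y around its mean. The following sketch applies that formula to hypothetical data; the variable names are illustrative only.

```python
# Hypothetical data: hours of sleep and exam scores.
hours = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 57.0, 61.0, 62.0, 68.0]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
b = sxy / sxx
a = mean_y - b * mean_x

# Variation left unexplained by the line vs. total variation in Y.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))
ss_tot = sum((y - mean_y) ** 2 for y in scores)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.964
```

With these made-up values, about 96% of the variation in scores is explained by hours of sleep.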
Types of regression analysis
• Simple Linear Regression
– 1 numerical variable (dependent) vs. 1 numerical variable (independent)
• Multiple Linear Regression
– 1 numerical variable (dependent) vs. more than 1 numerical
variable (independent)
• Multivariable Linear Regression
– 1 numerical variable (dependent) vs. more than 1 numerical or
categorical variables (independent)
• Multivariate Linear Regression
– More than 1 numerical or categorical variables (dependent) vs.
more than 1 numerical or categorical variables (independent)
• Logistic Regression
– 1 categorical variable (dependent) vs. more than 1 numerical or
categorical variables (independent)
• Simple linear regression: 1 DV and 1 IV
• Multiple linear regression: 1 DV and more than 1 IV
• Multivariable linear regression: 1 DV and one IV with
subscales
• Multivariate linear regression: more than 1 DV and more
than 1 IV with levels
• Logistic regression: 1 categorical DV and IVs with
categories
Research Q’s and Hypothesis
Example:
• Research Question
– Is sleeping hours a predicting factor of exam scores?
• Null Hypothesis (H0: β = 0)
– There is no linear relationship between
sleeping hours and exam scores.
• Alternative Hypothesis (Ha: β ≠ 0)
– There is a significant linear relationship between
sleeping hours and exam scores.
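The null hypothesis β = 0 is usually tested with a t statistic, t = b / SE(b), on n - 2 degrees of freedom. The sketch below works through that calculation by hand on hypothetical data. The critical value 3.182 is the two-sided 5% point of the t distribution with 3 degrees of freedom, which matches this made-up sample of five points only. SPSS reports the exact p value instead.

```python
import math

# Hypothetical data: hours of sleep and exam scores (n = 5).
hours = [4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 57.0, 61.0, 62.0, 68.0]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
b = sxy / sxx
a = mean_y - b * mean_x

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))
se_b = math.sqrt(ss_res / (n - 2) / sxx)

t_stat = b / se_b

# Two-sided 5% critical value of t with n - 2 = 3 degrees of freedom.
T_CRIT_DF3 = 3.182
reject_null = abs(t_stat) > T_CRIT_DF3

print(round(t_stat, 3), reject_null)
```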

Assumptions
• The data are drawn from a random sample of the
population.
• The observations are independent of each other.
• The relationship between the two variables must
be linear.
• y is normally distributed at any point of x.
• The variance of y is equal at any point of x.

Assumptions 3 - Linearity
Put the independent variable into the “X Axis” box.
Put the dependent variable into the “Y Axis” box.
To add a regression line, double-click on the plot.
The relationship between the two variables is linear.

Assumptions 4 – Normal distribution
[Figure: SPSS screenshots]

Assumptions 5 – Equal variance
[Figure: SPSS screenshots]

Simple Linear Regression in SPSS
Put the independent variable into the “Independent(s)” box.
Put the dependent variable into the “Dependent” box.

SPSS Output
• The table shows the method used in this data analysis.
• No variable selection was carried out.

SPSS Output
The ‘Model Summary’ table shows the
• Correlation coefficient (R)
• Coefficient of determination (R²)

• The correlation coefficient (R) is 0.463, and thus there is a fair
positive linear relationship between the two variables.
• The coefficient of determination (R²) is 0.214.
• Thus 21.4% of the variation in exam scores is explained by sleeping
hours.
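In simple linear regression, R² is the square of the correlation coefficient, so the two reported values can be cross-checked with a one-line calculation. The sketch uses only the R value quoted on this slide.

```python
# Correlation coefficient reported in the 'Model Summary' table.
r = 0.463

# In simple linear regression, R squared is the square of R.
r_squared = r ** 2

print(round(r_squared, 3))  # 0.214
print(f"{r_squared:.1%}")   # 21.4%
```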
SPSS Output
The ANOVA table reports the p value of the relationship.

SPSS Output
Y = a + βx
The coefficients table shows
• the slope of the line (β),
• the intercept on the y axis (constant),
• the p value of the relationship.

SPSS Output Interpretation
• The slope of the regression line (β) is 3.456, with the y axis
intercept at 39.151.
• Each 1-hour increase in sleeping hours predicts an increase
of 3.456 in exam score.
• The regression equation:
Exam scores = 39.151 + 3.456 (sleeping hours)
• The p value is < 0.05, therefore reject the null hypothesis.
• There is a significant linear relationship between
sleeping hours and exam scores (p < 0.001).
• Sleeping hours is a significant predicting factor for
exam scores.
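The regression equation above can be used directly for prediction. The sketch below plugs the reported intercept (39.151) and slope (3.456) into a small function. The function name `predict_exam_score` and the example inputs of 6 and 7 hours are illustrative assumptions.

```python
# Coefficients reported in the SPSS coefficients table.
INTERCEPT = 39.151
SLOPE = 3.456


def predict_exam_score(sleeping_hours):
    """Exam scores = 39.151 + 3.456 * (sleeping hours)."""
    return INTERCEPT + SLOPE * sleeping_hours


# Predicted score for a student who sleeps 6 hours.
print(round(predict_exam_score(6), 3))  # 59.887

# One extra hour of sleep changes the prediction by exactly the slope.
print(round(predict_exam_score(7) - predict_exam_score(6), 3))  # 3.456
```

Predictions are only meaningful for sleeping hours within the range observed in the sample.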
Results Presentation
Table: Relationship between sleeping hours and exam scores

Variable          β (95% CI)              t statistic   P value*   R²
Sleeping hours    3.456 (3.166, 3.746)    23.354        < 0.001    0.214

*Simple Linear Regression

There is a significant linear relationship between sleeping hours
and exam scores (p < 0.001). Each 1-hour increase in sleeping hours
predicts an increase of 3.456 in exam score. Sleeping hours is a
significant predicting factor for exam scores.
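The values in the table are linked: the standard error is β divided by the t statistic, and the 95% confidence interval is approximately β ± 1.96 × SE. The sketch below reproduces the reported interval from β and t. It assumes the normal multiplier 1.96, which is only appropriate for a large sample; the slides do not state the sample size.

```python
# Values reported in the results table.
beta = 3.456
t_stat = 23.354

# Assumption: large sample, so the normal multiplier 1.96 is used
# in place of the exact t critical value.
Z_95 = 1.96

standard_error = beta / t_stat
lower = beta - Z_95 * standard_error
upper = beta + Z_95 * standard_error

print(round(standard_error, 3))          # 0.148
print(round(lower, 3), round(upper, 3))  # 3.166 3.746
```

The interval does not contain 0, which agrees with rejecting the null hypothesis β = 0.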
Thank You