Chapter 8-10: Contingency Tables, Correlation and Regression
The chi-squared statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

❑ where O and E denote the observed and expected frequencies.
❑ The chi-squared test measures the disparity between the observed frequencies (data from the sample) and the expected frequencies.
5/12/2023 Sisay W.(PhD) 11
Assumptions of the χ²-test
• No observed frequency is zero
• Calculations:

$$\chi^2 = \frac{(260 - 247.86)^2}{247.86} + \frac{(299 - 311.14)^2}{311.14} + \cdots \approx 9$$
Using SPSS
1. Analyze ➔ Descriptive Statistics ➔ Crosstabs
2. Under 'Statistics', tick 'Chi-square' and 'Risk'.
3. Under 'Cells', tick 'Row' percentages.
Gender by depression diagnosis crosstabulation:

                                   depression
                           non-case    case     Total
gender  female  Count           497     358       855
                % within gender 58.1%   41.9%   100.0%
        male    Count           420     160       580
                % within gender 72.4%   27.6%   100.0%
Total           Count           917     518      1435
                % within gender 63.9%   36.1%   100.0%
Compare percentages between the different exposure statuses.
Chi-Square Tests
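The chi-squared statistic for the gender-by-depression table above can be computed directly from the observed counts; a minimal sketch in plain Python (no SPSS required):

```python
# Chi-squared test of independence for the gender x depression crosstab.
# Observed counts are taken from the SPSS output above.
observed = [[497, 358],   # female: non-case, case
            [420, 160]]   # male:   non-case, case

row_totals = [sum(row) for row in observed]        # [855, 580]
col_totals = [sum(col) for col in zip(*observed)]  # [917, 518]
n = sum(row_totals)                                # 1435

# Expected frequency for each cell: E = (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all four cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(round(chi2, 2))  # well above the df = 1 critical value of 3.84
```

With a statistic this far beyond 3.84 (the 5% critical value at 1 degree of freedom), gender and a depression diagnosis are associated in this sample, consistent with the percentage comparison above.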
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2\right]}} = \hat{\rho}$$
– −1 ≤ r ≤ 1
– The values r = 1 and r = −1 occur only when there is an exact linear relationship between x and y.
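The correlation formula translates directly into code; a small sketch that computes r from the raw deviations, using only the standard library:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: sum of (xi - xbar)(yi - ybar), divided by
    the square root of [sum (xi - xbar)^2][sum (yi - ybar)^2]."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# An exact linear relationship gives r = +1 or r = -1:
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0
```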
[Scatter plots of y versus x illustrating exact linear relationships]
Scatter Plot Examples
[Scatter plots of y versus x illustrating weak and strong relationships]
Scatter Plot Examples
[Scatter plot of y versus x showing no relationship at all]
Examples of Approximate r Values
[Five scatter plots of y versus x with r = −1, r = −0.6, r = 0, r = +0.3, r = +1]
Scatter plot
A scatter plot is useful because it:
• Depicts the pattern of the data in the plane.
• Shows whether x and y are related or independent.
• Shows the ranges of x and y.
• Shows unusual observations.
• Suggests what type of model can be fitted to the data.
y = β0 + β1x + ε
where β0 + β1x is the linear component and ε is the random error component.
Example: Linear relationship (e.g. Y=cholesterol versus
X=age)
Y = b0 + b1X
b0 is the intercept,
b1 is the slope.
Prediction
Predicted value for an individual:
yi = β0 + β1xi + random errori
The fixed part, β0 + β1xi, lies exactly on the line; the random error follows a normal distribution.
Linear Regression Model: Meaning of β0 and β1
β1 > 0 [positive slope], β1 < 0 [negative slope]
β1 = slope (= rise/run)
β0 = y-intercept
Linear regression
• This model, known as the regression line (the line of averages), is the equation of a straight line.
• The parameters 𝛽0 and 𝛽1 are constants called the
coefficients of the equation; 𝛽0 is the y-intercept of
the line and 𝛽1 is its slope.
• The y-intercept is the mean value of the response y when x is equal to 0.
• The slope is the change in the mean value of y that
corresponds to a one-unit increase in x. If 𝛽1 is
positive, µy |x increases in magnitude as x increases;
if 𝛽1 is negative, µy|x decreases as x increases.
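This interpretation of the coefficients can be made concrete with a tiny sketch. The cholesterol-versus-age line below uses made-up coefficients, chosen purely for illustration; they are not from the slides' data:

```python
# Hypothetical fitted line for Y = cholesterol (mg/dL) versus X = age (years).
# b0 and b1 are illustrative values, not estimates from real data.
b0, b1 = 150.0, 1.2

def mean_cholesterol(age):
    # Mean response at a given x: b0 + b1 * x
    return b0 + b1 * age

# The intercept is the mean response at x = 0:
print(mean_cholesterol(0))                            # 150.0
# The slope is the change in mean y per one-unit increase in x:
print(round(mean_cholesterol(41) - mean_cholesterol(40), 6))  # 1.2
```

Because b1 is positive here, the mean of y increases as x increases, exactly as the bullet above describes.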
The method of least squares: parameter estimation
• In a regression model, two variables X and Y are of interest: X is the explanatory variable and Y is the response.
  – Suppose we were to draw an arbitrary line through the scatter of the X and Y data.
  – Lines sketched by two different individuals are unlikely to be identical, even though both might attempt to show the same trend.
  – The question then arises as to which line best describes the relationship between X and Y, that is, fits the data well. One mathematical technique for fitting a straight line to a set of points is known as the method of least squares.
The Method of Least Squares
Using the sample data, the fitted line is:
ŷ = β̂0 + β̂1x
where
o β̂0 is the intercept (constant);
o β̂1 is the gradient or slope;
o these parameters are referred to as the regression coefficients.
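The least-squares estimates have the closed form β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β̂0 = ȳ − β̂1x̄; a minimal sketch of that computation:

```python
def least_squares(x, y):
    """Least-squares estimates: slope = Sxy / Sxx, intercept = ybar - slope * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx           # gradient (slope)
    b0 = ybar - b1 * xbar    # intercept (constant)
    return b0, b1

# Points lying exactly on y = 3 + 2x are recovered exactly:
b0, b1 = least_squares([0, 1, 2, 3], [3, 5, 7, 9])
print(b0, b1)   # 3.0 2.0
```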
[Figure: partition of variability on a plot of response (Y) against predictor (X): total variability (Yi − Ȳ) = unexplained variability (Yi − Ŷi) + explained variability (Ŷi − Ȳ)]
Least squares criterion

Which line has the best "fit" to the data?
Checking assumptions
• Normality
– Histogram with normality plot
– Checking skewness
– K-S normality test
• Linearity
– Scatter plot (No Excuse)
• Homogeneity (homoscedasticity) of variances
– Residual plot (residuals versus fitted values):
  • helps to detect outlying observations in the sample;
  • reveals failure of the homoscedasticity assumption;
  • if the residuals do not exhibit a random scatter but instead follow a distinct trend, the true relationship between x and y might not be linear.
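A quick numerical companion to the residuals-versus-fitted plot: for a least-squares fit with an intercept, the residuals always sum to (numerically) zero, so what the plot is really checking is their pattern, not their level. A small sketch, using the least-squares formulas from the earlier slides on toy data:

```python
# Toy data, roughly linear, purely for illustration
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Fit by least squares
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals from an intercept model sum to zero by construction;
# a single residual far from the rest would flag an outlying observation.
print(round(abs(sum(residuals)), 10))   # 0.0
```

Plotting `residuals` against `fitted` (e.g., with matplotlib) and looking for a random scatter is the graphical version of this check.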
Steps in testing significance in regression
analysis
• Step 1. Observe the data
– Y represents the response and X represents explanatory
– All observations (measurements) must be independent
– The underlying population from which the sample is
selected must be normal.
• Step 2. Construct a scatter plot
• Step 3. Define the model:
Y = 𝛽0 +𝛽1 X
𝛽0 (the Y-intercept) and 𝛽1 (the slope) are called the coefficients of the regression equation.
𝛽0 is the mean value of Y when X = 0.
𝛽1 is the change in Y that corresponds to a one-unit change in X.
Steps…
• Step 4. Estimate the coefficients: ŷ = β̂0 + β̂1x
Using the least squares method:
Example: Education & Job Prestige
• The actual SPSS regression results for that data:
Model Summary:
  Model 1: R = .521ᵃ, R Square = .272, Adjusted R Square = .271, Std. Error of the Estimate = 12.40
  a. Predictors: (Constant), HIGHEST YEAR OF SCHOOL COMPLETED

Estimates of 𝛽0 and 𝛽1:
  "Constant" = 𝛽0 = 9.427
  Slope for "Year of School" = 𝛽1 = 2.487

Coefficientsᵃ:
  Model 1                            B       Std. Error   Beta    t        Sig.
  (Constant)                         9.427   1.418                6.648    .000
  HIGHEST YEAR OF SCHOOL COMPLETED   2.487   .108         .521    23.102   .000
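With the coefficients reported in the SPSS output above, prediction is just substitution into ŷ = β̂0 + β̂1x:

```python
# Coefficients taken from the SPSS output above
b0 = 9.427   # "Constant"
b1 = 2.487   # slope for HIGHEST YEAR OF SCHOOL COMPLETED

def predicted_prestige(years_of_school):
    # Predicted job prestige score for a given number of years of schooling
    return b0 + b1 * years_of_school

# Each extra year of school adds 2.487 points of predicted prestige:
print(round(predicted_prestige(12), 3))   # 39.271
print(round(predicted_prestige(16), 3))   # 49.219
```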
Multiple linear regression
How Good is the Model?
One of the measures of how well the model explains the
data is the R2 value. Differences between observations
that are not explained by the model remain in the error
term.
The R2 value tells you what percentage of those differences is
explained by the model. An R2 of .68 means that 68% of
the variance in the observed values of the dependent
variable is explained by the model, and 32% of those
differences remain unexplained in the error term.
Coefficient of determination
• Explained variation + unexplained variation =Total variation
• The ratio of the explained variation to the total variation
measures how well the linear regression line fits the given pairs
of scores. It is called the coefficient of determination, and is
denoted by r².
$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$
• The explained variation is never negative and is never larger than
the total variation. Therefore, r² is always between 0 and 1. If the
explained variation equals 0, r² = 0.
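Computing r² from this definition: a sketch that fits the least-squares line, then takes one minus the ratio of unexplained to total variation (equivalent to explained over total):

```python
def r_squared(x, y):
    """r^2 = explained variation / total variation = 1 - SSE / SST."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - ybar) ** 2 for yi in y)                        # total
    return 1 - sse / sst

print(round(r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 6))   # 0.64
```

For these data the Pearson correlation is r = 0.8, and r² = 0.8² = 0.64, illustrating that the coefficient of determination is the square of the correlation coefficient in simple linear regression.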
R-square
• In most cases, the ratio falls somewhere between these extremes, that is, between 0.0 and 1.0.
• One minus the ratio of residual variability to total variability is referred to as R-square, or the coefficient of determination.
• This value is immediately interpretable: if R-square is 0.4, the variability of the Y values around the regression line is 1 − 0.4 = 0.6 times the original variance.
• In other words, we have explained 40% of the original variability and are left with 60% residual variability.
• Ideally, we would like to explain most if not all of the original variability.
Review: R-Square
• Visually: deviation is partitioned into two parts, "error variance" and "explained variance".
[Figure: regression line Y = 2 + .5X plotted with Ȳ; each deviation from Ȳ splits into an explained part and an error part]
Residual Variance and R-square
♣ The R-square value is an indicator of how well the model fits the
data
♣ An R-square close to 1.0 indicates that we have accounted for
almost all of the variability with the variables specified in the
model.
End of SLR & MLR