Session 4 Correlation and Regression
Objectives:
Discuss and perform correlation and regression using MS Excel and SPSS;
Discuss and perform the z-test, t-test, and F-test (ANOVA, analysis of variance) using MS Excel and SPSS;
Construct and perform the statistical analysis, and interpret the resulting statistic.
REVIEW OF HYPOTHESIS TESTING
Hypothesis
A statistical hypothesis is a conjecture about a population parameter. This conjecture may or may not be true (Bluman, 2016).
Types of Hypotheses:
The null hypothesis, symbolized by H0, is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.
The alternative hypothesis, symbolized by H1, is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters.
A statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected. The numerical value obtained from a statistical test is called the test value.
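As a minimal sketch of how a test value leads to a decision, the Python below computes the test value of a one-sample z-test, z = (x̄ − μ0)/(σ/√n), and compares it with the two-tailed critical value for α. The function names and all numbers (μ0 = 100, σ = 15, n = 36, x̄ = 105) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_value(sample_mean, mu0, sigma, n):
    """Test value of a one-sample z-test: z = (x̄ − μ0) / (σ / √n)."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

def reject_h0(test_value, alpha=0.05):
    """Two-tailed decision rule: reject H0 when |z| exceeds the critical value z(α/2)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(test_value) > critical

# Hypothetical data: H0: μ = 100 vs H1: μ ≠ 100, with σ = 15 known and n = 36.
z = z_test_value(sample_mean=105, mu0=100, sigma=15, n=36)
print(f"test value z = {z:.2f}")                             # test value z = 2.00
print("reject H0" if reject_h0(z) else "do not reject H0")   # reject H0 (2.00 > 1.96)
```

At α = 0.01 the critical value rises to about 2.576, so the same test value of 2.00 would not lead to rejection.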
Types of Error:
A type I error (alpha error) occurs if you reject the null hypothesis when it is true.
A type II error (beta error) occurs if you do not reject the null hypothesis when it is false.
Level of Significance (α)
The level of significance is the maximum probability of committing a type I error. This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
Statisticians generally agree on using three arbitrary significance levels: the 0.10, 0.05, and 0.01 levels.
That is, when the null hypothesis is true, the probability of rejecting it (a type I error) is 10%, 5%, or 1%, depending on which level of significance is used.
Here is another way of putting it: when α = 0.10, there is a 10% chance of rejecting a true null hypothesis; when α = 0.05, there is a 5% chance of rejecting a true null hypothesis; and when α = 0.01, there is a 1% chance of rejecting a true null hypothesis.
In a hypothesis-testing situation, the researcher decides what level of significance to use. It does not have to be the 0.10, 0.05, or 0.01 level; it can be any level, depending on the seriousness of the type I error. The 0.05 level is commonly used in educational research.
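The meaning of α can be checked by simulation. The sketch below assumes a normal population in which H0 (μ = 50) really is true and σ = 10 is known; it repeats a two-tailed z-test many times and counts how often the true H0 is rejected. The rejection rate should come out close to the chosen α. The function name and all parameter values are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(alpha, mu0=50.0, sigma=10.0, n=16, trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed one-sample z-test by simulation."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Each sample comes from a population where H0 (μ = mu0) really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1  # rejecting a true H0 is a type I error
    return rejections / trials

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: simulated type I error rate = {type_i_error_rate(alpha):.3f}")
```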
Methods of Hypothesis Testing:
x | y
Hours of training | Number of accidents
Shoe size | Height
Cigarettes smoked per day | Lung capacity
Score on NAT | Grade point average
Height | IQ
Correlation analysis is used to measure the strength of the relationship between two variables and to use this measure of strength to decide whether or not a significant linear relationship exists.
Correlation
Measures and describes the strength and direction of the relationship.
[Diagram: two variables (Variable 1 and Variable 2) versus multivariate (Variables 1, 2, and 3)]
Scatter Plots and Types of Correlation
[Scatter plot: x = hours of training (horizontal axis, 0 to 20); y = number of accidents (vertical axis, 0 to 60)]
[Scatter plot: x = Math NAT score (300 to 800); y = GPA (1.50 to 4.00)]
Positive Correlation – as x increases, y increases
[Scatter plot: x = height (60 to 80); y = IQ (80 to 160)]
No linear correlation
Strong, negative relationship, but non-linear!
Correlation Coefficient “r”
r ranges from –1 through 0 to 1:
If r is close to –1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive correlation.
Sample Application
[Scatter plot: x = absences (0 to 16); y = final grade (40 to 95)]
x (Absences) | y (Final Grade)
8 | 78
2 | 92
5 | 90
12 | 58
15 | 43
9 | 74
6 | 81
Computation of r
No. | x | y | xy | x² | y²
1 | 8 | 78 | 624 | 64 | 6084
2 | 2 | 92 | 184 | 4 | 8464
3 | 5 | 90 | 450 | 25 | 8100
4 | 12 | 58 | 696 | 144 | 3364
5 | 15 | 43 | 645 | 225 | 1849
6 | 9 | 74 | 666 | 81 | 5476
7 | 6 | 81 | 486 | 36 | 6561
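The column totals of this table are Σx = 57, Σy = 516, Σxy = 3751, Σx² = 579, and Σy² = 39898. A short Python sketch, using the usual computational formula r = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}, reproduces these totals and gives r ≈ −0.975, a strong negative correlation between absences and final grade. The function name pearson_r is hypothetical.

```python
from math import sqrt

# Absences (x) and final grades (y) from the sample application.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

def pearson_r(x, y):
    """Pearson r from the column sums Σx, Σy, Σxy, Σx², Σy²."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

print(sum(x), sum(y))             # 57 516
print(round(pearson_r(x, y), 3))  # -0.975
```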
[Figure: best-fitting straight line, with the vertical distance from a data point to the line marked as a residual; vertical axis: Revenue (180 to 260)]
[Figure: regression line for the absences (x, 0 to 16) and final grade (y, 40 to 95) data]
Note that the point (x̄, ȳ) = (8.143, 73.714) is on the line.
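A minimal least-squares sketch for the same data: the slope is m = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] and the intercept is b = ȳ − m·x̄, which for these data gives approximately ŷ = −3.924x + 105.668. Because b is defined through the means, the point (x̄, ȳ) always lies on the line. The code also computes the residuals y − ŷ, which sum to zero (up to floating-point rounding) for a least-squares line. The function name is hypothetical.

```python
# Absences (x) and final grades (y) from the sample application.
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

def least_squares_line(x, y):
    """Return slope m and intercept b of the least-squares line ŷ = m·x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = ȳ − m·x̄
    return m, b

m, b = least_squares_line(x, y)
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(f"y-hat = {m:.3f}x + {b:.3f}")
print(f"({x_bar:.3f}, {y_bar:.3f}) lies on the line:", abs(m * x_bar + b - y_bar) < 1e-9)

# Residuals y − ŷ: vertical distances from each data point to the line.
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
print("sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
```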
Predicting y Values
The regression line can be used to predict
values of y for values of x falling within the range of
the data.
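The sketch below reuses the least-squares line for the absences data and predicts a final grade only when x lies within the observed range of 2 to 15 absences, raising an error otherwise. Refusing to extrapolate this way is a design choice made for illustration; the predict function is hypothetical.

```python
x = [8, 2, 5, 12, 15, 9, 6]
y = [78, 92, 90, 58, 43, 74, 81]

# Least-squares slope and intercept for the absences / final-grade data.
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = sum_y / n - m * sum_x / n

def predict(x_value):
    """Predict ŷ = m·x + b, refusing x values outside the observed data range."""
    if not min(x) <= x_value <= max(x):
        raise ValueError(f"x = {x_value} is outside the data range {min(x)} to {max(x)}")
    return m * x_value + b

print(round(predict(3), 1))   # predicted final grade for 3 absences
print(round(predict(10), 1))  # predicted final grade for 10 absences
```

With these data, 3 absences give a predicted grade of about 93.9 and 10 absences about 66.4, while a request for 20 absences is rejected.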
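Critical Values ± t0
[Figure: t-distribution with critical values –4.032 and 4.032]
4. Find the critical value.
5. Find the rejection region.
t-distribution table (excerpt; p = area in the upper tail):
df \ p | 0.40 | 0.25 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.0005
1 | 0.324920 | 1.000000 | 3.077684 | 6.313752 | 12.70620 | 31.82052 | 63.65674 | 636.6192
3 | 0.276671 | 0.764892 | 1.637744 | 2.353363 | 3.18245 | 4.54070 | 5.84091 | 12.9240
[Figure: rejection regions t < –4.032 and t > 4.032; the test value tc = –9.811 lies in the left rejection region]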
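The test value for a correlation coefficient can be sketched as t = r / √[(1 − r²)/(n − 2)] with df = n − 2. Assuming this is the test behind the slide (n = 7 students, so df = 5, and α = 0.01 two-tailed, for which the t-table gives 4.032), the code reproduces tc ≈ −9.81 from r rounded to −0.975; with the unrounded r ≈ −0.9748 it is about −9.76. Either value falls in the rejection region t < −4.032, so H0 (no linear correlation) is rejected. The critical value is hard-coded rather than looked up, and the function name is hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """Test value for H0: ρ = 0, namely t = r / sqrt((1 − r²) / (n − 2)), with df = n − 2."""
    return r / sqrt((1 - r ** 2) / (n - 2))

n = 7  # seven students in the absences / final-grade example
# Unrounded r from the column sums: nΣxy − ΣxΣy = −3155,
# nΣx² − (Σx)² = 804, nΣy² − (Σy)² = 13030.
r_exact = -3155 / sqrt(804 * 13030)
t_0 = 4.032  # t-table value for df = 5, α = 0.01 two-tailed (0.005 in each tail)

for r in (-0.975, r_exact):  # r rounded as on the slides, then unrounded
    t_c = t_for_r(r, n)
    decision = "reject H0" if abs(t_c) > t_0 else "do not reject H0"
    print(f"r = {r:.4f}, tc = {t_c:.2f}, df = {n - 2}: {decision}")
```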
Drug cases 21 18 10 15 4 17
Is there a significant relationship between the frequency of information campaigns and the number of drug cases in the city?
Exercises
4. The following data show the student enrolment for the past 5 years in a certain school.
Year 2007 2008 2009 2010 2011
Sampling Techniques
Non-Probability Sampling Techniques | Probability Sampling Techniques
Control characteristic | Population composition (percentage) | Sample composition (percentage) | Sample composition (number)
Sex
Male | 48 | 48 | 480
Female | 52 | 52 | 520
Total | 100 | 100 | 1000
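The control table sets each category's share of the sample equal to its share of the population. A minimal sketch of that arithmetic follows (the function name is hypothetical): with n = 1000, the 48% and 52% population shares give 480 males and 520 females. Rounding each category separately can leave the total off by one when the percentages do not divide evenly, which a real design would have to adjust for.

```python
def quota_sizes(population_percent, sample_size):
    """Sample count per category so the sample composition mirrors the population's."""
    return {category: round(sample_size * pct / 100)
            for category, pct in population_percent.items()}

# Control characteristic "Sex": population composition in percent, from the table.
quotas = quota_sizes({"Male": 48, "Female": 52}, sample_size=1000)
print(quotas)  # {'Male': 480, 'Female': 520}
```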
Snowball Sampling
In snowball sampling, an initial group of
respondents is selected, usually at random.
Simple Random Sampling
$\sum_{h=1}^{H} n_h = n$
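A minimal sketch of drawing a simple random sample with Python's standard random.sample, which selects n distinct elements without replacement so that every subset of size n is equally likely. The sampling frame of 1,000 numbered elements, the two strata, and their sizes n_h (24 and 26, mirroring a 48%/52% split of n = 50) are all hypothetical; the last part only checks that the stratum sample sizes add up to n, as the summation Σ n_h = n requires.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame without replacement."""
    return random.Random(seed).sample(frame, n)

# Hypothetical sampling frame: 1,000 population elements numbered 1 to 1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=42)
print(sorted(sample))

# Hypothetical strata h = Male, Female, with an SRS of size n_h drawn from each;
# the stratum sample sizes n_h must add up to the total sample size n.
strata = {"Male": frame[:480], "Female": frame[480:]}
n_h = {"Male": 24, "Female": 26}
stratified = {h: simple_random_sample(units, n_h[h], seed=7) for h, units in strata.items()}
n = sum(len(s) for s in stratified.values())
print(n)  # 50
```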