EFM 515 Stats Lecture Notes
IN EDUCATIONAL RESEARCH
1. It permits the most exact kind of description, i.e. it is
our descriptive language, e.g. 10% of the students
passed the exam.
2. Statistics forces us to be definite and exact in our
thinking, which means it allows us to summarise our
results in a short, meaningful and convenient form.
Hence, it saves us the trouble of giving long
narrative descriptions of phenomena.
3. It enables us to analyse relationships between
variables, e.g. the average pass rate for the A class is
60% while that of the B class is 40%.
4. Thus, it enables us to predict how much of a variable
will occur under certain conditions.
MEASURES OF CENTRAL TENDENCY
They indicate where the terms in the
distribution cluster around the middle, i.e. the
centre.
A distribution is a set of data / numbers.
Frequency is the number of times a term appears in the distribution.
There are three measures of central tendency,
namely the mode, the median and the mean.
1. MODE
- It is the piece of data with the highest
frequency, i.e. it appears more often than the
other terms.
A distribution can have no mode, one mode or
more than one mode.
a) If a distribution has one mode, we say it is
unimodal.
b) If it has two modes, we say it is bimodal.
c) If it has three modes, it is called a trimodal
distribution.
d) A distribution with many modes is called
multimodal.
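The classification above can be sketched in a few lines of Python. This is only an illustration: the function names `modes` and `describe` and the sample marks are hypothetical, and the rule that a distribution has no mode when every value occurs equally often is an assumption, since the notes do not define that case.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumption: if all distinct values occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

def describe(data):
    """Name the distribution by its number of modes."""
    names = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(len(modes(data)), "multimodal")

# Hypothetical marks, not taken from the notes
print(modes([5, 3, 5, 7, 5, 2]), describe([5, 3, 5, 7, 5, 2]))
print(modes([80, 81, 81, 82, 82, 85]), describe([80, 81, 81, 82, 82, 85]))
print(modes([1, 2, 3, 4]), describe([1, 2, 3, 4]))
```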
Advantages of Using the Mode in
Educational Research
There is no need to arrange the numbers in order of size. It is not affected
by extreme values (outliers), unlike the range.
DISADVANTAGES
It does not use all the terms in the distribution. It only
focuses on those which appear more frequently than others. Hence, it
is not normally used for further statistical calculations.
SOLUTIONS
d) The modes are 81 and 82, hence the distribution is bimodal.
e) No mode
f) The mode is 5, hence the distribution is unimodal.
2. MEDIAN
It is the term which occupies the middle position
when the terms / numbers are arranged in order of
size.
The terms can be arranged in ascending or
descending order.
When the number of terms in the distribution is
odd, the position of the median is found using the formula:
Median = ½(n + 1)th term
where n = number of terms in the distribution, e.g. if
n = 15
Median = ½(15 + 1)th term
= 8th term
SOLUTION
(a) Rank the numbers, i.e. arrange them in ascending order:
1 3 4 5 6 7 7 8 9    n = 9
Median = ½(9 + 1)th term
= 5th term
Therefore Median = 6
(b) Ranking the numbers:
19 24 29 36 50 60 77 82 100 105    n = 10
The number of terms is even, so the median is the mean of the two middle terms (the 5th and 6th):
Median = (50 + 60) / 2
= 110 / 2
= 55
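As a minimal sketch, the two cases (odd and even number of terms) can be checked in Python using the two ranked sets from the solutions above. The function name `median` is chosen here for illustration only.

```python
def median(data):
    """Middle term of the ranked data; mean of the two middle terms if n is even."""
    terms = sorted(data)          # rank the terms in ascending order
    n = len(terms)
    middle = n // 2
    if n % 2 == 1:
        # the 1/2(n + 1)th term (counting from 1) sits at index n // 2 (counting from 0)
        return terms[middle]
    return (terms[middle - 1] + terms[middle]) / 2

odd_set = [1, 3, 4, 5, 6, 7, 7, 8, 9]
even_set = [19, 24, 29, 36, 50, 60, 77, 82, 100, 105]
print(median(odd_set))   # 5th term
print(median(even_set))  # mean of the 5th and 6th terms
```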
DISADVANTAGES
- It does not use all the terms in the distribution
- It is not always used for further statistical
calculations
- It can be different from all the terms in the distribution
MEAN
It is also called the arithmetic mean or average.
- It is obtained by adding all the terms in the
distribution and dividing the sum by the
number of terms in the distribution.
Mean = (sum of terms) / n
It is represented by x̅ (x bar).
∑ is the summation sign. It means that the terms are
being added:
∑x = x₁ + x₂ + x₃ + ... + xn
n = number of terms in the distribution
x̅ = ∑x / n
ADVANTAGES OF MEAN
- It uses all the terms in the distribution
- It is used for further statistical calculations, e.g. in
finding the variance and standard deviation.
- There is no need to rank the numbers.
DISADVANTAGES
- It is affected by outliers
- It can be different from all the terms in the
distribution
Find the mean of each of the following
a) 60, 74, 88, 36, 54, 81, 93, 96, 50, 68
a) x̅ = ∑x / n
x̅ = (60 + 74 + 88 + 36 + 54 + 81 + 93 + 96 + 50 + 68) / 10
= 700 / 10
= 70

x̅ = 16912 / 5 = 3382,4
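The calculation for set (a) can be checked with a short Python sketch. The function name `mean` and the variable `marks` are illustrative only.

```python
def mean(data):
    """Arithmetic mean: sum of the terms divided by the number of terms."""
    return sum(data) / len(data)

marks = [60, 74, 88, 36, 54, 81, 93, 96, 50, 68]  # set (a) from the notes
print(sum(marks), len(marks), mean(marks))
```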
MEASURES OF DISPERSION / VARIABILITY / SCATTER /
SPREAD
Disadvantages
It focuses only on the extreme values and ignores all the
other terms.
It is very misleading if the distribution has
outliers.
The Interquartile range
numbers.
Example 1
Find the IQR of the set 1, 2, 3, 4, 5, 6, 7
Position of Q1 = (n + 1)/4 = (7 + 1)/4 = 2, so Q1 = 2nd term = 2
Position of Q3 = ¾(n + 1) = ¾(7 + 1) = 6, so Q3 = 6th term = 6
IQR = Q3 - Q1
= 6 - 2 = 4
Example 2
Find the IQR for these marks:
30, 25, 80, 41, 40, 56, 65, 77
Arrange: 25, 30, 40, 41, 56, 65, 77, 80
Position of Q1 = (8 + 1)/4
= 2,25, therefore take the average of the 2nd and 3rd terms, 30 and
40, and get Q1 = 35
Position of Q3 = ¾(8 + 1) = 6,75, therefore take the average
of the 6th and 7th terms, 65 and 77, and get Q3 = 71
So IQR = 71 - 35 = 36
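The positional method used in the two examples can be sketched in Python as follows. The sketch assumes the rule shown in the notes: a whole-number position picks that term, and a fractional position takes the average of the two neighbouring terms. Other textbooks and software interpolate differently, so their quartiles may not match. The names `quartile` and `iqr` are illustrative.

```python
def quartile(terms, position):
    """Value at a 1-based position in ranked data.

    A whole-number position returns that term; a fractional position
    returns the average of the two terms on either side of it.
    """
    lower = int(position)                      # whole part of the position
    if position == lower:
        return terms[lower - 1]
    return (terms[lower - 1] + terms[lower]) / 2

def iqr(data):
    """Return (Q1, Q3, IQR) using positions (n + 1)/4 and 3(n + 1)/4."""
    terms = sorted(data)
    n = len(terms)
    q1 = quartile(terms, (n + 1) / 4)
    q3 = quartile(terms, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1

print(iqr([1, 2, 3, 4, 5, 6, 7]))               # Example 1
print(iqr([30, 25, 80, 41, 40, 56, 65, 77]))    # Example 2
```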
Advantages
1. Can be used as a measure of variability if
the extreme values are not recorded
correctly
2. It is not affected by extreme values
Disadvantages
1. Not easy to calculate
2. It is a positional measure, based on only the
twenty-fifth and seventy-fifth percentiles
b. VARIANCE
X      X̅      X - X̅        (X - X̅)²
x₁     X̅      x₁ - X̅       (x₁ - X̅)²
⋮      ⋮      ⋮            ⋮
xn     X̅      xn - X̅       (xn - X̅)²
∑X            0            ∑(X - X̅)²  (SUM OF SQUARED DEVIATIONS)
CALCULATION OF VARIANCE
X     X̅     X - X̅    (X - X̅)²
60    71    -11      121
83    71    12       144
71    71    0        0
63    71    -8       64
89    71    18       324
90    71    19       361
40    71    -31      961
72    71    1        1
∑                    1976
Variance (s²) = ∑(X - X̅)² / (n - 1)
= 1976 / (8 - 1)
= 1976 / 7
= 282,2857143
SD = √[∑(X - X̅)² / (n - 1)]
= √(1976 / 7)
= √282,2857143
= 16,80136049
= 16,80 (2 d.p.)
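The table and the two formulas can be reproduced with a short Python sketch. It uses the divisor n - 1, as in the notes; the function name `sample_variance` is illustrative. Python writes decimals with a point rather than a comma.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1)

scores = [60, 83, 71, 63, 89, 90, 40, 72]   # the X column from the notes
variance = sample_variance(scores)
sd = math.sqrt(variance)                    # standard deviation
print(round(variance, 4), round(sd, 2))
```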
INTERPRETATION OF VARIANCE AND
STANDARD DEVIATION
A large value of the standard deviation / variance
shows that the values are widely scattered relative
to the mean, i.e. the greater the variance /
standard deviation, the more widely spread the
terms are above and below the mean. The smaller
the variance, the more closely packed the values are
around the mean.
d. MEASURES OF RELATIVE STANDING
SCORES
They are used to indicate how an individual
compares to other individuals and to determine his or
her relative position. They are concerned with how a
particular score stands in relation to other scores.
Subject   Pupil A   Pupil B   Mean (x̅)   Standard deviation (s)
Maths     77        74        76         2,15
English   79        76        78         3,24
Z-SCORE
z = (x - x̅) / s
a) Considering pupil A
Maths z-score = (77 - 76) / 2,15
= 1 / 2,15
= 0,465116279
= 0,465
English z-score = (79 - 78) / 3,24
= 1 / 3,24
= 0,308641975
= 0,309
Pupil A performed better in Maths since the z-score
in Maths is higher than in English.
b) Considering pupil B
Maths z-score = (74 - 76) / 2,15
= -2 / 2,15
= -0,930232558
Therefore, to two decimal places, z = -0,93
English z-score = (76 - 78) / 3,24
= -2 / 3,24
= -0,61728
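The four z-scores above can be checked with a small Python sketch. The function name `z_score` is illustrative; the scores, means and standard deviations are the ones used in the calculations above, written with decimal points as Python requires.

```python
def z_score(x, mean, sd):
    """How many standard deviations the score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (score, mean, standard deviation) for each pupil and subject
pupil_a_maths = z_score(77, 76, 2.15)
pupil_a_english = z_score(79, 78, 3.24)
pupil_b_maths = z_score(74, 76, 2.15)
pupil_b_english = z_score(76, 78, 3.24)

print(round(pupil_a_maths, 3), round(pupil_a_english, 3))
print(round(pupil_b_maths, 2), round(pupil_b_english, 2))
```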
MEASURES OF ASSOCIATION
[Graph showing no correlation: scatter plot of y against x]
[Graph showing negative correlation: scatter plot of y against x]
[Graph showing perfect positive correlation: scatter plot of y against x]
[Graph showing perfect negative correlation: scatter plot of y against x]
CORRELATION CO-EFFICIENT
It is the number which shows the size and direction of
the association between variables.
r normally represents a correlation co-efficient.
The maximum value of r is +1
The minimum value of r is -1
This means r lies between -1 and +1 (inclusive)
Value of r       Interpretation
+1               perfect positive correlation
0,8 - 0,99       very strong positive correlation
0,6 - 0,79       strong positive correlation
0,4 - 0,59       moderate positive correlation
0,2 - 0,39       weak positive correlation
0,1 - 0,19       very weak positive correlation
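The bands above can be turned into a small lookup, sketched here in Python. Three points are assumptions rather than statements from the notes: negative values of r are described with the same bands and the word "negative", each band is taken to start at its lower bound (so 0,795 counts as strong), and values below 0,1 are labelled "negligible". The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Describe a correlation co-efficient using the bands from the notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:
        return "negligible"            # assumption: the notes give no label here
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"
    elif size >= 0.6:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.2:
        strength = "weak"
    else:
        strength = "very weak"
    return f"{strength} {direction}"

print(describe_r(1))
print(describe_r(0.45))
print(describe_r(-0.91))
```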
PEARSON'S PRODUCT-MOMENT CORRELATION CO-EFFICIENT
r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
The following table can be used to obtain the
values which are to be substituted in the
formula
x     y     x²     y²     xy
x₁    y₁    x₁²    y₁²    x₁y₁
x₂    y₂    x₂²    y₂²    x₂y₂
x₃    y₃    x₃²    y₃²    x₃y₃
x₄    y₄    x₄²    y₄²    x₄y₄
∑x    ∑y    ∑x²    ∑y²    ∑xy
PEARSON'S PRODUCT-MOMENT WORKED
EXAMPLE
Ten Form 4 pupils at a certain school wrote two
tests, one in History and the other in
Mathematics, and the results are as follows:
Pupil     A   B   C   D   E   F   G   H   I   J
History   80  74  56  52  78  90  73  65  40  75
Maths     40  52  75  74  50  54  59  60  71  48
x     y     x²       y²       xy
80    40    6400     1600     3200
74    52    5476     2704     3848
56    75    3136     5625     4200
52    74    2704     5476     3848
78    50    6084     2500     3900
90    54    8100     2916     4860
73    59    5329     3481     4307
65    60    4225     3600     3900
40    71    1600     5041     2840
75    48    5625     2304     3600
683   583   48 679   35 247   38 503   (totals: ∑x, ∑y, ∑x², ∑y², ∑xy)
r = [n∑xy - ∑x∑y] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
n = 10
r = [10 × 38 503 - 683 × 583] / √{[10 × 48 679 - 683²][10 × 35 247 - 583²]}
= -13 159 / √(20 301 × 12 581)
= -13 159 / √255 406 881
= -13 159 / 15 981,45428
= -0,823 391 899
Therefore r = -0,823 (to 3 decimal places)
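The worked example can be reproduced with a short Python sketch that builds the five column totals and substitutes them into the formula. The function name `pearson_r` and the variable names are illustrative; x is taken to be the History mark and y the Mathematics mark.

```python
import math

def pearson_r(xs, ys):
    """r = [n*Sxy - Sx*Sy] / sqrt([n*Sx2 - Sx^2] * [n*Sy2 - Sy^2])."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

history = [80, 74, 56, 52, 78, 90, 73, 65, 40, 75]
maths = [40, 52, 75, 74, 50, 54, 59, 60, 71, 48]
r = pearson_r(history, maths)
print(round(r, 3))
```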
SPEARMAN'S RANK ORDER CORRELATION
CO-EFFICIENT
rho = 1 - 6∑d² / [n(n² - 1)]
This correlation co-efficient does not use the actual
scores of the variables. It uses the rank order of the
scores. The values of x and y are ranked
separately, both in ascending or both in descending order.
The differences (d) between corresponding ranks are
squared and finally added, leading to ∑d².
x     y     rx    ry    d = rx - ry    d²
50    52    2     2     0              0
60    60    3     3     0              0
75    58    5     5     0              0
42    80    1     1     0              0
92    47    6     6     0              0
61    95    4     4     0              0
∑x = 380    ∑y = 392                   ∑d² = 0
rho = 1 - (6 × 0) / [6(6² - 1)]
= 1 - 0 / (6 × 35)
= 1 - 0 / 210
= 1 - 0
= 1
AGE (X)    61  71  72  74  83  54  74  67  57  61
MASS (Y)   63  61  51  58  48  75  57  60  75  61
x     y     Rx     Ry     d = Rx - Ry    d²
61    63    3,5    8      -4,5           20,25
71    61    6      6,5    -0,5           0,25
72    51    7      2      5              25
74    58    8,5    4      4,5            20,25
83    48    10     1      9              81
54    75    1      9,5    -8,5           72,25
74    57    8,5    3      5,5            30,25
67    60    5      5      0              0
57    75    2      9,5    -7,5           56,25
61    61    3,5    6,5    -3             9
∑d² = 314,5
n = 10
When ranking, if there are tied (equal) values, you add
the positions they occupy and divide by the number of tied values. For example,
75 appears twice in the table above under (y), in positions
9 and 10, so each gets the rank (9 + 10) / 2 = 19 / 2 =
9,5
rho = 1 - 6∑d² / [n(n² - 1)]
= 1 - (6 × 314,5) / [10(10² - 1)]
= 1 - 1887 / [10(99)]
= 1 - 1887 / 990
= 1 - 1,906060
= -0,906060
= -0,91
There is a very strong negative correlation between
age and mass, that is, as someone gets older their mass
decreases.
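The ranking with ties and the substitution into the formula can be sketched in Python as follows, using the age and mass data above. The function names `average_ranks` and `spearman_rho` are illustrative. Ranks are assigned in ascending order, and tied values share the mean of the positions they occupy, as described in the notes.

```python
def average_ranks(values):
    """Rank in ascending order; tied values share the mean of their positions."""
    ordered = sorted(values)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1           # first position of v, counting from 1
        count = ordered.count(v)               # how many terms are tied at v
        ranks[v] = first + (count - 1) / 2     # mean of positions first .. first+count-1
    return [ranks[v] for v in values]

def spearman_rho(xs, ys):
    """Return (sum of d squared, rho) with rho = 1 - 6*Sd2 / [n(n^2 - 1)]."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

age = [61, 71, 72, 74, 83, 54, 74, 67, 57, 61]
mass = [63, 61, 51, 58, 48, 75, 57, 60, 75, 61]
sum_d2, rho = spearman_rho(age, mass)
print(sum_d2, round(rho, 2))
```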
SCALES OF MEASUREMENT
Nominal Scale
These are the most primitive scales primarily
Ratio scales
2. It is symmetrical
5. It begins with a low frequency which rises towards the middle and
evenly subsides towards the end.
6. The mean, mode and median are equal and they coincide on the line of
symmetry.
HYPOTHESIS TESTING
Alternative Hypothesis H₁ can be stated in 3 ways
which are (a) definite increase (one tailed test
1. State the null and alternative hypotheses, i.e. H₀ and H₁. For example, H₀:
there is no difference, association or relationship. A hypothesis is a prediction
about a population.
2. Decide on the test statistic to be used.
3. State the rejection criterion (decision rule). It is a statement which
specifies when the null hypothesis should be rejected.
4. Calculate the test statistic.
5. Make a statistical decision.
6. Make a conclusion.
ASSUMPTIONS OF THE T -TEST
1) The sample must be taken from the
population which follows the normal
distribution
2) The variance of the population (σ²) must be
unknown
3) The sample size must be small i.e. n < 30
T - test
To find the critical value we need the degrees of freedom and the level of
significance.
Degrees of freedom (df) = n - 1, where n is the
sample size, e.g. if n = 15, df = 15 - 1 = 14
e.g. a 2-tailed t-test for paired samples at the 5%
significance level when n = 20:
df = 20 - 1 = 19
critical value = 2,093, i.e. the value from the table
Reject H₀ if t calc < -2,093 or t calc > 2,093
(rejection criterion)
A 2-tailed t-test for paired samples at the 1%
significance level when n = 25: df = 25 - 1
= 24
T-test worked example
A Form 4 teacher wanted to find out if there is a
difference between the academic performance of
pupils in Maths and English. The 4A pupils were
given Maths and English tests and their scores were
as follows:
Pupil     A   B   C   D   E   F   G   H   I   J
Maths     60  58  75  36  50  61  85  70  77  63
English   72  65  70  40  50  64  90  72  70  60
Carry out a t-test at the 10% significance level to
determine if there is a difference between the
academic performance of 4A pupils in Maths and
English.
SOLUTION
H₀: there is no difference between the academic
performance of 4A pupils in Maths and English.
H₁: there is a difference in the performance of 4A pupils
in Maths and in English.
A 2-tailed t-test at the 10% significance level
n = 10
df = 10 - 1 = 9
Reject H₀ if │t calc│ > 1,833
t calc = √(n - 1) × ∑d / √[n∑d² - (∑d)²]
The values to be substituted in the formula
can be obtained using the following table.
x     y     d = x - y        d²
x₁    y₁    d₁ = x₁ - y₁     d₁²
x₂    y₂    d₂ = x₂ - y₂     d₂²
x₃    y₃    d₃ = x₃ - y₃     d₃²
xn    yn    dn = xn - yn     dn²
∑x    ∑y    ∑d               ∑d²
x     y     d = x - y    d²
60    72    -12          144
58    65    -7           49
75    70    5            25
36    40    -4           16
50    50    0            0
61    64    -3           9
85    90    -5           25
70    72    -2           4
77    70    7            49
63    60    3            9
635   653   -18          330   (totals: ∑x, ∑y, ∑d, ∑d²)
t calc = √(n - 1) × ∑d / √[n∑d² - (∑d)²]
= √(10 - 1) × (-18) / √[10 × 330 - (-18)²]
= √9 × (-18) / √(3300 - 324)
= -54 / √2976
= -54 / 54,55272679
= -0,989868026
= -0,99
Since │t calc│ = 0,99 < 1,833, we do not reject H₀ and
conclude that there is no significant difference between the
academic performance in Maths and English.
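The paired t-test above can be sketched in Python using the formula from the notes. The function name `paired_t` is illustrative, and the critical value 1.833 is simply the table value quoted in the notes for df = 9 at the 10% level (two-tailed), not something the code looks up.

```python
import math

def paired_t(xs, ys):
    """t = sqrt(n - 1) * Sd / sqrt(n * Sd2 - Sd^2), where d = x - y."""
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return math.sqrt(n - 1) * sum_d / math.sqrt(n * sum_d2 - sum_d ** 2)

maths = [60, 58, 75, 36, 50, 61, 85, 70, 77, 63]
english = [72, 65, 70, 40, 50, 64, 90, 72, 70, 60]

t_calc = paired_t(maths, english)
critical_value = 1.833                      # table value quoted in the notes
reject_h0 = abs(t_calc) > critical_value    # decision rule: reject H0 if |t calc| > 1,833
print(round(t_calc, 2), reject_h0)
```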
CHI – SQUARE ASSUMPTIONS
1) Observations must be independent
2) The categories must be mutually exclusive,
i.e. each observation must appear in one and
only one of the categories in the table
3) The observations must be measured as
frequencies.
Chi-square (χ²) test
This is the version of hypothesis testing which
focuses on Observed (O) and Expected (E)
frequencies:
O → f₀ (observed frequency), E → fₑ (expected frequency)
The critical value is obtained using the degrees of
freedom and the significance level.
The chi-square curve is not symmetrical. It starts
from 0.
CHI-SQUARE (test for
independence)
This is a version of the chi-square (χ²) test in which
we explore the association between variables which
are represented on a contingency table. A
contingency table is a table which shows the
association between 2 variables. It has rows and
columns. One attribute is expressed in rows and the other in columns, e.g.
gender versus academic performance, socio-economic
status vs academic performance, or highest
professional qualifications vs attitudes to learning.
Chi-square: a worked example
GENDER   POOR   GOOD   EXCELLENT   TOTAL
F        15     20     25          60
M        10     35     40          85
TOTAL    25     55     65          145 (grand total)
O     E     O - E      (O - E)²/E
O₁    E₁    O₁ - E₁    (O₁ - E₁)²/E₁
O₂    E₂    O₂ - E₂    (O₂ - E₂)²/E₂
O₃    E₃    O₃ - E₃    (O₃ - E₃)²/E₃
On    En    On - En    (On - En)²/En
∑O    ∑E               ∑(O - E)²/E
Chi –square continues worked example
STEP 1
H₀: There is no association between the teachers' highest
qualifications and their attitudes towards teaching in rural areas.
E (for O = 15) = (81 × 66) / 200 = 26,73
E (for O = 20) = (81 × 66) / 200 = 26,73
E (for O = 46) = (81 × 68) / 200 = 27,54
STEP 5
O     E        O - E     (O - E)²/E
30    21,12    8,88      3,734
24    21,12    2,88      0,393
10    21,76    -11,76    6,356
21    18,15    2,85      0,448
22    18,15    3,85      0,817
12    18,7     -6,7      2,401
15    26,73    -11,73    5,148
20    26,73    -6,73     1,694
46    27,54    18,46     12,374
∑                        33,365
∴ χ² calc = 33,365
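The calculation in Step 5 can be sketched in Python. Each expected frequency is taken as (row total × column total) / grand total, which matches the expected values quoted in the notes, e.g. (81 × 66) / 200. The arrangement of the nine observed frequencies into three rows of three is an assumption inferred from those expected values, since the original contingency table is not shown. The function name `chi_square` is illustrative.

```python
def chi_square(observed):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            statistic += (o - e) ** 2 / e
    return statistic

# Assumed layout of the observed frequencies (rows 64, 55, 81; columns 66, 66, 68)
observed = [
    [30, 24, 10],
    [21, 22, 12],
    [15, 20, 46],
]
print(round(chi_square(observed), 2))
```

Without rounding each term to three decimal places, the total comes to about 33,36, which differs from 33,365 only in the third decimal place.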
STEP 6