EFM 515 Stats Lecture Notes

IMPORTANCE OF STATISTICS IN EDUCATIONAL RESEARCH
1. It permits the most exact kind of description, i.e. it is our descriptive language, e.g. 10% of the students passed the exam.
2. Statistics forces us to be definite and exact in our thinking, and it allows us to summarise our results in a short, meaningful and convenient form. Hence, it saves us from giving long narrative descriptions of phenomena.
3. It enables us to analyse relationships between variables, e.g. the average pass rate for the A class is 60% while that of the B class is 40%.
4. It enables us to predict how much of a variable will occur under certain conditions.
MEASURES OF CENTRAL TENDENCY
They indicate the value around which the terms in a distribution cluster, i.e. the middle or centre of the distribution.
A distribution is a set of data / numbers.
Frequency is the number of times a value occurs in the distribution.
There are three measures of central tendency, namely the mode, the median and the mean.
1. MODE
- It is the value with the highest frequency, i.e. it appears more often than any other value.
A distribution can have no mode, one mode or more than one mode.
a) If a distribution has one mode, we say it is unimodal.
b) If it has two modes, we say it is bimodal.
c) If it has three modes, it is called a trimodal distribution.
d) A multimodal distribution has many modes.
Advantages of Using the Mode in Educational Research
There is no need to arrange the numbers in order of size. Unlike the range, it is not affected by extreme values (outliers).
DISADVANTAGES
It does not use all the terms in the distribution. It only focuses on those which appear more frequently than others. Hence, it is not normally used for further statistical calculations.
E.g. find the mode of each of the following distributions
a) 82, 69, 81, 82, 74, 81
b) 1000, 101, 500, 60
c) 0, 5, 0, 5, 0, 5, 4, 5
SOLUTIONS
a) The modes are 81 and 82, hence the distribution is bimodal.
b) No mode (every value appears only once).
c) The mode is 5 (it appears four times), hence the distribution is unimodal.
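The mode rule can be checked with a short Python sketch. The function name `modes` is an illustrative choice, and the sketch assumes the convention used in these notes that a distribution in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or [] when no value repeats."""
    counts = Counter(values)          # frequency of each value
    highest = max(counts.values())    # the highest frequency
    if highest == 1:                  # every value appears once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([82, 69, 81, 82, 74, 81]))   # [81, 82]  (bimodal)
print(modes([1000, 101, 500, 60]))       # []        (no mode)
print(modes([0, 5, 0, 5, 0, 5, 4, 5]))   # [5]       (unimodal)
```

The length of the returned list tells us whether the distribution is unimodal, bimodal, trimodal and so on.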
2. MEDIAN
It is the term which occupies the middle position when the terms / numbers are arranged in order of size.
The terms can be arranged in ascending or descending order.
When the number of terms in the distribution is odd, the position of the median is found using the formula:
Median = ½ (n + 1)th term
where n = number of terms in the distribution, e.g. if n = 15
Median = ½ (15 + 1)th term
= 8th term
If the number of terms in the distribution is even, the median is half the sum of the two middle terms, e.g. if n = 20
Median = (10th term + 11th term) / 2
E.g. find the median of each of the following distributions
a) 7, 1, 4, 9, 7, 8, 6, 5, 6, 3
b) 77, 29, 36, 24, 82, 100, 105, 19, 60, 50
c) 99, 81, 74, 65, 50, 28, 3
SOLUTION
(a) Rank the numbers, i.e. arrange them in ascending order
1 3 4 5 6 6 7 7 8 9    n = 10
Median = ½ (5th term + 6th term)
= (6 + 6) / 2
Therefore the median = 6
(b) Ranking the numbers
19 24 29 36 50 60 77 82 100 105    n = 10
Median = ½ (5th term + 6th term)
= (50 + 60) / 2
= 110 / 2
Therefore the median = 55
(c) Ranking or arranging in ascending order
3 28 50 65 74 81 99    n = 7
Median = ½ (7 + 1)th term = 8 / 2 = 4th term
Median = the fourth number, which is 65
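The two median rules (odd n and even n) can be sketched in Python as follows. The function name `median` is illustrative only; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle term of the ranked values; mean of the two middle terms if n is even."""
    ranked = sorted(values)           # arrange in ascending order
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]         # the (n + 1) / 2-th term
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median([7, 1, 4, 9, 7, 8, 6, 5, 6, 3]))                 # 6.0
print(median([77, 29, 36, 24, 82, 100, 105, 19, 60, 50]))     # 55.0
print(median([99, 81, 74, 65, 50, 28, 3]))                    # 65
```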
DISADVANTAGES OF THE MEDIAN
- It does not use all the terms in the distribution
- It is not always used for further statistical calculations
- It can be different from all the terms in the distribution (when the number of terms is even)
3. MEAN
It is also called the arithmetic mean or average.
- It is obtained by adding all the terms in the distribution and dividing the sum by the number of terms in the distribution.
Mean = sum of terms / n
It is represented by x̅ (x bar).
∑ is the summation sign. It means the terms are being added:
∑x = x₁ + x₂ + x₃ + ... + xₙ
x̅ = ∑x / n
where n = number of terms in the distribution
ADVANTAGES OF MEAN
- It uses all the terms in the distribution
- It is used for further statistical calculations, e.g. in finding the variance and standard deviation.
- There is no need to rank the numbers.
DISADVANTAGES
- It is affected by outliers
- It can be different from all the terms in the distribution
Find the mean of each of the following
a) 60, 74, 88, 36, 54, 81, 93, 96, 50, 68
b) 1000, 4000, 3600, 7200, 1112
a) x̅ = ∑x / n
= (60 + 74 + 88 + 36 + 54 + 81 + 93 + 96 + 50 + 68) / 10
= 700 / 10
= 70
b) x̅ = (1000 + 4000 + 3600 + 7200 + 1112) / 5
= 16912 / 5
x̅ = 3382,4
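The mean formula translates directly into Python. This is a minimal sketch; the function name `mean` is illustrative (the standard library also provides `statistics.mean`), and Python writes decimals with a point rather than a comma.

```python
def mean(values):
    """Arithmetic mean: the sum of the terms divided by the number of terms."""
    return sum(values) / len(values)


print(mean([60, 74, 88, 36, 54, 81, 93, 96, 50, 68]))   # 70.0
print(mean([1000, 4000, 3600, 7200, 1112]))             # 3382.4
```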
MEASURES OF DISPERSION / VARIABILITY / SCATTER / SPREAD
They indicate how much the terms in the distribution are spread or scattered, usually relative to the mean / average.
Measures of dispersion include the range, the interquartile range, the variance and the standard deviation.
1. RANGE
It is the difference between the greatest and the smallest values in the distribution. There are two types of range, namely the ordinary range and the inclusive range.
a) Ordinary range
It is the difference between the greatest and the smallest value in the distribution.
Ordinary range = x max – x min
= Greatest value – Smallest value
b) Inclusive range
Inclusive range = (x max – x min) + 1
= (Greatest value – Smallest value) + 1
E.g. find the range (ordinary and inclusive) of the following distribution
85, 36, 14, 91, 99, 64, 50, 28, 12, 9, 3
Ordinary range = x max – x min
= 99 – 3
= 96
Inclusive range = (x max – x min) + 1
= (99 – 3) + 1
= 96 + 1
= 97
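Both kinds of range can be sketched in a few lines of Python. The function names `ordinary_range` and `inclusive_range` are illustrative choices, not standard library functions.

```python
def ordinary_range(values):
    """Greatest value minus smallest value."""
    return max(values) - min(values)


def inclusive_range(values):
    """Ordinary range plus one."""
    return ordinary_range(values) + 1


data = [85, 36, 14, 91, 99, 64, 50, 28, 12, 9, 3]
print(ordinary_range(data))    # 96
print(inclusive_range(data))   # 97
```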
Advantage of range
- Easy to calculate
Disadvantages
- It uses only the two extreme values and ignores all the other terms
- It is very misleading if the distribution has outliers
INTERQUARTILE RANGE
The interquartile range (IQR) is another range used as a measure of spread. It describes the central portion of the distribution, away from the extremes.
It is the difference between the upper (third) quartile Q3, below which 75% of the observations lie, and the lower (first) quartile Q1, below which 25% of the observations lie.
Interquartile range = Q3 – Q1
The interquartile range spans the middle 50% of a data set and eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed. The other 50% of the data is therefore discarded.
Steps to calculate the IQR
- Arrange the data in ascending order
- Find the position of Q1 using the formula: position of Q1 = (n+1)/4
- Find the position of Q3 using the formula: position of Q3 = ¾(n+1)
- When a position falls between two numbers, take the average of those two numbers
- Then calculate the IQR using the formula IQR = Q3 – Q1
Example 1
Find the IQR of the set 1, 2, 3, 4, 5, 6, 7
Position of Q1 = (n+1)/4 = (7+1)/4 = 2, so Q1 = 2nd term = 2
Position of Q3 = ¾(n+1) = ¾(7+1) = 6, so Q3 = 6th term = 6
IQR = Q3 – Q1
= 6 – 2 = 4
Example 2
Find the IQR for these marks: 30, 25, 80, 41, 40, 56, 65, 77
Arrange: 25, 30, 40, 41, 56, 65, 77, 80
Position of Q1 = (8+1)/4 = 2,25, which lies between the 2nd and 3rd terms, therefore take the average of 30 and 40 and get Q1 = 35
Position of Q3 = ¾(8+1) = 6,75, which lies between the 6th and 7th terms, therefore take the average of 65 and 77 and get Q3 = 71
So IQR = 71 – 35 = 36
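The steps above can be sketched in Python as shown below. The names `quartile` and `iqr` are illustrative. The sketch follows the rule used in these notes (average the two neighbouring terms when the position is fractional) and assumes at least three values; other textbooks and statistical software interpolate differently and may give slightly different quartiles.

```python
import math


def quartile(ranked, position):
    """Value at a 1-based position; average the two neighbours if it is fractional."""
    lower = math.floor(position)
    if position == lower:
        return ranked[lower - 1]
    return (ranked[lower - 1] + ranked[lower]) / 2


def iqr(values):
    """Return (Q1, Q3, IQR) using positions (n+1)/4 and 3(n+1)/4."""
    ranked = sorted(values)
    n = len(ranked)
    q1 = quartile(ranked, (n + 1) / 4)
    q3 = quartile(ranked, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


print(iqr([1, 2, 3, 4, 5, 6, 7]))                  # (2, 6, 4)
print(iqr([30, 25, 80, 41, 40, 56, 65, 77]))       # (35.0, 71.0, 36.0)
```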
Advantages
1. It can be used as a measure of variability even if the extreme values are not recorded correctly
2. It is not affected by extreme values
Disadvantages
1. It is not as easy to calculate as the range
2. It is a positional measure, based only on the twenty-fifth and seventy-fifth percentiles
b. VARIANCE
Population variance is denoted by σ².
Sample variance is denoted by s². The variance is the mean of the squared deviations from the mean.
How to find the variance
1. Find the mean of the distribution
2. Subtract the mean from each term in the distribution to get the deviations, i.e.
(x₁ – x̅), (x₂ – x̅), ..., (xₙ – x̅)
3. Square each deviation to get (x₁ – x̅)², (x₂ – x̅)², ..., (xₙ – x̅)²
4. Add the squared deviations
(x₁ – x̅)² + (x₂ – x̅)² + ... + (xₙ – x̅)² = ∑(x – x̅)²
which means the sum of the squared deviations
5. Divide the sum by n – 1
Therefore the sample variance is s² = ∑(x – x̅)² / (n – 1)

c. STANDARD DEVIATION
It is the square root of the variance.
Sample SD is represented by s
Population SD is represented by σ
Standard Deviation (SD) = √[ ∑(x – x̅)² / (n – 1) ]
The variance and standard deviation can be calculated using the following table
Score (x)   Mean (x̅)   Deviation (x – x̅)   Squared deviation (x – x̅)²
x₁          x̅          x₁ – x̅              (x₁ – x̅)²
...         ...        ...                 ...
xₙ          x̅          xₙ – x̅              (xₙ – x̅)²
∑x                     0                   ∑(x – x̅)² (sum of squared deviations)
CALCULATION OF VARIANCE
x    x̅    x – x̅   (x – x̅)²
60   71   -11     121
83   71   12      144
71   71   0       0
63   71   -8      64
89   71   18      324
90   71   19      361
40   71   -31     961
72   71   1       1
∑(x – x̅)² = 1976
Variance (s²) = ∑(x – x̅)² / (n – 1)
= 1976 / (8 – 1)
= 1976 / 7
= 282,2857143
= 282,29 (2 decimal places)
Calculation of standard deviation
SD = √[ ∑(x – x̅)² / (n – 1) ]
= √(1976 / 7)
= √282,2857143
= 16,80136049
= 16,80 (2 d.p.)
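The five steps for the variance, followed by the square root for the standard deviation, can be sketched in Python. The function name `sample_variance` is illustrative; the standard library functions `statistics.variance` and `statistics.stdev` use the same n – 1 divisor.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                                  # step 1
    squared_deviations = [(x - mean) ** 2 for x in values]  # steps 2 and 3
    return sum(squared_deviations) / (n - 1)                # steps 4 and 5


scores = [60, 83, 71, 63, 89, 90, 40, 72]
variance = sample_variance(scores)
sd = math.sqrt(variance)
print(round(variance, 2))   # 282.29
print(round(sd, 2))         # 16.8
```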
INTERPRETATION OF VARIANCE AND STANDARD DEVIATION
A large value of the variance / standard deviation shows that the values are widely scattered relative to the mean, i.e. the greater the variance / standard deviation, the more widely spaced the terms are above and below the mean. The smaller the variance / standard deviation, the more closely packed the values are around the mean.
d. MEASURES OF RELATIVE STANDING
They are used to indicate how an individual compares to other individuals and to determine his or her relative position. They are concerned with how a particular score stands in relation to other scores.
Z-SCORES
A z-score is the number of standard deviations a score lies away from the mean. It helps us to compare meaningfully scores obtained in tests using different scales.
z = (x – x̅) / s
where x is the score, x̅ is the mean and s is the standard deviation.
The greater the z-score, the better the performance relative to the group.
E.g. the table below shows the marks of pupils A and B, together with the class mean (x̅) and standard deviation (s) for each subject
Subject   A    B    x̅    s
Maths     77   74   76   2,15
English   79   76   78   3,24
a) Pupil A
Maths z-score = (x – x̅) / s
= (77 – 76) / 2,15
= 1 / 2,15
= 0,465116279
= 0,465
English z-score = (x – x̅) / s
= (79 – 78) / 3,24
= 1 / 3,24
= 0,308641975
= 0,309
Pupil A performed better in Maths since the z-score in Maths is higher than in English.
b) Pupil B
Maths z-score = (x – x̅) / s
= (74 – 76) / 2,15
= -2 / 2,15
= -0,930232558
Therefore to two decimal places = -0,93
English z-score = (x – x̅) / s
= (76 – 78) / 3,24
= -2 / 3,24
= -0,61728
Therefore to two decimal places = -0,62
Pupil B performed better in English than in Maths since -0,62 is greater than -0,93.
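The z-score comparisons above can be reproduced with a small Python sketch. The function name `z_score` and the dictionary layout are illustrative assumptions.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies from the mean."""
    return (score - mean) / sd


# subject: (mean, standard deviation), taken from the table in the notes
subjects = {"Maths": (76, 2.15), "English": (78, 3.24)}
marks = {"A": {"Maths": 77, "English": 79}, "B": {"Maths": 74, "English": 76}}

for pupil, results in marks.items():
    for subject, mark in results.items():
        mean, sd = subjects[subject]
        print(pupil, subject, round(z_score(mark, mean, sd), 3))
```

For each pupil, the subject with the higher z-score is the one in which the pupil did better relative to the class.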
e. MEASURES OF ASSOCIATION
They focus on the relationship between variables.
Correlation is the degree of association between two or more variables or factors.
There are two types of variables, the independent variable (x) and the dependent variable (y).
The independent variable is the variable which is manipulated by the researcher during an experiment.
The dependent variable is the factor which is influenced by the manipulation of the independent variable.
INDEPENDENT VARIABLE - DEPENDENT VARIABLE
1. Amount of fertilizer applied - Yields obtained
2. Number of hours spent studying - Exam mark
3. Distance from the CBD - Rentals
Correlation can be positive or negative.
POSITIVE CORRELATION
The two variables increase or decrease together. An increase in one variable is matched by a corresponding increase in the other.
NEGATIVE CORRELATION
An increase in one variable is matched by a corresponding decrease in the other.
Scatter Diagram / Scatter Plot / Scattergram / Scatter Graph
It is a diagram showing the corresponding values of the independent and dependent variables as co-ordinates. It is the simplest way of determining correlation between two variables on a graph. The independent variable is on the horizontal axis and the dependent variable on the vertical axis. An accurate scale should be used on both axes, and the axes should be labelled.
[Figure: scatter diagram showing no correlation]
[Figure: scatter diagram showing negative correlation]
[Figure: scatter diagram showing perfect positive correlation]
[Figure: scatter diagram showing perfect negative correlation]
CORRELATION CO-EFFICIENT
It is the number which shows the size and direction of the association between variables.
r normally represents a correlation co-efficient.
The maximum value of r is +1
The minimum value of r is -1
This means r lies between -1 and +1
+1 perfect positive correlation
0,8 to 0,99 very strong positive correlation
0,6 to 0,79 strong positive correlation
0,4 to 0,59 moderate positive correlation
0,2 to 0,39 weak positive correlation
0,01 to 0,19 very weak positive correlation
-1 perfect negative correlation
-0,8 to -0,99 very strong negative correlation
-0,6 to -0,79 strong negative correlation
-0,4 to -0,59 moderate negative correlation
-0,2 to -0,39 weak negative correlation
-0,01 to -0,19 very weak negative correlation
The two popular correlation co-efficients are Pearson’s product-moment correlation co-efficient (r) and Spearman’s rank order correlation co-efficient (rho).
PEARSON’S PRODUCT-MOMENT CORRELATION CO-EFFICIENT (r)
Its main strength is that it uses the actual values of the variables.
It is calculated using the following formula
r = [n∑xy – ∑x∑y] / √{[n∑x² – (∑x)²] [n∑y² – (∑y)²]}
The following table can be used to obtain the values which are to be substituted in the formula
x     y     x²     y²     xy
x₁    y₁    x₁²    y₁²    x₁y₁
x₂    y₂    x₂²    y₂²    x₂y₂
x₃    y₃    x₃²    y₃²    x₃y₃
x₄    y₄    x₄²    y₄²    x₄y₄
∑x    ∑y    ∑x²    ∑y²    ∑xy
PEARSON’S PRODUCT-MOMENT WORKED EXAMPLE
Ten Form 4 pupils at a certain school wrote two tests, one in History and the other in Mathematics, and the results are as follows
Pupil     A   B   C   D   E   F   G   H   I   J
History   80  74  56  52  78  90  73  65  40  75
Maths     40  52  75  74  50  54  59  60  71  48
Let x = History mark and y = Maths mark
x    y    x²     y²     xy
80   40   6400   1600   3200
74   52   5476   2704   3848
56   75   3136   5625   4200
52   74   2704   5476   3848
78   50   6084   2500   3900
90   54   8100   2916   4860
73   59   5329   3481   4307
65   60   4225   3600   3900
40   71   1600   5041   2840
75   48   5625   2304   3600
∑x = 683   ∑y = 583   ∑x² = 48 679   ∑y² = 35 247   ∑xy = 38 503
r = [n∑xy – ∑x∑y] / √{[n∑x² – (∑x)²] [n∑y² – (∑y)²]}
n = 10
= [10 × 38 503 – 683 × 583] / √{[10 × 48 679 – (683)²] [10 × 35 247 – (583)²]}
= [385 030 – 398 189] / √{[486 790 – 466 489] [352 470 – 339 889]}
= -13 159 / √(20 301 × 12 581)
= -13 159 / √255 406 881
= -13 159 / 15 981,45428
= -0,823 391 899
Therefore r = -0,823 to 3 decimal places
There is a very strong negative correlation between History marks and Mathematics marks.
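The Pearson formula and the column sums from the table can be checked with a short Python sketch. The function name `pearson_r` is an illustrative choice; it computes the same five sums as the table and substitutes them into the formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw-score formula used in the notes."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


history = [80, 74, 56, 52, 78, 90, 73, 65, 40, 75]
maths = [40, 52, 75, 74, 50, 54, 59, 60, 71, 48]
r = pearson_r(history, maths)
print(round(r, 3))   # -0.823
```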
SPEARMAN’S RANK ORDER CORRELATION CO-EFFICIENT (rho)
rho = 1 – 6∑d² / [n(n² – 1)]
This correlation co-efficient does not use the actual scores of the variables. It uses the rank order of the scores. The values of x and y are ranked separately, both in ascending order or both in descending order. For each pair, the difference between the corresponding ranks (d) is calculated and squared, and the squared differences are finally added, leading to ∑d².
The following table can be used
x   y   Rank x (rx)   Rank y (ry)   d = rx – ry   d²
SPEARMAN’S RANK ORDER EXAMPLE 1
Maths (x)     50  60  75  42  92  61
Physics (y)   52  58  80  47  95  60
x    y    rx   ry   d = rx – ry   d²
50   52   2    2    0             0
60   58   3    3    0             0
75   80   5    5    0             0
42   47   1    1    0             0
92   95   6    6    0             0
61   60   4    4    0             0
∑x = 380   ∑y = 392   ∑d² = 0
rho = 1 – 6∑d² / [n(n² – 1)]
= 1 – (6 × 0) / [6 (6² – 1)]
= 1 – 0 / (6 × 35)
= 1 – 0 / 210
= 1 – 0
rho = 1. There is a perfect positive correlation between Maths and Physics marks.
SPEARMAN’S RANK ORDER EXAMPLE 2
Age (x)    61  71  72  74  83  54  74  67  57  61
Mass (y)   63  61  51  58  48  75  57  60  75  61
x    y    rx    ry    d = rx – ry   d²
61   63   3,5   8     -4,5          20,25
71   61   6     6,5   -0,5          0,25
72   51   7     2     5             25
74   58   8,5   4     4,5           20,25
83   48   10    1     9             81
54   75   1     9,5   -8,5          72,25
74   57   8,5   3     5,5           30,25
67   60   5     5     0             0
57   75   2     9,5   -7,5          56,25
61   61   3,5   6,5   -3            9
∑d² = 314,5
n = 10
When ranking, if a value occurs more than once, add the positions the tied values occupy and divide by the number of tied values. For example, 75 in the above table under y occupies positions 9 and 10, so its rank becomes (9 + 10) / 2 = 19 / 2 = 9,5
rho = 1 – 6∑d² / [n(n² – 1)]
= 1 – (6 × 314,5) / [10 (10² – 1)]
= 1 – 1887 / (10 × 99)
= 1 – 1887 / 990
= 1 – 1,906060
= -0,906060
= -0,91
There is a very strong negative correlation between age and mass, that is, as someone gets older the mass decreases.
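Both Spearman examples can be reproduced with the Python sketch below. The names `average_ranks` and `spearman_rho` are illustrative. Tied values share the average of the positions they occupy, as described in the notes. Note that the formula 1 – 6∑d² / [n(n² – 1)] is only approximate when there are ties, so statistical software that corrects for ties may report a slightly different value.

```python
def average_ranks(values):
    """Rank in ascending order; tied values share the mean of their positions."""
    ranked = sorted(values)
    ranks = {}
    for value in set(values):
        first = ranked.index(value) + 1      # first position the value occupies
        count = ranked.count(value)          # number of tied values
        ranks[value] = first + (count - 1) / 2
    return [ranks[v] for v in values]


def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank differences."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


maths = [50, 60, 75, 42, 92, 61]
physics = [52, 58, 80, 47, 95, 60]
print(spearman_rho(maths, physics))          # 1.0

age = [61, 71, 72, 74, 83, 54, 74, 67, 57, 61]
mass = [63, 61, 51, 58, 48, 75, 57, 60, 75, 61]
print(round(spearman_rho(age, mass), 2))     # -0.91
```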
COMPARING PEARSON’S AND SPEARMAN’S CO-EFFICIENTS
Pearson’s product-moment co-efficient is regarded as better than Spearman’s rank order co-efficient because it uses the actual values of the variables, whereas Spearman’s uses only their rank orders. However, the co-efficients calculated for the same data are usually not very different, and in most cases they lead to the same conclusion.
SCALES OF MEASUREMENT
Measurement refers to the assignment of a numerical value to an entity.
There are four scales of measurement: nominal, ordinal, interval and ratio.
NOMINAL SCALE
This is the most primitive scale, primarily used for labelling or naming. Each group is assigned a number or a name as a distinguishing label or classification, for example the numbers given to soccer players. It does not mean that number 7 plays better than number 3 or vice versa. Another example is gender / sex, where the labels male and female are used.
ORDINAL SCALE
According to Latif and Maunganidze (2004), an ordinal scale is the simplest true scale, which orders or ranks people, objects or events along a particular continuum. There is qualitative classification, and one can actually say that one class is better than another relative to a particular variable.
The scale determines the relative position of an object or an individual with respect to others, for example in terms of academic performance, religiosity, maturity etc.
It satisfies the transitivity rule, which states that if A is greater than (>) B and B is greater than (>) C, then A is greater than (>) C.
INTERVAL SCALE
There is equality of units, i.e. the same numerical distance is associated with the same empirical distance on the real continuum, e.g. the difference between 35 °C and 25 °C has the same value as the difference between 80 °C and 70 °C. The intervals emanate from an arbitrary origin, i.e. there is no true zero point. Zero does not mean the total absence of what is being measured, e.g. an IQ of 0 does not imply zero intelligence.
RATIO SCALE
It is the most informative scale of measurement because it has a true zero point. According to Latif and Maunganidze (2004), a true zero point is a point corresponding to the absence of the thing being measured, for example time, mass and volume, where zero seconds means no time and zero kg means no mass. Apart from possessing all the properties of the three preceding scales, ratio scales have true ratios, for example we can safely say 50 kg is ¼ of 200 kg.
NORMAL DISTRIBUTION CURVE
It is one of the most important distributions in statistics because it mirrors / reflects the distribution of many real-life measurements such as mass, height, weight etc.
Characteristics of a normal distribution curve
1. It resembles a cross section of a bell
2. It is symmetrical
3. It is asymptotic to the horizontal axis, i.e. it approaches but never touches the horizontal axis
4. The total area under the curve is one (1)
5. It begins with low frequencies which rise towards the middle and evenly subside towards the end
6. The mean, mode and median are equal and they coincide on the line of symmetry
HYPOTHESIS TESTING
The null hypothesis (H₀) states that there is no difference between the characteristics of two samples.
The alternative hypothesis (H₁) claims that there is a difference between the characteristics of the two samples.
The alternative hypothesis H₁ can be stated in three ways, which are
(a) definite increase (one-tailed test)
(b) definite decrease (one-tailed test)
(c) any change (two-tailed test)
The critical value is the value taken from the statistical table.
STATEMENT OF HYPOTHESIS
Examples of null hypotheses (H₀)
1) H₀: There is no difference between the academic performance of Grade 7A pupils in Maths and English
2) H₀: There is no difference between the performance of boys and girls in Maths
Examples of alternative hypotheses (H₁)
1) H₁: There is a difference between the academic performance of Grade 7A pupils in Maths and English
2) H₁: There is a difference between the academic performance of boys and girls in Maths
HYPOTHESIS TESTING PROCEDURE
1. State the null and alternative hypotheses, i.e. H₀ and H₁. The null hypothesis states, for example, that there is no difference, association or relationship. A hypothesis is a prediction about a population.
2. Decide on the test statistic to be used.
3. State the rejection criterion (decision rule). It is a statement which specifies when the null hypothesis should be rejected.
4. Calculate the test statistic.
5. Make a statistical decision.
6. Make a conclusion.
ASSUMPTIONS OF THE T-TEST
1) The sample must be taken from a population which follows the normal distribution
2) The variance of the population (σ²) must be unknown
3) The sample size must be small, i.e. n < 30
T-TEST
To find the critical value we need the degrees of freedom and the level of significance.
Degrees of freedom (d f) = n – 1, where n is the sample size, e.g. if n = 15, d f = 15 – 1 = 14
e.g. A two-tailed t-test for paired samples at 5% significance level when n = 20
d f = 20 – 1 = 19
critical value = 2,093, i.e. the value from the table
Reject H₀ if t calc < -2,093 or t calc > 2,093 (rejection criterion)
e.g. A two-tailed t-test for paired samples at 1% significance level when n = 25
d f = 25 – 1 = 24
critical value = 2,797
Reject H₀ if t calc < -2,797 or t calc > 2,797
T-TEST WORKED EXAMPLE
A Form 4 teacher wanted to find out if there is a difference between the academic performance of pupils in Maths and English. The 4A pupils were given a Maths test and an English test, and their scores were as follows
Pupil     A   B   C   D   E   F   G   H   I   J
Maths     60  58  75  36  50  61  85  70  77  63
English   72  65  70  40  50  64  90  72  70  60
Carry out a t-test at 10% significance level to determine if there is a difference between the academic performance of 4A pupils in Maths and English.
SOLUTION
H₀: There is no difference between the academic performance of 4A pupils in Maths and English
H₁: There is a difference between the academic performance of 4A pupils in Maths and English
A two-tailed t-test for paired samples at 10% significance level
n = 10
d f = 10 – 1 = 9
critical value = 1,833
Reject H₀ if │t calc│ > 1,833
t calc = √(n – 1) × ∑d / √[n∑d² – (∑d)²]
The values to be substituted in the formula can be obtained using the following table
x    y    d = x – y      d²
x₁   y₁   d₁ = x₁ – y₁   d₁²
x₂   y₂   d₂ = x₂ – y₂   d₂²
x₃   y₃   d₃ = x₃ – y₃   d₃²
xₙ   yₙ   dₙ = xₙ – yₙ   dₙ²
∑x   ∑y   ∑d             ∑d²
Let x = Maths mark and y = English mark
x    y    d = x – y   d²
60   72   -12         144
58   65   -7          49
75   70   5           25
36   40   -4          16
50   50   0           0
61   64   -3          9
85   90   -5          25
70   72   -2          4
77   70   7           49
63   60   3           9
∑d = -18   ∑d² = 330
To find ∑d, add the positive differences first and then subtract the negatives: 15 – 33 = -18
t calc = √(n – 1) × ∑d / √[n∑d² – (∑d)²]
= √(10 – 1) × (-18) / √[10 × 330 – (-18)²]
= √9 × (-18) / √(3300 – 324)
= -54 / √2976
= -54 / 54,5527679
= -0,989868026
= -0,99
Since │t calc│ = 0,99 < 1,833, do not reject H₀ and conclude that there is no significant difference between the academic performance in Maths and English.
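The paired t-test working can be sketched in Python. The function name `paired_t` is illustrative, and the critical value 1,833 is simply copied from the notes (written as 1.833) rather than computed.

```python
import math


def paired_t(xs, ys):
    """t = sqrt(n - 1) * sum(d) / sqrt(n * sum(d^2) - (sum(d))^2), with d = x - y."""
    n = len(xs)
    d = [x - y for x, y in zip(xs, ys)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return math.sqrt(n - 1) * sum_d / math.sqrt(n * sum_d2 - sum_d ** 2)


maths = [60, 58, 75, 36, 50, 61, 85, 70, 77, 63]
english = [72, 65, 70, 40, 50, 64, 90, 72, 70, 60]
CRITICAL_VALUE = 1.833      # two-tailed, 10% level, d f = 9 (from the table)

t_calc = paired_t(maths, english)
print(round(t_calc, 2))                                              # -0.99
print("reject H0" if abs(t_calc) > CRITICAL_VALUE else "do not reject H0")
```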
CHI-SQUARE ASSUMPTIONS
1) Observations must be independent
2) The categories must be mutually exclusive, i.e. each observation must appear in one and only one of the categories in the table
3) The observations must be measured as frequencies
CHI-SQUARE (χ²) TEST
This is the version of hypothesis testing which focuses on observed (O) and expected (E) frequencies. The observed frequency is also written fₒ and the expected frequency fₑ.
The critical value is obtained using the degrees of freedom and the significance level.
The chi-square curve is not symmetrical. It starts from 0.
CHI-SQUARE TEST FOR INDEPENDENCE
This is a version of the chi-square (χ²) test in which we explore the association between variables which are represented on a contingency table. A contingency table is a table which shows the association between two variables. It has rows and columns. One attribute is expressed in the rows and the other in the columns, e.g. gender versus academic performance, socio-economic status versus academic performance, or highest professional qualifications versus attitudes to learning.
Chi-square: a worked example
GENDER   POOR   GOOD   EXCELLENT   TOTAL
F        15     20     25          60
M        10     35     40          85
TOTAL    25     55     65          145 (grand total)
Rows × columns = 2 × 3 table
The total row and the total column are not counted when working out the size of the table, which is used to calculate the degrees of freedom.
60 and 85 (row totals)
25, 55 and 65 (column totals)
The numbers appearing in the original contingency table are the observed frequencies.
Degrees of freedom are obtained using the number of rows and columns, i.e. d f = (r – 1)(c – 1), where r = number of rows and c = number of columns
e.g. For a 4 × 5 table d f = (4 – 1)(5 – 1)
= (3)(4)
= 12
In the above table it is (2 – 1)(3 – 1)
= 1 × 2
= 2
Expected frequencies are obtained using the following formula
Expected frequency = (row total × column total) / grand total
e.g. the expected frequency for the observed frequency 15 is
E15 = (60 × 25) / 145 = 10,34
and for the observed frequency 40 it is
E40 = (85 × 65) / 145 = 38,10
The test statistic is obtained using the following table
O    E    O – E     (O – E)² / E
O₁   E₁   O₁ – E₁   (O₁ – E₁)² / E₁
O₂   E₂   O₂ – E₂   (O₂ – E₂)² / E₂
O₃   E₃   O₃ – E₃   (O₃ – E₃)² / E₃
Oₙ   Eₙ   Oₙ – Eₙ   (Oₙ – Eₙ)² / Eₙ
∑O   ∑E             χ² calc = ∑ (O – E)² / E
Chi-square worked example
An education researcher wanted to explore if there is an association between teachers’ highest professional qualifications and their attitudes towards teaching in rural areas. After conducting a survey, the data shown in the table below was obtained
Highest professional qualifications
ATTITUDES      Diploma in education   Undergraduate   Postgraduate   Total
Favourable     30                     24              10             64
Neutral        21                     22              12             55
Unfavourable   15                     20              46             81
Total          66                     66              68             200
Carry out a chi-square (χ²) test at 5% significance level to determine if there is an association between teachers’ highest professional qualifications and their attitudes towards teaching in rural areas.
STEP 1
H₀: There is no association between teachers’ highest professional qualifications and their attitudes towards teaching in rural areas.
H₁: There is an association between teachers’ highest professional qualifications and their attitudes towards teaching in rural areas.
STEP 2
A chi-square (χ²) test at 5% significance level
3 × 3 table
d f = (3 – 1)(3 – 1)
= 2 × 2
= 4
STEP 3
Look up the critical value for d f = 4 at 5% in the table, which is 9,49, and draw a curve. Reject H₀ if χ² calc > 9,49
STEP 4
Calculating the expected frequencies
Expected frequency = (row total × column total) / grand total
E30 = (64 × 66) / 200 = 21,12
E24 = (64 × 66) / 200 = 21,12
E10 = (64 × 68) / 200 = 21,76
E21 = (55 × 66) / 200 = 18,15
E22 = (55 × 66) / 200 = 18,15
E12 = (55 × 68) / 200 = 18,7
E15 = (81 × 66) / 200 = 26,73
E20 = (81 × 66) / 200 = 26,73
E46 = (81 × 68) / 200 = 27,54
STEP 5
O    E       O – E    (O – E)² / E
30   21,12   8,88     3,734
24   21,12   2,88     0,393
10   21,76   -11,76   6,356
21   18,15   2,85     0,448
22   18,15   3,85     0,817
12   18,7    -6,7     2,401
15   26,73   -11,73   5,148
20   26,73   -6,73    1,694
46   27,54   18,46    12,374
∑ (O – E)² / E = 33,365
∴ χ² calc = 33,365
STEP 6
Since χ² calc = 33,365 > 9,49, reject H₀ and conclude that there is an association between teachers’ highest professional qualifications and their attitudes towards teaching in rural areas.
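Steps 4 and 5 can be sketched in Python as follows. The function name `chi_square` is illustrative, and the critical value 9,49 is copied from the notes (written as 9.49). Because the code does not round each term to three decimal places before adding, its total differs slightly from 33,365 in the third decimal place; the decision is unaffected.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# rows: Favourable, Neutral, Unfavourable
# columns: Diploma in education, Undergraduate, Postgraduate
table = [[30, 24, 10],
         [21, 22, 12],
         [15, 20, 46]]
CRITICAL_VALUE = 9.49       # d f = 4 at the 5% level (from the table)

statistic, df = chi_square(table)
print(round(statistic, 2), df)
print("reject H0" if statistic > CRITICAL_VALUE else "do not reject H0")
```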
