Educational Statistics: Prof. Y. K. A. Etsey, Dept. of Educational Foundations
Statistics
[Cover figure: plot of Chemistry (vertical axis, 0 to 100) against Mathematics (horizontal axis, 0 to 100)]
EPS 211
Prof. Y. K. A. Etsey,
Dept. of Educational Foundations
August 2012
TABLE OF CONTENTS
Pages
Appendix 92
UNIT 1
INTRODUCTION TO STATISTICS
DEFINITIONS OF STATISTICS
There are three basic definitions of the term Statistics.
1. Statistics (plural) are the body of numbers or data collected in any field.
For example, industrial statistics – number of employees in an industry,
number of products, value of products; vital statistics – measurements of
bust, waist, hips; population statistics – number of people in a region, number
of people with secondary education.
2. Statistics (singular) is the study of methods and procedures used in
collecting, organizing, analyzing, and interpreting a body of numbers for
information and decision making.
3. Statistics (plural; statistic - singular) are the values computed from a body
of numerical data. For example, the "average" age of Level 200 students in
UCC, or the proportion of EPS 211 students who are male.
would indicate whether the general performance of a class is low, average or
high.
3. It helps teachers to evaluate course grades and the differences in ability
represented by different grades. On students' personal report cards, grades alone
do not provide enough information on a student's level of performance in a
subject. This information should be combined with the student's ranking in the subject.
4. It helps the teacher in the critical reading and understanding of professional
journals in education. Journals such as the Journal of Educational Management,
Journal of Educational Research, Journal of Educational Development and
Practice, Journal of Research and Development in Education often use statistics
in their analysis of results.
5. It is useful for research purposes. Statistics are used for data analysis in project
work and dissertations/theses. Teachers can also use statistics in their own research
in the teaching profession.
6. It helps the teacher to understand information from standardized achievement
test manuals. The statistical information provided in the test manuals describes
the quality of the test and the interpretation of the test scores.
INTRODUCTORY CONCEPTS
Variables
A variable is any characteristic of an individual or object that can take on
different values. A value is an assigned number or label representing the attribute of
a given individual or object. For example, marital status as a variable can be broken
down into categories and given values as never married - 1, married - 2, divorced - 3
and widowed - 4. Number of children in a family as a variable can be given the
values 0, 1, 2, 3, 4 etc. Height can take on values such as 1.2 metres, 1.7 metres, 2.0
metres and 2.2 metres. Religious affiliation can be broken down into categories and
given values as: Christian – 1, Moslem – 2, Traditionalist – 3, Buddhist – 4.
Variables can also be classified as discrete or continuous.
Discrete variables have values which in theory assume only certain distinct
values, or whole numbers, on a number line. These variables usually represent
counts of indivisible entities, for example, 8, 12, 20, 45, 100, such as the
number of goals scored in a soccer game or the number of students in a class.
Continuous variables have values which in theory assume any value on a number
line between two points. The values can differ by infinitesimal amounts, for
example, 10.5, 14.16, 42.001, 56.2222278. The height of a student or the weight
of a car, for instance, is a continuous variable.
Inferential statistics uses data from a small group called a sample to make
statements or generalizations about a much larger group called a population.
For example, to know the mean age of first year university students (i.e. the
population) in Ghana, a small group (i.e. a sample) of, say, 200 first year students
can be used. Their mean age could be used as an estimate of the mean age of all
first year university students in Ghana.
Scales of Measurement
Depending upon the traits/attributes/characteristics and the way they are
measured, different kinds of data result representing different scales of
measurement. There are 4 types of measurement scales. These are Nominal,
Ordinal, Interval and Ratio.
Nominal Scales: A nominal scale classifies persons or objects into two or more
categories. Whatever the classification, a person can be in one and only one
category, and members of a given category have a common set of
characteristics. For identification purposes, categories are numbered. e.g.
Gender: Male - 1, Female - 2. All males have a common characteristic and all
females have a common characteristic which is different from males.
Ordinal Scales: An ordinal scale not only classifies subjects but also ranks them
in terms of the degree to which they possess a characteristic/attribute of interest.
An ordinal scale puts subjects in order from highest to lowest, or from most to
least. With respect to height, 5 students can be ranked from 1 to 5, the subject
with rank 1 being the shortest. Though ordinal scales do indicate that some
subjects are higher or better than others, they do not indicate how much higher
or better. The intervals between the ranks are not necessarily equal.
Interval Scales: An interval scale has all the characteristics of both nominal and
ordinal scales and in addition has equal intervals. The zero point is arbitrary
and does not mean the absence of the characteristics/trait. Values can be added
and subtracted to and from each other but not multiplied or divided. Examples
include Celsius temperature, academic achievement.
Ratio Scales: A ratio scale has all the properties of the other three types of scales and in
addition it has a meaningful true zero point. Height, weight and time are
examples. Values can be added, subtracted, multiplied and divided. Sixty
(60) minutes can be said to be 3 times as long as 20 minutes.
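A quick numerical check of this difference: on a ratio scale a statement such as "3 times as long" survives a change of unit, whereas on an interval scale such as Celsius temperature it does not, even though equal differences stay equal. The Python sketch below assumes the standard conversions (F = 9C/5 + 32 and 60 seconds per minute); the helper names are hypothetical.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature (interval scale, arbitrary zero) to Fahrenheit."""
    return c * 9 / 5 + 32


def minutes_to_seconds(m: float) -> float:
    """Convert a duration in minutes (ratio scale, true zero) to seconds."""
    return m * 60


# Time has a true zero, so the ratio 60 : 20 is the same in any unit.
time_ratio_minutes = 60 / 20
time_ratio_seconds = minutes_to_seconds(60) / minutes_to_seconds(20)

# Celsius has an arbitrary zero, so "30 degrees is twice 15 degrees"
# does not survive a change of unit.
temp_ratio_celsius = 30 / 15
temp_ratio_fahrenheit = celsius_to_fahrenheit(30) / celsius_to_fahrenheit(15)

# Differences are still meaningful: equal Celsius intervals
# remain equal intervals after conversion.
diff_f = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20),
          celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))

print("time ratio:", time_ratio_minutes, time_ratio_seconds)
print("temperature ratio:", temp_ratio_celsius, round(temp_ratio_fahrenheit, 3))
print("equal Celsius steps in Fahrenheit:", diff_f)
```

The ratio of times is unchanged by the unit, while the temperature "ratio" changes from 2 to roughly 1.46, which is why multiplication and division are not meaningful on an interval scale.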
Arithmetic Comparisons
Practice Exercises
1. Statistics is important for classroom teachers because it
A. enables them to write appropriate objectives.
B. helps them to construct good test items.
C. helps them to evaluate students' grades.
D. is useful for promotion and certification.
3. A district director of education measures many variables on a sample of schools.
An example of a variable measured in an ordinal scale is the
A. enrolment of the classes in each school.
B. income in cedis of the teachers.
C. professional qualification of the teachers.
D. years of service for each teacher.
A study was conducted to see how well reading success in primary three
could be predicted from various kinds of information obtained in
kindergarten (reading readiness, age, gender, and socio-economic status).
8. The grades, A, B, C, D, E, F in a test were changed to 4, 3, 2, 1, 0 for statistical
purposes. What scale of measurement was used?
A. Interval
B. Nominal
C. Ordinal
D. Ratio
11. Which one of the following variables, measured from Primary 6 pupils, has
an interval scale?
A. Dancing ability
B. Languages spoken
C. Region of birth
D. Religious affiliation
UNIT 2
DATA REPRESENTATION
Raw scores are often represented by graphics/pictures or tables. The
representation of data in these forms enables more information to be derived from the
scores. In educational statistics, more emphasis is placed on pictorial
representations.
Bar graph/Charts
Data that are from nominal scales are represented in graphic form with the
use of bar graphs. Bar graphs give a pictorial description of the data and emphasize
how groups compare with one another. They are used to compare the sizes of the
various parts. The height of the bars is the basis for the comparisons and not the area
of the bars.
Bar graphs are either column or horizontal. Column graphs are more popular
in education. Column bar graphs are simple, compound (multiple) or component.
Examples are shown below.
Table 1
Performance in Inter-House Athletics at BASS
House Total Points
One 120
Two 100
Three 150
Four 60
Five 170
The figure below is a simple column bar graph showing performance in an Inter-
House Athletics competition at BASS.
[Figure: Simple column bar graph, "Inter-House Athletics at BASS"; vertical axis: Total Points (0 to 180); one bar per House]
Table 2
School Enrolment at Texas JSS
Form Male Female
1A 18 35
1B 30 25
1C 36 12
2A 22 30
2B 25 25
2C 40 18
The figure below is a compound column bar graph showing school enrolment at
Texas JSS by gender.
[Figure: Compound column bar graph, "School Enrolment at Texas JSS by Gender"; vertical axis: Enrolment (0 to 45); horizontal axis: Forms (1A to 2C); paired bars for Male and Female]
It is also known as a composite or stacked bar chart. It is used when a set of data
combines to form a total. The total is the length/height of the bar. It allows visual
comparison of the components, i.e. how each component contributes to the total of
the category.
[Figure: Component (stacked) column bar graph, "School enrolment at Texas JSS"]
1. Draw two axes, a vertical and a horizontal one. Label the vertical axis with the
source of the values/scores, e.g. enrolment, points etc. Label the horizontal axis
with the names of the categories.
2. Mark off a scale on the vertical axis, taking into account the lowest and the
highest values. Choose an appropriate scale so that the bars are neither too tall
nor too short; the scale must start at zero.
3. Construct bars of equal width and equal spacing for each category, with the
height of each bar being the value/score for that category.
4. Where computer software such as Microsoft Excel and SPSS is not available, it
is recommended that graph sheets be used.
5. Shade/colour the bars to differentiate bars and components.
Teachers can use bar graphs in several ways. Enrolment by classes, courses and
subjects and inter-house competitions can be represented by bar graphs.
Pie Chart
Pie charts use nominal or categorical data. Pie charts are represented in the form of a
circle of 360 0 sliced into the shape of „pies‟. Each pie is cut from an angle at the
centre of the circle. The angle corresponds to the data for each category or group.
Pie charts give a pictorial view and the contributions of the parts that make a whole.
An example is shown below.
Table 3
Performance in Inter-House Athletics at BASS
[Figure: Pie chart with one sector for each of Houses One to Five]
Constructing pie charts
1. Calculate the degree equivalent of the value of each category/group by dividing
the total points for that group by the overall total points and multiplying the result
by 360°.
For example, for House One above we have $\frac{120}{600} \times 360^\circ = 72^\circ$
and for House Two, we have $\frac{100}{600} \times 360^\circ = 60^\circ$.
2. Use a pair of compasses and a protractor to draw the circle and the sectors based on
the degrees calculated.
3. Shade/Colour the sectors to differentiate one from the other.
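Step 1 can be checked with a short calculation. The minimal Python sketch below applies value × 360 / total to the House totals in Table 1; the function name `sector_angles` is illustrative, not part of the text.

```python
def sector_angles(values: dict) -> dict:
    """Return the sector angle in degrees for each category: value / total * 360."""
    total = sum(values.values())
    # Multiply before dividing so that whole-number data give exact angles.
    return {name: value * 360 / total for name, value in values.items()}


# Total points per House (Table 1, Inter-House Athletics at BASS).
points = {"One": 120, "Two": 100, "Three": 150, "Four": 60, "Five": 170}
angles = sector_angles(points)

for house, angle in angles.items():
    print(f"House {house}: {angle:.0f} degrees")
print("Sum of angles:", sum(angles.values()))  # should be the full circle
```

Checking that the angles add up to 360° is a useful guard against arithmetic slips before drawing the sectors with a protractor.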
Uses
Pie charts can be used by teachers and educational practitioners for examination
results by number of passes in various subjects, school enrolment by class, form
or subjects.
Line Graphs
Line graphs are best suited to data that are related to time. Time could be days,
weeks, months and years. Line graphs show changes in the data over a period of
time. Data from interval and ratio scales are most appropriate. Line graphs
could be simple or compound. Simple line graphs give a pictorial description of
the data. Compound line graphs compare group data over a period of time.
Examples are shown below.
Table 4
Attendance at monthly teachers' workshops
Month Total
January 120
February 85
March 100
April 150
May 90
June 85
July 100
August 60
September 90
October 75
November 100
December 150
The figure below is a simple line graph showing attendance at a monthly teachers'
workshop.
[Figure: Simple line graph, "Attendance at monthly teachers' workshops"; vertical axis: Number of attendants (0 to 160); horizontal axis: Month (Jan to Dec)]
Table 5
Attendance at monthly teachers' workshops
Month Attendance (Female) Attendance (Male)
Jan 60 60
Feb 45 40
March 60 40
April 80 70
May 50 40
June 40 45
July 50 50
August 25 35
Sep 40 50
Oct 30 35
Nov 50 50
Dec 90 60
[Figure: Compound line graph of the attendance in Table 5, with separate lines for Female and Male; horizontal axis: Months (Jan to Dec)]
Constructing line graphs
1. Draw two axes, a vertical and a horizontal one. Label the vertical axis with the
source of the values/scores, e.g. attendance, enrolment, points etc. Label the
horizontal axis with the time period, e.g. months, days, weeks etc.
2. Mark off a scale on the vertical axis, taking into account the lowest and the
highest values. Choose an appropriate scale so that the graph is neither too tall
nor too flat; the scale must start at zero.
3. Plot the value/quantity for each time period on the graph and join consecutive
points with straight lines.
4. Where computer software such as Microsoft Excel and SPSS is not available, it
is recommended that graph sheets be used.
Uses
Teachers and educational practitioners can use line graphs in several ways.
Examination results over a period of years in a subject, total school enrolment as
well as enrolment by subjects and courses for a period of time can be represented
by line graphs.
FREQUENCY DISTRIBUTIONS
Data normally comes in raw or ungrouped form as shown below for 40 students in a
Statistics class.
The raw data alone do not give much information. In the example above, at best we
can tell the highest score (93) and the lowest score (49). A lot more information
can be obtained if the data are treated or put in other forms. One of the ways to
obtain information from data is to use frequency distributions.
Table 6
Ungrouped frequency distribution table
Ungrouped frequency distributions are not very useful for further work. Zero
frequencies are common and the tables are sometimes too long.
For grouped frequency distributions, the individual scores are put into groups or
classes. Class sizes of 3, 5, 7, 9 and 10 are the most common. Column 2 provides
the mid-points, column 3 the tallies, and column 4 the frequencies.
Table 7
Grouped frequency distribution of Statistics students‟ performance
Features
Table 8
An expanded frequency distribution table
Practice Exercise
Given the following scores of 50 students in a Statistics class, and using a class
width of 5, construct a grouped frequency distribution table. Also obtain the
cumulative percentage frequencies, and cumulative relative frequencies.
32 38 25 40 47 22 48 45 20 35
16 18 10 6 8 11 33 30 28 27
42 35 30 34 31 21 25 12 20 25
43 33 36 39 42 17 19 22 26 10
33 38 32 22 26 42 37 35 40 46
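A hand-built table for this exercise can be checked with a short script. The Python sketch below is one possible check, not the expected answer: it assumes the first class starts at 5 (classes 5-9, 10-14, ..., 45-49) and that all scores are whole numbers; the function name `grouped_table` is hypothetical.

```python
# Practice-exercise scores (50 students), copied from the text.
scores = [
    32, 38, 25, 40, 47, 22, 48, 45, 20, 35,
    16, 18, 10, 6, 8, 11, 33, 30, 28, 27,
    42, 35, 30, 34, 31, 21, 25, 12, 20, 25,
    43, 33, 36, 39, 42, 17, 19, 22, 26, 10,
    33, 38, 32, 22, 26, 42, 37, 35, 40, 46,
]


def grouped_table(data, width, start):
    """Return one row per class, lowest class first:
    (lower limit, upper limit, f, relative f, cum f, cum %, cum relative f).
    Assumes whole-number scores, so the class 5-9 holds 5, 6, 7, 8 and 9."""
    n = len(data)
    rows = []
    cum = 0
    lower = start
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(1 for x in data if lower <= x <= upper)
        cum += f
        rows.append((lower, upper, f, f / n, cum, 100 * cum / n, cum / n))
        lower += width
    return rows


table = grouped_table(scores, width=5, start=5)  # start=5 is an assumption

print("Class   f   Rel f  Cum f  Cum %   Cum rel f")
# Print the highest class first, as the tables in the text do.
for lo, hi, f, rel, cf, cpct, crel in reversed(table):
    print(f"{lo:2d}-{hi:2d}  {f:2d}   {rel:.2f}   {cf:3d}   {cpct:5.1f}   {crel:.2f}")
```

Choosing a different starting point (for example 6, the lowest score) shifts every class, so a correct table that starts elsewhere will differ in detail while still totalling 50 and ending at 100%.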
GRAPHIC REPRESENTATIONS OF
FREQUENCY DISTRIBUTIONS
Histogram
Histograms use data from interval or ratio scales and depend on frequency
distributions. They use the classes and the frequencies from the frequency
distribution table. An example is shown below.
[Figure: Histogram; vertical axis: Frequency (0 to 40); horizontal axis: Classes (0 to 60)]
To construct a histogram
1. Draw two axes, a vertical and a horizontal one. Label the vertical axis with
frequency and the horizontal axis with scores/classes.
2. Select an appropriate scale on the vertical axis, considering the highest/largest
value. When using a graph sheet, the scale should be such that the bars are
neither too tall nor too short.
3. Use class midpoints/marks or class boundaries or class limits to label the points
on the horizontal axis.
4. Draw bars of equal width representing the classes from a frequency distribution
table with corresponding heights as the frequencies.
Importance
1. It gives a pictorial description of the raw data, providing information about the
nature of the data.
2. It shows the direction of performance, i.e. whether the scores are bunched
towards the low or the high end (skewness).
[Figure: Two histograms of Frequency (0 to 40) against Classes (0 to 30). Left: skewed to the right, group performance tends to be low. Right: skewed to the left, group performance tends to be high.]
3. It provides an estimate of the most typical score. This is the intersection of the
two diagonals of the tallest bar.
Frequency Polygon
Frequency polygon uses data from ratio or interval scales and depends on
frequency distributions. It uses the classes and the frequencies from the frequency
distribution table. An example is shown below.
[Figure: Frequency polygon; vertical axis: Frequency; horizontal axis: Classes]
Importance
1. It gives a pictorial description of the raw data, providing information about the
nature of the data.
2. It provides an estimate of the most typical score. This is the point on the
horizontal axis where the highest point of the polygon is located.
[Figure: Frequency polygons for Form 1 and Form 2 drawn on the same axes]
The diagram shows that the Form 2 class, whose polygon lies further to the right,
performs better.
The most typical scores, where the highest point of the polygon is located can be
used to confirm the comparisons. Where the total frequencies are not the same, use
relative frequencies in place of the actual frequencies to draw the polygon.
Plot the graph using the upper class boundaries of each class against the cumulative
percentage frequencies.
[Figure: Ogive; vertical axis: Cumulative % frequency (0 to 100); horizontal axis: Classes (0 to 80)]
To construct an ogive,
1. Obtain cumulative percentage frequencies.
2. Plot the cumulative percentage frequencies on the vertical scale. On a graph
sheet, choose appropriate scales so that the ogive is not distorted.
3. Label the horizontal axis as scores or classes.
4. At the upper class boundary of each class, plot the corresponding cumulative
percentage frequency. Join the points with straight lines.
5. Extend the line one class to the left so that the ogive touches the horizontal
axis.
Importance
1. It is used for comparisons of distributions of performance, especially for
distributions where the class/group sizes are not the same. Generally, the graph
that lies more to the right shows better performance. The median score, read off
at a cumulative percentage frequency of 50, is also used.
Given the following performances in a test, draw two ogives. Which school
performed better?
School A School B
Classes Frequency Cum. % Freq Frequency Cum. % Freq
91 - 100 1 100 7 100
81 – 90 2 99 17 95.3
71 – 80 11 97 30 84
61 – 70 24 86 25 64
51 – 60 20 62 15 47.3
41 – 50 16 42 11 37.3
31 – 40 12 26 19 30
21 – 30 8 14 14 17.3
11 - 20 4 6 6 8
1 - 10 2 2 6 4
Total 100 150
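The cumulative percentage columns and the median read off each ogive can be checked numerically. The Python sketch below assumes whole-number scores (so the class 1 - 10 has boundaries 0.5 and 10.5) and treats "joining the plotted points with straight lines" as linear interpolation between them; the function names are hypothetical.

```python
# Frequencies from the table above, listed from the lowest class (1 - 10)
# to the highest (91 - 100).
lower_limits = [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
class_width = 10
school_a = [2, 4, 8, 12, 16, 20, 24, 11, 2, 1]
school_b = [6, 6, 14, 19, 11, 15, 25, 30, 17, 7]


def cum_percent(freqs):
    """Cumulative percentage frequency for each class, lowest class first."""
    n = sum(freqs)
    result, running = [], 0
    for f in freqs:
        running += f
        result.append(100 * running / n)
    return result


def ogive_median(freqs):
    """Score at which the ogive reaches 50%, joining plotted points with
    straight lines (linear interpolation inside the median class)."""
    below = 0.0  # cumulative % at the lower boundary of the current class
    for lower, cum in zip(lower_limits, cum_percent(freqs)):
        if cum >= 50:
            lower_boundary = lower - 0.5
            return lower_boundary + (50 - below) / (cum - below) * class_width
        below = cum
    raise ValueError("cumulative percentage never reaches 50")


for name, freqs in [("School A", school_a), ("School B", school_b)]:
    print(name, [round(c, 1) for c in cum_percent(freqs)])
    print(f"  median read from the ogive: {ogive_median(freqs):.1f}")
```

The recomputed columns match the table, and School B's ogive reaches 50% at a higher score than School A's, which agrees with the rule that the graph lying further to the right shows better performance.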
2. It is used to determine percentiles and percentile ranks. Later in the course, you
will learn how to obtain the percentiles and percentile ranks.
A box and whisker plot is drawn below. Later in the course, you will learn how to
obtain the percentiles and quartiles.
[Figure: Box and whisker plot marked at P10, Q1, Q2, Q3 and P90]
An example.
Assume that the following values were obtained for two classes, Form 1A and Form
1B in a class test in Mathematics.
P10 Q2 P90
Form 1A 10 40 73
Form 1B 28 56 91
The information is presented below by two box and whisker plots.
[Figure: Box and whisker plots for Form 1A and Form 1B, each marked at P10, Q1, Q2, Q3 and P90, on a common scale from 0 to 100 with reference values 10, 28, 40, 56, 73 and 91]
It can be observed that the P10, Q1, Q2, Q3 and P90 values are greater in Form 1B than
in Form 1A. This means that performance is better in Form 1B than in Form 1A.
Also note that the graph for Form 1B has moved more to the right towards higher
values than that of Form 1A.
Practice Exercises
1. You have data on the long vacation earnings of a sample of 1,000 University of
Cape Coast students. What kind of graph is most appropriate to use to describe
the distribution of their earnings?
A. Bar chart.
B. Box and Whisker.
C. Histogram.
D. Pie chart.
2. You are writing an article for the SRC newspaper about the cost of attending a
university. You want to make a graph to compare costs at your institution and
three similar institutions. The most appropriate choice of a graph would be a
A. Bar chart.
B. Frequency polygon.
C. Histogram.
D. Pie chart.
4. What is the relative frequency for the class, 61 – 70?
A. 0.50
B. 0.20
C. 10.0
D. 17.0
6. Histograms are most useful for representing data when the scale of measurement
is
I. Interval II. Nominal III. Ordinal IV. Ratio
A. I only.
B. IV only.
C. I and IV.
D. I, III, IV.
Classes Frequency
46 - 50 12
41 - 45 14
36 - 40 12
31 – 35 6
26 – 30 5
21 – 25 1
Total 50
7. The relative frequency for the class, 36-40, is
A. 0.024
B. 0.24
C. 0.48
D. 0.52
UNIT 3
Illustration
THE MEAN ($\bar{X}$)
There are three types. These are Arithmetic, Geometric and Harmonic. In Education,
the Arithmetic mean is the most useful.
The Arithmetic Mean. It is the sum of the observations divided by the total number
of observations, i.e. add the values and divide by the number of observations.
For example, 4 + 2 + 3 + 1 + 5 = 15, so Mean = $\frac{15}{5} = 3$.
Methods
The Arithmetic Mean ( X ) can be obtained from both the ungrouped and grouped
data. It can also be easily obtained from Microsoft Excel.
1. Ungrouped data
Given the following scores, 15, 12, 10, 10, 9, 20, 14, 11, 13, 16, to obtain the mean,
all the scores are added and divided by the total number of observations. The mean
is $\frac{130}{10} = 13$.
2. Grouped data
Two methods can be used. These are the long method and the coding method. The
methods are used with frequency distributions.
Long method: $\bar{X} = \dfrac{\sum fX}{n}$ OR $\bar{X} = \dfrac{\sum fX}{N}$, where f is the frequency and X, the class
marks.
Example using the long method
Scores     Midpoint (X)   Freq (f)   fX
46 – 50    48             4          192
41 – 45    43             6          258
36 – 40    38             10         380
31 – 35    33             12         396
26 – 30    28             8          224
21 – 25    23             7          161
16 – 20    18             3          54
Total                     50         1665
Long method: $\bar{X} = \dfrac{\sum fX}{n} = \dfrac{1665}{50} = 33.3$
Coding method: $\bar{X} = AM + \dfrac{\sum fd}{n} \times i$, which is used for distributions with equal
class intervals. AM is the assumed mean, f is the frequency, d is the code for each
class, n is the total frequency and i is the class size.
To use the coding method, class intervals must have the same size. The class
in the middle or the class with the highest frequency is chosen for the code of 0.
Classes above the zero coded class are given positive codes and those below are
given negative codes in steps of 1.
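Taking the class 31 – 35 (midpoint 33) as the zero-coded class gives AM = 33 and
$\sum fd = 3$, so
Coding method: $\bar{X} = AM + \dfrac{\sum fd}{n} \times i = 33 + \dfrac{3 \times 5}{50} = 33 + \dfrac{15}{50} = 33.3$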
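Both grouped-data methods can be reproduced in a few lines. The Python sketch below uses the midpoints and frequencies from the worked example; the function names are hypothetical, and the coding version raises an error if the classes are not of equal size, since the text notes that the coding method requires this.

```python
# Midpoints (X) and frequencies (f) from the worked example, highest class first.
midpoints = [48, 43, 38, 33, 28, 23, 18]
freqs = [4, 6, 10, 12, 8, 7, 3]
class_size = 5  # i


def mean_long(xs, fs):
    """Long method: mean = sum(fX) / n."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_coded(xs, fs, assumed_mean, i):
    """Coding method: mean = AM + (sum(fd) / n) * i, where d = (X - AM) / i.
    Only valid when every class has the same size i."""
    total_fd = 0
    for x, f in zip(xs, fs):
        d, remainder = divmod(x - assumed_mean, i)
        if remainder != 0:
            raise ValueError("midpoints must differ from AM by whole class sizes")
        total_fd += f * d
    return assumed_mean + total_fd * i / sum(fs)


print("long method:  ", mean_long(midpoints, freqs))                   # sum(fX) = 1665, n = 50
print("coding method:", mean_coded(midpoints, freqs, 33, class_size))  # sum(fd) = 3
```

Any midpoint can serve as the assumed mean; choosing a different one changes the codes but not the final mean.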
OPTIONAL
Using Microsoft Excel
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Mean.
4. Click the empty cell directly below where you typed Mean.
5. Click white space to the right of the fx symbol.
6. Type in =AVERAGEA(cell number where data begins from:cell number
where data ends at). E.g. =AVERAGEA(B2:B32). This means that data
begins at cell B2 and ends at cell B32.
7. Press Enter. (The mean is given in the empty cell clicked.)
Properties of the Mean
1. The mean is influenced by every score or value that makes it up. If a score is
changed, the value of the mean changes.
3, 4, 2, 4, 7 Mean = 4
3, 4, 7, 4, 7 Mean = 5. Changing the score 2 to 7 has changed the
mean to 5.
3. The mean is a function of the sum (or aggregate or total) of the scores.
$\bar{X} = \dfrac{\sum X}{N} \quad\Rightarrow\quad N\bar{X} = \sum X$
This implies that the number of observations multiplied by the mean gives the sum
of the scores.
Of the three measures it is the only one that is a function of the sum of the scores.
It is also possible to calculate the mean for a combined group if only the means and
number of scores (N) are available.
4. If the mean is subtracted from each individual score and the differences are
summed, the result is 0. For the scores 4, 2, 3, 6, 5 (mean = 4):
4 – 4 = 0
2 – 4 = -2
3 – 4 = -1
6 – 4 = 2
5 – 4 = 1
Sum = 0
The distance of a score from the mean is known as its deviation.
5. If the same value is added to or subtracted from every score in a set of scores,
the mean goes up or goes down by that same value.
For example, given 8, 2, 10, 4, $\bar{X} = 6$.
Now add 2 to each score: 10, 4, 12, 6, $\bar{X} = 8$, i.e. 6 + 2.
6. If each score is multiplied or divided by the same value, the mean is multiplied or
divided by that same value.
For example, given 8, 2, 10, 4, $\bar{X} = 6$.
Now multiply each score by 3: 24, 6, 30, 12, $\bar{X} = 18$, i.e. 6 × 3.
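The properties above can be verified directly. The Python sketch below uses the scores from the examples; the combined-mean step uses two hypothetical classes (20 pupils with mean 50 and 30 pupils with mean 60), which are invented for illustration only and rely on property 3 (N × mean = sum of scores).

```python
def mean(values):
    """Arithmetic mean: sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


def combined_mean(groups):
    """Mean of pooled groups from each group's (N, mean) pair, using N * mean = sum."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Property 1: changing one score changes the mean.
before, after = [3, 4, 2, 4, 7], [3, 4, 7, 4, 7]
print("property 1:", mean(before), mean(after))

# Property 3: N times the mean gives the sum of the scores.
scores = [8, 2, 10, 4]
print("property 3:", len(scores) * mean(scores) == sum(scores))

# Property 4: deviations from the mean sum to zero.
data = [4, 2, 3, 6, 5]
deviations = [x - mean(data) for x in data]
print("property 4:", deviations, sum(deviations))

# Properties 5 and 6: adding or multiplying by a constant.
print("property 5:", mean([x + 2 for x in scores]))
print("property 6:", mean([x * 3 for x in scores]))

# Combined mean of two hypothetical classes.
print("combined mean:", combined_mean([(20, 50.0), (30, 60.0)]))
```

Note that the combined mean (56) is not the simple average of the two class means (55), because the larger class carries more weight.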
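For an odd number of observations, the median occupies the $\left(\frac{n+1}{2}\right)$th position.
For an even number of observations, find the mean of the two middle numbers, i.e. the
value at the $\left(\frac{n+1}{2}\right)$th position, which lies midway between them.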
The median can be obtained from both ungrouped and grouped data and also from
Microsoft Excel.
To find the median from ungrouped data
1. Arrange all observations in order of size from smallest to largest or vice versa.
2. If the number of observations, n, is odd, the median is the number at the centre,
i.e. the number at the $\left(\frac{n+1}{2}\right)$th position.
3. If the number of observations, n, is even, the median is the mean of the two
centre observations.
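The three steps above translate directly into code. A minimal Python sketch (the function name `median` is our own; the standard library's `statistics.median` follows the same rule):

```python
def median(values):
    """Median of ungrouped data, following the three steps in the text."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)          # step 1: arrange in order of size
    n = len(ordered)
    middle = n // 2                   # 0-based index of the ((n + 1)/2)th value when n is odd
    if n % 2 == 1:                    # step 2: odd n, take the centre value
        return ordered[middle]
    # step 3: even n, take the mean of the two centre values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([4, 2, 3, 1, 5]))                            # odd n
print(median([15, 12, 10, 10, 9, 20, 14, 11, 13, 16]))    # even n
```

The second call uses the scores from the ungrouped mean example; their mean is 13 while their median is the mean of the 5th and 6th ordered scores, 12 and 13.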
Examples
Step 1. Identify the median class. It is the class that will contain the middle score.
Find the value of $\frac{N}{2}$, where N is the total frequency. This is the position of the
middle score. Checking the cumulative frequency column, find the number equal to
the position or the smallest number that is greater than the position. From the table
above, $\frac{N}{2} = \frac{50}{2} = 25$, therefore the number is 30. The class that this number
belongs to is the median class. From the table above, the median class is 31 – 35.
Step 2. Use the formula below to obtain the median.
$Mdn = L_1 + \left(\dfrac{\frac{N}{2} - cf}{f_{mdn}}\right) i$, where
$L_1$ is the lower class boundary of the median class
N is the total frequency
cf is the cumulative frequency of the class just below the median class
i is the class size/width
$f_{mdn}$ is the frequency of the median class
$Mdn = 30.5 + \left(\dfrac{\frac{50}{2} - 18}{12}\right) 5 = 30.5 + \left(\dfrac{25 - 18}{12}\right) 5 = 30.5 + \left(\dfrac{7}{12}\right) 5 = 30.5 + (0.583)(5) = 30.5 + 2.9 = 33.4$
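The two steps can be combined into one function. The Python sketch below assumes the same grouped data as the worked example, listed from the lowest class up with lower class boundaries 15.5, 20.5, ..., 45.5; the name `grouped_median` is hypothetical.

```python
# Grouped data from the worked example, listed from the LOWEST class up.
lower_boundaries = [15.5, 20.5, 25.5, 30.5, 35.5, 40.5, 45.5]
freqs = [3, 7, 8, 12, 10, 6, 4]
class_size = 5


def grouped_median(boundaries, fs, i):
    """Mdn = L1 + ((N/2 - cf) / f_mdn) * i."""
    half = sum(fs) / 2          # step 1: position of the middle score
    cf = 0                      # cumulative frequency below the current class
    for lower, f in zip(boundaries, fs):
        # The median class is the first class whose cumulative frequency reaches N/2.
        if cf + f >= half:
            return lower + (half - cf) / f * i   # step 2: the formula
        cf += f
    raise ValueError("empty distribution")


print(round(grouped_median(lower_boundaries, freqs, class_size), 2))
```

Without intermediate rounding the result is about 33.42, which agrees with the hand calculation of 33.4.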
OPTIONAL
6. Type in =MEDIAN(cell number where data begins from:cell number where data
ends at). E.g. =MEDIAN(B2:B32). This means that data begins at cell B2 and
ends at cell B32.
7. Press Enter. (The median is given in the empty cell clicked.)
6. Where there are very few observations, the median is not representative of the
data.
7. Where the data set is large, it is tedious to arrange the data in an array for
ungrouped data computation of the median.
THE MODE
It is the number that occurs most frequently in a distribution.
Given the following scores, 1, 2, 4, 6, 4, 6, 7, 2, 4 the number that occurs most
frequently is 4. This is the Mode. This number appears 3 times.
Given the following scores, 11, 22, 14, 26, 34, 6, 27, 12, 40, no number occurs more
frequently than the others. There is therefore no mode.
1. The main advantage is that it is the only measure that is useful for nominal-scale data.
2. It is used when there is the need for a rough estimate of the measure of location.
3. It is used when there is the need to know the most frequently occurring value, e.g.
dress styles.
4. It is not useful for further statistical work because the distribution can be bimodal
or trimodal or have no mode at all.
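Finding the mode amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` from the standard library; the function name `modes` is hypothetical, and it assumes the rule that a set in which every value occurs equally often has no mode, while ties for the top count are reported as bimodal, trimodal and so on.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), smallest first.
    Returns an empty list when every value occurs equally often (no mode)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []                      # no value occurs more often than the others
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 4, 6, 4, 6, 7, 2, 4]))       # the first example in the text
print(modes([11, 22, 14, 26, 34, 6, 27, 12, 40]))  # the second example: no mode
print(modes([1, 1, 2, 2, 3]))                   # a bimodal set
```

Returning a list rather than a single number makes the bimodal and no-mode cases explicit, which is the reason the text gives for the mode being of limited use in further work.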
Practice Exercises
Use the histogram below to answer questions 1 and 2
[Figure: Histogram; vertical axis: Frequency (0 to 40); horizontal axis: Classes (0 to 80)]
4. The median score for a group of 19 students was 58. A 20th student had a score
of 45. What is the new median score?
A. 10.5
B. 45.0
C. 58.0
D. It cannot be determined
7. In a class quiz, a mean of 62 was obtained with a median of 48. How would the
performance of the class be described?
A. Above average
B. Average
C. High
D. Low
MEASURES OF VARIABILITY
These are also called measures of variation, dispersion or scatter. The measures
used mainly in education are:
1. The range
2. The Variance
3. The Standard Deviation
4. The Quartile Deviation (Semi-interquartile range)
They are used as single values to describe individual differences in terms of
achievement.
For example: 48, 51, 47, 50 Total = 196 Mean = 49 …..(i)
30, 72, 90, 4 Total = 196 Mean = 49 …..(ii)
However, a closer look at the two sets of data shows that the distribution within each
set is not the same. Where the scores cluster around the mean, performance is said to
be homogeneous as in (i). Where the scores move away from the mean,
performance is said to be heterogeneous as in (ii).
THE RANGE
It is the difference between the highest and the lowest values in a set of data.
e.g.: 48, 51, 47, 50 Total = 196 Mean = 49 …..(i) Range: 51 – 47 = 4
30, 72, 90, 4 Total = 196 Mean = 49 …..(ii) Range: 90 – 4 = 86
Features
1. It is easy to compute.
2. It is easy to interpret.
3. It is a crude measure of dispersion and does not take into account all the
data/scores.
4. It ignores the spread of all the scores.
5. It uses only two values and does not consider how the other scores relate to each
other.
6. The range does not consider the typical observations in the distribution but
concentrates only on the extreme values.
7. It can give a distorted picture of the variation within a set of data.
8. Different distributions can have the same range which would give misleading
conclusions.
Uses
1. When data is too scanty or too scattered to justify the computation of a more
precise measure.
2. When knowledge of extreme scores or total spread is all that is needed.
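Ungrouped data
This is based on raw data. It is computed by using the following formulae.
Variance ($S^2$)
1. $S^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
2. $S^2 = \dfrac{\sum X^2}{n} - \bar{X}^2$
3. $S^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$
Standard deviation (S)
$S = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$, or equivalently the square root of formula 2 or 3.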
Given a set of data as 48 51 50 47 and the mean of the distribution as 49, the
variance and the standard deviation could be computed as follows:
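$\bar{X} = \dfrac{\sum X}{n} = \dfrac{196}{4} = 49$
X     $X - \bar{X}$     $(X - \bar{X})^2$     $X^2$
48    -1                1                     2304
51    2                 4                     2601
47    -2                4                     2209
50    1                 1                     2500
Total                   10                    9614
$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{10}{4}} = \sqrt{2.5} = 1.58$, Variance = $S^2$ = 2.5
OR
$SD = \sqrt{\dfrac{\sum X^2}{n} - \bar{X}^2} = \sqrt{\dfrac{9614}{4} - 49^2} = \sqrt{2403.5 - 2401.0} = \sqrt{2.5} = 1.58$, Variance = $S^2$ = 2.5
OR
$SD = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2} = \sqrt{\dfrac{9614}{4} - \left(\dfrac{196}{4}\right)^2} = \sqrt{2403.5 - 2401.0} = \sqrt{2.5} = 1.58$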
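The three variance formulae can be compared side by side. The Python sketch below divides by n throughout, as the text does (population variance); the function names are hypothetical, and the standard library's `statistics.pvariance` and `statistics.pstdev` use the same divisor.

```python
import math


def variance_deviations(xs):
    """Formula 1: sum((X - mean)^2) / n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def variance_raw_minus_mean_sq(xs):
    """Formula 2: sum(X^2) / n - mean^2."""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m * m


def variance_raw(xs):
    """Formula 3: sum(X^2) / n - (sum(X) / n)^2, using raw sums only."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


scores = [48, 51, 50, 47]
for formula in (variance_deviations, variance_raw_minus_mean_sq, variance_raw):
    var = formula(scores)
    print(f"{formula.__name__}: variance = {var}, SD = {math.sqrt(var):.2f}")
```

All three agree here. Formulae 2 and 3 subtract two large, nearly equal numbers (2403.5 - 2401.0), which is quick by hand but can lose precision with long decimal data on a computer; formula 1 avoids that subtraction.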
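Grouped data:
This is based on a frequency distribution of the scores.
Long method: $SD = \sqrt{\dfrac{\sum f(X - \bar{X})^2}{n}}$, $\quad Var = \dfrac{\sum f(X - \bar{X})^2}{n}$
Short method: $SD = \sqrt{\dfrac{\sum fX^2}{n} - \left(\dfrac{\sum fX}{n}\right)^2}$, $\quad Var = \dfrac{\sum fX^2}{n} - \left(\dfrac{\sum fX}{n}\right)^2$
Coding method: $SD = i\sqrt{\dfrac{\sum fd^2}{n} - \left(\dfrac{\sum fd}{n}\right)^2}$. This is useful with equal class intervals.
For the grouped scores used earlier for the mean (n = 50, $\sum fX = 1665$, $\sum fX^2 = 58765$, $\sum fd = 3$, $\sum fd^2 = 133$):
Short method:
$SD = \sqrt{\dfrac{58765}{50} - \left(\dfrac{1665}{50}\right)^2} = \sqrt{1175.3 - 1108.89} = \sqrt{66.41} = 8.15$
Coding method:
$SD = 5\sqrt{\dfrac{133}{50} - \left(\dfrac{3}{50}\right)^2} = 5\sqrt{2.66 - 0.0036} = 5\sqrt{2.6564} = 8.15$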
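The three grouped-data methods should give the same standard deviation. The Python sketch below recomputes the sums quoted above from the grouped example, assuming the codes run from +3 for 46 – 50 down to -3 for 16 – 20 (the class 31 – 35 coded 0, as in the coding method for the mean).

```python
import math

# Midpoints (X), frequencies (f) and codes (d) for the grouped example;
# the class 31 - 35 (midpoint 33) is coded 0.
midpoints = [48, 43, 38, 33, 28, 23, 18]
freqs = [4, 6, 10, 12, 8, 7, 3]
codes = [3, 2, 1, 0, -1, -2, -3]
class_size = 5
n = sum(freqs)

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # sum of fX
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # sum of fX^2
sum_fd = sum(f * d for f, d in zip(freqs, codes))           # sum of fd
sum_fd2 = sum(f * d * d for f, d in zip(freqs, codes))      # sum of fd^2
mean = sum_fx / n

# Long method: sqrt(sum f(X - mean)^2 / n)
sd_long = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

# Short method: sqrt(sum fX^2 / n - (sum fX / n)^2)
sd_short = math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Coding method: i * sqrt(sum fd^2 / n - (sum fd / n)^2)
sd_coded = class_size * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

print(sum_fx, sum_fx2, sum_fd, sum_fd2)
print(f"long: {sd_long:.2f}, short: {sd_short:.2f}, coding: {sd_coded:.2f}")
```

The coding method works with small whole numbers, which is why it is the quickest by hand; on a computer all three are equally easy.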
Standard Deviation
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Std. Dev.
4. Click the empty cell directly below where you typed Std. Dev.
5. Click white space to the right of the fx symbol.
6. Type in =STDEVPA(cell number where data begins from:cell number
where data ends at). E.g. =STDEVPA(B2:B32). This means that data begins
at cell B2 and ends at cell B32.
7. Press Enter. (The standard deviation is given in the empty cell clicked.)
Variance
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Variance.
4. Click the empty cell directly below where you typed Variance.
5. Click white space to the right of the fx symbol.
6. Type in =VARPA(cell number where data begins from:cell number where
data ends at). E.g. =VARPA(B2:B32). This means that data begins at cell
B2 and ends at cell B32.
7. Press Enter. The variance is given in the empty cell clicked.
An example is below:
USES
1. It is used as the most appropriate measure of variation/dispersion when there is
reason to believe that the distribution is normal.
2. It helps to find out the variation in achievement among a group of students. i.e. it
determines if a group is homogeneous or heterogeneous.
Where the standard deviation is relatively small, the group is believed to be
homogeneous, i.e. performing at about the same level. On the other hand,
where the standard deviation is relatively large, the group is believed to be
heterogeneous, i.e. performing at different levels.
To be more precise, the coefficient of variation (CV) is computed.
CV = $\dfrac{S}{\bar{X}} \times 100$. If the value of CV is greater than 33, the group is
heterogeneous; otherwise it is homogeneous.
With this information, the teacher has to adopt a teaching method to suit each
group.
3. It is helpful in computing other statistics e.g. standard scores, correlation
coefficients.
4. It is useful in determining the reliability of test scores. The split-half correlation
method or internal consistency methods use the standard deviation of the scores.
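The range, standard deviation and coefficient of variation can be compared for the two sets of scores used earlier, (i) 48, 51, 47, 50 and (ii) 30, 72, 90, 4, which share a mean of 49. The Python sketch below applies the text's cut-off of 33 for CV; the function names are hypothetical, and the CV is only meaningful when the mean is positive.

```python
import math


def describe(xs):
    """Return (range, SD, CV) for ungrouped scores, dividing by n for the SD."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    cv = sd / m * 100            # coefficient of variation, in percent
    return max(xs) - min(xs), sd, cv


def group_type(cv, cutoff=33):
    """Classify a group using the CV rule stated in the text."""
    return "heterogeneous" if cv > cutoff else "homogeneous"


for label, xs in [("(i)", [48, 51, 47, 50]), ("(ii)", [30, 72, 90, 4])]:
    rng, sd, cv = describe(xs)
    print(f"{label} range = {rng}, SD = {sd:.2f}, CV = {cv:.1f} -> {group_type(cv)}")
```

Both sets have the same mean, yet set (i) is classed as homogeneous and set (ii) as heterogeneous, which is exactly the difference the mean alone cannot reveal.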
QUARTILE DEVIATION
There are two methods – the median method and the formula method.
Example.
Given the scores 8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9, first arrange them in ascending order:
4, 6, 7, 8, 9, 10, 12, 13, 18, 22, 25
$Q_1 = \frac{1}{4}(n+1)$th position $= \frac{1}{4}(12) =$ 3rd position, so $Q_1 = 7$
$Q_3 = \frac{3}{4}(n+1)$th position $= \frac{3}{4}(12) =$ 9th position, so $Q_3 = 18$
$QD = \frac{Q_3 - Q_1}{2} = \frac{18 - 7}{2} = \frac{11}{2} = 5.5$
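The position rule can be written as a short Python sketch. The notes' example lands exactly on whole-number positions; for fractional positions the sketch assumes linear interpolation between the neighbouring scores, which is one common convention rather than something stated in the notes. The function names are hypothetical.

```python
def value_at_position(sorted_scores, pos):
    """Score at a 1-based position; fractional positions interpolate linearly (assumption)."""
    whole = int(pos)
    frac = pos - whole
    value = sorted_scores[whole - 1]
    if frac > 0:
        value += frac * (sorted_scores[whole] - sorted_scores[whole - 1])
    return value

def quartile_deviation(scores):
    """Q1, Q3 and QD = (Q3 - Q1) / 2 using the (n + 1) position rule."""
    s = sorted(scores)
    n = len(s)
    if n < 3:
        raise ValueError("need at least 3 scores for the (n + 1) position rule")
    q1 = value_at_position(s, (n + 1) / 4)       # 1/4 (n + 1)th position
    q3 = value_at_position(s, 3 * (n + 1) / 4)   # 3/4 (n + 1)th position
    return q1, q3, (q3 - q1) / 2

scores = [8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9]
print(quartile_deviation(scores))   # (7, 18, 5.5)
```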
$Q_1 = L_{Q_1} + \left(\frac{\frac{N}{4} - cf}{f_{Q_1}}\right) i$, where
LQ1 is the lower class boundary of the lower quartile class
N is the total frequency
cf is the cumulative frequency of the class just below the lower quartile class
i is the class size/width
fQ1 is the frequency of the lower quartile class
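$Q_3 = L_{Q_3} + \left(\frac{\frac{3N}{4} - cf}{f_{Q_3}}\right) i$, where the symbols are defined as for $Q_1$ but refer to the upper quartile class.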
Example
Classes    Midpoint (X)   Freq (f)   Cum Freq (cf)
46 – 50    48             4          50
41 – 45    43             6          46
36 – 40    38             10         40
31 – 35    33             12         30
26 – 30    28             8          18
21 – 25    23             7          10
16 – 20    18             3          3
Total                     50
Step 1. Identify the quartile class, i.e. the class that contains the quartile of interest. Compute the positions $\frac{N}{4}$ for the lower quartile and $\frac{3N}{4}$ for the upper quartile, where N is the total frequency. In the cumulative frequency column, find the number equal to the position or, failing that, the smallest number greater than the position. From the table above, $\frac{N}{4} = \frac{50}{4} = 12.5$, so the number is 18, and $\frac{3N}{4} = \frac{150}{4} = 37.5$, so the number is 40. The classes these numbers belong to are the quartile classes: the lower quartile class is 26 – 30 and the upper quartile class is 36 – 40.
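Step 2. Use the formulae below to obtain the lower and upper quartiles.
$Q_1 = L_{Q_1} + \left(\frac{\frac{N}{4} - cf}{f_{Q_1}}\right) i = 25.5 + \left(\frac{\frac{50}{4} - 10}{8}\right) 5 = 25.5 + \left(\frac{12.5 - 10}{8}\right) 5 = 25.5 + \left(\frac{2.5}{8}\right) 5 = 25.5 + 1.5625 = 27.06$
$Q_3 = L_{Q_3} + \left(\frac{\frac{3N}{4} - cf}{f_{Q_3}}\right) i = 35.5 + \left(\frac{\frac{150}{4} - 30}{10}\right) 5 = 35.5 + \left(\frac{37.5 - 30}{10}\right) 5 = 35.5 + \left(\frac{7.5}{10}\right) 5 = 35.5 + 3.75 = 39.25$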
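The grouped-data formula can be sketched in Python as follows. The function name grouped_quantile is hypothetical, and the sketch assumes whole-number scores, so that the lower class boundary is the lower class limit minus 0.5 (giving 25.5 and 35.5 as above) and the class size is upper limit − lower limit + 1.

```python
def grouped_quantile(classes, p):
    """Quantile of grouped data: L + ((p*N - cf) / f) * i.

    classes: (lower_limit, upper_limit, frequency) tuples in ascending order.
    """
    n_total = sum(f for _, _, f in classes)
    target = p * n_total                       # N/4 for Q1, 3N/4 for Q3
    cum = 0                                    # cumulative frequency below the current class
    for lower, upper, f in classes:
        if cum + f >= target:                  # first class whose cf reaches the position
            boundary = lower - 0.5             # lower class boundary, e.g. 26 -> 25.5
            width = upper - lower + 1          # class size, e.g. 26-30 -> 5
            return boundary + (target - cum) / f * width
        cum += f
    raise ValueError("position exceeds total frequency")

table = [(16, 20, 3), (21, 25, 7), (26, 30, 8), (31, 35, 12),
         (36, 40, 10), (41, 45, 6), (46, 50, 4)]
q1 = grouped_quantile(table, 0.25)
q3 = grouped_quantile(table, 0.75)
print(round(q1, 2), round(q3, 2), round((q3 - q1) / 2, 2))
```

For the table above it prints 27.06 39.25 6.09, the last value being the quartile deviation (Q3 − Q1)/2.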
OPTIONAL
Using Microsoft Excel
First/Lower Quartile
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Q1 (Lower Quartile).
4. Click the empty cell directly below where you typed Q1.
5. Click white space to the right of the fx symbol.
6. Type =QUARTILE(first data cell:last data cell,1), e.g. =QUARTILE(B2:B32,1). This means that the data begin at cell B2 and end at cell B32; the 1 means the first (lower) quartile.
7. Press Enter. The first (lower) quartile, Q1, is given in the empty cell clicked.
Third/Upper Quartile
1. Open Excel
2. Type in data to be used in one column, if data is not yet entered.
3. Click an empty cell where you want the result to be and type in Q3 (Upper
Quartile).
4. Click the empty cell directly below where you typed Q3.
5. Click white space to the right of the fx symbol.
6. Type =QUARTILE(first data cell:last data cell,3), e.g. =QUARTILE(B2:B32,3). This means that the data begin at cell B2 and end at cell B32; the 3 means the third (upper) quartile.
7. Press Enter. The third (upper) quartile, Q3, is given in the empty cell clicked.
An example is shown below.
$CV = \frac{QD}{Mdn} \times 100$
If the value of CV is greater than 33, the group is heterogeneous; otherwise it is homogeneous.
With this information, the teacher has to adopt a teaching method to suit
each group.
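Excel's QUARTILE function (kept for compatibility and equivalent to QUARTILE.INC) places the quartiles at position (n − 1)p + 1, not at the (n + 1)p positions used in the hand method, so the two can disagree on small data sets. Python's statistics.quantiles supports both conventions: method="exclusive" follows the (n + 1)p rule and method="inclusive" follows the QUARTILE.INC convention. The sketch below uses the eleven scores from the quartile deviation example; taking the exclusive quartiles for the QD-based CV is an assumption chosen to match the hand method.

```python
import statistics

scores = [8, 10, 12, 7, 6, 13, 18, 25, 4, 22, 9]

# 'exclusive' uses the (n + 1)p positions of the hand method;
# 'inclusive' uses (n - 1)p + 1 positions, the convention of Excel's QUARTILE.INC
q1_ex, _, q3_ex = statistics.quantiles(scores, n=4, method="exclusive")
q1_in, _, q3_in = statistics.quantiles(scores, n=4, method="inclusive")
print("(n + 1) rule:", q1_ex, q3_ex)
print("Excel style: ", q1_in, q3_in)

# Coefficient of variation based on the quartile deviation: CV = QD / Mdn x 100
qd = (q3_ex - q1_ex) / 2
mdn = statistics.median(scores)
cv = qd / mdn * 100
print(f"QD = {qd}, Mdn = {mdn}, CV = {cv:.1f} ->",
      "heterogeneous" if cv > 33 else "homogeneous")
```

It reports Q1 = 7.0 and Q3 = 18.0 under the (n + 1) rule but 7.5 and 15.5 in the Excel style; with QD = 5.5 and Mdn = 10 the CV is 55, so this group would be classed as heterogeneous.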
3. It does not make use of all the information provided by the scores.
Class Exercise
Distribution of examination scores for Statistics students
Classes    Frequency
46 – 50    12
41 – 45    14
36 – 40    12
31 – 35    6
26 – 30    5
21 – 25    1
Total      50
Practice Exercises
1. The standard deviation is a measure of the
A. Center of a distribution.
B. Center of the mean deviation.
C. Validity of a measurement.
D. Variability of a distribution.
5. When a distribution is highly skewed to the right, the most appropriate measure
of variability is the
A. first quartile.
B. mean deviation.
C. quartile deviation.
D. standard deviation.
6. The first quartile in a distribution of scores is 10.0. The third quartile in the same
distribution is 25.0. What is the value of the semi-interquartile range?
A. 7.5
B. 10.0
C. 15.0
D. It cannot be determined.
7. In a Psychology course quiz, the mean was 45 with a standard deviation of 15.
The instructor later added 10 points to every student's score. What is the new
standard deviation?
A. 10
B. 15
C. 25
D. It cannot be determined
UNIT 5
There are two main groups of measures: (1) percentiles and percentile ranks, and (2) Z scores and T scores. Z scores and T scores are often referred to as standard scores.
The main purpose of these measures is to describe an individual's position in relation to a known group, or norm group.
PERCENTILES
Definition: They are points in a distribution below which a given percent, P, of the
cases lie.
There are 99 percentiles that divide a distribution into 100 equal parts.
Percentiles are points on the score scale, i.e. they are expressed as scores.
Notation: P40 = 60. Sixty is the score below which 40% of the scores lie in a
specific group after the scores have been arranged sequentially. This
means that a student who obtains a score of 60 has done better than
40% of the members in the specific group.
P75 = 50. Fifty is the score below which 75% of the scores lie in a
specific group after the scores have been arranged sequentially. This
means that a student who obtains a score of 50 has done better than
75% of the members in the specific group.
A score in one group may be a different percentile in another group.
For example, in Statistics Quiz 1, a student with a score of 15 may be at P90 in the
Social Science group but the same score may put the student at P85 in the Home
Economics group.
P50 is the same as the median. P25 is the first quartile and P75 is the third
quartile.
PERCENTILE RANKS
Definition: The percentage of cases falling below a given point on the measurement scale. It is the position, on a scale of 100, at which an individual score lies.
Notation: PR of 60 = 75. Seventy-five is the position for a score of 60 when the
distribution is divided into 100 parts. This means that a student who
obtains a score of 60 has 75% of the scores falling below him/her in
the group.
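The definition translates directly into code. In this minimal Python sketch the scores are hypothetical; it counts only scores strictly below the given score, as the definition says. Some texts also add half of the scores tied with it, which would give a slightly different value.

```python
def percentile_rank(scores, x):
    """Percentage of scores that fall strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical scores for a group of 20 students, for illustration only
scores = [35, 42, 48, 50, 52, 55, 58, 60, 60, 63,
          65, 67, 70, 72, 75, 78, 80, 84, 88, 93]
print(percentile_rank(scores, 60))   # 35.0: 7 of the 20 scores lie below 60
```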
The easiest way to obtain percentiles and percentile ranks is to use the ogive
(cumulative percentage graph).
[Figure: ogive (cumulative percentage graph), cumulative percentage (0–100) against Scores (0–45)]
From the ogive, P60 = 34. PR of a score of 26 is 40.
Formula: $Z = \frac{X - \bar{X}}{s}$ and $T = 50 + 10Z$, where the T scale has a mean of 50 and a standard deviation of 10.
1. A student had a Z score of 2.5. The mean for the class was 60 with a standard deviation of 4.0. What was the student's observed score?
$Z = \frac{X - \bar{X}}{s} \rightarrow 2.5 = \frac{X - 60}{4} \rightarrow 10 = X - 60 \rightarrow X = 10 + 60 = 70$
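Given an observed score of 70, a Z score of 3.5 and a standard deviation of 5, the mean is:
$Z = \frac{X - \bar{X}}{s} \rightarrow 3.5 = \frac{70 - \bar{X}}{5} \rightarrow 17.5 = 70 - \bar{X} \rightarrow \bar{X} = 70 - 17.5 = 52.5$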
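Both worked examples rearrange the same formula. This Python sketch (the helper names are hypothetical) computes Z, converts it to T, and solves for either the observed score or the mean.

```python
def z_score(x, mean, sd):
    """Z = (X - mean) / s"""
    return (x - mean) / sd

def t_score(z):
    """T = 50 + 10Z"""
    return 50 + 10 * z

def score_from_z(z, mean, sd):
    """Rearranged: X = mean + Z * s"""
    return mean + z * sd

def mean_from_z(x, z, sd):
    """Rearranged: mean = X - Z * s"""
    return x - z * sd

print(score_from_z(2.5, 60, 4))      # 70.0
print(mean_from_z(70, 3.5, 5))       # 52.5
print(t_score(z_score(70, 60, 4)))   # 75.0
```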
USES
Salome has done better in Mathematics than Social Studies, considering the class
performance.
3. It helps the teacher to guide and counsel the student to choose the correct course for a future career and vocation.
Practice Exercises
1. Scores on an SSSCE Social Studies paper had a mean of 46. Joana obtained a
score of 70, giving her a standard score of 3.0. What was the standard deviation
of the scores?
A. 3
B. 8
C. 24
D. 72
Use the ogive below to answer questions 2 & 3.
[Figure: ogive of percentage cumulative frequency (0–100) against Scores (0–90)]
5. Joyce's percentile rank in an end of year examination was 80. Her actual
examination score was 95. This information means that she performed better
than _____ of the students in the class.
A. 5%
B. 20%
C. 80%
D. 95%
6. Paul's score in a final examination is at the 80th percentile of the scores in the class. Paul's score lies
A. above the third quartile.
B. at the median.
C. below the first quartile.
D. between the median and the third quartile.
MEASURES OF RELATIONSHIPS
Concept
Natural relationships exist in the world. Parents and children, as well as twins, have things in common. Males are normally attracted to females, and rain results in good harvests.
The concept of correlation provides information about the extent of the relationship
between two variables. Two variables are correlated if they tend to 'go together'.
For example, if high scores on one variable tend to be associated with high scores on
a second variable, then both variables are correlated.
The statistical summary of the degree and direction of the linear relationship or
association between any two variables is given by the coefficient of correlation.
Correlation coefficients range between -1.0 and +1.0. Correlation coefficients are
normally represented by the symbols, r and ρ (rho).
Scatter plots
A scatter plot or scatter diagram shows the nature of the relationship between any two variables. To obtain a scatter plot, each pair of scores is marked as a point on a graph at the intersection of the values of the two variables. The relationship shown may be either linear or curvilinear.
Examples
[Figure: two scatter plots, both axes 0–100: a linear relationship (Chemistry against Mathematics) and a curvilinear relationship (English against Accounts)]
Assumptions
1. The variables are random. Neither the values of X nor Y are predetermined.
2. The relationship between the variables is linear.
3. The probability distribution of X's, given a fixed Y, is normal, i.e. the sample is drawn from a joint normal distribution.
4. The standard deviation of X's, given each value of Y, is assumed to be the same, just as the standard deviation of Y's given each value of X is the same.
Assume the following scores in two tests.
Student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
X 14 16 15 10 9 18 18 14 12 13 15 18 10 12 16 20 15 12 14 10
Y 10 12 15 10 12 15 15 12 14 14 14 10 12 15 10 12 15 15 10 14
(a) Direction: Positive (+): high values go with high values and low values go with low values.
Negative (−): high values go with low values and low values go with high values.
Some examples
[Figure: scatter plot showing a moderate linear positive correlation]
Computational examples
$r = \frac{Cov(X, Y)}{S_X S_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_X S_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$ ……(1)
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$ ……(2)
Student   Quiz 1   Quiz 2
No.       X        Y        X − X̄   (X − X̄)²   Y − Ȳ   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
1         4        6        -2       4          -1       1          2
2         8        8        2        4          1        1          2
3         10       9        4        16         2        4          8
4         7        7        1        1          0        0          0
5         6        8        0        0          1        1          0
6         3        2        -3       9          -5       25         15
7         8        9        2        4          2        4          4
8         5        10       -1       1          3        9          -3
9         5        6        -1       1          -1       1          1
10        4        5        -2       4          -2       4          4
Total     60       70                44                  50         33
Note: X̄ = 6 and Ȳ = 7.
Using Formula 1:
$r = \frac{33}{\sqrt{(44)(50)}} = \frac{33}{46.9} = 0.7$
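Formulas (1) and (2) are algebraically equivalent. As a check, this Python sketch applies both to the Quiz 1 and Quiz 2 scores in the table above; each gives r ≈ 0.704, which rounds to the 0.7 obtained by hand.

```python
import math

x = [4, 8, 10, 7, 6, 3, 8, 5, 5, 4]    # Quiz 1
y = [6, 8, 9, 7, 8, 2, 9, 10, 6, 5]    # Quiz 2
n = len(x)

# Formula (1): deviation scores
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r1 = sxy / math.sqrt(sxx * syy)

# Formula (2): raw scores
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
num = n * sum_xy - sum(x) * sum(y)
den = math.sqrt((n * sum_x2 - sum(x) ** 2) * (n * sum_y2 - sum(y) ** 2))
r2 = num / den

print(round(r1, 3), round(r2, 3))
```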
2. The Spearman rank correlation coefficient (ρ): for ordinal-scale variables
$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$
Given the following scores:
$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 43}{10(100 - 1)} = 1 - \frac{258}{990} = 1 - 0.26 = 0.74$
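A minimal Python sketch of the rank formula follows. The helper names are hypothetical. Ranking from the smallest value upward is an assumption: ranking from the largest gives the same ρ as long as both variables are ranked the same way. Tied values receive the average of their ranks, and when ties are present the formula is only an approximation to Pearson's r computed on the ranks.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rho_from_d2(sum_d2, n):
    """Spearman's rho = 1 - 6*sum(d^2) / (N(N^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return rho_from_d2(sum_d2, len(x))

print(round(rho_from_d2(43, 10), 2))   # 0.74, as in the worked example
```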
$\Phi = \sqrt{\frac{\chi^2}{n}}$
This is used when there are only two sub-categories for both rows and columns, i.e. a 2 × 2 table.
$C = \sqrt{\frac{\chi^2}{n + \chi^2}}$
This is used when there are more than two sub-categories for the rows, the columns or both, i.e. 2 × 3, 3 × 3, 2 × 4, 3 × 4, etc.
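$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where $O_{ij}$ is the observed count in each cell and $E_{ij}$ is the expected count in that cell (row total × column total ÷ n).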
[Table: Result by Gender (Male, Female), with totals]
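$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{(150 - 125)^2}{125} + \frac{(100 - 125)^2}{125} + \frac{(50 - 75)^2}{75} + \frac{(100 - 75)^2}{75} = 5 + 5 + 8.33 + 8.33 = 26.66$
$\Phi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{26.66}{400}} = 0.258$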
The result shows that there is a weak association between gender and passing a driving test.
Example 2: Association between Halls of Residence and Region of Birth (3x3)
                  Hall of Residence
Region of Birth   Hall 1     Hall 2     Hall 3     Total
Region 1          40 (30)    30 (30)    30 (40)    100
Region 2          50 (45)    40 (45)    60 (60)    150
Region 3          30 (45)    50 (45)    70 (60)    150
Total             120        120        160        400
The figures in brackets are the expected counts in each cell.
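$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
$= \frac{(40-30)^2}{30} + \frac{(30-30)^2}{30} + \frac{(30-40)^2}{40} + \frac{(50-45)^2}{45} + \frac{(40-45)^2}{45} + \frac{(60-60)^2}{60} + \frac{(30-45)^2}{45} + \frac{(50-45)^2}{45} + \frac{(70-60)^2}{60}$
$= \frac{100}{30} + \frac{0}{30} + \frac{100}{40} + \frac{25}{45} + \frac{25}{45} + \frac{0}{60} + \frac{225}{45} + \frac{25}{45} + \frac{100}{60}$
$= 3.3 + 0.0 + 2.5 + 0.56 + 0.56 + 0.0 + 5.0 + 0.56 + 1.67 = 14.15$
$C = \sqrt{\frac{\chi^2}{n + \chi^2}} = \sqrt{\frac{14.15}{400 + 14.15}} = \sqrt{\frac{14.15}{414.15}} = \sqrt{0.0342} = 0.185$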
The result shows that there is a very weak association between hall of residence and region of birth.
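The χ², Φ and C calculations can be sketched in Python as below (the function names are hypothetical). Expected counts are computed as row total × column total ÷ n, which reproduces the bracketed values in the table. Because each term is kept at full precision, χ² comes out at about 14.17 rather than the 14.15 obtained from rounded terms, and C is still about 0.185.

```python
import math

def chi_square(observed):
    """Pearson chi-square, with expected counts E_ij = row total x column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    return chi2, n

def phi_coefficient(observed):
    """Phi = sqrt(chi^2 / n), intended for 2 x 2 tables."""
    chi2, n = chi_square(observed)
    return math.sqrt(chi2 / n)

def contingency_coefficient(observed):
    """C = sqrt(chi^2 / (n + chi^2)), for tables larger than 2 x 2."""
    chi2, n = chi_square(observed)
    return math.sqrt(chi2 / (n + chi2))

halls = [[40, 30, 30],    # Region 1: Hall 1, Hall 2, Hall 3
         [50, 40, 60],    # Region 2
         [30, 50, 70]]    # Region 3
chi2, n = chi_square(halls)
print(round(chi2, 2), round(contingency_coefficient(halls), 3))
```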
Uses of correlation in education
Practice Exercises
1. The Phi Coefficient (Φ) is the most appropriate measure of linear relationship
when two variables are:
A. Both continuous.
B. Both natural dichotomies.
C. Continuous and natural dichotomy.
D. Continuous and artificial dichotomy.
2. A University professor wishes to find the relationship between the age of the
students in her Statistics class and the scores in a quiz. What is the most
appropriate measure of relationship to use?
A. Pearson's product moment correlation coefficient
B. Phi coefficient
C. Point-biserial correlation coefficient
D. Spearman's rank correlation coefficient
3. Which of the following correlation coefficients indicates the strongest
relationship?
A. –0.6
B. 0.07
C. 0.25
D. 0.55
4. A teacher wishes to find the relationship between the spelling ability of the
students in her class and their performance in the end-of-semester examination.
What is the most appropriate measure of relationship to use?
A. Pearson's product moment correlation coefficient
B. Phi coefficient
C. Point-biserial correlation coefficient
D. Spearman's rank correlation coefficient
5. The correlation between study habits and achievement in Statistics has been
found to be 0.92 in a study. The study implies that a student with a
A. high score in study habits is more likely to score low in Statistics.
B. low score in study habits is more likely to obtain a moderate score in
Statistics.
C. low score in study habits is more likely to score low in Statistics.
D. moderate score in study habits is more likely to score high in Statistics.
6. A teacher wishes to find the relationship between the gender of students in his
class and their performance in the end-of-semester examination. What is the
most appropriate measure of relationship to use?
A. Phi coefficient
B. Spearman's rank correlation coefficient.
C. Point-biserial correlation coefficient.
D. Pearson's product moment correlation coefficient.
The graph below shows the relationship between achievement in Mathematics and
English. The correlation of the relationship is approximately
[Figure: scatter plot of English (0–60) against Mathematics (0–50)]
A. –0.8
B. –0.2
C. 0.3
D. 0.8
The scatter plot shows that students with low college entrance examination scores are
more likely to have
A. low college GPA.
B. high college GPA.
C. perfect college GPA.
D. zero college GPA.
UNIT 7
Conditions/Assumptions
1. The possible values of the independent variable, X, are fixed in advance.
2. The true relationship between the variables, X and Y, is linear and expressed by the equation $Y = a + bX + e_i$, known as the regression equation. a and b are parameters of the population and are estimated, while $e_i$ is the random error. The equation is the line of regression of Y on X; a is the Y intercept and b is the regression coefficient, or the slope of the regression line.
3. The probability distribution of Y's, given a fixed X, is normal.
[Figure: regression line showing the intercept a on the Y axis and the slope b]
1. The first step is to present the variables on a scatter diagram to be sure that the
relationship between the variables is linear.
2. Normal equations are solved to obtain the equations for the parameter estimates in
raw score form.
$Y = a + bX$
$\sum Y = na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Intercept, $a = \frac{\sum Y - b\sum X}{n}$ OR $a = \bar{Y} - b\bar{X}$
The intercept is the point on the Y axis where X, the independent variable has a value
of 0.
Example
The following scores were obtained in Quiz 1 and Final Examination.
Quiz 1   Final Exam
X        Y            XY      X²
18 75 1350 324
12 55 660 144
10 45 450 100
20 85 1700 400
15 65 975 225
15 65 975 225
14 60 840 196
10 60 600 100
12 50 600 144
11 50 550 121
18 70 1260 324
16 75 1200 256
9 45 405 81
13 60 780 169
17 70 1190 289
$\sum X = 210$, $\sum Y = 930$, $\sum XY = 13535$, $\sum X^2 = 3098$, so with n = 15, $\bar{X} = 14$ and $\bar{Y} = 62$.
$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{13535 - 15(14)(62)}{3098 - 15(14)^2} = \frac{13535 - 13020}{3098 - 2940} = \frac{515}{158} = 3.26$
OR $b = r\frac{s_Y}{s_X} = 0.93 \times \frac{11.77}{3.36} = 3.26$, where r = 0.93 is the correlation between X and Y, and $s_Y = 11.77$ and $s_X = 3.36$ are the standard deviations of Y and X.
$a = \frac{\sum Y - b\sum X}{n} = \frac{930 - 3.26(210)}{15} = \frac{245.4}{15} = 16.36$ OR $a = \bar{Y} - b\bar{X} = 62 - 3.26(14) = 16.36$
Use in prediction
After obtaining the estimates of a and b, the least squares regression line can be drawn using two values for X (including X = 0 to obtain the intercept). Corresponding Y values are obtained for the X values, and these values are used to draw the estimated regression line. Values can then be read from the regression line to obtain the predicted values.
Method 1
Given $\hat{Y} = 16.36 + 3.26X$, select two values, say 0 and 10, for X and compute the corresponding Y values. For example:
X = 0, $\hat{Y} = 16.36 + 3.26(0) = 16.36$
X = 10, $\hat{Y} = 16.36 + 3.26(10) = 48.96 \approx 49$
Plot the points (0, 16.36) and (10, 48.96) on a graph sheet and draw a straight line through them. Then estimate the Y value for any given X value from the regression line.
[Figure: the line $\hat{Y} = 16.36 + 3.26X$ through (0, 16.36) and (10, 48.96)]
Method 2
The estimated regression equation can be used by substituting the given X values to obtain the predicted values for Y.
i. What would be the exam score for a student who obtains 12.5 in Quiz 1?
$\hat{Y} = 16.36 + 3.26(12.5) = 57.11 \approx 57$
ii. A student obtained 72 in her exam. However, she did not take part in Quiz 1. What would be an estimate of her Quiz 1 score?
$72 = 16.36 + 3.26X$
$3.26X = 72 - 16.36 = 55.64$
$X = \frac{55.64}{3.26} = 17.07 \approx 17$
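The least squares estimates can be sketched in Python by solving the two normal equations directly; the sketch carries full precision throughout. The helper names predict_y and estimate_x are hypothetical. Estimating a Quiz 1 score from an exam score with the Y-on-X line mirrors the worked example; strictly, predicting X from Y calls for a separate regression of X on Y.

```python
# Quiz 1 (X) and Final Exam (Y) scores from the table above
x = [18, 12, 10, 20, 15, 15, 14, 10, 12, 11, 18, 16, 9, 13, 17]
y = [75, 55, 45, 85, 65, 65, 60, 60, 50, 50, 70, 75, 45, 60, 70]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)

# Normal equations:
#   sum(Y)  = n*a      + b*sum(X)
#   sum(XY) = a*sum(X) + b*sum(X^2)
# Eliminating a gives the slope; substituting back gives the intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

def predict_y(x_value):
    """Predicted exam score from the line Y-hat = a + bX."""
    return a + b * x_value

def estimate_x(y_value):
    """Quiz 1 score read back from the same line, as in the notes."""
    return (y_value - a) / b

print(f"Y-hat = {a:.2f} + {b:.2f}X")
print(round(predict_y(12.5)), round(estimate_x(72)))
```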
Practice Exercises
The following scores were obtained by 20 students in an intelligence test which was used to predict final examination scores in Educational Psychology.
2. A student obtained 98 in the intelligence test. What is his predicted final examination score in Educational Psychology?
3. A student obtained a final examination score of 75. What was her score in the
intelligence test?
Use the following information to answer questions 4 and 5.
A study gathers data on the outside temperature during February and March, in
degrees Celsius, and the amount of electricity an office complex consumes, in
kilowatts. Call the temperature, x and electricity consumed, y. The office complex
uses an air-conditioner, so x helps explain y. The least-squares regression line for
predicting y from x is
y = -1344 + 19x
4. It can be seen from the equation of the line that as the temperature goes up,
electricity used, y, goes
A. Down, because –1344 is less than 19.
B. Down, because the slope 19 is positive.
C. Up, because the intercept 1344 is negative.
D. Up, because the slope 19 is positive.
5. When the temperature goes up 1 degree, the electricity usage predicted by the
regression line goes
A. Down 1 kilowatt.
B. Down 19 kilowatts.
C. Up 1 kilowatt.
D. Up 19 kilowatts.
6. The following estimate of a least squares regression equation was obtained for
headteachers' supervision and achievement in a private school.
Y = –5.0 + 1.5X, where Y is the achievement score and X is the supervision
rating.
Mr. Danso obtained a supervision rating of 40. What is the estimate of his score
in achievement?
A. 6.5
B. 55.0
C. 65.0
D. 65.5
UNIT 8
The horizontal axis is measured in terms of standard deviation units. The values
decrease to the left and increase to the right from the centre.
Suppose the standard deviation is 4 with a mean of 21. The distribution takes the
form below.
SD units:   −3   −2   −1   μ    +1   +2   +3
Scores:      9   13   17   21   25   29   33
Symbol: $X \sim N(\mu, \sigma^2)$, where μ is the mean and σ² the variance. This is read as 'the variable X is normally distributed with a given mean and a given variance'.
Features
1. It is a bell-shaped curve.
2. It is unimodal.
3. It is symmetrical.
4. It is asymptotic: its tails approach the horizontal axis but never touch it.
5. The total area under the curve is 1.0.
6. The mean, mode and median are all equal.
7. When the values of a normal distribution have been converted to standard z-
scores, a standard normal curve is obtained. The standard normal curve has a
mean of 0 and a standard deviation/variance of 1.
[Figure: standard normal curve, horizontal axis marked from −3 to 3]
Mean = mode = median = 0
The mean corresponds to a Z value of 0.
8. Areas under the normal curve. These areas are obtained from the table of areas under the normal curve; refer to the Appendix for the areas.
Basic applications
Finding Probabilities
1. The distribution for a Statistics examination is normal with a mean of 60 and a variance of 64 (i.e. X ~ N(60, 64), so σ = √64 = 8). A student is selected at random from the class. What is the probability that the student selected obtains a score above 68? Above 76? Below 52?
$P(X > 68) = P\left(Z > \frac{68 - 60}{8}\right) = P\left(Z > \frac{8}{8}\right) = P(Z > 1) = 0.5000 - 0.3413 = 0.1587$
2. The distribution for a Statistics examination is normal with a mean of 60 and a variance of 64 (i.e. X ~ N(60, 64)). A student is selected at random from the class. What is the probability that the student selected obtains a score between 52 and 76? Between 68 and 76?
$P(52 < X < 76) = P\left(\frac{52 - 60}{8} < Z < \frac{76 - 60}{8}\right) = P\left(\frac{-8}{8} < Z < \frac{16}{8}\right) = P(-1 < Z < 2) = 0.3413 + 0.4773 = 0.8186$
$P(X < 12) = P\left(Z < \frac{12 - 16}{2}\right) = P\left(Z < \frac{-4}{2}\right) = P(Z < -2) = 0.5000 - 0.4772 = 0.0228$ (about 2%; actually 2.28%)
2. A distribution is normal with a mean of 50 and a standard deviation of 10. In a class of 2000 students, approximately how many students obtained scores above 70? Between 40 and 60?
$P(X > 70) = P\left(Z > \frac{70 - 50}{10}\right) = P\left(Z > \frac{20}{10}\right) = P(Z > 2) = 0.5000 - 0.4772 = 0.0228$
Number of students: 0.0228 × 2000 = 45.6 ≈ 46
3. In a promotion examination, the pass mark was fixed at 40. Given that the distribution is normal with a mean of 50 and a standard deviation of 5.1, approximately how many students in a class of 400 failed?
$P(X < 40) = P\left(Z < \frac{40 - 50}{5.1}\right) = P\left(Z < \frac{-10}{5.1}\right) = P(Z < -1.96) = 0.5000 - 0.4750 = 0.0250$
Number of students: 0.025 × 400 = 10
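Table look-ups like these can be checked with the NormalDist class in Python's standard statistics module (Python 3.8 or later). NormalDist takes the standard deviation, not the variance, so N(60, 64) becomes sigma=8. The rounded results match the answers obtained from the table.

```python
from statistics import NormalDist

exam = NormalDist(mu=60, sigma=8)          # X ~ N(60, 64): sigma is the square root of 64
p_above_68 = 1 - exam.cdf(68)              # P(X > 68), about 0.1587
p_52_to_76 = exam.cdf(76) - exam.cdf(52)   # P(52 < X < 76), about 0.8186

scores = NormalDist(mu=50, sigma=10)
above_70 = (1 - scores.cdf(70)) * 2000     # expected number above 70 out of 2000

promo = NormalDist(mu=50, sigma=5.1)
failed = promo.cdf(40) * 400               # expected number below the pass mark of 40

print(round(p_above_68, 4), round(p_52_to_76, 4), round(above_70), round(failed))
```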
Practice Exercises
5. Which of the following statements is true about the standard normal curve?
A. It is bi-modal.
B. Mean is 1.0.
C. Mean is less than median.
D. Variance is 1.0.