Fund-Statistics CorrectedVersion

CHAPTER 1
INTRODUCING STATISTICS
1.0 INTRODUCTION
In this chapter, we shall consider the following learning
objectives:
• Define statistics
• State some branches of statistics
• Identify some types of data
• Enumerate and discuss methods of data collection
• Discuss various types of sampling
1.1 DEFINITION OF STATISTICS

The word statistics is seen frequently in journals, books,
magazines and on TV. It means different things depending
on the context and usage. In the first place, the word
statistics means records or data. You may be required to
give the records (statistics) of students in a certain class,
and this may simply mean a data set consisting of
names, sex, and matriculation numbers. Secondly,
‘statistics’ is the plural of the word ‘statistic’. A
‘statistic’, also called an estimator, is a measurable
characteristic that is computed from a sample and used
for estimating the equivalent quantity in the population.
Some examples of statistics (estimators) are the sample mean,
sample standard deviation and sample variance.
Thirdly, statistics is a course of study which may be
defined as a branch of mathematics or science that deals
with methods of planning experiments for obtaining data
and drawing conclusions or making useful decisions on
the basis of available data. The subject itself may be split
into
(a) Statistical inference
(b) Descriptive statistics
(c) Sampling, and
(d) Decision analysis
In statistical inference, we draw conclusions (inferences)
about a population, by means of probability, based on
samples drawn from it. Descriptive statistics involves
characterizing and summarizing a given data set without
direct reference to inference. Sampling consists
of methods of obtaining samples for statistical inference.
In decision analysis, one considers the best action to
take with respect to several possible options that are
available. For instance, after a business executive has
been presented with the results of an investigation
concerning the expected sales resulting from opening a
new branch of the firm, he must decide whether to open
the branch or not.
The role of decision making has recently become
increasingly important and popular in the solution of
business problems. Such methods require a course in
elementary statistics as a basis, but many of them
require fairly sophisticated mathematical and statistical
techniques as well, and therefore are best considered in
a higher course.
1.2 TYPES OF DATA
Data can be classified in many ways. Data can be
either internal or external, primary or
secondary, and cross-sectional or time series. Another
classification treats data as either qualitative or
quantitative. Finally, numeric data can be either discrete or
continuous.
1.2.1 INTERNAL AND EXTERNAL DATA

An internal data set is one that is generated through routine
business tasks and activities. The data
may be generated by an individual, for example records of
students' performance kept by a lecturer, or observations
and measurements recorded by a scientist during a laboratory
experiment over a certain period. Data on
production and stocks kept by a manufacturing
company are also examples of internal data.
Data that come from outside the organization are simply
referred to as external data.
1.2.2 PRIMARY AND SECONDARY DATA

Primary data are measurements that are observed and
recorded by the user as part of an original study. Such
data are usually not available somewhere else since they
are basically compiled for the first time by the user
himself. Such data are normally used for the purpose for
which they are compiled.
Secondary data are those that have been generated and

assembled by someone else or by some other agency
which is different from the user. In many instances such
data may be used for various purposes that are different
from the initial objective which motivated the compilation
in the first instance.
1.2.3 CROSS-SECTIONAL AND TIME-SERIES DATA
Cross-sectional data are data obtained from a
‘cross-section of the population’ at one point in time.
A few examples of cross-sectional data are the
weights of first year students in the Department of
Mathematics of UST recorded on the first day of lectures
and infant mortality records in a hospital at a given time of
the month. Others are data relating to frauds in an
organization or the ages of some company directors at a
given time.
Time-series data, on the other hand, are a set of
records that are observed and recorded on the same unit
or object at different points in time. The time interval
between such recordings may be equal (periodic) or
unequal (nonperiodic). Examples of time-series data are
(i) Yearly production output of a manufacturing company
(ii) Monthly sales in a supermarket
(iii) Weekly records of rainfall
(iv) Daily records in the stock markets and exchange
rates
(v) Hourly temperature readings of a patient for three
days.
Data sets (i) and (ii) are low-frequency data, while (iii), (iv)
and (v) are called high-frequency data.
1.2.4 QUALITATIVE AND QUANTITATIVE DATA

When a variable measures a quality or attribute on each
experimental unit, it is a qualitative variable, and the
values assumed by this variable are called qualitative or
categorical data. These are data on a nominal or ordinal scale
that classify by a label or category. The labels may be
numeric or non-numeric. Examples of qualitative data
include the colour of one’s eyes, identified as being black,
brown or bright; one’s political affiliation, recorded as
Republican, Democratic, etc. Taste can be ranked as
very sweet, sweet, bitter and very bitter. Crops or food
products can be rated as being of grade 1, 2 or 3.
A quantitative variable measures a numerical quantity or
amount on each experimental unit. The values assumed by a
quantitative variable are quantitative
data. Quantitative data are data at the interval or ratio level
that use a naturally occurring numeric scale. Some
examples are data on exchange rates, scores of students
in a class test and weights of boxers.
1.2.5 DISCRETE AND CONTINUOUS DATA

Discrete data are numeric data in which the values can
only come from a finite or countable list of specific
values. Discrete data come from a counting process.
Data on class attendance for lecture days in a semester
and the number of voters in each ward of a local government area are
good examples of discrete data. Continuous data are
numeric data that can assume values at every point in a
given interval. Continuous data result from a measuring
process.
1.3 POPULATION AND SAMPLE

The word population is in common use among
the public, scientists and researchers. We talk about the
population of a village, city or nation to mean the total
number of people. The units making up a population need
not be people; they may be objects or items, for example houses or
primary schools. The word population therefore
means the total collection of persons, objects or items
under study or about which information is desired. There
is also the issue of a sample. A sample is a portion of the
whole population. We can also see it as a subset of the
population. An economist may want to study the
employment status of Nigerians between the ages of 24
and 30. He goes into a large city, maybe Lagos or
Abuja, to draw his sample. The youths between 24 and 30
years of age all over the nation constitute the target
population, whereas the youths in the large city from which the sample
items are drawn form the sampled population. We
can therefore define the target population as the set of all
population units about which inferences will be made,
while the sampled population is the population of elements to
which we have observational access by means of a
sampling frame. By a frame we mean a list from
which a sample can be drawn. An example of a frame is
the voters' register for a senatorial district of a state
of the Federal Republic of Nigeria.
When we collect data from a sample drawn from the population, we have
a sample survey. However, when we collect data from
the whole population, we have a census.
1.4 PARAMETER AND STATISTIC

A statistic (plural: statistics) is a measurable
characteristic of the sample. Some examples are the
sample mean, sample standard deviation, sample
variance, and sample correlation coefficient. A parameter
is a measurable characteristic of the population. A
parameter is to the population simply what the statistic is
to the sample. Examples of parameters are the population
mean, population standard deviation, population variance
and population correlation coefficient. A statistic, say b, is
also called an estimator, since it is used to find an
estimate of the parameter, say β, in the population.
Though the value of a parameter is generally unknown, it
is fixed and is not subject to variation. However, the
statistic b is a random variable, and is subject to
variation from sample to sample. If we take a sample and compute the mean
X̄, it is a random variable. This is not so with the population
mean µ, which is constant. The following table gives the
usual symbols for some statistics and parameters.
Table 1.1
Measure               Sample (statistic)    Population (parameter)
Mean                  X̄ (“X bar”)           µ (mu)
Standard deviation    S                     σ (sigma)
Variance              S²                    σ²
Correlation           R                     ρ (rho)
Size                  n                     N
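The contrast between a fixed parameter and a varying statistic can be illustrated with a minimal Python sketch. The population below is made up for the purpose (1000 simulated scores), and the sample size of 30, the number of samples and the seed are all assumptions chosen only for the demonstration.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the illustration is repeatable

# Hypothetical population: 1000 made-up test scores (an assumption)
population = [random.gauss(60, 10) for _ in range(1000)]

# Parameter: the population mean, computed once and fixed
mu = statistics.fmean(population)

# Statistic: the sample mean, recomputed for each new sample
sample_means = []
for _ in range(5):
    sample = random.sample(population, 30)  # sample of size n = 30
    sample_means.append(statistics.fmean(sample))

print(f"population mean (parameter): {mu:.2f}")
for k, xbar in enumerate(sample_means, start=1):
    print(f"sample {k} mean (statistic): {xbar:.2f}")
```

Each run of the loop gives a different sample mean, while the population mean does not change.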
1.5 SAMPLING AND SAMPLING DESIGNS

An estimator (statistic) is normally constructed from the
sample so that inferences can be made about a desired
parameter in the population. There are several reasons for
working with samples rather than the whole
population. Firstly, it is generally less costly.
It is obviously cheaper to obtain answers from
500 families than from 10000 families. Secondly, sampling
saves labour. A smaller team of personnel is needed for
field work and for tabulating and processing data. In the
third place, sampling saves time. Complete coverage
may take a long time, in some cases running into
years before the results are brought out, by which time many
people will inevitably have lost interest. Another practical
advantage is that sample coverage often allows a
higher overall level of accuracy than full enumeration.
Since the number of units is small, the quality of the field
work is at a higher level, more checks and tests for
accuracy can be performed at all stages, and more care
can be given to editing and analysis.
Sampling can be either probability sampling or non-probability
sampling. Probability sampling refers to
methods of sampling that involve random selection.
Examples are simple random sampling,
stratified random sampling, cluster sampling and two-stage
sampling.
When statistical randomness is not applied in selection,
we have non-probability sampling. Some examples are
haphazard sampling, judgmental sampling and quota
sampling.
1.6 SOME EXAMPLES OF PROBABILITY SAMPLING
1. SIMPLE RANDOM SAMPLING

In this case, each of the N population units has an
equal chance of being one of the n selected for
measurement and the selection of one unit is
independent of (that is, does not affect) the selection of
other units. This method of sampling is suitable for
estimating means and totals if the population is
free of major trends, cycles or patterns.
2. STRATIFIED RANDOM SAMPLING
This sampling design involves dividing the whole
N population elements into L parts or sub regions
called strata and applying simple random sampling
to each stratum. Stratified sampling is useful when
we have a heterogeneous population. The
population is now split into parts that are internally
homogeneous.
3. CLUSTER SAMPLING
Cluster sampling occurs where clusters of
individual elements (units) are chosen at random,
and all units in the drawn clusters are measured.
This is normally the situation where it is difficult or
at times impossible to select individual items but
where every item in the chosen clusters can be
measured. A typical example is that it may be
difficult to obtain a random sample of fish since
this would involve numbering the individual fish
and selecting n of them at random.
4. SYSTEMATIC SAMPLING
This involves taking measurements at places
and/or times following a spatial or temporal
pattern. In the case of spatial pattern, readings or
measurements may be taken at equidistant
intervals along a line or a grid pattern. For
temporal pattern, measurements or observations
may be taken at equal time intervals. This may be
the case in pollution studies where time and
location are significant in determining the level,
pattern or intensity of contamination or pollution of
some body of water such as a lake, river or experimental
pond.
5. TWO-STAGE SAMPLING
The first stage consists of dividing the population
into primary units and a sample of primary units is
obtained by simple random sampling. The second
stage is to take a random sample of
measurements from each unit in the original
sample. A typical example is getting soil samples
(primary units) at random and then selecting one
or more aliquots at random from each soil sample.
Multi-stage sampling is an extension of two-stage
sampling involving three or more stages.
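The designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame of 20 numbered units, the two strata, the four clusters and all function names are hypothetical, and the random starting point in the systematic design is a common convention rather than something the text prescribes.

```python
import random

random.seed(42)  # fixed seed so that the illustration is repeatable

# Hypothetical frame of N = 20 units, labelled 1 to 20
frame = list(range(1, 21))
# Hypothetical strata and clusters built from the same frame
strata = {"urban": frame[:12], "rural": frame[12:]}
clusters = {f"cluster_{k}": frame[5 * k:5 * k + 5] for k in range(4)}

def simple_random_sample(units, n):
    """Draw n units without replacement; each unit is equally likely."""
    return random.sample(units, n)

def stratified_sample(strata, n_per_stratum):
    """Apply simple random sampling separately within each stratum."""
    return {name: random.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def cluster_sample(clusters, m):
    """Choose m clusters at random and keep every unit in them."""
    chosen = random.sample(list(clusters), m)
    return {name: clusters[name] for name in chosen}

def systematic_sample(units, k):
    """Take every k-th unit, starting at a random point in the first k."""
    start = random.randrange(k)
    return units[start::k]

def two_stage_sample(primary_units, m, n_within):
    """Stage 1: choose m primary units. Stage 2: sample within each."""
    chosen = random.sample(list(primary_units), m)
    return {name: random.sample(primary_units[name], n_within)
            for name in chosen}

print(simple_random_sample(frame, 5))
print(stratified_sample(strata, {"urban": 3, "rural": 2}))
print(cluster_sample(clusters, 2))
print(systematic_sample(frame, 4))
print(two_stage_sample(clusters, 2, 2))
```

Note that cluster sampling and two-stage sampling differ only in the last step: the former measures every unit in each chosen cluster, the latter samples again within it.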
1.7 SOME EXAMPLES OF NON-PROBABILITY SAMPLING
1. HAPHAZARD SAMPLING
This type of sampling subscribes to the idea that the
place or time of sampling does not matter. The
philosophy encourages taking samples at
convenient times and locations. This generally results in
biased estimates of population parameters, so the method
is appropriate only when the target population is
completely homogeneous.
2. JUDGEMENT SAMPLING
The sample is obtained by subjective selection of
population units by an individual. This type of
design is useful when the target population is
clearly defined, homogeneous and completely
accessible so that one does not have the problem
of sample selection bias. At times, especially in
environmental modeling, some samples are
selected for reasons other than
the aim of using them to make inferences about
a wider population.
3. QUOTA SAMPLING
This is a type of stratified sampling in which the
selection within strata is non-random. The non-random element
inherent in this design constitutes its greatest
weakness.
The reader can find more on sampling in
Cochran (1977) and Gilbert (1987).
1.8 METHODS OF DATA COLLECTION

The data relevant to the problem at hand may
already be in published form, or the nature of the
problem may require that a sample survey be
carried out or a scientific experiment be conducted
to generate the necessary data. Data may therefore be
collected from publications, surveys and
experiments.
PUBLICATIONS: Data from published sources
automatically constitute secondary
data. Publications may come from government, for
instance, local, state or federal government-owned
agencies, e.g. the Federal Bureau of Statistics,
National Population Commission and Central
Bank of Nigeria. Another source of published data
is private organizations. These include financial
statements by companies. In addition, we have
publications by international organisations. The
Statistical Office of the United Nations is the
leading organization supplying international
information.
SURVEY: The purpose of a survey is to gather
statistical data which are not available in published
or any other form. There are many methods used
to obtain the required data. Two such methods are the
mail questionnaire and personal interviewing. The
mail questionnaire is a relatively easy method of
collecting data, especially if the respondents are
spread over a wide geographic area. The mail method
serves well for routine business and administrative
enquiries. It gives respondents enough time for thought and
consideration (this has both merit and
demerit). Embarrassment in answering questions
of a personal nature is avoided. The major
drawback of the mail enquiry is the low response rate.
In market research and public-opinion
investigations, where a high response rate is
required, personal interviews have been found to
be the best. Those who conduct interviews are
trained field staff armed with a standard
questionnaire relevant to the problem investigated. More on
surveys, mail questionnaires and personal
interviews may be obtained from Moser and
Kalton (1979) and Connor and Morell (1981).
Problem Set 1
1. What do you mean by the word ‘Statistics’?
2. Write briefly on the following;
(a) Internal and external data
(b) Primary and secondary data
(c) Cross-sectional and time series data
(d) Qualitative and quantitative data
(e) Discrete and continuous data
3. Give an example of each type of data in Question 2 above.
4. Define
(a) Population
(b) Sample
5. Define
(a) Statistic (estimator)
(b) Parameter
State the differences between a statistic and a
parameter
6. Write briefly on the following;
(a) Simple random sampling
(b) Stratified random sampling
(c) Cluster sampling
(d) Systematic sampling
(e) Two-stage sampling
7. Mention three examples of non-probability
sampling
8. What do you mean by probability sampling?
Mention at least four examples of probability
sampling. What do you think are the benefits of
using probability sampling?
CHAPTER 2
BASIC MATHEMATICAL CONCEPTS
2.0 INTRODUCTION
One does not have to be a mathematical guru to master
the statistical principles found in this book. The level of
mathematical sophistication required for a good
understanding of the basics of statistics is often
overstated. In practice, statistics requires a good
deal of algebraic manipulation, arithmetic computation,
sound logic and the patience to focus on a point until it is
firmly grasped. Apart from these foundational
requirements, not much is needed beyond familiarity with
several algebraic and arithmetic procedures that most of
us learnt earlier in our post-primary education.
In this chapter we will consider the grammar of mathematical
notation and discuss the different types of scales of
measurement and the conventions for the rounding of
numbers, among others.
2.1 THE LANGUAGE OF MATHEMATICAL NOTATION
It is necessary to consider separately the
three commonest notations and their involvement in
mathematical formulas and operations. These symbols
are ∑ (pronounced sigma), X and n.
VARIABLE: A variable is a symbol (X, Y, Q, etc.) that
may assume different values in a given problem.
Class attendance is a variable: it varies
from one lecture to another.
CONSTANT: A constant is a symbol that may
assume only one value in a particular situation. For
example, on a construction site, while wages do vary,
the lunch allowance may remain the same for every
labourer. In this case, the lunch allowance is a constant. A
bonus mark awarded by a lecturer in an examination is
the same for all the students in the course, hence it is a
constant. Given that X is a variable and it assumes many
values in a given problem, it is customary to denote the
total number of values by n. Suppose X assumes ten
values in a problem; we write n = 10 and place a
subscript on X, writing X1, X2, …, X10. The sum of
these numbers is
X1 + X2 + … + X10
In a condensed form we would represent the summation
as
$$\sum_{i=1}^{10} X_i$$
Suppose we are to add from i = 6 to i = 10; we write
$$\sum_{i=6}^{10} X_i = X_6 + X_7 + X_8 + X_9 + X_{10}$$
Generally, to add n items we use
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$$
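The summation notation translates directly into code. The following is a minimal sketch in Python, in which the ten values of X are made up for illustration. Python counts list positions from 0, so the term X6 sits at index 5.

```python
# Hypothetical values X1, X2, ..., X10 (made up for illustration)
X = [3, 6, 7, 9, 4, 8, 2, 5, 1, 10]
n = len(X)  # n = 10

# Sum from i = 1 to i = 10: all the values
total = sum(X)

# Sum from i = 6 to i = 10: list positions 5 to 9 inclusive
partial = sum(X[5:10])

print("n =", n)
print("sum for i = 1..10:", total)
print("sum for i = 6..10:", partial)
```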
2.2 RULES OF SUMMATION

The summation sign ∑ is one of the most
frequently used operators in statistics. Below we
summarize a few rules governing the use of the
summation sign.
Rule 1: The sum of a constant added together n times
is equal to n times that constant. Symbolically, if c is a
constant,
$$\sum_{i=1}^{n} c = nc$$
Thus, if c = 10 and n = 4,
$$\sum_{i=1}^{4} c = 10 + 10 + 10 + 10 = 4(10) = 40$$
Similarly, if $\bar{X}$ is the constant and $\bar{X} = 20$, n = 25,
$$\sum_{i=1}^{25} \bar{X} = n\bar{X} = (25)(20) = 500$$
Rule 2: Multiplying each value of a variable by a
constant and then summing the products yields the same
result as first summing the values and then multiplying
the sum by the constant. Symbolically,
$$\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$$
Thus, if c = 5 and n = 4, X1 = 3, X2 = 6, X3 = 7, X4 = 9,
$$\sum_{i=1}^{4} cX_i = cX_1 + cX_2 + cX_3 + cX_4 = (5)(3) + (5)(6) + (5)(7) + (5)(9) = 125$$
$$c\sum_{i=1}^{4} X_i = 5(3 + 6 + 7 + 9) = 5(25) = 125$$
Rule 3: Adding a constant to each value of a
variable and then summing these new values yields the
same result as first summing the values of the variable
and then adding n times the constant to that sum.
Symbolically,
$$\sum_{i=1}^{n} (X_i + c) = \sum_{i=1}^{n} X_i + nc$$
Imagine a sample in which n = 4 and X1 = 3, X2 = 6, X3 =
7, X4 = 9. The sum of the four values of the variable is
given by
$$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 3 + 6 + 7 + 9 = 25$$
To show the sum of the values of the variable when a
constant (c = 2) has been added to each,
$$\sum_{i=1}^{4} (X_i + c) = (3 + 2) + (6 + 2) + (7 + 2) + (9 + 2) = 5 + 8 + 9 + 11 = 33$$
$$\sum_{i=1}^{4} X_i + nc = (3 + 6 + 7 + 9) + 4(2) = 25 + 8 = 33$$
Rule 4: Subtracting a constant from each value of a
variable and then summing these new values yields the
same result as first summing the values of the variable
and then taking away n times the constant from that sum.
Symbolically,
$$\sum_{i=1}^{n} (X_i - c) = \sum_{i=1}^{n} X_i - nc$$
To verify the last formula,
let n = 4 and c = 2, with X1 = 3, X2 = 6, X3 = 7, X4 = 9.
$$\sum_{i=1}^{4} (X_i - c) = (3 - 2) + (6 - 2) + (7 - 2) + (9 - 2) = 1 + 4 + 5 + 7 = 17$$
$$\sum_{i=1}^{4} X_i - nc = (3 + 6 + 7 + 9) - 4(2) = 25 - 8 = 17$$
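The four rules can be checked numerically. The sketch below, in Python, reuses the values from the worked examples (X = 3, 6, 7, 9 with c = 10, 5 and 2); the variable names are chosen here only for illustration. For each rule it computes both sides of the identity separately so that they can be compared.

```python
X = [3, 6, 7, 9]  # the four values used in the worked examples
n = len(X)        # n = 4

# Rule 1: summing a constant n times gives n times the constant (c = 10)
c = 10
rule1 = (sum(c for _ in range(n)), n * c)

# Rule 2: a constant multiplier can be taken outside the sum (c = 5)
c = 5
rule2 = (sum(c * x for x in X), c * sum(X))

# Rules 3 and 4: adding or subtracting a constant inside the sum (c = 2)
c = 2
rule3 = (sum(x + c for x in X), sum(X) + n * c)
rule4 = (sum(x - c for x in X), sum(X) - n * c)

for name, (left, right) in [("Rule 1", rule1), ("Rule 2", rule2),
                            ("Rule 3", rule3), ("Rule 4", rule4)]:
    print(name, "left side =", left, "right side =", right)
```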
2.3 TYPES OF NUMBERS AND SCALES

A careful analysis reveals that most of the numbers we
use do not have the arithmetical properties we often
ascribe to them. We cannot legitimately apply the
operations of addition, subtraction, multiplication and
division to them. Some examples are batch numbers on
packages of manufactured products, telephone numbers,
residential addresses, vehicle registration numbers
and room numbers in a faculty building on a university
campus.
Numbers are used in different ways to achieve different
aims, and these aims may not include the representation of
an amount or quantity. In practice, there are four basic
ways in which numbers can be used. These are
1. To name (nominal numbers)
2. To represent position in a series (ordinal
numbers)
3. To represent quantity on a scale with an arbitrary zero
(interval numbers)
4. To represent quantity on a scale with a true
zero (ratio numbers)
Measurement on a unit or object is the assignment of
numbers to objects or events based on predetermined
(or arbitrary) rules. The different levels of measurement
depend on the level of numerical information
contained in a data set and the mathematical operations
that can be meaningfully performed on the numbers.
NOMINAL SCALE
Measurements on a nominal scale are from observations
of unordered variables (or categories) where numerical
values are assigned to represent various classes. We
encounter this when categorizing sex, political affiliation,
ethnic nationality, colour, religion etc.
ORDINAL SCALE
We use this level of measurement when we attach an order
or level of relevance to one class relative to another. The
numbers used in ordinal scales are nonquantitative.
They indicate only position in an ordered
series and not “how much” of a difference exists between
successive points on the scale. For example, three
students A, B and C may have actual marks of 80, 65 and 55
respectively. Though the first is ranked 1,
the second 2 and the third 3, we cannot tell the size of the difference
between ranks 1 and 2 without the actual scores.
Apart from ranking in examinations, other examples of
ordinal measurements are the rank ordering of football
players according to the number of goals scored in a
given season and the rank ordering of potential candidates for
a political post using any convenient popularity index from
opinion polls.
INTERVAL SCALE
On this scale, equal differences between numerical values
represent equal differences in the quantity measured, so
addition and subtraction are meaningful. Ratios are not,
because the scale lacks a true zero point;
its zero is fixed arbitrarily. The year zero does
not mean that there was no time before this year. Other
examples are the Centigrade and Fahrenheit scales. Zero
on these scales does not mean complete absence of
heat.
RATIO SCALE
Ratio scales are ones in which a true zero origin exists,
so that ratios of values are also meaningful.
Examples include the actual number of purchases by a lady
in a given visit to a supermarket, number of kilometres travelled,
number of years of university education and number of
children in a family.
One’s age is also measured from a true zero.
2.4 CONTINUOUS AND DISCRETE VARIABLES

A variable is said to be discrete if it can assume only a
finite or countable number of values. Discrete variables are
characterized by intervals in which no values of the
variable are found. Provided our counting is accurate,
observations of discrete variables are always exact.
Some examples of discrete variables are weekly class
attendance in algebra, number of births in the general
hospital every month and the number of students
admitted every year into the university. Taking class
attendance X for further illustration, suppose that for the first
three weeks the attendances are respectively 40, 41 and
50. We cannot talk of an attendance of 40.5 because there is no
“half” human being. Therefore, for a discrete variable,
there are gaps between adjacent values.
As we can see, between 40 and 41 the variable X has no
value. It would be wrong, however, to believe that discrete variables
necessarily involve only whole numbers. Nevertheless,
most of the discrete variables encountered in real life are
expressed in terms of whole numbers.
A continuous variable is one that is capable of assuming
an unlimited number of possible values between any two
values on the scale.
That is, there are no gaps in which values of the variable are
absent. For example, if the variable is age recorded in whole
years, 40 and 41 are
two adjacent recorded values. However, there
is an infinite number of values between 40
and 41 that a person’s age can take. Such a value can
be 40.1, 40.6 or even 40.9. Other examples of
continuous variables are height and weight.
Apart from the issue of “gaps”, measurement of discrete
variables is always exact while that of continuous
variables is always approximate. The approximation
comes in because of the level of accuracy we want. For
example, if we look at time, the time to finish a class test
may be recorded in hours and minutes, but the time to finish a 100-metre
race is recorded in seconds and fractions of a
second.
2.5 ROUNDING
Rounding a number involves representing it
with the required number of decimal places
or significant figures for a required level of accuracy to be
attained. Let us consider a few examples.
In each case let us round to the second decimal place:
47.546 becomes 47.55
47.542 becomes 47.54
2.987 becomes 2.99
3.534 becomes 3.53
3.532 becomes 3.53
What is the result of the rounding process if the digit at
the third decimal place is 5?
We consider the following cases.
1. If the third decimal place is 5 and the second
decimal place is an odd number, we simply add
one to the odd number.
Examples are
47.535 becomes 47.54
47.51501 becomes 47.52
2. If the third decimal place is 5, the second
decimal place is even and what follows the five is
all zeros, then we treat this 5 as zero and add it to the
even number. That is, the even number remains
the same in the rounding process.
Examples are
47.545 becomes 47.54
47.54500 becomes 47.54
3.00500 becomes 3.00
3. If the third decimal place is 5, the second
decimal place is even and what follows is not all
zeros, then we treat this 5 as one and add it to the even
number.
Examples are
47.545002 becomes 47.55
1.62503 becomes 1.63
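The three cases above amount to what is commonly called "round half to even". As a minimal sketch, Python's standard `decimal` module provides this rule under the name `ROUND_HALF_EVEN`. The helper name `round_2dp` is chosen here only for illustration. The numbers are passed as text because binary floating-point numbers cannot store most decimal fractions exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_2dp(text):
    """Round a decimal number, given as text, to two decimal places
    using the round-half-to-even rule."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

# Examples taken from the cases discussed in this section
examples = ["47.546", "47.542", "47.535", "47.51501",
            "47.545", "3.00500", "47.545002", "1.62503"]

for text in examples:
    print(text, "becomes", round_2dp(text))
```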
2.6 THE NOTION OF A FUNCTION

If a variable y depends on the values assigned to
another variable x, we say that y is a function of x and
write y = f(x), where f(x) is pronounced “f of x” or
“function of x”. The variable x is called the
independent variable and y the dependent variable. At times x may be
called the explanatory variable or input variable, while y
may be called the response variable or output
variable. We define an independent variable as the
presumed cause of any change in a response
variable, while the dependent variable is the presumed
effect of, or response to, a change (stimulus) in the
independent variable. Examples of functions are:
1. The consumption Y of a household is a
function of income X and we write
Y = f(X)
2. The weight W of a baby measured from birth is
a function of time t and we write
W = f(t)
2.7 TABLE OF VALUES: The functional dependence
between variables may be expressed by an equation
such as Y = 5X - 2, from which values of Y can be
determined corresponding to various values of X.
A table showing values of Y for given values of X is
called a table of values. If Y = f(X), it is conventional to let
f(2) denote the value of Y when
X = 2 and f(20) the value of Y when X = 20.
Thus if Y = 5X, f(3) = 5(3) = 15 is the value of Y when X = 3.
The concept of functional dependence may be extended
to two or more variables. For instance, Z may depend on
both X and Y. This is written as Z = f(X, Y). A numerical
example is
Z = 3X + 4Y + 8.
A function that depends on two or more independent
variables is called a function of several variables.
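A table of values is easy to generate by machine. The sketch below, in Python, uses the two equations quoted in this section; the range X = 0 to 5 and the function names `f` and `g` are arbitrary choices made only for illustration.

```python
def f(x):
    """Y = 5X - 2, the one-variable equation quoted in this section."""
    return 5 * x - 2

def g(x, y):
    """Z = 3X + 4Y + 8, the two-variable example in this section."""
    return 3 * x + 4 * y + 8

# Table of values for X = 0, 1, ..., 5 (an arbitrary range)
table = [(x, f(x)) for x in range(6)]

print(" X    Y")
for x, y in table:
    print(f"{x:2d}  {y:3d}")

# A function of several variables is evaluated at a pair of values
print("Z at (X, Y) = (1, 2):", g(1, 2))
```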
2.8 PERMUTATION AND COMBINATION

Consider n distinct objects from which we are interested in
choosing a subset. Let us suppose that the
size of the subset is r. The number of arrangements of n
distinct objects taking r at a time is called the number of permutations
of n objects taken r at a time. This is denoted by
$${}^{n}P_{r} = \frac{n!}{(n-r)!}$$
where n! = n(n-1)(n-2)…(2)(1) and 0! = 1. The symbol
n! is pronounced “n factorial”.
In combinations, we are interested in choosing or
selecting objects without regard to order. If we
have n distinct objects, the total number of selections of r
objects from n is called the number of combinations of n objects
taken r at a time. It is denoted by nCr and is defined as
$${}^{n}C_{r} = \frac{n!}{r!(n-r)!}$$
Example 2.8.1 Given the letters A, B, C, find the
number of
(i) Permutations of the letters taken 2 letters at a
time
(ii) Combinations of the letters taken 2 letters at a
time.
Solution
(i) Permutations: n = 3, r = 2, therefore
$${}^{3}P_{2} = \frac{3!}{(3-2)!} = \frac{6}{1} = 6$$
(ii) Combinations: n = 3, r = 2, therefore
$${}^{3}C_{2} = \frac{3!}{2!(3-2)!} = \frac{3!}{2!} = 3$$
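The counts in Example 2.8.1 can be confirmed both by formula and by listing. The following minimal sketch assumes Python 3.8 or later, where the standard library provides `math.perm` and `math.comb`; `itertools` is used to list the arrangements and selections themselves.

```python
import math
from itertools import combinations, permutations

letters = ["A", "B", "C"]
n, r = len(letters), 2

# Counts from the formulas n!/(n-r)! and n!/(r!(n-r)!)
n_perm = math.perm(n, r)
n_comb = math.comb(n, r)

# The same counts obtained by listing every case
arrangements = ["".join(p) for p in permutations(letters, r)]
selections = ["".join(c) for c in combinations(letters, r)]

print("permutations:", n_perm, arrangements)
print("combinations:", n_comb, selections)
```

The listing shows why the two counts differ: AB and BA are two arrangements but only one selection.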
Problem Set 2
1 Define the following terms giving two examples in

each case
(i) a discrete variable
(ii) a continuous variable
2 What do you mean by
(i) Independent variable
(ii) Dependent variable
3 State and explain the four basic levels of
measurement.
4 Given that the variable X assumes the values 3, 5,
7,7, 9 and 11, find
(i) (ii)
(iii) (iv)
i=1
5 Find the values of f(x,y) when (x, y) = (1, 2), (3, 5),
(0, 6), (7, 2), given that f(x,y) = 3xy + x² + y²
6 Draw a table of values for the function f(x) = 5x + 2
for x = 0,1,2, 3, 4, 5, 6, 7
7 Using the table of values in question 6, plot the
graph of y against x.
8 Round off the figures 97, 1136, 47, 80, 106 to (i)
the nearest 10 (ii) one significant figure. Compute
the absolute error, relative error and percentage
error in each case.
9 Distinguish between continuous and discrete
variables. Give examples of five continuous and
five discrete variables
10 Explain the following (a) percentage (b) rate (c)
ratio. Give an example to illustrate.
11 Find out the number of men and women in your
class. Then answer the following questions.
(a) What is the ratio of men to women?

(b) What is the ratio of women to men?
(c) What is the percentage of men in the class?
(d) What is the percentage of women in the class?
CHAPTER 3
DESCRIPTIVE CHARTS AND GRAPHS
3.0 INTRODUCTION
In this chapter, we shall consider, among others, pictorial
and graphical techniques of presenting data. Data need
to be arranged and presented in a suitable format before
the information contained in them can be given a
meaningful interpretation. This is so irrespective
of the data source.
Our data source may be primary or secondary. Primary
data may be a heap of completed questionnaires or a series
of numbers obtained from a field statistical survey.
Secondary data may be found in government bulletins,
company quarterly or annual reports, books and archival
records from which relevant information is collated for
presentation.
The sole objective of data presentation is to

communicate information clearly and at the same time
create impact. For instance, does the presentation
indicate whether national income is rising or falling? Has
volume of sales risen or fallen since the appointment of
new directors for the company?
3.1 SOME METHODS OF DATA PRESENTATION
To communicate useful information about a company to the
public, say its annual accounts or budget in a national
daily, the presentation may use the text method, tables of
figures (numbers), or charts and graphs accompanied by
tables and comments (reports).
Though a data set may contain a lot of information, poor
presentation may result in many issues being
overlooked. For a good result, at least the following
points should be incorporated into data presentation:
a) Clear statement of the subject matter
b) Statement of purpose of the presentation
c) Consideration of amount of detail
d) Choice of the most suitable method of
presentation
TEXT METHOD
This method is most often adopted by journalists in
their reports in dailies and magazines. The data are
not separated from the written report. A typical example
may take the form below.
Example 3.1: “ABC University admitted 2500 students
last year. Out of this number, 500 were in
management and social sciences and 500 were in
engineering, whereas education, science and law had 350
each, while the rest were in the faculties of agriculture
and environmental science.”
SEMI-TEXT METHOD
Here we combine text and table in order to present the
data.
Example 3.2: “Students’ admission into ABC University
last year was as follows:
Management and Social Sciences               500
Engineering                                  500
Education                                    350
Science                                      350
Law                                          350
Agriculture and Environmental Sciences       450
Out of 2500 students admitted, 100 were later sent away
for using forged SSC certificates.”
TABLE PRESENTATION
A table is a rectangular arrangement of data, with
headings for different rows and columns. More
elaborately a statistical table has the following features.
(a) Table number. This is a number giving the
position of a particular table among other tables in
the chapter.
(b) Title: This is a brief description of the objective of
the table.
(c) Stub and Caption: Stub is the first column that
contains headings for the rows. The heading for
the columns forms the captions while the data
(figures) constitute the body of the table.
(d) Source: This is a short note, normally at the
bottom of the table to indicate the source of the
data .
(e) Units: The units, be it for currency, space or
time, should be indicated. A table headed \(10^{-3}\) means
every entry should be multiplied by \(10^{-3}\); if the
heading is ‘000,000, every entry should be multiplied by
1,000,000.
(f) Approximations and Omissions can be
explained as footnotes.
An example of a statistical table is Table 3.1
given below.
Table 3.1: Aptitude Test Scores for Directors’ Positions,
Didi and Company.
S/No  Name        English  Numeracy  Computer  Management  Total
                  (25%)    (25%)     Literacy  Potentials  (100%)
                                     (25%)     (25%)
1     A. Simeons  20       16        20        21          77
2     P. James    15       16        17        22          70
3     C. George   19       22        21        20          82
4     R. Martins  23       15        15        17          70
5     J. Henry    19       22        18        21          80
RAW DATA AND ARRAY

Normally, we start with raw data: a data set that
awaits numerical organization. An example is the set of
ages of all management staff of Briscoe Motors in Port
Harcourt. When these data points are arranged in
ascending or descending order, we have an array. The
difference between the largest and smallest numbers is
called the range of the data.
Example 3.4: Construct an array (in ascending order) of
the ages 28, 20, 19, 51, 40, 18, 22 of staff in the
Marketing Department of Stern Motors. Find also the
range of the data.
Solution: The required array in ascending order is the
set 18, 19, 20, 22, 28, 40, 51 and the range of the data
set is 51-18 = 33.
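The steps of Example 3.4 can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are our own and are not part of the example.

```python
# Ages of staff from Example 3.4
ages = [28, 20, 19, 51, 40, 18, 22]

# An array is the data arranged in ascending (or descending) order
array = sorted(ages)

# The range is the largest value minus the smallest value
data_range = array[-1] - array[0]

print(array)
print(data_range)
```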
CLASSIFICATION
This comes in to save time and paper space when the
array is long. Before data can be tabulated, interpreted
and presented it must be classified. Classification is the
process of relating the individual items within a collected
data set and defining of various categories for them
(these separate items) in the table.
FREQUENCY DISTRIBUTION TABLE

One of the outcomes of tabulation is the construction of
frequency distribution tables. When summarizing large
bodies of raw data, it is usual to distribute the individual
items into classes or categories and to determine the
number of separate items falling into each class, called the
class frequency. A tabular arrangement of data by
classes along with the corresponding class frequencies is
called a frequency distribution or a frequency table.
Example 3.5: Table 3.2 is a frequency distribution from
the wage list of a firm, Stats & Co, employing 1000 persons.
Table 3.2: Frequency Distribution of Wages in Stats & Co
Classes (weekly wages     No. of Employees
to the nearest $)
41 – 60                          15
61 – 80                          35
81 – 100                        145
101 – 120                       620
121 – 140                       150
141 – 160                        35
Total                          1000
Data organized and summarized as in the table above
are often called a grouped frequency distribution. In the
case of Table 3.2, the variable values are grouped into
intervals to provide a summary which gives a clear
pattern of wages within the firm. Although the grouping
process generally hides much of the original detail of the
data, the process pays off since we can have the “overall
picture” of the distribution of wages in the firm and distinct
patterns are thereby made evident.
The first class consists of wages from 41 to 60 dollars
per week and is indicated by the symbol 41 – 60. Since 15
employees have wages belonging to this class, the
corresponding class frequency is 15.
Example 3.6: Table 3.3 is constructed from a traffic
survey that shows the flow of vehicles passing a
particular point during an hour.
Table 3.3: Traffic Survey
Vehicle        Frequency
Cars               40
Lorries            23
Motorcycles         7
Buses               4
In practice, the observer may have used a tally sheet,
recording each type of vehicle and putting a mark against
the appropriate category when a vehicle passed. The
tally sheet may have been as shown below (see Table
3.4, with 5 items making a ‘bundle’).
Table 3.4: Tally Sheet: Traffic Survey
Vehicles      Tallies                    Frequency
Cars          IIIII IIIII IIIII IIIII       40
              IIIII IIIII IIIII IIIII
Lorries       IIIII IIIII IIIII IIIII       23
              III
Motorcycles   IIIII II                       7
Buses         IIII                           4
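A tally sheet of this kind can be imitated in Python. The sketch below is illustrative only: the list of observations is hypothetical, constructed so that its totals agree with the tally sheet in Table 3.4, and the helper name tally_marks is our own.

```python
from collections import Counter

# Hypothetical record of vehicles, built to match the totals in Table 3.4
observations = ["Cars"] * 40 + ["Lorries"] * 23 + ["Motorcycles"] * 7 + ["Buses"] * 4


def tally_marks(count, bundle=5):
    """Return tally marks for `count`, grouped in bundles of `bundle` items."""
    full, rest = divmod(count, bundle)
    groups = ["I" * bundle] * full
    if rest:
        groups.append("I" * rest)
    return " ".join(groups)


# Counter plays the role of the observer marking each category
frequencies = Counter(observations)

for vehicle, count in frequencies.items():
    print(f"{vehicle:<12} {tally_marks(count):<50} {count}")
```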
CLASS INTERVALS AND CLASS LIMITS

In Table 3.2, the first class is represented by 41 – 60.
This representation is called a class interval. The end
numbers, 41 and 60 are referred to as class limits; the
first number 41 is the lower class limit and the second
number 60 is the upper class limit. The terms class and
class interval are often used interchangeably, in the
context of frequency distribution.
In practice, it is common to meet a class interval, which
has neither upper class limit nor lower class limit. Such a
class interval is called an open class interval. Let us look
at Table 3.5 below on employees’ ages in Stem Motors.
Table 3.5: Ages of Employees in Stem Motors
Age of Employees (in years)    Frequency
Under 20                           50
20 but under 40                   765
40 but under 60                   150
60 and over                        35
Total                            1000
Open-ended classes normally occur at the beginning or
end of a frequency distribution. This is true of the classes
“Under 20” and “60 and over” in Table 3.5.
One outstanding problem with a frequency
distribution that has open-ended classes arises when
calculations are needed, for instance in computing
class widths and class midpoints.
Looking at Table 3.5, we cannot be definite about the
lower class limit of the first class; we simply say it is less
than 20. Similarly, we cannot say whether the upper limit of
the last class is 62 or 65 or 70. We simply know that it is
above 60. The question is: how can we find the class
widths of the first and last classes? The solution should
accommodate both theoretical and practical
considerations.
The theoretical viewpoint can make use of the widths of
adjacent class intervals. Turning to our example in Table
3.5, because the third class interval is 40 but under 60, it
could be argued that it would be logically acceptable to
make the last class interval 60 but under 80. There may
also be practical considerations in classification. For
example, if the age of a school-leaver is 15, then the first
class interval will probably be “15 but under 20”. In
addition, using the example under consideration, if Stem
Motors’ management policy is retirement at 65, the upper
class would be 60 but under 65.
Decisions leading to the determination of open – ended

class intervals are matters of discretion, so it is
necessary to state (usually in a footnote or in the report)
the rationale behind the decision which has been taken.
If we record wages in Table 3.2 to the nearest dollar, the
class interval 41 – 60 theoretically includes all
measurements from 40.5 dollars up to 60.5 dollars. In this
circumstance, we refer to the numbers 40.5 and 60.5 as
class boundaries.
In empirical studies, we construct class boundaries by
adding the upper limit of one class interval to the lower
limit of the next higher class interval and dividing by 2.
We advise that some caution be applied in the use of
class boundaries. While we can use the class boundaries
40.5 – 60.5, 60.5 – 80.5, 80.5 – 100.5, … to represent
the classes in Table 3.2, in order to avoid ambiguity with
such notation, class boundaries should not coincide with
actual observations. For instance, if an observation were
60.5, it would not be possible to decide whether it
belonged to the class interval 40.5 – 60.5 or 60.5 – 80.5.
3.2 THE SIZE OR WIDTH OF A CLASS INTERVAL

The size of a class interval is the difference between the
lower and the upper class boundaries. Class size is also
called class width or class length. In Table 3.2, the class
size is the same for all the classes and is equal to 60.5
– 40.5 = 20. Whenever all classes have equal
size, we simply denote this common width by c.
3.3 THE CLASS MARK

The class mark, also referred to as class midpoint is
obtained by adding the lower and upper class limits and
dividing by 2. For example, the class mark for the class
41 – 60 is (41 + 60)/2 = 50.5.
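The class boundaries, class widths and class marks for the classes of Table 3.2 can be computed as in the sketch below. It assumes, as in the text, that wages are recorded to the nearest dollar, so that the upper limit of one class and the lower limit of the next are 1 apart. The variable names are illustrative.

```python
# Class limits (lower, upper) of Table 3.2
limits = [(41, 60), (61, 80), (81, 100), (101, 120), (121, 140), (141, 160)]

# Gap between the upper limit of one class and the lower limit of the next
gap = limits[1][0] - limits[0][1]

# Each boundary lies halfway between adjacent class limits
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in limits]

# Class width = upper class boundary minus lower class boundary
widths = [upper - lower for lower, upper in boundaries]

# Class mark = (lower class limit + upper class limit) / 2
marks = [(lower + upper) / 2 for lower, upper in limits]

for (lower, upper), width, mark in zip(boundaries, widths, marks):
    print(f"{lower} - {upper}   width {width}   class mark {mark}")
```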
3.4 GENERAL RULES FOR FORMING FREQUENCY

DISTRIBUTIONS
1. Find the largest and smallest numbers in the raw
data and use them to compute the range
(difference between largest and smallest
numbers).
2. Divide the range into a convenient number of
classes having the same size. If this is not
possible, use class intervals of different sizes or
open class intervals. Spiegel and Stephen (1999)
suggest that the number of class intervals be
taken between 5 and 20, depending on the data.
Other ways have been devised to obtain the
number of classes and eventually the width of the
class. Sturges (1926) proposed that if the size of
the data is n (where n = total of all frequencies),
the number of classes for the data set is roughly
given by \(1 + 3.322 \log n\), where the logarithm is to
base 10, and then the class width c is
\[ c = \frac{\text{Range}}{1 + 3.322 \log n} \]
Some caution should be exercised in obtaining
class intervals. The first is that class intervals should be
chosen so that the class marks coincide with
observed data values. This tends to minimize the
error arising from the grouping process involved in
computing some measures from the data.
Secondly, class boundaries should not coincide
with actually observed data.
3. Count the number of observations falling into each
class interval, i.e. find the class frequencies. The
use of a tally or score sheet enables the
frequencies to be found easily.
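The three rules above can be sketched in Python as follows. The data set is hypothetical, and several choices are our own assumptions rather than part of the rules: the number of classes from Sturges' rule is rounded to the nearest whole number, the class width is taken as the range divided by the number of classes and rounded up, and the first class is started at the convenient value 10.

```python
import math


def sturges_classes(n):
    """Approximate number of classes by Sturges' rule: 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)


def frequency_table(data, lower, width, k):
    """Count whole-number observations in k classes of equal width."""
    table = []
    for i in range(k):
        lo = lower + i * width
        hi = lo + width - 1  # upper class limit for whole-number data
        freq = sum(1 for x in data if lo <= x <= hi)
        table.append((lo, hi, freq))
    return table


# Hypothetical whole-number observations
data = [12, 15, 21, 22, 22, 27, 30, 34, 35, 38, 41, 44, 47, 52, 58, 59]

n = len(data)
data_range = max(data) - min(data)      # Rule 1: the range
k = round(sturges_classes(n))           # Rule 2: number of classes
width = math.ceil(data_range / k)       # Rule 2: class width, rounded up
table = frequency_table(data, lower=10, width=width, k=k)  # Rule 3

for lo, hi, freq in table:
    print(f"{lo} - {hi}: {freq}")
```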
3.5 RELATIVE FREQUENCY DISTRIBUTIONS
The relative frequency of a class is the frequency of that
class divided by the total frequency of all classes and is
usually expressed as a percentage.
As an example, the relative frequency of the class 101 –
120 in Table 3.2 is 620/1000 = 62%. The relative
frequencies of all classes sum to 1 or 100%.
A relative frequency distribution is obtained by replacing
class frequencies by relative frequencies. The relative
frequency distribution means the same as percentage
distribution or relative frequency table.
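As an illustration, the relative frequency distribution for Table 3.2 can be obtained as in the following sketch, with each relative frequency expressed as a percentage. The variable names are our own.

```python
# Classes and class frequencies from Table 3.2
classes = ["41-60", "61-80", "81-100", "101-120", "121-140", "141-160"]
frequencies = [15, 35, 145, 620, 150, 35]

total = sum(frequencies)

# Relative frequency (%) = class frequency / total frequency x 100
relative = [100 * f / total for f in frequencies]

for label, pct in zip(classes, relative):
    print(f"{label:<10} {pct}%")
```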
3.6 CUMULATIVE FREQUENCY DISTRIBUTIONS

The total frequency of all values less than the upper class
boundary of a given class interval is called the cumulative
frequency for that class. For instance, it can be seen from
Table 3.2 that the cumulative frequency up to and including
the class interval 101 – 120 is 15 + 35 + 145 + 620 =
815, indicating that 81.5% of the employees earn
between $41 and $120 a week. We also observe that
91.5% of the employees earn between $81 and $140 a
week, with only 3.5% of the employees earning beyond
$140 a week.
The 15 employees in the first class earn less than $61 a
week and are poorly paid compared with the rest, while the
35 employees in the last class, earning $141 or more a
week, are well paid.
From a table like this one, it is possible to see a
structure in the figures, which might be more difficult to
see by looking at a list of a thousand employees with
their respective wages.
A tabular arrangement showing classes and cumulative
frequencies is called a cumulative frequency distribution
or cumulative frequency table. A cumulative distribution
of wages in Table 3.2 is given in Table 3.6.
Table 3.6: Cumulative Frequency Distribution for
Wages in Table 3.2 (on “less than” basis)
Wages ($)           Number of Employees
Less than 40.5              0
Less than 60.5             15
Less than 80.5             50
Less than 100.5           195
Less than 120.5           815
Less than 140.5           965
Less than 160.5          1000
For some purposes it is desirable to consider a cumulative
frequency distribution of all values greater than or equal
to the lower class boundary of each class interval. Using
Table 3.2, we would consider the number of employees
earning $40.5 or more, $60.5 or more, and so on, and
construct a cumulative frequency table on a “more
than” basis.
Table 3.7: Cumulative Frequency Distribution for Wages
in Table 3.2 (on “more than” basis)
Wages ($)           Number of Employees
More than 40.5           1000
More than 60.5            985
More than 80.5            950
More than 100.5           805
More than 120.5           185
More than 140.5            35
More than 160.5             0
Table 3.6 gives a “less than” cumulative frequency.
Whenever we refer to cumulative distributions without
qualification, the “less than” type is implied.
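Both cumulative tables can be generated from the class frequencies of Table 3.2. The following is a minimal sketch with variable names of our own choosing; it uses a running total for the “less than” basis and subtracts that running total from the total frequency for the “more than” basis.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Table 3.2
boundaries = [40.5, 60.5, 80.5, 100.5, 120.5, 140.5, 160.5]
frequencies = [15, 35, 145, 620, 150, 35]

total = sum(frequencies)

# "Less than" basis: running total of the frequencies, starting from 0
less_than = [0] + list(accumulate(frequencies))

# "More than" basis: whatever is not yet counted at each boundary
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"boundary {b}: less than = {lt}, more than = {mt}")
```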
3.7 RELATIVE CUMULATIVE FREQUENCY

DISTRIBUTIONS
If we divide each cumulative frequency in Table 3.6 by
the total frequency, we obtain the relative cumulative
frequency. For example, the cumulative frequencies 0, 15,
50, 195, 815, 965, 1000 result respectively in relative
cumulative frequencies 0, 1.5%, 5%, 19.5%, 81.5%,
96.5% and 100%.
Problem Set 3
1. State at least three essential factors that should be
incorporated in any type of data presentation.
2. Explain the following terms:
a) Raw Data
b) Array
c) Tabulation
d) Classification
e) Frequency Distribution Table
3. a) Arrange the numbers 18,46, 39, 29, 7, 49,
12, 58, 35, 23, in an array and
b) Determine the range.
4. The final marks in Aptitude Test of 80 candidates

for job interview with Stem Research Group are
recorded below;
67 85 74 82 68 89 62 88 94
77 86 64 67 82 73 84 54 61
74 79 88 73 60 93 71 60 75
62 65 75 87 74 63 95 78 73
65 77 82 75 95 77 69 74 60
95 77 89 61 75 96 60 79 71
80 62 68 97 79 85 76 65 76
65 80 73 57 88 78 62 76 74
86 68 74 81 73 63 73 76
With reference to this data set find;
a) the highest mark,

b) the lowest mark
c) the range,
d) the marks of the five highest ranking
candidates,
e) the marks of the five lowest ranking
candidates,
f) how many candidates scored marks of 65 or
higher,
g) how many candidates scored marks below 75
h) what percentage of candidates scored marks
higher than 75 but not higher than 95,
i) Which marks did not appear?
CHAPTER 4
GRAPHICAL TECHNIQUES
4.0 INTRODUCTION
Tables, reports and frequency distributions are
some of the basic methods of presenting raw data.
The next stage is to present data graphically and
diagrammatically to make an immediate impact,
illustrate the information and bring out the salient
points.
Diagrammatic or pictorial presentation falls into
two main categories:
i) Charts
ii) Graphs, including frequency curves.
4.1 CHARTS
LINE CHARTS: These are the simplest diagrams
to represent a frequency distribution, where the
length of each line is proportional to the frequency.
Example 4.1: From the traffic survey in the last
chapter, the observations are shown again in Table
4.1 for a quick view.
Table 4.1: Traffic Survey
Vehicle        Frequency
Cars               40
Lorries            23
Motorcycles         7
Buses               4
A line chart can be drawn to represent this data.

The length (or height) of each line represents the
frequency.
PIE CHART
A pie chart is a circle divided into sectors to
represent each item or variable. Each sector of
the circle should have an area proportional to the
quantity of the variable.
Example 4.2: Table 4.2 below gives a brief annual
account of Uyi & Co for 1981/82. Represent the
data in the table by a pie chart.
Table 4.2: Finance of Uyi & Co for 1981/82
Item       Amount ($’m)
Costs          20
Profit         10
Taxes          15
Solution: The sectors representing costs,
profits and taxes respectively will be
\[ \left( \frac{20}{45} \times 360^\circ,\ \frac{10}{45} \times 360^\circ,\ \frac{15}{45} \times 360^\circ \right) = (160^\circ,\ 80^\circ,\ 120^\circ) \]
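The sector angles for any pie chart follow the same rule: each item's share of the total is multiplied by 360 degrees. A short sketch for the data of Table 4.2, with illustrative variable names:

```python
# Amounts ($ million) from Table 4.2
amounts = {"Costs": 20, "Profit": 10, "Taxes": 15}

total = sum(amounts.values())

# Sector angle (degrees) = item amount / total x 360
angles = {item: value * 360 / total for item, value in amounts.items()}

for item, angle in angles.items():
    print(f"{item}: {angle} degrees")
```

The angles add up to 360 degrees, which is a useful check before drawing.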
Pie charts are useful where there are a few items
which make up proportions of a whole and where
the proportions are more important than numerical
values. For instance, shareholders may be more
interested in the proportion of profit than in the actual
values; they want to know the size of their share of
the cake.
Pie charts are useful when:
1 There are few variables to be considered
2 These variables make up proportions of
one whole
3 The interest of users is more in proportions
than in numerical values
4 The aim is to provide a strong visual impact
Pie charts raise problems because:
1 They can involve long calculations (see
comparative pie charts)
2 They cannot provide information on
absolute values unless figures are inserted
in each segment
3 The segments cannot be scaled against a
single axis as the values can on a bar chart
4 To compare two totals, the areas of the pie
charts should be in proportion to the totals
of the data
COMPARATIVE PIE CHARTS

These are used to compare two sets of data,
usually two sets of the same items over time. The
areas of the circles must be in proportion to the
totals of the data.
We can illustrate the use of comparative pie charts
by using Table 4.3.
Table 4.3: Finances of Uyi & Co for 1980-82
           1980/81    1981/82
           ($’m)      ($’m)
Costs        20         20
Profit        2         10
Taxes         8         15
Total        30         45
A pie chart is constructed for 1980/81 with a radius (r) of,
say, 2 cm (it could have been 3 cm or any other convenient
length). Therefore \(r^2 = 4\) cm², and the 1980/81 total of 30
(in $’m) is represented by a circle of area \(\pi(4) = 4\pi\) cm².
Using simple proportion, the 1981/82 total of 45 would
be represented by an area of \((4\pi/30) \times 45 = 6\pi\) cm².
The radius of this circle is given by the equation
\(\pi r^2 = 6\pi\)
giving \(r = \sqrt{6} \approx 2.4\) cm.
Therefore the first pie chart will have a radius of 2 cm and
the second pie chart a radius of 2.4 cm, with the
appropriate segments for the three variables.
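Because the areas of the circles must be in proportion to the totals, the radii are in proportion to the square roots of the totals. The sketch below applies this to the totals of Table 4.3 (30 and 45); the function name comparative_radius is hypothetical.

```python
import math


def comparative_radius(base_radius, base_total, other_total):
    """Radius of a second pie chart whose area is in proportion to its total.

    pi * r2^2 / (pi * r1^2) = other_total / base_total,
    so r2 = r1 * sqrt(other_total / base_total).
    """
    return base_radius * math.sqrt(other_total / base_total)


# 1980/81 total of 30 drawn with a 2 cm radius; 1981/82 total is 45
radius_1981_82 = comparative_radius(2.0, 30, 45)

print(round(radius_1981_82, 1))
```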
BAR CHARTS
THE SIMPLE BAR CHART
If we want to plot the output of a company against time,

bar chart may be one option. This chart makes
comparison between the years easy. The height of each
bar shows the total output for each year. This chart
should not be confused with the histogram. It is more like
a line chart than a histogram because it is the height of
each bar, which is important.
The height (or length) represents the data; the width and
area of each bar are not important because they are not
drawn in proportion to any data, as they are in a histogram.
[Bar chart: Total Output of Company A (vertical axis) for
the years 2006, 2007 and 2008 (horizontal axis)]
Figure 4.1: The Simple Bar Chart
Simple bar charts (for instance Figure 4.1) can be used
to illustrate only simple pieces of information, but they
illustrate information clearly and provide a reasonable and
immediate visual impact. Figure 4.1 shows that the
total output of Company A has increased over the three
years, and that the rate of increase between 2006 and 2007 is
virtually the same as that between 2007 and 2008.
The bar chart may be drawn with horizontal bars (Figure

4.2) if this will improve visual impact of the chart. This is
quick in showing positive and negative changes of some
quantities over two consecutive periods.
[Horizontal bar chart: changes in output, on a scale from
-5 to 5, for Chemicals, Construction, Machines and Textiles]
Figure 4.2: Horizontal Bar Chart: Changes in Output over
Some Years.
THE COMPOUND BAR CHART
This form of bar chart is useful for comparing a number of
items within, say, a year, as well as comparing the items
between years (see Figure 4.3). The compound bar chart
is sometimes called a “multiple bar chart”.
[Compound bar chart: output for the years 2005, 2006 and 2007]
Figure 4.3: The Compound Bar Chart: Output for an
Electrical Company.
COMPONENT BAR CHARTS
These are useful to show the division of the whole of an
item into its constituent parts. This is particularly
necessary when the number of variables, say in one year,
increases. When this happens, the bars for one year rise in
number and so take up much space. When there are
three bars (as in Figure 4.3), we can handle them
comfortably, but when they go on to four or more, we have
a problem of space. Figure 4.4 shows the output of goods
for an electrical company over three years.
[Component bar chart: one bar each for 2005, 2006 and 2007,
divided into Television, Radios and Cell phones]
Figure 4.4: The Component Bar Chart: Output for an
Electrical Company.
This type of chart enables a comparison of the total

output to be made easily, while the compound bar chart
emphasizes the comparisons between items.
HISTOGRAMS
The histogram is an extension of the line chart.
A histogram or frequency histogram consists of a set of
rectangles having
1. Bases on a horizontal axis (the x-axis)
with centres at the class marks and lengths
equal to the class interval sizes
2. Areas proportional to class
frequencies. If the class intervals all have equal
size, the heights of the rectangles are proportional
to the class frequencies and it is then customary to
take the heights numerically equal to class
frequencies.
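When class intervals are not all of equal size, the heights must be adjusted so that the areas, not the heights, are proportional to the class frequencies. The sketch below illustrates this with a hypothetical regrouping of Table 3.2 in which the first two classes are merged into one class of width 40; here each height is taken as frequency divided by class width, which is one convenient choice of scale.

```python
# Hypothetical regrouping of Table 3.2: the classes 41-60 and 61-80 are merged
boundaries = [40.5, 80.5, 100.5, 120.5, 140.5, 160.5]
frequencies = [50, 145, 620, 150, 35]

# Class width = upper boundary minus lower boundary
widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

# Height chosen so that area (height x width) equals the class frequency
heights = [f / w for f, w in zip(frequencies, widths)]

# Check: the area of each rectangle recovers the class frequency
areas = [h * w for h, w in zip(heights, widths)]

for w, h in zip(widths, heights):
    print(f"width {w}: height {h}")
```

Note that the merged class has the largest width but a small height, so it is not visually exaggerated.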
STEM AND LEAF PLOT
In the course of constructing histogram we first of all
construct a frequency distribution table so that a large
data set can be put into a more manageable form. This
without doubt involves some loss of information. Over
the years similar techniques have been proposed and
found to provide a useful and quickly formed pictorial
representation of the data set. This representation is
quite adequate for preliminary exploratory purposes of
small data sets without any loss of the numerical values
of the variable involved.
One such representation that meets the conditions

outlined above is the stem and leaf plot invented by J.W.
Tukey in 1977.
For illustration purpose consider the following

experimental results rounded to the nearest whole.
28 45 13 54 22 35 38 26 47
23 18 25 27 33 35 16 46 22
29 39
We might group these data into the following distribution.
Experimental
Results        Tally        Frequency
10-19          III              3
20-29          IIIII III        8
30-39          IIIII            5
40-49          III              3
50-59          I                1
Here the tally pictures the overall pattern of the data
like a histogram (or bar chart) lying on its side. In
Problem Set 4, problem 6, this notion is taken up again
to enhance our understanding of the similarity between
the stem-and-leaf plot and the histogram.
If we want to avoid the loss of information inherent in the
above table, we could replace the tally marks with the
last digits of the corresponding readings, getting
10-19 │ 3 8 6
20-29 │ 8 2 6 3 5 7 2 9
30-39 │ 5 8 3 5 9
40-49 │ 5 7 6
50-59 │ 4
This can also be written as
1* │ 3 8 6
2* │ 8 2 6 3 5 7 2 9
3* │ 5 8 3 5 9
4* │ 5 7 6
5* │ 4
where * is a placeholder for 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9, or
simply as
1 │ 3 8 6
2 │ 8 2 6 3 5 7 2 9
3 │ 5 8 3 5 9
4 │ 5 7 6
5 │ 4
In either of these final forms, the table is called a
stem-and-leaf plot (or simply a stem-leaf plot). Each line is
a stem and each digit on a stem to the right of the vertical
line is a leaf. To the left of the vertical line are the stem
labels, which in our example are 1*, 2*, …, 5*, or 1,
2, …, 5.
Essentially, a stem – and – leaf plot presents the same
picture as the corresponding tally, yet it retains all the
original information. For instance, if a stem-and-leaf plot
has the stem
1.2* │ 5 6 3 0 7
the corresponding data are 1.25, 1.26, 1.23, 1.20 and
1.27. If a stem-and-leaf plot has the stem
0.3** │ 18 05 66 79
with two-digit leaves, the corresponding data are 0.318,
0.305, 0.366 and 0.379.
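For two-digit whole numbers such as the twenty experimental results above, a stem-and-leaf plot can be built by splitting each reading into its tens digit (the stem) and its units digit (the leaf). The following sketch is illustrative only; the function name stem_and_leaf is our own, and the leaves are kept in the order in which the readings occur.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit whole numbers by tens digit, keeping units digits as leaves."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # e.g. 28 -> stem 2, leaf 8
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))


# The experimental results used in the text
results = [28, 45, 13, 54, 22, 35, 38, 26, 47,
           23, 18, 25, 27, 33, 35, 16, 46, 22,
           29, 39]

plot = stem_and_leaf(results)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```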
There are various examples of stem and leaf plot and a

lot of ways have been suggested to circumvent their
potential difficulties in order to meet particular needs.
However, we shall not delve into this here in any detail
since our objective has been to present one of the
relatively new techniques which come under the general
heading of exploratory data analysis.
4.2 GRAPHS
We shall consider a set of graphs that are connected with
frequencies in one way or another. Some of them are the
frequency polygon and the cumulative frequency polygon,
or ogive.
FREQUENCY POLYGON
A frequency polygon is a line graph of class frequency
plotted against class mark. It can be obtained by
connecting midpoints of the tops of the rectangles in the
histogram.
CUMULATIVE FREQUENCY POLYGON
A cumulative frequency on the basis of “less than any
upper class boundary” plotted against the upper class
boundary is called a cumulative frequency polygon or ogive.
Problem Set 4
1. Draw a line Chart for data in Table 4.1.

2. Draw a simple Pie Chart for data in Table 4.2.
3. Draw comparative pie Charts for data in Table 4.3.
4. Use the data of Problem Set 3, No. 4 to construct
(a) a histogram
(b) a frequency polygon
(c) a percentage cumulative frequency ogive.
5. What is the basic difference between a
histogram and a bar chart?
6. For the data below
83, 105, 108, 110, 110, 110, 113, 116, 123, 123, 123, 123, 122,
126, 126, 126, 126, 126, 126, 130, 134, 134, 134, 141, 141,
141, 145, 155, 160, 164, 170, 176, 176, 180, 180, 180, 207
(a) Group the measurement into a distribution having

classes 80 – 89, 90 – 99…. 200 – 209 and
construct a histogram.
(b) Construct a stem – and leaf plot for the data
(c) Rotate your plot in (b) by 90º so that the stem 8/2
is the first and 20/7 last
(d) Compare (a) and (c) and comment.
CHAPTER 5
NUMERICAL TECHNIQUES
5.0 INTRODUCTION
In this chapter, we shall treat some summary

measures, which broadly speaking fall into two
classes. One is called measures of central tendency
and the other is called measures of dispersion. We
shall also discuss methods for computing these
measures.
5.1 THE MEANING AND ROLE OF THE AVERAGE
Measures of central tendency are also referred to as

measures of location. One word for any measure of
central tendency is the average. As a measure of central
tendency an average provides a value around which a
set of data is located. Average is also called a measure
of location because it gives an idea of where the values
are in the data set. An average price can give an
indication of whether a particular commodity is likely to
cost £10 or £100 or £1000.
An average summarizes a set of values and represents it
in the sense that the average gives an immediate idea about
the group. An average can provide a description of a
group of items so as to differentiate it from another set
with similar characteristics.
Therefore an average gives a concise description of a
data set, and because of this, there are a number of
averages, which can be used depending on the type of
description required. The three most commonly used
averages are the arithmetic mean, the median and the
mode.
Averages are used all the time in everyday life and work.
The statement that “inflation has risen by 5% in the last
year” is usually qualified by the word “average”. The
average consumer is said to buy more this year than last
year; the average temperature has risen or fallen. It is
also common to talk of the average height of people, the
average price of electrical goods, and so on.
In general averages can be said to:
1 Summarize a set of numbers, smoothing out

abnormalities in a way that is helpful in making
comparison.
Example 5.1
10 candidates scored an average mark of 72 in Lagos
centre in an aptitude test for top positions in a certain
company. At Abuja, the average score for 10 candidates
was 82. The overall average is 77 and each candidate’s
score can be compared to this value.
Example 5.2
Two firms A and B may pay very widely varying wages to
similar types of employees. In firm A, wages may vary
between N450 and N1250 a week; in firm B, wages may
vary between N850 and N1500. Both firms may have an
average wage of N1000 for this type of employee.
2 An average can give a mental picture of the

distribution it represents
Example 5.3
It may be recorded that a shop that is for sale has
average weekly sales of N200000. This provides
an immediate idea of the size of the business,
although sales might have been N700000 in
Christmas week and N80000 in the worst week.
3 An average can provide valuable knowledge about

the whole distribution.
Example 5.4
If the average wage in a factory is N5000 and there are
3000 employees, then it can be deduced that the weekly
wage bill is N15000000.
4 The word “average” is common in daily
interactions and at times is used loosely.
Example 5.5
The statement “I think that on average I use about 30
litres of petrol a week” uses the average as an
estimate.
5 Average can conceal vital information.
Example 5.6
Two companies A and B may both have average annual
profits of £5,000,000 over the last five years. However,
when their actual records are inspected it may be found
that the companies’ performances are very different.
             2001       2002      2003      2004      2005
Company A:  11000000   8000000   3500000   2000000    500000
Company B:    500000   2000000   3500000   8000000  11000000
In the same way, it is important to know not only the

average but also other figures such as the minimum and
maximum. An engineer designing an irrigation project
must know not only the average rainfall of the area, but
also the maximum.
6 Therefore averages can provide useful numerical

hints at the first stage of an investigation, but do
not provide all the information required for many
purposes.
5.2 SOME MEASURES OF LOCATION
THE ARITHMETIC MEAN
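The arithmetic mean or the mean of a set of n numbers \(X_1,
X_2, \ldots, X_n\), denoted by \(\bar{X}\), is defined as
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (5.1) \]
Example 5.7: Find the arithmetic mean of the numbers
15, 17, 18, 20, 30.
Solution
The arithmetic mean of the numbers 15, 17, 18, 20, 30 is
\[ \bar{X} = \frac{15 + 17 + 18 + 20 + 30}{5} = \frac{100}{5} = 20 \]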
THE ARITHMETIC MEAN OF A FREQUENCY
DISTRIBUTION
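If the numbers \(X_1, X_2, \ldots, X_k\) occur respectively with
frequencies \(f_1, f_2, \ldots, f_k\), the arithmetic mean is
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{k} f_i X_i}{n} \qquad (5.2) \]
where \(n = \sum_{i=1}^{k} f_i\) and k is the number of classes.
If we have class intervals, \(X_i\) (i = 1, 2, …, k) is the
class mark of class i.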
Example 5.8:
Find the arithmetic mean for the price of transistor radios
in Table 5.1.
Table 5.1: Price of Transistor Radios
Price (N)    Number
2000            2
2400            6
2500           10
3000            4
3200            3
Solution:
Here, n = ∑f = 2 + 6 + 10 + 4 + 3 = 25
∑fX = 2(2000) + 6(2400) + 10(2500) + 4(3000) + 3(3200)
= 4000 + 14400 + 25000 + 12000 + 9600
= 65000
\(\bar{X}\) = ∑fX / ∑f = 65000 / 25 = N2600
The result shows that although there are some transistor

radios priced at over N3000 and others at N2000, the
average price for this selection is N2600.
If this was a random sample of all transistor radios then
the consumer would know that N2600 was likely to be a
representative price for this type of radio.
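The calculation in Example 5.8 can be reproduced in Python as a check. The sketch uses the prices as they appear in the worked solution (2000, 2400, 2500, 3000 and 3200); the variable names are our own.

```python
# Prices (N) and numbers of radios, as used in the solution to Example 5.8
prices = [2000, 2400, 2500, 3000, 3200]
counts = [2, 6, 10, 4, 3]

# n is the total frequency
n = sum(counts)

# Sum of frequency x value over all prices
sum_fx = sum(f * x for f, x in zip(counts, prices))

# Arithmetic mean of a frequency distribution
mean_price = sum_fx / n

print(n, sum_fx, mean_price)
```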
The mean can also be computed using the formula
\[ \bar{X} = A + \frac{\sum_{i=1}^{k} f_i d_i}{n} \qquad (5.3) \]
where \(d_i = X_i - A\) and A is any guessed or assumed
mean. We can use any numerical value for the number
A.
THE ARITHMETIC MEAN OF A GROUPED
FREQUENCY DISTRIBUTION
For a grouped frequency distribution, all values falling
within a given class interval are considered as coincident
with the class mark of that interval. Formulae (5.2) and
(5.3) are suitable for such grouped data if we interpret \(X_i\)
as the class mark or midpoint of the interval with the
corresponding class frequency \(f_i\). The number A is any
guessed or assumed class mark, and \(d_i = X_i - A\) are the
deviations of \(X_i\) from A.
If class intervals all have equal size c, the deviation d i =

Xi – A can all be expressed as cui where ui can be
positive or negative integers or zero, i.e. 0, ± 1, ± 2, ± 3,
…, and formula (5.3) becomes
= A + (5.4)
.where
cui = di = Xi – A (5.5)
Formula (5.4) is called the coding method for computing
the mean. It is a very short method and should be used
always for grouped data with equal class interval sizes.
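The three formulae for the mean can be checked against each other numerically. The following is a minimal sketch in Python; the function names are chosen here for illustration and are not part of the text. It applies formulae (5.2) and (5.3) to Table 5.1, and formula (5.4) to a small hypothetical table whose class marks are equally spaced.

```python
# Table 5.1: prices X and their frequencies f.
prices = [2000, 2400, 2500, 3000, 3200]
freqs = [2, 6, 10, 4, 3]


def mean_direct(x, f):
    """Formula (5.2): sum of f*X divided by n = sum of f."""
    n = sum(f)
    return sum(fi * xi for fi, xi in zip(f, x)) / n


def mean_assumed(x, f, a):
    """Formula (5.3): A + sum(f*d)/n, with d = X - A."""
    n = sum(f)
    return a + sum(fi * (xi - a) for fi, xi in zip(f, x)) / n


def mean_coded(x, f, a, c):
    """Formula (5.4): A + c*sum(f*u)/n, with u = (X - A)/c."""
    n = sum(f)
    return a + c * sum(fi * ((xi - a) / c) for fi, xi in zip(f, x)) / n


# Hypothetical class marks with equal class width c = 10.
marks = [5, 15, 25, 35]
mark_freqs = [1, 2, 3, 4]

print(mean_direct(prices, freqs))            # direct method, Table 5.1
print(mean_assumed(prices, freqs, 2500))     # assumed mean A = 2500
print(mean_coded(marks, mark_freqs, 25, 10)) # coding method, A = 25
```

Whatever value is guessed for A, the assumed-mean and coding methods return the same mean as the direct method; a good guess simply keeps the intermediate numbers small.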
THE MEDIAN
The median of a set of numbers arranged in order of
magnitude is the middle value or the arithmetic mean of
the two middle values.
Example 5.9: The set of numbers 60, 70, 100, 115 and 320
has median 100.
Example 5.10: The set of numbers 5, 5, 7, 9, 11, 12, 15,
18 has median ½ (9 + 11) = 10.
THE MEDIAN OF A GROUPED FREQUENCY

DISTRIBUTION
For grouped data the median, obtained by interpolation,
is given by
$X_{MED} = L_0 + \left( \frac{n/2 - (\sum f)_1}{f_{MED}} \right) c$ (5.6)
where L0 = lower class boundary of the median class,
n = number of items in the data (i.e. total frequency),
$(\sum f)_1$ = sum of frequencies of all classes lower than
the median class,
$f_{MED}$ = frequency of the median class,
c = width of the median class interval.
The median is the value on the X-axis corresponding to
the 50% point on the y-axis of a percentage cumulative
frequency curve.
THE MODE
The mode of a set of numbers is the value that occurs
with the greatest frequency. The mode may not exist, and
even if it does exist it may not be unique.
Example 5.11: The set 2, 3, 3, 4, 7, 8, 8, 8, 10, 11, 12,

19 has mode 8.
Example 5.12: The set 2, 3, 4, 6, 7, 9 has no mode.
Example 5.13: The set 20, 30, 40, 40, 40, 50, 70, 70, 95
has two modes 40 and 70 and is called a bimodal
distribution.
A distribution having only one mode is called a unimodal

distribution.
THE MODE OF GROUPED FREQUENCY DISTRIBUTION
For grouped frequency data, the mode is the value (or
values) of X corresponding to the maximum point (or
points) on the frequency curve. This value of X may be
denoted by XMODE.
From a frequency distribution or histogram the mode can
be computed using the formula
$X_{MODE} = L_0 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c$ (5.7)
where L0 = lower class boundary of the modal class,
∆1 = excess of modal frequency over frequency of the
next lower class,
∆2 = excess of modal frequency over frequency of the
next higher class,
c = width of the modal class interval.
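The interpolation formulae for the median and the mode of grouped data can be sketched in Python as below. This is only an illustration under stated assumptions: the function names and the frequency table are hypothetical, the classes are assumed to have equal width, and a class missing on either side of the modal class is treated as having frequency zero.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median by interpolation: L0 + ((n/2 - cumulative f below) / f_med) * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # sum of frequencies of classes below the current one
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= n / 2:  # first class reaching the 50% point
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f


def grouped_mode(lower_bounds, freqs, width):
    """Mode: L0 + (d1 / (d1 + d2)) * c for the first class of largest frequency."""
    i = freqs.index(max(freqs))
    below = freqs[i - 1] if i > 0 else 0
    above = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - below  # excess over next lower class
    d2 = freqs[i] - above  # excess over next higher class
    return lower_bounds[i] + d1 / (d1 + d2) * width


# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50.
bounds = [0, 10, 20, 30, 40]
counts = [5, 8, 12, 9, 6]

print(grouped_median(bounds, counts, 10))
print(grouped_mode(bounds, counts, 10))
```

For this table the median class and the modal class are both 20-30, so both values fall inside that interval.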
EMPIRICAL RELATION BETWEEN MEAN, MEDIAN

AND MODE
For unimodal frequency curves with moderate skewness
(asymmetry), we have the empirical relation
Mean – Mode = 3 (Mean – Median) (5.8)
THE GEOMETRIC MEAN G
The geometric mean G of a set of n numbers X1, X2, …, Xn
is the nth root of the product of the numbers. That is
$G = \sqrt[n]{X_1 X_2 \cdots X_n}$ (5.9)
Example 5.14: The geometric mean of the numbers 2, 3,
5 is
$G = \sqrt[3]{(2)(3)(5)} = \sqrt[3]{30} = 3.11$
In practice, G is computed by using logarithms and
Equation (5.9) becomes
$\log G = \frac{1}{n} \{ \log X_1 + \log X_2 + \dots + \log X_n \}$ (5.10)
THE HARMONIC MEAN H
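The harmonic mean H of a set of n numbers X1, X2, …, Xn
is the reciprocal of the arithmetic mean of the reciprocals
of the numbers.
$H = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$ (5.11)
In practice it may be easier to remember that
$\frac{1}{H} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{X_i}$
Example 5.15: The harmonic mean of the numbers
2, 3, 5 is
$H = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{5}} = \frac{90}{31} = 2.90$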
RELATION BETWEEN ARITHMETIC, GEOMETRIC

AND HARMONIC MEANS
The geometric mean of a set of positive numbers X1, X2,
…, Xn is less than or equal to their arithmetic mean but is
greater than or equal to their harmonic mean. In symbols
$H \le G \le \bar{X}$ (5.12)
The equality signs hold only if all the numbers X1, X2, …,
Xn are identical.
Example 5.16: The set 2, 3, 5 has arithmetic mean
10/3 = 3.33, geometric mean 3.11 and harmonic mean 2.90.
Clearly 2.90 < 3.11 < 3.33, verifying the inequality (5.12).
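The three means of the set 2, 3, 5 can be recomputed with Python's standard `statistics` module, as a quick check of inequality (5.12). The sketch also evaluates the geometric mean through logarithms, as in Equation (5.10); the variable names are illustrative only.

```python
import math
import statistics

data = [2, 3, 5]

arithmetic = statistics.mean(data)            # (2 + 3 + 5) / 3
geometric = statistics.geometric_mean(data)   # cube root of 30
harmonic = statistics.harmonic_mean(data)     # 3 / (1/2 + 1/3 + 1/5)

# Equation (5.10): log G is the mean of the logarithms.
log_g = sum(math.log10(x) for x in data) / len(data)
geometric_via_logs = 10 ** log_g

print(round(harmonic, 2), round(geometric, 2), round(arithmetic, 2))
```

The printed values are the harmonic, geometric and arithmetic means in that order, which is also their order of size.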
QUARTILES, DECILES AND PERCENTILES

The number that divides a set of data that is
arranged in order of magnitude into two equal parts is
called the median. By extending this idea we can think of
those values which divide the set into n equal parts.
When n is equal to 4, we have the quartiles. These
values, denoted by Q1, Q2, and Q3, are called the first,
second and third quartiles respectively. The second
quartile, Q2 is equal to the median.
Similarly, the values which divide the data into ten
equal parts are called deciles and are denoted by D1, D2,
…, D9, while the percentiles divide the data into one
hundred equal parts. Percentiles are denoted by P1, P2, …,
P99. The 25th and 75th percentiles correspond to the first
and third quartiles respectively, while the 5th decile and
the 50th percentile correspond to the median.
Collectively, quartiles, deciles, percentiles and other
values obtained by equal subdivisions of the data are
called measures of partition. Another name for measures
of partition is quantiles.
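As an illustration, the quantiles of the ordered set from Example 5.10 can be obtained with `statistics.quantiles` from the Python standard library. Note the assumption involved: several interpolation conventions for quantiles are in use, and the library's default ("exclusive") method is only one of them, so values other than the median may differ slightly from a hand calculation done under another convention.

```python
import statistics

data = [5, 5, 7, 9, 11, 12, 15, 18]  # ordered set from Example 5.10

quartiles = statistics.quantiles(data, n=4)       # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)        # D1, ..., D9
percentiles = statistics.quantiles(data, n=100)   # P1, ..., P99

median = statistics.median(data)

print(quartiles)
print(deciles[4], percentiles[49], median)  # D5, P50 and the median
```

The second quartile, the 5th decile and the 50th percentile all coincide with the median, and the 25th and 75th percentiles coincide with Q1 and Q3.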
5.3 MEASURES OF DISPERSION
Data can be summarized and compared by means of
averages because they can represent a distribution and
provide an indication of central tendency. Also, data can
be summarized and compared by measures of
dispersion, which are also known as measures of
variation or measures of spread.
The extent to which numerical data tend to spread
about an average value is called the variation or
dispersion of the data.
If items are widely spread out, averages do not
provide a clear summary of the distribution; they do not
give an indication of the form or shape of a distribution.
Data values are not only clustered around a central point,
but also spread out around it.
Various measures of spread are available, the most

common being the range, mean deviation, semi-
interquartile range, 10-90 percentile range, and the
standard deviation.
THE RANGE
The range of a set of numbers is the difference between

the largest and smallest numbers in the set.
Example 5.17: The range of the set 3, 4, 2, 5, 5, 9, 7, 8 is
9 – 2 = 7. Sometimes the range is given by simply quoting
the smallest and largest numbers.
In the above example, for instance, the range could be
indicated as 2 to 9 or 2–9.
THE MEAN DEVIATION, OR AVERAGE DEVIATION

The mean deviation of the set X1, X2, …, Xn, denoted
by MD, is given by
$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$ (5.13)
Example 5.18: Find the mean deviation of the set 1, 3,
5, 7, 9.
Arithmetic mean, $\bar{X}$ = (1 + 3 + 5 + 7 + 9)/5 = 5
Mean deviation, MD = (|1 – 5| + |3 – 5| + |5 – 5| + |7 – 5| + |9 – 5|)/5
= (4 + 2 + 0 + 2 + 4)/5 = 12/5 = 2.4
If X1, X2, …, Xk occur with frequencies f1, f2, …, fk
respectively, the mean deviation MD can be written as
$MD = \frac{\sum_{i=1}^{k} f_i |X_i - \bar{X}|}{n}$ (5.14)
where
$n = \sum_{i=1}^{k} f_i$
This form is useful for grouped data where X1, X2, …,
Xk represent class marks with f1, f2, …, fk as the
corresponding class frequencies.
THE SEMI-INTERQUARTILE RANGE OR QUARTILE

DEVIATION
The semi-interquartile range or quartile deviation,
denoted by Q, of a set of data is defined by
Q = (Q3 – Q1)/2 (5.15)
THE STANDARD DEVIATION

The standard deviation of a set of n numbers X1, X2, …, Xn
is denoted by s and defined by
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$ (5.16)
If X1, X2, …, Xk occur with frequencies f1, f2, …, fk
respectively, the standard deviation can be written as
$s = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{n - 1}}$ (5.17)
where $n = \sum_{i=1}^{k} f_i$. In this form it is useful for grouped
data.
Sometimes we use n instead of (n – 1) in Equations (5.16)
and (5.17). The denominator (n – 1) gives a better
estimate of the population standard deviation when the
data are a sample taken from that population. For large
values of n (certainly for n > 30) there is essentially no
difference between the two definitions. However,
whenever we want to calculate the standard deviation of
a whole population, we use the divisor n instead of (n – 1).
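The effect of the two divisors can be seen on the small set 1, 3, 5, 7, 9 used for the mean deviation. The sketch below is illustrative only (the variable names are not from the text); it computes the mean deviation and both versions of the standard deviation by hand and compares them with `statistics.stdev` (divisor n – 1) and `statistics.pstdev` (divisor n) from the Python standard library.

```python
import math
import statistics

data = [1, 3, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation with divisor (n - 1) and with divisor n.
sum_sq = sum((x - mean) ** 2 for x in data)
s_sample = math.sqrt(sum_sq / (n - 1))
s_population = math.sqrt(sum_sq / n)

print(mean_deviation)
print(s_sample, statistics.stdev(data))
print(s_population, statistics.pstdev(data))
```

With only five values the two standard deviations differ noticeably; the gap shrinks as n grows, which is the point made above about large n.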
VARIANCE
The variance of a data set is defined as the square of the
standard deviation and is thus given by s² in (5.16) and
(5.17).
When it is necessary to differentiate the standard
deviation of a population from the standard deviation of a
sample drawn from this population, we adopt the notation
s for the latter and σ for the former. Thus the symbols
s² and σ² would denote the sample variance and
population variance respectively.
COEFFICIENT OF VARIATION
The actual variation or dispersion as determined from the
standard deviation or other measure of dispersion is
called the absolute dispersion. However, a variation or
dispersion of 1 metre in measuring a distance of 2000
metres is quite different in effect from the same variation
of 1 metre in a distance of 200 metres. A measure of this
effect is furnished by the relative dispersion defined by
Relative dispersion = (absolute dispersion) / (average) (5.18)
If the absolute dispersion is the standard deviation s and
the average is the mean $\bar{X}$, the relative dispersion is
called the coefficient of variation or coefficient of
dispersion, defined by
Coefficient of Variation = $s / \bar{X}$ (5.19)
and is generally expressed as a percentage.
The coefficient of variation is not dependent on the choice
of units. Indeed it can be used to compare the dispersion
of distributions expressed in different units, e.g. the daily
output of two factories producing different commodities,
one expressed in tons and the other in litres. One major
drawback of the coefficient of variation is that it is not
useful when $\bar{X}$ is close to zero.
STANDARD SCORE
The variable
$z = \frac{X - \bar{X}}{s}$ (5.20)
is called a standardized variable. It is a dimensionless
quantity (i.e. it is independent of the units used).
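That the coefficient of variation and the standard scores do not depend on the units can be checked with a short sketch. The data and function names below are hypothetical: the same five lengths are expressed once in metres and once in centimetres, and the sample standard deviation (divisor n – 1) is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (multiply by 100 for a percentage)."""
    return statistics.stdev(values) / statistics.mean(values)


def standard_scores(values):
    """z = (X - mean) / s for every value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


metres = [1.0, 3.0, 5.0, 7.0, 9.0]
centimetres = [100 * x for x in metres]

print(coefficient_of_variation(metres), coefficient_of_variation(centimetres))
print(standard_scores(metres))
print(standard_scores(centimetres))
```

Changing the unit rescales the mean and the standard deviation by the same factor, so their ratio and the standardized values are unchanged.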
5.4 MOMENTS, SKEWNESS AND KURTOSIS
MOMENTS
If X1, X2, …, Xn are the n values assumed by the variable
X, the quantity
$\overline{X^r} = \frac{X_1^r + X_2^r + \dots + X_n^r}{n} = \frac{\sum_{i=1}^{n} X_i^r}{n}$ (5.21)
is called the rth moment of X. The first moment, with r = 1,
is the arithmetic mean $\bar{X}$.
The rth moment about the mean is defined as
$M_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n}$ (5.22)
The moment as defined by (5.22) is called the central
moment. The moments about zero as in (5.21) are
usually called simply moments (or non-central
moments).
Moments can also be defined about any origin, say A, and
are given as
$M'_r = \frac{\sum_{i=1}^{n} (X_i - A)^r}{n}$ (5.23)
Moments for grouped data are similarly defined with Xi
being the class mark with frequency fi (i = 1, 2, …, k),
k being the total number of classes.
MOMENTS IN DIMENSIONLESS FORM
Dimensionless moments about the mean can be defined
as
$A_r = \frac{M_r}{s^r} = \frac{M_r}{(\sqrt{M_2})^r}$ (5.24)
where $s = \sqrt{M_2}$ is the standard deviation (computed
with divisor n). Since M1 = 0 and M2 = s², we have
A1 = 0, A2 = 1.
SKEWNESS
One way to look at the shape of a distribution
is the extent to which it departs from symmetry.
Skewness of a data set is the degree of departure from
symmetry. We have two kinds of skewness for any
unimodal distribution: a skewed unimodal distribution is
either skewed to the right (positive skewness) or skewed
to the left (negative skewness).
There are many measures of skewness. Some of
these are as follows:
1. Pearson’s first coefficient of skewness $\alpha_1$, given by
$\alpha_1 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{X} - X_{MODE}}{s}$ (5.25)
2. Pearson’s second coefficient of skewness $\alpha_2$, given by
$\alpha_2 = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} = \frac{3(\bar{X} - X_{MED})}{s}$ (5.26)
Equation (5.26) is obtained by applying the empirical
rule
Mean – Mode = 3(Mean – Median)
That is,
$\bar{X} - X_{MODE} = 3(\bar{X} - X_{MED})$ (5.27)
is used in Equation (5.25) to obtain Equation (5.26).
3. Third moment coefficient of skewness $\alpha_3$, given by
$\alpha_3 = \frac{M_3}{s^3} = \frac{M_3}{(\sqrt{M_2})^3}$ (5.28)
In order to determine positive and negative skewness,
it is common to use $\alpha_2$ as defined in Equation (5.26).
When mean > median, $\alpha_2$ > 0 and we have positive
skewness. When mean < median, $\alpha_2$ < 0 and we have
negative skewness. For perfectly symmetrical curves,
such as the normal distribution, $\alpha_2$ is zero. Positive and
negative skewness are illustrated in Figure 5.1 and
Figure 5.2.
Figure 5.1: Positively Skewed Distribution
Figure 5.2: Negatively Skewed Distribution
KURTOSIS
Kurtosis of a distribution is the extent of its
peakedness. We have basically three kinds of
peakedness. They are
1. High (leptokurtic) peak
2. Flat-topped (platykurtic) peak
3. Moderately high and moderately flat-topped
(mesokurtic) peak
Different kinds of peak are illustrated in Figure 5.3.
Figure 5.3: Different Kinds of Peak: (a) Leptokurtic Peak,
(b) Platykurtic Peak, (c) Mesokurtic Peak
One measure of kurtosis, based on the fourth moment
about the mean and expressed in dimensionless form, is
given by
Moment Coefficient of Kurtosis = $\kappa = \frac{M_4}{s^4} = \frac{M_4}{M_2^2}$ (5.29)
If the distribution is normal, $\kappa$ = 3. This explains why
kurtosis is sometimes defined by ($\kappa$ – 3), which is
positive for a leptokurtic distribution, negative for a
platykurtic distribution and zero for the normal
distribution.
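The moment coefficients of skewness and kurtosis follow directly from the central moments. The sketch below is a minimal Python illustration with hypothetical function names and data; it uses the divisor n throughout, as in the definition of the central moments.

```python
def central_moment(values, r):
    """rth moment about the mean: sum((X - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def moment_skewness(values):
    """Third moment coefficient of skewness: M3 / (sqrt(M2))^3."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def moment_kurtosis(values):
    """Moment coefficient of kurtosis: M4 / M2^2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 3, 5, 7, 9]     # symmetric about its mean 5
right_skewed = [1, 2, 3, 10]    # one large value pulls the tail to the right

print(moment_skewness(symmetric), moment_kurtosis(symmetric))
print(moment_skewness(right_skewed))
```

The symmetric set has zero skewness, while the set with a long right tail has a positive coefficient, in line with the description of positive skewness.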
Problem Set 5
1. The table below shows a frequency distribution of
the monthly wages in pounds sterling of 65
employees at Stem Motors.
Wages               Number of Employees
50.00 – 59.99       8
60.00 – 69.99       10
70.00 – 79.99       16
80.00 – 89.99       14
90.00 – 99.99       10
100.00 – 109.99     5
110.00 – 119.99     2
With reference to this table
(a) Construct a histogram for the frequency distribution
(b) Plot the cumulative frequency curve for the table
(c) Calculate the mean, median and mode of the
distribution
(d) Calculate the quartiles of the distribution
(e) Calculate the standard deviation for the
distribution.
2. Tonye has a batting average of 32 with a standard
deviation of 13. Koye has an average of 47 with a
standard deviation of 18. Which is the more
consistent player?
3. Find Pearson’s (a) first and (b) second coefficient
of skewness for the distribution of Table 5.2
4. Find
(a) moment coefficient of skewness
(b) moment coefficient of kurtosis for the
distribution of scores of students in Biology test
62 92 90 85 53
64 87 80 77 58
95 90 64 60 70
86 70 67 74 86
5. Find the
(a) Harmonic mean
(b) Geometric mean
(c) Arithmetic mean for the data of question 4 and
comment on your result.
6. Find the mean, mode and median for the data in
Table 3.2 and comment on your results.
7. Find the 1st and 3rd quartiles for data in Table 3.2
CHAPTER 6
PROBABILITY
6.0 INTRODUCTION
In most areas of human endeavor, and in statistics in
particular, we are faced with making decisions, and this
involves an element of uncertainty and a degree of risk.
Since decisions are made most of the time under
uncertainty, we need to know about the level of
uncertainty or risk that is likely to be involved. Put
alternatively, we need to have an idea of the chance of a
particular event occurring. The notion of “likelihood of
occurring” will enable us to make a decision incorporating
both the sample statistic under consideration and the
likelihood of the relevant occurrence.
For instance, if the mean weight of students in the
Faculty of Management Science, Rivers State University of
Science and Technology, Port Harcourt is 65.1kg, what is
the level of confidence or certainty with which we can say
“the mean weight of the students is 65.1kg”?
The answer to this question is supplied by probability. If
5 percent error is allowed, the probability is 95 percent
that the estimated mean is 65.1kg. At other times we
may want to know, for instance, the chance that it will rain
tomorrow so that some adjustment could be made, or
whether some outdoor function could be planned or not.
From the discussion above and real life situations we see
probability as a powerful tool, useful
1. in measuring the likelihood of occurrence of an
event,
2. in measuring the level of uncertainty or risk we are
likely to encounter in decision making, and
3. in bridging the gap between descriptive statistics
and inferential statistics, thereby making us
comfortable enough to handle decision issues
both in theory and practice.
6.1 SOME SELECTED TERMS ASSOCIATED WITH

PROBABILITY
An Experiment: This is the process of observing

something happen under known or given conditions
leading to some final “result” that is listed or recorded.
A Random Experiment: This is a statistical process

which can be repeated and in a single trial of which the
outcome is not known in advance. That is, a random
experiment is one whose outcome cannot be determined
in advance. Tossing a fair coin and rolling a fair die are
some examples of a random experiment.
Sample Space: The set of all possible outcomes of

an experiment is called the sample space. It is normally
denoted by Ω.
Event: An event is a specific collection of
outcomes or sample points. In short, an event E is a
subset of the sample space Ω. Also, a point in Ω is
called an elementary or simple event. This type of event
is characterized by the fact that it cannot be broken
down further.
Example 6.1 Flip a fair coin once. Construct the sample
space. Enumerate the simple (elementary ) events of the
experiment.
Solution: If we denote tail and head respectively by T

and H, the sample space Ω is given by
Ω = {H, T}
The simple events are only two and are E1 = { H } and

E2 = { T }
Example 6.2 Toss a fair die once. What is the sample
space Ω?
Solution: The sample space Ω is

Ω = {1, 2, 3, 4, 5, 6}
Example 6.3 Toss two fair coins once. What is the
sample space Ω? Enumerate all the elementary (simple)
events.
Solution: The sample space Ω is
Ω = {(H,H), (H,T), (T,H), (T,T)}
The simple events are
E1 = {(H,H)}, E2 = {(H,T)}, E3 = {(T,H)} and
E4 = {(T,T)}
A Random Variable: If a variable assumes values

as a result of a random experiment or process, then it is
called a random variable.
Example 6.4: The number of heads X in the random
experiment of Example 6.3 is an example of a random
variable. Find the value of X corresponding to the
outcomes (simple events) E1, E2, E3 and E4.
Solution: The values of X corresponding to the
outcomes E1, E2, E3 and E4 are 2, 1, 1 and 0
respectively.
6.2 THE DEFINITION OF PROBABILITY
There are many approaches to the definition of the word

probability. We shall consider four of these approaches
and they are the classical, empirical, subjective and
axiomatic approaches.
CLASSICAL OR A PRIORI PROBABILITY

The classical definition is traceable to the close
association of probability with games of chance in the
seventeenth century. Games of chance include throwing
a die, tossing a coin and drawing a card. For instance,
without experiment, it is assumed that since a fair coin
has two sides, the probability of a head is one out of two
and the probability of a tail is the same. A similar
argument is extended to a fair die when it is cast. The
probability of a given number facing up in the case of a
fair die is one out of six.
If an event A can occur in m ways out of a total of
n possible equally likely ways, then the probability
of occurrence of the event (also called its success),
denoted by P(A), is
P(A) = m/n
The probability of non-occurrence of the event (called its
failure), denoted by P(Ac), is
P(Ac) = 1 – m/n
Sometimes we denote P(A) and P(Ac) respectively by p
and q. We observe that
p + q = 1
Example 6.5: Let a fair die be tossed once. What
is the probability that an even number turns up?
Solution: The sample space is
Ω = {1, 2, 3, 4, 5, 6},
giving three even numbers 2, 4, 6 out of six equally likely
outcomes. Therefore the probability that an even number
turns up is equal to
3/6 = ½
EMPIRICAL OR A POSTERIORI PROBABILITY
The empirical definition states that the probability of an
event A is the relative frequency of occurrence of the
event when the number of observations is very large.
Example 6.6 The first 6000 tosses of a fair die result in
1092 “3”s. A further 6000 tosses of the same die
result in another 1006 “3”s.
Calculate the probability of a “3” showing up
(a) For the first 6000 tosses
(b) For the total of 12000 tosses
(c) Comment on your results in (a) and (b)
Solution
(a) The probability of a “3” showing up in 6000 tosses
of the die is equal to the relative frequency in the
tossing experiment. Hence the required
probability is
1092/6000 = 0.182
(b) Probability of a “3” in 12000 tosses
= (1092 + 1006)/12000
= 0.175
(c) The relative frequency of a “3” showing up
approaches 1/6 = 0.167 as the number of trials
gets larger. This is why the relative frequency of
0.175 in (b) is closer to 0.167 than the 0.182 in (a).
The empirical probability is also called relative
frequency probability.
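The relative frequencies in Example 6.6 can be recomputed, and the long-run behaviour imitated, with a short Python sketch. The simulation part is an assumption-laden illustration: it uses the standard `random` module with a fixed seed to imitate a fair die, and the function names are hypothetical.

```python
import random


def relative_frequency(successes, trials):
    """Empirical probability: number of occurrences divided by number of trials."""
    return successes / trials


first = relative_frequency(1092, 6000)               # part (a)
combined = relative_frequency(1092 + 1006, 12000)    # part (b)


def simulate_threes(trials, seed=0):
    """Relative frequency of a '3' in simulated tosses of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 3)
    return hits / trials


print(first, combined)
print(simulate_threes(200000), 1 / 6)
```

With a large number of simulated tosses the relative frequency settles close to 1/6, which is the idea behind the empirical definition.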
SUBJECTIVE PROBABILITY
Subjective probability is one that is based on the
personal belief or feelings of the person who
assigns the probability. It is useful in dealing with
events that cannot be repeated and, as such,
generally cannot be given a frequency
interpretation. This probability is also applicable
when there is little or no past information or
experience. Suppose we are to choose one out of
four candidates as a professor for a chair in
environmental physics. All four have good
personality and good communication skills, are highly
knowledgeable in their areas as seen during the oral
interview, and have equally impressive track records
over the years. The chance that each of them
will be a good professor may require assigning a
subjective probability that could be nothing more
than a good guess. As another instance, consider “what
is the probability that it will rain tomorrow?” One person
may say the answer is 0.5 and another may
believe that it is 0.7, both estimates emanating from
personal feelings.
This method of assigning probability enjoys the
greatest flexibility and speed, and so is most
widely applied in business decisions, since
decision makers use available evidence
tempered with their personal feelings in arriving at
probability estimates.
AXIOMATIC PROBABILITY
The classical definition involves using the
expression “equally likely”, which is the same as
“equally probable”. This makes us guilty of
circular reasoning. In the empirical definition there is
vagueness in the use of the word “large”. The subjective
approach is equally not reliable since it is based on the
personal feelings of those assigning probabilities. Due to
these drawbacks mathematicians and other
researchers have resorted to the axiomatic
approach as the only philosophically satisfactory
way to define probability. In the axiomatic
approach we simply state what probability is by
enumerating the rules (axioms) that it follows. We
state axiomatic probability as follows:
Probability is a function P(.) defined on events (sets) A
of the sample space Ω such that the following axioms hold.
1. 0 ≤ P(A) ≤ 1
2. P(Ω) = 1
3. P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + … where A1, A2,
… are mutually exclusive events.
Apart from being free of the snags found in the other
approaches, the axiomatic approach leads to theorems
that are useful in dealing with practical problems. Some
of these problems involve addition and/or multiplication
of probabilities. We shall consider worked examples
along with some theorems and definitions that spring
from axiomatic probability. These are found in the next
three sections.
6.3 ALGEBRA OF EVENTS
Sets together with certain operations on them constitute
an algebra of sets. Since events are sets, set algebra also
applies to events. In essence, operations on events give
rise to new events. Below are a few operations on events.
UNION: The union of two events A and B, written as
A∪B and read “A or B”, is the event consisting of all
outcomes which belong to A or to B or to both A and
B.
INTERSECTION: The set of all outcomes which are
common to both events A and B is the
intersection of A and B. The intersection is denoted by
A∩B and read “A and B”.
RELATIVE DIFFERENCE: The event consisting of all
outcomes of A which do not belong to B is called the
difference of A and B and is denoted by A–B.
COMPLEMENT: The complement of an event A,
denoted by A’ (or Ac), is the set of all outcomes in the
sample space Ω which are not contained in A.
Example 6.7: Toss two fair coins once. Let A be
the event “at least one head occurs” and B the event
“there is a tail on the second coin”. Write down the sample
space Ω and deduce the events A∪B, A∩B, A’ and A–B.
Solution. Ω = {(HH), (HT), (TH), (TT)}
A = {(HH), (TH), (HT)} and
B = {(HT), (TT)}
A∪B = {(HH), (HT), (TH), (TT)}
which is equal to Ω
A∩B = {(HT)}
A’ = {(TT)}
Now B’ = {(HH), (TH)}
so that
A–B = A∩B’ = {(HH), (TH)}
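Because events are sets, the operations in this example map directly onto Python's built-in `set` type. The sketch below is only an illustration; outcomes are written as two-letter strings such as "HT" (first coin, then second coin), which is a representation chosen here and not taken from the text.

```python
# Sample space for tossing two coins once.
omega = {"HH", "HT", "TH", "TT"}

A = {o for o in omega if "H" in o}       # at least one head occurs
B = {o for o in omega if o[1] == "T"}    # tail on the second coin

union = A | B             # A or B
intersection = A & B      # A and B
complement_A = omega - A  # outcomes of the sample space not in A
complement_B = omega - B
difference = A - B        # outcomes of A that do not belong to B

print(sorted(union))
print(sorted(intersection))
print(sorted(complement_A))
print(sorted(difference))
```

The last event can equally be obtained as `A & complement_B`, matching the identity used in the solution.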
6.4 LAWS OF ALGEBRA OF EVENTS
Since events are sets, as we have seen already,

the different laws of set algebra (commutative,
associative, distributive, idempotent laws, etc) are
also applicable to them.
6.5 VENN DIAGRAM (EULER DIAGRAM)
This diagram furnishes the user with a geometric
intuition in looking at events and the result of their
involvement in set operations. In a Venn diagram,
we use a rectangle to represent the “universe of
outcomes”, which in this case is the sample
space Ω. Events, which are themselves subsets of
Ω, are represented by the interior of a closed curve
(which is often a circle) contained in Ω. An
outcome (sample point) is represented by a point
in the rectangle.
6.6 TYPES OF EVENTS
SURE OR CERTAIN EVENT: This is an event
whose probability of occurrence is one. The event
“at least one human death in the world in the past
two days” is a sure event, since at least one
person dies daily in the world. The sample space
is another example of a sure event.
IMPOSSIBLE EVENT: This is an event whose
probability of occurrence is zero. The probability
that a university professor in Nigeria is 150 years
old is zero. The empty set Ø is an example of an
impossible event. Ø is pronounced “phi”.
MUTUALLY EXCLUSIVE EVENTS
Events A and B are said to be mutually exclusive if

they cannot take place at the same time. In
essence, mutually exclusive events have no
sample points in common. An event A and its
complement Ac are mutually exclusive.
Example 6.8. In the experiment of tossing
two fair dice once, what is the probability of
obtaining either a sum of 9 or a sum of 11?
Solution: Let A be the event “sum is 9” and B
the event “sum is 11”. Then
A = {(6,3), (5,4), (4,5), (3,6)} and
B = {(6,5), (5,6)}
Clearly, the events A and B have no sample points
in common, hence they are mutually exclusive. The
required probability is computed in Example 6.11.
COMPOSITE EVENTS: While a simple event (as

already stated) has only one sample point, a composite
event has two or more sample points.
Example 6.9 In an experiment of flipping two fair coins
once,
(a) state the simple events, and
(b) give any two examples of a composite
event.
Solution. (a) Ω = {(HH), (HT), (TH), (TT)}
The simple events are:
E1 = {(HH)}, E2 = {(HT)}, E3 = {(TH)} and E4 = {(TT)}
(b) C1 = {(HH), (HT), (TH)} and
C2 = {(HT), (TH)} are some examples of
composite events.
6.7 ADDITION RULE:
Let A and B be events. The general addition rule states
that
P(A∪B) = P(A) + P(B) – P(A∩B).
However, when the events A and B are mutually
exclusive, A∩B = Ø and accordingly
P(A∩B) = 0, so that the addition rule takes the form
P(A∪B) = P(A) + P(B)
Example 6.10 Consider the two events
A: A receptionist spends at least N2000 for food in
the month
B: A receptionist spends at least N600 for lunch in
the month
From past experience, we have that
P(A) = 0.46 and P(B) = 0.52
Suppose the probability that a receptionist spends at
least N2000 for food and at least N600 for lunch is 0.22.
What is the probability that a receptionist spends at least
N600 for lunch or at least N2000 for food in the month?
Solution: The required probability is
P(A∪B) = P(A) + P(B) – P(A∩B)
= 0.46 + 0.52 – 0.22
= 0.76
Example 6.11. Using the data of Example 6.8, find the
probability that the sum is either 9 or 11.
Solution: Recall that
A = {(6,3), (5,4), (4,5), (3,6)} and
B = {(6,5), (5,6)}
Since A and B are mutually exclusive
P(A∪B) = P(A) + P(B)
= 4/36 + 2/36
= 6/36 = 1/6
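The addition rule for the two-dice experiment can be verified by listing all 36 equally likely outcomes. The following Python sketch is illustrative; the helper `prob` is a hypothetical name, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of tossing two fair dice once.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    favourable = sum(1 for outcome in omega if event(outcome))
    return Fraction(favourable, len(omega))


p_a = prob(lambda o: sum(o) == 9)                      # sum is 9
p_b = prob(lambda o: sum(o) == 11)                     # sum is 11
p_both = prob(lambda o: sum(o) == 9 and sum(o) == 11)  # impossible, so 0
p_union = prob(lambda o: sum(o) in (9, 11))            # sum is 9 or 11

print(p_a, p_b, p_both, p_union)
```

Since the intersection has probability zero, the general rule P(A) + P(B) – P(A∩B) reduces here to P(A) + P(B).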
6.8 CONDITIONAL PROBABILITY
CONDITIONAL PROBABILITY: Let A and B be events
such that P(A) > 0. The probability of B given that A has
occurred, denoted by P(B│A), is called the conditional
probability of B given A. This probability is given by
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Example 6.12 Suppose a sample of 2000 persons
reveals that 140 of them are bankers and 1040 are
females. Let B be the event “a person is a banker” and
let A be the event “a person is a female”. Given that 80
females are bankers, find P(A), P(B) and P(B│A).
Solution
P(A) = 1040/2000 = 0.52
P(B) = 140/2000 = 0.07
P(B│A) = Probability that a person is a banker given
that the person is a female
= 80/1040 = 1/13
Alternatively, using the definition of conditional probability
we see that
P(A∩B) = 80/2000
P(A) = 1040/2000
P(B│A) = P(A∩B) / P(A)
= 80/1040 = 1/13
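The two routes to the conditional probability in this example (counting within the females, and dividing P(A∩B) by P(A)) can be compared in a few lines of Python. The variable names are illustrative, and exact fractions are used.

```python
from fractions import Fraction

total = 2000          # persons in the sample
females = 1040        # event A
bankers = 140         # event B
female_bankers = 80   # event A and B together

p_a = Fraction(females, total)
p_b = Fraction(bankers, total)
p_a_and_b = Fraction(female_bankers, total)

# Definition: P(B | A) = P(A and B) / P(A).
p_b_given_a = p_a_and_b / p_a

# Direct count among the females only.
p_b_given_a_direct = Fraction(female_bankers, females)

print(float(p_a), float(p_b), p_b_given_a)
```

Both routes give the same fraction, because the common divisor 2000 cancels.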
6.9 GENERAL MULTIPLICATION RULE
The conditional probability P(B│A) is given as
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Re-arranging the last equation we have the general
multiplication law in the form
P(A∩B) = P(A) P(B│A) (6.4)
6.10 INDEPENDENT AND DEPENDENT EVENTS
We say that the events A and B are independent if the
probability of B occurring is not affected by the
occurrence or non-occurrence of A. If A is independent of
B, so also is B independent of A. Hence when
independence exists between A and B,
P(B│A) = P(B) (6.5)
and the general multiplication rule (6.4) reduces
to
P(A∩B) = P(A) P(B) (6.6)
This is a special multiplication rule and it holds when
there is independence.
Example 6.13 Let Ω = {1, 2, 3, 4, 5, 6} and let A, B be events
such that A = {1, 3, 5} and B = {1, 2}. Show that events A
and B are independent.
Solution
A∩B = {1}
P(A) = 3/6 = ½ and P(B) = 2/6 = 1/3
P(A∩B) = 1/6
Since P(A) P(B) = (1/2)(1/3) = 1/6 = P(A∩B), the
events A and B are independent.
The special multiplication rule is indeed a test for
independence.
For independent events A1, A2, …, An
P(A1∩A2∩…∩An) = P(A1) P(A2) … P(An). (6.7)
Example 6.14.
Consider a box containing 6 white balls and 4 black balls.
2 balls are drawn at random. What is the probability that
the first ball is white and the second black if the drawing is
(a) with replacement
(b) without replacement?
Solution
Let A be the event “the first ball is white” and B the
event “the second ball is black”. Then A∩B is the
event “1st ball is white and 2nd ball is black”.
(a) If the drawing is with replacement
P(A∩B) = P(A) P(B) = [6/10] [4/10] = 0.24
(b) If the drawing is without replacement
P(A∩B) = P(A) P(B│A)
= [6/10] [4/9] = 24/90
= 0.27
In case (a) the events are independent but in case
(b) the events are dependent. Why?
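The difference between the two cases lies only in the probability used for the second draw. A minimal Python sketch of the calculation (variable names are illustrative, exact fractions are used):

```python
from fractions import Fraction

white, black = 6, 4
total = white + black

p_first_white = Fraction(white, total)

# (a) With replacement: the box is unchanged for the second draw,
#     so P(B | A) = P(B) = 4/10.
with_replacement = p_first_white * Fraction(black, total)

# (b) Without replacement: one white ball is gone, 9 balls remain,
#     so P(B | A) = 4/9.
without_replacement = p_first_white * Fraction(black, total - 1)

print(with_replacement, float(with_replacement))
print(without_replacement, round(float(without_replacement), 2))
```

In case (b) the composition of the box after the first draw depends on what was drawn, which is why P(B│A) differs from P(B).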
6.11 PROBABILITY TREE

Many experiments consist of a sequence of two or
more events (stages). A probability tree provides
a useful device for obtaining the possible outcomes at
each stage and their permutations, together with
the associated probabilities.
In this method, a tree starts with a node, which is
in the form of a dot (.), with various skew lines
emanating from the node and going from left to
right representing the branches of the tree. These
branches are the various possibilities for that
stage. At the end of each branch, other events
occur, and so on until the experiment
ends.
Dots on the same imaginary vertical line are at
the same stage of the experiment. Dots on the
last vertical line give the last stage of the
experiment, and the total number of branches from
these dots gives the sample space for the whole
experiment.
Example 6.15 A bag contains 4 balls marked A, 3
marked B and 1 marked C. Two balls are drawn
at random without replacement. Use a probability tree
to find the probability that one marked A and one
marked B are drawn.
Solution: We draw a probability tree as given
below:
[Figure: probability tree for the two draws, listing each path and its probability]
From the probability tree, we have Path 1, Path 2, …, Path
9 and the sample space Ω is
Ω = {AA, AB, AC, BA, BB, BC, CA, CB, CC}
(the path CC has probability zero, since the bag holds
only one ball marked C).
P(one A and one B)
= P(1st is A, 2nd is B) + P(1st is B, 2nd is A)
= (4/8)(3/7) + (3/8)(4/7) (using paths 2 and 4)
= 3/14 + 3/14 = 6/14 = 3/7
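The paths of the probability tree and their probabilities can be generated mechanically. The sketch below is a hedged illustration for two draws without replacement; the function name `two_draw_paths` is hypothetical, and each path probability is the product of the branch probabilities along it.

```python
from fractions import Fraction

bag = {"A": 4, "B": 3, "C": 1}  # number of balls carrying each mark


def two_draw_paths(counts):
    """Return {path: probability} for two draws without replacement."""
    total = sum(counts.values())
    paths = {}
    for first, n_first in counts.items():
        p_first = Fraction(n_first, total)   # first-stage branch
        remaining = dict(counts)
        remaining[first] -= 1                # one ball of this mark is removed
        for second, n_second in remaining.items():
            p_second = Fraction(n_second, total - 1)  # second-stage branch
            paths[first + second] = p_first * p_second
    return paths


paths = two_draw_paths(bag)
p_one_a_one_b = paths["AB"] + paths["BA"]

for path, p in paths.items():
    print(path, p)
print(p_one_a_one_b)
```

The nine path probabilities add up to one, and the two paths AB and BA together give the required probability.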
6.12 BAYES’ THEOREM
THEOREM 6.1.
For any events A and B with 0 < P(B) < 1,
P(A) = P(A│B) P(B) + P(A│Bc) P(Bc)
Proof: The sets A∩B and A∩Bc form a partition of A, so
that
A = (A∩B) ∪ (A∩Bc)
and
P(A) = P(A∩B) + P(A∩Bc)
= P(A│B) P(B) + P(A│Bc) P(Bc)
This is the end of the proof. We shall make use of
this result later. Meanwhile recall that
$P(B_1 \mid A) = \frac{P(A \cap B_1)}{P(A)}$
Writing B1 = B and B2 = Bc, Theorem 6.1 gives
$P(A) = P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)$ (6.10)
and hence
$P(B_1 \mid A) = \frac{P(B_1) P(A \mid B_1)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)}$ (6.11)
Equation (6.10) is the total probability theorem and Equation
(6.11) is Bayes’ formula when Ω is partitioned into two
events B1 and B2.
The total probability theorem is used when an event A
in Ω must result in one of the mutually exclusive and
exhaustive events B1 and B2, and Bayes’ formula is built
up from here. We have, as it were, considered a
systematic and formal derivation of Bayes’ formula
through total probabilities. We now formally state
Bayes’ theorem by generalizing Equation (6.11).
Theorem 6.2. (Bayes’ Theorem) For a given sample
space Ω, let B1, B2, …, Bn be a collection of mutually
disjoint and exhaustive events in Ω such that P(Bi) > 0
for i = 1, 2, …, n. Then for every event A in Ω for which
P(A) > 0
$P(B_k \mid A) = \frac{P(B_k) P(A \mid B_k)}{\sum_{i=1}^{n} P(B_i) P(A \mid B_i)}$
where Bk is any of the Bi (i = 1, 2, …, n).
Observe that
$P(A) = \sum_{i=1}^{n} P(B_i) P(A \mid B_i)$
is a generalization of Equation (6.10).
Example 6.16. (Total Probability)
Given two boxes 1 and 2, suppose Box 1 contains 5
black and 6 white balls. Box 2 contains 3 black, 2 white
and 4 green balls. We select a box at random and then
draw a ball. What is the probability that we obtain a black
ball?
Solution: Let B1 and B2 stand for the events “Box 1 is
chosen” and “Box 2 is chosen” respectively. Similarly, let
A, W, G stand for the events
“a black,” “a white,” or “a green ball is chosen,”
respectively.
Since the event A must occur together with one of the
mutually exclusive and exhaustive events B1 and B2, we
have that
A = (A∩B1) ∪ (A∩B2)
P(A) = P(A∩B1) + P(A∩B2)
= P(B1) P(A│B1) + P(B2) P(A│B2)
= (1/2)(5/11) + (1/2)(3/9)
= 13/33
= 0.3939
Example 6.17. Suppose in Example 6.16 a black
ball is drawn. What is the probability that it was drawn
from
(a) Box 1
(b) Box 2
Solution: To answer the above questions, we
compute the conditional probabilities P(B1│A) and
P(B2│A).
(a) P(B1│A) = P(B1∩A) / P(A)
= P(B1) P(A│B1) / [P(B1) P(A│B1) + P(B2) P(A│B2)]
= (1/2)(5/11) / [(1/2)(5/11) + (1/2)(3/9)]
= (5/22) ÷ (13/33) = 15/26
(b) P(B2│A) = P(A∩B2) / P(A)
= P(B2) P(A│B2) / P(A)
= (1/2)(3/9) ÷ (13/33) = 11/26
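The arithmetic of Examples 6.16 and 6.17 can be checked with a short program. The following is only a minimal sketch in Python; the function names total_probability and posteriors are chosen here for illustration and are not part of the text. Exact fractions are used so that the results can be compared directly with 13/33, 15/26 and 11/26.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(A) = sum of P(Bi) * P(A|Bi) over a partition B1, ..., Bn."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posteriors(priors, likelihoods):
    """P(Bi|A) for every Bi, using Bayes' formula."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]


# Example 6.16: each box is chosen with probability 1/2.
priors = [Fraction(1, 2), Fraction(1, 2)]
# P(black | box): 5 of 11 balls in Box 1, 3 of 9 balls in Box 2.
likelihoods = [Fraction(5, 11), Fraction(3, 9)]

p_black = total_probability(priors, likelihoods)
post = posteriors(priors, likelihoods)

print(p_black)  # 13/33
print(post)     # [Fraction(15, 26), Fraction(11, 26)]
```

The two posterior probabilities add up to 1, as they must, since the black ball came from one box or the other.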
6.13 PRIOR AND POSTERIOR PROBABILITIES
The conditional probabilities P(B1│A) and P(B2│A) have
changed from their original probabilities P(B1) and P(B2).
It is customary to call the original probabilities prior
probabilities and the conditional probabilities posterior
probabilities.
Remark: If we had three boxes in the last example,
the posterior probability P(B3│A) would have been given
by
P(B3│A) = P(B3) P(A│B3) / [P(B1) P(A│B1) + P(B2) P(A│B2) + P(B3) P(A│B3)]
Example 6.18.
In a certain factory, machines B1, B2 and B3 are all
producing electric bulbs of the same dimension.
Machines B1, B2 and B3 produce 3%, 2% and 1% defective
bulbs respectively. Of the total production of bulbs in the
factory, machine B1 produces 36%, machine B2 produces
24% and machine B3 produces 40%. If a bulb is selected
at random in a day, what is the probability that it is
defective? If a selected bulb is defective, what is the
probability that it was produced by machine B3?
Solution
Let B1 be the event that a bulb is produced by machine
B1; B2 and B3 are similarly defined. Also let D be the
event that a bulb is defective. The probability that a
selected bulb is defective is
P(D) = P(B1) P(D│B1) + P(B2) P(D│B2) + P(B3) P(D│B3)
= (36/100)(3/100) + (24/100)(2/100) + (40/100)(1/100)
= 196/10000
= 0.0196
If the selected bulb is defective, the probability that it
was produced by machine B3 is
P(B3│D) = P(B3) P(D│B3) / P(D)
= (40/100)(1/100) ÷ (196/10000) = 40/196
= 0.2041
6.14 THE NOTION OF PROBABILITY DISTRIBUTION
We have stated that when a variable assumes values as
a result of outcomes of a random experiment or process,
it is a random variable. Since the numerical values of a
random variable X depend on the experimental
outcomes, the values and the pattern of a random
variable are governed by a chance model with some
functional form, referred to as a probability distribution
in statistical parlance. In short, associated with each
random variable is a probability distribution, and scholars
use this mathematical structure to study various aspects
of the random variable.
Some probability distributions can be described by a
closed-form mathematical function called the probability
density function (pdf). Once the pdf of a random variable is
given, it can be used to calculate probabilities of events
of interest in decision-making. We shall discuss briefly
three pdfs which are common in practice. They are the
binomial, normal and Poisson pdfs.
6.15 THE BINOMIAL DISTRIBUTION
A random variable X is said to follow the binomial
distribution if its pdf is given by
\[ f(x) = \binom{n}{x} p^{x} q^{n-x} \]
where
p = probability that an event will happen
in any single trial
q = 1 − p = probability that the event
will fail to happen in any single trial
x = number of times the event occurs
n − x = number of failures
The values of X are 0, 1, 2, …, n, with n as the total
number of trials in the random experiment.
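Example 6.19 What is the probability of getting
exactly 2 tails in 6 tosses of a fair coin?
Solution: Here n = 6, x = 2 and p = q = 1/2. The required
probability is
\[ P(X = 2) = \binom{6}{2}\left(\tfrac{1}{2}\right)^{2}\left(\tfrac{1}{2}\right)^{4} = \frac{15}{64} \approx 0.2344 \]
Example 6.20 What is the probability of getting at least
4 tails in 6 tosses of a fair coin?
Solution: Let X be the number of tails. Then
P(at least 4 tails)
= P(X = 4) + P(X = 5) + P(X = 6)
= (15 + 6 + 1)/64 = 22/64 = 11/32 ≈ 0.3438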
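The two coin-tossing examples can be reproduced numerically. The sketch below is illustrative only; the function name binomial_pmf is an assumption made here, and exact fractions are used so that the output matches the hand calculation.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


half = Fraction(1, 2)  # a fair coin

# Example 6.19: exactly 2 tails in 6 tosses.
p_two_tails = binomial_pmf(2, 6, half)

# Example 6.20: at least 4 tails in 6 tosses, i.e. X = 4, 5 or 6.
p_at_least_four = sum(binomial_pmf(x, 6, half) for x in range(4, 7))

print(p_two_tails)      # 15/64
print(p_at_least_four)  # 11/32
```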
6.16 THE NORMAL DISTRIBUTION
The normal pdf is given by
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \]
where μ = mean, σ = standard deviation, π = 3.14159…,
e = 2.71828…
We note the following about the normal distribution:
1. The total area bounded by the curve of the pdf and the
X axis is one.
2. The area under the curve between two ordinates X = a
and X = b, where a < b, gives the probability that X lies
between a and b, and this probability is denoted by
P(a < X < b).
At times, it is convenient to use special units
instead of X. These special units are referred to as
standard scores and can be obtained by the use of the
transformation
\[ Z = \frac{X - \mu}{\sigma} \]
The normal pdf using Z becomes
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2} \]
The random variable Z has the standard normal pdf with
zero mean and variance equal to one.
We use statistical tables to compute probabilities under
the standard normal curve.
Example 6.21 Compute the following probabilities using
statistical tables.
(a) P(−1.96 < Z < 1.96)
(b) P(−1.96 < Z < 0)
(c) P(0.82 < Z < 1.94)
Solution (a) P(−1.96 < Z < 1.96)
= 0.4750 + 0.4750 = 0.95
(b) P(−1.96 < Z < 0) = 0.4750
(c) P(0.82 < Z < 1.94)
= area between Z = 0 and Z = 1.94
minus area between Z = 0 and Z = 0.82
= 0.4738 − 0.2939 = 0.1799
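Where a statistical table is not at hand, the same areas can be obtained from the error function. The following is a minimal sketch, assuming the standard identity Φ(z) = ½[1 + erf(z/√2)]; the names phi and prob_between are introduced here for illustration.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_between(a, b):
    """P(a < Z < b) for the standard normal variable Z."""
    return phi(b) - phi(a)


# The three parts of Example 6.21.
part_a = prob_between(-1.96, 1.96)
part_b = prob_between(-1.96, 0.0)
part_c = prob_between(0.82, 1.94)

print(round(part_a, 4))  # about 0.95
print(round(part_b, 4))  # about 0.475
print(round(part_c, 4))  # about 0.1799
```

Small differences in the last decimal place are to be expected, because printed tables are themselves rounded to four places.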
6.17 THE POISSON DISTRIBUTION
A random variable X is said to follow the Poisson
distribution if its probability function is
\[ f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]
where e = 2.71828…, x = 0, 1, 2, … and λ is a constant
equal to the mean of X. One important feature of this
distribution is that its mean equals its variance.
Example 6.22. 10% of the tools produced in a
certain factory are found to be defective. Find the
probability that in a sample of 15 tools chosen at random
exactly 4 will be defective.
Solution. Let λ = np = (15)(0.1) = 1.5
Then
P(X = 4) = (1.5)^4 e^(−1.5) / 4!
= 0.047
It is worth mentioning that the binomial and Poisson
distributions are examples of discrete distributions
whereas the normal distribution is an example of a
continuous distribution.
6.18 MATHEMATICAL EXPECTATION
If p is the probability that a man wins a
construction contract of D naira, the mathematical
expectation or simply expectation is pD naira.
Example 6.23. If the probability that a man wins a lottery
of $80000 is 1/4, what is his expectation?
Solution. The expectation of the man is (1/4)(80000)
= $20000.
Let X be a discrete random variable which can assume
values X1, X2, …, Xk with respective probabilities P1, P2,
…, Pk such that P1 + P2 + … + Pk = 1. The mathematical
expectation of X, denoted by E(X), is defined as
E(X) = P1X1 + P2X2 + … + PkXk
Example 6.24. A lady has three departmental stores
A, B and C. In A and B she makes a profit of N900,000
and N450,000 respectively with probabilities 0.3 and 0.4.
In C she loses N250,000 with probability 0.3. What is
the expected profit of this lady?
Solution. The expected profit of this lady is
(0.3)(900000) + (0.4)(450000) − (0.3)(250000)
= 270000 + 180000 − 75000
= N375,000
Expectation can also be computed for continuous
variables, but the evaluation involves the use of calculus.
Expectation involving calculus shall be
considered later in this book.
Problem Set 6
1. If a letter is taken at random from the word
“POLYANTHUS”, what is the chance that it is a
vowel?
2. Bola chooses at random a number between 1 and
300. What is the probability that the number is
divisible by 4?
3. What is the probability that a number chosen at
random from the integers between 1 and 10
inclusive is either a prime or a multiple of 3.
4. Find the probability of selecting a figure which is a
parallelogram from a square, a rectangle, a
rhombus, kite and a trapezium.
5. A box contains 6 red pens and 9 blue pens. If one
pen is picked at random, what is the probability
that it is a red pen?
6. In an urn containing 6 red, 3 white and 2 yellow
balls, a ball is randomly selected, what is the
probability that the selected ball is white?
7. The following table gives the numbers of students
present in forms 5A and 5B of a Secondary School on a
certain day.
                   Form 5A   Form 5B
Number of boys        14        10
Number of girls        6         5
The bell goes for lunch and students come out of their
classroom at random. Find the probability that the first
student to come out is:
(i) a boy from 5B
(ii) a girl
(iii) from form 5A
(iv) a boy from 5B or a girl.
8. Find the probability that a number selected at
random from 41 to 56 is a multiple of 9.
9. What is the probability that an integer selected
from the set of integers 20, 21, …, 30 is a prime
number?
10. A fair die is rolled once. What is the probability of
obtaining a number less than 3?
11. What is the probability of having an even number
in single toss of a fair die?
12. A die is rolled 200 times, the outcomes obtained
are shown in the table below.
(a) Find the probability of obtaining a 2
(b) What is the probability of obtaining a number less
than 3 ?
13. A crate of soft drinks contains 10 bottles of
Coca-cola, 8 of Fanta and 6 of Sprite. If one bottle is
selected at random, what is the probability that it is
NOT a Coca-cola bottle?
14. If events X and Y are mutually exclusive,
P(X) = 1/3 and P(Y) = 2/5, find (i) P(X∪Y) (ii) P(X∩Y)
15. Ade and Chike threw a die in
turn. If the die contains the numbers 1, 2, 3, 4, 5 and
6, what is the chance that Chike will throw either
a 5 or a 6?
16. The probability of an event P is ¾ while that of
another event Q is 1/6. If the probability of both P
and Q is 1/12, what is the probability of either P or
Q?
17. In a class of 30 students who sat the teachers’
examination in a certain year, 12 passed with
merit, 15 had passes and the rest had credits.
Find the probability of selecting at random, (i) a
student with Merit, (ii) a student with a Failure
(iii) a student with a Credit (iv) a student with a
pass or Credit.
18. Mrs Jones is expecting a baby. The probability
that it will be a boy is ½ and the probability that the
baby will have blue eyes is ¼. What is the
probability that she will have a blue – eyed boy?
19. A box contains 2 white and 3 blue identical
marbles. If two marbles are picked at random,
one after the other, without replacement, what is
the probability of picking two marbles of different
colours?
20. If the probability that a civil servant owns a car is
1/6, find the probability that: (i) two civil servants,
A and B, selected at random, each owns a car; (ii)
of two civil servants C and D, selected at random,
only one owns a car, (iii) of three civil servants, x,y
and z, selected at random, only one owns a car.
21. A box contains identical balls of which 12 are red,
16 white and 8 blue. Three balls are drawn from
the box one after the other without replacement.
Find the probability that: (a) three are red; (b) the
first is blue and the other two are red; (c) two are
white and one is blue.
22. What is the probability that a total sum of seven
would appear in two tosses of a fair die?
23. A pair of fair dice each
numbered 1 to 6 is tossed. Find the probability of
getting a sum of at least 9.
24. Rolling two unbiased dice,
what is the chance of the difference in scores
being 4?
25. Two numbers are removed at random from the numbers
1, 2, 3 and 4. What is the probability
that the sum of the numbers removed is even?
26. Define the following terms;
(a) An experiment (b) A random experiment

(c) Sample space
(d) Event (e) A random variable
27. State and explain four different approaches to the

definition of probability.
28. (a) Set up a sample space for a single toss of a
pair of fair dice
(b) From the sample space determine the probability
that the sum in tossing a pair of dice is either 9
or 11.
(c) Solve the problem in (b) without using the
sample space.
29. The probability that a man will be alive in 10 years is
3/5 and the probability that his wife will be alive in 10
years is 3/5. Find the probability that in 10 years:
(a) Both will be alive
(b) Only the man will be alive
(c) Only the wife will be alive
(d) At least one will be alive
(e) None will be alive
30. How many different committees of 3 men and 2

women can be formed from 7 men and 5 women.
31. In how many ways can 5 persons be allocated to 3
houses along a street?
32. Seeds of a certain plant have a 70% germination
rate. Calculate the probability that when 10 of
these seeds are planted, 8 or more will germinate.
33. An insurance salesman sells policies to 4 men all of
identical age and in good health. According to the
actuarial tables the probability that a man of this
particular age will be alive in 20 years is 3/4. Find the
probability that in 20 years,
(a) all 4 men

(b) at least 2 men
(c) only 2 men
(d) at least 1 man, will be alive
34. Find (a) the mean and (b) the standard
deviation on an examination in which marks of 72 and 90
correspond to standard scores of −0.7 and 1.5
respectively.
35. Using the standard normal table, compute the
following probabilities:
(a) P(−1.20 < Z < 2.47)
(b) P(−1.24 < Z < 1.89)
(c) P(−2.36 < Z < −0.60)
(d) P(Z > −1.79)
(e) P(Z > −1.46)
CHAPTER 7
DISCRETE PROBABILITY DISTRIBUTIONS
7.0 DISCRETE AND CONTINUOUS RANDOM VARIABLES
In this book, we encounter two types of random
variables. One of them is the discrete random variable
and the other is the continuous type. By a discrete random
variable X we mean that X is capable of assuming either
a finite number of possible values or a countably
infinite set of values, as in the list {1, 2, 3, …}.
Some examples of discrete random variables are as

follows:
1. The attendance X in algebra class for each week
for the whole semester. Variable X can assume
the value 0,1,2,.....n, where n is the total number
of students who registered for algebra.
2. Ten patients suffering from tuberculosis have
registered for treatment for 14 consecutive days at
a community hospital. The number X of patients who
present themselves for treatment daily is a discrete
random variable. X can assume the values 0, 1, 2, …, 10.
3. The number X of video machines in a sample of

20 newly manufactured video machines that have
passed quality control test is a discrete random
variable. The variable X can take values 0,1,2, ..20
4. The rating X on a 1 through 10 scale given to

height of ladies in a beauty contest is a discrete
random variable. The values of X are 1,2,......10
5. The number X of major armed robbery operations
in a certain state in Nigeria during the last six
months. X could be 0, 1, 2, 3 and so forth. The
maximum number of operations is not definite.
A continuous random variable is one that is capable of
assuming values at every point over a given interval.
While values taken by a discrete random variable result
from a counting process, values assumed by a
continuous random variable result from a measuring
process.
Some examples of continuous variables are height,
weight, volume, cost, profit, revenue, expenses,
financial ratios and inventory turnover.
7.1 PROBABILITY DISTRIBUTION
Some probability distributions can be described by a
closed-form mathematical function f(x), called the
probability mass function if the random variable X is
discrete. If the random variable X is continuous, f(x) is
called the probability density function.
The function f(x) enables us to compute the probability
associated with the values assumed by the random
variable X.
Some examples of special discrete distributions are the
binomial, Poisson, and hypergeometric distributions.
They are called discrete distributions because they are
constructed from discrete random variables. The normal,
uniform, exponential, t, Chi-square and F distributions are
continuous distributions, defined on continuous random
variables.
In this chapter we shall focus on discrete distributions.
7.2 PROBABILITY MASS FUNCTION
The function f(x) is called the probability mass function
(pmf) for the discrete random variable X if it is a formula
or table that gives the probability associated with each
value of X such that:
(a) \( f(x) \ge 0 \) for each value x of X (7.1)
(b) \( \sum_{x} f(x) = 1 \)
where the summation is over all the possible values of X.
We at times use a set A to represent all the possible
values of X in a given problem.
7.3 MATHEMATICAL EXPECTATION
Let X be a discrete random variable which can
assume values X1, X2, …, Xk with respective probabilities
P1, P2, …, Pk such that P1 + P2 + … + Pk = 1. The
mathematical expectation of X, denoted by E(X), is
defined as
E(X) = P1X1 + P2X2 + … + PkXk (7.2)
Example 7.1 If the probability that
a man wins a lottery of N16000000
is ¼, what is his expectation?
Solution: The expectation of the man is
(1/4)(16000000) = N4000000
Example 7.2 A lady has three supermarkets A, B and C.
In A and B she makes a profit of N900000 and N450000
respectively with probabilities 0.3 and 0.4. In C she loses
N250000 with probability 0.3. What is the expected profit
of this lady?
Solution. The expected profit of this lady is
(0.3)(900000) + (0.4)(450000) − (0.3)(250000)
= 270000 + 180000 − 75000
= N375000
Expectation can also be computed for continuous
random variables, but the evaluation involves the use of
calculus and shall be considered in the next chapter.
One special application of expectation is in the evaluation
of the mean or expected value. Another one is the
calculation of variance. In doing this, we replace Pi in (7.2)
with f(xi).
The mean or expected value μ of a discrete random
variable X is
\[ \mu = E(X) = \sum_{x} x\, f(x) \]
and the variance σ² of X is
\[ \sigma^{2} = E\left[(X - \mu)^{2}\right] = \sum_{x} (x - \mu)^{2} f(x) \]
The variance can alternatively be computed using
\[ \sigma^{2} = E(X^{2}) - \mu^{2} \] (7.5)
where
\[ E(X^{2}) = \sum_{x} x^{2} f(x) \]
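The mean and variance of any discrete distribution given as a table can be computed in a few lines. The sketch below is illustrative only: the function name mean_and_variance is an assumption made here, and the data are the profit figures used in the solution of Example 7.2 (a loss is entered as a negative profit). The variance is computed by the shortcut formula (7.5).

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete distribution.

    pmf maps each possible value x to its probability f(x).
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    # Shortcut formula: variance = E(X^2) - mean^2.
    return mean, second_moment - mean**2


# Profit (in naira) and probability for supermarkets A, B and C.
profit_pmf = {
    900000: Fraction(3, 10),
    450000: Fraction(4, 10),
    -250000: Fraction(3, 10),
}

mean, variance = mean_and_variance(profit_pmf)
print(mean)                    # 375000
print(float(variance) ** 0.5)  # standard deviation in naira
```

Fractions are used instead of decimals so that the check that the probabilities sum to 1 is exact.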
THE BINOMIAL DISTRIBUTION
We often encounter the problem of modeling
probabilistically a situation involving X successes in n
independent trials. The binomial distribution offers a
solution in this case. A discrete random variable X follows
the binomial distribution if its probability mass function
f(x) is
\[ f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n \]
where
X = number of successes in n trials
p = probability of success at each trial
q = 1 − p = probability of failure at each trial
The mean and variance of the binomial distribution
are given by
μ = np (7.8)
σ² = npq (7.9)
In using the binomial distribution we assume that
(a) The experiment consists of n identical trials
(b) The trials are independent
(c) There are only two possible outcomes for every trial,
“success” and “failure”
(d) The probability p of success at each trial is constant
throughout the experiment
(e) X can assume any of the integral values 0, 1, 2, …, n.
Some areas of application of B(n,p) include
1. Quality control
2. As an approximation to other distributions, for example
the hypergeometric distribution.
Example 7.3 Suppose the probability that any flight
arrives on time at Calabar airport is 0.90. If we have 5
flights, what is the probability that
(a) None of the flights is late
(b) Exactly two of the flights are late.
Solution (a) In this problem, n = 5, p = 0.90. No flight is
late implies all of the 5 flights arrive on time, hence X = 5
and
\[ P(X = 5) = \binom{5}{5} (0.9)^{5} (0.1)^{0} = 0.5905 \]
(b) Two flights being late implies that three flights arrive
on time. Hence X = 3 and
\[ P(X = 3) = \binom{5}{3} (0.9)^{3} (0.1)^{2} = 0.0729 \]
Example 7.4 What is the probability that exactly 2
customers out of 5 who enter a supermarket will make
purchases if the probability of a customer making a
purchase is 0.6?
Solution: n = 5, x = 2, p = 0.6
\[ P(X = 2) = \binom{5}{2} (0.6)^{2} (0.4)^{3} = 0.2304 \]
THE POISSON DISTRIBUTION
The discrete random variable X has the Poisson
distribution if its probability mass function f(x) is of the
form
\[ f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots \]
where λ is equal to the mean of the distribution and
e = 2.718282…
A distinguishing feature of this distribution is that its
variance is equal to its mean.
The Poisson distribution has the following characteristics:
(a) It describes discrete occurrences over an interval
(b) It is used to model rare events, for example plane
crashes over the air space of a given nation
(c) Occurrences are independent
(d) The variable X can assume integral values from
zero to infinity
(e) The expectation E[X] (expected number of
occurrences) must remain constant throughout the
experiment.
Other areas of application of the Poisson distribution are:
(a) Stochastic processes involving queues and waiting
time
(b) Modeling birth and death processes
Example 7.5 If the mean number of bags lost per flight
arriving at a certain airport is 0.25, what is the
probability of a plane
(a) losing no bag on arrival?
(b) losing 2 bags on arrival?
Solution (a) A plane arriving at an airport with lost
baggage is a rare event. Hence the number of lost bags
per flight follows the Poisson distribution with mean
λ = 0.25. If no bag is lost, X = 0 and
P(X = 0) = (0.25)^0 e^(−0.25) / 0! = e^(−0.25) = 0.7788
(b) P(X = 2) = (0.25)^2 e^(−0.25) / 2! = 0.0243
Example 7.6 Using the information in Example 7.5, find the
probability of having at most 2 lost bags in a flight.
Solution P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= 0.7788 + 0.1947 + 0.0243
= 0.9978
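The lost-baggage calculations of Examples 7.5 and 7.6 can be verified as follows. This is a minimal sketch; the names poisson_pmf and poisson_cdf are chosen here for illustration.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the pmf from 0 to k."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))


lam = 0.25  # mean number of bags lost per flight

print(round(poisson_pmf(0, lam), 4))  # 0.7788
print(round(poisson_pmf(2, lam), 4))  # 0.0243
print(round(poisson_cdf(2, lam), 4))  # 0.9978
```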
POISSON APPROXIMATION TO THE BINOMIAL DISTRIBUTION
At times, it becomes tedious to calculate the binomial
probabilities when n → ∞ and p → 0. Now as n
becomes large and p becomes smaller and smaller, such
that the Poisson parameter λ is constant and equal to np,
we can approximate binomial probabilities by the Poisson
distribution.
Example 7.7 Five percent of electric bulbs produced in a
certain factory turn out to be defective. Find the
probability that in a sample of 20 bulbs chosen at
random, exactly 4 will be defective by using
(a) the binomial distribution
(b) the Poisson approximation to the binomial distribution
Solution: (a) Using the binomial distribution,
p = 0.05, n = 20, X = 4. Then
\[ P(X = 4) = \binom{20}{4} (0.05)^{4} (0.95)^{16} = 0.013 \]
(b) Using the Poisson distribution, λ = np = 20(0.05) = 1
\[ P(X = 4) = \frac{1^{4} e^{-1}}{4!} = 0.015 \]
In general, the approximation is good when p < 0.1 and
np ≤ 5.
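To see how close the approximation is across several values of X, and not only at X = 4, the two probability functions can be tabulated side by side. The sketch below uses the figures of the bulb example (n = 20, p = 0.05); the function names are assumptions made here for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson probability of x occurrences with mean lam."""
    return lam**x * exp(-lam) / factorial(x)


n, p = 20, 0.05
lam = n * p  # Poisson parameter matched to the binomial mean

for x in range(6):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, lam)
    print(f"x={x}  binomial={exact:.4f}  poisson={approx:.4f}  "
          f"difference={abs(exact - approx):.4f}")
```

Repeating the loop with a larger n and a smaller p, keeping np fixed, should show the differences shrinking.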
THE HYPERGEOMETRIC DISTRIBUTION
The discrete random variable X has the hypergeometric
distribution if its probability mass function is given by
\[ f(x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}} \] (7.11)
where
N = size of the population
K = number of successes in the population
x = number of successes in the sample
n = sample size or number of trials.
The mean μ and variance σ² of this distribution are,
respectively,
\[ \mu = \frac{nK}{N} \] (7.12)
\[ \sigma^{2} = n \cdot \frac{K}{N} \cdot \frac{N-K}{N} \cdot \frac{N-n}{N-1} \] (7.13)
Assumptions of the hypergeometric distribution are as
follows:
(a) Sampling is done without replacement
(b) The population size N is finite and known
(c) There are only two outcomes for each trial, success
and failure
(d) Trials are not independent
Areas of application of the hypergeometric distribution
include:
(1) Estimating the probability of defectives in a sample
when the number of defectives in the population and the
population size are known. This application is in sampling
inspection.
(2) Estimating the probability of preference of a particular
brand to another. If X medical laboratories in a sample of
n laboratories prefer Reagent A to Reagent B in carrying
out HIV/AIDS tests, and the sample is drawn from N
medical laboratories, where K prefer Reagent A, what is
the probability that X in a sample of n units prefer
Reagent A?
Example 7.8 A pond contains 24 fish, out of which 8 are
marked. A sample of 5 fish is drawn at random.
What is the probability that 3 of them are marked?
Solution: Here N = 24, n = 5, K = 8 and x = 3. According
to the hypergeometric distribution,
\[ P(X = 3) = \frac{\binom{8}{3}\binom{16}{2}}{\binom{24}{5}} = \frac{(56)(120)}{42504} = 0.1581 \]
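The marked-fish calculation can be checked by counting combinations directly. The following is a sketch only; the function name hypergeom_pmf and its argument order are chosen here for illustration, and the symbols N, K, n and x carry the meanings defined for (7.11).

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in a sample of n drawn without
    replacement from a population of N containing K successes."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


# Pond with 24 fish, 8 marked; sample of 5; exactly 3 marked.
p_three_marked = hypergeom_pmf(3, N=24, K=8, n=5)

print(p_three_marked)                   # 40/253
print(round(float(p_three_marked), 4))  # 0.1581
```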
Problem Set 7
1. Of all the Anita Jones books bought last year, 70%
were purchased for readers 15 or older. If 10 Anita
Jones fans who bought books last year are
interviewed, find the probability that
(a) At least 8 of them are 15 or older
(b) Exactly 6 of them are 15 or older
(c) Fewer than 4 of the fans surveyed are 15 or
older.
2. Health survey records show that 25% of all
patients admitted to a certain community clinic fail
to pay their bills and in the course of time, the bills
are written off. A random sample of size 5 is
selected from a large set of prospective patients
served by the clinic. Find the probability that
(a) All the patients’ bills will eventually be written
off.
(b) Only 2 bills will be written off.
(c) None will be written off
3. Suppose that 10% of the fields in a cultivated area
are infested with cassava pest. 200 fields in this
area are randomly selected and checked for
cassava pest.
(a) What is the mean number of fields sampled
that are attacked with pest?
(b) Within what limits would you expect to find the
number of infested fields, with probability
approximately 99% ?
4. The increase of small passenger planes in major
airports has called for concern over air safety. An
airport in south-east has recorded a monthly
average of 6 near-misses on landings and takeoffs
in the past four years. Find the probability that
during a given month
(a) There are no near-misses on landings and
takeoffs at the airport.
(b) There are 4 near-misses
(c) There are at least 4 near-misses
5. The number of people entering the intensive care
unit of Soyo Medical Clinic on any one day is a
random variable X. If X is Poisson with mean
equal to 8 persons per day. What is the
probability that the number of people entering the
intensive care unit on a particular day is
(a) 3
(b) Less than or equal to 3?
6. A packet of sweets contains 5 Tom Tom and 4 Mr.
Blue sweets. A child brings out 3 sweets without
looking. What is the probability that
(a) There are 2 Mr. Blue and 1 Tom Tom sweets
(b) The sweets are all Tom Tom?
(c) The sweets are all Mr. Blue?
7. A company has 6 applicants for 3 positions: 2 men
and 4 women. If the 6 applicants are equally
qualified and that no gender is preferred in the
selection procedure and if X represents the
number of women chosen to fill the three positions
(a) Write out f(x), the probability function for X.
(b) What is the mean and variance of this
distribution?
8. A packaging experiment is carried out by
displaying two different package designs for an
instant noodle food side by side on a supermarket
shelf. This is to find out customers’ preference for
one of the two designs. On a given day 20
customers, each purchased a package from the
supermarket. Let X be the number of customers
who prefer the first design.
(a) If there is no preference for any of the designs
what is the probability f(x) that a buyer chooses
the first package design?
(b) If there is no preference, use the results in part
(a) to find the mean and variance of X.
(c) If 4 of the 20 customers choose the first
package design and 16 choose the second design
what is your conclusion about the second design?
9. A manufacturer of radio cassettes ships them in
lots of 1500 cassettes per lot. Before shipment, a
random sample of 20 cassettes are tested. Some
may or may not be defective.
(a) Construct the probability distribution of X, the
number of defective cassettes in a sample of 20?
(b) What distribution can be used to estimate
approximately the probability in (a) ?
CHAPTER 8
CONTINUOUS PROBABILITY DISTRIBUTIONS
8.0 INTRODUCTION
In this chapter, we shall consider distributions
that depend on continuous random variables. Precisely,
we shall look at the mathematical form of each
distribution considered and its applications.
8.1 THE UNIFORM DISTRIBUTION
The continuous random variable X has the uniform
distribution if its probability density function f(x) is of the
form
\[ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases} \] (8.1)
Figure 8.1: Sketch of the Uniform Distribution.
The sketch of the distribution shows that it has a constant
altitude equal to 1/(b − a). This is why it is also called
the rectangular distribution. The mean of this distribution is
\[ \mu = \frac{a+b}{2} \]
and the variance is
\[ \sigma^{2} = \frac{(b-a)^{2}}{12} \]
Some applications of the rectangular distribution include
random number generation and simulation in general.
Example 8.1 Given the uniform distribution (8.1),
compute the probability that X lies between c and d,
where a ≤ c < d ≤ b.
Solution: P(c < X < d) = (d − c) · 1/(b − a)
= (d − c)/(b − a)
Example 8.2 Using the result in Example 8.1 find

P(c < x <d ) when a =5 , b =10, c= 3 and d=6.
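In Example 8.2 the lower limit c = 3 lies below a = 5, where the density (8.1) is zero, so only the part of the interval (c, d) that overlaps [a, b] contributes to the probability. The sketch below is one way of handling this, under the assumption that the interval should simply be clipped to [a, b]; the function name uniform_prob is introduced here for illustration.

```python
def uniform_prob(c, d, a, b):
    """P(c < X < d) for X uniform on [a, b].

    The density is 1/(b - a) on [a, b] and 0 elsewhere, so the
    interval (c, d) is first clipped to [a, b].
    """
    if not a < b:
        raise ValueError("need a < b")
    lower, upper = max(c, a), min(d, b)
    overlap = max(upper - lower, 0.0)  # 0 if the intervals do not meet
    return overlap / (b - a)


# Interval entirely inside [a, b]: agrees with (d - c)/(b - a).
print(uniform_prob(6, 8, 5, 10))  # 0.4

# Figures of Example 8.2: only the part from 5 to 6 counts.
print(uniform_prob(3, 6, 5, 10))  # 0.2
```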
8.2 THE EXPONENTIAL DISTRIBUTION
The Poisson distribution models the number of events or
changes happening in a given time interval. However, the
waiting time X between events (changes) is a random
variable which follows the exponential distribution, say
with mean μ. If the mean rate of events happening is
λ, then the Poisson parameter is λ and the mean waiting
time between events is μ = 1/λ.
The continuous random variable X is said to have an
exponential distribution if its pdf is given by
\[ f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0 \] (8.4)
The mean of (8.4) is
\[ \mu = \frac{1}{\lambda} \] (8.5)
and its variance is
\[ \sigma^{2} = \frac{1}{\lambda^{2}} \]
We apply the exponential distribution in the examples
below.
Example 8.3: Customers arrive at a certain supermarket in
central Port Harcourt at a mean rate of 15 per hour.
(a) What is the mean waiting time in minutes?
(b) If X denotes the waiting time in minutes until the first
customer arrives, what is its pdf?
(c) What is the probability that the supermarket
receptionist will have to wait more than 8 minutes before
the first customer arrives?
Solution: (a) The mean rate of events λ in persons per
minute is
λ = 15 per hour = 15/60 = 0.25 per minute
Therefore the mean waiting time is
μ = 1/λ = 1/0.25 = 4 minutes.
(b) The pdf of X, the waiting time, is
\[ f(x) = 0.25\, e^{-0.25x}, \qquad x \ge 0 \]
(c) \( P(X > 8) = e^{-0.25(8)} = e^{-2} = 0.1353 \)
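The waiting-time calculations of Example 8.3 can be sketched in code as follows. The sketch assumes the rate form of the exponential pdf, f(x) = λe^(−λx) for x ≥ 0, for which P(X > x) = e^(−λx); the function names are chosen here for illustration.

```python
from math import exp


def exponential_pdf(x, lam):
    """Density of the waiting time at x for event rate lam."""
    return lam * exp(-lam * x) if x >= 0 else 0.0


def prob_wait_longer_than(x, lam):
    """P(X > x): probability that no event occurs before time x."""
    return exp(-lam * x) if x >= 0 else 1.0


lam = 15 / 60  # 15 customers per hour expressed per minute

mean_wait = 1 / lam
p_more_than_8 = prob_wait_longer_than(8, lam)

print(mean_wait)                # 4.0
print(round(p_more_than_8, 4))  # 0.1353
```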
Example 8.4 Using data from Example 8.3, estimate the

probability .
Solution: Take it as an exercise.
8.3 THE NORMAL DISTRIBUTION
A continuous random variable X has the normal
distribution if its pdf is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty \]
where
μ = mean of X
σ² = variance of X
π = 3.14159…
e = 2.71828…
Figure 8.2: The Shape of the Normal Distribution.
Properties of the Normal distribution.
1. It is a symmetrical (bell-shaped) distribution
2. It is unimodal
3. The total area under the curve sums to 1.
4. Area to the right of the mean equals area to the
left of the mean = 0.5
NOTATION: If X has the normal distribution with mean μ
and variance σ², we write this as “X ~ N(μ, σ²)”.
STANDARD NORMAL DISTRIBUTION
Let X have the normal distribution with mean μ and
variance σ². Then the variable Z given as
\[ Z = \frac{X - \mu}{\sigma} \] (8.8)
has a standard normal distribution with mean 0 and
variance equal to 1. Hence Z ~ N(0, 1). The probability
density function of Z is
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2} \] (8.9)
The random variable Z is used to compute the
probability that a random variable X lies between X = a
and X = b by using
\[ P(a < X < b) = P(z_{1} < Z < z_{2}) \] (8.10)
where
\[ z_{1} = \frac{a - \mu}{\sigma} \quad \text{and} \quad z_{2} = \frac{b - \mu}{\sigma} \]
While X may be affected by units, Z has no unit and is
dimensionless.
The value of Z is referred to as the Z score or standard
score. Values of Z and associated probabilities are
obtained from the table of the standard normal
distribution, called the Z table.
Example 8.5: Using the standard normal table, find the
following probabilities.
(a) P(−3 < Z < 3) (b) P(−2.51 < Z < 2.51)
(c) P(Z < 1) (d) P(Z > −3)
(e) P(24 < X < 36) (f) P(X < 48)
For (e) and (f), X is normal with σ = 10 and μ = 12.
Solution (see Figure 8.3)
(a) P(−3 < Z < 3) = area under the curve
between Z = −3 and Z = 0
plus area under the curve
between Z = 0 and Z = 3
= 2 times the area between
Z = 0 and Z = 3 (because
of symmetry)
= 2(0.4987) = 0.9974
(b) P(−2.51 < Z < 2.51)
= 2(area under the curve between
Z = 0 and Z = 2.51)
= 2(0.4940)
= 0.9880
(c) P(Z < 1) = area to the left of Z = 0 plus area between
Z = 0 and Z = 1
= 0.5 + 0.3413
= 0.8413
(d) P(Z > −3)
= area to the right of Z = 0 plus area between Z = −3 and
Z = 0
= 0.5 plus area between Z = 0 and Z = 3 (symmetry)
= 0.5 + 0.4987
= 0.9987
(e) P(24 < X < 36)
= P[(24 − 12)/10 < (X − 12)/10 < (36 − 12)/10]
= P(1.2 < Z < 2.4)
= area between Z = 0 and Z = 2.4 minus area between
Z = 0 and Z = 1.2
= 0.4918 − 0.3849
= 0.1069
(f) P(X < 48) = P[(X − 12)/10 < (48 − 12)/10]
= P(Z < 3.6)
= area to the left of Z = 0 plus area between Z = 0 and
Z = 3.6
= 0.5 + 0.4998 = 0.9998 ≈ 1
EMPIRICAL RULE
Many distributions of data in real life are mound-shaped
and so can be approximated by a bell-shaped frequency
distribution known as a normal curve. A data set with this
type of distribution has some special characteristics in
variability, which are stated in the law
called the empirical rule:
Given a distribution of data or measurements that is
approximately normal (bell-shaped), it follows that
the interval with end points
μ ± σ contains approximately 68% of the measurements.
μ ± 2σ contains approximately 95% of the measurements.
μ ± 3σ contains approximately 99.7% of the
measurements.
Example 8.6 Suppose the scores on an aptitude test
given to all senior secondary students in a certain state
have approximately a normal distribution with mean
μ = 74 and standard deviation σ = 10. Find the bounds
for approximately
(a) 68% of the scores
(b) 95% of the scores
(c) 99.7% of the scores
Solution
(a) μ ± σ = 74 ± 10 contains approximately 68% of the
scores. Hence the bounds are (64, 84).
(b) μ ± 2σ = 74 ± 2(10) contains approximately 95% of the
scores, i.e. (54, 94) contains 95% of the scores.
(c) μ ± 3σ = 74 ± 3(10) = 74 ± 30, so (44, 104) contains
approximately 99.7% of the scores.
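The percentages in the empirical rule are rounded values of areas under the normal curve, and they can be recovered from the error function. The sketch below is illustrative; it assumes the identity P(|Z| < k) = erf(k/√2), and the function names are chosen here.

```python
from math import erf, sqrt


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variable X."""
    return erf(k / sqrt(2.0))


def bounds(mu, sigma, k):
    """End points of the interval mu ± k*sigma."""
    return (mu - k * sigma, mu + k * sigma)


# Aptitude test of Example 8.6: mean 74, standard deviation 10.
for k in (1, 2, 3):
    print(k, bounds(74, 10, k), round(prob_within(k), 4))
# 1 (64, 84) 0.6827
# 2 (54, 94) 0.9545
# 3 (44, 104) 0.9973
```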
8.4 CENTRAL LIMIT THEOREM (CLT)
THEOREM (C.L.T). If is the mean of a random

sample of size n taken from a population having the
mean  and finite variance σ2, then as n tends to
has a standard normal distribution with mean 0 and

variance 1.
THE CENTRAL LIMIT THEOREM APPLIED
TO THE BINOMIAL DISTRIBUTION [B(n,p)]
This is the same as the normal approximation to the binomial
distribution. For B(n,p), with parameters n and p, the required
standard score Z is
$Z = \dfrac{X - np}{\sqrt{npq}}$
since E(X) = np and Var(X) = npq, where q = 1 - p.
The quantity Z is approximately N(0, 1).
In applying the CLT, we note that
(a) Z has the general form
$Z = \dfrac{X - E(X)}{\sqrt{Var(X)}}$
or
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(b) A continuity correction has to be made if the
variable X is integer-valued.
That is, if X is discrete, then we use
P(j - 0.5 < X < K + 0.5)
whenever we seek to compute
P(j ≤ X ≤ K). Therefore
P(j ≤ X ≤ K) ≈ P(j - 0.5 < X < K + 0.5)
Example 8.7: In 100 tosses of a fair coin find
P(45 ≤ X ≤ 55), where X is the number of heads.
Solution: Here n = 100, np = 100(1/2) = 50, npq = 25, √(npq) = 5
P(45 ≤ X ≤ 55)
≈ P(44.5 < X < 55.5)
= P[(44.5 – 50)/5 < (X – 50)/5 < (55.5 – 50)/5]
= P(-1.1 < Z < 1.1) = 0.7286
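To see how well the normal approximation works for the coin-tossing example, the exact binomial probability can be compared with the continuity-corrected normal value. This is a minimal sketch using only the Python standard library; the function names are ours and are not part of the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_exact(n, p, j, k):
    """Exact P(j <= X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(j, k + 1))

def binom_normal_approx(n, p, j, k):
    """Normal approximation to P(j <= X <= k) with continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5) - approx.cdf(j - 0.5)

# 100 tosses of a fair coin, between 45 and 55 heads inclusive
print(round(binom_exact(100, 0.5, 45, 55), 4))
print(round(binom_normal_approx(100, 0.5, 45, 55), 4))
```

The two printed values should agree closely, which is what justifies using the Z table in place of the binomial sum when n is large.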
THE CENTRAL LIMIT THEOREM APPLIED TO THE
POISSON DISTRIBUTION.
The Poisson distribution with parameter λ has
Mean = E(X) = λ
Variance σ² = Var(X) = λ
The quantity Z is
$Z = \dfrac{X - \lambda}{\sqrt{\lambda}}$
Example 8.8: Calls arrive at the rate of λ = 3 calls per
minute in a medical clinic.
(a) What is the expected number of calls within 8 hours of
a working day?
(b) What is the probability that in an 8-hour working day
the medical centre will receive between 900 and 1500 calls?
Solution (a) For 8 hours, the expected number of calls is
3(60)(8) = 1440.
(b) With λ = 1440 and the continuity correction,
P(900 ≤ X ≤ 1500) ≈ P(899.5 < X < 1500.5)
= P[(899.5 – 1440)/√1440 < Z < (1500.5 – 1440)/√1440]
= P(-14.24 < Z < 1.59)
= 0.5000 + 0.4441 = 0.9441
Problem Set 8
1. Find the probability of getting between 4 and 7

heads inclusive in 10 tosses of a fair coin by using
(a) The binomial distribution and
(b) The normal approximation to the binomial
distribution.
2. In an objective test in algebra, a student is to choose
an answer to a question from five options A, B, C,
D and E. If the probability of any of these letters
being the correct answer is 0.2, find the probability
that
(a) 15 or more out of 25 questions are correct
(b) 30 out of 40 questions are correct
3. Fifteen percent of electric bulbs produced by a
certain factory are defective. In a random sample
of 500 bulbs produced by this factory find the
probability that
(a) At most 40 bulbs will be defective
(b) Between 40 and 50 bulbs will be defective.
4. The probability that an individual suffers a bad
reaction from oral intake of a certain constipation
mixture is 0.002. Find the probability that, out of
1000 patients who take the mixture,
(a) Exactly 4 persons will suffer a bad reaction
(b) More than 5 persons will suffer a bad reaction.
CHAPTER 9
CONFIDENCE INTERVAL
9.0 INTRODUCTION
We shall consider among other issues:

- Point and interval estimation
- Confidence interval for population mean for both
large and small samples
- Confidence interval for the population proportion
9.1 TYPES OF ESTIMATORS

In order to estimate the value of a population parameter,
we use information from the sample in the form of an
estimator. Estimators are constructed using information
from sample realizations and hence, by definition they
are also statistics.
Definition: An estimator is a rule or formula that tells us
how to calculate an estimate of a desired population
parameter using sample observations.
We use either point estimation or interval
estimation. In point estimation, a single number is
calculated from sample data to represent a parameter. In
interval estimation, two numbers are calculated from
sample data to form an interval within which the parameter
is expected to lie. The pair of numbers forms an
interval estimate or confidence interval.
In practice, we encounter both point and interval
estimators in many situations. For instance, tourists
moving from one country to another are concerned not
only about the average temperature (point estimate) but also
about the minimum and maximum temperatures of the cities to be
visited. Also, an employee leaving his current job for
another is concerned about the new salary bracket
(interval). Will his new salary fall within a reasonable
bracket?
Definition: (UNBIASED ESTIMATOR) An estimator of a
parameter is said to be unbiased if the mean of its
distribution is equal to the true value of the parameter;
otherwise, it is said to be biased.
Another way to explain this is to use mathematical
expectation. If we denote the estimator of a parameter θ
by $\hat{\theta}$, then we say that the estimator is unbiased if
$E(\hat{\theta}) = \theta$. An example of an unbiased estimator is the
sample mean $\bar{x}$. It can be shown that $E(\bar{x}) = \mu$
(the population mean). Also the sample variance s² is
unbiased for the population variance σ², where
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
EFFICIENCY: It is common to talk of the efficiency of one
estimator relative to another.
Definition: Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be unbiased estimators of the
parameter θ. We say that $\hat{\theta}_1$ is more efficient than
$\hat{\theta}_2$ if
$Var(\hat{\theta}_1) \le Var(\hat{\theta}_2)$
The variance (or, better, the variability) of an estimator is
measured in terms of its standard error. For instance,
$\bar{x}$ has a standard error (SE) of size $s/\sqrt{n}$, where s is the
standard deviation of the sample data. For n ≥ 30, we
expect μ to be estimated with a margin of error of
$\pm 1.96\, s/\sqrt{n}$ if we allow a 5% error.
9.2 CONSTRUCTING A CONFIDENCE INTERVAL

The first step is to allow an error, called α, expressed in percent (or in
decimals), and to define a quantity called the confidence
coefficient, 1 - α. The confidence coefficient is the probability
that a confidence interval will contain the estimated
parameter.
At times, we express the coefficient or interval as a
percentage.
For instance, a 95% confidence interval (CI) is obtained when
α = 0.05. For any α, we talk of a 100(1 - α)% CI. If we
assume normality, a 95% CI for a parameter θ is of the
form:
Parameter estimate ± 1.96(SE)
Details of the construction of a CI will be clearer when
we come to specific cases and worked examples.
9.3 A LARGE-SAMPLE CI FOR A POPULATION MEAN
μ.
Suppose we have a normal population or the sample size
n is large. Then a 100(1 - α)% CI for μ is
$\bar{x} \pm Z_{\alpha/2}\, \dfrac{\sigma}{\sqrt{n}}$
If σ is not known and n ≥ 30, we use s as an estimate for
σ. A 100(1 - α)% CI for μ in this case is
$\bar{x} \pm Z_{\alpha/2}\, \dfrac{s}{\sqrt{n}}$
In all these, $Z_{\alpha/2}$ is the Z-value corresponding to an area
α/2 in the upper tail of a standard Z distribution.
Example 9.1: A random sample of 50 points in an experimental fish
pond reveals an average of 76 grams of
chemical contaminants per day with a standard deviation
s = 4 grams per day. Using these sample statistics,
construct a 95% CI for the mean daily contamination of
the pond.
Solution
The approximate 95% confidence interval for μ is
$\bar{x} \pm 1.96\, s/\sqrt{n}$
= 76 ± 1.96 (4/√50)
= 76 ± 1.11
= [74.89, 77.11]
Example 9.2. Repeat the calculations in
Example 9.1 if a 99% confidence interval is desired.
Solution. Now 99% = (1 - 0.01)100%, meaning that α =
0.01 and α/2 = 0.005. Therefore Z0.005 = 2.58 and a 99% CI
is
$\bar{x} \pm 2.58\, s/\sqrt{n}$
= 76 ± 2.58 (4/√50)
= 76 ± 1.46
= [74.54, 77.46]
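The two intervals above differ only in the critical Z-value, so the calculation is easy to wrap in one small function. The sketch below is illustrative: the name z_interval is ours, and the critical value is taken from the inverse normal CDF rather than from a printed table, so it may differ from the tabulated 1.96 and 2.58 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, conf=0.95):
    """Large-sample CI for a mean: xbar ± z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail critical value
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

# Fish-pond data: mean 76, standard deviation 4, sample size 50
for conf in (0.95, 0.99):
    lo, hi = z_interval(76, 4, 50, conf)
    print(conf, round(lo, 2), round(hi, 2))
```

Raising the confidence level from 95% to 99% widens the interval, since a larger critical value multiplies the same standard error.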
9.4 A LARGE-SAMPLE CI FOR A POPULATION

PROPORTION, p.
Proportions occur frequently in
estimation problems. Some examples are
(a) Proportion of students that can be expected to be
females in a university admission.
(b) Proportion of seeds that germinate.
(c) Proportion of voters that is expected to favour a
certain political candidate in an election.
(d) Proportion of sales of a product that may come
through internet purchase.
If the sample size is large, a 100(1 - α)% CI for the
population proportion is
$\hat{p} \pm Z_{\alpha/2} \sqrt{\dfrac{pq}{n}}$
where $Z_{\alpha/2}$ is the Z-value corresponding to an area α/2
in the right (upper) tail of a standard normal distribution.
Since p and q are unknown, we estimate them as
$\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$
We accept the sample size as being large enough when
np ≥ 5, nq ≥ 5.
Example 9.3: A random sample of 973 sales contacts
reveals that 584 customers purchased a video
camera from DK Technology through the internet.
Construct a 90% confidence interval for p, the proportion
of customers who purchased a video camera
through the internet.
Solution: The point estimate for p is
$\hat{p} = x/n = 584/973 = 0.600$
with standard error
$SE = \sqrt{\hat{p}\hat{q}/n} = \sqrt{(0.600)(0.400)/973} = 0.0157$
For a 90% CI, α = 0.10 and α/2 = 0.05.
Therefore Z0.05 = 1.645. Then a 90% CI for p is
0.600 ± 1.645(0.0157)
= 0.600 ± 0.026.
This implies that p lies in the interval [0.574, 0.626]. We
infer that the proportion of sales made online by the company is between
57.4% and 62.6%.
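The proportion interval follows the same pattern as the interval for a mean, with the standard error built from the sample proportion. The following is a minimal sketch for the video-camera data (584 purchases out of 973 contacts); the function name proportion_interval is ours.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.90):
    """Large-sample CI for a proportion: p_hat ± z_{alpha/2} * sqrt(p_hat*q_hat/n)."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lo, hi) = proportion_interval(584, 973)
print(round(p_hat, 3), round(se, 4), round(lo, 3), round(hi, 3))
```

Passing a different value of conf (for example 0.95 or 0.99) gives the corresponding wider interval from the same sample.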
9.5 A LARGE – SAMPLE CI FOR μ1 – μ2
Estimation of the difference between two population
means μ1 and μ2 arises in many situations.
This type of estimation occurs when making comparisons
involving two populations. Some real-life situations
include:
(a) Production output in a factory from raw material
supplied by two different contractors.
(b) Mineral content of a plant grown on two different
types of soil nutrients.
(c) Performance of students under two different
teaching methods.
We begin the construction of a confidence interval for
μ1 – μ2 by noting that if the samples are
independent, then $\bar{x}_1 - \bar{x}_2$
(a) is normally distributed if each population has a
normal distribution
(b) has approximately a normal distribution if each of
n1 and n2 is large (using the central limit theorem)
(c) has mean μ1 – μ2
(d) has standard error
$SE = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Then the 100(1 - α)% CI for μ1 – μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
If σ1² and σ2² are unknown, we estimate them
respectively by s1² and s2². The 100(1 - α)% CI for μ1 – μ2
is now
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Example 9.4: For the first 100 days of manufacturing
electric bulbs, Xcel got their raw materials from two major
contractors, De Lon and Xpert. The outputs gave a mean of
26350 bulbs with variance 1435000 using inputs from De
Lon and 25096 with variance equal to 1957000 using
inputs from Xpert. Construct a 99% confidence interval
for μ1 – μ2, the difference in mean number of bulbs
produced daily.
Solution. The point estimate for μ1 – μ2 is
$\bar{x}_1 - \bar{x}_2$ = 26350 – 25096 = 1254 with standard error
SE = √(1435000/100 + 1957000/100) = √33920 = 184.17
A 99% CI for μ1 – μ2 is
1254 ± Z0.005(184.17)
= 1254 ± 2.58(184.17)
= 1254 ± 475.2
That is, 778.8 < μ1 – μ2 < 1729.2.
The difference in the average daily output using inputs
from the two independent sources is estimated to lie between
778.8 and 1729.2 bulbs. However, since we are dealing
with numbers of bulbs, we accept the limits to be 779 and
1730.
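The calculation for the bulb-output example can be sketched as follows. The function name two_mean_z_interval is ours, and the critical value comes from the inverse normal CDF (about 2.576) rather than the tabulated 2.58, so the limits differ slightly from those in the worked solution.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x1, v1, n1, x2, v2, n2, conf=0.99):
    """Large-sample CI for mu1 - mu2 from sample means, variances and sizes."""
    se = sqrt(v1 / n1 + v2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = x1 - x2
    return diff, se, (diff - z * se, diff + z * se)

diff, se, (lo, hi) = two_mean_z_interval(26350, 1435000, 100, 25096, 1957000, 100)
print(diff, round(se, 2), round(lo, 1), round(hi, 1))
```

Because the whole interval lies above zero, the sample suggests a higher mean daily output with the first contractor's inputs.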
9.6 A LARGE-SAMPLE CONFIDENCE INTERVAL

FOR P1 – P2.
We assume that independent random samples of sizes
n1 and n2 have been selected from binomial populations
with parameters p1 and p2 respectively. The difference
between sample proportions,
$\hat{p}_1 - \hat{p}_2$ = x1/n1 - x2/n2, has
(a) The mean p1 - p2
(b) The standard error
$\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
which is estimated by
$SE = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$
(c) An approximately normal distribution when n1 and n2 are
large, due to the central limit theorem (the products
n1p1, n1q1, n2p2 and n2q2 must each be greater
than 5 to justify the use of the normal approximation).
Example 9.5: Opinion polls were conducted in two local
government areas, Oto and Aka, of the same ethnicity in
South South Nigeria to know the minds of the people towards
the process of choosing a paramount ruler.
The Council of Chiefs proposes the use of consensus
decision instead of election. Below is the outcome of the
polls.
Table 9.1: Sample Values for Opinion on Consensus
Choice versus Election.
                       Oto    Aka
Sample size, n          50    100
Favouring consensus     39     66
Estimate the difference in the true proportions favouring
consensus decision in the choice of a paramount ruler.
Solution. The estimate for p1 – p2 is 39/50 – 66/100 =
0.78 – 0.66 = 0.12.
The standard error of $\hat{p}_1 - \hat{p}_2$ is
SE = √[(0.78)(0.22)/50 + (0.66)(0.34)/100] = 0.0753
A 99% CI for p1 – p2 is
0.12 ± 2.58(0.0753)
= 0.12 ± 0.194
= [-0.074, 0.314].
The interval contains the value p1 – p2 = 0, which implies
that there may be no difference in proportions.
9.7 A SMALL – SAMPLE CI FOR A POPULATION

MEAN μ.
In small samples, we use the Student’s t distribution in
constructing a confidence interval for μ. This is based on
the assumptions that
(a) The sample must be randomly selected
(b) The population from which we are sampling must
be normally distributed.
A small-sample 100(1 - α)% confidence interval for μ is
$\bar{x} \pm t_{\alpha/2}\, \dfrac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the t-value corresponding to an area α/2 in
the right tail of the t distribution with (n-1) degrees of
freedom.
Example 9.6: A new process for producing cement can
be operated at an efficient level only if the mean daily
production of cement is greater than 500000kg. To
assess the efficiency of the production process, six
operations with recorded weights 460000, 615000,
520000, 480000, 570000 and 540000kg were used.
Find a 95% CI for the population mean μ.
Solution: From the sample data, n = 6,
$\bar{x}$ = 530830kg and s = 57310kg.
With 5 degrees of freedom, t0.025 = 2.571.
A 95% CI for μ is
$\bar{x} \pm t_{0.025}\, s/\sqrt{n}$ = 530830 ± 2.571(57310/√6)
= [470680, 590980]
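The small-sample interval for the cement data can be reproduced from the six recorded weights. The sketch below assumes the table value t0.025 = 2.571 for 5 degrees of freedom, which is typed in by hand because the Python standard library has no t-distribution quantile function; the variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

# Upper-tail critical value t_{0.025} for 5 degrees of freedom, copied from a
# standard t table (an assumption of this sketch).
T_025_DF5 = 2.571

weights = [460000, 615000, 520000, 480000, 570000, 540000]
n = len(weights)
xbar = mean(weights)
s = stdev(weights)                 # sample standard deviation (divisor n - 1)
margin = T_025_DF5 * s / sqrt(n)
lo, hi = xbar - margin, xbar + margin
print(round(xbar), round(s), round(lo), round(hi))
```

The unrounded mean and standard deviation give limits that agree with the worked solution to within rounding. Since 500000 lies inside the interval, the data do not by themselves show that the mean daily production exceeds 500000kg.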
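9.8 A SMALL-SAMPLE CI FOR μ1 – μ2 WHEN σ1² = σ2²
Suppose that we have two randomly selected
independent samples from two normally distributed
populations having equal variances. Then a 100(1 - α)%
CI for μ1 – μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where
$s^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
and $t_{\alpha/2}$ is based on (n1 + n2 – 2) degrees of freedom.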
Example 9.7: An assembly process in a factory requires
approximately a four-week training period for a newly
employed staff member to attain maximum efficiency in
assembling a device. A new method of training was
proposed, and a test was conducted to compare the new
method with the old standard procedure. Two groups,
each of ten new employees, were trained for a period of a
month, one group using the new procedure and the other
the standard procedure. Table 9.2 gives the assembly times in
minutes required by employees under the two
procedures, recorded at the end of the one-month
period. Find a 90% CI for μ1 – μ2, the difference in
mean time between the two procedures.
Table 9.2: Assembly Times for two Training Methods.
Standard Method New Method

43 46
48 42
46 40
39 36
52 45
55 51
46 38
42 43
45 42
35 32
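Solution
From Table 9.2, n1 = n2 = 10,
$\bar{x}_1$ = 45.1, $\bar{x}_2$ = 41.5, s1² = 34.3222, s2² = 28.9444
Pooled variance
s² = [9(34.3222) + 9(28.9444)]/18
= 31.6333 with 10+10-2=18 df
(df = degrees of freedom)
A 90% CI for μ1 – μ2 is
(45.1 – 41.5) ± t0.05 √[31.6333(1/10 + 1/10)]
= 3.6 ± 1.734(2.5153)
= 3.6 ± 4.36
= [-0.76, 7.96]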
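The pooled-variance interval for the Table 9.2 data can be checked with a short script. This is a sketch only: the critical value t0.05 = 1.734 for 18 degrees of freedom is copied from a t table (the standard library does not provide it), and the variable names are ours.

```python
from math import sqrt
from statistics import mean, variance

T_05_DF18 = 1.734  # t_{0.05} for 18 df from a standard t table (assumed here)

standard = [43, 48, 46, 39, 52, 55, 46, 42, 45, 35]
new = [46, 42, 40, 36, 45, 51, 38, 43, 42, 32]

n1, n2 = len(standard), len(new)
diff = mean(standard) - mean(new)
# Pooled estimate of the common variance
pooled = ((n1 - 1) * variance(standard) + (n2 - 1) * variance(new)) / (n1 + n2 - 2)
se = sqrt(pooled * (1 / n1 + 1 / n2))
lo, hi = diff - T_05_DF18 * se, diff + T_05_DF18 * se
print(round(diff, 1), round(pooled, 4), round(se, 4), round(lo, 2), round(hi, 2))
```

The interval includes zero, so at the 90% level the data do not establish a difference in mean assembly time between the two training methods.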
9.9 A SMALL-SAMPLE CI FOR μ1 – μ2 WHEN σ1² ≠ σ2²
The two-sample method for small samples that uses a
pooled estimate of the common variance σ² rests on
some important assumptions:
(a) The samples must be randomly selected to minimize
bias and maintain the significance levels we report.
(b) The samples must be independent; otherwise we use
procedures for paired (dependent) samples, to be
considered later.
(c) The samples must come from normal populations.
Mild departures from normality do not seriously affect
the distribution of the test statistic, especially when n1
and n2 are almost the same.
(d) The population variances must be equal or
approximately equal to ensure the validity of the
procedures.
If both the sample sizes and the variances differ substantially,
the pooled estimator is no longer appropriate, and each
population variance must be estimated by its
corresponding sample variance. Therefore the variance
of $\bar{x}_1 - \bar{x}_2$ is now estimated by
$\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$
with degrees of freedom furnished by the formula
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
This formula is known as Satterthwaite’s approximation.
Some authors and researchers apply this approximation
to estimate degrees of freedom when the ratio of the
larger variance to the smaller variance is greater
than 3.
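Satterthwaite's degrees of freedom are tedious to compute by hand, so a small helper is useful. The sketch below uses the usual form of the approximation, with each sample variance divided by its sample size; the function name is ours. The first call uses the sample variances of the Table 9.2 data (34.3222 and 28.9444, ten observations each), and the second uses made-up values chosen only to show a case with very unequal variances.

```python
def satterthwaite_df(v1, n1, v2, n2):
    """Approximate df for the two-sample t procedure with unequal variances."""
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Sample variances of the two groups in Table 9.2
print(round(satterthwaite_df(34.3222, 10, 28.9444, 10), 2))
# Hypothetical values with a variance ratio of 10
print(round(satterthwaite_df(4.0, 8, 40.0, 15), 2))
```

The result is generally not a whole number; it is commonly rounded down to the nearest integer before consulting the t table. When the variances and sample sizes are equal, the formula returns n1 + n2 – 2, the same as the pooled procedure.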
If we fear that the sampled populations might be far from
being normal, whether variances are equal or not we can
use a non-parametric method. One non-parametric
method for comparing two independent samples is the
Wilcoxon rank sum test.
9.10 A SMALL – SAMPLE CI FOR μ1 – μ2 WHERE THE

SAMPLES ARE DEPENDENT.
There are many instances where a paired comparison
design arises naturally. Some cases cited by Gacula
and Singh (1984) are:
1. The use of the right and the left carcass in meat
science experiments.
2. The use of identical twins and littermates in
genetics and nutrition studies.
3. Self-pairing, in which measurements on units or
individuals are made on two occasions, such as
before and after treatment.
The motivation behind “paired comparison is to form
homogeneous pairs of like units so that comparisons
between the units of a pair measure differences due to
treatment rather than of unit” (Gacula and Singh).
Example 9.8: Two devices, A and B, were used to
determine the amount of a certain contaminant in
experimental ponds. The results are given in Table 9.3
below.
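Find a 95% CI for μ1 – μ2
Table 9.3 Contaminants in Ponds using Devices A
and B.
Pond   X1      X2      D = X1 - X2   D²
1      11.7    11.3    0.4           0.16
2      10.9    10.5    0.4           0.16
3      13.4    12.8    0.6           0.36
4      10.8    10.2    0.6           0.36
5      9.9     9.3     0.6           0.36
Mean   11.34   10.82   0.52          ∑D² = 1.4
Solution: Let the standard deviation of D be S_D. Then
$S_D^2 = \dfrac{\sum D^2 - n\bar{D}^2}{n - 1} = \dfrac{1.4 - 5(0.52)^2}{4} = 0.012$, so S_D = 0.1095
With n – 1 = 4 degrees of freedom, t0.025 = 2.776.
A 95% CI for μ1 – μ2 is
$\bar{D} \pm t_{0.025}\, S_D/\sqrt{n}$ = 0.52 ± 2.776(0.1095/√5)
= 0.52 ± 0.14
= [0.38, 0.66]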
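For paired data the whole analysis is carried out on the differences, which the following sketch makes explicit for the pond measurements. The critical value t0.025 = 2.776 for 4 degrees of freedom is copied from a t table, and the variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

T_025_DF4 = 2.776  # t_{0.025} for 4 df from a standard t table (assumed here)

x1 = [11.7, 10.9, 13.4, 10.8, 9.9]   # Device A
x2 = [11.3, 10.5, 12.8, 10.2, 9.3]   # Device B
d = [a - b for a, b in zip(x1, x2)]  # paired differences

d_bar = mean(d)
se = stdev(d) / sqrt(len(d))         # standard error of the mean difference
lo, hi = d_bar - T_025_DF4 * se, d_bar + T_025_DF4 * se
print(round(d_bar, 2), round(lo, 2), round(hi, 2))
```

Pairing removes the pond-to-pond variation, which is why the standard error of the mean difference is much smaller than the spread of either device's readings.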
Problem Set 9
1. Define estimator (statistic) and parameter
2. When do we say that a point estimator is
(a) unbiased
(b) efficient (best) ?
3. A random sample of 60 first-year students in a
certain university reveals an average weight of
65kg with standard deviation 1.5kg. Construct a
95 percent confidence interval for the mean weight
of first year students. Repeat your calculations if a
99 percent confidence interval is required.
4. The times in minutes that workers in Xcel group spent
getting to work are given below:
30 40 40 32 33
22 24 46 36 29
41 36 36 44 29
29 35 32 25 28
Construct a 95% confidence interval for the population

mean and comment on your result.
5. Out of 200 customers of Teks supermarket, 160
visited the supermarket because of a TV
advertisement. If p is the population proportion of
customers who visited the supermarket through
TV adverts,
(a) Estimate the population proportion p
(b) Compute the standard error of the proportion
(c) Construct a 99% confidence interval for the
population proportion.
CHAPTER 10
TEST OF HYPOTHESES
10.0 TESTING HYPOTHESES ABOUT POPULATION

PARAMETERS.
A statistical hypothesis is an assertion about a
population parameter. Hypothesis testing is a statistical
procedure used to provide evidence resulting in
either accepting or rejecting some statement
(hypothesis). A typical example is hypothesis testing
employed to assess whether a population parameter,
such as the population mean μ, differs from a specified
acceptable standard or previous value, say μ0. It is
customary to denote a hypothesis by H. The statistical
hypothesis we begin with, which is held to be true at
least temporarily, is known as the null hypothesis. We
denote a null hypothesis by H0. The hypothesis that
refutes the assertion of a null hypothesis is called the
alternative hypothesis and is denoted by H1 or Ha.
The alternative hypothesis H1 is generally the
hypothesis the researcher wishes to support; this is the
reason why it is sometimes called the research hypothesis.
A hypothesis may be simple or composite. If the
hypothesis states the value of a
population parameter precisely, it is referred to as a simple
hypothesis; if it does not, it is called a composite
hypothesis. For instance, the hypothesis H: μ = 30 is a
simple hypothesis whereas the hypothesis H*: μ > 30 is
composite. In practice, it is common to have a null
hypothesis which is simple with an associated alternative
which is composite.
A test of statistical hypothesis is a rule which uses
experimental sample values to decide on accepting or
rejecting the hypothesis under consideration. Before this
can be done, a measure is needed to define the critical
region of the test. This measure is called the test statistic.
If the behaviour of the statistic is characterized by a
known probability distribution then its use will be greatly
enhanced since it will be possible to make probability
statements about it. However in practice, not all statistics
(estimators) can be said to have known probability
distribution.
Example 10.1 Suppose in Example 9.1 we want to test
Ho: μ = 72 versus H1: μ ≠ 72 using $\bar{x}$ = 76, s = 4gm per
day, n = 50, allowing an error of α = 5%. Find the test
statistic.
Solution: The test statistic we use is
$Z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ = (76 – 72)/(4/√50) = 7.07
Example 10.2 Suppose in Example 10.1 we are to test
(a) Ho: μ = 72 versus H1: μ < 72
(b) Ho: μ = 72 versus H1: μ > 72
Find the test statistic.
Solution: The test statistic is still the same as in
Example 10.1.
A two-tailed test and a one-tailed test.
The test given in Example 10.1 is a two-tailed test since H1:
μ ≠ 72 means that either μ < 72 or μ > 72.
A two-tailed test is also called a non-directional test. All the
tests in Example 10.2 are one-tailed tests.
When H1 requires μ < 72, the direction of the test goes to
the left, and when we have H1: μ > 72, the direction of the
test goes to the right. Therefore, whenever the null
hypothesis Ho is a simple hypothesis, the direction of the
parameter in H1 decides the direction of the test.
10.1 ERROR OF TYPE I and TYPE II
A decision is normally made to accept or reject a
hypothesis, and there is always a probability of an incorrect
decision because of sampling variation. An incorrect
decision occurs when we reject a true hypothesis or
accept a false one, resulting respectively in Type I and
Type II errors. Table 10.1 gives all possibilities of correct
and incorrect decisions on a given hypothesis.
Table 10.1: All possible decisions on a given hypothesis.
             State of nature or fact
Decision     Ho true                            Ho false
Accept Ho    Correct decision                   Wrong decision: Type II error (β)
Reject Ho    Wrong decision: Type I error (α)   Correct decision
Type I Error
A Type I error is the rejection of a true
hypothesis. The probability of this error is normally denoted by
α. The area spanned by α defines the critical region C for
Ho. When sample points fall within C, Ho is rejected, and
when they fall outside C, Ho is accepted and H1 rejected.
The acceptance region has probability (1 - α) (see Figures 10.1-10.3). The
probability of a Type I error is also called the significance
level of the test.
Type II Error
A Type II error is the acceptance of a false
hypothesis. The probability of a Type II error is denoted by β.
10.2 POWER OF A STATISTICAL TEST.

Closely related to the Type II error is the power of the
test. We define the power of a test as the probability of
rightly rejecting Ho. If the probability of a Type II error is β, then
the power of the test is 1 – β.
[Figure 10.1: Critical Region for Testing Ho: μ = 72
Versus H1: μ ≠ 72. The shaded portion, with area α/2 = 0.025
in each tail beyond Z = -1.96 and Z = 1.96, is the critical
region. The unshaded portion is the acceptance region.]
[Figure 10.2: Critical Region for Testing Ho: μ = 72
Versus H1: μ < 72. The critical region, with area α = 0.05, is shaded.]
[Figure 10.3: Critical Region for Testing Ho: μ = 72 Versus
H1: μ > 72]
Example 10.3: What is the decision of the test of
hypothesis in Example 10.1?
Solution. Recall that Ho: μ = 72 versus H1: μ ≠ 72. The
test statistic is Z = 7.07.
Decision Rule: Reject Ho if |Z| > Zα/2.
That is, reject Ho if |Z| > Z0.025 = 1.96.
Since Z = 7.07 > 1.96, we reject Ho and uphold the notion
that μ ≠ 72, i.e. μ < 72 or μ > 72.
Example 10.4 Carry out the tests of hypotheses in
Example 10.2.
Solution Case (a) Ho: μ = 72 versus H1: μ < 72
(i) Test statistic: Z = 7.07
(ii) Decision Rule: Reject Ho if Z < -Zα = -Z0.05
= -1.645. Since Z = 7.07 is not less than
-Z0.05 = -1.645, we cannot reject Ho.
Case (b) Ho: μ = 72 versus H1: μ > 72
(i) Test statistic: Z = 7.07
(ii) Decision Rule: Reject Ho if Z > Zα = Z0.05 = 1.645.
Since Z = 7.07 > Z0.05 = 1.645,
we reject Ho and claim that the population mean μ is
greater than 72.
10.3 p – VALUE FOR A TEST
For every test statistic, there is an associated p-value.
We reject or accept Ho on comparing the test statistic
with an equivalent critical value of the statistic.
Alternatively, a decision can be taken using the p-value.
What we do is compare the p-value with α, the level of
significance of the test, and then take a decision.
Example 10.5: Find the p-value for the test in Example
10.3 and repeat the test.
Solution: Recall that we are testing Ho: μ = 72 versus
H1: μ ≠ 72, given α = 0.05.
The calculated test statistic is Z = 7.07.
Therefore the
p-value = P(Z > 7.07) + P(Z < -7.07) ≈ 0
Decision Rule: Reject Ho if p < 0.05
Since p ≈ 0 < α = 0.05, we reject Ho, and this is the
same conclusion reached in Example 10.3.
Example 10.6: Suppose the calculated Z was 3.04 in the
last example. Calculate the p-value and re-conduct the test.
Solution
p-value = P[Z > 3.04] + P[Z < -3.04]
= 0.0012 + 0.0012
= 0.0024
Since p-value = 0.0024 < 0.05, we reject the null
hypothesis Ho.
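The p-value calculations for a Z statistic can be automated with the standard normal CDF. The sketch below is illustrative only; the function name p_value and the labels "less", "greater" and "two-sided" are our own choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value(z, alternative="two-sided"):
    """p-value of a Z statistic for a 'less', 'greater' or 'two-sided' alternative."""
    if alternative == "less":
        return STD_NORMAL.cdf(z)            # area to the left of z
    if alternative == "greater":
        return 1 - STD_NORMAL.cdf(z)        # area to the right of z
    return 2 * (1 - STD_NORMAL.cdf(abs(z))) # both tails

print(round(p_value(3.04), 4))   # two-sided p-value for Z = 3.04
print(p_value(7.07) < 0.05)      # decision for Z = 7.07 at the 5% level
```

In every case the decision rule is the same: reject Ho when the p-value is smaller than the chosen significance level α.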
Example 10.7 Calculate the p-values for the tests in
Example 10.2.
Solution: The cases are
(a) Ho: μ = 72 versus H1: μ < 72
(b) Ho: μ = 72 versus H1: μ > 72
Case (a) p-value = P[Z < 7.07] ≈ 1.00
Since p-value > 0.05, we cannot reject Ho.
Case (b) p-value = P[Z > 7.07] ≈ 0.00
Since p-value ≈ 0 < 0.05, we reject Ho.
Both decisions agree with those reached in Example 10.4.
10.4 CALCULATING THE POWER OF A STATISTICAL

TEST
When we accept Ho when it is false, we incur a Type II
error, whose probability is denoted by β. The complement of β, defined by
K(μ) = 1 - β, is the power of the test. It measures the
probability of rejecting the null hypothesis when it is false.
That is,
K(μ) = P[Reject Ho when Ho is false]
= P[Reject Ho when H1 is true]
= 1 - β.
Example 10.8 Using Example 10.1, calculate β and the
power of the test (1 - β) when μ is actually equal to 70.
Solution: The acceptance region of the test is the
interval μ0 ± 1.96 s/√n.
That is, 72 ± 1.96 (4/√50) = 72 ± 1.11
= [70.89, 73.11]
The value of β is equal to the probability of accepting
Ho, given μ = 70, and this is equal to the area under the
sampling distribution of the test statistic $\bar{x}$ in the
interval 70.89 to 73.11.
The next step is to calculate Z1 and Z2, given by
Z1 = (70.89 – 70)/(4/√50) = 1.57
Z2 = (73.11 – 70)/(4/√50) = 5.50
β = P[Accept Ho when μ = 70]
= P[70.89 < $\bar{x}$ < 73.11 when μ = 70]
= P[1.57 < Z < 5.50]
= 0.0582
Hence the power of the test is
1 - β = 1 – 0.0582 = 0.9418
The probability of correctly rejecting Ho, given that μ is
actually equal to 70, is 0.9418. This is approximately
equal to 94%.
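The power calculation can be reproduced without rounding the intermediate Z-values. The sketch below follows the same steps as the worked solution (acceptance region for the sample mean, then the area of that region under the sampling distribution centred at the true mean); the function name power_two_sided is ours. Because nothing is rounded along the way, the result differs from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(mu0, mu_true, s, n, alpha=0.05):
    """Type II error and power of the two-sided Z test of Ho: mu = mu0."""
    se = s / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = mu0 - z * se, mu0 + z * se         # acceptance region for the sample mean
    sampling = NormalDist(mu_true, se)          # sampling distribution when mu = mu_true
    beta = sampling.cdf(hi) - sampling.cdf(lo)  # P(accept Ho | mu = mu_true)
    return beta, 1 - beta

beta, power = power_two_sided(72, 70, 4, 50)
print(round(beta, 3), round(power, 3))
```

Calling the function with true means closer to 72 shows the power falling, since small departures from Ho are harder to detect with the same sample size.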
10.5 A LARGE – SAMPLE TEST OF HYPOTHESIS

FOR DIFFERENCE BETWEEN TWO POPULATION
MEANS μ1 AND μ2
We follow these steps:
1. Ho: μ1 – μ2 = D0, where D0 is a certain difference
we wish to test. However, in many instances we
state that there is no difference between μ1 and
μ2, hence D0 = 0.
2. Alternative Hypothesis H1      Decision Rule: Reject Ho if
H1: μ1 – μ2 > D0                  Z > Zα
H1: μ1 – μ2 < D0                  Z < -Zα
H1: μ1 – μ2 ≠ D0                  |Z| > Zα/2
3. Test statistic:
$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
4. Assumptions
(i) The samples are randomly and independently
selected
(ii) The two populations are normally distributed
(iii) n1 ≥ 30 and n2 ≥ 30
Example 10.9 For the first 100 days of manufacturing
electric bulbs, Xcel Lights got their raw materials from
two major contractors, De Lon and Xpert. The outputs
gave a mean of 26350 bulbs with variance 1435000 using
inputs from De Lon, and 25096 bulbs with variance equal
to 1957000 using inputs from Xpert.
Test Ho: μ1 – μ2 = 0 versus H1: μ1 – μ2 ≠ 0
at α = 0.01.
Solution: The test statistic Z is
Z = (26350 – 25096 – 0)/√(1435000/100 + 1957000/100) = 1254/184.17 = 6.81
Decision Rule: Reject Ho if |Z| > Zα/2,
where Zα/2 = Z0.005 = 2.58.
Since Z = 6.81 > 2.58, we reject Ho.
10.6 A LARGE-SAMPLE TEST FOR A POPULATION
PROPORTION P
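We follow these steps:
1. Null Hypothesis: Ho: P = P0
2. Alternative Hypothesis H1      Decision Rule: Reject Ho if
H1: P > P0                        Z > Zα
H1: P < P0                        Z < -Zα
H1: P ≠ P0                        |Z| > Zα/2
3. Test statistic
$Z = \dfrac{\hat{p} - P_0}{\sqrt{P_0 (1 - P_0)/n}}$
4. The value of n should be considered large when
nP0 ≥ 5 and n(1 - P0) ≥ 5.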
Example 10.10: A random sample of 973 sales
transactions reveals that 584 customers made purchases
of video cameras from DK Technology through the internet
for the year 2006. Estimate the proportion of customers
who made purchases of video cameras online. If this
proportion for the population was 0.70 in 2006, test the
hypothesis that it is less than 0.70 in 2007 at α = 0.05.
Also obtain the p-value for the test.
Solution: The point estimate for P is
$\hat{p}$ = x/n = 584/973 = 0.60
Now we want to test
Ho: P = 0.70 versus H1: P < 0.70
The test statistic is
Z = (0.60 – 0.70)/√[(0.70)(0.30)/973]
= -6.81
Since -6.81 < -Z0.05 = -1.645,
we reject Ho and conclude that less than 70% of
customers made online purchases.
The p-value for the test is P[Z < -6.81] ≈ 0.00 < 0.05
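The lower-tailed proportion test can be sketched in a few lines. The function name proportion_z_test is ours. Note that the code uses the unrounded sample proportion 584/973, so the Z statistic it prints is about -6.79 rather than the -6.81 obtained with the rounded value 0.60; the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Z statistic and lower-tail p-value for Ho: P = p0 versus H1: P < p0."""
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)   # standard error computed under Ho
    z = (p_hat - p0) / se0
    return z, NormalDist().cdf(z)

z, p = proportion_z_test(584, 973, 0.70)
print(round(z, 2), p < 0.05)
```

The standard error is computed from the hypothesized proportion, not from the sample proportion, because the test statistic is evaluated on the assumption that Ho is true.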
10.7 A LARGE-SAMPLE STATISTICAL TEST FOR

THE DIFFERENCE BETWEEN TWO PROPORTIONS
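We follow these steps:
1. Null Hypothesis Ho: P1 – P2 = D0
Alternative Hypothesis H1         Decision Rule: Reject Ho if
H1: P1 – P2 > D0                  Z > Zα
H1: P1 – P2 < D0                  Z < -Zα
H1: P1 – P2 ≠ D0                  |Z| > Zα/2
2. Test Statistic
$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{SE}$
(a) If D0 = 0, SE is given by the formula
$SE = \sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $\hat{q} = 1 - \hat{p}$
(b) If D0 ≠ 0, we use
$SE = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$
Example 10.11
Use the information in Example 9.5 to investigate if the
proportion of people from Oto favouring consensus
appointment is higher than the proportion from Aka.
Take α = 0.01.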
Solution: We want to test
Ho: P1 = P2 versus H1: P1 > P2
The test statistic is
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE}$
From Example 9.5,
SE = 0.0753 and $\hat{p}_1 - \hat{p}_2$ = 39/50 - 66/100 = 0.78 - 0.66 = 0.12
Z = 0.12/0.0753 = 1.5936
The critical value of Z, denoted by Z0.01, is
Z0.01 = 2.3263
Since Z = 1.5936 < Z0.01 = 2.3263,
we cannot reject the null hypothesis. Hence the
proportions from Oto and Aka favouring appointment by
consensus are practically the same. In other words, we
accept the null hypothesis.
10.8 A SMALL SAMPLE TEST ABOUT A

POPULATION MEAN (TESTING Ho: μ = μ0 )
In carrying out this test, we use the t-distribution. The

basic assumptions underlying its use are
(a) The items coming into the sample must be

randomly selected
(b) The population from which the sample is drawn
must be normally distributed. The steps for the
test are as follows:
1. State the null hypothesis: Ho: μ = μ0
2. Define the test statistic
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
with (n-1) degrees of freedom

Example 10.12: Using the data supplied in Example 9.6, can
you say that the daily mean production of cement is
higher than 500000kg?
Solution: The hypotheses are
Ho: μ = 500000 versus H1: μ > 500000
Test statistic:
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ = (530830 – 500000)/(57310/√6) = 1.32
where $\bar{x}$ = 530830kg, s = 57310kg, μ0 = 500000,
n = 6, df = n-1 = 5, α = 0.05
For df = 5, tα = t0.05 = 2.015
Since t = 1.32 is not greater than t0.05 = 2.015,
we cannot reject Ho.
Example 10.13: Suppose we are interested in testing
Ho: μ = 500000 as in Example 10.12.
Find the p-value for each of the following alternative hypotheses if
α = 0.05:
(a) H1: μ > 500000 (b) H1: μ < 500000 (c) H1: μ ≠
500000
Solution (a) H1: μ > 500000
The p-value is the area to the right of t = 1.32 under the t-curve.
When p = 0.05, t = 2.015 and when p = 0.10, t = 1.476.
If we denote by y the p-value corresponding to t = 1.32,
we have the following arrangement, using 5 degrees of freedom:
p-value:   y       0.10     0.05
t-value:   1.32    1.476    2.015
We use linear interpolation as follows.
The difference y – 0.10 is to 1.476 – 1.32 = 0.156
as 0.10 – 0.05 = 0.05 is to 2.015 – 1.476 = 0.539.
That is,
(y – 0.10) : 0.156 = 0.05 : 0.539
Product of extremes = product of means:
(y – 0.10)(0.539) = (0.156)(0.05)
y – 0.10 = (0.156)(0.05)/0.539 = 0.0145
and y = 0.10 + 0.0145 = 0.1145
Therefore the required p-value is 0.1145, which is greater
than α = 0.05. Ho is accepted.
(b) H1: μ < 500000
The p-value is the area lying to the left of t = 1.32.
Since the area to the right of t = 1.32 is 0.1145, this area is
1 – 0.1145 = 0.8855, which is greater than α = 0.05.
Again we accept Ho.
(c) H1: μ ≠ 500000.
We can only reject Ho if the p-value is less than α.
Now, since the test is two-tailed,
p-value = P(t > 1.32) + P(t < -1.32)
= 0.1145 + 0.1145 = 0.2290 (by symmetry)
Again we accept Ho
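The proportional-parts argument used above is ordinary linear interpolation between two table entries, which can be written as a one-line formula. The sketch below is illustrative; the function name interpolate_p is ours. Since t = 1.32 lies below both tabulated t-values, the line through the two table points is being extended slightly beyond them, so the result is only a rough approximation to the true tail area.

```python
def interpolate_p(t, t_a, p_a, t_b, p_b):
    """Linear interpolation (or extrapolation) of a tail area between two table entries."""
    slope = (p_b - p_a) / (t_b - t_a)
    return p_a + slope * (t - t_a)

# Table entries for 5 degrees of freedom quoted in Example 10.13
y = interpolate_p(1.32, 1.476, 0.10, 2.015, 0.05)
print(round(y, 4))        # one-tailed p-value
print(round(2 * y, 4))    # two-tailed p-value
```

Statistical software computes such tail areas directly from the t distribution, so interpolation is needed only when working from printed tables.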
10.9 A SMALL SAMPLE TEST FOR THE

DIFFERENCE BETWEEN TWO POPULATION
MEANS:INDEPENDENT RANDOM SAMPLES WITH
EQUAL POPULATION VARIANCES.
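TESTING Ho: μ1 – μ2 = D0 when σ1² = σ2²
1. The test statistic is
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where s² is the pooled variance
2. We assume that populations 1 and 2 are normally
distributed and samples 1 and 2 are independent.
3. We test Ho versus H1 as follows:
Alternative Hypothesis H1         Decision Rule: Reject Ho if
H1: μ1 – μ2 > D0                  t > tα
H1: μ1 – μ2 < D0                  t < -tα
H1: μ1 – μ2 ≠ D0                  |t| > tα/2
The degrees of freedom for the test are n1 + n2 – 2.
Example 10.14 From Example 9.7 and Table 9.2, can we
conclude that μ1 – μ2 ≠ 0? Use α = 0.10.
Solution The test statistic is
t = (45.1 – 41.5 – 0)/√[31.6333(1/10 + 1/10)]
= 3.6/2.5153 = 1.431
df = n1 + n2 – 2 = 10 + 10 – 2 = 18,
α = 0.10, tα/2 = t0.05 = 1.734
Ho: μ1 – μ2 = 0 versus H1: μ1 – μ2 ≠ 0
Since t = 1.431 < 1.734, we accept Ho.
The population means are essentially equal.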
10.10 A SMALL SAMPLE HYPOTHESIS TEST FOR
μ1 – μ2 = D0 when σ1² ≠ σ2²
In this case we have heteroscedastic variances and the
test statistic is
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
where the degrees of freedom df are furnished by
Satterthwaite’s approximation mentioned earlier.
Procedures for testing Ho: μ1 – μ2 = D0 when σ1² ≠ σ2²
are similar to the case for σ1² = σ2².
10.11 A SMALL SAMPLE HYPOTHESIS TEST FOR

Ho: μ1 – μ2 = Do WHEN DATA ARE PAIRED.
We will simply work an example.
Example 10.15: Table 9.3 shows the amount of contaminants
in experimental ponds. The measurements were taken
using Devices 1 and 2. Can we claim that μ1 > μ2?
Compute the coefficient of variation for each device and comment.
Assume α = 0.05.
Solution: The test statistic is
$t = \dfrac{\bar{D} - 0}{S_D/\sqrt{n}}$ = 0.52/0.0490 = 10.61
where $\bar{D}$ = 0.52, S_D = 0.1095, n = 5, and for n – 1 = 4
degrees of freedom t0.05 = 2.132.
Since t > t0.05, we reject Ho: μ1 – μ2 = 0 and uphold
that μ1 > μ2.
The coefficient of variation is CV = s/$\bar{x}$.
For Device I, CV = 1.316/11.34 = 0.116 (= 11.6%)
For Device II, CV = 1.858/10.82 = 0.172 (= 17.2%)
From the above results, Device I is more efficient. Its
coefficient of variation of 11.6% is lower than that of
Device II, which has a coefficient of variation of 17.2%.
10.12 INFERENCES CONCERNING A POPULATION VARIANCE
The problem of Example 10.15 shows us that we should look at both the means and the variances of the populations considered. A measure of variability gives us an idea of the error of measurement and of efficiency, especially when using two or more scientific instruments to measure the same phenomenon.
Let s² be the variance of a random sample of n items selected from a normally distributed population having variance σ². The statistic (n-1)s²/σ² has a chi-square distribution with n-1 degrees of freedom. The notation for “chi-square” is χ².
1. We can test Ho: σ² = σo² by using the statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
Let $\chi^2_{p,\,n-1}$ denote the chi-square value with area p to its right. If the Type I error is α, we reject Ho if for
(a) H1: σ² > σo² we have $\chi^2 > \chi^2_{\alpha,\,n-1}$
(b) H1: σ² < σo² we have $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
(c) H1: σ² ≠ σo² we have $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$
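2. A 100(1-α)% confidence interval for σ² is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$
Example 10.16: If n = 30, s² = 0.1985
(a) Find a 95% confidence interval for σ²
(b) Test Ho: σ² = 0.49 versus H1: σ² > 0.49 using α = 0.05
Solution (a) A 95% confidence interval for σ² is
$$\left(\frac{29(0.1985)}{45.722},\ \frac{29(0.1985)}{16.047}\right) = (0.126,\ 0.359)$$
(b) To test Ho: σ² = 0.49 versus H1: σ² > 0.49
The test statistic is
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{29(0.1985)}{0.49} = 11.748$$
Now $\chi^2_{0.05,\,29} = 42.557$.
Since χ² = 11.748 < 42.557,
we accept Ho.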
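The arithmetic of Example 10.16 can be checked with a short Python sketch. The function names are hypothetical, and the chi-square table values for 29 degrees of freedom (45.722, 16.047 and 42.557) are passed in as assumed table look-ups rather than computed.

```python
def variance_chi_square(n, s2, sigma0_sq):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a test on a variance."""
    return (n - 1) * s2 / sigma0_sq


def variance_confidence_interval(n, s2, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2.

    chi2_upper is the table value with area alpha/2 to its right,
    chi2_lower the value with area 1 - alpha/2 to its right.
    """
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower


# Data of Example 10.16; table values are assumed look-ups for 29 df
chi2 = variance_chi_square(30, 0.1985, 0.49)
ci_low, ci_high = variance_confidence_interval(30, 0.1985, 45.722, 16.047)
reject_h0 = chi2 > 42.557  # upper-tail test at alpha = 0.05
print(round(chi2, 3), round(ci_low, 3), round(ci_high, 3), reject_h0)
```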
10.13 COMPARING TWO POPULATION VARIANCES
Consider two normal populations with variances σ1² and σ2². Now Ho: σ1² = σ2² is equivalent to Ho: σ1²/σ2² = 1, and H1: σ1² > σ2² is equivalent to H1: σ1²/σ2² > 1. The statistic s1²/s2² has an F distribution with (n1-1) degrees of freedom for the numerator and (n2-1) degrees of freedom for the denominator.
We use ν1 and ν2 to denote respectively (n1-1) and (n2-1). At times some authors use df1 and df2 instead.
We also assume that the two samples are independent.
1. To test Ho: σ1² = σ2² versus H1: σ1² > σ2²
we use the test statistic F = s1²/s2². We reject Ho if F > Fα or equivalently if p-value < α. The p-value is the area under the F-distribution curve to the right of F.
2. To test Ho: σ1² = σ2² versus H1: σ1² < σ2²
Test statistic: F = s2²/s1², with (n2-1) and (n1-1) degrees of freedom.
We reject Ho if F > Fα or equivalently if p-value < α.
3. To test Ho: σ1² = σ2² versus H1: σ1² ≠ σ2²
Test statistic: F = (larger sample variance)/(smaller sample variance)
Reject Ho if F > Fα/2 or equivalently if p-value < α.
The p-value is equal to twice the area to the right of F under the F distribution curve.
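For the two-sided case, the choice of numerator and denominator can be sketched in Python as follows. The function name and the sample values are hypothetical; the critical value Fα/2 would still have to be read from an F table.

```python
def variance_ratio(s1, n1, s2, n2):
    """Return (F, numerator df, denominator df) for a two-sided test.

    The larger sample variance is placed in the numerator so that F >= 1.
    """
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


# Hypothetical sample standard deviations and sizes, for illustration only
f_stat, df_num, df_den = variance_ratio(3.0, 12, 6.0, 9)
print(f_stat, df_num, df_den)
```

Note that the degrees of freedom swap together with the variances, so the table look-up must use the df of whichever sample ended up in the numerator.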
Problem Set 10
1. Given the following hypotheses:
Ho: μ = 8
H1: μ > 8
For a random sample of size n = 10, the sample mean is 11 and the standard deviation is 4. Using α = 0.05,
(a) Compute the test statistic
(b) State the decision rule
(c) Are you accepting Ho?
2. Given the following hypotheses:
Ho: μ = 395
H1: μ ≠ 395
A random sample of 11 observations gives a mean of 405 and a standard deviation of 5. Using α = 0.01,
(a) Compute the test statistic
(b) State the decision rule
(c) Are we accepting Ho?

3. The dosages of a certain medicine taken by a random sample of 8 chickens are as follows (in grams):
9.1, 8.8, 8.8, 8.5, 8.9, 8.6, 8.9, 9.1
At the 0.05 significance level, can we conclude that the population mean intake is less than 9.0 grams?
(a) State the null and the alternative hypotheses
(b) Compute the test statistic t and state its degrees of freedom
(c) State the decision rule
(d) Is the null hypothesis accepted?
4. A random sample of 70 observations is selected

from one population. The sample mean is 3.02 and the
sample standard deviation is 0.80. A random sample of
40 observations is selected from a second population.
The sample mean is 2.89 and the sample standard
deviation is 0.70. Test the Ho: versus HI: .
(a) Is the above test a one-tailed or a two-tailed test?
(b) Compute the value of the test statistic
(c) Have you accepted Ho?
5. A food scientist wishes to compare the weight gain of infants using Tex baby food and the Dido brand. A sample of 50 babies using Tex’s products revealed a mean weight gain of 8.7 pounds in the first three months after birth. The standard deviation of the sample was 3.3 pounds. A sample of 60 babies using the Dido brand of food revealed a mean increase in weight of 9.2 pounds, with a standard deviation of 3.9 pounds. If the level of significance is 0.05, can we conclude that babies that used the Tex brand gained less weight?
6. Of 300 adults who tried a new brand of shortbread, 173 rated it as excellent. Of 400 children surveyed, 247 rated the flavour as excellent. Using a 5% level of significance, can we conclude that there is a significant difference between the proportion of adults and the proportion of children who rated the flavour excellent?
a) State Ho and H1
b) Is this a one-tailed or a two-tailed test?
c) Compute the test statistic
d) State the decision rule
e) Have you accepted Ho?
7. Sales of a CD containing a play by a star actress were recorded at the normal price and at a reduced price. The data are as follows.
Normal price    237 221 188 215 242 226
Reduced price   229 235 253 236 216 213 221
At the 0.01 level of significance, can we conclude that the price reduction boosted sales?
8. A study of the effectiveness of a medicated soap in

reducing operating room contamination gave rise to the table
below.
Room 1 2 3 4 5 6 7 8
Before 7.7 7.6 10.0 11.4 12.3 9.2 7.4 12.7
After 7.9 3.5 8.5 9.6 9.2 7.2 4.5 3.0
The test on the new soap was carried out in central Port Harcourt last year. If the level of significance is 0.05, can we say that the new soap lowers the contamination measurements?
9. Consider the following hypotheses:
Ho: σ1² = σ2²
H1: σ1² ≠ σ2²
A random sample of 7 observations from the first
population gives rise to a standard deviation 10. A
random sample of 5 observations from the second
population gives rise to a standard deviation of 6. At the
0.05 significance level, is there a difference in the
variation of the two populations?
10. A stockbroker with Smart Securities revealed that the

mean rate of return on a sample of 10 oil stocks was 14.7
percent with a standard deviation of 4.02 percent. The
mean rate of return on a sample of 9 utility stocks was
11.8 percent with a standard deviation of 3.85. At the
0.01 significance level, can we conclude that the oil stocks have more variation than the utility stocks?
CHAPTER 11
THE DESIGN AND ANALYSIS OF EXPERIMENTS
11.0 INTRODUCING EXPERIMENTAL DESIGN
Studies may be categorized into two types. Some studies are observational: the researcher does not impose control on the data but observes and records characteristics of existing data. Other studies involve experimentation, where there is room to bring in some measure of control by imposing one or more conditions on experimental units and recording or determining the effect on the response variable. Experiments are the main way of discovering knowledge in most areas of study. Some areas where we apply experiments are:
1. Medicine: Scientists test new drugs and treatments.
2. Agriculture: Researchers investigate new crop varieties and new ways of growing crops.
3. Education: Educationists examine ways of teaching.
4. Industry: Industrialists investigate ways of improving product quality.
The question is: what is “design of experiments” or “experimental design”?
Design of experiments involves
(a) Freedom to fix the levels of the explanatory variables, and
(b) Choosing the combinations of these levels at which to observe the response variable.
11.1 SOME BASIC TERMS IN EXPERIMENTAL DESIGN
1. Response: This is the dependent variable being measured by the researcher or experimenter, as a result of a stimulus or stimuli furnished by some factor (the independent variable).
2. Factor: A factor is an independent variable whose values are controlled and varied by the experimenter.
3. Level: This is the intensity at which the factor is set.
4. Treatment: A treatment is a specific combination of factor levels.
5. Experimental unit: This is the object or item on which a measurement (or measurements) is taken. It is also that part of the material to which the treatment is applied for taking observations and measurements.
11.2 PRINCIPLES OF EXPERIMENTAL DESIGN
We briefly discuss below four basic principles of experimental design: randomization, replication, blocking and blinding.
Randomization: This is defined as a procedure for
randomly
(a) allocating the experimental units to specified
groups and
(b) Ordering of individual runs (trials) of the
experiment to be performed. We note that
randomness touches both allocation of
experimental material and ordering of
experimental trials.
The reasons for randomization are as follows:
1. Random assignment of experimental units to

treatments is needed to fulfill the statistical
assumption in the analysis of variance.
2. Random allocation minimizes the effects of
systematic and personal biases.
Replication. Replication means independent repetitions of an experiment under identical experimental conditions. Each trial or repetition constitutes a replicate of that experiment. In an accelerated life testing study, a researcher wishes to determine the shelf life of hot dogs using four storage temperatures and it is agreed that 20 packages should be used at each storage temperature. In this case, a package of hot dogs is the experimental unit and r = 20 constitutes the number of replications in the experiment. If S² is an estimate of the error variance σ², then the standard error SE(ȳ) of the mean shelf life for each temperature setting is
$$SE(\bar{y}) = \frac{S}{\sqrt{r}}$$
It is clear that SE(ȳ) can be reduced by increasing the number of replications r, hence improving precision and sensitivity in measuring treatment differences.
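The effect of replication on the standard error can be illustrated with a few lines of Python. The value S = 4.0 is hypothetical and is used only to show how the standard error shrinks as r grows.

```python
import math


def standard_error(s, r):
    """Standard error of a treatment mean based on r replications."""
    return s / math.sqrt(r)


S = 4.0  # hypothetical estimate of the error standard deviation
se_by_r = {r: standard_error(S, r) for r in (5, 20, 80)}
for r, se in se_by_r.items():
    print(r, round(se, 4))
```

Because r enters through a square root, quadrupling the number of replications only halves the standard error.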
Blocking. This is a technique that makes statistical tests more sensitive by reducing the experimental error. Reduction of experimental error implies increasing the precision of results. Blocking means dividing the experimental material into sections (blocks) such that, within each block, the material is more homogeneous than the material taken across different blocks.
Some examples of blocking are the paired comparison designs given in Examples 9.8 and 10.15, where ponds are the blocks. The paired comparison design is a special case of a more general type of design called the randomized block design (RBD), where treatment combinations are randomized within blocks.
Blinding. A nonstatistical element that is very important in consumer testing and in other studies where the data are subjectively obtained is blinding. Blinding refers to the concealment of the identity of treatments from the experimenter and from the panelists or judges. Blinding is accomplished by coding the treatments using two- or three-digit numbers. The purpose of blinding is to control bias due to the experimenter’s knowledge of the study and the panelists’ knowledge of the treatments. For instance, comparisons of products that are brand identified are biased owing to brand effects; package design and other appearance characteristics, when not concealed, generally contribute to biased evaluation by panelists.
11.3 WHAT IS ANALYSIS OF VARIANCE?
Analysis of variance (ANOVA) is a procedure for partitioning the total variation (sum of squares) in an observed data set into various components and assigning them to their respective causes. A cause is also called a source of variation.
When the source of each component is identified, it is then tested for its significance as a source of variation in the data.
11.4 SOME USES OF ANALYSIS OF VARIANCE
ANOVA is useful in many respects. First, it provides a quick means of calculating the pooled variance for all the groups involved in a particular analysis. Second, it enables comparisons involving two or more means to be made at once. Finally, it is used to determine the effects of various factors in more complex experimental designs.
11.5 THE ASSUMPTIONS FOR ONE-WAY ANOVA
The assumptions are those of homoscedasticity, normality and independence. In our discussion, let us assume that we have a treatments (that is, a denotes the number of treatments).
1. Homoscedasticity. Observations within each treatment are distributed with the same variance. That is, if we have a treatments, each of them has the same population variance, say σ². In practice, if treatment i has the largest sample standard deviation Si and treatment j has the smallest sample standard deviation Sj, constant variance is said to hold if Si < 2Sj.
2. Normality. Responses within each treatment are normally distributed.
3. Independence. The sample of experimental units for each treatment is randomly selected, and the samples for the different treatments are independently selected. That is, we select sample i for treatment i and sample j for treatment j such that samples i and j are independent.
11.6 THE COMPLETELY RANDOMISED DESIGN (CRD) (ONE-WAY CLASSIFICATION ANOVA)
In this design, we select random samples independently from each of a populations. Each of the populations represents one level of the factor that is being studied. That is, one factor with a different levels gives rise to a treatments. This is the reason we call the design one-way classification. We also note that CRD is an extension of the test involving two means from two independent samples.
Since there are more than two samples, we cannot use the usual t-test to compare all the means at once. We use an F test instead. If the F test is significant, we then compare the means pairwise to find out where the differences come from. For instance, suppose we test the hypothesis Ho: μ1 = μ2 = μ3 and Ho is rejected. Where do the differences come from?
In order to answer this, we conduct three tests involving pairs of means:
Ho: μ1 = μ2, Ho: μ2 = μ3 and Ho: μ1 = μ3
Our next numerical illustration is on how to construct and implement a CRD.
Example 11.1 Suppose we have 1000 students in a first year algebra class in the University. We want to know the effects of three teaching methods A, B and C. How do we implement a completely randomized design to investigate the effects of the three methods on the scores (performance) of the candidates?
Solution. Before testing them, we allocate candidates to groups A, B and C using random assignment. The experimental units here are the candidates. We can choose to put 10 candidates in each group (you can choose 20 or 30 if you wish).
Now, we proceed to randomly select 10 candidates from the 1000 candidates and send them to group A. Next we randomly take another 10 candidates from the remaining 990 students and send them to group B. Then we randomly select another 10 candidates from the remaining 980 candidates and put them in group C.
Lastly, we teach groups A, B and C using methods A, B and C respectively. A test is given to all the groups and the scores are recorded.
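The random allocation described in Example 11.1 can be sketched in Python. The function name, the numbering of students from 1 to 1000 and the seed are assumptions made for illustration. Drawing 30 students at once without replacement and splitting them into three groups is equivalent to the three successive draws described in the solution.

```python
import random


def assign_groups(unit_ids, group_names, group_size, seed=None):
    """Randomly allocate experimental units to equally sized groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    chosen = rng.sample(unit_ids, group_size * len(group_names))
    return {
        name: chosen[k * group_size:(k + 1) * group_size]
        for k, name in enumerate(group_names)
    }


students = list(range(1, 1001))  # hypothetical student numbers 1..1000
groups = assign_groups(students, ["A", "B", "C"], group_size=10, seed=2024)
for name, members in groups.items():
    print(name, sorted(members))
```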
Example 11.2 Suppose we want to compare a population means μ1, μ2, …, μa using independent random samples of sizes n1, n2, …, na from normal populations with a common variance σ². Proceed with the necessary intermediate calculations and test the hypothesis
Ho: μ1 = μ2 = … = μa
versus
H1: At least one of the means is different from the others.
Solution: This is ANOVA for a completely randomized design. We begin the ANOVA by partitioning the total variation into components and drawing up an ANOVA table from which we calculate the F-ratio required for testing Ho.
(a) Partitioning the total variation in the experiment
Let n = n1 + n2 + … + na be the total number of observations in the experiment. Recall that ni (i = 1, 2, …, a) is the number of observations in sample i.
Let Ti be the total of all observations in sample i and G be the grand total of all n observations. Finally, let yij be the jth measurement (j = 1, 2, …, ni) in the ith sample.
With these notations, set C = G²/n. Then
(1) Total variation
$$SS_{yy} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - C$$
The total variation is partitioned into two components. One component measures variation between samples and the other is due to variation within samples. Variation between samples is called the treatment sum of squares Tyy and variation within samples is called the error sum of squares Eyy.
(2) Treatment sum of squares
$$T_{yy} = \sum_{i=1}^{a} \frac{T_i^2}{n_i} - C$$
(3) Eyy = SSyy – Tyy
(4) Degrees of freedom. The degrees of freedom for the total sum of squares SSyy is n-1; for treatments it is (a-1), and error has (n-1) – (a-1) = n-a degrees of freedom.
(5) Mean squares (MS) are obtained by dividing each sum of squares by its degrees of freedom df.
(b) ANOVA table (see Table 11.1). This is a display of the sums of squares, mean squares and degrees of freedom. The F ratio is also calculated.
Table 11.1 ANOVA for a Independent Samples
Source      Df     SS     MS                   F
Treatment   a-1    Tyy    MST = Tyy/(a-1)      MST/MSE
Error       n-a    Eyy    MSE = Eyy/(n-a)
Total       n-1    SSyy
Example 11.3. In an experiment to determine the effect of teaching method on the performance of secondary school students, 15 students were randomly assigned, five to each of three teaching methods: teaching with no charts, teaching with commercial charts and teaching with teacher-made charts. The post-test scores are as given in Table 11.2.
Table 11.2: Students’ Scores with Different Teaching Methods
No Charts    Commercial Charts    Teacher-Made Charts
42           43                   16
36           40                   35
71           35                   40
30           42                   28
55           44                   33
T1 = 234     T2 = 204             T3 = 152
Construct the ANOVA table for this experiment.
Solution. In this problem a = 3, n1 = n2 = n3 = 5 and n = n1 + n2 + n3 = 15. The grand total is G = 234 + 204 + 152 = 590.
Hence G²/n = 590²/15 = 23206.67
(1) Total sum of squares: SSyy = Σ y² – G²/n
= 42² + 36² + … + 28² + 33² – 590²/15
= 25354 – 23206.67
= 2147.33
(2) Treatment SS: Tyy = (234² + 204² + 152²)/5 – G²/n
= 23895.20 – 23206.67
= 688.53
(3) Error SS: Eyy = SSyy – Tyy
= 2147.33 – 688.53
= 1458.80
The ANOVA table to display the above calculations is Table 11.3 below.
Table 11.3: One-Way ANOVA for Example 11.3
Source      df            SS         MS        F
Treatment   3 – 1 = 2     688.53     344.27    2.83
Error       15 – 3 = 12   1458.80    121.57
Total       15 – 1 = 14   2147.33
Example 11.4 Is there sufficient evidence from Table 11.2 to indicate a difference in the mean scores based on the type of teaching method? Use a significance level of 0.05.
Solution: We answer this question by testing
Ho: μ1 = μ2 = μ3 versus
H1: At least one of the means is different from the others.
The calculated F statistic (using information from Table 11.3) is
F = MST / MSE = 344.27 / 121.57 = 2.83
with (2, 12) degrees of freedom. The critical F-value is F0.05(2, 12) = 3.89.
Since F = 2.83 < 3.89, we uphold the notion that the means are essentially the same. That is, we accept Ho.
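The one-way ANOVA calculations can be reproduced with a short Python sketch that recomputes the sums of squares directly from the scores in Table 11.2. The function name and the dictionary keys are hypothetical; the critical value still has to be taken from an F table.

```python
def one_way_anova(samples):
    """Sums of squares, mean squares and F for a completely randomized design."""
    a = len(samples)                         # number of treatments
    n = sum(len(s) for s in samples)         # total number of observations
    g = sum(sum(s) for s in samples)         # grand total G
    c = g ** 2 / n                           # correction term G^2 / n
    ss_total = sum(y * y for s in samples for y in s) - c
    ss_treat = sum(sum(s) ** 2 / len(s) for s in samples) - c
    ss_error = ss_total - ss_treat
    mst = ss_treat / (a - 1)
    mse = ss_error / (n - a)
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "mst": mst, "mse": mse, "f": mst / mse,
        "df": (a - 1, n - a),
    }


# Scores from Table 11.2, one list per teaching method
no_charts = [42, 36, 71, 30, 55]
commercial = [43, 40, 35, 42, 44]
teacher_made = [16, 35, 40, 28, 33]

result = one_way_anova([no_charts, commercial, teacher_made])
for key, value in result.items():
    print(key, value)
```

The same function accepts samples of unequal sizes, since each treatment total is divided by its own sample size.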
11.7 MULTIPLE COMPARISON TESTS
There are several multiple comparison tests. Some are the least significant difference (LSD) test, Dunnett’s test, Tukey’s test and Duncan’s test. For more on multiple comparison tests, see Gacula and Singh (1984). We shall describe only the LSD and Duncan tests in this book.
LSD Test. We shall describe the steps involved.
Step 1. Arrange the treatment means in ascending order.
Step 2. Calculate the standard error sij of pairwise differences between means using the formula
$$s_{ij} = \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \qquad (11.1)$$
where
MSE = the mean square error from the ANOVA table
ni = size of group i
nj = size of group j
Step 3. Let the degrees of freedom of MSE from the ANOVA be ν, and find $t_{\alpha/2,\,\nu}$. Recall that
$$\nu = n - a \qquad (11.2)$$
Step 4. Calculate LSD using the formula
$$LSD = t_{\alpha/2,\,\nu}\, s_{ij} \qquad (11.3)$$
If the groups have equal sample size n, Equation (11.3) is replaced by
$$LSD = t_{\alpha/2,\,\nu}\sqrt{\frac{2\,MSE}{n}} \qquad (11.4)$$
Step 5. Let $d_{ij} = |\bar{y}_i - \bar{y}_j|$. If
$$d_{ij} > LSD \qquad (11.5)$$
then the population means μi and μj differ significantly.
Step 6. Find the pairs of means which do not differ significantly and underline them. From here, any mean which differs significantly from the other means will stand out clearly.
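The LSD steps can be sketched in Python as follows. The function name, the treatment means, the MSE and the group sizes are hypothetical; the t value 2.179 is the table value for a two-tailed 5% test with 12 degrees of freedom, which matches these hypothetical inputs (15 observations, 3 groups).

```python
import math
from itertools import combinations


def lsd_comparisons(means, sizes, mse, t_crit):
    """Compare every pair of treatment means with its LSD.

    Returns tuples (i, j, d_ij, lsd, significant) with 1-based group labels.
    """
    results = []
    for i, j in combinations(range(len(means)), 2):
        s_ij = math.sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))  # std. error of the difference
        lsd = t_crit * s_ij
        d_ij = abs(means[i] - means[j])
        results.append((i + 1, j + 1, d_ij, lsd, d_ij > lsd))
    return results


# Hypothetical treatment means, group sizes and MSE, for illustration only
comparisons = lsd_comparisons([20.0, 25.0, 31.0], [5, 5, 5], mse=10.0, t_crit=2.179)
for row in comparisons:
    print(row)
```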
Duncan’s Multiple Range Test. We simply outline the procedure as follows:
Step 1. Arrange the means in ascending order.
Step 2. With equal sample sizes n, the standard error of each mean is
$$S_{\bar{y}} = \sqrt{\frac{MSE}{n}} \qquad (11.6)$$
For unequal sample sizes, n in the last equation is replaced by the harmonic mean nH, where
$$n_H = \frac{a}{\sum_{i=1}^{a} \frac{1}{n_i}} \qquad (11.7)$$
Step 3. Compute Rp, the least significant range, using
$$R_p = r_{\alpha}(p, \nu)\, S_{\bar{y}}, \quad p = 2, 3, \ldots, a \qquad (11.8)$$
The quantity $r_{\alpha}(p, \nu)$ is obtained from Duncan’s table of significant ranges for p treatment means and ν error degrees of freedom.
Step 4. Compute $d_{ij} = |\bar{y}_i - \bar{y}_j|$ for all the a(a-1)/2 pairs of means.
Step 5. Declare dij significant if
$$d_{ij} > R_p \qquad (11.9)$$
Step 6. Any pair of means not significantly different from each other are underlined.
In all these calculations, the α used for computing Rp is the same as that used for the significance of F in the ANOVA. Also, for the Duncan and LSD tests, F from the ANOVA must be significant.
11.8 THE ANOVA FOR A RANDOMIZED BLOCK DESIGN (RBD)
In a randomized block design, we compare a treatments (for example teaching methods) by using b blocks. The blocks can be made of b teachers with b types of qualifications. This implies using blocks of a experimental units that are homogeneous (relatively similar), with one unit within each block randomly assigned to each treatment. The main advantage of RBD over the completely randomized design (CRD) is that we compare treatments using homogeneous units, so that any true differences in the treatments will not be concealed by differences in the experimental units. The number of observations in the experiment is n = ab.
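11.9 SUMS OF SQUARES FOR RBD WITH a TREATMENTS AND b BLOCKS
Let yij be the response when the ith treatment is applied to the jth block (i = 1, 2, …, a; j = 1, 2, …, b).
Let SSyy be the total variation in the data set.
Then
SSyy = Tyy + Byy + Eyy
where
Tyy = treatment sum of squares
Byy = block sum of squares
Eyy = error sum of squares
If C = G²/n, where G is the grand total of all n = ab observations, then
$$SS_{yy} = \sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}^2 - C$$
$$T_{yy} = \frac{\sum_{i=1}^{a} T_i^2}{b} - C$$
$$B_{yy} = \frac{\sum_{j=1}^{b} B_j^2}{a} - C$$
$$E_{yy} = SS_{yy} - T_{yy} - B_{yy}$$
with
Ti = total of all observations in treatment i.
Bj = total of all observations in block j.
The above calculations are displayed in Table 11.5 below.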
Table 11.5 ANOVA TABLE FOR RBD, a TREATMENTS, b BLOCKS
Source       Df            SS     MS                          F
Treatments   a-1           Tyy    MST = Tyy/(a-1)             MST/MSE
Blocks       b-1           Byy    MSB = Byy/(b-1)             MSB/MSE
Error        (a-1)(b-1)    Eyy    MSE = Eyy/[(a-1)(b-1)]
Total        ab-1          SSyy
Example 11.6 Each of four different agencies took samples of groundwater from five toxic-waste dump sites. Each sample was analyzed and the amount of contaminant determined by the agency collecting the sample. The concentrations in parts per million are given in Table 11.6 below.
Table 11.6: Measurement of contaminant from five sites.
         Agency 1   Agency 2   Agency 3   Agency 4
Site 1   24.7       20.3       21.8       19.1
Site 2   8.7        7.9        6.1        6.0
Site 3   16.5       14.3       15.0       14.0
Site 4   31.7       23.6       28.3       23.5
Site 5   5.4        4.2        4.0        4.1
(a) Is there sufficient evidence to believe that there are inconsistencies in the measurements of the agencies?
(b) Can we say that the dump sites differ from one another in their level of contamination? Take the level of significance to be 0.05.
Solution. We shall regard agencies as treatments and sites as blocks. We are to test the following hypotheses:
(a) Ho: Treatment means are the same (that is, there is consistency in the agencies’ recordings) versus
H1: Treatment means are not the same (that is, there is no consistency in the agencies’ recordings)
(b) Ho: Block means are the same (that is, levels of contamination are the same at the sites) versus
H1: Block means are not the same (that is, levels of contamination are not the same at the sites)
The ANOVA table is as follows:
Source     Df   SS         MS        F
Agencies   3    46.972     15.6573   6.6279
Sites      4    1414.528   353.632   149.6961
Error      12   28.348     2.3623
Total      19   1489.848
The F, p-value and Fcritical are as follows:
           F          p-value    Fcritical
Agencies   6.6279     0.0069     3.49
Sites      149.6961   3.96E-10   3.26
For agencies, p-value < 0.05. The same result holds for sites. Therefore
(a) The agencies are inconsistent in their measurements. We reject Ho.
(b) The dump sites differ in their levels of contamination. We reject Ho.
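The sums of squares for Example 11.6 can be recomputed with a short Python sketch. The function name and dictionary keys are hypothetical; the rows of the input are the sites (blocks) and the columns are the agencies (treatments), exactly as laid out in Table 11.6.

```python
def rbd_anova(table):
    """ANOVA for a randomized block design.

    table[j][i] is the response for block j (row) under treatment i (column).
    """
    b = len(table)       # number of blocks
    a = len(table[0])    # number of treatments
    n = a * b
    g = sum(sum(row) for row in table)
    c = g ** 2 / n       # correction term G^2 / n
    ss_total = sum(y * y for row in table for y in row) - c
    treat_totals = [sum(row[i] for row in table) for i in range(a)]
    block_totals = [sum(row) for row in table]
    ss_treat = sum(t * t for t in treat_totals) / b - c
    ss_block = sum(t * t for t in block_totals) / a - c
    ss_error = ss_total - ss_treat - ss_block
    mse = ss_error / ((a - 1) * (b - 1))
    return {
        "ss_total": ss_total, "ss_treat": ss_treat,
        "ss_block": ss_block, "ss_error": ss_error,
        "f_treat": (ss_treat / (a - 1)) / mse,
        "f_block": (ss_block / (b - 1)) / mse,
    }


# Table 11.6: rows are sites 1 to 5, columns are agencies 1 to 4
contaminant = [
    [24.7, 20.3, 21.8, 19.1],
    [8.7, 7.9, 6.1, 6.0],
    [16.5, 14.3, 15.0, 14.0],
    [31.7, 23.6, 28.3, 23.5],
    [5.4, 4.2, 4.0, 4.1],
]
rbd = rbd_anova(contaminant)
for key, value in rbd.items():
    print(key, round(value, 4))
```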
11.10 THE ANOVA FOR AN a x b FACTORIAL EXPERIMENT
Situations arise when researchers are interested in investigating the effects of two or more factors on a response and in exploring the interactions between the factors. The appropriate design here is the factorial experiment. It is customary to denote the factors by upper-case letters A and B (in the case of two factors) and the numbers of levels of the factors by lower-case letters a and b. The factors give rise to ab factor combinations. Each factor combination gives rise to a treatment. If we design to have, say, r observations per treatment, we say that the experiment is replicated r times. Since each of the ab combinations is replicated r times, the total number of observations in the experiment is n = abr.
11.11 PARTITIONING OF TOTAL VARIATION IN AN a x b FACTORIAL EXPERIMENT
First, we have
a levels of factor A
b levels of factor B
r replications of each of the ab factor combinations
n = abr observations in the whole experiment
Let G = sum of all n = abr observations
Ai = sum of all observations at the ith level of factor A (i = 1, 2, …, a)
Bj = sum of all observations at the jth level of factor B (j = 1, 2, …, b)
(AB)ij = sum of all the r observations at the ith level of factor A and jth level of factor B
SSyy = total sum of squares
SSA = sum of squares for factor A
SSB = sum of squares for factor B
SS(AB) = sum of squares for interaction
SSE = sum of squares due to error
SSA and SSB are called main effect sums of squares and SS(AB) is the interaction sum of squares.
As before, set C = G²/n, and let yijk be the kth observation at the ith level of A and jth level of B.
Then
$$SS_{yy} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r} y_{ijk}^2 - C$$
$$SSA = \frac{\sum_{i=1}^{a} A_i^2}{br} - C$$
$$SSB = \frac{\sum_{j=1}^{b} B_j^2}{ar} - C$$
$$SS(AB) = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} (AB)_{ij}^2}{r} - C - SSA - SSB$$
SSE = SSyy – [SSA + SSB + SS(AB)]
All these sums of squares are displayed in Table 11.7.
Table 11.7: ANOVA TABLE for an a x b Factorial Experiment
Source   Df           SS       MS                                 F
A        a-1          SSA      MSA = SSA/(a-1)                    MSA/MSE
B        b-1          SSB      MSB = SSB/(b-1)                    MSB/MSE
AB       (a-1)(b-1)   SS(AB)   MS(AB) = SS(AB)/[(a-1)(b-1)]       MS(AB)/MSE
Error    ab(r-1)      SSE      MSE = SSE/[ab(r-1)]
Total    abr – 1      SSyy
11.12 TESTS OF HYPOTHESES IN A FACTORIAL EXPERIMENT
(a) Testing for main effect A
1. Null hypothesis: Ho: There are no differences among the means of Factor A.
2. Alternative hypothesis: H1: At least two of the Factor A means differ.
3. Test statistic: F = MSA / MSE, based on [a-1, ab(r-1)] degrees of freedom.
4. Decision rule: Reject Ho if F > Fα or equivalently when p < α.
(b) Testing for main effect B
1. Null hypothesis: Ho: There are no differences among the means of Factor B.
2. Alternative hypothesis: H1: At least two of the Factor B means differ.
3. Test statistic: F = MSB / MSE, based on [b-1, ab(r-1)] degrees of freedom.
4. Decision rule: Reject Ho if F > Fα or equivalently when p < α.
(c) Testing for interaction
1. Null hypothesis: Ho: Factors A and B do not interact.
2. Alternative hypothesis: H1: Factors A and B interact.
3. Test statistic: F = MS(AB) / MSE, based on [(a-1)(b-1), ab(r-1)] degrees of freedom.
4. Decision rule: Reject Ho if F > Fα or equivalently when p < α.
Example 11.7: Table 11.8 shows sales of scientific calculators by two sales agents observed on four randomly selected brands of calculators. The numerical entries in the table are sales in thousands of naira. Consider the table as containing two factors, brand and sales agent.
Table 11.8: Calculator Brands and Sales Agents
                Brands of Calculator
Sales Agent     Brand 1   Brand 2   Brand 3   Brand 4
Sales agent 1   673       581       571       645
                712       575       532       725
                730       640       552       600
Sales agent 2   580       724       732       800
                617       700       780       750
                575       680       761       855
Using the α = 0.05 significance level, test for
(a) Sales agent effect
(b) Brand effect, and
(c) Interaction between sales agent and brand.
Solution
Source         Df   SS            MS            F
Brands         3    27758.8333    9252.9444     6.80
Sales Agents   1    43180.1667    43180.1667    31.74
Interaction    3    87430.1667    29143.3889    21.42
Error          16   21764.6667    1360.2917
Total          23   180133.8333
The F, p-value and Fcritical are as follows:
               F       p-value    Fcritical
Brands         6.80    0.0036     3.24
Sales Agents   31.74   3.73E-05   4.49
Interaction    21.42   7.54E-06   3.24
The brands’ p-value < 0.05. The same result holds for sales agents and interaction. Therefore
(a) There is a brand effect. We reject the null hypothesis of no brand effect.
(b) There is a sales agent effect. We reject the null hypothesis of no sales agent effect.
(c) There is interaction between brand and sales agent. We reject the null hypothesis of no interaction.
The same conclusions are reached by comparing F with Fcritical, since in all three cases F > Fcritical.
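The factorial sums of squares can be recomputed from Table 11.8 with the Python sketch below. The function name and dictionary keys are hypothetical. Here factor A is taken to be the sales agent (a = 2), factor B the brand (b = 4), and each cell holds r = 3 observations.

```python
def factorial_anova(cells):
    """ANOVA for an a x b factorial experiment with r replications.

    cells[i][j] is the list of r observations at level i of A and level j of B.
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    n = a * b * r
    g = sum(y for row in cells for cell in row for y in cell)
    c = g ** 2 / n  # correction term G^2 / n
    ss_total = sum(y * y for row in cells for cell in row for y in cell) - c
    a_totals = [sum(sum(cell) for cell in row) for row in cells]
    b_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    ss_a = sum(t * t for t in a_totals) / (b * r) - c
    ss_b = sum(t * t for t in b_totals) / (a * r) - c
    ss_ab = sum(sum(cell) ** 2 for row in cells for cell in row) / r - c - ss_a - ss_b
    ss_e = ss_total - ss_a - ss_b - ss_ab
    mse = ss_e / (a * b * (r - 1))
    return {
        "ss_a": ss_a, "ss_b": ss_b, "ss_ab": ss_ab, "ss_e": ss_e,
        "f_a": (ss_a / (a - 1)) / mse,
        "f_b": (ss_b / (b - 1)) / mse,
        "f_ab": (ss_ab / ((a - 1) * (b - 1))) / mse,
    }


# Table 11.8: outer index = sales agent, inner index = brand
sales = [
    [[673, 712, 730], [581, 575, 640], [571, 532, 552], [645, 725, 600]],
    [[580, 617, 575], [724, 700, 680], [732, 780, 761], [800, 750, 855]],
]
fact = factorial_anova(sales)
for key, value in fact.items():
    print(key, round(value, 4))
```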
Problem Set 11
1. Write briefly on the following, in the context of experimental design:
a) Randomization b) Replication c) Blocking d) Blinding
2. The following are the number of errors made in

five successive days by four operators working for a
medical diagnostic laboratory.
Operator 1 Operator 2 Operator 3 Operator 4

7 15 11 10
13 10 13 11
11 11 6 7
7 12 14 12
12 13 10 10
Test at the level of significance whether the

differences among means can be attributed to chance.
3. Random samples of size 5 are taken from large

group of persons taught by 3 different methods. Below
are the scores obtained in an assessment test.
Method 1 Method 2 Method 3

74 93 73
78 83 78
68 90 77
73 87 80
76 75 76
(a)Test for treatment effect at

(b) Perform a multiple comparison test on the data. Can you justify the use of Duncan’s multiple range test?
4. A study was carried out to study the performance

of five different brands of detergents. The following
‘brightness’ reading resulted with specially designed
washing machines.
Detergent 1 Detergent 2 Detergent 3 Detergent 4
Machine 1 46 48 49 43
Machine 2 42 45 51 39
Machine 3 52 53 54 50
Taking the detergents as treatments and the machines

as blocks.
a) Draw the appropriate ANOVA table
b) Test for treatment effect
c) Test for block effect
Take
5. The table below gives the number of pairs of scissors produced by three different technicians working on two different types of machines, 1 and 2, on different days of the week.
               Machine 1                      Machine 2
               Mon  Tues  Wed  Thu  Fri       Mon  Tues  Wed  Thu  Fri
Technician 1   16   19    18   19   13        15   17    19   18   16
Technician 2   13   17    15   19   12        12   16    13   17   132
Technician 3   18   17    22   24   19        18   16    19   22   16
Test for significance
a) Between machines
b) Among technicians
6. The scores for four secondary schools in the South-

South, Nigeria in a Zonal contest in English communication
are as follows:
School 1 School 2 School 3 School 4
95 76 69 69
91 69 72 70
86 78 75 73
79 85 77 64
89 78 75
69 66
62
a) Perform a one-way ANOVA on the data

b) Calculate treatment means and plot them
c) Perform a multiple comparison test on the data. Can you justify the use of the least significant difference (LSD) test?
CHAPTER 12
CORRELATION
12.0 INTRODUCTION
Correlation is concerned with whether or not there is any association between two variables. If there is an association between two variables to some degree, then changes in one variable are associated with changes in the other. Some examples include the association between household expenditure and income, and between volume of sales and advertisement made on a given product.
The data set to be used shall be bivariate. We shall call the two variables the independent and dependent variables, to be denoted respectively by X and Y. An independent variable is the presumed cause of any change in a response or dependent variable. It is also called a predictor variable or explanatory variable. The dependent variable is the presumed effect of, or response to, a change in an independent variable.
12.1 TYPES OF MEASURES OF ASSOCIATION
There are various measures of association that are used to study bivariate data. One criterion used to distinguish these measures is the scale of measurement. While some measures have been specially devised for nominal variables, others are for ordinal, interval or ratio-level variables.
Another important criterion used in categorizing measures of association is whether they are standardized (normed). A measure of association whose values lie in a fixed interval is called a normed or standardized measure of association. Most normed measures of association fall into one of two categories, Type 1 and Type 2.
A Type 1 measure assumes values between 0 and 1. A value of 0 indicates “no association”, while a value of 1 indicates perfect association. A Type 2 measure assumes values between -1 and 1. A value of -1 indicates perfect negative association, a value of 0 indicates no association, while a value of 1 indicates a perfect positive association. Also, a value less than zero, for example -0.7, indicates a negative (indirect) association.
The measures we use in correlation analysis in this book are the Pearson and Spearman rank correlation coefficients, whose values are in the closed interval [-1, 1]. Hence they are Type 2 measures of association. Some examples of Type 1 measures are the correlation coefficient of contingency and the correlation coefficient between attributes. For 2 x 2 tables, correlation between attributes is often referred to as tetrachoric correlation.
12.2 USES OF CORRELATION ANALYSIS
Correlation analysis is used to determine:
1. The existence or otherwise of any association between two variables.
2. The strength of the association.
3. The direction of the association, and
4. The proportion of the variation in the dependent variable that is accounted for by the independent variable.
12.3 HOW TO INDICATE CORRELATION BETWEEN TWO VARIABLES
There are different methods of showing whether correlation exists between two variables. These methods include:
1. Scatter diagrams
2. The product moment coefficient of correlation
3. The coefficient of rank correlation
4. Regression.
12.4 SCATTER DIAGRAMS.
The scatter diagram may be used to give the initial signal of the presence or otherwise of association between variables. Using the diagrams in Figure 12.1 to Figure 12.5, we illustrate the different types of correlation that are possible with a bivariate data set.
[Figure 12.1: Positive correlation (scatter diagram of Y against X)]
[Figure 12.2: Negative correlation (scatter diagram of Y against X)]
[Figure 12.3: Perfect positive correlation (scatter diagram of Y against X)]
[Figure 12.4: Perfect negative correlation (scatter diagram of Y against X)]
Figure 12.1 is an example of positive correlation in which
the variables X and Y move in the same direction: Y
increases as X increases. In Figure 12.2, we have
negative correlation, so that as X increases Y decreases.
Figure 12.3 is an example of perfect positive correlation
between the two variables, so that both variables
increase in the same proportion. In the case of perfect
negative correlation between X and Y, as shown in Figure
12.4, as one rises the other falls in exact proportion.
Figure 12.5 indicates no correlation between the variables:
we cannot identify any useful pattern in the data points.
The use of scatter diagrams to indicate correlation is
plagued by a few drawbacks.
Though it may indicate the direction of association, a
scatter diagram cannot give a numerical value for the
strength of the association. Also, the “true pattern” of the
plotted points varies with individual judgement. The
methods of correlation analysis using the Pearson and
Spearman coefficients are free from these problems.
12.5 PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT.
When two variables are measured on interval or
ratio scales, the Pearson correlation coefficient is
useful in studying the association between them. This
coefficient, which is named after Karl Pearson who
devised it, is defined as
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$
where
$$S_{xy} = \sum (X - \bar{X})(Y - \bar{Y}), \quad S_{xx} = \sum (X - \bar{X})^2, \quad S_{yy} = \sum (Y - \bar{Y})^2$$
The quantity $S_{xy}$ is called the sum of cross products of X
and Y corrected for the mean. The quantities $S_{xx}$ and
$S_{yy}$ are respectively the sum of squares of X and the sum of
squares of Y corrected for the mean. Let us consider an
example on Pearson’s correlation coefficient.
Example 12.1. Table 12.1 shows investments X and

profit Y for Asako Consult for the period 1981 to 1984.
Figures are in millions of naira. Find Pearson correlation
coefficient r.
Table 12.1: Investment and profit for Asako Consult.

Year Investment (X) Profit (Y)
1981 4 10
1982 5 7
1983 7 9
1984 8 22
Solution: Here $\bar{X} = 6$ and $\bar{Y} = 12$, so that
Sxy = 26, Sxx = 10, Syy = 138
Therefore, $r = 26/\sqrt{(10)(138)} = 0.70$
The correlation coefficient value of 0.70 indicates a
strong positive correlation between the two variables.
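The arithmetic of Example 12.1 can be checked with a short Python sketch. The function name pearson_r and the variable names are illustrative choices, not part of the text; the formula is the one given in Section 12.5.

```python
import math

def pearson_r(xs, ys):
    """Return (r, Sxy, Sxx, Syy) using sums corrected for the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy), s_xy, s_xx, s_yy

# Table 12.1: investment X and profit Y (millions of naira)
investment = [4, 5, 7, 8]
profit = [10, 7, 9, 22]

r, s_xy, s_xx, s_yy = pearson_r(investment, profit)
print(s_xy, s_xx, s_yy, round(r, 2))
```

The same function can be applied to any pair of equal-length lists measured on interval or ratio scales.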
12.6 RANK CORRELATION COEFFICIENT.
At times, our interest may be in variables measured on an
ordinal scale. At other times, though the measurements
are on interval or ratio scales, we can convert them into
ordinal measurements. In these cases we compute a
coefficient of rank correlation to assess the correlation
between the two variables concerned.
There are many types of rank correlation coefficients.
The one considered here is called the Spearman rank
coefficient of correlation, denoted by r’. The interpretation of r’ is the
same as that of r. It is computed as follows (for variables X and Y):
Step 1. Rank the values of each variable in order.
Step 2. Replace the values of the variables by their
respective ranks.
Step 3. Find the difference between the rankings in
each case.
Step 4. Square the differences in ranks and find the
sum of the squares.
Step 5. Apply the formula
$$r' = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where D = difference between ranks of corresponding
values of X and Y and n = number of pairs of values (X,
Y) in the data.
We illustrate the calculation of r’ using the data of Example
12.2. The values of X and Y are from Table 12.1.
Example 12.2 Using the data in Table 12.1, calculate the Spearman
rank correlation coefficient r’.
Solution
Table 12.2: Rank correlation coefficient
Year   X   Y    Rank of X   Rank of Y   D    D²
1981   4   10   1           3           -2   4
1982   5   7    2           1           1    1
1983   7   9    3           2           1    1
1984   8   22   4           4           0    0
From Table 12.2,
$\sum D^2$ = 4 + 1 + 1 + 0 = 6
Note that n = 4
Therefore
$$r' = 1 - \frac{6(6)}{4(4^2 - 1)} = 1 - \frac{36}{60} = 0.40$$
This indicates a moderate degree of positive correlation between
investment and profit. It is worth noting from
Examples 12.1 and 12.2 that the values of r and r’ for
the same data set need not be the same. This should be
expected. In calculating Pearson’s coefficient r, we use
the actual measurements, whereas ranks are used in the
calculation of the Spearman coefficient r’. However, the
interpretation of r and r’ is the same.
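Steps 1 to 5 of the rank correlation procedure can be sketched in Python as follows. The helper names ranks and spearman are illustrative, and the sketch assumes that no two values of a variable are tied, as is the case for the data of Table 12.1.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Return (r', sum of D squared) from the rank-difference formula."""
    n = len(xs)
    sum_d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1)), sum_d_sq

# Investment X and profit Y from Table 12.1
r_rank, sum_d_sq = spearman([4, 5, 7, 8], [10, 7, 9, 22])
print(sum_d_sq, round(r_rank, 2))
```

For data with tied values, average ranks would be needed and the simple formula above is only approximate.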
12.7 INTERPRETATION OF CORRELATION COEFFICIENT.
We interpret r (and also r’) as follows:
DIRECTION: If r is positive, both variables move in the
same direction, but if r is negative, the variables move in
opposite directions.
STRENGTH: The numerical value of r indicates the
strength of the linear relationship. The values of r range
from -1 to +1 inclusive. The higher the value of r
(regardless of sign), the stronger is the correlation.
VARIATION: The proportion of the variation in one
variable that can be accounted for by its linear
relationship with the other variable is given by $r^2$.
Therefore if r = 0.90, $r^2$ = 0.81. We can then say that 81%
of the variation in one variable, say Y, can be accounted
for by its linear relationship with the other variable X.
RELIABILITY: Generally, when the sample size
is very large, r is something that can be relied upon. However,
with a small sample size, the relationship between the
two variables may be ascribed to chance. Be that as it
may, a proper assessment of reliability is normally carried
out using a test of hypothesis, and this is considered in
Chapter 14 of this book.
LAG: At times a variable, say Y, may delay in responding
to a stimulus coming from another variable X. This may
result in a misleading correlation.
If the necessary allowance is made for lag, it is likely that
we will have the true picture of the correlation between
the two variables.
SPURIOUS CORRELATION: It is possible for
correlation analysis to show that two variables move
together without establishing that this
necessarily indicates cause and effect. For instance, suppose the
correlation between lecturers’ salaries and the consumption of
alcohol over a period of years is found to be 0.85. This
does not confirm that lecturers are alcoholics; nor does it
establish that sales of alcohol increase lecturers’
salaries. Rather, the two variables move together largely
because of the influence of a third variable: growth in gross
domestic product and population over time. Correlations
such as the one cited above are often referred to as spurious or
nonsense correlations. We conclude that a correlation, even when it is real,
does not necessarily imply cause and effect.
PROBLEM SET 12
1. State instances in real life where we have

correlation between two variables X and Y.
2. State the formula for the correlation coefficient
between two variables X and Y and show that r
remains invariant under affine transformation of X
and Y.
(Hint: Let $X^* = a + bX$ and $Y^* = c + dY$ with $b, d > 0$. Show that the
correlation between X* and Y* is the same as that between X
and Y)
3. Ten students sat for physics and calculus and

the scores are as given below.
Physics 58 70 65 85 77 55 40 62 82 86
Calculus 55 64 60 82 74 50 38 60 83 84
a) Calculate the correlation coefficient r between

the two courses
b) Add 2 to all numbers in the above table and
recalculate r
c) If scores in physics and calculus are
respectively given by Y and X how much of
variation in Y is accounted for by X?
4. Draw a scatter diagram for data in problem 3.

Is a linear relationship between X and Y
suitable?
CHAPTER 13
SIMPLE REGRESSION
13.0 INTRODUCTION
Regression analysis is a technique that is used to study
the relationship between two or more variables, with the
aim of expressing that relationship in functional (mathematical)
form. The variables used in this technique
are those measured on interval or ratio scales. This
chapter focuses on the uses of regression, types of
regression models and methods of fitting a linear
regression line, among others.
13.1 USES OF REGRESSION ANALYSIS
There are a few reasons for carrying out regression
analysis. Some of them are estimation, prediction
and decision analysis.
ESTIMATION: In estimation we use the sample
data to obtain estimates of the model parameters.
Estimation inevitably calls for testing the reliability of
individual parameters and the goodness of fit of the overall
model. This we will also do when we get to Chapter 14.
PREDICTION: Having obtained the estimated
model, we use it for prediction. A prediction is a statement of
what is believed will happen in the future, made on the
basis of past experience or prior knowledge of observed
data. The two approaches to prediction are interpolation
and extrapolation. In extrapolation we use the estimated
model to project an identified pattern into the future. For
example, if the sample period is 1961 to 1980 with time as
the independent variable, our interest may be in knowing
what the dependent variable will be in 1981 or 1982.
In interpolation we are concerned with what the
response variable Y will be for a value of X within the
sample period.
DECISION ANALYSIS: On the basis of the results from
estimation and prediction, one can then make policies that
can guide decision making. For instance, “will it be
necessary to open additional warehouses based on
projected sales of commodity A?”.
13.2 TYPES OF REGRESSION MODELS
SCATTER DIAGRAM.
After collecting data on the variables, say X and Y, we
plot the data on the X-Y plane with Y on the vertical
axis and X on the horizontal axis. The diagram that
results is called a scatter diagram and gives a certain
“pattern” for the data set. The diagram may suggest a
straight line, a parabola or an extremely complicated
relationship. At times the diagram may show a “no clear
pattern” situation, possibly suggesting that there is no
correlation between the variables. However, four
regression models that are frequently used are:
(a) The simple linear (straight-line) model
(b) The intrinsically linear model
(c) The polynomial regression model
(d) The multiple linear regression model.
Types (b) and (c) are examples of nonlinear regression.
In this chapter, we shall be concerned with only types (a)
and (b).
13.3 METHODS OF FITTING THE LINEAR REGRESSION LINE.
Different methods exist for fitting the simple linear
regression line. These are
(a) The freehand method
(b) The modified freehand method
(c) The method of semi-averages, and
(d) The least squares method.
13.4 THE FREEHAND METHOD.
The freehand regression line or the “eye ball fit”
can be obtained by placing a ruler on the already
plotted scatter diagram and moving it (the ruler)
about the plane until it appears to pass through
(or pass close to) as many plotted marks (dots)
as possible. An outstanding drawback of this
method is that different people would probably
draw different lines using the same data set. The
procedure is therefore dependent on subjective
judgement.
13.5 THE MODIFIED FREEHAND METHOD.
The modified freehand method is a variant of the
freehand method that offers some
improvement over the latter. This technique
essentially involves plotting the mean coordinates
of the data set, which are $(\bar{X}, \bar{Y})$, and ensuring that
the regression line passes through this point. This
procedure is equally subjective since several
straight lines can be made to pass through the
point $(\bar{X}, \bar{Y})$.
13.6 THE METHOD OF SEMI-AVERAGES.
The subjectivity in the above two methods can be

purged by adopting the method of semi-averages.
The procedure consists of splitting the data into
two equal groups, plotting the mean point for each
group and then joining these two points with a
straight line. The steps involved in this technique
can be outlined as follows:
Step 1: Sort the bivariate data by size of the X-value.

Step 2: Split the data into two equal groups, a lower half
and an upper half. (If there is an odd
number of items, ignore the central one).
Step 3: Calculate the mean point for each group
Step 4: Plot the above mean points on a graph using
suitably chosen scales and join them with a
straight line. The resulting line is the required
regression line of Y on X.
13.7 THE LEAST SQUARES METHOD.

The method of least squares is an analytical and
objective approach for obtaining a regression line. Its
objectivity lies in the fact that the line so obtained
is unique, and its analytical nature is based on the
fact that the procedure permits structural analysis
and prediction to be carried out using the
estimated model, with the incorporation of
probability statements about the error involved. Let us
note that the line obtained using semi-averages is
equally unique, but we cannot discuss or use this
estimated structure using the language of
probability.
The least squares method states that the line of
best fit is that line which minimizes the sum of the
squared deviations between it and the data points.
The steps in obtaining least squares estimates of
the simple linear model are outlined below:
Step 1: SPECIFICATION OF THE MODEL
The simple regression model takes the form
$$Y = \beta_0 + \beta_1 X + u \qquad (13.1)$$
where Y = dependent variable, X = independent variable, also known as
the regressor, predictor or explanatory variable, and
$\beta_0$ and $\beta_1$ are the parameters of the model whose
values we want to estimate.
The model (13.1) is a hypothesized model. It is
the model we envision in our minds and which we
propose as being suitable for the study of the
phenomenon under consideration. It incorporates
the random error term u. The error term u takes
care of variations in Y (changes in Y) which are not
accounted for by X. The quantity $\beta_0$ is the
intercept of the regression line and is equal to the
value of Y when X is zero. The quantity $\beta_1$
is the coefficient of X and is equal to
the change in Y for a unit change in X.
Step 2: ESTIMATION OF THE MODEL
After estimating the parameters $\beta_0$ and $\beta_1$ we
come out with what we call the estimated model.
It takes the form
$$\hat{Y} = b_0 + b_1 X \qquad (13.2)$$
The quantities b0 and b1 are respectively the
estimates of $\beta_0$ and $\beta_1$. We notice that the
random error term u is no longer there. Its absence
is due to one of the classical assumptions of least
squares estimation, that the mean of the stochastic
term is zero.
Let us recall that we used the quantities Sxy,
Sxx and Syy in the last chapter. For convenience
we restate them here.
$S_{xy} = \sum (X - \bar{X})(Y - \bar{Y})$ is the sum of cross products of X and Y
corrected for the mean.
$S_{xx} = \sum (X - \bar{X})^2$ is the sum of squares of X corrected for the
mean and $S_{yy} = \sum (Y - \bar{Y})^2$ is the sum of squares of Y
corrected for the mean. With these definitions the
estimates b1 and b0 are respectively
b1 = Sxy/Sxx
and
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Step 3: VARIATION IN THE REGRESSION
The word variation is used to represent changes
that take place in a variable in the course of
estimation. In regression analysis we have
basically three types of variation as follows:
1. Total variation (TSS). This is given by
$$TSS = S_{yy} = \sum (Y - \bar{Y})^2$$
2. Residual variation. If RSS denotes this variation,
then
$$RSS = (1 - r^2) S_{yy}$$
where r is the Pearson correlation coefficient.
3. Explained variation, ESS, is given by
$$ESS = r^2 S_{yy}$$
The explained variation is also called variation due
to regression. That is, it is the variation accounted
for by X. The residual variation RSS is the
variation due to the error term.
That is, RSS is the variation due to factors other
than X. Notice that the total variation is equal to
residual variation plus explained variation. That is
TSS = RSS + ESS
Step 4: EXPLANATORY POWER OF THE MODEL
The quantity $r^2$ given by
$$r^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = \frac{ESS}{TSS}$$
is called the coefficient of determination of the model. It is
the square of the correlation coefficient. It is also obtained
by dividing explained variation by total variation. It
gives the explanatory power of the model. The higher
the value of $r^2$, the more the estimated model is believed
capable of explaining the data set. That is, the higher the
value of $r^2$, the better is the model fit.
Example 13.1 (Use the data of Example 12.1)
(a) State the specification of the simple linear regression
model and explain your notation.
(b) Estimate the parameters of the model and comment.
(c) Using the estimated model, predict the value of profit
when investment X assumes the values 3.5, 5 and 6.5.
Comment on your result.
(d) Calculate r and $r^2$ and comment.
Solution
(a) The specification of the simple regression
model is
$$Y = \beta_0 + \beta_1 X + u$$
where
Y = dependent variable
X = independent variable
$\beta_0$ and $\beta_1$ are regression parameters and u is the
error term.
(b) Now from the data
$\bar{X} = 6$, $\bar{Y} = 12$, Sxy = 26, Sxx = 10, Syy = 138
b1 = Sxy/Sxx
= 26/10 = 2.6
and
$b_0 = \bar{Y} - b_1 \bar{X}$ = 12 - (2.6)(6) = -3.6
The estimated model becomes
$\hat{Y}$ = -3.6 + 2.6X
When X = 0, $\hat{Y}$ = -3.6
This value suggests that the firm loses 3.6 million
naira when nothing is invested. A slope of 2.6
means that a rise of one million naira in
investment will result in an increase in profit to the
tune of 2.6 million naira.
(c) When X = 3.5, $\hat{Y}$ = -3.6 + 2.6(3.5) = 5.5
When X = 5, $\hat{Y}$ = -3.6 + 2.6(5) = 9.4
When X = 6.5, $\hat{Y}$ = -3.6 + 2.6(6.5) = 13.3
From the data, profit Y is 7 when X = 5. However, the
predicted value of profit Y for X = 5 is 9.4. We can infer
from these results that the predicted and actual values of
the response variable Y are not always the same.
(d) The correlation coefficient r is
$r = S_{xy}/\sqrt{S_{xx} S_{yy}} = 26/\sqrt{(10)(138)}$ = 0.70
$r^2$ = 0.49
The value of r = 0.7 shows a strong positive correlation,
and $r^2$ = 0.49 means here that X can only account for
49% of the total variation in Y.
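The calculations in Example 13.1 can be reproduced with the following Python sketch. The function fit_line and its return values are illustrative names; the formulas are b1 = Sxy/Sxx, b0 equal to the mean of Y minus b1 times the mean of X, and the coefficient of determination as the square of r.

```python
def fit_line(xs, ys):
    """Least squares estimates (b0, b1) and r squared for Y = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return b0, b1, r_squared

# Investment X and profit Y from Table 12.1 (millions of naira)
b0, b1, r_squared = fit_line([4, 5, 7, 8], [10, 7, 9, 22])

# Predicted profit at the investment levels asked for in part (c)
predictions = [b0 + b1 * x for x in (3.5, 5, 6.5)]
print(round(b0, 2), round(b1, 2), round(r_squared, 2))
print([round(p, 1) for p in predictions])
```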
Example 13.2. Below is a data set on the drying
time of a certain varnish and the amount of an additive
which is expected to reduce the drying time.
Table 13.1: Data on Varnish Additive and Drying Time
Varnish additive (gm) X   0     1.1   2.2   3.3  4.4  5.4  6.5  7.4  8.5
Time                      13.0  11.5  11.0  9.0  8.0  9.0  8.5  9.5  10
(a) Draw a scatter diagram.
(b) From your scatter diagram, comment on
whether or not it is right to fit the data with a
simple linear regression.
Solution: (Partial Solution). The data plot reveals
that a second-degree polynomial with one local minimum
will fit the data better than a straight line.
Example 13.3. The data below consist of humidity X
and moisture content Y (in appropriate units).
Table 13.2: Humidity Data
Humidity X   39    32  47  40  45  59   28  34  40  36  52  45  43
Moisture Y   14.0  10  16  11  18  9.0  11  13  14  12  15  13  12
(a) Plot a scatter diagram to show that it is reasonable
to assume that the regression of Y on X is linear.
(b) Find the regression line using the
(i) Freehand method
(ii) Modified freehand method
(iii) Semi-averages
(iv) Least squares.
Solution: (Partial Solution)
[Scatter diagram of Y (moisture content) against X (humidity), on which (i) the freehand line, (ii) the modified freehand line, (iii) the line by semi-averages and (iv) the least squares line are drawn]
We shall give hints for the solution of (ii), (iii) and (iv).
(ii) We need $(\bar{X}, \bar{Y})$ to draw the modified
freehand straight line.
$\bar{X} = (\sum X)/n$ = 540/13 = 41.54
$\bar{Y} = (\sum Y)/n$ = 168/13 = 12.92
Hence $(\bar{X}, \bar{Y})$ = (41.54, 12.92). This point can now
be used in the modified freehand method to draw the straight
line required to fit the data set.
(iii) Data in the original form
S/No  1   2   3   4   5   6   7   8   9   10  11  12  13
X     39  32  47  40  45  59  28  34  40  36  52  45  43
Y     14  10  16  11  18  9   11  13  14  12  15  13  12
Data in ascending order of X
S/No  7   2   8   10  1   4   9   13  5   12  3   11  6
X     28  32  34  36  39  40  40  43  45  45  47  52  59
Y     11  10  13  12  14  11  14  12  18  13  16  15  9
$(\bar{X}, \bar{Y})$ for the lower half of the data set is $(\sum X/6, \sum Y/6)$
= (209/6, 71/6) = (34.83, 11.83)
For the upper half of the data set $(\bar{X}, \bar{Y})$ = (291/6, 83/6)
= (48.5, 13.83).
We join the points (34.83, 11.83) and (48.5, 13.83) by a
straight line.
The straight line so obtained is called a straight line fit of
the data set by the method of semi-averages. Observe that
the central observation (40, 14) was omitted since the
sample size is odd. This point has the serial number 9.
(iv) If a simple linear regression model
$Y = \beta_0 + \beta_1 X + u$
is proposed, the estimated model takes the form
$\hat{Y}$ = 10.78 + 0.0516X
(The reader can verify the parameter estimates by
manual calculation.)
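As a check on parts (iii) and (iv) of Example 13.3, the sketch below computes the two semi-average mean points and the least squares estimates. The function names are illustrative. The sort is made on X only, so that observations with equal X values keep their original order, which is an assumption that happens to match the ordering used in the text.

```python
# Humidity X and moisture content Y from Table 13.2
xs = [39, 32, 47, 40, 45, 59, 28, 34, 40, 36, 52, 45, 43]
ys = [14, 10, 16, 11, 18, 9, 11, 13, 14, 12, 15, 13, 12]

def mean_point(pairs):
    """Return the (mean X, mean Y) point of a list of (x, y) pairs."""
    n = len(pairs)
    return sum(x for x, _ in pairs) / n, sum(y for _, y in pairs) / n

def semi_average_points(xs, ys):
    """Mean points of the lower and upper halves after sorting by X.

    With an odd number of pairs the central pair is ignored.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])  # stable sort on X only
    half = len(pairs) // 2
    return mean_point(pairs[:half]), mean_point(pairs[-half:])

def least_squares(xs, ys):
    """Return (b0, b1) with b1 = Sxy/Sxx and b0 = mean(Y) - b1*mean(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

lower, upper = semi_average_points(xs, ys)
b0, b1 = least_squares(xs, ys)
print(lower, upper)
print(round(b0, 2), round(b1, 4))
```

The two methods generally give different lines for the same data, which is to be expected since only least squares minimizes the sum of squared deviations.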
13.8 REGRESSION FOR INTRINSICALLY LINEAR MODELS
Let Y be a nonlinear function of X. This function is
said to be intrinsically linear if we can transform
the nonlinear function to a linear one by a suitable
algebraic substitution.
Assume the error term is multiplicative for the
following:
(a) $y = ae^x$
(b) $y = ax^3$
(c) $y = a + b \log x$
(d) $y = a^x$
(e) $y = a + b(1/x)$
Example 13.4. What transformations are needed to
obtain linearity in models (a) to (e) above?
Solution. This should be handled as an exercise
either in class or at home.
Example 13.5. The following data represent enrolment Y at
a certain college for the past 7 years.
Table 13.3: Data on Enrolment Y and Year X
X  1    2    3    4    5    6    7
Y  302  340  390  460  550  673  880
Assuming a
multiplicative error term, estimate a regression line for the
data according to the relation
$$Y = ab^X$$
where a and b are constants. Using the estimated
regression model, predict the enrolment 5 years from
now.
Solution
We offer a partial solution. The relation is
$$Y = ab^X$$
Taking logs to base 10 of both sides gives
log Y = log a + (log b)X
That is,
$$Y^* = A + BX$$
where
$Y^* = \log Y$, $A = \log a$ and $B = \log b$.
We observe that X is not affected by the logarithm. The
estimated relation is
log Y = 2.377 + 0.0763X
Now log a = 2.377, a = 238.2 and log b = 0.0763, b =
1.192
Hence the estimated relation in exponential form is
$\hat{Y} = 238.2(1.192)^X$
Based on this model, we could expect the enrolment 5
years from now (X = 12) to be
$\hat{Y} = (238.2)(1.192)^{12}$
= 1960
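The log transformation used in Example 13.5 can be sketched in Python as follows. The variable names are illustrative. Because the sketch carries full precision throughout, its values of a and of the predicted enrolment differ slightly from the rounded figures quoted in the text.

```python
import math

# Year X and enrolment Y from Table 13.3
years = [1, 2, 3, 4, 5, 6, 7]
enrolment = [302, 340, 390, 460, 550, 673, 880]

# Work with log10(Y), which is linear in X when Y = a * b**X
log_y = [math.log10(y) for y in enrolment]

n = len(years)
x_bar = sum(years) / n
ly_bar = sum(log_y) / n
s_xy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(years, log_y))
s_xx = sum((x - x_bar) ** 2 for x in years)

B = s_xy / s_xx          # estimate of log b
A = ly_bar - B * x_bar   # estimate of log a
a, b = 10 ** A, 10 ** B  # back-transform to the exponential form

# Enrolment predicted 5 years beyond the sample (X = 12)
prediction = a * b ** 12
print(round(A, 3), round(B, 4), round(b, 3), round(prediction))
```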
Problem Set 13
1. Consider the simple regression model
$$Y = \beta_0 + \beta_1 X + u$$
Why do we incorporate the error term u in the above
model?
2. Mention four instances in real life where
regression is applied.
3. A study was conducted to find a linear regression
model that relates advertisement expenses X to volume
of sales Y for an electronics company, Solano. The data,
in millions of $, are as given below for 10 years.
Y 2 3.5 2.1 3.2 1.6 1.7 2.5 2.4 2.9 3.8
X 35 62 45 63 32 45 44 54 37 76
Assume a regression model of the form
$$Y = \beta_0 + \beta_1 X + u$$
a) Estimate the parameters $\beta_0$ and $\beta_1$
b) Estimate the residual variance $s^2$
c) Are the parameters $\beta_0$ and $\beta_1$ significant?
d) What is the value of $r^2$?
e) If one million dollars is put into advertisement, by
how much is sales expected to rise?
4. Data on Y (personal consumption expenditure)
and X (income) for a household for 10 years are given
below (figures are in thousands of dollars)
Year 1 2 3 4 5 6 7 8 9 10
Y 34.5 34.8 35.0 36.2 37.5 38.7 39.7 40.5 41.6 42.2
X 47.8 48.4 47.6 49.1 51.5 52.8 54.1 55.4 57.2 58.8
a) Estimate the parameters of the consumption
model
b) If income is increased by 1000, by how much will
consumption rise?
c) Test the significance of $\beta_0$ and $\beta_1$
d) What can you say about the overall fit of the
model?
5. Using the data for problem 4 and taking the probability
of a Type 1 error as 0.05
a) Predict the mean consumption when income is
$50,000.
b) Predict consumption when income is $50,000.
CHAPTER 14
FURTHER REGRESSION ANALYSIS
14.0 INTRODUCTION
In business and economics, it is common to meet
situations where one variable responds to changes
in two or more other variables. For instance, volume of
sales Y may be influenced by the price of the product,
advertisement and the location of the manufacturing industry.
If this is the case, a multiple regression model is the
appropriate model to use to forecast sales.
14.1 SPECIFICATION OF THE LINEAR REGRESSION MODEL.
The mathematical form of the general linear
regression model is
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u \qquad (14.1)$$
where
1. $\beta_0, \beta_1, \ldots, \beta_k$ are parameters to be estimated.
We shall denote the estimates of these
parameters respectively by $b_0, b_1, \ldots, b_k$.
2. $$E(y \mid X_1, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \qquad (14.2)$$
is the mean value of y given the independent
variables $X_1, X_2, \ldots, X_k$. The variable y is the
response variable. It is also called the dependent
variable.
3. u is the disturbance term, otherwise called the
stochastic error term. The estimate of u shall be
denoted by e. While e is also an error term, it is
more appropriate to distinguish it from u by calling
it the residual term or residual error.
For short, u is simply the disturbance while e, the
estimate of u, is the residual. The term u accounts
for influences on y that do not come from the variables
$X_1, X_2, \ldots, X_k$. The influence of the X’s on y is
said to be deterministic while the influence of u on
y is random or stochastic.
14.2 THE ESTIMATED REGRESSION MODEL.
The model in Equation (14.1) is a specified model. It is
also a hypothesized model. It is the model envisioned
by the researcher or model builder, who assumes that
the phenomenon of interest can best be studied using the
model. In essence, the model is a hypothesis to be
accepted or not. The model in Equation (14.1) contains
parameters $\beta_0, \beta_1, \ldots, \beta_k$ and after estimating it, we
have estimates $b_0, b_1, \ldots, b_k$ so that the fitted y,
denoted by $\hat{y}$, is
$$\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k \qquad (14.3)$$
Observe that there is no hat on any of the X’s; we
assume that the X’s are fixed. It can be confirmed
empirically that we can recover the original responses y
by using the relation
$$y = \hat{y} + e \qquad (14.4)$$
The recovery of y using Equation (14.4) has a lot of
implications and applications in Monte Carlo simulation.
In essence, we can write the estimated model in the form
of Equation (14.3) or Equation (14.4). However, the
estimation of the model (14.1) which results in Equation
(14.3) or (14.4) is based on some underlying classical
assumptions, which we call ordinary least squares (OLS)
assumptions and which are given in the next section.
14.3 ASSUMPTIONS FOR THE LINEAR REGRESSION MODEL
1. Zero Mean of u: The mean of u is zero at any given
time.
2. Constant Variance of the Disturbance: If u is a
vector of n elements, any member
of u, say $u_i$, has the same variance as any other
member $u_j$ (i ≠ j, i = 1, 2, ..., n and j = 1, 2, ..., n).
This constant variance can be denoted by $\sigma^2$.
3. Normality of the Error Term: If u = $(u_1, u_2, \ldots, u_n)$, each $u_i$
has a normal distribution with mean zero and
variance $\sigma^2$ (i = 1, 2, ..., n).
4. Independence of the Error Term: The u’s are
independent. The u in period i is independent of u
in period j (i ≠ j).
5. The X’s are considered to be measured without error
and are independent of u.
6. The X’s are not correlated with one another.
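14.4 THE LEAST SQUARE CRITERION (PRINCIPLE)
We will state this criterion using a two-beta model
(a model with two $\beta$’s; the principle extends in an obvious way
to cases involving more than two $\beta$’s).
Given the model
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, 2, \ldots, n \qquad (14.5)$$
Dropping the i and n for simplicity, and replacing
$\beta_0$ and $\beta_1$ respectively by their estimates $b_0$ and $b_1$, then
$$S = \sum (Y - b_0 - b_1 X)^2 \qquad (14.6)$$
The least squares criterion states that the line that
minimizes the sum of squares of the deviations of the
observed values of Y from the predicted values Ŷ is
the line of best fit. The quantity S in (14.6) is the sum of
squared deviations, which is equal to the error or residual
sum of squares. Hence
$$S = \sum (Y - \hat{Y})^2 = \sum e^2 \qquad (14.7)$$
Using elementary calculus, we differentiate S partially
with respect to $b_0$ and $b_1$ and set the derivatives to zero to get
$$\sum Y = n b_0 + b_1 \sum X, \qquad \sum XY = b_0 \sum X + b_1 \sum X^2 \qquad (14.8)$$
The equations in Equation (14.8) are termed normal
equations for the line of best fit. We can also refer to
them as normal equations for the simple regression
model of Equation (14.5). We can write Equation (14.8) in
matrix form as
$$\begin{pmatrix} \sum Y \\ \sum XY \end{pmatrix} = \begin{pmatrix} n & \sum X \\ \sum X & \sum X^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \qquad (14.9)$$
Solving Equation (14.9) simultaneously,
$$b_1 = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{Y} - b_1 \bar{X} \qquad (14.10)$$
where
$$S_{xy} = \sum XY - \frac{\sum X \sum Y}{n}, \quad S_{xx} = \sum X^2 - \frac{(\sum X)^2}{n}, \quad S_{yy} = \sum Y^2 - \frac{(\sum Y)^2}{n} \qquad (14.11)$$
The normal equations (14.8) can be derived in another
way without using calculus.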
First replace $\beta_0$, $\beta_1$ and $u_i$ in Equation (14.5) by their
respective estimators b0, b1 and ei so that
we have
Yi = b0 + b1 Xi + ei
and if we drop the subscript i, we obtain
$$Y = b_0 + b_1 X + e \qquad (14.12)$$
Applying the summation operator ∑ through the last
equation, we have
$$\sum Y = n b_0 + b_1 \sum X + \sum e$$
From the assumptions of least squares, the mean of u is
zero. This implies that the mean of e is also zero. If the
mean of e is zero, the sum of e is zero. This reduces the
last equation to
$$\sum Y = n b_0 + b_1 \sum X$$
The second equation of the normal system can be
obtained by first multiplying through Equation (14.12) by
X and summing through it,
i.e.
$$\sum XY = b_0 \sum X + b_1 \sum X^2 + \sum Xe$$
Since X is independent of the error e,
∑Xe = 0
and we have
$$\sum XY = b_0 \sum X + b_1 \sum X^2$$
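Suppose we have two X’s, then
Y = b0 + b1X1 + b2X2 + e (14.13)
and the normal equations are
$$\begin{aligned} \sum Y &= n b_0 + b_1 \sum X_1 + b_2 \sum X_2 \\ \sum X_1 Y &= b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 \\ \sum X_2 Y &= b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 \end{aligned} \qquad (14.14)$$
The first equation in (14.14) is obtained by summing
through (14.13); the second is obtained by first
multiplying (14.13) by X1 and summing through it. Lastly,
the third equation is obtained by multiplying (14.13) by X2
and summing through it. We have to note that
$\sum e = \sum X_1 e = \sum X_2 e = 0$
since the X’s are measured independently of e.
In matrix form we have
$$\begin{pmatrix} \sum Y \\ \sum X_1 Y \\ \sum X_2 Y \end{pmatrix} = \begin{pmatrix} n & \sum X_1 & \sum X_2 \\ \sum X_1 & \sum X_1^2 & \sum X_1 X_2 \\ \sum X_2 & \sum X_1 X_2 & \sum X_2^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} \qquad (14.15)$$
For a model with three independent variables,
Y = b0 + b1X1 + b2X2 + b3X3 + e (14.16)
The normal equations are
$$\begin{aligned} \sum Y &= n b_0 + b_1 \sum X_1 + b_2 \sum X_2 + b_3 \sum X_3 \\ \sum X_1 Y &= b_0 \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 + b_3 \sum X_1 X_3 \\ \sum X_2 Y &= b_0 \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 + b_3 \sum X_2 X_3 \\ \sum X_3 Y &= b_0 \sum X_3 + b_1 \sum X_1 X_3 + b_2 \sum X_2 X_3 + b_3 \sum X_3^2 \end{aligned} \qquad (14.17)$$
In matrix form (14.17) becomes
$$\begin{pmatrix} \sum Y \\ \sum X_1 Y \\ \sum X_2 Y \\ \sum X_3 Y \end{pmatrix} = \begin{pmatrix} n & \sum X_1 & \sum X_2 & \sum X_3 \\ \sum X_1 & \sum X_1^2 & \sum X_1 X_2 & \sum X_1 X_3 \\ \sum X_2 & \sum X_1 X_2 & \sum X_2^2 & \sum X_2 X_3 \\ \sum X_3 & \sum X_1 X_3 & \sum X_2 X_3 & \sum X_3^2 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} \qquad (14.18)$$
Observe that as the number of independent
variables increases, the dimension of the matrix involving
the X’s and their sums of squares and cross products on the
right hand side (RHS) of (14.18) increases.
Hence, we set $x_j = X_j - \bar{X}_j$ (j = 1, 2, 3) and $y = Y - \bar{Y}$
to obtain
$$\begin{pmatrix} \sum x_1 y \\ \sum x_2 y \\ \sum x_3 y \end{pmatrix} = \begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 & \sum x_1 x_3 \\ \sum x_1 x_2 & \sum x_2^2 & \sum x_2 x_3 \\ \sum x_1 x_3 & \sum x_2 x_3 & \sum x_3^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \qquad (14.19)$$
Notice that the matrix involving the X’s on the RHS of
Equation (14.19) is smaller in dimension than the similar
matrix in Equation (14.18). Therefore, in practice it is
easier to handle Equation (14.19) than Equation (14.18),
and b0 is usually obtained from
$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 - b_3 \bar{X}_3 \qquad (14.20)$$
Similarly Equation (14.15) reduces to
$$\begin{pmatrix} \sum x_1 y \\ \sum x_2 y \end{pmatrix} = \begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \qquad (14.21)$$
and
$$b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 \qquad (14.22)$$
For the estimated model (14.13) with two independent
variables X1 and X2, we use Equations (14.21) and (14.22)
to get the b’s.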
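Example 14.1: Using the data in Table 14.1, estimate the
parameters of the model
$$Y = \beta_0 + \beta_1 X + u$$
Table 14.1: Hypothetical Data for X and Y
      X    Y    XY    X²    Y²    Ŷ      e = Y - Ŷ
      3    6    18    9     36    6.50   -0.50
      4    9    36    16    81    8.25   +0.75
      2    5    10    4     25    4.75   +0.25
      6    11   66    36    121   11.75  -0.75
      10   19   190   100   361   18.75  +0.25
Sum   25   50   320   165   624   50     0.00
Solution: Using Equation (14.11) we have
Sxy = 320 - (25)(50)/5 = 70
Sxx = 165 - (25)²/5 = 40
Syy = 624 - (50)²/5 = 124
so that b1 = Sxy/Sxx = 70/40 = 1.75 and
b0 = 10 - (1.75)(5) = 1.25
The estimated model is
Ŷ = 1.25 + 1.75X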
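The estimates for the data of Table 14.1 can also be obtained by solving the two normal equations directly. The following Python sketch does this with Cramer's rule; the function name solve_normal_equations is an illustrative choice and not part of the text.

```python
def solve_normal_equations(xs, ys):
    """Solve  n*b0 + (sum X)*b1 = sum Y  and  (sum X)*b0 + (sum X^2)*b1 = sum XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x ** 2          # determinant of the 2 x 2 matrix
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

# X and Y columns of Table 14.1
xs = [3, 4, 2, 6, 10]
ys = [6, 9, 5, 11, 19]

b0, b1 = solve_normal_equations(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]
print(b0, b1)
print(fitted, residuals)
```

The fitted values and residuals reproduce the last two columns of Table 14.1, and the residuals sum to zero as the first normal equation requires.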
14.5 DECOMPOSITION OF SUM OF SQUARES
The sums of squares and cross products given by Equation
(14.11) are useful, not only to find the estimates of $\beta_0$
and $\beta_1$, but also to partition the total variation in the model into
appropriate components. The total sum of squares Syy
gives the total variation. If r is the correlation coefficient
between X and Y, then
$$r^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}} \qquad (14.23)$$
The quantity $r^2$, called the coefficient of determination, gives
the proportion of variation in Y explained by X.
Now if the total variation TSS = Syy, then
the variation explained by X, or the variation due to the
regression of Y on X, denoted by RSS, is
$$RSS = r^2 S_{yy}$$
Therefore the variation due to error, denoted by ESS, is
ESS = TSS – RSS
Note that in this chapter RSS stands for the regression sum of
squares and ESS for the error sum of squares, whereas in
Chapter 13 the two labels were used the other way round.
Example 14.2. Using the data of Example 14.1,
find
(a) $r^2$
(b) regression sum of squares (RSS)
(c) residual sum of squares (ESS)
(d) residual variance $s^2$
Solution (a) $r^2 = S_{xy}^2/(S_{xx} S_{yy}) = (70)^2/[(40)(124)] = 0.9879$
(b) Regression Sum of Squares, RSS = $r^2 S_{yy}$ = (0.9879)(124) = 122.5
(c) Residual Sum of Squares, ESS = TSS – RSS
= Syy – RSS
= 124 – 122.5 = 1.5
(d) Residual variance $s^2$ = ESS/(n - 2)
= 1.5 / (5 - 2) = 0.5
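The decomposition in Example 14.2 can be verified numerically. The sketch below takes the estimates b0 = 1.25 and b1 = 1.75 from Example 14.1 as given and uses this chapter's labels, with rss for the regression sum of squares and ess for the error sum of squares; the variable names themselves are illustrative.

```python
# X and Y columns of Table 14.1 and the estimates from Example 14.1
xs = [3, 4, 2, 6, 10]
ys = [6, 9, 5, 11, 19]
b0, b1 = 1.25, 1.75

n = len(ys)
y_bar = sum(ys) / n

tss = sum((y - y_bar) ** 2 for y in ys)                      # total variation, Syy
ess = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # error (residual) sum of squares
rss = tss - ess                                              # regression sum of squares
r_squared = rss / tss                                        # coefficient of determination
s_squared = ess / (n - 2)                                    # residual variance

print(tss, rss, ess, round(r_squared, 4), s_squared)
```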
14.6 VARIANCE OF b0 AND b1
Let the variances of b0 and b1 be respectively $s_0^2$ and $s_1^2$.
Then, it can be shown that
$$s_0^2 = s^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{S_{xx}} \right]$$
and
$$s_1^2 = \frac{s^2}{S_{xx}}$$
14.7 INFERENCES IN THE LEAST SQUARES MODEL.
If the level of significance is $\alpha$, then 100(1 - $\alpha$)%
confidence intervals for $\beta_0$ and $\beta_1$ are respectively
$$b_0 \pm t_{\alpha/2}\, s_0 \quad \text{and} \quad b_1 \pm t_{\alpha/2}\, s_1$$
where $t_{\alpha/2}$ is the value of the t-distribution with tail
probability $\alpha/2$.
Next is to test for the significance of b0 and b1.
If bi (i = 0, 1) is significant then there is justification for
retaining the corresponding parameter $\beta_i$ (i = 0, 1) in the
model; otherwise it should be expunged and the model
re-specified.
(a) Testing the significance of b0: We test
1. H0: $\beta_0$ = 0 versus H1: $\beta_0$ ≠ 0
2. Test statistic: $t = b_0/s_0$
3. Decision rule: Reject H0 if $|t| > t_{\alpha/2}$,
where t and $t_{\alpha/2}$ each has n - 2 df
(b) Testing the significance of b1: We test
1. H0: $\beta_1$ = 0 versus H1: $\beta_1$ ≠ 0
2. Test statistic: $t = b_1/s_1$
3. Decision rule: Reject H0 if $|t| > t_{\alpha/2}$
Example 14.3 (a) Find 95% confidence intervals for $\beta_0$
and $\beta_1$ using information from Examples 14.1 and 14.2.
(b) Test for the significance of b0 and b1 at $\alpha$ = 0.05
Solution (a) The values of $s_0^2$ and $s_1^2$ are
$s_0^2$ = 0.5 [1/5 + 25/40] = 0.4125
$s_1^2$ = 0.5 / 40 = 0.0125
Therefore s0 = 0.6423
s1 = 0.1118
df = n - 2 = 5 - 2 = 3 and t0.025 = 3.182
Thus a 95 percent confidence interval for $\beta_0$ is
1.25 ± 3.182 (0.6423) = 1.25 ± 2.044
= [-0.794, 3.294]
and a 95 percent confidence interval for $\beta_1$ is
1.75 ± 3.182 (0.1118)
= 1.75 ± 0.356 = [1.39, 2.11]
(b) For $\beta_0$, t = b0/s0 = 1.25 / 0.6423 = 1.946 < 3.182 and
for $\beta_1$, t = b1/s1 = 1.75 / 0.1118 = 15.653 > 3.182. These
results show that only b1 is significant.
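The interval estimates and t statistics of Example 14.3 can be checked with the Python sketch below. It assumes the usual least squares formulas for the standard errors, namely the variance of b0 as s² multiplied by (1/n plus the square of the mean of X divided by Sxx) and the variance of b1 as s²/Sxx. The critical value 3.182 is the tabulated t value quoted in the example, and the variable names are illustrative.

```python
import math

# Quantities from Examples 14.1 and 14.2
n, x_bar, s_xx, s_squared = 5, 5.0, 40.0, 0.5
b0, b1 = 1.25, 1.75
t_crit = 3.182  # tabulated t value for tail probability 0.025 and 3 df

# Standard errors of the intercept and slope
se_b0 = math.sqrt(s_squared * (1 / n + x_bar ** 2 / s_xx))
se_b1 = math.sqrt(s_squared / s_xx)

# 95 percent confidence intervals
ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

# t statistics for testing that each parameter is zero
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

print(round(se_b0, 4), round(se_b1, 4))
print([round(v, 3) for v in ci_b0], [round(v, 3) for v in ci_b1])
print(abs(t_b0) > t_crit, abs(t_b1) > t_crit)
```

The interval for the intercept contains zero while the interval for the slope does not, which agrees with the outcome of the two t tests.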
14.8 ANALYSIS OF VARIANCE IN REGRESSION
For the regression problem of Example 14.1 the
ANOVA table is as in Table 14.2.
Table 14.2: ANOVA for Two-Variable Regression
Source of Variation   Sum of squares (SS)   df     Mean square (ms)   F
X                     RSS                   1      MRSS               F = MRSS/MESS
Residual              ESS                   n-2    MESS
Total                 TSS                   n-1
Table 14.3 is constructed from Example 14.2 and Table
14.2.
Table 14.3: ANOVA Using Data from Example 14.2
Source of Variation   SS      df   ms      F
X                     122.5   1    122.5   245
Residual              1.5     3    0.5
Total                 124     4
Example 14.4. Use the computed F in Table 14.3 to test
for the overall goodness of fit of the simple regression.
Take $\alpha$ = 0.05.
Solution: F0.05(1, n-2) = F0.05(1, 3) = 10.1
We are to test
H0: Regression is not significant
versus
H1: Regression is significant.
Since F = 245 > F0.05(1, 3) = 10.1, we reject H0 and
state that X significantly explains Y.
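The F ratio in Table 14.3 follows from the sums of squares and their degrees of freedom, as this short Python sketch shows. The critical value 10.1 is the tabulated figure quoted in Example 14.4, and the variable names are illustrative.

```python
# Sums of squares from Example 14.2 (n = 5 observations)
rss, ess, n = 122.5, 1.5, 5

mrss = rss / 1        # mean square due to regression, 1 df
mess = ess / (n - 2)  # mean square due to error, n - 2 df
f_ratio = mrss / mess

f_crit = 10.1         # tabulated F value for (1, 3) df at the 0.05 level
reject_h0 = f_ratio > f_crit
print(f_ratio, reject_h0)
```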
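Example 14.5. Construct a 100(1-α) percent confidence
interval for the disturbance variance σ². Use your result
to find a 95 percent confidence interval for σ² in the
simple regression of Example 14.1 and Example 14.2.
Solution. First we use the fact that
$\frac{(n-2)s^2}{\sigma^2} = \frac{\sum e^2}{\sigma^2}$
has a chi-square distribution with n-2 degrees of freedom.
If we are looking for a 100(1-α) percent confidence
interval, then
$P\left[\chi^2_{1-\alpha/2} \le \frac{(n-2)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right] = 1 - \alpha$   (14.31)
The quantities $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ are respectively the 100(α/2) and
100(1-α/2) percentiles of the chi-square distribution. All
percentiles are found using (n-2) degrees of freedom (df).
Now the left-hand side of (14.31) is the same as
$P\left[\frac{(n-2)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-2)s^2}{\chi^2_{1-\alpha/2}}\right]$
From here we see that a 100(1-α) percent confidence
interval for σ² is
$\left[\frac{(n-2)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-2)s^2}{\chi^2_{1-\alpha/2}}\right]$
If we seek a 95 percent confidence interval for σ² then α
= 0.05, α/2 = 0.025 and for n-2 = 5-2 = 3 df, $\chi^2_{0.975}$ = 0.216
and $\chi^2_{0.025}$ = 9.35.
From Example 14.2, (n-2)s² = 3(s²) = 3(0.5) = 1.5
Therefore a 95 percent confidence interval for σ² is
[1.5/9.35, 1.5/0.216] = [0.160, 6.944]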
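As a quick numerical check of this example, the interval limits are (n-2)s² divided by the upper and lower chi-square percentiles. The sketch below assumes the table values 0.216 and 9.35 quoted in the text for 3 df; the function name `variance_confidence_interval` is hypothetical.

```python
def variance_confidence_interval(df, s2, chi2_lower, chi2_upper):
    """Confidence interval for the disturbance variance.

    chi2_lower and chi2_upper are the lower and upper chi-square
    percentiles for the chosen confidence level, read from a table.
    """
    sum_sq = df * s2                      # (n - 2) * s^2
    return sum_sq / chi2_upper, sum_sq / chi2_lower


# Values quoted in the text: n - 2 = 3 df, s^2 = 0.5
low, high = variance_confidence_interval(df=3, s2=0.5,
                                         chi2_lower=0.216, chi2_upper=9.35)
print(f"95% CI for sigma^2: [{low:.3f}, {high:.3f}]")
```

The interval is wide because only 3 degrees of freedom are available for estimating σ².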
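14.9 PREDICTION WITH REGRESSION MODELS
We want to do two things here. They are
(a) Predicting a new value of Y given X = X0, and
(b) Predicting the mean value of Y at X = X0 given a
specified value of α
Predicting a New Value of Y: A 100(1-α) percent
confidence interval for a new value of Y at X = X0 is
$\hat{Y}_0 \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}}$   (14.33)
Predicting the Mean Value of Y: A 100(1-α) percent
confidence interval for the mean value of Y at X = X0 is
$\hat{Y}_0 \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}}$   (14.34)
where $\hat{Y}_0 = b_0 + b_1 X_0$ and $t_{\alpha/2}$ has n-2 df.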
Example 14.6 From Example 14.1, we have
Ŷ = 1.25 + 1.75X
n = 5, X̄ = 5, s² = ∑e²/(n-2) = 0.5
Sxx = 40.
Find a 95 percent confidence interval for
(a) the value of Y at X = 10
(b) the mean value of Y at X = 10
Solution. For df = n-2 = 3, t0.025 = 3.182
Applying Equation (14.33), a 95 percent confidence
interval for Y at X = 10 is
$1.25 + 1.75(10) \pm 3.182\sqrt{0.5\left[1 + \frac{1}{5} + \frac{(10-5)^2}{40}\right]}$
= 18.75 ± 3.040 = [15.71, 21.79]
A 95 percent confidence interval for the mean of Y is
$18.75 \pm 3.182\sqrt{0.5\left[\frac{1}{5} + \frac{(10-5)^2}{40}\right]}$
= 18.75 ± 2.04
= [16.71, 20.79]
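The two half-widths in this example differ only by the extra "1 +" under the square root for a new observation. The sketch below recomputes both under the assumption that the unnamed quantity equal to 5 in the example is the mean of X; the function name `prediction_half_widths` is hypothetical and the t value is taken from the table.

```python
import math


def prediction_half_widths(b0, b1, x0, x_bar, n, s2, sxx, t_crit):
    """Return (y_hat, half_width_new_value, half_width_mean) at X = x0."""
    y_hat = b0 + b1 * x0
    # Part of the variance that comes from estimating the regression line
    line_term = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    half_new = t_crit * math.sqrt(s2 * (1.0 + line_term))   # new value of Y
    half_mean = t_crit * math.sqrt(s2 * line_term)          # mean value of Y
    return y_hat, half_new, half_mean


# Summary values from the example (x_bar = 5 is an assumption, see above)
y_hat, half_new, half_mean = prediction_half_widths(
    b0=1.25, b1=1.75, x0=10, x_bar=5, n=5, s2=0.5, sxx=40, t_crit=3.182)

print(f"new value:  {y_hat:.2f} +/- {half_new:.3f}")
print(f"mean value: {y_hat:.2f} +/- {half_mean:.3f}")
```

The interval for a new value is always wider than the interval for the mean, because it also carries the variance of the individual disturbance.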
Variances for the b’s in the model (14.13)
Y= b0 + b1X1 + b2X2 + e
can be obtained using the following formula
(14.35)
(14.36)
(14.37)
where s2 is the variance of error term e.
Problem Set 14
1. In a certain graduate school, the grade point averages (GPA) of
MBA students in their first semester examination are believed to
relate to the mathematical and communication ability shown during
the admission exercise.
Data on ten students are as given below
Student             1     2     3     4     5     6     7     8     9     10
GPA (Y)             4.22  2.69  4.59  4.93  4.00  3.83  2.66  3.32  3.33  3.42
Maths (x1)          62    48    59    68    61    58    52    51    56    53
Communication (x2)  68    59    62    70    70    63    62    73    67    61
Using GPA (Y) as the dependent variable
a) Estimate the parameters of the model
b) Calculate the variances of the estimates of β0, β1 and β2
c) Test the significance of the parameters
2. The Cobb-Douglas production function with
multiplicative error term may take the form
$Q = \beta_0 K^{\beta_1} L^{\beta_2} e^{\varepsilon}$
where Q = Output
K = Capital input
L = Labour input
ε = Stochastic disturbance term
e = 2.718
Use the data below, setting
ln Q = Y, ln K = X1, ln L = X2 and treating ln β0 as the intercept.
Year   Real Gross Product Q   Real Capital Input K   Labour Days L (Billion Days)
1      18.7                   18.9                   0.282
2      19.5                   19.2                   0.283
3      22.2                   19.3                   0.280
4      23.0                   20.3                   0.277
5      22.5                   20.8                   0.278
6      22.9                   21.9                   0.286
7      26.9                   23.2                   0.292
8      28.6                   24.6                   0.312
9      29.5                   25.8                   0.318
10     30.7                   27.9                   0.314
where Q, K and L are recordings from a certain country
for ten years in the mining sector. Data are in billion
dollars.
a) Estimate the parameters of the model.
b) The quantity β1 + β2 is referred to as returns to
scale, that is, the response of output to a proportionate change
in the inputs. If β1 + β2 > 1, we have increasing returns to
scale; if β1 + β2 = 1, we have constant returns to scale;
and β1 + β2 < 1 yields decreasing returns to scale. Which
returns to scale is revealed from your estimation in (a)?
c) Assuming that the estimates of the parameters β1 and β2
do not correlate, test the hypothesis H0: β1 + β2 = 1
versus H1: β1 + β2 ≠ 1 using a significance level of 0.05.
3. The table below gives measurements on lean body
mass Y, height (cm) X1, body mass index (BMI) X2 and age
in years X3 for 20 adult persons.
S/No   Height X1   BMI X2   Age (Yrs) X3   Lean Body Mass Y
1 175.6 17.0 28 41.1
2 165.2 16.0 26 33.0
3 169.9 18.0 27 39.0
4 169.7 18.4 27 43.2
5 170.2 16.2 28 38.0
6 170.6 17.0 27 40.4
7 175.2 19.0 25 43.4
8 159.0 21.2 29 40.3
9 164.2 18.0 27 38.0
10 160.2 17.0 29 35.0
11 166.1 19.1 28 45.0
12 161.2 17.2 27 38.0
13 167. 16.2 31 39.0
14 168.4 16.0 31 39.1
15 167.0 17.0 25 43.5
16 153 17.1 27 34.9
17 151 19.2 25 38.0
18 160 17.1 24 36.9
19 152 17 23 33.9
20 169 15 26 38.1
a) Construct a correlation matrix of Y, X1, X2 and X3.
b) Test for significance of the correlation coefficients in
(a).
c) Assuming a linear model of the form
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + e$
find the estimates of β0, β1, β2 and β3.
d) Use Excel or SPSS to re-estimate the model in (c).
From the output, are the estimates of β1, β2 and β3
significant? (You can use the tail probabilities of the
associated parameter estimates to help you.)
e) What can you say about the overall goodness of fit of
the model?
4. Below is a data set on the length L, breadth K and
area A of leaves of a certain tree. Measurements are in
cm and cm². A botanist assumes that the area A follows
the specification
$A = \beta_0 K^{\beta_1} L^{\beta_2} e^{u}$
where β0, β1 and β2 are parameters to be estimated, u is
the stochastic disturbance and e = 2.71…, and fits the model by taking
logarithms of A, K and L to base e.
S/No Length (cm) Breadth(cm) Area A (cm2)

1 11.8 13.9 112.0
2 12.0 13.6 110.2
3 9.6 11.8 74.9
4 10.5 12.3 87.3
5 10.1 12.1 82.6
6 11.4 12.9 97.1
7 11.6 12.7 98.9
8 10.1 12.1 83.0
9 11.2 13.1 100.0
10 12.4 13.4 111.0
11 9.2 11.9 71.1
12 11.9 14.2 112.2
13 11.3 12.7 97.0
14 10.0 12 83.0
15 12.1 13.5 110.0
a) Find estimates of the parameters β0, β1 and β2.
b) Find the estimates of β0 - 1, β1 + β2 - 2 and β1 / β2.
c) Find the Euclidean norm of the vector
The estimate of this quantity is called the leaf rectangularity index
and measures the deviation of an oval leaf surface from
being a perfect rectangle. As the index → 0, the leaf surface
tends to be a perfect rectangle. What is the point
estimate of the index? [In practice, we use bootstrap
regression to estimate it.]
5. a) What is the basic difference between correlation
analysis and regression analysis?
b) Mention some applications of multiple regression in
real life.
6. The selling price of a house, Y (in millions of naira), is
believed by estate agents to relate to the amount of living
area X1 (in hundreds of square feet), the number of floors
X2, the number of bedrooms X3 and an index for bathing
facilities X4 (Bathing Facility Index, BFI). A random sample of
size 12 from a new GRA in a middle belt city of Nigeria
reveals the following data.
S/No   List Price Y   Living Area X1   Floors X2   Bedrooms X3   BFI X4
1 70.1 7 1 2 1
2 200.2 11 1 2 1.8
3 118.5 11 1 3 2.3
4 127.0 12 1 3 2
5 131.3 14 2 4 1.2
6 137.2 15 1 3 2
7 142.0 15 2 4 2
8 149.8 18 1 4 2
9 162.1 21 1 4 3
10 173.0 19 1 4 3.3
11 212.0 22 2 4 3
12 196.1 21 2 4 3.1
(a) Construct, by means of either SPSS, Minitab,
EViews or Excel, the correlation matrix of Y, X1, X2, X3
and X4.
(b) Estimate, by means of either SPSS, Minitab, EViews
or Excel, the regression coefficients of the model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + e$
(c) Comment on the significance of the coefficients in (b).
(d) Is the overall fit significant?
CHAPTER 15
ANALYSIS OF CATEGORICAL DATA
15.0 INTRODUCTION
Results of many experiments have measurements that
are qualitative or categorical rather than quantitative.
That is, measurements of the outcomes are quality ratings
rather than numerical values. The data set is then
summarized by creating a list of categories and showing
a count of the number of measurements falling into each
category. Some examples are:
1) Lecturers falling into six class rankings: Professor,
Associate Professor, Senior Lecturer, Lecturer I,
Lecturer II and Assistant Lecturer.
2) A rat responding in one of four ways to a stimulus.
3) Crop yield can be rated as being of grade 1, grade
2 or grade 3.
4) Manufactured items can be classified as excellent,
acceptable, average or defective.
15.1 CHI-SQUARE TESTS OF CATEGORICAL DATA
We may reduce the analysis of categorical data
into three types, all of which use the chi-square
distribution in testing the associated hypotheses.
They are
(a) Chi-square test of goodness of fit
(b) Chi-square test of homogeneity
(c) Chi-square test of independence.
We elaborate more on these tests.
Goodness of fit Test: This test investigates whether
data falling into several categories do so in accordance with a
theoretical or hypothesized set of probabilities. This will
be illustrated in both one-way and two-way multinomial
experiments below. Goodness of fit evaluation also
involves testing different types of departure from an
assumed distribution. For instance:
a) Testing for normality when skewness is suspected
b) Testing for a Poisson model for a data set
c) Testing for a Binomial model for a data set.
Homogeneity Test:
In this case, we test if proportions of an attribute of
interest from various categories are the same.
In a multinomial experiment with n trials and k outcomes,
are the P's the same? That is, is it true that
P1 = P2 = ….. = Pk ?
This makes it a special case of the goodness of fit test. In
the case of a goodness of fit test involving the multinomial
distribution, we test
H0: P1 = P10, P2 = P20, ….., Pk = Pk0
whereas in the homogeneity test, we investigate if
H0: P1 = P2 = ….. = Pk = P (say)
If we have k binomial samples such that the ith sample of
size ni has ai items having the attribute A (i = 1, 2, ….., k),
then in sample i the proportion ai/ni has attribute A.
The homogeneity test examines whether the true Pi vary from
sample to sample. That is, are the Pi's the same?
Another situation where we evaluate homogeneity is
testing if Poisson samples have the same mean. Since
for the Poisson model the mean equals the variance, this test
is sometimes called the variance test of the homogeneity of
the Poisson distribution.
Test of independence. This test involves a two-way
classification where two categorical variables are
recorded at the intersection of category levels. These
intersections are known as cells. The main problem to
solve here is to evaluate if the classification of one variable is
contingent (dependent) on the other variable's
classification. If not, the two methods of classification are
said to be independent. This test is also called the test of
association. If one variable has r levels and the second
has c levels, we have an r x c contingency table with rc cells.
15.2 ASSUMPTIONS OF CHI-SQUARE TESTS
All the tests in Section 15.1 are carried out under some
conditions for them to be valid. First, the data must be
categorical counts, where categories can be stipulated by
attributes or the range of some variables. The second
condition is that expected counts per category should not
be less than five. If the expected frequency is less than
five, some pooling of adjacent categories may be
necessary. The third requirement is that percentages
have to be converted into counts.
15.3 ILLUSTRATIONS OF GOODNESS OF FIT TESTS
Example 15.1 Out of 200 lecturers recruited for
different positions in a certain University, 90 were
professors, 17 were associate professors, 13 were senior
lecturers and 80 were from the Lecturer 1 cadre. The
recruitment policy of management was to offer the new
positions using a quota allocation of 40% for professors
(P), 10% for associate professors (AP), 5% for senior
lecturers (SL) and 45% for Lecturer 1 (L1). Does the
recruitment conform with the recruitment policy of
management? Use α = 0.05.
Solution: We can represent the information in the form
of a table (see Table 15.1).
Table 15.1 Recruitment of Lecturers by Rank
Rank of Lecturers         P    AP   SL   L1
Observed frequency (oi)   90   17   13   80
Expected frequency (ei)   80   20   10   90
We are to test:
H0: P1 = 0.40, P2 = 0.10, P3 = 0.05, P4 = 0.45
versus
H1: At least one of the four probabilities is different from
the given value.
$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$
= (90-80)²/80 + (17-20)²/20 + (13-10)²/10
+ (80-90)²/90
= 3.71
From the statistical table, with df = k-1 = 4-1 = 3, χ²0.05 = 7.815.
Since χ² = 3.71 < 7.815,
we do not have sufficient evidence to reject H0. We state
that management policy was followed in the recruitment
exercise.
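The goodness of fit computation above can be sketched in Python as follows. The function name `chi_square_gof` is hypothetical, and the critical value 7.815 (chi-square, 3 df, α = 0.05) is typed in from a chi-square table rather than computed.

```python
def chi_square_gof(observed, probabilities):
    """Return (expected counts, chi-square statistic) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat


observed = [90, 17, 13, 80]          # P, AP, SL, L1
policy = [0.40, 0.10, 0.05, 0.45]    # hypothesized quota proportions
expected, chi2 = chi_square_gof(observed, policy)

critical = 7.815                     # table value for 3 df at alpha = 0.05
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.2f}, reject H0: {chi2 > critical}")
```

Every expected count here is at least 5, so no pooling of categories is needed before computing the statistic.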
15.4 ILLUSTRATION OF TEST OF INDEPENDENCE
This test is also called the contingency test or test of
association.
Example 15.2
Table 15.2: Distribution of Cars showing Tyre Brands
and Durability
                      Brand A   Brand B   Brand C   Brand D
Low Durability        25        24        16        32
Moderate Durability   120       93        114       122
High Durability       55        83        70        46
Use 0.05 level of significance to test
H0: Tyre durability is independent of brand.
H1: Tyre durability is dependent on brand.
Solution
Table 15.3: Computer Output for Data in Table 15.2
                        Brand A   Brand B   Brand C   Brand D   Total
Low D        Observed   25        24        16        32        97
             Expected   24.25     24.25     24.25     24.25     97.00
Moderate D   Observed   120       93        114       122       449
             Expected   112.25    112.25    112.25    112.25    449.00
High D       Observed   55        83        70        46        254
             Expected   63.50     63.50     63.50     63.50     254.00
Total        Observed   200       200       200       200       800
             Expected   200.00    200.00    200.00    200.00    800.00
Chi-square = 22.63, df = 6, p-value = .0009
The test statistic is
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
where oij = observed frequency and eij = expected frequency for the ijth cell.
The quantity eij is given by
$e_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{n}$
The observed frequency o21 has the value 120.
The corresponding expected frequency is 112.25.
That is, e21 has the value 112.25. We compute it as
$e_{21} = \frac{449 \times 200}{800} = 112.25$
In like manner, all the expected frequencies are
calculated. Then we use the formula to obtain
χ² = 22.63
If we denote respectively the number of rows and columns
by r and c, the degrees of freedom for the problem is
df = (r-1)(c-1) = (3-1)(4-1) = 6.
Since χ² = 22.63 > χ²0.05 = 12.592 for 6 df, we reject the null
hypothesis and uphold the point that tyre durability
depends on its brand.
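The expected frequencies and the chi-square statistic for the tyre data can be reproduced with a short script. This is a sketch only: `chi_square_independence` is a hypothetical helper name, and the critical value 12.592 (6 df, α = 0.05) is typed in from a chi-square table.

```python
def chi_square_independence(table):
    """Return (expected table, chi-square statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = (row total i) * (column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


# Rows: low, moderate, high durability; columns: brands A, B, C, D
tyres = [[25, 24, 16, 32],
         [120, 93, 114, 122],
         [55, 83, 70, 46]]
expected, chi2, df = chi_square_independence(tyres)

critical = 12.592   # table value for 6 df at alpha = 0.05
print(f"e21 = {expected[1][0]}")
print(f"chi-square = {chi2:.2f} on {df} df, reject H0: {chi2 > critical}")
```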
Example 15.3 Table 15.5 shows the frequency
distribution of the number of defects per copy of a
magazine called Blosom for a random sample of 1000
mass-produced copies of the magazine.
Table 15.5 Frequency Distribution of Defects for 1000 copies
of Blosom
No. of defects (x)   0     1     2     3    4   5 or more
Frequency (f)        560   308   107   21   4   0
Conduct a standard goodness of fit analysis to assess
whether these data are consistent with the notion that the
number of defects per copy of the magazine has a
Poisson distribution.
Solution. We first estimate the Poisson parameter λ by the
sample mean:
$\hat{\lambda} = \frac{\sum fx}{\sum f} = \frac{0(560) + 1(308) + 2(107) + 3(21) + 4(4)}{1000} = 0.601$
Hence, the Poisson probability is
$P(X = x) = \frac{e^{-0.601}(0.601)^x}{x!}, \quad x = 0, 1, 2, \ldots$
Next, we are to test
H0: No. of defects per copy of magazine is Poisson
versus
H1: Negation of H0
Estimated frequencies E under H0 are given by
E = 1000 P(X = x)
Table 15.6: Expected Frequencies of the Poisson Data
No. of defects (x)     0       1       2      3      4     5 or more   Total
Observed frequency O   560     308     107    21     4     0           1000
Expected frequency E   548.3   329.5   99.0   19.8   3.0   0.4         1000
(last three classes pooled: E = 19.8 + 3.0 + 0.4 = 23.2)
Since E must be greater than or equal to 5, the last three
classes are combined to give E = 23.2 and O = 25.
χ² = ∑(O - E)²/E
= (560 - 548.3)²/548.3 + … + (25 - 23.2)²/23.2
= 2.44
Note that the degrees of freedom is k - 1 - m, where k is
the number of classes and m is the number of estimated
parameters.
Since we have estimated λ, m = 1 and k - 1 - m = 4 - 1 - 1
= 2 (the number of classes is now 4 instead of 6).
Therefore, χ²0.05 = 5.991 for 2 df.
Since χ² = 2.44 < 5.991, we cannot reject H0 at α = 0.05.
Therefore the data set is
consistent with the Poisson distribution.
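The Poisson fit above can be retraced step by step in Python. The sketch below estimates the Poisson mean from the frequency table, pools the classes "3", "4" and "5 or more" as in the text, and loses one extra degree of freedom for the estimated mean. Variable names are illustrative, and because it keeps unrounded expected frequencies the statistic comes out close to, not exactly at, the 2.44 obtained with rounded values.

```python
import math

defects = [0, 1, 2, 3, 4]
freq = [560, 308, 107, 21, 4]        # the class "5 or more" has frequency 0
n = sum(freq)

# Estimate the Poisson mean by the sample mean
lam = sum(x * f for x, f in zip(defects, freq)) / n


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Classes 0, 1, 2 and a pooled class "3 or more" so that every E >= 5
expected = [n * poisson_pmf(x, lam) for x in range(3)]
expected.append(n - sum(expected))
observed = [560, 308, 107, 21 + 4 + 0]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1           # k - 1 - m with m = 1 estimated parameter

print(f"lambda = {lam:.3f}")
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.2f} on {df} df")
```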
15.5 ILLUSTRATIONS OF HOMOGENEITY TEST
Example 15.4. Using the observed frequencies in Table
15.1, test for equality of proportions in recruitment in all
classes. Use α = 0.05.
Solution. We are to test
H0: P1 = P2 = P3 = P4 = 0.25
H1: At least one Pi is different from 0.25
Under H0, the observed and expected frequencies are in
Table 15.7.
Table 15.7: Recruitment of Lecturers' Data
Rank of lecturer         P    AP   SL   L1
Observed frequency (o)   90   17   13   80
Expected frequency (e)   50   50   50   50
χ² = ∑(o - e)²/e
= (90 - 50)²/50 + (17 - 50)²/50 + (13 - 50)²/50
+ (80 - 50)²/50
= 99.16
Now for 3 df, χ²0.05 = 7.815.
Since
χ² = 99.16 > 7.815,
we reject H0. This shows that at least one of the Pi's is
different from 0.25.
In the next example, suppose we have k binomial
populations, each with sample size ni (i = 1, 2, 3, …., k).
Let ai be the number in sample i with attribute A. The
question is, are all proportions ai/ni with attribute A
equal?
Example 15.5. A sample of 300 respondents was
randomly polled in each of three senatorial districts in a
certain state in the Niger Delta to ascertain the popularity
of candidate A for a gubernatorial post. Some favored
candidate A and some did not favor A. Details are in
Table 15.8.
Table 15.8: Respondents' Opinion in Three Senatorial
Districts
                 District 1   District 2   District 3   Total
Favor A          130(129)     119(128)     111(131)     360
Do not favor A   170          181          189          540
District total   300          300          300          900
(Expected frequencies are in parentheses)
Is there sufficient evidence from the data to
indicate that the proportions of respondents who favour A
differ in the three senatorial districts? Use α = 0.01. [Hint:
Is the proportion favouring A independent of district?]
Solution: The reader can take this as an exercise.
Problem Set 15
1. A survey of 400 respondents resulted in the 2x3
contingency table below
38 36 90
64 56 116
(a) Find the χ² statistic
(b) Test the null hypothesis that there is
independence between rows and columns.
Take α = 0.01
2. Female and male respondents to a questionnaire
on gender issues in politics could be categorized
into three groups as follows:
Male 35 48 76
Female 9 59 30
Find if there is a difference in the responses
according to gender.
Take α = 0.05
3. A survey of voter opinion was conducted to
compare the proportion of voters who favour
candidate A in four wards in a local government
area in a south-south state, Nigeria. A random
sample of size 200 was taken in each of these four
wards. Do the data give evidence to show that the
proportions of voters who favour candidate A differ
in the four wards?
Is the proportion favouring A independent of ward?
Favor A          77(76)   51(77)   64(70)   50(51)
Do not favor A   123      149      136      150
Ward total       200      200      200      200
(Expected frequencies are in parentheses)
Take α = 0.05
4. The reasoning among many people is that level of
wealth is affected by level of education. A random
sample of one hundred persons from each of three
income groups in a GRA in a southwest city in
Nigeria yielded the data below:
                      Rich   Comfortably rich   Stingingly rich
No secondary school   33     21                 22
Secondary school      14     15                 2
Degree                42     50                 58
Postgraduate degree   11     14                 18
(a) Does this data set confirm that the level of wealth
depends on educational attainment? Test at α =
0.01
(b) Using the outcome of the test in (a), discuss the
relationship between educational attainment and
level of wealth.
5. A study was conducted last year in a city in the south-
south to determine if customers' preference for a
fast-food chain is affected by age. A random
sample of 500 fast-food customers aged
15 and older was obtained, giving rise to the
entries below:
Age group      Restaurant 1   Restaurant 2   Restaurant 3   Restaurant 4
15-20          74             35             9              7
21-30          88             43             18             11
31-50          53             53             27             19
50 and above   20             26             6              11
(a) Test for independence between age and choice of
restaurant
(b) If customers' fast-food preference is dependent on
age, what is the practical implication for marketing
experts? Take α = 0.05
CHAPTER 16
NONPARAMETRIC METHODS
16.0 INTRODUCTION
The tests of Chapter 10 (the one-sample t test, the two
independent sample t test and the paired t-test) and the one-
way analysis of variance F test all assume that samples
come from normally distributed populations or, at worst,
that the populations are mound-shaped and are not highly
skewed either to the left or right. Many times this
normality assumption fails to hold and we resort to
techniques that do not require assumptions about the
shapes of the probability distributions of the sampled
populations. These techniques are collectively called
non-parametric methods. In these cases we convert all
measurements into ranks.
When we cannot measure outcomes of an experiment
directly, they are ordered or ranked and nonparametric
methods are used to analyze them. Some examples are:
1. Preference scores for leaf patties on a scale of 1
to 8
2. Rating of smoke level to consist of none, very
light, light, medium, heavy and very heavy
3. Four brands of Mercedes Benz may be ranked
from most appealing to least appealing.
The nonparametric methods considered in this chapter
are
1. Wilcoxon Rank Sum Test
2. Kruskal – Wallis H Test
3. Sign Test for a Paired Experiment
4. Wilcoxon Signed Rank Test
16.1 WILCOXON RANK SUM TEST
Wilcoxon (1945, 1946, 1947) developed tests
based on ranks for testing the equality of two
populations. Since then, the optimal properties of his
tests have been widely investigated in the literature.
Some of these results can be found in Wilcoxon
and Wilcox (1964) and Bradley (1964).
This test is the nonparametric equivalent of the parametric
independent t-test, in which the two sampled
populations are normal. When the normality
assumption fails and the sample sizes are small,
we resort to the Wilcoxon rank sum test.
The procedure for the Wilcoxon rank sum test is
as follows:
1. Rank all n1 + n2 observations from the smallest to
the largest, where n1 ≤ n2. Note that the sample sizes
need not be equal.
2. Determine T1 and T2, the sums respectively of the
ranks of the observations in samples 1 and 2.
3. The test statistic T is
T = T1 if n1 ≤ n2, and
T = T2 if n1 > n2
4. Null Hypothesis H0: Denote by F1 and F2 the
distributions of populations 1 and 2, and by μ1 and μ2
their location parameters. Then we can write H0 as
H0: F1 and F2 are identical (μ1 = μ2)
5. Alternative Hypothesis
(a) H1: F1 is shifted to the left of F2 (μ1 < μ2)
and the test is left-tailed
(b) H1: F1 is shifted to the right of F2 (μ1 > μ2)
and the test is right-tailed
(c) H1: F1 is shifted to the left or right of F2 (μ1 ≠ μ2)
and the test is two-tailed
6. Decision Rule, where TL and TU are the lower and upper
critical values from the statistical table:
Case (a): H1: μ1 < μ2. Reject H0 if
T = T1 and T1 ≤ TL, or
T = T2 and T2 ≥ TU
Case (b): H1: μ1 > μ2. Reject H0 if
T = T1 and T1 ≥ TU, or
T = T2 and T2 ≤ TL
Case (c): H1: μ1 ≠ μ2. Reject H0 if
either T ≤ TL or T ≥ TU
Note once again that the test statistic is always the one
associated with the smaller sample size.
Example 16.1. In an experiment to compare the visual
acuity of deaf and hearing children, eye movement rates
are recorded for 10 deaf and 10 hearing children. The
results are given in Table 16.1.
Table 16.1: Visual Acuity of Children
Deaf Children Hearing Children
2.85 1.96
3.15 1.25
3.33 1.66
2.31 1.53
2.65 1.74
1.96 1.26
2.27 2.01
2.47 1.65
2.85 1.86
2.25 1.46
A clinical psychologist holds the view that deaf children
have higher visual acuity than hearing children. Test this
claim at α = 0.05. (Note that a higher eye movement rate
implies greater visual acuity.)
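The ranking step for the data in Table 16.1 can be sketched as follows. The helper `average_ranks` is a hypothetical name; it ranks the combined sample from smallest to largest and gives tied observations the average of the ranks they occupy, which is the usual convention and is assumed here. The resulting rank sum is then compared with the critical values TL and TU from the statistical table.

```python
def average_ranks(values):
    """Rank values from smallest to largest; ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


deaf = [2.85, 3.15, 3.33, 2.31, 2.65, 1.96, 2.27, 2.47, 2.85, 2.25]
hearing = [1.96, 1.25, 1.66, 1.53, 1.74, 1.26, 2.01, 1.65, 1.86, 1.46]

ranks = average_ranks(deaf + hearing)
t_deaf = sum(ranks[:len(deaf)])      # rank sum T1 for the deaf children
t_hearing = sum(ranks[len(deaf):])   # rank sum T2 for the hearing children

print(f"T1 (deaf) = {t_deaf}, T2 (hearing) = {t_hearing}")
```

A useful check is that the two rank sums add up to n(n+1)/2 = 20(21)/2 = 210 for the combined sample of 20 observations.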
16.2 THE WILCOXON RANK SUM TEST FOR LARGE SAMPLES
Let μ1 and μ2 be the location parameters of populations 1
and 2 respectively. Under H0, T1 has an approximate normal
distribution with mean μ and variance σ² given by
$\mu = \frac{n_1(n_1 + n_2 + 1)}{2}, \qquad \sigma^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
So that
$z = \frac{T_1 - \mu}{\sigma}$   (16.3)
is approximately standard normal.
This approximation is valid for n1 ≥ 10 and n2 ≥ 10.
Example 16.2. Rework Example 16.1 using the normal
approximation stated in Equation (16.3).
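A sketch of the large-sample calculation follows, assuming the standard mean n1(n1+n2+1)/2 and variance n1n2(n1+n2+1)/12 of the rank sum under H0. The value T1 = 153.5 is not quoted in the text; it is the rank sum obtained for the deaf children by ranking the data of Table 16.1 with average ranks for ties. The function name `rank_sum_z` is hypothetical, and 1.645 is the usual one-tailed normal critical value at α = 0.05.

```python
import math


def rank_sum_z(t1, n1, n2):
    """Normal approximation to the Wilcoxon rank sum statistic T1."""
    mean = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (t1 - mean) / math.sqrt(variance)


z = rank_sum_z(t1=153.5, n1=10, n2=10)
z_crit = 1.645   # upper 5% point of the standard normal distribution

print(f"z = {z:.3f}, reject H0 (right-tailed): {z > z_crit}")
```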
16.3 THE KRUSKAL-WALLIS H TEST (KWH TEST)
The one-way ANOVA in the completely randomized
design is carried out on the assumptions that the
samples are independent and drawn from normal
populations. Also we assume equal variances for the
populations. Now if the independence assumption
stands but normality and/or equal variances do not hold,
we can use a non-parametric method to compare several
populations. One appropriate approach is the Kruskal-
Wallis H test (KWH Test).
The main assumptions of the test are that
(1) the number of samples must be at least three and
(2) samples are obtained using a completely randomized
design.
The KWH test proceeds as follows:
1. Null Hypothesis: H0: k populations are identical
2. Alternative Hypothesis: H1: At least two
populations differ in location (i.e. are shifted
either to the left or to the right of one another).
3. The KWH statistic
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$
has approximately a χ² distribution with k-1 degrees of freedom,
where n = n1 + n2 + ………….. + nk,
ni = size of the ith sample and
Ti = sum of ranks of observations in the ith
sample.
4. Decision Rule: Reject H0 if H > χ²α, based on k-1
degrees of freedom.
Example 16.3. Using the data in Table 11.2, test if there
is a difference in locations for the three populations.
Solution. We simply refer to the different teaching
methods as Methods 1, 2 and 3 and give their respective
observations, with ranks in parentheses. Ti is now the sum
of ranks for method i and not the sum of treatment as before.
Method 1     Method 2     Method 3
42 (10.5)    43 (12)      16 (1)
36 (7)       40 (8.5)     35 (5.5)
71 (15)      35 (5.5)     40 (8.5)
30 (3)       42 (10.5)    28 (2)
55 (14)      44 (13)      33 (4)
T1 = 49.5    T2 = 49.5    T3 = 21
The H statistic is
$H = \frac{12}{15(16)}\left[\frac{49.5^2}{5} + \frac{49.5^2}{5} + \frac{21^2}{5}\right] - 3(16) = 5.415$
For df = k-1 = 3-1 = 2, χ²0.05 = 5.991
Since
H = 5.415 < 5.991,
we do not reject H0: the three populations do not differ in
location. This was the same conclusion when we used the
parametric method of one-way ANOVA
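The H statistic of Example 16.3 can be recomputed from the rank sums alone. The sketch below assumes the usual Kruskal-Wallis formula H = 12/(n(n+1)) ΣTi²/ni - 3(n+1) without a tie correction; `kruskal_wallis_h` is a hypothetical name and 5.991 is the table value for 2 df at α = 0.05.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from the rank sums T_i and sample sizes n_i."""
    n = sum(sizes)
    weighted = sum(t * t / m for t, m in zip(rank_sums, sizes))
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Rank sums for the three teaching methods, five observations each
h = kruskal_wallis_h(rank_sums=[49.5, 49.5, 21.0], sizes=[5, 5, 5])
critical = 5.991    # chi-square table value, 2 df, alpha = 0.05

print(f"H = {h:.3f}, reject H0: {h > critical}")
```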
16.4 THE SIGN TEST FOR A PAIRED EXPERIMENT
The Wilcoxon rank sum test is suitable for comparing two
populations when the two populations are sampled
completely independently of each other. However, in
many situations observations arise from natural pairing
(a, b), with a coming from population 1 and b coming
from population 2. One good example is a taste testing
experiment in which each judge is asked to assign a
preference score for two competing food product brands
1 and 2. Generally, observations are naturally paired
when we compare two treatments in a randomized block
design with blocks of size two. Here the treatments in
each block are the same. Whenever we have these
conditions satisfied, one method of comparison is the
sign test.
The procedure for the sign test consists of the following
steps.
1. Calculate the difference di = ai - bi (i = 1, 2, …., n) and
give it a + sign if di > 0 and a - sign if di < 0.
2. Whenever we have ties (di = 0), no sign should
be assigned, and the sample size is reduced by the
number of ties.
3. Let X+ and X- denote the number of times di = ai -
bi is positive and negative respectively. Set X =
minimum of X+ and X-, and let the observed
value of X be denoted by x0.
4. Null hypothesis: H0: The distributions D1 and D2
are identical and P[a > b] = P = 0.5
5. Alternative hypothesis
(a) H1: D1 is shifted to the left of D2 and P < 0.5
(b) H1: D1 is shifted to the right of D2 and P > 0.5
(c) H1: D1 and D2 are not identical and P ≠ 0.5
6. Decision rule
Case (a) H1: P < 0.5. Reject H0 if P[X ≤ x0] < α
Case (b) H1: P > 0.5. Reject H0 if P[X ≤ x0] < α
Case (c) H1: P ≠ 0.5. Reject H0 if 2P[X ≤ x0] < α
Since we are using binomial probabilities with p = 0.5,
if r0 = max[X+, X-] is used, then
P[X ≤ x0] = P[X ≥ r0]
As an example, if X+ = 1 and X- = 9,
then x0 = 1. Also, if x0 = 1 and n = 10, then for a two-
tailed test at α = 0.05, p-value = 2P[X ≤ 1] = 2P[X ≥ 9] =
0.0214
As another example, suppose we are to test
H0: P = 0.5 versus H1: P > 0.5
given α = 0.05, n = 20, X+ = 15, X- = 5. Then x0 = min[
X+, X-] = 5 and
P-value = P[X ≤ 5] = P[X ≥ 15] = 0.0207
≈ 0.021
Example 16.4
The scores of 10 one-hundred-level students in Algebra
and Calculus are as given in Table 16.2.
Table 16.2 Scores of 10 students in Algebra and
Calculus
Student    1    2    3    4    5    6    7    8    9    10
Algebra    88   65   70   90   90   50   69   74   88   58
Calculus   82   46   59   85   73   52   56   58   82   42
Conduct a test of hypothesis to determine if the median
score differs for the two subjects. Use the sign test with
α = 0.05.
Solution. The test statistic is X = min[X+, X-]. Here x0 =
min[X+, X-] = min[9, 1] = 1. We are to test H0: P = 0.5
versus H1: P ≠ 0.5
The P-value = 2P[X ≤ 1] = 2(0.0107) = 0.0214. Since
0.0214 < α = 0.05, we reject H0 and conclude that there
is a significant median difference in the performance of
students in both subjects.
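The binomial tail probabilities used in the sign test are easy to compute exactly. The sketch below counts the signs for the Algebra and Calculus scores and evaluates P[X ≤ x0] for a binomial distribution with p = 0.5; `sign_test_p_value` is a hypothetical helper. The exact two-sided value is 22/1024, which differs from the tabulated 0.0214 only in the last digit because the table probability 0.0107 is itself rounded.

```python
from math import comb


def sign_test_p_value(n_plus, n_minus, two_sided=True):
    """Exact sign test p-value based on x0 = min(n_plus, n_minus)."""
    n = n_plus + n_minus                 # ties have already been dropped
    x0 = min(n_plus, n_minus)
    tail = sum(comb(n, k) for k in range(x0 + 1)) / 2 ** n   # P[X <= x0]
    return min(1.0, 2 * tail) if two_sided else tail


algebra = [88, 65, 70, 90, 90, 50, 69, 74, 88, 58]
calculus = [82, 46, 59, 85, 73, 52, 56, 58, 82, 42]
diffs = [a - b for a, b in zip(algebra, calculus)]

n_plus = sum(d > 0 for d in diffs)
n_minus = sum(d < 0 for d in diffs)
p_value = sign_test_p_value(n_plus, n_minus)

print(f"X+ = {n_plus}, X- = {n_minus}, p-value = {p_value:.4f}")
# One-sided example from the text: n = 20, X+ = 15, X- = 5
print(f"one-sided p-value = {sign_test_p_value(15, 5, two_sided=False):.4f}")
```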
16.5 THE LARGE-SAMPLE SIGN TEST FOR A
PAIRED EXPERIMENT
For n large enough, the statistic
$z = \frac{X - 0.5n}{0.5\sqrt{n}}$
has approximately a standard normal distribution.
We then use it to make inference according to the given
H0 and H1.
16.5 THE WILCOXON SIGNED-RANK TEST FOR A
PAIRED EXPERIMENT
The Wilcoxon signed-rank test is a modification of the
sign test for the reason that it takes into consideration
both the numerical values of di and the signs. We
calculate the test statistic as follows.
1. Calculate di = ai - bi for i = 1, 2, …., n and reduce
n by the number of zero differences (the cases
where di = 0).
2. Assign ranks to the absolute values of the
differences di. In case of ties while ranking,
assign the average of the ranks.
3. Calculate the rank sum for the positive differences
and designate this value T+. Similarly find the rank
sum for the negative differences and call it T-.
4. Apply the decision rule below.
Alternative Hypothesis   Test Statistic       Reject H0 if
H1: μ1 < μ2              T+                   T+ ≤ T0
H1: μ1 > μ2              T-                   T- ≤ T0
H1: μ1 ≠ μ2              T = min[T+, T-]      T ≤ T0
where T0 is the critical value from the statistical table.
Example 16.5. Using the data for Example 16.4, test H0:
μ1 = μ2 versus
(a) H1: μ1 ≠ μ2
(b) H1: μ1 > μ2
Solution. For ease of calculation we show a copy of the data
again below together with differences and ranks
Student        1     2    3    4    5    6    7    8     9     10
Algebra        88    65   70   90   90   50   69   74    88    58
Calculus       82    46   59   85   73   52   56   58    82    42
di = ai - bi   6     19   11   5    17   -2   13   16    6     16
|di|           6     19   11   5    17   2    13   16    6     16
Rank of |di|   3.5   10   5    2    9    1    6    7.5   3.5   7.5
T+ = 54 and T- = 1
(a) H1: μ1 ≠ μ2
T = min[T+, T-] = min[54, 1] = 1
For α = 0.05, T0 = 8
Since T ≤ T0, we reject H0
(b) H1: μ1 > μ2
The test statistic is T- = 1 and T0 = 11
Since T- ≤ T0 we also reject H0
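The rank sums T+ and T- can be obtained programmatically from the paired scores. The sketch below drops zero differences, ranks the absolute differences with average ranks for ties, and sums the ranks by sign; the function name `signed_rank_sums` is hypothetical. The critical value T0 still has to be read from the Wilcoxon signed-rank table.

```python
def signed_rank_sums(a, b):
    """Return (T_plus, T_minus) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    abs_sorted = sorted(abs(d) for d in diffs)
    # Average rank for each distinct absolute difference
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1           # first 1-based position
        count = abs_sorted.count(value)
        rank_of[value] = first + (count - 1) / 2
    t_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return t_plus, t_minus


algebra = [88, 65, 70, 90, 90, 50, 69, 74, 88, 58]
calculus = [82, 46, 59, 85, 73, 52, 56, 58, 82, 42]
t_plus, t_minus = signed_rank_sums(algebra, calculus)

print(f"T+ = {t_plus}, T- = {t_minus}, T = {min(t_plus, t_minus)}")
```

As a check, T+ and T- must add up to n(n+1)/2, here 10(11)/2 = 55.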
16.6 LARGE SAMPLE WILCOXON SIGNED RANK
TEST
When the sample size n ≥ 25, the test statistic is
$z = \frac{T - \mu}{\sigma}$
where μ = n(n+1) / 4 and σ² = n(n+1)(2n+1) / 24
Problem Set 16
1. Observation from two random and independent

samples, drawn from two populations produced the
following data.
Sample 1 2 4 3 4 6
Sample 2 5 8 7 9 7
a) Use the Wilcoxon rank sum test to determine

whether population 1 is shifted to the left of population
2.
b) State Ho and H1
c) Calculate the appropriate test statistic
d) From your test can you conclude that population 1 is

shifted to the left of population 2? Take
2. Independent random samples of size n1 = 20 and n2
= 25 are taken from non-normal populations. The
combined sample is ranked and T1 = 253.
Using the large – sample approximation to the Wilcoxon

rank sum test investigate if there is a difference in the
distributions of the two populations.
3. If we have two samples with n1 = 12 and n2 = 14 and

T1 = 194 is there a shift of distribution I to the right of
distribution 2? Take and use Wilcoxon rank
sum test.
4. Two estate valuers assessed 8 properties in GRA

in a certain city last year and the outcome yielded the
data below. (The ratings are in percentages)
Property 1 2 3 4 5 6 7 8
Assessor 1 75 87 81 92 66 80 75 77
Assessor 2 74 85 76 89 68 79 74 78
a) Test If Take and use sign

test.
b) Repeat the test in (a) using independent t – test
and compare your result to that of (a) and
comment.
5. Six bakers are contracted to deliver cakes in the

morning and evening of each day for six days. (A
baker takes only one day of the week) in a
boarding school. The mean densities of deliveries
are given below. Take
Day Mon Tue Wed Thur Fri Sat

Morning x2 0.145 0.112 0.108 0.151 0.141 0.153
Evening x2 0.138 0.130 0.122 0.163 0.146 0.173
Use the Wilcoxon signed-rank test for a paired
experiment to test the hypothesis of no difference in the
population distributions of cake densities between
morning and evening deliveries by the contractors.
6. The following observations are from test scores of
candidates in an aptitude test at four different
locations in the same city. The groups used four
different teaching techniques.
Location 1: 88, 25, 75, 33
Location 2: 24, 80, 30, 70
Location 3: 25, 31, 65, 66
Location 4: 33, 24, 30, 64
Does the performance of candidates differ in the four

locations?. Use the Kruskal – Wallis test (KWT) (In a
more serious practical study, there is a modification to
this test when we have ties(See Montgomery, 1984)).
Take
302
REFERENCES
303
Bradley R. A. (1964) Applications of the Modified Triangle Test in Sensory Difference Trials, Journal of Food Science 29, 668-672.
Cochran W. G. (1977) Sampling Techniques 3/e, John Wiley & Sons, New York.
Cochran W. G. and Cox G. M. (1980) Experimental Designs, John Wiley & Sons, New York.
Connor L. R. and Morell A. J. H. (1981) Statistics in Theory and Practice, Pitman, London.
Gacula M. C. and Singh J. (1984) Statistical Methods in Food and Consumer Research, Academic Press, Orlando, Florida.
Gilbert R. O. (1987) Statistical Methods for Environmental Pollution Monitoring, Van Nostrand Reinhold, New York.
Montgomery D. C. (1984) Design and Analysis of Experiments 2/e, John Wiley & Sons.
Moser C. and Kalton G. (1979) Survey Methods in Social Investigation 2/e, Heinemann, London.
Spiegel M. R. and Stephens L. J. (1999) Statistics, Schaum's Outline Series 3/e, McGraw-Hill, New York.
Sturges A., Journal of the American Statistical Association, March 1929, pp. 65-66.
Wilcoxon F. (1945) Individual Comparison of Grouped Data by Ranking Methods, Biometrics Bulletin 1, 80-83.
Wilcoxon F. (1946) Individual Comparison of Grouped Data by Ranking Methods, Journal of Econ. Entomology 39, 269.
Wilcoxon F. (1947) Probability Tables for Individual Comparison by Ranking Methods, Biometrics 3, 119-122.
Wilcoxon F. (1964) Some Rapid Approximate Statistical Procedures, Lederle Laboratories, Pearl River, New York.