STATISTICS LECTURE NOTES

COURSE CONTENT

1.0 Purpose of the Course


The main purpose of the course is to prepare students to describe, gather and analyze
business data, and to use statistical and management science tools to make effective business
decisions in operations, finance, marketing, management, and new product development.

2.0 Expected Learning Outcomes of the Course


At the end of the course students should be able to:
1. Enumerate the major concepts in statistics;
2. Analyse the fundamental decision-making models;
3. Review the procedures used in sampling and hypothesis testing;
4. Apply correlation and regression models for decision making; and,
5. Determine the use of time series analysis in decision analysis.
3.0 Course Content
Introduction to statistics. Time Series Analysis and Index Numbers. Concepts of applied
probability. Fundamentals for decision making models. Risk based decision making, Correlation
and Regression, Sampling, Inference and Hypothesis Testing.

4.0 Course outline


4.1 INTRODUCTION (week 1)
4.1.1 Definition of Statistics
4.1.2 Types of Statistics
4.1.3 Population, Sample and Variables
4.1.4 Functions of Statistics
4.1.5 Limitations of Statistics
4.1.6 Levels of Measurement
4.2 DATA COLLECTION, ORGANIZATION AND PRESENTATION
4.2.1 Introduction
4.2.2 Organization and Presentation of Data
4.2.3 Graphical Representation of a Frequency Distribution

4.3 MEASURES OF CENTRAL TENDENCY


4.3.1 Introduction
4.3.2 Characteristics of a good average
4.3.3 Types of averages
4.3.4 Factors to consider in the choice of an average
4.3.5 Exercise

4.4 MEASURES OF DISPERSION


4.4.1 Introduction
4.4.2 Significance of measuring dispersion
4.4.3 Properties of a good measure of dispersion
4.4.4 Measures of dispersion
4.4.5 Skewness and Kurtosis
4.4.6 Exercises

4.5 PROBABILITY DISTRIBUTIONS


4.5.1 Introduction
4.5.2 Probability distribution function of a discrete random variable
4.5.3 Discrete Probability Distributions
4.5.4 Continuous Probability Distributions
4.5.5 Exercises

4.6 SAMPLING AND SAMPLING DISTRIBUTIONS


4.6.1 Introduction
4.6.2 Types of sampling Designs
4.6.3 Reasons for Sampling
4.6.4 Bias and Error in sampling
4.6.5 Sampling Distributions

4.7 ESTIMATION THEORY


4.7.1 Introduction
4.7.2 Point Estimation
4.7.3 Properties for a good estimator
4.7.4 Confidence Intervals for Population Mean when the Population Variance
is Known.
4.7.5 How Large a Sample?
4.7.6 Confidence Interval for Population Mean When the Population Variance is
Unknown

4.8 HYPOTHESIS TESTING


4.8.1 Introduction
4.8.2 The Null and Alternative Hypothesis
4.8.3 Type I and Type II errors
4.8.4 One-Tailed and Two-Tailed tests
4.8.5 Steps to be followed in testing a hypothesis
4.8.6 Test of Hypothesis (Single Population)
4.8.7 Test of Hypothesis (Two Populations)

4.9 CHI-SQUARE TESTS


4.9.1 Introduction
4.9.2 Test of Goodness of Fit
4.9.3 Test of Independence
4.9.4 Test of Homogeneity
4.9.5 Exercises

4.10 ANALYSIS OF VARIANCE


4.10.1 Introduction
4.10.2 Assumptions of Analysis of Variance
4.10.3 Computation of Analysis of Variance
4.10.4 One – Way Classification
4.10.5 Analysis of Variance Table
4.10.6 Exercises

4.11 REGRESSION AND CORRELATION ANALYSIS


4.11.1 Introduction
4.11.2 Correlation Analysis
4.11.3 Types of Correlation
4.11.4 Coefficient of Correlation
4.11.5 Methods of Studying Correlation
4.11.6 Test of Hypothesis Regarding Population Correlation Coefficient
4.11.7 Regression Analysis
4.11.8 Exercises
4.12 ADDITIONAL TOPICS
4.12.1 Linear programming
4.12.2 Introduction
4.12.3 Assumptions of linear programming
4.12.4 Methods of Solving Linear Programming Problems
4.12.5 Duality
4.12.6 Sensitivity Analysis
4.12.7 Exercises
4.12.8 Index Numbers
4.12.9 Introduction
4.12.10 Limitations of Index Numbers
4.12.11 Price index number
4.12.12 Decision Theory
4.12.13 Game Theory

5.0 Methods of Delivery


5.1 Lectures
5.2 Assigned readings
5.3 Discussions led by students given their experience in the industry
5.4 Case analysis and group discussions
5.5 Tutorials

6.0 Instructional Material and/or Equipment


Overhead projector and LCD, whiteboard, Audio-visuals, computers, pens and
smart boards

7.0 Course Assessment

Continuous Assessments Tests x 2 20%


Term Paper/Assignments 15%
Class Presentation/Participation 05%
Final Examination 60%
Total 100%

8.0 Core Reading Materials for the Course


8.1 Sharma, J.K. (2008), Business Statistics, 2nd edition, Pearson Publishers
8.2 Gupta, S.C. (2004), Fundamentals of Statistics, Himalaya Publishing

9.0 Recommended Reference Materials

9.1 Gupta, S.P. (2002), Statistical Methods, Sultan Chand and Sons
9.2 Aczel, A.D. & Sounderpandian, J. (2006), Complete Business Statistics, McGraw-Hill
9.3 Anderson, D.R., Sweeney, D.J. & Williams, T.A. (2007), Statistics for Business and Economics, 9th edition, Thomson Publishing
9.4 Wisniewski, M. (2010), Quantitative Methods for Decision Makers, 5th edition, Prentice Hall
9.5 Curwin, J. (2001), Quantitative Methods for Business Decisions, 5th edition, Cengage Learning Business Press
9.6 Waters, D. (2007), Quantitative Methods for Business, 4th edition, Prentice Hall
9.7 Lucey, T. (2007), Quantitative Methods, 6th edition, BookPower

................end...........
LESSON ONE: INTRODUCTION

1.1 Definition of Statistics


Statistics is a science that deals with the methods of collecting, organizing, presenting, analyzing
and interpreting numerical data to assist in making more effective decisions.
According to the above definition, there are five stages in a statistical investigation.
i) Collection
 It is the first step in a statistical investigation
 Data form the foundation of any statistical analysis and therefore should be collected
with utmost care.
 If data are faulty, the conclusions drawn can never be reliable.
ii) Organization
 The large mass of figures that are collected from surveys usually need organization.
 The first step in organizing a mass of data is editing so that omissions, inconsistencies
and irrelevant answers may be corrected.
 The next step is to classify some common characteristics possessed by the items
constituting the data.
 The last step in organization is tabulation. The objective of tabulation is to arrange the
data in columns and rows so that there is clarity
iii) Presentation
 After the data have been collected and organized they are ready for presentation.
 Data presented in an orderly manner facilitates statistical analysis
iv) Analysis
 It’s a major step in any statistical investigation.
 Methods of analysis are numerous ranging from simple observation of data to highly
mathematical techniques.
 We consider only the most common methods of statistical analysis
v) Interpretation
 It entails drawing conclusions from the data collected and analyzed.
 Correct interpretation will lead to valid conclusions of the study and thus can aid in
decision making.
1.2 Types of Statistics
a) Descriptive statistics: It deals with processing data without attempting to draw any
inferences from it. It refers to the presentation of data in the form of tables and graphs and to
the description of some of its features such as averages.
b) Inferential/Inductive statistics: Refers to methods of using a sample to obtain information
about a population i.e. making conclusions about the population based on information from
the sample.

1.3 Population, Sample and Variables


 Population: is the totality of all the items or individuals whose characteristics we wish to
study. Examples of a population are all the eligible voters in an election.
 Sample: is a subset or section of the population that is used to represent the whole
population.
 Parameter: is any quantitative measure that describes a characteristic of a population, e.g.
population mean (µ) or population variance (σ²).
 Statistic: is a quantitative measure that describes a characteristic of a sample, e.g. sample
mean (x̄) or sample variance (s²).
E.G. The mean height of the people in Kenya is a parameter, whereas the mean height of a
sample of 500 people is a statistic.
 Variable: is the characteristic that is being studied. It is represented by symbols X, Y, or Z.
Height of people, grades in a test etc are examples of variables.
 There are two kinds of variables:
a) Qualitative variables: Are variables that are non-numeric i.e. attributes e.g. Gender,
Religion, Color, State of birth etc.
b) Quantitative variables: are numeric variables e.g. the height of an individual when
expressed in feet or inches, etc. Quantitative variables are either discrete or continuous.
i) Discrete variables: Are variables, which can only assume certain values i.e. whole
numbers. Are always counted. E.G: number of children in a family, the number of
defective bulbs, etc.
ii) Continuous variables: Are variables, which can assume any value within a specific
range. Are always measured e.g. height, temperature, weight, radius etc.

1.4 Functions of Statistics


i) Definiteness i.e. statistics presents facts in a definite form:
 Statements or facts conveyed in exact quantitative terms are more convincing than vague
utterances.
 Statements like “the population of Kenya is growing at a very fast rate”, or “the prices of
various commodities are rising”, may not be very convincing as they don’t specify the
numerical dimensions involved.
ii) Condensation i.e. statistics simplifies a mass of figures
 Statistics helps in condensing a mass of figures into a few significant values e.g. mean,
mode, median, standard deviation, etc.
iii) Comparison:
 Statistics facilitates comparison.
 Unless figures are compared with others of the same kind, they are often devoid of any
meaning.
iv) It helps in formulating and testing hypotheses:
 Statistical methods are useful in formulating and testing hypotheses and in developing new
theories.
v) Prediction and formulation of policies:
 Statistical methods provide useful means of forecasting future events.
 Knowledge of future trends is very helpful in framing suitable policies and plans.

1.5 Applications of Statistical Knowledge in Business Management


i) Marketing
 Statistical analyses are frequently used in providing information for marketing decisions.
 E.G: Analysis of data on population, purchasing power, habits of people, competition,
transportation costs etc should precede any attempt to establish a new market.
ii) Production
 The decision about what to produce, how to produce, when to produce, for whom to
produce is based largely on facts analyzed statistically.
iii) Finance
 The finance managers, in discharging their finance functions efficiently, depend heavily on
statistical analysis of facts and figures.
 Financial forecasting, break even analysis and investment decisions under uncertainty are
part of their activities.
 The area of security analysis is also highly quantitative.
iv) Banking
 Banks need to gather and analyze information on general economic conditions.
 Banks’ reserves are highly influenced by money markets which are not only local but
also international.
 The credit department performs statistical analysis to determine how much credit to
extend to various customers.
v) Purchase
 The purchasing department makes use of statistical data to frame suitable purchase
policies such as where to buy, how to buy, at what time to buy and at what price to buy.
vi) Accounting
 The auditing function makes frequent applications of statistical sampling and estimation
procedures.
 The accountant collects data on historical costs in the course of auditing a company’s
financial records and may use regression analysis to analyze the costs.
vii) Personnel
 The personnel department frames policies based on facts.
 It makes statistical studies of wage rates, incentive plans, cost of living, labor turnover
rates, employment trends, accident rates employment grievances, performance appraisal,
training programs etc.
 Such studies help the personnel department in the process of manpower planning.
viii) Investment
 Statistics greatly assists investors in making sound judgments in their investment decisions,
selecting securities which are safe and which have the best prospects of yielding a
good income.
1.6 Limitations of Statistics
i) Statistics does not deal with isolated measurement
 Data are statistical when they relate to measurement of masses, not statistical when they
relate to an individual item or event as a separate entity.
 E.G: The wage earned by an individual worker at any one time taken by itself is not
statistical, but taken as a part of a mass of information, it may be a statistical data.
ii) Statistics deals only with quantitative characteristics
 Statistics are numerical statements of facts. Thus qualitative characteristics like
honesty, efficiency, intelligence etc cannot be studied directly.
iii) Statistical results are true only on an average
 The conclusions obtained statistically are not universally true; they are true only under
certain conditions
iv) Statistics is only a means
 Statistical methods furnish only one method of studying a problem.
 They may not provide the best solution under all circumstances.
 Very often it may be necessary to supplement the conclusions arrived at with the help of
statistics with those obtained by other methods.
 In deciding a course of action, it may be necessary to take into account other factors like
the country’s culture, religion, philosophy, personal, political or other non-quantitative
considerations.
 Excessive dependence on statistics may lead to fallacious conclusions.
v) Statistics can be misused
 Statistics can be misused i.e. wrong interpretation. It requires experience and skill to draw
sensible conclusions from the data.
 E.G: If statistical conclusions are based on incomplete information or there is bias in
sampling.

1.7 Levels of Measurement


There are four levels of measurement; nominal, ordinal, interval and ratio.
a) Nominal scale
 It’s the lowest level of measurement
 It merely groups observations into categories based on common characteristics eg gender, race,
marital status, religion etc.
 Numbers are often assigned to the various categories for the purpose of identification. E.G: for the
variable marital status we can assign 1 = married, 2 = single, 3 = divorced, 4 = widowed, 5 =
separated.
 The numbers assigned to the various categories do not represent quantity or order, and therefore
performing mathematical operations on these numbers would yield meaningless values.
 The counting of members in each group is the only possible arithmetic operation when a nominal
scale is employed. Accordingly we are restricted to using the mode as the measure of central
tendency. There is no measure of dispersion used for nominal scales.
 Chi-square test is the most common test of statistical significance.

b) Ordinal scale
 Items are not only grouped into categories but they are also ranked into some order. Therefore in an
ordinal scale, numerals are used to represent relative position or order among the values of the
variables.
 The use of ordinal scale implies a statement of ‘greater than’ or ‘less than’ (equality is also
acceptable) without being able to state how much greater or less. The real difference between ranks
1 and 2 may be more or less than the difference between ranks 5 and 6.
 Since the numbers of this scale have only a rank meaning, the appropriate measure of central
tendency is the median. A percentile or quartile measure is used for measuring dispersion.
 Correlations are restricted to various rank order methods. Measures of statistical significance are
restricted to non-parametric methods.

c) Interval scale
 Numerals assigned to each measure are ranked in order and the intervals between them are equal.
Hence numerals used represent quantity and some mathematical operations would yield
meaningful values.
 However, the zero point is not meaningful, i.e. interval scales have an arbitrary zero and it is not
possible to determine for them what may be called an absolute zero or the unique origin.
 The primary limitation of the interval scale is the lack of a true zero; it does not have the capacity to
measure the complete absence of a trait or characteristic.
 Temperature scales such as Celsius and Fahrenheit are examples of interval scales. One can say
that an increase in temperature from 30° to 40° involves the same increase in temperature as an
increase from 60° to 70°, but one cannot say that a temperature of 60° is twice as warm as a
temperature of 30°, because the zero on the scale is set arbitrarily (for the Celsius scale, at the
temperature of the freezing point of water). The ratio of the two temperatures, 30° and 60°,
means nothing because zero is an arbitrary point.
 Interval scales provide more powerful measurement than ordinal scales since the interval scale
incorporates the concept of equality of intervals.
 As such more powerful statistical measures can be used with interval scales. Mean is the
appropriate measure of central tendency, while standard deviation is the most widely used
measure of dispersion.
 Product moment correlation techniques are appropriate and the generally used tests for statistical
significance are the ‘t’ test and ‘F’ test.
d) Ratio scale
 Ratio scales have an absolute or true zero of measurement. E.G: the zero point on a centimeter
scale indicates the complete absence of length or height. But an absolute zero of temperature is
theoretically unattainable and it remains a concept existing only in the scientist’s mind.
 Ratio scales represent the actual amounts of variables. Measures of physical dimensions such as
weight, height and distance are examples.
 All statistical techniques are usable with ratio scale and all mathematical operations (including
multiplication and division) can be used
 Geometric and harmonic means can be used as measures of central tendency and coefficients of
variation may also be calculated.
LESSON TWO: DATA COLLECTION, ORGANIZATION AND PRESENTATION

2.1 Introduction
 Data refers to any information or facts collected for reference or analysis.
 There are two types of data: secondary data and primary data.
Secondary Data
 It is data that has been gathered earlier for some other purpose. In contrast, the data that are
collected first hand by someone specifically for the purpose of facilitating the study are
known as primary data.
 E.G: the demographic statistics collected every ten years are the primary data with the
registrar of persons but the same statistics used by anyone else would be secondary data
with that individual.
Advantages of secondary data
i) It is far more economical as the cost of collecting original data is saved.
ii) Use of secondary data is time saving.
Disadvantages of secondary data
i) One does not always know how accurate the secondary data are.
ii) The secondary data might be outdated.

 Before using secondary data it is important to consider the following:


i) Whether the data are suitable for the purpose of investigation
 The suitability of the data can be judged in the light of the nature and scope of
investigation.
 E.G: if the object of inquiry is to study the wage levels including allowances of
workers and the data relate to basic wages alone, such data would not be
suitable for the immediate purpose
ii) Whether the data are adequate for the purpose of investigation
 Adequacy of the data is to be judged in the light of the requirements of the
study and the geographical area covered by the available data.
 E.G: if the object is to study wage rates of the workers in the sugar industry in
Kenya and if the available data cover only one region, it would not serve the
purpose.
 The question of adequacy may also be considered in the light of the time
period for which the data are available.
 E.G: For studying trend of prices data for the last 8-10 years may be required
but if from the sources known the data available is for the last 5-6 years only,
this would not serve the object.
iii) Whether the data are reliable
 Reliability of the data has to do with the data collection procedures.
 To ensure reliability of the data one may need to determine the context in
which the data were collected, the procedure followed and the level of
accuracy exercised in the collection.
 Determination of the reliability of secondary data is perhaps the most
important and at the same time most difficult job.
Primary Data
 Primary data are measurements observed and recorded as part of an original study.
 The work of collecting primary data is usually limited by time, money and manpower
available.
 When the data to be collected are very large in volume, it is possible to draw reasonably
accurate conclusions from a sample.
 There are two methods of obtaining primary data:
a) Questioning
b) Observation
 Questions may be asked in person or in writing. A formal list of such questions is called a
questionnaire.
 When the data are collected by observation, the investigator asks no questions. Instead, he
observes and records the desired information.
 Of the two methods named above, the questionnaire method is more widely used for
collecting business data. Three different ways of communicating with questionnaires are
available:
i) Personal interview
ii) Mail
iii) Telephone interview
 Personal interviews are those in which an interviewer obtains information from respondents
in face-to-face meetings. The information obtained by this method is likely to be more
accurate because the interviewer can clear-up doubts, can cross-examine the informants and
thereby obtain correct information.
 In mail surveys, questionnaires are mailed to respondents who are supposed to fill them and
return. They are appropriate where the field of investigation is very vast and the informants
are spread over a wide geographical area.
 Telephone interviews are similar to personal interviews except that communication between
interviewer and respondents is on telephone instead of direct personal contact.

2.2 Organization and Presentation of Data


 Data collected in an investigation and not organized systematically is called raw data. The
arrangement of this data in ascending or descending order of magnitude is called an array.
 The difference between the largest and the smallest value is called the range.
 E.G: The table below records the heights, in inches, of eight students. Column I represents
the raw data and column II illustrates the arrangement in an array.

Raw Data Array


66 65
68 66
72 66
65 68
66 68
73 69
68 72
69 73

 The largest value is 73 and the smallest is 65. Hence, the range is 73 – 65 = 8 inches.

Frequency Distribution
Ungrouped data
 In forming an array a value is repeated as many times as it appears. The number of times a
value appears in the listing is referred to as its frequency. In giving the frequency of a value,
we answer the question, “How frequently does the value occur in the listing?”
 When the data is arranged in tabular form by giving its frequencies, the table is called a
frequency table. The arrangement itself is called a frequency distribution.
 Quite often it is useful to give relative frequencies instead of actual frequencies. The relative
frequency of any observation is obtained by dividing the actual frequency of the observation
by the total frequency (sum of all frequencies).
 If the relative frequencies are multiplied by 100 and expressed as a percentage, we get the
percentage frequency distribution.
 An advantage of expressing frequencies as percentages is that one can then compare
frequency distributions of two sets of data.
Example:
The following data were obtained when a die was tossed 30 times. Construct a frequency
table.
1 2 4 2 2 6 3 5 6 3
3 1 3 1 3 4 5 3 5 3
5 1 6 3 1 2 4 2 4 4
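
The tallying itself can be checked mechanically. Below is a minimal Python sketch (illustrative, not part of the original notes) that builds the frequency and percentage-frequency table for these 30 tosses:

    # Frequency table for the 30 die tosses above
    from collections import Counter

    tosses = [1,2,4,2,2,6,3,5,6,3,
              3,1,3,1,3,4,5,3,5,3,
              5,1,6,3,1,2,4,2,4,4]
    freq = Counter(tosses)
    total = sum(freq.values())
    for value in sorted(freq):
        f = freq[value]
        # value, frequency, relative frequency as a percentage
        print(value, f, round(f / total * 100, 1))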

Grouped Data
 When dealing with a huge mass of data and when the observed values consist of too many
distinct values, it is preferable to divide the entire range of values and group the data into
classes.
 E.G: If we are interested in the distribution of ages of people, we could form the classes
0 – 19, 20 – 39, 40 – 59, 60 – 79 and 80 – 99. A class such as 40 – 59 represents all the
people with ages between 40 and 59 years inclusive.
 When data are arranged in this way, they are called grouped data. The number of
individuals in a class is called the class frequency.
 The following set of steps are suggested to form a frequency distribution from the raw data
i) Range
Scan through the raw data and find the smallest and the largest value. The largest
value minus the smallest value gives the range.
ii) Number of classes
Decide on a suitable number of classes. This could be anywhere from six to twenty.
iii) Class size
Divide the range by the number of classes. Round this figure to a convenient value to
obtain the class size and form the classes.
iv) Frequency
Find the number of observations in each class.
Example
The following data gives the amounts (in dollars) spent on groceries by 40 housewives during a
week.
22 12 9 8 33 32 30 33 8 11
21 16 12 15 37 30 16 22 12 24
18 25 37 16 25 28 25 18 9 28
25 28 26 15 12 35 38 16 24 31
Construct a frequency distribution using seven classes.
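
As a hedged illustration of the four steps above (range = 38 − 8 = 30, so with seven classes a convenient class size is 5), here is a Python sketch that groups the data, assuming inclusive classes 5 – 9, 10 – 14, ..., 35 – 39:

    # Grouped frequency distribution for the grocery data
    data = [22,12,9,8,33,32,30,33,8,11,
            21,16,12,15,37,30,16,22,12,24,
            18,25,37,16,25,28,25,18,9,28,
            25,28,26,15,12,35,38,16,24,31]

    width, start = 5, 5
    for i in range(7):
        low = start + i * width
        high = low + width - 1
        f = sum(low <= x <= high for x in data)   # count values in the class
        print(f"{low}-{high}: {f}")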

Class Intervals, Class Marks and Class Boundaries


 The blocks 10 – 20, 20 – 30, 30 – 40, etc are called class intervals. The lower ends of the
class intervals are called lower limits and their upper ends are called upper limits.
 The number of values specified in a given interval is called its length or width or
magnitude.
E.G: The class 1 – 3 has values 1, 2, 3 thus its length is 3.
The class 5 – 9 has values 5, 6, 7, 8, 9; the length or magnitude is 5
 There are two types of classes
i) Inclusive type: These are of the type 5 – 9, 10 – 14, 15 – 19, … where both the
upper and lower class limits are included in a given class.
ii) Exclusive type: These are of the type 5 – 10, 10 – 15, 15 – 20, … where the upper
class limit of a given class is the lower class limit of the succeeding class.
The class 5 – 10 has values 5, 6, 7, 8, 9 and the class 10 – 15 has 10, 11, 12, 13, 14.
NB: The conversion of inclusive type of classes to exclusive type is useful in calculating
certain measures such as mode and median.
 A point that represents the halfway or dividing point between successive classes is called a
class boundary. If d is the difference between the lower class limit of a given class and the
upper class limit of the preceding class, then

Upper Class Boundary (UCB) = Upper Class Limit (UCL) + d/2
Lower Class Boundary (LCB) = Lower Class Limit (LCL) − d/2
 The class mark is defined as the midpoint of a class interval. It is computed by adding the
lower and upper class limits of a class and then dividing by 2:

Midpoint = (LCL + UCL)/2 = (LCB + UCB)/2
Example
Class      L.C.L   U.C.L   L.C.B   U.C.B   Class mark (Midpoint)
10 – 19     10      19      9.5    19.5       14.5
20 – 29     20      29     19.5    29.5       24.5
30 – 39     30      39     29.5    39.5       34.5
40 – 49     40      49     39.5    49.5       44.5
50 – 59     50      59     49.5    59.5       54.5

NB: The upper boundary of one class is the lower boundary of the next.

Cumulative Frequency Distribution:


 If a frequency distribution is arranged in the “less than” form, it is called a cumulative
frequency distribution, which presents the accumulated frequencies.
 When the data is not grouped, a cumulative frequency distribution will show the number of
items less than or equal to a given value.
Example
The data below gives the weights of 30 people. Find the cumulative frequency distribution.

Weight Frequency Cumulative frequency (c.f)


140 3 3
150 5 8
160 6 14
170 7 21
180 6 27
190 3 30

 When the data is grouped, the cumulative frequency distribution gives the total frequency of
all the values less than the upper boundary of a given class.
Example
Find the cumulative frequency distribution for the grouped data given below:
Class Frequency Cumulative frequency (cf)
5 – 19 4 4
20 – 34 12 16
35 – 49 15 31
50 – 64 16 47
65 – 79 22 69
80 – 94 11 80
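
The running totals in such a table are simply cumulative sums of the class frequencies, which a brief Python sketch (illustrative) can reproduce:

    # Cumulative frequencies for the grouped data above
    from itertools import accumulate

    freqs = [4, 12, 15, 16, 22, 11]
    print(list(accumulate(freqs)))   # -> [4, 16, 31, 47, 69, 80]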

2.3 Graphical Representation of a Frequency Distribution


The following types of graphical representation are usually used for frequency distribution.
a) Histogram: It is a graph in which class boundaries are marked on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are represented by the
heights of the bars and the bars are drawn adjacent to each other.
b) Frequency polygons and Frequency Curve: A frequency polygon is a line graph where we
plot the class marks or midpoints along the horizontal axis and the corresponding frequencies
along the vertical axis. The class midpoints are connected with a line segment.
If the classes are very many and the class widths are so small that the midpoints are close
together, the polygon can be formed by free hand to give a smooth curve known as a
frequency curve.
c) Cumulative Frequency Curve or the Ogive. An ogive is a line graph obtained by
representing the upper class boundaries along the horizontal axis and the corresponding
cumulative frequency along the vertical axis.
2.4 Exercise
A random sample of 50 auto drivers insured with a company and having similar auto
insurance policies was selected. The following data shows monthly auto insurance premium
(in Kshs.000) paid by them.
54 40 45 20 60 30 35 40 55 70 20 15
45 60 45 25 15 30 25 18 35 25 45 56
59 25 27 39 50 56 20 25 30 30 41 25
56 48 45 25 35 60 55 48 38 34 60 60
60 64
i) Group the above data starting with the class 10 – 20 (exclusive type)
ii) Represent the data using a Histogram and an Ogive.
LESSON THREE: MEASURES OF CENTRAL TENDENCY

3.1 Introduction
 A measure of central tendency, also called a measure of location or an average, is a single
value within the range of the data that is used to represent all the values in the series.

3.2 Characteristics of a good average


Should be-
 Rigidly defined
 Based on all values
 Easily understood and calculated
 Least affected by the fluctuations of sampling
 Capable of further algebraic treatment
 Least affected by extreme values

3.3 Types of averages


The measures of central tendency that are generally used in business are:
a) Arithmetic mean
b) Median
c) Mode
d) Geometric mean
e) Harmonic mean

3.3.1 The Arithmetic Mean


It is obtained by summing up the values of all the items of a series and dividing this sum by the
number of items.
Computation of the arithmetic mean

Individual series:

Direct method:

    X̄ = ΣX / n,   where X̄ = arithmetic mean, n = number of items

Indirect method:

    X̄ = P.M. + (ΣDx / n)

    where P.M. = provisional mean, Dx = deviation from P.M., ΣDx = the sum of deviations from P.M.

Grouped series:

Direct method:

    X̄ = Σfx / n,   where f = frequencies, n = Σf = number of items

Indirect method:

    X̄ = P.M. + (ΣfDx / n)

NB: For a grouped frequency distribution the value of x is taken as the midpoint of each class.

Examples
1. The monthly sales of ABC stores for a period of 6 months were as follows:
   37,000, 48,000, 84,000, 73,000, 35,000, 53,000.
   Calculate the mean monthly sales.

2. Calculate the mean of the following distribution

   Number of vehicles serviced (x)   0   1   2   3   4   5
   Number of days (f)                2   5  11   4   4   1

3. The following table gives the marks of 58 students in statistics.

   Marks                 0-10  10-20  20-30  30-40  40-50  50-60  60-70
   Number of students      4     8     11     15     12      6      2

   Calculate the mean mark.
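
For grouped data the direct method uses the class midpoints, as in this Python sketch (illustrative) for Example 3:

    # Mean mark from the grouped distribution above
    midpoints = [5, 15, 25, 35, 45, 55, 65]   # midpoints of 0-10, ..., 60-70
    freqs     = [4,  8, 11, 15, 12,  6,  2]

    mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)
    print(round(mean, 2))   # 1940 / 58, approximately 33.45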

Advantages of the arithmetic mean


 Can be easily understood
 Takes into account all the items of the series
 It is not necessary to arrange the data before calculating the average
 It is capable of algebraic treatment
 It is a good method of comparison
 It is not indefinite
 It is used frequently.

Disadvantages of the arithmetic mean


 It is affected by extreme values to a great extent
 It may be a figure that does not exist in a series
 It cannot be calculated if all the items of a series are not known
 It cannot be used in case of qualitative data

Properties of Arithmetic Mean


1. The product of the arithmetic mean and the number of items is equal to the sum of all the
given values.
2. The algebraic sum of the deviations of the values from the arithmetic mean is equal to zero.
As such the mean may be characterized as a point of balance.
3. The sum of the squares of deviations from arithmetic mean is the least.
4. As the arithmetic mean is based on all the items in a series, a change in the value of any
item will lead to a change in the value of the arithmetic mean.
5. If we have the arithmetic means and numbers of observations of two or more groups, we can
compute the combined mean of these groups using the formula:

    X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2)
Examples
1. There are two branches of a company employing 100 and 80 employees respectively. The
arithmetic means of the monthly salaries paid by the two branches are $4570 and $6750
respectively. Find the arithmetic mean of the salaries of the employees of the company put
together.
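
As a quick worked check of the formula for this example:

    X̄12 = (100 × 4570 + 80 × 6750) / (100 + 80)
        = (457,000 + 540,000) / 180
        = 997,000 / 180 ≈ $5,538.89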
3.3.2 The Median
 It is the middle value when the data have been arranged in increasing or decreasing order of magnitude.

Computation of the Median for Ungrouped data


 If the number of observations is odd, the median is the middle value after the observations
have been arranged in some order
 If the number of observations is even, the median is the arithmetic mean of the two middle
observations after the data has been arranged in some order

Computation of the median in discrete series with Frequencies


Steps
1. Construct the less-than cumulative frequency distribution
2. Find N/2, where N = Σf
3. Find the cumulative frequency just greater than N/2
4. The corresponding value of the variable is the median

Computation of the Median in Grouped Data.


There are two approaches
1. Graphical method - Using the cumulative frequency curve (Ogive curve)
2. Interpolation formula

Interpolation Formula
Steps
1. Construct the less-than cumulative frequency distribution
2. Find N/2, where N = Σf
3. Find the cumulative frequency just greater than N/2
4. The corresponding class contains the median and is called the median class.

The median is then interpolated within the median class using the formula:

    Median = L + (h/f)(N/2 − C)

where L = lower class boundary of the median class
      h = class width
      f = frequency of the median class
      N = total frequency
      C = cumulative frequency of the class preceding the median class.
Examples
1. Find the median of the data below:
a) 5, 5, 4, 7, 0, 7, 8
b) 20, 15, 30, 45, 60, 10
2. Determine the median of the data below.

Grade A B C D E

No of students (f) 10 15 67 50 21

3. Determine the median for the grouped data below.


Marks 20 - 29 30 - 39 40 - 49 50 - 59 60 - 69 70 – 79 80-89

No of students 2 7 15 30 20 4 1
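
A Python sketch (illustrative) of the interpolation formula, applied to Example 3:

    # Median of grouped data: Median = L + (h/f)(N/2 - C)
    lower_bounds = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]  # class boundaries
    freqs        = [2, 7, 15, 30, 20, 4, 1]
    h, N = 10, sum(freqs)

    C = 0
    for L, f in zip(lower_bounds, freqs):
        if C + f >= N / 2:                       # median class found
            median = L + (h / f) * (N / 2 - C)
            break
        C += f
    print(round(median, 2))   # about 54.67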

Properties of the Median


 It is a positional average and is influenced by the position of the items in the series and not
by the size of items
 The sum of the absolute values of deviations from the median is least.

Advantages of the Median


 It is easy to calculate
 It is simple and is understood easily
 It is less affected by the value of extreme items
 It can be calculated by inspection in some cases
 It is useful in the study of phenomena which are of a qualitative nature
Disadvantages of the Median
 It is not a suitable representative of a series in most cases
 It is not suitable for further algebraic treatment
 It is not used frequently like arithmetic mean

Quartiles, deciles and percentiles


 Quartiles are the values of the items that divide the series into four equal parts.
 Deciles divide the series into 10 equal parts.
 Percentiles divide the series into 100 equal parts.
 The 2nd quartile, 5th decile and 50th percentile are equal to the median.

Computation of the Quartiles

First quartile:    Q1 = L(Q1) + (h / f(Q1)) × (N/4 − C)

Second quartile:   Q2 = L(Q2) + (h / f(Q2)) × (2N/4 − C)

Third quartile:    Q3 = L(Q3) + (h / f(Q3)) × (3N/4 − C)

In general, the three quartiles can be computed for grouped data by the formula

    Qi = L(Qi) + (h / f(Qi)) × (iN/4 − C),   i = 1, 2, 3

where L(Qi) = lower class boundary of the ith quartile class
      h     = class width
      f(Qi) = frequency of the ith quartile class
      N     = total frequency
      C     = cumulative frequency of the class preceding the ith quartile class.

Computation of the Deciles

    Di = L(Di) + (h / f(Di)) × (iN/10 − C),   i = 1, 2, ..., 9

where L(Di), f(Di) and C are defined as above with respect to the ith decile class.

Computation of the Percentiles

    Pi = L(Pi) + (h / f(Pi)) × (iN/100 − C),   i = 1, 2, ..., 99

where L(Pi), f(Pi) and C are defined as above with respect to the ith percentile class.

NB: Analogous to the graphical method of estimating the median, the quartiles, deciles and
percentiles of a grouped frequency distribution can be estimated using the cumulative
frequency curve (ogive curve).
Examples
1. Find the 1st , 2nd and 3rd quartiles for the following data
13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 25, 16, 17

2. Given below is the number of families in a locality according to their monthly expenditure
Monthly expenditure No. of families
140 - 150 17
150 - 160 29
160 - 170 42
170 - 180 72
180 - 190 84
190 – 200 107
200 – 210 49
210 – 220 34
220 – 230 31
230 – 240 16
240 – 250 12

Calculate:
i) All the quartiles
ii) 7th decile
iii) 90th percentile
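
Since the quartile, decile and percentile formulas differ only in the fraction of N used, one Python sketch (illustrative) covers all three parts of Example 2:

    # Quantile by interpolation: value below which a fraction p of the data lies
    def grouped_quantile(lower_bounds, freqs, h, p):
        N, C = sum(freqs), 0
        for L, f in zip(lower_bounds, freqs):
            if C + f >= p * N:
                return L + (h / f) * (p * N - C)
            C += f

    lows  = [140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240]
    freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]
    print(grouped_quantile(lows, freqs, 10, 0.25))   # Q1
    print(grouped_quantile(lows, freqs, 10, 0.70))   # 7th decile
    print(grouped_quantile(lows, freqs, 10, 0.90))   # 90th percentile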

3.3.3 The Mode


 The mode is the value which occurs most often in the data. A distribution with one mode is
called unimodal, with two modes bimodal, and with many modes multimodal.
 There are two methods that can be used to estimate the mode of grouped data:
a) Graphically, using a histogram
b) Using an interpolation formula

Graphical determination of mode


Procedure
1. Construct a histogram for the data.
2. Locate the highest bar (the modal class). Join its top corner on each side to the corresponding
top corner of the adjacent bar, so that the two lines intersect.
3. Draw a vertical line from the point of intersection to the horizontal axis.
4. The value at which this vertical line meets the horizontal axis is the mode.

Interpolation Formula

    Mode = L + h(fm − f1) / (2fm − f1 − f2)

         = L + (D1 / (D1 + D2)) × h

where L  = lower class boundary of the modal class
      h  = class width
      fm = frequency of the modal class
      f1 = frequency of the class preceding the modal class
      f2 = frequency of the class succeeding the modal class
      D1 = fm − f1,   D2 = fm − f2
Examples
1. Find the mode for the data below
a) 1, 2, 3, 4, 5, 6; Solution: The mode does not exist
b) 7, 8, 3, 8, 6, 10, 8 Solution: Mode = 8; This is a uni-modal distribution
c) 29,30,60,13,30,7,2,7 Solution: Modes are 30 and 7; This is a bi-modal distribution
d)
X 4 5 6 7 8 9 10
F 2 5 21 18 9 2 1

Solution: Mode = 6; it has the highest frequency.

2. Calculate the mode for the following data

Class (marks) No of student


0 – 10 2
10 – 20 7
20 – 30 11
30 – 40 6
40 – 50 4
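
A Python sketch (illustrative) of the interpolation formula applied to Example 2, where the modal class 20 – 30 has the highest frequency:

    # Mode = L + h(fm - f1) / (2fm - f1 - f2)
    L, h = 20, 10
    fm, f1, f2 = 11, 7, 6   # modal, preceding and succeeding class frequencies

    mode = L + h * (fm - f1) / (2 * fm - f1 - f2)
    print(round(mode, 2))   # 20 + 40/9, approximately 24.44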
Properties of the mode
 It represents the most typical value of the distribution and it should coincide with existing
items
 It is not affected by the presence of extremely large or small items
Advantages of the Mode
 It is easy to understand
 Extreme items do not affect its value
 It possesses the merit of simplicity

Disadvantages of the Mode


 It is often not clearly defined
 Exact location is often uncertain
 It is unsuitable for further algebraic treatment
 It does not take into account extreme values.

Relationship between the mean, median and mode


There usually exists a relationship among the mean, median and mode for moderately
asymmetrical distributions.
 If the distribution is symmetrical, the mean, median and mode will have identical values.
 If the distribution is skewed (moderately) the mean, median and mode will pull apart. If the
distribution tails off towards higher values, the mean and the median will be greater than the mode
i.e. In case, a distribution is skewed to the right, then mean> median> mode. Generally, income
distribution is skewed to the right where a large number of families have relatively low income and a
small number of families have extremely high income. In such a case, the mean is pulled up by the
extreme high incomes.
If it tails off towards lower values, the mode will be greater than either of the two measures i.e. When a
distribution is skewed to the left, then mode> median > mean. This is because here mean is pulled down
below the median by extremely low values.

In either case the median will be about one third as far away from the mean as the mode is. This means
that

    Mode = Mean − 3(Mean − Median)

         = 3(Median) − 2(Mean)

3.3.4 Geometric Mean


Geometric Mean (G.M.) is the nth root of the product of n values.

For ungrouped data:

    G.M. = (x1 × x2 × ... × xn)^(1/n)

    log G.M. = (1/n)(log x1 + log x2 + ... + log xn) = (Σ log xi) / n

    G.M. = antilog[(Σ log xi) / n]

For grouped data:

    G.M. = (x1^f1 × x2^f2 × ... × xn^fn)^(1/N),   where N = Σf

    log G.M. = (1/N)(f1 log x1 + f2 log x2 + ... + fn log xn) = (Σ fi log xi) / N

    G.M. = antilog[(Σ fi log xi) / N]
Examples
1. The weekly incomes (‘000) of 10 families are given below. Find the geometric mean?
50, 80, 45, 70, 15, 75, 85, 40, 36, 25

2. Calculate the geometric mean of the given data


X 15 20 25 30 35 40 45 50
F 2 22 29 24 7 8 6 2
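
Computing a geometric mean through logarithms, as the formulas above do, is direct in Python (an illustrative sketch for Example 1):

    # G.M. = antilog of the mean of the logs
    import math

    incomes = [50, 80, 45, 70, 15, 75, 85, 40, 36, 25]
    gm = math.exp(sum(math.log(x) for x in incomes) / len(incomes))
    print(round(gm, 2))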

Merits of the Geometric mean


 It takes into account all the items in the data and condenses them into one representative
value.
 It gives more weight to smaller values than to large values.
 It is amenable to algebraic manipulations
Demerits
 It is difficult to use and compute
 It is determinate for positive values and cannot be used for negative values or zero.

3.3.5 Harmonic Mean


It is the reciprocal of the arithmetic mean of the reciprocals of a series of observations.

Ungrouped data:

    H.M. = n / Σ(1/x) = n / (1/x1 + 1/x2 + ... + 1/xn)

Grouped data:

    H.M. = Σf / Σ(f/x) = (f1 + f2 + ... + fn) / (f1/x1 + f2/x2 + ... + fn/xn)

Examples
1. Calculate the Harmonic mean of the following data
11, 13, 15, 16, 19, 22, 13, 20

2. Calculate the Harmonic mean of the following data


X 15 20 25 30 35 40 45 50
F 2 22 29 24 7 8 6 2
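
A matching Python sketch (illustrative) for Example 1:

    # H.M. = n divided by the sum of reciprocals
    values = [11, 13, 15, 16, 19, 22, 13, 20]
    hm = len(values) / sum(1 / x for x in values)
    print(round(hm, 2))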

Merits of the Harmonic mean


 It takes into account all the observations in the data
 It gives more weight to smaller items
 It is amenable to algebraic manipulations
 It measures the rates of change
Demerits
 It is difficult to compute when the number of items is large
 It assigns too much weight to smaller items.

3.4 Factors to consider in the choice of an average


 The purpose for which the average is being used
 The nature, characteristics and properties of the average
 The nature and characteristics of the data.

3.5 Exercise
1. What are the requirements of a good average? Compare the mean, the median and the mode
in the light of these requirements.
2. Find the mean, median and mode for the following set of data
i) 3, 5, 2, 6, 5, 9, 5, 2, 8 and 6
ii) 51.6, 48.7, 50.3, 49.5 and 48.9
3. The following data pertain to marks obtained by 120 students in their final examination in
mathematics:
Marks Number of Students
30 -39 1
40 – 49 3
50 – 59 11
60 – 69 21
70 – 79 43
80 -89 32
90 - 99 9
Total 120
Calculate the mode and the median.
4. Suppose we are given the following series:
Class 0-10 10-20 20-30 30-40 40-50 50-60 60-70
interval

Frequency 6 12 22 37 17 8 5

i) Draw the histogram and the Ogive from these data


ii) Estimate the median and the mode from the graphs in (i) above
5. The mean of marks in statistics of 100 students of a class was 72. The mean of marks of boys
was 75 while their number was 70. Find out the mean mark of girls in the class.
LESSON FOUR: MEASURES OF DISPERSION

4.1 Introduction
 Dispersion refers to the degree to which numerical data tends to spread about an average
value. It is the extent of the scatteredness of items around a measure of central tendency.
 The measures of dispersion are also referred to as measures of variation or measures of
spread.

4.2 Significance of measuring dispersion


 To determine the reliability of an average
 To serve as a basis for the control of the variability
 To compare two or more series with regard to their variability
 To facilitate the use of other statistical measures

4.3 Properties of a good measure of dispersion


It should be: -
 Simple to understand
 Easy to compute
 Rigidly defined
 Based on each and every item in the distribution
 Amenable to further algebraic calculations
 Have sampling stability
 Not be unduly affected by extreme values
NOTE:
The measures of dispersion which are expressed in terms of the original units of the observations
are termed as absolute measures. Such measures are not suitable for comparing the variability of
two distributions which are not expressed in the same units of measurements. Therefore it is
better to use relative measure of dispersion obtained as ratios or percentages and are thus pure
numbers independent of the unit of measurement.
4.4 Measures of dispersion
 Range
 Interquartile Range and Quartile Deviation
 Mean deviation
 Standard deviation / Variance

4.4.1 The Range


It is the difference between the largest value and the smallest value of a series.
Example
The following are the prices of shares of a company from Monday to Saturday.
Day Monday Tuesday Wednesday Thursday Friday Saturday
Price 200 210 208 160 220 250

Calculate the range.


Solution: Range = L – S
= 250 – 160 = 90
NB:
In case of grouped frequency distribution the range is the difference between the upper class
boundary of the largest class and the lower class boundary of the smallest class.

Advantages of the Range


 It is the simplest to understand and compute
 It takes the minimum time to calculate the value of the range

Limitations
 It is not based on each and every value of the distribution
 It is subject to fluctuations of considerable magnitude from sample to sample
 It cannot be computed in case of open-ended distributions
 It does not explain or indicate anything about the character of the distribution within the
two extreme observations.

Uses of the range


 Quality control
 Fluctuations of prices
 Weather forecast
 Finding the difference between two values e.g. wages earned by different employees.

4.4.2 The Interquartile Range and Quartile Deviation


Interquartile range: it is the difference between the third quartile and the first quartile,
i.e. Interquartile range = Q3 − Q1
Quartile Deviation: also called the semi-interquartile range. It is obtained by dividing the
interquartile range by 2,
i.e. Q.D. = (Q3 − Q1) / 2,   where Q.D. = Quartile Deviation

4.4.3 The Mean Deviation


It is the average amount of scatter of the items in the distribution from the mean, median or
mode, ignoring the signs of the deviations. If x1, x2, ..., xn are n observations, then the mean
deviation about the mean is calculated as:

For ungrouped data:    M.D. = Σ|x − x̄| / n

For grouped data:      M.D. = Σf|x − x̄| / Σf
Examples
1. Calculate the mean deviation of the following values
3000, 4000, 4200, 4400, 4600, 4800, 5800

2. Calculate the average deviation from the mean for the following
Sales (thousands) 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of days (f) 3 6 11 3 2
Merits of Mean Deviation
1. It is easy to compute and understand
2. It uses all the data
3. It is less affected by the extreme values
4. Since deviations are taken from a central value, comparison about formation of different
distributions can easily be made.
5. It shows the significance of an average in the distribution

Demerits
1. Ignores algebraic signs while taking the deviations
2. Cannot be computed for distributions with open-ended class
3. Rarely used in sociological studies

4.4.4 The Variance and Standard Deviation


 The variance of a set of observations is the average of the squared deviations of the data points
from their mean; that is, the variance is the mean squared deviation. It is denoted by s² for
sample data and σ² for population data.
 Standard deviation is the square root of the variance. It is denoted by s for sample data and
σ for population data.

Computing the Variance

 Variance for ungrouped data:

    σ² = Σ(x − x̄)² / n,   where Σ(x − x̄)² = the sum of squares of the deviations from the arithmetic mean

 Variance for grouped data:

    σ² = Σf(x − x̄)² / Σf

Computing the Standard Deviation

Standard deviation for ungrouped data:

    σ = √[Σ(x − x̄)² / n]

Standard deviation for grouped data:

    σ = √[Σf(x − x̄)² / Σf]

NB: The computation of σ² can be simplified by using the following version of the formula:

For ungrouped data:    σ² = (Σx² / n) − x̄²

For grouped data:      σ² = (Σfx² / Σf) − x̄²
Examples
1. Find the standard deviation of the wages of the following ten workers working in a factory
Worker A B C D E F G H I J
Weekly Sales 1320 1310 1315 1322 1326 1340 1325 1321 1320 1331

2. An analysis of production rejects resulted in the following figures:


No. of rejects per operator   21 - 25   26 - 30   31 - 35   36 - 40   41 - 45   46 - 50   51 - 55
No. of operators (f)              5        15        28        42        15        12         3
Calculate the mean and standard deviation
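
The simplified formula σ² = Σx²/n − x̄² translates directly into a Python sketch (illustrative) for Example 1:

    # Mean, variance and standard deviation of the ten weekly wages
    wages = [1320, 1310, 1315, 1322, 1326, 1340, 1325, 1321, 1320, 1331]
    n = len(wages)
    mean = sum(wages) / n
    variance = sum(x * x for x in wages) / n - mean ** 2
    print(round(mean, 2), round(variance, 2), round(variance ** 0.5, 2))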

Combined standard deviation


The combined arithmetic mean for two sets of data with arithmetic means X̄1, X̄2 and numbers of
observations N1, N2 is given by

    X̄12 = (N1X̄1 + N2X̄2) / (N1 + N2)

The combined standard deviation of two series is given by

    σ12 = √[(N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2)]

where σ12 = combined standard deviation
      σ1  = standard deviation of the first group
      σ2  = standard deviation of the second group
      d1 = X̄1 − X̄12;   d2 = X̄2 − X̄12

NB: The above formula can be extended to find the standard deviation of three or more
groups. For example, the combined standard deviation of three groups would be

    σ123 = √[(N1σ1² + N2σ2² + N3σ3² + N1d1² + N2d2² + N3d3²) / (N1 + N2 + N3)]

where d1 = X̄1 − X̄123;   d2 = X̄2 − X̄123;   d3 = X̄3 − X̄123

Example
1. The number of workers employed, the mean wage per week and the standard deviation in each
branch of a company are given below. Calculate the mean wages and standard deviation of all
workers taken together for the factory.

Branch No. of workers Weekly mean wage Standard deviation


A 50 1413 60
B 60 1420 70
C 90 1415 80
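
A Python sketch (illustrative) of the combined-groups formula for this example:

    # Combined mean and standard deviation of the three branches
    groups = [(50, 1413, 60), (60, 1420, 70), (90, 1415, 80)]   # (N, mean, s.d.)

    N = sum(n for n, _, _ in groups)
    mean_all = sum(n * m for n, m, _ in groups) / N
    total = sum(n * (s**2 + (m - mean_all)**2) for n, m, s in groups)
    print(round(mean_all, 2), round((total / N) ** 0.5, 2))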

Advantages of the standard deviation


 It is rigidly defined and is based on all the observations of the series
 It is applied or used in other statistical techniques like correlation and regression analysis
and sampling theory.
 It is possible to calculate the combined standard deviation of two or more groups.

Disadvantages of the standard deviation


 It cannot be used for comparing the dispersion of two or more series of observations given
in different units.
 It gives more weight to extreme values.

Coefficient of Variation

Standard deviation is an absolute measure of dispersion and a relative measure based on the
standard deviation is called the coefficient of variation. It is a pure number and suitable for
comparing the variability, homogeneity or uniformity of two or more distributions. It is given as
a percentage and calculated as

    Coefficient of Variation (C.V.) = (σ / Mean) × 100

The lower the C.V., the more consistent or stable the distribution, since there is less variability.
Example
Over a period of 3 months the daily number of components produced by two comparable
machines was measured, giving the following statistics
Machine A: mean = 242.8; Standard deviation = 20.5
Machine B: mean = 281.3; Standard deviation = 23.0
Which machine has less variability in its performance?
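
A quick worked answer: C.V.(A) = (20.5 / 242.8) × 100 ≈ 8.44% and C.V.(B) = (23.0 / 281.3) × 100 ≈ 8.18%, so machine B has the slightly lower coefficient of variation and hence less variability in its performance.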

4.5 Skewness and Kurtosis


 The term ‘skewness’ refers to lack of symmetry or departure from symmetry. When a distribution is
not symmetrical it is called a skewed distribution.
 In a symmetrical distribution the values of mean, median and mode are alike. If the value of mean is
greater than the mode, skewness is said to be positive. If the value of mode is greater than mean,
skewness is said to be negative.
 The Karl Pearson coefficient of skewness is frequently used for measuring skewness and is
calculated as

    SKp = (Mean − Mode) / σ

But Mean − Mode = 3(Mean − Median). Thus the formula for calculating the coefficient of
skewness can also be written as

    SKp = 3(Mean − Median) / σ
 Kurtosis refers to the degree of flatness or peakedness of a frequency curve. The degree of
peakedness of a distribution is measured relative to the peakedness of the normal distribution.
If a distribution is more peaked than the normal curve, it is called Leptokurtic; if it is more flat-
topped than the normal curve, it is called platykurtic or flat-topped. The normal curve is itself
known as Mesokurtic.

[Figure: frequency curves showing a leptokurtic curve, the mesokurtic (normal) curve and a platykurtic curve]
4.6 Activities
1. The following table indicates the marks obtained by students in a statistics test.
Marks Number of students
0 – 20 5
20 – 40 7
40 – 60 -
60 – 80 8
80 – 100 7
The arithmetic mean for the class was 52.5 marks. You are required to determine the value
of:
i) The missing frequency
ii) The median mark
iii) The modal mark
iv) The standard deviation
v) The coefficient of skewness
2. From the prices of the shares X and Y given below, state which share is more stable in value
and which one you would invest in, and why.
X: 55 54 52 53 56 58 52 50 51 49
Y: 108 107 105 105 106 107 104 103 104 101

3. An analysis of the monthly wages paid to workers of two firms A and B belonging to the
same industry gives the following results:
Firm A Firm B
No. of wage earners 586 648
Average monthly wage 52.5 47.5
Standard deviation 10 11
Compute the combined standard deviation.
LESSON FIVE: PROBABILITY DISTRIBUTIONS

5.1 Introduction
 Probability is the likelihood or chance that a particular event will occur.
 In probability and statistics the term experiment refers to any procedure that gives rise to a
collection of outcomes which cannot be predetermined.
 In tossing a coin, the possible outcomes are as follows:
   Tossing 1 coin:  {H, T}
   Tossing 2 coins: {HH, HT, TH, TT}
   Tossing 3 coins: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

 The set of all possible outcomes in an experiment is called a sample space.


 An event is a subset of the sample space.

EXAMPLE
Let the set of all outcomes (sample space) in the experiment of tossing two coins be
S = {HH, HT, TH, TT}. Then
A = {HT, TH} is the event of getting exactly one head (equivalently, one tail)
B = {HH, HT, TH} is the event of getting at least one head
∅ = { } is the impossible event
S = {HH, HT, TH, TT} is the sure event

 An elementary event or simple event is an event containing only one point of the sample
space. E.G: In the toss of two coins, the following are elementary events:
{HH}, {HT}, {TH}, {TT}.


 A random variable is a function which assigns a numerical value to each simple event in a
sample space.
Example
Suppose that three students are selected at random from a class and each is asked whether he
smokes (S) or he does not (N). Then the sample space of this experiment is given by
S  SSS , SSN , SNS , SNN , NSS , NSN , NNS , NNN

 Let X denote the number of smokers among the three students chosen. Then:
Simple event in S Random variable X
SSS 3
SSN 2
SNS 2
SNN 1
NSS 2
NSN 1
NNS 1
NNN 0

Thus X is a random variable which takes the values 0, 1, 2, or 3.


 If a random variable can assume only a countable number of distinct values, it is called a
discrete random variable.
E.G: The number of children in a family, the number of telephone calls at a switchboard in
a ten-minute period, etc.
 A continuous random variable is one that can assume any value within a given
interval.
E.G: Lifetime of an electric bulb, weight of a person etc.

5.2 Probability distribution function of a discrete random variable

 The probability distribution of a random variable can be described by listing all the values that
the random variable can take, together with the corresponding probabilities. Such a listing is called a
probability distribution or probability mass function of the random variable.
Example
Suppose X represents the number of heads in a random experiment of tossing three coins.
The sample space is:
S  HHH , HHT , HTH , HTT , THH , THT , TTH , TTT 

The probability distribution of the random variable X defined as the “number of heads” is

    x     P(X = x)
    0        1/8
    1        3/8
    2        3/8
    3        1/8

 In general, suppose X is a random variable that assumes the values x1 , x2, …, xk. if we
represent the probability that X assumes the value xi by P(X=xi), then the probability function
can be given in the form of a table as
X P(x)
x1 p(x1)
x2 P(x2)
. .
. .
. .
xk P(xk)

Sum = 1

k
 The sum of the probabilities, i.e.  p( x )  p( x )  P( x )  ...  P( x ) is one.
i 1
i 1 2 k

Conditions for a function to be a probability function


i) The probability that a random variable assumes a value xi is always between 0 and
1,
i.e. 0  p( xi )  1
k
ii) The sum of all probabilities is equal to one, i.e.  p( x )  1
i 1
i

Example
The number of telephone calls received in an office between 9 – 10 am has the probability
distribution as shown below:
Number of calls (X) Probability, P(x)
0 0.05
1 0.20
2 0.25
3 0.20
4 0.10
5 0.15
6 0.05
a) Verify that it is a probability function
b) Find the probability that there will be 3 or more calls
c) Find the probability that there will be an even number of calls
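
To make the arithmetic concrete, here is a minimal Python sketch (ours, not part of the original notes) that checks the two conditions and answers parts (b) and (c); the variable name calls is our own.

# Telephone-call distribution from the table above.
calls = {0: 0.05, 1: 0.20, 2: 0.25, 3: 0.20, 4: 0.10, 5: 0.15, 6: 0.05}

# a) Valid probability function: each p(x) lies in [0, 1] and the total is 1.
assert all(0 <= p <= 1 for p in calls.values())
assert abs(sum(calls.values()) - 1.0) < 1e-9

# b) P(X >= 3): sum the probabilities of 3, 4, 5 and 6 calls.
p_three_or_more = sum(p for x, p in calls.items() if x >= 3)

# c) P(X is even): 0, 2, 4 and 6 calls (0 counted as even).
p_even = sum(p for x, p in calls.items() if x % 2 == 0)

print(p_three_or_more, p_even)  # 0.50 and 0.45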
The Mean or Expected Value of a Discrete Random Variable
• It is obtained by multiplying each possible value of the random variable by the corresponding
probability and summing the terms. That is, if x₁, x₂, ..., xₙ are the values assumed by a random
variable with respective probabilities p(x₁), p(x₂), ..., p(xₙ), then its mean μ (also called the
expected value) is given by
μ = x₁p(x₁) + x₂p(x₂) + ... + xₙp(xₙ) = Σᵢ₌₁ⁿ xᵢp(xᵢ)

The mean μ, also referred to as the expected value, is denoted by E(X).

The Variance of a Discrete Random Variable
• The variance of a discrete random variable is defined as
Var(X) = Σᵢ₌₁ⁿ (xᵢ − μ)²·p(xᵢ)
• The positive square root of the variance is called the standard deviation of the random
variable. The variance is commonly denoted as σ², hence the standard deviation equals σ.

Example
Suppose we are given the following data relating to the breakdown of a machine in a certain
company during a given week, where x represents the number of breakdowns of the machine and
P(x) represents the probability value of x.
x	0	1	2	3	4
P(x)	0.12	0.20	0.25	0.30	0.13

Find the mean and the variance of the number of breakdowns per week for this machine.

NB: The computation of σ² can be simplified by using the following version of the formula:
σ² = Σ x²·P(x) − μ²

5.4 Discrete Probability Distributions

Binomial Probability Distribution
Characteristics
i) An outcome on each trial of an experiment is classified into one of two mutually exclusive
categories; a success or a failure.
ii) The probability of a success (p) remains the same from trial to trial and so does the
probability of a failure (q), where p + q = 1.
iii) The trials are independent, i.e. the outcome of one trial does not affect the outcome of any
other trial.
• We are interested in the random variable X, where X is the number of successes in n trials.
• It is common to refer to each trial as a Bernoulli trial and to refer to the entire experiment as a
binomial experiment.
• Given a Bernoulli process where the probability of success in any trial equals p and the
probability of a failure equals q, the probability of x successes in n trials is calculated as
P(n, x) = C(n, x)·pˣ·qⁿ⁻ˣ, where C(n, x) = n!/(x!(n − x)!)
The mean of a binomial distribution is μ = np
The variance is σ² = npq

Example
There are five flights daily from Moi International Airport to Jomo Kenyatta International Airport.
Suppose the probability that any flight arrives late is 0.2. What is the probability that:
i) None of the flights are late today?
ii) Exactly one of the flights is late today?
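
A minimal sketch (ours; it assumes the SciPy library is available) for this example, with n = 5 flights and p = 0.2:

from scipy.stats import binom

n, p = 5, 0.2
print(binom.pmf(0, n, p))  # i)  P(X = 0) = 0.8^5 ≈ 0.3277
print(binom.pmf(1, n, p))  # ii) P(X = 1) = 5(0.2)(0.8)^4 ≈ 0.4096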

5.5 Continuous Probability Distributions

Normal Probability Distribution
Characteristics
• It is bell shaped and has a single peak at the center of the distribution.
• The arithmetic mean, median and mode of the distribution are equal and located at the peak.
• Half of the area under the curve is above this center point and the other half is below it.
• It is symmetrical about its mean, i.e. if it is cut vertically at the central value, the two halves
will be mirror images.
• It is asymptotic, i.e. the curve gets closer and closer to the x-axis but never actually touches it.
• Since the normal distribution is a continuous distribution, the probabilities are given in
terms of appropriate areas, and the total area under the curve is equal to 1. Thus the
probability that a random variable X having a normal distribution will assume a value
between two numbers a and b is equal to the area under the curve between x = a and x = b.

The standard normal probability distribution (μ = 0, σ = 1)

• The standard normal curve describes the distribution of a normal random variable with
mean zero and standard deviation 1. The random variable itself is called the standard normal
variable and is denoted by Z.
E.g. to find the area between z = 0 and z = 1.73, we go to 1.7 in the column and 0.03 in the
row and read the corresponding entry as 0.4582. Hence the area between 0 and 1.73 is
0.4582, and P(0 ≤ z ≤ 1.73) = 0.4582.

NB:
i) The curve is symmetrical w.r.t. the vertical axis through zero.
ii) It is strongly recommended that we sketch the curves and identify the areas under the
curve and the values along the horizontal axis.

EXAMPLES
1. If P(0 ≤ z ≤ c) = 0.3944, find c.

2. Find P(−2.42 ≤ z ≤ 0.8)

3. Find a) P(1.8 ≤ z ≤ 2.8)  b) P(−2.8 ≤ z ≤ −1.8)

4. Find a) P(z > 2.13)  b) P(z < 1.81)

5. Suppose z is a standard normal variable. In each of the following cases find c for which
a) P(z ≥ c) = 0.1151
b) P(z ≤ c) = 0.8238
c) P(1 ≤ z ≤ c) = 0.1525
d) P(−c ≤ z ≤ c) = 0.8164

• Having considered areas under the standard normal curve, we now consider the general case
of a normal distribution with any mean μ and any standard deviation σ, where σ > 0.
• If X is a normal random variable with mean μ and standard deviation σ, then X can be
converted into a standard normal variable z by setting z = (X − μ)/σ

EXAMPLE 6
Suppose X has a normal distribution with μ = 30 and σ = 4. Find
a) P(30 ≤ X ≤ 35)  b) P(X > 40)  c) P(X < 22)
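
A short sketch (ours, assuming SciPy) answering Example 6; norm.cdf takes the mean and standard deviation directly, so standardizing by hand is not needed:

from scipy.stats import norm

mu, sigma = 30, 4
print(norm.cdf(35, mu, sigma) - norm.cdf(30, mu, sigma))  # a) ≈ 0.3944
print(1 - norm.cdf(40, mu, sigma))                        # b) ≈ 0.0062
print(norm.cdf(22, mu, sigma))                            # c) ≈ 0.0228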
5.6 Activities
1. A salesman who sells cars for General Motors claims that he sells the largest number of cars
on Saturday. He has the following probability distribution for the number of cars he expects
to sell on a particular Saturday.
No. of cars (x) Probability P(x)
0 .1
1 .2
2 .3
3 .3
4 .1
Total 1.0
i) On a typical Saturday, how many cars does the salesman expect to sell?
ii) What is the variance of the distribution?
2. In a recent survey, 90% of the homes in a city were found to have color TVs. In a sample
of nine homes, what is the probability that:
i. All nine have color TVs?
ii. Less than five have color TVs?
iii. More than five have color TVs?
iv. At least seven homes have color TVs?
3. The lifetimes of electric components manufactured by Raman Industries Ltd are normally
distributed with a mean of 2500 hours and a standard deviation of 600 hours. If the daily production is
500 components, how many are expected to have a lifetime of:
i) Less than 2600 hours
ii) Between 2350 hours and 2580 hours
iii) More than 2380 hours
LESSON SIX: SAMPLING AND SAMPLING DISTRIBUTIONS
6.1 Introduction
• The field of inferential or inductive statistics is concerned with studying facts about populations.
Specifically, the interest is in learning about the population parameters. This is accomplished by
picking a sample and computing the values of the appropriate statistics.
• A parameter is a numerical descriptive measure of a population. Because it is based on all the
observations in the population, its value is almost always unknown.
• A sample statistic is a numerical descriptive measure of a sample. It is calculated from the
observations in the sample.
NB: The term statistic refers to a sample quantity and the term parameter refers to a population
quantity.
• Sampling is the process of selecting a sample from a population.

6.2 Types of sampling Designs

There are two major ways of selecting samples:
a) Probability sampling methods
b) Non-probability sampling methods

a) Probability sampling methods
i) Simple random sampling
• Assumes that every member of the population has an equal chance of being independently
selected. All members of the population are labeled with a number, and random numbers
should be used to select the sample.
• This is the best method of sampling, as independence of sample members is assumed by
many statistical tests. Unfortunately, all members of the population have to be available for
selection, and this is rarely the case.
ii) Systematic sampling
• It is useful when the whole sampling frame is not available. The population is listed and
every nth member is included in the sample after the first has been selected randomly.
• Sampling from a production line may make use of this method.
iii) Stratified random sampling
• Useful when the population consists of a number of distinct subpopulations and there is more
variation between the subpopulations than within each of them.
• The population is split into these differing groups – strata. A random sub-sample is then
drawn from each, in proportion to the stratum size.
iv) Cluster Sampling:
The population is divided into internally heterogeneous subgroups and some are randomly
selected for further study. It is used when it is not possible to obtain a sampling frame
because the population is either very large or scattered over a large geographical area.

b) Non-probability sampling
It is used when a researcher is not interested in selecting a sample that is representative of the
population.
i) Purposive Sampling
It allows the researcher to use cases that have the required information with respect to the
objectives of his or her study e.g. educational level, age group, religious sect etc.
ii) Quota Sampling
The researcher purposively selects subjects to fit the quotas identified, e.g. Gender: male or
female; Class level: graduate or undergraduate; Religion: Muslim, Protestant, Catholic or
Jewish; Socio-economic class: upper, middle or lower.
iii) Snowball sampling
It is used when the population that possesses the characteristics under study is not well
known and can best be located through referral networks. Initial subjects are identified, who
in turn identify others. Commonly used to study drug cultures, teenage gang activities, the
Mungiki sect, insider trading, Mau Mau, etc.
iv) Convenience or Accidental Sampling
Involves selecting cases or units of observation as they become available to the researcher
e.g. asking a question to the radio listeners, roommates or neighbours.
6.3 Reasons for Sampling
We obtain a sample rather than a complete enumeration (a census) of the population for many
reasons. There are six main reasons for sampling in lieu of the census.
i) Economy: Directly observing only a portion of the population requires fewer resources than a
census.
ii) The time factor: A sample may provide an investigator with needed information quickly.
iii) Very large populations: Many populations about which inferences must be made are quite
large, and sample evidence may be the only way to obtain information.
iv) Partly inaccessible populations: Some populations contain elementary units so difficult to
observe that they are in a sense inaccessible, e.g. in determining consumer attitudes, not all of the
users of a product can be queried.
v) The destructive nature of the observation: Sometimes the very act of observing the desired
characteristics of the elementary unit destroys it for the use intended. Classical examples of this
occur in quality control.
vi) Accuracy and sampling: A sample may be more accurate than a census. A sloppily conducted
census can provide less reliable information than a carefully obtained sample.

6.4 Bias and Error in sampling

A sample is expected to mirror the population from which it comes. However, there is no
guarantee that any sample will be precisely representative of the population. One of the things
that makes a sample unrepresentative of its population is sampling error.

Sampling error: It comprises the differences between the sample and the population that are due
solely to the particular elementary units that happen to have been selected.
There are two basic causes of sampling error.
• One is chance: Bad luck may result in untypical choices. Unusual elementary units do
exist, and there is always a possibility that an abnormally large number of them will be
chosen. The main protection against this type of error is to use a large enough sample.
• Another cause of sampling error is sampling bias. This is the tendency to favor the selection
of elementary units that have particular characteristics. Sampling bias is usually the result of
a poor sampling plan.
Non-sampling error
• The other main cause of unrepresentative samples is non-sampling error. This type can occur
whether a census or a sample is being used.
• A non-sampling error is an error that results solely from the manner in which the observations
are made. The simplest example is inaccurate physical measurement due to faulty instruments
or poor procedures. Consider the observation of human weights – no two answers will be of
equal reliability.

6.5 Sampling Distributions

• By the sampling distribution of a statistic we mean the theoretical probability distribution of the
statistic.

6.5.1 Sampling Distribution of the Mean

• If samples of size n are drawn with replacement from a population with mean μ and variance σ², the
mean and variance of the sampling distribution of x̄ are given by μ_x̄ = μ and σ²_x̄ = σ²/n.
• When random samples of size n are drawn without replacement from a finite population of size N that
has a mean μ and a variance σ², the mean and the variance of the sampling distribution of x̄ are
given by
μ_x̄ = μ and σ²_x̄ = (σ²/n)·((N − n)/(N − 1))
• If the population size is large compared to the sample size, σ²_x̄ ≈ σ²/n approximately.
• The standard deviation of the sampling distribution of x̄ is commonly known as the standard error of
the mean. It is σ/√n when sampling with replacement. For a sample drawn without replacement from a
finite population of size N, the standard error of the mean is
(σ/√n)·√((N − n)/(N − 1))
• In the latter case it is approximately σ/√n if the population is very large compared to the sample
size. In our discussion, we shall assume that the population is large enough that σ²/n can be taken
as the value of σ²_x̄ even when sampling without replacement.
• The standard error of the mean then depends on two quantities, σ² and n. It will be large if
σ² is large, i.e. if the scatter in the parent population is large. On the other hand, the standard
error will be small if the sample size n is large, since with a larger sample we get more
information about the population mean μ and consequently less scatter of the sample mean
about μ.
• The variance of the parent population is usually not under the experimenter's control. Therefore
one sure way of reducing the standard error of the mean is by picking a large sample – the larger
the better.

• So far we have concerned ourselves with two parameters of the sampling distribution of
x̄: μ_x̄ = μ and σ²_x̄ = σ²/n. We now turn our attention to the distribution itself.
• The probability distribution of x̄ will very much depend on the distribution of the sampled
population.
• Note that if n, the sample size, is large, the distribution of x̄ is close to a normal
distribution, of course with mean μ and variance σ²/n. The statement of this result is contained
in the central limit theorem.

Central Limit Theorem

The distribution of the sample mean x̄ of a random sample drawn from practically any
population with mean μ and variance σ² can be approximated by means of a normal
distribution with mean μ and variance σ²/n, provided the sample size is large.
• The central limit theorem tells us that the shape of the distribution is approximately normal. We
already know that if the population has mean μ and variance σ², then μ_x̄ = μ and σ²_x̄ = σ²/n.
• Converting to the z scale, we can give an alternate version of the central limit theorem:
when the sample size is large, the distribution of (x̄ − μ)/(σ/√n) is close to that of a standard
normal variable z.
(Recall that to convert to the z scale the rule is: subtract the mean and divide by the standard
deviation of the r.v. in question.)
• Since the central limit theorem applies if the sample size is large, a natural question is: how large
is large enough? This will depend on the nature of the sampled population.
• If the parent population is normally distributed, then the distribution of x̄ is normal for
any sample size.
• If the parent population has a symmetric distribution, the approximation to the normal
distribution will be reached for a moderately small sample size, as low as 10.
• In most instances, the tendency towards normality is so strong that the approximation is fairly
satisfactory with a sample size of about 30.

Example 1
The records of the Department of Health, Education and Welfare show that the mean expenditure
incurred by a student during 2010 was $5000 and the standard deviation of the expenditure was
$800. Find the approximate probability that the mean expenditure of 64 students picked at
random was
a) More than $4820
b) Between $4800 and $5120

Example 2
The length of life (in hours) of a certain type of electric bulb is a random variable with a mean
life of 500 hours and a standard deviation of 35 hours.
What is the approximate probability that a random sample of 49 bulbs will have a mean life
between 488 and 505 hours?
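
A minimal sketch (ours, assuming SciPy) for Example 2: by the central limit theorem, the mean of 49 bulbs is approximately normal with mean 500 and standard error 35/√49 = 5:

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 500, 35, 49
se = sigma / sqrt(n)
print(norm.cdf(505, mu, se) - norm.cdf(488, mu, se))  # ≈ 0.8331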

6.6.2 Sampling Distribution of the Proportion

• If n items are picked independently from a population where the probability of success is p (not
very close to 0 or 1) and if n is large, then the distribution of the sample proportion x/n is
approximately normal with mean p and variance pq/n, where p + q = 1.
• Converting to the z scale, it follows that (x/n − p)/√(pq/n) has a distribution that is very close
to the standard normal distribution provided n is large. This leads to the conclusion that
(x − np)/√(npq) is distributed approximately as a standard normal variable.
Example 1
Suppose 10% of the tubes produced by a machine are defective. If a sample of 100 tubes is
inspected at random,
a) Find the expected proportion of defectives in the sample
b) Find the variance of the proportion of defectives in the sample
c) Find the approximate distribution of the sample proportion
d) Find the probability that the proportion of defectives will exceed 0.16
Example 2
If 60% of the population feels that the president is doing a satisfactory job, find the approximate
probability that in a sample of 900 people interviewed at random, the proportion who share this
view will
a) Exceed 0.65
b) Be less than 0.56
LESSON SEVEN: ESTIMATION THEORY
7.1 Introduction

• The main objective of any statistical investigation is to acquire an understanding of the
population by studying the population parameters.
• The investigation of the entire population may not be feasible for several reasons. Thus
there is a need to get an idea about the population parameters by studying the corresponding
sample statistics.
• There are two ways of giving an estimate of a parameter: point estimation and interval
estimation.
7.2 Point Estimation

• A numerical value of the estimator computed from a given set of sample values is called an
estimate of the parameter. Thus a point estimate is a single number which is used to
estimate an unknown population parameter.
• An estimator of a parameter is a statistic relevant for estimating the parameter. An estimator
is thus a random variable; an estimate is its computed value from a given sample. For
instance, X̄ is an estimator of μ. A particular value of X̄ computed from a given sample will
be denoted by x̄ and will represent an estimate of μ.
• Similarly, S² is an estimator of σ², and
s² = Σᵢ₌₁ⁿ (xᵢ − x̄)²/(n − 1)
is its estimate computed from a set of data x₁, x₂, ..., xₙ. Also, if X represents the number of
successes in a sample of n, then X/n is an estimator of P, and if in a particular sample there are
x successes, then x/n is an estimate of P.
• The major limitation of a point estimate is that it fails to indicate how close it is to the
quantity it is supposed to estimate. In other words, a point estimate does not give any idea
about the reliability or precision of the method of estimation used.
Interval Estimation
• Another method of estimating parameters is called the method of interval estimation or
confidence intervals.
• It involves computing two points and constructing an interval within which the parameter lies
with a specified degree of confidence. In constructing the end points of the interval, all of the
factors, namely the point estimate, the population variance, and the sample size, are brought
into play.

7.3 Properties for a good estimator

a) Unbiasedness: An unbiased estimator of a population parameter is an estimator whose
expected value is equal to that parameter i.e. if you were to take an infinite number of
samples, calculate the value of the estimator in each sample, and then average these values,
the average value would equal the parameter.
b) Consistency: An estimator is said to be consistent if the difference between the estimator
and the parameter grows smaller as the sample size grows large.
c) Efficiency: an efficient estimator should have the least variance or least standard error.
d) Sufficiency: An estimator is said to be sufficient if it extracts from the sample such an
amount of information as no other estimator does. This means that an estimator should be
such that it utilizes all the information contained in the sample for the purpose of
estimating a given parameter.

• When we find a point estimate, we certainly do not expect that it will exactly equal the
parameter value on the dot. Also, if we take two samples from the same population, we do not
expect the two estimates computed from these samples to be exactly equal. This is due to the
sampling error involved. Thus, the method of point estimation has some drawbacks.
7.4 Confidence Intervals for Population Mean when the Population
Variance is Known
If the population has a normal distribution and σ is known, then a (1 − α)·100 percent
confidence interval for μ is given by
x̄ − z_{α/2}·σ/√n ≤ μ ≤ x̄ + z_{α/2}·σ/√n
Example 1:
A gas station sold a total of 8019 gallons of gas on 9 randomly picked days. Suppose the amount
sold on a day is normally distributed with a standard deviation of   90 gallons. Construct
confidence intervals for the true mean amount sold on a day with the following confidence
levels:
a) 98%
b) 80%
Example 2:
A random sample of 16 fully grown turkeys had a mean weight of 20.8kgs. If we can assume
from past experience that   2.8 kgs, construct confidence interval for  , the true mean weight,
with the following confidence coefficients.
a) 90%
b) 95%
c) 98%
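
A short sketch (ours, assuming SciPy) of the known-σ interval for Example 2, looping over the three confidence coefficients:

from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 20.8, 2.8, 16
for level in (0.90, 0.95, 0.98):
    z = norm.ppf(1 - (1 - level) / 2)           # z_{α/2}
    margin = z * sigma / sqrt(n)
    print(level, xbar - margin, xbar + margin)  # e.g. 95%: ≈ (19.43, 22.17)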

7.5 How Large a Sample?

The sample size needed so as to be (1 − α)·100 percent confident that the estimate x̄ does not
differ from μ by more than a pre-assigned quantity e is
n = (z_{α/2}·σ/e)²

Example
A population has a normal distribution with variance 225. Find how large a sample must be
drawn in order to be 95% confident that the sample mean will not differ from the population
mean by more than 2 units.
7.6 Confidence Interval for Population Mean When the Population
Variance is Unknown
A (1 − α)·100 percent confidence interval for μ when the population is normally distributed and
σ is not known is given by
x̄ − t_{n−1,α/2}·S/√n ≤ μ ≤ x̄ + t_{n−1,α/2}·S/√n

Note that t_{n−1,α/2} will be very close to z_{α/2} if n is 30 or more. In that case, the above
confidence interval for μ becomes, approximately,
x̄ − z_{α/2}·S/√n ≤ μ ≤ x̄ + z_{α/2}·S/√n

Example 1
When 16 cigarettes of a particular brand were tested in a laboratory for the amount of nicotine
content, it was found that their mean content was 18.3 mg with S = 1.8 mg.
Set a 90 percent confidence interval for the mean nicotine content μ in the population of
cigarettes of this brand. (Assume that the amount of nicotine in the cigarettes is normally
distributed.)

Example 2
In order to estimate the amount of time (in minutes) that a teller spends on a customer, a bank
manager decided to observe 64 customers picked at random. The amount of time the teller spent
on each customer was recorded. It was found that the sample mean was 3.2 minutes with
S² = 1.44. Find a 98% confidence interval for the mean amount of time μ.

Example 3
The following data represent the amount of sugar consumed (in pounds) in a household during
five randomly picked weeks: 3.8, 4.5, 5.2, 4.0 and 5.5. Construct a 90% confidence interval for
the true mean consumption μ. (Assume a normal distribution for the amount of sugar consumed)
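
A minimal sketch (ours, assuming SciPy) of the t interval in Example 3, with n − 1 = 4 degrees of freedom:

from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

data = [3.8, 4.5, 5.2, 4.0, 5.5]
n, xbar, s = len(data), mean(data), stdev(data)  # stdev uses the n − 1 divisor
margin = t.ppf(0.95, df=n - 1) * s / sqrt(n)     # t_{4, 0.05} ≈ 2.132
print(xbar - margin, xbar + margin)              # ≈ (3.90, 5.30)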
LESSON EIGHT: HYPOTHESIS TESTING

8.0 Introduction
• A statistical hypothesis is a statement, assertion or claim about the nature of a population.
Hypothesis testing is a procedure based on sample evidence and probability theory to
determine whether the hypothesis is a reasonable statement.

8.2 The Null and Alternative Hypothesis

• A hypothesis that is being tested for the purpose of possible rejection is called a null
hypothesis, denoted H₀. It should be stated in such a way that it contains the equality sign.
• The hypothesis against which the null hypothesis is tested is called the alternative
hypothesis, denoted H_A. This is the hypothesis that is accepted when the null hypothesis
is rejected. The null hypothesis denies the claim posed in the question.
• A test of a statistical hypothesis is a rule or procedure that leads to a decision to accept or to
reject the hypothesis under consideration when the experimental sample values are obtained.
This rule is often referred to as a decision rule. If the evidence compiled from the sample
does not support the claim under H₀, we will reject H₀ and conclude that H₀ is false.

8.3 Type I and Type II errors

• The error of rejecting the null hypothesis when it is in fact true is called a type I error or rejection
error. The probability of committing this error is denoted by the Greek letter α (alpha) and is
referred to as the level of significance of the test.
• The error of accepting H₀ when it is false is called a type II error or an acceptance error. The
probability of this error is denoted by the Greek letter β (beta).

8.4 One-Tailed and Two-Tailed tests

The nature of the critical region for a statistical test procedure depends on the alternative
hypothesis. We shall consider 3 cases of the alternative hypothesis:
a) H_A: μ > μ₀
b) H_A: μ < μ₀
c) H_A: μ ≠ μ₀, where μ₀ is a given specific value.

a) A right-tailed test

In discussing the engineer's claim, we considered the principle of testing the null hypothesis
H₀: μ = μ₀ against the alternative hypothesis
H_A: μ > μ₀ with μ₀ = 450.
It can be seen that, for arbitrary μ₀, the critical value C is given by
C = μ₀ + z_α·σ/√n
where n is the sample size, σ is the population standard deviation (which is assumed known), and
z_α is the value on the z scale such that the area in the right tail is α.

The decision rule with level of significance α is then given by:
Reject H₀ if x̄ > μ₀ + z_α·σ/√n, or equivalently, reject H₀ if (x̄ − μ₀)/(σ/√n) > z_α
It is the one-sided nature of the alternative hypothesis (greater than, >) that prompts the rejection
of H₀ if the value of the statistic falls in the right tail of its distribution. The test is therefore
called a one-tailed test, specifically, a right-tailed test.

b) A left-tailed test
Suppose the null and alternative hypotheses are given as
H₀: μ = μ₀
H_A: μ < μ₀

Once again, the alternative hypothesis is one-sided (less than, <). We reject H₀ for smaller
values of x̄, leading to the rejection of H₀ if the value falls in the left tail of the distribution of
x̄. This gives a one-tailed test that is specifically a left-tailed test.
[Figure: the left tail of the distribution of x̄, of area α, is the rejection region; reject H₀ below the critical value, do not reject above it.]

The critical value C is given by C = μ₀ − z_α·σ/√n.
The decision rule is given as:
Reject H₀ if x̄ < μ₀ − z_α·σ/√n, or equivalently, reject H₀ if (x̄ − μ₀)/(σ/√n) < −z_α

c) A Two-Tailed test
A test leads to a two-tailed test if the alternative hypothesis is two-sided.
Consider the following example:
E.g. suppose a machine is adjusted to manufacture bolts to the specification of 1-inch diameter,
and we state the null and alternative hypotheses as
H₀: μ = 1
H_A: μ ≠ 1
If the sample mean of the diameters was too far off on either side of 1, we would favor rejecting
H₀. If the value of x̄ falls in either tail of the distribution of X̄, we will reject H₀.

The rejection region with α = 0.05 is distributed as α/2 = 0.025 at each tail.
[Figure: the two tails of the distribution of x̄, each of area α/2, form the rejection region; on the z scale, reject H₀ below −z_{α/2} or above z_{α/2}.]

We have two critical values, C₁ and C₂, given by
C₁ = μ₀ − z_{α/2}·σ/√n and C₂ = μ₀ + z_{α/2}·σ/√n
The decision rule is formulated as follows:
Reject H₀ if x̄ < μ₀ − z_{α/2}·σ/√n or x̄ > μ₀ + z_{α/2}·σ/√n, or equivalently, reject H₀ if
(x̄ − μ₀)/(σ/√n) is less than −z_{α/2} or greater than z_{α/2}

8.5 Steps to be followed in testing a hypothesis

1. State the null hypothesis. We treat here only the special case where H₀ stipulates that the
parameter value is equal to a specific number.
2. State the alternative hypothesis. There should be no overlap between the sets of parameter
values stipulated under H₀ and H_A.
The alternative hypothesis is important in deciding whether the critical region is one-tailed or
two-tailed. Rejection of H₀ leads to the acceptance of H_A.
3. Pick an appropriate test statistic.
4. Stipulate the value of α, the probability of rejecting H₀ wrongly. It is the value of α that will
determine the critical point(s). Together with step 2, formulate the decision rule, i.e. determine
the values of the test statistic that will lead to the rejection of H₀ (the critical region).
5. Take a random sample and compute the value of the test statistic.
6. The final step consists of making the decision in light of the decision rule formulated in step 4.
It is important to interpret the conclusions in non-statistical language for the benefit of the
uninitiated.

8.6 Test of Hypothesis (Single Population)

8.6.1 Test of Hypothesis for the Population Mean When the Population Variance is Known
A basic assumption about the population in this case is that it is normally distributed. In the
absence of a normally distributed population, we will require that the sample size be large
(n ≥ 30). The relevant statistic in this case is (x̄ − μ₀)/(σ/√n).
A summary of the test criteria to test H₀: μ = μ₀ against the three forms of alternative
hypotheses is given below.
Alternative hypothesis	The decision rule is to reject H₀ if the computed value is
μ > μ₀			Greater than z_α
μ < μ₀			Less than −z_α
μ ≠ μ₀			Less than −z_{α/2} or greater than z_{α/2}

Example 1
After taking a refresher course, a salesman found that his sales (in dollars) on 9 random days
were 1280, 1250, 990, 1100, 880, 1300, 1100, 950 and 1050. Does the sample indicate that the
refresher course had the desired effect, in that his mean sale is now more than 1000 dollars?
Assume σ = 100, and the probability of erroneously saying that the refresher course is beneficial
should not exceed 0.01. Also assume that the sales are normally distributed.

Example 2
An IQ test was administered to 9 students and their mean IQ was found to be 95. Assuming the
population variance is 144, is it true that the mean IQ in the population is less than 100?
Use α = 0.15, and assume that IQ is normally distributed.
Example 3
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5kgs. From past experience, the standard deviation of the amount filled is known to be
0.15kgs.
To check if the machine is in control, a random sample of 16 bags was weighed and the mean
weight was found to be 5.1 kgs. At the 5% level of significance, is there evidence to believe that
the machine is out of control? [Assume a normal distribution of the amount of sugar filled in a
bag]
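
A short sketch (ours, assuming SciPy) of the two-tailed z test in Example 3, where σ = 0.15 is known:

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar, alpha = 5.0, 0.15, 16, 5.1, 0.05
z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic ≈ 2.67
z_crit = norm.ppf(1 - alpha / 2)      # critical value z_{α/2} ≈ 1.96
print(abs(z) > z_crit)                # True: reject H0 at the 5% level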

8.6.2 Test of Hypothesis for the Population Mean when the Population Variance is
Unknown and the Sample is Small
In the case where σ was known, we used the test statistic (x̄ − μ₀)/(σ/√n).
Since σ is not known, we will use its estimate S. Hence the appropriate test statistic is
T = (x̄ − μ₀)/(S/√n)

At this point we need the added assumption that the population is normally distributed,
especially if n is small. Since, under this assumption, the statistic T has Student's t distribution
with n − 1 d.f, we get the decision rules given in the following table, depending upon the
particular alternative hypothesis.

Alternative Hypothesis	The decision rule is to reject H₀ if the computed value of T is
1. μ > μ₀		Greater than t_{n−1,α}
2. μ < μ₀		Less than −t_{n−1,α}
3. μ ≠ μ₀		Less than −t_{n−1,α/2} or greater than t_{n−1,α/2}


Example 4
A car salesman claims that a particular make of car gives a mean mileage of greater than
20 miles per litre. To test the claim, a field experiment was conducted where 10 cars were each
run on one litre of petrol. The results (in miles) were 23, 18, 22, 19, 19, 22, 18, 18, 24, 22.
Do the data corroborate the salesman's claim? Use α = 0.05 and assume a normal distribution
for mileage per litre.

Example 5
A home economist claims that if a person is put on a certain diet, it will lead to a reduction of his
or her weight. The following data record the weights (in pounds) of five people, before and
after the diet. Does the data support the claim at the 5% level of significance?
Person number	1	2	3	4	5
Before the diet	175	168	140	130	150
After the diet	170	169	133	132	143

Example 6
An auto dealer believes that his new model will give mean trouble-free service of at least 12,000
miles. In a simulated test with 4 cars, the following numbers of trouble-free miles were
obtained: 11,000, 12,000, 11,800 and 11,200.
Do these data refute the dealer's claim? Use α = 0.05. [Assume a normal distribution]

Example 7
A machine can be adjusted so that when under control, the mean amount of sugar filled in a bag
is 5 kg. To check if the machine is in control, six bags were picked at random and their weights
were found to be 5.3, 5.2, 4.8, 5.2, 4.8 and 5.3.
At the 5% level of significance, is there evidence to believe that the machine is not in control?
[Assume a normal distribution for the weight of a bag]
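
A minimal sketch (ours, assuming SciPy) of the one-sided t test for Example 4, testing H₀: μ = 20 against H_A: μ > 20:

from scipy import stats

miles = [23, 18, 22, 19, 19, 22, 18, 18, 24, 22]
t_stat, p_value = stats.ttest_1samp(miles, popmean=20, alternative='greater')
print(t_stat, p_value, p_value < 0.05)  # reject H0 only if p < 0.05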

8.6.3 The Population Proportion for Qualitative Data

• So far we have considered data where the observed variable can be measured on a numerical
scale. We now consider the case of a qualitative variable, where the data is recorded as short
- tall, black - green, defective - non-defective, etc.
• Our objective will be to test hypotheses regarding the proportion p of a certain attribute in the
population.
• We shall specifically consider the problem of testing the null hypothesis H₀: p = p₀, where
p₀ is a number between 0 and 1, against various alternative hypotheses.
E.g. we might be interested in the proportion of defective items produced by a machine and
wish to test p = 0.2 against p > 0.2; or p = 0.2 against p < 0.2; or p = 0.2 against p ≠ 0.2.
• To carry out a test of hypothesis regarding the population proportion, we pick a sample of
independent observations and use the sample proportion as the statistic on which the test is
based. If p is the proportion in the population, then the sample proportion x/n has a sampling
distribution with mean p and standard deviation √(p(1 − p)/n).
• Furthermore, if the sample is large, the shape of the distribution of x/n is approximately
normal. Consequently, under the null hypothesis, which postulates that the population
proportion is p₀, x/n has a distribution that is approximately normal with mean p₀ and
standard deviation √(p₀(1 − p₀)/n), provided n is large.
• We now have a situation analogous to the one where we tested hypotheses regarding the
population mean when σ² was known. The role of x̄ is played by x/n, that of μ₀ by p₀, and
that of σ/√n by √(p₀(1 − p₀)/n).
The table below gives the 3 cases based on the nature of the alternative hypothesis.

Alternative hypothesis	The decision rule is to reject H₀ if the computed value of
			(x/n − p₀)/√(p₀(1 − p₀)/n) is
p > p₀			Greater than z_α
p < p₀			Less than −z_α
p ≠ p₀			Less than −z_{α/2} or greater than z_{α/2}

Example 1
A machine is known to produce 30% defective tubes. After repairing the machine, it was found
that it produced 22 defective tubes in the first run of 100. Is it true that after the repair the
proportion of defective tubes is reduced? Use α = 0.01.

Example 2
The proportion of Kenyans who traveled abroad last year was 20%. To find the attitude of
people on foreign travel this year, 100 people were interviewed. Of these, 15 said they would
travel and the remaining 85 said they would not. Is there any basis to believe that the attitude has
changed from last year? Use α = 0.10.
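
A short sketch (ours, assuming SciPy) of the left-tailed proportion test in Example 1, with x = 22 defectives in n = 100:

from math import sqrt
from scipy.stats import norm

p0, x, n, alpha = 0.30, 22, 100, 0.01
z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)  # ≈ −1.75
z_crit = -norm.ppf(1 - alpha)               # ≈ −2.33
print(z < z_crit)                           # False: do not reject H0 at α = 0.01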

8.7 Test of Hypothesis (Two Populations)

We now consider tests of hypothesis concerning the difference of means of two populations and
the difference of proportions of an attribute in two populations.

8.7.1 Difference in Population Means When the Variances are Known

The null hypothesis under test is
H₀: μ₁ = μ₂, that is, μ₁ − μ₂ = 0, and the test statistic appropriate for the purpose is
(X̄ − Ȳ)/√(σ₁²/m + σ₂²/n)
The decision rules for the various forms of the alternative hypothesis are given in the table below.
Alternative hypothesis	The decision rule is to reject H₀ if the computed value is
μ₁ > μ₂			Greater than z_α
μ₁ < μ₂			Less than −z_α
μ₁ ≠ μ₂			Less than −z_{α/2} or greater than z_{α/2}

Example 1
For a sample of 15 adult Kenyans picked at random, the mean weight was x̄ = 154 pounds,
whereas for a sample of 18 people in the U.S, the mean weight was ȳ = 162 pounds. From past
surveys it is known that the variance of weight in Kenya is σ₁² = 100 and in the U.S it is
σ₂² = 169.
Is it true that there is a significant difference between mean weights in the two places? Use
α = 0.05. [Assume that the weights are normally distributed]

Example 2
In order to compare two brands of cigarettes, brand A and brand B, for their nicotine content, a
sample of 60 was inspected from brand A and a sample of 40 from brand B. The results of the
tests were summarized as follows.
Brand A: x̄ = 15.4, S₁² = 3
Brand B: ȳ = 16.8, S₂² = 4
At the 5% level of significance, do the two brands differ in their mean nicotine content?

8.7.2 Difference in Population Means when the Variances are Unknown but are Assumed
Equal
• The following test procedure is particularly suited for the case when small independent
samples are drawn from normally distributed populations both having the same variance.
• We are interested in testing the null hypothesis H₀: μ₁ = μ₂.
• When the variances are known, we used the statistic
(X̄ − Ȳ)/√(σ₁²/m + σ₂²/n)
• But we are given that the variances are equal. So suppose σ₁² = σ₂², and let σ² represent the
common value. The above test statistic then reduces to
(X̄ − Ȳ)/(σ·√(1/m + 1/n))
• Since σ is not known, we shall use its pooled estimator S_p, where
S_p² = ((m − 1)S₁² + (n − 1)S₂²)/(m + n − 2)
• Therefore, the test statistic appropriate for carrying out the test of H₀ is
T = (X̄ − Ȳ)/(S_p·√(1/m + 1/n))
The test procedures for the various forms of the alternative hypothesis are given in the table below.
Alternative Hypothesis	The decision rule is to reject H₀ if the computed value of T is
1. μ₁ > μ₂		Greater than t_{m+n−2,α}
2. μ₁ < μ₂		Less than −t_{m+n−2,α}
3. μ₁ ≠ μ₂		Less than −t_{m+n−2,α/2} or greater than t_{m+n−2,α/2}

Example 3
A nitrogen fertilizer was used on 10 plots and the mean yield per plot was found to be x̄ = 82.5 kg,
with an estimate S₁ of the population standard deviation of yield per plot equal to 10 kg. On the
other hand, 15 plots treated with phosphate fertilizer gave a mean yield ȳ = 90.5 kg per plot, with
an estimate S₂ of the standard deviation of yield per plot equal to 20 kg. At the 5% level of
significance, are the two fertilizers significantly different?
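
A minimal sketch (ours, assuming SciPy) of the pooled two-sample t test for Example 3, working directly from the summary statistics:

from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=82.5, std1=10, nobs1=10,   # nitrogen plots
    mean2=90.5, std2=20, nobs2=15,   # phosphate plots
    equal_var=True)                  # pooled variance, m + n − 2 = 23 d.f.
print(t_stat, p_value, p_value < 0.05)  # reject H0 only if p < 0.05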
LESSON NINE: CHI-SQUARE TESTS

9.0 Introduction
• This lesson covers tests of goodness of fit, tests of independence and tests of homogeneity.

9.2 Test of Goodness of Fit

• While discussing tests of hypothesis about a population proportion, the items that were
inspected were classified into one of two categories: for instance, a coin could land heads or
tails, a person could be a smoker or a non-smoker, an item could be defective or non-
defective, and so on.
• If n items are picked independently from such a population, this leads to the binomial distribution.
• A generalization of this is when the population can be broken into more than two mutually exclusive
categories. For example, a coin could land heads, tails or on edge; when a die is rolled it could land
showing any one of the six faces; a person might be a Democrat, a Republican, or an independent; a
person might be an A, B, O or AB blood type, and so on.
• If n independent observations are made from such a population, we get a generalized concept of the
binomial distribution called the multinomial distribution.
• With our background of the last section, we are equipped to test the following null hypothesis:
Ho: The proportion of Democrats in the U.S is 0.60 (implying the proportion of non-
Democrats is 0.40).
• In this section we consider how to test a null hypothesis of the following type:
Ho: In the U.S, the proportion of Democrats is 0.55, the proportion of Republicans is 0.35,
and the proportion of independents is 0.10.
• To test the above hypothesis, suppose we interview 1000 people picked at random. On the basis of
the stipulated null hypothesis, we would expect 550 Democrats, 350 Republicans and 100
independents.
• If we actually observe 568 Democrats, 342 Republicans and 90 independents in this sample, we
might be quite willing to go along with the null hypothesis.
• On the other hand, if the sample yields 460 Democrats, 400 Republicans and 140 independents, we
would be reluctant to accept Ho.
• Thus in the final analysis, the statistical test will have to be based on how good a fit, or closeness,
there is between the observed numbers and the numbers that one would expect from the
hypothesized distribution.
• Tests of this type, which determine whether the sample data are in conformity with the
hypothesized distribution, are called tests of goodness of fit, since they literally test how good the fit
is.
• The test criterion is provided by a statistic X whose value for any sample is given as a number χ²
defined by
χ² = Σᵢ₌₁ᵏ (Oᵢ − Eᵢ)²/Eᵢ
where Oᵢ represents the observed frequency of category i and Eᵢ the corresponding expected
frequency obtained by assuming that the null hypothesis is true.
Example:
It is believed that the proportions of people with A, B, O and AB blood types in the population
are, respectively, 0.4, 0.2, 0.3 and 0.1. When 400 randomly picked people were examined, the
observed numbers of each type were 148, 96, 106 and 50.
At the 5% level of significance, test the hypothesis that these data bear out the stated belief.
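
A short sketch (ours, assuming SciPy) of this goodness-of-fit test; chisquare returns the χ² statistic and its p-value with k − 1 = 3 d.f:

from scipy.stats import chisquare

observed = [148, 96, 106, 50]
expected = [400 * p for p in (0.4, 0.2, 0.3, 0.1)]  # 160, 80, 120, 40
chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
print(chi2, p_value, p_value < 0.05)  # χ² ≈ 8.23; reject the stated belief at 5%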

Summary:
1. The population is divided into k categories (classes) C₁, C₂, …, Cₖ.
2. The null hypothesis stipulates that the probability that an individual belongs to category C₁ is P₁, that
it belongs to category C₂ is P₂, and so on.
3. To test this hypothesis, a random sample of n individuals is picked. The observed frequencies of the
categories are recorded as O₁, O₂, …, Oₖ.
4. If the null hypothesis is true, then the expected frequencies E₁, E₂, …, Eₖ are obtained as follows:
E₁ = nP₁, E₂ = nP₂, …, Eₖ = nPₖ
5. The departure of the observed frequencies from those expected is measured by means of a statistic
X whose value χ² is given by
χ² = (O₁ − E₁)²/E₁ + (O₂ − E₂)²/E₂ + … + (Oₖ − Eₖ)²/Eₖ
6. If none of the expected frequencies is less than 5, the distribution of X can be approximated very
closely by a chi-square distribution. Since there are k categories, the number of d.f associated with
the chi-square is k − 1.
7. The critical region for a given level of significance will therefore consist of the right tail of the chi-
square distribution with k − 1 d.f.
The decision rule is:
Reject Ho if the computed χ² value is greater than the table value χ²_{k−1,α}.

Note:
The distribution of the statistic X employed here is only approximately chi-square. It should not be used
if one or more of the expected frequencies is less than 5.
9.3 Test of Independence
• In the previous section, we observed only one characteristic on any individual, e.g. in classifying
an individual as A, B, O or AB blood type, we observed the characteristic "blood type".
• Here we are interested in observing more than one variable on each individual and finding if there
exists a relationship between these variables. For example, for each person we might observe both
blood type and eye color and investigate if these characteristics are related in any way.
• In short, our goal is to test whether two attributes observed on members of a population are
independent.
• As a first step, we pick a sample of size n and classify the data in a two-way table on the basis of the
two variables. Such a table is called a contingency table, since it alludes to whether the distribution
according to one variable is contingent on the distribution of the other. If there are r rows and c
columns, it is referred to as an "r by c" contingency table.
• The test statistic is given by χ² = Σ (O − E)²/E with (r − 1)(c − 1) d.f. The decision rule for an α level
of significance is: Reject Ho if the computed χ² value is greater than the table value χ²_{(r−1)(c−1),α}.

Example:
In a certain community, 360 randomly picked people were classified according to their age group
and political leaning. The data is presented below:
Political leaning	Age group
			20-35	36-50	Over 50	Total
Conservative		10	40	10	60
Moderate		80	85	45	210
Liberal			30	25	35	90
Total			120	150	90	360

Test the hypothesis that a person's age and political leaning are not related. Use α = 0.05.
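
A minimal sketch (ours, assuming SciPy) of this test of independence; chi2_contingency computes the expected counts from the row and column totals automatically:

from scipy.stats import chi2_contingency

table = [[10, 40, 10],   # Conservative
         [80, 85, 45],   # Moderate
         [30, 25, 35]]   # Liberal
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, dof, p_value, p_value < 0.05)  # (r − 1)(c − 1) = 4 d.f.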

9.4 Test of Homogeneity

• Sometimes one might want to compare the proportions of a characteristic in more than two
populations. For instance, one might want to compare the proportions of Democrats in four states
such as New York, California, Indiana, and Florida.
• Also, if one considered three states, say New York, California and Indiana, we might want to test
whether in these three states the proportions of Republicans are the same, whether the
proportions of Democrats are the same, and whether the proportions of independents are the same.
In short, what we are interested in is whether the three states are homogeneous with respect to the
party affiliations of their residents. Tests that deal with problems of this type are called tests of
homogeneity.
• Once again, the measure of departure from homogeneity is provided by a statistic X whose value for
any sample is given by χ² = Σ (O − E)²/E.
• The distribution of the statistic is approximately chi-square with (r − 1)(c − 1) d.f, where r represents
the number of rows and c the number of columns. The approximation is satisfactory if none of the
expected frequencies is less than 5.

Example:
In order to investigate whether the distribution of blood types in Europe is the same as in the
U.S, information was collected on 200 randomly picked people in Europe and 300 people in the
U.S. From the data provided below, is it true that the distributions of blood types in Europe and
the U.S are significantly different?

		Location
Blood type	Europe	U.S	Total
A		95	125	220
B		50	70	120
O		45	90	135
AB		10	15	25
Total		200	300	500
LESSON TEN: ANALYSIS OF VARIANCE

10.1 Introduction

• Analysis of variance (ANOVA) is a technique used to test for the significance of the difference
between more than two sample means and to make inferences about whether the samples are
drawn from populations having the same mean.
• The 'analysis of variance' procedure or 'F test' is used in such problems to test for the
significance of the difference among more than two sample means.

10.2 Assumptions of Analysis of Variance

The analysis of variance technique is based on the following assumptions:
1) Each sample is drawn from a normal population and the sample statistics tend to reflect the
characteristics of the population.
2) The populations from which the samples are drawn have identical means and variances, i.e.
μ₁ = μ₂ = μ₃ = ... = μₙ and σ₁ = σ₂ = σ₃ = ... = σₙ
In case we are not able to make these assumptions in a particular problem, the analysis of
variance technique should not be used. In such cases, we should consider using a "non-
parametric (distribution-free) technique".

10.3 Computation of Analysis of Variance

• The null hypothesis taken while applying the analysis of variance technique is that the means of
the different samples do not differ significantly.
• The procedure followed in the analysis of variance is explained separately for
1) One-way classification
2) Two-way classification
• However, irrespective of the type of classification, the analysis of variance is a technique of
partitioning the total sum of squared deviations of all sample values from the grand mean into
two parts – the sum of squares between the samples and the sum of squares within the samples.
• Individual observations in the same treatment sample, however, can differ from each other only
because of chance variation, since each individual within the group receives exactly the same
treatment.

10.4 One – Way Classification

• The term 'one-factor analysis of variance' refers to the fact that a single variable or factor of interest
is controlled and its effect on the elementary units is observed.
• In other words, in one-way classification, the data are classified according to only one criterion.
• Suppose we have k independent random samples of n₁, n₂, ..., nₖ observations from k populations.
• The population means are denoted by μ₁, μ₂, μ₃, ..., μₖ.
• The one-way analysis of variance is designed to test the null hypothesis:
H₀: μ₁ = μ₂ = μ₃ = ... = μₖ
i.e. the arithmetic means of the populations from which the k samples are randomly drawn
are equal to one another.
• The steps involved in carrying out the analysis are:

Calculate the variance between the samples:
• The variance (sum of squares) between samples reflects the contribution of both different
treatments and chance to inter-sample variability.
• Sum of squares is a measure of variability. The sum of squares between samples is denoted by SSB.
• For calculating the variance between samples, we take the total of the squares of the deviations of the
means of the various samples from the grand mean and divide this total by the degrees of freedom.
• Thus the steps in calculating the variance between samples will be:
i) Calculate the mean of each sample, i.e. X̄₁, X̄₂, ..., X̄ₖ.
ii) Calculate the grand mean X̿. Its value is obtained as
X̿ = (n₁X̄₁ + n₂X̄₂ + ... + nₖX̄ₖ)/(n₁ + n₂ + ... + nₖ)
iii) Take the difference between the means of the various samples and the grand mean.
iv) Square the deviations and obtain the total, which gives the sum of squares between the
samples; and
v) Divide the total obtained in step (iv) by the degrees of freedom.

The degrees of freedom will be one less than the number of samples, i.e. if there are 4 samples,
then the degrees of freedom will be 4 – 1 = 3. In general v = k – 1, where k = number of
samples.

Calculating the variance within the samples:
• The variance (sum of squares) within samples measures those intra-sample differences that
arise due to chance only.
• It is denoted by SSW. For calculating the variance within the samples, we take the total of the
sum of squares of the deviations of the various items from the mean values of the respective
samples and divide this total by the degrees of freedom.
• Thus the steps in calculating the variance within the samples will be:
i) Calculate the mean of each sample, i.e. X̄₁, X̄₂, ..., X̄ₖ.
ii) Take the deviations of the various observations in a sample from the mean values of the
respective samples.
iii) Square these deviations and obtain the total, which gives the sum of squares within the
samples.
iv) Divide the total obtained in step (iii) by the degrees of freedom, v = n – k, where n refers to
the total number of observations and k to the number of samples.

Calculate the F-Ratio
• Calculate the F-ratio as follows:
F* = (Variance between the samples)/(Variance within the samples), i.e. F* = S₁²/S₂²
• F is always computed with the variance between the sample means as the numerator and the
variance within the samples as the denominator.
• The denominator is computed by combining the variances within the k samples into a single
measure.

Compare the computed value of F
• Compare the calculated value of F with the table value of F for the given d.f at a certain critical level
(generally we take the 5% level of significance).
• If the calculated value of F is greater than the table value of F, it indicates that the difference in
sample means is significant,
i.e. it could not have arisen due to fluctuations of random sampling; in other words, the
samples do not come from the same population.
• On the other hand, if the calculated value of F is less than the table value, the difference is not
significant and hence could have arisen due to fluctuations of random sampling.

Example
As head of a department of a consumers’ research organization, you have the responsibility for testing
and comparing lifetimes of four brands of electric bulbs. Suppose you test the lifetime of three electric
bulbs of each of the four brands.

The data is shown below, each entry representing the lifetime of an electric bulb, measured in
hundreds of hours.

Brand
A B C D
20 25 24 23
19 23 20 20
21 21 22 20

Can we infer that the mean lifetime of the four brands of electric bulbs are equal?
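
A short sketch (ours, assuming SciPy) of the one-way ANOVA for this example; f_oneway returns the F ratio (MSB/MSW) and its p-value:

from scipy.stats import f_oneway

brand_a = [20, 19, 21]
brand_b = [25, 23, 21]
brand_c = [24, 20, 22]
brand_d = [23, 20, 20]
F, p_value = f_oneway(brand_a, brand_b, brand_c, brand_d)
print(F, p_value, p_value < 0.05)  # reject equal means only if p < 0.05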

10.5 Analysis of Variance Table

Since there are several steps involved in the computation of both the between and within sample
variances, the entire set of results may be organized into an analysis of variance (ANOVA) table.
This table is summarized as shown below:

Source of Variation	Sum of Squares	Degrees of Freedom	Mean Squares, MS	Variance Ratio, F
Between Samples		SSB		k – 1			MSB = SSB/(k – 1)	F = MSB/MSW
Within Samples		SSW		n – k			MSW = SSW/(n – k)
Total			SST		n – 1

To use the ANOVA table, it is convenient to use the following short-cut computational formulas:

Between samples sum of squares: SSB = Σⱼ₌₁ᵏ (Tⱼ²/nⱼ) − T²/N

Within samples sum of squares: SSW = Σⱼ₌₁ᵏ Σᵢ₌₁^nⱼ Xᵢⱼ² − Σⱼ₌₁ᵏ (Tⱼ²/nⱼ)

Total sum of squares: SST = Σⱼ₌₁ᵏ Σᵢ₌₁^nⱼ Xᵢⱼ² − T²/N

where Tⱼ is the total of the observations in sample j, T is the grand total of all the observations,
and N is the total number of observations.

The format for the ANOVA table using the computational formulas is shown below:

Source of Variation	Sum of Squares				D.F	Mean Squares, MS	Variance Ratio, F
Between Samples		SSB = Σⱼ (Tⱼ²/nⱼ) − T²/N		k – 1	MSB = SSB/(k – 1)	F = MSB/MSW
Within Samples		SSW = Σⱼ Σᵢ Xᵢⱼ² − Σⱼ (Tⱼ²/nⱼ)		n – k	MSW = SSW/(n – k)
Total			SST = Σⱼ Σᵢ Xᵢⱼ² − T²/N			n – 1

Example
Consider the above example.

In order to use the computational formulas, the following four quantities must be computed:
Σⱼ Σᵢ Xᵢⱼ², the sample totals Tⱼ, Σⱼ (Tⱼ²/nⱼ), and T²/N.
LESSON ELEVEN: REGRESSION AND CORRELATION ANALYSIS

11.1 Introduction

• Correlation analysis is a statistical tool used to ascertain the association between two variables, while
regression analysis is used to determine the nature and extent of the relationship between variables.
This lesson explains the methods used in studying correlation and regression.

11.2 Correlation Analysis

• If two quantities vary in such a way that movements in one are accompanied by movements in the
other, these quantities are said to be correlated. Thus correlation is the existence of some definite
relationship between two or more variables.
• E.g. there exists some relationship between family income and expenditure on luxury items, the price
of a commodity and the amount demanded, etc.
• Correlation analysis helps in determining the degree of relationship between two or more variables
– it does not tell us anything about cause-and-effect relationships.

11.3 Types of Correlation

Correlation may be classified in the following ways:
(a) Positive and negative correlation
• Whether correlation is positive (direct) or negative (inverse) depends upon the direction of
change of the variables.
• If both variables are varying in the same direction, i.e. if one variable is increasing the other is
also increasing or, if one variable is decreasing the other is also decreasing, correlation is said to be
positive.
• If, on the other hand, the variables are varying in opposite directions, i.e. as one variable is
increasing the other is decreasing and vice versa, correlation is said to be negative.

(b) Simple, partial and multiple correlation
• The distinction between simple, partial and multiple correlation is based upon the number of
variables studied.
• When only two variables are studied, it is a problem of simple correlation.
• When three or more variables are studied it is a problem of either multiple or partial correlation.
• In multiple correlation three or more variables are studied simultaneously. In partial correlation,
there are more than two variables but only the two variables that are influencing each other are
considered, the effect of the other influencing variables being kept constant.

(c) Linear and Non-Linear correlation
• The distinction between linear and non-linear correlation is based upon the constancy of the ratio
of change.
• If the amount of change in one variable tends to bear a constant ratio to the amount of change in
the other variable, correlation is said to be linear.
• Correlation would be called non-linear or curvilinear if the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable.

11.4 Methods of Studying Correlation


1. Scatter diagram
2. Karl Pearson’s coefficient of correlation
3. Spearman’s rank correlation coefficient

Scatter Diagram
It helps to illustrate diagrammatically any relationships that may exist between two variables.
The following diagrams indicate various degrees of correlation:

Diagram to be drawn
Examples
1. Draw a scatter diagram from the following data
Supply (x) 4 5 8 9 10 12 15
Demand (y) 3 4 6 5 7 8 11

11.5 Coefficient of Correlation


Coefficient of correlation, denoted by r, is a unit-free measure of the degree of linear relationship between two or more variables. The square of the correlation coefficient, i.e. r², is called the coefficient of determination. It measures the proportion of variation in one variable that can be accounted for by variation in the other(s). For instance, if r = 0.90 then r² = 0.81, which implies that 81% of the variation in one variable can be attributed to variation in the other.

11.5.1 Karl Pearson’s coefficient of correlation (Product moment coefficient of correlation)


The coefficient of correlation (r) is a measure of strength of the linear relationship between two
variables. It is also referred to as the sample coefficient of correlation and is given by
n XY   X  Y
r
 n X 2   X 2   n Y 2   Y 2 
       

Example
The following data refers to exam marks vs hours of study for a sample of 8 candidates that sat a
statistics exam

Exam mark (Y) 64 61 84 70 88 92 72 71


Hours of study (X) 20 16 34 23 27 32 18 22
a) Calculate the Pearson’s product moment coefficient of correlation
b) Calculate the coefficient of determination and give a comment about the correlation
between exam marks and hours of study.
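Part (a) can be checked with a minimal Python sketch that applies the product moment formula directly to the eight pairs above:

    from math import sqrt

    X = [20, 16, 34, 23, 27, 32, 18, 22]   # hours of study
    Y = [64, 61, 84, 70, 88, 92, 72, 71]   # exam marks
    n = len(X)

    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    syy = sum(y * y for y in Y)

    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    print(r, r ** 2)   # r ≈ 0.88, r² ≈ 0.77

For part (b): r² ≈ 0.77 means about 77% of the variation in exam marks can be attributed to variation in hours of study, indicating a strong positive correlation.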
Interpretation of the coefficient of correlation
1. When r = +1, there is a perfect positive correlation between the variables
2. When r = -1, there is a perfect negative correlation between the variables
3. When r = 0, there is no correlation between the variables
4. The closer r is to +1 or to -1, the stronger the linear relationship between the variables; the closer r is to 0, the weaker the relationship.
Advantage
 It summarizes in one figure the degree of correlation and whether it is positive or negative.
Limitations
 It assumes a linear relationship whether or not that assumption actually holds.
 The coefficient can be misinterpreted.
 The value of the coefficient is unduly affected by extreme values.
 It is time consuming.

11.5.2 Spearman’s Rank Correlation Coefficient


 This is a measure of the degree of linear relationship between variables which are given in terms of their ranks (positions) in the series.
 The Spearman's rank coefficient is denoted by r and is given by the formula

r = 1 - 6∑d²/[n(n² - 1)] = 1 - 6∑d²/(n³ - n)
In rank correlation, there are two types of problems:-
i. Where actual ranks are given
ii. Where actual ranks are not given
Where actual ranks are given
Steps:
 Take the differences of the two ranks i.e. (R1-R2) and denote these differences by d.
 Square these differences and obtain the total ∑d²
 Use the formula r = 1 - 6∑d²/[n(n² - 1)]

Example
Two managers are asked to rank a group of employees in order of potential to eventually become
top managers. The rankings are as follows:
Employees ranking by manager I Ranking by manager II
A 10 9
B 2 4
C 1 2
D 4 3
E 3 1
F 6 5
G 5 6
H 8 8
I 7 7
J 9 10
Calculate the coefficient of rank correlation and comment on the value.
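A minimal Python sketch of this calculation, using the formula above:

    R1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]    # ranking by manager I
    R2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]    # ranking by manager II
    n = len(R1)

    d2 = sum((a - b) ** 2 for a, b in zip(R1, R2))   # ∑d² = 14
    r = 1 - 6 * d2 / (n * (n ** 2 - 1))
    print(r)   # ≈ 0.915: the two managers agree closely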

Where ranks are not given


Ranks can be assigned by taking either the highest value as 1 or the lowest value as 1. The same
method should be followed in case of all the variables.

Example
Calculate the rank correlation Coefficient for the following data of marks of 2 tests given to
candidates for a clerical job
Preliminary Test 92 89 87 86 83 77 71 63 53 50
Final test 86 83 91 77 68 85 52 82 37 57

EQUAL RANKS OR TIE IN RANKS


 Where two or more individuals are to be ranked equal, the rank assigned for purposes of
calculating the coefficient of correlation is the average of the ranks which these individuals
would have got had they differed slightly from each other.
 Where equal ranks are assigned to some entries, an adjustment in the formula for
calculating the Rank coefficient of correlation is made.

 The adjustment consists of adding (1/12)(m³ - m) to the value of ∑d² for each group of tied ranks, where m stands for the number of items whose ranks are common.

 The formula can thus be written as

r = 1 - 6[∑d² + (1/12)(m1³ - m1) + (1/12)(m2³ - m2) + ...] / [n(n² - 1)]

Example
An examination of eight applicants for a clerical post was taken by a firm. From the marks
obtained by the applicants in the accounting and statistics papers, compute the Rank coefficient
of correlation.
Applicant A B C D E F G H
Marks in accounting 15 20 28 12 40 60 20 80
Marks in statistics 40 30 50 30 20 10 30 60
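A Python sketch of this computation, assuming marks are ranked with the highest as rank 1 and tied marks sharing the average of the ranks they would occupy; the helpers avg_ranks and tie_term are illustrative, not part of the text:

    def avg_ranks(values):
        # Rank values (highest = 1); tied values share the average rank.
        order = sorted(values, reverse=True)
        return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
                for x in values]

    def tie_term(values):
        # Add (m³ - m)/12 for every group of m tied items.
        return sum((values.count(v) ** 3 - values.count(v)) / 12
                   for v in set(values) if values.count(v) > 1)

    acc  = [15, 20, 28, 12, 40, 60, 20, 80]   # marks in accounting
    stat = [40, 30, 50, 30, 20, 10, 30, 60]   # marks in statistics
    n = len(acc)

    Ra, Rs = avg_ranks(acc), avg_ranks(stat)
    d2 = sum((a - b) ** 2 for a, b in zip(Ra, Rs))    # = 81.5
    corrected = d2 + tie_term(acc) + tie_term(stat)   # = 84
    print(1 - 6 * corrected / (n * (n ** 2 - 1)))     # r = 0: no rank correlation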
Merits of the Rank method
 It is simpler to understand and easier to apply compared to the Karl Pearson’s method.
 Where the data are of qualitative nature like honesty, efficiency, intelligence etc, the method
can be used with great advantage.
 It is the only method that can be used where we are given the ranks and not the actual values.

Limitations
 The method cannot be used for finding out correlation in a grouped frequency distribution.
 Where the number of observations exceeds 30, the calculations become quite tedious and
require a lot of time.

11.6 Test of Hypothesis Regarding Population Correlation Coefficient


 The parameter that provides a measure of association between two variables in the population, analogous to the way r does in the sample, is called the population correlation coefficient and is denoted by the Greek letter ρ (rho).
 Suppose we obtain a certain value of r from a given set of data. What is it suggesting about ρ? We shall consider only the simple case where the null hypothesis of interest is H0: ρ = 0, meaning that there is no relationship between the two variables in the population.
 The test statistic used to carry out the test is t = r√(n - 2) / √(1 - r²).
 If H0 is true, then this statistic has the Student's t distribution with n - 2 degrees of freedom.
Example
Consider the previous example on exam marks vs hours of study, where we obtained r = 0.88 and r² = 0.77 based on a sample with n = 8. Test the hypothesis that the population correlation coefficient is zero at the 5% level.
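A quick check in Python (a sketch using r = 0.88 and n = 8 from the earlier example; the two-tailed 5% critical value t(6) ≈ 2.447 is from standard t tables):

    from math import sqrt

    r, n = 0.88, 8
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # test statistic with n - 2 = 6 d.f.
    print(t)   # ≈ 4.5 > 2.447, so H0: ρ = 0 is rejected at the 5% level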

11.7 Regression Analysis


Regression analysis is the statistical tool which helps to estimate or predict the unknown values of one variable from known values of another variable.

Types of Regression
Simple linear regression: Involves a relationship between two variables only.
Multiple regression: Analyses or considers the relationship between three or more variables.

In regression analysis, an attempt is made to determine a line (curve) which best fits the given pairs of data. In the case of a linear relationship, a line with equation Y = a + bX, where a and b are constants to be determined, is fitted. The constants a and b are determined such that

S = ∑(Y - a - bX)²

is a minimum.

With the use of differential calculus, S is minimized for a and b which satisfy the following two normal equations:

∑Y = na + b∑X
∑XY = a∑X + b∑X²

Solving for a and b simultaneously yields the formulas

b̂ = [n∑XY - ∑X∑Y] / [n∑X² - (∑X)²]

â = (1/n)(∑Y - b̂∑X) = Ȳ - b̂X̄

The constant b in the equation Y = a + bX is called the regression coefficient of Y on X. It measures the linear relationship between the two variables X and Y. X is called the independent variable, also known as the regressor or predictor. Y is called the dependent variable, also known as the regressand or explained variable.

Example
The following data give the observations on weekly income and expenditure on food for five
households.
Weekly Income (£) 240 270 300 330 360
Expenditure on food(£) 200 220 240 245 250
a) Plot the data on a scatter diagram
b) Determine the least squares regression line of expenditure on weekly income.
c) Using the equation in (b), estimate the expenditure on food for someone having a weekly
income of £380.
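Parts (b) and (c) can be sketched in Python with the least squares formulas above (assuming the fourth income figure is 330, consistent with the 30-unit steps in the series):

    X = [240, 270, 300, 330, 360]   # weekly income (£)
    Y = [200, 220, 240, 245, 250]   # expenditure on food (£)
    n = len(X)

    sx, sy = sum(X), sum(Y)
    b = (n * sum(x * y for x, y in zip(X, Y)) - sx * sy) / \
        (n * sum(x * x for x in X) - sx ** 2)
    a = (sy - b * sx) / n

    print(a, b)          # Ŷ ≈ 106 + 0.417X
    print(a + b * 380)   # estimated food expenditure for £380 income ≈ £264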

11.8 Activities
1. For the following results showing marks obtained by 15 students, calculate the rank correlation.

Marks in Maths   50 50 40 39 38 37 36 35 34 33 32 31 30 29 28
Marks in English 50 49 51 52 43 47 42 40 44 40 30 41 32 33 31

2. The following data gives the aptitude test scores and productivity indices of 10 workers selected at
random.

Aptitude scores (X) 60 62 65 70 72 48 53 73 65 82


Productivity index (Y) 68 60 62 80 85 40 52 62 60 81

i) Determine the regression equation of Y on X.


ii) Estimate the productivity index of a worker whose test score is 92
iii) Compute the coefficient of correlation and coefficient of determination and interpret their
values.
iv) Test the hypothesis that the population correlation coefficient is zero at the 5% level.
LINEAR PROGRAMMING

This is a mathematical technique that deals with the optimization of a linear function of variables, known as the objective function, subject to a set of linear inequalities known as constraints. The objective function may be profit, revenue, contribution or cost. The constraints may be imposed by different resources such as labour, finance, materials, machines, market, technology etc. By linearity is meant a mathematical expression in which all relationships among the variables are linear (when plotted, you obtain a straight line).

A linear programming problem has two basic parts:

 The objective function, which describes the primary purpose of the formulation - to maximize some return (profit) or to minimize some cost.
 The constraint set, which is a system of inequalities under which optimization is to be accomplished.

Assumptions of linear programming:

a) Linearity - costs, revenues or any physical properties which form the basis of the
problem vary in direct proportion (linearly) with the quantities or number of
components produced.
b) Divisibility - quantities, revenues and costs are infinitely divisible i.e. any
fraction or decimal answer is valid.
c) Certainty – the technique makes no allowance for uncertainty in the estimate
made, although the evaluation of dual values indicates the sensitivity of the
solution to marginal uncertainty in constraint values.
d) Positive solutions – non-negativity constraints are introduced to ensure only
positive values are considered.
e) Interdependence between demand for products is ignored; products may be
complementary or a substitute for one another.
f) Time factors are ignored. All production is assumed to be instantaneous

SOME APPLICATIONS OF DYNAMIC PROGRAMMING

a) Production and distribution problems


b) Scheduling inventory control
c) Resource allocation
d) Replacement and maintenance problems

ADVANTAGES OF LP
1. In certain types of problems such as inventory control management, Chemical
Engineering design, dynamic programming may be the only technique that can solve
the problems.
2. It helps in attaining the optimum use of productive factors. Linear programming
indicates how a manager can utilize his productive factors most effectively by a better
selection and distribution of these elements. E.g. more efficient use of manpower and
machines can be obtained by use of linear programming.
3. Most problems requiring multistage, multi period or sequential decision process are
solved using this type of programming.
4. Because of its wide range, it is applicable to linear or non-linear problems, discrete or
continuous variables, deterministic or stochastic problems.
5. The mathematical techniques used can be adapted to the computer.
6. Better and more successful decisions

LIMITATIONS OF L.P
1. Each problem has to be modelled according to its own constraints and
requirements. This requires great experience and ingenuity.
2. The number of state variables has to be kept low to prevent complicated
calculations.
3. It treats all relationships as linear. I.e. if direct cost of producing 10 units is sh. 100
then on 20 units it is assumed to be sh. 200. This may not always be the case in
practice.
4. All the parameters in the linear programming model are assumed to be known
with certainty which is not possible in real situation.

METHODS OF SOLVING LINEAR PROGRAMMING PROBLEMS:

The two methods used to solve linear programming problems are:

a) Graphical methods
b) Simplex method

Whichever method is adopted, the first step is to formulate the linear programming problem using the following steps:

 Identify the decision variables to be determined and express them in terms of algebraic symbols.
 Identify all the limitations or constraints in the given problem and then express them as linear inequalities.
 Identify the objective/criterion which is to be optimized (maximized or minimized) and express it as a linear function of the defined decision variables.

Example 1:

A manufacturer has two products P1 and P2 both of which are produced in two steps by
machines M1 and M2. The process times per hundred for the products on the machines
are:

                  M1     M2     Contribution (per 100 units)

P1                4      5      10

P2                5      2      5

Available hours   100    80

The manufacturer is in a market upswing and can sell as much as he can produce of
both the products. Formulate the mathematical model and determine the optimal
product mix.

Solutions:

Using the graphical method

Formulate the linear programming:

Let product P1 be represented by x1 and P2 by x2

Objective function, Z = 10x1 + 5x2

Subject to; 4x1 + 5x2 ≤ 100 (M1 constraint)

5x1 + 2x2 ≤ 80 (M2 constraint)

And x1, x2 ≥ 0 (non-negativity condition)

Solving using graphical method,

 Determine the coordinates;

M1 constraint: 4x1 + 5x2 = 100

When x1 = 0; x2 = 100/5 = 20 (0, 20)

When x2 = 0; x1 = 100/4 = 25 (25, 0)


M2 constraint: 5x1 + 2x2 = 80

When x1 = 0; x2 = 80/2= 40 (0, 40)

When x2 = 0; x1 = 80/5 = 16 (16, 0)

 Plotting the graph:

[Graph: the constraint lines 4x1 + 5x2 = 100 (through (25, 0) and (0, 20)) and 5x1 + 2x2 = 80 (through (16, 0) and (0, 40)) are drawn on the x1-x2 axes. The feasible region is the area enclosed by the axes and the two lines, with corner points A(0, 0), B(16, 0), C(11.76, 10.59) and D(0, 20).]
 Considering the corner points of the feasible region, their coordinates, and testing them in the objective function:

Points   Coordinates       Z = 10x1 + 5x2

A        (0, 0)            10(0) + 5(0) = 0

B        (16, 0)           10(16) + 5(0) = 160

C        (11.76, 10.59)    10(11.76) + 5(10.59) = 170.6

D        (0, 20)           10(0) + 5(20) = 100

(Note that the point (25, 0), where the M1 line meets the x1 axis, is not a corner of the feasible region because it violates the M2 constraint: 5(25) = 125 > 80.)

Thus the product mix should be:

Product P1 = 11.76 (hundreds of units)

Product P2 = 10.59 (hundreds of units)

And the maximum contribution will be approximately 170.6
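Because the optimum of a linear programme lies at a corner of the feasible region, the graphical method can be mimicked by enumerating the pairwise intersections of the constraint boundaries and keeping only the feasible ones. A minimal Python sketch for this example:

    from itertools import combinations

    # Constraints written as a1*x1 + a2*x2 <= b; the last two are x1, x2 >= 0.
    cons = [(4, 5, 100), (5, 2, 80), (-1, 0, 0), (0, -1, 0)]

    def intersect(c1, c2):
        (a, b, e), (c, d, f) = c1, c2
        det = a * d - b * c
        if det == 0:
            return None                      # parallel boundaries
        return ((e * d - b * f) / det, (a * f - e * c) / det)

    corners = [p for c1, c2 in combinations(cons, 2)
               if (p := intersect(c1, c2)) is not None
               and all(a * p[0] + b * p[1] <= rhs + 1e-9 for a, b, rhs in cons)]

    best = max(corners, key=lambda p: 10 * p[0] + 5 * p[1])
    print(best, 10 * best[0] + 5 * best[1])   # ≈ (11.76, 10.59), Z ≈ 170.6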

2) Using Simplex method


It is a method designed to solve any linear programme. It is an iterative procedure in which the same computational steps are repeated a number of times until the optimum is reached. In order to develop a general solution method, the LP problem must first be put in a common format, which we call the standard form.

Step 1: Formulate the LP problem

Let product P1 be represented by x1 and P2 by x2

Objective function, Z = 10x1 + 5x2

Subject to; 4x1 + 5x2 ≤ 100 (M1 constraint)

5x1 + 2x2 ≤ 80 (M2 constraint)

And x1, x2 ≥ 0 (non-negativity condition)

Step 2: Convert the inequalities in constraints into equalities.

This can be done by adding the slack variables s1, s2, ….

Z = 10x1 + 5x2

4x1 + 5x2 + S1 = 100


5x1 + 2x2 + S2 = 80

Step 3: Initial Simplex Tableau


Solution    Product variables    Slack variables    Quantity
variable    X1        X2         S1        S2       solution
S1          4         5          1         0        100
S2          5         2          0         1        80
Z           10        5          0         0        0

Step 4: Obtain the Pivot Element

 Identify the biggest number in the Z row (10). This gives the column of interest.
 Divide the quantity-solution values by the corresponding elements in the identified column:

100/4 = 25

80/5 = 16

 The smallest of the answers obtained is 16, which identifies the row of interest.
 The point where the identified column and row meet gives the pivot element (5).

Step 5: Make the pivot element 1 (by dividing the row containing the pivot element by the value of the pivot element) and give the identified row a new identity (the identity of the identified column). Then redraw the initial simplex tableau.

Old row: S2 5 2 0 1 80

New row: X1 5/5 2/5 0/5 1/5 80/5

X1 1 0.4 0 0.2 16

Initial Simplex Tableau reproduced


Solution    Product variables    Slack variables    Quantity
variable    X1        X2         S1        S2       solution
S1          4         5          1         0        100
X1          1         0.4        0         0.2      16
Z           10        5          0         0        0

Step 6: Row operations.

These are done to make the elements in the identified column zero, except the pivot element, which MUST remain one (1). Each operation must be between two rows, one of which is the row containing the pivot element. I.e.:

OLD ROW: S1 4 5 1 0 100

X1 (1 0.4 0 0.2 16) × 4

OLD ROW: S1 4 5 1 0 100

X1 4 1.6 0 0.8 64

NEW ROW: S1 0 3.4 1 -0.8 36

OLD ROW: Z 10 5 0 0 0

X1 (1 0.4 0 0.2 16) × 10

OLD ROW: Z 10 5 0 0 0

X1 10 4 0 2 160

NEW ROW: Z 0 1 0 -2 -160

Step 7: Second Simplex Tableau

Second Simplex Tableau


Solution    Product variables    Slack variables    Quantity
variable    X1        X2         S1        S2       solution
S1          0         3.4        1         -0.8     36
X1          1         0.4        0         0.2      16
Z           0         1          0         -2       -160

Since not all the elements in the Z row are negative or zero, the optimal solution has not yet been reached. Go to step 8.

Step 8: Repeat steps 4 to 7.

a) Pivot element

Column identified = X2

Dividing elements in this column by elements in quantity solution;

36/3.4 = 10.6
16/0.4 = 40

The smallest of the answers obtained (10.6) identifies the row of interest.

The point where the identified row and column meet gives the pivot element (3.4).

b) Make pivot element 1 and give the row new identity.

Old row: S1 0 3.4 1 -0.8 36

New row: X2 0/3.4 3.4/3.4 1/3.4 -0.8/3.4 36/3.4

X2 0 1 0.29 -0.24 10.6

Second Simplex Tableau reproduced

Solution    Product variables    Slack variables    Quantity
variable    X1        X2         S1        S2       solution
X2          0         1          0.29      -0.24    10.6
X1          1         0.4        0         0.2      16
Z           0         1          0         -2       -160
Row operations

Old row: X1    1     0.4    0        0.2       16

X2 × 0.4:      0     0.4    0.116    -0.096    4.24

New row: X1    1     0      -0.116   0.296     11.76

Old row: Z     0     1      0        -2        -160

X2 × 1:        0     1      0.29     -0.24     10.6

New row: Z     0     0      -0.29    -1.76     -170.6

c) Third simplex tableau

Third Simplex Tableau

Solution    Product variables    Slack variables    Quantity
variable    X1        X2         S1        S2       solution
X2          0         1          0.29      -0.24    10.6
X1          1         0          -0.116    0.296    11.76
Z           0         0          -0.29     -1.76    -170.6

Since all the elements in the Z row are now negative or zero, the optimal solution has been reached. Thus, the product mix should be:

Product P1 = 11.76 (hundreds of units)

Product P2 = 10.6 (hundreds of units)

Maximum contribution of approximately 170.6, which agrees with the graphical solution.
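The tableau iterations above can be automated. The following minimal Python sketch handles this maximization with ≤ constraints only; it is not a general-purpose solver:

    # Rows: [x1, x2, s1, s2 | quantity]; the last row is the Z row.
    T = [
        [4.0,  5.0, 1.0, 0.0, 100.0],   # S1 row
        [5.0,  2.0, 0.0, 1.0,  80.0],   # S2 row
        [10.0, 5.0, 0.0, 0.0,   0.0],   # Z row (objective coefficients)
    ]

    while max(T[-1][:-1]) > 1e-9:                   # a positive Z entry remains
        col = T[-1][:-1].index(max(T[-1][:-1]))     # entering column
        ratios = [row[-1] / row[col] if row[col] > 1e-9 else float("inf")
                  for row in T[:-1]]
        piv_row = ratios.index(min(ratios))         # leaving row (smallest ratio)
        piv = T[piv_row][col]
        T[piv_row] = [v / piv for v in T[piv_row]]  # make the pivot element 1
        for i in range(len(T)):                     # clear the rest of the column
            if i != piv_row and abs(T[i][col]) > 1e-12:
                f = T[i][col]
                T[i] = [a - f * b for a, b in zip(T[i], T[piv_row])]

    print(-T[-1][-1])   # optimal contribution ≈ 170.6 (x1 ≈ 11.76, x2 ≈ 10.59)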

DUALITY
Every linear program has an opposite program called the dual program. The initially formulated programme is called the primal program. The relationship between the primal and dual programs is that their optimal objective values are the same, and the solution of one can be deduced from the other.
Procedure for determining dual program from primal is:
a) Maximum primal implies minimum dual and vice versa
b) Less or equal to (≤) primal implies greater or equal to (≥) dual and vice versa.
c) Number of variables in the dual program equal number of constraints in the primal and
vice versa.
d) The right hand side of dual constraints inequalities are objective co-efficient in primal
program and vice versa.
e) Constraint coefficients in the dual program are the transpose of the matrix of constraint
co-efficient in the primal.
f) Non-negativity conditions do not change.

Example 1:
Given primal program:
Max, Z = 4 x1 + 2x2 +5x3
Subject to: x1 + 2x2 - x3 ≤ 20 …………………….y1
4 x1 + 8x2 +11x3 ≤ 28 ……………….y2
6 x1 + x2 + 8x3 ≤ 32 ………………....y3
And x1, x2, x3 ≥ 0
Required:
Obtain the dual program
Solution:
Constraints coefficient matrix
1 2 -1
4 8 11
6 1 8
Transposing the above matrix:
1 4 6
2 8 1
-1 11 8
Dual program;
Min, Z = 20y1 + 28y2 + 32y3
Subject to: y1 + 4y2 + 6y3 ≥ 4
2y1 + 8y2 + y3 ≥ 2
-y1 + 11y2 + 8y3 ≥ 5
And y1, y2, y3 ≥ 0
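Rules (a) to (f) are mechanical, so the dual can be generated programmatically. A Python sketch for Example 1; the printed inequalities mirror the dual obtained above:

    A = [[1, 2, -1],
         [4, 8, 11],
         [6, 1,  8]]      # primal constraint coefficients
    c = [4, 2, 5]         # primal objective coefficients (maximize)
    b = [20, 28, 32]      # primal right hand sides

    A_T = [list(col) for col in zip(*A)]   # the transpose gives the dual coefficients

    print("Min W =", " + ".join(f"{bi}y{j + 1}" for j, bi in enumerate(b)))
    for row, ci in zip(A_T, c):
        print(" + ".join(f"{a}y{j + 1}" for j, a in enumerate(row)), ">=", ci)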

Example 2:
Given primal program:
Min, Z = 5x1 + 8x2
Subject to: 2x1 + 3x2 ≥ 5 …………………….y1
4x1 + 10x2 ≥ 19 ………………....y2
x1 + 12x2 ≥ 24 ………………......y3
And x1, x2 ≥ 0
Required:
Obtain the dual program

Constraints coefficient matrix


2 3
4 10
1 12
Transposing the above matrix:
2 4 1
3 10 12

Dual program:
Max, Z = 5y1 + 19y2 + 24y3
Subject to: 2y1 + 4y2 + y3 ≤ 5
3y1 + 10y2 + 12y3 ≤ 8
And y1, y2, y3 ≥ 0
NOTE:
The solution to the dual can be deduced from the solution to the primal using the simplex method. The procedure involves associating the values in the Z-row of the optimal primal tableau with the dual variables, where the first slack variable is associated with the first dual variable, the second slack variable with the second dual variable, and so on.
Example3:
Suppose you have primal program as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4
x1 + 2x2 ≤ 5
And x1, x2 ≥ 0
After performing all the steps involved in the simplex method, the optimal (last) tableau is:

Solution    Products          Slack variables    Quantity
variable    x1       x2       s1        s2       solution
x1          1        0        2/3       -1/3     1
x2          0        1        -1/3      2/3      2
Z           0        0        -1/3      -4/3     -8
Dual program would be:
Min, 𝝻 = 4y1 + 5y2
Subject to: 2y1 + y2 ≥ 2
y1 + 2y2 ≥ 3
And y1, y2 ≥ 0
The solution to the dual program is determined by associating the Z-row values in the
primal optimal tableau corresponding to the slack variables. That is:
y1 = 1/3 which corresponds to s1
y2 = 4/3 which corresponds to s2
Thus, the optimal solution for the dual is:
y1 = 1/3 y2 = 4/3 𝝻=8

SENSITIVITY ANALYSIS
This involves determining the effect that various changes to the primal programme would have on the current solution to the program. It is also called post-optimality analysis.
The various changes that can occur in linear programming problem include:
a) Changes in the coefficient of the objective program.
b) Changes in the availability of resources or the right hand side of the inequalities.
c) Changes in the coefficient of the constraints.
d) Addition of new constraints.

Example 4:
Suppose we have a formulated linear program model as:
Max, Z = 2x1 + 3x2
Subject to: 2x1 + x2 ≤ 4 ……………………………R1
x1 + 2x2 ≤ 5 ……………………………R2
And x1, x2 ≥ 0
Also suppose we are given optimal solution (after solving using simplex of graphical
method) as: x1 = 1 x2 = 2 and Z = 8
a) Supposing the 1st constraint (R1) increases by 20% and the 2nd constraint (R2) increases by 10%, perform a sensitivity analysis to find the new solution and check whether it is a feasible solution.

Solution:
The new solution is given as:
New basic variables = (inverse of the constraints coefficient matrix) × (new right hand side)
But the inverse of a matrix = (1/determinant) × Adjoint
The matrix of the coefficients of the constraints for the problem above is

2   1
1   2

Determinant = (2 × 2) - (1 × 1) = 3
Adjoint = transpose of the cofactor matrix
But the cofactor matrix =

2   -1
-1   2

Transposing the cofactor matrix gives Adjoint =

2   -1
-1   2

Inverse = (1/3) ×

2   -1
-1   2

New right hand side:
New R1 = 4 + (20/100 × 4) = 4.8
New R2 = 5 + (10/100 × 5) = 5.5

Thus, the new basic variables are:
x1 = (1/3)(2 × 4.8 - 1 × 5.5) = (1/3)(4.1) ≈ 1.4
x2 = (1/3)(-1 × 4.8 + 2 × 5.5) = (1/3)(6.2) ≈ 2.1
The new objective value, Z = 2(1.4) + 3(2.1) = 8.9
Since the values of the basic variables are all positive, we can conclude that the new solution is a feasible solution.
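The same check can be scripted. A minimal Python sketch of the sensitivity calculation (plain lists stand in for a matrix library; numpy.linalg.inv would do the same job):

    det = 2 * 2 - 1 * 1                  # determinant of [[2, 1], [1, 2]]
    inv = [[ 2 / det, -1 / det],
           [-1 / det,  2 / det]]         # inverse of the constraint matrix

    new_rhs = [4 * 1.20, 5 * 1.10]       # R1 up 20%, R2 up 10%

    x1 = inv[0][0] * new_rhs[0] + inv[0][1] * new_rhs[1]
    x2 = inv[1][0] * new_rhs[0] + inv[1][1] * new_rhs[1]
    print(x1, x2, 2 * x1 + 3 * x2)       # ≈ 1.4, 2.1, Z ≈ 8.9; all positive: feasible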

REVISION EXERCISE:
1) Using the information given in example 4 above, determine the new optimal solution by
performing the sensitivity analysis when:
a) R1 increases by 10% and R2 decreases by 20%.
b) R1 remain 4 and R2 increases by 30%
c) R1 reduces by 2 units and R2 increase by 3 units.

2) Given primal program as:

Min, Z = 1000x1 + 800 x2


Subject to: 6x1 + 2x2 ≤ 12
12x1 + 4x2 ≤ 24
And x1, x2 ≥ 0
a) Write a dual program
b) Solve the dual programme using simplex method
c) Deduce the solution to the primal program from the dual program
INDEX NUMBERS
An index number is a number which indicates the level of a certain phenomenon at any given date relative to the level of the same phenomenon at some standard date.
It provides an opportunity for measuring the relative change of a variable where
measurement of its actual change is inconvenient or impossible. It is also a series of
numbers by which changes in the magnitudes of a phenomenon are measured from
time to time or from place to place. An index is constructed by selecting a base year as a
starting point. The price or quantity of base year is represented by 100 and those of
other years measured against it.
Uses of index numbers;
a) Price index numbers are used to measure changes in a particular group of prices and
help in comparing the movement of one commodity with another.
b) Index numbers of industrial production provide a measure of change in the level of
industrial production in a country.
c) The quantity index numbers show the rise or fall in the volume of production, volume of
exports and imports.
d) The imports and export prices indices are used to measure the changes in the terms of
trade of a country
e) Used to forecast business conditions of a country and to discover seasonal fluctuations
and business cycles
f) Used to measure enrolment changes and performance of students.

Limitations of index numbers;


a) It is not practicable to price all the goods and services as well as to take into account all
changes in quantity or product.
b) Can be affected by sampling error as we calculate index numbers using samples.
c) In price index numbers the choice of a normal period is difficult as few periods can be
regarded as normal for all segments of the economy.
d) The results obtained by different methods of construction may not quite agree
e) Comparisons of changes in variables over long periods are not reliable

Price index number:


Such an index shows how the value of money is fluctuating, i.e. depreciating or appreciating according as index numbers of prices are rising or falling. A rise in the index number of prices signifies a deterioration in the value of money and vice versa.
Simple index numbers;
These are cases where construction of index numbers involves a single commodity.
Methods used in constructing simple index numbers are;
a. Fixed base method

Here, the base period is fixed and prices of subsequent years are expressed as relatives
of the prices of the base year. A price relative is price of an item in one year relative to
another year i.e.
P1/P0 ×100
Where; P1 = price of current year
P0 = price of base year
Example:
From the following data, compute price index number by taking 2002 as base year.
Year 2002 2003 2004 2005 2006 2007
Price of 8 10 12.5 18 22 25
sugar/ Kg
Solution

Year   Price of sugar/Kg   Price index (P1/P0 × 100)
2002   8                   8/8 × 100 = 100
2003   10                  10/8 × 100 = 125
2004   12.5                12.5/8 × 100 = 156.25
2005   18                  18/8 × 100 = 225
2006   22                  22/8 × 100 = 275
2007   25                  25/8 × 100 = 312.5

b. Chain base method

In this method, the base is not fixed and it changes from year to year. The price of the
previous period is taken as the base period. This method shows whether the rate of
change is rising, falling or constant as well as the extent of change from year to year.
Price index number = (price of the current year)/ (price of previous year) × 100
Example;
Construct the chain base index numbers from the following data.
Year 2002 2003 2004 2005 2006 2007
Price 120 125 140 150 135 160
(Shs)
Solution

Year   Prices (Shs)   Chain base index numbers
2002   120            -
2003   125            125/120 × 100 = 104.17
2004   140            140/125 × 100 = 112.00
2005   150            150/140 × 100 = 107.14
2006   135            135/150 × 100 = 90.00
2007   160            160/135 × 100 = 118.52

Weighted index numbers;
If all commodities selected do not have equal importance for consumers, then a weighted system is adopted. Appropriate weights are assigned to different commodities. An index is called a Weighted Aggregate index when it is constructed for an aggregate of items (prices) that have been weighted in some way (by corresponding quantities produced, consumed or sold), so as to reflect their importance.
The important formulae for constructing weighted index numbers include:
i) Laspeyres Method (L) - The base year quantities/prices are taken as weights. The
method tries to answer the question “what is the change in aggregate value of the base
period list of goods when valued at given period prices?”

P01 = (∑P1q0 / ∑P0q0) × 100
Where: P01 = price index number
P0 = price of the base year
q0 = quantity of the base year
P1 = price of the current year
q1 = quantity of current year
ii) Paasche Method (P) - Here, the current year quantities / prices are taken as weights. It
tries to answer the question, “what would be the value of the given period list of goods
when valued at current period prices?”

P01 = (∑P1q1 / ∑P0q1) × 100

N.B. In the Laspeyres index the weights (q0) are the base year quantities and do not change from one year to the next, unlike the Paasche index, which requires continuous use of new quantity weights for each period considered.
iii) Fisher's Ideal Method - Taken as the geometric mean of the Laspeyres and Paasche indices.

P01 = √[(∑P1q0 / ∑P0q0) × (∑P1q1 / ∑P0q1)] × 100

P01 = √(L × P)
iv) Marshall-Edgeworth method - The current year as well as base year prices and quantities are considered.

P01 = [∑(q0 + q1)P1 / ∑(q0 + q1)P0] × 100

On opening the brackets:

P01 = [(∑P1q0 + ∑P1q1) / (∑P0q0 + ∑P0q1)] × 100
Example:
From the following data, calculate index numbers for 2013 taking 2012 as the base and
using the following formulae;
a) Laspeyres
b) Paasche
c) Fishers
d) Marshall –edge worth
          2012                 2013
          Price    Quantity    Price    Quantity
          (Shs)    (bags)      (Shs)    (bags)
Maize     65       20          135      30
Wheat     95       8           160      7
Beans     150      5           320      8

Solution:

         P0     q0    P1     q1    P1q0    P0q0    P1q1    P0q1
Maize    65     20    135    30    2700    1300    4050    1950
Wheat    95     8     160    7     1280    760     1120    665
Beans    150    5     320    8     1600    750     2560    1200
Totals                             5580    2810    7730    3815
a) Laspeyres index number

P01 = (∑P1q0 / ∑P0q0) × 100 = (5580/2810) × 100 = 198.6

b) Paasche index number

P01 = (∑P1q1 / ∑P0q1) × 100 = (7730/3815) × 100 = 202.6

c) Fisher's index number

P01 = √[(∑P1q0 / ∑P0q0) × (∑P1q1 / ∑P0q1)] × 100
    = √[(5580/2810) × (7730/3815)] × 100
    = 2.0059 × 100
    = 200.6

d) Marshall-Edgeworth index number

P01 = [(∑P1q0 + ∑P1q1) / (∑P0q0 + ∑P0q1)] × 100
    = [(5580 + 7730) / (2810 + 3815)] × 100
    = (13310/6625) × 100
    = 200.9
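All four weighted indices for this example can be reproduced with a short Python sketch; dot is a small helper computing the sum of elementwise products:

    from math import sqrt

    p0, q0 = [65, 95, 150], [20, 8, 5]     # 2012 (base) prices and quantities
    p1, q1 = [135, 160, 320], [30, 7, 8]   # 2013 prices and quantities

    dot = lambda a, b: sum(x * y for x, y in zip(a, b))

    L  = dot(p1, q0) / dot(p0, q0) * 100                 # Laspeyres ≈ 198.6
    P  = dot(p1, q1) / dot(p0, q1) * 100                 # Paasche   ≈ 202.6
    F  = sqrt(L * P)                                     # Fisher    ≈ 200.6
    ME = (dot(p1, q0) + dot(p1, q1)) / \
         (dot(p0, q0) + dot(p0, q1)) * 100               # Marshall-Edgeworth ≈ 200.9
    print(L, P, F, ME)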
REVISION QUESTIONS:
1) Explain uses and limitations of index numbers
2) Given below is a table of four commodities with the corresponding prices and quantities
over the years (2012 and 2013)
TIME

PRODUCT    2012                     2013
           Quantity    Price        Quantity    Price
           (Kg)        (shs)        (Kg)        (shs)
Bread      5           5            7           6.5
Eggs       6           7.75         10          8.8
Soap       4           9.63         6           10.75
Sugar      9           12.5         9           12.75

Calculate:
a) Laspeyre’s price index
b) Paasche price index
c) Fishers price index
DECISION THEORY
Decision making is at the core of businesses and the lives of each person. Some
decisions are major and not made often while other are minor and made often. Success
in business or in life depends on the decisions made. Therefore, what is involved in
good decision making is crucial. Decision theory is an analytical and systematic
approach to the study of decision making.
It’s important to distinguish between a good decision and a bad decision. A good
decision:
 Is based on logic
 Is made after considering all available data and alternatives
 Applies appropriate quantitative techniques
A bad decision misses at least one of these components.
Even though a good decision occasionally does not result in a favourable outcome, it is still a good decision because, used consistently in the long term, it results in successful outcomes. A bad decision may sometimes, by luck, result in a favourable outcome, but it is nonetheless still a bad decision.
There are six steps involved in making any decision, irrespective of how major or minor it is, such as taking a trip to town or investing two million shillings.
a) Clearly define the problem at hand (for example, whether or not produce a new
product x)
b) List possible alternatives (strategies or courses of action) which the decision maker
can choose from. For example, production of x can be from a large plant, a small
plant or some other alternative. Not producing at all, that is doing nothing, is an important alternative. All important alternatives must be considered.
c) Identify possible outcomes. The outcomes that the decision maker has no control
over are termed as states of nature. Since the product is for sale the possible
outcomes are the kind of demand for the product that will exist in the market: the
product might have high demand or it might have low demand. The full ranges of
outcomes have to be considered; pessimistic and optimistic ones.
d) List the payoffs or profit of each combination of alternatives and outcomes. It is clear
that not all decisions can be evaluated on the basis of profit but a way to measure
benefits from different alternatives and outcomes has to be found. Such payoffs are
termed as conditional values. The payoffs are more easily compared when
presented in a payoff matrix, also termed as payoff table or decision table. (see table
1)
e) Select one of the mathematical decision theory models
f) Apply model to make the decision.
Table1: pay off table (matrix) showing conditional values for a manufacturer
State of nature
Strategy or Favourable Unfavourable
alternatives market market
Construct large plant 200,000 -180,000
Construct small plant 100,000 -20,000
Do nothing 0 0
Decision Making Environment for managers:
Managers make decision in environments which can be grouped into four states:
 Certainty
 Risk
 Uncertainty
 Conflict / Game theory
Both decision theory and game theory have the objective of assisting the decision maker
by providing a structure to enable the evaluation of information of the relative
likelihood of different outcomes so that the best course of action can be identified.
a) Environment of Certainty
Certainty exists if all the information required to make a decision is known and available. This is a case of perfect information. Assuming certainty for a problem where all the information is not known with certainty often provides a reasonable approximation of the optimal solution. In this environment it is known for sure which state of nature will occur, and the models used to recommend the best course of action are deterministic models.
b) Environment of Risk
A condition of risk exists if perfect information is not available but the probabilities of certain outcomes can be estimated. Therefore, decision making under risk relies heavily on probability theory. Various stochastic methods have been developed for decision making under conditions of risk, such as queuing theory. In a risk situation the different outcomes available to the decision maker have known probabilities, which can be expressed in a probability distribution or function.
The method of using the expected monetary value (EMV) is the most popular method of decision making under risk. EMV is the probability-weighted sum of possible payoffs for each alternative. In this environment it is not known exactly which state of nature will occur; however, there is sufficient information to estimate the chances of occurrence of the various states of nature. The models used to recommend the best course of action are probabilistic (stochastic) models.
This includes:
i) Maximise expected monetary value
ii) Minimise expected opportunity loss
In either case use the formula: Expected value = Σ(payoff × corresponding probability)
E(X) = Σ X P(X)
Example:
James M is a manager who is contemplating putting up a plant which could be large or small. The market demand is likely to be either favourable or unfavourable. If James constructs a large plant, under a favourable market he is likely to get a profit of 200,000, but if the market demand is unfavourable he makes a loss of 180,000. If he constructs a small plant, under a favourable market he gets a profit of 100,000, but if the market is unfavourable he gets a loss of 20,000. Further, James believes the favourable and unfavourable markets are equally likely. Represent the above information in a decision table and advise the management on which plant to put up based on monetary value and opportunity loss.
Solution:
Decision table:
State of nature
Strategy or Favourable market Unfavourable market (0.5)
alternatives (0.5)
Construct large plant 200,000 - 180,000
Construct small plant 100,000 -20,000
No plant 0 0
Maximise expected monetary value:
Large plant: 200,000 (0.5) + -180,000 (0.5) = 100,000 – 90,000 = 10,000
Small plant: 100,000 (0.5) + -20,000 (0.5) = 50,000 – 10,000 = 40,000
No plant: 0 (0.5) + 0 (0.5) = 0
The decision is to put up the small plant, as it maximises the expected monetary value.
Opportunity loss:
This is the amount one would lose by not taking the best alternative. It is also called the
amount of regret. To obtain the regret table, for each state on nature we get the
difference between the consequences of any alternative and the best possible alternative
i.e.
Opportunity loss table/ regret table:
Options Favourable market Unfavourable market
Large plant 200,000 – 200,000 = 0 0 - -180,000 = 180,000
Small plant 200,000 – 100,000 = 100,000 0 - -20,000 = 20,000
No plant 200,000 – 0 = 200,000 0–0=0
Expected opportunity loss;
Large plant: 0 (0.5) + 180,000 (0.5) = 90,000
Small plant: 100,000 (0.5) + 20,000 (0.5) = 60,000
No plant: 200,000 (0.5) + 0 (0.5) = 100,000
The decision is to put up the small plant, as it minimises the expected opportunity loss.
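Both criteria for this example can be computed together. A minimal Python sketch; best holds the best payoff under each state of nature, from which the opportunity losses follow:

    p = [0.5, 0.5]   # P(favourable), P(unfavourable)
    payoffs = {
        "large plant": [200_000, -180_000],
        "small plant": [100_000,  -20_000],
        "no plant":    [0, 0],
    }

    best = [max(v[i] for v in payoffs.values()) for i in range(2)]

    for act, v in payoffs.items():
        emv = sum(x * pi for x, pi in zip(v, p))
        eol = sum((b - x) * pi for b, x, pi in zip(best, v, p))
        print(act, emv, eol)
    # the small plant wins on both criteria: EMV = 40,000 and EOL = 60,000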
c) Environment under uncertainty

This refers to situations where more than one outcome can result from any single decision. Several methods are used to make decisions in circumstances where only the payoffs are known and the likelihoods of the states of nature are not known.

a) Maximin Method
This criterion is based on the 'conservative approach' of assuming that the worst possible outcome is going to happen. The decision maker considers each strategy, locates the minimum payoff for each, and then selects the alternative which maximizes the minimum payoff.

Illustration
Rank the products A B and C applying the Maximin rule using the following payoff table
showing potential profits and losses which are expected to arise from launching these
three products in three market conditions

Pay off table in £ 000’s


            Boom condition   Steady state   Recession   Row minima
Product A   +8               +1             -10         -10
Product B   -2               +6             +12         -2
Product C   +16              0              -26         -26

Table 1
Ranking using the MAXIMIN rule = B A C

b) MAXIMAX method
This method is based on ‘extreme optimism’ the decision maker selects that particular
strategy which corresponds to the maximum of the maximum pay off for each strategy

Illustration
Using the above example
Max. profits row maxima
Product A +8
Product B +12
Product C +16

Ranking using the MAXIMAX method = CBA

c) MINIMAX regret method


This method assumes that the decision maker will experience ‘regret’ after he has made
the decision and the events have occurred. The decision maker selects the alternative
which minimizes the maximum possible regret.

Illustration
Regret table in £ 000's

            Boom condition   Steady state   Recession   Row maxima
Product A   8                5              22          22
Product B   18               0              0           18
Product C   0                6              38          38

A regret table (table 2) is constructed based on the payoff table. The regret is the 'opportunity loss' from taking one decision given that a certain contingency occurs; in our example, whether there is boom, steady state or recession.
The ranking using the MINIMAX regret method = B A C
d) The expected monetary value method
The expected pay off (profit) associated with a given combination of act and event is
obtained by multiplying the payoff for that act and event combination by the probability
of occurrence of the given event. The expected monetary value (EMV) of an act is the
sum of all expected conditional profits associated with that act

Example
A manager has a choice between
i. A risky contract promising shs 7 million with probability 0.6 and shs 4 million
with probability 0.4 and
ii. A diversified portfolio consisting of two contracts with independent outcomes
each promising Shs 3.5 million with probability 0.6 and shs 2 million with
probability 0.4
Can you arrive at the decision using EMV method?

Solution
The conditional payoff table for the problem may be constructed as below (shillings in millions):

Event   Probability   Conditional payoffs        Expected payoffs
Ei      P(Ei)         Contract    Portfolio      Contract    Portfolio
E1      0.6           7           3.5            4.2         2.1
E2      0.4           4           2              1.6         0.8
                                  EMV            5.8         2.9

Using the EMV method the manager must go in for the risky contract which will yield
him a higher expected monetary value of shs 5.8 million

e) Expected opportunity loss (EOL) method


This method is aimed at minimizing the expected opportunity loss (EOL). The decision maker chooses the strategy with the minimum expected opportunity loss.

f) The Hurwitz method


This method uses the concept of a coefficient of optimism (or pessimism) introduced by L. Hurwicz. The decision maker takes into account both the maximum and minimum payoffs for each alternative and assigns them weights according to his degree of optimism (or pessimism). The alternative which maximizes the sum of these weighted payoffs is then selected.

g) The Laplace method


This method uses all the information by assigning equal probabilities to the possible
payoffs for each action and then selecting that alternative which corresponds to the
maximum expected pay off
Example
A company is considering investing in one of three investment opportunities A, B and C under certain economic conditions. The payoff matrix for this situation is:

Economic condition
Investment opportunities    1 (£)     2 (£)     3 (£)
A                           5000      7000      3000
B                           -2000     10000     6000
C                           4000      4000      4000

Determine the best investment opportunity using the following criteria


i. Maximin
ii. Maximax
iii. Minimax regret
iv. Hurwicz (Alpha = 0.3)

Solution
Economic condition
Investment opportunities    1 (£)     2 (£)     3 (£)    Minimum £   Maximum £
A                           5000      7000      3000     3000        7000
B                           -2000     10000     6000     -2000       10000
C                           4000      4000      4000     4000        4000
i. Using the Maximin rule Highest minimum = £ 4000
Choose investment C
ii. Using the Maximax rule Highest maximum = £ 10000
Choose investment B
iii. Minimax Regret rule

        1        2        3        Maximum regret
A       0        3000     3000     3000
B       7000     0        0        7000
C       1000     6000     2000     6000

Choose the minimum of the maximum regrets, i.e. £3000.

Choose investment A.
iv. Hurwicz rule: expected values
For A: (7000 × 0.3) + (3000 × 0.7) = 2100 + 2100 = £4200
For B: (10000 × 0.3) + (-2000 × 0.7) = 3000 - 1400 = £1600
For C: (4000 × 0.3) + (4000 × 0.7) = 1200 + 2800 = £4000
The best outcome is £4200, so choose investment A.
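The four criteria applied to this payoff matrix can be verified with a minimal Python sketch, where alpha is the Hurwicz coefficient of optimism used above:

    payoffs = {"A": [5000, 7000, 3000],
               "B": [-2000, 10000, 6000],
               "C": [4000, 4000, 4000]}
    alpha = 0.3

    maximin = max(payoffs, key=lambda k: min(payoffs[k]))
    maximax = max(payoffs, key=lambda k: max(payoffs[k]))

    best = [max(v[j] for v in payoffs.values()) for j in range(3)]
    regret = {k: max(b - x for b, x in zip(best, v)) for k, v in payoffs.items()}
    minimax_regret = min(regret, key=regret.get)

    hurwicz = {k: alpha * max(v) + (1 - alpha) * min(v) for k, v in payoffs.items()}

    print(maximin, maximax, minimax_regret, max(hurwicz, key=hurwicz.get))
    # -> C, B, A, A, matching the choices above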
GAME THEORY
 Game theory is used to determine the optimum strategy in a competitive situation.
 When two or more competitors are engaged in making decisions, it may involve
conflict of interest.
 In such a case the outcome depends not only upon an individual’s action but also
upon the action of others.
 Both competing sides face a similar problem. Hence game theory is a science of
conflict
Game theory does not concern itself with finding an optimum strategy but it helps to improve
the decision process.
Game theory has been used in business and industry to develop:
 bidding tactics,
 pricing policies,
 advertising strategies,
 timing of the introduction of new models in the market e.t.c.

RULES/ ASSUMPTIONS OF GAME THEORY


i. The number of competitors is finite
ii. There is conflict of interests between the participants
iii. Each of these participants has available to him a finite set of available courses of
action i.e. choices
iv. The rules governing these choices are specified and known to all players
v. While playing each player chooses a course of action from a list of choices
available to him.
vi. The outcome of the game is affected by the choices made by all of the players. The choices are made simultaneously, so that no competitor knows his opponent's choice until he is already committed to his own.
vii. The outcome for all specific choices by all the players is known in advance and numerically defined.

NOTE: When a competitive situation meets all these criteria above we call it a game. Only
in a few real life competitive situation can game theory be applied because all the rules are
difficult to apply at the same time to a given situation.

LIMITATIONS OF GAME THEORY:


a) Most of the competitive situations in which managerial decisions are made are never
really a two-person games because the government and or society are present as the
third and /or fourth persons in the game.
b) There are many situations in the managerial decisions environment when both the
competitors may lose or gain i.e. it may not be a zero-sum game.
c) In real life game, the two competitors rarely have equal information or intelligence.
d) The technique of solving games involving mixed strategies, particularly in the case of larger payoff matrices, is very complicated. This limits the application of this analysis.

DEFINITION OF TERMS:
Game: It is an activity between two or more persons involving actions by each one of
them according to a set of rules which results in some gain for each. If in a game the
actions are determined by skills, it is called game of strategy but if they are determined
by chance it is termed as a game of chance.

Player: Is each participant or competitor playing a game.


Play: A play of the game is said to occur when each player chooses one of his courses of
action.

Strategy: It is the total pattern of choices employed by any player. It’s a complete set of
plan of action specifying precisely what the player will do under every possible future
contingency that might occur during the play of the game. Two types of strategies are:
a) Pure strategy - A situation where each player in the game adopts a single strategy as an optimal strategy. Here the value of the game is the same for both players.
b) Mixed strategy - A player adopts a mixture of strategies if the game is played many times. In this case the player uses a combination of strategies, and each player keeps guessing as to which course of action will be selected by the other player on a particular occasion. Thus there is a probabilistic situation, and the objective of each player is to maximize expected gains or to minimize expected losses.

Example
Two players X and Y have two alternatives each. They show their choices by pressing
two types of buttons in front of them but they cannot see the opponents move. It is
assumed that both players have equal intelligence and both intend to win the game.
This sort of simple game can be illustrated in tabular form as follows:
Player Y
Player X Button r Button t
Button m X wins 2 points X wins 3 points
Button n Y wins 2 points X wins 1 point

The game is biased against Y because if player X presses button ‘m’ he will always win.
Hence Y will be forced to press button r to cut down his losses

Alternative example
Player Y
Player X Button r Button t
Button m X wins 3 points Y wins 4 points
Button n Y wins 2 points X wins 1 point

In this case X will not be able to press button ‘m’ all the time in order to win (or button ‘n’).
Similarly Y will not be able to press button ‘r’ or button‘t’ all the time in order to win. In
such a situation each player will exercise his choice for part of the time based on the
probability.

STANDARD CONVENTIONS IN GAME THEORY:


Consider the following table:

           Y
X     3    -4
     -2     1

(Assuming X wins on +ve and Y wins on -ve)
X plays row I, Y plays column I: X wins 3 points
X plays row I, Y plays column II: X loses 4 points
X plays row II, Y plays column I: X loses 2 points
X plays row II, Y plays column II: X wins 1 point

3, -4, -2, 1 are the known pay offs and here the game has been represented in the form of a
matrix. When the games are expressed in this fashion the resulting matrix is commonly
known as PAYOFF MATRIX.

STRATEGY:
It refers to a total pattern of choices employed by any player. Strategy could be pure or a
mixed.
 In a pure strategy, player X will play one row all of the time or player Y will also
play one of the column all the time.
 In a mixed strategy, player X will play each of his rows a certain portion of the time
and player Y will play each of his columns a certain portion of the time.

VALUE OF THE GAME:


Refers to the average pay off per play of the game over an extended period of time.

a) Pure strategy Game


Example
Determine the optimum strategies for the two players X and Y and find the value of the
game from the following pay off matrix
Player Y
            3   -1    4    2
Player X   -1   -3   -7    0
            4   -7    3   -9

Strategy: assume the worst and act accordingly. If X plays his first row, Y will play his 2nd column to win 1 point; similarly, if X plays his 2nd row, Y will play his 3rd column to win 7 points; and if X plays his 3rd row, Y will play his 4th column to win 9 points.

In this game X cannot win, so he should adopt the first-row strategy in order to minimize his losses.
This decision rule is known as the 'maximin strategy', i.e. X chooses the highest of these minimum payoffs.

Using the same reasoning from the point of view of Y:


If Y plays with his 1st column, then X will play his 3rd row to win 4 points
If Y plays with his 2nd column, then X will play his 1st row to lose 1 point
If Y plays with his 3rd column, then X will play his 1st row to win 4 points
If Y plays with his 4th column, then X will play his 1st row to win 2 points

Thus player Y will make the best of the situation by playing his 2nd column which is a
‘Minimax strategy’
This game is also a game of pure strategy, and the value of the game is -1 (a win of 1 point per game to Y). Using matrix notation, the solution is shown below:

Player Y
                                     Row minimum
            3   -1    4    2         -1
Player X   -1   -3   -7    0         -7
            4   -7    3   -9         -9
Column
maximum     4   -1    4    2

In this case the value of the game is -1.

The minimum of the column maximums is -1, and the maximum of the row minimums is also -1.
The best strategies are: for player X, the 1st row; for player Y, the 2nd column.
Saddle Point
The saddle point in a pay off matrix is one which is the smallest value in its row and the
largest value in its column. It is also known as equilibrium point in the theory of games.

Saddle point also gives the value of such a game. In a game having a saddle point, the
optimum strategy for both players is to play the row or column containing the saddle
point.

Note: if in a game there is no saddle point the players will resort to what is known as
mixed strategies.

b) Mixed Strategies
Example
Find the optimum strategies and the value of the game from the following pay off matrix
concerning two person game
Player Y
           1   4
Player X
           5   3

In this game there is no saddle point.

Let Q be the proportion of time player X spends playing his 1st row and 1-Q be the proportion of time he spends playing his 2nd row.
Similarly, let R be the proportion of time player Y spends playing his 1st column and 1-R be the proportion of time he spends playing his 2nd column.
The following matrix shows this strategy:

                  Player Y
                  R       1-R
Player X    Q     1       4
            1-Q   5       3

X's strategy
X would like to divide his play between his rows in such a way that his expected winnings or losses when Y plays the 1st column are equal to his expected winnings or losses when Y plays the 2nd column.

Column 1
Points Proportion played Expected winnings
1 Q Q
5 1-Q 5(1-Q)

Total = Q + 5(1 –Q)


Column 2
Points Proportion played Expected winnings
4 Q 4Q
3 1-Q 3(1-Q)

Total = 4Q + 3(1 –Q)


Therefore Q + 5(1-Q) = 4Q + 3(1-Q)
Giving Q = 2/5 and (1-Q) = 3/5
This means that player X should play his first row 2/5 of the time and his second row 3/5 of the time.
Using the same reasoning,
1×R + 4(1-R) = 5R + 3(1-R)
Giving R = 1/5 and (1-R) = 4/5
This means that player Y should divide his time between his first column and second column in the ratio 1:4.

                Player Y
                1/5     4/5
Player X  2/5    1       4
          3/5    5       3

Short cut method of determining mixed matrices


Player Y
           1   4
Player X
           5   3

Step I
Subtract the smaller payoff in each row from the larger one, and the smaller payoff in each column from the larger one:

Row differences:    4 - 1 = 3 (row I);    5 - 3 = 2 (row II)
Column differences: 5 - 1 = 4 (column I);    4 - 3 = 1 (column II)

Step II
Interchange each of these pairs of subtracted numbers found in Step I:

The row ratio becomes 2 : 3 and the column ratio becomes 1 : 4.

Thus player X plays his two rows in the ratio 2:3, and player Y plays his columns in the ratio 1:4.
This is the same result as calculated before.

To determine the value of the game in mixed strategies


In a simple 2 x 2 game without a saddle point, each player's strategy consists of two probabilities denoting the proportion of the time he spends on each of his rows or columns. Since each player plays in a random pattern, the probabilities are listed below:

Pay off   Strategies which produce this pay off   Joint probability
1         Row I, Column I                         2/5 × 1/5 = 2/25
4         Row I, Column II                        2/5 × 4/5 = 8/25
5         Row II, Column I                        3/5 × 1/5 = 3/25
3         Row II, Column II                       3/5 × 4/5 = 12/25

Expected value (or value of the game)

Pay off   Probability p(x)   Expected value x·p(x)
1         2/25               2/25
4         8/25               32/25
5         3/25               15/25
3         12/25              36/25

∑x·p(x) = 85/25 = 17/5 = 3.4

3.4 is the value of the game.
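The mixed strategies and the value of the game can be checked in Python using the short-cut (odds) method above. A sketch for a 2 x 2 game [[a, b], [c, d]] with no saddle point:

    from fractions import Fraction as F

    a, b, c, d = 1, 4, 5, 3   # the payoff matrix above

    q = F(abs(c - d), abs(a - b) + abs(c - d))   # X plays row I 2/5 of the time
    r = F(abs(b - d), abs(a - c) + abs(b - d))   # Y plays column I 1/5 of the time

    value = (q * r * a + q * (1 - r) * b
             + (1 - q) * r * c + (1 - q) * (1 - r) * d)
    print(q, r, value)   # 2/5, 1/5 and 17/5 = 3.4, as found above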

DOMINANCE
The concept of a dominated strategy is useful for reducing the size of the payoff table.

Rule of dominance
i. If all the elements in a column are greater than or equal to the corresponding
elements in another column, then the column is dominated.

ii. Similarly if all the elements in a row are less than or equal to the corresponding
elements in another row, then the row is dominated.

Dominated rows and columns may be deleted which reduces the size of the game to a 2 by
2 game.

N.B. Always look for dominance then saddle points first when solving a game problem.

Example:
Determine the optimum strategies and the value of the game from the following 2 x m payoff matrix game for X and Y:

      Y
    6   3   1   0   3
X
    3   2   4   2   1

In this matrix, columns I, II and III are dominated by column IV (each of their entries is greater than or equal to the corresponding entry of column IV), hence Y will not play these columns.

So the game is reduced to a 2×2 matrix consisting of columns IV and V, which can be solved using the methods already discussed:

      Y
    0   3
X
    2   1
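The dominance check can be done mechanically. A minimal Python sketch (Y is the minimizing player, so a column is dominated when every entry is greater than or equal to the corresponding entry of some other column):

    matrix = [[6, 3, 1, 0, 3],
              [3, 2, 4, 2, 1]]

    cols = list(zip(*matrix))
    keep = [j for j, c in enumerate(cols)
            if not any(all(x >= y for x, y in zip(c, other))
                       for k, other in enumerate(cols) if k != j and other != c)]

    reduced = [[row[j] for j in keep] for row in matrix]
    print(keep, reduced)   # columns IV and V survive: [[0, 3], [2, 1]]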
