Introduction To Statistics DaDU

Contents
CHAPTER ONE
1. INTRODUCTION
1.1 Definition and Classification of Statistics
1.2 Stages in Statistical Investigation
1.3 Definition of some basic statistical terms
1.4 Application, Uses and Limitation of Statistics
1.5.1 Types of Variables
1.5.2 Scale of Measurement
CHAPTER TWO
2. METHODS OF DATA COLLECTION AND PRESENTATION
2.1 Methods of Data Collection
2.1.1 Source of Data
2.2 Methods of Data Presentation
2.2.1 Tabular Presentation
2.2.2 Diagrammatic and Graphic Presentation of Data
CHAPTER 3
3. MEASURES OF CENTRAL TENDENCY
3.1 Introduction
3.2 Types of Measures of Central Tendency
3.2.1 Arithmetic Mean
3.2.2 Geometric Mean
3.2.3 Harmonic Mean
3.2.4 The Mode or Modal Value
3.2.5 Median
3.2.6 Quantiles
CHAPTER FOUR
4. MEASURES OF VARIATION
4.1 Introduction
4.2 Objectives of Measuring Variation
4.3 Absolute and Relative Measures of Dispersion
4.3.1 The Range and Relative Range
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
4.3.3 The Mean Deviation and Coefficient of Mean Deviation
4.5 The Standard Score (Z-score)
4.6 Moments, Skewness and Kurtosis
CHAPTER FIVE
5. ELEMENTARY PROBABILITY
5.1 INTRODUCTION
5.3 Counting Rules
5.4 Approaches to Measuring Probability
5.5 Conditional Probability and Independency
5.5.1 Conditional Events
5.5.2 Conditional Probability of an Event
Review Exercise on Chapter Five
CHAPTER SIX
6. RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Introduction
6.1 Random Variable
6.2 Probability Distribution
6.3 Introduction to Expectation
6.4 Common Discrete Probability Distributions
6.5 Common Continuous Probability Distributions
7.1 Introduction
7.2 Definitions of Some Basic Terms in Sampling
7.3 Sampling Techniques
7.3 Sampling Distribution
7.4 The Central Limit Theorem
CHAPTER EIGHT
8. ESTIMATION AND HYPOTHESIS TESTING
8.1 Introduction
8.2 Statistical Estimation
8.2.1 Point Estimation
8.2.2 Interval Estimation
8.2 Hypothesis Testing
8.3 Types and Size of Errors
8.4 Test of Association
CHAPTER NINE
9. SIMPLE LINEAR REGRESSION AND CORRELATION
9.1 Introduction
9.2 Simple Linear Regression
9.3 Simple Correlation and Coefficient of Determination
9.3.1 Simple correlation (r)
9.3.2 Coefficient of Determination (r²)
9.4 Spearman's Rank Correlation Coefficient
APPENDIX A
APPENDIX B
Answers for Exercises
REFERENCES
CHAPTER ONE
1. INTRODUCTION
Objectives
At the end of this chapter, students will be able to:
• define statistics, population, census, sample survey, parameter and variable
• distinguish between descriptive statistics and inferential statistics
• identify the types of variables and levels of measurement
• identify the applications, uses, and limitations of statistics
1.1 Definition and Classification of Statistics

We can define the word statistics in two ways.
Definition 1. Statistics (plural sense): statistics are numerical statements of
facts in any department of enquiry, placed in relation to each other. They are
the classified facts respecting the condition of the people in a state,
especially those facts which can be stated in numbers, in tables of numbers, or
in any tabular or classified arrangement. In this sense, statistics are the raw
data themselves, such as numerical data on births, marital status, deaths,
transmitted diseases, employment and unemployment rates, production, prices,
and the inflation rate.
Definition 2. Statistics (singular sense): statistics is defined as the science
of collecting, organizing, presenting, analyzing and interpreting numerical data
for the purpose of assisting in making more effective decisions.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data
are used.
a) Descriptive Statistics: the area of statistics mainly concerned with the
methods and techniques used in the collection, organization, presentation, and
analysis of a set of data, without drawing any conclusions or inferences beyond
the data themselves.
Example 1.2: Suppose that the marks of 10 students in the Statistics course for
sport science section A are 55, 40, 50, 60, 78, 90, 80, 75, 70 and 85. The
average mark of the 10 students is 68.3, and this average is a descriptive
statistic.
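The average in Example 1.2 can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the course material.

```python
from statistics import mean

# Marks of the 10 students, copied from Example 1.2.
marks = [55, 40, 50, 60, 78, 90, 80, 75, 70, 85]

# The arithmetic mean summarizes these marks without inferring anything
# about students outside this group, so it is a descriptive statistic.
average_mark = mean(marks)

print(f"Number of students: {len(marks)}")
print(f"Average mark: {average_mark}")
```

The sum of the ten marks is 683, so dividing by 10 gives the 68.3 quoted in the example.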
b) Inferential Statistics: the area of statistics which deals with methods of
inferring or drawing conclusions about a population based upon the results of a
sample. It consists of performing hypothesis tests, determining relationships
among variables and making predictions.
Example 1.3: The average income of all families in Ethiopia (the population) can
be estimated from figures obtained from a few hundred families (the sample).
• Inferential statistics is important because statistical data usually arise
from samples.
1.2 Stages in Statistical Investigation

A statistical investigation passes through five stages: collection,
organization, presentation, analysis and interpretation of data.
i. Collection of data: This is the process of obtaining measurements or counts,
that is, obtaining raw data.
Data can be collected in a variety of ways; one of the most common is a sample
survey or a census. A survey can itself be conducted in different ways, and
three of the most common are:
• Telephone survey
• Mailed questionnaire
• Personal interview
ii. Organization of data: Data collected from published sources are generally
already in organized form. However, if an investigator has collected data
through a survey, it is necessary to edit these data in order to correct any
apparent inconsistencies, ambiguities, and recording errors. This phase also
includes grouping the data into classes and tabulating them.
iii. Presentation of data: After the data have been collected and organized,
they can be presented in the form of tables, charts, diagrams and graphs.
Presenting data in an orderly manner makes them easier to understand and to
analyze.
iv. Analysis of data: The basic purpose of data analysis is to dig out useful
information for decision making. The analysis may simply be a critical
observation of the data to draw some meaningful conclusions about them, or it
may involve highly complex and sophisticated mathematical techniques.
v. Interpretation of data: Interpretation means drawing conclusions from the
data collected and analyzed. Correct interpretation leads to a valid conclusion
of the study and thus can aid in decision making.
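The five stages can be walked through on a small data set. The sketch below reuses the ten marks from Example 1.2; the class width of 20 and all names in the code are assumptions made only for illustration.

```python
from collections import Counter
from statistics import mean

# i. Collection: raw marks copied from Example 1.2.
raw_marks = [55, 40, 50, 60, 78, 90, 80, 75, 70, 85]

# ii. Organization: sort the marks and group them into classes.
# The class width of 20 is an arbitrary choice made for this sketch.
CLASS_WIDTH = 20


def class_label(mark):
    """Return the class a mark falls in, e.g. 55 -> '40-59'."""
    lower = (mark // CLASS_WIDTH) * CLASS_WIDTH
    return f"{lower}-{lower + CLASS_WIDTH - 1}"


ordered_marks = sorted(raw_marks)
class_counts = Counter(class_label(mark) for mark in ordered_marks)

# iii. Presentation: print a small frequency table.
print("Class   Frequency")
for label, count in class_counts.items():
    print(f"{label:<8}{count}")

# iv. Analysis: compute summary measures.
average_mark = mean(raw_marks)
modal_class = max(class_counts, key=class_counts.get)

# v. Interpretation: state what the numbers say about these students.
print(f"The average mark is {average_mark}; "
      f"most marks fall in the class {modal_class}.")
```

In a real investigation each stage is far more involved, but the order of the steps is the same.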
1.3 Definition of some basic statistical terms
Population: A population is the totality of things, objects, people, etc.
about which information is being collected. It is the totality of
observations with which the researcher is concerned. The population
represents the target of an investigation, and the objective of the
investigation is to draw conclusions about the population; hence we
sometimes call it the target population.
Examples
• Population of trees under specified climatic conditions
• Population of animals fed a certain type of diet
• Population of farms having a certain type of natural fertility
• Population of households, etc.
The population may be finite or infinite (an infinite population being an
imaginary collection of units).
There are two ways of investigating a population: census and sample survey.
Census: A census survey studies the whole population rather than a sample. It
requires a great deal of time, money and energy, so trying to study the entire
population is in most cases technically and economically infeasible. To solve
this problem, we take a representative sample out of the population, on the
basis of which we draw conclusions about the entire population.
Sample survey: A sample is a subset or part of a population selected to
draw conclusions about the population. A sample survey therefore:
• helps to estimate the parameters of a large population;
• is cheaper, more practical, and more convenient than a census;
• saves time and energy;
• is easy to handle and analyze.
Sampling: The process of selecting a sample from the population is called
sampling.
Parameter: A characteristic or measure obtained from a population. It is the
population measurement used to describe the population.
Example: population mean and population standard deviation.
Statistic: A characteristic or measure obtained from a sample. It is a measure
used to describe the sample.
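The difference between a parameter and a statistic can be seen in a small sketch. Purely for illustration, it treats the ten marks of Example 1.2 as if they were an entire population; the sample size of 4 and the random seed are arbitrary assumptions.

```python
import random
from statistics import mean

# Assumption for this sketch: these ten marks ARE the whole population.
population = [55, 40, 50, 60, 78, 90, 80, 75, 70, 85]

# Parameter: a measure computed from every unit in the population.
population_mean = mean(population)

# Sampling: select 4 units at random without replacement.
# The seed is fixed only so that the sketch is repeatable.
rng = random.Random(1)
sample = rng.sample(population, 4)

# Statistic: the same measure computed from the sample only.
sample_mean = mean(sample)

print(f"Population mean (parameter): {population_mean}")
print(f"Sample: {sample}")
print(f"Sample mean (statistic): {sample_mean}")
```

The parameter is a single fixed number, while the statistic changes from one sample to the next; a different seed would generally give a different sample mean.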
Sampling frame: A list of people, items or units from which the sample is taken.
Sample size: The number of elements or observations to be included in the
sample.
Data: A collection of related facts and figures from which conclusions may be
drawn.
Variable: An item of interest that can take on many different numerical
values.
1.4 Application, Uses and Limitation of Statistics
Application
No research activity can take place without the knowledge and application of
statistics. For example, statistics is used:
• to control the quality of products in a given production process
• to compare the breaking strength of two types of minerals
• to determine the reliability of a product (how often does the product
fail?); a product that requires frequent repairs is unreliable
• to compare the improvement in yield due to certain additives such as
fertilizers, herbicides and pesticides
Function/Uses of Statistics
Today the field of statistics is recognized as a highly useful decision-making
tool by managers in modern business, industry and fast-changing technology. It
has many functions in everyday activities. The following are some uses of
statistics:
a. It condenses and summarizes a mass of data: the original set of data
(raw data) is normally voluminous and disorganized until it is
summarized and expressed in a few presentable, understandable and precise
figures. Complex data may be reduced to totals, averages,
percentages, etc. and presented either graphically or diagrammatically.
These devices help us to grasp quickly the significant
characteristics of the numerical data. Single figures like averages and
percentages can be grasped more easily than a mass of statistical data.
b. Statistics facilitates comparison of data: measures obtained from
different sets of data can be compared to draw conclusions about those sets.
Certain facts, by themselves, may be meaningless unless they can be
compared with similar facts at other places or at other periods of
time.
Example: we estimate the national income of Ethiopia not essentially for
the value of that fact itself but mainly in order to compare the income of
today with that of the past, and thus draw conclusions as to whether the
standard of living of the people is increasing, decreasing or stationary.
Some of the tools of comparison provided by statistics are totals,
ratios, averages (measures of central tendency), measures of variation,
graphs, diagrams and coefficients.
c. Statistics helps to predict future trends: statistics is very useful for
analyzing past and present data and for forecasting future events.
d. Statistics helps to formulate and review policies: statistics provides the
basic material for framing suitable policies. The results of statistical
studies in areas such as taxation, the unemployment rate, inflation, and the
performance of every sort of military equipment may convince a government to
review its policies and plans with a view to meeting national needs and
aspirations.
e. Formulating and testing hypotheses: statistical methods are extremely
useful in formulating and testing hypotheses and in developing new theories.
Limitations of Statistics
The field of statistics, though widely used in all areas of human knowledge and
widely applied in a variety of disciplines such as engineering, economics and
research, has its own limitations. Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics
deals with aggregates of facts. For example, the wage earned by an individual
worker at any one time, taken by itself, does not constitute statistics.
b) It does not deal with qualitative characteristics directly: statistics is not
directly applicable to qualitative characteristics such as beauty, honesty,
poverty, standard of living and so on, since these cannot be expressed in
quantitative terms. These characteristics can, however, be dealt with
statistically if quantitative values can be assigned to them by some logical
criterion. For example, intelligence may be compared to some degree by comparing
IQs or other scores in certain intelligence tests.
c) Statistical conclusions are not universally true: since statistics, unlike
the natural sciences, is not an exact science, statistical conclusions are
true only under certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence
of a proper understanding of the subject matter.
1.5 Types of Variables & Measurement Scales

1.5.1 Types of Variables
A variable is a condition that can differ from one case to another or from one
object to another. It is a quantity which shows variability and takes on
different values.
Example: weight, height, age, production, blood pressure, heart beat, number
of patients in a given hospital, sex, etc.
There are two types of variables
A. Qualitative variables:
These are variables that can be placed into distinct categories according to
some characteristic.
Qualitative variables are nonnumeric and cannot be measured or counted.
Example: religion, gender, race, beauty, place of birth, ethnic group, type of
drug, stages of breast cancer (I, II, III, or IV), degree of pain (minimal,
moderate, severe or unbearable).
B. Quantitative variables:
Quantitative variables are variables that can be quantified, that is, they take
numerical values obtained by measuring or counting.
Example: weight, height, age, production, blood pressure, heart beat, number of
patients in a given hospital, etc.
Quantitative variables are of two types: discrete and continuous.
i. Discrete variables: variables which can assume only a specific
number of values. Discrete variables are the result of counting, and their
values are usually whole numbers.
Example: the number of items purchased, the number of HIV patients in different
years, the number of students in Assosa University, number of chairs, number of
accidents in a given year, number of defective items in a given production
process, number of employees, number of family members, etc.
ii. Continuous variables: variables that can take any value within an interval.
The values of continuous variables are obtained by measurement.
Example: weight, height, blood pressure, age, expenditure, production, rainfall
and, generally, any measurable quantity.
1.5.2 Scale of Measurement

Proper knowledge about the nature and type of data to be dealt with is essential
in order to specify and apply the proper statistical method for their analysis
and for inference. The measurement scale of a variable is determined by which of
three properties the values assigned to the data possess: order, distance and
fixed zero.
In mathematical terms, measurement is a functional mapping from the set of
objects {O_i} to the set of real numbers {M(O_i)}.
The goal of a measurement system is to structure the rule for assigning numbers
to objects in such a way that the relationships between the objects are
preserved in the numbers assigned to them. The different kinds of relationships
preserved are called properties of the measurement system.
Order
The property of order exists when an object that has more of the attribute than
another object is given a bigger number by the rule system. This relationship
must hold for all objects in the "real world". The property of order exists when,
for all i, j, if O_i > O_j, then M(O_i) > M(O_j).
Distance
The property of distance is concerned with the relationship of differences
between objects. If a measurement system possesses the property of distance, the
unit of measurement means the same thing throughout the scale of numbers. That
is, an inch is an inch, no matter where it falls, immediately ahead or a mile
down the road.
More precisely, an equal difference between two numbers reflects an equal
difference in the "real world" between the objects that were assigned the
numbers. In order to define the property of distance in mathematical notation,
four objects are required: O_i, O_j, O_k, and O_l. The difference between
objects is represented by the "-" sign; O_i - O_j refers to the actual "real
world" difference between object i and object j, while M(O_i) - M(O_j) refers to
the difference between the numbers assigned to them.
The property of distance exists if, for all i, j, k, l,
if O_i - O_j ≥ O_k - O_l, then M(O_i) - M(O_j) ≥ M(O_k) - M(O_l).
Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has
none of the attribute in question is assigned the number zero by the system of
rules. The object does not need to really exist in the "real world", as it is
somewhat difficult to visualize a "man with no height". The requirement for a
rational zero is this: if objects with none of the attribute did exist, would
they be given the value zero? Defining O_0 as the object with none of the
attribute in question, the definition of a rational zero becomes: the property
of fixed zero exists if M(O_0) = 0.
Scale Types
Measurement is the assignment of values to objects or events in a systematic
fashion. Four levels of measurement scales are commonly distinguished:
nominal, ordinal, interval, and ratio, and each possesses different properties
of measurement systems. The first two are qualitative while the last two are
quantitative.
Nominal scale: Nominal scales are measurement systems that possess
none of the three properties stated above. The nominal scale applies to data
that are used for category identification. The nominal level of measurement is
characterized by data that consist of names, labels, or categories only. Nominal
scale data cannot be arranged in an ordering scheme, and the arithmetic
operations of addition, subtraction, multiplication, and division are not
performed on nominal data. On this scale the categories are merely different
from one another and are not interchangeable; ranking, ordering and comparisons
of magnitude (<, >) are impossible, and only equality or difference of category
can be established.
Example 1.4: eye color (brown, black, others), sex (male, female), political
party preference (Republican, Democrat, or others), marital status (married,
single, widowed, divorced), regions of Ethiopia.
Ordinal scale: Ordinal scales are measurement systems that possess the
property of order, but not the property of distance. The property of fixed zero
is not important if the property of distance is not satisfied. Nominal and
ordinal scales are sometimes collectively called categorical scales; an ordinal
scale, however, provides additional information. In addition to classifying
cases, it allows them to be ordered or ranked by degree according to
measurements of the variable. Arithmetic operations (+, -, *, ÷) are not
applicable, but relational operations (<, >) are.
Example 1.5: Letter grading (A, B, C, D, F), rating scales (excellent, very
good, good), etc.
Interval scale: Interval scales are measurement systems that possess the
properties of order and distance, but not the property of fixed zero. This is
the level of measurement that classifies data which can be ranked and for which
differences are meaningful. However, there is no meaningful zero, so ratios are
meaningless. Multiplication and division are therefore not meaningful, but
addition, subtraction and relational operations are applicable.
Note: Celsius and Fahrenheit temperature readings have no meaningful zero, so
their ratios are meaningless. The zero point of an interval scale does not
indicate an absence of the measured quantity. For example, 0℃ does not mean the
absence of temperature.
Example 1.5: IQ, Temperature.
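The claim that ratios are meaningless on an interval scale can be checked numerically. The sketch below uses three hypothetical temperature readings and the standard Celsius-to-Fahrenheit conversion; the function and variable names are chosen only for this illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Hypothetical readings in degrees Celsius.
cool, warm, hot = 10, 20, 40

# Ratio of two readings: depends on the unit, so it carries no real meaning.
ratio_celsius = warm / cool
ratio_fahrenheit = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)

# Ratio of two differences: the same in both units, because the
# property of distance holds on an interval scale.
diff_ratio_celsius = (hot - warm) / (warm - cool)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(hot) - celsius_to_fahrenheit(warm))
    / (celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool))
)

print(f"Ratio of readings:    Celsius {ratio_celsius}, Fahrenheit {ratio_fahrenheit}")
print(f"Ratio of differences: Celsius {diff_ratio_celsius}, Fahrenheit {diff_ratio_fahrenheit}")
```

Here 20℃ looks "twice" 10℃, yet the same two temperatures are 68℉ and 50℉, whose ratio is 1.36. The ratio of differences is 2 in both units, which is why differences are meaningful on an interval scale while ratios are not.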
Ratio scale: Ratio scales are measurement systems that possess all three
properties: order, distance, and fixed zero. There exists a true zero point
(also called an absolute or unique zero point) that serves as the starting
point of the scale. Ratio data are set apart from ordinal data (increasing
order) and interval data (equal spacing) by having the additional property of
an absolute lower value (such as zero) that corresponds to the absence of the
measured quantity. This zero point has physical significance: it indicates an
absence of the quantity being measured. All arithmetic (+, -, *, ÷) and
relational operations are applicable. Ratio variables exhibit the
characteristics of nominal, ordinal and interval measurement, and they can be
continuous or discrete.
Example 1.6: weight, length (height), volume, age (time), blood pressure,
heartbeat, area, rainfall, monthly consumption, amount of money in the pocket,
etc.
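The link between the four scale types and the three properties (order, distance, fixed zero) can be tested mechanically. In the sketch below, each object is represented by its true amount of the attribute, and each scale type is represented by one hypothetical measurement rule M; the objects, the rules and the function names are all assumptions made for illustration.

```python
from itertools import product


def has_order(objects, measure):
    """Order: whenever O_i > O_j, M(O_i) > M(O_j)."""
    return all(
        measure(a) > measure(b)
        for a, b in product(objects, repeat=2)
        if a > b
    )


def has_distance(objects, measure):
    """Distance: O_i - O_j >= O_k - O_l implies
    M(O_i) - M(O_j) >= M(O_k) - M(O_l)."""
    return all(
        measure(i) - measure(j) >= measure(k) - measure(l)
        for i, j, k, l in product(objects, repeat=4)
        if i - j >= k - l
    )


def has_fixed_zero(measure):
    """Fixed zero: the object with none of the attribute is assigned 0."""
    return measure(0) == 0


# Hypothetical objects, given by their true amount of the attribute
# (0 stands for the object O_0 that has none of it).
objects = [0, 1, 2, 4]

# Hypothetical measurement rules, one per scale type.
labels = {0: 7, 1: 3, 2: 9, 4: 1}  # arbitrary identification codes
ranks = {0: 1, 1: 2, 2: 3, 4: 4}   # positions in increasing order
rules = {
    "nominal": labels.get,
    "ordinal": ranks.get,
    "interval": lambda amount: amount + 5,  # shifted origin
    "ratio": lambda amount: 10 * amount,    # change of unit only
}

properties = {
    name: (has_order(objects, rule), has_distance(objects, rule), has_fixed_zero(rule))
    for name, rule in rules.items()
}
for name, (order, distance, zero) in properties.items():
    print(f"{name:<9} order={order}  distance={distance}  fixed zero={zero}")
```

With these rules the nominal codes satisfy none of the properties, the ranks satisfy order only, the shifted rule satisfies order and distance, and the rescaled rule satisfies all three, which matches the definitions of the four scale types.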
Summary of chapter one
Dear student, have you made yourself familiar with each of the following
statistical concepts? If not, please make sure that you are able to define,
explain, describe, and distinguish the differences or similarities among the
terms considered in this unit by revising the lessons where the terms appear.
• Check whether you are able to define:
  ✓ Variables
  ✓ Population
  ✓ Census
  ✓ Sample surveys
• Check whether you are able to distinguish between:
  ✓ Parameter and statistic
  ✓ Qualitative and quantitative variables
• Check whether you can give a sufficient explanation of:
  ✓ Types of statistical data
  ✓ Applications of statistics
  ✓ Limitations of statistics
  ✓ Uses of statistics
  ✓ Measurement scales
• Check whether you are able to describe:
  ✓ Types of variables
  ✓ General steps of a statistical investigation
• Check whether you are able to define the field of study "statistics"
Review Exercise on Chapter One

1. Classify each of the following statements as belonging to the area of
descriptive statistics or inferential statistics.
i. As a result of recent cutbacks by oil-producing nations, we can expect the
price of gasoline to double in the next year.
ii. The average monthly income of all households in city X is 500 dollars,
based on sample data.
iii. Adane concludes that his chance of passing the first year this academic
year is at least 80%, based on the statistic that 75% of the freshmen
passed last year.
2. Suppose the average CGPA of all students taking Stat 3011 is 3.5, and you
determine that the average CGPA of a sample of 50 of these students is 2.8.
Determine:
a) the population
b) the sample
c) the variable under study
d) the parameter of interest
e) whether the variable is qualitative or quantitative
3. Classify each of the following first as qualitative or quantitative and
second as a nominal (categorical), ordinal, interval, or ratio measure.
a) Times for swimmers to complete a 50-meter race
b) Months of the year: Meskerm, Tikimit, …
c) Socioeconomic status of a family when classified as low, middle and upper
class
d) Blood type of individuals: A, B, AB and O
e) Pollen counts provided as numbers between 1 and 10, where 1 implies there
is almost no pollen and 10 that it is rampant, but for which the values do
not represent actual counts of grains of pollen
f) Region numbers of Ethiopia (1, 2, 3, etc.)
g) The number of students in a college
h) The net wages of a group of workers
CHAPTER TWO
2. METHODS OF DATA COLLECTION AND PRESENTATION
Objectives
Upon completing this chapter, students will be able to:
• describe the different methods of data collection
• construct ungrouped (discrete), grouped (continuous), relative and cumulative
frequency distributions for raw data
• compute class marks, class widths, class limits, class boundaries, relative
frequencies and cumulative frequencies
• present numerical data using suitable graphs or diagrams
Introduction
This unit deals with how to collect data and how to present the collected data
so that they can be of use. Collected data, also known as raw data, are usually
in an unorganized form and need to be organized and presented in a meaningful
and readily comprehensible form in order to facilitate further statistical
analysis.
2.1 Methods of Data Collection

Definition of some basic terms
Data: any information collected as part of a research project, or the numerical
result of any scientific measurement. Data may arise from counting or from
measurement.
Raw data: collected data which have not yet been organized numerically.
Array: an arrangement of raw numerical data in ascending or descending
order of magnitude. It makes the range of the data set easy to see and gives
some idea of the general characteristics of the distribution.
Frequency: the number of times a certain value of the variable is repeated in
the given data, or the number of times a certain value (or set of values) occurs
in a specific group.
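The terms raw data, array and frequency can be illustrated with a short sketch. The raw data below are made up for this illustration (for instance, the number of family members in eight households) and do not come from the course material.

```python
from collections import Counter

# Hypothetical raw data: values in the order in which they were collected.
raw_data = [3, 1, 2, 3, 2, 3, 5, 1]

# Array: the raw data arranged in ascending order of magnitude.
array = sorted(raw_data)

# The range is easy to read off an array: last value minus first value.
data_range = array[-1] - array[0]

# Frequency: the number of times each value occurs in the data.
frequency = Counter(raw_data)

print(f"Array: {array}")
print(f"Range: {data_range}")
for value in sorted(frequency):
    print(f"Value {value} occurs {frequency[value]} time(s)")
```

Sorting with `reverse=True` would give the array in descending order instead; the frequencies are the same either way.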
Two things must be considered before starting data collection:
a. Statement of the purpose of investigation (objective)
b. Plan of data collection
A. Purpose of investigation (objective of statistical inquiry):
The objective of a statistical investigation may be:
1. To supplement, disprove or test some current theory (hypothesis)
2. To discover a new theory (hypothesis)
3. To solve a problem involving the interdependence of several groups of facts
B. Plan of data collection: In planning data collection, the following points
should be considered:
a. Scope of inquiry: decided with reference to
➢ Time: the work of collecting data must be finished within a reasonable
time. What is reasonable depends on the nature of the phenomenon under
investigation. If conditions change quickly and frequently, the duration
of the investigation should be narrowed to such an extent that there is no
possibility of a change affecting the data.
➢ Space:
  ✓ Political and administrative division (country, district, woreda,
  municipality)
  ✓ Economic division (agriculture and animal husbandry, mining,
  manufacturing, trade, transport)
  ✓ Natural or climatic division (plains, mountains, plateaus, forests)
➢ The number of items included in the study: this is the question of
choosing between the census and the sampling technique of data collection.
  ✓ Census: every item in the population (the universe) is enumerated.
  ✓ Sample: only a limited number of items is taken into account (this
  limited number of items is regarded as the sample of the population).
2.1.1 Source of Data

Any scientific investigation requires data related to the study. The required data
is obtained from two sources called primary & secondary.
A. Primary Sources: a primary source is a source of data that supplies
firsthand information for the immediate purpose of the investigation.
Primary data are data originally collected for that purpose. The sources
of primary data are the objects under study themselves, so there is direct
contact between the investigator and the items (objects) under
investigation. Because of this, primary data are more expensive to collect.
B. Secondary Sources: When an investigator uses data which have already
been collected by others, such data are called "Secondary Data". Such
data are primary data for the agency that collected them, and become
secondary for someone else who uses them for his own purposes.
Secondary data can be obtained from journals, reports, government
publications, publications of professionals and research organizations.
Secondary data are less expensive to collect in both money and time.
NB:
• Primary data are more expensive than secondary data.
• Data which are primary for one user may be secondary for another.
Method of Primary Data Collection

In primary data collection, you collect the data yourself using methods such as
interviews, observations, laboratory experiments and questionnaires. The key
point here is that the data you collect is unique to you and your research and,
until you publish, no one else has access to it. The main methods of
collecting primary data include:
• Questionnaire methods: these include personal interviews (face to face
or by telephone) and mail interviews.
• Observation: involves recording the behavioral patterns of people,
objects and events in a systematic manner.
• Diaries: a diary is a way of gathering information about the way
individuals spend their time on professional activities. Diaries in this
sense are not records of engagements or personal journals of thought.
They can record either quantitative or qualitative data, and in
management research they can provide information about work patterns
and activities.
• Laboratory experiment: conducting laboratory experiments in fields
such as the chemical and biological sciences.
2.2 Methods of Data Presentation

Having collected and edited the data, the next important step is to organize it,
that is, to present it in a readily comprehensible, condensed form that helps in
drawing inferences from it. It is also necessary that like items be separated
from unlike ones.
The presentation of data is broadly classified in to the following two categories:
• Tabular presentation
• Diagrammatic and Graphic presentation.
2.2.1 Tabular Presentation
Classification is the process of arranging items (data) into classes or categories
according to their similarities or differences. Classification is a preliminary step
that prepares the ground for proper presentation of data. Tabular presentation of
data is done by using a frequency distribution.
A frequency distribution is a table that presents data according to some
criteria, with the corresponding number of items falling in each class (i.e. with
the corresponding frequencies).
A frequency distribution is essentially the classification of data into an
appropriate number of mutually exclusive (non-overlapping) classes.
There are 3 types of Frequency distribution. These are:
1. Categorical Frequency distribution
2. Ungrouped Frequency distribution
3. Grouped Frequency distribution
There are specific procedures for constructing each type.
1) Categorical Frequency distribution: used for data that can be placed in
specific categories, such as nominal or ordinal data, e.g. marital status and
letter grade.
Example 2.1: a social worker collected the following data on marital status for
25 persons.(M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types
of marital status: M, S, D, and W. These types will be used as the classes of the
distribution. We follow the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.
Class   Tally   Frequency   Percent
(1)     (2)     (3)         (4)
M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using
$$\% = \frac{f}{n} \times 100$$
where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be
added since they are used in certain types of diagrams, such as pie charts.
Step 5: Find the total for column (3) and (4).
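The five steps above can also be sketched in a few lines of Python. This is only an illustration, assuming the 25 marital-status codes of Example 2.1 are typed in row by row; the variable names are not part of the original notes.

```python
from collections import Counter

# Marital status codes from Example 2.1, entered row by row
# (M = married, S = single, W = widowed, D = divorced)
data = "M S D W D S S M M M W D S M M W D D S S S W W D D".split()

counts = Counter(data)                      # Steps 2-3: tally and count each class
n = sum(counts.values())                    # total number of values
percents = {c: 100 * f / n for c, f in counts.items()}   # Step 4: f / n * 100

for c in "MSDW":
    print(f"{c}  {counts[c]:>2}  {percents[c]:.0f}%")
print(f"Total {n}  {sum(percents.values()):.0f}%")        # Step 5: column totals
```

Counting by program is a useful cross-check on a hand tally, since a miscounted tally mark changes both the frequency and the percent columns.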
Combining all the steps, one can construct the following frequency distribution.
Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       ///// /    6           24
S       ///// //   7           28
D       ///// //   7           28
W       /////      5           20
Total              25          100
2) Ungrouped Frequency distribution: is a table of all the potential raw score
values that could possibly occur in the data, along with the number of times each
actually occurred. An ungrouped frequency distribution is often constructed for a
small data set or for data on a discrete variable.
Steps for constructing ungrouped frequency distribution:
• First find the smallest and largest raw score in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting one may include a column of tallies.
Example:
The following data represent the mark of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct a frequency distribution, which is ungrouped.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark Tally Frequency
60 // 2
62 / 1
63 / 1
65 / 1
70 //// 4
74 / 1
75 / 1
76 // 2
80 /// 3
85 /// 3
90 / 1
Each individual value is presented separately; that is why it is called an
ungrouped frequency distribution.
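The ungrouped distribution of the 20 marks can be reproduced with a short Python sketch. It assumes the marks are entered exactly as listed in the example; `Counter` does the tallying.

```python
from collections import Counter

# Marks of the 20 students from the example above
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)      # Step 1: range = max - min
table = sorted(Counter(marks).items())    # (mark, frequency) pairs in ascending order

print("Range:", data_range)
for mark, f in table:
    print(f"{mark}  {'/' * f:<5} {f}")    # one stroke per occurrence as the tally
```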
3) Grouped Frequency Distribution: when the range of the data is large, the
data must be grouped into classes that are more than one unit in width.
Definition of some basic terms

• Grouped frequency distribution: a frequency distribution in which several
numbers are grouped into one class.
• Class limits (CL): separate one class from another. The limits can
actually appear in the data, and there is a gap between the upper limit of
one class and the lower limit of the next class.
• Unit of measure (U): the smallest possible difference between successive
values, e.g. 1, 0.1, 0.01, 0.001, etc.
• Class boundaries: separate one class in a grouped frequency distribution
from another. The boundaries have one more decimal place than the raw
data. There is no gap between the upper boundary of one class and the
lower boundary of the succeeding class. The lower class boundary is found by
subtracting half of the unit of measure from the lower class limit, and the
upper class boundary is found by adding half of the unit of measure to the
upper class limit.
• Class width (W): the difference between the upper and lower boundaries of
any class. The class width is also the difference between the lower limits
(or the upper limits) of two consecutive classes.
• Class mark (midpoint): the average of the lower and upper class limits,
or the average of the lower and upper class boundaries.
• Cumulative frequency (CF): the number of observations less than the upper
class boundary, or greater than the lower class boundary, of a class.
• CF (less than type): the number of values less than the upper class
boundary of a given class.
• CF (greater than type): the number of values greater than the lower
class boundary of a given class.
• Relative frequency (Rf): the frequency divided by the total frequency. This
gives the proportion (or, multiplied by 100, the percent) of values falling in
that class.
$$Rf_i = \frac{f_i}{n} = \frac{f_i}{\sum f_i}$$
where $f_i$ is the frequency of the ith class and n = total number of
observations or items.
• Relative cumulative frequency (RCf): the running total of the relative
frequencies, or the cumulative frequency divided by the total frequency. It gives
the proportion of the values which are less than the upper class boundary
(less than type) or greater than the lower class boundary (greater than type).
$$RCf_i = \frac{Cf_i}{n} = \frac{Cf_i}{\sum f_i}$$
Guidelines for classes

1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can
fall into two different classes
3. The classes must be all inclusive or exhaustive. This means that all data
values must be included.
4. The classes must be continuous. There are no gaps in a frequency
distribution.
5. The classes must be equal in width.
Steps for constructing Grouped frequency Distribution

1. Find the largest and smallest values
2. Compute the Range(R) = Maximum - Minimum
3. Select the number of classes desired, usually between 5 and 20, or use
Sturges' rule $k = 1 + 3.32 \log_{10} n$, where k is the number of classes desired
and n is the total number of observations.
4. Find the class width by dividing the range by the number of classes and
rounding up, not off: $w = \frac{R}{k}$.
5. Pick a suitable starting point less than or equal to the minimum value.
The starting point is called the lower limit of the first class. Continue to
add the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of
the second class (i.e. $UCL_i = LCL_{i+1} - U$). Then continue to add the class
width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and
adding U/2 units to the upper limits. The boundaries are also half-way
between the upper limit of one class and the lower limit of the next class.
Mathematically expressed as:
$LCB_i = LCL_i - \frac{1}{2}U$, where $LCB_i$ is the lower class boundary of the ith class
$UCB_i = UCL_i + \frac{1}{2}U$, where $UCB_i$ is the upper class boundary of the ith class
8. Find the class marks (CM):
$CM_i = \frac{UCL_i + LCL_i}{2}$ or $CM_i = \frac{UCB_i + LCB_i}{2}$.
9. Tally the data.
10. Find the frequencies.
11. Find the cumulative frequencies. Depending on what you're trying to
accomplish, it may not be necessary to find the cumulative frequencies.
12. If necessary, find the relative frequencies and/or relative cumulative
frequencies
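Steps 1 to 8 can be sketched as a small Python function. This is a hedged illustration rather than a general tool: it assumes whole-number data (unit of measure U = 1), starts the first class at the minimum value, and rounds Sturges' k to the nearest whole number. The function name `build_classes` is made up for this sketch.

```python
import math

def build_classes(data):
    """Class limits, boundaries and marks for whole-number data (U = 1)."""
    U = 1                                          # unit of measure (assumption)
    lowest, highest = min(data), max(data)         # Step 1
    data_range = highest - lowest                  # Step 2
    k = round(1 + 3.32 * math.log10(len(data)))    # Step 3: Sturges' rule
    width = math.ceil(data_range / k)              # Step 4: round up, not off

    limits = []                                    # Steps 5-6
    lower = lowest
    while lower <= highest:                        # keep adding classes until the maximum is covered
        limits.append((lower, lower + width - U))
        lower += width

    boundaries = [(lo - U / 2, hi + U / 2) for lo, hi in limits]   # Step 7
    marks = [(lo + hi) / 2 for lo, hi in limits]                   # Step 8
    return k, width, limits, boundaries, marks

# Hypothetical usage with 20 whole-number observations
sample = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
          18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
k, width, limits, boundaries, marks = build_classes(sample)
print(k, width)
print(limits)
```

The `while` loop is a small safeguard: when the range divides evenly by k, k classes starting at the minimum stop one unit short of the maximum, so one extra class is added.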
Example: Construct a frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33
Step 3: Select the number of classes desired using Sturges' formula;
k = 1 + 3.32 log(20) = 5.32 ≈ 5 (rounded to the nearest whole number)
Step 4: Find the class width; w = R/k = 33/5 = 6.6 ≈ 7 (rounding up)
Step 5: Select the starting point; let it be the minimum observation.
• 6, 13, 20, 27, 34 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 13 - U = 13 - 1 = 12
• 12, 19, 26, 33, 40 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 12
13 – 19
20 – 26
27 – 33
34 – 40
Step 7: Find the class boundaries;
E.g. for class 1: Lower class boundary = 6 - U/2 = 5.5
Upper class boundary = 12 + U/2 = 12.5
• Then continue adding the class width (w) to both boundaries to obtain the
remaining boundaries. By doing so one can obtain the following classes.
Class boundary
5.5 – 12.5
12.5 – 19.5
19.5 – 26.5
26.5 – 33.5
33.5 – 40.5
Step 8: Find the class marks (CM);
CMi = (UCLi + LCLi)/2, so CM1 = (6 + 12)/2 = 9; then continue to add w to find
the remaining class marks. So the class marks are:
9, 16, 23, 30, 37
Step 9: Tally the data.
Step 10: Write the numeric values for the tallies in the frequency column.
Step 11: Find the cumulative frequencies.
Step 12: Find the relative frequencies and/or relative cumulative frequencies.
The complete frequency distribution follows:
Class     Class         Class   Tally     Freq.   Cf (less     Cf (more     rf     rcf (less
limit     boundary      mark                      than type)   than type)          than type)
6 – 12    5.5 – 12.5    9       //        2       2            20           0.10   0.10
13 – 19   12.5 – 19.5   16      ////      4       6            18           0.20   0.30
20 – 26   19.5 – 26.5   23      ///// /   6       12           14           0.30   0.60
27 – 33   26.5 – 33.5   30      /////     5       17           8            0.25   0.85
34 – 40   33.5 – 40.5   37      ///       3       20           3            0.15   1.00
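As a cross-check, the frequency, cumulative frequency and relative frequency columns can be recomputed directly from the 20 raw observations. The sketch below assumes the class limits 6–12, ..., 34–40 found in Steps 5 and 6; the variable names are illustrative.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 12), (13, 19), (20, 26), (27, 33), (34, 40)]

# Frequency: how many observations fall inside each pair of class limits
freq = [sum(lo <= x <= hi for x in data) for lo, hi in limits]
n = sum(freq)

cf_less = list(accumulate(freq))               # values below each upper class boundary
cf_more = list(accumulate(freq[::-1]))[::-1]   # values above each lower class boundary
rf = [f / n for f in freq]                     # relative frequency f_i / n
rcf_less = [c / n for c in cf_less]            # relative cumulative frequency (less than type)

for row in zip(limits, freq, cf_less, cf_more, rf, rcf_less):
    print(row)
```

Two quick sanity checks apply to any such table: the frequencies must add up to n, and the last less-than cumulative frequency must equal n (relative value 1.00).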
2.2.2 Diagrammatic and Graphic Presentation of Data

The most convenient and popular way of describing data is graphical
presentation. It is easier to understand and interpret data when they are
presented graphically than when they are presented in words or in a frequency
table. A graph can present data in a simple and clear way, and it can illustrate
the important aspects of the data. This leads to better analysis and presentation
of the data. In this section, we discuss the most commonly used diagrammatic
and graphical methods.
2.2.2.1 Diagrammatic Presentation of Data
The three most commonly used diagrammatic presentations for discrete as well as
qualitative data are:
• Pie charts
• Bar charts
• Pictogram
A) Pie chart
A pie chart is a circle that is divided into sections or wedges according to the
percentage of frequencies in each category of the distribution. The angle of the
sector is obtained using:
$$\text{Angle of sector} = \frac{\text{value of the part}}{\text{whole quantity}} \times 360^\circ$$
Example: Draw a suitable diagram to represent the following population in a

town.
Men Women Girls Boys
2500 2000 4000 1500
Solutions:
Step 1: Find the percentage.
Step 2: Find the number of degrees for each class.
Step 3: Using a protractor and compass, graph each section and write its name
and corresponding percentage.
Class    Frequency   Percent   Degree
Men      2500        25        90
Women    2000        20        72
Girls    4000        40        144
Boys     1500        15        54
Figure1. Pie chart of the population in a town
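The percent and degree columns of the pie chart table can be computed as below. This is a minimal sketch, assuming each sector angle is the class's share of the total multiplied by 360 degrees; the dictionary name is illustrative.

```python
# Population of the town by group, from the example above
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

percent = {g: 100 * v / total for g, v in population.items()}   # Step 1
degrees = {g: 360 * v / total for g, v in population.items()}   # Step 2

for g in population:
    print(f"{g:<6} {population[g]:>5} {percent[g]:>5.0f}% {degrees[g]:>5.0f} deg")
```

The sector angles always add up to 360 degrees, which is a convenient check before drawing.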
B) Pictogram: is a device used to represent data by means of pictures or
small symbols. We choose a suitable picture to represent a definite
number of units in which the variable is measured.
Example: The following table shows the orange production in a plantation
for production years 1990-1993. Represent the data by a pictogram.
Production year   1990   1991   1992   1993
Amount (in kg)    3000   3850   3500   5000
Figure 2: Pictogram of the data on orange production from 1990 to 1993.
C) Bar Charts: Used to represent & compare the frequency distribution of
discrete variables and attributes or categorical series. Bars can be drawn
either vertically or horizontally.
In presenting data using a bar diagram:
• All bars must have equal width and the distance between bars must be
equal.
• The height or length of each bar indicates the size (frequency) of the figure
represented.
There are different types of bar charts, the most common being:
• Simple bar chart
• Component or subdivided bar chart
• Multiple bar chart
I. Simple bar chart
• Is used to display data on one variable.
• The bars are thick lines (narrow rectangles) having the same breadth. The
magnitude of a quantity is represented by the height/length of the bar.
Example: Draw a bar chart for the following coffee production data from 1990 to
1995.
Year                    1990   1991   1992   1993   1994   1995
Amount (in 1000 tons)   50     75     92     64     100    120
Figure 3: Production of coffee from 1990 to 1995 (simple bar chart; x-axis:
production year, y-axis: amount of coffee in 1000 tons).
II. Component Bar Chart:

When we want to show how a total (or aggregate) is divided into its
component parts, we use a component bar chart. The bars represent the total value
of a variable, with each total broken into its component parts; different colors
or designs are used for identification.
Example: The following data represent the sales by product, 1957-1959, of a given
company for three products A, B, C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Draw a component bar chart to represent the sales by product from 1957 to
1959.
Figure3. Component bar chart of sales by product from 1957 to 1959 (chart title:
Sales by Product 1957-1959; x-axis: year of production; y-axis: sales in $;
segments: Product A, Product B, Product C).
III. Multiple Bar charts: These are used to display data on more than one
variable. They are used for comparing different variables at the same
time.
Example: Draw a multiple bar chart to represent the sales by product from 1957
to 1959.
Figure4. Multiple bar chart of sales by product from 1957 to 1959 (chart title:
Sales by product 1957-1959; x-axis: year of production; y-axis: sales in $;
bars: Product A, Product B, Product C).
2.2.2.2 Graphical Presentation of Data

The histogram, frequency polygon and cumulative frequency graph (ogive) are the
most commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs:
• Draw and label the x and y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and
label it on the y-axis.
• Represent the class boundaries for the histogram or ogive, or the midpoints
for the frequency polygon, on the x-axis.
• Plot the points.
• Draw the bars or lines to connect the points.
i. Histogram: is a graph which displays the data by using vertical bars of
various heights to represent frequencies. Class boundaries are placed along
the horizontal axis. Class marks and class limits are sometimes used as the
quantity on the x-axis.
Example: Construct a histogram for the frequency distribution of the time spent
by the automobile workers. The frequency distribution is:
Time (class boundaries)   Class mark   Number of workers
15.5-21.5                 18.5         3
21.5-27.5                 24.5         6
27.5-33.5                 30.5         8
33.5-39.5                 36.5         4
39.5-45.5                 42.5         3
45.5-51.5                 48.5         1
Figure5. The time in minutes spent by automobile workers to travel from home
to work.
ii. Frequency polygon
A frequency polygon is a line graph. The frequencies are placed along the vertical
axis and the class midpoints are placed along the horizontal axis. Two classes with
zero frequency are added at the two ends of the frequency distribution to make it
a complete polygon.
Example: Construct a frequency polygon for the frequency distribution of the

time spent by the automobile workers.
Figure 5: The time in minutes spent by automobile workers to travel from home
to work.
iii. Ogive (cumulative frequency polygon):
An ogive is a graph plotting the cumulative frequencies of a distribution against
the class boundaries. There are two types of ogive, namely the less than ogive and
the more than ogive. The less than ogive is plotted against the upper class
boundaries and the more than ogive is plotted against the lower class boundaries.
That is, class boundaries are plotted along the horizontal axis and the
corresponding cumulative frequencies are plotted along the vertical axis. The
points are joined by a freehand curve.
Exercise: Construct an ogive for the time spent by the automobile workers
Check list of chapter two
Dear student, make sure that you are able to define, explain, describe and
distinguish differences or similarities among the terms considered in chapter
two:
• Check whether you are able to describe the different methods of data
collection.
• Check whether you are able to distinguish between primary data
and secondary data.
• Check whether you are able to define:
  - Raw data
  - Frequency
  - Frequency distribution
• Check whether you are able to construct ungrouped (discrete),
grouped (continuous), relative and cumulative frequency distributions for raw
data.
• Check whether you are able to compute class marks, class widths,
class limits, class boundaries, relative frequencies and cumulative frequencies.
• Check whether you are able to present numerical data using
suitable graphs or diagrams such as histograms, frequency polygons, ogives,
pie charts, pictograms and bar charts (i.e. simple bar chart, component bar
chart and multiple bar chart).
Review Exercise on Chapter Two

1. Which of the diagrams is most appropriate for each of the following data sets?
Draw the diagrams.
a. Students enrolled in a certain department from year 1 to 3
        No. of students
Year    Male    Female
1       50      20
2       45      15
3       40      10
2. Suppose data collected on the heights (in cm) of 390 cows were tabulated in a
frequency distribution and the following results were obtained.
fi: 6, 25, 48, 72, 116, 60, 38, 22, 3
CM1 = 112, CM2 = 117, where CMi is the ith class mark
Determine:
i. the class interval size (class width)
ii. the class limits
iii. class boundaries
iv. class marks
v. the less than cumulative frequency distribution
vi. the class intervals having the highest frequency
vii. Above which height do we find 50% of the cows?
viii. Below which height do we get 25% of the cows?
Draw
A. histogram
B. a frequency polygon
C. a less than ogive for the above data
CHAPTER 3
3. MEASURES OF CENTRAL TENDENCY
Objectives
At the end of this chapter students will be able to:
• Identify the types of measures of central tendency
• Understand the data easily.
• Define and calculate the mean, mode, median, quartiles, deciles and
percentiles with their interpretation.
• Summarize an aggregate of statistical data by using a single measure and
make comparisons
3.1 Introduction
On the scale of values of a variable there is a certain stage around which the
largest number of items tends to cluster. Since this stage is usually in
the center of the distribution, the tendency of the statistical data to get
concentrated at this stage/value is called "central tendency". The various
measures determining the actual value at which the data tend to concentrate are
called measures of central tendency. So, a measure of central location is the
single value that best represents the whole series. This single value is called
the average of the group. An average which is representative is called a typical
average, and an average which is not representative and has only a theoretical
value is called a descriptive average.
A typical average should possess the following properties:
• It should be rigidly defined.
• It should be based on all observations under investigation.
• It should be affected as little as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.
The Summation Notation ($\sum$)
Statistical Symbols: Let $X_1, X_2, X_3, \dots, X_N$ be a number of measurements,
where N is the total number of observations and $X_i$ is the ith observation. Very
often in statistics an algebraic expression of the form $X_1 + X_2 + X_3 + \dots + X_N$
is used in a formula to compute a statistic. It is tedious to write an expression
like this very often, so mathematicians have developed a shorthand notation to
represent a sum of scores, called the summation notation.
• The symbol $\sum_{i=1}^{N} X_i$ is a mathematical shorthand for $X_1 + X_2 + X_3 + \dots + X_N$.
The expression is read, "the sum of X sub i from i equals 1 to N." It means "add
up all the numbers."
Example 3.1: Suppose the following were scores made on the first homework
assignment by five students in the class: 5, 7, 7, 6, and 8. In this example set of
five numbers, where N = 5, the summation could be written:
$$\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$$
The "i=1" at the bottom of the summation notation tells where to begin the
sequence of summation. If the expression were written with "i=3", the
summation would start with the third number in the set.
For example:
$$\sum_{i=3}^{5} X_i = 7 + 6 + 8 = 21$$
Sometimes, if the summation notation is used in an expression and the
expression must be written a number of times, as in a proof, then a shorthand
notation for the shorthand notation is employed. When the summation sign "∑"
is used without additional notation, then "i=1" and "N" are assumed. For
example:
$$\sum X_i = \sum_{i=1}^{N} X_i$$
Properties of summation
1. $\sum_{i=1}^{n} k = nk$, where k is any constant
2. $\sum_{i=1}^{n} kX_i = k \sum_{i=1}^{n} X_i$, where k is any constant
3. $\sum_{i=1}^{n} (a + bX_i) = na + b \sum_{i=1}^{n} X_i$, where a and b are any constants
4. $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$
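The property for $a + bX_i$ does not need to be memorized separately, since it follows from the other three. A short derivation, offered as a sketch: split the sum term by term, then apply the constant rule and the constant-multiple rule.

```latex
\begin{align*}
\sum_{i=1}^{n} (a + bX_i)
  &= \sum_{i=1}^{n} a + \sum_{i=1}^{n} bX_i   && \text{(sum of two terms splits)} \\
  &= na + \sum_{i=1}^{n} bX_i                 && \text{(sum of a constant)} \\
  &= na + b \sum_{i=1}^{n} X_i                && \text{(constant factor comes out)}
\end{align*}
```

For instance, with the five scores 5, 7, 7, 6, 8 (sum 33), a = 1 and b = 2, both sides equal 5(1) + 2(33) = 71.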
Example 3.2: Considering the following data, determine
X   Y
5   6
7   7
7   8
6   7
8   8
a) $\sum_{i=1}^{5} X_i$
b) $\sum_{i=1}^{5} Y_i$
c) $\sum_{i=1}^{5} 10$
d) $\sum_{i=1}^{5} (X_i + Y_i)$
e) $\sum_{i=1}^{5} (X_i - Y_i)$
f) $\sum_{i=1}^{5} X_i Y_i$
g) $\sum_{i=1}^{5} X_i^2$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right)$
Solutions:
a) $\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$
b) $\sum_{i=1}^{5} Y_i = 6 + 7 + 8 + 7 + 8 = 36$
c) $\sum_{i=1}^{5} 10 = 5 \times 10 = 50$
d) $\sum_{i=1}^{5} (X_i + Y_i) = (5 + 6) + (7 + 7) + (7 + 8) + (6 + 7) + (8 + 8) = 69 = 33 + 36$
e) $\sum_{i=1}^{5} (X_i - Y_i) = (5 - 6) + (7 - 7) + (7 - 8) + (6 - 7) + (8 - 8) = -3 = 33 - 36$
f) $\sum_{i=1}^{5} X_i Y_i = 5 \times 6 + 7 \times 7 + 7 \times 8 + 6 \times 7 + 8 \times 8 = 241$
g) $\sum_{i=1}^{5} X_i^2 = 5^2 + 7^2 + 7^2 + 6^2 + 8^2 = 223$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right) = 33 \times 36 = 1188$
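The eight sums of Example 3.2 can be checked with Python's built-in `sum`. The sketch below simply mirrors parts a) to h); the dictionary `results` is an illustrative name.

```python
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

results = {
    "a": sum(X),                                  # sum of X_i
    "b": sum(Y),                                  # sum of Y_i
    "c": sum(10 for _ in X),                      # sum of the constant 10, five times
    "d": sum(x + y for x, y in zip(X, Y)),        # sum of (X_i + Y_i)
    "e": sum(x - y for x, y in zip(X, Y)),        # sum of (X_i - Y_i)
    "f": sum(x * y for x, y in zip(X, Y)),        # sum of products X_i * Y_i
    "g": sum(x ** 2 for x in X),                  # sum of squares X_i^2
    "h": sum(X) * sum(Y),                         # product of the two sums
}
for part, value in results.items():
    print(part, value)
```

Note that parts f) and h) differ: the sum of the products is not the product of the sums. Likewise, the sum of squares in g) is not the square of the sum in a).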
3.2 Types of Measures of Central Tendency

There are several different measures of central tendency; each has its advantages
and disadvantages.
• The Mean (Arithmetic, Geometric and Harmonic)
• The Mode
• The Median
• Quantiles (Quartiles, Deciles and Percentiles)
The choice among these averages depends upon which best fits the property under
discussion.
3.2.1 Arithmetic Mean: is defined as the sum of the magnitudes of the
items divided by the number of items. The mean of $X_1, X_2, X_3, \dots, X_n$ is
denoted by A.M, m or $\bar{X}$ and is given by:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
If $X_1$ occurs $f_1$ times, $X_2$ occurs $f_2$ times, …, and $X_k$ occurs $f_k$ times,
then the mean will be
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
Example 3.3: Obtain the mean of the following numbers:
2, 7, 8, 2, 7, 3, 7
Solution:
Xi      fi   Xifi
2       2    4
3       1    3
7       3    21
8       1    8
Total   7    36
$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} = 5.14$$
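The frequency-table calculation of Example 3.3 can be sketched in Python as follows. It assumes the seven numbers are entered as listed; `Counter` builds the (value, frequency) table.

```python
from collections import Counter

values = [2, 7, 8, 2, 7, 3, 7]

freq = Counter(values)                              # distinct values X_i with frequencies f_i
total_fx = sum(x * f for x, f in freq.items())      # sum of f_i * X_i
n = sum(freq.values())                              # sum of f_i
mean = total_fx / n

print(sorted(freq.items()))
print(total_fx, n, round(mean, 2))
```

The result is the same as adding the seven numbers directly and dividing by 7; the frequency form only shortens the arithmetic when values repeat.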
Arithmetic Mean for Grouped Data
If data are given in the form of a continuous frequency distribution, then the
mean is obtained as follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where $X_i$ = the class mark of the ith class and $f_i$ = the frequency of
the ith class.
Example 3.4: Calculate the mean for the following age distribution.
Class     Frequency
6 - 10    35
11 - 15   23
16 - 20   15
21 - 25   12
26 - 30   9
31 - 35   6
Solutions:
• First find the class marks.
• Find the product of the frequency and the class mark for each class.
• Find the mean using the formula.
Class     fi    Xi   Xifi
6 - 10    35    8    280
11 - 15   23    13   299
16 - 20   15    18   270
21 - 25   12    23   276
26 - 30   9     28   252
31 - 35   6     33   198
Total     100        1575
$$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
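The same grouped-data calculation can be written as a short Python sketch, assuming the class limits and frequencies of Example 3.4 and taking each class mark as the midpoint of its limits.

```python
# Age distribution from Example 3.4: (lower limit, upper limit) and frequency
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [35, 23, 15, 12, 9, 6]

marks = [(lo + hi) / 2 for lo, hi in classes]          # class marks X_i
products = [f * x for f, x in zip(freqs, marks)]       # f_i * X_i
mean = sum(products) / sum(freqs)

print(marks)
print(products)
print(mean)
```

Because every observation in a class is represented by the class mark, the grouped mean is an approximation to the mean of the raw data.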
Special properties of Arithmetic mean
1. The sum of the deviations of a set of items from their mean is always zero,
i.e. $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
2. The sum of the squared deviations of a set of items from their mean is the
minimum, i.e. $\sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - A)^2$ for any $A \neq \bar{X}$.
3. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$
observations, …, and $\bar{X}_k$ is the mean of $n_k$ observations, then the mean of all
the observations in all groups, often called the combined mean, is given by:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$$
Example 3.5: In a class there are 30 females and 70 males. If the females averaged
60 in an examination and the males averaged 72, find the mean for the entire class.
Solutions:
Females: $\bar{X}_1 = 60$, $n_1 = 30$
Males: $\bar{X}_2 = 72$, $n_2 = 70$
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{30(60) + 70(72)}{30 + 70} = \frac{6840}{100} = 68.40$$
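The combined mean formula is a weighted average of the group means, with the group sizes as weights. A minimal Python sketch (the function name `combined_mean` is illustrative):

```python
def combined_mean(means, sizes):
    """Mean of all observations, given each group's mean and size."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Example 3.5: 30 females averaging 60 and 70 males averaging 72
class_mean = combined_mean([60, 72], [30, 70])
print(class_mean)
```

The result lies closer to 72 than to 60 because the larger group pulls the combined mean toward its own mean; simply averaging 60 and 72 would give 66, which ignores the group sizes.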
4. If a wrong figure has been used when calculating the mean, the correct mean
can be obtained without repeating the whole process using:
$$\text{Correct mean} = \text{Wrong mean} + \frac{\text{Correct value} - \text{Wrong value}}{n}$$
where n is the total number of observations.
Example 3.6: The average weight of 10 students was calculated to be 65 kg. Later
it was discovered that one weight was misread as 40 kg instead of 80 kg.
Calculate the correct average weight.
Solutions:
$$\text{Correct mean} = \text{Wrong mean} + \frac{\text{Correct value} - \text{Wrong value}}{n} = 65 + \frac{80 - 40}{10} = 65 + 4 = 69 \text{ kg}$$
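The correction formula can be checked against the longer route of rebuilding the total. The sketch below uses the figures of Example 3.6; the function name `corrected_mean` is illustrative.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Adjust a mean after one misread observation is replaced."""
    return wrong_mean + (correct_value - wrong_value) / n

shortcut = corrected_mean(65, 10, wrong_value=40, correct_value=80)

# Long route: recover the wrong total, swap the misread value, divide again
wrong_total = 65 * 10
long_route = (wrong_total - 40 + 80) / 10

print(shortcut, long_route)
```

Both routes give the same answer because the wrong total differs from the correct total only by the difference between the two values.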
5. The effect of transforming the original series on the mean.
a) If a constant k is added to (subtracted from) every observation, then
the new mean will be the old mean plus (minus) k.
b) If every observation is multiplied by a constant k, then the new
mean will be k times the old mean.
Example 3.7:
1. The mean of n Tetracycline capsules $X_1, X_2, \dots, X_n$ is known to be 12
gm. A new set of capsules of another drug is obtained by the linear
transformation $Y_i = 2X_i - 0.5$ (i = 1, 2, …, n). What will be the
mean of the new set of capsules?
Solutions:
New mean = 2 × old mean − 0.5 = 2 × 12 − 0.5 = 23.5
2. The mean of a set of numbers is 500.
a) If 10 is added to each of the numbers in the set, then what will be the
mean of the new set?
b) If each of the numbers in the set is multiplied by -5, then what will be
the mean of the new set?
Solutions:
a) New mean = old mean + 10 = 500 + 10 = 510
b) New mean = −5 × old mean = −5 × 500 = −2500
Weighted Arithmetic Mean

While calculating the simple arithmetic mean, all items were assumed to be of
equal importance (each value in the data set has equal weight). When the
observations have different weights, we use the weighted average. Weights are
assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the
corresponding weights, then the weighted mean ($\bar{x}_w$) is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 3.8:
A student's final marks in Mathematics, Physics, Chemistry and Biology are
respectively A, B, D and C. If the respective credits received for these courses are
4, 4, 3 and 2, determine the approximate average mark the student has got for
the courses.
Solution
We use a weighted arithmetic mean, the weight associated with each course being
taken as the number of credits received for the corresponding course.
Grade point (xi)   4    3    1    2    Total
Credit (wi)        4    4    3    2    13
wixi               16   12   3    4    35
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{35}{13} = 2.69$$
The average mark of the student is approximately 2.69.
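The weighted mean of Example 3.8 can be sketched in Python. The grade-point scale A = 4, B = 3, C = 2, D = 1 is an assumption read off the worked table, not something the notes state explicitly.

```python
# Assumed grade-point scale (matches the values 4, 3, 1, 2 used in the solution)
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}

grades = ["A", "B", "D", "C"]     # Mathematics, Physics, Chemistry, Biology
credits = [4, 4, 3, 2]            # weights w_i

x = [grade_points[g] for g in grades]                  # values x_i
weighted_sum = sum(w * v for w, v in zip(credits, x))  # sum of w_i * x_i
weighted_mean = weighted_sum / sum(credits)

print(weighted_sum, sum(credits), round(weighted_mean, 2))
```

For comparison, the unweighted mean of the four grade points is (4 + 3 + 1 + 2)/4 = 2.5; the weighted mean is higher because the two best grades come from the courses with the most credits.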
Merits and Demerits of Arithmetic Mean
Merits:
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is affected relatively little by fluctuations
of sampling.
• It is easy to calculate and simple to understand.
Demerits:
• It is affected by extreme observations.
• It cannot be used in the case of open-ended classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as
intelligence, honesty, beauty.
3.2.2 Geometric Mean
The geometric mean, like the arithmetic mean, is a calculated average. It is used
when observed values are measured as ratios, percentages, proportions, indices or
growth rates.
The geometric mean, G.M., of a set of n observations $x_1, x_2, \dots, x_n$ is defined as
the nth root of their product.
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \text{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$$
Taking the logarithms of both sides:
$$\log(G.M) = \log\left(\sqrt[n]{x_1 x_2 \cdots x_n}\right) = \log\left((x_1 x_2 \cdots x_n)^{1/n}\right)$$
$$\Rightarrow \log(G.M) = \frac{1}{n} \log(x_1 x_2 \cdots x_n) = \frac{1}{n} (\log x_1 + \log x_2 + \dots + \log x_n)$$
$$\Rightarrow \log(G.M) = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$
⇒ The logarithm of the G.M of a set of observations is the arithmetic mean of
their logarithms.
$$\Rightarrow G.M = \text{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$$
Example 3.9:
Find the G.M of the numbers 2, 4, 8.
Solutions:
$$G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
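Geometric mean for discrete data arranged in FD: When the numbers $x_1, x_2, \dots, x_m$
occur with frequencies $f_1, f_2, \dots, f_m$ respectively, then the geometric mean
is obtained by
$$G.M. = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \text{antilog}\left(\frac{1}{n} \sum_{i=1}^{m} f_i \log x_i\right), \quad \text{where } n = \sum_{i=1}^{m} f_i$$
Example 3.9
Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values      3   4   5   6
Frequency   2   3   1   2
$$G.M. = \sqrt[8]{3^2 \cdot 4^3 \cdot 5^1 \cdot 6^2} = \sqrt[8]{103680} = 4.236$$
The geometric mean for the given data is 4.236.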
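Both geometric mean examples can be reproduced with the logarithm form of the formula. The sketch below is illustrative: the function name and the optional `freqs` argument are not from the notes, and base-10 logarithms are used so that "antilog" is simply a power of 10.

```python
import math

def geometric_mean(values, freqs=None):
    """G.M. = antilog((1/n) * sum of f_i * log x_i); all values must be positive."""
    if freqs is None:
        freqs = [1] * len(values)          # individual series: every frequency is 1
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)             # antilog of the mean logarithm

gm_simple = geometric_mean([2, 4, 8])                      # first example
gm_freq = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])       # frequency distribution example

print(round(gm_simple, 3), round(gm_freq, 3))
```

Working with logarithms avoids forming the full product, which can become very large for long data sets.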

Geometric mean for grouped data: The above formula can also be used
whenever the frequency distribution is grouped and continuous; the class marks of
the class intervals are then taken as $x_i$.
Properties of geometric mean
• It is less affected by extreme values.
• It takes each and every observation into consideration.
• If the value of one observation is zero, the geometric mean becomes zero.
3.2.3 Harmonic Mean
It is a suitable measure of central tendency when the data pertains to speed, rate
and time. The harmonic of n values is defined as n divided by the sum of their
reciprocal.
Harmonic mean for individual series:- If , are n observations, then
harmonic mean can be represented by the following formula:
43
n
H .M 
1 1 1
 
x1 x2 xn
Example 3.10: A cyclist pedals from his house to his college at speed of 10
km/hr and back from the college to his house at 15 km/hr. Find the average
speed.
Solution: Here the distance is constant, so the simple H.M. is appropriate for this problem.
$X_1 = 10$ km/hr, $X_2 = 15$ km/hr
$$\text{H.M.} = \frac{2}{\dfrac{1}{10} + \dfrac{1}{15}} = 12 \text{ km/hr}$$
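The result can be checked against the definition of average speed (total distance divided by total time). The sketch below is illustrative: the function name `harmonic_mean` and the one-way distance of 30 km are assumptions made only for the check, and any other distance gives the same answer.

```python
def harmonic_mean(values, freqs=None):
    """n divided by the (frequency-weighted) sum of reciprocals."""
    if freqs is None:
        freqs = [1] * len(values)  # individual series: each value counted once
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


avg_speed = harmonic_mean([10, 15])  # speeds of the two legs in km/hr

# Direct check: average speed = total distance / total time
distance = 30.0                                   # assumed one-way distance (km)
total_time = distance / 10 + distance / 15        # hours spent on both legs
direct_speed = 2 * distance / total_time
print(round(avg_speed, 2), round(direct_speed, 2))
```

Note that the arithmetic mean of the two speeds, 12.5 km/hr, would overstate the average speed because more time is spent on the slower leg.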
Harmonic mean for discrete data: If the data are arranged in the form of a
frequency distribution, then
$$\text{H.M.} = \frac{n}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_m}{x_m}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
Harmonic mean for continuous grouped data: When the frequency distribution is
grouped (continuous), the class marks of the class intervals are taken as the
$x_i$ and the above formula is used:
$$\text{H.M.} = \frac{n}{\sum_{i=1}^{m} \dfrac{f_i}{x_i}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
and $x_i$ is the class mark of the ith class.
Properties of harmonic mean

 It is unique for a given set of data.
 It takes each and every observation into consideration.
 Difficult to calculate and understand.
 Appropriate measure of central tendency in situations where data is in
ratio, speed or rate.
3.2.4 The Mode or Modal Value
The mode or the modal value is the value with the highest frequency and is
denoted by $\hat{X}$. The mode may not exist, and even if it does exist it may not be
unique. A distribution is called bimodal if it has two data values that appear
with the greatest frequency. If a distribution has more than two modes, it is
multimodal. If a distribution has no mode, it is non-modal.
Mode for ungrouped data: In case of discrete distribution the value having the
maximum frequency is the modal value.
Examples 3.11:
1. Find the mode of 5, 3, 5, 8, 9.
The mode ($\hat{X}$) = 5.
2. Find the mode of 8, 9, 9, 7, 8, 2 and 5.
The data are bimodal: $\hat{X}$ = 8 and 9.
3. Find the mode of 4, 12, 3, 6 and 7.
There is no mode for these data.
Mode for Grouped Data
If data are given in the form of a continuous frequency distribution, the mode is
defined as:
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
Where:
$\hat{X}$ = the mode of the distribution
$L_{mo}$ = the lower class boundary of the modal class
$w$ = the size of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Note: The modal class is the class with the highest frequency.
Example 3.12: Following is the distribution of the size of certain farms selected
at random from a district. Calculate the mode of the distribution.
Size of farms    No. of farms
5-15             8
15-25            12
25-35            17
35-45            29
45-55            31
55-65            5
65-75            3
Solutions:
45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $w = 10$, $f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$
$$\Rightarrow \hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$$
Merits and Demerits of Mode
Merits:
 It is not affected by extreme observations.
 Easy to calculate and simple to understand.
 It can be calculated for distribution with open end class
Demerits:
• It is not rigidly defined.
• It is not based on all observations.
• It is not suitable for further mathematical treatment.
• It is not a stable average, i.e. it is affected by fluctuations of sampling to
some extent.
• Often its value is not unique.
Note: being the point of maximum density, mode is especially useful in finding
the most popular size in studies relating to marketing, trade, business, and
industry. It is the appropriate average to be used to find the ideal size.
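The grouped-data formula for the mode can be sketched in Python as follows, using the farm-size distribution of Example 3.12. The function name `grouped_mode` and its arguments are hypothetical, and the sketch assumes equal class widths and a single modal class whose frequency exceeds that of at least one neighbouring class.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of a grouped distribution: L_mo + w * d1 / (d1 + d2)."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    f_mo = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                   # preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0      # following class
    d1, d2 = f_mo - f1, f_mo - f2
    return lower_bounds[k] + width * d1 / (d1 + d2)


lower_bounds = [5, 15, 25, 35, 45, 55, 65]  # lower class boundaries
freqs = [8, 12, 17, 29, 31, 5, 3]           # number of farms per class
mode = grouped_mode(lower_bounds, 10, freqs)
print(round(mode, 2))
```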
3.2.5 Median
The median is, as its name indicates, the middlemost value in the arrangement,
which divides the data into two equal parts. It is obtained by arranging the data
in increasing or decreasing order of magnitude. If $X_1, X_2, \ldots, X_n$ are the
observations, then the numbers arranged in ascending order will be $X_{[1]}, X_{[2]},
\ldots, X_{[n]}$, where $X_{[i]}$ is the ith smallest value (i.e. $X_{[1]} \le X_{[2]} \le \cdots \le X_{[n]}$).
The median is denoted by $\tilde{X}$.
Median for ungrouped data: We arrange the sample in ascending order of the
variable of interest. If the sample size n is odd, the median is the middle
value; if n is even, the median is the average of the two middle values.
The median is obtained by
$$\tilde{X} = \begin{cases} X_{[(n+1)/2]}, & \text{if } n \text{ is odd} \\ \dfrac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right), & \text{if } n \text{ is even} \end{cases}$$
Example: Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4
b) 2, 1, 8, 3, 5
Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9. Here n = 6, which is even, so
$$\tilde{X} = \frac{1}{2}\left(X_{[n/2]} + X_{[(n/2)+1]}\right) = \frac{1}{2}\left(X_{[3]} + X_{[4]}\right) = \frac{1}{2}(5 + 6) = 5.5$$
b) Order the data: 1, 2, 3, 5, 8. Here n = 5, which is odd, so the middle value is
the 3rd observation. So the median is 3.
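The odd and even cases of the median rule can be sketched in Python as below. The helper name `median` is an illustrative choice; the standard-library function `statistics.median` is used only as a cross-check.

```python
import statistics


def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                # X[(n+1)/2], 1-based
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of X[n/2] and X[n/2 + 1]


median_a = median([6, 5, 2, 8, 9, 4])   # n = 6, even
median_b = median([2, 1, 8, 3, 5])      # n = 5, odd
library_a = statistics.median([6, 5, 2, 8, 9, 4])
print(median_a, median_b, library_a)
```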
Median for grouped data: If data are given in the form of a continuous frequency
distribution, the median is defined as:
$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$
Where:
$L_{med}$ = lower class boundary of the median class
$w$ = the size of the median class
$n$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the median class
$f_{med}$ = the frequency of the median class
Remark:
The median class is the class with the smallest cumulative frequency (less than
type) greater than or equal to $\frac{n}{2}$.
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
 First find the less than cumulative frequency.
 Identify the median class.
 Find median using formula.
Class    Frequency    Cum. Freq. (less than type)
40-44    7            7
45-49    10           17
50-54    22           39
55-59    15           54
60-64    12           66
65-69    6            72
70-74    3            75
$\frac{n}{2} = \frac{75}{2} = 37.5$
39 is the first cumulative frequency to be greater than or equal to 37.5,
so 50-54 is the median class.
$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$
$$\Rightarrow \tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$
Merits and Demerits of Median
Merits:
 Median is a positional average and hence not influenced by extreme
observations.
 Can be calculated in the case of open end intervals.
 Median can be located even if the data are incomplete.
Demerits:
 It is not a good representative of data if the number of items is small.
 It is not amenable to further algebraic treatment.
 It is susceptible to sampling fluctuations.
Remark: In the case of a symmetrical distribution, the mean, median and mode
coincide. That is, mean = median = mode. However, for a moderately asymmetrical
(nonsymmetrical) distribution, the mean and the mode lie on the two ends and the
median lies between them, and they approximately satisfy the following important
empirical relationship: (Mean - Mode) = 3(Mean - Median).
3.2.6 Quantiles
When a distribution is arranged in order of magnitude of items, the median is
the value of the middle term. Other measures that depend on their positions
in the distribution (quartiles, deciles and percentiles) are collectively called quantiles.
3.2.6.1 Quartiles
Quartiles are measures that divide the frequency distribution in to four equal
parts. The value of the variables corresponding to these divisions are denoted Q1,
Q2, and Q3 often called the first, the second and the third quartile respectively.
Q1 is a value which has 25% items which are less than or equal to it. Similarly
Q2 has 50%items with value less than or equal to it and Q3 has 75% items whose
values are less than or equal to it.
To find $Q_i$ (i = 1, 2, 3) we count up to $\frac{iN}{4}$ observations, beginning from the lowest class.
For grouped data we have the following formula:
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{iN}{4} - c\right), \quad i = 1, 2, 3$$
Where:
$L_{Q_i}$ = lower class boundary of the quartile class
$w$ = the size of the quartile class
$N$ = total number of observations
$c$ = the cumulative frequency (less than type) preceding the quartile class
$f_{Q_i}$ = the frequency of the quartile class
Remark:
The quartile class (the class containing $Q_i$) is the class with the smallest cumulative
frequency (less than type) greater than or equal to $\frac{iN}{4}$.
1) Deciles: Deciles are measures that divide the frequency distribution into
ten equal parts.
The values of the variable corresponding to these divisions are denoted D1, D2, ...,
D9, often called the first, the second, ..., the ninth decile respectively.
To find $D_i$ (i = 1, 2, ..., 9) we count up to $\frac{iN}{10}$ observations, beginning from the lowest class.
$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{iN}{10} - c\right), \quad i = 1, 2, \ldots, 9$$
Where:
$L_{D_i}$ = lower class boundary of the decile class
$w$ = the size of the decile class
$c$ = the cumulative frequency (less than type) preceding the decile class
$f_{D_i}$ = the frequency of the decile class
Remark:
The decile class (the class containing $D_i$) is the class with the smallest cumulative
frequency (less than type) greater than or equal to $\frac{iN}{10}$.
2) Percentiles: Percentiles are measures that divide the frequency
distribution into a hundred equal parts. The values of the variable
corresponding to these divisions are denoted P1, P2, ..., P99, often called the first,
the second, ..., the ninety-ninth percentile respectively.
To find $P_i$ (i = 1, 2, ..., 99) we count up to $\frac{iN}{100}$ observations, beginning from the lowest
class.
$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{iN}{100} - c\right), \quad i = 1, 2, \ldots, 99$$
Where:
$L_{P_i}$ = lower class boundary of the percentile class
$w$ = the size of the percentile class
$c$ = the cumulative frequency (less than type) preceding the percentile class
$f_{P_i}$ = the frequency of the percentile class
Remark:
The percentile class (the class containing $P_i$) is the class with the smallest
cumulative frequency (less than type) greater than or equal to $\frac{iN}{100}$.
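The quartile, decile and percentile formulas differ only in the fraction of N that is counted, so they can be sketched as one Python function. The name `grouped_quantile` and its parameter `p` (the fraction i/4, i/10 or i/100) are illustrative assumptions; equal class widths are assumed, and the data are the frequency table of the example that follows. With p = 1/2 the same function gives the grouped median.

```python
from itertools import accumulate


def grouped_quantile(lower_bounds, width, freqs, p):
    """Quantile of a grouped distribution: L + (w / f) * (p * N - c)."""
    cum = list(accumulate(freqs))          # less-than cumulative frequencies
    target = p * cum[-1]                   # p * N
    # quantile class: first class whose cumulative frequency reaches p * N
    k = next(i for i, c in enumerate(cum) if c >= target)
    c_before = cum[k - 1] if k > 0 else 0  # cumulative frequency before that class
    return lower_bounds[k] + width / freqs[k] * (target - c_before)


lower_bounds = list(range(140, 250, 10))   # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]
q1, q2, q3 = (grouped_quantile(lower_bounds, 10, freqs, i / 4) for i in (1, 2, 3))
d7 = grouped_quantile(lower_bounds, 10, freqs, 7 / 10)
p90 = grouped_quantile(lower_bounds, 10, freqs, 90 / 100)
print([round(v, 2) for v in (q1, q2, q3, d7, p90)])
```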
Example: Considering the following distribution
Calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:
• First find the less than cumulative frequency.
• Use the formula to calculate the required quantile.
Values     Frequency    Cum. Freq. (less than type)
140-150    17           17
150-160    29           46
160-170    42           88
170-180    72           160
180-190    84           244
190-200    107          351
200-210    49           400
210-220    34           434
220-230    31           465
230-240    16           481
240-250    12           493
a) Quartiles:
i. Q1
- Determine the class containing the first quartile.
$\frac{N}{4} = 123.25$, so 170-180 is the class containing the first quartile.
$L_{Q_1} = 170$, $w = 10$, $N = 493$, $c = 88$, $f_{Q_1} = 72$
$$Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{N}{4} - c\right) = 170 + \frac{10}{72}(123.25 - 88) = 174.90$$
ii. Q2
- Determine the class containing the second quartile.
$\frac{2N}{4} = 246.5$, so 190-200 is the class containing the second quartile.
$L_{Q_2} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{Q_2} = 107$
$$Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2N}{4} - c\right) = 190 + \frac{10}{107}(246.5 - 244) = 190.23$$
iii. Q3
- Determine the class containing the third quartile.
$\frac{3N}{4} = 369.75$, so 200-210 is the class containing the third quartile.
$L_{Q_3} = 200$, $w = 10$, $N = 493$, $c = 351$, $f_{Q_3} = 49$
$$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3N}{4} - c\right) = 200 + \frac{10}{49}(369.75 - 351) = 203.83$$
b) D7
- Determine the class containing the 7th decile.
$\frac{7N}{10} = 345.1$, so 190-200 is the class containing the seventh decile.
$L_{D_7} = 190$, $w = 10$, $N = 493$, $c = 244$, $f_{D_7} = 107$
$$D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7N}{10} - c\right) = 190 + \frac{10}{107}(345.1 - 244) = 199.45$$
c) P90
- Determine the class containing the 90th percentile.
$\frac{90N}{100} = 443.7$, so 220-230 is the class containing the 90th percentile.
$L_{P_{90}} = 220$, $w = 10$, $N = 493$, $c = 434$, $f_{P_{90}} = 31$
$$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90N}{100} - c\right) = 220 + \frac{10}{31}(443.7 - 434) = 223.13$$
Review Exercise on Chapter Three
1. Given the data 5, 6, 7, 4, 9, 10, 12, 20, 3, 8, if each item is multiplied by 5 and then 6 is
added, the new mean will be _________
2. Marks of 75 students are summarized in the following frequency distribution:
Marks    No. of students
40-44    7
45-49    10
50-54    22
55-59    f4
60-64    f5
65-69    6
70-74    3
If 20% of the students have marks between 55 and 59:
a. Find the missing frequencies f4 and f5.
b. Find the mean.
c. Find the median.
d. Find the mode.
3. The following data on income in the form of cumulative frequency distribution

is given:
INCOME NO.OF PERSONS
100---200 15
100---300 33
100---400 63
100---500 83
100---600 100
Find (a) The mean
(b) The median, mode and all quartiles
(c) The 2nd and the 8th deciles
(d) The 40th and the 90th percentiles
CHAPTER FOUR
4. MEASURES OF VARIATION
Objectives:
After completing this chapter, you should be able to
 Explain the meaning of measures of dispersion
 Describe data, using measures of variation, such as the range, mean
deviation, variance and standard Deviation.
 Understand the characteristics, uses, advantages, and disadvantages of
each measure of dispersion.
 Understand Chebyshev's theorem and the Empirical Rule as they relate to
a set of observations.
 Apply the Z-score to find out the relative standing of values.
 Explain measures of skewness and kurtosis.

 Identify the position of the mean, median, and mode for both symmetric
and skewed distributions.
4.1 Introduction
In addition to locating the center of the observed values of the variable in the
data, another important aspect of a descriptive study of the variable is
numerically measuring the extent of variation around the center. Two data sets
of the same variable may exhibit similar positions of center but may be
remarkably different with respect to variability.
Just as there are several different measures of center, there are also several
different measures of variation. In this section, we will examine three of the most
frequently used measures of variation: the sample range, the sample
interquartile range and the sample standard deviation. Measures of variation are
used mostly for quantitative variables.
4.2 Objectives of Measuring Variation

The general objective of measuring dispersion is to obtain a single summary figure
which adequately shows whether the distribution is compact or spread out.
The specific objectives are:
 To judge the reliability of measures of central tendency
 To control variability itself.
 To compare two or more groups of numbers in terms of their variability.
 To make further statistical analysis.
4.3 Absolute and Relative Measures of Dispersion
The measures of dispersion which are expressed in terms of the original unit of a
series are termed as absolute measures. Such measures are not suitable for
comparing the variability of two distributions which are expressed in different
units of measurement and different average size. Relative measures of
dispersions are a ratio or percentage of a measure of absolute dispersion to an
appropriate measure of central tendency and are thus pure numbers
independent of the units of measurement. For comparing the variability of two
distributions (even if they are not measured in the same unit), we compute the
relative measure of dispersion instead of absolute measures of dispersion.
Absolute measures are useful for comparing variation in two or more distributions
whose units of measurement are the same. Various measures of dispersion are in use.
The most commonly used measures of dispersion are:
1. Range and Relative Range
2. Quartile Deviation and Coefficient of Quartile Deviation
3. Mean Deviation and Coefficient of Mean Deviation
4. Standard Deviation and Coefficient of Variation
4.3.1 The Range and Relative Range

The Range (R): The range is the largest score minus the smallest score. It is a
quick but crude measure of variability, although when a test is given back to
students they very often wish to know the range of scores. Because the range is
greatly affected by extreme scores, it may give a distorted picture of the scores.
The range for a grouped frequency distribution is the upper class boundary of the last
class interval minus the lower class boundary of the first class interval, i.e.,
$R = UCB_{lci} - LCB_{fci}$.
The following two distributions have the same range, 13, yet appear to differ
greatly in the amount of variability.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
For this reason, among others, the range is not the most important measure of
variability.
Range for grouped data:

If data are given in the form of a continuous frequency distribution, the range is
computed as:
$R = UCL_k - LCL_1$, where $UCL_k$ is the upper class limit of the last class and
$LCL_1$ is the lower class limit of the first class.
This is sometimes expressed as:
$R = X_k - X_1$, where $X_k$ is the class mark of the last class and
$X_1$ is the class mark of the first class.
Merits and Demerits of range
Merits:
 It is rigidly defined.
 It is easy to calculate and simple to understand.
Demerits:
 It is not based on all observation.
 It is highly affected by extreme observations.
 It is affected by fluctuation in sampling.
 It cannot be computed in the case of open end distribution.
 It is very sensitive to the size of the sample.
Relative Range (RR): It is also sometimes called the coefficient of range and is given
by:
$$RR = \frac{L - S}{L + S}, \quad \text{where } L \text{ is the largest value and } S \text{ is the smallest value}$$
Example:
1. Find the relative range of the above two distributions. (Exercise!)
2. If the range and relative range of a series are 4 and 0.25 respectively, then
what is the value of:
a. the smallest observation?
b. the largest observation?
Solution: (2)
$R = 4 \Rightarrow L - S = 4$ .......... (1)
$RR = 0.25 \Rightarrow \frac{L - S}{L + S} = 0.25 \Rightarrow L + S = 16$ .......... (2)
Solving (1) and (2) simultaneously, one obtains
$L = 10$ and $S = 6$.
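The range, the relative range and the solution of Example 2 can be sketched in Python as follows. The function names `value_range` and `relative_range` are illustrative; the second part simply solves the two equations L - S = R and L + S = R / RR.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def relative_range(values):
    """(largest - smallest) / (largest + smallest)."""
    return (max(values) - min(values)) / (max(values) + min(values))


dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]
print(value_range(dist1), value_range(dist2))  # same range for both

# Example 2: R = 4 and RR = 0.25 give L + S = R / RR; then solve for L and S
R, RR = 4, 0.25
total = R / RR                        # L + S
largest, smallest = (total + R) / 2, (total - R) / 2
print(largest, smallest)
```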
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation

The Quartile Deviation (Semi-interquartile range, Q.D): The interquartile
range is the difference between the third and the first quartiles of a set of items,
and the semi-interquartile range is half of the interquartile range.
$$Q.D = \frac{Q_3 - Q_1}{2}$$
Coefficient of Quartile Deviation (C.Q.D):
$$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Remark: Q.D or C.Q.D is based only on the middle 50% of the observations.
4.3.3 The Mean Deviation And Coefficient Of Mean Deviation

The Mean Deviation (M.D): The mean deviation of a set of items is defined as
the arithmetic mean of the absolute deviations from a given average. Depending
on the type of average used we have different mean deviations.
a) Mean deviation about the mean:
$$MD = \frac{1}{n}\sum_{i=1}^{n} |X_i - \bar{X}|$$
For frequency distribution data, where the values X1, X2, X3, …, Xm
occur f1, f2, f3, …, fm times respectively, the mean deviation is obtained by:
$$MD = \frac{\sum_{i=1}^{m} f_i |X_i - \bar{X}|}{\sum_{i=1}^{m} f_i}$$
For grouped data, that is, if the data are given in the form of a frequency distribution
of k classes in which $m_i$ and $f_i$ are the class mark and frequency of the ith class
respectively, the mean deviation is given by:
$$MD = \frac{\sum_{i=1}^{k} f_i |m_i - \bar{X}|}{\sum_{i=1}^{k} f_i}$$
b) Mean deviation from the median = $\frac{1}{n}\sum |x_i - Md|$
c) Mean deviation from the mode = $\frac{1}{n}\sum |x_i - \text{Mode}|$
In the case of a frequency distribution (with $n = \sum f_i$):
b') Mean deviation from the median = $\frac{1}{n}\sum f_i |x_i - Md|$
c') Mean deviation from the mode = $\frac{1}{n}\sum f_i |x_i - \text{Mode}|$
Steps to calculate M.D:
1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviations of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
Example: Calculate the mean deviation for the following data:
Xi    10    8     9     7     6
fi    8     9     13    6     3
Solution: First find the mean: $\bar{X} = \frac{\sum f_i X_i}{\sum f_i}$ = (10*8 + 8*9 + … + 6*3)/(8 + 9 + … + 3) = 329/39 ≈
8.4, then
Xi                        10      8      9      7      6
fi                        8       9      13     6      3
$|X_i - \bar{X}|$         1.6     0.4    0.6    1.4    2.4
$f_i |X_i - \bar{X}|$     12.8    3.6    7.8    8.4    7.2
Thus, $MD = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i}$ = (12.8 + … + 7.2)/(8 + … + 3) = 39.8/39 = 1.02.
Interpretation: each value deviates on average by 1.02 from the arithmetic
mean, 8.4.
Note: You can also calculate the mean deviation about the median and the mode.
Coefficient of Mean Deviation (C.M.D): the mean deviation divided by the average
about which the deviations are taken. About the mean,
$$CMD = \frac{MD}{\bar{X}}$$
Exercise: Find the coefficient of mean deviation about the mean for the above
example.
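The steps for the mean deviation about the mean can be sketched in Python for the frequency data of the example. The function name `mean_deviation` and the variable names are illustrative; the sketch computes the deviation both with the mean rounded to 8.4, as in the worked solution, and with the unrounded mean.

```python
def mean_deviation(values, freqs, center):
    """Frequency-weighted mean of absolute deviations from `center`."""
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


values = [10, 8, 9, 7, 6]
freqs = [8, 9, 13, 6, 3]

# Step 1: arithmetic mean of the frequency distribution
mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# Steps 2-3: average absolute deviation, using the rounded and the exact mean
md_rounded_mean = mean_deviation(values, freqs, round(mean, 1))
md_exact_mean = mean_deviation(values, freqs, mean)
print(round(mean, 4), round(md_rounded_mean, 2), round(md_exact_mean, 2))
```

Passing the median or the mode as `center` gives the mean deviation about those averages instead.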
4.3.4 The Variance, Standard Deviation and the Coefficient of Variation

The Variance: The variance is the "average squared deviation from the mean": it
measures the average of the squared deviations of the observations from their
mean.
Suppose we have a population of N observations, say X1, X2, X3, …, XN; then we
define the population variance as:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \mu^2$$
But most of the time we have a sample of n observations, say X1, X2, X3, …, Xn,
from the population of N; then we define the sample variance as:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
This measure of variation is widely used to show the scatter of the
individual measurements around the mean of all the measurements in a given
distribution. Its disadvantage is that the units of the variance are the square of
the units of the original observations. The easiest way around this difficulty is to use
the square root of the variance as the measure of variability, called the standard
deviation.
The population and the sample standard deviations, denoted by σ and S
respectively, are defined as:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2} \quad \text{and} \quad S = \sqrt{S^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
For frequency distribution data the population and sample variances
are given as:
$$\sigma^2 = \frac{\sum_{i=1}^{m} f_i (X_i - \mu)^2}{N} \quad \text{and} \quad S^2 = \frac{\sum_{i=1}^{m} f_i (X_i - \bar{X})^2}{n - 1}$$
where N (respectively n) is the sum of the frequencies, and the square roots of these
give the corresponding standard deviations.
Variance and Standard Deviation for Grouped Data
To obtain the variance and standard deviation of data presented in a grouped
frequency distribution, we make the same assumption that was made in the
calculation of the mean for grouped data: the observations in each class are
represented by the class mark. The calculation uses the same formula as for data
given in a frequency distribution, except that $X_i$ is replaced by the midpoint of
each class and m by k, the number of classes.
The following steps are used to calculate the sample variance:
1. Find the arithmetic mean.

2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data is a sample, divide the number (from step 4 above) by the
number of observations minus one, (i.e., n-1), where n is the number of
observations in the data set.
Example: Areas of sprayable surfaces with DDT from a sample of 15 houses are
as follows (m²): 101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136,
137, 140, 145. Find the variance and standard deviation of the above
distribution.
Solution: The mean of the sample is 125 m², then
$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ = {(101-125)² + (105-125)² + … + (145-125)²}/(15-1) = 2502/14 =
178.71 m⁴
Hence, the standard deviation is S = (178.71 m⁴)^(1/2) = 13.37 m².
This implies that the sprayable surface of a house deviates from the mean by
about 13.37 m² on average.
Examples: Find the variance and standard deviation of the following sample
data
a) 5, 17, 12, 10.

b) The data is given in the form of grouped frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions: a) $\bar{X} = 11$
Xi                      5     10    12    17    Total
$(X_i - \bar{X})^2$     36    1     1     36    74
Then S² = 74/(4-1) = 24.67 and S = (24.67)^(1/2) = 4.97
b) $\bar{X} = 55$
mi (midpoint)               42      47     52     57    62     67     72     Total
$f_i (m_i - \bar{X})^2$     1183    640    198    60    588    864    867    4400
Then S² = 4400/(75-1) = 59.46 and S = (59.46)^(1/2) = 7.71
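The five steps for the sample variance, together with the grouped-data variant that uses class midpoints, can be sketched in Python as below. The function name `sample_variance` and its optional `freqs` argument are illustrative assumptions.

```python
import math


def sample_variance(values, freqs=None):
    """Sum of (frequency-weighted) squared deviations from the mean over n - 1."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value occurs once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return squared / (n - 1)


# a) ungrouped sample
var_a = sample_variance([5, 17, 12, 10])

# b) grouped data: class midpoints with their frequencies
midpoints = [42, 47, 52, 57, 62, 67, 72]
freqs = [7, 10, 22, 15, 12, 6, 3]
var_b = sample_variance(midpoints, freqs)

print(round(var_a, 2), round(math.sqrt(var_a), 2))
print(round(var_b, 2), round(math.sqrt(var_b), 2))
```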
Some Important Properties of Variance and Standard Deviation

1. For a normal (symmetric, bell-shaped) distribution the following holds.
• Approximately 68.27% of the data values fall within one standard deviation of the
mean, i.e. within $(\bar{X} - S, \bar{X} + S)$.
• Approximately 95.45% of the data values fall within two standard deviations of the
mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$.
• Approximately 99.73% of the data values fall within three standard deviations of the
mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
2. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the
values that fall within k standard deviations of the mean, i.e. within
$(\bar{X} - kS, \bar{X} + kS)$, will be at least $1 - \frac{1}{k^2}$, where k is a number greater than 1.
That is, the proportion of items falling beyond k standard deviations of the mean
is at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation 6. What
percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.
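Solutions:
a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12.
$\Rightarrow kS = 12 \Rightarrow k = \frac{12}{S} = \frac{12}{6} = 2$
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = 75\%$ of the numbers lie
between 38 and 62.
b) Done similarly: here kS = 18, so k = 3 and at least $\left(1 - \frac{1}{9}\right) \times 100\% \approx 88.89\%$ of
the numbers lie between 32 and 68.
c) It is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = 25\%$ of the numbers are
less than 38 or more than 62.
d) Done similarly: it is the complement of b), so at most about 11.11% of the numbers
are less than 32 or more than 68.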
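The four parts of this example can be checked with a short Python sketch of Chebyshev's bound. The function name `chebyshev_bound` is hypothetical, and the sketch assumes an interval that is symmetric about the mean (if it is not, the smaller distance is used, which keeps the bound valid).

```python
def chebyshev_bound(mean, sd, low, high):
    """Minimum proportion of values guaranteed to lie between low and high."""
    k = min(mean - low, high - mean) / sd  # number of standard deviations
    if k <= 1:
        return 0.0                         # the theorem says nothing useful for k <= 1
    return 1 - 1 / k ** 2


inside_a = chebyshev_bound(50, 6, 38, 62)  # part a); part c) is its complement
inside_b = chebyshev_bound(50, 6, 32, 68)  # part b); part d) is its complement
print(inside_a, 1 - inside_a)
print(round(inside_b, 4), round(1 - inside_b, 4))
```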
3. Consider a sample X1, …, Xn, which will be referred to as the original
sample. To create a translated sample, add a constant C to each data
point: let Yi = Xi + C, i = 1, …, n. For the standard deviation of the
translated sample the following relationship holds: if Yi = Xi + C,
i = 1, …, n, then Sy = Sx. Therefore, the standard deviation of Y is the
same as the standard deviation of X.
4. What happens to the standard deviation if the units or scales being worked
with are changed? A re-scaled sample can be created: if Yi = CXi, i = 1, …,
n, then Sy = |C|Sx and S²y = C²S²x. Therefore, to find the variance and
standard deviation of the Y's, compute the variance and standard deviation
of the X's and multiply them by C² and |C|, respectively.
Example: If we have a sample of temperatures in °C with a standard
deviation of 1.8, what is the standard deviation of the same sample of
temperatures in °F?
Solution: Let Yi denote the °F temperature that corresponds to a °C
temperature of Xi. The required transformation to convert the data to
°F is Yi = (9/5)Xi + 32, i = 1, 2, 3, …, n. Adding 32 does not change the
standard deviation, so the standard deviation in °F is Sy = (9/5)(1.8) = 3.24 °F.
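Properties 3 and 4 can be verified numerically with a small Python sketch. The temperature readings below are hypothetical values invented for the illustration; only the relationships between the standard deviations matter, with the conversion °F = (9/5)°C + 32.

```python
import statistics

celsius = [20.5, 22.0, 19.8, 24.1, 23.3, 21.7]    # hypothetical readings in °C
fahrenheit = [9 / 5 * c + 32 for c in celsius]    # re-scaled and translated sample
shifted = [c + 32 for c in celsius]               # translated sample only

sd_c = statistics.stdev(celsius)       # sample standard deviation (n - 1 divisor)
sd_f = statistics.stdev(fahrenheit)
sd_shifted = statistics.stdev(shifted)

# The ratio sd_f / sd_c should equal the scale factor 9/5; the shift has no effect
print(round(sd_f / sd_c, 6), round(sd_shifted / sd_c, 6))
```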
5. On the other hand, when standard deviations of a variable are available for
several samples and we need to compute the combined standard deviation, the
pooled standard deviation (Sp) of the entire group consisting of all the
samples may be computed as:
$$S_p = \sqrt{\frac{\sum_i (n_i - 1) S_i^2}{\sum_i (n_i - 1)}}$$
where $n_i$ and $S_i$ represent the number of observations
and the standard deviation of each single sample, respectively.
6. The value of S is never negative, and it is zero only when all of the data values
are the same. Values close together yield a small SD, whereas values spread
apart yield a larger SD; larger values of S indicate a greater amount of
variation.
Example: The standard deviation of systolic blood pressure was found to be
10.6 and 15.2 mm Hg, respectively, for two groups of 12 and 15 men. What is
the standard deviation of systolic pressure of all the 27 men?
Solution: Given: Group 1: S1 = 10.6 and n1 = 12; Group 2: S2 = 15.2 and n2 = 15.
Then
Sp = {(11*10.6² + 14*15.2²)/(11 + 14)}^(1/2) = 13.37 mm Hg.
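The blood-pressure example can be reproduced with a short Python sketch. It assumes the usual pooled formula, in which each sample variance is weighted by its degrees of freedom n_i - 1; the function name `pooled_sd` is an illustrative choice.

```python
import math


def pooled_sd(sizes, sds):
    """Pooled standard deviation with (n_i - 1) weights on the variances."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    denominator = sum(n - 1 for n in sizes)
    return math.sqrt(numerator / denominator)


sp = pooled_sd([12, 15], [10.6, 15.2])  # two groups of 12 and 15 men
print(round(sp, 2))
```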
Coefficient of Variation (CV): The coefficient of variation (CV) is defined by
$$CV = \frac{S}{\bar{X}} \times 100\%$$
The coefficient of variation is most useful in comparing the variability of
several different samples, each with a different mean. This is because a higher
variability is usually expected when the mean increases, and the CV is a
measure that accounts for this.
The coefficient of variation is also useful for comparing the reproducibility of
different variables. CV is a relative measure, free from the unit of measurement.
Example: An analysis of the monthly wages paid (in Birr) to workers in two
firms A and B belonging to the same industry gives the following results.
In which firm is there greater variability in individual wages?
Value          Firm A    Firm B
Mean wage      52.5      47.5
Median wage    50.5      45.5
Variance       100       121
Solution: $C.V_A = \frac{S_A}{\bar{X}_A} \times 100\%$ = (10/52.5) × 100% = 19.05% and
$C.V_B = \frac{S_B}{\bar{X}_B} \times 100\%$ = (11/47.5) × 100% = 23.16%.
Since C.V_A is smaller than C.V_B, there is greater variability in individual wages in firm B.
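The comparison of the two firms can be sketched in Python as follows. The function name `coefficient_of_variation` is illustrative; it takes the variance as input because that is what the table reports, and takes its square root to obtain the standard deviation.

```python
import math


def coefficient_of_variation(mean, variance):
    """Standard deviation expressed as a percentage of the mean."""
    return math.sqrt(variance) / mean * 100


cv_a = coefficient_of_variation(52.5, 100)   # firm A
cv_b = coefficient_of_variation(47.5, 121)   # firm B
more_variable = "A" if cv_a > cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```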
Exercises 4.1
1. Find the missing information from the following data.
                      Group 1    Group 2    All groups
Mean                  55         70         60
Sample size           100        ?          150
Standard deviation    15         10         ?
2. A meteorologist interested in the consistency of temperatures in three cities
during a given week collected the following data. The temperatures for five
days of the week in the three cities were:
City 1    25 24 23 26 17
City 2    22 21 24 22 20
City 3    32 27 35 24 28
Which city has the most consistent temperature, based on these data?
4.5 The Standard Score (Z-score)

The Z-score is the number of standard deviations that a given value X is below or
above the mean, and is defined as $Z = \frac{X - \bar{X}}{S}$ (for sample data sets) and
$Z = \frac{X - \mu}{\sigma}$ (for population data sets). Values above the mean have positive
Z-scores and values below the mean have negative Z-scores. The numerical value of
the Z-score reflects the relative standing of the value; because of this, the
Z-score is also referred to as a measure of relative standing. Scores are generally
meaningless by themselves unless they are compared to the distribution of scores
from some reference group. In addition to comparing data sets, the Z-score is
useful for transforming a given data set into a new distribution with mean zero
and variance one; when the original data are normally distributed, the result is
the standard normal distribution (we will see it in the chapters on
hypothesis testing).
Note: A Z-score less than -2 or greater than 2 is considered an unusual
value, while a Z-score between -2 and 2 is considered an ordinary value.
Example 1. Two sections were given an introduction to statistics examination.
The following information was given.
Value                 Section 1    Section 2
Mean                  78           90
Standard deviation    6            5
Student A from section 1 scored 90 and student B from section 2 scored 95.
Relatively speaking, who performed better?
Solution: $Z_A = \frac{X_A - \bar{X}_1}{S_1}$ = (90-78)/6 = 2 and $Z_B = \frac{X_B - \bar{X}_2}{S_2}$ = (95-90)/5 = 1.
Student A performed better relative to his section, because the score of student A
is two standard deviations above the mean score of his section, while the score of
student B is only one standard deviation above the mean score of his section.
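The comparison of the two students can be sketched in Python. The helper names `z_score` and `is_unusual` are illustrative; `is_unusual` applies the rule of thumb from the note above, under which a Z-score beyond -2 or 2 is treated as unusual.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(z):
    """Rule of thumb: a Z-score below -2 or above 2 is unusual."""
    return abs(z) > 2


z_a = z_score(90, 78, 6)   # student A, section 1
z_b = z_score(95, 90, 5)   # student B, section 2
better = "A" if z_a > z_b else "B"
print(z_a, z_b, better, is_unusual(z_a), is_unusual(z_b))
```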
Exercise 4.2
1. Two groups of people were trained 100km race and tested to find out which
group is faster to complete the race. For the two groups the following
information was given:
Value         Group one    Group two
Mean          10.4 min     11.9 min
Stan. dev.    1.2 min      1.3 min
Relatively speaking:
a. Which group is more consistent in its performance?
b. Suppose a person A from group one take 9.2 minutes while person B from
Group two take 9.3 minutes, who was faster in completing the race? Why?
4.6. Moments, Skewness and Kurtosis

In describing a numerical data set it is not only necessary to summarize the data
by presenting appropriate measures of central tendency, dispersion and relative
standing, it is also necessary to consider the shape of the data – the manner, in
which the data are distributed. There are two measures of the shape of a data
set: skewness and kurtosis.
Moments
Moments are statistical measures used to describe the characteristics of a
distribution and we can have moment about any number A and /or about the
mean (called central moment).
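The rth moment of the distribution about the mean is:
$M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^r$ for an ungrouped data set and
$M_r = \frac{1}{n}\sum_{i=1}^{k} f_i (X_i - \bar{X})^r$, with $n = \sum f_i$, for a grouped data set.
The rth moment of the distribution about A is:
$M'_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - A)^r$ for an ungrouped data set and
$M'_r = \frac{1}{n}\sum_{i=1}^{k} f_i (X_i - A)^r$, with $n = \sum f_i$, for a grouped data set.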
Skewness
If the distribution of the data is not symmetrical, it is called asymmetrical or
skewed. Skewness characterizes the degree of asymmetry of a distribution
around its mean.
The direction of the skewness depends upon the location of the extreme values.
If the extreme values are the larger observations, the mean will be the measure
of location most greatly distorted toward the upward direction. Since the mean
exceeds the median and the mode, such distribution is said to be positive or
right-skewed. The tail of its distribution is extended to the right.
On the other hand, if the extreme values are the smaller observations, the mean
will be the measure of location most greatly reduced. Since the mean is exceeded
by the median and the mode, such distribution is said to be negative or left-
skewed. The tail of its distribution is extended to the left.
[Figure: a right-skewed distribution and a left-skewed distribution]
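For sample data, the skewness is defined by the formula:
$$Sk = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$$
where n is the number of observations in the sample and s is the standard deviation
of the sample.
It is also possible to find skewness as:
$$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$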
Properties of Skewness
 If SK = 0, then the distribution is symmetrical.
 If SK > 0, then the distribution is positively skewed.
 If SK < 0, then the distribution is negatively skewed.
 There is no theoretical limit to this measure, however, in practice the
value given by this formula falls between -3 and 3.
Kurtosis
Kurtosis characterizes the relative peakedness or flatness of a distribution
compared with the bell-shaped (normal) distribution; that is, kurtosis is a
measure of the degree of peakedness of a distribution.
If a distribution is more peaked than a normal distribution, it is called
leptokurtic; if it is flatter, it is called platykurtic; and if it is
moderately peaked (like the normal distribution), it is called mesokurtic.
Kurtosis of a sample data set is calculated directly from the data by the formula:
= -
It is also possible to calculate the measure of kurtosis from the moments about
the mean of the sample data as $\beta_2 = \frac{M_4}{M_2^2}$, where $M_4$ is the 4th moment about the
mean and $M_2$ is the 2nd moment about the mean.
Interpretation of the value of $\beta_2$:
1. If $\beta_2$ = 3, then the distribution is mesokurtic.
2. If $\beta_2$ > 3, then the distribution is leptokurtic.
3. If $\beta_2$ < 3, then the distribution is platykurtic.
If we want our reference point to be zero, we can change the above coefficient
to: φ = $\beta_2$ - 3.
Accordingly, if φ = 0, then the distribution is said to be mesokurtic.
If φ > 0, then the distribution is said to be leptokurtic.
If φ < 0, then the distribution is said to be platykurtic.
[Figure: distributions with positive and negative kurtosis]
Review Exercise on Chapter Four

1. A dietitian measures the amount of sugar (in grams) in one gram of each of
16 different cereals: 0.03 0.24 0.30 0.47 0.43 0.07 0.47 0.13 0.44 0.39
0.48 0.17 0.13 0.09 0.45 and 0.43. Find the IQR, MD, variance, SD, CV,
skewness and kurtosis of the amount of sugar.
2. A random sample of 10 boys is selected from the population of a certain
camp, and each boy's weight and height are measured and recorded. The
average weight of the boys in the sample is 32.66 kg with a standard deviation of
3.9 kg, and the average height is 95.5 cm with a standard deviation of 5.2 cm.
Which measurement, weight or height, is less variable?
3. Some characteristics of the annual family income distribution (in Birr) in two
regions are as follows:
Region    Mean    Median    Standard Deviation
A         6250    5100      960
B         6980    5500      940
a) Calculate the coefficient of skewness for each region.
b) For which region is the income distribution more skewed? Give your
interpretation for this region.
c) For which region is the income more consistent?
(a) Calculate the pearsonian coefficient of skewness and give appropriate
conclusion.
(b) Are smaller values more or less frequent than bigger values for this
distribution?
(c) If a constant k was added on each observation, what will be the new
pearsonian coefficient of skewness? Show your steps. What do you conclude
from this?
5. The median and the mode of a mesokurtic distribution are 32 and 34
respectively. The 4th moment about the mean is 243. Compute the Pearsonian
coefficient of skewness and identify the type of skewness. Assume (n-1 = n).
6. If the standard deviation of a symmetric distribution is 10, what should be the
value of the fourth moment so that the distribution is mesokurtic?
CHAPTER FIVE
5. ELEMENTARY PROBABILITY
Objectives
 Determine sample spaces, using the fundamental counting rule.
 Find the number of ways that r objects can be selected from n objects,
using the permutation rule.
 Find the number of ways that r objects can be selected from n objects
without regard to order, using the combination rule.
 Find the probability of an event, using the counting rules.
 Find the probability of an event, using classical probability or empirical
probability.
 Find the conditional probability of an event and independency.
5.1 INTRODUCTION
A cynical person once said, "The only two sure things are death and taxes." This
philosophy no doubt arose because so much in people's lives is affected by
chance. From the time you awake until you go to bed, you make decisions
regarding the possible events that are governed at least in part by chance. For
example, should you carry an umbrella to work today? Will your car battery last
until spring? Should you accept that new job?
Probability as a general concept can be defined as the chance of an event

occurring. Many people are familiar with probability from observing or playing
games of chance, such as card games, slot machines, or lotteries. In addition to
being used in games of chance, probability theory is used in the fields of
insurance, investments, and weather forecasting and in various other areas.
Finally, as stated in Chapter 1, probability is the basis of inferential statistics.
For example, predictions are based on probability, and hypotheses are tested by
using probability.
5.2 Definitions of Some Probability Terms

• Experiment: any process of observation or measurement, or any process
which generates well-defined outcomes.
• Random experiment: an experiment which can be repeated any
number of times under the same conditions but does not give a unique result.
The result will be one of several possible outcomes, but for each trial the
result is not known in advance. A random experiment is also called a
trial, and the outcomes are called events.
• Sample space: the collection of all possible outcomes (sample points) of
a random experiment.
• Event: a subset of a sample space; that is, an event is a collection of sample
points.
• Impossible event: an event which will never occur.
Example: In an experiment of tossing a coin three times, S = {HHH, HHT, HTH,
HTT, THH, THT, TTH, TTT}, and each sample point is an equally likely outcome. It is
possible to define many events on this sample space, as follows:
A = {HHH} - the event of getting only heads.
B = {HHH, HHT} - the event of getting heads on the first two tosses.
C = the event of getting the number 9, which is an impossible event.
• Mutually exclusive events: two events A and B are said to be mutually
exclusive if there is no sample point which is common to A and B, i.e. A ∩ B = ∅.
• Independent events: two or more events are said to be independent if the
occurrence or non-occurrence of one event does not affect the occurrence or
non-occurrence of the others.
• Dependent events: two events are dependent if the occurrence of the first event
changes the probability of the second event.
• Complement of an event: the complement of an event A means the
non-occurrence of A and is denoted by A' or Ac; it contains those points of the
sample space which do not belong to A.
• Equally likely outcomes: outcomes are equally likely if each outcome in a
sample space has the same chance of occurring.
5.3 Counting Rules

In order to calculate probabilities, we have to know
 The number of elements of an event
 The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes one can use several rules of
counting:
1. Addition rule
2. Multiplication rule
3. Permutation rule
4. Combination rule.
To list the outcomes of the sequence of events, a useful device called tree
diagram is used.
Example: A student goes to the nearest snack bar to have breakfast. He can take
tea, coffee, or milk with bread, cake, or a sandwich. How many possibilities does
he have?
Solution:
Tea       Bread
          Cake
          Sandwich
Coffee    Bread
          Cake
          Sandwich
Milk      Bread
          Cake
          Sandwich
Therefore, there are nine possibilities.
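The leaves of the tree diagram can also be listed mechanically. The sketch below uses Python's standard `itertools.product`; the variable names are illustrative and not part of the text.

```python
from itertools import product

drinks = ["Tea", "Coffee", "Milk"]
foods = ["Bread", "Cake", "Sandwich"]

# Each (drink, food) pair corresponds to one branch of the tree diagram.
breakfasts = list(product(drinks, foods))
for drink, food in breakfasts:
    print(drink, "with", food)
print(len(breakfasts))
```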
1. The Addition Rule

Suppose that a procedure, designated by 1, can be done in n1 ways. Assume that
a second procedure, designated by 2, can be done in n2 ways. Suppose
furthermore that it is not possible for both 1 and 2 to be done together. Then the
number of ways in which we can do 1 or 2 is n1 + n2.
Example: Suppose we are planning a trip to some place. If there are 3 bus
routes and 2 train routes that we can take, then there are 3 + 2 = 5 different
routes that we can take.
2. Multiplication Rule:
If a choice consists of k steps of which the first can be made in n1 ways, the second
can be made in n2 ways, …, the kth can be made in nk ways, then the whole choice can
be made in (n1 * n2 * ........* nk ) ways.
Example 1: The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification
card. How many different cards are possible if
a) Repetitions are permitted?
b) Repetitions are not permitted?
Solutions:
a)
1st digit    2nd digit    3rd digit    4th digit
5            5            5            5
There are four steps:
1. Selecting the 1st digit, which can be done in 5 ways.
2. Selecting the 2nd digit, which can be done in 5 ways.
3. Selecting the 3rd digit, which can be done in 5 ways.
4. Selecting the 4th digit, which can be done in 5 ways.
∴ 5 * 5 * 5 * 5 = 625 different cards are possible.
b)
1st digit    2nd digit    3rd digit    4th digit
5            4            3            2
There are four steps:
1. Selecting the 1st digit, which can be done in 5 ways.
2. Selecting the 2nd digit, which can be done in 4 ways.
3. Selecting the 3rd digit, which can be done in 3 ways.
4. Selecting the 4th digit, which can be done in 2 ways.
∴ 5 * 4 * 3 * 2 = 120 different cards are possible.
3. Permutation
An arrangement of n objects in a specified order is called a permutation of the
objects. The number of permutations of n different objects taken r at a time is
obtained by:
\[ {}_nP_r = \frac{n!}{(n-r)!}, \quad r = 0, 1, 2, \ldots, n \]
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where \( n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1 \).
Note: By definition 0! = 1.
2. The arrangement of n objects in a specified order using r objects at a time
is called the permutation of n objects taken r objects at a time. It is
written as \({}_nP_r\) and the formula is
\[ {}_nP_r = \frac{n!}{(n-r)!} \]
3. The number of permutations of n objects in which k1 are alike, k2 are alike,
etc., is
\[ \frac{n!}{k_1! \, k_2! \cdots k_m!} \]
Examples:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there if two letters are used at a time?
2. How many different permutations can be made from the letters in the word
"CORRECTION"?
Solutions:
1. a) Here n = 4; there are four distinct objects.
∴ There are 4! = 24 permutations.
b) Here n = 4, r = 2.
∴ There are \( {}_4P_2 = \frac{4!}{(4-2)!} = \frac{24}{2} = 12 \) permutations.
2. Here n = 10, of which 2 are C, 2 are O, 2 are R, 1 is E, 1 is T, 1 is I, and 1 is N.
∴ k1 = 2, k2 = 2, k3 = 2, k4 = k5 = k6 = k7 = 1
Using the 3rd rule of permutation, there are
\[ \frac{10!}{2! \cdot 2! \cdot 2! \cdot 1! \cdot 1! \cdot 1! \cdot 1!} = 453600 \]
permutations.
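The three permutation rules can be checked numerically. In this sketch `math.perm` (Python 3.8+) computes n!/(n - r)!, and `multiset_permutations` is a hypothetical helper name for the third rule.

```python
from collections import Counter
from math import factorial, perm

def multiset_permutations(word):
    """n! / (k1! * k2! * ...), where the k's count the repeated letters."""
    count = factorial(len(word))
    for k in Counter(word).values():
        count //= factorial(k)
    return count

print(factorial(4))                         # rule 1: all four letters
print(perm(4, 2))                           # rule 2: two letters at a time
print(multiset_permutations("CORRECTION"))  # rule 3: repeated letters
```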
EXERCISES 5.1
1. Six different statistics books, seven different physics books, and 3 different
Economics books are arranged on a shelf. How many different arrangements
are possible if;
i. The books in each particular subject must all stand together
ii. Only the statistics books must stand together
4. Combination
A selection of objects without regard to order is called combination.
Example 1: Given the letters A, B, C, and D, list the permutations and
combinations for selecting two letters.
Solutions:
Permutations:
AB BA CA DA
AC BC CB DB
AD BD CD DC
Combinations:
AB BC
AC BD
AD DC
Note that in permutation AB is different from BA, but in combination AB is the
same as BA.
Combination Rule
The number of combinations of r objects selected from n objects is denoted by
\({}_nC_r\) or \(\binom{n}{r}\) and is given by the formula:
\[ \binom{n}{r} = \frac{n!}{(n-r)! \, r!} \]
Examples:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
Solution:
n = 9, r = 5
\[ \binom{n}{r} = \frac{n!}{(n-r)! \, r!} = \frac{9!}{4! \, 5!} = 126 \text{ ways} \]
2. Among 15 clocks there are two defectives. In how many ways can an
inspector choose three of the clocks for inspection so that:
a) There is no restriction.
b) None of the defective clocks is included.
c) Only one of the defective clocks is included.
d) Two of the defective clocks are included.
Solutions: n = 15, of which 2 are defective and 13 are non-defective; and r = 3.
a) If there is no restriction, select three clocks from the 15 clocks. This can
be done in:
\[ \binom{15}{3} = \frac{15!}{12! \, 3!} = 455 \text{ ways} \]
b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defective, which can be
done in:
\[ \binom{2}{0} \binom{13}{3} = 286 \text{ ways} \]
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be
done in:
\[ \binom{2}{1} \binom{13}{2} = 156 \text{ ways} \]
d) Two of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be
done in:
\[ \binom{2}{2} \binom{13}{1} = 13 \text{ ways} \]
EXERCISES: 5.2
1. Out of 5 mathematicians and 7 statisticians, a committee consisting of 2
mathematicians and 3 statisticians is to be formed. In how many ways can this
be done if
a. There is no restriction?
b. One particular statistician should be included?
c. Two particular mathematicians cannot be included on the committee?
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of
poems, and a dictionary, in how many ways can this be done if
a. There is no restriction?
b. The dictionary is selected?
c. 2 novels and 1 book of poems are selected?
5.4 Approaches to Measuring Probability
There are four different conceptual approaches to the study of probability theory.
These are:
 The classical approach.
 The frequentist approach.
 The axiomatic approach.
 The subjective approach.
1. The classical approach

This approach is used when:
- All outcomes are equally likely.
- Total number of outcome is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted
and out of these N_A outcomes are favorable to the event A, then the probability
that event A occurs, denoted P(A), is defined as:
\[ P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)} \]
Examples:
1. A fair die is tossed once. What is the probability of getting
a) Number 4?
b) An odd number?
c) An even number?
d) Number 8?
Solutions:
First identify the sample space, say S:
S = {1, 2, 3, 4, 5, 6}
∴ N = n(S) = 6
a) Let A be the event of number 4.
A = {4}
∴ N_A = n(A) = 1
P(A) = n(A)/n(S) = 1/6
b) Let A be the event of odd numbers.
A = {1, 3, 5}
∴ N_A = n(A) = 3
P(A) = n(A)/n(S) = 3/6 = 0.5
c) Let A be the event of even numbers.
A = {2, 4, 6}
∴ N_A = n(A) = 3
P(A) = n(A)/n(S) = 3/6 = 0.5
d) Let A be the event of number 8.
A = { }
∴ N_A = n(A) = 0
P(A) = n(A)/n(S) = 0/6 = 0
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If
10 of these candles are selected at random, what is the probability that
a) All will be defective?
b) 6 will be non-defective?
c) All will be non-defective?
Solutions:
Total selection = \( \binom{80}{10} = N = n(S) \)
a) Let A be the event that all will be defective.
Total ways in which A occurs = \( \binom{30}{10} \binom{50}{0} = N_A = n(A) \)
\[ \therefore P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10} \binom{50}{0}}{\binom{80}{10}} = 0.00001825 \]
b) Let A be the event that 6 will be non-defective.
Total ways in which A occurs = \( \binom{30}{4} \binom{50}{6} = N_A = n(A) \)
\[ \therefore P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4} \binom{50}{6}}{\binom{80}{10}} \approx 0.2645 \]
c) Let A be the event that all will be non-defective.
Total ways in which A occurs = \( \binom{30}{0} \binom{50}{10} = N_A = n(A) \)
\[ \therefore P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0} \binom{50}{10}}{\binom{80}{10}} = 0.00624 \]
EXERCISES 5.3
1. Two dice are rolled. Find the probability of getting
a. A sum of 5, 6, or 7
b. Doubles or a sum of 6 or 8
c. A sum greater than 8 or less than 3
d. Based on the answers to parts a, b, and c, which is least likely to occur?
Explain why.
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of
poems, and a dictionary, what is the probability that
a. The dictionary is selected?
b. 2 novels and 1 book of poems are selected?
• Shortcomings of the classical approach:
This approach is not applicable when:
- The total number of outcomes is infinite.
- Outcomes are not equally likely.
2. The Frequentist Approach
This approach to probability is based on relative frequencies.
Definition: Suppose we repeat a certain experiment n times, let
A be an event of the experiment, and let k be the number of times that event A
occurs. Then the probability of the event A happening in the long run is
given by:
\[ P(A) = \lim_{n \to \infty} \frac{k}{n} \]
In other words, given a frequency distribution, the probability of an event (A)
being in a given class is
\[ P(A) = \frac{\text{frequency of the class}}{\text{total frequency}} = \frac{f}{n} \]
This is based on the relative frequencies of outcomes belonging to an event.
Example 1: If records show that 60 out of 100,000 bulbs produced are defective.
What is the probability of a newly produced bulb to be defective?
Solution: Let A be the event that the newly produced bulb is defective.
\[ P(A) = \lim_{N \to \infty} \frac{N_A}{N} \approx \frac{60}{100{,}000} = 0.0006 \]
Example 2: Distribution of Blood Types
In a sample of 50 people, 21 had type O blood, 22 had type A blood, 5 had type
B blood, and 2 had type AB blood. Set up a frequency distribution and find the
following probabilities.
a. A person has type O blood.
b. A person has type A or type B blood.
c. A person has neither type A nor type O blood.
d. A person does not have type AB blood.
Solutions:
a. P(O) = f/n = 21/50
b. P(A or B) = 22/50 + 5/50 = 27/50
(Add the frequencies of the two classes.)
c. P(neither A nor O) = 5/50 + 2/50 = 7/50
(Neither A nor O means that a person has either type B or type AB blood.)
d. P(not AB) = 1 - P(AB) = 1 - 2/50 = 48/50
(Find the probability of not AB by subtracting the probability of type AB from
1.)
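The blood-type calculation is a direct use of relative frequencies. The sketch below stores the frequency distribution in a dictionary and uses exact fractions; the helper name `prob` is illustrative. Note that `Fraction` reduces results to lowest terms, so 48/50 is shown as 24/25.

```python
from fractions import Fraction

freq = {"O": 21, "A": 22, "B": 5, "AB": 2}
n = sum(freq.values())

def prob(*types):
    """Relative frequency of the listed blood types."""
    return Fraction(sum(freq[t] for t in types), n)

print(prob("O"))        # a. type O
print(prob("A", "B"))   # b. type A or type B
print(prob("B", "AB"))  # c. neither A nor O
print(1 - prob("AB"))   # d. not AB
```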
Exercise 5.4
1. Hospital Stays for Maternity Patients: Hospital records indicated that
maternity patients stayed in the hospital for the number of days shown in the
distribution.
Number of days stayed Frequency
3 15
4 32
5 56
6 19
7 5
Find these probabilities:
a. A patient stayed exactly 5 days.
b. A patient stayed at most 4 days.
c. A patient stayed less than 6 days.
d. A patient stayed at least 5 days.
3. Axiomatic Approach
Let E be a random experiment and S be a sample space associated with E. With
each event A we associate a real number P(A), called the probability of A, which
satisfies the following properties, called axioms of probability or postulates
of probability.
1. P(A) ≥ 0
2. P(S) = 1, S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other
occurs equals the sum of the two probabilities, i.e. P(A ∪ B) = P(A) + P(B)
4. If A and B are independent events, the probability that both will occur is the
product of the two probabilities, i.e. P(A ∩ B) = P(A) * P(B)
5. P(A') = 1 - P(A)
6. 0 ≤ P(A) ≤ 1
7. P(ø) = 0, ø is the impossible event.
Remark: Venn diagrams can be used to solve probability problems.
[Figure: Venn diagrams showing A, A ∪ B and A ∩ B]
In general, P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
4. Subjective Probability
The fourth type of probability is called subjective probability. Subjective
probability uses a probability value based on an educated guess or estimate,
employing opinions and inexact information. In subjective probability, a person
or group makes an educated guess at the chance that an event will occur. This
guess is based on the person's experience and evaluation of a solution. For
example, a sportswriter may say that there is a 70% probability that the Pirates
will win the pennant next year. A physician might say that, on the basis of her
diagnosis, there is a 30% chance the patient will need an operation. A
seismologist might say there is an 80% probability that an earthquake will occur
in a certain area. These are only a few examples of how subjective probability is
used in everyday life. All four types of probability (classical, empirical, axiomatic
and subjective) are used to solve a variety of problems in business, engineering,
and other fields.
5.5 Conditional Probability and Independency

5.5.1 Conditional Events:
If the occurrence of one event has an effect on the occurrence of the other
event, then the two events are conditional or dependent events.
Examples: Suppose we have two red and three white balls in a bag.
1. Draw a ball with replacement
Since the first drawn ball is replaced before the second draw, it does not affect the
second draw. For this reason A and B are independent. If we let
A = the event that the first draw is red, then p(A) = 2/5
B = the event that the second draw is red, then p(B) = 2/5
2. Draw a ball without replacement
This is conditional because the first drawn ball is not replaced before the
second draw, so it does affect the second draw. If we let
A = the event that the first draw is red, then p(A) = 2/5
B = the event that the second draw is red, then p(B) = ?
Given that the first draw is red, one of the four remaining balls is red, so the
probability that the second draw is red is 1/4.
5.5.2 Conditional Probability of an Event

The conditional probability of an event A given that B has already occurred,
denoted by p(A | B), is
\[ p(A \mid B) = \frac{p(A \cap B)}{p(B)}, \quad p(B) \neq 0 \]
Remark: (1) p(A' | B) = 1 - p(A | B)
(2) p(B' | A) = 1 - p(B | A)
Examples:
1. For a student enrolling as a freshman at a certain university, the probability is
0.25 that he/she will get a scholarship and 0.75 that he/she will graduate.
The probability is 0.2 that he/she will get a scholarship and will also
graduate. What is the probability that a student who gets a scholarship
graduates?
Solution: Let A = the event that a student will get a scholarship
B = the event that a student will graduate
Given: p(A) = 0.25, p(B) = 0.75, p(A ∩ B) = 0.20
Required: p(B | A)
\[ p(B \mid A) = \frac{p(A \cap B)}{p(A)} = \frac{0.20}{0.25} = 0.80 \]
2. If the probability that a research project will be well planned is 0.60 and
the probability that it will be well planned and well executed is 0.54, what
is the probability that it will be well executed given that it is well planned?
Solution: Let A = the event that a research project will be well planned
B = the event that a research project will be well executed
Given: p(A) = 0.60, p(A ∩ B) = 0.54
Required: p(B | A)
\[ p(B \mid A) = \frac{p(A \cap B)}{p(A)} = \frac{0.54}{0.60} = 0.90 \]
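Both examples apply the same formula, which can be written as a small function. This is a minimal sketch: the function name `conditional` is made up, and exact fractions are used so that the results are not affected by floating-point rounding.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """p(B | A) = p(A and B) / p(A); undefined when p(A) = 0."""
    if p_given == 0:
        raise ValueError("the conditioning event has probability zero")
    return p_joint / p_given

# Example 1: p(A and B) = 0.20, p(A) = 0.25
scholarship = conditional(Fraction(20, 100), Fraction(25, 100))
# Example 2: p(A and B) = 0.54, p(A) = 0.60
research = conditional(Fraction(54, 100), Fraction(60, 100))
print(float(scholarship), float(research))
```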
EXERCISE 5.5
1. A lot consists of 20 defective and 80 non-defective items from which two items
are chosen without replacement. Events A & B are defined as A = the first
item chosen is defective, B = the second item chosen is defective
a) What is the probability that both items are defective?
b) What is the probability that the second item is defective?
2. A box contains black chips and white chips. A person selects two chips
without replacement. If the probability of selecting a black chip and a white
chip is 15/56, and the probability of selecting a black chip on the first draw is
3/8, find the probability of selecting the white chip on the second draw, given
that the first chip selected was a black chip.
Note: for any two events A and B the following relation holds.
p(B) = p(B | A) p(A) + p(B | A') p(A')
5.5.3 Probability of Independent Events

Two events A and B are independent if and only if p(A ∩ B) = p(A) p(B).
Here p(A | B) = p(A) and p(B | A) = p(B).
Example; A box contains four black and six white balls. What is the probability
of getting two black balls in drawing one after the other under the following
conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
Solution: Let A = the first ball drawn is black
B = the second ball drawn is black
Required: p(A ∩ B)
a. p(A ∩ B) = p(B | A) p(A) = (3/9)(4/10) = 2/15
b. p(A ∩ B) = p(A) p(B) = (4/10)(4/10) = 4/25
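The two answers can be confirmed by listing every equally likely ordered pair of draws. In this sketch, `itertools.permutations` models drawing without replacement and `itertools.product` models drawing with replacement; the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["black"] * 4 + ["white"] * 6

def prob_two_black(draws):
    """Share of equally likely ordered draws in which both balls are black."""
    draws = list(draws)
    hits = sum(1 for first, second in draws if first == second == "black")
    return Fraction(hits, len(draws))

without_replacement = prob_two_black(permutations(balls, 2))
with_replacement = prob_two_black(product(balls, repeat=2))
print(without_replacement, with_replacement)
```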
Review Exercise on Chapter Five

1. Why would anyone study probability?
2. A newly married couple is planning to have three children. List the elements
of the sample space
a. using B for male and G for female
b. if the sample points in the sample space represents the number of females
3. Four married couples have bought 8 seats in a row for a show. In how many
different ways can they be seated
a. If each couple is to sit together?
b. If all the women sit together?
c. If all the women sit together to the right of all the men?
a. In how many ways can the customers be arranged at the counter?
b. In how many ways can they be arranged at the counter if all the women are
to be seated?
c. In how many ways can they be arranged at the counter if all the women are
to be seated and if men occupy the first and last stool?
d. If customers take seats at random, what is the probability that all of the
men are seated and that a woman occupies the middle stool?
4. In how many ways can a committee of three be chosen from 4 married
couples if
a. All are equally eligible?
b. One particular man must be on the committee?
c. Husband and wife cannot serve in the same committee?
5. Let A and B be two events associated with an experiment and suppose that
P(A)=0.4 while P(AUB)=0.7. Let P(B)=P
a. For what choice of P are A and B mutually exclusive?
b. For what choice of P are A and B independent?
6. The personnel department of a company has records which show the
following analysis of its 200 accountants.
Age         Bachelor's degree only    Master's degree
Under 30    90                        10
30 to 40    20                        30
Over 40     40                        10
If one accountant is selected at random from the company, find
i) The probability he has only a bachelor's degree
ii) The probability he has a master's degree, given that he is over 40
iii) The probability he is under 30, given that he has a bachelor's
degree
CHAPTER SIX
6. RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Objectives:
 define the term random variable
 understand discrete and continuous random variables
 Construct a probability distribution for a random variable.
 Find the mean, variance and expected value for a discrete random
variable.
 Find the exact probability for X successes in n trials of a binomial
experiment.
 Find probabilities for outcomes of variables, using the Poisson and normal
distributions.
 Find the mean, variance, and standard deviation for the variable of a
binomial, Poisson and normal distributions
Introduction
Before probability distribution is defined formally, the definition of a variable is
reviewed. In Chapter one, a variable was defined as a characteristic or attribute
that can assume different values. Various letters of the alphabet, such as X, Y,
or Z, are used to represent variables. Since the variables in this chapter are
associated with probability, they are called random variables.
For example, if a die is rolled, a letter such as X can be used to represent the
outcomes. Then the value that X can assume is 1, 2, 3, 4, 5, or 6, corresponding
to the outcomes of rolling a single die. If two coins are tossed, a letter, say Y,
can be used to represent the number of heads, in this case 0, 1, or 2. As another
example, if the temperature at 8:00 A.M. is 43° and at noon it is 53°, then the
values T that the temperature assumes are said to be random, since they are
due to various atmospheric conditions at the time the temperature was taken.
6.1 Random Variable

Definition: A random variable is a numerical description of the outcomes of the
experiment or a numerical valued function defined on sample space, usually denoted
by capital letters.
Example: If X is a random variable, then it is a function from the elements of the
sample space to the set of real numbers, i.e. X is a function X: S → R.
A random variable takes a possible outcome and assigns a number to it.
Example: Flip a coin three times, and let X be the number of heads in three tosses.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
X(HHH) = 3,
X(HHT) = X(HTH) = X(THH) = 2,
X(HTT) = X(THT) = X(TTH) = 1,
X(TTT) = 0
X = {0, 1, 2, 3}
X assumes a specific number of values with some probabilities.
Random variables are of two types:
1. Discrete random variable: a variable which can assume only a specific
number of values. Its values can be counted.
Examples:
 Toss coin n times and count the number of heads.
 Number of children in a family.
 Number of car accidents per week.
 Number of defective items in a given company.
 Number of bacteria per two cubic centimeter of water.
2. Continuous random variable: a variable that can assume all values between
any two given values.
Examples:
 Height of students at certain college.
 Mark of a student.
 Life time of light bulbs.
 Length of time required to complete a given training.
6.2 Probability Distribution

Definition: A probability distribution shows the possible outcomes of an
experiment and the probability of each of these outcomes. That is, probability
distribution is a complete list of all possible of values of a random variable and
their corresponding probabilities.
A formula giving the probability of the different values of the random variable X
for a:
• Discrete variable is the probability mass function (pmf) and is usually
denoted by p(x). If X is a discrete random variable taking at most a countably
infinite number of values x1, x2, …, then P(xi) = P(X = xi), i = 1, 2, …, is called
the probability mass function of the random variable X. The set of ordered pairs
{xi, P(xi)}, i = 1, 2, …, gives the probability distribution of the random variable X.
• Continuous variable is the probability density function (pdf) and is usually
denoted by f(x). A random variable X is said to be a continuous random
variable if there is a non-negative function f such that
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \]
Example: Consider the experiment of tossing a coin three times. Let X be the
number of heads. Construct the probability distribution of X.
Solution:
• First identify the possible values that X can assume.
• Calculate the probability of each possible distinct value of X and express X
in the form of a frequency distribution.
X = x       0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8
Note: Probability distribution is denoted by P for discrete and by f for continuous

random variable.
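The probability distribution of the number of heads in three tosses can be built directly from the sample space. The sketch below assumes a fair coin, so the eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 equally likely outcomes of three tosses.
sample_space = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
print(pmf)
```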
Properties of Probability Distribution:
1. P(x) ≥ 0, if X is discrete.
f(x) ≥ 0, if X is continuous.
2. \( \sum_x P(X = x) = 1 \), if X is discrete.
\( \int_x f(x)\,dx = 1 \), if X is continuous.
Note:
1. If X is a continuous random variable, then
\[ P(a < X < b) = \int_a^b f(x)\,dx \]
2. The probability of a fixed value of a continuous random variable is zero.
∴ P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
3. If X is a discrete random variable taking integer values, then
\[ P(a < X < b) = \sum_{x=a+1}^{b-1} P(x) \]
\[ P(a \le X < b) = \sum_{x=a}^{b-1} P(x) \]
\[ P(a < X \le b) = \sum_{x=a+1}^{b} P(x) \]
\[ P(a \le X \le b) = \sum_{x=a}^{b} P(x) \]
4. Probability means area for a continuous random variable.

Exercise 6.1:
1. Baseball World Series: the baseball World Series is played by the winner
of the National League and the American League. The first team to win four
games wins the World Series. In other words, the series will consist of four to
seven games, depending on the individual victories. The data shown consist of
the number of games played in the World Series from 1965 through 2005. (There
was no World Series in 1994). The number of games played is represented by the
variable X. Find the probability P(X) for each X, construct a probability
distribution, and draw a graph for the data.
X Number of games played

4 8
5 7
6 9
7 16
2. Determine whether each distribution is a probability distribution.
a. X 0 5 10 15 20
P(X) 1/5 1/5 1/5 1/5 1/5
b. X 1 2 3 4
P(X) 1/4 1/8 1/16 9/16
c. X 0 2 4 6
P(X) -1 1.5 0.3 0.2
d. X 2 3 7
P(X) 0.5 0.3 0.4
6.3 Introduction to Expectation
Definition:
1. Let a discrete random variable X assume the values X1, X2, …, Xn with the
probabilities P(X1), P(X2), …, P(Xn) respectively. Then the expected value of X,
denoted as E(X), is defined as:
\[ E(X) = X_1 P(X_1) + X_2 P(X_2) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i) \]
2. Let X be a continuous random variable assuming values in the interval
(a, b) such that \( \int_a^b f(x)\,dx = 1 \); then \( E(X) = \int_a^b x f(x)\,dx \).
Examples:
1. What is the expected value of a random variable X obtained by tossing a coin
three times, where X is the number of heads?
Solution: First construct the probability distribution of X.
X = x       0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8
∴ E(X) = X1 P(X1) + X2 P(X2) + … + Xn P(Xn)
= 0 * 1/8 + 1 * 3/8 + 2 * 3/8 + 3 * 1/8
= 1.5
2. Suppose a charity organization is mailing printed return-address stickers to
over one million homes in Ethiopia. Each recipient is asked to donate either $1,
$2, $5, $10, $15, or $20. Based on past experience, the amount a person
donates is believed to follow the following probability distribution:
X = x       $1     $2     $5     $10    $15    $20
P(X = x)    0.1    0.2    0.3    0.2    0.15   0.05
How much is an average donor expected to contribute?
Solution:
X = x         $1     $2     $5     $10    $15    $20    Total
P(X = x)      0.1    0.2    0.3    0.2    0.15   0.05   1
xP(X = x)     0.1    0.4    1.5    2      2.25   1      7.25
\[ \therefore E(X) = \sum_{i=1}^{6} x_i P(X = x_i) = \$7.25 \]
Mean and Variance of a random variable

Let X is given random variable.
1. The expected value of X is its mean
 Mean of X  E ( X )
2. The variance of X is given by:
Varianceof X  var( X )  E ( X 2 )  [ E ( X )]2
Where:
n
E ( X 2 )   xi P( X  xi ) , if X is discrete
2
i 1
  x 2 f ( x)dx , if X is continuous.
x
Examples:
1. Find the mean and the variance of the random variable X in example 2 above.
Solution:
X = x          $1    $2    $5    $10   $15    $20   Total
P(X = x)       0.1   0.2   0.3   0.2   0.15   0.05  1
xP(X = x)      0.1   0.4   1.5   2     2.25   1     7.25
x^2 P(X = x)   0.1   0.8   7.5   20    33.75  20    82.15
\[ E(X) = 7.25 \]
\[ \text{Var}(X) = E(X^2) - [E(X)]^2 = 82.15 - 7.25^2 = 82.15 - 52.5625 \approx 29.59 \]
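The table calculation above can be reproduced in a few lines. This is a minimal sketch using the donation distribution from the example; the function name `mean_and_variance` is an illustrative choice.

```python
def mean_and_variance(values, probs):
    """Return (E(X), Var(X)) for a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(values, probs))            # E(X)
    second_moment = sum(x * x * p for x, p in zip(values, probs))  # E(X^2)
    return mean, second_moment - mean ** 2                      # Var(X) = E(X^2) - [E(X)]^2


donation_values = [1, 2, 5, 10, 15, 20]
donation_probs = [0.1, 0.2, 0.3, 0.2, 0.15, 0.05]
donation_mean, donation_var = mean_and_variance(donation_values, donation_probs)
print(round(donation_mean, 2), round(donation_var, 2))
```

The probability check guards against tables such as those in the preceding exercise whose entries do not form a valid distribution.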
Exercise 6.2
1. Two dice are rolled. Let X be a random variable denoting the sum of the
numbers on the two dice.
a. Give the probability distribution of X
b. Compute the expected value of X and its variance
There are some general rules for mathematical expectation.
Let X and Y be random variables and k be a constant.
RULE 1: $E(k) = k$
RULE 2: $\text{Var}(k) = 0$
RULE 3: $E(kX) = kE(X)$
RULE 4: $\text{Var}(kX) = k^2 \text{Var}(X)$
RULE 5: $E(X + Y) = E(X) + E(Y)$
6.4 Common Discrete Probability Distributions

1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following

four requirements called assumptions of a binomial distribution.
1. The experiment consists of n identical trials.
2. Each trial has exactly two possible, mutually exclusive outcomes: a
success or a failure.
3. The probability of each outcome does not change from trial to trial, and
4. The trials are independent; when units are drawn from a finite population, this means we must sample with replacement.
Examples of binomial experiments
- Tossing a coin 20 times to see how many tails occur.
- Asking 200 people if they watch BBC news.
- Registering a newly produced product as defective or non-defective.
Definition: The outcomes of a binomial experiment, together with the corresponding
probabilities of these outcomes, are called a binomial distribution.
Let p = the probability of success and q = 1 - p = the probability of failure on any given trial.
Then the probability of getting x successes in n trials becomes:
\[ P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n \]
This is sometimes written as: X ~ Bin(n, p)
When using the binomial formula to solve problems, we have to identify three
things:
1. The number of trials ( n )
2. The probability of a success on any one trial ( p ) and
3. The number of successes desired ( X ).
Examples:
1. What is the probability of getting three heads by tossing a fair coin four times?
Solution: Let X be the number of heads in tossing a fair coin four times.
X ~ Bin(n = 4, p = 0.50)
\[ P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, 3, 4 \]
\[ = \binom{4}{x} (0.5)^x (0.5)^{4-x} = \binom{4}{x} (0.5)^4 \]
\[ P(X = 3) = \binom{4}{3} (0.5)^4 = 0.25 \]
2. Suppose that an examination consists of six true and false questions, and
assume that a student has no knowledge of the subject matter. The
probability that the student will guess the correct answer to the first question
is 30%. Likewise, the probability of guessing each of the remaining questions
correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
Solution: Let X = the number of correct answers that the student gets.
X ~ Bin(n = 6, p = 0.30)
\[ P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{6}{x} (0.3)^x (0.7)^{6-x}, \quad x = 0, 1, 2, \ldots, 6 \]
a) P(X > 3) = ?
\[ P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6) \]
\[ = 0.060 + 0.010 + 0.001 \]
\[ = 0.071 \]
Thus, we may conclude that if each question is answered by guessing with a
30% chance of being correct, the probability is 0.071 (or 7.1%) that more than
three of the questions are answered correctly by the student.
b) $P(X \ge 2) = ?$
\[ P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) \]
\[ = 0.324 + 0.185 + 0.060 + 0.010 + 0.001 \]
\[ = 0.58 \]
c) $P(X \le 3) = ?$
\[ P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \]
\[ = 0.118 + 0.303 + 0.324 + 0.185 \]
\[ = 0.93 \]
d) $P(X < 5) = ?$
\[ P(X < 5) = 1 - P(X \ge 5) \]
\[ = 1 - \{P(X = 5) + P(X = 6)\} \]
\[ = 1 - (0.010 + 0.001) \]
\[ = 0.989 \]
Exercises 6.3
1. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If
eight of these TVs are randomly selected from across the country and tested,
what is the probability that exactly three of them are defective? Assume that
each TV is made independently of the others.
2. An allergist claims that 45% of the patients she tests are allergic to some type
of weed. What is the probability that
a) Exactly 3 of her next 4 patients are allergic to weeds?
b) None of her next 4 patients are allergic to weeds?
3. Explain why the following experiments are not Binomial
a) Asking 20 people how old they are.
b) Drawing 5 cards from a deck for a poker hand.
Remark: If X is a binomial random variable with parameters n and p, then
\[ E(X) = np, \quad \text{Var}(X) = npq \]
2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability
distribution is given by:
\[ P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots \]
where $\lambda$ = the average number of occurrences.
The Poisson distribution depends only on the average number of occurrences per
unit of time or space.
The Poisson distribution is used as a distribution of rare events, such as
arrivals, accidents, numbers of misprints, hereditary conditions, and natural
disasters like earthquakes.
The process that gives rise to such events is called a Poisson process.
Example: If 1.6 accidents can be expected at an intersection on any given day,
what is the probability that there will be 3 accidents on any given day?
Solution: Let X = the number of accidents, $\lambda = 1.6$.
\[ X \sim \text{Poisson}(1.6) \Rightarrow P(X = x) = \frac{1.6^x e^{-1.6}}{x!} \]
\[ P(X = 3) = \frac{1.6^3 e^{-1.6}}{3!} \approx 0.1378 \]
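The Poisson formula is simple to evaluate numerically. The following is a minimal sketch for the accident example; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)


p_three_accidents = poisson_pmf(3, 1.6)
print(round(p_three_accidents, 4))
```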
Exercise 6.4
1. On average, five smokers pass a certain street corner every ten minutes.
What is the probability that during a given 10 minutes the number of smokers
passing will be
a) 6 or fewer
b) 7 or more
c) Exactly 8
If X is a Poisson random variable with parameter $\lambda$, then
\[ E(X) = \lambda, \quad \text{Var}(X) = \lambda \]
Note: The Poisson probability distribution provides a close approximation to the
binomial probability distribution when n is large and p is quite small or quite large,
with $\lambda = np$:
\[ P(X = x) = \frac{(np)^x e^{-(np)}}{x!}, \quad x = 0, 1, 2, \ldots \]
where $\lambda = np$ = the average number.
Usually we use this approximation if $np \le 5$. In other words, if $n \ge 20$ and
$np \le 5$ [or $n(1 - p) \le 5$], then we may use the Poisson distribution as an approximation to
the binomial distribution.
Example: Find the binomial probability P(X = 3) by using the Poisson distribution
if p = 0.01 and n = 200.
Solution:
Using Poisson, $\lambda = np = 0.01 \times 200 = 2$
\[ P(X = 3) = \frac{2^3 e^{-2}}{3!} = 0.1804 \]
Using binomial, n = 200, p = 0.01
\[ P(X = 3) = \binom{200}{3} (0.01)^3 (0.99)^{197} = 0.1814 \]
6.5 Common Continuous Probability Distributions
1. Normal Distribution
A random variable X is said to have a normal distribution if its probability
density function is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty, \; -\infty < \mu < \infty, \; \sigma > 0 \]
where $\mu = E(X)$ and $\sigma^2 = \text{Variance}(X)$.
$\mu$ and $\sigma^2$ are the parameters of the normal distribution.
Properties of Normal Distribution:

1. It is bell shaped, symmetrical about its mean, and mesokurtic.
The maximum ordinate is at $x = \mu$ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either
direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard
deviation defines a different normal distribution. Thus, the normal
distribution is completely described by two parameters: mean and standard
deviation.
5. The total area under the curve is 1, i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$, so the area of the
distribution on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode = $\mu$
8. The probability that a random variable will have a value between any two
points is equal to the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution,
known as the standard normal distribution, is derived by using the
transformation
\[ Z = \frac{X - \mu}{\sigma} \quad \Rightarrow \quad f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2} \]
Properties of the Standard Normal Distribution:
Same as a normal distribution, but also
- mean is zero,
- variance is one,
- standard deviation is one
Areas under the standard normal distribution curve have been tabulated in
various ways. The most common ones are the areas between
Z = 0 and a positive value of Z.
Given a normally distributed random variable X with mean $\mu$ and standard deviation $\sigma$:
\[ P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) \]
\[ \Rightarrow P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) \]
Note:
\[ P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) \]
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
Solution:
\[ \text{Area} = P(0 < Z < 0.96) = 0.3315 \]
b) Between Z = -1.45 and Z = 0
Solution:
\[ \text{Area} = P(-1.45 < Z < 0) = P(0 < Z < 1.45) = 0.4265 \]
c) To the right of Z = -0.35
Solution:
\[ \text{Area} = P(Z > -0.35) = P(-0.35 < Z < 0) + P(Z > 0) \]
\[ = P(0 < Z < 0.35) + P(Z > 0) \]
\[ = 0.1368 + 0.50 = 0.6368 \]
d) To the left of Z = -0.35
Solution:
\[ \text{Area} = P(Z < -0.35) = 1 - P(Z > -0.35) = 1 - 0.6368 = 0.3632 \]
e) Between Z = -0.67 and Z = 0.75
Solution:
\[ \text{Area} = P(-0.67 < Z < 0.75) = P(-0.67 < Z < 0) + P(0 < Z < 0.75) \]
\[ = P(0 < Z < 0.67) + P(0 < Z < 0.75) \]
\[ = 0.2486 + 0.2734 = 0.5220 \]
f) Between Z = 0.25 and Z = 1.25
Solution:
\[ \text{Area} = P(0.25 < Z < 1.25) = P(0 < Z < 1.25) - P(0 < Z < 0.25) \]
\[ = 0.3944 - 0.0987 = 0.2957 \]
2. Find the value of Z if
a) The normal curve area between 0 and z (positive) is 0.4726
Solution:
\[ P(0 < Z < z) = 0.4726 \text{ and from the table } P(0 < Z < 1.92) = 0.4726 \]
\[ \Rightarrow z = 1.92 \text{ (by uniqueness of the area)} \]
b) The area to the left of z is 0.9868
Solution:
\[ P(Z < z) = 0.9868 = P(Z < 0) + P(0 < Z < z) = 0.50 + P(0 < Z < z) \]
\[ \Rightarrow P(0 < Z < z) = 0.9868 - 0.50 = 0.4868 \]
and from the table
\[ P(0 < Z < 2.22) = 0.4868 \Rightarrow z = 2.22 \]
3. A random variable X has a normal distribution with mean 80 and standard
deviation 4.8. What is the probability that it will take a value
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
Solution:
X is normal with mean $\mu = 80$ and standard deviation $\sigma = 4.8$.
a)
\[ P(X < 87.2) = P\left(\frac{X - \mu}{\sigma} < \frac{87.2 - \mu}{\sigma}\right) = P\left(Z < \frac{87.2 - 80}{4.8}\right) = P(Z < 1.5) \]
\[ = P(Z < 0) + P(0 < Z < 1.5) = 0.50 + 0.4332 = 0.9332 \]
b)
\[ P(X > 76.4) = P\left(\frac{X - \mu}{\sigma} > \frac{76.4 - \mu}{\sigma}\right) = P\left(Z > \frac{76.4 - 80}{4.8}\right) = P(Z > -0.75) \]
\[ = P(Z > 0) + P(0 < Z < 0.75) = 0.50 + 0.2734 = 0.7734 \]
c)
\[ P(81.2 < X < 86.0) = P\left(\frac{81.2 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{86.0 - \mu}{\sigma}\right) = P\left(\frac{81.2 - 80}{4.8} < Z < \frac{86.0 - 80}{4.8}\right) \]
\[ = P(0.25 < Z < 1.25) = P(0 < Z < 1.25) - P(0 < Z < 0.25) \]
\[ = 0.3944 - 0.0987 = 0.2957 \]
4. A random variable has a normal distribution with $\sigma = 5$. Find its mean if the
probability that the random variable will assume a value less than 52.5 is
0.6915.
Solution:
\[ P(X < 52.5) = P(Z < z) = P\left(Z < \frac{52.5 - \mu}{5}\right) = 0.6915 \]
\[ \Rightarrow P(0 < Z < z) = 0.6915 - 0.50 = 0.1915 \]
But from the table
\[ P(0 < Z < 0.5) = 0.1915 \]
\[ \Rightarrow z = \frac{52.5 - \mu}{5} = 0.5 \]
\[ \Rightarrow \mu = 50 \]
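Examples 3 and 4 can be verified with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which provides `cdf` and `inv_cdf`. This is a sketch for checking the hand calculations, not a replacement for the table method; the variable names are illustrative.

```python
from statistics import NormalDist

# Example 3: X is normal with mean 80 and standard deviation 4.8.
x_dist = NormalDist(mu=80, sigma=4.8)
p_less = x_dist.cdf(87.2)                        # P(X < 87.2)
p_greater = 1 - x_dist.cdf(76.4)                 # P(X > 76.4)
p_between = x_dist.cdf(86.0) - x_dist.cdf(81.2)  # P(81.2 < X < 86.0)
print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))

# Example 4: sigma = 5 and P(X < 52.5) = 0.6915, solve for the mean.
z = NormalDist().inv_cdf(0.6915)                 # z with P(Z < z) = 0.6915
mu = 52.5 - 5 * z                                # from z = (52.5 - mu) / 5
print(round(z, 2), round(mu, 1))
```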
Review Exercise on Chapter Six

1. The probability that a freshman entering AAU (Science Faculty) will survive
the first semester is 0.92 (from 1993/94 academic year statistics). Assuming
this pattern remains unchanged over the subsequent years, what is the
probability that among 100 randomly selected freshmen in the first semester,
a. None will survive?
b. Exactly 97 will survive?
c. At least three will survive?
2. A secretary makes 2 errors per page on the average. What is the probability
that on the next page she makes (Assignment )
a) 4 or more errors?
b) No errors?
3. In a study of suicides, the monthly distribution of adolescent suicides in an
area for ten years interval closely followed a Poisson distribution with
parameter λ = 2.75. Find the probability that a randomly selected month will
be one in which three adolescent suicides occurred.
4. The number of calories in a salad on the lunch menu is normally distributed
with mean μ = 200 and standard deviation σ= 5. Find the probability that the
salad you select will contain: (Assignment)
(a) More than 208 calories.
(b) Between 190 and 200 calories.
5. Of a large group of men, 5% are less than 60 inches in height and 40% are
between 60 & 65 inches. Assuming a normal distribution, find the mean and
standard deviation of heights.(Assignment)
6. Suppose that X N (165, 9), where X = the breaking strength of cotton fabric.
A sample is defective if X<162. Find the probability that a randomly chosen
fabric will be defective?
CHAPTER SEVEN
7. SAMPLING AND SAMPLING DISTRIBUTION
Objectives:
- Describe the basic concepts of sampling.
- Explain why a sample is often the only feasible way to learn something
about a population
- Describe methods to select a sample
- Define and construct a sampling distribution of the sample mean
- Explain the central limit theorem
- Use the central limit theorem to find probabilities of selecting possible
sample means from a specified population
7.1 Introduction
Given a variable X, if we arrange its values in ascending order and assign a
probability to each of the values, or if we present the values $X_i$ in the form of a
relative frequency distribution, the result is called the sampling distribution of X.
7.2 Definitions of Some Basic Terms in Sampling
Sample Survey: A study that asks questions of a sample drawn from some
population.
Census: A complete count of the population is called a census. In a census,
observations are collected on all the sampling units in the population. For
example, in Ethiopia, the census is conducted every tenth year, and observations
are collected on all the persons staying in Ethiopia.
Sample: One or more sampling units are selected from the population according
to some specified procedure. A sample consists only of a portion of the
population units.
Sampling is the process of selecting a number of study units from a defined
study population. By studying this part (sample), we try to generalize findings of
the sample to the population.
Sampling unit: An element or a group of elements on which observations can be
taken is called a sampling unit. The objective of the survey helps in determining
the definition of sampling unit.
For example,
- If the objective is to determine the total income of all the persons in the
household, then the sampling unit is the household.
- If the objective is to determine the income of any particular person in the
household, then the sampling unit is the income of the particular
person in the household.
So the definition of sampling unit depends on, and varies with, the objective of the
survey.
- Similarly, in another example, if the objective is to study the blood sugar
level, then the sampling unit is the value of the blood sugar level of a
person.
- On the other hand, if the objective is to study health conditions, then
the sampling unit is the person on whom the readings of blood
sugar level, blood pressure and other factors will be obtained. These
values will together classify the person as healthy or unhealthy.
Sampling frame: List of all the units of the population to be surveyed
constitutes the sampling frame. All the sampling units in the sampling frame
have identification particulars.
For example,
- All the students in a particular university listed along with their roll
numbers constitute the sampling frame.
- Similarly, the list of households with the name of the head of family or house
address constitutes the sampling frame.
- In another example, the residents of a city area may be listed in more than
one frame: in the automobile registration as well as in the
telephone directory.
Errors in sample survey: When we take a sample, our results will not exactly
equal the results for the whole population. That is, our results will be subject to
error.
There are two types of errors:
a) Sampling error:
- It is the discrepancy between the population value and the sample value.
- It may arise from inappropriate sampling techniques.
- Sampling error can be minimized by increasing the size of the sample.
- When n = N (the sample is the whole population), the sampling error is 0.
b) Non-sampling error: It is a type of systematic error in the design or conduct
of a sampling procedure which results in distortion of the sample, so that it is no
longer representative of the reference population.
- It could be introduced through:
  - Measurement or counting (i.e. observational error).
  - Respondent or non-respondent error.
  - Lack of preciseness of definition.
  - Errors in editing and tabulation of data, and
  - Selection bias (e.g. accessibility bias, volunteer bias, etc).
- We can eliminate or reduce non-sampling error (bias) by careful
design of the sampling procedure, not by increasing the sample
size.
Advantages of sampling as compared to complete enumeration
Reduced cost and enlarged scope
- Sampling involves the collection of data on a smaller number of units in
comparison to complete enumeration, so the cost involved in the collection
of information is reduced. Further, additional information can be obtained
at little cost in comparison to conducting another survey. For example,
when an interviewer is collecting information on health conditions, then
he/she can also ask some questions on health practices. This will provide
additional information on health practices and the cost involved will be
much less than conducting an entirely new survey on health practices.
Organization of work
- It is easier to manage the collection of a smaller number of
units than all the units in a census. For example, in order to draw a
representative sample from a state, it is easier to draw small
samples from every city than to draw the sample from the whole state at one
time. This ultimately results in more accurate statistical inferences,
because better organization provides better data and, in turn, improved
statistical inferences.
Greater accuracy
- The persons involved in the collection of data are trained personnel. They
can collect the data more accurately if they have to collect a smaller number
of units than a large number of units in a given time.
Greater speed (urgent information required)
- For the same reason, the data can be collected and summarized more
quickly with a sample than with a complete count. This is a vital
consideration when the information is urgently needed.
- For example, the forecasting of crop production can be done more quickly on
the basis of a sample of data than by first collecting all the observations.
Feasibility
- Conducting the experiment on a smaller number of units, particularly when
the units are destroyed, is more feasible.
- For example, in determining the life of bulbs, it is more feasible to fuse
a minimum number of bulbs. Similarly, in any medical experiment, it is more
feasible to use a smaller number of animals.
7.3 Sampling Techniques

The technique of selecting a sample is important in sampling theory and usually
depends upon the nature of the investigation.
There are two basic types of sampling.
A) Random Sampling or Probability Sampling: a method of sampling in
which all elements in the population have a pre-assigned non-zero
probability of being included in the sample.
Probability sampling methods are characterized by the following:
- A sampling frame exists or can be compiled
- They involve random selection procedures
- All units of the population have an equal or at least a known
chance of being included in the sample.
- Generalization is possible
Types of probability sampling:
i. Simple random sampling
ii. Stratified random sampling
iii. Cluster sampling
iv. Systematic sampling
i. Simple Random Sampling:

Simple random sampling (SRS) is a method of selecting a sample
of n sampling units out of a population of N
sampling units such that every sampling unit has an equal chance
of being chosen.
In this case the samples can be drawn in two possible ways.
- The sampling units are chosen without replacement, in the sense that the
units once chosen are not placed back in the population.
- The sampling units are chosen with replacement, in the sense that the
chosen units are placed back in the population.
Under SRS, two methods can be used to ensure the
randomness of the selection: the lottery method and the table of random numbers.
Lottery method: It may be possible to use the "lottery" method for a small
population. All items of the population are numbered or named on separate slips
of paper of identical size, color and shape. These slips are then folded and mixed
up in a container or drum. A blindfold selection is then made of the number of
slips required to constitute the desired size of sample. As the size of the
population grows, the lottery method becomes cumbersome. Thus, the
alternative method is using a table of random numbers.
Table of Random Numbers: Tables of random numbers are tables of the digits 0,
1, 2, ..., 9, each digit having an equal chance of selection at any draw (in row,
column and/or diagonal). If there are many units, the lottery technique
soon becomes laborious. Selection of the units is greatly facilitated and made
more accurate by using a set of random numbers in which a large number of
digits is set out in random order.
Steps to Select a Sample Using Table of Random Numbers
- Determine how many digits each random number needs,
based on the total number of units in the population.
- Choose the starting place and direction (right, left, diagonal, up or down)
in which we will read the numbers from the table.
- In the direction we chose, read the number of digits required. Numbers
that are not within the range needed (greater than the population size)
are discarded, and we continue reading the numbers in the chosen direction
until all random numbers have been selected.
For instance, suppose you have a population of N = 528 students and you want to draw a
sample of n = 10 students. In this case, you may start by assigning a three-digit
number to each member of the population so that the members are known as
001, 002, ..., 528. Select three columns from table 2.1, say columns 25 to 27. Go
down the three columns, selecting the first 10 distinct numbers between 001
and 528. If you get any three-digit number greater than 528, you should drop it
and go to the next number. The selected numbers are 36, 509, 364, 417, 348, 127,
149, 186, 290, and 162. For the last two numbers we jumped to columns 30 to 32.
Then your sample consists of students 36, 509, 364, 417, 348,
127, 149, 186, 290, and 162. In repeated selections it is
advisable to vary the starting point in the table.
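In practice, a pseudo-random number generator usually plays the role of the lottery or the printed table. The sketch below assumes the population units are simply numbered 1 to N, as in the student example; the function names and the seed are illustrative, and the sample it draws will not match the numbers read from the table above.

```python
import random


def srs_without_replacement(population_size, sample_size, seed=None):
    """Draw distinct unit numbers from 1..N; a unit cannot be chosen twice."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


def srs_with_replacement(population_size, sample_size, seed=None):
    """Draw unit numbers from 1..N; a chosen unit goes back and may repeat."""
    rng = random.Random(seed)
    return [rng.randint(1, population_size) for _ in range(sample_size)]


students = srs_without_replacement(528, 10, seed=1)
students_wr = srs_with_replacement(528, 10, seed=1)
print(sorted(students))
print(sorted(students_wr))
```

Fixing the seed makes a draw reproducible; varying it corresponds to changing the starting point in the table.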
ii. Stratified Random Sampling:
When the population is heterogeneous with respect to the study variable, it
would not be desirable to use simple random sampling. In such cases stratified
random sampling would be appropriate.
In Stratified Random Sampling the population is first divided into
homogeneous groups called strata, and a simple random sample is then taken
from each stratum. Stratified random sampling is typically used when the
experimenter wants all sub-populations to be represented in the sample. This
type of sampling procedure can be expensive to implement. The number of units
to be selected from each stratum can be determined by one of the following
allocation methods.
Proportional allocation: the same sampling fraction is
used for each stratum.
Non-proportional allocation: a different sampling fraction is used for each
stratum, or the strata are unequal in size and a fixed number of units is
selected from each stratum.
Some of the criteria for dividing into strata could be: sex (male, female), age
(under 18, 18 to 28, 29 to 39, ...), occupation (blue-collar, professional), species of
plants or animals, etc. Note that stratification results in greater
representativeness of the sample and hence adds accuracy.
Example: Suppose we want to find the average height of the students in a school with
classes 1 to 12. Height varies a lot, as the students in class 1 are around
6 years old and students in class 10 are around 16 years old. So one can divide
all the students into different subpopulations or strata such as:
- Students of class 1, 2 and 3: Stratum 1
- Students of class 4, 5 and 6: Stratum 2
- Students of class 7, 8 and 9: Stratum 3
- Students of class 10, 11 and 12: Stratum 4
Now draw the samples by SRS from each of the strata 1, 2, 3 and 4. All the
drawn samples combined together will constitute the final stratified sample for
further analysis.
Advantages
a. If a correct stratification has been made, even a small number of units will
form a representative sample.
b. It can achieve different degrees of accuracy for different segments of the
population.
Disadvantages
a. It is a very difficult task to divide the universe into homogeneous strata.
b. If the strata are overlapping, unsuitable or disproportionate, the selection of
the sample may not be representative.
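Proportional allocation can be sketched as follows. The stratum sizes below are hypothetical numbers chosen for illustration (the school example does not give any), and the helper names are not from the module. Simple rounding is used, so for other sizes the allocated counts may not add up exactly to the requested total.

```python
import random


def proportional_allocation(stratum_sizes, total_sample):
    """Allocate the sample so each stratum has the same sampling fraction n/N."""
    population = sum(stratum_sizes.values())
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}


def stratified_sample(stratum_sizes, total_sample, seed=None):
    """Take a simple random sample (without replacement) inside each stratum."""
    rng = random.Random(seed)
    allocation = proportional_allocation(stratum_sizes, total_sample)
    return {name: rng.sample(range(1, stratum_sizes[name] + 1), k)
            for name, k in allocation.items()}


# Hypothetical numbers of students per stratum.
sizes = {"Stratum 1": 300, "Stratum 2": 300, "Stratum 3": 200, "Stratum 4": 200}
allocation = proportional_allocation(sizes, 50)
sample = stratified_sample(sizes, 50, seed=7)
print(allocation)
```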
iii. Cluster Sampling:

Cluster samples are obtained when the population is divided into groups called
clusters (the most widely used criterion for forming clusters is geographical proximity).
When a cluster is taken as a sampling unit, the procedure of sampling is called
cluster sampling.
In cluster sampling, we follow the following steps:
1. Divide population into clusters (usually along geographic boundaries).
2. Randomly sample clusters.
3. Measure all units within sampled clusters.
Conditions under which the cluster sampling is used:

Cluster sampling is preferred when
i) No reliable listing of elements is available and it is expensive to prepare it.
ii) Even if the list of elements is available, the location or identification of the
units may be difficult.
iii) A necessary condition for the validity of this procedure is that every unit of
the population under study must correspond to one and only one unit of
the cluster so that the total number of sampling units in the frame may
cover all the units of the population under study without any omission or
duplication. When this condition is not satisfied, bias is introduced.
Advantages
1. Significant cost gain.

2. Easier and more practical method which facilitates the field work.
Disadvantages
1. Probability and the representativeness of the sample are sometimes

affected, if the number of the cluster is very large.
2. The results obtained are likely to be less accurate if the number of
sampling units in each cluster is not approximately the same.
iv. Systematic Sampling:
This is a convenient method of sampling when a complete list of the sampling
units (sampling frame) is readily available.
The steps you need to follow are:
- Let N = population size, n = sample size, and $k = \frac{N}{n}$ = sampling interval.
- The procedure starts by determining the first element to be included
in the sample: choose any number between 1 and k. Suppose it is j ($1 \le j \le k$).
- Then the technique is to take every kth item from the sampling frame:
the jth unit is selected first, and then the (j + k)th, (j + 2k)th, etc. until
the required sample size is reached.
Advantages
a. It is easy to apply in situations where there is no easier way to do random
sampling.
b. Randomness and probability features are present in this method, which
makes the sample representative.
Disadvantages
a) It works well only if a complete and up-to-date frame is available and if
the units are randomly arranged.
b) Any hidden periodicity in the list will adversely affect the
representativeness of the sample.
Example: Let N = 50 and n = 5. So k = 10. Suppose the first selected number
between 1 and 10 is 3. Then the systematic sample consists of units with the following
serial numbers: 3, 13, 23, 33, 43.
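The steps above can be sketched in a few lines. The function name `systematic_sample` is illustrative, and the sketch assumes N is a multiple of n so that the interval k = N/n is a whole number, as in the example.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select units j, j + k, j + 2k, ... with k = N // n and 1 <= j <= k."""
    k = population_size // sample_size            # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + i * k for i in range(sample_size)]


example = systematic_sample(50, 5, start=3)
print(example)
```

With `start=3` this reproduces the serial numbers of the example; leaving `start` unset draws the first unit at random.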
v. Multi-stage Sampling
Multistage sampling combines the simple methods described earlier in a
variety of useful ways. Multi-stage sampling is appropriate when the reference
population is large and widely scattered. Selection is done in stages until the
final sampling units are selected. In Multi-stage sampling, the population is
divided in to first stage units called primary sampling units (PSU). Then a
random sample of PSU is made in the first stage; and in the second stage a
random sample of secondary sampling units (SSU) is made from the selected
primary sampling units and so on. The process can be continued for a
number of stages.
Advantages
1. It is more flexible in comparison to other methods of sampling.
2. It is of great significance in surveys of underdeveloped areas where an
up-to-date and accurate frame is not generally available for subdivision of the
material into reasonably small sampling units.
Disadvantages
1. Errors are likely to be large in comparison to others.
2. It involves considerable amount of listing of first stage units, second stage
units etc.
B. Non Random Sampling or non-probability sampling

A sampling method in which a sample is selected on a basis other than
probability considerations, such as judgment, convenience, or prior knowledge, is
called non-probability sampling. The difference between non-probability and
probability sampling is that non-probability sampling does not involve random
selection and probability sampling does. In general, researchers prefer
probabilistic or random sampling methods over non-probabilistic ones, and
consider them to be more accurate and rigorous.
In the case of non-probability sampling methods:
- The probability of inclusion of any unit (of the population) in a sample is not
known.
- The selection of units within a sample involves human judgment rather
than pure chance.
The most common types of non probability sampling are:

i) Judgment sampling.
ii) Convenience sampling or haphazard sampling
iii) Quota Sampling.
iv) Volunteer sampling
i. Judgment Sampling - In this case, the person taking the sample has direct
or indirect control over which items are selected for the sample.
- This approach is used when a sample is taken based on certain judgments
about the overall population.
- The underlying assumption is that the investigator will select units that
are characteristic of the population.
- The critical issue here is objectivity: how much can judgment be relied
upon to arrive at a typical sample?
- Judgment sampling is subject to the researcher's biases and is perhaps even
more biased than haphazard sampling.
- Since any preconceptions the researcher may have are reflected in the
sample, large biases can be introduced if these preconceptions are
inaccurate.
- Statisticians often use this method in exploratory studies like pre-testing
of questionnaires and focus groups.
- They also prefer to use this method in laboratory settings where the choice
of experimental subjects (i.e. animal, human, vegetable) reflects the
investigator's pre-existing beliefs about the population.
- One advantage of judgment sampling is the reduced cost and time involved
in acquiring the sample.
ii. Convenience Sampling - In this method, the decision maker selects a
sample from the population in a manner that is relatively easy and
convenient. It is not normally representative of the target population because
sample units are only selected if they can be accessed easily and conveniently.
- A food critic, for example, may try several appetizers or entrees to judge
the quality and the variety of a menu.
- Television reporters often seek so-called "person-on-the-street"
interviews to find out how people view an issue.
iii. Quota Sampling - In this method, the decision maker requires the sample to
contain a certain number of items with a given characteristic. Many political
polls are, in part, quota sampling.
- It is one of the most common forms of non-probability sampling.
- Sampling is done until a specific number of units (quotas) for various
sub-populations have been selected.
- Since there are no rules as to how these quotas are to be filled, quota
sampling is really a means for satisfying sample size objectives for
certain sub-populations.
- The quotas may be based on population proportions.
- For example, if there are 100 men and 100 women in a population and
a sample of 20 is to be drawn to participate in a cola taste
challenge, you may want to divide the sample evenly between the
sexes: 10 men and 10 women.
- Quota sampling can be considered preferable to other forms of
non-probability sampling (e.g. judgment sampling) because it forces
the inclusion of members of different sub-populations.
iv. Volunteer sampling: As the term implies, this type of sampling occurs when
people volunteer their services for the study. In psychological experiments or
pharmaceutical trials (drug testing), for example, it would be difficult and
unethical to enlist random participants from the general public. In these
instances, the sample is taken from a group of volunteers. Sometimes, the
researcher offers payment to entice respondents. In exchange, the volunteers
accept the possibility of a lengthy, demanding or sometimes unpleasant
process.
Note: let N = population size and n = sample size.
1. Suppose simple random sampling is used.
• We have $N^n$ possible samples if sampling is with replacement.
• We have $\binom{N}{n}$ possible samples if sampling is without replacement.
2. From here onwards, we assume that samples are drawn from a given
population using simple random sampling.
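The two counting rules in the note can be checked numerically. The sketch below is only an illustration: the sizes N = 5 and n = 2 are assumed values, and the unit labels are stand-ins.

```python
from itertools import combinations, product
from math import comb

N, n = 5, 2                       # assumed sizes, chosen only for illustration
population = range(N)             # stand-in labels for the N population units

with_replacement = N ** n         # ordered samples, repeats allowed
without_replacement = comb(N, n)  # unordered samples, no repeats

# Cross-check both formulas by listing the samples explicitly.
assert with_replacement == len(list(product(population, repeat=n)))
assert without_replacement == len(list(combinations(population, n)))

print(with_replacement, without_replacement)  # 25 10
```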
7.3 Sampling Distribution

Statistical inference draws conclusions about a population on the basis of data.
The data are summarized by statistics such as the sample mean and the sample
standard deviation. When the data are produced by random sampling or
randomized experimentation, a statistic is a random variable that obeys the laws
of probability theory. The link between probability and data is formed by the
sampling distributions of statistics. A sampling distribution shows how a
statistic would vary in repeated data production.
Definition: A sampling distribution is a probability distribution that determines
probabilities of the possible values of a sample statistic.
Each statistic has a sampling distribution. A sampling distribution is simply a
type of probability distribution. Unlike the distributions studied so far, a
sampling distribution refers not to individual observations but to the values of
a statistic computed from those observations, in sample after sample.
Sampling distributions reflect the sampling variability that occurs in collecting
data and using sample statistics to estimate parameters. The sampling distribution
of a statistic based on n observations is the probability distribution for that
statistic resulting from repeatedly taking samples of size n, each time calculating
the statistic value. The form of a sampling distribution is often known
theoretically. We can then make probabilistic statements about the value of
a statistic for one sample of some fixed size n.
7.3.1 Sampling Distribution of the Sample Mean

• The sampling distribution of the sample mean is a theoretical probability
distribution that shows the functional relationship between the possible
values of a given sample mean based on samples of size n and the
probability associated with each value, for all possible samples of size n
drawn from that particular population.
• There are commonly three properties of interest of a given sampling
distribution:
  - Its mean
  - Its variance
  - Its functional form
Steps for the construction of the sampling distribution of the mean:
1. From a finite population of size N, randomly draw all possible samples of
size n.
2. Calculate the mean for each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution
or relative frequency distribution.
Example: Suppose we have a population of size N = 5, consisting of the ages of
five children: 6, 8, 10, 12, and 14. Take samples of size 2 with replacement and
construct the sampling distribution of the sample mean.
Solution: N = 5, n = 2
• We have $N^n = 5^2 = 25$ possible samples since sampling is with
replacement.
Step 1: Draw all possible samples:
        6         8         10         12         14
6       (6,6)     (6,8)     (6,10)     (6,12)     (6,14)
8       (8,6)     (8,8)     (8,10)     (8,12)     (8,14)
10      (10,6)    (10,8)    (10,10)    (10,12)    (10,14)
12      (12,6)    (12,8)    (12,10)    (12,12)    (12,14)
14      (14,6)    (14,8)    (14,10)    (14,12)    (14,14)
Step 2: Calculate the mean for each sample:
        6     8     10    12    14
6       6     7     8     9     10
8       7     8     9     10    11
10      8     9     10    11    12
12      9     10    11    12    13
14      10    11    12    13    14
Step 3: Summarize the means obtained in step 2 in terms of a frequency
distribution.
$\bar{X}$    Frequency
6      1
7      2
8      3
9      4
10     5
11     4
12     3
13     2
14     1
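a) Find the mean of $\bar{X}$, say $\mu_{\bar{X}}$:
$\mu_{\bar{X}} = \frac{\sum \bar{X}_i f_i}{\sum f_i} = \frac{250}{25} = 10 = \mu$
b) Find the variance of $\bar{X}$, say $\sigma^2_{\bar{X}}$:
$\sigma^2_{\bar{X}} = \frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2 f_i}{\sum f_i} = \frac{100}{25} = 4 \neq \sigma^2$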
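The three construction steps can also be carried out by brute-force enumeration. The following is a minimal sketch for the five ages in the example; exact fractions are used so that no rounding enters the comparison with $\sigma^2 / n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [6, 8, 10, 12, 14]   # the population from the example
n = 2                       # sample size

# Steps 1 and 2: every ordered sample drawn with replacement, and its mean.
means = [Fraction(sum(sample), n) for sample in product(ages, repeat=n)]

# Step 3: frequency distribution of the sample means.
freq = Counter(means)
total = sum(freq.values())

mean_of_means = sum(m * f for m, f in freq.items()) / total
var_of_means = sum((m - mean_of_means) ** 2 * f for m, f in freq.items()) / total

# Population mean and variance, for comparison.
pop_mean = Fraction(sum(ages), len(ages))
pop_var = sum((x - pop_mean) ** 2 for x in ages) / len(ages)

for m in sorted(freq):
    print(m, freq[m])
print(mean_of_means, var_of_means, pop_var / n)  # 10 4 4
```

The last line shows that the variance of the sample mean, 4, equals the population variance, 8, divided by n = 2.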
Remark:
1. In general, if sampling is with replacement,
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
2. If sampling is without replacement,
$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$
3. In any case, the sample mean is an unbiased estimator of the population mean,
i.e. $\mu_{\bar{X}} = \mu \Rightarrow E(\bar{X}) = \mu$ (Show!)
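One way to show the unbiasedness in remark 3 is sketched below. It assumes only that every draw $X_i$ has expectation $\mu$, which holds under simple random sampling, and uses the linearity of expectation.

```latex
\begin{aligned}
E(\bar{X}) &= E\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right)
            = \frac{1}{n} \sum_{i=1}^{n} E(X_i) \\
           &= \frac{1}{n} \cdot n\mu = \mu
\end{aligned}
```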
• Sampling may be from a normally distributed population or from a
non-normally distributed population.
• When sampling is from a normally distributed population, the distribution
of $\bar{X}$ will possess the following properties:
1. The distribution of $\bar{X}$ will be normal.
2. The mean of $\bar{X}$ is equal to the population mean, i.e. $\mu_{\bar{X}} = \mu$.
3. The variance of $\bar{X}$ is equal to the population variance divided by the
sample size, i.e. $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$.
$\Rightarrow \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$\Rightarrow Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
Exercise 7.3
1. Suppose we have a population of size N = 5, consisting of the ages of five
children: 6, 8, 10, 12, and 14. Take samples of size 2 without replacement and
construct the sampling distribution of the sample mean.
7.4 The Central Limit Theorem

Given a population of any functional form with mean $\mu$ and finite variance $\sigma^2$,
the sampling distribution of $\bar{X}$, computed from samples of size n from the
population, will be approximately normal when the sample size n is large. That
is, when n is large,
$\bar{X} \sim \text{approximately } N\left(\mu, \frac{\sigma^2}{n}\right)$
In practice, the normal approximation for $\bar{X}$ is usually adequate when n is
greater than 30. The central limit theorem allows us to use normal probability
calculations to answer questions about sample means from many observations
even when the population distribution is not normal.
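A small simulation can make the theorem concrete. The sketch below is an illustration under assumed settings that are not part of the text: an exponential population with mean 1 and variance 1, samples of size n = 36, and 5000 repetitions with a fixed seed.

```python
import random
import statistics

random.seed(0)                      # fixed seed so the run is repeatable

RATE = 1.0                          # assumed exponential population: mean 1, variance 1
n = 36                              # sample size (greater than 30)
repetitions = 5000                  # number of repeated samples (illustrative choice)

sample_means = [
    statistics.fmean(random.expovariate(RATE) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Under the theorem these should be close to mu = 1 and sigma^2 / n = 1/36.
print(round(mean_of_means, 3), round(var_of_means, 4), round(1 / n, 4))

# Share of sample means within 1.96 standard errors of mu; this should be
# roughly 0.95 if the normal approximation is adequate.
se = (1 / n) ** 0.5
inside = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / repetitions
print(round(inside, 3))
```

The population is strongly skewed, yet the simulated sample means should centre on the population mean with variance near $\sigma^2 / n$.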
Example 1: The mean weight of 500 male students at a certain university is 151
pounds (lb) and the standard deviation is 15 lb. Assume that the weights are
normally distributed. If a sample of 64 students is taken, what is the
probability that the mean weight in the sample is more than 154.75 lb?
Solution: As we have taken a large (n = 64) sample, we can use the Central Limit
Theorem. This says that the mean weight of the sample can be approximated by
a normal random variable with a mean of 151 and a variance of 225/64. If we let $\bar{X}$
be the mean weight of the students in the sample, it is required to find
$P(\bar{X} > 154.75)$, where $\bar{X} \sim N(151, 225/64)$.
$P(\bar{X} > 154.75) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{154.75 - 151}{15/8} \right) = P(Z > 2.00) = 0.5 - 0.4772 = 0.0228$
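The table lookup in this solution can be reproduced with Python's standard library. This is a minimal sketch using `statistics.NormalDist` in place of the printed normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 151, 15, 64          # values from Example 1
standard_error = sigma / sqrt(n)    # 15 / 8 = 1.875

z = (154.75 - mu) / standard_error
p = 1 - NormalDist().cdf(z)         # upper-tail area beyond z

print(z, round(p, 4))               # 2.0 0.0228
```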
Case-III: When sampling is from a normally distributed population with unknown
population variance,
a) If the sample size is large, $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$, where S is an estimate of $\sigma$.
b) If the sample size is small (n < 30), $t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$; that is, t has a t-distribution with (n - 1)
degrees of freedom, where S is an estimate of $\sigma$.
Review Exercises on Chapter Seven

1. Describe briefly the difference between
a) Census and sample survey
b) Sampling and non sampling error
c) Sampling frame and sampling units
2. Why do researchers usually select sample elements from a given population?
3. Mention some of the disadvantages of sampling.
4. In each of the following statements, identify whether the sampling used is
random, systematic, stratified, cluster or convenience.
a) An instructor wants to select a random sample of 20 students from a
population of 180 students using a table of random numbers.
b) An economist surveys all households from each of the six "kebeles" of the
city.
c) A telephone company selects every 20th pager from the assembly line.
d) A news reporter of ETV gets opinions about the "GEMENA" drama by interviewing
people as they pass "THE CHURCHIL ROAD".
e) A lawyer asks a pickpocket to name other robbers.

5. Consider a normal population with mean μ = 82 and standard deviation σ =
12.
a) If a random sample of size 64 is selected, what is the probability that the
sample mean will lie between 80.8 and 83.2?
b) With a random sample of size 100, what is the probability that the sample
mean will lie between 80.8 and 83.2?
6. Describe how to obtain a systematic random sample of 24 VISA user
customers from a population of 360 customers in Dashen bank.
CHAPTER EIGHT
8. ESTIMATION AND HYPOTHESIS TESTING
Objectives:
• Define a point estimate and an interval estimate.
• Find the confidence interval for the mean when σ is known.
• Find the confidence interval for the mean when σ is unknown.
• Understand the definitions used in hypothesis testing.
• State the null and alternative hypotheses.
• Find critical values for the z test.
• State the five steps used in hypothesis testing.
• Test means when σ is known, using the z test.
• Test means when σ is unknown, using the t test.
• Explain the relationship between type I and type II errors and the power of
a test.
8.1 Introduction
Inference is the process of making interpretations or conclusions from sample
data for the totality of the population. It is only the sample data that is ready
for inference.
In statistics there are two ways through which inference can be made:
• Statistical estimation
• Statistical hypothesis testing
[Figure: flow diagram linking Population, Sample data, Numerical data, Analyzed data and Inference]
Data analysis is the process of extracting relevant information from the

summarized data.
8.2 Statistical Estimation

It is the procedure of using a sample statistic to estimate a population
parameter. This is one way of making inference about the population parameter
where the investigator does not have any prior notion about values or
characteristics of the population parameter.
Statistical estimation is divided into two main categories:
i. Point Estimation
ii. Interval Estimation.
8.2.1 Point Estimation
When we use a single value of a statistic to estimate the corresponding

parameter of a population, it is called point estimation.
The object of point estimation is to calculate, from the sample data, a single
number that is likely to be close to the unknown value of the population
parameter. The available information is assumed to be in the form of a random
sample X1, X2, . . . , Xn of size n taken from the population. The object is to
formulate a statistic such that its value computed from the sample data would
reflect the value of the population parameter as closely as possible.
Definition 8.1. A point estimator of an unknown population parameter is a
statistic that estimates the value of that parameter. A point estimate of a
parameter is the value of a statistic that is used to estimate the parameter.
(Agresti & Finlay, 1997 and Weiss, 1999)
For instance, to estimate a population mean μ, perhaps the most intuitive point
estimator is the sample mean:
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Once the observed values x1, x2, . . . , xn of the random variables Xi are available,
we can actually calculate the observed value of the sample mean, $\bar{x}$, which is
called a point estimate of μ.
Suppose a college president wishes to estimate the average age of students
attending classes this semester. The president could select a random sample of
100 students and find the average age of these students. Here $\bar{X}$ is an
estimator for the population mean and, say, $\bar{x}$ = 22.3 years is an estimate, which is
one of the possible values of $\bar{X}$. From the sample mean, the president could infer
that the average age of all the students is 22.3 years. This type of estimate is
called a point estimate.
Properties of a Good Estimator
Let $\hat{\theta}$ be an estimator of θ.
1. The estimator should be an unbiased estimator. That is, the expected value
or the mean of the estimates obtained from samples of a given size is equal to
the parameter being estimated, i.e. $E(\hat{\theta}) = \theta$.
2. The estimator should be consistent. For a consistent estimator, as sample
size increases, the value of the estimator approaches the value of the
parameter estimated, i.e. $\hat{\theta}$ gets closer to θ as the sample size increases.
3. The estimator should be a relatively efficient estimator. That is, of all the
statistics that can be used to estimate a parameter, the relatively efficient
estimator has the smallest variance.
8.2.1.1 Point estimators of the population mean and standard deviation

The sample mean $\bar{X}$ is the obvious point estimator of a population mean μ. In
fact, $\bar{X}$ is unbiased, and it is relatively efficient for most population distributions.
It is the point estimator, denoted by $\hat{\mu}$, used in this text:
$\hat{\mu} = \bar{X}$
Moreover, the sample standard deviation s is the most popular point estimate of
the population standard deviation σ. That is,
$\hat{\sigma} = s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
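As a short illustration, both point estimates can be computed with Python's `statistics` module. The data below are the ten container contents from the lubricant example later in this chapter. `statistics.stdev` uses the divisor n - 1.

```python
import statistics

# Contents (in liters) of the 10 containers from the lubricant example
# that appears later in this chapter.
contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]

x_bar = statistics.mean(contents)   # point estimate of mu
s = statistics.stdev(contents)      # point estimate of sigma (divisor n - 1)

print(round(x_bar, 2), round(s, 4))  # 10.06 0.2459
```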
8.2.2 Interval Estimation

For point estimation, a single number lies in the forefront even though a
standard error is attached. Instead, it is often more desirable to produce an
interval of values that is likely to contain the true value of the unknown
parameter.
A confidence interval estimate of a parameter consists of an interval of numbers
obtained from a point estimate of the parameter together with a percentage that
specifies how confident we are that the parameter lies in the interval. The
confidence percentage is called the confidence level.
Interval estimation deals with identifying the upper and lower limits of a parameter. The limits
themselves are random variables. In general, a confidence interval has the form
Estimate ± critical value × standard error of the estimator
8.2.2.1 Confidence interval estimation of the population mean
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, because of

sampling error, we know that it's not likely that our sample statistic will be equal
to the population parameter, but instead will fall into an interval of values. We
will have to be satisfied knowing that the statistic is "close to" the parameter.
That leads to the obvious question, what is "close"?
We can phrase the latter question differently: How confident can we be that the
value of the statistic falls within a certain "distance" of the parameter? Or, what
is the probability that the parameter's value is within a certain range of the
statistic's value? This range is the confidence interval.
The confidence level is the probability that the value of the parameter falls
within the range specified by the confidence interval surrounding the statistic.
There are different cases to be considered to construct confidence
intervals.
Case 1: If the sample size is large or if the population is normal with known variance.
Recall the Central Limit Theorem, which applies to the sampling distribution of
the mean of a sample. Consider samples of size n drawn from a population
whose mean is $\mu$ and standard deviation is $\sigma$, with replacement and order
important. The population can have any frequency distribution. The sampling
distribution of $\bar{X}$ will have a mean $\mu_{\bar{X}} = \mu$ and a standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$,
and approaches a normal distribution as n gets large. This allows us to use the
normal distribution curve for computing confidence intervals.
$\Rightarrow Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution with mean = 0 and variance = 1
$\Rightarrow \mu = \bar{X} \pm Z \sigma/\sqrt{n}$
$\Rightarrow \mu = \bar{X} \pm \varepsilon$, where $\varepsilon = Z \sigma/\sqrt{n}$ is a measure of error.
• For the interval estimator to be good, the error should be small. How can it be made
small?
  - By making n large
  - By having small variability
  - By taking Z small
• To obtain the value of Z, we have to attach this to a theory of chance. That is,
there is an area of size $1 - \alpha$ such that
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$
where $\alpha$ is the probability that the parameter lies outside the interval, and
$Z_{\alpha/2}$ stands for the standard normal value to the right of which
$\alpha/2$ probability lies, i.e. $P(Z > Z_{\alpha/2}) = \alpha/2$.
For $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, we then have
$\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$\Rightarrow P(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha$
Hence, from the above probability statement,
$\Rightarrow (\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n})$ is a $100(1 - \alpha)\%$ confidence interval for $\mu$.
But usually $\sigma^2$ is not known; in that case we estimate it by its point estimator $S^2$:
$\Rightarrow (\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + Z_{\alpha/2}\,S/\sqrt{n})$ is a $100(1 - \alpha)\%$ confidence interval for $\mu$.
Here are the Z values corresponding to the most commonly used confidence levels.
$100(1 - \alpha)\%$    $\alpha$    $\alpha/2$    $Z_{\alpha/2}$
90                 0.10    0.05     1.645
95                 0.05    0.025    1.96
99                 0.01    0.005    2.58
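These tabulated values can be recovered from the inverse of the standard normal distribution function. The sketch below uses `statistics.NormalDist.inv_cdf` from the standard library; the name `z_values` is introduced here only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()           # mean 0, standard deviation 1

# For confidence level 1 - alpha, Z_{alpha/2} leaves alpha/2 in the upper tail.
z_values = {
    level: std_normal.inv_cdf(1 - (1 - level) / 2)
    for level in (0.90, 0.95, 0.99)
}

for level, z in z_values.items():
    print(f"{level:.0%}: {z:.3f}")
```

The 99% value comes out as 2.576, which the table rounds to 2.58.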
Case 2: If the sample size is small and the population variance, $\sigma^2$, is not
known.
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t distribution with n - 1 degrees of freedom.
$\Rightarrow (\bar{X} - t_{\alpha/2}\,S/\sqrt{n},\ \bar{X} + t_{\alpha/2}\,S/\sqrt{n})$ is a $100(1 - \alpha)\%$ confidence interval for $\mu$.
The unit of measurement of the confidence interval is the standard error. This is
just the standard deviation of the sampling distribution of the statistic.
Examples:
1. From a normal sample of size 25, a mean of 32 was found. Given that the
population standard deviation is 4.2, find
a) A 95% confidence interval for the population mean.
b) A 99% confidence interval for the population mean.
Solution:
a) $\bar{X} = 32,\ \sigma = 4.2,\ 1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025$
$\Rightarrow Z_{\alpha/2} = 1.96$ from table.
$\Rightarrow$ The required interval will be $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
$= 32 \pm 1.96 \times 4.2/\sqrt{25}$
$= 32 \pm 1.65$
$= (30.35, 33.65)$
b) $\bar{X} = 32,\ \sigma = 4.2,\ 1 - \alpha = 0.99 \Rightarrow \alpha = 0.01,\ \alpha/2 = 0.005$
$\Rightarrow Z_{\alpha/2} = 2.58$ from table.
$\Rightarrow$ The required interval will be $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
$= 32 \pm 2.58 \times 4.2/\sqrt{25}$
$= 32 \pm 2.17$
$= (29.83, 34.17)$
2. A drug company is testing a new drug which is supposed to reduce blood
pressure. From the six people who are used as subjects, it is found that the
average drop in blood pressure is 2.28 points, with a standard deviation of
0.95 points. What is the 95% confidence interval for the mean change in
pressure?
Solution:
$\bar{X} = 2.28,\ S = 0.95,\ 1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025$
$\Rightarrow t_{\alpha/2} = 2.571$ with df = 5 from table.
$\Rightarrow$ The required interval will be $\bar{X} \pm t_{\alpha/2}\,S/\sqrt{n}$
$= 2.28 \pm 2.571 \times 0.95/\sqrt{6}$
$= 2.28 \pm 0.997$
$= (1.28, 3.28)$
That is, we can be 95% confident that the mean decrease in blood pressure is
between 1.28 and 3.28 points.
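The arithmetic of this small-sample interval can be checked as follows. Python's standard library has no t distribution, so this sketch takes the critical value 2.571 from the t table as an input; `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Interval x_bar +/- t_crit * s / sqrt(n); t_crit is read from a t table."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

low, high, margin = t_interval(2.28, 0.95, 6, t_crit=2.571)
print(round(margin, 3), round(low, 2), round(high, 2))  # 0.997 1.28 3.28
```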
Exercises: 8.1
1. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed with a standard deviation of 40 hours. If a
random sample of 30 bulbs has an average life of 780 hours, find a 99%
confidence interval for the population mean of all bulbs produced by this firm.
2. A random sample of 400 households was drawn from a town and a survey
generated data on weekly earnings. The mean in the sample was Birr 250 with a
standard deviation of Birr 80. Construct a 95% confidence interval for the
population mean earnings.
3. A major truck stop has kept extensive records on various transactions with its
customers. If a random sample of 16 of these records shows average sales of 290
liters of diesel fuel with a standard deviation of 12 liters, construct a 95%
confidence interval for the mean of the population sampled.
8.2 Hypothesis Testing

This is also one way of making inference about a population parameter, where the
investigator has a prior notion about the value of the parameter.
Definitions:
• Statistical hypothesis: an assertion or statement about the population
whose plausibility is to be evaluated on the basis of the sample data.
• Test statistic: a statistic whose value serves to determine whether to
reject or accept the hypothesis to be tested. It is a random variable.
• Statistical test: a test or procedure used to evaluate a statistical hypothesis;
its value depends on sample data.
There are two types of hypothesis:
Null hypothesis:
• It is the hypothesis to be tested.
• It is the hypothesis of equality or the hypothesis of no difference.
• Usually denoted by H0.
Alternative hypothesis:
• It is the hypothesis available when the null hypothesis has to be rejected.
• It is the hypothesis of difference.
• Usually denoted by H1 or Ha.
General steps in hypothesis testing:
1. Specify the null hypothesis (H0) and the alternative hypothesis (H1).
2. Specify the significance level, $\alpha$.
3. Identify the sampling distribution (Z or t) of the estimator.
4. Identify the critical region.
5. Calculate a statistic analogous to the parameter specified by the null
hypothesis.
6. Make a decision.
7. Summarize the result.
8.2.1 Hypothesis testing about the population mean, $\mu$:

Suppose the assumed or hypothesized value of $\mu$ is denoted by $\mu_0$; then one
can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:
1. $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$
3. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$
Case 1: When sampling is from a normal distribution with $\sigma^2$ known
1. The relevant test statistic is
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
2. After specifying $\alpha$, we have the following regions (critical and acceptance)
on the standard normal distribution corresponding to the above three
hypotheses.
Summary table for decision rule:
H1                  Reject H0 if                  Accept H0 if                  Inconclusive if
$\mu \neq \mu_0$    $|Z_{cal}| > Z_{\alpha/2}$    $|Z_{cal}| < Z_{\alpha/2}$    $Z_{cal} = Z_{\alpha/2}$ or $Z_{cal} = -Z_{\alpha/2}$
$\mu < \mu_0$       $Z_{cal} < -Z_{\alpha}$       $Z_{cal} > -Z_{\alpha}$       $Z_{cal} = -Z_{\alpha}$
$\mu > \mu_0$       $Z_{cal} > Z_{\alpha}$        $Z_{cal} < Z_{\alpha}$        $Z_{cal} = Z_{\alpha}$
Case 2: When sampling is from a normal distribution with $\sigma^2$ unknown and
small sample size
1. The relevant test statistic is
$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t$ with n - 1 degrees of freedom.
2. After specifying $\alpha$, we have the following regions on the Student t-distribution
corresponding to the above three hypotheses.
H1                  Reject H0 if                  Accept H0 if                  Inconclusive if
$\mu \neq \mu_0$    $|t_{cal}| > t_{\alpha/2}$    $|t_{cal}| < t_{\alpha/2}$    $t_{cal} = t_{\alpha/2}$ or $t_{cal} = -t_{\alpha/2}$
$\mu < \mu_0$       $t_{cal} < -t_{\alpha}$       $t_{cal} > -t_{\alpha}$       $t_{cal} = -t_{\alpha}$
$\mu > \mu_0$       $t_{cal} > t_{\alpha}$        $t_{cal} < t_{\alpha}$        $t_{cal} = t_{\alpha}$
Case 3: When sampling is from a non-normally distributed population or
a population whose functional form is unknown.
1. If the sample size is large, one can perform a test of hypothesis about the mean
by using:
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, if $\sigma^2$ is known.
$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, if $\sigma^2$ is unknown.
2. The decision rule is the same as in Case 1.
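The Case 1 decision rule can be written as a small function. This is only a sketch: the name `z_decision`, the labels for the three alternatives, and the two test-statistic values used in the calls are assumptions made for illustration, not part of the text.

```python
from statistics import NormalDist

def z_decision(z_cal, alpha, alternative):
    """Return 'reject' or 'accept' for H0: mu = mu0.

    alternative is one of 'two-sided', 'less', 'greater' (labels assumed here).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        z_half = std_normal.inv_cdf(1 - alpha / 2)       # Z_{alpha/2}
        return "reject" if abs(z_cal) > z_half else "accept"
    z_alpha = std_normal.inv_cdf(1 - alpha)              # Z_{alpha}
    if alternative == "less":
        return "reject" if z_cal < -z_alpha else "accept"
    if alternative == "greater":
        return "reject" if z_cal > z_alpha else "accept"
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical values of Z_cal, used only to exercise the rule.
print(z_decision(2.1, 0.05, "two-sided"))   # reject
print(z_decision(-1.0, 0.05, "less"))       # accept
```

The boundary case where the statistic equals the critical value exactly is treated as "accept" here for simplicity.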
Examples:
1. Test the hypothesis that the average content of containers of a
certain lubricant is 10 liters if the contents of a random sample of 10
containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters.
Use the 0.01 level of significance and assume that the distribution of contents
is normal.
Solution:
Let $\mu$ = population mean, $\mu_0 = 10$
Step 1: Identify the appropriate hypothesis
$H_0: \mu = 10$ vs $H_1: \mu \neq 10$
Step 2: Select the level of significance, $\alpha = 0.01$ (given)
Step 3: Select an appropriate test statistic
The t-statistic is appropriate because the population variance is not known and the
sample size is also small.
Step 4: Identify the critical region.
Here we have two critical regions since we have a two-tailed hypothesis.
The critical region is $|t_{cal}| > t_{0.005}(9) = 3.2498$
$\Rightarrow (-3.2498, 3.2498)$ is the acceptance region.
Step 5: Computations:
$\bar{X} = 10.06,\ S = 0.25$
$\Rightarrow t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{10.06 - 10}{0.25/\sqrt{10}} = 0.76$
Step 6: Decision
Accept H0, since $t_{cal}$ is in the acceptance region.
Step 7: Conclusion
At the 1% level of significance, we have no evidence to say that the average
content of containers of the given lubricant is different from 10 liters, based on
the given sample data.
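The computations in Step 5 can be reproduced directly from the ten observations. In this sketch the critical value 3.2498 is taken from the t table as in the solution, since the standard library has no t distribution.

```python
from math import sqrt
import statistics

contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]
mu_0 = 10
t_crit = 3.2498                     # t_{0.005}(9), taken from the t table

n = len(contents)
x_bar = statistics.mean(contents)
s = statistics.stdev(contents)      # sample standard deviation, divisor n - 1
t_cal = (x_bar - mu_0) / (s / sqrt(n))

decision = "reject H0" if abs(t_cal) > t_crit else "accept H0"
print(round(t_cal, 2), decision)    # 0.77 accept H0
```

The unrounded standard deviation is about 0.2459, so the statistic comes out as 0.77 rather than the 0.76 obtained with S rounded to 0.25. The decision is the same.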
2. The mean lifetime of a sample of 16 fluorescent light bulbs produced by a
company is computed to be 1570 hours. The population standard deviation is
120 hours. Suppose the hypothesized value for the population mean is 1600
hours. Can we conclude that the lifetime of light bulbs is decreasing?
(Use $\alpha = 0.05$ and assume the normality of the population)
Solution:
Let $\mu$ = population mean, $\mu_0 = 1600$
Step 1: Identify the appropriate hypothesis
$H_0: \mu = 1600$ vs $H_1: \mu < 1600$
Step 2: Select the level of significance, $\alpha = 0.05$ (given)
Step 3: Select an appropriate test statistic
The Z-statistic is appropriate because the population variance is known.
Step 4: Identify the critical region.
The critical region is $Z_{cal} < -Z_{0.05} = -1.645$
$\Rightarrow (-1.645, \infty)$ is the acceptance region.
Step 5: Computations:
$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{16}} = -1.0$
Step 6: Decision
Accept H0, since $Z_{cal}$ is in the acceptance region.
Step 7: Conclusion
At the 5% level of significance, we have no evidence to say that the lifetime of
light bulbs is decreasing, based on the given sample data.
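For this left-tailed test, the statistic and the decision can be verified in a few lines. The sketch obtains $Z_{0.05}$ from `statistics.NormalDist` rather than from the table.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 1570, 1600, 120, 16, 0.05

z_cal = (x_bar - mu_0) / (sigma / sqrt(n))
z_alpha = NormalDist().inv_cdf(1 - alpha)    # about 1.645

# H1: mu < 1600, so H0 is rejected only for small values of Z_cal.
decision = "reject H0" if z_cal < -z_alpha else "accept H0"
print(z_cal, round(z_alpha, 3), decision)    # -1.0 1.645 accept H0
```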
Exercise 8.2
1. It is known in a pharmacological experiment that rats fed with a
particular diet over a certain period gain an average of 40 gms in weight. A new
diet was tried on a sample of 20 rats, yielding a mean weight gain of 43 gms with
variance 7. Test the hypothesis that the new diet is an improvement,
assuming normality.
8.3 Types and Size of Errors

• Hypothesis testing is based on sample data, which may involve sampling and
non-sampling errors.
• The following table gives a summary of possible results of any hypothesis test:
                        Decision
Truth        Reject H0           Don't reject H0
H0 true      Type I Error        Right Decision
H1 true      Right Decision      Type II Error
• Type I error: Rejecting the null hypothesis when it is true.
• Type II error: Failing to reject the null hypothesis when it is false.
NOTE:
1. These are errors that are prevalent in any two-choice decision-making
problem.
2. There is always a possibility of committing one or the other error.
3. The probabilities of a type I error ($\alpha$) and a type II error ($\beta$) have an inverse relationship and
therefore cannot be minimized at the same time.
• In practice we set $\alpha$ at some value and design a test that minimizes $\beta$. This is
because a type I error is often considered to be more serious, and therefore more
important to avoid, than a type II error.
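The inverse relationship between $\alpha$ and $\beta$ can be illustrated numerically. The sketch below reuses the setting of the light-bulb example ($\mu_0 = 1600$, $\sigma = 120$, n = 16, left-tailed test). It computes $\beta$ for a hypothetical true mean of 1550 hours, a value assumed here only for illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
mu_0, sigma, n = 1600, 120, 16      # setting of the light-bulb example
mu_true = 1550                      # hypothetical true mean under H1 (assumed)
se = sigma / sqrt(n)                # standard error of the sample mean

def beta(alpha):
    """P(fail to reject H0 | mu = mu_true) for the test of H1: mu < mu_0."""
    cutoff = mu_0 - std_normal.inv_cdf(1 - alpha) * se   # reject when x_bar < cutoff
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

betas = {alpha: beta(alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, b in betas.items():
    print(alpha, round(b, 3))
```

With the sample size held fixed, lowering $\alpha$ moves the cutoff further from $\mu_0$, so the printed $\beta$ values increase.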
8.4 Test of Association

Suppose we have a population consisting of observations having two attributes
or qualitative characteristics, say A and B. If the attributes are independent,
then the probability of possessing both A and B is $P_A \times P_B$,
where $P_A$ is the probability that a member has attribute A and $P_B$ is the
probability that a member has attribute B.
- Suppose A has r mutually exclusive and exhaustive classes and B has c
mutually exclusive and exhaustive classes. The entire set of data can be
represented using an r × c contingency table.
                            B
A        B1     B2     ...    Bj     ...    Bc     Total
A1       O11    O12    ...    O1j    ...    O1c    R1
A2       O21    O22    ...    O2j    ...    O2c    R2
...
Ai       Oi1    Oi2    ...    Oij    ...    Oic    Ri
...
Ar       Or1    Or2    ...    Orj    ...    Orc    Rr
Total    C1     C2     ...    Cj     ...    Cc     n
- The chi-square test is used to test the hypothesis of independence
of two attributes. For instance, we may be interested in
• Whether the presence or absence of hypertension is independent of
smoking habit or not.
• Whether the size of the family is independent of the level of education
attained by the mothers.
• Whether there is association between father and son regarding boldness.
• Whether there is association between stability of marriage and period of
acquaintanceship prior to marriage.
- The $\chi^2$ statistic is given by:
$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \left[ \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \right] \sim \chi^2_{(r-1)(c-1)}$
where $O_{ij}$ = the number of units that belong to category i of A and j of B,
$e_{ij}$ = the expected frequency of units that belong to category i of A and j of B.
- The $e_{ij}$ is given by:
$e_{ij} = \frac{R_i \times C_j}{n}$
where $R_i$ = the i-th row total,
$C_j$ = the j-th column total,
n = total number of observations.
Remark: $n = \sum_{i=1}^{r} \sum_{j=1}^{c} O_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{c} e_{ij}$
- The null and alternative hypotheses may be stated as:
$H_0$: There is no association between A and B.
$H_1$: not $H_0$ (There is association between A and B).
Decision Rule: Reject H0 for independence at the $\alpha$ level of significance if the
calculated value of $\chi^2$ exceeds the tabulated value with degrees of
freedom equal to $(r - 1)(c - 1)$.
$\Rightarrow$ Reject $H_0$ if $\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \left[ \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \right] > \chi^2_{\alpha}\big((r-1)(c-1)\big)$
Examples:
1. A geneticist took a random sample of 300 men to study whether there is
association between father and son regarding boldness. He obtained the
following results.
                 Son
Father       Bold    Not
Bold         85      59
Not          65      91
Using $\alpha = 5\%$, test whether there is association between father and son
regarding boldness.
Solution:
$H_0$: There is no association between father and son regarding boldness.
$H_1$: not $H_0$
- First calculate the row and column totals:
$R_1 = 144,\ R_2 = 156,\ C_1 = 150,\ C_2 = 150$
- Then calculate the expected frequencies ($e_{ij}$'s):
$e_{ij} = \frac{R_i \times C_j}{n}$
$\Rightarrow e_{11} = \frac{R_1 \times C_1}{n} = \frac{144 \times 150}{300} = 72, \quad e_{12} = \frac{R_1 \times C_2}{n} = \frac{144 \times 150}{300} = 72$
$e_{21} = \frac{R_2 \times C_1}{n} = \frac{156 \times 150}{300} = 78, \quad e_{22} = \frac{R_2 \times C_2}{n} = \frac{156 \times 150}{300} = 78$
- Obtain the calculated value of the chi-square:
$\chi^2_{cal} = \sum_{i=1}^{2} \sum_{j=1}^{2} \left[ \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \right]$
$= \frac{(85 - 72)^2}{72} + \frac{(59 - 72)^2}{72} + \frac{(65 - 78)^2}{78} + \frac{(91 - 78)^2}{78} = 9.028$
- Obtain the tabulated value of chi-square:
$\alpha = 0.05$
Degrees of freedom = $(r - 1)(c - 1) = 1 \times 1 = 1$
$\chi^2_{0.05}(1) = 3.841$ from table.
- The decision is to reject H0 since $\chi^2_{cal} > \chi^2_{0.05}(1)$.
Conclusion: At the 5% level of significance we have evidence to say there is
association between father and son regarding boldness, based on this sample
data.
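The whole calculation for this example can be sketched as a short function. The name `chi_square_independence` is a hypothetical helper introduced for illustration, and the critical value 3.841 is taken from the chi-square table as in the solution.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # e_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

observed = [[85, 59],    # father bold:     son bold, son not
            [65, 91]]    # father not bold: son bold, son not
chi2, df, expected = chi_square_independence(observed)

critical = 3.841         # chi-square_{0.05}(1), read from the table
print(round(chi2, 3), df, expected)   # 9.028 1 [[72.0, 72.0], [78.0, 78.0]]
print("reject H0" if chi2 > critical else "accept H0")   # reject H0
```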
2. A random sample of 200 men, all retired, was classified according to
education and number of children, as shown below.
Education level           Number of children
                          0-1     2-3     Over 3
Elementary                14      37      32
Secondary and above       31      59      27
Test the hypothesis that the size of the family is independent of the level of
education attained by fathers. (Use 5% level of significance)
Solution:
$H_0$: There is no association between the size of the family and the level of
education attained by fathers.
$H_1$: not $H_0$.
- First calculate the row and column totals:
$R_1 = 83,\ R_2 = 117,\ C_1 = 45,\ C_2 = 96,\ C_3 = 59$
- Then calculate the expected frequencies ($e_{ij}$'s):
$e_{ij} = \frac{R_i \times C_j}{n}$
$\Rightarrow e_{11} = 18.675,\ e_{12} = 39.84,\ e_{13} = 24.485$
$e_{21} = 26.325,\ e_{22} = 56.16,\ e_{23} = 34.515$
- Obtain the calculated value of the chi-square:
$\chi^2_{cal} = \sum_{i=1}^{2} \sum_{j=1}^{3} \left[ \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \right]$
$= \frac{(14 - 18.675)^2}{18.675} + \frac{(37 - 39.84)^2}{39.84} + \ldots + \frac{(27 - 34.515)^2}{34.515} = 6.3$
- Obtain the tabulated value of chi-square:
$\alpha = 0.05$
Degrees of freedom = $(r - 1)(c - 1) = 1 \times 2 = 2$
$\chi^2_{0.05}(2) = 5.99$ from table.
- The decision is to reject H0 since $\chi^2_{cal} > \chi^2_{0.05}(2)$.
Conclusion: At the 5% level of significance we have evidence to say there is
association between the size of the family and the level of education attained by
fathers, based on this sample data.
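Because the solution abbreviates the six terms of the sum, it can help to print each cell's expected frequency and its contribution. The sketch below does this for the 2 × 3 table, again taking the critical value 5.99 from the chi-square table.

```python
observed = {
    "Elementary":          [14, 37, 32],
    "Secondary and above": [31, 59, 27],
}
columns = ["0-1", "2-3", "Over 3"]

row_totals = {level: sum(counts) for level, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

chi2 = 0.0
for level, counts in observed.items():
    for label, o, c in zip(columns, counts, col_totals):
        e = row_totals[level] * c / n            # expected frequency e_ij
        contribution = (o - e) ** 2 / e          # this cell's term in the sum
        chi2 += contribution
        print(f"{level:20s} {label:7s} O={o:3d} e={e:7.3f} term={contribution:.3f}")

critical = 5.99                                  # chi-square_{0.05}(2), from the table
print(round(chi2, 2), "reject H0" if chi2 > critical else "accept H0")
```

The total comes out as about 6.29, which the solution rounds to 6.3.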
Review exercise on chapter eight

1. A survey of 30 adults found that the mean age of a person's primary
vehicle is 5.6 years. Assuming the standard deviation of the population is
0.8 year, find the best point estimate of the population mean and the 99%
confidence interval of the population mean.
2. Ten randomly selected people were asked how long they slept at night. The
mean time was 7.1 hours, and the standard deviation was 0.78 hour. Find
the 95% confidence interval of the mean time. Assume the variable is
normally distributed.
3. A researcher reports that the average salary of assistant professors is more
than $42,000. A sample of 30 assistant professors has a mean salary of
$43,260. At α = 0.05, test the claim that assistant professors earn more
than $42,000 per year. The standard deviation of the population is $5230.
4. The Medical Rehabilitation Education Foundation reports that the average
cost of rehabilitation for stroke victims is $24,672. To see if the average
cost of rehabilitation is different at a particular hospital, a researcher
selects a random sample of 35 stroke victims at the hospital and finds that
the average cost of their rehabilitation is $25,226. The standard deviation
of the population is $3251. At α = 0.01, can it be concluded that the
average cost of stroke rehabilitation at a particular hospital is different
from $24,672?
CHAPTER NINE
9. SIMPLE LINEAR REGRESSION AND CORRELATION
Objectives:
After completing the topic, the students will be able to:
• Determine the relationship between variables.
• Find the fitted regression line of the two variables.
• Draw and describe a scatter diagram.
• Interpret the slope and intercept of the fitted regression line.
• Calculate and interpret the correlation coefficient.
• Find and interpret the coefficient of determination.
• Calculate and interpret explained and unexplained variations.
• Calculate and interpret Spearman's correlation coefficient.
9.1 Introduction
In Chapter 8, two areas of inferential statistics, confidence intervals and
hypothesis testing, were explained. Another area of inferential statistics involves
determining whether a relationship exists between two or more numerical or
quantitative variables. For example, a businessperson may want to know
whether the volume of sales for a given month is related to the amount of
advertising the firm does that month. Educators are interested in determining
whether the number of hours a student studies is related to the student's score
on a particular exam. Medical researchers are interested in questions such as:
Is caffeine related to heart damage? Is there a relationship between a person's
age and his or her blood pressure? A zoologist may want to know whether the
birth weight of a certain animal is related to its life span. These are only a few of
the many questions that can be answered by using the techniques of correlation
and regression analysis. Linear regression and correlation analysis is the study and
measurement of the linear relationship among two or more variables. When only two
variables are involved, the analysis is referred to as simple correlation and
simple linear regression analysis, and when there are more than two variables
the terms multiple regression and partial correlation are used.
Correlation is a statistical method used to determine whether a relationship

between variables exists.
Regression is a statistical method used to describe the nature of the relationship
between variables, that is, positive or negative, linear or nonlinear.
9.2 Simple Linear Regression

Simple linear regression refers to the linear relationship between two variables.
We usually denote the dependent variable by Y and the independent variable by
X. A simple regression line is the line fitted to the points plotted in the scatter
diagram; it describes the average relationship between the two variables.
Therefore, to see the type of relationship, it is advisable to prepare a
scatter plot before fitting the model.
Suppose we have one independent variable $X = (X_1, X_2, \ldots, X_n)$ and one dependent
variable $Y = (Y_1, Y_2, \ldots, Y_n)$. The simple linear regression model of Y on X is given as
Model: $Y_i = \alpha + \beta X_i + \varepsilon_i$
• Y is the response variable (also called the dependent variable).
• X is the predictor (also called the independent or explanatory variable).
• $\alpha$ and $\beta$ are, respectively, the intercept of the regression line on the Y axis
  (the value of Y when X = 0) and the slope of the regression line.
• $\varepsilon$ is the error or residual (also called the random deviation).
To estimate the parameters ($\alpha$ and $\beta$) we have several methods:
• The free hand method
• The semi-average method
• The least squares method
• The maximum likelihood method
• The method of moments
• The Bayesian estimation technique
The above model is estimated by: $\hat{Y}_i = a + bX_i$.
a and b are found by minimizing $SSE = \sum \varepsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2$
Where: $Y_i$ = observed value
$\hat{Y}_i$ = estimated value = $a + bX_i$
This method is known as OLS (ordinary least squares).
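To minimize this function, first we take the partial derivatives of SSE with
respect to a and b respectively. Then the partial derivatives are equated to zero
separately, which results in the following normal equations:
$$\sum Y_i = na + b\sum X_i$$
$$\sum X_iY_i = a\sum X_i + b\sum X_i^2$$
Solving these normal equations simultaneously (i.e. minimizing $SSE = \sum \varepsilon_i^2$)
gives
$$b = \frac{\sum X_iY_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$
The estimated (fitted) regression line is given by
$$\hat{Y}_i = a + bX_i$$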
Before estimating the regression coefficients, it is wise to plot the
observed data on a graph known as a scatter diagram. A scatter diagram is a plot
of all ordered pairs $(x_i, y_i)$ on the coordinate plane, which helps to observe the
relationship between two variables. This diagram gives a preliminary idea of the
type of relationship the two variables have.
Regression analysis is useful in predicting the value of one variable from a
given value of another variable, using the fitted line $\hat{Y}_i = a + bX_i$.
Example 1: The following data show the scores of 12 students in Accounting
and Statistics examinations.
a) Calculate the simple correlation coefficient.
b) Fit a regression equation of Statistics on Accounting using least squares
estimates.
c) Predict the score in Statistics if the score in Accounting is 85.
Student   Accounting (X)   Statistics (Y)      X^2         Y^2          XY
1              74.00            81.00        5476.00     6561.00     5994.00
2              93.00            86.00        8649.00     7396.00     7998.00
3              55.00            67.00        3025.00     4489.00     3685.00
4              41.00            35.00        1681.00     1225.00     1435.00
5              23.00            30.00         529.00      900.00      690.00
6              92.00           100.00        8464.00    10000.00     9200.00
7              64.00            55.00        4096.00     3025.00     3520.00
8              40.00            52.00        1600.00     2704.00     2080.00
9              71.00            76.00        5041.00     5776.00     5396.00
10             33.00            24.00        1089.00      576.00      792.00
11             30.00            48.00         900.00     2304.00     1440.00
12             71.00            87.00        5041.00     7569.00     6177.00
Total         687.00           741.00       45591.00    52525.00    48407.00
Mean           57.25            61.75
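a) $$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}} = \frac{48407 - 12(57.25)(61.75)}{\sqrt{[45591 - 12(57.25)^2][52525 - 12(61.75)^2]}} = \frac{5984.75}{\sqrt{(6260.25)(6768.25)}} \approx 0.92$$
The coefficient of correlation (r) has a value of 0.92. This indicates that the two
variables are positively correlated (Y increases as X increases).
b) $$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{5984.75}{6260.25} \approx 0.9560, \qquad a = \bar{Y} - b\bar{X} = 61.75 - 0.9560(57.25) \approx 7.0194$$
Therefore $\hat{Y} = 7.0194 + 0.9560X$ is the estimated regression line.
c) Insert X = 85 in the estimated regression line:
$$\hat{Y} = 7.0194 + 0.9560X = 7.0194 + 0.9560(85) = 88.28$$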
Exercise 9.1
1. A car rental agency is interested in studying the relationship between the
distance driven in kilometers (Y) and the maintenance cost for their cars (X in
birr). The following summarized information is given based on a sample of size
5. (Assignment)
$\sum_{i=1}^{5} X_i = 23{,}000$, $\sum_{i=1}^{5} Y_i = 36$, $\sum_{i=1}^{5} X_i^2 = 147{,}000{,}000$, $\sum_{i=1}^{5} Y_i^2 = 314$, $\sum_{i=1}^{5} X_iY_i = 212{,}000$
a. Find the least squares regression equation of Y on X.
b. Compute the correlation coefficient and interpret it.
c. Estimate the maintenance cost of a car which has been driven for 6 km.
9.3 Simple Correlation and Coefficient of Determination
9.3.1 Simple correlation (r)
Suppose we have two variables $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$. We may want
to describe the type and strength of the relationship between the independent variable
X and the dependent variable Y. We can describe both with an index
called the simple correlation coefficient. The population correlation coefficient is
represented by $\rho$ and its estimator by r. The correlation coefficient r is also called
Pearson's correlation coefficient since it was developed by Karl Pearson.
The correlation coefficient between X and Y, denoted by r, is given by
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
and the short cut formulas are
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}}$$
Possible Relationships between Variables
• When higher values of X are associated with higher values of Y and lower
  values of X are associated with lower values of Y, then the correlation is said
  to be positive or direct.
  Examples:
  - Income and expenditure
  - Number of hours spent in studying and the score obtained
  - Height and weight
  - Distance covered and fuel consumed by a car
• When higher values of X are associated with lower values of Y and lower
  values of X are associated with higher values of Y, then the correlation is said
  to be negative or inverse.
  Examples:
  - Demand and supply
  - Income and the proportion of income spent on food
The correlation between X and Y may be one of the following:
1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)
Remark: r always lies between -1 and 1 inclusive, and it is symmetric: the
correlation between X and Y equals the correlation between Y and X.
Interpretation of r:
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
Example: Calculate the simple correlation between mid semester and final
exam scores of 10 students (both out of 50).
Student   Mid Exam (X)   Final Exam (Y)
1              31              31
2              23              29
3              41              34
4              32              35
5              29              25
6              33              35
7              28              33
8              31              42
9              31              31
10             33              34
Solution:
$n = 10$, $\bar{X} = 31.2$, $\bar{Y} = 32.9$, $\bar{X}^2 = 973.44$, $\bar{Y}^2 = 1082.41$
$\sum XY = 10331$, $\sum X^2 = 9920$, $\sum Y^2 = 11003$
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{[9920 - 10(973.44)][11003 - 10(1082.41)]}} = \frac{66.2}{182.2} \approx 0.363$$
This means mid semester exam and final exam scores have a weak positive
correlation.
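As a check on the example above, the following minimal sketch (plain Python, with illustrative function names that are assumptions rather than part of the text) computes r for the ten students in two ways: from deviations about the means, and from the short cut formula based on raw sums. The two results should agree up to floating-point error.

```python
from math import sqrt

# Mid semester (X) and final exam (Y) scores of the 10 students.
mid = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
final = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]


def pearson_definition(x, y):
    """r from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - x_bar) ** 2 for xi in x) *
               sum((yi - y_bar) ** 2 for yi in y))
    return num / den


def pearson_shortcut(x, y):
    """r from the short cut formula using raw sums only."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    syy = sum(yi ** 2 for yi in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


r_def = pearson_definition(mid, final)
r_short = pearson_shortcut(mid, final)
print(round(r_def, 3), round(r_short, 3))
```

The short cut form is convenient for hand calculation because it needs only the five column totals; the deviation form makes the meaning of r easier to see.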
Exercise 9.2
1. The following data were collected from a certain household on the monthly
income (X) and consumption (Y) for the past 10 months. Compute the simple
correlation coefficient.
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
9.3.2 Coefficient of Determination ($r^2$)

The square of the correlation coefficient, $r^2$, is called the coefficient of
determination. It measures the proportion of the variation in the dependent
variable Y that is explained by the simple linear regression of Y on X,
i.e. $$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$
where r is the simple correlation coefficient.
• $r^2$ measures the proportion of the variation in Y explained by the regression
  of Y on X.
• $1 - r^2$ measures the unexplained proportion and is called the coefficient of
  indetermination.
Example: If r = 0.9, then $r^2$ = 0.81 and $1 - r^2$ = 0.19. That is, 81% of the
variation in the dependent variable, Y, is explained by the simple linear
regression of Y on X fitted on sample data. The remaining $1 - r^2$, 19% of the
variation in Y, is unexplained by the simple linear regression of Y on X fitted on
sample data.
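The split into explained and unexplained variation can be illustrated numerically. The sketch below is an assumption-labelled illustration in plain Python that reuses the Accounting and Statistics scores from Example 1 of Section 9.2; the names `explained`, `unexplained` and `total` are chosen here for readability and do not come from the text. It fits the least squares line, then compares the ratio of explained to total variation with the square of r.

```python
from math import sqrt

# Accounting (X) and Statistics (Y) scores from Example 1 in Section 9.2.
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx            # least squares slope
a = y_bar - b * x_bar    # least squares intercept
y_hat = [a + b * xi for xi in x]

total = syy                                                   # sum (Y - Ybar)^2
explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum (Yhat - Ybar)^2
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # sum (Y - Yhat)^2

r = sxy / sqrt(sxx * syy)
r_squared = explained / total
print(round(r_squared, 2), round(1 - r_squared, 2))
```

For a least squares line the explained and unexplained parts add up to the total variation, which is why $1 - r^2$ can be read as the unexplained proportion.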
EXERCISES 9.3
1. The research director of the Saving and Loan Bank collected 25 observations of
mortgage interest rates X and the number of house sales Y at each interest rate.
The director computed that,
, = 436 .
Compute and Interpret:

a. Coefficient of correlation.
b. The coefficient of determination.
9.4 Spearman’s Rank Correlation Coefficient

The simple correlation coefficient (r) cannot be used when we are dealing with
qualitative data such as judgments about beauty, efficiency, honesty, etc. In
such cases, the rank correlation coefficient is used to measure the correlation, or
the degree of agreement, between two rankings. It is denoted by $r_s$ and is defined as follows:
Definition: The coefficient of rank correlation, $r_s$, given by Spearman for n
pairs, is
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d is the difference between the rank of x and the rank of the
corresponding y.
To calculate $r_s$, we first rank the x's among themselves from lowest to highest or from
highest to lowest; then we rank the y's in the same way and find the sum of the squares
of the differences, d, between the ranks of the x's and the y's. When there are
ties in rank, we assign to each of the tied observations (having equal value) the
mean of their ranks.
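The rule for tied observations is easiest to see in code. The following is a minimal sketch in plain Python; the helper name `average_ranks` is an assumption made for illustration. It ranks values from smallest (rank 1) to largest and gives every group of equal values the mean of the ranks they would otherwise occupy. The mid exam scores from Section 9.3.1 are used as sample input because they contain ties.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of observations tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end hold ranks pos+1..end+1; assign their mean to all.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


mid_scores = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
print(average_ranks(mid_scores))
```

Here the three scores of 31 would occupy ranks 4, 5 and 6, so each receives rank 5, and the two scores of 33 share rank 8.5.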
Example: Assume that ten girls in a beauty contest for Miss Ethiopia were
ranked by two judges as follows:
Girl Number   1   2   3   4   5   6   7   8   9   10
Judge A       4   8   6   7   1   3   2   5  10    9
Judge B       3   9   6   5   1   2   4   7   8   10
Calculate $r_s$ and interpret it.
Solution: Since the ranks are given, we need to find only the difference in
ranks for each girl and the square of these differences.
Girl Number   1   2   3   4   5   6   7   8   9   10   Total
d             1  -1   0   2   0   1  -2  -2   2   -1       0
d^2           1   1   0   4   0   1   4   4   4    1      20
For these n = 10 pairs, $\sum d^2 = 20$, and
$$r_s = 1 - \frac{6(20)}{10(100 - 1)} = 0.88,$$
which is positive and close to 1, showing that there is very good agreement (or
concordance) between the two judges regarding the beauty of the girls.
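The calculation for the two judges can be reproduced with a few lines of Python. This is a minimal sketch; the function name `spearman_from_ranks` is an illustrative assumption, and the formula is valid in this simple form when the inputs are already ranks, as they are here.

```python
# Ranks given by the two judges to the ten contestants.
judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rs = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for two lists of ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rs = spearman_from_ranks(judge_a, judge_b)
print(round(rs, 2))
```

Passing one ranking together with its exact reverse gives -1, and passing the same ranking twice gives +1, which matches the limiting cases of $r_s$.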
Like the values of r, the values of $r_s$ also lie between -1 and +1, inclusive,
and the interpretations of its size and sign are analogous to those of r.
$r_s = 1$: Perfect positive agreement,
$r_s = -1$: Complete disagreement, where the two rankings go in completely
opposite directions.
Review Exercise on Chapter Nine

1. Stopping Distances: In a study on speed control, it was found that the main
reasons for regulations were to make traffic flow more efficient and to minimize the
risk of danger. An area that was focused on in the study was the distance required to
completely stop a vehicle at various speeds. Use the following table to answer the
questions.(Assignment )
MPH Braking distance (feet)
20 20
30 45
40 81
50 133
60 205
80 411
Assume MPH is going to be used to predict stopping distance.
a. Which of the two variables is the independent variable?
b. Which is the dependent variable?
c. What type of variable is the independent variable?
d. What type of variable is the dependent variable?
e. Construct a scatter plot for the data.
f. Is there a linear relationship between the two variables?
g. Redraw the scatter plot, and change the distances between the independent-variable
numbers. Does the relationship look different?
h. Is the relationship positive or negative?
i. Can braking distance be accurately predicted from MPH?
j. List some other variables that affect braking distance.
k. Compute the correlation coefficient and coefficient of determination, and give an
interpretation.
l. Find the linear regression equation.
m. What does the slope tell you about MPH and the braking distance? How about the y
intercept?
n. Find the braking distance when MPH =100.
o. Comment on predicting beyond the given data values.
2. The following table shows the heights to the nearest inch (in) and the weights to the
nearest pound (lb) of a sample of 12 male students drawn at random from the first
year students at a university.
Height x (in)   70   63   72   60   66   70   74   65   62   67   65   68
Weight y (lb)  155  150  180  135  156  168  178  160  132  145  139  152
a. Plot a scatter diagram of the data.
b. Fit the least squares regression equation.
c. Estimate the weight of a student whose height is 63 inches.
3. It has been observed that the amount of soil eroded (in kg) per day (Y) is determined
by the wind velocity (X) on that day (km/hr). Data obtained from the ministry of
agriculture for a certain area gave the following summary statistics:
$\sum Y = 80.9$, $\sum X^2 = 117{,}123.86$, $\sum Y^2 = 412.81$
Regression line: Y = -4.54 + 0.1123 X
a. What change in the amount of soil erosion would be associated with a 1 km/hr change
in wind velocity?
b. What amount of soil erosion would you predict for a wind velocity of 90 km/hr?
APPENDIX A
Table A: Area under the standard normal curve between Z = 0 and Z = z, i.e. P(0 ≤ Z ≤ z)
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
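Entries of Table A can be recomputed when a printed value looks doubtful. The sketch below is an illustration in plain Python, not the method used to produce the table; the function name `area_zero_to_z` is an assumption. It uses the standard identity that the area between 0 and z under the standard normal curve equals half of erf(z/√2), with `erf` taken from Python's `math` module.

```python
from math import erf, sqrt


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2))


# Print a few entries in the same four-decimal format as Table A.
for z in (0.68, 1.00, 1.96, 2.58):
    print(f"{z:.2f}  {area_zero_to_z(z):.4f}")
```

To read the table, take the row for the first decimal of z and the column for the second decimal; for example z = 1.96 is found in row 1.9 under column 0.06.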
Table B. t-table with right-tail probabilities
α = p 0.1 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
df = 1 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578
2 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.600
3 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924
4 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768
24 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660
30 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
Infinity 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.290
Table C. Right tail areas for the Chi-square Distribution
df\area 0.995 0.99 0.975 0.95 0.9 0.25 0.1 0.05 0.025 0.01 0.005
1 0.000 0.000 0.001 0.004 0.016 1.323 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 2.773 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 4.108 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 5.385 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 6.626 9.236 11.071 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 7.841 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 9.037 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 3.490 10.219 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 11.389 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 12.549 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 13.701 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 14.845 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 15.984 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 17.117 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 18.245 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 19.369 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 20.489 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 21.605 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 22.718 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 23.828 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 24.935 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 26.039 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 27.141 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 28.241 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 29.339 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 30.435 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 31.528 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 32.620 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 33.711 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 34.800 40.256 43.773 46.979 50.892 53.672
Appendix B
Answers for Exercises
Chapter 1
Review Exercise Chapter 1
1. I. Inferential statistics II. Inferential statistics III. Inferential statistics
2. A) All students taking Stat 3011 B) 50 students C) CGPA D) 3.5 E) 2.8
F) Quantitative
3. A) Quantitative and ratio B) Qualitative and nominal C) Qualitative and ordinal
D) Qualitative and nominal E) Qualitative and interval F) Quantitative and nominal
G) Quantitative and ratio H) Quantitative and ratio
Chapter 2
1. Ref. Self Exercise
Chapter 3

2. A) F4=15 And F5= 12 B) Mean= 55 C) Median 54.16 D) Mode=
52.657
Chapter 4
Exercise 4.1
2. City 1:
City 2:
City 1:
Therefore, City 2 has the most consistent temperature.
Exercise 4.2
3. A) C.V1 = 11.54% and C.V2 = 10.92%. Since C.V2 < C.V1, group 2 is more
consistent.
B) ZA = -1 and ZB = -2. Child B is faster because the time taken by child B
is two standard deviations shorter than the average time taken by group 2,
while the time taken by child A is only one standard deviation shorter
than the average time taken by group 1.
Review Exercise on Chapter 4

5. . Since , The Distribution Is Negatively Skewed.
6. The Fourth Moment Is 30,000.
Chapter 5
Exercise 5.1
4. 130,636,800
Exercise 5.2
5. A) 84 Ways B) 28 Ways C) 30 Ways
Exercise 5.3
1. A) 5/36 B) 1/6 C) 2/9 D) 1/6
2. A) 0.333 B) 0.357
Exercise 5.4
1. A) 56/127 B) 103/127 C) 47/127 D) 80/127
Exercise 5.5
1. A) 19/495 B) 19/99
2. A) 5/7
Review Exercise on Chapter 5

1. Explain It By Yourself
2. A) S= {BBB BBG BGB GBB GGG GGB GBG BGG}
B) S= {BBG BGB GBB GGG GGB GBG BGG}
3. A) 384 B) 1152
6. A) 0.3
7. A) 0.707 B) 0.589 C) 0.011 D) 0.731
Chapter 6
Exercise 6.1
1. Number Of Games X 4 5 6 7
Probability P(X) 0.200 0.175 0.225 0.400
2. A. Yes, it is a probability distribution.
B. No, it is not a probability distribution, since P(X) cannot be 1.5 or -1.0.
C. Yes, it is a probability distribution.
D. No, it is not, since ΣP(X) = 1.2.
Exercise 6.2
Exercise 6.3
2. A) B)
3. Because at least one of the four assumptions of a binomial experiment is
not satisfied.
Exercise 6.4
1. A)
B)
C)
Review Chapter 6
1. A) B) C)
P(X>=3)=1 –P(X<3)=?
2. 0.221584
5. 0.1587
Chapter 7
Exercise 7.1
Review Chapter 7
1. Explain It By Yourself
3. A) Simple Random Sampling B) Cluster Sampling C) Systematic
Sampling D) Convenience Sampling E) Judgment Sampling
4. Assignment
5. Your answer depends on the first randomly selected number; each subsequent
number is then chosen at systematic kth intervals.
Chapter 8
Exercises: 8.1
1. (761.19, 798.81)
2. (242.16, 257.84)
3. (283.61, 296.39)
1. 5.22 < µ<5.98
2. 6.54 < µ<7.66
Exercise Chapter 9
Chapter 9.1
Chapter 9.2
Chapter 9.1

2. a) Self exercise b) ŷ = -60.7 + 3.22x c) 142.16
3. a) -4.4277 b) 5.567
References
1. Eshetu Wencheko, Introduction to Statistics, April 2000, Addis Ababa
University.
2. Gupta S.P., Gupta M.P., Business Statistics, 2001, Sultan Chand &
Sons, New Delhi.
3. Monga G.S., Mathematics and Statistics for Economics (second
revised edition), 2007.
4. Moorthy M.B.K., Subramani K. & Santha A., Probability and
Statistics, Dec. 2007, Scitech Publications (India) Pvt. Ltd.
5. Pal Nabendu, Sarkar Sahadeb, Statistics: Concepts and Applications,
2006, New Delhi.
6. Spiegel Murray R. & Stephens Larry J., Statistics (Schaum's Outline),
1999, Tata McGraw-Hill, 3rd edition, New Delhi.
7. Sullivan Michael, III, Statistics: Informed Decisions Using Data, 2004,
New Jersey.
Assosa University
Faculty of Natural and Computational Science
Department of Statistics
Introduction to Statistics for Sport Science Summer Student
Assessment Out of 60%
1. Suppose data collected on the heights (in cm) of 390 cows were tabulated in a
frequency distribution and the following results were obtained.
fi: 6, 25, 48, 72, 116, 60, 38, 22, 3
CM1 = 112, CM2 = 117, where CMi is the ith class mark
Determine:
a. the class interval size (class width)
b. the class limits
c. the class boundaries
d. the class marks
e. the less than cumulative frequency distribution
f. the class interval having the highest frequency
g. Above which height do we find 50% of the cows?
h. Below which height do we get 25% of the cows?
Draw
i. a histogram
ii. a frequency polygon
iii. a less than ogive for the above data
2. A meteorologist interested in the consistency of temperatures in three
cities during a given week collected the following data. The temperatures for
the five days of the week in the three cities were
City 1   25 24 23 26 17
City 2   22 21 24 22 20
City 3   32 27 35 24 28
Which city has the most consistent temperature, based on these
data?
3. A random sample of 10 boys is selected from the population of a certain
camp, and each boy's weight and height are measured and recorded. The
average weight of the boys in the sample is 32.66 kg with a standard deviation of
3.9 kg and the average height is 95.5 cm with a standard deviation of 5.2 cm.
Which measurement, weight or height, is less variable?
4. Some characteristics of the annual family income distribution (in Birr) in two
regions are as follows:
Region   Mean   Median   Standard Deviation
A        6250   5100     960
B        6980   5500     940
a. Calculate the coefficient of skewness for each region.
b. For which region is the income distribution more skewed? Give your
interpretation for this region.
c. For which region is the income more consistent?
5. For a moderately skewed frequency distribution, the mean is 10 and the
median is 8.5. If the coefficient of variation is 20%, find the Pearsonian
coefficient of skewness and the probable mode of the distribution.
6. The sum of fifteen observations, whose mode is 8, was found to be 150 with a
coefficient of variation of 20%.
a. Calculate the Pearsonian coefficient of skewness and give an appropriate
conclusion.
b. Are smaller values more or less frequent than bigger values for this
distribution?
c. If a constant k is added to each observation, what will be the new
Pearsonian coefficient of skewness? Show your steps. What do you
conclude from this?
7. The median and the mode of a mesokurtic distribution are 32 and 34
respectively. The 4th moment about the mean is 243. Compute the
Pearsonian coefficient of skewness and identify the type of skewness.
Assume (n-1 = n).
8. If the standard deviation of a symmetric distribution is 10, what should be
the value of the fourth moment so that the distribution is mesokurtic?
9. Out of 5 mathematicians and 7 statisticians, a committee consisting of 2
mathematicians and 3 statisticians is to be formed. In how many ways can this
be done if
a. There is no restriction?
b. One particular statistician should be included?
c. Two particular mathematicians cannot be included on the committee?
10. If 3 books are picked at random from a shelf containing 5 novels, 3 books
of poems, and a dictionary, in how many ways can this be done if
a. There is no restriction?
b. The dictionary is selected?
c. 2 novels and 1 book of poems are selected?
d. Find the probability for each of a), b) and c).
11. A box contains black chips and white chips. A person selects two chips
without replacement. If the probability of selecting a black chip and a white
chip is 15/56, and the probability of selecting a black chip on the first draw
is 3/8, find the probability of selecting the white chip on the second draw,
given that the first chip selected was a black chip.
12. Four married couples have bought 8 seats in a row for a show. In how
many different ways can they be seated
a. If each couple is to sit together?
b. If all the women sit together?
c. If all the women sit together to the right of all the men?
13. Two dice are rolled. Let X be a random variable denoting the sum of the
numbers on the two dice.
a. Give the probability distribution of X
b. Compute the expected value of X and its variance
14. An allergist claims that 45% of the patients she tests are allergic to some
type of weed. What is the probability that
a. Exactly 3 of her next 4 patients are allergic to weeds?
b. None of her next 4 patients are allergic to weeds?
15. On average, five smokers pass a certain street corner
every ten minutes. What is the probability that during a given 10 minutes
the number of smokers passing will be
a. 6 or fewer
b. 7 or more
c. Exactly 8
16. Of a large group of men, 5% are less than 60 inches in height and
40% are between 60 and 65 inches. Assuming a normal distribution, find the
mean and standard deviation of heights.
17. Show that 65.24% of the observations in a normally distributed
population lie between μ - 1.1σ and μ + 0.8σ.
18. It is known in a pharmacological experiment that rats fed with a
particular diet over a certain period gain an average of 40 g in weight. A
new diet was tried on a sample of 20 rats, yielding a mean weight gain of 43 g
with a variance of 7 g².
a. Construct a 95% confidence interval estimate of the population
mean.
b. Test the hypothesis that the new diet is an improvement. (Use α = 0.05
and assume the normality of the population.)
19. A car rental agency is interested in studying the relationship
between the distance driven in kilometers (Y) and the maintenance cost for
their cars (X in birr). The following summarized information is given based
on a sample of size 5.
The summary data are: $\sum X_i = 23{,}000$, $\sum Y_i = 36$, $\sum X_i^2 = 147{,}000{,}000$,
$\sum Y_i^2 = 314$ and $\sum X_iY_i = 212{,}000$.
a. Fit the regression equation of Y on X and interpret the estimated
coefficients (the slope and intercept).
b. Estimate the maintenance cost of a car which has been driven for 6
km.
c. Calculate the correlation coefficient and interpret it.
d. Find the coefficient of determination and interpret it.