
Statistics and probability ZCTB (by: Abdulwahid .R)
CHAPTER ONE: INTRODUCTION
1.1 Definition and classification of statistics
1.1.1 Definition:
• Plural sense (layman's definition): Statistics is a collection of numerical facts and data.
• Singular sense (formal definition): Statistics is a mathematical science dealing with the
methods of collecting, organizing, presenting, analyzing and interpreting data.
• Statistics is a subject that deals with numbers and figures describing certain situations. It
primarily deals with numerical data obtained from surveys and summarizes these data in such a
way that the summary gives a good indication of the nature of the data.

1.1.2 Classification:
Statistics is broadly divided into two categories based on how the collected data are used.
1. Descriptive Statistics
• It deals with describing data without attempting to infer anything beyond the given set of
data.
• It consists of the collection, organization, summarization and presentation of data.
• It is concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics
• It deals with making inferences and/or drawing conclusions about a population based on data
obtained from a limited sample of observations.
• It consists of performing hypothesis tests, determining relationships among variables and
making predictions.
• It is important because statistical data usually arise from samples.
• Statistical techniques based on probability theory are required.
For example,
a) The average income of all families (the population) in Ethiopia can be estimated from figures
obtained from a few hundred (the sample) families.
b) The average age of a student in Zion College is 20.1 years.
c) There is a relationship between smoking tobacco and an increased risk of developing lung cancer.

1.2 Stages in statistical investigation


There are five stages or steps in any statistical investigation.
1. Collection of data: the process of measuring, gathering and assembling the raw data upon which
the statistical investigation is to be based.
• Valid conclusions can only result from properly collected data.
• Data gathering is the basis (foundation) of any statistical work.
• Data can be collected in a variety of ways; one of the most common is a survey. Surveys can in
turn be conducted in different ways; three of the most common are:
  ◦ Mailed questionnaire
  ◦ Schedules filled in through an enumerator
  ◦ Personal interview and observation (direct and indirect)
Secondary data may be available from published or unpublished sources.

Exercise: Discuss the advantages and disadvantages of the above three methods relative to each
other.
Zion CTB 2015/16 Page 1 of 32

2. Organization of data: summarizing the data in some meaningful way, e.g., in table form.
3. Presentation of data: the process of re-organizing, classifying, compiling and summarizing
data to present them in a meaningful form.
4. Analysis of data: the process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.
5. Inference from data: interpreting the various statistical measures obtained in the analysis
and applying the methods by which conclusions are formed and inferences are made.
• Statistical techniques based on probability theory are required.
Exercise: Discuss the purpose of each of the above stages.

1.3 Definition of some basic terms


a) Statistical population: the collection of all possible observations of a specified
characteristic of interest (possessing a certain common property) that is under study.
An example is all students at ZC taking the Stat 102 course this term.
b) Sample: a subset of the population, selected using some sampling technique in such
a way that it represents the population.
c) Sampling: the process or method of selecting a sample from the population.
d) Sample size: the number of elements or observations included in the sample.
e) Sampling frame: a list of people, items or units from which the sample is taken.
f) Parameter: a descriptive measure of a population, i.e., a summary value calculated from
a population. Examples: average, range, proportion, variance.
g) Statistic: a descriptive measure of a sample, i.e., a summary value calculated from a
sample. Example: a sample average of 20.
h) Census: complete enumeration or observation of the elements of the population, i.e.,
the collection of data from every element in a population.
i) Variable: an item of interest that can take on many different values.
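To make the parameter/statistic distinction concrete, here is a minimal Python sketch. The population of ten ages is hypothetical, made up only for illustration: its mean plays the role of a parameter, and the mean of a simple random sample drawn from it plays the role of a statistic.

```python
import random
import statistics

# Hypothetical population: ages (in years) of 10 students; illustrative values only.
population = [18, 19, 20, 20, 21, 22, 19, 23, 20, 18]

# Parameter: a summary value computed from the whole population (a census).
population_mean = statistics.mean(population)

# Statistic: the same summary computed from a sample drawn from the population.
random.seed(1)                          # fixed seed so the draw is reproducible
sample = random.sample(population, k=4)  # sample of 4 drawn without replacement
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

Because the sample is random, the sample mean will generally differ from the population mean of 20; reasoning about that gap is the task of inferential statistics.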

1.4 Application and limitation of statistics


Statistics can be applied in any field of study that seeks quantitative evidence. For instance:
• Statistics condenses and summarizes complex data. The original set of data is normally
voluminous and disorganized unless it is summarized and expressed in a few numerical values.
• Statistics facilitates comparison of data. Measures obtained from different sets of data can be
compared to draw conclusions about those sets. Statistical values such as averages, percentages
and ratios are tools for comparing sets of data.
• Statistics helps in predicting future trends. It is extremely useful for analyzing past and
present data and predicting future trends.
• Statistics influences government policies. Statistical studies on taxation, the unemployment
rate, the performance of military equipment, etc., may convince a government to review its
policies and plans with a view to meeting national needs and aspirations.
• Statistical methods are very helpful in formulating and testing hypotheses and in developing
new theories.

However, statistics has the following limitations.
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and
standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: the
population size of a country for a single given year is not, by itself, useful for comparative studies.
c) Statistical results are true only on average.
d) It is liable to misuse. Example: in a particular year, women drivers in a city were involved in
10 car accidents while men drivers were involved in 40. Concluding from this that women are
safer drivers is a misuse, because the numbers of women and men drivers are not taken into account.
Uses of statistics:
The main function of statistics is to enlarge our knowledge of complex phenomena. The following
are some uses of statistics:
1. Presenting facts in a definite and precise form.
2. Data reduction.
3. Measuring the magnitude of variation in data.
4. Furnishing a technique for comparison.
5. Estimating unknown population characteristics.
6. Formulating and testing hypotheses.
7. Studying the relationship between two or more variables.
8. Forecasting future events.

1.5 TYPES OF VARIABLES AND MEASUREMENT SCALES


1.5.1 TYPES OF VARIABLES
A variable is a characteristic of an object that can have different possible values.
There are two types of variables.
a) Quantitative variables: variables that can be quantified, i.e., that take numerical values.
Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly.
Examples: color, beauty, sex, location.
Qualitative variables are also called categorical variables (attributes).
• Hence there are two types of data: quantitative and qualitative data.
Quantitative variables can be further classified as
• Discrete variables, and
• Continuous variables.
I. Discrete variables are variables whose values are counts.
Examples: number of students, number of household members (family size), number of pages of a
book, number of defective products in company ABC.
II. Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.

1.5.2 MEASUREMENT SCALES


There are four types of measurement scales for variables.
1. Nominal scale: "Nominal" comes from the Latin word for "name". This is a scale for grouping
individuals into different categories.
• It is the level of measurement that classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
• On this scale, categories are merely different from one another.
• Arithmetic operations (+, -, *, /) cannot be applied, and neither can order relations.
• Ordering comparisons (greater than, less than) are impossible.
Examples: blood type (A, B, AB, O), political party preference (Republican, Democrat or
Other), sex (male or female), marital status (married, single, widowed, divorced), country code,
regions of Ethiopia, colors (red, brown, black), short/tall, pass/fail, etc.
2. Ordinal scale: "Ordinal" comes from the Latin word for "order".
• It is a scale for grouping individuals into different categories and ordering them.
• Data consisting of an ordering or ranking of measurements are said to be on an ordinal
scale of measurement.
• It is the level of measurement that classifies data into categories that can be ranked.
The differences between ranks are not meaningful.
• One category is different from, and greater (better) or less than, another.
• Arithmetic operations (+, -, *, /) are not applicable.
• Comparison is possible (relational operations are applicable).
Examples:
• Letter grades (A, B, C, D, F).
• Rating scales (Excellent, Very good, Good, Fair, Poor).
• Military status.
• Man A weighs more than man B.
• Faster, taller, shorter, military ranks, ranks in a race, etc.
Ordinal scale data convey more information than nominal scale data because relative magnitudes
are known; however, quantitative comparisons are impossible.
3. Interval scale: a measurement scale in which
• there is no physical significance to the zero point, and
• there is a constant interval size between any adjacent units on the measurement scale.
• Interval scales are measurement systems that possess the properties of order and
distance, but not the property of a fixed zero.
• It is the level of measurement that classifies data that can be ranked and whose differences
are meaningful. However, there is no meaningful zero, so ratios are meaningless.
• Addition and subtraction (differences) are meaningful, but division (ratios) is not.
• Relational operations are also possible.
• Interval scale data convey more information than nominal and ordinal scale data.
• On this scale, zero is an arbitrary point rather than an indication of "nothing".
Examples:
• IQ
• Temperature in °F.

4. Ratio Scales:
• Ratio scales are measurement systems that possess all three properties: order,
distance and a fixed zero.
• The added power of a fixed zero allows ratios of numbers to be meaningfully
interpreted; e.g., the ratio of Bekele's height to Martha's height is 1.32, whereas
this is not possible with interval scales.
• It is the level of measurement that classifies data that can be ranked, whose differences are
meaningful, and which have a true zero. True ratios exist between the different units
of measure.
• +, -, *, / are possible on this scale, and relational operations are applicable.
• This measurement scale provides more information than the interval scale.
• A zero measurement indicates absence of the quantity being measured.
Examples:
Weight
Height
Number of students
Age
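The interval/ratio distinction can be checked numerically. The Python sketch below uses illustrative values and assumes the standard conversions °F = °C × 9/5 + 32 and 1 kg ≈ 2.20462 lb. It shows that a ratio of two temperatures changes when the unit changes (because the zero point is arbitrary), while a ratio of two weights does not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit (both are interval scales)."""
    return c * 9 / 5 + 32

# Interval scale: the ratio of two temperatures depends on the arbitrary zero point.
t1_c, t2_c = 20.0, 10.0
ratio_c = t1_c / t2_c  # 2.0 in degrees Celsius
ratio_f = celsius_to_fahrenheit(t1_c) / celsius_to_fahrenheit(t2_c)  # 68 / 50

# Differences keep their meaning: a 10-degree Celsius gap is always an 18-degree Fahrenheit gap.
diff_f = celsius_to_fahrenheit(t1_c) - celsius_to_fahrenheit(t2_c)

# Ratio scale: weight has a true zero, so ratios survive a change of unit.
KG_TO_LB = 2.20462  # assumed conversion factor
w1_kg, w2_kg = 80.0, 40.0
ratio_kg = w1_kg / w2_kg
ratio_lb = (w1_kg * KG_TO_LB) / (w2_kg * KG_TO_LB)

print(ratio_c, ratio_f, diff_f, ratio_kg, ratio_lb)
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, whereas "80 kg is twice 40 kg" holds in any unit.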
CHAPTER TWO: SAMPLING SURVEY
2.1 Introduction
When secondary data are not available for the problem under study, a decision may be taken to collect
primary data by using any of the methods discussed in the previous chapter. The required information
may be obtained by following either the census method or the sample method.
What is the difference between the census and sample methods?
2.2 Census and Sample Method
What are the merits and demerits of the census method?
The merits of the census method are:
• Data are obtained from each and every unit of the population.
• The results obtained are likely to be more representative, accurate and reliable.
• It is an appropriate method of obtaining information on rare events.
• Data from a complete enumeration (census) can be widely used as a basis for various surveys.
Demerits
However, despite these advantages the census method is not very popularly used in practice.
The effort, money and time required for carrying out complete enumeration will generally be very
large and in many cases cost may be so prohibitive that the very idea of collecting information may
have to be dropped. Also if the population is infinite or the evaluation process destroys the population
unit, the method cannot be adopted.
What is ‘universe’ in Statistics?
The word ‘universe’ as used in Statistics denotes the aggregate from which the sample is to be taken.
The universe may be either finite or infinite.
A finite universe is one in which the number of items is determinable, such as the number of students
in ZCTB or in Ethiopia.
An infinite universe is one in which the number of items cannot be determined, such as the number
of stars in the sky.
Q. What is sampling?
The process of sampling involves three elements:
a. Selecting the sample.
b. Collecting the information; and
c. Making an inference about the population.
Q. Can you give practical examples of sampling?
2.3. Essentials of Sampling
i. Representativeness: A sample should be selected so that it truly represents the universe.
ii. Adequacy: The size of sample should be adequate; otherwise it may not represent the
characteristics of the universe.
iii. Independence: All items of the sample should be selected independently of one another and all
items of the universe should have the same chance of being selected in the sample.
iv. Homogeneity: When we talk of homogeneity we mean that there is no basic difference in the
nature of units of the universe and that of the sample.
What are the methods of sampling?
2.4. Methods of Sampling
The various methods of sampling can be grouped under two broad heads:
1. Probability sampling (also known as random sampling) and
2. Non-probability (or non-random) sampling.
Probability Sampling
Probability sampling methods are those in which every item in the universe has a known chance, or
probability, of being chosen for the sample.
Advantages of Probability Sampling
The following are the basic advantages of probability sampling methods:
• It does not depend on the existence of detailed information about the universe for its
effectiveness.
• It provides estimates that are essentially unbiased and have measurable precision.
• The relative efficiency of various sample designs can be evaluated only when it is used.
Limitations of Probability Sampling
The limitations are:
• It requires a very high level of skill and experience, takes a lot of time to plan and execute,
and is costly.
N.B. Non-random sampling is a process of sample selection without the use of randomization.
The most important difference between random and non-random sampling is that the pattern of
sampling variability can be ascertained in random sampling, whereas in non-random sampling
there is no way of knowing the pattern of variability in the process.
Non-probability Sampling Methods
Non-probability sampling methods are those, which do not provide every item in the universe with a
known chance of being included in the sample. The selection process is, at least, partially subjective.
i. Judgment sampling;
ii. Convenience sampling; and
iii. Quota sampling.
Probability Sampling Methods
a. Simple or unrestricted random sampling; and
b. Restricted random sampling:
i. Stratified sampling.
ii. Systematic sampling; and
iii. Cluster sampling.
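As a rough illustration of two of the probability sampling methods listed above, the following Python sketch draws a simple random sample and a systematic sample from a hypothetical sampling frame of 100 ID numbers. The function names are our own, not a standard API. The systematic version picks a random start within the first interval and then takes every k-th unit, where k = N // n; every unit has the same chance 1/k only when N is a multiple of n, which we assume here.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling without replacement: every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: random start in the first interval, then every k-th unit."""
    k = len(frame) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start index in 0..k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: student ID numbers 1 to 100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=42)
sys_sample = systematic_sample(frame, 10, seed=42)
print("Simple random sample:", sorted(srs))
print("Systematic sample:   ", sys_sample)
```

Stratified and cluster sampling would build on the same primitives, drawing within strata or selecting whole clusters, but are not shown here.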

CHAPTER THREE: METHODS OF DATA COLLECTION AND PRESENTATION
3.1. Methods of data collection
Not every aggregate of numbers can be called statistical data. We call an aggregate of numbers
statistical data when the numbers are
• comparable,
• meaningful, and
• collected for a well-defined objective.
Raw data: collected data that have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
• It lets us find the range of the data set easily, and it also gives us some idea of the
general characteristics of the distribution.
Any scientific investigation requires data related to the study. The required data can be obtained from
either a primary source or a secondary source.
Primary source: a source of data that supplies first-hand information for the immediate
purpose.
1. Primary data: data originally collected for the immediate purpose.
• Data measured or collected by the investigator or the user directly from the source.
• Primary data are more expensive than secondary data.
• The process of data collection from a primary source may involve:
a) Field trials
b) Laboratory experiments
c) Surveys (census survey or sample survey).
2. Secondary data: data collected from a secondary source.
Secondary sources: individuals or agencies that supply data originally collected for other
purposes by them or by others.
• Usually they are published or unpublished materials, records, reports, etc.
• When the source is secondary data, check that:
I. The type and objective of the situation.
II. The purpose for which the data were collected is compatible with the present
problem.
III. The nature and classification of the data are appropriate to our problem.
IV. There are no biases or misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
3.2 METHODS OF DATA PRESENTATION
Having collected and edited the data, the next important step is to organize them, that is, to
present them in a readily comprehensible, condensed form that helps us draw inferences from them.
It is also necessary that like items be separated from unlike ones.

The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and graphic presentation.

• The process of arranging data into classes or categories according to similarities is technically
called classification.

• Classification is a preliminary step that prepares the ground for proper presentation of data.
Classification eliminates inconsistency and brings out the points of similarity and/or
dissimilarity among the collected items. It is necessary because it would not be possible to draw
inferences and conclusions from a large set of collected [raw] data.

3.2.1 Frequency Distributions


Definitions:
• Raw data: recorded information in its original collected form, whether counts or
measurements, is referred to as raw data.
• Frequency: the number of times a certain value or set of values occurs in a specific group.
• Frequency distribution: the organization of raw data in table form using classes and
frequencies.
Example: A frequency distribution presenting the number of males and females in a class
Sex Frequency
Male 57
Female 39
There are three basic types of frequency distributions:
• Categorical frequency distribution
• Ungrouped frequency distribution
• Grouped frequency distribution
There are specific procedures for constructing each type.
1) Categorical frequency distribution:
Used for data that can be placed in specific categories, such as nominal or ordinal data, e.g.
marital status.
Example: a social worker collected the following data on marital status for 25
persons.(M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status: M, S, D and W. These types will be used as the classes of the distribution. We follow the
procedure below to construct the frequency distribution.
Step 1: Make a table as shown.

Class (1) Tally (2) Frequency (3) Percent (4)


M
S
D
W
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using $\% = \frac{f}{N} \times 100$,
where f = frequency of the class and N = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added since they
are used in certain types of diagrams such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.

Class(1) Tally (2) Frequency (3) Percent(4)
M //// / 6 24
S //// // 7 28
D //// // 7 28
W //// 5 20
Total 25 100
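The tallying and percentage steps can be reproduced with Python's standard `collections.Counter`. This sketch counts the 25 marital-status codes from the example and prints the frequency and percent of each class together with the totals of Step 5.

```python
from collections import Counter

# Marital status of the 25 persons in the example (M, S, D, W as defined above).
data = """M S D W D
S S M M M
W D S M M
W D D S S
S W W D D""".split()

freq = Counter(data)  # Steps 2-3: tally and count each class
N = len(data)         # total number of values

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ["M", "S", "D", "W"]:
    f = freq[cls]
    percent = f / N * 100  # Step 4: % = f / N * 100
    print(f"{cls:<6}{f:>10}{percent:>9.0f}")
print(f"{'Total':<6}{N:>10}{100:>9}")  # Step 5: column totals
```

Counting by machine is a useful cross-check on hand tallies, where swapping two classes is an easy slip.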
2) Ungrouped frequency distribution
An ungrouped frequency distribution is a table of all potential raw score values that could
occur in the data along with their corresponding frequencies. It is often constructed for a
small set of data or a discrete variable.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in
the collected data. Then make a columnar table of all potential raw score values, arranged in order
of magnitude, with the number of times each value is repeated, i.e., the frequency of that value.
Tallies can be used to facilitate counting.
Example: The following data are the ages in years of 20 ZC instructors who attended a seminar on
auditing: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30 and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
$R = \text{Maximum observation} - \text{Minimum observation} = H - L = 42 - 29 = 13$
STEP 2. Construct a table and tally the data.
STEP 3. Complete the frequency column. The frequency distribution becomes as follows.

Age Tally Frequency


29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
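The same ungrouped distribution can be produced in a few lines of Python. This sketch computes the range (Step 1) and counts each age with `collections.Counter` (Steps 2 and 3), listing only the values that actually occur, as the table above does.

```python
from collections import Counter

# Ages (years) of the 20 ZC instructors from the example.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

data_range = max(ages) - min(ages)  # Step 1: R = H - L
freq = Counter(ages)                # Steps 2-3: tally and count

print("Range:", data_range)
print("Age  Frequency")
for age in sorted(freq):            # values in order of magnitude
    print(f"{age:>3}  {freq[age]:>9}")
```

To list every potential value between the smallest and largest score, including those with zero frequency, loop over `range(min(ages), max(ages) + 1)` instead; `freq[age]` returns 0 for values that never occur.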
3) Grouped frequency distribution
When the range of the data is large, the data must be grouped into classes. A grouped frequency
distribution is a frequency distribution in which several data values are grouped into each class.

Some Important Definitions
• Class: one of the different, non-overlapping groups of data.
• Class limits: separate one class in a grouped frequency distribution from another. The limits
could actually appear in the collected data, and there is a gap between the upper limit of one
class and the lower limit of the next class.
• Unit of measurement (U): the distance between two consecutive possible measurements. It is
usually taken as 1, 0.1, 0.01, 0.001, ...
• Class boundaries: separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in the
collected data. There is no gap between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary (LCB) is found by subtracting 0.5 units
of measurement from the lower class limit (LCL), and the upper class boundary (UCB) is
found by adding 0.5 units of measurement to the upper class limit (UCL). That is,
$LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$
• Class width (W): the difference between the upper and lower boundaries of any class, or
between the lower limits of two consecutive classes, or between the upper limits of two
consecutive classes.
N.B. The class width is not equal to the difference between the UCL and LCL of the same class.
• Class mark (M): the midpoint of a class interval, i.e.
$M_i = \frac{UCB_i + LCB_i}{2}$
• Cumulative frequency (Cf), less than type: the total frequency of all values (observations)
less than or equal to the upper class boundary of the given class.
• Cumulative frequency (Cf), more than type: the total frequency of all values (observations)
greater than or equal to the lower class boundary of the given class.
• A tabular arrangement of class intervals together with their corresponding cumulative
frequencies (either less than or more than type, as defined above) is called a cumulative
frequency distribution.
• Relative frequency: the frequency of a class divided by the total frequency (i.e., the sum of
all frequencies); multiplied by 100, it gives the percent of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$

Note:
• The relative frequency shows what fractional part or proportion of the total frequency belongs
to the corresponding class.
• The sum of all the relative frequencies in a frequency distribution is always 1.
• Relative cumulative frequency (less than type / more than type): the total of the relative
frequencies up to and including a class (less than type) or from that class onward (more than
type); equivalently, the cumulative frequency divided by the total frequency. This gives the
proportion (or, multiplied by 100, the percent) of values that are less than the upper class
boundary / more than the lower class boundary.
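The relative and cumulative frequency definitions translate directly into running sums. This Python sketch takes the class frequencies 5, 5, 5, 9, 7, 1, 2, 6 of the working-hours example in this chapter and uses the standard `itertools.accumulate` for the cumulative columns.

```python
from itertools import accumulate

# Class frequencies from the 40-employee working-hours example.
freqs = [5, 5, 5, 9, 7, 1, 2, 6]
total = sum(freqs)

relative = [f / total for f in freqs]              # relative frequency of each class
cf_less = list(accumulate(freqs))                  # "less than" cumulative frequency
cf_more = list(accumulate(reversed(freqs)))[::-1]  # "more than": accumulate from the top class down
rel_cf_less = [c / total for c in cf_less]         # relative cumulative frequency (less than)

print(" f  rel.f  Cf<  Cf>  rel.Cf<")
for f, r, cl, cm, rc in zip(freqs, relative, cf_less, cf_more, rel_cf_less):
    print(f"{f:>2}  {r:.3f}  {cl:>3}  {cm:>3}  {rc:.3f}")
```

The last "less than" value and the first "more than" value both equal the total frequency, and the relative frequencies sum to 1, which makes a quick sanity check on any hand-built table.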

Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two
different classes
3. The classes must be all inclusive or exhaustive. This means that all data values must be
included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception here is the first or last class: it is possible
to have a "below ..." or "... and above" class. This is often used with ages.
Guidelines to construct a grouped frequency distribution
STEP 1. Determine the unit of measurement, U
STEP 2. Find the maximum (Max) and the minimum (Min) observation, and then compute their range,
$R = \text{Max} - \text{Min}$.
STEP 3. Fix the number of classes desired (k). There are two ways to fix k:
• fix k arbitrarily between 5 and 20, or
• use Sturges' formula: $k = 1 + 3.322\log_{10} N$, where N is the total frequency, and round
this value of k up to get an integer.
STEP 4. Find the class width (W) by dividing the range by the number of classes and rounding the
result up to get an integer value: $W = \frac{R}{k}$.
STEP 5. Pick a suitable starting point less than or equal to the minimum value. This starting point is the
lower limit of the first class. Continue to add the class width to this lower limit to get the rest
of the lower limits.
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to
this upper limit so as to get the rest of the upper limits.
STEP 7. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$.
STEP 8. Tally the data.
STEP 9. Find the frequencies.
STEP 10. (If necessary) Find the cumulative frequencies (more than and less than types).
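Steps 3 and 4 can be written as two small helper functions. This is a sketch assuming integer data (U = 1), so that rounding k and W up to whole numbers is appropriate; the function names are hypothetical.

```python
import math

def sturges_classes(n: int) -> int:
    """Step 3: number of classes k = 1 + 3.322 * log10(n), rounded up to an integer."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(data_range: float, k: int) -> int:
    """Step 4: class width W = R / k, rounded up to an integer."""
    return math.ceil(data_range / k)

# With N = 40 observations and range R = 39 (as in the example below):
k = sturges_classes(40)  # 1 + 3.322 * log10(40) is about 6.32, so k = 7
w = class_width(39, k)   # 39 / 7 is about 5.57, so W = 6
print("k =", k, "W =", w)
print("W with 8 classes =", class_width(39, 8))  # 39 / 8 = 4.875, so W = 5
```

Sturges' formula is only a guideline; the example that follows fixes k = 8 directly, which the first option in Step 3 allows.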
Example: The numbers of hours 40 employees spent on their job during the last 7 working days are
given below.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
Construct a suitable frequency distribution for these data using 8 classes.
STEP 1. Unit of measurement: U = 1 hour
STEP 2. Max = 65, Min = 26 so that R = 65-26 = 39
STEP 3. It is already determined to construct a frequency distribution having 8 classes.

STEP 4. Class width W  39  4.875  5
8
STEP 5. Starting point = 26 = lower limit of the first class. And hence the lower class limits become
26 31 36 41 46 51 56 61
STEP 6. Upper limit of the first class = 31-1 = 30. And hence the upper class limits become
30 35 40 45 50 55 60 65
The lower and the upper class limits (Steps 5 and 6) can be written as follows.
Class limits Class limits
26 – 30 46 – 50
31 – 35 51 – 55
36 – 40 56 – 60
41 – 45 61 – 65
STEP 7. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units of
measurement to the upper class limits, we can get lower and upper class boundaries as follows.
Class boundaries    Class boundaries
25.5 – 30.5 45.5– 50.5
30.5 – 35.5 50.5– 55.5
35.5– 40.5 55.5– 60.5
40.5– 45.5 60.5– 65.5
STEPS 8, 9 and 10 are displayed in the following table (columns 3, 4 and 5&6 respectively).
Class limits   Class boundaries   Tally       Frequency   Cf (less than type)   Cf (more than type)
26 – 30        25.5 – 30.5        ////        5           5                     40
31 – 35        30.5 – 35.5        ////        5           10                    35
36 – 40        35.5 – 40.5        ////        5           15                    30
41 – 45        40.5 – 45.5        //// ////   9           24                    25
46 – 50        45.5 – 50.5        //// //     7           31                    16
51 – 55        50.5 – 55.5        /           1           32                    9
56 – 60        55.5 – 60.5        //          2           34                    8
61 – 65        60.5 – 65.5        //// /      6           40                    6
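The whole procedure for the working-hours example can be reproduced in a short Python script. This sketch hard-codes U = 1, k = 8 and W = 5 as in the example, builds the class limits, boundaries and class marks, counts the observations falling within each pair of class limits, and accumulates the two cumulative frequency columns.

```python
from itertools import accumulate

# Hours worked by the 40 employees (data from the example).
hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

U = 1  # unit of measurement (whole hours)
k = 8  # number of classes, fixed by the example
W = 5  # class width: 39 / 8 = 4.875, rounded up

start = min(hours)                               # Step 5: lower limit of the first class
lower = [start + i * W for i in range(k)]        # lower class limits
upper = [lcl + W - U for lcl in lower]           # Step 6: upper class limits
boundaries = [(lcl - U / 2, ucl + U / 2)         # Step 7: class boundaries
              for lcl, ucl in zip(lower, upper)]
marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]  # class marks (midpoints)

# Steps 8-9: count the observations within each pair of class limits.
freq = [sum(lcl <= x <= ucl for x in hours) for lcl, ucl in zip(lower, upper)]

# Step 10: cumulative frequencies (less than and more than types).
cf_less = list(accumulate(freq))
cf_more = list(accumulate(reversed(freq)))[::-1]

for (lcl, ucl), (lcb, ucb), f, cl, cm in zip(zip(lower, upper), boundaries,
                                             freq, cf_less, cf_more):
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  {f:>2}  {cl:>2}  {cm:>2}")
```

The computed limits, boundaries and frequencies match the table above, which is a convenient way to double-check hand tallies.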

3.2.2 Diagrammatic and Graphic Presentation of Data


The data that is presented by a frequency distribution can also be displayed diagrammatically or
graphically.
Diagrams and graphs:
• are techniques for presenting data in visual displays using geometric figures;
• are visual aids that give a bird's eye view of a given set of numerical data;
• are more attractive than mere figures (numbers);
• facilitate comparison of data;
• are easily understood by anyone, even without a statistical background.
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.
There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency polygon,
and cumulative frequency polygon (ogive).
I. Bar-diagrams / Bar-charts
• A bar-diagram is a series of equally spaced bars of equal width, the height of each bar
representing the magnitude or frequency of observations in each group.
• Bar-diagrams are usually used to represent a one-way or simple frequency distribution.
• Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams are
used for qualitatively classified data, whereas vertical bar-diagrams are used for quantitatively
classified data.
Example: Horizontal bar-diagram.

[Figure: horizontal bar-diagram of frequencies by blood type (vertical axis: Blood Type; horizontal axis: Frequency)]

There are a number of bar-diagrams, the most common being:
• Simple bar-diagrams
• Deviation (two-way) bar-diagrams
• Broken bar-diagrams
• Component (subdivided) bar-diagrams
• Multiple bar-diagrams
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data of single variable or one-way variable.
Example: The following frequency distribution shows sales (in million birr) of four
products for the 2004 production year.
Product Sales (in million birr)
A 14
B 21
C 9
D 17
The bar-diagram presentation for these data is given below.
[Figure: vertical bar-diagram of sales by product (horizontal axis: Product A to D; vertical axis: Sales in million birr)]

2. Deviation bar-diagrams
When the data take both positive and negative values (for instance data on profit, net export, percent
change, etc) deviation bar-diagrams are appropriate.
Example: Present the net profit (in thousand birr) from oil sales for five years using a suitable
bar-diagram.
Year Profit (in thousand birr)
1997 12
1998 -5
1999 14
2000 9
2001 -6
The deviation bar-diagram for the data looks like the following.
[Figure: deviation bar-diagram of profit by year, 1997 to 2001, with bars above and below zero (horizontal axis: Year; vertical axis: Profit in thousands)]

3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values, where it would be difficult to fit the bars corresponding to these values on the graph paper. In this case we draw the long bars in pieces, with a jump on the numerical scale at each break.
Example: Daily production of four products of a factory.
Product Quantity produced (kg/day)
A 14
B 35
C 23
D 109

4. Component bar-diagrams
When we want to show how a total (an aggregate) is divided into its component parts, we use a component bar-diagram. In this type of bar-diagram, each bar represents the aggregate value of a variable, broken into its component parts, and different colors or designs are used to identify the parts.
Example: Represent the following data using bar-charts.
Data: Yields of production of farmers in Southern Ethiopia.
Crop \ Year 1990 EC 1991 EC 1992 EC 1993 EC
Barley 14 15 26 19
Wheat 10 15 14 25
Maize 2 6 10 3
Total 26 36 50 47
The component bar-diagram for this table is as follows.

[Figure: component bar-diagram; one bar per year (1990 to 1993), each divided into Barley, Wheat and Maize; horizontal axis: Year; vertical axis: Production]

5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.
Example: The data given in the above example can be presented using a multiple bar-diagram as follows.
[Figure: multiple bar-diagram; for each year (1990 to 1993), adjacent bars for Barley, Wheat and Maize; horizontal axis: Year; vertical axis: Production]

II. Pie-charts
A pie-chart is a circle that is divided into sectors according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$$\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses. Below
is the breakdown of the various expenditure items. Draw an appropriate chart to portray the data.
Expenditure item Amount (in birr)
Fuel 603
Interest on car loan 279
Repairs 930
Insurance and license 646
Depreciation 492
Total 2,950
How to draw a pie-chart
• First find the percentage for each class.
• Next calculate the degree measure for each class.

• Finally, using a protractor, draw each sector (degree measure) in a circle and give a key for explanation.
Expenditure item Amount (birr) Percentage (approx.) Degrees (approx.)
Fuel 603 20 74
Interest on car loan 279 9 34
Repairs 930 32 113
Insurance and license 646 22 79
Depreciation 492 17 60
Total 2,950 100 360
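The two calculation steps above (percentage, then degree measure) can be sketched in Python as a quick check of the table. The function name `pie_sectors` is our own, chosen only for illustration; rounding its output to whole numbers reproduces the approximate values in the table.

```python
# Pie-chart sectors: percentage = 100 * f / total, angle = 360 * f / total
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}

def pie_sectors(freqs):
    """Return {category: (percentage, degrees)} for a pie-chart."""
    total = sum(freqs.values())
    return {name: (100 * f / total, 360 * f / total) for name, f in freqs.items()}

for name, (pct, deg) in pie_sectors(expenses).items():
    print(f"{name:<22} {pct:5.1f}% {deg:6.1f} degrees")
```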

Now we can draw the pie-chart for the data.

[Figure: pie-chart with a key; sectors: Repairs 32%, Insurance and license 22%, Fuel 20%, Depreciation 17%, Interest on car loan 9%]

III. Pictograms
In pictograms, we represent the data by means of picture symbols. We choose a suitable picture to represent a fixed number of units of the variable being measured.
Example: Draw a pictorial diagram to present the following data (the number of students in a certain school over four years).
Year 2000 2001 2002 2003
No. of students 2000 3000 5000 7000
Let a single picture represent one thousand students.
[Figure: pictogram; 2000: 2 pictures, 2001: 3 pictures, 2002: 5 pictures, 2003: 7 pictures; Key: one picture = 1000 students]

IV. Histogram
A histogram is another way of data presentation which is more suitable for frequency distributions
with continuous classes. In drawing a histogram, we put the class boundaries of each class on the
horizontal axis and its respective frequency on the vertical axis.
Example: Draw a histogram presenting the following data.

V. Frequency Polygon
A frequency polygon is a line graph drawn by plotting the frequencies of the classes on the vertical axis against their respective class marks on the horizontal axis, and then joining the plotted points with straight line segments.
Example: Present the data in the previous example using a frequency polygon.
[Figure: frequency polygon; horizontal axis: class midpoints 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5; vertical axis: frequency]

VI. Cumulative Frequency Polygon (Ogive)


A cumulative frequency polygon can be drawn on either a less-than or a more-than cumulative frequency basis. Place the class boundaries along the horizontal axis (upper class boundaries for the less-than type, lower class boundaries for the more-than type) and the corresponding cumulative frequencies along the vertical axis. Then join the plotted points with straight lines or a smooth free-hand curve.
Example: The data in the previous example can be presented using either a less-than or a more-than cumulative frequency polygon, as given below in (i) and (ii) respectively.
[Figure (i): less-than type cumulative frequency polygon; horizontal axis: upper class boundaries 11.5, 17.5, 23.5, 29.5, 35.5, 41.5; vertical axis: less-than cumulative frequencies (0 to 30)]
[Figure (ii): more-than type cumulative frequency polygon; horizontal axis: lower class boundaries 5.5, 11.5, 17.5, 23.5, 29.5, 35.5; vertical axis: more-than cumulative frequencies (0 to 30)]
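The points plotted on the two ogives can be computed as sketched below. The frequency table for this example is not reproduced in these notes, so the frequencies in the code are hypothetical; only the class boundaries are taken from the figure axes. Less-than cumulative frequencies pair with upper class boundaries, and more-than cumulative frequencies pair with lower class boundaries.

```python
from itertools import accumulate

# Class boundaries read from the figure axes; frequencies are HYPOTHETICAL
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
freqs = [3, 6, 8, 6, 4, 3]  # illustration only, not the notes' data

def less_than_cf(freqs):
    """Running totals from the lowest class upward."""
    return list(accumulate(freqs))

def more_than_cf(freqs):
    """Running totals from the highest class downward, listed lowest class first."""
    return list(accumulate(reversed(freqs)))[::-1]

less = list(zip(boundaries[1:], less_than_cf(freqs)))    # (upper boundary, cf)
more = list(zip(boundaries[:-1], more_than_cf(freqs)))   # (lower boundary, cf)
print(less)
print(more)
```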

CHAPTER 4
Measuring Central Tendency:
4.1. Introduction
An important aspect of studying the distribution of sample measurements is the position of its central value, that is, a representative value about which the measurements are distributed. It is often convenient to have one figure that represents the whole group; this figure is known as the average of the group. If the values of the group are arranged in order of magnitude, the average tends to fall around the central position in the group, so averages are called measures of central tendency. In short, any measure intended to represent the center of a data set is called a measure of location or central tendency.
Objectives
The most important objectives of measuring central tendency are:
• To determine a single value around which the other data concentrate
• To summarize (reduce) the volume of the data
• To facilitate comparison within one group or between groups of data
Desirable properties of a good measure of central tendency
A measure of central tendency is considered good if it possesses most of the following properties. It should:
• be simple to understand and easy to calculate/interpret,
• exist and be unique,
• be rigidly defined by a mathematical formula,
• be based on all observations,
• not be seriously affected by extreme observations,
• be capable of further statistical analysis and/or algebraic manipulation.

4.2. The Summation Notation (∑)


Let a data set consist of n observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the sum
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
For instance, a data set consisting of the six measurements 21, 13, 54, 46, 32 and 37 is represented by $x_1, x_2, x_3, x_4, x_5$ and $x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$. Their sum is
$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$$
Similarly,
$$x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$$
Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = nc$, where c is a constant number.
2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.
3. $\sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers.
4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

4.3. Types of Measures of Central Tendency
Several types of averages (measures of central tendency) can be defined; the most common are
- the arithmetic mean or the mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the property
under discussion.

4.3.1. The Arithmetic Mean (The Mean)


The arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items.
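Arithmetic Mean for Ungrouped Frequency Distribution
When the data are given in the form of an ungrouped frequency distribution, with distinct values $x_1, x_2, \ldots, x_k$ occurring with frequencies $f_1, f_2, \ldots, f_k$, the formula for the mean is
$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_kx_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Note that $\sum_{i=1}^{k} f_i = n$.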

Example: Obtain the mean of the following numbers:
2, 7, 8, 2, 7, 3, 7
Solution:
x_i f_i f_i x_i
2 2 4
3 1 3
7 3 21
8 1 8
Total 7 36
$$\bar{x} = \frac{\sum_{i=1}^{4} f_i x_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$$
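The frequency-table method above can be sketched in Python: `collections.Counter` builds the ungrouped frequency distribution (value to frequency), and the mean is the sum of $f_i x_i$ divided by the sum of $f_i$. The helper name `mean_from_frequencies` is our own.

```python
from collections import Counter

def mean_from_frequencies(values_freqs):
    """Mean = sum(f_i * x_i) / sum(f_i) for an ungrouped frequency distribution."""
    total_fx = sum(x * f for x, f in values_freqs.items())
    n = sum(values_freqs.values())
    return total_fx / n

data = [2, 7, 8, 2, 7, 3, 7]
table = Counter(data)   # value -> frequency
print(dict(table))
print(round(mean_from_frequencies(table), 2))
```

Because the frequency table only regroups the raw data, the result is the same as `sum(data) / len(data)`.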

Exercise 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record the
following:
17.5 19.5 17.5 19 20
21 18 19.5 18 10.75
Compute the sample mean length of the infants for these data.
Exercise 2: Monthly incomes of second year regular students are given in the following frequency
distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
Arithmetic Mean for Grouped Frequency Distribution
If the data are given in the form of a continuous (grouped) frequency distribution, the sample mean can be computed as
$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_kx_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
where $x_i$ = the class mark of the ith class, $i = 1, 2, \ldots, k$,
$f_i$ = the frequency of the ith class,
k = the number of classes, and
$\sum_{i=1}^{k} f_i = n$ = the total number of observations.

Example: Calculate the mean for the following age distribution.


Class Frequency
6- 10 35
11- 15 23
16- 20 15
21- 25 12
26- 30 9
31- 35 6
Solution:
• First find the class marks.
• Find the product of each frequency and its class mark.
• Find the mean using the formula.
Class f_i x_i f_i x_i
6-10 35 8 280
11-15 23 13 299
16-20 15 18 270
21-25 12 23 276
26-30 9 28 252
31-35 6 33 198
Total 100 1575
$$\bar{x} = \frac{\sum_{i=1}^{6} f_i x_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
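The three solution steps above can be sketched as one Python function. It assumes each class is given by its lower and upper class limits, so the class mark is their midpoint; the name `grouped_mean` is our own.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped distribution, using class marks (midpoints of the class limits)."""
    marks = [(lo + hi) / 2 for lo, hi in classes]           # step 1: class marks
    products = [f * m for f, m in zip(freqs, marks)]        # step 2: f_i * x_i
    return sum(products) / sum(freqs)                       # step 3: the formula

age_classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
age_freqs = [35, 23, 15, 12, 9, 6]
print(grouped_mean(age_classes, age_freqs))
```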

Exercises:
1. Marks of 75 students are summarized in the following frequency distribution:

Marks No. of students


40-44 7
45-49 10
50-54 22
55-59 f4
60-64 f5
65-69 6
70-74 3
If 20% of the students have marks between 55 and 59
i. Find the missing frequencies f4 and f5.
ii. Find the mean.
2. The following table gives the daily wages of laborers. Calculate the average daily wage paid to a laborer.
Wages in birr 11-13 13-15 15-17 17-19 19-21 21-23 23-25
Number of laborers 3 4 5 6 6 4 3
Properties of the Arithmetic Mean
• The sum of the deviations of the items from their arithmetic mean is zero, that is, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
• If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then
a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$;
b) the mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
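The properties above can be checked numerically. The sketch below reuses the data 2, 7, 8, 2, 7, 3, 7 from the earlier example and an arbitrary constant k = 4; `math.isclose` is used because floating-point sums of deviations are only approximately zero.

```python
import math
from statistics import mean

x = [2, 7, 8, 2, 7, 3, 7]   # data from the earlier example
k = 4                        # an arbitrary constant
xbar = mean(x)

# Deviations from the mean sum to zero (up to floating-point rounding)
dev_sum = sum(xi - xbar for xi in x)
print(math.isclose(dev_sum, 0.0, abs_tol=1e-9))            # True

# Adding or subtracting k shifts the mean by k
print(math.isclose(mean([xi + k for xi in x]), xbar + k))  # True
print(math.isclose(mean([xi - k for xi in x]), xbar - k))  # True

# Multiplying every value by k multiplies the mean by k
print(math.isclose(mean([k * xi for xi in x]), k * xbar))  # True
```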

Example 1: Last year there were three sections taking the Stat 102 course in ZCTB. At the end of the semester, the three sections had average marks of 80, 83 and 76, with 28, 32 and 35 students respectively. Find the mean mark for all the students combined.
Solution:
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} \approx 79.54$$
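The combined (pooled) mean above weights each section mean by its number of students. A minimal sketch, with a function name of our own:

```python
def combined_mean(sizes, means):
    """Pooled mean of several groups: sum(n_j * xbar_j) / sum(n_j)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

print(round(combined_mean([28, 32, 35], [80, 83, 76]), 2))
```

Note that simply averaging the three section means, (80 + 83 + 76)/3, would ignore the different section sizes.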
Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100. After
the mid-term exam, however, a student whose score was 41 out of 100 dropped the course. What is
the average/mean score among the 24 students?
4.3.2. Weighted Arithmetic Mean
In computing the arithmetic mean, all items were assumed to be of equal importance. When the items differ in importance, we compute a weighted average instead: each item is assigned a weight in proportion to its relative importance.
If $x_1, x_2, \ldots, x_k$ represent the values of the items and $w_1, w_2, \ldots, w_k$ are the corresponding weights, then the weighted mean $\bar{x}_w$ is given by
$$\bar{x}_w = \frac{w_1x_1 + w_2x_2 + \cdots + w_kx_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$
Example: A student's final marks in Statistics, PAI, English and Psychology are 82, 80, 90 and 70 respectively. If the respective credits for these courses are 3, 5, 3 and 1, determine the student's approximate average mark per course.
Solution: We use a weighted arithmetic mean, taking the weight of each course to be the number of credits for that course.
x_i 82 80 90 70
w_i 3 5 3 1
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$$
The student's average mark per course is approximately 82.
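The weighted mean above can be sketched in Python as follows; the function name `weighted_mean` is our own, and it raises an error if the values and weights do not pair up.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

marks = [82, 80, 90, 70]   # Statistics, PAI, English, Psychology
credits = [3, 5, 3, 1]     # weights = credits of each course
print(round(weighted_mean(marks, credits), 2))
```

With all weights equal, the function reduces to the ordinary arithmetic mean.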
Merits of Arithmetic Mean
• The arithmetic mean is rigidly defined by a mathematical formula, so its value is always definite.
• It is calculated based on all observations.
• The arithmetic mean is simple to calculate and easy to understand. It does not require arraying (arranging the data in increasing or decreasing order).
• The arithmetic mean is also capable of further algebraic treatment.
• It affords a good standard of comparison.
Demerits of Arithmetic Mean
• It is highly affected by extreme (abnormal) observations in the series. For instance, suppose the monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026 birr. The average income of these four people is (37 + 53 + 48 + 1026)/4 = 291 birr, which is not at all a representative figure.
• It can be a number that does not occur in the series.
• It sometimes gives results that appear almost meaningless. For example, we may get an average of '3.6 children' per family.
• It gives greater importance to bigger items of a series and less importance to smaller items, so it tends to be pulled upward by large values.
• It cannot be calculated for distributions with open-ended classes.
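The income example in the first demerit can be reproduced with Python's standard `statistics` module. For comparison, the sketch also prints the median (defined in Section 4.4), which is barely affected by the father's extreme income.

```python
from statistics import mean, median

boys = [37, 53, 48]
family = boys + [1026]   # add the father's income

print(mean(boys))        # 46
print(mean(family))      # 291, pulled up by a single extreme value
print(median(family))    # 50.5
```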
THE GEOMETRIC MEAN & THE HARMONIC MEAN (reading assignment)
4.4. The Median


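The median of a set of items (numbers) arranged in order of magnitude (i.e. in array form) is the middle value, or the arithmetic mean of the two middle values. We shall denote the median of $x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data, the median is obtained by
$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if the number of items, } n, \text{ is even} \end{cases}$$
where $x_{(j)}$ denotes the jth value in the array.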

For grouped data, the median obtained by the interpolation method is given by
$$\tilde{x} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - C\right)$$
where $L_{med}$ = lower class boundary of the median class,
C = sum of the frequencies of all classes below the median class (in other words, the cumulative frequency preceding the median class),
$f_{med}$ = frequency of the median class, and
W = class width.
The median class is the class with the smallest cumulative frequency greater than or equal to n/2. It can be located by counting n/2 of the frequencies, beginning from the lowest class.
Example: Find the median of the following numbers.
a) 6, 5, 2, 8, 9, 4.
b) 2, 1, 8, 3, 5.
Solutions:
a) First order the data: 2, 4, 5, 6, 8, 9.
Here n = 6, so
$$\tilde{x} = \frac{1}{2}\left(x_{(3)} + x_{(4)}\right) = \frac{1}{2}(5 + 6) = 5.5$$
b) Order the data: 1, 2, 3, 5, 8.
Here n = 5, so
$$\tilde{x} = x_{(3)} = 3$$

Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: In order, the weights are 6.4, 7.8, 8.1, 9.2 and 10.5. Since n = 5 is odd, the median is the third value, 8.1 pounds.
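The odd/even rule for ungrouped data can be sketched directly in Python. Python lists are indexed from 0, so the 1-based position $(n+1)/2$ in the formula becomes index `n // 2` in the code; the function name is our own.

```python
def median_ungrouped(data):
    """Middle value of the array (n odd) or mean of the two middle values (n even)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # x_((n+1)/2) in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2    # mean of x_(n/2) and x_(n/2 + 1)

print(median_ungrouped([6, 5, 2, 8, 9, 4]))          # 5.5
print(median_ungrouped([2, 1, 8, 3, 5]))             # 3
print(median_ungrouped([9.2, 6.4, 10.5, 8.1, 7.8]))  # 8.1
```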
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solution:
• First find the less-than cumulative frequencies.
• Identify the median class.
• Find the median using the formula.

Class Frequency Cumulative frequency (less-than type)
40-44 7 7
45-49 10 17
50-54 22 39
55-59 15 54
60-64 12 66
65-69 6 72
70-74 3 75

n/2 = 75/2 = 37.5. Since 39 is the first cumulative frequency greater than or equal to 37.5, the class 50-54 is the median class.
$L_{med} = 49.5$, $W = 5$, $n = 75$, $C = 17$, $f_{med} = 22$
$$\tilde{x} = L_{med} + \frac{W}{f_{med}}\left(\frac{n}{2} - C\right) = 49.5 + \frac{5}{22}(37.5 - 17) \approx 54.16$$
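The interpolation steps above can be sketched as a Python function. It assumes equal class widths and takes the lower class boundaries directly (half a unit below the lower class limits, as in the worked example); the function name `grouped_median` is our own.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, width, freqs):
    """Interpolated median: L_med + (W / f_med) * (n/2 - C)."""
    half = sum(freqs) / 2
    cum = list(accumulate(freqs))                      # less-than cumulative frequencies
    # Median class: first class whose cumulative frequency is >= n/2
    j = next(i for i, cf in enumerate(cum) if cf >= half)
    c = cum[j - 1] if j > 0 else 0                     # cumulative frequency before it
    return lower_boundaries[j] + width / freqs[j] * (half - c)

# Classes 40-44, 45-49, ..., 70-74 have lower boundaries 39.5, 44.5, ..., 69.5
lower = [39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5]
freqs = [7, 10, 22, 15, 12, 6, 3]
print(round(grouped_median(lower, 5, freqs), 2))
```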
Exercise 1: The following table gives the distribution of the weekly wages of employees of a small firm.
Wages in birr No. of employees
126 and below 3
127 – 135 5
136 – 144 9
145 – 153 12
154 – 162 5
163 – 171 4
172 and above 2
a) Find the median weekly wage.
b) Why is the median a more suitable measure of central tendency than the mean in this
case?
Merits of median
• The median is a positional average and hence it is not influenced by extreme values.
• The median is rigidly defined, so its value is always definite.
• The median can be calculated even for distributions with open-ended intervals.
• It gives the best results in studies of phenomena that cannot be measured directly in quantitative terms but can be ranked, for example intelligence.
Demerits of median
• It is not capable of further algebraic treatment.
• It is not a good representative of the data if the number of items is small.
• Arranging the items in order of magnitude can be a very tedious process if the number of items is very large.

4.5. The Mode


The mode or modal value is the most frequently occurring score/observation in a series and is denoted by $\hat{x}$. Note that the mode may not exist in the series or, even if it does exist, it may not be unique.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5.
2. Find the mode of 8, 9, 9, 7, 8, 2 and 5. The data are bimodal: the modes are 8 and 9.
3. Find the mode of 4, 12, 3, 6 and 7. These data have no mode.
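A small Python sketch of the mode for ungrouped data, following the convention of the three examples: return every most-frequent value, and report no mode when no value repeats. The function name `modes` is our own; Python's built-in `statistics.multimode` is similar but returns every value when none repeats, so it does not follow this convention.

```python
from collections import Counter

def modes(data):
    """All most-frequent values; [] if no value repeats (no mode)."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []    # every value occurs once, so the data have no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 3, 5, 8, 9]))          # [5]
print(modes([8, 9, 9, 7, 8, 2, 5]))    # [8, 9]  (bimodal)
print(modes([4, 12, 3, 6, 7]))         # []      (no mode)
```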
For grouped data, the mode is found by the following formula:
$$\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times W$$
where $L_{mo}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = the difference between the frequency of the modal class and that of the next higher class,
W = the class width.
The modal class is the class with the highest frequency in the distribution.
Example: The following is the distribution of the sizes of certain farms selected at random from a district. Calculate the mode of the distribution.
Size of farms No. of farms
5-15 8
15-25 12
25-35 17
35-45 29
45-55 31
55-65 5
65-75 3
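Solution:
45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$, $W = 10$, $f_{mo} = 31$, $f_1 = 29$, $f_2 = 5$, where $f_1$ and $f_2$ are the frequencies of the classes just below and just above the modal class, so
$\Delta_1 = f_{mo} - f_1 = 2$ and $\Delta_2 = f_{mo} - f_2 = 26$
$$\hat{x} = 45 + \frac{2}{2 + 26} \times 10 \approx 45.71$$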
Exercise 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70, 75,
73, 80, 70, 83 and 86. Find the mode of the students’ marks.
Exercise 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30 children
given below.
Weight 1.9-2.3 2.3-2.7 2.7-3.1 3.1-3.5 3.5-3.9 3.9-4.3
No. of children 5 5 9 4 4 3
Merits of mode
• The mode is not affected by extreme values.
• The mode can be calculated even for distributions with open-ended intervals, and it is not necessary to know all the observations.
Demerits of mode
• The mode may not exist in the series, and if it exists it may not be unique.
• It does not fulfill most of the requirements of a good measure of central tendency.
• It may be unrepresentative in many cases.

CHAPTER 5
5. Measures of Dispersion (Variation)

5.1. Introduction
Variation (dispersion) is the scatter or spread of observations /values/ in a distribution. The average
or central value is of little use unless the degree of variation, which occurs about it, is given. If the
scatter about the measure of central tendency is very large, the average is not a typical value.
Therefore it is necessary to develop a quantitative measure of the dispersion (or variation) of the
values about the average.
Measures of variation are statistical measures, which provide ways of measuring the extent to which
the data are dispersed or spread out.
Objectives: Measures of variation are needed for the following basic objectives.
• To judge the reliability of a measure of central tendency
• To compare two or more sets of data with regard to their variability
• To control variability itself, as in quality control, body temperature, etc.
• To make further statistical analysis possible or to facilitate the use of other statistical measures
Properties of a good measure of dispersion
A good measure of dispersion should:
• be rigidly defined by a mathematical formula,
• be simple to understand and easy to calculate,
• be unique,
• be based on all observations in the series,
• not be affected by a few extreme values in the series,
• have the sampling stability property, and
• be capable of further algebraic treatment as well as further statistical analysis.

5.2. Absolute and Relative Measures of Dispersion


Measures of dispersion /variation may be either absolute or relative. Absolute measures of dispersion
are expressed in the same unit of measurement in which the original data are given. These values may
be used to compare the variation in two distributions provided that the variables are in the same units
and of the same average size.
If the two sets of data are expressed in different units, however, such as quintals of sugar versus tonnes of sugarcane, or if their average sizes are very different, such as managers' salaries versus workers' salaries, the absolute measures of dispersion are not comparable. In such cases measures of relative dispersion should be used.
A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate measure of central tendency. It is sometimes called a coefficient of dispersion, because the word "coefficient" denotes a pure number (one that is independent of any unit of measurement). Note that when computing a relative dispersion, the average (measure of central tendency) used as the base should be the same one from which the absolute deviations were measured. Note also that a relative dispersion is a unitless quantity.
5.3. Types of Measures of Dispersion

5.3.1. The Range and Relative Range


Range (R) is defined as the difference between the largest and the smallest observations in a given set of data. That is, $R = x_{max} - x_{min}$, where $x_{max}$ and $x_{min}$ are the largest and the smallest observations in the series respectively.
In the case of grouped data, the range is found by taking the difference between the class mark of the last class and that of the first class. That is, $R = M_{last} - M_{first}$, where $M_{last}$ and $M_{first}$ are the class marks of the last class and the first class respectively.
A relative range (RR), also known as the coefficient of range, is given by
$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$
$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data}$$
Properties of Range and Relative Range
• Range and relative range are easy to calculate and simple to understand.
• Neither can be computed for grouped data with open-ended classes.
• They tell us nothing about how the values are distributed between the two extremes.
Example 1: Find the range and relative range for the monthly salary of ten workers in a certain paint
factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
$x_{max} = 624$ birr, $x_{min} = 462$ birr
$R = x_{max} - x_{min} = 624 - 462 = 162$ birr
$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 - 462}{624 + 462} = \frac{162}{1086} \approx 0.149$$
Example 2: Find the range and relative range for the following frequency distribution, which shows the maximum loads supported by a certain number of cables.
Maximum load (kN) Number of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
$M_{first} = 95$ kN, $M_{last} = 130$ kN
$R = M_{last} - M_{first} = 130 - 95 = 35$ kN
$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 - 95}{130 + 95} = \frac{35}{225} \approx 0.156$$
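Both examples can be reproduced with one small helper (the name `range_and_rr` is our own): for ungrouped data it is fed the largest and smallest observations, and for grouped data the class marks of the last and first classes, as the notes prescribe.

```python
def range_and_rr(top, bottom):
    """Range R = top - bottom and relative range RR = (top - bottom) / (top + bottom)."""
    r = top - bottom
    return r, r / (top + bottom)

# Ungrouped: largest and smallest salaries
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r, rr = range_and_rr(max(salaries), min(salaries))
print(r, round(rr, 3))

# Grouped: class marks of the last (128-132) and first (93-97) classes
m_first, m_last = (93 + 97) / 2, (128 + 132) / 2
r_g, rr_g = range_and_rr(m_last, m_first)
print(r_g, round(rr_g, 3))
```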

5.3.2. The Quartile Deviation (Semi-inter quartile range), Q.D


The interquartile range is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range is half of the interquartile range.
$$Q.D = \frac{Q_3 - Q_1}{2}$$
Coefficient of Quartile Deviation (C.Q.D)
$$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2\,Q.D}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
• The quartile deviation gives the average amount by which the two quartiles differ from the median.

Example
For the following frequency distribution find
a) Inter– quartile range.
b) Quartile deviation
Class limit Frequency
21 – 22 10
23 – 24 22
25 – 26 20
27 – 28 14
29 – 30 14
Total 80
Solution
N/4 = 80/4 = 20, so $Q_1$ is the 20th ordered observation. The first quartile class is 23-24.
$$Q_1 = L_{Q_1} + \frac{\frac{N}{4} - cf}{f_{Q_1}} \times w = 22.5 + \frac{20 - 10}{22} \times 2 \approx 23.41$$
2N/4 = 2(80)/4 = 40, so $Q_2$ is the 40th observation. The class interval containing $Q_2$ is 25-26. Therefore
$$Q_2 = L_{Q_2} + \frac{\frac{2N}{4} - cf}{f_{Q_2}} \times w = 24.5 + \frac{40 - 32}{20} \times 2 = 25.3$$
3N/4 = 3(80)/4 = 60, so $Q_3$ is the 60th observation. The class containing $Q_3$ is 27-28.
$$Q_3 = L_{Q_3} + \frac{\frac{3N}{4} - cf}{f_{Q_3}} \times w = 26.5 + \frac{60 - 52}{14} \times 2 \approx 27.64$$
Here L is the lower class boundary of the quartile class, cf is the cumulative frequency preceding that class, f is the frequency of the quartile class and w is the class width.
a) Interquartile range = $Q_3 - Q_1$ = 27.64 - 23.41 = 4.23
b) $Q.D = \frac{1}{2}(Q_3 - Q_1)$ = 4.23/2 = 2.115
The quartile deviation is more stable than the range, as it depends on two intermediate values. It is not affected by extreme values, since the extreme values are excluded. However, the quartile deviation also fails to take all the observations into account.
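The quartile calculations in the example can be sketched with one interpolation function (the name `grouped_quantile` is our own), assuming equal class widths and lower class boundaries supplied directly. With p = 0.5 it gives the same result as the grouped-median formula. Because the code keeps the quartiles unrounded, the quartile deviation comes out as about 2.117; the 2.115 above differs only because the hand calculation rounds the quartiles first.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, width, freqs, p):
    """Quantile of grouped data by interpolation: L + (p*N - cf) / f * w."""
    target = p * sum(freqs)                       # N/4 for Q1, 2N/4 for Q2, 3N/4 for Q3
    cum = list(accumulate(freqs))
    j = next(i for i, cf in enumerate(cum) if cf >= target)
    cf_before = cum[j - 1] if j > 0 else 0
    return lower_boundaries[j] + (target - cf_before) / freqs[j] * width

lower = [20.5, 22.5, 24.5, 26.5, 28.5]            # classes 21-22, 23-24, ..., 29-30
freqs = [10, 22, 20, 14, 14]
q1, q2, q3 = (grouped_quantile(lower, 2, freqs, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
print(round(q1, 2), round(q2, 2), round(q3, 2))
print(round(iqr, 2), round(iqr / 2, 3))           # interquartile range and Q.D
```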

5.3.3. The Mean Deviation and Coefficient of Mean Deviation
The mean deviation (MD) measures the average deviation of a set of observations about their central
value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ is given as
$$MD = \frac{\sum_{i=1}^{n} |x_i - A|}{n}$$
where A is a central measure (the mean or the median).
In the case of grouped data, the formula for MD becomes
$$MD = \frac{\sum_{i=1}^{k} f_i |x_i - A|}{n}$$
where $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
• The mean deviation about the arithmetic mean is, therefore, given by
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$
$$MD(\bar{x}) = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution}$$
• The mean deviation about the median is given by
$$MD(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$
$$MD(\tilde{x}) = \frac{\sum_{i=1}^{k} f_i |x_i - \tilde{x}|}{n} \quad \text{for a grouped frequency distribution}$$
• The mean deviation about the mode, denoted by $MD(\hat{x})$, is given by
$$MD(\hat{x}) = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n} \quad \text{for ungrouped data}$$
$$MD(\hat{x}) = \frac{\sum_{i=1}^{k} f_i |x_i - \hat{x}|}{n} \quad \text{for a frequency distribution}$$
In all grouped formulas, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Coefficient of mean deviation (CMD)
The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to the measure of central tendency about which it was calculated (the arithmetic mean, the median or the mode).
In general, $CMD = \frac{MD}{A}$, where A is the measure of central tendency about which MD was computed.
That is, the CMD about the arithmetic mean is given by $CMD = \frac{MD}{\bar{x}}$, where MD is the mean deviation calculated about the arithmetic mean. Similarly, the CMD about the median is given by $CMD = \frac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations, and the CMD about the mode is given by $CMD = \frac{MD}{\hat{x}}$, in which case MD is calculated about the mode of the observations.
Properties of Mean Deviation and coefficient of mean deviation
- It is easy to understand and compute
- It is based on all observations
- It is not affected very much by extreme values.
- It is not capable of further mathematical treatment, and it is not a very accurate measure of dispersion.
Examples:
1. The following are the numbers of visits made by 10 students to the ZC psychologist for advice:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, the median and the mode.
Solution:
First calculate the three averages:
$\bar{x} = 6$, $\tilde{x} = 5.5$, $\hat{x} = 5$
Then take the absolute deviations of each observation from these averages.
x_i 4 4 5 5 5 6 7 7 8 9 Total
|x_i - 6| 2 2 1 1 1 0 1 1 2 3 14
|x_i - 5.5| 1.5 1.5 0.5 0.5 0.5 0.5 1.5 1.5 2.5 3.5 14
|x_i - 5| 1 1 0 0 0 1 2 2 3 4 14
$$MD(\bar{x}) = \frac{\sum_{i=1}^{10} |x_i - 6|}{10} = \frac{14}{10} = 1.4, \qquad MD(\tilde{x}) = \frac{\sum_{i=1}^{10} |x_i - 5.5|}{10} = \frac{14}{10} = 1.4$$
$$MD(\hat{x}) = \frac{\sum_{i=1}^{10} |x_i - 5|}{10} = \frac{14}{10} = 1.4$$
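The example can be reproduced with Python's standard `statistics` module supplying the three averages. The helper `mean_deviation` (our own name) implements $\sum |x_i - A| / n$ for any chosen central value A, and the loop also prints the corresponding coefficient of mean deviation, MD/A.

```python
from statistics import mean, median, mode

def mean_deviation(data, centre):
    """MD = sum(|x_i - A|) / n about a chosen central value A."""
    return sum(abs(x - centre) for x in data) / len(data)

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
centres = {"mean": mean(visits), "median": median(visits), "mode": mode(visits)}
for name, centre in centres.items():
    md = mean_deviation(visits, centre)
    print(name, centre, md, round(md / centre, 3))   # average, MD and CMD
```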

5.3.4. The Variance, the Standard Deviation and Coefficient of Variation


The Variance
Variance is the arithmetic mean of the squared deviations of the observations from their arithmetic mean (for a sample, the sum of squared deviations is divided by n - 1 instead of n).
• Population Variance ($\sigma^2$)
For ungrouped data:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right]$$
where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.
For grouped data:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{k} f_i x_i^2 - N\mu^2\right]$$
where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.
• Sample Variance ($S^2$)
For ungrouped data:
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$$
where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.
For grouped data:
$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\right]$$
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

The Standard Deviation
Standard deviation is the positive square root of the variance.

• Population Standard Deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
• Sample Standard Deviation (S): $S = \sqrt{S^2}$, where $S^2$ is the sample variance.
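The variance formulas and standard deviations can be checked numerically. The sketch below uses the visit data from the mean deviation example as illustrative input, computes the population and sample variances from the definitional formula, confirms the sample shortcut formula, and cross-checks against Python's standard `statistics.pvariance` and `statistics.variance`. The helper names are our own.

```python
import math
import statistics

def variances(data):
    """(population variance, sample variance) from the definitional formulas."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)   # sum of squared deviations
    return ss / n, ss / (n - 1)

def sample_variance_shortcut(data):
    """S^2 = [sum(x_i^2) - n * xbar^2] / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
pop_var, samp_var = variances(visits)
print(pop_var, math.sqrt(pop_var))     # sigma^2 and sigma
print(samp_var, math.sqrt(samp_var))   # S^2 and S
print(math.isclose(samp_var, sample_variance_shortcut(visits)))
print(math.isclose(samp_var, statistics.variance(visits)))
```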


Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure is
known as the coefficient of variation (CV).
The coefficient of variation is used when we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage:
$$CV = \frac{S}{\bar{x}} \times 100\%$$
where S is the standard deviation and $\bar{x}$ is the arithmetic mean of the observations.
N.B. A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, more uniform or more homogeneous.
Example: In semester II of last year, students of the Accounting and B. Management Departments took the Stat 102 course. At the end of the semester, the following information was recorded.
Department Accounting B. Management
Mean score 79 64
Standard deviation 23 11
Compare the relative dispersions of the two departments' scores using the appropriate measure.
Solution:
Accounting Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 \approx 29.11\%$
B. Management Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 \approx 17.19\%$
Interpretation: Since the CV of the Accounting Department students is greater than that of the B. Management Department students, there is more dispersion relative to the mean in the accounting students' scores than in the B. Management students' scores.
Examples:
1. An analysis of the monthly wages paid (in Birr) to workers in two firms A and B belonging to
the same industry gives the following results
Value Firm A Firm B
Mean wage 52.5 47.5
Median wage 50.5 45.5
Variance 100 121
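In which firm, A or B, is there greater variability in individual wages?
Solution:
Calculate the coefficient of variation for both firms. The standard deviations are $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.
$$C.V_A = \frac{S_A}{\bar{x}_A} \times 100 = \frac{10}{52.5} \times 100 \approx 19.05\%$$
$$C.V_B = \frac{S_B}{\bar{x}_B} \times 100 = \frac{11}{47.5} \times 100 \approx 23.16\%$$
Since $C.V_A < C.V_B$, there is greater variability in individual wages in firm B.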
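Both coefficient-of-variation examples can be reproduced with a one-line helper (the name `coefficient_of_variation` is our own). For the two firms only variances are given, so the sketch takes square roots first, as in the solution above.

```python
import math

def coefficient_of_variation(sd, xbar):
    """CV = (S / xbar) * 100, in percent."""
    return sd / xbar * 100

# Stat 102 scores: standard deviation and mean for each department
print(round(coefficient_of_variation(23, 79), 2))   # Accounting
print(round(coefficient_of_variation(11, 64), 2))   # B. Management

# Firm wages: variances are given, so take square roots first
cv_a = coefficient_of_variation(math.sqrt(100), 52.5)
cv_b = coefficient_of_variation(math.sqrt(121), 47.5)
print(round(cv_a, 2), round(cv_b, 2))
print("Firm B" if cv_b > cv_a else "Firm A", "has greater relative variability")
```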
2. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for five days of the week in the three cities were:
City 1: 25, 24, 23, 26, 17
City 2: 22, 21, 24, 22, 20
City 3: 32, 27, 35, 24, 28
Which city has the most consistent temperatures, based on these data? (Exercise)
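After working the exercise by hand, you can check your answer by computing each city's CV directly from the raw temperatures. This sketch uses Python's standard `statistics` module. It assumes the sample standard deviation (`statistics.stdev`, divisor n − 1). Using the population version (`statistics.pstdev`) would not change the ranking here, because all three cities have the same number of observations.

```python
import statistics

# Daily temperatures for the five days, as listed above
temperatures = {
    "City 1": [25, 24, 23, 26, 17],
    "City 2": [22, 21, 24, 22, 20],
    "City 3": [32, 27, 35, 24, 28],
}

cv_by_city = {}
for city, temps in temperatures.items():
    mean = statistics.mean(temps)
    sd = statistics.stdev(temps)  # sample standard deviation (n - 1 divisor)
    cv_by_city[city] = sd / mean * 100
    print(f"{city}: mean = {mean:.1f}, S = {sd:.2f}, CV = {cv_by_city[city]:.2f}%")

# The smallest CV indicates the most consistent (least variable) temperatures.
most_consistent = min(cv_by_city, key=cv_by_city.get)
print("Most consistent:", most_consistent)
```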
Properties of the Variance and the Standard Deviation
Variance
• It removes most of the demerits or drawbacks of the measures of dispersion discussed so far.
• Its unit is the square of the unit of measurement of the values. For example, if the variable is measured in kg, the unit of the variance is kg².
• It is calculated from all the observations in the series.
• Because the deviations from the mean are squared, it gives more weight to extreme values and less to values near the mean.
Standard Deviation
• It is generally considered the best measure of dispersion.
• [Demerit] If two series are measured in different units, we cannot compare their variability simply by comparing their standard deviations.
• It is calculated from all the observations in the series, and it is capable of further algebraic treatment.
• It is neither easy to calculate nor easy to understand.
• Like the variance, the standard deviation gives more weight to extreme values and less to values near the mean.
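Two of the properties above can be illustrated with a small Python sketch: the squared unit of the variance, and the fact that standard deviations in different units cannot be compared directly. The weights below are made-up illustrative values, not data from these notes. Converting kilograms to grams multiplies every value by 1000, so the variance is multiplied by 1000² and the standard deviation by 1000. The coefficient of variation is a ratio of two quantities in the same unit, so it does not change.

```python
import statistics

# Hypothetical body weights in kilograms (illustrative values only)
weights_kg = [60.0, 72.5, 58.0, 81.0, 66.5]
# The same weights expressed in grams (1 kg = 1000 g)
weights_g = [w * 1000 for w in weights_kg]


def dispersion_summary(data):
    """Return (sample variance, sample standard deviation, CV in percent)."""
    variance = statistics.variance(data)
    sd = statistics.stdev(data)
    cv = sd / statistics.mean(data) * 100
    return variance, sd, cv


var_kg, sd_kg, cv_kg = dispersion_summary(weights_kg)
var_g, sd_g, cv_g = dispersion_summary(weights_g)

# Variance scales by 1000**2, standard deviation by 1000, CV not at all.
print(f"variance: {var_kg:.2f} kg^2 vs {var_g:.0f} g^2")
print(f"std dev:  {sd_kg:.2f} kg vs {sd_g:.1f} g")
print(f"CV:       {cv_kg:.2f}% vs {cv_g:.2f}%")
```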
The Standard Scores (Z-Scores)
A standard score is a measure that describes the relative position of a single score in the entire distribution of scores, in terms of the mean and standard deviation. It gives the number of standard deviations by which a particular observation lies above or below the mean.
Population standard score: $Z = \frac{x - \mu}{\sigma}$, where $x$ is the value of the observation, and $\mu$ and $\sigma$ are the mean and standard deviation of the population, respectively.
Sample standard score: $Z = \frac{x - \bar{x}}{S}$, where $x$ is the value of the observation, and $\bar{x}$ and $S$ are the mean and standard deviation of the sample, respectively.
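Interpretation:
If $Z > 0$, then the observation lies above the mean.
If $Z < 0$, then the observation lies below the mean.
If $Z = 0$, then the observation is equal to the mean.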
Example: Two sections were given an exam in a course. The average score was 72 with a standard deviation of 6 for section 1, and 85 with a standard deviation of 5 for section 2. Student A from section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to his/her group?
Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$, and the score of student A from section 1 is $x_A = 84$.
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$, and the score of student B from section 2 is $x_B = 90$.
Z-score of student A: $Z_A = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$
Z-score of student B: $Z_B = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$
From these two standard scores, we can conclude that student A performed better relative to his/her section. Student A's score is two standard deviations above the mean score of section 1, while student B's score is only one standard deviation above the mean score of section 2.
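The standard-score comparison above can be sketched in Python as follows. The helper name `standard_score` is our own and hypothetical. The sketch applies the sample formula $Z = (x - \bar{x})/S$ to the summary figures of each section.

```python
def standard_score(x: float, mean: float, sd: float) -> float:
    """Return the standard score Z = (x - mean) / S."""
    return (x - mean) / sd


# Section 1: mean 72, S = 6; student A scored 84
z_a = standard_score(84, 72, 6)
# Section 2: mean 85, S = 5; student B scored 90
z_b = standard_score(90, 85, 5)

print(f"Z_A = {z_a:.2f}, Z_B = {z_b:.2f}")

# For exam scores, a higher Z means a relatively better performance.
better = "A" if z_a > z_b else "B"
print("Performed better relative to own section:", better)
```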
Example 1: Two sections were given an introduction to statistics examination. The following information was given:
Value | Section 1 | Section 2
Mean | 78 | 90
Standard deviation | 6 | 5
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?
Solution: Calculate the standard score of both students.
$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2, \quad Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$
Therefore, student A performed better relative to his section. Student A's score is two standard deviations above the mean score of his section, while student B's score is only one standard deviation above the mean score of his section.
Example 2: Two groups of people were trained to perform a certain task and then tested to find out which group performs the task faster. The following information was given for the two groups:
Value | Group one | Group two
Mean | 10.4 min | 11.9 min
Standard deviation | 1.2 min | 1.3 min
Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes, while person B from group two takes 9.3 minutes. Who was faster in performing the task, and why?
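Solution:
a) Use the coefficient of variation.
$CV_1 = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{1.2}{10.4} \times 100\% \approx 11.54\%, \quad CV_2 = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{1.3}{11.9} \times 100\% \approx 10.92\%$
Since $CV_2 < CV_1$, group two is more consistent.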
b) Calculate the standard scores of A and B.
$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{9.2 - 10.4}{1.2} = -1, \quad Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{9.3 - 11.9}{1.3} = -2$
Person B is faster. Person B's time is two standard deviations shorter than the average time of group two, while person A's time is only one standard deviation shorter than the average time of group one.
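Part (b) differs from the exam examples in one respect: a shorter time is a better result. So the more negative standard score identifies the relatively faster person. Below is a minimal Python sketch of this comparison using the figures from the table. The helper name `standard_score` is our own and hypothetical.

```python
def standard_score(x: float, mean: float, sd: float) -> float:
    """Return the standard score Z = (x - mean) / S."""
    return (x - mean) / sd


# Example 2, part (b): completion times in minutes
z_a = standard_score(9.2, 10.4, 1.2)  # person A, group one
z_b = standard_score(9.3, 11.9, 1.3)  # person B, group two

print(f"Z_A = {z_a:.2f}, Z_B = {z_b:.2f}")

# Smaller times are better here, so the more negative Z marks the
# relatively faster person.
faster = "A" if z_a < z_b else "B"
print("Relatively faster:", faster)
```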
