Business Stat
CHAPTER 1
1 CHAPTER ONE: INTRODUCTION
1.1 Definition and classification of statistics
1.1.1 Definition:
Plural sense (layman's definition): Statistics is a collection of numerical facts and data.
Singular sense (formal definition): Statistics is a mathematical science dealing with the
methods of collecting, organizing, presenting, analyzing and interpreting data.
Statistics is a subject that deals with numbers and figures describing certain situations. It
primarily deals with numerical data taken by surveys and summarizes these data in such a
way that this summary gives a good indication about the nature of the data.
1.1.2 Classification:
Statistics is broadly divided into two categories based on how the collected data are used.
1. Descriptive Statistics
It deals with describing data without attempting to infer anything that goes beyond the
given set of data.
It consists of collection, organization, summarization and presentation of data.
It is concerned with summary calculations, graphs, charts and tables.
2. Inferential Statistics
It deals with making inferences and/or drawing conclusions about a population based on data
obtained from a limited sample of observations.
It consists of performing hypothesis testing, determining relationships among variables and
making predictions.
It is important because statistical data usually arise from a sample.
Statistical techniques based on probability theory are required.
For example,
a) The average income of all families (the population) in Ethiopia can be estimated from figures
obtained from a few hundred (the sample) families.
b) The average age of a student in Zion College is 20.1 years.
c) There is a relationship between smoking tobacco and an increased risk of developing lung cancer.
Exercise: Discuss the advantages and disadvantages of the above three methods with respect to each
other.
Zion CTB 2015/16 Page 1 of 32
Statistics and probability ZCTB ( by: Abdulwahid .R)
2. Organization of data: Summarization of data in some meaningful way, e.g. in table form.
3. Presentation of the data: The process of re-organization, classification, compilation, and
summarization of data to present it in a meaningful form.
4. Analysis of data: The process of extracting relevant information from the summarized data,
mainly through the use of elementary mathematical operations.
5. Inference of data: The interpretation of the various statistical measures obtained from the
analysis of the data, using those methods by which conclusions are formed and inferences
made.
Statistical techniques based on probability theory are required.
Exercise: Discuss the purpose of the above stages.
4. Ratio Scales:
Ratio scales are measurement systems that possess all three properties: order,
distance, and fixed zero.
The added power of a fixed zero allows ratios of numbers to be meaningfully
interpreted; i.e. the ratio of Bekele's height to Martha's height is 1.32, whereas
this is not possible with interval scales.
It is the level of measurement that classifies data that can be ranked, for which
differences are meaningful and for which there is a true zero. True ratios exist
between the different units of measure.
+, -, *, / are possible on this scale and relational operations are applicable.
This measurement scale provides better information than the interval scale of
measurement.
A zero measurement indicates absence of the quantity being measured.
Examples:
Weight
Height
Number of students
Age
CHAPTER 2
2 CHAPTER TWO: SAMPLING SURVEY
2.1 Introduction
When secondary data are not available for the problem under study, a decision may be taken to collect
primary data by using any of the methods discussed in the previous chapter. The required information
may be obtained by following either the census method or the sample method.
What is the Difference between Census and Sample Method?
2.2 Census and Sample Method
What are the merits and demerits of Census method?
The merits of the census method are:
Data are obtained from each and every unit of the population.
The results obtained are likely to be more representative, accurate and reliable.
It is an appropriate method of obtaining information on rare events.
Data from a complete enumeration (census) can be widely used as a basis for various surveys.
Demerits
However, despite these advantages the census method is not widely used in practice.
The effort, money and time required for carrying out complete enumeration will generally be very
large and in many cases cost may be so prohibitive that the very idea of collecting information may
have to be dropped. Also if the population is infinite or the evaluation process destroys the population
unit, the method cannot be adopted.
What is ‘universe’ in Statistics?
The word ‘universe’ as used in Statistics denotes the aggregate from which the sample is to be taken.
The universe may be either finite or infinite.
A finite universe is one in which the number of items is determinable, such as the number of students
in ZCTB or in Ethiopia.
An infinite universe is one in which the number of items cannot be determined, such as the number
of stars in the sky.
Q. What is sampling?
The process of sampling involves three elements:
a. Selecting the sample.
b. Collecting the information, and
c. Making an inference about the population.
Q. Can you give Practical examples of sampling?
2.3. Essentials of Sampling
i. Representativeness: A sample should be so selected that it truly represents the universe.
ii. Adequacy: The size of sample should be adequate; otherwise it may not represent the
characteristics of the universe.
iii. Independence: All items of the sample should be selected independently of one another and all
items of the universe should have the same chance of being selected in the sample.
iv. Homogeneity: When we talk of homogeneity we mean that there is no basic difference in the
nature of units of the universe and that of the sample.
What are the Methods of Sampling?
2.4. Methods of Sampling
The various methods of sampling can be grouped under two broad heads:
1. Probability sampling (also known as random sampling) and
2. Non-probability (or non-random) sampling.
Probability Sampling
Probability sampling methods are those in which every item in the universe has a known chance, or
probability, of being chosen for sample.
Advantages of Probability Sampling
The following are the basic advantages of probability sampling methods:
It does not depend upon the existence of detailed information about the universe for its effectiveness.
It provides estimates which are essentially unbiased and have measurable precision.
The relative efficiency of various sample designs can be evaluated only when probability sampling is used.
Limitations of Probability Sampling
The limitations are
It requires a very high level of skill and experience, takes a lot of time to plan and execute, and is costly.
N.B. Non-random sampling is a process of sample selection without the use of randomization.
The most important difference between random and non-random sampling is that the pattern of
sampling variability can be ascertained in the case of random sampling, whereas in non-random
sampling there is no way of knowing the pattern of variability in the process.
Non-probability Sampling Methods
Non-probability sampling methods are those, which do not provide every item in the universe with a
known chance of being included in the sample. The selection process is, at least, partially subjective.
i. Judgment sampling;
ii. Convenience sampling; and
iii. Quota sampling.
Probability Sampling Methods
a. Simple or unrestricted random sampling; and
b. Restricted random sampling:
i. Stratified sampling.
ii. Systematic sampling; and
iii. Cluster sampling.
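The probability sampling methods listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the universe of 100 student IDs, the two strata and the function names are hypothetical and do not come from these notes, and cluster sampling is left out for brevity.

```python
import random

def simple_random_sample(universe, n, seed=None):
    """Simple (unrestricted) random sampling without replacement."""
    rng = random.Random(seed)
    return rng.sample(universe, n)

def systematic_sample(universe, n, seed=None):
    """Systematic sampling: a random start, then every k-th item."""
    rng = random.Random(seed)
    k = len(universe) // n        # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [universe[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=None):
    """Stratified sampling: a simple random sample from every stratum."""
    rng = random.Random(seed)
    sample = []
    for items in strata.values():
        size = max(1, round(len(items) * fraction))
        sample.extend(rng.sample(items, size))
    return sample

# Hypothetical universe: 100 student IDs split into two strata.
students = list(range(1, 101))
strata = {"year1": students[:60], "year2": students[60:]}

srs = simple_random_sample(students, 10, seed=1)
sys_sample = systematic_sample(students, 10, seed=1)
strat = stratified_sample(strata, 0.1, seed=1)
print(srs, sys_sample, strat, sep="\n")
```

In each function every item has a known chance of selection, which is what distinguishes these methods from judgment, convenience and quota sampling.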
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S,
D, and W. These types will be used as the classes for the distribution. We follow the procedure to construct
the frequency distribution.
Step 1: Make a table as shown.
To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in
the collected data. Then make a columnar table of all potential raw score values arranged in order
of magnitude, with the number of times a particular value is repeated, i.e., the frequency of that value.
To facilitate counting, tallies can be used.
Example: The following data are the ages in years of 20 ZC instructors who attended a seminar on
auditing: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
Range = Maximum observation - Minimum observation (R = H - L)
STEP 2. Construct a table, tally the data and complete the frequency column.
STEP 3. The frequency distribution becomes as follows.
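As a cross-check, the range and tallying steps can be sketched in Python for the 20 ages given in the example. The variable names are illustrative and are not part of the original notes.

```python
from collections import Counter

ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# STEP 1: range = maximum observation - minimum observation
data_range = max(ages) - min(ages)

# STEP 2: count how many times each value occurs
frequency = Counter(ages)

print("Range:", data_range)
print("Age  Tally  Frequency")
for age in sorted(frequency):
    count = frequency[age]
    print(f"{age:<4} {'/' * count:<6} {count}")
print("Total", sum(frequency.values()))
```

The total of the frequency column should equal the number of observations (20 here), which is a quick check that no value was missed while tallying.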
Note:
The relative frequency shows what fractional part or proportion of the total frequency belongs
to the corresponding class.
The sum of all the relative frequencies in the frequency distribution is always 1.
Relative cumulative frequency (less than type / more than type): the total of the relative
frequencies of a class and of all lower classes (less than type) or all higher classes (more
than type). Equivalently, it is the cumulative frequency (less than type / more than type)
divided by the total frequency. It gives the proportion of values which are less than the
upper class boundary (less than type) or more than the lower class boundary (more than type).
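The definitions in this note can be sketched in Python. The sketch reuses the 20 ages from the instructors example and treats each distinct age as a class; that choice and the variable names are assumptions made only for illustration.

```python
from collections import Counter
from itertools import accumulate

ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

freq = Counter(ages)
values = sorted(freq)            # classes in increasing order
n = len(ages)                    # total frequency

# Relative frequency: class frequency divided by total frequency
rel = [freq[v] / n for v in values]

# Less than type: running total from the lowest class upward
less_than = list(accumulate(rel))

# More than type: running total from the highest class downward
more_than = list(accumulate(rel[::-1]))[::-1]

for v, r, lt, mt in zip(values, rel, less_than, more_than):
    print(f"{v}: rel={r:.2f}  less-than={lt:.2f}  more-than={mt:.2f}")
```

The last less than type value and the first more than type value both equal 1, in line with the statement that the relative frequencies always sum to 1.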
[Figures: bar chart of frequency by blood type; bar chart of sales (in million birr) by product (A, B, C, D); bar chart by year, 1997 to 2001]
3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values where it will be
difficult to accommodate the magnitude of the bars corresponding to these values within the graph
paper. In this case we use pieces of bars with each piece starting with a jump on the numerical scale.
Example: Amount of production per day for four products of a factory.
Product    Quantity produced (kg/day)
A          14
B          35
C          23
D          109
4. Component bar-diagrams
When it is desired to show how a total (an aggregate) is divided into component parts, we use
component bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a variable
with each aggregate broken into its component parts and different colors or designs are used for
identification.
Example: Represent the following data using bar-charts
Data: Yields of production of farmers in Southern Ethiopia.
Crop      1990 EC   1991 EC   1992 EC   1993 EC
Barley    14        15        26        19
Wheat     10        15        14        25
Maize     2         6         10        3
Total     26        36        50        47
The component bar-diagram for this table is as follows
[Figure: component bar diagram of production by year (1990 to 1993), each bar divided into barley, wheat and maize]
5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.
Example: The data given in the above example can be presented by using multiple bar-diagram as
below.
[Figure: multiple bar diagram of production by year (1990 to 1993), with separate bars for barley, wheat and maize]
II. Pie-charts
A pie-chart is a circle that is divided into sections according to the percentages of frequencies in each
category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of
the frequency of the class to the total frequency by 360°, i.e.
$$\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses. Below
is the breakdown of the various expenditure items. Draw an appropriate chart to portray the data.
Expenditure item Amount (in birr)
Fuel 603
Interest on car loan 279
Repairs 930
Insurance and license 646
Depreciation 492
Total 2,950
How to draw a pie-chart
First find the percentages of each class
Next calculate the degree measures for each class
[Figure: pie chart of the expenditure items. Fuel 20%, Interest on car loan 9%, Repairs 32%, Insurance and license 22%, Depreciation 17%]
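The two steps for drawing the pie-chart (percentages first, then degree measures) can be sketched in Python using the expenditure table above. The variable names are illustrative only.

```python
# Operating expenses of the car owner (in birr), from the table above
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}

total = sum(expenses.values())

# Step 1: percentage of each class
percent = {item: 100 * amount / total for item, amount in expenses.items()}

# Step 2: sector angle = (class frequency / total frequency) * 360 degrees
angle = {item: 360 * amount / total for item, amount in expenses.items()}

for item in expenses:
    print(f"{item:<22} {percent[item]:5.1f}%  {angle[item]:6.1f} degrees")
print("Total", total)
```

The sector angles add up to 360 degrees, just as the percentages add up to 100.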
III. Pictograms
In pictograms, we represent the data by means of some picture symbols. Here we decide a suitable
picture to represent a definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data (number of students in a certain
school for four years.)
Year 2000 2001 2002 2003
No. of students 2000 3000 5000 7000
Let a single picture represent one thousand students.
[Figure: pictogram, key: one symbol = 1000 students; 2000: 2 symbols, 2001: 3 symbols, 2002: 5 symbols, 2003: 7 symbols]
IV. Histogram
A histogram is another way of data presentation which is more suitable for frequency distributions
with continuous classes. In drawing a histogram, we put the class boundaries of each class on the
horizontal axis and its respective frequency on the vertical axis.
Example: Draw a histogram presenting the following data.
V. Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical
axis and their respective class marks along the horizontal axis. Then join the plotted points by
straight line segments.
Example: Present the data in the previous example using a frequency polygon.
[Figure: frequency polygon, frequency against class marks 2.5 to 44.5]
[Figures: cumulative frequency curves, including the less than type cumulative frequencies]
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
For instance a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by
$x_1, x_2, x_3, x_4, x_5$ and $x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$.
Their sum becomes $\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$.
Similarly, $x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$.
Some Properties of the Summation Notation
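1. $\sum_{i=1}^{n} c = n \cdot c$, where $c$ is a constant number.
2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where $b$ is a constant number.
3. $\sum_{i=1}^{n} (a + b\,x_i) = n \cdot a + b \sum_{i=1}^{n} x_i$, where $a$ and $b$ are constant numbers.
4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$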
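The four properties can be checked numerically with a small Python sketch. The x values are the six measurements used earlier in this section; the y values and the constants a, b and c are hypothetical and chosen only for illustration.

```python
x = [21, 13, 54, 46, 32, 37]   # measurements from the example above
y = [1, 2, 3, 4, 5, 6]         # hypothetical second series
a, b, c = 2, 3, 5              # hypothetical constants
n = len(x)

# Property 1: the sum of a constant taken n times is n*c
prop1 = sum(c for _ in range(n)) == n * c

# Property 2: a constant factor can be taken outside the sum
prop2 = sum(b * xi for xi in x) == b * sum(x)

# Property 3: sum of (a + b*x_i) equals n*a + b * sum of x_i
prop3 = sum(a + b * xi for xi in x) == n * a + b * sum(x)

# Property 4: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i
prop4 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(sum(x), prop1, prop2, prop3, prop4)
```

A numerical check of this kind does not prove the properties, but it is a quick way to catch a misread formula.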
$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$$
Exercise 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record the
following:
17.5 19.5 17.5 19 20
21 18 19.5 18 10.75
Compute the sample mean length of the infants for these data.
Exercise 2: Monthly incomes of second year regular students are given in the following frequency
distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
Arithmetic Mean for Grouped Frequency Distribution
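If data are given in the form of a continuous frequency distribution, the sample mean can be computed as
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Where $x_i$ = the class mark of the $i$-th class; $i = 1, 2, \ldots, k$
$f_i$ = the frequency of the $i$-th class and
$k$ = the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.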
$$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
Exercises:
1. Marks of 75 students are summarized in the following frequency distribution:
Therefore
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = 82.17$$
Average mark of the student for one course is approximately 82.
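The weighted mean in this example can be reproduced with a few lines of Python. The list names are illustrative; the marks and weights are the ones appearing in the calculation above.

```python
marks = [82, 80, 90, 70]    # x_i
weights = [3, 5, 3, 1]      # w_i

# Weighted mean = sum of w_i * x_i divided by sum of w_i
weighted_sum = sum(w * x for w, x in zip(weights, marks))
weighted_mean = weighted_sum / sum(weights)

print(weighted_sum, sum(weights), round(weighted_mean, 2))
```

With equal weights the same function reduces to the ordinary arithmetic mean, which is a useful sanity check.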
Merits of Arithmetic Mean
Arithmetic mean is rigidly defined by a mathematical formula so that its value is always
definite.
It is calculated based on all observations.
Arithmetic mean is simple to calculate and easy to understand. It doesn’t need arraying
(arranging in increasing or decreasing order) of the data.
Arithmetic mean is also capable of further algebraic treatment.
It affords a good standard of comparison.
Demerits of Arithmetic Mean
It is highly affected by extreme (abnormal) observations in the series. For instance, the
monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026
birr. The average income of these four people becomes (37 + 53 + 48 + 1026)/4 = 291 birr, which is
not at all a representative figure.
It can be a number which does not exist in the series.
It sometimes gives results which appear almost meaningless. For example it is likely
that we can get an average of ‘3.6 children’ per family.
It gives greater importance to bigger items of a series and lesser importance to smaller items.
That means it is an upward-biased measure.
It can’t be calculated for open-ended classes.
THE GEOMETRIC MEAN & THE HARMONIC MEAN (reading assignment)
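Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4,
10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: Arranged in increasing order the weights are 6.4, 7.8, 8.1, 9.2 and 10.5. The middle (third)
value is 8.1, so the median is 8.1.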
Example: Find the median of the following distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
Solutions:
First find the less than cumulative frequency.
Identify the median class.
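Find the median using the formula.
$\frac{n}{2} = \frac{75}{2} = 37.5$. 39 is the first cumulative frequency to be greater than or equal to 37.5,
so 50 – 54 is the median class.
$L_{med} = 49.5$, $w = 5$, $n = 75$, $c = 17$, $f_{med} = 22$
$$\tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$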
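The three steps of this solution (cumulative frequencies, median class, formula) can be sketched in Python for the distribution given in the example. The function name and the way class boundaries are derived from the class limits are assumptions made for illustration.

```python
from itertools import accumulate

# Class limits and frequencies from the example above
classes = [(40, 44), (45, 49), (50, 54), (55, 59),
           (60, 64), (65, 69), (70, 74)]
freqs = [7, 10, 22, 15, 12, 6, 3]

def grouped_median(classes, freqs):
    n = sum(freqs)
    cum = list(accumulate(freqs))            # less than cumulative frequencies
    gap = classes[1][0] - classes[0][1]      # gap between consecutive class limits
    for i, cf in enumerate(cum):
        if cf >= n / 2:                      # first cumulative frequency >= n/2
            lower_boundary = classes[i][0] - gap / 2
            width = classes[i][1] - classes[i][0] + gap
            c = cum[i - 1] if i > 0 else 0   # cumulative frequency before the median class
            return lower_boundary + width / freqs[i] * (n / 2 - c)

cumulative = list(accumulate(freqs))
median = grouped_median(classes, freqs)
print(cumulative, round(median, 2))
```

The sketch assumes closed classes of equal width; a distribution with open-ended classes, such as the wages exercise that follows, would need its boundaries supplied by hand.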
Exercise 1: The following table gives the distribution of the weekly wages of employees of a small firm.
Wages in birr No. of employees
126 and below 3
127 – 135 5
136 – 144 9
145 – 153 12
154 – 162 5
163 – 171 4
172 and above 2
a) Find the median weekly wage.
b) Why is the median a more suitable measure of central tendency than the mean in this
case?
Merits of median
Median is a positional average and hence it is not influenced by extreme values.
The median is rigidly defined, so that its value is always definite.
Median can be calculated even in case of open-ended intervals.
It gives the best result in a study of those phenomena which are incapable of direct quantitative
measurement, for example intelligence.
Demerits of median
It is not capable of further algebraic treatment.
It is not a good representative of the data if the number of items (data) is small.
The arrangement of items in order of magnitude is sometimes a very tedious process if the number of
items is very large.
CHAPTER 5
5. Measures of Dispersion (Variation)
5.1. Introduction
Variation (dispersion) is the scatter or spread of observations /values/ in a distribution. The average
or central value is of little use unless the degree of variation, which occurs about it, is given. If the
scatter about the measure of central tendency is very large, the average is not a typical value.
Therefore it is necessary to develop a quantitative measure of the dispersion (or variation) of the
values about the average.
Measures of variation are statistical measures, which provide ways of measuring the extent to which
the data are dispersed or spread out.
Objectives : Measures of variation are needed for the following basic objectives.
To judge the reliability of a measure of central tendency
To compare two or more sets of data with regard to their variability
To control variability itself like in quality control, body temperature, etc
To make further statistical analysis or to facilitate the use of other statistical measures
Properties of a good measure of dispersion
A good measure of dispersion should:
be rigidly defined by a mathematical formula,
be simple to understand and easy to calculate,
be unique,
be based on all observations in the series,
not be affected by some extreme values existing in the series,
have sampling stability property, and
be capable of further algebraic treatment as well as further statistical analysis.
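$$Q_1 = LCB_{Q_1} + \frac{\frac{N}{4} - cf}{f_{Q_1}} \times w = 22.5 + \frac{20 - 10}{22} \times 2 = 23.4$$
$2 \times \frac{N}{4} = 2 \times \frac{80}{4} = 40$, so $Q_2$ is the 40th observation.
The class interval containing $Q_2$ is 25 – 26.
Therefore
$$Q_2 = LCB_{Q_2} + \frac{2\frac{N}{4} - cf}{f_{Q_2}} \times w = 24.5 + \frac{40 - 32}{20} \times 2 = 25.3$$
And $3 \times \frac{N}{4} = 60$, so $Q_3$ is the 60th observation.
The class interval containing $Q_3$ is 27 – 28.
$$Q_3 = LCB_{Q_3} + \frac{3\frac{N}{4} - cf}{f_{Q_3}} \times w = 26.5 + \frac{60 - 52}{14} \times 2 = 27.64$$
Here $LCB$ is the lower class boundary of the quartile class, $cf$ is the cumulative frequency before that class, $f$ is its frequency and $w$ is the class width.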
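The quartile calculations can be re-done with a short Python sketch. Because the underlying frequency table is not reproduced here, the sketch takes the lower class boundaries, cumulative frequencies, class frequencies and class width directly from the worked figures above; the function name is hypothetical.

```python
def quartile(lower_boundary, position, cf_before, freq, width):
    """Grouped-data quartile: boundary + (position - cf) / f * w."""
    return lower_boundary + (position - cf_before) / freq * width

N = 80  # total number of observations

q1 = quartile(22.5, 1 * N / 4, 10, 22, 2)
q2 = quartile(24.5, 2 * N / 4, 32, 20, 2)
q3 = quartile(26.5, 3 * N / 4, 52, 14, 2)

print(round(q1, 2), round(q2, 2), round(q3, 2))
```

The same function gives the median when the position is N/2, since the second quartile and the median coincide.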
$$MD = \frac{\sum |x_i - A|}{n}$$
Where $A$ is a central measure (the mean or the median).
In case of grouped data, the formula for MD becomes
$$MD = \frac{\sum f_i |x_i - A|}{n}$$
Where $x_i$ is the class mark of the $i$-th class, $f_i$ is the frequency of the $i$-th class and $n = \sum f_i$.
The mean deviation about the arithmetic mean is, therefore, given by
$$MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$
$$MD = \frac{\sum f_i |x_i - \bar{x}|}{n} \quad \text{for grouped frequency distribution}$$
where $x_i$ is the class mark of the $i$-th class, $f_i$ is the frequency of the $i$-th class and $n = \sum f_i$.
The mean deviation about the median is also given by
$$MD = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$
$$MD = \frac{\sum f_i |x_i - \tilde{x}|}{n} \quad \text{for grouped frequency distribution}$$
where $x_i$ is the class mark of the $i$-th class, $f_i$ is the frequency of the $i$-th class and $n = \sum f_i$.
Mean Deviation about the mode.
Denoted by $M.D(\hat{X})$ and given by
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n} \quad \text{for ungrouped data}$$
For the case of frequency distribution it is given as:
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{k} f_i |X_i - \hat{X}|}{n}$$
Coefficient of mean deviation (CMD)
The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
the measure of central tendency about which it is calculated: the arithmetic mean, the median or the mode.
In general, $CMD = \frac{MD}{A}$ where $A$ is that measure of central tendency.
That is, CMD about the arithmetic mean is given by $CMD = \frac{MD}{\bar{x}}$ where MD is the mean deviation
calculated about the arithmetic mean. On the other hand CMD about the median is given by
$CMD = \frac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations. And also CMD
about the mode is given by $CMD = \frac{MD}{\hat{x}}$, in which case MD is calculated about the mode of the
observations.
Properties of Mean Deviation and coefficient of mean deviation
- It is easy to understand and compute
- It is based on all observations
- It is not affected very much by extreme values.
- It is not capable of further mathematical treatment and it is not a very accurate measure of
dispersion.
Examples:
1. The following are the numbers of visits made by 10 students to the ZC Psychologist for advice:
8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, the median and the mode.
Solutions:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Then take the absolute deviations of each observation from these averages.
Xi           4    4    5    5    5    6    7    7    8    9    total
|Xi - 6|     2    2    1    1    1    0    1    1    2    3    14
|Xi - 5.5|   1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5  14
|Xi - 5|     1    1    0    0    0    1    2    2    3    4    14
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{10} |X_i - 6|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\tilde{X}) = \frac{\sum_{i=1}^{10} |X_i - 5.5|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\hat{X}) = \frac{\sum_{i=1}^{10} |X_i - 5|}{10} = \frac{14}{10} = 1.4$$
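The worked example can be verified with a short Python sketch that uses the standard `statistics` module for the three averages. The helper name `mean_deviation` is illustrative and not part of the original notes.

```python
from statistics import mean, median, mode

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

def mean_deviation(data, center):
    """Average absolute deviation of the data about a central value."""
    return sum(abs(x - center) for x in data) / len(data)

averages = {"mean": mean(visits), "median": median(visits), "mode": mode(visits)}

for name, center in averages.items():
    md = mean_deviation(visits, center)
    cmd = md / center            # coefficient of mean deviation
    print(f"{name:<6} center={center}  MD={md}  CMD={cmd:.3f}")
```

In this data set the mean deviation happens to be the same about all three averages, although the coefficients of mean deviation differ because each divides by a different central value.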
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i x_i^2 - N\mu^2\right]$$
Where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the $i$-th class, $f_i$ is the
frequency of the $i$-th class and $N = \sum f_i$.
Sample Variance ($S^2$)
For ungrouped data:
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]$$
Where $\bar{x}$ is the sample arithmetic mean and $n$ is the total number of observations in the sample.
For grouped data:
$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right]$$
Where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i$-th class, $f_i$ is the
frequency of the $i$-th class and $n = \sum f_i$.
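A minimal Python sketch can show that the defining formula and the computational formula for the sample variance give the same value. It borrows the ten visit counts from the mean deviation example purely as convenient data; that choice and the variable names are assumptions for illustration.

```python
from statistics import variance

x = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
n = len(x)
x_bar = sum(x) / n

# Defining formula: sum of squared deviations from the mean over (n - 1)
s2_definition = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Computational formula: (sum of squares - n * mean squared) over (n - 1)
s2_shortcut = (sum(xi ** 2 for xi in x) - n * x_bar ** 2) / (n - 1)

# Library value for comparison (statistics.variance is the sample variance)
s2_library = variance(x)

print(s2_definition, s2_shortcut, s2_library)
```

Dividing by n instead of n - 1 in either formula would give the population variance of the same ten values.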
Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking,
who performed better?
Solutions: Calculate the standard score of both students.
$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2, \qquad Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$
Student A performed better relative to his section, because the score of student A is two standard
deviations above the mean score of his section, while the score of student B is only one standard
deviation above the mean score of his section.
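The comparison of the two students can be reproduced with a small Python sketch. The section means (78 and 90) and standard deviations (6 and 5) are the values used in the solution above; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

z_a = z_score(90, 78, 6)   # student A, section 1
z_b = z_score(95, 90, 5)   # student B, section 2

better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)
```

Because standard scores are unit-free, the same comparison works even when the two sections are marked on different scales.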
Example 2: Two groups of people were trained to perform a certain task and tested to find out
which group is faster to learn the task. For the two groups the following information was given: