
APPLYING STATISTICS TO DECISION MAKING

Introduction

The role of statistics in research is to function as a tool for designing research, analysing data and
drawing conclusions from it. Most research studies produce a large volume of raw data, which must be
suitably reduced so that it can be easily read and used for further analysis. No research worker can
ignore the science of statistics, even one who does not use detailed statistical methods.

Determining the Measures of Central Tendency & Dispersion

Scale Types

Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. They differ
in which of three properties they possess: order, distance (equal intervals), and a fixed (true) zero.

Nominal Scales

Nominal scales are measurement systems that possess none of the three properties of order, distance,
and fixed zero.

• Level of measurement which classifies data into mutually exclusive, all-inclusive categories in
which no order or ranking can be imposed on the data.

• No arithmetic or relational operations can be applied.

Examples:

• Sex (Male or Female)

• Choice (Yes or No)

• Marital status (Married, Single, Widowed, Divorced)

Ordinal Scales

Ordinal scales are measurement systems that possess the property of order, but not the property of
distance.

• Level of measurement which classifies data into categories that can be ranked. The differences
between ranks are not meaningful.

• Arithmetic operations are not applicable, but relational operations are.

• Ordering is the sole property of the ordinal scale.

Examples:

• Agreement (Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree)

• Rating scales (Excellent, Very good, Good, Fair, Poor)

Interval Scales
Interval scales are measurement systems that possess the properties of order and distance, but not
the property of fixed zero.

• Level of measurement which classifies data that can be ranked and whose differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.

• Addition and subtraction are applicable; multiplication and division are not.

• Relational operations are also possible.

Examples:

• Temperature (in °C or °F)

• Time (clock or calendar time)

• Test scores

Ratio Scales

Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted; e.g.
we can say that the ratio of Bekele's height to Martha's height is 1.32, whereas this is not possible
with interval scales.

• Level of measurement which classifies data that can be ranked, whose differences are meaningful,
and which has a true zero. Ratios between values are therefore meaningful.

• All arithmetic and relational operations are applicable.

Examples:

• Weight

• Height

• Length

Measures of Central Tendency

Notations Used in Statistics

∑ = an operation for summation or addition

N= population size

n= sample Size

X̅ = Sample mean

µ= population mean

$\sum_{i=1}^{5} y_i$ = summation of the elements $y_i$ of a series, with the index $i$ starting at 1 and ending at 5.


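Example 1:

$\sum_{i=1}^{5} y_i = y_1 + y_2 + y_3 + y_4 + y_5$

Example 2:

If $y_1, \dots, y_5$ are 1, 2, 3, 4, 5, then $\sum_{i=1}^{5} y_i = 1 + 2 + 3 + 4 + 5 = 15$

Example 3:

$\sum_{y=1}^{4} y^2 = 1^2 + 2^2 + 3^2 + 4^2 = 1 + 4 + 9 + 16 = 30$

Example 4:

$\sum_{x=2}^{5} 3x = 3(2) + 3(3) + 3(4) + 3(5) = 6 + 9 + 12 + 15 = 42$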
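The summation examples above can be checked with a short Python sketch. It assumes the series $y_1, \dots, y_5$ of Example 2 is the list 1 to 5; `range(a, b + 1)` generates the index values from a to b inclusive, matching the lower and upper limits of $\sum$.

```python
# Sigma notation written as Python sums.
# range(a, b + 1) runs from a up to and including b, like the limits of the sum.

y = [1, 2, 3, 4, 5]                            # y_1 ... y_5 (assumed, as in Example 2)
example_2 = sum(y)                             # sum_{i=1}^{5} y_i
example_3 = sum(k ** 2 for k in range(1, 5))   # sum_{y=1}^{4} y^2 (index renamed k)
example_4 = sum(3 * x for x in range(2, 6))    # sum_{x=2}^{5} 3x

print(example_2, example_3, example_4)  # 15 30 42
```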

What Is Central Tendency?

When two or more data sets are to be compared, it is usually necessary to condense the data. For
comparison, however, condensing a data set into a frequency distribution and presenting it visually
is not enough; the data set must also be summarized in a single value. Such a value usually lies
somewhere in the centre of the data and represents the entire data set, so it is called a measure of
central tendency, or an average. Because a measure of central tendency indicates the location, or
general position, of the distribution on the X-axis, it is also known as a measure of location or
position.

a) Mean

The mean is the value obtained by dividing the sum of all the observations by the number of
observations. It is used to summarize interval or ratio data when the distribution is symmetrical
and unimodal. The formula for the mean is:

$\bar{X} = \frac{\sum x}{n}$ (sample mean)

$\mu = \frac{\sum x}{N}$ (population mean)

Example:

A sample of 10 executives received the following bonuses (in thousands) last year. Determine the
mean bonus:

10, 14, 15, 17, 16, 16, 20, 21, 25, 26

$\bar{X} = \frac{\sum x}{n} = \frac{10 + 14 + 15 + 17 + 16 + 16 + 20 + 21 + 25 + 26}{10} = \frac{180}{10} = 18$

Mean = 18
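As a minimal sketch in Python (the helper name `mean` is hypothetical), the same calculation divides the sum of the observations by their count. The code is identical for a sample or a population; only the symbol for the count ($n$ or $N$) differs.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

bonuses = [10, 14, 15, 17, 16, 16, 20, 21, 25, 26]  # in thousands
print(mean(bonuses))  # 18.0
```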

b) Median

When the observations are arranged in ascending or descending order, the value that divides the
distribution into two equal parts is called the median. The median is commonly used to describe
scores that reflect an ordinal scale of measurement. It is located as follows:

If n is odd: Median = the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation

If n is even: Median = $\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$

Example 1

Calculate the median for the following marks obtained by 9 students:

45 32 37 46 39 36 41 48 36

Solution:

First arrange the series in ascending or descending order:

32 36 36 37 39 41 45 46 48

Since n = 9 is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ observation:

$\frac{n+1}{2} = \frac{9+1}{2} = \frac{10}{2} = 5$, i.e. the 5th observation

Median = 39

Example 2

Calculate the median for the following marks obtained by 10 students:


45 32 37 46 39 36 41 48 36 50

Solution:

First arrange the series in ascending or descending order:

32 36 36 37 39 41 45 46 48 50

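Since n = 10 is even, the median is the average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and
$\left(\frac{n}{2}+1\right)^{\text{th}}$ observations, i.e. the 5th and 6th observations:

$\text{Median} = \frac{5^{\text{th}} + 6^{\text{th}}}{2} = \frac{39 + 41}{2} = 40$

Median = 40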
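The odd/even rule can be sketched in Python as follows (the function name `median` is a hypothetical helper). The positions in the text are 1-based, so the code subtracts 1 to get Python's 0-based list indices. Python's standard-library `statistics.median` applies the same rule.

```python
def median(values):
    """Median via the (n+1)/2 rule (odd n) or the average of the
    (n/2)th and (n/2 + 1)th ordered values (even n)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # 1-based position (n+1)/2 -> 0-based index (n+1)/2 - 1
        return ordered[(n + 1) // 2 - 1]
    # even n: 1-based positions n/2 and n/2 + 1 -> indices n/2 - 1 and n/2
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

marks_9 = [45, 32, 37, 46, 39, 36, 41, 48, 36]
marks_10 = marks_9 + [50]
print(median(marks_9), median(marks_10))  # 39 40.0
```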

c) Mode

The mode is the most frequently occurring score. It is typically used to describe central tendency
when the scores reflect a nominal scale of measurement. However, the mode gives only limited
information about a distribution. If two or more values occur the same number of times, and more
frequently than all other values, then there is more than one mode.

• Data having one mode is called uni-modal.

• Data having two modes is called bi-modal.

• Data having more than two modes is called multi-modal.

Example 1:

The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Find the mode.

Solution:

The score of 81 occurs most often (three times), so it is the mode. This data set is unimodal.

Example 2:

Find the mode of the data: 9, 6, 8, 9, 10, 7, 12, 15, 22, 15

Solution:

Arranging the data in increasing order, we have:


6, 7, 8, 9, 9, 10, 12, 15, 15, 22

Both 9 and 15 occur twice, more often than any other value, so 9 and 15 are the modes of the data.
This data set is bi-modal.
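A minimal Python sketch of finding every mode and labelling the data set with the terms used above. The helpers `modes` and `modality` are hypothetical names; the counting relies on `collections.Counter` from the standard library (Python 3.8+ also offers `statistics.multimode`, which returns all modes).

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, sorted.

    Note: if every value occurs exactly once, all values are returned;
    many texts instead say such data have no mode.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def modality(values):
    """Label a data set by its number of modes, using the terms in the text."""
    k = len(modes(values))
    if k == 1:
        return "uni-modal"
    if k == 2:
        return "bi-modal"
    return "multi-modal"

exam_scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
data = [9, 6, 8, 9, 10, 7, 12, 15, 22, 15]
print(modes(exam_scores), modality(exam_scores))  # [81] uni-modal
print(modes(data), modality(data))                # [9, 15] bi-modal
```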

Measures of Dispersion

The scatter or spread of the items of a distribution is known as dispersion or variation. In other
words, the degree to which numerical data tend to spread about an average value is called the
dispersion or variation of the data. Measures of dispersion are statistical measures that quantify
the extent to which data are dispersed or spread out.

Types of Measures of Dispersion

Various measures of dispersion are in use. The most commonly used are:

1) Range

2) Standard deviation

3) Variance

4) Coefficient of variation.

a) The Range

The range is obtained by subtracting the smallest score from the largest score in the distribution. It
is a quick measure of variability. Because the range is greatly affected by extreme scores, it may give
a distorted picture of the scores; for this reason, among others, it is not the most important measure
of variability. The formula for the range is:

Range = L - S

where L is the largest value and S is the smallest value.

Example: Determine the range for the following distribution

32 35 36 36 37 38 40 42 42 43 43 45

Solution:

Range=L-S

Range= 45-32= 13
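A short Python sketch of Range = L - S (the name `data_range` is a hypothetical helper chosen to avoid shadowing Python's built-in `range`). The second call adds one made-up extreme score to illustrate how strongly a single outlier changes the range.

```python
def data_range(values):
    """Range = L - S: the largest value minus the smallest value."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

distribution = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
print(data_range(distribution))         # 13
# One extreme (hypothetical) score changes the range drastically:
print(data_range(distribution + [90]))  # 58
```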

b) Standard Deviation

Standard deviation is another measure of dispersion in statistics. It shows how much the data are
spread out around the mean. It is the most widely used measure of dispersion because, unlike the
range, it takes into account every value in the data set. When the values in a data set are tightly
bunched together, the standard deviation is small; when the values are spread apart, it is relatively
large. The standard deviation is usually presented together with the mean and is measured in the
same units. The formula for the standard deviation is:

$S.D. = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$

where:

n = sample size

$x_i$ = value of each element in the distribution

$\bar{x}$ = mean of the sample

Example:

Suppose five climatic stations recorded the following rainfall (in mm): 60, 47, 17, 43, 30. Calculate
the standard deviation.

Solution:

$\bar{x} = \frac{\sum x}{n} = \frac{60 + 47 + 17 + 43 + 30}{5} = \frac{197}{5} = 39.4$

Xi     X̅       (Xi - X̅)    (Xi - X̅)²
60     39.4     20.6       424.36
47     39.4      7.6        57.76
17     39.4    -22.4       501.76
43     39.4      3.6        12.96
30     39.4     -9.4        88.36
                   ∑ =    1085.2

$S.D. = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{1085.2}{5}} = \sqrt{217.04} \approx 14.73$

S.D. = 14.73 mm
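The calculation can be sketched in Python as below (the helper `std_dev` is hypothetical). It follows the formula above, dividing by n; this matches the standard library's `statistics.pstdev`, whereas `statistics.stdev` divides by n - 1 and would give a somewhat larger value for the same data.

```python
import math
import statistics

def std_dev(values):
    """Standard deviation with divisor n, as in the formula above:
    sqrt( sum((x_i - mean)^2) / n )."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of an empty data set is undefined")
    m = sum(values) / n
    squared_devs = [(x - m) ** 2 for x in values]  # the last column of the table
    return math.sqrt(sum(squared_devs) / n)

rainfall = [60, 47, 17, 43, 30]  # mm
sd = std_dev(rainfall)
print(round(sd, 2))                                   # 14.73
print(math.isclose(sd, statistics.pstdev(rainfall)))  # True
```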
c) Variance

Variance is a measure of the spread between numbers in a data set. It measures how far each number
in the set is from the mean, and it is defined as the average of the squared differences from the
mean; it is therefore the square of the standard deviation. Variance cannot be negative. A value of
zero means that all of the values in the data set are identical. A low variance means the data
cluster near the average, while a high variance means the data are spread far from the average.
Variance is given by the following formula:

$Var = \frac{\sum (x_i - \bar{x})^2}{n}$

where:

n = sample size

$x_i$ = value of each element in the distribution

$\bar{x}$ = mean of the sample

Example:

The heights (in cm) of five students in a class are 163, 158, 167, 174 and 148. Determine the
variance.

Solution:

$\bar{x} = \frac{\sum x}{n} = \frac{163 + 158 + 167 + 174 + 148}{5} = \frac{810}{5} = 162$

Xi     X̅      (Xi - X̅)   (Xi - X̅)²
163    162      1          1
158    162     -4         16
167    162      5         25
174    162     12        144
148    162    -14        196
                  ∑ =     382

$Var = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{382}{5} = 76.4$

Var = 76.4 cm²
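A minimal Python sketch of the variance formula with divisor n (the helper `variance` is a hypothetical name). It agrees with the standard library's `statistics.pvariance`; taking the square root of the result returns to the original units (cm), giving the standard deviation.

```python
import math
import statistics

def variance(values):
    """Variance with divisor n: the mean of the squared deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

heights = [163, 158, 167, 174, 148]  # cm
var = variance(heights)
print(var)                                               # 76.4 (cm squared)
print(math.isclose(var, statistics.pvariance(heights)))  # True
sd = math.sqrt(var)  # standard deviation, back in cm
```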

d) Coefficient of variation (CV)


The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data
series around the mean. It is the ratio of the standard deviation to the mean, usually expressed as a
percentage, and it is useful for comparing the degree of variation from one data series to another,
even if their means are drastically different. The formula is:

$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$

Example:

An analysis of the monthly wages paid to workers in two firms, A and B, belonging to the same
industry gives the following results:

Value          Firm A    Firm B
Mean wage       52.5      47.5
Variance        100       121

The standard deviation is the square root of the variance: $\sqrt{100} = 10$ for Firm A and
$\sqrt{121} = 11$ for Firm B.

$CV_A = \frac{10}{52.5} \times 100 \approx 19.05\%$

$CV_B = \frac{11}{47.5} \times 100 \approx 23.16\%$

This implies that there is greater variability in wages in Firm B than in Firm A.
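A short Python sketch of the firm comparison (the helper `coefficient_of_variation` is hypothetical). Because the table reports variances, the code first takes the square root to get each standard deviation, which is the step that is easy to skip by hand.

```python
import math

def coefficient_of_variation(mean, variance):
    """CV (%) = standard deviation / mean * 100, with SD = sqrt(variance)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return math.sqrt(variance) / mean * 100

cv_a = coefficient_of_variation(52.5, 100)   # SD = 10
cv_b = coefficient_of_variation(47.5, 121)   # SD = 11
print(f"Firm A: {cv_a:.2f}%  Firm B: {cv_b:.2f}%")  # Firm A: 19.05%  Firm B: 23.16%
```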

3.1.3 Measures of Shape

a) Skewness

Measures of central tendency and dispersion describe the location and spread of a distribution, but
they are not sufficient to describe its shape. For this purpose, we use two further statistical
measures that compare the shape of the distribution with the normal curve: skewness and kurtosis.
Skewness and kurtosis are the two important characteristics of a distribution that are studied in
descriptive statistics.

Skewness is a statistic that tells us whether or not a distribution is symmetric. A distribution is
symmetric if its right side is similar to its left side.

If a distribution is symmetric, its skewness value is 0. For a symmetric, unimodal distribution such
as the normal distribution:

mean = median = mode (skewness value is 0)


If skewness is greater than 0, the distribution is called right-skewed (positively skewed): its right
tail is longer than its left tail. If skewness is less than 0, it is called left-skewed (negatively
skewed): its left tail is longer than its right tail.

b) Kurtosis

Kurtosis is a statistic that tells us whether a distribution is more or less peaked than a normal
distribution. As used here (excess kurtosis, i.e. kurtosis minus 3), a distribution similar to the
normal distribution has a kurtosis value of 0. If kurtosis is greater than 0, the distribution has a
higher peak than the normal distribution; if it is less than 0, it is flatter than a normal
distribution.

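Based on kurtosis, there are three types of distributions:

• Leptokurtic: sharply peaked with fat tails, and less variable (kurtosis greater than 0).

• Mesokurtic: medium peaked, like the normal distribution (kurtosis of 0).

• Platykurtic: flattest peak and highly dispersed (kurtosis less than 0).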
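The text does not give formulas for skewness or kurtosis, so the Python sketch below assumes one common convention: the population moment coefficients $g_1 = m_3 / m_2^{3/2}$ and excess kurtosis $g_2 = m_4 / m_2^2 - 3$, where $m_k$ is the k-th central moment with divisor n. Statistical software often reports bias-adjusted versions instead, so values may differ slightly. The function names and the small data sets are hypothetical, chosen only to show the signs described above.

```python
def central_moments(values):
    """Return the 2nd, 3rd and 4th central moments (divisor n)."""
    n = len(values)
    if n == 0:
        raise ValueError("moments of an empty data set are undefined")
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("undefined when all values are equal")
    return m2, m3, m4

def skewness(values):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5."""
    m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Excess kurtosis: g2 = m4 / m2**2 - 3 (about 0 for normal data)."""
    m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # long right tail
left_skewed = [1, 9, 10, 10, 10]  # long left tail

print(skewness(symmetric))                   # 0.0
print(skewness(right_skewed) > 0)            # True
print(skewness(left_skewed) < 0)             # True
print(round(excess_kurtosis(symmetric), 2))  # -1.3 (flatter than normal)
```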
