Measures of Central Tendency: Chapter Three


Introduction to statistics

CHAPTER - THREE
Measure of Central Tendency
o When we want to compare groups of numbers, it is useful to have a single value that is a good representative of each group.
o This single value is called the average of the group.
o Averages are also called measures of central tendency.
o An average that is representative is called a typical average; an average that is not representative and has only a theoretical value is called a descriptive average.
o A typical average should possess the following properties:
o It should be strictly defined.
o It should be based on all observations under investigation.
o It should be affected as little as possible by extreme observations.
o It should be capable of further algebraic treatment.
o It should be easy to calculate and simple to understand.

Objectives:
o To comprehend the data easily.
o To facilitate comparison.
o To make further statistical analysis.

The summation notation


o Let $x_1, x_2, \ldots, x_n$ be a set of measurements, where $n$ is the total number of observations and $x_i$ is the $i$th observation.
o The symbol $\sum_{i=1}^{n} x_i$ is shorthand for $x_1 + x_2 + x_3 + \cdots + x_n$; this is called the summation notation.
o That is, $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$.

Properties of Summation
i) $\sum_{i=1}^{n} k = nk$, where $k$ is any constant.
ii) $\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i$, where $k$ is any constant.
iii) $\sum_{i=1}^{n} (a + k x_i) = na + k \sum_{i=1}^{n} x_i$, where $a$ and $k$ are any constants.
iv) $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

Types of Central Tendency


o There are different types of measures of central tendency:
► Mean (arithmetic, geometric, and harmonic)
► Median (the middle value)
► Mode (the most frequently appearing value)
► Quantiles (quartiles, deciles, percentiles)
■ The choice of average depends on which one best fits the property under discussion.
The arithmetic mean (A.M)
• For raw data: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• For an ungrouped frequency distribution: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$

where $k$ is the number of classes (distinct values) and $f_i$ is the number of occurrences of $x_i$.


Example: obtain the mean of the following numbers 2,7,8,2,7,3,7
Solution

xi fi xifi
2 2 4
3 1 3
7 3 21
8 1 8
Total 7 36

$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{36}{7} \approx 5.14$
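The frequency-table calculation above can be checked with a minimal Python sketch. The helper name `frequency_mean` is hypothetical and is used only for illustration.

```python
from collections import Counter

def frequency_mean(values):
    """Mean from a frequency table: sum(f_i * x_i) / sum(f_i)."""
    freq = Counter(values)                          # maps each x_i to its frequency f_i
    total_fx = sum(x * f for x, f in freq.items())  # sum of f_i * x_i
    total_f = sum(freq.values())                    # sum of f_i
    return total_fx / total_f

data = [2, 7, 8, 2, 7, 3, 7]
mean = frequency_mean(data)
print(round(mean, 2))  # 5.14
```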

o Grouped data: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$

where $x_i$ is the class mark of the $i$th class, $f_i$ is the $i$th class frequency, and $k$ is the number of classes.
Example: calculate the mean for the following age distribution.

class Frequency
6-10 35
11-15 23
16-20 15
21-25 12
26-30 9
31-35 6
o Solution:
o First find the class marks.
o Find the product of each frequency and its class mark.
o Find the mean using the formula.

class fi xi xifi
6-10 35 8 280
11-15 23 13 299
16-20 15 18 270
21-25 12 23 276
26-30 9 28 252
31-35 6 33 198
total 100 1575

o $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{1575}{100} = 15.75$
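As a rough check of the grouped-data mean, the following Python sketch recomputes the class marks and the mean for the age distribution. The function name `grouped_mean` and the representation of classes as (lower, upper) pairs are assumptions made for illustration.

```python
def grouped_mean(classes, freqs):
    """classes: list of (lower, upper) class limits; freqs: matching class frequencies."""
    marks = [(lo + hi) / 2 for lo, hi in classes]   # class marks x_i
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

age_classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
age_freqs = [35, 23, 15, 12, 9, 6]
age_mean = grouped_mean(age_classes, age_freqs)
print(age_mean)  # 15.75
```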

Exercise: - Marks of 75 students are summarized in the following frequency distribution:

Marks Frequency
40-44 7
45-49 10
50-54 22
55-59 f1
60-64 f2
65-69 6
70-74 3

If 20% of the students have marks between 55 and 59:
o Find the missing frequencies $f_1$ and $f_2$.
o Find the mean of the distribution.

Special properties of A. mean


i) The sum of the deviations of a set of items from their mean is always zero, i.e. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
o Consider the set: 3, 8, 4
o The mean is 5, and the deviations $-2$, $3$, $-1$ sum to zero.

ii) If $\bar{x}_1$ is the mean of $n_1$ observations, $\bar{x}_2$ is the mean of $n_2$ observations, ..., and $\bar{x}_k$ is the mean of $n_k$ observations, then the mean of all the observations in all groups, often called the combined mean, is given by:
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$

Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean of the entire class.
Solution:
Females: $\bar{x}_1 = 60$, $n_1 = 30$
Males: $\bar{x}_2 = 72$, $n_2 = 70$
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{60 \cdot 30 + 72 \cdot 70}{30 + 70} = \frac{1800 + 5040}{100} = 68.4$
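The combined mean is simply a size-weighted average of the group means. A minimal sketch, with the hypothetical helper name `combined_mean`:

```python
def combined_mean(means, sizes):
    """Combined mean: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

class_mean = combined_mean(means=[60, 72], sizes=[30, 70])
print(class_mean)  # 68.4
```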

iii) If a wrong figure has been used in calculating the mean, the correct mean can be obtained without repeating the whole process using:
Correct Mean = Wrong Mean + (Correct Value - Wrong Value) / n, where n is the total number of observations.
Example: The average weight of 10 students was calculated to be 65 kg. Later it was discovered that one weight was misread as 40 kg instead of 80 kg. Calculate the correct average weight.
Correct Mean = Wrong Mean + (Correct Value - Wrong Value) / n = 65 + (80 - 40) / 10 = 69 kg.
iv) Weighted arithmetic mean
o When different importance is to be given to different data values, a weighted mean is appropriate.
o Weights are assigned to each item in proportion to its relative importance.
o Let $x_1, x_2, \ldots, x_n$ be the values of the items of a series and $w_1, w_2, \ldots, w_n$ their corresponding weights. The weighted mean, denoted by $\bar{x}_w$, is defined as:
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

• Example: A student obtained the following percentages in an examination: English 60, biology 75, mathematics 63, physics 59, and chemistry 55. Find the student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.
• Solution:
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{60 \cdot 1 + 75 \cdot 2 + 63 \cdot 1 + 59 \cdot 3 + 55 \cdot 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$
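The weighted mean of the examination marks can be reproduced with a short sketch; `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [60, 75, 63, 59, 55]    # English, biology, mathematics, physics, chemistry
weights = [1, 2, 1, 3, 3]
exam_mean = weighted_mean(scores, weights)
print(exam_mean)  # 61.5
```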

Merit and Demerits of A. mean


►Advantages:
■ It is strictly defined.
■ It is based on all observations.
■ It is suitable for further algebraic treatment.
■ It is a stable average, i.e. it is relatively little affected by fluctuations of sampling.
■ It is easy to calculate and simple to understand.

► Demerits:
■ It is affected by extreme observations.
■ It can be a number that does not occur in the series.
Geometric mean (G.M)
o The G.M is defined as the nth root of the product of n items or values of a series.
o If there are two items, we take the square root; if there are three items, the cube root; and so on.
o Symbolically, let $x_1, x_2, x_3, \ldots, x_n$ be the n values of a variable x; then their G.M is defined as:
o For raw data: $G.M = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$
o For grouped data: $G.M = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdot x_3^{f_3} \cdots x_k^{f_k}}$, where $x_i$ represents the class mark and $N = \sum_{i=1}^{k} f_i$.

- If there are more than three observations, computing the nth root directly is tedious; to simplify the computation, logarithms are used:
$\log G.M = \frac{1}{n} \sum_{i=1}^{n} \log x_i$
$G.M = \text{antilog}\left[\frac{1}{n} \sum_{i=1}^{n} \log x_i\right]$ for raw data
$G.M = \text{antilog}\left[\frac{1}{N} \sum_{i=1}^{k} f_i \log x_i\right]$ for grouped data

Example: - Find the geometric mean of 3,9,27

Solution: $G.M = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[3]{3 \cdot 9 \cdot 27} = \sqrt[3]{729} = 9$

Note: - The geometric mean is useful and appropriate for finding averages of ratios or growth rates.
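The logarithmic form of the geometric mean can be sketched in Python as follows. The function name `geometric_mean_log` is hypothetical, and base-10 logarithms are an arbitrary choice (any base gives the same result).

```python
import math

def geometric_mean_log(values):
    """G.M = antilog of the mean of the logarithms (all values must be positive)."""
    log_mean = sum(math.log10(x) for x in values) / len(values)
    return 10 ** log_mean          # antilog in base 10

gm = geometric_mean_log([3, 9, 27])
print(round(gm, 6))  # 9.0
```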
Harmonic Mean (H.M)
The harmonic mean of $x_1, x_2, x_3, \ldots, x_n$ is denoted by H.M:
o $H.M = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

And in the case of a frequency distribution:
$H.M = \frac{1}{\frac{1}{n} \sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$, where $n = \sum_{i=1}^{k} f_i$

- If $x_1, x_2, x_3, \ldots, x_n$ are the values of the items of a series and $w_1, w_2, \ldots, w_n$ their corresponding weights, the weighted harmonic mean is given by:
$H.M_w = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \frac{w_i}{x_i}}$

Example: Find the harmonic mean of the following data: 20, 30
Solution: $H.M = \frac{2}{\frac{1}{20} + \frac{1}{30}} = 24$

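Note: The harmonic mean is useful and appropriate for finding average speeds and average rates.
N.B. a) For positive values that are not all equal, A.M > G.M > H.M; the three means coincide only when all the values are equal.

b) For two positive values, $\sqrt{A.M \times H.M} = G.M$, where A.M, G.M and H.M are the usual abbreviations.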
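The two remarks can be checked numerically on the pair 20, 30 from the harmonic mean example. This sketch relies on Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later).

```python
import math
import statistics

pair = [20, 30]
am = statistics.mean(pair)             # arithmetic mean: 25
gm = statistics.geometric_mean(pair)   # geometric mean: sqrt(20 * 30)
hm = statistics.harmonic_mean(pair)    # harmonic mean: 24

print(am > gm > hm)                             # True
print(math.isclose(math.sqrt(am * hm), gm))     # True
```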

The mode ($\hat{x}$)
►The mode is the value that occurs most frequently in a set of values.
►The mode may not exist, and even if it does exist, it may not be unique.
►In a discrete distribution, the mode is the value having the maximum frequency.
Examples:
►Find the mode of 5, 3, 5, 8, 9
The mode is 5. The data are unimodal.
►Find the mode of 8, 9, 9, 7, 8, 2, 5
The modes are 8 and 9. The data are bimodal.
o For grouped data
Discrete series: the mode equals the value of the variable corresponding to the maximum frequency.

V 2 3 4 5
f 5 8 12 1
The mode is 4.
Continuous series (class frequency distribution):

$\text{Mode} = L + \left[\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\right] c$

Where L= LCB of the modal class


f1= max frequency
f0= frequency of preceding f1
f2= frequency of succeeding f1
c = class width
Example: The following is the distribution of the sizes of certain farms selected at random from a district.

Class (size of farms) 5-15 15-25 25-35 35-45 45-55 55-65 65-75
f 8 12 17 29 31 5 3
Find the mode of the distribution.
Solution: Modal class = 45-55
L = 45, f1 = 31, f0 = 29, f2 = 5, c = 10
Then $\text{Mode} = 45 + \left[\frac{31 - 29}{(31 - 29) + (31 - 5)}\right] \cdot 10 = 45 + \frac{20}{28} \approx 45.71$
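The grouped-data mode formula can be sketched as below. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption that the notes do not state.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c for equal-width classes."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of the modal class
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0                    # assumption: 0 if no preceding class
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0       # assumption: 0 if no succeeding class
    return lower_bounds[m] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

farm_lcb = [5, 15, 25, 35, 45, 55, 65]
farm_freqs = [8, 12, 17, 29, 31, 5, 3]
farm_mode = grouped_mode(farm_lcb, farm_freqs, width=10)
print(round(farm_mode, 2))  # 45.71
```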

Merit and Demerit of Mode


Merit
o It represents the most typical value in the distribution
o It is not affected by extreme observations.
o Easy to calculate and understand.

Demerit
o It is not rigidly defined.
o It is not based on all observations.
o It is not suitable for further mathematical treatment.
o It is not a stable average, i.e. it is affected by fluctuations of sampling.
o Often its value is not unique, i.e. it may not be uniquely defined.
o Example: X = {1, 1, 2, 2, 3, 4}, Mode(X) = 1 and 2

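The Median ($\tilde{x}$)

o In a distribution, the median is the value of the variable that divides it into two equal parts.
o One part comprises all the values greater than the median and the other all the values less than the median.
o The median can be defined as the middle value of a set of data values when they are arranged in ascending or descending order.
o For ungrouped data, with the values sorted as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$:
$\tilde{x} = x_{((n+1)/2)}$ if $n$ is odd, and $\tilde{x} = \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$ if $n$ is even.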
Example: Find the median of the following numbers:
a) 2, 1, 8, 3, 5
b) 6, 5, 2, 8, 9, 4
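The two parts of this example can be worked through with a small sketch that sorts the values and takes the middle item, or the average of the two middle items when n is even. The function name `median_ungrouped` is hypothetical.

```python
def median_ungrouped(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the ((n + 1) / 2)th item
    return (s[mid - 1] + s[mid]) / 2   # even n: mean of the (n/2)th and (n/2 + 1)th items

median_a = median_ungrouped([2, 1, 8, 3, 5])
median_b = median_ungrouped([6, 5, 2, 8, 9, 4])
print(median_a)  # 3
print(median_b)  # 5.5
```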

 For grouped data (Class Frequency Distribution):


The steps are:-
1. Compute the less-than cumulative frequencies (Lcf) of the classes.
2. Locate the $(n/2)$th item; this identifies the median class.
3. Use the formula: $\text{Median} = L + \frac{cw}{f}\left(\frac{n}{2} - cf\right)$

N.B. The median class is the first class whose Lcf is at least n/2.

Where: L = LCB of the median class
cw = class width of the median class
f = frequency of the median class
cf = cumulative frequency of the class preceding the median class

Example: Find the median of the following distribution.

Class 50-60 60-70 70-80 80-90 90-100 100-110
fi 20 21 50 40 53 16
Lcf 20 41 91 131 184 200
Solution:
1. n = 200, so n/2 = 100.
2. The median class contains the (n/2)th = 100th item, so it is 80-90.
3. L = 80, cw = 10, f = 40, cf = 91, then
Median = L + (cw/f)(n/2 - cf) = 80 + (10/40)(100 - 91) = 82.25
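The three steps can be sketched in Python as follows. The name `grouped_median` is hypothetical, and the median class is taken as the first class whose cumulative frequency reaches n/2.

```python
from itertools import accumulate

def grouped_median(lower_bounds, widths, freqs):
    """Median = L + (cw / f) * (n/2 - cf) for a class frequency distribution."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                           # less-than cumulative frequencies
    m = next(i for i, c in enumerate(cum) if c >= n / 2)    # index of the median class
    cf = cum[m - 1] if m > 0 else 0                         # cumulative frequency before it
    return lower_bounds[m] + widths[m] / freqs[m] * (n / 2 - cf)

median = grouped_median(
    lower_bounds=[50, 60, 70, 80, 90, 100],
    widths=[10] * 6,
    freqs=[20, 21, 50, 40, 53, 16],
)
print(median)  # 82.25
```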
►Quartiles
o Are the three values that divide the given data into four equal parts; they are denoted by Q1, Q2 and Q3.

Q1 = the lower or first quartile; it covers 25% of the distribution.
Q2 = the middle or second quartile; it covers 50% of the distribution.
Q3 = the upper or third quartile; it covers 75% of the distribution.

►Deciles
o Are the nine values that divide the series into 10 equal parts; they are denoted by D1, D2, D3, ..., D9.
D1 = covers 10% of the distribution
D2 = covers 20% of the distribution
D3 = covers 30% of the distribution
D9 = covers 90% of the distribution

► Percentiles
o Are the 99 values that divide the series into 100 equal parts. They are denoted by P1, P2, ..., P99.

Note that
i) Q1 = P25, Q2 = D5 = P50 = Median, Q3 = P75
ii) D1 = P10, D2 = P20, D3 = P30, ..., D9 = P90.

Reading assignment:
o Quartiles, deciles and percentiles from raw data
For grouped data (from a class frequency distribution):

$Q_i = Lcb + \frac{cw}{f}\left(\frac{iN}{4} - cf\right)$; N.B. the $Q_i$ class is the first class with Lcf ≥ iN/4
$D_i = Lcb + \frac{cw}{f}\left(\frac{iN}{10} - cf\right)$; N.B. the $D_i$ class is the first class with Lcf ≥ iN/10
$P_i = Lcb + \frac{cw}{f}\left(\frac{iN}{100} - cf\right)$; N.B. the $P_i$ class is the first class with Lcf ≥ iN/100
E.g. For the data given below, compute the quartiles, D3, D7, P15 and P88, and interpret them.

Marks Below 10 10-20 20-40 40-60 60-80 Above 80
f 10 15 25 30 14 6
Lcf 10 25 50 80 94 100
Solution: N = 100.
Q1: the (N/4)th item is the 25th item, so the quartile class is 10-20.
L = 10, cw = 10, f = 15, cf = 10
$Q_1 = Lcb + \frac{cw}{f}\left(\frac{N}{4} - cf\right) = 10 + \frac{10}{15}(25 - 10) = 20$
The marks of 25% of the students are below 20.

Q2: the (2N/4)th item is the 50th item, so the quartile class is 20-40.
L = 20, cw = 20, f = 25, cf = 25
$Q_2 = Lcb + \frac{cw}{f}\left(\frac{2N}{4} - cf\right) = 20 + \frac{20}{25}(50 - 25) = 40$
The marks of half of the students are below 40.

D3: the (3N/10)th item is the 30th item, so the decile class is 20-40.
L = 20, cw = 20, f = 25, cf = 25
$D_3 = Lcb + \frac{cw}{f}\left(\frac{3N}{10} - cf\right) = 20 + \frac{20}{25}(30 - 25) = 24$
The marks of 30% of the students are below 24.

D7: the (7N/10)th item is the 70th item, so the decile class is 40-60.
L = 40, cw = 20, f = 30, cf = 50
$D_7 = Lcb + \frac{cw}{f}\left(\frac{7N}{10} - cf\right) = 40 + \frac{20}{30}(70 - 50) = 53.33$
The marks of 70% of the students are below 53.33.

P15: the (15N/100)th item is the 15th item, so the percentile class is 10-20.
L = 10, cw = 10, f = 15, cf = 10
$P_{15} = Lcb + \frac{cw}{f}\left(\frac{15N}{100} - cf\right) = 10 + \frac{10}{15}(15 - 10) = 13.33$
The marks of 15% of the students are below 13.33.

P88: the (88N/100)th item is the 88th item, so the percentile class is 60-80.
L = 60, cw = 20, f = 14, cf = 80
$P_{88} = Lcb + \frac{cw}{f}\left(\frac{88N}{100} - cf\right) = 60 + \frac{20}{14}(88 - 80) = 71.43$
The marks of 88% of the students are below 71.43.
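The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one helper can cover all three. The sketch below uses the hypothetical name `grouped_quantile`; the lower bound 0 for the "Below 10" class and the widths assigned to the two open-ended classes are assumptions, which do not affect the quantiles computed here. It also evaluates Q3, which the worked solution does not show.

```python
from itertools import accumulate

def grouped_quantile(lower_bounds, widths, freqs, i, parts):
    """i-th quantile when the data are split into `parts` equal parts (4, 10 or 100)."""
    n = sum(freqs)
    target = i * n / parts                                  # position iN/parts
    cum = list(accumulate(freqs))                           # less-than cumulative frequencies
    q = next(j for j, c in enumerate(cum) if c >= target)   # index of the quantile class
    cf = cum[q - 1] if q > 0 else 0
    return lower_bounds[q] + widths[q] / freqs[q] * (target - cf)

lcb = [0, 10, 20, 40, 60, 80]    # 0 for "Below 10" is an assumption
cw = [10, 10, 20, 20, 20, 20]    # widths of the open-ended classes are assumptions
f = [10, 15, 25, 30, 14, 6]

q1 = grouped_quantile(lcb, cw, f, 1, 4)
q2 = grouped_quantile(lcb, cw, f, 2, 4)
q3 = grouped_quantile(lcb, cw, f, 3, 4)
d3 = grouped_quantile(lcb, cw, f, 3, 10)
d7 = grouped_quantile(lcb, cw, f, 7, 10)
p15 = grouped_quantile(lcb, cw, f, 15, 100)
p88 = grouped_quantile(lcb, cw, f, 88, 100)
print([round(v, 2) for v in (q1, q2, q3, d3, d7, p15, p88)])
```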

CHAPTER - FOUR
MEASURE OF DISPERSION (VARIATION)
o The degree to which numerical data tend to spread about an average is called the dispersion or variation of the data.
Objectives of Measuring variation or Dispersion
o To judge the reliability of measure of central tendency,
o To compare two or more groups of numbers in terms of their variability, and
o To further statistical analysis.

Absolute and Relative Measure of Dispersion


Absolute Measure of Dispersion
o Are expressed in terms of the original units.
o Are not suitable for comparing the variability of two or more groups with different units of measurement.

Relative Measure of Dispersion

o Are ratios or percentages of a measure of absolute dispersion to an appropriate measure of central tendency.
o These are pure numbers, that is, they are independent of the unit of measurement.

Types of Measure of Dispersion


There are various measures of dispersion, of which the most commonly used are:

1. Range (R) and Relative Range (RR)

2. Quartile Deviation (Q.D) and Coefficient of Quartile Deviation (C.Q.D)

3. Standard Deviation (s) and Coefficient of Variation (CV).

1. Range (R)
o R = Xmax - Xmin
o It is easy to compute and quick, but it is not a good measure of variability, since it fails to take into account how the data are distributed and is greatly affected by extreme values.
o The following two distributions have the same range, 13, yet appear to differ greatly in the amount of variability.

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45

Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45

Relative Range (RR)

o It is also known as the coefficient of range and is given by:
$RR = \frac{X_{max} - X_{min}}{X_{max} + X_{min}} = \frac{R}{X_{max} + X_{min}}$

Example:
1. If the range and relative range of a series are 4 and 0.25 respectively, then what is the value of:
a) the smallest observation?
b) the largest observation?

2. Quartile Deviation (Q.D)

o It is also known as the semi-interquartile range.
o Reading assignment.

$Q.D = \frac{Q_3 - Q_1}{2}$ and $C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Example:

Q1 = 21.4, Q2 = 27.14, Q3 = 31.53

Q.D = ?, C.Q.D = ?
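One way to evaluate the two quantities for the quartiles given above is the following sketch; the function names are hypothetical.

```python
def quartile_deviation(q1, q3):
    """Q.D = (Q3 - Q1) / 2, the semi-interquartile range."""
    return (q3 - q1) / 2

def coefficient_of_quartile_deviation(q1, q3):
    """C.Q.D = (Q3 - Q1) / (Q3 + Q1), a unit-free relative measure."""
    return (q3 - q1) / (q3 + q1)

qd = quartile_deviation(21.4, 31.53)
cqd = coefficient_of_quartile_deviation(21.4, 31.53)
print(round(qd, 3), round(cqd, 4))  # 5.065 0.1914
```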

3. Variance and Standard Deviation


Variance
o Is the "average squared deviation from the mean".
o Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$
o For the case of a frequency distribution it is expressed as: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{k} f_i (X_i - \mu)^2$, where $N = \sum_{i=1}^{k} f_i$
o Sample variance ($s^2$): $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
o For the case of a frequency distribution it is expressed as: $s^2 = \frac{1}{n-1} \sum_{i=1}^{k} f_i (x_i - \bar{x})^2$
Short-cut formula:
$s^2 = \frac{1}{n-1}\left(\sum x_i^2 - n\bar{x}^2\right)$ for raw data,
$s^2 = \frac{1}{n-1}\left(\sum f_i x_i^2 - n\bar{x}^2\right)$ for a frequency distribution.

Standard Deviation
o There is a problem with variances. Recall that the deviations were squared, which means the units were also squared.
o To get back to the units of the original data values, the square root must be taken.
o $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$
Examples: Find the variance and standard deviation of (1) the sample data 5, 17, 12, 10 and (2) the data given in the frequency distribution below.
Solution (1): $\bar{x} = 11$

xi 5 10 12 17 Total
(xi - x̄)² 36 1 1 36 74

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{74}{3} = 24.67$, so $s = \sqrt{s^2} = \sqrt{24.67} = 4.97$

Solution (2):

Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
$\bar{x} = 55$

xi (class mark) 42 47 52 57 62 67 72 Total
fi(xi - x̄)² 1183 640 198 60 588 864 867 4400

$s^2 = \frac{1}{n-1} \sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{4400}{74} = 59.46$, so $s = \sqrt{59.46} = 7.71$
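Both variance calculations can be reproduced with one sketch that accepts optional frequencies. The function name `sample_variance` and its signature are assumptions made for illustration.

```python
import math

def sample_variance(values, freqs=None):
    """s^2 = sum(f_i * (x_i - mean)^2) / (n - 1); freqs defaults to 1 for raw data."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)

var_raw = sample_variance([5, 17, 12, 10])
var_grouped = sample_variance([42, 47, 52, 57, 62, 67, 72], [7, 10, 22, 15, 12, 6, 3])
print(round(var_raw, 2), round(math.sqrt(var_raw), 2))          # 24.67 4.97
print(round(var_grouped, 2), round(math.sqrt(var_grouped), 2))  # 59.46 7.71
```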
Coefficient of Variation (CV)
o Is defined as the ratio of the standard deviation to the mean, usually expressed as a percentage.
o $CV = \frac{s}{\bar{x}} \times 100\%$
Examples:
1. An analysis of the monthly wages paid (in birr) to workers in two firms A and B belonging to the same industry gives the following results:

value Firm A Firm B


Mean wage 52.5 47.5
Median wage 50.5 45.5
variance 100 121
o In which firm, A or B, is there greater variability in individual wages?
Solution: Calculate the coefficient of variation for both firms, using $s_A = \sqrt{100} = 10$ and $s_B = \sqrt{121} = 11$.
o $CV_A = \frac{s_A}{\bar{x}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$,
and $CV_B = \frac{s_B}{\bar{x}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$
o Since $CV_A < CV_B$, there is greater variability in individual wages in firm B.
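The comparison of the two firms can be sketched as follows; `coefficient_of_variation` is a hypothetical helper that takes the mean and the variance, as given in the table.

```python
import math

def coefficient_of_variation(mean, variance):
    """CV = (standard deviation / mean) * 100, in percent."""
    return math.sqrt(variance) / mean * 100

cv_a = coefficient_of_variation(mean=52.5, variance=100)
cv_b = coefficient_of_variation(mean=47.5, variance=121)
print(round(cv_a, 2), round(cv_b, 2))  # 19.05 23.16
print("B" if cv_b > cv_a else "A")     # firm with the greater relative variability
```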

Exercise:-
1. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for five days of the week in the three cities were:

City -1 25 24 23 26 17
City-2 22 21 24 22 20
City-3 32 27 35 24 28

o Which city do you think has the most consistent temperature, based on these data?
2. Two groups of people were trained to perform a certain task and tested to find out which group is faster to learn the task. For the two groups the following
information was given:

Value Group one Group two


Mean 10.4 min 11.9 min
Standard deviation 1.2 min 1.3 min

Relatively speaking, which group is more consistent in its performance?

Moments

o If X is a variable that assumes the values $X_1, X_2, \ldots, X_n$, then
o The rth moment is defined as:
$\overline{x^r} = \frac{x_1^r + x_2^r + x_3^r + \cdots + x_n^r}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i^r$
o For the case of a frequency distribution this is expressed as: $\overline{x^r} = \frac{1}{n} \sum_{i=1}^{k} f_i x_i^r$
o If r = 1, it is the simple arithmetic mean; this is called the first moment.
o The rth moment about the mean (the rth central moment), denoted by $M_r$, is defined as:
$M_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r = \frac{n-1}{n} \cdot \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^r$
o For the case of a frequency distribution this is expressed as: $M_r = \frac{1}{n} \sum_{i=1}^{k} f_i (x_i - \bar{x})^r$
o If r = 2, it is the population variance; this is called the second central moment.
o If we assume n - 1 ≈ n, it is also the sample variance.
Examples:

1. Find the first two moments for the following set of numbers 2,3,7
2. Find the first three central moments of the numbers in problem 1.
Solutions:
1. Use the rth moment formula $\overline{x^r} = \frac{1}{n} \sum_{i=1}^{n} x_i^r$:
$\overline{x^1} = \frac{2 + 3 + 7}{3} = 4$, $\overline{x^2} = \frac{2^2 + 3^2 + 7^2}{3} = 20.67$

2. Use the rth central moment formula.
$M_1 = \frac{(2 - 4) + (3 - 4) + (7 - 4)}{3} = 0$
M2 = ?, M3 = ?
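The remaining central moments can be evaluated with a short sketch of the rth central moment formula; `central_moment` is a hypothetical helper name.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum((x_i - mean)^r) / n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

numbers = [2, 3, 7]
m1, m2, m3 = (central_moment(numbers, r) for r in (1, 2, 3))
print(m1, round(m2, 2), m3)  # 0.0 4.67 6.0
```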

Measure of Shapes

Skewness
Skewness is concerned with the shape of the curve, not its size.
o Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
o A skewed frequency distribution is one that is not symmetrical.
o If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or to have positive skewness.
o If it has a longer tail to the left of the central maximum than to the right, it is said to be skewed to the left, or to have negative skewness.

o For a moderately skewed distribution, the following relation holds among the three commonly used measures of central tendency: Mean - Mode = 3 × (Mean - Median)

Measure of Skewness: denoted by α3

There are various measures of skewness:

1. The Pearsonian coefficient of skewness
$\alpha_3 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{x} - \hat{x}}{s}$
2. Bowley's coefficient of skewness (coefficient of skewness based on quartiles)
$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
3. The moment coefficient of skewness
$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\sigma^3}$

o The shape of the curve is determined by the value of α3:
o If α3 > 0 then the distribution is positively skewed.
o If α3 = 0 then the distribution is symmetric.
o If α3 < 0 then the distribution is negatively skewed.

Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are 32, 30.5 and 10 respectively. What is the shape of the curve representing
the distribution?

Solution: Use the Pearsonian coefficient of skewness.
$\alpha_3 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{32 - 30.5}{10} = 0.15$
Since α3 > 0, the distribution is positively skewed.

2. In a frequency distribution, the coefficient of the skewness based on the quartiles is given to be 0.5. If the sum of the upper and lower quartiles is 28 and the
median is 11, find the values of the upper and the lower quartiles.
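Solution:
Given: α3 = 0.5, median = Q2 = 11
Q1 + Q3 = 28 ....................................... (*)
Required: Q1 and Q3
$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = 0.5$
Substituting the given values: $\frac{28 - 2 \cdot 11}{Q_3 - Q_1} = 0.5$, so
Q3 - Q1 = 12 ....................................... (**)
Solving (*) and (**): Q1 = 8, Q3 = 20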
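The first two worked examples can be verified with a sketch of the Pearsonian and Bowley coefficients. The function names are hypothetical.

```python
def pearson_skewness(mean, mode, std_dev):
    """Pearsonian coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev

def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

sk_pearson = pearson_skewness(mean=32, mode=30.5, std_dev=10)
sk_bowley = bowley_skewness(q1=8, q2=11, q3=20)
print(sk_pearson)  # 0.15
print(sk_bowley)   # 0.5
```

Plugging the solved quartiles Q1 = 8 and Q3 = 20 back into Bowley's formula returns the given coefficient 0.5, which confirms the solution of example 2.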
3. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If the coefficient of variation is 20%, find the Pearsonian coefficient of
skewness and the probable mode of the distribution.
4. The sum of fifteen observations, whose mode is 8, was found to be 150 with coefficients of variation of 20%. Then, calculate the Pearsonian coefficient of
skewness and give appropriate conclusion.

Kurtosis
o Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution.
o A distribution having a relatively high peak is called leptokurtic.
o A distribution whose curve is flat-topped is called platykurtic.
o The normal distribution, which is neither very high-peaked nor flat-topped, is called mesokurtic.

Measure of Kurtosis
o The moment coefficient of kurtosis, denoted by α4:
$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$
Where:
M4 is the 4th moment about the mean,
M2 is the 2nd moment about the mean,
σ is the population standard deviation.
o The peakedness of the curve depends on the value of α4:
o If α4 > 3 then the curve is leptokurtic.
o If α4 = 3 then the curve is mesokurtic.
o If α4 < 3 then the curve is platykurtic.

Example: If the first four central moments of a distribution are:

M1 = 0, M2 = 16, M3 = -60, M4 = 162
a) Compute a measure of skewness.
b) Compute a measure of kurtosis and give your interpretation.
Solution:
a) $\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-60}{16^{3/2}} = \frac{-60}{64} \approx -0.94 < 0$, so the distribution is negatively skewed.
b) $\alpha_4 = \frac{M_4}{M_2^2} = \frac{162}{16^2} = \frac{162}{256} \approx 0.63 < 3$, so the curve is platykurtic.
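The two moment coefficients in this example can be checked with a minimal sketch; the function names are hypothetical.

```python
import math

def moment_skewness(m2, m3):
    """alpha_3 = M3 / M2^(3/2), computed as M3 / sigma^3."""
    return m3 / math.sqrt(m2) ** 3

def moment_kurtosis(m2, m4):
    """alpha_4 = M4 / M2^2."""
    return m4 / m2 ** 2

alpha3 = moment_skewness(m2=16, m3=-60)
alpha4 = moment_kurtosis(m2=16, m4=162)
print(alpha3)  # -0.9375
print(alpha4)  # 0.6328125
```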
