Emba (Central Tendency & Dispersion)
Syllabus
Introduction & Concepts
Descriptive Statistics & Inferential Statistics
Measures of Central Tendency [Mean, Median & Mode]
Measures of Dispersion [Standard Deviation, Variance, Coefficient of Variation]
Correlation Analysis [Pearson Correlation, Spearman Rank Correlation]
Regression Analysis [Simple Regression]
Definition of Statistics
The word statistics has been derived from the Latin word ‘status’. In the plural sense, it means a set of numerical figures, called ‘data’, obtained by counting or measurement. In the singular sense, it means the collection, classification, presentation, analysis, comparison and meaningful interpretation of raw data.
It is defined as the science that deals with the collection, analysis and interpretation of numerical data.
Presentation of findings
The planning of operations – relating to special projects for the firm
Setting up of standards – size of employment, volume of sales, fixation of quality norms for the manufactured product, norms for the daily output
In statistical quality control methods – statistics can be used in various ways to ensure the production of quality goods. This is achieved by identifying and rejecting defective or substandard goods.
Sales targets can be fixed on the basis of sales forecasts, which are made using various methods of forecasting, analysis of sales, etc.
Export marketing – analyzing the quality of products, selecting the right products that are in demand in overseas markets, and analyzing import and export statistics
Maintenance of cost records – ensuring that the cost of production includes the cost of raw materials and wages
Expenditure on advertising and sales – finding the association between two or more variables, such as advertising expenditure and sales
Mutual funds, banking and financial institutions – statistics provide tools and techniques that help a consultant or financial adviser guide a person in investing his savings for reasonable returns.
Population(N)
It is a collection of people, items or events about which you want to make inferences.
Population is defined as the potential set of respondents in a geographical area. It is any large collection of objects or individuals, such as Indian housewives or consumers, about which information is desired.
Parameter
It is any summary number, such as an average or a percentage, that describes the entire population.
Ex: the average annual expenditure on clothing in a city, or the proportion of employees working overtime in a factory.
The mean and variance of a given population are known as population parameters.
Sample (n)
It is a subgroup (subset) of the population.
Statistics
Data
Qualitative data
It is a non-numerical property, such as customer satisfaction or the goodwill of a company.
Quantitative classification
It is done according to numerical size, such as weights in kg or heights in cm. Here we classify data by assigning arbitrary limits known as class limits. For example, the population of the whole country may be classified according to different variables like age, income, wage, price, etc. Hence this classification is often called ‘classification by variables’.
Variable
Classification of Variables
Discrete variable
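A variable that can take only exact values and not fractional values is called a discrete variable. It takes specific values within a given class interval. The number of workers in a factory and the number of children in a family are examples of discrete variables.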
Continuous variable
A variable that can take any numerical value (integral or fractional) within a certain range is called a continuous variable. Height, weight, rainfall, time, temperature, etc., are examples of continuous variables.
Ungrouped data
Grouped data
Ex:
Sales[‘00]: 20 40 60 80
Discrete series:
A series without class intervals but with frequencies. The variable takes specific values, and the items can be easily counted.
Ex:
x f
30 10
45 30
Continuous series:
A series with class intervals and frequencies. The variable may take any value within a given class interval, and the items cannot be easily counted.
Ex:
Class f
30 – 40 10
40 – 45 30
For ex: 10 – 19
It can be made continuous by reducing the lower limit of the class interval by 0.5 and increasing the upper limit by 0.5; the interval then becomes 9.5 – 19.5.
Frequency
Frequency Distribution
The arrangement and display of data in which each observed value (or class) is paired with its frequency.
Class limit
They are the lowest and highest values of a class. In 30 – 50, 30 is the lower limit and 50 is the upper limit.
Class interval
The difference between the upper limit and the lower limit is called the class interval. In 30 – 50, the lower limit is 30 and the upper limit is 50, so the class interval is 50 – 30 = 20.
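The class-limit ideas above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original notes: the helper names `to_continuous`, `class_width` and `midpoint` are hypothetical, and `to_continuous` assumes a gap of 1 between adjacent inclusive classes (as in 10 – 19, 20 – 29), so each limit moves outward by 0.5. The midpoint is included because it is the value m used in the grouped-data formulas later in the notes.

```python
def to_continuous(lower, upper, gap=1.0):
    """Turn an inclusive class such as 10-19 into class boundaries by
    moving each limit outward by half the gap between adjacent classes."""
    half = gap / 2
    return lower - half, upper + half


def class_width(lower, upper):
    """Class interval = upper limit - lower limit."""
    return upper - lower


def midpoint(lower, upper):
    """Mid-value of a class (the m used in grouped-data formulas)."""
    return (lower + upper) / 2


print(to_continuous(10, 19))  # (9.5, 19.5)
print(class_width(30, 50))    # 20
print(midpoint(30, 50))       # 40.0
```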
Classification of Quantitative Techniques
Descriptive Statistics
Inferential Statistics
Statistical procedures used for drawing inferences about the properties of populations from sample data are generally referred to as sampling or inferential techniques.
Ex: Chi-square test, correlation, regression, tests of hypothesis (t-test, z-test), Analysis of Variance (ANOVA), etc.
Computational measures – Arithmetic Mean (A.M.), Harmonic Mean (H.M.) & Geometric Mean (G.M.)
The Arithmetic Mean is also called the mean or average. The A.M. is useful for making comparisons among several data sets. It is calculated using all the values in the given data set.
$\bar{x}$ (sample mean) $= \frac{\sum x}{n} = \frac{10 + 15 + 30 + 7 + 42 + 79 + 83}{7} = \frac{266}{7} = 38$
n – number of observations
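As a minimal sketch (the function name `arithmetic_mean` is illustrative, not from the notes), the formula $\bar{x} = \sum x / n$ for ungrouped data translates directly into Python and reproduces the worked example:

```python
def arithmetic_mean(values):
    """Sample mean: the sum of the observations divided by their number n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


print(arithmetic_mean([10, 15, 30, 7, 42, 79, 83]))  # 38.0
```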
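The monthly wages of 4 workmen are Rs. 400, Rs. 440, Rs. 380 and Rs. 360. Find the A.M. of the wages of the four workmen.
Solution (short-cut method, assumed mean A = 380)
x d = x – 380
400 20
440 60
380 0
360 -20
∑d = 60
Here A = 380, n = 4, ∑d = 60
$\bar{x} = A + \frac{\sum d}{n} = 380 + \frac{60}{4} = 395$
The A.M. of the wages is Rs. 395.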
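The short-cut (assumed-mean) method can be checked with a small Python sketch; `mean_shortcut` is a hypothetical helper name. It also shows why the choice of A does not matter: with A = 0 the formula reduces to the direct method and gives the same answer.

```python
def mean_shortcut(values, assumed_mean):
    """Short-cut A.M.: A + (sum of deviations d = x - A) / n."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(deviations)


wages = [400, 440, 380, 360]
print(mean_shortcut(wages, 380))  # 395.0
print(mean_shortcut(wages, 0))    # 395.0 (A = 0 is the direct method)
```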
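A weighted average is a type of average in which each observation in the data set is multiplied by a predetermined weight; the sum of these products is then divided by the sum of the weights, $\bar{x}_w = \frac{\sum wx}{\sum w}$.
Relative weight (w) : 2 3 5
Marks (x) : 30 25 20
wx : 60 75 100
$\bar{x}_w = \frac{60 + 75 + 100}{2 + 3 + 5} = \frac{235}{10} = 23.5$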
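A minimal Python sketch of the weighted average, using the marks and relative weights above; `weighted_mean` is an illustrative name, not taken from the notes.

```python
def weighted_mean(values, weights):
    """Weighted A.M.: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [30, 25, 20]
weights = [2, 3, 5]
print(weighted_mean(marks, weights))  # 23.5
```

With equal weights the result is the ordinary A.M., which is a quick sanity check on the helper.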
Years : 1 2 3 4 5
Income(’00) : 5 10 15 20 25
Find the average demand of eggs in numbers/day from the following data:
x f
10 30
15 25
20 40
30 20
40 35
Solution
x f fx
10 30 300
15 25 375
20 40 800
30 20 600
40 35 1400
∑f = 150 ∑fx = 3475
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{3475}{150} = 23.17$ eggs/day
Find the A.M. of the following frequency distribution by short-cut method and direct method
x(observations) f(frequency)
92 12
125 7
180 6
80 9
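Solution
Direct Method
$\bar{X} = \frac{\sum fx}{N} = \frac{12 \times 92 + 7 \times 125 + 6 \times 180 + 9 \times 80}{12 + 7 + 6 + 9} = \frac{3779}{34} = 111.15$
Short-cut Method (assumed mean A = 100, i.e. d = x – 100)
x f d fd
92 12 -8 -96
125 7 25 175
180 6 80 480
80 9 -20 -180
∑f = 34 ∑fd = 379
$\bar{X} = A + \frac{\sum fd}{N} = 100 + \frac{379}{34} = 100 + 11.15 = 111.15$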
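To confirm that the direct and short-cut methods agree for a discrete series, here is a small Python sketch using the same observations and frequencies. The function names are hypothetical; A = 100 is the assumed mean chosen in the worked solution.

```python
def mean_direct(xs, fs):
    """Direct method: sum(f * x) / N."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_shortcut_freq(xs, fs, assumed_mean):
    """Short-cut method: A + sum(f * d) / N, with d = x - A."""
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(xs, fs))
    return assumed_mean + sum_fd / sum(fs)


xs = [92, 125, 180, 80]
fs = [12, 7, 6, 9]
print(round(mean_direct(xs, fs), 2))               # 111.15
print(round(mean_shortcut_freq(xs, fs, 100), 2))   # 111.15
```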
Arithmetic mean
Class : 10–20 20–30 30–40 40–50 50–60 60–70 70–80 80–90
f : 10 15 5 30 15 12 13 10
Solution
$\bar{X} = A + \frac{\sum fd'}{N} \times i$
A – Assumed mean
f – Frequency
d′ = (m – A)/i, where m – the midpoint of the class and i – the class width
N – total frequency (∑f)
Locate the assumed mean: it is usually taken as the midpoint of a class near the centre of the series.
Class f m d′ fd′
10 – 20 10 15 -3 -30
20 – 30 15 25 -2 -30
30 – 40 5 35 -1 -5
40 – 50 30 45 (Assumed Mean) 0 0
50 – 60 15 55 1 15
60 – 70 12 65 2 24
70 – 80 13 75 3 39
80 – 90 10 85 4 40
N = 110 ∑fd′ = 53
$\bar{X} = A + \frac{\sum fd'}{N} \times i = 45 + \frac{53}{110} \times 10 = 45 + 0.4818 \times 10 = 49.818$
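The step-deviation method for a continuous series can be sketched in Python as follows. This assumes all classes have the same width (10 here) and uses the class limits and frequencies of the example; `grouped_mean_step` is an illustrative name. Any assumed mean gives the same result, which you can verify by passing 55 instead of 45.

```python
def grouped_mean_step(classes, freqs, assumed_mean):
    """Step-deviation A.M.: A + (sum(f * d') / N) * i, with d' = (m - A) / i.

    Assumes every class has the same width i.
    """
    width = classes[0][1] - classes[0][0]
    n = sum(freqs)
    sum_fd = 0.0
    for (lower, upper), f in zip(classes, freqs):
        m = (lower + upper) / 2                 # class midpoint
        sum_fd += f * (m - assumed_mean) / width
    return assumed_mean + sum_fd / n * width


classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [10, 15, 5, 30, 15, 12, 13, 10]
print(round(grouped_mean_step(classes, freqs, 45), 3))  # 49.818
```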
Median
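It is the positional measure that divides the entire series into two equal halves. It is also called the second quartile (Q2). For n observations arranged in ascending order, the median is the size of the ((n + 1)/2)th item.
Ex: 5, 6, 8, 10, 12
Here n = 5, so the median is the 3rd item = 8.
If n is even, the median is the A.M. of the two middle items.
4, 5, 6, 7, 8, 9
Here n = 6, so the median = (6 + 7)/2 = 6.5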
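A short Python sketch of the median rule for ungrouped data (odd n: middle item; even n: A.M. of the two middle items). The helper name `median` is illustrative; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Median of ungrouped data: middle item, or A.M. of the two middle items."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([5, 6, 8, 10, 12]))   # 8
print(median([4, 5, 6, 7, 8, 9]))  # 6.5
```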
Frequency : 6 10 5 4 2
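Wt in kg f Cumulative frequency
112 6 6
118 10 16
122 5 21
130 4 25
40 2 27
N = ∑f = 27
Median = size of the ((N + 1)/2)th item = 14th item. The first cumulative frequency that reaches 14 is 16, so Median = 118 kg.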
Frequency : 10 5 8 6 7 4
Wt in kg f Cumulative frequency
200 10 10
250 5 15
260 8 23
270 6 29
280 7 36
300 4 40
N = ∑f = 40
Since N is even, Median = A.M. of the 20th and 21st items. Both lie in the class with cumulative frequency 23, so Median = (260 + 260)/2 = 260 kg.
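Locating the median of a discrete series through cumulative frequencies can be sketched in Python with the standard `bisect` and `itertools` modules. This is an illustrative sketch: `item_at` and `discrete_median` are hypothetical names, and the values are assumed to be listed in ascending order, as the cumulative-frequency method requires.

```python
from bisect import bisect_left
from itertools import accumulate


def item_at(xs, fs, k):
    """k-th item (1-based) of a discrete series whose values xs are ascending."""
    cum = list(accumulate(fs))          # cumulative frequencies
    return xs[bisect_left(cum, k)]      # first value whose c.f. reaches k


def discrete_median(xs, fs):
    n = sum(fs)
    if n % 2 == 1:
        return item_at(xs, fs, (n + 1) // 2)
    return (item_at(xs, fs, n // 2) + item_at(xs, fs, n // 2 + 1)) / 2


weights = [200, 250, 260, 270, 280, 300]
freqs = [10, 5, 8, 6, 7, 4]
print(discrete_median(weights, freqs))  # 260.0
```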
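Class : 10–20 20–30 30–40 40–50 50–60 60–70 70–80 80–90
f : 10 15 5 30 15 12 13 10
Solution
$Md = l_1 + \frac{N/2 - C}{f} \times i$
where l1 – lower limit of the median class, C – cumulative frequency of the class preceding the median class, f – frequency of the median class, i – class width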
Class f Cumulative frequency
10 – 20 10 10
20 – 30 15 25
30 – 40 5 30 (C)
40 – 50 30 (f) 60 (median class)
50 – 60 15 75
60 – 70 12 87
70 – 80 13 100
80 – 90 10 110
N/2 = 110/2 = 55, so the median class is 40 – 50 (l1 = 40, C = 30, f = 30, i = 10).
$Md = 40 + \frac{55 - 30}{30} \times 10 = 40 + 8.33 = 48.33$
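The median interpolation formula for a continuous series can be sketched in Python as below. This is a sketch under stated assumptions: `grouped_median` is a hypothetical name, the median class is taken as the first class whose cumulative frequency reaches N/2, and the class width is read from each class's own limits.

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Md = l1 + (N/2 - C) / f * i for a continuous frequency distribution."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    for k, (lower, upper) in enumerate(classes):
        if cum[k] >= half:                     # first class whose c.f. reaches N/2
            c = cum[k - 1] if k > 0 else 0     # c.f. of the preceding class
            return lower + (half - c) / freqs[k] * (upper - lower)
    raise ValueError("empty distribution")


classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [10, 15, 5, 30, 15, 12, 13, 10]
print(round(grouped_median(classes, freqs), 2))  # 48.33
```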
Mode
For ungrouped data, the mode is the value of the variable that occurs most frequently.
For the Discrete Series
The value of the observation with the highest frequency is the mode of a discrete series.
Continuous Series
The class interval with the highest frequency is the modal class; the mode is then located within this class by interpolation.
Ex: 8, 9, 11, 15, 16, 12, 15, 3, 7, 15
In the above example, out of 10 items the number 15 appears 3 times, so 15 is the mode.
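For ungrouped data, the mode can be found by counting occurrences. A minimal Python sketch using the standard `collections.Counter`; `modes` is an illustrative name, and it returns a list because a series may have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


print(modes([8, 9, 11, 15, 16, 12, 15, 3, 7, 15]))  # [15]
```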
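Class : 10–20 20–30 30–40 40–50 50–60 60–70 70–80 80–90
f : 10 15 5 30 15 12 13 10
Solution
Determination of Mode
$Mo = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where l1 – lower limit of the modal class, f1 – frequency of the modal class, f0 – frequency of the preceding class, f2 – frequency of the following class, i – class width
Class f
10 – 20 10
20 – 30 15
30 – 40 5 (f0)
40 – 50 30 (f1) (modal class)
50 – 60 15 (f2)
60 – 70 12
70 – 80 13
80 – 90 10
Substituting the values in the formula, we get
$Mo = 40 + \frac{30 - 5}{2 \times 30 - 5 - 15} \times 10 = 40 + \frac{25}{40} \times 10 = 46.25$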
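The grouped mode formula can be sketched in Python as follows. The name `grouped_mode` is hypothetical, and treating a missing neighbouring class (when the modal class is the first or last class) as having frequency 0 is an assumption of this sketch, not a rule stated in the notes.

```python
def grouped_mode(classes, freqs):
    """Mo = l1 + (f1 - f0) / (2*f1 - f0 - f2) * i for the highest-frequency class."""
    k = max(range(len(freqs)), key=lambda j: freqs[j])  # modal class index
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # preceding class (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # following class (0 if none)
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [10, 15, 5, 30, 15, 12, 13, 10]
print(grouped_mode(classes, freqs))  # 46.25
```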
Measures of Dispersion
Mathematical or computational – mean deviation (or average deviation) and standard deviation (or root mean square deviation), taken from the A.M.
Range :
In an arranged array of data the difference between the two extreme values, i.e., the
largest and the smallest values of the distribution is called the range.
Ex: The marks obtained by 6 students were 6, 8, 16, 25, 30, 40. Find the range.
Solution
Range = L – S = 40 – 6 = 34
Weight (in kg) 140 – 150 150 – 160 160 – 170 170 – 180
No. of bags 5 8 10 12
Range = L – S = Largest value – Smallest value = 180 – 140 = 40 kg
Applications of range
It is used for quality control of the finished products using the control chart for the range
in industry
It is also used by the meteorological department for forecasting weather since it gives an
idea of the fluctuation of temperatures between maximum and minimum levels
Coefficient of Range (or relative range) = Absolute range / Sum of the two extreme values = $\frac{L - S}{L + S}$
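A small Python sketch of the range and the coefficient of range for the marks example; the function names are illustrative. For 6, 8, 16, 25, 30, 40 it gives 34 and 34/46 ≈ 0.739.

```python
def value_range(values):
    """Range = L - S (largest minus smallest value)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


marks = [6, 8, 16, 25, 30, 40]
print(value_range(marks))                     # 34
print(round(coefficient_of_range(marks), 3))  # 0.739
```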
Quartile Deviation
Quartile: the median divides the series into two halves, whereas the quartiles divide the series into four equal parts (Q1, Q2 = median, Q3).
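Quartile Deviation for ungrouped data
For n observations arranged in ascending order, Q1 is the size of the ((n + 1)/4)th item and Q3 is the size of the (3(n + 1)/4)th item.
Here n = 7
Q1 = 2nd term = 9
Q3 = 6th term = 30
QD = (Q3 – Q1)/2 = (30 – 9)/2 = 10.5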
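x f Cumulative frequency
16 1 1
18 4 5
21 6 11 (Q1)
28 9 20
32 12 32 (Q3)
40 3 35
N = 35, so Q1 = size of the ((N + 1)/4)th item = 9th item = 21 and Q3 = size of the (3(N + 1)/4)th item = 27th item = 32.
QD = (32 – 21)/2 = 5.5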
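The quartile positions in a discrete series can be located through cumulative frequencies, as in this Python sketch of the table above. The names are hypothetical, and taking a fractional position as the next whole item is a simplifying assumption of this sketch (some textbooks interpolate instead); here the positions 9 and 27 are whole numbers, so no rounding occurs.

```python
from bisect import bisect_left
from itertools import accumulate


def item_at(xs, fs, position):
    """Item at a position in a discrete series with ascending values xs.
    A fractional position is taken as the next whole item (simplifying assumption)."""
    cum = list(accumulate(fs))
    return xs[bisect_left(cum, position)]


def quartile_deviation(xs, fs):
    n = sum(fs)
    q1 = item_at(xs, fs, (n + 1) / 4)       # ((N + 1)/4)th item
    q3 = item_at(xs, fs, 3 * (n + 1) / 4)   # (3(N + 1)/4)th item
    return q1, q3, (q3 - q1) / 2


xs = [16, 18, 21, 28, 32, 40]
fs = [1, 4, 6, 9, 12, 3]
print(quartile_deviation(xs, fs))  # (21, 32, 5.5)
```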
Class : 10–20 20–30 30–40 40–50 50–60 60–70 70–80 80–90
f : 10 15 5 30 15 12 13 10
Solution
QD = $\frac{Q_3 - Q_1}{2}$, Coefficient of QD = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Class f Cumulative frequency
10 – 20 10 10
20 – 30 15 25 (C for Q1)
30 – 40 5 (f) 30 (Q1 class, l1 = 30)
40 – 50 30 60
50 – 60 15 75 (C for Q3)
60 – 70 12 (f) 87 (Q3 class, l3 = 60)
70 – 80 13 100
80 – 90 10 110
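N/4 = 110/4 = 27.5, so the Q1 class is 30 – 40.
Q1 (lower or first quartile) $= l_1 + \frac{N/4 - C}{f} \times i = 30 + \frac{27.5 - 25}{5} \times 10 = 30 + \frac{2.5}{5} \times 10 = 35$
3N/4 = 82.5, so the Q3 class is 60 – 70.
Q3 (upper or third quartile) $= l_3 + \frac{3N/4 - C}{f} \times i = 60 + \frac{82.5 - 75}{12} \times 10 = 60 + \frac{7.5}{12} \times 10 = 66.25$
$QD = \frac{Q_3 - Q_1}{2} = \frac{66.25 - 35}{2} = \frac{31.25}{2} = 15.625$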
Coefficient of QD $= \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{31.25}{101.25} = 0.31$
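The Q1 and Q3 interpolation formulas share one pattern, so a single Python helper can handle both: with p = 0.25 it gives Q1, with p = 0.75 it gives Q3 (and p = 0.5 gives the median). The name `grouped_quantile` and the parameter `p` are illustrative assumptions, not notation from the notes.

```python
from itertools import accumulate


def grouped_quantile(classes, freqs, p):
    """l + (p*N - C) / f * i; p = 0.25 gives Q1, p = 0.75 gives Q3."""
    n = sum(freqs)
    target = p * n
    cum = list(accumulate(freqs))
    for k, (lower, upper) in enumerate(classes):
        if cum[k] >= target:                   # first class whose c.f. reaches p*N
            c = cum[k - 1] if k > 0 else 0     # c.f. of the preceding class
            return lower + (target - c) / freqs[k] * (upper - lower)
    raise ValueError("empty distribution")


classes = [(10, 20), (20, 30), (30, 40), (40, 50),
           (50, 60), (60, 70), (70, 80), (80, 90)]
freqs = [10, 15, 5, 30, 15, 12, 13, 10]
q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
print(q1, q3)                              # 35.0 66.25
print((q3 - q1) / 2)                       # 15.625
print(round((q3 - q1) / (q3 + q1), 2))     # 0.31
```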
QD = $\frac{Q_3 - Q_1}{2}$, where Q3 – Q1 is the interquartile range.
Coefficient of Quartile Deviation = Quartile Deviation / Median
(or) $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
Quartile Deviation is rarely used for practical purposes since it does not consider the variability of all the values. It gives a fair measure of variability, as 50% of the observations lie between the two quartiles, but it is affected by sampling fluctuations.
The standard deviation (σ) measures the spread of the observations about the arithmetic mean. It is also called the Root Mean Square Deviation. The square of the standard deviation is called the variance.
The Standard Deviation is an absolute measure of the scatter of the various values about the A.M. The relative measure of dispersion based on S.D. is called the Coefficient of S.D., which is given by:
Coefficient of S.D. $= \frac{\sigma}{\bar{X}}$
For a normal distribution:
About 68% of the values in the population fall within ±1 standard deviation of the mean
About 95% of the values in the population fall within ±2 standard deviations of the mean
About 99.7% of the values in the population fall within ±3 standard deviations of the mean
[Figure: normal curve with µ – 3σ, µ – 2σ, µ – σ, µ, µ + σ, µ + 2σ, µ + 3σ marked on the horizontal axis]
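The 68 / 95 / 99.7 percentages for a normal distribution follow from the error function: the share of values within µ ± kσ equals erf(k/√2). A minimal Python sketch using the standard `math.erf` (the helper name `share_within` is illustrative):

```python
import math


def share_within(k):
    """Fraction of a normal distribution lying within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
# prints: 1 68.27, then 2 95.45, then 3 99.73
```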
According to Prof. Karl Pearson, who first suggested this relative measure, the Coefficient of Variation (C.V.) is the S.D. expressed as a percentage of the mean, whereas the S.D. is an absolute measure of variation about the mean. It is widely used since it provides a suitable basis of comparison when frequency distributions are of different sizes and have variables measured in different units. Expressing the variation as a percentage gives a better idea of the magnitude of deviations across a number of distributions.
For comparing the uniformity, homogeneity, variability, stability and consistency of two series, we must compute the C.V. of each series. The series with the larger C.V. is considered more variable than the other. The series with the smaller C.V. is said to be more consistent, more uniform and more stable than the other.
X – individual observation
N – number of observations
d - deviation
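x (x – X̄) (x – X̄)²
9 -0.14286 0.020408
12 2.857143 8.163265
10 0.857143 0.734694
11 1.857143 3.44898
8 -1.14286 1.306122
3 -6.14286 37.73469
11 1.857143 3.44898
Total 54.85714
Here ∑x = 64 and n = 7, so X̄ = 64/7 = 9.14.
$\sigma = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}} = \sqrt{\frac{54.85714}{7 - 1}} = 3.02$
(The divisor n – 1 gives the sample standard deviation.)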
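The calculation above (deviations from the mean, squared, summed, divided by n – 1) can be reproduced in Python and cross-checked against the standard library's `statistics.stdev`, which also uses the n – 1 divisor. A minimal sketch:

```python
import math
import statistics

data = [9, 12, 10, 11, 8, 3, 11]

mean = sum(data) / len(data)                  # 64 / 7
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of (x - mean)^2
s = math.sqrt(sum_sq / (len(data) - 1))       # divisor n - 1

print(round(s, 2))                        # 3.02
print(round(statistics.stdev(data), 2))   # 3.02
```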
Calculating Variance
$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$
d is the deviation from an assumed mean A, i.e. d = x – A
S.No x (Observations)
1 9
2 12
3 10
4 11
5 8
6 13
7 11
8 12
9 10
10 11
11 11
12 12
13 11
14 8
15 11
16 16
Solution
$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$
Computation of S.D
Values (x) d = x – 10 d²
9 -1 1
12 2 4
10 0 0
11 1 1
8 -2 4
13 3 9
11 1 1
12 2 4
10 0 0
11 1 1
11 1 1
12 2 4
11 1 1
8 -2 4
11 1 1
16 6 36
∑x = 176 ∑d = 16 ∑d² = 72
Here, n = 16 and A = 10
$\sigma = \sqrt{\frac{72}{16} - \left(\frac{16}{16}\right)^2} = \sqrt{4.5 - 1} = \sqrt{3.5} = 1.87$
Alternative formula
$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
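The assumed-mean formula and the alternative (direct) formula both divide by n, so they should agree with each other and with the standard library's population standard deviation `statistics.pstdev`. A minimal Python sketch with the 16 observations above (A = 10 as in the worked solution):

```python
import math
import statistics

data = [9, 12, 10, 11, 8, 13, 11, 12, 10, 11, 11, 12, 11, 8, 11, 16]
n = len(data)
A = 10                                    # assumed mean
d = [x - A for x in data]                 # deviations from A

sigma_shortcut = math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)
sigma_direct = math.sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)

print(round(sigma_shortcut, 2))            # 1.87
print(round(sigma_direct, 2))              # 1.87
print(round(statistics.pstdev(data), 2))   # 1.87
```

Note that these formulas divide by n, unlike the earlier sample example that divided by n – 1.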
Size of items (x) Frequency
6 4
10 7
9 5
11 13
12 8
13 10
14 3
Solution
$\sigma = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2}$
x f fx fx²
6 4 24 144
10 7 70 700
9 5 45 405
11 13 143 1573
12 8 96 1152
13 10 130 1690
14 3 42 588
N = 50 ∑fx = 550 ∑fx² = 6252
$\sigma = \sqrt{\frac{6252}{50} - \left(\frac{550}{50}\right)^2} = \sqrt{125.04 - 11^2} = \sqrt{4.04} = 2.01$
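A minimal Python sketch of the direct formula for a discrete series, $\sigma = \sqrt{\sum fx^2/N - (\sum fx/N)^2}$, using the sizes and frequencies above. The name `sd_discrete` is illustrative.

```python
import math


def sd_discrete(xs, fs):
    """sigma = sqrt(sum(f*x^2)/N - (sum(f*x)/N)^2)."""
    n = sum(fs)
    sum_fx = sum(f * x for x, f in zip(xs, fs))
    sum_fx2 = sum(f * x * x for x, f in zip(xs, fs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


sizes = [6, 10, 9, 11, 12, 13, 14]
freqs = [4, 7, 5, 13, 8, 10, 3]
print(round(sd_discrete(sizes, freqs), 2))  # 2.01
```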
Short-cut method (d = x – A): $\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
Compute the S.D. of household size from the following frequency distribution of 500 households:
Household size (x) No. of households (f)
1 92
2 49
3 52
4 82
5 102
6 60
7 35
8 24
9 4
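Solution (assumed mean A = 4, d = x – 4)
x f d fd fd²
1 92 -3 -276 828
2 49 -2 -98 196
3 52 -1 -52 52
4 82 0 0 0
5 102 1 102 102
6 60 2 120 240
7 35 3 105 315
8 24 4 96 384
9 4 5 20 100
N = 500 ∑fd = 17 ∑fd² = 2217
$\sigma = \sqrt{\frac{2217}{500} - \left(\frac{17}{500}\right)^2} = \sqrt{4.434 - 0.0012} = \sqrt{4.4328} = 2.11$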
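The short-cut formula with an assumed mean can be sketched in Python for the household data. `sd_assumed_mean` is a hypothetical helper; the second call (A = 0, i.e. the direct formula) illustrates that the assumed mean only simplifies the arithmetic and does not change σ.

```python
import math


def sd_assumed_mean(xs, fs, a):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), with d = x - a."""
    n = sum(fs)
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - a) ** 2 for x, f in zip(xs, fs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


size = list(range(1, 10))
households = [92, 49, 52, 82, 102, 60, 35, 24, 4]
print(round(sd_assumed_mean(size, households, 4), 2))  # 2.11
print(round(sd_assumed_mean(size, households, 0), 2))  # 2.11
```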
$\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i$
$d' = \frac{m - A}{i}$
d′ – step deviation
m – midpoint
A – Assumed Mean
i – class width
Class : 44–46 46–48 48–50 50–52 52–54
f : 8 24 27 21 10
Class f m d′ d′² fd′ fd′²
44–46 8 45 -2 4 -16 32
46–48 24 47 -1 1 -24 24
48–50 27 49 0 0 0 0
50–52 21 51 1 1 21 21
52–54 10 53 2 4 20 40
N = 90 ∑fd′ = 1 ∑fd′² = 117
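Here A = 49 and i = 2.
$\sigma = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i = \sqrt{\frac{117}{90} - \left(\frac{1}{90}\right)^2} \times 2 = \sqrt{1.3 - 0.0001} \times 2 = 2.28$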
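The step-deviation S.D. for a continuous series can be sketched in Python as follows, assuming equal class widths (i = 2 here) and the assumed mean A = 49 used above; `grouped_sd_step` is an illustrative name.

```python
import math


def grouped_sd_step(classes, freqs, assumed_mean):
    """sigma = sqrt(sum(f*d'^2)/N - (sum(f*d')/N)^2) * i, with d' = (m - A)/i.

    Assumes every class has the same width i.
    """
    width = classes[0][1] - classes[0][0]
    n = sum(freqs)
    d = [((lower + upper) / 2 - assumed_mean) / width for lower, upper in classes]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


classes = [(44, 46), (46, 48), (48, 50), (50, 52), (52, 54)]
freqs = [8, 24, 27, 21, 10]
print(round(grouped_sd_step(classes, freqs, 49), 2))  # 2.28
```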
Calculating Variance
A small S.D. means a high degree of uniformity and homogeneity of the observations, and vice versa.
If two or more comparable series have almost identical means, the distribution with the minimum S.D. has the most representative mean.
Coefficient of S.D. (CSD) $= \frac{\sigma}{\bar{X}}$
Coefficient of Variation (CV)
The Coefficient of Variation is the standard deviation expressed as a percentage of the mean. It is used to compare the relative variability of two or more series.
Formula of CV
$CV = \frac{\sigma}{\bar{X}} \times 100$
where, for a grouped series, $\bar{X} = A + \frac{\sum fd'}{N} \times i$
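A minimal Python sketch of the C.V. comparison, applied to the scores of batsmen A and B from the problems later in these notes. Assumption: σ is computed with the n-divisor (population) formula via `statistics.pstdev`; if your course uses n – 1, the percentages change slightly but the comparison works the same way. `coefficient_of_variation` is an illustrative name.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = sigma / mean * 100, with sigma from the n-divisor formula."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


a = [14, 13, 26, 53, 17, 29, 79, 36, 84, 49]
b = [37, 22, 56, 52, 14, 10, 37, 48, 20, 4]
cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)
print(round(cv_a, 1), round(cv_b, 1))  # 61.1 58.5
```

The batsman with the smaller C.V. is the more consistent one.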
Frequency 2 4 9 11 12 6 4 2
Frequency 2 6 14 16 8 3 1
Frequency 14 20 42 54 45 18 7
4. Calculate Arithmetic Mean, Median and mode for the following :
Frequency 18 38 46 27 15 8
No. of companies 5 7 19 29 16 9 8 7
6. Calculate Arithmetic Mean, Median, Mode and Quartile Deviation for the following:
No. of students 1 3 11 21 43 32 9
Frequencies 1 3 13 17 27 36 38
F 5 16 56 19 4
9. Calculate Quartile Deviation and its coefficient
No. of companies 4 8 11 15 12 6 5
10. Calculate Arithmetic Mean, Median and mode for the following :
Net Profit (crores) 30–32 32–34 34–36 36–38 38–40 40–42 42–44 44–46 46–48 48–50
No. of companies 3 8 24 31 50 61 38 21 12 2
11. Calculate Arithmetic Mean, Median and mode for the following :
12. Calculate Arithmetic Mean, Median and mode for the following :
Net Profit (crores) 55–64 65–74 75–84 85–94 95–104 105–114 115–124 125–134 135–144
No. of companies 2 20 79 184 302 207 82 24 4
No. of companies 10 20 30 20 10 10
No. of companies 8 24 27 21 10
15. Calculate SD
No. of companies 2 12 22 20 14 4 1
No. of companies 14 24 38 20 4
No. of companies 8 12 6 4 10
18. Compare the variability of life of two varieties of lamps using Coefficient of Variation
No. of lamps (A) 5 11 26 10 8
No. of lamps (B) 4 30 12 8 6
19. Compare the variability of wages in the two factories M and N using the Coefficient of Variation
Factory M 15 30 44 60 30 14 7
Factory N 25 40 60 35 20 15 5
Additional Problems
Find the Standard Deviation for the sample observations on the weights(g) of a certain
product
S.No x (Observations)
1 9
2 12
3 10
4 11
5 8
6 13
7 11
8 12
9 10
10 11
11 11
12 12
13 11
14 8
15 11
16 16
Problem
The following table gives the number of finished articles turned out per day by different number
of workers in a factory. Find the mean value and S.D. of the daily output of finished articles:
18 3
19 7
20 11
21 14
22 18
23 17
24 13
25 8
26 5
27 4
Problem
The following data give the number of passengers who travelled by Airbus from Kolkata to Mumbai from Sunday to Saturday:
No. of Passengers
320
290
265
300
270
200
315
Problem
Two batsmen A and B made the following scores in a series of cricket matches.
No. of Matches A B
1 14 37
2 13 22
3 26 56
4 53 52
5 17 14
6 29 10
7 79 37
8 36 48
9 84 20
10 49 4
Calculate the C.V. for each batsman and determine which of the two batsmen, A or B, is the more consistent player.
Problem
Life Model A Model B
0–2 5 2
2–4 16 7
4–6 13 12
6–8 7 19
8 – 10 5 9
10 – 12 4 1
What is the average life of each model of refrigerator? Which model has greater uniformity?
Problem
From the prices of shares x and y below find out which is more stable in value :
x y
35 108
54 107
52 105
53 105
56 106
58 107
52 104
50 103
51 104
49 101
Problem
The following data refer to the dividend (%) paid by two companies A and B over the last seven years:
A B
4 12
8 8
4 3
15 15
10 6
11 4
9 10
Problem
No. of goals scored in a match No. of Matches
A B
0 26 18
1 10 8
2 7 5
3 6 6
4 4 3
The following table gives the figures of profits of two companies X and Y for the last 10 years.
Which of the two companies has greater consistency in profits :
X Y
Problem
The following table gives the figures of electricity generated (million kWh) by two companies X and Y. Which of the two companies has greater consistency in electricity generation?
X Y
Problem
2005 7
2006 9
2007 10
2008 7
2009 5
Year I II
2008 68 63
2009 70 59
2010 60 55
2011 68 51
2012 65 43
Problem
2008 9
2009 10
2010 12
2011 15
2012 13
2013 10
2014 8
2015 16
2016 15
Problem
Year Earnings(Rs. Lakhs)
2008 38
2009 40
2010 65
2011 72
2012 69
2013 60
2014 87
2015 95
2016 74
Year