Data Analysis Using SPSS

2
Measures of Central Tendency and Dispersion
2.1 Measures of Central Tendency

Central tendency refers to the location of the centre of a
distribution, that is, the value around which the observations tend
to cluster.
The most important measures of central tendency are:
1- Arithmetic mean
2- Geometric mean
3- Harmonic mean
4- Median
5- Mode
2.2 Arithmetic Mean (A.M)
It is obtained by dividing the sum of the items by the number of
items. It is denoted by $\mu$ (Greek letter mu) for a
population and by $\bar{X}$ for a sample.
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$ for ungrouped data (population)
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$ for ungrouped data (sample)
$\bar{X} = \frac{\sum f x}{\sum f}$ for grouped data
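The two cases can be sketched in Python as follows. This is only an illustration: the values, class midpoints and frequencies are hypothetical and do not come from the text.

```python
# Ungrouped data: divide the sum of the items by their number.
values = [4, 8, 15, 16, 23, 42]            # hypothetical sample
mean_ungrouped = sum(values) / len(values)

# Grouped data: x holds the class midpoints, f the class frequencies.
midpoints = [5, 15, 25, 35]                # hypothetical midpoints x
frequencies = [2, 5, 8, 5]                 # hypothetical frequencies f
mean_grouped = (
    sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)
)

print(mean_ungrouped)  # 18.0
print(mean_grouped)    # 23.0
```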
2.3 When To Use Arithmetic Mean

We use the arithmetic mean when we study social, economic and
commercial problems such as production, prices, income, exports
and imports. It helps in obtaining the average income, average
price, etc.
2.4 Difference Between Weighted Average and Simple Average
In a simple average we give equal weight to every variate. In
practice, however, there are many situations where the variates are
not of equal importance, so we assign a weight (relative frequency)
to each variate in accordance with its importance. For example,
when we examine the effect of changes in the prices of different
commodities on different sections of society, we see that a change
in the price of wheat is of more importance than a change in the
price of a motor car.
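The contrast can be sketched in Python, assuming the usual definition of the weighted mean, $\bar{X}_w = \sum w x / \sum w$. The price changes and weights below are hypothetical figures chosen only to show the effect.

```python
# Hypothetical percentage price changes and importance weights.
price_change = {"wheat": 10.0, "motor car": 2.0}
weight = {"wheat": 8, "motor car": 2}

# Simple average: every commodity counts equally.
simple_average = sum(price_change.values()) / len(price_change)

# Weighted average: each change is multiplied by its weight.
weighted_average = (
    sum(weight[k] * price_change[k] for k in price_change)
    / sum(weight.values())
)

print(simple_average)    # 6.0
print(weighted_average)  # 8.4
```

Because wheat carries the larger weight, the weighted average lies closer to the change in the price of wheat.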
2.5 Advantages and Disadvantages of Arithmetic Mean
a) The advantages of arithmetic mean:
1- It is simple to understand and easy to calculate.
2- It makes use of all the data in the group.
3- It can be determined with mathematical precision.
4- It can further be manipulated algebraically.
5- It can be determined even if only the sum and the number of
items are known.
b) The disadvantages of arithmetic mean:
1- A few items of a very high or very low value may make
the mean unrepresentative of the distribution.
2- It may not correspond to an actual value and this may
make it unrealistic.
3- When there are open end class intervals, assumptions
have to be made which may not be accurate.
2.6 Geometric Mean (G. M)
It is the nth positive root of the product of ‘n’ positive
values.
$G.M = (x_1 \cdot x_2 \cdots x_n)^{1/n}$
2.7 Harmonic Mean (H. M)

It is defined as the reciprocal of the A. M of the reciprocals
of the non-zero values.
$H.M = \frac{n}{\sum_{i=1}^{n} (1/x_i)}$
The H. M is applied to data which involve time, speed, rates, etc.
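A small Python sketch shows why the H. M suits speeds. It assumes a hypothetical journey covering the same distance at two different speeds; the figures are illustrative only.

```python
# Hypothetical speeds (km/h) over two equal distances.
speeds = [60.0, 40.0]

# Harmonic mean: n divided by the sum of the reciprocals.
harmonic_mean = len(speeds) / sum(1 / v for v in speeds)
arithmetic_mean = sum(speeds) / len(speeds)

# Cross-check with total distance / total time for 120 km at each speed.
distance = 120.0
total_time = sum(distance / v for v in speeds)   # 2 h + 3 h
average_speed = 2 * distance / total_time

print(round(harmonic_mean, 6))  # 48.0
print(arithmetic_mean)          # 50.0
print(average_speed)            # 48.0
```

The true average speed agrees with the H. M, while the A. M overstates it.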
2.8 Relation Between A. M, G. M, H. M
a) If all items of a series are the same then A. M = G. M = H. M;
otherwise, for positive values,
A. M > G. M > H. M.
b) For two positive values, $G.M = \sqrt{(H.M)(A.M)}$.
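These relations can be checked numerically with Python's standard `statistics` module. The data are hypothetical; the sketch also suggests that the square-root relation is exact for two values but need not hold for three or more.

```python
import math
import statistics

two = [2.0, 8.0]          # two hypothetical positive values
am = statistics.mean(two)             # (2 + 8) / 2 = 5
gm = statistics.geometric_mean(two)   # sqrt(2 * 8), about 4
hm = statistics.harmonic_mean(two)    # 2 / (1/2 + 1/8), about 3.2

print(am > gm > hm)                          # True
print(math.isclose(gm, math.sqrt(am * hm)))  # True

three = [1.0, 2.0, 3.0]   # three hypothetical positive values
am3 = statistics.mean(three)
gm3 = statistics.geometric_mean(three)
hm3 = statistics.harmonic_mean(three)

print(am3 > gm3 > hm3)                          # True
print(math.isclose(gm3, math.sqrt(am3 * hm3)))  # False
```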
2.9 Median
It is the middle value of the data when they are arranged in
ascending or descending order.
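A short Python sketch using the standard `statistics` module illustrates the definition. The data are hypothetical. For an even number of items there is no single middle value, and `statistics.median` follows the common convention of averaging the two middle values.

```python
import statistics

odd = [7, 1, 5, 3, 9]         # hypothetical data, n = 5
even = [7, 1, 5, 3, 9, 11]    # hypothetical data, n = 6

print(sorted(odd))              # [1, 3, 5, 7, 9]
print(statistics.median(odd))   # 5, the third of five ordered values
print(statistics.median(even))  # 6.0, the average of 5 and 7
```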
2.10 When We Apply Median
We apply the median to qualitative data that can be ranked, or to
quantitative data that involve extremely large or small values.
2.11 Advantages of Median
1- It is not affected by extremely high or low values. It is,
therefore, useful for describing distributions in areas
such as wages.
2- It is easy to calculate even if not all the values are
known.
3- It is often an actual value.
2.12 Disadvantages of Median
1- It gives the value of only one item. If the values are
spread erratically, the median is not a representative
measure.
2- In a continuous series, grouped in class intervals, the
value of the median is only an estimate based on the
assumption that the items in a class are distributed
evenly within the class.
3- It is not suitable for further arithmetical calculations;
for example, the number of items multiplied by the
median will not give the total for the data.
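The first advantage and the last disadvantage can both be seen in a small Python sketch. The wage figures are hypothetical.

```python
import statistics

wages = [200, 220, 250, 270, 300]            # hypothetical weekly wages
wages_with_outlier = [200, 220, 250, 270, 3000]

mean_before = sum(wages) / len(wages)                           # 248.0
mean_after = sum(wages_with_outlier) / len(wages_with_outlier)  # 788.0

median_before = statistics.median(wages)               # 250
median_after = statistics.median(wages_with_outlier)   # still 250

# The number of items times the median does not reproduce the total.
total = sum(wages)                               # 1240
items_times_median = len(wages) * median_before  # 1250

print(mean_before, mean_after)
print(median_before, median_after)
print(total, items_times_median)
```

One extreme wage more than triples the mean but leaves the median unchanged.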
2.13 Mode
The mode is the value which occurs most frequently in the
data set.
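As a minimal sketch, the mode can be found with Python's standard `statistics` module. The shoe sizes are hypothetical. When several values tie for the highest frequency, `statistics.multimode` returns all of them, which shows that the mode need not be unique.

```python
import statistics

shoe_sizes = [7, 8, 8, 9, 8, 10, 9, 8]   # hypothetical sizes sold

print(statistics.mode(shoe_sizes))            # 8, it occurs four times
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2], two modes
```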
2.14 When We Apply Mode
We apply the mode when we study problems which involve the
average size of items such as shoes, clothes or agricultural holdings.
2.15 Advantages of Mode
1- It is easy to calculate and simple to understand.
2- It is often an actual value. It may, therefore, appear to
be realistic and sensible.
3- Modal information can often be supplied quickly by
people who have experience in a particular area.
4- It can be the best representative of the typical item because
it is the value which occurs most frequently.
5- It is a commonly used average, although people do
not always realize that they are using it.
6- It has practical uses. For instance, employers will often
adopt the modal rates of pay, the rate paid by most
other employers. Cars and clothes are made to modal
sizes.
2.16 Disadvantages of Mode
1- It is not well defined and can often be a matter of
judgment.
2- It is not useful if the distribution is widely dispersed.
3- It is not suitable for further arithmetical calculations.
4- It does not include all the values in the distribution.
2.17 Empirical Relation Between Mean, Median and Mode
In a symmetrical single-peaked distribution, the mean, median and
mode coincide. When the distribution is moderately skewed or
asymmetrical, then approximately
Mode = 3 Median − 2 Mean. This relation does not hold if the
distribution is J-shaped or extremely skewed.
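The empirical relation can be sketched as a one-line Python function. The function name and the figures are hypothetical, and the result is only an estimate of the mode for a moderately skewed distribution.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Estimate the mode from the relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

# Hypothetical moderately skewed distribution.
print(empirical_mode(mean=50.0, median=48.0))  # 44.0
```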
2.18 Quantiles
The set of (n-1) partition values of a variate which divide the
total frequency into a given number ‘n’ of equal parts are called
quantiles, e.g. if n = 10 then the n-1 = 9 values are called
deciles.
2.19 Quartiles
These are the values which divide data into four equal parts
when data are arranged in ascending or descending order.
2.20 Deciles
These are the values which divide data into ten equal parts
when data are arranged in ascending or descending order.
2.21 Percentiles
These are the values which divide data into one hundred equal
parts when data are arranged in ascending or descending order.
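Quartiles, deciles and percentiles can all be sketched with `statistics.quantiles` from Python's standard library, which returns the n-1 cut points for a chosen n. The data are hypothetical. The function interpolates between observations, and other packages may use a different interpolation rule, so results can differ slightly from one tool to another.

```python
import statistics

data = list(range(1, 12))    # hypothetical data: 1, 2, ..., 11

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)         # [3.0, 6.0, 9.0]
print(len(deciles))      # 9
print(len(percentiles))  # 99
```

The second quartile equals the median of the data, as expected.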
2.22 Measures of Dispersion
Dispersion refers to the extent to which the values are
spread about their average. The most important measures of
dispersion are:
1- Range
2- Quartile deviation
3- Average deviation
4- Standard deviation
2.23 Absolute Measure of Dispersion
An absolute measure of dispersion is the actual variation
expressed in units of the variable.
2.24 Relative Measure of Dispersion
A measure of dispersion, when expressed as a pure number
in the form of a coefficient, percentage or ratio, is called a relative
measure of dispersion.
2.25 Range
It is the difference between the largest value and the
smallest value of the data.
2.26 Quartile Deviation
The difference between the upper and lower quartiles is called the
inter-quartile range. Half of the difference between the upper and
lower quartiles is called the quartile deviation or semi-inter-quartile
range.
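The range and the quartile deviation, Q.D = (Q3 − Q1)/2, can be sketched in Python as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles`, whose interpolation rule is one of several in common use.

```python
import statistics

data = list(range(1, 12))    # hypothetical data: 1, 2, ..., 11

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Lower and upper quartiles (the middle cut point is the median).
q1, _, q3 = statistics.quantiles(data, n=4)

inter_quartile_range = q3 - q1
quartile_deviation = inter_quartile_range / 2

print(data_range)            # 10
print(inter_quartile_range)  # 6.0
print(quartile_deviation)    # 3.0
```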
2.27 Standard Deviation
It is the positive square root of the arithmetic mean of the
squared deviations taken from the mean. It is denoted by $\sigma$ for
a population and ‘s’ for a sample.
The square of the standard deviation is called the variance. It is
denoted by $\sigma^2$ for a population and $s^2$ for a sample.
2.28 Co-Efficient of Variation (C.V)
The co-efficient of variation measures the relative
dispersion. It is the ratio of the standard deviation to the arithmetic
mean, expressed as a percentage.
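The definitions of variance, standard deviation and C.V can be followed step by step in Python. The data are hypothetical, and the sketch divides by n, as in the definition given in the text. Many packages divide by n − 1 for a sample (for example `statistics.stdev`), a convention the text does not discuss.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)

mean = sum(data) / n                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / n   # 4.0
std_dev = math.sqrt(variance)                       # 2.0

# Co-efficient of variation: standard deviation relative to the mean, in %.
cv = 100 * std_dev / mean                           # 40.0

print(variance, std_dev, cv)
# The library's divide-by-n version gives the same standard deviation.
print(math.isclose(std_dev, statistics.pstdev(data)))  # True
```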
2.29 Moments
Moments are of great importance in the study of the symmetry
and normality of a distribution. They are designated by the
power to which the deviations are raised before they are averaged.
The r-th moment about the mean is defined as
$m_r = \frac{\sum_i (x_i - \bar{X})^r}{n}$ for a sample and
$\mu_r = \frac{\sum_i (x_i - \mu)^r}{N}$ for a population.
2.30 Moments Ratio
Karl Pearson defined the moment ratios as:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \frac{\mu_4}{\mu_2^2}$
$\beta_1$ tests the symmetry of the distribution. If $\beta_1 = 0$, the
distribution is said to be symmetrical, otherwise non-symmetrical
or skewed.
$\beta_2$ tests the degree of peakedness of the distribution. If
$\beta_2 = 3$, the distribution is said to be Mesokurtic; if $\beta_2 > 3$, it is
said to be Leptokurtic; otherwise Platykurtic.
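The moments about the mean and the two moment ratios can be sketched in Python. The helper name `central_moment` and the data are hypothetical. The sketch divides by n and applies Pearson's ratios to the computed moments.

```python
def central_moment(data, r):
    """r-th moment about the mean: the average of (x - mean) ** r."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

m2 = central_moment(data, 2)   # 4.0, the variance
m3 = central_moment(data, 3)   # 5.25
m4 = central_moment(data, 4)   # 44.5

beta1 = m3 ** 2 / m2 ** 3      # 0.4306640625, not zero, so skewed
beta2 = m4 / m2 ** 2           # 2.78125, below 3, so Platykurtic

print(beta1, beta2)
```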
2.31 Skewness
The term skewness means the lack of symmetry of the
values about some central value i.e. mean, median or mode. A
distribution has zero skewness if it is symmetrical about its mean.
2.32 Positive Skewness
If the curve has a long tail towards the right then the skewness
will be positive. In this case
Mean > median > mode.
2.33 Negative Skewness
If the curve has a long tail towards the left then the skewness
will be negative. In this case
Mean < median < mode.
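Both orderings can be illustrated with Python's standard `statistics` module. The two data sets are hypothetical; the second is the mirror image of the first, so its long tail points the other way.

```python
import statistics

right_tailed = [2, 4, 4, 4, 5, 5, 7, 9]         # long tail to the right
left_tailed = [10 - x for x in right_tailed]    # mirror image of the above

for name, data in [("right", right_tailed), ("left", left_tailed)]:
    print(
        name,
        statistics.mean(data),
        statistics.median(data),
        statistics.mode(data),
    )
# right-tailed data: mean 5, median 4.5, mode 4
# left-tailed data:  mean 5, median 5.5, mode 6
```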
2.34 Symmetrical Distribution
A distribution or curve is said to be symmetrical if both
tails of the distribution are equidistant from the central value.
For symmetrical distribution:
a- Mean, median and mode are identical.
b- The graph of the series will be bell shaped.
c- Quartiles are equidistant from median.
d- The sum of deviations from median is zero.
e- $\beta_1 = 0$ always.
f- The normal distribution is always symmetrical, but a
symmetrical distribution may or may not be normal.
2.35 Kurtosis
It is the degree of peakedness of the distribution. In other
words, $\beta_2$ is a measure of kurtosis, which tells us the shape of the
curve of the distribution. A distribution is Mesokurtic if $\beta_2 = 3$,
Leptokurtic when $\beta_2 > 3$ and Platykurtic when $\beta_2 < 3$.
2.36 Gini’s Mean Difference

Gini, the Italian statistician, suggested that instead of
studying dispersion about any measure of central tendency, the
mean of the differences between the values of all possible pairs of
items should be formed, which would give a good measure of dispersion.
Gini’s mean difference = $\sum |d| / m$
where $d = x_i - x_j$, $i < j$, $i, j = 1, 2, \ldots, n$, and m = total number
of pairs, i.e. n(n-1)/2.
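The mean difference can be sketched in Python with `itertools.combinations`, which lists every unordered pair exactly once. The function name and the data are hypothetical, and the sketch assumes absolute differences are intended and that there are at least two items.

```python
from itertools import combinations

def gini_mean_difference(values):
    """Average absolute difference over all n(n-1)/2 pairs of items."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

data = [1, 3, 6, 10]   # hypothetical items, n = 4, so m = 6 pairs

# Pair differences: 2, 5, 9, 3, 7, 4; their sum is 30.
print(gini_mean_difference(data))  # 5.0
```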
