
The measure of dispersion (spread):

In this, we have different concepts such as Range, Standard Deviation, Variance, and
Quartile. Together, they tell how the data is spread around the center, which is nothing
but the mean, median, or mode.

The range is nothing but the lowest value subtracted from the largest value. It is highly
sensitive to outliers, because it considers only two points in its estimation, and it does
not recognize how the data is distributed between those two points.
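
As a minimal sketch of the range, assuming a small made-up salary sample (the numbers and variable names are illustrative, not taken from the dataset used later), the following computes it with NumPy and shows how a single extreme value affects the result:

```python
import numpy as np

# Hypothetical sample; the last value is a deliberately extreme point.
salaries = np.array([30, 32, 35, 36, 40, 41, 45, 200])

# Range = largest value minus smallest value.
data_range = salaries.max() - salaries.min()

# The same calculation with the extreme value left out.
range_without_outlier = salaries[:-1].max() - salaries[:-1].min()

print(data_range)             # 170
print(range_without_outlier)  # 15
```

Only the minimum and maximum enter the calculation, so the six values in between have no influence on either result.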

Next is deviation; the deviation is calculated to know how far values lie from the
mean. We can calculate the deviation around any central measure, i.e. mean, median, or mode.
While calculating the average deviation, we take the absolute value of each deviation, so
that negative deviations count as positive and do not cancel out the positive ones.
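
The idea can be sketched in a few lines of NumPy. The sample values below are hypothetical, and the choice of the mean absolute deviation as the summary is an assumption made for illustration:

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample

mean = values.mean()        # 5.0
median = np.median(values)  # 4.5

# Signed deviations from the mean cancel out and sum to zero,
# which is why the sign is dropped before averaging.
signed_dev = values - mean
mean_abs_dev = np.abs(signed_dev).mean()

# The same calculation around the median instead of the mean.
median_abs_dev = np.abs(values - median).mean()

print(signed_dev.sum())  # 0.0
print(mean_abs_dev)      # 1.5
print(median_abs_dev)    # 1.5
```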

Quartile means dividing the data into quarters. The quartiles of a distribution are the three
values that split the sorted data into four equal parts, where Q1 is the 25th percentile, Q2 is
the 50th percentile (the median), and Q3 is the 75th percentile:

Source (https://www.cdc.gov/csels/dsepd/ss1978/lesson2/section7.html)

The interquartile range (IQR) is a measure that indicates how widely the central 50% of
values within the dataset are dispersed. It is calculated as Q3 - Q1. As far as dealing with
outliers is concerned, the IQR can be used to flag outlier values and to impute them.
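
A minimal sketch of the quartiles and the IQR with NumPy follows. The sample is made up, and the 1.5 × IQR fences (Tukey's rule) and the capping step are assumptions chosen for illustration; the text does not say which rule or imputation method is intended:

```python
import numpy as np

# Hypothetical sample; 50 is a deliberately extreme point.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

# Q1, Q2 (median) and Q3, using NumPy's default linear interpolation.
q1, q2, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed rule: points beyond 1.5 * IQR from the quartiles are outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

# One possible imputation: cap outliers at the nearest fence.
capped = np.clip(values, lower, upper)

print(q1, q2, q3, iqr)  # 3.25 5.5 7.75 4.5
print(outliers)         # [50.]
print(capped.max())     # 14.5
```

Other quartile conventions give slightly different Q1 and Q3 on small samples, so the exact fences depend on the interpolation method.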

Next is variance, used mainly to find variation in the dataset. Variance indicates how close
to or far from the mean most of the values of a particular variable are. The standard
deviation is the square root of the variance, so it expresses that spread in the same units
as the data. In other words, the standard deviation is used to check the consistency of the
data: the lower the value, the higher the consistency.
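
To make the relationship concrete, here is a small sketch with two hypothetical samples that share the same mean but differ in spread. It uses NumPy's default population formulas (`ddof=0`); pandas defaults to the sample formulas (`ddof=1`), so the numbers differ slightly between the two libraries:

```python
import numpy as np

# Two made-up samples with the same mean (50) but different spread.
consistent = np.array([48, 49, 50, 51, 52])
scattered = np.array([10, 30, 50, 70, 90])

# Variance: average squared distance from the mean.
var_consistent = consistent.var()  # 2.0
var_scattered = scattered.var()    # 800.0

# Standard deviation: square root of the variance, in the data's own units.
std_consistent = consistent.std()
std_scattered = scattered.std()

print(var_consistent, var_scattered)
print(std_consistent < std_scattered)  # True: lower value, more consistent data
```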

To calculate the parameters under a measure of dispersion, we can code each one
individually or use the NumPy package to do so. Here, as we are dealing with a data
frame, the pandas .describe() function gives most of the parameters we need in one call:
count, mean, standard deviation, minimum, the three quartiles, and maximum.

# dataset is assumed to be a pandas DataFrame containing these columns
dataset[['Education', 'JobCategory', 'CurrentSalary', 'After6Months',
         'SalBegin', 'Job Time', 'Prev Exep']].describe(include='all')
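
As a rough, self-contained illustration of what this call returns, the sketch below builds a tiny hypothetical data frame (the values are invented; only two of the column names are borrowed from the snippet above) and derives the measures that .describe() does not report directly:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
df = pd.DataFrame({
    "CurrentSalary": [30, 32, 35, 36, 40],
    "SalBegin": [20, 21, 25, 25, 28],
})

# Rows: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()

# Variance and range are not rows of describe(); compute them separately.
variance = df.var()                 # sample variance (ddof=1)
value_range = df.max() - df.min()

# The IQR can be read off the describe() output.
iqr = summary.loc["75%"] - summary.loc["25%"]

print(summary)
print(variance, value_range, iqr, sep="\n")
```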
