BS-chapter2-2021-Summary Measures of Dispersion - Variability-22


Summary Measures

Measures Of Variability/Dispersion

IS 310 – Business Statistics
Measures of Variability (Dispersion)
• Statisticians use summary measures to describe the amount of variability, or spread, in a set of data. The most common measures of variability are the range, the interquartile range (IQR), the variance, and the standard deviation.
• The goal is to obtain a measure of how spread out the scores are in a distribution.
• A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
• Variability serves both as a descriptive measure and as an important component of most inferential statistics.
• As a descriptive statistic, variability measures the degree to which the scores are spread out or clustered together in a distribution.
• In the context of inferential statistics, variability provides a measure of how accurately any individual score or sample represents the entire population.
Central Tendency vs. Variability
• Central tendency describes the central point of the distribution, and variability describes how the scores are scattered around that central point.
• Together, central tendency and variability are the two primary values that are used to describe a distribution of scores.

Measures of Variability (Dispersion)
The most important measures of variation are the variance and the standard deviation, each defined for both a sample and a population. The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage.

Measures of Variability
Variability can be measured with
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
In each case, variability is determined by measuring distance.

Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.
The range is the total distance covered by the distribution, from the highest score to the lowest score (using the upper and lower real limits of the range).

For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of
numbers, the range would be R=(11 – 1)= 10.
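
The range calculation can be sketched in a few lines of Python. This is only an illustration; the function name `data_range` is a hypothetical helper, not something from the slides.

```python
# Range = largest value - smallest value.
scores = [1, 3, 4, 5, 5, 6, 7, 11]  # data from the example above


def data_range(values):
    """Return the distance between the largest and smallest values."""
    return max(values) - min(values)


print(data_range(scores))  # 10
```

Because only the two extreme values enter the calculation, changing 11 to 110 would change the range from 10 to 109 even though the other seven values stay the same.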

Range
Arrange the data in ascending order, then subtract:
Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Interquartile Range (IQR = Q3 - Q1), with quartile location i = (Q/4)(n)
The interquartile range (IQR) is a measure of variability based on dividing a data set into quartiles.
Quartiles divide a rank-ordered data set into four equal parts. The values that divide the parts are called the first, second, and third quartiles, denoted Q1, Q2, and Q3, respectively.
Q1 is the "middle" value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half of the rank-ordered data set.

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

For example, consider the following numbers: 1, 2, 3, 4, 5, 6, 7, 8.

Example: Interquartile Range (IQR)

Q2 is the median of the entire data set, the middle value. In this example, we have an even number of data points, so the median is the average of the two middle values. Thus, Q2 = (4 + 5)/2 = 4.5.
Q1 is the middle value in the first half of the data set. Since the first half has an even number of data points, the middle value is the average of the two middle values; that is, Q1 = (2 + 3)/2 = 2.5.
Q3 is the middle value in the second half of the data set. Again, since the second half has an even number of observations, the middle value is the average of the two middle values; that is, Q3 = (6 + 7)/2 = 6.5.
The interquartile range = Q3 - Q1 = 6.5 - 2.5 = 4.
Notice that this process divided the data set into four parts of equal size. The first part consists
of 1 and 2; the second part, 3 and 4; the third part, 5 and 6; and the fourth part, 7 and 8.
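
The median-of-halves procedure in this example can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and the handling of an odd number of values (leaving the median out of both halves) is an assumption, since the example only covers an even-sized data set.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: for an odd number of values, the overall median is
    excluded from both halves. The slide example does not cover that case.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8])
print(q1, q2, q3, q3 - q1)  # 2.5 4.5 6.5 4.0
```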

Interquartile Range (IQR)
An Alternative Definition for IQR

In some texts, the interquartile range is defined differently: as the difference between the largest and smallest values in the middle 50% of a set of data.
To compute an interquartile range using this definition, first remove the observations in the lower quartile. Then remove the observations in the upper quartile. Finally, from the remaining observations, compute the difference between the largest and smallest values.
For example, consider the following numbers: 1, 2, 3, 4, 5, 6, 7, 8. After we remove
observations from the lower and upper quartiles, we are left with: 3, 4, 5, 6. The
interquartile range (IQR) would be 6 - 3 = 3.
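
A minimal Python sketch of this alternative definition is shown below. The name `iqr_middle_half` is hypothetical, and the sketch assumes the number of observations is a multiple of 4 so that each quartile holds a whole number of values.

```python
def iqr_middle_half(values):
    """Range of the middle 50% after trimming the lowest and highest quarters.

    Assumption: len(values) is a multiple of 4, as in the slide example.
    """
    ordered = sorted(values)
    quarter = len(ordered) // 4
    middle = ordered[quarter:len(ordered) - quarter]
    return max(middle) - min(middle)


print(iqr_middle_half([1, 2, 3, 4, 5, 6, 7, 8]))  # 3
```

Note that the same data gives 4 under the Q3 - Q1 definition and 3 under this one, so it is worth stating which definition is in use.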

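Interquartile Range, with quartile location i = (Q/4)(n)

1st Quartile (Q1): i = (1/4)(70) = 17.5, rounded up to the 18th position; Q1 = 445
3rd Quartile (Q3): i = (3/4)(70) = 52.5, rounded up to the 53rd position; Q3 = 525
Interquartile Range = Q3 - Q1 = 525 - 445 = 80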
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
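
The location rule i = (Q/4)(n) used on this slide can be sketched in Python. The function name `quartile_by_location` is hypothetical. The slide only shows the case where i is not a whole number (round up); averaging positions i and i + 1 when i is whole is an assumption based on the common textbook rule.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def quartile_by_location(values, q):
    """Quartile q (1, 2 or 3) using the location index i = (q/4) * n."""
    ordered = sorted(values)
    i = q * len(ordered) / 4
    if i != int(i):
        # Not a whole number: round up and take that (1-based) position.
        return ordered[math.ceil(i) - 1]
    # Whole number (assumed rule): average positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


q1 = quartile_by_location(rents, 1)
q3 = quartile_by_location(rents, 3)
print(q1, q3, q3 - q1)  # 445 525 80
```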

Mean Absolute Deviation or Average Deviation
• The mean absolute deviation (MAD), or average deviation, of a data set is the average distance between each data value and the mean.
• Mean absolute deviation is a way to describe variation in a data set. The name breaks down as follows. Mean: the average of the data. Absolute value: the distance of a number from zero on a number line. Deviation: a measure of variability. MAD is used, for example, in calculating demand variability. It is expressed by the following formula:
$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$
Mean deviation is rarely used.
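
As a rough illustration, the mean absolute deviation is the sum of the absolute distances from the mean divided by the number of values. The sketch below applies that to the car rental sample that appears later in this deck; the function name `mean_absolute_deviation` is hypothetical, and choosing that data set here is only for illustration.

```python
def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


rates = [43, 35, 34, 58, 30, 30, 36]  # car rental rates in $, mean = 38
mad = mean_absolute_deviation(rates)
print(round(mad, 2))  # 7.14
```

The absolute deviations are 5, 3, 4, 20, 8, 8 and 2, which sum to 50, so MAD = 50/7.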

Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Variance: determined by averaging the squared differences of all the values from the mean.
- Denoted by $s^2$ for a sample and $\sigma^2$ for a population:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ or $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ (for a sample)
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ (for a population)

Variance
(Why Use Variance?)
Look at the following two data sets:

Set 1 = 20 40 50 50 60 75 80 85 90 100
Set 2 = 50 60 60 60 65 65 70 70 70 80

The mean is the same in both data sets (65).

However, there is a difference between the data sets. What is the difference?
In Set 1, the data values vary widely. The data values in Set 2 are closer together.
The mean alone does not fully describe a data set. That is why we need a second measure, such as the variance, to describe how spread out the values are.
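
To make the contrast concrete, here is a small Python sketch that computes the mean and the sample variance of both sets with the standard library's `statistics` module. Treating the two sets as samples (dividing by n - 1) is an assumption; the slide does not say whether they are samples or populations.

```python
from statistics import mean, variance

set1 = [20, 40, 50, 50, 60, 75, 80, 85, 90, 100]
set2 = [50, 60, 60, 60, 65, 65, 70, 70, 70, 80]

for name, data in (("Set 1", set1), ("Set 2", set2)):
    # variance() is the sample variance: sum of squared deviations / (n - 1)
    print(name, "mean =", mean(data), "sample variance =", round(variance(data), 2))
```

Both means are 65, but the squared deviations sum to 5700 for Set 1 and only 600 for Set 2, so the first variance is about 9.5 times the second.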

Standard Deviation
Standard deviation measures the standard (typical) distance between a score and the mean.

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation

• Difference from the mean absolute deviation: both measure the dispersion of the data by computing the distance of each data value from the mean. The difference between the two is that the standard deviation squares each difference (and takes the square root of the averaged squares at the end), whereas the mean absolute deviation uses the absolute difference.

The standard deviation is computed as follows:

$s = \sqrt{s^2}$ (for a sample)
$\sigma = \sqrt{\sigma^2}$ (for a population)
Comparison of Variance & Standard Deviation
Variance is the average of the squared differences between each data value and the mean.
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ (sample)
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ (population)

The standard deviation is computed by taking the square root of the variance:

$s = \sqrt{s^2}$ (for a sample)
$\sigma = \sqrt{\sigma^2}$ (for a population)
Problem 1: A population consists of four observations: {1, 3, 5, 7}. Find the mean μ, median, mode, population variance σ², sample variance s², population standard deviation σ, and sample standard deviation s.
Solution: Median = (3 + 5)/2 = 4. There is no mode, since no value occurs more than once.
Given N = 4. For the variance and standard deviation, we first need to compute the population mean μ.
Mean = μ = ΣX / N = (1 + 3 + 5 + 7) / 4 = 16/4 = 4
Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
σ² = [(1 - 4)² + (3 - 4)² + (5 - 4)² + (7 - 4)²] / 4
σ² = [(-3)² + (-1)² + (1)² + (3)²] / 4
σ² = [9 + 1 + 1 + 9] / 4
σ² = 20 / 4 = 5
Sample variance: s² = 20 / 3 = 6.67
Population standard deviation: σ = √σ² = √5 = 2.236
Sample standard deviation: s = √s² = √6.67 = 2.58
Note: Students are sometimes unsure whether the denominator in the variance formula should be N or (n - 1). We use N to compute the variance of a population based on population data, and we use (n - 1) to estimate the variance of a population based on sample data. In this problem, we are computing the variance of a population based on population data, so the population figures use N in the denominator; the sample figures, shown for comparison, use (n - 1).
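
The N versus n - 1 distinction in Problem 1 can be checked with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N and `variance` and `stdev` divide by n - 1. This is just a verification sketch of the arithmetic above.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [1, 3, 5, 7]

pop_var = pvariance(data)   # divides by N = 4
samp_var = variance(data)   # divides by n - 1 = 3
pop_sd = pstdev(data)
samp_sd = stdev(data)

print(pop_var, round(samp_var, 2))           # 5 6.67
print(round(pop_sd, 3), round(samp_sd, 3))   # 2.236 2.582
```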
Problem 2: A sample of car rental rates in $ is: 43, 35, 34, 58, 30, 30, 36.
Find the mean, median, mode, sample variance, and sample standard deviation.
1. Mean = (43 + 35 + 34 + 58 + 30 + 30 + 36)/7 = 266/7 = $38
2. Median: arrange in ascending order (30, 30, 34, 35, 36, 43, 58); the middle value is 35
3. Mode = most frequent value = 30
Remember: in sample calculations we use n - 1 in the denominator.
Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
s² = [(43 - 38)² + (35 - 38)² + (34 - 38)² + (58 - 38)² + (30 - 38)² + (30 - 38)² + (36 - 38)²] / (7 - 1)
Hence the sample variance is s² = 582/6 = 97

And the sample standard deviation is s = √97 = 9.85

Interpretation: Car rental rates typically deviate from the mean by about $9.85.
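
The steps of Problem 2 can be followed one by one in Python. This is a sketch that spells out the arithmetic rather than calling a library routine; the variable names are illustrative.

```python
import math

rates = [43, 35, 34, 58, 30, 30, 36]  # car rental rates in $

n = len(rates)
mean = sum(rates) / n                             # 266 / 7
squared_devs = [(x - mean) ** 2 for x in rates]   # 25, 9, 16, 400, 64, 64, 4
sample_variance = sum(squared_devs) / (n - 1)     # 582 / 6
sample_sd = math.sqrt(sample_variance)

print(mean, sample_variance, round(sample_sd, 2))  # 38.0 97.0 9.85
```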

Examples: Find Variance from a Frequency Distribution Table

Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

$\left(\frac{s}{\bar{x}} \times 100\right)\%$ (for a sample)
$\left(\frac{\sigma}{\mu} \times 100\right)\%$ (for a population)

Why Use the Coefficient of Variation
Ex. 1: Student GPAs            Ex. 2: Home Prices
Student #   GPA                Home #   Price
1           4.0                1        400,000
2           3.5                2        370,000
3           3.0                3        350,000
4           3.0                4        330,000
5           2.5                5        300,000
x̄ = 3.2                        x̄ = 350,000
Standard deviation s = 0.57    Standard deviation s = 38,078
By looking at the standard deviations alone, we cannot conclude that home prices have a larger variation, because the two data sets are measured on very different scales.
Calculate the coefficient of variation and see what the numbers look like.
CV = (s/x̄) × 100 = 17.81%      CV = (s/x̄) × 100 = 10.88%
Relative to their means, student GPAs show a larger variation than home prices.
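
The comparison can be reproduced with a short Python sketch. The helper name `coefficient_of_variation` is hypothetical; it uses the sample standard deviation from the standard `statistics` module.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample coefficient of variation, in percent: (s / mean) * 100."""
    return stdev(values) / mean(values) * 100


gpas = [4.0, 3.5, 3.0, 3.0, 2.5]
prices = [400_000, 370_000, 350_000, 330_000, 300_000]

cv_gpa = coefficient_of_variation(gpas)
cv_price = coefficient_of_variation(prices)
print(round(cv_gpa, 1), round(cv_price, 1))  # 17.8 10.9
```

Computed without intermediate rounding, the GPA figure can differ from the slide in the second decimal place, because the slide rounds s to 0.57 before dividing. The conclusion is the same either way: the GPAs vary more relative to their mean.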

Assignment-1: Seventy apartments were randomly sampled. Monthly rent prices for these apartments are listed below. Find the sample variance s² and the sample standard deviation s.

• Apartment Rent Sample Data


425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{34,356}{70} = 490.80$
• Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16$
• Sample standard deviation: $s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
• Variance for sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16$
• Standard deviation for sample: $s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
• Coefficient of variation: $\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$
The standard deviation is about 11% of the mean.
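
The apartment rent figures can be checked with the standard `statistics` module. This is a verification sketch only; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = mean(rents)       # 34,356 / 70
s2 = variance(rents)      # sample variance, divides by n - 1 = 69
s = stdev(rents)          # square root of the sample variance
cv = s / x_bar * 100      # coefficient of variation in percent

print(round(x_bar, 2), round(s2, 2), round(s, 2), round(cv, 2))
# 490.8 2996.16 54.74 11.15
```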
Assignment-2:
A sample of car rental rates in $ is: 43, 35, 34, 58, 30, 30, 36.
Find the mean, median, mode, MAD, variance, and standard deviation.
1. Mean = x̄ = (43 + 35 + 34 + 58 + 30 + 30 + 36)/7 = $38
2. Median: arrange in ascending order (30, 30, 34, 35, 36, 43, 58); the middle value is 35
3. Mode = most frequent value = 30
4. Variance = s² = [(43 - 38)² + (35 - 38)² + (34 - 38)² + (58 - 38)² + (30 - 38)² + (30 - 38)² + (36 - 38)²] / (7 - 1) = 582/6 = 97
5. Standard deviation = s = √97 = 9.85
Interpretation: Car rental rates typically deviate from the mean by about $9.85.

Covariance

Covariance and correlation both describe the degree to which two random variables, or sets of random variables, tend to deviate from their expected values in similar ways. Intuitively, the covariance between X and Y indicates how the values of X and Y move relative to each other. If large values of X tend to occur with large values of Y, then (X − EX)(Y − EY) is positive on average. In this case, the covariance is positive and we say X and Y are positively correlated.
Covariance measures the joint variation of two quantities. The covariance between X and Y is defined as
Cov(X, Y) = E[(X − EX)(Y − EY)] = E[XY] − (EX)(EY).

Covariance & Correlation
Covariance and correlation are two related but distinct terms, both used in statistics and regression analysis. Covariance shows the direction in which two variables vary together, whereas correlation shows both the direction and the strength of their linear relationship on a fixed scale from -1 to 1.

Example-1. For the paired observations (x, y) given below:

a. Calculate the covariance of x and y
b. Calculate the correlation coefficient r

x 12 13 16 18 21 22
y 10 50 30 29 60 10

Solution:
a. First calculate the means of x and y, then find the covariance from the products of the deviations:

x     y     xy     x²     y²     x - x̄   y - ȳ    (x - x̄)(y - ȳ)
12    10    120    144    100    -5      -21.5    107.5
13    50    650    169    2500   -4      18.5     -74
16    30    480    256    900    -1      -1.5     1.5
18    29    522    324    841    1       -2.5     -2.5
21    60    1260   441    3600   4       28.5     114
22    10    220    484    100    5       -21.5    -107.5
Σx = 102   Σy = 189   Σxy = 3252   Σx² = 1818   Σy² = 8041   Σ(x - x̄)(y - ȳ) = 39

x̄ = Σx/n = 102/6 = 17
ȳ = Σy/n = 189/6 = 31.5
Covariance (dividing by n) = Σ(x - x̄)(y - ȳ)/n = 39/6 = 6.5

(Σx)² = 10404   (Σy)² = 35721
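
The covariance calculation in part a can be sketched in Python. The function name `covariance` and its `sample` flag are hypothetical; the flag switches between dividing by n, as in part a, and dividing by n - 1, which gives the sample covariance. The y values are the ones used in the solution table.

```python
x = [12, 13, 16, 18, 21, 22]
y = [10, 50, 30, 29, 60, 10]  # y values as they appear in the solution table


def covariance(xs, ys, sample=False):
    """Average product of deviations; divides by n, or n - 1 if sample=True."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return total / (n - 1 if sample else n)


print(covariance(x, y))               # 6.5
print(covariance(x, y, sample=True))  # 7.8
```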
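Solution:
b. Now find r

Sample variance of x: $s_x^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{6(1818) - 10404}{6(6-1)} = \frac{504}{30} = 16.8$
Sample variance of y: $s_y^2 = \frac{n\sum y^2 - (\sum y)^2}{n(n-1)} = \frac{6(8041) - 35721}{6(6-1)} = \frac{12525}{30} = 417.5$
Sample covariance: $s_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n(n-1)} = \frac{6(3252) - (102)(189)}{6(6-1)} = \frac{19512 - 19278}{30} = \frac{234}{30} = 7.8$

Correlation coefficient: $r = \frac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}} = \frac{7.8}{\sqrt{(16.8)(417.5)}} = \frac{7.8}{\sqrt{7014}} = \frac{7.8}{83.75} \approx 0.093$ (Ans)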
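
The correlation coefficient can be sketched in Python using the same computational sums as the solution. The function name `pearson_r` is hypothetical. Note that the denominator is the square root of the product of the two variances, not the product itself.

```python
import math

x = [12, 13, 16, 18, 21, 22]
y = [10, 50, 30, 29, 60, 10]  # y values as they appear in the solution table


def pearson_r(xs, ys):
    """Correlation from the sums Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_xx = sum(a * a for a in xs)
    sum_yy = sum(b * b for b in ys)
    cov_xy = (n * sum_xy - sum_x * sum_y) / (n * (n - 1))   # 7.8
    var_x = (n * sum_xx - sum_x ** 2) / (n * (n - 1))       # 16.8
    var_y = (n * sum_yy - sum_y ** 2) / (n * (n - 1))       # 417.5
    return cov_xy / math.sqrt(var_x * var_y)


r = pearson_r(x, y)
print(round(r, 4))  # 0.0931
```

A value this close to zero indicates almost no linear relationship between x and y in this data.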
