Professional Documents
Culture Documents
Descriptive Statistics - Numerical Measures
Descriptive Statistics - Numerical Measures
Descriptive Statistics - Numerical Measures
Managerial Decisions
Concepts covered in the last two sessions?
2
Descriptive Statistics - Numerical Measures
◼ Measures of Location
◼ Measures of Variability
3
Measures of Location
Mean
If the measures are computed
Median for data from a sample,
Mode they are called sample statistics.
Percentiles
Quartiles If the measures are computed
for data from a population,
they are called population parameters.
4
Mean
◼ Perhaps the most important measure of location is the mean.
◼ The mean provides a measure of central location.
◼ The mean of a data set is the average of all the data values.
◼ The sample mean 𝑥ҧ is the point estimator of the population mean 𝜇.
Sample Mean 𝑥ҧ
x i
x=
n
Number of observations
in the sample
6
Population Mean 𝜇
x i
=
N
Number of observations
in the population
7
Sample Mean
Example: Apartment Rents
Seventy apartments were randomly sampled in a small college town. The monthly rent prices for these
apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Sample Mean
Example: Apartment Rents
Seventy apartments were randomly sampled in a small college town. The monthly rent prices for these
apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
x=
x i
=
34, 356
= 490.80
n 70
Mean
◼ Each of them earns $35,000 a year, which makes the mean annual
income for the group as ………….
◼ Now Jack Dorsey walks into the bar; assume that Dorsey has annual
income of $1 billion
◼ Now the mean annual income of the bar patrons rises to ……….
10
Mean
◼ Each of them earns $35,000 a year, which makes the mean annual
income for the group as $35,000.
◼ Now Jack Dorsey walks into the bar; assume that Dorsey has annual
income of $1 billion
◼ Now the mean annual income of the bar patrons rises to $90,940,909.
11
Mean
12
Limitations of using Mean
◼ This isn’t a bar where multimillionaires hang out; it’s a bar where a bunch
of folks with relatively low incomes happen to be sitting next to Dorsey
◼ Because there has been explosive growth in incomes at the top end of the
distribution, the average income gets heavily skewed by the ultra rich
13
Quiz Time
◼ If there are very high or very low values in that data that don’t look
like most of the data, then the mean is _______ ____________
14
Quiz Time
◼ If there are very high or very low values in that data that don’t look
like most of the data, then the mean is not representative
15
Median
◼ A few extremely large values can inflate the mean
◼ Recall the previous example
◼ The median of a data set is the value in the middle when the data items are arranged in
ascending order
◼ Whenever a data set has extreme values, the median is the preferred measure of central
location
◼ The median is a robust estimator of central location but says nothing about how the
data on either side of its value is spread or dispersed
Median
26 18 27 12 14 27 19 7 observations
12 14 18 19 26 27 27 in ascending order
Median = 19
Median
For an even number of observations:
26 18 27 12 14 27 30 19 8 observations
12 14 18 19 26 27 27 30 in ascending order
21
Exercise – Compute Mean , Median and Mode
Sum 80 Sum 98
Count 16 Count 16
Mean 5 Mean 6.13
Median 5 Median 7
Mode 5 Mode 8
Sum 80
Count 16
Mean 5
Median 5
Mode 2,8
22
Percentiles
◼ A percentile provides information about how the data are spread over the interval
from the smallest value to the largest value.
◼ Admission test scores for colleges and universities are frequently reported in terms of
percentiles.
◼ The pth percentile of a data set is a value such that at least p percent of the items take
on this value or less and at least (100 - p) percent of the items take on this value or
more.
Percentiles
Arrange the data in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Quartiles
Package of Package of
company 1 company 2
25 17
20 16
15 15
10 14
5 13
Avg: 15 Avg: 15
Measures of Variability
◼ Range
◼ Interquartile Range
◼ Variance
◼ Standard Deviation
◼ Coefficient of Variation
Range
◼ The range of a data set is the difference between the largest and smallest data values.
◼ It is the simplest measure of variability.
◼ It is very sensitive to the smallest and largest data values.
Range
Example: Apartment Rents
◼ The interquartile range of a data set is the difference between the third quartile and
the first quartile.
◼ It is the range for the middle 50% of the data.
◼ It overcomes the sensitivity to extreme data values.
Interquartile Range
Example: Apartment Rents
2 ( xi − x )
2
( xi − )
2
s = =
2
n −1 N
for a for a
sample population
Variance
2 ( xi − x )
2
( xi − )
2
s = =
2
n −1 N
for a for a
sample population
s = s2 = 2
for a for a
sample population
Coefficient of Variation
◼ The coefficient of variation indicates how large the standard deviation is relative
to the mean.
◼ Stock A ◼ Stock B
41
Coefficient of Variation (CoV)
◼ Stock A ◼ Stock B
42
Sample Variance, Standard Deviation,
And Coefficient of Variation
◼ Example: Apartment Rents
Variance
s 2
=
(x i − x )2
= 2, 996.16
n−1
Standard Deviation
the standard
deviation is
s = s 2 = 2996.16 = 54.74
about 11%
of the mean
Coefficient of Variation
s 54.74
100 % = 100 % = 11.15%
x 490.80
Summary of Last Session
✓ Mean
✓ Median
✓ Mode
✓ Percentiles
✓ Quartiles
✓ Range
✓ Interquartile Range
✓ Variance
✓ Standard Deviation
✓ Coefficient of Variation
44
Distribution Shape: Skewness
xi − x
3
n
Skewness =
(n − 1)(n − 2) s
Skewness = 0
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Distribution Shape: Skewness
Skewness = − .31
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Distribution Shape: Skewness
Moderately Skewed Right
◼ Skewness is positive.
◼ Mean will usually be more than the median.
Skewness = .31
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Distribution Shape: Skewness
Highly Skewed Right
◼ Skewness is positive (often above 1.0).
◼ Mean will usually be more than the median.
.35
Skewness = 1.25
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a college town. The
monthly rent prices for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness
Example: Apartment Rents
.25
.20
.15
.10
.05
0
z-Scores
◼ The z-score is often called the standardized value.
◼ It denotes the number of standard deviations a data value xi is away from the mean.
xi − x
zi =
s
𝑥𝑖 = 𝑥ҧ + 𝑠𝑧𝑖
z-Scores
◼ An observation’s z-score is a measure of the relative location of the observation
in a data set.
◼ A data value less than the sample mean will have a z-score less than zero.
◼ A data value greater than the sample mean will have a z-score greater than zero.
◼ A data value equal to the sample mean will have a z-score of zero.
z-Scores
Example: Apartment Rents
◼ z-Score of Smallest Value (425)
xi − x 425 − 490.80
z= = = − 1.20
s 54.74
57
Five-Number Summaries
and Box Plots
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Five-Number Summaries
Example: Apartment Rents
A key to the development of a box plot is the computation of the median and
the quartiles Q1 and Q3.
400 425 450 475 500 525 550 575 600 625
Q1 = 445 Q3 = 525
Q2 = 475
Box Plot
◼ Limits are located (not drawn) using the interquartile range (IQR).
◼ Data outside these limits are considered outliers.
◼ The locations of each outlier is shown with the symbol * .
Box Plot
Example: Apartment Rents
There are no outliers (values less than 325 or greater than 645)
in the apartment rent data.
Box Plot
Example: Apartment Rents
◼ Whiskers (dashed lines) are drawn from the ends of the box to the
smallest and largest data values inside the limits.
400 425 450 475 500 525 550 575 600 625
(Actually, 86% of the rent values are between 409 and 573.)
Empirical Rule
When the data are believed to approximate a bell-shaped distribution …
The empirical rule can be used to determine the percentage of data values that must be within a
specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which will be covered later
Empirical Rule
99.72%
95.44%
68.26%
x
– 3 – 1 + 1 + 3
– 2 + 2
Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to summarize the data for
one variable at a time.
75
Example: Stereo & Sound Equipment Store
76
Covariance
( xi − x )( yi − y ) for
sxy =
n −1 samples
( xi − x )( yi − y ) for
xy = populations
N
Covariance
Example: Stereo & Sound Equipment Store
79
Covariance
Example: Stereo & Sound Equipment Store
80
Covariance
Example: Stereo & Sound Equipment Store
➢ If the value of Sxy is +, the points with the greatest influence on Sxy must be in Q1 and Q3. Hence a positive value
for Sxy indicates a positive linear association between x and y.
➢ If the value of Sxy is -, the points with the greatest influence on Sxy must be in Q2 and Q4. Hence a negative value
for Sxy indicates a negative linear association between x and y.
83
Correlation Coefficient
Just because two variables are highly correlated, it does not mean that one variable
is the cause of the other.
◼ Correlation is NOT causation
85
Correlation Coefficient
for for
samples populations
Correlation Coefficient
Sample Covariance
sxy =
( x − x )( y
i i − y)
=
99
= 11
n−1 9
Sample Correlation Coefficient
sxy 11
rxy = = = 0.93
sx sy (1.49)(7.93)
Correlation Coefficient ?
89
Correlation Coefficient ?
➢ Correlation coefficient measures only the strength of linear relationship between two variables.
➢ It is possible for the correlation coefficient to be near zero, suggesting no linear relationship, when the
relationship between the two variables is non-linear.
90
Thank You !!!